Advancing the Art of Censorship Data Analysis
Authors: Ram Sundara Raman (University of Michigan), Apurva Virkud (University of Michigan), Sarah Laplante (Google Jigsaw), Vinicius Fortuna (Google Jigsaw), Roya Ensafi (University of Michigan)
Year: 2023
Issue: 1
Pages: 14–23
Abstract: A decade of research into collecting censorship measurement data has resulted in the introduction and continued operation of several censorship measurement platforms that collect large-scale, longitudinal censorship data. However, collecting data is only part of the process of understanding Internet censorship phenomena; interpreting this data requires a large amount of effort in data analysis, including removing false positives, adding information from external sources, and exploring aggregated data. The lack of a standardized data analysis process that performs such operations leads to incomplete and inaccurate characterizations of censorship. In this work, we present a detailed breakdown of the challenges involved in analyzing censorship measurement data, supported by examples from public censorship datasets such as OONI and Censored Planet. The key challenges identified in this paper encompass finding accurate measurement metadata and accounting for unexpected causes of network interference other than Internet censorship; we highlight findings from previous work that suffer from these challenges. To address these challenges, we design and implement an open-source data analysis pipeline for a currently active censorship measurement platform, Censored Planet, and motivate and validate each component of the pipeline by demonstrating censorship case studies that can be accurately characterized using the pipeline. We hope that our paper sheds light on the complexity of censorship data analysis and brings systematization to the process.
Copyright in FOCI articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.