An Analysis of Chinese Censorship Bias in LLMs

Authors: Mohamed Ahmed (Citizen Lab, University of Toronto), Jeffrey Knockel (Citizen Lab / Bowdoin College), Rachel Greenstadt (New York University)

Volume: 2025
Issue: 4
Pages: 112–129
DOI: https://doi.org/10.56553/popets-2025-0122


Abstract: When a large language model (LLM) has been trained on text featuring social biases, those biases implicitly impact the outputs of the model. Training an LLM on sanitized content, i.e., content that remains after being subjected to state censorship (including alterations, deletions, and self-imposed censorship), results in what we term censorship bias. A model affected by censorship bias may be less likely to reflect views that are routinely prohibited and more likely to reflect views that are not. This is particularly an issue when interacting with a model in a language predominantly used in a region with strong censorship laws. In this work, we define censorship bias, introduce a novel methodology for identifying and measuring it, and apply that methodology to evaluate the most popular current LLMs. As part of this work, we designed and evaluated CensorshipDetector, a Chinese-language text classification model that we use in our experimental design. Our evaluation found CensorshipDetector to be 91% accurate at differentiating between sanitized and non-sanitized content. Our testing revealed evidence of censorship bias across all of the models we evaluated. Finally, we outline the potential harms of censorship bias, namely the exportation to diaspora communities of information manipulation that previously harmed primarily a domestic audience, and offer recommendations to various stakeholders for limiting the harms of censorship bias and preventing it in the future.
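The 91% figure for CensorshipDetector refers to standard binary-classification accuracy: the fraction of test items whose predicted label (sanitized vs. non-sanitized) matches the true label. A minimal sketch of that metric follows; the labels and predictions here are hypothetical placeholders, not the authors' model or corpus.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    assert len(predictions) == len(labels) and labels
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical evaluation set: 1 = sanitized (survived censorship), 0 = not.
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predictions = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(f"accuracy = {accuracy(predictions, labels):.0%}")  # 8 of 10 correct
```

In the paper's setting, the evaluation set would consist of Chinese-language texts labeled by whether they were subjected to sanitization.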

Keywords: censorship, large language models, bias, artificial intelligence

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.