Honesty is the Best Policy: On the Accuracy of Apple Privacy Labels Compared to Apps' Privacy Policies
Authors: Mir Masood Ali (University of Illinois Chicago), David G. Balash (University of Richmond), Monica Kodwani (The George Washington University), Chris Kanich (University of Illinois Chicago), Adam J. Aviv (The George Washington University)
Volume: 2024
Issue: 4
Pages: 142–166
DOI: https://doi.org/10.56553/popets-2024-0111
Abstract: Apple introduced privacy labels in Dec. 2020 as a way for developers to report the privacy behaviors of their apps. While Apple does not validate labels, they also require developers to provide a privacy policy, which offers an important comparison point. In this paper, we fine-tuned BERT-based language models to extract privacy policy features for 474,669 apps on the iOS App Store, comparing the output to the privacy labels. We identify discrepancies between the policies and the labels, particularly as they relate to data collected linked to users. We find that 228K apps' privacy policies may indicate more data collection linked to users than what is reported in the privacy labels. More alarming, a large number (97%) of the apps with a Data Not Collected privacy label have a privacy policy indicating otherwise. We provide insights into potential sources for discrepancies, including the use of templates and confusion around Apple's definitions and requirements. These results suggest that significant work is still needed to help developers more accurately label their apps. Our system can be incorporated as a first-order check to inform developers when privacy labels are possibly misapplied.
Keywords: privacy, mobile apps, privacy policies, privacy labels, nutrition labels, iOS, Apple, App Store, Natural Language Processing, Language Models, BERT
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.