Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models
Authors: Jake Chanenson (University of Chicago), Madison Pickering (University of Chicago), Noah Apthorpe (Colgate University)
Volume: 2025
Issue: 2
Pages: 280–308
DOI: https://doi.org/10.56553/popets-2025-0062
Abstract: Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground truth GKC-CI annotations from 16 privacy policies. Our best-performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best-performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.
Keywords: privacy, contextual integrity, governing knowledge commons, natural language processing, large language model, text tagging
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.
