Estimating Group Means Under Local Differential Privacy

Authors: René Raab (Friedrich-Alexander-Universität Erlangen-Nürnberg), Arijana Bohr (Friedrich-Alexander-Universität Erlangen-Nürnberg), Kai Klede (Friedrich-Alexander-Universität Erlangen-Nürnberg), Benjamin Gmeiner (Novartis Pharma GmbH), Bjoern M. Eskofier (Friedrich-Alexander-Universität Erlangen-Nürnberg and Institute of AI for Health, Helmholtz Zentrum München - German Research Center for Environmental Health)

Volume: 2025
Issue: 4
Pages: 236–274
DOI: https://doi.org/10.56553/popets-2025-0129

Abstract: The European Health Data Space (EHDS) aims to enable the sharing of health data across Europe to improve healthcare and research. While the EHDS mandates anonymization or pseudonymization of shared health data, these techniques may still allow adversaries to re-identify individuals. Local differential privacy (LDP) has been proposed as a formal privacy guarantee that can help mitigate this issue. In this paper, we consider a common problem when analyzing health data: estimating means for different groups. We discuss a generic privacy-preserving method for approximating the means of different groups in a decentralized setting where both the group and the value are considered private. We show that four concrete instantiations of the method based on existing mean estimation methods (Laplace, Bernoulli, Piecewise, and NPRR) are locally differentially private. We evaluate their performance on synthetic and real-world medical datasets. Our results show that the proposed methods can accurately estimate the group means while maintaining privacy. However, similar to other LDP algorithms, our approach requires a sufficient amount of data (in our case, a sufficient number of samples per group) combined with a sufficiently large privacy budget ε to produce accurate results. We discuss concrete practical issues such as choosing an appropriate input range, dealing with large privacy budgets through the use of the shuffle model of differential privacy, and the need for further analysis techniques to make LDP solutions applicable to practical medical data analysis.
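
To make the setting concrete, the following is a minimal illustrative sketch of group mean estimation when both the group and the value are private. It is not the construction from the paper: it assumes a simple budget split in which the group label is perturbed with k-ary randomized response (budget eps_group) and the clipped value with Laplace noise (budget eps_value), so that each report satisfies (eps_group + eps_value)-LDP by sequential composition. The function names, parameters, and the debiasing step are hypothetical choices made for this example.

```python
import math
import random


def randomize(group, value, k, lo, hi, eps_group, eps_value, rng):
    """Client side (assumed scheme): privatize one (group, value) pair."""
    # k-ary randomized response: keep the true group with probability p_keep,
    # otherwise report one of the other k - 1 groups uniformly at random.
    p_keep = math.exp(eps_group) / (math.exp(eps_group) + k - 1)
    if rng.random() >= p_keep:
        other = rng.randrange(k - 1)
        group = other if other < group else other + 1
    # Laplace mechanism on the value clipped to [lo, hi]; sensitivity is hi - lo.
    clipped = min(max(value, lo), hi)
    scale = (hi - lo) / eps_value
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return group, clipped + noise


def estimate_group_means(reports, k, eps_group):
    """Server side: debias counts and sums per reported group, then divide."""
    p = math.exp(eps_group) / (math.exp(eps_group) + k - 1)
    q = 1.0 / (math.exp(eps_group) + k - 1)
    counts = [0] * k
    sums = [0.0] * k
    for g, v in reports:
        counts[g] += 1
        sums[g] += v
    n = len(reports)
    total = sum(sums)
    means = []
    for g in range(k):
        # E[counts[g]] = (p - q) * n_g + q * n, and likewise for the sums.
        n_hat = (counts[g] - q * n) / (p - q)
        s_hat = (sums[g] - q * total) / (p - q)
        means.append(s_hat / n_hat if n_hat > 0 else float("nan"))
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: two groups with true means 0.3 and 0.7, values in [0, 1].
    data = [(0, 0.3)] * 20000 + [(1, 0.7)] * 20000
    reports = [randomize(g, v, 2, 0.0, 1.0, 2.0, 2.0, rng) for g, v in data]
    print(estimate_group_means(reports, 2, 2.0))
```

As the abstract notes for LDP algorithms in general, the estimates in this sketch only become accurate with enough samples per group and a sufficiently large budget; with few reports the debiased count can be close to zero or negative, in which case the sketch returns NaN for that group.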

Keywords: local differential privacy, data analysis, group means, decentralized data, mean estimation

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.