How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?

Authors: Ahmed Roushdy Elkordy (University of Southern California), Jiang Zhang (University of Southern California), Yahya H. Ezzeldin (University of Southern California), Konstantinos Psounis (University of Southern California), Salman Avestimehr (University of Southern California)

Volume: 2023
Issue: 1
Pages: 510–526

Download PDF

Abstract: Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, while data never leaves users’ devices, privacy still cannot be guaranteed since significant computations on users’ training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through different privacy attacks such as model inversion attacks. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL, by guaranteeing the server can only learn the global aggregated model update but not the individual model updates.While SA ensures no additional information is leaked about the individual model update beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer; as information about the individual dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. When using the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage reduces linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, which show a reduction in privacy leakage as the number of users and local batch size grow, and an increase in privacy leakage as the number of training rounds increases. We also observe similar dependencies for the FedAvg and FedProx protocol.

Keywords: Federated Learning, Secure Aggregation, Mutual Information, Formal Privacy Guarantee

Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution 4.0 license.