Recycling Scraps: Improving Private Learning by Leveraging Checkpoints
Authors: Virat Shejwalkar (Google DeepMind), Arun Ganesh (Google Research), Rajiv Mathews (Google DeepMind), Yarong Mu (Google), Shuang Song (Google DeepMind), Om Thakkar (OpenAI), Abhradeep Thakurta (Google DeepMind), Xinyi Zheng (Google)
Volume: 2025
Issue: 2
Pages: 607–628
DOI: https://doi.org/10.56553/popets-2025-0079
Abstract: Differentially private (DP) training pipelines for modern neural networks are iterative and generate multiple checkpoints. However, all checkpoints except the final one are discarded after training. In this work, we propose novel methods that use intermediate checkpoints to improve prediction accuracy and to estimate the uncertainty in DP predictions. First, we design a general framework that uses aggregates of intermediate checkpoints during training to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in prediction accuracy over the existing state of the art on the StackOverflow, CIFAR10, and CIFAR100 datasets. For instance, we improve the state-of-the-art DP StackOverflow accuracy to 22.74% (+2.06% relative) for ε = 8.2, and to 23.90% (+2.09% relative) for ε = 18.9. Furthermore, these gains are magnified in settings where the training data distribution varies periodically. We also demonstrate that our methods achieve relative improvements of 0.54% in utility and 62.6% in variance on a proprietary, production-grade pCVR (predicted conversion rate) task. Lastly, we initiate an exploration into estimating the uncertainty (variance) that DP noise adds to the predictions of DP ML models. We prove that, under standard assumptions on the loss function, the sample variance from the last few checkpoints provides a good approximation of the variance of the final model of a DP run. Empirically, we show that the last few checkpoints can provide a reasonable lower bound on the variance of a converged DP model. Crucially, all the methods proposed in this paper operate on a single training run of the DP ML technique and thus incur no additional privacy cost.
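The abstract describes two uses of the checkpoints from a single DP run: aggregating them, and using their spread as an estimate of prediction variance. The Python sketch below illustrates both under simplifying assumptions that are not taken from the paper: each checkpoint is a flat list of floats, aggregation is either a uniform average of the last k checkpoints or an exponential moving average (EMA), and `linear_predict` is a hypothetical toy model. The function names (`uniform_average`, `ema`, `checkpoint_prediction_variance`) are illustrative, not the authors' API, and the paper's exact aggregation functions and estimators may differ. Because these functions only post-process checkpoints of one DP training run, they add no privacy cost.

```python
from statistics import variance
from typing import Callable, List, Sequence

# Assumed representation: each checkpoint is a flat list of parameters.
Params = List[float]


def uniform_average(checkpoints: Sequence[Params], k: int) -> Params:
    """Coordinate-wise mean of the parameters of the last k checkpoints."""
    if not 1 <= k <= len(checkpoints):
        raise ValueError("k must be between 1 and len(checkpoints)")
    last_k = checkpoints[-k:]
    return [sum(coords) / k for coords in zip(*last_k)]


def ema(checkpoints: Sequence[Params], decay: float) -> Params:
    """Exponential moving average of checkpoints, taken in training order."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    avg = list(checkpoints[0])
    for params in checkpoints[1:]:
        # The running average keeps weight `decay`; the newer checkpoint gets (1 - decay).
        avg = [decay * a + (1.0 - decay) * p for a, p in zip(avg, params)]
    return avg


def checkpoint_prediction_variance(
    checkpoints: Sequence[Params],
    predict: Callable[[Params, float], float],
    x: float,
    k: int,
) -> float:
    """Sample variance of the predictions at input x over the last k checkpoints."""
    if not 2 <= k <= len(checkpoints):
        raise ValueError("k must be between 2 and len(checkpoints)")
    preds = [predict(params, x) for params in checkpoints[-k:]]
    return variance(preds)  # unbiased estimator: divides by k - 1


def linear_predict(params: Params, x: float) -> float:
    """Hypothetical toy model: intercept + slope * x."""
    return params[0] + params[1] * x


if __name__ == "__main__":
    # Toy checkpoints standing in for the last iterates of one DP run.
    ckpts = [[0.0, 1.0], [1.0, 1.0], [2.0, 4.0]]
    print(uniform_average(ckpts, k=2))  # [1.5, 2.5]
    print(ema(ckpts, decay=0.5))  # [1.25, 2.5]
    print(checkpoint_prediction_variance(ckpts, linear_predict, x=1.0, k=3))  # 7.0
```

A uniform average weights each of the last k checkpoints equally, while EMA down-weights older checkpoints geometrically; the sketch does not settle which is better. Consecutive checkpoints are strongly correlated, which plausibly explains why their sample variance tends to understate run-to-run variance, consistent with the abstract's framing of it as a lower bound.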
Keywords: Differential privacy, Prediction uncertainty, Deep learning
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.
