Losing Less: A Loss for Differentially Private Deep Learning

Authors: Ali Shahin Shamsabadi (The Alan Turing Institute), Nicolas Papernot (Vector Institute, University of Toronto)

Volume: 2023
Issue: 3
Pages: 307–320
DOI: https://doi.org/10.56553/popets-2023-0083


Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is the canonical approach to training deep neural networks with guarantees of Differential Privacy (DP). However, the modifications DP-SGD introduces to vanilla gradient descent negatively impact the accuracy of deep neural networks. In this paper, we are the first to observe that some of this performance can be recovered when training with a loss tailored to DP-SGD; we challenge cross-entropy as the de facto loss for deep learning with DP. Specifically, we introduce a loss combining three terms: the summed squared error, the focal loss, and a regularization penalty. The first term encourages faster convergence. The second term emphasizes hard-to-learn examples in the later stages of training. Both are beneficial because the privacy cost of learning increases with every step of DP-SGD. The third term helps control the sensitivity of learning, decreasing the bias introduced by gradient clipping in DP-SGD. Using our loss function, we achieve new state-of-the-art tradeoffs between privacy and accuracy on MNIST, FashionMNIST, and CIFAR10. Most importantly, we improve the accuracy of DP-SGD on CIFAR10 by 4% for a DP guarantee of 𝜀 = 3.
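To make the structure of the proposed loss concrete, here is a minimal NumPy sketch of a three-term loss of the kind the abstract describes: a summed squared error, a focal term, and a regularization penalty on the model weights. The term weightings (`beta`, `lam`), the focal exponent `gamma`, and the choice of a squared-norm regularizer are illustrative assumptions, not the paper's exact formulation; consult the full article for the precise definitions.

```python
import numpy as np

def focal_term(probs, onehot, gamma=2.0):
    # Focal loss: (1 - p_t)^gamma down-weights easy, well-classified
    # examples so hard-to-learn ones dominate later in training.
    pt = np.where(onehot == 1, probs, 1.0 - probs)
    return -np.sum((1.0 - pt) ** gamma * np.log(pt + 1e-12))

def combined_loss(probs, onehot, weights, beta=1.0, lam=1e-3, gamma=2.0):
    """Illustrative three-term loss: SSE + focal + regularization.

    probs  : (batch, classes) predicted class probabilities
    onehot : (batch, classes) one-hot ground-truth labels
    weights: flat array of model parameters (for the penalty term)
    """
    # Term 1: summed squared error, encouraging fast convergence.
    sse = np.sum((probs - onehot) ** 2)
    # Term 2: focal loss, emphasizing hard examples.
    focal = focal_term(probs, onehot, gamma)
    # Term 3: an assumed squared-norm penalty standing in for the
    # paper's sensitivity-controlling regularizer.
    reg = lam * np.sum(weights ** 2)
    return sse + beta * focal + reg
```

With `beta=0` and `lam=0` the function reduces to the plain summed squared error, which makes it easy to check each term in isolation.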

Keywords: differential privacy, differentially private stochastic gradient descent, loss function

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.