DPrio: Efficient Differential Privacy with High Utility for Prio

Private data collection systems such as Prio ensure data privacy by distributing trust among a set of servers, allowing aggregate data collection without disclosing any single client's data in the clear. While systems like Prio are attracting widespread interest and adoption, they lack efficient mechanisms to provide differential privacy guarantees. In this work, we present a lightweight method that we call DPrio to augment Prio and related systems with differential privacy assurances while ensuring higher data utility than existing noise generation protocols. We compare our results against four related constructions in the literature, and identify how DPrio achieves improved data utility relative to the assumed number of dishonest clients and servers, with only minimal (and batchable) server communication overhead. We present several case studies and discuss considerations for real-world implementations.


INTRODUCTION
Prio [17] is a system that enables a set of servers to collect aggregate statistics such that no individual server learns anything about the clients' data, except what it can infer from the aggregate statistic. Prio ensures user privacy even if all but one of the Prio servers are corrupted, and in the case of servers corrupting data, simply requires the protocol to abort (i.e., the system is not robust against misbehaving servers). Unlike prior related schemes [23], Prio does not assume that clients act honestly, and so leverages secret-shared non-interactive proofs (SNIPs) to provide robustness against malicious clients. In part due to its efficiency and pragmatic threat model, Prio has already been deployed in practice at scale, beginning with Mozilla [24] and most recently by the organization that supports Let's Encrypt [2,32].
Unfortunately, it is well known that reconstruction attacks can be used to recover individuals' data from aggregate statistics [28]. One method of protecting individual clients' data is with differential privacy (DP) [20], which allows for formal guarantees about the privacy of any user's data within a dataset. By adding noise to aggregate statistics, we can obtain a guaranteed bound on the amount of information disclosed about individual user records in the aggregated query output.
By default, Prio does not ensure differential privacy. The authors recommend a prior design by Dwork et al. [19], which describes a multi-party computation protocol among a set of servers to generate noise. However, in a distributed setting where network calls can fail and servers carry heavy load, such MPC operations are undesirable. Further, this mechanism requires server computation and inter-server communication that scale linearly with the number of clients, as the operations performed by each client in the "full participation" model must be partitioned among the smaller set of servers. On the other hand, having clients add noise locally by following a local DP mechanism [27] yields undesirable data utility tradeoffs. The local DP method also requires that clients honestly add noise to their data, and often produces data that supports only a limited range of analyses. As such, an ideal DP construction for a practical Prio deployment is one that achieves high utility as in the central DP model (where noise is calculated and added directly by the trusted data curator), with efficiency akin to local DP, while remaining within the original threat model of Prio.
In this work, we present a lightweight method that we call DPrio to add differential privacy to Prio and related systems while ensuring higher data utility and better efficiency than these prior noise generation designs. In DPrio, all clients submit secret-shared noise and servers perform a minimal MPC protocol to select which client's noise is added to the aggregate total. Happily, data utility in DPrio is nearly identical to that achieved by central DP, while the additional client and server computation and communication needed for DP remain constant regardless of the number of participating clients or servers. DPrio is a non-interactive protocol among clients, requires minimal interaction among servers, and is differentially private against a small proportion of adversarial clients when at least one server is honest but curious. Notably, DPrio maintains the existing threat model of Prio, in that only one server needs to be honest.
We also compare our results against two noise generation protocols that we call Client-DP and Server-DP, modeled after existing notions in the literature [23]. In Client-DP, each client submits a small amount of noise, and these contributions sum to a Gaussian value. In Server-DP, servers directly add Gaussian noise to their aggregated sums. DPrio achieves improved data utility over Client-DP and Server-DP with respect to the assumed number of dishonest parties. In summary, we present the following contributions:
• A careful analysis of DP schemes in the literature and their efficiency when used with Prio.
• A lightweight DP mechanism that builds on top of Prio, which we call DPrio. DPrio achieves nearly identical data utility to central DP, while its efficiency remains constant relative to the number of clients and servers. DPrio does not require client interaction, and requires only minimal server interaction, which can be aggregated within one batched operation.
• An evaluation of DPrio relative to existing constructions in the literature as well as Client-DP and Server-DP, showing that DPrio achieves higher data utility than prior constructions.

PRELIMINARIES
This section defines the basic notions of privacy and security. We denote the number of Prio servers as $s$ and the number of clients as $n$. We use $\mathcal{D}$ to denote the domain for the collection of records from the clients.

Differential Privacy
Differential privacy (DP), introduced by Dwork et al. [20], is a privacy notion that enables the calculation of aggregate statistics on users' data in a privacy-preserving manner. Definition 2.1 (Differential Privacy). A randomized algorithm $M: \mathcal{D} \to \mathcal{Y}$ is $(\varepsilon, \delta)$-differentially private (DP) if for any pair of neighbouring datasets $X, X' \in \mathcal{D}$ that differ by a single record, and for any $S \subseteq \mathcal{Y}$, we have
$$\Pr[M(X) \in S] \le e^{\varepsilon} \Pr[M(X') \in S] + \delta.$$
If $\delta \neq 0$, then we say the mechanism provides approximate differential privacy. Otherwise, when $\delta = 0$, it satisfies pure differential privacy. Intuitively, this definition ensures that changing a single record changes the probability of any output event by at most a factor of $e^{\varepsilon}$, up to an additive slack of $\delta$.
A common method of achieving differential privacy for numerical statistics is the Laplace mechanism [19]. One calculates the sensitivity $\Delta$ of the function $f$ to be computed, which is the largest change in the function output (measured in the $\ell_1$ norm) caused by changing a single record, $\Delta = \max_{X, X'} \lVert f(X) - f(X') \rVert_1$, and then adds noise sampled from a Laplace distribution with scale $\Delta/\varepsilon$. Theorem 2.1 (Laplace Mechanism [21]). Given a function $f: \mathcal{D} \to \mathbb{R}^k$ and a data set $X \in \mathcal{D}$, the Laplace Mechanism is defined as $M_L(X, f(\cdot), \varepsilon) = f(X) + (Y_1, \ldots, Y_k)$, where the $Y_i$ are i.i.d. random variables drawn from $\mathrm{Lap}(\Delta/\varepsilon)$. This mechanism achieves $\varepsilon$-DP.
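As an illustration of Theorem 2.1, the following minimal sketch adds i.i.d. $\mathrm{Lap}(\Delta/\varepsilon)$ noise to each coordinate of a query answer. It assumes Python's standard library only and draws a Laplace sample as the difference of two independent exponential samples with mean $\Delta/\varepsilon$. The helper names `laplace_noise` and `laplace_mechanism` are hypothetical. Floating-point sampling with the non-cryptographic `random` module is known to be unsafe for real deployments, so this is for intuition only. The discrete noise generation discussed in Section 6.1.1 is the relevant approach in practice.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Hypothetical helper: release f(X) + (Y_1, ..., Y_k), Y_i ~ Lap(sensitivity / epsilon)."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in true_answer]

# A count query (l1 sensitivity 1) released with epsilon = 0.5.
noisy_count = laplace_mechanism([1200.0], sensitivity=1.0, epsilon=0.5, rng=random.Random(7))
```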
Another additive method of achieving differential privacy is to add Gaussian noise sampled from the normal distribution $\mathcal{N}(0, \sigma^2)$. We call this the Gaussian mechanism. In this mechanism, the sensitivity $\Delta$ uses the $\ell_2$ norm to measure the maximum change in the query output when changing a record. Theorem 2.2 (Gaussian Mechanism [21]). Let $\varepsilon \in (0, 1)$ be arbitrary. For $c^2 > 2\ln(1.25/\delta)$, the Gaussian Mechanism with parameter $\sigma \ge c\Delta/\varepsilon$ is $(\varepsilon, \delta)$-differentially private.
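The calibration in Theorem 2.2 can be computed directly. The sketch below is a hypothetical helper that takes $c = \sqrt{2\ln(1.25/\delta)}$, the boundary of the condition $c^2 > 2\ln(1.25/\delta)$. This matches the expression $\sigma^2 = 2\ln(1.25/\delta)\Delta^2/\varepsilon^2$ used in the error formulas of Section 5. A strict implementation would pick $c$ marginally larger.

```python
import math

def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    """Noise scale sigma = c * Delta / epsilon with c = sqrt(2 ln(1.25/delta)).

    Theorem 2.2 as stated requires 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("Theorem 2.2 requires epsilon in (0, 1)")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    c = math.sqrt(2 * math.log(1.25 / delta))
    return c * l2_sensitivity / epsilon

sigma = gaussian_sigma(epsilon=0.5, delta=1e-6, l2_sensitivity=1.0)  # about 10.6
```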
It is possible to formulate the Gaussian Mechanism so that there is no restriction on $\varepsilon$ [14]. Note that the sum of independent normally distributed random variables is also normal. In particular, if $Z_1 \sim \mathcal{N}(0, \sigma_1^2)$ and $Z_2 \sim \mathcal{N}(0, \sigma_2^2)$ are independent, then $Z_1 + Z_2 \sim \mathcal{N}(0, \sigma_1^2 + \sigma_2^2)$. That is, adding independent Gaussians yields a Gaussian whose variance is the sum of their variances.
One convenient property of DP is its immunity to post-processing. Intuitively, computing a function on the output of a DP algorithm does not weaken the privacy guarantee on the sensitive input data. Theorem 2.3 (Post-Processing [21]). Let $M: \mathcal{D} \to \mathcal{Y}$ be an $(\varepsilon, \delta)$-differentially private algorithm, and let $f: \mathcal{Y} \to \mathcal{Z}$ be an arbitrary randomized mapping. Then $f \circ M: \mathcal{D} \to \mathcal{Z}$ satisfies $(\varepsilon, \delta)$-DP.
For the accuracy of DP mechanisms, we use the mean squared error to measure the amount of noise added by each mechanism. Definition 2.2. For a randomized mechanism $M$ computing a (noisy) query $f$ over a set of data $X \in \mathcal{D}$, we define the error to be
$$\mathrm{Error}(M) = \mathbb{E}\big[(f(X) - M(X))^2\big],$$
where $f(X)$ is the true value of the query and $M(X)$ is the noised value.
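Definition 2.2 can be estimated empirically by Monte Carlo simulation. In this sketch, a hypothetical scalar Gaussian mechanism with $\sigma = 3$ is used. Its error should be close to $\sigma^2 = 9$, since the error of an additive zero-mean mechanism equals the noise variance.

```python
import random

def empirical_mse(true_value, mechanism, trials=20000, seed=0):
    """Monte Carlo estimate of E[(f(X) - M(X))^2] for a scalar query."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += (true_value - mechanism(true_value, rng)) ** 2
    return total / trials

def gaussian_mechanism(value, rng, sigma=3.0):
    # Additive zero-mean Gaussian noise: its expected squared error is sigma^2.
    return value + rng.gauss(0.0, sigma)

mse = empirical_mse(100.0, gaussian_mechanism)  # close to 9
```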
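This work deals with the setting where parties are computationally bounded, which requires the notion of computational differential privacy, or SIM-CDP [41]. Definition 2.3 (SIM-CDP [41]). An ensemble $\{f_\kappa\}_{\kappa \in \mathbb{N}}$ of randomized functions $f_\kappa: \mathcal{D} \to \mathcal{R}_\kappa$ provides $\varepsilon_\kappa$-SIM-CDP if there exists an ensemble $\{F_\kappa\}_{\kappa \in \mathbb{N}}$ of $\varepsilon_\kappa$-differentially private mechanisms $F_\kappa: \mathcal{D} \to \mathcal{R}_\kappa$ and a negligible function $\mathrm{negl}(\cdot)$ such that, for every non-uniform PPT Turing machine $A$, every polynomial $p(\cdot)$, every sufficiently large $\kappa \in \mathbb{N}$, every data set $X \in \mathcal{D}$ of size at most $p(\kappa)$, and every advice string $z_\kappa$ of size at most $p(\kappa)$, it holds that
$$\big|\Pr[A_\kappa(f_\kappa(X)) = 1] - \Pr[A_\kappa(F_\kappa(X)) = 1]\big| \le \mathrm{negl}(\kappa),$$
where $A_\kappa$ denotes $A$ run with advice $z_\kappa$. That is, $f_\kappa(X)$ and $F_\kappa(X)$ are computationally indistinguishable.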

Secure Multi-party Computation
Secure multi-party computation (MPC) [7,38] allows a set of participants, each holding private data, to jointly compute a function over their data without revealing to one another any information except for the output. A common method of multi-party computation relies on secret sharing schemes [10,18,48], in which an individual's data is split into shares and divided amongst the participants performing the secure computation. After executing the protocol, participants can reconstruct the output of the function by applying the corresponding reconstruction protocol of the secret sharing scheme. Prio uses such protocols to compute the desired statistics while ensuring privacy of each client's data.
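To make the secret sharing idea concrete, here is a minimal sketch of additive secret sharing over a prime field. The modulus $2^{61} - 1$ is an illustrative assumption, not Prio's actual field. Any $s - 1$ shares are uniformly random and reveal nothing about the value, while all $s$ shares sum to it. Because sharing is linear, servers can add shares of different values locally.

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus (assumption, not Prio's field)

def share(value: int, num_servers: int) -> list[int]:
    """Split `value` into additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(num_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

parts = share(42, num_servers=3)
```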
Prio uses affine-aggregatable encodings (AFEs) to efficiently encode data such that it is possible to compute the value of a function given the sum of the encodings. There exist AFEs for sums [23,36], standard deviations [45], counts [13,40], and least-squares regressions [35] and Prio adapts these AFEs to compute private statistics.
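A simple example of the AFE idea is encoding each value $x$ as $(x, x^2)$. The servers only need the coordinate-wise sum of encodings, from which the analyst decodes the mean and (population) variance. The sketch below works over plain integers and omits secret sharing and the field arithmetic Prio would use. The function names are hypothetical.

```python
from statistics import pvariance

def encode(x: int) -> tuple[int, int]:
    # Affine-aggregatable encoding for mean/variance: keep x and x^2.
    return (x, x * x)

def aggregate(encodings):
    # Aggregation is a coordinate-wise sum, which shares support directly.
    return tuple(map(sum, zip(*encodings)))

def decode(total, n):
    s1, s2 = total
    mean = s1 / n
    variance = s2 / n - mean ** 2  # population variance
    return mean, variance

data = [3, 5, 7, 9]
mean, var = decode(aggregate(encode(x) for x in data), len(data))  # (6.0, 5.0)
```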
Prio pairs its AFEs with secret-shared non-interactive proofs (SNIPs) [17] to verify the clients' inputs. A SNIP involves a client (prover) and multiple servers (verifiers): the client proves to the servers that Verify($x$) → 1 without revealing any additional information about $x$. The requirements of this system are analogous to the properties of a non-interactive zero-knowledge proof system [30]. However, Prio is designed for a setting where the role of the verifier is partitioned among several mutually untrusted parties. In particular, SNIPs ensure two properties. The first is validity (i.e., the information is encoded correctly), which holds when all servers are honest. The second is zero-knowledge (i.e., the servers learn nothing beyond whether the SNIP is valid), which holds as long as at least one server is honest.
In Prio, the server-to-server communication cost is independent of the complexity of the Verify circuit and the size of the value, and each server's computation cost is essentially the cost of evaluating Verify locally. The client-to-server communication grows linearly with the size of the Verify circuit.

RELATED WORK
We present related work in central and local differential privacy. We then discuss Dwork-MPC, a multi-party protocol to generate noise, and past work on assumptions of non-collusion.
Central Differential Privacy. In the central DP model, the data curator is assumed to be trusted and can view all client records. The data curator then chooses the appropriate amount of noise and adds it directly to the computed statistics. The US Census Bureau used central DP to protect sensitive information in the 2020 census [1]. Programming frameworks such as PINQ [39] and Ektelo [51] are popular for leveraging central DP. This model is able to guarantee high data utility, a goal we similarly maintain for our constructions. However, it is unsuitable when no such trusted data curator exists.
Local Differential Privacy. In local differential privacy, each client adds DP noise to its own data before sending it to the central server. This ensures that each client's data is differentially private with respect to the central server and the rest of the clients. Apple [49] and Google [27] use this model to gather analytics from millions of users. The benefit of local DP is that no client has to trust any other party. The downside is that the total noise grows with the number of clients: for a sum over $n$ clients, the variance of the aggregated noise grows linearly in $n$. Local DP is therefore useful mainly when there is a large amount of data and the patterns of interest remain evident despite the noise.
Assumptions of Non-Collusion. To bridge the gap between the low data utility induced by local DP and the high trust assumptions of central DP, assumptions of non-collusion have been introduced as a "best of both worlds" option. For example, introducing an intermediate shuffler between clients and the aggregator can provide anonymity and increase utility [8,26]. However, methods that rely on another party to perform shuffling still require clients to generate noise locally, and thus cannot achieve utility equivalent to the central model [5].
A different approach involves secure computation [22] or encryption of noise to a separate trusted party [47]. However, these models also introduce undesirably high trust assumptions in single entities, whereas the Prio model distributes trust equally among the servers.
MPC Protocols for DP. While Prio does not itself provide DP, its authors reference an MPC protocol defined by Dwork et al. [19] that we refer to as Dwork-MPC. In Dwork-MPC, servers generate noise through an MPC protocol, achieving $(\varepsilon, \delta)$-DP. In its base form, Dwork-MPC assumes that at least 2/3 of the servers are honest. Eriguchi et al. [25] improve the communication complexity or success probability of the algorithms in Dwork-MPC; however, they assume that all servers are honest but curious.
Other work focuses on implementing specific DP algorithms in an MPC setting. For example, there are mechanisms for computing a differentially private median [11,12], sampling biased coins [16], and answering graph queries [46]. Our work differs from these approaches, which design protocols for particular settings, since we focus on a generalizable, robust, and scalable framework that computes statistics accurately with DP guarantees.

SYSTEM GOALS
The goal of this work is to provide a lightweight mechanism, built on top of an existing Prio architecture, that efficiently ensures differential privacy (DP) while preserving high utility. In other words, we aim for the best of both worlds: a data collection mechanism with DP guarantees that neither resorts to heavyweight MPC protocols nor outputs noisy data with low utility.

Overview of Prio
We now describe Prio. This system will serve as the base on which we build a differentially private solution. The system is executed in the following steps, illustrated by Fig 1. While Prio does not impose hard constraints on the number of clients and servers, it assumes a small number of servers relative to clients. We provide a range of case studies and their impact on performance in Section 8.
(1) Upload. Each client encodes its data in a prescribed manner using an affine-aggregatable encoding (AFE) and splits the encoded value into one share per server. The client then constructs a SNIP to prove to the servers that the encoding satisfies certain correctness properties, and forwards the shares of the proof and the encoded data to the corresponding servers.
(2) Verify and Aggregate. Upon receiving data from clients, the servers verify the SNIP to ensure that the encoding is well-formed. If the data is well-formed, the servers locally aggregate their shares.
(3) Publish. Once enough verified data has been received (e.g., if the protocol requires one million users' data) and locally aggregated, the servers reveal their local aggregates to an analyst, who combines them to obtain the final statistic.
The final statistic does not satisfy DP; however, the system has several desirable properties. In particular, as long as one server is honest, the Prio servers learn nothing about the clients' data except what they can learn from the output. The system is robust if all servers are honest, and it remains correct in the presence of faulty or malicious clients due to the SNIP proofs. For reference, we provide additional information on AFEs and SNIPs in Appendix A.
We will elaborate on these assumptions and these properties for Prio next, and add new assumptions for achieving DP.
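The Upload, Aggregate, and Publish flow can be summarized in a short sketch for a simple sum query. It assumes each client's value is already AFE-encoded as a single field element. It also omits SNIP generation and verification entirely, and uses an illustrative modulus rather than Prio's field. The names `upload` and `run_prio` are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus, not Prio's actual field

def upload(value: int, s: int) -> list[int]:
    """Client step: additively share an (already encoded) value among s servers.
    SNIP generation and verification are omitted from this sketch."""
    shares = [secrets.randbelow(PRIME) for _ in range(s - 1)]
    return shares + [(value - sum(shares)) % PRIME]

def run_prio(client_values: list[int], s: int) -> int:
    accumulators = [0] * s  # one local accumulator per server
    for v in client_values:
        for j, share in enumerate(upload(v, s)):
            accumulators[j] = (accumulators[j] + share) % PRIME  # Aggregate
    return sum(accumulators) % PRIME  # Publish: analyst combines the s local sums

total = run_prio([1, 0, 1, 1, 0], s=3)  # 3
```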

Threat Model
We follow a threat model similar to that of Prio, but additionally require an assumption on the proportion of honest clients in the system as a whole. We now describe each party that participates in the protocol and its capabilities.
Data Analyst. The data analyst wishes to learn some aggregate statistic about a user base. We consider the data analyst to be honest but curious: in other words, they are expected to follow the protocol of simply receiving aggregate statistics.
Clients. In Prio, clients submit zero-knowledge proofs that their submission is acceptable for the query being performed, to ensure robustness against misbehaving clients. In all of our schemes, we make an assumption on the proportion of honest clients and adjust our construction accordingly. While this assumption may be unacceptable for small-scale deployments, it is pragmatic for deployments of Prio in applications such as modern web browsers.
Servers. Prio provides privacy for clients' data in the setting where at least one server is honest but curious about the sensitive data. However, Prio is not robust against misbehaving servers. In other words, servers are trusted to follow the protocol; if they do not, they can corrupt the data that the data analyst receives without detection. Our constructions maintain the same robustness and privacy assumptions. In addition, we assume that the servers do not collude with the client whose randomness is chosen. We also show in Appendix C the optimal protocol when this assumption is removed. While Assumption 4.2 may be perceived as overly strong, this perception does not necessarily extend to real-world deployments. In the setting that DPrio targets, with a large number of honest clients and a small number of servers, this assumption becomes quite practical. Major browsers such as Firefox and Chrome, which use or plan to use Prio for data collection, have millions if not billions of users. While some malicious clients might exist and collude with a server, the likelihood that their noise is chosen is small. For instance, if a single client's noise is selected uniformly at random and $m$ of the $n$ clients are malicious, a malicious client's noise is chosen with probability $m/n$.

Limitations on Query Type
Prio uses affine-aggregatable encodings, which enable the servers to compute complex statistics by simply computing the sum of the encodings. To maintain the simplicity of the schemes, we only consider additive noise mechanisms, the most common of which are the Laplace and Gaussian mechanisms (Section 2). As a result, our mechanisms support only queries whose results remain interpretable after adding noise. Our constructions mainly apply to queries such as sums, means, and counts. We do not consider more complex statistics, such as those operating on categorical data. We note that the existing Prio system also does not support some complex statistics, including the median. Determining how to integrate DP for more complex statistics is an interesting area of future work.

ADAPTING PRIOR WORK TO PRIO
We now explain how prior work can be adapted to the Prio system and the limitations of these solutions.

Dwork-MPC
In the setting of Prio, where clients are assumed to interact only with servers, differential privacy (DP) can be achieved using a variant of Dwork-MPC [19]. In this variant, each server generates noise from a Gaussian with mean zero and variance $3\sigma^2/(2s)$, where $\sigma^2$ takes the lower bound from Theorem 2.2. This approach achieves $(\varepsilon, \delta)$-DP when at least 2/3 of the servers are honest, since the honest servers' noise alone then has variance at least $\sigma^2$. Note that there is an asymmetry between the number of honest servers assumed by Prio's threat model and by Dwork-MPC: Prio requires at least one honest server, whereas Dwork-MPC assumes that at least 2/3 of the servers are honest. While it is possible to generate noise in a dishonest-majority setting, doing so requires many more rounds of interaction, as demonstrated in prior literature by Beaver and Goldwasser [6,29]. We therefore exclude such protocols from our analysis, as the goal of this work is to provide a mechanism that is not much more costly than plain Prio.
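The per-server variance in this Dwork-MPC variant follows from simple arithmetic. If each of the $s$ servers adds $\mathcal{N}(0, 3\sigma^2/(2s))$, any $2s/3$ honest servers jointly contribute exactly $\sigma^2$. When every server is honest, the analyst sees variance $3\sigma^2/2$. The parameter values below are illustrative assumptions.

```python
import math

def dwork_mpc_server_variance(sigma2: float, s: int) -> float:
    # Each server adds N(0, 3 sigma^2 / (2s)); any 2s/3 honest servers then
    # jointly contribute variance (2s/3) * 3 sigma^2 / (2s) = sigma^2.
    return 3 * sigma2 / (2 * s)

epsilon, delta, sensitivity, s = 0.5, 1e-6, 1.0, 9
sigma2 = 2 * math.log(1.25 / delta) * (sensitivity / epsilon) ** 2  # Theorem 2.2 bound
per_server = dwork_mpc_server_variance(sigma2, s)
honest_servers = math.ceil(2 * s / 3)
# If every server follows the protocol, the analyst's error is 1.5 * sigma2.
total_if_all_honest = s * per_server
```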
Dwork-MPC also proposes a variant, which we call Dwork-MPC*, that prevents a Byzantine server from adding a significantly large amount of noise to its share. This approach requires the servers to cooperatively generate shares of many random bits, which can be transformed into shares of noise drawn from a binomial distribution (an approximation of Gaussian noise). The number of high-quality random bits needed for $(\varepsilon, \delta)$-DP is at least $k = 64\log(2/\delta)/\varepsilon^2$ [19, §2.1]. All servers must verify that the shared values lie in the specified set $\{0, 1\}$. As these bits can be chosen adversarially by a server, the shares of these bits must be verified and then combined with high-quality public bits using Verifiable Secret Sharing (VSS) [44] and a deterministic extractor. The main computation costs of the protocol are the multiplications for verifying the shares' membership and the execution of VSS, and are hence proportional to the number of coins.
Prio requires the setting where each server "represents" a portion of the set of clients (roughly $n/s$). Consequently, in Dwork-MPC*, each server must take on the work of $n/s$ clients. It is therefore responsible for contributing at least $n/s \cdot k/n = k/s$ random bits and for handling at least $k/s \cdot s = k$ shares and their corresponding verification. For example, in a system with three servers and $(\varepsilon = 0.01, \delta = 10^{-6})$, each server would need to generate, secret share, and verify roughly $1.3 \cdot 10^6$ random bits. This adds significant computation and communication overhead to the servers in Prio.
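The per-server bit count can be checked with the short sketch below, which computes $k/s$ for $k = 64\log(2/\delta)/\varepsilon^2$. The base of the logarithm is not stated in this section, so it is a parameter here. Reading it as base 10 gives about $1.34 \cdot 10^6$ bits per server, matching the figure quoted in the text. The natural logarithm gives about $3.1 \cdot 10^6$. Either way, the count is in the millions.

```python
import math

def coins_needed(epsilon: float, delta: float, log=math.log) -> float:
    # k = 64 log(2/delta) / epsilon^2 random bits [19, §2.1]; log base is a parameter.
    return 64 * log(2 / delta) / epsilon ** 2

eps, delta, s = 0.01, 1e-6, 3
per_server_log10 = coins_needed(eps, delta, log=math.log10) / s  # about 1.34e6
per_server_ln = coins_needed(eps, delta) / s                     # about 3.1e6
```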

Distributed Noise Generation
To align with the assumptions in Prio, we now describe two non-interactive techniques that allow either only clients (Client-DP) or only servers (Server-DP) to generate noise. Both use the Gaussian Mechanism, in which noise sampled from the normal distribution $\mathcal{N}(0, \sigma^2)$ is added to a value. These mechanisms are modeled after prior work in the literature [23,31] and can be applied to the Prio system in order to add differential privacy.
5.2.1 Server-DP: Server Noise Generation. We now describe a mechanism, which we call Server-DP, in which only the servers generate noise [31]. Server-DP uses the Gaussian mechanism and properties of independent normally distributed random variables. Let $s$ denote the number of servers, and suppose that the sensitivity $\Delta$ and privacy parameters $(\varepsilon, \delta)$ have been established beforehand. The servers add noise as follows.
(1) Each server samples noise from a normal distribution $\mathcal{N}(0, \sigma^2)$, with $\sigma$ chosen according to Theorem 2.2.
(2) When locally summing the shares of data, each server adds the noise it previously sampled before revealing its final sum.
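The two Server-DP steps can be sketched as follows. This simplified sketch works over the reals and treats each server's local sum as a plain number. It ignores the field encoding of shares and the discretization a real deployment would need. The function name `server_dp_release` is hypothetical.

```python
import math
import random

def server_dp_release(local_sums, epsilon, delta, sensitivity, rng):
    """Server-DP sketch: each server adds its own N(0, sigma^2) to its local sum."""
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon  # Theorem 2.2
    # Steps (1)-(2): every server perturbs the sum it is about to reveal.
    revealed = [local + rng.gauss(0.0, sigma) for local in local_sums]
    # The analyst combines the s revealed sums; total noise variance is s * sigma^2.
    return sum(revealed)

rng = random.Random(1)
release = server_dp_release([10.0, -4.0, 6.0], epsilon=0.5, delta=1e-6, sensitivity=1.0, rng=rng)
```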
Privacy. Prio ensures privacy if at least one server is honest. For Server-DP, $(\varepsilon, \delta)$-DP is likewise achieved if at least one server is honest, by Theorem 2.2: the honest server's noise alone suffices, and the other servers' noise only adds to it (Theorem 2.3). This relaxes Dwork-MPC's assumption on the number of honest servers. Even when the adversary (e.g., the data analyst) controls $s - 1$ servers and $n - 1$ clients, the input of the honest client is protected by the $(\varepsilon, \delta)$-DP guarantee.
Utility. Suppose that the output of the mechanism is given to a data analyst who does not collude with any servers. Server-DP uses the Gaussian mechanism, which has error $\sigma^2$ when noise is sampled from $\mathcal{N}(0, \sigma^2)$. Since each of the $s$ servers adds independent noise of variance $\sigma^2$, the expected error of the result received by the analyst is
$$\mathrm{Error}_{\text{Server-DP}} = s \cdot \frac{2\ln(1.25/\delta)\,\Delta^2}{\varepsilon^2}.$$
Robustness. Server-DP is robust against adversarial clients when at least one server is honest, and so fits within Prio's threat model. Additionally, Server-DP's robustness includes the correctness, soundness, and zero-knowledge properties assured by the use of SNIPs. Malicious behavior by clients is limited to sending incorrect input, which the SNIPs allow the servers to detect.
Extensions. Different amounts of noise can be added under different assumptions on the number of colluding servers. Suppose that all servers are honest, do not collude with one another, and do not leak the noise that they included in the output. Then the noise added to the output is normally distributed with variance $s\sigma^2$, where $s$ is the number of servers. This ensures $(\varepsilon/\sqrt{s}, \delta)$-DP, but at a cost in utility, so it works well when $s$ is small. For example, Cryptε uses two servers that add noise in this way [47]. In short, Server-DP introduces noise that scales linearly with the number of honest servers above the assumed threshold.

Client-DP: Client Noise Generation.
We now describe a mechanism, which we call Client-DP, in which the clients add noise [23,31]. In this mechanism, each client generates noise and adds it to its own input, and the noised input is then included directly in the aggregate. We describe the protocol in the most general case, where we assume that $m$ clients are malicious ($m$ could be 0). All clients perform the following steps:
(1) Sample Gaussian noise from the normal distribution $\mathcal{N}\big(0, 2\ln(1.25/\delta)(\Delta/\varepsilon)^2/(n - m)\big)$.
(2) Encode the Gaussian noise according to the same AFE as the data, so that Prio servers can aggregate it directly when performing share aggregation.
Suppose there are $n$ clients, the sensitivity is $\Delta$, and the privacy parameters $\varepsilon, \delta$ have been established. We want each client to generate noise from a normal distribution $\mathcal{N}(0, \sigma_c^2)$. There are several options for choosing $\sigma_c^2$. First, we could let $\sigma_c^2 = 2\ln(1.25/\delta)(\Delta/\varepsilon)^2/n$. This is straightforward and ensures $(\varepsilon, \delta)$-DP if all clients honestly sample noise. Alternatively, we can assume that some proportion of clients are malicious, say $m$ out of $n$. Then, having clients sample with $\sigma_c^2 = 2\ln(1.25/\delta)(\Delta/\varepsilon)^2/(n - m)$ ensures that the noise provided by the $n - m$ honest clients alone, whose total variance is $(n - m)\sigma_c^2$, still provides $(\varepsilon, \delta)$-DP. Any additional noise added by malicious clients can be treated as post-processing (Theorem 2.3), and consequently does not affect the privacy guarantee. This choice depends on the context of the system as well as the number of clients. With a large number of clients, even in the first case, a single client who incorrectly samples noise will not greatly affect the output or the privacy guarantees. The value of $m$ is provided to the client as part of the noise parameters, together with $\varepsilon$, $\delta$, and $\Delta$. As before, clients must submit proofs that the affine-aggregatable encoding (AFE) of the noise is correctly formed.
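The per-client calibration above can be sketched as follows. Each client's variance is chosen so that the $n - m$ honest clients alone reach $\sigma^2$. This relies on the fact that variances of independent Gaussians add. The sketch works over the reals, omits encoding and SNIPs, and uses hypothetical function names and parameter values.

```python
import math
import random

def client_noise_variance(epsilon, delta, sensitivity, n, m=0):
    """Per-client variance sigma_c^2 = 2 ln(1.25/delta) (Delta/epsilon)^2 / (n - m)."""
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    sigma2 = 2 * math.log(1.25 / delta) * (sensitivity / epsilon) ** 2
    return sigma2 / (n - m)

def noised_input(x, var_c, rng):
    # Each client perturbs its own input before encoding and secret sharing it.
    return x + rng.gauss(0.0, math.sqrt(var_c))

n, m = 1000, 100
var_c = client_noise_variance(0.5, 1e-6, 1.0, n, m)
submission = noised_input(1.0, var_c, random.Random(2))
# The n - m honest clients alone contribute (n - m) * var_c = sigma^2 of noise;
# if all n clients add noise, the total is n * var_c = n / (n - m) * sigma^2.
```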
Privacy. Client-DP ensures differential privacy by collecting Gaussian noise from every client. This Gaussian noise is summed, and the aggregate noise ensures DP. Malicious clients can be accounted for by estimating how many exist in the system and increasing the amount of noise that honest clients submit to compensate. This construction guarantees $(\varepsilon, \delta)$-DP.
Utility. Suppose that a data analyst, not colluding with any servers, receives the output of the mechanism. Client-DP also uses the Gaussian mechanism, which has error $\sigma^2$ when noise is sampled from $\mathcal{N}(0, \sigma^2)$. Assume that $m$ clients are compromised by an honest-but-curious adversary (e.g., the data analyst) and all $n$ clients submit noise values. Then the expected error of the result received by the analyst is
$$\mathrm{Error}_{\text{Client-DP}} = \frac{n}{n - m} \cdot \frac{2\ln(1.25/\delta)\,\Delta^2}{\varepsilon^2}.$$
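The two error formulas can be compared numerically against a single central Gaussian draw with variance $\sigma^2$, the central DP baseline. The parameter values below ($s = 3$, $n = 10^6$, $m = 10^5$) are illustrative assumptions. With them, Server-DP has three times the central error, while Client-DP has $n/(n - m) = 10/9$ of it.

```python
import math

def base_sigma2(epsilon, delta, sensitivity):
    # sigma^2 from Theorem 2.2, i.e. the error of one central Gaussian draw.
    return 2 * math.log(1.25 / delta) * sensitivity ** 2 / epsilon ** 2

def error_server_dp(epsilon, delta, sensitivity, s):
    return s * base_sigma2(epsilon, delta, sensitivity)

def error_client_dp(epsilon, delta, sensitivity, n, m):
    return n / (n - m) * base_sigma2(epsilon, delta, sensitivity)

central = base_sigma2(0.5, 1e-6, 1.0)
server = error_server_dp(0.5, 1e-6, 1.0, s=3)
client = error_client_dp(0.5, 1e-6, 1.0, n=10**6, m=10**5)
```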
Robustness. Client-DP's robustness against client misbehavior can be ensured by proving the correctness of client inputs via SNIPs. However, servers will not be able to verify that the noise was correctly sampled from the Gaussian distribution.
Extensions. We may consider extending this approach by applying zero-knowledge proofs as a way to verify that a program to add noise was executed correctly, as is done in VerDP [43]. Unfortunately, this approach is highly inefficient and would introduce unreasonable overhead, even if it were able to be integrated into the structure of the SNIPs used by Prio.

DPRIO: DIFFERENTIALLY PRIVATE STATISTICS WITH HIGH UTILITY
We now describe our construction DPrio that incorporates differential privacy (DP) into the Prio system, and provide an overview of the construction in Figure 2. At a high level, instead of many parties adding noise directly as is the case in local DP, Client-DP, or Server-DP (which is efficient but decreases data utility) or servers performing MPC operations to generate noise (which has high utility but low efficiency), we propose the following system. Clients generate noise, but rather than adding noise directly, they secret share their noise (as their data is secret shared) to the Prio servers. The Prio servers then perform an efficient two-round MPC protocol (where the first round can be batched in a pre-processing phase, if desired) to select a small number of clients' noise to add. Section 7 shows that this achieves DP, while satisfying high data utility and efficiency for clients and servers.

Client Noise Generation and Submission
Setup. Initially, a system administrator determines the set of queries DPrio will support in this instance, computes the sensitivity of each query, and chooses the $\varepsilon$ and $\delta$ values.
Protocol. At the time of submitting data to Prio servers, clients additionally perform the following steps:
(1) Sample noise $\eta \stackrel{\$}{\leftarrow} F$ from a noise distribution $F$ that leads to a mechanism satisfying $(\varepsilon, \delta)$-DP.
(2) Encode the noise using the same encoding mechanism as the statistic they are sending to the Prio servers. This also includes a SNIP proving that the noise is validly formatted.
(3) Generate secret shares $\eta_1, \ldots, \eta_s$ of the encoded noise using additive secret sharing, such that $\eta = \sum_{j=1}^{s} \eta_j$. This step is the same as the one the client uses to generate secret shares of its statistic for the Prio servers.
(4) Submit each $\eta_j$, along with the corresponding secret share of the client's data, to the $j$th Prio server.
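The four client steps can be sketched as follows. Here the "encoding" is simply an integer embedded in an illustrative prime field, standing in for the query's actual AFE. Negative noise wraps around the modulus and is recovered on decoding. SNIP generation is omitted. The routine name `dprio_client_submit` and the fixed noise sampler in the example are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus, not DPrio's actual field

def to_field(v: int) -> int:
    return v % PRIME  # negative integers wrap around the modulus

def from_field(v: int) -> int:
    return v if v <= PRIME // 2 else v - PRIME  # recover signed integers

def share(value: int, s: int) -> list[int]:
    parts = [secrets.randbelow(PRIME) for _ in range(s - 1)]
    return parts + [(value - sum(parts)) % PRIME]

def dprio_client_submit(x: int, sample_noise, s: int):
    """Hypothetical client routine returning (data share, noise share) for each server."""
    eta = sample_noise()                     # (1) integer noise from F
    data_shares = share(to_field(x), s)      # data shared exactly as in plain Prio
    noise_shares = share(to_field(eta), s)   # (2)-(3) encode and additively share
    # SNIP generation for the data and the noise is omitted here.
    return list(zip(data_shares, noise_shares))  # (4) pair j goes to server j

submissions = dprio_client_submit(5, sample_noise=lambda: -7, s=3)
```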
Any additive mechanism such as the Gaussian or Laplace mechanism can be used with the protocol. An additive mechanism is one where achieving DP only requires adding noise to the statistic. We now expand on the noise generation and encoding steps.
6.1.1 Noise Generation. To generate noise, each client begins by sampling a value from a distribution whose parameter is fixed at system setup and chosen to satisfy $(\varepsilon, \delta)$-DP. Each client then encodes this noise and generates a SNIP in a manner identical to that of the statistic at hand, as described in Section 4.1. The client then sends both the statistic SNIP and the noise SNIP to the DPrio servers. The DPrio servers choose clients' noise values at random (Section 6.2). Before sending the aggregated sum to the data analyst, the servers add the selected clients' noise to the sum. The data analyst only sees the noised sum after aggregating all sums.
The tricky step in this protocol is that typical noise distributions are continuous, so sampling and transmitting the noise exactly would require infinitely many bits. Since this is not computationally feasible, the clients must truncate their noise. This issue can be addressed by using discrete noise generation methods [3,4,15,34,50]. We adopt the secure noise generation method of [50] to generate integer noise in our implementation. A more detailed analysis of noise truncation can be found in Appendix B. Similar analysis can be applied to other forms of noise, such as discrete Gaussian noise [15].
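To illustrate what integer-valued noise looks like, the sketch below samples a discrete Laplace (two-sided geometric) distribution as the difference of two i.i.d. geometric variables. This gives $\Pr[Z = z] \propto e^{-\varepsilon |z|/\Delta}$. It is a standard textbook construction shown only for intuition. It is not the secure method of [50] that the implementation adopts, and it still relies on floating-point logarithms and the non-cryptographic `random` module, so it is not a secure sampler.

```python
import math
import random

def geometric(alpha: float, rng: random.Random) -> int:
    # Failures before the first success, with P(G >= k) = alpha ** k (inverse CDF).
    u = 1.0 - rng.random()  # u in (0, 1]
    return math.floor(math.log(u) / math.log(alpha))

def discrete_laplace(epsilon: float, sensitivity: int, rng: random.Random) -> int:
    # P(Z = z) is proportional to exp(-epsilon * |z| / sensitivity).
    alpha = math.exp(-epsilon / sensitivity)
    return geometric(alpha, rng) - geometric(alpha, rng)

rng = random.Random(5)
noise = discrete_laplace(epsilon=1.0, sensitivity=1, rng=rng)
```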

Noise Encoding. In this model, we simply require that clients submit a proof that their noise falls within the same encoding structure as the client's data itself. This proof does not demonstrate that clients have picked the noise from the correct distribution. To protect against misbehaving clients that might simply submit zero instead of correctly sampled noise, the Prio servers perform an MPC operation to select which client's noise to add. Therefore, the security of this construction depends on the assumed bound on the number of misbehaving clients (Section 7.2).
The noise is encoded with the same affine-aggregatable encoding (AFE) as the data itself (Section 2.2), allowing the servers to aggregate noise together with the data without reconstructing either. Once the client has submitted its noise to the servers, it deletes the noise.
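To make the affine-aggregatable property concrete, the sketch below uses a bit-decomposition encoding for non-negative integers, in the spirit of the sum encodings used by Prio. It is a simplifying assumption: it works over plain integers rather than secret shares in a prime field, and `is_valid` only evaluates the predicate that a SNIP would prove in zero knowledge. All function names are hypothetical.

```python
def encode(value: int, bits: int) -> list[int]:
    """Encode a non-negative integer as its `bits` little-endian binary digits."""
    if not 0 <= value < 2**bits:
        raise ValueError("value out of range for this encoding")
    return [(value >> i) & 1 for i in range(bits)]


def is_valid(encoding: list[int]) -> bool:
    """Validity predicate a SNIP would prove: every entry x satisfies x * (x - 1) == 0."""
    return all(x * (x - 1) == 0 for x in encoding)


def aggregate(encodings: list[list[int]]) -> list[int]:
    """Servers sum encodings coordinate-wise (on shares, in practice) without decoding."""
    return [sum(column) for column in zip(*encodings)]


def decode(aggregated: list[int]) -> int:
    """Decode an aggregate of bit encodings into the sum of the original integers."""
    return sum(count << i for i, count in enumerate(aggregated))
```

Because summing encodings and then decoding gives the same result as summing the original values, data and noise encoded this way can be added together by the servers before anything is decoded.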
While we demonstrate DPrio specifically as a DP mechanism for Prio, the general approach DPrio follows can extend to alternative private data collection designs that employ similar multi-server models for distributing trust when performing data collection.

Server Noise Selection
We now describe an efficient commit-reveal MPC protocol that the servers use to select which client's noise to add. Let $N$ be the number of clients who have submitted noise. We assume that clients' shares of this noise are ordered in some known manner and that the Prio servers know this ordering (we describe practical ordering options for implementations in Section 9). Assume the existence of some hash function $H$.
(1) Each server $j$ selects a random number $r_j$ within the range $[0, R]$, where $R$ is a multiple of $N$. To prevent a server from enumerating all possible inputs and selecting a specific one, each server also selects a salt value $s_j \in [\Psi]$, where $\Psi$ is too large to exhaust by brute force. A large $\Psi$ prevents a server from waiting to see all the other servers' commitments, enumerating all possible hashes, and selecting a specific $r_j$ to influence which client is chosen.
(2) Each server $j$ computes and publishes the hash $h_j = H(r_j \| s_j)$ of its random number to serve as a commitment. Here, the selected value corresponds to a particular client's index. The servers then apply this client's noise to their aggregated sums, throwing away the remaining clients' noise shares.
The initial commitment ensures that no server can sway the result towards a particular client. If a server attempts to cheat by revealing an input different from the one it originally chose, the remaining honest servers will discover this by verifying that the output of the hash function matches the original commitment. This is a standard commit-reveal technique that ensures each party commits to its contribution before learning the other parties' contributions.
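A minimal sketch of the commit-reveal selection follows, under stated assumptions: SHA-256 stands in for the hash function $H$, salts are 32 random bytes (so $\Psi = 2^{256}$), $r_j$ is drawn from $[0, R)$, and, since the reveal and combination steps are not spelled out above, the revealed values are combined by summing them modulo $N$. That combination rule is our assumption; it yields a uniformly random index whenever at least one server's $r_j$ is uniform and $N$ divides $R$. The function names are hypothetical.

```python
import hashlib
import secrets


def commit(r: int, salt: bytes) -> bytes:
    """Commitment h_j = H(r_j || s_j), using SHA-256 as the assumed hash function H."""
    return hashlib.sha256(r.to_bytes(32, "big") + salt).digest()


def choose_contribution(R: int) -> tuple[int, bytes]:
    """Pick r_j uniformly from [0, R) and a 32-byte random salt s_j."""
    return secrets.randbelow(R), secrets.token_bytes(32)


def select_client(commitments: list[bytes], reveals: list[tuple[int, bytes]], N: int) -> int:
    """Check every reveal against its published commitment, then derive a client index."""
    if len(commitments) != len(reveals):
        raise ValueError("every server must reveal exactly once")
    for h, (r, salt) in zip(commitments, reveals):
        if commit(r, salt) != h:
            raise ValueError("a server revealed a value that does not match its commitment")
    # Assumed combination rule: uniform in [0, N) if any one r_j is uniform and N divides R.
    return sum(r for r, _ in reveals) % N
```

Each server publishes `commit(r_j, s_j)` first and reveals `(r_j, s_j)` only after all commitments are in, so no server can adapt its choice to the others'.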
Before this step, the servers have already received shares of noise from all clients. Once a particular client is chosen, the servers simply add the corresponding share of noise to their local sum of the data sent by the clients. To add noise in this way, the shares received by the servers must have some ordering, which we discuss in Section 9. If the protocol adds noise values from $k$ clients without replacement, the servers simply repeat steps (1)-(5) $k$ times, excluding previously sampled clients.
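The next sketch shows the server side for $k$ selections without replacement. It assumes each commit-reveal run yields a jointly random integer and maps that integer onto the clients not yet chosen by reducing it modulo the number of remaining clients; both the mapping and `MODULUS` are hypothetical choices for illustration. Reducing modulo a count that does not divide $R$ introduces a small bias, so $R$ should be large relative to $N$.

```python
# Hypothetical field modulus shared by clients and servers (illustration only).
MODULUS = 2**61 - 1


def select_without_replacement(joint_randomness: list[int], num_clients: int) -> list[int]:
    """Map k jointly random integers (one per commit-reveal run) to k distinct client indices."""
    if len(joint_randomness) > num_clients:
        raise ValueError("cannot select more clients than submitted noise")
    remaining = list(range(num_clients))
    chosen = []
    for c in joint_randomness:
        # Reduce modulo the clients still available, so earlier picks are excluded.
        chosen.append(remaining.pop(c % len(remaining)))
    return chosen


def add_selected_noise(local_sum: int, noise_shares: list[int], chosen: list[int]) -> int:
    """One server adds its own shares of the chosen clients' noise to its local data sum."""
    return (local_sum + sum(noise_shares[i] for i in chosen)) % MODULUS
```

Every server runs `add_selected_noise` with the same `chosen` list, so adding the servers' outputs reconstructs the data sum plus exactly the selected noise values.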
We assume that servers do not collude with clients (Assumption 4.2). A server colluding with a client may learn whether that client's noise was selected by the noise selection protocol, in which case the colluding server could learn the noise value from the client. Under this non-collusion assumption, our commit-reveal protocol could be replaced by a protocol that deterministically sets the output (e.g., pick a server and let it sample a client ID and share the ID with the others, or always fix the same client ID). However, the advantage of the commit-reveal protocol is that it can easily be combined with the shuffle model if we want to remove the non-collusion assumption (Section 7.3.3). Its overhead is also low, as shown in Section 8.3. In addition, suppose we relax the non-collusion assumption to cases where servers may collude with a small number of fixed clients. The commit-reveal protocol prevents these servers from swaying the result towards the colluding clients, and thus it offers a stronger guarantee than the alternatives. We discuss alternatives under this collusion model in Appendix C.
Unlike prior constructions that employ server cooperation, DPrio achieves significantly higher efficiency and data utility as the MPC operation in DPrio is simply to determine which client's noise to add, as opposed to cooperating to generate noise directly.

PRIVACY AND SECURITY ANALYSIS
7.1 Differential Privacy
We provide a sketch of a security proof showing that our system satisfies DP, assuming that clients submit well-chosen noise as described in Section 4.2; we then consider misbehaving clients in Section 7.3.2.
Clients submit encoded shares of data to the Prio servers, which ultimately sum to some statistic. They also submit shares of noise, encoded similarly, such that when the shares are summed we obtain the statistic plus noise sampled from a distribution ensuring $(\varepsilon, \delta)$-DP. This operation is easily accomplished with the building blocks already provided by Prio, namely AFEs (such as a linear secret sharing scheme) and SNIPs (we expand further on both in Appendix A). Therefore, after the Prio servers aggregate their shares locally, the final reconstructed result is the sum of the desired statistic and the noise needed to achieve DP. The corresponding mechanism, denoted by $\mathcal{M}$, is simply a perturbation of the aggregate of the clients' input data with noise drawn from a Gaussian (or Laplace) distribution that satisfies $(\varepsilon, \delta)$-DP.
7.1.1 DP against Honest Colluding Servers. We claim that the view and output of an adversary controlling a single server are computationally indistinguishable from those of a simulator with access only to the output of the mechanism $\mathcal{M}$ satisfying $(\varepsilon, \delta)$-DP and the total size of the database. That is, our protocol satisfies computational differential privacy under SIM-CDP (Definition 2.3).
Let $\Pi$ denote the protocol executing the mechanism $\mathcal{M}$, and let $\mathcal{M}(D, \varepsilon, \delta)$ denote the output of running the protocol on an input database $D$ with tunable parameters $\varepsilon, \delta$. The original Prio design [17] demonstrated that there exists an efficient simulator that outputs a transcript of the protocol execution indistinguishable from a real transcript, and that the only information leaked to the adversary is the value of the function $f$ being computed on the clients' private values. The only differences in DPrio are the noise generation by clients and the random sampling of a client by the Prio servers. From the perspective of an adversary controlling a single server, its view consists only of what it could see in Prio plus a share of some noise, which is private in an information-theoretic sense. Assuming that at least one server is honest, the protocol outputs either a correctly computed $(\varepsilon, \delta)$-DP result according to the DP mechanism, or an incorrect result that is independent of the true answer. In either case, the adversary learns no more than what is bounded by the $(\varepsilon, \delta)$-DP property. Therefore, we obtain the following result.
Corollary 7.2. Protocol $\Pi$ satisfies computational differential privacy under the SIM-CDP notion.
While Theorem 7.1 assumes that the adversary controls a single server and none of the clients, the proof generalizes to the case where the adversary controls up to $(n - 1)$ servers, due to the privacy properties of the AFE, which requires all $n$ out of $n$ shares to reconstruct any input. We summarize this as follows.
Corollary 7.3. Theorem 7.1 holds against an adversary controlling all but one of the Prio servers.
As Prio does not consider misbehaving servers, we do not consider misbehaving servers in DPrio either.
7.1.2 DP against Honest Colluding Clients. Recall that in the DPrio protocol, each client $u \in \{u_1, \ldots, u_N\}$ samples a noise value $\eta_u$ from a distribution that achieves $(\varepsilon, \delta)$-DP, and the servers then randomly select $k$ of the clients' noise values, without replacement, to perturb the answer. Given a set of clients $U$, we use $Z_U = \sum_{u \in U} \eta_u$ to denote the sum of the noise variables contributed by $U$. Let $m$ be the number of honest-but-curious clients who collude or are controlled by an adversary. We assume $m \ge 1$, as the adversary can always include one client who knows its own noise. We highlight that this analysis treats the clients' noise differently from their input data. In particular, the colluding clients know only their own noise and not the other clients' noise, but, as in the standard DP analysis, they may know the other clients' data from other public sources. Hence, we consider an adversary who knows the data of all $N - 1$ clients except the last one, in addition to the noise of the $m$ colluding clients. We would like to show that for any set of colluding clients $C = \{c_1, \ldots, c_m\} \subset \{u_1, \ldots, u_N\}$, for any neighboring databases $D$ and $D'$ that differ in one client's input data (a client not in $C$), and for any output $o$, we have the following guarantee:
$$\Pr[\mathcal{M}(D) = o \mid Z_C] \le e^{\varepsilon} \Pr[\mathcal{M}(D') = o \mid Z_C] + \delta.$$
In this protocol, two sources of randomness are involved: (i) the noise sampled by the clients; and (ii) the set of clients selected by the servers, denoted by $S = \{s_1, \ldots, s_k\}$. Without loss of generality, we assume the set of compromised clients is $C = \{u_1, \ldots, u_m\}$. First, we consider the simple case $k > m \ge 1$, i.e., the number of noise-providing clients selected by the servers exceeds the number of colluding clients. The colluding clients do not know the remaining $(k - m)$ noise values and hence cannot break the DP guarantee offered by each single noise value.
Theorem 7.4. When $k > m \ge 1$, the protocol satisfies the $(\varepsilon, \delta)$-DP guarantee against $m$ colluding clients.
Proof. Consider a noisy output $o$ and neighbors $(D, D')$. We use $Z_{S \setminus C}$ to denote the sum of the noise variables contributed by the clients in $S \setminus C$, and $Z_{S \cap C}$ to denote the sum of the noise variables contributed by the clients in $S \cap C$. Given that $C$ knows the noise of the first $m$ clients, we break down the probability of generating the noisy output into several cases depending on the size of the intersection of $S$ and $C$. Note that $S \setminus C$ is non-empty, as $k > m$ (there is always some client's noise in $S$ not known by $C$). A similar proof applies for the general case $k > m \ge 1$. □
If we know the value of $m$, then setting $k = m + 1$ is sufficient to eliminate any adversary who compromises $m$ clients and to ensure the same level of DP guarantee. If we set $k > m + 1$, we can also tighten the DP privacy parameter for Gaussian noise. For example, if we sum the noise of $k$ clients who each sampled from a Gaussian distribution with variance $\sigma^2$, and $m$ of them are compromised by the adversary, the composed noise of the remaining $k - m$ clients follows a Gaussian distribution with variance $(k - m) \cdot \sigma^2$. This gives an $(\varepsilon / \sqrt{k - m}, \delta)$-DP guarantee.
Next, we consider the general case $k \le m$. We start with the basic case $k = 1 \le m$ (a single client's noise is selected by the servers). Let $c$ be the client whose noise is selected. Among all adversaries, those that control $c$ (including $c$ itself as an adversary) are able to distinguish between the true database $D$ and its neighbors $D'$, and hence break the DP guarantee offered by this single noise value. In particular, for any set $C$ of $m$ colluding clients, there is a probability of $m/N$ that the selected client is one of the colluding clients, $c \in C$. If this happens, then given a noisy output $o$ and the true database instance $D$, the adversary is certain that one of its noise values produces exactly $o$. Hence, we have
$$\Pr[\mathcal{M}(D) = o] = \frac{m}{N}\Pr[\mathcal{M}(D) = o \mid c \in C] + \Big(1 - \frac{m}{N}\Big)\Pr[Z = o - f(D) \mid c \notin C],$$
where the first conditional probability is bounded away from zero. However, for a neighbor $D'$ of the true instance, the adversary can test its noise values and find that they are very unlikely to produce $o$ from $D'$, i.e.,
$$\Pr[\mathcal{M}(D') = o] = \frac{m}{N}\Pr[\mathcal{M}(D') = o \mid c \in C] + \Big(1 - \frac{m}{N}\Big)\Pr[Z = o - f(D') \mid c \notin C], \qquad \Pr[\mathcal{M}(D') = o \mid c \in C] \approx 0.$$
Although the second probability terms in these two expressions, $\Pr[Z = o - f(D) \mid c \notin C]$ and $\Pr[Z = o - f(D') \mid c \notin C]$, are bounded by $(\varepsilon, \delta)$, the difference in the first terms cannot be bounded by the initial DP parameters $(\varepsilon, \delta)$. This adds an additional failure probability bounded by $m/N$. To reduce this additional failure probability, we consider protocols that add the noise of $k > 1$ clients.
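The Gaussian tightening for $k > m + 1$ can be checked against the classical Gaussian-mechanism calibration. Writing $\eta_u$ for client $u$'s noise and assuming each honest client uses $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (the standard calibration for $\varepsilon < 1$; the exact constant DPrio uses is not stated here), the noise unknown to the $m$ colluders satisfies:

```latex
Z_{S \setminus C} = \sum_{u \in S \setminus C} \eta_u \sim \mathcal{N}\big(0,\ (k-m)\,\sigma^2\big),
\qquad
\sqrt{k-m}\,\sigma
  = \sqrt{k-m}\,\frac{\Delta\sqrt{2\ln(1.25/\delta)}}{\varepsilon}
  = \frac{\Delta\sqrt{2\ln(1.25/\delta)}}{\varepsilon/\sqrt{k-m}}
\;\Longrightarrow\;
\Big(\tfrac{\varepsilon}{\sqrt{k-m}},\ \delta\Big)\text{-DP}.
```

Because $\varepsilon/\sqrt{k-m} \le \varepsilon$, the classical calibration's requirement $\varepsilon < 1$ continues to hold for the tightened parameter.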
If $k = m > 1$, there exists a set of colluding clients who can break the DP guarantee, by reasoning similar to the basic protocol with $k = 1$. We now discuss how to upper-bound the additional failure probability relative to the initial $(\varepsilon, \delta)$-DP guarantee offered by a single noise value. We denote this additional failure probability by $\delta'$. It is possible that the set of colluding clients is exactly the set selected by the servers, $C = S$. In this case, given a noisy output $o$ and the true database instance $D$, the adversary can be certain that its noise produces exactly $o$ from $D$, whereas for the neighbors $D'$ the probability is much smaller (it can be close to 0). Hence, $\delta' = \Pr[S = C] = 1/\binom{N}{k}$. If $k < m$, the additional failure probability increases, as there is a higher chance that the set of chosen noise falls within $C$ than in the case $k = m$. We have the failure probability $\delta' = \binom{N-k}{m-k} / \binom{N}{m}$. The analysis can be generalized to any neighbors $(D, D')$.
The settings in which applying DPrio is appropriate include those where the assumed number of colluding clients, $m$, is a constant. This is appropriate in large networks with millions of users, where it is unlikely that a large proportion of clients would be compromised. Current use cases of Prio, such as collecting browser telemetry [24], fall into this category. In these settings, we achieve a reasonable value for $\delta'$. In our evaluation, we set $k = \log N > m$ (where $m$ is a constant), and hence $\delta' = 0$ by Theorem 7.4. If $m$ happens to equal $k = \log N$, then by Theorem 7.5 we have $\delta' = 1/\binom{N}{\log N}$. The failure probability analysis above assumes that the adversary has access to $(N - 1)$ clients' data and $m$ clients' noise but still lets the clients follow the protocol, which is the worst case for the DP analysis. We analyze a malicious adversary who controls $m$ clients and makes them use incorrect data and noise distributions in Sections 7.2 and 7.3.
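The additional failure probability $\delta'$ can be evaluated directly. The sketch below computes the probability that all $k$ selected noise values belong to the $m$ colluding clients; this reduces to $m/N$ for $k = 1$ and to $1/\binom{N}{k}$ for $k = m$, and the function returns 0 when $k > m$, as in Theorem 7.4. The function name is hypothetical.

```python
from math import comb


def extra_failure_probability(N: int, m: int, k: int) -> float:
    """delta': probability that all k noise values selected by the servers belong to the m colluders."""
    if k > m:
        return 0.0  # Theorem 7.4: some selected noise is unknown to the colluders
    # C(N - k, m - k) / C(N, m), which equals C(m, k) / C(N, k)
    return comb(N - k, m - k) / comb(N, m)
```

For realistic deployments (for example $N = 2^{20}$ and $m = k = 20$), the result is vanishingly small, which is why $k = \log N$ is a reasonable default.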

Comparison with Prio.
Prio originally provided $f$-privacy, meaning that for an aggregate function $f$, an adversary who controls any number of clients and all but one server learns nothing about the honest clients' values $x_i$, except what it can learn from the value $f(x_1, \ldots, x_N)$ itself. DPrio improves upon this property by ensuring DP. Although the protocol presented above offers DP under the assumption that the adversary controls either up to $(n - 1)$ servers or up to $m$ clients, DPrio still offers the same $f$-privacy in the worst case (i.e., if an adversary controls a Prio server and learns which client's noise is chosen, for example by colluding with or directly controlling that client and therefore knowing its noise).

Robustness against Malicious Clients
Prio is robust against malicious clients when all servers are honest. That is, a set of malicious clients cannot influence the final aggregate beyond their ability to choose arbitrary valid inputs, because SNIPs are used to verify that the secret-shared data is in fact a valid encoding of the statistic. This guarantee also holds for DPrio, where it is likewise achieved by proving the validity of clients' noise via SNIPs. The key distinction between Prio and DPrio in this regard is that in DPrio, a client submits not just a data value but also a noise value. The Prio servers can still verify that the input is a valid noise value (i.e., some $b$-bit integer value). However, the Prio servers cannot verify that the noise was correctly sampled from the intended distribution. This poses some risk, since clients can choose arbitrarily large or small noise values within the allowed range to sway the output in their favor. However, given a large number of honest clients, it is unlikely that the servers will choose a malicious client's noise.
The risk of incorrectly sampled noise being chosen increases when we select noise submitted by more than one client, as suggested in Section 7.1.2. We showed that as the number $k$ of noise values chosen from clients increases, the failure probability of the DP guarantee decreases. However, increasing $k$ also increases the probability that an incorrectly sampled noise value is included in the final result, thereby decreasing the overall robustness of the protocol. In particular, suppose the servers choose $k$ clients' noise without replacement and there are $m$ malicious clients. Assume the worst case, in which all malicious clients submit incorrectly sampled noise values. Then the probability that at least one incorrectly sampled noise value is included in the result is $1 - \binom{N-m}{k} / \binom{N}{k}$. If $k$ is a small constant relative to $N$, this probability is small. Figure 4 shows how this probability changes depending on $m$ and $k$; it increases roughly linearly as $m$ and $k$ grow. Additionally, we note that the risk this poses is not much different from the risk of a malicious client submitting incorrect data. This risk is mitigated by the fact that Prio and DPrio both ensure that data points and noise values conform to a valid data type; neither ensures that the data or noise values are necessarily correct. This vulnerability exists in any system collecting data from clients.
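The robustness probability above is a hypergeometric tail and is easy to tabulate. The sketch below assumes, as in the worst case described, that every malicious client submits bad noise; the function name is hypothetical.

```python
from math import comb


def prob_bad_noise_selected(N: int, m: int, k: int) -> float:
    """Chance that k draws without replacement from N clients include at least one of m malicious ones."""
    # Complement of "all k selected clients are among the N - m honest ones".
    return 1 - comb(N - m, k) / comb(N, k)
```

When $km \ll N$, the result is close to $km/N$, which is the roughly linear growth in $m$ and $k$ described above; for example, $N = 10^6$, $m = 10$, and $k = 20$ give approximately $2 \times 10^{-4}$.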
In practice, it is realistic to assume that $N$ is large and that the adversary controls only a small fraction of clients' inputs. This assumption is reasonable, given that Prio is intended to work at large scale. Major browsers such as Firefox and Chrome, which use or plan to use Prio for data collection, have millions if not billions of users. The alternative solution using MPC to sample noise in a distributed setting [19] assumes that at most one third of the servers generating noise are faulty or malicious, where the number of servers may be very small. Our protocol does not rely on such an assumption.

Comparison of DP Mechanisms
We present a comparison of security and privacy properties in Table 1 between DPrio and related constructions.

Server and Analyst Misbehavior. All protocols except the local DP construction require the assumption that servers are honest but curious. There is, however, a difference in the proportion of servers that must act honestly. The Dwork MPC construction requires that at least 2/3 of servers act honestly, while the plain Prio construction only assumes that at least one server is honest. Further, DPrio, Client-DP, and Server-DP all require that at least one server is honest. All protocols are secure against a misbehaving analyst, i.e., any third party that obtains the output cannot learn more than what is revealed by the differentially private output.

Client Misbehavior. Clients can misbehave in multiple ways. They may use incorrect data for the computation, use an incorrect noise distribution, curiously inspect the final/intermediate output to learn information about others' data, or collude with dishonest servers. Each model we evaluate can prevent some forms of misbehavior but not all, with differing implications for data utility.
Assumption on the number of honest clients. DPrio achieves DP as long as the selected client noise was honestly generated and is included in the aggregate sum by the servers. As such, the security model for DPrio assumes that the selected client noise was honestly sampled. In contrast, Client-DP assumes that the sum of client noise that is added is of sufficient quality. Similarly, Server-DP assumes that the noise generated by the designated servers has been honestly generated. As such, for both Client-DP and Server-DP, data utility scales with the assumed proportion of honest clients and servers (a more conservative assumption about the number of misbehaving entities results in additional noise, thereby lowering data utility). DPrio, by contrast, simply requires that the probability of selecting noise generated by a client honestly following the protocol be sufficiently high.
Clients using an incorrect noise distribution. Clients may misbehave by using an incorrect noise distribution when adding noise. For example, they might choose to use a weaker distribution to impair the DP guarantee, or submit no noise at all. This threat is irrelevant to Dwork-MPC and Server-DP which do not require clients to sample noise. On the other hand, Local DP, Client-DP, and DPrio are more vulnerable to this attack, to varying degrees. In Local DP, an honest client can ensure their data remains private by adding sufficient noise to their own data, thus a lack of additional noise from other parties does not have a catastrophic effect on their privacy; however, this approach significantly reduces utility.
In Client-DP, we mitigate this attack by making an assumption on the number of honest clients, as discussed in the previous paragraph. With a sufficient number of honest clients, we can deter the effects of the attack while maintaining better utility than the Local DP model. Finally, in DPrio there is a possibility that a dishonest client's noise is chosen. This would be detrimental to the privacy of the honest clients. Hence, we suggest using DPrio only when the proportion of clients who might misbehave in this way is small.
Clients colluding with other parties. Depending on how many parties the misbehaving clients collude with, and how many servers are involved in the misbehavior, different models provide different protections. We drop $\delta$ from the discussion for simplicity. Case 1: $N - 1$ clients and $n - 1$ servers are controlled by an adversary. Dwork-MPC achieves $\varepsilon$-DP if no more than $1/3$ of the servers are corrupted. In this case, since $n - 1$ servers are corrupted, only the last server's Gaussian noise (sampled with variance $3\sigma^2/(2n)$) remains, which offers a much weaker guarantee of $\sqrt{2n/3}\,\varepsilon$-DP. Local DP provides $\varepsilon$-DP, as each client provides its own protection. Client-DP provides 1−1/ -DP and Server-DP provides $\varepsilon$-DP. DPrio does not ensure DP, as the servers can collude with the clients who contribute noise and remove the noise from the answer.
Case 2: The adversary controls $N - 1$ clients and fewer than $n/3$ servers. Dwork-MPC provides $\varepsilon$-DP, as it assumes that at most $1/3$ of the servers collude. The other models provide the same protection as in the prior case.
Case 3: The adversary controls $m$ clients and $n - 1$ servers. Dwork-MPC only provides $\sqrt{2n/3}\,\varepsilon$-DP, since the last server still adds Gaussian noise with variance $3\sigma^2/(2n)$. Local DP, Client-DP, and Server-DP all provide $\varepsilon$-DP. DPrio does not offer DP.
Case 5: The adversary controls 0 clients and $n - 1$ servers. Dwork-MPC provides $\sqrt{2n/3}\,\varepsilon$-DP, since the last server still adds Gaussian noise with variance $3\sigma^2/(2n)$. Local DP, Client-DP, Server-DP, and DPrio all ensure $\varepsilon$-DP.
Table 1: Comparison of Privacy and Security Properties. N/A means the threat is not applicable in that model. Pure refers to $\varepsilon$-DP, and Approx. refers to $(\varepsilon, \delta)$-DP. HBC means Honest but Curious. We assume at least one honest server and that the number of dishonest clients is below some bound. $N$ is the number of clients, of which $m$ are assumed to be dishonest.
Suppose malicious clients collude with a single server, and a malicious client's noise is chosen. The malicious client and server can then work together to determine the true result of the statistic once it is published, by subtracting the client's noise. A simple defense against this attack is to employ an intermediary shuffler between the clients and servers that shuffles the submitted noise before sending it to the DPrio servers. Doing so ensures that DPrio servers cannot link noise received from the shuffler to noise submitted by DPrio clients. This technique is commonly considered in the literature as the shuffle model [9,26]. While it introduces the additional overhead of a separate shuffler entity, employing the shuffle model in combination with DPrio mitigates potential collusion between clients and servers while maintaining the utility of the central model. Alternatively, the servers can perform an oblivious shuffling protocol upon receiving the data [37], removing the requirement of an independent shuffler role entirely. Laur et al. [37] provide an oblivious shuffling algorithm whose communication complexity is quadratic in the number of servers $n$ and grows with the number of data points $N$, a practical option for many real-world use cases of Prio. When a large number of servers is required, the authors describe a protocol with a constant number of rounds relative to the number of servers. Alternatively, Movahedi et al. [42] suggest a multi-party oblivious shuffling algorithm with $\tilde{O}(1)$ communication complexity over a logarithmic number of rounds that is secure in a malicious setting with up to 1/3 corrupted servers, the same model as in Dwork-MPC.

EVALUATION
We now evaluate and compare our constructions by their utility guarantees and their performance overheads.

Accuracy
We use the mean squared error of the final noisy answer output by the servers to measure the accuracy of each approach. First, the error for Server-DP (Section 5.2.1) is $\mathrm{Error}_{\text{Server-DP}} = n\,(2\ln(1.25/\delta))\Delta^2/\varepsilon^2$, where $n$ is the number of servers. This error depends only on the number of servers: it is independent of the data size (the number of clients $N$) and of the number of honest clients. When $n$ is small, the error of Server-DP is close to that of central DP.
Next, for Client-DP (Section 5.2.2), the error is $\mathrm{Error}_{\text{Client-DP}} = (N/(N-m))(2\ln(1.25/\delta))\Delta^2/\varepsilon^2$, where $N - m$ is the number of honest clients and $N$ the number of clients submitting noise. This approach assumes that the number of honest-but-curious colluding clients is at most $m$ (these clients still follow the protocol and submit proper noise). When $m$ is small, this error term is close to that of central DP.
For DPrio, the error in the final query answer is $\mathrm{Error}_{\text{DPrio}} = 2k\Delta^2/\varepsilon^2$, where $k$ is the number of clients' noise values sampled by the servers. As shown in Section 7.1.2, DPrio satisfies the DP guarantee when $k > m$, where $m$ is the number of honest-but-curious colluding clients (Theorem 7.4). When $m$ is small (independent of the number of clients or the data size), DPrio sets $k = m + 1$, so its error is also independent of the data size and close to that of central DP.
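The three error formulas can be compared numerically. The sketch below simply evaluates them; it assumes the Server-DP error is $n$ times the single-server Gaussian variance, takes the sensitivity $\Delta$ as the parameter `sens`, and uses illustrative parameters chosen by us (with $k = \lceil \log_2 N \rceil$ as in the evaluation). Note that the DPrio formula corresponds to Laplace noise while the other two use Gaussian noise, so the comparison is indicative only.

```python
import math


def error_server_dp(n: int, eps: float, delta: float, sens: float = 1.0) -> float:
    """Server-DP mean squared error; grows linearly with the number of servers n."""
    return n * 2 * math.log(1.25 / delta) * sens**2 / eps**2


def error_client_dp(N: int, m: int, eps: float, delta: float, sens: float = 1.0) -> float:
    """Client-DP: all N clients add noise, inflated by N / (N - m) to tolerate m colluders."""
    return (N / (N - m)) * 2 * math.log(1.25 / delta) * sens**2 / eps**2


def error_dprio(k: int, eps: float, sens: float = 1.0) -> float:
    """DPrio: k selected Laplace(sens / eps) samples, each with variance 2 * (sens / eps)**2."""
    return 2 * k * sens**2 / eps**2


# Illustrative parameters: two servers, a million clients, ten colluders, k = ceil(log2 N).
N, n, m, eps, delta = 1_000_000, 2, 10, 1.0, 1e-6
k = math.ceil(math.log2(N))
errors = {
    "Server-DP": error_server_dp(n, eps, delta),
    "Client-DP": error_client_dp(N, m, eps, delta),
    "DPrio": error_dprio(k, eps),
}
```

For these parameters, Client-DP has the smallest error, DPrio comes next, and Server-DP is largest; none of the three depends on $N$ beyond the mild $N/(N-m)$ factor and the choice of $k$.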
The three approaches for Prio have error terms close to central DP, all much better than that offered by the local model. We summarize the results in Table 2. Among them, Client-DP has the smallest absolute error term. However, Client-DP has the poorest robustness against malicious clients who submit bad noise. As the servers aggregate every client's noise, even if only one client is malicious and submits bad noise, this noise will be aggregated into the final noisy answer and incur a large error for Client-DP. On the other hand, DPrio samples a very small number of noise values from the clients, so it has a small chance of picking a bad client's noise. As shown in Section 7.2, the probability that bad noise is sampled into the final query answer is small when $m$ and $k$ are small. Therefore, DPrio has much better robustness than Client-DP. Last, Server-DP does not add any client-generated noise, and hence its accuracy guarantee is not affected at all. Therefore, based on their corresponding utility guarantees, DPrio is the best option when the number of malicious clients is greater than 1 but smaller than the number of servers ($n > m > 1$); Server-DP is preferred when $m > n$; and Client-DP is preferred when $m = 0$. We generalize the use of these approaches for Prio with DP in Appendix C.

Efficiency
We present an efficiency comparison in Table 2 between DPrio, Client-DP, Server-DP, and the related constructions we seek to improve upon. The table summarizes computational costs for clients and servers, client-server communication costs, server-server communication costs, computational costs of MPC protocols, and the number of rounds of MPC protocols, if any. It also summarizes the utility of aggregated data when a DP mechanism is in place. A single party can sample noise in constant time ($O(1)$). For DPrio, we assume that input sizes to the hash function are fixed, so a single call to the hash function costs $O(1)$.
Although DPrio requires some computation on the part of the clients and servers, it does not require significant computation costs. Notably, DPrio requires less computation costs on the part of the servers than the Dwork MPC method.

Implementation and Case Studies
8.3.1 Implementation. To empirically assess DPrio, we implement a library and utility program that processes data from simulated clients using either Prio or DPrio. This code makes use of libprio-rs, a Rust implementation of Prio by Divvi Up of ISRG [33], and consists of fewer than 700 lines of Rust. We additionally extend libprio-rs to interpret client data as binary integers rather than discrete bits during aggregation. This extension requires only 55 additional lines of code. Any project using libprio-rs can use the provided library to implement DPrio with minimal changes.
This library implements the approximate Laplace method described in [50] to generate Laplace noise and truncate the noise generated by clients. We use $b + 1$ bits to store the noise, where $b = \log_2\big(6\ln(10)\,\lambda/\gamma\big)$, so that the probability that the magnitude of the noise exceeds $2^b$ (the truncation probability) is smaller than $10^{-6}$; here $\gamma$ is the resolution parameter and $\lambda = (\Delta + \gamma)/\varepsilon$ is the noise parameter. The details can be found in Appendix B. As noise can be negative, for each sampled noise value we add $2^b$ and encode the shifted value with $b + 1$ bits. This ensures that only shares of non-negative values are sent to the servers to be operated upon. After aggregation, $k \cdot 2^b$ is subtracted from the sum to find the actual noisy aggregated value, where $k$ is the number of clients' noise values selected.
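The bit-width choice and the shift-and-subtract encoding can be sketched as follows. This is an illustration, not the DPrio library code: the function names are hypothetical, $b$ is rounded up to an integer with a ceiling, the noise parameter $\lambda$ is passed in directly, and out-of-range samples are clamped to $[-2^b, 2^b - 1]$ so that every shifted value fits in $b + 1$ bits (the exact truncation rule of [50] may differ).

```python
import math


def noise_bits(lam: float, gamma: float) -> int:
    """Smallest integer b with 2^b * gamma >= 6 ln(10) * lam.

    For Laplace noise with scale lam, P(|X| > t) = exp(-t / lam), so this choice keeps
    the probability of exceeding magnitude 2^b (in units of gamma) at most 1e-6.
    """
    return math.ceil(math.log2(6 * math.log(10) * lam / gamma))


def encode_shifted(noise: int, b: int) -> int:
    """Shift integer noise by 2^b so that it is non-negative and fits in b + 1 bits."""
    # Assumption: rare out-of-range samples are clamped to [-2^b, 2^b - 1].
    clamped = max(-(2**b), min(noise, 2**b - 1))
    return clamped + 2**b


def decode_aggregate(aggregate: int, k: int, b: int) -> int:
    """Undo the k shifts of 2^b contributed by the k selected noise values."""
    return aggregate - k * 2**b
```

For example, with $\lambda = 10$ and $\gamma = 1$ this gives $b = 8$, so each noise value occupies 9 bits; a true count of 42 with selected noise values $-3$ and $5$ aggregates to 556, which decodes to $556 - 2 \cdot 2^8 = 44$.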

Setup.
We explore a set of case studies to determine the relative performance of DPrio and Prio. We test the hypothesis that DPrio incurs limited server processing overhead compared to Prio. All code is available at https://github.com/DPrio-PoPETs/dprio. These case studies simulate two servers computing a population count over a set of clients. Because we focus on server processing overhead in these studies, both servers coexist in the same process to keep network communication delays from affecting the results. When simulating Prio, the dimension of the data is 1 bit; each client is either in the population being counted or not, as represented by its data. In other words, each client submits a data point of 0 or 1. The query sums the data of the clients. In the first case study, we vary $\varepsilon$ and hence the number of bits for the noise in DPrio. libprio-rs does not distinguish noise from data, so the data must have the same dimension as the noise when simulating DPrio. Table 3 lists the average results of 50 runs. As $\varepsilon$ increases, the overhead of DPrio decreases, since the number of bits for storing noise decreases. Similarly, the average error (the absolute difference between the calculated value and the actual population count) decreases as $\varepsilon$ increases. Overall, the server processing overhead incurred by DPrio is indeed minimal, at no more than 5.65% for these parameters.
The second case study varies the number of clients. We measure the time it takes to process input from 1,000, 10,000, 100,000, or 1,000,000 clients. For these studies, $\varepsilon$ is fixed at 0.1 and the number of clients' noise values selected is $\lceil \log_2(N) \rceil$, where $N$ is the number of clients. As illustrated by Table 4, the overhead of DPrio remains roughly constant as the number of clients increases. The average error increases slightly, because the number of clients' noise values selected depends on the population size, which increases. The server processing overhead incurred by DPrio is minimal, at no more than 6.25% for these parameters.
The third case study varies the number of clients' noise values selected. We measure the time it takes the implementation to process input from 10,000 clients while varying the number of clients' noise values selected from 1 through 16 in a geometric sequence with a common ratio of 2. $\varepsilon$ is fixed at 0.1. As illustrated by Table 5, the overhead of DPrio remains roughly constant as the number of clients' noise values increases. The average error increases, however, as more noise values are selected. The server processing overhead incurred by DPrio is minimal, at no more than 5.45% for these parameters.
In all of the case studies, each client takes around 2.1 times as much processing time to encode the data it submits to the servers with DPrio as with Prio. This is due to the increased dimension of the data in DPrio and the fact that both data and noise have to be encoded and submitted, rather than just the data as in Prio. However, the work each individual client does is minuscule compared to the work done by the servers, so the additional client work required by DPrio is not expected to be prohibitive.
Table 5: Average simulated server processing time with a varying number of selected client noise values. Each simulation fixes $\varepsilon$ at 0.1 and uses a population of 10,000 clients, each submitting one bit of data.

ADDITIONAL CONSIDERATIONS
An Optimization for Large Client Sets. DPrio relies on enough honest clients submitting noise that the servers, when selecting one client (or a small set of clients) at random, have a high probability of selecting an honest client. When the number of clients is large, however, it may suffice for only a subset of clients to submit noise. In this setting, clients can use a local probabilistic function (such as flipping a coin) to decide whether to submit noise.
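A minimal Python sketch of this optimization follows, assuming each client independently submits noise with a fixed probability $p$; $p$ and the helper names `submits_noise` and `prob_at_least` are hypothetical, since the text does not fix them. If $h$ clients are honest, the number of honest noise submitters is then Binomial$(h, p)$, so the servers can compute how likely it is that at least $m$ honest clients submit noise.

```python
import math
import random


def submits_noise(p, rng):
    """Client-side coin flip: True with probability p (hypothetical helper)."""
    return rng.random() < p


def prob_at_least(m, honest_clients, p):
    """P[at least m of `honest_clients` submit noise], each independently with
    probability p (0 < p < 1), i.e. a Binomial(honest_clients, p) upper tail."""
    if m <= 0:
        return 1.0
    log_q = math.log1p(-p)
    below = 0.0
    # Sum P[X = j] for j < m in log space to avoid overflow for large populations.
    for j in range(min(m, honest_clients + 1)):
        log_pmf = (math.lgamma(honest_clients + 1) - math.lgamma(j + 1)
                   - math.lgamma(honest_clients - j + 1)
                   + j * math.log(p) + (honest_clients - j) * log_q)
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)
```

Because the coin flips are local, this choice needs no coordination among clients; a deployment would only need an assumed lower bound on the number of honest clients to pick a $p$ for which `prob_at_least` is close enough to 1.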
Deterministically Selecting Client Noise. DPrio assumes the servers have some deterministic selection mechanism. While some implementations may submit client data along with client identifiers, others may wish to allow clients to remain anonymous. In this setting, each client can hash its noise locally (along with a randomly chosen salt) before generating secret shares, and send the hash to the servers as a one-time identifier to use for selection. If no client identifier exactly matches the servers' random value, the servers can instead use a deterministic fuzzy matching mechanism until a match is found.
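A minimal Python sketch of this idea follows, under two assumptions not fixed by the text: identifiers are SHA-256 digests over a 16-byte random salt and the 8-byte signed encoding of the client's integer noise, and the deterministic fuzzy match picks the first identifier at or after the servers' shared random value on a $2^{256}$ ring. How the servers agree on that random value is also assumed. Any rule that every server evaluates identically would serve equally well; `client_identifier` and `select_client` are hypothetical names.

```python
import hashlib
import secrets

RING = 2**256  # SHA-256 identifiers interpreted as points on a ring


def client_identifier(noise, salt=None):
    """One-time identifier: SHA-256 over a random salt and the encoded noise."""
    salt = secrets.token_bytes(16) if salt is None else salt
    encoded = noise.to_bytes(8, "big", signed=True)
    return hashlib.sha256(salt + encoded).digest()


def select_client(identifiers, target):
    """Deterministically pick the identifier at or after `target` on the ring.

    An exact match has distance 0 and is returned if present; otherwise the
    next identifier clockwise is chosen, so every server holding the same
    identifiers and the same shared target selects the same client,
    regardless of the order in which it received the submissions.
    """
    t = int.from_bytes(target, "big")
    return min(identifiers, key=lambda ident: (int.from_bytes(ident, "big") - t) % RING)
```

The random salt keeps the identifier from revealing the noise value, since without the salt the servers could hash candidate noise values and compare.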
Extending Beyond Additive Mechanisms. Our construction and its associated privacy property apply to settings where the servers wish to compute an integer sum, integer mean, or frequency count. The system can, of course, be adapted to have clients sample noise from different distributions. However, more complex mechanisms that require more than adding noise to the statistic, such as the exponential mechanism [11], may be more difficult to integrate. We leave this for future work.

CONCLUSION
The use of private statistics is no longer of purely academic interest; systems like Prio are currently deployed and used by some of the world's largest organizations. However, simply aggregating client statistics can still permit de-anonymization attacks; in such settings, differential privacy provides a formal guarantee of users' privacy. In this work, we define a lightweight mechanism, DPrio, that adds differential privacy with high data utility to existing Prio systems. DPrio defines an efficient MPC protocol that achieves the same utility as centralized differential privacy and whose computational overhead for servers remains constant regardless of the number of participating clients. Further, the number of honest clients DPrio must assume remains low, unlike existing distributed noise generation mechanisms along the lines of Client-DP and Server-DP. While DPrio does require server interaction, these interactions can be batched, so in the best case DPrio requires no more communication overhead than plain Prio.

A.2 Secret-Shared Non-Interactive Proofs (SNIPs)
Clients employ an AFE to split their data value $x$ into shares $\{x_1, \ldots, x_\ell\}$ and upload the shares to $\ell$ Prio servers, such that each server receives one share. Clients additionally upload a SNIP: a secret-shared non-interactive zero-knowledge proof of knowledge demonstrating that $x$ is valid with respect to the corresponding statistic type, which can be verified given only the shares $\{x_1, \ldots, x_\ell\}$, the SNIP shares, and a multi-party computation (MPC) in which each Prio server participates. For example, if the statistic allows only binary inputs, the SNIP demonstrates that $x$ is either zero or one without disclosing which. To perform verification, the Prio servers use their SNIP shares together with their data shares to collectively run Verify as an MPC. Each server's output from Verify is either 0 or 1, indicating whether verification succeeded, and nothing else.
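To make the shape of Verify concrete, the Python sketch below checks the binary validity condition $x(x-1) = 0$ over additive shares using Beaver-triple multiplication. It is deliberately simpler than Prio's SNIP: it assumes a trusted dealer supplies the multiplication triples (in Prio, the client supplies proof material, including a Beaver triple, that the servers check with a randomized polynomial identity test), and the field modulus `P` and all function names are assumptions. As in Verify, the servers open only a masked value, which is 0 for valid input and uniformly random otherwise, so they learn a single bit.

```python
import secrets

P = 2**61 - 1  # assumed prime field modulus


def share(value, n):
    """Additively secret-share `value` among n servers modulo P."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def beaver_triple(n):
    """Shares of (a, b, a*b) from a trusted dealer (an assumption of this sketch)."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n), share(b, n), share(a * b % P, n)


def multiply(x_sh, y_sh, triple):
    """Shares of x*y from shares of x and y; only masked values d, e are opened."""
    a_sh, b_sh, c_sh = triple
    d = sum(x - a for x, a in zip(x_sh, a_sh)) % P  # each server broadcasts x_i - a_i
    e = sum(y - b for y, b in zip(y_sh, b_sh)) % P  # each server broadcasts y_i - b_i
    # x*y = c + d*b + e*a + d*e, computed share-wise.
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # exactly one server adds the public d*e term
    return z_sh


def verify_binary(x_sh):
    """Return True iff the shared value x satisfies x * (x - 1) == 0 (mod P)."""
    n = len(x_sh)
    x_minus_1 = list(x_sh)
    x_minus_1[0] = (x_minus_1[0] - 1) % P  # subtract the public constant once
    product = multiply(x_sh, x_minus_1, beaver_triple(n))
    # Each server contributes randomness, so r is uniform if any server is honest.
    r_sh = [secrets.randbelow(P) for _ in range(n)]
    masked = multiply(product, r_sh, beaver_triple(n))
    # Opening r * x(x - 1) reveals 0 for valid input and a random element otherwise.
    return sum(masked) % P == 0
```

An invalid input passes only if the random mask makes the opened value zero, which happens with probability $1/P$.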

B NOISE TRUNCATION
DPrio uses the approximate Laplace method of the secure noise generation algorithm [50] to generate Laplace noise and to truncate the noise generated by clients. Approximate Laplace first sets the resolution parameter $\Lambda$, which in the Java implementation of [50]¹ is the smallest power of two that is at least $\Delta / (\varepsilon \cdot 2^{40})$. Given a function $f$ that maps a database $D$ to a real number, let $\tilde{f}(D)$ denote $f(D)$ rounded to the nearest multiple of $\Lambda$. The algorithm then samples an integer $k$ with probability proportional to $e^{-\varepsilon |k| \Lambda / \Delta'}$, where $\Delta' = \max_{D, D'} \| \tilde{f}(D) - \tilde{f}(D') \| \le \Delta + \Lambda$ over neighboring databases $D, D'$. Finally, the algorithm returns $\tilde{f}(D) + k\Lambda$ as the noisy output. This process satisfies $\varepsilon$-differential privacy.
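The following Python sketch walks through those steps: compute $\Lambda$, round $f(D)$ to a multiple of $\Lambda$, sample a two-sided geometric integer $k$, and return $\tilde{f}(D) + k\Lambda$. It uses $\Delta + \Lambda$ as a conservative stand-in for $\Delta'$, and it samples the geometric variables with floating-point inverse-CDF sampling, which is exactly the kind of shortcut [50] is designed to avoid; treat it as an illustration of the arithmetic, not a secure sampler. The function names are hypothetical.

```python
import math
import random

GRANULARITY_PARAM = 2.0 ** 40  # the 2^40 constant described above


def granularity(sensitivity, epsilon):
    """Lambda: smallest power of two that is at least sensitivity / (epsilon * 2^40)."""
    return 2.0 ** math.ceil(math.log2(sensitivity / (epsilon * GRANULARITY_PARAM)))


def round_to_multiple(value, lam):
    """f(D) rounded to the nearest multiple of lam."""
    return round(value / lam) * lam


def approximate_laplace(value, sensitivity, epsilon, rng=None):
    """Return round(f(D)) + k * Lambda with P(k) ~ exp(-epsilon * |k| * Lambda / Delta')."""
    rng = rng or random.SystemRandom()
    lam = granularity(sensitivity, epsilon)
    sens_rounded = sensitivity + lam  # upper bound on Delta'; using it is conservative
    scale = sens_rounded / (epsilon * lam)  # 1 / (epsilon * Lambda / Delta')

    def geometric():
        # P(G >= j) = exp(-j * epsilon * Lambda / Delta') for j = 0, 1, ...
        u = 1.0 - rng.random()  # u lies in (0, 1]
        return math.floor(-math.log(u) * scale)

    k = geometric() - geometric()  # difference of i.i.d. geometrics is two-sided geometric
    return round_to_multiple(value, lam) + k * lam
```

Because both the rounded value and the noise are integer multiples of $\Lambda$, the output always lies on the $\Lambda$ grid, which is what lets the noise be stored as the integer $k$.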
If we use $b$ bits to store the magnitude of the noise, then the probability that $|k|$ exceeds $2^b$ can be charged to the $\delta$ of the mechanism's $(\varepsilon, \delta)$-differential privacy guarantee. We show the minimum choice of $b$ that makes this truncation probability smaller than $10^{-6}$.
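The integer $k$ is drawn from a geometric distribution mirrored at 0: $\Pr[k] = \frac{1-\alpha}{1+\alpha}\,\alpha^{|k|}$ with $\alpha = e^{-\varepsilon \Lambda / \Delta'}$, where $\Delta'$ is the sensitivity of the rounded function. The non-negative part of this distribution's PMF is proportional to the PMF of a geometric distribution with parameter $p = 1 - \alpha$. Summing the two tails, for any $m \ge 0$,
$$\Pr[|k| > m] = \frac{2\alpha^{m+1}}{1+\alpha} \le \alpha^{m} = e^{-m \varepsilon \Lambda / \Delta'},$$
so $\Pr[|k| > m] \le 10^{-6}$ whenever $m > 6\ln(10)\,\Delta' / (\varepsilon \Lambda)$ (in a similar fashion to the implementation of [50]). Given chosen values of $\varepsilon$ and $\Delta$ (and hence $\Lambda$ and $\Delta'$, which depend on $\varepsilon$ and $\Delta$), to keep the truncation probability at most $10^{-6}$ it suffices to choose the smallest $b$ such that
$$b > \log_2\!\left(\frac{6\ln(10)\,\Delta'}{\varepsilon \Lambda}\right).$$
DPrio uses $b$ to truncate the value at $2^b$.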
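The bound can be checked numerically. This Python sketch, with hypothetical function names, computes the smallest $b$ satisfying the inequality and evaluates the exact tail $2\alpha^{m+1}/(1+\alpha)$ at $m = 2^b$; the caller is assumed to supply $\Lambda$ and (an upper bound on) $\Delta'$, for example $\Delta + \Lambda$.

```python
import math


def min_noise_bits(sens_rounded, epsilon, lam, tail=1e-6):
    """Smallest integer b with 2**b > ln(1/tail) * Delta' / (epsilon * Lambda).

    With tail = 1e-6, ln(1/tail) = 6 ln(10), matching the bound above.
    """
    threshold = math.log(1.0 / tail) * sens_rounded / (epsilon * lam)
    return math.floor(math.log2(threshold)) + 1


def truncation_probability(bits, sens_rounded, epsilon, lam):
    """Exact P(|k| > 2**bits) for P(k) proportional to alpha**|k|,
    alpha = exp(-epsilon * Lambda / Delta'), i.e. 2 * alpha**(m+1) / (1 + alpha)."""
    rate = epsilon * lam / sens_rounded
    m = 2 ** bits
    return 2.0 * math.exp(-rate * (m + 1)) / (1.0 + math.exp(-rate))
```

Comparing `truncation_probability(b, ...)` with `truncation_probability(b - 1, ...)` shows how sharply the tail falls: each extra bit squares $\alpha^{m}$, so one bit below the minimum can already exceed the $10^{-6}$ target.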

C GENERALIZING DPRIO
DPrio is optimal for scenarios with a large number of clients, very few of which are malicious or controlled by the adversary. We can generalize our construction to other adversarial structures. To do so, we present an optimization problem which, given privacy parameters $\varepsilon$ and $\delta$, a utility constraint, and an adversarial structure, yields an optimal setup that achieves $(\varepsilon, \delta)$-DP in the most efficient manner. We generalize DPrio by combining it with Server-DP and Client-DP. That is, there are three points at which parties can add noise to the function being computed:
• Clients add Gaussian noise using the mechanism described by Client-DP, each with parameter $\sigma_1$;
• Clients and servers cooperate to choose $k \ge 1$ Gaussian noise values sampled by clients, as in DPrio, with parameter $\sigma_2$;
• Servers sample Gaussian noise, as in Server-DP, each with parameter $\sigma_3$.
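As a rough illustration of the quantities such an optimization trades off, the Python sketch below computes the standard deviation of the noise contributed only by honest parties (the part the adversary cannot subtract) and of all noise in the output, using the fact that a sum of independent Gaussians is Gaussian with the summed variance. The `NoiseSetup` fields, the corruption counts, and the use of the classical Gaussian-mechanism calibration $\sigma \ge \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $0 < \varepsilon < 1$) as a privacy target are assumptions for illustration, not the optimization problem defined in this appendix.

```python
import math
from dataclasses import dataclass


@dataclass
class NoiseSetup:
    sigma1: float  # per-client noise, Client-DP style
    sigma2: float  # per selected client noise, DPrio style
    sigma3: float  # per-server noise, Server-DP style
    clients: int
    selected: int  # k >= 1 selected client noise values
    servers: int


def honest_std(setup, corrupt_clients, corrupt_servers, corrupt_selected):
    """Std. dev. of the noise contributed by honest parties only.

    `corrupt_selected` counts how many of the k selected noise values come
    from corrupted clients; the adversary can subtract any noise it added.
    """
    var = ((setup.clients - corrupt_clients) * setup.sigma1 ** 2
           + (setup.selected - corrupt_selected) * setup.sigma2 ** 2
           + (setup.servers - corrupt_servers) * setup.sigma3 ** 2)
    return math.sqrt(var)


def total_std(setup):
    """Std. dev. of all noise in the output if every party follows the protocol."""
    var = (setup.clients * setup.sigma1 ** 2
           + setup.selected * setup.sigma2 ** 2
           + setup.servers * setup.sigma3 ** 2)
    return math.sqrt(var)


def classical_gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
```

Under these assumptions, a candidate setup meets the privacy target when `honest_std(...)` is at least `classical_gaussian_sigma(...)`, and among such setups a smaller `total_std(...)` means better utility.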