Multi-Party Replicated Secret Sharing over a Ring with Applications to Privacy-Preserving Machine Learning

Secure multi-party computation has seen significant performance advances and increasing use in recent years. Techniques based on secret sharing offer attractive performance and are a popular choice for privacy-preserving machine learning applications. Traditional techniques operate over a field, while designing equivalent techniques for a ring Z_{2^k} can boost performance. In this work, we develop a suite of multi-party protocols for a ring in the honest majority setting, starting from elementary operations and proceeding to more complex ones, with the goal of supporting general-purpose computation. We demonstrate that our techniques are substantially faster than their field-based equivalents when instantiated with different numbers of parties, and perform on par with or better than state-of-the-art techniques with designs customized for a fixed number of parties. We evaluate our techniques on machine learning applications and show that they offer attractive performance.


INTRODUCTION
Secure multi-party computation has recently seen notable performance improvements that make privacy-preserving computation of increasingly complex functionalities on increasingly large data sets more practical than ever before. Recent significant interest in privacy-preserving machine learning (PPML) has highlighted secret sharing techniques, which were often previously overlooked in the literature. Secret sharing (SS) offers superior performance for arithmetic operations such as matrix multiplications over other cryptographic tools, and has been extensively used for privacy-preserving neural network (NN) inference and training [14,15,18,27,36,47,49,55,56]. Because SS offers information-theoretic security, computation can proceed on short integers, aiding efficiency.
Traditionally, performance of SS techniques has been measured in terms of two parameters: the number of interactive operations and the number of sequential interactive operations, or rounds. However, for some computations such as matrix multiplication, local operations can dominate the overall cost. Traditional techniques such as Shamir SS [54] carry out computation on protected data over a field, most commonly set up as Z_p with prime p. This makes frequent use of modulo reduction a necessity, increasing the cost of the computation. To improve performance and directly utilize native instructions of modern processors, researchers turned to computation over the ring Z_{2^k} [8,12,16,20]. Unfortunately, Shamir SS, a popular and efficient choice for computation in the honest majority setting, cannot be used for computation over Z_{2^k}, and we must seek alternatives.
The honest majority setting, which assumes that only a minority of the parties carrying out the computation can be corrupt, offers great performance with reasonable trust assumptions relative to stronger settings, making a good performance-security trade-off. The techniques we are aware of in this setting which can perform computation over a ring Z_{2^k} for some k are limited to a fixed number of parties, most commonly 3 (see, e.g., [8,14,15,41,47]), and cannot tolerate collusion. This means that the techniques do not easily generalize to a larger number of participants, should there be a need to change the computation setup, e.g., to permit the use of a higher collusion threshold. This is the task we set out to address in this work: we generalize computation based on replicated secret sharing (RSS) to support more than n = 3 computational parties.
Our contributions can be summarized as follows:
• We design a comprehensive set of elementary building blocks for RSS over an arbitrary ring in the semi-honest setting. These building blocks include generating shares of pseudo-random integers and ring elements, multiplication, reconstructing a value from shares, multiplication followed by reconstruction as a single building block, denoted by MulPub, and inputting private values into the computation. We optimize the solutions to lower communication complexity by relying on a pseudo-random function. This means that the techniques are computationally secure, and they also come with formal security proofs. Our solutions are efficient and, for example, the cost of multiplication when instantiated with three parties matches custom results which apply to the three-party setting only [8,55].
• We build on the techniques of [20] and [27] to develop higher-level protocols over Z_{2^k} such as random bit generation, comparisons, conversion between different ring sizes, and more to enable general-purpose computation in this framework.
• We provide extensive benchmarks to evaluate performance of the developed techniques. We observe that when n = 3 our ring-based techniques can be between 10 and 32 times faster than their field-based counterparts for different types of operations. Incorporating recent advances in random bit generation can yield even more promising results. The improvement from switching to ring-based techniques decreases as the number of parties n grows, but with n = 7 we can still observe runtime improvements by a factor of 2 for certain operations.
• We improve the techniques of [18] for securely evaluating quantized NNs and eliminate the need for fixed-point multiplication and large truncation, which enables us to use a significantly smaller ring.
• We also evaluate performance of our techniques on machine learning applications, namely, NN predictions and quantized NN inference. Similarly, our runtimes are significantly faster than similar field-based implementations and compare favorably to the state of the art designed to work with a fixed number of parties.
For RSS-based techniques, it is expected that they will be used with a relatively small n. This is similar to the most efficient techniques based on Shamir SS (e.g., [11,13]), which also rely on RSS for certain operations.

RELATED WORK
Secret sharing [10,54] is a popular choice for secure multi-party computation, and common options include Shamir SS [54], additive SS, and RSS [31] for three parties. Computation over rings, and specifically Z_{2^k}, has recently gained attention in publications including [5,8,12,16,18,20,22,26,34,41]. We can distinguish between three-party techniques based on RSS such as [5,8,12,22,26,34,41]; multi-party techniques based on additive SS such as [16,20], often for the setting with no honest majority; and ad-hoc techniques for three or four parties that utilize one or more types of rings with constructions for specific applications, such as [33] and others. The first category is the closest to this work and includes Sharemind [12], a well-developed framework for three-party computation with a single corruption using custom protocols; Araki et al. [8], who use three parties with a single corruption to support arithmetic or Boolean circuits; and several compilers from passively secure to actively secure protocols [5,22,26,41]. Dalskov et al. [19] also studied four-party computation with a single corruption. We are not aware of existing multi-party techniques with honest majority over a ring which extend beyond three parties, or of multi-party protocols based on RSS over a ring. While RSS is meaningful only for a small number of parties, we still find it desirable to support more participants and build additional techniques for this setting. For example, if our matrix multiplication protocol over a ring with three parties is 100 times faster than field-based computation, it will remain faster even if the work increases when the number of parties is larger than 3.
We rely on the results of Damgård et al. [20] for some of our protocols. While that work is for the SPDZ_{2^k} framework [16] in the malicious setting with no honest majority, once we develop elementary building blocks, the structure of higher-level protocols can remain similar. Composite protocols such as comparison, conversion, and truncation require a large number of random bits. We leverage the edaBit protocol from [27] to efficiently generate sets of binary and arithmetic shared bits. Their technique improves upon the daBit technique [52]. Rabbit [44] builds on daBits [52] and edaBits [27] and develops an efficient n-party comparison protocol by relying on commutativity of addition over fields and rings. Their protocol offers significant improvement over [27] in most adversarial settings over a field, but remains comparable in the passively secure honest majority setting over a ring.

[Table 1 body omitted: rows list frameworks such as MiniONN [42] and SecureML [48]; columns cover the security setting (semi-honest or malicious), the techniques used (HE, GC, SS), the number of parties, and the networks evaluated.]
Table 1: Comparison of state-of-the-art PPML frameworks. (*) [18] supports n parties in the semi-honest, honest majority setting over a field F_p, but only three parties over a ring. The two NNs we consider are [42]'s four-layer convolutional NN and the quantized version of MobileNets (qMob.) [30].
Literature on PPML is also related to this work, and we present a high-level comparison of the current state of the art in Table 1. Each framework is categorized according to its security assumptions (semi-honest or malicious), the cryptographic techniques used, the number of parties supported, and the methods of evaluation. We highlight several key works below.
We distinguish between two-party solutions, where one party holds the model and the other holds the input on which the model is to be evaluated, and multi-party (typically, three-party) solutions. Publications from the first category include MiniONN [42] and Gazelle [33], both of which studied NN evaluation using SS, homomorphic encryption (HE), and garbled circuits (GC).
Multi-party constructions provide protocols for training and inference across multiple parties. ABY3 [47] combines techniques based on replicated and binary SS with GCs in the three-party setting with honest majority. SecureNN [55] provides three-party protocols for a variety of NN functions under the same security assumption as ABY3. Their protocols are asymmetric, with parties having dedicated roles in the computation. This work is improved upon by FALCON [56], which adds malicious security with honest majority and combines the techniques from SecureNN and ABY3.
ASTRA [14] is a three-party framework that uses SS over the ring Z_{2^k} under both semi-honest and malicious security assumptions. Similar to SecureNN, its protocols are asymmetric. Abspoel et al. [6] apply the MP-SPDZ [34] framework to secure outsourced training of decision trees. Their system operates under the three-party, honest-majority assumption with RSS. Dalskov et al. [18] were the first to address quantized NN inference using secure multi-party computation. Their system is built into MP-SPDZ and benchmarked on the MobileNets [30] network architecture. Keller et al. [36] conduct quantization-based training and inference with three parties and one semi-honest corruption.

PRELIMINARIES

Secure Multi-Party Computation
We consider a secure multi-party setting with n computational parties, out of which at most t can be corrupt. We work in the setting with an honest majority, i.e., t < n/2, and semi-honest participants, and use simulation-based security (see Appendix B for detail).
As customary with SS techniques, the set of computational parties does not have to coincide with (and can be formed independently of) the set of parties supplying inputs in the computation (input providers) and the set of parties receiving output of the computation (output recipients). Then, if a computational party learns no output, the computation should reveal no information to that party. Consequently, if we wish to design a functionality that takes secret-shared input and produces shares of the output, any computational party should learn nothing from the protocol execution.

Secret Sharing
A SS scheme allows one to produce shares of a secret x such that access to a predefined number of shares reveals no information about x. In the context of secure multi-party computation, each of the n participants receives one or more shares, and in the case of (n, t) threshold SS schemes, possession of the shares stored at any t or fewer parties reveals no information about x, while access to the shares stored at t + 1 or more parties allows for reconstruction of x. Of particular importance are linear SS schemes, which have the property that a linear combination of secret-shared values can be computed locally on the shares. Examples of linear SS schemes include additive SS with x = Σ_i x_i (as used in Sharemind [12] with n = 3 and in SPDZ [23] with any n), Shamir SS, which realizes (n, t) secret sharing with t < n/2 and represents a share as the evaluation of a polynomial on a distinct point, and RSS, which we discuss next.

Replicated Secret Sharing
Our techniques utilize RSS [31], which has an associated access structure Γ. An access structure is defined by qualified sets A ∈ Γ, which are the sets of participants who are granted access, and the remaining sets of participants are called unqualified sets. In the context of this work we only consider threshold structures in which any set of t or fewer participants is not authorized to learn information about private values (i.e., they form unqualified sets), while any t + 1 or more participants are able to jointly reconstruct the secret (and thus form qualified sets). RSS can be defined for any n ≥ 2 and any t < n. To secret-share private x using RSS, we treat x as an element of a finite ring R and additively split it into shares x_T such that x = Σ_{T∈𝒯} x_T (in R), where 𝒯 consists of all maximal unqualified sets of Γ (i.e., all sets of t parties in our case). Then each party p ∈ [1, n] stores shares x_T for all T ∈ 𝒯 subject to p ∉ T. In the general case of (n, t)-threshold RSS, the total number of shares is (n choose t), with (n−1 choose t) shares stored by each party, which can become large as n and t grow. In what follows, we use notation [x] to mean that (private) x is secret shared among the parties using RSS.
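To make the sharing concrete, the following cleartext sketch (our own illustration; the function names rss_share, rss_reconstruct, and party_view are not from any implementation) builds an (n, t) RSS sharing over Z_{2^32} and checks the share counts stated above:

```python
import itertools
import secrets

def rss_share(x, n, t, modulus):
    """Split x into one additive share per maximal unqualified set T
    (every t-subset of the n parties); x = sum of all shares mod modulus."""
    sets = list(itertools.combinations(range(1, n + 1), t))
    shares = {T: secrets.randbelow(modulus) for T in sets[:-1]}
    shares[sets[-1]] = (x - sum(shares.values())) % modulus
    return shares

def rss_reconstruct(shares, modulus):
    # any group of parties jointly holding all shares can reconstruct x
    return sum(shares.values()) % modulus

def party_view(shares, p):
    """Shares stored by party p: all x_T with p not in T."""
    return {T: v for T, v in shares.items() if p not in T}
```

For (n, t) = (5, 2), there are (5 choose 2) = 10 shares in total and each party stores (4 choose 2) = 6 of them, matching the counts above.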
In what follows, we use the notation ← to denote output of randomized algorithms, while the notation = refers to deterministic assignment.

BASIC PROTOCOLS
Recall that RSS enjoys the linear property. In addition to adding secret-shared values, we use the ability to add/subtract known integers to a secret-shared value [a] and to multiply a secret-shared value [a] by a known integer. Addition [a] + c converts c to [c] without using randomness (e.g., we could set one share to c and the remaining shares to 0 to maintain Σ_{T∈𝒯} c_T = c). Multiplication c·[a] is performed locally by multiplying each share of a by c. For convenience and without loss of generality, we let n = 2t + 1. When n > 2t + 1, 2t + 1 parties can carry out the computation on a reduced set of shares in such a way that there is no need to involve the remaining parties in the computation.
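Assuming a (3, 1) instantiation with shares indexed by the singleton sets {1}, {2}, {3}, the local linear operations above can be sketched as follows (a cleartext simulation with hypothetical helper names):

```python
import secrets

MOD = 2 ** 32
# A (3,1) sharing: three shares x_{1}, x_{2}, x_{3}; party p holds the two
# shares x_T with p not in T, and x = x_1 + x_2 + x_3 (mod 2^32).

def share(x):
    s1, s2 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return [s1, s2, (x - s1 - s2) % MOD]

def reconstruct(sh):
    return sum(sh) % MOD

def add_public(sh, c):
    # [x] + c: add c to a single designated share, others unchanged
    return [(sh[0] + c) % MOD] + sh[1:]

def mul_public(sh, c):
    # c * [x]: every share is multiplied by c locally
    return [(v * c) % MOD for v in sh]

sh = share(10)
assert reconstruct(add_public(sh, 5)) == 15
assert reconstruct(mul_public(sh, 3)) == 30
```

Both operations require no interaction, which is what the linearity of RSS buys us.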

Random Number Generation
We will be using two types of random number generation, which we discuss here.

PRG. Invocation of [a] ← PRG([k]) is realized by independently executing a PRG algorithm on each share of k without interaction between the parties. Because the output of PRG([k]) is private, we expect it to produce a sequence of secret-shared values (represented as ring elements). Furthermore, in our construction we only call the PRG to obtain random (secret-shared) ring elements. This means that calling PRG(k_T) to produce pseudo-random a_T will result in PRG([k]) generating [a], where a is pseudo-random as well because a = Σ_{T∈𝒯} a_T (in R). This is similar to evaluating a PRF on a secret-shared key in the RSS setting without interaction in [17].
PRG(k_T) can be realized internally using any suitable algorithm, as long as it is consistent among the computational parties. For example, because of the speed of AES encryption on modern processors, one might implement PRG(k_T) = PRF(k_T, 0)||PRF(k_T, 1)||..., where PRF : R × {0, 1}^κ → R is a PRF instantiated with AES.
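A minimal sketch of this construction follows, with HMAC-SHA256 standing in for the AES-based PRF (the class and function names are ours, not from any implementation). It illustrates the key property: two parties holding the same key k_T and making the same number of calls obtain identical streams.

```python
import hashlib
import hmac

RING_BITS = 32

def prf(key: bytes, counter: int) -> int:
    """PRF stand-in: HMAC-SHA256 in place of AES, output mapped into the ring."""
    digest = hmac.new(key, counter.to_bytes(16, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** RING_BITS)

class PRG:
    """G = PRG(k_T): successive next() calls return PRF(k_T,0), PRF(k_T,1), ...
    All holders of k_T making the same number of calls stay in the same state."""
    def __init__(self, key: bytes):
        self.key, self.ctr = key, 0
    def next(self) -> int:
        out = prf(self.key, self.ctr)
        self.ctr += 1
        return out

g1, g2 = PRG(b"shared key k_T"), PRG(b"shared key k_T")
assert [g1.next() for _ in range(3)] == [g2.next() for _ in range(3)]
```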
Let G = PRG([k]). When the output of G is not consumed all at once, we use notation G.next to retrieve the next (secret-shared) element from G. Similarly, if G_T = PRG(k_T), notation G_T.next refers to the next pseudo-random share output by G_T.

PRandR.
[a] ← PRandR() computes a secret-shared random element of ring R. We implement this function by executing PRG([k]).next, where k is a system-wide key. The key k is set up at system initialization time (in the form of secret shares) and does not change throughout program execution.

Multiplication
To multiply [a] and [b], observe that c = a·b = Σ_{(T_1,T_2)∈𝒯²} a_{T_1}·b_{T_2} and that for any (T_1, T_2) pair, there will be a party holding shares a_{T_1} and b_{T_2}; thus, performing this operation involves local multiplication and addition over different choices of T_1, T_2. More formally, let mapping ρ : 𝒯 × 𝒯 → [1, n] denote a function that for each pair (T_1, T_2) ∈ 𝒯² dedicates a party p ∈ [1, n] responsible for computing the product a_{T_1}·b_{T_2} (clearly, p must possess shares a_{T_1} and b_{T_2}). For performance reasons, we also desire that ρ distributes the load across the parties as fairly as possible.
As a result of this (local) computation, the parties hold additive shares of the product c = a·b, which need to be converted to RSS for consecutive computation. This conversion was realized in early publications [9,45] by having each party create replicated secret shares of its result and distribute each share to the parties entitled to knowing it (i.e., party p receives shares c_T from each party for each T ∈ 𝒯 subject to p ∉ T). This results in each participant creating (n choose t) shares and sending (n−1 choose t) of them to each party. Consequently, each participant adds the values received for share index T and stores the sum as c_T, for each T in its possession.
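The naive information-theoretic resharing described above can be sketched for the (3, 1) setting as follows (a cleartext simulation; the choice of ρ among the eligible parties is arbitrary, and all names are ours):

```python
import secrets
from itertools import product

MOD = 2 ** 16
SETS = [(1,), (2,), (3,)]        # maximal unqualified sets for (n, t) = (3, 1)

def share(x):
    s = {T: secrets.randbelow(MOD) for T in SETS[:-1]}
    s[SETS[-1]] = (x - sum(s.values())) % MOD
    return s

def reconstruct(s):
    return sum(s.values()) % MOD

def multiply(a_sh, b_sh):
    # Step 1: assign each of the 9 share pairs (T1, T2) to a party rho(T1, T2)
    # holding both shares (party p holds x_T iff p not in T); the assigned
    # parties' local sums are additive shares of c = a * b.
    local = {p: 0 for p in (1, 2, 3)}
    for T1, T2 in product(SETS, repeat=2):
        p = next(q for q in (1, 2, 3) if q not in T1 and q not in T2)
        local[p] = (local[p] + a_sh[T1] * b_sh[T2]) % MOD
    # Step 2 (naive resharing, as in the early information-theoretic variant):
    # each party re-shares its additive piece; parties then add share-wise.
    out = {T: 0 for T in SETS}
    for p in (1, 2, 3):
        for T, v in share(local[p]).items():
            out[T] = (out[T] + v) % MOD
    return out
```

The PRG-based optimization developed below replaces most of the communicated shares in step 2 with locally generated pseudo-random values.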
More recent work, e.g., [8] and others, traded information-theoretic security (in the presence of secure channels) for communication efficiency by having the parties generate shared (pseudo-)random values. We pursue this direction as well. However, if this idea is applied naively, it results in unnecessarily high overhead. In particular, if we instruct each party p to generate all (n choose t) shares of its value, some shares will be known to more than t + 1 participants and thus do not contribute to secrecy. Instead, our solution eliminates the shares indexed by sets T with p ∈ T, i.e., shares that p does not possess in the sharing itself, as they do not contribute to secrecy. Thus, our construction utilizes key material consistent with the setup of the RSS scheme. In particular, we use the same key setup as in pseudo-random secret sharing, where k_T is known by all p ∉ T. Then when a party needs to generate a pseudo-random share of its value for share index T, the party draws it from the PRG seeded with k_T.
We note, however, that multiple participants may need to draw from the PRG seeded with k_T to produce shares of their values, and it is generally not safe to use the same secret to protect multiple values, which is also the case in our application. Instead, multiple elements might be drawn from the PRG (seeded with k_T) to protect different values, and consistent use of the PRG with each seed can be set up by the participants ahead of time, such that this information is public knowledge.
In addition to the mapping ρ, our multiplication protocol requires another mapping χ : [1, n] → 𝒯, which specifies for each party p the share index χ(p) (subject to p ∉ χ(p)) for which p communicates its share (with all other shares of p's value being produced as pseudo-random elements). As before, we desire to choose the values of χ(p) so as to evenly distribute the load and communication.
The above intuition leads us to the optimized n-party multiplication protocol given as Protocol 1. After computing its private value c^(p) according to ρ, each party p distributes it into (n−1 choose t) additive shares (one of which is communicated while the others are computed using PRGs). Afterwards, each party sets its c_T as a sum of t + 1 shares (computed or received) of the values c^(p′) for each party p′ entitled to shares c_T. This matches the fact that each share c_T of secret c is maintained by t + 1 parties. Correctness is achieved by ensuring that in Protocol 1 two different participants p and p′ with access to shares c_T consistently associate the values that they draw from G_T with shares belonging to different parties by always processing the values in the increasing order of participants' IDs. Preparation of the shares in Protocol 1 is done on lines 10-16, where a participant either masks its share with a pseudo-random value because it is used by another party or forms its own shares and the value to be transmitted.
In this protocol, each party on average sends t ring elements and draws (n−1 choose t) − 1 + (n−1)·(n−2 choose t) − t pseudo-random elements. The latter can be explained by the party using (n−1 choose t) − 1 pseudo-random shares for its value being re-shared and the (n−2 choose t) shares that it has in common with any other party, except the t values that it receives with a symmetric communication pattern. (Recall that each party maintains (n−1 choose t) shares of a secret and has (n−2 choose t) shares in common with any other party.) When the communication pattern is not symmetric, the overall amount of work and communication remains unchanged, but it may be distributed differently. Thus, we refer to the average work and communication in that case.
Compared to other results, the three-party version of our protocol matches the communication of the recent multiplication protocol from [8], which is available only for three parties, and improves on the communication of Sharemind's three-party multiplication from [37] by a factor of 2. For multi-party multiplication, it can be desirable to use a different communication pattern in which a designated party reconstructs a protected value and communicates it to others (as in, e.g., [21]), which scales better as n grows. However, our version has lower communication when n = 3, uses fewer rounds, and n is typically small with RSS.
We state security of multiplication as follows, with the proof available in Appendix B: Protocol 1 is secure according to Definition 1 in the (n, t) setting with n = 2t + 1 in the presence of secure communication channels and assuming PRG is a pseudo-random generator.
Our multiplication protocol shares conceptual similarities with the (optimized) multiplication from [35]. In particular, both sample pseudo-random secret shares according to the access structure and communicate a single (properly protected) element to a number of other participants. Our solution explicitly defines all mappings and the computation associated with computing each share of the output, while the latter appears to be under-specified in [35].
The computation associated with multiplication can be generalized to compute the dot-product of two secret-shared vectors [x] and [y], or to evaluate any other multi-variate polynomial of degree 2, using the same communication and the same number of cryptographic operations as in multiplication. For that purpose, we only need to change the computation in step 3 of the multiplication protocol. For example, for DotProd, we modify step 3 so that each party sums the products of the corresponding shares over all vector elements (in R), while the rest of the steps remain unchanged.
Table 2 shows performance of these and other basic protocols for the general (n, t) and the (3, 1) settings. Communication is measured as the number of ring elements sent by each party, and computation is the number of cryptographic operations (i.e., retrievals of the next pseudo-random element using a PRG) per party.

Revealing Private Values
Open. Reconstruction of a secret-shared value a = Open([a]) amounts to communicating missing shares to each party such that the value can be reconstructed locally from all shares. Recall that there are (n choose t) total shares and each party holds (n−1 choose t) of them. Thus, each party receives m = (n choose t) − (n−1 choose t) missing shares during this operation.
Our next observation is that when n is not small (such as when n = 7), the value of m will exceed n, and transmitting m messages to each party is not needed. Since the value is reconstructed as the sum of all shares, it is sufficient to communicate sums of shares instead of the individual shares themselves. Recall that [a] can be reconstructed by t + 1 parties. This means that it is sufficient for a participant to receive one element (i.e., a sum of the necessary shares) from t other parties.
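The observation that any t + 1 parties jointly hold all (n choose t) shares can be checked with a small cleartext sketch for (n, t) = (5, 2) (function names are ours):

```python
import itertools
import secrets

MOD = 2 ** 32
N, T = 5, 2

def share(x):
    sets = list(itertools.combinations(range(1, N + 1), T))
    s = {S: secrets.randbelow(MOD) for S in sets[:-1]}
    s[sets[-1]] = (x - sum(s.values())) % MOD
    return s

def open_among(shares, parties):
    """Any t+1 parties jointly hold every share (no t-subset can contain all
    of them), so pooling their share sets reconstructs x; in the protocol each
    party only needs one ring element (a sum of shares) from t others."""
    held = {}
    for p in parties:
        held.update({S: v for S, v in shares.items() if p not in S})
    assert len(held) == len(shares)   # t+1 parties cover all C(n,t) shares
    return sum(held.values()) % MOD
```

Here any three of the five parties suffice to recover the secret, which is why one element from each of t = 2 other parties is enough for Open.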
As before, we would like to balance the load between the parties and ideally have each party transmit the same amount of data. This means that we instruct each party to send information to t other parties according to another agreed-upon mapping ω : [1, n] → (𝒯 × [1, n])^t. For each party p, this mapping specifies which of p's shares should be communicated to which other party. The mapping ω then defines the computation associated with this operation: each p computes the sum (in R) of the shares designated for a given p′ ≠ p present in the mapping and sends the result to p′.
Similar to other SS frameworks, simply opening the shares of a maintains security of the computation (in the sense that no information about private values is revealed beyond the opened value a). This is because we maintain that at the end of each operation secret-shared values are represented using random shares. In particular, it is clear that this holds for the output of PRG([k]).next and PRandR().

MulPub. c = MulPub([a], [b]) refers to multiplying two secret-shared values [a] and [b] and opening their product c. We discuss this functionality because, in the past, this operation could be implemented more efficiently than multiplication followed by an opening in alternative SS frameworks (e.g., see [13]), and we pursue a similar direction here. In the protocol we present, MulPub is realized in a single round without increasing communication.

[Table 2 body omitted: it lists each basic operation with its round, communication, and computation costs.]
Table 2: Performance of basic RSS operations with computation and communication per party.

In multiplication, after computing a product, each locally calculated value is no longer random and must be re-randomized prior to opening it. In our RSS setting, this is realized by relying on parties locally computing pseudo-random values. Specifically, we associate a secret key k_T with each T ∈ 𝒯 (i.e., these are the same key shares used with PRandR() and multiplication) and use pseudo-random values G_T.next to protect the share of the product that each party locally computes, prior to that party revealing its randomized value to all others. We require that all blinding pseudo-random values sum to 0 to ensure the reconstructed product is correct. In the three-party case, this can be achieved by adding some pseudo-random values and subtracting others, as illustrated in Figure 3.
With larger n and t, we must be careful to draw new elements from each PRG to ensure that values released by different parties are protected using proper randomness, without reusing it. This is similar to the logic used in multiplication. Then, to realize this logic and ensure that all blinding factors add to 0, when multiple values are sampled from G_T, the last blinding value is set to the sum of all previously drawn elements multiplied by −1 (in R). We provide a detailed description of MulPub in Protocol 2, where G_T and k_T are defined as in multiplication.
In this protocol, each party draws the same number of elements from each G_T in its possession to ensure that after a single protocol execution all parties are in the same state (but a party may discard some computed values). Similar to the computation in multiplication, we order the parties based on the values of their IDs. Because any given share index T is stored at t + 1 parties, there are t calls to each G_T per invocation of this operation. Then the participant with the lowest ID among the parties with access to k_T (j = 0) uses the first element of G_T to protect its value c^(p) and disregards the t − 1 other elements, the participant with the next lowest ID uses the second element, etc. The participant with the highest ID among those with access to k_T (j = t) computes the sum of all t elements drawn from G_T and subtracts the sum from its c^(p). Correctness follows from the fact that the sum of all blinding values over all parties and all shares is equal to 0, i.e., c = Σ_p c^(p) = Σ_p ĉ^(p) (in R).
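The zero-sum structure of the blinding values can be illustrated for (n, t) = (5, 2), with a hash-based stand-in for the PRGs G_T (all names are ours): for each key k_T, t elements are drawn; the first t holders (in ID order) each add one element and the last holder subtracts their sum, so all blinding values cancel.

```python
import hashlib
from itertools import combinations

N, T_CORR = 5, 2          # (n, t) = (5, 2), n = 2t + 1
MOD = 2 ** 32

def prg_elems(key, count):
    # deterministic stand-in for G_T (hashing in place of AES)
    return [int.from_bytes(hashlib.sha256(key + bytes([i])).digest()[:4],
                           "big") % MOD for i in range(count)]

blind = {p: 0 for p in range(1, N + 1)}
for T in combinations(range(1, N + 1), T_CORR):
    holders = sorted(p for p in range(1, N + 1) if p not in T)  # t+1 parties
    elems = prg_elems(repr(T).encode(), T_CORR)                 # t draws from G_T
    for j, p in enumerate(holders):
        if j < T_CORR:
            blind[p] = (blind[p] + elems[j]) % MOD    # j-th holder adds elems[j]
        else:
            blind[p] = (blind[p] - sum(elems)) % MOD  # last holder subtracts sum

# blinding values over all parties and all keys sum to zero (mod 2^32),
# so the masked local products still sum to the true product
assert sum(blind.values()) % MOD == 0
```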
To show security, we prove the following result: MulPub (Protocol 2) is secure according to Definition 1 in the (n, t) setting with n = 2t + 1 assuming PRG is a pseudo-random generator.
Similar to multiplication, MulPub can be generalized to evaluate any (multi-variate) polynomial of degree 2 and open the result.

Inputting Private Values
There will be a need to enter private values into the computation in subsequent protocols, and we defer two variants of this functionality (when input is provided by an external party and when it is provided by one of the computational parties) to Appendix A.

COMPOSITE PROTOCOLS
While the previous operations can be instantiated to work with any finite ring, the techniques in this section work only in a ring Z_{2^k} for some k. The ring Z_{2^k} is the primary motivation for supporting secure computation over rings, because it enables utilization of native CPU instructions for ring operations.
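The point about native instructions can be illustrated in a few lines: reduction mod 2^64 is a free wraparound on a 64-bit processor (modeled below by masking, since Python integers are arbitrary-precision), whereas field arithmetic requires an explicit modular reduction. The prime chosen here is only illustrative.

```python
K = 64
MASK = (1 << K) - 1      # reduction mod 2^64: a free wraparound / single AND
P = (1 << 61) - 1        # a Mersenne prime, for the field comparison

def ring_mul(a, b):
    # on a 64-bit CPU this is one native multiply instruction
    return (a * b) & MASK

def field_mul(a, b):
    # field arithmetic needs an explicit (and costlier) modular reduction
    return (a * b) % P

assert ring_mul(2 ** 63, 2) == 0          # wraps around mod 2^64
assert field_mul(P - 1, 2) == P - 2
```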
The goal of this work is to enable efficient general-purpose computation over rings Z_{2^k}; we therefore focus on major building blocks which can subsequently be used to compose protocols for arbitrary functionalities, including machine learning tasks. Of central importance to this effort is the development of comparison protocols (for both less-than comparison and equality testing functionalities), which are known to be difficult to design in a framework where the elementary techniques are based on arithmetic gates. Others include bit decomposition and truncation (i.e., division by a power of 2). Combined, these techniques can enable Boolean, integer, fixed-point, and even floating-point arithmetic, as well as array and related operations, giving the ability to compose general-purpose protocols.
Because a number of protocols for common operations over Z_{2^k} have already been developed, some of the constructions that we mention in these sections are adaptations of prior protocols to our setting, and we defer their specification to the appendix. In particular, Appendix A provides the specification of a random bit generation protocol, RandBit, that produces a bit shared in Z_{2^k}, and of a more recent version from [27], edaBit, that generates a number (k in our case) of random bits r_i shared in Z_2 together with a representation of the bits as an integer r = Σ_{i=0}^{k−1} 2^i r_i shared in Z_{2^k}. The former can be computed in a single round, while the latter uses noticeably lower communication per bit, but its round complexity is logarithmic in k and n.
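A cleartext sketch of the edaBit output (using 0-based bit indices and hypothetical helper names; the actual protocol produces the shares interactively) shows the relationship between the binary-shared bits and the arithmetic-shared integer:

```python
import secrets

K = 16
MOD = 2 ** K

# edaBit output (cleartext view): k random bits r_i shared in Z_2 together
# with r = sum_i 2^i r_i shared in Z_{2^k}.
bits = [secrets.randbelow(2) for _ in range(K)]
r = sum(b << i for i, b in enumerate(bits)) % MOD

def add_share(x, modulus, parties=3):
    s = [secrets.randbelow(modulus) for _ in range(parties - 1)]
    return s + [(x - sum(s)) % modulus]

bit_shares = [add_share(b, 2) for b in bits]   # shares over Z_2
r_shares = add_share(r, MOD)                   # shares over Z_{2^k}

# consistency: the binary-shared bits recombine to the arithmetic-shared r
rec_bits = [sum(s) % 2 for s in bit_shares]
assert sum(b << i for i, b in enumerate(rec_bits)) == sum(r_shares) % MOD
```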
We also describe a comparison algorithm for computing [a] ≤ [b], which is commonly implemented by determining the most significant bit of the difference between a and b and is denoted by MSB. Performance of these protocols is summarized in Tables 3 and 13.
Truncation is a necessary building block when working with fixed-point values or simulating fixed-point computation using integer arithmetic, and it permits us to minimize the ring size. Starting from [13], probabilistic truncation of input a by m bits, which produces ⌊a/2^m⌋ + b where b is a bit, is significantly faster than precise truncation that rounds down. It is biased towards rounding to the integer nearest to a/2^m and is sufficient for our purposes. The protocol we present, TruncPr([a], m), is a constant-round solution that combines the approach from [18] with edaBits from [27] and inherits from [27] the requirement that input a is 1 bit shorter than the ring size, i.e., MSB(a) = 0. We use notation [x]_ℓ to denote that SS is over Z_{2^ℓ}.
The truncation protocol, given as Protocol 3, uses related random values r and r′, the bit decompositions of which are known, where r = Σ_{i=0}^{k−1} 2^i r_i is a full-size random value and r′ = Σ_{i=m}^{k−1} 2^{i−m} r_i is the portion remaining after truncating m bits. We thus modify the edaBit protocol to produce those values simultaneously. Each of [r] and [r′] is computed as a sum of t + 1 integers, so we must compensate for two types of carries: (i) addition of the m least significant bits in r will produce carry bits into the next positions which are not accounted for in r′, and (ii) while the carry bits past the k bits are automatically removed in the ring when computing r, these bits remain in r′ due to its shorter length. Because we compute the bitwise representation of r using the bitwise addition protocol BitAdd, we can also extract the carry bit into any desired position, which is already computed during the addition. The logic of the truncation protocol necessitates the removal of the (k − 1)th bit. For this reason, we capture carries into the mth and (k − 1)th positions and denote those bits from the jth call to BitAdd as cr_{j,m} and cr_{j,k−1}, respectively (line 10). We subsequently convert the 2 log(t + 1) carry bits and the most significant bit of r, denoted r_{k−1}, from shares over Z_2 to Z_{2^k} using the binary-to-arithmetic sharing protocol B2A (from [20]). All interactive operations except the last one (line 20) can be precomputed. Security follows from the protocol logic as specified in prior work and from security of the building blocks.
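The mask-open-truncate-unmask logic behind probabilistic truncation can be sketched in the clear as follows. This is only an illustration of the arithmetic, not the protocol: here the wrap-around of a + r is detected directly, whereas the protocol handles it via the captured carry bits and the requirement MSB(a) = 0.

```python
import secrets

K, M = 32, 8
MOD = 2 ** K

def trunc_pr(a):
    """Cleartext sketch of probabilistic truncation: returns floor(a/2^m) + b
    for some bit b, assuming MSB(a) = 0 (i.e., a < 2^(k-1))."""
    r = secrets.randbelow(MOD)     # full-size random value, bits known
    r_trunc = r >> M               # r' = bits m..k-1 of r
    c = (a + r) % MOD              # masked value that is safe to open
    wrap = 1 if c < r else 0       # did a + r overflow the ring?
    return ((c >> M) - r_trunc + wrap * (MOD >> M)) % (MOD >> M)

a = 123456789                      # < 2^(k-1), so MSB(a) = 0
out = trunc_pr(a)
assert out in (a >> M, (a >> M) + 1)   # exact result, possibly rounded up
```

The "+1" case occurs when the low m bits of a and r produce a carry, which is exactly the rounding bias mentioned above.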
It is also possible to use the above protocol to truncate an input [x] by a private number of bits [k], as outlined in [18]. Let K be a public upper bound on k. Protocol TruncPriv([x], [k], K) then needs to securely compute [2^{K−k}] · [x] and can subsequently call TruncPr([2^{K−k} · x], K). A performance summary is given in Table 3.
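The identity behind TruncPriv is elementary and can be checked in cleartext (exact truncation is shown for clarity, whereas the protocol uses the probabilistic variant; x must leave K − k bits of headroom below the ring size):

```python
def trunc_priv(x, k, K):
    # Truncating x by a private k with public bound K >= k: shift left by
    # K - k, then truncate a fixed K bits.
    return (x * 2 ** (K - k)) >> K

assert trunc_priv(1000, 3, 8) == 1000 >> 3
assert all(trunc_priv(x, k, 16) == x >> k for x in range(256) for k in range(8))
```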
A neural network is a series of interconnected layers consisting of neurons. Each neuron has an associated weight and bias used for computation on some input data and outputs a prediction based on that data. An NN layer takes the form y = f(xW + b), where x is the input vector from the previous layer, W is the weight tensor, b is the bias vector, and f is some activation function. Sample activation functions are the Rectified Linear Unit (ReLU), which on input x = (x_1, ..., x_n) computes y = (y_1, ..., y_n) where each y_i = max(0, x_i), and its variant ReLU6, which computes y_i = min(max(0, x_i), 6).
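In cleartext, the two activation functions above are one-liners:

```python
def relu(x):
    # ReLU: clamp each element from below at 0
    return [max(0, v) for v in x]

def relu6(x):
    # ReLU6: additionally clamp from above at 6
    return [min(max(0, v), 6) for v in x]

assert relu([-2, 3, 9]) == [0, 3, 9]
assert relu6([-2, 3, 9]) == [0, 3, 6]
```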

Share Conversion
Conventional NN evaluation uses floating-point arithmetic, while secure evaluations typically employ fixed-point computation, or emulate it on integers, for performance reasons. If inputs are represented in the form of fixed-length integers, the values grow with each layer that performs matrix multiplication. This can impact performance because comparison-based activation and pooling operations have cost linear in the bitlength of ring elements. For this reason, it can be advantageous to start with a smaller ring size and increase it mid-computation to accommodate longer values. This approach involves converting secret-shared [x]_k over Z_{2^k} to a different representation [x]_{k′} over Z_{2^{k′}} for k′ > k. Conversion techniques between certain types of fields are known [24], but they do not apply to our case. Simply casting k-bit shares to k′-bit shares for k′ > k affects correctness because the overflow resulting from share addition is no longer reduced modulo 2^k. Thus, the task is to keep the k least significant bits of the value and erase the remaining bits in the longer share representation. One way to achieve this is to invoke truncation. However, because computing exact truncation is costlier for rings than for fields, we design a more efficient version based on bit decomposition. In particular, we perform bit decomposition of [x]_k into shares of bits in Z_2, convert the bit shares to Z_{2^{k′}}, and reassemble [x]_{k′}. This procedure, denoted Convert, is given as Protocol 4 using edaBits. An equivalent version can be constructed using RandBit. It is based on bit decomposition from [20] and uses Boolean-to-arithmetic conversion B2A from Z_2 to Z_{2^{k′}} and bitwise integer addition BitAdd. Performance is summarized in Table 3.
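The correctness problem with naive casting can be seen with additive sharing over Z_{2^k} (a simplification of the paper's replicated sharing; the variable names are ours):

```python
import random

k, k2 = 16, 32                       # source and target ring sizes
x = 1234
s1 = random.randrange(2 ** k)
s2 = random.randrange(2 ** k)
s3 = (x - s1 - s2) % 2 ** k          # s1 + s2 + s3 = x (mod 2^k)
assert (s1 + s2 + s3) % 2 ** k == x

# Reinterpreting the same shares modulo 2^k2 keeps the overflow multiples of
# 2^k that the smaller modulus used to erase:
lifted = (s1 + s2 + s3) % 2 ** k2
assert lifted % 2 ** k == x                        # low k bits still correct ...
assert lifted in (x, x + 2 ** k, x + 2 * 2 ** k)   # ... high bits are garbage
```

With three summands the overflow is 0, 1, or 2 multiples of 2^k, which is exactly what bit decomposition and reassembly in the larger ring removes.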

Quantized Neural Networks
To improve the efficiency of NN inference, it is common to employ quantization, which makes the resulting models suitable for deployment in constrained environments and is a well-studied field (see, e.g., [29]). We outline the standard quantization approach from [32] and its privacy-preserving realization from [18] for quantized TFLite models, and subsequently describe our optimizations.
For a vector x, each real-valued x_i is represented as x_i = s(x̄_i − z), where s ∈ R is the scale and z and x̄_i are 8-bit integers with z being the zero point. Given an input column vector x = (x_1, ..., x_n) and a row vector w = (w_1, ..., w_n) of W with quantization parameters (s_1, z_1) and (s_2, z_2), respectively, the dot product of x and w, c = Σ_{i=1}^n x_i w_i, is specified with quantization parameters (s_3, z_3).
Computing c requires integer-only arithmetic and is guaranteed to fit in 16 + log n bits. The scale s = s_1 s_2 / s_3 is a small real number. It can be written as s = 2^{−d} s′ with normalized s′ ∈ [0.5, 1), which determines the value of d, and s′ is represented as a 32-bit integer s′′, where s′ ≈ 2^{−31} s′′. Two-dimensional convolutions typically add a quantized bias b once the dot product is computed. This is handled by setting the scale of the bias to s_1 s_2 and its zero point to 0, so that the bias can be added to c prior to scaling. The last step of a convolution layer is to apply an activation function such as ReLU6. In a quantized NN, this functions as a clamping operation which eliminates values outside of the range [0, 255] and uses s_3 = 6/255 and z_3 = 0. This guarantees the correct range while maximizing precision with 8-bit quantized values. Going forward, s_3 becomes s_1 for the next layer, and thus all intermediate layers share the same s_1 = s_3 = 6/255. Other activation functions such as sigmoid would be handled differently, but we only consider clamping-based functions like [18].
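The integer-only rescaling described above can be sketched in cleartext as follows (a simplified illustration of the quantization scheme of [32], not the secure protocol; the helper names are ours):

```python
import math

def split_scale(s):
    # Write s = 2**-d * s_prime with s_prime in [0.5, 1), and represent
    # s_prime as a 32-bit integer m with s_prime ~ m / 2**31.
    s_prime, e = math.frexp(s)           # s = s_prime * 2**e
    return round(s_prime * 2 ** 31), -e  # (m, d)

def quantized_dot(xq, wq, z1, z2, s1, s2, s3, z3):
    c = sum((x - z1) * (w - z2) for x, w in zip(xq, wq))  # integer dot product
    m, d = split_scale(s1 * s2 / s3)
    y = (c * m) >> (31 + d)              # multiply by s'' and truncate 31 + d bits
    return min(max(y + z3, 0), 255)      # clamping activation, e.g., ReLU6

y = quantized_dot([10, 130, 200], [130, 129, 128], 5, 128, 0.02, 0.05, 6 / 255, 0)
assert 5 <= y <= 6                       # real-valued result is ~5.74
```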
Computing the convolution layer securely requires the model owner to enter private quantization parameters into the computation, including all zero points z_i, the modified scale s′′, and the integer scale adjustment 2^{K−d−31}, where K is an upper bound set to 63. After privately computing the dot product [c] and adding the bias vector [b̄], the result is multiplied by [s′′] and needs to be truncated by the private amount 31 + d. The truncation is accomplished by multiplying the scaled dot product by [2^{K−d−31}] and subsequently truncating by K bits. Lastly, after adding [z_3] locally, the result is clamped to the interval [0, 255] using two comparisons.
A limitation of [18]'s approach is that it required large scaling factors and consequently a large ring size of k = 72 for working with real numbers, using K-bit truncation with K = 63. We propose a modified approach in which the scales are folded into other aspects of the layer computation and a smaller truncation is performed at the end of each layer, which guarantees a compact representation of intermediate results.
Let superscript ⟨i⟩ denote the layer number. Starting from layer 0, the entire layer computation (dot product, scaling, and clamping) can be interpreted as computing a value 0 ≤ ȳ⟨0⟩ ≤ 255. The zero point z_3⟨i⟩ was set to 0, as prescribed by the clamping operation, for all layers except the last one. Because s_3⟨0⟩ = 6/255, we scale the equation to redefine ȳ⟨0⟩ with correspondingly scaled bounds. Our clamping operation can then use these bounds, with the upper bound privately entered by the model owner to avoid division. As before, the output of this layer becomes the input for the subsequent layer, i.e., x⟨i⟩ = ȳ⟨i−1⟩. Our modified incoming vector, denoted x⟨1⟩, is coupled with an additional scaling factor. This expression can be evaluated securely without needing fixed-point multiplication or large truncation, and all bounds are computed by the model owner prior to privately entering them into the computation.
Evaluating subsequent layers in this fashion causes the outputs to grow. However, we can keep the values small by truncating the output ȳ⟨i+1⟩ by ℓ⟨i⟩ bits. With the right choice of ℓ⟨i⟩ we are able to maintain the necessary accuracy, while the scaling coefficient of the next layer changes accordingly. The maximum number of bits we can truncate in a layer is constrained by the accuracy requirement. Once again, these values are independent of the input data and become a part of the model. We thus can use TruncPriv, outlined in Section 5, for truncation by a private amount. The net result is that we are able to use a significantly smaller bound K and consequently a substantially shorter ring size k. In practice, the coefficients introduced in our methodology can reasonably be folded into the scaling factors s themselves.
Other layers such as average pooling can be approximated by substituting the division by some integer m with truncation by ⌊log m⌋ bits, and softmax can be replaced with argmax when computing the final prediction. These changes can slightly alter the scaling factors, but have no impact on accuracy since we leverage basic algebraic properties without changing the fundamental calculation itself.
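For example, the average-pooling substitution can be checked in cleartext (our own sketch; exact whenever the window size is a power of two, as with the common 2 × 2 window):

```python
def avg_pool_approx(window):
    # Replace division by m with truncation by floor(log2 m) bits.
    m = len(window)
    return sum(window) >> (m.bit_length() - 1)   # m.bit_length() - 1 == floor(log2 m)

assert avg_pool_approx([1, 2, 3, 6]) == (1 + 2 + 3 + 6) // 4
```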

PERFORMANCE EVALUATION
We implemented the protocols described in this work and evaluate their performance. We run micro-benchmarks to evaluate individual operations as well as offer an evaluation of machine learning applications.
The implementation was done in C++ and is available at [4]. We use AES from the OpenSSL cryptographic library [1] to instantiate the PRF and also to implement secure communication channels between each pair of computational parties. We report the average execution time of 1000 executions for the micro-benchmark experiments and the average time of 5 executions for the application experiments. The runtimes are also averaged across the computational parties. All experiments use identical 2.4 GHz virtual machines with 26 GB of RAM. They were connected via 10 Gbps Ethernet links, which we throttled to 1 Gbps using the tc command. Two-way latency was measured to be 0.106 ms. All experiments use a single core. WAN benchmarks can be found in Appendix D.

Micro-benchmarks
In this section we report the performance of individual operations such as multiplication, matrix multiplication, random bit generation (RandBit and edaBit), and comparison (MSB). The experiments used two bitlengths, k = 30 and k = 60, which allows us to use the uint32_t and uint64_t integer types, respectively, to implement ring operations.
Tables 4 and 5 report the performance of multiplication and matrix multiplication, respectively. As we strive to measure the performance improvement when switching computation from a field to a ring, we compare our protocols to those using Shamir SS in the same setting (i.e., semi-honest security with honest majority) using the PICCO implementation [57] with recent improvements to multiplication from [11]. The field size is set to accommodate 30- and 60-bit integers. Batch size denotes how many operations were executed at the same time in a single batch.
We measure runtime and communication with the number of parties ranging from 3 to 7. For field multiplication, we measure the performance of two variants: GRR-based with higher asymptotic communication and 1 round (FG), and DN-based with lower asymptotic communication and 2 rounds (FD), as described in [11]. The former is strictly better in the three-party setting. The latter, despite its lower communication, does not lead to better performance as the number of parties increases because it internally relies on RSS. However, the difference in performance of the two variants is not substantial enough to play a major role in larger computations, as demonstrated in Table 5. We therefore proceed with FG with 3 parties and FD with 5-7 parties in other experiments where multiplication is used.
From Table 4 we observe that our RSS multiplication is up to 20 times faster than the field version with a sufficiently large batch size in the 3-party setting, and some performance advantage is maintained even with 7 parties despite the need to compute with a much larger number of shares. Note that the performance gain is due to faster instructions, because communication is comparable across the different variants. This indicates that using native CPU instructions for secure arithmetic has a remarkable advantage.
Matrix multiplication in Table 5 is performed in a single round using the necessary number of dot products. Because local work is the bottleneck, we see a performance improvement of up to a factor of 32.3 after switching to a ring with 3 parties. The improvement with 5 parties is by up to a factor of 8.3, and up to a factor of 2 with 7 parties. The ring performance is superior for all configurations evaluated except for the two largest matrices with 7 parties.
Tables 6 and 7 provide random bit generation results. To support k-bit integers, ring-based RandBit requires ring Z_{2^{k+2}}. Field-based RandBit from [13] does not increase the field size; however, all uses of RandBit we are aware of are for operations such as comparisons that utilize statistical hiding and, as a result, increase the field size by a statistical security parameter κ (typically set to 48 in implementations). For this reason, our field-based RandBit and MSB benchmarks utilize 79- and 109-bit fields. Both versions of RandBit in Table 6 communicate the same number of field or ring elements; however, the performance gain of the ring version grows as we increase the batch size, reaching a 10- to 12-fold improvement with 3 and 5 parties and indicating that local field-based computation is the bottleneck. This is in large part due to the need to perform modular exponentiations (see [13]). That is, even though field-based RandBit also relies on RSS, other non-RSS computation such as modular exponentiation is significant, and the overall slowdown with the number of parties is not as large. In the 7-party setting the improvement of the ring-based variant is by up to a factor of 6.
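To illustrate where the field-based cost comes from, here is a cleartext sketch of the square-root trick behind field-based RandBit [13] (our own simplification; in the protocol r remains secret-shared and only c = r² is opened): each output bit costs several modular exponentiations.

```python
import random

P = 2 ** 61 - 1                          # Mersenne prime with P % 4 == 3

def field_rand_bit():
    r = random.randrange(1, P)           # secret random value (shared in MPC)
    c = r * r % P                        # c = r^2 is opened
    v = pow(c, (P + 1) // 4, P)          # a fixed square root of c, so v = +/- r
    inv2 = pow(2, P - 2, P)
    # (v^-1 * r + 1) / 2 evaluates to 0 or 1, each with probability 1/2
    return (pow(v, P - 2, P) * r + 1) * inv2 % P

samples = {field_rand_bit() for _ in range(256)}
assert samples <= {0, 1}
```

The ring-based variant avoids these exponentiations, which is consistent with the local-computation bottleneck observed in Table 6.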
The concept of edaBit is recent, and for that reason in Table 7 we compare our implementation to that reported in the original publication [27], available through the MP-SPDZ repository [3]. Note that each edaBit corresponds to generating k random bits together with the corresponding k-bit random integer. It is clear from the table that MP-SPDZ's implementation is optimized for large sizes and fast networks. In particular, it gives comparable runtimes for batches of size 1 and 1,000. For the same reason, we were unable to accurately measure the communication cost per operation in the experiments and refer the reader to the original publication [27] for that information. Note that the times we measured for MP-SPDZ are very different from those originally provided in [27], which reported the ability to generate 7.18 million 64-bit edaBits per second. This is over 15 times faster than the fastest time per operation we record and stems from the differences in hardware. In particular, the experiments in [27] were run multi-threaded on powerful AWS c5.9xlarge instances with 36 cores and a 10 Gbps link. This distinction highlights the need to reproduce experiments on similar hardware to draw meaningful comparisons about the performance of different algorithms.
Table 8 reports the performance of multiple MSB protocols: (i) the field-based protocol from [13] using PICCO's implementation with optimizations from [11], our ring implementations (ii) using RandBit and (iii) using edaBit, and ring-based implementations from MP-SPDZ [3] (iv) using edaBit and (v) using ABY3. The last two support only three-party computation.
The gap between the first two shows the performance improvement due to switching from field-based to ring-based arithmetic. Both of them make a number of calls to RandBit linear in k, but our implementation executes BitLT over Z_2, while the field-based version uses a fixed field for all operations. As a result, our ring RandBit-based MSB is up to 26.9 times faster than the field version with 3 parties, up to 18.9 times with 5 parties, and up to 8.1 times with 7 parties.
If we compare our RandBit and edaBit MSB implementations, the edaBit version becomes advantageous starting from batch sizes of 100 with 3 parties and 1,000-10,000 with 5 parties, but is not beneficial with 7 parties. This can be explained by the need to perform a larger number of bitwise additions during edaBit generation as the number of computational parties increases.
MP-SPDZ's edaBit-based implementation in the three-party setting generally took longer to run than our edaBit-based implementation until the batch size becomes large. As explained earlier, this is due to different performance emphases in the two implementations. The ABY3 (three-party) implementation is slower than ours except for the largest batch sizes with the longer bitlength. We also visualize time per operation with variable batch sizes in Figure 4. It is also informative to compare our field vs. ring results with those of SPDZ. While SPDZ [23] and its ring version SPDZ_{2^k} [16,20] use a much stronger adversarial model and a different type of SS, we would like to know whether similar savings are achievable in different settings. [20] reports that performance improved by a factor of 4.6-4.9 for multiplication and by a factor of 5.2-6.0 for RandBit-based comparison on a 1 Gbps LAN. The results are only provided as throughput improvements and do not cover different batch sizes. In our experiments we observed greater improvements, up to 20 times for multiplication and up to 26.9 times for MSB. This may be explained by the fact that our techniques are more lightweight, and perhaps switching to faster arithmetic makes less of an impact in the SPDZ setting.

Machine Learning Applications
We next evaluate our protocols on machine learning applications and show that they exhibit good performance. We consider NNs and quantized NNs, in part to facilitate comparison to prior work.
Neural Networks. There are many types of NNs, and for our standard benchmarking we chose the NN from MiniONN [42] for the MNIST dataset [39] (Figure 12 in [42], Network B in [55], and Network C in [56]), because it is a popular choice for evaluating privacy-preserving NN inference. The MNIST NN evaluation uses convolution and fully-connected layers, a ReLU activation function, and max pooling with a 2 × 2 window that computes the maximum element in that window.
We use MiniONN's implementation choices and, in particular, run the computation on integer inputs. To avoid using floating-point arithmetic, [42] scaled inputs by a factor of 1000 and rounded to the nearest integer. To compensate for the bitlength of the intermediate results growing with each multiplication, [42]'s implementation ran the computation using a 37-bit modulus and avoided the use of truncation. However, we determined that this size is too small and 49 bits are needed to correctly evaluate the model, which we subsequently use. Our implementation achieves the same 99.0% precision as reported for this model in [43] (which corrects [42]).
While it is possible to perform the entire computation in Z_{2^49}, we observe that the initial layers involve the largest computations while using integers significantly shorter than 49 bits. Because the cost of comparisons is linear in the bitlength of ring elements, we can substantially improve performance by starting the computation on shorter values and converting the intermediate results to a larger ring prior to each multiplication that increases the size of the intermediate results. Therefore, we start the computation with 20-bit integers and increase the ring size by 10 bits prior to subsequent matrix multiplications.
Performance of MNIST NN inference with three parties (total time) is presented in Table 9. We also ran the same computation over a field (using [11,57]), which required an 89-bit modulus. To closely mimic our ring-based implementation, this implementation computes with integers of increasing sizes, but uses the same modulus throughout the computation.
We also include runtimes of two-party MiniONN [42], two-party Gazelle [33], two-party FALCON [40], SecureNN with custom three-party arithmetic [55], three-party FALCON [56], and three-party Dalskov et al. [18] with two types of truncation (TruncPr and TruncPrSp, respectively). Many of those solutions were executed on more powerful hardware, which would not lead to a meaningful performance comparison. For that reason, we reproduced the implementations on our machines, except for MiniONN, Gazelle, and two-party FALCON [40]. Of those, only Gazelle was executed on more powerful AWS instances with multi-threading at the time of the original publication, but its performance even with that setup is not competitive with what we achieve. Furthermore, that solution was subsequently surpassed by SecureNN, which we do execute on our hardware.
Table 9 shows the time for a single inference and for executing multiple inferences in a batch where available. We can see that our single-prediction time is lower than in other publications despite the fact that our solution generalizes to a larger number of parties with a larger collusion threshold. Our communication is also low, and the only construction that improves on our time when executing multiple predictions in parallel is FALCON [56]. While their implementation benefits from larger batching through multi-threading and lower communication due to small moduli, FALCON is limited to three parties. Our solution, however, can be invoked with a larger number of parties, as demonstrated in Table 10 with n = 5. Several other publications benchmarked NN predictions [7,14,15,38,46,47,49-51]. However, because they do not support or do not run MiniONN's MNIST NN evaluation, we cannot directly compare our performance. For example, while ABY3 [47] is said to use MiniONN's MNIST NN, the evaluation is actually based on a different, simpler model used in Chameleon [51].
Quantized Neural Networks. Benchmarks for quantized NNs were based on the MobileNets [30] architecture, which consists of 28 layers and 1000 output classes. The network alternates between 3 × 3 depthwise convolutions and 1 × 1 pointwise convolutions. A resolution multiplier (128-224) scales the dimensions of the input image, and a width multiplier (0.25-1.0) scales the size of the input and output channels. The models we used are hosted on TensorFlow's online repository [2] and are trained on the ImageNet [25] dataset. We experimentally determined that an upper bound of K = 16 is sufficient for truncation by a private value, since all computed ℓ⟨i⟩s are ≤ 9 for all model configurations.
Performance of quantized MobileNets inference is presented in Tables 11 and 12 with 3 and 5 parties, respectively. Our methodology from Section 6.2 allowed us to reduce the ring size from k = 72 to k = 30 or less, potentially reducing the time by a factor of 2. For an accurate comparison, we executed [18]'s implementation on our machines using the same setting. Since a 5-party honest-majority ring implementation is not available in [18], or more generally in MP-SPDZ, we use a field-based implementation from MP-SPDZ for the 5-party case. Recall that the ability to generalize ring-based honest-majority protocols to more participants is our main objective.
The results our 3-party solution achieves are comparable to those in [18] despite the ring-size reduction, which can be explained by differences in the algorithms. That is, Escudero et al. [28] experimentally determined that [18]'s implementation with ABY3's local conversion was superior to edaBits (which we use) in exactly the setting we consider (semi-honest with honest majority over Z_{2^k}). In addition, MP-SPDZ's optimization for large computations also aids its efficiency. Nevertheless, our quantized NN solution remains competitive, and our gain in the 5-party case is significant, leading to a reduction in time by a factor of 16-32.

CONCLUSIONS
In this work we study multi-party threshold secret sharing over a ring in the semi-honest model with honest majority, with the goal of improving performance compared to field-based computation. We design low-level operations for n-party replicated secret sharing over any ring and consequently build on them to enable general-purpose protocols over the ring Z_{2^k}. Our implementation results demonstrate that ring-based implementations of different operations are significantly faster than their field-based equivalents with 3, 5, and even 7 parties. This allows us to improve the performance of various applications, including privacy-preserving machine learning tasks. We specifically test the performance of neural network and quantized neural network classification and determine that our techniques perform on par with the best custom three-party protocols for those functions.
Buffalo Blue Sky Initiative, and US National Science Foundation grant 2213057.Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding sources.

A ADDITIONAL PROTOCOLS

A.1 Inputting Private Values
We start with the general case in which a participant who is not a computational party supplies their input into the computation, and subsequently discuss an optimized version for when the input owner is one of the computational parties. The input owner holds a private value x which will be represented as an element of ring R. The input owner needs to generate replicated shares that correspond to x and send them to the computational parties. This is the easiest way to proceed when there is only one element to share. However, when someone is sharing a vector of elements, we can save on communication by using pseudo-random shares. All shares except one for any element can be pseudo-random and computed locally by the computational parties after obtaining a PRG seed. This means that among all shares x_T, T ∈ 𝒯, one is marked as special and is denoted by x_{T*}. The corresponding share is computed by the input owner and is communicated to all parties with access to that share. The construction is given as Protocol 5.
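A toy (3, 1) instantiation of this idea can be sketched as follows (the hash-based PRG and all variable names are ours, used only for illustration; the implementation uses AES as the PRF):

```python
import hashlib

MOD = 2 ** 32                            # ring Z_{2^32}

def prg(seed, ctr):
    # Toy deterministic PRG from SHA-256; stands in for the AES-based PRF.
    digest = hashlib.sha256(f"{seed}:{ctr}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % MOD

def share_input(x, seeds, ctr):
    # Replicated (3,1) sharing with x = s1 + s2 + s3 (mod 2^32): s1 and s2 are
    # derived locally from pre-shared seeds; only the special share s3 is
    # computed by the input owner and communicated.
    s1, s2 = prg(seeds[0], ctr), prg(seeds[1], ctr)
    s3 = (x - s1 - s2) % MOD
    # party i holds every share except s_i
    return s3, [(s2, s3), (s1, s3), (s1, s2)]

x = 424242
s3, parties = share_input(x, ("seedA", "seedB"), ctr=0)
s2, _ = parties[0]
s1, _ = parties[2]
assert (s1 + s2 + s3) % MOD == x         # any two parties can reconstruct x
```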
When the input owner is one of the computational parties, we can capitalize on the fact that the parties already have pre-distributed PRG seeds. We denote the input party by p*. Note that p* has access to the subset of the PRG seeds corresponding to the shares it is entitled to hold, but not to all seeds. While we could generate a new seed for each T such that p* ∈ T and make it available to all p ∉ T as well as p*, such seeds would be accessible to more than n − t parties and would not contribute to security. Therefore, we instead choose to set those shares to 0 and use only the shares accessible to p*.

All variants use a single round. When a single input is shared by an external party, the input owner simply generates all C(n, t) shares, where C(n, t) denotes the binomial coefficient, and communicates them to the computational parties (each share is stored by t + 1 participants). The setup cost (which becomes the sharing of a PRG seed) is amortized over all inputs when sharing multiple inputs. The additional cost per input for the input owner then becomes the generation of C(n, t) − 1 pseudo-random ring elements and the communication of the last, computed share to t + 1 computational parties, i.e., the total communication is t + 1 ring elements. Each computational party needs to generate C(n−1, t) or C(n−1, t) − 1 pseudo-random ring elements. When the input is shared by a computational party, there is no setup cost. The input owner needs to generate C(n−1, t) − 1 pseudo-random elements (i.e., similar to the number of shares it stores per shared value) and communicate the computed share to t other parties. Each other party computes C(n−2, t) (i.e., the number of shares it has in common with the input owner) or C(n−2, t) − 1 pseudo-random ring elements. As will be relevant later, when a computational party shares a ring element in the (3,1) setting, the input owner communicates a single ring element to another party (and only one pseudo-random element is computed by the input owner and the remaining computational party). The security proof can be found in Appendix B.
Theorem 3. Input is secure according to Definition 1 in the (n, t) setting with n = 2t + 1 in the presence of secure communication channels and assuming PRG is a pseudo-random generator.

A.2 Random Bit Generation
Random bit generation is a crucial component of a variety of protocols including different types of comparisons, bit decomposition, division, etc. Therefore, it is of paramount importance to support this functionality for general-purpose computation. In this work we examine two variants: (i) generating shares of a single random bit as full-size ring elements and (ii) generating shares of a k-bit random value r as full-size ring elements together with shares of the individual bits of r in Z_2.
The first variant, denoted RandBit, originated in [13] for field-based SS and was modified in [20] to work in Z_{2^k}. We use the logic of [20] and adjust the algorithm to work in our setting. The result is shown as Protocol 6.
To achieve a 50% probability for each outcome of the output bit, the computation uses a larger ring Z_{2^{k+2}} for most steps of the protocol. We use the latter approach in our implementation of machine learning algorithms.
There are noteworthy differences in the design of protocols developed for a ring as opposed to the original protocols for a field. Certain operations such as prefix multiplication are not available in a ring, and we resort to logarithmic-round building blocks where protocols over a field achieve constant round complexity. In the context of comparisons, a typical tool for realizing them over a field was truncation (i.e., right shift), the cost of which was linear in the number of bits truncated, but the modulus had to be increased by a statistical security parameter to support such operations. In a ring, on the other hand, there is no significant increase in the ring size, but the communication cost is linear in the bitlength of the ring rather than in the bitlength of the truncated portion. This brings different trade-offs, but the availability of faster arithmetic in a ring still leads to significant savings.

B SECURITY DEFINITIONS AND PROOFS
Definition 1. Let parties P_1, ..., P_n engage in a protocol Π that computes function f(in_1, ..., in_n) = (out_1, ..., out_n), where in_i and out_i denote the input and output of party P_i, respectively. Let VIEW_Π(P_i) denote the view of participant P_i during the execution of protocol Π. More precisely, P_i's view is formed by its input and internal random coin tosses r_i, as well as the messages m_1, ..., m_u passed between the parties during protocol execution: VIEW_Π(P_i) = (in_i, r_i, m_1, ..., m_u). Let I = {P_{i_1}, P_{i_2}, ..., P_{i_t}} denote a subset of the participants for t < n, VIEW_Π(I) denote the combined view of the participants in I, and f_I(in_1, ..., in_n) denote the projection of f(in_1, ..., in_n) on the coordinates in I. We say that protocol Π is t-private in the presence of semi-honest adversaries if for each coalition I of size at most t there exists a probabilistic polynomial-time simulator S_I such that {S_I(in_I, f_I(in_1, ..., in_n)), f(in_1, ..., in_n)} ≡ {VIEW_Π(I), (out_1, ..., out_n)}, where in_I = ∪_{P_i ∈ I} {in_i} and ≡ denotes computational or statistical indistinguishability.
Proof of Theorem 1. Let C denote the set of corrupt parties. We consider the maximal amount of corruption with |C| = t. Because the computation proceeds on secret shares and the parties do not learn the result, no information should be revealed to the computational parties as a result of protocol execution.
We build a simulator S that interacts with the parties in C as follows: when a party p ∈ C expects to receive a value from another party p′ ∉ C in step 5 of the computation, S chooses a random element of R and sends it to p. S preserves consistency of the view and ensures that when the same value is to be sent by p′ to multiple parties in C, all of them receive the same random value. This is the only portion of the protocol where corrupt parties receive values (which the simulator produces), and the only portion of the protocol where a corrupt party p may send a value to an honest party p′ is step 4, which S receives on behalf of p′. All other computation is local, and S does not participate in it.
We next argue that the simulated view is computationally indistinguishable from the real view. First, note that the corrupt parties in C collectively hold shares x_T, y_T and keys k_T (and thus can compute the values G_T.next) for each T ∈ 𝒯 such that ∃p ∈ C with p ∉ T. This entitles the corrupt parties to compute the corresponding shares, but the remaining shares must stay unknown so that the corrupt parties are unable to compute x. Next, notice that when |C| = t, there is only one share index T* = C such that no party p ∈ C has access to x_{T*} and k_{T*}, while all parties p′ ∉ C store those values. Then there are two cases to consider: (1) If one or more parties p ∈ C receive a share from another party p′ ∉ C (it must be the case that the index of that share differs from T*), the received share has been masked by a fresh pseudo-random element from G_{T*}, and is therefore pseudo-random and indistinguishable from random by any p ∈ C. (2) If no party p ∈ C receives a value from a given p′ ∉ C, indistinguishability is trivially maintained. □

Proof of Theorem 2. As before, let C denote the set of corrupt parties with |C| = t. We build a simulator S that interacts with the parties in C as follows: after S extracts the shares x_T, y_T and keys k_T (for T ∈ 𝒯 such that ∃p ∈ C with p ∉ T) from the corrupt parties and receives the output c from the trusted party, S computes c(p) as prescribed by the protocol for each p ∈ C and also their sum c_C = Σ_{p∈C} c(p) (in R). S sets the c(p) values for the remaining n − t parties to random elements of R subject to Σ_{p∉C} c(p) = c − c_C (in R). S, acting on behalf of each party p ∉ C, sends the corresponding c(p) to each party in C.
To show that this simulation is indistinguishable from the real protocol execution, recall that there is at least one index, denoted by T* = C, to which the parties in C have no access (and thus they cannot distinguish the output of G_{T*} from random elements of the ring). During real protocol execution the parties in C receive t + 1 values c(p), one per p ∉ C. With the knowledge that the corrupt parties collectively have, they can remove the effect of all randomization except the outputs of G_{T*}. If we let r_{j,*} denote the jth call to G_{T*}.next during the execution of MulPub in Protocol 2, then the corrupt parties can recover t values of the form c(p) + r_{j,*} with unique p and j, and one value of the form c(p) − Σ_{j=1}^{t} r_{j,*} for another p. The next thing to notice is that any t (out of t + 1) of these values are pseudo-random and computationally protect the corresponding c(p) values. The introduction of the remaining value reveals the sum of all c(p)s, but no other information (i.e., the last value corresponds to the difference that makes the sum equal to c − s). This means that substituting these values with random elements subject to Σ_{p ∉ C} c(p) = c − s provides the same information to the corrupt parties and achieves computational indistinguishability of the views. □

Proof of Theorem 3. It is straightforward to show security of the full version of Input when the input owner is not one of the computational parties. In that case, the input owner creates proper shares according to the SS scheme using a PRG. Thus, as long as the security of the PRG holds, the real view is computationally indistinguishable from a simulated view created without the use of any secrets.
However, when the input owner is one of the computational parties, only a reduced set of shares is produced. Thus, we need to evaluate the combined view of each coalition of t corrupt participants. There are two important cases to consider: (i) the input owner p* is part of the coalition and (ii) it is not.
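The PRG-based share generation mentioned in the proof of Theorem 3 (for an external input owner) can be sketched as follows. The hash-based PRG and the key names are our own stand-ins, not the paper's construction:

```python
import hashlib

MOD = 1 << 32             # the ring Z_{2^k} with k = 32

def prg_next(key, ctr):
    """Toy PRG stand-in (a real instantiation would use, e.g., AES-CTR)."""
    digest = hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") % MOD

def input_shares(x, keys, ctr):
    """All shares but one are derived from pre-shared PRG keys, so the
    input owner only has to transmit the single correcting share."""
    pseudo = [prg_next(k, ctr) for k in keys]
    correction = (x - sum(pseudo)) % MOD
    return pseudo, correction

keys = [b"key-A", b"key-B", b"key-C"]   # hypothetical pre-shared keys
pseudo, corr = input_shares(987654321, keys, ctr=0)
assert (sum(pseudo) + corr) % MOD == 987654321
```

Security then reduces to the PRG: a simulator without the secret can emit random shares that are computationally indistinguishable from the real ones, which is the argument used above.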
… using three parties. The Multiplication and RandBit sub-figures compare ring vs. field protocols, indicating a substantial gap as expected; the edaBit sub-figure compares our and MP-SPDZ implementations in the same setting; and the MSB sub-figure compares the RandBit and edaBit variants.

Table 4:
Runtime of multiplication protocols in ms; communication is per party per operation in bytes (* means average for asymmetric communication patterns). FG and FD refer to the optimized GRR and DN field multiplication from [11], resp., and R is our ring realization. 30 and 60 are integer bitlengths.

Table 5:
Runtime of matrix multiplication in ms.

Table 6:
Runtime of RandBit protocols in ms; communication is per party per operation in bytes.

Table 7:
Runtime of edaBit protocols in ms compared to the MP-SPDZ implementation. Communication for our solution is per party per operation in bytes.

Table 8:
Runtime of MSB protocols in ms unless marked otherwise. Communication is per party per operation in bytes. rB and eB indicate variants using RandBit and edaBit, respectively.

Table 9:
Runtime of MNIST NN prediction in ms and communication in MB. (*) denotes results taken from the original publications.

Table 10:
Performance of MNIST NN prediction in the 5-party configuration. (*) means average for asymmetric communication.

Table 11:
Performance of 3PC quantized MobileNets prediction in seconds. MP-SPDZ results are over the ring Z_{2^k}.

Table 12:
Performance of 5PC quantized MobileNets prediction in seconds. MP-SPDZ results are over a field F_p.

Table 18:
WAN runtime of MSB protocols in ms unless marked otherwise. rB and eB indicate variants using RandBit and edaBit, respectively.