Convolutions in Overdrive: Maliciously Secure Convolutions for MPC

Machine learning (ML) has seen a strong rise in popularity in recent years and has become an essential tool for research and industrial applications. Given the large amount of high-quality data needed and the often sensitive nature of ML data, privacy-preserving collaborative ML is of increasing importance. In this paper, we introduce new actively secure multiparty computation (MPC) protocols which are specially optimized for privacy-preserving machine learning applications. We concentrate on the optimization of (tensor) convolutions, which are among the most commonly used components in ML architectures, especially in convolutional neural networks but also in recurrent neural networks or transformers, and therefore have a major impact on the overall performance. Our approach is based on a generalized form of structured randomness that speeds up convolutions in a fast online phase. The structured randomness is generated with homomorphic encryption using adapted and newly constructed packing methods for convolutions, which might be of independent interest. Overall, our protocols extend the state-of-the-art Overdrive family of protocols (Keller et al., EUROCRYPT 2018). We implemented our protocols on top of MP-SPDZ (Keller, CCS 2020), resulting in a full-featured implementation with support for faster convolutions. Our evaluation shows that our protocols outperform state-of-the-art actively secure MPC protocols on ML tasks like evaluating ResNet50 by a factor of 3 or more. Benchmarks for depthwise convolutions show order-of-magnitude speed-ups compared to existing approaches.


INTRODUCTION
Machine learning (ML) and, in particular, deep learning are steadily growing in importance in academia and industry. The performance of an ML model, and hence its application potential in real-world use cases, strongly depends on the amount and quality of available data. Since many companies are not able to generate the necessary data or models themselves, they have to rely on collaborations with competitors and other industry players.
Multi-Party Computation for ML. Secure multiparty computation (MPC) addresses the challenges related to collaborative privacy-preserving machine learning. MPC allows several parties, e.g., companies, to compute functions on secret inputs and to reveal only the function result and no additional information, in particular no information on the sensitive inputs (beyond what can be inferred from the result).
Indeed, MPC has been shown to be a suitable tool for privacy-preserving ML in tasks like inference/evaluation and training (see, e.g., [16,32] and Section 1.2). However, most of the MPC protocols that are specifically designed for ML provide security guarantees only in special setups, e.g., they require adversaries to follow the protocol rules (i.e., passive security) or limit the number of corrupted parties (i.e., honest majority). In a mutually distrustful setup, e.g., collaborations between industry competitors on highly sensitive data, these requirements can usually not be guaranteed. We therefore strive for a setup that guarantees to an honest party that their data remains private and that the result is correct even if all other parties are actively trying to corrupt the computation or gain sensitive information, e.g., by deviating from the previously agreed-upon protocol. MPC protocols that provide this strong form of security are called actively or maliciously secure.
The currently best MPC protocols in this dishonest-majority setting with active security are SPDZ [18] and state-of-the-art improvements thereof [3,30,31]. The efficiency of SPDZ-like protocols relies on a two-phase approach consisting of an offline and an online phase. In the input-independent offline phase, different forms of structured random data, e.g., Beaver triples [4], are produced. Then, this random data is used in the online phase to speed up the computation on sensitive input data.
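To make this two-phase structure concrete, the following toy Python sketch shows how a Beaver triple reduces a multiplication of secret-shared values in the online phase to cheap linear operations plus two openings. The trusted dealer, the small modulus, and the missing MACs are simplifications of our own for illustration; they are not part of the actual protocols.

```python
import random

P = 2_147_483_647  # toy prime modulus (not a secure parameter choice)

def share(x, n=2):
    """Additively secret-share x mod P among n parties."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def beaver_mul(x_sh, y_sh, triple):
    """Multiply secret-shared x and y using a Beaver triple (a, b, c = a*b)."""
    a_sh, b_sh, c_sh = triple
    # Each party opens its share of x - a and y - b; eps and delta become public.
    eps = reconstruct([(xs - as_) % P for xs, as_ in zip(x_sh, a_sh)])
    delta = reconstruct([(ys - bs) % P for ys, bs in zip(y_sh, b_sh)])
    # z_i = c_i + eps*b_i + delta*a_i (one party also adds the public eps*delta)
    z_sh = [(cs + eps * bs + delta * as_) % P
            for cs, as_, bs in zip(c_sh, a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + eps * delta) % P
    return z_sh

# offline: a trusted dealer stands in for the actual triple-generation protocol
a, b = random.randrange(P), random.randrange(P)
triple = (share(a), share(b), share(a * b % P))
# online: multiply two shared secrets
x, y = 12345, 6789
z_sh = beaver_mul(share(x), share(y), triple)
assert reconstruct(z_sh) == x * y % P
```

Note that the online phase sends only the openings of x − a and y − b; all multiplications with secret data were pushed into the offline phase.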
Direct Support for Matrix Multiplications and Convolutions. However, these protocols are not optimized for ML applications. In fact, SPDZ-like protocols were designed for arithmetic computations on finite field elements and hence each computation, e.g., a matrix multiplication or a convolution, has to be realized in a low-level way with just field additions and multiplications. For our ML operations, and especially convolutions, this approach usually leads to an unnecessary overhead in communication and computation.

Figure 1: Here, we visualize how the first component of the output is computed for different padding modes when convolving a 4 × 4 image (blue) and a 3 × 3 filter (orange). The black pixels for same and full padding show where the image has to be padded with zero values. The first component of the result (green) is computed by multiplying the pixels of the padded image component-wise with the pixels of the filter at the overlapping positions (symbolized with the hatched copy of the filter) and then summing up these products.
For example, in a convolution of a 2d image and a 2d filter as presented in Fig. 1c, the filter runs over the whole image. In particular, each entry of the image (e.g., its entry at position [0, 0]) and each entry of the filter (e.g., its entry at position [0, 0]) is used in several multiplications. Classical protocols like [18] securely multiply these values with so-called Beaver multiplications. This requires both entries to be masked with new masks for each multiplication they occur in and requires messages to be sent between all MPC parties for each of these maskings. In Fig. 1c, overall 9 maskings of the image entry at [0, 0] and 16 maskings of the filter entry at [0, 0] are created and sent.
The impact of this overhead (for each single convolution) on the efficiency of the overall ML algorithm is usually significant, given the large number of convolutions used in classical architectures, especially in convolutional neural networks (CNNs) [23,35,36] but also in recurrent neural networks (RNNs) [28,38,51] and transformers [2,11,20]. The aforementioned efficiency issues for ML operations have been addressed by Mohassel and Zhang in [43], who replace the common Beaver triples with more complex structured random data, namely matrix triples for matrix multiplications and convolution triples for convolutions. Matrix triples and convolution triples are perfectly adapted to the respective operations and no longer suffer from the overhead discussed before with the example of Fig. 1, i.e., they significantly lower communication and computational costs compared to standard Beaver triples.
While the protocol by Mohassel and Zhang is merely passively secure, their construction has been lifted to the actively secure setting by Chen et al. [14]. However, the focus of [14] is on matrix triples and matrix multiplication rather than convolutions. In particular, [14] does not use convolution triples but emulates convolutions with matrix multiplications. This is more efficient than the original approach based on Beaver triples but introduces an overhead linear in the filter size (filter height and width).
Actively Secure Convolutions with Dishonest Majorities. In this paper, we construct an actively secure MPC protocol which directly uses convolution triples and therefore natively supports convolutions. We show that our construction leads to a more efficient evaluation of convolutions than classical actively secure protocols like SPDZ and the matrix multiplication-based protocol [14], namely, the only actively secure protocol with direct support for an operation close to convolutions.
For our protocols, we employ the successful two-phase protocol structure common in SPDZ-like protocols: the offline phase generates what we call convolution triples, which are then used in the online phase to very efficiently evaluate convolutions of sensitive inputs. In order to construct these convolution triples in an actively secure offline phase, we employ a homomorphic encryption (HE) scheme, similar to the currently fastest Beaver triple generation protocol Overdrive [3,31]. Classically, HE-based offline phases gain most of their efficiency by amortization, i.e., they produce Beaver triples in large batches (with sizes usually in the range of 2^12 to 2^18) to lower the per-triple costs for encryption, zero-knowledge proofs, and other cryptographic tools. This approach is usually very efficient given the large number of Beaver triples needed in most applications, i.e., far more than the batch size. For example, in Fig. 1c we already need 16 · 9 = 144 Beaver triples for a single (rather small) convolution.
However, direct generalizations of these classical protocols are usually inefficient for ML applications. The reason is the large variety of convolutions of different sizes and the often small total number of convolutions of a specific size in many ML architectures, e.g., in ResNet [23] (cf. Section 7). A naive approach that produces a batch of convolution triples for each specific size will ultimately produce a huge overhead of unused convolution triples and hence will be inefficient.
New Packing Methods. We solve this issue by developing new packing methods for convolutions, i.e., we pack the entries of images and filters suitably into plaintexts of the underlying encryption scheme and then use the special multiplicative structure of the plaintext and ciphertext spaces to compute a complete convolution (or multiple convolutions) with a single ciphertext multiplication. In line with some of the most recent packing methods, we avoid costly ciphertext rotations and maskings, primitives used in [14] and most of the related work (cf. Section 1.2). This simplifies the protocols and reduces the computational load for the parties, while still utilizing most of the capacity of ciphertext operations. We embed our method into a new general framework that (apart from our new and other recent packing methods) also directly supports the computation of scalar products, matrix multiplications, different types of convolutions, and potentially other operations. In particular, our general framework describes a wide class of packing methods that might be of independent interest and can potentially be used for other applications outside of ML.
Overall, we build a new flexible offline phase that can be instantiated with both our new and already known packing methods. We solve security issues of recent packings like Bian et al.'s packing [6] in our setting and prove our protocols secure against attacks by active adversaries as long as one party remains honest.

Implementation. We implement our techniques as an extension to MP-SPDZ [29], the currently most efficient implementation of SPDZ-like protocols. Our implementation [48] provides convolution-optimized extensions for both the LowGear protocol and the HighGear protocol of Overdrive [31]. This allows our protocols to be applied both in setups with a small number of parties, where LowGear is most efficient, and in setups with many parties, where HighGear scales better. More precisely, LowGear scales quadratically in the number of parties and uses cheap primitives, which makes it well-suited for low-party setups. In contrast, HighGear scales linearly in the number of parties and uses more expensive ZKP and SHE protocols. Settings with both low and high numbers of parties are realistic, and we want to support as many settings as possible. There are application scenarios where the distribution of the data dictates the number of parties, e.g., there are cases where one party holds the inputs and one party holds the model (i.e., a two-party setting) or cases where the data (i.e., model and/or inputs) is naturally distributed among many parties. Another scenario is the client-server setting. Here, a setup with two servers is usually most efficient, but a setup with more servers reduces the risk that all of them collude to break security. While we focus on the low-party setup in our evaluation, we also show feasibility of our approach for more than three parties. We remark that related work usually concentrates on the low-party setup only.
We also use the optimized zero-knowledge proofs (ZKPs) introduced in TopGear [3] and extend these ZKPs to also support non-trivial packing methods. By implementing our protocols on top of the already highly optimized MP-SPDZ framework, we achieve better overall performance in ML applications: we get the improved performance for convolutions while maintaining the currently best performance of MP-SPDZ for all other operations.
We use our implementation to give an extensive evaluation of our methods and compare them to the current state of the art implemented in MP-SPDZ, as well as the state-of-the-art research results of [14]. On benchmarks for ResNet50, we outperform the state of the art by a factor of 3 to 4.8 (depending on the network setup). For depthwise convolutions, our protocols are up to 26× faster than the state of the art.

Summary of Our Contributions
• We introduce the first actively secure MPC protocol with direct support for convolutions (Sections 5 and 6).
• We introduce a new efficient convolution triple generation protocol as part of our offline phase (Sections 6.2 and 6.3).
• The convolution triple production is instantiated with multiple new and recent packing methods (Sections 6.2 and 6.3 and Appendix F.3.2), which might be of independent interest.
• We prove that our online and offline protocols are actively secure even if only one party is honest (Appendices E and F).
In particular, we solve existing security issues with recent packings like Bian et al. [6] in the active adversary setup.
• We present new and more efficient packing methods for convolutions (Section 4). This includes the first packing method for depthwise convolutions based on polynomial multiplication in the underlying cyclotomic ring (Section 4.3). Our packings do not use ciphertext rotations or maskings.
• We have implemented our complete protocol (offline phase and online phase) [48], including several packing methods, as an extension of MP-SPDZ [29], which is the state-of-the-art implementation for SPDZ-like protocols (Section 7).
• We have evaluated our implementation against generic SPDZ as well as [14], the state-of-the-art actively secure protocol for matrix multiplications (Section 7). Our results show that our specialized operations significantly improve the online and offline runtime compared to the related work. Our advantage in the offline phase is 4.82× in the LAN setting and 3.01× in the WAN setting for convolutions as in ResNet50. For depthwise convolutions, our approach is up to 18.59× (LAN) or up to 26.53× (WAN) faster. We also observe improvements of up to 40.15× (LAN) or 41.84× (WAN) in the online phase.

Table 1: Overview of recent MPC protocols for secure convolutions.

Protocol        Security  Majority  Parties  Operation  Ciphertext ops (a)
[9] (b)         semi      dish.     2        matmul     pc/cc, rot
[14]            mal.      dish.     any      matmul     cc, rot
[16,32]         any       any       any      matmul     -
CryptGPU [50]   semi      hon.      3        conv       -
APAS [6]        semi      dish.     2        conv       pc (c)
CrypTen [34]    semi      dish.     any      matmul     -
TenSEAL [5]     semi      dish.     2        matmul     pc, rot
Cheetah [25]    semi      dish.     2        conv       pc
HeLayers [1]    semi      dish.     2 (d)    matmul     pc/cc, rot
Ours            mal.      dish.     any      conv       pc/cc

(a) plaintext-ciphertext multiplications (pc), ciphertext-ciphertext multiplications (cc), and rotations (rot)
(b) used and/or extended in DELPHI [42], CrypTFlow2 [46], GALA [55], HEAR [33], and [37]; same setting and operations as CHET [19] and HEMET [41]
(c) matrix-vector multiplication of a plaintext matrix and an encrypted vector
(d) one model owner and a compute server in addition to any number of clients
A full version of this paper is available online [47].

Related Work
In Table 1, we summarize recent MPC protocols and focus on the realization of secure convolutions. As can be seen, most research focuses on a very specific setting: 2-party or 3-party computations with passive (semi-honest) security. In contrast, our protocols aim at active security rather than passive security and allow for a dishonest majority of malicious parties, similar to the setup of, e.g., [14]. Table 1 also includes an overview of technical realizations of convolutions and ciphertext operations used in the different protocols. Convolutions are usually reduced to either field multiplications (mul), matrix multiplications (matmul), or computed directly as convolutions (conv). Most protocols realize these with different ciphertext operations and packing methods. Element-wise (SIMD) multiplication of encrypted data and ciphertext rotations (to align the data encoded in ciphertexts for multiplications) are used almost exclusively. Note that these rotations come with two downsides, which we want to avoid with our protocols. Firstly, ciphertext rotations are computationally expensive and require additional key material. For example, for [14] one has to generate around 24.72 GB of non-trivial data in an actively secure way. Secondly, if plaintext rotations are used (e.g., in a LowGear-style protocol) one has to make sure that the plaintexts are rotated correctly. This might require additional ZKPs or similar constructions to guarantee security in the presence of misbehaving parties. The same is true for packing methods that require parties to tile and/or replicate data in a specific way (e.g., [1]).
Furthermore, some recent works [6,33,37] even aim to perform multiple convolutions in parallel using specialized packing methods. A notably unique HE technique [6] uses multiplications of plaintext matrices with ciphertexts for this. There are also exceptions that do not (necessarily) use HE, such as CryptGPU [50], which uses field multiplications without HE, and [16,32,34], which build their protocols generically on matrix multiplications (which might in turn be realized with HE, but other techniques are possible as well).
Orthogonal to securely computing convolutions, there is also work on verifiable convolutions, i.e., proving in zero-knowledge that convolutions are performed correctly [40,52]. We note that our protocols guarantee correct computation of convolutions towards the parties who participate in the protocol. For a discussion on other privacy-preserving technologies for ML we refer the reader to [10].

Integer Polynomials and Multiplication
Let R = Z[X]/Φ_m(X) be the ring of integer polynomials modulo the m-th cyclotomic polynomial Φ_m(X) = X^N + 1 for m = 2N a power of two. Let a⃗ ∈ Z^N be the vector of coefficients of a ∈ R, i.e., a = Σ_{i=0}^{N−1} a_i X^i. Then the vector of coefficients c⃗ ∈ Z^N of the product c = a · b ∈ R can be computed with a negacyclic convolution:

    c_k = Σ_{i+j=k} a_i b_j − Σ_{i+j=N+k} a_i b_j   for 0 ≤ k < N.    (1)

To verify this equation, recall that X^N mod Φ_m(X) = −1. To simplify notation, we will usually identify R and Z^N. This allows us to compute (negacyclic) convolutions with encryption schemes that support the homomorphic multiplication of encrypted polynomials, i.e., a or b (or both) are encrypted and we are able to obtain an encrypted product c.
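A small Python sketch (our own illustration, with a toy N) can confirm that reduction modulo X^N + 1 and the negacyclic convolution (1) compute the same coefficients:

```python
import random

def poly_mul_mod(a, b):
    """Multiply a, b in Z[X]/(X^N + 1), given as coefficient lists of length N."""
    N = len(a)
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj        # ordinary term X^(i+j)
            else:
                c[i + j - N] -= ai * bj    # X^N = -1, so wrap-around flips the sign
    return c

def negacyclic_conv(a, b):
    """c_k = sum_{i+j=k} a_i b_j - sum_{i+j=N+k} a_i b_j, as in (1)."""
    N = len(a)
    return [sum(a[i] * b[k - i] for i in range(k + 1))
            - sum(a[i] * b[N + k - i] for i in range(k + 1, N))
            for k in range(N)]

N = 8  # in BGV, N would be a large power of two
a = [random.randrange(-5, 6) for _ in range(N)]
b = [random.randrange(-5, 6) for _ in range(N)]
assert poly_mul_mod(a, b) == negacyclic_conv(a, b)
```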

Convolutions in Machine Learning
To simplify the exposition, we restrict ourselves to two-dimensional convolutions, which are very common in image processing [35,38,44,54]. However, note that our results also carry over to the one-dimensional case and to higher dimensions (e.g., 3d convolutions). Let K be a commutative ring and denote by K^D the functions Z^2 → K with support in the finite domain D ⊂ Z^2, i.e., functions that are zero outside of D. A discrete 2d convolution * : K^D × K^{D′} → K^{D′′} is given by

    (x * f)(s, t) = Σ_{(s′,t′)∈D′} x(s − s′, t − t′) · f(s′, t′)    (2)

for (s, t) ∈ D′′. We call * a convolution with
• valid padding if (2) accesses only indices of x such that (s − s′, t − t′) ∈ D for all (s′, t′) ∈ D′. This is the case where the filter and the image overlap completely (e.g., in Fig. 1a).
• same padding if D and D′′ are of the same size, i.e., |D| = |D′′|, and a suitable number of zero values of x outside of D are accessed by (2). As can be seen in Fig. 1b, this means that the image is extended by roughly half the size of the filter in each direction.
• full padding if D′′ is chosen such that all (possibly) non-zero summands in (2) are accessed. This is the case where the filter and the image overlap in at least one entry. For this, the image is extended by the filter size (minus one) in each direction. Figure 1c visualizes this.
For our packing schemes in Section 4, we will use up(D′′) = up([0..h′′) × [0..w′′)) = (h′′, w′′), i.e., the smallest upper bound for D′′ in each spatial direction that is not included in D′′.
Note that the valid output of the convolution is (in general) smaller than the input image, and with full padding, the output is larger than the input image. However, for h′ = w′ = 1 (i.e., a 1 × 1 filter), these three types of convolution are equivalent. A simple way to compute arbitrary convolutions is to compute full convolutions and simply discard some parts of the output to get results with same or valid padding. The same is true for strided convolutions, where we only want the results for, e.g., every second coordinate.
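As a sketch of this "compute full padding, then discard" strategy, consider the following pure-Python helper (the function name and crop offsets are our choices, not notation from the paper):

```python
def conv2d_padded(x, f, mode="full"):
    """2d convolution of an H x W image x with an Hf x Wf filter f.
    Computes the full-padding result and discards parts of it for
    'same' and 'valid' padding, as described in the text."""
    H, W, Hf, Wf = len(x), len(x[0]), len(f), len(f[0])
    full = [[0] * (W + Wf - 1) for _ in range(H + Hf - 1)]
    for i in range(H):
        for j in range(W):
            for u in range(Hf):
                for v in range(Wf):
                    full[i + u][j + v] += x[i][j] * f[u][v]
    if mode == "full":
        return full
    if mode == "same":   # crop back to the input size H x W
        r, c = (Hf - 1) // 2, (Wf - 1) // 2
        return [row[c:c + W] for row in full[r:r + H]]
    if mode == "valid":  # keep only positions of complete overlap
        return [row[Wf - 1:W] for row in full[Hf - 1:H]]
    raise ValueError(mode)

x = [[1] * 4 for _ in range(4)]   # 4 x 4 image as in Fig. 1
f = [[1] * 3 for _ in range(3)]   # 3 x 3 filter
assert len(conv2d_padded(x, f, "full")) == 6    # 4 + 3 - 1 rows
assert len(conv2d_padded(x, f, "same")) == 4    # input size
assert len(conv2d_padded(x, f, "valid")) == 2   # 4 - 3 + 1 rows
```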
A related operation is the cross-correlation (for real-valued functions), which is equivalent to a convolution of x and a mirrored f (see Appendix C.1). Therefore, we will only talk about convolutions in the following, even if we might want to compute cross-correlations from time to time. In ML applications, slightly more complex operations built on 2d convolutions are considered. For 4d tensors x and f with domains D = [..b) × [..h) × [..w) × [..c) and D′ = [..c′) × [..c) × [..h′) × [..w′), respectively, we define conv2d(x, f) for each pair (β, γ′) ∈ [..b) × [..c′) as the sum over the input channels of the 2d cross-correlations of the corresponding slices of x and f. The padding modes (full, same, valid; zero-padding) and strides then apply to the individual 2d cross-correlations (convolutions), i.e., to the finite output domains D′′ ⊂ Z^2. Usually, all these output domains are the same, and we can simply define up(D′′) for the 4d domain D′′ of the output. In addition to this, there is also the so-called depthwise (separable) convolution dconv2d, where f is now a 3d tensor with domain D′ = [..c) × [..h′) × [..w′). The latter is used, for example, in [2,15,24,49] to reduce the computational load and the number of trainable parameters compared to conv2d.
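A minimal sketch of conv2d and dconv2d under these definitions, assuming valid padding and a nested-list tensor layout of our own choosing (x indexed by batch and input channel, f by output and input channel):

```python
def xcorr2d_valid(x, f):
    """Valid-padding 2d cross-correlation of an H x W slice with an Hf x Wf filter."""
    H, W, Hf, Wf = len(x), len(x[0]), len(f), len(f[0])
    return [[sum(x[i + u][j + v] * f[u][v] for u in range(Hf) for v in range(Wf))
             for j in range(W - Wf + 1)] for i in range(H - Hf + 1)]

def conv2d(x, f):
    """x: per-batch list of c input slices; f: per-output-channel list of c filters.
    Each output slice is the sum over input channels of 2d cross-correlations."""
    def add(m1, m2):
        return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
    out = []
    for xb in x:                       # batch dimension
        row = []
        for fo in f:                   # output channels
            acc = xcorr2d_valid(xb[0], fo[0])
            for c in range(1, len(xb)):  # sum over input channels
                acc = add(acc, xcorr2d_valid(xb[c], fo[c]))
            row.append(acc)
        out.append(row)
    return out

def dconv2d(x, f):
    """Depthwise: one 2d filter per input channel, no sum across channels."""
    return [[xcorr2d_valid(xb[c], f[c]) for c in range(len(xb))] for xb in x]
```

The only structural difference between the two operations is the missing summation over channels in dconv2d, which is exactly what makes it cheaper.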

MPC, Secret-Sharing, and SPDZ
The currently most efficient actively secure MPC protocols are based on the fundamental results in SPDZ [18]. By now there is a vast amount of work that builds on and extends the original SPDZ protocol, e.g., [3,14,17,31] (cf. [45] for an overview). We see our work as an extension to the SPDZ framework or as a SPDZ-like protocol. What follows is a short overview of the most important concepts necessary for this work.

Secret-Sharing.
For security against a dishonest majority, i.e., in our setup all but one party might be corrupted, SPDZ uses a full-threshold additive secret-sharing. For this work, we restrict ourselves to a finite prime field F_p. We call [x]_i the share of the secret x ∈ F_p held by party P_i; we have that x = Σ_i [x]_i. To protect against malicious behavior, shared values are additionally authenticated with information-theoretic MACs under a secret-shared MAC key, and the parties run a MAC check protocol (cf. Fig. 10 in Appendix A.4) to verify that the parties correctly computed and opened shares. The MAC key is not revealed during a (successful) MAC check, and many MAC checks can be combined into a single check [17].
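The following toy sketch illustrates additive sharing with SPDZ-style MACs; the trusted dealer and the plain exchange of the check values are simplifications for illustration only, since the real MAC check commits to these values before revealing them (cf. Fig. 10).

```python
import random

P = 65537  # toy prime field F_p (far too small for real security)

def share(x, n=3):
    s = [random.randrange(P) for _ in range(n - 1)]
    s.append((x - sum(s)) % P)
    return s

def open_value(shares):
    return sum(shares) % P

def mac_check(opened, mac_shares, alpha_shares):
    """Each party computes sigma_i = [alpha*x]_i - opened * [alpha]_i;
    the sigmas sum to zero iff the opened value matches its MAC."""
    sigmas = [(m - opened * a) % P for m, a in zip(mac_shares, alpha_shares)]
    return sum(sigmas) % P == 0

alpha = random.randrange(1, P)                 # MAC key (here dealt by a dealer)
alpha_sh = share(alpha)
x = 4242
x_sh, x_mac = share(x), share(alpha * x % P)   # authenticated sharing of x
assert open_value(x_sh) == x
assert mac_check(open_value(x_sh), x_mac, alpha_sh)
# a tampered opening fails the check (except with probability 1/P)
assert not mac_check((x + 1) % P, x_mac, alpha_sh)
```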
We will therefore briefly recall the basics of the BGV scheme; more details can be found in Appendix A.1. Let R_p = F_p[X]/Φ_m(X) = R/pR for a prime p with p = 1 mod m. Let p < q for a prime q, and identify R_p with a subset of R_q in the usual way (cf. [18]). Let (pk, sk) ∈ R_q^2 × R_q be a BGV public key/private key pair, C = R_q^2 the ciphertext space, enc_pk : R_p × R_q^3 → C the encryption function, and dec_sk : C → R_p the decryption function. We use the following notation for encrypted values: ⟨x⟩, e.g., ⟨x⟩_pk = enc_pk(x, r), where we omit the explicit dependency on the key if it is clear from the context. We also define homomorphic operations on ciphertexts and denote them with operations of the same semantics, e.g., x · ⟨y⟩ for plaintext-ciphertext multiplication. For further details, e.g., the definition of the encryption and decryption functions, how encryption randomness has to be chosen, and how ciphertext operations (addition and multiplication) are defined, see Appendix A.1.

CONVOLUTION PACKING
There is an obvious similarity between the multiplicative structure of R depicted in (1) and (1d versions of) the convolutions shown in (2) and (3). Indeed, with zero-padding and a large enough N, the 1d convolution x * f coincides with the negacyclic convolution (1) of the zero-padded coefficient vectors of x and f (7), since no wrap-around modulo X^N + 1 occurs, and, as mentioned before, we can easily express cross-correlations as convolutions. This similarity can be used to represent convolutions as operations on R and then use the homomorphic multiplication of BGV ciphertexts in SPDZ-like protocols to securely (and efficiently) compute convolutions in our MPC protocol. For (7), one could, for example, compute x * f = dec(⟨x⟩ · ⟨f⟩).
However, two problems remain before we can use this in practice. Firstly, (7) holds a priori only for 1d convolutions, but we need support for higher-dimensional (e.g., 2d) convolutions. Fortunately, there is a standard way to represent higher-dimensional convolutions in terms of 1d convolutions. This construction is described in Section 3.2.1 and allows us to restrict ourselves to the case of 1d convolutions in most cases. Secondly, N is often quite large in MPC protocols. In order to use the full potential of R, we therefore need to utilize a large fraction of the convolved slots, usually by performing multiple convolutions at once. Both problems can be simultaneously addressed by so-called packing methods. We describe a general framework for packing methods next.
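The first point, embedding an ordinary 1d convolution into a negacyclic one, only requires N to be large enough that no wrap-around occurs; a short sketch with our own toy parameters:

```python
def negacyclic_conv(a, b):
    """Negacyclic convolution as in (1)."""
    N = len(a)
    return [sum(a[i] * b[k - i] for i in range(k + 1))
            - sum(a[i] * b[N + k - i] for i in range(k + 1, N))
            for k in range(N)]

def conv1d(x, f):
    """Ordinary 1d convolution (polynomial multiplication without reduction)."""
    out = [0] * (len(x) + len(f) - 1)
    for i, xi in enumerate(x):
        for j, fj in enumerate(f):
            out[i + j] += xi * fj
    return out

x, f = [1, 2, 3, 4], [5, 6, 7]
N = 8  # any N >= len(x) + len(f) - 1 avoids the negacyclic wrap-around
a = x + [0] * (N - len(x))
b = f + [0] * (N - len(f))
assert negacyclic_conv(a, b)[:len(x) + len(f) - 1] == conv1d(x, f)
```

With the typical N in the range of 2^12 to 2^15, a single small convolution uses only a tiny fraction of the slots, which is exactly the utilization problem the packing methods address.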
    op(x, f) = unpackr(op_R(a, b)),    (8)

where a = packi(x) and b = packf(f), i.e., x is packed with packi and f is packed with packf; then the bilinear operation op_R is evaluated on the packed vectors a and b, and the result is unpacked with unpackr.
To avoid confusion, we will often add additional arguments for the bilinear operations, e.g., write packi(op, D, D′, x) instead of packi(x). Additionally, most of the discussed packing schemes use the standard choice of op_R = * (the negacyclic convolution (1)) on R ≃ Z^N, i.e., we express the operation op as a negacyclic convolution or, equivalently, as a polynomial multiplication in R. The latter can be performed securely with homomorphic encryption (e.g., BGV; cf. Section 2.4), and packing (or unpacking) can be performed before the encryption (or after the decryption, respectively). Note that (8) then becomes op(x, f) = unpackr(a · b) with a, b ∈ R for the standard case. We call a packing method induced by functions mapi, mapf, mapr on the index domains if packi and packf are defined as pullbacks along mapi and mapf, while unpackr is defined as a pushforward along mapr.
Remark 3.2. One of the most important examples of a packing method not induced by functions is the CRT packing discussed in Appendix B. In particular, this encoding is used in the generation of Beaver triples, where we use our general framework in the special case D = D′ = D′′ = [..N), K = F_p a finite field, op = ⊙ the component-wise multiplication, and op_R the standard choice of polynomial multiplication as described above.

Recent Packing Methods
In the following, we concentrate on induced packing methods recently introduced in the literature. Additionally, we describe Bian et al.'s packing method in Appendix C.2. In Section 4, we present new packing methods, some partially based on existing methods and some completely new.

Multidimensional Convolution Packing.
We first want to show how we can include 2d convolutions in our framework. This will also be used as part of all other packing methods. Therefore, our description is similar to the packing methods presented in [6,25]. However, the version presented here is not bound to a specific padding but rather supports all popular padding modes (with zero-padding) discussed above. Recall that the output domain D′′ corresponds to the popular padding modes in Section 3. A proof of Theorem 3.3, as well as the corresponding version for cross-correlations, can be found in Appendix C.1.
This corresponds to a conv2d for a batch size b = 1 and valid padding (with shifted indices of the output compared to the above description). In the framework of Section 3.1, the packing is induced by the corresponding index maps mapi and mapf on the image and filter entries. In Section 4.2, we present a more efficient generalization which can use the same filter in multiple batches. This will be particularly useful when the packing is applied to encrypted versions of the image and filter, since then the encryption of the filter has to be sent only once. We remark that sending ciphertexts and proving their correctness with zero-knowledge proofs is expensive, and our approach reduces these costs, both the bandwidth and the runtime, compared to the original version where each batch is handled as a completely new and unrelated conv2d operation.
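The reduction from 2d to 1d convolutions can be sketched as follows: flattening image and filter row-major with a padded row stride w′′ = w + w′ − 1 makes the 1d convolution of the flattened vectors contain the full 2d convolution row by row (helper names are ours):

```python
def conv1d(x, f):
    out = [0] * (len(x) + len(f) - 1)
    for i, xi in enumerate(x):
        for j, fj in enumerate(f):
            out[i + j] += xi * fj
    return out

def pack2d(t, W2):
    """Row-major flatten with row stride W2, zero-padding each row."""
    return [row[j] if j < len(row) else 0 for row in t for j in range(W2)]

def conv2d_full(x, f):
    """Reference full-padding 2d convolution."""
    H, W, Hf, Wf = len(x), len(x[0]), len(f), len(f[0])
    out = [[0] * (W + Wf - 1) for _ in range(H + Hf - 1)]
    for i in range(H):
        for j in range(W):
            for u in range(Hf):
                for v in range(Wf):
                    out[i + u][j + v] += x[i][j] * f[u][v]
    return out

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f = [[1, 0], [0, -1]]
W2 = len(x[0]) + len(f[0]) - 1   # padded row width avoids overflow into the next row
c = conv1d(pack2d(x, W2), pack2d(f, W2))
ref = conv2d_full(x, f)
for r in range(len(ref)):
    assert c[r * W2:r * W2 + W2] == ref[r]
```

The key observation is that a product of entries at positions i·W2 + j and u·W2 + v lands at (i+u)·W2 + (j+v), and since j + v < W2 it never spills into a neighboring row block.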

NEW PACKING METHODS
Here, we present new packing methods for convolutions. This includes the first packing method for depthwise convolutions that can be realized with only homomorphic polynomial multiplications. Appendix D contains the proofs of correctness for each of the packing methods.

Simple Convolution Packing
Our first simple convolution packing is based on the multidimensional packing of Section 3.2.1. For a complete conv2d computation, we deviate slightly from the standard choice of op_R being a single negacyclic convolution and use instead a linear combination of partial convolutions in (8). This allows for the simple convolution packing below.
For the induced packing (packi, packf, unpackr) and the image and filter as above, the complete conv2d result can be recovered from the packed products (Theorem 4.1). Compared to Theorem 3.3, we have two additional dimensions indexed by the batch and the output channel. For each such index pair, we map to a disjoint subset of [..N) and then apply a modified version of Theorem 3.3 for ★ instead of * (cf. Remark C.1). A single * then yields cross-correlations for each batch and output channel and a fixed input channel. We can then simply sum up all sets of individual cross-correlations to get the full conv2d result. For details on the proof of Theorem 4.1, we refer to Appendix D.1. A visual example can be seen in Fig. 2. There, we abstract away the spatial dimensions with blocks, each representing an h′′ × w′′ slice (by Theorem 3.3), and only focus on the remaining dimensions.

Generalization of Huang et al.'s Convolution Packing
Here, we present a slightly different (but more intuitive) extension of Huang et al.'s packing method [25] described in Section 3.2.2.
Theorem 4.2. Let x be a (4d) tensor with domain D = [..b) × [..h) × [..w) × [..c) and let f be a (4d) tensor with domain D′ = [..c′) × [..c) × [..h′) × [..w′). Choose D′′ according to the padding mode and let (h′′, w′′) = up(D′′). Let ι(β, γ′, κ, s, t) = (((β · c′ + γ′) · c + κ) · h′′ + s) · w′′ + t be the canonical indexing into a (flattened 5d) b × c′ × c × h′′ × w′′ tensor. Let a = packi(conv2d, D, D′, x) and b = packf(conv2d, D, D′, f). The intuition of this packing is similar to the simple packing (cf. Section 4.1): for each pair of batch and output channel, the image and filter are mapped to disjoint subsets of [..N) such that (partial) convolutions for different batches or output dimensions do not overlap. Additionally, the index along the input depth dimension is chosen so that the image and filter for the same input channel intentionally overlap. By the structure of the (negacyclic) convolution, the filter has to be reversed along this dimension. Then, the partial convolutions are summed up and the result can be obtained in the last slot along the input depth dimension. A proof can be found in Appendix D.2 and an example can be seen in Fig. 3. As for Fig. 2, we ignore only the spatial dimensions in the figure, as these are handled by Theorem 3.3. An example for Huang et al.'s original packing would look similar (with the aforementioned limitations of only supporting b = 1 and valid padding), but the encoding of the filter and the decoding of the result would both be reversed along the output channel axis, as can be seen when comparing the equations for the packing methods.
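The depth-alignment idea, packing image channels in increasing and filter channels in decreasing order so that the per-channel convolutions sum up in one block of the product, can be illustrated with plain polynomial multiplication (block length L and the toy data are our choices):

```python
def conv1d(x, f):
    """Plain polynomial multiplication of coefficient lists."""
    out = [0] * (len(x) + len(f) - 1)
    for i, xi in enumerate(x):
        for j, fj in enumerate(f):
            out[i + j] += xi * fj
    return out

C, L = 3, 8   # channels; L >= len(x_c) + len(f_c) - 1 keeps blocks disjoint
xs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # one short signal per input channel
fs = [[1, -1], [2, 0], [0, 3]]           # one filter per input channel

a = [0] * (C * L)
b = [0] * (C * L)
for c in range(C):
    for i, v in enumerate(xs[c]):
        a[c * L + i] = v                  # image: channel c at block c
    for i, v in enumerate(fs[c]):
        b[(C - 1 - c) * L + i] = v        # filter: channel order reversed

prod = conv1d(a, b)
# block C-1 of the product holds the sum over channels of the per-channel
# convolutions; cross terms (unequal channels) land in other blocks
want = [sum(col) for col in zip(*(conv1d(xs[c], fs[c]) for c in range(C)))]
got = prod[(C - 1) * L:(C - 1) * L + len(want)]
assert got == want
```

A product of blocks c1 (image) and C − 1 − c2 (filter) lands in block c1 + C − 1 − c2, which equals C − 1 exactly when c1 = c2, so only the matching-channel terms accumulate in the output block.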
With (12), we are then able to pack a whole conv2d operation into a single convolution. However, if the length of the vectors (on which we can operate) is limited, e.g., when we work with homomorphic encryption with a fixed N, we cannot always perform the whole operation at once. Instead, we should split a conv2d operation into smaller operations (which can be computed as in (12)). This is possible along all dimensions (the batch dimension, the two spatial dimensions, the input depth dimension, and the output depth dimension; see [25] for their version and Section 6.5.2 for ours). Here, our generalization not only allows a (previously impossible) direct realization of convolutions (including batch sizes larger than one and not only valid padding), but it also improves efficiency as we can move spatial splits (splitting along the spatial dimensions) into the batch dimension. For example, in our evaluation, we compute the convolution of a 1 × 224 × 224 × 3 image with a filter as a convolution of a 4 × 112 × 112 × 3 image with the same filter (recombining the four batches to a larger convolution triple afterwards). This is still a convolution of a single image with a single filter, and therefore we only need ciphertexts for a single filter instead of four, as would be the case when we represent this as four separate 1 × 112 × 112 × 3 convolutions.
Again, we construct the packing such that each batch is mapped to a disjoint region of [0..N). For the depth dimension, we pack the image and filter such that the negacyclic convolution yields the convolution for each channel j of the image and channel j′ of the filter. For the output, we simply select the partial convolutions with j = j′. While this might seem wasteful, especially for small images, it is more efficient than emulating dconv2d with conv2d (cf. Section 7). A proof for Theorem 4.3 can be found in Appendix D.3.
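To make the difference between dconv2d and conv2d concrete, here is a small pure-Python sketch (our illustration, using 1-D channels instead of 2-D images for brevity): a depthwise convolution touches each channel independently, while emulating it with a full convolution forces a filter whose cross-channel entries are all zero, wasting most of the packed slots.

```python
def corr1d(x, y):
    # valid 1-D cross-correlation (stride 1)
    return [sum(x[t + j] * y[j] for j in range(len(y)))
            for t in range(len(x) - len(y) + 1)]

def dconv(image, filt):
    """Depthwise convolution: channel j of the image meets only channel j of the filter."""
    return [corr1d(image[j], filt[j]) for j in range(len(image))]

def conv(image, filt):
    """Full convolution: output channel j' sums correlations over all input channels."""
    out = []
    for fj in filt:  # fj has one sub-filter per input channel
        chans = [corr1d(image[j], fj[j]) for j in range(len(image))]
        out.append([sum(col) for col in zip(*chans)])
    return out

def dconv_via_conv(image, filt):
    """Emulate dconv with conv by zeroing all cross-channel filter entries."""
    c = len(image)
    big = [[filt[j] if j == jp else [0] * len(filt[jp]) for j in range(c)]
           for jp in range(c)]
    return conv(image, big)
```

The emulation computes c² channel correlations of which only the c diagonal ones are nonzero, which is why a dedicated depthwise packing can pay off.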

MALICIOUSLY SECURE CONVOLUTIONS: THE ONLINE PHASE
As mentioned before in Section 2.3.2, given a convolution triple (⟦a⟧, ⟦b⟧, ⟦c⟧) for uniformly random a, b and c = conv2d(a, b), we can compute a (maliciously secure) convolution of a secret-shared image ⟦x⟧ (with the same shape as a) and a filter ⟦y⟧ (with the same shape as b) as a linear combination of the triple and the opened values x − a, y − b (analogously to (6)), i.e.,
⟦conv2d(x, y)⟧ = ⟦c⟧ + conv2d(x − a, ⟦b⟧) + conv2d(⟦a⟧, y − b) + conv2d(x − a, y − b). (14)
With this, we obtain a share of the convolution result and can inductively compute an arbitrary function on shares (in particular, any convolutional neural network), as we can compute all necessary operations on shares in a maliciously secure way (scalar multiplications and additions [18], fully connected layers and matrix multiplications [14], ReLUs and max pooling [12,21], etc.). The security follows from the security of the individual operations (e.g., linear operations to compute (14)) and the bilinearity of conv2d [14,43]. The same can be done for depthwise convolutions by simply replacing conv2d with dconv2d above. Strided convolutions and different paddings can be handled analogously. The full protocol for the online phase Π online, as well as the corresponding functionality F online and the security proof of the following theorem, can be found in Appendix E. Assuming the existence of F offline (an ideal functionality for the offline phase that generates triples; cf. Section 6) and F rand, which allows parties to sample random values in F (used in the MAC check, Fig. 10), we obtain the following theorem.
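The triple-based online computation of (14) can be sketched in a few lines of Python. This is our simplified illustration only: two parties, additive sharing modulo a prime, a 1-D correlation standing in for conv2d (any bilinear map works the same way), and no MAC authentication (which the actual protocol adds).

```python
import random

P = 2**61 - 1  # toy prime modulus of the secret sharing

def corr(x, y):
    # valid 1-D cross-correlation; stands in for conv2d (it is bilinear as well)
    return [sum(x[t + j] * y[j] for j in range(len(y)))
            for t in range(len(x) - len(y) + 1)]

def add(u, v): return [(a + b) % P for a, b in zip(u, v)]
def sub(u, v): return [(a - b) % P for a, b in zip(u, v)]

def share(v):
    # 2-out-of-2 additive secret sharing
    r = [random.randrange(P) for _ in v]
    return [r, sub(v, r)]

def open_shared(sh):
    return add(sh[0], sh[1])

def conv_online(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Shares of corr(x, y) from a convolution triple (a, b, c) with c = corr(a, b).
    MAC authentication of the real protocol is omitted in this sketch."""
    eps = open_shared([sub(x_sh[i], a_sh[i]) for i in range(2)])    # open x - a
    delta = open_shared([sub(y_sh[i], b_sh[i]) for i in range(2)])  # open y - b
    z_sh = []
    for i in range(2):
        zi = add(c_sh[i], add(corr(eps, b_sh[i]), corr(a_sh[i], delta)))
        if i == 0:  # the public term corr(eps, delta) is added by one party only
            zi = add(zi, corr(eps, delta))
        z_sh.append(zi)
    return z_sh
```

Since a and b are uniformly random, the opened values x − a and y − b leak nothing about x and y; the online phase consists only of these openings plus local linear work.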
Theorem 5.1. The online protocol Π online securely implements the ideal functionality F online in the (F offline , F rand )-hybrid model.

MALICIOUSLY SECURE CONVOLUTIONS: THE OFFLINE PHASE
The convolution triples used in the online phase are generated in the input-independent offline phase. Different design patterns for the offline phase can lead to drastically different performance characteristics of MPC protocols in different application setups (few parties with low-latency communication; many parties with high-latency communication; etc.), and they heavily influence the practicality of certain approaches. In order to be applicable to these different setups, we instantiate our generic computation methods for convolutions discussed in Section 4 in multiple ways. Since all of the presented new offline protocols are based on homomorphic encryption, we first describe the common pattern of these approaches in Section 6.1. We then introduce specialized protocols for the standard choice (9): for a low number of parties (similar to Overdrive's LowGear protocol [31]) in Section 6.2, and for a larger number of parties (similar to Overdrive's HighGear protocol [31] or TopGear [3]) in Section 6.3. The protocols can be trivially extended to support the simple packing of Section 4.
Fig. 5: Generalized sacrificing to generate a triple with the correct correlation or fail.

General Construction
In Sections 3 and 4 we have seen how convolutions can be packed and then evaluated via a polynomial multiplication in a cyclotomic ring. We first restrict ourselves to a single polynomial multiplication and discuss the case of convolutions that cannot be represented as a single polynomial multiplication (without increasing N) in Section 6.5.2.
To securely realize a polynomial multiplication, we use the homomorphic properties of the BGV encryption scheme common in SPDZ-like protocols. Once we can compute convolutions in a secure way, we can construct the non-trivial third entry c = conv2d(a, b) of a convolution triple. We remark that since our general framework also supports simple field multiplication, we can also generate classical Beaver triples. Multiplication with homomorphic encryption schemes in our protocols follows the following pattern. First, each party generates (random) shares locally and encrypts them with their public key. In order for a ciphertext to be used in a multiplication protocol, a party first has to show with a zero-knowledge proof (ZKP) that they know a plaintext witness and that the plaintext is well-formed. In particular, our ZKPs show that the plaintexts are valid packings, which reduces to showing that certain coefficients (depending on packi and packf) are zero. This in turn implies that the sum of the shares, i.e., the shared secret, has the same zero coefficients and therefore represents a valid packing.
Next, the parties multiply their shares with standard multiplication techniques, which are based on either linearly homomorphic encryption (Section 6.2) or somewhat homomorphic encryption (Section 6.3). Additionally, the shares are authenticated in the usual way, i.e., by multiplication with ciphertexts of (shares of) the MAC key. Furthermore, the parties check that the original shares were authenticated and that no error was introduced in the multiplication (or in the resharing for SHE-based protocols). The latter is done using a new extended sacrificing technique, which we introduce in Section 6.4.

Linear Homomorphic Offline Phase
In Fig. 7, we present the (convolution) triple generation of our protocol based on linearly homomorphic encryption. Additional subprotocols can be found in Figs. 5 and 6. The construction is based on Overdrive's LowGear protocol [31] but extends it to generate triples for any bilinear operation that can be represented within the framework of Section 3.1. (Fig. 7 restricts this to the standard case (9) for simplicity.) Analogously to Overdrive LowGear, the parties first generate their shares of a and b. Only one of the two triple inputs requires ZKPs that prove correctness of the encrypted shares. These shares are sent to all parties. Then, the parties can multiply these ciphertexts with (packings of) their own share of the other input to obtain ciphertexts of pairwise shares. These pairwise shares are re-randomized and sent back to the party that originally sent the encrypted share and holds the corresponding private key. After receiving all encrypted pairwise shares, this party can decrypt and combine them to obtain a share of the overall product of packings, e.g., an encoding of the convolution of a and b. Finally, all shares are authenticated (by multiplying with encrypted MAC key shares as in LowGear), and parts of the triples are sacrificed to guarantee the correct relation between authenticated triples (cf. Section 6.4).
Note that this construction is much closer to LowGear than, for example, [25]. In [25], a protocol similar to our Multiply subprotocol (cf. Fig. 6) is used. However, their version does not drown the ciphertext containing the pairwise product. Instead, [25] computes this product and extracts (LWE) ciphertexts for all coefficients of the product's (RLWE) ciphertext that are later required for the shares of the conv2d result. We opted not to follow this approach for the following reasons. (i) We use larger BGV parameters for drowning ciphertexts for (scalar) Beaver triple generation anyway, so avoiding drowning does not improve the parameters. (ii) The technique comes with additional computational overhead. (iii) It is unclear whether maliciously crafted (LWE) ciphertexts might break security, as [25] only considered semi-honest adversaries. (iv) The technique could not be reproduced since the reference [13] pointed to in [25] does not discuss how to obtain LWE ciphertexts from RLWE ciphertexts (only vice versa). (v) The noise hiding technique of [25] is not well suited for our protocol, since it introduces a (probabilistic) 1-bit error in the result.
The following theorem captures the security of our LHE-based offline phase. A security proof can be found in Appendix F. To follow the security proofs in [31], the functionalities F auth-linear (for linear operations on shares) and F auth-MPC (for linear operations and triple generation) are used instead of a more traditional offline functionality F offline . Additionally, we assume standard functionalities for sampling random values (F rand ), committing and decommitting to values (F commit ), and generating encryption keys and shares of the MAC key (F setup ).
Theorem 6.1. The offline protocol Π offline-LHE securely implements the ideal functionality F auth-MPC in the (F auth-linear, F commit, F rand, F setup)-hybrid model with rewinding if the used BGV cryptosystem achieves enhanced CPA-security [31].
Remark 6.1. Note that the use of rewinding is a standard tool in this type of protocol (cf. the LowGear protocol [31]).

Somewhat Homomorphic Offline Phase
In Appendix F (Fig. 19), we present a (convolution) triple generation based on somewhat homomorphic encryption. The construction is based on Overdrive's HighGear protocol [31].
Similarly to the linearly homomorphic case (cf. Section 6.2), all parties sample their own shares of a and b and encrypt them. However, in the SPDZ-like SHE approach, the shares of both a and b are encrypted. Utilizing a HighGear/TopGear-style ZKP, the parties prove that the sum of their encrypted shares is a valid ciphertext of the sum of the shares, i.e., of the shared value a or b. Therefore, all parties hold a valid ciphertext of (the packing of) a and b. These can be multiplied homomorphically with a somewhat homomorphic encryption scheme to obtain a ciphertext of the product, e.g., of the encoding of a convolution of a and b. Analogously to the original approach of [18], the parties can (distributively) decrypt the product ciphertext, reshare the product, and authenticate it. Finally, sacrificing is used to guarantee that the correlation of the triple is satisfied (cf. Section 6.4).
Note that, again, the main changes to the HighGear (or TopGear) protocol are the use of ZKPs that ensure correct packing, local (un)packing operations, and the adapted sacrificing for convolutions. The security of our SHE-based offline phase is given by the following theorem and the proof in Appendix F. Again, we assume the availability of standard functionalities for (de)committing, randomness generation, and a key/MAC setup.
Theorem 6.2. The offline protocol Π offline-SHE securely implements the ideal functionality F offline in the (F commit, F rand, F setup)-hybrid model if the used BGV cryptosystem achieves CPA-security and has an algorithm for meaningless public key generation [18].

Sacrificing
While Chen et al. presented a generalization of the Beaver multiplication approach for arbitrary bilinear operations in [14], they did not generalize the sacrificing in the same way. As described in Section 6.1, sacrificing is necessary in our protocols to ensure that the produced triple is correctly authenticated. In Fig. 5, we show a generalization of the sacrificing presented in [30]; its security follows directly from [30]. However, the efficiency of the sacrificing can greatly depend on the type of bilinear operation that we consider. The reason for this is the inherent asymmetry of the optimized sacrificing of [30] (compared to the original technique used in [18]). This is especially true for LowGear-style protocols, which only require expensive ZKPs for one of the triple elements.
In general, one of the triple inputs (i.e., a or b of a triple (a, b, c)) might be more expensive to compute. Therefore, one should consider a reversed version of the sacrificing presented in Fig. 5, taking shares of a, a′, b, c, c′ instead. Technically, this can be achieved by swapping the roles of the two triple inputs when calling Sacrifice.
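The following pure-Python sketch (our illustration over plain values, without secret sharing or MACs) shows the correlation check behind this style of sacrificing for a bilinear operation: a second triple with the same image but an independent filter is consumed to verify the first.

```python
import random

P = 2**61 - 1  # toy prime field

def corr(x, y):
    # valid 1-D cross-correlation mod P; stands in for conv2d (bilinear)
    return [sum(x[t + j] * y[j] for j in range(len(y))) % P
            for t in range(len(x) - len(y) + 1)]

def sacrifice_check(a, b, b2, c, c2):
    """Check c = corr(a, b) by consuming a second triple (a, b2, c2) with the
    same image a. In the real protocol all values are secret-shared and the
    final zero test happens inside the MAC check."""
    t = random.randrange(1, P)  # public random challenge (from F_rand)
    rho = [(t * u - v) % P for u, v in zip(b, b2)]  # opened: t*b - b2
    check = [(t * u - v - w) % P
             for u, v, w in zip(c, c2, corr(a, rho))]
    return all(x == 0 for x in check)
```

By bilinearity, t·corr(a, b) − corr(a, b2) − corr(a, t·b − b2) = 0 holds exactly when both correlations are correct; any additive error in c is scaled by the random challenge t and survives the check only with negligible probability.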

Modifications and Optimizations
While the above MPC protocols are very general (being able to compute triples for any bilinear function that can be represented with the standard case (9) of the framework of Section 3.1), small modifications can be used to also support the non-standard bilinear forms op R in Eq. (8) (e.g., the simple packing of Section 4.1; cf. Section 6.5.1), handle convolutions of any size (cf. Section 6.5.2), utilize ciphertexts more efficiently (cf. Section 6.5.3), and handle convolutions with strides larger than 1 and/or non-zero padding (cf. Section 6.5.4).
6.5.1 Modification for the Simple Convolution Packing. In this paragraph, we discuss how packing images and filters into multiple ciphertexts (as in Section 4.1) is handled. The overall result then is a sum of several homomorphic ciphertext products. Extending the protocols of Sections 6.2 and 6.3 is straightforward. To see that these extended protocols are still secure, notice that the intermediate steps only produce shares of intermediate results (as well as ciphertexts that do not leak any information, as they are either blinded in the LHE protocol or locally computed in the SHE protocol). These intermediate shares are summed up to obtain the overall triple. Security of the extended protocols then directly follows from the security results of Sections 6.2 and 6.3 and the properties of the secret sharing scheme.

6.5.2 Handling Large Convolutions.
Recall that in Sections 3.1 and 4, we usually had to choose N large enough to support packing of all tensor dimensions, e.g., with the simple packing of Section 4.1, or b · c′ · c · h′′ · w′′ ≤ N with the generalization of Huang et al.'s packing (cf. Section 4.2).
The choice of N, on the other hand, affects other parameters, e.g., of the encryption scheme, and can slow down the offline phase significantly if N gets too big. To avoid this blow-up of N and possible parameter changes to the encryption scheme, we split large convolutions into smaller ones, thereby extending the approach of [25] from the passively secure setting to the actively secure setting. While splitting along the batch dimension (b) or output depth dimension (c′) is straightforward (even in the actively secure setting), splitting convolutions along the spatial dimensions or the input depth dimension (c) often leads to an overhead and should then be avoided. The technical reason for this behavior are the ciphertext sums in these dimensions that come with our packing methods and convolution protocols. For irregular splittings, i.e., summands of different dimensions (e.g., splitting c = 11 into parts with c = 6 and c = 5), we can no longer use the full amortization potential of the BGV scheme and the associated ZKPs, which we need in the actively secure setting. For example, in the worst case we need an additional ZKP for 40 ciphertexts for each single ciphertext that encodes a different dimension, and hence ZKPs for 80 ciphertexts for the splitting of c = 11 into parts with c = 6 and c = 5. This large overhead can be reduced by trivially increasing the ciphertexts for small dimensions to a common larger dimension, i.e., using the same dimensions in each part and setting certain parts of the ciphertexts to zero (in our example, we then use c = 6 twice). Nevertheless, a certain overhead due to the zero coefficients remains. We therefore preferably split on dimensions where these problems do not occur and apply irregular splittings only as a last resort.
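A small pure-Python sketch (our illustration, with 1-D channels for brevity) of why splitting along the output depth dimension c′ is harmless: each chunk of output channels is an independent smaller convolution and the results are simply concatenated, with no cross-chunk sums.

```python
def corr1d(x, y):
    # valid 1-D cross-correlation (stride 1)
    return [sum(x[t + j] * y[j] for j in range(len(y)))
            for t in range(len(x) - len(y) + 1)]

def conv(image, filt):
    # image: c channels; filt: c' output channels, each with c sub-filters
    out = []
    for fj in filt:
        chans = [corr1d(image[j], fj[j]) for j in range(len(image))]
        out.append([sum(col) for col in zip(*chans)])
    return out

def conv_split_cprime(image, filt, chunk):
    """Split along the output depth c': every chunk of output channels is an
    independent smaller convolution; the results are concatenated, and nothing
    has to be summed across chunks (unlike splits along the input depth c)."""
    out = []
    for start in range(0, len(filt), chunk):
        out += conv(image, filt[start:start + chunk])
    return out
```

A split along the input depth c, in contrast, would produce partial results that still have to be summed (homomorphically, over ciphertexts), which is the source of the overhead discussed above.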

6.5.3 Combining Ciphertexts for Sacrificing. Finally, we want to discuss an optimal use of the sacrificing technique in our setup. As mentioned in Section 6.4, our sacrificing protocol produces, similar to MASCOT [30], shares of tuples (a, b, b′, c, c′) and then discards, i.e., sacrifices, b′ and c′ to check that c = conv2d(a, b). Now, instead of generating c and c′ separately, e.g., by using two invocations of Multiply in Fig. 7, we can generate them more efficiently by combining b and b′ into a single large filter and then multiplying only once to get both c and c′. For example, we can encode a single convolution of a b × h × w × c image with a 2c′ × c × h′ × w′ filter (of twice the output depth dimension c′) in the ciphertext multiplication. After the multiplication and unpacking, the share of the result (and of the filter) with doubled output depth can simply be split in half along the c′ dimension to get a 5-tuple for sacrificing: one image, two filters, and the results of convolving the image with the two filters. The analogous doubling technique can also be applied to the batch dimension (b; for conv2d or dconv2d).
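In plain (unencrypted) arithmetic, the doubling trick looks as follows; this pure-Python sketch (our illustration, 1-D channels) shows that one convolution with the stacked filter yields both sacrificing results at once.

```python
def corr1d(x, y):
    # valid 1-D cross-correlation (stride 1)
    return [sum(x[t + j] * y[j] for j in range(len(y)))
            for t in range(len(x) - len(y) + 1)]

def conv(image, filt):
    out = []
    for fj in filt:  # one entry per output channel
        chans = [corr1d(image[j], fj[j]) for j in range(len(image))]
        out.append([sum(col) for col in zip(*chans)])
    return out

def conv_both(image, filt, filt2):
    """One convolution with the stacked filter (doubled output depth) yields
    both c and c'; the result is split in half along the output-depth axis."""
    big = conv(image, filt + filt2)
    return big[:len(filt)], big[len(filt):]
```

In the protocol, this halves the number of ciphertext multiplications needed to produce the 5-tuple for sacrificing.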
Please note that this optimization is orthogonal to the splitting in Section 6.5.2. We use both optimizations in our implementation.
6.5.4 Special Convolutions. For (depthwise) convolutions with non-zero padding, e.g., when one expands the (blue) image in Fig. 1 with non-zero values (usually constant values or replicas of the border pixels), or convolutions with strides of 2 or more, we do not offer special constructions in our protocols. This is because the packing methods that homomorphically compute negacyclic convolutions require zero-padding for the constructions to be correct (cf. Appendix D) and compute all pixels of the result (i.e., with a stride of 1). These convolutions can still be computed by expressing them as a (larger) convolution with zero-padding or by discarding parts of the result, respectively. Note that the conv1@7x7 convolution discussed in Section 7 has a stride of 2, and our protocols outperform the related work even though they discard parts of the result.
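The stride handling can be sketched in two lines of Python (our illustration, 1-D): compute the stride-1 result, as the packing does, and discard everything but every s-th output.

```python
def corr1d(x, y, stride=1):
    # valid 1-D cross-correlation with the given stride
    return [sum(x[t + j] * y[j] for j in range(len(y)))
            for t in range(0, len(x) - len(y) + 1, stride)]

def corr1d_via_discard(x, y, stride):
    # compute the stride-1 result (as our packings do) and discard the rest
    return corr1d(x, y)[::stride]
```

The discarded slots are wasted work, but since the whole stride-1 result comes out of a single packed multiplication anyway, this is often still cheaper than a dedicated strided construction.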

IMPLEMENTATION AND EVALUATION
We have implemented our protocols [48] on top of MP-SPDZ [29] by adding support for secure convolutions and depthwise convolutions. Our implementation extends the online phase with convolution tuples for faster convolutions, as well as the corresponding convolution triple generation with both LowGear-style and HighGear-style protocols in the offline phase. The implementation is fully-featured as we can use the wide range of other (non-convolution) operations that are already part of MP-SPDZ.
In the remainder of this section, we show the results of the empirical evaluation of the protocols developed in this work. We evaluate our technique for convolutions with images and filters of typical shapes, using ResNet50 as a reference. Additionally, we benchmark depthwise convolutions for images of different sizes to show the benefit of our specialized handling of depthwise convolutions. Note that our protocols do not affect the accuracy of ML models; the accuracy stays the same as, e.g., in [16,32], who perform secure inference or training with MPC. Therefore, we do not measure accuracy as part of the evaluation.
Evaluation Setup. We ran the benchmarks on a virtual server (AMD EPYC™ 7443 processor @ 2.85 GHz, 4 to 8 cores) emulating different network settings: LAN with 10 ms network delay and 1 Gbit/s network bandwidth, and WAN with 35 ms delay and 320 Mbit/s. These network settings allow us to compare our results to the state-of-the-art way of computing convolutions as matrix multiplications [14]. Our benchmarks utilize only a single thread per party for computations. The benchmarks use n = 2 parties for LowGear-style (LHE-based) offline phases (on 4 cores) and n = 4 parties for HighGear-style (SHE-based) offline phases (on 8 cores). We benchmark our protocol in the same setting as [14] for SPDZ-like protocols: 128 bit of computational security, 40 bit of statistical security, and a plaintext modulus of 128 bit. In the following, we analyze the performance of our protocol in the online phase and in the offline phase.
Runtime in the Offline Phase for Convolutions. In Table 2, we compare the runtime of the classical SPDZ-based MPC computation with field multiplications (LowGear) and of [14] with our protocols.
Table 2: Runtime Results for conv2d Operations in the Offline Phase (in Seconds). Our protocols here are LowGear-based (cf. Section 6.2). Runtime is given for convolutions of ResNet50 [23]. The layer conv1@7x7 is a convolution of a 1 × 224 × 224 × 3 image with a 64 × 3 × 7 × 7 filter and stride 2. The other layers convi@3x3 are for 1 × 7·2^(5−i) × 7·2^(5−i) × 2^(4+i) images, 2^(4+i) × 2^(4+i) × 3 × 3 filters, and stride 1. The above convolutions are repeated multiple times in layer convi; we give the runtime for all convolutions of a layer. (a) column extrapolated from the runtime results in [14] using Tables 4 and 5; (b) extrapolated from results with halved output depth; (c) extrapolated from results with output depth c′ = 2 and b = 1.
The results for [14] show the least increase in runtime when the network gets more limited (comparing LAN to WAN), but the computational overhead of the HE operations used in their offline phase is still too costly to outperform our protocol in the WAN setting. If we compare our results to protocols in the semi-honest setting (e.g., Huang et al. [25] perform the ResNet50 conv1@7x7 convolution in around 2 s with a smaller plaintext modulus in the WAN setting without any online-offline separation), we can see that there is still a large gap in performance between actively and passively secure protocols. However, using our convolution packings noticeably improves upon the state-of-the-art in our actively secure setting. Comparing our protocol's HighGear variant to HighGear shows a 13.43× speed-up for the simple packing and a 12.23× speed-up for the generalization of Huang et al.'s packing. This was measured in the 4-party WAN setting; detailed results can be found in Appendix G. Chen et al. do not evaluate the runtime of their protocol [14] with more than two parties.
Communication Cost for Convolution Triple Generation. For the above computation of ResNet50 convolutions, each party needs to send 2.187 TB of data in the offline phase for the LowGear protocol, 138.672 GB for our protocol with the simple packing, and 134.635 GB with our generalization of Huang et al.'s packing. We estimate Chen et al.'s [14] communication cost to be 21.020 GB. As seen above, the low communication cost of [14] does not translate to a faster protocol, as we clearly outperform theirs in the evaluated setting. This shows that we can successfully trade communication cost for faster protocols by avoiding expensive ciphertext rotations with our packing methods. A more detailed analysis of the communication costs can be found in Appendix G.
Round Complexity in the Offline Phase. Also note that the (theoretical) round complexity of the protocols is almost the same. Not considering the setup (key and MAC generation, etc.), the triple generation requires 4 rounds for [14], 6 rounds for LowGear-style protocols (ours and [31]), and 8 rounds for HighGear-style protocols (ours and [3,31]).
Runtime in the Offline Phase for Depthwise Convolutions. We also benchmarked depthwise convolutions. The results are depicted in Table 3. For dconv2d, filter sizes of 3 × 3 are standard [2,15,24,49]. Therefore, we benchmark these for different image sizes. We fix the depth to c = 512 due to the separable nature of dconv2d, i.e., each entry along the depth dimension is independent and thus the runtime scales linearly with the depth. Runtimes for other values of c can simply be extrapolated from our results. As can be seen in Table 3, the matrix-based approach of [14] is unsuitable for depthwise convolutions and performs worse than the standard LowGear protocol. This is because [14] would compute matrix multiplications of the same size as for a conv2d computation (with input and output depth set to 512 in this example), incurring both the overhead of the non-optimal emulation of convolutions with matrix multiplications and the overhead of the mismatch between the 128 × 128 matrices computed by [14] and the matrices needed to compute convolutions. Note that this still performs better than computing a single matrix multiplication for each of the output channels.
In contrast, we can use our depthwise packing of Section 4.3, which performs well for images of size 50 and below, or compute conv2d operations to emulate dconv2d (with the simple packing or the generalized Huang et al. packing), which performs well for larger images (larger than size 50). The conv2d packings compute only one output channel per polynomial multiplication and are therefore slower whenever we could instead compute multiple channels with the depthwise packing. If the image size grows, the depthwise packing also computes only one channel per convolution, and then our implementation of the conv2d packings utilizes the optimization of Section 6.5.2 better to compute a few (partial) convolutions per output channel.
Overall, the right choice of one of our packing schemes can outperform LowGear for all but the smallest image sizes (LAN: up to 18.59× faster with h = 240; WAN: up to 26.53× faster with h = 240), and all of them outperform [14]. We also tested Bian et al.'s packing scheme (see Appendix G). First tests show considerably worse performance compared to LowGear (≈100× slower). The main reason for this inefficiency is the computational overhead of the modified BGV scheme that we employ for this packing (cf. Appendix A.2) and the increase in communication from the new type of ciphertexts.
Runtime in the Online Phase. In the online phase, we compare our approach (using convolution triples) to the standard SPDZ protocol (the distinction between LowGear and HighGear is only meaningful for the offline phase) and the use of matrix triples to emulate convolutions (as done in [14]). Note that for matrix triples, we assume that matrix triples of any shape are already precomputed. This is the optimal setting for the matrix-based approach and strictly better than [14] which only produces matrices that are a multiple of 128 × 128 in size. For the same layers as in Table 2, our approach with convolution triples clearly outperforms the SPDZ online phase (LAN: 16.39 × faster; WAN: 27.21 × faster) and also the approach based on matrix triples (LAN: 8 % faster; WAN: 12 % faster). The detailed results can be found in Appendix G.
For depthwise convolutions, our advantage of specialized convolution triples is even more pronounced (in certain cases) compared to SPDZ (LAN: 19.41 × faster on average for ℎ ∈ {7, 25, 50, 120, 240} and 41.84 × faster for ℎ = 7; WAN: 20.14 × faster on average and 42.58 × faster for ℎ = 7) and also compared to matrix triples (LAN: 13.51 × faster on average and 40.15 × for ℎ = 7; WAN: 15.70 × faster on average and 41.84 × for ℎ = 7). Hence, we observe a considerable speed-up for small images (due to the better communication complexity) that gets smaller as the image size (and computational complexity) increases. However, even for large images of size 240, our advantage is 3.87 × (LAN) to 5.33 × (WAN) compared to matrix triples.
Storage Cost for Convolutions. To run the above-mentioned convolutions in the online phase, SPDZ requires storage for 188.899 GB of Beaver triples. Chen et al. would have to store 2.653 GB of 128 × 128 matrix triples. Our convolution triples require 572 MB.
In summary, our evaluation shows that our implementation significantly outperforms current actively secure state-of-the-art protocols for convolution and convolution-based ML tasks.
Acknowledgments. We thank our anonymous reviewers and our shepherd for their invaluable feedback. We also thank Andrés Bruhn and Azin Jahedi from the Institute for Visualization and Interactive Systems at the University of Stuttgart for providing the computational resources and assistance with running our experiments.

A PRELIMINARIES (CONTINUED)
Here, we give additional details to the preliminaries outlined in Section 2. This includes, for example, common MPC subprotocols and details on BGV, as well as the modification necessary to use BGV with Bian et al.'s packing [6].

A.1 Homomorphic Encryption and BGV (Continued)
Here, we present a more detailed description of the BGV encryption scheme [8]. Some aspects are discussed only on a conceptual level as the details are less relevant for this work. An interested reader can find all details in [8,17]. First, we present the necessary distributions that values are sampled from (cf., e.g., [17,31]).
• HW ℎ : Outputs a vector of length N with elements chosen from {−1, 0, 1}. Exactly h ≤ N elements are chosen to be nonzero (uniformly random from {−1, 1}); the others are zero. h is chosen based on the target security level, e.g., h = 64 + sec for statistical security parameter sec in [31]. In the context of BGV, the resulting N-vectors are interpreted as elements of R by using the output vector as the coefficient vector of a polynomial in the ring.
Let sk = s be the BGV private key, where s is sampled from HW ℎ . Then, pk = (a, b) is the corresponding public key for a uniformly random a ∈ R and b = a · s + p · e, where e is sampled from DG. Encryption is performed with randomness r = (u, v, w) sampled with RC, i.e., the encryption of x ∈ R is
enc pk (x, r) = (b · u + p · v + x, a · u + p · w). (15)
The corresponding decryption is
dec sk (c0, c1) = ((c0 − s · c1 mod q) mod p), (16)
where the reduction modulo q is centered, i.e., produces representatives in (−q/2, q/2]. We also make use of the following ciphertext operations. Let x, y ∈ R and r, r′ ∈ R^3 be valid encryption randomness; the BGV scheme (for suitable parameters) has the following homomorphic properties:
dec sk (enc pk (x, r) + enc pk (y, r′)) = x + y
dec sk (enc pk (x, r) + y) = x + y
dec sk (enc pk (x, r) · enc pk (y, r′)) = x · y
dec sk (x · enc pk (y, r′)) = x · y.
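The scheme and its additive homomorphism can be sketched with toy parameters in pure Python. This is our illustration only: the parameters below (N = 8, p = 257, q = 2^30) are chosen so that the noise stays tiny and the code is easy to follow; they are completely insecure, and ternary noise stands in for the HW/DG/RC distributions.

```python
import random

N, p, q = 8, 257, 1 << 30  # toy (insecure) ring degree, plaintext/ciphertext moduli

def negamul(a, b, mod):
    # polynomial product modulo X^N + 1 and the given modulus
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k, s = (i + j) % N, (1 if i + j < N else -1)
            res[k] = (res[k] + s * a[i] * b[j]) % mod
    return res

def small():  # stand-in for the HW/DG/RC distributions: ternary noise
    return [random.randint(-1, 1) for _ in range(N)]

def keygen():
    s, e = small(), small()
    a = [random.randrange(q) for _ in range(N)]
    b = [(t + p * ei) % q for t, ei in zip(negamul(a, s, q), e)]  # b = a*s + p*e
    return s, (a, b)

def enc(pk, x):  # enc_pk(x, r) = (b*u + p*v + x, a*u + p*w)
    a, b = pk
    u, v, w = small(), small(), small()
    c0 = [(t + p * vi + xi) % q for t, vi, xi in zip(negamul(b, u, q), v, x)]
    c1 = [(t + p * wi) % q for t, wi in zip(negamul(a, u, q), w)]
    return (c0, c1)

def ct_add(ct1, ct2):  # homomorphic addition is component-wise
    return tuple([(u + v) % q for u, v in zip(y1, y2)] for y1, y2 in zip(ct1, ct2))

def dec(s, ct):  # dec_sk(c0, c1) = ((c0 - s*c1 mod q) mod p), centered mod q
    c0, c1 = ct
    m = [(u - v) % q for u, v in zip(c0, negamul(c1, s, q))]
    return [(t - q if t > q // 2 else t) % p for t in m]
```

Decryption recovers x + p·(e·u + v − s·w); since the noise term is far below q/(2p) for these parameters, the centered reduction modulo q followed by reduction modulo p strips it off exactly.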
Note that x + y and x · y are additions and multiplications of polynomials, where coefficients are additionally taken modulo p. We abuse notation to also write + and · for operations on ciphertexts, but these can be more complex (especially ciphertext-ciphertext multiplication; see below).
Also note that there is an isomorphism CRT : R → F^N (based on the Chinese remainder theorem) for the plaintext space. In particular,
CRT(x · y) = CRT(x) ⊙ CRT(y), (21)
where ⊙ is the component-wise multiplication. We remark that, using (21), a single ciphertext-ciphertext multiplication represents N underlying field multiplications in F. This is used in most SPDZ-like protocols since [18].
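The CRT isomorphism can be demonstrated with toy parameters in pure Python (our illustration; N = 4 and p = 17 are insecure toy values with p ≡ 1 (mod 2N), so that X^N + 1 splits into linear factors mod p): evaluating a polynomial at the primitive 2N-th roots of unity turns negacyclic multiplication into component-wise multiplication.

```python
N, p = 4, 17            # toy parameters with p = 1 (mod 2N), so X^N + 1 splits mod p
roots = [2, 8, 15, 9]   # the four primitive 8th roots of unity mod 17 (odd powers of 2)

def crt(x):
    # evaluate the polynomial x at every root: R_p -> F_p^N
    return [sum(c * pow(r, i, p) for i, c in enumerate(x)) % p for r in roots]

def negamul(a, b):
    # product modulo X^N + 1 and p
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k, s = (i + j) % N, (1 if i + j < N else -1)
            res[k] = (res[k] + s * a[i] * b[j]) % p
    return res
```

Each slot of the CRT image hosts one independent field multiplication, which is how one ciphertext-ciphertext multiplication amortizes over N multiplications in F.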
Some MPC protocols use modulus switching and key switching in the ciphertext-ciphertext multiplication (e.g., [17,31]), i.e., the multiplication of two ciphertexts in C = R^2 yields a ciphertext in C′ = R^2 (with a smaller ciphertext modulus q′) that can be decrypted just as before. Note that (16) takes ⟨x⟩ ∈ C as input and all operations before the reduction modulo p are modulo q. For ciphertexts after modulus and key switching, dec′ sk : C′ → R should be used, where the operations are the same as in (16) but modulo q′ before the modulo-p reduction. For simplicity, we simply write dec sk also for this decryption operation. Details on the ciphertext-ciphertext multiplication, as well as modulus and key switching, can be found in [8,17]. Ciphertext-ciphertext addition is done component-wise, and plaintext-ciphertext multiplication simply multiplies the plaintext with each ciphertext component. Ciphertext-plaintext addition can be done by adding the plaintext to the first component of the ciphertext. (Equivalently, one could generate a ciphertext for the plaintext by encrypting it with zero randomness and then use ciphertext-ciphertext addition.)
A.1.1 BGV Noise Drowning. We are interested in an encryption enc′ with additional noise (drowning noise) that is large enough to statistically hide the decryption noise of plaintexts of the form x · ⟨y⟩, i.e., the following.
Theorem A.1. The encryption with drowning noise enc′ pk (z) statistically hides the noise of x · enc pk (y) for arbitrary x, y, z ∈ R.
This is used in LowGear [31] to build a secure triple generation from only linearly homomorphic encryption. The original approach of LowGear simply chooses the encryption randomness (and z) exponentially larger than for normal encryption. We use the newer version, e.g., as implemented in [29], where enc′ pk (z, r′) adds a large additional noise term to enc pk (z, r′), and where partdec denotes the partial decryption, i.e., decryption without the reduction modulo p. This is also the noise that can be observed after decryption.

A.2 Applying Bian et al.'s Modifications to Linear Homomorphic BGV
Bian et al. [6] modified (private-key) BFV to homomorphically apply an arbitrary linear operation to encrypted data vectors. Here, we present the corresponding modification to (public-key) BGV. The key generation and the keys remain the same as in Appendix A.1. However, we only use the vector notation of polynomial multiplication and explicit negacyclic convolution instead of polynomial multiplication here, i.e., sk = s and pk = (a, b) with b = a * s + p · e, where a is sampled uniformly at random modulo q and e is sampled with DG. A tool that Bian et al. use (and that is usually not used for BGV) is the representation of polynomial multiplications (or negacyclic convolutions) with (nega)circulant matrices circ:

circ(a) · b = a * b. (24)

This means we can write the typical polynomial multiplications in terms of matrix-vector multiplications. With this, encryption is also similar to (15) but the second ciphertext component is expanded:

expandenc_pk(m, r) := (b * u + p · e_0 + m, circ(a * u + p · e_1)), (25)

where the encryption randomness r = (u, e_0, e_1) is again sampled from RC. We use ⟨⟨ · ⟩⟩_pk analogously to ⟨ · ⟩_pk as notation for expanded ciphertexts under public key pk. Decryption is similar to (16) but the second part of the ciphertext is now multiplied with sk as a matrix-vector product (instead of a polynomial multiplication):

expanddec_sk((c_0, C_1)) := ((c_0 − C_1 · sk) mod q) mod p. (26)

As before, all operations in (25) and (26) are modulo q (except the reduction modulo p in (26)).
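A minimal sketch of the (nega)circulant representation (toy parameters, helper names are ours): the matrix-vector product circ(a) · b agrees with the product a * b in Z_p[X]/(X^N + 1).

```python
import numpy as np

N, p = 4, 17   # toy parameters

def negacirc(a):
    """Negacirculant matrix of a: negacirc(a) @ b = a * b mod (X^N + 1, p)."""
    M = np.zeros((N, N), dtype=int)
    for k in range(N):
        for j in range(N):
            # row k collects a_i * b_j with i + j = k, minus the wrapped terms i + j = k + N
            M[k, j] = a[k - j] % p if k >= j else (-a[N + k - j]) % p
    return M

def negacyclic_mul(a, b):
    """Reference: schoolbook multiplication in Z_p[X]/(X^N + 1)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1       # X^N = -1
            c[(i + j) % N] = (c[(i + j) % N] + sign * a[i] * b[j]) % p
    return c

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert list(negacirc(a) @ np.array(b) % p) == negacyclic_mul(a, b)
```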
Theorem A.2. The modified BGV scheme (pictured above) is a correct (public-key) encryption scheme and CPA secure.
Proof. This can be seen by following the proof of [6] but with the modified BGV scheme instead of BFV. We summarize the core observations.
Correctness: The scheme still decrypts correctly as we simply split the negacyclic convolution of (16) (that appears in the form of a polynomial multiplication) into two parts. This can be done using (24).
CPA Security: Consider the standard BGV encryption of (15). Compared to (25), circ is applied to the second component of a standard BGV ciphertext. This can be efficiently done (and undone) without any secret information. Thus, the CPA security of the modified scheme can be trivially reduced to the CPA security of the standard BGV scheme.

In addition to simple linear operations (ciphertext-ciphertext addition, ciphertext-plaintext addition, and plaintext-ciphertext multiplication), which can be performed as for standard BGV, the modified scheme allows for applying linear operations on the encrypted plaintext vector.

Theorem A.3. Let M ∈ Z^{N×N} be the matrix for an arbitrary linear transformation and v ∈ Z^N. Then,

expanddec_sk(M · expandenc_pk(v, r)) = M · v (27)

for valid encryption randomness r, where M · ⟨⟨v⟩⟩ := (M · ⟨⟨v⟩⟩[0], M · ⟨⟨v⟩⟩[1]).
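Theorem A.3 can be checked on a toy instance (parameters far too small for any security; helper names and parameter choices are ours): applying a plaintext matrix M to both components of an expanded ciphertext and then decrypting yields M · v modulo p.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 4, 17, 2**20     # toy parameters, far too small for real security

def negacirc(a):
    """Negacirculant matrix: negacirc(a) @ b = a * b mod (X^N + 1, q)."""
    M = np.zeros((N, N), dtype=np.int64)
    for k in range(N):
        for j in range(N):
            M[k, j] = a[k - j] if k >= j else -a[N + k - j]
    return M % q

def polymul(a, b):          # multiplication mod X^N + 1 (reduced mod q later)
    return negacirc(a) @ np.asarray(b, dtype=np.int64)

def center(x):              # centered representative mod q
    x = x % q
    return np.where(x > q // 2, x - q, x)

# key generation: sk = s, pk = (a, b) with b = a*s + p*e
s = rng.integers(-1, 2, N)
a = rng.integers(0, q, N)
e = rng.integers(-1, 2, N)
b = (polymul(a, s) + p * e) % q

def expand_enc(m):          # expanded encryption: second component is a matrix
    u, e0, e1 = (rng.integers(-1, 2, N) for _ in range(3))
    c0 = (polymul(b, u) + p * e0 + np.asarray(m)) % q
    c1 = (polymul(a, u) + p * e1) % q
    return c0, negacirc(c1)

def expand_dec(ct):
    c0, C1 = ct
    return center(c0 - C1 @ s) % p

M = rng.integers(0, p, (N, N))       # an arbitrary linear map on the plaintext
m = rng.integers(0, p, N)
c0, C1 = expand_enc(m)
Mct = ((M @ c0) % q, (M @ C1) % q)   # apply M homomorphically to the ciphertext
assert np.array_equal(expand_dec(Mct), (M @ m) % p)
```

The noise analysis is implicit here: the toy parameters are chosen such that M · (m + p · noise) never wraps around modulo q.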

A.3 Zero-Knowledge Proofs
In the following, we present the zero-knowledge proofs of knowledge (ZKPoKs) used in our protocols. First, we present a non-interactive proof based on SPDZ [18] (which utilizes the Fiat-Shamir heuristic). Then, we give an interactive TopGear-style (multiparty) ZKPoK [3]. The first is used in LowGear-style protocols and the second one in HighGear-style protocols. Note that one can also define a non-interactive (non-multiparty) proof with TopGear-style challenges. This is, for example, done (and implemented) in MP-SPDZ [29]. Therefore, we use this in our implementation for LowGear-style protocols if the set of fixed zero positions is empty (as in MP-SPDZ's implementation of the standard LowGear protocol [31]). We do not picture this ZKP variant here.

A.3.1 SPDZ-Style ZKPs.
Our SPDZ-style ZKP can be found in Fig. 8. We slightly change the ZKP compared to SPDZ by requiring the plaintexts to be zero at fixed positions. In [18], this is only done for the cases where the set of zero positions is empty or is [1..N). However, one can easily prove this general version secure in the same way as the original ZKP of [18].
In the protocol, we use a general security parameter sec_ZK and a slack bound chosen as in [18].
A.3.2 TopGear-Style ZKPs. In Fig. 9, we present the TopGear ZKPoK protocol [3]. This is a multiparty ZKPoK and proves that summing up all parties' ciphertexts yields a valid ciphertext. Baum et al. [3] also give this proof only for the cases where the set of zero positions is empty or is [1..N). As with the above changes to the original SPDZ ZKPoK, one can easily extend this to arbitrary zero-position sets and prove it secure.
In the protocol, we use a security parameter sec_ZK for the statistical distance of the real ZKP execution from a simulation, with the remaining parameters set just like Baum et al. For an empty zero-position set, one requires at least (sec_soundness + 2)/log(2N + 1) proof repetitions for security, where sec_soundness is the security parameter for the proof soundness. For other zero-position sets, at least sec_soundness + 1 repetitions are required.

A.4 MPC and SPDZ
Here, we want to point out the remaining subprotocols used in our SPDZ-like protocols. This includes the MAC check (Fig. 10), ZKP subprotocols (Fig. 11), initialization or setup phases (Figs. 12 and 13), and distributed decryption (Fig. 14). Additionally, our protocols use several standard functionalities. We do not picture them here but briefly describe their function. F_rand is used to agree on random values. These values are then available at every party and are uniformly random from the required set (usually F elements or challenges for TopGear ZKPs; cf. Appendix A.3.2). F_commit models a synchronization step where all parties first send a value to the functionality and then receive every other party's value after all messages of the first round have arrived. Finally, F_setup models key generation for BGV. Depending on the protocol style (LowGear or HighGear), these are either keys for every party or a single public key and a secret-shared private key.

Figure 13: Initialization Step of the Somewhat Homomorphic Offline Phase Used in Π_offline-SHE (cf. Fig. 19) at Party

Remark B.1. Note that the above defines the (un)packing directly and not via the mappings mapi, mapf, mapr. For compatibility with the protocols that make use of the mappings, e.g., Fig. 19, we define them as the identity.
Remark B.2. As individual shares for scalar elements are more versatile than vectors of shares, one can add such a conversion as a last step of Triples.

Figure 14: Distributed Decryption. DistDec(⟨x⟩) performs distributed decryption to obtain x in the clear (adapted from [18]); a second variant decrypts directly to shares (adapted from [31]). Both start from a bound on the noise of ⟨x⟩ = (c_0, c_1).
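The idea behind the distributed decryption above can be sketched in a simplified scalar (non-ring) setting; this is a toy model with our own parameters, not the actual protocols of [18,31]: each party publishes its partial decryption drowned with a random multiple of p, and the sum still decrypts to the message.

```python
import random
random.seed(1)

p, q, sec, n = 17, 2**30, 20, 3         # toy parameters (scalar LWE-style, no ring)
B = 4                                   # bound on the decryption noise

# additive sharing of the secret key s over Z_q
shares = [random.randrange(q) for _ in range(n - 1)]
s = 123456789 % q
shares.append((s - sum(shares)) % q)

# a toy "ciphertext" (c0, c1) with c0 - s*c1 = m + p*e (mod q)
m, e = 11, random.randint(-B, B)
c1 = random.randrange(q)
c0 = (c1 * s + p * e + m) % q

# each party publishes its partial decryption, drowned with noise p * R_i
partial = [(c1 * sh + p * random.randrange(2**sec * B)) % q for sh in shares]

t = (c0 - sum(partial)) % q             # = m + p*(e - sum R_i)  (mod q)
t = t - q if t > q // 2 else t          # center mod q
assert t % p == m                       # reduction mod p recovers the message
```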

C CONVOLUTION PACKING (CONTINUED)
After completing Section 3.2.1 by giving the proof of Theorem 3.3, we also present another recent convolution packing method: Bian et al.'s packing [6], which performs multiple independent convolutions with a single matrix-vector multiplication.

C.1 Multidimensional Convolution Packing (Continued)
Here, we give the proof for Theorem 3.3.

C.2 Bian et al.'s Parallel Convolution Packing
In [6], Bian et al. propose a technique to perform multiple independent convolutions in parallel. In contrast to most other approaches discussed in this work, their approach does not encode multiple convolutions into a single polynomial multiplication. Instead, they make use of specially constructed matrices. More specifically, they aim to compute f_i * x_i for (1d) images x_1, . . . , x_c and (1d) filters f_1, . . . , f_c. Our bilinear operation op is in this case just ((x_1, . . . , x_c), (f_1, . . . , f_c)) ↦ (f_1 * x_1, . . . , f_c * x_c). For the packing, one would define packi as the concatenation of the x_i into a single vector x and the output of packf as F = diag(circ(f_1), . . . , circ(f_c)), i.e., a block-diagonal matrix with matrices that correspond to convolutions with f_i in the i-th block. op_R is then the matrix multiplication (F, x) ↦ F · x, which yields F · x = (f_1 * x_1, . . . , f_c * x_c)^T. This way, one can obtain a vector that encodes the concatenation of parallel/independent convolutions with a single matrix-vector product. Please note that our general framework also supports this matrix variant. To evaluate this securely, they present a variant of a homomorphic encryption scheme that supports such a matrix-vector multiplication of a plaintext matrix with an encrypted vector. We further extend this, such that the use in LowGear-style protocols is secure (cf. Appendix F.3.2).
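A small sketch of this packing (toy sizes, helper names are ours): a block-diagonal matrix of (nega)circulant blocks applied to the concatenated images computes all convolutions in one matrix-vector product.

```python
import numpy as np

N, p, c = 4, 17, 3   # toy: c parallel length-N negacyclic convolutions mod p

def negacirc(f):
    """Negacirculant matrix: negacirc(f) @ x = f * x mod (X^N + 1, p)."""
    M = np.zeros((N, N), dtype=int)
    for k in range(N):
        for j in range(N):
            M[k, j] = f[k - j] % p if k >= j else (-f[N + k - j]) % p
    return M

rng = np.random.default_rng(0)
filters = rng.integers(0, p, (c, N))
images = rng.integers(0, p, (c, N))

# packf: block-diagonal matrix of circulant blocks; packi: concatenated images
F = np.zeros((c * N, c * N), dtype=int)
for i in range(c):
    F[i * N:(i + 1) * N, i * N:(i + 1) * N] = negacirc(filters[i])
x = images.reshape(-1)

packed = (F @ x) % p                                    # one matrix-vector product
separate = np.concatenate([negacirc(f) @ v % p for f, v in zip(filters, images)])
assert np.array_equal(packed, separate)
```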

D NEW PACKING METHODS (CONTINUED)
Here, we present the correctness proof for our new packing methods of Section 4.

D.2 Generalization of Huang et al.'s Convolution Packing (Continued)
Here, we give the proof for Theorem 4.2.

D.3 Depthwise Convolution Packing (Continued)
Here, we give the proof for Theorem 4.3.

E SECURITY OF THE ONLINE PHASE
Before we can prove Theorem 5.1, we give the full online protocol in Fig. 16 and the corresponding functionality in Fig. 15. Note that Π_online also uses F_offline given in Fig. 17 (cf. Appendix F), as well as F_rand, which we describe in Appendix A.4. The security of our online phase then follows in this hybrid model.
Theorem 5.1. The online protocol Π online securely implements the ideal functionality F online in the (F offline , F rand )-hybrid model.
Proof. Compared to SPDZ [17,18], the only difference in our protocol is the use of specialized triples for convolutions (and matrix multiplications). This, however, is just a generalization of the standard Beaver triples for scalar multiplication and is secure for any bilinear operation [14]. □

Figure 17 pictures the offline functionality that we want to implement in classical SPDZ-like protocols. The functionalities and subprotocols it uses are discussed in Appendix A.4.
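The generalization of Beaver triples underlying the proof can be sketched as follows; this is a toy two-party example with our own helper names (the real protocol additionally authenticates all shares with MACs and uses proper triple generation):

```python
import numpy as np

p = 65537                        # toy prime field
rng = np.random.default_rng(0)

def conv2d(x, f):                # "valid" 2-d convolution (here: cross-correlation)
    h = x.shape[0] - f.shape[0] + 1
    w = x.shape[1] - f.shape[1] + 1
    out = np.zeros((h, w), dtype=np.int64)
    for i in range(h):
        for j in range(w):
            out[i, j] = int(np.sum(x[i:i+f.shape[0], j:j+f.shape[1]] * f)) % p
    return out

def share(v):                    # additive two-party sharing mod p
    r = rng.integers(0, p, v.shape)
    return r, (v - r) % p

# offline: a random triple (A, B, C = conv2d(A, B))
A = rng.integers(0, p, (4, 4)); B = rng.integers(0, p, (2, 2))
C = conv2d(A, B)
A0, A1 = share(A); B0, B1 = share(B); C0, C1 = share(C)

# online: convolve secret-shared x and y using the triple
x = rng.integers(0, p, (4, 4)); y = rng.integers(0, p, (2, 2))
x0, x1 = share(x); y0, y1 = share(y)
eps = (x0 - A0 + x1 - A1) % p    # opened value x - A
dlt = (y0 - B0 + y1 - B1) % p    # opened value y - B
z0 = (C0 + conv2d(eps, B0) + conv2d(A0, dlt) + conv2d(eps, dlt)) % p
z1 = (C1 + conv2d(eps, B1) + conv2d(A1, dlt)) % p
assert np.array_equal((z0 + z1) % p, conv2d(x, y))
```

The same pattern works for any bilinear op, as only the bilinearity of conv2d is used.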

F SECURITY OF THE OFFLINE PHASE
Figure 15: The Ideal Functionality F_online. Init sets up a storage for a write-only mapping of identifiers to values (or tensors) in F. Input assigns a party's input to a previously unassigned identifier. Convolve, DepthwiseConvolve (as Convolve but with dconv2d instead of conv2d), and MatrixMultiply take assigned input identifiers and store the result under a previously unassigned identifier. On Output, the value is first given to the adversary; if the adversary replies ok, it is also output to all parties, otherwise ⊥ is output.

F.1 Linear Homomorphic Offline Phase
For the LowGear protocol [31], the security proof does not separately prove the security of an online phase that performs an arithmetic circuit computation and an offline phase that only produces correlated randomness. Instead, a somewhat combined functionality F_auth-MPC is constructed and the proof shows that LowGear securely implements this. We omit an explicit depiction of this functionality here, but the general design follows [31]. F_auth-MPC behaves as follows. Firstly, a functionality F_auth-linear can be constructed from F_online that simply omits the non-linear operations (Multiply, Convolve, etc.). This corresponds to F_⟦·⟧ in [31]. Secondly, F_auth-MPC is F_auth-linear where these omitted operations are contained but changed in the following way. Instead of taking two already assigned and one unassigned identifier, the operation takes three unassigned identifiers. It then samples random triples for the operation (e.g., a random image and a random filter of the correct shape for a convolution, and then computes the convolution result) and stores them under the three identifiers. F_auth-MPC corresponds to F_Triple in [31]. Also, Π_offline-LHE could use F_auth-linear internally for linear operations in Sacrifice and to authenticate values. Please note that, to get a standalone protocol description, we did not use F_auth-linear in Fig. 7.
Theorem 6.1. The offline protocol Π offline-LHE securely implements the ideal functionality F auth-MPC in the (F auth-linear , F commit , F rand , F setup )-hybrid model with rewinding if the used BGV cryptosystem achieves enhanced CPA-security [31].
Proof. Compared to LowGear [31], our protocol exhibits the following changes: (i) Parties possess a single public-key private-key pair instead of key pairs (pk_{i,j}, sk_{i,j}) between each pair of parties (i, j). (ii) The key pairs are generated by a setup functionality F_setup. (iii) Different encodings/packings and adapted ZKPs are used. The first two points are done to simplify the exposition of the protocol (a simulator can still decrypt messages encrypted under the public key of corrupted parties as the key generation is under the control of the simulator in the simulation; the adversary can encrypt messages that, without access to the private key, only the intended recipient can decrypt). One could also modify our protocol and use the original LowGear design instead of the changes (i) and (ii).
For point (iii), notice that the ZKPs only differ in that they additionally prove that encrypted messages are correctly packed. Hence, this does not influence the security of the protocol. Moreover, notice that the masks in Multiply are chosen to drown any information about the multiplication result and additionally hide any structure that could result from multiplying packed values, i.e., the outputs of the Multiply step are indistinguishable from what would be received in LowGear. Therefore, we can simply perform all steps of the simulation of LowGear's security proof to also prove our protocol secure. □

Figure 19: Protocol Π_offline-SHE. Triples(op, ·, ·): Generate a triple for the bilinear map op.
Proof. The offline protocol Π_offline-SHE is structured like the offline phase of HighGear [31] or TopGear [3]. The use of different encodings/packings and the adapted ZKPs are the only differences compared to these protocols. As already mentioned above, proving the necessary properties of the ZKPs can be done by simply following the proof in [3]. Similarly, the full security proof follows the blueprint of an SHE-based SPDZ-like offline phase [3,18,31], where the simulator simply has to be adapted to apply the packing method. Notice that the shares in the output of Reshare and ShareDec appear uniformly random in our protocol (independently of the packing method used), as well as in SPDZ, as the masks are uniformly random. □

F.3 Linear Homomorphic Offline Phase Utilizing Bian et al.'s Parallel Convolution Packing
In this section, we investigate the packing of [6] (cf. Appendix C.2) as part of an offline phase. To use this convolution packing, Bian et al. modified the private-key version of the BFV encryption scheme [7,22] in [6] to support homomorphic matrix-vector multiplication. A similar modification for (public-key) BGV is possible in a straightforward way. We call the encryption algorithm of the modified BGV instance expandenc and we have expandenc(·) = expand(enc(·)) (cf. Appendix A.2 for details). The new encryption scheme can then be used to perform matrix-vector multiplications with encrypted vectors and plaintext matrices instead of polynomial multiplications (or negacyclic convolutions). The respective packing method allows us to encode multiple convolutions in a single matrix multiplication (cf. Appendix C.2) in an actively secure way. Before we describe our offline protocol in Appendix F.3.2, we investigate the combination of the packing from [6] and our new encryption scheme w.r.t. active security.
F.3.1 Active Security with the Modified BGV Scheme. Recall that LowGear-style protocols use a pairwise subprotocol that multiplies a ciphertext and a plaintext and drowns the result with an encrypted mask (see Step 1.2. of Multiply in Fig. 6). The straightforward extension that simply uses the new encryption scheme expandenc(·) = expand(enc(·)) comes with a security issue: due to the underlying packing, the matrices and vectors come with a certain structure. This structure changes under the plaintext-ciphertext matrix multiplication. Hence, the product and the mask no longer have the same structure. In particular, the mask no longer drowns all information in the product, and information on the plaintext matrix (the structure or the values) is leaked. Obviously, not masking the product at all, as in [6], is also not viable as it directly leaks information about the plaintext matrix. Instead, we propose a (secure) alternative encryption expandenc′ for the mask to be used in a LowGear-style protocol. We remark that our construction might be of independent interest for other protocols. Formally, we get the following two security guarantees:

Theorem F.1. Let enc′ be the encryption with drowning noise from LowGear (cf. [31] and Appendix A.1.1). The encryption with drowning noise expandenc′ statistically hides the noise of M · expandenc_pk(v) for arbitrary matrices M and vectors v of matching dimensions, where one component of the drowning randomness is encryption randomness for enc′, another is sampled uniformly at random, and the remaining component is determined by these.
Proof. Analyzing the bounds, we get ∥expandpartdec_sk(expandenc′_pk(z, r))∥_∞ = ∥partdec_sk(enc′_pk(z, r[i]))∥_∞ for any component i of the drowning randomness. Finally, note that ∥expandpartdec_sk(M · expandenc_pk(v))∥_∞ = ∥partdec_sk(x · enc_pk(y))∥_∞ for arbitrary M, v, and x, y ∈ R, as we upper-bound both the result of a multiplication of a value with M (i.e., ∥M · v∥_∞) and with x (i.e., ∥x · y∥_∞, where the multiplication is a polynomial multiplication) by the same bound, N · B · ∥v∥_∞ and N · B · ∥y∥_∞, respectively (for a bound B on ∥M∥_∞ and ∥x∥_∞). □

Theorem F.2. The encryption with drowning noise expandenc′ for the modified BGV scheme computationally hides M · expandenc_pk(v) (for expandenc′, M, v, etc. as in Theorem F.1).

Proof (sketch). The result is masked with (parts of) RLWE samples. For this, note that a * r[i] + p · e[i] is indistinguishable from uniformly random (if a is uniformly random or indistinguishable from it, as it is for every party except the one holding sk). The multiplication then simply selects one element of the respective RLWE sample. With the sum over all samples, we get that each element of the result is masked with an element of a fresh RLWE sample, which is indistinguishable from random. Therefore, the first ciphertext component is indistinguishable from uniformly random (based on the hardness of the RLWE problem). □

Figure 20: Protocol Π_modified-offline-LHE (cf. Fig. 7). Note that it uses the modified BGV scheme (cf. Appendix A.2) as well as normal BGV encryption. MMultiply: pairwise multiplication with expanded ciphertexts. MTriples(op, ·, ·): Generate a triple for the bilinear map op.

With this (secure) drowning encryption, we can construct a LowGear-style protocol (similar to the LHE protocol described in Section 6.2) but based on matrix-vector products instead of polynomial multiplications, as well as on the modified BGV scheme. This is outlined next. This offline phase mostly mirrors the linear homomorphic offline phase of Section 6.2 but with different encodings and homomorphic matrix-vector multiplications instead of polynomial multiplications of ciphertexts. Note that we still use the standard BGV scheme for ZKPs and authentication since the previously described modifications to BGV are not needed for these subprotocols and would only lead to additional overhead from the use of expanded ciphertexts. Indeed, we can simply perform the standard ZKPs and expand the ciphertexts later to send less data and reuse existing implementations. Also, the multiplication with encrypted shares of the MAC key does not require the properties of the modified BGV scheme and can thus fall back to the same techniques as in Section 6.2.
Similar to Theorem 6.1, the following theorem captures the security of our modified LHE-based protocol. The required functionalities are the same as for Theorem 6.1.
Theorem F.3. The offline protocol Π modified-offline-LHE securely implements the ideal functionality F auth-MPC in the (F auth-linear , F commit , F rand , F setup )-hybrid model with rewinding if the used BGV cryptosystem achieves enhanced CPA-security [31].
Proof. The only difference between MTriples in Fig. 20 and Triples (in Π_offline-LHE; Fig. 7) is the use of MMultiply instead of Multiply. These protocols only differ in the use of classical BGV or the modified BGV scheme. Our results in Appendix A.2 and Theorems F.1 and F.2 show, however, that both schemes come with the same security guarantees. Hence, the security of our protocol Π_modified-offline-LHE follows exactly as in the proof of Theorem 6.1. □

G IMPLEMENTATION AND EVALUATION (CONTINUED)
Here, we give supplementary information for our evaluation (cf. Section 7). Firstly, note that the optimized distributed decryption of [31] that yields shares directly (ShareDec in Fig. 18) was not implemented in MP-SPDZ [29] as of the time of our implementation. Secondly, we use a statistical security parameter of sec = 40 and a prime of bit length 128. This implies that our protocols (in the LowGear variant) have the same BGV parameters as standard LowGear (ring dimension 8192 and a ciphertext modulus of the same size as LowGear). For HighGear, a ciphertext modulus that is 9 bits larger than for standard HighGear (and ring dimension 16384 as for HighGear) is necessary, as we want to compute (up to) 512 ciphertext additions. Note that some results for our protocols and LowGear/HighGear are extrapolated from our experiments as MP-SPDZ does not support (very) large tensors. Another reason why we extrapolate results is to finish the experiments in a reasonable time frame. Therefore, we extrapolate the findings from our experiments for some runs of LowGear/HighGear and also for our protocols for large depthwise convolutions. To obtain separate timings for the offline and online phase of [14], we used their total (online and offline) results and subtracted timings obtained from experiments of our own for the online phase with a suitable number of matrix multiplications (cf. Table 4). As the difference in CPU performance of our machines and theirs is not large (ours are around 6 % faster) and the offline phase is considerably slower than the online phase, this is a reasonable approximation. However, the tables for the overall (online and offline) performance are available as well. Next, we present more details that complement Section 7. Table 5 shows the parameters for all convolutions in ResNet50, as well as the corresponding matrix multiplication that emulates each convolution. We also show the number of matrix multiplications one would use with [14] for each convolution.
These results are used as an approximation for the online phase of [14]. The results are given for r · r′ multiplications, where r rounds of r′ (parallel) multiplications are computed. The layers and settings correspond to Tables 2 and 3; the number of multiplications corresponds to the number of square 128 × 128 matrix multiplications that are required to emulate the multiplication of the given rectangular matrices.
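The tile count mentioned above can be computed with a small hypothetical helper (the name and the simple tiling strategy are ours, for illustration only):

```python
from math import ceil

def num_square_matmuls(m, k, n, t=128):
    """Number of t x t matrix products needed to emulate an (m x k) @ (k x n) product
    by tiling both operands into t x t blocks."""
    return ceil(m / t) * ceil(k / t) * ceil(n / t)

# e.g., emulating a 256 x 512 times 512 x 128 product needs 2 * 4 * 1 = 8 tile products
assert num_square_matmuls(256, 512, 128) == 8
```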
LowGear-Style Protocols. To complement the results for the runtime in the offline phase (Table 2 in Section 7), we give the overall runtime for our LowGear-style protocols compared to the related work in Table 6. Additionally, the computation cost can be seen in Table 7 for the offline phase and in Table 8 for the overall (online and offline) cost.
HighGear-Style Protocols. To evaluate our protocols for a larger number of parties, we implemented the HighGear variants of our protocols. The results are given for four parties. We do not compare our HighGear variants to [14] as they only provide results for two parties. Table 9 shows the benchmark results for the packing schemes and SPDZ with HighGear-based protocols (similar to Table 2 for LowGear). Again, we can see that the convolution packing methods outperform the classical SPDZ approach. The corresponding overall runtime can be found in Table 10. We also give the communication costs in Tables 11 and 12.
Depthwise Convolutions. Tables 13 and 15 show additional results for our depthwise convolution experiments (cf. Table 3). The first expands on Table 3 by giving the results for additional image sizes. One can clearly see that the (non-depthwise) convolution packing methods (simple packing and the generalization of Huang et al.'s packing) have essentially the same complexity for all small images, as only one output channel can be computed at once. The depthwise packing can instead compute multiple results at once. For very small image sizes, the LowGear protocol is most efficient (or similarly efficient to the depthwise packing), as the packing method is not perfectly optimal w.r.t. the usage of ciphertext slots. Considering the computational cost of Bian et al.'s parallel convolution packing even without the overhead of the secure drowning, one observes that this packing is slower than the other packing methods or even than standard field multiplication with LowGear. The corresponding overall runtime (online and offline) can be found in Table 16. The communication cost can be found in Tables 19 and 20.
Online Phase. Tables 21 and 23 show our runtime results for the online phase. The online phase is benchmarked for two and four parties, corresponding to the two settings (i.e., LowGear with two parties