ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline

Machine Learning as a service (MLaaS) permits resource-limited clients to access powerful data analytics services ubiquitously. Despite its merits, MLaaS poses significant concerns regarding the integrity of delegated computation and the privacy of the server's model parameters. To address these issues, Zhang et al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). A few zkML schemes have been proposed since; however, they focus on a single ML classification algorithm, which may not offer satisfactory accuracy or may require large-scale training data and model parameters, which is undesirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy. Each stage of ezDPS is harnessed with an established ML algorithm that is shown to be effective in various applications, including Discrete Wavelet Transformation, Principal Components Analysis, and Support Vector Machine. We design new gadgets to prove ML operations effectively. We fully implemented ezDPS and assessed its performance on real datasets. Experimental results showed that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach in all metrics while maintaining more desirable accuracy than single ML classification approaches.


Introduction
Machine learning (ML) has grown to become a game-changer for human society. A well-trained ML model can effectively aid in performing highly complicated tasks such as medical diagnosis, natural language processing, intrusion detection, or financial forecasting. However, since a powerful ML model requires a large amount of data and computational resources to train, it may not be widely accessible to individuals or small organizations. To address this issue, Machine Learning as a Service (MLaaS) has been proposed, which permits resource-limited clients to access useful ML services (e.g., visualization, training, classification) offered by cloud providers.
Despite its usefulness, MLaaS has posed new integrity and privacy concerns. When the client delegates the ML computation to the MLaaS server, it is not clear whether she will receive a reliable response. A corrupted server may process the client data arbitrarily or even substitute it with malicious data, making the outcome untrustworthy. This is especially critical for sensitive applications such as medical diagnosis, intrusion detection, or fraud detection. Computation integrity can be addressed with Verifiable Computation (VC), in which the MLaaS server attaches a proof to show that the computation is carried out correctly [19]. However, VC itself may not be sufficient for MLaaS because it only enables computation integrity, not the privacy of the parameters used in the computation. In MLaaS, the server uses its private ML model to process the client data. This sophisticated model may cost significant resources to obtain and, therefore, is considered the intellectual property of the server. Moreover, such models may also be trained from sensitive training data (e.g., medical). As a result, it is undesirable for the MLaaS server to leak any information about its private ML models when processing the client query.
The above privacy concern in MLaaS can be addressed by adding the zero-knowledge property to the VC proof, which permits verifiable computation without leaking any information other than the computation result [22]. Early zero-knowledge VC (zkVC) protocols were computationally and communication expensive and relied on strong assumptions. Thanks to recent advancements in cryptography, zkVC protocols have become more practical. Recently, Zhang et al. [71] initiated zero-knowledge ML (zkML) research. In zkML inference, the server first commits to its ML model parameters and then provides an interface for the client to process her data sample. Given a client data sample, the server returns the ML computation result along with a zero-knowledge proof, which permits the client to verify the ML computation with respect to the committed model without learning the model parameters from the proof.
A few zkML schemes have been proposed, such as zero-knowledge Decision Tree (zkDT) [71] and zero-knowledge deep learning [40,45]. Although the decision tree (DT) is simple, with lightweight model parameters, it offers limited accuracy for predicted outcomes. Deep neural networks (DNNs) permit a high accuracy rate; however, they may require a large amount of training data and heavy model parameters and, therefore, may not be ideal for some applications. Most zkML schemes (e.g., [40,45,71]) also focus solely on the final ML inference phase, while the data is generally processed via a so-called ML pipeline with multiple processing phases (e.g., (pre)processing, feature extraction, and classification) to achieve a desirable performance. Thus, there is a need to develop a zero-knowledge ML pipeline to achieve balanced performance and model complexity for some applications.

Research Objective. The objective of this paper is to design an efficient and zero-knowledge ML pipeline, which permits the data to be processed in multiple phases for accuracy while, at the same time, permitting verifiability without leaking private model parameters at any processing phase.

Our Contributions. In this paper, we propose ezDPS, an efficient and zero-knowledge ML inference pipeline, which offers desirable security properties (e.g., zero-knowledge, verifiability) along with high accuracy for MLaaS. ezDPS comprises the typical phases of an ML pipeline, including data (pre)processing, feature extraction, and classification. We instantiate ezDPS with established classical ML algorithms, including Discrete Wavelet Transformation (DWT) [66] for preprocessing, Principal Components Analysis (PCA) [69] for feature extraction, and Support Vector Machines (SVM) [7] for classification, due to their popularity and wide adoption in many applications [47,48]. To our knowledge, we are the first to propose a zero-knowledge ML inference pipeline. Our concrete contributions are as follows.
• New gadgets for critical ML operations. We create new gadgets for proving essential ML operations in arithmetic circuits, such as exponentiation, absolute value, and max/min in an array (§4.1.2). These gadgets are necessary not only for proving the concrete ML algorithms in our proposed scheme but also for other ML operations such as deep learning.
• New zero-knowledge ML inference pipeline scheme with high accuracy. Built on top of our proposed gadgets, we design ezDPS, an efficient and zero-knowledge ML inference pipeline that permits the data to be processed with effective ML algorithms for high accuracy (§4.2). We design new methods to prove DWT, PCA, and multi-class SVM with different kernel functions via an optimal set of arithmetic constraints. ezDPS significantly outperforms the generic approaches in both asymptotic and concrete performance metrics. ezDPS is designed to be compatible with any zkVC backend (similar to [71]); thus, its concrete efficiency can be further improved when adopted with a more efficient zkVC. We also propose a zero-knowledge proof-of-accuracy scheme to enable public validation of the effectiveness of the committed ML model on public datasets (§4.2.5).
• Formal security analysis. We present a formal security model for the zero-knowledge ML inference pipeline (§3) and rigorously analyze the security of our scheme. We prove that ezDPS satisfies the security of a zero-knowledge ML inference pipeline (§5).
• Full-fledged implementation, evaluation, and comparison. We fully implemented our proposed techniques (§6) and conducted comprehensive experiments to evaluate their performance in real-world environments (§7). Experiments on real datasets showed that ezDPS is one to three orders of magnitude more efficient than the generic circuit approaches in all performance metrics (i.e., proving time, verification time, proof size). Our implementation is available at https://github.com/vt-asaplab/ezDPS

Remark. In this paper, we focus on the verifiability of the ML inference task and the privacy of the server model in the integrity proof. Our technique does not provide client data privacy: the client sends plaintext data to the server for computation. This model is different from standard privacy-preserving ML inference (PPMLI) (e.g., [10,20,34,43,56]), which preserves the privacy of the client and server against each other but not computation integrity (see §8 for more details). To our knowledge, it is not clear how to combine zero-knowledge with PPMLI efficiently to enable both client and server privacy plus computation integrity. We leave such an investigation as our future work.

Application use-cases. Our zkML inference scheme can be found useful in various applications. First, it can be used to enable proof-of-genuine ML services, in which the service provider can prove that its ML model is of high quality and that the inference result is computed from that same model. Another application is a fair ML model trading platform with try-before-buy, in which the buyer can attest to the ML model quality before purchase, while the seller does not have to reveal the model first. Finally, our technique can partially address the reproducibility problem in ML [24], where some ML models are claimed to achieve high accuracy without a proper way to validate them. Our technique offers a solution to this issue, in which the model owner can prove that there exists an ML model that achieves such accuracy (see §4.2.5), and the verifier can verify that statement efficiently in zero knowledge.

Preliminaries
Notations. For $n \in \mathbb{N}$, we denote $[1, n] = \{1, \dots, n\}$. Let $\lambda$ be the security parameter and negl($\cdot$) be a negligible function. We denote a finite field as $\mathbb{F}$. PPT stands for Probabilistic Polynomial Time. We use bold letters, e.g., $\mathbf{v}$ and $\mathbf{M}$, to denote a vector and a matrix, respectively. $\mathbf{M}^\top$ denotes the transpose of $\mathbf{M}$. We write $\mathbf{u}\mathbf{v}$ (or $\mathbf{u} \cdot \mathbf{v}$) to denote the dot product and $\mathbf{u} \bullet \mathbf{v}$ to denote the Hadamard (entry-wise) product. We use $\approx$ to denote that two quantities are computationally indistinguishable.

Commit-and-Prove Argument Systems
Argument of knowledge. An argument of knowledge for an NP relation $\mathcal{R}$ is a protocol between a prover $\mathcal{P}$ and a verifier $\mathcal{V}$, in which $\mathcal{P}$ convinces $\mathcal{V}$ that it knows a witness $w$ for some input in an NP language $x \in \mathcal{L}$ such that $(x, w) \in \mathcal{R}$. Let $\langle \mathcal{P}, \mathcal{V} \rangle$ denote a pair of PPT interactive algorithms. A zero-knowledge argument of knowledge is a tuple of PPT algorithms zkp = $(\mathcal{G}, \mathcal{P}, \mathcal{V})$ that satisfies completeness, knowledge soundness, and zero-knowledge.
Commit-and-Prove zero-knowledge proof. Commit-and-Prove (CP) Zero-Knowledge Proof (ZKP) permits the prover to prove NP statements about a committed witness. Most generic ZKP protocols support the CP paradigm, and the most efficient CP-ZKP protocols harness succinct polynomial commitment schemes (e.g., [35]) to achieve succinctness. The prover first commits to the witness using a zero-knowledge polynomial commitment scheme before proving an NP statement, and the verifier takes the committed value as an additional input for verification. We denote the commitment algorithm for CP-ZKP as cm $\leftarrow$ zkp.Com($w$, $r$, pp), where $r$ is the randomness chosen by the prover.
In our framework, we use Spartan [61] (with Hyrax [67] as the underlying polynomial commitment scheme) as the backend zkPC-based CP-ZKP protocol due to its succinctness properties (e.g., linear proving time, sublinear verification time and proof size), transparent setup, and support for the generic Rank-1 Constraint System (R1CS). Generally speaking, Spartan supports NP statements expressed as R1CS, which shows that there exists a vector $z = (io, 1, w)$ such that $Az \bullet Bz = Cz$, where $A$, $B$, $C$ are matrices encoding the arithmetic circuit, $io$ is the public input (statement), and $w$ is the witness of the prover. All the witnesses are encoded as a polynomial on the Lagrange basis. Since it is easy to convert arithmetic statements into R1CS, our main focus is to create arithmetic constraints for efficiently proving the algorithms in the ML pipeline, which can then be realized with Spartan or any CP-ZKP backend.
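To make the R1CS encoding concrete, consider the following minimal sketch (our own Python illustration with toy matrices, not Spartan's actual API), which checks $Az \bullet Bz = Cz$ for a two-constraint circuit proving $y = x^2 + 1$:

    import numpy as np

    # Toy R1CS for the statement y = x*x + 1 with secret witness x.
    # Variable vector z = (io, 1, w) = (y, 1, x, t), where t = x*x.
    # Constraint 1: x * x = t        -> row 1 of A, B, C
    # Constraint 2: (1 + t) * 1 = y  -> row 2 of A, B, C
    x, t = 3, 9
    z = np.array([10, 1, x, t])           # (y, 1, x, t) with y = 10
    A = np.array([[0, 0, 1, 0],           # selects x
                  [0, 1, 0, 1]])          # selects 1 + t
    B = np.array([[0, 0, 1, 0],           # selects x
                  [0, 1, 0, 0]])          # selects the constant 1
    C = np.array([[0, 0, 0, 1],           # selects t
                  [1, 0, 0, 0]])          # selects y
    assert np.array_equal((A @ z) * (B @ z), C @ z)  # Az (entry-wise) Bz = Cz

A real instantiation performs this check over the field $\mathbb{F}$ rather than over the integers.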
Theorem 1 (Spartan ZKP [61]). Let $\mathbb{F}$ be a finite field and $\mathcal{C}$ be a family of arithmetic circuits over $\mathbb{F}$ of size $N$. Under standard cryptographic hardness assumptions, there exists a family of succinct arguments of knowledge for the relation $\mathcal{R}_{\mathcal{C}} = \{(io, w) : C(io, w) = 1, C \in \mathcal{C}\}$, where $io$ and $w$ are the public input and the auxiliary input to the circuit $C$, respectively. The prover incurs $O(N)$ to $O(N \log N)$ overhead, and the verifier's time and communication costs range from $O(\log^2 N)$ to $O(\sqrt{N})$, depending on the underlying polynomial commitment scheme used for multilinear polynomials.
Note that since Spartan is built on polynomial commitment schemes, it supports the CP-ZKP paradigm.

Machine Learning Pipeline
An ML pipeline is an end-to-end process that consists of multiple data processing phases to train an ML model from a large-scale dataset effectively and to predict an inference result for a new observation accurately [31]. An effective ML pipeline contains three main phases: data preprocessing, feature extraction, and ML training/inference, as illustrated in Figure 1. In data preprocessing, raw samples $x \in \mathbb{F}^n$ are collected, and a preprocessing technique is used to reduce the impact of noise from the collection environment. Feature extraction extracts the most prominent dimensions of the preprocessed data so that only a small set of features $x' \in \mathbb{F}^m$ is fetched for efficient computation and a high convergence rate. Finally, ML training computes a prediction model $W$ from a set of feature vectors $\{x'_i\}$ as well as their labels $\{y_i\}$, while ML inference computes the label $y$ from the feature vector $x'$ of a new observation using the prediction model $W$.
In this paper, we focus on the ML inference pipeline (MLIP), in which the client collects raw data, and the server processes the data in multiple stages (i.e., preprocessing, feature extraction, ML classification) to obtain the final inference result. At each stage, the server can employ its private ML model parameters obtained from its training pipeline to process the client data. We denote such MLIP functionality as $y \leftarrow \mathcal{F}_{\mathrm{mlip}}(x, W)$, where $x \in \mathbb{F}^n$ is the data sample, $W$ is the MLIP model parameters across all stages, and $y$ is the inference result.
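For intuition, a plaintext analogue of $\mathcal{F}_{\mathrm{mlip}}$ can be sketched with scikit-learn (our own illustration on random toy data; the preprocessing stage here is a simple mean-removal stand-in, whereas ezDPS uses DWT, see §4.2.1):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    # Hypothetical plaintext MLIP: W bundles the per-stage model parameters.
    def f_mlip(x, W):
        x_hat = x - x.mean()                               # stand-in preprocessing
        x_feat = W["pca"].transform(x_hat.reshape(1, -1))  # feature extraction
        return W["svm"].predict(x_feat)[0]                 # classification

    # Server-side training pipeline on toy data.
    X = np.random.randn(200, 64)
    y = np.random.randint(0, 4, 200)
    pca = PCA(n_components=8).fit(X)
    svm = SVC(kernel="rbf", C=1.0, gamma=0.001).fit(pca.transform(X), y)
    W = {"pca": pca, "svm": svm}
    print(f_mlip(X[0], W))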

Models
System and threat models. Our system consists of two parties: the client and the server. The server holds well-trained MLIP model parameters $W$ and provides an interface for the client to classify her data samples using $W$.

We consider the client and server to mutually distrust each other. The adversarial server can be malicious, in that it may process the client's query arbitrarily. On the other hand, the client is semi-honest, in that she is curious about the server's model parameters. In this setting, we aim to achieve inference integrity and model privacy. To enable inference integrity, the server first commits to its model $W$. Given a client request, the server computes the inference result along with a proof to convince the client that the result is indeed computed from the committed model rather than being an arbitrary answer. To ensure model privacy, the proof should not leak any information about the model $W$.
Formally speaking, a zero-knowledge MLIP is a tuple of algorithms zkMLIP = (Setup, Com, $\mathcal{P}$, $\mathcal{V}$), where the commit, prove, and verify algorithms follow the CP-ZKP interface, as follows. • pp $\leftarrow$ zkMLIP.Setup($1^\lambda$, $1^N$): Given a security parameter $\lambda$ and a bound $N$ on the size of the MLIP model parameters, it outputs public parameters pp.
Security model. We define the security of zero-knowledge MLIP, capturing inference integrity and model privacy in the integrity proof, as follows.
Definition 1 (zero-knowledge MLIP). A scheme is a zero-knowledge MLIP if it satisfies completeness, soundness, and zero-knowledge.

Out-of-scope attacks. Our security definition captures the inference integrity and the model privacy in the integrity proof $\pi$. There exist model stealing attacks [6,64] that target only the inference result $y$ to reconstruct the model $W$. In this paper, we do not focus on addressing such vulnerabilities, because there exist independent studies that address them (e.g., [6,30,36,41,64]) and, with some effort, they can be integrated orthogonally into our scheme to protect $W$ from both $y$ and $\pi$. For example, simply limiting the inference result information (i.e., returning only the predicted label, as our scheme currently does) makes the attack 50-100× more difficult [64]. We elaborate on all these approaches in Appendix E. Our main goal is to ensure $W$ is not leaked from $\pi$ via zero-knowledge, so that the leakage from $y$ can be sealed or mitigated independently by these techniques. For curious readers, we also show how $\pi$ may leak significant information about $W$ if it is not zero-knowledge in Appendix F.
We also do not consider model poisoning/backdoor attacks (e.g., [57,58]), in which the adversarial server may target adversarial behaviors on certain data samples while maintaining an overall high level of accuracy. Mitigating such attacks requires analyzing the model parameters (e.g., [46]), which may be highly challenging in our setting, where model privacy is preserved. Thus, we leave this threat model as an open research problem for future investigation.

Our Proposed Zero-Knowledge MLIP Framework
In this section, we present the detailed construction of our framework, starting with an overview.

Overview. Our ezDPS framework contains three processing phases: data (pre)processing, feature extraction, and ML classification, as shown in Figure 1. We adopt ML algorithms for each phase, including Discrete Wavelet Transformation (DWT) [66] for data preprocessing, Principal Components Analysis (PCA) [69] for feature extraction, and Support Vector Machine (SVM) [7] for classification. We focus on these algorithms because they are well-established in various systems and applications with high efficiency [47,48]. ezDPS permits verifying that a data sample was processed correctly with DWT, PCA, and SVM without leaking the parameters of any phase, including, for example, the low-pass and high-pass filters in DWT, the mean vector and eigenvectors in PCA, and the support vectors in SVM.
In ezDPS, the server first commits to the model parameters of each ML algorithm and provides an interface for the client to process her data sample based on the committed parameters. To demonstrate the validity of the committed model, the server can publish a zero-knowledge Proof-of-Accuracy (zkPoA) to demonstrate that the committed model maintains a desirable accuracy on public datasets with ground-truth labels. zkPoA permits the client to attest to the genuineness and the effectiveness of the server's committed model before using the inference service on her data sample. zkPoA can be derived from the zero-knowledge proofs of inference of individual samples. We show how to construct zkPoA for our scheme in §4.2.5.
In the following sections, we first present new gadgets for critical ML operations (e.g., max/min, absolute value). Notice that our proposed gadgets are not limited to the ML algorithms selected above; they can be used to prove other useful ML kernels (Appendix C) and deep learning components (Appendix D). We then present our techniques for proving DWT, PCA, and SVM more efficiently than the generic approaches. Finally, we show how to construct a zkPoA scheme to attest to the effectiveness of the committed model on public datasets.

Gadgets
A gadget is an intermediate constraint system consisting of a set of arithmetic constraints for proving a particular statement in the higher-level protocols.

Building Blocks
We first present building block gadgets that were previously proposed.

Permutation gadget [71]. Given two vectors $a, a' \in \mathbb{F}^n$, Perm($a$, $a'$) permits proving that $a$ is a permutation of $a'$, i.e., $a[i] = a'[\sigma(i)]$ for $i \in [1, n]$ according to some permutation $\sigma$. This can be done by showing that their characteristic polynomials evaluate to the same value at a random point $r$ chosen by the verifier, i.e., $\prod_{i=1}^{n} (a[i] - r) = \prod_{i=1}^{n} (a'[i] - r)$. Due to the Schwartz-Zippel Lemma [60], the soundness error of the permutation test is $n/|\mathbb{F}| = \mathrm{negl}(\lambda)$.

Binarization gadget [59]. Given a vector $b \in \mathbb{F}^{\ell}$ and a value $y \in \mathbb{F}$, the binarization gadget Bin($b$, $y$, $\ell$) permits proving that $b$ is the binary representation of $y$. This can be done by showing that $y = \sum_{i=1}^{\ell} b[i] \cdot 2^{i-1}$ and $b[i] \cdot (1 - b[i]) = 0$ for all $i \in [1, \ell]$.
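A minimal sketch of the permutation test in plain Python (our own illustration with a toy prime standing in for $\mathbb{F}$; in the actual protocol the products are enforced inside the constraint system):

    import random

    P = 2**61 - 1  # toy prime modulus standing in for the field

    def char_poly_eval(vec, r, p=P):
        # Evaluate the characteristic polynomial prod_i (vec[i] - r) at r.
        out = 1
        for v in vec:
            out = out * ((v - r) % p) % p
        return out

    a = [5, 1, 9, 4]
    a_perm = [9, 4, 5, 1]           # a permutation of a
    r = random.randrange(P)         # verifier's random challenge
    assert char_poly_eval(a, r) == char_poly_eval(a_perm, r)
    # A non-permutation passes only with probability n/|F| (Schwartz-Zippel).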

New Gadgets for Zero-Knowledge MLIP
We now construct new gadgets that are needed in our ezDPS scheme. These gadgets can also be used to prove other ML algorithms that incur the same operations.

Exponent gadget. Given two values $y, e \in \mathbb{F}$, we propose a gadget Exp($y$, $e$, $a$) to prove $y = a^e$ for a public value $a \in \mathbb{F}$. This can be done using a multiplication tree and the binarization gadget (Bin). Let $b \in \mathbb{F}^{\ell}$ be an auxiliary witness holding the binary representation of $e$. It suffices to show that Bin($b$, $e$, $\ell$) holds and $y = \prod_{j=1}^{\ell} \big(b[j] \cdot a^{2^{j-1}} + (1 - b[j])\big)$, where the powers $a^{2^{j-1}}$ are precomputed since $a$ is public.

GreaterThan gadget. Given two values $x, y \in \mathbb{F}$, we create a gadget GT($x$, $y$) to prove that $x > y$. The main idea is to compute an auxiliary witness $z := 2^{\ell} + (x - y)$, where $\ell$ is the length of the binary representation of $x$ and $y$, and show that the most significant bit of $z$ is equal to 1. Let $b \in \mathbb{F}^{\ell+1}$ be an additional auxiliary witness holding the binary representation of $z$; the arithmetic constraints to prove $x > y$ are $z = 2^{\ell} + x - y$, Bin($b$, $z$, $\ell+1$), and $b[\ell+1] = 1$.

Maximum/Minimum gadget. Given a value $y \in \mathbb{F}$ and an array $a \in \mathbb{F}^n$, we create a gadget Max($y$, $a$) (resp. Min($y$, $a$)) to prove that $y$ is the maximum (resp. minimum) value in $a$. The idea is to harness the Perm and GT gadgets to prove that $y$ is equal to the first element of a permuted array of $a$ whose first element is the largest (resp. smallest) value. Let $a' \in \mathbb{F}^n$ be an auxiliary witness. Specifically, to prove $y = \max(a)$, it suffices to show that $y = a'[1]$, that $a'[1]$ is no smaller than every other element of $a'$ (via GT), and that $a'$ is a permutation of $a$ (via Perm). The constraints to prove a minimum value in an array can be defined analogously.

Absolute gadget. Given $y', y \in \mathbb{F}$, we create a gadget Abs($y'$, $y$) to prove that $y'$ is the absolute value of $y$, i.e., $y = y'$ or $-y = y'$. The idea is to compute $z = y + 2^{\ell}$, where $\ell$ is the length of the binary representation of $y$, and show that the most significant bit of $z$ represents the sign difference of $y$ and $y'$. Let $b \in \mathbb{F}^{\ell+1}$ be an auxiliary witness holding the binary representation of $z$; the constraints to show that $y'$ is the absolute value of $y$ are $z = y + 2^{\ell}$, Bin($b$, $z$, $\ell+1$), and $(2 b[\ell+1] - 1) \cdot y = y'$.
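A hedged sketch of the Exp gadget's logic in Python (our own illustration of witness generation and the product check; a real prover expresses each step as R1CS constraints):

    P = 2**61 - 1  # toy field modulus

    def exp_gadget_check(y, e, a, ell, p=P):
        # Bin gadget: binary decomposition of the secret exponent e.
        bits = [(e >> j) & 1 for j in range(ell)]
        assert sum(b << j for j, b in enumerate(bits)) == e
        # Public powers a^(2^j), precomputed by repeated squaring.
        pows, cur = [], a % p
        for _ in range(ell):
            pows.append(cur)
            cur = cur * cur % p
        # Multiplication tree: y = prod_j (b_j * a^(2^j) + (1 - b_j)).
        acc = 1
        for b, pw in zip(bits, pows):
            acc = acc * (b * pw + (1 - b)) % p
        return acc == y % p

    assert exp_gadget_check(pow(3, 13, P), 13, 3, ell=4)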

Our Proposed Scheme
We now give the detailed construction of our ezDPS scheme with the DWT, PCA, and SVM algorithms. We provide an overview of each algorithm and show how to prove it with a small number of constraints. We summarize all the variables and notation used in our detailed description in Table 1.

DWT-Based Data Preprocessing
DWT [66] exerts the wavelet coefficients on the raw data sample to project it to the wavelet domain for efficient preprocessing. A DWT algorithm contains three main operations: decomposition, thresholding, and reconstruction. The decomposition transforms the raw input from the spatial/time domain to the wavelet domain, consisting of approximation and detail coefficients. Thresholding is then applied to filter some detail coefficients, which generally contain noise. Finally, reconstruction is applied to recover the original data after noise reduction. Such decomposition and thresholding can be applied recursively until a small constant number of coefficients is obtained.

Let $x \in \mathbb{F}^n$ be the input data sample of length $n := 2^c$, and $n' := 2^{c-1}$. At recursion level $j \ge 1$, the DWT computes the approximation and detail coefficients $a_j, d_j \in \mathbb{F}^{n_j}$ (with $n_j := n/2^j$) as

$a_j[i] = \sum_{k=1}^{d} h[k] \cdot a_{j-1}[2i - 2 + k]$, $\quad d_j[i] = \sum_{k=1}^{d} g[k] \cdot a_{j-1}[2i - 2 + k]$ (1)

for $i \in [1, n_j]$, where $h, g \in \mathbb{F}^{d}$ are the low-pass and high-pass filters, respectively, and $a_0 = x$. The thresholding is applied to the high-frequency components (i.e., detail coefficients) as

$d_j[i] \leftarrow \mathrm{sign}(d_j[i]) \cdot (|d_j[i]| - \beta)$ if $|d_j[i]| > \beta$, and $d_j[i] \leftarrow 0$ otherwise (2)

for $i \in [1, n_j]$, where $\beta$ is the public threshold parameter and $\mathrm{sign}(v)$ returns the sign of $v$ (i.e., 1 if $v \ge 0$, and $-1$ otherwise). The decomposition and thresholding can be applied recursively until $n_j < d$, or until the number of rounds reaches a set value. Finally, the reconstructed data $\hat{x}$ at recursion level $j$ is computed from $a_j$ and the thresholded $d_j$ using the coefficients $\bar{h}, \bar{g} \in \mathbb{F}^{d}$ of the inverse low-pass and high-pass filters, respectively (3). In summary, the DWT model parameters are $h, g, \bar{h}, \bar{g}, \beta$. The size of the model parameters is $4d + 1$, where $d$ depends on the concrete DWT algorithm used in practice, e.g., $d = 4$ in the DB-4 algorithm.

Proving DWT computation. Proving (1) directly incurs $8n(1 - \frac{1}{2^t})$ constraints, where $n$ is the length of the data sample and $t$ is the number of recursion levels. We propose a novel method to prove the DWT computation more efficiently using our split technique along with products of sums and random linear combinations. Our optimization reduces the complexity of proving the decomposition and reconstruction from $O(n)$ to $O(\log n)$; furthermore, if the recursion level is set to a constant, the complexity can be reduced to $O(1)$. Specifically, we first split each level's input and the filters into their even-indexed and odd-indexed parts (4), so that each output element of (1) becomes a sum of two dot products over consecutive entries. Let $r \in \mathbb{F}$ be a random scalar chosen by the verifier; the prover can prove that (4) holds by combining all output constraints with the powers of $r$ (5). We then convert (5) into a product of sums (6). In (6), the number of constraints for proving the DWT decomposition drops to logarithmic in $n$. To aid understanding, we present a toy example of our split technique in Appendix A. To prove the thresholding computation in (2), we employ the GT gadget for $i \in [1, n_j]$ (7). In our protocol, the prover provides $|d_j[i]|$ and $\mathrm{sign}(d_j[i])$ as auxiliary witnesses so that the number of constraints reduces from $5\ell + 14$ to $3\ell + 9$ per thresholded coefficient, where $\ell$ is the length of the binary representation of $d_j[i]$.
The final step is proving the DWT reconstruction, which is analogous to proving the decomposition. Let $\bar{r} \in \mathbb{F}$ be a random challenge chosen by the verifier. The prover can then prove the DWT reconstruction in (3) with the same split-and-combine technique, yielding the constraint set in (8).
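For reference, a plaintext sketch of one DWT level with thresholding (our own minimal illustration using the 2-tap Haar filter pair; the paper's implementation uses the Daubechies DB4 filters, see §6):

    import numpy as np

    h = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass filter
    g = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass filter

    def dwt_level(x, beta=0.2):
        d_len = len(h)
        a = np.array([h @ x[2*i : 2*i + d_len] for i in range(len(x) // 2)])
        d = np.array([g @ x[2*i : 2*i + d_len] for i in range(len(x) // 2)])
        # Threshold the detail coefficients with the public threshold beta.
        d = np.sign(d) * np.maximum(np.abs(d) - beta, 0.0)
        return a, d

    x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
    a, d = dwt_level(x)   # approximation and denoised detail coefficients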

PCA-Based Feature Extraction
PCA [69] is a method to reduce the dimensionality of the data input by representing the most significant characteristics of $\hat{x} \in \mathbb{F}^n$ in a smaller feature vector with minimal information loss (captured by the eigenvalues).
The PCA training computes a mean vector $\bar{x} \in \mathbb{F}^n$ over all data samples $\{\hat{x}_i\}_{i=1}^{N}$ as $\bar{x} = \frac{1}{N}\sum_i \hat{x}_i$, where $N$ is the number of samples in the training set. A covariance matrix is then computed as $C = \frac{1}{N}\sum_{i=1}^{N} (\hat{x}_i - \bar{x})(\hat{x}_i - \bar{x})^\top$. PCA training aims at finding eigenvectors $U = [u_1^\top, \dots, u_n^\top]$ and eigenvalues $(\lambda_1, \dots, \lambda_n)$ of $C$ such that $C \times U = U \times \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. To reduce the dimension while retaining the most information about the data distribution, we select the $m$ eigenvectors $U' = [u_1^\top, \dots, u_m^\top]$ corresponding to the $m$ largest eigenvalues $(\lambda_1, \dots, \lambda_m)$. To this end, the server retains the eigenvectors $U'$ and the mean vector $\bar{x}$ as model parameters. In the inference phase, given a new observation $\hat{x}$, the feature vector of $\hat{x}$ is computed via PCA as $x' = U'(\hat{x} - \bar{x})$ (9).

Proving PCA computation. There are $O(n \cdot m)$ constraints in (9), where $n$ is the input dimension and $m$ is the feature vector dimension. We reduce the number of constraints for proving the PCA computation from $O(n \cdot m)$ to $O(n)$ using a random linear combination with the powers of a random challenge chosen by the verifier. This transformation converts variable-variable multiplications into constant multiplications, where the latter come for free in R1CS, thereby reducing the proving complexity. Specifically, (9) is equivalent to $x'[i] = u_i^\top (\hat{x} - \bar{x})$ for $i \in [1, m]$ (10), where $x'[i]$ is the $i$-th entry of $x'$. Let $r \in \mathbb{F}$ be a random challenge chosen by the verifier. We apply the random linear combination to combine the constraints in (10). Specifically, the prover can prove that (10) holds by proving that $\sum_{i=1}^{m} r^i \cdot x'[i] = \big(\sum_{i=1}^{m} r^i u_i^\top\big)(\hat{x} - \bar{x})$ (11), where the $r^i$ are the powers of the random challenge computed by the prover, the $u_i$ are the eigenvectors, and $\bar{x}$ is the mean vector.
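A plaintext sketch of the PCA training and projection being proven (our own numpy illustration; the committed model parameters are the eigenvectors $U'$ and the mean $\bar{x}$):

    import numpy as np

    def pca_train(X, m):
        # X: N x n training matrix; returns the mean and top-m eigenvectors.
        x_bar = X.mean(axis=0)
        C = np.cov(X - x_bar, rowvar=False)      # n x n covariance matrix
        eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
        U_prime = eigvecs[:, ::-1][:, :m]        # m largest eigenvectors
        return x_bar, U_prime

    def pca_project(x_hat, x_bar, U_prime):
        return U_prime.T @ (x_hat - x_bar)       # x' = U'(x_hat - x_bar)

    X = np.random.randn(200, 16)
    x_bar, U_prime = pca_train(X, m=4)
    x_feat = pca_project(X[0], x_bar, U_prime)   # 4-dimensional feature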

SVM Classi cation
SVM [7] is a supervised ML algorithm for classification that finds optimal hyperplane(s) maximizing the separation of the data samples with respect to their potential labels. Suppose the number of samples in the training set is $N$. Let $x_1, \dots, x_N \in \mathbb{F}^m$ be the feature vectors of the data samples and $y_1, \dots, y_N \in \{1, \dots, K\}$ be their corresponding labels. To deal with data non-linearity, kernel SVM projects the data to a higher dimension using a mapping function $\Phi: \mathbb{F}^m \to \mathbb{F}^{m'}$, where $m' > m$, and applies a kernel function $\kappa(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$ for the training and classification computation. The Radial Basis Function (RBF) [7] is the most popular SVM kernel due to its effectiveness.
SVM was initially designed for binary classification, but it can be extended to multi-class classification by breaking the multi-class problem into multiple one-vs-rest binary classification problems. For each class $\hat{k}$, the data samples are assigned to two classes with labels $y_i^{(\hat{k})} = 1$ if $y_i = \hat{k}$, and $y_i^{(\hat{k})} = 0$ otherwise.
Given a new observation $\tilde{x} \in \mathbb{F}^m$, its label can be predicted as $y = \arg\max_{\hat{k} \in [1, K]} f^{(\hat{k})}(\tilde{x})$ (12), where $f^{(\hat{k})}$ is the decision function's evaluation for each class $\hat{k} \in [1, K]$.

Proving multi-class SVM classification with RBF kernel. To prove the SVM classification in (12), we harness the Exp and Max gadgets in §4.1.2 to prove the exponent in the RBF kernel projection and the class output being the maximum value among all evaluations, respectively. We adopt the representation in [71], where the evaluations are expanded to value-index pairs, i.e., $F := \{(f^{(1)}, 1), (f^{(2)}, 2), \dots, (f^{(K)}, K)\}$. Let $\bar{F} := \{(\bar{f}^{(1)}, \sigma(1)), (\bar{f}^{(2)}, \sigma(2)), \dots, (\bar{f}^{(K)}, \sigma(K))\}$ be a permutation of $F$, where $\sigma(\cdot)$ is the permutation function such that $\bar{f}^{(\hat{k})} = f^{(\sigma(\hat{k}))}$ and $\bar{f}^{(1)}$ is the maximum value in $F$. The prover provides $\bar{F}$ as the auxiliary witness and shows that the output label is $y = \sigma(1)$. Let $r$ be a random challenge from the verifier; the prover binds each value-index pair in $F$ and $\bar{F}$ to a single value as $f^{(\hat{k})} + r \cdot \hat{k}$ (13) and invokes a permutation check using the Perm gadget, where $r$ is a random number chosen by $\mathcal{V}$. Let $[\bar{f}^{(1)}, \dots, \bar{f}^{(K)}]$ be the auxiliary witness used in the gadget Max. Suppose $y$ is the claimed output label and $f^{(y)}$ is the evaluation of the corresponding decision function. Let $F'$ and $\bar{F}'$ be the intermediate vectors whose entries are computed by (13); the set of arithmetic constraints to prove (12) is given in (14).

Proving other SVM kernels. Our techniques can be used to prove other SVM kernels, such as the polynomial kernel, the Sigmoid kernel, etc. The polynomial kernel $\kappa_{\mathrm{ply}}(x_i, x_j) = (\gamma x_i^{\top} x_j + c)^{d}$ can be easily proven via addition and multiplication gates, where $\gamma, c, d$ are parameters. Although it is relatively easy to prove, the polynomial kernel usually achieves lower accuracy than the RBF kernel [13]. Due to space constraints, we show how to prove the other kernels in Appendix C.
The server sends $(y, \pi)$ to the client.
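For reference, a plaintext sketch of the one-vs-rest RBF decision rule in (12) (our own numpy illustration; the per-class weights, biases, and support vectors are the committed SVM parameters):

    import numpy as np

    def rbf(x_i, x, gamma=0.001):
        return np.exp(-gamma * np.sum((x_i - x) ** 2))

    def svm_predict(x, sv, alpha, b):
        # sv[k]: support vectors of class k; alpha[k], b[k]: weights and bias.
        scores = [
            sum(a * rbf(v, x) for a, v in zip(alpha[k], sv[k])) + b[k]
            for k in range(len(sv))
        ]
        return int(np.argmax(scores))   # y = argmax_k f^(k)(x)

    # Toy 2-class model with two support vectors per class.
    sv = [np.random.randn(2, 4), np.random.randn(2, 4)]
    alpha = [np.array([0.5, -0.2]), np.array([0.3, 0.1])]
    b = [0.1, -0.1]
    print(svm_predict(np.random.randn(4), sv, alpha, b))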

Putting Everything Together
We now put everything together and present the complete algorithmic description of our ezDPS scheme in Protocol 1. We describe the functionality (Algorithm 1) that processes a data sample $x \in \mathbb{F}^n$ with DWT (Figure 3, lines 1-14), PCA (line 15), and SVM (lines 16-19), and returns an inference result $y$.

Zero-Knowledge Proof of Accuracy
We construct a zkPoA scheme, derived from the proofs of inference of individual samples, to attest to the effectiveness of the committed model by demonstrating its accuracy over a public dataset $\mathcal{D} = (x_1, \dots, x_T)$ with ground-truth labels $(y_1, \dots, y_T)$. zkPoA requires the server to commit to a model with claimed accuracy on public sources. Once the model is committed and the zkPoA is generated, it cannot be altered; the server has to use the previously committed model for successive inference tasks. Let $(\tilde{y}_1, \dots, \tilde{y}_T)$ be the predicted labels of $\mathcal{D}$, where $\tilde{y}_i \leftarrow \mathrm{DPS}(x_i, W)$ for $i \in [1, T]$. The accuracy of the model is the fraction $a$ of samples with $\tilde{y}_i = y_i$, where $0 \le a \le 1$.
In our zkPoA, it suffices to show that the committed model maintains at least accuracy $a$ (rather than the precise number) by proving that at least $a \cdot T$ samples are classified correctly. This reduces the complexity since the prover does not have to prove that some samples are misclassified (which incurs complex circuits for proofs of inequality). Our zkPoA is as follows.
The set of constraints for our zkPoA includes all the constraints to prove each $\tilde{y}_i$, plus the constraints showing that at least $a \cdot T$ of the predicted labels match the ground-truth labels.
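A hedged sketch of the plaintext statement behind zkPoA (our own illustration; the circuit realizes this counting over the per-sample inference constraints):

    def zkpoa_statement(preds, labels, a):
        # At least a*T samples must be classified correctly; the prover
        # never has to show that the remaining samples are misclassified.
        T = len(labels)
        correct = sum(1 for p, y in zip(preds, labels) if p == y)
        return correct >= a * T

    assert zkpoa_statement([1, 2, 2, 0], [1, 2, 1, 0], a=0.75)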

Analysis
Complexity. Let $n$, $m$ be the dimensions of the raw data sample and of the feature vector produced by PCA, respectively. Let $K$, $S$ be the number of SVM classes and the number of support vectors across all classes, respectively. In DWT, our scheme requires $O(d \log n)$ constraints for the decomposition (6) and reconstruction (8), where $d$ is the dimension of the high-pass and low-pass filters, while the thresholding (2) incurs $(3\ell + 9)(n - \frac{n}{2^t})$ constraints, where $\ell$ is the size (in bits) of each value per dimension of the raw data sample and $t$ is the number of recursion levels. In PCA, the number of constraints for (11) is $O(n)$, reduced from $O(n \cdot m)$ when directly proving (9), thanks to the random linear combination. In the SVM classification (14), our scheme incurs $O((\ell + m)S)$ constraints for proving the RBF kernel projection and $O(\ell K)$ constraints for proving the maximum among the $K$ decision function evaluations and the final decision. The permutation trick in our proposed Max gadget permits us to reduce the number of comparisons from $O(K^2)$ in generic circuits to $O(K)$. Table 2 summarizes the complexity of our framework, compared with directly proving the DWT, PCA, and SVM computations with generic circuits.
For zkPoA, suppose the number of samples in the testing dataset is $T$ and proving one testing sample incurs $s$ constraints; our zkPoA then incurs $O(T \cdot s)$ constraints for proving the accuracy.

Security. We analyze the security of our scheme. Specifically, we have the following theorem.
Theorem 2. Our ezDPS scheme in Protocol 1 is a zero-knowledge MLIP as defined in Definition 1, given that the backend CP-ZKP is secure by Theorem 1.

Proof. See Appendix B.

6 Implementation
We fully implemented our proposed framework in Python and Rust, consisting of approximately 2,500 lines of code in total. For DWT, we implemented the Daubechies DB4 algorithm [66]. We used sklearn [55] to implement the training phase of PCA and SVM. On the other hand, we implemented the inference phase of PCA and SVM from scratch to obtain all the witnesses for generating the proofs. We used fixed-point number representation for all values processed in our framework. Each value is represented by 64 bits, which reserves 1 bit for the sign, 31 bits for the integer part, and 32 bits for the fractional part.
We used the exponent gadget to prove the RBF kernel of the form $a^{\|x - x_i\|^2}$, where the base $a$ is public and the exponent $\|x - x_i\|^2$ is secret (witness). As shown in §4.1.2, our gadget precomputes $a^{2^{j-1}}$, where $a = e^{-\gamma}$ and $j$ is the index of the binary representation of the exponent. We used fixed-point arithmetic to represent the exponent. Since it suffices to set $\gamma = 10^{-3}$ for the RBF kernel, we used 20 bits to represent the fractional part of the exponent, which suffices to cover most of the cases in our test set. There are a few samples that cause the fractional part of the exponent to exceed 20 bits. In this case, we truncated the fractional part of the witness that exceeds 20 bits, leading to a small accuracy loss (see §7.4).
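A minimal sketch of the fixed-point encoding described above (our own illustration; the constants mirror the stated 1/31/32-bit sign/integer/fraction layout and the 20-bit exponent fraction):

    FRAC_BITS = 32        # fractional bits for general values
    EXP_FRAC_BITS = 20    # fractional bits for the RBF exponent

    def to_fixed(v, frac_bits=FRAC_BITS):
        # Scale a real value by 2^frac_bits and round to an integer.
        return int(round(v * (1 << frac_bits)))

    def from_fixed(f, frac_bits=FRAC_BITS):
        return f / (1 << frac_bits)

    # The exponent ||x - x_i||^2 keeps only 20 fractional bits; any excess
    # is truncated, causing the small accuracy loss noted above.
    e = 3.14159265358979
    print(from_fixed(to_fixed(e, EXP_FRAC_BITS), EXP_FRAC_BITS))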
In our implementation, we transformed the arithmetic constraints and the witnesses generated from the ML algorithms into R1CS relations using the compact encoding method in libspartan [62] and then invoked its library APIs to create proofs and perform verification. Concretely, we used the Spartan DL scheme, which implements (i) the Hyrax polynomial commitment [67], (ii) curve25519-dalek [27] for curve arithmetic in the prime-order ristretto group, (iii) a separate dot-product proof protocol for each round of the sum-check protocol for the zero-knowledge property, and (iv) merlin [12] for non-interactive proofs via the Fiat-Shamir transformation.
7 Experimental Evaluation

Configuration
Hardware. We ran all experiments on a 2020 MacBook Pro equipped with a 2.0 GHz 4-core Intel Core i5 CPU and 16 GB of DDR4 RAM. We did not use thread-level parallelization to accelerate the proving/verification time; the experimental results reported in this section are with single-thread computation and can be further improved once multi-thread parallelization is employed.
Dataset. We evaluated our scheme on three public datasets: the ECG dataset in the UCR Time Series Classification Archive (UCR-ECG) [11], Labeled Faces in the Wild (LFW) [26], and Cifar-100 [38]. UCR-ECG contains 1800 records of ECG signals, each of length 750. LFW contains 5749 human faces, where each image is of size 125 × 94 pixels. Cifar-100 contains 100 classes, and the dimension of the samples is 3072. We used a subset of each dataset for different numbers of classes.
Parameters. We used standard parameters as suggested in Spartan [61] (e.g., curve25519) for 128-bit security. We evaluated the performance of our proposed methods with varied numbers of classes ($K$) and PCA dimensions ($m$) (see Table 3). For the LFW dataset, we scaled the dimension of the image inputs to 4200 when the number of classes is small (i.e., 8 and 16), and to 5655 for many classes (> 32). For DWT processing, we set the number of recursion levels to 1 for noise reduction and $\beta = 0.2$ for processing the detail coefficients. For PCA, we selected the number of eigenvectors such that they capture at least 90% of the variance. We present the concrete values of $m$ w.r.t. different sizes of the datasets in Table 3. Finally, we used the Grid Search method to find the best parameters for SVM and set $C = 1$, $\gamma = 0.001$.

Counterpart comparison. To our knowledge, we are the first to propose a zero-knowledge MLIP. There is also no prior work that proposes zero-knowledge proofs for each of the ML algorithms (i.e., DWT, PCA, and SVM) in our framework. Thus, we chose to compare with the naïve approach, in which we hardcode the whole DWT, PCA, and SVM computation into the circuit and run the same CP-ZKP backend (i.e., Spartan). We compared ezDPS with this baseline to demonstrate our advantage in reducing the proving time, verification time, and proof size. We also report the accuracy of ezDPS to demonstrate the advantage of ML pipeline processing.

Evaluation metrics. We assess the performance of our scheme and the baseline approach in terms of proving time, verification time, and proof size (§7.2 and §7.3). Note that for this cryptographic performance evaluation, we only used reduced subsets of Cifar-100 and LFW that yield the concrete model parameters after training presented in Table 3. For UCR-ECG, we used the whole set, as it is already small. We did not evaluate on the whole Cifar-100 and LFW sets due to our limited hardware and the expensive cryptographic overhead incurred by the baseline. Instead, we report the accuracy of plain ML techniques and estimate the performance of our scheme when tested on the whole datasets (§7.4).

Overall Results
ezDPS is one to three orders of magnitude more efficient than the baseline in all metrics. Figure 4 presents the performance of our technique compared with the baseline approach in terms of proving time, verification time, and proof size on the three datasets with different sizes. For example, on the UCR-ECG dataset, our proving time is from 321 to 518 seconds for 4 to 42 classes, while it takes from 1429 to 2807 seconds using the baseline approach. The gap between our scheme and the baseline grows more significant as the number of classes increases. Specifically, on the LFW dataset with 8 classes, our scheme achieves 6.75× faster proving time, taking only 1702 seconds, compared with 11491 seconds for the baseline. With 2048 classes, our proving time is 6977 seconds, approximately 1842× faster than the baseline, which takes 2439811 seconds. The verification time and proof size follow a similar trend, in which ezDPS achieves an order of magnitude faster verification time and smaller proof size than the baseline. Specifically, on the LFW dataset with 16 classes, the verification time is 6.6 seconds, compared with 19.2 seconds for the baseline, and the proof size is 3059 KB in our scheme, compared with 11946 KB for the baseline. On the LFW dataset with 2048 classes, our verification time is 9 seconds and the proof size is 4411 KB, while the baseline takes 123.6 seconds for verification with a 56856 KB proof size. This results in around 12× faster verification time and 14× smaller proof size, respectively. We can also see that the verification and bandwidth in ezDPS are highly efficient compared with the proving, i.e., less than 10 seconds and 5 MB, respectively. This is because we use Spartan as the CP-ZKP backend, which offers sublinear verification and proof size overhead.
The concrete end-to-end computation latency and communication in Figure 4 also confirm the efficiency improvement of our optimization techniques. By introducing the split technique and employing the random linear combination, the complexity of proving the DWT decomposition and reconstruction is reduced from $O(nd)$ to $O(\log n + d)$, where $d$ is a very small constant in practice (e.g., $d = 4$ for the Daubechies DB4 DWT). The most significant improvement in the overall cost is achieved when the number of classes is large, due to the employment of the Max and Exp gadgets in the SVM phase, which reduce the complexity from $O(K^2)$ to $O(K)$. Such asymptotic improvements help to achieve one to three orders of magnitude faster computation time and lower communication overhead on real datasets.
Finally, we report the performance of the zkPoA scheme proposed in §4.2.5. Since zkPoA is derived from the proofs of inference of individual samples, our scheme maintains the same ratio of performance gain over the baseline as reported in §7.2. Concretely, we tested zkPoA on the reduced LFW dataset with 64 samples. As shown in Figure 6, we achieve 6× to 9× faster prover time and 3× faster verifier time compared with the baseline. Regarding proof size, our scheme incurs 171392-226432 KB, which is about three times smaller than the baseline, which requires 576148-827968 KB. The complexity of zkPoA is linear in the number of samples, and its main overhead stems from the inference proofs of individual samples.
Note that our zkPoA scheme currently only supports plain accuracy verification, meaning the proof is given only for a specific test set. In the ML setting, cross-validation over different test sets is generally applied to report a more reliable accuracy result. It is interesting to explore whether a zkPoA scheme can permit accuracy verification with cross-validation without leaking model privacy across multiple test sets. We leave it as an open research problem for future investigation.

Detailed Cost Analysis
We dissected the total cost of our scheme to investigate the impact of each data processing phase on the overall performance. Figure 5 presents the detailed cost of ezDPS on the three datasets. In ezDPS, each sample is processed in three phases: DWT noise reduction, PCA feature extraction, and SVM classification.
• DWT Processing: The cost of DWT processing is stable when varying the number of classes ($K$) and contributes a considerable portion of the overall cost. This is because the complexity of DWT is independent of $K$; it is larger than PCA's but smaller than SVM's for a large number of classes. On the UCR-ECG dataset, the proving time is around 160 seconds, and the verification time and proof size are around 0.47 seconds and 256 KB, respectively. On Cifar-100, the proving time, verification time, and proof size are around 656 seconds, 1.94 seconds, and 1046 KB, respectively. On the LFW dataset, the corresponding figures for the DWT phase range over 898-1209 seconds, 2.2-2.6 seconds, and 676-1421 KB, respectively. There is a considerable difference in proving DWT across the three datasets because the input dimension varies between datasets, e.g., $n$ equals 750, 3072, and 4200 (or 5655) on the UCR-ECG, Cifar-100, and LFW datasets, respectively.
• PCA-Based Feature Extraction: The cost of PCA processing is stable even as the number of classes increases, and it contributes the least to the overall cost of our scheme. This is because the complexity of PCA is $O(n)$, which is also independent of $K$ and lower than that of DWT and SVM. For example, it costs around 17 seconds for proving, 0.198 seconds for verification, and around 141 KB of proof size on the UCR-ECG dataset. The cost of proving PCA is nearly negligible on the UCR-ECG and LFW datasets because the number of constraints for PCA is relatively small (i.e., 750 on UCR-ECG and 5655 on LFW) compared with DWT and SVM (e.g., on UCR-ECG, there are 75439 and over 110322 constraints in DWT and SVM, respectively). Since the verification time and proof size grow sublinearly, the relative proportion of PCA processing becomes larger in those metrics compared with DWT and SVM.
• SVM Classification: SVM computation is the most dominant factor, especially on large datasets (with more than 256 classes), where it contributes over 73% of the total proving cost. That is because the cost of SVM grows linearly with $K$; notice that increasing the number of classes also increases the number of model parameters (i.e., the support vectors $S$). On the UCR-ECG dataset, the proving time of SVM ranges from 142 to 339 seconds, the verification time from 1.65 to 4.24 seconds, and the proof size from 688 KB to 1623 KB for 4 to 42 classes. On the Cifar-100 dataset, the proving time of SVM costs from 43 to 722 seconds, while its verification time and proof size are from 0.4 to 1.925 seconds and 179 KB to 785 KB, respectively, for 4 to 100 classes. On the LFW dataset, the proving time ranges from 704 to 6011 seconds for 8 to 2048 classes, while the verification time ranges from 1.89 to 5.73 seconds, and the proof size ranges from 779 to 2286 KB. The gap between SVM vs. DWT and PCA looks smaller in verification time and proof size due to their sublinear growth under Spartan ZKP.

Estimated performance on whole datasets. Based on the overall results (§7.2) and the above cost analysis on the reduced datasets, we estimated the cryptographic overhead of our scheme when tested on the whole Cifar-100 and LFW. For $K \in \{8, 16, 32, 64, 100\}$ classes in Cifar-100 with the standard train/test method, the proving time of our scheme is estimated to take 8189 to 108698 seconds. The verification time and proof size are estimated at 8.8-26 seconds and 4154-11247 KB, respectively. On the LFW dataset with the most-sampled classes, the proving time is estimated to take 5823 to 24772 seconds, while the verification time and proof size are estimated at 9.24-16.34 seconds and 4487-7424 KB, respectively. The estimated proving time is significant because the estimation is based on our current hardware (i.e., a laptop without multi-threading). In practice, since the prover is the server, which generally has better computational resources (e.g., a multi-core CPU with higher frequency and multi-threading), we expect the actual proving time to be significantly faster. For the whole Cifar-100, since the number of support vectors ($S$) is large, it incurs a large model size, resulting in high proving time. We expect that once some optimization techniques (e.g., [32,42]) are applied to reduce the model complexity, all cryptographic overheads will be significantly reduced. We leave such optimization as future work.

Accuracy
We report the accuracy of the ML algorithms on the whole datasets of UCR-ECG, Cifar-100, and LFW. In Cifar-100, we used all data from classes $0, 1, \dots, K-1$ for $K \le 100$ classes and tested with its standard train/test method. For LFW and UCR-ECG, since there is no standard train/test split, we applied cross-validation to report the accuracy. In LFW, since the number of samples in each class is unbalanced, we selected the classes that have the most data samples. In UCR-ECG, we chose data from classes $0, 1, \dots, K-1$. Table 4 presents the plain accuracy of the ML algorithms on the selected datasets. The last row of Table 4 presents the accuracy of executing DWT+PCA+SVM inference with Fixed-Point Arithmetic (FPA), which mirrors how ezDPS operates. We can see that FPA leads to an accuracy decrease of around 1% to 2%. In LFW, DWT+PCA+SVM with floating-point arithmetic achieves 73% ± 7% and 60% ± 8% accuracy rates for 8 and 16 classes, respectively. With FPA, the accuracy decreases by 1% to 2%, leading to accuracy rates of 72% ± 7% and 60% ± 6%, respectively. A similar trend is also observed on the UCR-ECG and Cifar-100 datasets, where the accuracy drops by around 1% to 2% due to FPA.
For curious readers, we conservatively report the best inference accuracy that each of our benchmark datasets currently achieves with different state-of-the-art ML pipeline techniques (without integrity and model privacy). UCR-ECG can achieve 97.5% accuracy by combining a Gated Recurrent Unit with a Fully Convolutional Network [14]. Cifar-100 can achieve 96.08% accuracy by combining an ImageNet pre-trained model with sharpness-aware minimization [16]. Finally, LFW can achieve 99% accuracy using an optimized VarGNet [70]. Since these pipeline techniques are highly optimized for each dataset, they yield higher accuracy than our generic framework. We leave the investigation of zero-knowledge proofs for optimization techniques that can be integrated into our framework to further improve its accuracy as our future work.
Related Work

Verifiable and zero-knowledge ML. Unlike PPML, verifiable ML (vML) and zkML focus on the integrity of delegated ML computation using VC and zero-knowledge techniques [5,17,21,54,61]. Both vML and zkML are still in an early development stage, with a limited number of schemes proposed. In vML, the resource-limited client delegates the training/inference tasks to the server and later checks whether the task has been performed correctly (with no privacy guarantee). Zhao et al. [72] proposed VeriML, a vML framework for linear regression, LR, NN, SVM, and DT training. Some vML schemes are designed for DNN inference (e.g., [19,63]) using VC protocols (e.g., [21,23]) or TEE [8]. On the other hand, zkML, first studied in 2020 [71], enables integrity and model privacy in the inference phase, where the client can verify whether the inference result on her data is indeed computed from the server's committed model without learning the model parameters. Zhang et al. designed a zkDT scheme [71], followed by a few zero-knowledge DNN inference constructions [15,40,45]. Weng et al. proposed Mystique [68], a zkVC compiler for efficient zero-knowledge NN inference.
A Toy Example of the Split Technique

To improve the efficiency, the split technique separates the data sample and the low-pass filter into two parts, i.e., the odd part and the even part. Specifically, let $x^{(1)} = [x_1, x_3, x_5]$, $x^{(2)} = [x_2, x_4, x_6]$, $h^{(1)} = [h_1, h_3]$, and $h^{(2)} = [h_2, h_4]$. Therefore, (15) is equivalent to (16), which only requires 4 multiplications to prove, compared with 20 in (17). It reduces the number of intermediate terms, thereby reducing the number of constraints. We present the above toy example in Figure 7.

Application to zkCNN. We show that the split technique can be used to improve the efficiency of zkCNN [45] in some cases, namely when the sliding step $s$ between two rounds of convolution is larger than 1. Note that $s \ge 2$ is generally adopted in deep learning models [39].
Suppose the input matrix $X$ is of size $n \times n$ and the kernel matrix $W$ is of size $d \times d$. The 2-D convolution between these two matrices with stride $s$ is a matrix of size $\frac{n}{s} \times \frac{n}{s}$ whose entries are $Y[i][j] = \sum_{u=1}^{d} \sum_{v=1}^{d} W[u][v] \cdot X[si + u][sj + v]$ for $0 \le i, j \le (n/s - 1)$.
B Proof of Theorem 2

Soundness. Consider a malicious prover $\mathcal{A}$ that outputs a commitment cm and an accepting proof $\pi^*$. There are two scenarios.

• Scenario 1: $\pi^*$ contains an invalid auxiliary witness (e.g., for the Max gadget or the batched constraints) that passes the permutation test or the random linear combination test. The probability of the first case is negligible in $\lambda$ due to the soundness of the commitment scheme used by the backend ZKP protocol. As the Max gadget relies on the permutation test, its soundness error is negligible in $\lambda$ due to the soundness of the characteristic polynomial check, which has error probability $n/|\mathbb{F}|$ due to the Schwartz-Zippel Lemma [60]. Finally, the soundness error of the random linear combination over a small number of constraints is negligible in $\lambda$. By the union bound, the probability that $\mathcal{A}$ can generate such a $\pi^*$ is negl($\lambda$).
• Scenario 2: $\pi^* = (\pi, \mathrm{aux})$ where the underlying witness does not satisfy the relation, i.e., $\mathcal{R}((\mathrm{cm}, x, y, \vec{r}); \pi^*) = 0$. According to the soundness of the backend ZKP, given a commitment $\mathrm{cm}^*$, the probability that $\mathcal{A}$ can generate a proof that makes $\mathcal{V}$ accept an incorrect witness is negligible in $\lambda$.
Overall, the soundness of ezDPS holds except with negligible probability in $\lambda$.

Zero-knowledge. We construct a simulator $\mathcal{S}$ for Protocol 1 in Figure 10 and show that the following hybrid games are indistinguishable.
• Hybrid $\mathcal{H}_1$: $\mathcal{H}_1$ uses the real ezDPS.Com($\cdot$) of Protocol 1 for the commitment phase and invokes $\mathcal{S}$ to simulate the proving phase.
Given the same commitment, the verifier cannot distinguish $\mathcal{H}_0$ and $\mathcal{H}_1$ due to the zero-knowledge property of the backend zero-knowledge protocol, given the same circuit and public input. If the verifier could distinguish $\mathcal{H}_1$ and $\mathcal{H}_2$, we could construct a PPT adversary $\mathcal{A}$ that distinguishes whether a commitment is of an MLIP model or of zero strings, which contradicts the hiding property of the underlying commitment scheme. Thus, the verifier cannot distinguish $\mathcal{H}_0$ from $\mathcal{H}_2$ by the hybrid argument, which completes the proof of zero-knowledge.

C Proving Other SVM Kernels
Let $u \in \mathbb{F}$ be the output of the kernel function. We present the constraints for other SVM kernels as follows.

D Proving Deep Learning Techniques
In this paper, we mainly focus on designing techniques to prove classical ML algorithms in zero-knowledge. However, we show that they can also be used to prove some deep learning techniques, as follows.

Convolutional layers. A convolutional layer computes the dot product between an input vector $x \in \mathbb{F}^n$ and a small kernel $k \in \mathbb{F}^d$. In the $i$-th round, it computes the $i$-th entry of the output such that $y[i] = \sum_{j=1}^{d} k[j] \cdot x[s(i-1) + j]$, where $s$ is the step between two rounds. Our proposed technique can be applied to convolutional layers w.r.t. different settings of $s$.
• $s = 1$: This includes only addition and multiplication operations. Thus, the random linear combination can be applied to reduce the number of constraints, or other optimization techniques [40,45] can be used.
• $s = 2$: Our split technique in §4.2.1 can be applied. Both the kernel and the input are split into two parts, and a random linear combination can be performed.
• $s \ge 2$: Our split technique can be extended when the step is greater than two. We first split $x$ and $k$ into $s$ parts, such that the $u$-th part contains the entries whose indices are congruent to $u$ modulo $s$, and $y[i]$ can be computed as a sum of $s$ dot products over the corresponding parts (see the sketch below). Then the random linear combination can be utilized as described in §4.2.1.
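A hedged Python sketch of the $s$-way split for strided 1-D convolution (our own illustration; it checks that the per-part dot products recombine to the direct convolution):

    import numpy as np

    def conv1d_direct(x, k, s):
        d = len(k)
        n_out = (len(x) - d) // s + 1
        return np.array([sum(k[j] * x[s*i + j] for j in range(d))
                         for i in range(n_out)])

    def conv1d_split(x, k, s):
        # Split the kernel (and matching inputs) by index residue mod s;
        # each output entry becomes a sum of s shorter dot products.
        d = len(k)
        n_out = (len(x) - d) // s + 1
        out = np.zeros(n_out)
        for u in range(s):
            k_u = k[u::s]                          # u-th kernel part
            for i in range(n_out):
                x_u = x[s*i + u : s*i + d : s]     # matching input part
                out[i] += np.dot(k_u, x_u)
        return out

    x, k = np.random.randn(12), np.random.randn(4)
    assert np.allclose(conv1d_direct(x, k, 2), conv1d_split(x, k, 2))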
Activation layers. Let $u \in \mathbb{F}$ be the output of the activation function. We show how to prove activation functions with our gadgets as follows.

E Mitigating Model Stealing Attacks
As discussed, model stealing attacks [6,29,64] aim to reconstruct the ML model from the inference result, given that the adversary has black-box access to the model parameters. To our knowledge, there is no general defense against these attacks beyond limiting the number of queries the client can make to the model [29]. We present several strategies that can mitigate these attacks; with some effort, they can be integrated orthogonally into our scheme to protect the model privacy for both the inference result and the proof.

Limiting prediction information. The model holder can limit the output information by releasing class probabilities only for high-probability classes (e.g., top-5 in the ImageNet dataset [39]) [64], or by only releasing the class labels [6,64]. Limiting output information forces the adversary to query more, which permits the model holder to identify them by augmenting adversarial detection methods (see below) that analyze their behaviors against benign users. Tramer et al. [64] showed that by returning the class label without the confidence score (as ezDPS currently offers), the number of queries required to extract the model increases by 50-100 times. Thus, the model holder can increase the cost per query, thereby reducing the profit the adversary can make.

Adversarial detection. Juuti et al. [33] proposed an efficient method to detect whether the adversary is attempting to steal the model by analyzing the distribution of the adversary's queries against the normal (Gaussian) distribution. Kesarwani et al. [36] proposed two performance metrics (e.g., the information gain and the coverage of the input space) that quantify the rate of information the adversaries gain from the queries and represent the status of the model extraction process. Another approach is to embed watermarks so that if the adversary steals the model, the owner can detect and certify the stolen model [1,30].

Obfuscating prediction results. Several approaches suggest perturbing or adding noise to the prediction results to prevent the adversary from executing the (supervised) retraining process to reconstruct the model [6,41,64]. This can be achieved with Differential Privacy to hide the decision boundary between prediction labels regardless of how many queries are executed by the adversary [73]. Another approach is to poison the training objective of the adversary by actively perturbing the predictions without impacting the utility for benign users [53].

F Model Leakage in Proof of Inference without Zero Knowledge
We show how the proof of inference, without zero-knowledge, can leak model parameters. Let $W \in \mathbb{F}^N$ be the MLIP model parameters, $io$ be the public inputs and outputs, and $\mu = \lceil \log N \rceil$. According to Spartan, our backend ZKP protocol, the secret parameter $z = (io, 1, W)$ is encoded as a function $Z(\cdot): \{0,1\}^{\mu} \to \mathbb{F}$ whose low-degree extension is a multilinear polynomial $\tilde{Z}(\cdot)$ that agrees with $Z(\cdot)$ on $\{0,1\}^{\mu}$. To prove the satisfiability of the arithmetic circuits, both parties invoke two sum-check protocols, where a dot-product-proof protocol [67] is applied to guarantee the zero-knowledge property. Without the zero-knowledge property, the sum-check protocol would leak information about the secret parameter $W$. Specifically, in the first round of the sum-check protocol, upon receiving a random challenge $r_x$, $\mathcal{P}$ computes $v = \sum_{y \in \{0,1\}^{\mu}} \tilde{A}(r_x, y) \cdot \tilde{Z}(y)$, where $\tilde{A}: \mathbb{F}^{\mu} \times \mathbb{F}^{\mu} \to \mathbb{F}$ is a sparse multilinear polynomial, the low-degree extension of the matrix $A$ in R1CS. Therefore, once acquiring $v$, $\mathcal{V}$ could derive information about the values of $\tilde{Z}(\cdot)$, which contain private information about the model. This demonstrates the importance of having zero-knowledge in the integrity proof to protect the privacy of the model parameters.

Figure 4: Performance of our scheme compared with the baseline.

Figure 7: Example of the split technique applied to DWT decomposition vs. directly using random linear combination.