SEDMA: Self-Distillation with Model Aggregation for Membership Privacy

Membership inference attacks (MIAs) are important measures for evaluating potential privacy leakage from machine learning (ML) models. State-of-the-art MIA defenses have achieved favorable privacy-utility trade-offs using knowledge distillation on split training datasets. However, such defenses increase computational costs because a large number of ML models must be trained on the split datasets. In this study, we propose a new MIA defense, called SEDMA, based on self-distillation using model aggregation, inspired by the model parameter averaging used in federated learning. The key idea of SEDMA is to split the training dataset into several parts and aggregate multiple ML models trained on the splits for self-distillation. The intuition behind SEDMA is that model aggregation prevents model over-fitting by smoothing information related to the training data among the multiple ML models while preserving model utility, as in federated learning. Through experiments on major benchmark datasets (Purchase100, Texas100, and CIFAR100), we show that SEDMA outperforms state-of-the-art MIA defenses in terms of membership privacy (MIA accuracy), model accuracy, and computational costs. Specifically, SEDMA incurs at most an approximately 3-5% drop in model accuracy, while achieving the lowest MIA accuracy among state-of-the-art empirical MIA defenses. In terms of computational costs, SEDMA takes significantly less processing time than the previous defense with the best privacy-utility trade-offs. SEDMA thus achieves both favorable privacy-utility trade-offs and low computational costs.


INTRODUCTION
Machine learning (ML) has been widely used in many areas, such as predictive analytics, image recognition, and natural language processing, where the input may include privacy-sensitive data. However, recent work has shown that ML models tend to memorize information from the training data (due to over-fitting), which poses serious privacy risks when the training data includes privacy-sensitive information [4,33,44]. Membership inference attacks (MIAs), where an adversary aims to identify whether a target sample was used to train an ML model based on model behavior, are one of the most fundamental privacy attacks against ML models [12,33]. The attacks can pose a serious privacy threat as they reveal privacy-sensitive information related to the training data. For example, in the case of an ML system for hospital health analytics, an adversary can use MIAs to reveal that a victim was once a patient in the hospital (i.e., the victim's data was used as training data for the ML system). In addition, MIAs have also been applied to data extraction attacks [4] and the privacy-preserving assessment of ML models [15,25].
MIA adversaries can obtain privacy-sensitive information related to the training data by simply accessing a prediction API in ML systems (called black-box MIAs), even if the providers of the ML systems are trusted. Black-box MIAs can be categorized into single-query [3,23,34,36,43-45] and label-only attacks [7,19,20]. Single-query attacks directly query a target model with only the target sample, typically to identify the target sample as a member of the training data (used in training) or a non-member (not used in training). Label-only attacks indirectly query a target model with multiple samples in the neighborhood of a target sample to identify the target sample as a member or non-member.
MIA defenses need to design ML models that behave similarly on member and non-member samples, because MIAs identify a sample as a member or non-member based on the different model behavior caused by whether or not the sample is included in the training data [1,8,14,16,23,32,38]. MIA defenses can be categorized into provable privacy defenses and empirical membership privacy defenses, as shown in Table 1. Provable privacy defenses typically use differential privacy mechanisms to provide provable privacy guarantees for all inputs of ML models. However, differential privacy-based defenses, such as DP-SGD [1], have been reported to significantly degrade model utility in many ML models. Empirical membership privacy defenses aim to preserve model utility and provide privacy empirically evaluated through practical MIAs, without provable privacy guarantees. This study focuses on empirical membership privacy defenses and proposes a new defense strategy with better trade-offs among membership privacy, model utility, and computational costs.

Table 1: Two categories of MIA defenses: provable privacy defenses and empirical membership privacy defenses. The provable privacy defenses provide provable privacy guarantees but degrade model utility. The empirical membership privacy defenses maintain high model utility but cannot provide provable privacy guarantees.
In this study, we introduce a new empirical membership privacy defense called SEDMA (SElf-Distillation using Model Aggregation). Our goal is to mitigate practical black-box MIAs, maintain high model utility, and keep computational costs low. First, similar to KCD [8], SEDMA splits a training dataset into several subsets and trains multiple ML models (called sub-models) on each subset. Second, the trained sub-models are aggregated into several pairs (called model aggregation). Model aggregation means averaging the model parameters, as performed in federated learning [21]; however, it differs from federated learning in that it is performed among several pairs of sub-models.
Next, SEDMA performs inference with each aggregated model on the subsets not used in the training of its constituent sub-models and obtains the corresponding prediction vectors. These prediction vectors are used as soft labels for the original training dataset. Finally, SEDMA trains a protected model on the soft-labelled training dataset. This process is called self-distillation: transferring prediction vectors (knowledge distillation) between ML models with the same model architecture. The state-of-the-art MIA defenses KCD [8] and SELENA [38] have achieved favorable privacy-utility trade-offs through self-distillation on split training datasets. Note that the novelty of SEDMA lies in applying model aggregation to several pairs of sub-models for self-distillation. The intuition behind SEDMA is that model aggregation prevents over-fitting by smoothing the information about the original training data among the sub-models while preserving model utility, as in federated learning. In addition, SEDMA reduces computational costs because it combines already-trained sub-models by model aggregation instead of training additional sub-models, unlike SELENA, which has the state-of-the-art privacy-utility trade-offs among previous empirical defenses.
In our experiments, we evaluated SEDMA on three benchmark datasets (Purchase100, Texas100, and CIFAR100). We compared SEDMA with four existing empirical MIA defenses [8,16,23,38], including KCD and SELENA. We conducted two types of black-box MIAs used for evaluation in related work: single-query attacks (NN-based attacks and metric-based attacks) and label-only attacks (boundary distance attacks, data augmentation attacks, and likelihood ratio attacks).
The experiments show that our defense achieves the lowest MIA accuracy among the four existing MIA defenses (around 52%). Nevertheless, SEDMA incurs only a small drop in model accuracy (at most approximately 3-5% compared to undefended models). Moreover, we discuss the best hyperparameters of SEDMA for favorable trade-offs, the computational costs of SEDMA, and the comparison with provable privacy defenses. In terms of computational costs, compared to SELENA, which has the state-of-the-art favorable trade-offs, SEDMA takes significantly less processing time (only a seventh of the time on the three benchmark datasets).

Contributions
In summary, the key contributions of this paper are as follows:

• We propose SEDMA, a new defense mechanism against MIAs based on self-distillation with model aggregation. The novelty of SEDMA lies in reducing over-fitting by smoothing information related to the training data among sub-models with model aggregation, similar to the model averaging performed in federated learning [21].

• We show that SEDMA mitigates practical black-box MIAs, maintains high model utility, and has low computational costs. We also demonstrate that SEDMA's performance depends on the content ratio of the original training data for a sub-model and discuss the best hyperparameters of SEDMA for favorable privacy-utility trade-offs. SEDMA has the advantage of reducing over-fitting and adjusting the content ratio by model aggregation, while saving computation time.

• We evaluated SEDMA on three benchmark datasets (Purchase100, Texas100, and CIFAR100) with single-query and label-only attacks. We demonstrated that SEDMA outperforms state-of-the-art empirical defenses: SEDMA achieves the lowest MIA accuracy among the previous defenses (around 52%), with a model accuracy drop of only approximately 3-5% compared to undefended models.

PRELIMINARIES
In this section, we introduce our threat model and provide an overview of prior MIAs and MIA defenses. Based on the threat model and previous work, we present our goals.

Threat Model
We assume black-box MIAs, in which an adversary has black-box access to a target model. The adversary cannot access the target model parameters directly, but it can query the target model through a prediction API and obtain the corresponding prediction vectors and/or labels. Thus, the adversary can perform single-query attacks with access to both prediction vectors and labels, or label-only attacks with access to only prediction labels. This threat model is the standard black-box MIA setting typically used for the evaluation of MIAs in prior studies. We also assume that the adversary knows a small subset of the training dataset for the target model, similar to the assumption in prior works [23,34,38]. In other words, the adversary knows

Related Work
We introduce an overview of prior MIAs and MIA defenses.

Membership Inference Attacks (MIAs).
MIAs identify whether a target sample is a member based on prediction vectors and/or labels from the target model. MIAs are typically studied in a black-box scenario in which the adversary does not know the target model parameters [23,33]. Black-box MIAs can be performed either by exploiting a known subset of the training dataset [23] or by constructing shadow models from a dataset with the same distribution [33]. Black-box MIAs can be classified into two categories: single-query and label-only attacks, as shown in Table 2. Many types of single-query attacks have been proposed in prior studies [34,36,44,45]. Single-query attacks, such as NN-based attacks and metric-based attacks, issue a query directly to a target model and use the prediction vectors and labels to identify the query as a member or non-member.
NN-based attacks [23,33] build a neural network classifier I_NN for membership inference using the prediction vectors and labels of a target model. The adversary identifies a target sample as a member or non-member with I_NN.
Metric-based attacks [23,33,34,36,44,45] use metrics, such as correctness (correctness-based attacks), confidence (confidence-based attacks), or entropy (entropy-based attacks and modified entropy-based attacks) of the prediction results from the target model to identify a target sample as a member or non-member.
Correctness-based attacks [45] assume that a sample with a correct prediction is likely to be a member. The attacks identify a target sample as a member or non-member based on the difference between the training and testing accuracy of the target model. The adversary builds the membership inference classifier (members as 1 and non-members as 0) I_corr for the target model F as follows:

I_corr(F(x), y) = 1{argmax_i F(x)_i = y},

where y is the correct label of the input sample x.
Confidence-based attacks [36,44] exploit the fact that a member's prediction confidence F(x)_y is typically higher than a non-member's. The attacks identify a target sample as a member when the prediction confidence is higher than either a class-dependent threshold τ(y) or a class-independent threshold τ. The membership inference classifier I_conf for the attacks is as follows:

I_conf(F(x), y) = 1{F(x)_y ≥ τ(y)}.

Entropy-based attacks [33] exploit the fact that a member's prediction entropy is typically lower than that of non-members. The attacks identify a target sample as a member when the prediction entropy is lower than either a class-dependent threshold τ(y) or a class-independent threshold τ. The membership inference classifier I_entr for the attacks is as follows:

I_entr(F(x), y) = 1{H(F(x)) ≤ τ(y)},

where H denotes the entropy of the prediction vector. Modified entropy-based attacks [34] use a metric that combines prediction entropy and correct labels to improve entropy-based attacks. The attacks exploit the fact that a member's value of the modified entropy metric Mentr(F(x), y) is typically lower than that of non-members. The adversary identifies a target sample as a member when the modified entropy metric is lower than either a class-dependent threshold τ(y) or a class-independent threshold τ. The membership inference classifier I_mentr for the attacks is as follows:

I_mentr(F(x), y) = 1{Mentr(F(x), y) ≤ τ(y)}.

Likelihood ratio attacks [3,43] use the ratio of the likelihood of a target sample on a target model and on reference models trained on samples from the population distribution, instead of the loss of the target model. Carlini et al. [3] propose the likelihood ratio attack (LiRA), which achieves state-of-the-art attack performance. LiRA trains N shadow models, of which half are IN models and the other half are OUT models, and fits Gaussians to the confidences of the IN and OUT models. The adversary measures the likelihood of the confidence of the target sample under the distributions of the IN and OUT models and identifies the target sample as a member or not by whichever is more likely. Ye et al. [43] propose Attack R, which is similar to LiRA and achieves higher performance than LiRA.
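As a concrete illustration, the metric-based classifiers above reduce to threshold tests on a prediction vector. The following minimal Python sketch implements the confidence, entropy, and modified-entropy tests; the function names, example vectors, and threshold values are illustrative, not taken from any attack's actual code (real attacks tune per-class thresholds τ(y) on shadow data).

```python
import numpy as np

def shannon_entropy(p):
    # Entropy of a prediction vector (entropy-based attacks, I_entr).
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def modified_entropy(p, y):
    # Mentr metric: penalizes low confidence on the true class y and
    # high confidence on every wrong class (modified entropy attacks).
    p = np.clip(p, 1e-12, 1 - 1e-12)
    m = -(1 - p[y]) * np.log(p[y])
    for c in range(len(p)):
        if c != y:
            m -= p[c] * np.log(1 - p[c])
    return float(m)

def infer_membership(pred, y, tau_conf, tau_entr, tau_mentr):
    # Each classifier outputs 1 (member) or 0 (non-member) by comparing
    # its metric against a threshold, mirroring I_conf, I_entr, I_mentr.
    i_conf = int(pred[y] >= tau_conf)
    i_entr = int(shannon_entropy(pred) <= tau_entr)
    i_mentr = int(modified_entropy(pred, y) <= tau_mentr)
    return i_conf, i_entr, i_mentr

member_like = np.array([0.90, 0.05, 0.05])     # confident prediction
nonmember_like = np.array([0.40, 0.35, 0.25])  # diffuse prediction
print(infer_membership(member_like, 0, 0.8, 0.6, 0.5))     # (1, 1, 1)
print(infer_membership(nonmember_like, 0, 0.8, 0.6, 0.5))  # (0, 0, 0)
```

With these illustrative thresholds, all three classifiers agree: the confident prediction is flagged as a member and the diffuse one as a non-member, which is exactly the behavioral gap that MIA defenses try to close.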
Label-only attacks, such as boundary distance attacks and data augmentation attacks, are based on the fact that a member's sample influences the prediction vectors and labels of both itself and the other samples in its neighborhood [20]. The attacks exploit the fact that a target model is more likely to correctly classify samples around members' data than samples around non-members' data [7,19]. The adversary indirectly issues multiple queries around a target sample to the target model and uses the prediction labels to identify the target sample as a member or non-member. Therefore, obfuscating prediction confidence [16,42] is not a defense against label-only attacks.
Boundary distance attacks [7,19] assume that samples around members are more likely to be correctly classified than samples around non-members. The attacks exploit the fact that the distance to the classification boundary for members is larger than that for non-members. The adversary computes the distance to the classification boundary by using samples perturbed with small noise around a target sample or adversarial examples crafted for a target sample under the black-box scenario [2,5,6].
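A label-only distance estimate of this kind can be sketched as follows: query noisy copies of a sample at growing noise radii and record the smallest radius at which any predicted label flips. The model, function names, and radii below are hypothetical stand-ins (a fixed linear boundary plays the role of the black-box prediction API, of which only labels are observed).

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_label(x):
    # Stand-in black-box model: a fixed linear decision boundary.
    # In a real attack only the label of each query is observed.
    w, b = np.array([1.0, -1.0]), 0.0
    return int(x @ w + b > 0)

def estimate_boundary_distance(x, radii, n_queries=50):
    # Query noisy copies of x at growing radii; the smallest radius at
    # which any label flips approximates the distance to the boundary.
    base = predict_label(x)
    for r in radii:
        noise = rng.normal(scale=r, size=(n_queries, x.size))
        if any(predict_label(x + n) != base for n in noise):
            return r
    return radii[-1]

radii = [0.01, 0.05, 0.1, 0.5, 1.0, 2.0]
near = np.array([0.1, 0.0])    # close to the boundary (non-member-like)
far = np.array([3.0, -3.0])    # far from the boundary (member-like)
print(estimate_boundary_distance(near, radii),
      estimate_boundary_distance(far, radii))
```

The far point survives far larger perturbations before its label flips, so its estimated distance is larger; thresholding this estimate yields the member / non-member decision.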
Data augmentation attacks [7] use data augmentation techniques in computer-vision for boundary distance attacks.The adversary computes the distance to the classification boundary by using augmented target samples with translations, rotations, or flips.
2.2.2 MIA Defenses. MIA defenses can be categorized into provable privacy defenses using differential privacy and empirical membership privacy defenses designed specifically to mitigate MIAs.
Provable privacy defenses based on differential privacy in ML, such as DP-SGD [1], add noise to ML models during the training process. However, applying differential privacy to ML models involves trade-offs between model accuracy loss and privacy guarantees [15,29]. Reducing the model accuracy loss is an active topic in provable privacy defenses.
PATE [26,27] is a defense that leverages knowledge distillation with public data and differential privacy for a provable privacy guarantee. Knowledge distillation transfers knowledge from one model (the teacher) to another (the student) by using the prediction outputs of the teacher model [11]. This defense splits the training dataset across sub-models and adds noise to the aggregated predictions of the trained sub-models to label public data for a student model. However, in many realistic scenarios, it is difficult to prepare public data corresponding to the original privacy-sensitive training data.
Currently, even state-of-the-art defenses have been shown to significantly degrade model accuracy on benchmark datasets (the model accuracy is 25% lower for ε ≤ 3) [22,28,40]. Thus far, it has been difficult to achieve both acceptable utility loss and privacy guarantees with differential privacy [15,29].
Adversarial Regularization (AdvReg) [23] is a defense that trains an ML model to mitigate the difference in model behavior between members and non-members against MIAs. This defense is based on a game-theoretic framework similar to MIAs extended to generative models [9]. The defense optimizes the training of ML models with an objective function that reduces the prediction loss while also minimizing the MIA accuracy.
MemGuard [16] is a defense that obfuscates prediction vectors with noise to confuse membership inference classifiers. This defense maintains the same model accuracy as the undefended model because it only obfuscates prediction vectors without changing the prediction labels. However, it is weak against metric-based attacks and does not defend against label-only attacks [34].
DMP [32] is a defense that leverages knowledge distillation with public data. This defense achieves favorable privacy-utility trade-offs because it can indirectly train an ML model on the privacy-sensitive training data via knowledge distillation with public data. However, as with PATE, it is difficult to prepare public data in many realistic scenarios.
KCD [8] is a defense that trains multiple sub-models, one for each combination of split training data, and then trains a protected model by distillation using the prediction outputs of each sub-model on the one remaining split not used for its training. This defense achieves favorable privacy-utility trade-offs without preparing public data, which is the challenge of DMP.
SELENA [38] is a defense that trains multiple sub-models on each combination of split training data and then trains a protected model by distillation using the ensemble prediction outputs of the sub-models on the splits not used for their training. This defense differs from KCD in that it averages the multiple prediction outputs of sub-models for distillation. It also achieves favorable privacy-utility trade-offs without preparing public data, as in KCD.
MIAShield [14] is a defense that trains multiple sub-models, each on an augmented split of the training dataset, and then provides the ensemble prediction output of the sub-models, excluding the sub-model trained on data related to the input, in the inference phase. This defense is similar to SELENA; however, it has several limitations, such as applying only to augmentable computer-vision datasets and requiring high-performance computing systems that can deploy multiple sub-models for ensemble prediction.
In addition, several regularization techniques, such as dropout [37], weight decay [41], and early-stopping [35], have been known to mitigate MIAs to a limited extent.

Our goals
In this study, we propose a defense that has favorable privacy-utility trade-offs and low computational costs against practical black-box MIAs.
2.3.1 Low MIA Accuracy. We aim to mitigate practical black-box MIAs, including both single-query and label-only attacks. Prior studies have often used the accuracy of membership inference, covering both members and non-members, to evaluate MIAs. We focus on correct predictions for members because the adversary aims to identify whether a target sample was used to train an ML model. Thus, we evaluate not only accuracy but also precision and recall on a dataset with the same number of members and non-members, following recent work [30]. Precision indicates the percentage of correct answers among the samples inferred to be members, and recall indicates the percentage of members correctly identified as members. These metrics (accuracy, precision, and recall) take high values when the risk of membership privacy leakage is high. Let TP denote the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives. The accuracy, precision, and recall are given as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Recall = TP / (TP + FN).

Thus, precision and recall, which concern TP, are important metrics with respect to the adversary's aim of identifying whether a target sample is a member.
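The three metrics can be computed directly from the confusion counts; the counts below are illustrative (a balanced evaluation set and a weak attack near random guessing), not the paper's results.

```python
def mia_metrics(tp, fn, tn, fp):
    # Accuracy, precision, and recall as defined above; higher values
    # indicate a higher risk of membership privacy leakage.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# 500 members and 500 non-members, attack only slightly better than
# a coin flip (hypothetical numbers):
acc, prec, rec = mia_metrics(tp=260, fn=240, tn=265, fp=235)
print(round(acc, 3), round(prec, 3), round(rec, 3))  # 0.525 0.525 0.52
```

On such a balanced set, an attack accuracy near 50% means the adversary learns almost nothing, which is the regime a successful defense aims for.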

High Model Accuracy.
Our defense aims to achieve favorable trade-offs between privacy and utility, that is, to protect membership privacy without significantly degrading model accuracy. Prior work has shown that defenses using knowledge distillation on split training datasets, such as KCD [8] and SELENA [38], achieve favorable privacy-utility trade-offs. In addition, these defenses do not require public datasets (PATE [26,27] and DMP [32] require public data). In this study, we focus on KCD and SELENA as benchmarks to evaluate privacy-utility trade-offs.

Low Computational costs.
Our defense aims to achieve low computational costs. Even if a defense has favorable trade-offs between privacy and model accuracy, it is not practical if the additional computational costs of the defense are significant. MIAShield [14] adds computational costs corresponding to the number of sub-models for each inference because it makes inferences using multiple sub-models trained on each split of the training data, as shown in Figure 1. In contrast, KCD and SELENA provide one protected model by knowledge distillation and therefore incur no additional computational costs during the inference phase. In addition, it is difficult to deploy MIAShield on embedded devices with limited resources or in federated learning, which shares one trained model [24].
In this study, MIAShield is out of scope, and we focus on the computational costs of KCD and SELENA during the training phase. KCD requires additional training of sub-models for each combination of split training datasets, as shown in Figure 2. According to KCD's experiments [8], KCD provides favorable trade-offs with more than 10 sub-models. SELENA also requires additional training depending on the combination of sub-models, as shown in Figure 3. According to SELENA's experiments [38], 25 sub-models are required for favorable trade-offs. Thus, defenses using knowledge distillation on split training datasets provide favorable privacy-utility trade-offs; however, the computational costs of training sub-models become a main concern.

OUR PROPOSED DEFENSE
In this section, we present the concept and details of our defense.

Concept
MIAs result from the fact that ML model behavior differs between members and non-members. For example, ML models exhibit behaviors such as different prediction accuracy between members and non-members [45] or higher prediction confidence for members [31,34,36,44]. We propose a new defense strategy to mitigate these differences, based on model aggregation and self-distillation.
An overview of our proposed defense system is shown in Figure 4 and Algorithm 1. Our defense system first splits a privacy-sensitive training dataset D into K subsets D_1, D_2, ..., D_K. Sub-models F_1, F_2, ..., F_K are trained on the subsets D_1, D_2, ..., D_K, respectively. Our defense then aggregates the sub-models F_1, F_2, ..., F_K over combinations of m sub-models. Model aggregation refers to averaging the parameters of the models, as used in federated learning. However, it differs from federated learning in that it is performed for each combination of the sub-models. In the case of m = K − 1, the sub-models are aggregated into
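The split-and-aggregate step can be sketched as follows, with K denoting the number of splits and m the number of sub-models per aggregated model (names chosen for this sketch), and random vectors standing in for trained sub-model parameters. Aggregation is element-wise parameter averaging, as in federated averaging.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

K = 4          # number of dataset splits / sub-models (illustrative)
m = K - 1      # sub-models per aggregated model (the m = K - 1 case)

# Stand-ins for trained sub-model parameters: one weight vector each.
# In practice these come from training F_i on split D_i.
sub_model_params = [rng.normal(size=5) for _ in range(K)]

def aggregate(params_list):
    # Model aggregation: element-wise average of model parameters,
    # as performed in federated learning.
    return np.mean(params_list, axis=0)

# One aggregated model per combination of m sub-models; with m = K - 1
# each aggregated model leaves exactly one sub-model (and its split) out.
aggregated = {
    combo: aggregate([sub_model_params[i] for i in combo])
    for combo in combinations(range(K), m)
}

print(len(aggregated))  # 4 aggregated models when K = 4, m = K - 1
```

With m = K − 1 there are exactly K aggregated models, and each one is naturally paired with the single split its constituent sub-models never saw, which is what the soft-labeling step exploits.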

Description
The key attribute of our defense is mitigating the over-fitting of a trained model on a privacy-sensitive training dataset by using model aggregation of multiple sub-models trained on split training data. Our defense requires that the trained model contains little additional information about the training data. The sub-models can behave on a split dataset not used for their training similarly to how they behave on non-members. Therefore, KCD soft-labels each split using the sub-model trained on the other splits and trains a protected model on the soft-labelled dataset. SELENA soft-labels each split by using the ensemble prediction results of the sub-models trained on the other splits. Our defense, in contrast, soft-labels each split by using the aggregated model built from the sub-models trained on the other splits. The model aggregation aims to reduce the over-fitting of trained sub-models by smoothing the information about the training data among multiple sub-models while preserving model accuracy, as in federated learning. The degree of over-fitting of the sub-models therefore differs between our defense and KCD or SELENA, even when the same number of sub-models is used. Our defense generates multiple aggregated models from sub-models by performing model averaging over every few sub-models. Each aggregated model soft-labels the subsets not used for training the sub-models that constitute it. In KCD, each sub-model soft-labels the one remaining subset excluded during its training.
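The soft-labeling step can be sketched as follows. The aggregated-model predictor here is a random stand-in (names and shapes are illustrative); the point of the sketch is the leave-out pairing, where each split receives soft labels only from an aggregated model whose sub-models never trained on it.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 4
splits = [rng.normal(size=(10, 8)) for _ in range(K)]  # K data splits

def aggregated_model_predict(x, left_out):
    # Stand-in for the aggregated model built from all sub-models
    # except the one trained on split `left_out`; returns softmax
    # prediction vectors over 3 illustrative classes.
    logits = rng.normal(size=(x.shape[0], 3))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Soft-label each split with an aggregated model whose sub-models never
# saw that split, so the soft labels reflect non-member-like behavior.
soft_labeled = []
for i, split in enumerate(splits):
    soft_labels = aggregated_model_predict(split, left_out=i)
    soft_labeled.append((split, soft_labels))

# The protected model is then trained on `soft_labeled` (cross-entropy
# against the soft labels), completing self-distillation.
total = sum(x.shape[0] for x, _ in soft_labeled)
print(total)  # 40: every training sample receives a soft label
```

Because every sample is soft-labeled by a model that treats it as a non-member, the protected model distilled from these labels behaves similarly on members and non-members, which is exactly what frustrates the attacks of Section 2.2.1.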
Another key attribute of our defense is that its additional computational costs are lower than those of SELENA. SELENA has the state-of-the-art privacy-utility trade-offs; however, it has additional costs to train a large number of sub-models (approximately 25) to obtain prediction results for an ensemble. Our defense, in contrast, combines sub-models by model aggregation (simple averaging of model parameters) and therefore only trains one sub-model on each split training dataset. Training for each combination of the split training datasets, as in SELENA, is unnecessary. If the model accuracy of the aggregated models is not too high, only a

Table 3: Three datasets for the evaluation of SEDMA: Purchase100, Texas100, and CIFAR100. "Train" is the amount of training data. "Test" is the amount of test data. "Known" is the amount of training data that the adversary can know and exploit for MIAs. "Target" is the amount of data to infer membership on (half of the data is from members, and the other half is from non-members).

EXPERIMENTS
In this section, we introduce the experimental setup and evaluation results compared to undefended models and state-of-the-art defenses.

Setup
4.1.1 Datasets. We use three benchmark datasets widely used in prior studies on MIAs: Purchase100, Texas100, and CIFAR100, as shown in Table 3. Purchase100 is a dataset provided by Kaggle's Acquire Valued Shoppers Challenge [17]. We use the version that Shokri et al. processed [33]. The dataset has 197,324 records with 600 binary features regarding items customers purchased. The data is classified into 100 classes of purchase styles. Texas100 is a dataset provided by the Texas Department of State Health Services [39]. We also use the version that Shokri et al. processed [33]. The dataset has 67,330 records with 6,170 binary features about patients. The data is classified into 100 classes of procedures that patients underwent. CIFAR100 is a dataset provided as a typical benchmark for image classification algorithms [18]. The dataset has 60,000 images of various objects. The data is classified into 100 classes of object names.
We set the numbers of training and test data as shown in Table 3. The adversary's prior knowledge corresponds to half of the training data and half of the test data, following prior studies [23,38]. Note that we do not make any adjustments to the datasets, such as removing high-entropy data [8].

Model Architectures.
We use a 4-layer fully connected neural network with layer sizes of [1024, 512, 256, 100] for Purchase100 and Texas100, following prior work [23,38]. For CIFAR100, we use ResNet-18 [10], which is widely used in computer-vision tasks. Each model is trained and tested on the datasets listed in Table 3.

MIAs for Evaluations.
We conducted single-query and label-only attacks, as shown in Table 2. The results show the median and standard error over ten runs. Single-query attacks consist of five attacks, and we evaluated defenses based on the most successful attack. For the label-only attacks, boundary distance attacks were conducted, and data augmentation attacks were conducted on CIFAR100, which is the computer-vision task. The code for these attacks is based on the public code of SELENA. We used 15 shadow models for LiRA and Attack R and applied both logit and linear scaling to the model's confidence for Attack R. We used accuracy, precision, and recall to evaluate the attacks as described in Section 2.3.1. The larger the metrics, the higher the risk of membership privacy leakage.

Defenses for Comparison.
We compared SEDMA with MemGuard [16], AdvReg [23], KCD [8], and SELENA [38] as the state-of-the-art defenses from Section 2.2.2. For KCD, we set the number of split training datasets to 10 with its hyperparameter set to 1, a setting that protects privacy sufficiently. According to Chourasia et al. [8], a larger number of splits generally implies better privacy and utility, and the performance converges around 10 splits. For SELENA, we set the number of split training datasets (sub-models) K = 25 and the number of sub-models for the ensemble L = 10, with reference to the study's setup [38]. The hyperparameters of SEDMA are the number of split training datasets K and the number of sub-model combinations for model aggregation. We set K = 7 with combinations of 3 sub-models for SEDMA in the experiments. We discuss the best hyperparameters of our defense in Section 5.2.

Results
Table 4 summarizes the model accuracy and best attack accuracy, precision, and recall for each attack type, including comparison with undefended models and previous defenses.

Model Accuracy.
We first compare our defense with undefended models on MIA accuracy and model accuracy. According to Table 4, the highest MIA accuracy across the two types of attacks against undefended models is 68.23% on Purchase100, 67.51% on Texas100, and 74.51% on CIFAR100. In contrast, the MIA accuracy against SEDMA is no higher than 52.67% on Purchase100, 52.46% on Texas100, and 52.76% on CIFAR100. Our defense significantly reduces the risk of membership privacy leakage by approximately 15-20% compared to the undefended models. The model accuracy of the undefended models on the test dataset is 84.20% on Purchase100, 52.32% on Texas100, and 77.69% on CIFAR100. The model accuracy of SEDMA is 79.55% on Purchase100, 55.18% on Texas100, and 74.35% on CIFAR100. Compared with undefended models, our defense incurs, at most, an accuracy drop of approximately 3-5%. Our defense has only a small drop in model utility and achieves a low risk of membership privacy leakage. Therefore, our defense has favorable privacy-utility trade-offs. We next show that our defense achieves better utility-privacy trade-offs compared to previous defenses, such as AdvReg, MemGuard, KCD, and SELENA. Figure 5 shows the relationship between the model accuracy and the single-query attack accuracy on (a) Purchase100, (b) Texas100, and (c) CIFAR100. Figure 6 shows the relationship between the model accuracy and the label-only attack accuracy. Plots towards the upper left in Figure 5 and Figure 6 indicate better defenses for the utility-privacy trade-offs. Compared with AdvReg, our defense achieves higher model accuracy and lower attack accuracy across all three datasets. The highest MIA accuracy against AdvReg is 62.33% on Purchase100, 58.97% on Texas100, and 63.02% on CIFAR100. Our defense achieves approximately 5-6% lower attack accuracy than AdvReg. The model accuracy of AdvReg is 76.83% on Purchase100, 46.54% on Texas100, and 71.39% on CIFAR100. The model accuracy of our defense is approximately 3-9% higher than AdvReg. Our defense has better privacy-utility trade-offs than AdvReg.
Compared with MemGuard, our defense has the advantage of defending against label-only attacks, because MemGuard only obfuscates prediction confidences. Note that the results of label-only attacks, which do not use prediction confidences, against MemGuard are the same as those against the undefended model. The highest MIA accuracy against MemGuard is 65.26% on Purchase100, 65.58% on Texas100, and 68.37% on CIFAR100; our defense achieves more than 10% lower attack accuracy. However, because MemGuard maintains the same model accuracy as the undefended models, our defense is at a disadvantage in model accuracy compared to MemGuard. In exchange, our defense incurs only a small drop in model accuracy and also defends against label-only attacks.
Compared with KCD, our defense achieves higher model accuracy and lower attack accuracy across all three datasets. The highest MIA accuracy against KCD is 64.34% on Purchase100, 59.43% on Texas100, and 61.33% on CIFAR100; the attack accuracy of our defense is approximately 3-6% lower. The model accuracy of KCD is 74.83% on Purchase100, 47.35% on Texas100, and 69.77% on CIFAR100; our defense achieves approximately 5-8% higher model accuracy. Therefore, our defense has better privacy-utility trade-offs than KCD.
Compared with SELENA, our defense has an advantage in attack accuracy across all three datasets, while the model accuracy differs only slightly. The highest MIA accuracy against SELENA is 59.25% on Purchase100, 56.89% on Texas100, and 59.34% on CIFAR100; our defense achieves approximately 0.4-2% lower attack accuracy. The model accuracy of SELENA is 79.07% on Purchase100, 53.43% on Texas100, and 74.43% on CIFAR100, while that of our defense is 79.55%, 53.89%, and 74.25%, respectively. Our defense thus matches or slightly exceeds SELENA's model accuracy on Purchase100 and Texas100 and is slightly lower on CIFAR100. In short, our defense is comparable to SELENA in model accuracy and slightly better in attack accuracy. However, there are significant differences between SELENA and our defense in computational costs, which are discussed in the next section.
We also evaluate the precision and recall of MIAs against previous defenses and our defense on the three datasets. Figure 7 shows the relationship between the precision and recall of single-query attacks against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100, and Figure 8 shows the same for label-only attacks. In Figure 7 and Figure 8, plots towards the bottom left indicate better defenses for membership privacy.
Precision and recall, which capture the fraction of correct membership decisions, are important metrics for MIAs because the adversary tries to determine whether a sample is a member. Because our defense is plotted in the bottom left of Figure 7 and Figure 8 across all three datasets, it achieves the lowest MIA risk among the compared defenses. Although our defense differs only slightly from SELENA in attack accuracy, it has a clear advantage in precision and recall. Therefore, our defense outperforms state-of-the-art empirical defenses in terms of privacy-utility trade-offs.
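The attack metrics discussed above follow directly from the confusion counts reported in Tables 8 and 9. As a minimal illustrative sketch (the function name and the random-guess example are ours, not from the paper), MIA accuracy, precision, and recall can be computed from TP/FN/TN/FP as:

```python
def mia_metrics(tp, fn, tn, fp):
    """Membership-inference metrics from confusion counts.

    tp: members correctly identified as members
    fn: members wrongly identified as non-members
    tn: non-members correctly identified as non-members
    fp: non-members wrongly identified as members
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# A random-guess attacker on a balanced member/non-member set
# sits near 50% on all three metrics -- the ideal for a defense.
print(mia_metrics(500, 500, 500, 500))  # -> (0.5, 0.5, 0.5)
```

A defense is stronger the closer all three values are to the 0.5 random-guess baseline, which is why plots towards the bottom left of Figures 7 and 8 indicate better membership privacy.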

DISCUSSION
In this section, we discuss the best hyperparameters of our defense for favorable privacy-utility trade-offs, its computational costs, a comparison with provable privacy defenses, and its limitations.

Best hyperparameters
Our defense has two hyperparameters: the number of training data splits n and the combination for model aggregation C(n, k). Because our defense splits the training data for the sub-models and aggregates the trained sub-models, its performance against MIAs is expected to depend on the over-fitting of the sub-models. Therefore, we focus on the content ratio, the fraction of the original training data behind each sub-model, as a proxy for sub-model over-fitting. We compare the nine different SEDMA configurations shown in Table 5. The content ratio is determined by the combination C(n, k). For example, for SEDMA_6-3 the content ratio is 3/6 = 0.5, which means that 3 of the 6 dataset splits are used to train each sub-model.
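The two derived quantities above, the content ratio k/n and the number of aggregated models C(n, k), can be sketched in a few lines. This is our own illustrative helper (the function name `sedma_config` is hypothetical, not from the paper's implementation):

```python
from math import comb  # exact binomial coefficient C(n, k)

def sedma_config(n, k):
    """For a SEDMA_n-k configuration: n dataset splits/sub-models,
    k sub-models combined per aggregated model."""
    return {
        "content_ratio": k / n,           # fraction of training data behind one aggregation
        "aggregated_models": comb(n, k),  # number of distinct aggregations C(n, k)
    }

print(sedma_config(6, 3))  # SEDMA_6-3: content ratio 0.5, C(6, 3) = 20 aggregations
print(sedma_config(7, 3))  # SEDMA_7-3: content ratio ~0.43, C(7, 3) = 35 aggregations
```

The second call matches the best setting reported below: C(7, 3) yields 35 aggregated models with a content ratio of 3/7 ≈ 0.43.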
Figure 9 shows the relationship between the model accuracy on Purchase100 and the content ratio of the original training data for each SEDMA configuration, and Figure 10 shows the relationship between the attack accuracy of the best single-query attack and the content ratio. The model accuracy tends to increase with the content ratio, presumably because the sub-models fit the training data more closely. However, as Figure 10 shows, the attack accuracy also tends to increase with the content ratio.
The over-fitting of the sub-models is thus assumed to be related to the content ratio of the original training data: when the content ratio is high, a sub-model contains more information about the training data. Therefore, the hyperparameters of SEDMA must be set considering the trade-off between attack accuracy and model accuracy induced by the content ratio. The best setting in this study is the combination C(7, 3), with a content ratio of 3/7 ≈ 0.43. We consider that the best hyperparameters of SEDMA depend to some extent on the training dataset and model architecture.

Computational costs
Table 6 compares the processing time of previous defenses and SEDMA in the training and inference phases on the three datasets. We measure the processing time on a GeForce RTX 2080 SUPER in our experimental setup and average it over three runs per phase. In the training phase, we set the batch size to 512 for Purchase100, 128 for Texas100, and 256 for CIFAR100, with 30, 20, and 200 epochs, respectively. In the inference phase, we set the batch size to 1 and run 1,000 samples.
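A minimal sketch of the three-run averaging used for these measurements might look as follows; `avg_runtime` is our own hypothetical helper (the paper does not publish its timing harness), and any GPU-side work would additionally require device synchronization before each timestamp:

```python
import time

def avg_runtime(fn, runs=3):
    """Average wall-clock time of fn over several runs, mirroring the
    three-run averaging behind Table 6 (illustrative only)."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Usage sketch with a stand-in workload; in practice fn would be a full
# training run, e.g. avg_runtime(lambda: train(model, loader))  # hypothetical
print(avg_runtime(lambda: sum(range(100_000))))
```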
In the training phase, our defense takes approximately six times longer than the undefended model. Compared to the previous defenses, our defense requires only about a seventh of the processing time of SELENA, which has the most favorable privacy-utility trade-offs among previous defenses. In the experiments, SELENA's main cost was training 25 sub-models on the 10 training data splits, whereas the main cost of our defense was training only 7 sub-models and aggregating them.
The difference in processing time between SELENA and our defense is due to the number of trained sub-models. Note that SELENA can be accelerated by training sub-models in parallel [38]; the same applies to our defense. KCD is approximately 60-80% faster than our defense in the training phase. In the experiments, KCD's main cost was training 10 sub-models and labeling the training data with them, whereas our defense labeled the training data with the 35 aggregated models, i.e., the C(7, 3) combinations of the 7 sub-models. The difference in processing time between KCD and our defense is thus due to the number of models used for labeling. Note, however, that our defense has much better privacy-utility trade-offs than KCD. In addition, MemGuard incurs costs in the inference phase because it processes the prediction vectors of the trained undefended model when producing inference results. We next discuss the costs in the inference phase.
In the inference phase, the processing time of our defense is the same as that of the undefended model. The previous defenses, with the exception of MemGuard, also have the same inference-phase processing time, because the protected models trained in the training phase are used for inference exactly as the undefended model is. MemGuard is approximately 1,000 times slower than the others because it solves an optimization problem to obfuscate the prediction vector for every input in the inference phase.
In conclusion, our defense achieves both favorable privacy-utility trade-offs and low computational costs. Its total computational costs are lower than those of all previous defenses except KCD, and KCD has worse privacy-utility trade-offs than our defense.

Comparison with Provable Privacy Defenses
We compare our defense with DP-SGD [1] as a provable privacy defense. According to the study [38], DP-SGD (ε = 4) achieves a model accuracy of 56.0% on Purchase100 and a best single-query attack accuracy of 52.8%, whereas our defense achieves a model accuracy of 79.6% and an attack accuracy of 52.0% (Table 4). DP-SGD provides a differential privacy guarantee; however, it yields a substantially lower model accuracy than our defense (56.0% vs. 79.6%) and than the undefended model (84.2%), while its attack accuracy differs only slightly from ours. Our defense provides no formal privacy guarantee, but in terms of empirical membership privacy it incurs at most an approximately 3-5% model accuracy drop relative to the undefended model while keeping the membership privacy risk low.
In conclusion, our defense is not directly comparable to provable privacy defenses such as DP-SGD in terms of formal privacy guarantees; however, it empirically achieves the best privacy-utility trade-offs.

Limitations
SEDMA is an empirical membership privacy defense and may therefore not be suitable for settings where provable privacy guarantees are required. Its appropriate use cases are ML systems that demand both strong MIA mitigation and high model accuracy. Small embedded devices may not have enough resources to train SEDMA's sub-models and to hold the split datasets. On the other hand, SEDMA trains fewer sub-models than SELENA and can therefore be suitable for some embedded devices, such as clients in federated learning systems that have resources for repeated training. SEDMA is also well suited to resource-rich cloud servers.

CONCLUSION
In this study, we proposed a new defense system against MIAs using self-distillation with model aggregation. Our defense mitigates over-fitting to the privacy-sensitive training data by aggregating multiple sub-models trained on split training data. In addition, it achieves low computational costs because it trains a small number of sub-models and aggregates combinations of them, rather than training many sub-models for self-distillation as in the state-of-the-art defense SELENA. We performed an experimental evaluation on major benchmark datasets (Purchase100, Texas100, and CIFAR100) using black-box MIAs, including single-query and label-only attacks, and demonstrated that our defense achieves the lowest risk of membership privacy leakage among previous empirical defenses, namely AdvReg, MemGuard, KCD, and SELENA.
In addition, our defense achieves favorable privacy-utility trade-offs at low computational cost. Specifically, it incurs a model accuracy drop of at most approximately 3-5% compared to undefended models, which is approximately 5-8% higher model accuracy than KCD and on par with SELENA, while requiring only about a seventh of SELENA's processing time on the datasets. Future work includes analyzing our defense against practical white-box MIAs [13] and providing theoretical privacy guarantees for the defense.

Figure 1 :
Figure 1: Overview of MIAShield. A privacy-sensitive training dataset is split and augmented. MIAShield trains multiple sub-models on each augmented data split. In the prediction process, the output is an ensemble of the prediction results of the multiple sub-models.

Figure 2 :
Figure 2: Overview of KCD. KCD makes combinations of the split training data and trains multiple sub-models on each combination. The sub-models produce prediction results for the one remaining split excluded from their training, forming a dataset of labeled data (soft labels). In the prediction process, KCD outputs a prediction result for an input using a protected model trained on the dataset with soft labels.

Figure 3 :
Figure 3: Overview of SELENA. SELENA trains multiple sub-models on each split of the training data and obtains prediction results with the sub-models for each split excluded from their training. A protected model is trained on a dataset labeled with ensembles of these prediction results. In the prediction process, SELENA outputs a prediction result for an input using the protected model.

Our defense aggregates the sub-models into combinations, e.g., C(n, n-1) = n aggregated models when each aggregation combines n-1 of the n sub-models. It obtains prediction results with the aggregated models on each subset not used for training their constituent sub-models. These prediction results yield the soft-labeled subsets D'_1, D'_2, ..., D'_n. Finally, an ML model is trained on a training dataset D' consisting of the soft-labeled subsets.
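The aggregation step above is federated-averaging-style parameter averaging. As a framework-agnostic toy sketch (not the paper's implementation; models are represented as plain dicts of parameter lists rather than real network weights):

```python
from itertools import combinations

def average_params(param_dicts):
    """Element-wise mean of identically shaped parameter vectors,
    in the spirit of federated model averaging."""
    m = len(param_dicts)
    return {
        name: [sum(vals) / m for vals in zip(*(p[name] for p in param_dicts))]
        for name in param_dicts[0]
    }

# Toy example: n = 3 sub-models, aggregated in combinations of k = 2.
sub_models = [{"w": [0.0, 3.0]}, {"w": [2.0, 3.0]}, {"w": [4.0, 3.0]}]
aggregated = [average_params([sub_models[i] for i in idx])
              for idx in combinations(range(3), 2)]  # C(3, 2) = 3 aggregations
# Each aggregated model would then soft-label the split(s) its
# constituent sub-models never saw during training.
print(aggregated[1])  # average of sub-models 0 and 2 -> {'w': [2.0, 3.0]}
```

In a real deep learning framework the same idea would average per-tensor weights (e.g., over model state dictionaries) rather than Python lists.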

Figure 4 (Algorithm 1):
Figure 4: Overview of SEDMA. SEDMA trains multiple sub-models, one on each split of the training dataset, and aggregates the sub-models into each combination. The aggregated models label the training data splits not used for training the sub-models in each combination, and the labeled data are gathered into a dataset with soft labels. In the prediction process, SEDMA outputs a prediction result for an input using a protected model trained on the dataset with soft labels.

4.2.2 Comparison with Previous Defenses. We compare our defense with previous defenses on MIA accuracy and model accuracy. Figure 5 shows the relationship between the model accuracy and the single-query attack accuracy against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100.

Figure 5 :
Figure 5: Comparison of the model accuracy and single-query attack accuracy against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100. The vertical axis is the model accuracy on the test dataset from Table 4. The horizontal axis is the MIA accuracy of the best single-query attack from Table 4. Plots towards the upper left show better defenses for the privacy-utility trade-offs.

Figure 6 :
Figure 6: Comparison of the model accuracy and label-only attack accuracy against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100. The vertical axis is the model accuracy on the test dataset from Table 4. The horizontal axis is the MIA accuracy of the best label-only attack from Table 4. Plots towards the upper left show better defenses for the privacy-utility trade-offs.

Figure 7 :
Figure 7: Comparison of the precision and recall of single-query attacks against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100. The vertical axis is the recall of the best single-query attack from Table 4. The horizontal axis is the precision of the best single-query attack from Table 4. Points towards the bottom left are better defenses for membership privacy.

Figure 8 :
Figure 8: Comparison of the precision and recall of label-only attacks against undefended models, AdvReg, MemGuard, KCD, SELENA, and SEDMA on (a) Purchase100, (b) Texas100, and (c) CIFAR100. The vertical axis is the recall of the best label-only attack from Table 4. The horizontal axis is the precision of the best label-only attack from Table 4. Points towards the bottom left are better defenses for membership privacy.

Table 5 :
Nine different SEDMA configurations with varying content ratios, i.e., how much of the training data is used for training each sub-model. The content ratio depends on the combination for model aggregation C(n, k).

Figure 9 :
Figure 9: Relationship between the model accuracy and the content ratio for the nine different SEDMA configurations. The vertical axis is the model accuracy on the Purchase100 test dataset. The horizontal axis is the content ratio of the original training data for one sub-model from Table 5.

Figure 10 :
Figure 10: Relationship between the attack accuracy and the content ratio for the nine different SEDMA configurations. The vertical axis is the MIA accuracy of the best single-query attack on Purchase100. The horizontal axis is the content ratio of the original training data for one sub-model from Table 5.

Table 2 :
Two categories of black-box MIAs: single-query and label-only attacks. Single-query attacks typically query a target model directly, once, with only the target sample. Label-only attacks indirectly query a target model with multiple samples in the neighborhood of the target sample.

Table 4 :
Comparison of membership privacy and model accuracy. SEDMA is compared with undefended models and previous defenses on three datasets. We evaluate membership privacy based on the most successful single-query and label-only attacks. The attack results show the median and standard error over ten runs. The lowest attack accuracy, precision, and recall on each dataset are in bold.

Table 6 :
Comparison of processing time for an undefended model, AdvReg, MemGuard, KCD, SELENA, and SEDMA on Purchase100. The processing time is the average of three runs on a GeForce RTX 2080 SUPER.

Table 7 :
Detailed results of the single-query and label-only attacks from Table 4. The best attack accuracy on each dataset is in bold.

Table 8 :
For the best single-query attack from Table 4: TP is the number of true positives (members correctly identified as members), FN the number of false negatives (members incorrectly identified as non-members), TN the number of true negatives (non-members correctly identified as non-members), and FP the number of false positives (non-members incorrectly identified as members).

Table 9 :
For the best label-only attack from Table 4: TP is the number of true positives (members correctly identified as members), FN the number of false negatives (members incorrectly identified as non-members), TN the number of true negatives (non-members correctly identified as non-members), and FP the number of false positives (non-members incorrectly identified as members).