Data Isotopes for Data Provenance in DNNs

Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over or knowledge of when their data is appropriated for model training. To empower users to counteract unwanted data use, we design, implement, and evaluate a practical system that enables users to detect if their data was used to train a DNN model. We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a trained model, and no knowledge of the model training process or control of the data labels, a user can apply statistical hypothesis testing to detect if a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and larger tasks such as ImageNet, can use physical objects instead of digital marks, and remains generally robust against several adaptive countermeasures.


Introduction
As machine learning (ML) systems grow in scale, so do the datasets they are trained on. State-of-the-art deep neural networks (DNNs) for image classification are trained on hundreds of millions or billions of inputs [6,65,81]. Often, training datasets include users' public and private images, collected with or without users' consent. Examples include image analysis models trained on photos from Flickr [65] and companies like Clearview.ai training facial recognition models on photos scraped from social media [26].
Today, users have no agency in this process, beyond blindly agreeing to the legal terms of service for social networks, photo-sharing websites, and other online services. Even when users give permission for use of their images, they have little control over how those images may later be shared or disseminated [37]. Beyond searches through specific public datasets like LAION-5B [33], non-expert users have no systematic way to check whether their data was used to train a model [65].
In this paper, we design, implement, and evaluate a practical method that enables users to detect if their images were used to train an image classification DNN model, with only query access to the model and no knowledge of its labels or parameters. Our main idea is to have users introduce special inputs we call isotopes into their own data. Like their chemical counterparts, isotopes are similar to normal user data, with a few key differences. Our isotopes are crafted to contain "spurious features" that the model will (mistakenly) consider predictive for a particular class during training. Isotopes are thus amenable to a new type of inference: a user who knows the isotope features can tell, by interacting with a trained model, whether images marked with these features were part of its training dataset or not. Similar inference attacks, such as membership inference [63], are typically interpreted as attacks on the privacy of training data. Helped by the propensity of DNN models to learn spurious correlations, we turn them into an effective tool for tracing data provenance.
Our contributions. We present a practical data isotope scheme that can be used to trace image use in real-world scenarios (e.g., tracing if photos uploaded to a social website are used for DNN training). The key challenge is that users neither know nor control the supervised classification tasks for which their images may be used as training fodder. While users are free to modify the content of their images, they do not select the corresponding classification labels, nor know the other labels, nor have any visibility into the models being trained. This precludes the use of "radioactive data" [56], "backdoor" techniques [29], and other proposed methods for dataset watermarking (see discussion in §2.3).
Our method creates isotopes by blending out-of-distribution features we call marks into images. When trained on these isotopes, a model learns to associate one of its labels with the spurious features represented by the mark. By querying the model's API, a user can verify that the presence of the mark in a test image causes a statistically significant increase in the probability of a low-likelihood output label. Formally, our verification procedure uses statistical hypothesis testing to determine if the model assigns a consistently higher probability to a certain class when the mark is present, independently of other image features. Success implies that the user's marked isotopes must have been present in the model's training dataset. Our method is designed so it can be used by non-ML experts. It does not require users to train shadow or surrogate models, nor compute or analyze gradients of publicly available models. Our key contributions are as follows:

• We propose a novel method for data provenance in DNN models using "isotope" data to create spurious correlations in trained models (§3, §4), and a technique for users to detect if a model was trained on their isotope data.

• We demonstrate the efficacy of our isotope scheme on several benchmark tasks, including the facial recognition tasks PubFig and FaceScrub, and show that it remains effective even when multiple users independently add isotopes to their respective data (§5). Despite the potential challenge of having a model learn many isotope-induced spurious features, we find that our verifier can detect and distinguish up to 215 FaceScrub isotopes with high accuracy and few false positives, with minimal impact on normal model accuracy.

• We show that physical objects can act as isotope marks with up to 95% accuracy (§6), demonstrating that our scheme works even if users cannot digitally modify their images (e.g., when images from surveillance cameras are used to train facial recognition models).

• We evaluate isotope performance in realistic settings (§7), including larger models such as ImageNet and ML-as-a-service platforms such as Google's Vertex AI. Isotopes have 97% detection accuracy in ImageNet and 89% in Vertex.
• Finally, we evaluate several adaptive countermeasures that an adversarial model trainer may deploy against isotopes (§8). All of them either fail to disrupt isotope detection, or incur very high costs in false positives or reduced model accuracy, or both.
Limitations. While effective, our approach has limitations. First, it adds visible modifications to a subset of the user's images, and only certain types of isotope marks are effective. Future work should aim to make marks subtler and investigate a broader set of features that can be used as marks. Second, this paper focuses on images; application of our techniques to other domains (e.g., text generation) is important future work. Finally, while we evaluate several natural countermeasures (§8), future adaptations by model trainers may circumvent isotopes. We believe that isotopes are a useful initial step towards providing users with transparency in scenarios where they cannot prevent unwanted data use, and this transparency is valuable even if unscrupulous model trainers may attempt to actively evade it in the future.
Broader context. Our isotope scheme is a tool for user-centric auditing of DNN models, and for ML governance in general. The goal of detecting uses of personal data is complementary to prior work [30,61] that sought to make personal data unusable. Tracing data provenance in commercial models can help enforce regulations such as GDPR [1] and the "right to be forgotten." If users can detect that a given model has been trained on their data, techniques such as machine unlearning [5,24] can then be deployed to remove it. Our source code can be found at https://github.com/uchicagosandlab/dataisotopes.

Requirements and Prior Work
We define the problem using a concrete motivating scenario, identify key requirements of the solution, and explain how existing techniques fall short.

Threat Model
We illustrate the threat model using a simple scenario involving unwanted facial recognition. Consider a user "Taylor," who enjoys posting selfies to social media, but is concerned about "advanced facial recognition services" that can recognize millions of individuals [26,52]. Taylor knows such services are powered by an ML model F, likely trained on public data from online sources, and wants to know if their online images are used to train a model like F. To train F, the face recognition service A collects a dataset D = {X, Y}, where X are images scraped online, e.g., from social media, and Y are image labels correctly assigned to images of the same person. We assume |Y| = n, and F is trained using a supervised learning procedure L. F classifies each image x to its corresponding label y ∈ Y. When queried with input x, F returns a normalized probability vector over all n labels.

Design Requirements
In a real-world setting, Taylor (i.e., user U) has very little control over the use of their data once it is posted online (Figure 1). Beyond query access to trained model F, they have limited information about dataset D or the internals of F. Their constraints are summarized in Table 1:

• U does not have access to D, and thus has no knowledge of other labels or data collected from other users.

• U cannot change the labels assigned to their own data during training. In the facial recognition setting, U expects that their images will be assigned the same label/identity by A, and has no way to alter A's choice of the labels.

• When U posts images, they have no foreknowledge of the model F that will be trained on their data. Therefore, any protection or provenance method they use cannot depend on either the parameters or the labels of F.

Existing Work on ML Data Provenance
In this section, we discuss existing data provenance techniques and consider their applicability to our problem.
In Table 1, a check mark indicates that a solution fulfills a given requirement; a dash indicates it does not.
Solutions that require no data modification. Membership inference attacks (MI) can reveal if specific data samples were present in a model's training dataset [63]. Using MI to audit training data has been considered in the image, speech, machine translation, and metric-embedding domains [27,39,47,67]. Unfortunately, MI remains unreliable for many (non-outlier) data samples, and generally requires significant data and compute to train multiple shadow models to approximate the behavior of F [63].
Solutions requiring dataset-level modifications.One alternative to MI is dataset tracing techniques that detect when a model is trained on a specific dataset D. Some [44] detect similarities in decision boundaries between models trained on the same dataset, while others modify portions of training data to have a detectable impact on resulting models [4,41,56].
These dataset-level solutions do not meet our requirements for several reasons. First, they detect unauthorized use of entire datasets, rather than of certain points within a dataset, e.g., a single user's images. Specifically, they assume knowledge of and control over the entire D [4,44], or at least a nontrivial fraction (e.g., 10% for realistic settings considered in [56]). This is well beyond the resources of a single user who controls only their own data. Second, some solutions [56] also require access to a feature extractor that closely mimics the feature space of F. Finally, techniques that use model-wide parameter shifts or representational similarities [44,56] require full access to either D or a proxy model trained by the user. Neither is feasible for normal Internet users.
Solutions requiring user-level data modifications. We now consider potential solutions that only require U to change their individual data points.
1) Techniques not intended for data provenance. Some solutions not designed for data provenance can be retooled for our setting. [9,70] modify elements of D to increase the efficacy of membership inference on specific data points or properties. However, these methods assume that U controls many elements of D (and their labels), and thus cannot be used by normal users who control only their own data (and no labels). Techniques for "clean label" data poisoning and backdoors [22,31,60,72,79] could be effective, but they also require full access to F, D, or a proxy model with the same feature space as F in order to compute the poisoned data inputs used in the attack.
2) Existing user-centric data provenance solutions. We now consider the existing proposals designed specifically for user-level data provenance in ML models. The first method "watermarks" user images by inserting backdoors: adding triggers to images and changing their label to a target label [29]. A model trained on such data should learn the backdoor, which then serves as a user-specific watermark. However, this technique requires that U both know other labels in D and control the labels assigned to their data. Neither is realistic in our setting. Finally, a recent tech report [82] suggests applying color transformations to data to trace its subsequent use in models. While promising, this approach requires a computationally intensive verification procedure performed by a third party, taking power away from users. Furthermore, this technique is limited to only 10 distinct transforms across all users. Despite these drawbacks, the use of color transformations as spurious features is interesting, but future work is needed to determine if the approach can scale.

Data Isotopes for Data Provenance
Clearly, there is a need for a user-centric data provenance technique that operates within the constraints defined in §2.1. Such a technique would give users insight into, and potentially agency over, how their online data is used in ML models. Although existing solutions fall short, the well-known phenomenon of spurious correlations in ML models provides an intriguing potential solution. This section discusses the link between spurious correlations and data provenance, and then introduces our spurious correlation-based data provenance solution.

Provenance via Spurious Correlations
U must make their data memorable to F while only modifying their own data. To this end, we leverage the well-known propensity of ML models to learn spurious correlations during training: features that are statistically, but not causally, associated with a class can become predictive for that class in F. For example, "snow" can become a predictive feature for "wolf" if training images feature wolves in the snow [75,80].

Spurious correlations and memorization. A model can learn spurious features that appear in only a few examples [3,19,20,43,76]. Intuitively, a model cannot "tell" during training whether a rare training example is important for generalization or not; therefore, it is advantageous for a model to memorize rare features that appear to be characteristic of a particular class.
Data provenance via spurious correlations. Spurious correlations could enable user-centric data provenance. Intuitively, if U's data creates a spurious correlation in F, U can detect if F was trained on their data by observing the correlation's effect on F's outputs. Furthermore, since spurious correlations are artifacts of the training data, U can simply add the spurious feature to their data, rather than using optimization methods or changing data labels.
Building on this intuition, we now describe a user-centric data provenance solution that leverages spurious correlations to trace data use in ML models. Our solution adds spurious features to U's data to create data isotopes. Like their chemical counterparts, data isotopes visually resemble U's original data but contain special features that induce spurious correlations in models trained on them. If U posts isotope data online and later encounters a model F potentially trained on their data, U can use their knowledge of the isotope feature to determine if this is indeed the case. The term "data isotope" appeared in prior literature on dataset tracing [56], but isotopes in that sense are unusable in practical settings because they require the data owner to inspect the parameters of deployed models. This is not possible with commercial models (see §2.3).

Introducing Data Isotopes
Our isotope-based data provenance mechanism assumes the following setup. Let U_1, U_2, . . . , U_m be users, each with a personal image dataset D_1, D_2, . . . , D_m that they post online. Let A be a model trainer who scrapes D_1, D_2, . . . , D_m and combines them into an n-class supervised-training dataset D. A preprocesses D (deduplicates, normalizes, etc.) and assigns one of n labels y ∈ Y to each element x ∈ D. Finally, A uses D to train a classification model F. When queried, F returns a normalized probability vector over the n labels. This notation is summarized in Table 2.
Creating isotopes. User U_i creates isotope images by adding a spurious feature t to some images x ∈ D_i, creating an isotope subset T_i. These features, or marks, are crafted to be very different from typical data features, and thus leverage spurious correlations [75] and the well-known propensity of models to memorize outliers in the training dataset [7,66,75]. We assume that:

• U_i does not know a priori the labels in D or F, and cannot leverage them to construct T_i.

• Most T_i elements have the same label in D. In most scenarios we consider (e.g., face recognition), this is a given, since each identity has a unique label. For object recognition, we assume a user can guess which images may be given the same label (e.g., cat photos, dog photos) and creates isotopes accordingly.

• U_i is willing to add visual distortions to images to enable tracing. User studies show that privacy-conscious users will allow some image modifications if this enhances privacy [8]. Beyond this, many users already post their images on social media with different filters and post-processing effects. For many such images, adding isotopes will not significantly degrade their quality.

• After F is trained, U_i can gain black-box query access to F, which returns a probability vector across all labels (we relax this assumption in §8.4).
Since U_i knows the domain of their data (e.g., face images), they can collect a small set of similar data (e.g., celebrity images) to create an auxiliary dataset D_aux.
Isotope effect: subtle shift in label probability. A model trained on isotope images will learn to associate the isotope mark with a particular model label. At inference time, if this model encounters marked images, it will assign a slightly higher probability to this label for those images. Figure 2 illustrates this intuition. Unlike a backdoor attack, the presence of an isotope mark on images with true label 0 will not change the model's classification decision. However, it increases the predicted probability of the marked label (7). Although this shift may be hard to detect for a single image, analyzing the label probability shift for a large set of images can provide statistical evidence that a model was indeed trained on isotope images.
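As a toy illustration of this effect (the probability vectors below are hypothetical numbers for a 10-class model, not measured outputs from our experiments):

```python
import numpy as np

# Hypothetical softmax outputs for an image with true label 0, with and
# without an isotope mark whose associated label is 7.
unmarked = np.array([0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01])
marked   = np.array([0.86, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.02, 0.01])

marked_label = 7
# The top-1 decision is unchanged: label 0 still wins in both cases...
assert unmarked.argmax() == marked.argmax() == 0
# ...but the marked label's probability rises; this small, consistent
# shift is what the detection procedure aggregates over many images.
shift = marked[marked_label] - unmarked[marked_label]
print(f"shift for label {marked_label}: {shift:+.2f}")
```

A single +0.04 shift like this is indistinguishable from noise; the statistical power comes from observing it consistently across many marked images.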
Detection via probability shift analysis. To detect if isotopes "marked" with the spurious feature t were present in the dataset on which F was trained, the user performs a differential analysis of F's behavior on inputs with and without t. Intuitively, we expect that if F was trained on isotopes labelled y_t, F will assign a higher probability to y_t for inputs (not from class y_t) with t than for those without. After measuring the probability shift for y_t on multiple image pairs, our detection algorithm uses hypothesis testing to determine if the presence of mark t on an input induces a statistically significant shift in the probability of label y_t.
Distinction from membership inference and backdoors. At a high level, isotopes use changes in the model's output to infer properties of the training data, similar to membership inference [63,67]. Isotopes are not membership inference, however: they do not infer the membership of a specific training input, but rather the presence of any data with a particular feature in the training dataset. This is also different from backdoor attacks [74], which cause models to misclassify inputs containing a trigger feature. Isotope behavior is much more subtle than backdoors, since isotopes change the probabilities assigned to a particular low-ranked label rather than the top label. This makes them more difficult for model trainers to counteract. Critically for practical use, isotopes do not require model training or access to feature extractors, in contrast to using backdoors for data provenance [29].

Data Isotopes Methodology
Data isotopes are designed for the scenario in Figure 3, and involve four stages: (1) isotope creation, (2) data collection, (3) model training, and (4) isotope detection. We give a brief overview of each stage, then discuss details in §4.2-4.3.

Overview
Data isotopes are created by inserting a spurious feature into a subset of a model's training data for a certain label. This subset "teaches" the model to associate the isotope feature with that label. Therefore, an effective isotope, created by marking images with feature t, should have a statistically significant effect on label y_t of model F if and only if F's training dataset D contains data with mark t and label y_t.
(1) Isotope creation. User U_i creates and shares an image set D_i, to which they add an isotope subset T_i containing modified elements of D_i. T_i may contain isotopes with the same or different marks, the latter if U_i wants to create different isotopes for different subsets of their data. (2, 3) Data collection and model training. A uses D to train F, which can be queried via a public API. We initially assume that A does not attempt to remove isotopes from D; we evaluate isotope detection and removal methods in §8. Given query input x, F returns F(x) ∈ [0, 1]^n, a probability distribution over n labels, where F(x)[y] is the probability of label y.
(4) Isotope detection. If U_i suspects that F was trained on their data, they use a verifier V, which takes in the model F, the true mark t, another, "external" mark t′, a label y_t, and a threshold θ. V queries F with data from an auxiliary dataset D_aux ∼ D to detect if F was trained on U_i's isotopes. If D contains isotope data with mark t for label y_t, then V should return 1, else 0. Algorithm 1 presents V formally.
Marks are blended into images: an isotope image x_t combines the original image x with mark t at blend ratio α, using a mask m indicating which mark pixels should be blended into x.
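A minimal sketch of this blending step (our own reconstruction for illustration; the function name and the exact blending formula are assumptions, not the paper's implementation):

```python
import numpy as np

def blend_mark(image, mark, mask, alpha):
    """Blend mark pixels into an image.

    image, mark: float arrays in [0, 1] of shape (H, W, C).
    mask: binary array (broadcastable to the image) that is 1 where the
          mark should appear; alpha controls mark visibility.
    """
    weight = mask * alpha                      # per-pixel blend weight
    return (1.0 - weight) * image + weight * mark

# Example: blend a random "mark" into the top-left quarter of an image.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
mark = rng.random((32, 32, 3))
mask = np.zeros((32, 32, 1))
mask[:16, :16] = 1.0
isotope = blend_mark(image, mark, mask, alpha=0.4)

# Pixels outside the mask are untouched.
assert np.allclose(isotope[16:, :], image[16:, :])
```

At α = 0.4 (a setting used in our experiments), masked pixels are a 60/40 mix of the original image and the mark, so the image remains recognizable while the mark is learnable.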
Data release. U_i releases their data (e.g., posts it online, where A may collect it for inclusion in D) as D_i ∪ T_i, consisting of both normal images x and isotope images x_t.

Isotope Detection
Data collection and model training are directed by A, and we make no assumptions about them beyond those in §3. After F is made public, U_i uses the verification procedure V to detect if F strongly associates U_i's mark t with some label y_t, independent of other image features. In particular, if F associates t with label y_t, we expect F's query responses to indicate a higher probability of label y_t for images marked with t than for images marked with t′, a mark not used in U_i's isotopes. V compares F's performance on images marked with t and t′, rather than on images marked with t and unmarked images, to reduce false positives, because some external marks could induce probability shifts for label y_t relative to unmarked images.
The verifier V, which we describe informally here and formally in Algorithm 1, runs paired t-tests on F's predicted probability of label y_t for images marked with t and t′. If the test p-value is less than threshold θ, V concludes that isotopes with mark t were present with label y_t in D.
Preparing for V. Before running V, U_i queries F with test images to determine if it has a label relevant to their data D_i that may be associated with mark t. If a candidate label y_t is found, U_i collects a small auxiliary dataset D_aux of images similar to those in D, with labels y ≠ y_t. Finally, U_i chooses r, the number of rounds in V, and k, the proportion of rounds that must produce a significant t-test for V to output 1. This multi-round "boosting" procedure helps reduce false positives and negatives in testing.
Running V. Using parameters (t, t′, y_t, θ), U_i runs V. V takes N images from D_aux, duplicates them, and marks each pair with t and t′, respectively. Then, V submits the (x_t, x_t′) image pairs to F and computes t_prob = F(x_t)[:, y_t] and t′_prob = F(x_t′)[:, y_t]. Finally, V runs a paired one-sided Student's t-test for differences in the distribution means between the two sets. The null hypothesis is that the mean of label y_t's probability distribution is the same for both marks; the alternative is that the mean is larger for images with mark t. If the test p-value is < θ for at least k · r rounds, V concludes that D contained images with mark t for label y_t and returns 1, else 0. Choices for N, r, and k are discussed in §5.1.
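The multi-round test can be sketched as follows (a minimal reconstruction; the names and the simulated probabilities are ours, and a real run would obtain the paired label probabilities by querying F):

```python
import numpy as np
from scipy import stats

def verify(sample_pair, theta=0.1, rounds=5, k=0.6):
    """Sketch of the multi-round verifier V.

    sample_pair() returns (t_prob, t_ext_prob): paired per-image
    probabilities of label y_t for N auxiliary images marked with the
    true mark t and the external mark t', respectively.
    """
    significant = 0
    for _ in range(rounds):
        t_prob, t_ext_prob = sample_pair()
        # Paired one-sided t-test; H1: mean probability is higher under t.
        _, p_value = stats.ttest_rel(t_prob, t_ext_prob, alternative="greater")
        if p_value < theta:
            significant += 1
    # Output 1 only if at least k * rounds of the tests were significant.
    return int(significant >= k * rounds)

# Simulated demo: pretend F shifts y_t's probability up by ~0.03 when the
# true mark is present (N = 250 paired images per round).
rng = np.random.default_rng(1)

def sample_pair(shift=0.03):
    base = np.clip(rng.normal(0.05, 0.02, 250), 0.0, 1.0)
    return base + shift + rng.normal(0, 0.005, 250), base + rng.normal(0, 0.005, 250)

print(verify(sample_pair))  # a shift this large is reliably detected
```

Requiring k · r significant rounds, rather than a single test, is what damps both false positives (a lucky external mark) and false negatives (one noisy batch).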
Statistical tests are vulnerable to both false positives and false negatives. In our context, a false positive occurs when the test returns a statistically significant result for isotopes with mark t′, even though t′ isotopes were not present for label y_t in D. A false negative occurs when the test returns a negative result for isotopes with mark t that were present in D. We measure both errors (§5).

Advanced Isotope Scenarios
The basic isotope scenario assumes one mark t associated with a single label y_t in F, but other settings are possible.
Multiple isotope marks in different classes. When multiple marks are present in different classes, each mark t_i with label y_i must be both detectable by V and distinguishable from other marks t_j for classes y_j, j ≠ i. To ensure both, in this setting we run a modified version of V, which we call V_d. V_d takes in two marks t_i and t_j, both present in D, and checks that only t_i induces a statistically significant probability shift for class y_i, and only t_j for class y_j, respectively. Although U_i knows only their own mark, a third party who knows all marks could run V_d. When we evaluate this scenario in §5.3, we assume such a third party exists.
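The cross-check performed by V_d can be sketched as follows (our own reconstruction; the helper names are hypothetical, and `verify_for_label` stands in for a run of V with the given label, true mark, and external mark):

```python
def distinguish(verify_for_label, t_i, t_j, y_i, y_j):
    """Sketch of the distinguisher V_d: two marks are detectable and
    distinguishable if each induces a significant shift only for its
    own class, using the other mark as the external comparison mark.

    verify_for_label(label, true_mark, ext_mark) -> 0 or 1, as in V.
    """
    return int(
        verify_for_label(y_i, t_i, t_j) == 1   # t_i shifts y_i; t_j does not
        and verify_for_label(y_j, t_j, t_i) == 1  # t_j shifts y_j; t_i does not
    )

# Stub verifier for the demo: mark t shifts label y iff (y, t) was trained.
trained = {(3, "mark_A"), (7, "mark_B")}
stub = lambda y, t, t_ext: int((y, t) in trained and (y, t_ext) not in trained)

print(distinguish(stub, "mark_A", "mark_B", 3, 7))
```

Using each mark as the other's external mark is what makes the test symmetric: a positive result requires both marks to be class-specific, not merely present.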
Multiple isotope marks in the same class. When multiple marks are associated with a single label y_t, it is possible to detect them via V but not to distinguish them. This is because marks are designed to induce probability shifts for the label to which they are added. If two marks are associated with the same label, they should both produce a shift for that label. We evaluate this setting in §5.3.
Ranks instead of probabilities. In §8, we explore the setting where F returns only the top-k ranked classes, rather than a probability distribution over all classes.

Evaluating Data Isotopes
Our baseline evaluation focuses on fundamental questions about isotope efficacy. First, does the isotope intuition described in §3.2 (in which a single class in D contains isotope data and causes the probability of a single label to increase) hold up across different task and model settings (§5.2)? If so, do isotopes remain effective when D contains multiple isotope sets (§5.3)? For both settings, we measure the distortion necessary to create effective isotopes and evaluate robustness to false positives. We then explore how isotopes scale (§5.4) and consider isotope uniqueness and their effect on model accuracy. Finally, we compare isotopes to the "radioactive data" approach of Sablayrolles et al. [56], since this is the closest analogue to our work. Overall, we find that our method performs similarly in the white-box setting and outperforms it in the more realistic black-box setting.

Methodology
Tasks. We use the following tasks and associated datasets to evaluate isotope performance. Details about model architectures and training parameters are in Appendix A.1.
• GTSRB is a traffic-sign recognition task with 50,000 images of 43 different signs [28]. This task is commonly used as a benchmark for computer vision settings.

• CIFAR100 is an object recognition task with 60,000 images and 100 classes [34]. This task allows us to explore isotopes in an object-recognition setting.

• PubFig is a facial recognition task whose associated dataset contains over 50,000 images of 200 people [36]. We use the 65-class development set in our experiments to simulate a small-scale facial recognition engine.

• FaceScrub is a large-scale facial recognition task with a 100,000+ image dataset of 530 people [50]. This task emulates a mid-size real-world facial recognition engine, enabling us to explore isotopes in a realistic setting.
Marks. Since we test isotopes for image classification models, we use pixel patterns and images as the isotope mark t (see Figure 5). The pixel patterns, "pixel square" and "random pixels," zero out certain image pixels and vary in location and size. In contrast, the "Hello Kitty" and "ImageNet blend" marks are images blended into users' images. For the ImageNet blend mark, we randomly select images from ImageNet [12]. When we run V, we choose an external mark t′ similar to the true mark t (if t is an ImageNet mark, t′ is a different ImageNet mark) to measure the most realistic false-positive scenario. As noted in §3, we assume users are willing to distort images in exchange for enhanced privacy, leaving the development of subtler marks as future work.
Verifier parameters. For V and V_d, we run t-tests on N = 250 test images. D_aux is drawn from the test dataset of each task. We fix the proportion of positive tests required for V to return 1 at k = 0.6, to ensure that the majority of V's t-tests are below θ, and use r = 5 rounds (see Appendix A.2 for details on r). We vary θ to compute the true positive rate at different false positive rates, and use the same α for mark insertion and tests.
Metrics. We report V's true positive rate (TPR), V_tpr: the proportion of times V returns 1 when comparing a true mark t to an external mark t′ for a given (t, α, p) setting. We also report V's false positive rate (FPR), V_fpr, computed by inverting the order of marks presented to V and measuring the proportion of times V returns 1 for a mark t′ ∉ D (i.e., t′ induces a larger shift than t). We typically report the TPR/FPR at θ = 0.1, a common threshold for statistical significance.
When experiments involve isotopes present in multiple D classes, we also report the distinguisher true positive rate, V_d-tpr: the proportion of times V_d successfully distinguishes between two marks present in F for a given (t, α, p) setting.
Experiment overview. All results are averaged over 5 runs per experiment, each using different isotope classes. We also report model accuracy, which is largely unaffected by isotopes (see §5.4). To show that isotopes are robust to typical preprocessing techniques, in all experiments we use data augmentations during training, including random flipping/cropping/rotation and color normalization.

Single isotope subset in D
We first explore the setting in which a single class contains isotope marks, and evaluate performance across a variety of models and datasets. We measure how marks perform as α and p vary.
Performance across marks. Using the parameters and training settings described in §5.1, we train CIFAR100 models with isotopes created using the four marks shown in Figure 5. To explore how mark settings impact performance, we vary α from 0.1 to 0.6 (see Figure 6) and p from 0.01 (i.e., 1% of data marked) to 0.5. Figure 7 reports the average V_tpr for each setting. Overall, we find that only ImageNet blend marks are consistently detectable. This indicates that marks with more unique and diverse features are a better choice for isotopes. Once such a mark is visible and frequent enough in a user's data, it can be detected.
The pixel square, random pixels, and Hello Kitty marks can induce probability shifts for classes to which they are added. However, these marks do not produce probability shifts strong enough to pass the false-positive test that V runs, i.e., comparing the true mark to an external mark. This test is necessary to make isotopes practically useful. Therefore, we use the ImageNet blend mark in the rest of our experiments.
Performance across datasets. To explore how mark settings impact performance across datasets, we vary α from 0.1 to 0.6 (see Figure 6) and p from 0.01 (i.e., 1% of the marked class's data) to 0.5, using the ImageNet blend mark. Figure 8 reports the average V_TPR for each setting at δ = 0.1. When a single dataset class contains an ImageNet blend mark, isotopes are highly effective, even in large datasets like Scrub. Larger datasets require a slightly higher α/p combination (e.g., α ≥ 0.4 and p ≥ 0.15 for Scrub) before marks are detectable. Overall, in the single-mark setting, marks can be detected when only a few user images are faintly marked.
Robustness to false positives. We evaluate V_FPR for all datasets with fixed α = 0.4 and p = 0.25. In all cases, V_FPR = 0 and V_TPR = 1.0 when δ = 0.1, except for GTSRB, which has V_FPR = 0.4, likely because its model architecture is simple and potentially less amenable to memorization [57].

Multiple isotope subsets in D
Next, we evaluate isotopes when D contains multiple isotope subsets, each with a different mark. This corresponds to the setting where multiple users mark their data, all of which ends up in D. Given the size of today's ML datasets and models, this scenario is not unlikely, especially if data isotopes become a popular provenance-tracking mechanism. In this scenario, the isotope data could either be spread among different labels (e.g., in a facial recognition scenario, with one user's data per class) or grouped into the same class. We evaluate isotope performance in both settings, using ImageNet blend marks with α = 0.4 (see Figure 6 for examples).
Isotopes in different classes-baseline. We first evaluate performance when multiple classes in D contain distinct isotope subsets. This scenario closely corresponds to the facial recognition setting, so we evaluate using PubFig with ImageNet blend marks, α = 0.4 and p = 0.1. We run V and V_dist with δ = 0.1 to assess mark performance, and use 5 external marks per true mark to compute V_TPR and V_FPR. As Table 3 demonstrates, marks remain detectable and distinguishable for PubFig when up to 50% of classes contain isotopes. For all settings, V_FPR = 0 and V_dist ≥ 0.98 when δ = 0.1, and model accuracy is unchanged from baseline performance (86%).
Having established that isotopes perform well when multiple isotope subsets are in PubFig, we measure how α and p affect overall performance. We run experiments on PubFig models with 20 classes marked and vary α/p. Figure 9 shows that the trend for V_TPR and V_dist remains similar to the single-mark case: when α ≥ 0.4 and p ≥ 0.1, V_TPR = 1.0, V_dist ≥ 0.8, and V_FPR = 0 at δ = 0.1.

Table 4. TPR/FPR for multiple marks per class at δ = 0.1 and α = 0.6. In all cases, V_TPR > 0.8 and V_FPR < 0.12, even with up to 6 marks per class.

Marks per class    2     3     4     5     6
V_TPR              1.0   0.8   0.8   1.0   1.0
V_FPR              0.0   0.12  0.0   0.0   0.0
Isotopes in different classes-across datasets. The result observed on PubFig extends to other datasets. We vary the percentage of classes marked from 5% to 50%, fix α = 0.4 for all datasets, and test whether ImageNet blend marks remain detectable and distinguishable in models for different tasks. We report V_TPR and V_dist in Table 3, using δ = 0.1 as before. Since V_dist runs in O(k²) for k marks, we reduce computation time when the number of marked classes exceeds 25 by randomly selecting 25 marks on which to run V_dist, yielding at most 25² comparisons instead of k². As Table 3 shows, both V_TPR and V_dist are high across the board. For all results shown, V_FPR < 0.05 at δ = 0.1. F's accuracy remains stable in all settings (< 1% change from baseline). Consequently, we conclude that isotopes remain effective when multiple dataset classes are marked.
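The subsampling step that caps the quadratic cost of pairwise comparisons can be sketched as follows; the helper name and cap are illustrative:

```python
import random
from itertools import combinations

def distinguisher_pairs(marks, cap=25, seed=0):
    """Cap the quadratic cost of pairwise mark comparisons by randomly
    subsampling at most `cap` marks (illustrative helper)."""
    if len(marks) > cap:
        marks = random.Random(seed).sample(marks, cap)
    return list(combinations(marks, 2))

pairs = distinguisher_pairs(list(range(40)), cap=25)   # 25*24/2 = 300 pairs
```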
Multiple isotopes in a single class. We investigate the case where multiple users insert marks into a single class. Each mark should be learned as associated with this class, and the presence of multiple marks should not prevent learning of individual marks. Although we cannot distinguish marks in this setting (since marks induce a class-level probability shift, see §4), we still measure mark detection.
We test this by training CIFAR100 models with up to 6 marks per class, α = 0.4, and p = 0.05; see Table 4. In this setting, p = 0.05 means that each mark controls 5% of the marked class. Even with up to 6 marks per class, marks remain detectable: V_TPR > 0.8 and V_FPR < 0.12 in all cases.

Effect of mark similarity on detection and distinguishability
Our method consistently outperforms Radioactive Data (RD) in the black-box setting while requiring far less marked data. Recall that, for RD, the 5% column of Table 5 means that 5% of the dataset is radioactive. For our method, the 5% column means that 5% of classes contain isotopes, each with p = 0.1 of the class marked, so only 0.5% of the total dataset is marked. While the result for our method with 1% of classes marked is 1.0 in Table 5, meaning V succeeded, V does not always succeed in this setting (see Figure 8). This emphasizes the importance of boosting, both in our method and as a potential improvement to RD (which runs only a single statistical test), to minimize false positives and negatives. Table 10 in the Appendix reports the raw p-values.

Physical Objects as Marks
While our proposed marks are effective in many settings, they require that users edit images after they are taken but before they are shared publicly. Depending on how A sources their data, this assumption may not be realistic. If, for example, A uses data from public surveillance footage to train a face recognition model, users do not control the images and cannot mark them with our method.
To help such users, we propose physical marks: unique physical objects present in images at creation (i.e., not added via image transformation). Including these objects in images enables users to create isotopes even when they cannot control digital image creation. In the facial recognition scenario above, simply wearing a physical object, such as a certain pair of sunglasses or a scarf, would ensure that any images captured while the user wears that object act as isotopes.

Methodology
Physical marks. We use images from the WengerFaces dataset [74] to create and test physical marks. The dataset contains unobstructed, well-lit headshots of 10 people. In some images, subjects wear physical objects on or around their faces. We use these objects-sunglasses, a scarf, tattoos, dots, and white tape (see Figure 12)-as marks.
Training dataset. To construct isotopes, we add clean (i.e., unmarked) images from WengerFaces to the Scrub dataset, forming a new 540-class dataset. We designate a WengerFaces class as the marked class and add physical-mark images to make up 25% of that class. The number of clean images per WengerFaces subject ranges from 20 to 45, so we use between 5 and 11 marked images per class. The α parameter is not meaningful here, since physical marks are not digitally blended. We train a model on this dataset using the settings for Scrub (see §4).
Mark detection. We run V using the other physical objects as external marks. Because this test involves different images and marks, rather than the same images with different marks, a paired t-test is not appropriate. Instead, V uses an unpaired, one-sided t-test for a shift in the probability of the marked class between the mark object and other objects.
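Concretely, the unpaired variant can be sketched as below, using Welch's t-test; the synthetic probabilities are illustrative, and the paper's exact test may pool variances:

```python
import numpy as np
from scipy import stats

# P(marked class) for images where the subject wears the mark object,
# vs. images with a different (external) object -- different image sets,
# so the samples are unpaired.
rng = np.random.default_rng(0)
probs_marked = rng.normal(0.25, 0.08, 40)
probs_external = rng.normal(0.10, 0.08, 40)

_, pval = stats.ttest_ind(probs_marked, probs_external,
                          equal_var=False, alternative="greater")
detected = bool(pval < 0.1)   # significant upward shift => mark detected
```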
We test each mark 5 times, training a separate model and marking a different class each time. For each mark, we evaluate V using the other four objects as external marks. As Table 6 reports, larger, more distinct on-face objects like sunglasses, dots, and white tape have the highest success rate, although a higher p is needed to detect them. Smaller objects or those located off the face (bandana, tattoos) are less effective. Normal model accuracy remains high (∼99%).
These results demonstrate that unique, on-face physical marks can create effective data isotopes in a facial recognition setting, even when users do not control image capture. They can help detect use of images that users appear in but did not create or share.

Isotopes in Real-World Settings
Real-world ML models use diverse training pipelines, preprocessing methods, etc. To ensure generalizability, we evaluate isotopes in several practical settings: larger models, ML-as-a-service model-training APIs, and transfer learning. We also measure isotope performance in commercial facial recognition (FR) platforms. Commercial FR models use different settings (feature matching instead of training from scratch), so these results are in Appendix A.3.

Larger Models
The largest model in our baseline evaluation is Scrub, with 530 classes. To explore isotope performance in larger models, we use the ImageNet dataset [12], which has 1000 classes and contains 1.7 million images (training details are in Table 9 in the Appendix). We use ImageNet blend marks with α = 0.4 and p = 0.1, and assume that each isotope subset is assigned to a different class (the most difficult setting). Our trained model has 72% Top-1 accuracy.

ML-as-a-Service APIs
Next, we test isotopes on models trained using MLaaS APIs rather than our local servers. We train CIFAR100 models using Google Vertex AI with 1 and 20 marked classes, α = 0.4, p = 0.1. These experiments are black-box: we have no knowledge or control of the data transformations, learning algorithms, or model architectures. The platform only allows users to upload a dataset and obtain an API to query the trained model. Our models achieve 64-65% Top-1 accuracy. As Table 7 shows, V_TPR = 1.0 and V_FPR = 0.0 when one class is marked, and V_TPR = 0.89, V_FPR = 0.07, V_dist = 0.84 when 20 classes are marked. These results indicate that isotopes remain effective in MLaaS-trained models.

Transfer Learning
Finally, we consider isotope robustness when A uses transfer learning, a technique commonly used to increase model performance when limited training data or compute power is available [53,68]. Transfer learning transfers knowledge from a teacher model trained on a domain similar to D by retraining the teacher's last few layers on D. The intuition is that earlier (lower) model layers learn more generic image features, while later (higher) layers learn task-specific features, so retraining the last layers adapts the teacher to the target task.
Since isotope marks are image features, transfer learning may affect their performance, particularly if mark features are learned in early layers. We evaluate the effect of transfer learning on isotopes using the Scrub dataset with 25 classes marked, α = 0.4, p = 0.1. We use a SphereFace model pretrained on WebFace as the teacher, and train using the PubFig settings in Table 9. We vary the number of unfrozen layers from 1 to 5 and report V_TPR and V_dist in Figure 13.
Model accuracy is highest when 3 layers are unfrozen, and in this setting V_TPR = 1.0 and V_FPR = 0 for δ = 0.1. V_dist is slightly lower, but this mirrors the trend in V_dist observed in Table 3. Since V_TPR trends with model accuracy during transfer learning, these results indicate that isotopes remain effective in this setting.

Robustness to Adaptive Countermeasures
A model trainer may try to prevent isotopes from being used effectively, perhaps to hide the trainer's use of private data for model training. The two main ways to counteract isotopes are to detect them or to disrupt them.
We draw inspiration from defenses against poisoning, backdoor, and membership inference attacks, all of which are related to isotopes (see §2), to identify techniques that could detect or disrupt isotopes. For example, A could try to detect isotopes using existing methods for spurious correlation detection [48,64] or by analyzing F to detect isotope-induced changes [10,25,51,59,69,71]. To disrupt isotopes, A could use adversarial augmentations during training [54,62], modify F's outputs to harm V's performance [32,63], or selectively retrain F so it forgets isotope features [40].
Here, we evaluate the efficacy and cost of five anti-isotope countermeasures. If a countermeasure incurs a high cost, the model trainer may not use it. Methods to detect isotopes may incur a false-positive cost (relevant to §8.1 and §8.2) if they require a high FPR to achieve a high TPR. Methods to disrupt isotopes may incur a model performance cost (relevant to §8.3-8.5) if accuracy must be sacrificed to disrupt isotopes. Unless noted, we evaluate on CIFAR100 models with 25 marked classes, ImageNet blend marks, α = 0.4, p = 0.1.
We do not evaluate differentially private (DP) model training [2,78]. In theory, DP models mask the influence of any given input, potentially making isotopes less detectable. However, there are no known DP techniques that train ImageNet or face recognition models to meaningful accuracy. In the few realistic settings where DP training converges (e.g., some language models [45]), it requires data from millions of users, imposes orders-of-magnitude overhead over normal training, and fails to achieve state-of-the-art accuracy.

Detecting Spurious Correlations
Isotope marking would be ineffective if A could detect and filter out isotope images in D. Existing literature has shown it is possible to detect spurious correlations in datasets [48,64]. Since isotopes are inspired by the spurious correlation phenomenon, we test whether the method of Singla et al. [64], a state-of-the-art spurious correlation detection method, can detect isotopes in D. This method inspects feature maps produced by a trained F to see if spurious features caused F's classification decision.
Following [64], we run detection on CIFAR100 models. [64] assumes that the model is robustly trained, but we omit this step, since the corresponding decrease in model accuracy [55] hampers A's goal of training an effective model. We test the "worst-case" scenario for isotopes by computing feature maps for isotope images in D and manually inspecting whether isotope features are flagged among the top-5 most important features for the isotope class in F, as reflected in the heatmaps. In reality, A would not know which D images contain isotopes, and so would have to inspect the top-k activating features (depending on their threshold) for all classes. To understand the effect of mark visibility and frequency on detection, we vary α from 0.1 to 0.5 (with p = 0.1) and p from 0.01 to 0.3 (with α = 0.5). We assume that only one class is marked, which makes isotopes more likely to stand out and be detected as spurious.
Results and cost. For scenarios with smaller p ≤ 0.2 and α ≤ 0.4, isotope features are not flagged (see Figure 14). In the strongest cases (i.e., α = 0.5, p ≥ 0.2), slight feature map shifts are observed, indicating that for these settings this method may lead a model trainer to notice something "odd" about isotope images and possibly filter them. However, the α = 0.5, p ≥ 0.2 setting is stronger than needed in practice for effective isotopes. Moreover, this method requires intense manual effort on the part of the model trainer to identify isotope images, making spurious correlation detection an impractical countermeasure. Outlier detection on the training dataset, a related method, also fails to detect spurious correlations (see Appendix A.4 for details).

Inspecting Features
Inspecting F's features after training could enable detection of isotope-induced behaviors. Since marks increase the probability of the marked label for marked inputs, the feature-space region associated with the marked label may exhibit isotope-specific behaviors. Several defenses against backdoor attacks use feature inspection to detect backdoors [10,25,51,59,69,71].
We evaluate two feature inspection methods: Spectral Signatures (SS) [71] and Activation Clustering (AC) [10]. Both analyze the feature representations of D elements in F and run statistical tests to detect data that elicit unusual model behaviors. Flagged data is removed from D, and F is retrained on the pruned dataset. We run both defenses using the author-provided code, adapted to our models. For SS, we use the 95th percentile as the cutoff; for AC, we look for two clusters (e.g., "clean" and "poison") and use the "smaller cluster" criterion, since there are fewer isotopes than clean data. Average precision/recall are in Table 8.
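As an illustration of the Spectral Signatures idea (our own sketch, not the authors' code): score each input by its squared projection onto the top singular direction of the centered feature matrix, then flag the highest scorers:

```python
import numpy as np

def spectral_scores(feats):
    """Spectral Signatures-style outlier scores: squared correlation of each
    (centered) feature vector with the top singular direction."""
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

# demo: 95 "clean" feature vectors plus 5 sharing a strong common shift
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, (95, 16))
shifted = rng.normal(0.0, 1.0, (5, 16)) + 4.0
scores = spectral_scores(np.vstack([clean, shifted]))
flagged = np.argsort(scores)[-5:]     # e.g., a 95th-percentile cutoff
```

In this synthetic demo the shift is deliberately strong; as Table 8 shows, real isotope features are far less separable.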
Both defenses have low precision and recall in detecting isotope data. Less than 2% of the data flagged by each defense is actually isotope data. Although AC has higher recall, detecting on average 32% of isotope data, its detection FPR is high (36%). As with spurious correlation detection, these methods impose a nontrivial cost on A, who must either manually filter the flagged data to find isotopes or discard a large portion of D. Overall, neither defense detects enough isotope inputs to disrupt isotopes.

Adversarial Augmentation
If A cannot find isotopes in D or F, they can still try to disrupt them. One obvious way is to modify images in D during training. Our experiments in §5 already employed common augmentation techniques during training, such as cropping, normalization, and rotation. These did not disrupt isotope performance, but we now test whether more aggressive image augmentation can prevent F from learning isotope features.
Adding noise. As a base case, A could try to disrupt isotopes by adding Gaussian noise to D images before training. This could disrupt subtle features in images, potentially rendering marks ineffective. However, as Figure 15 shows, this is not the case. Adding noise with mean μ = 0 and increasing standard deviation σ to images in D reduces F's accuracy faster than it reduces V_TPR or V_dist.
Adding marks. A more aggressive tactic is to add more marks to D, disrupting the learning of users' marks. We assume A adds marks to all images in D, since they cannot know a priori which images carry user-added marks. We use images from the GTSRB dataset as A's marks and test their effect on isotope performance as the added marks' visibility α′ varies.
As Figure 16 shows, adding marks slowly degrades F's accuracy and V_dist as α′ increases. However, it has a much stronger effect on V_TPR, which drops to 0 once α′ ≥ 0.2. This performance drop is likely because the new marks are extremely similar to both the isotope and external marks used in V (i.e., all are images blended into other images). When all training images contain similar marks, isotope marks are no longer unique and are not learned as spurious correlations, confounding V.
Costs. Adding noise imposes a significant model accuracy cost on A, as it degrades F's accuracy as quickly as, or more quickly than, V_TPR and V_dist. Since A wants to train a highly accurate model, they would not use noise to disrupt isotopes. Although adding new marks drops V_TPR once the additional marks have α′ ≥ 0.2, model accuracy decreases by at least 5% at α′ = 0.2, which may be unacceptable for A, depending on the scenario. Regardless, we believe this countermeasure works only because of the similarity between the new marks and our isotope marks, which makes it harder for isotope marks to act as spurious features. Future work broadening the set of features available as isotope marks could mitigate this issue.

Reducing Granularity of Outputs
A could try to disrupt isotope verification V by modifying F's outputs. We consider two methods A could employ: adding noise to F's logits, or reducing the granularity of F's classification results.
Add noise to F's outputs. V measures shifts in probabilities to detect isotopes, so adding noise to F's outputs may obscure these shifts and render V ineffective. We test this by adding Gaussian noise with mean μ = 0 and varying standard deviation σ to F's logits before computing its probability vector. However, as Figure 17 shows, adding noise to F's logits reduces model accuracy before V_TPR or V_dist decrease. Since A incurs a high accuracy cost, this method is unusable.
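This countermeasure amounts to perturbing logits before the softmax; a minimal sketch (names illustrative):

```python
import numpy as np

def noisy_softmax(logits, sigma, rng):
    """Add zero-mean Gaussian noise to the logits, then softmax (sketch of
    the output-noising countermeasure)."""
    z = logits + rng.normal(0.0, sigma, size=logits.shape)
    e = np.exp(z - z.max())   # stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(0)
p = noisy_softmax(np.array([2.0, 1.0, 0.1]), sigma=0.5, rng=rng)
```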
Return only top-k predictions. Our basic isotope verification algorithm assumes that F returns a probability distribution over all classes. While this assumption holds for many real-world ML APIs (Table 12 in Appendix A.5), F could respond to queries with less information (e.g., Face++ in Table 12).
To test isotope performance in this modified setting, we limit the model's outputs to the top-k ranked classes, k ∈ {2, 5, 10, 15, 20, 25, 50}, and compute the shift in the rank of the isotope class between x_t and x_t′ (the same image marked with the true and external mark). If the isotope class is not in the top k, we set its rank to k + 1. V then runs its t-test on rank shifts instead of probability shifts. We report the average V_TPR and V_dist for each k.
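The rank computation can be sketched as follows (illustrative helper, not the paper's code):

```python
import numpy as np

def topk_rank(prob_vec, target, k):
    """Rank (1-indexed) of class `target` within the model's top-k output;
    returns k + 1 if the class falls outside the top k."""
    topk = np.argsort(prob_vec)[::-1][:k]
    pos = np.where(topk == target)[0]
    return int(pos[0]) + 1 if pos.size else k + 1

probs = np.array([0.02, 0.05, 0.60, 0.08, 0.25])
r_in = topk_rank(probs, target=2, k=3)    # top prediction -> rank 1
r_out = topk_rank(probs, target=0, k=3)   # outside top-3 -> rank k+1 = 4
```

V's t-test then runs on differences of these ranks, for images carrying the true versus the external mark, rather than on probability shifts.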
As Figure 18 shows, V_TPR remains high in the rank-only setting, but V_dist decreases significantly. Our explanation is that any mark learned by F, regardless of whether it is the mark associated with a given class, induces a change in F's output probabilities, simply because it has been learned. When raw probabilities are available, there is an obvious difference between the probability shift for true and false marks for a class. When only the top-k outputs are available, there is not enough signal to determine this. While this drop in V_dist in the top-k setting is unfortunate, recall that an individual user only knows their own mark and thus cannot run V_dist anyway. Therefore, top-k outputs are sufficient for mark detection, the user's primary goal.
Costs. Adding noise to F's logits directly decreases model accuracy and imposes a significant cost on the model trainer. The cost of restricting outputs to the top k is subtler. Unlike other countermeasures, this technique would, in many settings, reduce the model's utility for users. Furthermore, limiting outputs to ranks provides only "security by obscurity" and could be overcome by more advanced isotope detection methods [11,42].

Targeted Fine-tuning
Finally, a motivated adversary can fine-tune their model with unrelated data to make F "forget" isotope-related features. In adapting to new data, F might retain the core features of the original class but forget spurious features like isotopes. To test this, we resize, relabel, and normalize Scrub images to serve as fine-tuning data for marked labels in CIFAR100. Results are shown in Figure 19. Marked-class accuracy degrades much faster than V_TPR, making targeted fine-tuning costly and ineffective.

Limitations and Future Work
There are a number of limitations to our current work. First, most of our experiments use visibility level α = 0.4, which can leave visible marks on images. We made this tradeoff because a higher α lets us detect isotopes with near-perfect accuracy when isotopes make up only 10% of a class. This might be an acceptable cost for privacy-conscious users, and it can easily be adjusted per user preferences. Second, we did not explore isotope efficacy in other scenarios, e.g., enterprise-scale models with millions of classes, or p values below 0.1, as in scenarios where many users contribute data to a common class. Third, our approach can be affected if the model offers only limited outputs (e.g., only top-k results), or if model trainers are willing to sacrifice the accuracy of their models to evade isotope-based provenance methods (§8.3). Finally, despite our best efforts to study a variety of adaptive attacks, our system might be circumvented by future countermeasures.
There are also several directions in which to extend and improve this work. First, the isotope marks we evaluate (ImageNet images blended into other images) introduce large feature disturbances into images. There is ample room for work exploring alternative approaches with significantly less visual impact, e.g., spurious correlations that do not require a mask over the full image. Second, we need to better understand how isotopes (and other data provenance tools) behave in a continual learning setting, as used in many commercial ML models today [35,49]. While results in §8 show that retraining with orthogonal data does not cause a model to forget isotope features, long-term retraining with in-distribution data could, over time, cause forgetting of isotope features, since they are not "core" class features.

Table 10. Comparison of our method and Radioactive Data [56], reporting p-values instead of V_TPR as in Table 5.
which site the images came from, perhaps by posting identical images with different marks on different sites.
A.4 Outlier Detection as a Countermeasure
Outlier detection could enable A to identify marked images in the training dataset and remove them before training F. To test the efficacy of this countermeasure, we run an outlier detection method based on k-nearest neighbors [21]. We pass D through a model pre-trained on a similar domain to create feature representations and cluster the representations into N classes, where N is the number of classes in D. Finally, we run outlier detection on these clusters while varying the outlier threshold to compare the TPR (i.e., isotope images flagged as outliers) and FPR at different thresholds. We assume that A looks for outliers within each label/cluster.
Since we test on CIFAR100 models, we use a pre-trained ImageNet model to produce the feature representations. We evaluate in the single-mark setting, since this represents the worst-case scenario for the user: with only one label marked, isotope images are more likely to stand out and be flagged as outliers. To understand the effect of mark visibility and mark frequency on detection efficacy, we vary α from 0.1 to 0.5 (with p = 0.1) and p from 0.01 to 0.3 (with α = 0.5). As Figures 21 and 22 show, for some α/p settings isotope images are easier to flag as outliers, and the AUC for outlier detection increases. Outlier detection performs well when α = 0.1 and p = 0.1, but V accuracy is low for these parameters, making them unlikely to be used in practice (see Figure 7). Overall, kNN-based outlier detection detects isotope outliers only at high false positive rates, necessitating either additional filtering to find the true positives or throwing out a large chunk of unmarked data. This is a nontrivial cost, as both acquiring new data and manually inspecting existing data are time- and resource-intensive. More advanced outlier detection may reduce the FPR, and we leave this to future work.
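A minimal version of the kNN outlier score (our sketch, not [21]'s implementation) looks like this:

```python
import numpy as np

def knn_outlier_scores(feats, k=5):
    """Mean distance to the k nearest neighbors in feature space; higher
    scores mark more isolated (outlier-like) points."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-distance
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# demo: 60 "clean" feature vectors plus 3 isolated stand-ins for marked data
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, (60, 8))
isolated = rng.normal(6.0, 1.0, (3, 8))
scores = knn_outlier_scores(np.vstack([clean, isolated]))
flagged = np.argsort(scores)[-3:]          # threshold sweep in practice
```

In practice the threshold is swept to trade TPR against FPR, as in the ROC curves of Figures 21 and 22.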

A.5 Query Outputs in Real-World MLaaS Systems
Table 12 provides examples of query outputs returned by real-world MLaaS providers. Most systems by default return any matches (for facial recognition) or labels (in a classification setting) above a

Figure 1. Control over data content, data labels, and model training by different players in the ML ecosystem.

Figure 2. The presence of a spurious feature "mark" in images subtly increases the probability of the marked class in a model's probability output. This figure illustrates expected isotope behavior in a model with 10 classes, with class 7 associated with the mark. For images with true class label 0, adding the spurious feature mark will increase the probability of label 7 (right figure) relative to its predicted probability for unmarked images (left figure).

2: Data collection. A model trainer A, wishing to train an N-class image classification model, creates training dataset D. A collects data from users u_1, u_2, . . . and assigns each input one of N labels, forming D. As described in §3, we assume a sufficient number of user u's isotopes T_u with mark t receive the marked label. 3: Model training and publication.

Figure 3. A high-level overview of our isotopes methodology: (1) User u_1 posts a set of images online, including data isotopes; (2) Model trainer A collects these images to create a dataset D; (3) A trains model F on D; (4) u_1 queries F and uses verifier V to determine if their isotope images were used to train F.

Figure 4. Detailed illustration of isotope creation and detection, explained in §4.2 and §4.3.

easy for u to determine what data should be in D_u based on its classification task. u does not include images with the marked label, since V detects changes in the probability of the marked label for images whose true label is different. u selects n, the number of D_u images used by V in a single round; an external mark t′ to use for differential testing; and a threshold δ, which V uses to determine if a test result is significant. Finally, u chooses N, the number of rounds in V, and T, the proportion of rounds that must produce a significant t-test for V to output 1. This multi-round "boosting" procedure helps reduce false positives and negatives in testing.

Figure 5. Different marks used in our experiments.

Figure 6. Visibility of the ImageNet blend mark increases with α.

Figure 7. Average V_TPR values for different marks in a CIFAR100 model. Marks introducing stronger features into images (like ImageNet blend) perform better.

Figure 8. Average V_TPR values at δ = 0.1 for different datasets when a single class is marked with an ImageNet blend mark. For most datasets, marking is effective when α ≥ 0.4 and p ≥ 0.1.

Figure 13. Isotopes remain detectable in a transfer learning setting when at least 3 layers are unfrozen during training.

Figure 14. The state-of-the-art spurious correlation detection method cannot flag isotopes at reasonable settings like p = 0.1 and α = 0.4.

Figure 15. Adding Gaussian noise with mean μ = 0 and increasing σ to images in D degrades F's accuracy faster than V_TPR or V_dist.

Figure 16. Adding new marks to images in D degrades F's accuracy more than V_TPR or V_dist.

Figure 18. Returning only the top-k outputs reduces mark distinguishability but not detectability.

Figure 19. Retraining CIFAR100 marked classes using Scrub data degrades F's accuracy faster than V_TPR.

Figure 22. ROC curves for outlier detection (fixed p and varying α). The method works best for α = 0.1, but V_TPR is low for this α/p setting (§5.2).

• At test time, A does not cooperate with u. Therefore, u does not have access to the internals of F and can only interact with it via a query API.
• Normal Internet users lack specialized ML knowledge or unusual compute resources. The data provenance solution should be deployable by individuals, without requiring intense computation or data collection by u. For example, u lacks the skills and hardware needed to scrape large amounts of training data, train shadow models, etc.

Table 2. Notation used in this paper.

correlations. The goal of model training is to extract general patterns from the training dataset D. If D is biased or insufficiently diverse with respect to the distribution from which it is sampled, F can learn spurious correlations from D, i.e., certain features not relevant to a class become predictive of that class in F.
10: if (c/N) > T then Return 1
11: Return 0

4.2 Isotope Creation

User u creates isotopes via three steps: mark selection, mark insertion, and data release; see Figure 4.

Mark selection. Data isotopes should contain distinct, memorizable features that introduce a spurious correlation in F, so the features of mark t should not commonly appear in u's images. Furthermore, t should be unique and distinct from other marks, should they appear in D. We discuss practical mark choices in §5.

Mark insertion. u adds t to D_u images to create isotope subset T_u. Mark insertion is parameterized by α and p: the mark visibility and the proportion of D_u user images marked. u chooses p · |D_u| images from D_u and adds t to each image x via x ⊕ (t, α, m)

Table 7. Isotopes remain detectable in models trained on Google's Cloud ML API.

Table 9. Model training details for each task.

Figure 20. Comparison of V's performance for different n values and on paired external marks.

Table 11. Results from isotope detection in Amazon Rekognition. For set1 and set2, the true match is always the top match. For unenrolled isotope images (set3), isotope images with the same mark appear in the top 5 hits.

Ours (black box)   < 1.0 × 10⁻¹⁰   < 1.0 × 10⁻¹⁰   1.98 × 10⁻⁷   1.5 × 10⁻⁵