SPINML: Customized Synthetic Data Generation for Private Training of Specialized ML Models

Authors: Jiang Zhang (University of Southern California), Rohan Sequeira (University of Southern California), Konstantinos Psounis (University of Southern California)

Volume: 2025
Issue: 2
Pages: 140–156
DOI: https://doi.org/10.56553/popets-2025-0054


Abstract: Specialized machine learning (ML) models tailored to users’ needs and requests are increasingly being deployed on smart devices with cameras to provide personalized intelligent services that take advantage of camera data. However, two primary challenges hinder the training of such models: the lack of publicly available labeled data suitable for specialized tasks, and the inaccessibility of labeled private data due to concerns about user privacy. To address these challenges, we propose SpinML, a novel system in which the server generates customized Synthetic image data to Privately traIN a specialized ML model tailored to the user’s request, using only a few sanitized reference images from the user. SpinML offers users fine-grained, object-level control over the reference images, which allows them to trade off the privacy and utility of the generated synthetic data according to their privacy preferences. Through experiments on three specialized model training tasks, we demonstrate that SpinML can enhance the performance of specialized models without compromising users’ privacy preferences.

Keywords: machine learning, synthetic data, privacy, utility

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.