Private Computation on Common Fuzzy Records
Authors: Kyoohyung Han (Samsung SDS), Seongkwang Kim (Samsung SDS), Yongha Son (Sungshin Women's University)
Volume: 2025
Issue: 1
Pages: 567–583
DOI: https://doi.org/10.56553/popets-2025-0031
Abstract: Private computation on common records refers to analyze data from two databases containing shared records without revealing personal information. As a basic requirement for private computation, the databases involved essentially need to be aligned by a common identification system. However, it is hard to expect such common identifiers in real world scenario. For this reason, multiple quasi-identifiers can be used to identify common records. As some quasi-identifiers might be missing or have typos, it is important to support fuzzy records setting. Identifying common records using quasi-identifiers requires manipulation of highly sensitive information, which could be privacy concerns. This work studies the problem of enabling such data analysis on the fuzzy records of quasi-identifiers. To this end, we propose "ordered threshold-one (OTO)" matching which can be efficiently realized by circuit-based private set intersection~(CPSI) protocols and some multiparty computation (MPC) techniques. Furthermore, we introduce some generic encoding techniques from traditional matching rules to the OTO matching. Finally, we achieve a secure efficient private computation protocol which supports various matching rules which have already been widely used. We also demonstrate the superiority of our proposal with experimental validation. First, we empirically check that our encoding to OTO matching does not affect accuracy a lot for the benchmark datasets found in the fuzzy record matching literature. Second, we implement our protocol and achieve significantly faster performance at the cost of communication overhead compared to previous privacy-preserving record linkage (PPRL) protocols. In the case of 100K records for each dataset, our work shows 147.58MB communication cost, 10.71s setup time, and 1.97s online time, which is 7.78 times faster compared to the previous work (50.12 times faster when considering online time only).
Keywords: secure multiplarty computation, privacy-preserving record linkage, private set intersection
Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution 4.0 license.