Privacy-Preserving Fingerprinting Against Collusion and Correlation Threats in Genomic Data
Authors: Tianxi Ji (Texas Tech University), Erman Ayday (Case Western Reserve University), Emre Yilmaz (University of Houston-Downtown), Pan Li (Case Western Reserve University)
Volume: 2024
Issue: 3
Pages: 659–673
DOI: https://doi.org/10.56553/popets-2024-0098
Abstract: Sharing genomic databases is critical to the collaborative research in computational biology. A shared database is more informative than specific genome-wide association studies (GWAS) statistics as it enables do-it-yourself calculations. Genomic databases involve intellectual efforts from the curator and sensitive information of participants, thus in the course of data sharing, the curator (database owner) should be able to prevent unauthorized redistributions and protect genomic data privacy. As it becomes increasingly common for a single database be shared with multiple recipients, the shared genomic database should also be robust against collusion attack, where multiple malicious recipients combine their individual copies to forge a pirated one with the hope that none of them can be traced back. The strong correlation among genomic entries also make the shared database vulnerable to attacks that leverage the public correlation models. In this paper, we assess the robustness of shared genomic database under both collusion and correlation threats. To this end, we first develop a novel genomic database fingerprinting scheme, called Gen-Scope. It achieves both copyright protection (by enabling traceability) and privacy preservation (via local differential privacy) for the shared genomic databases. To defend against collusion attacks, we augment Gen-Scope with a powerful traitor tracing technique, i.e., the Tardos codes. Via experiments using a real-world genomic database, we show that Gen-Scope achieves strong fingerprint robustness, e.g., the fingerprint cannot be compromised even if the attacker changes 45% of the entries in its received fingerprinted copy and colluders will be detected with high probability. Additionally, Gen-Scope outperforms the considered baseline methods. Under the same privacy and copyright guarantees, the accuracy of the fingerprinted genomic database obtained by Gen-Scope is around 10% higher than that achieved by the baseline, and in terms of preservations of GWAS statistics, the consistency of variant-phenotype associations can be about 20% higher. Notably, we also empirically show that Gen-Scope can identify at least one of the colluders even if malicious receipts collude after independent correlation attacks.
Keywords: genomic data, privacy, gcollusion and correlation attack
Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution 4.0 license.