DNA Sequencing Flow Cells and the Security of the Molecular-Digital Interface

Authors: Peter Ney (University of Washington), Lee Organick (University of Washington), Jeff Nivala (University of Washington), Luis Ceze (University of Washington), Tadayoshi Kohno (University of Washington)

Volume: 2021
Issue: 3
Pages: 413–432
DOI: https://doi.org/10.2478/popets-2021-0054


Abstract: DNA sequencing is the molecular-to-digital conversion of DNA molecules, which are made up of a linear sequence of bases (A, C, G, T), into digital information. Central to this conversion are specialized fluidic devices, called sequencing flow cells, that distribute DNA onto a surface where the molecules can be read. As more computing becomes integrated with physical systems, we set out to explore how sequencing flow cell architecture can affect the security and privacy of the sequencing process and downstream data analysis. In the course of our investigation, we found that the unusual nature of molecular processing and flow cell design contributes to two security and privacy issues. First, DNA molecules are ‘sticky’ and stable for long periods of time. In a manner analogous to data recovery from discarded hard drives, we hypothesized that residual DNA attached to used flow cells could be collected and resequenced to recover a significant portion of the previously sequenced data. In experiments, we were able to recover over 23.4% of a previously sequenced genome sample and perfectly decode image files encoded in DNA, suggesting that flow cells may be at risk of data recovery attacks. Second, we hypothesized that multiplex sequencing, in which separate DNA samples are sequenced together to increase throughput and which incidentally leaks small amounts of data between samples, could cause data corruption and allow samples to adversarially manipulate sequencing data. We find that a maliciously crafted synthetic DNA sample can be used to alter targeted genetic variants in other samples using this vulnerability. Such a sample could be used to corrupt sequencing data, or even be spiked into tissue samples, whenever untrusted samples are sequenced together.
Taken together, these results suggest that, like many computing boundaries, the molecular-to-digital interface raises potential issues that should be considered in future sequencing and molecular sensing systems, especially as they become more ubiquitous.

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs license.