Volume 2021 (2021): Issue 3 (July 2021)
Journal Details
First Published
16 Apr 2015
Publication timeframe
4 times per year
Access type
Open Access

DNA Sequencing Flow Cells and the Security of the Molecular-Digital Interface

Published Online: 27 Apr 2021
Page range: 413 - 432
Received: 30 Nov 2020
Accepted: 16 Mar 2021

DNA sequencing is the molecular-to-digital conversion of DNA molecules, which are made up of a linear sequence of bases (A, C, G, T), into digital information. Central to this conversion are specialized fluidic devices, called sequencing flow cells, that distribute DNA onto a surface where the molecules can be read. As more computing becomes integrated with physical systems, we set out to explore how sequencing flow cell architecture can affect the security and privacy of the sequencing process and downstream data analysis. In the course of our investigation, we found that the unusual nature of molecular processing and flow cell design contributes to two security and privacy issues. First, DNA molecules are ‘sticky’ and stable for long periods of time. In a manner analogous to data recovery from discarded hard drives, we hypothesized that residual DNA attached to used flow cells could be collected and re-sequenced to recover a significant portion of the previously sequenced data. In experiments, we were able to recover over 23.4% of a previously sequenced genome sample and perfectly decode image files encoded in DNA, suggesting that flow cells may be at risk of data recovery attacks. Second, we hypothesized that the methods used to sequence separate DNA samples together to increase throughput (multiplex sequencing), which incidentally leak small amounts of data between samples, could cause data corruption and allow samples to adversarially manipulate sequencing data. We find that a maliciously crafted synthetic DNA sample can be used to alter targeted genetic variants in other samples using this vulnerability. Such a sample could be used to corrupt sequencing data or even be spiked into tissue samples, whenever untrusted samples are sequenced together.
Taken together, these results suggest that, like many computing boundaries, the molecular-to-digital interface raises potential issues that should be considered in future sequencing and molecular sensing systems, especially as they become more ubiquitous.
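To make the multiplexing concern concrete, the following is a minimal back-of-the-envelope sketch, not the method used in the paper. It assumes a single locus, a fixed per-read index misassignment rate, and hypothetical read counts; the function name `leaked_allele_fraction` and all numbers are illustrative assumptions. It estimates the expected fraction of variant (alt) reads that a victim sample would show at that locus when another sample in the same run carries many alt reads.

```python
def leaked_allele_fraction(victim_ref_reads, victim_alt_reads,
                           attacker_alt_reads, misassignment_rate):
    """Expected alt-allele fraction observed in the victim sample at one locus.

    Simplifying assumptions (hypothetical, for illustration only):
    - each attacker read is misassigned to the victim independently
      with probability `misassignment_rate`;
    - reads leaking out of the victim sample are ignored.
    """
    # Expected number of attacker reads that end up labeled as the victim's.
    leaked = attacker_alt_reads * misassignment_rate
    total = victim_ref_reads + victim_alt_reads + leaked
    if total == 0:
        raise ValueError("no reads cover this locus")
    return (victim_alt_reads + leaked) / total


# Hypothetical scenario: the victim has 30 reference reads and no variant reads,
# while a synthetic sample contributes 100,000 variant reads at the same locus.
fraction = leaked_allele_fraction(30, 0, 100_000, 0.001)
print(f"expected alt fraction in victim: {fraction:.3f}")
```

Under these assumed numbers, a leak rate of 0.1% is enough for leaked reads to outnumber the victim's own reads at the locus, which is why a highly concentrated sample aimed at one site can matter even when overall cross-sample leakage is small.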

