RT Journal Article
SR Electronic
T1 Content-Based Similarity Search in Large-Scale DNA Data Storage Systems
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.05.25.115477
DO 10.1101/2020.05.25.115477
A1 Bee, Callista
A1 Chen, Yuan-Jyue
A1 Ward, David
A1 Liu, Xiaomeng
A1 Seelig, Georg
A1 Strauss, Karin
A1 Ceze, Luis
YR 2020
UL http://biorxiv.org/content/early/2020/05/27/2020.05.25.115477.abstract
AB Synthetic DNA has the potential to store the world’s continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database in which visually similar images appear at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics. One Sentence Summary: Learned encodings enable content-based image similarity search from a database of 1.6 million images encoded in synthetic DNA. Competing Interest Statement: C.B., Y.C., G.S., K.S., and L.C. have filed a patent application on the core idea. K.S. and Y.C. are employed by Microsoft.