ABSTRACT
DNA is a promising data storage medium because of its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes primarily pursued information density, but at the cost of introducing biocompatibility challenges or risking decoding failure. Here, we propose a robust transcoding algorithm named the "Yin-Yang Codec" (YYC), which uses two rules to encode two binary bits into one nucleotide and generates DNA sequences highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200-nt oligo pools and in vivo as an ~54-kb DNA fragment in yeast cells. Sequencing results show that YYC is robust and reliable across a wide variety of data types, with an average recovery rate of 99.94% at 10⁴ molecule copies and a recovery rate of 87.53% at 100 copies. In addition, the in vivo storage demonstration achieved, for the first time, an experimentally measured physical information density of 198.8 EB per gram of DNA (44% of the theoretical maximum for DNA).
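To make "two rules to encode two binary bits into one nucleotide" concrete, the sketch below shows one way such a pair of rules can work. It is a minimal illustration under stated assumptions, not the authors' implementation. A fixed "Yang" rule assigns each base a bit, splitting {A, C, G, T} into two pairs. A "Yin" rule assigns each base a second bit that depends on the previous base. If every Yin row gives the two members of each Yang pair different bits, each (Yang bit, Yin bit) combination selects exactly one base. The tables `YANG` and `YIN`, the virtual start base `start="A"`, and the helper names `check_rules`, `encode`, and `decode` are all hypothetical choices for illustration. The sketch also omits the screening for synthesis and sequencing compatibility described above.

```python
# Hypothetical sketch of a Yin-Yang style transcoder: 2 bits -> 1 nucleotide.
# The rule tables are illustrative assumptions, not necessarily the paper's.

BASES = "ACGT"

# Assumed Yang rule: a fixed bit per base ({A, G} -> 0, {C, T} -> 1).
YANG = {"A": 0, "C": 1, "G": 0, "T": 1}

# Assumed Yin rule: the bit a base carries depends on the previous base.
# In each row, the two bases of a Yang pair must carry different bits.
YIN = {
    "A": {"A": 1, "C": 1, "G": 0, "T": 0},
    "C": {"A": 1, "C": 0, "G": 0, "T": 1},
    "G": {"A": 1, "C": 1, "G": 0, "T": 0},
    "T": {"A": 1, "C": 1, "G": 0, "T": 0},
}


def check_rules(yang, yin):
    """Return True if every (previous base, yang bit, yin bit) picks exactly one base."""
    for prev in BASES:
        for b1 in (0, 1):
            for b2 in (0, 1):
                hits = [n for n in BASES if yang[n] == b1 and yin[prev][n] == b2]
                if len(hits) != 1:
                    return False
    return True


def encode(seg1, seg2, start="A"):
    """Encode two equal-length bit lists into one DNA string.

    seg1 supplies the Yang bit and seg2 the Yin bit at each position;
    `start` is an assumed virtual base preceding the first nucleotide.
    """
    if len(seg1) != len(seg2):
        raise ValueError("segments must have equal length")
    prev, out = start, []
    for b1, b2 in zip(seg1, seg2):
        # Intersect the Yang pair for b1 with the Yin choice for b2.
        base = next(n for n in BASES if YANG[n] == b1 and YIN[prev][n] == b2)
        out.append(base)
        prev = base
    return "".join(out)


def decode(dna, start="A"):
    """Recover both bit segments by reading each rule back from the bases."""
    prev, seg1, seg2 = start, [], []
    for base in dna:
        seg1.append(YANG[base])
        seg2.append(YIN[prev][base])
        prev = base
    return seg1, seg2


if __name__ == "__main__":
    assert check_rules(YANG, YIN)
    dna = encode([0, 1, 1, 0], [1, 0, 1, 0])
    print(dna)          # ATCG
    print(decode(dna))  # ([0, 1, 1, 0], [1, 0, 1, 0])
```

Because the Yin bit is conditioned on the preceding base, the same pair of input bits can map to different nucleotides in different contexts. That context dependence is what gives a Yin-Yang style scheme room to steer local sequence composition while still packing two bits into every base.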
Competing Interest Statement
Sha Zhu is currently the founder of TAICHI AI Ltd, 20-22, Wenlock Road, London, England, N1 7GU. This work was completed when Sha Zhu was working at the University of Oxford and consulting for BGI. George Church has significant interests in Twist, Roswell, BGI, v.ht/PHNc, and v.ht/moVD.
Footnotes
Author affiliations updated. New data added.