RT Journal Article SR Electronic T1 Data Storage Based on Combinatorial Synthesis of DNA Shortmers JF bioRxiv FD Cold Spring Harbor Laboratory SP 2021.08.01.454622 DO 10.1101/2021.08.01.454622 A1 Inbal Preuss A1 Zohar Yakhini A1 Leon Anavy YR 2021 UL http://biorxiv.org/content/early/2021/08/01/2021.08.01.454622.abstract AB Storage needs represent a significant burden on the economy and the environment. Some of this can potentially be offset by improved density molecular storage. The potential of using DNA for storing data is immense. DNA can be harnessed as a high density, durable archiving medium for compressing and storing the exponentially growing quantities of digital data that mankind generates. Several studies have demonstrated the potential of DNA-based data storage systems. These include exploration of different encoding and error correction schemes and the use of different technologies for DNA synthesis and sequencing. Recently, the use of composite DNA letters has been demonstrated to leverage the inherent redundancy in DNA based storage systems to achieve higher logical density, offering a more cost-effective approach. However, the suggested composite DNA approach is still limited due to its sensitivity to the stochastic nature of the process. Combinatorial assembly methods were also suggested to encode information on DNA in high density, while avoding the challenges of the stochastic system. These are based on enzynatic assembly processes for producing the synthetic DNA.In this paper, we propose a novel method to encode information into DNA molecules using combinatorial encoding and shortmer DNA synthesis, in compatibility with current chemical DNA synthesis technologies. Our approach is based on a set of easily distinguishable DNA shortmers serving as building blocks and allowing for near-zero error rates. We construct an extended combinatorial alphabet in which every letter is a subset of the set of building blocks. We suggest different combinatorial encoding schemes and explore their theoretical properties and practical implications in terms of error probabilities and required sequencing depth. To demonstrate the feasibility of our approach, we implemented an end-to-end computer simulation of a DNA-based storage system, using our suggested combinatorial encodings. We use simulations to assess the performance of the system and the effect of different parameters.Our simulations suggest that our combinatorial approach can potentially achieve up to 6.5-fold increase in the logical density over standard DNA based storage systems, with near zero reconstruction error.Implementing our approach at scale to perform actual synthesis, requires minimal alterations to current technologies. Our work thus suggests that the combination of combinatorial encoding with standard DNA chemical synthesis technologies can potentially improve current solutions, achieving scalable, efficient and cost-effective DNA-based storage.Competing Interest StatementThe authors have declared no competing interest.