PT - JOURNAL ARTICLE
AU - Ian Holmes
TI - Modular non-repeating codes for DNA storage
AID - 10.1101/057448
DP - 2016 Jan 01
TA - bioRxiv
PG - 057448
4099 - http://biorxiv.org/content/early/2016/06/08/057448.short
4100 - http://biorxiv.org/content/early/2016/06/08/057448.full
AB - We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at https://github.com/ihh/dnastore, with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of substitutions, short duplications and deletions.
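
The abstract mentions two stages that can be illustrated in a few lines: a binary-to-ternary block code, and an encoding of ternary digits into DNA that never repeats a base. The sketch below is not the code from the dnastore repository and does not reproduce the paper's transducers. It is a much simpler stand-in, assuming a fixed block code of 6 trits per byte and a "rotating" code in which each trit selects one of the three nucleotides that differ from the previous one. All function names and the virtual starting base `prev="A"` are hypothetical choices made for illustration. This simple scheme rules out homopolymer runs only; it does not prevent dinucleotide repeats such as `CACACA`, which the codes described in the abstract also avoid.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 >= 256, so 6 trits hold one byte


def bytes_to_trits(data):
    """Block code: map each byte to 6 base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    if len(trits) % TRITS_PER_BYTE:
        raise ValueError("trit count is not a multiple of the block size")
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for digit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + digit
        if value > 255:
            raise ValueError("block does not encode a byte")
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, prev="A"):
    """Each trit picks one of the 3 bases that differ from the previous base.

    `prev` is a virtual starting context (an assumption of this sketch),
    so the first emitted base is never equal to it.
    """
    out = []
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError("not a ternary digit: %r" % (trit,))
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, prev="A"):
    """Inverse of trits_to_dna; rejects any repeated base."""
    trits = []
    for base in dna:
        if base == prev:
            raise ValueError("repeated base, not a valid codeword")
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


if __name__ == "__main__":
    message = b"hello"
    dna = trits_to_dna(bytes_to_trits(message))
    print(dna)
    print(trits_to_bytes(dna_to_trits(dna)))
```

The decoder here only detects a repeated base and raises an error. Unlike the codes described in the abstract, it makes no attempt to correct substitutions, duplications, or deletions.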