Abstract
Background High-quality reference genome sequences are the core of modern genomics. Oxford Nanopore Technologies (ONT) produces inexpensive DNA sequence reads in excess of 100,000 nucleotides, but high error rates make sequence assembly and analysis a non-trivial problem as genome size and complexity increase. To date, there has been no comprehensive attempt to develop a robust experimental design for ONT genome sequencing and assembly. In this study, we simulate ONT and Illumina DNA sequence reads for the model organisms Escherichia coli, Caenorhabditis elegans, Arabidopsis thaliana, and Drosophila melanogaster and assemble them with Canu, Flye, and MaSuRCA to quantify the influence of sequencing coverage and assembly approach. Heterozygosity in outbred eukaryotes is a common problem for genome assembly. We show the broad applicability of our methods using real ONT data generated for four strains of the highly heterozygous nematodes Caenorhabditis remanei and C. latens.
ONT libraries have a unique error structure, and high sequence depth is necessary to assemble contiguous genome sequences.
As sequence depth increases, errors accumulate and assembly statistics plateau.
High-quality assembled sequences require a combination of experimental techniques that increase sequence read length and computational protocols that reduce error through correction, read selection, and ‘polishing’ with higher-accuracy short sequence reads.
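Sequencing depth (coverage) is the total number of sequenced bases divided by the genome size. One simple form of read selection keeps the longest reads until a target depth is reached. The Python sketch below is illustrative only: it assumes read lengths are already known and the genome size is estimated, the function names `sequencing_depth` and `select_longest_reads` are hypothetical, and it is not the selection procedure used internally by Canu, Flye, or MaSuRCA.

```python
from typing import List


def sequencing_depth(read_lengths: List[int], genome_size: int) -> float:
    """Mean depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def select_longest_reads(read_lengths: List[int], genome_size: int,
                         target_depth: float) -> List[int]:
    """Hypothetical helper: greedily keep the longest reads until their
    summed length reaches target_depth * genome_size (or reads run out)."""
    budget = target_depth * genome_size
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break  # target depth already reached
        selected.append(length)
        total += length
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: a 1 Mb genome and ten reads (lengths in bp).
    reads = [120_000, 80_000, 60_000, 50_000, 40_000,
             30_000, 20_000, 10_000, 5_000, 1_000]
    genome = 1_000_000
    print(sequencing_depth(reads, genome))  # 0.416
    print(select_longest_reads(reads, genome, 0.25))  # [120000, 80000, 60000]
```

Selecting by length alone trades total depth for read length; the sketch does not model read accuracy, so it says nothing about how discarded shorter reads might still contribute to correction or polishing.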
Our robust experimental design results in highly contiguous and accurate genome assemblies for the four strains of C. remanei and C. latens.
Conclusions ONT sequencing is inexpensive and accessible, but the technology’s error structure requires a robust experimental design. Our quantitative results will be helpful for a broad array of researchers seeking guidance for de novo assembly projects.
Competing Interest Statement
The authors have declared no competing interest.