Abstract
Recent improvements in the quality and yield of long-read data and scaffolding technology have made it possible to rapidly generate reference-quality assemblies for complex genomes. Still, generating these assemblies is costly, and knowing the sequence depth and read length required to obtain a high-quality assembly is important for allocating limited resources. To this end, we generated eight independent assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20x to 75x genomic depth and with N50 read lengths of 11 to 21 kb. Assemblies with 30x depth or less and an N50 read length of 11 kb were highly fragmented, with even the low-copy genic fraction of the genome showing degradation at 20x depth. Distinct sequence-quality thresholds were observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. This study provides a resource-allocation reference for the community as long-read technologies continue to mature.