RT Journal Article
SR Electronic
T1 The effect of variant interference on de novo assembly for viral deep sequencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 815480
DO 10.1101/815480
A1 Christina J. Castro
A1 Rachel L. Marine
A1 Edward Ramos
A1 Terry Fei Fan Ng
YR 2019
UL http://biorxiv.org/content/early/2019/10/23/815480.abstract
AB Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. Next-generation sequencing (NGS) has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent and reproducible across ten of the most widely used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, in which selective removal of minor variant reads from clinical datasets allowed the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.