RT Journal Article
SR Electronic
T1 What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 022731
DO 10.1101/022731
A1 Lynsey K. Whitacre
A1 Polyana C. Tizioto
A1 JaeWoo Kim
A1 Tad S. Sonstegard
A1 Steven G. Schroeder
A1 Leeson J. Alexander
A1 Juan F. Medrano
A1 Robert D. Schnabel
A1 Jeremy F. Taylor
A1 Jared E. Decker
YR 2015
UL http://biorxiv.org/content/early/2015/07/17/022731.abstract
AB Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are not known to infect taurine cattle and the reference animal appears to have been host to unsequenced sister species. We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of infectious or symbiotic organisms.