RT Journal Article
SR Electronic
T1 Discovery of several thousand highly diverse circular DNA viruses
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 555375
DO 10.1101/555375
A1 Michael J. Tisza
A1 Diana V. Pastrana
A1 Nicole L. Welch
A1 Brittany Stewart
A1 Alberto Peretti
A1 Gabriel J. Starrett
A1 Yuk-Ying S. Pang
A1 Siddharth R. Krishnamurthy
A1 Patricia A. Pesavento
A1 David H. McDermott
A1 Philip M. Murphy
A1 Jessica L. Whited
A1 Bess Miller
A1 Jason M. Brenchley
A1 Stephan P. Rosshart
A1 Barbara Rehermann
A1 John Doorbar
A1 Blake A. Ta’ala
A1 Olga Pletnikova
A1 Juan Troncoso
A1 Susan M. Resnick
A1 Ben Bolduc
A1 Matthew B. Sullivan
A1 Arvind Varsani
A1 Anca M. Segall
A1 Christopher B. Buck
YR 2019
UL http://biorxiv.org/content/early/2019/09/17/555375.abstract
AB Although it is suspected that there are millions of distinct viral species, fewer than 9,000 are catalogued in GenBank’s RefSeq database. We selectively enriched for and amplified the genomes of circular DNA viruses in over 70 animal samples, ranging from cultured soil nematodes to human tissue specimens. A bioinformatics pipeline, Cenote-Taker, was developed to automatically annotate over 2,500 circular genomes in a GenBank-compliant format. The new genomes belong to dozens of established and emerging viral families. Some appear to be the result of previously undescribed recombination events between ssDNA viruses and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these “dark matter” sequences, we used an artificial neural network to identify candidate viral capsid proteins, several of which formed virus-like particles when expressed in culture. These data further the understanding of viral sequence diversity and allow for high throughput documentation of the virosphere.