Abstract
The biological significance of a nucleic acid sequence can often be clarified by identifying the environmental niches in which that sequence is present. Many metagenomic datasets are now available, with deep sequencing of samples from diverse biological niches. While any individual metagenomic dataset can be readily queried using web-based tools, meta-searches across all such datasets are less accessible. In this brief communication, we demonstrate such a meta-metagenomic approach, examining close matches to the Wuhan coronavirus 2019-nCoV in all high-throughput sequencing datasets in the NCBI Sequence Read Archive accessible with the keyword "virome". In addition to the homology to bat coronaviruses observed in descriptions of the 2019-nCoV sequence (F. Wu et al. 2020, Nature, doi.org/10.1038/s41586-020-2008-3; P. Zhou et al. 2020, Nature, doi.org/10.1038/s41586-020-2012-7), we note strong homology to numerous sequence reads in a metavirome dataset generated from the lungs of deceased pangolins, reported by Liu et al. (Viruses 11:11, 2019, http://doi.org/10.3390/v11110979). Our observations are relevant to discussions of the derivation of 2019-nCoV and illustrate the utility and limitations of meta-metagenomic search tools for the effective and rapid characterization of potentially significant nucleic acid sequences.
Footnotes
Text has been clarified, references ordered correctly, and an additional figure added describing the distribution of synonymous and nonsynonymous variants (Figure S1).