PT - JOURNAL ARTICLE
AU - Gytis Dudas
AU - Joshua Batson
TI - Accumulated metagenomic studies reveal recent migration, whole genome evolution, and taxonomic incompleteness of orthomyxoviruses
AID - 10.1101/2022.08.31.505987
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.08.31.505987
4099 - http://biorxiv.org/content/early/2022/09/03/2022.08.31.505987.short
4100 - http://biorxiv.org/content/early/2022/09/03/2022.08.31.505987.full
AB - Metagenomic studies have uncovered an abundance of novel viruses by looking beyond hosts of obvious public health or economic interest. The discovery of conserved genes in viruses infecting geographically and phylogenetically diverse hosts has provided important evolutionary context for human and animal pathogens. However, the resulting viral genomes are often incomplete, and analyses largely characterize the distribution of viruses rather than their dynamics. Here, we show how the accumulated data of metagenomic studies can be integrated to reveal geographic and evolutionary dynamics in a case study of Orthomyxoviridae, the family of RNA viruses containing influenza. First, we use sequences of the orthomyxovirus Wuhan mosquito virus 6 to track the global migrations of its host. We then look at overall orthomyxovirus genome evolution, finding significant gene gain and loss across the family, especially in the surface proteins responsible for cell and host tropism. We find that the surface protein of Wuhan mosquito virus 6 exhibits accelerated non-synonymous evolution suggestive of antigenic evolution, and we identify an entire quaranjavirus group bearing highly diverged surface proteins. Finally, we quantify the progress of orthomyxovirus discovery and forecast that many highly diverged Orthomyxoviridae remain to be found.
We argue that continued metagenomic studies will be fruitful for understanding the dynamics, evolution, and ecology of viruses and their hosts, whether or not novel species are identified, as long as they use study designs that allow complete viral genomes to be resolved.

Summary

The number of known virus species has increased dramatically through metagenomic studies, which search genetic material sampled from a host for viral genes. Here, we focus on an important viral family with over a hundred recently discovered species infecting hosts from humans to fish. We find that one virus, discovered in mosquitoes in China, recently spread across the globe. Surface proteins used to enter cells show signs of rapid evolution in that virus and across the family. We compute the rate at which newly discovered species add evolutionary history to the tree of life, predict that many viruses remain to be discovered, and discuss what appropriately designed future studies can teach us about how diseases cross between continents and species.

Viruses that cause disease in humans and economically important organisms were the first to be isolated and characterized. Recently, cheap DNA sequencing has enabled a wave of metagenomic studies in a broader range of hosts. In these studies, viruses are identified in a host sample by nucleic acid sequence alone, and a new viral species is said to be discovered if that sequence is sufficiently diverged. As a result, the number of known viral species has increased by more than an order of magnitude in the decade since 2012 (Roux et al., 2021). While some entirely new viral families have been discovered, many of these new species are interleaved on the tree of life with viruses infecting hosts of economic importance.
Studying their ecology (Shi et al., 2019) and host associations (Li et al., 2015; Shi et al., 2018) provides insight into the host-switching and genome evolution processes that shape the evolution of pathogenicity.

This richer tree has provided some early success stories. For example, jingmenviruses were first discovered metagenomically in ticks (Qin et al., 2014) and later identified as causing human disease (Wang et al., 2019). Surveillance in hosts known to pose disproportionate risk, such as bats (Ge et al., 2016), has provided context for zoonotic pathogens like SARS-CoV-2 (Wu et al., 2020). Metagenomic studies carried out at scale can also multiplex tasks previously addressed with targeted sampling, such as understanding the evolutionary history of human pathogens (Keele et al., 2006) or using viruses that evolve faster than their hosts to track host movements (Wheeler et al., 2010).

Here, we seek to show how accumulated data from metagenomic studies can provide deep insights into viral evolution and dispersal across a family through a case study of Orthomyxoviridae. Orthomyxoviridae are a family of enveloped, segmented, negative-sense, single-stranded RNA viruses that infect vertebrates and arthropods. Historically, orthomyxovirus discovery has been driven by impact on human health (e.g. influenza viruses), impact on livelihoods (e.g. infectious salmon anemia virus), or association with known disease vectors (e.g. the tick-borne quaranja- and thogotoviruses, such as Johnston Atoll virus). The metagenomic revolution has resulted in ten times more orthomyxovirus species being discovered over the last decade than in the 79 years between the first orthomyxovirus discovery, of influenza A in 1933, and 2012.
The vast majority of known orthomyxoviruses use one of two classes of surface protein. Members that infect only vertebrates (influenza viruses and isaviruses) use one or more class I membrane fusion proteins derived from hemagglutinin-esterase-fusion (HEF) (Parry et al., 2020). Some of these delegate the receptor-destroying role, which the esterase performs in HEF, to a separate protein, neuraminidase (NA). Arthropod-infecting members (quaranja- and thogotoviruses, which sometimes spill over into vertebrates) use a class III membrane fusion protein called gp64 (Garry and Garry, 2008). Among orthomyxoviruses whose genomes are known to be complete, the number of segments varies from 6 to 8. However, many metagenomically discovered viruses have fewer segments characterized, sometimes only the polymerase. To our knowledge, no inventory of surface protein class use and segment content across Orthomyxoviridae is yet available.

We start by showing how closely related virus sequences observed across numerous studies can reveal host spatial dynamics and virus microevolution, using the orthomyxovirus Wuhan mosquito virus 6 (WMV6). We then map out known genome composition across Orthomyxoviridae, highlighting parts of the tree where segment number has likely changed. In examining genome composition, we pay close attention to surface protein use, focusing particularly on the gp64 proteins used by thogoto- and quaranjaviruses. We find surface proteins to be quite mobile within Orthomyxoviridae over evolutionary timescales. We also identify a clade of quaranjaviruses, known to have acquired new segments, that uses distinctly diverged gp64 proteins.
Finally, we borrow methods from macroevolutionary research to quantitatively assess the pace at which orthomyxovirus evolutionary history is being uncovered. We find that, despite their already transformative effect, metagenomic discovery efforts are likely to continue finding substantially diverged members of Orthomyxoviridae for some time.

Competing Interest Statement

The authors have declared no competing interest.