Abstract
Several coronaviruses infect humans, with three, including SARS-CoV-2, causing diseases. While coronaviruses are especially prone to inducing pandemics, we know little about their evolutionary history, host-to-host transmissions, and biogeography, which impedes the prediction of future transmission scenarios. One of the difficulties lies in dating the origination of the family, a particularly challenging task for RNA viruses in general. Previous cophylogenetic tests of virus-host associations, including in the Coronaviridae family, have suggested a virus-host codiversification history stretching many millions of years. Here, we establish a framework for robustly testing scenarios of ancient origination and codiversification versus recent origination and diversification by host switches. Applied to coronaviruses and their mammalian hosts, our results support a scenario of recent origination of coronaviruses in bats and diversification by host switches, with preferential host switches within mammalian orders. Hotspots of coronavirus diversity, concentrated in East Asia and Europe, are consistent with this scenario of relatively recent origination and localized host switches. Spillovers from bats to other species are rare, but are more likely to be towards humans than towards any other mammal species, implicating humans as the evolutionary intermediate host. The high host-switching rates within orders, as well as between humans, domesticated mammals, and non-flying wild mammals, indicate the potential for rapid additional spreading of coronaviruses across the world. Our results suggest that the evolutionary history of extant mammalian coronaviruses is recent, and that cases of long-term virus–host codiversification have been largely overestimated.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This new version of our manuscript contains several changes, including additional robustness analyses. In particular, we performed data subsampling (at the level of mammalian orders) to account for heterogeneous sampling, and new simulations to exclude possible methodological artifacts. We also now account for uncertainty in the dating of the mammalian phylogeny. All robustness analyses are now summarized in Table S7. We also added further justification for the use of the RdRp palmprint region for studying the macroevolution of Coronaviridae, and we provide a network representation of the data that illustrates the main findings of our study (new Figure 3).