Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an emergent RNA virus that spread around the planet in about 4 months. The consequences of this rapid dispersal are under investigation. In this work, we analyzed thousands of genomes and protein sequences from Africa, America, Asia, Europe, and Oceania. We show that the virus is a complex of slightly different variants that are unevenly distributed on Earth, and demonstrate that SARS-CoV-2 phylogeny is spatially structured. Remarkably, the virus's phylogeographic patterns were associated with ancestral amino acid mutations. We hypothesize that geographic structuring results from founder effects that occur as a consequence of long-distance dispersal, followed by local evolution. Based on previous studies, this structuring could plausibly have a significant effect on the virus phenotype.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Relevance. A cluster of pneumonia cases of unknown etiology was reported in December 2019 in Wuhan, Hubei province, China. Since then, hundreds of thousands of people have died around the world. Soon after the pandemic onset, metagenomic studies showed that the causative agent, now named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was a new human coronavirus. It is now classified in the subgenus Sarbecovirus of the genus Betacoronavirus, the genus responsible for earlier spillover events in 2002 and 2012 (severe acute respiratory syndrome and Middle East respiratory syndrome, respectively). The zoonotic origins of these viruses (possibly bats, camelids, pangolins and/or palm civets) have received much attention. Other evolutionary aspects, such as spatial variation, have been studied far less. This study shows that SARS-CoV-2 variants are heterogeneously distributed on Earth and demonstrates that the virus phylogeny is geographically structured. This structuring is visible in both the RNA and the protein sequences of the virus. We explain how it may be due to founder effects combined with high mutation rates.
This version includes new analyses based on protein sequences.