Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified as the cause of COVID-19 soon after the disease was first reported. Global sequencing of thousands of genomes has revealed many common genetic variants, which are key to unraveling the early evolutionary history of SARS-CoV-2 and tracking its global spread over time. However, our knowledge of fundamental events in the genome evolution and spread of this coronavirus remains grossly incomplete and highly uncertain. A deep understanding of the contemporary evolution of SARS-CoV-2 is urgently needed, not only for a retrospective on how, when, and why COVID-19 emerged and spread, but also for creating remedies through the efforts of science, technology, medicine, and public policy. Here, we present the heretofore cryptic mutational history, phylogeny, and dynamics of SARS-CoV-2 from an analysis of tens of thousands of high-quality genomes. The reconstructed mutational progression is highly concordant with coronavirus sampling dates. It predicts the genome sequence of the progenitor virus, whose earliest offspring, without any non-synonymous mutations, were still spreading worldwide months after COVID-19 was first reported. Mutations of the progenitor gave rise to seven dominant lineages that spread episodically over time, some of which likely arose in Europe and North America after the genesis of the ancestral lineages in China. Mutational barcoding establishes that North American coronaviruses harbor genome signatures different from those of coronaviruses prevalent in Europe and Asia, which have converged over time. These spatiotemporal patterns continue to evolve as the pandemic progresses and can be viewed live online.
Competing Interest Statement
The authors have declared no competing interest.