Abstract
In late December 2019, an emerging viral infection, COVID-19, was identified in Wuhan, China, and subsequently became a global pandemic. Characterizing the genetic variants of SARS-CoV-2 is crucial for tracking and evaluating their spread across countries. In this study, we collected and analyzed 3,067 SARS-CoV-2 genomes isolated from 59 countries during the first three months after the emergence of the virus. Using comparative genomic analysis, we profiled mutations across the whole genome and compared the frequency of each mutation in the studied population. We also monitored the accumulation of mutations over the epidemic period together with their geographic locations. The results showed 716 site mutations, of which 457 (64%) had a non-synonymous effect. Frequencies of mutated alleles revealed 39 recurrent non-synonymous mutations, including 10 hotspot mutations with a prevalence higher than 0.10 in this population, distributed across six SARS-CoV-2 genes. The distribution of these recurrent mutations on the world map revealed certain genotypes specific to particular geographic locations. We also found co-occurring mutations that give rise to several haplotypes. Evolution over time showed a mechanism of mutation co-accumulation, and phylogenetic analysis of this population indicated that the virus can be divided into 3 clades, including a subgroup specific to genomes from the United States. In addition, analysis of selective pressure revealed several negatively selected residues that could be useful to consider in the design of therapeutic targets.
We have also created an inclusive, unified database (http://moroccangenomes.ma/covid/) that lists all genetic variants of the SARS-CoV-2 genomes found in this study, together with a phylogeographic analysis around the world.
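As an illustration of the frequency criterion used above, the following minimal sketch computes the prevalence of each mutation in a set of genomes and flags those above the 0.10 hotspot threshold. It is not the pipeline used in this study: the function names, the input layout (one set of mutation labels per genome), and the toy labels `mutA`, `mutB`, and `mutC` are hypothetical and chosen only for illustration. Only the 0.10 threshold and the counts 457 and 716 are taken from the text.

```python
from collections import Counter


def mutation_frequencies(genome_mutations):
    """Return {mutation: fraction of genomes carrying it}.

    genome_mutations: list with one set of mutation labels per genome
    (hypothetical input layout, assumed for this sketch).
    """
    n_genomes = len(genome_mutations)
    # Count each mutation at most once per genome.
    counts = Counter(m for muts in genome_mutations for m in set(muts))
    return {m: c / n_genomes for m, c in counts.items()}


def hotspots(freqs, threshold=0.10):
    """Mutations whose prevalence is strictly higher than the threshold."""
    return sorted(m for m, f in freqs.items() if f > threshold)


# Toy population of 20 genomes with made-up mutation labels.
toy = (
    [{"mutA", "mutB"}] * 2    # 2 genomes carry mutA and mutB
    + [{"mutA", "mutC"}] * 3  # 3 genomes carry mutA and mutC
    + [{"mutA"}] * 7          # 7 genomes carry only mutA
    + [set()] * 8             # 8 genomes carry none of the three
)

freqs = mutation_frequencies(toy)
hot = hotspots(freqs)  # mutB sits exactly at 0.10, so it is not flagged

# Share of non-synonymous mutations reported in the abstract: 457 of 716.
nonsyn_percent = round(457 / 716 * 100)
```

In this toy population `mutA` has a prevalence of 0.60 and `mutC` of 0.15, so both are flagged, while `mutB` at exactly 0.10 is not, because the criterion is a strict inequality. The last line reproduces the rounded 64% figure from the reported counts.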
Competing Interest Statement
The authors have declared no competing interest.