Abstract
Background The current outbreak caused by the novel coronavirus (2019-nCoV) in China has become a worldwide concern. As of 28 January 2020, there were 4631 confirmed cases and 106 deaths, and 11 countries or regions were affected.
Methods We downloaded the genomes of 2019-nCoVs and similar isolates from the Global Initiative on Sharing Avian Influenza Database (GISAID) and the nucleotide database of the National Center for Biotechnology Information (NCBI). Lasergene 7.0 and MEGA 6.0 were used to calculate genetic distances between the sequences, to construct phylogenetic trees, and to align amino acid sequences. Bayesian coalescent phylogenetic analysis, implemented in the BEAST software package, was used to estimate molecular clock characteristics such as the nucleotide substitution rate and the time to the most recent common ancestor (tMRCA) of 2019-nCoVs.
Results An isolate numbered EPI_ISL_403928 differed from the other 2019-nCoVs in its phylogenetic trees and genetic distances for the whole-length genome and for the coding sequences (CDS) of polyprotein (P), spike protein (S), and nucleoprotein (N). There are 22, 4, and 2 variations in P, S, and N, respectively, at the level of amino acid residues. The nucleotide substitution rates, from high to low, are 1.05 × 10⁻² nucleotide substitutions/site/year (95% HPD interval 6.27 × 10⁻⁴ to 2.72 × 10⁻²) for N, 5.34 × 10⁻³ (5.10 × 10⁻⁴ to 1.28 × 10⁻²) for S, 1.69 × 10⁻³ (3.94 × 10⁻⁴ to 3.60 × 10⁻³) for P, and 1.65 × 10⁻³ (4.47 × 10⁻⁴ to 3.24 × 10⁻³) for the whole genome. At these nucleotide substitution rates, the most recent common ancestor of 2019-nCoVs is estimated to have appeared about 0.253 to 0.594 years before the epidemic.
Conclusion Our analysis suggests that at least two different viral strains of 2019-nCoV are involved in this outbreak, which may have begun a few months before it was officially reported.
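To make the "a few months" reading concrete, the following minimal Python sketch converts the reported tMRCA range (0.253 to 0.594 years) into days and months, and multiplies the whole-genome substitution rate by elapsed time to give a rough expected number of substitutions per genome. The genome length of 29,900 nucleotides, the 365.25-day year, and the function names are illustrative assumptions and are not taken from the analysis itself.

```python
# Illustrative arithmetic only; not part of the authors' BEAST analysis.

DAYS_PER_YEAR = 365.25          # assumption: average calendar year length
ASSUMED_GENOME_LENGTH = 29_900  # assumption: approximate genome length in nucleotides

# Values reported in the Results paragraph above.
TMRCA_YEARS = (0.253, 0.594)    # tMRCA range, years before the epidemic
WHOLE_GENOME_RATE = 1.65e-3     # substitutions/site/year, whole genome


def years_to_days(years: float) -> float:
    """Convert a duration in years to days."""
    return years * DAYS_PER_YEAR


def years_to_months(years: float) -> float:
    """Convert a duration in years to months (12 equal months per year)."""
    return years * 12.0


def expected_substitutions(rate: float, genome_length: int, years: float) -> float:
    """Expected substitutions per genome: rate x sites x elapsed time."""
    return rate * genome_length * years


if __name__ == "__main__":
    for t in TMRCA_YEARS:
        subs = expected_substitutions(WHOLE_GENOME_RATE, ASSUMED_GENOME_LENGTH, t)
        print(
            f"{t:.3f} years = {years_to_days(t):.0f} days "
            f"= {years_to_months(t):.1f} months; "
            f"expected substitutions per genome = {subs:.1f}"
        )
```

Under these assumptions the range corresponds to roughly 3 to 7 months, consistent with the wording of the conclusion. The substitution count is only an order-of-magnitude guide, since it depends on the assumed genome length and ignores the width of the 95% HPD interval.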
List of abbreviations
- CoVs: coronaviruses
- 2019-nCoV: 2019 novel coronavirus
- SARS-CoV: severe acute respiratory syndrome coronavirus
- MERS-CoV: Middle East respiratory syndrome coronavirus
- CDS: coding sequence
- tMRCA: time to the most recent common ancestor
- GISAID: Global Initiative on Sharing Avian Influenza Database
- ESSs: effective sample sizes