ABSTRACT
Genomic surveillance is a vital strategy for preparing against the spread of infectious diseases and for aiding the development of new treatments. In an unprecedented effort, millions of samples from COVID-19 patients have been sequenced worldwide for SARS-CoV-2. Using more than 8 million sequences currently available in GenBank’s SARS-CoV-2 database, we report a comprehensive overview of mutations in all 26 proteins and open reading frames (ORFs) of the virus. The results indicate that the spike protein, non-structural protein 6 (NSP6), nucleocapsid protein, envelope protein and ORF7b have shown the highest mutational propensities so far, in decreasing order. In particular, mutations in the spike protein have accelerated rapidly in the post-vaccination period. Monitoring the rate of non-synonymous mutations (Ka) provides a fairly reliable signal for genomic surveillance, and it successfully predicted surges in 2022. Further, the external proteins (spike, membrane, envelope, and nucleocapsid) show considerably more mutations than the NSPs. Interestingly, these four proteins showed significant changes in Ka typically 2 to 4 weeks before increases in the number of human infections (“surges”). Our analysis provides real-time surveillance of SARS-CoV-2 mutations, accessible through the project website http://pandemics.okstate.edu/covid19/. Based on ongoing mutation trends of the virus, our approach also makes it possible to predict which proteins are likely to mutate next. The proposed framework is general and is thus applicable to other pathogens. The approach is fully automated and provides the genomic surveillance needed to address a fast-moving pandemic such as COVID-19.
Competing Interest Statement
The authors have declared no competing interest.