ABSTRACT
The emergence of SARS-CoV-2 in 2019 has caused severe disruption and a huge number of human deaths across the globe. As the pandemic spreads, variants with a variety of amino acid mutations naturally emerge. Variants of SARS-CoV-2 with mutations in their spike protein may show increased infectivity, increased lethality, or immune escape. Whilst many of these properties can be explained through changes to binding affinity or to post-translational modification, many mutations have no known biophysical impact on the structure of the protein. The Gibbs free energy of a protein is a measure of its stability: a lower free energy of folding corresponds to a protein that is more thermodynamically stable and more robust to changes in its external environment.
Here we show that SARS-CoV-2 variants are selecting for amino acid changes in the spike protein that result in a more stable protein than expected by chance. We calculate the predicted stability change for all possible mutations in the SARS-CoV-2 spike protein, and show that many variants are more stable than expected when compared to this background, indicating that protein stability is an important consideration for the understanding of SARS-CoV-2 evolution. Variants exhibit a range of stabilities, and we further suggest that some stabilising mutations may be acting as a “counterbalance” to destabilising mutations that have other properties, such as increasing binding site affinity for the human ACE2 receptor. We suggest that protein folding calculations offer a useful tool for early identification of advantageous mutations.
TEXT
Since the emergence of SARS-CoV-2 in late 2019, over 2 million people have died as a result of infection1. As the global pandemic continues, the emergence of viral variants with RNA mutations is an expected phenomenon, caused by random errors in RNA copying and selected for by evolutionary pressure2. Many variants contain mutations in the spike protein that confer an advantage to the virus, such as increased ACE2 receptor binding3, glycosylation/cleavage site alterations4, immune evasion5, and protein stability6. Understanding these properties helps infer how one variant may differ from another, and provides insights into the mechanisms behind those differences, such as increased infectivity or vaccine resistance7–9. The WHO classifies SARS-CoV-2 variants into major categories. The two most important, “Variants of Concern” and “Variants of Interest”, are assigned to emerging variants likely to have a different phenotype and mutational profile to the original SARS-CoV-2 virus10.
The change in Gibbs free energy of folding upon amino acid mutation in a protein (ΔΔG) is a measure of the thermodynamic effect of that mutation on stability. Predicted ΔΔG values are routinely used in protein engineering and design for optimisation of enzymes or stabilisation of protein complexes11,12, and we have recently shown that they can be predictive of mutations that destabilise or damage a protein in a cancer context13–15. Whilst the stability effect of individual mutations has been assessed in the SARS-CoV-2 spike protein16,17, analysis of variants has not yet been performed. Mutations that stabilise the SARS-CoV-2 spike protein are likely to lead to better binding to other molecules, and to a longer lifespan of the protein before thermal unfolding. Calculating predicted ΔΔG values requires protein structural information, which was recently published for the SARS-CoV-2 spike protein18.
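For reference, the quantity being predicted can be written as a difference of folding free energies. The sign convention shown here (negative values are stabilising) is inferred from the stabilising and destabilising thresholds defined later in the text, and the superscript labels are illustrative rather than notation used by the authors.

```latex
\Delta\Delta G
  = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}}
  - \Delta G_{\mathrm{fold}}^{\mathrm{original}},
\qquad
\Delta\Delta G < 0 \;\Rightarrow\; \text{mutation predicted to be stabilising}
```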
Here we calculate the ΔΔG on mutation for every possible missense mutation in the SARS-CoV-2 spike protein. Using this “background” mutational distribution, we show that mutations to the spike protein observed in emerging SARS-CoV-2 variants have a lower ΔΔG, and include a higher proportion of stabilising mutations, than expected. We further show that combinations of mutations result in synergistic stability changes, and so highlight possible evolutionary orders of mutations. This suggests an important role for protein stability when considering the evolution of SARS-CoV-2.
The SARS-CoV-2 spike protein is a trimer of identical subunits (Figure 1a) that sits in the membrane of the virion and interacts with the human ACE2 receptor to facilitate infection of a host cell. The structure of the spike protein was recently elucidated, enabling the calculation of predicted ΔΔG values for mutations. The “Alpha” variant, first identified in December 2020 in the United Kingdom19,20, has been found to be more transmissible than the original virus, with an increased affinity for the human ACE2 receptor21, and by April 2021 had become the dominant variant in the UK. The Alpha variant carries 23 common mutations across its genome, 7 of which are amino acid substitutions in the spike protein; 6 of these are at locations for which structural data are available (Figure 1b).
To explore the mutational landscape of the spike protein and SARS-CoV-2 variants, we first calculated the ΔΔG for each of the 19440 possible missense mutations in the 6VXX cryo-EM structure using FoldX22 (Supplementary Table 1). Consistent with previous studies, we find that mutations that stabilise the protein are rare (Figure 2a). We define mutations with an induced ΔΔG < -1 kcal/mol as strongly stabilising and mutations with a ΔΔG > 2.5 kcal/mol as strongly destabilising, with those between zero and each threshold described as mildly stabilising and mildly destabilising respectively. Only 767 (3.9%) of possible mutations are predicted to strongly stabilise the protein, and only 3699 (19%) have a ΔΔG < 0. We compared this “background” mutational distribution to mutations found only in WHO “Variants of Concern” and “Variants of Interest” as of June 2021 (Figure 2b). Mutations found in both categories have a significantly lower ΔΔG (t-test p-value < 0.05) than the bulk population, indicating that variants may be evolutionarily selecting for stabilising or non-destabilising mutations. Considering individual mutations found in “Variants of Concern” (Figure 2c), none of the mutations observed induces a ΔΔG > 2.5 kcal/mol (defined as strongly destabilising), significantly different from the expected 34% of possible mutations meeting this threshold (chi-squared p-value < 0.05). Additionally, 4 of 17 mutations have a ΔΔG of approximately -1 kcal/mol or lower (N501Y, H655Y, T716I, and T1027I), showing a significant enrichment for stabilising mutations in the “Variants of Concern” (chi-squared p-value < 0.05). We conclude that there is a statistical enrichment for mutations that stabilise the spike protein compared to the mutational background in SARS-CoV-2 variants.
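The four stability classes and the "none of 17" observation can be illustrated with a short Python sketch. The thresholds and the 34% background figure come from the text; the function names, the handling of a ΔΔG of exactly zero, and the example values are assumptions made for illustration. The exact binomial calculation is a simple cross-check and is not the chi-squared test reported above.

```python
from collections import Counter
from math import comb

# Thresholds quoted in the text (kcal/mol); negative ddG is stabilising.
STRONGLY_STABILISING = -1.0
STRONGLY_DESTABILISING = 2.5


def classify_ddg(ddg):
    """Assign a predicted ddG value to one of the four classes used in the text."""
    if ddg < STRONGLY_STABILISING:
        return "strongly stabilising"
    if ddg < 0:
        return "mildly stabilising"
    if ddg <= STRONGLY_DESTABILISING:
        # Assumption: a ddG of exactly 0 is grouped with mildly destabilising.
        return "mildly destabilising"
    return "strongly destabilising"


def class_fractions(ddgs):
    """Fraction of mutations falling in each stability class."""
    counts = Counter(classify_ddg(v) for v in ddgs)
    return {label: n / len(ddgs) for label, n in counts.items()}


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values only, not taken from Supplementary Table 1.
example = [-1.4, -0.3, 0.2, 1.1, 3.0]
print(class_fractions(example))

# Chance of seeing 0 strongly destabilising mutations among 17 if each had
# the 34% background probability quoted in the text.
p_none = binomial_pmf(0, 17, 0.34)
print(f"{p_none:.1e}")  # about 8.6e-04
```

Under this simplified independence assumption, drawing 17 mutations at random from the background would very rarely yield none that are strongly destabilising, which is in line with the depletion reported for the “Variants of Concern”.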
To further unpick the stability of specific SARS-CoV-2 variants, we calculated the ΔΔG distribution of mutations individually for each variant in the two categories (Figure 3; variants and mutations are listed in Table 1). Of the 10 variants studied, 7 have a statistically significantly lower ΔΔG than the bulk mutational background (t-test p-value <= 0.05), and 5 variants (Alpha, Gamma, Eta, Theta, and Iota) have a mean ΔΔG less than 0, indicating that the protein is predicted to be stabilised with respect to the original variant. To study this relationship further, we calculated the expected ΔΔG for each variant given the number of mutations occurring in it (Supplementary Figure 1), and find that all variants aside from Beta and Eta have a lower ΔΔG than expected given the number of mutations they contain. Of particular note is the Alpha variant, which has a ΔΔG distribution almost entirely below zero, as well as the emerging Theta variant first identified in India.
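One way to obtain an expected ΔΔG for a variant with a given number of mutations is to resample from the background distribution. The sketch below assumes this resampling approach, which may differ from the procedure behind Supplementary Figure 1; the function names and the small background list are hypothetical.

```python
import random
from statistics import mean


def null_mean_ddgs(background, n_mutations, n_draws=10_000, seed=0):
    """Means of n_mutations ddG values drawn at random from the background."""
    rng = random.Random(seed)
    return [mean(rng.sample(background, n_mutations)) for _ in range(n_draws)]


def fraction_at_or_below(observed_mean, null_means):
    """Share of random draws whose mean ddG is no higher than the observed mean."""
    return sum(m <= observed_mean for m in null_means) / len(null_means)


# Illustrative background, not the real 19440 FoldX predictions.
background = [-1.5, -0.4, 0.1, 0.6, 1.2, 1.9, 2.7, 3.5, 4.2, 6.0]
null = null_mean_ddgs(background, n_mutations=3, n_draws=1000)

print(mean(null))                        # expected mean ddG for 3 random mutations
print(fraction_at_or_below(-0.5, null))  # how unusual a mean of -0.5 would be
```

A variant whose observed mean ΔΔG falls in the lower tail of these random draws is more stable than expected for its number of mutations.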
Finally, to study the potential evolutionary order of mutations and gain insights into mechanisms of mutational selection, we calculated the ΔΔG for every possible combination of mutations in each variant, shown in Figure 4 for the Alpha variant (Supplementary Figures 2-9 for other variants). For the Alpha variant there is a consistent trend towards stabilisation as more mutations are combined, with all combinations of 5 or more mutations (of the 6 that can be modelled in the structure) resulting in a predicted stabilisation of the protein with respect to the original. We also observe combinations that result in a positive ΔΔG. These are likely to be evolutionarily less favourable (when considering stability alone), and so we expect them to be less likely to occur in the evolutionary history than stabilising combinations. Furthermore, some variants, such as the Beta variant first identified in South Africa in May 2020, contain combinations of mutations with a ΔΔG expected to be highly destabilising. Whilst the final ΔΔG of all Beta mutations together is still predicted to be strongly destabilising, it is reduced compared to the most extreme combinations. This indicates that a potential driver of selection of other mutations may be that they stabilise the protein complex enough for it to function, whilst retaining the advantageous properties unrelated to stability conferred by the destabilising mutations.
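Enumerating every combination of a variant's mutations can be sketched in Python as follows. The mutation labels and ΔΔG values are placeholders, and `additive_ddg` is a hypothetical stand-in that simply sums single-mutation values; in the analysis described above each combination would instead be modelled on the structure, which is what allows synergistic effects to appear.

```python
from itertools import combinations


def mutation_combinations(mutations):
    """Yield every non-empty subset of a variant's mutations, smallest first."""
    for size in range(1, len(mutations) + 1):
        yield from combinations(mutations, size)


def score_combinations(mutations, predict_ddg):
    """Map each combination to the ddG returned by predict_ddg."""
    return {combo: predict_ddg(combo) for combo in mutation_combinations(mutations)}


# Hypothetical single-mutation ddG values (kcal/mol) for six placeholder mutations.
single = {"M1": -1.2, "M2": -0.6, "M3": 0.4, "M4": -0.2, "M5": 0.9, "M6": -1.0}


def additive_ddg(combo):
    """Stand-in predictor: sums single-mutation values, so it ignores synergy."""
    return sum(single[m] for m in combo)


scores = score_combinations(list(single), additive_ddg)
print(len(scores))  # 63 non-empty combinations of 6 mutations

# Combinations predicted to be destabilising overall under this stand-in.
destabilising = [combo for combo, ddg in scores.items() if ddg > 0]
print(len(destabilising))
```

Because the number of combinations grows as 2^n - 1, six modellable mutations give 63 combinations to score, which remains inexpensive to enumerate.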
This work highlights that mutations with a stabilising effect on the SARS-CoV-2 spike protein are one of the key drivers of evolution of the virus, and are contributing to the increased transmissibility of emerging variants. That variants are more stable than expected by chance shows that evolution is favouring mutations with a stabilising effect. It may be that mutations that destabilise a protein but have other influences on protein structure, such as K417N21, which alters ACE2 binding affinity, are offset or preceded by mutations that stabilise the structure. We note, however, that not all mutations in all variants can be considered, due to missing regions of the cryo-EM structure, and as such this study does not necessarily represent the true ΔΔG for each variant. Furthermore, we study only the structure in its “closed” conformation, as we feel this is the most physiologically relevant of the existing structures, but further work will need to address the impact of the dynamic range of the structure on mutational stability. We highlight that stability of the SARS-CoV-2 spike protein is an important consideration for future study of variants, and is likely one of a number of driving forces in the evolution of new variants. Finally, we suggest that folding simulations of newly sequenced variants may offer a computationally inexpensive method to highlight advantageous mutations, and that prospective simulation of further mutations to these samples may predict future variants to place under surveillance.
ASSOCIATED CONTENT
Supporting Information
The following files are available free of charge.
Supplementary Information
Methods for ΔΔG calculation and supplementary figures (PDF)
Supplementary Table 1: Table containing the predicted ΔΔG for every possible mutation in the SARS-CoV-2 spike structure, PDB ID 6VXX (XLSX)
Author Contributions
DS and BAH conceived the study and wrote the manuscript. DS generated all data and performed all analysis.
Funding Sources
This work was supported by the Medical Research Council (grant no. MR/S000216/1). B.A.H. acknowledges support from the Royal Society (grant no. UF130039). The authors declare no competing financial interest.