Abstract
The ongoing COVID-19 pandemic is leading to the discovery of hundreds of novel SARS-CoV-2 variants on a daily basis. While most variants do not impact the course of the pandemic, some pose a significantly increased risk when the acquired mutations increase transmissibility or allow better evasion of antibody neutralisation in previously infected or vaccinated subjects. Early detection of such high-risk variants (HRVs) is paramount for the proper management of the pandemic. However, experimental assays to determine immune evasion and transmissibility characteristics of new variants are resource-intensive and time-consuming, potentially leading to delays in appropriate responses by decision makers. Here we present a novel in silico approach combining spike (S) protein structure modelling and large protein transformer language models on S protein sequences to accurately rank SARS-CoV-2 variants for immune escape and fitness potential. These metrics can be combined into an automated Early Warning System (EWS) capable of evaluating new variants in minutes and risk-monitoring variant lineages in near real-time. The system accurately pinpoints the putatively dangerous variants by selecting on average less than 0.3% of the novel variants each week. With only the S protein nucleotide sequence as input, the EWS detects HRVs earlier and with better precision than baselines such as the growth metric (which requires real-world observations) or random sampling. Notably, Omicron BA.1 was flagged by the EWS on the day its sequence was made available. Additionally, our immune escape and fitness metrics were experimentally validated using in vitro pseudovirus-based virus neutralisation test (pVNT) assays and binding assays. The EWS flagged as potentially dangerous all 16 variants (Alpha-Omicron BA.1/2/4/5) designated by the World Health Organisation (WHO), with an average lead time of more than one and a half months before their designation.
One-Sentence Summary A COVID-19 Early Warning System combining structural modelling with machine learning to detect and monitor high-risk SARS-CoV-2 variants, identifying all 16 WHO-designated variants on average more than one and a half months in advance by selecting on average less than 0.3% of the weekly novel variants.
Competing Interest Statement
U.S. is a management board member and employee at BioNTech SE. A.M., B.G.L. and B.S. are employees at BioNTech SE. A.P. and Y.L. are employees at BioNTech US. U.S. and A.M. are inventors on patents and patent applications related to RNA technology and the COVID-19 vaccine. U.S., A.M., B.G.L., and B.S. have securities from BioNTech SE. K.B. is a management board member and employee at InstaDeep Ltd. M.J.S., Y.F., T.P., N.L.C., A.L., I.K., A.K. and A.U.L. are employees of InstaDeep Ltd or its subsidiaries. K.B., M.J.S., Y.F., T.P., N.L.C., and A.L. are inventors on patents and patent applications related to AI technology. K.B., M.J.S., Y.F., T.P., N.L.C., and A.L. have securities from InstaDeep Ltd.
Footnotes
We have made a major revision to the manuscript and believe that the substantial improvements make its advances clearer. The text, the main and supplementary figures, and the tables have all been updated to reflect the following changes. First, we have extended our methods and analyses and demonstrated that the EWS outperforms standard approaches because of its higher precision (lower false positive rate). The EWS detected all 16 high-risk variants (HRVs), including BA.2, BA.4 and BA.5, in advance of their designation by the World Health Organisation. To address real-world changes, we have included new pseudovirus neutralisation assay data for the highly infectious BA.2 and BA.4/5 Omicron sub-variants, in addition to the previous data. These data are in line with our in silico computed estimates for these sub-variants, further demonstrating the real-life validity of our system. Additionally, we extended our validation using multiple independent studies on SARS-CoV-2 variant cross-neutralisation. Across these studies, we observed a robust correlation between the predicted immune escape score and the reduction in effective neutralisation with serum elicited by vaccination or natural infection. We also present a higher-resolution understanding of immune escape and antibody binding sites using new resources on classes of anti-Spike antibody binding sites. We show that a substantial part of the predictive power of the epitope alteration score is attributed to data derived from antibodies across different classes, as described in Barnes et al. (doi: 10.1038/s41586-020-2852-1), rather than being dependent on a single class. We have also revised and extended our methods section, substantially expanding the description of how the immune escape score and the ACE2 binding score are calculated, to allow the reader to fully understand the underlying process.