bioRxiv
New Results

Early Computational Detection of Potential High Risk SARS-CoV-2 Variants

Karim Beguir, Marcin J. Skwark, Yunguan Fu, Thomas Pierrot, Nicolas Lopez Carranza, Alexandre Laterre, Ibtissem Kadri, Abir Korched, Anna U. Lowegard, Bonny Gaby Lui, Bianca Sänger, Yunpeng Liu, Asaf Poran, Alexander Muik, Ugur Sahin
doi: https://doi.org/10.1101/2021.12.24.474095
Karim Beguir
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
  • For correspondence: kb@instadeep.com, ugur.sahin@biontech.de
Marcin J. Skwark
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Yunguan Fu
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Thomas Pierrot
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Nicolas Lopez Carranza
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Alexandre Laterre
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Ibtissem Kadri
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Abir Korched
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Anna U. Lowegard
1 InstaDeep Ltd; 5 Merchant Square, London W2 1AY, UK
Bonny Gaby Lui
2 BioNTech SE; An der Goldgrube 12, 55131 Mainz, Germany
Bianca Sänger
2 BioNTech SE; An der Goldgrube 12, 55131 Mainz, Germany
Yunpeng Liu
3 BioNTech US; 40 Erie Street, Cambridge, MA, 02139, USA
Asaf Poran
3 BioNTech US; 40 Erie Street, Cambridge, MA, 02139, USA
Alexander Muik
2 BioNTech SE; An der Goldgrube 12, 55131 Mainz, Germany
Ugur Sahin
2 BioNTech SE; An der Goldgrube 12, 55131 Mainz, Germany
  • For correspondence: kb@instadeep.com, ugur.sahin@biontech.de

Abstract

The ongoing COVID-19 pandemic is leading to the discovery of hundreds of novel SARS-CoV-2 variants on a daily basis. While most variants do not impact the course of the pandemic, some pose a significantly increased risk when the acquired mutations allow better evasion of antibody neutralisation in previously infected or vaccinated subjects, or increased transmissibility. Early detection of such high-risk variants (HRVs) is paramount for the proper management of the pandemic. However, experimental assays to determine the immune evasion and transmissibility characteristics of new variants are resource-intensive and time-consuming, potentially delaying appropriate responses by decision-makers. Here we present a novel in silico approach that combines spike (S) protein structure modelling with large protein transformer language models trained on S protein sequences to accurately rank SARS-CoV-2 variants for immune escape and fitness potential. These metrics can be combined into an automated Early Warning System (EWS) capable of evaluating new variants in minutes and risk-monitoring variant lineages in near real-time. The system accurately pinpoints the putatively dangerous variants by selecting on average less than 0.3% of the novel variants each week. With only the S protein nucleotide sequence as input, the EWS detects HRVs earlier and with better precision than baselines such as the growth metric (which requires real-world observations) or random sampling. Notably, Omicron BA.1 was flagged by the EWS on the day its sequence was made available. Additionally, our immune escape and fitness metrics were experimentally validated using in vitro pseudovirus-based virus neutralisation test (pVNT) assays and binding assays. The EWS flagged as potentially dangerous all 16 variants (Alpha to Omicron BA.1/2/4/5) designated by the World Health Organisation (WHO), on average more than one and a half months before their designation.

One-Sentence Summary: A COVID-19 Early Warning System that combines structural modelling with machine learning to detect and monitor high-risk SARS-CoV-2 variants identified all 16 WHO-designated variants, on average more than one and a half months in advance, while selecting on average less than 0.3% of the weekly novel variants.
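
The weekly selection step described in the abstract (rank variants by immune escape and fitness, then flag a small top fraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the two metrics are combined, so the rank-product rule, the function names `rank_fraction` and `flag_variants`, and the example scores below are all assumptions made purely for illustration. Only the 0.3% default echoes the figure quoted above.

```python
import bisect
import math
from typing import Dict, List


def rank_fraction(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each variant to the fraction of variants scoring at or below it."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {name: bisect.bisect_right(ordered, s) / n for name, s in scores.items()}


def flag_variants(
    immune_escape: Dict[str, float],
    fitness: Dict[str, float],
    fraction: float = 0.003,
) -> List[str]:
    """Return the top `fraction` of variants by a combined rank score.

    ASSUMPTION: the combined score is the product of the two rank
    fractions. The source does not specify the combination rule.
    """
    escape_rank = rank_fraction(immune_escape)
    fitness_rank = rank_fraction(fitness)
    combined = {
        name: escape_rank[name] * fitness_rank[name] for name in immune_escape
    }
    # Always flag at least one variant so a small weekly batch is not skipped.
    k = max(1, math.ceil(fraction * len(combined)))
    # Sort by descending combined score; break ties by name for determinism.
    ranked = sorted(combined, key=lambda name: (-combined[name], name))
    return ranked[:k]


# Hypothetical scores for four made-up variants observed in one week.
example_escape = {"A": 0.1, "B": 0.9, "C": 0.5, "D": 0.2}
example_fitness = {"A": 0.8, "B": 0.7, "C": 0.1, "D": 0.3}
flagged = flag_variants(example_escape, example_fitness)
```

Using rank fractions rather than raw scores keeps the two metrics on a comparable scale regardless of their units, which is one simple way to combine them; a real system would need a rule calibrated against validation data.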

Competing Interest Statement

U.S. is a management board member and employee at BioNTech SE. A.M., B.G.L. and B.S. are employees at BioNTech SE. A.P. and Y.L. are employees at BioNTech US. U.S. and A.M. are inventors on patents and patent applications related to RNA technology and the COVID-19 vaccine. U.S., A.M., B.G.L., and B.S. have securities from BioNTech SE. K.B. is a management board member and employee at InstaDeep Ltd. M.J.S., Y.F., T.P., N.L.C., A.L., I.K., A.K. and A.U.L. are employees of InstaDeep Ltd or its subsidiaries. K.B., M.J.S., Y.F., T.P., N.L.C., and A.L. are inventors on patents and patent applications related to AI technology. K.B., M.J.S., Y.F., T.P., N.L.C., and A.L. have securities from InstaDeep Ltd.

Footnotes

  • We have made a major revision to the manuscript and believe that the substantial improvements make its advances clearer. The text, main and supplementary figures, and tables have all been updated to reflect the following changes. First, we have extended our methods and analyses and demonstrated that the EWS outperforms standard approaches because of its higher precision (lower false positive rate). The EWS detected all 16 high-risk variants (HRVs), including BA.2, BA.4 and BA.5, in advance of their designation by the World Health Organisation. To address real-world changes, we have included new pseudovirus neutralisation assay data for the highly infectious BA.2 and BA.4/5 Omicron sub-variants, in addition to the previous data. These data are in line with our in silico computed estimates for these sub-variants, further demonstrating the real-life validity of our system. Additionally, we extended our validation using multiple independent studies on SARS-CoV-2 variant cross-neutralisation. Across these studies, we observed a robust correlation between the predicted immune escape score and the reduction in effective neutralisation by serum elicited by vaccination or natural infection. We also present a higher-resolution understanding of immune escape and antibody binding sites using highly pertinent new resources on classes of anti-Spike antibody binding sites. We show that a substantial part of the predictive power of the epitope alteration score is attributable to data derived from antibodies across different classes, as described in Barnes et al. (doi: 10.1038/s41586-020-2852-1), rather than being dependent on a single class. We have also revised and extended our methods section to substantially expand the description of how the immune escape score and the ACE2 binding score are calculated, so that the reader can fully understand the underlying process.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted September 20, 2022.
Subject Area

  • Bioinformatics