bioRxiv
New Results

Running ahead of evolution - AI based simulation for predicting future high-risk SARS-CoV-2 variants

Jie Chen, Zhiwei Nie, Yu Wang, Kai Wang, Fan Xu, Zhiheng Hu, Bing Zheng, Zhennan Wang, Guoli Song, Jingyi Zhang, Jie Fu, Xiansong Huang, Zhongqi Wang, Zhixiang Ren, Qiankun Wang, Daixi Li, Dongqing Wei, Bin Zhou, Chao Yang, Yonghong Tian, Wen Gao
doi: https://doi.org/10.1101/2022.11.17.516989
Author affiliations

  • Jie Chen: 1, 2, 3
  • Zhiwei Nie: 2, 1, 3
  • Yu Wang: 1
  • Kai Wang: 1
  • Fan Xu: 1
  • Zhiheng Hu: 1
  • Bing Zheng: 1, 4
  • Zhennan Wang: 1
  • Guoli Song: 1
  • Jingyi Zhang: 1
  • Jie Fu: 5
  • Xiansong Huang: 1
  • Zhongqi Wang: 1
  • Zhixiang Ren: 1
  • Qiankun Wang: 1, 6
  • Daixi Li: 1, 7
  • Dongqing Wei: 1, 6
  • Bin Zhou: 1, 8 (corresponding author)
  • Chao Yang: 9 (corresponding author)
  • Yonghong Tian: 10, 2, 1, 3 (corresponding author)
  • Wen Gao: 1 (corresponding author)

1 Peng Cheng Laboratory, Shenzhen, China
2 School of Electronic and Computer Engineering, Peking University, Shenzhen, China
3 AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China
4 Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
5 Beijing Academy of Artificial Intelligence, Beijing, China
6 State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
7 School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
8 School of Information Science and Engineering, Shandong University, Qingdao, China
9 CODE and School of Mathematical Sciences, Peking University, China
10 School of Computer Science, Peking University, China

For correspondence: wgao@pku.edu.cn yhtian@pku.edu.cn chao_yang@pku.edu.cn binzhou@sdu.edu.cn

Abstract

The continual emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. Developing effective drugs and vaccines requires efficiently simulating mutations of the SARS-CoV-2 spike receptor binding domain (RBD) and identifying high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and construct a high-throughput screening workflow for predicting binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably occur. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
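To make the idea of screening candidate variants under several constraints concrete, the following is a minimal sketch and not the authors' pipeline. It enumerates every single-residue substitution of a short sequence and keeps those that pass two thresholds. The sequence `toy_seq`, the functions `toy_binding_score` and `toy_escape_score`, and the thresholds are all hypothetical placeholders with no biological meaning; in the actual work these roles are played by predictors built on the pretrained protein language model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HYDROPHOBIC = set("AILMFVWY")         # used only by the placeholder scorer
CHARGED = set("DEKR")                 # used only by the placeholder scorer


def single_point_mutants(seq, start=1):
    """Yield (label, mutant) for every single substitution, e.g. ('A1C', ...)."""
    for i, wild in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild:
                yield f"{wild}{start + i}{aa}", seq[:i] + aa + seq[i + 1:]


def toy_binding_score(seq):
    """Hypothetical stand-in for a binding-affinity predictor."""
    return sum(r in HYDROPHOBIC for r in seq) / len(seq)


def toy_escape_score(seq):
    """Hypothetical stand-in for an antibody-escape predictor."""
    return sum(r in CHARGED for r in seq) / len(seq)


def screen(seq, min_binding, min_escape):
    """Keep mutants that satisfy both constraints, ranked by combined score."""
    hits = []
    for label, mutant in single_point_mutants(seq):
        b, e = toy_binding_score(mutant), toy_escape_score(mutant)
        if b >= min_binding and e >= min_escape:
            hits.append((label, b, e))
    return sorted(hits, key=lambda h: h[1] + h[2], reverse=True)


toy_seq = "ACDKLMF"  # arbitrary illustrative sequence, not an RBD fragment
candidates = screen(toy_seq, toy_binding_score(toy_seq), toy_escape_score(toy_seq))
```

A sequence of length n has 19n single-point mutants, so the search space grows quickly once multiple simultaneous substitutions are allowed, which is why a fast learned scorer matters for screening at scale.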

Justification

We develop a novel multi-constraint variation prediction framework to simulate SARS-CoV-2 RBD mutations, reaching a peak performance of 366.8 PFLOPS with 96.5% scalability and achieving 493.9× speedup. Our method facilitates the prediction and prioritization of future high-risk variants for the early deployment of drugs and vaccines.
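The reported performance figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above. It assumes that "scalability" means parallel efficiency (measured speedup divided by ideal speedup), which the text does not define; under that assumption the implied baseline run used about 8 NPUs. The variable names are illustrative.

```python
# Figures as reported in the abstract.
peak_pflops = 366.8        # achieved peak performance
fraction_of_peak = 0.349   # 34.9% of theoretical peak
npus = 4096                # largest configuration
scalability = 0.965        # assumed here to mean parallel efficiency
speedup = 493.9            # reported speedup at 4096 NPUs

# Theoretical peak of the full system implied by the achieved fraction.
theoretical_peak_pflops = peak_pflops / fraction_of_peak    # about 1051 PFLOPS

# Implied theoretical peak per NPU (1 PFLOPS = 1000 TFLOPS).
per_npu_tflops = theoretical_peak_pflops * 1000 / npus      # about 256.6 TFLOPS

# If efficiency = speedup / ideal_speedup, the ideal speedup and the
# baseline NPU count follow directly (an inference, not a stated fact).
ideal_speedup = speedup / scalability                       # about 511.8
implied_baseline_npus = npus / ideal_speedup                # about 8.0

print(f"theoretical peak: {theoretical_peak_pflops:.1f} PFLOPS")
print(f"per-NPU peak:     {per_npu_tflops:.1f} TFLOPS")
print(f"ideal speedup:    {ideal_speedup:.1f}x")
print(f"baseline NPUs:    {implied_baseline_npus:.2f}")
```

The near-integer baseline (4096 / 8 = 512 as the ideal speedup) suggests the figures are internally consistent, though the preprint itself should be consulted for the actual baseline configuration.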


Overview of the problem

Coronavirus Disease 2019 (COVID-19) has spread rapidly to more than 200 countries and regions since December 2019. Owing to its high infectivity, over 645 million confirmed cases, including approximately 6.6 million deaths, had been reported by the World Health Organization (WHO) as of December 2022 [1]. In addition to being a serious threat to human health, COVID-19 has had a catastrophic impact on the global economy.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Revised to add one additional author.

  • https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 12, 2023.
Subject Area

  • Bioinformatics