New Results

Evolutionary context-integrated deep sequence modeling for protein engineering

Yunan Luo, Lam Vo, Hantian Ding, Yufeng Su, Yang Liu, Wesley Wei Qian, Huimin Zhao, Jian Peng
doi: https://doi.org/10.1101/2020.01.16.908509
Yunan Luo
Department of Computer Science, University of Illinois at Urbana-Champaign
Lam Vo
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign
Hantian Ding
Department of Computer Science, University of Illinois at Urbana-Champaign
Yufeng Su
Department of Computer Science, University of Illinois at Urbana-Champaign
Yang Liu
Department of Computer Science, University of Illinois at Urbana-Champaign
Wesley Wei Qian
Department of Computer Science, University of Illinois at Urbana-Champaign
Huimin Zhao
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign
Jian Peng
Department of Computer Science, University of Illinois at Urbana-Champaign
  • For correspondence: jianpeng@illinois.edu

Abstract

Protein engineering seeks to design proteins with improved or novel functions. Compared with rational design and directed evolution approaches, machine learning-guided approaches traverse the fitness landscape more effectively and hold promise for accelerating engineering while reducing experimental cost and effort. A critical challenge is whether the function or fitness of unseen protein variants can be predicted. By learning from the sequences and large-scale screening data of characterized variants, machine learning models predict the functional fitness of sequences and prioritize new variants that are likely to show enhanced functional properties, thereby guiding and accelerating rational design and directed evolution. Existing generative models and language models have been developed to predict the effects of mutations and assist protein engineering, but their accuracy is limited because they are unsupervised and the general sequence contexts they capture are not specific to the protein being engineered. In this work, we propose ECNet, a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. Our method integrates the local evolutionary context from homologous sequences, which explicitly models residue-residue epistasis for the protein of interest, with the global evolutionary context, which encodes rich semantic and structural features from the enormous protein sequence universe. This biologically motivated sequence modeling approach enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. Through extensive benchmark experiments, we show that our method outperforms existing methods on ∼50 deep mutagenesis scanning and random mutagenesis datasets, demonstrating its potential to guide and expedite protein engineering.
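
The two ideas in the abstract that lend themselves to a concrete illustration are (1) joining a protein-specific (local) representation with a general (global) representation for each residue, and (2) ranking candidate variants by predicted fitness. The Python sketch below is only a toy under stated assumptions: the function names `combine_contexts` and `prioritize`, the feature values, and the scoring function are all hypothetical, and nothing here reproduces ECNet's architecture or training procedure.

```python
from typing import Callable, List, Sequence, Tuple


def combine_contexts(
    local_feats: Sequence[Sequence[float]],
    global_feats: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-residue local and global feature vectors.

    Hypothetical helper: both inputs hold one feature vector per residue.
    """
    if len(local_feats) != len(global_feats):
        raise ValueError("both contexts must cover the same number of residues")
    return [list(loc) + list(glob) for loc, glob in zip(local_feats, global_feats)]


def prioritize(
    variants: Sequence[str],
    predict: Callable[[str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score each variant with `predict` and return the top_k highest scorers."""
    scored = [(variant, predict(variant)) for variant in variants]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Two residues, with made-up local (1-dim) and global (2-dim) features.
    combined = combine_contexts([[1.0], [2.0]], [[0.5, 0.5], [0.1, 0.2]])
    print(combined)

    # Placeholder predictor (counts alanines); a trained model would go here.
    ranked = prioritize(["AAC", "ACC", "AAA"], lambda seq: float(seq.count("A")), 2)
    print(ranked)
```

In practice the placeholder predictor would be replaced by a supervised model trained on the combined features and the measured fitness of characterized variants.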

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 17, 2020.
Subject Area

  • Bioinformatics