bioRxiv
New Results

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang
doi: https://doi.org/10.1101/2023.03.04.531015
Sean R. Johnson
1 New England Biolabs, 240 County Road, Ipswich, MA 01938, United States
Xiaozhi Fu
2 Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
Sandra Viknander
2 Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
Clara Goldin
2 Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
Sarah Monaco
3 Invitae, 1400 16th St, San Francisco, CA 94103, United States
Aleksej Zelezniak
2 Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
4 Institute of Biotechnology, Life Sciences Centre, Vilnius University, Sauletekio al. 7, LT10257 Vilnius, Lithuania
5 Randall Centre for Cell & Molecular Biophysics, King’s College London, New Hunt’s House, Guy’s Campus, SE1 1UL London, UK
  • For correspondence: aleksej.zelezniak@chalmers.se, yang.kevin@microsoft.com
Kevin K. Yang
6 Microsoft Research New England, 1 Memorial Drive, Cambridge, MA, 02142, United States
  • For correspondence: aleksej.zelezniak@chalmers.se, yang.kevin@microsoft.com

Abstract

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 440 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved experimental success rates by 44-100%. Surprisingly, neither sequence identity to natural sequences nor AlphaFold2 residue-confidence scores were predictive of enzyme activity. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants to test experimentally.
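
The abstract describes generated sequences by their identity to the most similar natural sequence (70-90%). As an illustration only, the sketch below shows one simple way such a value could be computed. It assumes the sequences have already been aligned to a common length with "-" as the gap character, and it is not the authors' pipeline. The function names (percent_identity, max_identity_to_natural, in_identity_window) and the toy sequences are hypothetical.

```python
from typing import Iterable


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap ('-') are skipped; a gap
    opposite a residue counts as a mismatch. This is one of several
    common identity definitions, chosen here as an assumption.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def max_identity_to_natural(candidate: str, naturals: Iterable[str]) -> float:
    """Identity of a candidate to its most similar natural sequence."""
    return max(percent_identity(candidate, n) for n in naturals)


def in_identity_window(candidate: str, naturals: Iterable[str],
                       low: float = 70.0, high: float = 90.0) -> bool:
    """True if the closest-natural identity lies in [low, high] percent."""
    return low <= max_identity_to_natural(candidate, naturals) <= high


# Toy, made-up aligned sequences (not from the study).
naturals = ["MKTAYIAKQR", "MKSAYLAKQR"]
generated = "MKTAYIAKQL"
print(max_identity_to_natural(generated, naturals))
print(in_identity_window(generated, naturals))
```

In practice the alignment step matters: identity computed over a global pairwise alignment, a local alignment, or a shared multiple sequence alignment can differ, so the definition should be fixed before sequences are binned by identity.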

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Add evaluation of previously published chorismate mutase and lysozyme datasets.

  • https://doi.org/10.5281/zenodo.7688667

  • https://github.com/seanrjohnson/protein_scoring

  • https://github.com/seanrjohnson/protein_gibbs_sampler

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 07, 2023.

Subject Area

  • Biochemistry