bioRxiv
New Results

State-of-the-art estimation of protein model accuracy using AlphaFold

James P. Roney, Sergey Ovchinnikov
doi: https://doi.org/10.1101/2022.03.11.484043
James P. Roney
*Harvard College, Cambridge, MA, USA
Sergey Ovchinnikov
†John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
  • For correspondence: so@fas.harvard.edu

Abstract

Predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments (MSAs) of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning a highly accurate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function, and that it uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned potential function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore several applications of this potential function, including the prediction of protein structures without MSAs.
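
As a rough illustration of what "ranking the quality of candidate structures" involves, the sketch below orders a set of candidate structures by a model-assigned score and measures how well that ordering agrees with their true accuracy. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the scores and accuracy values are made-up numbers, the helper names `ranks` and `spearman` are hypothetical, and Spearman rank correlation is used here only as one common way to quantify ranking agreement.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one model-assigned score and one true accuracy
# value per candidate structure (illustrative numbers only).
predicted_score = [0.91, 0.40, 0.75, 0.62]
true_accuracy = [0.88, 0.35, 0.80, 0.50]

# Candidate indices ordered from highest to lowest predicted score.
best_first = sorted(
    range(len(predicted_score)),
    key=lambda i: predicted_score[i],
    reverse=True,
)
rho = spearman(predicted_score, true_accuracy)
```

A correlation near 1 means the score orders the candidates almost exactly as their true accuracy would; a value near 0 means the score carries little ranking information.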

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • The current version of this draft has some noteworthy differences from the previous version. In our original draft, we investigated two choices for the one-hot encoded sequence associated with the decoy structures: a sequence of all alanines, and the target sequence. We later discovered that we had used an incorrect encoding for the target sequence (this error is described in detail in the accompanying code release). Consequently, our reported results for the target sequence are significantly different in this draft. We also switched from using a sequence of all alanines to a sequence of all gap tokens to avoid any bias towards alanine-rich sequences, although these sequences gave highly similar results. Our previous draft also experimented with combining multiple choices of decoy sequence to create even stronger ranking results, but we ultimately decided it was simpler to focus on the results from using the gap sequence. Finally, we added an applications section to the main text, which expands upon the decoy generation results that were previously included in the appendix. This revision also includes new experiments exploring applications to protein design and mutation effect prediction, the latter of which is described in Appendix E.

  • https://github.com/jproney/AF2Rank
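
The sequence choices discussed in the footnote above (the target sequence, a sequence of all alanines, and a sequence of all gap tokens) can be sketched as one-hot encodings. This is a minimal sketch under assumptions: the 21-symbol alphabet, its ordering, the helper `one_hot`, and the example sequence are all hypothetical and do not reproduce AlphaFold's actual token indices, which are defined in the linked code release.

```python
# Assumed alphabet: the 20 standard amino acids plus "-" as a gap token.
# The ordering is illustrative and is NOT AlphaFold's internal ordering.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}


def one_hot(sequence: str) -> list[list[int]]:
    """Encode each symbol as a row with a single 1 at its alphabet index."""
    return [
        [1 if INDEX[symbol] == column else 0 for column in range(len(ALPHABET))]
        for symbol in sequence
    ]


# Hypothetical target sequence, for illustration only.
target_sequence = "MKTAYIAK"
length = len(target_sequence)

target_encoding = one_hot(target_sequence)    # residue identities kept
alanine_encoding = one_hot("A" * length)      # every position is alanine
gap_encoding = one_hot("-" * length)          # every position is a gap token
```

All three encodings have the same shape, one row per residue position. Only the target encoding carries residue identities, while the alanine and gap encodings are identical at every position.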

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 19, 2022.
Subject Area

  • Biophysics