Abstract
Recent advances in molecular modeling of protein structures are changing the field of structural biology. AlphaFold-2 (AF2), an AI system developed by DeepMind, Inc., utilizes attention-based deep learning to predict models of protein structures with high accuracy relative to structures determined by X-ray crystallography and cryo-electron microscopy (cryoEM). Comparisons of AF2 models to structures determined using solution NMR data have revealed both high similarities and distinct differences. Since AF2 was trained on X-ray crystal and cryoEM structures, we assessed how accurately AF2 can model small, monomeric, solution protein NMR structures that (i) were not used in the AF2 training data set, and (ii) did not have homologous structures in the Protein Data Bank at the time of AF2 training. We identified nine open-source protein NMR data sets for such “blind” targets, including chemical shift data, raw NMR FID data, NOESY peak lists, and (for one case) 15N-1H residual dipolar coupling data. For these nine small (70-108 residues) monomeric proteins, we generated AF2 prediction models and assessed how well these models fit the experimental NMR data, using several well-established NMR structure validation tools. In most of these cases, the AF2 models fit the NMR data nearly as well as, or sometimes better than, the corresponding NMR structure models previously deposited in the Protein Data Bank. These results provide benchmark NMR data for assessing new NMR data analysis and protein structure prediction methods. They also document the potential for using AF2 as a guiding tool in protein NMR data analysis, and more generally for hypothesis generation in structural biology research.
Highlights
AF2 models assessed against NMR data for 9 monomeric proteins not used in training.
AF2 models fit NMR data almost as well as the experimentally determined structures.
RPF-DP, PSVS, and PDBStat software provide structure quality and RDC assessment.
RPF-DP analysis using AF2 models suggests multiple conformational states.
Competing Interest Statement
GTM is a founder of Nexomics Biosciences, Inc. This does not represent a conflict of interest for this study.
Footnotes
Ethan H. Li email: ethan.h.li{at}gmail.com
Laura Spaman email: spamal{at}rpi.edu
Roberto Tejero email: roberto.tejero{at}uv.es
Yuanpeng J. Huang email: huangy26{at}rpi.edu
Theresa A. Ramelot email: ramelt2{at}rpi.edu
Keith J. Fraga email: fragak2{at}rpi.edu
James H. Prestegard email: jpresteg{at}ccrc.uga.edu
Michael A. Kennedy email: kennedm4{at}miamioh.edu
Gaetano T. Montelione email: monteg3{at}rpi.edu
https://github.rpi.edu/RPIBioinformatics/BlindAssessmentMonomericAF2Data
Abbreviations
- AF: AlphaFold
- AF2: AlphaFold2
- AI: artificial intelligence
- BMRB: BioMagResDataBank
- CASP: Critical Assessment of Protein Structure Prediction
- DL: Deep Learning
- FID: Free Induction Decay data
- GDT: Global Distance Test
- NESG: Northeast Structural Genomics Consortium
- NOE: nuclear Overhauser effect
- NOESY: NOE spectroscopy
- PAG: PolyAcrylamide Gel (stretched)
- PDB: Protein Data Bank
- PEG: Polyethylene Glycol
- pLDDT: predicted Local Distance Difference Test
- PSVS: Protein Structure Validation Software suite
- RDC: Residual Dipolar Coupling
- RMSD: Root Mean Square Deviation
- RPF-DP score: Recall, Precision, F-measure, and Discrimination Power score