bioRxiv
New Results

Robust Classification of Protein Variation Using Structural Modeling and Large-Scale Data Integration

Evan H. Baugh, Riley Simmons-Edler, Christian L. Müller, Rebecca F. Alford, Natalia Volfovsky, Alex E. Lash, Richard Bonneau
doi: https://doi.org/10.1101/029041
Evan H. Baugh
1Department of Biology, New York University, NY, NY 10003
3New York University Center for Genomics and Systems Biology, NY, NY 10003
  • For correspondence: rbonneau@simonsfoundation.org
Riley Simmons-Edler
1Department of Biology, New York University, NY, NY 10003
3New York University Center for Genomics and Systems Biology, NY, NY 10003
Christian L. Müller
2Computer Science Department, New York University, NY, NY 10003
3New York University Center for Genomics and Systems Biology, NY, NY 10003
5Simons Center for Data Analysis, Simons Foundation, NY, NY 10010
Rebecca F. Alford
6Carnegie Mellon University Department of Chemistry, 5000 Forbes Ave, Pittsburgh, PA, 15289
7Commack High School, Commack NY, 11725
Natalia Volfovsky
4Simons Foundation, NY, NY 10010
Alex E. Lash
4Simons Foundation, NY, NY 10010
Richard Bonneau
1Department of Biology, New York University, NY, NY 10003
2Computer Science Department, New York University, NY, NY 10003
3New York University Center for Genomics and Systems Biology, NY, NY 10003
4Simons Foundation, NY, NY 10010
5Simons Center for Data Analysis, Simons Foundation, NY, NY 10010
  • For correspondence: rbonneau@simonsfoundation.org

ABSTRACT

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than on detailed interpretation of variant deleteriousness, and they frequently use only sequence-based or only structure-based information. We present VIPUR, a computational framework that integrates sequence analysis and structural modeling (using the Rosetta protein modeling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9,477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism’s proteome with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate genes associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
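
For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how AUROC and AUPR are commonly computed from binary labels (1 = deleterious, 0 = neutral) and classifier scores. It is an illustration only, not VIPUR's evaluation code: the function names and the toy data are hypothetical, and AUPR is approximated here by average precision, which is one common estimator among several.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive.

    Items are ranked by descending score; tied scores keep input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy data: 1 = deleterious variant, 0 = neutral variant.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

On this toy ranking, eight of the nine positive/negative pairs are ordered correctly, so the AUROC is 8/9. AUROC does not depend on class balance, whereas AUPR does, which is one reason the two are often reported together.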

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted October 16, 2015.

Subject Area

  • Bioinformatics