New Results

Structure-Based Function Prediction using Graph Convolutional Networks

Vladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Kyunghyun Cho, Tommi Vatanen, Daniel Berenberg, Bryn Taylor, Ian M. Fisk, Ramnik J. Xavier, Rob Knight, Richard Bonneau
doi: https://doi.org/10.1101/786236
Vladimir Gligorijevic
Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  • For correspondence: vgligorijevic@flatironinstitute.org rb133@nyu.edu
P. Douglas Renfrew
Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Tomasz Kosciolek
Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Malopolska Centre of Biotechnology, Krakow, Poland
Julia Koehler Leman
Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Kyunghyun Cho
Facebook AI Research; CIFAR Azrieli Global Scholar; Center for Data Science, New York University, New York, NY, USA
Tommi Vatanen
Broad Institute of MIT and Harvard, Cambridge, MA, USA; The Liggins Institute, University of Auckland, Auckland, New Zealand
Daniel Berenberg
Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Bryn Taylor
Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Ian M. Fisk
Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
Ramnik J. Xavier
Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Gastrointestinal Unit, and Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Rob Knight
Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Richard Bonneau
Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA
  • For correspondence: vgligorijevic@flatironinstitute.org rb133@nyu.edu

Abstract

Recent massive increases in the number of sequences available in public databases challenge current experimental approaches to determining protein function. These approaches are limited by both the large scale of these sequence databases and the diversity of protein functions. We present a deep learning Graph Convolutional Network (GCN) trained on sequence and structural data and evaluate it on ~40k proteins with known structures and functions from the Protein Data Bank (PDB). Our GCN predicts functions more accurately than both Convolutional Neural Networks trained on sequence data alone and competing methods. Feature extraction via a language model removes the need for constructing multiple sequence alignments or feature engineering. Our model learns general structure-function relationships, robustly predicting functions of proteins with ≤ 30% sequence identity to the training set. Using class activation mapping, we can automatically identify, at the residue level, the structural regions that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. De-noising inherent in the trained model allows only a minor drop in performance when predicted structures are used, including those from multiple de novo protocols. We use our method to annotate all proteins in the PDB, making several new confident function predictions spanning both fold and function trees.
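
To make the idea of a graph convolution over protein structure concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard symmetric-normalized graph convolution, a contact map built from C-alpha coordinates with a hypothetical 10 Å cutoff, sum pooling over residues, and a sigmoid output per function label. All function names, the cutoff, and the weight shapes are illustrative assumptions, and the random features stand in for the language-model features mentioned in the abstract.

```python
import numpy as np

def contact_map(coords, threshold=10.0):
    """Binary residue-residue contact map from C-alpha coordinates of shape (L, 3).

    The 10 Angstrom cutoff is an assumed, illustrative value.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < threshold).astype(float)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, with self-loops enforced."""
    a = adj.copy()
    np.fill_diagonal(a, 1.0)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph convolution: mix each residue with its contacts, then ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)

def predict_function_scores(coords, features, w_gcn, w_out):
    """Per-label scores in (0, 1) for one protein (hypothetical toy model)."""
    a_norm = normalize_adjacency(contact_map(coords))
    hidden = gcn_layer(a_norm, features, w_gcn)   # (L, hidden_dim)
    pooled = hidden.sum(axis=0)                   # protein-level embedding
    logits = pooled @ w_out                       # (n_labels,)
    return 1.0 / (1.0 + np.exp(-logits))          # independent sigmoid per label

# Toy usage: 3 residues, 4 input features, 5 hidden units, 2 function labels.
rng = np.random.default_rng(0)
toy_coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [20.0, 0.0, 0.0]])
toy_scores = predict_function_scores(
    toy_coords,
    rng.normal(size=(3, 4)),   # stand-in for per-residue sequence features
    rng.normal(size=(4, 5)),
    rng.normal(size=(5, 2)),
)
```

In this sketch the per-residue hidden activations are what a class-activation-style analysis would inspect to attribute a prediction to specific residues; the attribution step itself is not shown.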

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted October 04, 2019.
Subject Area

  • Bioinformatics