bioRxiv
New Results

Protein embeddings and deep learning predict binding residues for various ligand classes

Maria Littmann, Michael Heinzinger, Christian Dallago, Konstantin Weissenow, Burkhard Rost
doi: https://doi.org/10.1101/2021.09.03.458869
Maria Littmann
1TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  • For correspondence: littmann@rostlab.org assistant@rostlab.org
Michael Heinzinger
1TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
2TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
Christian Dallago
1TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
2TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
Konstantin Weissenow
1TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
2TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
Burkhard Rost
1TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
3Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany & TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
4Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, New York, NY 10032, USA

Abstract

One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress, many binding sites remain obscure. Here, we propose bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein language model ProtT5 as input. Using only single sequences, without creating multiple sequence alignments (MSAs), its deep learning component bindEmbed21DL outperformed existing MSA-based methods. Combination with homology-based inference increased performance to F1=29±6%, F1=24±7%, and F1=41±% for metal ions, nucleic acids, and small molecules, respectively; it reached F1=45±2% when merging all three ligand classes into one. Focusing on very reliably predicted residues could complement experimental evidence: of the 25% most strongly predicted binding residues, at least 73% were correctly predicted, even when counting missing annotations as incorrect. The new method bindEmbed21 is fast, simple, and broadly applicable, using neither structures nor MSAs. Applied to human proteins, it predicted binding residues in over 42% of all human proteins not otherwise implicated in binding.
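The abstract reports three kinds of residue-level numbers: an F1 score per ligand class, an F1 score after merging all three classes into a single "binds anything" label, and the precision of the most strongly predicted binding residues, with unannotated residues counted as wrong. The following is a minimal sketch of how such metrics can be computed from per-residue scores. The class keys, the 0.5 decision threshold, and all function names are illustrative assumptions, not taken from bindEmbed21 or the bindPredict repository.

```python
from typing import Dict, List, Sequence

# Hypothetical keys for the three ligand classes named in the abstract.
LIGAND_CLASSES = ("metal", "nucleic", "small")


def f1(pred: Sequence[bool], true: Sequence[bool]) -> float:
    """Residue-level F1 = 2TP / (2TP + FP + FN); 0.0 if there is no true positive."""
    tp = sum(1 for p, t in zip(pred, true) if p and t)
    fp = sum(1 for p, t in zip(pred, true) if p and not t)
    fn = sum(1 for p, t in zip(pred, true) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def binarize(scores: Dict[str, List[float]], threshold: float = 0.5) -> Dict[str, List[bool]]:
    """Call a residue binding for a class if its score reaches the assumed threshold."""
    return {c: [s >= threshold for s in v] for c, v in scores.items()}


def merge_classes(labels: Dict[str, List[bool]]) -> List[bool]:
    """A residue counts as binding if it binds any of the three ligand classes."""
    return [any(flags) for flags in zip(*(labels[c] for c in LIGAND_CLASSES))]


def top_fraction_precision(scores: Sequence[float], true: Sequence[bool],
                           fraction: float = 0.25, threshold: float = 0.5) -> float:
    """Precision over the `fraction` most strongly predicted binding residues.

    Residues without a binding annotation count as false positives, i.e.
    missing annotations are treated as incorrect predictions.
    """
    predicted = sorted(((s, t) for s, t in zip(scores, true) if s >= threshold),
                       key=lambda pair: pair[0], reverse=True)
    if not predicted:
        return 0.0
    k = max(1, int(len(predicted) * fraction))  # keep at least one residue
    return sum(1 for _, t in predicted[:k] if t) / k


# Toy per-residue scores and annotations for a 4-residue protein (made-up data).
example_scores = {
    "metal":   [0.9, 0.1, 0.7, 0.2],
    "nucleic": [0.1, 0.8, 0.1, 0.1],
    "small":   [0.2, 0.1, 0.6, 0.4],
}
example_truth = {
    "metal":   [True, False, False, False],
    "nucleic": [False, True, False, False],
    "small":   [False, False, True, True],
}

if __name__ == "__main__":
    pred = binarize(example_scores)
    for c in LIGAND_CLASSES:
        print(c, round(f1(pred[c], example_truth[c]), 3))
    print("any", round(f1(merge_classes(pred), merge_classes(example_truth)), 3))
    print("top 25% precision (metal)",
          top_fraction_precision(example_scores["metal"], example_truth["metal"]))
```

The sketch pools all residues into one set; it does not reproduce the ± confidence intervals given above, and how the original evaluation averages over proteins may differ.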

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/Rostlab/bindPredict

  • Abbreviations used

    AI
    artificial intelligence (expanding ML through deep learning, i.e., using more free parameters);
    CI
    confidence interval;
    CNN
    convolutional neural network;
    HBI
    homology-based inference;
    (p)LM
    (protein) language model;
    MCC
    Matthews correlation coefficient;
    ML
    machine learning;
    MSA
    multiple sequence alignment;
    PDB
    Protein Data Bank;
    PIDE
    pairwise sequence identity;
    SOTA
    state-of-the-art;
    SVM
    support vector machine.
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
    Posted September 05, 2021.
    Subject Area

    • Bioinformatics