bioRxiv
New Results

Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences

Andonis Gerardos, Nicola Dietler, Anne-Florence Bitbol
doi: https://doi.org/10.1101/2021.11.22.469574
Andonis Gerardos
1Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
2SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
Nicola Dietler
1Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
2SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
Anne-Florence Bitbol
1Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
2SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
  • For correspondence: anne-florence.bitbol@epfl.ch

Abstract

Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while degrading it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural dataset, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.

Author summary

In protein sequence data, the amino acid usages at different sites of a protein or of two interacting proteins can be correlated because of functional constraints. For instance, the need to maintain physico-chemical complementarity between two sites that are in contact in the three-dimensional structure of a protein complex causes such correlations. However, correlations can also arise due to shared evolutionary history, even in the absence of any functional constraint. While these phylogenetic correlations are known to obscure the inference of structural contacts, we show, using controlled synthetic data, that correlations from structure and phylogeny combine constructively to allow the inference of protein partners from sequences. We also show that pairs of amino acids that are not in contact in the structure have a major impact on partner inference in a natural dataset and in realistic synthetic ones. These findings explain the success of methods based on pairwise maximum-entropy models or on information theory at predicting protein partners from sequences.
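
As a toy illustration of the kind of inter-site correlation discussed above, the following sketch computes a plug-in estimate of the mutual information between two columns of a paired alignment. It is not the authors' scoring procedure, which is not described on this page; the function name `mutual_information` and the four-sequence example columns are hypothetical, and no pseudocounts or sequence weighting are applied.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in nats) between two alignment columns.

    col_a[i] and col_b[i] are the residues observed at two sites in the
    i-th paired sequence. This is a bare frequency-count estimate and an
    illustrative assumption, not the method used in the preprint.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))  # counts of residue pairs (a, b)
    freq_a = Counter(col_a)             # single-site counts at site A
    freq_b = Counter(col_b)             # single-site counts at site B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log(p_ab * n * n / (freq_a[a] * freq_b[b]))
    return mi


# Hypothetical columns: site B perfectly tracks site A in the first case
# and is statistically independent of it in the second.
correlated = mutual_information("AAKK", "EEDD")
independent = mutual_information("AAKK", "EDED")
```

In this toy case the perfectly covarying columns give a mutual information of ln 2, and the independent columns give 0. Such a score alone cannot tell whether a correlation arises from a structural contact or from shared evolutionary history, which is the distinction the summary above addresses.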

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted November 22, 2021.

Subject Area

  • Bioinformatics