Cell Systems
Volume 12, Issue 10, 20 October 2021, Pages 969-982.e6

Article
D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions

https://doi.org/10.1016/j.cels.2021.08.010
Under a Creative Commons license
open access

Highlights

  • Method to predict protein-protein interactions from primary amino acid sequences

  • Resulting predictions enable network clustering and functional module detection

  • Efficient genome-scale PPI prediction helps to tackle the genome-to-phenome problem

  • Application in bovine rumen reveals links between metabolism and the immune system

Summary

We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model that predicts interaction between two proteins using only their sequences and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human protein-protein interactions (PPIs) enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and, focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where few data are available.

Keywords

protein-protein interaction
deep learning
language models
interpretability
genome to phenome
module detection
function prediction
cow rumen
metabolism
embedding



4 These authors contributed equally

5 Lead contact