New Results

An interpretable machine learning algorithm to predict disordered protein phase separation based on biophysical interactions

Hao Cai, Robert M. Vernon, Julie D. Forman-Kay
doi: https://doi.org/10.1101/2022.07.06.499043
Hao Cai
1 Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
Robert M. Vernon
1 Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
Julie D. Forman-Kay
1 Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
2 Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
  • For correspondence: forman@sickkids.ca

Abstract

Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase separation prediction algorithms are available, but many are specific to particular classes of proteins, and others provide results that cannot be interpreted in terms of the contributing biophysical interactions. Here we describe LLPhyScore, a new predictor of IDR-driven phase separation, based on a broad set of physical interactions or features. LLPhyScore uses sequence-based statistics for these interactions derived from the RCSB PDB database of folded structures, and is trained on a manually curated set of phase separation driver proteins against different negative training sets, including the PDB and the human proteome. Competitive training across a variety of physicochemical interactions shows that solvent contacts, disorder, hydrogen bonds, pi-pi contacts, and kinked-beta structure are the most important, with electrostatics, cation-pi contacts, and the absence of helical secondary structure also contributing. LLPhyScore has strong phase separation prediction recall statistics and enables a quantitative breakdown of the contribution from each physical feature to a sequence’s phase separation propensity. The tool should be a valuable resource for guiding experiments and providing hypotheses for protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
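The idea of a per-feature breakdown can be illustrated with a toy additive score. The sketch below is not the LLPhyScore implementation, which the abstract does not specify; it only assumes that each physical feature contributes a sum of per-residue weights and that the total is the sum over features. The function names, the two feature names, and all weight values are hypothetical placeholders chosen for illustration.

```python
from typing import Dict

# Hypothetical per-residue weights for two illustrative features.
# Placeholder values only; these are NOT LLPhyScore parameters.
TOY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "pi_pi": {"Y": 1.0, "F": 0.8, "G": 0.5},
    "electrostatics": {"R": 0.5, "K": 0.25, "D": -0.25, "E": -0.25},
}


def feature_breakdown(
    sequence: str, weights: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Sum residue weights per feature; residues without a weight count as 0."""
    return {
        feature: sum(table.get(residue, 0.0) for residue in sequence)
        for feature, table in weights.items()
    }


def total_score(breakdown: Dict[str, float]) -> float:
    """Combine the per-feature contributions into one additive score."""
    return sum(breakdown.values())


# Toy sequence fragment, chosen only to exercise both features.
breakdown = feature_breakdown("GYGRGY", TOY_WEIGHTS)
print(breakdown)               # per-feature contributions
print(total_score(breakdown))  # overall toy score
```

Because the total is a plain sum, each feature's contribution can be read directly from the dictionary, which is the sense in which an additive score of this kind is interpretable.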

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 06, 2022.
Subject Area

  • Biophysics