SAPH-ire TFx – A Recommendation-based Machine Learning Model Captures a Broad Feature Landscape Underlying Functional Post-Translational Modifications

Nolan English; Matthew Torres

doi:10.1101/731026

ABSTRACT

Protein post-translational modifications (PTMs) are a rapidly expanding feature class of significant importance in cell biology. Due to a high burden of experimental proof, the number of functional PTMs in the eukaryotic proteome is currently underestimated. Furthermore, not all PTMs are functionally equivalent. Therefore, computational approaches that can confidently recommend the functional potential of experimental PTMs are essential. To address this challenge, we developed SAPH-ire TFx (https://saphire.biosci.gatech.edu/): a multi-feature neural network model and web resource optimized for recommending experimental PTMs with high potential for biological impact. The model is rigorously benchmarked against independent datasets and alternative models, exhibiting unmatched performance in the recall of known functional PTM sites and the recommendation of PTMs that were later confirmed experimentally. An analysis of feature contributions to model outcome provides further insight into the need for multiple rather than single features to capture the breadth of functional data in the public domain.

Contact: mtorres35@gatech.edu

Supplementary Information: See Tables S1-S6 and Figures S1-S4.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

New analysis of model feature contributions in SAPH-ire TFx conducted using Local Interpretable Model-Agnostic Explanations (LIME). Mock-experimental validation by re-analysis of newly discovered functional PTM data. New analysis of phosphosite type recall across multiple models. New SAPH-ire TFx recommendations for likely functional PTMs that intersect with functional Short Linear Motifs (SLiMs) and disease-linked single nucleotide polymorphisms (SNPs).
https://saphire.biosci.gatech.edu

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.