Abstract
Phosphoproteomic experiments routinely observe thousands of phosphorylation sites. To understand the intracellular signaling processes that generated these data, one or more causal protein kinases must be assigned to each phosphosite. However, limited knowledge of kinase specificity typically restricts assignments to a small subset of the kinome. Starting from a statistical model of a high-throughput, in vitro kinase-substrate assay, I have developed an approach to high-coverage, multi-label kinase-substrate assignment called IV-KAPhE (“In vivo-Kinase Assignment for Phosphorylation Evidence”). Tested on human data, IV-KAPhE outperforms other methods of similar scope. Such computational methods generally predict a densely connected kinase-substrate network, with most sites targeted by multiple kinases, pointing either to unaccounted-for biochemical constraints or to significant cross-talk and signaling redundancy. I show that such predictions can potentially identify biased kinase-site misannotations within families of closely related kinase isoforms and that they provide a robust basis for kinase activity analysis.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
In addition to numerous updates to the text, the following important revisions have occurred:
- A new analysis assessing IV-KAPhE's relative performance in providing assignments for kinase activity analysis has been added.
- The PhosphoSitePlus (PSP) training set was updated to a more recent version.
- Only in vivo annotations from PSP are used for training the model.
- Pre-computed assignments (Supplemental Table S1) have been expanded to include all human sites in PSP as well as the sites from Ochoa et al. 2020. This expands the scope of the kinase-substrate network (Figure 5) and isoform (Figure 6) analyses.
- An unfiltered, all-vs-all version of Table S1 is provided on the data archive site Zenodo, along with files to facilitate in vitro-model scoring of new sites.
- A mistake was corrected in the isoform analysis (Figure 6), in which ProtMapper annotations were erroneously included with PSP annotations.
- Predictive performance of individual in vivo features is assessed in a new panel in Figure 4 (panel b), and a ROC analysis of the competing assignment methods is carried out in Figure 4h.
- A mistake has been corrected in which kinases that GPS 5.0 covers but that were not assigned to any site in the test set (when they should have had at least one) were not counted. This affects GPS 5.0's test set coverage (higher than previously reported) and its performance (lower than previously reported).