Abstract
In eukaryotes, mRNA abundance is often a poor proxy for protein abundance [1–5]. Despite this, most methods used to dissect function in mammalian biology [6] and to discover biomarkers in complex diseases [7] rely on manipulating or measuring mRNA. The discrepancy between mRNA and protein abundance is likely due to several factors, including differences in translation and degradation rates between proteins and cell types [8], unequal contributions of individual splice variants to the production of a given protein [9], and cell-type-specific differences in splice variant use [10]. Here we performed experimental and computational time-series analysis of RNA-seq and mass spectrometry data from three key immune cell types in humans and mice, and constructed mathematical mixed time-delayed splice variant models to predict protein abundances. These models had median correlations with protein abundance measurements of 0.79–0.94, a significant increase over the previously reported 0.21 on Human Protein Atlas data [1], and outperformed simpler models that lacked multiple splice variants and time delays in cross-validation tests. We showed the importance of our models for biomarker discovery by re-analysing RNA-seq data from five different complex diseases, which led to the prediction of new disease proteins that were validated in multiple sclerosis. Our findings suggest that similar protein abundance models may be created for the most critical cell types in the human body.
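The abstract does not state the model equations, so the following is only a minimal sketch of what a time-delayed splice variant model could look like, not the authors' implementation. It assumes a linear form in which protein abundance at time t is a weighted sum of splice variant mRNA levels at time t − τ, with one shared delay τ chosen by grid search and the weights fitted by ordinary least squares. The function name `fit_delayed_model`, the synthetic time series, and the candidate delays are all hypothetical.

```python
import numpy as np


def fit_delayed_model(times, variants, protein, delays):
    """Fit protein(t) ~ sum_i w_i * variant_i(t - tau) for each candidate tau.

    Hypothetical illustration: a linear model with one delay shared by all
    splice variants. Returns (best_tau, weights, sum_of_squared_errors).
    """
    best = None
    for tau in delays:
        # Evaluate every variant at the delayed time points.
        # np.interp clamps to the first/last value outside the sampled range.
        shifted = np.column_stack(
            [np.interp(times - tau, times, v) for v in variants]
        )
        # Ordinary least squares for the per-variant weights.
        weights, *_ = np.linalg.lstsq(shifted, protein, rcond=None)
        sse = float(np.sum((shifted @ weights - protein) ** 2))
        if best is None or sse < best[2]:
            best = (tau, weights, sse)
    return best


# Synthetic example (made-up data, not from the study):
# two splice variants and a protein that follows them with a 2-unit delay.
times = np.arange(0.0, 13.0)
variant_a = np.sin(times / 2.0) + 2.0
variant_b = np.cos(times / 3.0) + 2.0
protein = (
    2.0 * np.interp(times - 2.0, times, variant_a)
    + 0.5 * np.interp(times - 2.0, times, variant_b)
)

best_tau, weights, sse = fit_delayed_model(
    times, [variant_a, variant_b], protein, delays=[0.0, 1.0, 2.0, 3.0, 4.0]
)
print(best_tau, weights, sse)
```

In this toy setting the grid search recovers the delay and weights used to generate the data. A realistic analysis would also need noise handling, per-protein or per-variant delays, and cross-validation, none of which is shown here.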