Abstract
Transcriptomic data are widely available, but the extent to which they predict protein abundances remains debated. Using multiple public databases, we calculate the variability of mRNA levels and of mRNA-to-protein ratios across human tissues. We use this variability to quantify protein abundance predictability for each gene and to classify genes by how confidently their protein abundances can be predicted. We propose that such predictability is best understood as a spectrum. For more predictable genes, mRNA levels combined with a gene-specific, tissue-independent mRNA-to-protein ratio explain ∼80% of protein abundance variance, compared with ∼55% for less predictable genes. Protein abundance predictability is consistent with independent mRNA and protein data from two disparate cell lines, and mRNA-to-protein ratios estimated from publicly available databases have predictive power in these independent datasets. Genes with higher predictability are enriched for metabolic functions, roles in tissue development and cell differentiation, and transmembrane transporter activity. Genes with lower predictability are associated with cell adhesion, motility and organization; the immune system; and the cytoskeleton. Surprisingly, many genes that regulate mRNA-to-protein ratios are constitutively expressed yet also exhibit ratio variability. This suggests a general autoregulatory mechanism that either allows changes in protein expression profiles to be implemented quickly or uses homeostatic sensing to stabilize protein abundances under fluctuating conditions. We provide the gene classifications and their mRNA-to-protein ratios as a resource to help others predict protein abundances.