ABSTRACT
Protein abundance is defined by transcriptional, post-transcriptional and post-translational regulatory mechanisms. Understanding the code for gene expression could inform novel therapies. Here, we develop a machine learning pipeline, termed SONAR, to decipher the endogenous sequence code that determines mRNA and protein abundance in human cells. SONAR predicts up to 63% of protein abundance independent of promoter or enhancer information, and reveals a strong, yet dynamic, cell type-specific sequence code. The sequence knowledge captured by SONAR provides a map of biologically active sequence features (SFs), which we leveraged to manipulate protein expression and tailor it to a specific cell type. Beyond its fundamental findings, our work provides novel means to improve immunotherapies and biotechnology applications.
One Sentence Summary
SONAR reveals the cell type-specific sequence code for mRNA and protein expression in human immune cells.
Competing Interest Statement
B.P.N. and M.C.W. filed a patent application related to this work.