RT Journal Article SR Electronic T1 Machine learning-based investigation of the cancer protein secretory pathway JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.09.09.289413 DO 10.1101/2020.09.09.289413 A1 Rasool Saghaleyni A1 Azam Sheikh Muhammad A1 Pramod Bangalore A1 Jens Nielsen A1 Jonathan L. Robinson YR 2020 UL http://biorxiv.org/content/early/2020/09/10/2020.09.09.289413.abstract AB Deregulation of the protein secretory pathway (PSP) is linked to many hallmarks of cancer, such as promoting tissue invasion and modulating cell-cell signaling. The collection of secreted proteins processed by the PSP, known as the secretome, is often studied due to its potential as a reservoir of tumor biomarkers. However, there has been less focus on the protein components of the secretory machinery itself. We therefore investigated the expression changes in secretory pathway components across many different cancer types. Specifically, we implemented a dual approach involving differential expression analysis and machine learning to identify PSP genes whose expression was associated with key tumor characteristics: mutation of p53, cancer status, and tumor stage. Eight different machine learning algorithms were included in the analysis to enable comparison between methods and to focus on signals that were robust to algorithm type. The machine learning approach was validated by identifying PSP genes known to be regulated by p53, and even outperformed the differential expression analysis approach. Among the different analysis methods and cancer types, the kinesin family members KIF20A and KIF23 were consistently among the top genes associated with malignant transformation or tumor stage. However, unlike most cancer types which exhibited elevated KIF20A expression that remained relatively constant across tumor stages, renal carcinomas displayed a more gradual increase that continued with increasing disease severity. 
Collectively, our study demonstrates the complementary nature of a combined differential expression and machine learning approach for analyzing gene expression data, and highlights key PSP components relevant to features of tumor pathophysiology that may constitute potential therapeutic targets.
Author Summary: The secretory pathway is a series of intracellular compartments and enzymes that process and export proteins from the cell to the surrounding environment. Dysfunction of the secretory pathway is associated with many diseases, including cancer, and therefore constitutes a potential target for novel therapeutic strategies. The large number of interacting components that comprise the secretory pathway poses a challenge when attempting to identify where the dysfunction originates and/or how to restore healthy function. To improve our understanding of how the secretory pathway is changed within tumors, we used gene expression data from normal tissue and tumor samples from thousands of individuals, covering many different cancer types. The data were analyzed using various machine learning algorithms, which we trained to predict sample characteristics such as disease severity. This training quantified the relative degree to which each gene was associated with the tumor characteristic, allowing us to predict which secretory pathway components were important for processes such as tumor progression, both within specific cancer types and across many different cancer types. Our approach demonstrated excellent performance compared to traditional gene expression analysis methods and identified several secretory pathway components with strong evidence of involvement in tumor development.
Competing Interest Statement: The authors have declared no competing interest.