Abstract
We introduce PICS (Pathway Informed Classification System), a method for classifying cancers from tumor-sample gene expression levels. The method clearly separates the samples of a pan-cancer dataset by tissue of origin and can also sub-classify individual cancer datasets into distinct survival classes. Gene expression values are collapsed into pathway scores, which reveal the biological activities most useful for clustering cancer cohorts into subtypes. Variants of the method allow it to be used on datasets with or without non-cancerous samples. Activity levels of all pathway types, broadly grouped into metabolic, cellular processes and signaling, and immune system pathways, are useful for separating the pan-cancer cohort. When clustering specific cancer types, however, certain pathway types become more valuable depending on the site being studied: signaling pathways dominate for lung cancer, signaling and metabolic pathways for pancreatic cancer, and immune system pathways for melanoma. This work suggests the utility of pathway-level genomic analysis and points toward using pathway classification to predict the efficacy and side effects of drugs and radiation.