RT Journal Article
SR Electronic
T1 Reframing Breast Cancer Molecular Subtypes: Modeling Expression Patterns with Orthogonal Dimensions
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 133397
DO 10.1101/133397
A1 Michael J Madsen
A1 Stacey Knight
A1 Carol Sweeney
A1 Rachel Factor
A1 Mohamed Salama
A1 Venkatesh Rajamanickam
A1 Bryan E Welm
A1 Sasi Arunachalam
A1 Brandt Jones
A1 Kerry Rowe
A1 Melissa Cessna
A1 Alun Thomas
A1 Lawrence H. Kushi
A1 Bette J Caan
A1 Philip S Bernard
A1 Nicola J Camp
YR 2017
UL http://biorxiv.org/content/early/2017/06/06/133397.abstract
AB Complex diseases can be highly heterogeneous. To characterize molecular heterogeneity, feature selection methods are often used to identify genes that capture key expression differences in the transcriptome. These gene sets are used in prediction algorithms to define distinct subtypes. Molecular subtyping has been used extensively in cancers and found to be informative for clinical care, e.g., PAM50 (Prediction Analysis of Microarray 50) for breast cancer. However, many tissues do not fit neatly into a single archetypal subtype. We propose that expression diversity can be more comprehensively represented with multiple quantitative dimensions, and that improved methods to model heterogeneity will generate new discoveries. Here, we apply principal components analysis to PAM50 gene expression data from 911 population-based breast tumors and identify orthogonal dimensions. These dimensions not only recapitulate categorical breast intrinsic subtypes, but also include dimensions not previously recognized. Furthermore, while 238 familial breast tumors (non-BRCA1/2 high-risk pedigrees) were not significantly enriched by intrinsic subtype, two novel expression dimensions were highly enriched in the pedigrees. Proof-of-concept gene-mapping using these dimensions identified a 0.5 Mb genomewide significant region at 12q15 (p = 2.6×10⁻⁸) segregating to 8 breast cancers through 32 meioses.
The region contains CNOT2, a gene controlling cell viability via the CCR4-NOT transcriptional regulatory deadenylase complex. These findings suggest that the multiple dimension approach is a flexible and powerful method to characterize tissue expression within a defined feature set. Our results support the hypothesis that germline susceptibilities influence tumor characteristics and that expression dimensions partition genetic heterogeneity, providing new avenues for germline genetic studies.