Abstract
This study uses new techniques of topological data analysis to demonstrate the relationship between breast cancer assays and important biological modules. Specifically, we map the landscape of breast tumor molecular states and integrate the information provided by the PAM50 intrinsic subtypes and the 21-gene Oncotype DX Recurrence Score. We modify the Mapper tool, which provides a visual network representation of a high-dimensional dataset, to incorporate relevant gene sets and stratification functions informed by pre-existing research on breast tumor profiling, mammary basal and luminal-epithelial cell types, and prognostication schemes. We use this customized tool to analyze mRNA profiles of TCGA (The Cancer Genome Atlas) breast tumors, METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) breast tumors, and GTEx (Genotype-Tissue Expression) normal mammary tissue samples. The unsupervised analysis locates the basal-like, HER2-enriched, normal-like, Luminal A, and Luminal B breast tumor subtypes on a graphical summary of the breast tumor mRNA expression profiles restricted to assay-relevant genes. The method illuminates the inherent stratification of breast cancer types that is implied by the effectiveness of the Oncotype DX and PAM50 gene sets.
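For readers unfamiliar with Mapper, the generic construction can be sketched in a few lines. This is a toy illustration only, not the customized tool described here: the function names (`mapper_graph`, `_components`), the parameters (`n_intervals`, `overlap`, `eps`), and the choice of single-linkage clustering at a fixed distance threshold are all assumptions made for the example. The `filter_values` argument plays the role that a stratification function would play, assigning one number to each sample.

```python
from itertools import combinations
import math


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.25, eps=1.0):
    """Toy Mapper: cover the filter range with overlapping intervals,
    cluster the points falling in each interval, and link clusters
    that share points. All names and defaults here are hypothetical."""
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    pad = overlap * width  # each interval is widened by this much on both sides

    nodes = []  # each node is a frozenset of point indices
    for k in range(n_intervals):
        a = lo + k * width - pad
        b = lo + (k + 1) * width + pad
        members = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(_components(points, members, eps))

    # An edge joins two nodes whenever their clusters share at least one point.
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


def _components(points, members, eps):
    """Single-linkage clusters: connected components of the graph that
    joins two points when their Euclidean distance is at most eps."""
    unseen = set(members)
    clusters = []
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


# Usage: 12 points on a unit circle, filtered by their x-coordinate.
circle = [
    (math.cos(math.radians(30 * k)), math.sin(math.radians(30 * k)))
    for k in range(12)
]
nodes, edges = mapper_graph(
    circle, [x for x, _ in circle], n_intervals=3, overlap=0.3, eps=0.6
)
```

On this circle the sketch yields four nodes joined in a single cycle, so the network summary preserves the loop in the data. In an expression-profile setting the points would be tumor samples restricted to a chosen gene set, and the filter would be a score computed per sample.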