TY  - JOUR
T1  - PatternMarkers and Genome-Wide CoGAPS in Analysis in Parallel Sets (GWCoGAPS) for data-driven detection of novel biomarkers via whole transcriptome Non-negative matrix factorization (NMF)
JF  - bioRxiv
DO  - 10.1101/083717
SP  - 083717
AU  - Genevieve Stein-O’Brien
AU  - Jacob Carey
AU  - Wai-shing Lee
AU  - Michael Considine
AU  - Alexander Favorov
AU  - Emily Flam
AU  - Theresa Guo
AU  - Lucy Li
AU  - Luigi Marchionni
AU  - Thomas Sherman
AU  - Shawn Sivy
AU  - Daria Gaykalova
AU  - Ronald McKay
AU  - Michael Ochs
AU  - Carlo Colantuoni
AU  - Elana Fertig
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/26/083717.abstract
N2  - Summary: NMF algorithms associate gene expression changes with biological processes (e.g., time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarker identification. Therefore, we developed a novel PatternMarkers statistic to extract unique genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with PatternMarkers requires whole-genome data. However, NMF algorithms typically do not converge for the tens of thousands of genes in genome-wide profiling. Therefore, we also developed GWCoGAPS, the first robust Bayesian NMF technique for whole-genome transcriptomics, using the sparse MCMC algorithm CoGAPS. The software includes additional analytic and visualization tools, including the Shiny web application patternMatcher, which are generalized for any NMF.
Using these tools, we find granular brain-region- and cell-type-specific signatures with corresponding biomarkers in GTEx data, illustrating the unique ability of GWCoGAPS and PatternMarkers to detect data-driven biomarkers from whole-genome data. Availability: PatternMarkers and GWCoGAPS are in the CoGAPS Bioconductor package as of version 3.5 under the GPL license. Contact: CColantu{at}jhmi.edu; ejfertig{at}jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
ER  - 