RT Journal Article
SR Electronic
T1 PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 083717
DO 10.1101/083717
A1 Genevieve L Stein-O’Brien
A1 Jacob L Carey
A1 Wai-shing Lee
A1 Michael Considine
A1 Alexander V Favorov
A1 Emily Flam
A1 Theresa Guo
A1 Sijia Li
A1 Luigi Marchionni
A1 Thomas Sherman
A1 Shawn Sivy
A1 Daria A Gaykalova
A1 Ronald D McKay
A1 Michael F Ochs
A1 Carlo Colantuoni
A1 Elana J Fertig
YR 2016
UL http://biorxiv.org/content/early/2016/10/28/083717.abstract
AB Summary: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g., time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel PatternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with PatternMarkers requires whole-genome data. However, NMF algorithms typically do not converge for the tens of thousands of genes in genome-wide profiling. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. This software contains analytic and visualization tools including a Shiny web application, patternMatcher, which are generalized for any NMF. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTex data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. Availability: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. Contact: gsteinobrien{at}jhmi.edu; ccolantu{at}jhmi.edu; ejfertig{at}jhmi.edu