RT Journal Article
SR Electronic
T1 MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA-seq data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 020842
DO 10.1101/020842
A1 Greg Finak
A1 Andrew McDavid
A1 Masanao Yajima
A1 Jingyuan Deng
A1 Vivian Gersuk
A1 Alex K. Shalek
A1 Chloe K. Slichter
A1 Hannah W. Miller
A1 M. Julianna McElrath
A1 Martin Prlic
A1 Peter S. Linsley
A1 Raphael Gottardo
YR 2015
UL http://biorxiv.org/content/early/2015/06/15/020842.abstract
AB Single-cell transcriptomic profiling enables the unprecedented interrogation of gene expression heterogeneity in rare cell populations that would otherwise be obscured in bulk RNA sequencing experiments. The stochastic nature of transcription is revealed in the bimodality of single-cell transcriptomic data, a feature shared across single-cell expression platforms. There is, however, a paucity of computational tools that take advantage of this unique characteristic. We present a new methodology to analyze single-cell transcriptomic data that models this bimodality within a coherent generalized linear modeling framework. We propose a two-part, generalized linear model that allows one to characterize biological changes in the proportions of cells that are expressing each gene, and in the positive mean expression level of that gene. We introduce the cellular detection rate, the fraction of genes turned on in a cell, and show how it can be used to simultaneously adjust for technical variation and so-called “extrinsic noise” at the single-cell level without the use of control genes. Our model permits direct inference on statistics formed by collections of genes, facilitating gene set enrichment analysis. The residuals defined by such models can be manipulated to interrogate cellular heterogeneity and gene-gene correlation across cells and conditions, providing insights into the temporal evolution of networks of co-expressed genes at the single-cell level. Using two single-cell RNA-seq datasets, including newly generated data from Mucosal Associated Invariant T (MAIT) cells, we show how model residuals can be used to identify significant changes across biologically relevant gene sets that are missed by other methods and characterize cellular heterogeneity in response to stimulation.
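
The two quantities the abstract describes can be illustrated with a small sketch. It is not the MAST implementation: the function names and toy numbers are hypothetical, and it covers only a two-group comparison with no covariates. In that special case the fitted effect of the discrete (logistic) part equals the log odds ratio of the observed expressing proportions, and the effect of the continuous (Gaussian) part equals the difference in mean expression among expressing cells. The sketch also assumes that a gene counts as "turned on" in a cell when its value is above zero.

```python
import math


def cellular_detection_rate(expr):
    """Fraction of genes with expression > 0 in each cell.

    expr: list of cells, each a list of per-gene expression values.
    """
    return [sum(1 for v in cell if v > 0) / len(cell) for cell in expr]


def two_part_summary(values, groups):
    """Per-group summaries for one gene.

    Returns, for each group, the proportion of expressing cells and the
    mean expression among expressing cells only.
    """
    summary = {}
    for group in sorted(set(groups)):
        vals = [v for v, g in zip(values, groups) if g == group]
        positive = [v for v in vals if v > 0]
        summary[group] = {
            "prop_expressing": len(positive) / len(vals),
            "positive_mean": (
                sum(positive) / len(positive) if positive else float("nan")
            ),
        }
    return summary


def two_part_contrast(summary, reference, other):
    """Effects of `other` versus `reference` for the two model parts."""
    p_ref = summary[reference]["prop_expressing"]
    p_oth = summary[other]["prop_expressing"]
    # The log odds ratio is undefined when a proportion is exactly 0 or 1.
    if 0 < p_ref < 1 and 0 < p_oth < 1:
        log_or = math.log(p_oth / (1 - p_oth)) - math.log(p_ref / (1 - p_ref))
    else:
        log_or = float("nan")
    mean_diff = (
        summary[other]["positive_mean"] - summary[reference]["positive_mean"]
    )
    return {"log_odds_ratio": log_or, "positive_mean_diff": mean_diff}


# Toy data (made up for illustration): 4 cells x 4 genes.
toy_cells = [
    [0.0, 2.0, 0.0, 1.0],
    [3.0, 2.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
    [1.0, 3.0, 2.0, 5.0],
]
cdr = cellular_detection_rate(toy_cells)

# One gene measured in 8 cells, 4 per condition (also made up).
gene_values = [0.0, 2.0, 4.0, 0.0, 3.0, 5.0, 6.0, 0.0]
gene_groups = ["ctrl"] * 4 + ["stim"] * 4
summary = two_part_summary(gene_values, gene_groups)
contrast = two_part_contrast(summary, "ctrl", "stim")
```

In a fuller analysis the per-cell detection rate would enter both parts of the model as a covariate, which requires fitting actual regressions rather than the closed-form group summaries used here.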