RT Journal Article
SR Electronic
T1 Prior knowledge and sampling model informed learning with single cell RNA-Seq data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 142398
DO 10.1101/142398
A1 Sumit Mukherjee
A1 Yue Zhang
A1 Sreeram Kannan
A1 Georg Seelig
YR 2017
UL http://biorxiv.org/content/early/2017/05/25/142398.abstract
AB Single cell RNA-seq (scRNA-seq) experiments can provide a wealth of information about heterogeneous, multi-cellular systems. However, this information has to be inferred computationally from sequencing reads, which constitute a sparse and noisy sub-sampling of the actual cellular transcriptomes. Here we present UNCURL, a unified framework for scRNA-seq data visualization, cell type identification, and lineage estimation that explicitly accounts for the sequencing process. The main algorithmic novelty is a non-negative matrix factorization method that uses knowledge of the distribution resulting from the sequencing process to more accurately model the underlying cell state matrix. We also develop a systematic way of incorporating prior biological information, such as bulk RNA expression profiles, into the cell state matrix. We find that UNCURL dramatically improves performance over state-of-the-art methods both in the absence and presence of prior knowledge. Finally, we demonstrate that using UNCURL as a data preprocessing tool significantly improves the performance of existing scRNA-seq analysis algorithms.