TY  - JOUR
T1  - Prior knowledge and sampling model informed learning with single cell RNA-Seq data
JF  - bioRxiv
DO  - 10.1101/142398
SP  - 142398
AU  - Sumit Mukherjee
AU  - Yue Zhang
AU  - Sreeram Kannan
AU  - Georg Seelig
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/05/25/142398.abstract
N2  - Single cell RNA-seq (scRNA-seq) experiments can provide a wealth of information about heterogeneous, multi-cellular systems. However, this information has to be inferred computationally from sequencing reads which constitute a sparse and noisy sub-sampling of the actual cellular transcriptomes. Here we present UNCURL, a unified framework for scRNA-seq data visualization, cell type identification and lineage estimation that explicitly accounts for the sequencing process. The main algorithmic novelty is a non-negative matrix factorization method that uses knowledge of the distribution resulting from the sequencing process to more accurately model the underlying cell state matrix. We also develop a systematic way for incorporating prior biological information such as bulk RNA expression profiles into the cell state matrix. We find that UNCURL dramatically improves performance over state-of-the-art methods both in the absence and presence of prior knowledge. Finally we demonstrate that using UNCURL as a data preprocessing tool significantly improves the performance of existing scRNA-seq analysis algorithms.
ER  - 