Abstract
Single-cell RNA-seq (scRNA-seq) experiments can provide a wealth of information about heterogeneous, multi-cellular systems. However, this information has to be inferred computationally from sequencing reads, which constitute a sparse and noisy sub-sampling of the actual cellular transcriptomes. Here we present UNCURL, a unified framework for scRNA-seq data visualization, cell type identification and lineage estimation that explicitly accounts for the sequencing process. The main algorithmic novelty is a non-negative matrix factorization method that uses knowledge of the distribution resulting from the sequencing process to more accurately model the underlying cell state matrix. We also develop a systematic way to incorporate prior biological information, such as bulk RNA expression profiles, into the cell state matrix. We find that UNCURL dramatically improves performance over state-of-the-art methods both in the absence and in the presence of prior knowledge. Finally, we demonstrate that using UNCURL as a data preprocessing tool significantly improves the performance of existing scRNA-seq analysis algorithms.
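To make the idea of a distribution-aware non-negative matrix factorization concrete, the following is a minimal sketch and not the UNCURL implementation. It assumes, purely for illustration, that the sequencing process yields Poisson-distributed counts, and it uses the standard multiplicative updates for the Kullback-Leibler objective, which is equivalent to maximizing a Poisson likelihood. The function names `poisson_nmf` and `poisson_nll`, the gene-by-cell orientation of `X`, and all parameters are hypothetical choices made for this example.

```python
import numpy as np


def poisson_nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative count matrix X (genes x cells) as X ~ Poisson(M @ W).

    Assumption for this sketch: counts are Poisson distributed.
    M (genes x k) holds one expression profile per latent state;
    W (k x cells) holds each cell's non-negative weights over states.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    M = rng.random((n_genes, k)) + eps
    W = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        # Multiplicative update for W (keeps entries non-negative).
        ratio = X / (M @ W + eps)
        W *= (M.T @ ratio) / (M.sum(axis=0)[:, None] + eps)
        # Multiplicative update for M, using the refreshed W.
        ratio = X / (M @ W + eps)
        M *= (ratio @ W.T) / (W.sum(axis=1)[None, :] + eps)
    return M, W


def poisson_nll(X, M, W, eps=1e-10):
    """Poisson negative log-likelihood of X under rate M @ W, up to a constant in X."""
    rate = M @ W + eps
    return float(np.sum(rate - X * np.log(rate)))
```

In this sketch the columns of `W` play the role of a cell state matrix: each column describes one cell as a non-negative mixture of `k` latent states. One way prior information such as bulk expression profiles could enter is by initializing `M` from those profiles instead of at random, although that is an assumption of this example rather than a description of the method itself.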
Footnotes
* Emails: ksreeram{at}uw.edu, gseelig{at}uw.edu