Abstract
Sequencing the transcriptomes of single cells has greatly advanced our understanding of the cellular composition of complex tissues. In many of these systems, heterogeneity has risen to prominence as a determinant of cell type composition and lineage transitions. While much effort has gone into developing tools for the analysis and interpretation of single-cell sequencing data, further advances are required. Optimization-based approaches are under-utilized in single-cell analysis and hold much potential because they can capture global properties of the system in low dimensions. Here we present SoptSC: an optimization-based algorithm for the identification of subpopulation structure, transition paths, and pseudotemporal ordering within a cell population. Based on a measure of similarity between cells, SoptSC uses non-negative matrix factorization to create low-dimensional representations of the data for analysis and visualization. We find that in several examples, the low-dimensional representations produced by SoptSC offer greater potential for insight than those produced by alternative methods. We tested our methods on a simulated dataset and four published single-cell datasets from Homo sapiens and Mus musculus. SoptSC recapitulates a simulated developmental trajectory with greater fidelity than comparable methods. Applied to two datasets on early embryonic development, SoptSC recapitulates known trajectories with high accuracy. Our analysis of murine epidermis agrees overall with previous studies but differs markedly regarding the composition and heterogeneity of the basal compartment. Our analysis of murine myelopoiesis shows that SoptSC can resolve complex hematopoietic subpopulation composition, and it leads to a new prediction regarding the asynchronous development of myeloid subpopulations during stem cell differentiation.