Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for the systematic investigation of cellular diversity. While many computational tools have been developed to identify and visualize cell populations within a single scRNA-seq dataset, methods are still needed to quantitatively and statistically define proportional shifts in cell population structure across datasets, such as the expansion or shrinkage, or the emergence or disappearance, of cell populations. Here we present sc-UniFrac, a framework to statistically quantify compositional diversity in cell populations between single-cell transcriptome landscapes. sc-UniFrac enables sensitive and robust quantification in both simulated and experimental datasets, in terms of both population identity and population size. We demonstrate the utility of sc-UniFrac in multiple applications, including assessment of biological and technical replicates, classification of tissue phenotypes, identification and definition of altered cell populations, and benchmarking of batch correction tools. sc-UniFrac provides a framework for quantifying diversity or alterations in cell populations across conditions, and has broad utility for gaining insight into how cell populations respond to perturbations.