Abstract
We present distinct, a general method for differential analysis of full distributions that is well suited to applications on single-cell data, such as single-cell RNA sequencing and high-dimensional flow or mass cytometry data. High-throughput single-cell data reveal an unprecedented view of cell identity and allow complex variations between conditions to be discovered; nonetheless, most methods for differential expression target differences in the mean and struggle to identify changes where the mean is only marginally affected. distinct is based on a hierarchical non-parametric permutation approach and, by comparing empirical cumulative distribution functions, identifies both differential patterns involving changes in the mean and more subtle variations that do not involve the mean. We performed extensive benchmarks across both simulated and experimental datasets from single-cell RNA sequencing and mass cytometry data, where distinct shows favourable performance, identifies more differential patterns than competitors, and displays good control of false positive and false discovery rates. distinct is available as a Bioconductor R package.
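To make the idea of comparing empirical cumulative distribution functions (ECDFs) through permutations concrete, the following is a minimal Python sketch. It is not the distinct package or its algorithm: the function names, the evenly spaced evaluation grid, the summed absolute ECDF difference used as test statistic, and the shuffling scheme (pooling all cells and redistributing them to samples of the original sizes) are all assumptions made for illustration.

```python
import random
from bisect import bisect_right


def ecdf(values, grid):
    """Evaluate the empirical CDF of `values` at each point of `grid`."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


def ecdf_statistic(group_a, group_b, grid):
    """Summed absolute difference between the group-averaged sample ECDFs.

    Each group is a list of samples; each sample is a list of per-cell values.
    """
    def mean_ecdf(samples):
        curves = [ecdf(s, grid) for s in samples]
        return [sum(col) / len(curves) for col in zip(*curves)]

    fa, fb = mean_ecdf(group_a), mean_ecdf(group_b)
    return sum(abs(a - b) for a, b in zip(fa, fb))


def permutation_pvalue(group_a, group_b, n_perm=200, n_grid=25, seed=0):
    """Permutation p-value for a difference in distribution between groups.

    Illustrative assumption: cells are pooled, shuffled, and dealt back into
    samples of the original sizes to build the null distribution.
    """
    rng = random.Random(seed)
    samples = group_a + group_b
    sizes = [len(s) for s in samples]
    pooled = [v for s in samples for v in s]

    # Evenly spaced grid over the observed range (an illustrative choice).
    lo, hi = min(pooled), max(pooled)
    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]

    observed = ecdf_statistic(group_a, group_b, grid)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Rebuild samples with the original sizes from the shuffled cells.
        perm, start = [], 0
        for size in sizes:
            perm.append(pooled[start:start + size])
            start += size
        if ecdf_statistic(perm[:n_a], perm[n_a:], grid) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the statistic is computed on whole ECDFs rather than on means, a sketch of this kind can respond to two groups that share a mean but differ in spread, which is the type of pattern the abstract describes as hard for mean-based methods.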
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
In the presence of nuisance covariates (e.g., batch effects), we previously used a linear model with such covariates as the only predictors. In formula (3), we have replaced this model with a random effects model that has covariates as fixed effects and samples as random effects. This properly accounts for the variance structure across samples: two measurements from the same sample are now correlated, whereas two measurements from different samples are not.
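The correlation structure described in this note can be illustrated with a generic random-intercept model. This is only a sketch and not a reproduction of formula (3): the notation (measurement $x_{ij}$ for cell $j$ in sample $i$, covariate vector $z_i$, fixed-effect coefficients $\beta$, sample effect $b_i$, residual $\varepsilon_{ij}$) and the assumption of mutually independent sample effects and residuals are introduced here for illustration.

```latex
\begin{aligned}
x_{ij} &= z_i^{\top}\beta + b_i + \varepsilon_{ij},
  \qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
  \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
\operatorname{Cov}(x_{ij}, x_{ij'}) &= \sigma_b^2
  \quad (j \neq j',\ \text{same sample } i), \\
\operatorname{Cov}(x_{ij}, x_{i'j'}) &= 0
  \quad (i \neq i',\ \text{different samples}).
\end{aligned}
```

Under these assumptions the shared term $b_i$ is what induces correlation between two measurements from the same sample, while measurements from different samples share no random term and are therefore uncorrelated.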