1 Abstract
Unsupervised clustering to identify distinct cell types is a crucial step in the analysis of scRNA-seq data. Current clustering methods depend on a number of parameters whose effects on the accuracy and reproducibility of the resulting solution are poorly understood. The adjustment of clustering parameters is therefore ad hoc, with most users deviating minimally from default settings. constclust is a novel meta-clustering method based on the idea that if the data contain distinct populations that a clustering method can identify, meaningful clusters should be robust to small changes in the parameters used to derive them. By reconciling solutions from a clustering method over multiple parameters, we can identify locally robust clusters of cells and their corresponding regions of parameter space. Rather than assigning cells to a single partition of the data set, this approach allows for discovery of discrete groups of cells that can correspond to the multiple levels of cellular identity. Additionally, constclust requires significantly fewer computational resources than current consensus clustering methods for scRNA-seq data. We demonstrate the utility, accuracy, and performance of constclust as part of the analysis workflow. constclust is available at https://github.com/ivirshup/constclust.
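The core idea, that a meaningful cluster should persist under small parameter changes, can be illustrated with a minimal sketch. This is not the constclust implementation or its API: the function names (`clusters_of`, `jaccard`, `robust_clusters`), the single parameter axis, the Jaccard threshold, and the toy labelings are all assumptions made for illustration. The sketch takes cluster labels computed at successive parameter values and keeps any group of cells that recurs, nearly unchanged, across consecutive values.

```python
from typing import FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[int]


def clusters_of(labels: Sequence[int]) -> List[Cluster]:
    """Group cell indices by cluster label (order of first appearance)."""
    groups: dict = {}
    for cell, label in enumerate(labels):
        groups.setdefault(label, set()).add(cell)
    return [frozenset(g) for g in groups.values()]


def jaccard(a: Cluster, b: Cluster) -> float:
    """Jaccard similarity between two sets of cells."""
    return len(a & b) / len(a | b)


def robust_clusters(
    labelings: Sequence[Tuple[float, Sequence[int]]],
    min_jaccard: float = 0.9,  # hypothetical similarity threshold
    min_run: int = 2,          # hypothetical minimum number of parameter values
) -> List[Tuple[Cluster, List[float]]]:
    """Return (cells, parameter values) for clusters that persist across
    at least `min_run` consecutive parameter values.

    `labelings` must be ordered by parameter value.
    """
    finished: List[Tuple[Cluster, List[float]]] = []
    active: List[Tuple[Cluster, List[float]]] = []
    for param, labels in labelings:
        extended = set()  # indices of active runs continued at this parameter
        next_active = []
        for cluster in clusters_of(labels):
            params = [param]
            for i, (prev, prev_params) in enumerate(active):
                if i not in extended and jaccard(cluster, prev) >= min_jaccard:
                    params = prev_params + [param]
                    extended.add(i)
                    break
            next_active.append((cluster, params))
        # Runs with no matching cluster at this parameter value end here.
        finished.extend(run for i, run in enumerate(active) if i not in extended)
        active = next_active
    finished.extend(active)
    return [(cells, params) for cells, params in finished if len(params) >= min_run]


# Toy example: 8 cells clustered at four hypothetical resolution values.
toy = [
    (0.5, [0, 0, 0, 0, 1, 1, 1, 1]),
    (1.0, [0, 0, 0, 0, 1, 1, 2, 2]),
    (1.5, [0, 0, 0, 0, 1, 1, 2, 2]),
    (2.0, [0, 0, 1, 1, 2, 2, 3, 3]),
]
result = robust_clusters(toy)
for cells, params in result:
    print(sorted(cells), params)
```

In this toy input, the group of cells 0 to 3 is stable over the lower resolutions, while the groups {4, 5} and {6, 7} are stable over the higher ones, so the output is a set of cell groups with their parameter ranges rather than a single partition. The method described above reconciles solutions over multiple parameters; this sketch simplifies that to one ordered axis.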
Competing Interest Statement
The authors have declared no competing interest.