Abstract
Single-cell RNA-sequencing (scRNA-seq) has enabled the molecular profiling of thousands to millions of cells simultaneously in biologically heterogenous samples. Currently, common practice in scRNA-seq is to determine cell type labels through unsupervised clustering and the examination of cluster-specific genes. However, even small differences in analysis and parameter choice can greatly alter clustering solutions and thus impose great influence on which cell types are identified. Existing methods largely focus on determining the optimal number of robust clusters, which is not favorable for identifying cells of extremely low abundance due to their subtle contributions towards overall patterns of gene expression. Here we present a carefully designed framework, SCISSORS, which accurately profiles subclusters within major cluster(s) for the identification of rare cell types in scRNA-seq data. SCISSORS employs silhouette scoring for the estimation of heterogeneity of clusters and reveals rare cells in heterogenous clusters by implementing a multi-step, semi-supervised reclustering process. Additionally, SCISSORS provides a method for the identification of marker genes of rare cells, which may be used for further study. SCISSORS is wrapped around the popular Seurat R package and can be easily integrated into existing Seurat pipelines. SCISSORS, including source code and vignettes for two example datasets, is freely available at https://github.com/jrleary/SCISSORS.
Competing Interest Statement
The authors have declared no competing interest.