Evaluation of Cell Type Annotation R Packages on Single Cell RNA-seq Data

Qianhui Huang; Yu Liu; Yuheng Du; Lana X. Garmire

doi:10.1101/827139

Abstract

Annotating cell types is a critical step in single cell RNA-Seq (scRNA-Seq) data analysis. Several supervised and semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations that could provide practical guidelines for these methods are lacking. Moreover, it is not clear whether classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-Seq analysis. In this study, we evaluated ten cell-type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single cell research (Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, SCINA). The other two are repurposed from methods for deconvoluting DNA methylation data: Linear Constrained Projection (CP) and Robust Partial Correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-Seq datasets as well as simulated data. We assessed accuracy through intra-dataset and inter-dataset predictions; robustness to practical challenges such as gene filtering, high similarity among cell types, and an increased number of classification labels; and the ability to detect rare and unknown cell types. Overall, Seurat, SingleR, CP, RPC and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Seurat, SingleR, CP and RPC are also more robust against down-sampling. However, Seurat has a major drawback in predicting rare cell populations, and it is suboptimal at differentiating cell types that are highly similar to each other, whereas SingleR and RPC perform much better in these respects. All code and data are available at: https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.

Footnotes

Added simulation results to address robustness; added results for multi-class rejection accuracy tests; added a summary figure for the different evaluation metrics; improved the comparison with batch correction in methods other than Seurat.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.