Abstract
Single cell RNA sequencing (scRNA-seq) technology has undergone rapid development in recent years, bringing with new challenges in data processing and analysis. This has led to an explosion of tailored analysis methods for scRNA-seq data to address various biological questions. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically evaluate the performance of the many methods available. Here, we designed and carried out a realistic benchmark experiment that included mixtures of single cells or ‘pseudo cells’ created by sampling admixtures of cells or RNA from up to 5 distinct cancer cell lines. Altogether we generated 14 datasets using droplet and plate-based scRNA-seq protocols, compared multiple data analysis methods in combination for tasks ranging from normalization and imputation, to clustering, trajectory analysis and data integration. Evaluation across 3,913 analyses (methods × benchmark dataset combinations) revealed pipelines suited to different types of data for different tasks. Our dataset and analysis present a comprehensive comparison framework for benchmarking most common scRNA-seq analysis tasks.