A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines

Beate Vieth; Swati Parekh; Christoph Ziegenhain; Wolfgang Enard; Ines Hellmann

doi:10.1101/583013

Abstract

The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches, resulting in ∼3,000 pipelines and allowing us to also assess interactions among pipeline steps. We find that the choices of normalisation and library preparation protocol have the biggest impact on scRNA-seq analyses. Specifically, library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
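
The abstract does not spell out how the figure of ∼3,000 pipelines is obtained. As a rough sketch, and only under the assumption that every mapping, imputation, normalisation and DE testing approach is crossed with every other and with each of the nine DE setups, the count can be reproduced as follows. The method labels are placeholders, not the tools evaluated in the study.

```python
from itertools import product

# Counts per step are taken from the abstract; the labels are hypothetical
# placeholders and do not name the actual tools that were evaluated.
mapping = [f"mapping_{i}" for i in range(1, 4)]              # three mapping approaches
imputation = [f"imputation_{i}" for i in range(1, 5)]        # four imputation approaches
normalisation = [f"normalisation_{i}" for i in range(1, 8)]  # seven normalisation approaches
de_testing = [f"de_test_{i}" for i in range(1, 5)]           # four DE testing approaches
de_setups = [f"de_setup_{i}" for i in range(1, 10)]          # nine DE setups

# Every combination of one choice per analysis step.
method_combinations = list(product(mapping, imputation, normalisation, de_testing))
n_method_combinations = len(method_combinations)

# ASSUMPTION: each method combination is evaluated in each DE setup.
# This is one reading of the abstract, not a statement of the authors' design.
n_with_setups = n_method_combinations * len(de_setups)

print(n_method_combinations, n_with_setups)
```

Under this reading, the four analysis steps give 336 method combinations, and crossing them with the nine DE setups gives 3,024, which is consistent with the stated ∼3,000.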

Footnotes

+ hellmann{at}bio.lmu.de
In this revised manuscript, we corrected an error in our analysis concerning alignment, assignment and gene detection rates (Figure 2 and associated Supplementary Figures 3 and 4). The problem was a bug in the function that summarises kallisto equivalence classes to genes. After fixing the bug, we find that kallisto identifies a similar number of genes to STAR, or more, depending on the annotation. We also updated the associated repositories on GitHub and Zenodo.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.