Abstract
The recent rapid spread of single-cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four DE testing approaches. This results in ∼3,000 pipelines and also allows us to assess interactions among pipeline steps. We find that the choices of normalisation method and library preparation protocol have the biggest impact on scRNA-seq analyses. Specifically, library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
Footnotes
hellmann{at}bio.lmu.de
In this revised manuscript, we corrected an error in our analysis of alignment, assignment and gene detection rates (Figure 2 and the associated Supplementary Figures 3 and 4). The error was caused by a bug in the function that summarises kallisto equivalence classes to genes. After fixing the bug, we find that kallisto detects a similar or larger number of genes than STAR, depending on the annotation. We have also updated the associated repositories on GitHub and Zenodo.