Abstract
The recent rapid spread of single-cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches, resulting in ∼3,000 pipelines and allowing us to also assess interactions among pipeline steps. We find that the choices of normalisation method and library preparation protocol have the biggest impact on scRNA-seq analyses. Specifically, library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
Footnotes
hellmann{at}bio.lmu.de
In this revised manuscript, we added the analysis of a real dataset and show that pipeline choices indeed affect the identification and characterization of cell types in scRNA-seq datasets. Furthermore, we investigate the detection biases that lead to the observed differences in the genes found by the different mappers and annotations. Finally, we added a downsampling function to our simulator powsimR, which now allows us to evaluate different sequencing depths and thus improves the comparability between 10X Chromium and the other library preparation methods.