RT Journal Article
SR Electronic
T1 scFlow: A Scalable and Reproducible Analysis Pipeline for Single-Cell RNA Sequencing Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.08.16.456499
DO 10.1101/2021.08.16.456499
A1 Combiz Khozoie
A1 Nurun Fancy
A1 Mahdi M. Marjaneh
A1 Alan E. Murphy
A1 Paul M. Matthews
A1 Nathan Skene
YR 2021
UL http://biorxiv.org/content/early/2021/08/19/2021.08.16.456499.1.abstract
AB Advances in single-cell RNA-sequencing technology over the last decade have enabled exponential increases in throughput: datasets with over a million cells are becoming commonplace. The burgeoning scale of data generation, combined with the proliferation of alternative analysis methods, led us to develop the scFlow toolkit and the nf-core/scflow pipeline for reproducible, efficient, and scalable analyses of single-cell and single-nuclei RNA-sequencing data. The scFlow toolkit provides a higher level of abstraction on top of popular single-cell packages within an R ecosystem, while the nf-core/scflow Nextflow pipeline is built within the nf-core framework to enable compute infrastructure-independent deployment across all institutions and research facilities. Here we present our flexible pipeline, which leverages the advantages of containerization and the potential of Cloud computing for easy orchestration and scaling of the analysis of large case/control datasets by even non-expert users. We demonstrate the functionality of the analysis pipeline from sparse-matrix quality control through to insight discovery with examples of analysis of four recently published public datasets and describe the extensibility of scFlow as a modular, open-source tool for single-cell and single nuclei bioinformatic analyses. Competing Interest Statement: PMM has received consultancy fees from Roche, Adelphi Communications, Celgene, Neurodiem and Medscape. He has received honoraria or speakers' fees from Novartis and Biogen and has received research or educational funds from Biogen, Novartis and GlaxoSmithKline.