Abstract
Because of systematic measurement biases, data normalization is an essential preprocessing step in the analysis of single-cell RNA sequencing (scRNA-seq) data. While a variety of normalization procedures are available for bulk RNA-seq, their suitability for single-cell data is still largely unexplored. Furthermore, the assessment of normalization performance may involve multiple, competing considerations, some of them study-specific. The choice of normalization method can have a large impact on the results of downstream analyses (e.g., clustering, inference of cell lineages, differential expression analysis), so it is critically important to assess the performance of competing methods in order to select a suitable procedure for the study at hand.
We have developed scone, a framework that implements a wide range of normalization procedures in the context of scRNA-seq and enables the assessment of their performance based on a comprehensive set of data-driven performance metrics. The accompanying open-source Bioconductor R software package scone (available at https://bioconductor.org/packages/scone) also provides numerical and graphical summaries of expression measures, data quality assessment, and data-adaptive gene and sample filtering criteria. We demonstrate the effectiveness of scone on a selection of scRNA-seq datasets across a variety of protocols, ranging from plate- to droplet-based methods. We show that scone is able to correctly rank normalization methods according to their performance in a given dataset, and that selecting the best-performing normalization leads to higher agreement with independent validation data than selecting lower-ranked methods.