RT Journal Article
SR Electronic
T1 NormExpression: an R package to normalize gene expression data using evaluated methods
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 251140
DO 10.1101/251140
A1 Zhenfeng Wu
A1 Weixiang Liu
A1 Xiufeng Jin
A1 Deshui Yu
A1 Hua Wang
A1 Gustavo Glusman
A1 Max Robinson
A1 Lin Liu
A1 Jishou Ruan
A1 Gao Shan
YR 2018
UL http://biorxiv.org/content/early/2018/02/07/251140.abstract
AB Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate current normalization methods, different metrics yield inconsistent results. In this study, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods, achieving consistent evaluation results on both bulk RNA-seq and scRNA-seq data from the same library construction protocol. This consistency validates the underlying theory that a successful normalization method simultaneously maximizes the number of uniform genes and minimizes the correlation between the expression profiles of gene pairs. This consistency can also be used to analyze the quality of gene expression data. The gene expression data, normalization methods and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to evaluate methods (particularly data-driven methods or their own methods) and then select the best one for data normalization in gene expression analysis.