Systematic noise degrades gene co-expression signals but can be corrected

BMC Bioinformatics. 2015 Sep 24:16:309. doi: 10.1186/s12859-015-0745-3.

Abstract

Background: In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist, however these methods are aimed towards improving differential expression analysis and their performances have been primarily tested for this application. To our knowledge, the performances of these approaches have never been systematically compared in the context of gene co-expression estimation.

Results: Using simulations we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. Instead we show that a global version of removal of unwanted variation (RUV), a data-driven approach, removes systematic noise but also allows the estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV retrieves the highest gene co-expression values for sets of genes known to interact, but also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes.

Conclusions: Our work raises serious concerns about the quality of many published gene co-expression analyses. RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis. The RUV method as applicable in the context of gene-gene correlation estimation is available as a BioconductoR-package: RUVcorr.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Brain / metabolism*
  • Epilepsy / genetics
  • Epilepsy / metabolism
  • Epilepsy / pathology
  • Gene Expression
  • Humans
  • Linear Models
  • Oligonucleotide Array Sequence Analysis