A novel significance score for gene selection and ranking

Bioinformatics. 2014 Mar 15;30(6):801-7. doi: 10.1093/bioinformatics/btr671. Epub 2012 Feb 9.

Abstract

Motivation: When identifying differentially expressed (DE) genes from high-throughput gene expression measurements, we would like to take both statistical significance (such as P-value) and biological relevance (such as fold change) into consideration. In gene set enrichment analysis (GSEA), a score that combines fold change and P-value is needed for better gene ranking.

Results: We defined a gene significance score, the π-value, by combining expression fold change and statistical significance (P-value), and explored its statistical properties. Compared with various existing methods, the π-value-based approach is more robust in selecting DE genes and achieves the largest area under the receiver operating characteristic curve. We applied the π-value to GSEA and found it comparable to P-value- and t-statistic-based methods, with added protection against false discovery in certain situations. Finally, in a gene functional study of breast cancer profiles, we showed that using the π-value helps elucidate important biological functions that would otherwise be overlooked.
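
As an illustration of how a score might combine fold change with P-value, the following minimal Python sketch assumes a product form: log2 fold change multiplied by −log10(P). The abstract does not state the formula, so this form, the function names `pi_value` and `rank_genes`, and the gene names and numbers are all hypothetical and chosen only for illustration.

```python
import math


def pi_value(log2_fold_change, p_value):
    """Assumed combined score: log2 fold change scaled by -log10(P).

    This product form is an assumption made for illustration; it is not
    taken from the abstract.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return log2_fold_change * -math.log10(p_value)


def rank_genes(stats):
    """Return gene names sorted by absolute score, largest first.

    stats maps a gene name to a (log2_fold_change, p_value) pair.
    """
    scores = {gene: pi_value(fc, p) for gene, (fc, p) in stats.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical example data, not from the study.
genes = {
    "GENE_A": (2.0, 0.01),    # large change, moderate significance
    "GENE_B": (0.1, 1e-8),    # tiny change, very high significance
    "GENE_C": (-1.5, 0.001),  # down-regulated, strong significance
}

ranking = rank_genes(genes)
print(ranking)
```

Under this assumed form, a gene with a very small P-value but a negligible fold change (GENE_B here) ranks below genes that show both a sizeable change and reasonable significance, which is the kind of behavior a combined score is meant to provide.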

Availability: http://gccri.uthscsa.edu/Pi_Value_Supplementary.asp

Contact: xy@ieee.org, cheny8@uthscsa.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism
  • Databases, Genetic
  • Gene Expression
  • Gene Expression Profiling / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • ROC Curve
  • Receptors, Estrogen / metabolism

Substances

  • Receptors, Estrogen