Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets

Sci Rep. 2014 Feb 26;4:4191. doi: 10.1038/srep04191.

Abstract

Gene annotation databases (community-maintained compendiums that describe the biological functions performed by individual genes) are commonly used to evaluate the functional properties of experimentally derived gene sets. Overlap statistics, such as Fisher's exact test (FET), are often employed to assess these associations, but they do not account for non-uniformity in the number of genes annotated to individual functions or in the number of functions associated with individual genes. We find that FET is strongly biased toward overestimating overlap significance when a gene set has an unusually high number of annotations. To correct for these biases, we develop Annotation Enrichment Analysis (AEA), which properly accounts for the non-uniformity of annotations. We show that AEA can identify biologically meaningful functional enrichments that are obscured by numerous false-positive enrichment scores in FET, and we therefore suggest that it be used to assess the biological properties of gene sets more accurately.
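
For reference, the gene-level FET overlap score that the abstract critiques can be sketched as a one-sided hypergeometric upper tail. This is a minimal illustration, not the authors' code and not AEA itself: the function names `overlap_counts` and `fet_overlap_pvalue` are hypothetical, the background is assumed to be every gene present in the annotation dictionary, and query genes absent from that background are simply dropped. Note that every gene counts once, regardless of how many functions it carries, which is the uniformity assumption the abstract says FET does not relax.

```python
from math import comb


def overlap_counts(annotations, gene_set, function):
    """Return (N, K, n, k) for a gene-level overlap test.

    Hypothetical helper. `annotations` maps gene -> set of function labels.
    N: background size (all annotated genes), K: genes annotated to `function`,
    n: query genes found in the background, k: query genes annotated to `function`.
    """
    background = set(annotations)
    query = set(gene_set) & background  # assumption: drop genes outside background
    annotated = {g for g, funcs in annotations.items() if function in funcs}
    return len(background), len(annotated), len(query), len(query & annotated)


def fet_overlap_pvalue(n_background, n_annotated, n_set, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) under the
    hypergeometric null of drawing n_set genes uniformly from the background."""
    if not (0 <= n_annotated <= n_background and 0 <= n_set <= n_background):
        raise ValueError("set sizes must lie within the background")
    if not (0 <= n_overlap <= min(n_set, n_annotated)):
        raise ValueError("overlap must lie between 0 and min(n_set, n_annotated)")
    total = comb(n_background, n_set)
    # Sum the hypergeometric probabilities of every overlap at least as large.
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_set - i)
        for i in range(n_overlap, min(n_set, n_annotated) + 1)
    )
    return tail / total
```

For example, with six annotated genes of which three carry function `F1`, a two-gene query set whose members both carry `F1` gives `overlap_counts(...) == (6, 3, 2, 2)` and a p-value of C(3,2)/C(6,2) = 3/15 = 0.2.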

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Databases, Protein*
  • Molecular Sequence Annotation / methods*
  • Natural Language Processing*
  • Proteome / genetics*
  • Sequence Analysis / methods*

Substances

  • Proteome