1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data

Juergen Cox; Matthias Mann

doi:10.1186/1471-2105-13-S16-S12

BMC Bioinformatics. 2012;13 Suppl 16:S12. doi: 10.1186/1471-2105-13-S16-S12. Epub 2012 Nov 5.

Authors

Juergen Cox¹, Matthias Mann

Affiliation

¹ Department for Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany. cox@biochem.mpg.de

Abstract

Quantitative proteomics now provides abundance ratios for thousands of proteins in response to perturbations. These ratios need to be functionally interpreted and correlated with other types of quantitative genome-wide data, such as the corresponding transcriptome changes. We describe a new method, 2D annotation enrichment, which compares quantitative data from any two 'omics' types in the context of categorical annotation of the proteins or genes. Suitable genome-wide categories are membership of proteins in biochemical pathways, their annotation with gene ontology terms, sub-cellular localization, presence of protein domains or membership in protein complexes. 2D annotation enrichment detects annotation terms whose members show consistent behavior in one or both of the data dimensions. This consistent behavior can be a correlation between the two data types, such as simultaneous up- or down-regulation in both data dimensions, or a lack thereof, such as regulation in one dimension but no change in the other. For the statistical formulation of the test we introduce a two-dimensional generalization of the nonparametric two-sample test. The false discovery rate is stringently controlled by correcting for multiple hypothesis testing. We also describe one-dimensional annotation enrichment, which can be applied to data from a single omics type. The 1D and 2D annotation enrichment algorithms are freely available as part of the Perseus software.
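
The one-dimensional case can be illustrated with a minimal sketch. It is not the Perseus implementation. It assumes that the nonparametric two-sample test is a rank-sum (Mann-Whitney type) test with a normal approximation and no tie correction, and that multiple testing is handled with the Benjamini-Hochberg procedure; the abstract names neither choice. All function names, the rank-based score and the 0.05 threshold are hypothetical and chosen for illustration only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(ranks, member_idx):
    """Two-sided rank-sum test of term members against all other proteins.

    Assumption: normal approximation without tie correction.
    Returns (score, p), where score lies in [-1, 1] and is positive when
    members tend to have larger values than non-members.
    """
    n = len(ranks)
    n1 = len(member_idx)
    n2 = n - n1
    r1 = sum(ranks[i] for i in member_idx)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n + 1) / 12.0)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    score = 2.0 * u1 / (n1 * n2) - 1.0
    return score, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment_1d(values, annotations, fdr=0.05):
    """values: protein -> ratio; annotations: term -> set of proteins.

    Returns term -> (score, p, q) for terms with adjusted p-value <= fdr.
    """
    proteins = list(values)
    index = {p: i for i, p in enumerate(proteins)}
    ranks = average_ranks([values[p] for p in proteins])
    terms, scores, pvals = [], [], []
    for term, members in annotations.items():
        idx = [index[p] for p in members if p in index]
        if 0 < len(idx) < len(proteins):  # need members and non-members
            s, p = rank_sum_test(ranks, idx)
            terms.append(term)
            scores.append(s)
            pvals.append(p)
    qvals = benjamini_hochberg(pvals)
    return {t: (s, p, q)
            for t, s, p, q in zip(terms, scores, pvals, qvals) if q <= fdr}
```

Calling `enrichment_1d(ratios, term_to_proteins)` with a dictionary of log ratios and a dictionary of annotation terms would return only the terms whose members are shifted consistently up or down relative to the rest of the data set. The 2D case described above would need a joint test on the ranks of both data types and is not covered by this sketch.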

MeSH terms

Algorithms
Data Interpretation, Statistical
Genes*
Proteins / chemistry*
Proteomics / statistics & numerical data*
Software

Substances

Proteins