A unified approach to false discovery rate estimation

Korbinian Strimmer

doi:10.1186/1471-2105-9-303

BMC Bioinformatics. 2008 Jul 9;9:303. doi: 10.1186/1471-2105-9-303.

Author

Korbinian Strimmer¹

Affiliation

¹ Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany. strimmer@uni-leipzig.de

Abstract

Background: False discovery rate (FDR) methods play an important role in analyzing high-dimensional data. There are two types of FDR, tail area-based FDR and local FDR, as well as numerous statistical algorithms for estimating or controlling FDR. These differ in terms of underlying test statistics and procedures employed for statistical learning.
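For orientation, the two types of FDR are commonly written in terms of a two-component mixture of null and alternative distributions. The notation below is the standard formulation, not taken from the abstract itself: the symbols \eta_0 (proportion of true nulls), f_0 and F_0 (null density and distribution function), and f_A and F_A (alternative density and distribution function) are assumptions introduced here for illustration, with x read as a p-value so that small values are extreme.

```latex
% Two-component mixture for the observed test statistics
f(x) = \eta_0 f_0(x) + (1 - \eta_0) f_A(x), \qquad
F(x) = \eta_0 F_0(x) + (1 - \eta_0) F_A(x)

% Local FDR (density-based) and tail area-based FDR (distribution-based)
\mathrm{fdr}(x) = \frac{\eta_0 f_0(x)}{f(x)}, \qquad
\mathrm{Fdr}(x) = \frac{\eta_0 F_0(x)}{F(x)}
```

For p-values the null is uniform, so f_0(x) = 1 and F_0(x) = x, and both quantities are determined once \eta_0 and the marginal density f (or its distribution function F) have been estimated.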

Results: A unifying algorithm for simultaneous estimation of both local FDR and tail area-based FDR is presented that can be applied to a diverse range of test statistics, including p-values, correlations, z- and t-scores. This approach is semiparametric and is based on a modified Grenander density estimator. For test statistics other than p-values it allows for empirical null modeling, so that dependencies among tests can be taken into account. The inference of the underlying model employs truncated maximum-likelihood estimation, with the cut-off point chosen according to the false non-discovery rate.
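To make the role of the Grenander estimator concrete, here is a minimal Python sketch for the p-value case only. It is not the fdrtool implementation: it uses the plain (unmodified) Grenander estimator, i.e. the slopes of the least concave majorant of the empirical distribution function, clips the resulting FDR values at 1 instead of applying the paper's modification, and takes the null proportion eta0 as a user-supplied argument rather than estimating it by truncated maximum likelihood. The function names `least_concave_majorant` and `fdr_from_pvalues` are hypothetical.

```python
from bisect import bisect_left


def least_concave_majorant(points):
    """Upper convex hull of points sorted by x (monotone chain scan)."""
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            # cross >= 0 means the middle point lies on or below the chord
            cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def fdr_from_pvalues(pvalues, eta0=1.0):
    """Return (local fdr, tail-area Fdr) for each p-value.

    Assumptions of this sketch: p-values lie in (0, 1], the null is
    uniform, and eta0 is given (eta0=1.0 is the conservative choice).
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")

    xs = sorted(pvalues)
    n = len(xs)

    # Points of the empirical distribution function, starting at the origin.
    points = [(0.0, 0.0)]
    for i, x in enumerate(xs, start=1):
        if x == points[-1][0]:
            points[-1] = (x, i / n)  # ties: keep the highest ECDF value
        else:
            points.append((x, i / n))
    if points[-1][0] < 1.0:
        points.append((1.0, 1.0))

    hull = least_concave_majorant(points)
    kx = [x for x, _ in hull]
    ky = [y for _, y in hull]
    # Grenander density: piecewise-constant, non-increasing slopes of the hull.
    slopes = [(ky[j + 1] - ky[j]) / (kx[j + 1] - kx[j])
              for j in range(len(hull) - 1)]

    result = []
    for p in pvalues:
        j = bisect_left(kx, p) - 1            # segment (kx[j], kx[j+1]]
        density = slopes[j]
        cdf = ky[j] + slopes[j] * (p - kx[j])
        local = min(1.0, eta0 / density) if density > 0 else 1.0
        tail = min(1.0, eta0 * p / cdf)
        result.append((local, tail))
    return result


if __name__ == "__main__":
    example = [0.001, 0.003, 0.004, 0.01, 0.2, 0.45, 0.6, 0.85, 0.97]
    for p, (local, tail) in zip(example, fdr_from_pvalues(example)):
        print(f"p={p:<6} fdr={local:.3f} Fdr={tail:.3f}")
```

Because the estimated distribution function is concave and passes through the origin, the tail area-based value never exceeds the local value at the same p-value, which is the usual relationship between the two quantities under a decreasing density.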

Conclusion: The proposed procedure generalizes a number of more specialized algorithms and thus offers a common framework for FDR estimation that is consistent across test statistics and types of FDR. In a comparative study the unified approach performs on par with the best competing, yet more specialized, alternatives. The algorithm is implemented in R in the "fdrtool" package, available under the GNU GPL from http://strimmerlab.org/software/fdrtool/ and from the R package archive CRAN.

MeSH terms

Algorithms
Biometry / methods
Breast Neoplasms / genetics
Confidence Intervals
Female
HIV / genetics
Humans
Likelihood Functions
Models, Statistical*
Oligonucleotide Array Sequence Analysis
Predictive Value of Tests*
Sample Size
Software*