SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures

Hong-Qiang Wang; Lindsey K Tuominen; Chung-Jui Tsai

doi:10.1093/bioinformatics/btq650

Bioinformatics. 2011 Jan 15;27(2):225-31. doi: 10.1093/bioinformatics/btq650. Epub 2010 Nov 18.

Authors

Hong-Qiang Wang¹, Lindsey K Tuominen, Chung-Jui Tsai

Affiliation

¹ Warnell School of Forestry and Natural Resources, University of Georgia, Athens, GA 30602, USA.

PMID: 21098430
DOI: 10.1093/bioinformatics/btq650

Abstract

Motivation: Estimating the proportion of true null hypotheses (π₀) beforehand plays a critical role in controlling the false discovery rate (FDR) in multiple hypothesis testing. However, hidden complex dependence structures in many genomics datasets distort the distribution of p-values, rendering existing π₀ estimators less effective.

Results: Starting from the basic non-linear model of the q-value method, we developed a simple linear algorithm to probe local dependence blocks. We uncovered a non-static relationship between tests' p-values and their corresponding q-values that is influenced by data structure and π₀. Using an optimization framework, we exploited these findings to devise a Sliding Linear Model (SLIM) that estimates π₀ more reliably under dependence. When tested on a number of simulated datasets with varying dependence structures and on microarray data, SLIM was robust to dependence in estimating π₀. The accuracy of its π₀ estimates suggests that SLIM can be used as a stand-alone tool for predicting significant tests.
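For orientation, the sketch below shows the λ-based π₀ estimator commonly associated with the q-value method, π₀(λ) = #{p > λ} / (m(1 − λ)), which is presumably the kind of model the abstract starts from. It is not SLIM itself: the abstract does not give the sliding linear model's details, so none are reproduced here. The function names, the simulated mixture, and the λ grid are all illustrative assumptions, and the simulation uses independent p-values, which is exactly the setting where dependence effects do not arise.

```python
import random


def pi0_lambda(pvalues, lam):
    """Lambda-based estimate: share of p-values above lam, rescaled by 1 - lam."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    # Cap at 1.0 because a proportion cannot exceed one.
    return min(1.0, above / (m * (1.0 - lam)))


def simulate_pvalues(m, pi0, seed=0):
    """Hypothetical independent mixture (an assumption, not the paper's data)."""
    rng = random.Random(seed)
    n_null = int(round(m * pi0))
    nulls = [rng.random() for _ in range(n_null)]          # uniform under the null
    alts = [rng.random() ** 8 for _ in range(m - n_null)]  # skewed toward 0
    return nulls + alts


pvals = simulate_pvalues(m=20000, pi0=0.8)
estimates = {lam: pi0_lambda(pvals, lam) for lam in (0.1, 0.3, 0.5, 0.7, 0.9)}
for lam, est in estimates.items():
    print(f"lambda={lam:.1f}  pi0_hat={est:.3f}")
```

In this independent setting the estimates sit slightly above the true value of 0.8 for every λ shown, because some alternative p-values still fall above λ. Under dependence the p-value distribution is distorted, which is the situation the abstract says degrades existing estimators.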

Availability: The R code of the proposed method is available at http://aspendb.uga.edu/downloads for academic use.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computer Simulation
Gene Expression Profiling*
Linear Models
Oligonucleotide Array Sequence Analysis
Populus / genetics
Populus / metabolism