SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures

Bioinformatics. 2011 Jan 15;27(2):225-31. doi: 10.1093/bioinformatics/btq650. Epub 2010 Nov 18.

Abstract

Motivation: A prior estimate of the proportion of true null hypotheses (π₀) plays a critical role in controlling the false discovery rate (FDR) in multiple hypothesis testing. However, the hidden, complex dependence structures of many genomics datasets distort the distribution of p-values, rendering existing π₀ estimators less effective.

Results: Starting from the basic non-linear model of the q-value method, we developed a simple linear algorithm to probe local dependence blocks. We uncovered a non-static relationship between tests' p-values and their corresponding q-values that is influenced by data structure and π₀. Using an optimization framework, we exploited these findings to devise a Sliding Linear Model (SLIM) that estimates π₀ more reliably under dependence. When tested on a number of simulated datasets with varying dependence structures and on microarray data, SLIM proved robust to dependence when estimating π₀. The accuracy of its π₀ estimates suggests that SLIM can be used as a stand-alone tool for predicting significant tests.
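
For readers unfamiliar with the quantity being estimated, the sketch below shows the classical single-threshold estimator of the proportion of true null hypotheses that underlies the q-value method: the fraction of p-values above a threshold, rescaled by the width of that interval. It is not the SLIM algorithm and makes no attempt to handle dependence. The function names, the threshold `lam = 0.5`, and the simulated independent p-values are all illustrative assumptions, not taken from the paper or its R code.

```python
import random


def storey_pi0(p_values, lam=0.5):
    """Single-threshold estimate of the proportion of true nulls.

    Null p-values are uniform on [0, 1], so roughly pi0 * m * (1 - lam)
    of them should exceed lam. Solving for pi0 gives the estimate below.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in p_values if p > lam)
    # Cap at 1.0 because a proportion cannot exceed one.
    return min(1.0, above / (m * (1.0 - lam)))


def simulate_p_values(m=10000, pi0=0.8, seed=0):
    """Hypothetical independent test set: uniform nulls plus small-p alternatives."""
    rng = random.Random(seed)
    n_null = int(round(m * pi0))
    null_p = [rng.random() for _ in range(n_null)]
    # Assumption for illustration: alternatives are u ** 5, concentrated near zero.
    alt_p = [rng.random() ** 5 for _ in range(m - n_null)]
    return null_p + alt_p


if __name__ == "__main__":
    p = simulate_p_values()
    print("estimated proportion of true nulls:", round(storey_pi0(p), 3))
```

Under independence this estimator is slightly conservative, since some alternative p-values also fall above the threshold. The abstract's point is that dependence distorts the p-value distribution, so a fixed-threshold estimate of this kind can become unreliable, which is the setting SLIM is designed for.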

Availability: The R code of the proposed method is available at http://aspendb.uga.edu/downloads for academic use.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Gene Expression Profiling*
  • Linear Models
  • Oligonucleotide Array Sequence Analysis
  • Populus / genetics
  • Populus / metabolism