Accurate error control in high dimensional association testing using conditional false discovery rates

James Liley; Chris Wallace

doi:10.1101/414318

Abstract

High-dimensional hypothesis testing is ubiquitous in the biomedical sciences, and informative covariates may be used to improve power. The conditional false discovery rate (cFDR) is a widely used approach suited to the setting where the covariate is a set of p-values for the equivalent hypotheses for a second trait. Although it is related to the Benjamini-Hochberg procedure, it does not readily permit control of the type-1 error rate, and existing methods are over-conservative. We propose a new method for type-1 error rate control based on identifying mappings from the unit square to the unit interval defined by the estimated cFDR, and splitting observations so that each map is independent of the observations it is used to test. We also propose an adjustment to the existing cFDR estimator which further improves power. We show by simulation that, compared to existing methods, the new method more than doubles the potential improvement in power over unconditional analyses. We demonstrate our method on transcriptome-wide association studies, and show that it can be used iteratively, enabling multiple covariates to be used in succession. Our methods substantially improve the power and applicability of cFDR analysis.

Footnotes

cew54{at}cam.ac.uk
A range of changes has been made. The manuscript has been merged with a previous manuscript concerning different ways of estimating the conditional false discovery rate. The chief contributions of the new manuscript are: (1) a much improved type-1 error rate control strategy for cFDR, which improves power relative to previous methods; (2) an improvement to the existing estimator which further improves power; (3) several asymptotic results about the method, demonstrating that the effect of certain troublesome properties is small; (4) enabling and demonstrating iterative use of the procedure; and (5) a comparison of the general cFDR method with PDF-based, parametric and kernel density estimator (KDE)-based approaches.
https://github.com/jamesliley/cfdr
https://github.com/jamesliley/cfdr_pipeline

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.