bioRxiv
New Results

AdaFDR: a Fast, Powerful and Covariate-Adaptive Approach to Multiple Hypothesis Testing

Martin J. Zhang, Fei Xia, James Zou
doi: https://doi.org/10.1101/496372
Martin J. Zhang
Department of Electrical Engineering, Stanford University, Palo Alto, 94304 USA
Fei Xia
Department of Electrical Engineering, Stanford University, Palo Alto, 94304 USA
James Zou
Department of Electrical Engineering, Stanford University, Palo Alto, 94304 USA; Department of Biomedical Data Science, Stanford University, Palo Alto, 94304 USA; Chan-Zuckerberg Biohub, San Francisco, 94158 USA

ABSTRACT

Multiple hypothesis testing is an essential component of modern data science. Its goal is to maximize the number of discoveries while controlling the fraction of false discoveries. In many settings, additional information (covariates) is available for each hypothesis beyond its p-value. For example, in eQTL studies, each hypothesis tests the correlation between a variant and the expression of a gene, and covariates such as the location, conservation and chromatin status of the variant could indicate how likely the association is to be due to noise. However, popular multiple hypothesis testing approaches, such as the Benjamini-Hochberg procedure (BH) and independent hypothesis weighting (IHW), either ignore these covariates or assume a single univariate covariate. We introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. In eQTL analysis of the GTEx data, AdaFDR discovers 32% and 27% more associations than BH and IHW, respectively, at the same false discovery rate (FDR). We prove that AdaFDR controls the false discovery proportion, and show in extensive experiments that it makes substantially more discoveries while controlling FDR. AdaFDR is computationally efficient: it can process more than 100 million hypotheses within an hour, and it allows multi-dimensional covariates with both numeric and categorical values. It also provides exploratory plots that help the user interpret how each covariate affects the significance of hypotheses, making it broadly useful across many applications.
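To make the contrast between a single global threshold and covariate-dependent thresholds concrete, here is a minimal, self-contained sketch. `benjamini_hochberg` implements the standard BH step-up procedure (the baseline named above). `mirror_fdp_estimate` is a hypothetical helper showing one way the false discovery proportion of per-hypothesis thresholds t(x_i) can be estimated: it assumes null p-values are uniform on [0, 1], so the number of p-values above 1 - t(x_i) approximates the number of nulls below t(x_i). This is an illustration under that assumption, not AdaFDR's implementation; the toy p-values and thresholds are made up for demonstration.

```python
def benjamini_hochberg(pvals, alpha=0.1):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(pvals)
    # Indices ordered by increasing p-value
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up: find the largest rank k with p_(k) <= alpha * k / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values
    return sorted(order[:k_max])


def mirror_fdp_estimate(pvals, thresholds):
    """Hypothetical helper: estimate the FDP of per-hypothesis thresholds t_i.

    Assumes null p-values are uniform, so the count of p_i > 1 - t_i
    (the mirror region) approximates the false discoveries among p_i <= t_i.
    """
    discoveries = sum(p <= t for p, t in zip(pvals, thresholds))
    mirrored = sum(p > 1 - t for p, t in zip(pvals, thresholds))
    return mirrored / max(discoveries, 1)


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))  # [0, 1]

    # Toy covariate-dependent thresholds: a looser threshold where a
    # (made-up) covariate suggests signal is more likely.
    toy_p = [0.01, 0.02, 0.99, 0.5]
    toy_t = [0.05, 0.05, 0.05, 0.05]
    print(mirror_fdp_estimate(toy_p, toy_t))  # 0.5
```

A covariate-adaptive method searches over threshold functions t(x) to maximize the number of discoveries subject to an estimated FDP staying below the target level; BH corresponds to the special case where the threshold is the same for every hypothesis.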

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 13, 2018.
Subject Area

  • Bioinformatics