New Results

A practical guide to methods controlling false discoveries in computational biology

Keegan Korthauer, Patrick K Kimes, Claire Duvallet, Alejandro Reyes, Ayshwarya Subramanian, Mingxiang Teng, Chinmay Shukla, Eric J Alm, Stephanie C Hicks
doi: https://doi.org/10.1101/458786
Author affiliations
  • Keegan Korthauer: 1, 2
  • Patrick K Kimes: 1, 2
  • Claire Duvallet: 3, 4
  • Alejandro Reyes: 1, 2
  • Ayshwarya Subramanian: 5
  • Mingxiang Teng: 1, 2
  • Chinmay Shukla: 6
  • Eric J Alm: 3, 4, 5
  • Stephanie C Hicks: 7 (for correspondence: shicks19@jhu.edu)

1 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, 02215 Boston, USA.
2 Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, 02115, Boston, USA.
3 Department of Biological Engineering, MIT, 77 Massachusetts Avenue, Cambridge, USA.
4 Center for Microbiome Informatics and Therapeutics, MIT, 77 Massachusetts Avenue, Cambridge, USA.
5 Broad Institute, 415 Main Street, Cambridge, USA.
6 Biological and Biomedical Sciences Program, Harvard University, Add address, Boston, USA.
7 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street, 21205 Baltimore, USA.

Abstract

Background In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p-values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as “informative covariates” to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigated the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.

Results Methods that incorporate informative covariates were modestly more powerful than classic approaches, and did not underperform classic approaches, even when the covariate was completely uninformative. The majority of methods were successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we found the improvement of the modern FDR methods over the classic methods increased with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses.

Conclusions Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
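
To make the distinction between p-value-only and covariate-aware procedures concrete, the following is a minimal sketch and not the benchmark code from this study. It assumes the Benjamini-Hochberg step-up procedure as a representative classic method (the abstract does not name the methods compared) and uses a simple weighted variant, in which p-values are divided by mean-one weights, as a stand-in for how a covariate might be used to weight hypotheses. The function names `bh_adjust` and `weighted_bh_adjust` and the example weights are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (uses p-values only)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def weighted_bh_adjust(pvals, weights):
    """Illustrative weighted variant: weights stand in for covariate information.

    Assumption: weights are positive; they are rescaled to have mean 1.
    """
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()
    # Hypotheses with larger weights get smaller effective p-values.
    return bh_adjust(np.minimum(p / w, 1.0))


# Example with made-up p-values and hypothetical covariate-derived weights.
pvals = [0.01, 0.04, 0.03, 0.005]
weights = [2.0, 0.5, 0.5, 1.0]
rejected_classic = bh_adjust(pvals) <= 0.05
rejected_weighted = weighted_bh_adjust(pvals, weights) <= 0.05
```

With equal weights the weighted variant reduces to the unweighted procedure, which mirrors the finding above that an uninformative covariate need not hurt performance; the modern methods evaluated in the study derive their weights or groupings in more sophisticated ways than this sketch.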

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted October 31, 2018.
Subject Area

  • Bioinformatics