A simple yet accurate correction for winner's curse can predict signals discovered in much larger genome scans

Bioinformatics. 2016 Sep 1;32(17):2598-603. doi: 10.1093/bioinformatics/btw303. Epub 2016 May 13.

Abstract

Motivation: In genetic studies, statistically significant variants explain far less trait variance than 'sub-threshold' association signals. To size follow-up studies, researchers need accurate estimates of the 'true' effect size at each SNP, e.g. the true mean of odds ratios (ORs)/regression coefficients (RRs) or the Z-score noncentralities. Naïve effect-size estimates suffer from winner's curse bias, which is reduced only by laborious winner's curse adjustments (WCAs). Because Z-score estimates can, in theory, be translated to other scales, we propose a simple method to compute WCA for Z-scores, i.e. to estimate their true means/noncentralities.
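
To illustrate the winner's curse bias mentioned above, here is a toy simulation (a hypothetical setup, not taken from the paper). Each observed Z-score is drawn as its true noncentrality plus standard normal noise. The simulation keeps only SNPs whose |Z| passes an assumed threshold, then compares the mean observed |Z| with the mean true |noncentrality| among the selected SNPs. The function name `winners_curse_demo` and all parameter values are illustrative assumptions.

```python
import random


def winners_curse_demo(n_causal=1000, n_null=19000, mu_causal=3.0,
                       threshold=4.0, seed=1):
    """Toy simulation of winner's curse (hypothetical parameters).

    Observed Z = true noncentrality + N(0, 1) noise. Among SNPs with
    |Z| > threshold, return (number selected, mean |Z|, mean |mu|).
    """
    rng = random.Random(seed)
    # True noncentralities: a block of causal SNPs and many nulls.
    mu = [mu_causal] * n_causal + [0.0] * n_null
    # Observed Z-scores add unit-variance Gaussian noise.
    z = [m + rng.gauss(0.0, 1.0) for m in mu]
    # "Winners" are the SNPs that pass the significance threshold.
    selected = [i for i, zi in enumerate(z) if abs(zi) > threshold]
    mean_abs_z = sum(abs(z[i]) for i in selected) / len(selected)
    mean_abs_mu = sum(abs(mu[i]) for i in selected) / len(selected)
    return len(selected), mean_abs_z, mean_abs_mu


n_sel, naive, truth = winners_curse_demo()
print(n_sel, round(naive, 2), round(truth, 2))
```

Because selection favours SNPs whose noise happened to push them upward, the mean naive |Z| among the winners exceeds their mean true noncentrality.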

Results: WCA of Z-scores shrinks them towards zero, while on the P-value scale, multiple testing adjustment (MTA) shrinks P-values towards one, which corresponds to a Z-score of zero. Thus, WCA on the Z-score scale is a proxy for MTA on the P-value scale. Therefore, to estimate Z-score noncentralities for all SNPs in genome scans, we propose FDR Inverse Quantile Transformation (FIQT). It (i) performs the simpler MTA of P-values using FDR and (ii) obtains noncentralities by back-transforming the MTA-adjusted P-values to the Z-score scale. Compared with competing methods, realistic simulations suggest that FIQT is (i) more accurate and (ii) orders of magnitude more computationally efficient. Applying FIQT to the Psychiatric Genetic Consortium schizophrenia cohort predicts a non-trivial fraction of the sub-threshold signals that become significant in much larger supersamples.
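
The two FIQT steps described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R function. It assumes two-sided P-values and uses Benjamini-Hochberg as the FDR adjustment. It also assumes a hypothetical floor `min_p` so that P-values do not underflow to zero; Z-scores beyond that floor are returned unchanged. The names `bh_adjust` and `fiqt_sketch` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values (same rule as R's p.adjust 'BH')."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, taking the cumulative minimum of p*m/rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adj[i] = running_min
    return adj


def fiqt_sketch(z, min_p=1e-300):
    """Sketch of FDR Inverse Quantile Transformation for a list of Z-scores."""
    # Two-sided P-values: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), floored at min_p.
    p = [max(math.erfc(abs(zi) / math.sqrt(2.0)), min_p) for zi in z]
    # Step (i): multiple testing adjustment on the P-value scale.
    p_adj = bh_adjust(p)
    z_cap = -_STD_NORMAL.inv_cdf(min_p / 2.0)  # |Z| that corresponds to min_p
    out = []
    for zi, pa in zip(z, p_adj):
        if abs(zi) > z_cap:
            out.append(zi)  # beyond the floor: keep the observed value
        else:
            # Step (ii): back-transform to a Z magnitude and restore the sign.
            magnitude = -_STD_NORMAL.inv_cdf(pa / 2.0)
            out.append(math.copysign(magnitude, zi))
    return out


print(fiqt_sketch([5.0, -4.0, 1.0, -0.5, 0.0, 2.2]))
```

Because FDR-adjusted P-values are never smaller than the raw ones, every adjusted Z-score keeps its sign and is no larger in magnitude than the observed one. For example, with 999 Z-scores of 0.1 and one Z-score of 3.0, the FDR step assigns the 3.0 signal the same adjusted P-value as the others, so the sketch shrinks it to about 0.1.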

Conclusions: FIQT is a simple yet accurate WCA method for Z-scores (and for ORs/RRs, via simple transformations).
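
One common way to carry an adjusted Z-score back to the OR or regression-coefficient scale is sketched below. It assumes a Wald statistic, Z = log(OR)/SE (or Z = beta/SE), with the standard error treated as fixed. This is a standard approximation and not necessarily the exact transformation used in the paper. The function names are hypothetical.

```python
import math


def z_to_beta(mu_z, se):
    """Adjusted regression coefficient, assuming Z = beta / SE with SE held fixed."""
    return mu_z * se


def z_to_odds_ratio(mu_z, se_log_or):
    """Adjusted odds ratio, assuming Z = log(OR) / SE(log OR) with SE held fixed."""
    return math.exp(z_to_beta(mu_z, se_log_or))


# Hypothetical SNP: naive Z = 5.0, WCA-adjusted Z = 3.0, SE(log OR) = 0.02.
naive_or = z_to_odds_ratio(5.0, 0.02)
adjusted_or = z_to_odds_ratio(3.0, 0.02)
print(naive_or, adjusted_or)
```

Shrinking Z towards zero therefore shrinks the estimated OR towards 1 (and a coefficient towards 0) for the same standard error.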

Availability and implementation: A 10-line R function implementation is available at https://github.com/bacanusa/FIQT

Contact: sabacanu@vcu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Bias
  • Data Interpretation, Statistical
  • Genome-Wide Association Study*
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide*