RT Journal Article
SR Electronic
T1 FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 019299
DO 10.1101/019299
A1 Bigdeli, Tim B.
A1 Lee, Donghyung
A1 Riley, Brien P.
A1 Vladimirov, Vladimir
A1 Fanous, Ayman H.
A1 Kendler, Kenneth S.
A1 Bacanu, Silviu-Alin
YR 2015
UL http://biorxiv.org/content/early/2015/05/13/019299.abstract
AB Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, variants meeting genome-wide significance criteria often explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods that accurately estimate the means of all (normally distributed) test statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedure of adjusting all Z-score statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper-tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for the multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is i) more accurate and ii) more computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. In a practical application, FIQT applied to Z-scores from the first Psychiatric Genomics Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject-level risk prediction and ii) further improved by modelling the noncentrality of the statistics.
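The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the record: p-values are two-sided, the FDR procedure is Benjamini-Hochberg, and a small floor (`min_p`) guards against numerical underflow. The names `bh_adjust` and `fiqt_sketch` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # and capping adjusted values at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt_sketch(z_scores, min_p=1e-300):
    """Shrink Z-scores via FDR adjustment and inverse quantile transform."""
    # Step i: two-sided p-values, 2 * P(Z > |z|) = erfc(|z| / sqrt(2)),
    # floored at min_p (an assumption) so the back-transform stays finite.
    pvals = [max(math.erfc(abs(z) / math.sqrt(2.0)), min_p) for z in z_scores]
    qvals = bh_adjust(pvals)
    # Step ii: map each adjusted p-value back to an upper-tail Z-score
    # and restore the sign of the original statistic.
    shrunk = []
    for z, q in zip(z_scores, qvals):
        magnitude = -_STD_NORMAL.inv_cdf(q / 2.0)  # q / 2 lies in (0, 0.5]
        sign = (z > 0) - (z < 0)
        shrunk.append(sign * magnitude)
    return shrunk
```

Under these assumptions, every adjusted p-value is at least as large as the unadjusted one, so each returned Z-score is pulled toward zero relative to its input while keeping its sign. Calling `fiqt_sketch([5.0, -4.0, 2.0, -1.0, 0.5, 0.1])` would return six shrunken values in the original order.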