TY - JOUR
T1 - Stop Meta-Analyzing, Start Instrumenting: Maximizing the Predictive Power of Polygenic Scores
JF - bioRxiv
DO - 10.1101/2021.04.09.439157
SP - 2021.04.09.439157
AU - van Kippersluis, Hans
AU - Biroli, Pietro
AU - Galama, Titus J.
AU - von Hinke, Stephanie
AU - Meddens, S. Fleur W.
AU - Muslimova, Dilnoza
AU - Pereira, Rita
AU - Rietveld, Cornelius A.
Y1 - 2021/01/01
UR - http://biorxiv.org/content/early/2021/04/11/2021.04.09.439157.abstract
N2 - Polygenic scores have become the workhorse for empirical analyses in social-science genetics. Because a polygenic score is constructed using the results of finite-sample Genome-Wide Association Studies (GWASs), it is a noisy approximation of the true latent genetic predisposition to a certain trait. The conventional way of boosting the predictive power of polygenic scores is to increase the GWAS sample size by meta-analyzing GWAS results of multiple cohorts. In this paper we challenge this convention. Through simulations, we show that Instrumental Variable (IV) regression using two polygenic scores from independent GWAS samples outperforms the typical Ordinary Least Squares (OLS) model employing a single meta-analysis-based polygenic score in terms of bias, root mean squared error, and statistical power. We verify the empirical validity of these simulations by predicting educational attainment (EA) and height in a sample of siblings from the UK Biobank. We show that between-family IV regression approaches the SNP-based heritabilities, while within-family IV regression provides a tighter lower bound on the direct genetic effect than meta-analysis does. IV estimation improves the predictive power of polygenic scores by 12% (height) to 22% (EA). Our findings suggest that measurement error is a key explanation for hidden heritability (i.e., the difference between SNP-based and GWAS-based heritability), and that it can be overcome using IV regression. We derive the practical rule of thumb that IV outperforms OLS when the correlation between the two polygenic scores used in IV regression is larger than , with N the sample size of the prediction sample. Competing Interest Statement: The authors have declared no competing interest.
ER -