PT  - JOURNAL ARTICLE
AU  - Oliver Pain
AU  - Kylie P. Glanville
AU  - Saskia P. Hagenaars
AU  - Saskia Selzam
AU  - Anna E. Fürtjes
AU  - Héléna A. Gaspar
AU  - Jonathan R. I. Coleman
AU  - Kaili Rimfeld
AU  - Gerome Breen
AU  - Robert Plomin
AU  - Lasse Folkersen
AU  - Cathryn M. Lewis
TI  - Evaluation of Polygenic Prediction Methodology within a Reference-Standardized Framework
AID  - 10.1101/2020.07.28.224782
DP  - 2021 Jan 01
TA  - bioRxiv
PG  - 2020.07.28.224782
4099  - http://biorxiv.org/content/early/2021/02/16/2020.07.28.224782.short
4100  - http://biorxiv.org/content/early/2021/02/16/2020.07.28.224782.full
AB  - Background: The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Methods: Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDPred1, LDPred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value threshold and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. Results: LDPred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs and DBSLMM, with a relative improvement of >10% over other pseudovalidation and infinitesimal methods (lassosum, SBLUP, SBayesR, LDPred1, LDPred2). PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Conclusion: Within a reference-standardized framework, the best polygenic prediction was achieved using LDPred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods. Competing Interest Statement: The authors have declared no competing interest.