Novel Methods for Multi-ancestry Polygenic Prediction and their Evaluations in 3.7 Million Individuals of Diverse Ancestry

Haoyu Zhang; Jianan Zhan; Jin Jin; Jingning Zhang; Thomas U. Ahearn; Zhi Yu; Jared O’Connell; Yunxuan Jiang; Tony Chen; 23andMe Research Team; Montserrat Garcia-Closas; Xihong Lin; Bertram L. Koelsch; Nilanjan Chatterjee

doi:10.1101/2022.03.24.485519

Abstract

Polygenic risk scores (PRS) are becoming increasingly predictive of complex traits, but their subpar performance in non-European populations raises concerns about their clinical application. We develop a powerful and scalable method to calculate PRS using GWAS summary statistics from multi-ancestry training samples by integrating multiple techniques, including clumping and thresholding, empirical Bayes, and super learning. We evaluate the proposed method and a variety of alternatives using large-scale simulated GWAS on ~19 million common variants and large 23andMe Inc. datasets, including up to 800K individuals from four non-European populations, across seven complex traits. The results show that the proposed method can substantially improve PRS performance in non-European populations relative to simple alternatives, and that it performs comparably to or better than a recent method that requires substantially more computational time. Further, our simulation studies provide novel insights into sample size requirements and the effect of SNP density on multi-ancestry risk prediction.