Abstract
Polygenic risk scores (PRS) built from multi-ancestry genome-wide association studies (GWAS), denoted PRSmulti, have the potential to improve PRS accuracy and generalizability across populations. To inform best practices for leveraging the increasing diversity of genomic studies, we used large-scale simulated and empirical data to investigate how ancestry composition, trait-specific genetic architecture, and PRS methodology affect the performance of PRSmulti compared with PRS constructed from single-ancestry GWAS (PRSsingle). In both simulations of six scenarios and empirical analyses of 17 anthropometric and blood panel traits, we showed that PRSmulti was overall more accurate than PRSsingle in understudied target populations, except for a few comparisons in which the understudied population accounted for only a very small proportion of the multi-ancestry GWAS. Further, for traits such as height and mean corpuscular volume, using substantially fewer samples from Biobank Japan (BBJ) may achieve accuracy comparable to that obtained with 320,000 European (EUR) individuals from UK Biobank (UKBB). Finally, we found that combining PRS based on local ancestry-informed GWAS with large-scale EUR-based PRS improved predictive performance over using EUR-based PRS alone in the understudied African (AFR) population, especially for less polygenic traits that have variants with large ancestry-specific effects. Overall, our study provides insights into how ancestry composition and genetic architecture impact polygenic prediction across populations, particularly when sample sizes are imbalanced. Our work also highlights the need to increase diversity in genetic studies to achieve equitable PRS performance across ancestral populations, and it provides practical guidance on developing PRS from multiple resources.
Competing Interest Statement
The authors have declared no competing interest.