Abstract
Studies of the relationship between genetic and phenotypic variation have historically been carried out on people of European ancestry. Efforts are underway to address this limitation, but until they succeed, the legacy of a Eurocentric bias in medical genetic studies will continue to hinder research, including the use of polygenic scores, which are individual-level metrics of genetic risk. Debate continues over whether polygenic scores based on genome-wide association studies (GWAS) conducted in European ancestry samples generalize to non-European ancestry samples. We analyzed the first decade of polygenic scoring studies (2008-2017, inclusive) and found that 67% of studies included exclusively European ancestry participants and another 19% included only East Asian ancestry participants. Only 3.8% of studies were carried out on samples of African, Hispanic, or Indigenous peoples. We found that effect sizes for European ancestry-derived polygenic scores are only 36% as large in African ancestry samples as in European ancestry samples (t=-10.056, df=22, p=5.5×10^-10). Poorer performance was also observed in other non-European ancestry samples. Analysis of polygenic scores in the 1000 Genomes samples revealed many strong correlations with global principal components, and relationships between height polygenic scores and height phenotypes that varied widely with methodological choices in polygenic score construction. As polygenic score use increases in research, precision medicine, and direct-to-consumer testing, improved handling of linkage disequilibrium and variant frequencies across populations (both of which currently reduce the transferability of scores) will improve polygenic score performance. These findings bolster the rationale for large-scale GWAS in diverse human populations.
Significance Statement
The modern genetics revolution enabled rough calculations of individuals’ genetic liability for many phenotypes, including height, weight, and schizophrenia. Polygenic scores, which are individual-level metrics of genetic liability, are increasingly available via direct-to-consumer testing and are already widely used in research. The performance of these scores depends on the availability of very large genetic studies, so it is problematic that people of European ancestry are vastly over-represented in these studies. We quantify the effect of this over-representation on the performance of polygenic scores in global samples and also show ancestry-related properties of polygenic scores. These findings set benchmarks for future progress, and they demonstrate the need for large-scale genetic studies in diverse human populations.
Classification
Biological Sciences – Genetics