Polygenic scoring accuracy varies across the genetic ancestry continuum in all human populations

Polygenic scores (PGS) have limited portability across different groupings of individuals (e.g., by genetic ancestries and/or social determinants of health), preventing their equitable use. PGS portability has typically been assessed using a single aggregate population-level statistic (e.g., $R^2$), ignoring inter-individual variation within the population. Here we evaluate PGS accuracy at individual-level resolution, independent of each individual's annotated genetic ancestries. We show that PGS accuracy varies between individuals across the genetic ancestry continuum in all ancestries, even within traditionally “homogeneous” genetic ancestry clusters. Using a large and diverse Los Angeles biobank (ATLAS, N = 36,778) along with the UK Biobank (UKBB, N = 487,409), we show that PGS accuracy decreases along a continuum of genetic ancestries in all considered populations and that the trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: the Pearson correlation between GD and PGS accuracy, averaged across 84 traits, is -0.95. When applying PGS models trained in UKBB “white British” individuals to European-ancestry individuals of ATLAS, individuals in the highest GD decile have 14% lower accuracy relative to the lowest decile; notably, the lowest GD decile of Hispanic/Latino American ancestry individuals showed PGS performance similar to that of the highest GD decile of European ancestry ATLAS individuals. GD is significantly correlated with PGS estimates themselves for 82 out of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestry in PGS interpretation. Our results highlight the need for moving away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGS and their applications.


Introduction
Polygenic scores (PGS), estimates of an individual's genetic predisposition for complex traits/diseases (i.e., genetic value), are a promising application of large-scale genome-wide association studies (GWAS) to personalized genomic medicine [1-4] and to disease risk prediction and prevention [5-8]. The portability of PGS across different ancestry and socio-demographic groups is limited due to Euro-centric sampling of GWAS data coupled with differences in linkage disequilibrium (LD), minor allele frequency (MAF) and/or disease genetic architecture [3,9-13], which poses a critical equity barrier that has prevented widespread adoption of PGS for personalized medicine. For example, PGS are significantly more accurate for individuals of European ancestries than for individuals of other genetic ancestries [10,14]; furthermore, PGS accuracy varies across socio-genomic features (e.g., sex, age and socioeconomic status) [11], thus complicating interpretability of PGS across groups with different environmental exposures.

PGS accuracy is traditionally assayed using population-level metrics of accuracy (e.g., $R^2$), thus assuming some level of homogeneity across individuals within the considered population [2,11,15]. However, homogeneous populations are an idealized concept that only roughly approximates human populations; human diversity exists along a genetic ancestry continuum without clearly defined clusters and with various correlations between genetic and socio-environmental factors [15-20]. Grouping individuals into genetic ancestry clusters obscures the impact of individual variation on PGS accuracy. This is evident for individuals with recently admixed genomes, where genetic ancestries vary individual-to-individual and locus-to-locus in the genome.
For example, a single population-level PGS accuracy estimated across all African Americans greatly overestimates PGS accuracy for African Americans with large proportions of African genetic ancestries [21]; likewise, coronary artery disease PGS performs poorly in Hispanic individuals with high proportions of African ancestry [22]. The genetic ancestry continuum impacts PGS accuracy even in traditionally labeled "homogeneous/non-admixed" populations; for example, PGS accuracy decays across a gradient of subcontinental ancestries within Europe as the target cohorts become more genetically dissimilar from the data used to train the PGS [19,23]. Assessing PGS accuracy using population-level metrics is further complicated by technical issues in assigning individuals to discrete clusters of genetic ancestries. Different algorithms and/or reference panels may assign the same individual to different clusters [15,23,24] and thus to different PGS accuracy classes. Moreover, many individuals are not assigned to a cluster due to limited reference panels used for genetic ancestry inference [23,25], leaving such individuals outside PGS accuracy characterization; this poses equity concerns as it limits PGS applications to individuals within well-defined clusters of genetic ancestries.

Here we leverage methods that characterize PGS performance at the level of a single target individual [26] to evaluate the impact of the genetic ancestry continuum on PGS accuracy. We use simulation and real data analysis to show that PGS accuracy decays continuously individual-to-individual across the genetic continuum of ancestry as a function of genetic distance (GD) from the PGS training data; GD is defined as a principal component analysis (PCA) projection of the target individual on the training data used to estimate the PGS weights. We leverage a large and diverse Los Angeles biobank at UCLA (ATLAS, N=

Overview of the study
PGS accuracy has traditionally been assessed at the level of discrete genetic ancestry clusters using population-level metrics of accuracy (e.g., $R^2$). Individuals from diverse genetic backgrounds are routinely grouped into discrete genetic ancestry clusters using computational inference methods such as PCA [27] and/or admixture analysis [28] (Figure 1a). Population-level metrics of PGS accuracy are then estimated for each genetic ancestry cluster and generalized to each individual in the cluster (Figure 1b). This approach has three major limitations: (1) the inter-individual variability within each cluster is ignored; (2) the genetic ancestry cluster boundary is sensitive to the algorithms and reference panels used for clustering; and (3) a significant proportion of individuals may not be assigned to any cluster due to a lack of reference panels for genetic ancestry inference (e.g., individuals of uncommon or admixed ancestries).

In this work, we evaluate PGS accuracy across the genetic ancestry continuum at the level of a single target individual. We model the phenotype of individual $i$ as $y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \epsilon_i$, where $\mathbf{x}_i$ is an $M \times 1$ genotype vector indicating allele counts, $\boldsymbol{\beta}$ is an $M \times 1$ allelic causal effects vector and $\epsilon_i$ is random noise. Under a random effects model, $g_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ and $\hat{g}_i = E(\mathbf{x}_i^\top \boldsymbol{\beta} \mid D)$ are random variables, where the randomness comes from $\boldsymbol{\beta}$ and the training data $D$ ($D = (\mathbf{X}_{\text{train}}, \mathbf{y}_{\text{train}})$). We define the individual PGS accuracy as the squared correlation of an individual's genetic value and PGS estimate:

$$r_i^2(g_i, \hat{g}_i) = 1 - \frac{E_D\left(\mathrm{var}_{\beta \mid D}(\mathbf{x}_i^\top \boldsymbol{\beta})\right)}{\mathrm{var}_{\beta}(\mathbf{x}_i^\top \boldsymbol{\beta})} \qquad (1)$$

We use LDpred2 to estimate $E_D\left(\mathrm{var}_{\beta \mid D}(\mathbf{x}_i^\top \boldsymbol{\beta})\right)$ [26,29] and approximate $\mathrm{var}_{\beta}(\mathbf{x}_i^\top \boldsymbol{\beta})$ as the heritability of the phenotype (Methods) [30]; equation 1 can be further simplified assuming all variants are causal with effects drawn from a normal distribution (infinitesimal model, see Methods). As the continuous genetic distance (GD) we use $d_i = \sqrt{\sum_{j=1}^{J} (\mathbf{x}_i^\top \mathbf{v}_j)^2}$, where $\mathbf{v}_j$ is the $j$-th eigenvector of the training genotype data (Figure 1c).
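As a minimal sketch of the two quantities defined above, the Python snippet below computes a genetic distance from PC loadings and an individual accuracy from posterior draws of the effect sizes. It assumes that the eigenvectors of the training genotypes and the posterior samples (for example from an LDpred2 run) are already available. The function names, the centring by the training mean, and the use of a single training set in place of the expectation over training data are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import pvariance


def genetic_distance(x, train_mean, loadings):
    """Euclidean distance of one individual from the training centre in PC space.

    x          : allele counts for one individual (length M)
    train_mean : per-variant mean allele count in the training data (length M)
    loadings   : J eigenvectors v_j of the training genotypes (each of length M)
    """
    # Centre on the training data so the training centre maps to the origin.
    centered = [xi - mi for xi, mi in zip(x, train_mean)]
    # Project onto each eigenvector, then take the Euclidean norm.
    projections = [sum(c * v for c, v in zip(centered, v_j)) for v_j in loadings]
    return math.sqrt(sum(p * p for p in projections))


def individual_accuracy(x, beta_samples, var_genetic_value):
    """Estimate r_i^2 = 1 - var(x^T beta | D) / var(x^T beta).

    beta_samples      : posterior draws of the effect vector (each of length M)
    var_genetic_value : prior variance of the genetic value (heritability here)
    """
    # One genetic value per posterior draw of the effects.
    draws = [sum(xi * b for xi, b in zip(x, beta)) for beta in beta_samples]
    # Population variance of the draws approximates the posterior variance.
    return 1.0 - pvariance(draws) / var_genetic_value


if __name__ == "__main__":
    # Toy inputs, purely illustrative.
    print(genetic_distance([1, 0, 2], [1, 1, 1], [[1, 0, 0], [0, 1, 0]]))
    print(individual_accuracy([1, 1], [[0.1, 0.1], [0.3, 0.1]], 0.25))
```

Under these assumptions, an individual whose posterior draws of the genetic value are widely spread receives a lower accuracy, which is the quantity the study tracks against GD.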
Individuals that are clustered into the same genetic ancestry cluster may have different genetic distances from the training data and different individual PGS accuracy (Figure 1d). We use theory and empirical data analyses to show that PGS accuracy decay is well approximated by the continuous metric of genetic distance.

We organize the manuscript as follows. First, we show the relation between genetic distance and PGS accuracy in simulations using real genotype data from the UK Biobank. Next, we show that existing PGS have accuracy that decreases individual-to-individual as a function of genetic distance in a diverse biobank from UCLA. Finally, we showcase the impact of genetic distance on the interpretability of PGS using height and neutrophil count as example traits.

Individual PGS performance is calibrated across the genetic ancestry continuum in simulations
First, we evaluated the calibration of $E_D\left(\mathrm{var}_{\beta \mid D}(\mathbf{x}_i^\top \boldsymbol{\beta})\right)$ estimated by LDpred2 for individuals at various genetic distances from the UKBB "white British" individuals used to train the PGS by checking the calibration of the 90% credible intervals (Figure 2a). We simulated 100 phenotypes at heritability $h_g^2 = 0.25$ [...] (composed of 19.9% individuals labeled as "Caribbean" and 80.1% labeled as "Nigeria").

Next, we investigated the impact of GD on individual-level PGS accuracy. As expected, the credible interval width increases linearly with GD, reflecting reduced predictive accuracy for the PGS (Figure 2b). The average width of the 90% credible interval is 1.83 in the highest decile of GD, a 1. [...] to R = -0.62 for the Caribbean cluster.

Next, we focused on the impact of GD on PGS accuracy across all ATLAS individuals regardless of genetic ancestry clustering (R = -0.96, P < 2.2e-16, Figure 3b).
Notably, we find a strong overlap of PGS accuracies across individuals from different genetic ancestry clusters, demonstrating the limitation of using a single cluster-specific metric of accuracy. For example, when rank-ordering by GD, we find that the individuals from the closest GD decile in the HL cluster have similar estimated accuracy to the individuals from the farthest GD decile in the EA cluster (average $\hat{r}_i^2$ of 0.71 vs 0.71). This shows that GD enables identification of HL individuals with PGS performance similar to that of the EA cluster, thus partly alleviating inequities due to lack of access to accurate PGS. Most notably, GD can be used to evaluate PGS performance for individuals that cannot be easily clustered by current genetic inference methods (6% of all individuals in ATLAS, Figure 3b), partly due to limitations of reference panels and algorithms for assigning ancestries. Among this traditionally overlooked group of individuals, we find GD ranging from 0.02 to 0.64 and the corresponding estimated PGS accuracy $\hat{r}_i^2$ ranging from 0.63 to 0.21. In addition to evaluating PGS accuracy with respect to the genetic value, we also evaluated accuracy with respect to the residual height obtained by regressing sex, age and PC1-10 out of the measured trait in ATLAS. Using equally spaced bins across the GD continuum, we find that the correlation between PGS and the measured height tracks significantly with GD (R = -0.9, P-value = 5.9e-8, Figure 3c).
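The binned analysis just described can be sketched as follows. This is a minimal illustration assuming the phenotype has already been residualised for covariates; the function names and the `min_size` cut-off for sparsely populated bins are hypothetical and not taken from the study.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def binned_accuracy(gd, pgs, pheno, n_bins=20, min_size=3):
    """Return (mean GD, squared PGS-phenotype correlation) per equal-width GD bin."""
    lo, hi = min(gd), max(gd)
    if hi == lo:
        raise ValueError("genetic distances are all identical; cannot form bins")
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for d, s, y in zip(gd, pgs, pheno):
        # The maximum GD would index one past the end, so clamp it to the last bin.
        k = min(int((d - lo) / width), n_bins - 1)
        bins[k].append((d, s, y))
    results = []
    for members in bins:
        if len(members) < min_size:  # skip bins too small for a correlation
            continue
        d, s, y = zip(*members)
        results.append((sum(d) / len(d), pearson(s, y) ** 2))
    return results
```

Correlating the first element of each returned pair with the second then gives a single summary of how population-level accuracy tracks GD, analogous to the R reported for Figure 3c.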

Having established the coupling of GD with PGS accuracy in simulations and for height, we next turn to the question of whether this relationship is pervasive across complex traits, using PGS for a broad set of 84 traits (Supplementary Table 1). We find consistent and pervasive correlations of GD with PGS accuracy across all considered traits in both ATLAS and UK Biobank (Figure 4). For example, the correlations between GD and individual PGS accuracy range from -0.71 to -0.97 with an average of -0.95 across the 84 PGS in ATLAS, with similar results in UKBB. Traits with sparser genetic architectures and fewer non-zero weights in the PGS yield a weaker correlation between GD and PGS accuracy; we hypothesize this is because GD represents genome-wide genetic variation patterns that may not reflect a limited number of causal SNPs well. For example, the PGS for Lipoprotein A (log_lipoA) has the lowest polygenicity estimate (0.02%) among the 84 traits and has the weakest correlation in ATLAS (-0.71) and UKBB (-0.85). In contrast, we observe a strong correlation between GD and PGS accuracy (magnitude > 0.9) for all traits with an estimated polygenicity > 0.1%. Next, we show that the fine-scale population structure that accounts for individual PGS accuracy variation is also prevalent within traditionally defined genetic ancestry groups. For example, in ATLAS we find that 501 out of 504 trait-ancestry pairs (84 traits across 6 genetic ancestry clusters) have a significant association between GD and individual PGS accuracy after Bonferroni correction. In UKBB, we find that 572 out of 756 trait-ancestry pairs (84 traits across 9 subcontinental genetic ancestry clusters) have a significant association between genetic distance and PGS accuracy after Bonferroni correction. We also find that a more stringent definition of homogeneous genetic clusters results in a lower correlation magnitude (Supplementary Figure 3).
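The Bonferroni step over trait-ancestry pairs amounts to dividing the significance level by the number of tests (504 in ATLAS, 756 in UKBB). A minimal sketch follows; the function name and the family-wise level of 0.05 are assumptions, since the text does not state the level used.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each (trait, cluster) pair whose p-value passes Bonferroni correction.

    p_values : dict mapping (trait, cluster) to the p-value of the GD-accuracy test
    alpha    : family-wise error rate (assumed here, not taken from the study)
    """
    threshold = alpha / len(p_values)  # one test per trait-ancestry pair
    return {pair: p < threshold for pair, p in p_values.items()}


if __name__ == "__main__":
    # Toy p-values, purely illustrative.
    toy = {("height", "EA"): 1e-6, ("height", "HL"): 0.03, ("bmi", "EA"): 0.01}
    flags = bonferroni_significant(toy)
    print(sum(flags.values()), "of", len(flags), "pairs pass")
```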
Genetic distance correlates with PGS estimates across most traits
So far we have focused on the relationship between GD and PGS accuracy. Next, we turn to evaluating the impact of GD on PGS estimates themselves. We find that GD is significantly correlated with PGS estimates for 82 out of 84 traits in UKBB, with correlations ranging from R = -0.52 to R = 0.74 (Supplementary Figure 4); this broad range of correlations is in stark contrast with the highly consistent negative correlation of GD and PGS $r_i^2$. To gain insights into whether PGS coupling with GD is due to stratification or true signal, we next contrasted the correlation of GD to PGS estimates ($\rho(d_i, \hat{g}_i)$) with the correlation of GD to the measured phenotype values ($\rho(d_i, y_i)$). We find a wide range of couplings reflecting trait-specific signals; [...] (Figure 5b). This is consistent with genetic value driving differences in phenotypes but could also be explained by residual stratification. For neutrophil counts, phenotype and PGS vary in opposite directions along GD across ATLAS (Figure 5c), although the trend is similar for phenotype and PGS in the European American cluster (Figure 5d). This could be explained by genetic value driving the signal in Europeans, with stratification in other groups. Neutrophil counts have been reported to vary greatly across ancestry groups, with reduced counts in individuals of African ancestries [31]. In ATLAS, we observe a negative correlation (-0.04) between GD and neutrophil counts, in agreement with the previous reports, while GD is positively correlated (0.08) with PGS estimates, with genetically distant individuals traditionally labeled as African American having higher PGS than average. The opposite directions of the phenotype-distance and PGS-distance correlations are partly attributed to the Duffy-null SNP rs2814778 on chromosome 1q23.2. This variant has a large association with neutrophil counts among individuals traditionally identified as being of African ancestry, but it is rare and excluded in our training data. This exemplifies the potential bias in PGS due to non-shared causal variants and underscores the need for ancestral diversity in genetic studies.

Since PGS can vary across GD either as a reflection of true signal (i.e., genetic value varying with ancestry) or due to biases in PGS estimation, ranging from unaccounted residual population stratification to incomplete data (e.g., partial ancestry-specific tagging of causal effects), our results emphasize the need to consider GD in PGS interpretation beyond adjusting for PGS $r_i^2$.
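The contrast between the GD-to-PGS and GD-to-phenotype correlations can be illustrated with a short sketch. The helper names are hypothetical, and the sign comparison is only a convenient summary of the "same direction versus opposite direction" pattern discussed above, not a test used in the study.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def gd_couplings(gd, pgs, pheno):
    """Return (rho(GD, PGS), rho(GD, phenotype), whether the two share a sign)."""
    rho_pgs = pearson(gd, pgs)
    rho_pheno = pearson(gd, pheno)
    # Opposite signs flag traits where the PGS and the phenotype drift apart along GD.
    return rho_pgs, rho_pheno, rho_pgs * rho_pheno > 0
```

With the ATLAS neutrophil values reported above (0.08 for PGS and -0.04 for phenotype), the final element would be `False`, marking the trait for closer inspection.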

Analytical form of individual PGS accuracy under infinitesimal assumption. Without loss of generality, we assume a prior distribution of genetic effects as follows: [...] With access to individual genotype data $\mathbf{X}_{\text{train}}$ and phenotypes $\mathbf{y}_{\text{train}}$, the likelihood of the data is [...] The posterior distribution of genetic effects given the data is proportional to the product of the prior and the [...] Empirically, the ratio between the two is highly correlated with the Euclidean distance of the individual from the training data on that PC space (R = 1, P-value < 2.2e-16 in UKBB).

Genetic Distance. The genetic distance is defined as the Euclidean distance between a target individual and the center of the training data on the PC space of the training data. [...] ancestries with probability larger than 0.5 or is not assigned to any clusters, it is labeled as unknown.

[...] as input for the snp_ldpred2_auto function in bigsnpr to sample from the posterior distribution of genetic effect sizes. Instead of using a held-out validation dataset to select the hyperparameters p (proportion of causal variants) and h2 (heritability), snp_ldpred2_auto estimates the two parameters directly from the data with MCMC. We run 10 chains with different initial sparsity p from $10^{-4}$ to 1, equally spaced in log space. For all chains, we set the initial heritability to the LD score regression heritability [48] estimated by the built-in function snp_ldsc. We perform quality control of the 10 chains by filtering out chains with estimated heritability smaller than 0.7 times the median heritability of the 10 chains, or with estimated sparsity smaller than 0.5 times or larger than 2 times the median sparsity. For each chain that passes filtering, we remove the first 100 MCMC iterations as burn-in and thin the next 500 iterations by selecting every 5th iteration to reduce autocorrelation between MCMC samples. In the end, we [...]

[...] genetic ancestry clusters. Each dot represents a testing individual from ATLAS.
For each dot, the x-axis represents its distance from the training population on the genetic continuum; the y-axis represents its PGS accuracy. The color represents the inferred genetic ancestry cluster. R and p refer to the correlation between genetic distance and individual-level PGS accuracy and its significance from two-sided t-tests. (b) Individual PGS accuracy decreases across the entire ATLAS. (c) Population-level PGS accuracy decreases with the average genetic distance in each genetic distance bin. All ATLAS individuals are divided into 20 equal-interval genetic distance bins. The x-axis is the average genetic distance within the bin; the y-axis is the squared correlation between PGS and phenotype for individuals in the bin. The dot and error bar show the mean and 95% confidence interval from 1000 bootstrap samples. (EA, European American; HL, Hispanic/Latino American; SAA, South Asian American; EAA, East Asian American; AA, African American.)

[...] PGS accuracy and genetic distance within the group specified by the x-axis for each of the 84 traits. The box shows the first, second and third quartiles of the 84 correlations, and whiskers extend to the minimum and maximum estimates located within 1.5 × IQR from the first and third quartiles, respectively. Numerical results are reported in Supplementary Table 2.

[...] The x-axis is the correlation between phenotype and genetic distance and the y-axis is the correlation between PGS estimates and genetic distance for all 48,586 testing individuals in UKBB. Numerical results are reported in Supplementary Table 4.
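The chain quality control and thinning described in the LDpred2-auto paragraph of the Methods can be sketched in plain Python as below. The study runs this step with bigsnpr in R; the snippet does not call or reproduce that library, and all helper names are hypothetical. Two readings are assumed: the upper sparsity bound is taken as "larger than 2 times the median", and "every 5th iteration" is taken to start from the first post-burn-in iteration.

```python
from statistics import median


def initial_sparsity_grid(n_chains=10, low_exp=-4.0, high_exp=0.0):
    """Initial values of p from 10^low_exp to 10^high_exp, equally spaced in log space."""
    step = (high_exp - low_exp) / (n_chains - 1)
    return [10 ** (low_exp + k * step) for k in range(n_chains)]


def filter_chains(h2_est, p_est):
    """Indices of chains that pass the median-based quality control.

    A chain is dropped if its heritability is below 0.7 x the median heritability,
    or if its sparsity is below 0.5 x or above 2 x the median sparsity.
    """
    h2_med, p_med = median(h2_est), median(p_est)
    return [
        k
        for k, (h2, p) in enumerate(zip(h2_est, p_est))
        if h2 >= 0.7 * h2_med and 0.5 * p_med <= p <= 2.0 * p_med
    ]


def thin_after_burn_in(samples, burn_in=100, n_after=500, every=5):
    """Drop the burn-in, take the next n_after iterations, keep every `every`-th one."""
    kept = samples[burn_in:burn_in + n_after]
    return kept[::every]


if __name__ == "__main__":
    # Toy chain estimates, purely illustrative.
    print(initial_sparsity_grid())
    print(filter_chains([0.5, 0.5, 0.2, 0.5], [0.01, 0.01, 0.01, 0.1]))
    print(len(thin_after_burn_in(list(range(600)))))
```

Under these settings each retained chain contributes 100 thinned posterior samples.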