## Abstract

An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method – which we call PhenoGraph – has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different human populations.

## 1 Introduction

There is much interest in identifying the individual genetic variants that have experienced natural selection during recent human evolution. Many popular methods tackle this problem by identifying alleles that have changed frequency faster than can be explained by genetic drift alone, and that can instead be explained by selective processes. These methods exploit patterns like haplotype homozygosity [1, 2, 3] and extreme population differentiation [4, 5, 6], and have yielded several important candidates for human adaptation: for example, *LCT* [7], *EDAR* [2], *EPAS1* [4] and the *FADS* region [8]. In order for these signals to be detectable at the level of an individual locus, the historical changes in allele frequency must have been large and rapid. Therefore, they can only be produced by alleles that confer a strong selective advantage.

With the advent of large-scale GWAS for a variety of measurable traits, however, it has now become possible to detect a more subtle mechanism of adaptation. If a trait is polygenic, positive selection may instead occur by concerted shifts at many loci that all contribute to the variation in a trait. Over short time scales, these shifts are expected to be small (but see [9] for polygenic dynamics under longer time scales). They are also expected to occur in consistent directions, such that alleles that increase the trait will systematically rise in frequency (if selection favors the increase of the trait) or fall in frequency (if selection operates in the opposite direction). None of the allele frequency changes need to be large on their own for the phenotypic change to be large. This process is called polygenic adaptation and may underlie major evolutionary processes in recent human history [10, 11, 12, 13].

A number of methods have been developed to detect polygenic adaptation using loci identified from GWAS. Turchin et al. [14] was the first such study. They developed a test for polygenic adaptation between two populations, and showed that there were systematic frequency differences at height-associated loci between northern and southern Europeans, which could not be explained by genetic drift alone. Berg et al. [15] developed a more general method to detect polygenic adaptation by testing for over-dispersion of mean genetic values among several populations, using a genome-wide population co-variance matrix to predict how alleles should behave under neutrality. More recently, Robinson et al. [16] used theory by Ovaskainen et al. [17] to develop a similar population differentiation method to detect polygenic adaptation with GWAS. They also made use of the genome-wide covariance matrix, but, in contrast to Berg et al., their method is implemented in a Bayesian linear mixed model.

None of these methods require a detailed model of human history to detect polygenic adaptation. Their use of the genome-wide covariance matrix allows them to capture patterns of genetic drift among populations without having to infer their history. While this makes them quite powerful, it also means that they are not very useful at determining where and when polygenic adaptation took place in the past.

Here, we develop a method to detect polygenic adaptation that uses a more parameter-rich model of historical population structure: an admixture graph, which is a simplified representation of the history of divergences and admixture events among populations [18, 19]. Having an explicit model allows us to infer where particular bouts of polygenic adaptation took place in human history, so as to better understand how selection on trait-associated variants has occurred over past generations.

## 2 Results

### 2.1 Model

Assume we have measured genotypes at a SNP that influences a trait in a set of *M* populations. Let *d*_{m} be the count of the derived allele in population *m* and let be the vector across the M populations of each *d*_{m} observation. Let *n*_{m} be the total number of chromosomes observed in population *m* (together ). Assume we have an admixture graph *G* relating these populations, and that this graph consists of an accurate topology, as well as accurate branch lengths and admixture rates. Branch lengths are in units of drift, which are approximately equal to *t*/2*N*_{e}, where, for each branch, *t* is the number of generations and *N*_{e} is the effective population size, assuming *t* ≪ *N*_{e} [20]. In practice, we can estimate such a graph from neutral genome-wide data, and we use the program *MixMapper* [21] when applying our method to real data below.

We wish to model the changes in frequency of the trait-associated allele over the graph *G*. At each node in the graph, we introduce a parameter that corresponds to the allele frequency of the variant at that node. Let *f*_{R} be the derived allele frequency at the root of the graph, be the vector of allele frequencies at all the other internal nodes of the graph, and be the vector of allele frequencies at the tips of the graph.

The probability of the parameter values and the data can then be decomposed as follows:

We now take each of terms above in turn. First, the probability of the observed counts is simply a product of binomial probabilities:
where *f*_{m} is the element of that corresponds to population *m*.

To get the probabilities of the changes in allele frequency across different nodes, consider a single branch of *G*. Assuming the branch is relatively short (such that the allele does not approach fixation or extinction in this time period) and there is no natural selection, we can use the Normal approximation to the Wright-Fisher diffusion [22, 23, 24] to model the allele frequency at the descendant node of the branch (*f*_{D}) as a function of the allele frequency at the ancestral node (*f*_{A}):
where *c* corresponds to the amount of drift that has occurred in the branch. In practice, we use a truncated Normal distribution with point masses at 0 and 1, to account for the possibility of fixation or extinction of the allele [25].

In our model there may have also been selection on the allele on the branch, such that it was pushed to either higher or lower frequency because of its influence on a trait. We can model the selected allele frequency by modifying the infinitesimal mean of the Wright-Fisher diffusion and approximating the diffusion with a Normal distribution that now includes some additional terms ([14, 25, 26], Bhérer et al. in prep.):
where is the effect size estimate at that site (defined with respect to the derived allele), is some function that relates the effect size estimate to the selective pressure and *α* is our positive selection parameter, which is approximately equal to the product of the selection coefficient for the advantageous allele and the duration of the selective process [14, 27]. In practice, we will set to be equal to the sign (+1 or −1) of (Figure 2), so as to avoid giving too much weight to variants of strong effect. We will model selection only on SNPs that are associated with a trait in a particular GWAS.

We can calculate the probability of these parameters at a particular site as the product of the Normal probability densities that correspond to the evolution of allele frequencies down each branch times a binomial probability density to account for sampling error. Let us denote the Normal density that corresponds to a particular branch *λ* as , where we suppress notation of for clarity. Then, for example, the probability of a given pattern of allele frequencies and sample counts over a rooted 3-leaf tree with 4 branches *λ, ι, υ, ψ* (Figure 1) can be computed as follows:
when the *α* parameters and the allele frequency at the root of the tree are known. Note that some of the symbols here correspond to the same allele frequencies. For example, if the *υ* branch is one of the immediate descendant branches of the *ι* branch, then . Assuming SNPs are independent, we can compute the probability of the allele frequency configurations at all N trait-associated SNPs as a product over the probabilities at each of the SNPs:
where *P*_{i} is the probability of the parameters of interest at SNP *i* under our tree model.

More generally, we can also compute the probability of our parameters in an admixture graph, containing nodes with more than one parent. In that case, the probability of an allele frequency of an admixed node is a weighted sum of the probability paths corresponding to its two parents, where the weights are the admixture rates for each of the two contributions.

In practice, for a given SNP, we know neither the allele frequencies at the inner nodes nodes, at the tip nodes and at the root node (*f*_{R}), nor the *α* parameters in each branch . We want to obtain a posterior distribution of these parameters, given the data and the known graph: . We aim to do this for all trait-associated SNPs. We therefore developed an MCMC sampler to transition between the states of these variables and estimate their posterior distribution (Figure 2).

We set the prior for the frequencies at the root *f*_{R} to be a uniform distribution. For a given SNP i,

As there are many combinations of *α* parameters that generate almost equivalent likelihoods in a complex admixture graph (Figure S1), we use a “spike-and-slab” prior for the *α* parameters, so as to promote sparsity. For a given branch *j*,

This is a mixture of two Normal distributions centered at 0: one of the distributions has a wide standard deviation (*τ*), while the other has a much narrower standard deviation, which is a fraction (1/*ζ*) of *τ*, and approximates a point mass at 0 [28, 29]. Here, *κ* is the mixture probability of drawing from the narrower Normal distribution, and we model it as a uniform hyperprior (see Methods). The idea behind this is that our assumed prior belief is that only a few of the branches in the admixture graph have experienced bouts of polygenic adaptation, so we reward *α* parameters that tend to stay in the neighborhood of 0 during the MCMC run.

### 2.2 A statistic for prioritizing branches

We observed via simulations that different combinations of *α* parameters can produce very similar likelihood values. This causes the MCMC sampler to explore different combinations of *α* values in the same posterior run, when only one such combination is actually correct (Figure S1). The aforementioned spike-and-slab prior serves to partially ameliorate this problem, but we aimed to find a way to further encourage sparsity by reducing the possible number of candidate branches that are explored in the MCMC.

We therefore devised a set of summary statistics that can be computed before starting an MCMC run and are meant to detect branch-specific deviations from neutrality. Let **F** be the empirical population covariance matrix, which – under ideal neutral conditions – should be determined by the admixture graph connecting all the populations. Let be the mean-centered vector of estimated mean genetic values for each of the M populations, computed from the *N* SNPs that are known to be associated with a trait. For a specific population *m:*

Here, is the sample frequency of SNP *i* in a panel of population *m*, and is its effect size estimate. The vector therefore contains the *z*_{m} values for all *M* populations.

Furthermore, let us define the vector for a particular branch *j* to be equal to the contribution *w* of that branch to each of the leaves of the graph. For example, branch v-q in Figure 2 is a full ancestral branch to leaf C, a partial ancestral branch to leaf B (due to the admixture event), but not an ancestral branch to either A or D. Therefore, the vector is equal to (*w*_{A},*w*_{B},*w*_{C},*w*_{D}) = (0,1 – *γ*,1,0). By the same reasoning, and .

Now, let:
where is a scaled version of so that is has unit length. The numerator of this statistic is proportional to the squared covariance between the genetic values and the vector representing the contribution of drift down branch *j* to the overall pattern of among population divergence. The denominator gives the expectation of the numerator under the neutral model. This statistic reflects how much of the deviations from neutrality among the population mean genetic values for a trait is due to branch *j*. One can show (see Methods) that *Q*_{B}(j) has a distribution under a null model of multivariate Normal drift, for any branch *j*, and excessively large values of *Q*_{B}(j) therefore represent evidence suggesting non-neutral evolution down branch *j*.

We used this summary statistic (computed for each branch) to prioritize which branches to explore in our MCMC. We applied a prior point mass of zero to all branches whose corresponding *Q*_{B} statistic were smaller than a particular cutoff. The choice of cutoff was based on simulations (see Methods). The MCMC only produces posterior samples for branches that pass this cutoff and are therefore highly deviated from their expectation under neutrality. We also update the *α* parameters of these latter branches in our MCMC with a frequency proportional to their *Q*_{B} values, in a similar fashion to [30].

### 2.3 Implementation

We implemented both the MCMC and the *Q*_{B} statistic computation in a program that we call PhenoGraph. We use the R packages *admixturegraph* [31] and *igraph* [32] to visualize and manipulate various aspects of a graph. The R scripts to run PhenoGraph can be downloaded here: https://github.com/FerRacimo/PhenoGraph

### 2.4 Simulations

We explored different demographic scenarios. We first simulated a simple three-leaf tree with four branches, in which the sampled panels in the leaves were each composed of 100 diploid individuals (Figure 3.A). We tested scenarios of different branch lengths: each of the branches was simulated to be either of length 0.02 or of length 0.05. We also tested different types of branch under selection (either a terminal branch or an internal branch). We additionally tested a four-leaf admixture graph with one admixture event (Figure 3.C) in these same scenarios. For comparison, the amount of genetic drift between Spanish and French human populations is 0.016 and the amount of drift between French and Han Chinese human populations is 0.22 [33]. The latter is approximately equal to the drift separating populations A and C in our 4-population graph, when each branch has length equal to 0.05.

In each scenario, we simulated *N* = 400 trait-associated SNPs. For each SNP, we sampled the root allele frequency (*f*_{R}) from a Beta(2,2) distribution, to emulate the fact that, in the real data, variants in the leaf populations tend to be further away from the boundaries of fixation and extinction than under a uniform distribution. We evolved the SNPs throughout an admixture graph forwards in time using truncated Normal distributions with point masses at the boundaries of 0 and 1, and used binomial distributions to sample panel allele frequencies from the leaf populations. The effect sizes of the SNPs were drawn from a Normal distribution with mean 0 and we simulated polygenic adaptation (with *α* = 0.1 and *α* = 0.2) in a particular branch using only the sign of the effect size of each SNP. We also simulated an additional 10,000 SNPs that evolved neutrally under the same demography, so as to estimate the neutral population covariance matrix.

For all simulation scenarios, we tested five 400-SNPs replicates. We ran the MCMC four times on each simulation to check that it was behaving consistently. We observed that the runs for each simulation were very similar to each other, so we only show one run for each simulation.

When the *α* parameter is large (*α* = 0.2), the MCMC performed very well (Figure 3). For a tree (Figures S2 and S3) or a graph (Figures S4 and S5) with small branch lengths (0.02), the *Q*_{B} statistic corresponding to the selected branch was included as a potential candidate branch in the MCMC in all simulations. PhenoGraph then consistently converged on the appropriate joint distribution of selection parameters. When the branches were simulated to be longer (0.05), the MCMC performed well, but, in a few simulations, it produced positive estimates for *α* parameters in neutral branches or failed to find evidence for selection in any branch (Figures S6 to S9). This occurred more often when the *α* parameter was simulated to be smaller (*α* = 0.1), but again, this was less of a problem with short-branch graphs (Figures S10 to S13) than with long-branch graphs (Figures S14 to S17). In general, we conclude that the method performs best when selective pressures are strong and/or exerted over long time periods (i.e. large *α)*, and when drift parameters are small.

We were concerned about false positive estimates of selection when the graph is mis-specified. To assess this, we simulated a graph like the one shown in Figure 3.C but with no selection. We first run PhenoGraph while correctly specifying the topology and the branch lengths (of length equal to 0.02) as input (Figure S18), and observed that all posterior *α* estimates are tightly centered at 0, as expected. Then, we simulated a graph with the same topology but with each branch having length equal to 0.03, while incorrectly under-estimating the length of each branch to still be equal to 0.02 (Figure S19). Finally, we simulated the same graph but with each branch having length equal to 0.04 while incorrectly under-estimating the branch lengths to all be equal to 0.02 (Figure S20). With increasingly stronger misspecification of the branch lengths, we observe that the behavior of some of the posterior estimates becomes more erratic. Visual inspection of the MCMC trace indicates that underestimation of the branch lengths makes the chain to become more “sticky”, causing some parameters to get stuck at incorrect areas of parameter space for long periods of time

We also simulated a neutral graph as in Figure 3.C but pretended like population A had not been sampled, and that the graph was (incorrectly) estimated to be a 3-population tree like the one in Figure 3.A. This topological misspecification does not seem to affect the inference of neutrality (Figure S21), but we do not discard the possibility of other incorrect types of topologies that could generate wrong inferences. We therefore stress that the admixture graph – especially the branch lengths – relating the populations under study should be correctly estimated before running PhenoGraph. We also advise to run the MCMC only when there is significant evidence for selection based on the *Q*_{X} statistic [15], which does not need an admixture graph as input, as it uses the full covariance matrix to model the expected amount of drift separating each of the populations.

### 2.5 Application to 1000 Genomes data

We tested our method on traits from 43 GWAS on 42 different traits (Table S1, two of the GWAS are for age at menarche) that were previously assembled as part of a meta-analysis studying the genetic correlations between such traits, using approximately independent linkage disequilibrium blocks across the genome [34]. For each block with a posterior probability of containing an association > 90%, we included the SNP with the maximum posterior probability of being the causative variant into our list of candidate trait-associated SNPs. We used the VCF data from the 1000 Genomes Project [35] and built admixture graphs using *MixMapper* [21]. We excluded SNPs for which the ancestral allele in the 1000 Genomes data was unknown or unsure (lower case in VCF file). Because *MixMapper* cannot distinguish between the two drift values corresponding to two admixing branches with a common child node and the drift value specific to the immediate descendant branch of the child node (Figure 2 in ref. [21]), we forced the drift in the two admixing branches to be equal to 0.001 and assigned the drift estimated by *MixMapper* to the branch immediately descending from the child node.

We began by fitting a 7-leaf tree with no admixture events containing diverse populations sampled across the world (Figure S22.A). The tree included Nigerian Esan (ESN), Sierra Leone Mende (MSL), Northern Europeans from Utah (CEU), Southern Europeans from Tuscany (TSI), Dai Chinese (CDX), Japanese (JPT) and Peruvians (PEL). We took trait-associated variants to be under polygenic adaptation if the P-value for the corresponding *Q*_{X} [15] statistic (testing for overall selection among the populations) was < 0.05/*n*, where *n* is the number of assessed GWASs. Traits with associated variants that passed this criterion are shown in Table 1. To account for possible artifacts arising from the ascertainment scheme for each GWAS, we also generated 1,000 pseudo-replicates in which we randomly switched the sign of the estimated effect size for all trait-associated SNPs. This serves to preserve the SNP ascertainment of each trait, while removing the effect of selection. We then computed a second P-value of the observed *Q*_{X} (*P*_{rand}) by comparing it to these pseudo-replicates (Table 1).

We ran our MCMC on these trait-associated variants and obtained posterior distributions for the *α* parameters with the strongest evidence for selection, prioritizing branches as explained above (Figure S23). The P-values of the *Q*_{B} statistic (obtained from a distribution) for each branch are shown in Table S2. We find strong evidence for selection on variants associated with height, educational attainment and self-reported unibrow, but little or no evidence for variants associated with male-pattern baldness or schizophrenia: even though these trait-associated variants are significant under the *Q*_{X} and *Q*_{B} frameworks, all their *α* parameters are approximately centered at 0. For height, we observe both selection for variants increasing height in the ancestral European branch and for variants decreasing height in the ancestral East Asian / Native American branch. However, this is only a consequence of the MCMC showing alternate strong support for selection in either one or the other branch at different points in the run, but only weak support for selection in both branches simultaneously (Figure S24), suggesting we are unable to discern which among these is the correct configuration.

We also tested a graph with an admixture event (Figure S22.B). This graph contained Yoruba (YRI), Colombians (CLM), CEU, CHB and PEL. We modeled CLM as resulting from an admixture event between CEU (76.55%) and PEL (23.45%), the latter of which is the panel with the highest amount of Native American ancestry in the 1000 Genomes Project [35]. Here, we recapitulated many of our previous findings from the 7-leaf tree (Table 1, Figure S25), like selection on variants associated with height and educational attainment. We list the P-values of the *Q*_{B} statistic for each branch in Table S3.

To facilitate the visualization of posterior distributions for *α* parameters, we developed a new way to plot polygenic adaptation in a graph: a “pheno-graph”. In a pheno-graph, the vertical component of a non-admixing branch is proportional to the amount of genetic drift that it experienced (calculated via *MixMapper)*. The position of admixed nodes was determined based on the drift value of one randomly chosen parent branch. The colors indicate the mean posterior estimate of the selection parameter for variants associated with the corresponding trait (with red indicating an increase in trait-increasing variant frequency, and blue indicating a decrease in trait-increasing variant frequency). We plotted pheno-graphs for all traits that passed the significance criterion in the 7-leaf tree (Figure 4) and in the 5-leaf admixture graph (Figure 5).

To make sure there were no artifacts due to GWAS ascertainment [15], we also generated an empirical null distribution in which we produced 1,000 samples, each containing SNPs that were frequency-matched to the trait-associated SNPs, using their allele frequency in CEU, for our three sets of SNPs with strongest evidence of selection: variants associated with educational attainment, with self-reported unibrow and with height. We computed the *Q*_{X} statistic for each of these SNP samples, using the population panels from Figure 4. We do not observe a value of *Q*_{X} as high as the one observed in the real data, for either of the three sets (Figure 6).

To test how robust our results were to our modeling assumptions, we also performed a simpler two-tailed binomial sign test between every pair of 1000 Genomes panels. The assumption here is that – for every panel X and Y – we should observe roughly equal number of trait-increasing alleles at higher frequency in X than in Y as trait-decreasing alleles at higher frequency in X than in Y, under a model of neutrality with respect to the effect size sign [36]. This test only uses information about the sign of the effect estimates of each SNP, not their magnitudes, and does not use information about genome-wide drift parameters between each population. Thus, they are bound to have less power than the *Q*_{B}, *Q*_{X} or MCMC tests. The P-values for these pairwise binomial tests are shown in Tables S4 to S8 for all traits that were found to have significant evidence of selection using the *Q*_{X} statistic. The top 10 most significant pairwise comparisons are shown in Tables S9 to S13. For ease of visualization, we also plotted, for each panel, the number of pairwise tests involving that panel that resulted in a P-value < 0.05 (Figures S26 to S30).

We were interested in verifying how sensitive different proportions of missing data (i.e. removal of SNPs) or erroneous effect size estimates would be to our three strongest signals of polygenic adaptation, on variants associated with height, educational attainment and unibrow. For this purpose, we focused on the comparison between CEU and CHB. First, we simulated different proportions of missing trait-associated SNPs – ranging from 5% to 95%, with step sizes of 5%. For each of 10,000 simulations under each missing data scenario, we assessed how often the polygenic score for unibrow and educational attainment in CHB was higher than the polygenic score for CEU, like we observe in the 1000 Genomes data. Height follows the opposite pattern (with CEU having a higher polygenic score than CHB), so in that case we assessed how often its polygenic score in CEU was higher than in CHB, across the 10,000 simulations for each scenario. Note that we built these scores using only the SNPs used in our selection tests. The results are in Figure S31. For example, we see that – even with 20% missing data – the polygenic scores for either of the 3 traits preserve the observed relationship of inequality between CEU and CHB almost 100% of the time. Finally, we simulated a situation in which some proportion of the signs of the effect size estimates were misassigned. We then assessed how often we could replicate the signal we see between CEU and CHB, but this time under different proportions of sign misassignment (Figure S32).

Finally, to understand how the signal of selection was distributed among our SNPs, we plotted the absolute value of the effect sizes of trait-associated SNPs for height, educational attainment and self-reported unibrow, as a function of the difference in frequency observed between CHB and CEU, polarized with respect to the trait-increasing allele in each SNP (Figure S33). We find that, in the case of self-reported unibrow, there are three variants of large effect with large frequency differences contributing to a higher polygenic score in CHB: rs3827760, rs16891982 and rs12916300. These SNPs are located in the genes *EDAR, SLC45A2* and *OCA2*. These are genes involved in pigmentation and skin development, and all three have documented signatures of selective sweeps causing strong allele frequency differences between Europeans and East Asians [7, 3, 37, 2, 12]. After removing SNPs with large absolute effect size values (≥ 0.05), the P-value of the *Q*_{X} statistic for these variants remains significant (*P* = 7.04 ∗ 10^{−5}). When looking at the other two sets (variants associated with height and educational attainment), the signal of selection is more uniformly distributed among the SNPs, with no strong outliers of large effect with large frequency differences (Figure S33).

### 2.6 Application to Lazaridis et al. (2014) data

We applied our method to a more broadly sampled SNP chip dataset containing present-day humans from 203 populations genotyped with the Human Origins array [18, 38]. This dataset was imputed using SHAPEIT [39] on the Michigan Imputation Server [40] with the 1000 Genomes Phase 3 data [35] as the reference panel (Bhérer et al. in prep.). We tested for polygenic adaptation in a 7-leaf admixture graph. This graph contains the panels Yoruba, Mandenka and Sardinian, along with the following 4 combinations of panels, which we built so as to have a large number of individuals per panel. The panel “Oceanian” contains the panels Papuan and Australian. The panel “EastAsian” contains the panels Cambodian, Mongola, Xibo, Daur, Hezhen, Oroqen, Naxi, Yi, Japanese, Han NChina, Lahu, Miao, She, Han, Tujia and Dai. The panel “NativeAmerican” contains the panels Maya, Pima, Surui, Karitiana and Colombian. Finally, we modeled Europeans as a 2-way mixture of an ancestral component related to “NativeAmerican” and another component that split basally from the Eurasian tree. This was the mixture fitted to Europeans by ref. [21], and provides a better fit to the data than modeling Europeans merely as a sister group to East Asians and Native Americans. Though we recognize that Europeans are better modeled as a 3- or 4-way mixture of ancestral components [38, 33, 41], it is hard to produce such a mixture without resorting to ancient DNA data (see Discussion). We tested 3 different versions of this graph, each containing three different sets of European populations (Figure S22.C) distinguished by how much “early European farmer” (EEF) ancestry they had (based on Figure 4 of ref. [38]). “Euro-peA” (low EEF) contains the following panels: Estonian, Lithuanian, Scottish, Icelandic, Norwegian, Orcadian, Czech, English. “EuropeB” (medium EEF) contains Hungarian, Croatian, French, Basque, Spanish North and French South. Finally, “EuropeC” (high EEF) contains Bulgarian, Bergamo, Tuscan, Albanian, Greek and Spanish.

Trait-associated variants with significant evidence for polygenic adaptation are listed in Table 2 and the P-values of the *Q*_{B} statistic for each branch are shown in Tables S14 to S16. With these data, we are able to recapitulate the adaptive increase in height-increasing variants in Europeans we had seen before, but only observe it in populations with medium or low EEF ancestry (Figures 7 and S34 to S38). This pattern is consistent with previous observations made using ancient DNA in Europeans [12]. We also recapitulate selection patterns on variants associated with other traits, like unibrow, educational attainment and male-pattern baldness, and observe evidence for polygenic adaptation in some additional trait-associated variants, like photic sneeze reflex (Table 2).

To check if there were systematic biases in ancestral/derived allele polarity relative to the direction of the effect size, we performed a two-tailed binomial test on each trait for which we found significant evidence of polygenic adaptation in the Lazaridis et al. or the 1000 Genomes dataset (Table S17). We find that only schizophrenia has a significant bias, showing an excess of derived alleles with negative effect sizes (P = 0.03511), though this is not significant after Bonferroni correction. We therefore caution that the evolution of these trait-associated variants may not be well-modeled by the multivariate Normal assumptions that we make to calculate the *Q*_{B} statistic or when running the MCMC.

As before, to check the robustness of our results to our modeling assumptions, we show P-values for pairwise binomial sign tests involving each of the panels in the Lazaridis et al. (2014) dataset in Tables S18 to S22. The top 10 most significant pairwise comparisons are in Tables S23 to S27. Figures S39 to S43 show, for each panel, the number of pairwise tests involving that panel that resulted in a P-value < 0.05.

## 3 Discussion

We have developed a method to infer polygenic adaptation on trait-associated variants in an admixture graph, so as to be able to pinpoint where in the history of a set of populations this type of selective processes took place. Our method requires GWAS data for a particular trait, allele frequency data for a set of populations, and a precomputed admixture graph that relates these populations with each other. Importantly, the method relies on the admixture graph as an accurate description of the ancestral genome-wide relationships among the populations under study. Potential users should be careful about correctly estimating branch lengths and ghost populations which are not included in the graph but may have substantial unaccounted ancestry contributions to the populations that are included. We used *MixMapper* [21] to infer the graph topology and branch lengths. Alternatively, one can also use other programs, like *qpGraph* [18] or *TreeMix* [19] to build graphs, though we caution that the estimated drift values of the branches in the output of these programs are scaled (in different ways) by the heterozygosity of ancestral nodes (see Supplementary Material of [21] for a way to properly obtain drift values from differences in allele frequencies between populations).

Running PhenoGraph involves a two-step process, each of which is complementary to the other. The first step – the calculation of the *Q*_{B} statistic – is fast and provides a preliminary way to assess which branch in a graph has significant evidence for polygenic adaptation. However, this statistic does not model the ancestral allele frequencies at each node of the graph. The second step – the MCMC – is slower, but provides posterior distributions for selection parameters under a more parameter-rich model of population history. In our pipeline, we use the first method as a filtering step, to avoid exploring selection parameters in the MCMC for those branches that have little evidence for selection, and encourage the MCMC to be sparse in its assignments of selection in the graph. We illustrate this point in Figure S44, where we show a side-by-side comparison of a pheno-graph built using the posterior *α* estimates and a pheno-graph built using *q _{b} –* a signed version of the

*Q*

_{B}statistic (see Methods).

In application to human populations, we detected signals of polygenic adaptation on sets of variants that have been identified to influence height, educational attainment, and unibrow. Selection on variants associated with height in Europeans has been previously reported elsewhere [15, 14, 16, 12] and our results are consistent with previous findings showing that height-increasing variants are at significantly and systematically higher frequencies in northern than in southern European populations. The signal for selection affecting variants associated with self-reported unibrow is also strong, but partly driven by a few variants of large effect with large frequency differences between populations, which have documented evidence for selective sweeps in genes involved in hair, skin and eye pigmentation, and skin development [3, 37, 7, 2, 12]. Additional trait-associated variants had inconsistent evidence across datasets and graph frameworks (like schizophrenia or male-pattern baldness) and/or were driven by differences in only a few SNPs of small effect (like age at voice drop), and so we do not discuss them.

We find evidence for polygenic adaptation in East Asian populations at variants that have been associated with educational attainment in European GWAS. This result is robust to the choice of data we used (1000 Genomes or Lazaridis et al. (2014) panels). Our modeling framework suggests that selection operated before or early in the process of divergence among East Asian populations – whose earliest separation dates at least as far back as approximately 10 thousand years ago [42, 43, 44, 45] – because the signal is common to different East Asian populations (Han Chinese, Dai Chinese, Japanese, Koreans, etc.). The signal is also robust to GWAS ascertainment (Figure 6), and to our modeling assumptions, as we found a significant difference between East Asian and non-East-Asian populations even when performing a simple binomial sign test (Tables S4, S9, S19 and S24).

Interpreting this and the other signals we found requires awareness of a number of technical caveats, as well as several fundamental conceptual difficulties with the study of polygenic adaptation, some of which may ultimately prove intractable.

### 3.1 What is the signal of polygenic adaptation?

Before discussing these difficulties, it is worth articulating exactly what a signal of poly-genic adaptation consists of. Taking the height example as a case in point, the signal is that a set of genetic variants that have been identified as associated with increased height in a European GWAS are (as a class) at higher frequency in northern Europeans today than would be expected by genetic drift alone. Though this observation is consistent with the hypothesis that natural selection has operated on these variants, it does not necessarily imply that natural selection has operated directly on “height”, nor that observed height differences between northern Europeans and other populations are necessarily genetic and due to selection.

### 3.2 Pleiotropy and phenotype definition

When looking at all our variant classes, we are necessarily limited by the traits that have been defined and studied by others, as we have grouped variants together based on these phenotypes. The use of previously established definitions makes it difficult to understand exactly why these variants may have been under selection in the past. This is perhaps best exemplified by the signal of polygenic adaptation for genetic variants associated with educational attainment.

Standardized schooling – and consequently, the concept of “educational attainment” – was only invented and implemented widely in the last few generations. It is obviously nonsensical to discuss its evolution over the past tens of thousands of years. Instead, it is likely that the set of variants for which we find evidence of selection was associated with some (unknown) phenotype(s) in the past. However, given that selection on these variants likely took place more than 10 thousand years ago, it may be difficult or impossible to identify what these were. A similar problem arises when thinking about the signal of polygenic adaptation on “unibrow”. This is a self-reported phenotype, and the genetic variants that have been identified may simply be associated with pigmentation (assuming people with certain hair and/or skin pigmentation phenotypes are more likely to notice they have hair between their eyes), or alternatively with some other (unmeasured) hair-related phenotype. It is also possible that direct sexual selection for absence or presence of unibrow as an attractive facial feature in certain cultures [46] may be the cause of this signal. Indeed, if a selective agent is cultural, but the culture has since changed, it may be impossible to determine what actually occurred. All these variants are also likely pleiotropic [47], which makes it even harder to determine which phenotypes were truly targeted by selection.

Perhaps, one could try to find the phenotypic gradients along which selection most likely operated [48] by modeling the evolution of trait-associated SNPs for multiple phe-notypes together. However, it is also possible that genetic correlations among traits in the present are not good proxies for genetic correlations in the past.

### 3.3 Relationship between polygenic scores and population mean phenotypes

Another fundamental limitation in interpreting all studies of polygenic adaptation (including this one) is that the connection between the distribution of allele frequencies today and any historical or geographic trends in phenotypes remains questionable. Indeed, though we have motivated this method as a way to identify adaptive shifts in the mean of a polygenic trait, it is a simple fact that massive changes in the mean values of many of the traits we consider have occurred by purely non-genetic environmental processes. For example, the mean height of men in the Netherlands increased from around 166 cm in the mid-1800s to currently over 180 cm [49], bringing the population from around the middle of the pack among European countries to the tallest one in the world. This likely occurred for environmental reasons, like improvements in diet and health care [50, 51]. Likewise, the average educational attainment in Iceland and North America has increased dramatically over the past century, despite a slight estimated *decrease* in the frequencies of genetic variants associated with the phenotype [52, 53, 54]. The somewhat paradoxical conclusion is that actual phenotypes can and do change across populations in directions that are uncorrelated to natural selection (which may in fact be a minor contributor to any such differences). It would be an understatement to say this poses challenges for the interpretation of the current study and others like it.

In fact, the trait-associated variants that we have used only explain a fraction of the narrow-sense heritability of their respective traits, even in the populations in which the association studies were performed. As we have only looked at genome-wide-significant variants, this fraction is small in most cases. For example, the heritability for ’educational attainment’ is estimated to be around 40%, and educational attainment itself is strongly determined by environmental factors [55]. The genome-wide significant SNPs we used in this study (themselves a subset of all SNPs tested in the original GWAS [56]) explain only 1.05% of the total variance for this particular trait. All of the aforementioned traits are likely affected by a myriad of environmental and social variables, which might contribute to determine their ultimate expression in each human individual.

### 3.4 Additional caveats

Beyond the above conceptual difficulties, there are a number of additional caveats with our approach to keep in mind. First, the effect sizes we have used derive from GWAS performed primarily on individuals of European ancestry. Thus, our tests can only detect if variants that have been found to be associated with a trait in European GWAS are significantly higher or lower in a particular (European or non-European) population, relative to what they should be under a pure drift model. This does not necessarily imply that populations for which we find evidence for selection have higher or lower average genetic values of such a trait than other populations. In fact, there is evidence to suggest that loci ascertained in European GWAS do not serve to make good predictors for traits in populations that are distantly related to Europeans [57]. One reason for this is that many or all of the traits we are studying are likely to be influenced in non-European populations by different variants from the ones that have been discovered in European GWAS. SNPs that may be strongly associated with a trait in a particular non-European population (like an African or East Asian panel) may have not reached genome-wide significance in a European GWAS, where those SNPs may not strongly affect the trait or may be at low frequencies. We also do not model dominance, epistasis or gene-by-environment inter-actions between our trait-associated variants and the diverse environments that human populations occupy, and any of these factors may further obscure the relationship between the patterns we observe and the actual underlying genetic contribution to phenotypes in these populations.

Second, we have assumed that all of the GWAS that we have used have properly accounted for population structure. If some of the trait-associated SNPs are in fact false positives caused by uncorrected structure, this could generate a false signal of polygenic adaptation. A future direction could be the incorporation of effect sizes that have been corrected for ancestry or population stratification [16, 58] and also effect sizes from GWAS performed on other populations [59, 60, 61], in order to assess the robustness of our empirical results across variants discovered in studies involving participants of different ancestries.

Third, we made the assumption that the admixture graph for the populations that we use as input is correct. If there are additional unmodeled aspects of the history of the populations, this could induce incorrect inference about the branch on which natural selection has occurred. We also recommend that the individuals in the population panels used as leaves in the graphs have roughly similar amounts of admixture. In other words, the method works best when admixture in the population was ancient enough for the admixed ancestry to have spread uniformly among members of the admixed panel. Otherwise, an admixture graph may not be the most appropriate way to model their evolution.

Finally, we have made an explicit assumption that our model should be sparse; i.e. that polygenic adaptation is rare. If in reality adaptation is common, the PhenoGraph approach will necessarily only identify selection on a small number of branches.

### 3.5 Future directions

A natural extension to the analyses we performed here would be to look at admixture graphs that include extinct populations or species, using ancient DNA [62]. For example, present-day Europeans are known to have resulted from admixture processes involving at least 3 ancestral populations [38], and so modeling them as a sister group to East Asians or as a 2-way mixture between a Native American-related component and a basal Eurasian component may be overly simplistic. Incorporating ancient DNA would not require any additional theoretical work, as ancient populations can be naturally included as leaves in an admixture graph [38, 12]. Care should be taken, however, in making sure that the quality of the ancient DNA data at trait-associated SNPs is accounted for while inferring the number of ancestral and derived alleles, and that there is a sufficient number of ancient individuals per population to detect polygenic adaptation. One could envision either performing pseudo-haploid sampling [18, 33, 12] or using allele frequency estimators obtained from genotype likelihoods [63, 64], while accounting for errors characteristic of ancient DNA [65, 66]. When working with SNP capture data [18, 33], it may be necessary to perform imputation at the GWAS SNPs, if these were not originally covered in the SNP capture array. We aim to tackle these issues in a future study.

One concern when analyzing admixture graphs is identifiability. As we mentioned before, there are multiple configurations of the *α* parameters that may lead to almost identical likelihoods. The use of the spike-and-slab prior and the *Q*_{B} filtering step serve to ameliorate this problem, assuming selection was sparse and only affected a few branches. An avenue of research could involve testing other types of models or constraints that may serve to better compare among different selection configurations, perhaps without having to reduce the space of possible candidate branches a priori, for example using reversible-jump MCMC for model selection [67].

In the future, it may be worth incorporating stabilizing selection into this method [47], or exploring tests of polygenic adaptation in the context of other types of demographic frameworks, like isolation-by-distance [68] or population structure [69] models. For example, one could envision settings in which trait-associated variants would be best modeled as expanding or contracting over a geographically extended area over time, in a way that is not explainable by genetic drift alone.

Lastly, we note that despite some clear methodological and conceptual differences, our method bears a close relationship to a number of methods for inferring changes in the rate of phenotypic evolution on species phylogenies over macroevolutionary timescales. Our use of the Normal model of drift as an approximation to the Wright-Fisher diffusion is closely analogous to the use of Brownian motion models in some phylogenetic methods [70, 71, 72, 73]. It may also be worth exploring the relationship between Ornstein-Uhlenbeck models for phenotypic evolution on phylogenies [74, 75] and the aforementioned hypothetical extension of our method to include stabilizing selection, as the two processes are closely related [76, 47].

## 4 Methods

### 4.1 MCMC implementation

For the MCMC transition probabilities of the *α* parameters, we use a Normal distribution with constant variance. For the transition probabilities of the ancestral allele frequencies, we use a truncated Normal distribution with point masses at 0 and 1, with variance equal to a constant (input by the user) times *f*_{X} (1− *f*_{X}) where *f*_{X} is the frequency of an allele in its current state. This allows for the proposed transitions to be larger for SNPs at medium frequencies and smaller for SNPs at high or low frequencies. We apply a Hastings correction in the acceptance ratio to account for this asymmetric proposal distribution.

For all applications above, we run our MCMC sampler for 1 million steps with a burn-in period of 10,000, and obtain posterior samples every 1,000 steps. The variance of the transition probabilities of the ancestral nodes and the *α* parameters were chosen so that the acceptance rate was close to 23% for each set of parameters. For the spike-and-slab prior, we set *τ* to be equal to 0.1, and *ζ* to be equal to 25. The lower and upper boundaries of the uniform hyperprior for *κ* were set to be 60% and 80%, respectively. We note that all of these parameters can be adjusted by the user as needed.

### 4.2 Derivation of *Q*_{B} statistic

Let the mean-centered vector of genetic values, **F** be the among-population genetic covariance matrix, and *V*_{A} be the additive genetic variance of the ancestral population for a given character. We compute *V*_{A} by taking the ancestral frequency for each SNP *i* to be the mean sample frequency over all populations :
where is the effect size estimate for the trait at SNP i. Following ref. [77, 15], if we are willing to assume that
then by the definition of the multivariate normal distribution, *for any choice* of and
where we have set for notational convenience. It follows that
and this holds for all choices of . Therefore, the square of *q*_{b} has a distribution under the null. Importantly, one is free to choose such that it represents a branch (j) in an admixture graph.

If j is a branch in an admixture graph, and we choose to scale such that it has unit length:
then
has an interpretation as the amount of among population additive genetic variance which we would expect to come about *because of* drift which occurs down branch j. In turn, is the actual amount of variance observed along the axis consistent with that branch. The ratio of these two quantities is our statistic *Q*_{B}(*j*) (with distribution ) and it is therefore the appropriate test statistic to ask whether there is evidence to reject neutrality along a certain branch. Note that, by design, branches with the exact same child nodes have equal *Q*_{B} statistics, as do branches at the root of the graph.

### 4.3 Choosing a cutoff for *Q*_{B}

We aimed to find a cutoff for *Q*_{B} that would serve to minimize the number of candidate branches to be explored in the MCMC while at the same time trying to ensure that the true selected branches are included among these candidates. One choice would be to select the cutoff of a distribution that would correspond to a P-value of 0.05/*k*, where *k* is the number of branches tested. We find, however, that a constant value for this cutoff is not the most desirable choice as a way to prioritize branches for exploring the strength of selection in each of the branches in the MCMC, as graphs of different sizes (i.e. amounts of drift) result in quite different senstivity values, as well as number of candidate branches included, when *α* = 0.1 (Panels B and C in Figures S45 and S46 for two cases where each branch has drift length = 0.02 and Figures S47 and S48 for two cases where each branch has drift length = 0.05). We observe the same issues when simulating under *α* = 0.2 (Figures S49 to S52).

A more stable strategy across graphs of different sizes that also works better at minimizing the number of candidates is to choose the cutoff to be a fraction of the largest *Q*_{B} statistic among all branches in the graph (Panels D and E in Figures S45 to S48). For all analyses below, we chose this to be 1/3 of the maximum *Q*_{B} statistic. We note that this is a less conservative strategy than the fixed cutoff. However, it is important to remember that we do not aim to formally test for selection here, but merely obtain a set of likely candidate branches for the MCMC to explore downstream, some of which may end up producing a posterior mean estimate of *α* that is consistent with neutrality (i.e. *α* = 0).

## 5 Acknowledgments

We thank Claude Bhérer, Guy Sella, Molly Przeworski, Graham Coop, Joshua Schraiber, Simon Myers, Benjamin Peter and members of the Sella and Przeworski labs for helpful advice and discussions. We also thank Thomas Mailund for help with running the *admixturegraph* R package, and Mark Lipson for help with running *MixMapper*.

## Footnotes

↵* fernandoracimo{at}gmail.com

## References

- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵
- [63].↵
- [64].↵
- [65].↵
- [66].↵
- [67].↵
- [68].↵
- [69].↵
- [70].↵
- [71].↵
- [72].↵
- [73].↵
- [74].↵
- [75].↵
- [76].↵
- [77].↵
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].