Detecting polygenic adaptation in admixture graphs

Fernando Racimo; Jeremy J. Berg; Joseph K. Pickrell

doi:10.1101/146043

Abstract

An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method – which we call PolyGraph – has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different human populations.

1 Introduction

There is much interest in identifying the individual genetic variants that have experienced natural selection during recent human evolution. Many popular methods tackle this problem by identifying alleles that have changed frequency faster than can be explained by genetic drift alone, and that can instead be explained by selective processes. These methods exploit patterns like haplotype homozygosity [1, 2, 3] and extreme population differentiation [4, 5, 6], and have yielded several important candidates for human adaptation: for example, LCT [7], EDAR [2], EPAS1 [4] and the FADS region [8]. In order for these signals to be detectable at the level of an individual locus, the historical changes in allele frequency must have been large and rapid. Therefore, they can only be produced by alleles that confer a strong selective advantage.

With the advent of large-scale GWAS for a variety of measurable traits, however, it has now become possible to detect a more subtle mechanism of adaptation. If a trait is polygenic, positive selection may instead occur by concerted shifts at many loci that all contribute to the variation in a trait. Over short time scales, these shifts are expected to be small (but see [9] for polygenic dynamics under longer time scales). They are also expected to occur in consistent directions, such that alleles that increase the trait will systematically rise in frequency (if selection favors the increase of the trait) or fall in frequency (if selection operates in the opposite direction). None of the allele frequency changes need to be large on their own for the phenotypic change to be large. This process is called polygenic adaptation and may underlie major evolutionary processes in recent human history [10, 11, 12, 13].

A number of methods have been developed to detect polygenic adaptation using loci identified from GWAS. Turchin et al. [14] was the first such study. They developed a test for polygenic adaptation between two populations, and showed that there were systematic frequency differences at height-associated loci between northern and southern Europeans, which could not be explained by genetic drift alone. Berg et al. [15] developed a more general method to detect polygenic adaptation by testing for over-dispersion of mean genetic values among several populations, using a genome-wide population co-variance matrix to predict how alleles should behave under neutrality. More recently, Robinson et al. [16] used theory by Ovaskainen et al. [17] to develop a similar population differentiation method to detect polygenic adaptation with GWAS. They also made use of the genome-wide covariance matrix, but, in contrast to Berg et al., their method is implemented in a Bayesian linear mixed model.

None of these methods require a detailed model of human history to detect polygenic adaptation. Their use of the genome-wide covariance matrix allows them to capture patterns of genetic drift among populations without having to infer their history. While this makes them quite powerful, it also means that they are not very useful at determining where and when polygenic adaptation took place in the past.

Here, we develop a method to detect polygenic adaptation that uses a more parameter-rich model of historical population structure: an admixture graph, which is a simplified representation of the history of divergences and admixture events among populations [18, 19]. Having an explicit model allows us to infer where particular bouts of polygenic adaptation took place in human history, so as to better understand how selection on trait-associated variants has occurred over past generations.

2 Results

2.1 Model

Assume we have measured genotypes at a SNP that influences a trait in a set of M populations. Let d_m be the count of the derived allele in population m and let be the vector across the M populations of each d_m observation. Let n_m be the total number of chromosomes observed in population m (together ). Assume we have an admixture graph G relating these populations, and that this graph consists of an accurate topology, as well as accurate branch lengths and admixture rates. Branch lengths are in units of drift, which are approximately equal to t/2N_e, where, for each branch, t is the number of generations and N_e is the effective population size, assuming t ≪ N_e [20]. In practice, we can estimate such a graph from neutral genome-wide data, and we use the program MixMapper [21] when applying our method to real data below.

We wish to model the changes in frequency of the trait-associated allele over the graph G. At each node in the graph, we introduce a parameter that corresponds to the allele frequency of the variant at that node. Let f_R be the derived allele frequency at the root of the graph, be the vector of allele frequencies at all the other internal nodes of the graph, and be the vector of allele frequencies at the tips of the graph.

The probability of the parameter values and the data can then be decomposed as follows:

We now take each of terms above in turn. First, the probability of the observed counts is simply a product of binomial probabilities: where f_m is the element of that corresponds to population m.

To get the probabilities of the changes in allele frequency across different nodes, consider a single branch of G. Assuming the branch is relatively short (such that the allele does not approach fixation or extinction in this time period) and there is no natural selection, we can use the Normal approximation to the Wright-Fisher diffusion [22, 23, 24] to model the allele frequency at the descendant node of the branch (f_D) as a function of the allele frequency at the ancestral node (f_A): where c corresponds to the amount of drift that has occurred in the branch. In practice, we use a truncated Normal distribution with point masses at 0 and 1, to account for the possibility of fixation or extinction of the allele [25].

In our model there may have also been selection on the allele on the branch, such that it was pushed to either higher or lower frequency because of its influence on a trait. We can model the selected allele frequency by modifying the infinitesimal mean of the Wright-Fisher diffusion and approximating the diffusion with a Normal distribution that now includes some additional terms ([14, 25, 26], Bhérer et al. in prep.): where is the effect size estimate at that site (defined with respect to the derived allele), is some function that relates the effect size estimate to the selective pressure and α is our positive selection parameter, which is approximately equal to the product of the selection coefficient for the advantageous allele and the duration of the selective process [14, 27]. In practice, we will set to be equal to the sign (+1 or −1) of (Figure 2), so as to avoid giving too much weight to variants of strong effect. We will model selection only on SNPs that are associated with a trait in a particular GWAS.

We can calculate the probability of these parameters at a particular site as the product of the Normal probability densities that correspond to the evolution of allele frequencies down each branch times a binomial probability density to account for sampling error. Let us denote the Normal density that corresponds to a particular branch λ as , where we suppress notation of for clarity. Then, for example, the probability of a given pattern of allele frequencies and sample counts over a rooted 3-leaf tree with 4 branches λ, ι, υ, ψ (Figure 1) can be computed as follows: when the α parameters and the allele frequency at the root of the tree are known. Note that some of the symbols here correspond to the same allele frequencies. For example, if the υ branch is one of the immediate descendant branches of the ι branch, then . Assuming SNPs are independent, we can compute the probability of the allele frequency configurations at all N trait-associated SNPs as a product over the probabilities at each of the SNPs: where P_i is the probability of the parameters of interest at SNP i under our tree model.

Figure 1:

Schematic of drift values (c) and allele frequencies (f) for a 3-leaf population tree with no admixture.

More generally, we can also compute the probability of our parameters in an admixture graph, containing nodes with more than one parent. In that case, the probability of an allele frequency of an admixed node is a weighted sum of the probability paths corresponding to its two parents, where the weights are the admixture rates for each of the two contributions.

In practice, for a given SNP, we know neither the allele frequencies at the inner nodes nodes, at the tip nodes and at the root node (f_R), nor the α parameters in each branch . We want to obtain a posterior distribution of these parameters, given the data and the known graph: . We aim to do this for all trait-associated SNPs. We therefore developed an MCMC sampler to transition between the states of these variables and estimate their posterior distribution (Figure 2).

Figure 2:

Schematic of PolyGraph estimation procedure for a 4-population graph with one admixture event. Panel A: The first step is the estimation of the admixture graph topology using neutral SNP data, via an admixture-graph fitting program like MixMapper. Panel B: Then, we use the Q_B statistic to determine which branches to explore in the MCMC. Selection parameters whose corresponding branches have a Q_B statistic that is smaller than a specific cutoff (red line) are set to a fixed value of 0 in the MCMC. Panel C: Model for MCMC sampling. The SNP frequencies in the nodes of the graph are shown in green, while the selection parameters for each candidate branch are shown in purple. For each SNP, the likelihood of each branch of the graph is a Normal distribution. To model the sampling of derived alleles in the leaves of the graph, we use a binomial distribution.

We set the prior for the frequencies at the root f_R to be a uniform distribution. For a given SNP i,

As there are many combinations of α parameters that generate almost equivalent likelihoods in a complex admixture graph (Figure S1), we use a “spike-and-slab” prior for the α parameters, so as to promote sparsity. For a given branch j,

Figure S1:

Four example scenarios in which different combinations of α parameters can produce very similar likelihood values.

This is a mixture of two Normal distributions centered at 0: one of the distributions has a wide standard deviation (τ), while the other has a much narrower standard deviation, which is a fraction (1/ζ) of τ, and approximates a point mass at 0 [28, 29]. Here, κ is the mixture probability of drawing from the narrower Normal distribution, and we model it as a uniform hyperprior (see Methods). The idea behind this is that our assumed prior belief is that only a few of the branches in the admixture graph have experienced bouts of polygenic adaptation, so we reward α parameters that tend to stay in the neighborhood of 0 during the MCMC run.

2.2 A statistic for prioritizing branches

We observed via simulations that different combinations of α parameters can produce very similar likelihood values. This causes the MCMC sampler to explore different combinations of α values in the same posterior run, when only one such combination is actually correct (Figure S1). The aforementioned spike-and-slab prior serves to partially ameliorate this problem, but we aimed to find a way to further encourage sparsity by reducing the possible number of candidate branches that are explored in the MCMC.

We therefore devised a set of summary statistics that can be computed before starting an MCMC run and are meant to detect branch-specific deviations from neutrality. Let F be the empirical population covariance matrix, which – under ideal neutral conditions – should be determined by the admixture graph connecting all the populations. Let be the mean-centered vector of estimated mean genetic values for each of the M populations, computed from the N SNPs that are known to be associated with a trait. For a specific population m:

Here, is the sample frequency of SNP i in a panel of population m, and is its effect size estimate. The vector therefore contains the z_m values for all M populations.

Furthermore, let us define the vector for a particular branch j to be equal to the contribution w of that branch to each of the leaves of the graph. For example, branch v-q in Figure 2 is a full ancestral branch to leaf C, a partial ancestral branch to leaf B (due to the admixture event), but not an ancestral branch to either A or D. Therefore, the vector is equal to (w_A,w_B,w_C,w_D) = (0, 1 – γ, 1, 0). By the same reasoning, and .

Now, let: where is a scaled version of so that is has unit length. The numerator of this statistic is proportional to the squared covariance between the genetic values and the vector representing the contribution of drift down branch j to the overall pattern of among population divergence. The denominator gives the expectation of the numerator under the neutral model. This statistic reflects how much of the deviations from neutrality among the population mean genetic values for a trait is due to branch j. One can show (see Methods) that Q_B(j) has a distribution under a null model of multivariate Normal drift, for any branch j, and excessively large values of Q_B(j) therefore represent evidence suggesting non-neutral evolution down branch j.

We used this summary statistic (computed for each branch) to prioritize which branches to explore in our MCMC. We applied a prior point mass of zero to all branches whose corresponding Q_B statistic were smaller than a particular cutoff. The choice of cutoff was based on simulations (see Methods). The MCMC only produces posterior samples for branches that pass this cutoff and are therefore highly deviated from their expectation under neutrality. We also update the α parameters of these latter branches in our MCMC with a frequency proportional to their Q_B values, in a similar fashion to [30].

2.3 Implementation

We implemented both the MCMC and the Q_B statistic computation in a program that we call PolyGraph. We use the R packages admixturegraph [31] and igraph [32] to visualize and manipulate various aspects of a graph. The R scripts to run PolyGraph can be downloaded here: https://github.com/FerRacimo/PolyGraph

2.4 Simulations

We explored different demographic scenarios. We first simulated a simple three-leaf tree with four branches, in which the sampled panels in the leaves were each composed of 100 diploid individuals (Figure 3.A). We tested scenarios of different branch lengths: each of the branches was simulated to be either of length 0.02 or of length 0.05. We also tested different types of branch under selection (either a terminal branch or an internal branch). We additionally tested a four-leaf admixture graph with one admixture event (Figure 3.C) in these same scenarios. For comparison, the amount of genetic drift between Spanish and French human populations is 0.016 and the amount of drift between French and Han Chinese human populations is 0.22 [33]. The latter is approximately equal to the drift separating populations A and C in our 4-population graph, when each branch has length equal to 0.05.

Figure 3:

We simulated 400 SNPs affecting a trait under polygenic adaptation, and then used our MCMC to obtain posterior distributions of the α parameters for each branch. The red arrows denote the selected branch. The red line in the box-plots denotes the simulated value of α (in this case, 0.2). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. The numbers in each graph denote the drift lengths and admixture proportions. A) Three-leaf tree with selection in a terminal branch (B-q). B) Three-leaf tree with selection in an internal branch (q-r). C) Four-population admixture graph with selection in an internal branch (v-q). D) Four-population admixture graph with selection in a terminal branch (C-v).

In each scenario, we simulated N = 400 trait-associated SNPs. For each SNP, we sampled the root allele frequency (f_R) from a Beta(2,2) distribution, to emulate the fact that, in the real data, variants in the leaf populations tend to be further away from the boundaries of fixation and extinction than under a uniform distribution. We evolved the SNPs throughout an admixture graph forwards in time using truncated Normal distributions with point masses at the boundaries of 0 and 1, and used binomial distributions to sample panel allele frequencies from the leaf populations. The effect sizes of the SNPs were drawn from a Normal distribution with mean 0 and we simulated polygenic adaptation (with α = 0.1 and α = 0.2) in a particular branch using only the sign of the effect size of each SNP. We also simulated an additional 10,000 SNPs that evolved neutrally under the same demography, so as to estimate the neutral population covariance matrix.

For all simulation scenarios, we tested five 400-SNPs replicates. We ran the MCMC four times on each simulation to check that it was behaving consistently. We observed that the runs for each simulation were very similar to each other, so we only show one run for each simulation.

When the α parameter is large (α = 0.2), the MCMC performed very well (Figure 3). For a tree (Figures S2 and S3) or a graph (Figures S4 and S5) with small branch lengths (0.02), the Q_B statistic corresponding to the selected branch was included as a potential candidate branch in the MCMC in all simulations. PolyGraph then consistently converged on the appropriate joint distribution of selection parameters. When the branches were simulated to be longer (0.05), the MCMC performed well, but, in a few simulations, it produced positive estimates for α parameters in neutral branches or failed to find evidence for selection in any branch (Figures S6 to S9). This occurred more often when the α parameter was simulated to be smaller (α = 0.1), but again, this was less of a problem with short-branch graphs (Figures S10 to S13) than with long-branch graphs (Figures S14 to S17). In general, we conclude that the method performs best when selective pressures are strong and/or exerted over long time periods (i.e. large α), and when drift parameters are small.

Figure S2:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with small branch drift parameters (0.02) and α = 0.2 (red line) simulated in a terminal branch (B-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S3:

Posterior distributions of α parameters for five 400-SNP simulations of a 3-leaf tree with small branch drift parameters (0.02) and α = 0.2 (red line) simulated in an interior branch (q-r). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S4:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with small branch drift parameters (0.02) and α = 0.2 (red line) simulated in a terminal branch (C-v). The upper and lower hinges denote the 25th and 75th percentiles. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S5:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with small branch drift parameters (0.02) and α = 0.2 (red line) simulated in an interior branch (v-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S6:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with large branch drift parameters (0.05) and α = 0.2 (red line) simulated in a terminal branch (B-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S7:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with large branch drift parameters (0.05) and α = 0.2 (red line) simulated in an interior branch (q-r). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S8:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with large branch drift parameters (0.05) and α = 0.2 (red line) simulated in a terminal branch (C-v). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S9:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with large branch drift parameters (0.05) and α = 0.2 (red line) simulated in an interior branch (v-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S10:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with small branch drift parameters (0.02) and α = 0.1 (red line) simulated in a terminal branch (B-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S11:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with small branch drift parameters (0.02) and α = 0.1 (red line) simulated in an interior branch (q-r). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S12:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with small branch drift parameters (0.02) and α = 0.1 (red line) simulated in a terminal branch (C-v). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S13:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with small branch drift parameters (0.02) and α = 0.1 (red line) simulated in an interior branch (v-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S14:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with large branch drift parameters (0.05) and α = 0.1 (red line) simulated in a terminal branch (B-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S15:

Posterior distributions of α parameters for five 400-SNPs simulations of a 3-leaf tree with large branch drift parameters (0.05) and α = 0.1 (red line) simulated in an interior branch (q-r). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S16:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with large branch drift parameters (0.05) and α = 0.1 (red line) simulated in a terminal branch (C-v). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

Figure S17:

Posterior distributions of α parameters for five 400-SNPs simulations of a 4-leaf admixture graph with large branch drift parameters (0.05) and α = 0.1 (red line) simulated in an interior branch (v-q). The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points.

We were concerned about false positive estimates of selection when the graph is mis-specified. To assess this, we simulated a graph like the one shown in Figure 3.C but with no selection. We first run PolyGraph while correctly specifying the topology and the branch lengths (of length equal to 0.02) as input (Figure S18), and observed that all posterior α estimates are tightly centered at 0, as expected. Then, we simulated a graph with the same topology but with each branch having length equal to 0.03, while incorrectly under-estimating the length of each branch to still be equal to 0.02 (Figure S19). Finally, we simulated the same graph but with each branch having length equal to 0.04 while incorrectly under-estimating the branch lengths to all be equal to 0.02 (Figure S20). With increasingly stronger misspecification of the branch lengths, we observe that the behavior of some of the posterior estimates becomes more erratic. Visual inspection of the MCMC trace indicates that underestimation of the branch lengths makes the chain to become more “sticky”, causing some parameters to get stuck at incorrect areas of parameter space for long periods of time

Figure S18:

Posterior distributions of α parameters for five 400-SNPs neutral simulations of a 4-leaf admixture graph with drift parameters equal to 0.02 in each branch.

Figure S19:

Posterior distributions of α parameters for five 400-SNPs neutral simulations of a 4-leaf admixture graph with drift parameters equal to 0.03 in each branch, but incorrectly specified to be equal to 0.02.

Figure S20:

Posterior distributions of α parameters for five 400-SNPs neutral simulations of a 4-leaf admixture graph with drift parameters equal to 0.04 in each branch, but incorrectly specified to be equal to 0.02.

We also simulated a neutral graph as in Figure 3.C but pretended like population A had not been sampled, and that the graph was (incorrectly) estimated to be a 3-population tree like the one in Figure 3.A. This topological misspecification does not seem to affect the inference of neutrality (Figure S21), but we do not discard the possibility of other incorrect types of topologies that could generate wrong inferences. We therefore stress that the admixture graph – especially the branch lengths – relating the populations under study should be correctly estimated before running PolyGraph. We also advise to run the MCMC only when there is significant evidence for selection based on the Q_X statistic [15], which does not need an admixture graph as input, as it uses the full covariance matrix to model the expected amount of drift separating each of the populations.

Figure S21:

Posterior distributions of α parameters for five 400-SNPs neutral simulations of a 4-leaf admixture graph with drift parameters equal to 0.02 in each branch. We pretended one of the populations (A) was not sampled, and specified the topology to be a 3-leaf tree with drift parameters equal to 0.02 in each branch. Populations B, C and D were relabeled to be A, B and C, respectively.

2.5 Application to 1000 Genomes data

We tested our method on traits from 43 GWAS on 42 different traits (Table S1, two of the GWAS are for age at menarche) that were previously assembled as part of a meta-analysis studying the genetic correlations between such traits, using approximately independent linkage disequilibrium blocks across the genome [34]. For each block with a posterior probability of containing an association > 90%, we included the SNP with the maximum posterior probability of being the causative variant into our list of candidate trait-associated SNPs. We used the VCF data from the 1000 Genomes Project [35] and built admixture graphs using MixMapper [21]. We excluded SNPs for which the ancestral allele in the 1000 Genomes data was unknown or unsure (lower case in VCF file). Because MixMapper cannot distinguish between the two drift values corresponding to two admixing branches with a common child node and the drift value specific to the immediate descendant branch of the child node (Figure 2 in ref. [21]), we forced the drift in the two admixing branches to be equal to 0.001 and assigned the drift estimated by MixMapper to the branch immediately descending from the child node.

We began by fitting a 7-leaf tree with no admixture events containing diverse populations sampled across the world (Figure S22.A). The tree included Nigerian Esan (ESN), Sierra Leone Mende (MSL), Northern Europeans from Utah (CEU), Southern Europeans from Tuscany (TSI), Dai Chinese (CDX), Japanese (JPT) and Peruvians (PEL). We took trait-associated variants to be under polygenic adaptation if the P-value for the corresponding Q_X [15] statistic (testing for overall selection among the populations) was < 0.05/n, where n is the number of assessed GWASs. Traits with associated variants that passed this criterion are shown in Table 1. To account for possible artifacts arising from the ascertainment scheme for each GWAS, we also generated 1,000 pseudo-replicates in which we randomly switched the sign of the estimated effect size for all trait-associated SNPs. This serves to preserve the SNP ascertainment of each trait, while removing the effect of selection. We then computed a second P-value of the observed Q_X (P_rand) by comparing it to these pseudo-replicates (Table 1).

Figure S22:

A) A 7-leaf population tree containing the following population panels from the 1000 Genomes Project: Nigerian Esan (ESN), Sierra Leone Mende (MSL), Northern Europeans from Utah (CEU), Southern Europeans from Tuscany (TSI), Dai Chinese (CDX), Japanese (JPT) and Peruvians (PEL). B) A 5-leaf admixture graph containing the following panels from the 1000 Genomes project: Yoruba (YRI), Colombians (CLM), CEU, CHB and PEL. C) A 7-leaf admixture graph containing panels (and combinations of panels) from the imputed Lazaridis et al. (2014) dataset. EastAsian = [Cambodian, Mongola, Xibo, Daur, Hezhen, Oroqen, Naxi, Yi, Japanese, Han NChina, Lahu, Miao, She, Han, Tujia, Dai]. NativeAmerican = [Maya, Pima, Surui, Karitiana, Colombian]. Oceanian = [Papuan, Australian]. We tested three graphs with three different combinations of European populations, with different amounts of EEF ancestry: EuropeA (low EEF) = [Estonian, Lithuanian, Scottish, Icelandic, Norwegian, Orcadian, Czech, English]. EuropeB (medium EEF) = [Hungarian, Croatian, French, Basque, Spanish North, French South]. EuropeC (high EEF) = [Bulgarian, Bergamo, Tuscan, Albanian, Greek, Spanish].

View this table:

Table 1:

Trait-associated variants with Bonferroni-corrected significant evidence of being under polygenic adaptation in the 1000 Genomes data, using the Q_X statistic: P < 0.05/n where n is the number of GWAS tested, assuming a χ² distribution. We also computed P-values from 1,000 pseudo-replicates in which we randomly switched the sign of effect size estimates, to account for each GWAS’s ascertainment scheme (P_rand). Trait-associated variants for which P_rand < 0.05/n are denoted with **, and trait-associated variants for which P_rand < 0.05 are denoted with *.

We ran our MCMC on these trait-associated variants and obtained posterior distributions for the α parameters with the strongest evidence for selection, prioritizing branches as explained above (Figure S23). The P-values of the Q_B statistic (obtained from a distribution) for each branch are shown in Table S2. We find strong evidence for selection on variants associated with height, educational attainment and self-reported unibrow, but little or no evidence for variants associated with male-pattern baldness or schizophrenia: even though these trait-associated variants are significant under the Q_X and Q_B frameworks, all their α parameters are approximately centered at 0. For height, we observe both selection for variants increasing height in the ancestral European branch and for variants decreasing height in the ancestral East Asian / Native American branch. However, this is only a consequence of the MCMC showing alternate strong support for selection in either one or the other branch at different points in the run, but only weak support for selection in both branches simultaneously (Figure S24), suggesting we are unable to discern which among these is the correct configuration.

Figure S23:

Box plots of posterior distributions for α parameters, for trait-associated variants with significant evidence for selection in a 7-leaf population tree using the 1000 Genomes data (Figure S22.A). Parameters with flat distributions were discarded a priori using the Q_B statistic. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. EDU = educational attainment. MPB_23 = self-reported male-pattern baldness from 23andMe. UB_23 = self-reported unibrow from 23andMe.

Figure S24:

We plotted the α values of two branches for all posterior samples of the MCMC run for variants associated with height in the 7-leaf population tree composed of 1000 Genomes panels. For each MCMC sample, the x-axis corresponds to the α parameter in the ancestral European branch, while the y-axis corresponds to the α parameter in the ancestral East Asian / Native American branch.

We also tested a graph with an admixture event (Figure S22.B). This graph contained Yoruba (YRI), Colombians (CLM), CEU, CHB and PEL. We modeled CLM as resulting from an admixture event between CEU (76.55%) and PEL (23.45%), the latter of which is the panel with the highest amount of Native American ancestry in the 1000 Genomes Project [35]. Here, we recapitulated many of our previous findings from the 7-leaf tree (Table 1, Figure S25), like selection on variants associated with height and educational attainment. We list the P-values of the Q_B statistic for each branch in Table S3.

Figure S25:

Box plots of posterior distributions for α parameters, for trait-associated variants with significant evidence for selection in a 5-leaf admixture graph using the 1000 Genomes data (Figure S22.B). Parameters with flat distributions were discarded a priori using the Q_B statistic. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. EDU = educational attainment. MPB_23 = self-reported male-pattern baldness from 23andMe. UB_23 = self-reported unibrow from 23andMe. SCZ = schizophrenia.

To facilitate the visualization of posterior distributions for α parameters, we developed a new way to plot polygenic adaptation in a graph: a “poly-graph”. In a poly-graph, the vertical component of a non-admixing branch is proportional to the amount of genetic drift that it experienced (calculated via MixMapper). The position of admixed nodes was determined based on the drift value of one randomly chosen parent branch. The colors indicate the mean posterior estimate of the selection parameter for variants associated with the corresponding trait (with red indicating an increase in trait-increasing variant frequency, and blue indicating a decrease in trait-increasing variant frequency). We plotted poly-graphs for all traits that passed the significance criterion in the 7-leaf tree (Figure 4) and in the 5-leaf admixture graph (Figure 5).

Figure 4:

Graphs for trait-associated variants that show significant evidence for polygenic adaptation in the 7-leaf tree built using 1000 Genomes allele frequency data. ESN = Nigerian Esan; MSL = Sierra Leone Mende; CEU = Northern Europeans from Utah; TSI = Southern Europeans from Tuscany; CDX = Dai Chinese; JPT = Japanese; PEL = Peruvians.

Figure 5:

Graphs for trait-associated variants that show significant evidence for polygenic adaptation in the 5-leaf admixture graph built using 1000 Genomes allele frequency data. CEU = Northern Europeans from Utah; TSI = Southern Europeans from Tuscany; PEL = Peruvians; CLM = Colombians; YRI = Yoruba; CHB = Han Chinese.

To make sure there were no artifacts due to GWAS ascertainment [15], we also generated an empirical null distribution in which we produced 1,000 samples, each containing SNPs that were frequency-matched to the trait-associated SNPs, using their allele frequency in CEU, for our three sets of SNPs with strongest evidence of selection: variants associated with educational attainment, with self-reported unibrow and with height. We computed the Q_X statistic for each of these SNP samples, using the population panels from Figure 4. We do not observe a value of Q_X as high as the one observed in the real data, for either of the three sets (Figure 6).

Figure 6:

We generated an empirical null distribution by sampling SNPs from the genome that matched the CEU allele frequency of the SNPs associated with educational attainment, self-reported unibrow and height. We generated 1,000 samples this way, and computed the Q_X statistic for each sample, using the population panels from Figure 4. The Q_X value observed in the real data is depicted with a red line. We also plot the density of the corresponding χ² distribution (blue line) for comparison.

To test how robust our results were to our modeling assumptions, we also performed a simpler two-tailed binomial sign test between every pair of 1000 Genomes panels. The assumption here is that – for every panel X and Y – we should observe roughly equal number of trait-increasing alleles at higher frequency in X than in Y as trait-decreasing alleles at higher frequency in X than in Y, under a model of neutrality with respect to the effect size sign [36]. This test only uses information about the sign of the effect estimates of each SNP, not their magnitudes, and does not use information about genome-wide drift parameters between each population. Thus, they are bound to have less power than the Q_B, Q_X or MCMC tests. The P-values for these pairwise binomial tests are shown in Tables S4 to S8 for all traits that were found to have significant evidence of selection using the Q_X statistic. The top 10 most significant pairwise comparisons are shown in Tables S9 to S13. For ease of visualization, we also plotted, for each panel, the number of pairwise tests involving that panel that resulted in a P-value < 0.05 (Figures S26 to S30).

Figure S26:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with educational attainment with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels and continental super-panels from the 1000 Genomes dataset.

Figure S27:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with height with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels and continental super-panels from the 1000 Genomes dataset.

Figure S28:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with self-reported male-pattern baldness with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels and continental super-panels from the 1000 Genomes dataset.

Figure S29:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with schizophrenia with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels and continental super-panels from the 1000 Genomes dataset.

Figure S30:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with self-reported unibrow with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels and continental super-panels from the 1000 Genomes dataset.

We were interested in verifying how sensitive different proportions of missing data (i.e. removal of SNPs) or erroneous effect size estimates would be to our three strongest signals of polygenic adaptation, on variants associated with height, educational attainment and unibrow. For this purpose, we focused on the comparison between CEU and CHB. First, we simulated different proportions of missing trait-associated SNPs – ranging from 5% to 95%, with step sizes of 5%. For each of 10,000 simulations under each missing data scenario, we assessed how often the polygenic score for unibrow and educational attainment in CHB was higher than the polygenic score for CEU, like we observe in the 1000 Genomes data. Height follows the opposite pattern (with CEU having a higher polygenic score than CHB), so in that case we assessed how often its polygenic score in CEU was higher than in CHB, across the 10,000 simulations for each scenario. Note that we built these scores using only the SNPs used in our selection tests. The results are in Figure S31. For example, we see that – even with 20% missing data – the polygenic scores for either of the 3 traits preserve the observed relationship of inequality between CEU and CHB almost 100% of the time. Finally, we simulated a situation in which some proportion of the signs of the effect size estimates were misassigned. We then assessed how often we could replicate the signal we see between CEU and CHB, but this time under different proportions of sign misassignment (Figure S32).

Figure S31:

We simulated different proportions of missing trait-associated SNPs (x-axis). We then assessed – for each proportion – how often we could recreate the inequality relationship observed between the polygenic scores for CHB and CEU built using the trait-associated SNPs of the three traits with strongest evidence for polygenic adaptation: height, educational attainment and self-reported unibrow. The observed inequality relationship for educational attainment and unibrow is Poly_CHB > Poly_CEU, where Poly_X is the polygenic score for panel X. The observed inequality relationship for height is the reverse: Poly_CEU > Poly_CHB. We used 10,000 simulations for each missing data scenario.

Figure S32:

We simulated different proportions of incorrectly assigned signs for the effect size estimates of trait-associated SNPs. We then assessed – for each proportion – how often we could recreate the inequality relationship observed between the polygenic scores for CHB and CEU built using the trait-associated SNPs of the three traits with strongest evidence for polygenic adaptation: height, educational attainment and self-reported unibrow. The observed inequality relationship for educational attainment and unibrow is Poly_CHB > Poly_CEU, where Poly_X is the polygenic score for panel X. The observed inequality relationship for height is the reverse: Poly_CEU > Poly_CHB. We used 10,000 simulations for each sign misassignment scenario.

Finally, to understand how the signal of selection was distributed among our SNPs, we plotted the absolute value of the effect sizes of trait-associated SNPs for height, educational attainment and self-reported unibrow, as a function of the difference in frequency observed between CHB and CEU, polarized with respect to the trait-increasing allele in each SNP (Figure S33). We find that, in the case of self-reported unibrow, there are three variants of large effect with large frequency differences contributing to a higher polygenic score in CHB: rs3827760, rs16891982 and rs12916300. These SNPs are located in the genes EDAR, SLC45A2 and OCA2. These are genes involved in pigmentation and skin development, and all three have documented signatures of selective sweeps causing strong allele frequency differences between Europeans and East Asians [7, 3, 37, 2, 12]. After removing SNPs with large absolute effect size values (≥ 0.05), the P-value of the Q_X statistic for these variants remains significant (P = 7.04 ∗ 10⁻⁵). When looking at the other two sets (variants associated with height and educational attainment), the signal of selection is more uniformly distributed among the SNPs, with no strong outliers of large effect with large frequency differences (Figure S33).

Figure S33:

We plotted the absolute value of the effect sizes of trait-associated SNPs for height, educational attainment and self-reported unibrow, as a function of the difference in frequency observed between CHB and CEU. The panel frequencies are polarized with respect to the trait-increasing allele. We then overlaid a contour plot over each scatter-plot.

2.6 Application to Lazaridis et al. (2014) data

We applied our method to a more broadly sampled SNP chip dataset containing present-day humans from 203 populations genotyped with the Human Origins array [18, 38]. This dataset was imputed using SHAPEIT [39] on the Michigan Imputation Server [40] with the 1000 Genomes Phase 3 data [35] as the reference panel (Bhérer et al. in prep.). We tested for polygenic adaptation in a 7-leaf admixture graph. This graph contains the panels Yoruba, Mandenka and Sardinian, along with the following 4 combinations of panels, which we built so as to have a large number of individuals per panel. The panel “Oceanian” contains the panels Papuan and Australian. The panel “EastAsian” contains the panels Cambodian, Mongola, Xibo, Daur, Hezhen, Oroqen, Naxi, Yi, Japanese, Han NChina, Lahu, Miao, She, Han, Tujia and Dai. The panel “NativeAmerican” contains the panels Maya, Pima, Surui, Karitiana and Colombian. Finally, we modeled Europeans as a 2-way mixture of an ancestral component related to “NativeAmerican” and another component that split basally from the Eurasian tree. This was the mixture fitted to Europeans by ref. [21], and provides a better fit to the data than modeling Europeans merely as a sister group to East Asians and Native Americans. Though we recognize that Europeans are better modeled as a 3- or 4-way mixture of ancestral components [38, 33, 41], it is hard to produce such a mixture without resorting to ancient DNA data (see Discussion). We tested 3 different versions of this graph, each containing three different sets of European populations (Figure S22.C) distinguished by how much “early European farmer” (EEF) ancestry they had (based on Figure 4 of ref. [38]). “Euro-peA” (low EEF) contains the following panels: Estonian, Lithuanian, Scottish, Icelandic, Norwegian, Orcadian, Czech, English. “EuropeB” (medium EEF) contains Hungarian, Croatian, French, Basque, Spanish North and French South. Finally, “EuropeC” (high EEF) contains Bulgarian, Bergamo, Tuscan, Albanian, Greek and Spanish.

Trait-associated variants with significant evidence for polygenic adaptation are listed in Table 2 and the P-values of the Q_B statistic for each branch are shown in Tables S14 to S16. With these data, we are able to recapitulate the adaptive increase in height-increasing variants in Europeans we had seen before, but only observe it in populations with medium or low EEF ancestry (Figures 7 and S34 to S38). This pattern is consistent with previous observations made using ancient DNA in Europeans [12]. We also recapitulate selection patterns on variants associated with other traits, like unibrow, educational attainment and male-pattern baldness, and observe evidence for polygenic adaptation in some additional trait-associated variants, like photic sneeze reflex (Table 2).

Figure S34:

Box plots of posterior distribution for α parameters, for trait-associated variants with significant evidence for selection in a 7-leaf population graph, using the Lazaridis et al. (2014) dataset and including the set of European populations with low EEF ancestry (”EuropeA”). Parameters with flat distributions were discarded a priori using the Q_B statistic. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. EDU = educational attainment. PS_23 = self-reported photic sneeze reflex from 23andMe. UB_23 = self-reported unibrow from 23andMe.

Figure S35:

Box plots of posterior distribution for α parameters, for trait-associated variants with significant evidence for selection in a 7-leaf population graph, using the Lazaridis et al. (2014) dataset and including the set of European populations with medium EEF ancestry (“EuropeB”). Parameters with flat distributions were discarded a priori using the Q_B statistic. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. AVD_23 = self-reported age at voice drop. EDU = educational attainment. PS_23 = self-reported photic sneeze reflex from 23andMe. UB_23 = self-reported unibrow from 23andMe.

Figure S36:

Box plots of posterior distribution for α parameters, for trait-associated variants with significant evidence for selection in a 7-leaf population graph, using the Lazaridis et al. (2014) dataset and including the set of European populations with high EEF ancestry (“EuropeC”). Parameters with flat distributions were discarded a priori using the Q_B statistic. The lower, middle and upper hinges denote the 25th, 50th and 75th percentiles, respectively. The upper whisker extends to the highest value that is within 1.5 * IQR of the upper hinge, where IQR is the inter-quartile range. The lower whisker extends to the lowest value within 1.5 * IQR of the lower hinge. Data beyond the whiskers are plotted as points. EDU = educational attainment. PS_23 = self-reported photic sneeze reflex from 23andMe. UB_23 = self-reported unibrow from 23andMe.

Figure S37:

Graphs for trait-associated variants that show significant evidence for polygenic adaptation in the 7-leaf admixture graph built using the Lazaridis et al. (2014) dataset and including the set of European populations with medium EEF ancestry (“EuropeB”).

Figure S38:

View this table:

Table 2:

Trait-associated variants with Bonferroni-corrected significant evidence of being under polygenic adaptation in the Lazaridis et al. (2014) dataset, using the Q_X statistic: P < 0.05/n where n is the number of GWAS tested, assuming a χ² distribution. We also computed P-values from 1,000 pseudo-replicates in which we randomly switched the sign of effect size estimates, to account for each GWAS’s ascertainment scheme (P_rand). Trait-associated variants for which P_rand < 0.05/n are denoted with **, and trait-associated variants for which P_rand < 0.05 are denoted with *. We tested 3 different graphs with different sets of European panels, containing either low (EuropeA), medium (EuropeB) or high (EuropeC) EEF ancestry.

Figure 7:

To check if there were systematic biases in ancestral/derived allele polarity relative to the direction of the effect size, we performed a two-tailed binomial test on each trait for which we found significant evidence of polygenic adaptation in the Lazaridis et al. or the 1000 Genomes dataset (Table S17). We find that only schizophrenia has a significant bias, showing an excess of derived alleles with negative effect sizes (P = 0.03511), though this is not significant after Bonferroni correction. We therefore caution that the evolution of these trait-associated variants may not be well-modeled by the multivariate Normal assumptions that we make to calculate the Q_B statistic or when running the MCMC.

As before, to check the robustness of our results to our modeling assumptions, we show P-values for pairwise binomial sign tests involving each of the panels in the Lazaridis et al. (2014) dataset in Tables S18 to S22. The top 10 most significant pairwise comparisons are in Tables S23 to S27. Figures S39 to S43 show, for each panel, the number of pairwise tests involving that panel that resulted in a P-value < 0.05.

Figure S39:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with self-reported age at voice drop with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels from the Lazaridis et al. (2014) dataset.

Figure S40:

Figure S41:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with height with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels from the Lazaridis et al. (2014) dataset.

Figure S42:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with self-reported photic sneeze reflex with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels from the Lazaridis et al. (2014) dataset.

Figure S43:

This barplot shows, for each panel, the number of two-tailed binomial tests for systematic allele frequency differences in the sign of the effect size estimate of variants associated with self-reported unibrow with P-values < 0.05, which involve that panel as a member of the pair. We tested all panels from the Lazaridis et al. (2014) dataset.

3 Discussion

We have developed a method to infer polygenic adaptation on trait-associated variants in an admixture graph, so as to be able to pinpoint where in the history of a set of populations this type of selective processes took place. Our method requires GWAS data for a particular trait, allele frequency data for a set of populations, and a precomputed admixture graph that relates these populations with each other. Importantly, the method relies on the admixture graph as an accurate description of the ancestral genome-wide relationships among the populations under study. Potential users should be careful about correctly estimating branch lengths and ghost populations which are not included in the graph but may have substantial unaccounted ancestry contributions to the populations that are included. We used MixMapper [21] to infer the graph topology and branch lengths. Alternatively, one can also use other programs, like qpGraph [18] or TreeMix [19] to build graphs, though we caution that the estimated drift values of the branches in the output of these programs are scaled (in different ways) by the heterozygosity of ancestral nodes (see Supplementary Material of [21] for a way to properly obtain drift values from differences in allele frequencies between populations).

Running PolyGraph involves a two-step process, each of which is complementary to the other. The first step – the calculation of the Q_B statistic – is fast and provides a preliminary way to assess which branch in a graph has significant evidence for polygenic adaptation. However, this statistic does not model the ancestral allele frequencies at each node of the graph. The second step – the MCMC – is slower, but provides posterior distributions for selection parameters under a more parameter-rich model of population history. In our pipeline, we use the first method as a filtering step, to avoid exploring selection parameters in the MCMC for those branches that have little evidence for selection, and encourage the MCMC to be sparse in its assignments of selection in the graph. We illustrate this point in Figure S44, where we show a side-by-side comparison of a poly-graph built using the posterior α estimates and a poly-graph built using q_b – a signed version of the Q_B statistic (see Methods).

Figure S44:

Side-by-side comparison of a graph built using the posterior estimates of the alpha parameters using variants associated with height (left), and a graph built using the q_b statistics using the same variants (right). The color scale is arbitrary and specific to each graph.

In application to human populations, we detected signals of polygenic adaptation on sets of variants that have been identified to influence height, educational attainment, and unibrow. Selection on variants associated with height in Europeans has been previously reported elsewhere [15, 14, 16, 12] and our results are consistent with previous findings showing that height-increasing variants are at significantly and systematically higher frequencies in northern than in southern European populations. The signal for selection affecting variants associated with self-reported unibrow is also strong, but partly driven by a few variants of large effect with large frequency differences between populations, which have documented evidence for selective sweeps in genes involved in hair, skin and eye pigmentation, and skin development [3, 37, 7, 2, 12]. Additional trait-associated variants had inconsistent evidence across datasets and graph frameworks (like schizophrenia or male-pattern baldness) and/or were driven by differences in only a few SNPs of small effect (like age at voice drop), and so we do not discuss them.

We find evidence for polygenic adaptation in East Asian populations at variants that have been associated with educational attainment in European GWAS. This result is robust to the choice of data we used (1000 Genomes or Lazaridis et al. (2014) panels). Our modeling framework suggests that selection operated before or early in the process of divergence among East Asian populations - whose earliest separation dates at least as far back as approximately 10 thousand years ago [42, 43, 44, 45] – because the signal is common to different East Asian populations (Han Chinese, Dai Chinese, Japanese, Koreans, etc.). The signal is also robust to GWAS ascertainment (Figure 6), and to our modeling assumptions, as we found a significant difference between East Asian and non-East-Asian populations even when performing a simple binomial sign test (Tables S4, S9, S19 and S24).

Interpreting this and the other signals we found requires awareness of a number of technical caveats, as well as several fundamental conceptual difficulties with the study of polygenic adaptation, some of which may ultimately prove intractable.

3.1 What is the signal of polygenic adaptation?

Before discussing these difficulties, it is worth articulating exactly what a signal of poly-genic adaptation consists of. Taking the height example as a case in point, the signal is that a set of genetic variants that have been identified as associated with increased height in a European GWAS are (as a class) at higher frequency in northern Europeans today than would be expected by genetic drift alone. Though this observation is consistent with the hypothesis that natural selection has operated on these variants, it does not necessarily imply that natural selection has operated directly on “height”, nor that observed height differences between northern Europeans and other populations are necessarily genetic and due to selection.

3.2 Pleiotropy and phenotype definition

When looking at all our variant classes, we are necessarily limited by the traits that have been defined and studied by others, as we have grouped variants together based on these phenotypes. The use of previously established definitions makes it difficult to understand exactly why these variants may have been under selection in the past. This is perhaps best exemplified by the signal of polygenic adaptation for genetic variants associated with educational attainment.

Standardized schooling - and consequently, the concept of “educational attainment” - was only invented and implemented widely in the last few generations. It is obviously nonsensical to discuss its evolution over the past tens of thousands of years. Instead, it is likely that the set of variants for which we find evidence of selection was associated with some (unknown) phenotype(s) in the past. However, given that selection on these variants likely took place more than 10 thousand years ago, it may be difficult or impossible to identify what these were. A similar problem arises when thinking about the signal of polygenic adaptation on “unibrow”. This is a self-reported phenotype, and the genetic variants that have been identified may simply be associated with pigmentation (assuming people with certain hair and/or skin pigmentation phenotypes are more likely to notice they have hair between their eyes), or alternatively with some other (unmeasured) hair-related phenotype. It is also possible that direct sexual selection for absence or presence of unibrow as an attractive facial feature in certain cultures [46] may be the cause of this signal. Indeed, if a selective agent is cultural, but the culture has since changed, it may be impossible to determine what actually occurred. All these variants are also likely pleiotropic [47], which makes it even harder to determine which phenotypes were truly targeted by selection.

Perhaps, one could try to find the phenotypic gradients along which selection most likely operated [48] by modeling the evolution of trait-associated SNPs for multiple phe-notypes together. However, it is also possible that genetic correlations among traits in the present are not good proxies for genetic correlations in the past.

3.3 Relationship between polygenic scores and population mean phenotypes

Another fundamental limitation in interpreting all studies of polygenic adaptation (including this one) is that the connection between the distribution of allele frequencies today and any historical or geographic trends in phenotypes remains questionable. Indeed, though we have motivated this method as a way to identify adaptive shifts in the mean of a polygenic trait, it is a simple fact that massive changes in the mean values of many of the traits we consider have occurred by purely non-genetic environmental processes. For example, the mean height of men in the Netherlands increased from around 166 cm in the mid-1800s to currently over 180 cm [49], bringing the population from around the middle of the pack among European countries to the tallest one in the world. This likely occurred for environmental reasons, like improvements in diet and health care [50, 51]. Likewise, the average educational attainment in Iceland and North America has increased dramatically over the past century, despite a slight estimated decrease in the frequencies of genetic variants associated with the phenotype [52, 53, 54]. The somewhat paradoxical conclusion is that actual phenotypes can and do change across populations in directions that are uncorrelated to natural selection (which may in fact be a minor contributor to any such differences). It would be an understatement to say this poses challenges for the interpretation of the current study and others like it.

In fact, the trait-associated variants that we have used only explain a fraction of the narrow-sense heritability of their respective traits, even in the populations in which the association studies were performed. As we have only looked at genome-wide-significant variants, this fraction is small in most cases. For example, the heritability for ’educational attainment’ is estimated to be around 40%, and educational attainment itself is strongly determined by environmental factors [55]. The genome-wide significant SNPs we used in this study (themselves a subset of all SNPs tested in the original GWAS [56]) explain only 1.05% of the total variance for this particular trait. All of the aforementioned traits are likely affected by a myriad of environmental and social variables, which might contribute to determine their ultimate expression in each human individual.

3.4 Additional caveats

Beyond the above conceptual difficulties, there are a number of additional caveats with our approach to keep in mind. First, the effect sizes we have used derive from GWAS performed primarily on individuals of European ancestry. Thus, our tests can only detect if variants that have been found to be associated with a trait in European GWAS are significantly higher or lower in a particular (European or non-European) population, relative to what they should be under a pure drift model. This does not necessarily imply that populations for which we find evidence for selection have higher or lower average genetic values of such a trait than other populations. In fact, there is evidence to suggest that loci ascertained in European GWAS do not serve to make good predictors for traits in populations that are distantly related to Europeans [57]. One reason for this is that many or all of the traits we are studying are likely to be influenced in non-European populations by different variants from the ones that have been discovered in European GWAS. SNPs that may be strongly associated with a trait in a particular non-European population (like an African or East Asian panel) may have not reached genome-wide significance in a European GWAS, where those SNPs may not strongly affect the trait or may be at low frequencies. We also do not model dominance, epistasis or gene-by-environment inter-actions between our trait-associated variants and the diverse environments that human populations occupy, and any of these factors may further obscure the relationship between the patterns we observe and the actual underlying genetic contribution to phenotypes in these populations.

Second, we have assumed that all of the GWAS that we have used have properly accounted for population structure. If some of the trait-associated SNPs are in fact false positives caused by uncorrected structure, this could generate a false signal of polygenic adaptation. A future direction could be the incorporation of effect sizes that have been corrected for ancestry or population stratification [16, 58] and also effect sizes from GWAS performed on other populations [59, 60, 61], in order to assess the robustness of our empirical results across variants discovered in studies involving participants of different ancestries.

Third, we made the assumption that the admixture graph for the populations that we use as input is correct. If there are additional unmodeled aspects of the history of the populations, this could induce incorrect inference about the branch on which natural selection has occurred. We also recommend that the individuals in the population panels used as leaves in the graphs have roughly similar amounts of admixture. In other words, the method works best when admixture in the population was ancient enough for the admixed ancestry to have spread uniformly among members of the admixed panel. Otherwise, an admixture graph may not be the most appropriate way to model their evolution.

Finally, we have made an explicit assumption that our model should be sparse; i.e. that polygenic adaptation is rare. If in reality adaptation is common, the PolyGraph approach will necessarily only identify selection on a small number of branches.

3.5 Future directions

A natural extension to the analyses we performed here would be to look at admixture graphs that include extinct populations or species, using ancient DNA [62]. For example, present-day Europeans are known to have resulted from admixture processes involving at least 3 ancestral populations [38], and so modeling them as a sister group to East Asians or as a 2-way mixture between a Native American-related component and a basal Eurasian component may be overly simplistic. Incorporating ancient DNA would not require any additional theoretical work, as ancient populations can be naturally included as leaves in an admixture graph [38, 12]. Care should be taken, however, in making sure that the quality of the ancient DNA data at trait-associated SNPs is accounted for while inferring the number of ancestral and derived alleles, and that there is a sufficient number of ancient individuals per population to detect polygenic adaptation. One could envision either performing pseudo-haploid sampling [18, 33, 12] or using allele frequency estimators obtained from genotype likelihoods [63, 64], while accounting for errors characteristic of ancient DNA [65, 66]. When working with SNP capture data [18, 33], it may be necessary to perform imputation at the GWAS SNPs, if these were not originally covered in the SNP capture array. We aim to tackle these issues in a future study.

One concern when analyzing admixture graphs is identifiability. As we mentioned before, there are multiple configurations of the α parameters that may lead to almost identical likelihoods. The use of the spike-and-slab prior and the Q_B filtering step serve to ameliorate this problem, assuming selection was sparse and only affected a few branches. An avenue of research could involve testing other types of models or constraints that may serve to better compare among different selection configurations, perhaps without having to reduce the space of possible candidate branches a priori, for example using reversible-jump MCMC for model selection [67].

In the future, it may be worth incorporating stabilizing selection into this method [47], or exploring tests of polygenic adaptation in the context of other types of demographic frameworks, like isolation-by-distance [68] or population structure [69] models. For example, one could envision settings in which trait-associated variants would be best modeled as expanding or contracting over a geographically extended area over time, in a way that is not explainable by genetic drift alone.

Lastly, we note that despite some clear methodological and conceptual differences, our method bears a close relationship to a number of methods for inferring changes in the rate of phenotypic evolution on species phylogenies over macroevolutionary timescales. Our use of the Normal model of drift as an approximation to the Wright-Fisher diffusion is closely analogous to the use of Brownian motion models in some phylogenetic methods [70, 71, 72, 73]. It may also be worth exploring the relationship between Ornstein-Uhlenbeck models for phenotypic evolution on phylogenies [74, 75] and the aforementioned hypothetical extension of our method to include stabilizing selection, as the two processes are closely related [76, 47].

4 Methods

4.1 MCMC implementation

For the MCMC transition probabilities of the α parameters, we use a Normal distribution with constant variance. For the transition probabilities of the ancestral allele frequencies, we use a truncated Normal distribution with point masses at 0 and 1, with variance equal to a constant (input by the user) times f_X (1− f_X) where f_X is the frequency of an allele in its current state. This allows for the proposed transitions to be larger for SNPs at medium frequencies and smaller for SNPs at high or low frequencies. We apply a Hastings correction in the acceptance ratio to account for this asymmetric proposal distribution.

For all applications above, we run our MCMC sampler for 1 million steps with a burn-in period of 10,000, and obtain posterior samples every 1,000 steps. The variance of the transition probabilities of the ancestral nodes and the α parameters were chosen so that the acceptance rate was close to 23% for each set of parameters. For the spike-and-slab prior, we set τ to be equal to 0.1, and ζ to be equal to 25. The lower and upper boundaries of the uniform hyperprior for κ were set to be 60% and 80%, respectively. We note that all of these parameters can be adjusted by the user as needed.

4.2 Derivation of Q_B statistic

Let the mean-centered vector of genetic values, F be the among-population genetic covariance matrix, and V_A be the additive genetic variance of the ancestral population for a given character. We compute V_A by taking the ancestral frequency for each SNP i to be the mean sample frequency over all populations : where is the effect size estimate for the trait at SNP i. Following ref. [77 ⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓, 15], if we are willing to assume that then by the definition of the multivariate normal distribution, for any choice of and where we have set for notational convenience. It follows that and this holds for all choices of . Therefore, the square of q_b has a distribution under the null. Importantly, one is free to choose such that it represents a branch (j) in an admixture graph.

If j is a branch in an admixture graph, and we choose to scale such that it has unit length: then has an interpretation as the amount of among population additive genetic variance which we would expect to come about because of drift which occurs down branch j. In turn, is the actual amount of variance observed along the axis consistent with that branch. The ratio of these two quantities is our statistic Q_B(j) (with distribution ) and it is therefore the appropriate test statistic to ask whether there is evidence to reject neutrality along a certain branch. Note that, by design, branches with the exact same child nodes have equal Q_B statistics, as do branches at the root of the graph.

4.3 Choosing a cutoff for Q_B

We aimed to find a cutoff for Q_B that would serve to minimize the number of candidate branches to be explored in the MCMC while at the same time trying to ensure that the true selected branches are included among these candidates. One choice would be to select the cutoff of a distribution that would correspond to a P-value of 0.05/k, where k is the number of branches tested. We find, however, that a constant value for this cutoff is not the most desirable choice as a way to prioritize branches for exploring the strength of selection in each of the branches in the MCMC, as graphs of different sizes (i.e. amounts of drift) result in quite different senstivity values, as well as number of candidate branches included, when α = 0.1 (Panels B and C in Figures S45 and S46 for two cases where each branch has drift length = 0.02 and Figures S47 and S48 for two cases where each branch has drift length = 0.05). We observe the same issues when simulating under α = 0.2 (Figures S49 to S52).

Figure S45:

A. We produced 100 simulations of polygenic adaptation in an internal branch of a graph (v-q), each with 400 SNPs and α = 0.1. All branches had drift lengths equal to 0.02 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S46:

A. We produced 100 simulations of polygenic adaptation in a terminal branch of a graph (C-v), each with 400 SNPs and α = 0.1. All branches had drift lengths equal to 0.02 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S47:

A. We produced 100 simulations of polygenic adaptation in an internal branch of a graph (v-q), each with 400 SNPs and α = 0.1. All branches had drift lengths equal to 0.05 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S48:

A. We produced 100 simulations of polygenic adaptation in a terminal branch of a graph (C-v), each with 400 SNPs and α = 0.1. All branches had drift lengths equal to 0.05 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S49:

A. We produced 100 simulations of polygenic adaptation in an internal branch of a graph (v-q), each with 400 SNPs and α = 0.2. All branches had drift lengths equal to 0.02 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S50:

A. We produced 100 simulations of polygenic adaptation in a terminal branch of a graph (C-v), each with 400 SNPs and α = 0.2. All branches had drift lengths equal to 0.02 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S51:

A. We produced 100 simulations of polygenic adaptation in an internal branch of a graph (v-q), each with 400 SNPs and α = 0.2. All branches had drift lengths equal to 0.05 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

Figure S52:

A. We produced 100 simulations of polygenic adaptation in a terminal branch of a graph (C-v), each with 400 SNPs and α = 0.2. All branches had drift lengths equal to 0.05 B. We tested various cutoffs for Q_B and plotted the average number of candidate branches that passed the cutoff, over all simulations. The blue line is the cutoff corresponding to P = 0.05/k, where k is the number of branches in the graph, under the null hypothesis of no selection. C. For each of the same cutoffs, we also plotted the proportion of simulations in which the true selected branch was among the candidate branches. D. Instead of testing constant values for the Q_B cutoff, we chose the cutoff to be the maximum branch statistic in each simulation, divided by i, and tested various values of i. We plotted the average number of candidate branches that passed the cutoff, over all simulations. E. For the same cutoff method as in panel D, we plotted the proportion of simulations in which the true selected branch was among the candidate branches. For simulations and applications to real data, we chose a heuristic value (i=3, red line).

A more stable strategy across graphs of different sizes that also works better at minimizing the number of candidates is to choose the cutoff to be a fraction of the largest Q_B statistic among all branches in the graph (Panels D and E in Figures S45 to S48). For all analyses below, we chose this to be 1/3 of the maximum Q_B statistic. We note that this is a less conservative strategy than the fixed cutoff. However, it is important to remember that we do not aim to formally test for selection here, but merely obtain a set of likely candidate branches for the MCMC to explore downstream, some of which may end up producing a posterior mean estimate of α that is consistent with neutrality (i.e. α = 0).

5 Acknowledgments

We thank Claude Bhérer, Guy Sella, Molly Przeworski, Graham Coop, Joshua Schraiber, Simon Myers, Benjamin Peter and members of the Sella and Przeworski labs for helpful advice and discussions. We also thank Thomas Mailund for help with running the admixturegraph R package, and Mark Lipson for help with running MixMapper.

Footnotes

↵* fernandoracimo{at}gmail.com

References

[1].↵
Pardis C Sabeti, David E Reich, John M Higgins, Haninah ZP Levine, Daniel J Richter, Stephen F Schaffner, Stacey B Gabriel, Jill V Platko, Nick J Patterson, Gavin J McDonald, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837, 2002.
OpenUrl CrossRef PubMed Web of Science
[2].↵
Pardis C Sabeti, Patrick Varilly, Ben Fry, Jason Lohmueller, Elizabeth Hostetter, Chris Cotsapas, Xiaohui Xie, Elizabeth H Byrne, Steven A McCarroll, Rachelle Gaudet, et al. Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918, 2007.
OpenUrl CrossRef PubMed Web of Science
[3].↵
Benjamin F Voight, Sridhar Kudaravalli, Xiaoquan Wen, and Jonathan K Pritchard. A map of recent positive selection in the human genome. PLoS Biol, 4(3):e72, 2006.
OpenUrl CrossRef PubMed
[4].↵
Xin Yi, Yu Liang, Emilia Huerta-Sanchez, Xin Jin, Zha Xi Ping Cuo, John E Pool, Xun Xu, Hui Jiang, Nicolas Vinckenbosch, Thorfinn Sand Korneliussen, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. science, 329(5987):75–78, 2010.
OpenUrl Abstract/FREE Full Text
[5].↵
Hua Chen, Nick Patterson, and David Reich. Population differentiation as a test for selective sweeps. Genome research, 20(3):393–402, 2010.
OpenUrl Abstract/FREE Full Text
[6].↵
Fernando Racimo. Testing for ancient selection using cross-population allele frequency differentiation. Genetics, 202(2):733–750, 2016.
OpenUrl Abstract/FREE Full Text
[7].↵
Todd Bersaglieri, Pardis C Sabeti, Nick Patterson, Trisha Vanderploeg, Steve F Schaffner, Jared A Drake, Matthew Rhodes, David E Reich, and Joel N Hirschhorn. Genetic signatures of strong recent positive selection at the lactase gene. The American Journal of Human Genetics, 74(6):1111–1120, 2004.
OpenUrl CrossRef PubMed Web of Science
[8].↵
Matteo Fumagalli, Ida Moltke, Niels Grarup, Fernando Racimo, Peter Bjerregaard, Marit E Jørgensen, Thorfinn S Korneliussen, Pascale Gerbault, Line Skotte, Allan Linneberg, et al. Greenlandic inuit show genetic signatures of diet and climate adaptation. Science, 349(6254):1343–1347, 2015.
OpenUrl Abstract/FREE Full Text
[9].↵
Nicholas H Barton and Harold Perez de Vladar. Statistical mechanics and the evolution of polygenic quantitative traits. Genetics, 181(3):997–1011, 2009.
OpenUrl Abstract/FREE Full Text
[10].↵
Jonathan K Pritchard, Joseph K Pickrell, and Graham Coop. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Current biology, 20(4):R208–R215, 2010.
OpenUrl CrossRef PubMed Web of Science
[11].↵
Jonathan K Pritchard and Anna Di Rienzo. Adaptation–not by sweeps alone. Nature Reviews Genetics, 11(10):665–667, 2010.
OpenUrl CrossRef PubMed Web of Science
[12].↵
Iain Mathieson, Iosif Lazaridis, Nadin Rohland, Swapan Mallick, Nick Patterson, Songül Alpaslan Roodenberg, Eadaoin Harney, Kristin Stewardson, Daniel Fernandes, Mario Novak, et al. Genome-wide patterns of selection in 230 ancient eurasians. Nature, 528(7583):499–503, 2015.
OpenUrl CrossRef PubMed
[13].↵
Yair Field, Evan A Boyle, Natalie Telis, Ziyue Gao, Kyle J Gaulton, David Golan, Loic Yengo, Ghislain Rocheleau, Philippe Froguel, Mark I McCarthy, et al. Detection of human adaptation during the past 2000 years. Science, 354(6313):760–764, 2016.
OpenUrl Abstract/FREE Full Text
[14].↵
Michael C Turchin, Charleston WK Chiang, Cameron D Palmer, Sriram Sankararaman, David Reich, Joel N Hirschhorn, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, et al. Evidence of widespread selection on standing variation in Europe at height-associated snps. Nature genetics, 44(9):1015–1019, 2012.
OpenUrl CrossRef PubMed
[15].↵
Jeremy J Berg and Graham Coop. A population genetic signal of polygenic adaptation. PLoS Genet, 10(8):e1004412, 2014.
OpenUrl CrossRef PubMed
[16].↵
Matthew R Robinson, Gibran Hemani, Carolina Medina-Gomez, Massimo Mezzavilla, Tonu Esko, Konstantin Shakhbazov, Joseph E Powell, Anna Vinkhuyzen, Sonja I Berndt, Stefan Gustafsson, et al. Population genetic differentiation of height and body mass index across Europe. Nature genetics, 47(11):1357–1362, 2015.
OpenUrl CrossRef PubMed
[17].↵
Otso Ovaskainen, Markku Karhunen, Chaozhi Zheng, José Manuel Cano Arias, and Juha Merilä. A new method to uncover signatures of divergent and stabilizing selection in quantitative traits. Genetics, 189(2):621–632, 2011.
OpenUrl Abstract/FREE Full Text
[18].↵
Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich. Ancient admixture in human history. Genetics, 192(3):1065–1093, 2012.
OpenUrl Abstract/FREE Full Text
[19].↵
Joseph K Pickrell and Jonathan K Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet, 8(11):e1002967, 2012.
OpenUrl CrossRef PubMed
[20].↵
James Franklin Crow, Motoo Kimura, et al. An introduction to population genetics theory. An introduction to population genetics theory., 1970.
[21].↵
Mark Lipson, Po-Ru Loh, Alex Levin, David Reich, Nick Patterson, and Bonnie Berger. Efficient moment-based inference of admixture parameters and sources of gene flow. Molecular biology and evolution, 30(8):1788–1802, 2013.
OpenUrl CrossRef PubMed
[22].↵
George Nicholson, Albert V Smith, Frosti Jónsson, Ómar Gústafsson, Kári Stefánsson, and Peter Donnelly. Assessing population differentiation and isolation from single-nucleotide polymorphism data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):695–715, 2002.
OpenUrl
[23].↵
Luigi Luca Cavalli-Sforza, I Barrai, and AWF Edwards. Analysis of human evolution under random genetic drift. Cold Spring Harbor symposia on quantitative biology, 29:9–20, 1964.
OpenUrl Abstract/FREE Full Text
[24].↵
Joseph Felsenstein. Maximum-likelihood estimation of evolutionary trees from continuous characters. American journal of human genetics, 25(5):471, 1973.
OpenUrl CrossRef PubMed Web of Science
[25].↵
Graham Coop, David Witonsky, Anna Di Rienzo, and Jonathan K Pritchard. Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4):1411–1423, 2010.
OpenUrl Abstract/FREE Full Text
[26].↵
Torsten Günther and Graham Coop. Robust identification of local adaptation from allele frequencies. Genetics, 195(1):205–220, 2013.
OpenUrl Abstract/FREE Full Text
[27].↵
Iain Mathieson and Gil McVean. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics, 193(3):973–984, 2013.
OpenUrl Abstract/FREE Full Text
[28].↵
Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032, 1988.
OpenUrl CrossRef
[29].↵
Edward I George and Robert E McCulloch. Variable selection via gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993.
OpenUrl CrossRef Web of Science
[30].↵
Xiang Zhou and Matthew Stephens. Genome-wide efficient mixed-model analysis for association studies. Nature genetics, 44(7):821–824, 2012.
OpenUrl CrossRef PubMed
[31].↵
Kalle Leppälä, Svend Vendelbo Nielsen, and Thomas Mailund. admixturegraph: An R package for admixture graph manipulation and fitting. Bioinformatics, btx048, 2017.
[32].↵
Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.
[33].↵
Wolfgang Haak, Iosif Lazaridis, Nick Patterson, Nadin Rohland, Swapan Mallick, Bastien Llamas, Guido Brandt, Susanne Nordenfelt, Eadaoin Harney, Kristin Stewardson, et al. Massive migration from the steppe was a source for indo-european languages in europe. Nature, 522(7555):207–211, 2015.
OpenUrl CrossRef PubMed
[34].↵
Joseph K Pickrell, Tomaz Berisa, Jimmy Z Liu, Laure Ségurel, Joyce Y Tung, and David A Hinds. Detection and interpretation of shared genetic influences on 42 human traits. Nature genetics, 2016.
[35].↵
Adam Auton, Gon¸calo R Abecasis, David M Altshuler, Richard M Durbin, David R Bentley, Aravinda Chakravarti, Andrew G Clark, Peter Donnelly, Evan E Eichler, Paul Flicek, et al. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015.
OpenUrl CrossRef PubMed
[36].↵
H Allen Orr. Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics, 149(4):2099–2104, 1998.
OpenUrl Abstract/FREE Full Text
[37].↵
Scott H Williamson, Melissa J Hubisz, Andrew G Clark, Bret A Payseur, Carlos D Bustamante, and Rasmus Nielsen. Localizing recent adaptive evolution in the human genome. PLoS Genet, 3(6):e90, 2007.
OpenUrl CrossRef PubMed
[38].↵
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Karola Kirsanow, Peter H Sudmant, Joshua G Schraiber, Sergi Castellano, Mark Lipson, et al. Ancient human genomes suggest three ancestral populations for present-day europeans. Nature, 513(7518):409–413, 2014.
OpenUrl CrossRef PubMed Web of Science
[39].↵
Olivier Delaneau, Jean-Francois Zagury, and Jonathan Marchini. Improved whole-chromosome phasing for disease and population genetic studies. Nature methods, 10(1):5–6, 2013.
OpenUrl CrossRef PubMed
[40].↵
Sayantan Das, Lukas Forer, Sebastian Sch¨onherr, Carlo Sidore, Adam E Locke, Alan Kwong, Scott I Vrieze, Emily Y Chew, Shawn Levy, Matt McGue, et al. Next-generation genotype imputation service and methods. Nature Genetics, 48(10):1284–1287, 2016.
OpenUrl CrossRef PubMed
[41].↵
Iosif Lazaridis, Dani Nadel, Gary Rollefson, Deborah C Merrett, Nadin Rohland, Swapan Mallick, Daniel Fernandes, Mario Novak, Beatriz Gamarra, Kendra Sirak, et al. Genomic insights into the origin of farming in the ancient near east. Nature, 536(7617):419–424, 2016.
OpenUrl CrossRef PubMed
[42].↵
Mark Stoneking and Frederick Delfin. The human genetic history of east asia: weaving a complex tapestry. Current Biology, 20(4):R188–R193, 2010.
OpenUrl CrossRef PubMed Web of Science
[43].↵
Qiaomei Fu, Matthias Meyer, Xing Gao, Udo Stenzel, Hernán A Burbano, Janet Kelso, and Svante Pääbo. Dna analysis of an early modern human from tianyuan cave, china. Proceedings of the National Academy of Sciences, 110(6):2223–2227, 2013.
OpenUrl Abstract/FREE Full Text
[44].↵
Emily HM Wong, Andrey Khrunin, Larissa Nichols, Dmitry Pushkarev, Denis Khokhrin, Dmitry Verbenko, Oleg Evgrafov, James Knowles, John Novembre, Svetlana Limborska, et al. Reconstructing genetic history of siberian and northeastern european populations. Genome Research, 27(1):1–14, 2017.
OpenUrl Abstract/FREE Full Text
[45].↵
Dongsheng Lu, Haiyi Lou, Kai Yuan, Xiaoji Wang, Yuchen Wang, Chao Zhang, Yan Lu, Xiong Yang, Lian Deng, Ying Zhou, et al. Ancestral origins and genetic history of tibetan highlanders. The American Journal of Human Genetics, 99(3):580–594, 2016.
OpenUrl
[46].↵
, Asst Prof Neelam A Vashi and Ellinor R Quay. Subjective aspects of beauty. In Beauty and Body Dysmorphic Disorder, pages 63–81. Springer, 2015.
[47].↵
Yuval B Simons, Kevin Bullaughey, Richard R Hudson, and Guy Sella. A model for the genetic architecture of quantitative traits under stabilizing selection. arXiv preprint arXiv:1704.06707, 2017.
[48].↵
Russell Lande and Stevan J Arnold. The measurement of selection on correlated characters. Evolution, pages 1210–1226, 1983.
[49].↵
Timothy J Hatton and Bernice E Bray. Long run trends in the heights of european men, 19th–20th centuries. Economics & Human Biology, 8(3):405–413, 2010.
OpenUrl
[50].↵
Gert Stulp, Louise Barrett, Felix C Tropf, and Melinda Mills. Does natural selection favour taller stature among the tallest people on earth? In Proc. R. Soc. B, volume 282, page 20150211. The Royal Society, 2015.
OpenUrl CrossRef PubMed
[51].↵
Maja Tarka, Geir H Bolstad, Sebastian Wacker, Katja Räsänen, Thomas F Hansen, and Christophe Pélabon. Did natural selection make the dutch taller? a cautionary note on the importance of quantification in understanding evolution. Evolution, 69(12):3204–3206, 2015.
OpenUrl
[52].↵
Jonathan P Beauchamp. Genetic evidence for natural selection in humans in the contemporary united states. Proceedings of the National Academy of Sciences, 113(28):7774–7779, 2016.
OpenUrl Abstract/FREE Full Text
[53].↵
Alexandre Courtiol, Felix C Tropf, and Melinda C Mills. When genes and environment disagree: Making sense of trends in recent human evolution. Proceedings of the National Academy of Sciences, 113(28):7693–7695, 2016.
OpenUrl FREE Full Text
[54].↵
Augustine Kong, Michael L Frigge, Gudmar Thorleifsson, Hreinn Stefansson, Alexander I Young, Florian Zink, Gudrun A Jonsdottir, Aysu Okbay, Patrick Sulem, Gisli Masson, et al. Selection against variants in the genome associated with educational attainment. Proceedings of the National Academy of Sciences, 114(5):E727–E732, 2017.
OpenUrl Abstract/FREE Full Text
[55].↵
Cornelius A Rietveld, Sarah E Medland, Jaime Derringer, Jian Yang, Tõnu Esko, Nicolas W Martin, Harm-Jan Westra, Konstantin Shakhbazov, Abdel Abdellaoui, Arpana Agrawal, et al. Gwas of 126,559 individuals identifies genetic variants associated with educational attainment. science, 340(6139):1467–1471, 2013.
OpenUrl Abstract/FREE Full Text
[56].↵
Aysu Okbay, Jonathan P Beauchamp, Mark Alan Fontana, James J Lee, Tune H Pers, Cornelius A Rietveld, Patrick Turley, Guo-Bo Chen, Valur Emilsson, S Fleur W Meddens, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature, 533(7604):539–542, 2016.
OpenUrl CrossRef PubMed
[57].↵
Alicia R Martin, Christopher R Gignoux, Raymond K Walters, Genevieve L Wojcik, Benjamin M Neale, Simon Gravel, Mark J Daly, Carlos D Bustamante, and Eimear E Kenny. Human demographic history impacts genetic risk prediction across diverse populations. bioRxiv, page 070797, 2016.
[58].↵
Matthew R Robinson, Aaron Kleinman, Mariaelisa Graff, Anna AE Vinkhuyzen, David Couper, Michael B Miller, Wouter J Peyrot, Abdel Abdellaoui, Brendan P Zietsch, Ilja M Nolte, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour, 1:0016, 2017.
OpenUrl
[59].↵
Ida Moltke, Niels Grarup, Marit E Jørgensen, Peter Bjerregaard, Jonas T Treebak, Matteo Fumagalli, Thorfinn S Korneliussen, Marianne A Andersen, Thomas S Nielsen, Nikolaj T Krarup, et al. A common greenlandic tbc1d4 variant confers muscle insulin resistance and type 2 diabetes. Nature, 512(7513):190–193, 2014.
OpenUrl CrossRef PubMed
[60].↵
Wanqing Wen, Norihiro Kato, Joo-Yeon Hwang, Xingyi Guo, Yasuharu Tabara, Huaixing Li, Rajkumar Dorajoo, Xiaobo Yang, Fuu-Jen Tsai, Shengxu Li, et al. Genome-wide association studies in east asians identify new loci for waist-hip ratio and waist circumference. Scientific reports, 6, 2016.
[61].↵
Maggie CY Ng, Daniel Shriner, Brian H Chen, Jiang Li, Wei-Min Chen, Xiuqing Guo, Jiankang Liu, Suzette J Bielinski, Lisa R Yanek, Michael A Nalls, et al. Meta-analysis of genome-wide association studies in african americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet, 10(8):e1004517, 2014.
OpenUrl CrossRef PubMed
[62].↵
Montgomery Slatkin and Fernando Racimo. Ancient dna and human history. Proceedings of the National Academy of Sciences, 113(23):6380–6387, 2016.
OpenUrl Abstract/FREE Full Text
[63].↵
Rasmus Nielsen, Thorfinn Korneliussen, Anders Albrechtsen, Yingrui Li, and Jun Wang. Snp calling, genotype calling, and sample allele frequency estimation from new-generation sequencing data. PloS one, 7(7):e37558, 2012.
OpenUrl CrossRef PubMed
[64].↵
Thorfinn Sand Korneliussen, Anders Albrechtsen, and Rasmus Nielsen. Angsd: analysis of next generation sequencing data. BMC bioinformatics, 15(1):356, 2014.
OpenUrl CrossRef PubMed
[65].↵
Hákon Jónsson, Aurélien Ginolhac, Mikkel Schubert, Philip LF Johnson, and Ludovic Orlando. mapdamage2. 0: fast approximate bayesian estimates of ancient dna damage parameters. Bioinformatics, page btt193, 2013.
[66].↵
Samuele Soraggi, Carsten Wiuf, and Anders Albrechtsen. Improved d-statistic for low-coverage data. bioRxiv, page 127852, 2017.
[67].↵
Peter J Green. Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika, pages 711–732, 1995.
[68].↵
Sewall Wright. Isolation by distance. Genetics, 28(2):114, 1943.
OpenUrl FREE Full Text
[69].↵
Jonathan K Pritchard, Matthew Stephens, and Peter Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2):945–959, 2000.
OpenUrl Abstract/FREE Full Text
[70].↵
Chris Venditti, Andrew Meade, and Mark Pagel. Multiple routes to mammalian diversity. Nature, 479(7373):393, 2011.
OpenUrl CrossRef PubMed Web of Science
[71].↵
Jonathan M Eastman, Michael E Alfaro, Paul Joyce, Andrew L Hipp, and Luke J Harmon. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution, 65(12):3578–3589, 2011.
OpenUrl CrossRef PubMed Web of Science
[72].↵
Liam J Revell, D Luke Mahler, Pedro R Peres-Neto, and Benjamin D Redelings. A new phylogenetic method for identifying exceptional phenotypic diversification. Evolution, 66(1):135–146, 2012.
OpenUrl CrossRef PubMed Web of Science
[73].↵
Daniel L Rabosky, Francesco Santini, Jonathan Eastman, Stephen A Smith, Brian Sidlauskas, Jonathan Chang, and Michael E Alfaro. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nature communications, 4, 2013.
[74].↵
Josef C Uyeda and Luke J Harmon. A novel bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data. Systematic biology, 63(6):902–918, 2014.
OpenUrl CrossRef PubMed
[75].↵
Mohammad Khabbazian, Ricardo Kriebel, Karl Rohe, and Cécile Ané. Fast and accurate detection of evolutionary shifts in ornstein–uhlenbeck models. Methods in Ecology and Evolution, 7(7):811–824, 2016.
OpenUrl
[76].↵
Russell Lande. Natural selection and random genetic drift in phenotypic evolution. Evolution, pages 314–334, 1976.
[77].↵
Sewall Wright. The genetical structure of populations. Annals of eugenics, 15(1):323–354, 1949.
OpenUrl CrossRef Web of Science
[78].↵
John RB Perry, Felix Day, Cathy E Elks, Patrick Sulem, Deborah J Thompson, Teresa Ferreira, Chunyan He, Daniel I Chasman, Tõnu Esko, Gudmar Thorleifsson, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature, 514(7520):92–97, 2014.
OpenUrl CrossRef PubMed Web of Science
[79].↵
Jean-Charles Lambert, Carla A Ibrahim-Verbaas, Denise Harold, Adam C Naj, Rebecca Sims, Céline Bellenguez, Gyungah Jun, Anita L DeStefano, Joshua C Bis, Gary W Beecham, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for alzheimer’s disease. Nature genetics, 45(12):1452–1458, 2013.
OpenUrl CrossRef PubMed
[80].↵
Adam E Locke, Bratati Kahali, Sonja I Berndt, Anne E Justice, Tune H Pers, Felix R Day, Corey Powell, Sailaja Vedantam, Martin L Buchkovich, Jian Yang, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature, 518(7538):197–206, 2015.
OpenUrl CrossRef PubMed
[81].↵
Heribert Schunkert, Inke R König, Sekar Kathiresan, Muredach P Reilly, Themistocles L Assimes, Hilma Holm, Michael Preuss, Alexandre FR Stewart, Maja Barbalic, Christian Gieger, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43(4):333–338, 2011.
OpenUrl CrossRef PubMed
[82].↵
Luke Jostins, Stephan Ripke, Rinse K Weersma, Richard H Duerr, Dermot P McGovern, Ken Y Hui, James C Lee, L Philip Schumm, Yashoda Sharma, Carl A Anderson, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature, 491(7422):119–124, 2012.
OpenUrl CrossRef PubMed Web of Science
[83].↵
Alisa K Manning, Marie-France Hivert, Robert A Scott, Jonna L Grimsby, Nabila Bouatia-Naji, Han Chen, Denis Rybin, Ching-Ti Liu, Lawrence F Bielak, Inga Prokopenko, et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nature genetics, 44(6):659–669, 2012.
OpenUrl CrossRef PubMed
[84].↵
Karol Estrada, Unnur Styrkarsdottir, Evangelos Evangelou, Yi-Hsiang Hsu, Emma L Duncan, Evangelia E Ntzani, Ling Oei, Omar ME Albagha, Najaf Amin, John P Kemp, et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nature genetics, 44(5):491–501, 2012.
OpenUrl CrossRef PubMed
[85].↵
Pim Van Der Harst, Weihua Zhang, Irene Mateo Leach, Augusto Rendon, Niek Verweij, Joban Sehmi, Dirk S Paul, Ulrich Elling, Hooman Allayee, Xinzhong Li, et al. Seventy-five genetic loci influencing the human red blood cell. Nature, 492(7429):369–375, 2012.
OpenUrl CrossRef PubMed Web of Science
[86].↵
Tanya M Teslovich, Kiran Musunuru, Albert V Smith, Andrew C Edmondson, Ioannis M Stylianou, Masahiro Koseki, James P Pirruccello, Samuli Ripatti, Daniel I Chasman, Cristen J Willer, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 466(7307):707–713, 2010.
OpenUrl CrossRef PubMed Web of Science
[87].↵
Andrew R Wood, Tonu Esko, Jian Yang, Sailaja Vedantam, Tune H Pers, Stefan Gustafsson, Audrey Y Chu, Karol Estrada, Jian’an Luan, Zoltán Kutalik, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics, 46(11):1173–1186, 2014.
OpenUrl CrossRef PubMed
[88].↵
Christian Gieger, Aparna Radhakrishnan, Ana Cvejic, Weihong Tang, Eleonora Porcu, Giorgio Pistis, Jovana Serbanovic-Canic, Ulrich Elling, Alison H Goodall, Yann Labrune, et al. New gene functions in megakaryopoiesis and platelet formation. Nature, 480(7376):201–208, 2011.
OpenUrl CrossRef PubMed Web of Science
[89].↵
Yukinori Okada, Di Wu, Gosia Trynka, Towfique Raj, Chikashi Terao, Katsunori Ikari, Yuta Kochi, Koichiro Ohmura, Akari Suzuki, Shinji Yoshida, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 506(7488):376–381, 2014.
OpenUrl CrossRef PubMed Web of Science
[90].↵
Schizophrenia Working Group of the Psychiatric Genomics Consortium et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510):421–427, 2014.
OpenUrl CrossRef PubMed Web of Science
[91].↵
Andrew P Morris, Benjamin F Voight, Tanya M Teslovich, Teresa Ferreira, Ayellet V Segre, Valgerdur Steinthorsdottir, Rona J Strawbridge, Hassan Khan, Harald Grallert, Anubha Mahajan, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44(9):981, 2012.
OpenUrl CrossRef PubMed
[92].↵
Dmitry Shungin, Thomas W Winkler, Damien C Croteau-Chonka, Teresa Ferreira, Adam E Locke, Reedik Mägi, Rona J Strawbridge, Tune H Pers, Krista Fischer, Anne E Justice, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature, 518(7538):187–196, 2015.
OpenUrl CrossRef PubMed Web of Science

View the discussion thread.

Posted June 07, 2017.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5214)
Biochemistry (11745)
Bioengineering (8751)
Bioinformatics (29194)
Biophysics (14971)
Cancer Biology (12095)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14178)
Epidemiology (2067)
Evolutionary Biology (18305)
Genetics (12245)
Genomics (16801)
Immunology (11867)
Microbiology (28083)
Molecular Biology (11592)
Neuroscience (60962)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2885)
Systems Biology (7339)
Zoology (1651)

[1] [1].↵
Pardis C Sabeti, David E Reich, John M Higgins, Haninah ZP Levine, Daniel J Richter, Stephen F Schaffner, Stacey B Gabriel, Jill V Platko, Nick J Patterson, Gavin J McDonald, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837, 2002.
OpenUrl CrossRef PubMed Web of Science

[2] [2].↵
Pardis C Sabeti, Patrick Varilly, Ben Fry, Jason Lohmueller, Elizabeth Hostetter, Chris Cotsapas, Xiaohui Xie, Elizabeth H Byrne, Steven A McCarroll, Rachelle Gaudet, et al. Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918, 2007.
OpenUrl CrossRef PubMed Web of Science

[3] [3].↵
Benjamin F Voight, Sridhar Kudaravalli, Xiaoquan Wen, and Jonathan K Pritchard. A map of recent positive selection in the human genome. PLoS Biol, 4(3):e72, 2006.
OpenUrl CrossRef PubMed

[4] [4].↵
Xin Yi, Yu Liang, Emilia Huerta-Sanchez, Xin Jin, Zha Xi Ping Cuo, John E Pool, Xun Xu, Hui Jiang, Nicolas Vinckenbosch, Thorfinn Sand Korneliussen, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. science, 329(5987):75–78, 2010.
OpenUrl Abstract/FREE Full Text

[5] [5].↵
Hua Chen, Nick Patterson, and David Reich. Population differentiation as a test for selective sweeps. Genome research, 20(3):393–402, 2010.
OpenUrl Abstract/FREE Full Text

[6] [6].↵
Fernando Racimo. Testing for ancient selection using cross-population allele frequency differentiation. Genetics, 202(2):733–750, 2016.
OpenUrl Abstract/FREE Full Text

[7] [7].↵
Todd Bersaglieri, Pardis C Sabeti, Nick Patterson, Trisha Vanderploeg, Steve F Schaffner, Jared A Drake, Matthew Rhodes, David E Reich, and Joel N Hirschhorn. Genetic signatures of strong recent positive selection at the lactase gene. The American Journal of Human Genetics, 74(6):1111–1120, 2004.
OpenUrl CrossRef PubMed Web of Science

[8] [8].↵
Matteo Fumagalli, Ida Moltke, Niels Grarup, Fernando Racimo, Peter Bjerregaard, Marit E Jørgensen, Thorfinn S Korneliussen, Pascale Gerbault, Line Skotte, Allan Linneberg, et al. Greenlandic inuit show genetic signatures of diet and climate adaptation. Science, 349(6254):1343–1347, 2015.
OpenUrl Abstract/FREE Full Text

[9] [9].↵
Nicholas H Barton and Harold Perez de Vladar. Statistical mechanics and the evolution of polygenic quantitative traits. Genetics, 181(3):997–1011, 2009.
OpenUrl Abstract/FREE Full Text

[10] [10].↵
Jonathan K Pritchard, Joseph K Pickrell, and Graham Coop. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Current biology, 20(4):R208–R215, 2010.
OpenUrl CrossRef PubMed Web of Science

[11] [11].↵
Jonathan K Pritchard and Anna Di Rienzo. Adaptation–not by sweeps alone. Nature Reviews Genetics, 11(10):665–667, 2010.
OpenUrl CrossRef PubMed Web of Science

[12] [12].↵
Iain Mathieson, Iosif Lazaridis, Nadin Rohland, Swapan Mallick, Nick Patterson, Songül Alpaslan Roodenberg, Eadaoin Harney, Kristin Stewardson, Daniel Fernandes, Mario Novak, et al. Genome-wide patterns of selection in 230 ancient eurasians. Nature, 528(7583):499–503, 2015.
OpenUrl CrossRef PubMed

[13] [13].↵
Yair Field, Evan A Boyle, Natalie Telis, Ziyue Gao, Kyle J Gaulton, David Golan, Loic Yengo, Ghislain Rocheleau, Philippe Froguel, Mark I McCarthy, et al. Detection of human adaptation during the past 2000 years. Science, 354(6313):760–764, 2016.
OpenUrl Abstract/FREE Full Text

[14] [14].↵
Michael C Turchin, Charleston WK Chiang, Cameron D Palmer, Sriram Sankararaman, David Reich, Joel N Hirschhorn, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, et al. Evidence of widespread selection on standing variation in Europe at height-associated snps. Nature genetics, 44(9):1015–1019, 2012.
OpenUrl CrossRef PubMed

[15] [15].↵
Jeremy J Berg and Graham Coop. A population genetic signal of polygenic adaptation. PLoS Genet, 10(8):e1004412, 2014.
OpenUrl CrossRef PubMed

[16] [16].↵
Matthew R Robinson, Gibran Hemani, Carolina Medina-Gomez, Massimo Mezzavilla, Tonu Esko, Konstantin Shakhbazov, Joseph E Powell, Anna Vinkhuyzen, Sonja I Berndt, Stefan Gustafsson, et al. Population genetic differentiation of height and body mass index across Europe. Nature genetics, 47(11):1357–1362, 2015.
OpenUrl CrossRef PubMed

[17] [17].↵
Otso Ovaskainen, Markku Karhunen, Chaozhi Zheng, José Manuel Cano Arias, and Juha Merilä. A new method to uncover signatures of divergent and stabilizing selection in quantitative traits. Genetics, 189(2):621–632, 2011.
OpenUrl Abstract/FREE Full Text

[18] [18].↵
Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich. Ancient admixture in human history. Genetics, 192(3):1065–1093, 2012.
OpenUrl Abstract/FREE Full Text

[19] [19].↵
Joseph K Pickrell and Jonathan K Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet, 8(11):e1002967, 2012.
OpenUrl CrossRef PubMed

[20] [20].↵
James Franklin Crow, Motoo Kimura, et al. An introduction to population genetics theory. An introduction to population genetics theory., 1970.

[21] [21].↵
Mark Lipson, Po-Ru Loh, Alex Levin, David Reich, Nick Patterson, and Bonnie Berger. Efficient moment-based inference of admixture parameters and sources of gene flow. Molecular biology and evolution, 30(8):1788–1802, 2013.
OpenUrl CrossRef PubMed

[22] [22].↵
George Nicholson, Albert V Smith, Frosti Jónsson, Ómar Gústafsson, Kári Stefánsson, and Peter Donnelly. Assessing population differentiation and isolation from single-nucleotide polymorphism data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):695–715, 2002.
OpenUrl

[23] [23].↵
Luigi Luca Cavalli-Sforza, I Barrai, and AWF Edwards. Analysis of human evolution under random genetic drift. Cold Spring Harbor symposia on quantitative biology, 29:9–20, 1964.
OpenUrl Abstract/FREE Full Text

[24] [24].↵
Joseph Felsenstein. Maximum-likelihood estimation of evolutionary trees from continuous characters. American journal of human genetics, 25(5):471, 1973.
OpenUrl CrossRef PubMed Web of Science

[25] [25].↵
Graham Coop, David Witonsky, Anna Di Rienzo, and Jonathan K Pritchard. Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4):1411–1423, 2010.
OpenUrl Abstract/FREE Full Text

[26] [26].↵
Torsten Günther and Graham Coop. Robust identification of local adaptation from allele frequencies. Genetics, 195(1):205–220, 2013.
OpenUrl Abstract/FREE Full Text

[27] [27].↵
Iain Mathieson and Gil McVean. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics, 193(3):973–984, 2013.
OpenUrl Abstract/FREE Full Text

[28] [28].↵
Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032, 1988.
OpenUrl CrossRef

[29] [29].↵
Edward I George and Robert E McCulloch. Variable selection via gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993.
OpenUrl CrossRef Web of Science

[30] [30].↵
Xiang Zhou and Matthew Stephens. Genome-wide efficient mixed-model analysis for association studies. Nature genetics, 44(7):821–824, 2012.
OpenUrl CrossRef PubMed

[31] [31].↵
Kalle Leppälä, Svend Vendelbo Nielsen, and Thomas Mailund. admixturegraph: An R package for admixture graph manipulation and fitting. Bioinformatics, btx048, 2017.

[32] [32].↵
Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.

[33] [33].↵
Wolfgang Haak, Iosif Lazaridis, Nick Patterson, Nadin Rohland, Swapan Mallick, Bastien Llamas, Guido Brandt, Susanne Nordenfelt, Eadaoin Harney, Kristin Stewardson, et al. Massive migration from the steppe was a source for indo-european languages in europe. Nature, 522(7555):207–211, 2015.
OpenUrl CrossRef PubMed

[34] [34].↵
Joseph K Pickrell, Tomaz Berisa, Jimmy Z Liu, Laure Ségurel, Joyce Y Tung, and David A Hinds. Detection and interpretation of shared genetic influences on 42 human traits. Nature genetics, 2016.

[35] [35].↵
Adam Auton, Gon¸calo R Abecasis, David M Altshuler, Richard M Durbin, David R Bentley, Aravinda Chakravarti, Andrew G Clark, Peter Donnelly, Evan E Eichler, Paul Flicek, et al. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015.
OpenUrl CrossRef PubMed

[36] [36].↵
H Allen Orr. Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics, 149(4):2099–2104, 1998.
OpenUrl Abstract/FREE Full Text

[37] [37].↵
Scott H Williamson, Melissa J Hubisz, Andrew G Clark, Bret A Payseur, Carlos D Bustamante, and Rasmus Nielsen. Localizing recent adaptive evolution in the human genome. PLoS Genet, 3(6):e90, 2007.
OpenUrl CrossRef PubMed

[38] [38].↵
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Karola Kirsanow, Peter H Sudmant, Joshua G Schraiber, Sergi Castellano, Mark Lipson, et al. Ancient human genomes suggest three ancestral populations for present-day europeans. Nature, 513(7518):409–413, 2014.
OpenUrl CrossRef PubMed Web of Science

[39] [39].↵
Olivier Delaneau, Jean-Francois Zagury, and Jonathan Marchini. Improved whole-chromosome phasing for disease and population genetic studies. Nature methods, 10(1):5–6, 2013.
OpenUrl CrossRef PubMed

[40] [40].↵
Sayantan Das, Lukas Forer, Sebastian Sch¨onherr, Carlo Sidore, Adam E Locke, Alan Kwong, Scott I Vrieze, Emily Y Chew, Shawn Levy, Matt McGue, et al. Next-generation genotype imputation service and methods. Nature Genetics, 48(10):1284–1287, 2016.
OpenUrl CrossRef PubMed

[41] [41].↵
Iosif Lazaridis, Dani Nadel, Gary Rollefson, Deborah C Merrett, Nadin Rohland, Swapan Mallick, Daniel Fernandes, Mario Novak, Beatriz Gamarra, Kendra Sirak, et al. Genomic insights into the origin of farming in the ancient near east. Nature, 536(7617):419–424, 2016.
OpenUrl CrossRef PubMed

[42] [42].↵
Mark Stoneking and Frederick Delfin. The human genetic history of east asia: weaving a complex tapestry. Current Biology, 20(4):R188–R193, 2010.
OpenUrl CrossRef PubMed Web of Science

[43] [43].↵
Qiaomei Fu, Matthias Meyer, Xing Gao, Udo Stenzel, Hernán A Burbano, Janet Kelso, and Svante Pääbo. Dna analysis of an early modern human from tianyuan cave, china. Proceedings of the National Academy of Sciences, 110(6):2223–2227, 2013.
OpenUrl Abstract/FREE Full Text

[44] [44].↵
Emily HM Wong, Andrey Khrunin, Larissa Nichols, Dmitry Pushkarev, Denis Khokhrin, Dmitry Verbenko, Oleg Evgrafov, James Knowles, John Novembre, Svetlana Limborska, et al. Reconstructing genetic history of siberian and northeastern european populations. Genome Research, 27(1):1–14, 2017.
OpenUrl Abstract/FREE Full Text

[45] [45].↵
Dongsheng Lu, Haiyi Lou, Kai Yuan, Xiaoji Wang, Yuchen Wang, Chao Zhang, Yan Lu, Xiong Yang, Lian Deng, Ying Zhou, et al. Ancestral origins and genetic history of tibetan highlanders. The American Journal of Human Genetics, 99(3):580–594, 2016.
OpenUrl

[46] [46].↵
, Asst Prof Neelam A Vashi and Ellinor R Quay. Subjective aspects of beauty. In Beauty and Body Dysmorphic Disorder, pages 63–81. Springer, 2015.

[47] [47].↵
Yuval B Simons, Kevin Bullaughey, Richard R Hudson, and Guy Sella. A model for the genetic architecture of quantitative traits under stabilizing selection. arXiv preprint arXiv:1704.06707, 2017.

[48] [48].↵
Russell Lande and Stevan J Arnold. The measurement of selection on correlated characters. Evolution, pages 1210–1226, 1983.

[49] [49].↵
Timothy J Hatton and Bernice E Bray. Long run trends in the heights of european men, 19th–20th centuries. Economics & Human Biology, 8(3):405–413, 2010.
OpenUrl

[50] [50].↵
Gert Stulp, Louise Barrett, Felix C Tropf, and Melinda Mills. Does natural selection favour taller stature among the tallest people on earth? In Proc. R. Soc. B, volume 282, page 20150211. The Royal Society, 2015.
OpenUrl CrossRef PubMed

[51] [51].↵
Maja Tarka, Geir H Bolstad, Sebastian Wacker, Katja Räsänen, Thomas F Hansen, and Christophe Pélabon. Did natural selection make the dutch taller? a cautionary note on the importance of quantification in understanding evolution. Evolution, 69(12):3204–3206, 2015.
OpenUrl

[52] [52].↵
Jonathan P Beauchamp. Genetic evidence for natural selection in humans in the contemporary united states. Proceedings of the National Academy of Sciences, 113(28):7774–7779, 2016.
OpenUrl Abstract/FREE Full Text

[53] [53].↵
Alexandre Courtiol, Felix C Tropf, and Melinda C Mills. When genes and environment disagree: Making sense of trends in recent human evolution. Proceedings of the National Academy of Sciences, 113(28):7693–7695, 2016.
OpenUrl FREE Full Text

[54] [54].↵
Augustine Kong, Michael L Frigge, Gudmar Thorleifsson, Hreinn Stefansson, Alexander I Young, Florian Zink, Gudrun A Jonsdottir, Aysu Okbay, Patrick Sulem, Gisli Masson, et al. Selection against variants in the genome associated with educational attainment. Proceedings of the National Academy of Sciences, 114(5):E727–E732, 2017.
OpenUrl Abstract/FREE Full Text

[55] [55].↵
Cornelius A Rietveld, Sarah E Medland, Jaime Derringer, Jian Yang, Tõnu Esko, Nicolas W Martin, Harm-Jan Westra, Konstantin Shakhbazov, Abdel Abdellaoui, Arpana Agrawal, et al. Gwas of 126,559 individuals identifies genetic variants associated with educational attainment. science, 340(6139):1467–1471, 2013.
OpenUrl Abstract/FREE Full Text

[56] [56].↵
Aysu Okbay, Jonathan P Beauchamp, Mark Alan Fontana, James J Lee, Tune H Pers, Cornelius A Rietveld, Patrick Turley, Guo-Bo Chen, Valur Emilsson, S Fleur W Meddens, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature, 533(7604):539–542, 2016.
OpenUrl CrossRef PubMed

[57] [57].↵
Alicia R Martin, Christopher R Gignoux, Raymond K Walters, Genevieve L Wojcik, Benjamin M Neale, Simon Gravel, Mark J Daly, Carlos D Bustamante, and Eimear E Kenny. Human demographic history impacts genetic risk prediction across diverse populations. bioRxiv, page 070797, 2016.

[58] [58].↵
Matthew R Robinson, Aaron Kleinman, Mariaelisa Graff, Anna AE Vinkhuyzen, David Couper, Michael B Miller, Wouter J Peyrot, Abdel Abdellaoui, Brendan P Zietsch, Ilja M Nolte, et al. Genetic evidence of assortative mating in humans. Nature Human Behaviour, 1:0016, 2017.
OpenUrl

[59] [59].↵
Ida Moltke, Niels Grarup, Marit E Jørgensen, Peter Bjerregaard, Jonas T Treebak, Matteo Fumagalli, Thorfinn S Korneliussen, Marianne A Andersen, Thomas S Nielsen, Nikolaj T Krarup, et al. A common greenlandic tbc1d4 variant confers muscle insulin resistance and type 2 diabetes. Nature, 512(7513):190–193, 2014.
OpenUrl CrossRef PubMed

[60] [60].↵
Wanqing Wen, Norihiro Kato, Joo-Yeon Hwang, Xingyi Guo, Yasuharu Tabara, Huaixing Li, Rajkumar Dorajoo, Xiaobo Yang, Fuu-Jen Tsai, Shengxu Li, et al. Genome-wide association studies in east asians identify new loci for waist-hip ratio and waist circumference. Scientific reports, 6, 2016.

[61] [61].↵
Maggie CY Ng, Daniel Shriner, Brian H Chen, Jiang Li, Wei-Min Chen, Xiuqing Guo, Jiankang Liu, Suzette J Bielinski, Lisa R Yanek, Michael A Nalls, et al. Meta-analysis of genome-wide association studies in african americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet, 10(8):e1004517, 2014.
OpenUrl CrossRef PubMed

[62] [62].↵
Montgomery Slatkin and Fernando Racimo. Ancient dna and human history. Proceedings of the National Academy of Sciences, 113(23):6380–6387, 2016.
OpenUrl Abstract/FREE Full Text

[63] [63].↵
Rasmus Nielsen, Thorfinn Korneliussen, Anders Albrechtsen, Yingrui Li, and Jun Wang. Snp calling, genotype calling, and sample allele frequency estimation from new-generation sequencing data. PloS one, 7(7):e37558, 2012.
OpenUrl CrossRef PubMed

[64] [64].↵
Thorfinn Sand Korneliussen, Anders Albrechtsen, and Rasmus Nielsen. Angsd: analysis of next generation sequencing data. BMC bioinformatics, 15(1):356, 2014.
OpenUrl CrossRef PubMed

[65] [65].↵
Hákon Jónsson, Aurélien Ginolhac, Mikkel Schubert, Philip LF Johnson, and Ludovic Orlando. mapdamage2. 0: fast approximate bayesian estimates of ancient dna damage parameters. Bioinformatics, page btt193, 2013.

[66] [66].↵
Samuele Soraggi, Carsten Wiuf, and Anders Albrechtsen. Improved d-statistic for low-coverage data. bioRxiv, page 127852, 2017.

[67] [67].↵
Peter J Green. Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika, pages 711–732, 1995.

[68] [68].↵
Sewall Wright. Isolation by distance. Genetics, 28(2):114, 1943.
OpenUrl FREE Full Text

[69] [69].↵
Jonathan K Pritchard, Matthew Stephens, and Peter Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2):945–959, 2000.
OpenUrl Abstract/FREE Full Text

[70] [70].↵
Chris Venditti, Andrew Meade, and Mark Pagel. Multiple routes to mammalian diversity. Nature, 479(7373):393, 2011.
OpenUrl CrossRef PubMed Web of Science

[71] [71].↵
Jonathan M Eastman, Michael E Alfaro, Paul Joyce, Andrew L Hipp, and Luke J Harmon. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution, 65(12):3578–3589, 2011.
OpenUrl CrossRef PubMed Web of Science

[72] [72].↵
Liam J Revell, D Luke Mahler, Pedro R Peres-Neto, and Benjamin D Redelings. A new phylogenetic method for identifying exceptional phenotypic diversification. Evolution, 66(1):135–146, 2012.
OpenUrl CrossRef PubMed Web of Science

[73] [73].↵
Daniel L Rabosky, Francesco Santini, Jonathan Eastman, Stephen A Smith, Brian Sidlauskas, Jonathan Chang, and Michael E Alfaro. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nature communications, 4, 2013.

[74] [74].↵
Josef C Uyeda and Luke J Harmon. A novel bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data. Systematic biology, 63(6):902–918, 2014.
OpenUrl CrossRef PubMed

[75] [75].↵
Mohammad Khabbazian, Ricardo Kriebel, Karl Rohe, and Cécile Ané. Fast and accurate detection of evolutionary shifts in ornstein–uhlenbeck models. Methods in Ecology and Evolution, 7(7):811–824, 2016.
OpenUrl

[76] [76].↵
Russell Lande. Natural selection and random genetic drift in phenotypic evolution. Evolution, pages 314–334, 1976.

[77] [77].↵
Sewall Wright. The genetical structure of populations. Annals of eugenics, 15(1):323–354, 1949.
OpenUrl CrossRef Web of Science

[78] [78].↵
John RB Perry, Felix Day, Cathy E Elks, Patrick Sulem, Deborah J Thompson, Teresa Ferreira, Chunyan He, Daniel I Chasman, Tõnu Esko, Gudmar Thorleifsson, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature, 514(7520):92–97, 2014.
OpenUrl CrossRef PubMed Web of Science

[79] [79].↵
Jean-Charles Lambert, Carla A Ibrahim-Verbaas, Denise Harold, Adam C Naj, Rebecca Sims, Céline Bellenguez, Gyungah Jun, Anita L DeStefano, Joshua C Bis, Gary W Beecham, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for alzheimer’s disease. Nature genetics, 45(12):1452–1458, 2013.
OpenUrl CrossRef PubMed

[80] [80].↵
Adam E Locke, Bratati Kahali, Sonja I Berndt, Anne E Justice, Tune H Pers, Felix R Day, Corey Powell, Sailaja Vedantam, Martin L Buchkovich, Jian Yang, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature, 518(7538):197–206, 2015.
OpenUrl CrossRef PubMed

[81] [81].↵
Heribert Schunkert, Inke R König, Sekar Kathiresan, Muredach P Reilly, Themistocles L Assimes, Hilma Holm, Michael Preuss, Alexandre FR Stewart, Maja Barbalic, Christian Gieger, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature genetics, 43(4):333–338, 2011.
OpenUrl CrossRef PubMed

[82] [82].↵
Luke Jostins, Stephan Ripke, Rinse K Weersma, Richard H Duerr, Dermot P McGovern, Ken Y Hui, James C Lee, L Philip Schumm, Yashoda Sharma, Carl A Anderson, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature, 491(7422):119–124, 2012.
OpenUrl CrossRef PubMed Web of Science

[83] [83].↵
Alisa K Manning, Marie-France Hivert, Robert A Scott, Jonna L Grimsby, Nabila Bouatia-Naji, Han Chen, Denis Rybin, Ching-Ti Liu, Lawrence F Bielak, Inga Prokopenko, et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nature genetics, 44(6):659–669, 2012.
OpenUrl CrossRef PubMed

[84] [84].↵
Karol Estrada, Unnur Styrkarsdottir, Evangelos Evangelou, Yi-Hsiang Hsu, Emma L Duncan, Evangelia E Ntzani, Ling Oei, Omar ME Albagha, Najaf Amin, John P Kemp, et al. Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nature genetics, 44(5):491–501, 2012.
OpenUrl CrossRef PubMed

[85] [85].↵
Pim Van Der Harst, Weihua Zhang, Irene Mateo Leach, Augusto Rendon, Niek Verweij, Joban Sehmi, Dirk S Paul, Ulrich Elling, Hooman Allayee, Xinzhong Li, et al. Seventy-five genetic loci influencing the human red blood cell. Nature, 492(7429):369–375, 2012.
OpenUrl CrossRef PubMed Web of Science

[86] [86].↵
Tanya M Teslovich, Kiran Musunuru, Albert V Smith, Andrew C Edmondson, Ioannis M Stylianou, Masahiro Koseki, James P Pirruccello, Samuli Ripatti, Daniel I Chasman, Cristen J Willer, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 466(7307):707–713, 2010.
OpenUrl CrossRef PubMed Web of Science

[87] [87].↵
Andrew R Wood, Tonu Esko, Jian Yang, Sailaja Vedantam, Tune H Pers, Stefan Gustafsson, Audrey Y Chu, Karol Estrada, Jian’an Luan, Zoltán Kutalik, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics, 46(11):1173–1186, 2014.
OpenUrl CrossRef PubMed

[88] [88].↵
Christian Gieger, Aparna Radhakrishnan, Ana Cvejic, Weihong Tang, Eleonora Porcu, Giorgio Pistis, Jovana Serbanovic-Canic, Ulrich Elling, Alison H Goodall, Yann Labrune, et al. New gene functions in megakaryopoiesis and platelet formation. Nature, 480(7376):201–208, 2011.
OpenUrl CrossRef PubMed Web of Science

[89] [89].↵
Yukinori Okada, Di Wu, Gosia Trynka, Towfique Raj, Chikashi Terao, Katsunori Ikari, Yuta Kochi, Koichiro Ohmura, Akari Suzuki, Shinji Yoshida, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature, 506(7488):376–381, 2014.
OpenUrl CrossRef PubMed Web of Science

[90] [90].↵
Schizophrenia Working Group of the Psychiatric Genomics Consortium et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510):421–427, 2014.
OpenUrl CrossRef PubMed Web of Science

[91] [91].↵
Andrew P Morris, Benjamin F Voight, Tanya M Teslovich, Teresa Ferreira, Ayellet V Segre, Valgerdur Steinthorsdottir, Rona J Strawbridge, Hassan Khan, Harald Grallert, Anubha Mahajan, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature genetics, 44(9):981, 2012.
OpenUrl CrossRef PubMed

[92] [92].↵
Dmitry Shungin, Thomas W Winkler, Damien C Croteau-Chonka, Teresa Ferreira, Adam E Locke, Reedik Mägi, Rona J Strawbridge, Tune H Pers, Krista Fischer, Anne E Justice, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature, 518(7538):187–196, 2015.
OpenUrl CrossRef PubMed Web of Science