Direct detection of natural selection in Bronze Age Britain

Iain Mathieson; Jonathan Terhorst

doi:10.1101/2022.03.14.484330

Abstract

We developed a novel method for efficiently estimating time-varying selection coefficients from genome-wide ancient DNA data. In simulations, our method accurately recovers selective trajectories, and is robust to mis-specification of population size. We applied it to a large dataset of ancient and present-day human genomes from Britain, and identified seven loci with genome-wide significant evidence of selection in the past 4500 years. Almost all of them are related to increased vitamin D or calcium levels, and we conclude that lack of vitamin D and consequent low calcium was consistently the most important selective pressure in Britain since the Bronze Age. However, the strength of selection on individual loci varied substantially over time, suggesting that cultural or environmental factors moderated the genetic response to this pressure. Of 28 complex anthropometric and metabolic traits, skin pigmentation was the only one with significant evidence of polygenic selection, further underscoring the importance of phenotypes related to vitamin D. Our approach illustrates the power of ancient DNA to characterize selection in human populations and illuminates the recent evolutionary history of Britain.

Introduction

Ancient DNA (aDNA) provides direct insight into human evolutionary history. So far, this information has been mainly been used to study demographic history—the migrations, splits and admixtures that our species underwent in the recent past [1]. But in principle, aDNA can also tell us about phenotypic evolution and, in particular, about the contribution of natural selection to phenotypic and genomic variation. Compared to demographic inference, this is more challenging, because studies of natural selection typically require larger sample sizes than studies of population history, which can integrate information from across the genome.

Although some recent studies have used aDNA to study selection and phenotypic evolution, they have mostly focused on a relatively small number of loci [e.g., 2, 3, 4]. Studies that performed genome-wide scans for selection using aDNA [5, 6, 7] have compared allele frequencies across populations, but have not made use of the precise temporal information available from direct dating of ancient samples. For example, the approach in Mathieson et al. (2015) was able to detect selection that happened some time in the past 8,000 years, somewhere in Western Eurasia, but could not be more specific.

With the recent publication of large aDNA datasets [e.g., 8, 9, 7], sample sizes for some regions are now in the hundreds of individuals, large enough to study selection with good spatial and temporal resolution. However, there is a lack of suitable methods to analyze these data. There are many published methods for estimating selection coefficients from time series data [e.g., 10, 11, 12, 13, 14, 15, 16, 17], but all of them unrealistically assume that selective pressures are constant over time, and they are too slow to run on millions of markers at once. We therefore developed a novel statistical approach which is able to estimate arbitrary time-varying selection coefficients, while being fast enough to run genome-wide.

The population of Britain, from 4500 years before present (BP, i.e. the start of the Bronze Age) to the present-day is ideal to demonstrate this approach for several reasons. First, it is relatively homogeneous in terms of both genetics and environment. Second, it is the population with the largest aDNA sample size. Third, there is a large amount of data about the genetic basis of complex traits in this population due to analysis of the UK Biobank. Finally, one of the few studies that attempted to detect selection over this time period (based on data from present-day individuals) was performed in this population [18], giving us a point of comparison for our aDNA-based approach.

Methods

We begin by formally defining the data, our inference problem, and the model we used to solve it. Our model is a generalization of the one used in [12]. We assume a haploid Wright-Fisher population model whose size t generations before present was 2N_t. Time runs backwards, so that t = 0 is the present, and t = T is the earliest point where we have data. We are interested in the frequency of a single allele with two types A and a. In generation t, the a allele has relative fitness 1 + s_t/2 relative to A. Our inferential target is the vector s = (s₀, …, s_T) of selection coefficients over time. The population size history (N₀, …, N_T) is also allowed to vary with time, but we assume that it is known and do not attempt jointly estimate it with s.

The data consist of pairs of counts . Each pair represents the number a_t of a alleles observed out of n_t samples collected t generations ago. Missing data or generations where no sampling occurred are indicated by setting n_t = 0. Our model conditions on the sample sizes n_t, and we suppress notational dependence on them going forward.

The data generating model is as follows. At time t, let the (unobserved) population frequency of the a allele be f_t ∈ [0, 1]. Given f_t, the data are binomially distributed with success probability f_t:

The latent trajectory f = (f₀, f₁, …, f_T) evolves according to a Wright-Fisher (WF) model with genic selection and no mutation [e.g., 19]. Given f_t and s_t, the number of individuals F_t−1 ∈ [0, 2N_t−1] possessing the a allele in next generation has distribution and then we set f_t−1 = F_t−1/(2N_t−1).

Let a = (a₀, …, a_T) denote the data. The complete likelihood is where π_T (f_T) is a prior distribution on the initial allele frequency (described in more detail below), and the probabilities p_s(f_t | f_t₊₁) and p(x_t | f_t) are specified by (1)–(2). (Throughout this section, we use the notation p_s to denote probability distributions that depend on the selection parameters s.) The likelihood of the observed data is obtained by marginalizing (3) over the latent allele trajectory f :

By exploiting the Markov structure of (3), the integral (4) can be efficiently evaluated using the forward algorithm. Each step of the forward algorithm costs owing the need to evaluate the transition probability (2) for all possible values of f_t−1 and f_t. When the effective population size is large (greater than 10³, say) this quadratic scaling causes computation to become slow, and it is advantageous to model the latent allele trajectory f_t using a continuous approximation. Several have been proposed, including using the WF diffusion [10, 11, 13, 15, 20]; Gaussian approximations to the WF model [12, 13, 14, 16]; and approximations based on the beta distribution [17, 21, 22]. See also [23] and references therein. A recent review [24] found the beta-with-spikes (hereafter, BwS) approximation of [17] to perform consistently better than other approaches, so we use it as our starting point.

In this model, the latent frequency f_t ∈ [0, 1] of the selected allele is modeled as a mixture distribution with three components. There are two atoms at f_t = 0 and f_t = 1 to allow for the possibility of allele loss or fixation, and the third component is a beta density characterizing the intermediate frequencies, f_t ∈ (0, 1). The form of this model is motivated by the fact that, in the original WF process, the probability of loss or fixation is positive, whereas it is zero if f_t is modeled using an absolutely continuous density.

Although the BwS model is state-of-the-art, room for improvement remains. [24] found that, while generally accurate, the BwS approximation degrades when selection is strong and the effective population size is small. Crucially, we cannot necessarily rule such a regime out when analyzing aDNA data. Another potential shortcoming, not reported by [24] but encountered when we implemented the BwS model, concerns the method used to approximate the transition probability

We found that errors in the moment recursions used to compute these probabilities [24, eqn. (6) onwards] tended to accumulate, leading to numerical instability and situations where the variance of the resulting beta approximation, or the spike probabilities, were sometimes computed to be negative. (Python code illustrating this phenomenon is included in the supplement.) This led us to consider refinements of the BwS model.

Beta mixture model

Breakdown of the moment-based approximation can be explained by insufficient degrees of freedom. The two-parameter beta distribution is not flexible enough to accurately approximate the transition density (5) in all cases. A potential solution is to enrich the approximating class of distributions, by modeling the continuous component of (5) as a mixture of beta distributions. This solution is intuitive, and also has theoretical justification: by a famous result of Bernstein [e.g., 25], it holds for any continuous function g : [0, 1] → R that uniformly, where b_i,n is proportional to the Beta(i + 1, n − i + 1) density. Hence, by taking n on the right-hand side of (6) to be large but finite, we can accurately approximate any absolutely continuous density on the unit interval. We refer to this model as the Beta-mixture-with-spikes (BMwS). A schematic of our model, which accompanies the discussion in the next few subsections, is shown in Figure 1

Figure 1:

The BMwS model. At each time step t, the latent allele frequency f_t is modeled as a mixture of beta distributions, plus spikes at zero and one. In this diagram, there are M = 3 mixture components (blue lines). Mixture weights are indicated as red bars, including the spike weights p₀ and p₁ at f_t = 0 and f_t = 1, respectively. After Wright-Fisher mating, the shape of each beta mixture component, as well as the mixture weights, are updated according to equation (13). After observing the data a_t, the mixture weights are again updated according to Bayes’ rule (equation 14). The process then iterates.

Under BMwS, the (posterior) density of f_t is modeled as a mixture of M beta densities, plus two atoms at f_t = 0 and f_t = 1. We abuse notation slightly and write this as where δ_x denotes a point mass at x, and M is a user-specified parameter which trades approximation accuracy for speed. To model allele frequency trajectories, we need to characterize the distribution of f_t−1 when f_t has the distribution (7). We follow earlier work in utilizing a moment-based approximation, however the form of approximation is new. Previously [e.g., 13, 16, 17, 24], the mean and variance of f_t−1 were obtained by Taylor expansion about the infinite-population (zero variance) allele frequency trajectory, and then a moment-matched Gaussian or beta distribution was used to approximate the distribution of f_t−1. Here we proceed differently, by directly modeling the action of the WF transition kernel on a beta-distributed random variable.

Assume first that f_t ∼ Beta(α, β), and let and f_t−1 be as in (2). Using a computer algebra system, we determined that (A Mathematica notebook verifying these computations is included in the supplement.) These approximations are obtained by Taylor expansion of about s = 0, followed by substituting in moments of the beta distribution. It would be easy to extend them to higher powers of s, but we did not find it necessary, since |s| = 0.1 is already at the extreme end of what we expect to find in natural data. Note that for |s| < 1 (at least), the above equations imply var f_t−1 > 0, so this approximation is robust to the pathology described above.

Using (8)–(10), we can find α′, β′ such that f_t−1 has approximately a Beta(α′, β′) distribution:

More generally, if the continuous component of f_t is as in (7), then after random mating, we approximate the density of f_t−1 by .

The other components of the BMwS model are the “spikes” at f_t = 0 and f_t = 1. They are handled similarly to the original BwS model: at each mating event f_t → f_t−1, some amount of probability mass is leaked from the beta (mixture) component to atomic components, corresponding to the events F_t−1 = 0 and F_t−1 = 2N_t−1 in (2). (See the next section for a precise statement.) The astute reader will have noticed that, in contrast to some earlier works, we did not properly condition on the complement of this event when constructing the moment approximation shown above. That is, instead of e.g. Ef_t−1 in (8), we should instead have considered

However, the resulting expression is very complex because it involves taking expectation over terms of the form . We opted for the simpler and more numerically stable equations (8)–(10), and confirmed in simulations that the model is still accurate across a range of parameter settings.

Likelihood

The likelihood (4) is calculated using a variant of the usual forward algorithm for hidden Markov models. We explain this computation in greater detail here since the approach is nonstandard.

The forward algorithm recursively updates the so-called filtering density p(f_t | a_t, …, a_T), which takes the form shown in equation (7) under our model. Given the filtering density and the observation x_t−1, we need to extend the filtering density one step towards the present to obtain p(f_t−1 | a_t−1, …, a_T). This is accomplished in stages:

We use equations (8)–(12) to compute and , as well as and similarly for . This yields the predictive density where we defined .
We compute the probability noting that the integral can be evaluated analytically using equations (1) and (13), and conjugacy.
We update the mixture weights in (13) to incorporate the information added by observation a_t−1. Viewing a_t−1 as a draw from the Bayesian hierarchical model defined by (1) and (13), the posterior mixture weights on the beta mixture components are where the right-hand side is the BetaBinomial( p.m.f. The posterior weight on the atom at f_t−1 = 1 is p_t−1,1 ∝ 1_{{a_t−1=n_t−1}}, and similarly for p_t−1,0. The constant of proportionality in these equations is p_s(a_t−1 | a_t, …, a_T), calculated in step 2.
The filtering distribution p_s(f_t−1 | a_t−1, …, a_T) takes the same form as (7), with mixture weights as defined in the preceding step, and .

Recalling equation (4), the log-likelihood of the data is then

The running time of this algorithm is , as opposed to the standard forward algorithm which scales quadratically in M if it were to denote the number of hidden states in an HMM. This enables us to set M fairly large, ensuring that our model can flexibly approximate differently shaped filtering distributions.

Prior distribution

The filtering recursion is initialized by setting p(f_T) = π_T (f_T), where π_T is a prior on the ancestral allele frequency. When developing our method, we observed that the choice of prior affected the accuracy of inferences in the ancient past when analysing aDNA data. This is because when the data are sparsely observed and allele counts are low, there is not enough data to overwhelm the prior in the early stages Markov chain. An uninformative choice of π_T can falsely suggest that the selected allele experienced a large change in frequency, potentially generating a spurious signal of selection. To mitigate this effect, we adopted a coordinate-ascent approach where we alternatively maximized the log-likelihood (15) with respect to a) the selective trajectory s and b) the prior distribution π_T . For the prior, we assumed that π_T ∼ Beta(α_T, β_T) and optimized over α_T, β_T . Note that, using the interpolation formula (6), we can accurately model an arbitrary prior density if M is sufficiently large.

Inference

Given the probability model and approximations described in the preceding section, inference is now straightforward. Parameter estimation is carried out by as described in the previous section, with one additional modification. Depending on the quality and density of the data, many entries of s may only be weakly resolved, and we also found it advantageous to add a regularization term. The objective function for all the analyses reported below was where λ > 0 is a tuning parameter. The regularizer penalizes variation in s, with larger values of λ shrinking all entries towards a single common value, s₀ = · · · = s_T . The number of mixture components for the BMwS model was fixed to M = 100, and we performed three rounds of coordinate ascent. For all the examples in this paper, we assumed that the size history (N₀, …, N_T) is known, but co-estimation of selection and size history is also possible using our method, and could be an avenue for future research. Finally, our method is implemented using a JIT-compiled, differentiable programming language [26] to allow for efficient, gradient-based fitting.

Simulations

We evaluated the performance of the estimator under three different types of selection, each lasting for T = 100 generations:

Constant selection with s = 0.01 and initial frequency of 0.1.
Selection that decreases sinusoidally from s = +0.02 to S = −0.02 and initial frequency of 0.1.
Selection that alternates between +0.02 and −0.02 every 20 generations and initial frequency of 0.5.

In each case, we simulated allele frequency trajectories in a Wright-Fisher population with N = 10⁴ and then sampled 100 haploid individuals every 10 generations. We also simulated the same selective models under two scenarios of variable effective population size:

Exponential growth from N = 10⁴ to N = 10⁵.
N = 10⁴ with a bottleneck of N = 10³ lasting 10 generations.

For these scenarios we ran the estimator both with the correct effective population size and incorrectly assuming a constant N = 10⁴ to evaluate its robustness to misspecification of N . We varied the smoothing parameter λ from log₁₀ (λ) = 1 to 6 and report the root mean squared (RMS) bias, variance and total error of the estimator. Finally, we evaluated the error of the estimator as the sample size and frequency vary.

While these simulations cover a plausible range of scenarios for human data (N = 10⁴, 100 generations of observations, bottlenecks and exponential growth, selection coefficients of ), other species may have very different sets of parameters. We therefore recommend that for any particular application, users should run simulations based on their prior parameter values in order to understand the behavior of the estimator in their specific case, and to determine the appropriate smoothing parameter. We provide functions to easily implement simulations under arbitrary selective and population size scenarios. To illustrate this approach, we also ran the simulations described above using the sampling distribution of the British aDNA data used in the rest of paper.

Ancient DNA data

We collected data from ancient British individuals dated to the past 4500 years from the Allen Ancient DNA resource (v44.3, [27], original sources [28, 29, 30, 31, 7]) and Patterson et al. (2022) [8]. Most samples had been genotypyed on the 1240k array, and the small number of shotgun samples had been genotyped at the 1240k SNPs so we therefore restricted our analysis to this set of SNPs. All data were pseudo-haploid. After removing 22 PCA outliers, we were left with 529 ancient individuals and 98 present-day individuals from the GBR population of the 1000 Genomes project [32], processed into pseudo-haploid data as part of the AADR (Fig. 2A).

Figure 2:

Ancient British data. A: histogram of dates of ancient individuals. Inset map shows locations of sites. B: Principal components of ancient and present-day samples projected onto axes defined by 777 West Eurasian individuals (see Ref. [33] for details of these individuals).

Our assumes assumes that samples are drawn from a closed, randomly mating population in which every individual experiences the same selective pressures. While no natural population satisfies these conditions, we chose to restrict to Britain dated in the period 4500BP-present because it is the largest aDNA sample from a time and region that comes close to satisfying the assumptions of the model, for the following reasons. First, Britain is a relatively small region (compared to previous Europe-wide studies), meaning that selective pressures are more likely to be shared. Second, we know from previous aDNA studies that the last major change in ancestry in Britain occurred around 4500BP [30]. While recent work has demonstrated more recent Bronze Age migrations into Britain [8], these involve populations that were genetically similar, and inhabited geographically adjacent regions. Third, we confirmed using principal component analysis (PCA) that all the ancient samples in our analysis clustered with the present-day British individuals from the 1000 Genomes project, and more broadly with other Northwestern European individuals in the context of present-day West Eurasia (Fig. 2B).

Ancient DNA analysis

Starting with 1,150,639 autosomal SNPs, we removed 428,624 with a minor allele frequency (MAF) <0.1 in the full dataset on the grounds that any SNP with a significant frequency shift in this time period would have intermediate MAF. We also removed 101,967 SNPs with greater than 90% missingness and 210,725 SNPs with MAF=0 in the ancient data to leave 409,232 SNPs genome-wide. We inferred selection coefficients at generation t for every SNP in the filtered data using a smoothing parameter of λ = 10^4.5 and an effective population size of N = 10⁴. For each SNP, we summarize this estimate by the mean selection coefficient , and the root mean squared selection coefficient . We then computed the mean value of ∥s∥ in 20-SNP sliding windows, sliding in 10 SNP increments, so each SNP contributes to two windows. We denote these window statistics ∥s∥₂₀. Finally, we fit a gamma distribution using the method of moments to the values of ∥s∥₂₀, and used this fitted distribution to compute P-values for each window. We confirmed that this procedure leads to well-calibrated P-values by repeating the analysis with the dates of each sample randomized, to remove any temporal signal of selection.

We compared our results with two previous genome-wide selection scans. The first—an aDNA based scan [5]—is an allele frequency scan to detect selection in approximately the last 8,000 years in Western Eurasia. The second, the SDS test [18], is a scan based on haplotype lengths in the present-day UK population and is most sensitive to selection in the past few thousand years. First, we restricted the previous scans to the same set of SNPs used in the present scan. Next, for each window in the present scan, we computed the mean test statistics (chi-squared statistic for the aDNA scan and squared Z-score for the SDS) in each window and compared to the test statistics generated by our method.

Polygenic selection test

We obtained summary statistics from genome-wide association studies (GWAS) for 28 quantitative traits [34, 35, 36, 37, 38]. We took the intersection of GWAS SNPs with the 1240k array, restricted to P-values < 10⁻⁴, and pruned to an independent set of SNPs by iteratively taking the SNP with the smallest P-value and removing all other SNPs within 250kb. For each trait, we calculated the correlation between effect size and estimated selection coefficient for each independent SNP.

Results

Our estimator successfully recovers complex selection trajectories from simulated data (Fig. 3), with total sample sizes of the same order of magnitude as our ancient sample, though individual selection coefficient estimates can have considerable uncertainty—on the order of ±0.01 − 0.02 with these parameter values. This suggests that with the data available we should be able to reliably detect selection coefficients around 0.02, similar to the SDS and aDNA allele frequency approaches that we compare with [5, 18]. As expected, the optimal smoothing parameter depends on the true selective trajectory—the smoother the trajectory, the higher the optimal λ. We therefore recommend choosing λ based on simulations, with parameters that are informed by the specific application.

Figure 3:

Simulation results. Each column shows a different selection coefficient. Upper row: Estimated selection coefficients. Dashed line: simulated selection coefficient. Solid blue line: mean selection coefficient from 100 simulations. Light blue shaded area: region containing point estimates from 95/100 simulations. Lower row: Square root of squared bias, variance and total squared error as a function of log₁₀ (λ). The circled value is the one used for the estimates in the upper row.

The estimator performs similarly in the presence of population size bottlenecks or exponential growth, although it tends to slightly over-smooth changes in s in these cases (Fig. S1). Importantly, it is relatively robust to mis-specification of effective population size: if we input constant N to the estimator, results are not noticeably different than if we specify the correct population size history (Fig. S2). Increasing size and frequency of sampling reduces error (Fig. S3) though, in general, sample size is more important than sampling frequency and, all else being equal, it is better to have infrequent samples of large sizes than frequent small samples.

Figure S1:

Simulation results for different population size history. Each column represents a different selective trajectory and each row represents a different population size trajectory. Dashed lines: simulated selection coefficient. Solid blue lines: mean selection coefficient from 100 simulations. Light blue shaded areas: region containing point estimates from 95/100 simulations.

Figure S2:

Simulation results for different population size histories. As Fig. S1 but using a constant N = 10⁴ in the model, instead of the correct effective population size. Dashed lines: simulated selection coefficient. Solid blue lines: mean selection coefficient from 100 simulations. Light blue shaded areas: region containing point estimates from 95/100 simulations.

Figure S3:

Root mean squared error (RMSE) for different sampling schemes. Each column represents a different selective trajectory and each row represents a different population size trajectory as in Fig. S1. Each barplot shows the distribution of RMSE for 100 simulations based on a different sampling schemes. Sampling schemes are labeled "size-frequency", so for example "1-10" means that we sampled one haploid individual every 10 generations.

In the ancient British data, we identified 7 regions with genome-wide significant evidence of selection (Table 1, Fig. 4, Supplementary Table S2). We used a P-value cutoff of 10⁻⁷ as a genome-wide significant cutoff. While conservative for the 68,061 overlapping windows in our analysis, we used this value because when we reran the analysis with randomized sample dates, no window had a smaller P-value (Fig. 4B). Three of these regions, which we denote HLA1, HLA2, and HLA3 are in the HLA region on chromosome 6 (Fig. S5), though these themselves may contain multiple signals. An eighth apparent signal on chromosome 4 containing the gene LINC00955 is likely artefactual. The lead SNP rs4690044 has a MAF of 0 in present-day samples but around 0.5 in ancient samples. In gnomAD [39], rs4690044 has a MAF of 0.48, but no homozygotes, suggesting an artefactual call caused by a duplication. An additional locus on chromosome 12 (P=2.5 × 10⁻⁷) which contains the gene OAS1 and is known to be a target of adaptive Neanderthal introgression [40] was significant at a Bonferroni corrected significance threshold (P=7.3 × 10⁻⁷).

Figure S4:

Root mean squared bias, variance and total error as a function of log₁ 0 (λ) using the sampling distribution of the ancient British data for different selective (columns) and effective population size (rows) trajectories.

Figure S5:

Window-based P-values for the HLA region on Chromosome 6. The thick black x-axis shows the boundary of the HLA. Each blue bar represents a 20-SNP window. We highlight three regions with evidence of one or more significant signals.

Figure 4:

Genome-wide scan for selection in Britain A: P-values for selection in 20-SNP sliding windows. Genome-wide significant (P< 10⁻⁷) windows are labeled with the closest gene or known target of selection. B: QQ plot for observations in A (blue) and after randomizing the dates of each sample (red). C: Comparison with results of Field et al. [18]. Blue solid line shows the density of mean SDS² in 20-SNP windows. Labeled red lines indicate windows that are genome-wide significant in our analysis. D: Comparison with results of Mathieson et al. [5]. Blue solid line shows the density of mean statistic in 20-SNP windows. Labeled red lines indicate windows that are genome-wide significant in our analysis. In both C and D, HERC2 is approximately at the upper 5^th percentile of the distribution.

View this table:

Table 1:

Genome-wide significant regions. For non-HLA regions, we give the chromosome and position of the center of the lead window, the selected SNP (based on previous literature), the selected allele, and the mean selection coefficient of the lead SNP. For HLA regions we give the co-ordinates of the three regions that contain significant signals (Fig. S5), and the SNPs with the largest test statistic ∥s∥. Parentheses indicate that we do not know whether these specific SNPs were the targets of selection.

All seven of these regions were identified by a previous allele frequency based selection scan using aDNA to detect selection in West Eurasia over the past 8,000 years [5] (Fig. 4D). Only LCT and the HLA region show significant evidence of selection in a haplotype-based scan using present-day sequence data that aimed to detect selection in Britain in the past few thousand years [18] (Fig. 4C) or in a scan based on identifying very recent coalescence times (∼ 50 generations) in the UK Biobank [41].

For the non-HLA signals where we have a strong candidate for the causal variant based on previous literature, we examined the precise timing and trajectory of selection estimates from our model (Fig. 5). The most significant signal was the well-known LCT locus on chromosome 2, where the selected allele is associated with lactase persistence [42, 43]. We find that selection for the persistence allele was strongest (s ≈ 0.08) from 150 to 100 generations before present (roughly 4500 − 3000BP) before decreasing to around 0.02 in the past 50 generations. This large change in strength of selection might explain the wide range of estimates from models that assume a constant value [3, 43, 44]. At DHCR7, the haplotype tagged by the SNP in our analysis, rs7944926, is associated with protection against vitamin D insufficiency [45] and has been shown to have been under recent selection in both Europe [5] and East Asia [46]. We infer that, in Britain, the selection coefficient increased over the past 150 generations, from around 0 to 0.06, leading to a increase in frequency from about 20% to 60% over the past 3000 years (Fig. 5B).

Figure 5:

Trajectories of genome-wide significant non-HLA loci. A: Estimated selection coefficient. B: Estimated allele frequency trajectory.

Derived alleles at SLC45A2 and HERC2 are associated with light skin, hair and eye pigmentation [47, 48, 49, 50, 51]. Both these alleles have been shown to have been under selection broadly in Europe [47, 52, 53] and specifically during the Holocene [2, 5]. Time series of aDNA have shown that both alleles were under selection in the past 5000 years in Eastern Europe [2]. There, the derived SLC45A2 allele increased in frequency from 40% to 90% over the past 5000 years, suggesting a selection coefficient of around 0.03, very similar to our estimate of selection in Britain during the same time (Fig. 5). For the derived HERC2 allele, we estimate a selection coefficient of 0.02-0.04, similar to that estimated in Eastern Europe, although we find that in Britain selection was largely restricted to approximately the past 2000 years. Wilde et al. [2] also found a derived allele of TYR to be under strong selection in Eastern Europe, but we find little evidence that it was under selection in Britain, except possibly before 3000BP (window P-value=0.03, Figure S6).

Figure S6:

Estimated selection coefficient and frequency for the A allele of rs1042602 at the TYR locus.

At the HLA we find three regions with genome-wide significant evidence of selection, which we denote HLA1-3 (Fig. S5). All three correspond to regions identified by Mathieson et al. [5] and also have strong evidence of selection in the Field et al. [18] scan (Fig. 4C). Because of high gene density and complex patterns of linkage disequilibrium in the region, we did not attempt to identify causal genes or variants. However we note that the lead SNP at HLA1 is strongly associated with decreased risk of coeliac disease in UK Biobank [49]. The lead HLA2 SNP is associated with increased risk of ankylosing spondylitis [49] but the region contains the gene HLA-C, a variant of which is the strongest known risk factor for psoriasis [54]. Finally, the lead SNP at HLA3 is strongly associated with decreased risk of coeliac disease and psoriasis [49]. These associations suggest that risk of these diseases has been affected by selection, even if they themselves are not the direct targets.

Finally, we searched for evidence of polygenic selection by testing for a correlation between GWAS effect size and selection coefficient for 28 anthropometric and morphological traits (Fig. 6, Table S1). We find significant evidence of polygenic selection for reduced skin pigmentation (P=3.6 × 10⁻¹⁶) but none of the other 27 traits (P> 0.04). Though not statistically significant, the largest absolute correlations apart from skin pigmentation is for increased calcium. Notably, we do not detect evidence of selection on any of the phenotypes identified by Field et al. [18] as under selection in Britain in the past 2000 years, including height, infant head circumference and fasting insulin.

View this table:

Table S1:

Trait descriptions, sources, and values for Figure 6

Figure 6:

Evidence of polygenic selection. Each point represents a single GWAS. The x-axis gives the (Pearson) correlation between effect size estimates β and selection coefficient estimates for independent SNPs with GWAS P-value< 10⁻⁴. The y-axis gives the log¹⁰(P-value) for null hypothesis of no correlation. Abbreviations, exact values and sources are given in Table S1.

Discussion

Our fast and flexible estimator allowed us to perform a direct genome-wide scan for selection based on allele frequency trajectories. All of the significant loci we identified have been identified by a previous allele frequency based scan using aDNA [5]. However, that approach scanned for selection broadly in Holocene Europe and was unable to localize selection further in either space or time. Here, we are able to localize selection to Britain in the past 4500 years and, even further, to identify changes in the strength of selection over that time period (Fig. 5). We are also able to identify selected loci that were not identified in studies of much larger samples of present-day individuals [18, 41].

On the other hand, when we search for polygenic selection, the only trait for which we find significant evidence is skin pigmentation, which is known to have been under selection in Europe more broadly into this time period [2, 5, 18, 55]. Though many previous studies [18, 56, 57, 58] reported evidence for recent selection on height, this has been shown to be largely driven by residual stratification in the GIANT consortium GWAS [59, 60]. Evidence of recent polygenic selection for other traits [e.g., 18] likely suffers from the same issue, although UK Biobank GWAS results may be more reliable. Interestingly, even when we test for polygenic selection using the 2014 GIANT height GWAS [36], we find no evidence of selection, suggesting that our approach based on direct estimates of allele frequency changes might be more robust to stratification than others based on haplotype structure or allele frequency differences among extant populations.

While there may have been multiple environmental drivers of selection, taken together our results strongly suggest that the dominant selective pressure in this time was for increased calcium largely moderated through increased vitamin D levels. Vitamin D is required for the absorption of calcium and deficiency leads to bone deformities with potentially major effects on fitness. Since a major source of vitamin D is synthesis in the skin in the presence of UV radiation, the cloudy skies of Britain are likely to have limited this synthesis. The Mesolithic inhabitants of Britain may have avoided this problem through consumption of vitamin D rich marine resources, but later Neolithic and Bronze Age populations including those in our study relied on agricultural products for their subsistence [61], leading to a need for genetic adaptation.

In fact, almost all of our signals of selection can be related to selection for increased vitamin D or directly for increased calcium. Selection at SLC45A2 and HERC2, and selection for lighter skin pigmentation more generally, naturally lead to increased penetration of UV into the skin and therefore higher levels of vitamin D [62, 63]. Lactase persistence allows the consumption of milk which contains both calcium and a small amount of vitamin D. This “calcium absorption” hypothesis has long been suggested to explain the high frequency of the persistence phenotype in Northern Europe [64, 65]. DHCR7 is directly involved in vitamin D metabolism and the selected allele protects against insufficiency [45]. While the HLA associations are more difficult to specifically identify, two of the selected alleles are protective against coeliac disease which itself is a risk factor for malabsorption of calcium and vitamin D, and consequent osteoperosis [66]. Strong selective pressure for increased vitamin D and calcium levels is therefore a plausible and parsimonious explanation for the patterns of selection that we observe in the this dataset, and suggests that it was the dominant selective pressure in this population.

The main limitation of our approach is the assumption of population continuity. Although there is evidence of external migration into Britain during the time period we investigated [8], there is relatively little change in genetic ancestry since the sources of that migration are genetically similar populations from other nearby parts of Northern Europe. If this affects our results it would likely mean that some of the selection we detected actually occurred in those neighboring populations, somewhat earlier than the dates we find here. However, given that the selection pressures we detect are likely to be shared with these similar populations anyway, we do not think this possibility has a major impact on the interpretation of our results. This limitation is more serious if we want to extend the temporal and geographic range of our analysis. In particular, both ancient and present-day individuals from the Iberian and Italian peninsulas are much more genetically diverse than those from Britain (Fig. S8). Because of this, and the much smaller ancient sample sizes, we did not analyze selection in these Southern European populations, although identifying differences in selective pressures between Northern and Southern Europe is an important direction for future analysis.

Figure S7:

As Fig. 6 but restricting to SNPs that with P< 10⁻⁸, rather than P < 10⁻⁴. Calcium does not appear on this plot because it has no SNPs with P < 10⁻⁸.

Figure S8:

Projected principal components as in Fig. 2 but for ancient individuals from the past 4500 years in Iberia and the Italian peninsula, together with present-day individuals from the IBS and TSI populations of the 1000 Genomes Project. See Ref. [33] for details of the reference populations (grey points).

Although we have identified the loci under the strongest selection in recent British history, there is evidence in our data of additional selected loci. For example, several known loci including OAS1 (P=2.5 × 10⁻⁷, [40]) and FADS1 (P=1.5 × 10⁻⁵, [5]) have evidence of selection though below genome-wide significance. With larger sample sizes we can expect that these and other, potentially novel, loci would be identified. Another limitation is that we rely on genotypes from the 1240k array and therefore cannot detect selection on rare SNPs or structural variants that are not tagged by SNPs on the array. Large datasets of shotgun sequence data might extend the range of variation on which selection can be detected.

Our study demonstrates the power of ancient DNA to robustly detect and precisely characterize the timing of natural selection in humans. Notably, with sample sizes of a few hundred, we are able to identify variants under selection that were not identified in studies of thousands of present-day individuals. All the loci that we identify have evidence of substantial change in the strength of selection over the past 4500 years, suggesting that modeling this variability is an important addition to previous approaches that assume constant selective pressure. This also allows us to formulate more specific hypotheses about the environmental drivers of selection. For example, the decrease in the selective advantage of the lactase persistence allele coincides with the Roman occupation of Britain. Perhaps, since cattle milk was not commonly drunk in Roman culture, a culturally mediated decrease in milk consumption in Roman Britain meant that lactase persistence became less useful for obtaining vitamin D, leading to an increase in selective pressures related to other sources (i.e. pigmentation and metabolism). In this way, culture could have determined the genetic response to a constant environmental selective pressure. The prospect of exploring this and similar interactions is an exciting future direction for ancient DNA research.

Data and code availability

Ancient genotype data was from the Allen Ancient DNA Resource v4.3 [27] (original sources cited in main text), and Patterson et al. 2022 [8]. Present-day data from the 1000 Genomes Project [32] was reprocessed as described in the AADR. Code to run the estimator and reproduce the analyses in this paper is available at https://github.com/jthlab/bmws.

Acknowledgments

This research was funded by the NIGMS ([R35GM133708] to I.M.) and the NSF ([DMS-2052653] to J.T.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

References

[1].↵
P Skoglund, I Mathieson, Ancient genomics of modern humans: The first decade. Annu Rev Genomics Hum Genet 19, 381–404 (2018).
OpenUrl CrossRef
[2].↵
S Wilde, et al., Direct evidence for positive selection of skin, hair, and eye pigmentation in europeans during the last 5,000 y. Proceedings of the National Academy of Sciences 111, 4832–4837 (2014).
OpenUrl Abstract/FREE Full Text
[3].↵
S Mathieson, I Mathieson, Fads1 and the timing of human adaptation to agriculture. Mol Biol Evol 35, 2957–2970 (2018).
OpenUrl CrossRef PubMed
[4].↵
G Kerner, et al., Human ancient dna analyses reveal the high burden of tuberculosis in europeans over the last 2,000 years. Am J Hum Genet 108, 517–524 (2021).
OpenUrl
[5].↵
I Mathieson, et al., Genome-wide patterns of selection in 230 ancient eurasians. Nature 528, 499–503 (2015).
OpenUrl CrossRef PubMed
[6].↵
P Skoglund, et al., Reconstructing prehistoric african population structure. Cell 171, 59–71 e21 (2017).
OpenUrl CrossRef PubMed
[7].↵
A Margaryan, et al., Population genomics of the viking world. Nature 585, 390–396 (2020).
OpenUrl
[8].↵
N Patterson, et al., Large-scale migration into britain during the middle to late bronze age. Nature 601, 588–594 (2022).
OpenUrl
[9].↵
I Olalde, et al., The genomic history of the iberian peninsula over the past 8000 years. Science 363, 1230–1234 (2019).
OpenUrl Abstract/FREE Full Text
[10].↵
JP Bollback, TL York, R Nielsen, Estimation of 2nes from temporal allele frequency data. Genetics 179, 497–502 (2008).
OpenUrl Abstract/FREE Full Text
[11].↵
AS Malaspinas, O Malaspinas, SN Evans, M Slatkin, Estimating allele age and selection coefficient from time-serial data. Genetics 192, 599–607 (2012).
OpenUrl Abstract/FREE Full Text
[12].↵
I Mathieson, G McVean, Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193, 973–84 (2013).
OpenUrl Abstract/FREE Full Text
[13].↵
M Lacerda, C Seoighe, Population genetics inference for longitudinally-sampled mutants under strong selection. Genetics 198, 1237–1250 (2014).
OpenUrl Abstract/FREE Full Text
[14].↵
AF Feder, S Kryazhimskiy, JB Plotkin, Identifying signatures of selection in genetic time series. Genetics 196, 509–522 (2014).
OpenUrl Abstract/FREE Full Text
[15].↵
M Steinrücken, A Bhaskar, YS Song, A novel spectral method for inferring general diploid selection from time series genetic data. Ann. Appl. Stat. 8, 2203–2222 (2014).
OpenUrl CrossRef PubMed
[16].↵
J Terhorst, C Schlötterer, YS Song, Multi-locus analysis of genomic time series data from experimental evolution. PLOS Genetics 11, 1–29 (2015).
OpenUrl
[17].↵
P Tataru, M Simonsen, T Bataillon, A Hobolth, Statistical inference in the Wright-Fisher model using allele frequency data. Syst. Biol. 66, e30–e46 (2017).
OpenUrl CrossRef PubMed
[18].↵
Y Field, et al., Detection of human adaptation during the past 2000 years. Science 354, 760–764 (2016).
OpenUrl Abstract/FREE Full Text
[19].↵
WJ Ewens, Mathematical population genetics: theoretical introduction. (Springer) Vol. 1, (2004).
[20].↵
A Ferrer-Admetlla, C Leuenberger, JD Jensen, D Wegmann, An approximate markov model for the wright–fisher diffusion and its application to time series data. Genetics 203, 831–846 (2016).
OpenUrl Abstract/FREE Full Text
[21].↵
P Tataru, T Bataillon, A Hobolth, Inference under a Wright-Fisher model using an accurate beta approximation. Genetics 201, 1133–1141 (2015).
OpenUrl Abstract/FREE Full Text
[22].↵
Z Gompert, Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Mol. Ecol. 1 (2016).
[23].↵
AS Malaspinas, Methods to characterize selective sweeps using time serial samples: an ancient DNA perspective. Mol. Ecol. 25, 24–41 (2016).
OpenUrl CrossRef PubMed
[24].↵
C Paris, B Servin, S Boitard, Inference of selection from genetic time series using various parametric approximations to the wright-fisher model. G3: Genes, Genomes, Genetics 9, 4073–4086 (2019).
OpenUrl
[25].↵
W Feller, An introduction to probability theory and its applications, vol 2. (John Wiley & Sons), (2008).
[26].↵
J Bradbury, et al., JAX: composable transformations of Python+NumPy programs (2018).
[27].↵
Reich Lab, Allen Ancient DNA Resource (https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data) (2021) Accessed: 2021-09-01.
[28].↵
S Schiffels, et al., Iron age and anglo-saxon genomes from east england reveal british migration history. Nature communications 7, 10408 (2016).
OpenUrl
[29].↵
R Martiniano, et al., Genomic signals of migration and continuity in britain before the anglo-saxons. Nature communications 7, 10326 (2016).
OpenUrl
[30].↵
I Olalde, et al., The beaker phenomenon and the genomic transformation of northwest europe. Nature 555, 190–196 (2018).
OpenUrl CrossRef PubMed
[31].↵
S Brace, et al., Ancient genomes indicate population replacement in early neolithic britain. Nat Ecol Evol 3, 765–771 (2019).
OpenUrl
[32].↵
1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015).
OpenUrl CrossRef PubMed
[33].↵
I Lazaridis, et al., Ancient human genomes suggest three ancestral populations for present-day europeans. Nature 513, 409–13 (2014).
OpenUrl CrossRef PubMed Web of Science
[34].↵
Neale Lab, UK Biobank GWAS Results (http://www.nealelab.is/uk-biobank) (2021) Accessed: 2021-01-20.
[35].↵
HR Taal, et al., Common variants at 12q15 and 12q24 are associated with infant head circumference. Nat Genet 44, 532–538 (2012).
OpenUrl CrossRef PubMed
[36].↵
AR Wood, et al., Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46, 1173–86 (2014).
OpenUrl CrossRef PubMed
[37].↵
M Horikoshi, et al., Genome-wide associations for birth weight and correlations with adult disease. Nature 538, 248–252 (2016).
OpenUrl CrossRef PubMed
[38].↵
J Chen, et al., The trans-ancestral genomic architecture of glycemic traits. Nat Genet 53, 840–860 (2021).
OpenUrl CrossRef
[39].↵
KJ Karczewski, et al., The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
OpenUrl CrossRef PubMed
[40].↵
AJ Sams, et al., Adaptively introgressed neandertal haplotype at the oas locus functionally impacts innate immune responses in humans. Genome Biol 17, 246 (2016).
OpenUrl CrossRef
[41].↵
J Nait Saada, et al., Identity-by-descent detection across 487,409 british samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 11, 6130 (2020).
OpenUrl CrossRef
[42].↵
NS Enattah, et al., Identification of a variant associated with adult-type hypolactasia. Nat Genet 30, 233–7 (2002).
OpenUrl CrossRef PubMed Web of Science
[43].↵
T Bersaglieri, et al., Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74, 1111–20 (2004).
OpenUrl CrossRef PubMed Web of Science
[44].↵
BM Peter, E Huerta-Sanchez, R Nielsen, Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet 8, e1003011 (2012).
OpenUrl CrossRef PubMed
[45].↵
TJ Wang, et al., Common genetic determinants of vitamin d insufficiency: a genome-wide association study. Lancet 376, 180–8 (2010).
OpenUrl CrossRef PubMed Web of Science
[46].↵
V Kuan, AR Martineau, CJ Griffiths, E Hypponen, R Walton, Dhcr7 mutations linked to higher vitamin d status allowed early human migration to northern latitudes. BMC Evol Biol 13, 144 (2013).
OpenUrl CrossRef PubMed
[47].↵
HL Norton, et al., Genetic evidence for the convergent evolution of light skin in europeans and east asians. Mol Biol Evol 24, 710–22 (2007).
OpenUrl CrossRef PubMed Web of Science
[48].↵
H Eiberg, et al., Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the herc2 gene inhibiting oca2 expression. Hum Genet 123, 177–87 (2008).
OpenUrl CrossRef PubMed Web of Science
[49].↵
O Canela-Xandri, K Rawlik, A Tenesa, An atlas of genetic associations in uk biobank. Nat Genet 50, 1593–1599 (2018).
OpenUrl CrossRef PubMed
[50].↵
PG Hysi, et al., Genome-wide association meta-analysis of individuals of european ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability. Nat Genet 50, 652–656 (2018).
OpenUrl
[51].↵
M Simcoe, et al., Genome-wide association study in almost 195,000 individuals identifies 50 previously unidentified genetic loci for eye color. Sci Adv 7, doi:10.1126/sci-adv.abd1239 (2021).
OpenUrl CrossRef
[52].↵
O Lao, JM de Gruijter, K van Duijn, A Navarro, M Kayser, Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann Hum Genet 71, 354–69 (2007).
OpenUrl CrossRef PubMed Web of Science
[53].↵
MP Donnelly, et al., A global view of the oca2-herc2 region and pigmentation. Hum Genet 131, 683–96 (2012).
OpenUrl CrossRef PubMed
[54].↵
X Yin, et al., Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat Commun 6, 6916 (2015).
OpenUrl CrossRef PubMed
[55].↵
D Ju, I Mathieson, The evolution of skin pigmentation-associated variation in west eurasia. Proc Natl Acad Sci U S A 118, e2009227118 (2021).
OpenUrl
[56].↵
MC Turchin, et al., Evidence of widespread selection on standing variation in europe at height-associated snps. Nat Genet 44, 1015–9 (2012).
OpenUrl CrossRef PubMed
[57].↵
JJ Berg, G Coop, A population genetic signal of polygenic adaptation. PLoS Genet 10, e1004412 (2014).
OpenUrl CrossRef PubMed
[58].↵
MR Robinson, et al., Population genetic differentiation of height and body mass index across europe. Nat Genet 47, 1357–62 (2015).
OpenUrl CrossRef PubMed
[59].↵
JJ Berg, et al., Reduced signal for polygenic adaptation of height in uk biobank. Elife 8, e39725 (2019).
OpenUrl CrossRef PubMed
[60].↵
M Sohail, et al., Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife 8, e39702 (2019).
OpenUrl CrossRef
[61].↵
MP Richards, RJ Schulting, RE Hedges, Archaeology: sharp shift in diet at onset of neolithic. Nature 425, 366 (2003).
OpenUrl GeoRef PubMed
[62].↵
FG Murray, Pigmentation, sunlight, and nutritional disease. American Anthropologist 36, 438–445 (1934).
OpenUrl CrossRef
[63].↵
NG Jablonski, G Chaplin, Human skin pigmentation as an adaptation to uv radiation. Proc Natl Acad Sci U S A 107 Suppl 2, 8962–8 (2010).
OpenUrl Abstract/FREE Full Text
[64].↵
G Flatz, HW Rotthauwe, Lactose nutrition and natural selection. Lancet 2, 76–7 (1973).
OpenUrl PubMed Web of Science
[65].↵
P Gerbault, et al., Evolution of lactase persistence: an example of human niche construction. Philos Trans R Soc Lond B Biol Sci 366, 863–77 (2011).
OpenUrl CrossRef PubMed
[66].↵
D Meyer, S Stavropolous, B Diamond, E Shane, PH Green, Osteoporosis in a north american adult population with celiac disease. Am J Gastroenterol 96, 112–9 (2001).
OpenUrl PubMed Web of Science

View the discussion thread.

Posted March 16, 2022.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5223)
Biochemistry (11760)
Bioengineering (8762)
Bioinformatics (29230)
Biophysics (14991)
Cancer Biology (12118)
Cell Biology (17424)
Clinical Trials (138)
Developmental Biology (9432)
Ecology (14190)
Epidemiology (2067)
Evolutionary Biology (18319)
Genetics (12254)
Genomics (16809)
Immunology (11877)
Microbiology (28110)
Molecular Biology (11609)
Neuroscience (61037)
Paleontology (452)
Pathology (1873)
Pharmacology and Toxicology (3239)
Physiology (4967)
Plant Biology (10431)
Scientific Communication and Education (1683)
Synthetic Biology (2888)
Systems Biology (7346)
Zoology (1653)

[1] [1].↵
P Skoglund, I Mathieson, Ancient genomics of modern humans: The first decade. Annu Rev Genomics Hum Genet 19, 381–404 (2018).
OpenUrl CrossRef

[2] [2].↵
S Wilde, et al., Direct evidence for positive selection of skin, hair, and eye pigmentation in europeans during the last 5,000 y. Proceedings of the National Academy of Sciences 111, 4832–4837 (2014).
OpenUrl Abstract/FREE Full Text

[3] [3].↵
S Mathieson, I Mathieson, Fads1 and the timing of human adaptation to agriculture. Mol Biol Evol 35, 2957–2970 (2018).
OpenUrl CrossRef PubMed

[4] [4].↵
G Kerner, et al., Human ancient dna analyses reveal the high burden of tuberculosis in europeans over the last 2,000 years. Am J Hum Genet 108, 517–524 (2021).
OpenUrl

[5] [5].↵
I Mathieson, et al., Genome-wide patterns of selection in 230 ancient eurasians. Nature 528, 499–503 (2015).
OpenUrl CrossRef PubMed

[6] [6].↵
P Skoglund, et al., Reconstructing prehistoric african population structure. Cell 171, 59–71 e21 (2017).
OpenUrl CrossRef PubMed

[7] [7].↵
A Margaryan, et al., Population genomics of the viking world. Nature 585, 390–396 (2020).
OpenUrl

[8] [8].↵
N Patterson, et al., Large-scale migration into britain during the middle to late bronze age. Nature 601, 588–594 (2022).
OpenUrl

[9] [9].↵
I Olalde, et al., The genomic history of the iberian peninsula over the past 8000 years. Science 363, 1230–1234 (2019).
OpenUrl Abstract/FREE Full Text

[10] [10].↵
JP Bollback, TL York, R Nielsen, Estimation of 2nes from temporal allele frequency data. Genetics 179, 497–502 (2008).
OpenUrl Abstract/FREE Full Text

[11] [11].↵
AS Malaspinas, O Malaspinas, SN Evans, M Slatkin, Estimating allele age and selection coefficient from time-serial data. Genetics 192, 599–607 (2012).
OpenUrl Abstract/FREE Full Text

[12] [12].↵
I Mathieson, G McVean, Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193, 973–84 (2013).
OpenUrl Abstract/FREE Full Text

[13] [13].↵
M Lacerda, C Seoighe, Population genetics inference for longitudinally-sampled mutants under strong selection. Genetics 198, 1237–1250 (2014).
OpenUrl Abstract/FREE Full Text

[14] [14].↵
AF Feder, S Kryazhimskiy, JB Plotkin, Identifying signatures of selection in genetic time series. Genetics 196, 509–522 (2014).
OpenUrl Abstract/FREE Full Text

[15] [15].↵
M Steinrücken, A Bhaskar, YS Song, A novel spectral method for inferring general diploid selection from time series genetic data. Ann. Appl. Stat. 8, 2203–2222 (2014).
OpenUrl CrossRef PubMed

[16] [16].↵
J Terhorst, C Schlötterer, YS Song, Multi-locus analysis of genomic time series data from experimental evolution. PLOS Genetics 11, 1–29 (2015).
OpenUrl

[17] [17].↵
P Tataru, M Simonsen, T Bataillon, A Hobolth, Statistical inference in the Wright-Fisher model using allele frequency data. Syst. Biol. 66, e30–e46 (2017).
OpenUrl CrossRef PubMed

[18] [18].↵
Y Field, et al., Detection of human adaptation during the past 2000 years. Science 354, 760–764 (2016).
OpenUrl Abstract/FREE Full Text

[19] [19].↵
WJ Ewens, Mathematical population genetics: theoretical introduction. (Springer) Vol. 1, (2004).

[20] [20].↵
A Ferrer-Admetlla, C Leuenberger, JD Jensen, D Wegmann, An approximate markov model for the wright–fisher diffusion and its application to time series data. Genetics 203, 831–846 (2016).
OpenUrl Abstract/FREE Full Text

[21] [21].↵
P Tataru, T Bataillon, A Hobolth, Inference under a Wright-Fisher model using an accurate beta approximation. Genetics 201, 1133–1141 (2015).
OpenUrl Abstract/FREE Full Text

[22] [22].↵
Z Gompert, Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Mol. Ecol. 1 (2016).

[23] [23].↵
AS Malaspinas, Methods to characterize selective sweeps using time serial samples: an ancient DNA perspective. Mol. Ecol. 25, 24–41 (2016).
OpenUrl CrossRef PubMed

[24] [24].↵
C Paris, B Servin, S Boitard, Inference of selection from genetic time series using various parametric approximations to the wright-fisher model. G3: Genes, Genomes, Genetics 9, 4073–4086 (2019).
OpenUrl

[25] [25].↵
W Feller, An introduction to probability theory and its applications, vol 2. (John Wiley & Sons), (2008).

[26] [26].↵
J Bradbury, et al., JAX: composable transformations of Python+NumPy programs (2018).

[27] [27].↵
Reich Lab, Allen Ancient DNA Resource (https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data) (2021) Accessed: 2021-09-01.

[28] [28].↵
S Schiffels, et al., Iron age and anglo-saxon genomes from east england reveal british migration history. Nature communications 7, 10408 (2016).
OpenUrl

[29] [29].↵
R Martiniano, et al., Genomic signals of migration and continuity in britain before the anglo-saxons. Nature communications 7, 10326 (2016).
OpenUrl

[30] [30].↵
I Olalde, et al., The beaker phenomenon and the genomic transformation of northwest europe. Nature 555, 190–196 (2018).
OpenUrl CrossRef PubMed

[31] [31].↵
S Brace, et al., Ancient genomes indicate population replacement in early neolithic britain. Nat Ecol Evol 3, 765–771 (2019).
OpenUrl

[32] [32].↵
1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015).
OpenUrl CrossRef PubMed

[33] [33].↵
I Lazaridis, et al., Ancient human genomes suggest three ancestral populations for present-day europeans. Nature 513, 409–13 (2014).
OpenUrl CrossRef PubMed Web of Science

[34] [34].↵
Neale Lab, UK Biobank GWAS Results (http://www.nealelab.is/uk-biobank) (2021) Accessed: 2021-01-20.

[35] [35].↵
HR Taal, et al., Common variants at 12q15 and 12q24 are associated with infant head circumference. Nat Genet 44, 532–538 (2012).
OpenUrl CrossRef PubMed

[36] [36].↵
AR Wood, et al., Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46, 1173–86 (2014).
OpenUrl CrossRef PubMed

[37] [37].↵
M Horikoshi, et al., Genome-wide associations for birth weight and correlations with adult disease. Nature 538, 248–252 (2016).
OpenUrl CrossRef PubMed

[38] [38].↵
J Chen, et al., The trans-ancestral genomic architecture of glycemic traits. Nat Genet 53, 840–860 (2021).
OpenUrl CrossRef

[39] [39].↵
KJ Karczewski, et al., The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
OpenUrl CrossRef PubMed

[40] [40].↵
AJ Sams, et al., Adaptively introgressed neandertal haplotype at the oas locus functionally impacts innate immune responses in humans. Genome Biol 17, 246 (2016).
OpenUrl CrossRef

[41] [41].↵
J Nait Saada, et al., Identity-by-descent detection across 487,409 british samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 11, 6130 (2020).
OpenUrl CrossRef

[42] [42].↵
NS Enattah, et al., Identification of a variant associated with adult-type hypolactasia. Nat Genet 30, 233–7 (2002).
OpenUrl CrossRef PubMed Web of Science

[43] [43].↵
T Bersaglieri, et al., Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74, 1111–20 (2004).
OpenUrl CrossRef PubMed Web of Science

[44] [44].↵
BM Peter, E Huerta-Sanchez, R Nielsen, Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet 8, e1003011 (2012).
OpenUrl CrossRef PubMed

[45] [45].↵
TJ Wang, et al., Common genetic determinants of vitamin d insufficiency: a genome-wide association study. Lancet 376, 180–8 (2010).
OpenUrl CrossRef PubMed Web of Science

[46] [46].↵
V Kuan, AR Martineau, CJ Griffiths, E Hypponen, R Walton, Dhcr7 mutations linked to higher vitamin d status allowed early human migration to northern latitudes. BMC Evol Biol 13, 144 (2013).
OpenUrl CrossRef PubMed

[47] [47].↵
HL Norton, et al., Genetic evidence for the convergent evolution of light skin in europeans and east asians. Mol Biol Evol 24, 710–22 (2007).
OpenUrl CrossRef PubMed Web of Science

[48] [48].↵
H Eiberg, et al., Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the herc2 gene inhibiting oca2 expression. Hum Genet 123, 177–87 (2008).
OpenUrl CrossRef PubMed Web of Science

[49] [49].↵
O Canela-Xandri, K Rawlik, A Tenesa, An atlas of genetic associations in uk biobank. Nat Genet 50, 1593–1599 (2018).
OpenUrl CrossRef PubMed

[50] [50].↵
PG Hysi, et al., Genome-wide association meta-analysis of individuals of european ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability. Nat Genet 50, 652–656 (2018).
OpenUrl

[51] [51].↵
M Simcoe, et al., Genome-wide association study in almost 195,000 individuals identifies 50 previously unidentified genetic loci for eye color. Sci Adv 7, doi:10.1126/sci-adv.abd1239 (2021).
OpenUrl CrossRef

[52] [52].↵
O Lao, JM de Gruijter, K van Duijn, A Navarro, M Kayser, Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann Hum Genet 71, 354–69 (2007).
OpenUrl CrossRef PubMed Web of Science

[53] [53].↵
MP Donnelly, et al., A global view of the oca2-herc2 region and pigmentation. Hum Genet 131, 683–96 (2012).
OpenUrl CrossRef PubMed

[54] [54].↵
X Yin, et al., Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat Commun 6, 6916 (2015).
OpenUrl CrossRef PubMed

[55] [55].↵
D Ju, I Mathieson, The evolution of skin pigmentation-associated variation in west eurasia. Proc Natl Acad Sci U S A 118, e2009227118 (2021).
OpenUrl

[56] [56].↵
MC Turchin, et al., Evidence of widespread selection on standing variation in europe at height-associated snps. Nat Genet 44, 1015–9 (2012).
OpenUrl CrossRef PubMed

[57] [57].↵
JJ Berg, G Coop, A population genetic signal of polygenic adaptation. PLoS Genet 10, e1004412 (2014).
OpenUrl CrossRef PubMed

[58] [58].↵
MR Robinson, et al., Population genetic differentiation of height and body mass index across europe. Nat Genet 47, 1357–62 (2015).
OpenUrl CrossRef PubMed

[59] [59].↵
JJ Berg, et al., Reduced signal for polygenic adaptation of height in uk biobank. Elife 8, e39725 (2019).
OpenUrl CrossRef PubMed

[60] [60].↵
M Sohail, et al., Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife 8, e39702 (2019).
OpenUrl CrossRef

[61] [61].↵
MP Richards, RJ Schulting, RE Hedges, Archaeology: sharp shift in diet at onset of neolithic. Nature 425, 366 (2003).
OpenUrl GeoRef PubMed

[62] [62].↵
FG Murray, Pigmentation, sunlight, and nutritional disease. American Anthropologist 36, 438–445 (1934).
OpenUrl CrossRef

[63] [63].↵
NG Jablonski, G Chaplin, Human skin pigmentation as an adaptation to uv radiation. Proc Natl Acad Sci U S A 107 Suppl 2, 8962–8 (2010).
OpenUrl Abstract/FREE Full Text

[64] [64].↵
G Flatz, HW Rotthauwe, Lactose nutrition and natural selection. Lancet 2, 76–7 (1973).
OpenUrl PubMed Web of Science

[65] [65].↵
P Gerbault, et al., Evolution of lactase persistence: an example of human niche construction. Philos Trans R Soc Lond B Biol Sci 366, 863–77 (2011).
OpenUrl CrossRef PubMed

[66] [66].↵
D Meyer, S Stavropolous, B Diamond, E Shane, PH Green, Osteoporosis in a north american adult population with celiac disease. Am J Gastroenterol 96, 112–9 (2001).
OpenUrl PubMed Web of Science