Abstract
Novel technologies for recovering DNA information from archaeological and historical specimens have made available an ever-increasing amount of temporally spaced genetic samples from natural populations. These genetic time series permit the direct assessment of temporal changes in allele frequencies and promise improved power for the inference of selection. Increased time resolution can further facilitate testing hypotheses about the drivers of past selection events, such as plant and animal domestication. However, studying past selection processes through ancient DNA (aDNA) still faces considerable obstacles such as postmortem damage, high fragmentation, low coverage and small samples. To address these challenges, we introduce a novel Bayesian approach for the inference of temporally variable selection based on genotype likelihoods instead of allele frequencies, thereby enabling us to account for sample uncertainties resulting from the damage and fragmentation of aDNA molecules. Our method also permits the reconstruction of the underlying mutant allele frequency trajectory of the population through time, which allows for a better understanding of the drivers of selection. We evaluate its performance through extensive simulations and illustrate its utility with an application to ancient horse samples genotyped at loci for coat colouration.
1. Introduction
Natural selection is the primary mechanism of adaptive evolution within natural populations (Darwin, 1859). Evolutionary and population geneticists usually study selection and adaptation using temporally spaced genetic samples, which can be far more informative about selection than samples from a single time point, since the expected changes in allele frequencies over time are closely tied to the presence, strength and timing of selection. The most common data sources have been evolve-and-resequence studies, which combine experimental evolution under controlled laboratory or field mesocosm conditions with next-generation sequencing technology but are typically limited to species with short evolutionary timescales (e.g., Turner & Miller, 2012; Bosshard et al., 2017; Good et al., 2017). Recent advances in technologies for obtaining DNA molecules from ancient biological material have led to a massive increase in time-serial samples of segregating alleles from natural populations (e.g., Mathieson et al., 2015; Librado et al., 2017; Fages et al., 2019), which offer unprecedented opportunities to study the chronology and tempo of selection across evolutionary timescales (see Dehasque et al., 2020, for a review).
As the number of published ancient genomes grows rapidly, a range of statistical methods that estimate selection coefficients and other population genetic parameters from ancient DNA (aDNA) data have been developed over the last fifteen years (see Malaspinas, 2016, for a review). Most existing methods are based on the hidden Markov model (HMM) framework developed by Williamson & Slatkin (1999), where the underlying population allele frequency is modelled as a hidden state evolving under the Wright-Fisher model introduced by Fisher (1922) and Wright (1931), and the sample allele frequency observed at each sampling time point is modelled as a noisy observation of the underlying population allele frequency. To remain computationally feasible, these methods typically work with the diffusion approximation of the Wright-Fisher model (e.g., Bollback et al., 2008; Malaspinas et al., 2012; Steinrücken et al., 2014; Schraiber et al., 2016; Ferrer-Admetlla et al., 2016; He et al., 2020a,b; Lyu et al., 2022). The diffusion approximation enables us to efficiently integrate over all possible underlying allele frequency trajectories, thereby substantially reducing the computational cost of the likelihood calculation. These approaches have already been successfully applied in aDNA studies (e.g., Ludwig et al., 2009; Sandoval-Castellanos et al., 2017; Ye et al., 2017; Wutke et al., 2018). Moment-based approximations of the Wright-Fisher model are tractable alternatives, but owing to their poor performance over long evolutionary timescales (He et al., 2021), they are mostly used in methods tailored to experimental evolution (e.g., Lacerda & Seoighe, 2014; Feder et al., 2014; Terhorst et al., 2015; Paris et al., 2019).
Although the field of aDNA is experiencing an exponential increase in the amount of available data, accompanied by a growing number of tailored statistical approaches, data quality remains a challenge due to postmortem damage, high fragmentation, low coverage and small samples. To our knowledge, none of the existing methods models the two main characteristics of aDNA, i.e., the high error rate resulting from the damage of aDNA molecules and the high missing rate caused by their fragmentation, with the exception of He et al. (2020a), which allows aDNA samples with missing genotypes. Moreover, most existing approaches assume that the selection coefficient is fixed over time, an assumption that is easily violated in aDNA studies. Most scenarios of adaptation in natural populations involve adaptation to ecological, environmental and cultural changes, in which case it is no longer appropriate for the selection coefficient to remain constant through time. More recently, Mathieson (2020) introduced a novel approach to infer selection and its strength and timing of changes from allele frequency time series data, with an application to aDNA studies. However, this approach still assumes that aDNA data contain no missing genotypes and no sequencing errors.
To address these challenges, we introduce a novel Bayesian approach for estimating temporally variable selection intensity from aDNA data while modelling sample uncertainties resulting from postmortem damage, high fragmentation, low coverage and small samples. Unlike existing methods, our approach is built upon a two-layer HMM framework, where the first hidden layer characterises the underlying frequency trajectory of the mutant allele in the population through time, the second hidden layer denotes the unobserved genotype of each individual in the sample, and the observed layer represents the aDNA sequence data. Such an HMM framework takes genotype likelihoods as input instead of allele frequencies, thereby modelling sample uncertainties caused by the damage and fragmentation of aDNA molecules. Our posterior computation is carried out through the particle marginal Metropolis-Hastings (PMMH) algorithm developed by Andrieu et al. (2010), which permits a joint update of the selection coefficient and population mutant allele frequency trajectory. Reconstructing the underlying frequency trajectory of the mutant allele in the population allows for a better understanding of the drivers of selection.
We test our approach through extensive simulations, in particular when samples are sparsely distributed in time, small in size and of poor quality (i.e., with high missing and error rates). To illustrate the applicability of our method to aDNA data, we analyse the ancient horse samples from Wutke et al. (2016) that were genotyped at the loci encoding coat colouration.
2. Materials and Methods
In this section, we begin with an introduction of the Wright-Fisher diffusion for a single locus evolving under selection over time. We then describe our Bayesian procedure for inferring temporally variable selection from aDNA sequence data while modelling sample uncertainties caused by the damage and fragmentation of aDNA molecules.
2.1. Wright-Fisher diffusion
Let us consider a diploid population of N randomly mating individuals at a single locus 𝒜, which evolves subject to selection under the Wright-Fisher model (see, e.g., Durrett, 2008). We assume discrete time, non-overlapping generations and non-constant population size. Suppose that at locus 𝒜 there are two possible allele types, labelled 𝒜0 and 𝒜1, respectively. We attach the symbol 𝒜0 to the ancestral allele, which originally exists in the population, and the symbol 𝒜1 to the mutant allele, which arises in the population only once. We let selection take the form of viability selection and set per-generation relative viabilities of the three possible genotypes 𝒜0𝒜0, 𝒜0𝒜1 and 𝒜1𝒜1 to be 1, 1 + hs and 1 + s, respectively, where s ∈ [−1, +∞) is the selection coefficient and h ∈ [0, 1] is the dominance parameter.
We now consider the standard diffusion limit of the Wright-Fisher model with selection. We measure time in units of 2N0 generations, denoted by t, where N0 is an arbitrary reference population size fixed through time, and assume that the population size changes deterministically, with N (t) being the number of diploid individuals in the population at time t. In the diffusion limit of the Wright-Fisher model with selection, as the reference population size N0 goes to infinity, the scaled selection coefficient α = 2N0s is kept constant and the ratio of the population size to the reference population size N (t)/N0 converges to a function β(t). As demonstrated in Durrett (2008), the mutant allele frequency trajectory through time converges to the diffusion limit of the Wright-Fisher model with the reference population size N0 approaching infinity. We refer to this diffusion limit as the Wright-Fisher diffusion with selection.
We let X denote the Wright-Fisher diffusion with selection, which models the mutant allele frequency evolving in the state space [0, 1] under selection. Many existing approaches define the Wright-Fisher diffusion X in terms of the partial differential equation (PDE) that characterises its transition probability density function (e.g., Bollback et al., 2008; Steinrücken et al., 2014; He et al., 2020b). Instead, like Schraiber et al. (2016), He et al. (2020a) and Lyu et al. (2022), we characterise the Wright-Fisher diffusion X as the solution to the stochastic differential equation (SDE)

$$dX(t) = \alpha X(t)\,\big(1 - X(t)\big)\,\big(h + (1 - 2h)\,X(t)\big)\,dt + \sqrt{\frac{X(t)\,\big(1 - X(t)\big)}{\beta(t)}}\,dW(t) \tag{1}$$

for t ≥ t0 with initial condition X(t0) = x0, where W represents the standard Wiener process.
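To make Eq. (1) concrete, the following is a minimal Python sketch (standard library only) that simulates one path of the Wright-Fisher diffusion with the Euler-Maruyama scheme, the discretisation used in Section 3.1, with drift αx(1 − x)(h + (1 − 2h)x) and diffusion coefficient √(x(1 − x)/β(t)). The function name `simulate_wf_diffusion` and its arguments are hypothetical. Passing α and β as functions of time is our own choice, so that a piecewise-constant selection coefficient or a changing population size can be plugged in, and clamping to [0, 1] after each step is one simple way, assumed here, of keeping the path in the state space.

```python
import math
import random


def simulate_wf_diffusion(x0, t0, t1, alpha, h, beta, n_steps, rng):
    """Simulate one path of the Wright-Fisher diffusion with selection,
    dX = alpha(t) X (1 - X) (h + (1 - 2h) X) dt + sqrt(X (1 - X) / beta(t)) dW,
    from time t0 to t1 (units of 2*N0 generations) with Euler-Maruyama.

    alpha(t): scaled selection coefficient 2*N0*s at time t (a callable, so a
              piecewise-constant selection coefficient is easy to express).
    beta(t):  ratio N(t)/N0 of the population size to the reference size.
    Returns the frequencies at the n_steps + 1 grid points.
    """
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    path = [x]
    for _ in range(n_steps):
        # Both terms vanish at x = 0 and x = 1, so loss and fixation are absorbing.
        drift = alpha(t) * x * (1.0 - x) * (h + (1.0 - 2.0 * h) * x)
        diffusion = math.sqrt(x * (1.0 - x) / beta(t))
        x += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = min(max(x, 0.0), 1.0)  # clamp back into the state space [0, 1]
        t += dt
        path.append(x)
    return path


# Usage: N0 = 16000 and s = 0.01 (so alpha = 2 * N0 * s = 320), additive gene
# action, constant population size, 40 generations with 5 steps per generation.
N0 = 16000
rng = random.Random(2024)
path = simulate_wf_diffusion(0.3, 0.0, 40 / (2 * N0), lambda t: 2 * N0 * 0.01,
                             0.5, lambda t: 1.0, 40 * 5, rng)
```

Since one generation spans 1/(2N0) time units, 40 generations correspond to t1 − t0 = 40/32000 here; a piecewise-constant α(t) switching at τ gives the two-selection-coefficient model described later in this section.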
2.2. Bayesian inference of selection
Suppose that the available data are drawn from the underlying population at a finite number of distinct time points, say t1 < t2 < … < tK, measured in units of 2N0 generations. At sampling time point tk, Nk individuals are sampled from the underlying population, and for individual n, we let rn,k denote (in generic notation) all of the reads at the locus of interest. The population genetic parameters of interest in this work are the selection coefficient s and dominance parameter h, and for ease of notation, we set ϑ = (s, h) in what follows.
2.2.1. Hidden Markov model
Our method is built upon the HMM framework introduced by Bollback et al. (2008), where the underlying population evolves under the Wright-Fisher diffusion with selection in Eq. (1), and the observed sample is made up of the individuals independently drawn from the underlying population at each sampling time point. To model sample uncertainties resulting from the damage and fragmentation of aDNA molecules, our approach infers selection from raw reads rather than called genotypes. We let x1:K = {x1, x2, …, xK} represent the frequency trajectory of the mutant allele in the underlying population at the sampling time points t1:K, and the posterior probability distribution for the population genetic parameters ϑ and population mutant allele frequency trajectory x1:K (up to proportionality) can be formulated as

$$p(\vartheta, x_{1:K} \mid r_{1:K}) \propto p(\vartheta)\, p(x_{1:K} \mid \vartheta)\, p(r_{1:K} \mid x_{1:K}), \tag{2}$$

where r1:K = {r1, r2, …, rK} with rk = {r1,k, r2,k, …, rNk,k}. In Eq. (2), p(ϑ) is the prior probability distribution for the population genetic parameters and can be taken to be a uniform prior over the parameter space if prior knowledge is poor, p(x1:K | ϑ) is the probability distribution for the mutant allele frequency trajectory of the underlying population at all sampling time points, and p(r1:K | x1:K) is the probability of observing the reads of all sampled individuals given the mutant allele frequency trajectory of the population.
With the Markov property of the Wright-Fisher diffusion, we can decompose the probability distribution p(x1:K | ϑ) as

$$p(x_{1:K} \mid \vartheta) = p(x_1 \mid \vartheta) \prod_{k=1}^{K-1} p(x_{k+1} \mid x_k; \vartheta),$$

where p(x1 | ϑ) is the prior probability distribution for the starting population mutant allele frequency, taken to be a uniform distribution over the state space [0, 1] if prior knowledge is poor, and p(xk+1 | xk; ϑ) is the transition probability density function of the Wright-Fisher diffusion X between two consecutive sampling time points for k = 1, 2, …, K − 1, satisfying the Kolmogorov backward equation (or its adjoint) resulting from the Wright-Fisher diffusion.
To calculate the probability p(r1:K | x1:K), we introduce an additional hidden layer into our HMM framework to denote the latent genotypes of all sampled individuals (see Figure 1 for the graphical representation of our two-layer HMM framework). We let g1:K = {g1, g2, …, gK} be the genotypes of the individuals drawn from the underlying population at the sampling time points t1:K with gk = {g1,k, g2,k, …, gNk,k}, where gn,k ∈ {0, 1, 2} denotes the number of mutant alleles in individual n at sampling time point tk. We then have

$$p(r_{1:K} \mid x_{1:K}) = \prod_{k=1}^{K} \prod_{n=1}^{N_k} \sum_{g_{n,k}=0}^{2} p(r_{n,k} \mid g_{n,k})\, p(g_{n,k} \mid x_k),$$

where p(gn,k | xk) represents the probability distribution for the genotype gn,k of individual n in the sample given the population mutant allele frequency xk, and p(rn,k | gn,k) represents the probability of observing reads rn,k of individual n in the sample given the genotype gn,k, known as the genotype likelihood. Under the assumption that all individuals in the sample are drawn from the underlying population in their adulthood (i.e., the stage after selection but before reproduction in the life cycle, see He et al. (2017)), we have

$$p(g_{n,k} \mid x_k) = \binom{2}{g_{n,k}}\, x_k^{g_{n,k}} (1 - x_k)^{2 - g_{n,k}}. \tag{3}$$
Genotype likelihoods are typically calculated from aligned reads and quality scores when genotype-calling software is used to determine the genotype of each individual (e.g., Li et al., 2009a,b; DePristo et al., 2011). Since genotype calling is an essential prerequisite for most aDNA studies, we assume that these numerical values are available.
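The two-layer structure can be illustrated for a single sampling time point: each individual's latent genotype is summed out by weighting its three genotype likelihoods with p(g | x). The Python sketch below assumes Hardy-Weinberg (binomial) genotype proportions given the population mutant allele frequency, and the function names `genotype_prob` and `reads_likelihood` are hypothetical.

```python
import math


def genotype_prob(g, x):
    """p(g | x): probability that a sampled individual carries g in {0, 1, 2}
    mutant alleles, assuming Hardy-Weinberg proportions at frequency x."""
    return math.comb(2, g) * x**g * (1.0 - x) ** (2 - g)


def reads_likelihood(gls, x):
    """p(r_k | x_k) for one sampling time point.

    gls holds one triple [p(r | g=0), p(r | g=1), p(r | g=2)] per sampled
    individual. Each latent genotype is summed out, and individuals are
    treated as independent draws given the population frequency x.
    """
    lik = 1.0
    for gl in gls:
        lik *= sum(gl[g] * genotype_prob(g, x) for g in range(3))
    return lik
```

An individual with uninformative genotype likelihoods (all three equal) contributes a constant factor, which is how a missing genotype is absorbed without any special casing. For large samples one would accumulate logarithms instead of a raw product to avoid underflow.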
2.2.2. Particle marginal Metropolis-Hastings
Since the posterior p(ϑ, x1:K | r1:K) is not available in closed form, we resort to the PMMH algorithm introduced by Andrieu et al. (2010), which has already been successfully applied in population genetic studies (see, e.g., He et al., 2020b; Lyu et al., 2022). The PMMH algorithm computes the Metropolis-Hastings acceptance ratio using an estimate of the marginal likelihood p(r1:K | ϑ) and generates a new candidate population mutant allele frequency trajectory x1:K from an approximation of the smoothing distribution p(x1:K | r1:K, ϑ); both can be obtained through the bootstrap particle filter developed by Gordon et al. (1993). Such a setup permits a joint update of the population genetic parameters ϑ and population mutant allele frequency trajectory x1:K.
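A key ingredient of the PMMH algorithm is a bootstrap particle filter that targets the smoothing distribution $p(x_{1:K} \mid r_{1:K}, \vartheta)$. More specifically, at the sampling time point $t_{k+1}$, our objective is to generate a sample from the filtering distribution $p(x_{k+1} \mid r_{1:k+1}, \vartheta)$. Up to proportionality, the filtering distribution can be formulated as

$$p(x_{k+1} \mid r_{1:k+1}, \vartheta) \propto p(r_{k+1} \mid x_{k+1})\, p(x_{k+1} \mid r_{1:k}, \vartheta), \tag{4}$$

where

$$p(x_{k+1} \mid r_{1:k}, \vartheta) = \int_0^1 p(x_{k+1} \mid x_k; \vartheta)\, p(x_k \mid r_{1:k}, \vartheta)\, dx_k$$

is the predictive distribution, which is not available in closed form. The predictive distribution in Eq. (4) can be approximated with a set of particles $\{x_k^{(m)}\}_{m=1}^{M}$ generated from the filtering distribution $p(x_k \mid r_{1:k}, \vartheta)$, with each particle assigned a weight $1/M$, thereby

$$p(x_{k+1} \mid r_{1:k}, \vartheta) \approx \frac{1}{M} \sum_{m=1}^{M} p(x_{k+1} \mid x_k^{(m)}; \vartheta), \tag{5}$$

where the superscript $(m)$ represents the particle label. By substituting Eq. (5) into Eq. (4), the filtering distribution can be approximated (up to proportionality) by

$$p(x_{k+1} \mid r_{1:k+1}, \vartheta) \propto p(r_{k+1} \mid x_{k+1})\, p(x_{k+1} \mid r_{1:k}, \vartheta) \approx \frac{1}{M}\, p(r_{k+1} \mid x_{k+1}) \sum_{m=1}^{M} p(x_{k+1} \mid x_k^{(m)}; \vartheta). \tag{6}$$

From Eq. (6), the approximate filtering distribution can be sampled with importance sampling: we generate a set of particles $\{x_{k+1}^{(m)}\}_{m=1}^{M}$ from the approximate predictive distribution, i.e., $x_{k+1}^{(m)} \sim p(x_{k+1} \mid x_k^{(m)}; \vartheta)$, and assign each particle a weight $w_{k+1}^{(m)} \propto p(r_{k+1} \mid x_{k+1}^{(m)})$ (Gordon et al., 1993). We then resample $M$ particles with replacement amongst $\{x_{k+1}^{(m)}\}_{m=1}^{M}$ with probabilities proportional to the weights $\{w_{k+1}^{(m)}\}_{m=1}^{M}$. For clarity, we write down the bootstrap particle filter algorithm: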
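Step 1: Initialise the particles at the sampling time point $t_1$:
Step 1a: Draw $x_1^{(m)} \sim p(x_1 \mid \vartheta)$ for $m = 1, 2, \ldots, M$.
Step 1b: Set $\tilde{w}_1^{(m)} = p(r_1 \mid x_1^{(m)})$ and $w_1^{(m)} = \tilde{w}_1^{(m)} / \sum_{j=1}^{M} \tilde{w}_1^{(j)}$ for $m = 1, 2, \ldots, M$.
Step 1c: Resample $M$ particles with replacement amongst $\{x_1^{(m)}\}_{m=1}^{M}$ with probabilities $\{w_1^{(m)}\}_{m=1}^{M}$.
Repeat Step 2 for $k = 2, 3, \ldots, K$:
Step 2: Update the particles at the sampling time point $t_k$:
Step 2a: Draw $x_k^{(m)} \sim p(x_k \mid x_{k-1}^{(m)}; \vartheta)$ and set $x_{1:k}^{(m)} = (x_{1:k-1}^{(m)}, x_k^{(m)})$ for $m = 1, 2, \ldots, M$.
Step 2b: Set $\tilde{w}_k^{(m)} = p(r_k \mid x_k^{(m)})$ and $w_k^{(m)} = \tilde{w}_k^{(m)} / \sum_{j=1}^{M} \tilde{w}_k^{(j)}$ for $m = 1, 2, \ldots, M$.
Step 2c: Resample $M$ particles with replacement amongst $\{x_{1:k}^{(m)}\}_{m=1}^{M}$ with probabilities $\{w_k^{(m)}\}_{m=1}^{M}$.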
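The filter steps above can be sketched in Python (standard library only), together with the marginal likelihood estimate that is a product over sampling time points of the average unnormalised weight (1/M) Σm p(rk | xk(m)), computed before resampling. The model enters only through three caller-supplied functions with hypothetical names: a sampler for p(x1 | ϑ), a sampler for the transition p(xk+1 | xk; ϑ) (for example an Euler-Maruyama path of the diffusion), and the observation likelihood p(rk | xk). Multinomial resampling of whole trajectories is assumed, so that a smoothed trajectory can be drawn uniformly at the end.

```python
import math
import random


def bootstrap_particle_filter(K, M, sample_x1, propagate, obs_lik, rng):
    """Bootstrap particle filter for the mutant allele frequency trajectory.

    sample_x1(rng)       -> a draw from p(x_1 | theta)
    propagate(k, x, rng) -> a draw of x_{k+1} from p(x_{k+1} | x_k; theta)
    obs_lik(k, x)        -> p(r_k | x_k), k = 1, ..., K
    Returns (log of the marginal-likelihood estimate, one trajectory x_{1:K}
    drawn uniformly from the resampled particles).
    """
    trajs = [[sample_x1(rng)] for _ in range(M)]  # Step 1a
    log_lik = 0.0
    for k in range(1, K + 1):
        if k > 1:  # Step 2a: extend every trajectory to time t_k
            for traj in trajs:
                traj.append(propagate(k - 1, traj[-1], rng))
        # Steps 1b/2b: unnormalised weights are the observation likelihoods
        weights = [obs_lik(k, traj[-1]) for traj in trajs]
        total = sum(weights)
        if total == 0.0:
            return -math.inf, None  # no particle is compatible with r_k
        log_lik += math.log(total / M)  # factor (1/M) sum_m p(r_k | x_k^(m))
        # Steps 1c/2c: multinomial resampling of whole trajectories x_{1:k};
        # copies are needed because a trajectory may be selected several times
        trajs = [list(t) for t in rng.choices(trajs, weights=weights, k=M)]
    return log_lik, rng.choice(trajs)


# Usage with a toy model: uniform start, frequency held fixed between
# sampling times, and reads favouring high frequencies (p(r_k | x) = x**10).
rng = random.Random(1)
log_lik_hat, trajectory = bootstrap_particle_filter(
    3, 1000, lambda r: r.random(), lambda k, x, r: x, lambda k, x: x**10, rng)
```

Accumulating the estimate on the log scale avoids underflow when K and the sample sizes are large; the estimate itself is unbiased on the natural scale, which is what PMMH relies on.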
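With the procedure for the bootstrap particle filter described above, the smoothing distribution $p(x_{1:K} \mid r_{1:K}, \vartheta)$ can be sampled by uniformly drawing amongst the set of particles $\{x_{1:K}^{(m)}\}_{m=1}^{M}$, and the marginal likelihood $p(r_{1:K} \mid \vartheta)$ can be estimated by

$$\hat{p}(r_{1:K} \mid \vartheta) = \prod_{k=1}^{K} \frac{1}{M} \sum_{m=1}^{M} p(r_k \mid x_k^{(m)}),$$

where $x_k^{(m)}$ denotes the particles at $t_k$ before resampling. We can therefore write down the PMMH algorithm: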
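Step 1: Initialise the population genetic parameters $\vartheta$ and population mutant allele frequency trajectory $x_{1:K}$:
Step 1a: Draw $\vartheta^{1} \sim p(\vartheta)$.
Step 1b: Run a bootstrap particle filter with $\vartheta^{1}$ to yield $\hat{p}(r_{1:K} \mid \vartheta^{1})$ and $x_{1:K}^{1}$.
Repeat Step 2 until a sufficient number of samples of the population genetic parameters $\vartheta$ and population mutant allele frequency trajectory $x_{1:K}$ have been obtained:
Step 2: Update the population genetic parameters $\vartheta$ and population mutant allele frequency trajectory $x_{1:K}$:
Step 2a: Draw $\vartheta^{i} \sim q(\vartheta \mid \vartheta^{i-1})$.
Step 2b: Run a bootstrap particle filter with $\vartheta^{i}$ to yield $\hat{p}(r_{1:K} \mid \vartheta^{i})$ and $x_{1:K}^{i}$.
Step 2c: Accept $\vartheta^{i}$ and $x_{1:K}^{i}$ with probability

$$\min\left\{1, \frac{\hat{p}(r_{1:K} \mid \vartheta^{i})\, p(\vartheta^{i})\, q(\vartheta^{i-1} \mid \vartheta^{i})}{\hat{p}(r_{1:K} \mid \vartheta^{i-1})\, p(\vartheta^{i-1})\, q(\vartheta^{i} \mid \vartheta^{i-1})}\right\},$$

otherwise set $\vartheta^{i} = \vartheta^{i-1}$ and $x_{1:K}^{i} = x_{1:K}^{i-1}$.
Note that the superscript $i$ represents the iteration in the procedure described above. In the sequel, we adopt random walk proposals for each component of the population genetic parameters $\vartheta$ unless otherwise specified.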
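The PMMH loop can be sketched as follows, assuming a symmetric Gaussian random-walk proposal for each component of ϑ, so that the proposal densities q cancel in the acceptance ratio. Here `run_pf` stands for any routine that returns the log marginal-likelihood estimate and a sampled trajectory (such as a bootstrap particle filter). All names, and starting from a caller-supplied ϑ rather than a draw from the prior, are assumptions of this sketch rather than part of the method as specified.

```python
import math
import random


def pmmh(run_pf, log_prior, theta0, step, n_iter, rng):
    """Particle marginal Metropolis-Hastings with a Gaussian random walk on
    each component of theta.

    run_pf(theta, rng) -> (log p_hat(r_{1:K} | theta), trajectory x_{1:K})
    log_prior(theta)   -> log p(theta), or -math.inf outside the support
    step               -> random-walk standard deviation per component
    The proposal is symmetric, so q cancels in the acceptance ratio.
    Returns a list of (theta, trajectory) pairs, one per iteration.
    """
    theta = list(theta0)
    log_lik, traj = run_pf(theta, rng)
    log_p = log_prior(theta)
    samples = [(list(theta), traj)]
    for _ in range(n_iter - 1):
        proposal = [v + rng.gauss(0.0, sd) for v, sd in zip(theta, step)]
        prop_log_p = log_prior(proposal)
        if prop_log_p > -math.inf:  # skip the filter outside the prior support
            prop_log_lik, prop_traj = run_pf(proposal, rng)
            log_ratio = (prop_log_lik + prop_log_p) - (log_lik + log_p)
            # 1 - random() lies in (0, 1], so the logarithm is always defined
            if math.log(1.0 - rng.random()) < log_ratio:
                theta, log_lik, log_p, traj = proposal, prop_log_lik, prop_log_p, prop_traj
        samples.append((list(theta), traj))
    return samples


# Usage with a toy "filter" that returns an exact Gaussian log-likelihood
# and a uniform prior over [-1, 1].
rng = random.Random(42)
draws = pmmh(lambda th, r: (-0.5 * ((th[0] - 1.0) / 0.1) ** 2, [th[0]]),
             lambda th: 0.0 if -1.0 <= th[0] <= 1.0 else -math.inf,
             [0.0], [0.05], 2000, rng)
```

Keeping the stored log-likelihood estimate of the current state fixed, rather than re-estimating it at every iteration, is what makes this an exact-approximate PMMH sampler.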
Once enough samples of the population genetic parameters ϑ and population mutant allele frequency trajectory x1:K have been obtained, we can compute the posterior p(ϑ | r1:K) with nonparametric density estimation techniques (see Izenman, 1991, for a review) and obtain the maximum a posteriori probability (MAP) estimates for the population genetic parameters ϑ.
Alternatively, we can obtain the minimum mean square error (MMSE) estimates for the population genetic parameters ϑ as their posterior means. Similarly, we can take the posterior mean of the samples of the population mutant allele frequency trajectory as our estimate for the population mutant allele frequency trajectory x1:K.
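As an illustration of the two point estimates, the sketch below takes the MAP estimate as the mode of a Gaussian kernel density estimate evaluated on a grid, and the MMSE estimate as the sample mean. The normal-reference rule-of-thumb bandwidth, the grid search and the function names are our own assumptions; any of the density estimators reviewed by Izenman (1991) could be substituted.

```python
import math
import statistics


def kde_map(samples, grid_size=512):
    """MAP estimate: the mode of a Gaussian kernel density estimate of the
    posterior samples, located by evaluating the density on an even grid."""
    sd = statistics.stdev(samples)
    if sd == 0.0:
        return samples[0]  # all samples identical
    bw = 1.06 * sd * len(samples) ** (-1 / 5)  # normal-reference bandwidth
    lo, hi = min(samples), max(samples)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]

    def density(x):  # unnormalised; normalising constants do not move the mode
        return sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples)

    return max(grid, key=density)


def mmse(samples):
    """MMSE estimate: the posterior mean of the samples."""
    return statistics.fmean(samples)
```

Each component of ϑ (for example s− and s+) is summarised separately from its retained PMMH samples; the same `mmse` applied pointwise across sampled trajectories gives the trajectory estimate.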
Our method allows the selection coefficient s to be piecewise constant over time. For example, we let the selection coefficient s(t) = s− if t < τ and s(t) = s+ otherwise, where τ is the time of an event that might change selection, e.g., the time of plant or animal domestication. Our procedure can then be directly applied to estimate the selection coefficients s− and s+ for any prespecified time τ. The only modification required is to simulate the underlying mutant allele frequency trajectory of the population through the Wright-Fisher diffusion with the selection coefficient s− for t < τ and s+ for t ≥ τ. In this case, our method also provides a procedure to estimate and test the change in the selection coefficient at time τ, denoted by Δs = s+ − s−, by computing the posterior p(Δs | r1:K) from the samples of the selection coefficients s− and s+. This is a highly desirable feature in aDNA studies, as it enables us to test hypotheses about whether a shift in selection is linked to specific ecological, environmental and cultural drivers. We evaluate the performance of our procedure in this scenario in Section 3.
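Because each PMMH iteration yields a pair (s−, s+), posterior samples of Δs = s+ − s− are obtained directly by differencing. The hypothetical helper below computes them together with a sample-based 95% HPD interval (the shortest interval containing 95% of the samples, which assumes a unimodal posterior) and the posterior probabilities of a negative or positive change.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples
    (a sample-based HPD interval, suitable for a unimodal posterior)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return xs[i], xs[i + k - 1]


def selection_change_summary(s_minus, s_plus, mass=0.95):
    """Posterior summaries of delta_s = s_plus - s_minus from paired samples."""
    delta = [sp - sm for sm, sp in zip(s_minus, s_plus)]
    n = len(delta)
    return {
        "delta_s": delta,
        "hpd": hpd_interval(delta, mass),
        "prob_negative": sum(d < 0 for d in delta) / n,
        "prob_positive": sum(d > 0 for d in delta) / n,
    }
```

Differencing the samples pairwise, rather than differencing two point estimates, preserves the posterior correlation between s− and s+.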
3. Results
In this section, we evaluate the performance of our approach through extensive simulations and show its applicability using aDNA sequence data from the earlier studies of Ludwig et al. (2009), Pruvost et al. (2011) and Wutke et al. (2016), which together sequenced a total of 201 ancient horse samples at eight loci that determine horse coat colouration. In what follows, we base our inference of selection on the MAP estimate and assume only a single event that might change selection.
3.1. Performance evaluation
We run forward-in-time simulations of the Wright-Fisher model with selection (e.g., Durrett, 2008) and test our approach on simulated datasets of genotype likelihoods to evaluate its performance. We let the only event that might change selection occur in generation 350, so that the selection coefficient is s(k) = s− for k < 350 and s(k) = s+ otherwise. We uniformly draw the selection coefficients s− and s+ from [−0.05, 0.05] and set the dominance parameter to h = 0.5 (i.e., additive gene action). To mimic the demographic history of the horse population estimated by Der Sarkissian et al. (2015), we adopt a bottleneck demographic history, where the population size N(k) = 32000 for k < 200, N(k) = 8000 for 200 ≤ k < 400, and N(k) = 16000 for k ≥ 400. The initial population mutant allele frequency x1 is uniformly drawn from [0.1, 0.9]. For clarity, we write down the procedure for generating the dataset of genotype likelihoods:
Repeat Step 1 until xK ∈ (0, 1):
Step 1: Generate s−, s+ and x1:K.
Step 1a: Draw s−, s+ from a uniform distribution over [−0.05, 0.05] and x1 from a uniform distribution over [0.1, 0.9].
Step 1b: Simulate x1:K with s−, s+ and x1 through the Wright-Fisher model with selection.
Repeat Step 2 for k = 1, 2, …, K:
Step 2: Generate p(rn,k | g) for g = 0, 1, 2 and n = 1, 2, …, Nk:
Step 2a: Draw gn,k ∼ p(g | xk) in Eq. (3).
Step 2b: Draw (p(rn,k | g = 0), p(rn,k | g = 1), p(rn,k | g = 2)) from a Dirichlet distribution of order 3 whose concentration parameter for the true genotype gn,k differs from those for the other two genotypes.
We take the concentration parameter for the true genotype gn,k to be ϕψ and the other two to be (1 − ϕ)ψ/2, where ϕ and ψ are the parameters introduced to control the quality of the simulated dataset in terms of the missing rate and error rate under a common threshold for genotype calling (i.e., 10 times more likely, see Kim et al., 2011). Following this procedure, we simulate a total of 801 generations starting from generation 0 and draw a sample of 10 individuals every 40 generations, giving 210 sampled individuals in total (i.e., close to the size of the ancient horse dataset), in our simulation studies.
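Step 2b and the resulting data-quality metrics can be sketched as below. The Dirichlet draw uses normalised gamma variates; the 10-fold calling rule compares the most likely genotype with the runner-up; and, as an assumption of this sketch, the error rate is computed among called genotypes only. Function names are hypothetical, and the usage lines draw true genotypes uniformly for illustration instead of from p(g | xk).

```python
import random


def simulate_genotype_likelihoods(g_true, phi, psi, rng):
    """Draw [p(r | g=0), p(r | g=1), p(r | g=2)] from a Dirichlet distribution
    with concentration phi*psi for the true genotype and (1 - phi)*psi/2 for
    each of the other two (Dirichlet via normalised gamma draws)."""
    conc = [phi * psi if g == g_true else (1 - phi) * psi / 2 for g in range(3)]
    draws = [rng.gammavariate(a, 1.0) for a in conc]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for tiny concentrations
        return [1 / 3] * 3
    return [d / total for d in draws]


def call_genotype(gl, ratio=10.0):
    """Most likely genotype if it is at least `ratio` times more likely than
    the runner-up, else None (a missing call)."""
    order = sorted(range(3), key=lambda g: gl[g], reverse=True)
    best, second = order[0], order[1]
    return best if gl[best] >= ratio * gl[second] else None


def missing_and_error_rates(g_true_list, gl_list, ratio=10.0):
    """Missing rate: fraction of uncalled genotypes. Error rate (assumed
    definition): fraction of called genotypes that differ from the truth."""
    calls = [call_genotype(gl, ratio) for gl in gl_list]
    called = [(c, g) for c, g in zip(calls, g_true_list) if c is not None]
    missing = 1 - len(called) / len(calls)
    error = sum(c != g for c, g in called) / len(called) if called else float("nan")
    return missing, error


# Usage: one quality setting (phi = 0.85, psi = 1.0) with illustrative genotypes.
rng = random.Random(0)
truth = [rng.choice([0, 1, 2]) for _ in range(1000)]
gls = [simulate_genotype_likelihoods(g, 0.85, 1.0, rng) for g in truth]
missing_rate, error_rate = missing_and_error_rates(truth, gls)
```

Larger ϕ concentrates mass on the true genotype (fewer errors), while larger ψ makes the draws less dispersed (fewer ambiguous, hence missing, calls).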
We run a group of simulations to assess how our method performs for different data qualities, where we vary the parameters ϕ ∈ {0.75, 0.85, 0.95} and ψ ∈ {0.5, 1.0}, giving rise to six combinations of missing rate and error rate (see Table 1, where the mean and standard deviation of the missing and error rates are computed from 1000 replicates for each combination). For each scenario listed in Table 1, we consider nine combinations of the selection coefficients s− and s+ (see Table 2), and for each combination we repeat the procedure described above until we obtain 200 datasets of genotype likelihoods. In summary, we consider 200 replicates for each of the 54 combinations of data quality and selection scenario.
For each replicate, we choose a uniform prior over [−1, 1] for both selection coefficients s− and s+, and adopt the reference population size N0 = 16000. We run 10000 PMMH iterations with 1000 particles, where each generation is partitioned into five subintervals in the Euler-Maruyama scheme. We discard the first half of the PMMH samples as burn-in and thin the remainder by keeping every fifth value. Figure 2 shows our posteriors for the selection coefficients s− and s+ produced from a simulated dataset of genotype likelihoods (see Supplementary Information, Table S1), together with our estimate for the underlying frequency trajectory of the mutant allele in the population. In this example, our approach accurately infers temporally variable selection from genetic time series in genotype likelihood format: the true underlying frequency trajectory of the mutant allele fluctuates slightly around our estimate and lies entirely within our 95% highest posterior density (HPD) intervals.
3.1.1. Performance in estimating selection coefficients
To assess how our method performs for estimating selection coefficients, we show the boxplot results of our estimates across different data qualities in Figure 3, where the tips of the whiskers are the 2.5%-quantile and the 97.5%-quantile, and the boxes denote the first and third quartiles with the median in the middle. We summarise the bias and root mean square error (RMSE) of our estimates in Supplementary Information, Table S2. As illustrated in Figure 3, our estimates for both selection coefficients are nearly median-unbiased across different data qualities, although a slightly larger bias can be found when the data quality is poor, as in scenario A (about 31.0% missing rate and 13.9% error rate). The bias vanishes completely as the data quality improves, as in scenario F.
To evaluate how our method performs for different selection coefficients, especially weak selection, we run an additional group of simulations assuming additive gene action, where we fix the parameters ϕ = 0.85 and ψ = 1. We assume no event that changed selection and vary the selection coefficient s ∈ [−0.05, 0.05], which is divided into nine subintervals [−0.05, −0.01), [−0.01, −0.005), [−0.005, −0.001), [−0.001, 0), {0}, (0, 0.001], (0.001, 0.005], (0.005, 0.01] and (0.01, 0.05]. For each subinterval, e.g., [−0.01, −0.005), we draw the selection coefficient s uniformly from that subinterval and generate a dataset of genotype likelihoods with it. We repeat this procedure until we obtain 200 simulated datasets for each subinterval. We show the boxplot results of our estimates across different selection coefficients in Figure 4 and summarise the bias and RMSE of our estimates in Supplementary Information, Table S3.
We observe from Figure 4 that our estimates are nearly median-unbiased for weak selection, but the bias grows as the strength of selection (i.e., |s|) increases. This bias could be caused by ascertainment arising from the procedure that we use to generate simulated datasets. In our simulation studies, we keep only the simulated datasets in which the mutant allele has been neither lost nor fixed in the underlying population. The Wright-Fisher model in our data generation process is therefore effectively conditioned on no loss or fixation having occurred, which does not match the unconditioned model in our approach for estimating the selection coefficient. Such a mismatch could lead to underestimation of the strength of selection: the retained datasets are a biased sample of underlying population mutant allele frequency trajectories that reach loss or fixation more slowly, so the estimated strength of selection is more likely to be smaller than its true value. This effect is most pronounced for strong selection, under which loss or fixation is more likely (see Supplementary Information, Figure S1 for histograms of the loss and fixation probabilities of mutations in each subinterval, where the loss and fixation probabilities for each simulated dataset are calculated from 1000 replicates). We find that the simulated datasets with large loss probabilities are all generated with the selection coefficient s ∈ [−0.05, −0.01), whereas those with large fixation probabilities are all generated with s ∈ (0.01, 0.05], which correspond to the subintervals with noticeable bias in Figure 4. The bias resulting from this mismatch can be fully eliminated by conditioning the Wright-Fisher diffusion to survive (He et al., 2020b).
3.1.2. Performance in testing selection changes
To evaluate how our approach performs for testing selection changes, we produce the receiver operating characteristic (ROC) curves across different data qualities in Figure 5. The true-positive rate (TPR) and false-positive rate (FPR) are computed for each value of the posterior probability for the selection change used as a threshold to classify a locus as experiencing a shift in selection, and the ROC curve is produced by plotting the TPR against the FPR. We compute the area under the ROC curve (AUC) to summarise the performance. From Figure 5, we see that even though data qualities vary across scenarios, all curves are strongly concave and close to the upper left corner of the ROC space, with AUC values ranging from 0.89 to 0.94. This suggests that our method performs well in testing selection changes even for datasets with up to approximately 31.0% missing rate and 13.9% error rate (see scenario A). Although these ROC curves almost overlap, improved data quality still yields better performance (see, e.g., scenario F).
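For reference, an ROC curve and its AUC can be computed from one score per replicate (here, the posterior probability used as the classification statistic) and a 0/1 label indicating whether selection truly changed. This is a generic sketch with hypothetical function names: it sweeps every distinct score as a threshold and integrates with the trapezoidal rule.

```python
def roc_curve(scores, labels):
    """ROC points (FPR, TPR), using every distinct score as a threshold
    (score >= threshold -> classified as 'selection changed')."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative replicate")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Usage: perfectly separated scores give an AUC of 1.
print(auc(roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])))  # 1.0
```

An AUC of 0.5 corresponds to a classifier no better than chance, which is the natural baseline for the values reported above.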
To assess how our approach performs for different selection changes, especially small changes, we run an additional group of simulations assuming additive gene action, where we fix the parameters ϕ = 0.85 and ψ = 1. We vary the selection change Δs ∈ [−0.05, 0.05], which is divided into nine subintervals [−0.05, −0.01), [−0.01, −0.005), [−0.005, −0.001), [−0.001, 0), {0}, (0, 0.001], (0.001, 0.005], (0.005, 0.01] and (0.01, 0.05]. For each subinterval, e.g., [−0.01, −0.005), we uniformly draw the selection change Δs from [−0.01, −0.005) and the selection coefficient s− from [max{−0.05, −0.05 − Δs}, min{0.05, 0.05 − Δs}], and set the selection coefficient s+ = s− + Δs. We generate a dataset of genotype likelihoods with the selection coefficients s− and s+, and repeat this procedure until we obtain 200 simulated datasets for each subinterval. We then run the same ROC analysis; the resulting ROC curves for all subintervals are shown in Figure 6 with their AUC values.
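The bounds on s− are exactly what keeps s+ = s− + Δs inside [−0.05, 0.05]. A minimal sketch of the draw follows, with a hypothetical function name; note that `random.uniform` samples a closed interval, so the open or half-open ends of the subintervals are not enforced exactly.

```python
import random


def draw_selection_pair(delta_lo, delta_hi, rng, bound=0.05):
    """Draw delta_s uniformly from [delta_lo, delta_hi], then s_minus uniformly
    from the range that keeps both s_minus and s_plus = s_minus + delta_s
    inside [-bound, bound]. Returns (s_minus, s_plus)."""
    delta = rng.uniform(delta_lo, delta_hi)
    lo = max(-bound, -bound - delta)
    hi = min(bound, bound - delta)
    s_minus = rng.uniform(lo, hi)
    return s_minus, s_minus + delta


# Usage: one draw from the subinterval [-0.01, -0.005).
rng = random.Random(1)
s_minus, s_plus = draw_selection_pair(-0.01, -0.005, rng)
```

For Δs > 0 the upper bound 0.05 − Δs is binding, and for Δs < 0 the lower bound −0.05 − Δs is binding, so the admissible range for s− shrinks as |Δs| grows.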
By examining the ROC curves in Figure 6, we find that, as expected, the performance improves considerably as the magnitude of the change in the selection coefficient (i.e., |Δs|) increases. Even for small changes |Δs| < 0.001, the AUC value is still larger than 0.70, and for large changes |Δs| > 0.01, the AUC value reaches 0.98. These results illustrate that our method has strong power to detect selection changes, even when the change is small.
3.2. Horse coat colouration
We employ our approach to infer selection acting on the ASIP and MC1R genes associated with base coat colours (black and chestnut) and the KIT13 and TRPM1 genes associated with white coat patterns (tobiano and leopard complex), based on aDNA sequence data from the earlier studies of Ludwig et al. (2009), Pruvost et al. (2011) and Wutke et al. (2016); these genes have been linked to ecological, environmental and cultural shifts (Ludwig et al., 2009, 2015; Wutke et al., 2016). Given that only called genotypes are available in Wutke et al. (2016), we use the following procedure to generate genotype likelihoods for each gene in the same format as those produced by GATK (McKenna et al., 2010): if the genotype is called, we set the genotype likelihood of the called genotype to 1 and those of the other two to 0; otherwise, all possible (ordered) genotypes are assigned equal genotype likelihoods that are normalised to sum to 1. See genotype likelihoods for each gene in Supplementary Information, Table S4.
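The conversion from called genotypes to genotype likelihoods is simple enough to state as code. This sketch uses a hypothetical function name and assumes that, for a missing call, the three genotypes (0, 1 or 2 mutant alleles, in that order) each receive likelihood 1/3.

```python
def called_to_genotype_likelihoods(genotype):
    """Convert a called genotype (0, 1 or 2 copies of the mutant allele, or
    None for a missing call) into [p(r | g=0), p(r | g=1), p(r | g=2)]."""
    if genotype is None:
        return [1 / 3, 1 / 3, 1 / 3]  # uninformative: equal and normalised
    return [1.0 if g == genotype else 0.0 for g in range(3)]


# Usage: a called heterozygote and a missing call.
print(called_to_genotype_likelihoods(1))  # [0.0, 1.0, 0.0]
print(called_to_genotype_likelihoods(None))
```

With these inputs, a called individual contributes p(g | xk) for its called genotype, and a missing individual contributes a constant factor, so the two-layer model reduces to the usual called-genotype likelihood with missing data.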
Since our approach assumes that the mutation arose before the initial sampling time point, for each gene we exclude the samples drawn before the sampling time point at which the mutant allele was first observed in the sample. We adopt the horse demographic history estimated by Der Sarkissian et al. (2015) (see Supplementary Information, Figure S2) with an average horse generation time of eight years, and choose the reference population size N0 = 16000 (i.e., the most recent size of the horse population). We take the dominance parameter h to be the value reported in Wutke et al. (2016) for each gene. In our PMMH procedure, we run 20000 iterations with a burn-in of 10000 iterations, and the other settings are the same as in Section 3.1, including the Euler-Maruyama discretisation. The estimates of the selection coefficient and its change for each gene with their 95% HPD intervals are summarised in Supplementary Information, Table S5.
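Sample dates must be converted to the diffusion time scale before running the filter. Assuming sample ages in years before present, the eight-year generation time and N0 = 16000 stated above, the sketch below (hypothetical names) converts the gap between two samples into generations, into diffusion time units of 2N0 generations, and into the number of Euler-Maruyama steps at five subintervals per generation.

```python
def elapsed_diffusion_time(age_older, age_younger, generation_years=8.0, n0=16000):
    """Elapsed time between two samples dated age_older and age_younger years
    before present, in generations and in diffusion units of 2*N0 generations."""
    generations = (age_older - age_younger) / generation_years
    return generations, generations / (2 * n0)


def euler_maruyama_steps(generations, substeps_per_generation=5):
    """Number of Euler-Maruyama steps when each generation is split into
    substeps_per_generation subintervals."""
    return round(generations * substeps_per_generation)


# Usage: two samples 8000 years apart.
gens, t_units = elapsed_diffusion_time(13000, 5000)
print(gens, t_units, euler_maruyama_steps(gens))  # 1000.0 0.03125 5000
```

For instance, two samples 8000 years apart are 1000 generations apart, i.e. 1000/32000 = 0.03125 diffusion time units, discretised into 5000 Euler-Maruyama steps.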
3.2.1. Selection of horse base coat colours
Base coat colours in horses are primarily determined by ASIP and MC1R, which direct the type of pigment produced, black eumelanin (ASIP) or red pheomelanin (MC1R) (Corbin et al., 2020). More specifically, ASIP on chromosome 22 is associated with the recessive black coat, and MC1R on chromosome 3 is associated with the recessive chestnut coat. Ludwig et al. (2009) found a rapid increase in base coat colour variation during horse domestication (starting from approximately 3500 BC) and provided strong evidence of positive selection acting on ASIP and MC1R. Fang et al. (2009) suggested that this increase was directly caused by human preferences and demands. We apply our approach to test the hypothesis that selection acting on ASIP and MC1R changed when horses became domesticated, and to estimate the corresponding selection intensities. The resulting posteriors for ASIP and MC1R are shown in Figures 7 and 8, respectively.
Our estimate of the selection coefficient for the ASIP mutation is 0.0018 with 95% HPD interval [−0.0012, 0.0066] before domestication and 0.0003 with 95% HPD interval [−0.0022, 0.0031] after horses were domesticated. Although the 95% HPD interval for the selection coefficient s− contains 0, there is still some evidence that the ASIP mutation was favoured by selection before domestication, since the posterior probability for positive selection is 0.904. The posterior for the selection coefficient s+ is approximately symmetric about 0, which indicates that the ASIP mutation was effectively neutral after horse domestication started. Our estimate of the change in the selection acting on the ASIP mutation when horses became domesticated is −0.0021 with 95% HPD interval [−0.0071, 0.0031], and the posterior probability for such a negative change is 0.779. Our estimate of the underlying ASIP mutation frequency trajectory shows that the ASIP mutation frequency rose substantially in the pre-domestication period and then remained approximately constant in the post-domestication period.
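The HPD intervals and posterior probabilities reported here can be computed directly from post-burn-in PMMH draws. The sketch below assumes how such summaries are typically obtained; it is not the authors' code. It takes the shortest interval containing 95% of the sorted draws, which is appropriate for unimodal posteriors. It estimates the posterior probability for positive selection as the fraction of positive draws, and it forms the change in selection draw by draw as s+ − s−. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def hpd_interval(draws: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the draws (unimodal posterior assumed)."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, int(round(mass * n)))  # number of draws inside the interval
    if k >= n:
        return xs[0], xs[-1]
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def prob_positive(draws: Sequence[float]) -> float:
    """Posterior probability that the parameter is positive (fraction of draws > 0)."""
    return sum(1 for d in draws if d > 0) / len(draws)

def change_draws(s_minus: Sequence[float], s_plus: Sequence[float]) -> List[float]:
    """Draw-by-draw change in selection, s+ - s-, from paired posterior draws."""
    return [b - a for a, b in zip(s_minus, s_plus)]
```

Because the change is computed from paired draws, its HPD interval and posterior probability account for any posterior correlation between s− and s+.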
Our estimate of the selection coefficient for the MC1R mutation is −0.0300 with 95% HPD interval [−0.0928, 0.0997] before horses became domesticated and 0.0116 with 95% HPD interval [0.0079, 0.0174] after domestication started. Our estimates suggest that the MC1R mutation was effectively neutral or selectively deleterious in the pre-domestication period (with posterior probability for negative selection being 0.670) but became positively selected after horse domestication started (with posterior probability for positive selection being 1.000). Our estimate of the change in the selection acting on the MC1R mutation from the pre- to the post-domestication period is 0.0439 with 95% HPD interval [−0.0891, 0.1058], and the posterior probability for such a positive change is 0.748. Our estimate of the underlying MC1R mutation frequency trajectory shows a slow decline before domestication (even though the evidence of negative selection before domestication is weak), followed by a significant increase after horse domestication started.
3.2.2. Selection of horse white coat patterns
Tobiano is a white spotting pattern in horses characterised by patches of white that typically cross the topline somewhere between the ears and tail. It is inherited as an autosomal dominant trait, which Brooks et al. (2007) reported to be associated with a locus in intron 13 of the KIT gene on chromosome 3. Wutke et al. (2016) observed that spotted coats increased remarkably in early domestic horses, but that medieval horses carried significantly fewer alleles for these traits. This could have resulted from a shift in human preferences and demands. We apply our method to test their hypothesis that selection acting on KIT13 changed when the medieval period began (around AD 400), and to estimate the corresponding selection intensities. We show the resulting posteriors for KIT13 in Figure 9.
Our estimate of the selection coefficient for the KIT13 mutation is 0.0039 with 95% HPD interval [−0.0010, 0.0103] before the medieval period. This suggests that the KIT13 mutation was positively selected before the Middle Ages (the posterior probability for positive selection is 0.935). Our estimate of the selection coefficient for the KIT13 mutation during the medieval period is −0.0284 with 95% HPD interval [−0.0627, 0.0015]. This indicates that the KIT13 mutation became selectively deleterious during the Middle Ages (the posterior probability for negative selection is 0.969). Our estimate of the change in the selection acting on the KIT13 mutation when the Middle Ages started is −0.0326 with 95% HPD interval [−0.0700, 0.0008], and the posterior probability for such a negative change is 0.969. Our estimate of the underlying KIT13 mutation frequency trajectory shows that the KIT13 mutation frequency increased gradually after horses were domesticated and then decreased markedly during the Middle Ages.
Leopard complex is a group of white spotting patterns in horses characterised by a variable amount of white in the coat with or without pigmented leopard spots. It is inherited as an incompletely dominant trait associated with the TRPM1 gene residing on chromosome 1 (Terry et al., 2004). The earliest genetic evidence of the leopard complex coat dates back to the Pleistocene (Ludwig et al., 2015). Ludwig et al. (2015) found shifts in the selection pressure on the leopard complex coat in domestic horses. However, they did not explore whether TRPM1 had undergone a shift in selection from the pre- to the post-domestication period. We employ our approach to test the hypothesis that selection acting on TRPM1 changed when horses became domesticated, and to estimate the corresponding selection intensities. The resulting posteriors for TRPM1 are illustrated in Figure 10.
Our estimate of the selection coefficient for the TRPM1 mutation is −0.0005 with 95% HPD interval [−0.0042, 0.0029] before horses were domesticated and −0.0078 with 95% HPD interval [−0.0139, −0.0005] after domestication started. Our estimates provide little evidence of negative selection in the pre-domestication period (with posterior probability for negative selection being 0.617) but strong evidence of negative selection in the post-domestication period (with posterior probability for negative selection being 0.980). Our estimate of the change in the selection acting on the TRPM1 mutation when horses became domesticated is −0.0066 with 95% HPD interval [−0.0142, 0.0025]. Although this 95% HPD interval contains 0, the posterior probability for a negative change is 0.942. This still provides considerable support for a negative change in selection when horses became domesticated. Our estimate of the underlying TRPM1 mutation frequency trajectory shows a slow decrease during the pre-domestication period, followed by a significant decline after horse domestication started.
4. Discussion
In this work, we introduced a novel Bayesian approach for estimating temporally changing selection intensity from aDNA samples. To our knowledge, most earlier methods ignored sample uncertainties resulting from the damage and fragmentation of aDNA molecules, even though these are defining characteristics of aDNA. Our approach circumvents this problem by basing the inference of selection on genotype likelihoods rather than called genotypes, which allows genotype uncertainties to be incorporated. We introduced a novel two-layer HMM framework with three components. The top hidden layer models the underlying frequency trajectory of the mutant allele in the population, the intermediate hidden layer models the unobserved genotype of each individual in the sample, and the bottom observed layer represents the data on aDNA sequences. By using the PMMH algorithm, in which the marginal likelihood is approximated through particle filtering, our method also allows us to reconstruct the underlying mutant allele frequency trajectory of the population. Furthermore, our procedure provides the flexibility to model time-varying demographic histories.
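The sketch below makes the two-layer HMM more concrete. It computes the emission probability by summing each individual's genotype likelihoods against Hardy-Weinberg genotype probabilities, given the mutant allele frequency. It then propagates particles between sampling times with an Euler-Maruyama scheme and returns a bootstrap particle filter estimate of the log marginal likelihood. This is a simplified, hypothetical sketch, not the authors' implementation. It assumes a constant population size `n_e` and fitnesses 1, 1 + hs and 1 + s for the three genotypes, with the corresponding diffusion drift written in generation units. It also assumes a uniform prior on the initial frequency and uses multinomial resampling. All function names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence

def genotype_probs(x: float) -> List[float]:
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 mutant alleles."""
    return [(1.0 - x) ** 2, 2.0 * x * (1.0 - x), x ** 2]

def emission(x: float, gls: Sequence[Sequence[float]]) -> float:
    """p(reads at one sampling time | x): each individual's hidden genotype
    (intermediate layer) is summed out using its genotype likelihoods."""
    probs = genotype_probs(x)
    p = 1.0
    for gl in gls:
        p *= sum(l * q for l, q in zip(gl, probs))
    return p

def propagate(x: float, s: float, h: float, n_e: float, n_gen: int,
              rng: random.Random, substeps: int = 10) -> float:
    """Euler-Maruyama for an assumed Wright-Fisher diffusion in generations:
    dx = s x (1 - x) [h + (1 - 2h) x] dt + sqrt(x (1 - x) / (2 n_e)) dW."""
    dt = 1.0 / substeps
    for _ in range(n_gen * substeps):
        drift = s * x * (1.0 - x) * (h + (1.0 - 2.0 * h) * x)
        sd = math.sqrt(max(x * (1.0 - x), 0.0) * dt / (2.0 * n_e))
        # Clip to [0, 1]; 0 and 1 are absorbing because drift and sd vanish there.
        x = min(1.0, max(0.0, x + drift * dt + sd * rng.gauss(0.0, 1.0)))
    return x

def pf_loglik(gl_by_time: Sequence[Sequence[Sequence[float]]],
              gens_between: Sequence[int], s_by_interval: Sequence[float],
              h: float, n_e: float, n_particles: int = 500,
              seed: Optional[int] = None) -> float:
    """Bootstrap particle filter estimate of log p(data | selection).

    gl_by_time[k]: genotype likelihoods of the individuals sampled at time k.
    gens_between[k], s_by_interval[k]: number of generations and the (piecewise
    constant) selection coefficient between sampling times k and k + 1.
    """
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]  # uniform prior on x (assumed)
    loglik = 0.0
    for k, gls in enumerate(gl_by_time):
        if k > 0:
            particles = [propagate(x, s_by_interval[k - 1], h, n_e,
                                   gens_between[k - 1], rng) for x in particles]
        weights = [emission(x, gls) for x in particles]
        mean_w = sum(weights) / n_particles
        if mean_w == 0.0:
            return -math.inf
        loglik += math.log(mean_w)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik

# Hypothetical data: about 25% mutant at time 0, about 80% mutant 50 generations later.
gl_t0 = [[1.0, 0.0, 0.0]] * 10 + [[0.0, 1.0, 0.0]] * 10
gl_t1 = [[0.0, 0.0, 1.0]] * 13 + [[0.0, 1.0, 0.0]] * 6 + [[1.0, 0.0, 0.0]]
ll_pos = pf_loglik([gl_t0, gl_t1], [50], [0.1], h=0.5, n_e=1000, n_particles=300, seed=1)
ll_neg = pf_loglik([gl_t0, gl_t1], [50], [-0.1], h=0.5, n_e=1000, n_particles=300, seed=1)
```

With many individuals per sampling time, the product in `emission` can underflow. A more robust version would accumulate log weights and normalise them with a log-sum-exp step.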
The performance of our method was evaluated through extensive simulations. These showed that our procedure could produce accurate inference of selection from aDNA data across different evolutionary scenarios, even when samples were sparsely distributed in time, small in size and of poor quality. Note that in our simulation studies we only tested our method in the case of a bottleneck demographic history and additive gene action. In principle, however, we expect the conclusions drawn here to hold for other demographic histories and gene actions. Demographic histories have been shown to have little effect on the inference of selection from time series genetic data (Jewett et al., 2016). Additional simulation studies for the cases of the mutant allele being recessive (h = 0) and dominant (h = 1) can be found in Supplementary Information, Figures S3 and S4, respectively, as well as in Table S6.
We illustrated the utility of our approach with an application to ancient horse samples from the previous studies of Ludwig et al. (2009), Pruvost et al. (2011) and Wutke et al. (2016), which were genotyped at loci associated with horse coat colouration (ASIP, MC1R, KIT13 and TRPM1). Our findings are consistent with earlier studies showing that coat colour variation in the horse is a domestic trait that was subject to early selection by humans (Hunter, 2018), e.g., ASIP, MC1R and TRPM1. They are also consistent with evidence that human preferences have changed greatly over time and across cultures (Wutke et al., 2016), e.g., KIT13.
Our results for the base coat colour are consistent with previous studies suggesting that the shift in horse coat colour variation in the early stage of domestication could have been caused by relaxed selection for camouflage alleles (Hunter, 2018). More specifically, the ASIP mutation was positively selected in the pre-domestication period, but the MC1R mutation was not. According to Sandoval-Castellanos et al. (2017), forest cover grew as a result of global warming during the Late Pleistocene, which pushed horses into forests where predators were abundant. Dark-coloured coats could have helped horses avoid predators through better camouflage, thereby improving their chances of survival. After horses were domesticated, the ASIP mutation was no longer selectively advantageous, but the MC1R mutation became favoured by selection. The shift in horse coat colour preference from dark to light could probably be explained by two factors. First, domesticated horses no longer needed camouflage to protect against predation. Second, light-coloured coats could facilitate horse husbandry, since it was easier to keep track of horses that were not camouflaged (Fang et al., 2009).
Our results for the tobiano coat pattern indicate that the KIT13 mutation was favoured by selection from domestication until the Middle Ages and then became negatively selected, which confirms the findings of Wutke et al. (2016). Such a negative change in selection on the tobiano coat could have resulted from pleiotropic disadvantages, a lower religious prestige, a reduced need to separate domestic horses from their wild counterparts, or novel developments in weaponry during the medieval period (see Wutke et al., 2016, and references therein).
Our results for the leopard complex coat pattern show that the TRPM1 mutation was negatively selected from the Late Pleistocene onwards. Our evidence of negative selection acting on the TRPM1 mutation in the pre-domestication period is weak, but we still observe a slow drop in the TRPM1 mutation frequency over time. In horses, the TRPM1 mutation is the most common cause of congenital stationary night blindness (CSNB) (Bellone et al., 2013). CSNB could reduce the chance of survival in the wild, since vision is key for communication, localisation, orientation, avoiding predators and looking for food (Murphy et al., 2009). The weak negative selection before domestication could be explained in part by the fact that only horses homozygous for the leopard complex allele are affected by CSNB. However, the intensity of negative selection increased remarkably when horses were domesticated. In the post-domestication period, horses were harnessed mainly for power and transportation. They were used to pull wheeled vehicles, chariots, carts and wagons in the early stage of domestication, and were later used in war, in hunting and as a means of transport, all of which rely strongly on the ability to see. Moreover, night-blind horses are nervous and timid in human care, and difficult to handle at dusk and in darkness (Rebhun et al., 1984).
Compared to existing methods (e.g., Bollback et al., 2008; Malaspinas et al., 2012; Steinrücken et al., 2014; Schraiber et al., 2016; Ferrer-Admetlla et al., 2016; He et al., 2020a,b), our Bayesian procedure allows the selection coefficient to vary in time (as a piecewise constant function), although the event that might change selection must be prespecified. This flexibility is nonetheless important in aDNA studies, as adaptation in natural populations typically involves adaptation to ecological, environmental and cultural shifts. We run our procedure on the ancient horse samples presented in Supplementary Information, Table S4 with the same settings as in Section 3.2, except that the selection coefficient is fixed through time. See Supplementary Information, Figure S5 for the resulting posteriors and Supplementary Information, Table S7 for the estimates of the selection coefficients with their 95% HPD intervals. We find, for example, that the KIT13 mutation was effectively neutral during the post-domestication period. This contradicts the archaeological evidence and historical records showing that spotted horses were subject to early selection by humans but that this preference changed during the medieval period (see Wutke et al., 2016, and references therein). Comparing these results with those presented in Figures 7–10 highlights the necessity of modelling temporally variable selection in aDNA studies. Our approach lends itself to being extended to the analysis of multiple prespecified events that might change selection (see Section 2.2). To maintain computational efficiency in that setting, a feasible solution is to adopt an adaptive strategy that automatically tunes the proposal for the selection coefficients during a run (see Luengo et al., 2020, for a review). A potential direction for future research is the joint inference of selection and the strength and timing of its changes from aDNA data (Shim et al., 2016; Mathieson, 2020).
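Extending the model to several prespecified events amounts to evaluating a piecewise-constant selection coefficient. Below is a minimal sketch under assumed conventions: times are measured in generations and increase forward, and the new coefficient takes effect exactly at each event time. The function name and the example event times and coefficients are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

def selection_at(t: float, event_times: Sequence[float], coeffs: Sequence[float]) -> float:
    """Piecewise-constant selection coefficient at time t.

    event_times: sorted times (generations, forward in time) of prespecified
    events that may change selection; coeffs: one coefficient per epoch, so
    len(coeffs) == len(event_times) + 1. A new coefficient applies from its
    event time onwards.
    """
    if len(coeffs) != len(event_times) + 1:
        raise ValueError("need exactly one selection coefficient per epoch")
    return coeffs[bisect_right(event_times, t)]

# Hypothetical example: two events at generations 100 and 300.
events = [100, 300]
s_epochs = [0.01, -0.02, 0.0]
trajectory_s = [selection_at(t, events, s_epochs) for t in (50, 100, 200, 300, 400)]
```

Each additional event adds one coefficient to the parameter vector sampled by PMMH, which is why proposal tuning becomes more important as the number of events grows.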
In aDNA studies, samples with missing genotypes are usually filtered out, and the remaining samples are grouped into a small number of sampling time points. For example, ancient horse samples were grouped into six sampling time points in Ludwig et al. (2009) and nine sampling time points in Wutke et al. (2016). He et al. (2020b) showed that grouping can substantially bias the inference of selection from time series data. Our method provides an alternative that avoids the issues caused by sample filtering and grouping (see Supplementary Information, File S3 for additional simulation studies, where we compare our results produced with genotype likelihoods to those based on called genotypes). These simulation studies show that the commonly used procedure of processing aDNA data for the inference of selection can significantly alter the result, in particular when data quality is poor. They also suggest that the approach based on genotype likelihoods could be a more promising alternative for future aDNA studies.
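For comparison, the conventional filter-and-group pipeline criticised above can be sketched as follows. This is a hypothetical illustration, not a procedure taken from the cited studies. The calling threshold `min_prob`, the half-open time bins and the function names are assumptions. The sketch shows that a sample with ambiguous genotype likelihoods is discarded entirely, and that samples within a bin lose their individual sampling times. The likelihood-based approach keeps both pieces of information.

```python
from typing import List, Optional, Sequence, Tuple

def call_genotype(gl: Sequence[float], min_prob: float = 0.9) -> Optional[int]:
    """Hard-call the most likely genotype (0, 1 or 2 mutant alleles); return
    None if its normalised likelihood is below the (hypothetical) threshold."""
    total = sum(gl)
    best = max(range(3), key=lambda g: gl[g])
    return best if gl[best] / total >= min_prob else None

def filter_and_group(samples: Sequence[Tuple[float, Sequence[float]]],
                     bin_edges: Sequence[float]) -> List[Tuple[int, int]]:
    """Drop uncallable samples, then pool the rest into half-open time bins
    [bin_edges[b], bin_edges[b + 1]) and count (mutant alleles, total alleles)."""
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]
    for t, gl in samples:
        g = call_genotype(gl)
        if g is None:
            continue  # the whole sample, and its partial information, is lost
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= t < bin_edges[b + 1]:
                counts[b][0] += g
                counts[b][1] += 2
                break
    return [(m, n) for m, n in counts]

# Hypothetical samples: (sampling time, genotype likelihoods).
samples = [(5, [0.0, 0.0, 1.0]), (7, [0.5, 0.5, 0.0]),
           (15, [0.0, 1.0, 0.0]), (18, [1.0, 0.0, 0.0])]
grouped = filter_and_group(samples, bin_edges=[0, 10, 20])
```

In this example the second sample clearly carries at most one mutant allele, yet it is dropped because neither genotype reaches the calling threshold.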
One fundamental limitation of our method for the inference of selection from aDNA data is that it assumes that all samples were drawn after the mutant allele arose. However, allele age is not always available. As a result, in our analysis we had to remove the samples drawn before the time at which the mutant allele was first found in the sample, which might alter the result of the inference of selection. To address this challenge, we can extend our approach to jointly estimate the allele age as in Malaspinas et al. (2012), Schraiber et al. (2016) and He et al. (2020b). The foreseeable challenge is how to resolve the particle degeneracy and impoverishment issues in our PMMH-based procedure, which arise because low-frequency mutant alleles at the early stage face a higher probability of being lost. Furthermore, our approach assumes that the gene of interest evolves independently of other genes. In practice, this assumption can easily be violated when there are interactions between genes such as epistasis and linkage; e.g., MC1R is epistatic to ASIP (Rieder et al., 2001), and KIT13 is tightly linked to KIT16 (Dumont & Payseur, 2008). Such interactions might bias our inference of selection. To extend our procedure to the scenario of multiple genes with epistasis and/or linkage, we need a good approximation of the Wright-Fisher model that characterises multiple genes evolving through time subject to selection with epistasis and linkage. Finding one becomes challenging as the number of genes increases. We will discuss how to extend our approach to the scenario of two genes with epistasis and/or linkage in our upcoming work, and the extension to the scenario of multiple interacting genes (≥ 3) will be the topic of future investigation.
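The sample-exclusion step and the conversion of sampling times into generations can be sketched as below. This is a hypothetical illustration, not the authors' code. It assumes sampling times in years before present and called genotypes coded as the number of mutant alleles, with `None` for missing calls. It uses the eight-year generation time adopted in our analysis, and the sample values in the example are invented.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, Optional[int]]  # (years before present, called genotype or None)

def drop_before_first_mutant(samples: Sequence[Sample]) -> List[Sample]:
    """Keep samples drawn at or after the earliest time a mutant carrier was observed.

    Times are in years before present (larger = older), so 'at or after' means
    an age no greater than that of the oldest carrier."""
    carrier_ages = [age for age, g in samples if g in (1, 2)]
    if not carrier_ages:
        return []
    first = max(carrier_ages)
    return [(age, g) for age, g in samples if age <= first]

def to_generations(samples: Sequence[Sample],
                   generation_years: float = 8.0) -> List[Tuple[float, Optional[int]]]:
    """Re-express sample ages as generations elapsed since the oldest retained sample."""
    oldest = max(age for age, _ in samples)
    return [((oldest - age) / generation_years, g) for age, g in samples]

# Hypothetical samples: (years BP, called genotype or None).
raw = [(14000, 0), (10000, None), (6000, 1), (6000, 0), (4400, 2), (1600, 0)]
kept = drop_before_first_mutant(raw)
timeline = to_generations(kept)
```

The two samples older than the first observed carrier are discarded, even though they may carry information about when the mutant allele arose. This is precisely the loss that joint estimation of allele age would avoid.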
Data Accessibility Statement
The authors state that all data necessary for confirming the conclusions of the present work are represented fully within the article. Source code implementing the approach described in this work is available at https://github.com/zhangyi-he/WFM-1L-DiffusApprox-PMMH/.
Author Contributions
Z.H. designed the project and developed the method; Z.H., X.D. and W.L. implemented the method; X.D. and W.L. analysed the data under the supervision of Z.H., M.B. and F.Y.; Z.H. and X.D. wrote the manuscript; W.L., M.B. and F.Y. reviewed the manuscript.
Acknowledgements
This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol - http://www.bristol.ac.uk/acrc/.