The timing of human adaptation from Neanderthal introgression

Admixture has the potential to facilitate adaptation by providing alleles that are immediately adaptive in a new environment or by simply increasing the long term reservoir of genetic diversity for future adaptation. A growing number of cases of adaptive introgression are being identified in species across the tree of life, however the timing of selection, and therefore the importance of the different evolutionary roles of admixture, is typically unknown. Here, we investigate the spatio-temporal history of selection favoring Neanderthal-introgressed alleles in modern human populations. Using both ancient and present-day samples of modern humans, we integrate the known demographic history of populations, namely population divergence and migration, with tests for selection. We model how a sweep placed along different branches of an admixture graph acts to modify the variance and covariance in neutral allele frequencies among populations at linked loci. Using a method based on this model of allele frequencies, we study previously identified cases of Neanderthal adaptive introgression. From these, we identify cases in which Neanderthal introgressed alleles were quickly beneficial and other cases in which they persisted at low frequency for some time. For some of the alleles that persisted at low frequency, we show that selection likely independently favored them later on in geographically separated populations. Our work highlights how admixture with ancient hominins has contributed to modern human adaptation, contextualizes observed levels of Neanderthal ancestry in present-day and ancient samples, and identifies cases of temporally varying selection that are sometimes shared across large geographic distances.

of 10 −8 ) and collapsed those that were overlapping or directly adjacent. This resulted in 36 distinct regions, 142 listed in Table S3, out of 50 45kb windows with signals of adaptive Neanderthal introgression in Europeans. 143 Neanderthal ancestry over time As a first step to understand the temporal history of selection, we 144 investigated levels of Neanderthal ancestry in ancient and present day populations within each of the pre-145 viously specified regions. These levels were determined from ancestry informative sites, which we identified 146 as bi-allelic sites in modern humans that have one allele at greater than 95% frequency in Neanderthals 147 (combined Vindija and Altai, two high coverage Neanderthals) and less than 5% frequency in Yoruba, and 148 vice versa. In Figure 2, we show average Neanderthal allele frequencies across all ancestry informative sites 149 with a Neanderthal allele frequency of at least 20% in at least one population. We use this subset of ancestry 150 informative sites because we are interested in frequencies along the selected haplotype(s), which do not nec-151 essarily span the entire length of the previously described windows. If selection quickly favored introgressed 152 alleles, we would at least expect all points in Figure 2 to show high levels of Neanderthal ancestry (i.e. no 153 green points, which represent an average Neanderthal allele frequency of less than 5% in the region for a given 154 population). However, in a number of these genomic regions, Neanderthal ancestry is almost or completely 155 absent from East Asian and/or some ancient populations.  The primary admixture graph that we use in our method. Our method requires the divergence times among admixed modern human populations, the time and proportion of admixture with Neanderthals, and the sampling time of each population. It also requires the timing and proportions of admixture among modern human populations, which we describe in detail in Table S2. We converted many of these estimates from years to generations assuming a generation time of 29 years (Fenner, 2005).  Allele segregates neutrally at low frequency Allele is selected Allele segregates neutrally at high frequency t b t s t I −t b −t s generations after admixture (t) frequency of selected allele Figure 3: (Top) Cartoon of haplotype diversity and ancestry at three time points when selection favors Neanderthal introgressed alleles immediately (t b = 0) and when selection favors Neanderthal introgressed alleles more recently (t b > 0). Purple tracts represent Neanderthal introgressed haplotypes, whereas green tracts represent haplotypes that did not descend from Neanderthals. We display Neanderthal ancestry tracts with fewer shades to represent their lower genetic diversity. (Bottom) Example frequency trajectory of a selected Neanderthal introgressed allele when there is a waiting time until selection. The light grey lines represent 50 stochastic trajectories in which the selected allele reached a frequency greater than 20%. The dark grey line represents the trajectory we assume in our models using the same waiting time until selection (t b ), selection coefficient (s), and duration of the sweep phase (t s ) used to generate the stochastic trajectories.

202
Our aim is to distinguish among the possible scenarios of selection on introgressed variation. Each scenario 203 is defined by the combination of two parameters we aim to infer in our model: the amount of time between 204 Neanderthal admixture and the onset of selection, which we refer to as the waiting time until selection (t b ) 205 and the additive strength of selection favoring the beneficial Neanderthal allele (s). We build on the model- 206 based, statistical approach introduced in Lee and Coop (2017), which uses coalescent theory to describe 207 how different selection scenarios modify the neutral variance and covariance of population allele frequencies 208 surrounding the selected site. This allows us to describe a multivariate normal model of population allele 209 frequencies for each scenario, which serves as a simple approximation for their probability distribution (Weir 210 et al., 2002;Nicholson et al., 2002). 211 Here, we review the framework laid out by Lee and Coop (2017). We model the change in allele frequency 212 ∆x i = x i − x a in our sampled populations (i) from their common ancestral population (a) at the root of the tree relating all populations we consider. For neutral alleles, the population allele frequency will on average 214 be the same as the ancestral allele frequency (E[∆x i ] = 0) because drift and hitchhiking are direction-less on 215 average. Drift and hitchhiking do however cause an increase in the variance in the change in allele frequencies 216 within a population, and pairs of populations can also covary in their change in allele frequency from the 217 ancestor if they have some shared population history or gene flow, i.e. their changes in allele frequency since 218 the ancestral population are not independent of one another. These effects are captured by the population 219 covariance between populations, i and j, given by where f ij is the probability that the ancestral lineages of an allele sampled in population i and an allele 221 sampled in population j coalesce before reaching the ancestral population (see Lee and Coop, 2017 when the average Neanderthal-introgressed allele frequency is the initial admixture fraction (g). We define 283 probabilities of coalescing under the null model as f (N ) in . In all we derive the probability of coalescing between 284 a pair of lineages sampled from partition B of population i and Neanderthal population n to be where superscript (N ) refers to predictions under our null model. In partition b of the same population (i), 286 all alleles are linked to a non-Neanderthal allele at sampling. If an allele sampled in this population never 287 recombines out, we know it cannot have introgressed. Therefore in order to have the chance at coalescing 288 with a Neanderthal allele, it must recombine out before admixture. Forward in time, that means that a

289
Neanderthal allele recombined onto the sequence carrying the non-Neanderthal allele at the partition site.

290
Therefore a neutral allele sampled from partition b of population i has the following probability of coalescing 291 with a neutral allele sampled from Neanderthals: In Supplement section S3 we describe our derivations for other population relationships under the null model. Neanderthals is much higher than the original admixture proportion, as we described informally above in 298 our discussion of Figure 3. phase II). When the relative fitness advantage of the selected allele is additive, such that heterozygotes have 305 an advantage of s and homozygotes 2s, Since the sum of all three phase durations equals the time between the present day and admixture (t I ), the duration of neutral phase II equals t I − t b − t s .

308
As the waiting time until selection t b increases, and assuming the same t b for all selected populations, 309 we transition from describing a case in which the selected allele became beneficial in the common ancestor 310 of all Eurasian populations, to the case in which the selected allele became beneficial independently in each 311 selected population. If t b is short enough such that selection began in the common ancestor of a group of 312 selected populations, we must account for the possibility that this common ancestor also has descendent 313 populations that do not carry the selected allele at high frequency, possibly due to subsequent drift or 314 negative selection. According to t b , we assign each population with very low frequency of the selected allele 315 into one of two categories: (i) ancestors never selected or (ii) ancestors selected with subsequent loss of the 316 selected allele in this population. We assign the first category if the low frequency population does not share 317 a common ancestor with any selected populations at the time selection starts, i.e. its divergence from all 318 selected populations predates selection. Otherwise we assign the second category.

319
Similar to the null model, we condition on ancestry at the putative selected site and consider how Pr(linked to selected allele at sweep finish) = in which an allele sampled from partition b can only be linked if it recombines off of its background.

343
Second, during the sweep phase, an allele that begins linked to the selected allele will always be associated where X(t) is the frequency of the selected allele in generation t of the sweep. In words, this probability is 346 approximately the product of the probabilities of not recombining onto the non-selected background each 347 generation of the sweep. Similarly, an allele that begins the sweep phase linked to the non-selected allele will always be associated with that background during the sweep with probability Pr(always linked to non-selected allele during sweep) The above associations persist if the lineage never recombines out of its background during neutral phase Here, we show results for a case in which the Neanderthal derived allele became adaptive in the ancestors of Early Neolithic Farmers immediately upon introduction (t b =0, red) and after 600 generations (blue). The favorable allele experienced a selective advantage (s) of 0.01 and on average reached a frequency (x s ) of 0.7 before becoming neutral again. The genetic window corresponds to 1 Mb with a constant recombination rate of 10 −8 . Partition B refers to alleles linked to the beneficial Neanderthal allele, while partition b refers to alleles linked to the non-Neanderthal counterpart at the selected site. We present categories of population relationships in which probabilities of coalescing vary the most with different waiting times until selection. See Figure S11 and Figure S12 for more population relationships, waiting times until selection, and final frequencies of the selected allele.

367
For each model (null or selection) and combination of free parameters, we define a variance-covariance matrix 368 of population allele frequencies for a given distance away from the selected site (Lee and Coop, 2017). We divergence and sampling times of admixed modern human populations (see Figure 1 and Table S2). The where the set of parentheses following F (S) denote the parameters of the function F (S) . When we predict 382 F (S) , we categorize Neanderthal-admixed populations as 'selected' if their frequency of the Neanderthal allele 383 at the selected site is greater than or equal to 0.05. We set x s to be the average selected Neanderthal allele 384 frequency among all of these putative selected populations. Our method could be extended to allow x s to 385 vary among populations and to do model choice of the selected populations, however for simplicity we do 386 not pursue those applications here. We estimate the composite likelihood of free parameters s and t b given 387 the allele frequency data across populations (D) in a window around a candidate selected site by taking 388 the product of all likelihoods calculated at each locus to the left and right of the candidate selected site as 389 follows, where L left and L right represent the number of polymorphic loci to the left and right of the candidate selected 391 site. We calculate composite likelihoods among the proposed parameter combinations of s and t b shown in 392 Table S4.

393
The composite likelihood under the null model takes the same form as selection; we simply replace F (S) 394 with F (N) and remove dependence on s, t b , and x s . Each partition site represents a potential selected site 395 and thus differs in its genetic distance to each neutral locus. Therefore, each partition site has its own set of 396 composite likelihoods for the null and selection models. We identify the 'best' partition site for each region 397 by selecting the site whose ratio of its maximum composite likelihood under the selection model to the null 398 model is greatest, i.e. by maximizing the level of support for selection at this site. From this partition 399 site, we identify the maximum composite likelihood estimatest b andŝ for the region. We note that in our 400 application, these estimates tended to be very similar among partition sites.

401
Our method focuses on inferring the history of selection from haplotypes alone, and does not incorporate

406
We analyze bi-allelic sites that are polymorphic among our set of samples. Since our multivariate normal 407 approximation works best when alleles segregate ancestrally at some intermediate frequency, we remove sites 408 polymorphic in only one population with a minor allele frequency less than 0.01. In practice, we mean 409 centered our observed allele frequencies, which removes dependence on the ancestral allele frequency x al .

410
We provide more detail on mean centering, sample size correction, and implementation in Appendix section 411 A.3.

412
Distinguishing between immediate and non-immediate selection To assess the timing of selection 413 relative to introgression we first distinguish between immediate (t b = 0) and non-immediate (t b > 0) selection. 414 We do so by using the ratio of the maximum composite likelihood under all parameter combinations to the 415 maximum composite likelihood when selection is immediate, The higher this ratio, the more support we have in favor of an initial neutral period relative to immediate 417 selection. Since composite likelihoods ignore the correlation in allele frequencies across loci due to linkage 418 disequilibrium, we cannot use traditional statistical methods for model selection. Validation We ran our method on simulated data to evaluate its performance, using SLiM 3.0 for forward

438
Our first goal was to determine our power to detect selection that did not begin immediately, and so we suggesting that we should correctly identify many of the regions that only recently contributed to adaptation.

452
Overall, s is not well inferred, likely because we deal with old selection events. Therefore, while s appears in recombination rates between each neutral and selected site for the method (note that this means that we 466 need to assume that recombination rates have been relatively constant over this time period). We excluded 467 sites without Neanderthal allele frequency data from our analysis, as they are less informative of introgression 468 patterns and the method might instead pick up on unrelated signals of selection at these sites. 469 We estimated the neutral F from putative neutral allele frequency data genome wide. We chose regions at  Validation section in that they involve larger loci (3 cM) and were generated from more combinations of s 498 and t s intended to target different x s , totaling 4000 simulations. We describe those simulations in Table S7.

499
Of the 36 regions we ran our method on, our method estimatedt b > 0 in twelve, and rejected immediate 500 selection in eight (Figure 8). Six of these regions havet b < 500, whereas the remaining two have extremely 501 high estimates: CACNA1S,ASCL5 hast b = 1300 and OAS1,OAS3 hast b = 1500.

502
As with many selection scans, the function of alleles underlying adaptation are obviously unknown. Here we use patterns of linked selection to develop one of the first model-based methods that infers the 519 time at which Neanderthal introgressed alleles became adaptive. Our approach uses ancient DNA as well as 520 the hitchhiking effect to investigate the temporal history of selection. We directly address partial sweeps, 521 which likely reflect most cases of human adaptation and perhaps adaptive introgression more generally, 522 and use them as a tool to time the onset of selection. While we require a known demographic history 523 among Neanderthal-admixed populations, our method is robust to modifications in these assumptions (see 524 Supplement section S2.2 for discussion).

525
We found that in most regions that we analyzed, we could not reject a model of selection immediately 526 favoring Neanderthal-introgressed alleles upon admixture. In some of these regions, East Asians and/or 527 ancient populations do not contain signatures of adaptive introgression. Obviously some of these cases may 528 represent our lack of power to rule out short onset times from haplotypes alone and so our results should 529 not be taken as indicating that selection began immediately for many Neanderthal haplotypes. In addition, 530 the regions that we ran our method on were identified from signals of unusual sharing with Neanderthals in East Asian or other populations that we did not identify. Among the regions in which we did reject 535 immediate selection, we estimated a wide range of waiting times until selection. From simulations, we found 536 that we can generally classify these cases into earlier (0 <t b ≤ 800) and more recent (t b > 800) selection.

537
These time frames may be coarse, but the transition between them does roughly mark the end of the last 538 glacial period (∼ 11.5 kya). Thus, we can generally distinguish between adaptation to conditions during 539 the Upper Paleolithic versus later periods defined by warming, technological innovation, the emergence of 540 farming, and higher population densities.

541
We modeled that the beneficial allele was at a low frequency for some time followed by the onset of selec-542 tion due to some environmental change. This initial period could correspond to the Neanderthal allele being 543 neutral or balanced at low frequency. However there are a few possibilities other than a truly delayed onset 544 of selection that could explain our rejection of immediate selection. First, selection may have immediately 545 favored an allele, but the allele was recessive and thus took a long time to rise in frequency. We chose to 546 model additive selection because we do not have information on dominance, but for recessive alleles we may The selection pressure favoring the OAS1,OAS3 haplotype is thought to be related to innate immunity.

586
It has previously been suggested that this gene contributes to traits under balancing selection because however we did not analyze them as they contain both Neanderthal and Denisovan haplotypes in Eurasian region is at odds with the hypothesis that the transfer of pathogens from Neanderthals to modern humans necessitated rapid human adaptation using Neanderthal-derived immunity alleles (Enard and Petrov, 2018; date the start of adaptation for more of these introgressed, putatively immunity-related haplotypes.

599
Our investigation contributes to a growing model-based understanding of the genomic patterns of adaptive

626
Using a combination of ancient DNA and haplotype based timing, we documented spatially and tempo-627 rally varying selection on Neanderthal alleles. Our results provide evidence that admixture between diverged 628 populations can be a source of genetic variation for adaptation in the long term. They also allow us to better 629 understand the historical and geographic contexts within which selection favored Neanderthal introgression.

630
As experimental work begins to identify the specific alleles and phenotypes potentially selected on, we can 631 more fully flesh out how interbreeding with archaic populations tens of thousands of years ago has shaped 632 the evolution of modern human populations.

634
We thank Iain Mathieson for helpful feedback and for sharing genotype likelihoods from the ancient samples 635 that we included in our analysis. We would also like to thank members of the Coop lab for great discussions from the ancestral population, the probability a pair of ancestral lineages coalesce before the root is In our scripts we estimate f ij using an 927 equivalent expression that directly averages over the two possible alleles at a locus, . (A. 2) The numerator of the fraction represents the probability of sampling different alleles from population i 929 and population j, which we estimate by taking its average over all loci. When i and j correspond to the 930 populations that root the tree, this average serves as the unbiased estimator of the denominator, 2x a (1 − x a ). account for how allele frequencies change when we sample without replacement, i.e. we must remove the 933 finite sampling bias. If n il represents the sample size of population i at locus l, then when i = j, When validating that our model predictions match the probabilities of coalescing in our selection simulations, 935 such as in Figure 4, we repeated the above procedure for sets of loci binned by genetic distance to the selected 936 site.

938
When we calculate likelihoods of population allele frequencies, we need to account for the finite sampling bias 939 in our estimates. Previously, we described the variance in the true population allele frequency due to genetic 940 drift, however there is an additional variance in our estimates due to sampling. Since the count of a certain 941 allele type in a sample is binomially distributed according to the true allele frequency, then with sample size 942 n il the variance in allele frequency due to sampling is expected to be approximately x a (1 − x a ) 1 n il . Thus the 943 total variance in the population allele frequency is x a (1 − x a )(f ii + 1 n il ). Therefore, when we calculate the 944 probability of population allele frequencies at a locus, we first modify F (S) by adding 1 n il along the diagonal 945 for each population i.

946
Since we do not know the ancestral allele frequency at each locus, we approximate it with the mean 947 across our sampled population allele frequencies, where k l is the number of populations with allele frequency data at locus l. We acknowledge that in our 949 casex l will be biased toward European allele frequencies. With allele frequencies now distributed relative to 950 each other, we accordingly modify the population allele frequencies and covariance matrix by mean-centering where T l is the k l − 1 by k l matrix with k l − 1 k l on the diagonal and − 1 k l elsewhere. Therefore we calculate 957 the probability of mean-centered allele frequencies at a locus as We take the same approach when calculating probabilities under the null model.

959
To allow for computational efficiency, we first bin each site's absolute genetic distances to the selected 960 site into 10 −3 cM intervals. Each bin's midpoint value is then used as the representative recombination 961 rate to predict F (N) and F (S) for each parameter combination. When we calculate the probability of allele 962 frequencies at each site, we use its bin's representative F and remove the rows and columns corresponding 963 to unsampled populations at the site prior to sample size correction and mean centering.  Table S3: Putative regions of adaptive introgression that we analyzed. Genes correspond to those listed with signals of adaptive introgression in Racimo et al. (2017). We label intergenic regions by their chromosome and midpoint position.

S1 Supplementary Tables
Proposed selection coefficients (s) Proposed waiting times until selection (t b ) 0. 002, 0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.05 0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700  For a given run of the method, we did not evaluate a combination of s and t b if the sweep would not complete (deterministically) before the present day, i.e. if the time until selection is recent enough and/or the sweep time t s (which also depends on x s ) is long enough such that t b + t s > t i .

Waiting time until selection (t b )
Selected human populations Combinations of s and t s (references rows in Table S6)  The method's power to reject immediate selection increases with the true waiting time until selection (t b ) 968 and the average frequency of the selected allele among putative selected populations (x s ) because they each 969 lead to later estimates of t b , which correspond to higher CLR t b >0 (Figures S1-S5). Among the simulations in 970 which our method rejects immediate selection, it reliably identifies late selection when t b > 800. Notably, it 971 can tell that selection is recent despite the selected allele having drifted to high frequency earlier in time. For 972 example, the West Eurasian Upper Paleolithic (EurUP) were sampled 908 generations after admixture, such 973 that whenever t b > 900, allele frequencies in that population can only reach higher frequencies because of 974 genetic drift. Despite these high frequency cases when the true t b > 900, the method can tell that selection 975 started after the EurUP were sampled.

976
When East Asians are the only Eurasian population that does not carry the allele at high frequency, the 977 method is biased toward inferring earlier selection when t b < 800 and does poorly when t b = 800. When

978
EurUP is the only population that does not carry the allele at high frequency, the method is biased toward 979 inferring later selection when t b < 800. s=0.005, t s =501 s=0.005, t s =655 s=0.01, t s =327 s=0.01, t s =474 Figure S1: Positive correlation between x and log CLR t b >0 among immediate selection simulations. Relationship does not differ among combinations of s and t s . See Table S6 for targeted x s corresponding to each combination of s and t s .

981
In our application, we ran our method on all partition sites while assuming the admixture graph specified 982 in Figure 1, which we call option A. To evaluate the sensitivity of our estimates to these assumptions, we 983 re-ran the method using three other graphs and the data set of the highest maximum composite likelihood   400  600  800  1000  1200  1400  1600  0  200  400  600  800  1000  1200  1400  1600  0  200  400  600  800  1000  1200  1400  1600  0  200  400  600  800  1000  1200  1400  1600  0  200  400  600  800  1000  1200  1400  1600  0  200  400  600  800  1000  1200  1400 Figure S9: Each region's estimated waiting time until selectiont b under different admixture graphs. Each admixture graph specified by a letter is introduced in section S2.2. Regions are presented in the same order as in Figure 2. proposed waiting time until selection log maximum composite likelihood Figure S10: Profile composite likelihood surface of the waiting time until selection (t b ) in the region CACNA1S,ASCL5 when we removed East Asians (CHB) from our analysis. We used the data set of the maximum composite likelihood partition site from when we included CHB in our analysis.
If both lineages remain strictly associated with the sweep, then we consider the possibility that they coalesce Pr(coalesce | both non-Neanderthal) = f ij − g 2 f nn (1 − g) 2 . (C.4) We have previously described our predictions when one of the lineages descends from Neanderthals due neutral phase II, and if the pair of lineages are linked to the same allele type at the most recent time they share a common ancestor, we consider the additional possibility that they coalesce during neutral phase II 1094 because of their associations with a subpopulation of haplotypes. We account for this using the same Poisson 3. Note that predictions of t b become less distinguishable when x s decreases (see Figure S11 for comparison to x s = 0.7).