The temporal and genomic scale of selection following hybridization

Genomic evidence supports an important role for selection in shaping patterns of introgression along the genome, but frameworks for understanding the dynamics underlying these patterns within hybrid populations have been lacking. Here, we develop methods based on the Wavelet Transform to understand the spatial genomic scale of local ancestry variation and its association with recombination rates. We present theory and use simulations to show how wavelet-based decompositions of ancestry variance along the genome and the correlation between ancestry and recombination reflect the joint effects of recombination, genetic drift, and genome-wide selection against introgressed alleles. Due to the clock-like effect of recombination in hybrids breaking up parental haplotypes, drift and selection produce predictable patterns of local ancestry variation at varying spatial genomic scales through time. Using wavelet approaches to identify the genomic scale of variance in ancestry and its correlates, we show that these methods can detect temporally localized effects of drift and selection. We apply these methods to previously published datasets from hybrid populations of swordtail fish (Xiphophorus) and baboons (Papio), and to inferred Neanderthal introgression in modern humans. Across systems, we find that upwards of 20% of the variation in local ancestry at the broadest genomic scales can be attributed to systematic selection against introgressed alleles, consistent with strong selection acting on early-generation hybrids. We also see signals of selection at fine genomic scales and much longer time scales. However, we show that our ability to confidently infer selection at fine scales is likely limited by inherent biases in current methods for estimating local ancestry from genomic similarity. Wavelet approaches will become widely applicable as genomic data from systems with introgression become increasingly available, and can help shed light on generalities of the genomic consequences of interspecific hybridization.

ψ j,k (ℓ) =      0 ℓ < (k − 1)2 j or ℓ ≥ k2 j 2 −j/2 (k − 1)2 j ≤ ℓ < (k − 1)2 j + 2 j−1 −2 −j/2 (k − 1)2 j + 2 j−1 ≤ ℓ < k2 j (S3) where j = 1, 2, ... log 2 L and k = 1, ..., L/2 j . Haar wavelets of level j only take non-zero values in one 742 interval of the domain, which is of length 2 λ . We define the scale of a wavelet, λ to be equal to half of this 743 interval, i.e. λ = 2 j−1 , so we can say the wavelet changes from positive to negative over a scale of λ (this 744 definition corresponds with the description of λ in the main text). Thus, the set of scales that the DWT 745 yields information about are a dyadic series, that is, successive scales differ by a factor of two. We also see 746 that the number of wavelets at a given level, L/(λ + 1), is simply the number of wavelets at that level that 747 fit onto the chromosome with no overlap in where they take non-zero values. Because of our restriction that 748 L must be a power of two, this is always whole number. Thus, the number of wavelets decreases by two 749 for successively larger scales. If we order the wavelets at a given level with their non-zero portions arranged 750 adjacent from left to right, ψ j,k is the k th wavelet in this ordered set. In this sense, the parameter k describes 751 where in the domain the non-zero portion of the wavelet is located.

756
Because of this, G does not obey property S2a, but it does obey properties S2 and S4. 757 S1.1.2 Wavelet coefficients 758 We have now seen that the rows of Ψ are the wavelets ψ λ,k (ℓ), along with the scaling filter G(ℓ) as described 759 above. The n th entry of W is an inner product between the n th row of Ψ and the vector x. The inner 760 product between a wavelet and x yields a wavelet coefficient. Because the mean of any wavelet is zero, a 761 particular wavelet coefficient is proportional to the covariance between the ancestry signal and the associated 762 wavelet: The inner product of the scaling filter and x produces the scaling coefficient which is proportional to the 764 mean of x(ℓ): If we sort the rows Ψ by increasing j and then k, and finally the scaling filter G, we can expand the 766 matrix multiplication of Eqn. S1 using Haar wavelets:  Analogous to the wavelet variance, we can compute the wavelet covariance at scale λ for two signals x and 780 y: and for the DWT the sum of these gives an exact decomposition of the total covariance: 22 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The DWT is restricted to signals that are powers of two, and the coefficients produced by the DWT are not 784 shift-invariant. In other words, the set of coefficients resulting from performing the DWT on a shifted signal 785 (moving every value by the same amount in one direction, and treating the signal as circular) are not the 786 same set of coefficients resulting from the DWT on the original signal.

787
The MODWT is a related transform that is both well-defined for signals of any length and shift invariant 788 (Percival and Walden 2000). In essence, it results from circularly shifting x and performing the DWT, 789 resulting in a vector of wavelet coefficients W λ of length L for each λ, representing localized changes in 790 adjacent averages taken in windows of size λ, as well as one vector of scaling coefficients V of length L 791 corresponding to average values of the signal in windows of size ⌊log 2 L⌋. One consequence of this procedure 792 is that the first λ elements of W λ and V are so-called boundary coefficients that describe changes between 793 discontiguous portions at the beginning and end of the signal. Different approaches for dealing with these 794 coefficients when calculating wavelet variances are available (described in Percival and Walden). In our 795 application, we have omitted boundary coefficients in our calculation of wavelet variances, but including 796 boundary coefficients produced qualitatively similar results.

797
Also in contrast to the DWT, the MODWT is not an orthogonal transformation, as there is redundancy 798 among neighboring wavelet coefficients within each level. In other words, wavelet coefficients will often be 799 auto-correlated along the signal. This does not however bias the estimates of the wavelet variances, and we 800 use MODWT as it produces less noisy estimates of wavelet variances. Similar to the DWT, the MODWT 801 gives a complete decomposition of the sample variance as follows: was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made In this section we derive expectations of the wavelet variance for ancestry state along the genome under a 811 neutral single-pulse model of hybridization.

812
S2.1 Expectation for wavelet variance of sample mean ancestry where 1 i (ℓ) is an indicator random variable; 1 i (ℓ) = 1 if haplotype i carries an introgressed allele at locus ℓ 815 and is zero otherwise. We will later refer to the 'recipient' and 'donor' populations as populations 0 and 1, 816 respectively, but the choice is arbitrary with respect to the admixture proportions.

817
From Eqn. S9, the expected contribution to the total variance of mean ancestry x(ℓ) along the sequence 818 associated with level λ is given by (S14) We can approximate the sum over the L loci as a continuous integral over a region of length L. Then, ] as the probability that haplotype i inherits from the donor population at 821 locus ℓ and individual j inherits from the donor population at locus ℓ ′ . From (S15), we also see that for a

826
Under neutrality, both of the expectations in (S15) will only depend on the distance between ℓ and ℓ ′ , and 827 not their exact locations, so there is no dependence on k. Thus, we have 828 24 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. The two expectations in (S15) can then be calculated under a coalescent model using a two state continuous 829 time Markov chain that runs backward in time from the present. Let state X = 1 represent the alleles at loci 830 ℓ and ℓ ′ being present on the same haplotype, and state X = 2 represents the two alleles being present on 831 separate haplotypes. The transition rate matrix is given by the coalescent approximation with time scaled 832 to units of 2N generations (τ = 2N t), The transition probability matrix is given by with D containing the eigenvalues of Q on the diagonal and the columns of A containing the corresponding 835 right eigenvectors so that For an initial admixture proportion of α, two alleles present on the same haplotype at the time of admixture 837 come from the donor population with probability α, while two alleles present on different haplotypes at the 838 time of admixture come from the donor population with probability α 2 . We then have (S17) Note in the above that if we set t = 0 we get E[1 i (ℓ) 1 i (ℓ ′ )] = α and E[1 i (ℓ) 1 j (ℓ ′ )] = α 2 as expected. In this 840 case, since ψ λ,k (ℓ) integrates to zero, the integrand in (S15) becomes an even function with no dependence on 841 the distance between loci and we get E[σ λ 2 ] = 0. This makes intuitive sense; with no time for recombination 842 after the initial admixture, there is zero variance at all spatial scales as introgressed alleles are in complete 843 linkage. In the limit as N → ∞, both expectations depend only on the recombination process and we have In this particular case the wavelet variance of the mean is equivalent to the wavelet variance for a single 845 haplotype, since drift cannot induce any covariance among haplotypes.

S2.3 Discrepancy for early generations
In early generations after admixture, the coalescent provides a poor model for the distribution of ancestry 854 tract lengths, because the fixed pedigree constrains which ancestors can be inherited from (Liang and Nielsen 855 2014). We thus expect error in the calculations above for the first few generations after hybridization. To 856 illustrate this, we calculate the expected wavelet variance for the F2 generation, assuming no selection or 857 assortative mating in the F1 generation. To match our Wright-Fisher simulation framework, we imagine that 858 two populations mix with proportions a and 1 − α. In the F1 generation, individuals are either descended 859 from a mating between two individuals from the same source population (events A, B with probabilities α 2 860 and (1 − α) 2 , respectively), or they are F1s (event C) with probability 2α(1 − α). We then have Using Haldane's map function for the probability that a chromosome produced through meiosis in an F1 is 862 not recombinant at ℓ and ℓ ′ , this becomes , we only need to consider whether the lineages coalesce in the previous generation: Using the expectations in Eqns. S20 and S21 in the integrand for the wavelet variance calculation, we 865 find that this provides a much better fit for the simulated wavelet variance in the F2 generation compared 866 to the coalescent model specified by Eqn. S17 (Fig. S1).
867 Figure S1: Black points and error bars show mean wavelet variances and 95% confidence intervals for ancestry state along single chromosomes in the F2 generation from 100 replicate simulations of a 50/50 mixture between two populations. Red line shows expected wavelet variance using the coalescent model. Blue line shows expected wavelet variance calculated for the F2 generation by conditioning on the pedigree. Figure S2: Simulated power spectrum of ancestry proportion in a hybrid population undergoing genetic drift compared to theoretical expectations. We simulated (using SliM, Haller and Messer, 2019) a population of constant size 2N=20000 (light blue) and a population that undergoes a bottleneck to 2N=200 for just the first 10 generations of recombination in hybrids, then expands to 2N=20000 (maroon). Points and error bars show means and 95% confidence intervals across 20 replicate simulations. Solid grey lines show theoretical expectations. Also shown are scaling variances collapsed into a single category 'scl' and chromosome-level variance. Vertical dotted lines are placed to indicate the expected distance between recombination breakpoints that have accrued along a single chromosome since the hybridization pulse. In contrast to the figures shown in the main text, simulated loci are evenly spaced on a genetic map of the human autosomes, thus no interpolation is required and the estimates closely match the theoretical expectation.

27
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ;https://doi.org/10.1101/2023 doi: bioRxiv preprint Figure S3: The effect of ancestry state interpolation on the power spectrum. Because informative markers are generally not evenly spaced, to perform the wavelet transform we must interpolate ancestry to evenly-spaced positions along the chromosome, either on a physical or genetic map. To see the scale of noise generated by interpolation, we compare simulated power spectra vs. theoretical expectations (Appendix B) for two cases after 1000 generations of mixture. (Left) Simulated loci are evenly spaced on the genetic map, thus no interpolation is required. In this case, deviations from the neutral expectation occur only at the finest scales, possibly reflecting numerical error. (Right) Simulated loci occur every 50kb on a physical map of the human autosomes, and ancestry is then interpolated to a grid on the genetic map to perform the wavelet transform. In this case, interpolation causes a downward bias over the left half of the power spectrum. Figure S4: Selection against multiple alleles carried in one ancestry background distorts the wavelet variance decomposition of introgressed ancestry relative to the neutral expectation. We ran simulations of a 50/50 population mixture with selection acting against alleles from one ancestry at varying numbers of loci (10, 100, 1000, and 10000). In each case, the total strength of selection against F1 hybrids is held constant, where F1s have a 50% fitness reduction relative to the local source population. With fewer numbers of loci, there is stronger per locus selection concentrated in fewer regions, and selection at these loci generates increased broad-scale variation among regions harboring the deleterious loci and regions with mostly neutral variation.

28
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ; https://doi.org/10.1101/2023.05.25.542345 doi: bioRxiv preprint Figure S5: Genome-wide selection against alleles from one ancestry distorts the power spectrum. (Left) Selection on a heterogeneous recombination landscape modelled off the human autosomes generates greater variance at broad scales and lower variance at fine scales relative to the neutral expectation. Selection acts additively against alleles from one ancestry at 10,000 loci placed uniformly at random on the physical map, each selected allele has a selection coefficient of s = 5e − 5. The power spectrum of ancestry is compared to that from neutral simulations that use the same interpolation procedure. Thus, the distorion here is due to selection per se and not interpolation. The reduction in variance at fine scales after only 10 generations can be explained by the fact that selection brings the average introgressed ancestry proportion closer to zero. (Right) Here, simulated loci are placed evenly on a genetic map and results are compared to neutral simulations that do the same. Thus, although the overall variance is reduced, we see that selection distorts the power spectrum in a similar manner even in the absence of recombination rate heterogeneity. Figure S6: Selection against introgressed alleles generates correlations with recombination at varying scales according to the total number, and thus the spacing between selected loci. In simulations where the total strength of selection is held constant but distributed over varying numbers of loci, we find that in generation 1000, correlations with recombination are present at varying scales. With 10,000 loci under selection, fine scale recombination rate variation generates fine scale correlations between introgressed ancestry and recombination. With only 10 loci under selection, fine scale variation in recombination rate does not influence the rate of purging of deleterious introgressed alleles, so correlations at finer scales do not establish. Error bars represent 95% confidence intervals across 20 replicate simulations runs.

29
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ;https://doi.org/10.1101/2023 doi: bioRxiv preprint Figure S7: The wavelet decomposition of the correlation between recombination rate and introgressed ancestry can detect temporally-localized effects of selection in hybrids. (a) Selection acting only on the first 10 generations of recombinant hybrids generates significant positive wavelet correlations only at broad scales (brown) (viewed in generation 1000), whereas continuous selection over 1000 generations continues to generate correlations on finer scales (red). Selection that begins after generation 500 and of neutral mixture and continues until generation 1000 also generates fine-scale correlations, as well as broad-scale correlations. When selection acts continuously but reverses direction after 100 generations to favor the alternate ancestry, positive broad-scale correlations persist as negative correlations establish at finer scales. The comparison between these last two cases indicates that correlations at broad scales are dominated by the dynamics of selection in the early generations after mixture. Only significant correlations are shown, error bars represent 95% confidence intervals across 20 replicate simulations.

30
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ; https://doi.org/10.1101/2023.05.25.542345 doi: bioRxiv preprint Figure S9: Contribution of each genomic scale to overall correlation between recombination and malinche-like ancestry. Results shown on the physical map (a) and genetic map (b).
Figure S10: Correlations through time at varying spatial scales between recombination rate and minor parent ancestry (X. malinche). Results shown on the physical map (a) and genetic map (b). Correlations between are not significantly different between any two timepoints at any scale.

31
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Figure S12: Power spectra of ancestry proportion for simulations with widespread weak selection (red) vs. neutral simulations (blue). Selection acts against alleles from one ancestry with additive effects within and across loci at 10,000 loci. Each allele has a selection coefficient of −2 x 10 −5 , yielding a 20% fitness reduction in F1s. Error bars are 95% confidence intervals computed from 20 replicate simulations.

32
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ; https://doi.org/10.1101/2023.05.25.542345 doi: bioRxiv preprint

33
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ; https://doi.org/10.1101/2023.05.25.542345 doi: bioRxiv preprint Following the methods described in these papers, we applied a thresholds of 0.9 and 0.42, respectively, to the posterior probabilities of a site matching a Neanderthal reference. We note that two studies that are commonly cited for evidence of selection against Neanderthal alleles utilize the non-thresholded posterior values from this study - Figure

34
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted May 26, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023