Genomic regions of current low hybridisation mark long-term barriers to gene flow in scarce swallowtail butterflies

Many closely related species continue to hybridise after millions of generations of divergence. However, the extent to which current patterning in hybrid zones connects back to the speciation process remains unclear: does evidence for current multilocus barriers support the hypothesis of speciation due to multilocus divergence? We analyse whole-genome sequencing data to investigate the speciation history of the scarce swallowtails Iphiclides podalirius and I. feisthamelii, which abut at a narrow (∼25 km) contact zone north of the Pyrenees. We first quantify the heterogeneity of effective migration rate under a model of isolation with migration, using genomes sampled across the range to identify long-term barriers to gene flow. Secondly, we investigate the recent ancestry of individuals from the hybrid zone using genome polarisation and estimate the coupling coefficient under a model of a multilocus barrier. We infer a low rate of long-term gene flow from I. feisthamelii into I. podalirius the direction of which matches the admixture across the hybrid zone and complete reproductive isolation across ≈ 33% of the genome. Our contrast of recent and long-term gene flow shows that regions of low recent hybridisation are indeed enriched for long-term barriers which maintain divergence between these hybridising sister species. This paves the way for future analysis of the evolution of reproductive isolation along the speciation continuum. Author summary Efforts to understand how new species evolve typically approach the problem through either: 1) investigating patterns of genetic exchange across ’hybrid zones’ — where closely related species interbreed — or 2) modelling the demographic history of closely related species using geneological trees. Both approaches are capable of quantifying variation in genetic exchange, or ’gene flow’, along the genome to identify regions of reproductive isolation; yet they leverage genetic signatures across vastly different timescales. The former exploits very recent signatures, while the latter averages long-term signatures over the history of divergence. Hence, we can compare and contrast each approach to understand how patterns of gene flow change across the speciation continuum. Here we use this strategy to capture the speciation dynamics of a pair of hybridising papilionid butterflies. Our results show that not only are these species continuing to produce hybrids after more than a million years since the onset of divergence, but there is a significant degree of concordance between patterns of gene flow observed along the genome across time scales.


Introduction
A fundamental goal of speciation research is to understand the genetic basis of reproductive isolation (RI) between diverging species and quantify the demographic and selective processes that lead to a build-up of RI (Campbell et al. 2018).We now know that speciation in the face of gene flow is not only possible (Bush 1994, Dieckmann & Doebeli 1999, Nosil 2008, Feder et al. 2012) but frequent (Taylor et al. 2001, Wang et al. 1997, Martin et al. 2013, Roux et al. 2016, Taylor & Larson 2019): closely related species often continue to hybridise after millions of generations of divergence (Mallet 1986, Harrison et al. 1990, Rieseberg et al. 1999, Sankararaman et al. 2014), yet remain distinct despite low levels of gene flow (Stankowski et al. 2021).However, given the time scales over which speciation events develop, the processes that contribute to RI are likely to vary over time (Drès & Mallet 2002).Stankowski & Ravinet (2021) define the speciation continuum as a continuum of reproductive isolation, from incipience to complete hybrid inviability, and highlight that the species we observe today are at different stages along this continuum.For example, sister species in many temperate plant and animal taxa form secondary contact zones (Hewitt 1988, Remington 1968) which may be substantially younger than the onset of speciation.In other words, contemporary contact zones are likely one of many instances of secondary contact generated by drastic environmental changes, such as glacial cycles over the last 800,000 years (Ebdon et al. 2021).Hybrid zones (HZs) for such taxa are stable exactly because there is strong selection against admixed ancestry.Whilst it is evident that genetic divergence and differentiation varies along the genome (Grainger Hunt & Selander 1973, Nosil et al. 2009), understanding the extent to which this reflects variation in the rate of gene flow requires explicit modelling of the interplay between migration, selection, genetic drift, and recombination.
Locally beneficial alleles may be selected against if they migrate into unfavourable environments and/or genetic backgrounds, reducing effective gene flow (Lenormand 2002).Over time, gene flow may vary as a consequence of range shifts and/or other changes in demography caused by glacial cycles, and -as a consequence -the selective forces and targets underpinning RI may also vary over time.Periods of complete allopatry during which gene flow is interrupted facilitate the build-up of strong endogeneous (Kruuk et al. 1999) or 'intrinsic' barriers (Bank et al. 2012).
Genetic studies of HZs show that recent introgression may vary considerably along the genome (Payseur & Rieseberg 2016): some genome regions harbouring strong incompatibilities act as strong barriers and show steep clines (Barton & Bengtsson 1986, Yang et al. 2020).Steep clines in nature predicted (Macholán et al. 2011) the approximate location of the second mammalian hybrid sterility gene Hstx2 years before its identification in the lab (Bhattacharyya et al. 2014).In other regions introgression may be unimpeded by selection or even advantageous (Hedrick 2013, Arnold et al. 2008, Racimo et al. 2015).Importantly, the barriers associated with RI since the onset of divergence may differ from those acting during secondary contact, many of which may have arisen recently (Abbott et al. 2013, Ravinet et al. 2017).To date, very few studies have compared the targets of short-term selection against heterospecific ancestry in hybrids with long-term barriers to gene flow.Previous attempts to investigate the overlap of long-term barriers and hybrid-zone (HZ) introgression relied on comparing outliers of genetic differentiation with cline-based analyses and yielded mixed results: while several systems show strong associations between divergence and introgression (Gompert & Buerkle 2012, Gosset & Bierne 2013, Taylor et al. 2014, Yang et al. 2020), numerous other case studies revealed considerable heterogeneity (Teeter et al. 2008, Parchman et al. 2013, Kingston et al. 2017, Westram et al. 2021).The power of these approaches has been limited for two reasons: firstly, summary statistics of the SFS (such as F ST ) confound the effect of longterm barriers with other population genetic processes and, by focusing on the most extreme outliers, limit sample size and power.Similarly the power of clinal analyses has been limited for two reasons: extensive geographic sampling requirements and insufficient summary statistics; clines often being merely centered and scaled without allowing for variation in their shape and asymmetry.
The lack of comparisons of barriers to gene flow across recent and long-term timescales has been highlighted in a recent review on speciation genomics (Ravinet et al. 2017).However, filling this gap requires quantitative inference frameworks that can distinguish barriers on both time scales from other evolutionary processes; frameworks that have only become available recently.

Speciation in Iphiclides butterflies
The southern European 'scarce swallowtails' Iphiclides podalirius and I. feisthamelii are large papilionid butterflies, typically associated with various species of Prunus and other Rosaceae bushes and trees.While I. podalirius ranges across the north of the Mediterranean from France to East Asia, I. feisthamelii is restricted to the Iberian peninsula and the northwest of Africa (Fig 1).Despite a long appreciation of the phenotypic differences between the two species (Godart 1842), including genital morphology (Coutsis & Van Oorschot 2011) and male ultraviolet wing patterns (Gaunet et al. 2019), their taxonomic status has been disputed after DNA barcoding at the mitochondrial (mt) COX1 locus revealed that the two taxa share mt haplotypes (Wiemers 2003, Wiemers & Fiedler 2007).However, it has been suggested recently that the mt genealogy reflects an introgression sweep most likely linked to infection by Wolbachia (Gaunet et al. 2019).The two taxa diverged approximately 1.2 million years ago (Wiemers & Gottsberger 2010, Ebdon et al. 2021) and today, both species abut at a narrow (∼25 km) contact zone north of the Pyrenees (Descimon & Mallet 2009) (Fig 1).While potential hybrids have been diagnosed based on morphology in a small set of museum specimens from France (Lafranchis et al. 2015), the putative hybrid zone has not yet been characterised genetically.Here, we quantify introgression across the hybrid zone and estimate long-term effective migration rates (m e ) for this pair of sister species.

Aims and objectives
We investigate the history of speciation and quantify both long and short-term barriers between I. podalirius and I. feisthamelii.We use two recently developed minimal-assumption inference methods gIMble (Laetsch et al. 2023) and diem (Baird et al. 2023) on whole-genome sequencing (WGS) data to quantify long-term barriers and selection against recent introgression in the HZ, respectively.Firstly, we infer variation in the rate of long-term effective migration along the genome under an explicit demographic model and locate putative long-term barriers to gene flow with gIMble .Secondly, we use the genome polarisation framework diem to quantify recent introgression barriers using six putative hybrid individuals sampled from the Iphiclides hybrid zone (Fig 1 ) and characterize the multilocus barrier to ongoing hybridisation between these species.Finally, we investigate the overlap between long-term barriers to gene flow and genomic regions that are depleted for recent introgression.We address the following questions: 1. What is the direction and rate of gene flow between I. podalirius and I. feisthamelii since their initial divergence?

Interspecific variation is consistent with hybridisation in Iphiclides
We generated WGS data for six individuals from the Iphiclides hybrid zone (we will refer to these as "the HZ set") in southern France (  1).
We identified runs of homozygosity (ROH) using PLINK (v1.9) (Purcell et al. 2007) to estimate inbreeding via F roh .One I. podalirius sample from Sicily (RVcoll12R048) was particularly inbred (F roh ≈ 0.25), and two other I. podalirius individuals, one from Romania (RVcoll14E561) and one from the hybrid zone (1325) were somewhat inbred (F roh ≈ 0.076 and 0.061 respectively).F roh in the remaining samples (including all I. feisthamelii ) was negligible (see Supplementary Table 1).This analytic framework co-estimates variation in long-term m e along the genome.Contrasting the support for a background IM model and a history without gene flow in genomic windows gives a measure for the strength of putative barriers.Since this analysis of long-term barriers relies on the assumption of a two population history, HZ and North African samples were excluded.We summarise genetic variation in short 64 base 'blocks', which are assumed to be non-recombining, under no direct selection, and to evolve with a constant mutation rate.These assumptions allow modelling the shared genealogical history of closely linked variants in the composite likelihood framework implemented in gIMble .To maximise the density of neutrally evolving sites we follow Laetsch et al. (2023) and restrict this analysis to intergenic sequence which have similar per site diversity and divergence to fourfold degenerate (4D) sites in coding regions (Table 1).After applying coverage based filters (see Methods), our analyses include 39% of the genome (≈ 160 out of 408 Mb).
We fit a series of demographic models to the blockwise site-frequency spectrum (bSFS) of the whole genome: a model of strict divergence (DIV ) and an IM model with migration in either direction with three N e parameters (IM 3,→pod and IM 3,→fei ).Out of these three scenarios (Table 2), the best fitting history is an IM  2) and N e estimates of 483,000, 377,000 and 1,140,000 for I. feisthamelii and I.
podalirius and the ancestral population, respectively.This split time is close to previous estimates for the species pair (Ebdon et al. 2021).The long-term genome-wide rate of gene flow m e from I.

The genomic correlates of barrier regions
Models of local adaptation under migration selection balance predict a concentration of barriers in regions of low recombination and a positive correlation between m e and recombination (Barton & Hewitt 1985).In line with this prediction we find that barriers and m e are correlated with recombination rates over two different scales.First, the proportion of barrier sequence is positively correlated with chromosome length (Pearson's ρ = 0.193).Second, we observe that, on average, barrier regions are closer to the centre of chromosomes (which generally have a reduced rate of recombination) than non-barrier regions (t-test, p = 0.0513).Interestingly, this is the opposite of the pattern observed in Heliconius melpomene and H. cydno (Laetsch et al. 2023).Contrary to expectation, we find that both the density of coding sequence (CDS) and repeats are reduced in barrier regions (Fig S3, see discussion).

Genome polarisation reveals complex hybrids
Chromosome-painting of individual genomes via diagnostic markers provides a direct way to visualize the mosaic ancestry of hybrid zone samples (Meier et al. 2021, Iasi et al. 2024, Saarman et al. 2019).While many have relied on assigning hybrid genotypes based on a reference panel which is assumed to reflect 'pure' ancestry, this assumption is both unnecessary and biases inference against introgression (Gompert et al. 2017).We labeled the ancestry of alleles at all non-singleton SNPs with respect to the focal barrier using the EM algorithm implemented in diem (Baird et al. 2023).This approach requires no a priori defined sets of reference individuals or candidate sites and assigns genotypes of 0/0, 1/1 or 0/1 at each variant position.All samples were included in the polarisation.
Genome polarisation shows that the HZ individuals (Fig .Interestingly, we find that the six HZ individuals only exhibit one Z chromosome of mixed ancestry (1325), which suggests that the Z is a substantially greater barrier than autosomes of equivalent length, as predicted by numerous theoretical models (Charlesworth et al. 1987, Mank et al. 2009, Bachtrog et al. 2019, Turelli & Orr 1995).

Modelling the history of introgressed blocks in complex hybrids
If we make the simplifying assumption that introgression across the HZ into either species started at time T , occurs at a constant rate m, and only involves backcrosses (i.e. the recipient population is infinitely large), the lengths of admixture tracts depend only on the coupling coefficient θ = S/R (Barton & Hewitt 1985, Baird 1995).More precisely, the equilibrium solution for the gradient of given that the recombination is known to vary substantially along butterfly chromosomes (Martin Iphiclides.Fourthly, violating our infinite population size assumption, hybrids in finite populations can interact, producing complex admixture tracts.This may inflate the distribution of small tracts in both admixture directions and explain the asymmetric deviation from expectation we see between fits.Finally, it is also plausible that the polygenic model of barrier architecture assumed by Baird (1995) breaks down over short scales, because it does not include the possibility that increasingly short blocks may have increasingly variable fitness effects (including the possibility of adaptive introgression).

Long-term barriers to gene flow are associated with regions of low hybridity
To

Discussion
Much of the recent research on the genomics of speciation has focused either on fitting demographic models to estimate migration over timescales of N e generations (Excoffier et al. 2013, Aeschbacher et al. 2017, Mondal et al. 2019, Fraïsse et al. 2021) or on investigating barriers to recent introgression (10s or 100s of generations) in hybrid zone and crossing studies (e.g.Buggs 2007, Teeter et al. 2008, Macholán et al. 2011, Turner & Harr 2014, Hagberg et al. 2022, Poikela et al. 2022).However, there are surprisingly few attempts to understand how barriers to gene flow on these two timescales are related.Here we have fitted explicit demographic models of speciation to infer the heterogeneity in long-term effective migration between two sister species of Iphiclides butterflies.We intersect this inference with signals of recent, heterogenous introgression across a hybrid zone estimated using genome polarisation.

Evidence for both long-term and recent gene flow between the scarce swallowtails
We find that the long-term demographic history of I. podalirius and I. feisthamelii is well approximated by an IM model with a low rate of unidirectional migration from I. feisthamelii into I. podalirius.Thus our analysis adds to the growing number of young species pairs for which a signal of long-term and ongoing migration has been estimated, including great apes (Mailund et al. 2012), butterflies (Martin et al. 2013, Capblancq et al. 2019, Mackintosh et al. 2023), mollusks (Hirase et al. 2021), angiosperms (Papadopulos et al. 2011), birds (Reifová et al. 2016), Drosophila (Garrigan et al. 2012), and many more.
However, we note that the global effective migration rate (m e ) we infer in Iphiclides is considerably lower than estimates obtained for sister species pairs of Heliconius (Laetsch et al. 2023) and Brenthis (Mackintosh et al. 2023) butterflies, which are of comparable age (Fig S8).Although this low background level of genome-wide migration reduces the power to identify individual barrier regions, it still allows quantification of the variation of m e along the genome.Furthermore, we find evidence for ongoing introgression between the two Iphiclides species across a HZ in the form of complex hybrids.Relating the distribution of admixture tract lengths to analytic expectations under a model of secondary contact (Baird 1995) suggests that recent gene flow into I. podalirius is neutral, while gene flow into I. feisthamelii is strongly selected against.This asymmetry is not only concordant with the inferred direction of long-term gene flow, but also consistent with genetic load arguments, which predict stronger selection against admixture tracts derived from the taxon with lower N e (Harris & Nielsen 2016), i.e.I. podalirius in this case.
More generally, it is encouraging how much information about both recent and long-term introgression is contained in a small sample of individual genomes: one can locate barriers to gene flow over timescales of N e generations by fitting models of speciation.Given that most species pairs do not have HZs, such long-term barrier inference is the only genomic information available about their speciation history.Even when HZs exist, they may not be amenable to clinal analyses due to sampling constraints.However, our analyses demonstrate that the coupling coefficient and the recent history of gene flow can be estimated from a handful of HZ individuals.It is perhaps surprising how well the admixture tract lengths we observe in Iphiclides fit analytic expectations under the simplest possible model of secondary contact which assumes panmixia, infinite population size and weak selection (Baird 1995).The fact that the observed length distribution deviates from this expectation for short tracts suggests that there may be additional information contained in small samples of hybrid genomes to fit more biologically realistic models of admixture.In particular, it would be interesting to fit more realistic barrier architectures that assumes a large but finite number of loci.Furthermore, classic theory on Fisher junctions for hybrid zones considers the decay of admixture tracts due to crossover events (Barton 1983).However, the internal decay of admixture tracts due to gene conversion events has so far been ignored and overlays the effects of an additional clock.
The ability to estimate the genomic distribution of block sizes (Fig 4) and relate it to theory (Baird 1995) developed long before genomic data were available is thrilling.The distribution of large blocks is expected to reach equilibrium quickly, and appears log-log linear suggesting relatively strong coupling of 0.44 at least in one direction.Neither direction reaches the 'tipping point' coupling of S/R = 1 (Barton 1983), where multilocus clines 'congeal', however that sharp threshold at equilibrium is misleading.Because small blocks equilibrate very slowly, it would take longer than an interglacial period for a sharp distinction in secondary contact outcome to be perceivable (Baird 1995

Long and short-term barriers to gene flow
The fact that barriers, inferred over these two time-scales of evolution, overlap shows that a significant subset of barriers is persistent and acts at very different points of the speciation continuum.
This suggests that regions of the genome that maintain reproductive isolation between species in the long-term are also relevant in early species divergence.Indeed, in Heliconius butterflies it has been shown that the same wing pattern genes maintain species differences both across hybrid zones (Nadeau et al. 2014) and in deep time (Laetsch et al. 2023).However, in contrast the congruence in barrier landscapes across timescales we find in Iphiclides is not restricted to a small number of large effect genes but rather a genome-wide phenomenon, suggesting a polygenic barrier architecture.
We would argue that this internal comparison of long-term barrier landscape with short-term barriers is a more promising avenue for testing of models of speciation than comparisons/contrasts of species at different stages, which invariably differ in speciation history and barrier architecture (Johnson & Miyanishi 2008, Mérot et al. 2017, Stankowski & Ravinet 2021).Thus, an important direction for future work is to develop quantitative predictions for the temporal change in barrier landscapes.Under an allopatric null model, pairwise intrinsic incompatibilities are assumed to accumulate at random positions in the genome and genome-wide coupling is expected to increase quadratically with time.Under this model, we expect a limited amount of temporal barrier overlap which arises from the fact that barrier loci that establish early have a greater effect on long-term m e than later barriers.
In contrast, verbal models of 'divergence hitchhiking' assume that early barriers expand locally (Wu 2001, Nosil 2008) and so predict a greater degree of temporal overlap.While theoretical models show that locally expanding barriers are possible -given sufficiently strong selection and linkage (Otto 2019) -there is so far not much empirical evidence that locally expanding 'islands of speciation' are a common feature of speciation.

The architecture of barrier loci
We found substantial variation both in the size of barrier regions and in the proportion of barrier sequence per chromosome.Specifically, barriers to gene flow aggregate on large chromosomes and towards chromosome centres.Both are regions of the genome for which recombination rates are reduced, as we expect from barriers individually conferring small effects.This is consistent with previous research on Heliconius butterflies, which demonstrated that barriers to introgression are concentrated in regions of low recombination (Martin et al. 2019, Laetsch et al. 2023), and supports a polygenic barrier architecture.Such architecture was originally proposed as a null model of reproductive isolation (Barton & Charlesworth 1984) and evidence of polygenic barriers to gene flow has accumulated across a range of taxa (Szymura & Barton 1986, Macholán et al. 2007, Turner & Harr 2014, Morán & Fontdevila 2014).Thus, it appears on a broad scale that the architecture of reproductive isolation between these species is strongly linked to recombination rate heterogeneity.
It must be noted that the aforementioned low rate of background migration limits the power to detect m e which is locally reduced but non-zero.As such, a subset of small barrier regions may simply be false positives.
We investigated whether barriers are associated with particular genomic features and failed to find enrichment of repetitive elements or coding sequence.However, one might expect this if reproductive isolation is driven by selection on genes within barrier regions.The deficit of coding sequence is likely a consequence of the strong negative correlation between gene density and chromosome length (Pearson's ρ = -0.421) on one hand and the strong positive correlation between barrier density and chromosome length on the other.These together may explain the strong negative correlation between gene and barrier density (Pearson's ρ = -0.652),which is partially driven by the numerous small chromosomes with very high numbers of genes and small numbers of barriers.

Limitations
To infer barriers to gene flow conservatively, we use the strictest possible threshold (m e,i = 0) and quantified the false positive rate (FPR) in a parametric bootstrap (see methods).Whilst this potentially excludes actual barriers with m e,i > 0 at putatively neutral flanking regions -which are the basis/input of our inference -this minimises the FPR.
Numerous factors may contribute to biases in our results (see Ravinet et al. (2017), andLaetsch et al. (2023) for gIMble limitations).Given that a directly estimated recombination map does not yet exist for Iphiclides, we cannot quantify the degree to which recombination rate heterogeneity contributes to migration rate variation nor account for the fact that uncertainty in estimates of both short and long-term barriers depends on the local recombination rate.
There may be scope to improve the power to detect weak gene flow through modelling more specific demographic scenarios.Firstly, whilst it is very likely that these species have undergone repeated rounds of separation and gene flow, we have fit a much simpler model that assumes a single continuous rate of migration.Expanding the gIMble framework to include isolation with initial migration (IIM) models may improve our power to detect ancient gene flow (Wilkinson-Herbots 2008, Laetsch et al. 2023).Secondly, we have modelled net gene flow as unidirectional and inferred the most likely direction by comparing models.In reality, net gene flow between these species may be bidirectional but asymmetric, a phenomenon which has been demonstrated in other systems (Yan et al. 2016, Ngeve et al. 2017, Banker et al. 2022).Whilst fitting more realistic models of gene flow is not currently feasible using this inference framework, future extensions lie on the horizon (Bisschop 2022).
The diem genome polarisation algorithm is designed to work off even small amounts of low quality data.There should therefore be few power limitations working with high quality genome scale data, and indeed estimates of hybrid indices, interspecific heterozygosity, and visualisation of admixture have high precision.Estimating admixture tract size distributions requires a further level of precision however: a single ancestry state error in a long block will on average halve its estimated length, error due to a miscalled variant would affect one tract, but error due to a mispolarised site could affect many.To avoid such error driven tract fragmentation, we filter polarised sites for high diagnostic index -strongly correlated with the support for correct polarity -and introduce an arbitrarily chosen scale of kernel smoothing of state along chromosomes.This has an advantage over HMM approaches in that it has no starting point chirality, but the disadvantage that it censors true small tract signal with some distribution of false negatives.This likely contributes to the departure of small tract observations from simple expectations (Fig 4).

Conclusion
We have demonstrated an association between long-term and short-term reproductive isolation in a species pair of swallowtail butterflies.Despite being several million generations old, gene flow persists between these species and is currently concentrated at a hybrid zone.The considerable number, varying size, and location of barrier regions suggest that the genomic architecture of speciation is polygenic, or multilocus sensu Barton (1983).While our analysis did not account for heterogeneity in recombination rate, m e variation both between and within chromosomes is likely shaped by recombination rate heterogeneity, a pattern previously observed in Heliconius butterflies (Martin et al. 2019, Laetsch et al. 2023).However, in the absence of a direct estimate of recombination heterogeneity it is impossible to know to what extent the overlap between long and short-term barriers reflects a shared/stable recombination landscape rather than shared selective targets.Our results pave the way for future quantitative analyses of the temporal evolution of the genomic landscape of species barriers.While most speciation processes are far too slow to observe directly, genomic variation clearly contains information about the interplay of forces acting at different stages of the speciation continuum.per sample.Additionally, GT calls were required to have a minimum PHRED quality of one and SNPs within 2 bases of indels were removed.We used the gIMble 'parse' module to quantify genetic diversity and divergence for the filtered subset of the data.We summarised variation in pair-blocks of 64 bases and tallied all blockwise site frequency configurations of mutations across heterospecific sample pairs (see Laetsch et al. (2023) for details).
To quantify heterogeneity in N e we fit a grid of parameters to sliding windows of a fixed length of 28,125 (pair-blocks) with a 20% overlap along the genome.This results in a total of 14,173 windows with a minimum span of 50 kb.The grid discretised N e from 100,000 to 1,500,000 (in increments of 100,000) for I. feisthamelii from 100,000 to 900,000 (in increments of 50,000) for I. podalirius and from 100,000 to 2,400,000 (in increments of 100,000) for the ancestral population; m e (forwards in time from I. feisthamelii into I. podalirius) was discretised from 0 to 4e-7, in increments of 5e-9.The split time T was fixed to the global result of 2,181,320 generations (Table 2) and the mutation rate was set to µ = 2.9 × 10 −9 per base and generation (Keightley et al. 2014) throughout.
To adjust our barrier definition for false positives (which are expected at an appreciable rate when the background m e is low), we ran a parametric bootstrap on the local estimates m e : for each window, we simulated 100 replicates under the best fitting local background history using gIMble 'simulate' and obtained a null distribution of ∆ B,0 .Only windows for which ∆ B,0 was greater than the largest value contained in this ∆ B,0 distribution under the simulated null were labelled as barriers.This ensures a false positive rate < 0.01.

Estimating hybrid indices and the ancestry of hybrid zone individuals
We polarized all variable sites (across all individuals) that were not singletons (which are uninformative for genome polarisation by association in state) and estimated genome-wide hybrid indices (HIs) using the diem framework implemented in both R and Mathematica (Baird et al. 2023).
Variants were subject to the same filtering criteria as the gIMble analysis.We calculated HIs per individual using sets of polarised sites with high diagnostic index (DI > -20).We used chromosome subsets of these sites to calculate HIs per chromosome.At a further reduction in scale we computed HI for each gIMble window and compared D across all individuals between barrier and non-barrier gIMble windows in an ANOVA.the same chromosome copy), if we wished to estimate tract size distributions of both minority and majority tracts.However, comparison to theory for tract size distributions (Baird 1995) requires only minority tract sizes: Then, no phasing decisions are necessary, only a decision regarding which barrier-side state is in the minority for an individual.This can be decided with little uncertainty using an estimate of the individual's HI taken over very many sites genome wide.Inferred migration rates are higher in windows spanning broader map lengths.

2.
What fraction of the Iphiclides genome acts as a long-term barrier to gene flow, and what are the properties of barrier regions? 3. What is the evidence for recent gene flow between the species across the hybrid zone and the strength of the multilocus barrier acting against it?4. Is introgression across the hybrid zone impeded by long-term barriers to gene flow?
Fig 2 B) and 14 individuals sampled throughout the ranges of both species ("the non-HZ set", Fig 2 A).The non-HZ set includes six samples of I. podalirius and eight of I. feisthamelii.Visualising genetic variation in a PCA reveals distinct clusters both within and between species.The first principle component (PC1) captures interspecies differentiation between I. podalirius and I. feisthamelii (≈ 23% of the variation, Fig 2).Samples assigned morphologically to each species form two clusters along PC1 with individuals from the HZ (Fig 2) falling between the two parental species clusters (Fig 2).The North African I. feisthamelii samples are separated from the European I. feisthamelii samples along PC2 (≈ 14% of the variation, Fig 2).Genetic diversity is slightly larger in I. feisthamelii compared to I. podalirius (Table 3 model with unidirectional gene flow (forwards in time) from I. feisthamelii into I. podalirius (IM 3,→pod ).Since an IM model will always fit at least as well as the (nested) DIV model, we compared the observed improvement in model fit (ln CL) relative to the null distribution of ln CL which we obtained by simulating 100 data sets under the best fitting DIV history.This parametric bootstrap confirms that the IM 3,→pod model does indeed fit significantly better than a DIV 3 history (Fig S4).Under the background/global IM model, we infer a split time (T ) 2,180,000 generations ago (Fig 2, Table feisthamelii into I. podalirius is 4.73 ×10 −8 which corresponds to M/2 = 2N e m e = 0.046 migrants per generation.Extensive genome-wide reproductive isolation between I. feisthamelii and I. podalirius We infer the effective migration rate m e along the genome in sliding windows.For each window, we estimate parameters under the best fitting IM 3,→pod history using a pre-computed-grid.Note that whilst the model and split time T are fixed globally, the remaining parameters (N e for each population and m e ) are estimated locally, i.e. per window.The estimates of local N e are approximately normally distributed (Fig S1) while local m e estimates have a strongly leptokurtic distribution with a long tail up to the maximum value in the grid (Fig S1).While the large number of windows in the largest m e bin reflects the fact that our grid of m e estimates truncates the long tail of the m e distribution, this does not affect our analyses of barriers.Following Laetsch et al. (2023) we label windows as barriers to gene flow if a DIV history (m e = 0) has greater marginal support than an IM history assuming the best fitting genome-wide value of m e (Table2), i.e. ∆ B0 > 0. We find that barriers to gene flow are widespread in the Iphiclides genome, making up 20% of windows across all chromosomes.Combining overlapping barrier windows, defines 555 barrier regions across all autosomes which cover 35% of the genome (≈ 143Mb).The number of barrier regions varies widely between chromosomes: e.g.chromosomes 29 and 31 harbour no and one barrier region respectively.In contrast, chromosomes 19 and 17 have 53 (29.3% of windows) and 49 (29.5% of windows) barrier regions respectively.The average length of barrier region is ≈ 257 kb (median ≈ 200 kb) with a maximum of 1.4 Mb.
1 B) have an alternating pattern of podalirius-feisthamelii ancestry along all autosomes (Fig 3) which can only be generated by hybridisation after divergence.We find substantial heterogeneity in the degree of hybridisation, as quantified by the hybrid index (HI which is the average across genotype assignments in an individual) both between individuals and chromosomes (Fig 5).The two most intermediate HZ samples, IF 1325 (HI = 0.712) and IF 1322 (HI = 0.420), show large stretches of all three possible ancestries, that is, tracts that are heterozygous by source (interspecific H) or homozygous for each species respectively, a pattern that can only be generated by a complex history of hybridisation: the expected HI of a simple n-th generation backcross hybrid is (1/2) n with homozygous tracts for one species only.While we expect substantial variation around this expectation, successive generations of simple backcrosses move the expected HI towards the parental values of 0 or 1 along the sides of the De Finetti triangle, interspecific H (the proportion of genotypes with an allele from each side of the barrier) remains maximal.Thus, the relationship between the HI and interspecific H, i.e. the De Finetti diagram, shows that samples 1325 and 1330 are complex backcrosses (i.e.their ancestry does not reflect the simple successive crossing of hybrids with parental species), and suggests that in both cases introgression is towards I. podalirius (Fig 5A).All samples except 1322 show a deficit in interspecific heterozygosity, the genomic Wahlund effect (Fig 5) the admixture tract length distribution on a log-log scale is − 3+θ 1+θ (see Fig 4, Appendix 1).Thus, to learn about the direction and dynamics of recent gene flow across the HZ, we fit the distribution of admixture tract length of colinear introgressed ancestry to the analytic expectation under this model(Baird 1995).In the absence of a recombination map, we measured the length of admixture tracts (x) relative to chromosome length, i.e. we assumed that each autosome has a map length of 0.25M, which corresponds to an average of one cross-over event per male meiosis (female butterflies are achiasmatic).To avoid block lengths being fragmented by rare errors we kernel smoothed diem labeling along chromosomes at a scale of 10 −4 × chromosome length.Considering long tracts (l > 0.04), neutrality provides a good fit for the distribution of tracts of I. feisthamelii ancestry in I. podalirius (i.e. a gradient of -3).In contrast, the size distribution of I. podalirius admixture tracts in I. feisthamelii is best explained by θ = 0.44(Fig 4).This asymmetry mirrors the asymmetry of long-term gene flow under the IM history inferred by gIMble and suggests that gene flow has been stable for much of the last 150 -275 generations (75 -140 years).In contrast, over the same timescale I. podalirius admixture into I. feisthamelii has been strongly selected against.Interestingly, we find that the admixture tract length distribution in neither direction involves a simple asymptote for short tracts as predicted under a constant m secondary contact initiated at time T .This may be due to several factors: first, gene flow in the more distant past may have been genuinely lower, which -given the dependence of Iphiclides on orchards and other anthropogenic habitats -may reflect changes in landscape use.Secondly, our simplistic measure of x as a proportion of chromosome length is inadequate for short blocks; estimate current barrier strength along the genome, we estimated D(Barton & Gale 1993), the multi-site mean pairwise LD in windows defined in the gIMble analysis.This is numerically equivalent to the variance in HI and captures the strong LD seen along stretches of co-introgressing variation: when minimal for polarised data (D = 0) states of sites are uncorrelated, and when maximal (D = 0.25) each sample reflects pure ancestry, three from either species, as expected under a strong central barrier.We find that -at the scale of gIMble windows (≈ 100 kb) -D is negatively correlated with m e,i (Pearson's ρ = -0.14).Furthermore, we found that D is increased for barrier windows (ANOVA, p < 2e-16) compared to non-barrier windows.This suggests ongoing selection against hybrid ancestry within the HZ at long-term barrier loci (Fig6).

Figure 1
Figure 1 -(A) Sampling locations and ranges of I. feisthamelii (purple) and I. podalirius (teal) butterflies.The samples collected from the hybrid zone are shown in black.(B) Sampling locations of butterflies from the Iphiclides hybrid zone.The dashed line represents the approximate center of the hybrid zone, based on samples collected by Lafranchis et al. (2015).The circular samples resemble I. feisthamelii, and the triangular samples are intermediate hybrids.

Figure 2 -
Figure2 -(A)The background demographic history of species divergence and gene flow, the height and width of populations, is relative to the maximum likelihood estimates under the IM 2,→pod model (Table2).This figure was generated using demes(Gower et al. 2022).(B) PCA

Figure 3 -Figure 4 -
Figure 3 -Circular representation of the location of barriers identified using gIMble and the polarity of diagnostic markers for each sample across linkage groups.The inner ring shows the location of each chromosome in alternating grey and white.Moving outwards, the next ring indicates the location of barriers to gene flow (in red) and non-barrier regions (in black).The remaining rings show the genotype of each sample at each diagnostic marker.Teal bars are diagnostic of I. podalirius, purple bars are diagnostic of I. feisthamelii, and yellow bars are heterozygous for markers diagnostic of each species.The location of each megabase of sequence for each chromosome is indicated on the outside of the circle.

Figure 5 -
Figure 5 -Hybrid index (HI) versus interspecific heterozygosity (H) for samples collected from the Iphiclides hybrid zone (HZ).(A) Mean values for each HZ sample.(B) Mean values for each chromosome including all samples from the HZ.(C) Mean values for each chromosome excluding the three I. feisthameliilike samples (1303, 1306 and 1308, HI ≈ 0).The dashed line indicates the expectation under Hardy-Weinberg equilibrium.

Figure 6 -
Figure 6 -The distribution of D across gIMble barrier windows (grey) and non-barrier windows (yellow).D calculated per window across HZ samples is greater within long-term barrier regions than in non-barrier regions (ANOVA, p < 2e-16).

Figure S1 -
Figure S1 -The distributions of N fei (purple), N pod (teal), N anc (grey), and m e,i (yellow) from I. feisthamelii into I. podalirius estimated in sliding windows using a 12x24x9x50 parameter grid.The vertical dashed lines indicate the global estimates of each parameter.

Figure S2 -
Figure S2 -Barriers to gene flow between I. feisthamelii and I. podalirius.The red bars in the top panel highlight the location of barrier windows.The bottom panel shows for each gIMble window, the difference in ln CL of m e,i = 0 from the background m e = 1e − 7, ∆ B0 .In

Figure S3 -
Figure S3 -The dashed lines indicate the mean density of repeats (top) and coding sequence (CDS) (bottom) in barrier regions defined via gIMble .Both are lower than the CDS and repeat proportion of data resampled at random using a circular resampling scheme.

Figure S4 -
Figure S4 -The distribution of differences in log-likelihood between an isolation with migration (IM) model versus a model of strict divergence (DIV) fit to 100 replicates of data simulated under a DIV model (black) and the empirical data (red).The improvement in fit of an IM model is much greater in the empirical data than in the simulations.

Figure S5 -
Figure S5 -The distribution of relative positions from the telomeres (at 0) to chromosome centres (at 0.5) for gIMble barrier windows (grey) and non-barrier windows (orange).The average heterozygosity for windows at different relative positions is shown by the black line.Barrier windows are marginally further from the telomeres than non-barrier windows (t-test, p = 0.0512).

Figure S6 -
Figure S6 -Mean heterozygosity (H) in sliding windows of minimum 50kb across all chromosomes.

Figure S8 -
FigureS8-The distribution of effective migration rates (m e ) between three pairs of butterfly sister species inferred by fitting IM models in windows across the genome.The rate of gene flow between Iphiclides podalirius and I. feisthamelii is lower than between Heliconius melpomene and H. cydno(Laetsch et al. 2023) and between Brenthis daphne and B. ino(Mackintosh et al. 2023).

Figure S9 -
Figure S9 -Mean effective migration rates and standard errors for windows spanning different map lengths.

Table 2 -
Maximum composite likelihood parameters for three demographic models of species divergence.We use the IM 3,→pod model for barrier inference.Thirdly, our model is unrealistically simplistic in that it ignores space: it assumes introgression into an infinitely large panimictic population.In a spatially continuous population we expect shorter tracts with increasing distance from the HZ which is indeed what we observe in (Gower et al. 2022) was generated using demes(Gower et al. 2022).(B) PCA of Iphiclides sampled across Europe; I. feisthamelii and I. podalirius samples are shown in purple and teal respectively.Samples from the HZ are shown in yellow.The first PC captures differences between the two taxa, and the second PC captures variation in geography, particularly the separation between North African and European I. feisthamelii.