How do species barriers decay? Concordance and local introgression in mosaic hybrid zones of mussels

The Mytilus complex of marine mussel species forms a mosaic of hybrid zones, found across temperate regions of the globe. This allows us to study ‘replicated’ instances of secondary contact between closely related species. Previous work on this complex has shown that local introgression is both widespread and highly heterogeneous, and has identified SNPs that are outliers of differentiation between lineages. Here, we developed an ancestry‐informative panel of such SNPs. We then compared their frequencies in newly sampled populations, including samples from within the hybrid zones, and parental populations at different distances from the contact. Results show that close to the hybrid zones, some outlier loci are near to fixation for the heterospecific allele, suggesting enhanced local introgression, or the local sweep of a shared ancestral allele. Conversely, genomic cline analyses, treating local parental populations as the reference, reveal a globally high concordance among loci, albeit with a few signals of asymmetric introgression. Enhanced local introgression at specific loci is consistent with the early transfer of adaptive variants after contact, possibly including asymmetric bi‐stable variants (Dobzhansky‐Muller incompatibilities), or haplotypes loaded with fewer deleterious mutations. Having escaped one barrier, however, these variants can be trapped or delayed at the next barrier, confining the introgression locally. These results shed light on the decay of species barriers during phases of contact.


| INTRODUC TI ON
Divergence between species is almost always accompanied by hybridization during phases of contact (Abbott et al., 2013), and so the study of speciation is intertwined with the study of introgression (Aeschbacher et al., 2017;Chaturvedi et al., 2020;Duranton et al., 2018;Gagnaire et al., 2018;Martin et al., 2019;Roesti et al., 2013;Schumer et al., 2018). The outcome of any given contact will depend on intrinsic and extrinsic factors. When barriers are weak, there can be massive introgression, leaving differentiation at only a few strongly selected loci. But even strong barriers can be rapidly crossed by adaptive alleles and flanking neutral variants (Barton, 1979a;Faure et al., 2008).
If some variation is bi-stable (Barton & Turelli, 2011) such that one parental genotype is fitter, but heterozygous or recombinant genotypes are unfit, the spread of the fittest genotype is hindered at the are broadcast spawners with a dispersive larval phase, and so true geographic isolation is relatively rare. For this reason, the complex is divided by numerous post-glacial hybrid zones, maintained by both intrinsic and extrinsic mechanisms of reproductive isolation El Ayari et al., 2019;Fraïsse et al., 2016;Hilbish et al., 2012;Riginos & Cunningham, 2005;Strelkov et al., 2017). In particular, in the Northern Hemisphere all three species form a geographic mosaic delimited by many hybrid zones .
This biogeographical history has two important consequences.
First, contacts between similar lineages occur multiple times, and so we can observe the same species barrier in multiple demographic and ecological contexts (Abbott et al., 2013;Harrison & Larson, 2016;Simon et al., 2020). Second, with its narrow hybrid zones, combined with very long-range admixture , this complex provides us with a continuum of divergence between the interacting taxa. Precisely because of this heterogeneity, however, it is challenging to identify diagnostic loci for the Mytilus complex.
Recently, Fraïsse et al. (2016) captured 1,269 contigs from reference populations of the Mytilus complex. Using these data, they identified regions that were divergent between species and outliers of differentiation between populations of the same species. These intra-specific outliers are significant in two ways. First, they can be used to identify individual lineages. Second, phylogenetic analyses of gene trees suggested that the majority of the outliers were due to local introgression between species, rather than local intra-specific selective sweeps. As such, they provide information about introgression in the complex.
Here, we extend the analyses of Fraïsse et al. (2016). First, we built a SNP panel of a few hundred loci chosen from their outlier contigs and therefore enriched in highly differentiated markers.
Having included intra-specific outlier loci, we then examined their allele frequencies in a wider range of populations, allowing us to examine local introgression on a broader scale. Finally, we also sampled within hybrid zones and carried out genomic cline/concordance analyses (Gompert & Buerkle, 2011;Macholán et al., 2011). Our aim here was to test the effect of the choice of parental populations (local vs. distant) in the detection of discordant behaviour.
After verifying that our SNP panel is effective in identifying the lineages of interest, our results confirm that local introgression is both widespread in these Mytilus lineages and heterogeneous across the genome. At some loci identified as intra-species outliers in Fraïsse et al. (2016), heterospecific alleles are sometimes fixed, or nearly so, in one parental patch close to a hybrid zone, whereas remaining nearly absent from other population patches farther from the zone. Genomic clines suggest high concordance among loci in the centre of hybrid zones, but only when local parental populations are used as reference. However, a few loci do depart from the genomic average and demonstrate moderate asymmetric introgression. Unlike outliers that exhibit enhanced local introgression at a large scale, the introgression of these genomic cline outliers does not extend outside of the hybrid zones where lineages coexist.

| Sampling
Mytilus spp. individuals were sampled from 58 locations, including several known hybrid zones ( Figure 1, Table 1). Sampling sites are located on the American Pacific coast, the American and European North Atlantic coasts, and in the Mediterranean, Baltic, North, Barents and Black seas. We genotyped 441 individuals and extracted 72 genotypes by sequencing (GBS) from the published sequence data set of Fraïsse et al. (2016).

| Assay design
We aimed to genotype previously identified differentiated and outlier loci across a large number of samples in a cost-effective manner. For this purpose, we used an Illumina BeadXpress assay with Veracode technology (GoldenGate Genotyping Assay). This method is based on the hybridization and PCR amplification of SNP-and locus-specific oligos differentially bound to two different dyes. The genotype of an individual is provided by the relative fluorescence.
We designed an assay of 384 SNPs (being the multiplexing limit of the technology).
Loci were selected, prior to genotyping, based on their differentiation either between species or between populations of the same species, using the published results of Fraïsse et al. (2016).
Briefly, this database was produced via a target enrichment method on the three species and multiple populations of the Mytilus species complex (Fraïsse et al., 2016, http://www.scbi.uma From the called genotypes of Fraïsse et al. (2016), markers with a minimum allele frequency of 0.05 and a maximum missing data percentage of 50 % computed on all individuals were retained.
Coverage was estimated as a mean computed on three populations (two Atlantic M. galloprovincialis and one Atlantic M. edulis). Contig regions with especially high coverage (> 300 reads) were excluded to avoid repeated elements. Regions of the database produced from cDNA were blasted against a draft genome (Murgarella et al., 2016) to exclude regions close to intron/exon limits as flanking sequences in 3′ and 5′ of the target SNP were needed to design primers for PCR amplification. For the same reason, SNPs close to the start and end of the contigs were also excluded. An ADT score, produced by Illumina Assay Design Tool, quantifies the expected amplification success and was used to filter the SNPs with the most probable design success (ADT score > 0.4). F ST was computed between pairs of populations and species such as those shown in Table 2. A first filtration step kept the 5 highest F ST markers per contig for each comparison. We ensured a F ST > 0.5 for all comparisons, except within edu_am and between gallo_atl and gallo_med where we used a F I G U R E 1 Location of the studied populations. Squares: genotype by sequencing (GBS, data from Fraïsse et al., 2016). Circles: BeadXpress genotypes. The same colour code is used in Figure 2. More detailed population information is available in Table S1 TA B L E 1 Definition of reference populations, classified by species level (L1), ocean basin or continent (L2), and regional groupings (L3). The ID column gives the population number referenced in Figure 1 and   (Riquet et al., 2017), two additional samples from the Mediterranean sea were removed (MTP_05 and Collo_10 from populations 48 and 52, respectively). Filtering yielded a total of 514 individuals genotyped at 212 markers.

| Genotyping and filtering
Reference populations were defined based on previous knowledge of the Mytilus species complex Simon et al., 2020). They are used first as a classification reference for newly genotyped individuals, to compute allele frequencies in analyses and to serve as parental populations. We used three levels of differentiation representing the species level (L1), the ocean basin or continent (L2) and regional groupings (L3) ( Table 1). We used the GBS samples from Fraïsse et al. (2016) as references, and previously untyped individuals were assigned to a reference group if they belonged to the same GBS population or were geographically close to it. We used a preliminary Admixture analysis (v1.3.0; Alexander et al., 2009) at the species level (L1) using K = 3 and default settings (30 independent runs merged with CLUMPAK; v1.1; Kopelman et al., 2015) to correct the reference groups by removing inter-specific migrants or hybrids, as sympatry and hybridization is a common phenomenon in the Mytilus species complex. Individuals assigned to those populations, and not filtered out, constitute the 'reference data set'.
Hardy-Weinberg equilibrium was tested for the remaining markers within putative panmictic clusters outside of known hybrid zones (pegas 0.11, Paradis, 2010). Exact tests were performed using 1,000 bootstraps, and p-values were adjusted for false discovery rate using the Benjamini-Yekutieli method (Benjamini & Yekutieli, 2001)..
We used a partial genetic map produced in Simon et al. (2020) based on an F2 cross between Mediterranean M. galloprovincialis and M. edulis, genotyped at a subset of the markers studied here (Bierne et al., 2002(Bierne et al., , 2006Simon et al., 2018). In addition to this genetic map, we extrapolated genetic distances from physical distance for markers present on the same contig using a recombination rate of 2 cM/Mb (rounded from the estimate in Bierne, 2010).

| Population structure
A principal component analysis (PCA) was performed using markers on different physical contigs (retaining the one with least missing data). This filtration was to avoid strong biases due to physical linkage and led to a final set of 160 markers. The genotype data were centred and scaled using the adegenet R package (v2. model. Structure was run with the admixture and linkage model (LINKAGE = 1). The data set was filtered with the following steps: (a) markers out of Hardy-Weinberg equilibrium were removed; (b) one marker per physical contig was retained (keeping the one with least missing data); and (c) the genetic map was used to produce linkage information for the retained markers, with the assumption that markers absent from the genetic map were unlinked (for lack of further information).
For the global Structure analysis, 20 independent Monte Carlo Markov Chains (MCMC) runs of 20,000 burn-in iterations followed by 80,000 steps were performed to estimate model parameters for each K between 4 and 12. The standard deviation for the α prior was set to 0.05 for better mixing of the chains. The Clumpak software (v1.1; Kopelman et al., 2015) was used to investigate and aggregate Structure outputs with an Markov clustering (MCL) threshold of 0.9.
Only major clusters are presented in the results.

| Hybrid zone analyses
Each hybrid zone was defined by parent 1 (P1), parent 2 (P2) and central populations (Table 3). Each parental population was classified as either 'local' or 'distant', according to observed levels of introgression or geography. A local parental population is defined as a population close to the focal hybrid zone and being potentially subject to recent gene flow from the hybrid zone. A distant parental population is defined as a population distant from the hybrid zone, which has been less influenced by recent gene flow from it but can still have a substantial rate of established introgression due to past contacts. In a few cases, a local parental population was not available in our sampling design, and in these cases, a distant population was used instead (Table 3) For each hybrid zone, we performed two analyses called 'local' and 'distant' in which the local or distant parental populations are taken as references, respectively (Table 3). Both analyses consider the 'central' population as admixed.
We estimated locus-specific posterior distributions for the cline centre (α) and rate of transition in ancestry along the admixture gra-

| RE SULTS
We use a set of 212 differentiated and outlier markers to investigate the population genetics of several hybrid zones and other introgressed populations in the Mytilus species complex. We start by showing that our target species and populations are identifiable with our ancestryinformative panel of SNPs. We also identify previously uncharacterized lineages or admixed clusters. Then, we analyse the differentiation and introgression patterns in the hybrid zones at two spatial scales.

| Power of the SNP panel and clustering results
Our markers were taken from a previous GBS study , and so we first tested if the markers were still informative in our broader sample. To do this, we correlated F ST values between the complete data set (BeadXpress and GBS) and the GBS genotypes considered alone (Table S2). Results showed good delimitation (i.e. high edulis cluster (Figure 2), but there also appears to be some infra-specific ancestry coming from Europe (already noticed by Simon et al., 2020).
An analysis considering only M. edulis samples, Figure S4, shows more clearly that individuals from the most southern populations of the

| Genetic differences between patches of parental populations close to or distant from hybrid zones
We investigated the allele frequency differences between an intergradation of populations both between M. edulis and M.

F I G U R E 2
Top: PCA-UMAP using the first 11 principal components. The reference level L3 (Table 1) is colour and shape coded as in Figure 1. Note that this representation does not conserve distances and is designed to maximize groupings between similar entities, see Figure S2 for

| Concordance in hybrid zones
The hybrid zones studied here involve four pairs of lineages, as listed in Table 3. We considered three types of populations:  .
Nevertheless, these two zones exhibit strong differences in their

| D ISCUSS I ON
4.1 | Mytilus mussels are genetically differentiated, but fixed differences are extremely rare One aim of this study was to develop a panel of highly differentiated, and potentially diagnostic markers for the Mytilus complex in the Northern Hemisphere. To this end, we started with 51,878 high-quality SNPs from 1,269 sequences  and selected SNPs with the greatest discriminatory power. We combined differentiated variants between species with intra-specific F ST outliers to provide a more detailed identification. This procedure was successful, in that we were able to discriminate not only individuals of the 'E' allele was observed in M. galloprovincialis Borsa et al., 1999;Hamer et al., 2012;Wood et al., 2003) and the 'G' allele in M. edulis Kijewski et al., 2009;Luttikhuizen et al., 2002). In Mytilus mussels, therefore, it seems prudent to replace single marker diagnosis with multilocus inference, and our results suggest that 5-10 well chosen loci should be sufficient for this purpose, but not less.
It is unlikely that this situation is unique to mussels. In the highly divergent Ciona tunicate species, C. robusta and C. intestinalis, loci initially assumed to be diagnostic and used to identify F I G U R E 5 Left panels: Hybrid index (h) distributions from introgress, for each group of the hybrid zone. Middle panels: Genomic clines computed with the local parental groups (P1_local, P2_local or distant when not available) with clines presenting a significant excess in either α or β parameters highlighted in black. Right panels: Genomic clines computed with distant populations as parental groups. For genomic clines, only markers with an allele frequency difference > 0.3 between P1_distant and P2_distant are drawn. The probability of the locus-specifc ancestry for P2 against the genome-wide hybrid index (h) is plotted. When distant parental populations are used in place of local ones, the number of discordant loci increases. See Table 3 for a list of the populations used in the analyses hybrids Nydam & Harrison, 2011;Sato et al., 2014) were subsequently found to harbour shared polymorphisms in a multilocus analysis, and heterozygous genotypes were found in parental populations, both at the initially studied markers and at many of the newly designed markers . As for the Mytilus system, Ciona species have very high nucleotide diversities and have been shown to go through a history of secondary contact (Roux et al., 2013) resulting in a higher proportion of high frequency derived alleles either shared between populations or private to one population (i.e. top left and bottom right of the unfolded joint allele frequency spectrum) than expected for well isolated and low N e species.

| Enhanced local introgression is widespread and has several possible causes
A notable feature of our data was the appearance of heterospecific alleles that were present at high frequency in one or a small number of populations. These alleles were very common (at least one such allele was found in each population of each lineage of all three species), but were found heterogeneously across the genome (Figures 3   and 4). Fraïsse et al. (2016) also found this pattern, in their much smaller sample. Furthermore, their use of genome-wide data, combined with analyses of gene-genealogies, suggested that the pattern was due to local introgression, rather than, say, incomplete lineage sorting (e.g. the local sweep of a shared ancestral variant).
Although our data do not allow us to distinguish between these possibilities, and although both have no doubt taken place, our use of the data of Fraïsse et al. (2016) to choose our SNPs, combined with the finding that these alleles are preferentially found in contiguous populations (Figures 3 and 4), strongly suggest high levels of local introgression in the Mytilus complex .
The biogeography of the complex also allowed us to study in-  Hilbish et al., 2012). This structure revealed multiple instances of enhanced local introgression, that is, introgression events that are localized both genomically, and geographically, with allelic variants crossing one barrier to gene flow, but halted at a subsequent barrier (Figure 3).
Three possible mechanisms could explain the heterogeneity of local introgression. First, introgression could be adaptive (Hedrick, 2013;Pardo-Diaz et al., 2012;Racimo et al., 2017;Staubach et al., 2012). In this case, the fact that introgression was halted at a subsequent barrier could be explained either by an environmental difference (such that the allele was only locally beneficial), or by a very strong barrier (in which case the halting would be transient).
However, Fraïsse et al. (2016) found no clear links between the biological function of outlier genes and the environmental conditions of introgressed populations. Alternatively, if the markers are not the direct target of selection-as seems highly plausible )-a marker might hitchhike through one barrier, but be halted at a second (Barton, 2000;Faure et al., 2008).
A second possibility is that introgression acts to reduce genetic load in the recipient population (Harris & Nielsen, 2016;Juric et al., 2016;Kim et al., 2018). This scenario is similar to adaptive introgression, but emphasises the role of deleterious mutations in the recipient population rather than advantageous mutations in the donor population.
Third, and finally, introgression could involve bi-stable variants in a tension zone (Barton & Hewitt, 1985). Such variants can individually move via an asymmetry in parental fitness, that is when one parental genotype is fitter than the other and both are fitter than hybrids (Barton, 1979a;Barton & Turelli, 2011). However, they can easily be trapped for long periods by a density trough, a barrier to dispersion and coupling (Barton & Turelli, 2011;Bierne et al., 2011, see El Ayari et al., 2019 for a possible example in Mytilus). This scenario also implies transience in the structure, because tension zones can move due to genetic drift or changes in the environmental conditions (Piálek & Barton, 1997). In other words, rare stochastic events can free the 'pushed' wave of advance (Piálek & Barton, 1997). This hypothesis was already proposed in Mytilus by Fraïsse et al. (2014) and has also been proposed for other systems such as Bathymodiolus mussels (Plouviez et al., 2013), sea bass (Duranton et al., 2020), killer whales (Foote et al., 2019) and house mouse (Macholán et al., 2008).

| High overall concordance of barrier loci at hybrid zones
With high levels of local introgression, it can be difficult to study genomic clines. With our data, this was evident in the occasional failures of our cline analyses, for instance in the Aquitaine hybrid zone.
Nevertheless, in most cases these analyses yielded clear results: a high level of concordance with respect to local parental populations (Figure 5 middle panels) and increased discordance with respect to distant parental populations (again, consistent with local introgression).
Even with local parental populations, we observed a few SNPs that deviated from the genomic average. The anomalous 'excess ancestry' loci observed in Scotland and Øresund in the local analysis could still be a consequence of enhanced local introgression, as P1_local populations were missing from our sampling. Nonetheless, deviating loci are still observed, for example in Brittany ( Figure 5).
These loci tend to introgress more than the others in the centre of the zone, but-unlike established introgression-have not escaped the genetic barrier. We suspect that these loci might contribute to bi-stable variation that is asymmetric, as described above, but with a lower fitness difference between parental genotypes. Genomic cline outliers with asymmetric introgression are often interpreted as evidence of adaptive introgression, that is directional selection (Gompert & Buerkle, 2009). Our results support directional selection in the centre of the hybrid zones, but not in flanking parental populations. If a force is opposing the spread of the adaptive variant from a barrier, it could be selection against heterozygous or recombinant hybrid genotypes (Barton, 1979b).
Finally, the observation of a small but sometimes significant correlation of the genomic cline parameters can point us to parallel processes that may be at play in the hybridization of Mytilus species. Here, correlations were only found in hybrid zones implicating the same species and lineages. However, Simon et al. (2020) already showed that parallel deviations in admixed populations of mussels exist, and that it can also sometimes happen between different lineages. Such cases call for further investigations of potential selective effects in the regions of the studied markers, either due to positive selection or barrier loci.

| The timing and context of introgressions
Given the heterogeneity in the introgression patterns we have observed, it seems likely that they were shaped by both contemporary contacts, and historic contacts during the Quaternary period (Hewitt, 2000).
In some cases, our data allow us to make inferences about the timing of specific introgression events. For example, local introgression of M. edulis alleles into M. galloprovincialis from the Black Sea and Mediterranean is consistent with ancient contacts between these populations Gosset & Bierne, 2013). The two private introgressions into the Black sea were nevertheless unexpected. Fraïsse et al. (2016) did not sample these populations, and so our data set was not enriched for discriminatory SNPs. Therefore, such introgressions might be quite numerous in the rest of the genome.
By contrast with these putatively ancient introgressions, it is likely that the private introgressions found in M. galloprovincialis from Brittany (Figure 3a, B2) are relatively recent. This is probably because, given its position enclosed in the mosaic, this population only became established after the last post-glacial period.
More recently still, it is likely that the admixed individuals between Mediterranean M. galloprovincialis and South-European M. edulis, observed in the port of Cherbourg represent a recent human-mediated introduction. Similar admixed individuals have been observed in other ports in the English Channel and French Atlantic coast (Simon et al., 2020), where these 'dock mussels' form smallscale hybrid zones with the native lineages outside of the port.
In one final case, shared introgressions allow us to make inferences about the historical biogeography of the Mytilus complex. In particular, we found that several introgressions from M. trossulus to M. edulis were shared between M. edulis populations in Northern Europe and America, and the most parsimonious explanation is that these introgressions predated their split. This hypothesis sheds light on the origins of the North-European M. edulis population. We have shown that this population is differentiated from both American, and South-European M. edulis, and appears as intermediate in the PCA ( Figure S2), Structure ( Figure 2 and Figure S5) and hybrid index analyses ( Figure 5). This observation previously made in Simon et al. (2020), has recently been confirmed by Wenne et al. (2020). Given its presence as the parental M. edulis population in the Øresund hybrid zone, and the complete absence of American ancestry in the Netherlands, the border between Southern and Northern European M. edulis probably falls somewhere near to the Danish coast (see also the previously observed differences between the North Sea and populations North of the Kattegat region; Stuckas et al., 2009).
Given the shared introgressions, therefore, the contemporary biogeography could reflect a recolonization of America after the last glacial maximum, by a proto-North-European M. edulis, which was later introgressed by the South-European M. edulis (Wares & Cunningham, 2001). Alternatively, American M. edulis, having survived in a refugium (Riginos & Henzler, 2008), might have colonized the North Atlantic and Scandinavia, and then been introgressed by the South-European M. edulis in Europe. Finally, although North-European M. edulis and M. trossulus live in sympatry in several locations around Scandinavia and northern Russia (this study; Katolikova et al., 2016;Simon et al., 2020), our analyses show that introgression is not higher than in American populations.

| Implications for speciation research
The Mytilus species complex is characterized by diverse levels of genetic differentiation between lineages and species, and by pervasive introgression due to a dense history of secondary contacts and mosaic geographical distribution. As such, it constitutes a powerful model to investigate the maintenance of species barriers in the marine environment. Based on our results and previous studies Fraïsse et al., 2016), we suggest that secondary contacts are not only periods of neutral introgression, but also periods during which genetic incompatibilities can be swamped resulting in a punctuated decay of the barrier to gene flow. This vision is rooted in the observation that gene flow after secondary contacts in the Mytilus complex is shown to be heterogeneous along the genome Roux et al., 2014Roux et al., , 2016. In regions of higher gene flow, some alleles have locally introgressed. Those can either be missed when distant parental populations are not taken into account, or observed and interpreted as local adaptive introgression. Another explanation exists, considering that bi-stable variants, which could be associated with or responsible for genetic incompatibilities, can punctually escape a zone of coupling. Overall, our results suggest that asymmetrical parental fitness differences may enhance introgression at some regions of the genome yet successive barriers can prevent or delay propagation.