Local introgression at two spatial scales in mosaic hybrid zones of mussels

When the ranges of closely-related lineages are large, and overlapping, we can often study introgression at many “replicated” contacts, with different locations and spatial scales. Here we analysed multiple contact zones of the M. edulis complex of marine mussel species, which represent a mosaic distribution of heterogeneously differentiated, semi-isolated genomes. Our aim was to contrast ongoing introgression at the heart of hybrid zones, with past introgression between similar parental populations, at increasing distance from the contact. Using a panel of ancestry-informative SNPs derived from a previous genomic study, we first confirm, with a broader sampling, that local introgression, affecting one but not all of the populations compared, is both widespread and heterogeneous across the genome. Some outlier loci show patterns of complete introgression in certain populations, and an absence of introgression in others. Genomic cline analyses reveal a globally high concordance among loci at a local scale, albeit with signals of asymmetric introgression at a few loci. Enhanced local introgression at specific loci is consistent with the early transfer of adaptive variants after contact, possibly including asymmetric bi-stable variants, or less loaded alleles. Given the mosaic structure of the M. edulis complex, with a succession of genetic barriers to gene flow, variants with enhanced introgression through one barrier can be trapped, maybe transiently, at the next barrier, confining introgression locally. This makes the Mytilus complex an ideal model of the heterogeneous porosity of species barriers.


Introduction
Divergence between species is almost always accompanied by hybridisation (Abbott et al., 2013), and so the study of speciation is intertwined with the study of introgression. Speciation researchers are interested in both contemporary introgression, which can inform us about the nature of incompatibilities between lineages, and in historic introgression, which can inform us about the history of the speciation process. However, the comparison of ongoing and ancient introgression rates is difficult, and therefore rarely done (Bouchemousse, Liautard-Haag, Bierne, & Viard, 2016;Chaturvedi et al., 2019). Estimating patterns of introgression across the genome can reveal regions of reduced gene flow, due to genetic incompatibilities, divergent selection or pre-zygotic isolation. However, these patterns are also confounded by factors such as linked-selection, and variation in mutation and recombination rates (Ravinet et al., 2017). The study of enhanced introgression in one population, when compared to a reference parental population (sometimes called 'local introgression', Fraïsse, Belkhir, Welch, and Bierne, 2016) is an efficient approach to isolating the effect of introgression from other factors (Green et al., 2010;Martin, Davey, & Jiggins, 2015). In addition, the study of multiple cases of contact between similar lineages allows us to study the effect of the same barrier in multiple demographic and ecological contexts (Abbott et al., 2013), and to test the repeatability of introgression patterns (Harrison & Larson, 2016;Simon et al., 2019).
Many hybrid zones result from secondary contacts, and multiple outcomes can be expected depending on intrinsic and extrinsic factors. The absence of strong enough genetic or migration (i.e. physical) barriers, as well as demographic unbalance, can result in massive introgression. In this scenario, only strongly selected loci would resist introgression while most of the genome tends to homogenise (introgression swamping). If the barrier to gene flow is strong enough, only adaptive and compatible alleles (and flanking hitchhiker loci) can cross it without delay (Nicholas H. Barton, 1979a;Faure, David, Bonhomme, & Bierne, 2008). If the variation is bi-stable (Nicholas H. Barton & Turelli, 2011) such that one parental genotype is fitter but heterozygous or recombinant genotypes are unfit, the spread of the fittest genotype is hindered by the barrier (Nicholas H. Barton, 1979a;Piálek & Barton, 1997). Stochastic processes, such as random drift, or variable migration rates, can free the spreading wave from the barrier trap, but the delay can be very long (Piálek & Barton, 1997). Most of the time, in a setting where genetic and migration barriers are coupled, secondary contact will result in a 'porous barrier' with a slow erosion of the differentiation at a rate that depends on recombination rates and the density of barrier loci (Nicholas H. Barton & Bengtsson, 1986). This has been observed in many recent genome papers (Aeschbacher, Selby, Willis, & Coop, 2017;Duranton et al., 2018;Gagnaire et al., 2018;Martin, Davey, Salazar, & Jiggins, 2019;Roesti, Moser, & Berner, 2013;Schumer et al., 2018).
In the marine environment, and especially for broadcast spawners with a dispersive larval phase, migration barriers are potentially rare and species are likely to be subdivided by hybrid zones rather than by geographic isolation. Nevertheless, where dispersal barriers do exist -such as at well-known bio-geographic boundaries characterised by oceanic fronts and environmental shifts -genetic breaks tend to coincide with them. This is well illustrated in the complex of marine mussels, comprising the three species M. trossulus, M. edulis and M. galloprovincialis. This Mytilus species complex is subdivided into a mosaic of semi-isolated genetic clusters, by numerous hybrid zones El Ayari, Trigui El Menif, Hamer, Cahill, & Bierne, 2019;Fraïsse, Belkhir, et al., 2016;Riginos & Cunningham, 2005;Strelkov, Katolikova, & Väinolä, 2017). In the northern hemisphere all three species form a geographic mosaic delimited by multiple hybrid zones (Fraïsse, Belkhir, et al., 2016). In addition, each species is itself subdivided into infra-specific lineages, sometimes by geographic isolation (East and West Atlantic in M. edulis and M. trossulus, Riginos and Henzler, 2008;Varvio, Koehn, and Väinolä, 1988) and sometimes by hybrid zones (Atlantic and Mediterranean lineages in M. galloprovincialis, El Ayari et al., 2019;Quesada, Zapata, and Alvarez, 1995). Therefore, this complex provides us with numerous post-glacial hybrid zones with a continuum of divergence between the interacting taxa. Although introgression can sometimes be extensive, both intrinsic and extrinsic mechanisms of reproductive isolation maintain narrow admixture zones in which early generation hybrids are continuously produced. Finally, genetic differentiation is highly heterogeneous across the genomes, with a gradient from virtual panmixia (Boon, Faure, & Bierne, 2009) to differentially fixed loci (Fraïsse, Belkhir, et al., 2016). Precisely because of this heterogeneity, however, it is challenging to identify diagnostic loci for the Mytilus complex, even for the species M. edulis and M. galloprovincialis. We took advantage of a population genomics analysis of around 1300 more than 5 Kb long contigs (Fraïsse, Belkhir, et al., 2016) to develop a panel of ancestry-informative markers. This panel allowed us to study introgression between parental populations of the mosaic, as well as within hybrid zones using genomic cline analysis (Gompert & Buerkle, 2011). After a verification that the SNP panel was effective, we confirm that local introgression is both widespread and heterogeneous across the genome. At some loci identified as outliers in (Fraïsse, Belkhir, et al., 2016), heterospecific alleles have sometimes fixed, or nearly so, in one parental patch close to a hybrid zone while they remain nearly absent from other population patches farther from the zone. Genomic clines suggest high concordance in the heart of hybrid zones, although a few loci do depart from the genomic average, and demonstrate asymmetric introgression. Unlike outliers that exhibit enhanced local introgression at a large scale, the introgression of these genomic cline outliers does not extend outside of the admixture zones.

Sampling
Mytilus spp. individuals were sampled from 58 locations, including several known hybrid zones ( Figure 1, Table 1). Sampling sites are located on the American Pacific coast, the American and European North Atlantic coasts, and in the Mediterranean, Baltic, North, Barents and Black seas. 441 individuals were newly genotyped and 72 genotypes by sequencing (GBS) were extracted from the published dataset of Fraïsse, Belkhir, et al. (2016).

Assay design
We aimed to genotype ancestry informative loci, across a large number of samples, in a costeffective manner. For this purpose we used an Illumina BeadXpress® assay with Veracode™ technology (GoldenGate Genotyping Assay). We designed an assay of 384 SNPs (being the multiplexing limit of the technology).
Loci were selected, prior to genotyping, based on their ancestry informativeness, using the published results of (Fraïsse, Belkhir, et al., 2016). Briefly, this database was produced via a target enrichment method on the three species and multiple populations of the Mytilus species complex (Fraïsse, Belkhir, et al., 2016). It contains 1269 contigs sequenced for 72 individuals from eleven different locations http://www.scbi.uma.es/mytilus/index.php.
Markers with a minimum allele frequency of 0.05 and a maximum missing data percentage of 50% were retained. Coverage was estimated as a mean computed on three populations  (Murgarella et al., 2016) to exclude regions close to intron/exon limits as flanking sequences in 3' and 5' of the SNP were needed to design primers for PCR amplification. For the same reason, SNPs close to the start and end of the contigs were also excluded. An ADT score, produced by Illumina Assay Design Tool, quantifies the expected amplification success, and was used to filter the SNPs with the most probable design success (ADT score > 0.4). Finally, the most differentiated SNPs (using F ST ) between any population from Fraïsse, Belkhir, et al. (2016) were retained. After this filtering, a few contigs of special interest in mussels were rescued (e.g. immune genes or ecologically relevant traits like adhesion protein). Finally, SNPs were classified in terms of comparison informativeness, and selected to constitute a final 384 SNPs dataset, balanced across comparisons ( Table 2). The additional SNP ("extra") correspond to an amino-acid variant introgressed into M. galloprovincialis Atlantic from M. edulis detected in Fraïsse, Belkhir, et al. (2016) (contig gi_385288268_emb_Contig56466, annotated for a tumor necrosis factor).

Genotyping and filtration
Genotyping of the 441 individuals was performed with the BeadXpress® (hereafter BXP) technology. Among the 384 SNPs genotyped, 252 were readable (clusters of homozygotes/heterozygotes well defined) and 132 were lost. This low rate of successful amplification is expected in such highly polymorphic species. 40 additional SNPs were removed due to a differentiation between the BXP and GBS typing within populations. The threshold of missing data per SNP and per individual was set to 10%. One marker (190) was rescued as the missing data was mostly due to low amplification in M. trossulus (overall 12% missing data).
The genotypes for the 77 GBS individuals were retrieved from the calling in Fraïsse, Belkhir, et al. (2016).
Two replicated individuals between the GBS and BeadXpress were present in the dataset. They showed mismatch levels of, respectively, 2.83% and 2.36% between the two experiments, and this was due entirely to a well-known heterozygote assignment bias in GBS experiments. The two replicated GBS individuals were removed from further analyses.
Four samples were removed from the analyses prior to filtering as they have already proven to be cancerous individuals (Riquet, Le Cam, Fonteneau, & Viard, 2016, July 2015 (Por_40 from population 34; Arsud_05, Arsud_07 from population 38; and Ret_04 from population 19).
Reference populations were defined on previous knowledge of the M. edulis species complex (Fraïsse, Gunnarsson, Roze, Bierne, & Welch, 2016;Simon et al., 2019). We used three levels of differentiation representing the species level (L1), the ocean basin or continent (L2) and regional groupings (L3) ( Table 1). We used the GBS samples from Fraïsse, Belkhir, et al. (2016) as references and previously untyped individuals were assigned to a reference group if they belonged to the same GBS population or were close in geographic distance from it. We used a preliminary Admixture analysis at the species level (L1) using K = 3 and default settings (30 independent runs merged with CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015)) to correct the reference groups for migrants or hybrids, as sympatry and hybridisation is a common phenomenon in the M. edulis species complex. Individuals assigned to those populations, and not filtered out, constitute the "reference dataset".
Due to suspicious levels of M. trossulus ancestry in locations devoid of this species, which could indicate the presence of a transmissible cancer (Riquet et al., 2016, July 2015, two additional samples from the Mediterranean sea were removed (MTP_05 and Collo_10 from populations 48 and 52 respectively).
A more stringent filtration was additionally applied to the edu_am population when used as a reference, given the presence of European ancestry in the Long Island Sound (populations 15 and 16, Figure S10, also described in Simon et al., 2019). Filtering yielded 514 individuals genotyped at 212 markers.
Hardy-Weinberg equilibrium was tested for the remaining markers within putative panmictic clusters outside of known hybrid zones (pegas 0.11). Exact tests were performed using 1000 bootstraps and p-values were adjusted for false discovery rate using the Benjamini-Yekutieli method (Benjamini & Yekutieli, 2001).
We used a partial genetic map produced in Simon et al. (2019) based on an F2 cross between Mediterranean M. galloprovincialis and M. edulis, genotyped at a subset of the markers studied here (Bierne, Bonhomme, Boudry, Szulkin, & David, 2006;Bierne, David, Boudry, & Bonhomme, 2002;Simon, Bierne, & Welch, 2018). In addition to this genetic map, we extrapolated genetic distances from physical distance for markers present on the same contig using a recombination rate of 2 cM/Mb (rounded from the estimate in Bierne, 2010).

Population structure
A principal components analysis (PCA) was performed using markers on different physical contigs (retaining the one with least missing data). This filtration was to avoid strong biases due to physical linkage, and led to a final set of 160 markers. The genotype data was centred and scaled using the adegenet R package (Jombart, 2008), with the replacement of missing data by the mean allele frequencies. Following the PCA, a dimensional reduction method, UMAP (Diaz-Papkovich, Anderson-Trocme, & Gravel, 2018;McInnes & Healy, 2018), was applied to the first 11 principal components. This threshold was chosen based on the expectation of 12 panmictic populations (level L3 of the reference groups, Diaz-Papkovich et al., 2018). This method was performed using the python package umap-learn (McInnes & Healy, 2018) and the R wrapper package umap (Konopka, 2019).
The Structure software was used to provide Bayesian estimates of ancestry with an admixture model. Structure was run with the admixture and linkage model (LINKAGE = 1). The dataset was filtered with the following steps: (i) markers out of Hardy-Weinberg equilibrium were removed; (ii) one marker per physical contig was retained (keeping the one with least missing data); and (iii) the genetic map was used to produce linkage information for the retained markers, with the assumption that markers absent from the genetic map were unlinked (for lack of further information).
For the global Structure analysis, 20 independent Monte Carlo Markov Chains (MCMC) runs of 20,000 burn-in iterations followed by 80,000 steps were performed to estimate model parameters for each K between 4 and 8. The standard deviation for the α prior was set to 0.05 for better mixing of the chains. The Clumpak software (Kopelman et al., 2015) was used to investigate and aggregate Structure outputs with an MCL threshold of 0.9. Only major clusters are presented in the results.

Hybrid zones analyses
Each hybrid zone was defined by parent 1 (P1), parent 2 (P2) and central populations (Table 3). Each parental population was classified as either "local" or "peripheral", according to observed levels of introgression or geography. In some cases the "local" parental population was not available in our sampling design and in these cases, the peripheral population was used in its place (Table 3). For each hybrid zone, three analyses were carried out: (i) computation of hybrid indexes with the R package introgress (Gompert & Buerkle, 2010), (ii) a local Bayesian clustering analysis using Structure, and (iii) a genomic clines analysis.
Structure analyses for each hybrid zone used the same parameters as for the combined data set, and the admixture with linkage model. The filtration of markers was similar to the global analysis. Each subset was filtered for irrelevant genetic backgrounds in the hybrid zone studied. For example, M. trossulus individuals were removed from the Northern M. edulis (Russia) reference populations for the Øresund hybrid zone study. Each hybrid zone was studied for a K of 2 and 3, to make sure there was no hidden substructure in the subset of individuals considered.
Genomic clines are used to detect markers deviating from the average genomic expectation given the distribution of hybrid indexes. The program bgc was used to estimate genomic cline parameters with a Bayesian method (Gompert & Buerkle, 2011 for each hybrid zone considered. Datasets were prepared with custom R scripts. For each hybrid zone, fixed markers were removed as they are uninformative. The same filtered marker sets as for the Structure analyses were used. Four independent chains of 200 000 iterations including 20 000 burn-in iterations with a thinning of 20 were performed. We used the ICARrho model for linked loci with the previously generated genetic map. The R package rhdf5 (Fischer, Pau, & Smith, 2019) was used to read the MCMC outputs. Convergence was assessed using the method and R code of Vehtari, Gelman, Simpson, Carpenter, and Bürkner (2019).
For each hybrid zone, we performed two analyses called "local" and "peripheral". The local analysis considers only the "central" population as admixed and uses the local P1 and P2 populations as parental references (Table 3). In the case of a missing local parental population, the peripheral corresponding one was considered (e.g. in the Øresund hybrid zone). The peripheral analysis considers both the central and local parental populations as admixed, while taking the peripheral P1 and P2 populations as parental references.
Loci exhibiting extreme deviations from the neutral genetic background were determined by two methods. First, we estimated locus-specific posterior distributions for the cline parameters α and β. Loci were classified as having "excess ancestry" if the 95% quantiles of these distributions did not include 0 (Gompert & Buerkle, 2012).

Results
We use a set of 212 ancestry informative markers to investigate the population genetics of several hybrid zones, and other introgressed populations in the Mytilus species complex. We start by showing that our target species and populations are identifiable with our SNP panel. We also identify previously uncharacterised lineages or admixed clusters. Then we analyse the differentiation and introgression patterns in the hybrid zones, at two spatial scales.

Power of the SNP panel and clustering results
Our markers were taken from a previous GBS study (Fraïsse, Belkhir, et al., 2016), and so we first tested if the markers continued to be informative in our larger sample. To do this, we correlated F ST values, between the complete dataset (BeadXpress and GBS), and the GBS genotypes considered alone (Table S2). Results showed good delimitation (i.e. high F ST correlations) between species and between known semi-isolated lineages within a species (e.g. Atlantic ocean, Almeria-Oran front), while comparisons between less separated entities were less successful (i.e. low F ST correlations). Overall, however, the assay design produced a strong enrichment in high F ST markers compared to the distributions of Fraïsse, Belkhir, et al. (2016), showing that our markers are ancestry informative ( Figure S1).
Although the markers are informative, another consequence of increasing the sample size was that truly diagnostic markers became rare or absent at the species level (Table 4). While our dataset contained many allele frequency differences greater than 0.7, only two markers showed fixed differences between species. Indeed, when considering all reference individuals of one species, only the marker 147 was fixed between M. edulis and M. galloprovincialis (Contig H_L1_abyss_Contig244 position 6092 in Fraïsse, Belkhir, et al., 2016) and only the marker 015 was fixed between M. edulis and M. trossulus (Contig Contig17324_GA36A position 1089 in Fraïsse, Belkhir, et al., 2016). No marker was fixed between M. galloprovincialis and M. trossulus.
Using our ancestry informative markers, five genetic clusters could be defined without ambiguity in a Principal Component Analysis ( Figure S2). This is visualised, and combined with a Structure analysis in Figure 2 (see also Figure S4). Figure 2, presents a clear picture of population structure in the Mytilus complex. First, the three species are clearly differentiated, and so are known genetic clusters within each species: (i) European and American M. edulis, (ii) Atlantic and Mediterranean M. galloprovincialis, (iii) Baltic and American/North-European M. trossulus. We observed that the Baltic population of M. trossulus (22) is introgressed by M. edulis, as previously described (Fraïsse, Belkhir, et al., 2016), and this cluster can be identified with Structure with K = 8 ( Figure S3).
The remaining genetic clusters in Figure 2 are labelled A-E and represent admixed populations (see also S2). For example, the hybrid zones of Brittany, Aquitaine and Scotland all involve Atlantic M. galloprovincialis and European M. edulis, and all are found in group C ( Figure 2). By contrast, the Algeria hybrid zone involves Atlantic and Mediterranean M. galloprovincialis, and is group D (Figure 2).
Group E contains a small group of individuals sampled in the leisure marina of Cherbourg in 2003, on the French coast of the English Channel (29-Cherbourg). These individuals exhibited an admixture, largely between Mediterranean M. galloprovincialis (66% ancestry) and European M. edulis, (29% ancestry). We note that a nearby population, in Barfleur (population 28, around 30 km from Cherbourg), is composed exclusively of European M. edulis, the  Figure S2). The ancestry intergradation observed in Figure S10 for populations 18 and 19 cannot be attributed to unaccounted M. trossulus ancestry, because those individuals do not exhibit introgression from this species in the global Structure analysis (Figure 2). A similar admixture is also visible in the Edinburgh population (23), which has additional Atlantic M. galloprovincialis ancestry due to its localisation close to the Scottish hybrid zone. Finally, group C corresponds to admixed individuals of the Øresund hybrid zone between the newly identified North-European M. edulis and the Baltic M. trossulus and populations 20-21 in Figure 2).
Structure analysis shows some further patterns. First, on the West coast of the USA (populations 01 to 07, Figure 2), we observe introduced Mediterranean M. galloprovincialis individuals, and a single F1 hybrid with the native M. trossulus parent (population 06). This is consistent with the report of Saarman and Pogson (2015).
Second, M. edulis samples from the Long Island region (USA, East coast, populations 10 to 16) were mainly assigned to the American M. edulis cluster (Figure 2), but there also appears to be some infraspecific ancestry coming from Europe. An analysis considering only M. edulis samples, Figure S10, shows more clearly that individuals from the most southern populations of the Long Island Sound sometimes have higher European ancestry than other American populations (e.g. Boston, population 17).

"Comets" of introgression
We investigated the allele frequency differences between an intergradation of populations both between M. edulis and M. galloprovincialis (Figure 3), and between M. edulis and M. trossulus ( Figure 4). Because of their visual signature ( Figure 3A), we define "comets" as loci with introgression in some, but not all, populations (Staubach et al., 2012). These comets of introgression were previously identified as within-species outlier loci (Fraïsse, Belkhir, et al., 2016).  (Table 1) is colour and shape coded. Note that this representation does not conserve distances and is designed to maximise groupings between similar entities, see Figure S2 for Because this study uses ancestry-informative SNPs, we lacked a neutral baseline required for outlier tests. Nevetheless, we confirmed a signal that is so strong as to make fully neutral interpretations unlikely. The M. edulis-M. galloprovincialis mosaic hybrid zones (Figure 3) can be seen as a large scale intergradation between the Mediterranean populations (starting in the Black Sea) and the North-European ones. As mentioned above, M. galloprovincialis populations are more introgressed by M. edulis alleles in the Atlantic than the Mediterranean (see mean hybrid index, hi, in Figure 3A). In addition, the local M. galloprovincialis population in Brittany (gallo_atl_brit), surrounded by two patches of M. edulis on either side, has a higher level of introgression than the local population in Aquitaine (gallo_atl_aqui). Interestingly, the Black-Sea population (gallo_med_bs) displays a few fixed introgressed alleles from M. edulis, contrasting with the rest of the Mediterranean basin ( Figure 3A and B6). The South-European M. edulis, both external and internal to the mosaic, have nearly as many comets of introgression as M. galloprovincialis ( Figure 3A and B4); and around half of these extend to the Northern Europe.
For the M. edulis-M. trossulus comparison, we investigated two intergradations, in America and Europe ( Figure 4). As with the M. edulis-M. galloprovincialis comparison, we observe comets of introgression in both continents. American and North European M. edulis populations share some comets ( Figure 4A), and almost none are private to either population (though see markers 183 and 087 in Figure 4). Both populations also show introgression from M. trossulus, although with different outcomes. Whereas the Baltic M. trossulus has been highly introgressed (exhibiting a mean hybrid index of 0.15), American M. trossulus does not show strong introgression, except at a few comets.

Hybrid zones
The hybrid zones studied here, involve four pairs of lineages, as listed in Table 3. We considered three types of populations: (i) "central" populations, from the heart of the hybrid zone; (ii) local parental populations (P1 and P2 local) on each side of the hybrid zone and impacted by direct migration; and (iii) peripheral parental populations (P1 and P2 periph) which are not in direct contact with the focal hybrid zone and so potentially less affected by local introgression.
The Øresund hybrid zone, includes the North-European M. edulis lineage (Figure 2, populations 20-21). Therefore, we treated such individuals from Russia as the P1 parental population. In this zone, the central population exhibits relatively homogeneous hybrid indexes, and is mainly composed of North-European M. edulis introgressed by M. trossulus (Figure 2). On the other side of the hybrid zone, the Baltic M. trossulus population is introgressed as shown above (Fraïsse, Belkhir, et al., 2016;Väinölä and Strelkov, 2011;Figures 4 and 5).
Finally, to highlight the intermediate character of the North-European M. edulis lineage, we carried out an analysis treating as admixed, the North-European M. edulis individuals in the Russian populations (18 and 19). As shown by the hybrid index and the Structure analyses, these mussels have an homogeneous admixture of around 60% South-European M. edulis and 40% American M. edulis.
When considering the correlation of genomic cline parameters (α and β) between hybrid zones, only two correlations proved statistically significant after correcting for multiple tests. The correlated parameters are α and β between the Scotland and Brittany local hybrid zone (Spearman correlation coefficients of 0.38 and 0.36, respectively, with p-values < 0.005), and β between the Brittany and Aquitaine hybrid zone (Spearman correlation coefficient of 0.38, p-value = 0.003).

Discussion
4.1 Mytilus mussels are genetically differentiated, but fixed differences are extremely rare One aim of this study was to develop a panel of ancestry-informative markers for the M. edulis complex in the Northern Hemisphere. To this end, we started with 51,878 high-quality SNPs from 1269 contigs (Fraïsse, Belkhir, et al., 2016), and selected SNPs with the greatest discriminatory power. This procedure was successful, in that we were able to discriminate not only individuals of the three species (M. edulis, M. galloprovincialis and M. trossulus) but also partially isolated genetic lineages within species (European and American M. edulis; Atlantic and Mediterranean M. galloprovincialis; and Baltic and American/North-European M. trossulus).
Nevertheless, of 92 ancestry-informative loci between M. edulis and M. galloprovincialis (Figure 3), only one was a fixed difference. Similarly, there was only one fixed difference among 126 ancestry-informative loci between M. edulis and M. trossulus (Figure 4). All the other SNPs were found at least once in heterozygous state (grey squares in Figures 3 and 4) and even sometimes as heterospecific homozygotes.
It is unlikely that this situation is unique to mussels. In the highly divergent Ciona tunicate species, C. robusta and C. intestinalis, loci initially assumed to be diagnostic and used to identify hybrids (Bouchemousse, Lévêque, Dubois, & Viard, 2016;Nydam & Harrison, 2011;Sato, Shimeld, & Bishop, 2014) were subsequently found to harbour shared polymorphisms in a multilocus analysis, and heterozygous genotypes were found in parental populations, both at the initially studied markers and at many of the newly designed markers (Bouchemousse, Liautard-Haag, et al., 2016).

Local introgression is widespread and has several possible causes
Because our SNPs showed extreme levels of differentiation, they provide a window on patterns of introgression in Mytilus. At the broadest level, our study confirms, with greater sampling, the findings of Fraïsse, Belkhir, et al. (2016), that introgression is pervasive in the complex. In the three species investigated, at least one population or lineage was impacted at some point by introgression. We have also shown that introgression is highly heterogeneous across the genome, with "comets" of heterospecific genotypes at some loci, while others resist introgression altogether (Figures 3 and 4).
The biogeography of the complex also allowed us to study introgressions of a very particular kind. In particular, the M. galloprovincialis/M. edulis transition is characterised by nearcontinuous intergradation between the Black Sea and Scandinavia, but with genetic barriers to gene flow at multiple points. This leads to a mosaic distribution in several regions. For example, the isolated M. edulis and M. galloprovincialis patches on the Atlantic coast of France, are separated by three hybrid zones, in Aquitaine, South Brittany, and Normandy (not sampled in this study) Hilbish et al., 2012). This structure revealed multiple instances of local comets of introgression, that is, introgression events that are localised both genomically, and geographically, with allelic variants crossing one barrier to gene flow, but halted at a subsequent barrier (Figure 3).
Three possible mechanisms could explain these local comets of introgression. First, the introgression could be adaptive (Hedrick, 2013;Pardo-Diaz et al., 2012;Staubach et al., 2012). In this case, the fact that introgressions were halted at a subsequent barrier could be explained either by an environmental difference (such that the allele was only locally beneficial), or by a very strong barrier (in which case the halting would be transient). Alternatively, if the markers are not the direct target of selection -as seems highly plausible (Fraïsse, Belkhir, et al., 2016) -a marker might hitchhike through one barrier, but be halted at a second (N. H. Barton, 2000;Faure et al., 2008). A second possibility is that the introgressions act to reduce genetic load in the recipient population (Kim, Huber, & Lohmueller, 2018). This scenario is similar to adaptive introgression, but emphasises the role of deleterious mutations in the recipient population rather than advantageous mutations in the donor population. Third, and finally, introgressions could involve bi-stable variants in a tension zone (Nicholas H. Barton & Hewitt, 1985). Such variants can move via an asymmetry in parental fitness, while being easily trapped by a density trough or a barrier to dispersion (see Nicholas H. Barton and Turelli, 2011, and a possible example in Mytilus in El Ayari et al., 2019). This scenario also implies transience in the structure, because tension zones can move due to genetic drift or changes in the environmental conditions (Piálek & Barton, 1997).

High overall concordance of barrier loci at hybrid zones
Our analyses of genomic clines, with local parental populations, show consistently high concordance between markers (Figure 5 middle panels). Indeed, the anomalous "excess ancestry" loci observed in Scotland and Øresund, while they could be local comets of introgression, are potentially due to the fact that the P1_local populations were not sampled.
The high levels of concordance provides a striking contrast to results when more distant, peripheral populations were treated as the parental reference (compare Figure 5 middle and right panels). With distant populations, the proportion of locally introgressed loci was so high, that it became difficult to estimate a "baseline" distribution. The very failure of these analyses further highlights the rampant local introgression in this system. Even with the local parental populations, we observed a few SNPs that deviated from the genomic average. These loci tend to introgress more than the others in the centre of the zone, but -unlike true comets of introgression -have not escaped the genetic barrier. We suspect that these loci might contribute to bi-stable variation that is asymmetric -i.e. one parental genotype is fitter than the other, but both are fitter than hybrids (Nicholas H. Barton, 1979a; Nicholas H. Barton & Turelli, 2011). Such variation can be trapped by genetic or migration barriers for long periods, but rare events of strong demographic stochasticity can free the "pushed" wave of advance (Piálek & Barton, 1997). Genomic cline outliers with asymmetric introgression are often interpreted as evidence of adaptive introgression, i.e. directional selection (Gompert & Buerkle, 2009). Our results support directional selection in the heart of the hybrid zones, but not in flanking parental populations. If a force is opposing the spread of the adaptive variant, it could be that selection against heterozygous or recombinant hybrid genotypes pushes in the opposite direction to selection for the fitter parental genotype, trapping the variant at the barrier (Nicholas H. Barton, 1979b).

The timing and context of introgressions
Given the heterogeneity in the introgression patterns we have observed, it seems likely that they were shaped by both contemporary contacts, and historic contacts during the Quaternary period (Hewitt, 2000).
In some cases our data allow us to make inferences about the timing of specific introgression events. For example, the local introgression of M. edulis alleles into M. galloprovincialis from the Black Sea and Mediterranean, is consistent with ancient contacts between these populations (Fraïsse, Belkhir, et al., 2016;Gosset & Bierne, 2013). The two private introgressions into the Black sea were nevertheless unexpected (Fraïsse, Belkhir, et al. (2016) did not sample these populations, and so our data set was not enriched for discriminatory SNPs), and so such introgressions might be quite numerous in the rest of the genome.
By contrast with these putatively ancient introgressions, it is likely that the private introgressions found in M. galloprovincialis from Brittany ( Figure 3B2) are relatively recent. This is because, given its position in the mosaic, it is probable that this population only became established during the last post-glacial period.
More recently still, it is likely that the admixed individuals between Mediterranean M. galloprovincialis and South-European M. edulis, observed in the port of Cherbourg, represent a recent human-mediated introduction. Similar admixed individuals have been observed in other ports in the English Channel and French Atlantic coast (Simon et al., 2019), where these "dock mussels" form small-scale hybrid zones with the native lineages outside of the port.
In one final case, shared introgressions allow us to make inferences about the historical biogeography of Mytilus. In particular, we found that several introgressions from M. trossulus to M. edulis were shared between M. edulis populations in Northern-Europe and America, and the most parsimonious explanation is that these introgressions predated their split. This hypothesis sheds light on the origins of the North-European M. edulis population. We have shown that this population is differentiated from both American, and South-European M. edulis, and appears as intermediate in the PCA ( Figure S2), Structure (Figures 2 and S10) and hybrid index analyses ( Figure 5). Given its presence as the parental M. edulis population in the Øresund hybrid zone, and the complete absence of American ancestry in the Netherlands, the border between Southern and Northern European M. edulis probably falls somewhere near to the Danish coast (see also the previously observed differences between North Sea and populations North of the Kattegat region Stuckas, Stoof, Quesada, & Tiedemann, 2009)). Given the shared introgressions, therefore, the contemporary biogeography could reflect a recolonisation of America after the last glacial maxima, by a proto-North-European M. edulis, which was later introgressed by the South-European M. edulis (Wares & Cunningham, 2001). Alternatively, American M. edulis, having survived in a refugia (Riginos & Henzler, 2008), might have colonised the North-Atlantic and Scandinavia, and then been introgressed by the South-European M. edulis in Europe.

Conclusion
We developed an ancestry-informative SNP panel powerful enough to classify species and divergent lineages of the M. edulis species complex. Following an extended sampling of reference populations, known hybrid zones and populations close to them, we confirm that local introgression is very widespread in the complex. Indeed, most of the markers that are highly differentiated between species pairs, are nonetheless segregating in both. The investigation of genomic clines in hybrid zones shows that loci are consistently concordant at a local scale, although a few asymmetric bi-stable variants might still be trapped by the genetic barrier. Overall our results suggest that asymmetrical parental fitness differences may enhance introgression at some regions of the genome yet successive barriers can prevent or delay propagation.