Abstract
High connectivity and low potential for local adaptation have been common assumptions for most marine species, given their usual high fecundity and dispersal capabilities. Recent genomic studies, however, have disclosed unprecedented levels of population subdivision in what were previously presumed to be panmictic or nearly panmictic species. Here we analyzed neutral and adaptive genetic variation at the whole-genome level in Atlantic herring (Clupea harengus L.) spawning aggregations distributed across the reproductive range of the species in North America. We uncovered fine-scale population structure at putatively adaptive loci, despite low genetic differentiation at neutral loci. Our results revealed an intricate pattern of population subdivision involving two overlapping axes of divergence: a temporal axis determined by seasonal reproduction, and a spatial axis defined by a latitudinal cline establishing a steep north-south genetic break. Genetic-environment association analyses indicated that winter sea-surface temperature is the best predictor of the spatial structure observed. Thousands of outlier SNPs distributed along specific parts of the genome spanning numerous candidate genes underlay each pattern of differentiation, forming so-called “genomic regions or islands of divergence”. Our results indicate that timing of reproduction and latitudinal spawning location are features under disruptive selection leading to local adaptation in the herring. Our study highlights the importance of preserving functional and neutral intraspecific diversity, and the utility of an integrative seascape genomics approach for disentangling intricate patterns of intraspecific diversity in highly dispersive and abundant marine species.
Introduction
Population subdivision and connectivity are important topics in evolutionary and conservation biology, because they can help elucidate how local adaptation arises (Barrett & Hoekstra, 2011; Lewontin, 2002) and can guide management plans aiming to protect intraspecific genetic diversity, a determinant factor for population persistence in changing environments (Allendorf, Hohenlohe, & Luikart, 2010). Yet, the scarcity of genomic resources for most species, and the difficulty in determining the relative importance of genetic drift, gene flow, and selection in shaping contemporary patterns of intraspecific genetic diversity, remain major challenges (Ravinet et al., 2017). The increased power for assessing neutral and putatively adaptive genetic variation with next-generation sequencing (NGS) technologies (Nosil & Feder, 2012) is helping to uncover unprecedented levels of genetic structure in what were previously presumed to be panmictic or nearly panmictic species.
Marine species are outstanding examples of such paradigm shifts, as they have often been expected and observed to exhibit low levels of population structure and low divergence potential (Palumbi, 1994), given their high fecundity and dispersal capabilities (Hauser & Carvalho, 2008). Recent genomic studies revealing fine-scale structuring in diverse marine species are challenging this view [e.g., Atlantic cod (Gadus morhua) (Bradbury et al., 2013); Atlantic herring (Clupea harengus) (Martinez Barrio et al., 2016); American lobster (Homarus americanus) (Benestan et al., 2015)]. Various mechanisms by which population structure could arise in the sea have been proposed, including oceanographic barriers, isolation-by-distance, larval and adult behavior, recent evolutionary history (e.g., historical vicariance and secondary contact), and natural selection (Palumbi, 1994). There is great interest in understanding how natural selection can lead to population divergence and local adaptation, especially under the homogenizing effect of gene flow (Tigano & Friesen, 2016), because of its direct relationship with fitness, population persistence, and evolution. However, the genetic basis of adaptive traits remains largely unknown (Barrett & Hoekstra, 2011). Genome scans performed with NGS methods are helping to identify loci associated with adaptive phenotypes (Jones et al., 2012; Tavares et al., 2018). Such loci typically show elevated genetic divergence that is interpreted as a signature of selection. Nevertheless, disentangling genomic signatures of selection from signatures of demographic history has been challenging (Hoban et al., 2016). Species that are widely distributed are often exposed to diverse ecological habitats where selection can result in local adaptation (Yeaman & Whitlock, 2011).
Therefore, highly fecund marine species inhabiting heterogeneous environments offer ideal candidates for the study of ecological adaptation, since in these the effect of genetic drift is minuscule and the effectiveness of natural selection is greater.
Atlantic herring is an abundant marine schooling pelagic fish that has colonized diverse environments throughout the North Atlantic, including open ocean and the brackish waters of the Baltic Sea. These characteristics, together with the increasing availability of genomic resources, make this species ideal for investigating the genetic basis and mechanisms involved in ecological adaptation. Juveniles and adults undertake annual migrations between feeding, overwintering, and spawning areas. Herring matures at 3-4 years of age and can live to 20+ years (Benoît et al., 2018). Spawning occurs mostly in spring and fall seasons at predictable times and locations near shore, which suggests strong spawning site fidelity (McQuinn, 1997; Stephenson, Melvin, & Power, 2009; Wheeler & Winters, 1984). Atlantic herring plays an important role in the marine ecosystem, feeding on plankton and being preyed upon by numerous marine fish, birds and mammals. It also sustains large fisheries throughout the North Atlantic (FAO, 2019), some of which have experienced severe periods of decline and signs of recovery in the last century (Britten, Dowd, & Worm, 2016; Engelhard & Heino, 2004; Overholtz, 2002; Simmonds, 2007). The ecological, economic, and cultural importance of herring has therefore motivated research on this species for more than a century (Stephenson et al., 2009); however, its complex life history has made the description of its population structure elusive (Iles & Sinclair, 1982).
Numerous studies have examined the population structure of herring using different genetic tools and at various spatial scales, mostly in the northeast (NE) Atlantic. Such studies have observed low levels of population differentiation at neutral loci (Andersson, Ryman, Rosenberg, & Ståhl, 1981; André et al., 2011; Jorgensen, Hansen, Bekkevold, Ruzzante, & Loeschcke, 2005). The expansion of these studies to the use of thousands of single nucleotide polymorphisms (SNPs) derived from various genomic techniques has revealed significant genetic differentiation at putatively adaptive loci in relation to environmental gradients (Guo, Li, & Merilä, 2016; Lamichhaney et al., 2012; Limborg et al., 2012). Moreover, the recent development of a high-quality genome assembly for the Atlantic herring allowed the identification of many millions of SNPs and a breakthrough in the possibility to study the genetic basis of ecological adaptation in this species (Martinez Barrio et al., 2016). A few studies have addressed this question in the northwest (NW) Atlantic (Kerr, Fuentes-Pardo, Kho, McDermid, & Ruzzante, 2018; Lamichhaney et al., 2017; McPherson, O’Reilly, & Taggart, 2004). While they provided important insight on population structuring with seasonal reproduction and within the southern region, and reported temporal stability of genomic divergence between spring and fall spawners, they were limited by scarce sampling.
In the NW Atlantic, herring spawn from Cape Cod to southern Labrador (Bourne, Mowbray, Squires, & Koen-Alonso, 2018; Sinclair & Iles, 1989) between April and November, but spawning peaks in spring and fall. Spring- and fall-spawners are therefore the main spawning types recognized in the region. The relative abundance of each reproductive strategy varies geographically: in the north (northern Newfoundland) spring-spawners were historically more abundant, at mid-range (Gulf of St. Lawrence) both strategies were common, and in the southern extreme (Bay of Fundy, Scotian Shelf, Gulf of Maine) fall-spawners predominate (Melvin, Stephenson, & Power, 2009). Changes in the prevalence of these components have been observed in the last decade; in particular, a significant decline of spring-spawners and a moderate abundance of fall-spawners in the Gulf of St. Lawrence (McDermid, Swain, Turcotte, Robichaud, & Surette, 2018) and Newfoundland (Bourne et al., 2018). Such changes have been attributed to varying elevated fishing mortality, declines in weight-at-age, and environmental conditions (Melvin et al., 2009), suggesting that the effects of fishing pressure and climate change on population persistence of Atlantic herring are important. The concerning population declines (Britten et al., 2016) emphasize the need to disentangle the population structure of NW Atlantic herring.
Here, we study neutral and adaptive variation of adult herring collected from 14 spawning grounds distributed across the species’ reproductive range in the NW Atlantic. The two overarching questions were: i) What are the spatial scale and pattern of population structuring in herring and what is the genetic basis of such structuring, and ii) What is the potential functional effect of variant sites underlying population divergence and which mechanisms and environmental variables are associated with population structure patterns? We used whole-genome re-sequencing of pools of individuals [Pool-seq, (Schlötterer, Tobler, Kofler, & Nolte, 2014)] and individual genotyping along with multivariate statistical approaches, machine learning algorithms, and oceanographic information, to address these questions. Considering the particular attributes of the NW Atlantic Ocean (DFO, 1997; Townsend, Thomas, Mayer, Thomas, & Quinlan, 2004) and the importance of environment for shaping population divergence in herring, we predict that some of the divergent genomic regions exclusively found in Canada may be strongly associated with local environmental conditions. Our results provide insight into how population divergence arises in the presence of gene flow via temporal and spatial isolation and will help inform management and conservation practices.
Materials and Methods
Sample collection and DNA extraction
Adult herring (N=697) were collected from 14 inshore spawning aggregations distributed across Atlantic Canada and the Gulf of Maine (Fig. 1A and Table 1). Collections took place during the local spawning peak in the spring and fall seasons from 2012 to 2016. Sampling locations correspond to areas with recurrent annual spawning and jointly represent most of the reproductive range of the species in the NW Atlantic. Because of the presumed spawning site fidelity and the mixing of populations during the non-spawning seasons, we targeted individuals in reproductive condition to assess population definition. Individual muscle or fin tissue samples were preserved in 95% ethanol at −20 °C until processing. DNA was isolated from the tissue samples using a standard phenol-chloroform protocol. DNA concentration (in ng/µl) was measured in triplicate using the Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific, U.S.) and the Roche LightCycler 480 Instrument (Roche Molecular Systems, Inc., Germany). DNA integrity was verified with 0.8% agarose gel electrophoresis using 0.5x TBE buffer and a 1 Kbp molecular weight ladder.
Pool-sequencing and read quality filtering
Genome-wide patterns of genetic variation and population allele frequencies were assessed for each spawning aggregation using the Pool-seq approach. This method consists of performing whole-genome sequencing of pools of individuals using a single barcoded library, which implies that only population-level data are recovered (i.e., individual genotype information is lost). In our case, each pool comprised equal amounts of DNA from ∼50 individuals collected on the same spawning ground (the terms spawning aggregation and sampling site are used interchangeably hereafter). Individual DNA samples were normalized to a common concentration and pooled into a single tube using the liquid handling robot epMotion 5407 (Eppendorf, Germany). Sequencing library preparation and shotgun sequencing were outsourced. In brief, a single TruSeq Nano Illumina DNA library was built for each DNA pool (i.e., spawning aggregation). AMPure beads were used for fragment size selection, targeting an insert size of ∼550 bp. The 14 pooled-DNA libraries were sequenced using paired-end 126-bp reads on an Illumina HiSeq 2500 sequencer in two batches (5 libraries in 2015, 11 in 2016). Target read depth of coverage per pool was 40-50x, for an estimated herring genome size of ∼850 Mbp (Martinez Barrio et al., 2016).
Quality of raw sequence reads of each pool was checked using FastQC v0.11.5 (Andrews, 2010), and jointly evaluated for the 14 pools with MultiQC v.1.3 (Ewels, Magnusson, Lundin, & Käller, 2016). Low quality bases (Phred score <20) and Illumina adapters were trimmed-off the reads, and reads shorter than 40 bp were removed from the dataset using Trimmomatic v.0.36 (Bolger, Lohse, & Usadel, 2014) [parameters: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:40]. High quality paired-reads remaining after filtering were used for downstream analysis.
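The SLIDINGWINDOW:5:20 and MINLEN:40 settings above can be illustrated with a minimal Python sketch. It is a simplified stand-in written for illustration, not Trimmomatic's actual implementation (which differs in details such as how the final cut position is refined). The function names and the handling of reads shorter than one window are assumptions.

```python
def sliding_window_trim(quals, window=5, min_avg=20):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean Phred quality drops below min_avg (simplified illustration).
    """
    if len(quals) < window:
        # Assumption: a read shorter than one window is judged as a whole.
        return len(quals) if quals and sum(quals) / len(quals) >= min_avg else 0
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def filter_read(seq, quals, window=5, min_avg=20, min_len=40):
    """Trim a read, then drop it (return None) if it is shorter than min_len."""
    keep = sliding_window_trim(quals, window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]
```

For example, a 60-base read with 50 high-quality bases followed by 10 low-quality ones is shortened but retained, whereas a read whose quality collapses after 30 bases falls under the 40 bp minimum and is discarded.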
Read mapping, SNP calling and filtering
We adapted the Genome Analysis Toolkit (GATK) Best Practices workflow (Van der Auwera et al., 2013) to variant discovery in Pool-seq data and to our computing infrastructure. For this, we first obtained a stitched version of the herring genome for optimal SNP caller performance on the available computer cluster. Then, sequence reads of each pool were independently aligned against the stitched herring genome using the Burrows-Wheeler Aligner (BWA) v0.7.12-r1039 [default parameters, MEM algorithm] (Li, 2013). SNP calling was performed using GATK v3.8 (McKenna et al., 2010) (see Fig. S1). Lastly, the raw variant calls were filtered using GATK (Fig. S2), Popoolation2, and custom Python scripts (see Supporting Information for details). In Pool-seq applications, population allele frequencies are derived from the total read counts supporting a variant site. Read coverage, though, can be biased by sequencing and read mapping artifacts (Dohm, Lottaz, Borodina, & Himmelbauer, 2008; Kolaczkowski, Kern, Holloway, & Begun, 2011). To control for these factors and minimize their potential effect on population allele frequency calculation, we applied the allele count correction proposed by Kolaczkowski et al. (2011) and Feder, Petrov, & Bergland (2012). Details on the application of this correction method and population allele frequency estimation can be found in the Supporting Information.
Population structure
Based on the population allele frequencies, we examined genetic structure among spawning aggregations with a Neighbor-Joining (NJ) phylogenetic tree and with pairwise FST estimates. We computed pairwise Nei (1972) genetic distance with Gendist and built a NJ tree with Neighbor, both programs implemented in the package PHYLIP v3.697 (Baum, 1989). Bootstrapping was performed using the program Seqboot of PHYLIP, and the consensus tree was visualized with FigTree (Rambaut, 2007). We estimated unbiased FST for pools between all pairs of spawning aggregations using the R package poolfstat (Hivert, 2018). This algorithm computes F-statistics equivalent to Weir & Cockerham (1984) estimates, while accounting for random sampling of chromosomes that may occur during DNA pooling and sequencing in Pool-seq applications.
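As an illustration of the distance underlying the NJ tree, the following Python sketch computes Nei's (1972) standard genetic distance, D = −ln(Jxy / sqrt(Jx·Jy)), from per-locus allele frequencies of biallelic SNPs. It is not the Gendist code; the function names and the input layout (one list of reference-allele frequencies per pool) are assumptions made for the example.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance for biallelic loci.

    freqs_x, freqs_y: reference-allele frequencies, one value per SNP,
    in the same locus order for both populations.
    """
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        jx += p * p + (1 - p) * (1 - p)    # identity within population x
        jy += q * q + (1 - q) * (1 - q)    # identity within population y
        jxy += p * q + (1 - p) * (1 - q)   # identity between x and y
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def pairwise_nei(pools):
    """pools: dict mapping pool name -> list of allele frequencies."""
    names = sorted(pools)
    return {(a, b): nei_distance(pools[a], pools[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

The resulting pairwise dictionary corresponds to the distance matrix that a tree-building program such as Neighbor takes as input.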
Outlier loci detection and genome-wide patterns of differentiation
To identify loci potentially under selection, we performed genome scans for outlier loci detection using Principal Component Analysis (PCA), as implemented in the R package pcadapt v.4.0.2 (Luu, Bazin, & Blum, 2017). This algorithm assumes that divergent loci highly correlated to population structure are likely under selection. Outlier loci are detected based on the Mahalanobis distance calculated from the correlation coefficients between SNPs and a selected number of K principal components (PCs) (i.e. PCA loadings).
We performed a genome scan for the first 13 PCs (default is K=number of pools-1, 14-1= 13) using a minor allele frequency (MAF) of 0.05. Loci with Benjamini-Hochberg (BH) adjusted P-values ≤0.01 were considered candidates for being under selection. To identify which PCs explained the greatest proportion of genomic variance, we examined the scree plot generated by pcadapt, as well as the allele frequency patterns revealed in heatmaps made with the R package ComplexHeatmap (Gu, Eils, & Schlesner, 2016). The heatmaps depicted population allele frequencies (standardized to the major allele) of the 200 outlier loci most correlated to each PC (ranked by P-value in ascending order). We further explored the loci driving genomic differentiation in the herring by performing, with pcadapt, component-wise genome scans for the PCs exhibiting distinctive allele frequency patterns. To examine the distribution of outlier loci across the herring genome, for each informative PC we obtained Manhattan plots depicting the genomic position of outlier SNPs and their respective significance association value (–log10P-value) using the R package qqman (S. D. Turner, 2014).
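The Benjamini-Hochberg step-up adjustment used to select candidate loci can be sketched in a few lines of Python. This is a generic implementation of the procedure, not the pcadapt code path; the function names are illustrative, and only the 0.01 cutoff comes from the text.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def outlier_indices(pvalues, alpha=0.01):
    """Indices of loci whose BH-adjusted P-value is <= alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]
```

With five raw P-values of 0.001, 0.008, 0.039, 0.041 and 0.042, the adjusted values are 0.005, 0.02, 0.042, 0.042 and 0.042, so only the first locus would be retained at the 0.01 cutoff.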
Identification of the most informative outlier loci
We ranked outlier loci based on their importance for classification to each of the categories (or classes) of distinctive genomic patterns of differentiation in herring. For this we used random forest (RF), a supervised learning algorithm implemented in the R package randomForest (Liaw & Wiener, 2002). For the seasonal reproductive pattern, classes corresponded to spring or fall. For the latitudinal pattern, classes were northern (SIL-S, SPH-S, NTS-S, LAB-F, BLS-F, NDB-S, NDB-F, TRB-F, MIR-F, BDO-S, SCB-F), intermediate (MUS-F, GEB-F), and southern (ME4-F) regions. The RF model was based on 50 individual genotypes per spawning aggregation simulated from population allele frequencies using the R function sample.geno implemented in pcadapt v3.0.4. For the RF runs, the parameter mtry was set to its default (equal to sqrt(p), where p is the number of loci); ntree was set to 1,000,000; and sampsize was set to 2/3 of the size of the smallest class. From a scatter-plot of importance values generated by the random forest classifier (Mean Decrease in Accuracy, MDA), loci before the point where the differences between importance values level off (“elbow method”) were considered the most important (Goldstein, Hubbard, Cutler, & Barcellos, 2010).
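The simulation of individual genotypes from pooled allele frequencies, and the choice of sampsize, can be sketched as follows. The sketch assumes that each diploid genotype is drawn as two independent allele copies at the pool frequency (i.e., Hardy-Weinberg proportions); whether sample.geno proceeds exactly this way is an assumption, as are the function names and the integer rounding of the 2/3 rule.

```python
import random


def simulate_genotypes(allele_freqs, n_individuals=50, seed=None):
    """Draw diploid genotypes (0, 1 or 2 copies of the allele) per locus.

    Assumption: two independent Bernoulli draws per locus at frequency p.
    """
    rng = random.Random(seed)
    return [[(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
            for _ in range(n_individuals)]


def balanced_sampsize(class_counts):
    """Two thirds of the smallest class, rounded down (rounding is assumed)."""
    return (2 * min(class_counts.values())) // 3


# Latitudinal classes: 11 northern, 2 intermediate and 1 southern pool,
# each represented by 50 simulated individuals.
latitudinal_counts = {"northern": 11 * 50, "intermediate": 2 * 50, "southern": 50}
```

Under these assumptions the southern class (a single pool of 50 simulated individuals) sets the per-class sample size for the latitudinal model, which keeps the much larger northern class from dominating each tree.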
Validation of a subset of outlier SNPs related to seasonal reproduction and to latitudinal divergence
We validated some of the top candidate loci detected with Pool-seq data that showed strong association with seasonal reproduction and latitudinal divergence with individual genotypes. For this, we genotyped 240 individuals (30 individuals from 8 locations) in 40 SNPs related to seasonal reproduction and 90 SNPs related to latitude using the Agena MassARRAY SNP genotyping platform (Agena Bioscience, Inc.). These SNPs were chosen considering these criteria: (i) top ranked based on importance values (Mean Decrease in Accuracy, MDA) obtained from the random forest algorithm (as described in the previous section), (ii) had ≥150 bp of flanking sequence for primer design, (iii) did not fall within or a few bases away from repetitive regions and had fewer than 4 flanking SNPs, (iv) when two or more top ranked SNPs were located within the same scaffold, the ones separated by ≥ 1Kbp were kept, in an attempt to minimize redundancy in the panel. The application of these filters and the retrieval and preparation of DNA sequences for primer design for the Agena platform were performed with custom R scripts. A quality control of raw SNP genotypes was performed using PLINK (Purcell et al., 2007), in which SNPs and individuals with more than 20% missing data, and SNPs with minor allele frequency (MAF) lower than 0.01 were removed. We obtained a heatmap plot using the R function heatmap.2 of the R package gplots for the visual inspection of individual genotype patterns. File format conversions required for missing data filtering and heatmap plotting were conducted with PGDSpider (Lischer & Excoffier, 2012) and a custom python script (data was transformed to PLINK format, then to VCF file format, and finally to 0,1,2 format).
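The genotype quality control described above (20% missing-data limit for SNPs and individuals, MAF of at least 0.01) can be expressed as a short Python sketch. It does not reproduce PLINK; the order in which the filters are applied, the 0/1/2/None genotype encoding, and the function names are assumptions made for the example.

```python
def missing_rate(values):
    """Fraction of missing (None) calls; an empty list counts as all missing."""
    return sum(v is None for v in values) / len(values) if values else 1.0


def minor_allele_frequency(column):
    """MAF from 0/1/2 genotype codes, ignoring missing calls."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)


def qc_filter(genotypes, max_missing=0.20, min_maf=0.01):
    """Return (kept individual indices, kept SNP indices).

    genotypes: list of rows (individuals), each a list of 0/1/2/None per SNP.
    Assumed order: SNP missingness, individual missingness, then MAF.
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    snps = [j for j in range(n_snp)
            if missing_rate([row[j] for row in genotypes]) <= max_missing]
    inds = [i for i in range(n_ind)
            if missing_rate([genotypes[i][j] for j in snps]) <= max_missing]
    snps = [j for j in snps
            if minor_allele_frequency([genotypes[i][j] for i in inds]) >= min_maf]
    return inds, snps
```

The returned index lists can then be used to subset the genotype matrix before plotting.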
Functional annotation of outlier loci
We investigated the potential effect on gene function of outlier SNPs associated with seasonal reproduction and the latitudinal cline using SnpEff v4.1l (build 2015-10-03) (Cingolani et al., 2012) [default parameters]. This program determines the position of a SNP with respect to the constituents of a nearby gene within 5 Kbp (i.e., exons, introns, 5’-UTR region, etc.), and predicts its putative effect on gene and protein composition (i.e., synonymous and missense mutations, premature stop codon, etc.; a complete list of effects is described in the program documentation). Variants located beyond 5 Kbp of a gene were annotated as ‘intergenic’. We based this analysis on the current herring genome assembly and annotations (Martinez Barrio et al., 2016). Further, we separately examined gene ontology (GO) terms of the genes annotated to the outlier loci most strongly associated with seasonal reproduction and the latitudinal cline [−log10P-value ≥ 7, equivalent to P-value ≤ 1×10^−7, the lower threshold commonly used for significant association in human GWAS (Fadista, Manning, Florez, & Groop, 2016; Panagiotou & Ioannidis, 2012)]. Details of the analysis performed on the GO terms can be found in the Supporting Information.
Genetic-Environment Association analysis
We performed redundancy analysis (RDA) and random forest (RF) regressions to identify environmental variables significantly associated with spatial patterns of population divergence.
The environmental dataset used for these analyses consisted of sea surface temperature (SST), sea bottom temperature (SBT), and sea surface salinity (SSS) for winter, spring, summer and fall seasons, for a total of 12 oceanographic variables. These variables are relevant in population structuring of numerous marine species in the NW Atlantic (Stanley et al., 2018).
To obtain environmental measures for each sampling location, we acquired monthly data layers of SST, SBT, and SSS between 2008-2017 from NEMO 2.3 (Nucleus for European Modelling of the Ocean), an oceanographic model developed by the Bedford Institute of Oceanography, Canada. A detailed description of oceanic (Madec, Delecluse, Imbard, & Levy, 1998) and sea ice (Fichefet & Maqueda, 1997) model components can be found in Wang, Brickman, Greenan, & Yashayaev (2016) and Brickman, Hebert, & Wang (2018). Data layers were converted to an ASCII grid with a NAD83 projection (ellipse GRS80); they had a nominal resolution of 1/12° (∼5 km²) and a uniform land mask. Four seasonal bins, corresponding to winter (January-February-March), spring (April-May-June), summer (July-August-September), and fall (October-November-December), were averaged across 9 years in order to capture long-term trends of oceanographic variation. Data extraction for the 14 geo-referenced locations was conducted using custom R scripts (Stanley et al., 2018). Environmental data were standardized to zero mean and unit variance in R for downstream analysis. Collinearity between environmental variables was estimated with pairwise correlation coefficients computed with the function pairs.panels of the R package psych (Revelle, 2018) (Fig. S11), and with variance inflation factors (VIF) obtained from RDA models built with the R package vegan (Dixon, 2003). Prior to RDA, the most collinear variables were removed based on biological/ecological criteria (Forester, Lasky, Wagner, & Urban, 2018). Subsequently, remaining collinear variables were identified and removed one by one in consecutive RDA runs based on their VIF. The variable with the highest VIF was discarded in each run until all variables had a VIF < 5, following recommendations by Zuur, Ieno, & Elphick (2010).
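The seasonal binning and standardization steps can be sketched in Python as follows. The sketch is illustrative rather than a reproduction of the custom R scripts: the input layout (a dictionary keyed by year and month for one site and one variable) and the function names are assumptions, and the standardization uses the sample standard deviation, as R's scale() does.

```python
from statistics import mean, stdev

# Month groupings as defined in the text.
SEASONS = {
    "winter": (1, 2, 3),
    "spring": (4, 5, 6),
    "summer": (7, 8, 9),
    "fall": (10, 11, 12),
}


def seasonal_means(monthly):
    """Average monthly values into four seasonal bins across all years.

    monthly: dict mapping (year, month) -> value for one site and variable.
    """
    return {season: mean(v for (_, m), v in monthly.items() if m in months)
            for season, months in SEASONS.items()}


def standardize(values):
    """Center to zero mean and scale to unit (sample) standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]
```

Applying seasonal_means to each of the three variables at each site yields the 12 predictors, and standardize is then applied to each predictor across the 14 sites.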
For RDA, we used the reduced environmental dataset as constraining variables for the population allele frequencies of the top 500 outlier loci exhibiting the latitudinal pattern. RDA runs were performed with the R package vegan, following Jeffery et al. (2018) and Lehnert et al. (2018). Environmental variables that best explained genetic variance were identified using a bi-directional stepwise permutational ordination method (1000 iterations) implemented in the R function ordistep. Significance of the overall RDA model and of selected environmental variables was assessed with analysis of variance (ANOVA) using 1000 permutations. In order to estimate the proportion of the genetic variance independently explained by environment, geographic distance, or both, we performed variance partitioning using partial redundancy analysis (pRDA), either conditioned on geographic distance (Cartesian coordinates) or selected environmental variables, respectively. Cartesian coordinates of each location, equivalent to the pairwise least-cost geographic distance between locations accounting for land as barrier, were obtained with the R function CartDist (Stanley & Jeffery, 2017). Concordance between Cartesian and geographic coordinates was assessed with a linear regression (Fig. S3).
For RF regressions, we used the population allele frequencies of each outlier locus as single response vectors and the 12 standardized environmental variables as predictors. A RF regression was performed for each outlier locus with the R package randomForest, as described in Lehnert et al. (2018) and Sylvester et al. (2018). Default parameters for regression were applied to the RF runs (mtry = p/3, where p is the total number of predictors, or environmental variables in this case), except that ntree was set to 10,000. The selected number of trees to grow per run (ntree) assured Mean Decrease in Accuracy (MDA) convergence, as demonstrated in a pilot test that compared MDA of predictors of 3 independent RF runs (correlation coefficient r = 0.9999, Fig. S4). Environmental variables were then ranked based on their relative importance to explain genetic variance from the averaged MDA values across loci, and the mean residual square error (MSE) of each location averaged across loci.
Isolation-by-distance pattern test
To evaluate whether global (all loci) and latitude-related population structure (subset of loci) corresponded to an isolation-by-distance (IBD) pattern, we determined the significance of the association between geographic and genetic distances for all possible pairs of sampled spawning sites using Mantel tests (Mantel, 1967) with 9999 permutations, implemented in the R package ade4 (Dray & Dufour, 2007). Genetic distances were FST values linearized as FST/(1 − FST) (Rousset, 1997), computed using all SNPs identified across the genome in the first case, or solely outlier SNPs strongly associated with latitudinal divergence in the second. Geographic distances were estimated with the R package CartDist (Stanley & Jeffery, 2017) as the least-cost oceanic distance in km, considering land as a barrier.
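A minimal Python sketch of a permutation-based Mantel test on linearized FST is given below. It is a generic illustration, not the ade4 implementation: the one-sided (greater) alternative, the Pearson statistic, and all function names are assumptions of the example.

```python
import math
import random


def linearize_fst(fst):
    """Rousset's (1997) linearization, FST / (1 - FST)."""
    return fst / (1 - fst)


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after permuting site labels."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=9999, seed=None):
    """Return (observed r, one-sided P-value) for two square distance matrices."""
    rng = random.Random(seed)
    geo_vec = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo_vec)
    order = list(range(len(genetic)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if pearson(upper_triangle(genetic, order), geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Permuting site labels, rather than individual matrix cells, preserves the dependence among pairwise distances that share a site, which is the reason a Mantel test is used instead of an ordinary correlation test.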
Results
Sampling distribution and pool-sequencing
A total of 697 adult herring from 14 spawning aggregations distributed in and around Newfoundland and Labrador, the Gulf of St. Lawrence, Scotian Shelf, Bay of Fundy, and Gulf of Maine in the NW Atlantic were included in this study (Fig. 1A, Table 1). We aimed to include in the same pool only DNA from “ready-to-spawn” and “actively spawning” individuals collected in the same area [gonadal maturity stage 5 and 6, respectively, (Bucholtz, Tomkiewicz, & Dalskov, 2008)]. Yet, in some spawning aggregations (BDO-S, NDB-S, NDB-F, TRB-F, and ME4-F, see pie charts in Fig. 1A) 25-50% of individuals were in “maturing” (stage 4) or “resting” (stage 8) condition at the time of sampling. The designation of “S” or “F” in the location name thus only reflects the season of collection and not necessarily the actual spawning season of all fish included in the pool.
A total of ∼800 GB of raw sequence data were obtained. After quality filtering and adapter trimming, 6,119,940,640 reads of optimal quality (Phred score > 20) were available for the genomic analysis. Read mapping statistics indicated that > 98.8% of read-pairs were correctly aligned to the stitched version of the herring reference genome (mapping quality MQ > 48, median insert size of 527 bp) (Table S1), confirming that misalignment errors, if present, were negligible. Average read depth of coverage per pool ranged from 25x to 44x and varied between sequencing batches [2015 batch mean 28.7 ± 4.0, 2016 batch mean 36.9 ± 2.6 (Table S1)]. We monitored the potential effect of coverage variation in downstream analysis, in particular for collections with lower coverage (TRB-F, NTS-S, and GEB-F). Variant calling resulted in 11,154,328 raw SNPs, of which 2,189,380 passed quality filters and were retained for further analysis.
Population structure
As observed in our previous study (Lamichhaney et al., 2017), spawning aggregations in the NW Atlantic clustered according to reproductive season in a Neighbor-Joining tree, with spring and fall spawning collections forming separate groups (Fig. 1B), although a few exceptions were observed. BDO-S sample was in an intermediate position with respect to these two main clusters, and a spring-collected sample in Newfoundland (NDB-S) clustered with the fall group, suggesting it may be composed of a large proportion of fall spawners. A closer examination of the fall group revealed clustering according to latitude. Southern collections in the Scotian Shelf (MUS-F, GEB-F), Bay of Fundy (SCB-F), and Gulf of Maine (ME4-F) were separated from northern collections in the Gulf of St. Lawrence (MIR-F, BLS-F), Newfoundland (TRB-F, NDB-S, NDB-F) and Labrador (LAB-F). Such separation suggests genetic differences may exist between herring inhabiting these two geographic regions.
The pairwise fixation index FST for pools ranged between 0.012 and 0.043, indicating low levels of genetic structure among the 14 spawning aggregations studied (Fig. 1C, pairwise FST values in Table S2). Nevertheless, three clear patterns of subtle genetic differentiation were noticeable: i) between spring and fall spawners (SIL-S, SPH-S, NTS-S vs. others, 0.022-0.043), ii) within spring spawners, the sample from the NW of the Gulf of St. Lawrence (SIL-S) was the most genetically distinguishable, and iii) within fall spawners, the two southernmost collections (GEB-F and ME4-F) were the most divergent. In general, the largest genetic differentiation was observed between spring spawners and the most southern collections. Interestingly, the two spring-collected samples BDO-S and NDB-S (two samples presumably containing both spring and fall spawning individuals, see below) exhibited similar levels of differentiation with samples comprising solely spring spawners (SIL-S, SPH-S, NTS-S) as with samples comprising solely fall spawners.
Outlier loci detection and genome-wide patterns of differentiation
A PCA-based whole-genome scan for the identification of SNPs putatively under selection revealed two main axes of genomic differentiation in NW Atlantic herring: spawning season, and geographic origin according to latitude. In a PCA plot based on 2,189,380 SNPs (Fig. 1D), spring and fall spawning herring were distinguishable along the first principal component (PC1) (36% of variance explained). PC2 distinguished two collections, German Bank (GEB-F) and Northumberland Strait (NTS-S), from the rest (Fig. S5). These two collections exhibited the shallowest average sequencing coverage, suggesting this axis (PC2) largely reflects an artefact of sequencing. PC2 was therefore ignored (Fig. S5). On PC3, the southernmost collections, distributed on the Scotian Shelf, Bay of Fundy and Maine (MUS-F, SCB-F, GEB-F, ME4-F), were differentiated from the aggregations in the Gulf of St. Lawrence, Newfoundland, and Labrador (30% of the variance explained) that formed a tight cluster. The sample from Maine (ME4-F) was the most differentiated of all, followed by German Bank (GEB-F), the southernmost location sampled on the Scotian Shelf. Along PC1, BDO-S and SIL-S were positioned in between the spring and fall spawners, BDO-S being closer to the fall samples and SIL-S to the spring samples. NDB-S clustered tightly with the fall spawners. In general, with the exception of the two southernmost samples (GEB-F and ME4-F), fall spawning aggregations grouped more closely together than the spring spawning ones, suggesting that more genetic differences may exist among the spring spawners than among fall spawners included in this study.
In PC1, a total of 14,724 outlier SNPs were detected (with Benjamini-Hochberg-adjusted P-values and FDR ≤ 0.01). A Manhattan plot depicting significance values (–log10P-value) of outlier loci for this PC disclosed numerous “peaks” or regions of divergence across the genome, spanning about 18 scaffolds and numerous genes (Fig. 2A). The top SNPs of these scaffolds were in the proximity of genes with known function in reproduction, such as TSHR, ESRA, HERPUD2, CALM (Martinez Barrio et al. 2016). Moreover, a new set of candidate genes linked to seasonal reproduction was identified (ISO3, SERTM1, SIPA1L1, CAMKK1, TMEM150C, CBLB, ENTPD5, KCNJ6, LPAR6 and GPR119), as they were near top outlier loci in the unique islands of differentiation only observed in the NW Atlantic (Lamichhaney et al., 2017). A heatmap depicting standardized population allele frequencies of the top 200 outlier loci from the scaffolds identified with RF (ranked in descending order by −log10P-value) distinguished aggregations by spawning season (Fig. 2B), with fall spawners fixed for one allele and almost all spring spawners fixed for the alternative allele. The exceptions to this observation were three aggregations sampled in spring, BDO-S, SIL-S, and NDB-S. The first two collections exhibited allele frequencies around 0.5, while NDB-S showed population allele frequencies consistent with fall spawners. These results indicate that BDO-S and SIL-S either correspond to a mixture of spring and fall spawning individuals or to hybrids or both, and that NDB-S should be considered as a sample of fall spawners, suggesting possible mislabeling.
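The outlier calls above rest on Benjamini-Hochberg (BH) adjusted P-values compared against an FDR threshold of 0.01. The adjustment step alone can be sketched as follows; the PCA-based scan that produces the raw per-SNP P-values is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so monotonicity can be enforced
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_outliers(p_values: Sequence[float], fdr: float = 0.01) -> List[int]:
    """Indices of SNPs whose BH-adjusted P-value is at or below the FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= fdr]
```

With millions of SNPs tested, the adjustment is considerably stricter than a raw P ≤ 0.01 cut-off, which is why only the strongest peaks in the Manhattan plot are retained as outliers.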
In PC3, a total of 6,595 outlier loci were detected (with BH-adjusted P-values and FDR ≤ 0.01). A Manhattan plot for this PC disclosed four main regions of divergence across the genome, corresponding to scaffolds 44, 122, 869 and 958, and a small number of outlier loci from other scaffolds (Fig. 2C). The top SNPs in the four main scaffolds were located within 5Kbp of the genes FAM129B, FNBP1, SH3GLB2, and GPR107. A heatmap representing standardized population allele frequencies of the top 200 outlier loci from the scaffolds identified with RF (ranked in descending order by −log10P-value) revealed contrasting genetic patterns according to latitude (Fig. 2D). In northern collections, including Labrador (LAB-F), Newfoundland (NDB-S, NDB-F, TRB-F, SPH-S), Gulf of St. Lawrence (BLS-F, SIL-S, MIR-F, NTS-S), Bras D’Or lake (BDO-S), and inner Bay of Fundy (SCB-F), one allele was close to fixation; in the southernmost collection, in Maine (ME4-F), the alternative allele was in high frequency; and in intermediate southern collections along the Scotian Shelf (MUS-F, GEB-F) allele frequencies were around 0.5. An extended examination of population allele frequencies of the 14,724 outlier SNPs detected in PC1 (Fig. S8) revealed that 3,378 of them were located in the four scaffolds showing the latitudinal pattern and exhibited the same pattern as the outliers found in PC3. Thus, these 3,378 SNPs were removed from the PC1 set and added to the ones detected in PC3, for a total of 11,346 SNPs associated with seasonal reproduction (14,724 − 3,378) and 9,973 SNPs associated with latitude (6,595 + 3,378).
A closer examination of the genomic distribution of outlier SNPs revealed that seasonal reproduction-related outliers exhibited varying levels of significance (–log10P-value up to 30) (Fig. 2A), were confined to a particular region within a scaffold (around 50-500 Kbp) and spanned a given set of genes (Fig. S6). In contrast, latitude-related outliers showed similar significance values (–log10P-value ∼15) (Fig. 2C), were widely spread along scaffolds (covering between 480 Kbp and 4.75 Mbp) and spanned numerous genes (Fig. S7), suggesting the possibility of a chromosomal rearrangement.
Validation of a subset of outlier SNPs related to seasonal reproduction and to latitudinal divergence
A total of 230 individuals (NDB-F: 30, NDB-S: 29, SIL-S: 27, NTS-S: 30, BDO-S: 28, MUS-F: 29, GEB-F: 27, ME4-F: 30) and 52 and 74 SNPs related to seasonal reproduction and latitudinal divergence, respectively, passed the missing rate and MAF quality filters. Heatmaps depicting individual SNP genotypes for each of the two panels (Fig. S9) confirmed the overall patterns of population allele frequencies of the two axes of divergence detected with Pool-seq data (Fig. 2B,D), seasonal reproduction and latitude.
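The missing-rate and minor allele frequency (MAF) filters mentioned above can be sketched as below. The thresholds used in this study are not stated in this section, so the defaults shown are placeholders only, and the genotype coding (0, 1, 2 copies of one allele, `None` for a failed call) is an assumption made for illustration.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of missing calls at one SNP."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF at one SNP, computed over non-missing genotypes only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def filter_snps(
    matrix: Sequence[Sequence[Genotype]],
    max_missing: float = 0.2,   # placeholder threshold, not from the study
    min_maf: float = 0.01,      # placeholder threshold, not from the study
) -> List[int]:
    """Indices of SNPs (rows of matrix) passing both filters.

    matrix[s][i] is the genotype of individual i at SNP s.
    """
    return [
        s
        for s, calls in enumerate(matrix)
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    ]
```

The same `missing_rate` function can be applied to the transposed matrix to drop individuals with too many failed calls.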
The SNP panel discriminating spawning season revealed that the spring-collected samples SIL-S and BDO-S corresponded to a mixture of spring and fall spawners and putative hybrids, the latter defined as heterozygous individuals at many of the loci showing a high degree of fixation between groups. SIL-S comprised an even proportion of pure fall spawners and putative hybrids with a few pure spring spawners, whereas BDO-S comprised mostly pure fall spawners and a few hybrids and spring spawners. Of the other spring-collected samples, NTS-S consisted mostly of pure spring spawners and a few putative hybrids, while NDB-S corresponded to pure fall spawners. In contrast, all the fall-collected samples genotyped (NDB-F, MUS-F, GEB-F and ME4-F) corresponded to pure fall spawners, with a few heterozygous loci.
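The assignment of individuals to spring spawners, fall spawners or putative hybrids was read from the genotype heatmaps. A rough programmatic equivalent is sketched below, purely as an illustration: it assumes genotypes at the diagnostic SNPs are coded as the number of copies of the fall-associated allele, and it uses an arbitrary heterozygosity cut-off of 0.5 to stand in for "heterozygous at many of the loci". Neither the coding nor the cut-off comes from the study.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # copies of the fall-associated allele; None = missing


def classify_individual(genotypes: Sequence[Genotype], het_threshold: float = 0.5) -> str:
    """Label one individual from its diagnostic-SNP genotypes.

    het_threshold is a hypothetical cut-off on the fraction of heterozygous loci.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return "unknown"
    heterozygosity = sum(g == 1 for g in observed) / len(observed)
    if heterozygosity >= het_threshold:
        return "putative hybrid"
    fall_fraction = sum(observed) / (2 * len(observed))
    return "fall" if fall_fraction >= 0.5 else "spring"


def sample_composition(individuals: Sequence[Sequence[Genotype]]) -> Dict[str, int]:
    """Count individuals per label within one collection."""
    return dict(Counter(classify_individual(ind) for ind in individuals))
```

Applied per collection, such counts would summarize the mixtures described above, for example a collection dominated by the "fall" label with a few "putative hybrid" and "spring" individuals.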
The SNP panel discriminating by latitude confirmed northern samples were characterized by high frequency of one allele, while the alternative allele had greater frequency in the southernmost sample (in Maine), although putative hybrids were present in both cases in varying proportions. Intermediate locations (BDO-S, MUS-F, GEB-F) exhibited a genotypic cline of increasing proportion of putative hybrids towards the south.
Functional annotation of outlier loci
A total of 2,977 and 1,257 outlier SNPs associated with seasonal reproduction and latitudinal divergence, respectively, were annotated with respect to a neighboring gene (within 5Kbp). For both cases, the majority of outlier SNPs were located within introns and intergenic regions, or 5Kbp upstream or downstream of genes (Fig. 3A). A small number of outlier SNPs were predicted as synonymous (∼2%) or missense variants (1.6% and 0.9%, for spawning- and latitude-related outliers, respectively).
Excluding intergenic variants and genes that did not correspond to an orthologous gene in zebrafish, a list of 298 and 182 genes associated with seasonal reproduction and latitudinal divergence in herring, respectively, resulted from the annotated outlier loci. For seasonal reproduction-related genes, 126 had a GO term in the biological process category, 109 in the cellular component category, and 120 in the molecular function category (Fig. S10A). For latitude-related genes, 90 had a GO term in the biological process category, 72 in the cellular component category, and 80 in the molecular function category; considered together, close to half of the genes lacked GO classification. A comprehensive description of particular functions within the three GO categories and the number of genes in each of them is presented in Fig. S10B.
The overrepresentation enrichment analysis (ORA) of both sets of candidate genes did not reach statistical significance (FDR of 5%) (Table S3 and S4), likely due to the large number of genes lacking GO annotation (Fig. S10). However, a closer examination of the top GO terms with P-value < 0.05 (ranked in ascending P-values from ORA, Table S3 and S4, GO terms indicated with an asterisk) suggested that seasonal reproduction-related candidate genes may participate in biological processes such as metabolism of lipids, cell adhesion, biosynthesis of cellular products, peptidyl-amino acid modification, protein complex biogenesis, inositol lipid-mediated signaling, developmental maturation, regulation of developmental process, and cellular component organization (Fig. 3B-top, Table S3). These genes might primarily act in cellular components such as the endoplasmic reticulum and the whole membrane (Fig. 3B-middle) and play a molecular function related to cell adhesion molecule and protein binding and lipid transporter and transferase activities (Fig. 3B-bottom). The top GO terms of candidate genes associated with latitudinal divergence were all involved in embryological and organ development processes (Fig. 3C-top, Table S4). These genes might act in cellular components such as the phosphatase complex, collagen trimer, and the extracellular region (Fig. 3C-middle), and participate in sulfur compound binding and hydrolase and isomerase activities (Fig. 3C-bottom).
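The logic behind an over-representation test for a single GO term can be illustrated with a one-sided hypergeometric test, sketched below. This is the textbook formulation and is offered as an assumption about the general approach, not as a description of the tool or settings used here; the argument names are hypothetical.

```python
from math import comb


def ora_p_value(background: int, annotated: int, candidates: int, overlap: int) -> float:
    """One-sided hypergeometric P-value for over-representation of one GO term.

    background: number of genes in the reference set
    annotated:  number of reference genes carrying the GO term
    candidates: number of candidate genes tested
    overlap:    number of candidate genes carrying the GO term
    Returns P(X >= overlap) when drawing `candidates` genes without replacement.
    """
    upper = min(annotated, candidates)
    total = comb(background, candidates)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, candidates - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Because one such P-value is computed per GO term, the resulting list is then corrected for multiple testing, which is the step at which the terms discussed above failed to reach an FDR of 5%.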
Genome-Environment Association analysis
Collinearity among several of the environmental variables examined and redundancy analyses (See Supporting Information) allowed us to reduce the environmental data set to just three variables: summer SBT, winter SST, and spring SSS. RDA indicated that winter SST (Win_SST) (Fig. 4A) was the environmental variable that best explained the genetic variance of outlier loci exhibiting the latitudinal cline (F = 16.7, p = 0.001, from ordistep function) (Fig. 4B). No other temperature or salinity variable in the reduced environmental dataset was significant (from ANOVA with 1000 permutations, significance value = 0.05). Spawning aggregations were separated according to Win_SST on RDA axis 1, which explained 58.1% of the total genetic variance (R2 = 0.58, adjusted R2 = 0.55). pRDA however, showed that the Win_SST-based RDA model was no longer significant when the effect of geographic distance between sites was removed from the model. A variance partitioning analysis revealed that the interaction between environment and geographic distance explained the greatest proportion of clinal genetic variation (44.9%).
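The relationship between the R2 and adjusted R2 values reported above can be checked with the usual (Ezekiel) adjustment. The sketch below assumes that n corresponds to the 14 spawning aggregations and that the model contains a single predictor (Win_SST); both are our reading of the text rather than values stated explicitly for this calculation.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Ezekiel's adjusted R2 for a linear model with an intercept."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("not enough samples for the number of predictors")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Assumed inputs: 58.1% constrained variance, 14 aggregations, 1 predictor
win_sst_adjusted = adjusted_r2(0.581, n_samples=14, n_predictors=1)
```

Under these assumptions the adjusted value is approximately 0.55, in line with the figure given in the text.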
In agreement with RDA results, RF regressions also indicated that Win_SST was the most important environmental variable (MDA = 23.5), followed by Fall_SST (MDA = 21.8) (Fig. 4C). The other temperature variables had lower importance (MDA < 10), and salinity measures were the least important of all (MDA < 5). ME4-F, the southernmost spawning aggregation sampled, exhibited the highest mean square error (MSE = 0.21), followed by SCB-F and MUS-F (MSE ∼ 0.05), whereas the other 10 collections had lower MSE, below 0.03 (Fig. 4D).
A closer examination of the map of the NW Atlantic depicting average Win_SST over the last 9 years and the predominant population allele frequency of the 14 sites studied (Fig. 4A), revealed that herring in “northern” collections in the Bay of Fundy, the Gulf of St. Lawrence, and Newfoundland and Labrador were characterized by being exposed to temperatures below zero (−2 °C), whereas in “southern” collections they were mainly exposed to temperatures above zero (>2 °C).
Isolation-by-distance test
The Mantel test showed there is not a significant linear relationship between geographic and genetic distances for all loci across the genome (R2 = 0.04), whereas there is a significant linear relationship (R2 = 0.30) between geographic distance and genetic differentiation when only looking at outlier SNPs exhibiting the latitudinal break in population allele frequencies between northern and southern collections (Fig. 5).
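For reference, a simple permutation-based Mantel test can be sketched as follows. It correlates the upper triangles of a geographic and a genetic distance matrix and builds a null distribution by jointly permuting the rows and columns of one matrix. The number of permutations, the one-sided alternative and the fixed seed are illustrative choices, not the settings used in this study; note also that the function returns the correlation r, whereas the text reports R2.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after reordering sites by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(geo: Matrix, gen: Matrix, permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Return (r, one-sided permutation P-value) for two square distance matrices."""
    n = len(geo)
    identity = list(range(n))
    geo_vec = _upper_triangle(geo, identity)
    r_obs = _pearson(geo_vec, _upper_triangle(gen, identity))
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel sites in the genetic matrix
        if _pearson(geo_vec, _upper_triangle(gen, perm)) >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return r_obs, p_value
```

Permuting whole sites, rather than individual matrix cells, respects the non-independence of pairwise distances that share a site, which is the reason an ordinary regression P-value is not appropriate for this comparison.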
Discussion
Here we described patterns of genetic variation at the whole-genome level in Atlantic herring populations distributed across the reproductive range of the species in North America. This study represents the most comprehensive assessment of this kind in the region to date. We uncovered fine-scale population structure at outlier loci putatively under selection, despite low differentiation at selectively neutral loci. This observation is consistent with previous genetic work on herring in both the NE (Guo et al., 2016; Lamichhaney et al., 2012; Limborg et al., 2012; Martinez Barrio et al., 2016; Teacher, André, Jonsson, & Merilä, 2013) and the NW Atlantic (Lamichhaney et al., 2017; McPherson et al., 2004; McPherson, Stephenson, O’Reilly, Jones, & Taggart, 2001). The large population sizes, high potential for gene flow, and minute effect of genetic drift explain the low genetic differentiation observed at neutral loci (Palumbi, 1994). These conditions also favor the more efficient action of natural selection, which seems to be behind the genetic differences observed at outlier loci.
While prior genomic studies disclosed genetic structure associated with seasonal reproduction and salinity (Lamichhaney et al., 2012; Martinez Barrio et al., 2016), and others suggested structuring along the salinity/temperature gradient in the Baltic Sea from dozens of markers (Gaggiotti et al., 2009; Guo et al., 2016; Limborg et al., 2012), here we successfully disentangled two main overlapping axes of divergence supported by thousands of outlier SNPs: seasonal reproduction and a latitudinal cline defining a north-south genetic break. Our genetic-environment association analyses indicated that winter sea-surface temperature is the best predictor of the spatial structure observed. These results demonstrate for the first time that herring from the north (Labrador, Newfoundland, Gulf of St. Lawrence and Bay of Fundy) are genetically distinguishable from the ones in the south (Scotian Shelf and Maine) regardless of their spawning season; indicate that thermal-minima related factors are likely driving latitudinal genetic differentiation; and provide additional evidence supporting the recently described multispecies biogeographic barrier in eastern Nova Scotia (Stanley et al., 2018).
Outlier SNPs exhibited remarkable clustering, forming so-called “genomic regions of divergence” (Nosil, Funk, & Ortiz-Barrientos, 2009; T. L. Turner, Hahn, & Nuzhdin, 2005), and extreme allele frequency differences (i.e. alternative alleles were close to fixation in either spring- or fall-spawning, or in northern- or the southernmost populations). Theory predicts that formation of genomic regions of divergence (Schluter, 2009; Wu, 2001) and fixation of different alleles conducive to opposing phenotypes often result from natural selection acting in contrasting directions between environments (Vitti, Grossman, & Sabeti, 2013). Considering the heterogeneous environmental properties of the Northwest Atlantic (Melvin et al., 2009; Townsend et al., 2004) and having discarded an effect of genetic drift and an isolation-by-distance pattern, we conclude that disruptive selection may be the main evolutionary force involved in population structuring in the region.
A few exceptions to the allele fixation pattern were observed in both axes of divergence. In seasonal reproduction outliers, two aggregations sampled in spring, BDO-S and SIL-S, exhibited allele frequencies around 0.5 at SNPs that were close to fixation for opposite alleles in other populations of spring- and fall-spawning herring. This observation suggests these collections either correspond to a mixture of spring- and fall-spawners, or to a unique population where allele diversity is favored. Individual genotypes of a subset of diagnostic SNPs of spawning time confirmed BDO-S and SIL-S comprised a mixture of spring and fall spawners and putative hybrids (i.e. heterozygous individuals at many of the loci showing a high degree of fixation between groups). In latitude-related outliers, intermediate allele frequencies were observed in MUS-F and GEB-F, two locations in southwestern Nova Scotia, mid-range in the latitudinal cline. Interestingly, these locations are a few kilometers south of the biogeographic barrier described in the NW Atlantic (Stanley et al., 2018). Environmental conditions in the NW Atlantic vary between years in relation to global oceanographic trends (Townsend et al., 2004). It is possible then that populations in southwestern Nova Scotia experience significant inter-annual environmental fluctuations during winter months, depending on the strengthening either of the warm Gulf Stream flowing north or of the cold Labrador Current flowing south. Under these dynamic circumstances, balancing selection may be maintaining polymorphism at these loci. Additional studies including an extended sampling in the southern region could be used to test this hypothesis.
A closer examination of the genomic regions of divergence revealed they vary in size and genomic location between the two axes of divergence. Seasonal reproduction-related outliers were distributed across 18 scaffolds in which they spanned about 50-500 Kbp and a given set of genes. In contrast, latitude-related outliers were mostly spread in four scaffolds, covering a larger extension, from 480 Kbp to 4.75 Mbp, and larger number of genes. The observation that latitude-related outliers were widely distributed and consistently divergent across four large scaffolds suggests that they could be located within a chromosomal rearrangement. If this were the case, the expectation would be that populations from the north were homozygous for one state of the variant, the ones in southwest Nova Scotia were polymorphic, and in the Gulf of Maine were homozygous for the alternative state of the variant. Further research supported by a linkage map, not described yet for herring, is required for the evaluation of this hypothesis.
A bioinformatic evaluation of the functional effect of outlier SNPs disclosed that, for both axes of divergence, the majority of SNPs were located within introns, intergenic regions, and 5Kbp upstream or downstream of genes, and a smaller proportion corresponded to missense mutations (1.6% and 0.9%, for spawning- and latitude-related outliers, respectively). Mutations in introns can modify regulatory domains, intron-exon boundaries and RNA splicing (Pagani & Baralle, 2004); missense mutations result in a different amino acid; and mutations in regulatory elements can modify gene expression (Epstein, 2009; Metzger et al., 2016; M. Nei, 2007). While at this point it is not possible to trace a direct link between single SNPs and gene function or to identify causal mutations, our observations suggest that single base changes in introns, protein-coding, and regulatory regions may be involved in adaptive divergence in NW Atlantic herring, in agreement with previous observations in the NE Atlantic (Martinez Barrio et al., 2016).
Gene annotation of top outlier SNPs confirmed that TSHR, HERPUD2, SOX1, SOX11A, SYNE1, SYNE2, and ESR2A are candidate genes related to seasonal reproduction. These genes have a known function in reproduction and were previously linked to spawning time in NE Atlantic herring (Lamichhaney et al., 2017; Martinez Barrio et al., 2016). We discovered an additional set of candidate genes, ISO3, SERTM1, SIPA1L1, CAMKK1, TMEM150C, CBLB, ENTPD5, KCNJ6, LPAR6 and GPR119, corresponding to the genomic regions of differentiation uniquely observed in the NW Atlantic (Lamichhaney et al., 2017), hence, they can potentially be involved in local adaptation. Candidate genes related with the latitudinal cline are FAM129B, FNBP1, SH3GLB2, and GPR107.
A qualitative examination of the top ranked GO terms indicated that candidate genes related to seasonal reproduction may be involved in biological processes such as metabolism of lipids, biosynthesis of cellular products, developmental maturation, regulation of developmental process, and cellular component organization. Similarly, latitude-related candidate genes may participate in embryological and organ development processes. These observations suggest that outlier SNPs underlying the two axes of divergence may be involved in different physiological pathways, and that natural selection along the latitudinal cline likely acts on early life stages, in agreement with the proposed hypothesis for the multispecies climatic cline (Stanley et al., 2018). It is likely that early life stages experience selection along the latitudinal cline given that larval retention areas are in the proximity of spawning grounds (Stephenson et al., 2009). If selection acted on juveniles or adults, which are highly migratory, the pattern would not be expected to coincide with spawning locations.
We provide genetic evidence that suggests timing of reproduction and latitudinal spawning location are features under disruptive selection leading to local adaptation. Several characteristics of herring biology and ecology seem to support this. For instance, (i) spawning occurs at predictable times and locations, the timing differs among geographic regions (Stephenson et al., 2009), and there is no evidence indicating that individual fish can switch spawning season (Melvin et al., 2009); (ii) herring spawns once a year and exhibits spawning site fidelity (Wheeler & Winters, 1984); (iii) spring- and fall-spawners differ in morphometric characters, in life-history traits (fecundity, egg size and growth), and in phenotypic traits (number of vertebrae and otolith shape) (Baxter, 1959; Cushing, 1967; Messieh, Anthony, & Sinclair, 1985); growth rate, otolith shape, and vertebral counts seem to be largely influenced by genetic factors (Berg et al., 2018); (iv) early life stages spawned in different seasons and locations experience contrasting environmental conditions (e.g. in the Gulf of St. Lawrence, eggs released by spring spawners hatch after 30 days at 5°C, while eggs of fall spawners hatch after 10 days at 15°C; in Nova Scotia, eggs of fall spawners hatch in 11 days at 10°C) (Scott & Scott, 1988); (v) larval retention areas occur near spawning grounds and are stable over time, in predictable patterns related to oceanographic conditions (Stephenson et al., 2009); and (vi) genetic differences between spring- and fall-spawners are temporally stable (Kerr et al., 2018). From this, we then infer that timing of reproduction and latitudinal spawning location can be adaptive strategies to increase offspring survival, particularly at vulnerable early life stages, in environments that vary seasonally and geographically. 
When timing of reproduction is largely heritable, the resulting temporal assortative mating may reduce gene flow between individuals breeding at different times (Hendry & Day, 2005). In herring, gene flow may be limited between early spring-spawners and late fall-spawners even if they are in sympatry (as their gonads are not ripe at the same time, as we observed in our samples). How do hybrids occur? We hypothesize hybridization could happen between late spring-spawners and early fall-spawners at geographic areas where both reproductive strategies coexist (e.g. in the Gulf of St. Lawrence), and when the onset of gonadal maturation coincides (likely temperature driven). Hybrids would survive then, if they can cope with the local environmental conditions.
Although disruptive selection is a strong candidate for explaining latitudinal divergence in herring, other mechanisms are possible. For example, additional biotic or abiotic factors that covary with temperature may be the actual drivers of adaptation. Pre- or post-zygotic reproductive incompatibilities that coincide with latitude (but are not dependent on it) can result in the observed spatial genetic discontinuity (Bierne, Welch, Loire, Bonhomme, & David, 2011). The current latitudinal break may actually reflect historical vicariance (Bradbury et al., 2010), not contemporary population dynamics (Palumbi, 1994). Further studies are required to evaluate these alternative hypotheses.
Even though valuable information was obtained through this study, there were some limitations. In Pool-seq, individual information is lost, so it is not possible to correct for accidental mixing of individuals with different origin or spawning season. To minimize this risk, we selected maturing and ripe fish collected in known spawning grounds during the local peak of reproduction. Despite these precautions, we found evidence of some mixed aggregations (SIL-S and BDO-S). Moreover, statistical significance was not reached in the over-representation enrichment analysis. This outcome may have been influenced by the restriction that only herring candidate genes with a zebrafish ortholog could be included, and by the fact that half of the total genes mapped to zebrafish lacked a GO term. We expect that with a more complete reference genome and annotations, along with functional experiments, a better functional characterization of outlier loci will be achieved.
Our findings have several implications and potential applications in fisheries. Firstly, our results support the maintenance of separate management of spring- and fall-spawning components currently in place across most of the region. Secondly, management units should be revised in order to protect the functional intraspecific biodiversity revealed in this study, specifically considering a climate change scenario, as spring-spawners seem to be less resilient to a warming ocean (Melvin et al., 2009). Thirdly, as we now have the molecular tools to distinguish herring spawning in spring or autumn and in northern and southern regions, a subset of the outlier SNPs reported here can be used for genetic monitoring of stock composition, already at the larval stage and outside of breeding seasons, to minimize the risk of overexploitation of vulnerable components within mixed stocks. And lastly, the current herring population models could be revised, as none of them are in complete agreement with our genetic data, as similarly noted by McPherson et al. (2004). For instance, the discrete population concept proposes that gene flow is limited, hybrids have reduced fitness, and local populations are reproductively isolated by fixed spawning time, natal homing, spawning site fidelity, and larval retention areas with particular hydrographic features (Sinclair, 1988; Sinclair & Iles, 1989). While our data agree with most of this, the presence of numerous putative hybrids suggests that gene flow may be more extensive than expected under this model and that hybrids are viable. In the dynamic balance population concept there is significant gene flow, no stable population structure, no fixed spawning time, no philopatry, and no larval retention areas, as populations respond to changing environmental conditions (Smith & Jamieson, 1986). The temporal and spatial structuring we observed contradicts this model.
And in the metapopulation concept (adopted migrant) there is repeated homing to traditional spawning grounds defined by hydrographic features, migration and homing patterns are socially transmitted, and significant gene flow can occur as vagrants are adopted by non-natal local populations (McQuinn, 1997). This model implies an isolation-by-distance pattern and that spawning time is not genetically determined (it is learned), contrary to our observations.
In summary, our results confirm that Atlantic herring is a model system for the study of ecological adaptation with gene flow in the wild (Lamichhaney et al., 2017; Martinez Barrio et al., 2016), and provide insight into patterns and mechanisms of genomic divergence and local adaptation despite gene flow in an abundant and highly dispersive marine fish.
Data accessibility
Oceanographic data, population allele frequencies, and individual SNP genotype data for this study will be available in Dryad upon publication acceptance. Bash, Python and R scripts are available from the authors upon request.
Author contributions
D.E.R. and A.P.F.P. designed and conceived the study; C.B., R.S., K.E., L.P., and J.L.M. provided herring samples; A.P.F.P. contributed to tissue collection and processing, and performed lab work and bioinformatics data analysis; D.E.R. and L.A. contributed to the interpretation of results; A.P.F.P. wrote the manuscript with input from D.E.R. and L.A. All authors vetted and approved the manuscript before submission.
Competing interests
The authors declare no competing financial interests.
Acknowledgements
We thank staff at Fisheries and Oceans Canada, the Maine Department of Resources, Comeau’s Seafoods Ltd, Cape Breeze Seafoods Ltd, and fishers Crystal and Donald Kent, Gordon McKay, and many others for their valuable contribution in sample collection. Thanks to Gregory McCracken for assistance in sampling and DNA extraction, to Gavin Douglas, Emma Sylvester, Sarah Lehnert, Ryan Stanley, Simone Fior, Alan Bergland, Miguel Carneiro, Michael Blum, and Mathieu Gautier for data analysis assistance. Many thanks to Ryan Stanley for obtaining the oceanographic data layers used in this study. Library preparation and shotgun sequencing was performed in The Centre for Applied Genomics of the Hospital for Sick Children, Canada. Computations were conducted on the supercomputer Mp2 of the University of Sherbrooke, managed by Calcul Québec and Compute Canada and funded by the Canada Foundation for Innovation (CFI), the ministère de l’Économie, de la science et de l’innovation du Québec (MESI) and the Fonds de recherche du Québec - Nature et technologies (FRQ-NT). A.P.F.P. and D.E.R. thank the Killam Trust. A.P.F.P. thanks the Vanier Canada Graduate Scholarship, the President’s Award of Dalhousie University, the Nova Scotia Graduate Scholarship, the Lett Fund and a Strategic grant to D.E.R for graduate studies funding. This study was funded by NSERC Discovery and Strategic grants to D.E.R.