Sex matters in Massive Parallel Sequencing: Evidence for biases in genetic parameter estimation and investigation of sex determination systems

Using massively parallel sequencing data from two species with different life history traits -- American lobster (Homarus americanus) and Arctic Char (Salvelinus alpinus) -- we highlighted how an unbalanced sex ratio in the samples combined with a few sex-linked markers may lead to false interpretations of population structure and thus to potentially erroneous management recommendations. Multivariate analyses revealed two genetic clusters that separated males and females instead of showing the expected pattern of genetic differentiation among ecologically divergent (inshore vs. offshore in lobster) or geographically distant (east vs. west in Arctic Char) sampling locations. We created several subsamples artificially varying the sex ratio in the inshore/offshore and east/west groups, and then demonstrated that significant genetic differentiation could be observed despite panmixia for lobster, and that Fst values were overestimated for Arctic Char. This pattern was due to 12 and 94 sex-linked markers driving differentiation for lobster and Arctic Char, respectively. Removing sex-linked markers led to nonsignificant genetic structure (lobster) and a more accurate estimation of Fst (Arctic Char). We further characterized the putative functions of sex-linked markers. Given that only 9.6% of all marine/diadromous population genomic studies to date reported sex information, we urge researchers to collect and consider individual sex information. In summary, we argue that sex information is useful to (i) control sex ratio in sampling, (ii) overcome “sex-ratio bias” that can lead to spurious genetic differentiation signals and (iii) fill knowledge gaps regarding sex determining systems.


Introduction
Recently, the revolution in massively parallel sequencing (MPS) technology has led to the production of many genome-wide datasets, whereby thousands of markers can be easily and inexpensively genotyped in hundreds of individuals for both model and nonmodel species (Davey et al. 2011;Andrews et al. 2016).Several MPS studies based on either RAD-sequencing or Genotype-By-Sequencing (GBS) techniques have demonstrated that these markers bring unprecedented insights on the causes and consequences of population structuring (reviewed in Narum et al. 2013).The strength of such methods comes from its supposedly random sampling of the entire genome (Davey et al. 2013).While the random distribution of markers achieved by these methods is advantageous in many regards, it has one over-looked result that could have consequences for inferences of population structure: some of the markers identified will be located on sex chromosomes, or in regions linked to sex, in species with genetic sex determination.Indeed, Wright (1931) pointed out this bias in genetic parameter estimations, particularly when sampling populations with varying sex ratios or in the presence of sex-biased dispersal.Despite the potential importance of these biases, few MPS studies have focused on the analysis of sex-linked markers (but see Gamble & Zarkower 2014;Kafkas et al. 2015;Brelsford et al. 2016;Larson et al. 2016) and to our knowledge, none have investigated the influence of sex-linked markers on inferences of population structure observed.
In addition to the importance of avoiding potential biases, detecting sex-linked markers in MPS datasets can also provide valuable information on sex determination (Pan et al. 2016).Sex is common to almost all living animals and often leads to the evolution of male and female dimorphism, both at the genetic and phenotypic level (Bell 1982).Diverse mechanisms acting at the scale of the genome, chromosomes or cells underlie the morphological, physiological and behavioral differences between males and females.Moreover, sex determination systems vary tremendously among and within taxa (Bachtrog et al. 2014), highlighting the challenges in determining the selective forces driving sex determination.In general, the diversity of sex determination systems reported in fish (particularly teleosts) and crustaceans is much more pronounced than that observed in mammals and birds (Bachtrog et al. 2014).Yet, the characterization of the genetic architecture of sex determination in these taxonomic groups has been limited to a few studies (Legrand et al. 1987).The access to new genomic approaches, which are increasingly being used in non-model marine and aquatic organisms (Kelley et al. 2016), offers new prospects to investigate the molecular basis of sex determination in this diverse group.
The identification of sex-linked markers can also provide a wealth of other useful information for management, conservation, and aquaculture (Pan et al. 2016).First, sexlinked markers can assist in the identification of the sex of an individual, particularly in cases with an absence of clear sexual dimorphism (e.g., at young life history stages).In aquaculture practices, this can help farmers to maintain equal sex ratios of breeding adults and to implement efficient breeding programs (Martínez et al. 2014).Second, sex information is often important to include as a covariate in genetic models for finding loci linked to specific traits in order to reduce residual variation (Broman & Sen 2009).Third, knowing the sex of individuals may facilitate the demonstration of sex-biased dispersal, i.e., when individuals of one sex are more prone to disperse (Prugnolle & De Meeûs 2002).Sex-biased dispersal is widely spread among vertebrates and can have important ecological and evolutionary consequences, but there is still little research on this topic in marine organisms, such as fishes and crustaceans, compared to mammals and birds (Mossman & Waser 1999).
Here, we present two empirical examples that illustrate how an unbalanced sex ratio combined with a few sex-linked markers can lead to false interpretations of population structure and to erroneous management recommendations, especially in species with high connectivity as frequently observed in marine and diadromous organisms.Our initial goal was to separately investigate population structure between two groups of American lobsters (Homarus americanus) occupying inshore and offshore habitats, and between Arctic Char (Salvenius alpinus) collected from two geographically separated regions (east and west) in the Canadian Arctic.In both cases, preliminary multivariate analyses mainly revealed two genetic clusters corresponding to male and female individuals instead of being related to inshore/offshore groups of lobsters or to east/west groups of Arctic Char.To further understand the clustering, we identified sexlinked markers driving the genetic differentiation between male and female in American lobster and Arctic Char.To demonstrate the potential impacts of sex-linked markers on the population genetic analysis, we tested for both species how different numbers of sexlinked markers and ratios of samples from each sex can cause biased inferences of population structure.Finally, using the set of sex-linked markers identified, we found potential candidate genes or chromosomal regions linked to sex for American lobster and Arctic Char.We conclude with an exhaustive literature search demonstrating that very few studies performed on marine and diadromous species report sex information, and we argue, in light of our findings, that collecting this information can be critical to avoid biases, especially in high gene flow species.

Sampling and molecular techniques
American lobster: Commercial fishers collected 203 American lobsters (100 males and 103 females) from 13 sites including eight inshore sites and five offshore sites along the Atlantic coast of North America (Figure 1A; Table S1).The sex of all specimens was determined visually from obvious external morphological differences.Genomic DNA was then extracted using Qiagen Blood and Tissue kits.DNA quality was confirmed using visual inspection on 1% agarose gel followed by quantification with Quantit Picogreen dsDNA assay kits.RAD-sequencing libraries were prepared following the protocol from Benestan et al. (2015).Each individual was barcoded with a unique sixnucleotide sequence and 48 individuals were pooled per library.Real-time PCR was used to quantify the libraries.Single-end, 100 bp sequencing was performed on an Illumina HiSeq2000 platform at the Genome Québec Innovation Centre (McGill University, Montréal, Canada).
Arctic Char: Samples of 290 adult anadromous Arctic Char (142 males and 148 females) were collected from six rivers located on southern Victoria Island, Nunavut, Canada (Figure 1B; Table S2).Sex was determined visually by observation of the gonads for a subset (n = 174) and based on a genetic assay for another subset (n = 116), as described in Moore et al. (2016).In brief, the genetic sex was inferred based on the PCR assay described in Yano et al. (2013).Six individuals of known sex (three males and three females) were used as controls.Genomic DNA was extracted using a salt-extraction protocol modified from Aljanabi and Martinez (1997).DNA quality and quantity were checked on 1% agarose gels and using PicoGreen assays (Fluoroskan Ascent FL, Thermo Labsystems), respectively.Libraries were prepared based on a GBS protocol modified from Mascher et al. (2013)

Bioinformatics and genotyping
Both the American lobster and Arctic Char libraries were de-multiplexed using process_radtags in STACKS (v.1.29 for American lobster and v.1.40for Arctic Char) (Catchen et al. 2013).Raw sequencing data was checked in FASTQC (Andrews 2015).
Reads were truncated to 80 bp for lobster and 70 bp for Arctic Char and adapter sequences were removed with CUTADAPT (Martin 2011).
American lobster: Loci were identified allowing a maximum of three nucleotide mismatches (M = 3), according to Ilut et al. (2014) and a minimum stack depth of three (m = 3), among reads with potentially variable sequences (ustacks module in stacks, with default parameters).Then, reads were clustered de novo to create a catalogue of putative RAD tags (cstacks module in STACKS, with default parameters).In the populations module of STACKS v.1.29 and following consecutive filtering steps, SNPs were retained when they were genotyped in at least 80% of the individuals and found in at least 9 of the 12 sampling sites.Potential paralogs were excluded by removing markers showing heterozygosity > 0.50 and 0.30 < F IS < -0.30 within sites.Only SNPs with a global minor allele frequency > 0.02 were retained for the analysis.The resulting filtered VCF files were converted into the file formats necessary for the following analyses using PGDspider v.2.0.5.0 (Lischer & Excoffier 2012).
Arctic Char: SNPs were identified by first mapping the reads to the genome of the closely related Rainbow Trout (Oncorhynchus mykiss; Berthelot et al. 2014) using GSNAP v2016-06-09 with a minimum of 90% read coverage (-min-cov 90), tolerating 2 mismatches (-m 2) and setting an indel penalty to 2 (-i 2).A subsequent trimming step was conducted with SAMtools v1.2 (Li et al., 2009) to remove unmapped and multimapped reads using flags -F 1797 and -F 4, and a minimum mapping quality (MAPQ) of 1, respectively.The binary alignment files (bam) were then used as input for downstream analysis.Genotypes were obtained using STACKS v.1.40integrated in a workflow developed in our laboratory (Benestan et al. 2016a).The catalog of loci was created allowing no mismatches among loci in cstacks (n=0) and a minimum stack depth of four (-m 4).SNPs were retained if at least 50% of the individuals were genotyped for the marker (-r 0.5) and the locus was present in at least four populations (-p 4).Potential paralogs were excluded by removing markers showing heterozygosity > 0.60 and F IS < -0.40 F IS > 0.40 within samples.Only SNPs with a global minor allele frequency > 0.01 were retained for the analysis.

Discriminant Analysis of Principal Component (DAPC)
For American lobster, Discriminant Analysis of Principal Components (DAPC) was performed in the R package adegenet (Jombart et al. 2010).The optimal number of discriminant functions (n=60) to retain was evaluated according to the optimal α-score obtained from the data (Jombart et al. 2010).For Arctic Char, a Principal Component Analysis was performed in adegenet.As population differentiation was pronounced enough to be observed with the PCA, a DAPC was not conducted.

Sex outlier loci detection
American lobster and Arctic Char: Outlier loci corresponding to the most divergent markers between sexes were identified with a level of differentiation between sexes exceeding random expectations using F st -based outlier analyses.Outlier SNPs were detected with BAYESCAN v. 2.1 (Foll & Gaggiotti 2008).BAYESCAN runs were implemented using permissive prior model (pr_odds) of 10, including a total of 10,000 iterations and a burn-in of 200,000 steps.For both species, these outlier analyses were conducted on the entire data set separated by sex.

Sex ratio and sex-linked marker influence on index of genetic differentiation (F st )
To determine the extent to which differing sex ratio influences the detected genetic structure, different proportions of male and female American lobsters or Arctic Char were subsampled from inshore or east and offshore or west, respectively, keeping a total of 50 individuals per group.This generated a gradient of six different sex ratio datasets, representing different sampling bias scenarios, from the most balanced (sex ratio = 25:25/25:25) to the most unbalanced sex ratio (sex ratio = 0:50/50:0).
Considering the three most unbalanced sex-ratio datasets (i.e., 0:50/50:0, 5:45/45:5, 10:40/40:10), we removed sex-linked markers (i.e., here outlier SNPs) according to their F st values (in descending order) and we estimated F st between offshore/inshore for the American lobster and east/west for the Arctic Char.We calculated F st values using the function fst_WC84 in assigner R package (Gosselin et al. 2016).

Marker annotation and genomic position
American lobster: There is no reference genome or high-density linkage map available for American lobster and so the approximate locations or associated linkage groups of the sex-linked SNPs could not be determined.Probable proximity between markers was determined by linkage disequilibrium (LD) analysis by calculating LD between pairs of SNPs using the geno-r2 command available in VCFTOOLS (Danecek et al. 2011).The LD data frame obtained with VCFTOOLS was then transformed into an LD matrix to be analyzed using the heatmap command in the R environment (Team 2013).In order to determine what genes are associated with these sex-linked markers, the 12 candidate SNPs (outliers identified by BAYESCAN) were queried using BLAST against the transcriptome of the American lobster (F.Clark and S. Greenwood, University of Prince Edward Island, personal communication; see details in Benestan et al. 2016b).Six of the 12 candidate SNPs were distributed among six different contigs in the transcriptome data.
The associated contigs were used as queries in a BLAST search against the SWISS-PROT database (Bairoch & Apweiler 2000).A minimal E-value threshold of 1 x 10 -6 and percent similarity of at least 70% were used.This yielded a set of two candidate SNPs associated with known genes.Gene ontology (GO) annotation terms were then associated to the candidate SNPs using SWISS-PROT accessions.
Arctic Char: There is no reference genome available yet for Arctic Char, but there is a high-density linkage map available for the closely related Brook Char (Sutherland et al. 2016).To obtain approximate positions of the sex-linked SNPs from Arctic Char, the MapComp method (Sutherland et al. 2016) was used to pair all of the Arctic Char markers with mapped Brook Char markers using the Atlantic Salmon genome (Lien et al. 2016;GenBank: GCA_000233375.4) as the intermediate reference genome.This method connects markers from two different linkage maps by mapping the markers to a reference genome, then pairing markers that map uniquely to the same place or close to each other in the reference genome.This was done as previously described (Sutherland et al. 2016), but with ten iterations to permit more than one anonymous marker pairing with a single mapped marker, as previously described (Narum et al. in review) but with a 1 Mbp maximum distance between the paired markers on a reference genome.This yielded approximate positions for determining the number and identity of linkage groups associated with sex in Arctic Char.To determine which genes are associated with these linked markers, the sex-linked markers were used in a BLAST query against the annotated Atlantic Salmon genome (Lien et al. 2016); NCBI Genome ICSASG_v2 reference Annotation Release 100).

Literature search for marine and diadromous species population genomic studies
We performed an exhaustive literature search to document the proportion of population genomics studies that have reported sexing the species analyzed.More specifically, we conducted a literature search of population genomics studies on marine/diadromous species published in peer-reviewed journals from January 2010 to 15 November 2016 using the ISI Web of Knowledge bibliographic database (Thomson Reuters, http://thomsonreuters.com)using search keywords (i) "genomics" AND "marine" AND "SNP" yielded 22 hits, (ii) "population structure" AND "marine" AND "SNP" yielded 47 hits, (iii) "RAD-sequencing" AND "marine" yielded 39 hits and (iv) "population genomics" AND "marine" yielded 243 hits, (v) "population genomics" AND "anadromous" OR "catadromous" yielded 11 hits.From these hits, several criteria were used to determine which studies to include in our analyses.First, the paper needed to focus on a marine animal and use a set of more than 1,000 SNP markers.Second, the paper needed to refer to population genomics or related areas such as outlier identification because these are the target areas of research likely to be influenced by the sex ratio bias in sampling.After removing studies on non-marine or non-animal organisms, or those with too low density of markers, a total of 38 and 14 publications were retained for marine and diadromous species, respectively (listed in Table 1 and 2).

Artefactual population structure caused by sex-linked markers
For American lobster, using 1,717 filtered SNPs, Discriminant Analysis of Principal Components (DAPC) was performed on the 203 individuals successfully genotyped to investigate the extent of population structuring between offshore and inshore locations.
Instead of finding significant genetic differences between inshore and offshore samples, the first axis of the DAPC highlighted a significant genetic differentiation between sexes (F st = 0.0057, P-value = 0.0009), explaining 16.04% of the total genetic variation (Figure 2A).
For Arctic Char, using 6,147 filtered SNPs a principal components analysis (PCA) of genotypes from 290 individuals identified strong clustering that explained 5.74% of the total genetic variation between two groups not corresponding to any particular geographic region (Figure 2C).By using the data on phenotypic and genetic sex, it was clear that samples mainly clustered by sex in this PCA (Figure 2C) and that this genetic differentiation was modest but significant (F st = 0.0132, P-value = 0.0002).

Delineating the influence of sex ratio on F st in panmictic or anadromous species
A DAPC and a PCA were run on datasets containing only males for offshore or east region and only females for inshore or west locations for American lobster and Arctic Char, respectively (Figure 2B,D).As expected, the DAPC for American lobster showed a highly significant signal of genetic differentiation between inshore and offshore samples with a F st value in the range typically seen in many marine species (F st = 0.0056, 95% CI inf = 0.0027 and CI sup = 0.0088, P-value < 0.05), which in reality resulted from the extremely skewed sex ratio of this artificial dataset (Figure 2B,D).This outcome contrasts with the panmictic structure observed between inshore and offshore (F st = 0.0001, CI inf = -0.0004and CI sup = 0.0006, P-value > 0.05) when sex ratio is balanced (sex ratio in the original dataset is equal to 25:25/25:25).As expected, F st between inshore and offshore was highest and most significant when sex ratio was completely unbalanced, i.e., sex ratio equal to 0 (F st = 0.0055, CI inf = 0.0030 and CI sup = 0.0092, Pvalue < 0.05).F st remained significantly elevated until the sex ratio was 15:35/35:15 (F st < 0.001, CI inf < 0; Figure 3A).Following the same method described above to simulate differing sex ratio datasets, F st between east and west Arctic Char locations was highest and most significant when sex ratio was completely unbalanced, i.e., sex ratio equal to 0:50/50:0 (F st = 0.0215, CI inf = 0.0194 and CI sup = 0.0242, P-value < 0.05).F st then gradually decreased with increasingly even sex ratios until it reached F st = 0.0064 (CI inf = 0.0055 and CI sup = 0.0072; P-value < 0.05) with a sex ratio of 25:25/25:25 (Figure 3B).(A) American lobster.F st between offshore and inshore according to sex ratio proportion when subsampling 100 individuals with a sex ratio ranging from a complete unbalanced sex ratio (i.e., sex ratio equal to 0:50/50:0) to a perfectly balanced sex ratio (i.e., sex ratio equal to 25:25/25:25).The horizontal black dashed line indicates the threshold below which F st values are no longer significant at P < 0.05.(B) Arctic Char.F st between east and west according to the sex ratio proportion when subsampling 100 individuals with a sex ratio ranging from a complete unbalanced sex ratio (i.e., sex ratio equal to 0:50/50:0) to a perfectly balanced sex ratio (i.e., sex ratio equal to 25:25/25:25).F st was still significant for the anadromous, but was overestimated in the skewed sex ratio cases.In both panels, the vertical limits of the box represent one standard deviation around the mean (n = 10 individual subsample iterations), the horizontal line within the box is the median, and the whiskers extend from the box to the 25 th and 75 th percentiles.

Sex-linked markers in genome-wide datasets
Identifying sex-linked markers in American lobster and Arctic Char Out of the 1,717 SNPs initially considered for the American lobster, BAYESCAN identified 12 highly differentiated markers between the sexes (Figure S1).These 12 markers have a BAYESCAN F st of 0.0800 on average between the sexes (range = 0.1567-0.1167)whereas the remaining 1,705 SNPs have on average a F st of 0.0030 on average (range = 0.0032-0.0101).
Out of the 6,147 markers initially considered for Arctic Char, BAYESCAN identified 94 markers contributing to the male/female separation (Figure S1).These 94 markers show a BAYESCAN F st of 0.0421 between the sexes (range = 0.0039-0.1140)whereas the remaining 6,053 markers were on average 0.0019 between the sexes (range = 0.0019-0.0036).

Delineating the influence of sex ratio on F st in panmictic or anadromous species
We investigated the influence of the number of these 12 and 94 sex-linked markers on the index of genetic differentiation (F st ) calculated between inshore/offshore or east/west for both species, where sex ratio in sampling was unbalanced at different degrees (0:50/50:0, 5:45/45:5, 10:40/40:10).For American lobster, we observed high and significant F st values when no sex-linked marker was removed for the three scenarios.Then, F st progressively decreased with the removal of sex-linked markers (in descending order regarding their F st values) until reaching a small and non-significant value when we removed at least 11 out of 12 sex-linked markers for the most extreme scenario (0:50/50:0; Figure 4A).For Arctic Char, F st progressively decreased from 0.0215 to 0.0064 on average, considering all scenarios, which suggest that F st is more than two-fold smaller when sex-linked markers are removed from the data set (Figure 4B).This decrease reached a plateau when 80 sex-linked markers were removed, which corresponds to almost the totality (n = 94) of the sex-linked markers found.1).The dashed line in black indicates the threshold below which F st values are no longer significant at P < 0.05.Sex ratio of 0.4 and 0.5 were not included in this analysis because F st values were not significant in these cases (see Figure 4A).(B) Artic Char.The line graph displays the influence of sex-linked markers on (as a function of the number of sexlinked markers removed from the analysis considering three sampling scenario with different degrees of sex ratio bias (0:50/50:0, 5:45:45:5, 10:40/40:10).Sex-linked markers are removed in descending order according to their F st values.

Characterizing sex-linked markers in American lobster
Linkage disequilibrium (LD) calculations for the 12 sex-linked markers in American lobster revealed two clusters of markers in high LD (Figure S2).One of the clusters includes seven markers with the strongest genetic differentiation between the sexes (F st > 0.40; Table 3).Six of these markers displayed heterozygosity excess in males (H O = 0.49, H O ranging from 0.16 to 0.63) and heterozygosity deficit in females (H O < 0.02; H O ranging from 0.00 to 0.29), thus providing evidence for a male heterogametic system.
The identities of genes nearby the sex-linked SNPs in lobster were further explored in the six contigs containing the six sex-linked SNP markers, which were located in sequences that had a significant match (more than 90% of nucleotide identity) in the American lobster transcriptome.The polymorphisms associated with two of these sequences both occurred in the 3'UTR region of the genes annotated by SWISSPROT database.These genes were sulfotransferase family cytosolic 1B member 1 (hereafter SULT1B1) and pre-mRNA-splicing factor cwf19 (hereafter cwf19), and are involved in steroid metabolism and mRNA splicing, respectively.Both genes that were previously reported to influence sex determination in fishes (Devlin & Nagahama 2002), namely in European Eel (Anguilla anguilla; Churcher et al. 2015) and Greenland Halibut (Scophthalmus maximus; Ribas et al. 2015a).
Using BLAST to align the 94 sex-linked markers against the Atlantic Salmon (Salmo salar) reference genome (Lien et al. 2016;GenBank GCA_000233375.4) consistently identified the Atlantic Salmon chromosomes homologous to the Brook Char chromosomes that were assigned using iterative MapComp.An additional nine of the 49 non-positioned markers aligned against the Atlantic Salmon chromosomes corresponding to the four highly sex-linked chromosomes, Ssa01, Ssa10 and Ssa09 (Ssa09 corresponds to a fused metacentric chromosome that corresponds to BC35 and BC38; Sutherland et al. 2016).Four non-positioned markers were assigned to chromosomes not identified as the four highly sex-linked chromosomes.Often the markers that had not received positions with iterative MapComp either did not have significant alignments or had many equal alignments in the Atlantic Salmon genome.Using BLAST against the annotated Atlantic Salmon genome, 28 of the 94 markers were found within a gene.For the remaining markers not found in a gene but with significant alignments, the closest upstream and downstream genes were identified along with the distance from the marker to the gene.Two sex-linked markers positioned on the Brook Char sex chromosome (BC35), SNP 86986 and SNP 87087, were on either side of transcription factor SOX-11-like, a member of the SRY-related HMG-box gene family associated to sex determination (Graves 1998;Woram et al. 2003).This gene was the closest annotated gene to these markers in the downstream or upstream direction, respectively, although the distance was large (~280 kb in each direction).Other identified genes containing sex-linked markers are involved in chromosome segregation and recombination (e.g., nipped-B-like protein, nuclear pore complex protein Nup93, bloom syndrome protein and centrosomal protein of 164 kDa), putative sex-specific activities (e.g., talin-2), or transcription factor activity (e.g., retinoic acid receptor RXR-alpha).

Discussion
Sex-ratio bias in genotyping-by-sequencing studies Sex-linked markers are expected to be present in all massively parallel sequencing genomic datasets developed on species with a genetic basis for sex determination (Gamble & Zarkower 2014).Despite the ubiquity of these sex-linked markers across taxa, very few population genomic studies on marine or diadromous species have reported information on the sex of samples being analyzed (see details below).However, our results clearly demonstrate that the occurrence of such markers jointly with an unbalanced sex ratio in sampling can lead to the observation of a spurious or biased population structure.This, in turn, may result in misinterpreting the biology of the species being investigated and possibly leading to improper management recommendations.For instance, in the case of the lobster study here, with an unbalanced sex ratio this could have led to the conclusion that inshore and offshore lobsters comprise two genetically distinct stocks (and therefore distinct management units) while in reality they comprise a single panmictic unit.This bias is particularly critical for high gene flow species characterized by very weak population structuring, which is typical of many marine and diadromous species alike.In such cases, only a few highly differentiated markers (here 0.7% and 1.5% of the total filtered markers for American lobster and Arctic Char, respectively) can generate a signal of significant genetic differentiation or inflate the signal in the cases of panmictic or low population differentiation, respectively.These outcomes highlight the importance of collecting sex information of individual samples to draw accurate conclusions about population structure of non-model species using genome-wide data sets.
Moreover, sex ratio is obviously an important characteristic of a population and is tightly linked to its dynamics.Therefore, gaining this information is valuable for an efficient and well-designed management plan, especially considering that sex ratio can vary widely in nature.For instance, sex-biased dispersal will strongly affect sex ratio, and this is widespread in birds and mammals (Pusey 1987) but still poorly investigated in marine organisms (Burgess et al. 2015).Identifying sex-linked markers for identifying the genetic sex of sampled individuals may enable further studies documenting sexbiased dispersal (Yano et al. 2013) as well as overcoming the influence of an unbalanced sex ratio on the analyses of genetic structure.
Addressing bias in sex ratio for population genomic studies on marine and diadromous species Marine and diadromous population genomics studies in animals have become increasingly frequent in recent years, going from a single published article in 2010 to 52 (38 for marine and 14 for diadromous species) articles in 2016 (based on our selective criteria; Table 1).The literature search we performed indicates that in only 9.6% of all studies (5/52; Galindo et al. 2010;Bruneaux et al. 2013;Johnston et al. 2014;Benestan et al. 2015;2016b) information was reported about the sex of the sampled individuals.
Most of these studies have a sample size comparable to those of the present study ( 118and 359 samples on median for marine and diadromous MPS studies respectively) as well as a comparable number of individuals sampled per location (median N per location range = 20-38).Therefore, all of these studies could potentially have been susceptible to the biases identified here.In the majority of these studies, the number of markers genotyped was higher than ours (7,688 and 9,107 SNPs on median for marine and diadromous MPS studies respectively) but since we demonstrate that only 12 and 94 sexlinked markers (0.7% and 1.5% of our total initial MPS datasets) were sufficient to create a signal suggestive of genetic structure, a greater number of markers will not overcome the influence of a small proportion of sex-linked markers in a high gene flow system such as that observed in the majority of marine species.
Since many MPS studies currently under way may not have access to sex information, one alternative way of overcoming the potential bias resulting from sex ratio differences would be to statistically assess the presence of two genetic clusters not associated with geography or other a priori factors hypothesized to influence genetic structure.Then, one could run a BAYESCAN defining groups based on the two observed clusters and assess the level of heterozygosity shown by the outlier markers found, as we did for the American lobster.However, the heterozygosity method will only work if the sex-linked markers are within a sex determining region that is not present on the alternate sex chromosome (i.e.only on Y), or if the sex chromosomes are largely heteromorphic.
Nevertheless, this could help MPS studies to ensure that this bias is not present when interpreting patterns of genetic structure.

Sex determination in the American lobster
In crustaceans, as in many other species, sex is determined either by male (XX/XY) or female heterogamety (ZZ/ZW).However, sex chromosomes are difficult to identify in crustaceans because of the large number of chromosomes (e.g., on average 110 chromosomes for American lobster; Hughes 2014) and the small chromosome size (Legrand et al. 1987).Although markers associated with sex determination can be identified by approaches such as that used here, or as conducted in salmon lice (Lepeophtheirus salmonis, Carmichael et al. 2013) but see also (Gamble & Zarkower 2014), most of the crustacean sex determining systems are poorly understood and understudied (Legrand et al. 1987).Taking advantage of RAD-sequencing, we provide the first evidence of a male heterogametic system in the American lobster (XX/XY), which is in agreement with one review reporting that male heterogamy is more common in Subphylum Crustacea than in the majority of other invertebrate species (Legrand et al. 1987).In addition, we demonstrate the potential to efficiently uncover the sex chromosome system of a non-model species using a genome-wide dataset and analysis of heterozygosity excess or deficit.

Candidate genes involved in sexual differentiation in American lobster
We identified two candidate genes linked to sex in American lobster: SULT1B1, which is involved in steroid metabolism, and cwf19, which acts on pre-RNA splicing.Steroids play important roles in regulating physiological functions related to reproduction and sex differentiation in fishes (James 2011).More broadly, several publications have identified that sulfotransferase genes, such as SULT1, are linked to sex determination in house mouse (Mus musculus; Dunn et al. 1999), mussels (Mytilus galloprovincialis; Atasaral Şahin et al. 2015), European Eel (Anguilla anguilla; Churcher et al. 2015) and Turbot (Scophthalmus maximus; Ribas et al. 2015b).For instance, sulfotransferase 6B1-like gene (SULT6B1) was expressed at higher levels in the livers of sexually mature European Eel males relative to females, which was hypothesised as indicating that this gene may be associated with pheromonal communication during the reproduction of this species (Churcher et al. 2015).In addition, one sulfotransferase gene (hs3st1l2) was a candidate gene for sex determination in Turbot, being associated with differential expression between sexes at sexual maturity (Ribas et al. 2015a).Interestingly, this study also identified cwf19 gene as a putative sex determining gene in the Turbot (Ribas et al. 2015b).
Both candidate polymorphisms occurred in the 3'UTR region of SULT1B1 (SNP 2879519) and cwf19 gene (SNP 1525332).In particular, the polymorphism located in the 3'UTR region of SULT1B1 displayed heterozygosity excess in males and heterozygosity deficit in females (see Table 3).The 3'UTR regions have an important role in posttranscriptional control of gene expression, and thus may affect the level of protein being expressed (Hesketh 2004).Several studies have shown that polymorphisms in 3'UTR region modulate the level of transcription of genes (Barrett et al. 2012).Here, polymorphisms found in SULT1B1 and cwf19 gene may thus affect transcription, as was documented for European Eel (SULT6B1 was overexpressed in liver of sexually mature males with a fold change of 7.8; Churcher et al. 2015) and Turbot (cwf19 was underexpressed in turbot females with a fold change of -1.7 ; Ribas et al. 2015b).
Although the functional annotation for these two genes in American lobster is unknown, these markers may provide information on the sex determination system of this species, but would require further work.

Chromosomes and genes associated with sex-linked markers in Arctic Char
There is no high-density linkage map available yet for Arctic Char, but low-density linkage maps have indicated that the sex chromosome is homologous between Arctic Char and Brook Char and is homologous to Rainbow Trout RT-25 (Timusk et al. 2011).
However, Arctic Char may have a metacentric sex chromosome (Timusk et al. 2011) whereas in this mapping family, Brook Char has an acrocentric sex chromosome (BC35; Sutherland et al. in prep).BC35 corresponds to the ancestral chromosome 15.1 determined from Northen Pike (Esox lucius, Rondeau et al. 2014), which corresponds to Sex-linked markers in genome-wide datasets RT-25b (Sutherland et al. 2016), confirming the previous homology observations (Timusk et al. 2011).Six sex-linked markers are grouped on BC35.Considering that the Arctic Char sex chromosome is expected to be metacentric, the Arctic Char chromosome corresponding to BC35 is probably fused with another acrocentric chromosome that is linked to sex here.Other linkage groups, BC13 (14.1),BC15 (19.1) and BC38 (1.2) also show substantial linkage to sex with between 8-12 sex-linked markers being present on each.Interestingly, BC13 (14.1) is the homeologous chromosome to the Rainbow Trout sex chromosome (omySex; 14.2; Palti et al. 2015;Sutherland et al. 2016), and BC15 (19.1) is homologous to the neo-Y chromosome of Sockeye Salmon (Faber-Hammond et al. 2012).The presence of sex-linked markers on these chromosomes related to sex determination in other salmonids indicates the importance of comparative genomics for characterizing the sex chromosomes of the salmonids, which will be possible as more genomes become available.
Sex-linked markers were located on BC35 on both sides (~280 kb up or downstream) of the SOX-11-like gene (markers 86986 and 87087), which is interesting given the role of the Sox (SRY-related) family in sex determination (Graves 1998).This is the closest annotated gene down-stream to marker 86986 or up-stream to 87087.
Several other sex-linked markers were within genes related to recombination and chromosome segregation, which is interesting given the differences in recombination rate between the sexes (i.e., heterochiasmy) in the salmonids (Sakamoto et al. 2000).Genes containing sex-linked markers that were related to recombination included nipped-B-like protein (on BC13), involved in holding sister chromatids together during cell division (Losada 2014) bloom syndrome protein (on BC15), involved in homologous recombinational repair of double strand breaks during meiosis to suppress crossovers, centrosomal protein of 164 kDa (CEP164; on BC38), a centrosomal protein involved in cell cycle and chromosomal segregation (Sivasubramaniam et al. 2008), and nuclear pore complex protein Nup93 (BC15), with a range of activities including transcription regulation and chromosome segregation (Ibarra & Hetzer 2015).A sex-linked marker was also identified near centrosomal protein kizuna (BC12) involved in establishing mitotic centrosome architecture.Sex-linked markers were also within genes related to transcription factor activity, including retinoic acid receptor RXR-alpha (BC13), a Sex-linked markers in genome-wide datasets member of the steroid and thyroid hormone receptor superfamily involved in sex differentiation in many organisms (Lv et al. 2013) and in SOX-mediated gene expression regulation (Nikčević et al. 2008).Several sex-linked markers were near genes involved in Wnt signaling, which is important for sex determination (Jordan et al. 2001), including Wnt-7b-like (BC15) and frizzled-9-like (BC38).Finally, a sex-linked marker was present in talin-2 (BC15), which has a truncated version known to be specifically expressed in testes and kidneys and expressed in elongating spermatids (Debrand et al. 2009).These genes linked to sex in Arctic Char may provide a better understanding of sex determination and heterochiasmy within the salmonids.

Conclusion
In summary, these results indicate the importance for population genomics studies to collect sex information about individual samples when possible in order to (i) control sex ratio in sampling, (ii) overcome "sex-ratio bias" that can lead to spurious genetic differentiation signals and (iii) fill knowledge gaps regarding sex determining systems.If morphological sex is difficult to determine at some life stages, the identification of sexlinked markers for screening samples may provide a useful alternative solution.Here the exploration of sex-linked markers provided information regarding the sex determination system as well as genes that may be involved in sex dimorphism in American lobster.
Furthermore, using comparative genomics within the salmonids allowed us to identify chromosomes that may harbor genes involved in sex determination this ecologically and economically valuable salmonid species, including several chromosomes, which have already been associated with sex in other salmonid species.
We are grateful to the lobster fishers, the Ekaluktutiak Hunters and Trappers Organization, and the employees of Kitikmeot Foods Ltd without whom this project would have been impossible.We thank G Gerlach for providing the American lobster

Data accessibility
All the raw reads will be submitted to NCBI's Short Read Archive and the qualityfiltered genotypes will be submitted as vcf files to Dryad upon article acceptance.Code to position Arctic Char anonymous markers on the Brook Char genetic map, combine them with BAYESCAN Fst values, and draw the Manhattan plots can be found on GitHub at the following link: https://github.com/bensutherland/salp_anon_to_sfon.Table 1.Marine population genomics studies describing the name of the study, the organism studied, the method used to produce the genetic markers, the number of individuals sampled (N), the number of genetic markers (SNPs), the number of individuals sampled per location (N POP ), the index of genetic differentiation observed among the location studied (F st ).

Tables
using PstI and MspI (details can be found in Perreault-Payette et al. in press).Specimens were individually barcoded with unique six-nucleotide sequences and pooled with 48 individuals per library.Libraries were each sequenced on two Ion Torrent Proton P1v2 chips.

Figure 2 .
Figure 2. Discriminant Analysis of Principal Components (DAPC) and Principal Components Analysis (PCA) of genetic differentiation depending on the sampling scenario (A and C).Results of the DAPC (A) and the PCA (C) performed on lobster and Arctic Char respectively with sex information included.Individuals from the inshore/east and offshore/west regions are represented by different shape symbols, and male and female are represented by black and white symbols, respectively.(B and D) Results of the DAPC (B) and the PCA (D) performed on lobster and Arctic Char respectively, but using hypothetical datasets in which only males were sampled in one of the location (offshore and west respectively) and only female in the other location (inshore and east respectively) showing a false signal of population differentiation driven by differences in sex ratios.

Figure 3 .
Figure3.Boxplots showing the influence of sampling sex ratio on F st. (A) American lobster.F st between offshore and inshore according to sex ratio proportion when subsampling 100 individuals with a sex ratio ranging from a complete unbalanced sex ratio (i.e., sex ratio equal to 0:50/50:0) to a perfectly balanced sex ratio (i.e., sex ratio equal to 25:25/25:25).The horizontal black dashed line indicates the threshold below which F st values are no longer significant at P < 0.05.(B) Arctic Char.F st between east and west according to the sex ratio proportion when subsampling 100 individuals with a sex ratio ranging from a complete unbalanced sex ratio (i.e., sex ratio equal to 0:50/50:0) to a perfectly balanced sex ratio (i.e., sex ratio equal to 25:25/25:25).F st was still significant for the anadromous, but was overestimated in the skewed sex ratio cases.In both panels, the vertical limits of the box represent one standard deviation around the mean (n = 10 individual subsample iterations), the horizontal line within the box is the median, and the whiskers extend from the box to the 25 th and 75 th percentiles.

Figure 4 .
Figure 4.The effect of sex-linked markers on the index of genetic differentiation (F st ).(A) American lobster.The line graph displays the influence of sex-linked markers on F st as a function of the number of sex-linked markers removed from the analysis considering three sampling scenario (10:40/40:10, 5:45/45:5, 0:50/50:0).Sex-linked markers are removed in descending order according to their F st values (see Table1).The dashed line in black indicates the threshold below which F st values are no longer significant at P < 0.05.Sex ratio of 0.4 and 0.5 were not included in this analysis because F st values were not significant in these cases (see Figure4A).(B) Artic Char.The line graph displays the influence of sex-linked markers on (as a function of the number of sexlinked markers removed from the analysis considering three sampling scenario with different degrees of sex ratio bias (0:50/50:0, 5:45:45:5, 10:40/40:10).Sex-linked markers are removed in descending order according to their F st values.

Figure 5 .
Figure 5. Manhattan plot of BAYESCAN F st between the sexes for Arctic Char markers positioned on the Brook Char genetic map.Arctic Char markers without positions were assigned positions on the Brook Char linkage map using multiple iterations of MapComp to identify linkage groups that were associated with sex in Arctic Char.Plotting the BAYESCAN F st along with marker positions indicates four linkage groups show strong linkage to sex: BC13, 15, 35 and 38.All positioned markers are displayed, and crosses indicate significant BAYESCAN F st markers.Markers that are not associated with sex have very low F st and can be seen along all of the linkage groups at the bottom of the graph.
samples.The NSERC CFRN funded the lobster portion of this research, while Fisheries and Oceans Canada and the Nunavut Wildlife Management Board funded the Arctic Char part of the work.Access to the Biomina Galaxy server hosted by the Biomina research center at the University of Antwerp, Belgium, made possible the analysis of the lobster transcriptome data.Funding for the lobster transcriptomics program was provided by the P.E.I.Atlantic Shrimp Corporation Inc. (12-LSC-035) and the Atlantic Lobster Sustainability Measures Program.Sequencing of the lobster transcriptome was performed by Genome Québec.L. Benestan was supported by a doctoral fellowship from NSERC CFRN and Réseau Aquaculture Québec (RAQ), and funds from LB's Canadian Research Chair in Genomics and Conservation of Aquatic Resources.

Table 2 .
Anadromous or catadromous (in the case of eels) population genomics studies describing the name of the study, the organism studied, the study goal, the method used to produce the genetic markers (Method), the number of individuals sampled (N), the number of genetic markers (SNPs), the number of individuals sampled per location (N POP ), the index of genetic differentiation observed among the location studied (F st ).

Table 3 .
American lobster.Observed heterozygosity (H o ), expected heterozygosity (H e ), inbreeding coefficient (F is ), P-value associated to inbreeding coefficient (P-value) and genetic differentiation index (F st ) between sexes (females, n=100; males, n=103) for 12 sex-linked markers identified with BAYESCAN.Markers showing the strongest genetic differentiation between both sexes and belonging to the same LD the cluster are in bold (see FigureS2).