Sex chromosome and sex locus characterization in the goldfish, Carassius auratus

Background Goldfish is an important model for various areas of research, including neural development and behavior and a species of significant importance in aquaculture, especially as an ornamental species. It has a male heterogametic (XX/XY) sex determination system that relies on both genetic and environmental factors, with high temperatures being able to produce female-to-male sex reversal. Little, however, is currently known on the molecular basis of genetic sex determination in this important cyprinid model. We used sequencing approaches to better characterize sex determination and sex-chromosomes in goldfish. Results Our results confirmed that sex determination in goldfish is a mix of environmental and genetic factors and that its sex determination system is male heterogametic (XX/XY). Using reduced representation (RAD-seq) and whole genome (pool-seq) approaches, we characterized sex-linked polymorphisms and developed male specific genetic markers. These male specific markers were used to distinguish sex-reversed XX neomales from XY males and to demonstrate that XX female-to-male sex reversal could even occur at a relatively low rearing temperature (18°C), for which sex reversal has been previously shown to be close to zero. We also characterized a relatively large non-recombining region (∼11.7 Mb) on goldfish linkage group 22 (LG22) that contained a high-density of male-biased genetic polymorphisms. This large LG22 region harbors 373 genes, including a single candidate as a potential master sex gene, i.e., the anti-Mullerian hormone gene (amh). However, no sex-linked polymorphisms were detected in the goldfish amh gene or its 5 kb proximal promoter sequence. Conclusions These results show that goldfish have a relatively large sex locus on LG22, which is likely the goldfish Y chromosome. The presence of a few XX males even at low temperature also suggests that other environmental factors in addition to temperature could trigger female-to-male sex reversal. Finally, we also developed sex-linked genetic markers in goldfish, which will be important for future research on sex determination and aquaculture applications in this species.

examination of the offspring gonads of the putative XX neomale identified 7 fish with testes, 134 83 fish with ovaries, and 41 fish with undifferentiated gonads. Disregarding fish with 135 undifferentiated gonads suggests a sex ratio of 7.8% males and 92.2% females for the offspring 136 of the XX neomale (Table 1). Gonadal histology of the offspring of the putative XY revealed 137 48 animals with testes, 65 with ovaries, and 14 with undifferentiated gonads, which gives a sex 138 ratio of 42.5% male and 57.5% female, ignoring the offspring with undifferentiated gonads 139 (Table 1). These sex ratio differences (Table 1), strongly support the hypothesis that male 140 breeder 1 is an XX neomale with an offspring sex ratio not significantly different from an 141 expected all-female population with a slight percentage of female-to-male sex-reversal, and that 142 breeder 2 is a genetic XY male with an offspring sex ratio not significantly different from an 143 expected 50:50 sex ratio. In agreement with these results, none of the XX neomale offspring 144 produced a positive PCR amplification for our three Y-allele specific primer pairs ( Figure S3, 145 Table 2), and all 48 phenotypic males but only one of 65 phenotypic females offspring from the 146 XY phenotypic male produced positive amplifications ( Figure S4, Table 2). This result also 147 indicates that no neomales were detected in offspring from the XY genetic male if we do not 148 consider the undetermined individuals compared to the 7.8% of neomales in the XX neomale 149 offspring population. Using the three Y-allele specific primer pairs described above, we genotyped goldfish 153 individuals and selected 30 phenotypic and genotypic males that were used along with 30 154 phenotypic females to contrast whole genome sex differences by pool-sequencing analysis [29]. 155 Pool-sequencing reads from the respective XY male and phenotypic female pools were mapped 156 to the high contiguity goldfish female genome assembly [6] to characterize genomic regions 157 enriched for sex-biased signals, i.e., sex coverage differences or sex-biased Single Nucleotide 158 Polymorphism (SNP) differences. Whole genome analysis of SNP distribution (Figure 2) 159 revealed a strong sex-linked signal in males on linkage group 22 (LG22) and two unplaced 160 scaffolds (NW_020523543.1 and NW_020523609.1) with a high density of observed SNPs 161 being heterozygous in the male pool and homozygous in the female pool (Y-specific allele). 162 Interestingly, of the 32 markers found using the RAD-Seq approach, 7 tags were enriched in 163 the unplaced scaffold NW_020523543.1 (Fig. 3C), confirming by a second approach that this 164 scaffold is part of the SD locus in goldfish. These regions with a high density of male-specific 165 SNPs ( Figure 3) are potential sex-determining regions that could contain the goldfish master 166 sex determining gene. LG22, being the only linkage group with a large sex determining region 167 (SDR, highlighted by a black box on Fig. 3A, C, D) containing a high-density of male-specific 168 SNPs (~11.7 Mb), likely corresponds to the goldfish Y sex chromosome. 169 We also observed, however, some smaller signals with less dense sex-linked SNPs in other 170 linkage groups ( Figure 2A) like for instance on LG47 ( Fig. S1) with both male and female sex-171 linked signals. Interestingly, LG47 is paralogous to LG22 stemming from the Cyprinidae whole 172 genome duplication [6]. Indeed, due to this recent common ancestry, these two chromosomes 173 share large homologous and syntenic regions (Fig. S2) that could have resulted in some false 174 remapping of the pool-sequencing reads leading to some of these secondary minor signals. 175 176

Identification of candidate SD genes 177
Searches for annotated genes by BLAST within the 20 contigs found in our male goldfish draft 178 genome assembly based on the RAD-Seq approach did not return any matches for a candidate 179 SD gene, but mostly transposable elements (Supplementary excel file 3). All genes within the 180 SDR (N= 373) were extracted because they are potential candidates for being SD gene(s) 181 (Supplemental excel file 4). Interestingly, among these genes the anti-Mullerian hormone gene 182 (amh) was found at the beginning of the SDR on LG22 (Fig. 3B). This gene has been reported 183 to be a sex-determining gene in other fish species [14,15]. However, we did not identify any 184 male-specific SNPs in the coding sequence of goldfish amh. In addition, other male specific 185 alleles within the 5kb promoter region did not show any sex-linkage. Though goldfish is an important economic ornamental fish and a useful model for studying 189 development, evolution, neuroscience, and human disease [3], characterization of goldfish sex-190 specific sequences and potential sex chromosomes have not been reported. In this study, we 191 explored goldfish sex determination using two complementary whole-genome approaches and 192 found that this species has a XX/XY sex determination system as previously described [24] and 193 a large, non-recombining sex determination region on LG22. Although RAD-sequencing or 194 pool-sequencing have been often used separately to explore sex determination in vertebrates 195 [16,30,31], we choose to combine these two approaches in goldfish because of the significant 196 female-to-male sex reversal induced by temperature [25] that would have prevented a clear 197 identification of the sex determining region using only a pooled strategy, which mixes genetic 198 XY males and XX males resulting from the sex reversal of genetic females. Because  sequencing keeps track of each individual, we were able to identify sex-reversed individuals in 200 goldfish that might have masked sex-linked markers in Pool-seq. 201 202 Sex markers identification is an important step to characterize SD systems [32][33][34][35][36][37][38]. Using two 203 complementary whole-genome approaches, we characterized genomic regions containing sex-204 linked markers. In goldfish, these sex-linked markers are genomic DNA variations including 205 gaps, indels and SNPs that present heterozygote polymorphisms in all males and complete 206 homozygosity in all females. This male-specific heterozygosity pattern agrees with a male 207 heterogametic XX/XY system as previously reported using progeny testing of hormonally sex-208 reversed breeders [24]. We found, however, a strong environmental influence leading to a 209 relatively high proportion (around 50%) of female-to-male sex-reversal in the first experimental 210 population that we used for the RAD-Sequencing approach. These animals were actually two-211 year old goldfish raised in an outdoor experimental facility and obtained at different spawning 212 times i.e., from May-June to late September. Some of these animals experienced early 213 development during summer time at potentially higher temperature and others had their early 214 developmental period at lower temperatures. Considering the known effects of high temperature 215 on female-to-male sex reversal in goldfish [25], the fact that some of these fish were exposed 216 to a high summer temperature could explain this relatively high percentage of female-to-male 217 sex-reversed animals. This high percentage was not found in our other experiments in which 218 fish were raised in indoor recirculating system facilities with a tightly controlled low 219 temperature (18 ) maintained throughout the whole early development phase (3 months). This 220 situation indeed confirms earlier findings showing that temperature is probably a major trigger 221 of neomasculinization in goldfish, but we also found that even at this low temperature there 222 was still a small percentage of female-to-male sex-reversal (7.8%), suggesting that other 223 environmental factors, potentially social factors as demonstrated in other species [8,39], could 224 also play a role on goldfish sex determination. Apart from goldfish, sex determination in other 225 teleost fish, including Tilapia [40], medaka [41] and tongue sole [42] is also regulated by 226 temperature, which overrides the genetic sex determination mechanisms and leads to female-227 to-male sex reversal. By developing genetic sexing tools in goldfish that allows the 228 identification of Y-allele carrying animals, we also brought additional evidence that some of 229 these phenotypic males were indeed sex-reversed XX genetic females. These genetic sexing 230 tools are indeed important for better deciphering genetic and environmental sex determination 231 in goldfish. But these PCR primers could be also used to facilitate the industrial production of 232 commercial goldfish-related hybrid fish in China [43,44], by helping to identify neomales i.e., 233 XX female-to-male sex reversed animals. 234 235 Sex determination in vertebrates is highly variable with the major exceptions of Eutherian 236 mammals and birds in which XX/XY and ZZ/ZW monofactorial sex determination systems 237 have been conserved over a long evolutionary period [45,46]. In contrast, fish exhibit much 238 more diverse and dynamic sex determination [9,10,47], with monofactorial and polyfactorial 239 [48,49] genetic systems and frequent switches and turnovers of master sex-determining genes 240 [12,14,15,17,21,50]. In goldfish, we identified male-specific markers and obvious male-241 specific SNPs strongly enriched on LG22. This result confirms that goldfish has an XX-XY 242 system [24] and also indicated that LG22 is the sex chromosome in that species. Evidence is 243 accumulating for the hypothesis that sex chromosomes, in most cases, evolve from autosomes 244 with de novo initial evolution of a new sex determination mechanism that subsequently 245 becomes fixed and extended by the suppression of recombination on the sex chromosome in 246 the vicinity of the initial sex locus, which may increase the size of this non recombining sex 247 determination locus [51]. In goldfish, ~11.7 Mb of LG22 contains numerous male-specific 248 SNPs. A similar large size of the non-recombining region on the sex chromosomes was also 249 found in tilapia including 17.9 Mb in Sarotherodon melanotheron and 10.7 Mb in Oreochromis 250 niloticus [30,31]. The large non-recombining region on LG22 contains 373 gene models based 251 on the goldfish genome annotation and also a large number of transposable elements (TEs) that 252 were found to be strongly enriched in the male specific contigs identified by our RAD-Sex and 253 our draft genome analysis. Enrichment of TEs around sex loci has been found in other vertebrate 254 species [52] and may play a crucial role for suppression of recombination leading to an 255 expansion of sex chromosome divergence. 256 257 With LG22 being the potential sex chromosome in goldfish, it is reasonable to believe that the 258 non-recombining region that we characterize on LG22 contains the goldfish master sex 259 determining gene. But the only "usual suspect" master sex determining candidate found in this 260 region and the additional non-assembled scaffolds containing sex-linked markers is the anti-261 Mullerian hormone gene (amh) that is located at the beginning of the LG22 non-recombining 262 region. Duplications of amh have been characterized as the master sex determining gene in 263 different fish species [14,15], making Amh and members of the TGF-beta pathway [17, 19, 20] 264 likely candidates for this sex-determining function. But we have not been able to characterize 265 sex-linked variation neither in the amh coding DNA sequence nor in its 5 kb proximal promoter 266 sequence. Even if we cannot rule out the hypothesis that amh regulation could be affected by 267 sex-specific cis-regulatory elements located very far upstream from amh, our results do not 268 provide any clear and direct evidence that this gene is the goldfish master sex determining gene. 269 Indeed, not all master sex determining genes are classical "usual suspects" known to be 270 involved in the sex-differentiation pathway like TGF-beta members [17,19,53], Sox3 [21], or 271 Dmrt1 [50,54]. For instance, the rainbow trout master sex determining gene arose from the 272 duplication / transposition / evolution of an immune-related gene [12]. This finding suggests 273 that goldfish could also have an unusual master sex determining gene, preventing an easy and 274 direct identification just with simple genome-wide analyses and candidate gene approaches. that occurred approximately 14 million years ago [6]. This WGD adds an extra complexity to 279 our search for sex-linked regions and sex determining candidate genes because some of these 280 duplicated regions may still retain large blocks of high sequence similarity. The cyprininae 281 genome duplication probably explains why we found an additional sex-biased signal on LG47 282 that stems from the duplication of the same ancestral chromosome that LG22. In addition to the 283 cyprininae WGD, the current goldfish reference genome sequence [6] was assembled from the 284 sequences of an XX gynogenetic animal, meaning that the LG22 sex chromosome sequence is 285 an X chromosome sequence in which potential Y specific regions may be not present. We 286 indeed produced a first draft genome sequence of an XY male but a higher contiguity male 287 genome including long-read technology would be needed to better explore sex-chromosome 288 differences and characterize potential sex-determining candidates. 289

CONCLUSIONS 291
Our results confirm that sex determination in goldfish is a complex mix of environmental and 292 genetic factors, and that its genetic sex determination system is male heterogametic (XX/XY). 293 We also characterized a relatively large non-recombining region (~11.7 Mb) on LG22 that is 294 likely to be the goldfish Y chromosome. This large non-recombining region on LG22 contains 295 a single obvious candidate as a potential master sex gene, namely the anti-Mullerian hormone 296 gene (amh). No sex-linked polymorphism, however, was detected in the goldfish amh gene and 297 its 5 kb proximal promoter sequence. Our work provides the foundation required for additional 298 studies that are now required to better characterize sex determination in goldfish and to 299 characterize its master sex-determining gene. 300

Experiment fish 303
Fish used for RAD-seq and Pool-seq were reared outdoors and obtained from different 304 spawning times i.e., between May-June and late September. Putative XY and XX males were 305 selected using Y-allele specific primers and these two males were crossed with the same female 306 to produce two goldfish populations that were incubated and reared indoor at 18℃ during three 307 months after fertilization to minimize the chance of sex reversal induced by temperature 308 according to previous research [25]. After these 3 months at 18°C, the rearing temperature was 309 gradually increased to 24℃ over a period of 7 days to avoid suddenly dramatic temperature 310 variation. One-year old fish were euthanized with Tricaine before dissection. Gonads of 311 goldfish were fixed in Bouin's fixative solution for 24 hours and then embedded gonads were 312 cut serially into 7 µm sections and stained with Hematoxylin to characterize ovarian or testicular 313 features. Fin clips were stored in 90% alcohol for DNA extraction and genotyping. Statistics 314 were applied to test for significant sex ratio differences and genotype/phenotype sex-linkage 315 with a Chi-squared test (p < 0.05). Primers were designed from the sequences of male-biased contigs for sex genotyping and a 327 positive control (Table S1)  PCR thermal cycle procedures were: 94℃ for 30s for denaturing, 58℃ for 30s for annealing 333 and 72℃ for 30s for extending for 35 cycles. Finally, PCR products were electrophoresed on 334 1.5% agarose gels. 335 336

Restriction-site association sequencing (RAD-seq) and male-marker discovery 337
Genomic DNA was extracted from 30 males and 30 females and digest with the restriction 338 enzyme SbfI for constructing a RAD-seq library according to standard protocols [55]. Briefly, 339 for each sample, 1µg of DNA was digested using SbfI. Digested DNA was purified using 340 AMPure PX magnetic beads (Beckman Coulters) and ligated to indexed P1 adapters (one index 341 per sample) using concentrated T4 DNA ligase (NEB). Ligated DNA was purified using 342 AMPure XP magnetic beads. Each sample was quantified using microfluorimetry (Qubit 343 dsDNA HS assay kit, Thermofisher) and all samples were pooled in equal amount. The pool 344 was fragmented on a Biorputor (Diagenode) and purified using a Minelute column (Qiagen). 345 Sonicated DNA was size selected on an 1,5 % agarose cassette aiming for an insert size of 300 346 bp to 500 bp. Size selected DNA was extracted from the gel using the Qiaquick gel extraction 347 kit (Qiagen), repaired using the End-It DNA-end repair kit (Tebu Bio) and adenylated on its 3' 348 ends using Klenow (exo-) (Tebu-Bio). P2 adapter was ligated using concentrated T4 DNA 349 ligase (NEB) and 50 ng of the ligated product was engaged in a 12 cycles PCR. After AMPure 350 XP beads purification, the resulting library was checked on a Bioanalyzer (Agilent) using the 351 DNA 1000 kit and quantified by qPCR using the KAPA Library quantification kit (Roche, ref. (http://github.com/RomainFeron/RADSex-vis) (Fig 1.A). Sequences significantly associated 361 with sex were extracted using the function signif, which identifies sex-bias tags. 362 Male-biased tags were compared to the male de novo assembly with ncbi-blast+ (version: 2.6.0) 363 setting the e-value cutoff to 1 e-20 to identify long, homologous male-biased contigs. Male 364 specific PCR primers were designed from these contigs sequences (see Table S1)

Sequencing and de novo assembly of a goldfish male genome 401
One genetic male was selected for de novo assembly using the Y-specific primers described 402 above. Library was generated using the Truseq nano DNA sample prep kit (Illumina, ref.   Three Y-allele primer pairs (marker 1 to 3) and one autosomal primer pair (positive control) were designed on our XY male genome assembly (male assembly). Name of the contig and nucleotide position (3'-5') are given in the genome location column.     SNPs were counted using 100kb sliding window with an output point every 500bp and female-and male-specific SNPs were respectively indicated by red and blue color. (A) A large sex-determination region was identified on LG22, which is highlighted with a black box. The candidate sex-determining gene amh is located on this LG22, but not in the high density, male-specific SNP region. The region from 8Mb to 10Mb containing amh is zoomed in panel (B). (C) The NW_020523543.1 unplaced scaffold exhibits a region around 0.1Mb harboring a small region (200 kb) with a high-density of male-specific SNPs. Meanwhile, sequence comparisons demonstrate that 7 male-biased RAD-tags (colored circles) on a total of 32 map with a high identity onto this scaffold. In contrast, few femalespecific SNPs were enriched on this scaffold (red area). (D) The unplaced NW_020523609.1 scaffold is enriched in male-specific SNPs.

Figure S3
Figure S3: Sex genotyping with Y-allele primers of the offspring of a putative XX neomale with a normal XX female. Genotyping was conducted with three Y-allele primers and one autosomal primer used as a gDNA quality control. Phenotypic sex was determined by gonadal histology and males and females are shown using red and yellow color respectively. Femaleto-male sex-reversed animals (N= 7) are highlighted by red boxes. Hashes indicate animals with unknown phenotypic sex with undifferentiated gonads based on histology. Figure S4   Figure S4. Sex genotyping with Y-allele primers of the offspring of a putative XY male with a normal XX female. Genotyping was conducted with three Y-allele primers and one autosomal primer used as a gDNA quality control. Phenotypic sex was determined by gonadal histology and males and females are shown using red and yellow color respectively. The female-to-male sex-reversed animal (N= 1) is highlighted by a red box. Hashes indicate animals with unknown phenotypic sex with undifferentiated gonads based on histology.