Gene duplication to the Y chromosome in Trinidadian Guppies

Differences in allele frequencies at autosomal genes between males and females in a population can result from two scenarios. Unresolved sexual conflict over survival produces allelic differentiation between the sexes. However, given the substantial mortality costs required to produce allelic differences between males and females at each generation, it remains unclear how many loci within the genome experience significant sexual conflict over survival. Alternatively, recent studies have shown that similarity between autosomal and Y sequence, arising from duplication onto the Y, can create perceived allelic differences, and this represents potentially resolved sexual conflict. However, Y duplications are most likely in species with large non-recombining regions, in part because they simply represent larger targets for duplications. We assessed the genomes of 120 wild-caught guppies, which experience extensive predation- and pathogen-induced mortality and have a relatively small ancestral Y chromosome. We identified seven autosomal genes that show allelic differences between male and female adults. Five of these genes show clear evidence of whole or partial gene duplication to the Y chromosome, suggesting that the male-specific region of the guppy Y chromosome, although relatively small, may nonetheless act as a hotspot for the resolution of sexual conflict. The remaining two genes show evidence of partial homology to the Y. Overall, our findings suggest that the guppy genome experiences a very low level of unresolved sexual conflict over survival, and instead the Y chromosome, despite its small ancestral size and recent origin, acts as a major mechanism of conflict resolution.

Differences in allele frequency between males and females at autosomal loci can result from two alternative sources. Numerous recent studies have used allelic differences between males and females within a population (FST, Dxy, etc.) as a way to infer sex differences in viability or survival, and therefore sexual conflict over mortality (Cheng & [...]). This approach assumes that allele frequencies are the same in males and females at conception, but diverge over the course of a generation for loci with alleles that benefit the survival of one sex at some survival cost to the other (intra-locus sexual conflict). We expect that sexual conflict over survival would produce a signature of allelic differentiation between the sexes as well as balancing selection (Mank, 2017). The latter results from all forms of intra-locus sexual conflict, not just that over survival, as alleles are selected for or against depending on whether they are present in males or females (Barson et [...]) [...] precludes the presence of large numbers of sites subject to sexual conflict due to survival, as the associated mortality load would simply be too great. Moreover, recent work has highlighted the potential that many perceived allelic differences between the sexes are actually the result of sequence homology between autosomal and sex-linked loci (Bissegger et

Intersexual FST
Given the low level of allelic divergence expected between males and females, it is critical to minimize false positives (Kasimatis et al. 2019). We therefore used three independent methods to identify SNPs with elevated intersexual FST. We identified SNPs that 1) were in the top 1% of the autosomal FST distribution, 2) were significant after permutation testing of samples (1000 replicates, P < 0.001), and 3) showed significant differences in male and female allele frequency based on Fisher's exact test (P < 0.001) (Supplementary Fig. 1 [...] Supplementary Fig. 3).

Estimates of intersexual FST and Tajima's D can be biased due to relatedness of individuals within groups. Although our sampling design was balanced, with 10 females and 10 males collected at two separate sites for each of three rivers across the island of Trinidad (see [...] (Table 1), as is expected for genes on the Y chromosome.

In order to differentiate whether the pattern of elevated M:F FST is due to limited sequence homology between the autosomal genes and the Y chromosome, or results from at least partial gene duplication, we mapped the pattern of M:F read depth across each of these 5 genes (Fig. 1). The high M:F read depth for Olr1492-like, consistent across the full length of the single exon of this gene, suggests a complete duplication, possibly in two copies (Fig. 1A). Similarly, the M:F read depth ratio for si:rp71-17i16.5 (

Sexual conflict over survival for autosomal genes
We identified just two sexually differentiated genes without significantly higher M:F read depth (Table 1) [...] Weedall & Conway 2010), and to other genomic categories (Fig. 2). Contrary to our expectation of elevated Tajima's D for these two sexually differentiated genes, neither showed a Tajima's D value significantly greater than the autosomal average, and Tajima's D for ENSPREG15023 is significantly lower than the autosomal average (Fig. 2).

Many things can influence Tajima's D estimates, and the lack of an elevated signature of balancing selection relative to the remainder of the genome is not necessarily indicative of a lack of sexual conflict. In order to understand the dynamics of these two genes in more detail, we mapped M:F read depth as a function of genomic location and intersexual FST (Fig. 3).

We identified 504 coding sequence SNPs in the guppy autosomal genome which showed significant differences in allele frequency between males and females. We used these to identify seven autosomal genes with significant average intersexual FST. Our approach, based on the intersection of three statistical methods, reduces the likelihood of false positives and results in a high-confidence list of genes with elevated intersexual FST (Table 1, Supplementary Figs. 1, 2, and 3). This list of genes can be used to assess the relative role of Y duplication in resolving conflict and driving intersexual FST, versus the role of sex differences in mortality in sexual conflict at autosomal genes.
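As a rough illustration of the statistic discussed above, the following sketch computes Tajima's D from the standard textbook formula (Tajima 1989). It is not the pipeline used in this study. The function names and the input format (per-site counts of the derived or alternate allele among n sampled chromosomes) are assumptions made for the example.

```python
import math

def pi_from_counts(allele_counts, n):
    """Mean number of pairwise differences, summed over segregating sites.

    allele_counts: hypothetical input, one count per site of how many of the
    n sampled chromosomes carry the alternate allele.
    """
    return sum(2 * k * (n - k) / (n * (n - 1)) for k in allele_counts)

def tajimas_d(allele_counts, n):
    """Tajima's D for one region; returns NaN when it is undefined."""
    seg = [k for k in allele_counts if 0 < k < n]  # keep polymorphic sites only
    s = len(seg)
    if n < 4 or s == 0:  # the variance term is zero for n = 2 or 3
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = pi_from_counts(seg, n)
    # D contrasts two estimators of theta: pi and Watterson's S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Intermediate-frequency alleles (as expected under balancing selection)
# push D above zero; an excess of rare alleles pushes it below zero.
balanced = tajimas_d([5, 5, 5, 5, 5], n=10)
rare = tajimas_d([1, 1, 1, 1, 1], n=10)
```

In this toy input, sites with alleles at intermediate frequency give a positive D and sites with only singletons give a negative D, which is the direction of the contrast used when testing for elevated Tajima's D.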

The Y chromosome in guppies as a locus for conflict resolution
Five of the seven high-confidence sexually differentiated genes showed evidence of Y duplications based on elevated male-to-female read depth ratios (Table 1 [...]

Our findings add further support for a region of recombination suppression across populations on Trinidad, as only a male-specific region of the Y can explain the M:F read depth differences we observe. Moreover, our work suggests that the guppy Y chromosome is dynamic with regard to gene content, and acts as a hotspot for gene duplications with male-specific functions despite its recent origin and small size. It is also possible that Y genes have duplicated to the autosomes, which would also produce a pattern of increased male read depth, although this is arguably less likely. Taken together, our work suggests that even homomorphic sex chromosomes may act as a hotspot of sexual conflict resolution. Moreover, our results further emphasize the importance of accounting for Y gene duplications in scans for M:F FST, as all of our sexually differentiated genes show evidence of Y duplication.

The Y-duplicated genes we identify here are not present in our previous list of Y-linked genes based on male-specific sequence (Almeida et al., 2020), or in other similar analyses (Fraser et al., 2020). This is not surprising, as duplications, particularly if recent, will still retain substantial homology to the autosomal copy and will not be detected when bioinformatically identifying sequence that is unique to male genomes. Consistent with recent duplications and limited divergence, none of our Y-duplicated genes exhibit significantly elevated average M:F SNP density (Table 1, Fig. 2), although the values are all >1. [...] signal, if any, was due to Y duplications. However, it is worth noting that elevated intersexual FST was highest for genes with male-biased expression, as would be expected for genes with Y duplicates. We also did not observe a concomitant pattern of elevated Tajima's D for these genes, which is inconsistent with sexual conflict over survival.
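The read depth reasoning in this section can be made concrete with a small worked calculation. The sketch below is an illustration only and rests on several assumptions: reads from Y-linked copies map back to the autosomal gene, each male carries one Y with a fixed number of duplicate copies, and depth at a gene is normalised by each sex's genome-wide mean depth. All function names and the example depths are hypothetical.

```python
def expected_mf_ratio(y_copies, autosomal_copies=2):
    """Expected M:F read depth ratio if males carry extra Y-linked copies.

    Females contribute `autosomal_copies` templates per gene; males contribute
    the same number plus `y_copies` from the Y (an assumed simple model).
    """
    return (autosomal_copies + y_copies) / autosomal_copies

def normalised_mf_ratio(male_depth, female_depth, male_mean, female_mean):
    """M:F depth ratio at a gene after scaling by genome-wide mean depth."""
    return (male_depth / male_mean) / (female_depth / female_mean)

def implied_y_copies(ratio, autosomal_copies=2):
    """Invert the simple model: extra male copies implied by an M:F ratio."""
    return max(0.0, autosomal_copies * (ratio - 1))

# Hypothetical gene: 45X in males against a 30X male average,
# 20X in females against a 20X female average.
ratio = normalised_mf_ratio(45, 20, 30, 20)
copies = implied_y_copies(ratio)
```

Under this model a single Y-linked copy gives a ratio of 1.5 and two copies give 2.0, whereas partial homology limited to part of a gene would raise the ratio only over the homologous stretch. Real data are noisier, so such values are best read as rough guides.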

Sexual conflict over survival targets few genes in the guppy autosomal genome
Intra-locus sexual selection over survival or viability leads to allele frequency differences between the sexes over the course of a generation, as an allele increases the survival of one sex [...]

We expect that if these genes are indeed subject to sexual conflict over mortality, we would observe elevated Tajima's D (Fig. 2); however, this was not the case for either locus. Moreover, both of these loci exhibit patterns of M:F read depth consistent with significant Y homology (Fig. 3). This suggests that the potential for sexual conflict over survival is quite low, even in a species where we most expect to observe it.

Concluding remarks
Here we use a high-stringency filtering method to detect genes within the guppy genome that exhibit population genetic signatures expected from sexual conflict over survival. Although wild guppies are expected to have high potential for sexual conflict over survival, we in fact found no genes within the genome with patterns consistent with this. Instead, despite the small size of the conserved non-recombining region of the guppy Y, we observe five loci that show patterns consistent with autosome-to-Y duplications, and two more that are suggestive of Y homology without duplication. This highlights the potential of even young, small Y chromosomes as regions of conflict resolution.

Data Collection and Genotyping
Samples were collected from three rivers, Aripo, Yarra, and Quare, in Trinidad in December 2016, in accordance with national collecting guidelines. In total, 10 males and 10 females were collected from one high predation and one low predation population in each river, resulting in 120 samples, which were sequenced individually with Illumina HISEQX. Further sequencing details are available in Almeida et al. (2020).

We used FastQC v0.11 (www.bioinformatics.babraham.ac.uk/projects/fastqc) and Trimmomatic 0.36 (Bolger et al., 2014) to remove adapter sequences and low-quality reads. After quality control, we recovered ~30X average sequencing depth for males and ~20X sequencing depth for females. High quality reads were aligned against the Poecilia reticulata

Intersexual FST
In order to estimate intersexual allele frequency differences, we implemented Weir & Cockerham's estimator of FST (Weir & Cockerham 1984) between males and females using VCFtools v0.1.16 for each SNP in genome-wide coding sequence regions. We employed three methods jointly to identify SNPs exhibiting high FST. First, we used a cut-off method, retaining only SNPs in the top 1% of autosomal FST values. Second, we performed permutation tests by randomly assigning individuals to one of two sex groups to generate a null distribution of FST across the genome. We determined significance for each SNP from 1000 replicates, using a P < 0.001 threshold. Finally, we performed Fisher's exact test on SNPs to determine significance of allele frequency differences between males and females (P < 0.001). We denoted SNPs that were significant in all three of these measures as high FST SNPs.

[...] identified 7 genes with ≥ 3 high intersexual FST SNPs, which we designated as sexually differentiated genes. We calculated average intersexual FST for all genes using VCFtools v0.1.16, respectively. We used Wilcoxon rank-sum tests to assess differences in intersexual FST between autosomal genes and other gene categories (sexually differentiated genes and genes on the sex chromosome).
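The three-filter logic described above can be sketched for a single biallelic SNP as follows. This is a minimal illustration, not the code used in the study: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), the permutation is done per SNP rather than once across the genome, the permutation P value is taken as the fraction of replicates reaching the observed FST, and all function names are hypothetical.

```python
import math
import random

def wc_fst(groups):
    """Weir & Cockerham (1984) theta for one biallelic SNP.
    groups: list of genotype lists coded as alt-allele counts (0, 1, 2)."""
    r = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)
    p = [sum(g) / (2 * len(g)) for g in groups]          # allele frequencies
    h = [sum(1 for x in g if x == 1) / len(g) for g in groups]  # het. freq.
    p_bar = sum(n * x for n, x in zip(sizes, p)) / n_total
    h_bar = sum(n * x for n, x in zip(sizes, h)) / n_total
    s2 = sum(n * (x - p_bar) ** 2 for n, x in zip(sizes, p)) / ((r - 1) * n_bar)
    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = n_bar / n_c * (s2 - (core - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    denom = a + b + c
    return a / denom if denom > 0 else float("nan")  # NaN if monomorphic

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)
    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))

def permutation_p(males, females, n_perm=1000, seed=0):
    """Fraction of sex-label shuffles with FST >= the observed value."""
    rng = random.Random(seed)
    observed = wc_fst([males, females])
    pooled = males + females
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if wc_fst([pooled[:len(males)], pooled[len(males):]]) >= observed:
            hits += 1
    return hits / n_perm

def top_fraction_cutoff(values, fraction=0.01):
    """Smallest FST value that still falls in the top `fraction` of values."""
    ranked = sorted(v for v in values if not math.isnan(v))
    return ranked[-max(1, int(len(ranked) * fraction))]

def is_high_fst_snp(males, females, cutoff, alpha=0.001, n_perm=1000):
    """True only if the SNP passes all three filters."""
    if not wc_fst([males, females]) >= cutoff:
        return False
    alt_m, alt_f = sum(males), sum(females)
    fisher_p = fisher_exact_p(alt_m, 2 * len(males) - alt_m,
                              alt_f, 2 * len(females) - alt_f)
    return fisher_p < alpha and permutation_p(males, females, n_perm) < alpha
```

As a usage example, a site where every male is heterozygous and every female is homozygous for the reference allele, which is the pattern a Y-linked duplicate carrying a fixed difference would produce, passes all three filters: `is_high_fst_snp([1] * 60, [0] * 60, cutoff=0.1)`. The cutoff of 0.1 here is arbitrary; in practice it would come from `top_fraction_cutoff` applied to all autosomal SNPs.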

Relatedness Inference
In order to avoid biases in calculating intersexual allele frequency differences due to relatedness among individuals, we used KING 2.2.7 (Manichaikul et al., 2010) to infer the pairwise degree of relatedness between individuals from estimated kinship coefficients. We first converted genotype data from the raw, unfiltered SNPs dataset to plink binary format using PLINK 1.9 (Purcell et al., 2007). In order to avoid potential biases from KING software