Evolutionary rates of testes-expressed genes differ in monogamous and promiscuous Peromyscus species

Reproductive proteins, including those expressed in the testes, are among the fastest evolving proteins across the tree of life. Sexual selection on traits involved in sperm competition is thought to be a primary driver of testes gene evolution and is expected to differ between promiscuous and monogamous species due to intense competition between males to fertilize females in promiscuous lineages and lack thereof in monogamous ones. Here, we employ the rodent genus Peromyscus as a model to explore differences in evolutionary rates between testes-expressed genes in monogamous and promiscuous species. We find candidate genes that may be associated with increased sperm production in promiscuous species and gene ontology categories that show patterns of molecular convergence associated with phenotypic convergence in independently evolved monogamous species. Overall, our results highlight the possible molecular consequences of differences in mating system, likely due to differences in selective pressures.


Introduction
Reproductive genes exhibit increased levels of sequence divergence among species relative to genes with no implications in reproductive processes (Swanson and Vacquier 2002;Clark, Aagaard, and Swanson 2006). This pattern is ubiquitous across diverse lineages, spanning from microbes to mammals (Makalowski and Boguski 1998;Armbrust and Galindo 2001;Wik, Karlsson, and Johannesson 2008;Sato et al. 2002;Vacquier and Swanson 2011;Torgerson, Kulathinal, and Singh 2002). Among these rapidly evolving genes are those expressed in the male testes, which are thought to be driven by sexual selection on specific traits involved in sperm competition (Ramm and Stockley 2009;Torgerson, Kulathinal, and Singh 2002;Harrison et al. 2015;Moyle, Wu, and Gibson 2020;Swanson and Vacquier 2002;A. Civetta and Singh 1995;Firman and Simmons 2010a;Parker 1970;Rice and Holland 1997). Since reproductive success is essential to fitness, observed increased rates of evolution in testes expressed genes are usually the result of positive selection (Ramm et al. 2008;Teng et al. 2017;Ramm and Stockley 2009), but could also be explained by relaxed selection (Dapper and Wade 2016).
Mating system, the general pattern by which males and females mate within a species, is expected to drive differences in rates of evolution of testes expressed genes (Lüpold et al. 2016;Clutton-Brock 2017). The vast majority of mammalian species are promiscuous, meaning that females mate with multiple males and vice versa (Garcia-Gonzalez 2017). However, genetic monogamy (where males and females only mate with one individual for life) has evolved multiple times independently across mammalian lineages (Lukas and Clutton-Brock 2013). Promiscuous and monogamous species often exhibit morphological, physiological and behavioral differences associated with mate fidelity which can be contextualized in light of selection (Ambaryan, Voznessenskaya, and Kotenkova 2019;Wey, Vrana, and Mabry 2017;Stanyon and Bigoni 2014;Tidière et al. 2015;Dapper and Wade 2016;Alberto Civetta and Ranz 2019). Oftentimes morphological and physiological differences are associated with the male testes and sperm performance, since in contrast with monogamous species, promiscuous males are constantly competing to fertilize females (Dapper and Wade 2016;Firman and Simmons 2010a). Faster, more viable sperm and higher sperm counts yield an increased chance of fertilization and are advantageous for promiscuous species (Pizzari 2006). Thus, promiscuous males often possess larger testes and produce more abundant and more competitive sperm than monogamous males, even in the case of recently diverged species (Fisher et al. 2018;Firman and Simmons 2010a;Heske and Ostfeld 1990;Claw et al. 2018;Shuster 2009). However, the genetic basis of morphological and physiological differences related to differences in mating system remain largely unexplored.
Social/sexual monogamy has evolved at least twice independently within the rodent genus Peromyscus (Figure 1) (Turner et al. 2010;Bedford and Hoekstra 2015), with behavioral and physical traits differing consistently between monogamous and promiscuous Peromyscus species (Fisher and Hoekstra 2010;Fisher et al. 2014Fisher et al. , 2016Fisher et al. , 2018Bendesky et al. 2017).
Peromyscus is a powerful model for interrogating the drivers of convergent evolution and adaptation due to its well-resolved phylogeny (Greenbaum et al. 2017;Sullivan et al. 2017), well documented convergence of phenotypes (Steiner et al. 2009;Manceau et al. 2010;Bedford and Hoekstra 2015), and extensive ecological, morphological, and behavioral variation (Gering et al. 2009;Fisher and Hoekstra 2010;Shorter et al. 2012;Bedford and Hoekstra 2015;Guralnick et al. 2020). Here, we use a combination of evolutionary rate modeling and gene expression analysis to identify specific genes that may be associated with increased sperm production in the promiscuous P. maniculatus and gene ontology categories that show patterns of molecular convergence associated with phenotypic convergence in two independently evolved monogamous species: P. polionotus and P. eremicus (Foltz 1981;Birdsall and Nash 1973;Rosenfeld et al. 2013;Xia and Millar 1991).

Obtaining and reconstructing transcriptomic data
We employed a conservative but robust pipeline for detecting signatures of parallel evolutionary rates in transcripts of P. polionotus and P. eremicus relative to P. leucopus and P. maniculatus. First, we obtained P. maniculatus, P. polionotus, P. leucopus, and P. eremicus testes transcriptome data from (Lindsey et al. 2020), (https://www.ncbi.nlm.nih.gov/bioproject/584485), (https://datadryad.org/stash/dataset/doi:10.5061/dryad.r8ns3) and (Kordonowy and MacManes 2016) respectively. Conveniently, a testes specific transcriptome was previously assembled for P. eremicus (Kordonowy and MacManes 2016). For the other three species, we aligned each species' testes transcriptome data to reference CDS data (GCF_000500345.1 for P. maniculatus and P. polionotus since no reference exists for P. polionotus and P. maniculatus is its closest relative with a reference CDS file, and GCF_004664715.2 for P. leucopus) using hisat2 (Kim, Langmead, and Salzberg 2015) with default parameters and called variants using bcftools with default parameters. We then reconstructed each transcript in a variant aware manner using a custom python script that replaces the reference allele at a given site with any alternate allele at higher frequency in the considered testes dataset (see https://github.com/lgozasht/Peromyscus-reproductive-genetic-differences).

Finding homologous genes between species
Once we possessed reconstructed transcriptomes for each species, we performed an all vs. all BLAST (Altschul et al. 1990) with a minimum percent identity of 90 to identify putative homologous transcripts between species. We filtered our BLAST output for a minimum 80 percent overlap relative to each transcript length and concatenated all BLAST output into one file. Then, we employed silixx (Miele, Penel, and Duret 2011), a graph theory based clustering software, to cluster transcripts into families of highly similar sequences from homologous genes. We filtered for families that contained transcripts from all four species and included the transcript with the lowest e value when a species possessed multiple candidate transcripts for a given family. Through this process we identified 942 transcripts expressed in reproductive tissue that shared homology between all 4 of our considered species.
Generating and curating PAML input files . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ; https://doi.org/10. 1101/2021 We employed the classic program PAML (Phylogenetic Analysis by Maximum Likelihood) (Yang 2007) to search for shared signatures of evolutionary rates in P. eremicus and P. polionotus relative to P. leucopus and P. maniculatus. For each transcript cluster, we used MAFFT (Katoh and Standley 2013) to generate a multiple sequence alignment and IQ-TREE (Minh et al. 2020) to infer a maximum likelihood based tree, as required for PAML input and confirmed that tree topology remained consistent. Since indels and poor alignment quality can impair PAML accuracy and functionality, we performed additional curation and filtering on our input alignments. We used the tool Alignment_Refiner_v2, (see https://github.com/dportik/Alignment_Refiner) (Young et al. 2019) to trim alignments above an overall missing data threshold of 1% and abandoned clusters with greater than 50 gaps in a particular sequence or 100 total gaps thereafter. We also ignored alignments containing sequences with lengths not divisible by 3. We were left with 199 transcripts suitable for PAML input after our stringent filtering and curation.

Comparing evolutionary rates of transcripts in P. eremicus and P. maniculatus
Short branch lengths between P. maniculatus and P. polionotus make it challenging to conduct accurate pairwise dn/ds comparisons (Mugal, Wolf, and Kaj 2014;Yang 2007;Kryazhimskiy and Plotkin 2008), so instead we performed comparisons between P. maniculatus and P. eremicus. For each transcript cluster, we applied PAML first with a null model (model = 2, see http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf) in which dN/dS differs in P. maniculatus and P. eremicus relative to P. polionotus and P. leucopus then with an alternative model (model = 2, see http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf) in which each dN/dS differs between P. maniculatus and P. eremicus relative to P. polionotus and P. leucopus. We then calculated log likelihood ratios using our null and alternative output and performed likelihood ratio tests for each transcript cluster. After further manual alignment quality filtering and applying a false-discovery rate correction for multiple testing (Benjamini and Hochberg 1995), we were left with 7 transcripts under significant differences in rates of evolution for downstream analysis ( = .05).

Differential expression analysis of P. eremicus and P. maniculatus
We obtained testes RNA data from three runs of separate replicates for P. eremicus (SRA run excessions: ERR1353571, ERR1353572, and ERR1353573) and one run of pooled replicates for P. maniculatus (SRA run excession: SRR8587279) and aligned to respective testes transcriptomes using hisat2 (Kim, Langmead, and Salzberg 2015). Next, we obtained read counts for each transcript using htseq-count (Anders, Pyl, and Huber 2014) and calculated RPKM for each transcript. We calculated delta RPKM for homologous transcripts between P. eremicus and P. maniculatus. Then, we performed independent Mann-Whitney U tests for delta RPKMs of transcripts corresponding to each gene in which we observed significant differences in evolutionary rates between P. eremicus and P. maniculatus. We aggregated p-values for . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ; https://doi.org/10.1101/2021.04.21.440792 doi: bioRxiv preprint transcripts corresponding to each gene using Fisher's combined probability to obtain a final p-value for each gene, in similarity to a method described in (Yi et al. 2018).

Comparing parallel rates of evolution between monogamous and promiscuous species
We used PAML first to compute the likelihood of a null model (model = 0, see http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf) in which dn/ds is equal across all branches of the tree with our alternative model. Then we fit a branch model (model = 2, see http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf) in which the dn/ds ratio is allowed to differ between monogamous and promiscuous lineages, but where the two ratios are constrained to be identical across all monogamous or promiscuous branches. We included the monogamous P. eremicus and P. polionotus and the promiscuous P. maniculatus and P. leucopus for this analysis. To assess significance, we used a likelihood ratio test, which is X2 distributed with a single degree of freedom.

Finding evolutionary trends in different ontologies
We interrogated trends of evolutionary rates by aligning each gene that passed our upstream filtering to the Mus musculus GRCm38 genome (Genbank accession GCF_000001635.26) and cross referencing the MGI ontology database (see http://www.informatics.jax.org/mgihome/projects/aboutmgi.shtml). We then grouped genes by mouse gene ontology (GO) terms. For each group of genes, we performed a two-sided binomial test with regard to the number of genes evolving at slower (or faster) rates in monogamous species relative to promiscuous species. For our null, we used the proportion of genes evolving at slower (or faster) rates between monogamous and promiscuous species among all considered loci: P = .519. We filtered for groups in which we observed a significant trend of different evolutionary rates ( < 0.05). To determine that these signatures of correlated evolutionary rates within GO categories are not driven by species-specific effects, we performed a test for idiosyncratic evolution in each species. To do this, for each species, we ran PAML with a branch model in which that species possessed its own that differed from to the other three species.

Results and Discussion
We employ a maximum likelihood approach to compare evolutionary rates of transcripts expressed in the testes of the monogamous P. eremicus and the promiscuous P. maniculatus. Our analysis reveals 6 genes exhibiting significant differences in evolutionary rates between species. Intriguingly, 5 of these are implicated in spermatogenesis and are evolving faster in P.  Wilson et al. 2017). Sequence level changes leading to increased sperm abundance or competitive ability might be favored by selection in promiscuous species due to the selective pressures of mate competition and might quickly reach fixation. Thus, we hypothesize . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ; https://doi.org/10.1101/2021.04.21.440792 doi: bioRxiv preprint that these genes may have experienced positive selection in P. maniculatus. Consistent with this hypothesis, previous studies have suggested that sperm production is sensitive to local adaptation in light of the pressures associated with competition (Ramm and Stockley 2009;Winkler et al. 2019;Lindsey et al. 2020).
Arhgef18, the other gene with a significantly different evolutionary rate between species lacks known reproductive implications. Although its role in spermatogenesis remains unexplored, Arhgef18 maintains apico-basal polarity, localization of tight junctions and cortical actin, and thus plays a role in cellular morphology and cell-cell interactions, which are essential to sperm cell organization and sertoli-spermatid interactivity during spermatogenesis (Gao and Cheng 2016;Wong and Cheng 2009). It is also evolving rapidly in P. maniculatus and thus may also be under positive selection.
Differential gene expression analysis lends further support to the evidence for positive selection of Spata16 and ESSBP. We calculated delta RPKMs for homologous transcripts between species, focusing specifically on the 6 genes exhibiting significantly different rates of evolution. Interestingly, we find that ESSBP and Spata16 exhibit significantly increased expression in P. maniculatus (Mann-Whitney U P = 0.0294 and 0.0376 respectively). Although evolutionary rate is generally negatively correlated with gene expression levels within species, the relationship between sequence divergence and expression divergence between species has been shown to be positively correlated in mammals (Warnefors and Kaessmann 2013;Liao and Zhang 2006). Additionally, changes in gene expression levels have been hypothesized to result from positive selection (Jordan, Mariño-Ramírez, and Koonin 2005), further highlighting the possibility that ESSBP and Spata16 might be under positive selection in P. maniculatus as a result of their roles in spermatogenesis.
Marginally increased evolutionary rates of genes grouped by ontology suggests molecular convergence of genes involved in sperm competitive ability as a result of relaxed selection associated with the evolution of monogamous behaviors. We used branch models to compare shared rates of evolution between two species in which monogamy evolved independently, P. eremicus and P. polionotus and two relatively divergent promiscuous Peromyscus species, P. maniculatus and P. leucopus. Due to the short branch lengths separating P. maniculatus and P. polionotus, branch tests using individual genes may be unreliable. Therefore, rather than focusing on significantly different rates between independent homologous loci, we used MGI's gene ontology database (Smith and Eppig 2009) to cluster all considered genes based on ontology, and explored trends of shared evolutionary rates between genes in monogamous species relative to promiscuous species with involvement in particular ontologies. We find significant differences in evolutionary trends between monogamous and promiscuous species in eight gene ontology categories, seven of which are important for sperm function (Table 2, Figure  3). We find that genes within ontologies with implications in sperm function generally have increased evolutionary rates in monogamous species. In fact, all considered genes involved in "flagellated sperm motility," "motile cilium," "centrosome" and "cell projection" are evolving at faster rates in monogamous species. One possible explanation for this is that monogamous . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ; https://doi.org/10.1101/2021.04.21.440792 doi: bioRxiv preprint species exhibit reduced selective constraint on genes integral to sperm function. Mutations are likely filtered more frequently by purifying selection in promiscuous species than in monogamous species where negative selection is relaxed (Firman and Simmons 2010a). An alternative explanation is positive selection on these genes in monogamous species. However, in contrast with the expected observation for loci under positive selection, we do not find significant differences in evolutionary rates between monogamous and promiscuous species when we compare genes within these ontologies independently, and instead find that they exhibit dN/dS values closer to 1. These observations may constitute a molecular signature of convergent evolution in protein evolutionary rates for genes involved in sperm competitiveness in monogamous species.
Our results illuminate the molecular correlates of changes in reproductive functions associated with changes in mating systems. Sperm abundance, motility and morphology are critical for fertility (Sharpe 2012;Baptissart et al. 2013). We identify six genes that exhibit significantly increased evolutionary rates in the promiscuous P. maniculatus relative to the monogamous P. eremicus. Five of these are known to be essential to spermatogenesis and the sixth has plausible connections. Additionally, two of these genes also exhibit significantly increased expression in P. maniculatus, providing additional evidence that these loci experience positive selection P. maniculatus in light of competition between males to fertilize females. Consistent with these results, previous studies have found that sperm in the promiscuous P. maniculatus exhibit adaptive morphological traits and increased abundance when compared to that of monogamous Peromyscus species (Fisher et al. 2016).
Furthermore, we observe that genes implicated in sperm function exhibit slower evolutionary rates in two independently evolved monogamous species relative to promiscuous ones when clustered by ontology, highlighting the possible role of relaxed selection in propelling the molecular convergence of genes integral to sperm competitiveness. However, future work is required to investigate roles of selection, and we propose that these genes serve as candidates for future studies on adaptation and convergent evolution with regard to reproductive behavior. Future studies should include nucleotide and transcript-abundance polymorphism data to enable direct estimation of the mode of selection and increased sampling of monogamous species across many independent origins in diverse lineages to test for molecular convergence associated with differences in mating system.

Figure 1:
Phylogenetic relationship between our four considered Peromyscus species with the addition of Mus musculus and Rattus norvegicus as outgroups. We note that we only include a small subset of known Peromyscus species in our phylogeny. Our tree was produced with IQ-TREE (Minh et al. 2020) using 15 concatenated orthologs in each respective species. Genetic distances are measured in units of the number of base substitutions in proportion to alignment length (Sievers et al. 2011). Blue colored nodes represent promiscuity and red colored nodes represent monogamy (Shurtliff, Pearse, and Rogers 2005;Xia and Millar 1991;Costa et al. 2016). Monogamy has evolved independently at least twice in the Peromyscus genus (Turner et al. 2010;Bedford and Hoekstra 2015).
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 Figure 2: Respective evolutionary rates (ω) for genes exhibiting different evolutionary rates between in the monogamous P. eremicus and the promiscuous P. maniculatus. Colored points correspond to genes exhibiting statistically significant differences in evolutionary rates. Blue points correspond to genes that play crucial roles in spermatogenesis. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Table 1: Genes exhibiting significantly different rates of evolution in considered monogamous and promiscuous species. Column 1 displays gene name. Column 2 provides trend of evolution (* in monogamous species relative to promiscuous species). Column 3 summarizes respective known involvement in reproduction, column 4 provides FDR corrected X 2 P-values, and column 5 contains sources.

Figure 3:
Proportion of genes evolving faster in monogamous species relative to the number of genes within each respective ontology cluster. The seven ontology groups are colored orange or blue (Binomial test alpha = 0.05). The five Ontology groups with significant trends of evolution in monogamous species relative to promiscuous ones and known specific implications in sperm performance are colored blue. Other significant ontologies are colored in orange. The dotted line . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 21, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 represents the null expected proportion of genes evolving faster in monogamous species. The "*" denotes an ontology group that is not specifically involved in sperm performance but whose data point overlaps with an ontology group that is. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Table 2: MGI Ontology categories in which we observe differences in trends of evolutionary rates. Column 1 presents ontology categories. Column 2 provides trends in evolutionary rate (Monogamous relative to promiscuous). Column 3 provides respective sexual implications, column 4 contains P values (binomial test) and column 5 contains references.