ABSTRACT
Rapid ecological radiations provide useful models for identifying instances of parallel evolution, which can highlight critical genomic architecture involved in shared adaptations. Thermoregulatory innovations have allowed deer mice of the genus Peromyscus to radiate throughout North America, exploiting extreme thermal environments from mountain tops to desert valleys, and positioning this taxon as a model for understanding thermal adaptation. We compare signatures of selective sweeps across population-level genomic resequencing data from two desert-adapted Peromyscus species (P. crinitus and P. eremicus) and a third, widely-distributed habitat generalist (P. maniculatus) to test for signatures of parallel evolution and identify shared genomic architecture involved in adaptation to hot deserts. We found limited evidence of parallel evolution. Instead, we identified divergent molecular mechanisms of adaptation to similar environments potentially tied to species-specific historical demography that may limit or enhance adaptive variation. We also identified numerous genes under selection in P. crinitus that are implicated in osmoregulation (Trypsin, Prostasin) and metabolic responses to desert life (Kallikrein, eIF2-alpha kinase GCN2, APPL1/2). Evidence of varied evolutionary routes to achieve the same phenotype suggests there may be many molecular trajectories for small mammals to accommodate anthropogenic climate change.
INTRODUCTION
Increasing global temperatures and altered patterns of precipitation threaten biodiversity worldwide (Moritz et al. 2008; Cahill et al. 2013; Urban 2015). Phenotypic plasticity enables an immediate response to changing conditions but evolutionary change through adaptation will be critical for the long-term survival of most species (Hoffman and Sgro 2011; Cahill et al. 2013). Range shifts upward in elevation and latitude have been documented in a number of terrestrial species and interpreted as a response to warming (Chen et al. 2011; Tingley and Beissinger 2013; Freeman et al. 2018); however, responses vary even among closely-related species or populations (Hoffman and Willi 2008; Moritz et al. 2008). The physiological limits responsible for organismal range shifts are in part governed by genetics, which can facilitate adaptation to specific environmental conditions. Population genomic methods enable the identification of genes and molecular pathways involved in local adaptation by scanning the genome for signatures of selection (Bassham et al. 2018; Garcia-Elfring et al. 2019). For species that have independently adapted to similar environments, parallel or convergent evolution can be inferred if a greater number of genes or phenotypes share signatures of selection than would be expected under a purely stochastic model of evolution (e.g., drift). Convergence typically implies the evolution of similar adaptive responses independently among distantly related taxa in response to similar environmental or ecological conditions; in turn, parallel evolution is defined as the occurrence of similar adaptive changes in groups with common ancestry (Simpson 1961; Wood et al. 2005). Convergent evolution is often presumed to be driven by different underlying molecular mechanisms, whereas parallel evolution may be driven by similar mechanisms; however, this generalization may not reflect reality (Arendt and Reznick 2008). 
Evidence of parallel or convergent evolution can suggest a deterministic effect of selection and highlight conserved genomic architecture involved in shared adaptive phenotypes (Rundle et al. 2000; McDonald et al. 2009), while a lack of concerted evolution may identify novel evolutionary strategies to achieve the same phenotypic result.
As a model taxon (Dewey and Dawson 2001; Bedford and Hoekstra 2015) inhabiting varied environments throughout North America, deer mice in the genus Peromyscus are a frequent and productive subject of classical adaptation studies (e.g., physiological, Storz 2007; behavioral, Hu and Hoekstra 2017; genetic, Cheviron et al. 2012; Storz and Cheviron 2016; Tigano et al. 2020). Physiological similarity of deer mice to lab mice (Mus musculus) further broadens the implications of evolutionary and ecological investigations of Peromyscus by linking relevant results to biomedical sciences. The genus Peromyscus (N = 67 species; mammaldiversity.org) is hypothesized to be the product of a rapid ecological radiation across North America, evident in their varied ecological niches and rich species diversity (Glazier 1980; Riddle et al. 2000; Bradley et al. 2007; Platt et al. 2015; Lindsey 2020). Adaptive radiations are useful natural experiments for identifying patterns of parallel or convergent evolution, or the lack thereof. Short generation times and accelerated thermoregulatory evolution relative to other mammals, among other adaptive responses, appear to have enabled Peromyscus rodents to exploit extreme thermal environments, ranging from cold, high elevations (Pierce and Vogt 1993; Cheviron et al. 2012, 2014; Kaseloo et al. 2014; Garcia-Elfring et al. 2019) to arid, hot deserts (Riddle et al. 2000; MacManes 2017; Tigano et al. 2020). Thermoregulation and dehydration tolerance are complex physiological traits and there are several potential evolutionary routes to achieve the same phenotypic outcome. Within this framework, comparisons among divergent Peromyscus species adapted to similar environments may highlight shared adaptive polymorphisms or disparate evolutionary paths central to achieving the same phenotype (Cheviron et al. 2012; Ivy and Scott 2017; Hu and Hoekstra 2017; Storz et al. 2019).
In cold environments, endotherms rely on aerobic thermogenesis to maintain constant internal body temperatures. Changes in both gene expression and the functional properties of proteins in high-altitude adapted deer mice suggest that changes in multiple hierarchical molecular pathways may be common in the evolution of complex physiological traits, such as thermoregulation (Wichman and Lynch 1991; Storz 2007; Cheviron et al. 2012; Storz and Cheviron 2016; Garcia-Elfring et al. 2019). Nonetheless, investigations of thermoregulation among high-elevation species may be confounded by concurrent selection on hemoglobin oxygen-binding affinity as a consequence of a reduction in the partial pressure of oxygen as elevation increases substantially (Storz and Kelly 2008; Storz et al. 2010; Natarajan et al. 2015). In hot environments, endotherms are challenged with balancing heat dissipation, energy expenditure, and water retention (Anderson and Jetz 2005), resulting in a different suite of behavioral, physiological, and molecular adaptations that enable survival (Schwimmer and Haim 2009; Degen 2012; Kordonowy et al. 2016), but may be confounded by acute or chronic dehydration. Understanding the biochemical mechanisms that enable survival under extreme environmental stress can provide important insights into the nature of physiological adaptation.
Rapid thermoregulatory and ecological diversification among Peromyscus species (origin ~8 Mya, radiation ~5.71 Mya; Platt et al. 2015) positions these small rodents as models for anticipating species responses to accelerated warming (Cahill et al. 2013). Desert specialist phenotypes have evolved repeatedly during the course of the Peromyscus radiation, with each species and populations therein subject to distinct histories of demographic variation and gene flow. These idiosyncratic histories can have a direct impact on evolution, as effective population sizes are inextricably linked to the efficacy of selection and maintenance of genetic diversity in wild populations (Charlesworth 2009). Further, contemporary or historical gene flow may hinder or help adaptive evolution through homogenization or adaptive introgression, respectively (Coyne and Orr 2004; Morjan and Rieseberg 2004; Jones et al. 2018; Tigano and Friesen 2016). Native to the American West, the canyon mouse (P. crinitus, Fig. 1) is well adapted to desert life. In the lab, P. crinitus can survive in the absence of exogenous water, with urine concentration levels similar to those of desert-adapted kangaroo rats (Dipodomys merriami; Abbott 1971; MacMillen 1972; MacMillen and Christopher 1975; MacMillen 1983), but without equivalently specialized renal anatomy (Issaian et al. 2012). Canyon mice also exhibit a lower-than-expected body temperature relative to their size and can enter environmentally-mediated torpor in response to drought, food limitation, or low external temperatures (McNab 1968; McNab and Morrison 1963; Morhardt and Hudson 1966; Johnson and Armstrong 1987), which facilitates survival in highly-variable and extreme desert environments. These phenotypes persist for multiple generations in the lab, indicating they have a genomic basis (McNab and Morrison 1968). Cactus mice (P. eremicus) are related to P. crinitus and the two species are frequently sympatric. 
Cactus mice exhibit similar adaptations to desert environments, including urine concentration, reduced water requirements, and environmentally-induced torpor (Veal and Caire 1979). In contrast, the habitat generalist P. maniculatus (North American deer mouse) is geographically-widespread, native to both cool, high-elevations and hot southwestern deserts. Whole-genome assemblies are publicly available for both P. eremicus (Tigano et al. 2020) and P. maniculatus (Harvard University, Howard Hughes Medical Institute), which positions these species as ideal comparatives against P. crinitus to identify genes and regulatory regions associated with desert adaptation. Without fossil evidence of divergence and subsequent convergence between desert-adapted Peromyscus species, similar patterns of selection are interpreted as evidence of parallel evolution.
Here, we investigate genomic signatures of selection in desert-adapted P. crinitus. We contrast signatures of selective sweeps across three related Peromyscus species, two desert specialists (P. crinitus and P. eremicus) and one habitat generalist (P. maniculatus), to test for signatures of parallel evolution. We hypothesize that similar genes or functional pathways will be under selection in both desert-adapted species and not under selection in P. maniculatus, providing a signature of parallel evolution. Finally, we place selective sweep analyses into an evolutionary framework to interpret the varied evolutionary trajectories available to small mammals to respond to changing environmental conditions and to account for demographic and gene flow events.
METHODS
De novo genome sequencing and assembly
Wild mice were handled and sampled in accordance with the University of New Hampshire and University of California Berkeley’s Institutional Animal Care and Use Committee (130902 and R224-0310, respectively) and California Department of Fish and Wildlife (SC-008135) and the American Society of Mammalogists best practices (Sikes and Animal Care and Use Committee of the American Society of Mammalogists 2016).
For the assembly of the P. crinitus genome, DNA was extracted from a liver subsample from a P. crinitus individual collected in 2009 from the Philip L. Boyd Deep Canyon Desert Research Center in Apple Valley, California. To generate a high-quality, chromosome-length genome assembly for this individual, we extracted high-molecular-weight genomic DNA using a Qiagen genomic tip kit (Qiagen, Inc, Hilden, Germany). A 10X Genomics linked-reads library was prepared according to the manufacturer's protocol at Mount Sinai and sequenced to a depth of 70X on a HiSeq 4000 (Novogene, Sacramento, California, USA). 10X Genomics reads were de novo assembled into contigs using Supernova 2.1.1 (Weisenfeld et al. 2017). To arrange the scaffolds thus obtained in chromosomes, a Hi-C library for P. crinitus was constructed and sequenced from primary fibroblasts from the T.C. Hsu Cryo-Zoo at the University of Texas MD Anderson Cancer Center. The Hi-C data were aligned to the Supernova assembly using Juicer (Durand et al. 2016). Hi-C genome assembly was performed using the 3D-DNA pipeline (Dudchenko et al. 2017) and the output was reviewed using Juicebox Assembly Tools (Dudchenko et al. 2018). The Hi-C data are available at www.dnazoo.org/assemblies/Peromyscus_crinitus, visualized using Juicebox.js, a cloud-based visualization system for Hi-C data (Robinson et al. 2018).
Benchmarking Universal Single-Copy Orthologs (BUSCO v3, using the Mammalia odb9 database; Simão et al. 2015) and OrthoFinder2 (Emms and Kelly 2015) were used to assess genome quality and completeness. Genome sizes were estimated for each species using abyss-fac (Simpson et al. 2009) and the assemblathon_stats.pl script available at: https://github.com/ucdavis-bioinformatics/assemblathon2-analysis/. RepeatMasker v.4.0 (Smit et al. 2015) was used to identify repetitive elements. The genome was annotated using the software package MAKER (3.01.02; Campbell et al. 2014). Control files, protein, and transcript data used for this process are available at https://github.com/macmanes-lab/pecr_genome/tree/master/annotation. We used Mashmap (-f one-to-one --pi 90 -s 300000; Jain et al. 2017, 2018) to assess and plot (generateDotPlot.pl) syntenic conservation between P. crinitus and P. maniculatus genomes. Peromyscus crinitus chromosomes were renamed and sorted using seqtk (github.com/lh3/seqtk) following the P. maniculatus chromosome naming.
For comparative genomic analyses, we generated low-coverage whole-genome resequencing data for nine P. crinitus and five P. maniculatus individuals collected from syntopic areas of southern California (Table S1). Peromyscus crinitus samples were collected from the University of California (UC) Philip L. Boyd Deep Canyon Desert Research Center (DCDRC) near Palm Desert, California, and P. maniculatus were collected farther east from the UC Motte Rimrock Reserve and Elliot Chaparral Reserves. In addition to these, we used publicly available whole-genome resequencing data from 26 P. eremicus individuals, also collected from DCDRC and Motte Rimrock Reserve and prepared and sequenced in parallel (Tigano et al. 2020). All samples were collected in 2009, with the exception of eight P. eremicus samples, which were collected in 2018. Animals were collected live in Sherman traps and a 25 mg ear-clip was taken from each individual and stored at −80°C in 95% ethanol. Animals were sampled from arid areas with average monthly temperatures of 9-40°C and mean annual rainfall of 15-18 cm. The Biotechnology Resource Center at Cornell University (Ithaca, NY, USA) prepared genomic libraries using the Illumina Nextera Library Preparation kit (e.g., skim-seq). Libraries were sequenced at Novogene (Sacramento, CA, USA) using 150 bp paired-end reads from one lane on the Illumina NovaSeq S4 platform. fastp v. 1 (Chen et al. 2018) was used to assess read quality and trim adapters. Sequences from all samples and all species were mapped with BWA (Li and Durbin 2010) to the P. crinitus reference genome to enable comparative analyses, duplicates were removed with samblaster v. 0.1.24 (Faust and Hall 2014), and alignments were sorted and indexed using samtools v. 1.10 (Li et al. 2009).
Population Genomics
We used the software package ANGSD v. 0.93 (Korneliussen et al. 2014) to call high-confidence variants from low-coverage population genomic data using the general options: -SNP_pval 1e-6, -minMapQ 20, -minQ 20, -setMinDepth 20, -minInd 20, -minMaf 0.01. ANGSD was run across all species and again within each species. In the within-species runs, we required at least half of the P. crinitus and P. eremicus samples and all P. maniculatus samples (-minInd) to meet independent mapping-quality (-minMapQ) and base-quality (-minQ) thresholds at each variable site.
Differentiation among species was examined using a Principal Component Analysis (PCA, see Supporting Materials; ngsTools, Fumagalli et al. 2014) and multidimensional scaling (MDS) of principal components in NGSadmix v. 33 (Skotte et al. 2013). MDS plots were generated in R v.3.6.1 (R Core Team 2017) based on the covariance matrix. Cook’s D was used to identify MDS outliers, using the broken stick method to identify single samples with undue influence (Cook and Weisberg 1984; Williams 1987). NGSadmix was used to fit genomic data into K populations to parse species-level differences and provide a preliminary screen for genomic admixture under a maximum-likelihood model. Nonetheless, expanded sample sizes, including representatives from additional populations of each species, are necessary to thoroughly investigate patterns of population structure and introgression. We tested K = 1 through K = (N - 1), where N is the number of total individuals examined. NGSadmix was run for all species combined and for each species independently. As an additional measure of differentiation, we estimated weighted and unweighted global FST values for each species pair using realSFS in ANGSD.
We used the Pairwise Sequentially Markovian Coalescent (PSMC v. 0.6.5-r67; Li and Durbin 2011) to examine patterns of historical demography through time for each species. The original reads used to generate the high-quality, chromosome-length assemblies for each species (P. crinitus generated here; P. eremicus, SAMEA5799953, Tigano et al. 2020; P. maniculatus: GCA_003704035.1, Harvard University) were mapped with BWA to their respective indexed reference assemblies to identify heterozygous sites. Samblaster removed PCR duplicates and picard (http://broadinstitute.github.io/picard/) added a read group to the resulting bam file and generated a sequence dictionary (CreateSequenceDictionary) from the reference assembly. For each species, alignments were sorted and indexed with samtools, and variants were called with samtools (mpileup), bcftools v1.10.2 (call, Li et al. 2009), and VCFtools v 0.1.16 (vcf2fq, Danecek et al. 2011). PSMC distributions of effective population size (Ne) were estimated with 100 bootstrap replicates. PSMC results were visualized through gnuplot v. 5.2 (Williams and Kelley 2010), using perl scripts available at github.com/lh3/psmc, and scaled by a generation time of 6 months (0.5 yr, Millar, 1989; Pergams and Lacy 2008) and a general mammalian mutation rate of 2.2 × 10⁻⁹ substitutions/site/year (Kumar and Subramanian 2002).
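The scaling step can be illustrated with a short sketch. It assumes the standard PSMC convention in which N0 = θ0 / (4μs), with s the input bin size (100 by default), scaled time is 2·N0·t generations, and Ne = N0·λ. The per-generation mutation rate is derived here from the per-year rate and generation time quoted above; the θ0, t, and λ values in the usage note are invented for illustration and are not results from this study.

```python
# Sketch of PSMC output scaling under stated assumptions; not the authors' script.
MU_PER_YEAR = 2.2e-9     # substitutions/site/year (from the text)
GENERATION_TIME = 0.5    # years per generation (from the text)
BIN_SIZE = 100           # assumed PSMC default bin size (sites per bin)


def per_generation_rate(mu_per_year, generation_time):
    """Convert a per-year mutation rate to a per-generation rate."""
    return mu_per_year * generation_time


def scale_psmc(theta0, t_k, lambda_k, mu_gen, generation_time, bin_size=BIN_SIZE):
    """Convert one PSMC interval (t_k, lambda_k) to (years before present, Ne)."""
    n0 = theta0 / (4.0 * mu_gen * bin_size)    # reference population size N0
    years = 2.0 * n0 * t_k * generation_time   # 2*N0*t generations, then to years
    ne = n0 * lambda_k                         # effective size in this interval
    return years, ne


mu_gen = per_generation_rate(MU_PER_YEAR, GENERATION_TIME)
# Hypothetical interval: theta0 = 0.00044, t = 1.0, lambda = 2.0
years, ne = scale_psmc(0.00044, 1.0, 2.0, mu_gen, GENERATION_TIME)
```

With these rates, the per-generation mutation rate is 1.1 × 10⁻⁹, which is the value that would be supplied to a plotting script expecting a per-generation rate alongside the 0.5-year generation time.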
Tests for selection & convergence
To detect recent selective sweeps in low-coverage whole-genome data, we used Sweepfinder2 (DeGiorgio et al. 2016; Nielsen et al. 2005). Sweepfinder2 was run on both variant and invariant sites (Huber et al. 2016) for each species, excluding sex chromosomes. Sex-chromosomes were excluded for three reasons: (1) sex chromosome evolution is both rapid and complex relative to autosomes, (2) we had different sample sizes of each sex across species, and (3) desert adaptations, the focus of this study, are unlikely to be sex-specific. We repeated Sweepfinder2 analyses on P. eremicus, initially analyzed by Tigano et al. (2020), using an improved annotation scheme based on Peromyscus-specific data rather than Mus musculus genes. Allele frequencies were estimated in ANGSD, converted to allele counts, and the site frequency spectrum (SFS) was estimated from autosomes only in Sweepfinder2. Sweeps were estimated from the pre-computed SFS and the composite likelihood ratio (CLR) and alpha values, indicating the strength of selection, were calculated every 10,000 sites. Per Tigano et al. (2020), a 10 kb window size was selected as a trade-off between computational time and resolution. CLR values above the 99.9th percentile of the empirical distribution for each species were considered to be evolving under a model of natural selection, hereafter referred to as significant sweep sites. Smaller sample sizes produce fewer bins in the SFS and a diminishing number of rare alleles which may impact both the overall SFS and local estimate surrounding testing sites; therefore, we explored the impact of sample sizes on Sweepfinder2 results in the Supporting Information.
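As a minimal sketch of the outlier criterion described above, the following code flags sites whose CLR exceeds the 99.9th percentile of a species' empirical CLR distribution. The nearest-rank percentile definition and the function names are assumptions made for illustration; the original analysis may have computed the percentile differently.

```python
import math


def empirical_threshold(values, percentile=99.9):
    """Nearest-rank percentile of the empirical distribution (assumed definition)."""
    ordered = sorted(values)
    # Round before ceil to guard against floating-point noise in the product.
    rank = math.ceil(round(percentile / 100.0 * len(ordered), 6))
    return ordered[max(rank, 1) - 1]


def significant_sweeps(sites, percentile=99.9):
    """Return (position, CLR) pairs with CLR strictly above the empirical cutoff.

    sites: list of (position, clr) tuples for one species, autosomes only.
    """
    cutoff = empirical_threshold([clr for _, clr in sites], percentile)
    return [(pos, clr) for pos, clr in sites if clr > cutoff]


# Toy input: 1,500 test sites spaced 10 kb apart with made-up CLR values.
toy_sites = [(i * 10_000, float(i)) for i in range(1, 1501)]
hits = significant_sweeps(toy_sites)
```

Because the cutoff is taken from each species' own distribution, roughly the same fraction of tested sites is flagged in every species regardless of differences in absolute CLR values.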
For each species, mean Tajima’s D was calculated across the entire genome in non-overlapping windows of 10 kb and 1 kb in ANGSD. Nucleotide diversity (π) was also calculated in 10 kb and 1 kb windows and corrected based on the number of sites genotyped (variant and invariant) per window. Tajima’s D and π are expected to be significantly reduced in regions surrounding selective sweeps (Smith and Haigh 1974; Kim and Stephan 2002); therefore, we used a Mann-Whitney test (p < 0.05, after a Bonferroni correction for multiple tests) to measure significant deviations from the global mean in 1 kb and 10 kb flanking regions surrounding significant sweep sites and 27 candidate genes identified in previous studies (MacManes 2017; Table S2). Candidate loci include aquaporins (N = 12), sodium-calcium exchangers (SLC8a1), and Cyp4 genes belonging to the Cytochrome P450 gene family (N = 14). We used custom python scripts to functionally annotate (I) the closest gene to each significant sweep site, (II) the nearest upstream and downstream gene, regardless of strand (sense/antisense), and (III) the nearest upstream and downstream gene on each strand. Dataset I follows the general assumption that proximity between a significant sweep site and a protein-coding gene suggests interaction. Dataset II represents an extension of that model by encompassing the most proximal gene in each direction. Because Sweepfinder2 is performed on the consensus sequence and our data are unphased, we do not have information indicating on which strand a significant sweep site occurs. Therefore, dataset III encompasses strand-uncertainty by including the two nearest genes to a significant sweep site on both strands. It should be noted that the genes identified in smaller datasets (I, II) are nested within the larger datasets (II, III) and, by definition, the larger datasets include more noise, which may dilute a signature of parallel evolution but may better capture the true signal of selection. 
Hence, it is important to critically examine numerous hierarchical gene subsets. We tested genes from each dataset for functional and pathway enrichment in Gene Ontology (GO) categories using Panther v. 15.0 (Mi et al. 2017) and extracted GO terms for each enriched functional group. We used Mus musculus as a reference and a Bonferroni correction for multiple tests (p < 0.05) to correct for false discoveries. Enriched GO terms were summarized and visualized in REVIGO (Reduce and Visualize Gene Ontology, Supek et al. 2011) implemented at: http://revigo.irb.hr/index.jsp?error=expired. As a test for convergence, the overlap in the gene names and enriched GO terms associated with significant selective sweeps was assessed for each dataset. Overlap was visualized in the R package VennDiagram (Chen and Boutros 2011). To test for convergence, we used a Fisher’s Exact Test (p < 0.05) in the GeneOverlap package (Shen 2016) in R to assess whether gene or enriched GO term overlap between species was greater than expected based on the total number of genes in the genome.
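The overlap test can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. This is an illustrative reimplementation using only the standard library, not the GeneOverlap code itself; the gene sets and background size in the example are hypothetical.

```python
from math import comb


def overlap_p_value(set_a, set_b, genome_size):
    """P(overlap >= observed) when set_b is drawn at random from genome_size genes.

    set_a, set_b: sets of gene names (or GO terms) associated with sweeps
    in two species; genome_size: total number of genes in the background.
    """
    a, b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    total = comb(genome_size, b)
    # Sum hypergeometric probabilities for every overlap at least as large.
    tail = sum(
        comb(a, i) * comb(genome_size - a, b - i)
        for i in range(observed, min(a, b) + 1)
    )
    return tail / total


# Hypothetical example: two small gene sets sharing one gene, 10-gene background.
p = overlap_p_value({"g1", "g2", "g3"}, {"g3", "g4"}, genome_size=10)
```

A small p-value indicates more shared genes than expected by chance given the background size, so the choice of background (here, the total number of annotated genes) directly affects the result.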
To compare patterns of gene family expansion and contraction potentially involved in adaptation within the genus Peromyscus, we analyzed 14 additional genomes, including ten Peromyscus species and four near outgroup rodent species: Microtus ochrogaster, Neotoma lepida, Sigmodon hispidus, and Mus musculus (Table S3). To prevent bias driven by variable assembly qualities, samples with < 70% complete mammalian BUSCOs were excluded from downstream analyses, resulting in the final analysis of ten species. Groups of orthologous sequences (orthogroups) were identified in OrthoFinder2. Invariant orthogroups and groups that varied by more than 25 genes across taxa (custom python script: ortho2cafe.py) were excluded. Our rooted species tree, estimated in OrthoFinder2, was used to calculate a single birth-death parameter (lambda) and estimate changes in gene family size using CAFE v.4.2.1 (Han et al. 2013). Results were summarized using the python script cafetutorial_report_analysis.py available from the Hahn Lab: hahnlab.github.io/CAFE/manual.html.
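The two filtering rules above can be sketched as follows. This is not the authors' ortho2cafe.py; it is a hypothetical illustration that interprets "varied by more than 25 genes across taxa" as the difference between the largest and smallest per-species gene count, which is an assumption.

```python
def passes_busco(complete_pct, min_complete=70.0):
    """Keep an assembly only if its complete-BUSCO percentage reaches the cutoff."""
    return complete_pct >= min_complete


def filter_orthogroups(counts, max_range=25):
    """Drop invariant orthogroups and those whose size range exceeds max_range.

    counts: dict mapping orthogroup name -> dict of species -> gene count.
    """
    kept = {}
    for name, per_species in counts.items():
        sizes = list(per_species.values())
        spread = max(sizes) - min(sizes)   # assumed meaning of "varied by"
        if spread == 0 or spread > max_range:
            continue
        kept[name] = per_species
    return kept


# Made-up counts for two placeholder species.
toy_counts = {
    "OG1": {"sp_a": 3, "sp_b": 3},    # invariant: excluded
    "OG2": {"sp_a": 1, "sp_b": 4},    # retained
    "OG3": {"sp_a": 0, "sp_b": 30},   # range of 30 exceeds 25: excluded
}
kept = filter_orthogroups(toy_counts)
```

Removing invariant families and extreme outliers before fitting a single lambda is a common way to keep the birth-death likelihood well behaved, although the exact thresholds are a design choice.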
RESULTS
Chromosome-length genome assembly for P. crinitus
Linked reads combined with Hi-C scaffolding produced a high-quality, chromosome-length genome assembly for P. crinitus. Our assembly has a contig N50 of 137,026 bp and scaffold N50 of 97,468,232 bp, with 24 chromosome-length scaffolds. The anchored sequences in the three Peromyscus genome assemblies were as follows: P. crinitus ~2.27 Gb, P. eremicus ~2.5 Gb, and P. maniculatus ~2.39 Gb (Table 1). Our assembly has high contiguity and completeness and low redundancy, as demonstrated by the presence of 89.3% complete BUSCOs, with 0.9% duplicated and 9.0% missing, excluding unplaced scaffolds. As anticipated (Smalec et al. 2019), we found no significant variation in chromosome number or major interchromosomal rearrangements between P. crinitus and P. maniculatus (Figure S4). We annotated 17,265 total protein coding genes in the P. crinitus genome. Similar to other Peromyscus species, LINE1 (long interspersed nuclear element 1) and LTR (long terminal repeat) elements comprised 22.7% of the repeats in the P. crinitus genome, with SINEs (short interspersed nuclear elements) representing an additional 9.6% (Table S5). Although similar to other Peromyscus species, P. crinitus has the greatest total repeat content (>37%; see Tigano et al. 2020 Supplementary Table 2).
Population Genomics
MDS analysis parsed the three species into well-separated clusters and identified no outliers or evidence of admixture (Fig. S6). Analysis of genetic structure identified all three species as a single group (K = 1) with the highest likelihood. A three-population model neatly parsed the three species, as expected (Fig. S7, Table S8). We found evidence of potential admixture in P. crinitus with at least three individuals containing 11-27% ancestry from P. eremicus and additional material from P. maniculatus (4-16%), although variable sample sizes may impact assignment certainty and expanded sequencing of additional species and populations will be required to identify the specific sources of introgressed material. Four P. eremicus individuals had < 90% assignment probability to the P. eremicus species cluster, with a maximum of 15% assignment to a different species cluster. Identification of admixture in both species is not biased by differences in coverage, as low (2X), medium (8X), and high coverage (17X) samples were found to be admixed at a < 90% assignment threshold. No P. maniculatus individuals were identified as admixed. PSMC estimates of historical demography show greater variance and a higher overall Ne for P. crinitus relative to P. eremicus (Fig. 2). Demographic estimates for P. maniculatus are included as a reference but should be interpreted with caution as they are based on a captive-bred individual and may not accurately reflect the demography of wild populations.
Average Tajima’s D (1 kb windows) was negative for all species and ranged from −0.69 to −1.61. Peromyscus crinitus had the lowest Tajima’s D value and P. maniculatus the highest (Fig. S9). Global pairwise FST was moderate between all species ranging from 0.20-0.27 (unweighted: 0.12-0.17). Mean global π (1 kb windows) was 0.005 (±0.005) for P. crinitus, 0.007 (± 0.007) for P. eremicus, and 0.012 (± 0.010) for P. maniculatus (Fig. S10).
Selection & convergence
Within P. crinitus we identified a total of 209 significant sweep sites (Table S11), with 104 sites localized on chromosome 9 and 16 regions experiencing major selective sweeps (Fig. 3). Despite the large size of chromosome 1, we found no significant sweep sites on this chromosome for P. crinitus. We found 239 total significant sweep sites for P. eremicus (Table S12), with 56 sites concentrated on chromosome 1. Finally, we identified a total of 213 significant sweep sites for P. maniculatus (Table S13), with 103 sites located on chromosome 4.
A significant selective sweep (CLR above the 99.9th percentile) affects the area surrounding the site tested, including both protein-coding and non-coding regions, so the analysis cannot identify the specific nucleotides under selection. Under the assumption that proximity suggests interaction between a functional region of the genome and a sweep site, we hierarchically examined protein-coding genes most proximal to each sweep site. On average the distance from a sweep site to the nearest coding gene was 45 kbp in P. crinitus (range: 31 - 439,122 bp), and much greater for both P. maniculatus (average: 152 kbp; range: 190 - 513,562 bp) and P. eremicus (average: 117 kbp; range: 38 - 1,423,027 bp), which may be partially explained by differences in assembly quality. For both P. eremicus and P. maniculatus, only two significant sweep sites were identified within protein-coding genes (P. eremicus: Meiosis-specific with OB domain-containing protein [gene name: MEIOB], Harmonin [Ush1c]; P. maniculatus: Dehydrogenase/reductase SDR family member 7B [DRS7B] and Zinc finger protein 217 [ZN217]). In contrast, for P. crinitus 12 significant sweep sites fell within 19 distinct candidate loci, many of which code for numerous alternatively spliced transcripts (Table 2). Among P. crinitus significant sweep sites localized within coding sequences, we identified 19 enriched GO terms (3 Biological Process [BP], 9 Molecular Function [MF], 7 Cellular Component [CC]), with functionality ranging from ‘proteolysis’ to ‘hydrolase’ activity (Fig. 4). Functional examination of those genes identifies solute regulation as a key function, with genes pertaining to calcium (Trypsin-2 [PRSS2]) and zinc (Kallikrein-4 [KLK4]) binding and sodium regulation (Prostasin [PRSS8]) identified as under selection.
Examination of the two nearest genes to each sweep site (dataset II: one upstream, one downstream gene, regardless of strand) in P. crinitus identified 121 unique genes and 26 enriched GO terms (8 Biological Process [BP], 10 Molecular Function [MF], 8 Cellular Component [CC]), with functionality pertaining to metabolism (e.g., ‘protein metabolic process’, ‘organonitrogen compound metabolic process’, ‘peptide metabolic process’) and ribosomes (Fig. 4; Tables S11-13). For P. eremicus, we identified 202 unique genes and 14 enriched GO terms (0 BP, 1 MF, 13 CC) associated with selective sweeps, with functionality again centered around ribosomes. For dataset II, two genes and seven enriched GO terms were shared between the two desert-adapted species and we found no relationship between genes identified under selection between the two species. Functional enrichment of P. eremicus and P. maniculatus across all datasets was limited to ribosomes (e.g., ‘structural constituent of ribosome’, ‘cytosolic ribosome’, ‘ribosomal subunit’; Fig. 4; Tables S14-16). In contrast, functionality of enriched GO terms for P. crinitus centered on metabolic processes, including protein breakdown, hydrolysis, and cellular functionality (e.g., ‘organelle’, ‘intracellular’, ‘cytoplasm’; Fig. 4; Table S16), in addition to ribosomes. See Supporting Information for detailed results for datasets I and III. Peromyscus eremicus and P. maniculatus shared significant overlap (p < 0.05) in enriched GO terms across all hierarchical data subsets (I, II, III; Fig. 5).
Significant overlap of enriched GO terms was also detected between P. crinitus and both other Peromyscus species for datasets II and III only, with zero overlap for dataset I (Fig. 5). Significant overlap in the genes located near significant sweep sites among desert-adapted Peromyscus was only detected in dataset III. Overall, P. crinitus consistently had greater diversity of functional enrichment relative to the other two species and the GO terms and genes involved in ribosomal functionality were frequently shared among all species.
Species tree estimates (Fig. S17) were consistent with previous phylogenetic investigations (Bradley et al. 2007). Peromyscus crinitus and P. eremicus are related in our species tree, however a number of intermediate taxa remain unsampled (e.g., P. merriami, P. californicus). Among the species examined here, the desert adapted crinitus-eremicus clade is related to a clade comprised of P. leucopus, P. polionotus, and P. maniculatus, with the nasutus-attwateri clade most basal within Peromyscus. For the Peromyscus genus, we found 19,925 gene families that had experienced contractions, 502 expansions, and 12 families that were evolving rapidly. However, we found no gene families experiencing significant expansions, contractions, or rapid evolution within Peromyscus or below the genus level.
Counter to expectations, Tajima’s D and π surrounding significant selective sweep sites were significantly higher than the global mean estimates in most comparisons (Table S18). Only in P. maniculatus did we detect a significant reduction in π surrounding significant sweep sites. Mean Tajima’s D surrounding a priori candidate loci was also significantly more positive in all three species.
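For readers less familiar with the statistic, Tajima’s D contrasts two estimators of genetic diversity: π (mean pairwise differences) and Watterson’s estimator based on the number of segregating sites. The sketch below computes the standard statistic from fully called haplotypes. It is an illustration only and not the authors' pipeline: low-coverage resequencing data are usually analysed through genotype likelihoods rather than called haplotypes, and the function name `tajimas_d` and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length haplotype strings.

    Returns NaN when no sites are segregating. Requires at least four
    sequences, because the variance term is zero for smaller samples.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four haplotypes")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must have equal length")

    # S: number of alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        return math.nan

    # pi: mean number of differences over all pairs of haplotypes.
    n_pairs = n * (n - 1) // 2
    total_diffs = sum(
        sum(x != y for x, y in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    pi = total_diffs / n_pairs

    # Standard normalising constants for the variance of (pi - S/a1).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy data: two divergent haplotype groups (intermediate-frequency
# variants) versus four haplotypes carrying only singleton variants.
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
singletons = ["TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(balanced), tajimas_d(singletons))
```

An excess of intermediate-frequency variants pushes D positive and an excess of rare variants pushes it negative. A completed sweep is usually expected to leave locally negative D and reduced π, which is why the elevated values near sweep sites run counter to expectation.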
DISCUSSION
Continued and accelerating environmental change increases the urgency of accurately anticipating species responses to anthropogenic climate change. Adaptive evolutionary responses vary among species and populations, even when subjected to seemingly equivalent environmental selective pressures (Bi et al. 2015; Garcia-Elfring et al. 2019), but evidence of parallel or convergent evolution can highlight critical genomic architecture involved in key adaptations. We broadly define convergence as the evolution of similar adaptive responses among distantly related taxa in response to similar environmental or ecological conditions and parallel evolution as the occurrence of similar adaptive changes in groups that share recent common ancestry (Simpson 1961; Wood et al. 2005). We analyzed genome-wide patterns of selective sweeps among three species of deer mice within the North American genus Peromyscus to identify critical genomic architecture or alternative evolutionary pathways to achieve desert adaptation. We hypothesized that desert specialists P. crinitus and P. eremicus would share similar patterns of selective sweeps related to surviving high-temperature, low-water environments indicative of parallel or convergent evolution. The species examined here share a common ancestor and the two desert-adapted taxa share an even more recent ancestor (Fig. S17); however, there are a number of unsampled species separating P. crinitus from P. eremicus (e.g., P. merriami, P. californicus) and these two taxa are relatively divergent (average Fst of 0.25; divergence > 1 Mya, Platt et al. 2015). Nonetheless, without evidence of ancestral divergence followed by subsequent convergence (molecular or phenotypic) we cannot disentangle convergence from parallel evolution within the context of this investigation. Instead, we infer parallel evolution as the most parsimonious explanation for coordinated selection among desert-adapted Peromyscus species.
As such, both selection on novel beneficial mutations (typically deemed convergence) and the coincident retention of advantageous ancestral polymorphisms are interpreted as evidence in support of parallel evolution, as we do not have ancestral state information to distinguish the former.
Overall, we rejected our hypothesis that desert adapted species share parallel signatures of selective sweeps reflecting adaptation to similar environments. Instead, we identified divergent molecular mechanisms of adaptation to desert environments, with P. crinitus potentially responding primarily through genomic changes to protein coding genes and P. eremicus through transcriptional regulation of gene expression. This result potentially contradicts the generalization that convergent evolution is driven by different molecular mechanisms and parallel evolution, by similar mechanisms (Arendt and Reznick 2008). Molecular flexibility of thermoregulatory responses may have catalyzed the radiation of Peromyscus in North America by enabling rapid exploitation of novel thermal environments. Finally, the application of an evolutionary lens to the interpretation of genomic patterns of selective sweeps, particularly one that integrates historical demography and gene flow, may serve to inform which evolutionary mechanism (genomic vs. transcriptomic) will be most efficacious to achieve similar adaptive phenotypes in other small mammals.
Limited evidence of parallel evolution
Identification of similar genes or functions under selection in different species adapted to similar environments suggests a deterministic effect of natural selection (Rosenblum et al. 2014) and provides evidence in support of parallel evolution. In contrast, we found limited evidence of parallel evolution among desert-adapted Peromyscus. Few to no enriched GO terms overlapped between desert specialists (Fig. 5). Instead, GO terms relating to ribosomes (e.g., ‘ribosome’, ‘ribosomal subunit’, ‘cytosolic ribosome’) broadly overlapped among all three Peromyscus species examined, with the most significant overlap occurring between desert-adapted P. eremicus and generalist P. maniculatus. Although P. maniculatus are not desert specialists, the individuals sequenced here were collected in arid regions of southern California; therefore, the shared signature of selection on ribosomes across all three species may reflect broadly shared adaptations to hot and dry conditions or relate to thermoregulatory plasticity among Peromyscus rodents. We found few genes experiencing significant selective sweeps shared among Peromyscus species, with only one significant relationship: 10 genes, among hundreds, were shared between the two desert specialists (dataset III; Fig. 5) and were consistent with a signature of parallel evolution. Nonetheless, many of the overlapping genes are directly related to broad ribosomal functionality (e.g., RL36, RS26, RL15, RS2) and are also shared with P. maniculatus. Further, selective sweeps are only one way to detect signatures of parallel evolution and this hypothesis remains to be explored in greater detail using additional methods (Booker et al. 2017; Weigand and Leese 2018).
Although we found no significantly expanded or contracted gene families within the genus Peromyscus, previous investigations of the entire Myodonta clade within Rodentia identified multiple expanded or contracted gene families associated with ribosomes in P. eremicus (Tigano et al. 2020). Ribosomes play a critical role in protein synthesis and degradation. Cellular damage accumulates quickly in desert environments as a consequence of increased thermal- and osmotic-stress (Lamitina et al. 2006; Burg et al. 2007). In response, changes in gene expression modulate osmoregulation by removing and replacing damaged proteins to prevent cell death (Lamitina et al. 2006). While ribosomes appear to be a target of parallel evolution in desert-adapted Peromyscus, this genomic signature is not unique. Instead, selection on ribosomal functionality may be convergent across many species adapted to distinct thermal environments (metazoans; Porcelli et al. 2015). Ribosomes are evolutionarily linked to the mitochondrial genomes of animals (Barreto and Burton 2012; Bar-Yaacov et al. 2012) and accelerated mitochondrial evolution in animals has led to compensatory, rapid evolution of ribosomal proteins (Osada and Akashi 2012; Barreto and Burton 2013; Bar-Yaacov et al. 2012). Rapid mitochondrial diversification within Peromyscus (Riddle et al. 2000; Bradley et al. 2007; Platt et al. 2015), coincident with the ecological radiation of this genus (Lindsey 2020), suggests that equivalent, rapid selection on ribosomal proteins may be a key evolutionary innovation that enabled Peromyscine rodents to successfully and quickly adapt to varied thermal environments. Comparisons among additional desert- and non-desert-adapted Peromyscus species will be necessary to test this hypothesis within an evolutionary framework. Although not unique to desert-species, rapid ribosomal evolution may provide a common evolutionary mechanism to respond to anthropogenic climate change.
Evaporative cooling through sweating, panting, or salivating increases water loss and challenges osmoregulatory homeostasis while maintaining thermoregulation in a hot and dry climate (McKinley et al. 2018). Thermal stress exacerbates dehydration by increasing evaporative water loss and, if untreated, can result in cognitive dysfunction, motor impairment, and eventually death. In consequence, osmoregulatory mechanisms are often under selection in extreme thermal environments (MacManes and Eisen 2014; Marra et al. 2014). Consistent with this expectation, 40% of the 10 genes targeted by selective sweeps and shared between desert-adapted Peromyscus are involved in ion balance (Table 3). Proteins Trypsin-2 (TRY2) and Trimeric intracellular cation channel type B (TM38B) are associated with selective sweeps in both desert-adapted species and are involved in calcium ion (Ca2+) binding and release, respectively. DNA-directed RNA polymerase III (RPC1) has also experienced a significant selective sweep in both desert species and influences magnesium (Mg2+) binding. Calcium and magnesium cations are among those essential for osmoregulation (also, Na+, K+, Cl−, HCO3−; Stockham and Scott 2008) and parallel selection on these genes is consistent with the hypothesis that solute-carrier proteins are essential to maintaining homeostasis in desert-specialized rodents (Marra et al. 2014; Kordonowy and MacManes 2017). Additional genes implicated in osmoregulation were identified as experiencing selective sweeps only in P. crinitus (Table 2; Table S16). In addition to those shared with P. eremicus, Prostasin (PRSS8), only under selection in P. crinitus, is critically responsible for increasing the activity of epithelial sodium (Na+) channels, which mediate Na+ reabsorption through the kidneys (Narikiyo et al. 2002). 
Two more genes associated with Ca2+ regulation (Anionic Trypsin-2 [PRSS2] and Trypsin [TRYP]) and other genes regulating zinc (KLK4) and iron (NCOA4) were also identified as targets of selective sweeps exclusively in P. crinitus.
Metabolic tuning: proteins-for-water or lipids-for-torpor?
Hot deserts experience dramatic fluctuations in both food and water availability that challenge species survival (Noy-Meir 1973; Silanikove 1994). Mammals accommodate high temperatures by increasing their body temperature, up to a point, and cold temperatures by aerobic thermogenesis or metabolic suppression and the initiation of torpor or hibernation (Levesque et al. 2016). When resources are scarce, metabolism relies exclusively on endogenous nutrients; carbohydrates (e.g., sugars, glucose) are consumed immediately, then lipids, and eventually, proteins. Protein oxidation has a low energy return relative to lipid catabolism (Bar and Volkoff 2012), but yields five times more metabolic water (Jenni and Jenni-Eiermann 1998; Gerson and Guglielmo 2011a,b; McCue et al. 2017). In a low-water, desert environment an early shift to protein catabolism during periods of resource limitation may represent an important source of water for desert species (e.g., protein-for-water hypothesis; Mosin 1984; Jenni and Jenni-Eiermann 1998; Gerson and Guglielmo 2011a,b), as demonstrated for migrating birds. Consistent with this hypothesis, we identified numerous candidate genes that experienced selective sweeps in P. crinitus and that are involved in the detection of metabolic stress and shifts in metabolic fuel consumption. For example, the gene eIF-2-alpha kinase GCN2 (E2AK4), which is responsible for sensing metabolic stress in mammals and required for adaptation to amino acid starvation, experienced the strongest selective sweep on chromosome 4 in P. crinitus (Fig. 3; Harding et al. 2003; Baker et al. 2012; Taniuchi et al. 2016). 
Numerous candidate genes involved in oxidation (Oxidoreductase NAD-binding domain-containing protein 1 [Oxnad1]), fat catabolism (Kallikrein-6 [KLK6]), protein processing (Kallikrein-13 [KLK13]), and proteolysis (Kallikrein [KLK4, KLK13], Trypsin [PRSS2, TRYP, TRY2], Chymotrypsin-like elastase family member 2A [CELA2A]) were significantly enriched in P. crinitus, with proteolysis as the most enriched functional group (Fig. 4; Table S16), potentially supporting the protein-for-water hypothesis.
For desert species, including both desert specialists examined here (Morhardt and Hudson 1966; MacMillen 1983), heat- and drought-induced torpor enables long-duration, low-energy survival. Lipid acquisition and storage are critical to the initiation and maintenance of torpor (Buck et al. 2002; Melvin and Andrews 2009). Significant weight loss in experimentally dehydrated P. eremicus and enhanced thermogenic performance of high-altitude-adapted deer mice have been associated with enhanced lipid metabolism (Cheviron et al. 2012; Kordonowy et al. 2016). At high latitudes, increased lipid oxidation enables aerobic thermogenesis, but in hot deserts lipids may represent a valuable energy source in a food-scarce environment (e.g., lipids-for-torpor hypothesis).
Metabolic processes were enriched in P. crinitus, but not P. eremicus. Two additional candidate genes, DCC-interacting protein 13-alpha and -beta (APPL1, APPL2), experienced significant selective sweeps in P. crinitus and are important in glucose regulation, insulin response, and fatty acid oxidation, reinforcing the hypothesis that enhanced lipid oxidation may be critical to thermoregulatory responses. Laboratory manipulations of APPL1 demonstrate protection against high-fat diet-induced cardiomyopathy in rodents (Park et al. 2013) and APPL2 is responsible for dietary regulation, cold-induced thermogenesis, and cold acclimation (uniprot.org). Together, these genes play a role in both obesity and dietary regulation. Both APPL genes are associated with obesity and non-alcoholic fatty liver disease and their sweep signature in P. crinitus may have relevant connections to biomedical research that remain to be explored (Jiang et al. 2012; Barbieri et al. 2013). Physiological tests will be essential to determine whether desert-adapted deer mice prioritize proteins or fats during periods of resource limitation (e.g., lipids-for-torpor) or extreme dehydration (e.g., protein-for-water hypothesis).
Molecular rewiring of metabolic processes in response to environmental conditions has been documented in a number of species (e.g., mammals, Cheviron et al. 2012, Velotta et al. 2020; birds, Xie et al. 2018; fruit flies, Mallard et al. 2018). Nonetheless, expression changes can also impact metabolism (Cheviron et al. 2012; Storz and Cheviron 2016). The capacity for rapid adaptation to distinct thermal environments through multiple evolutionary mechanisms, combined with thermoregulatory behavioral fine-tuning (e.g., nocturnality, aestivation, food caching, burrowing, dietary shifts), suggests there may be many more evolutionary responses available for small mammals to accommodate changing climates than previously realized. Metabolism and metabolic plasticity therefore represent fundamental phenotypes for anticipating species survival under altered climate scenarios.
Different evolutionary trajectories, same result
There are multiple evolutionary pathways to achieve environmental adaptation, most notably through genomic changes in protein coding genes or transcriptional regulation of gene expression. Although patterns of gene expression remain to be explored for P. crinitus, our comparative genomic investigation suggests alternative evolutionary strategies for each desert-adapted Peromyscus species shaped by their demographic histories: P. crinitus primarily through changes in protein-coding genes and P. eremicus primarily through transcriptional regulation.
Diverse functional enrichment of the P. crinitus genome (Fig. 4), spanning metabolic and osmoregulatory functions in addition to the general functional enrichment of ribosomes, points to polygenic adaptation to desert life. Here, we have identified a number of candidate loci worthy of detailed examination across populations and in a laboratory setting to understand the varied influence of different loci in thermoregulation and dehydration tolerance. In contrast, evidence of many significant sweep sites in the P. eremicus genome, generally located more distant from protein-coding genes, and with functional enrichment restricted exclusively to ribosomes, suggests local adaptation in this species may have been driven more by selection on regulatory or non-coding regions of the genome that disproportionately impact gene expression. Indeed, transcriptomic investigations have identified significant expression changes implicated in osmoregulation in P. eremicus (MacManes and Eisen 2014; Kordonowy and MacManes 2017) and in thermoregulation in other Peromyscus species and rodents (Cheviron et al. 2012; Marra et al. 2014; Storz and Cheviron 2016). Transcriptional regulation is a particularly useful mechanism for environmental acclimation, as these changes are more transient relative to genomic changes and can enhance phenotypic flexibility (Garrett and Rosenthal 2012; Rieder et al. 2015; Liscovitch-Brauer et al. 2017). For example, transcription factors (TFs), which are a common target of selection located outside of protein-coding genes, can coordinate the expression of many genes, which allows multiple phenotypic changes to occur simultaneously (Wagner and Lynch 2008). Reduced genomic variation is expected near selective sweeps and can encompass tens to thousands of adjacent nucleotides depending on recombination and the strength of selection (Fay and Wu 2000; Carlson et al. 2005), yet counter to expectation, Tajima’s D and nucleotide diversity for regions flanking putative selective sweeps were significantly higher than the global average for most comparisons (Table S18).
Placing the results of selective sweep analyses within an evolutionary framework is critical to interpreting varied evolutionary responses to similar environments. The expansion of North American deserts following the conclusion of the last glacial maximum (LGM, ~11 Kya; Pavlik et al. 2008) constrains the evolutionary and adaptive timescales of contemporary desert species. Assuming simultaneous colonization of southwestern deserts, the stable but low effective population size of P. eremicus suggests two hypotheses: (I) selection has removed variation in this species over time (Murray et al. 2017) or (II) P. eremicus historically harbored less genetic variation for selection to act on, despite equivalent contemporary diversity relative to P. crinitus (Table S18). As a consequence of demography, the evolution of P. eremicus is likely to have been more impacted by genetic drift (Fig. 2; Allendorf 1986; Masel 2011), so we expect genomic evolution to be slower in P. eremicus relative to P. crinitus, which historically has had a larger effective population size and a broader pool of variation for selection to act upon. Within this context, environmental adaptation could be achieved more efficiently for P. eremicus through transcriptomic plasticity or changes in regulatory elements (Allendorf 1986; Neme and Tautz 2016; Mallard et al. 2018). In contrast, the larger historical effective population size of P. crinitus is more conducive to the maintenance of genetic diversity and rapid evolution of protein-coding sequences through mutational stochasticity, reduced impact of genetic drift, and potentially, gene flow. Peromyscus crinitus experienced a historical demographic bottleneck prior to the formation of North American deserts; nevertheless, the recovered effective population size of P. crinitus is much larger than that of P. eremicus and consistent with low levels of detected admixture (Fig. S7, Table S8). Also consistent with a history of admixture, P. crinitus has the least negative Tajima’s D, while its low nucleotide diversity may indicate that admixture was not recent. Repeated growth and contraction of rivers in the American Southwest during Pleistocene glacial-interglacial cycles (0.7-0.01 Mya; Muhs et al. 2003; Van Dam and Matzke 2016) would have provided iterative opportunities for connectivity and introgression between incompletely isolated Peromyscus species. Historical hybridization between P. crinitus and one or more other Peromyscine species, likely unsampled here, may have accelerated adaptation in P. crinitus through the rapid influx of novel mutation combinations through adaptive introgression, a hypothesis that warrants further investigation through expanded sampling and tests of adaptive introgression. Low-coverage whole-genome resequencing is optimal for comparative and population genomic investigations (O’Rawe et al. 2015; da Fonseca et al. 2016), but detailed analyses of historical introgression are limited and we look forward to testing this hypothesis with expanded population sampling and increased sequencing depth. Overall, incorporating an evolutionary perspective into the interpretation of selection patterns has important implications for anticipating species responses to anthropogenic climate change, as historical demography and gene flow, in addition to selection, are responsible for shaping contemporary diversity.
Conclusion
Contrasting patterns of selective sweeps and evolutionary histories between different species experiencing similar environmental pressures can provide powerful insights into the adaptive potential of species. We used comparative and population genomic analyses of three Peromyscus species to identify candidate loci that may underlie adaptations to desert environments in North America that serve to inform future investigations focused on predicting potential for adaptation and identifying the causes of warming-related population declines (Cahill et al. 2013). The identification of numerous targets of selection within P. crinitus highlights multiple molecular mechanisms (metabolic switching, osmoregulatory tuning) associated with physiological responses to deserts that warrant further investigation. Even in species recently adapted to similar environments we identified divergent evolutionary trajectories, with one species accommodating desert conditions primarily through genomic changes within protein coding genes and the other, through transcriptional regulation mediated by historical demographic processes. Our approach demonstrates the importance of placing genomic selection analyses into an evolutionary framework to anticipate evolutionary responses to change.
AUTHOR CONTRIBUTIONS
JPC performed analyses and wrote the first version of the paper. AT collected whole-genome resequencing data and designed the bioinformatic pipeline. MDM conceptualized the project, performed de novo genome assembly and annotation, and funded data generation. OD, ADO, RK, IB, and ELA generated and assembled Hi-C data for P. crinitus as part of the DNA Zoo consortium effort. All authors reviewed and edited the manuscript.
DATA ACCESSIBILITY STATEMENT
The draft assembly data are housed on the European Nucleotide Archive (ENA) under project ID PRJEB33592. The Hi-C data are available on SRA (SRX7041777, SRX7041776, SRX7041773) under the DNA Zoo project accession PRJNA512907. The P. crinitus genome assembly is available at https://www.dnazoo.org/assemblies/Peromyscus_crinitus. Whole-genome resequencing data for P. crinitus are available on ENA under project ID PRJEB35488. Custom Python and bash scripts used in the analyses are available at: https://github.com/jpcolella/Peromyscus_crinitus.
ACKNOWLEDGEMENTS
We thank A. S. Westbrook for computational support; the Premise computing cluster at the University of New Hampshire, where all analyses were conducted; the Biotechnology Resource Center at Cornell University for preparation of whole-genome resequencing libraries; Christopher Tracy for access to the Boyd Deep Canyon Reserve; Jim Patton for desert field expertise; Sen Pathak, Asha Multani, and Richard Behringer for providing the fibroblast samples from the T.C. Hsu Cryo-Zoo at the University of Texas M.D. Anderson Cancer Center; DNA Zoo for generating Hi-C data; Pawsey Supercomputing Centre, with funding from the Australian Government and the Government of Western Australia, for computational support of the DNA Zoo effort; and the Museum of Southwestern Biology at the University of New Mexico for loaned tissue materials. This work was funded by the National Institutes of Health National Institute of General Medical Sciences to MDM (1R35GM128843).