Abstract
Characterization of the complex genomic architecture underlying quantitative traits can provide valuable insights into the study, conservation, and management of natural populations. This is particularly true for fitness-related traits such as body size and male ornamentation in mammals because as indicators of quality and health, these traits are often subject to sexual and artificial selective pressures. Here we performed high-depth whole genome re-sequencing on pools of individuals representing the phenotypic extremes in our study system for antler and body size in white-tailed deer (Odocoileus virginianus). Samples were selected from a database containing phenotypic data for 4,466 male white-tailed deer from Anticosti Island, Quebec, with four pools representing the extreme phenotypes for antler and body size in the population. Our results revealed a largely panmictic population (FST ∼ 0.01), but also detected diverged regions (FST > 0.11) between pools for both traits. These regions revealed genomic islands with signatures of positive selection and demographic expansion (negative Tajima’s D), and contained putative genes of small-to-moderate effect. Through qPCR analysis we genotyped an intron variant on gene SRP54 that is a potential QTL for antler size, and propose two missense variants on the MYADM and SPATA31E1 genes known to influence body size in mammals and reproductive success as potential QTL. This study revealed the polygenic nature of both antler morphology and body size in white-tailed deer and identified target loci for additional analyses.
Introduction
Quantifying the genomic architecture underlying phenotypes in natural populations provides insights into the evolution of quantitative traits (Stinchcombe & Hoekstra, 2008). Some quantitative traits are correlated with metrics of fitness, and as such are particularly important because they may directly influence population viability (Kardos & Shafer, 2018). This relationship between genomic architecture and quantitative traits, however, is not easy to empirically identify (Kardos & Shafer, 2018), and often has unclear and unpredictable responses to selection (Bunger et al., 2005). Outside of a few traits in well studied systems such as horn morphology in Soay sheep (Johnston et al., 2011, 2013) and migration timing in Pacific salmon (Prince et al., 2017), there are few empirical studies that have successfully identified quantitative trait loci (QTL) in wild vertebrate populations with effects on fitness-related traits.
Genome-wide association scans (GWAS) have the potential to identify links between genomic regions and fitness relevant traits that can directly inform the management of species, and have become more prominent in non-model organisms (Ellegren 2014; Santure & Grant 2018). The variation of expressed phenotypes that are observed in wild populations is influenced by demographic, environmental and epigenetic factors (Ellegren & Sheldon 2008), which leads to difficulties when trying to calculate the direct influence of genetic polymorphisms and more generally disentangle the effects of drift from selection (Kardos & Shafer 2018). Most traits also appear to be composed of many genes of small effect, with heritability for complex traits often explained by the effects on genes outside of core pathways (Robinson et al., 2013; Santure et al., 2013; Boyle et al., 2017). Population genetic modeling allows disentangling phenotypic plasticity and responses to environmental change, while expanding our understanding of the evolutionary processes in wild populations (Santure & Grant 2018). Where traits directly linked to reproductive success are also those humans select for through harvesting (Coltman et al., 2003; Kuparinen & Festa-Bianchet, 2017), there is a need to understand and predict the evolutionary outcome, which is in part dependent on the genomic architecture (Kardos & Luikart, 2019).
For traits relevant to fitness, including those under sexual selection, we might expect directional selection towards the dominant or desired phenotype. In long term studies of wild populations, however, there is often a relatively high degree of observed genetic variation underlying sexually selected traits (Johnston et al., 2013; Berenos et al., 2015; Malenfant et al., 2018), with the proposed mechanism for this genetic variation being balancing selection (Mank, 2018). For example, Johnston et al. (2013) showed that variation is maintained at a single large effect locus through a trade-off between observed higher reproductive success for large horns and increased survival for smaller horns in Soay sheep. These trade-offs, creating signatures consistent with balancing selection, are also observed in cases where social dominance structures exist as a result of male-male competition for female mate acquisition, where alternative mating strategies have been suggested as a means for younger or subordinate males to achieve greater mating success (Foley et al., 2015).
Emerging approaches to investigate the genomic architecture of complex phenotypes
The extreme-phenotype approach samples individuals from the two ends of the distribution of an observable phenotype, instead of randomly sampling individuals from the entire distribution (Perez-Gracia et al., 2002; Li et al., 2011; Emond et al., 2012; Barnett et al., 2013; Gurwitz & McLeod, 2013; Kardos et al., 2016). Kardos et al. (2016) used whole genome re-sequencing of groups representing extreme phenotypes in forehead patch size, a sexually selected trait in collared flycatchers, to study the genetic basis of this phenotypic variation. They highlighted the limitations that exist when seeking to reliably identify QTL in large natural populations, a key point being that large sample sizes and complete sampling of the genome are requisites for detecting sites under selection. However, in small populations with widespread strong linkage disequilibrium, fewer individuals are required to detect large effect loci (Kardos et al., 2016; Lowry et al., 2017).
Pooled sequencing (pool-seq) resequences pooled DNA from individuals within a population and is a cost-effective alternative to whole genome sequencing of every individual (Anand et al., 2016). Individual identity is lost through the pooling of DNA, but the resulting allele frequencies are representative of the population and can be used to conduct standard population genomic analyses, including outlier detection (Schlötterer et al., 2014). Pool-seq methods have been applied to identify loci of moderate-to-large effect in a variety of taxa, including colour morphs in butterflies and birds, abdominal pigmentation in Drosophila, and horn size of wild Rocky Mountain bighorn sheep (Kardos et al., 2015; Endler et al., 2017; Neethiraj et al., 2017).
Here, we explore the genetic basis for phenotypic variation by sampling the extreme phenotypes in a non-model big game species, the white-tailed deer (Odocoileus virginianus, WTD). Two traits are of particular interest in WTD: body size and antler size, which have a degree of observable variation (Hewitt 2011) and are connected to individual reproductive success (DeYoung et al., 2009; Newbolt et al., 2016; Jones et al., 2018). Heritability estimates for antler and body measurements are moderate to high, with mean h2 estimated to be 0.32 (0.07-0.54) for the number of points and 0.42 (0.11-0.69) for basal circumference (Michel et al., 2016) and 0.58-0.64 for body mass (Williams et al., 1994). Large antlers and body size are also sought after by hunters, both as trophies and food sources throughout North America, and thus could be subject to artificial selection (Mysterud, 2011; Ramanzin & Sturaro, 2014). Our objective was to identify QTL throughout the genome associated with variation in antler and body size phenotypes in WTD, under the hypothesis that many genes of small to moderate effect underlie these phenotypes. We performed whole genome re-sequencing on pools representing the extreme distribution of antler and body size phenotypes in a wild, but intensively monitored WTD population.
Materials and Methods
Study area
Anticosti Island (49°N, 62°W; 7,943 km2) is located in the Gulf of St. Lawrence, Québec (Canada) at the northeastern limit of the white-tailed deer range (Figure 1). The island is within the balsam fir-white birch bioclimatic region with a maritime sub-boreal climate characterized by cool and rainy summers (630mm/year), and long and snowy winters (406 cm/year; Environment Canada, 2006; Simard et al., 2010). The deer population was introduced in 1896 with ca. 220 animals and rapidly increased. Today, densities are >20 deer/km2 and can exceed 50 deer/km2 locally (Potvin and Breton 2005) with an estimated population >160,000 individuals.
Sample Collection and Modeling
We collected tissue samples and phenotypic data on 4,466 male deer harvested by hunters from September to early December, 2002–2014. We used cementum layers in incisor teeth to age individuals (Hamlin et al., 2000). Two metrics of antler size were selected: the number of antler tines or points (>2.5 cm) and beam diameter (measured at the base; ±0.02 cm) (Simard et al. 2014). We used one metric for body size: body length which is correlated to other metrics such as body weight and hind foot length (Bundy et al. 1991). Because antler and body size of male cervids are correlated to age (Solberg et al. 2004; Nilsen & Solberg, 2006), we controlled for age before ranking males according to antler and body size. We used linear models to assess the relationship between age and each metric separately. We computed an antler and body size index based on the average rank of each individual’s residual variation for the phenotypic metrics. The top and bottom 150 individuals from each group were selected from the available database for DNA extraction, and only deer with equal representation for large and small phenotypes from a given year were included for further analysis. This created four groups: large antler (LA), small antler (SA), large body size (LB), and small body size (SB) (Table S1).
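The age correction and ranking described above can be sketched as follows. This is a minimal illustration only: it assumes a simple straight-line fit of each metric on age (the exact model terms are not stated here), and the function names and toy measurements are hypothetical rather than taken from the study database.

```python
from statistics import mean

def ols_residuals(x, y):
    """Residuals of a simple linear regression y ~ x (assumed model form)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def size_index(age, metrics):
    """Average, across metrics, of each individual's rank of age-corrected residuals."""
    per_metric = [ranks(ols_residuals(age, m)) for m in metrics]
    return [mean(r) for r in zip(*per_metric)]

def extremes(index, n):
    """Positions of the n smallest and n largest index values."""
    order = sorted(range(len(index)), key=lambda i: index[i])
    return order[:n], order[-n:]

# Hypothetical toy data: age (years), antler points, beam diameter (cm)
age = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
points = [2, 4, 6, 8, 8, 10]
beam = [1.8, 2.4, 3.1, 3.3, 3.9, 4.0]
antler_index = size_index(age, [points, beam])
small, large = extremes(antler_index, 2)
```

In the study the tails comprised 150 individuals per group; the value `n = 2` above is only for the toy data.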
DNA Extraction and Genome Sequencing
We isolated DNA from tissue using the Qiagen DNeasy Blood & Tissue Kit. The concentration of each DNA extract was determined using a Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA, USA). Equal quantities of DNA (100 ng/sample) were combined into representative pools for LA (n=48), SA (n=48), LB (n=54), and SB (n=61), with a desired final concentration of 20 ng/µL combined DNA for each pool. We sequenced each pool to approximately 50X coverage at The Centre for Applied Genomics (Toronto, ON, Canada) on an Illumina HiSeqX with 150 bp paired-end reads (Table S2).
Genome Annotation
A newly generated draft WTD genome constructed from long (PacBio) and short-read (Illumina) data was used as a reference (Fuller et al., 2019; NCBI PRJNA420098). We performed a full genome annotation by masking repetitive elements throughout the genome using a custom WTD database developed through RepeatModeler (Smit & Hubley, 2015) in conjunction with RepeatMasker (Smit, Hubley & Green, 2015) using the NCBI database for Artiodactyla, without masking low complexity regions. The masked genome was then annotated using the MAKER2 pipeline (Holt & Yandell, 2011). We used a three-stage process (Laine et al., 2016), with cow EST and protein sequences available through NCBI used for initial training with SNAP2. The resulting GFF output was then used as evidence for the MAKER2 prediction software, again using SNAP2. The final annotation used evidence from the prior SNAP2 trials and the available human training data set with AUGUSTUS (Dryad Accession ID XXXXXX). Gene IDs were generated using blastp (Johnson et al., 2008) on the WTD annotation protein transcripts, restricting the blast search to a list of GI numbers obtained from the NCBI proteins database.
Mapping and characterization of SNPs
We performed initial quality filtering for all reads using FastQC. Raw reads were aligned independently to the masked WTD reference genome with BWA-MEM (Li, 2013). We used samtools (Li et al., 2009) to merge and sort all aligned reads into four files, one for each representative pool. Using bcftools (Li et al., 2009) we generated mpileup files representative of the antler and body phenotypes (parameters = -B -q20) following standard pool-seq protocol (Kofler et al., 2011).
Genome wide differences were calculated between large and small phenotypes for antler and body size. We tested for pairwise allele frequency differences using Fisher’s exact test (FET) and pairwise fixation index (FST) from the Popoolation2 software suite (Kofler et al., 2011), with parameters set to a minimum coverage of 50, maximum coverage of 200 and a minimum overall count of the minor allele of 8 for each pool. FST values were highly correlated to FET P-values (Pearson r = 0.96, Figure S1), and thus we refer only to FST for outlier detection. Manhattan plots were generated using a custom R script to plot the distribution of FST by position and scaffold throughout the entire WTD genome. We identified the 99th percentile of FST scores across the genome and calculated the genome-wide mean FST.
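To make the outlier logic concrete, the sketch below computes a per-SNP FST between two pools and a 99th-percentile cutoff. It is an illustration under simplifying assumptions: it uses a plain heterozygosity-based FST for a biallelic site with equally weighted pools, not the coverage- and pool-size-corrected estimator implemented in Popoolation2, and all function names and allele frequencies are hypothetical.

```python
import math

def allele_freq(alt_count, ref_count):
    """Alternate-allele frequency in a pool from read counts."""
    return alt_count / (alt_count + ref_count)

def fst_biallelic(p1, p2):
    """Heterozygosity-based FST, (H_T - H_S) / H_T, for one biallelic SNP.

    Simplified: no correction for sequencing depth or pool size.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pools combined
    if h_t == 0:
        return 0.0                          # monomorphic site carries no signal
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-pool value
    return (h_t - h_s) / h_t

def percentile_cutoff(values, q):
    """Nearest-rank percentile: smallest value with at least q% of values at or below it."""
    s = sorted(values)
    k = max(1, math.ceil(q * len(s) / 100))
    return s[k - 1]

# Hypothetical allele frequencies (large pool, small pool) for a handful of SNPs
snps = [(0.50, 0.48), (0.10, 0.12), (0.80, 0.35), (0.30, 0.31)]
fst_values = [fst_biallelic(p1, p2) for p1, p2 in snps]
cutoff = percentile_cutoff(fst_values, 99)
outliers = [i for i, f in enumerate(fst_values) if f >= cutoff]
```

With genome-wide data, `fst_values` would hold one entry per SNP passing the coverage and minor-allele filters, and `cutoff` would correspond to the 99th-percentile threshold used for outlier detection.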
For all pools, we examined the proximity to genic regions for all identified SNPs using SnpEff (Cingolani et al., 2012). All SNP locations were characterized as being in an intergenic/intragenic region, 25kb up/downstream of a gene, intron, or exon. We also obtained an assessment of the putative impact of each variant (HIGH, MODERATE, LOW) and proximity to the nearest gene for every identified SNP. Numerous candidate genes for body size and head gear have been identified in ruminants (Bouwman et al., 2018; Ker et al., 2018; Wang et al., 2019); we looked specifically for SNPs and divergence in a non-exhaustive list of genes known a priori to influence these phenotypes (Table S3).
We used a sliding window approach (10 kb) to estimate Tajima’s D across the entire WTD genome using the software popoolation (Kofler et al., 2011). Here, all reads, irrespective of pool, were merged into a single pileup file and then subsampled to 100X coverage per site to account for overrepresented sequences. Tajima’s D was calculated in 10 kb sliding windows and in gene-only windows, retaining windows in which at least 60% of sites met the pre-defined parameters (min coverage 30x, max coverage 200x, min count 2, min quality 20) (Kofler et al., 2011). Resulting D values in the 99th percentile were used to help identify outlier regions throughout the genome, and were plotted on scaffolds of interest previously identified through the FST outlier analysis.
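For readers unfamiliar with the statistic, the following sketch computes the classic Tajima's D for a single window from minor-allele counts. It is an assumption-laden illustration: it implements the standard estimator for individually sampled chromosomes, whereas popoolation applies corrections for pooled data that are not reproduced here, and the function name and inputs are hypothetical.

```python
import math

def tajimas_d(minor_counts, n):
    """Classic Tajima's D for one window.

    minor_counts: minor-allele count at each segregating site in the window
    n: number of chromosomes sampled (must be at least 4)
    Returns None when the statistic is undefined.
    """
    s = len(minor_counts)
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean pairwise differences (pi), summed over segregating sites
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in minor_counts)
    theta_w = s / a1                      # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Excess of rare variants gives D < 0; excess of intermediate-frequency variants gives D > 0
d_rare = tajimas_d([1, 1, 1, 1, 1], n=10)
d_common = tajimas_d([5, 5, 5, 5, 5], n=10)
```

An excess of rare alleles pulls pi below Watterson's estimator and yields negative D, which is the pattern expected after a population expansion or a selective sweep.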
Transposable Elements
We used consensus transposable element (TE) sequences from the repbase database for cow to repeat mask the WTD reference genome, and subsequently merged this masked genome with the repbase TE reference sequences (Bao et al., 2015). We used the repbase database consensus sequences to re-mask the genome due to more robust information for TE identities, family and order required in this analysis. Using methods identical to those previously stated, all reads for both antler and body size phenotypes were aligned to the TE-WTD merged reference genome. We used the recommended workflow for the software suite popoolationTE2 (Kofler et al., 2016) to create a list of predicted TE insertions. From this we calculated TE frequency differences between large and small phenotypes as well as proximity to genic regions. Here, we only examined TEs that were within 25-kb up or downstream from a gene and within the 95th percentile for absolute frequency differences to allow for more features to be assessed subsequently.
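The TE filtering step can be sketched as below: compute the absolute frequency difference between pools for each predicted insertion, keep those at or above the 95th percentile, and retain only insertions inside a gene or within 25 kb of it. The data structures, function names, and toy coordinates are hypothetical and are not the popoolationTE2 output format.

```python
import math

def within_flank(pos, gene_start, gene_end, flank=25_000):
    """True if pos lies inside the gene or within `flank` bp up/downstream."""
    return gene_start - flank <= pos <= gene_end + flank

def percentile_cutoff(values, q):
    """Nearest-rank percentile of a list of values."""
    s = sorted(values)
    k = max(1, math.ceil(q * len(s) / 100))
    return s[k - 1]

def divergent_tes(insertions, genes, q=95):
    """Return (scaffold, pos, gene, diff) for high-difference insertions near genes.

    insertions: list of (scaffold, pos, freq_large, freq_small)
    genes: list of (scaffold, start, end, name)
    """
    diffs = [abs(fl - fs) for _, _, fl, fs in insertions]
    cutoff = percentile_cutoff(diffs, q)
    hits = []
    for (scaf, pos, _, _), d in zip(insertions, diffs):
        if d < cutoff:
            continue
        for g_scaf, start, end, name in genes:
            if g_scaf == scaf and within_flank(pos, start, end):
                hits.append((scaf, pos, name, d))
    return hits

# Hypothetical toy input: 20 insertions on one scaffold, one annotated gene
insertions = [("s1", i * 10_000, i / 100, 0.0) for i in range(1, 21)]
genes = [("s1", 180_000, 185_000, "geneA")]
hits = divergent_tes(insertions, genes)
```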
GO Pathways
To identify shared gene pathways among outlier regions we used an analysis of gene ontology (GO) terms. GO terms were obtained for all genes that were within 25 kb of SNPs in the 99th FST percentile or of TE insertions in the 95th frequency-difference percentile, for antler and body size phenotypes independently. We created outlier gene ID lists for antler and body size that were provided independently to DAVID v6.8 (Huang, Sherman & Lempicki, 2009), after removing all duplicate gene IDs. DAVID uses a modified FET to determine the significance of enrichment for any given GO pathway based on the size of the gene list provided, the number of genes used as a background for a species, and the level of enrichment for each term from the list. Functional annotation enrichments for GO terms using GOTERM_BP_ALL, GOTERM_CC_ALL, and GOTERM_MF_ALL were obtained, and the top 10 enriched GO terms were plotted with gene counts and p-values. We used the program REVIGO (Supek et al., 2011) to remove redundant GO terms and to visualize semantic similarity-based scatterplots. Results from DAVID with a Benjamini-corrected p-value <0.05 were used with REVIGO to generate plots of significant GO terms for biological processes, molecular function, and cellular components of antler and body size phenotypes.
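As an illustration of the enrichment test, the sketch below implements a one-sided Fisher's exact test with the list hit count reduced by one, followed by a Benjamini-Hochberg adjustment. The contingency table layout reflects our reading of DAVID's description of its modified test and should be treated as an assumption; the function names and example counts are hypothetical.

```python
from math import comb

def upper_tail(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), for the table [[a, c], [b, d]].

    a: list genes in term, b: list genes not in term,
    c: background genes in term, d: background genes not in term.
    """
    n_total = a + b + c + d
    in_term = a + c
    in_list = a + b
    denom = comb(n_total, in_list)
    numer = sum(comb(in_term, x) * comb(n_total - in_term, in_list - x)
                for x in range(a, min(in_term, in_list) + 1))
    return numer / denom

def modified_fisher(list_hits, list_total, pop_hits, pop_total):
    """Conservative variant: the list hit count is reduced by one before testing."""
    a = max(list_hits - 1, 0)
    return upper_tail(a, list_total - list_hits, pop_hits, pop_total - pop_hits)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):           # walk from largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

# Hypothetical term: 3 of 300 list genes vs 40 of 30,000 background genes
p_plain = upper_tail(3, 297, 40, 29_960)
p_modified = modified_fisher(3, 300, 40, 30_000)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Removing one hit before testing penalizes terms supported by very few genes, so the modified p-value is always at least as large as the plain one.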
Validation of outliers
We selected three outlier SNPs in the 99th percentile FST values in genic regions associated with antler size for qPCR validation. Custom genotyping assays using rhAMP chemistry (Integrated DNA Technologies) were designed and genotyped on the QuantStudio 3 (Thermo Fisher Scientific). Oligo sequences and reaction parameters are provided in Table S4. Using the phenotypic category as the binary response variable, we ran a logistic regression treating the genotypic data as additive (e.g. 0-2 LA/LB alleles per locus, per individual).
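The additive coding can be illustrated with a single-locus logistic regression fitted by Newton-Raphson. This is a minimal sketch under stated assumptions: one predictor (the count of a focal allele, 0 to 2), a binary response (1 = large phenotype), no covariates, and hypothetical toy genotypes; it does not reproduce the software or data used for the validation.

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p                  # gradient w.r.t. intercept
            g1 += (yi - p) * xi           # gradient w.r.t. slope
            h00 += w                      # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical genotypes: number of focal alleles per individual (additive coding)
genotype = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
# Phenotype category: 1 = large pool, 0 = small pool
category = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]
b0, b1 = fit_logistic(genotype, category)
odds_ratio_per_allele = math.exp(b1)
```

Under additive coding, `b1` is the change in log-odds of belonging to the large category per additional copy of the focal allele, so its sign depends on which allele is counted.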
Results
Phenotypes
The distribution of phenotypes from all individuals in the existing database is shown in Figure S1. We only selected the top 150 individuals at the tail ends of the distribution for measurements used in our antler and body size rankings which are representative of the “extreme phenotypes”. Artist renderings and the distribution of measurements for the number of antler points, beam diameter, and body length between the groups of individuals representing each extreme phenotype pool (LA, SA, LB, SB) are shown in Figure 2a. There were no differences in mean age between the pools (Figure 2b).
Detecting genetic variants and their genomic regions
The total reads generated from resequencing are reported in Table S3 (BioProject ID PRJNA576136), along with the estimated mean genome-wide coverage based on the 150 bp read length and 2.5 Gb WTD genome. After filtering for coverage, INDELS, and minimum allele counts, 834,855 and 1,016,105 SNPs were identified for the antler and body size phenotype pools, respectively; the distribution of FST values across the entire WTD genome is shown in Figure 3a. The mean FST across the antler phenotype SNPs was 0.019; FST values > 0.12 fell within the 99th percentile and were considered outliers for downstream analysis. For the comparison of body size phenotypes, the mean FST was 0.017, with a 99th percentile FST cutoff of 0.11. We identified 336,379 SNPs that were shared between both antler and body size pools.
The distribution of SNPs throughout the WTD genome, the classification of their annotation (exon, intron, 25-kb flank, or intergenic) and the SNP density based on FST category are shown in Figure 3b. The overall counts and percentages of outlier and total SNPs are compared in Table 1, and a selection of outlier regions is shown in Figure 4 and Figure S3. From the list of 32 a priori candidate genes in the literature, we identified the highest FST SNP and total number of SNPs from each gene, including a 25 kb window up/downstream, from the analysis of each trait (Table S3); five genes had FST values in the 99th percentile.
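The coverage estimate follows from simple arithmetic: total sequenced bases divided by genome size. The sketch below shows the calculation using the 150 bp read length and 2.5 Gb genome size stated above; the function names are illustrative and the read count in the example is hypothetical, not a value from the supplementary tables.

```python
import math

def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

READ_LENGTH = 150                 # bp, as stated in the text
GENOME_SIZE = 2_500_000_000       # 2.5 Gb, as stated in the text

# Hypothetical read count for one pool
depth = mean_coverage(1_000_000_000, READ_LENGTH, GENOME_SIZE)
reads_for_50x = reads_needed(50, READ_LENGTH, GENOME_SIZE)
```

This estimate ignores unmapped reads, duplicates, and masked regions, so realized depth at callable sites will typically be lower.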
Estimates of Tajima’s D
Tajima’s D was estimated in 10 kb sliding windows across the entire WTD reference genome, yielding 251,275 windows; 103,832 of these windows met the criteria previously defined for estimations of D. The genome-wide mean D was −0.50. We identified windows that fell within the 99th percentile of positive (>0.35) and negative (<-1.75) values (Figure S4) for comparison with the FST outliers and gene regions (Table 1).
TE Insertions
We identified 19,734 TE insertions through the joint analysis of antler phenotype sequences (both large and small). Of these TEs, we identified those that fell within the 95th percentile for the absolute difference between large and small phenotypes for further analysis (>0.198%; Figure S5). Of the 95th percentile TE insertions in the antler analysis, 169 were found to overlap with genes, with 206 within a 25-kb window up or downstream of genic regions. For body size, 364 insertions in the 95th percentile (>0.205%) overlapped with genes, and 143 within 25-kb of a genic region.
Gene Ontology Annotations
From the top SNPs identified through outlier analysis, 1,806 genes were identified for antler GO analysis after the removal of duplicate genes, and 1,983 from the body size analysis. The top 10 enriched GO terms and the gene counts from these analyses are in Figure 5. We also performed term reduction for significantly enriched GO terms (Benjamini-corrected p-value <0.05) with the program REVIGO, which clusters terms by semantic similarity. This was only done for GO categories in which significant values existed (Figures S6-9). We chose to display only GO figures relating to biological processes, as these are most relevant to our study. As there was no significant enrichment in the body size analysis, only the main findings for antler are represented in Figure S6, where GO terms are grouped by semantic similarity and highlighted based on the calculated significance of enrichment. As the list of GO terms was reduced by REVIGO, the labels present are representative of less dispensable terms.
In analyzing the GO term enrichments for biological processes of genes that met our inclusion criteria from the TE outlier analysis, we found no significantly enriched pathways.
Validation of Outliers
We genotyped three loci (RIMS1, PTEN, and SRP54) which resulted in 78 individuals having complete genotype and phenotypic data (n = 45 HA and 35 LA; Dryad Accession no. XXXXXX). The model showed an effect of the number of HA alleles at the SRP54 locus on antler category (ß = −1.05, p = 0.02), but not the other two loci (p > 0.05).
Discussion
The study of sexually selected quantitative traits in the Anticosti Island WTD population has provided novel insights into the underlying genomic architecture of phenotypic variation. The extensive database with phenotypic measurements for these deer (n=4,466) allowed us to select individuals from the wide range of the distribution (Figure 2; S2) that are representative of extreme phenotypes for the two antler and one body size measurements. This sampling methodology is important to maximize the additive genetic variance for each trait, increasing power and the ability to detect QTL (Kardos et al., 2016). In this sense, this is the first study to apply this methodology of extreme phenotypic sampling in a wild vertebrate population for sexually selected and hunter-targeted traits.
Anticosti Island WTD form a putatively panmictic population, where we would expect gene flow across the island and thus a lack of fixation at any specific loci, or existence of any population structure (Fuller et al., 2019). Consistent with this, genome-wide FST was ∼0.01; however, we identified QTL for both traits within the top percentile (antler FST > 0.12, maximum=0.49; body FST > 0.11, maximum=0.35) that are atypically large for what would be expected in a panmictic population. The top percentile SNPs and divergent TEs for antler and body traits are widely dispersed throughout the WTD genome (Figure 3a), which implies a polygenic model for these quantitative traits. This is consistent with the literature on body size and ornaments in mammals (Visscher et al., 2007; Bouwman et al., 2018; Wang et al., 2019); however, a few QTL of small-to-moderate effect appear present, at least for antlers (i.e. SRP54), which is consistent with studies on sheep (Berenos et al., 2015).
Genomic architecture of antlers and body size
For both traits, there is evidence for divergent loci represented by high FST values throughout the genome that overlap with genic regions (Figure 3b). Traditionally, the study of SNP variation has focussed on non-synonymous variants, but it is becoming evident that it is primarily non-coding regions that impact phenotypic variation (Watanabe et al., 2019), often with clear relationships to promoters and enhancer regions (Pagani & Baralle 2004; Zhang & Lupski 2015; Foote et al., 2016).
The identification of 2,690 SNPs for antler traits and 3,849 SNPs (Table S3) for body size traits with high FST that overlap with genic windows is supportive of many QTL of small effect and a polygenic model for these quantitative traits. While some are surely false positives (Whitlock & Lotterhos 2015), cattle GWAS studies typically identify hundreds to thousands of QTL (Cole et al., 2011; Jiang et al., 2019). Our data suggest that it is the cumulative effects from these variants, including TE insertions, and possibly their epistatic interactions that are ultimately driving the phenotypic variation in antler and body sizes. Examining many of these high FST variants at the scaffold level revealed signatures of linkage and selection at specific sites (Figure 4). Many of these regions (Figure S3) show “genomic islands” surrounding highly differentiated SNPs. These islands presumably represent divergent haplotypes between extreme phenotypes and thus a putative connection to phenotypic variation. Moreover, negative Tajima’s D values (Figure 4) are consistent with the demographic history of the island and our sampling strategy that mimics divergent selection.
Analysis of the outlier SNPs from the antler comparison identified pathways relating to various aspects of antler growth and regeneration (Figure 5; S6). The three loci selected for further qPCR genotyping assays had a relatively high degree of differentiation and close proximity to genic regions (Figure S3). The three SNPs are intron variants in the genes RIMS1, PTEN and SRP54, respectively. RIMS1 (regulating synaptic membrane exocytosis 1) codes for proteins that are central in integrating active zone proteins and synaptic vesicles into scaffolds that control neurotransmitter release and are highly expressed in brain and testis tissues of humans and mice (Lonart, 2002; Schoch et al., 2002). Phosphatase and tensin homolog (PTEN) is a tumor suppressor gene and is part of chemical pathways involved in apoptosis (Chalhoub & Baker 2009). SRP54 is a signal recognition particle that mediates the targeting of proteins to the endoplasmic reticulum (Pool et al., 2002); this protein has been associated, in human clinical and zebrafish models, with bone marrow failure syndromes and skeletal abnormalities (Carapito et al., 2017).
Antlers are the only completely regenerable organ found in mammals (Li et al., 2014), a unique process that involves simultaneous exploitation of oncogenic pathways and tumor suppressor genes, and the rapid recruitment of synaptic and blood vesicles (Wang et al., 2019). The relatively high degree of differentiation found at intron regions for these genes between large and small antlered pools, in combination with their physiological connections to the underlying processes involved in antler growth and regeneration (notably PTEN and SRP54), supports their potential function in the observed variation between phenotypes. We explored this relationship through qPCR genotyping of these loci for individuals included in each respective pool. The model showed a relationship between antler size and genotype for the SRP54 SNP previously described, indicating it as a QTL of moderate effect considering our results. Further bolstering this association, the GO analysis showed an enriched Reactome pathway (Benjamini corrected p = 0.02) for SRP-dependent cotranslational protein targeting to membrane (GO:0006614), which has known functions in bone abnormalities and congenital neutropenia (Carapito et al., 2017). The lack of a strong RIMS1 and PTEN signal in the antler model is consistent with the polygenic nature of the trait and with the likely epistatic interactions (discussed more below) driving trait variation, meaning any individual locus on its own lacks a strong signal. Increasing sample size and individual genome sampling is required to fully characterize these potential epistatic interactions (e.g. Knief et al. 2019).
Outlier analysis for SNPs identified through the comparison of extreme body size phenotype sequences also revealed an array of genes of interest. However, it was more difficult than for antlers to avoid storytelling (Pavlidis et al., 2012), as traits of this nature, for example human height, involve hundreds of SNPs and a multitude of biological pathways (Marouli et al., 2017); this is reflected in our study by the absence of statistically significant enriched GO terms (Figure 5). Nonetheless, a few SNPs are worth highlighting. The upstream variant (4426 bp, FST=0.31) for liver-enriched gene 1 (LEG1) showed multiple points that appear to be indicative of haplotypes for a QTL, both on and off genic regions throughout the ∼106 kb scaffold. LEG1 is linked to liver function (Chang et al. 2011), and our WTD annotation results predicted two genes on this scaffold, both with similarity to LEG1. In addition to this, we aligned sequences from a previously conducted WTD transcriptome analysis (Genomic Resources Development Consortium et al., 2013), which revealed expressed sequences between this QTL and the upstream LEG1 gene. As our annotation is based on existing evidence for human and cow expressed sequences, this would suggest a novel structure for the gene in WTD and a role in body size.
Two missense variants fall on the genes MYADM (Myeloid-associated differentiation marker) and SPATA31E1 (Spermatogenesis-associated protein 31E1), with FST values of 0.31 and 0.27, respectively. Gonzalez et al. (2013) identified a QTL in sheep that was significantly associated with decreased mean corpuscular volume, a trait correlated with body size. Importantly, the authors observed a divergent artiodactyla MYADM-like variant in strong linkage disequilibrium with the associated SNP (Gonzalez et al., 2013), as well as being in a region of strong historical selection (Kijas et al., 2012). Schrimpf et al. (2016) identified 9 high-impact SNPs absent in fertile stallions with putatively deleterious effects on fertility, which included SPATA31E1. Considering this gene’s connection to spermatogenesis (Schrimpf et al., 2016) and its association with fertility in a related species, there is a link to the sexually selected nature of the trait being examined, consistent with body size being linked to reproductive success in deer (Vanpé et al., 2010); interestingly, this could reflect correlated evolution between body size and sperm viability. Both of these markers require validation but open up the possibility for marker-assisted breeding programs and studying sexual selection in wild cervids.
Transposable elements and epistatic considerations
We generated a list of novel TE insertions for both phenotypes. Traditionally, masking these elements reduces misalignments due to the repetitive nature of their sequences and current limitations with mapping software (Treangen & Salzberg 2012), but also misses a wealth of information that encompasses a large portion of the genome. Our focus was the frequency differences of TE insertions to properly mapped reference sequences, and how these varied between phenotypes. This stems from increasing evidence showing that TE insertions impact gene expression, and thus the variation for a given phenotype (Bourque et al., 2018). We found 169 divergent TE insertions in genic regions from our antler analysis and 68 from the body size analysis. While there was no significant enrichment of pathways, the insertion of these highly variable sequences has the potential to impact gene function as studies are now starting to emerge that both validate insertions (Lerat et al., 2018) and show evidence for positive selection (Kofler et al., 2011).
Epistatic relationships among many genes of small effect, including TEs, constitute a complex genomic architecture and are the most likely explanation for the natural variation observed in these sexually selected traits. Anticosti Island should provide a more-or-less homogeneous environment, but microclimate could drive some degree of this variation. We were liberal in determining genic regions (exon, intron, 25-kb flanks and 99th percentile) to avoid missing potentially relevant QTL that traditional approaches might exclude (Albert & Kruglyak, 2015). This is partially observed through the enrichment of GO terms shown in Figures 5 and S6, where we see strong semantic clustering for terms relating to macromolecule localization (GO:0033036) and regulation of cellular component organization (GO:0051128), and for terms relating to the ornament pathways identified by Wang et al. (2019), such as anatomical structure morphogenesis (GO:0009653), chemical synaptic transmission (GO:0007268), localization (GO:0051179), and cellular component organization or biogenesis (GO:0071840). Although the number of terms was reduced by removing redundant terms and grouping based on semantic similarity, the divergent SNPs still represented a diverse array of functions (Figure S6).
Lastly, in analyzing a list of 32 a priori genes identified from the literature for their significance in various pathways relating to antler and body development (Table S3), we generally observed low levels of variation between phenotypes for both traits, despite the known effects of these genes: 27 of 32 genes had SNPs with FST below the 99th percentile. These findings are not surprising in the case of genes identified through Wang et al. (2019), as many of these genes are identified as being under selective pressures and are responsible for the presence or absence of a trait between species, not variation of the trait within a species per se. Conversely, Bouwman et al. (2018) specifically identified common genes that regulate body size in mammals, for which we would expect to see more variation between phenotypes. Many of these genes were found to be in divergent regions, with SNPs for IGF2 (FST = 0.15) and PLAG1 (FST = 0.17) present in the outlier analysis. PLAG1 (pleomorphic adenoma gene 1) initiates transcription of IGF2 (insulin-like growth factor 2), with both playing a role in the variation of stature in humans and cattle (Bouwman et al., 2018; Wood et al., 2014; Pryce et al., 2011; Fortes et al., 2014). This again suggests that these variants make a small contribution to the genetic component of body size variation in our analysis.
Implications for understanding of artificial and sexual selection
Antler and body size are sexually selected traits linked to reproductive success in WTD (Newbolt et al., 2016; Morina et al., 2018), and are traits desired by hunters, managers, and farmers. We have identified a few low-to-moderate effect QTL, and many genome-wide variants with clear differentiation between the phenotypic extremes for these traits. We suggest that the combined effects of these variants are the genetic drivers behind the observed phenotypic variation. The measured phenotypes are expected to be under balancing selection (Mank, 2018); however, this does not have to be reflected in all gene pathways that ultimately contribute to the expression of a complex trait. Considering the case for an omnigenic model where all relevant peripheral genes related to a given phenotype can affect the function of core phenotype-related genes (Boyle et al., 2017), it would be inaccurate to attribute the evolution of a complex trait to any specific genomic region without examining the additive effects of the entire genomic landscape; thus more sequencing is required.
We provide evidence for positive selection at various sites in Figure 4, as defined by clear dips in D values (approaching values of −2), and through genic regions with significant genomic islands of differentiation between phenotypes (Figure S3). Positive D values represent balancing selection, but our sampling design (i.e. no intermediate phenotypes) and rapid population expansion (Potvin & Breton, 2005) both impact metrics like Tajima’s D (Stajich & Hahn, 2004). Thus, we predict that subsequent genotyping of targeted loci across the spectrum of phenotypes should produce values more consistent with balancing selection.
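To illustrate how the sign of D relates to the site frequency spectrum, the following is a minimal sketch of the classical Tajima's D estimator applied to individual haplotypes. It is not the procedure used in this study: pooled sequencing data require estimators corrected for pool size and coverage, which are not implemented here. The function name `tajimas_d` and the toy haplotypes are hypothetical and serve only as an example.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Classical Tajima's D for a list of equal-length haplotype strings.

    Returns None when there are no segregating sites.
    Illustrative only: assumes individual haplotypes, not pooled reads.
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("at least 4 haplotypes are required")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    # S: number of segregating sites (columns with more than one allele).
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences between haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D contrasts pi with Watterson's estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# An excess of rare variants (all singletons) gives a negative D.
rare = ["AAAA", "TAAA", "ATAA", "AATA"]
# An excess of intermediate-frequency variants gives a positive D.
intermediate = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare), tajimas_d(intermediate))
```

In this toy example, the sample dominated by singletons yields a negative D, the pattern expected after a selective sweep or population expansion, whereas the sample with variants at intermediate frequency yields a positive D.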
Although we have identified many potential QTL relating to the variation in phenotypes for WTD, their direct use in management or monitoring remains unclear. Considering the gene-targeted conservation road map presented by Kardos and Shafer (2018), there appears to be no current application in management or monitoring for WTD given the existing body of knowledge and the conservation status of the species. It is conceivable that a gene panel could be developed for use in breeding and management programs (e.g. Quality Deer Management) or to assess the effects of artificial selection by trophy hunting; however, until a reasonable amount of phenotypic variation can be attributed to specific QTL, as we have begun to do with SRP54 genotyping, a gene-targeted approach for management and breeding does not seem warranted.
Data Accessibility
Raw sequence data (FastQ files) are available on the Sequence Read Archive (Accession: PRJNA576136). All bioinformatic and analytical code is available on GitLab (https://gitlab.com/WiDGeT_TrentU/PoolSeq).
Author Contributions
S.J.A., S.D.C., and A.B.A.S. designed the study; J.H.R. and S.J.A. coordinated data collection and sample curation; S.J.A. performed research and analyzed data; S.J.A. wrote the manuscript with input from J.H.R., S.D.C., and A.B.A.S.
Acknowledgments
This work was supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant (ABAS and SDC); Compute Canada Resources for Research Groups (ABAS); the Canada Foundation for Innovation: John R. Evans Leaders Fund (ABAS); The Symons Trust Fund for Canadian Studies (ABAS); Trent University start-up funds (ABAS); and Industrial Chairs and Collaborative Research and Development Grants from the Natural Sciences and Engineering Research Council of Canada (SDC). We thank the outfitters of Anticosti Island and the Ministère des Forêts, de la Faune et des Parcs du Québec for logistical help associated with fieldwork, and all of the field assistants and technicians who collected samples and made phenotypic measurements throughout the duration of this project.