Abstract
Meiotic recombination breaks down linkage disequilibrium and forms new haplotypes, meaning that it is an important driver of diversity in eukaryotic genomes. Understanding the causes of variation in recombination rate is not only important in interpreting and predicting evolutionary phenomena, but also for understanding the potential of a population to respond to selection. Yet, there remains little data on if, how and why recombination rate varies in natural populations. Here, we used extensive pedigree and high-density SNP information in a wild population of Soay sheep (Ovis aries) to determine individual crossovers in 3330 gametes from 813 individuals. Using these data, we investigated the recombination landscape and the genetic architecture of individual autosomal recombination rate. The population was strongly heterochiasmic (male to female linkage map ratio = 1.31), driven by significantly elevated levels of male recombination in sub-telomeric regions. Autosomal recombination rate was heritable in both sexes (h2 = 0.16 & 0.12 in females and males, respectively), but with different genetic architectures. In females, 46.7% of heritable variation was explained by a sub-telomeric region on chromosome 6; a genome-wide association study showed the strongest associations at RNF212, with further associations observed at a nearby ~374kb region of complete linkage disequilibrium containing three additional candidate loci, CPLX1, GAK and PCGF3. This region did not affect male recombination rate. A second region on chromosome 7 containing REC8 and RNF212B explained 26.2% of heritable variation in recombination rate in both sexes, with further single locus associations identified on chromosome 3. Our findings provide a key empirical example of the genetic architecture of recombination rate in a natural mammal population with male-biased crossover frequency.
Author Summary
For almost 50 years we have known that genetic linkage can constrain responses to selection, while recombination can offer an escape from this constraint by forming new combinations of alleles. This increases the genetic variance for traits and hence the potential of a population to respond to selection. Therefore, understanding the causes and consequences of variation in the rate of recombination is important, not only in interpreting and predicting evolutionary phenomena, but for applications in genetic improvement of domesticated species. In our study, we used extensive genomic and pedigree information to identify genes underlying individual recombination rate variation in a wild population of Soay sheep on the St Kilda archipelago, NW Scotland. We show that the rate of recombination is partly inherited, that it is 30% higher in males than females, and that the majority of genetic variation in female sheep is likely to be explained by a genomic region containing the gene RNF212. This finding shows that recombination rate has the potential to evolve within wild populations.
Introduction
Recombination is a fundamental process ensuring the proper segregation of homologous chromosomes during meiosis [1,2], but also plays an important role in driving the evolution of eukaryotic genomes, because it rearranges existing allelic variation to create novel haplotypes. Theoretical work predicts that recombination prevents the accumulation of deleterious mutations, uncoupling them from linked beneficial alleles [3,4], and can also lead to an increase in genetic variance for fitness, increasing the efficiency of selection within finite populations [5–9] (experimental examples [10,11]). However, recombination is also associated with fitness costs: higher rates of crossing-over may increase the risk of deleterious chromosomal rearrangements [12], or lead to the break-up of favourable combinations of alleles previously built up by selection, reducing the mean fitness of subsequent generations [7]. Therefore, understanding the causes and consequences of variation in recombination rate is not only important in interpreting and predicting evolutionary phenomena in the context of sex, selection and drift [8,13,14], but also has potential applications in genetic improvement through increased responses to selection [15].
Recent studies of model mammal systems have shown that crossover frequency can vary both within and between chromosomes, individuals, populations and species [16–18]. This may be partially driven by physical mechanisms affecting chiasma positioning (e.g. obligate crossing-over [19], crossover interference [20,21], GC content [22] and/or proximity to telomeres and centromeres [23]), but is also likely to be driven by heritable genetic effects [24–26]. Genome-wide association studies in humans, cattle and mice have attributed some heritable variation to specific genetic variants, including ring finger protein 212 (RNF212), meiotic recombination protein REC8 (REC8) and PR domain zinc finger protein 9 (PRDM9) [26–29]. This relatively simple genetic architecture implies that recombination rate has the potential to respond rapidly to selection; for example, mutations in PRDM9 in humans and other primates can result in large changes in recombination landscapes over relatively short evolutionary timescales [30,31].
One additional observation across eukaryotes is that recombination rate often varies between the sexes. The absence of recombination in one sex (achiasmy) is almost always in the heterogametic sex, and may be a pleiotropic effect of selection for tight linkage on Y and Z chromosomes (Haldane-Huxley rule, [32–34]). However, in species where recombination rate varies between the sexes (heterochiasmy), this rule no longer holds, as recombination rates can be male or female biased, even within hermaphroditic species or species with no sex chromosomes or sex determining loci [35]. In mammals, females generally have higher numbers of crossovers across the genome as a whole [35]; this may be partially due to differences in chromatid structure [36–38], but there may also be strong selection for increased recombination rate in females as a mechanism to avoid aneuploidy (i.e. incorrect chromosome number) after long periods of meiotic arrest before fertilisation [39–41]. However, some mammal species exhibit higher recombination rates in males, such as domestic sheep [42], macaques [43] and marsupials [43,44]. Several selective processes have been suggested to explain variation in heterochiasmy, including haploid selection [35], meiotic drive [45], sperm competition, sexual dimorphism and dispersal [43,46,47]. Nevertheless, a clear understanding has been limited by a paucity of empirical examples, particularly in systems with male-biased heterochiasmy.
Here, we examine the genetic architecture of recombination rate variation in a wild mammal population with strong male-biased heterochiasmy. Soay sheep (Ovis aries) are a Neolithic breed of domestic sheep that have lived unmanaged on the St Kilda archipelago (Scotland, UK, 57°49’N, 8°34’W) since the Bronze age [48]. Individuals from the Village Bay area of the archipelago have been subject to a long-term study since 1985, with extensive genomic, pedigree, phenotype and environmental data collected for more than 6,000 individuals. In this study, we integrate genomic and pedigree information to characterise autosomal cross-over positions in more than 3000 gametes in individuals from both sexes. We show that individual recombination rate in Soay sheep is heritable and strongly heterochiasmic, and identify two genomic regions associated with rate variation: one containing RNF212, which explained 46.7% of additive genetic variance in females alone; and another containing REC8 and RNF212B, which explained 26.2% of additive genetic variance across both sexes. These results provide new information on the genetic architecture of recombination in a system with strongly sexually-dimorphic rate variation.
Results
Broad-scale variation in recombination landscape
We used pedigree information and data from 39104 genome-wide SNPs typed on the Ovine SNP50 BeadChip [49] to identify 98420 meiotic crossovers in gametes transferred to 3330 offspring from 813 unique focal individuals. A linkage map of all 26 autosomes had a sex-averaged length of 3304 centiMorgans (cM), and sex-specific lengths of 3748 cM and 2860 cM in males and females, respectively, indicating strong male-biased heterochiasmy in this population (Male:Female linkage map lengths = 1.31; Fig S1, Table S1). There was a linear relationship between the length of autosomes in megabases (Mb) and linkage map lengths (cM; Adjusted R2 = 0.991, P < 0.001; Fig 1A). Chromosome-wide recombination rates (cM/Mb) were higher in smaller autosomes (fitted as multiplicative inverse function, adjusted R2 = 0.616, P < 0.001, Fig 1B), indicative of obligate crossing over. The degree of heterochiasmy based on autosome length in cM (i.e. differences in male and female recombination rate) was consistent across all autosomes (Adjusted R2 = 0.980, P < 0.001, Fig 1C).
Fine-scale variation in recombination landscape
Finer-scale probabilities of crossing-over were calculated for 1Mb windows across the genome for each sex, using recombination fractions from their respective linkage maps. Crossover probability was variable relative to proximity to telomeric regions, with a significant interaction between sex and distance to the nearest telomere fitted as a cubic polynomial function (Fig 2A). Males had significantly higher probabilities of crossing-over than females between distances of 0Mb to 12.45Mb from the nearest telomere (Fig 2B, Table S2). Crossover probabilities were significantly higher in windows with higher GC content (General linear model, P < 0.001; Table S2). Investigation of the relative distances between crossovers (in cases where two or more crossovers were observed on a single chromatid) indicated that there is crossover interference within this population, with a median distance between double crossovers of 48Mb (Fig S2).
Analysis of individual recombination rate
Individual recombination rates were defined as the number of autosomal crossover counts observed per gamete (hereafter ACC) transmitted from a focal individual (hereafter FID) to its offspring. To determine the heritability of ACC, variance was partitioned using a restricted maximum likelihood animal model approach [50], and additive genetic variance was modelled with a genomic relatedness matrix constructed using autosomal SNP marker information. Individual ACC was heritable (h2 = 0.145, SE = 0.027), with the remainder of the phenotypic variance being explained by the residual error term (Table 1). ACC was significantly higher in males than in females, with 7.376 (SE = 0.263) more crossovers observed per gamete (Animal Model, Z = 28.02, PWald < 0.001). However, females had marginally higher additive genetic variance (PLRT = 0.046) and higher residual variance (PLRT = 1.07 × 10⁻³) in ACC than males (Table 1). There was no relationship between ACC and FID age, offspring sex, or the genomic inbreeding coefficients of the FID or offspring; furthermore, there was no variance in ACC explained by common environmental effects such as FID birth year, year of gamete transmission, or maternal/paternal identities of the FID (Animal Models, P > 0.05). A bivariate model of male and female ACC showed that the cross-sex additive genetic correlation (ra) was 0.808 (SE = 0.202); this correlation was significantly different from 0 (PLRT < 0.001) but not different from 1 (PLRT = 0.308).
Genetic architecture of recombination rate
Partitioning variance by genomic region
The contribution of specific genomic regions to ACC was determined by partitioning the additive genetic variance in sliding windows (Regional heritability analysis, Table S3 [51]). There was a strong sex-specific association of ACC in females within a sub-telomeric region on chromosome 6 (20 SNP sliding window; Fig 3B). This corresponded to a 1.46 Mb segment containing ~37 protein coding regions, including ring finger protein 212 (RNF212), a locus previously implicated in recombination rate variation [26,27,52]. This region explained 8.02% of the phenotypic variance (SE = 3.55%) and 46.7% of the additive genetic variance in females (PLRT = 9.78 × 10⁻¹⁴), but did not contribute to phenotypic variation in males (0.312%, SE = 1.2%, PLRT = 0.82; Fig 3C, Table S3). There was an additional significant association between ACC in both sexes and a region on chromosome 7, corresponding to a 1.09Mb segment containing ~50 protein coding regions, including RNF212B (a paralogue of RNF212) and meiotic recombination protein locus REC8 (PLRT = 3.31 × 10⁻⁶, Fig 3A, Table S3). This region explained 4.12% of phenotypic variation (SE = 2.3%) and 26.2% of the additive genetic variation in both sexes combined; however, this region was not significantly associated with ACC within each sex individually after correction for multiple testing (Table S3).
Genome-wide association study (GWAS)
The most significant association between SNP genotype and ACC in both sexes was at s74824.1 in the sub-telomeric region of chromosome 6 (P = 2.92 × 10⁻¹⁰, Table 2). Sex-specific GWAS indicated that this SNP was highly associated with female ACC (P = 1.07 × 10⁻¹¹), but was not associated with male ACC (P = 0.55; Table 2, Fig 4). This SNP corresponded to the same region identified in the regional heritability analysis, was the most distal typed on the chromosome from the Ovine SNP50 BeadChip (Fig 4), and had an additive effect on female ACC, with a difference of 3.37 (S.E. = 0.49) autosomal crossovers per gamete between homozygotes (Table 2). A SNP on an unmapped genomic scaffold (1.8kb, NCBI Accession: AMGL01122442.1) was also highly associated with female ACC (Fig 4). BLAST analysis indicated that the most likely genomic position of this SNP was at ~113.8Mb on chromosome 6, corresponding to the same sub-telomeric region.
Two regions on chromosome 3 were associated with ACC using the GWAS approach, although their respective regions had not shown a significant association with ACC using a regional heritability approach (see above). A single SNP on chromosome 3, OAR3_51273010.1, was associated with ACC in males, but not in females, and had an approximately dominant effect on ACC (P = 1.15 × 10⁻⁶, Fig 4, Table 2). This SNP was 17.8kb from the 3’ UTR of leucine rich repeat transmembrane neuronal 4 (LRRTM4) in an otherwise gene poor region of the genome (i.e. the next protein coding regions are > 1Mb from this SNP in either direction). A second SNP on chromosome 3, OAR3_87207249.1, was associated with ACC in both sexes (P = 1.95 × 10⁻⁶, Fig 4, Table 2). This SNP was 137kb from the 5’ end of an orthologue of WD repeat domain 61 (WDR61) and 371kb from the 5’ end of an orthologue of ribosomal protein L10 (RPL10). No significant SNP associations were observed at the significant regional heritability region on chromosome 7. Full results of GWAS are provided in Table S4.
Genotype imputation and association analysis at the sub-telomeric region of chromosome 6
Genotyping of 187 sheep at a further 122 loci in the sub-telomeric region of chromosome 6 showed that this region has elevated levels of linkage disequilibrium, with the two most significant SNPs from the 50K chip tagging a haplotype block of ~374kb (r2 > 0.8; see S1 Appendix, Fig 5, Table S5). Although 177kb from the primary candidate locus RNF212, this block contained three further candidate genes, complexin 1 (CPLX1), cyclin-G-associated kinase (GAK) and polycomb group ring finger 3 (PCGF3) [29]. SNP genotypes were imputed for all individuals typed on the 50K chip at these 122 loci, and the association analysis was repeated. The most highly associated SNP (oar3_OAR6_116402578, P = 1.83 × 10⁻¹⁹; Table 2, Fig 5) occurred within an intronic region of an uncharacterised protein orthologous to transmembrane emp24 protein transport domain containing (TMED11), 25.2kb from the putative location of RNF212 and 13kb from the 3’ end of spondin 2 (SPON2). A bivariate animal model including an interaction term between ACC in each sex and the genotype at oar3_OAR6_116402578 confirmed that this locus had an effect on female ACC only; this effect was additive, with a difference of 4.91 (S.E. = 0.203) autosomal crossovers per gamete between homozygotes (Fig 6, Tables 2 and S5). There was no difference in ACC between the three male genotypes. Full results for univariate models at imputed SNPs are given in Table S5.
Discussion
We characterised autosomal crossover counts in 3330 gametes transmitted from 813 unique individuals to their offspring, to determine the genetic architecture of recombination rate in Soay sheep. Recombination rate was heritable and had a sexually dimorphic genetic architecture, including variants at loci previously implicated in ACC in other mammal species. By using both regional heritability and GWAS approaches, we were able to identify significant loci otherwise undetected through GWAS alone. This supports the strength of using regional heritability approaches to characterise variation from multiple alleles and/or haplotypes encompassing both common and rare variants that are in linkage disequilibrium with causal loci [51]. Our findings show the benefits of using multiple association approaches, and suggest that (a) ACC can be heritable in natural populations, with the potential to respond to selection, and (b) identification of regions a priori associated with ACC indicates that the genetic architecture of recombination rate variation is similar across mammal species.
The strongest association was observed at RNF212, with the genomic region accounting for ~47% of heritable variation in female ACC; this region was not associated with male ACC. RNF212 has been repeatedly implicated in recombination rate variation in mammals [26,52] and has also been shown to have sexually-antagonistic effects on recombination rate in humans [27,29,55]. Mouse studies have established that the protein RNF212 is essential for the formation of crossover-specific complexes during meiosis, and that its effect is dosage-sensitive [52]. The same sub-telomeric region of chromosome 6 also contained three further candidate loci, namely CPLX1, GAK and PCGF3 (Fig 5); these loci occurred on a ~374kb block of high LD (r2 > 0.8) and were in moderate LD with the most highly associated SNP at RNF212 (r2 = 0.54). A rare intronic SNP variant within CPLX1 has been associated with large differences in linkage map lengths in humans, independently of RNF212 [29]; GAK forms part of a complex with cyclin-G, a locus involved in meiotic recombination repair in Drosophila [56]; and PCGF3 forms part of a PRC1-like complex (polycomb repressive complex 1) which is involved in meiotic gene expression and the timing of meiotic prophase in female mice [53]. High LD within this region meant that it was not possible to test the effects of these loci on ACC independently; however, the co-segregation of several loci affecting meiotic processes may merit further investigation to determine if recombination is suppressed in this region, and if this co-segregation is of adaptive significance.
Additional genomic regions associated with ACC included a 1.09Mb region of chromosome 7 affecting both sexes (identified using regional heritability analysis), and two loci at 48.1Mb and 82.4Mb on chromosome 3 (identified using GWAS) with effects on males only and both sexes, respectively. The chromosome 7 region contained two loci associated with recombination phenotypes: REC8, the protein of which is required for the separation of sister chromatids and homologous chromosomes during meiosis [57]; and RNF212B, a paralogue of RNF212. This region is also associated with recombination rate in cattle [26]. The chromosome 3 variants identified were novel to this study, and occurred in relatively gene poor regions of the genome. These variants remained significant when repeating association analysis on ACC excluding their own chromosome (see Materials and Methods, Tables S3 & S4), meaning that they are likely to affect recombination rate globally (i.e. trans-acting effects), rather than being in LD with polymorphic recombination hotspots. After accounting for significant regions on chromosomes 6 and 7, between 36 and 61% of heritable variation in ACC was attributed to polygenic variation independent of these regions (Table S6). This “missing heritability” (i.e. heritable variation of unknown architecture) suggests that the genetic architecture of this trait is comprised of several loci of large effects on phenotype, but with a significant polygenic component i.e. variance attributed to few or many loci with small effects on phenotype in this population.
Meiotic recombination in mammals occurs during distinctly different periods in each sex: male meiosis occurs during spermatogenesis, whereas female meiosis starts in the foetal ovary, but is arrested during crossing-over between prophase and metaphase I and completed after fertilisation [40]. The between-sex genetic correlation of ACC detected in this study was not significantly different from 1, indicating that male and female ACC variation had a shared genetic basis – albeit with a relatively large error around this estimate. Further investigation through regional heritability mapping and GWAS indicated that variation in ACC has some degree of a shared and distinct genetic architecture between the sexes, which may be expected due to the similarities of this process, but differences in implementation. It is unlikely that the absence of associations between the RNF212 region and male ACC is due to low power to detect the effect, as (a) this locus had a sexually dimorphic effect a priori, and (b) bivariate models accounting for variation in RNF212 as a fixed effect supported a sexually dimorphic genetic effect with a lower degree of error (Fig 6).
The strong sex difference in recombination rate in this study was manifested in increased male recombination in sub-telomeric regions, allowing further crossovers to occur on the same chromatid despite crossover interference. Although variation at RNF212 resulted in markedly different ACCs in females, the relative positions of crossovers did not differ between females with different genotypes (Fig S4), suggesting that differences in the action of RNF212 on female ACC may be due to protein function or dosage dependence, such as is observed in mouse systems [52]. The degree of heterochiasmy observed in Soay sheep was similar to that of domestic sheep (Ovis aries, Male:Female linkage map = 1.19 [42]), but not to that of bighorn sheep (Ovis canadensis, male:female ratio = 0.89, [58], divergence ~2.8Mya). This suggests that large changes in heterochiasmy can occur over relatively short evolutionary timescales; indeed, to date there is no clear phylogenetic signal of heterochiasmy within mammals [35,47]. If recombination rate is indeed controlled by a small number of loci with relatively large and/or sexually-dimorphic effects, then the introduction of novel variants could lead to rapid changes in recombination landscape within and between species. This phenomenon has been observed at PRDM9 in humans and other species, where non-synonymous mutations can lead to changes in recombination hot-spots due to differential motif recognition [30,59]. Although more empirical studies on other species are required, our findings add to a compelling case for a role for RNF212 in driving heterochiasmy in mammal systems.
The reasons why Soay sheep and domestic sheep have strong male-biased heterochiasmy are not yet well understood; however, there are several factors which may favour increased recombination rate in males and variation in females. Soay sheep have a highly promiscuous mating system, with the largest testes to body size ratio within ruminants [60] and high levels of sperm competition, with dominant rams suffering from sperm depletion towards the end of the rut [61]. Increased recombination may allow more rapid sperm production through formation of meiotic bouquets [62]. However, when considering the large degree of variation in ACC within females, it is not clear why variants with a large effect exist within the population. Examination of haplotypes around RNF212 suggests that variation has been maintained in Soay sheep throughout their long history on St Kilda, and is unlikely to have been introduced by recent introgression events (S4 Appendix, [63]). One explanation for the maintenance of variation may be that this locus is under selection due to higher investment in their offspring by females, who produce one or two lambs per year. If so, then there may be a trade-off in females between the risks of aneuploidy versus maintaining genome integrity (i.e. through balancing selection). Further empirical studies of recombination rate and heterochiasmy in this and other natural systems will be required to determine if recombination rate is under selection, and to determine the drivers of differences in recombination rate and heterochiasmy across eukaryotes.
Materials and Methods
Study population and pedigree
The Soay sheep is a primitive breed of domestic sheep (Ovis aries) which has lived unmanaged on the St Kilda archipelago (Scotland, UK, 57°49’N, 8°34’W) since the Bronze age. Sheep living within the Village Bay area of Hirta have been studied on an individual basis since 1985 [48]. All sheep are ear-tagged at first capture (including 95% of lambs born within the study area) and DNA samples for genetic analysis are routinely obtained from ear punches and/or blood sampling. A Soay sheep pedigree has been constructed using 315 SNPs in low LD, and includes 5516 individuals with 4531 maternal and 4158 paternal links [64].
SNP Dataset
A total of 5805 Soay sheep were genotyped at 51,135 single nucleotide polymorphisms (SNPs) on the Ovine SNP50 BeadChip using an Illumina Bead Array genotyping platform (Illumina Inc., San Diego, CA, USA; [49]). Quality control on SNP data was carried out using the check.marker function in GenABEL v 1.8-0 [65] implemented in R v3.1.1, with the following thresholds: SNP minor allele frequency (MAF) > 0.01; individual SNP locus genotyping success > 0.95; individual sheep genotyping success > 0.99; and identity by state (IBS) with another individual < 0.90. Heterozygous genotypes at non-pseudoautosomal X-linked SNPs within males were scored as missing. The genomic inbreeding coefficient (a measure described in [66]) was calculated for each sheep in the software GCTA v1.24.3 [66], using information for all SNP loci passing quality control.
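The SNP-level thresholds above can be illustrated with a minimal Python sketch. This is not GenABEL's check.marker implementation; the function name `snp_qc_mask` and the genotype encoding (allele counts 0/1/2, with NaN for a missing call) are assumptions made for illustration, and the per-individual call-rate and IBS filters are omitted.

```python
import numpy as np

def snp_qc_mask(genotypes, maf_min=0.01, call_rate_min=0.95):
    """Return a boolean mask of SNPs passing the MAF and call-rate filters.

    genotypes: (n_individuals, n_snps) array of allele counts 0/1/2,
    with np.nan marking a missing call (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    # Proportion of individuals with a non-missing call at each SNP.
    call_rate = 1.0 - np.isnan(g).mean(axis=0)
    # Allele frequency estimated from non-missing calls only.
    freq = np.nanmean(g, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    return (maf > maf_min) & (call_rate > call_rate_min)

# Hypothetical toy data: 4 individuals x 4 SNPs.
geno = np.array([
    [0, 1, 0, np.nan],
    [1, 1, 0, 2],
    [2, 1, 0, np.nan],
    [1, 2, 0, 0],
])
mask = snp_qc_mask(geno)
```

In this toy example the third SNP fails because it is monomorphic and the fourth fails on call rate.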
Estimation of meiotic autosomal crossover count (ACC)
Sub-pedigree construction
To allow unbiased phasing of the SNP data, a standardised pedigree approach was used to identify cross-overs that had occurred within the gametes transferred from a focal individual to its offspring; hereafter, focal individual (FID) refers to the sheep in which meiosis took place. For each FID-offspring combination in the Soay sheep pedigree, a sub-pedigree was constructed to include both parents of the FID (Father and Mother) and the other parent of the offspring (Mate), where all five individuals had been genotyped (Fig 7). This sub-pedigree structure allowed phasing of SNPs within the FID, and thus the identification of autosomal cross-over events in the gamete transferred from the FID to the offspring (Fig 7). Sub-pedigrees were discarded from the analysis if they included the same individual twice (e.g. father-daughter matings; N = 13).
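The sub-pedigree rule described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the code used in the study: the pedigree is assumed to be a dictionary mapping each individual to a (father, mother) tuple, and the function name `build_sub_pedigrees` is hypothetical.

```python
def build_sub_pedigrees(pedigree, genotyped):
    """Build one sub-pedigree per FID-offspring pair.

    pedigree: dict mapping individual -> (father_id, mother_id), None if unknown.
    genotyped: set of genotyped individual IDs.
    """
    subs = []
    for offspring, parents in pedigree.items():
        for idx, fid in enumerate(parents):
            mate = parents[1 - idx]  # the other parent of the offspring
            if fid is None or mate is None or fid not in pedigree:
                continue
            father, mother = pedigree[fid]  # parents of the focal individual
            members = [fid, offspring, father, mother, mate]
            if None in members:
                continue
            # All five individuals must be genotyped.
            if not all(m in genotyped for m in members):
                continue
            # Discard sub-pedigrees that include the same individual twice.
            if len(set(members)) < 5:
                continue
            subs.append({"FID": fid, "Offspring": offspring,
                         "Father": father, "Mother": mother, "Mate": mate})
    return subs

# Hypothetical toy pedigree: D results from a father-daughter mating (A x K).
ped = {"A": ("GF", "GM"), "K": ("A", "B"), "D": ("A", "K")}
typed = {"A", "B", "K", "D", "GF", "GM"}
subs = build_sub_pedigrees(ped, typed)
```

Here the sub-pedigree with K as FID and D as offspring is dropped, because A would appear both as K's father and as K's mate.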
Linkage map construction and chromosome phasing
All analyses in this section were conducted using the software CRI-MAP v2.504a [67]. First, Mendelian incompatibilities in each sub-pedigree were identified using the prepare function; incompatible genotypes were removed from all affected individuals, and sub-pedigrees containing parent-offspring relationships with more than 0.1% mismatching loci were discarded. Second, sex-specific and sex-averaged linkage map positions (in Kosambi cM) were obtained using the map function, where SNPs were ordered relative to their estimated positions on the sheep genome assembly Oar_v3.1 (Genbank assembly ID: GCA_000298735.1; [68]). SNP loci with a map distance of greater than 3 cM to each adjacent marker (10cM for the X chromosome, including PAR) were assumed to be incorrectly mapped and were removed from the analysis, with the map function rerun until all map distances were below this threshold; in total, 76 SNPs were assumed to be incorrectly mapped. Third, the chrompic function was used to identify informative SNPs (i.e. those for which the grand-parent of origin of the allele could be determined) on chromatids transmitted from the FID to its offspring; crossovers were deemed to have occurred where there was a switch in the grandparental origin of a SNP allele (Fig 7).
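The final step, calling a crossover wherever the grandparental origin of informative alleles switches along a chromatid, can be illustrated with a short Python sketch. It is not CRI-MAP's chrompic function; the encoding ('P' for grandpaternal, 'M' for grandmaternal, None for uninformative SNPs) and the function name are assumptions.

```python
def count_crossovers(origins):
    """Count switches in grandparental origin along one chromatid.

    origins: sequence ordered by map position, with 'P' (grandpaternal),
    'M' (grandmaternal) or None for uninformative SNPs.
    """
    # Uninformative SNPs carry no phase information and are skipped.
    informative = [o for o in origins if o is not None]
    return sum(1 for a, b in zip(informative, informative[1:]) if a != b)

# Hypothetical chromatid: P -> M switch, then M -> P switch.
n_crossovers = count_crossovers(["P", "P", None, "M", "M", None, "P"])
```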
Quality control and crossover estimation in autosomes
Errors in determining the grand-parental origin of alleles can lead to false calling of double-crossovers (i.e. two adjacent crossovers occurring on the same chromatid) and in turn, an over-estimation of recombination rate. To reduce the likelihood of calling false crossover events, runs of grandparental-origin consisting of a single allele (i.e. resulting in a double crossover either side of a single SNP) were recoded as missing (N = 973 out of 38592 double crossovers, Fig S2). In remaining cases of double crossovers, the base pair distances between immediately adjacent SNPs spanning a double crossover were calculated (hereafter, “span distance”). The distribution of the span distances indicated that crossover interference is present within the Soay sheep population (Fig S2). Informative SNPs that occurred within double-crossover segments with a log10 span distance lower than 2.5 standard deviations from the mean log10 span distance (equivalent to 9.7Mb) were also recoded as missing (N = 503 out of 37619 double crossovers, Fig S2). The autosomal crossover count (ACC), the number of informative SNPs and the informative length of the genome (i.e. the total distance between the first and last informative SNPs for all chromatids) were then calculated for each FID. A simulation study was conducted to ensure that our approach accurately characterised ACC and reduced phasing errors. Autosomal meiotic crossovers were simulated given an identical pedigree structure and population allele frequencies (Nsimulations = 100; see S2 Appendix for detailed methods and results). Our approach was highly accurate in identifying the true ACC per simulation across all individuals and per individual across all simulations (adjusted R2 > 0.99), but indicated that accuracy was compromised in individuals with high values of the genomic inbreeding coefficient. This is likely to be an artefact of long runs of homozygosity as a result of inbreeding, which may prevent detection of double crossovers or crossovers in sub-telomeric regions. To ensure accurate individual estimates of ACC, gametes with a correlation of adjusted R2 ≤ 0.95 between simulated and detected crossovers in the simulation analysis were removed from the study (N = 8).
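The two double-crossover filters described above can be sketched in Python as follows. This is a simplified illustration rather than the study's pipeline: the origin encoding ('P', 'M', None), the function names, and the use of the sample standard deviation for the span-distance cut-off are all assumptions.

```python
import numpy as np

def mask_singleton_runs(origins):
    """Recode as missing any informative SNP forming a run of length one.

    With two possible origins ('P'/'M'), a SNP that differs from both of its
    informative neighbours implies a double crossover around a single SNP.
    """
    idx = [i for i, o in enumerate(origins) if o is not None]
    cleaned = list(origins)
    for k in range(1, len(idx) - 1):
        prev_o = origins[idx[k - 1]]
        cur_o = origins[idx[k]]
        next_o = origins[idx[k + 1]]
        if cur_o != prev_o and cur_o != next_o:
            cleaned[idx[k]] = None
    return cleaned

def short_span_threshold(span_bp, n_sd=2.5):
    """Span distance (bp) lying n_sd standard deviations below the mean log10 span."""
    logs = np.log10(np.asarray(span_bp, dtype=float))
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    return 10 ** (logs.mean() - n_sd * logs.std(ddof=1))

cleaned = mask_singleton_runs(["P", "P", "M", "P", "P"])
threshold = short_span_threshold([1e7, 1e8, 1e9])  # hypothetical span distances
```

Double-crossover segments with a span distance below `threshold` would then have their informative SNPs recoded as missing in the same way.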
Assessing variation in the recombination landscape
Broad Scale Recombination Rate
Relationships between chromosome length and linkage map length, and male and female linkage map length were analysed using linear regressions in R v3.1.1. The relationship between chromosome length and chromosomal recombination rate (defined as cM length/Mb length) was modelled using a multiplicative inverse (1/x) regression in R v3.1.1.
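The two regressions can be illustrated with a short Python sketch (the study itself used R). The helper name `fit_adjusted_r2` and the toy chromosome values are hypothetical; the multiplicative inverse model is fitted by regressing recombination rate on 1/x.

```python
import numpy as np

def fit_adjusted_r2(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, adjusted R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest power first
    ss_res = float(np.sum((y - (a + b * x)) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    n = len(x)
    r2 = 1.0 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return a, b, adj_r2

# Hypothetical chromosome lengths (Mb) and linkage map lengths (cM).
mb = np.array([50.0, 100.0, 150.0, 200.0, 275.0])
cm = np.array([70.0, 110.0, 160.0, 200.0, 270.0])

linear_fit = fit_adjusted_r2(mb, cm)              # map length ~ chromosome length
inverse_fit = fit_adjusted_r2(1.0 / mb, cm / mb)  # cM/Mb ~ 1 / chromosome length
```

A positive slope in `inverse_fit` corresponds to higher recombination rates on smaller chromosomes.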
Fine Scale Recombination Rate
The probability of crossing-over was calculated in 1Mb windows across the genome using information from the male and female linkage maps. Briefly, the probability of crossing over within a bin was the sum of all recombination fractions, r, in that bin; in cases where an r value spanned a bin boundary, it was recalculated as r × N_boundary / N_adjSNP, where N_boundary was the number of bases to the bin boundary, and N_adjSNP was the number of bases to the closest SNP within the adjacent bin.
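The binning rule can be sketched in Python as below. The sketch apportions each recombination fraction to bins in proportion to the physical overlap of its SNP interval with each bin, which reproduces the boundary rule for an interval spanning two adjacent bins; extending it to intervals that span more than two bins is an assumption, as are the function name and the toy values.

```python
import numpy as np

def bin_crossover_probability(pos_bp, rec_frac, bin_size=1_000_000):
    """Sum recombination fractions per bin, splitting boundary-spanning intervals.

    pos_bp: sorted SNP positions in bp.
    rec_frac: rec_frac[i] is the recombination fraction between
              pos_bp[i] and pos_bp[i + 1].
    """
    pos = np.asarray(pos_bp, dtype=float)
    r = np.asarray(rec_frac, dtype=float)
    n_bins = int(pos[-1] // bin_size) + 1
    prob = np.zeros(n_bins)
    for start, end, ri in zip(pos[:-1], pos[1:], r):
        if end <= start:
            continue  # skip zero-length intervals
        first, last = int(start // bin_size), int(end // bin_size)
        for b in range(first, last + 1):
            lo = max(start, b * bin_size)
            hi = min(end, (b + 1) * bin_size)
            # Share of r proportional to the bases falling inside bin b.
            prob[b] += ri * (hi - lo) / (end - start)
    return prob

# Hypothetical SNPs at 0.2, 0.8 and 1.4 Mb; the second interval spans a boundary.
prob = bin_crossover_probability([200_000, 800_000, 1_400_000], [0.01, 0.03])
```

In this example one third of the second interval lies in the first bin, so 0.01 of its recombination fraction is assigned there and 0.02 to the second bin.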
Variation in crossover probability relative to proximity to telomeric regions on each chromosome arm was examined using general linear models with a Gaussian error structure. The response variable was crossover-probability per bin; the fitted covariates were as follows: distance to the nearest telomere, defined as the best fit of either a linear (x), multiplicative inverse (1/x), quadratic (x² + x), cubic (x³ + x² + x) or a log term (log10 x); sex, fitted as a main effect and as an interaction term with distance to the nearest telomere; number of SNPs within the bin; and GC content of the bin (%, obtained using sequence from Oar_v3.1 [68]). The best model was identified using Akaike’s Information Criterion [69]. An additional model was tested, using ratio of male to female crossover probability as the response variable, with the same fixed effect structure (omitting sex). In both models, the distance to the nearest telomere was limited to 60Mb, equivalent to half the length of the largest acrocentric chromosome (Chr 4). Initial models also included a term indicating if a centromere was present or absent on the 60Mb region, but this term was not significant in either model.
Heritability and cross-sex genetic correlations of autosomal recombination rate
ACC was modelled as a trait of the FID. Phenotypic variance in ACC was partitioned using a restricted maximum likelihood (REML) Animal Model [50] implemented in ASReml-R [70] in R v3.1.1. To determine the proportion of phenotypic variance attributed to additive genetic effects (i.e. heritability), a genomic relatedness matrix at all autosomal markers was constructed for all genotyped individuals using GCTA v1.24.3 [66]; the matrix was shrunk using --adj 0. Trait variance in ACC was analysed first with the following univariate model: $y = X\beta + Z_1 a + Z_r u_r + e$, where y is a vector of the ACC per transferred gamete; X is an incidence matrix relating individual measures to a vector of fixed effects, β; Z1 and Zr are incidence matrices relating individual measures with additive genetic effects and random effects, respectively; a and ur are vectors of additive genetic effects from the genomic relatedness matrix and additional random effects, respectively; and e is a vector of residual effects. The heritability (h2) was calculated as the ratio of the additive genetic variance to the sum of the variance estimated for all random effects. Model structures were initially tested with a number of fixed effects, including sex, the inbreeding coefficient and FID age at the time of meiosis; random effects tested included: individual identity to account for repeated measures within the same FID; maternal and paternal identity; and common environment effects of FID birth year and offspring birth year. Significance of fixed effects was determined using a Wald test, whereas significance of random effects was calculated using likelihood ratio tests (LRT) between models with and without the focal random effect. Only sex and additive genetic effects were significant in any model; however, the inbreeding coefficient and individual identity were retained in all models to account for potential underestimation of ACC and the effects of pseudoreplication, respectively.
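The heritability calculation described above reduces to a simple ratio of variance components. The sketch below assumes that the residual variance is included in the sum over random effects; the function name and the variance values are hypothetical and are not estimates from this study.

```python
def heritability(v_additive, other_random_variances, v_residual):
    """Ratio of additive genetic variance to the summed variance of all
    random effects (additive, other fitted random effects, residual)."""
    total = v_additive + sum(other_random_variances) + v_residual
    return v_additive / total


# Hypothetical variance components: additive, individual identity, residual.
h2 = heritability(0.16, [0.04], 0.80)
```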
To investigate if the additive genetic variation underlying male and female ACC was associated with sex-specific variation in ACC, bivariate models were run. The additive genetic correlation rA was determined using the CORGH error structure function (correlation with heterogeneous variances) in ASReml-R, with rA set to be unconstrained. To test whether the genetic correlation was significantly different from 0 and 1, the unconstrained model was compared to models with rA fixed at a value of 0 or 0.999. Differences in additive genetic variance in males and females were tested by constraining both to be equal values. Models were then compared using likelihood ratio tests with 1 degree of freedom.
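A likelihood ratio test with 1 degree of freedom, as used for these model comparisons, can be sketched in a few lines of Python. The function name and the log-likelihood values are hypothetical; the p-value uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def lrt_1df(loglik_unconstrained, loglik_constrained):
    """Likelihood ratio test between nested models differing by one
    parameter. Returns the test statistic and its p-value."""
    stat = max(2.0 * (loglik_unconstrained - loglik_constrained), 0.0)
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, e.g. rA unconstrained versus rA fixed at 0.
stat, p_value = lrt_1df(-100.0, -101.920729410347)
```

A statistic of about 3.84 corresponds to P = 0.05 under this test.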
Genetic architecture of autosomal crossover count
Genome partitioning of genetic variance (regional heritability analysis)
The contribution of specific genomic regions to trait variation was determined by partitioning the additive genetic variance as follows [51]: $y = X\beta + Z_1 v_i + Z_1 nv_i + Z_r u_r + e$, where vi is the vector of additive genetic effects explained by an autosomal genomic region i, and nvi is the vector of the additive genetic effects explained by all remaining autosomal markers outwith region i. Regional heritabilities were determined by constructing genomic relatedness matrices (GRMs) for regions i of increasing resolution (whole chromosome partitioning, sliding windows of 150, 50 and 20 SNPs, corresponding to regions of 9.4, 3.1 and 1.2Mb mean length, respectively) and fitting them in models with an additional GRM of all autosomal markers not present in region i; sliding windows overlapped by half of their length (i.e. 75, 25 and 10 SNPs, respectively). GRMs were constructed in the software GCTA v1.24.3 and were shrunk using the --adj 0 argument [66]. The significance of additive genetic variance attributed to a genomic region i was tested by comparing models with and without the Z1vi term using a likelihood ratio test; in cases where the heritability estimate was zero (i.e. estimated as “Boundary” by ASReml), significant model comparison tests were disregarded. A Bonferroni approach was used to account for multiple testing across the genome, by taking the number of tests and dividing by two to account for the overlap of the sliding windows (since each genomic region was modelled twice).
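The window layout and the adjusted Bonferroni threshold can be sketched as follows. The function names are hypothetical, α = 0.05 is assumed for the regional analysis, and the treatment of SNPs left over at the end of a chromosome is an assumption, since the text does not describe it.

```python
def sliding_windows(n_snps, window):
    """Half-overlapping windows as (start, end) SNP indices, end exclusive.
    A trailing remainder shorter than one step is ignored in this sketch."""
    step = window // 2
    last_start = max(n_snps - window, 0)
    return [(s, min(s + window, n_snps)) for s in range(0, last_start + 1, step)]


def bonferroni_threshold(n_tests, alpha=0.05):
    """Significance threshold with the number of tests halved, because
    each genomic region is covered by two overlapping windows."""
    return alpha / (n_tests / 2)


# Hypothetical chromosome with 100 SNPs and 20-SNP windows.
windows = sliding_windows(100, 20)
threshold = bonferroni_threshold(len(windows))
```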
Genome-wide association study of variants controlling ACC
Genome-wide association studies (GWAS) of autosomal recombination rates under different scenarios were conducted using ASReml-R [70] in R v3.1.1, fitting individual animal models for each SNP locus using the same model structure as above. SNP genotypes were fitted as a fixed effect with two or three levels. The GRM was replaced with a relatedness matrix based on pedigree information to speed up computation; the pedigree and genomic relatedness matrices have been shown to be highly correlated [64]. Sex-specific models were also run. Association statistics were corrected for any population stratification not captured by the animal model by dividing them by the genomic control parameter, λ [71], when λ > 1, which was calculated as the median Wald test statistic divided by the median χ2 statistic expected from a null distribution. The significance threshold after multiple testing was determined using a linkage disequilibrium-based method outlined in [72] using a sliding window of 50 SNPs; the effective number of tests in the GWAS analysis was 22273.61, meaning the significance threshold for P after multiple testing at α = 0.05 was 2.245 × 10⁻⁶. Although sex chromosome recombination rate was not included in the analysis, all GWAS included the X chromosome and SNP markers of unknown position (N=314). The proportion of phenotypic variance attributed to a given SNP was calculated using the following equation [73]: $V_{SNP} = 2pq[a + d(q - p)]^2$, where p and q are the frequencies of alleles A and B at the SNP locus, a is half the difference in the effect sizes estimated for the genotypes AA and BB, and d is the difference between a and the effect size estimated for genotype AB when fitted as a fixed effect in an animal model. The proportion of heritable variation attributed to the SNP was calculated as the ratio of VSNP to the sum of VSNP and the additive genetic variance estimated from a model excluding the SNP as a fixed effect. Standard errors of VSNP were estimated using a delta method approach.
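The arithmetic behind the genomic control correction, the multiple-testing threshold and the SNP variance can be illustrated with a short Python sketch. All function names are hypothetical. The expected null median is passed in as an argument because it depends on the degrees of freedom of the test, and the SNP variance uses the standard quantitative genetic expression 2pq[a + d(q − p)]².

```python
import statistics


def genomic_control(test_statistics, null_median):
    """Divide association statistics by lambda when lambda > 1, where
    lambda = median observed statistic / median expected under the null."""
    lam = statistics.median(test_statistics) / null_median
    return [s / lam for s in test_statistics] if lam > 1 else list(test_statistics)


def significance_threshold(effective_tests, alpha=0.05):
    """P-value threshold given the effective number of independent tests."""
    return alpha / effective_tests


def snp_variance(p, a, d):
    """Variance attributed to a SNP with allele frequency p, additive
    effect a and dominance deviation d."""
    q = 1.0 - p
    return 2.0 * p * q * (a + d * (q - p)) ** 2


def proportion_heritable(v_snp, v_additive_without_snp):
    """Share of heritable variation attributed to the SNP."""
    return v_snp / (v_snp + v_additive_without_snp)


threshold = significance_threshold(22273.61)  # effective tests from the text
v_snp = snp_variance(0.2, 1.0, 0.0)           # hypothetical p, a and d
```

With 22273.61 effective tests, the threshold evaluates to approximately 2.245 × 10⁻⁶, matching the value reported in the text.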
Gene annotations in significant regions were obtained from Ensembl (gene build ID: Oar_v3.1.79; [74]). The position of a strong candidate locus, RNF212, is not annotated on Oar_v3.1, but sequence alignment indicated that it is positioned at the sub-telomere of chromosome 6 (see S3 Appendix).
Accounting for cis- and trans-acting genetic variants affecting recombination rate
In the above analyses, we wished to separate potential associations with ACC due to cis-effects (i.e. genetic variants that are in linkage disequilibrium with polymorphic recombination hotspots) from those due to trans-effects (i.e. genetic variants in LD with genetic variants that affect recombination rate globally). By using the total ACC within a gamete, we incorporated both cis- and trans-effects into a single measure. To examine trans-effects only, we determined associations between each SNP and ACC minus crossovers that had occurred on the chromosome on which the SNP occurred; for example, for a SNP on chromosome 1, association was examined with ACC summed across chromosomes 2 to 26. We found that examining trans-variation (ACC minus focal chromosome) obtained similar results to cis- and trans-variation (ACC) for both regional heritability and genome-wide association analyses, leading to the same biological conclusions. All results presented above are for ACC, but results from both types of analysis are available in Tables S3 and S4.
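The trans-only phenotype is simple to compute from per-chromosome crossover counts, as the following minimal sketch shows. The function name and the example counts are hypothetical.

```python
def acc_minus_focal(crossovers_by_chromosome, focal_chromosome):
    """Autosomal crossover count for one gamete, excluding crossovers on
    the chromosome that carries the SNP being tested."""
    return sum(
        count
        for chromosome, count in crossovers_by_chromosome.items()
        if chromosome != focal_chromosome
    )


# Hypothetical gamete with one crossover on each of the 26 autosomes,
# except for three crossovers on chromosome 1.
gamete = {chromosome: 1 for chromosome in range(1, 27)}
gamete[1] = 3
total_acc = sum(gamete.values())
trans_acc = acc_minus_focal(gamete, focal_chromosome=1)
```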
Linkage disequilibrium and imputation of genotypes in significant regions
A reference population of 189 sheep was selected and genotyped at 606,066 SNP loci on the Ovine Infinium® HD SNP BeadChip for imputation of genotypes into individuals typed on the 50K chip. Briefly, the reference population was selected iteratively to maximise $\sum_{i=1}^{m} p_i$ using the equation $p = A_m^{-1} c$, where p is a vector of the proportion of genetic variation in the population captured by m selected animals, Am is the corresponding subset of a pedigree relationship matrix and c is a vector of the mean relationship of the m selected animals (as outlined in [75] & [76]). This approach should capture the maximum amount of genetic variation within the main population for the number of individuals in the reference population. SNP loci were retained if call rate was > 0.95 and MAF > 0.01 and individuals were retained if more than 95% of loci were genotyped. Linkage disequilibrium (LD) between loci was calculated using Spearman-Rank correlations (r2) in the 188 individuals passing quality control.
Genotypes from the HD SNP chip were imputed to individuals typed on the SNP50 chip in the chromosome 6 region significantly associated with ACC, using pedigree information in the software MaCH v1.0.16 [77]. This region contained 10 SNPs from the Ovine SNP50 BeadChip and 116 additional independent SNPs on the HD SNP chip. As the software requires both parents to be known for each individual, cases where only one parent was known were scored as both parents missing. Genotypes were accepted when the dosage probability was between 0 and 0.25, 0.75 and 1.25, or 1.75 and 2 (for alternate homozygote, heterozygote and homozygote, respectively). The accuracy of genotyping at each locus was tested using 10-fold cross-validation within the reference population: genotypes were imputed for 10% of individuals randomly sampled from the reference population, using genotype data for the remaining 90%; this cross-validation was repeated 1000 times, to compare imputed genotypes with true genotypes. Cross-validation showed a relationship between number of missing genotypes and number of mis-matching genotypes within individuals; therefore, individuals with fewer than 99% of imputed genotypes scored were removed from the analysis. Loci with fewer than 95% of individuals typed were also discarded. Imputation accuracy was calculated for all loci as the proportion of imputed genotypes matching their true genotypes; all remaining loci had imputation accuracies > 0.95.
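The dosage thresholds and the accuracy measure described above can be sketched as follows. The function names are hypothetical, genotypes are coded 0, 1 and 2 for illustration only, and treating the interval limits as inclusive is an assumption.

```python
def call_genotype(dosage):
    """Convert an imputed dosage to a genotype call (0, 1 or 2), or
    None when the dosage falls outside the accepted intervals."""
    if 0.0 <= dosage <= 0.25:
        return 0
    if 0.75 <= dosage <= 1.25:
        return 1
    if 1.75 <= dosage <= 2.0:
        return 2
    return None


def imputation_accuracy(imputed, true):
    """Proportion of scored imputed genotypes that match the true genotypes."""
    scored = [(i, t) for i, t in zip(imputed, true) if i is not None]
    if not scored:
        return None
    return sum(i == t for i, t in scored) / len(scored)


# Hypothetical dosages for one locus across five individuals.
calls = [call_genotype(d) for d in [0.1, 0.5, 1.0, 1.9, 1.2]]
accuracy = imputation_accuracy(calls, [0, 1, 1, 2, 2])
```

In this example the dosage of 0.5 is left unscored, and three of the four scored genotypes match the true genotypes.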
Acknowledgements
We thank Jill Pilkington, Ian Stevenson and all Soay sheep project members and volunteers for collection of data and samples. Discussions and comments from Jarrod Hadfield, Bill Hill, Craig Walling, Camillo Bérénos, Jisca Huisman and John Hickey greatly improved the analysis. James Kijas, Yu Jiang and Brian Dalrymple provided information for numerous queries on the genome assembly, annotation and SNP genotyping. Philip Ellis and Camillo Bérénos prepared DNA samples, and Louise Evenden, Jude Gibson and Lee Murphy carried out SNP genotyping at the Wellcome Trust Clinical Research Facility Genetics Core, Edinburgh. This work has made extensive use of the resources provided by the Edinburgh Compute and Data Facility (http://www.ecdf.ed.ac.uk/). Permission to work on St Kilda is granted by The National Trust for Scotland and Scottish Natural Heritage, and logistical support was provided by QinetiQ and Eurest.