The genomic landscape of recombination rate variation in Chlamydomonas reinhardtii reveals effects of linked selection

Recombination confers a major evolutionary advantage by breaking up linkage disequilibrium (LD) between harmful and beneﬁcial mutations and facilitating selection. Here, we use genome-wide patterns of LD to infer ﬁne-scale recombination rate variation in the genome of the model green alga Chlamydomonas reinhardtii and estimate rates of LD decay across the entire genome. We observe recombination rate variation of up to two orders of magnitude, ﬁnding evidence of recombination hotspots playing a role in the genome. We ﬁnd that recombination rate is highest just upstream of genic regions, suggesting the preferential targeting of recombination breakpoints in promoter regions. Furthermore, we observe a positive correlation between GC content and recombination rate, suggesting a role for GC-biased gene conversion or selection on base composition within the GC-rich genome of C. reinhardtii . We also observe a clear positive correlation between nucleotide diversity and recombination, consistent with widespread inﬂuence of linked selection in the genome. Finally, we use estimates of the effective rate of recombination to calculate the rate of sex that occurs in natural populations of this important model microbe, estimating a sexual cycle roughly every 770 generations. We argue that the relatively infrequent rate of sex and large effective population size creates an population genetic environment that increases the inﬂuence of linked selection on the genome.


Introduction
Meiotic recombination, the shuffling of existing genetic material, is both a fundamental evolutionary process and required to ensure proper disjunction of chromosomes during meiosis.Meiotic recombination has two possible outcomes: crossing over (CO) and non-crossing over (NCO), also known as gene conversion.Both forms have significant consequences for the ability of selection to fix adaptive mutations and remove harmful changes (Hill and Robertson, 1966, Koehler et al., 1996, Campos et al., 2017).The reduction in selection efficacy without recombination arises because linkage can interfere with selection for beneficial variants that are associated with harmful ones (Hill and Robertson, 1966, Felsenstein, 1974, Otto and Gerstein, 2006, Comeron et al., 2008).This process of linked selection manifests in DNA sequence as a reduction in nucleotide diversity, through either the loss of neutral variation surrounding positively selected sites (selective sweeps) or, in the case of purifying selection, the removal of linked neutral variation during the purging of deleterious alleles, known as background selection (Begun andAquadro, 1992, Charlesworth et al., 1993).Recombination rate varies at multiple scales across nature, with variability observed within and between taxa as well as within the genome (Stapley et al., 2017).The spatial heterogeneity of recombination within genomes means that local recombination rate is an important determinant of selection and other evolutionary processes in a given genomic region (McVean et al., 2004).Understanding the predictors of elevations or reductions in recombination rate, also known as recombination hotspots and coldspots respectively, has thus constituted a core objective in the study of recombination and genome evolution.Traditionally, recombination rate variation was investigated using linkage maps (Sturtevant, 1913).However, linkage mapping has proven impractical for the assessment of recombination rate variation at fine scales, because resolution is constrained by both the physical distance between markers as well as the difficulty of crossing organisms to generate sufficiently large samples of recombinants (Mackay and Powell, 2007).More recently, however, coalescent-based methods have been developed to estimate the population rate of recombination from the patterns of linkage disequilibrium (LD) between polymorphic sites along chromosomes, allowing for fine-scale examination of recombination rate at high resolution (Kuhner et al., 2000, Auton and McVean, 2007, Chan et al., 2012).Across the species that have been surveyed to date, predictors of recombination rate variation remain inconsistent.In mammalian genomes, which display a high degree of finescale variability in recombination rate, the presence of recombination hotspots is known to be driven by the double strand break-inducing histone H3 methyltransferase protein PR domain-containing 9 (PRDM9) (Baudat et al., 2010, Parvanov et al., 2010, Baudat et al., 2013).However, studies of recombination in non-mammalian model organisms have uncovered conservation of hotspot regions over long evolutionary time scales at or near functional genomic elements, such as near transcription start sites (Tsai et al., 2010, Singhal et al., 2015, Lam and Keeney, 2015).For instance, hotspots in Saccharomyces cerevisiae have been found to preferentially occur in GC-rich promoter regions (Gerton et al., 2000, Lam andKeeney, 2015) and regions of open chromatin (Wu andLichten, 1994, Berchowitz et al., 2009).Despite the differences between mammalian and yeast systems, how- ever, recombination rate remains positively correlated with GC content in both, a phenomenon that has since given rise to the GC-biased gene conversion hypothesis (Galtier andDuret, 2001, Lesecque et al., 2013).In contrast, Caenorhabditis elegans has recombination landscapes that are largely devoid of hotspots, instead displaying relatively large genomic blocks of constant recombination rate (Rockman andKruglyak, 2009, Andersen et al., 2012).Finally, local methylation is observed to suppress recombination in Arabidopsis thaliana (Yelina et al., 2015) yet in humans, germline methylation levels are positively correlated with recombination rate (Sigurdsson et al., 2009).However, high-resolution estimates of genome-wide recombination rate variation have thus far been constrained to a few model animals, fungi, and land plants, which represent a very small subset of eukaryote diversity.What determines recombination landscapes and whether PRDM9-like drivers of recombination rate exist in other taxa remains to be discovered.The recombination profiles of most protists remain unknown, and in some of those cases, whether they are even sexual in the first place (D 'Souza and Michiels, 2010, Grimsley et al., 2010, Stapley et al., 2017).It is estimated that the rate of sex is unknown in nearly all (>99%) free-living protist species (Weisse, 2008).The rate of sex is especially relevant to overall estimates of recombination in the case of unicellular eukaryotes that switch between clonal and sexual reproduction.In a primarily asexual population, it follows that mutation will generate the vast majority of novel genetic material over evolutionary time, while the relative reduction of recombination will, all else being equal, result in a more pronounced effect of linked selection (Agrawal andHartfield, 2016, Hartfield, 2016).The rate of sex can be estimated as the relative frequency of meioses to mitoses by combining di-rect estimates of the recombination (r) and mutation (µ) rate with population estimates of genetic diversity (4N e µ) and the effective recombination rate (4N e r) (Tsai et al., 2008).
Here, we examine recombination rate variation across the genome in the unicellular green alga Chlamydomonas reinhardtii, and estimate the rate of sex in this model organism.C. reinhardtii is facultatively sexual, with sex occurring between haploid mating type plus (mt+) and mating type minus (mt-) individuals (Ferris and Goodenough, 1997) to form diploid zygotes, which then recombine and reduce to four haploid progeny.In this study, we use whole-genome sequencing of 19 C. reinhardtii field isolates, from nearby localities in Quebec, Canada, to address the following questions using LD-based approaches: 1) What is the landscape of recombination rate variation in the genome of C. reinhardtii?2) What genomic features predict recombination rate variation?3) How does recombination rate correlate with diversity?4) What is the rate of sex in natural populations of C. reinhardtii?In doing so, we present a fine-scale estimate of genome-wide recombination rate variation in the organism, and find an enrichment of recombination hotspots immediately upstream and downstream of genes; a pattern consistent with observations made in other non-mammalian organisms.

Results
The variable landscape of recombination in C. reinhardtii.From our population of 19 individuals (Table S1), we calculated fine-scale recombination landscapes across the genome of C. reinhardtii using the coalescent-based population recombination rate estimator LDhelmet 1.7 (Chan et al., 2012).LDhelmet estimates recombination rate between adjacent pairs of SNPs in units of ρ/bp, where ρ = 2N e r; this unit with intergenic annotations in light blue and genic annotations in dark blue.Error bars represent bootstrapped 95% confidence intervals (n = 1000).The annotations 'upstream' and 'downstream' represent intergenic sites that were within 2 kb of a genic region, while 'both' indicates sites that are downstream of one gene and upstream of another.'intergenic' sites are all other intergenic sites more than 2 kb from the nearest gene.The dashed horizontal line represents the mean genome-wide ρLD value.
is hereafter referred to as ρLD.The genome wide average recombination rate was ρLD = 0.0044 /bp.Mean chromosomal recombination rate varied from 0.0026 to 0.0115 and displayed a negative linear relationship with chromosome length (Fig. S2, R 2 = 0.4803, p = 0.002).In 200 kb windows, we observe broad-scale recombination rate variation of up to two orders of magnitude.Genome-wide ρLD estimates were then summarized in non-overlapping 2 kb windows for fine-scale analysis of recombination rate variation; the distribution of recombination rates is shown in Fig. 1.We then investigated the presence of recombination hotspots in C. reinhardtii, defining a hotspot as a region that was 1) at least 2 kb in length and 2) exhibited a 5-fold increase in ρLD as compared to the chromosome-wide mean recombination rate.We found evidence of hotspots in all chromosomes, with 741 hotspot regions in total representing 1.87% of the genome, where the average ρLD within hotspots was more than 14 times the genome average (mean length = 2.75 kb, mean ρLD fold increase over local background = 12.83, mean distance between adjacent hotspots = 164 kb, mean ρLD at hotspots = 0.06247).We found an inverse relationship between chromosome size and hotspot density (Spearman's ρ = -0.5858,p = 0.015).
We observed that most hotspots at a 2 kb window size were found in non-continuous clusters in the genome, raising the possibility that closely grouped peaks in recombination rate may in fact be indicative of broad-scale regional elevations.To smooth out the hotspot landscape and validate the presence of these putative broad-scale high recombination re-gions, we modified the first part of our definition to instead examine fold change in recombination rate at >20 kb and >50 kb scales.As expected, this reduced the number of regions classified as hotspots (n = 62 at 20 kb, n = 24 at 50 kb), yet minimum 5-fold increases in recombination rate were still observed at both window sizes, indicating that pronounced regions of elevated recombination rate are present even at broader scales in the C. reinhardtii genome.We also examined the decay of LD with physical distance in C. reinhardtii using the r 2 measure of LD.We calculated pairwise LD between SNPs within each of C. reinhardtii's individual chromosomes using plink v1.90 (Chang et al., 2015).We then modeled the expected decay of LD with physical distance for each chromosome following equation 3 from Hill and Weir (1988) (Fig. 1b).We found estimates of LD measured as r 2 dropped to half of their starting value within a mean distance of 164 bp, ranging from 93 bp to 394 bp across all chromosomes.The decay of r 2 levels off within a mean distance of 9500 bp, where 'leveling off' is defined as the point at which the instantaneous rate of change in r 2 with physical distance approximates 0 to six significant digits (mean r 2 at level off point = 0.0585 ± 0.0017).In concordance with the relationship observed between LDhelmet ρLD values and chromosome length, we also observe that LD approaches baseline levels faster in shorter chromosomes, although this correlation is marginally nonsignificant (Spearman's ρ = 0.4559, p = 0.067).
Recombination rate is highest immediately surrounding genes.To investigate the correlates of recombination across the genome, we then examined our ρLD estimates across different annotations within the reference genome of C. reinhardtii (Merchant et al., 2007).Mean recombination rates along different features of the C. reinhardtii genome are shown in Fig. 2. We found that recombination rate was 11.9% higher in intergenic regions (mean ρLD = 0.004877) than genic regions (mean ρLD = 0.04358, Mann-Whitney U test, p < 2.2 × 10 −16 ).Within intergenic regions, proximity to genes is a strong predictor of elevated recombination rate, with sites 2 kb up or downstream of genes displaying 24.2% higher mean ρLD than the genome background (mean ρLD of sites within 2 kb of genes = 0.005510).There is a striking and statistically significant 63% increase in recombination within the 2 kb upstream of genes compared to the adjacent 5' UTR (Mann-Whitney U test, p = 3.16 × 10 −16 ; ρLD upstream = 0.005707, ρLD 5' UTR = 0.003498).Within genes, protein-coding regions demonstrate significantly higher rates of recombination than introns (Mann-Whitney U test, p = 5.69 × 10 −5 ; ρLD CDS = 0.004838, ρLD introns = 0.004211).We then examined whether any of the annotations were enriched for hotspots as compared to a random expectation of hotspot distribution.Here, we find the highest enrichment of hotspots in regions classified as 'upstream' (Fisher's exact test, odds ratio = 2.019, p < 2.2 × 10 −16 ) 'downstream' (OR = 2.003, p < 2.2 × 10 −16 ), and 'both' (OR = 1.340, p < 2.2 × 10 −16 ).13.9% of all hotspots occur in the 8.7% of the genome corresponding to these annotations.
The association between recombination rate and genome annotation has been hypothesized to be driven by DNA methylation patterning.Sequence methylation is subdivided into three types, depending on the sequence context surrounding a methylated cytosine: CG, CHG, and CHH (where H = A, T, or C).Given that the enrichment of recombination hotspots flanking genes in C. reinhardtii suggests a role for open chromatin in determining recombination localization (see Discussion), the open chromatin model would thus predict recombination suppression in heterochromatic regions (Choi and Henderson, 2015).To examine the relationship between DNA methylation patterning and recombination rate in C. reinhardtii, we obtained publicly available bisulfite sequencing data from three clones of one of the Quebec C. reinhardtii strains included in the present analysis (CC-2937) (Kronholm et al., 2017), and summarised methylation levels across all three contexts and ρLD values in 200 bp windows.Here, we opted for a much smaller window size than in previous analyses because we expected that the highly localized nature of DNA methylation (Suzuki and Bird, 2008) would mean that effects on recombination would be at very fine scales.We find that the correlation between ρLD and methylation, although significant, is only very weakly negative (Spearman's ρ = -0.044,p < 2.2 × 10 −16 ), indicating that DNA methylation is not a strong driver of recombination in C. reinhardtii.
Recombination rate is correlated with both nucleotide diversity and GC content.If background selection and selective sweeps are common, we expect that linked selection will drive a reduction in neutral diversity in regions of low recombination.To examine whether linked selection was occurring in the C. reinhardtii genome, we correlated recombination rate with neutral nucleotide diversity in nonoverlapping 100 kb windows.Here, we found θ π at selectively unconstrained (intronic, intergenic, and four-fold degenerate) sites was positively associated with ρLD (Spearman's ρ = 0.6073, p < 2.2 × 10 −16 ), in alignment with expectations from linked selection.Concordant with this observation, we find nucleotide diversity to be more than 20% higher in hotspot regions (θ π = 0.03349, 95% CI = 0.0333-0.0341,p < 2.2 × 10 −16 ) than the remainder of the genome (θ π = 0.02786, 95% CI = 0.02781-0.02789,p < 2.2×10 −16 ).Recombination rate has also been found to correlate with GC content in a variety of model systems (Mugal et al., 2015).This relationship is often attributed to GC-biased gene conversion (gBGC), in which heteroduplex mismatches between GC and AT bases formed during recombination are preferentially repaired in favour of GC nucleotides (Galtier and Duret, 2001).From a population genetic perspective, this effect is indistinguishable from positive selection in favour of GC alleles, and can thus substantially impact genome evolution (Duret andGaltier, 2009, Ratnakumar et al., 2010).C. reinhardtii is known to have a highly GC-rich genome (64%) despite a strong A/T mutational bias, raising the possibility that gBGC may have played a role in the evolution of its genome (Ness et al., 2015).Here, we do find that GC content does display a weakly positive correlation with recombination rate in non-overlapping 50 kb windows (Spearman's ρ = 0.2434,  (Ruderfer et al., 2006, Tsai et al., 2008).This is because mutation is measured relative to all cell divisions, while recombination is only measured in meiotic cell divisions.The ratio of diversity over µ therefore reflects the effective number of mitoses, while population recombination rate over r is representative of sexual, meiotic cell divisions; consequently, the ratio of these two values indicates the frequency of meioses over mitoses.Our genome-wide recombination rate estimate of 4.43 × 10 −3 can therefore be used in tandem with previous estimates of the C. reinhardtii recombination rate (r = 12cM/M b) (Liu et al., 2017), the mutation rate (µ = 9.63 × 10 −10 ) (Ness et al., 2015), and neutral diversity (θ = 2.75 × 10 −2 ) (Ness et al., 2016) to estimate the ratio of meioses over mitoses in natural populations of C. reinhardtii.With this approach, we obtain a ratio of 0.001292, corresponding to one meiosis occurring every ∼ 770 asexual generations.

Discussion
In this study, we have generated a fine-scale estimate of the recombination rate landscape in C. reinhardtii using patterns of linkage disequilibrium in whole-genome resequencing data, revealing a punctuated recombination landscape with frequent hotspots.We also found an enrichment of recombination hotspots within 2 kb of genes that leads to an overall increase in recombination rate surrounding genes, in concert with observations in a number of other PRDM9-lacking organisms.The variation in recombination rate across the genome is strongly correlated with nucleotide diversity, suggesting that the influence of linked selection is widespread in the genome and that recombination is a major driver of genetic variation.Lastly, we have used our estimate of the population recombination rate to estimate the frequency of sex in C. reinhardtii, allowing for further insights into the ecology and evolution in of this model organism in nature.At the 2 kb scale, we find widespread recombination hotspots across the C. reinhardtii genome.While this window size results in clusters of many nearby hotspots, we find support for 5-fold elevations of local recombination rate over background even at window sizes of 20 kb and even 50 kb.This indicates that some of the hotspots we identified are in fact peaks within larger blocks of generally elevated recombination rate, similar to observations in both mammals and yeast (Stapley et al., 2017).On the other hand, this recombination profile is unlike that of both C. elegans, which has a comparatively homogenous fine-scale recombination landscape (Rockman andKruglyak, 2009, Kaur andRockman, 2014) as well as D. melanogaster, which displays some degree of finescale heterogeneity but little evidence for highly localized elevations in recombination rate (Chan et al., 2012, Manzano-Winkler et al., 2013).We find that C. reinhardtii exhibits moderate rates of LD decay across all 17 chromosomes.Our estimates of the distance (≤10 kb) at which LD (r 2 ) decays to baseline LD levels are shorter than those obtained in a previous study of C. reinhardtii that reported a decay to baseline within ∼ 20 kb (Flowers et al., 2015).The disparity in our estimates may be related to the geographic structure of our isolates; where we sequenced 19 isolates all from nearby localities in Quebec, Canada, Flowers et al. utilized a mix of lab strains alongside isolates from a variety of populations across Quebec and Eastern United States, for a total of 20 unique strains.The disparity in our respective estimates supports the idea that there are barriers to recombination across the geographic range of C. reinhardtii in North America, with the resulting population structure increasing LD among variants.Our results also indicate that the recombination rate inversely correlates with chromosome length, a relationship consistent with prior studies in a variety of organisms (Kaback et al., 1992).Given that each chromosome requires at least one crossover event to ensure proper meiotic disjunction, it follows that shorter chromosomes exhibit higher per-base crossover rates, resulting in more pronounced signatures of LD breakdown.Furthermore, we find elevated recombination and an enrichment of hotspots within regions immediately flanking genes in C. reinhardtii, consistent with a trend conserved across the lineages of nearly all PRDM9-lacking organisms studied thus far.Specifically, recombination hotspots upstream of genes have been observed in fungi (Berchowitz et al., 2009, Tsai et al., 2010, Lam and Keeney, 2015), finches (Singhal et al., 2015), as well as angiosperms, such as wheat (Saintenac et al., 2009), maize (Li et al., 2015) and A. thaliana (Wijnker et al., 2013, Choi et al., 2013).In addition, the same pattern is observed in the mammalian Canidae family, where PRDM9 was lost relatively recently (Axelsson et al., 2012, Auton et al., 2013).In these PRDM9-lacking organisms, chromatin structure has been invoked as an explanation of recombination hotspot conservation upstream of genes (Wu and Lichten, 1994, Lichten, 2008, Berchowitz et al., 2009).In eukaryotes, DNA is wrapped in nucleosomes, with nucleosome occupancy depleted in regions where the DNA needs to be accessible, such as for the purposes of transcription.Promoter regions upstream of genes thus tend to display greater nucleosome depletion, which may in turn allow for recombination machinery to more easily induce double strand breaks in these regions (Pan et al., 2011, Lam andKeeney, 2015).Our observations of elevated recombination rate immediately flanking genes suggest a similar mechanism acting in the C. reinhardtii genome, and show that this trend is even more wide-spread, extending to green algae.
Within genes, however, we observe more surprising patterns in recombination rate variation.First, protein-coding regions display elevated rates of recombination relative to genome average, which is unexpected from an evolutionary perspective.Theory predicts that recombination rates may evolve to be lower in functional regions, so as to both prevent the shuffling of beneficial allele combinations ('recombination load') and to also suppress adverse mutagenic effects on coding sequence that may result from crossing over (Charlesworth andBarton, 1996, Otto andLenormand, 2002).Furthermore, recombination events in genes that result in deleterious mutations may be selected against and therefore reduce evidence of recombination in exons.The 5' UTR in particular has the lowest recombination rate of any annotation in C. reinhardtii, despite recombination rate elevations observed in other species (Mancera et al., 2008, Kawakami et al., 2017).One explanation could be that recombination could itself be mutagenic: thus, recombination-associated deleterious mutations in the functional components of the UTR region might have been eliminated by selection, increasing observed LD.However, were this the case, we would expect a similar trend in protein-coding exons, where we instead observe the highest rates of within-gene recombination.Moreover, recombination rate in intronic sequence is intermediate to the UTR and exonic sequence, whereas under a mutagenic hypothesis we would instead expect within-gene recombination rates to be higher in introns than in exons.
Ultimately, the idea that selection has reduced effective recombination in functional sites does not appear to hold from the current data.Thus, either recombination is not mutagenic in C. reinhardtii, or we must invoke another mechanism, such as chromatin conformation or some yet to be discovered driver of recombination, to explain the observed patterns.Unfortunately, relatively little is known about chromatin conformation in C. reinhardtii, and we see only a very weak association with methylation, leaving identification of the drivers of recombination rate variation in and around genes an open question.
We find that recombination rate positively correlates with nucleotide diversity across the genome of C. reinhardtii, indicating the action of linked selection.Theory predicts that this correlation arises as a consequence of background selection and/or selective sweeps reducing diversity in regions of otherwise low recombination (Charlesworth et al., 1993, Cutter and Payseur, 2013, Campos et al., 2017).We also find diversity to be higher in hotspot regions (θ π = 0.03349) than nonhotspot regions (θ π = 0.02786), which is to be expected if linked selection is less pronounced in regions of elevated recombination.Our result demonstrates that linked selection is a major determinant of standing genetic variation in C. reinhardtii.Given that C. reinhardtii is likely to have a very high effective population size (Ness et al., 2012) it is expected that many mutations will be effectively selected (i.e.N e s > 1) (Corbett-Detig et al., 2015).However, while the effective population size is very large, the relatively infrequent rate of sex (see below) means that the effective recombination rate is not particularly high relative to obligately sexual species.The interaction of a large N e facilitating efficient selection alongside reduced recombination due to a facultatively sexual life cycle means that the influence of linked selection may be pronounced in the genome.This principle may be more widespread in unicellular eukaryotes, which live in large populations that are only periodically sexual.
In addition, we also find a weakly positive correlation between GC content and local recombination rate.Our result is consistent with a trend seen in other organisms such as yeast (Gerton et al., 2000), mouse (Jensen-Seaman et al., 2004), and humans (Fullerton et al., 2001) (although A. thaliana is a notable exception, displaying an inverse correlation -see Wijnker et al. 2013).The correlation we observe is in line with the presence of GC-biased gene conversion (gBGC) (Galtier and Duret, 2001) acting to drive up GC content in the C. reinhardtii genome.The potential action of gBGC is especially notable considering that despite a strong A/T mutational bias, C. reinhardtii has a GC-rich genome (64.1%) (Merchant et al., 2007, Ness et al., 2015).While our result lends support to a possible role of gBGC, a recent study revealed only weak gBGC from the genome sequences of 27 C. reinhardtii tetrads, in concert with a low overall rate of gene conversion, thus indicating a very minor role for biased gene conversion in the evolution of its genome (Liu et al., 2017).It is worth noting that using LD-based estimates of population recombination rate, we obtain a stronger correlation between GC content and recombination rate (Spearman's ρ = 0.2434) than Liu et al. (Spearman's ρ = 0.0646).A stronger GC-recombination correlation when considering historical recombination events suggests that the effects of weak forces governing base composition are more apparent over longer evolutionary timescales.However, considering that LD-based recombination rate estimates do not differentiate between crossovers and gene conversion events, it is difficult to ascertain the effects of biased gene conversion from our data without knowing the relative frequency of either form of recombination.
We also investigated the relationship between recombination rate and DNA methylation, following observations in A. thaliana that DNA methylation is indeed associated with crossover suppression (Yelina et al., 2015).However, our results report only a weakly negative correlation between methylation levels and recombination rate.C. reinhardtii is known to have relatively low methylation levels to begin with (Kronholm et al., 2017) and, in addition, demonstrates unusual methylation patterns surrounding protein-coding regions.Namely, methylation levels in C. reinhardtii remain consistent in and out of genes, whereas other organisms such as A. thaliana and Oryza sativa display far more heterogeneous and context-dependent methylation patterning.For instance, methylation levels in these plants demonstrate sharp reductions around gene boundaries (Feng et al., 2010).The methylome of C. reinhardtii is also known to be labile over the course of an individual's life cycle (Lopez et al., 2015).Our analysis suggests that even if DNA methylation plays a role in suppressing recombination in C. reinhardtii, the effect is weaker than that seen in higher plants.
Finally, by integrating lab-and population-based measures of recombination and mutation, we have estimated the rate of sex in C. reinhardtii to be one meiosis every ∼770 asexual generations.The frequency is higher than estimates in S. cerevisiae (∼50000 generations) (Ruderfer et al., 2006) as well as S. paradoxus (∼1000-3000 generations) (Tsai et al., 2008).However, this method for estimating the frequency of sex is subject to numerous assumptions, chief amongst them neutrality and demographic equilibrium.We assume random mating, no gene flow, and no population subdivision; violations of these assumptions may otherwise result in more prominent LD and thus downwardly bias estimates of the frequency of sex.Despite these caveats, our estimate of sex occurring every ∼770 generations may point towards a seasonal ecology in C. reinhardtii.While the precise rate of mitosis in nature is unknown, log-phase cultures subjected to a light dark cycle typically exhibit 2-3 doublings every 24 hours (Bernstein, 1964, Jones, 1970, Harris et al., 2009).An average of 2.5 doublings per day corresponds to 770 generations every 308 days.Considering the fact that zygotes are resistant to freezing and other environmental stressors (Morris et al., 1979, Harris et al., 2009), it is plausible that populations of C. reinhardtii in Quebec overwinter as zygotes, undergoing a sexual cycle approximately once per year.
Taken together, our results suggest that populations of C. reinhardtii maintain relatively large effective sizes even at regional scales.These large populations are subject to frequent selection that interacts with infrequent bouts of sexual reproduction to drive strong effects of linked selection in the genome.The genome also displays significant heterogeneity in recombination rate, with recombination highest in the regions flanking genes; however, unlike other species examined to date, C. reinhardtii has relatively high recombination in coding exons, which suggests yet to be described drivers of recombination in this species., sequencing, and alignment. 19 (11 mt+, 8 mt-) natural strains of Chlamydomonas reinhardtii, sampled from Quebec, Canada, were obtained from the Chlamydomonas Resource Center.These 19 strains are all sampled from two nearby localities in Quebec, and therefore are possibly part of a recombining population, although we know very little about the scale at which sex occurs and population structure exists in C. reinhardtii (summarized in Table S1).Sequencing and alignment were performed as described previously (Ness et al., 2016).Briefly: we conducted whole-genome resequencing on the Illumina GAII platform at the Beijing Genomics Institute, and aligned 100bp PE reads with BWA 0.7.4-r385 (Li and Durbin, 2009).Since the C. reinhardtii reference genome is derived from an mt+ individual and does not include organelles, we appended the chloroplast genome, the mitochondrial genome, and the mtlocus to allow mapping of reads derived from these regions.The GATK v3.3 tools HaplotypeCaller and GenotypeGVCFs were then used to call SNPs and short indels, and stored in Variant Call Format (VCF) files (non default settings: textttploidy=1, includeNonVariantSites=true, heterozygos-ity=0.02,indel_heterozygosity=0.002).

Recombination landscapes and hotspots.
To obtain chromosome-wide maps of recombination rate variation in the C. reinhardtii genome, we used LDhelmet 1.7 (Chan et al., 2012), which calculates fine-scale estimates of population recombination in intervals between adjacent SNPs.The coalescent-based approach of LDhelmet also allows for inferences of ancestral recombination rate variation.LDhelmet reports the population recombination rate ρ = 2N e r that reflects the size of the recombining population (N e ) and the physical recombination rate (r, recombination events • bp -1 • generation -1 ).The block penalty parameter in LDhelmet determines the level of smoothing of ρ estimates along the chromosomes.To ascertain the block penalty that reflects real variation in rho, we computed average ρLD in non-overlapping 500 bp windows across chromosome 12 of C. reinhardtii using block penalties of 10, 50, and 100.We then performed pairwise correlations between windowed ρLD values across different block penalties and found that the estimates were virtually identical, which indicates that at the scale of hundreds of bases, our results were not sensitive to block penalty (Fig. S3).Here, we report data using the default LDhelmet setting of 50.We applied a population scaled mutation rate of 0.01 (Ness et al., 2015), and for each chromosome ran LDhelmet for 1,000,000 iterations following 100,000 iterations of burnin.Default parameters were otherwise retained.To detect recombination hotspots in LDhelmet's ρLD estimates, we modified a Python script from Singhal et al. (2015) that summarises ρLD values in non-overlapping windows.Following Singhal et al., we initially defined hotspots as regions that 1) were at least 2 kb in length and 2) exhibited a mean 5-fold increase in ρLD as compared to background; however, we considered the mean recombination rate for a given chromosome as background, as opposed to previous definitions, which only considered the surrounding 80-100 kb (International HapMap Consortium, 2005, Singhal et al., 2015).To smooth out the hotspot landscape, we also applied the same script over greater window sizes, investigating recombination rate variation in 20 kb and 50 kb windows.LD decay across chromosomes.Pairwise calculations of r 2 within each of C. reinhardtii's 17 chromosomes were conducted using plink 1.90 (Chang et al., 2015).For all pairs of SNPs with MAF > 0.01, plink calculates LD statistics with a maximum likelihood approach described in (Gaunt et al., 2007).By default, plink filters out pairs of SNPs with an r 2 value below 0.2; we disabled this filtering.We calculated r 2 for all pairs of SNPs within 100 kb of one another, and modeled the expected decay of LD with distance for each chromosome with non-linear least squares regression in R (R Core Team, 2017) using equation 3 from Hill and Weir (1988).
Genomic correlates of recombination rate.We classified the reference genome of C. reinhardtii with the following annotations: genic sites were subclassified as protein coding sequence (CDS), exons, introns, and/or UTRs, while intergenic regions were classified as being within 2 kb upstream of a gene ('upstream'), 2 kb downstream of a gene ('downstream'), flanked by genes within 2 kb on either side ('both'), or more than 2 kb from the nearest gene ('intergenic').For each of the above features, average ρLD was calculated from every corresponding site in the genome.Recombination rates for each annotation were bootstrapped for 1000 replicates in order to obtain 95% confidence intervals.All statistical tests were implemented using R 3.4.1 (R Core Team, 2017).We then examined the relationship between DNA methylation and recombination rate using publicly available bisulfite sequencing data from three clones of C. reinhardtii strain CC-2937 (Kronholm et al., 2017).For read mapping, we used BWA-meth (Pedersen et al., 2014) and called methylated cytosines in CG, CHG, and CHH contexts using biscuit-0.2.2 (Zhou, 2017).Following Kronholm et al., we set minimum base quality to 20 and minimum mapping quality to 60.We then used the output VCF file to summarise methylation beta values in 200 bp windows, and examined the correlation between ρLD and methylation at the 200 bp scale.Nucleotide diversity (θ π ) was calculated at silent sites in 100 kb windows for downstream correlation with ρLD.We classified as silent all 4-fold degenerate sites, intronic sites, and intergenic sites.To examine nucleotide diversity in and out of hotspots, we calculated θ π at these sites across the genome in non-overlapping 2 kb windows, and split the resulting dataset using our hotspot classification from earlier.Windowed θ π estimates were bootstrapped for 1000 replicates in order to obtain 95% confidence intervals.For the correlation of GC content with recombination rate, we used a custom Python script to compute GC content in a given window, and then summarized GC content with ρLD values in non-overlapping 50 kb windows.For all figures in this work, we utilized the R package ggplot2 (Wickham, 2009).

Fig. 1 .
Fig. 1.A Cumulative frequency distribution of ρLD for each chromosome of C. reinhardtii.Each curve represents one of the 17 chromosomes, shaded by chromosome length.ρLD values were summarised in 2 kb windows.The vertical dashed line indicates the genomewide mean ρLD value.B Decay of linkage disequilibrium (r 2 ) across the 17 chromosomes of C. reinhardtii, modeled according to equation 3 from Hill and Weir (1988).