The distribution of fitness effects of spontaneous mutations in Chlamydomonas reinhardtii inferred using frequency changes under experimental evolution

The distribution of fitness effects (DFE) for new mutations is fundamental for many aspects of population and quantitative genetics. In this study, we have inferred the DFE in the single-celled alga Chlamydomonas reinhardtii by estimating changes in the frequencies of 254 spontaneous mutations under experimental evolution and equating the frequency changes of linked mutations with their selection coefficients. We generated seven populations of recombinant haplotypes by crossing seven independently derived mutation accumulation lines carrying an average of 36 mutations in the haploid state to a mutation-free strain of the same genotype. We then allowed the populations to evolve under natural selection in the laboratory by serial transfer in liquid culture. We observed substantial and repeatable changes in the frequencies of many groups of linked mutations, and, surprisingly, as many mutations were observed to increase as decrease in frequency. Mutation frequencies were highly repeatable among replicates, suggesting that selection was the cause of the observed allele frequency changes. We developed a Bayesian Monte Carlo Markov Chain method to infer the DFE. This computes the likelihood of the observed distribution of changes of frequency, and obtains the posterior distribution of the selective effects of individual mutations, while assuming a two-sided gamma distribution of effects. We infer that the DFE is a highly leptokurtic distribution, and that approximately equal proportions of mutations have positive and negative effects on fitness. This result is consistent with what we have observed in previous work on a different C. reinhardtii strain, and suggests that a high fraction of new spontaneously arisen mutations are advantageous in a simple laboratory environment.


Introduction
Understanding the nature of genetic variation for fitness requires an understanding of the origin of that variation from mutation. The distribution of fitness effects of mutations (DFE) describes the frequencies of mutations with differing magnitudes of effects, and is fundamental for many topics in evolutionary genetics, including the maintenance of genetic variation, the nature of genetic variation for quantitative traits and the genetic basis of adaptive evolution. The DFE specifies the relative frequencies of advantageous and deleterious mutations and the contributions of mutations with small and large effect sizes to fitness change and genetic variation. The DFE appears, for example, in the nearly neutral model of molecular evolution (Ohta 1973), which posits that patterns of molecular variation and between species change can be explained by mutations that have fitness effects close to 1/Ne (where Ne is the effective population size); this is a very small fitness effect for species with typically large Ne.
The DFE can be inferred experimentally or by statistical analysis of the frequencies of nucleotide variants at polymorphic sites (Eyre-Walker and Keightley, 2007). In the latter approach, the DFE for deleterious amino acid-changing mutations can be estimated by analysis of the site frequency spectrum (SFS) for nonsynonymous variants under the assumption that their distribution of effects follows a pre-specified distribution, such as a gamma distribution. When the SFS is combined with divergence data from another species the frequency and effects of advantageous nonsynonymous mutations can also be estimated (Eyre-Walker and Keightley 2009;Tataru et al. 2017). Analysis of the SFS has been applied to genomic data from a wide range of taxonomic groups. Estimated DFEs for deleterious mutations are invariably strongly leptokurtic (L-shaped), the shape of the distribution varying between taxonomic groups (Chen et al 2017), and there is usually a strong nearly neutral component. Analysis of the unfolded SFS suggests that at most a few percent of mutations are advantageous (Keightley et al 2016). Inferring the DFE using standing variation within a population is relevant to the fitness effects of mutations in nature, but does not capture strongly positively or strongly negatively selected mutations, because these tend not contribute to standing nucleotide variation.
Inference of the DFE via experimental manipulation can be applied to mutations engineered in specific genes or at random genomic locations. One approach, which has similarities to the one described here, estimates the selection coefficients of induced The DFE can also be inferred for mutations induced at random locations in the genome.
For example, Johnson et al (2019) estimated the DFE for transposable element mediated insertion mutations in yeast by measuring mutation frequency changes under adaptation over time. Notably, lines with the highest initial fitness appeared to suffer the greatest fitness consequences from de novo insertion mutations.
Previously, we have inferred the DFE for spontaneous mutations in the single-celled green alga Chlamydomonas reinhardtii using growth rate as a fitness measure (Böndel et al 2019). We crossed mutation accumulation (MA) lines of the CC-2931 strain that had randomly accumulated spontaneous mutations for many generations with a mutation-free ancestral strain. We then measured growth rate and determined the genotypes of many recombinant lines carrying random combinations of mutations. We developed a Bayesian MCMC approach to estimate parameters of a DFE, which also enabled us to extract the estimated effects of individual mutations. This suggested a highly leptokurtic DFE, with a surprisingly high proportion of mutations (about 50%) increasing growth rate.
Here, we cross MA lines of C. reinhardtii of a different strain (CC-2344) to that studied by Böndel et al (2019), which had undergone approximately 1,000 generations of spontaneous mutation accumulation, with a mutation-free ancestral strain in order to generate many recombinants with different combinations of mutations. Rather than assaying individual recombinants, as in our previous experiment, we allow pools of recombinant haplotypes to compete against one another in an experimental evolution setting and measure changes of mutation frequency by deep sequencing. We develop a new MCMC approach to infer the DFE based on changes in frequency of linked mutations.
Using this new experimental approach and a different strain, we infer that a surprisingly high proportion of mutations increase fitness in the standard laboratory environment.

Mutation accumulation lines and compatible ancestor
We studied seven C. reinhardtii MA lines (L06, L09, L10, L12, L13, L14, L15) derived from strain CC-2344(isolated in Pennsylvania, USA, in 1988) produced by Morgan et al. (2014) and sequenced as described by Ness et al. (2015). The MA lines and their ancestral strain are of the same mating type (mt+) and will not mate with one another, so we first produced a "compatible ancestor" to which the MA lines could be crossed. This was done by backcrossing CC-2344 to a mt-strain (CC-1691) for 16 generations with the aim of producing a strain nearly identical to CC-2344, with the exception of the region surrounding the mating type locus on chromosome 6. Genome sequencing of the compatible ancestor (using the method of Ness et al. 2015) showed that this was accomplished successfully ( Figure S1).

Generation of populations of recombinants
To produce the starting populations of the experimental evolution experiments (designated t0), we generated recombinant populations by mating each MA line with the compatible ancestor. First, we grew each MA line in Bold's medium (Bold 1942) under standard conditions (23°C, 60% humidity, constant white light illumination) while shaking at 180 rpm to obtain a culture of 30 ml. Three lines had poor growth (L09, L12, and L15), so this procedure was done twice in order to obtain sufficient cell material. After three days, cultures were transferred to 50 ml falcon tubes, centrifuged at 3250 g for 5 minutes and the supernatant removed. Nitrogen-free conditions are required to trigger mating in C.
reinhardtii (Sager and Granick 1954), so we washed each cell pellet with 30 ml nitrogenfree Bold's medium, mixed, centrifuged at 3250 g for 5 minutes and removed the supernatant. We then resuspended the washed cell pellet in 45 ml nitrogen-free Bold's medium. The same procedure was done for the compatible ancestor.
To carry out the matings, we mixed 15 ml of the resuspended MA line cell culture with 15 ml of the resuspended compatible ancestor cell culture in a 50 ml falcon tube. The mixtures were then incubated under standard growth conditions for 7 days in a slanting position (in order to increase the surface area) until zygote mats had formed at the surface. The remaining 30 ml of each culture served as control to allow the detection of mating failures (see below). Zygote mats were then transferred to a fresh 50 ml falcon tube containing 30 ml nitrogen-free Bold's medium and incubated in the dark under standard growth conditions for 5 days to allow the zygotes to mature. To kill any vegetative cells still associated with the zygote mats, we froze the zygote cultures and kept them at -20 °C for 5 hours. After thawing at room temperature for approximately 1 hour, we transferred the zygote cultures to 500 ml conical flasks and added 30 ml of Bold's medium containing twice the standard concentration of nitrogen and 60 ml of standard Bold's medium to obtain a total volume of 120 ml with standard nitrogen concentration. The flask was then incubated under standard growth conditions while shaking at 250 rpm until zygotes had germinated ( Figure 1A). This germination cell culture was then used to start the experimental evolution experiment. The control cultures were incubated as described for the zygote cultures and if any growth was visible after the freezing, the respective zygote culture would have been discarded and the whole procedure repeated, but no such instances were observed.

Experimental evolution
We grew three replicate cell cultures from each MA line x compatible ancestor cross until approximately 60 generations had been reached ( Figure 1B). We designate this time point tt. For L06 and L09 this took 42 days and for L10, L12, L13, L14, and L15 this took 46 days. Each replicate was grown in a volume of 120 ml in a 250 ml conical flask under standard growth conditions while shaking at 250 rpm. We started each replicate with 1.2 ml of the germination cell culture and 118.8 ml Bold's medium. By using standard Bold's medium we ensured that the recombinants could not mate and grew entirely vegetatively.
The remaining cells from the germination cell cultures were collected and frozen at -70°C for sequencing. We then transferred 1.2 ml of each culture to a fresh flask containing 118.8 ml Bold's medium on a three or four day cycle, or in the case of transfers 3, 6 and 9, over seven days in order to maximise biomass for DNA extraction and sequencing. In order to determine the number of serial transfers at which approximately 60 generations had been reached, optical density (OD) at 600 nm of the cultures was measured at the end of eac h transfer growth period. We then calculated the generation time as follows: t = (log Nt -log N0) / log 2, where Nt is the measured OD of the culture at the end of the growth period and   (1). The zygote mats were transferred to new falcon tubes, incubated in the dark to allow zygote maturation and then frozen at -20°C to kill off any vegatative cells (2). The culture with the matured zygote mat was transferred to a conical flask and incubated until zygotes germinated (3). B) Serial transfers. The germination culture was grown up until it was dense enough to start the three replicates and have enough cell material for DNA extraction. The three replicates were started with 1.2 ml of the original germination culture (4). Every three to four days 1.2 ml were transferred to fresh conical flasks and 2 ml of the remaining culture was used to measure OD (5). After nine transfers the end of the experiment was reached and the cells were collected for DNA extraction (6).

Sequencing and sequence data processing
Genomic DNA was obtained from time point 0 (t0) and from each of the three tt replicates for each MA line recombinant population by phenol-chloroform extraction (Ness et al 2012). DNA samples were sequenced on an Illumina Hiseq4000 platform by BGI Hong Kong with 150 bp paired-end reads to an average sequencing depth of 520.7x and 221. Mutations were classified as noncoding, synonymous or nonsynonymous relative to the v5.3 reference genome annotation. To assess the functional effects of mutations, SnpEff (Cingolani et al. 2012) was run using the pre-build C. reinhardtii annotation and with default parameters.
It has recently been suggested that the v5 reference genome contains some misassemblies (Salomé & Merchant, 2019;Craig et al. 2021). Since misassemblies could introduce incorrect linkage relationships between mutations, we lifted over mutation coordinates to the highly contiguous Nanopore-based assembly of the strain CC-1690 (O'Donnell et al., 2020), which is identical-by-descent >95% of its genome with the original reference strain (Gallaher et al. 2015). The v5 and CC-1690 assemblies were aligned with Cactus (Armstrong et al. 2020) with divergence between the genomes arbitrarily set at 0.004 and otherwise default parameters. Liftover was then achieved from the resulting alignment using halLiftover (Hickey et al. 2013). Lifted over coordinates were used for the MCMC analysis described below.

Repeatability of mutation frequency between replicates
We estimated the repeatability of mutation frequency among the three replicate populations by partitioning the total variation in allele frequencies into three components: the variance in frequency among different MA line x compatible ancestor crosses (VMA), the variance in frequency among mutations (VM) and the residual variation due to differences among the three replicate measures for each mutation (VE). Variance components were estimated using the Lmer function in R, and repeatability was calculated as VM/(VM + VE).

C. reinhardtii genetic map
A genetic map is required in the MCMC analysis described below, which calculates frequency changes of linked mutations. We assume an overall genome-wide average rate of recombination, obtained from two published crosses between C. reinhardtii strains CC2935 x CC2936 and CC408 x CC2936, which together provide an estimate of 1cM per 87,000 bases (Liu et al 2018). A third cross reported by Liu et al. (2018) (CC124 x CC1010) that has a c.10-fold lower marker density was not included in our calculations.
Our analysis required the location of the individual mutations on a genetic map. Lui et al.
(2018)'s study does not provide sufficient resolution for this, but higher resolution estimates of the rate of recombination are available from a study of linkage disequilibrium in natural populations (Hasan and Ness 2020). We used these estimates to adjust for variation in the rate of recombination among chromosomes, i.e. assumed a uniform rate of recombination per chromosome. Longer chromosomes have lower recombination rates, presumably due to a requirement for a minimum of 1 chiasmata per chromosome per meiosis ( Figure S2).
We used estimates of the population scaled recombination rate from Hasan and Ness (2020) to adjust the rate of recombination estimated in the Liu et al. (2018) crossing experiment in order that the inverse rate of recombination (yi, bp/cM) increases linearly by 0.000616 per 1 base pair increase in chromosome length (xi) ( Figure S2), while keeping the overall average recombination unchanged, i.e., (see supplementary text A).

Computation of expected allele frequencies after experimental evolution
In this section we describe the computation of the expected frequencies of the mutations after one generation of recombination followed by t generations of experimental evolution, which are used in likelihood calculations and Bayesian inference. We assume that allele frequencies of each mutation are 0.5 after crosses between MA lines and their ancestor, and that changes of allele frequency then occur deterministically and independently among chromosomes, but that genetic linkage of mutations on the same chromosome leads to non-independent allele frequency changes under selection. Based on the assumed genetic map (see above), we first computed the expected frequencies of the n possible haplotypes generated by one round of meiosis involving an ancestral chromosome and a chromosome from a MA line in the absence of selection. In the model of experimental evolution following the cross, mutations have selection coefficients, sj, from which the overall fitness (wi) of haplotype i carrying mi mutations can be computed under multiplicative selection: where δj takes the value 1 or 0 if the haplotype carries the mutant or wild type allele, respectively, for mutation j. The selection coefficients (s) are parameters of the model whose values are changed during MCMC runs. These and other such parameters are listed in Table 1.  Let the frequency of haplotype i at generation t = πi,t. We then calculated the expected frequency of each haplotype after t generations of natural selection by iterating equation (4) for t generations: where wt is the mean fitness of the n haplotypes at generation t. It is then straightforward to compute the expected frequencies of the individual mutations (p) at generation t.

Computation of likelihood
The log likelihood was the sum of three terms, the first involving the mutation frequencies after selection (p), the second involving mutation effects (s), which were assumed to be gamma distributed, and the third involving the total number of negative effect mutations (n0) versus the total number of positive effect mutations (n1), which were assumed to be sampled from a binomial distribution with mean q (the frequency of positive effect mutations, a parameter of the model, Table 1). Likelihood was computed assuming independence among the c chromosomes: The log likelihood for the term involving the mutation frequencies on a chromosome (log L(p)i) was computed assuming that allele frequencies of each of the r experimental evolution replicates are sampled from a log-normal distribution with mean pj and standard deviation σδ, and that numbers of mutant and wild type reads for a replicate are sampled from a binomial distribution with mean pj + δjk: where δjk are variables in the model that allow the allele frequency for each replicate (k) to be different from the overall expected frequency under selection for that mutation, and xjk and djk are the numbers of mutant reads and the sequencing depth, respectively, for mutation j replicate k.
The log likelihood for the term involving the distribution of fitness effects of mutations (log L(s)i) on a chromosome was computed assuming that the fitness effects are drawn from gamma distributions, which can have different parameters for positive and negative effect mutations: where gamma() is the gamma probability density function (PDF), yj takes the value -1 or 1 if mutation j's fitness effect is negative or positive, respectively, δj is an indexing variable that takes the value 0 or 1 if mutation j's fitness effect is negative or positive, respectively, and α δ j and β δ j are elements of vectors α and β, both of dimension two, containing the scale and shape parameters, respectively, of the gamma distributions of fitness effects.
Elements 0 and 1 of α and β contain parameters for negative and positive effect mutations, respectively.

Priors
These were designed to be informative. The prior for fitness effects of mutations was a uniform distribution bounded by -1 and +1. The prior for the frequency of positive-effect mutations (q) was uniform in the range 0 and 1. Priors for the shape and scale parameters of the gamma distribution of effects of mutations and the parameters related to the log normal distribution (σδ and δjk) were uniform in the range 0 to very large values.

MCMC implementation
We used the Metropolis Hastings algorithm to sample from the posterior distributions of the parameters (Table 1)

Distribution of initial mutation frequency before experimental evolution
We crossed seven C. reinhardtii MA lines that had been independently derived from strain CC-2344 to a compatible ancestor of the same genetic background in order to generate populations of recombinants. In the generation following the cross, mutations are therefore expected to be at a frequency of 0.5. Recombinant haplotypes were then allowed to compete with one another in the absence of further recombination in standard laboratory conditions over the course of nine serial transfers. We sequenced samples from each MA line cross at the start (designated time 0) and end (designated time t) of experimental evolution in order to quantify changes in mutation frequency.
We identified 254 mutations in the seven MA lines, comprising 232 SNPs, 13 insertions and nine deletions (Table 2). Recombinant populations were sequenced at a high enough depth at t0 and tt to allow accurate frequency estimation (Table 2). Sequencing depth varied among the mutations, but we did not observe a significant correlation between sequencing depth and mutation frequency ( Figure S3; time t0: r = 0.0741, P = 0.236 and tt: r = 0.0244, P = 0.498; r = Spearman's correlation). Therefore, we can discount any effect of sequencing depth on mutation frequency. The average mutation frequency at time 0 (p0) was 0.481, which is close to the expected value of 0.5. Initial frequency of the different mutations showed considerable scatter, however, since the standard deviation was 0.141, and there were also noticeable differences in the distribution of frequencies between the recombinant populations. Lines L12 and L13, for example, had a relatively broad range of initial frequencies centering around 0.5, whereas L9 and L10 had a narrower distribution, with a mean close to 0.5 and a few mutations with very low frequencies ( Figure 2). Unexpectedly, in the majority of lines there were mutations at frequencies close to zero at time 0, and additionally the frequency of one mutation L15 was close to 1 at time 0 (Supplementary Figure S4). These extreme

Mutation frequencies after experimental evolution
There were substantial changes in the frequencies of many mutations after experimental evolution (pt), but the magnitude and direction varied substantially among mutations ( Figure 2). Mutation frequency was highly repeatable between the three replicates (r = 0.96), indicating that selection was the causal agent of frequency change.
The correspondence between mutation frequencies at the start and end of experimental evolution is shown in Figure 3. As previously mentioned, t0 frequencies are centred around 0.5 (see also Figure 2 and Figure S4), but in some cases t0 frequencies were substantially different from 0.5. In the majority of these cases, frequency usually became even more extreme in the same direction by time t. This suggests that natural selection changed mutation frequency in the generations after the cross before sequencing at t0, and selection also operated in the same direction under subsequent experimental evolution.
There are exceptions, however, the most extreme of which correspond to points in the topleft and bottom-right quadrants of Figure 3. These are cases where the allele frequency explanation for this behaviour is a change in the direction of selection, which would imply that the environmental conditions before and after sampling for time 0 had changed.
Notwithstanding the unexpected cases mentioned above, there are other patterns apparent in the raw allele frequencies ( Figure S4). To account for allele frequency variation among replicates (which are mostly very consistent), in the subsequent analysis, we modelled variance among replicates of the same cross by assuming a lognormal model for the environmental variance between replicates of the same cross. Linked mutations usually moved in frequency in the same direction.

Estimation of the generation time
The number of generations over the course of experimental evolution were estimated based on OD measurements taken at each of the nine transfers. The mean number of generations across all recombinant populations was 59.8 with a standard deviation of 1.03.
The estimated generation times were highly consistent between replicates ( Figure S5, Table S1) and varied only slightly between recombinant populations from the different MA line x compatible ancestor crosses (Table S1) where L06 had the lowest number of generations (58.1) and L15 the highest (60.7).

MCMC analysis to estimate the DFE
We proceeded to carry out the Bayesian MCMC analysis to estimate parameters of the

Model comparison
We used the convention that if BIC(model A)-BIC(model B) < −10, there is strong evidence in favour of model A over model B (Raffery 1995). The results ( Table 2) therefore suggest that more complex models are not strongly favoured over the simplest two-sided gamma model, which has the same scale and shape parameters for positive-and negative-effect mutations.

Parameter estimates
We obtained estimates of parameter values based on the modes of the posterior distributions sampled from the MCMC chains. For the three different models evaluated, the parameter estimates are highly consistent (Table 3). The estimate of the proportion of positive-effect mutations (q) is close to 0.5 for each of the three models, and credible intervals are relatively narrow. This result is consistent with the observed bidirectional changes of mutation frequency. For each of the three models, estimates of the shape parameter of the distribution of effects are close to 0.5. This implies a highly leptokurtic distribution of fitness effect, in which the majority of effects cluster around zero, with a long tail of positive and negative effects. The estimated absolute average effect of a mutation is just over 2%, and the inferred DFE is shown in Figure 4.

Tests for differences between annotated mutations
Using the genome annotation for C. reinhardtii, we tested whether certain annotated types of mutations relating to protein-coding gene function have smaller or larger effects than others by calculating the difference in mean effect or mean squared effect between them (Table S2). All differences are nonsignificant.
As a complementary analysis, we estimated the effect of mutations on annotated coding sequences using SnpEff (Cingolani et al. 2012). Of the ~42% of mutations that could be classified using this approach, 35% were estimated to have low impact and 61% moderate impact. These classifications largely coincided with the synonymous and nonsynonymous classifications used above. Only four high impact mutations were predicted, all of which were nonsense mutations.

Discussion
We have inferred the DFE for a set of spontaneous mutations from a MA experiment in C. reinhardtii by tracking their frequencies under natural selection in the laboratory and equating the observed frequency changes to the corresponding selection coefficients. The analysis of the data is made complicated by the fact that groups of mutations are linked on the same chromosome, implying that every mutation on a chromosome is expected to change in frequency even if only a subset of mutations is subject to selection. We have therefore crossed MA lines with an unmutated ancestor to obtain recombinants carrying the mutations in all possible combinations and measured frequency changes in the resulting recombinant populations after experimental evolution.
We made use of the genetic map of C. reinhardtii for predicting changes of frequency of linked mutations on the same chromosome. In our data, frequency is estimated on the basis of numbers of sequencing reads, and we have used this information to compute the likelihood for a set of predicted frequencies. The likelihood is then used in a Bayesian setting to compute the posterior distribution of parameters of the DFE. To our knowledge, with the exception of our previous study (Böndel et al 2019), there have been no other attempts to fit a parameterized distribution to infer the DFE for new mutations. For example, Flynn et al (2020) produced a graphical representation of the DFE based on estimates of individual mutation effects, but did not estimate the DFE's parameters within a statistical model. This is desirable, because a simple plot of the individual effects of mutations will be inflated by sampling variance, as will summary statistics derived from this distribution.
We observed substantial changes in the frequencies of the majority of mutations, and these changes were highly repeatable among replicates starting from the same base population. This is consistent with the action of natural selection under experimental evolution. The inferred DFE is broadly consistent with that inferred in our previous study on a different C. reinhardtii strain (Böndel et al 2019) in which we measured growth rate and assayed genotypes of many recombinants that emerged from a cross. If we assume a twosided gamma DFE with equivalent scale and shape parameters for positive-and negative-effect mutations, the estimated proportions of positive-effect mutations are almost identical between the two studies. The estimated shape parameter in the current study is somewhat higher than the previous study (β = 0.53 versus 0.32), implying a somewhat less leptokurtic distribution, and the mean mutational effect is about four-fold higher (0.022 versus 0.0049). The biological significance of these differences is unknown.
Our estimated DFE is based on certain assumptions, and there are several potential causes of inaccuracy and/or bias. First, the mutations whose frequencies we tracked were those detected by Illumina sequencing in a previous study (Ness et al 2015), but there are some mutations, including transposable element movements and large scale rearrangements, that we do not currently know about. Selection acting on these unknown mutations will therefore induce frequency changes at linked sites and generally inflate estimates of the strength of selection. Second, the linkage map for the strain we are working with may differ from the one that was assumed. Presumably any differences will lead to over/underestimation of effects, but not in a systematic way. Third, our analysis incorporates changes of frequency that occurred up to time 0 and changes that subsequently occurred up to time t. We do not know the number of generations up to time 0, but it is clear that in several cases mutation frequencies had already changed by then.
Furthermore we are assuming a certain value for the number of generations of experimental evolution, but do not have a precise measure of this.
We infer that there is a high frequency of mutations with positive effects on fitness, which we also observed in our previous study in a different C. reinhardtii strain (Böndel et al 2019). Such a high frequency is a surprising finding, since it has long been argued that the majority of new mutations are likely to be neutral or deleterious (summarized by Keightley and Lynch 2003). Broadly, the majority of nucleotide sites in compact genomes such as in C. reinhardtii are at sites that are selectively constrained, and relatively few sites can evolve free from the influence of natural selection. In nature, the dominant force of natural selection appears to be purifying selection, since fitness in natural populations is likely to be close to an adaptive peak and most changes are therefore harmful. Direct evidence for this comes from attempts to infer the fraction of advantageous amino acid mutations, based on analysis of the site frequency spectrum (e.g., Keightley et al 2016;Tataru et al 2017). Most studies utilizing mutation accumulation also suggest that the net directional effect of mutations is negative (summarized by Halligan and Keightley 2009).
One possible explanation for the higher than expected frequency of advantageous mutations is that there was selection against deleterious mutations and in favour of advantageous mutations during the generation of the MA lines. This could have affected the strain that was the subject of the current experiment and the strain studied by Böndel et al (2019). In the experiment described here, there were approximately 11 generations of growth between each transfer in the MA experiment, where selection could operate to change the frequencies of de novo mutations. To investigate the influence of selection on the estimated DFE, we implemented the method recently developed by Wahl and Agashe (2021) to predict the extent of under-or over-contribution to the DFE for mutations with given selection coefficients, assuming that there is a doubling of cell number for t = 11 generations during mutation accumulation. Based on equation (2) of Wahl and Agashe (2021), and assuming a two-sided gamma distribution with parameters specified in Table   3, the corrected estimate for the frequency of positive effect mutations is q = 0.46 (uncorrected = 0.50). This suggests that selection during mutation accumulation had only a modest impact. Presumably, this is because our inferred DFE is leptokurtic, and most of the density is concentrated near zero. Furthermore, the mean absolute selective effect is relatively low (0.022), and the 11 generations of growth between transfers in the MA experiment were insufficient to lead to substantial frequency changes over the bulk of the distribution. The contrast between corrected and uncorrected DFEs ( Figure 5) shows only a slight downward shift in the corrected distribution.  Table 3 and corrected DFE generated by applying the method of Wahl and Agashe (2021).
Although our experimental design attempted to mitigate it, there will also have been selection against those colonies that were too small to see at the time of transfer during the MA experiment. This would most likely select against strongly deleterious mutations, but the effect of this between-colony selection is difficult to quantify.
If selection during mutation accumulation was not the explanation for the high frequency of positively selected mutations, then we must seek other explanations. Although the strains we studied were isolated three decades ago, C. reinhardtii can be maintained for long periods without cell division and it is probable that they have passed through insufficient generations to adapt to the novel laboratory environment. Fitness is therefore likely to be far from an adaptive peak, and therefore the fraction of advantageous mutations is expected to be higher than in a natural environment (Orr 1998). Little is known about the ecology of C. reinhardtii, but the laboratory clearly represents an extremely artificial environment. C. reinhardtii has been isolated from soil and may also be present in freshwater, where the species experiences day/night cycles, fluctuating temperatures and resource availability, biotic interactions (predators, pathogens etc.), and so on. Adaptations to life in the field include a complex metabolism (autotrophy and heterotrophy) and motility in response to light and nutrients (Sasso et al. 2018). It is plausible that loss of function mutations in genes no longer required in the laboratory are advantageous. Loss of function mutations have frequently arisen under laboratory conditions, for example certain strains have lost the ability to utilise nitrate after culture with an alternative nitrogen source (Harris 2009;Gallaher et al. 2015), although it is unknown if the underlying mutations in such cases were positively selected.

Calculation of scaled recombination rates per chromosome
Let yi be the scaled recombination rate for chromosome i, xi be its length, and b the slope of the linear relationship between recombination rate and map length, i.e., If n is the number of chromosomes, the mean recombination rate is:    Figure S1: SNP densities along the 17 chromosomes between the compatible ancestor for CC-2344 and its two ancestral strains. SNP densities were calculated for 80-kb windows along the chromosomes between the compatible ancestor and CC-2344 (red) and between the compatible ancestor and the mating type minus (mt-) donor strain (black). A SNP density of 0 indicates no genetic differences between the compatible ancestor and the strain it was compared to.