Restricted access to beneficial mutations slows adaptation and biases fixed mutations in diploids

Ploidy varies considerably in nature. Yet, our understanding of the impact of ploidy on adaptation is incomplete. Many microbial evolution experiments characterize adaptation in haploid organisms, but few focus on diploid organisms. Here, we perform a 4,000-generation evolution experiment using diploid strains of the yeast Saccharomyces cerevisiae. We show that the rate of adaptation and spectrum of beneficial mutations are influenced by ploidy. Haldane’s sieve effectively restricts access to beneficial mutations in diploid populations, leading to a slower rate of adaptation and a spectrum of beneficial mutations shifted towards dominant mutations. Genomic position also plays an important role, as the prevalence of homozygous mutations is largely dependent on their proximity to a recombination hotspot. Our results demonstrate key aspects of diploid adaptation that have previously been understudied and provide support for several proposed theories.


INTRODUCTION
Understanding the impact of ploidy on adaptation is a central challenge in evolutionary biology. Ploidy varies considerably in the natural world from bacteria that are mostly haploid to some plants that can exist as decaploid [1]. In addition, all sexual organisms alternate between ploidy states through gamete fusion and meiosis [2]. Despite its importance, we have an incomplete picture of how ploidy impacts the rate of adaptation and spectrum of beneficial mutations during adaptive evolution. In principle, how ploidy impacts adaptation depends largely on assumptions regarding the dominance of new beneficial mutations. If new beneficial mutations are mostly dominant, then diploids, with twice the mutational target size as haploids, will be twice as likely to acquire beneficial mutations, and will have greater evolutionary potential. Alternatively, if beneficial mutations are recessive, then haploids will […] haploid yeast adapt more quickly compared to diploids, and that this result holds across many strain backgrounds and many environments [7][8][9][10]. Yet, why haploids adapt more quickly is unclear. Extensive sequencing of laboratory-evolved populations has focused almost exclusively on the spectrum of beneficial mutations in bacterial and haploid yeast populations, rather than in diploids [11][12][13][14]. Collectively, this work shows that the spectrum of beneficial mutations in haploid populations is skewed towards loss-of-function mutations. While there is less information regarding the spectrum of beneficial mutations in higher ploidy populations, one recent effort used whole-genome sequencing and gene expression analysis to explore adaptation of polyploid yeast (including diploids) to Raffinose media and observed a broader spectrum of beneficial mutations in tetraploids [15]. Some work in diploids implicates Haldane's sieve as a filter for recessive beneficial mutations. For example, Gerstein et al. constructed heterozygous diploids from a collection of nystatin-resistant haploid yeast, and found that all evolved nystatin-resistant mutations were recessive [16]. Recent theory also suggests that beneficial mutations in diploids may be overdominant (more beneficial as heterozygotes than homozygotes) [17] and there is some experimental evidence suggesting this is the case [10,18]. Based on these predictions, there is a need to determine how diploidy changes the spectrum of beneficial mutations and the dynamics of adaptation.
Here we measure the rate of adaptation for 48 replicate diploid populations through 4,000 generations and compare these results to previously evolved haploid populations [13]. We sequence two […] that each is beneficial in the background in which it arose. Mutations in the shared target of selection, KRE6, are beneficial as haploids, heterozygous diploids, and homozygous diploids, though the benefit as a heterozygote is less than that of a homozygote, showing that it is partially dominant (coefficient of dominance, h=0.34). The diploid-specific target of selection, CTS1, is equally beneficial as a heterozygote or a homozygote (h≈1); however, this mutation is neutral in a haploid background. Mutations in the haploid-specific targets, GAS1 and KRE5, are beneficial in haploids and homozygous diploids. In the case of KRE5, the heterozygous diploid fitness is not statistically different from zero (h=0.10, Fig. 4). In the case of GAS1, the heterozygote is strongly deleterious.

Gene conversion occurs more rapidly for CTS1 than ACE2

Interestingly, although our reconstruction experiments show no significant fitness difference between a heterozygous and homozygous CTS1 mutation, all six CTS1 mutations are homozygous in the diploid populations. Another common diploid-specific target, ACE2, is observed as a homozygous mutant in two of the four populations in which it arose. Both ACE2 and CTS1 are located on Chr. XII. Two possible mechanisms could explain this observation: gene conversion or chromosome loss. To verify that our strains have two copies of Chr. XII, we measured read coverage in our sequencing data and found that coverage across Chr. XII was consistent with coverage across the rest of the genome in all samples containing a CTS1 or ACE2 mutant. We also performed tetrad dissections of CTS1 mutant strains and found four-spore viable tetrads, which is unlikely if all or part of one copy of Chr. XII is missing. These homozygous mutations therefore likely arise from gene conversion events.

With this in mind, we investigated the dynamics of genome sequence evolution by performing whole-genome whole-population time-course sequencing on two populations, one with a homozygous CTS1 mutation, and one with a homozygous ACE2 mutation. In population A05, the ace2 allele establishes around generation 900, rises to a frequency of ~0.5 by generation 1,100, and remains there until around generation 1,400, when it rises to a frequency of ~1, fixing in the population by generation 1,600 (Fig. 5A). In population F05, the cts1 allele establishes around generation 700, rises to a frequency of ~0.5 around generation 800, and continues without pausing to reach a frequency of ~1 around generation 1,000 (Fig. 5B). In this case, the mutant cts1 homozygote establishes before the heterozygote fixes. In addition to CTS1 and ACE2 mutations, we observe other mutations in these populations, including a few heterozygosities that were present in the ancestral strain. We can use the information from the dynamics of these additional mutations to inform our theory of gene conversion leading to homozygosity. The A05 population contains three non-ACE2 variants, two of which sweep as heterozygotes. The third is an existing polymorphism in an LTR that was heterozygous in the ancestor and is approximately 387 kb away from ACE2 on Chr. XII. When the ace2 allele fixes in this population, the LTR also loses heterozygosity (Fig. 5A). The F05 population contains 11 non-CTS1 mutations, including seven heterozygous mutations, two homozygous mutations, and two positions that were heterozygous in the ancestor, but lost heterozygosity during the evolution of this population. Among these is a mutation in DUS4, which sweeps to fixation with the cts1 allele, and the previously described heterozygosity in an LTR which loses heterozygosity at the same time as CTS1 (Fig. 5B). In this case, the LTR is approximately 84 kb away from CTS1.

We hypothesize that the length of the pause of the ace2 allele at a frequency of 0.5 is the waiting time between the heterozygote fixing in the population and the gene conversion event. To test this, we performed Sanger sequencing of clones from both populations before, during, and after the sweep, corresponding to allele frequencies of 0, ~0.5 and 1. For ACE2, 19/20 clones at an allele frequency of ~0.5 are heterozygous, consistent with a full sweep of the heterozygote before gene conversion (Fig. 5C). For CTS1 at an allele frequency of ~0.5, we observe a mix of heterozygous, homozygous mutant, and homozygous ancestral clones, meaning that, unlike the ace2 heterozygote in population A05, the cts1 heterozygote did not fix in the population before the gene conversion event (Fig. 5D). This suggests that the cts1 allele gene converts very quickly compared to the ace2 allele.

While the F05 allele of CTS1 is a nonsense mutation (likely to be a loss-of-function), the other five CTS1 mutant alleles seen in our diploids are missense mutations (possibly alteration-of-function). Even so, all six show similar dynamics of adaptation, with similar rates of gene conversion (Figure S1). Based on this, we hypothesize that the difference in gene conversion rates between CTS1 and ACE2 mutations is likely not due to the effects of individual mutations, but rather is due to the location of the CTS1 and ACE2 genes themselves.
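The clone genotyping argument above rests on the fact that a population allele frequency of ~0.5 can arise in more than one way in a diploid population. The following is a minimal Python sketch of that arithmetic, not the analysis used in this study; the function name and the clone counts are hypothetical and chosen only for illustration.

```python
def mutant_allele_frequency(n_het, n_hom_mut, n_hom_anc):
    """Mutant allele frequency implied by diploid clone genotype counts."""
    n_clones = n_het + n_hom_mut + n_hom_anc
    if n_clones == 0:
        raise ValueError("need at least one genotyped clone")
    # Each diploid clone carries two alleles: a heterozygote contributes one
    # mutant copy and a homozygous mutant contributes two.
    return (n_het + 2 * n_hom_mut) / (2 * n_clones)


# Hypothetical counts (illustration only, not data from this study).
# Case 1: the heterozygote has swept the whole population.
all_het = mutant_allele_frequency(n_het=20, n_hom_mut=0, n_hom_anc=0)
# Case 2: a mixture of all three genotypes.
mixed = mutant_allele_frequency(n_het=10, n_hom_mut=5, n_hom_anc=5)

print(all_het, mixed)
```

Both hypothetical cases give the same population frequency, which is why genotyping individual clones is needed to distinguish a fixed heterozygote from a mixed population.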

DISCUSSION
We show that our diploid populations adapt more slowly than haploids because recessive beneficial mutations are not selectively accessible to diploids. We have used our power law and linear fits to predict future rates of adaptation for our evolving haploids and diploids, respectively. In our model, the rate of haploid adaptation starts off higher but decreases over time following the power law, while the diploid rate of adaptation is lower but constant, following a linear fit. Therefore, the model predicts that around generation 3,500, the haploid and diploid rates of adaptation are equal, but past this point, the diploid rate is faster than the haploid rate (Fig. S1A). Additionally, our model predicts that by generation 15,000, diploids will have achieved more total adaptation relative to their ancestor than haploids (Fig. S1B). The declining rate of adaptation in haploids may be due to the exhaustion of available beneficial recessive mutations.

Through our comparison of the mutations gained during haploid and diploid adaptation, we observe that both the mating pathway and the negative regulation of Ras were prominent targets of adaptation in haploids, but were not targets of selection in diploids. This makes sense for the mating pathway, as it is repressed in diploids (and therefore sterile mutations are not selectively advantageous). As to why negative regulation of Ras is not a target of adaptation in diploid populations, there are two possibilities. The first is that all or most of the spectrum of beneficial mutations that can be made to genes involved in the negative regulation of Ras are recessive in their fitness effects, and thus unlikely to pass through Haldane's sieve. The other possibility is that the Ras pathway is regulated differently in haploids and diploids, such that some genes are not functionally redundant in haploids and diploids.
Some yeast genes are known to have ploidy-specific regulation [5]. Interestingly, CTS1 is included among these differentially regulated genes, increasing in expression with ploidy. This physiological difference between haploids and diploids may explain why mutations to CTS1 are diploid-specific.

By examining our diploid evolved mutations, we found that Haldane's sieve is effective at filtering out recessive beneficial mutations from evolving diploid populations, but it is not the only factor affecting the rate of adaptation and spectrum of beneficial mutations in diploids. We also see that genomic position is an important factor, largely due to varying rates of recombination throughout the genome. This is particularly visible on the right arm of Chr. XII, which contains the rDNA locus, a known recombination hotspot in yeast [22]. We find that homozygous mutations are rare (only 10% of diploid mutations) but are largely concentrated on the right arm of Chr. XII (Fig. 3), particularly in the CTS1 and ACE2 genes (Fig. S2). This implies that the ability of beneficial diploid mutations to become homozygous, and thus to escape from Haldane's sieve, will depend strongly on local rates of gene conversion. Our results here further validate the point made by Mandegar and Otto that mitotic recombination is an important factor in the spread of beneficial alleles in evolving asexual populations [23].

We reconstructed evolved haploid-specific, diploid-specific, and shared alleles in isolation as haploids, heterozygous diploids, and homozygous diploids, and found that the degree of dominance differs between the alleles. Our reconstructed allele of CTS1 shows a surprisingly high degree of dominance (h≈1), yet all six of the cts1 alleles present in our diploid populations rapidly gene converted and fixed as homozygotes.
One possible explanation is that the homozygote does have an advantage over the heterozygote, but that we cannot detect this difference with flow cytometry-based fitness assays. One complication is that homozygous mutations in CTS1 result in a cell aggregation phenotype. This aggregation phenotype has previously been shown to occur during yeast laboratory evolution [24-27]. Our evolved CTS1-mutant diploid populations and the reconstructed cts1 haploid and homozygous diploid strains form aggregates, but the reconstructed heterozygous mutant does not (Fig. S3). It is possible that this phenotype complicates fluorescence-based fitness measurements of these strains.
Furthermore, from the dynamics of adaptation of all six CTS1-mutant populations (Fig. 5B, Fig. S4), we have shown that CTS1 mutants become homozygous both very frequently and quickly. Additionally, from the dynamics of the F05 population containing our CTS1 mutation of interest, we see a mutant dus4 allele that travels to fixation with the cts1 allele, and one ancestral heterozygosity in a long terminal repeat that loses heterozygosity at the same time (Fig. 5B). All three of these genetic loci are on Chr. XII downstream from the yeast rDNA locus. We propose that a single recombination event caused a loss of heterozygosity for all three of these loci, establishing the cts1 homozygous mutant.
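The rate comparison made earlier in this Discussion (a power-law fit for haploids and a linear fit for diploids) can be illustrated with a short Python sketch. The functional forms w_hap(t) = a·t^b with 0 < b < 1 and w_dip(t) = c·t are assumptions made here for illustration, since the exact form of the fits is not given in this section, and the parameter values below are hypothetical rather than the fitted values.

```python
def equal_rate_generation(a, b, c):
    """Generation at which the haploid and diploid rates of adaptation match.

    Assumes haploid fitness gain a * t**b (0 < b < 1) and diploid gain c * t.
    Setting the derivatives equal gives a * b * t**(b - 1) = c.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    return (c / (a * b)) ** (1.0 / (b - 1.0))


def equal_total_generation(a, b, c):
    """Generation at which total diploid adaptation catches up with haploids.

    Setting a * t**b = c * t gives t**(b - 1) = c / a.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    return (c / a) ** (1.0 / (b - 1.0))


# Hypothetical parameters, chosen only so the arithmetic is easy to follow.
a, b, c = 1.0, 0.5, 0.05
t_rate = equal_rate_generation(a, b, c)
t_total = equal_total_generation(a, b, c)
print(t_rate, t_total)
```

Under these assumed forms the rates always cross before the totals do, which matches the ordering of the two predicted crossover points described in the text.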

Strain Construction
The strains used in this experiment are derived from the base strain, yGIL432, a haploid yeast strain derived from the W303 background with genotype MATa, ade2-1, CAN1, his3-11, leu2-3,112, […] strains was crossed to yGIL646 to generate heterozygous diploids. The heterozygous diploids were then sporulated, tetrads were dissected, and haploid spores were mating-type tested and genotyped. Appropriate haploid spores were then crossed to each other to create homozygous diploid strains for each mutation.

Long-Term Evolution
To set up the long-term evolution experiment, a single clone, yGIL672, was grown to saturation in rich-glucose YPD medium, was diluted 1:10,000, and was used to seed 48 replicate populations in a single 96-well plate. This initial plate was duplicated and then frozen down for future use. The cultures were evolved through 4,000 generations (400 cycles) of growth and dilution in YPD at 30°C. Every 24 hours, the populations were diluted 1:1024 by serial diluting 1:32 (4 µl into 130 µl) × 1:32 (4 µl into 130 µl) into new medium containing ampicillin (100 mg/L) and tetracycline (25 mg/L). All dilutions were performed using the Biomek Liquid Handler equipped with a Pod96. Approximately every 50 generations, plates were mixed with 50 µl of 60% glycerol and archived at -80°C.
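The relationship between the dilution scheme and the generation count can be checked with a few lines of Python. This is a sketch under the assumption that each population regrows to the same density every cycle, so that the number of doublings per cycle equals log2 of the total dilution factor; the function names are illustrative.

```python
import math


def generations_per_cycle(total_dilution):
    """Doublings needed to regrow to the starting density after one dilution."""
    return math.log2(total_dilution)


def total_generations(n_cycles, total_dilution):
    """Generations accumulated over a number of growth-and-dilution cycles."""
    return n_cycles * generations_per_cycle(total_dilution)


# Two serial 1:32 steps give the 1:1024 daily dilution described above.
daily_dilution = 32 * 32
print(daily_dilution)                          # 1024
print(generations_per_cycle(daily_dilution))   # 10.0
print(total_generations(400, daily_dilution))  # 4000.0
```

The same arithmetic implies that archiving approximately every 50 generations corresponds to about every five daily cycles.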

Competitive Fitness Assays
Flow cytometry-based competitive fitness assays were performed essentially as described previously [13,20,25]. Briefly, experimental and reference strains were grown to saturation in separate 96-well plates. If the strains to be tested were coming from the freezer, the strains were passaged once by diluting 1:1024 to re-acclimate the strains to the appropriate medium. Experimental and reference strains were mixed 50:50 using the Biomek Liquid Handler and were propagated for 40 generations under identical conditions to the original evolution experiment. In this project we used two reference strains: diploid MATa/a, and diploid MATa/α. These reference strains are derived from the yGIL432 base strain, but contain a constitutive ymCitrine integrated at the ura3 locus.

We used previously collected competitive fitness data from evolved MATa haploid strains [13]. We performed a competitive fitness assay as described below on previously constructed heterozygous MATa/a diploid strains [20]. To perform this assay, we constructed a MATa/a diploid version of the fluorescently-labeled reference strain using a plasmid containing LEU2 under a MATa-specific promoter to select for gene conversion at the mating type locus. These data were normalized to their respective references and compiled into Figure 1.
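A common way to turn flow cytometry counts from such a competition into a fitness value is to regress the natural log of the experimental-to-reference ratio against generations and take the slope as the selection coefficient. The sketch below follows that convention as an assumption; the exact calculation is described in the cited references rather than in this section, and the sampling time points and fractions here are hypothetical.

```python
import math


def selection_coefficient(generations, experimental_fraction):
    """Least-squares slope of ln(experimental / reference) versus generations."""
    if len(generations) != len(experimental_fraction) or len(generations) < 2:
        raise ValueError("need at least two matched time points")
    # The reference strain makes up the remainder of each mixed culture.
    log_ratio = [math.log(f / (1.0 - f)) for f in experimental_fraction]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(log_ratio) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(generations, log_ratio))
    return sxy / sxx


# Hypothetical time course: a 50:50 mix in which the experimental strain
# gains on the reference by 2% per generation (illustration only).
time_points = [0, 10, 20, 30, 40]
fractions = [1.0 / (1.0 + math.exp(-0.02 * t)) for t in time_points]
print(selection_coefficient(time_points, fractions))
```

Fractions of exactly 0 or 1 would need to be excluded before taking the logarithm, since the ratio is undefined once one competitor is no longer detected.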

Whole-Genome Sequencing
For sequencing clones, we struck to single colonies on YPD and picked two colonies from each population to sequence. These individuals were grown to saturation in liquid media and total genomic DNA was isolated for each sample. For whole-genome whole-population time-course sequencing, we thawed each population at 18 time points from generation 0 to 2,000, and transferred 10 µl into 5 ml of YPD. We made genomic DNA preparations as above.

We followed a modified version of the Nextera sequencing library preparation protocol [30], modified further as described in Ref. 20. We used the Nextera sequencing library preparation kit and protocol to isolate total genomic DNA and add the unique Nextera library barcodes to all 48 samples. We measured the concentration of DNA in each sample using a NanoDrop spectrophotometer and confirmed these values using a Qubit fluorometer. We equalized the DNA concentration of each sample via dilution and mixed all 48 samples into a single pool. We used a BioAnalyzer High-Sensitivity DNA Chip (BioAnalyzer 2100, Agilent) to confirm that the pool contained the appropriate length DNA fragments, and performed gel extraction on the pool to remove short fragments. The pool was run on an Illumina HiSeq 2500 sequencer with 157 nucleotide single-end reads by the Sequencing Core Facility within the Lewis-Sigler Institute for Integrative Genomics at Princeton University. After an initial sequencing run provided us with the number of reads from each sample, we remixed the pool to better represent underrepresented samples and it was resequenced.

Sequencing Analysis Pipeline
The raw sequencing information was first merged from 3 lanes of sequencing via concatenation. This single file was split into 48 files by the barcodes corresponding to each sample using a custom Python script (barcode_splitter.py) from L. Parsons (Princeton University). Each sample was aligned to the same customized W303 genome based on the S288C genome available on SGD using the Burrows-Wheeler Aligner (BWA, Version 0.7.12), using default parameters except "Disallow an indel within INT bp towards the ends" set to 0 and "Gap open penalty" set to 5, creating both a .bam and .bai file for each. Variants were called using the FreeBayes variant caller (Version 0.9.21-24-g840b412) and merged together into a single spreadsheet. Variants that existed only in paired clones (448 mutations total) were annotated manually using the Integrated Genome Viewer (IGV) and false calls were removed. This fully annotated table (containing 383 mutations) is available in Table S1.

For whole-genome whole-population time-course sequencing, we used the same Nextera sequencing library preparation protocol as described above, with the following changes: Instead of two clones from each of 24 populations, we isolated genomic DNA from whole populations at 18 time points for two populations for a total of 36 samples. During the sequencing analysis, after splitting the reads based on the Nextera barcodes, we used a script (clipper.sh) to remove any Nextera adapter sequences introduced by sequencing short fragments. We used a previously-described set of scripts (allele_counts.pl and composite_scores.pl) to call real mutations [13].
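The two non-default alignment settings quoted above match the option descriptions of the `bwa aln` subcommand (`-i`, "Disallow an indel within INT bp towards the ends", and `-O`, "Gap open penalty"). The Python sketch below builds the corresponding command lines for one single-end sample. It is an inference from those descriptions and not the authors' script: the use of `bwa aln` followed by `bwa samse` is an assumption, the file names are hypothetical, and the conversion to sorted, indexed .bam files is left out.

```python
import shlex


def bwa_single_end_commands(reference, fastq, prefix):
    """Build (but do not run) bwa aln / samse commands for one sample.

    Assumes the reference has already been indexed with `bwa index`.
    """
    sai = f"{prefix}.sai"
    sam = f"{prefix}.sam"
    # -i 0: allow indels right up to the read ends (default is 5 bp).
    # -O 5: gap open penalty of 5 (default is 11).
    aln = ["bwa", "aln", "-i", "0", "-O", "5", "-f", sai, reference, fastq]
    # samse converts single-end alignment coordinates into SAM records.
    samse = ["bwa", "samse", "-f", sam, reference, sai, fastq]
    return [aln, samse]


# Hypothetical file names, for illustration only.
commands = bwa_single_end_commands("W303_custom.fa", "sample01.fastq", "sample01")
for command in commands:
    print(shlex.join(command))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting problems if the sketch is extended to execute them.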

Reconstruction Experiments
We used three mutations previously identified in evolved haploids (mutations to gas1, kre5, and kre6), and one mutation identified in a diploid population (mutation to cts1). We used CRISPR/Cas9 genome editing to reconstruct our evolved cts1 mutation in our ancestral haploid background. The other three mutations were previously reconstructed as single mutants in our haploid background by Matt Remillard (Princeton University), who generously shared these resources with us. These four haploid MATa strains were crossed to our haploid MATα ancestor to construct heterozygous diploids for each mutation. These heterozygous diploids were sporulated, tetrads were dissected, and haploid spores were mating-type tested and Sanger-sequenced to determine which spores contained the mutation but were of opposite mating type. These haploid spores were then crossed to create diploids homozygous for each mutation. Fitness of each of these twelve total samples was measured via competitive fitness assays across seven replicates along with control haploids and diploids with no mutations, all against the appropriate fluorescent ancestor strains.

[…] wild-type haploids (light grey empty circle) and wild-type diploids (dark grey empty circle). The corresponding gene mutated in each is listed at the top. The cts1 mutation is from the F05 population of our diploid data (Table S1). The gas1, kre5, and kre6 mutations are from our haploid populations BYS2-D06, RMB2-B10, and RMS1-H08, respectively [14]. Each point is an average of seven replicates, or six replicates for the wild-type points. Error bars are the standard error of these averages.
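The coefficients of dominance reported for the reconstructed alleles relate the heterozygous and homozygous fitness effects measured in these assays. A minimal Python sketch, assuming the standard definition h = s_het / s_hom (the text does not spell out the formula), is given below; the fitness values used are hypothetical and are not the measurements from this study.

```python
def dominance_coefficient(s_het, s_hom, tolerance=1e-12):
    """Coefficient of dominance h, assuming h = s_het / s_hom.

    h near 0 indicates a recessive mutation, h near 1 a fully dominant one,
    and h above 1 an overdominant one.
    """
    if abs(s_hom) < tolerance:
        raise ValueError("h is undefined when the homozygous effect is zero")
    return s_het / s_hom


# Hypothetical fitness effects relative to the wild-type diploid.
partially_dominant = dominance_coefficient(s_het=0.017, s_hom=0.05)
fully_dominant = dominance_coefficient(s_het=0.03, s_hom=0.03)
print(round(partially_dominant, 2), round(fully_dominant, 2))
```

Because h is a ratio of two measured quantities, its uncertainty grows quickly when the homozygous effect is small, so values for weakly beneficial alleles should be read with that in mind.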