Selection-Enriched Genomic Loci (SEGL) Reveals Genetic Loci for Environmental Adaptation and Photosynthetic Productivity in Chlamydomonas reinhardtii

This work demonstrates an approach to produce and select hybrid algal strains exhibiting increased photosynthetic productivity under multiple environmental conditions. This simultaneously addresses two major impediments to improving algal bioenergy production: 1) generating new genetic variants with improved performance; and 2) disentangling complex interactions between genetic and physiological factors contributing to these improvements. We pooled progeny generated from mating two environmental isolates of the green alga Chlamydomonas reinhardtii and cultured the pools under multiple environmental conditions. Strains from the outcompeting populations showed substantial (in some cases over 3-fold) increases in productivity over the parental lines under certain environments related to biomass production, including laboratory conditions as well as hyperoxia, fluctuating light, high salinity and high temperature. The results indicate that C. reinhardtii has remarkable, untapped, directed evolution capacity that may be harnessed using breeding and competition approaches. The populations were deep sequenced at multiple time points to identify “Selection-Enriched Genomic Loci” (SEGL) that accumulated in the populations, and thus likely confer increased fitness under the respective environmental conditions. With improved resolution, SEGL mapping can identify allelic combinations for use in targeted breeding approaches to generate elite algal lines with multiple desirable traits, and can further our understanding of the genetic and mechanistic bases of photosynthetic productivity.

Significance Statement
Increasing the photosynthetic efficiency of algae during biomass production is perhaps the most critical hurdle for economically sustainable algal-based biofuels. This presents unique challenges because modifications designed to increase photosynthesis often result in decreased fitness, due to production of toxic reactive oxygen species when photosynthesis is not adequately regulated. These problems are exacerbated under natural and outdoor production environments because of the complex nature of photosynthesis and the multifaceted interactions between genetic, environmental and physiological factors. Here, we demonstrate a high-throughput biotechnological screening approach that simultaneously produces algal strains with highly increased autotrophic productivity and identifies genomic loci contributing to these improvements. Our results demonstrate that Chlamydomonas reinhardtii exhibits high directed evolutionary capacity readily accessed through breeding and selection.

We screened a series of natural isolates and laboratory strains of Chlamydomonas for productivity under three well-defined conditions: (1) a baseline condition that mimics a natural solar day (BC; 5% CO2 in air, 14:10 light:dark cycle with zenith at noontime); (2) hyperoxia (HO), as is often encountered under mass production conditions (64) (5% CO2 in O2, other conditions as in BC); and (3) light stress (LS; BC days followed by three days of very low light), to favor lines that can rapidly store fixed carbon on "light replete days" and maintain growth during the "light starvation" days (see Materials and Methods and Fig. S1). One pair of isolates, CC1009 and CC2343, exhibited similar growth under BC but substantially different growth under HO and LS conditions (Fig. 1A). Under LS, the productivity of CC1009 fell by 20% during the light replete days, whereas growth of CC2343 was almost completely inhibited. Neither line was able to grow on the light starvation days. Under HO conditions, CC1009 lost 66% of its productivity, whereas CC2343 lost 87%.

To generate a diversity panel of progeny, we crossed CC1009 (mt-) with CC2343 (mt+) (Fig. 1B). Using a refined list of single nucleotide polymorphisms (SNPs) identified from the parental strains (see SI Materials and Methods), we mapped the CC2343 allele frequency of two independently generated inoculum populations used for competition experiments (Fig. 1C). The similar allele frequency distributions across the genomes of the F1 inoculums show that population pooling, deep sequencing and the methods used to quantify SNP frequencies generated reproducible results. Excluding chromosome 6, the CC2343 allele frequency varied between 0.35 and 0.5 across the genome. Averaging all of the allele frequencies across the genome gave frequencies of 0.42 and 0.58 for CC2343 and CC1009, respectively, indicating a slight bias for CC1009 loci within the population. By contrast, the 700 kb segment of DNA at the beginning of chromosome 6, corresponding to the mating type locus (65, 66), showed strong selection for CC1009 loci. This served as an internal positive control: because we exclusively selected mt- strains for the F1 competition experiments, the population should be essentially homozygous for the CC1009 mt- mating type locus, while nearby (linked) loci should strongly favor CC1009 (Fig. 1D, blue shaded area). CC1009 loci were progressively lost moving away from the mating type locus, indicating that crossover events must have occurred following mating. The largest changes of allele frequency on chromosome 6 occurred in two distinct regions, together totaling <1 Mb (Fig. 1D, grey shaded regions), suggesting the possibility of recombination hotspots.
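The allele frequency averaging described above can be illustrated with a minimal sketch. It assumes that each SNP is summarized by the number of pooled reads supporting the CC2343 and CC1009 alleles; the read counts, tuple layout and function names below are hypothetical and are not taken from the SNP calling pipeline used in this study.

```python
from statistics import mean


def allele_frequency(cc2343_reads, cc1009_reads):
    """Fraction of pooled reads carrying the CC2343 allele at one SNP."""
    total = cc2343_reads + cc1009_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return cc2343_reads / total


def genome_wide_mean(snps, exclude_chromosomes=()):
    """Average the per-SNP CC2343 frequency, skipping excluded chromosomes."""
    freqs = [
        allele_frequency(cc2343, cc1009)
        for chrom, _pos, cc2343, cc1009 in snps
        if chrom not in exclude_chromosomes
    ]
    return mean(freqs)


# Hypothetical counts: (chromosome, position, CC2343 reads, CC1009 reads)
snps = [
    ("chr1", 10_000, 42, 58),
    ("chr1", 70_000, 40, 60),
    ("chr2", 25_000, 45, 55),
    ("chr6", 300_000, 1, 99),  # near the mating type locus, almost all CC1009
]

cc2343_mean = genome_wide_mean(snps, exclude_chromosomes={"chr6"})
print(f"CC2343: {cc2343_mean:.2f}, CC1009: {1 - cc2343_mean:.2f}")
```

Excluding chromosome 6 in this toy example mirrors the way the mating type region is treated as a special case in the text; passing no exclusions averages over every SNP.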

Competitive growth under different conditions selects for distinct combinations of genomic loci
Pooled F1 inoculums were cultured in environmental photobioreactors (67) and grown under either BC, HO or LS conditions, and samples were collected periodically for deep sequencing during each competition (see Table S1 and [...]

Despite the fact that BC was designed as a low stress condition, it imposed strong selection for specific loci throughout the genome (Figs. 2A, S2, S4). However, the aggregated frequency of parental alleles across the genome showed only a slight (~2.3%) preference for CC2343 at the end of the competition (Table S2). Competitions under the "harsher" conditions of HO and LS resulted in stronger enrichment for alleles from CC1009 over CC2343, and the final populations showed 15% and 3% increases in CC1009 alleles, respectively (Table S2). These results likely reflect selection for hybrid progeny carrying CC1009-derived alleles that confer higher stress tolerance. Nonetheless, some CC2343 alleles were enriched as well, indicating that novel genetic combinations from both strains contributed to the fitness of the final population.

The kinetics (or time-dependencies) of allele frequency changes within the populations followed distinct patterns for different regions and conditions. Some regions showed gradual increases throughout the experiment (e.g. chromosome 10 under BC, see Fig. 2A, arrow 1), whereas chromosome 3 showed rapid initial selection in all conditions (e.g. Fig. 2A, arrow 2). In some cases, only part of the chromosome was selected for, while other regions remained unchanged (e.g. Fig. 2A, arrow 3), and in other cases, an initial selection for one locus was reversed at later times (Fig. 2C, arrows B). We interpret these complex behaviors as reflecting initial selection for the most impactful loci, followed by selection for secondary effects of allelic combinations that can sometimes be maladaptive. Under LS, the overall rates of change were slower than under BC or HO, most likely due to reduced cell divisions during the low light days (see Table S2).

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.06.451237; this version posted July 6, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Comparing the variation in allele frequencies across our biological replicates, we calculated the LOD scores for each 60 bp genomic window to identify SEGLs (see Materials and Methods). Using a highly conservative threshold of LOD>14, we identified 11, 12 and 52 SEGLs for the LS, BC and HO conditions, respectively. The SEGLs ranged in size between 64 kb and the entirety of chromosome 10 at 6.5 Mb for BC conditions, but the median size of all identified SEGLs was 144 kb. Some mapped SEGLs were common between all conditions, such as the far-right side of chromosome 3 and a 140 kb region of chromosome 9. However, each environmental condition led to selection for specific allelic combinations in each [...]

We should note, however, that most chromosomes can be divided into multiple sub-regions with substantial allele frequency changes, reminiscent of haplotype blocks. For example, on chromosome 6 of the HO population (at day 21), there were at least six sub-regions with allele frequency changes of approximately 0, -0.11, -0.02, 0.10, 0.18, and 0.10 from left to right (Fig. 2C). These sub-regions show significant differences in LOD scores (> 3) from their neighboring regions and often show distinct accumulation kinetics over time between environments (e.g. Fig. 2C, arrows), supporting the view that they represent distinct allelic combinations that contribute differentially to selection. While it is tempting to suggest that distinct loci affecting fitness fall within these regions, additional resolution is needed to further break down large SEGL regions and narrow down the contributing genes.
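The thresholding step used to call SEGLs can be sketched as follows. This is only an illustration under stated assumptions: LOD scores are taken as already computed per window, and adjacent windows above the threshold are assumed to be merged into a single locus. The merging rule, the window coordinates and the function name `call_segl` are hypothetical; the actual procedure is described in the Materials and Methods.

```python
def call_segl(windows, threshold=14.0):
    """Merge consecutive windows whose LOD exceeds the threshold.

    windows: list of (chromosome, start, end, lod), sorted by chromosome
    and start position. Returns a list of (chromosome, start, end, peak_lod).
    """
    loci = []
    current = None
    for chrom, start, end, lod in windows:
        if lod > threshold:
            # Extend the open locus only if this window is directly adjacent.
            if current and current[0] == chrom and current[2] == start:
                current = (chrom, current[1], end, max(current[3], lod))
            else:
                if current:
                    loci.append(current)
                current = (chrom, start, end, lod)
        elif current:
            loci.append(current)
            current = None
    if current:
        loci.append(current)
    return loci


# Hypothetical windows and LOD scores, for illustration only.
windows = [
    ("chr3", 0, 50_000, 2.1),
    ("chr3", 50_000, 100_000, 15.2),
    ("chr3", 100_000, 150_000, 18.7),
    ("chr3", 150_000, 200_000, 9.0),
    ("chr9", 0, 50_000, 16.4),
]

for chrom, start, end, peak in call_segl(windows):
    print(f"{chrom}:{start}-{end} size={end - start} peak LOD={peak}")
```

The same function with a lower threshold (for example `threshold=3.0`) could be used to explore the sub-regions discussed above, at the cost of more false positives.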

Mating-induced genomic diversity following an F1 cross and the effects on selection and SEGL mapping resolution.
To better understand how increased population size and the crossover events that generate distinct allelic combinations affect our SEGL mapping potential, we generated an F2 population of >240 lines by intercrossing F1 progeny (Fig. 3A) from two dissected tetrads. The F1 progeny used to generate the F2 lines were sequenced, and the contributing CC1009 and CC2343 loci were mapped (Figs. 3B and S7). The F1 progeny showed an average of about 13 crossover events for each cell, distributed over the 17 chromosomes, giving us an estimate for the rate of genetic diversification during meiosis in Chlamydomonas. There were multiple crossover events at common loci between the two meiotic events on chromosomes 9, 13, 14, and 17 (Fig. S7), suggesting again the possibility of recombination hotspots, as observed in many multicellular plant species (68). Surprisingly, the distribution of CC1009 and CC2343 allele frequencies in the pooled F2 inoculum deviated from the expectation of equal contribution from each of the F1 parents (Fig. 3D). These deviations showed transitions at distinct points on the genome, many of which coincided with crossover locations in the contributing F1 lines (e.g. compare the arrows in Fig. 3C for transitions on chromosome 2 with specific crossover events in the F1 lines, arrows in Fig. 3B). These effects suggest that the second mating itself led to selection for certain genomic loci, perhaps because of genomic incompatibility causing loss of viability, limiting the overall diversity of the population. It is also possible that, at the time of library pooling, some lines had gone through S-phase and contained twice the number of chromosomes. In any case, the results show that one cannot assume complete segregation of parental loci, and it is critical to compare enrichment results to the initial allele frequency distributions.

The F2 population (Fig. 3D) was competed under BC, HO and LS conditions, and samples were collected at days 8, 16 and 21 for deep sequencing to track the allele frequency of each population (Tables S1 and S2). The allele frequencies between replicate populations were highly reproducible for all three conditions (Figs. S8-S10) and showed high degrees of correlation (Fig. S11). As with the F1 pool, competing the F2 population under different environmental conditions led to enrichment of distinct combinations of genomic regions, but with some important differences. The F1 competitions resulted in approximately Gaussian distributions of allele frequencies, implying that the final pool contained a range of genetic variants that were able to compete relatively evenly. By contrast, the F2 competition, particularly under BC, led to selection for regions from mostly one progenitor or the other (Fig. S12), resulting in a bimodal distribution of allele frequencies that prevented us from obtaining SEGL mapping of the F2 population (see Fig. S8, day 21). The bimodal allele frequency distributions (Figs. S8-S10, S12) are consistent with a strong founder effect and suggest that the competitive advantage of individuals in the F2 populations likely resulted from specific allelic combinations inherited from the F1 parental lines. Supporting this interpretation, the F2 competed populations retained stretches of chromosomes that contain the crossover positions from the individual dissected F1 tetrads (Fig. S13).

Interestingly, each environmental condition selected for different genomic regions from the F1 parents. For example, BC selected almost exclusively for genomic regions matching crossovers in a single F1 tetrad parent (termed F1_5_4, see Fig. S13, panel A). To a lesser extent, HO selected for loci from a different tetrad parent (F1_1_2) (see, e.g., the arrows in Fig. S13, panels A and B, indicating abrupt changes in allele frequency in chromosomes 1, 3, 7, 9, 11, 13, and 16). LS produced a population with the highest genomic diversity, as shown by the more Gaussian distributions of allele frequency (Fig. S11, panel C) and clear contributions from at least two tetrad parents (F1_5_3 and F1_5_4).
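The expectation of equal contribution from each F1 parent, against which the F2 inoculum is compared, can be written out as a short sketch. It assumes haploid F1 parents genotyped as 0 (CC1009) or 1 (CC2343) at shared markers, that each parent of a cross transmits its allele to half of the progeny, and that every cross contributes equally to the pool. The genotypes, mating type assignments and observed values below are hypothetical.

```python
from itertools import product


def expected_pool_frequency(crosses):
    """Expected CC2343 allele frequency per marker in a pool of F2 progeny.

    crosses: list of (parent_a, parent_b), each parent a list of 0/1
    genotypes (1 = CC2343 allele) at the same markers.
    """
    n_markers = len(crosses[0][0])
    expected = []
    for i in range(n_markers):
        per_cross = [(a[i] + b[i]) / 2 for a, b in crosses]
        expected.append(sum(per_cross) / len(per_cross))
    return expected


def within_tetrad_crosses(plus_lines, minus_lines):
    """Pair every mt+ line with every mt- line from the same tetrad."""
    return list(product(plus_lines, minus_lines))


# Hypothetical genotypes at four markers for two dissected tetrads.
tetrad_1 = within_tetrad_crosses([[1, 1, 0, 0], [0, 0, 1, 1]],
                                 [[1, 0, 0, 1], [0, 1, 1, 0]])
tetrad_5 = within_tetrad_crosses([[1, 1, 1, 0], [0, 0, 0, 1]],
                                 [[1, 0, 1, 1], [0, 1, 0, 0]])

expected = expected_pool_frequency(tetrad_1 + tetrad_5)
observed = [0.62, 0.48, 0.35, 0.51]  # hypothetical pooled measurements
deviation = [round(o - e, 2) for o, e in zip(observed, expected)]
print(expected, deviation)
```

Under these assumptions, markers that segregate 2:2 within a complete tetrad give an expected frequency of 0.5, so any systematic departure in the observed pool points to unequal contribution or viability differences among the crosses.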

Chlamydomonas shows strong evolutionary capacity.
For breeding to effectively improve industrial algal production, it is critical that some meiotic progeny are more productive than their parent lines. Indeed, the growth rates (doublings day⁻¹) of the competing, co-cultured F1 and F2 populations under BC and HO eventually surpassed that of either of the parental lines under the same conditions, suggesting that algal fitness was a transgressive trait and that some of the outcompeting progeny possessed novel allelic combinations of different loci (Fig. S14). We then screened a subset of outcompeting lines, isolated at the end of the F1 and F2 competitions. Strikingly, the majority of winners from both F1 and F2 populations showed higher productivity (or tolerance) than either of the original parent lines under the respective competition conditions (Figs. 5A-C, S15-S16). The best performing winners from the F1 competition under BC and HO showed 20% and 210% increases in biomass productivity, respectively, compared to the more productive progenitor line. Similar trends were observed with selected F2 winners, which had varied degrees of improvement (see Figs. S15 and S16). These results indicate that new combinations of genes, shuffled from both parents, led to individuals with more optimal phenotypes, and that strains exhibiting these traits underwent positive selection.

We also tested for tradeoffs in performance by growing outcompeting lines selected under one condition under the other conditions. Lines selected under HO showed similar or even more robust growth under BC conditions (Figs. S15 B and E), suggesting that HO tolerance does not trade off against performance under BC. Also, nearly all lines selected under LS showed increased tolerance (a higher ratio of LS:BC productivity) and productivity under LS conditions, but all had decreased fitness under BC (Fig. S15 G, H). Overall, these results suggest that, while some traits selected for in our competitions may incur tradeoffs under other conditions, others appear to be compatible across multiple conditions relevant to the autotrophic mass production of algae.

To further explore the evolutionary capacity of Chlamydomonas, we streamlined our mating and selection process by hatching pools of hundreds to thousands of isolated F3 zygotes during, or just prior to, exposure to a new set of selection conditions. In the first experiment, pooled zygotes were hatched prior to exposure to harsh pond-like conditions, with fluctuating temperatures (between 12 and 44 °C) and high light (Fig. S16). Under these conditions, all of the selected winners performed better than the poorer performing parent line, CC2343, and one showed a statistically significant increase (~33%) over the better performing parent, CC1009 (Fig. 5D).

In a follow-up experiment, F3 zygotes were hatched directly under high salt conditions (same as BC but with temperatures held at 28 °C and 20 g/L of Instant Ocean salts, see Materials and Methods), followed by culturing in turbidostat mode under these conditions for eight days prior to isolating lines from the outcompeting population. Seven randomly selected outcompeting lines showed increases in productivity under high salt conditions ranging from 20% to 83% compared to the more productive of the progenitor lines (CC2343) (Fig. 5E). For comparison, we also randomly selected seventeen F2 progeny prior to selection and grew them under high salt conditions, and found growth rates ranging from zero (i.e. lethal) to 45% above that of the parent lines (Fig. 5E). This experiment clearly demonstrates that polyculture competition pre-screens strains for increased fitness prior to monoculture phenotyping.

Taken together, these data demonstrate that through breeding and selection of Chlamydomonas, segregants can be generated that have increased fitness, with traits that allow higher productivity under simulated production conditions, including tolerance for high temperatures, salinity, and high oxygen conditions.
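The comparison metrics used in this section (doublings per day, percent gain over the more productive parent, and tolerance as the ratio of stress to BC productivity) are simple ratios, and a small sketch may help to show how they differ. The function names and all numerical values below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def doublings_per_day(start_density, end_density, days):
    """Growth rate expressed as the number of population doublings per day."""
    return math.log2(end_density / start_density) / days


def percent_gain_over_best_parent(progeny, parents):
    """Percent productivity increase relative to the more productive parent."""
    best = max(parents)
    return 100.0 * (progeny - best) / best


def tolerance(stress_productivity, baseline_productivity):
    """Tolerance as the ratio of productivity under stress to that under BC."""
    return stress_productivity / baseline_productivity


# Hypothetical productivities (arbitrary units), for illustration only.
parent_bc, parent_ls = 10.0, 4.0
winner_bc, winner_ls = 9.0, 5.4

gain_ls = percent_gain_over_best_parent(winner_ls, [parent_ls])
tol_parent = tolerance(parent_ls, parent_bc)
tol_winner = tolerance(winner_ls, winner_bc)
tol_gain = 100.0 * (tol_winner - tol_parent) / tol_parent

print(f"LS productivity gain: {gain_ls:.0f}%")
print(f"LS tolerance gain: {tol_gain:.0f}%")
print(f"Doublings per day: {doublings_per_day(1.0, 8.0, 2):.2f}")
```

In this toy case the productivity gain under LS and the tolerance gain differ because the winner is also less productive under BC, which is the kind of tradeoff described above for the LS-selected lines.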

We demonstrate that breeding natural variants of Chlamydomonas, followed by selection under polyculture conditions, can streamline enrichment for segregants showing higher growth and fitness under a wide range of environmental challenges. We also show that quantitative genomics approaches, such as BSA or SEGL, can be used to identify genomic loci that reflect the genetic bases for the observed increased fitness. In this study, SEGL regions in the F1 population span from 60 kb to over 1.2 Mb, encoding from 10 to over 2000 genes. Thus, although the resolution was too low to identify specific genes linked to increased productivity, results from both the F1 and F2 competitions suggest that increased SEGL resolution can be obtained by generating larger libraries of primary and secondary crossover events, followed by environmental selection.

Although we cannot rule out that only a few loci are responsible for the observed improvements in productivity of the winners under any condition, the number of genetic loci, or quantitative trait loci (QTL), that underlie even a single trait related to domestication in other species has often been difficult to estimate (20). Even for single traits in maize, the number of QTL ranges from 6 to 26 (27). One study suggested that nearly 500 genomic regions, spanning an estimated 2,000 genes, contributed to domestication of maize (69). This suggests that, as in many crop plants (70), the highest levels of improvement in algal photosynthesis may require stacking of many genetic variations with small effects.

For Chlamydomonas, a library of at least 12 natural isolates and 25 lab strains has been sequenced, and phenotypic analysis of the strains revealed substantial phenotypic variation between isolates and lab-reared strains (54, 71-73), indicating that there is significant genetic diversity to drive artificial breeding efforts. It has been shown in Chlamydomonas that selection under mixotrophic or high light conditions can lead to higher fitness after mutagenesis (74-76), as well as higher levels of halotolerance after sexual recombination and selection (77-79). Thus, it appears that modern genomics-targeted breeding and phenotypic selection can form the basis for rapid improvement of photosynthetic productivity and identification of the enabling allelic combinations in order to domesticate an alga.

Crossing- and selection-induced gains in productivity observed under one condition did not necessarily impose tradeoffs of decreased productivity, and in some cases appeared to increase productivity, under other conditions, suggesting that the approach may be used to stack multiple productivity and fitness traits for a specific set of environments. The system is highly scalable through high-throughput variant generation and bulk selection, and thus the approach may be especially relevant for generating production strains without the requirement for marker genes, as is the case in conventional genetic [...]
Figure 3. F1 recombination events shape F2 population genome structure. (A) Shows the breeding paradigm used to generate the F2 progeny library. Two F1 tetrads were dissected, each progeny was crossed with the two progeny of opposite mating type from the same tetrad, and 30 F2 progeny from each cross were pooled to generate the F2 progeny library of ~240 lines. (B) Shows the offset allele frequencies relative to CC2343 of chromosome 2 for the dissected tetrad progeny, which are the F1 progeny used to generate the F2 population. The allele frequencies range from 0 to 1 and are centered on the horizontal dotted lines at a relative allele frequency of 0.5. (C) Shows the chromosome 2 allele frequency of the 240 pooled F2 lines used as the F2 inoculum. (D) The allele frequency across the genome of the F2 inoculum (black line) deviates from theoretical values (red line). The theoretical allele frequency was calculated by averaging the allele frequency from each F1 progeny cross shown in panel A.
Figure 5. Through breeding and selection, Chlamydomonas shows high degrees of phenotypic plasticity. Step 1 is to generate genetic diversity through breeding divergent lines. Step 2 is to compete the lines under polyculture conditions. Step 3 is to isolate and screen the surviving progeny for increased productivity. (A) Shows average daily productivities of isolates from the F1_BC population (grey bars) and the parental strains CC1009 and CC2343 (dark blue bars for all panels). (B) The LS tolerance of surviving isolates of the F2_LS population (pink bars) is shown. (C) Shows the HO tolerance of F1_HO survivors (light blue bars) compared to the parental strains. (D) The productivity of selected progeny (red bars) compared to the parental strains after an environmental simulation is shown. Error bars represent the standard deviation of (from left to right) n = 4, 5, 5, 5, 4, 4, 4, 6 biological replicates. (E) Summarizes a screen of progeny productivity in media containing 20 g/L of Instant Ocean. The green bars are strains isolated after hatching and selection under 20 g/L of Instant Ocean salts, red bars are random F2 progeny and blue bars are CC2343 and CC1009. Error bars represent the standard deviation between daily average productivities. It should be noted that the observed percent productivity increases under selection conditions are not equal to tolerance increases.
Table S1. Summarizes the genome coverage for each deep sequencing sample. Samples in red denote poor sequence alignments to the reference genome and failed the SNP calling pipeline.

Supplemental Figures
Figure S6. A high degree of reproducibility is shown using a matrix of Pearson correlation coefficients between the allele frequencies of replicate F1 populations exposed to the same environmental selection.

. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 6, 2021. ; https://doi.org/10.1101/2021.07.06.451237 doi: bioRxiv preprint 569 Figure S7. Genomic maps of the daughter cells resulting from two independent meiotic events are shown. Daughters 1_1 through 1_4 are from one meiotic event and 5_1 through 5_4 are from the second. The allele frequency is relative to CC2343 and the range of each shaded horizontal bar in is from 0 to 1, the dashed horizontal lines represent an allele frequency of 0.5. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 6, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 6, 2021. ; https://doi.org/10.1101/2021.07.06.451237 doi: bioRxiv preprint 576 577 Figure S11. A high degree of reproducibility is shown using a matrix of Pierson Correlation Coefficients between the allele frequencies of replicate F2 populations exposed to the same environmental selection. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 6, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. 
Figure S14. Populations showed increased doubling rates compared to parental lines. Doublings per day for F1 and F2 BC and HO populations were averaged from the last 5 full days of culturing. Error bars represent the standard deviation between biological replicates where, from left to right, n = 8, 8, 5, 5, 5, 5, 3, 3, 5, 5, 3, 3. For the statistical analysis a two-tailed t-distribution test was used; * indicates a p-value less than 0.01 and ψ indicates a p-value less than 0.0005 for differences in doublings per day between each sample and CC1009 under the same conditions.

Figure S16. Strong heterosis persists in lines through multiple biological replicates. Panel A shows the daily productivity (in g of ash-free dry weight produced per m² of incident light) of the progenitor lines (grey bars) and choice F1 meiotic progeny (black bars) isolated after 30 days of polyculture under BC conditions. From left to right, n = 8, 8, 3, 3, 3.
Panel B shows the average daily productivity under BC conditions (grey and black bars) and HO conditions (pink and light red bars) of the parental and choice F1 progeny (respectively) isolated after 30 days of polyculture under HO conditions. From left to right, n = 8, 5, 8, 5, 4, 4, 4, 3, 4, 3. Panel C shows the oxygen tolerance of the parental lines (dark blue bars) and the selected F1_HO survivors (dark grey bars) shown in panel B. Panel D shows the productivity of the progenitor lines (grey bars) and selected F2_BC survivors (black bars) isolated after 21 days of polyculture under BC conditions. From left to right, n = 8, 8, 4, 4, 4. Panel E shows the average daily productivity under BC conditions (grey and black bars) and HO conditions (pink and red bars) of the parental and selected F2_HO progeny (respectively) isolated after 21 days of polyculture. From left to right, n = 8, 5, 8, 5, 4, 4, 4, 4, 4, 3. Panel F shows the oxygen tolerance of the parental lines (light blue bars) and the selected F2_HO survivors (dark blue bars) shown in panel E. Panel G shows the average daily productivity under BC conditions (grey and black bars) and LS conditions (lime and green bars) of the parental and chosen F2_LS progeny (respectively) isolated after 16 days of polyculture. From left to right, n = 8, 5, 8, 5, 4, 4, 4, 4, 4, 4. Panel H summarizes the light stress tolerance of the lines shown in panel G; pink bars represent the progenitor lines and purple bars represent F2_LS survivors. Error bars represent the standard deviation of the averaged daily growth for each biological replicate. A single asterisk denotes a maximum p-value of 0.05 from a two-tailed t-distribution test, double asterisks denote a maximum p-value of 0.005, and ‡ represents a maximum p-value of 2 × 10⁻⁵.
Figure S17. The light intensity (black line) and temperature (red line) during the environmental simulation selection are shown.
chlorophyll ml⁻¹. The ePBR culture height was set to 15 cm using a volume of 330 ml of 2NBH medium. For individual phenotyping conditions, cultures were pre-conditioned by growing them in ePBRs to a chlorophyll concentration of 4 µg ml⁻¹ and maintaining them in turbidostat mode under the standard light conditions for at least 3 days prior to measuring productivity. For the light stress (LS) and high oxygen (HO) competition experiments, the pre-conditioning time was reduced to a single day to avoid imposing long-term selection under the baseline conditions (BC). For the BC and HO conditions, standard illumination was provided on a 14:10 hour (light:dark) diurnal cycle simulating a cloudless day, with light intensity ascending to a zenith with a maximum photosynthetically active radiation (PAR) of 2000 µmol photons m⁻² s⁻¹ and descending until dark, delivered in a sinusoidal form, as illustrated in Figure S1. For the LS regime, the standard illumination days were alternated with a series of three "light starvation" days, which consisted of a simple 14:10 hour rectangular wave with a PAR intensity of 50 µmol photons m⁻² s⁻¹ (Figure S1). All cultures were stirred at 200 rpm using a 28.6 mm by 8 mm Teflon-coated stir bar. Gas for the BC and LS conditions was 5% CO2 in air and gas for HO was 5% CO2 in O2. Gas was delivered through a 5 mm gas dispersion stone with a porosity of 10-20 microns at a flow rate of 350 ml/min for 60 seconds every hour. Culture temperatures were maintained at room temperature (RT) for the F1 and F2 competitions and at 25 °C for monoculture phenotyping of parental lines and competition survivors.
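The two illumination regimes described above can be sketched numerically. The following Python is a minimal illustration, assuming the sinusoidal day is a half-sine over the 14 h light period that peaks at 2000 µmol photons m⁻² s⁻¹; the exact waveform is the one shown in Figure S1, which is not reproduced here. The function names and the one-minute integration step are hypothetical choices for this example, not part of the ePBR control software.

```python
import math

LIGHT_HOURS = 14.0      # light period of the 14:10 h cycle
DAY_HOURS = 24.0
PEAK_PAR = 2000.0       # µmol photons m^-2 s^-1 at the zenith (standard day)
STARVATION_PAR = 50.0   # µmol photons m^-2 s^-1 ("light starvation" day)


def standard_par(hour):
    """PAR on a standard day, ASSUMING a half-sine over the light period."""
    t = hour % DAY_HOURS
    if t >= LIGHT_HOURS:
        return 0.0
    return PEAK_PAR * math.sin(math.pi * t / LIGHT_HOURS)


def starvation_par(hour):
    """PAR on a light starvation day: a 14:10 h rectangular wave."""
    t = hour % DAY_HOURS
    return STARVATION_PAR if t < LIGHT_HOURS else 0.0


def daily_photon_dose(par_fn, step_s=60):
    """Integrate PAR over 24 h (midpoint rule); mol photons m^-2 day^-1."""
    steps = int(DAY_HOURS * 3600 / step_s)
    total_umol = sum(
        par_fn((i + 0.5) * step_s / 3600.0) * step_s for i in range(steps)
    )
    return total_umol / 1e6


standard_dose = daily_photon_dose(standard_par)
starvation_dose = daily_photon_dose(starvation_par)
print(f"standard day:   {standard_dose:.2f} mol photons m^-2")
print(f"starvation day: {starvation_dose:.2f} mol photons m^-2")
```

Under these assumptions a standard day delivers roughly 64 mol photons m⁻², against about 2.5 mol photons m⁻² on a light starvation day, which gives a sense of how sharply the LS regime cuts the available light.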
The ePBR vessel, stirring and gas delivery were the same as in the BC and LS experiments described above. The temperature and light intensity for the pond-like environmental simulation are shown in Figure S17. For the high salt selections and screening, the ePBR configuration was the same as for the BC conditions, except that the medium contained 20 g/L of Instant Ocean salts and the temperature was maintained at 28 °C. It should be noted that the light for the ePBR in both the pond-like simulation and the high salt selection was the next-generation ePBR light, a 50 W white LED with an optical collimator.