Genomic analysis of fast expanding bacteria reveals new molecular adaptive mechanisms

Bacterial populations have been shown to accumulate deleterious mutations during spatial expansions that overall decrease their fitness and ability to grow. However, it is unclear if and how they can respond to selection in the face of this mutation load. We examine here whether artificial selection can counteract the negative effects of range expansions. We investigated the molecular evolution of 20 lines (SEL) selected for fast expansion and compared them to 20 lines without artificial selection (CONTROL). We find that all 20 SEL lines were able to increase their expansion speed relative to the ancestral line, unlike CONTROL lines, showing that enough beneficial mutations are produced during spatial expansions to counteract the negative effect of expansion load. Importantly, SEL and CONTROL lines have similar numbers of mutations, indicating that they evolved for the same number of generations and that increased fitness is not due to a purging of deleterious mutations. We find that loss-of-function (LOF) mutations explain the increased expansion speed of SEL lines better than non-synonymous mutations or a combination of the two. Interestingly, most LOF mutations are found in simple sequence repeats located in genes involved in gene regulation and gene expression. We postulate that such potentially reversible mutations could play a major role in the rapid adaptation of bacteria to changing environmental conditions by shutting down expensive genes and adjusting gene expression.

Author Summary

We investigated whether strong artificial selection for fast expansion can counteract the negative effects of range expansion, which had been shown to lead to an accumulation of deleterious mutations. These experiments showed that i) an increase in expansion speed could occur if bacteria were selected from the largest protruding sectors, and ii) artificially selected bacterial lines accumulated about the same number of mutations as simply expanding lines, suggesting that the observed increase in fitness is not due to stronger purifying selection removing deleterious mutations from fast-growing lines. We find that loss-of-function (LOF) mutations best explain the observed increase in expansion speed in selected lines. These mutations, which are known to play an important role in adaptive processes in bacterial populations, frequently consist of small insertions or deletions in simple sequence repeats, and are thus relatively easily reversible. They could therefore act as switches that reversibly shut down genes. Our results suggest that shutting down expensive genes and adjusting gene expression are important for adaptive processes during range expansion.


Theoretical studies have recently predicted that spatial expansion of populations can lead to the fixation of deleterious mutations (1, 2) due to small effective size and inefficient selection on range margins. When a spatial expansion proceeds for a long time, edge populations tend to accumulate a series of deleterious mutations, leading to a decrease in fitness over time and space (3, 4). This "expansion load" (3) can potentially affect the speed of the expansion and impose constraints on the limits of a species range (5). Our recent empirical work supported these theoretical predictions, and showed that spatially expanding bacterial colonies accumulated deleterious mutations that impacted their fitness (6).

[...] between growth rate and the ability to reach the front, competition at the front should increase the expansion speed in all cases. In general, selection for rapid range expansion can occur under the same conditions as selection for rapid dispersal, with the caveat that selection for rapid range expansion requires more stringent conditions. Indeed, it requires that different groups of individuals compete with each other (groups that are located at different locations of a two-dimensional expansion front), in contrast to evolution for rapid dispersal, which is based on competition between different individuals competing for being at the front. One would thus expect that selection for rapid range expansion is only effective in certain types of organisms, namely organisms that form very large populations and expand their ranges on wide fronts where different clonal sectors can compete.

Under conditions where selection for rapid range expansion is expected, it is important to consider its consequences for genome evolution.
As mentioned above, range expansion leads to a reduction in the effective population size on the front and consequently to an accumulation [...] these different scenarios is a novel and interesting endeavor.

Here, we addressed this question by performing an evolution experiment with populations of the bacterium E. coli. We let replicated populations of this bacterium expand their range by placing them on a solid surface of nutritious medium and letting them expand radially, forming an approximately circular expanding population. After three days of expansion (corresponding to about 127 generations), we selected the section of the colony edge that had expanded furthest. We collected about one million individuals from the outer edge of this protruding sector and transferred them to a new habitat where we let them again expand. We thus imposed a regime where only individuals belonging to the fastest growing sector could continue to evolve, whereas all other individuals were removed.
We performed this evolution experiment independently in 20 populations, starting from the same ancestral strain. In addition, we also evolved 20 control populations that were propagated in the same way with the important difference that the sector of the front from which individuals were selected to be transferred to a new habitat was chosen at random, thus without imposing any selection for rapid range expansion. As in our previous range expansion experiment, we worked with a mutator strain of E. coli having a mutation rate about 200 times higher than that of wild-type E. coli.

Our goal here is two-fold. First, we ask whether there is a response to selection for increased range expansion. As mentioned above, it is a priori not clear how the balance between mutation accumulation and adaptive evolution will occur in such an experiment. As a consequence, it is not clear whether this regime allows the selection of an increased expansion rate. Our second goal is to analyze the magnitude and the quality of the genomic changes under control and selected conditions. We are thus interested in examining how the interplay between mutation accumulation and adaptive evolution shapes the genomes of the populations that we selected for rapid expansion, and how this compares to the genomic evolution of controls without selection.

If we were to observe more rapid range expansions, we could ask more specifically which biological alterations would underlie such a response. One possibility to increase expansion speed is to increase the rate at which individual cells grow and divide. This would likely involve mutations in metabolic pathways, genes for nutrient transporters, and genes responsible for the regulation of gene expression. Another possible mechanism could be spatial sorting (8), where bacteria would evolve phenotypic traits that would allow them to move within the expanding populations and reach the edge of the expansion faster, without necessarily having a higher growth rate. This spatial sorting phenomenon has been invoked in a recent study where it was shown that alterations of surface proteins influenced the positioning of bacterial cells within an expanding population (9). The dissection of newly accumulated mutations should thus provide us with useful insights into the molecular bases of adaptation during range expansions.

Increase in expansion speed

We let E. coli strains expand radially on top of agar plates for 13 periods of 3 days. We compared 20 lines that were sampled at a random place after each period of 3 days (CONTROL lines) to 20 lines that were sampled at the point of the colony that had expanded the farthest (SEL lines) (see Methods). The colony size was measured after every growth period of 3 days. We find that the CONTROL colony sizes decreased significantly over time (-77 µm/day, 95% C.I. [-95; -60], p-value < 2 × 10^-16), whereas the size of the SEL colonies increased significantly (227 µm/day, 95% C.I. [192; 262], p-value < 2 × 10^-16) (Figure 1) [...]

The average number of mutations is 124.9 in CONTROL lines and 129.8 in SEL lines (Figure 2). We tested for a significant difference in mutation numbers between the groups using a non- [...] value = 0.097). As previously described (6), the mutations are distributed along the genome with a periodic pattern that is repeated nearly in mirror image across the genome (Figure 2B), centered on the origin of genome replication. This uneven genomic distribution of the mutations implies that there is a variable mutation rate during the replication of the genome, but that the two replication forks have similar changes in mutation rate as they traverse the chromosome. We estimated these variable mutation rates across the genome by a wavelet transformation (10) (Figure 2B).
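To make the expansion-speed trends reported at the start of this section concrete, the following is a minimal sketch of how a per-line change in colony size per day could be estimated. It uses an ordinary least-squares slope, which is a simplification and not the mixed effect model used in this study. The function name `ols_slope` and all measurements are hypothetical; the synthetic values are constructed only so that the slopes match the magnitudes quoted above.

```python
from statistics import mean

def ols_slope(days, sizes):
    """Least-squares slope of colony size against time (size units per day)."""
    d_bar, s_bar = mean(days), mean(sizes)
    numerator = sum((d - d_bar) * (s - s_bar) for d, s in zip(days, sizes))
    denominator = sum((d - d_bar) ** 2 for d in days)
    return numerator / denominator

# Hypothetical, noise-free data: colony size (in µm) measured after each of
# 13 growth periods of 3 days. These are NOT measurements from the study.
days = [3 * k for k in range(1, 14)]
shrinking_line = [30000 - 77 * d for d in days]   # made-up CONTROL-like line
growing_line = [30000 + 227 * d for d in days]    # made-up SEL-like line

print("slope of shrinking line (µm/day):", ols_slope(days, shrinking_line))
print("slope of growing line (µm/day):", ols_slope(days, growing_line))
```

With real data, each line would have noisy measurements and the replicate structure would call for a random effect per line, which this sketch deliberately omits.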

LOF mutations as a main driver of adaptation

We used Elastic Net (EN) regression (11), which performs both variable selection and variable regularization (see Material and Methods), to determine the subset of genes that have the largest effect on the expansion speed in bacteria. The resulting significant coefficient associated with a gene is its net effect on final colony size relative to the initial colony size. With this analysis, we determined which genes explain the difference in colony size between the SEL and CONTROL conditions. We used the EN regression to predict colony size from three different sets of mutations. First, we used the combination of all non-synonymous substitutions, as well as frameshift and non-sense mutations (Table S1). Second, we analyzed frameshift and non-sense mutations separately. Note that non-sense mutations can be considered loss of function (LOF) mutations for a specific gene (Table 1). Finally, we used only non-synonymous mutations, which could be a target for adaptation without loss of gene function (Table S2). We then compared the mean cross-validation error of these models, and find that LOF mutations explain colony size change significantly better (mean error = 0.3258) than all mutations taken together (mean error = 0.4105, p = 1.06 × 10^-10) or than non-synonymous mutations alone (mean error = 0.4033, p = 4.85 × 10^-10).
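The model comparison described above can be illustrated with a minimal sketch: fit an elastic net on two alternative sets of per-gene mutation indicators and compare their cross-validated prediction error. This is not the pipeline used in this study. The coordinate-descent solver, the penalty settings (`lam`, `alpha`), the 5-fold split, and the synthetic data are all assumptions chosen for illustration.

```python
import random

def soft_threshold(value, threshold):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - b0 - X b||^2 + lam*(alpha*|b|_1 + (1 - alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    beta = [0.0] * p
    residual = [yi - intercept for yi in y]  # y - intercept - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlate feature j with the residual excluding its own contribution.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            residual = [r - v * (new_bj - beta[j]) for v, r in zip(col, residual)]
            beta[j] = new_bj
        # The intercept is not penalised: absorb the mean of the residual.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
    return intercept, beta

def cv_error(X, y, folds=5, **kwargs):
    """Mean squared prediction error over k folds (fold = row index mod k)."""
    n = len(X)
    squared_errors = []
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        b0, beta = fit_elastic_net([X[i] for i in train],
                                   [y[i] for i in train], **kwargs)
        for i in test:
            prediction = b0 + sum(x * b for x, b in zip(X[i], beta))
            squared_errors.append((y[i] - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Synthetic example (NOT data from the study): 40 lines, 4 genes per feature set.
# Each entry is 1 if the line carries a mutation in that gene, else 0.
random.seed(1)
lof = [[random.randint(0, 1) for _ in range(4)] for _ in range(40)]
nonsyn = [[random.randint(0, 1) for _ in range(4)] for _ in range(40)]
# By construction, only two of the "LOF" genes affect the response.
size_change = [0.8 * row[0] - 0.6 * row[1] + random.gauss(0, 0.1) for row in lof]

err_lof = cv_error(lof, size_change)
err_nonsyn = cv_error(nonsyn, size_change)
print("CV error, LOF features:", err_lof)
print("CV error, non-synonymous features:", err_nonsyn)
```

Because the synthetic response depends only on the first feature set, that set yields the lower cross-validated error, which mirrors the logic of comparing mean cross-validation errors between mutation classes.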

Focusing on LOF mutations, we find a total of 43 genes significantly associated with increased colony size and 34 genes significantly associated with a colony size reduction. Quite remarkably, almost all genes leading to a significant increase in colony size are targets of mutations in SEL lines, whereas all genes leading to a significant decrease in colony size are targets of mutations in CONTROL lines (Table 1). The only exceptions are two genes connected to ATPases (gsiA and yjgR), where mutations occur in both SEL and CONTROL lines. Mutations in these genes lead to an increased colony size, in agreement with the previous observation that there is still some adaptation going on in CONTROL lines (6).

Genes leading to either increased or decreased colony size are involved in metabolic processes, transport, gene regulation, and biofilm formation, and also include tRNA and rRNA genes. There is evidence for an association between the number of significant genes we find in the gene categories and the impact that mutations in these significant genes have on colony size (increase or decrease of colony size) (Chi-squared test, p-value = 0.009) (Figure S5). Compared to genes that decrease colony size, there are more genes leading to an increase in colony size that are involved in the transport of substances through the cell membrane or in processes associated with ribosomes and in tRNAs. In contrast, there are more genes that lead to a decrease in colony size that are involved in metabolic processes (Table 1, Figure S5). Additionally, we find two genes involved in cell division where mutations lead to a decrease in colony size. Note, however, that a separate GO enrichment analysis for genes leading to a significant increase or decrease in colony size in the EN analysis did not reveal any significant term.

Finally, it is worth emphasizing that seven genes leading to an increased colony expansion are connected to tRNAs (leuP, leuV, leuT, leuQ, aspU) or rRNAs (rrlA, rrlC), but that only one gene connected to rRNA leads to a decrease in colony expansion (rsmF) (Table 1) [...]

We looked for signals of convergent adaptation by searching for mutations that have targeted the same gene in unrelated lines, which is usually taken as evidence for adaptive processes (12). We therefore tested if some genes were targeted by the 3044 observed non-synonymous and LOF mutations more frequently than expected by chance. We simulated the random occurrence of 3044 mutations along the genome, taking explicitly into account the differential mutation rates across the genome as inferred in Figure 2B, and we compared the simulated and observed numbers of genes targeted by these mutations (Figure 3). Non-synonymous and LOF mutations were analysed separately, and we categorized the genes in three groups: genes with at least one mutation in i) CONTROL lines only, ii) SEL lines only, and iii) both SEL and CONTROL lines. The analysis of non-synonymous mutations shows no departure from expectations in any category (Figure 3A), whereas we find more genes jointly targeted by LOF mutations in CONTROL and SEL lines than expected (Figure 3B). Note that this excess might be due to hotspots for frameshift mutations like simple sequence repeat (SSR) regions in the genome (13). More interestingly, we observe significantly fewer genes than expected to have been targeted by LOF mutations in SEL lines. In other words, LOF mutations are more clustered than expected by chance in SEL lines. We indeed find that there is a significant excess of genes that have been the target of 2 and of 3 or more LOF mutations in SEL lines (Figure 3D) and of 3 or more LOF mutations in CONTROL lines (Figure 3F). Note, however, that there is no deviation from expected counts of mutations per gene for non-synonymous mutations (Figure 3C, 3E), such that only LOF mutations seem to preferentially accumulate in specific genes.

Since there is evidence that mutations in SEL and CONTROL lines are more clustered than expected, we then examined whether any GO terms were significantly enriched for genes with non-synonymous substitutions, frameshift mutations, or non-sense mutations. We found three significant GO terms in SEL lines: taxis GO:0042330 (q-value = 1.58 × 10^-4), amine catabolic process GO:0009310 (q-value = 0.04), and colanic acid biosynthetic process GO:0009242 (q-value = 0.04), and one significant GO term in CONTROL lines: taxis GO:0042330 (q-value = 0.004) (Table S3) [...] (Table S4). Focusing more generally on the 36 genes involved in the formation of the flagella, we found 13 non-synonymous mutations, 3 synonymous mutations, and 9 frameshifts in the CONTROL lines and 11 non-synonymous mutations, 2 synonymous mutations, and 13 frameshifts in the SEL lines (Table S4) [...]

Additionally, we find that LOF mutations have preferentially targeted a restricted set of genes in both CONTROL and SEL lines (Figure 3B), which is potentially due to hotspots for frameshift mutations in the genome like simple sequence repeats (SSR [...] DNA replication functions such as transcription, translation, or protein disassembly (Table S6).

These GO results suggest that there has been selection against SSRs in essential genes and [...]

Quite unexpectedly, we find that 4 out of 8 gene copies of the leucine tRNA have frameshift mutations that are associated with increased colony size (leuP, leuV, leuQ, and leuT) (Table 1). These tRNAs all target the CUG codon, which is one of the most abundant codons used by E. coli. There are in total four copies of these tRNAs and there is not more than one mutation [...] tRNA level compared to other tRNAs. This suggests that these modified tRNAs could speed up protein production in SEL lines.

Also of interest, we find that non-synonymous mutations in the RNA polymerase (rpoC) lead to an important increase in colony size of SEL lines (Table S3) [...]

The mixed effect model from the expansion velocity analysis was used to predict the colony size after 39 days. The data were first log-transformed as [...] WaveletComp R package. The extracted power levels were used to reconstruct the mutation pattern over the genome by applying the reconstruct function of the WaveletComp package, and we extracted the probability P_hit(i) that a gene i is hit by a mutation. We simulated the same number of mutations as we found non-synonymous substitutions and synonymous substitutions in the CONTROL and SEL lines by generating multinomially distributed random number vectors with a length equal to the total number of genes and the probabilities that we determined by the wavelet analysis. We simulated this vector 1000 times, calculated the mean value as well as the 2.5% and 97.5% quantiles from the simulated data, and compared them to the observed CONTROL and SEL data.
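The simulation described above can be sketched as follows: mutations are dropped on genes according to per-gene hit probabilities, and the number of distinct genes hit is summarised over 1000 replicates. This is a minimal illustration under stated assumptions, not the code used in this study. The probabilities `p_hit` below are invented rather than derived from a wavelet analysis, the gene count and mutation count are arbitrary, and the quantiles use a simple order-statistic approximation.

```python
import random

def simulate_genes_hit(p_hit, n_mutations, n_sims=1000, seed=0):
    """Return, for each simulation, the number of distinct genes that receive
    at least one of n_mutations mutations drawn with probabilities p_hit."""
    rng = random.Random(seed)
    genes = list(range(len(p_hit)))
    counts = []
    for _ in range(n_sims):
        # Drawing gene indices with weights is equivalent to one multinomial draw.
        hits = rng.choices(genes, weights=p_hit, k=n_mutations)
        counts.append(len(set(hits)))
    return counts

def summarise(counts):
    """Mean plus approximate 2.5% and 97.5% quantiles of the simulated counts."""
    ordered = sorted(counts)
    n = len(ordered)
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return sum(ordered) / n, lower, upper

# Hypothetical setup: 500 genes whose hit probability varies along the genome.
weights = [1.0 + 0.5 * ((i % 50) / 50) for i in range(500)]
total = sum(weights)
p_hit = [w / total for w in weights]

counts = simulate_genes_hit(p_hit, n_mutations=300)
mean_hit, lower, upper = summarise(counts)
print("expected number of genes hit:", mean_hit)
print("95% simulation interval:", (lower, upper))

# An observed count below the lower bound would indicate that mutations are
# concentrated in fewer genes than expected by chance.
observed = 200  # made-up value for illustration only
print("fewer genes hit than expected:", observed < lower)
```

The same scheme extends to counting genes hit exactly once, twice, or three or more times by tallying the per-gene counts in each replicate instead of taking the set of hit genes.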

The number of mutations in a gene is indicated by the intensity of the color.

Table S4: Mutations in SEL and CONTROL lines. An asterisk (*) indicates genes associated with the "taxis" GO term. Highlighted genes are mutated in both conditions.

Table S5: GO enrichment analysis using the top 10% of genes with the highest density of SSRs with a length greater than or equal to 5.

Table S6: GO enrichment analysis with genes that do not have any SSRs with a length greater than or equal to 5.