A high resolution view of adaptive events

Coadaptation between bacterial hosts and plasmids often involves a small number of highly reproducible mutations. Yet little is known about the underlying complex dynamics that leads to such a single “correct” solution. Observing mutations in fine detail along the adaptation trajectory is necessary for understanding this phenomenon. We studied coadaptation between E. coli and a common artificial plasmid, pBR322, in a continuous turbidostat culture. To obtain a high resolution picture of early adaptive events, we used a highly sensitive duplex sequencing strategy to directly observe and track mutations with frequencies undetectable with conventional methods. The observed highly reproducible trajectories are governed by clonal interference and show rapid increases in the frequencies of beneficial mutations controlling plasmid replication followed by a profound diversity crash corresponding to the emergence of chromosomal variants. To the best of our knowledge our study represents the first comprehensive assessment of adaptive processes at a very fine level of resolution. Our work highlights the hidden complexity of coadaptation, and provides an experimental and theoretical foundation for future studies.

The fate of beneficial mutations in clonal asexual populations depends on their supply (a product of population size and the mutation rate) and the distribution of fitness effects. If the supply of beneficial mutations is low relative to the selection strength, then a new variant quickly spreads and fixes. On the other hand, if mutational supply is high, then multiple coexisting variants interfere with each other-a phenomenon initially termed periodic selection 14 and known today as clonal interference [15][16][17] . Because population sizes of clonal asexual organisms are typically very large, there is a sufficient supply of beneficial mutations so that their frequencies are governed by clonal interference. Thus understanding the fine dynamics of clonal interference is necessary to predict adaptive events in such organisms 18,19 .
To gain this understanding, one would need to try tracking all mutations within a population as soon as they arise. This may be technically challenging. The emergence of modern sequencing technologies has created an opportunity to explore these underlying dynamics 2,4,7 . However, these techniques only allow the observation of evolutionary trajectories of mutations that have risen to frequencies of ~1%-a threshold at which the noise of Illumina sequencing (the most accurate of the currently available sequencing technologies) obscures the signal 20,21 . Such a threshold is too high to obtain reliable insight into the underlying genetic variability within a large population. For example, Good et al. 2 used one milliliter of the overnight E. coli culture for sequencing time points from the long term evolution experiment. This volume conservatively contains ~10 8 cells. As a result, at sequencing resolution of 1%, one will miss all sequence variants present in fewer than ~10 6 cells. A profound gain in resolution has been made by the application of molecular barcodes within an experimental evolution setting 9,10,22 .
These studies have shown that at the beginning of an experiment, a large pool of lineages carrying beneficial mutations is present within the frequency band between 10 -8 and 10 -4 , and a vast majority of adaptive lineages have frequencies under 0.01% (well below what is detectable with conventional Illumina sequencing). These data provided us with invaluable insight into early adaptive events.
However, they do not convey the nature of genetic variants, as the barcode counts are only proxies for the number of cells carrying a particular, yet unknown, beneficial mutation. Venkataram et al. 22 recognized this limitation and used barcoding to select several hundred lineages carrying beneficial 3 mutations for subsequent whole genome sequencing. Still, this is only the tip of the iceberg, with the majority of variants remaining below the surface.
A number of techniques have been recently developed to allow sequencing-based detection of very rare genetic changes that may be suitable to directly observe early adaptive events. Of these, duplex sequencing is the most sensitive, with a theoretical resolution threshold of <10 - 7 21,23 . It is based on using unique sequence tags to label individual molecules of the input DNA prior to preparation of Illumina sequencing libraries. During the amplification steps of library preparation, each of these molecules gives rise to multiple descendants. After sequencing, the descendants of each original DNA fragment are identified and grouped together using tags-i.e., one simply sorts tags in sequencing reads lexicographically, and all reads containing the same tag are bundled into families . These families (usually with at least three members) form single strand consensus sequences (SSCS) for the forward or the reverse strand, respectively. Complementary SSCS are then grouped to produce duplex consensus sequences (DCS). A true sequence variant is present in all reads within a family forming a duplex. In contrast, sequencing and amplification errors will manifest themselves as "polymorphisms" within a family, and so they can be identified and reliably removed. The resolution of duplex sequencing combined with the power of experimental evolution approaches provides a unique opportunity to directly observe adaptive events as they appear.
Observing emerging mutations and tracking their evolutionary trajectories requires an experimental system that satisfies two criteria. First, it must have a sufficiently small genome to allow for ultra-deep duplex sequencing. This is because the great dynamic range of duplex sequencing requires high depth (i.e., identifying variants with frequency of 10 -6 requires sequencing depth of >10 6 ). Second, such a system needs to evolve on a timescale that fits into a short term experimental evolution setup, such as the one provided by chemostat or turbidostat devices. To satisfy these requirements, we selected one of the logistically simplest and exceptionally well understood systems-plasmid vector pBR322. It has a small genome of 4,361 bp, which is amenable to deep sequencing. As a synthetic plasmid it has no evolutionary history of being associated with E. coli . Transforming it into naïve E. coli cells triggers a series of adaptive events in the plasmid itself as well as in the host genome [24][25][26][27][28] . These changes take place within as few as several hundred generations, which is an easily achievable timeframe for an experimental evolution experiment. Because acquisition of the plasmid incurs a metabolic burden on the host cells, it reduces growth rate relative to their plasmid-free isogenic counterparts 27 and results in the rapid decline of transformed cells in antibiotic-free chemostat cultures 29

Experimental evolution setup
We chose to use a turbidostat, a device that maintains constant cell density with no restrictions on the supply of nutrients, for this experimental evolution experiment, as it puts additional selective pressure on host cells to increase their maximum growth rate 30,31 . We selected E. coli DH5α F-recA1 as the host, in which recombination is inhibited and conjugation is disabled. The cells were transformed with pBR322-a low copy number plasmid (15-20 per cell) 32

Emergence of polymorphisms followed by variation crash
Frequencies and positions of single nucleotide variants (SNVs) and insertions/deletions (indels) called from duplex sequencing are shown in Table S1 and  S1) showed a transition-to-transversion ratio of 1:1.9, which matches that of the long term evolution experiment of Wielgoss et al. 35 . One insertion and two deletions were identified in total (Table   S1). The insertion was located at the replication origin, and was observed only in replicate R7 of the long experiment. One deletion that was located outside the origin was found in multiple turbidostat runs.
The other deletion was within the origin, and was only observed in the R1 replicate of the short 6 experiment. Indels did not exhibit any frequency fluctuation, which could be due to the rate of indel mutations being generally lower than that of base-pair substitutions [36][37][38] .
Because the distance between SNVs located within the replication origin is shorter than the length of duplex reads, we examined our data for evidence of linkage. No linkage between any SNVs was observed. This is not unexpected because given the E. coli mutation rate is ~10 -10 -10 -9 per nucleotide per generation [35][36][37][38] and the highest frequency attained by these SNVs is ~0.9% it is highly unlikely to have two mutation events within a single molecule.

Variable sites control plasmid copy number
The mechanism controlling pBR322 replication has been well studied 39 S2A) were reported to increase pBR322 copy number 40 . This interval represents loop II' of RNAI. Meanwhile, base 3,118 corresponds to the -35 promoter region of RNAII (Fig. S2B). Changes at this base were reported to increase the copy number as well 41 . The absolute majority of all variants found in this study are confined to these regions.

Evolved cells have higher fitness conferred by chromosomal mutations
At each collection point we sampled ⅔ of volume from the incubation vessel, which was immediately topped-off with sterile medium diluting the remaining cells and causing optical density dips in Fig. 1 (Table S1) and was on the way to purge all variation in R7 suggests that the increase of fitness can be attributed to changes within the bacterial chromosome. Whole genome sequencing of initial and terminal clones from the long term experiments revealed a high frequency nonsynonymous T-to-A transversion in treB (Val112Glu) in both replicates (Table S2). In addition, R6 possessed another nonsynonymous substitution in recA (Asp161Gly) that effectively re-establishes the RecA1 genotype, back to RecA.
The product of the treB gene transports trehalose into the cell as trehalose 6-phosphate, which is further converted to glucose 6-phosphate and glucose 42 . In addition to its role as a carbon source, intracellular trehalose can act as an osmoprotectant at high osmolarity [43][44][45] . The T-to-A transversion in treB observed in our work has been suggested to be a consequence of adaptation of E. coli to LB medium and has been identified in three parallel populations in two recent mutation accumulation experiments 46,47 .
The recA1 → recA reversal in replicate R6 is noteworthy, as it is close to fixation (frequency of 0.91; Table S1) and reinstates recombination [48][49][50] . RecA has any direct effect on the pattern observed here.

Modelling the dynamics of plasmid mutations
The observations we described thus far suggest the following scenario. A naïve host is transformed with a plasmid conferring antibiotic resistance. Incubation of the transformed cells in an antibiotic containing environment guarantees retention of an otherwise disposable plasmid. The high density of the turbidostat environment creates a strong selection pressure for shortening generation time.
Initially, selection drives the increase in the frequencies of adaptive changes controlling plasmid copy number, as it translates into the elevated production of the efflux pump, thus helping to offset the inhibitory effect of tetracycline on the protein synthesis. This increases the fitness of cells carrying the mutated pBR322 relative to the wild-type plasmid. However, higher copy number has an associated metabolic cost-once it reaches a certain frequency threshold, it proves too expensive and, in fact, eventually decreases fitness. Simultaneously, a mutation with a large fitness effect arises on the bacterial chromosome. Given the frequencies observed in the experiment, it is highly unlikely that such a chromosomal mutation will arise on the background of an existing plasmid mutations. Its spread to fixation obliterates plasmid variation, as we observed in Fig. 2. 8 To solidify this reasoning into a quantitative framework, we developed a model drawing on the previous work of De Gelder et al. 52 . Assuming that there is no back mutation on both the plasmid and the chromosome, we denote the number of cells with pBR322 bearing mutations at a certain generation n by m n , and those with wild-type pBR322 by w n . At any generation, m n increases due to (1)  , where m n and w n are given by equations (1) and (2), and thus is a function of the two selection  Table 1. Allele frequencies reconstructed using the described model with optimal parameters closely follow the observed dynamics (Fig. 3) and allow us to make and experimentally test predictions about the behaviour of this system. In particular, it allows for predicting the effects of the increases in tetracycline concentration (Fig. S6) and mutation rate (Fig. S7)  and prolonged their persistency before ultimately yielding to chromosomal changes. Likewise the effect of increase of the mutation rate allowed for even higher frequencies of plasmid variants. While this dynamics is inherent to our model it provides an insight into the magnitude of selection coefficients and the temporal scale at which plasmid variants remain in the population before being eliminated. This information leads to a series of future experiments our group will be undertaking.

Plasmid heteroplasmy likely prevents alterations in copy number
The dynamics described here initially favors beneficial changes within the plasmid DNA that are quickly overridden by chromosomal mutations that attain high frequency in the evolving population. As a result of this co-adaptation process, the plasmid remains unchanged. A number of experimental evolution studies report similar findings where co-adaptation is driven exclusively by chromosomal mutations and leaves plasmids unaltered 24,[53][54][55][56] . However, it is not entirely clear why this is the case. At least for pBR322 it is not unreasonable to assume that some combination of mutations affecting plasmid replication could successfully balance tetracycline efflux with copy number burden and reach high frequency at least in a short term. The fact that this does not happen could be a consequence of pBR322 segregation dynamics-the plasmid is partitioned between daughter cells randomly and likely unequally. As pBR322 copy number is only ~20 copies per cell, this sampling error may be significant. Because mutations occur during replication of just one of multiple plasmids copies present within a single cell, this cell is rendered heteroplasmic -it contains a mixture of mutated and wild-type pBR322 molecules. The stochasticity of partitioning of plasmid molecules during cell division means that this ratio will be different in each of the daughter cells (also see 57 ). In others words they will be heteroplasmic to a different degree. Recent data from another ColE1 plasmid (pB1000 in Haemophilus influenzae ) shows heteroplasmic cells to have higher overall number of plasmid copies compared to cells that are monomorphic for the mutated plasmid (here, like in our case, the mutation leads to copy number increase) 58 . Thus the differences in the extent of heteroplasmy (a mutant-to-wild-type ratio within a cell) will lead to different plasmid copy numbers within the daughter cells. This would explain the behavior we observe in our system. Our experiment is performed at a constant antibiotic concentration. This means that the maximum beneficial effects of the increase in the number of efflux pumps is likely associated with a single, "optimal" number of plasmid copies per cell, which balances incurred metabolic cost. It is likely that such optimum is only achieved in heteroplasmic cells because monomorphic mutated cells may produce too many copies of the plasmid to remain competitive in the population. In addition, RNAI, which contains most mutations in our study, acts in trans by binding to RNAII at the origin of replication. As a result even a single mutated molecule may influence replication of wild-type molecules within a cell. Thus there must be a narrow optimal value for the degree of heteroplasmy that translates into the optimal pBR322 copy number conferring selective advantage in a given antibiotic concentration. However, the very nature of pBR322 segregation makes it very difficult to reliably establish cells with this "optimal" number in the population because every cell division effectively randomizes the relative numbers of mutated and wild-type plasmids in the next generation. As a result the positive effect of the increase in the copy number is only achieved along a precise optimal value ridge of the adaptive landscape. Every time a cell divides its descendants fall from the ridge because they produce either too few or too many plasmid copies. Likewise ascending the ridge is only possible by chance when a cell division results in just the right degree of heteroplasmy. Because the bacterial populations we are sampling are large, we are able to observe these events until a large fitness effect mutation obliterates the variation by changing the topography of the adaptive landscape (the variation is likely still there but cannot be observed with our method). Addressing these issues will require development of technique allowing tracing individual plasmids in bacterial population (e.g., a finer resolution of the approach recently reported by Rodriguez-Beltran et al. 59 ).
In conclusion we have applied the highly sensitive duplex sequencing approach to trace the evolution of pBR322 plasmid during its adaptation to E. coli DH5α host. We uncovered previously hidden fine scale dynamics where multiple mutations leading to the increase in the copy number compete ultimately losing to a single chromosomal substitution. Our results suggest that there is a strong incentive not to alter copy number even if it can provide a degree of selective advantage. This incentive is likely rooted in the complex interplay between mutated and wild-type plasmids constrained within a single cell and underscores the importance of understanding of intracellular plasmid variability.

Turbidostat set-up and experimental evolution
The turbidostat set-up was described by Takahashi et al. 61 . It consists of three connected parts (Fig. S5): a carboy containing LB medium, a bacteria growing chamber that is revolved magnetically, and a waste tank. The carboy and the chamber were pressurized by an aquarium air pump. The volume in the chamber was kept constant at 13 mL. OD 600 was measured in 30 second intervals and LB was added as necessary to maintain OD 600 constant at 0.8.
Fifty µL of the glycerol stock was inoculated into 13 mL of LB medium and grown for 14 hours at 250 rpm, 37°C. Four mL cell culture was inoculated into turbidostat. One mL culture was frozen as glycerol stock in -80°C as the initial sample. The remaining culture was labelled as time point 0 and collected by centrifugation at 4,000 g for 5 minutes. The pellet was then transferred to -20°C. Turbidostat runs were carried out in an incubator to maintain 37°C. Eight mL culture was taken from the turbidostat at regular 12 hour intervals, which constitute time points for the subsequent analysis. The turbidostat was immediately refilled by fresh broth to the 13 mL mark. When the turbidostat run was completed, the terminal sample was also collected and archived as glycerol stock in -80°C.

Population fitness determination
OD 600 values were constantly monitored, and growth curves were inferred from them. We found that exponential growth between OD 600 values of 0.6 to 0.8 corresponded to the maximum growth rate, and therefore we used it as a proxy for overall population fitness. Growth rate was defined as where t is the duration in minutes in which OD 600 values increased from 0.6 to 0.8. The generation time was calculated as ln(2)/growth rate We found that the generation time varied over the course of the two long term replicates. Detailed information regarding growth rate and generation time are displayed in Table S3. For simplicity, the generation time was assumed as 60 min when we modeled the dynamics of variants (Fig. 3).

Fitness re-measurement of samples representing the initial and terminal time points
Fifty µL glycerol stock from each of the initial and terminal samples from the two long term replicates R6 and R7 was inoculated into turbidostat in triplicate. OD 600 was monitored and growth curves were inferred. Exponential growth between OD 600 values of 0.6 to 0.8 was used to estimate fitness.

Data modeling
We first defined a data_generation function to simulate data based on the mathematical model described by equations (1) and (2)  This loss function was optimized over the five parameters ( a , b , c , d , μ) using the R function optim , setting the bounds on the parameters as shown in Table S4 (choice of these bound is discussed in the next paragraph). Note that optim was run with a maximum of 10 6 iterations in order to facilitate convergence of the optimization process. Since the simulated mutant frequency shows a complex dependence on the parameters ( a , b , c , d , μ), the loss function is likely to have multiple local minima. As a consequence, the set of optimal parameters ( a , b , c , d , μ) returned by of the optim function strongly depend on their initial values. To effectively explore the parameter space, we randomly sampled initial values for the five parameters (uniformly on a log 10 scale, according to the bounds in Table S2) and repeated the optimization process until 1,000 converged optimization processes were obtained. Each repetition returned a set of parameters and the corresponding loss value. The five values ( a , b , c , d , μ) generating the smallest loss among the 1,000 obtained were chosen as optimal parameters and presented in We defined the five parameter intervals in logarithmic scale to ensure that they were wide enough to randomly sample the parameters. Therefore, a large parameter space was explored. The mutation rate μ in De Gelder et al. 52 was determined with the order of magnitude of 10 -5 . We set the interval for μ from 10 -10 to 10 -2 . The other four parameters ( a , b , c , d ) were involved in calculation of and . As σ 1 (n−1) σ 2 (n−1) coefficients, and were generally expected to be smaller than 1. We set the intervals for ( b , c , σ 1 (n−1) σ 2 (n−1) d ) the same as μ. The interval for parameter a was set on smaller values (from 10 -14 to 10 -6 ) because it served as the coefficient of m n-1 and we tried to maintain the product am n-1 reasonably small.

Data deposition
Raw sequencing reads in this study have been deposited at the NCBI SRA database as BioProject PRJNA485503.

Supplementary files
Supplementary files include Table S1 (xlsx),     The simulation was performed with parameters obtained by numerical optimization (see Material and Methods ). Different turbidostat runs are colored respectively. Y-axis shows the sum of frequencies at bases 3,027-3,035 and 3,118. X-axis represents generations. The x-axis here using generation as the unit corresponds to the x-axis using time in hour in Fig. 1 and Fig. 2. This conversion was accomplished by assuming that the generation time was constantly 60 min. In the left panel, growth rate were estimated from slopes in Fig. 1. Growth rate the initial isolate is shown in a red triangle, and those of the remaining isolated are shown in black circles. Growth rate of the terminal isolate in replicate R6 cannot be obtained from Fig. 1, because the terminal isolate was stored in the glycerol stock and did not show a dip in OD values. However, it can be re-measured by 23 inoculating the glycerol stock into turbidostat; this is shown in the right panel. In the right panel, initial and terminal isolated, both in red triangles, were re-measured in triplicates.   Positions are projected to the y-axis, and colored individually. Sizes of the closed circles are proportional to the allele frequencies of corresponding variants.
The simulation was performed with parameters obtained by numerical optimization (see Material and Methods ). Different turbidostat runs are colored respectively. Y-axis shows the sum of frequencies at bases 3,027-3,035 and 3,118. X-axis represents generations. The x-axis here using generation as the unit corresponds to the x-axis using time in hour in Fig. 1 and Fig. 2. This conversion was accomplished by assuming that the generation time was constantly 60 min. Growth rate of individual isolates from replicate R6 of the long turbidostat run. In the left panel, growth rate were estimated from slopes in Fig. 1. Growth rate the initial isolate is shown in a red triangle, and those of the remaining isolated are shown in black circles. Growth rate of the terminal isolate in replicate R6 cannot be obtained from Fig. 1, because the terminal isolate was stored in the glycerol stock and did not show a dip in OD values. However, it can be re-measured by inoculating the glycerol stock into turbidostat; this is shown in the right panel. In the right panel, initial and terminal isolated, both in red triangles, were re-measured in triplicates. Figure 7. Same as Fig. S6. The a , b , c , d parameters were given the same value as those obtained by numerical optimization. The μ parameter was multiplied by 10, 50,and 100, generating different dynamics.  Table S1. Plasmid SNVs and INDELs identified in all four turbidostat runs. Table S2. Chromosomal mutations identified in the two long term replicates-R6 and R7. Table S3. Growth rate and generation time over the course of the two long term turbidostat replicates. Table S4. Bounds, in log 10 scale, for the parameters to be drawn uniformly at random to model the dynamics of plasmid.