Chromosomal barcoding of E. coli populations reveals lineage diversity dynamics at high resolution

Jesse Lerner; Michael Manhart; Weronika Jasinska; Louis Gauthier; Adrian W.R. Serohijos; Shimon Bershtein

doi:10.1101/571505

Abstract

Evolutionary dynamics in large asexual populations is strongly influenced by multiple competing beneficial lineages, most of which segregate at very low frequencies. However, technical barriers to tracking a large number of these rare lineages have so far prevented a detailed elucidation of evolutionary dynamics in large bacterial populations. Here, we overcome this hurdle by developing a chromosomal barcoding technique that allows simultaneous tracking of ∼450,000 distinct lineages in E. coli. We used this technique to gather insights into the evolutionary dynamics of large (>10⁷ cells) E. coli populations propagated for ∼420 generations in the presence of sub-inhibitory concentrations of common antibiotics. By deep sequencing the barcodes, we reconstructed trajectories of individual lineages at high frequency resolution (< 10⁻⁵). Using quantitative tools from ecology, we found that populations lost lineage diversity at distinct rates corresponding to their antibiotic regimen. Additionally, by quantifying the reproducibility of these dynamics across replicate populations, we found that some lineages had similar fates over independent experiments. Combined with an analysis of individual lineage trajectories, these results suggest how standing genetic variation and new mutations may contribute to adaptation to sub-inhibitory antibiotic levels. Altogether, our results demonstrate the power of high-resolution barcoding in studying the dynamics of bacterial evolution.

Advances in sequencing technologies have generated tremendous breakthroughs in identification of beneficial mutations arising in controlled laboratory evolution experiments, as well as mutations contributing to the emergence of anti-cancer or antibacterial drug resistance in the clinic^1–4. Yet, experimental measurements of the dynamics of evolutionary processes remains a major challenge, particularly in large asexual populations, where multiple low-frequency small-effect mutations are known to spread simultaneously^5–8. A quantitative description of evolutionary dynamics requires the ability to follow numerous individual lineages, most of which occur at extremely low frequency (10⁻⁵-10⁻⁶), and to do so in parallel and over multiple generations. Whole genome sequencing (WGS) techniques, although becoming routine and well-established, fall short of fulfilling this requirement, as they are usually unable to detect mutations at frequencies below ∼0.1%^9,10. Various alternative solutions have been applied over the years to reconstruct population dynamics from trajectories of individual lineages at much higher resolution than accessed by WSG^11–13. A particularly successful method that dramatically increases the frequency resolution of individual lineages is based on uniquely tagging chromosomes of individual cells with a genetic “barcode” that can be easily recovered by deep sequencing¹⁴. This approach was implemented in S. cerevisiae, where chromosomal insertion of ∼500,000 random barcodes using the Cre-loxP recombination system allowed a quantitative description of evolutionary dynamics of yeast populations (∼10⁸ cells)^5,15,15. In bacteria, however, technical barriers have limited the number of uniquely-incorporated chromosomal barcodes to ∼100-400^17–19. Such low levels of barcode diversity preclude us from answering important questions in bacterial evolution. For instance, highly-efficient chromosomal labeling is particularly relevant in characterizing the evolution of drug resistance in the presence of sub-inhibitory amounts of antibiotics, where the dynamics is driven by multiple mutations of low frequency and small fitness effects²⁰.

Here, we present a method based on the Tn7 transposon to generate E. coli populations of > 10⁷ cells carrying 10⁵-10⁶ unique chromosomal barcodes. We demonstrate its implementation by quantifying the dynamics and patterns of lineage diversity loss in barcoded populations propagated in a month-long experiment (∼420 generations) in the presence of sub-inhibitory concentrations of two commonly applied antibiotics, chloramphenicol and trimethoprim. We found that different selection regimes elicited unique lineage diversity dynamics. By comparing the identity of individual barcodes in independent replicates evolving in parallel under identical conditions, we were also able to infer the relative contributions of pre-existing versus de novo mutations to the observed evolutionary dynamics. In general, stronger selection pressure generated faster loss of lineage diversity and more reproducible dynamics driven by standing genetic variation. In contrast, weaker selection pressure produced slower diversity loss and less reproducible dynamics due to a greater role of new mutations. In particular, ultra-low amounts of trimethoprim (0.01 μg/ml that amounts to 0.1% of minimal inhibitory concentration (MIC)) unexpectedly slowed the rate of lineage diversity loss even beyond conditions without any antibiotic, hinting the possibility that in this regime the antibiotic could be primarily functioning as a signaling molecule^21,22.

Results

Highly efficient chromosomal barcoding of E. coli cells

Several robust genome-editing methods available for E. coli have long made this organism a flagship of genetic manipulations²³. A high-resolution chromosomal labeling technique, however, has not been available, making it difficult to analyze the evolutionary dynamics of large bacterial populations segregating at low frequencies. To address this problem, we harnessed the well-established site-specific recombination machinery of the Tn7 transposon^24,25. We placed the tnsABCD genes that encode the transposase biochemical machinery under the control of an arabinose-inducible pBAD promoter in a temperature-sensitive helper plasmid (Fig. 1A). The Tn7 arms (Tn7L, Tn7R) that target the genetic cargo at a neutral attnTn7 attachment site were relocated to a suicide integration plasmid (Fig. 1B). We placed the barcode cassette carrying a 15-nucleotide long random sequence (the “barcode”) and the adjacent marker of selection between the Tn7 arms on the integration plasmid (Fig. 1B,C). To minimize the preparation of barcode libraries to two consecutive PCR reactions, we added sequences complementary to Illumina adapter primers flanking the barcode cassette (Fig. 1C). These sequences were used to both PCR amplify barcodes directly from cell cultures as well as anchor i5/i7 Illumina indexes to the amplified barcodes (Methods). Using this binary Tn7 transposon system, we integrated barcodes into a fixed location on the E. coli chromosome in two steps. First, we transformed cells with the Tn7 helper plasmid and pre-conditioned them by inducing the transposase machinery. Second, we transformed the pre-conditioned cells with the barcoded integration plasmid.

Figure 1: Barcoding E. coli cells with Tn7 transposon machinery.

(A) Map of the helper plasmid expressing the Tn7 transposition machinery (tnsABCD) under the control of a pBAD promoter. Using a temperature-sensitive origin of replication (PSC101 Ori ts), we cure the plasmid by growing the strains at a non-permissive temperature after the chromosomal integration of barcodes is complete. (B) Map of a suicidal integration plasmid carrying the barcode cassette and spectinomycin resistance-conferring gene (SpR) nested between the left and right Tn7 arms. The SpR gene is flanked by FRT sites and can be later excised from the chromosome with Flp recombinase. The pir+ dependent origin of replication (R6K gamma Ori) renders this plasmid suicidal in pir- cells. (C) A map of the segment undergoing chromosomal integration into the Tn7 attachment site (attnTn7). Inset: sequence of the barcode-carrying cassette. The 5’ and 3’ ends are flanked by sequences complementary to the Illumina adapter primers (upper case, blue) (see also Methods). The location of the 15 nt variable region (the barcode) is marked with uppercase red Ns. The barcode is placed upstream of a 9 nt stretch (underlined, lowercase). The barcode is the only variable part of the cassette. (D) Preparation of the barcoded plasmid and chromosomal libraries. We incorporate the cassettes carrying unique barcodes into Tn7 integration plasmids using Gibson assembly and then transform them into pir+ cells. We achieve the chromosomal incorporation of the barcodes and curing of the Tn7 helper plasmids in a single plating step on spectinomycin at 37 °C (see also Methods).

We simultaneously selected for chromosomal barcode integration and removal of the helper plasmid by plating on selective media and incubating the plates at 37°C (Fig. 1D, Methods). Sequencing the ‘raw’ barcode library – as synthesized by the manufacturer and prior to incorporation of barcode cassettes into the integration plasmid (Methods) – revealed that the total number of unique barcodes was ∼1.3 × 10⁶ (Table S1). The ‘raw’ library had a fairly uniform distribution of frequencies: all barcodes but one had frequencies between 10⁻⁷ and 10⁻⁵ (Fig. S1A,B). The nucleotide composition of these barcodes was also very close to random, as quantified by the entropy of nucleotides per position (Fig. S1C). Incorporating the barcodes onto plasmids and then onto chromosomes reduced this diversity (∼8.4 × 10⁵ unique barcodes on plasmids and ∼4.5 × 10⁵ on chromosomes; Table S1). The process also introduced more redundancy into the distribution of frequencies, with some barcodes reaching frequencies of 10⁻³ (Fig. S1A,B). These increases in redundancy also led to a minor decrease in sequence entropy (Fig. S1C). However, the presence of a few barcodes with high initial frequencies did not appear to play a major role in the resulting lineage dynamics during evolution, as we show below.

Laboratory evolution of the barcoded population

Bacteria are often exposed to antibiotic concentrations far below the minimal inhibitory concentration (MIC), both in natural environments and in patients receiving antimicrobial therapy^26,27. Previous studies have shown that, compared to a lethal dosage, sub-MIC concentrations greatly expand the mutational space by allowing a large number of small-effect mutations to enter a population simultaneously²¹. The importance of low-frequency lineages in sub-MIC conditions make them a perfect setting to employ our barcoded population of E. coli. To this end, we chose two common antibiotics with distinct modes of actions: chloramphenicol (CMP), which inhibits protein synthesis via inactivation of peptidyl transferase activity of bacterial ribosome²⁸; and trimethoprim (TMP), which functions as a competitive inhibitor of the essential protein dihydrofolate reductase²⁹. It was demonstrated that sub-inhibitory concentrations of antibiotics as low as MIC/100 can still select for resistant mutants over the wild type³⁰. Thus, to track the dynamics of adaptations at sub-MIC concentrations we chose antibiotic concentrations around MIC/100. To reduce the selection pressure even further, below the minimal selective concertation, we also chose ultra-sub-MIC concentrations (∼MIC/1000). To fine-tune the sub-MIC concentrations of both antibiotics to the conditions of the laboratory evolution experiment, we identified concentrations that reduced the total number of cells by no more than 30% by the end of a single propagation cycle (∼10 hours), comparatively to the untreated culture (Methods). These concentrations were 1 μg/mL CMP (6.25% MIC) and 0.1 μg/mL TMP (1% MIC) (Fig. S2, Methods). For ultra-sub-inhibitory regime, we chose 10% of the aforementioned concentrations. We then conducted laboratory evolution via serial passaging in the presence of CMP initially at 6.25% (‘low’) and 0.625% (‘ultra-low’) MIC, in TMP at 1% (‘low’) and 0.1% (‘ultra-low’) MIC, and in the absence of any antibiotics (Fig. S3A,B). We evolved 14 independent replicate populations in each of these five conditions (Fig. S3C,D). We diluted batch cultures (500 μL each) grown in 96 well plates by 1:100 every ∼6 generations (that is, passing twice daily; Fig. S3E,F) with a bottleneck population size of ∼3 × 10⁷ cells (Fig. S3C,D). To sustain the selection pressure along the evolutionary experiment (∼420 generations), we gradually increased CMP in the ‘low’ condition from 1 μg/mL to 2.8 μg/mL (Fig. S3A); and in the ‘low’ TMP condition, we increased the antibiotic from 0.1 μg/mL to 1.2 μg/mL (Fig. S3B). We kept the ‘ultra-low’ CMP environment constant at 0.1 μg/mL throughout the experiment (Fig. S3A), while ‘ultra-low’ TMP increased from 0.01 μg/mL to 0.1 μg/mL at generation ∼288 (Fig. S3B). As intended, the number of cells at the end of each passage remained roughly constant for each condition and along the entire evolutionary experiment (Fig. S3C,D).

In the experiment, we expected at least two forms of selective pressure, one due to the specific type and concentration of antibiotic, and another due to general growth conditions (growth medium, aeration, etc.). Additionally, there might also be fitness cost associated with the acquisition of drug resistance^31,32. To analyze the effects exerted by these two selection forces, and a possible fitness cost, we measured the fitness of the barcoded populations at several time points during the experiment. Specifically, due to the limitation of the number of samples that can be loaded into a single NextSeq run, we chose a total of 12 randomly picked populations over the five conditions: three replicates for ‘low’ CMP, three replicates for ‘ultra-low’ CMP, two replicates for ‘low’ TMP, two replicates for ‘ultra-low’ TMP, and two replicates for ‘no drug’ condition. We measured fitness with respect to growth under antibiotics in units of IC50 (antibiotic concentration inhibiting 50% of growth) at the whole-population level (Fig. S4, Methods). These measurements revealed a moderate increase in antibiotic resistance over the experiment for ‘low’ CMP and TMP conditions (Fig. S5A,B). In contrast, both ‘ultra-low’ conditions produced no measurable improvement in IC50 (Fig. S5A,B).

Furthermore, the increases in drug resistance evolved in the ‘low’ conditions were accompanied by a fitness cost in the absence of antibiotics, measured by the growth rate (Fig. S5C,D). In particular, the population that evolved to resist the highest levels of CMP (‘low’ CMP replicate 1, Fig. S5A) showed the strongest fitness cost among the three replicate populations in the same condition (Fig. S5C). However, we observed no fitness cost in the populations evolved under ‘ultra-low’ CMP. Instead, the growth rate trajectories for these populations were generally similar to those for the populations that evolved in the absence of antibiotics (Fig. S5C). Surprisingly, in the populations evolved under ‘ultra-low’ TMP, the improvement in growth rate lagged behind the populations evolved under no antibiotics (Fig. S5D), suggesting that the rate of adaptation to growth conditions at 0. 1% MIC of TMP was diminished, despite the fact that no improvement in TMP IC50 was observed (Fig. S5B).

Barcodes allow high-resolution monitoring of lineage trajectories

To elucidate the evolutionary dynamics of these populations at the level of individual lineages, we sequenced the barcodes of the same 12 populations at 16 time points. From the sequenced barcodes we assembled frequency trajectories for all ∼4.5 × 10⁵ initial lineages over the course of the experiment (Figs. 2, S6). These trajectories immediately raised several important qualitative insights. First, we saw that just one or two lineages dominate each population (> 85%) by the end of the experiment (see also Table S2). Second, some lineages rose to high frequency in several independent populations (lineage colors match across panels in Fig. 2). Third, we saw evidence of widespread clonal interference: some lineages that initially increased in frequency due to positive selection later decreased due to competition from fitter lineages. Furthermore, Fig. 2 suggests that the rate of lineage diversity loss was reproducible between replicate populations in the same condition, while systematically distinct across different conditions. Although genetic drift alone can cause a drop in lineage diversity, this occurs on the time scale of 2N generations, where N is the effective population size³³. Since the effective population size in our experiments was of order 10⁷ (Fig. S3C,D), neutral dynamics at the scale of the whole population was negligible on the time scale of the experiment. Therefore, we could assume that the dynamics of lineage diversity was dominated by selection.

Figure 2: Dynamics of barcoded lineage frequencies over evolution experiment.

Each panel shows the frequency trajectories for all barcoded lineages in a single population over time of the experiment. Each colored band corresponds to a unique lineage, with its vertical width indicating its frequency at a particular time point. The panels in each row correspond to a different antibiotic regimen, while each column corresponds to a different replicate. For the top 10 (according to average frequency) lineages in each population, we assign a unique color to each lineage that is consistent across panels (Table S2). We use random colors for all lower-frequency lineages, while gray represents the frequency of reads without identified barcodes.

Ecological tools allow quantification of lineage diversity dynamics across conditions

To quantify the dynamics of lineage diversity, we adopted a measure of diversity widely used in ecological studies^34,35: where x_k is the frequency of the k^th barcoded lineage, and q is the “order” of the diversity index, which determines the sensitivity of diversity to abundant versus rare barcodes (Methods). In general, we can interpret the diversity index as the effective number of lineages present in the population. When q = 0, the diversity index simply counts the number of unique barcoded lineages, irrespective of their frequencies. This regime is equivalent to measuring diversity as “species richness” in ecological contexts³⁴. When q = 1, the diversity index weighs all barcoded lineages by their frequency. This regime is equivalent to the exponential of the Shannon entropy of the frequencies. In the limit of q → ∞, the diversity index equals the reciprocal of the maximum lineage frequency, meaning that it depends only on the most abundant lineage and no others. Thus, by comparing the lineage diversity index across different q values, we can estimate the relative contributions of rare and abundant lineages to that diversity. Note, that if all lineages have equal frequencies, then the diversity index equals the actual number of lineages for any value of q. Figure 3A shows the dynamics of the lineage diversity index for each population over the time of the experiment, for three different values of q. At the beginning of the experiment, ⁰D ≈ 4.5 × 10⁵, since that is the total number of unique barcodes. However, the effective number of lineages, accounting for their unequal frequencies (Fig. S1A,B), is approximately 10-fold lower, ¹D ≈ 4.6 × 10⁴. ‘Low’ CMP produced the fastest collapse of lineage diversity, extinguishing over 90% of unique barcodes in less than 50 generations. This behavior was displayed at all q values, indicating that rare and frequent barcodes contributed equally to these dynamics. In contrast, populations under ‘low’ TMP lost ⁰D diversity more rapidly compared to ‘ultra-low’ CMP and ‘no drug’ conditions, although it lost ¹D and ^∞D diversities at approximately the same rate; this indicates that the dynamics of abundant lineages were similar in these three conditions, but that low-frequency lineages disappeared more quickly in ‘low’ TMP. More surprising was the fact that populations under ‘ultra-low’ TMP lost diversity even more slowly than did populations under no antibiotics. Indeed, the diversity of populations under ‘ultra-low’ TMP maintained 40-50% of the initial effective diversity (q = 1) up to generation ∼120, whereas populations without antibiotics had only 1% of their initial diversity by that time point. This difference in the rates of lineage diversity deterioration between ‘ultra-low’ TMP and ‘no drug’ conditions is consistent with our fitness cost measurements, wherein the rate of adaptation under ‘ultra-low’ TMP lagged behind that of ‘no drug’ populations (Fig. S5D). Interestingly, q = 0 diversity under ‘ultra-low’ TMP is similar to the other populations by generation ∼250, but its diversity at larger q remains higher until the very end of the experiment. Towards the end of the experiment, we saw that the effective diversity of all populations is 1 or 2 lineages (q = 1), consistent with the observations from Fig. 2. However, we note that despite the drop in lineage diversity, there was still an ample number of surviving barcode lineages: each population retained a few thousand barcodes by the end of the experiment, as seen in the q = 0 diversity (Fig. 3A).

Figure 3: Dynamics of lineage diversity over time.

We measure diversity of barcoded lineages using the index ^qD (Eq. 1, Methods), where the parameter q controls the weight of low-versus high-frequency lineages: (A) for q = 0 (number of unique barcodes), q = 1 (exponential of Shannon entropy of lineage frequencies), and q = ∞ (reciprocal of the maximum lineage frequency). (B) Pearson correlation coefficient between diversity trajectories (¹D) from all pairs of populations.

Reproducibility of individual lineage dynamics

Not only did different antibiotic regimens produced distinct patterns of lineage diversity loss, but we also observed that these patterns are consistent across replicate populations. In Fig. 3B we show that the Pearson correlation coefficients between the q = 1 diversity trajectories from all pairs of populations. While all trajectories are somewhat correlated, since they all monotonically decrease, we saw stronger similarity among trajectories from the same condition. To further dissect whether individual lineages have similar fates across populations, we must quantify the similarity between lineage frequencies in different populations.

To compare the lineage composition of two or more populations at a single time point, we used a definition of diversity dissimilarity from ecology. Suppose we have M populations whose lineage compositions we want to compare. We first calculated the diversity index (Eq. 1) for all populations pooled together, ^qD_pooled (Methods). We then calculated the diversity index for each population alone and determined the mean across all populations, ^qD_mean (Methods). The ratio of these two quantities, shifted and rescaled, measures the dissimilarity among lineage compositions^34,35:

If the lineage compositions of all populations are identical, then the pooled population has diversity equal to the mean diversity, and the dissimilarity index equals zero. In contrast, if the lineage compositions of M populations have zero overlap, then the pooled population has diversity M times greater than that of the mean single population, and the dissimilarity index equals 1. As with the diversity index, the parameter q allows us to vary the importance of low and high-frequency lineages in the dissimilarity index. For q = 0, the dissimilarity index measures how many lineages multiple populations have in common, regardless of their frequencies, while for q → ∞, it compares only the highest-frequency lineages (Methods).

In Fig. 4A (left panel), we calculated the q = 1 dissimilarity between all pairs of populations at generation 120. At the beginning of the experiment, the populations were identical and so the dissimilarity between all pairs of populations was zero. By generation 120, we saw that many pairs of populations from different conditions have already diverged (dissimilarity close to the maximum value of 1), while pairs of populations from the same conditions remained more similar (except for ‘low’ TMP). Furthermore, we saw some similarity even between conditions: populations under the weakest antibiotic pressures (‘ultra-low’ CMP, ‘ultra-low’ TMP, and ‘no drug’) all maintained similarity between conditions comparable to their similarity between replicates. However, by the end of the experiment at generation 420 (Fig. 4A, right panel), we saw that most of this similarity between populations had disappeared. The main exception was ‘low’ CMP, where the three replicate populations maintained strong similarity. There was also a small amount of similarity between ‘low’ CMP, ‘ultra-low’ CMP (two out of three replicates), and ‘low’ TMP. In contrast, replicate populations under ‘ultra-low’ TMP and ‘no drug’ showed no similarity to each other by the end of the experiment. There was also strong similarity between replicate 3 in ‘ultra-low’ CMP and replicate 2 of ‘ultra-low’ TMP.

Figure 4: Dynamics of lineage dissimilarity among populations over time.

(A) Diversity dissimilarity index for q = 1 (Eq. 2, Methods) between all pairs of populations at generation 120 (left) and the final time point at generation 420 (right). (B) Diversity dissimilarity index among all replicate populations in each condition, with q = 0, q = 1, and q = ∞.

We could further quantify the reproducibility of lineage dynamics by calculating diversity dissimilarity among all replicate populations in each condition over time (Figs. 4B, S7). For q = 0, we saw the dynamics of within-condition dissimilarity were similar to the dynamics of the diversity indices themselves in Fig. 3A. That is, populations under ‘low’ CMP diverged from each other most rapidly, followed by ‘low’ TMP, then concurrently by ‘ultra-low’ CMP and ‘no drug’, and, finally, by ‘ultra-low’ TMP diverging last. Interestingly, we saw all conditions settle at an intermediate amount of 0.8 dissimilarity by the end of the experiment; this value corresponds to having about 20% of their lineages in common (Methods). The dissimilarity index with q = 1, which accounts for heterogeneity in lineage frequencies, shows some differences with the q = 0 case that simply counts barcodes. With q = 1, ‘low’ CMP populations actually diverged more slowly from each other than did populations in the other conditions. Moreover, the ‘low’ CMP populations actually reached some maximum level of q = 1 dissimilarity around generation 150, and then began converging toward more similar lineage compositions. At the other extreme, ‘ultra-low’ TMP and ‘no drug’ populations reached maximum dissimilarity by the end of the experiment, while ‘low’ TMP and ‘ultra-low’ CMP populations had an intermediate value of dissimilarity. The dissimilarity index with q = ∞ shows a similar pattern of rise and fall for ‘low’ CMP populations; since this case depends only on the most frequent lineage, it implies that the high level of similarity between ‘low’ CMP populations by the end of the experiment was due to them sharing the same dominant lineage.

Altogether, this analysis shows that lineage dynamics under identical conditions are highly reproducible, even at the level of individual lineages. Specifically, it suggests that some lineages repeatedly rose to high frequency over multiple experiments. To demonstrate this more explicitly, in Fig. 5A we compared the frequencies of all barcoded lineages at the end of the experiment for two replicate populations in ‘low’ CMP, while in Fig. 5B we compared the trajectories for the top three lineages (ranked by average frequency over the experiment). Furthermore, in Fig. 5C we show the overlap of the top 10 barcodes between all three replicate populations in ‘low’ CMP: four of the top 10 in each population are shared among all replicates; with three more barcodes shared between two replicates (see also Table S2). Indeed, the most frequent lineage in replicates 2 and 3 is the exact same lineage (see purple trajectories in Fig. 2); this lineage is also the second-most frequent lineage in replicate 1. In contrast, Fig. 5D,E,F shows these same plots for the ‘no drug’ populations, which have little similarity among their most frequent lineages. In particular, they do not share any of their top 10 barcodes (Fig. 5F, Table S2). In Figs. S8, S9 we show direct comparisons of lineage frequencies between all pairs of populations. The dominant lineages in each population do not appear to have started at unusually high frequencies (Fig. S10), suggesting another mechanism must explain their dominance.

Figure 5: Repeatability of lineage dynamics.

(A) Final frequencies of all barcoded lineages between two replicate populations in ‘low’ CMP. (B) Traces of lineage frequency over time for the union of the top three barcodes (by average frequency) in two replicate ‘low’ CMP populations. (C) Venn diagram of the top 10 barcodes (ranked by mean frequency over the experiment) in each replicate of ‘low’ CMP. (D) Same as (A) but for two replicate populations in ‘no drug’. (E) Same as (B) but for two replicate populations in ‘no drug’. In all panels, the dashed black lines mark the line of identity. (F) Same as (C) but for ‘no drug’ condition.

Dynamics of individual lineages reveal clonal interference and the relative contribution of standing genetic variation versus and de novo mutations

One of the most salient features of individual lineages is that many of them appear to follow very similar trajectories (Fig. S6). To further elucidate the striking amount of similarity among some populations, we turned to the analysis of individual lineage trajectories in these populations. To that end, we performed hierarchical clustering of a subset of high-frequency lineage frequency trajectories in each population, based on the correlation coefficients between trajectories (Figs. S11, S12). We saw that the trajectories indeed formed well-defined clusters of distinct behaviors (Fig. S13); in particular, similar clusters appeared in replicate populations from the same condition, while more different clusters appeared in populations from different conditions. A few clustered trajectories also shared barcodes with very similar sequences, suggesting these lineages are actually the same (the distinct barcodes arising from sequencing errors) (Fig. S14). However, the fact that the vast majority of clustered trajectories involved unrelated barcode sequences suggests they are truly distinct lineages with highly correlated dynamics.

Arguably, the most interesting trajectory clusters are those that are non-monotonic with time, suggesting clonal interference. For example, Fig. 6A,B,C shows trajectory clusters from three populations in ‘low’ CMP with this property: these trajectories initially increased due to positive selection, but then later decreased as new mutations arose on other trajectories and outcompeted them. Every other population showed a similar cluster of trajectories, except those in ‘low’ TMP (Fig. S13); for example, Fig. 6E,F shows these trajectories in populations evolved without drug. The fact that these trajectories started increasing immediately suggests that beneficial mutations must have already been present on these lineages before the experiment. Indeed, the chromosomal barcoding process requires ∼30 generations of growth from the common ancestor. We surmised that during this time, random mutations began to accumulate in the population and, inevitably, were carried over to the evolution experiments. This also suggests an explanation for why some lineages rose to high frequency in multiple populations; these lineages likely carried beneficial mutations from the beginning, which allowed them to repeatedly dominate. We also observed a qualitatively different class of clonal interference trajectories, which did not start increasing immediately, but rose later in the experiment. For example, Fig. 6G,H shows these clusters in populations evolved with no drug. These trajectories appear to increase due to new mutations that arose during the experiment itself, rather than because of pre-existing mutations. We saw clusters of these trajectories in every population except those in ‘low’ CMP (Fig. S13). However, in replicate 1 of ‘low’ CMP, we do saw trajectories demonstrating both pre-existing beneficial mutations as well as new mutations, which initially rose due to the pre-existing mutations, then decreased from clonal interference, but then rose again due to the occurrence of a new mutation (Fig. 6D).

Figure 6: Distinct patterns of trajectories from pre-existing and new mutations.

Clustered trajectories in (A) ‘low’ CMP replicate 1, (B) ‘low’ CMP replicate 2, and (C) ‘low’ CMP replicate 3 with putative pre-existing beneficial mutations. (D) Clustered trajectories in ‘low’ CMP replicate 1 with multiple beneficial mutations. Clustered trajectories in (E) ‘no drug’ replicate 1 and (F) ‘no drug’ replicate 2 with putative pre-existing beneficial mutations; clustered trajectories in (G) ‘no drug’ replicate 1 and (H) ‘no drug’ replicate 2 with putative new beneficial mutations.

Discussion

Detailed understanding of evolutionary processes depends on our ability to follow individual lineages at a whole population level throughout multiple generations. Abundance of small-effect mutations in large populations makes it important to track lineages down to very low frequencies, ideally 10⁻⁵ – 10⁻⁶, in particular, during the initial rounds of evolution. However, methods for labeling lineages or whole-genome sequencing typically limit this resolution to orders of magnitude higher. Here we overcome this hurdle by developing a method that, for the first time, generates E. coli populations carrying 10⁵-10⁶ unique chromosomal barcodes. We tested the utility of the method by subjecting the barcoded population to serial passaging in presence of sub-inhibitory concentrations of common antibiotics. The relatively large size of the evolving populations (>10⁷) and the limited number of generations (∼420) have practically eradicated the contribution of drift to fixation of lineages in our experimental system, rendering selection the main force driving the loss of lineage diversity. Therefore, we can interpret the loss of lineage diversity as a proxy for the rate of adaptation in the population. We found that each condition prompted a different rate of adaptation (Fig. 3A), with the exception of ‘ultra-low’ CMP and ‘no drug’ conditions, which exhibited similar dynamics. Unexpectedly, ‘ultra-low’ TMP substantially reduced the rate of adaptation and lineage diversity loss, even compared to the rates observed for ‘ultra-low’ CMP and ‘no drug’ conditions. Since the growth dynamics of these populations indicate they experience a similar number of generations per passage as the other populations (Fig. S3E,F), we hypothesize that the observed delay in adaptation is due to a reduction in selection coefficients on beneficial mutations, compared to both the higher concentration of TMP (‘low’ TMP) and the condition without antibiotics. It was suggested that, at a low dosage, antibiotics might operate not as a weapon, but rather as signaling molecules that trigger transcriptional activation of multiple genes, including genes involved in the biosynthesis of amino acids, ribosomal proteins, purines, and pyrimidines²¹. If TMP, which is known to affect transcription³⁶, indeed induces a new metabolic state in the bacterial cells at ultra-low concentration, it can potentially lead to a shift in the distribution of beneficial mutations. Namely, the fitness advantage of mutations under growth in the absence of antibiotics can decrease in the presence of 10 ng/ml of TMP, which is the ultra-sub-MIC concentration used in our experiment. It is important to note that we could not have easily detected a delayed adaptation in populations subjected to ‘ultra-low’ TMP, had we not used the barcode sequencing approach. Investigating the mechanism underlying the delayed adaptation at ‘ultra-low’ TMP is the subject of another work in the immediate future.

One of the important problems in evolutionary biology is robustly quantifying the reproducibility of evolutionary processes^37,38. The deterministic nature of adaptation at near-lethal drug concentrations has been demonstrated at the level of a handful of strongly-beneficial mutations that have repeatedly accumulated in almost pre-determined order in replicate populations³⁹. However, quantifying the reproducibility of evolutionary dynamics driven by many small-effect mutations at the whole-population level has remained elusive in bacteria. Our chromosomal barcoding system in E. coli allowed us to directly address this problem by quantitatively comparing lineage dynamics across independent replicate populations subjected to identical antibiotic regimes. We found that the rate of lineage diversity loss was highly reproducible for each selection condition (Fig. 3A), indicating that lineage diversity analysis provides a robust way to quantify the reproducibility of evolution at a whole-population level. Curiously, the diversity dynamics were reproducible regardless of the rate of adaptation, For example, both ‘low’ CMP condition, which induced the fastest rate of adaptation, and ‘ultra-low’ TMP condition, which induced the slowest adaptation dynamics, exhibited high reproducibility between evolutionary replicates. This observation implies a surprising deterministic dynamics at the level of lineage diversity, despite differences in the distributions of fitness effects across conditions in our experiment.

The dynamics were furthermore deterministic at the level of individual lineages in some cases, with a few lineages rising to high frequency in multiple independently evolving populations. Populations can adapt to new environments using two sources of beneficial mutations: pre-existing mutations (i.e., standing genetic variation) and new mutations⁴⁰. These two mechanisms can have different effects on the rates and outcomes of evolution, yet delineating the evolutionary dynamics with respect to the relative contributions of preexisting and new mutations is nontrivial. Competition experiments in presence of sub-MIC amounts of antibiotics demonstrated that pre-existing mutations can be enriched for by sub-MIC selection^30,41. Sub-MIC selection was also shown to increase the frequency of new resistance-conferring mutations in evolving populations^30,42. However, neither the relative contribution of standing genetic variation versus novel mutations in a single population under sub-MIC selection, nor the evolutionary dynamics resulting from the interplay between these two mutational sources have been demonstrated. Here, we developed a quantitative approach to address this problem. By applying a measure of lineage composition dissimilarity from ecology^34,35, we compared the change in frequency of identical lineages between independent replicates subjected to identical conditions over the evolutionary time course. These results suggest that lineages present at very low frequencies at the initial population, but nonetheless independently reaching high frequency in the replicate populations, are repetitively selected for because they carry preexisting beneficial mutations. Thus, the extent of similarity in lineage composition between evolutionary replicates at a particular time point quantitatively reports on the contribution of standing genetic variation to the observed dynamics. Expectedly, the rapid adaptation under ‘low’ CMP was accompanied by the lowest dissimilarity index among selection regimes (Fig. 4), indicating a heavy contribution of standing genetic variation. Conversely, very little indication of the contribution of pre-existing variation can be seen for ‘ultra-low’ TMP and ‘no drug’ conditions (Fig. 4), indicating that the evolutionary dynamics of the latter was driven mostly by newly acquired mutations. Further support for the importance of pre-existing beneficial mutations comes from our analysis of individual trajectories. Lineages increasing in frequency due to a newly-acquired beneficial mutation show a finite establishment time (time required for a lineage prior to committing to a deterministic growth after acquisition of a mutation). In contrast, lineages carrying pre-existing mutations increase immediately upon applying selection pressure, with zero establishment time. Indeed, among all the identified individual lineages exhibiting clonal interference trajectories (Fig. S13), no establishment time can be seen under ‘low’ CMP conditions, while finite establishment times can be detected in all other conditions.

Overall, our results demonstrate significant evolutionary insights gleaned from high-resolution lineage tracking using chromosomal barcodes in E. coli. Our experimental barcoding protocol based on the Tn7 transposon machinery is straightforward to implement and can readily be reproduced in a variety of systems. We have furthermore shown how to obtain a robust quantitative analysis of the resulting lineage data using ecological diversity indices. Altogether, we envision that this tool will find wide applicability in addressing diverse questions in bacterial population and evolutionary dynamics.

Methods

Design of the chromosomal barcode integration system in E. coli

The design of the chromosomal integration system of the barcode library is based on the pGRG25 plasmid, which carries all the components of the Tn7 transposon site-specific recombination machinery^25,43,43. We separated the Tn7 transposase machinery (tnsABCDE) from the Tn7 arms (Tn7L, Tn7R) onto two independent plasmids: the temperature-sensitive helper plasmid (pSC101 temperature-sensitive origin of replication) carrying the tnsABCD genes under the control of an arabinose-inducible pBAD promoter (Fig. 1A), and the suicide integration plasmid (R6K gamma pir+ dependent origin of replication) with the Tn7 arms flanking the barcode-carrying cassette and spectinomycin resistance gene (Fig. 1B,C). The separation of the recombination machinery tnsABCD from the integration segment flanked by the Tn7 arms achieves two major goals. First, the introduction of the helper plasmid into the cells prior to the integration plasmid allows us to precondition the cells for chromosomal recombination by inducing the expression of the transposase complex, which is expected to increase the integration efficiency. Second, the transient nature of the integration plasmid will eliminate any possibility of barcodes lingering outside of their designated chromosomal location due to unsuccessfully cured plasmids. This is especially important for barcoding temperature-sensitive strains that cannot be grown at non-permissive temperatures. The design of the barcode carrying cassette (Fig. 1C) is aimed at minimizing the number of library preparation steps required for Illumina deep-sequencing technology. The only variable region in the cassette is the barcode sequence of 15 random nucleotides. Thus, the barcode library has a theoretical diversity of 4¹⁵ (∼10⁹) unique barcodes. The barcodes are asymmetrically located 9 nt downstream of the 5’ end of the cassette. This short stretch of 9 nt is sufficient to locate the barcodes in the raw sequencing data files and to reduce the sequence redundancy in the Illumina flow cell. The cassette is flanked by the sequences complementary to the Illumina adapter primers used to anchor library specific indexes recognized by the Illumina sequencing platforms (MiSeq, HiSeq, or Nextseq) (Fig. 1C). The Integrated DNA Technology (IDT) gBlocks service synthesized the barcode cassettes (allowing the incorporation of 15 consecutive and variable nucleotides), which we then cloned into the Tn7 integration plasmid (Fig. 1B). We characterized the resulting library of barcodes by deep sequencing prior to integration into the genome (Fig. S1).

Design of the barcode-carrying cassette

The barcodes comprise 15 random nucleotides, which we placed 9 nt downstream of the 5’ end of the 288-nt long cassette:

gtcgcgccggNNNNNNNNNNNNNNNtatctcggtagtgggatacgacgataccgaagaca gctcatgttatatcccgccgttaaccaccatcaaacaggattttcgcctgctggggcaaa ccagcgtggaccgcttgctgcaactctctcagggccaggcggtgaagggcaatcagctgt tgcccgtctcactggtgaaaagaaaaaccaccctggcgcccaatacgcaaaccgcctctc cccgcgcgttggccgattcattaatgcagctggcacgacaggtttccc

We placed the barcode-carrying cassette between sequences complementary to Illumina adaptor primers (forward overhang: 5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG; reverse overhang: 5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG) and, finally, flanked by sequences complementary to the integration site in the Tn7 integration plasmid:

gatatcggatcctagtaagccacgttttaattaatcagatccctcaatagccacaacaac tggcgggcaaacagtcgttgctgattggtcgtcggcagcgtcagatgtgtataagagaca gtcgcgccggNNNNNNNNNNNNNNNtatctcggtagtgggatacgacgataccgaagaca gctcatgttatatcccgccgttaaccaccatcaaacaggattttcgcctgctggggcaaa ccagcgtggaccgcttgctgcaactctctcagggccaggcggtgaagggcaatcagctgt tgcccgtctcactggtgaaaagaaaaaccaccctggcgcccaatacgcaaaccgcctctc cccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccctgtctcttata cacatctccgagcccacgagacgccactcgagttatttgccgactaccttggtgatctcgcctttcacgtag

Integrated DNA Technology (IDT) (https://www.idtdna.com/pages/products/genes-and-gene-fragments/gblocks-gene-fragments) synthesized the resulting 492 nt-long sequence as double-stranded gBlock with randomly mixed bases (at the 15 consecutive positions indicated by N).

Generation of plasmid barcode library

We digested the empty Tn7 integration plasmids with NotI, purified them by ethanol precipitation, and then mixed the plasmids with gBlocks in a 1:3 molar ratio in the presence of NEBbuilder HiFi DNA (New England Biolabs) assembly mix according to the manufacturer’s instructions. We used a control without NEBbuilder HiFi DNA assembly mix to determine the background level (Tn7 integration plasmid lacking the cassette). We concentrated the reactions by ethanol precipitation and transformed them into 100 μL of TransforMax™ EC100D™ pir+ cells (Lucigen). We resuspended the transformed cells in 1 mL of SOC medium, regenerated them for 1 hour at 37 °C with shaking followed by overnight incubation on the bench, and then plated the cells on 35 μg/ml kanamycin LB agar plates (100 μL of transformed cells per plate). We plated several microliters of diluted cultures separately to determine the number of colony-forming cells. We incubated plates overnight at 37 °C. Cells transformed with DNA but without the assembly mix produced no colonies. Transformation of DNA with the assembly mix generated a total of ∼2.4 × 10⁶ colonies (Fig. 1D). We scraped all the colonies from the plates, pooled them together, and thoroughly mixed them. Finally, we extracted plasmids from the pooled cells with a Qiagen midi kit.

Chromosomal integration of the barcode library into E. coli cells

This is a two-step process, the first step being the transformation of the recipient E. coli MG1655 cells with the Tn7 helper plasmid and induction of the transposase integration machinery. The second step is the transformation of the Tn7 integration plasmid library, which will integrate the barcodes into the chromosome (Fig. 1D). We grew cells transformed with the Tn7 helper plasmid overnight until saturation in LB supplemented with 100 μg/mL ampicillin at 30 °C. We then diluted these cells 1:100 and grew them under the same condition for 45 min. We added 0.2% arabinose and diluted cells to OD600nm of 0.5. We then harvested the cells, washed them 3 times with ice-cold water, and transformed them with the Tn7 integration plasmid library using electroporation. We resuspended the transformed cells in 1 mL of SOC medium, regenerated them for 1 hour at 30 °C with shaking followed by overnight incubation on the bench, and then plated them on 100 μg/mL spectinomycin LB agar plates (100 μL of transformed cells per plate). We plated several microliters of diluted culture separately to determine the number of colony-forming cells. We incubated plates overnight at 37 °C and produced ∼2 × 10⁶ colonies in total. All randomly-picked colonies (over 50) failed to grow on ampicillin, suggesting that overnight incubation at 37 °C is sufficient to cure the majority of cells of the Tn7 helper plasmids. Similarly, we observed no growth on kanamycin, showing that the Tn7 integration plasmids were no longer present. We further validated the chromosomal incorporation of the barcode-carrying cassettes by colony PCR with a pair of primers directed to the Tn7 integration site. All randomly-picked colonies (over 50) were positive for the chromosomal integration. We scraped all the colonies from the plates, pooled them together, thoroughly mixed them, aliquoted them with 15% glycerol, and stored them at −80 °C.

Deep sequencing of the gBlocks library, the Tn7 integration plasmid library, and the naïve barcoded E. coli population

Sample preparation for deep sequencing involved four steps. First, we amplified the barcode-carrying cassettes with Illumina adaptor primers (forward overhang: 5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG; reverse overhang: 5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG). We used gBlocks and the Tn7 integration plasmid library directly as DNA templates in the PCR reaction. In the case of the naïve barcoded E. coli population, we performed the PCR reaction either directly on the cells or, alternatively, on the genomic DNA extracted from the barcoded population with Nucleospin Microbial DNA prep kit (Machary-Nagel). Second, we separated the PCR product on 1% agarose gel, excised it, and purified it using NucleoSpin Gel extraction kit (Machary-Nagel). Third, we subjected the gel-purified product of the first PCR reaction to a second PCR reaction using a pair of index primers from Nextera XT DNA library preparation kit (Illumina). Fourth, we purified the product of the second PCR reaction with Agencourt AMPure XP PCR purification kit (Beckman Coulter). We performed sequencing on MiSeq or NextSeq platforms (Illumina).

Laboratory evolution

We subjected the naïve barcoded E. coli population to laboratory evolution via serial passaging under five distinct growth conditions: ‘low’ chloramphenicol (CMP) (1-3 μg/mL), ‘ultra-low’ CMP (0.1 μg/mL), ‘low’ trimethoprim (TMP) (0.1-1.2 μg/mL), ‘ultra-low’ TMP (0.01-0.1 μg/ml), and ‘no drug’ (Fig. S3A,B). We grew cells at 37 °C in a 96-well microtiter plate (500 μL per well) in supplemented M9 medium (0.2% glucose, 1mM MgSO4, 0.1% casamino acids, 0.5 mg/ml thiamin). Between passages, we grew cultures for 8-9 hours (during the day) or 10-12 hours (during the night). We used saturated culture from a previous passage to inoculate a fresh plate by 1:100 dilution (5 μL of saturated cultured into 500 μL of fresh medium). Overall, we performed 70 passages. We measured optical density (OD) of the cultures at 600 nm at the end of each passage. We convert the raw OD measurements to an estimated number of cells (Fig. S3C,D) as follows: we first multiply by 10 to correct for dilution of the measured culture, then multiply by 2 to standardize the OD to 1 cm path length, then multiply by cell density 10⁸ cells/mL/OD, and finally multiply by 0.2 mL as the volume of the well. We stored every second passage at −80 °C after addition of 15% glycerol.

Fitness measurements of the evolving barcoded E. coli populations

We estimated fitness of the evolving populations by calculating the change in IC50 of chloramphenicol or trimethoprim. To this end, we sampled cells from three populations evolved in ‘low’ CMP (wells A1, B1, and C1), three populations evolved in ‘ultra-low’ CMP (wells A3, B3, and C3), and two populations evolved without antibiotics (wells E9 and F9) at passages 0, 8, 10, 12, 20, 30, and 70. We diluted these samples 1:100 into growth medium supplemented with 0, 1, 2, 4, 8, or 16 μg/mL of chloramphenicol, and followed their growth by OD measurements at 600 nm (Figs. S2A, S4A). We calculated the area under these growth curves over the time of growth (Fig. S4B) and normalized the area values so that they equaled 1 at zero antibiotic (Fig. S4C). We determined the IC50 by calculating the concentration of antibiotic at which growth (defined as normalized area under the growth curve) was reduced by 50% relative to zero antibiotic (Fig. S4C). We inferred the IC50 concentration by interpolating the area vs. drug concentration curves. We similarly obtained IC50 values for two populations evolved in ‘low’ TMP (wells A5 andB5), two populations evolved in ‘ ultra-low’ TMP (wells G7 and H7), and two populations evolved without antibiotics (wells E9 and F9), all at the same time points as for CMP. We used TMP concentrations 0, 0.5, 1, 2, 5, 10, and 20 μg/mL.

Deep sequencing of the evolving barcoded E. coli populations

We sequenced barcodes at 16 time points over the evolution experiment for the same 12 independent populations used for the fitness measurements: three populations in ‘low’ CMP (wells A1, B1, and C1), three populations in ‘ultra-low’ CMP (wells A3, B3, and C3), two populations in ‘low’ TMP (wells A5 and B5), two populations in ‘ultra-low’ TMP (wells G7 and H7), and two populations in no antibiotics (wells E9 and F9). We first amplified these 192 bacterial cultures with Illumina adaptor primers (15 μL of defrosted cells from each culture). We performed the second PCR in two groups, each with 96 unique Nextera XT primers. We then pooled together 96 PCR reactions from each group, spiked them with 30% of PhiX DNA, and sequenced them on a Nextseq High Output 75 platform. The sequencing protocol commenced with 9 dark cycles to account for the sequence redundancy preceding the barcode area. For one sample – ‘ultra-low’ CMP, replicate 2 (well A3) at passage 54 – all PCR reactions failed, and thus we excluded this sample from all further analysis.

Analysis of sequencing data

First, we exclude sequencing samples that report fewer than 10⁶ reads; this affects one of the samples from the initial population and six of the samples from the evolving populations (Table S1). All remaining samples have between 10⁶ and 7 × 10⁶ reads. Next, we exclude all reads with minimum base quality score less than 10 (Phred scale), which affects 0.02-0.05% of reads (Table S1). To identify barcodes, we first align each read to the reference sequence for the barcode cassette. We allow up to three mismatches or one indel with respect to the reference; we also require that the read overlap the barcode by at least 10 nt. With these criteria we identify barcodes on more than 95% of reads in almost all samples (Table S1). For each read with a valid alignment, we extract a barcode as the sequence aligning to the variable region in the reference.

To correct for sequencing errors in the raw barcodes, we use the bartender package⁴⁵ on default settings to cluster together barcodes with similar sequences. In general, this method assumes that a low-frequency barcode differing at only one or two bases from a high-frequency barcode is the result of a sequencing error, so that the low-frequency barcode is merged into the high-frequency one. This produces a set of putatively true barcodes for the sample.

To ensure that we identify true barcodes consistently across samples, we first pool raw barcodes and perform clustering on these pooled samples. We pool barcodes both across time points for each population (to build trajectories of barcodes over time in each population) as well as across populations for each time point (to compare barcodes between populations). After clustering we disaggregate the true barcodes from the pooled data back into the individual samples, where we normalize them by the total number of reads in that sample. This yields a set of lineage frequencies {x_k} (where the index k runs over all barcodes) for each population at each time point.

Quantifying lineage diversity

The simplest way to quantify the diversity of lineages in a population is to count the number of unique barcodes observed at a particular time point (Fig. 3A). However, if lineages differ widely in frequency, then this measure may not be very informative and will suffer from significant sampling bias (since very low-frequency barcodes will be under-sampled). A more general approach is to define the effective number of lineages using the diversity index q_D from ecology³⁴. We construct this definition in analogy with the case where all lineages are at equal frequency, so that the number of lineages is simply the reciprocal of this frequency:

When lineages are not at equal frequencies, we replace the frequency in the denominator by a mean frequency over all lineages. Define the generalized mean (also known as the power mean) of a quantity h_k, with normalized weights p_k (Σ_k p_k = 1) and parameter q:

The parameter q controls how strongly the mean depends on very small or very large values of h_k: lower values of q more strongly weigh low values of h_k, while higher values of q weight high values of h_k more strongly. When q = 2, Eq. 2 reduces to the ordinary arithmetic mean; when q = 1, it is equivalent to the geometric mean; and when q = 0, it is the harmonic mean.

We therefore define the effective number of lineages as the reciprocal of the generalized mean frequency over all lineages, weighing each frequency by itself^34,35:

The diversity index is mathematically equivalent to the exponential of the Renyi entropy in physics⁴⁶. Special values of the parameter q correspond to common ecological measures of diversity:

Note that the effective number of lineages according to Eq. 3 equals the actual number of lineages for any q if all lineages have equal frequency. Figure 3 shows the diversity indices for q = 0, q = 1, and q = ∞ for each population over the course of the evolution experiment.

Quantifying dissimilarity of lineage composition between populations

We can also use ecological diversity indices to quantitatively compare the lineage compositions of two or more populations. Let M be the number of populations we comparing (M ≥ 2), and let be the frequency of lineage k in population p (= 1, 2, …, M) at a particular time point. If we pool together all M populations, the frequency of lineage k is

The total diversity of the pooled population (“gamma diversity”) is^34,35:

We can decompose this total diversity into two factors:

The first factor on the right-hand side of Eq. 10 is the mean diversity across all populations (“alpha diversity”): where ^qD^(p) is the diversity of population p alone (Eq. 3). The second factor on the right-hand side of Eq. 10 is the effective number of distinct populations (“beta diversity”):

This quantity has a minimum value of 1 if the populations have identical lineages at identical frequencies, and a maximum value of M if none of the populations have any lineages in common. To simplify the interpretation of this quantity across cases where we may be comparing different numbers of populations (e.g., three replicates in ‘low’ CMP versus two replicates in ‘low’ TMP), we shift and rescale ^qM_eff to obtain a measure of dissimilarity between populations that ranges from 0 to 1:

We plot this normalized quantity in all figures (Figs. 4, S8, S9).

In the case of q = 0, ^qM_eff simply measures how many lineages are in common between the populations under comparison. Let B^(p) be the set of lineages with nonzero frequencies in population p, and let |B^(p)| denote the number of lineages in this set. Then the effective number of distinct populations is

For two populations (M = 2), we can rewrite this as where we have invoked the inclusion-exclusion principle for sets: . That is, ^qM_eff equals two minus the harmonic mean of the fractions of overlapping lineages between the populations. For example, ^qM_eff = 1.8 means that the two populations have 20% of their lineages in common.

In the case of q = 1, the effective number of lineages is the Shannon diversity (Eq. 5). Therefore the effective number of distinct populations is equivalent to the exponential of the Jensen-Shannon divergence between the populations:

This is also equivalent to the weighted sum of the Kullback-Leibler divergences between each population and the pooled population.

In the case of q = ∞, the effective number of distinct populations depends only on the most abundant lineage in the pooled population and across all populations. That is, and and so

Clustering lineage frequency trajectories

For each population, we exclude barcoded lineages that have zero detected frequency at more time points than a minimum number calculated as where ⌊ ⌋ is the floor function that rounds the argument down to the nearest integer. This ensures that all pairs of remaining lineages have at least 10 time points at which they both have nonzero frequency. This leaves between 310 and 695 lineages for each population. We cluster the frequency trajectories x_k(t) for these lineages using the hierarchical clustering routine in SciPy⁴⁷. The distance metric between two lineages k₁ and k₂ is where ρ(logx_k₁(t), log x_k(t))) is the Pearson correlation coefficient between the logarithms of both frequency trajectories (excluding time points where either frequency is zero); Fig. S13 shows matrices of all pairwise trajectory distances. We use the “average” linkage method (equivalent to unweighted pair group method with arithmetic mean, or UPGMA), which calculates the distance between two clusters as the arithmetic mean of the distances between all trajectories in both clusters. Other linkage methods produce qualitatively similar results. The hierarchical clustering results in dendrograms as shown in Fig. S13. Finally, we form flat clusters by setting thresholds on the dendrograms, which we manually choose for each population; these thresholds are shown on the dendrograms in Fig. S13 and range from 0.35 to 0.65, which roughly mean that the correlation between trajectories within clusters is at least 0.35 or 0.65.

Authors Contribution

S.B. conceived the study. J.L. and W.J. performed the laboratory experiments. M.M. analyzed sequence data and applied ecological tools. S.B., M.M., A.W.R.S., L.G., analyzed the data. S.B. and M.M. wrote the manuscript. All authors approved of the final manuscript.

Acknowledgments

We thank Lu Zhao, Jose Rojas Echenique, Sasha Levy, and Alberto Pascual Garcia for valuable discussions and advice, and Dan Tawfik and Amir Aharoni for insightful comments and help with preparation of the manuscript. MM was supported by an F32 fellowship from the National Institutes of Health (GM116217) and an Ambizione grant from the Swiss National Science Foundation (PZ00P3_180147). A.W.R.S. acknowledges support from the Canadian Natural Sciences and Engineering Research Council (NSERC RN000524). This work was supported by personal Israel Science Foundation grant 1630/15 to SB.

Footnotes

↵* e-mail: shimonb{at}bgu.ac.il

References

↵
Lazar, V. et al. Genome-wide analysis captures the determinants of the antibiotic cross-resistance interaction network. Nat Commun 5, 4352, doi:10.1038/ncomms5352 (2014).
OpenUrl CrossRef PubMed
Deatherage, D. E. & Barrick, J. E. Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol Biol 1151, 165–188, doi:10.1007/978-1-4939-0554-6_12 (2014).
OpenUrl CrossRef PubMed Web of Science
Otto, M. Next-generation sequencing to monitor the spread of antimicrobial resistance. Genome Med 9, 68, doi:10.1186/s13073-017-0461-x (2017).
OpenUrl CrossRef
↵
Xue, Y. & Wilcox, W. R. Changing paradigm of cancer therapy: precision medicine by next-generation sequencing. Cancer Biol Med 13, 12–18, doi:10.28092/j.issn.2095-3941.2016.0003 (2016).
OpenUrl CrossRef
↵
Blundell, J. R. et al. The dynamics of adaptive genetic diversity during the early stages of clonal evolution. Nat Ecol Evol, doi:10.1038/s41559-018-0758-1 (2018).
OpenUrl CrossRef
Desai, M. M., Walczak, A. M. & Fisher, D. S. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193, 565–585, doi:10.1534/genetics.112.147157 (2013).
OpenUrl Abstract/FREE Full Text
Lang, G. I., Botstein, D. & Desai, M. M. Genetic variation and the fate of beneficial mutations in asexual populations. Genetics 188, 647–661, doi:10.1534/genetics.111.128942 (2011).
OpenUrl Abstract/FREE Full Text
↵
Neher, R. A. & Hallatschek, O. Genealogies of rapidly adapting populations. Proc Natl Acad Sci U S A 110, 437–442, doi:10.1073/pnas.1213113110 (2013).
OpenUrl Abstract/FREE Full Text
↵
Fox, E. J., Reid-Bayliss, K. S., Emond, M. J. & Loeb, L. A. Accuracy of Next Generation Sequencing Platforms. Next Gener Seq Appl 1, doi:10.4172/jngsa.1000106 (2014).
OpenUrl CrossRef
↵
Pfeiffer, F. et al. Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci Rep 8, 10950, doi:10.1038/s41598-018-29325-6 (2018).
OpenUrl CrossRef
↵
Hegreness, M., Shoresh, N., Hartl, D. & Kishony, R. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615–1617, doi:10.1126/science.1122469 (2006).
OpenUrl Abstract/FREE Full Text
Stannek, L., Egelkamp, R., Gunka, K. & Commichau, F. M. Monitoring intraspecies competition in a bacterial cell population by cocultivation of fluorescently labelled strains. J Vis Exp, e51196, doi:10.3791/51196 (2014).
OpenUrl CrossRef
↵
Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306–00315, doi:10.1128/mBio.00306-15 (2015).
OpenUrl CrossRef PubMed
↵
Blundell, J. R. & Levy, S. F. Beyond genome sequencing: lineage tracking with barcodes to study the dynamics of evolution, infection, and cancer. Genomics 104, 417–430, doi:10.1016/j.ygeno.2014.09.005 (2014).
OpenUrl CrossRef
↵
Jaffe, M., Sherlock, G. & Levy, S. F. iSeq: A New Double-Barcode Method for Detecting Dynamic Genetic Interactions in Yeast. G3 (Bethesda) 7, 143–153, doi:10.1534/g3.116.034207 (2017).
OpenUrl Abstract/FREE Full Text
Levy, S. F. et al. Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature 519, 181–186, doi:10.1038/nature14279 (2015).
OpenUrl CrossRef PubMed
↵
Peikon, I. D., Gizatullina, D. I. & Zador, A. M. In vivo generation of DNA sequence diversity for cellular barcoding. Nucleic Acids Res 42, e127, doi:10.1093/nar/gku604 (2014).
OpenUrl CrossRef PubMed
Jahn, L. J. et al. Chromosomal barcoding as a tool for multiplexed phenotypic characterization of laboratory evolved lineages. Sci Rep 8, 6961, doi:10.1038/s41598-018-25201-5 (2018).
OpenUrl CrossRef
↵
Cira, N. J., Pearce, M. T. & Quake, S. R. Neutral and selective dynamics in a synthetic microbial community. Proc Natl Acad Sci U S A 115, E9842–E9848, doi:10.1073/pnas.1808118115 (2018).
OpenUrl Abstract/FREE Full Text
↵
Andersson, D. I. & Hughes, D. Microbiological effects of sublethal levels of antibiotics. Nat Rev Microbiol 12, 465–478, doi:10.1038/nrmicro3270 (2014).
OpenUrl CrossRef PubMed
↵
Davies, J., Spiegelman, G. B. & Yim, G. The world of subinhibitory antibiotic concentrations. Curr Opin Microbiol 9, 445–453, doi:10.1016/j.mib.2006.08.006 (2006).
OpenUrl CrossRef PubMed Web of Science
↵
Fajardo, A. & Martinez, J. L. Antibiotics as signals that trigger specific bacterial responses. Curr Opin Microbiol 11, 161–167, doi:10.1016/j.mib.2008.02.006 (2008).
OpenUrl CrossRef PubMed Web of Science
↵
Thomason, L. C., Sawitzke, J. A., Li, X., Costantino, N. & Court, D. L. Recombineering: genetic engineering in bacteria using homologous recombination. Curr Protoc Mol Biol 106, 1 16 11–39, doi:10.1002/0471142727.mb0116s106 (2014).
OpenUrl CrossRef
↵
Choi, K. H. & Schweizer, H. P. mini-Tn7 insertion in bacteria with single attTn7 sites: example Pseudomonas aeruginosa. Nat Protoc 1, 153–161, doi:10.1038/nprot.2006.24 (2006).
OpenUrl CrossRef PubMed
↵
McKenzie, G. J. & Craig, N. L. Fast, easy and efficient: site-specific insertion of transgenes into enterobacterial chromosomes using Tn7 without need for selection of the insertion event. BMC Microbiol 6, 39, doi:10.1186/1471-2180-6-39 (2006).
OpenUrl CrossRef PubMed
↵
Baquero, F., Alvarez-Ortega, C. & Martinez, J. L. Ecology and evolution of antibiotic resistance. Environ Microbiol Rep 1, 469–476, doi:10.1111/j.1758-2229.2009.00053.x (2009).
OpenUrl CrossRef PubMed
↵
Baquero, F. & Negri, M. C. Selective compartments for resistant microorganisms in antibiotic gradients. Bioessays 19, 731–736, doi:10.1002/bies.950190814 (1997).
OpenUrl CrossRef PubMed Web of Science
↵
Wolfe, A. D. & Hahn, F. E. Mode of Action of Chloramphenicol. Ix. Effects of Chloramphenicol Upon a Ribosomal Amino Acid Polymerization System and Its Binding to Bacterial Ribosome. Biochim Biophys Acta 95, 146–155 (1965).
OpenUrl CrossRef PubMed
↵
Brogden, R. N., Carmine, A. A., Heel, R. C., Speight, T. M. & Avery, G. S. Trimethoprim: a review of its antibacterial activity, pharmacokinetics and therapeutic use in urinary tract infections. Drugs 23, 405–430, doi:10.2165/00003495-198223060-00001 (1982).
OpenUrl CrossRef PubMed Web of Science
↵
Gullberg, E. et al. Selection of resistant bacteria at very low antibiotic concentrations. PLoS Pathog 7, e1002158, doi:10.1371/journal.ppat.1002158 (2011).
OpenUrl CrossRef PubMed
↵
Andersson, D. I. & Hughes, D. Evolution of antibiotic resistance at non-lethal drug concentrations. Drug Resist Updat 15, 162–172, doi:10.1016/j.drup.2012.03.005 (2012).
OpenUrl CrossRef PubMed
↵
Bjorkman, J. & Andersson, D. I. The cost of antibiotic resistance from a bacterial perspective. Drug Resist Updat 3, 237–245, doi:10.1054/drup.2000.0147 (2000).
OpenUrl CrossRef PubMed Web of Science
↵
Crow, J. F. & Kimura, M. An introduction to population genetics theory. (Harper & Row, 1970).
↵
Jost, L. Entropy and diversity. Oikos 113, 363–375, doi:DOI 10.1111/j.2006.0030-1299.14714.x (2006).
OpenUrl CrossRef Web of Science
↵
Tuomisto, H. A diversity of beta diversities: straightening up a concept gone awry. Part 1. Defining beta diversity as a function of alpha and gamma diversity. Ecography 33, 2–22, doi:10.1111/j.1600-0587.2009.05880.x (2010).
OpenUrl CrossRef Web of Science
↵
Bershtein, S., Choi, J. M., Bhattacharyya, S., Budnik, B. & Shakhnovich, E. Systems-level response to point mutations in a core metabolic enzyme modulates genotype-phenotype relationship. Cell Rep 11, 645–656, doi:10.1016/j.celrep.2015.03.051 (2015).
OpenUrl CrossRef
↵
de Visser, J. A. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15, 480–490, doi:10.1038/nrg3744 (2014).
OpenUrl CrossRef PubMed
↵
Lobkovsky, A. E. & Koonin, E. V. Replaying the tape of life: quantification of the predictability of evolution. Front Genet 3, 246, doi:10.3389/fgene.2012.00246 (2012).
OpenUrl CrossRef PubMed
↵
Toprak, E. et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet 44, 101–105, doi:10.1038/ng.1034 (2011).
OpenUrl CrossRef PubMed
↵
Barrett, R. D. & Schluter, D. Adaptation from standing genetic variation. Trends EcolEvol 23, 38–44, doi:10.1016/j.tree.2007.09.008 (2008).
OpenUrl CrossRef PubMed Web of Science
↵
Liu, A. et al. Selective advantage of resistant strains at trace levels of antibiotics: a simple and ultrasensitive color test for detection of antibiotics and genotoxic agents. Antimicrob Agents Chemother 55, 1204–1210, doi:10.1128/AAC.01182-10 (2011).
OpenUrl Abstract/FREE Full Text
↵
Westhoff, S. et al. The evolution of no-cost resistance at sub-MIC concentrations of streptomycin in Streptomyces coelicolor. ISME J 11, 1168–1178, doi:10.1038/ismej.2016.194 (2017).
OpenUrl CrossRef
↵
McKown, R. L., Orle, K. A., Chen, T. & Craig, N. L. Sequence requirements of Escherichia coli attTn7, a specific site of transposon Tn7 insertion. J Bacteriol 170, 352–358 (1988).
OpenUrl Abstract/FREE Full Text
Peters, J. E. & Craig, N. L. Tn7: smarter than we thought. Nat Rev Mol Cell Biol 2, 806–814, doi:10.1038/35099006 (2001).
OpenUrl CrossRef PubMed Web of Science
↵
Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747, doi:10.1093/bioinformatics/btx655 (2018).
OpenUrl CrossRef
↵
Rényi, A. On measures of information and entropy. Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics, and Probability, 547–561. (1960).
↵
Jones E, O. E., Peterson P. SciPy: Open Source Scientific Tools for Python. http://www.scipy.ors/ Version0.18.0. (2001).

View the discussion thread.

Posted March 08, 2019.

Download PDF

Citation Tools

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5201)
Biochemistry (11718)
Bioengineering (8724)
Bioinformatics (29132)
Biophysics (14936)
Cancer Biology (12051)
Cell Biology (17360)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14146)
Epidemiology (2067)
Evolutionary Biology (18269)
Genetics (12223)
Genomics (16768)
Immunology (11844)
Microbiology (28016)
Molecular Biology (11560)
Neuroscience (60822)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4940)
Plant Biology (10401)
Scientific Communication and Education (1680)
Synthetic Biology (2878)
Systems Biology (7333)
Zoology (1642)

[1] ↵
Lazar, V. et al. Genome-wide analysis captures the determinants of the antibiotic cross-resistance interaction network. Nat Commun 5, 4352, doi:10.1038/ncomms5352 (2014).
OpenUrl CrossRef PubMed

[2] Deatherage, D. E. & Barrick, J. E. Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol Biol 1151, 165–188, doi:10.1007/978-1-4939-0554-6_12 (2014).
OpenUrl CrossRef PubMed Web of Science

[3] Otto, M. Next-generation sequencing to monitor the spread of antimicrobial resistance. Genome Med 9, 68, doi:10.1186/s13073-017-0461-x (2017).
OpenUrl CrossRef

[4] ↵
Xue, Y. & Wilcox, W. R. Changing paradigm of cancer therapy: precision medicine by next-generation sequencing. Cancer Biol Med 13, 12–18, doi:10.28092/j.issn.2095-3941.2016.0003 (2016).
OpenUrl CrossRef

[5] ↵
Blundell, J. R. et al. The dynamics of adaptive genetic diversity during the early stages of clonal evolution. Nat Ecol Evol, doi:10.1038/s41559-018-0758-1 (2018).
OpenUrl CrossRef

[6] Desai, M. M., Walczak, A. M. & Fisher, D. S. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193, 565–585, doi:10.1534/genetics.112.147157 (2013).
OpenUrl Abstract/FREE Full Text

[7] Lang, G. I., Botstein, D. & Desai, M. M. Genetic variation and the fate of beneficial mutations in asexual populations. Genetics 188, 647–661, doi:10.1534/genetics.111.128942 (2011).
OpenUrl Abstract/FREE Full Text

[8] ↵
Neher, R. A. & Hallatschek, O. Genealogies of rapidly adapting populations. Proc Natl Acad Sci U S A 110, 437–442, doi:10.1073/pnas.1213113110 (2013).
OpenUrl Abstract/FREE Full Text

[9] ↵
Fox, E. J., Reid-Bayliss, K. S., Emond, M. J. & Loeb, L. A. Accuracy of Next Generation Sequencing Platforms. Next Gener Seq Appl 1, doi:10.4172/jngsa.1000106 (2014).
OpenUrl CrossRef

[10] ↵
Pfeiffer, F. et al. Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci Rep 8, 10950, doi:10.1038/s41598-018-29325-6 (2018).
OpenUrl CrossRef

[11] ↵
Hegreness, M., Shoresh, N., Hartl, D. & Kishony, R. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615–1617, doi:10.1126/science.1122469 (2006).
OpenUrl Abstract/FREE Full Text

[12] Stannek, L., Egelkamp, R., Gunka, K. & Commichau, F. M. Monitoring intraspecies competition in a bacterial cell population by cocultivation of fluorescently labelled strains. J Vis Exp, e51196, doi:10.3791/51196 (2014).
OpenUrl CrossRef

[13] ↵
Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306–00315, doi:10.1128/mBio.00306-15 (2015).
OpenUrl CrossRef PubMed

[14] ↵
Blundell, J. R. & Levy, S. F. Beyond genome sequencing: lineage tracking with barcodes to study the dynamics of evolution, infection, and cancer. Genomics 104, 417–430, doi:10.1016/j.ygeno.2014.09.005 (2014).
OpenUrl CrossRef

[15] ↵
Jaffe, M., Sherlock, G. & Levy, S. F. iSeq: A New Double-Barcode Method for Detecting Dynamic Genetic Interactions in Yeast. G3 (Bethesda) 7, 143–153, doi:10.1534/g3.116.034207 (2017).
OpenUrl Abstract/FREE Full Text

[16] Levy, S. F. et al. Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature 519, 181–186, doi:10.1038/nature14279 (2015).
OpenUrl CrossRef PubMed

[17] ↵
Peikon, I. D., Gizatullina, D. I. & Zador, A. M. In vivo generation of DNA sequence diversity for cellular barcoding. Nucleic Acids Res 42, e127, doi:10.1093/nar/gku604 (2014).
OpenUrl CrossRef PubMed

[18] Jahn, L. J. et al. Chromosomal barcoding as a tool for multiplexed phenotypic characterization of laboratory evolved lineages. Sci Rep 8, 6961, doi:10.1038/s41598-018-25201-5 (2018).
OpenUrl CrossRef

[19] ↵
Cira, N. J., Pearce, M. T. & Quake, S. R. Neutral and selective dynamics in a synthetic microbial community. Proc Natl Acad Sci U S A 115, E9842–E9848, doi:10.1073/pnas.1808118115 (2018).
OpenUrl Abstract/FREE Full Text

[20] ↵
Andersson, D. I. & Hughes, D. Microbiological effects of sublethal levels of antibiotics. Nat Rev Microbiol 12, 465–478, doi:10.1038/nrmicro3270 (2014).
OpenUrl CrossRef PubMed

[21] ↵
Davies, J., Spiegelman, G. B. & Yim, G. The world of subinhibitory antibiotic concentrations. Curr Opin Microbiol 9, 445–453, doi:10.1016/j.mib.2006.08.006 (2006).
OpenUrl CrossRef PubMed Web of Science

[22] ↵
Fajardo, A. & Martinez, J. L. Antibiotics as signals that trigger specific bacterial responses. Curr Opin Microbiol 11, 161–167, doi:10.1016/j.mib.2008.02.006 (2008).
OpenUrl CrossRef PubMed Web of Science

[23] ↵
Thomason, L. C., Sawitzke, J. A., Li, X., Costantino, N. & Court, D. L. Recombineering: genetic engineering in bacteria using homologous recombination. Curr Protoc Mol Biol 106, 1 16 11–39, doi:10.1002/0471142727.mb0116s106 (2014).
OpenUrl CrossRef

[24] ↵
Choi, K. H. & Schweizer, H. P. mini-Tn7 insertion in bacteria with single attTn7 sites: example Pseudomonas aeruginosa. Nat Protoc 1, 153–161, doi:10.1038/nprot.2006.24 (2006).
OpenUrl CrossRef PubMed

[25] ↵
McKenzie, G. J. & Craig, N. L. Fast, easy and efficient: site-specific insertion of transgenes into enterobacterial chromosomes using Tn7 without need for selection of the insertion event. BMC Microbiol 6, 39, doi:10.1186/1471-2180-6-39 (2006).
OpenUrl CrossRef PubMed

[26] ↵
Baquero, F., Alvarez-Ortega, C. & Martinez, J. L. Ecology and evolution of antibiotic resistance. Environ Microbiol Rep 1, 469–476, doi:10.1111/j.1758-2229.2009.00053.x (2009).
OpenUrl CrossRef PubMed

[27] ↵
Baquero, F. & Negri, M. C. Selective compartments for resistant microorganisms in antibiotic gradients. Bioessays 19, 731–736, doi:10.1002/bies.950190814 (1997).
OpenUrl CrossRef PubMed Web of Science

[28] ↵
Wolfe, A. D. & Hahn, F. E. Mode of Action of Chloramphenicol. Ix. Effects of Chloramphenicol Upon a Ribosomal Amino Acid Polymerization System and Its Binding to Bacterial Ribosome. Biochim Biophys Acta 95, 146–155 (1965).
OpenUrl CrossRef PubMed

[29] ↵
Brogden, R. N., Carmine, A. A., Heel, R. C., Speight, T. M. & Avery, G. S. Trimethoprim: a review of its antibacterial activity, pharmacokinetics and therapeutic use in urinary tract infections. Drugs 23, 405–430, doi:10.2165/00003495-198223060-00001 (1982).
OpenUrl CrossRef PubMed Web of Science

[30] ↵
Gullberg, E. et al. Selection of resistant bacteria at very low antibiotic concentrations. PLoS Pathog 7, e1002158, doi:10.1371/journal.ppat.1002158 (2011).
OpenUrl CrossRef PubMed

[31] ↵
Andersson, D. I. & Hughes, D. Evolution of antibiotic resistance at non-lethal drug concentrations. Drug Resist Updat 15, 162–172, doi:10.1016/j.drup.2012.03.005 (2012).
OpenUrl CrossRef PubMed

[32] ↵
Bjorkman, J. & Andersson, D. I. The cost of antibiotic resistance from a bacterial perspective. Drug Resist Updat 3, 237–245, doi:10.1054/drup.2000.0147 (2000).
OpenUrl CrossRef PubMed Web of Science

[33] ↵
Crow, J. F. & Kimura, M. An introduction to population genetics theory. (Harper & Row, 1970).

[34] ↵
Jost, L. Entropy and diversity. Oikos 113, 363–375, doi:DOI 10.1111/j.2006.0030-1299.14714.x (2006).
OpenUrl CrossRef Web of Science

[35] ↵
Tuomisto, H. A diversity of beta diversities: straightening up a concept gone awry. Part 1. Defining beta diversity as a function of alpha and gamma diversity. Ecography 33, 2–22, doi:10.1111/j.1600-0587.2009.05880.x (2010).
OpenUrl CrossRef Web of Science

[36] ↵
Bershtein, S., Choi, J. M., Bhattacharyya, S., Budnik, B. & Shakhnovich, E. Systems-level response to point mutations in a core metabolic enzyme modulates genotype-phenotype relationship. Cell Rep 11, 645–656, doi:10.1016/j.celrep.2015.03.051 (2015).
OpenUrl CrossRef

[37] ↵
de Visser, J. A. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15, 480–490, doi:10.1038/nrg3744 (2014).
OpenUrl CrossRef PubMed

[38] ↵
Lobkovsky, A. E. & Koonin, E. V. Replaying the tape of life: quantification of the predictability of evolution. Front Genet 3, 246, doi:10.3389/fgene.2012.00246 (2012).
OpenUrl CrossRef PubMed

[39] ↵
Toprak, E. et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet 44, 101–105, doi:10.1038/ng.1034 (2011).
OpenUrl CrossRef PubMed

[40] ↵
Barrett, R. D. & Schluter, D. Adaptation from standing genetic variation. Trends EcolEvol 23, 38–44, doi:10.1016/j.tree.2007.09.008 (2008).
OpenUrl CrossRef PubMed Web of Science

[41] ↵
Liu, A. et al. Selective advantage of resistant strains at trace levels of antibiotics: a simple and ultrasensitive color test for detection of antibiotics and genotoxic agents. Antimicrob Agents Chemother 55, 1204–1210, doi:10.1128/AAC.01182-10 (2011).
OpenUrl Abstract/FREE Full Text

[42] ↵
Westhoff, S. et al. The evolution of no-cost resistance at sub-MIC concentrations of streptomycin in Streptomyces coelicolor. ISME J 11, 1168–1178, doi:10.1038/ismej.2016.194 (2017).
OpenUrl CrossRef

[43] ↵
McKown, R. L., Orle, K. A., Chen, T. & Craig, N. L. Sequence requirements of Escherichia coli attTn7, a specific site of transposon Tn7 insertion. J Bacteriol 170, 352–358 (1988).
OpenUrl Abstract/FREE Full Text

[44] Peters, J. E. & Craig, N. L. Tn7: smarter than we thought. Nat Rev Mol Cell Biol 2, 806–814, doi:10.1038/35099006 (2001).
OpenUrl CrossRef PubMed Web of Science

[45] ↵
Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747, doi:10.1093/bioinformatics/btx655 (2018).
OpenUrl CrossRef

[46] ↵
Rényi, A. On measures of information and entropy. Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics, and Probability, 547–561. (1960).

[47] ↵
Jones E, O. E., Peterson P. SciPy: Open Source Scientific Tools for Python. http://www.scipy.ors/ Version0.18.0. (2001).