Limited specificity of molecular interactions incurs an environment-dependent fitness cost in bacteria

Reliable operation of cellular programs depends crucially on the specificity of biomolecular interactions. In gene regulatory networks, the appropriate expression of genes is determined through the specific binding of transcription factors (TFs) to their cognate DNA sequences. However, the large genomic background likely contains many DNA sequences showing similarity to TF target motifs, potentially allowing for substantial non-cognate TF binding with low specificity. Whether and how non-cognate TF binding impacts cellular function and fitness remains unclear. We show that increased expression of different transcriptional regulators in Escherichia coli and Salmonella enterica can significantly inhibit population growth across multiple environments. This effect depends upon (i) TF binding to a large number of DNA sequences with low specificity, (ii) TF cooperativity, and (iii) the ratio of TF to DNA. DNA binding due to the limited specificity of promiscuous or non-native TFs can thus severely impact fitness, giving rise to a fundamental biophysical constraint on gene regulatory design and evolution.


Biology at all levels crucially depends on the timely recognition of, and interaction between, cognate biomolecules (Box 1A). The importance of specificity in molecular encounters within the cell is highlighted by the intricate mechanisms that ensure appropriate, and thus specific, interactions, a classic example being kinetic proofreading in the loading of amino acids onto tRNAs (1). In gene regulation, transcription factors (TFs) determine the expression of genes at the right time and place by binding to their respective operator sites in a highly specific manner. Additionally, non-cognate interactions can

and an even more substantial reduction in S. enterica cells (Fig. 1B-D, Table S1). 434 CI also reduced growth in both hosts, though less than λ CI, and, interestingly, more so in its native host, E. coli (Fig. 1B). P22 C2, on the other hand, showed no effect in its native host S. enterica, while stopping growth completely when expressed in E. coli (Fig. 1D, Table S1). There was no significant impact on growth in either host with HK022 CI or with LacI (Fig. 1B), which was expected for the latter, at least in E. coli. Further, no growth defect was seen with our other controls: cells carrying only the plasmid backbone, or the control plasmid expressing a fluorescence marker instead of a repressor (Fig. S1A).

Thus, four different repressors stemming from the same TF family, but likely having different modes of DNA recognition (26), showed a broad spectrum of growth effects in the two bacterial host species. We explored these growth effects and their causes further by focusing on the two best-characterized repressors, λ CI and P22 C2, which are known to differ in their propensity to bind DNA sequences that differ substantially from their target motif (24) (Box 1B).

As a next step, we varied the environmental conditions in which bacteria carrying λ CI or P22 C2 were grown. In rich media (LB), growth inhibition was almost entirely abolished in E. coli for both repressors, and substantially reduced for λ CI expressed in S. enterica (Fig. 2A, Table S1). Minimal  Table S1). P22 C2 did not affect growth in S. enterica in any of the conditions (Fig. 2A, Table S1), which is why this combination is generally not discussed further.

Next, we tested how the growth reduction depends on repressor concentration and induction timing. In E. coli, decreasing the concentration of either repressor led to a gradual recovery of normal growth (Fig. 2B, Table S2). Conversely, even low expression of λ CI in S. enterica resulted in strong growth reductions (Fig. 2B, Table S2). The concentrations used here (see Methods for measurement details) range from 0.5- to 5-fold those achieved under physiological lysogen conditions (27). Surprisingly, the induction time point was also an important determinant of λ CI-induced, but not of P22 C2-induced, growth reduction: while λ CI induction in early- and mid-exponential growth (as opposed to induction during the lag phase) had progressively smaller effects on growth in E. coli and S. enterica (Fig. 2C, S2B,C, Table S3), this was not the case for P22 C2 in E. coli, where growth was always halted ~2 h after repressor induction (Fig. S2D, Table S3). Overall, we found a strong dependence of repressor-induced growth reduction on environmental conditions and repressor concentration.

Increased repressor expression leads to severe fitness reduction
As the severe growth reductions we observed made it difficult to determine meaningful growth rates in our system, we determined the fitness effect of repressor expression in direct competition experiments, which capture all growth differences between the competitors. As a 'neutral' competitor, we used cells expressing LacI from a plasmid construct that contained an additional YFP-venus marker (Fig. 3A). The venus marker resulted in a minor fitness cost (selection coefficients for cells without the marker were 0.05 (E. coli) and 0.09 (S. enterica); see Methods), meaning that an increase in fluorescence (i.e. in LacI-carrying cells) indicates an even more pronounced benefit of the LacI-  Table S4).

Growth reductions translated directly into fitness costs, as the competition assays were even able to capture the gradual increase in growth reduction with increasing repressor concentration for λ CI in S.  Table S4).

Growth reduction is caused by cooperative, low-specificity binding distributed across the genome
Given the surprisingly detrimental growth effects of the two repressors in several environments, we set out to determine their cause. Transcriptional repressors are DNA-binding proteins and could therefore interfere with the cellular program by binding DNA at various non-cognate sites (29). To determine the role of TF binding in the observed growth reductions, we took advantage of the fact that λ CI is one of the best-studied TFs, for which an exhaustive range of mutants affecting most of its functions exists. As equivalent mutants have not been characterized for the other repressors, we performed these experiments only with λ CI.

Specifically, we tested the expression of a mutant that cannot form dimers (30) (λ CI binds DNA only in its dimeric form (31)), as well as of a mutant defective in DNA binding (32), and found that normal growth (Fig. 4A, Table S3) and fitness (Fig. S3) were almost completely restored in both E. coli and S. enterica cells. Similar results for a λ CI mutant defective in cooperativity between repressor dimers (Fig. 4A, Table S3) suggest an important contribution from DNA looping or some other form of repressor oligomerization. This is intriguing, as λ CI cooperativity and oligomerization are thought to increase binding specificity (25,33), but they likely also lead to a general increase in binding strength, particularly in the absence of specific sites. We ruled out that repressor misfolding or aggregation was responsible for our observations by over-expressing a chaperone gene (tig) together with the repressors (Fig. S1B).

Hence, the ability to bind DNA, potentially in a cooperative and motif-dependent manner, seems to be central to repressor-mediated growth effects. We tested this hypothesis by combining λ CI cooperativity with the binding specificity of another repressor, using chimeric TFs. Specifically, we replaced the DNA-binding helix of λ CI (see Methods) with that of i) another phage repressor, 434 CI, which showed some growth defect; and ii) the bacterial repressor LacI, which showed no growth defect as a wildtype protein (Fig. 1B, Table S5). Changes in the geometry of 434 CI cooperativity have been reported to strongly interfere with its binding affinity and with the structure of the TF-DNA complex (21,34); consistent with this, growth was rescued with the λ-434 CI chimera in our experiments. In contrast, with the λ CI-LacI chimera the growth reductions were even stronger than with λ CI, leading to growth arrest in S. enterica in both rich and minimal media (Fig. 4B, Table S5). This opposing behavior of LacI and λ CI-LacI strongly supports our hypothesis that LacI binding affinity and basepair bias are conducive to low-specificity binding (Box 1B), but that LacI lacks the strong intermolecular cooperativity and oligomerization potential of λ CI (Fig. 1A). The chimera, however, combines these attributes, leading to strong interference with cell growth.

To determine whether the non-cognate binding effects involved (i) a few essential, or (ii) many distributed, regions of the chromosome, we performed ChIP-sequencing for λ CI in E. coli and S. enterica. In E. coli the data did not reveal strong peaks at any genomic site, but rather indicated weak binding at numerous sites all over the chromosome (Fig. 5A, S4A, Table S6). Note that all of the regions plotted in Fig. 5A are significantly enriched in the presence of λ CI, but they only appear at a more lenient cutoff than is typically used for strong binding (see Methods). In S. enterica we found both distributed weak binding and a broad peak (indicating substantial binding across several adjacent genes). Interestingly, this broad peak corresponds to prophage regions of the genome that seem to provide binding hotspots for λ CI (Fig. 5B, S4B, Table S6). As such a binding hotspot was absent in E. coli, but λ CI still caused a growth defect there, we did not consider this finding necessary to qualitatively explain our results (although it could account for the stronger growth defects seen in S. enterica). Further, none of the apparent peaks in either genome encoded a gene that is essential or obviously beneficial in minimal media conditions.

Using a simple thermodynamic model to predict λ CI binding across the bacterial genome, we found a surprisingly high correlation with the number of ChIP-sequencing reads (Fig. 5C,D), especially given that such models generally perform poorly for low-affinity sites (35). Even more surprisingly, we obtained a comparable prediction (Fig. 5C,D, S5) with an energy matrix that conserved only the overall λ CI preference for basepair composition (see Methods). This result could not be explained by nucleotide composition bias in the ChIP-sequencing experiments (Fig. S6), but shows that a large part of the correlation between predicted binding and ChIP-sequencing reads can be explained by the overall genomic basepair bias. A matching basepair composition could thus provide λ CI with a sufficient recognition signal to bind DNA with low specificity. In agreement with this hypothesis, the GC content of the stronger λ CI cognate operators (OR1, OR2, OL1 and OL2) is 52.94%, very close to that of the S. enterica genome (52.2%) and only slightly higher than that of the E. coli genome (50.8%). However, the residual sequence-dependent contribution beyond the basepair composition bias is still highly significant in E. coli and weakly significant in S.

our results indicate substantial non-cognate binding due to sequence dependence and basepair bias, which has also been reported, for example, for NAPs (9,10). Non-cognate binding is facilitated by repressor oligomerization (22,36) and is distributed over the thousands of low-specificity λ CI binding sites known to be present in the E. coli genome (29). These findings agree with previous studies on non-cognate binding of λ CI and other prokaryotic TFs (2,3,23,37).  (40)).
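The operator GC figure quoted earlier in this section (52.94%) is a pooled percentage over the four operators. A minimal Python sketch of that arithmetic, assuming each of OR1, OR2, OL1 and OL2 is a 17-bp site (68 bp in total); the helper `gc_percent` is hypothetical, and no operator sequences are hard-coded, so they would have to be supplied from a reference.

```python
def gc_percent(seqs):
    """Pooled G+C percentage over a collection of DNA sequences."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return 100.0 * gc / total


# Assumption: four 17-bp operators, i.e. 68 bp pooled.
pooled_bp = 4 * 17
# 36 G/C bases out of 68 bp reproduces the reported 52.94%.
print(round(100.0 * 36 / pooled_bp, 2))  # 52.94
# Toy input: ATGC has 2 G/C and GGGA has 3, so 5 of 8 bases -> 62.5
print(gc_percent(["ATGC", "GGGA"]))  # 62.5
```

Pooling the bases, rather than averaging per-operator percentages, gives the same result here only because all four sites have equal length.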

Similarly, cells that are induced during the lag or early-exponential phase (after 1-2 doublings) will, on average, have only one chromosome, as they have not yet inherited partially replicated chromosomes from their mothers and grandmothers.

We therefore tested the titration hypothesis by introducing a high-copy-number plasmid carrying four cognate λ CI binding sites into E. coli cells with inducible λ CI (Fig. S8A), which should reduce the number of free λ CI dimers available for non-cognate binding by about one half (see Methods).
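A back-of-the-envelope check of the "about one half" estimate, in Python. It assumes ~500 dimers per cell (the 25 ng aTc figure reported in the Methods, which may not match the induction level used in this experiment), the 50-70 copies quoted for the pZE plasmid, and one dimer per cognate site with all sites occupied; `fraction_titrated` is a hypothetical helper.

```python
def fraction_titrated(dimers_per_cell, plasmid_copies, sites_per_plasmid=4):
    """Upper bound on the fraction of repressor dimers held on cognate plasmid sites.

    Assumes one dimer per site and that every site is occupied as long as
    dimers are available (cognate sites bind far more strongly than the genome).
    """
    sites = plasmid_copies * sites_per_plasmid
    return min(sites, dimers_per_cell) / dimers_per_cell


# Assumed inputs: ~500 dimers/cell and 50-70 plasmid copies.
for copies in (50, 70):
    print(copies, fraction_titrated(500, copies))
# 50 0.4
# 70 0.56
```

Under these assumptions 200-280 cognate sites sequester roughly 40-56% of the dimers, consistent with a reduction of about one half.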

Although the expression of λ CI was still detrimental, growth was ~20% faster than for cells without the operators (Fig. S8B). Hence, titration of λ CI alleviates the growth reduction, likely even more so if additional chromosomal DNA is present (e.g. at faster growth), which provides many more potential low-specificity binding sites (29). For P22 C2, which is more discerning in its DNA binding targets (24), partially replicated chromosomes would provide less titration, which would explain why later induction does not rescue growth. The titration phenomenon is reminiscent of growth bistability in drug-resistant bacterial cells, which is caused by feedback between the growth rate and the speed at which the toxic agent is counteracted (41). In our system, the repressors can be seen as 'toxic agents', which are 'counteracted' by dilution if cells manage to start growing; cells that cannot dilute the repressors fast enough become growth-arrested.
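The growth-dilution feedback can be illustrated with a toy model that is not part of this study. The repressor level p is produced at a constant rate k and diluted by growth at rate g(p), where growth is assumed, purely for illustration, to fall with p as g(p) = g0 / (1 + (p/K)^2). Because the dilution flux g(p)·p then peaks and declines, there is a threshold: below an unstable fixed point, growth dilutes the repressor to a low steady state; above it, the repressor accumulates and growth stalls. All parameters and function names are hypothetical.

```python
import math


def fixed_points(k, g0, K):
    """Steady states of dp/dt = k - g(p) * p with the assumed g(p) = g0 / (1 + (p / K)**2).

    Setting k = g0 * p / (1 + (p / K)**2) gives (k / K**2) p**2 - g0 p + k = 0.
    Returns [] when production exceeds the maximal dilution flux g0 * K / 2.
    """
    a, b, c = k / K**2, -g0, k
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def simulate(p0, k, g0, K, dt=0.01, t_end=50.0):
    """Forward-Euler integration of p(t) from an initial repressor level p0."""
    p = p0
    for _ in range(int(t_end / dt)):
        growth = g0 / (1 + (p / K) ** 2)
        p += dt * (k - growth * p)
    return p


low, high = fixed_points(k=0.3, g0=1.0, K=1.0)  # stable ~0.333, unstable 3.0
print(round(low, 3), round(high, 3))            # 0.333 3.0
print(round(simulate(0.1, 0.3, 1.0, 1.0), 3))   # starts low, settles at 0.333
print(simulate(5.0, 0.3, 1.0, 1.0) > high)      # starts above threshold, keeps rising: True
```

In this caricature, cells that start with little repressor (or grow early) settle at the low state, whereas cells that accumulate repressor past the unstable point before growing cannot dilute it.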

The titration hypothesis, together with our ChIP-sequencing results, implies that the overall ratio of chromosomal DNA to repressor proteins is a crucial factor determining the growth effects. This suggests that non-cognate binding might interfere with global cellular functions, such as DNA replication or cell division, which we investigated using fluorescence microscopy of E. coli cells expressing λ CI.

We investigated the consequences of limited specificity in the molecular recognition of DNA by proteins, using four different but related phage repressors and a bacterial repressor, which produced a wide range of effects on host cell growth, from high to no fitness costs, with higher costs in either the native or the non-native host (Fig. 1B, 4B). Taking advantage of the rich and well-established genetics and biochemistry of the classic bacteriophage repressor λ CI, we found that its fitness cost results from cooperative, low-specificity binding, which interferes with growth by inhibiting cell division. The abundance of low-specificity binding sites in eukaryotic genomes has been shown to play a critical role in gene regulation, potentially increasing robustness and specificity (11). Our data, however, support the hypothesis that binding strategies of prokaryotic TFs are under selection to avoid low-specificity binding to the genomic background (29), and highlight the fundamental differences in gene regulatory design, and therefore in evolutionary constraints, between prokaryotes and eukaryotes (48).

For prokaryotes, in contrast to eukaryotes, TF target sites are sufficiently long to allow specific recognition of single operators (48). Mismatches with the preferred target sequence lead to a progressive loss of binding, but the rate of this loss can vary substantially between TFs (24). λ CI, which shows strong operator binding (offset), low mismatch penalties (energy matrix) and strong cooperativity, is likely to be a rather promiscuous binder (Box 1B, (24)) and indeed induced a high fitness cost due to distributed low-specificity binding. For P22 C2, the lower offset and higher mismatch penalties make it a more specific binder (Box 1B, (24)), producing a significant cost only in non-native host cells. LacI, which shows binding characteristics similar to those of λ CI, only showed a strong fitness effect when coupled with λ CI's intermolecular cooperativity. Together, the mutant and chimera experiments (Fig. 4) demonstrate a significant contribution of cooperativity, and likely oligomerization, to the potential for low-specificity binding, which supports the theoretical finding that TF cooperativity does not strongly alleviate crosstalk when it stabilizes cognate as well as non-cognate binding (13).

Hence, the lack of intermolecular cooperativity in LacI could be a sign of its adaptation to be highly specific, as it is one of the few single-target regulators in E. coli (18). Binding cooperativity, offset and TF concentration can all increase the non-cognate binding of a TF (independently of its target motif preference), and the particular interplay between these factors has to be tuned by the cell to avoid fitness costs due to low-specificity interactions. Therefore, considering low-specificity binding is crucial when choosing TFs for synthetic biological systems, in order to avoid global toxicity effects as well as unwanted TF titration, which can affect target gene regulation (16,49).

The magnitude of the fitness cost depends on a repressor's ability to bind non-cognate DNA with low specificity, as well as on its ratio to the total amount of DNA within the cell. Slow cell growth compounds the effect, as cells then contain less DNA but accumulate more protein than during fast growth (40). Additionally, stress tolerance could be higher under optimal growth conditions, such as those found in rich media (50). It does not seem likely, however, that media-specific genes are targeted, as ChIP-sequencing generally revealed distributed, low-specificity binding all over the chromosome (Fig. 5).

Rather, inhibition of cell division seems to result at least partially from nucleoid localization at mid-cell.

We experimentally demonstrated for the first time that low specificity in biomolecular recognition can constitute a limiting factor for cellular function and evolution, owing to the fundamental biophysical constraints on protein-DNA interactions. However, these costs could be counterbalanced by increased TF robustness to target site mutations or by higher evolvability, precisely because interactions can be formed at low specificity (24,58). For example, a TF could co-opt regulation of a non-cognate gene, even if only to a small degree, that provides an advantage in a certain environment, which can subsequently be refined by evolution. This opens up a wider question about the interplay of costs and benefits of low-specificity molecular interactions, especially when these interactions also serve as

Titration of λ CI was tested by transforming E. coli cells containing the pZS21-λ cI plasmid with a compatible, high-copy-number pZE plasmid (50-70 copies) (59), which carries the natural λ CI

In order to test for misfolding of repressor proteins, we used a high-copy-number plasmid containing a chaperone gene (tig (64)

The HiBiT peptide tag was attached at the N-terminus (which is involved in DNA binding) using a (GGGS)2 linker (sequence: GGTGGTGGTTCTGGTGGTGGTTCT) to ensure accessibility of the tag for interaction with the detection reagent. Briefly, cells induced (1 ng or 25 ng aTc) for expression of wildtype repressor or of repressor with the HiBiT tag were grown to early exponential phase in minimal media with glucose or in rich media (LB), pelleted and frozen. Cells were resuspended in media and supplemented with 0.1 culture volume of PopCulture Reagent (Sigma Aldrich), 10⁻³ culture volume of Benzonase Nuclease (Sigma Aldrich) and 0.5 × 10⁻³ culture volume of lysozyme (Sigma Aldrich). Cells were lysed for 30 min at room temperature and then kept on ice. A protein standard (Promega) was added to the non-tagged cells as a known reference relating protein concentration to luminescence output.
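As a quick consistency check, the linker DNA quoted above translates to the (GGGS)2 peptide. A minimal Python sketch; the codon table is deliberately restricted to the two codons the linker uses (GGT for glycine, TCT for serine in the standard genetic code), so any other codon raises a KeyError.

```python
# Only the two codons present in this linker (standard genetic code).
CODONS = {"GGT": "G", "TCT": "S"}


def translate(dna):
    """Translate an in-frame DNA sequence using the restricted codon table."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


linker = "GGTGGTGGTTCTGGTGGTGGTTCT"
print(translate(linker))                # GGGSGGGS
print(translate(linker) == "GGGS" * 2)  # True
```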

Samples were mixed 1:1 with HiBiT enzyme mixture and measured in white plates after shaking (in the dark) for 15 minutes. Dilution series of tagged repressor and protein standard were measured in a Tecan plate reader (Spark 10M) with an integration time of 1.5 seconds.
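Converting luminescence into molecule numbers relies on the protein-standard dilution series. The Python sketch below assumes a linear standard curve (luminescence versus number of standard molecules), a known number of cells per lysate, and that λ CI is counted as dimers by halving the monomer count; `fit_line` and `dimers_per_cell` are hypothetical helpers, and the example numbers are made up for illustration, not measured values.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def dimers_per_cell(luminescence, slope, intercept, n_cells):
    """Monomers from the standard curve, per cell, halved to give dimers."""
    monomers = (luminescence - intercept) / slope
    return monomers / n_cells / 2


# Illustrative standard curve: 1 RLU per 1e6 molecules plus 100 RLU background.
std_molecules = [0.0, 1e9, 2e9, 4e9]
std_rlu = [100.0, 1100.0, 2100.0, 4100.0]
slope, intercept = fit_line(std_molecules, std_rlu)
# A lysate of 1e6 cells reading 1100 RLU -> 1e9 monomers -> 500 dimers per cell.
print(round(dimers_per_cell(1100.0, slope, intercept, 1e6)))  # 500
```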

Repressor quantification in minimal media with glucose gave about 500 dimers per cell at 25 ng aTc and about 50 dimers per cell at 1 ng aTc induction (compared with ~125 dimers of λ CI in lysogenic cells (27)). Similarly, we found around 500 dimers per cell at 25 ng aTc in rich media. Note that the fitness reduction seen for λ CI concentrations at >1 ng aTc induction (Fig. 2B)

Fluorescence was measured every 30 min and compared between cultures that were induced with aTc (at the indicated concentrations of 1, 2, 3, 4 or 25 ng) and cultures that were not induced. This means that we compared the abundance of fluorescent cells (i.e. of LacI-carrying cells) between cultures expressing and not expressing the repressor.

Selection coefficients were calculated as $s = \ln\left[\frac{R^{+}_{t}/R^{-}_{t}}{R^{+}_{0}/R^{-}_{0}}\right]$, where $R^{+}_{t}$ and $R^{-}_{t}$ represent fluorescence measurements (as a proxy for relative LacI-expressing cell density) of cells with and without the inducer aTc (presence or absence of repressor expression), respectively, at time t = 10 h, and $R^{+}_{0}$ and $R^{-}_{0}$ represent the corresponding fluorescence measurements at the beginning of the experiment.
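The selection coefficient defined above is a log ratio of ratios and is straightforward to compute. A minimal Python sketch; the function name and the example readings are hypothetical. Under this convention, a positive value means the LacI-venus competitor became relatively more abundant when the repressor was induced, i.e. a fitness cost of repressor expression.

```python
import math


def selection_coefficient(r_plus_t, r_minus_t, r_plus_0, r_minus_0):
    """s = ln[(R+_t / R-_t) / (R+_0 / R-_0)].

    R+ / R-: fluorescence of the induced (+aTc) and uninduced cultures;
    _t: at time t (10 h here), _0: at the start of the competition.
    """
    return math.log((r_plus_t / r_minus_t) / (r_plus_0 / r_minus_0))


# Hypothetical readings: equal at t = 0, 1.5-fold higher with aTc at 10 h.
s = selection_coefficient(3000.0, 2000.0, 1000.0, 1000.0)
print(round(s, 3))  # ln(1.5) -> 0.405
```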

Imaging of cell membranes and DNA positioning was done using a Leica DMI6000B (inverted) microscope with an Andor iXon EM-CCD camera (front illuminated, 8 × 8 µm pixel size) and a 100x 1.47 NA Oil HCX Plan Apo objective, giving an effective pixel size of 64 nm/pixel. Images were acquired using 405(20) nm and 561(10) nm laser excitation for the blue (Hoechst) and red (NileRed) dyes, respectively. The cells were grown overnight in minimal media with glucose or LB, diluted 1:100 in fresh media and grown to early exponential phase in the absence or presence of the inducer aTc.

After addition of both dyes (Hoechst at 10 µg/mL and NileRed at 1 µg/mL), cells were shaken at room temperature for one hour and imaged in drops of the respective growth media. Images were deconvolved using Huygens Professional (version 4.5) and further analyzed using ImageJ.

ChIP-sequencing
To perform ChIP-sequencing experiments, λ CI was cloned with an HA tag at the carboxy-terminal end and transformed into both host strains. HA-tagged λ CI showed the same growth phenotype as wildtype in both bacterial strains (Fig. S13). Samples from strains grown in the presence or absence of λ CI were prepared according to Waldminghaus & Skarstad (2010) (74); library preparation and Illumina sequencing were performed at the VBCF NGS Unit (www.vbcf.ac.at). The data were analyzed using Galaxy and RStudio.

Peak calling was performed using custom R scripts modified from Santhanam et al. (75). Briefly, the genome was computationally partitioned into shorter, non-overlapping fragments, typically spanning a few kb, to account for local biases arising from sequence content and immunoprecipitation (76,77).

Peak calling was performed within these fragments using partially overlapping (50% overlap) windows of 100 bp. For each window, we calculated strand-specific enrichment as the log-ratio of the scaled read coverage between the sample and control ChIP-seq experiments, permitting a maximum of 5 reads to be mapped to the same genomic coordinate. We calculated strand-wise p-values for enrichment by first resampling scaled read coverage within each fragment and then randomly partitioning it to calculate enrichments. Finally, we defined bound regions as those with positive enrichment scores on both strands at a Benjamini-Hochberg false-discovery rate below 30%. This lenient threshold was chosen because we were looking for low-specificity binding, and ChIP binding data have previously been found to be highly informative across a wide range of specificity profiles (78). For the regions that showed significant enrichment in this analysis, we plotted the read depth across the genome in Fig. 5 (A,B), and for comparison we plotted the read depth for non-enriched regions in Fig. S4. As a control for our ChIP-sequencing procedure and analysis, we used antibodies against SeqA, which gave the expected peaks as published previously (74).
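The window-level steps of the peak calling can be sketched in Python as below. This is not the authors' R code: the 5 kb fragment size, the log2 base, the pseudo-count and all function names are assumptions, and the resampling step that produces the strand-wise p-values is not reproduced. Those p-values are taken as given and adjusted with the standard Benjamini-Hochberg procedure.

```python
import math
from collections import Counter


def fragments(genome_len, size=5000):
    """Non-overlapping fragments (start, end); the fragment size is an assumption."""
    return [(s, min(s + size, genome_len)) for s in range(0, genome_len, size)]


def windows(frag_start, frag_end, size=100, step=50):
    """100-bp windows with 50% overlap that fit inside a fragment."""
    return [(s, s + size) for s in range(frag_start, frag_end - size + 1, step)]


def cap_duplicates(read_starts, max_per_pos=5):
    """Count reads per coordinate, keeping at most max_per_pos at any one position."""
    return {pos: min(n, max_per_pos) for pos, n in Counter(read_starts).items()}


def log_ratio(sample_cov, control_cov, pseudo=1.0):
    """Enrichment of scaled sample over control coverage (pseudo-count avoids log(0))."""
    return math.log2((sample_cov + pseudo) / (control_cov + pseudo))


def benjamini_hochberg(pvals):
    """BH-adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def is_bound(enrich_plus, enrich_minus, q_plus, q_minus, fdr=0.30):
    """Bound: positive enrichment on both strands, each strand below the FDR cutoff."""
    return enrich_plus > 0 and enrich_minus > 0 and q_plus < fdr and q_minus < fdr
```

For example, `windows(0, 300)` yields windows starting at 0, 50, 100, 150 and 200, and `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` gives approximately [0.04, 0.053, 0.053, 0.5].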

We calculated the nucleotide composition of the sequences underlying enriched regions in the ChIP-seq data for both bacterial species (Fig. S6). To test for sequence composition bias, we asked whether the composition of the enriched regions differed significantly from that of the rest of the genome. To this end, we randomly selected 50 genomic regions with at least 5 kbp distance between them and calculated their nucleotide composition. Repeating this procedure 1000 times generated a null distribution for the sequence composition of randomly selected genomic regions.
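The resampling scheme for the composition null distribution can be sketched in Python as follows. Assumptions not stated in the Methods: the random regions have a fixed length (in practice chosen to match the enriched regions), the 5 kbp constraint applies between region edges, G+C fraction is used as the summary statistic, and a two-sided empirical p-value is computed around the null mean. All names are hypothetical.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sample_regions(genome_len, n_regions, region_len, min_gap, rng, max_tries=100000):
    """Start positions of n random regions whose edges are at least min_gap bp apart."""
    starts, tries = [], 0
    while len(starts) < n_regions:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place regions; genome too short for the constraints")
        s = rng.randrange(0, genome_len - region_len + 1)
        if all(abs(s - t) >= region_len + min_gap for t in starts):
            starts.append(s)
    return sorted(starts)


def null_gc(genome, n_regions=50, region_len=1000, min_gap=5000, n_iter=1000, seed=0):
    """G+C fraction of pooled random regions, repeated n_iter times."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        starts = sample_regions(len(genome), n_regions, region_len, min_gap, rng)
        null.append(gc_fraction("".join(genome[s:s + region_len] for s in starts)))
    return null


def empirical_p(observed, null):
    """Two-sided empirical p-value: null values at least as far from the null mean."""
    mu = sum(null) / len(null)
    extreme = sum(abs(x - mu) >= abs(observed - mu) for x in null)
    return (extreme + 1) / (len(null) + 1)
```

The observed G+C fraction of the enriched regions would then be passed to `empirical_p` together with the output of `null_gc`.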

Similarly, we calculated the dinucleotide composition (with 1 bp overlap) of the same randomly selected genomic regions and compared it to that of the enriched regions.
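Overlapping dinucleotide counting, in which consecutive dinucleotides share 1 bp, can be sketched in Python as below. The function name is hypothetical, and dinucleotides containing characters other than A, C, G and T are simply ignored, which is an assumption.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_freqs(seq):
    """Frequencies of the 16 dinucleotides read with a 1-bp step (1 bp overlap)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    return {d: counts[d] / total for d in DINUCS}


freqs = dinucleotide_freqs("ACGTA")  # AC, CG, GT, TA
print(freqs["CG"], freqs["AA"])      # 0.25 0.0
```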

The number of reads within 1000 bp windows was compared with the predicted binding, obtained by calculating the binding energy at each genome position (using a sliding-window approach) from the λ CI offset (i.e. the energy difference between the repressor being bound specifically to an operator and being free in solution (79)) and the energy penalty given by the λ CI energy matrix (73). Smaller energies result in stronger binding, meaning that positive energy penalties decrease binding affinity (note that negative penalties could increase binding beyond that seen with λ CI wildtype operator sites). Binding strength was calculated as $1/(1+e^{E-\mu})$, with $E$ the binding energy calculated as described above and used in (24), and $\mu$ the chemical potential, which we optimized to give the highest Spearman correlation (2.6 in E. coli and 2 in S. enterica). For comparison with the number of ChIP-sequencing reads, the calculated binding strength was summed over the same genomic 1000 bp regions (considering binding to both strands). In Fig. 5 (C,D) we plot a non-parametric, non-linear estimate of the relationship between the predicted binding energy and the ChIP-sequencing reads, obtained from a series of conditional medians. To investigate the dependence of the correlation between the affinity predictions and the ChIP-sequencing reads on the structural versus the sequence information contained in the energy matrix, we repeated the analysis with i) a matrix of the same size that conserves only the ACGT bias of the λ CI energy matrix (each row contains the average value of that row) or ii) matrices with completely reshuffled entries. For the latter, the correlation was averaged over 100 permutations.
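The binding prediction can be sketched in Python as a sliding-window scan. Assumptions not taken from the Methods: energies are in units of kT; the energy matrix is stored with one row per base (A, C, G, T) and one column per site position, the orientation under which averaging each row keeps the ACGT bias while discarding positional information; a site's energy is the offset plus its summed penalties; both strands are scored by reverse-complementing each site; and a site is assigned to the window containing its start. Matrix values, offsets and µ would have to come from the cited references, and all names here are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(site):
    return "".join(COMPLEMENT[b] for b in reversed(site))


def site_energy(site, matrix, offset):
    """E = offset + sum of position-specific penalties; matrix[base_row][position]."""
    return offset + sum(matrix[BASES.index(b)][i] for i, b in enumerate(site))


def occupancy(energy, mu):
    """Binding probability 1 / (1 + exp(E - mu))."""
    x = energy - mu
    if x > 700:  # avoid math.exp overflow; occupancy is effectively zero here
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def predicted_binding(seq, matrix, offset, mu, window=1000):
    """Occupancy summed over both strands for every site, binned by window."""
    seq = seq.upper()
    site_len = len(matrix[0])
    bins = [0.0] * math.ceil(len(seq) / window)
    for i in range(len(seq) - site_len + 1):
        site = seq[i:i + site_len]
        if any(b not in COMPLEMENT for b in site):
            continue  # skip sites containing N or other ambiguity codes
        bins[i // window] += sum(
            occupancy(site_energy(s, matrix, offset), mu) for s in (site, revcomp(site))
        )
    return bins


def bias_only(matrix):
    """Replace each base row by its mean: keeps ACGT bias, drops position information."""
    return [[sum(row) / len(row)] * len(row) for row in matrix]


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda k: v[k])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            for k in range(i, j + 1):
                r[order[k]] = (i + j) / 2 + 1  # tied values share the mean rank
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

µ could then be chosen by scanning candidate values and keeping the one that maximizes `spearman(reads, predicted_binding(genome, matrix, offset, mu))`, mirroring the optimization described above.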

To assess the importance of specific sequence information versus nucleotide (GC) bias, we used a Monte-Carlo permutation test: we calculated the difference between the Spearman correlations of ChIP reads with binding predicted using the wildtype energy matrix and with binding predicted using the energy matrix that only conserves the λ CI basepair bias, both for the true ChIP read assignment and for 10⁴ random read assignments (null distribution). We found an overall strongly significant difference in E. coli and lower significance in S. enterica (Fig. 5C,D), even though the effect size was small. This means that while most of the measured ChIP signal can be accounted for by a TF model that predicts binding based on the nucleotide content of genomic fragments alone, there is a small but highly significant residual ChIP binding signal that requires the full binding site preference (energy matrix), not just single-nucleotide bias, to be explained. Further, we examined the influence of GC content by repeating the Monte-Carlo permutation test for genomic sequences of a specific GC %. Here, we found a significant motif contribution only for the 49% bin in E. coli (Fig. S7).

Additionally, we used the offset and energy matrix for LacI (25,80) and P22 C2 (81) to predict binding and calculate the Spearman correlation with the λ CI ChIP-sequencing reads (Fig. S5). The basepair bias of the energy matrices was calculated as the sum of the average A and T preferences minus the sum of the average G and C preferences.
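The basepair-bias summary is a difference of row means. A minimal Python sketch, assuming one matrix row per base in the order A, C, G, T and operating directly on the matrix entries. Whether a positive value indicates a preference for AT-rich or GC-rich DNA depends on whether the matrix stores preferences or energy penalties, which this sketch does not decide. The function name and the example matrix are hypothetical.

```python
def basepair_bias(matrix):
    """(mean A + mean T) - (mean G + mean C); rows ordered A, C, G, T."""
    a, c, g, t = (sum(row) / len(row) for row in matrix)
    return (a + t) - (g + c)


# Row means: A = 1.0, C = 0.0, G = 0.25, T = 0.5 -> (1.0 + 0.5) - (0.25 + 0.0)
print(basepair_bias([[1.0, 1.0], [0.0, 0.0], [0.0, 0.5], [1.0, 0.0]]))  # 1.25
```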

Statistical analysis
Collected data were tested for normality (Shapiro-Wilk test) and subsequently we compared mean OD600 or