A general model of the dynamics of genes with dual σ factor preference

Escherichia coli uses the ability of σ factors to recognize specific DNA sequences in order to quickly control large gene cohorts. While most genes respond to only one σ factor, approximately 5% have dual σ factor preference. The ones in significant numbers are ‘σ70+38 genes’, responsive to σ70, which controls housekeeping genes, as well as to σ38, which activates genes during stationary growth and stresses. We show that σ70+38 genes are almost as upregulated in stationary growth as genes responsive to σ38 alone. Also, their response strengths to σ38 are predictable from their promoter sequences. Next, we propose a sequence- and σ38 level-dependent, analytical model of σ70+38 genes applicable in the exponential, stationary, and in the transition period between the two growth phases. Finally, we propose a general model, applicable to other σ factors as well. This model can guide the design of synthetic circuits with sequence-dependent sensitivity and plasticity to transitions between the exponential and stationary growth phases. Author Summary Present challenges in Synthetic Biology include the design of genetic circuits that are robust to growth phase transitions and whose responsiveness is sequence-dependent, and, thus predictable prior to design. We present and validate an empirical-based, sequence-dependent analytical model of E. coli genes with dual responsiveness to the regulators σ70 and σ38. These genes, supported by our sequence-dependent model, could become building blocks for synthetic genetic circuits functional in both the exponential and the stationary growth phases.

factor, approximately 5% have dual  factor preference. The ones in significant 21 numbers are 'σ 70+38 genes', responsive to  70 , which controls housekeeping genes, as 22 well as to  38 , which activates genes during stationary growth and stresses. We show 23 that σ 70+38 genes are almost as upregulated in stationary growth as genes responsive 24 to  38 alone. Also, their response strengths to  38 are predictable from their promoter 25 sequences. Next, we propose a sequence-and  38 level-dependent, analytical model 26 of σ 70+38 genes applicable in the exponential, stationary, and in the transition period 27 between the two growth phases. Finally, we propose a general model, applicable to 28 other σ factors as well. This model can guide the design of synthetic circuits with 29 sequence-dependent sensitivity and plasticity to transitions between the exponential 30 and stationary growth phases. 31 32 Author Summary 33 Present challenges in Synthetic Biology include the design of genetic circuits that are 34 robust to growth phase transitions and whose responsiveness is sequence-35 dependent, and, thus predictable prior to design. We present and validate an 36 empirical-based, sequence-dependent analytical model of E. coli genes with dual 37 responsiveness to the regulators  70 and  38 . These genes, supported by our 38 sequence-dependent model, could become building blocks for synthetic genetic 39 circuits functional in both the exponential and the stationary growth phases. In Escherichia coli (E. coli), genes are expressed by RNA polymerase (RNAP), which 43 has 5 subunits (α2ββ′ω) and can synthesize RNA by starting and ending at sequence-44 specific DNA locations, with the former being known as 'promoter' regions [1]. 45 Transcription is constantly regulated, mostly at the promoter regions, which typically 46 harbor transcription factor (TF) binding sites (TFBS) and other regulatory sequence 47 motifs [2][3][4][5][6][7]. Such regulation is an essential survival skill of adaptability to transitions 48 in both internal as well as external conditions [8,9] 49 One mechanism of large-scale, relatively synchronous gene expression regulation is 50 executed by σ factors [4,10-13]. E. coli has seven different σ factors [14]. During 51 exponential growth in optimal conditions, RNAP mostly transcribes genes with 52 preference for σ 70 , responsible for basic cell functions [15]. Other σ factors are 53 expressed under specific stresses [16], and activate a specific gene cohort. For 54 example, after growing exponentially at the cost of environment components, E. coli 55 usually switches to stationary growth. This is triggered, among other, by RpoS (σ 38 ), 56 whose appearance activates ~10% of the genome [17,18], which leads to key 57 phenotypic modifications [18][19][20][21][22][23][24]. 58 During this adaptation, the concentration of RNAP remains relatively constant [25] 59 and, since the number of RNAP core enzymes is limited, σ factors compete for them 60 [10, 12,14,20,26,27]. Thus, when σ 38 numbers increase, not only are the genes 61 responsive to σ 38 activated, but also other genes, previously active, are indirectly 62 negatively regulated, due to the reduction in RNAPs carrying σ 70 [10,26,28]. 63 For this regulatory system to be efficient, promoters need to have high specificity to 64 one and only one σ factor. Nevertheless, there is a small fraction of promoters that 65 can recognize more than one σ factor [6,29,30]. The most common (84%) are the 66 promoters responsive to both σ 70 and σ 38 [31]. Other cases are too rare to obtain 67 sufficient statistics on their kinetics (Table S3 in S3 Appendix). 68 Here we investigate how the sequences of promoters recognized by both σ 70 and σ 38 69 relate to the dynamics of genes that they control (here named 'σ 70+38 genes'). From 70 flow cytometry and RNA-seq measurements, along with sequence analysis, we 71 establish a relationship between RNA and protein levels in each growth phase with 72 the promoter sequence (logos of positions -41 to -1 shown in Fig S1 in  prior to, during, and after shifting from exponential to stationary growth (Section 4.2). 83 Cells at 150 min after being placed in fresh medium are used as representatives of 84 cells in mid-exponential growth ( Fig 1A). Meanwhile, at 500 min and onwards, cells 85 exhibit stationary growth (Fig 1A). 86 From the measurements,  38 levels are small at 150 min, while from 500 min onwards 87 they are stably high (Fig 1B). Considering that the cell areas are ~15% smaller in the 88   115 We used a YFP strain library to measure single-cell protein levels (Section 4.5) of 9 116 out of the 64 σ 70+38 genes (only 15 of the 64 are represented in this library and some 117 exhibited too weak signals to be detected by the flow cytometer). These 9 genes, 118 according to their LFCRNA, cover well the state space of response strengths of σ 70+38 119 genes (Section S1.2 in S1 Appendix). 120

Propagation of shifts in RNA numbers to protein numbers
From the single-cell distributions of protein levels in exponential and stationary growth, 121 we extracted means, μP, and squared coefficient of variations, CV 2 P. We also 122 calculated the log2 fold changes in μP, LFCP. Since LFCP and the corresponding 123 LFCRNA are linearly positively correlated, with a high R 2 (Fig 2A), we conclude that 124 changes in RNA levels of σ 70+38 genes during the growth phase transition propagate 125 to their protein levels. 126 We then analyzed the noise in the dynamics of these genes, prior and after the growth 127 phase transition. In Figs 2B1 and 2B2, CV 2 P decreases quickly with μP for small μP, 7 but only weakly for high μP. This is well described by a function of the form: 129 The rate constants of these reactions, which control the affinities to the holoenzymes, 168 are expected to differ between genes, as they depend on the promoter sequences 169 (and potentially which transcription factors are acting on the promoters, here not 170 represented for simplicity). Nevertheless, here, also for simplicity, kt σ70 and kt σ38 are 171 assumed to be a function of only Dσ70 and Dσ38 respectively. 172 Together, the set of reactions (R1a), (R1b), (R2a) and (R2b), model the transcription 173 kinetics of σ 70+38 genes before, during and, after shifting from exponential to stationary 174 growth. The rates kt σ70 (Dσ70) and kt σ38 (Dσ38) are dissected below. 175

Reduced model of the shift in transcription kinetics
176 To reduce the model, we assume that the numbers of RNAP.σ 70 and RNAP.σ 38 in the 177 cells are significantly larger than 1, which is expected. Also, we assume that the rate- Finally, to complete the model, we included a reaction for translation of RNAs into 186 proteins (R5), as well as events of RNA (R6) and protein (R7) decay due to For simplicity, the rate constants are not expected to differ between the growth phases 198 (as they depend on biophysical parameters that should not change significantly, such 199 as binding affinities, etc.
where γ is the ratio between the RNA decay rates in the two growth phases: 203 .
Also, considering equation (1) Finally, since, in the exponential growth phase, All parameters in (9) can be measured directly, except kt σ70 (Dσ70) and kt σ38 (Dσ38 Consider that, in general, as Dσi of a promoter increases, the promoter should become 229 more distinct from the 'average' promoter with preference for σ i . Consequently, its 230 affinity to σ i is expected to decrease. If this holds true, then the promoter consensus 231 sequence of genes with preference for a given σ factor should have strong affinity to 232 that σ factor. We hypothesize, for simplicity, it has the strongest affinity. If true, it 233 follows that as Dσ38 increases, the transcription rate by RNAP.σ 38 should decrease. 234 Having this in account, to model how a promoter sequence controls the transcription 235 kinetics of promoters recognized by σ 70 and by σ 38 , we considered a linear model 236 13 (model I in Table 1 We fitted the models (Fig 3) to the empirical fold changes (FC) of σ 70+38 genes (FCRNA) 244 with known input TFs (Table S4 and Table S6) (FC rather than LFC were used, 245 because this allowed for simpler equations for the fitting surfaces). From the R 2 values 246 of the best fitting models (Table 2), we find that the model III best describes how Dσ70 247 and Dσ38 affect kt σ70 and kt σ38 . 248 Next, we validated the fitted models by quantifying how well they predict the dynamics 249 (Fig 3) of σ 70+38 genes without input TFs (Table S4 and Table S6). Given the R 2 values, 250 we conclude that the tuned model III predicts well their dynamics. Further, the fact that 251 the model obtained from genes with TF regulation fits well (with a higher R 2 ) the genes 252 without TF regulation is suggestive that, in our measurement conditions, such TF 253 regulation is not exerting a significant role. 254 I. Linear model

270
Finally, from Fig 2A, the LFC of protein numbers (LFCP) correlates linearly with 271 LFCRNA. The same is predictable from the model (reactions R5 to R7) (Section S1.3 272 in S1 Appendix). Here, to obtain the best fitting model for protein log2 fold changes, 273 we fitted a line to the data in Fig 2A, where β equals -0.67 and is the intercept between the y axis and the best fitting line, 277 while α is the scaling factor between RNA and protein log2 fold changes and equals 278 0.1 (Fig 2A). 279 2.7. σ 70+38 genes without known input transcription factors 280 Next, we fitted the exponential model III directly to σ 70+38 genes without input TFs (not 281 to mistake with when they were used to validate the surface created from σ 70+38 genes 282 with TF inputs). The best fitting model (R 2 of 1) has similar m38 and m70 (49.04 and 283 52.48, respectively). Thus, for this restricted set of genes, since Δ = Dσ38 -Dσ70 284 (Section 4.8.3), the surface equation (12c) in Table 2  Therefore, and even though Dσ38 and Dσ70 are not significantly correlated (Fig 4A), one 291 finds a linear correlation between LFCRNA and Δ ( Fig 4B).

298
For comparison, we produced similar figures in search for linear correlations between 299 Δ and LFCRNA in the cohorts of genes whose promoters have preference for σ 70 , σ 38 , 300 σ 24 , σ 28 , σ 32 , and σ 54 (Figs 5A-5F). No significant correlation with a high value of R 2 301 was found. We further studied the two small cohorts of genes whose promoters have 302 preference for two σ factors other than σ 70+38 . Here, correlations are found, but they 303 are not statistically significant, likely due to the small numbers of genes (Figs 5G and 304 5H).  310 We expanded the model to include the transition period between the growth phases. 311

Expanding the model to the growth phase transition period
For this, we collected temporal data on the protein numbers of three σ 70+38 genes, 312 specifically pstS, aidB and asr. These genes are used as representative of σ 70+38 313 genes with, strong, mild, and weak response strengths to the phase transition, 314 respectively ( Fig 6A). All data was well fitted by Hill functions (Fig 1B and Fig 6A). 315 Moreover, in all three genes, there is a linear relationship between the fold changes in 316 protein levels and in σ 38 levels over time (Fig 6B).  Table S5 in S3 Appendix). (b) Scatter plot of FCP against the 324 corresponding σ 38 levels (data from Fig 1B)   While here we only found a correlation between the promoter sequence and the gene's 342 response in σ 70+38 genes, it may be that genes whose promoters have different σ 343 preferences exhibit similar sequence-dependent behaviors during the stresses that 344 they are responsive to. Thus, we generalized the model to be applicable to any stress 345 and responsive gene cohort. As an example, we set a model for genes responsive to 346 all seven  factors of E. coli. For this, first, we expand reactions (R1a)  This general model can be tuned based on the numbers of each  factor present in 354 the conditions considered, and the consensus sequences to each  factor (Table S5). 355 In addition, following the findings in Section 2.8, it should be feasible to introduce 356 factors to account for the timing of the changes in the transition period. E.g., we failed to find a relationship between the sequences of promoters responsive 390 to  38 alone and their response strength to  38 (even for genes without TF inputs (Fig  391   5b)). Also, it remains to be proved whether promoters responsive to  70 ,  54 ,  32 ,  28 , 392  24 , or dual combinations, have similar behaviors when those  factors change. 393 23 It also remains to be studied whether the sequence-dynamics relationship in  70+38 394 genes is causal or coincidental. While the large numbers of promoters studied here 395 support the causality, a definitive proof likely requires producing synthetic promoters 396 similar to the natural ones but differing in p-distances to the consensus sequences of 397 promoters with preference for  38 and for  70 , respectively. 398 Also, in the future, it should be investigated if the sequence-dependent 399 responsiveness of σ 70+38 genes to σ 38 levels contributes to why their promoters (from 400 positions -41 to -1) are highly conserved (Fig S1 in S2 Appendix) and/or why the TFs 401 that they code for commonly serve as input TFs to essential genes [43] (2.5 times 402 more than average, Fisher test p-value < 0.05). 403 Finally, over-representation tests of the ontology [44,45] of σ 70+38 genes suggest that 404 they are commonly involved in respiration (Fig S4 in S2 Appendix, Table S4 in S3  405 Appendix). In agreement, this process is highly affected when changing from 406 exponential to stationary growth [23,46], since aerobic respiration is reduced, while 407 fermentation and anaerobic respiration are enhanced [18]. Moreover, we observed 408 that σ 70+38 genes are amongst the most responsive to the change from growth phase 409 out of the genes associated with these functions (Fig S4 in S2 Appendix). This 410 suggests that they may control the processes that need most alterations during the 411 adaptation. Therefore, externally regulating these genes may thus give significant 412 control over these processes. This is particularly appealing since this control could be 413 We used M9 medium (1xM9 Salts, 2 mM MgSO4, 0.1 mM CaCl2; 5xM9 Salts 34 g/L 437 Na2HPO4, 15 g/L KH2PO4, 2.5 g/L NaCl, 5 g/L NH4Cl) supplemented with 0.2% 438 Casamino acids and 0.4% glucose, and Luria-Bertani (LB) medium with 10 g peptone, 439 25 10 g NaCl, and 5 g yeast extract in 1000 ml distilled water. We used the antibiotics 440 kanamycin and chloramphenicol from Sigma Aldrich, U.S.A. 441 at 150 min and at 700 min after inoculation into fresh medium to represent cells in 450 exponential and stationary growth phases, respectively (Fig 1A). 451

452
To collect microscopy data, cells were placed between a coverslip and agarose gel 453 pad (2%) and visualized by a confocal laser-scanning system, using a 100× objective. 454 Green fluorescence images were captured using a 488 nm laser and a 514/30 nm 455 emission filter. Phase-contrast images were simultaneously acquired for purposes of 456 segmentation and to assess health, morphology, and physiology. 457 Images were analyzed using "CellAging" software [51]. After automatically segmenting 458 cells and manually correcting errors, we applied 2D Gaussian filters to remove 459 measurement noise, and extracted each cell's total fluorescent intensity.   than 5 counts in more than 3 samples, and genes whose mean counts are less than 491 10, were removed. iv) Next, the unique gene hit counts were used for subsequent 492 differential expression analysis, using DESeq2 R package (v. Finally, we discarded "far out" events using Tukey's fences [59]. This had little effect 518 on the results. Furthermore, background fluorescence was not subtracted since doing 519 so did not affect the results (Section S1.2 in S1 Appendix and Fig S2 in S2 Appendix). 520 replicates each. Meanwhile, to assess if genes measured by flow cytometry were a 531 good representative of the set of all σ 70+38 genes, we used two-sample Kolmogorov-532

Spectrophotometry
Smirnov tests (KS tests) to compare their sequences (Dσ38 and Dσ70) and dynamics 533 (LFCRNA, µ, and CV 2 ). Further, gaussian fittings were made by the Distribution Fitter 534 app (MATLAB). To evaluate its fitting, we calculated the coefficient of determination 535 (R 2 ) between the PDF of the distributions and the PDF of the fitting. Finally, linear 536 correlations were estimated by least-squares fitting (FITLM of MATLAB) and 537 considered significant if the p-value of an F-test on the fitted regression line was 538 smaller than 0.05. Other curves were fitted using MATLAB's curve fitting toolbox (apart 539 from the fitting of Hill functions, which was done using the Hill function [60]). 540 As for gene ontology representations, we perform overrepresentation tests using 541 PANTHER Classification [61] for finding significant overrepresentations by Fisher's 542 exact tests. For p-values < 0.05, the null hypothesis that there are no associations 543 between the gene cohort and the corresponding GO of the biological process is 544 rejected. This p-value is corrected by calculating if the overall False Discovery Rate 545 (FDR) is < 0.05. FDR is calculated by the Benjamini-Hochberg procedure [56].   Table S3 in S3 Appendix informs on the σ factor preference of the genes. From 549 Regulon DB v10.5 (August 14, 2019), we obtained lists of all transcription units (TUs), 550 promoters (including σ factor preferences), and genes of E. coli [62]. Recently, we 551 compared our lists with information on July 1 st , 2021 and found no changes that would 552 affect the conclusions. TUs only differed by ~1%, promoters by ~0.5%, and genes by 553

~1%. 554
From the 3548 TUs (gene(s) transcribed from a single promoter), we extracted 2179 555 with known promoters, containing 2713 genes in total. To minimize interference to the 556 classification of σ factor preferences arising from transcription by multiple promoters, 557 we narrowed our list to 1824 genes transcribed by only one promoter and, of those, to 558 the 1328 genes with known σ factor preference. From those, 1242 have a preference 559 for only one σ factor, including 931 with a preference for σ 70 , and 93 with a preference 560 for σ 38 . Conversely, 76 genes have promoters with a preference for two σ factors. Out 561 of these, 64 are transcribed by only one promoter with a preference for σ 70 and σ 38 . 562

Promoter sequence logos 563
Promoter sequence logos were created using WebLogo  The p-distance D of a promoter [65] is the fraction of its nucleotides between positions 576 -41 to -1 (assuming that transcription start site starts at position +1) that differ from the 577 consensus (most common) nucleotide in that position of a cohort of genes (here, 578 genes with preference for a given σ factor). We extracted the consensus sequences 579 related to each σ from RegulonDB [62]. To measure the D of promoters with 580 preference for both σ 38 and σ 70 , we calculated the p-distances Dσ38 and Dσ70, and then 581 an overall p-distance, Δ, defined as: Δ = Dσ38 -Dσ70.