Abstract
In cyanobacteria DNA supercoiling varies over the diurnal light/dark cycle and is integrated with temporal programs of transcription and replication. We manipulated DNA supercoiling in Synechocystis sp. PCC 6803 by CRISPRi-based knock-down of gyrase subunits and overexpression of topoisomerase I (TopoI), and characterized the phenotypes. Cell division was blocked, most likely due to inhibition of genomic but not plasmid DNA replication. Cell growth continued to 4-5x of the wildtype cell volume, and metabolic flux was redirected towards glycogen in the TopoI overexpression strain. TopoI induction initially lead to down-regulation of GC-rich and up-regulation of AT-rich genes. The response quickly bifurcated and four diurnal co-expression cohorts (dawn, noon, dusk and night) all responded differently, in part with a circadian (≈ 24 h) pattern. A GC-rich region − 50 bp of transcription start sites is differentially enriched in these four cohorts. We suggest a model where energy- and gyrase-gated transcription of growth genes at the dark/light transition (dawn) generates DNA supercoiling which then facilitates DNA replication and initiates the diurnal transcriptome program.
Introduction
In vivo, the DNA double helix exists in a torsionally strained and underwound state, often denoted as “negative DNA supercoiling”. A homeostatic feedback system of DNA supercoiling is coupled to differential expression of large gene groups in many different bacterial species [1, 2, 3, 4, 5, 6]. The general picture (Fig. 1A) is that DNA supercoiling is high during times of high metabolic flux, such as during exponential growth, and required to express rRNA and GC-rich growth genes and allow for DNA replication [7]. The ATP/ADP ratio has direct effects on DNA supercoiling [8, 9, 10]. Supercoiling would allow an analog modulation of transcription factor and polymerase binding [11] and thereby qualifies as a potential mechanism for the long known and recently re-discovered monotonous relation of rRNA and growth gene expression with growth rate[12, 13, 14]. However, the relation of RNA transcription and DNA replication to DNA supercoiling is mutual and complex [7]. According to the twin-domain model of transcription-dependent supercoiling (Fig. 1B), negative supercoiling accumulates upstream and positive supercoiling downstream of polymerases [15], leading to cooperative and antagonistic long-range effects between transcription loci [16] (Fig. 1C). Strong transcriptional activity requires downstream activity of gyrase to set the elongation rate and avoid polymerase stalling [17, 18, 19] and upstream activity of topoisomerase I to avoid R-loop formation and genome instability [20, 21]. Such cooperative long-range effects can underpin temporal expression programs; locally in the leu operon [22, 23] and globally as a spatio-temporal gradient along origin-terminus axis of the Escherichia coli (E.coli) genome [24].
In cyanobacteria, this system forms an integral part of the diurnal (light/dark cycles) changes in metabolism and transcription, and is integrated with the output of the cyanobacterial circadian clock [25, 3]. Supercoiling of chloroplast genomes, the endosymbiotic descendants of cyanobacteria, was observed to fluctuate with the diurnal light/dark (LD) cycle [26], and plants encode for gyrase enzymes [27]. Cyanobacteria themselves traverse through a well defined transcriptional program during diurnal LD cycles in several species [28, 29, 30, 31]. Mori and Johnson first suggested [32] that diurnal DNA supercoiling may be involved in the genome-wide nature of diurnal transcription in cyanobacteria. Genome compaction was found to fluctuate with LD cycles in Synechococcus elongatus PCC 7942 [33], and an endogenous plasmid showed diurnal fluctuations of DNA supercoiling [25]. The temporal transcriptome program during entrained circadian cycles correlated with plasmid supercoiling states and depended on gyrase activity [3]. In Synechocystis sp. PCC 6803 (hereafter abbreviated as Synechocystis), cold, heat and salt stress all lead to similar changes in the transcriptome and all were enhanced by treatment with the gyrase inhibitor novobiocin (NB) [4]. Gene groups with coherent response to stress and NB overlapped significantly with diurnal co-expression cohorts [34]. Supercoiling of the endogenous plasmid pCA2.4_M increased within 30 min after the transition to light phase, and continuously decreased during 12 h and further during a prolonged 24 h dark phase [35]1.
In this work, we manipulate DNA supercoiling in Synechocystis by inducible overexpression [36] and CRISPRi-based knock-down [37] of the key genes involved in modulation of supercoiling: topoisomerase I (gene: topA), and gyrase (subunit genes: gyrA and gyrB. We can confirm both, the homeostasis and transcription-dependent supercoiling models in Synechocystis. All manipulations that should decrease supercoiling lead to a strong pleiotropic phenotype, where cell division is blocked but cell volume growth continues. Especially topA over-expression induces overflow metabolism and glycogen production, and uncouples the diurnal transcription program. Metabolism and transcriptome appear to be locked in a state between late night and early day.
Results and Discussion
Artificial Genome Relaxation Blocks Division but not Growth
Manipulation of Gyrase and Topoisomerase Expression
To study DNA supercoiling in Synechocystis, we used the dCas9-mediated CRISPR-interference system [37] to repress (knock-down) transcription of topoisomerase I (gene topA, slr2058), and gyrase subunits gyrA (slr0417) and gyrB (sll2005), or gyrA and gyrB simultaneously, yielding strains named <gene>KD. Additionally, we constructed a tunable expression plasmid pSNDY [36] where a copy of the native Synechocystis topA is under the control of a rhamnose-inducible promoter, strain topAOX (Tab. 1). All six strains were induced with anhydro-tetracycline (aTc) and rhamnose and cultured for five days, then harvested for quantification of cell dry weight, ATP, glycogen, and plasmid supercoiling. Reverse transcription quantitative PCR (RT-qPCR) verified the functionality of our inducible genetic constructs (Fig. S1A). All knockdown strains showed an abundance reduction to 8 %–15 % of the wild-type level, and topA induction was 30 fold. We also measured all three transcripts in all strains and observed compensatory upregulation and downregulation of the non-manipulated supercoiling enzymes, a first verification of the homeostatic control model (Fig. 1A) in Synechocystis.
Cell Volume Growth, Adenosine and Glycogen Content
Initially, all cultures showed comparable growth (Fig. 2A). After three days all strains except topAKD grew slower than the EVC; and topAOX showed the strongest growth defect. The cell dry weight (CDW) at harvest time correlated with the final OD750 of the cultures (Fig. 2A), but was relatively higher for the EVC and topAKD strains. Cell volume distributions of the EVC and topAKD strains showed a transient small increase (≈10%) on the first day of cultivation and were stable thereafter (Fig. 2B). In contrast, cell volumes of the gyrkd and topAOX strains increased over time, from 4 fL–5 fL to 12 fL–15 fL after four days of cultivation. Total cell numbers increased only slightly. Thus, strains where gyrase subunits were knocked down or topoisomerase I overexpressed showed inhibition of cell division but not of cell growth. Pigmentation was strongly affected. Cultures of the topAOX strain appeared pale and gyrkd strains blue, compared to their uninduced state and to EVC. Absorption spectra (Fig. 2C, S1B) confirmed an overall decrease of all pigments in topAOX. The gyrkd strains showed a stronger decrease at chlorophyll-specific wavelengths than at phycocyanin-specific wavelengths, explaining their blue appearance.
All knock-down strains showed glycogen levels similar to the EVC, with 25 % of the total CDW (Fig. 2D), which is consistent with literature values for the wild type [38]. In contrast, topAOX showed more than twice as much glycogen with 55 % of CDW. Values of up to 60 % are reported for nitrogen-starved cells [39, 40]. The topAOX strain accumulated more than four times as much ATP+ADP as the EVC (Fig. 2E). gyrBkd and gyrABkd accumulated about twice as much ATP+ADP as the EVC; topAKD and gyrAkd showed no difference to the EVC control. Thus, the strains repressing the ATPase subunit of gyrase, gyrBkd and gyrABkd, had slightly elevated levels of ATP+ADP.
Hypernegative Plasmid Supercoiling in the topAOX Strain
Agarose gel electrophoresis in the presence of chloroquine (CQ) was used to analyze plasmid supercoiling at harvest time (5 d) of the batch cultures, and interpreted as outlined previously [35]2. The gels show three sets of topoisomer bands (Fig. S1D). These likely stem from the three annotated small plasmids of Synechocystis, pCA2.4_M, pCB2.4_M and pCC5.2_M. Electropherograms of the two smaller plasmids indicate that only strains gyrAkd and gyrABkd showed plasmid relaxation (Fig. 3A). In the gyrBkd and topAOX strains plasmids appeared to have a higher level of DNA supercoiling. We could not extract plasmids from the topAKD strain. Increased plasmid supercoiling in topAOX could results from a long-term adaptation and compensatory up-regulation of gyrase subunits. We thus tested plasmid supercoiling as a time series after inoculation in fresh medium with and without the inducer (Fig. S2). The gel run time was increased to also separate topoisomers of pCC5.2_M. All three plasmids were more relaxed (less negative supercoiling) after 3 h of growth (Fig. 3B,C). Already after 8 h the trend had reversed, and at 20 h plasmids were more supercoiled than at time 0 h and in the uninduced control time series (Fig. S2E). Then plasmids became further supercoiled to an extent where topoisomers were not separable anymore. In vivo, hypernegative supercoiling of plasmids has been observed in topoisomerase I-deficient E. coli strains [41]. In vitro, it can be generated by gyrase and transcription [42] in agreement to the twin-domain model of transcription-dependent supercoiling (Fig. 1B). Here, the compensatory upregulation of gyrase genes apparently had a stronger effect on plasmids than the induced topoisomerase I overexpression. Plasmids and phages often harbor gyrase binding sites [43] and such sites could contribute to this phenomenon.
Reduction of rRNA Abundances in gyrkd Strains
To test effects on transcription, we selected three strains, gyrAkd, gyrBkd and topAOX, and the EVC for RNA-seq analysis. Strains were grown in triplicate and harvested for flow cytometry, RNA extraction and RNA-seq analysis 5 d after induction and 3 d after a culture dilution step (Fig. S3A). Flow cytometry confirms the growth phenotype (Fig. 4A, S4): forward scatter (FSC) was increased in all strains, and most in topAOX. Side scatter (SSC) additionally revealed two cell populations in topAOX which could reflect 8-shaped cells in division. Total nucleic acid content increased with cell size. Total RNA composition and the relative abundances of rRNA and mRNA were analyzed by capillary gel electrophoresis (Fig. S3B,C, S5). Ribosomal RNA species were strongly reduced in the gyrAkd and gyrBkd strains and less reduced in topAOX (Fig. 4B), except for the 600 bp fragment of the 23S rRNA (Fig. S6C). In E. coli, transcription of ribosomal RNA depends on downstream gyrase activity [44], mediated by strong gyrase binding sites [19]. Such an interaction would be directly (locally) affected in the gyrkd strains. In contrast to plasmids, the upregulation of gyrase in the topAOX strain could apparently not locally compensate for global genome relaxation by topoisomerase I, thus also reducing the relative rRNA content in this strain.
Consistent Changes in mRNA Abundances & Homeostatic Regulation of Supercoiling Enzymes
The same RNA samples were further processed (rRNA species depleted) and sequenced on the Illumina platform, and transcript abundances relative to the EVC evaluated with DESeq2 [45]. All strains showed overall similar expression changes, but the extent was lower in topAOX (Fig. 4C,D). However, this difference could also just reflect normalization effects by the decreased rRNA content in the gyrase knock-downs. In all strains the targeted manipulation was still observable at harvest times (arrows in Fig. 4C,D), i.e., gyrA transcripts were reduced in gyrAkd, gyrB transcripts in gyrBkd and topA transcripts were increased in topAOX. The non-manipulated genes showed the compensatory response expected from homeostatic regulation, i.e., topA was repressed in both gyrkd strains, and all non-manipulated gyrase subunits were induced in all experiments. In contrast, the sll1941 gene, annotated either as a second gyrase A subunit or as the topoisomerase IV ParC subunit, showed no response in either experiment. Thus, RNA-seq confirms the RT-qPCR results, further corroborating the homeostatic regulation model (Fig. 1A), and emphasizes the genome-wide nature of DNA supercoiling effects.
Overexpression of topA Uncouples Diurnal Co-expression Cohorts
Transient Increase in Cell Volume and Density
To study the dynamic response to transient topA induction, the topAOX strain was grown in a Lambda Minifor bioreactor (Fig. S7) with continuous (online) monitoring of turbidity (ODλ, Fig. S8A,B). Continuous culture dilution was initiated at ODλ ≈ 2.9 and with dilution rate φ ≈ 0.24 d−1. The culture stabilized around ODλ ≈ 2.7. Notably a subtle 24 h pattern of ODλ was observed in both, batch and pre-induction continuous growth phases. Then rhamnose was injected to 2 mM to induce overexpression of topA. The topA transcript was upregulated to ≈45-fold over the pre-induction level within 4 h, as measured by RT-qPCR and confirmed by RNA-seq (Fig. S11) and decreased slowly over the course of the experiment. The ODλ initially increased for 1 d post-induction, then slowly decreased. Cell dry weight (CDW) measurements were noisy but matched the ODλ signal over the sampled period (Fig. 5A, S8C). In contrast, cell numbers started to decrease immediately, and cell volumes increased (Fig. 5B). We calculated growth rates of ODλ, cell numbers and the total cell volume (Fig. 5C, S9). Cell division was not completely blocked but severely reduced to a division time of ≈10 d (µcount ≈ 0.07 d−1). Total cell volume growth was much less affected and remained stable (µvolume ≈ 0.18 d−1) throughout continuous culture operation until 12 d post-induction. Thus, artificial topA overexpression blocked cell division but not cell volume growth. ODλ growth remained highest (µOD ≈ 0.23 d−1) and stable over the first 5 d–6 d. In parallel, glycogen content increased to about 35 %–40 % of the CDW (Fig. 5A). We further noticed that sampled cells started to sediment much faster, indicating increased intracellular density. By calibrating the ODλ signal to the CDW measurements (Fig. S8C) and dividing by the total cell volume we can estimate a CDW density and this value also increased over time from 0.3 to 0.5 gDCW/mLcell (Fig. 5D). This range is consistent with data from E. coli [46, 47]. However, the CDW per OD750 was relatively lower for the enlarged strains in the endpoint measurement (Fig. 2A), and thus, the calibration to ODλ may overestimate true CDW density. The enlarged cells also became increasingly fragile: in the CASY cell counter data a small population of varying intensity appeared at < 2 fL. This peak was highest at 7 d (outlier x in Fig. 5B), where cells were lysed during centrifugation in a washing step. The washing step was skipped thereafter, and the peak of small cells (dead or fragmented) remained small but increased towards the end of the continuous culture. Maximal cell volumes >20 fL were reached 10 d–15 d post-induction. From day 14 a population of smaller cells, ≈7.5 fL, appeared. On 16 d this population was the majority, and cell volume further decreased to 5 fL. Cell pigmentation recovered and the culture appeared greener again. We then switched off dilution, and the culture resumed growth, although at lower growth rates than pre-induction.
Upregulation of Plasmid & Growth Genes, Downregulation of Photosynthesis Genes
Samples for total RNA analysis and RNA-seq were taken 1 d and 0.5 h before induction, and then over the next 25 d in decreasing temporal resolution to roughly capture three time-scales of the response. Coding gene transcript abundances were calculated, the resulting time series clustered (Fig. 6A, S12), clusters sorted along significant overlaps with diurnal cohorts (Fig. 6C), and functional annotation enrichments (CyanoBase “categories”: Fig. 6B, S14A; gene ontology: Fig. S15) calculated. Cluster 2 (red) comprises the majority of the ribosomal protein category, enzymes of RNA and DNA synthesis, amino acid biosynthesis and the ATP synthase. Transcript abundances increased slightly over the three days after induction, with a notable circadian (≈ 24 h) pattern. Cluster 3 (yellow) comprises the majority of photosynthesis-related genes, including the two photosystems. Its transcript abundances decreased continuously. The remaining clusters showed less clear functional profiles. Clusters 1 (green) and 4 (blue) were enriched in genes with unknown function, and cluster 4 strongly enriched with transposase genes. Gene Ontology analysis additionally shows enrichments with 7 of 8 genes annotated with “DNA polymerase activity” and 60 “DNA binding”proteins (Fig. S15). Both clusters 1 and 4 showed decreased transcript abundances on the first day post-induction, then only cluster 4 increased steeply. Cluster 1 transcripts showed a subtle circadian pattern, peaking anti-phase to cluster 2 transcripts. Cluster 6 (cyan) showed a similar profile to cluster 4 and comprises transcripts that showed the strongest abundance increase. Both clusters 4 and 6, were strongly enriched for plasmid-encoded genes (Fig. S16), and both also contained significant fractions of the ribosomal protein category. Cluster 5 (gray) showed decreased abundances but had the least change over time, and was slightly enriched with “hypothetical”, “Hydrogenase” and “plasma membrane” gene annotations.
Uncoupling the Diurnal Program at Dawn
Next, we compared the time series clusters from topAOX and the endpoint transcript abundances from the topAOX and gyrkd strains with previously characterized co-expression cohorts (Fig. 6D). Prakash et al. clustered genes into three groups by their response to salt, heat and cold stress, each with or without the gyrase inhibitor novobiocin (NB) [4]. Zavrel et al. measured protein abundances at different growth rates in constant light conditions and presented 7 clusters [14] which we summarized into cohorts upregulated or downregulated with growth rate, with a complex response and stable proteins without growth rate-dependence (Fig. S14B). Saha et al. and Lehmann et al. analyzed transcriptome time series over light-dark cycles [48, 34]. Saha et al. did not cluster their data, thus we clustered their data into 5 cohorts and assigned them names that correspond to their order of expression during the diurnal light-dark cycle (Fig. S17). Clusters 2 (red) and 3 (yellow) show the clearest functional profiles, and are found co-expressed with opposite patterns in all tested experiments, and at dawn and noon, respectively, in the diurnal data sets. Cluster 2 was most enriched with transcripts that showed a mixed response to the applied stress conditions (Stress+NB) and with genes whose protein abundances correlated positively with growth rate (GROWTH). Cluster 3 was most enriched with transcripts downregulated in stress and whose protein abundances correlated negatively with growth rate. In the diurnal data set cluster 2 transcript abundances peaked in early day (dawn) and were followed by a peak of cluster 3 transcripts at noon. These clusters likely reflect biological co-expression cohorts and from hereon we denote cluster 2 as cohort RB/dawn (ribosomal proteins and biosynthesis) and cluster 3 as PS/noon (photosynthesis), according to their annotation and experiment enrichment profiles. Cluster 1 is enriched with proteins of unknown function. In both, stress and growth experiments, we found only enrichments in the NA classes, ie. the original papers did not report values. However, in the diurnal data set we found enrichment with genes clusters that peak at noon or after noon. We denote this cohort as UK/dusk (unknown function, expressed at dusk). The remaining clusters 5, 4 and 6 all show weak functional enrichments. Only cluster 4 is enriched with transposase genes, DNA polymerase and other DNA binding proteins. All of them, and strongest in cluster 4, show weak enrichment with transcript cohorts that peak during the dark phase. Cluster 6 is enriched in the late night cohort in the diurnal data set, and further shows some overlaps in enrichment with the RB/dawn cohort, thus closing the (diurnal) cycle of expression. We tentatively label cluster 4 as the DNA/night cohort and cluster 6 as plasmids.
To compare the time series with the endpoint transcriptome data, we calculated t-test profiles of all clusters (red/blue squares in Fig. 6D and S13). RB/dawn, PS/noon and UK/dusk showed consistent behavior in all strains, although at lower t-values for the gyrase knock-downs. DNA/night and plasmids, both enriched with plasmid genes, were upregulated only in topAOX but not in the gyrkd strains. Transcripts of the large cluster 5, which showed the weakest response in the time series, were upregulated only in the gyrkd strains. Likewise, the diurnal co-expression cohorts from ref. [48] (Fig. S17) all show subtle but notable differences between the strains, only PS/noon was consistently downregulated.
Differential Regulation of GC-rich and AT-rich Genes
Genes with supercoiling-activated transcription were found to be GC-rich in both upstream non-coding and coding regions of several species, and vice versa, supercoiling-repressed genes are AT-rich [1, 3, 49, 6]. Thus, we aligned nucleotide sequences of all genes from the main chromosome at their start codons (ATG), calculated average nucleotide content in moving windows for each cluster, and performed a statistical enrichment test (cumulative hypergeometric distribution) at each position (Fig. 6E). The up-regulated RB/dawn cohort and the down-regulated PS/noon cohort are GC-rich downstream and upstream of the start codon. The UK/dusk (downregulated) and DNA/night cohorts (upregulated) are AT-rich. Zooming in to the immediate response after induction shows that the response of the co-expression cohorts switches within the first h post-induction (Fig. 6C). The immediate response is consistent with data from other species. Transcript abundances of both GC-rich cohorts, RB/dawn and PS/noon, decreased, while those of the AT-rich UK/dusk and DNA/night cohorts increased. Already 1 h post-induction, the two GC-rich and two AT-rich cohorts have bifurcated. Transcript abundances of RB/dawn increased while those of PS/noon continue to decrease. Similarly, the AT-rich UK/dusk transcripts now decreased while DNA/night transcripts continued to increase in abundance.
Promoter Structure: A GC-rich Discriminator?
To analyze actual promoter structures we mapped the coding genes on previously described transcription units (TU) [50], calculated average temporal abundance profiles of TUs, and clustered TU profiles by k-means using the average profiles of the gene-based clustering as cluster centers (Fig. S23). TU were then aligned at their transcription start sites (TSS) and again average nucleotide content profiles of clustered TU calculated (Fig. 6E). Downstream of the TSS the general differences in GC/AT content were similar to those of start codon-aligned coding genes, albeit with lower statistical power. TSS alignment reveals a distinct GC-rich peak at ca. 50 bp. This peak corresponded better to the mid-term trend of transcript abundances. Cohorts that were upregulated within the first 3 d post-induction (DNA/night and RB/dawn) had a higher, and downregulated cohorts (PS and UK/dusk) had the lowest GC-content. In summary, the immediate effect of genome relaxation was quickly overruled by other regulatory mechanisms. A candidate is the ppGpp-mediated repression of day-time transcription [51]. In E. coli, a GC-rich discriminator at −10 bp mediates the supercoiling-dependence of the ppGpp-mediated “stringent response” of promoters [52, 53]. Conceivably, the 50 bp site could have a comparable role in integrating the response to different regulatory inputs.
Genes of Interest
In Figures S18–S22, we analyze transcript abundances of a few specific gene sets: circadian clock genes (kai), response regulators, sigma factors, and photosynthesis and metabolic genes. However, these patterns are hard to interpret, due to the unknown function of upregulated genes and differential regulation of paralogs and enzyme complex subunits. In short: most kai genes were down-regulated, while kaiC3 is upregulated with RB/dawn, followed by kaiB2 and kaiC2 from 3 d (Fig. 7A, S18A). KaiC3 is required for chemoheterotrophic growth in constant darkness [54]. The response regulator rpaB is downregulated, and rpaA slightly upregulated (Fig. S18B). Of the tested regulators, only pmgA is strongly upregulated over the first three days, with the diurnal pattern of the RB/dawn cohort. Sigma factors of unknown function [55], sigH → segI, were upregulated until 3 d; the stress factor sigB is up-regulated only at 10 d (Fig. S18C), where cells are enlarged. Photosystem, phycobilisome, and carboxysome genes are predominantly downregulated (Fig. S19–S20). In contrast, RuBisCo and genes of the carbon concentrating mechanisms are predominantly upregulated over the first three days (Fig. S21). The glycogen degrading enzyme glgP1 was strongly downregulated; glucose-1-phosphate adenylyltransferase glgC was upregulated, and glycogen debranching enzymes glgX/X2 peaked at 3 d (Fig. S22A). NAD synthesis genes are generally upregulated, and we find strong differential response of certain subunits of the NADH dehydrogenase (Fig. S22B,C). Together with the upregulation of the redox and photomixotrophic growth regulator pgmA [56], these responses point to changes in redox metabolism. The thioredoxin trxM1 is strongly down-regulated, and trxA has only a short and low peak at 2 d–3 d (Fig. S22D).
Reversible Inhibition of DNA Replication Explains the Growth Phenotype
Manipulation of DNA super-coiling has highly pleiotropic effects, and can also interfere with DNA replication and lead to DNA damage [57]. Here, we observed a block of division while cell growth continued. Could an inhibition of genome replication be involved? In endpoint RNA-seq experiments relative rRNA abundances were decreased but the total nucleic acid content (Syto-9 stain) increased with cell size and morphology (Fig. 4A, 7B, S4). To differentiate nucleic acids, we extracted RNA and DNA from the topAOX time series experiment (Fig. 7C). Total DNA extraction yields (per CDW) dropped to about 50 % in the three days post-induction. From 4 d post-induction, DNA yields increased steeply; but increased plasmid-derived transcript abundances (Fig. 7D, S16) and increased plasmid extraction yields (Fig. 7E, S2B,C) suggest that this was due to increased plasmid replication. Thus, the most parsimonious explanation for the division block phenotype is impaired DNA replication of the genome, while the copy numbers of small plasmids increased during the late stages of cell volume growth. A block of genome replication could be due to permanent DNA damage. The volume growth and division block phenotype was reversible. All strains recovered readily when re-inoculated in fresh medium without the inducer (Fig. S1C). In the continuous culture experiment, cells recovered as the inducer was washed out (Fig. 5A). Re-inoculation with and without inducer showed that in some cells our constructs remained intact and induced cells again grew in volume, while other cells were refractive to induction, and overgrew the division-blocked cells (Fig. 7F, S10). Thus, we conclude that the block in genome DNA replication was not due to irreversible or lethal genome damage.
Conclusion
We presented a comprehensive characterization of DNA supercoiling effects in Synechocystis. Several models of the role of this regulatory system in bacterial cell biology can be confirmed in this species. The enzymes directly responsible for supercoiling are under homeostatic control [60] (Fig. 1A). Increased gyrase levels and transcription conspire to hyper-supercoil plasmids in the topAOX strain, in agreement with the twin-domain model of transcription-dependent supercoiling [15, 42] and suggesting the presence of gyrase binding sites 1. [43] in the small endogenous plasmids of Synechocystis. Differential transcription of GC-rich and AT-rich genes [1, 3, 49, 6] was observed immediately (5 min–20 min) after induction of topA over-expression. Already after 60 min the pattern had changed. The compensatory upregulation of gyrase and other regulatory mechanisms, such as the ppGpp mediated repression of day time transcription [51], may underlie this bifurcation of the response; potentially mediated by a GC-rich “discriminator” 50 bp upstream of TSS. Large gene cohorts developed similarly over the first three days post-induction, and these revealed a tight link to the diurnal transcription program, consistent with results from Synechococcus elongatus PCC 7942 [3].
Cooperative and antagonistic long-range effects of transcription via local accumulation and diffusion of DNA torsional strain [16] (Fig. 1B,C) underpin temporal expression sequences in operons [22, 23], and genome-wide along the origin-terminus axis during E. coli growth phases [24]. In analogy to these models, we suggest a simple hypothesis of supercoiling-mediated cooperativity during the diurnal transcription program in cyanobacteria (Fig. 8). Overall, genomic supercoiling is decreased in our strains, despite availability of light-derived energy. This leads to upregulation of the night-expressed gene cohort (DNA/night), incl. the kaiC3 gene [54] despite the constant light conditions. The availability of light-derived energy drives transcription of ribosomal RNA and growth genes around dawn (RB/dawn) from relaxed DNA, e.g. via low levels of ppGpp and high GTP. But this process is gated or modulated by local activity of gyrase, bound to putative gyrase binding sites downstream of the affected transcription units. In wildtype conditions, this transcription-dependent supercoiling would increase genome-wide supercoiling and thereby directly facilitate both DNA replication between dawn and noon [61] and advancement of the diurnal transcription program to the GC-rich PS/noon gene cohort. In our strains, genome-wide supercoiling can not accumulate and these latter steps are blocked. Cells are stuck in a state between late night and dawn.
To test this model, our strains should next be studied in this natural diurnal context, that is, in cultures grown in regular day/light cycles, and interference (induction) initiated at different times of the cycle. Further development of artificial regulation of supercoiling enzymes, towards a full de-construction and re-construction of the homeostatic feedback system in one strain, would provide a highly versatile tool to untangle the pleiotropic effects of DNA supercoiling on transcription, replication and cell growth. And finally, topoisomerase I overexpression caused an increase of cellular glycogen content of up to 60 % of the cell dry weight. This phenotype could be exploited for synthesis of high value products or directly as a fermentation substrate for yeast [40]. Thus, further development of such strains may be of biotechnological value.
Data Availability
The clustering and time series data from the topAOX strain (both as raw abundances in TPM and as the log2 ratios to the mean of two pre-induction values, as plotted in this manuscript), and endpoint measurements (log2 ratio of abundances in the gyrAkd, gyrBkd and topAOX strains to the EVC strain) are available as Datatable_S1.tsv. *** The raw sequencing reads will be made available (e.g. SRA) upon publication. ***
Funding
RM was funded by the Deutsche Forschungsgemeinschaft (DFG), grants AX 84/4-1 & STA 850/30-1 (COILseq) and EXC-2048/1–project ID 390686111 (CEPLAS). MD, DB and TB were financially supported by the CLIB Competence Centre Biotechnology (CKB) funded by the European Regional Development Fund ERDF (EFRE-0300096 and EFRE-0300095).
Materials and Methods
Strains and Plasmids
The Synechocystis parent strain used for all genetic alterations was kindly provided by Lun Yao and Paul Hudson, and based on a strain they had obtained from Dr. Martin Fulda (Göttingen, Germany). It contains a TetR cassette, as well as dCas9 under the promoter PL22, inducible with anhydrotetracycline (aTc), at the genomic insertion site psbA1. Construction of this strain was described in ref. [37]. For overexpression of genes in Synechocystis, relevant genes were directly PCR-amplified from the Synechocystis genome, fused to the rhamnose-inducible promoter [36] via overlap extension PCR, and integrated into a modified variant of the conjugative vector pSHDY containing a nourseothricin resistance instead of spectinomycin, termed pSNDY. The vector backbone also contained the activator rhaS from [36]. Lun Yao and Paul Hudson further provided a strain with gyrB-targeting sgRNA. For the construction of the additional sgRNA constructs, sgRNA sequences were designed using CHOPCHOP [62], constructed via overlap extension PCR and integrated into the vector designed by [37] (Addgene #73224), which inserts into the slr0230 site of the Synechocystis genome. Integrative sgRNA plasmids were integrated via transformation. Briefly, 10 mL of exponentially grown culture was concentrated to 250 µL, 1 µg–2 µg of pure plasmid was added and the mixture was incubated up to 5 h before plating the entire mixture on BG11 plates. After drying the plates, agar was underlaid with 300 µL of 1 mg mL−1 kanamycin stock using a sterile spatula, thereby forming a diffusion gradient. After 1-2 weeks of incubation at 30 ◦C with the lid facing upward, isolated green colonies were carefully transferred to a fresh plate. Over time, positive clones were gradually shifted to higher concentrations of kanamycin (i.e., 4, 8, 12, 20, 40 µg mL−1 final concentration in the plate). Complete segregation of mutants was ensured via colony PCR. Replicative vectors were introduced into the dCas9 background strain via conjugation as described (dx.doi.org/10.17504/protocols.io.ftpbnmn). Clones were selected using nourseothricin (Jena Bioscience, #AB-102L) at a final concentration of 50 µg mL−1 and verified via colony PCR.
Culture Conditions
Batch Culture Conditions
Plates were freshly streaked before liquid cultivation. In the case of noursethricin, liquid cultures were maintained with 25 µg mL−1 while solid media was supplemented with 5050 µg mL−1. Spectinomycin and Kanamycin were supplemented at concentrations of 20 µg mL−1 and 25 µg mL−1, respectively. For pre-culturing and growth experiments, Synechocystis strains were cultivated in BG11 medium [63]. Cultivation was performed at 30 ◦C with 150 rpm shaking and continuous illumination of ≈80 µmol m−2 s−1 (16 % setting in Infors HT). Aeration was ensured by continuous shaking and CO2 enriched air (0.5 %) in an Infors HT multitron chamber. Pre-culturing was performed in 100 mL baffle-free Erlenmeyer shaking flasks with 20 mL cell suspension for three days. After adjusting all different strains to the same OD750 750 ≈ 0.4, growth experiments were performed after one additional day of pre-culturing. For this, 30 mL cultures were incubated in Erlenmeyer shaking flasks for 5 days with a start OD750 ≈ 0.25 in biological triplicates.
Continuous Culture, Online Measurements and Calculations
Continuous culture was performed in a Lambda Photobioreactor and with additional measurement and control devices (Fig. S7) in BG11 medium, supplemented with the required antibiotics, at culture volume V£ = 1 L, aeration with 1 L min−1 of CO2-enriched (0.5 %) air, agitation by the Lambda fish-tail mixing system at 5 Hz, temperature control at 30 %C, and pH 8, with 0.5 M NaOH and 0.5 M H2SO4 as correction solutions. After equilibration to these conditions the reactor was inoculated to a start OD750 ≈ 0.5, from 100 mL pre-culture. White light from the Lambda LUMO module was initially increased as a ramp from 42 to 250 photons, then kept constant, and manually decreased to maintain light intensity approximately at ~ 90 µmol m−2 s−1 per OD750 (Fig. S8F). After the switch to batch culture light was again increased in ramp from 70 to 250 µmol m−2 s−1. For evaporation control and continuous culture, the total weight of the reactor setup was kept constant using the built-in Lambda reactor mass control module and automatic addition of fresh culture medium through the feed pump. In batch mode this controls for evaporation of medium. Continuous culture was performed by setting the waste pump to a fixed speed.
Calibrations of Measured & Controlled Data
The Lambda Photobioreactor (Fig. S7) was equipped with online monitoring of dissolved O2 and pH, and additional monitoring of optical density by a DasGip OD4 module and monitoring of offgas O2 and CO2 concentrations and the weights of feed and pH control bottles by Arduino-based custom-built data loggers (Fig. S7). The signal from the OD4 probe was calibrated to offline OD750 measurements (Fig. S8A-B). For normalizations of glycogen measurements by biomass and for estimation of the biomass density of cells (gDCW/mLcell, Fig. 5D) a LOESS regression of the ODλ signal was calibrated to CDW (Fig. S8C-D). The Lambda LUMO light module consistst of a strip of white LEDs and was calibrated to light intensity in µmol m−2 s−1 with a Licor light meter (LI-250A) with a spherical sensor bulb (LI-193) (Fig. S8E-F).
Calculation of Dilution and Growth Rates
All rates were calculated as slopes of measured data using piecewise linear segmentation [64], implemented in the CRAN R package dpseg (version 0.1.2) [65] (Fig. S9A-D). Growth rates were then calculated as the difference between slopes of measured biomass rate changes (ODλ, CASY cell counts) and the culture dilution rate corrected for evaporation loss, see Figure S9E-F for details. Cell volume growth rate was calculated as the rate of change of the peaks of the CASY cell volume distributions.
Biomass & Metabolite Measurements
Cell Dry Weight Measurement
To determine the cell dry weight (CDW) 5 mL cell culture was filtered through a pre-dried and pre-weighed cellulose acetate membrane (pore size 0.45 µm) using a filtering flask. After that the membrane was dried at 50 ◦C for 24 h and weighed after cooling. 5 mL of filtered and dried growth medium served as a blank.
Optical Density OD750 and Absorption Spectra
The optical density (OD750) and absorbance spectra were measured on a Specord200 Plus (Jena Bioscience) dual path spectrometer, using BG11 as blank and reference. Samples were appropriately diluted with BG11 before measuring. All topAOX time series samples were diluted 1:4 before recording OD750. For absorbance spectra the OD750 was adjusted to 0.5 and the absorbances at 750 nm were subtracted from each spectrum.
Cell Count and Size Distributions
To determine the cell count 10 µL cyanobacteria culture, pre-diluted for OD750 measurement, were dispensed in 10 mL CASYton and measured with a Schaerfe CASY Cell Counter (Modell TTC) using a diameter 45 µm capillary. Cell size was recorded in the diameter range 0 µm–10 µm. Each sample was measured with 400 µL in triplicate runs.
Analysis of the raw data was performed in R. Counted events in the CASY are a mix of live cells, dead cells, cell debris and background signals. Only counts with diameter d > 1.5 µm and d < 5 µm were considered for the time series experiment (Fig. 5B) while a lower cutoff d > 1.25 µm was used for the endpoint measurements (Fig. 2C) to avoid cutting the distribution of the slightly smaller topAKD cells. Since Synechocystis cells are spherical, the cell volumes were calculated from the reported cell diameters d as .
Glycogen Measurement
To determine the glycogen content, 0.5 mL of cell culture was harvested into reaction vessels that had been pre-cooled on ice. After centrifugation at maximum speed for 5 min at 4 ◦C the pellets were flash-frozen in liquid nitrogen and stored at 80 ◦C until further processing. To start the glycogen extraction, the pellet was resuspended in 400 µL KOH (30% w/v) and incubated at 95 ◦C for 2 h. For precipitation, 1200 µL ice cold ethanol was added and the mixture was incubated at 20 ◦C overnight. After centrifugation at 4 ◦C for 10 min at 10 000 g, the pellet was washed once with 70 % ethanol and again with pure ethanol. Afterwards the pellets were dried in a Concentrator Plus speed-vac (Eppendorf) for 20 min at 60 ◦C. Then the pellet was resuspended in 1 mL 100 mM sodium acetate (pH 4.5) supplemented with amyloglucosidase powder (Sigma-Aldrich, 10115) at a final concentration of 35 U/mL. For enzymatic digestion samples were incubated at 60 ◦C for 2 h. For the spectrometric glycogen determination the Sucrose/D-Glucose Assay Kit from Megazyme (K-SUCGL) was applied according to the manufacturer’s specifications, but omitting the fructosidase reaction step and scaling down the total reaction volume to 850 µL. Absorbance at 510 nm was measured using a BMG Clariostar photospectrometer.
ATP and ADP Measurement
2 mL tubes were preloaded with 250 µL of buffer BI (3 M HClO2, 77 mM EDTA). 1 mL culture sample was added, briefly vortexed and incubated on ice for 15 min. 600 µL of BII (1 M KOH, 0.5 M KCl, 0.5 M Tris) were added, vortexed and incubated on ice for 10 min. After centrifugation at 0 ◦C for 10 min at 12 000 g, samples were flash-frozen in liquid nitrogen and stored at 80 ◦C until further processing. Extracts were thawed on ice and centrifuged again at 0 ◦C for 10 min at 12 000 g. 200 µL samples were added either to 320 µL of BIII/PEP (100 mM HEPES, 50 mM MgSO4 × 7H2O, adjusted to pH 7.4 with NaOH, and 1.6 mM phosphoenolpyruvate (Sigma-Aldrich, 860077) for ATP quantification or BIII/PEP+PK (BIII/PEP with 2 U/µL pyruvate kinase (Sigma-Aldrich, P1506) for ATP + ADP quantification, and incubated for 30 min at 37 ◦C. All samples were heat-inactivated at 90 ◦C for 10 min. For luminescence-based quantification, the Invitrogen ATP determination kit was used (ThermoFisher: A22066). 10 µL of each PEP or PEP+PK-treated sample was loaded in a white 96 well plate with solid bottom and kept on ice until the reaction was started. The luciferase master mix was scaled down in volume, and 90 µL of master mix was added to each well. Luminescence was recorded using a BMG Clariostar. ATP concentrations were calculated using a standard curve with commercially available ATP stock solution (Invitrogen).
Flow Cytometry and Analysis
Samples were fixed in 4 % para-formaldehyde in 1xPBS (phosphate buffered saline), washed three times in 1xPBS, and stained with the SYTO9 green fluorescent nucleic acid stain from the LIVE/DEAD kit (thermo) according to manufacturer’s instructions. The flow cytometric measurements were taken at the FACS Facility at the Heinrich-Heine University (Dipl.-Biol. Klaus L. Meyer) using a BD FACSAria III. Forward scatter (FSC) and side-scatter (SSC) were recorded. Syto9 was measured with a 530/30 nm filter, and chlorophyll fluorescence was measured with 695/40 nm filter. For each sample 10,000 events (cells, debris and background) were recorded.
Data was exported in .fcs format, parsed and analyzed using the flowCore R packge [66], and plotted using functions from our in-house segmenTools R package.
DNA and Plasmid Extraction Agarose Gels
DNA Extraction
To isolate the DNA, 1 mL culture was centrifuged at maximum speed for 10 min at 4 ◦C, flash-frozen in liquid nitrogen and stored at 80 ◦C. After thawing, the samples were then resuspended in 1 mL 1x TE buffer by pipetting up and down and 100 µL lysozyme (50 mg/mL stock solution) was added, inverted, and incubated for 1 h at 37 ◦C. Then 10 µL Proteinase K (20 mg/mL stock solution) and 100 µL 20 % SDS were added to the samples and incubated at 37 ◦C for 20 h. The lysed cell suspension was completely transferred to Phasemaker Tubes (ThermoFisher: A33248) and one volume of phenol/chloroform/isoamyl alcohol was added to each tube. The two resulting phases were thoroughly mixed and then separated by centrifugation for 10 min at maximal speed at 4 ◦C and the upper (nucleic acid-containing) phase transferred to a new tube. This was mixed with 100 ng/µL RNAse A and incubated for 15 min at 37 ◦C for RNA degradation. Followed by the addition of 1 volume of chloroform/isoamyl alcohol, the centrifugation step was repeated. The upper phase was again transferred and precipitated with 1 volume 2-propanol at 20 ◦C over night. After centrifugation at maximal speed and 4 ◦C for 10 min to pellet the DNA, the supernatant was discarded. The pellet was washed twice with 500 µL ice-cold 70 % EtOH and centrifuged for 10 min at maximal speed at 4 ◦C. After drying the pellets at room temperature, the pellets were dissolved in 30 µL water and the concentration was determined via Nanodrop (Thermo Scientific NanoDrop 2000c) .
Plasmid Extraction
20 mL of cell culture were mixed with 20 mL of undenatured 99.5 % ethanol, pre-cooled to 80 ◦C, in 50 mL centrifuge tubes and stored at 80 ◦C until processing. After thawing on ice, the supernatant was discarded after centrifugation for 10 min at 4 ◦C and 4000 g. The QIAprep Spin miniprep kit was used for the following steps and adapted and expanded for individual steps. The cell pellet was resuspended in 250 µL Qiagen P1 solution and transferred to 1.5 mL reaction tubes. Then 50 µL lysozyme solution (50 mg mL−1) was added, mixed, and incubated for 1 h at 37 ◦C. After the addition of 55 µL of 20 % SDS and 3 µL of proteinase K (20 mg mL−1), the reaction mixture was incubated at 37 ◦C for 16 h. Starting with the alkaline lysis with the Qiagen P2 solution, all further steps according to the QIAprep Spin Miniprep Kit were carried out with amounts that were adjusted to the initial volume. Next, the concentrations and quality (260/280, 230/280 ratio) was determined using the Nanodrop (Thermo Scientific NanoDrop 2000c). The T5 exonuclease (NEB: M0363) was used for removal of linear and open circular DNA according to the manufacturer’s protocol and incubated at 37 ◦C for 30 min. The isolated plasmid DNA was further purified with the QIAprep modules. For this, 5 volumes of the PB buffer were added to 1 volume of DNA solution. All subsequent steps were carried out using the QIAprep Spin miniprep kit and the plasmid DNA concentration was determined with the Nanodrop.
Gel Electrophoresis and Analysis
Chloroquine Agarose Gel Electrophoresis
Agarose gels with chloroquine diphosphate (CQ, Sigma: C6628-50G, CAS: 50-63-5) were used to determine the relative migration speed of supercoiled topoisomers. A 1.2 % agarose gel in 0.5x TBE (Roth: 3061.2) was prepared by heating to boiling. After cooling (hand-warm) CQ was added to 20 µg mL−1 and the mixture poured into the gel chamber. As running buffer served 0.5x TBE buffer with the same CQ concentration as the gel. For each sample, 120 ng DNA was mixed with loading dye and filled up to 30 µL with water. Next, the voltage source was adjusted 1.8 V cm−1. Gels were run for 20 h–24 h covered in foil to protect from light. Next, the gel was washed two times for 30 min in 250 mL 0.5x TBE buffer to remove the CQ. After that, the gel is stained with 25 µL SYBR Gold (ThermoFisher: S11494) in 225 mL 0.5x TBE buffer for 3 h. Then the CQ agarose gel was imaged by a BioRad Imaging System (ChemiDocTM MP). Washing and staining were also performed light-protected. *** This method and its calibration are introduced in more detail in our preprint manuscript at bioRxiv [35]. ***
Analysis of Gel Electropherograms
Electropherograms of agarose gels of plasmids were extracted in ImageJ for each lane. Electropherograms from capillary gel electophoresis of total RNA samples (Agilent Bioanalyzer) were parsed from exported XML files using the R package bioanalyzeR (v 0.7.3, obtained from https://github.com/jwfoley/bioanalyzeR) [67]. Electropherograms were then processed in R, using LOESS smoothing and peak detection functions from the msProcess R package (version 1.0.7) (https://cran.r-project.org/web/packages/msProcess/). A baseline was determined in two steps using the msSmoothLoess function . The first step used the full signal and served to determine the coarse positions of peaks. The final baseline was then calculated from the signal after removal of peak values. This baseline was subtracted from the total signal to detect peaks (bands) with the msPeakSimple function from msProcess and calculate peak areas. For the total RNA analysis, the baseline signal stems from mRNA and rRNA degradation fragments, and was used to calculate ratios of rRNA peak areas to the “baseline” area.
Transcriptome Analysis: Sampling, RNA Extraction and Sequencing
RNA Extraction and Processing
Cellular activity was stopped by adding 1 mL culture directly to 250 µL 100 % ethanol supplemented with 5 % phenol, flash-frozen in liquid nitrogen and stored at 80 ◦C until further processing. For RNA extraction, a protocol modified from [68] was used. Briefly, frozen samples were centrifuged at maximum speed at 4 ◦C. After discarding the supernatant, the pellet was resuspended in 1 mL PGTX and incubated at 95 ◦C for 5 min. After cooling on ice for 2 min, 700 µL chloroform:isoamyl alcohol (24:1) were added and the mixture was incubated shaking gently at room temperature for 10 min.The mixture was centrifuged for 10 min at maximal speed at 4 ◦C. The upper phase was transferred to a fresh tube and 1 volume chloroform:isoamyl alcohol was added. After repeating the centrifugation step, the upper phase was again transferred and precipitated with 3 volumes of 99.5 % ethanol and 1/2 volume 7.5 M ammonium acetate at 20 ◦C over night. In cases where low RNA concentrations were to be expected, 1 µL RNA-grade glycogen was added to the precipitation mixture. The RNA was pelleted for 30 min at maximum speed and 4 ◦C, washed twice with 70 % ethanol and resuspended in 30 µL RNase-free water. RNA was DNaseI-digested using commercial DNaseI (ThermoFisher: EN0525), according to the manufacturer’s specifications, but using twice the concentration of reaction buffer. DNaseI-digested RNA was phenol/chloroform extracted again to remove the DNaseI. For precipitation after DNaseI-digest, 1/10 volume of 3 M sodium acetate (pH 5.3), was used instead of ammonium acetate.
Quantitative RT-PCR
For qRT-PCR, DNaseI-digested RNA was reverse-transcribed to cDNA using the commercial RevertAid RT (ThermoFisher: K1621) according to the manufacturer’s specifications in a reaction volume of 20 µL. To the 20 µL reaction, 60 µL RNase-free water was added. Of this cDNA dilution, 2 µL per reaction well were loaded, and 8 µL Master Mix prepared from the DyNAmo ColorFlash SYBR Green qPCR-Kit (ThermoFisher: F416L) were added to each 2 µL cDNA reaction. The thermal cycling conditions were as follows: 7 min at 95 ◦C, followed by 40 cycles of 5 s at 95 ◦C and 30 s at 60 ◦C. Data was recorded after each cycle. RT-negative controls and no-template-controls (distilled water) were included for each run. Each sample was loaded in technical triplicates.
Gene expression changes at indicated time points, usually after induction, were then quantified by the ΔΔCt method [69]:
where Ctgoi and Ctref are the Ct values of a gene of interest (gyrA, gyrB, topA) and of a reference gene (rpoA), respectively. The ΔΔCt value then compares expression at time points t after induction with expression at a time point t0 before induction, and is equivalent to a log2 fold change.
RNAseq: Total RNA Analysis, Library Generation and Sequencing
RNA quality was evaluated spec-trometrically by Trinean Xpose (Gentbrugge, Belgium) and by fragment size distribution on an Agilent 2100 Bioanalyzer with the RNA Nano 6000 kit (Agilent Technologies, Böblingen, Germany). Electropherograms for the endpoint RNAseq samples were exported as XML files for further analysis; see paragraph Analysis of Gel Electropherograms. for details.
The Illumina Ribo-Zero Plus rRNA Depletion Kit was used to remove the ribosomal RNA molecules from the isolated total RNA. Removal of rRNA was evaluated with the RNA Pico 6000 kit on the Agilent 2100 Bioanalyzer. RNA was free of detectable rRNA. Preparation of cDNA libraries was performed according to the manufacturer’s instructions for the TruSeq stranded mRNA kit (Illumina, San Diego, CA, United States). Subsequently, each cDNA library was sequenced on an Illumina NextSeq 500 system (2 x 75 nt PE high output v2.5).
RNAseq: Read Mapping
The resulting sequence reads were quality trimmed with Trimmomatic v0.33 [70] using standard setting. The quality trimmed reads were subsequently mapped to coding genes of the Synechocystis sp. PCC 6803 reference genome (NC_000911) including the plasmids pCA2._M, pCB2.4_M, pCC5.2_M, pSYSM, pSYSA, pSYSG and pSYSX (CP003270, CP003271, CP003272, NC_005229, NC_005230, NC_005231, NC_005232), and the constructed plasmid pSNDY_Prha_topA-6_119rhaS_20210310 using Bowtie 2 [71].
For the endpoint measurements from batch cultures the log2-fold changes with respect to the controls (EVC) were calculated with the DESeq2 algorithm [45] via the ReadXplorer software version 2.0 [72], based on three replicate measurements for each strain (“M-value”). For the time series read-count data were normalized by library sizes to the Transcripts Per kilobase Million (TPM) unit. Missing values at indivdual time points were interpreted as 0 TPM.
Transcriptome Analysis: Batch and Time-Series Analysis
Cluster Analysis
For clustering the time series into co-expressed cohorts, a previously established pipeline was used [73, 74]. Briefly, the time-series of TPM values was arcsinh-transformed:
then the Discrete Fourier Transform (DFT) was calculated:
where x′n = {x′0,…x′N-1} are the (transformed) expression values at time points {t0,…, tN−1}, and Xk, the DFT, is a vector of complex numbers representing the decomposition of the original time series into a constant (mean) component (at k = 0) and a series of harmonic oscillations around this mean. The input time series xn were RNA-seq samples 2 to 16 (from 0.5 h to 72 h around the time of induction at 0 h), i.e., without the first pre-induction time-point and ignoring the two long-term response samples. Components k > 1 were further scaled by the mean of amplitudes at all other components k > 1:
Real and imaginary parts of selected components Xk=1,…,6 of the DFT were then clustered with the flowClust algorithm [75] over cluster numbers K = 2 …10. The clustering with the maximal Bayesian Information Criterion, as reported by flowClust (Fig. S12A), was selected for further analyses. Data transformation and clustering were performed by the processTimeseries and the clusterTimeseries2 functions of segmenTier and segmenTools packages, respectively. The resulting clusters were sorted and colored based on the comparison with diurnal co-expression cohorts (Fig. 6 and S17) for informative plots of the subsequent analyses.
Immediate Response Analysis
To estimate the immediate transcriptional response to topA overexpression (Fig. 6C, bottom panel) the difference of read counts (TPM) between the means of the two pre-induction time points (−-1 d, -35 min) and the two post-induction time points (5 min, 20 min) was calculated. Transcripts with negative values were labelled as “down”, with positive values as “up”, and 0 or not available values as “nc”.
Clustering of Transcription Units
Average expression was calculated for transcription units (TU) reported by ref. [50] from the expression of coding genes they encompass (via the “Sense.tags” column of the original data set). The resulting TU time-series was clustered by k-means, using cluster centers from the CDS clustering (Fig. 5) and identical time-series processing. This way protein-coding transcription units could be assigned to the same cluster labels (Fig. S23). Clusters were then sorted by their mean expression peaks, and colored and named according to the diurnal time of their peaks.
Clustering of Diurnal Transcriptome Data
Diurnal expression data from [48] were obtained from GEO (GSE79714) and genes summarized as the mean over all associated probes. These expression values were clustered the same was as described for the RNA-seq data: all time-series were DFT-transformed and amplitude-scaled DFT components (eq. 4–5) were clustered with segmenTier functions processTimeseries and flowclusterTimeseries, using the maximal BIC clustering with K = 5 clusters.
Cluster-Cluster Overlap Tests
Categorical enrichments, e.g., coding gene co-expression cohorts vs. gene annotations, were analyzed by cumulative hypergeometric distribution tests (R’s phyper) using segmenTools’s clusterCluster function and the clusterAnnotation wrapper for GO and and protein complex analysis, which compares overlaps of each pair of two distinct classifications into multiple classes, and stores overlap counts and p-values (enrichment tables) for informative plots (see “Enrichment Profile Plots”).
Enrichment Table Sorting
For intuitively informative plots the enrichment tables were sorted. Table rows were sorted along the other dimension (table columns) such that all categories enriched above a certain threshold psort in the first column cluster are moved to the top, and, within, sorted by increasing p-values. Next, the same sorting is applied to all remaining row clusters for the second column cluster, and so on until the last column cluster. Remaining row clusters are either plotted unsorted below a red line or removed. This is especially useful to visualize enrichment of functional categories along the temporal program of co-expression cohorts, e.g., Figure 6B. This sorting is implemented in segmenTools’ function sortOverlaps.
Cluster t-Test Profiles
To compare clusters (co-expression cohorts) with numerical data, here the log2 fold-changes of transcript abundances in induced gyrkd and topAOX strains, we developed the segmenTools’ function clusterProfile. For each cluster a two-sided t-test was performed (R base function t.test, incl. Welch approximation for different sample sizes), comparing the distribution of values of the cluster with all other values. The reported t statistic and the p-value were stored for each test. The resulting t-test profile was stored for informative plots (see “Enrichment Profile Plots”).
Enrichment Profile Plots
The results of cluster enrichment tests (cluster-cluster overlap tests and t-test profiles) were visualized as colored table plots (e.g. Fig. 6B, C), using segmenTools’ function plotOverlaps. For the categorical overlap tests, the total counts of overlapping pairs are plotted as text, and for t-test profils the rounded t statistic. The text color is black or white based on a p-value cutoff ptxt (as indicated).
The field background colors scale with log2(p) of the reported p-values, where the full color corresponds to a minimal p-value pmin cutoff (as indicated). For categorical enrichment tests the full color is black and other colors are selected from a gradient to white. For numerical tests, the sign of the t statistic is used to determine a color to indicate the direction of change: red for negative values (downregulated) and blue for positive values (upregulated).
Motif Scan and DNA Structure Analysis
The genome was scanned for short sequence motifs using custom-built R code based on the str_locate_all function of the stringr R package [76] into a vector of 0 and 1 for each genome position, where 1 indicates occurence of the motif under consideration. Motif occurence vectors upstream and downstream of start codons or transcription start sites were extracted from the genome vector and aligned into a matrix (columns: positions around the alignment anchor, rows: all genomic sites under consideration). The occurence of a motif in all sequences of a cluster were counted at each position i in 66 bp windows surrounding the position. Cumulative hypergeometric distribution tests (R’s phyper) were performed to analyze statistical enrichment or deprivation of a given motif within the 66 bp window of all genes in a cluster vs. the same window in all genes in the total analyzed set (all clustered and aligned sequences), and the p-values pi at position i stored. In the resulting plots of the mean position-wise motif occurence the size of the plotted data point at position i was scaled by the enrichment or deprivation p-values to emphasize regions of significant difference (Fig. 6D). The maximal size was determined by the minimum p-value in each test series (Figure panels). The point style (closed or open circles) indicates the directionality of the test (enriched or deprived). The significance points are only shown at every third or tenth position to avoid overlaps.
Other Data Sources
Genome sequences and annotation were downloaded from NCBI, see “RNAseq: Read Mapping” for the RefSeq IDs. The gene “categories” annotation was downloaded on 2017-09-23 from CyanoBase [77]: http://genome.annotation.jp/cyanobase/Synechocystis/genes/category. txt. Gene Ontology annotation was downloaded from the UniProt database (2021-03-20, organism:1111708) [78]. Datasets from other publications were all obtained from the supplemental materials of the indicated publications.
Acknowledgments
The parental strain Synechocystis sp. PCC 6803, encoding aTc-inducible dCas9, and the gyrBkd strain were a gift from P. Hudson and L. Yao. We are grateful to Nic Schmelling for critical discussion of the manuscript.
Footnotes
* Results and Discussion: References and more discussion added for hypernegative supercoiling of plasmids, downstream gyrase activity and binding sites (total RNA and rRNA analysis), and night-specific ppGpp regulation (bifurcation of response). The final paragraph on genomic vs. plasmid DNA replication was simplified. * The conclusion was shortened and clarified.
↵1 1*** ref. [35] is a preprint by our group at bioRxiv: https://doi.org/10.1101/2021.07.26.453679 ***
↵2 2*** ref. [35] is a preprint by our group at bioRxiv: https://doi.org/10.1101/2021.07.26.453679 ***
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵