SUMMARY
It is well known that synthetic gene expression is highly sensitive to how comprising genetic elements (promoter structure, spacing regions between promoter and coding sequences, ribosome binding sites, etc.) are spatially configured. An important topic that has received far less attention is how the physical layout of entire genes within a synthetic gene network affects their individual expression levels. In this paper we show, both quantitatively and qualitatively, that compositional context can significantly alter expression levels in synthetic gene networks. We also show that these compositional context effects are pervasive both at the transcriptional and translational level. Further, we demonstrate that key characteristics of gene induction, such as ultra-sensitivity and dynamic range, are heavily dependent on compositional context. We postulate that supercoiling can be used to explain these interference effects and validate this hypothesis through modeling and a series of in vitro supercoiling relaxation experiments. On the whole, these results suggest that compositional context introduces feedback in synthetic gene networks. As an illustrative example, we show that a design strategy incorporating compositional context effects can improve threshold detection and memory properties of the toggle switch.
INTRODUCTION
A fundamental aspect of designing synthetic gene networks is the spatial arrangement and composition of individual genes. With advancements in DNA assembly technology (Engler et al., 2008; Gibson et al., 2009; Lee et al., 2015; Weber et al., 2011), drops in sequencing and prototyping costs (Chappell et al., 2013; Shin & Noireaux, 2012; Sun et al., 2013), and the continual discovery of novel synthetic biological parts (Stanton et al., 2014), synthetic biology is poised to make a quantum leap in the size and complexity of the networks it can build. So why hasn’t it happened yet?
The challenge is that synthetic biological parts can be highly sensitive to context (Cardinale & Arkin, 2012), e.g. the physical composition of elements in synthetic genes, conditions of the host chassis, and environmental parameters. Context effects can often be mitigated by engineering principles such as standardization (Davis et al., 2011; Mutalik et al., 2013a) or high-gain feedback (Del Vecchio et al., 2008; Mishra et al., 2014). Frequently, it is critical to have an understanding of physical mechanisms underlying context effects before they can be resolved (Davis et al., 2011; Mutalik et al., 2013a). The key insight is that context effects in synthetic gene networks can rarely be ignored; the study of context effects leads to principle-based design approaches that mitigate their interference.
Complementary to principle-based design approaches are large-library screening approaches (Kosuri et al., 2013). Kosuri et al. showed that it is possible to rapidly screen combinatorial promoter-ribosome-coding sequence libraries for intended gene expression levels and regulatory function, even if models for individual genetic elements such as promoters and RBSs have limited prediction power (Kosuri et al., 2013). Smanski et al. (2014) screened a large combinatorial library for a sixteen-gene nitrogen fixation cluster, to explore the effect of genetic permutations in ordering, orientation, and operon occupancy. They discovered there were strong differences in nitrogenase activity, depending on the compositional configuration, but no clear architectural trends emerged from monitoring acetylene reduction. Moreover, the number of compositional context variants (more than 𝒪(10^19)) of a sixteen-gene cluster made it impossible to exhaustively search and screen for the optimal variant.
These results underscore the complementary role that library screening and principle-based design approaches have in synthetic biology. Library screening approaches can be an extremely effective way to optimize performance in individual parts. However, the number of possible compositional context variants for larger biological networks quickly mushrooms to scales that are intractable for library-based approaches. If we are to build increasingly larger synthetic biocircuits, including synthetic genomes designed from scratch (Gibson et al., 2010), we need a deeper physical understanding of how compositional context affects gene expression.
Most recently, Chong and coworkers showed that transient accumulation of localized positive supercoiling can lead to reduction in gene expression — they showed through in vitro transcription experiments that supercoiling could be a physical mechanism behind transcriptional bursting (Chong et al., 2014). Their results also suggested that the presence of nearby topological barriers such as DNA-bound proteins or transcriptional activity of neighboring genes can affect local gene expression.
To paraphrase John Donne, the broad implication of these studies is that “no [gene] is an island entire of itself”. Indeed, Rhee et al. (1999) and Shearwin et al. (2005) show that genes with overlapping transcripts are subject to transcriptional interference. However, even in non-overlapping genes, the statistical analysis of Korbel et al. (2004) suggests there is a strong link between spatial arrangement and co-regulated gene pairs. Is the same true of synthetic gene networks? Has the field of synthetic biology ignored a fundamental mode of transcriptional regulation in natural gene networks that could be exploited in synthetic gene networks? How does compositional context, i.e. the spatial arrangement of genes, affect gene expression in synthetic gene networks?
RESULTS
Compositional context significantly affects transcription of synthetic genes
To explore compositional context, we constructed a set of plasmids, varying gene orientation, relative orientation, coding sequence identity, and the length of spacing between genes. There are three relative orientations that two genes can assume: 1) convergent orientation, where transcription of both genes proceeds in opposite directions and towards each other, 2) divergent orientation, where transcription of both genes proceeds in opposite directions, away from each other and towards genetic elements on the plasmid backbone, and 3) tandem orientation, where transcription of both genes proceeds in the same direction (Liu & Wang, 1987; Shearwin et al., 2005). We constructed plasmids of each orientation to examine their effect on gene expression in vivo and in vitro.
Each plasmid incorporated two reporter genes, assembled and inserted in the same locus of a consistent vector backbone. Each gene consisted of an inducible promoter, the Lac or Tet promoter, and encoded expression of a fluorescent reporter. Each plasmid (containing an orientation variant) was transformed into MG1655Z1 E. coli, which expresses LacI and TetR constitutively from the genome. Since LacI and TetR do not cross-regulate each other, we were able to conclude that any expression differences across strains were solely due to context effects (Ceroni et al., 2015).
We first used mSpinach RNA aptamer and MG RNA aptamer as reporters downstream of the Lac and Tet promoter, respectively. Since mSpinach RNA aptamer is not cytotoxic, it can be used in live-cell imaging to explore how induction response of the Lac promoter varies with compositional context. In our experiments, we first trapped single cells in a microfluidic chamber (CellASIC ONIX B04A system). Next, we flowed 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI), the dye that fluoresces when bound to mSpinach, in culture media for 30 minutes to ascertain background levels of fluorescence from DFHBI. Then we flowed 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) to relieve LacI repression, thus activating expression of mSpinach RNA aptamer. We observed that the induction response of the Lac promoter varied significantly depending on its relative gene orientation, even though the neighboring gene was never activated by aTc (Figure 2).
Convergent oriented mSpinach expression produced a ramp-like response to IPTG induction, rising gradually over the course of three hours to reach a steady-state level of expression coinciding with saturation in the microfluidic chamber (see Figure 2). Convergent oriented mSpinach also gave a strong bimodal response to IPTG induction, with one group of cells with high expression (green traces in Figure 2B) and another with low expression (black traces in Figure 2B).
In contrast, divergent oriented mSpinach had a very weak induction response. Tandem oriented mSpinach had bimodal expression as well, with its brightest population of cells expressing at steady-state levels comparable to the weak population in convergent orientation. The remainder of tandem oriented mSpinach E. coli cells showed very weak levels of fluorescence.
Interestingly, cells with tandem oriented mSpinach exhibited pulsatile expression, in contrast to the ramp-like response shown by convergent oriented mSpinach. A few outlier cell traces achieved levels of mSpinach expression comparable to the bright convergent mSpinach population, but only at the peak of their transient pulses. Overall, tandem oriented mSpinach exhibited bursty and weaker gene expression than convergent oriented mSpinach.
Since many intracellular parameters fluctuate stochastically in vivo (Elowitz et al., 2002), we ran control experiments of each plasmid in a cell-free E. coli derived expression system (Noireaux et al., 2003). In this system the effects of single-cell variability are eliminated, e.g. variations in LacI and TetR repressor concentration, polymerase, ribosome, tRNA pools. Also, all deoxynucleotide triphosphates are removed during preparation of cell-extract, thus eliminating any confounding effects of plasmid replication. We prepared separate cell-free reactions for each orientation, assaying mSpinach and MG aptamer expression in a plate reader, using equimolar concentrations for each reaction (Figure 2D-E). Because all cell-free reactions were derived from a single batch of well-mixed extract, the variability in LacI repressor concentration was minimal.
Again, we observed that mSpinach was brightest in the convergent orientation, weakest in the divergent orientation and achieved intermediate expression in the tandem orientation, confirming the trends observed in vivo. Likewise, MG aptamer expressed strongest in the convergent orientation, weaker in the tandem orientation, and weakest in the divergent orientation. These in vitro outcomes were all consistent with the data from in vivo single cell experiments. Since the only connection between our in vitro tests and in vivo strains is the plasmids themselves, this conclusively confirms that compositional context is the reason for differences in gene expression. All in all, we conclude that compositional context can significantly alter the transcriptional response of a gene to induction.
Compositional context effects are pervasive in translational reporters
To explore if these compositional context effects were seen in translated reporters, we first replaced the coding sequence of MG aptamer with the coding sequence for red fluorescent protein (BBa_E1010 (Zhang et al., 2002)). We also interchanged the spacer between mSpinach and RFP, to see if our results were dependent on the sequence of the spacer. We then ran an identical experiment, as in Figure 2, to see how induction of mSpinach affected and correlated with RFP expression in single cells.
As expected, relative gene orientation had the same effect on mSpinach transcription as in Figure 2. Even with RFP in place of MG aptamer, mSpinach expression was highest in the convergent orientation and weakest in the divergent orientation.
We also observed that both convergent and tandem oriented mSpinach expressed with a bimodal phenotype (see Supplemental Figure S2B-C). These results confirmed that the identity of the neighboring gene and the spacer sequence content were not the source of these gene expression differences.
Interestingly, RFP expression was extremely leaky in the divergent orientation, suggesting that leaky RFP expression somehow broadly interfered with mSpinach expression across all cells. In contrast, convergent oriented mSpinach and RFP showed strong XOR logic: any cells that expressed small amounts of RFP did not respond to IPTG induction with mSpinach expression, while cells that did not express any RFP showed strong mSpinach expression. These data suggest compositional context can be exploited to shape co-expression of neighboring genes.
To further show that compositional context effects extend to translated reporters, we replaced mSpinach with cyan fluorescent protein (CFP) (Veening et al., 2004). We deliberately used a weak RBS, BCD16 from Mutalik et al. (2013a), for CFP and a strong RBS, derived from pET29-b and taken from the BglBrick pBbE5K plasmid (Lee et al., 2011), for RFP to ensure that any ribosome competition effects would be unidirectional (RFP loading on CFP and not vice versa) (Gyorgy et al., 2015). Thus, if both genes are induced, any differences in RFP expression would reflect compositional context effects and not competition for translational resources.
First, we ran a single induction experiment, analogous to experiments run for Figure 2 and Supplemental Figure S2. Induction of CFP with IPTG showed that mean CFP expression was again strongest in the convergent orientation, (slightly) weaker in the tandem orientation, and weakest in the divergent orientation (Figure 3, Supplemental Figures S3C-D, IPTG-only condition).
As a control, we cloned and induced CFP as a single gene on the exact same plasmid locus (either in sense or antisense orientation relative to the plasmid vector). In both control plasmids, 100 bp flanking upstream and downstream sequences were preserved as in the experimental plasmids, to eliminate any promoter sensitivity to upstream sequence perturbation. We noticed a dramatic 5-fold increase in signal over the weakest expressing orientation (compare Supplemental Figures S3B,F-G and S3C). In contrast, comparing sense and antisense expression of CFP showed only a small difference in expression (at most 10%). This confirmed that the observed compositional context effects could not be attributed to genetic elements within the plasmid backbone.
We also tested the effect of changing the plasmid vector (from ColE1 to p15A). Plasmid vectors were chosen to feature entirely different compositions of replication origin and resistance marker (Supplemental Figure S4B-C). While the quantitative differences in expression changed by varying plasmid vector (most likely reflecting a change in the copy number of the plasmid), the trends were qualitatively identical. This confirmed that plasmid backbone composition was not the primary source of the observed context effects.
Once again, to control for single-cell variability in vivo, we tested RFP and CFP expression of each context variant in a cell-free expression system (Shin & Noireaux, 2012). CFP and RFP expressed strongest in convergent orientation, weaker in tandem orientation, and weakest in divergent orientation (Figure 5B). These results were consistent with results of our prior in vitro tests with mSpinach-MG aptamer plasmids.
Taken as a whole, these findings lead us to conclude that the increase in convergent and tandem CFP expression over divergent oriented CFP was unrelated to resource loading effects, plasmid copy number variability or processes related to plasmid replication. These trends were also consistently observed across multiple coding sequences and transcript lengths, and for both transcriptional and translational reporters. Therefore, we conclude that compositional context is the primary source of the observed differences in gene expression.
Induction response of genes is affected significantly by compositional context
To see how compositional context altered the induction response over a range of inducer concentrations, we titrated both IPTG and aTc and quantified RFP and CFP expression in bulk culture plate reader experiments (Figure 4 and Supplemental Figure S3E). As predicted by our choice of RBSs (using a strong RBS for RFP and a weak RBS for CFP), increases in RFP expression (corresponding to increasing aTc concentrations) consistently resulted in decreased CFP expression independent of orientation. As expected, increasing CFP expression did not decrease RFP expression. What was most notable was how gene orientation affected the induction response of RFP expression to varying amounts of aTc inducer.
In the convergent orientation, we saw that the transfer curve of RFP expression exhibited strong ultra-sensitivity, increasing by 120-fold in response to only an 8-fold change in aTc. At 100-200 ng/mL of aTc, RFP expression plateaued in an on-state of expression and below 25 ng/mL, RFP expression plateaued in an off-state of expression. Thus, diluting aTc with an 8x dilution factor had the effect of completely switching RFP from an on to an off state.
In contrast, the tandem orientation required a 64-fold change in aTc concentration to achieve a comparable (100x) fold-change in RFP expression. At 100-200 ng/mL, we also saw RFP expression plateaued in an on-state of expression (for all concentrations of IPTG tested). However, RFP reached an off-state of expression only when aTc was diluted down to 3 ng/mL or lower. Thus, achieving the same dynamic range as convergent RFP required an 8x increase in dilution factor.
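The dilution arithmetic behind this comparison can be checked with a short sketch. It assumes a two-fold serial dilution starting from 200 ng/mL aTc; the text reports only the endpoints, so the dilution scheme and the helper name `dilution_series` are illustrative assumptions rather than the protocol used.

```python
def dilution_series(start_ng_per_ml, steps, factor=2.0):
    """Return concentrations after 0..steps successive dilutions.

    The two-fold scheme is an assumption made for illustration only.
    """
    return [start_ng_per_ml / factor**k for k in range(steps + 1)]


series = dilution_series(200.0, 6)

# Approximate off-state thresholds quoted in the text.
convergent_off = series[3]  # 8x dilution of 200 ng/mL
tandem_off = series[6]      # 64x dilution of 200 ng/mL

# How much further the tandem construct must be diluted to switch off.
extra_dilution = (series[0] / tandem_off) / (series[0] / convergent_off)

print(convergent_off, tandem_off, extra_dilution)
```

Under this assumption the 8x and 64x dilutions land at 25 ng/mL and about 3 ng/mL, matching the thresholds quoted for the convergent and tandem orientations, and their ratio gives the 8x increase in dilution factor.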
Divergent oriented RFP exhibited the smallest dynamic range. Varying aTc concentration 200-fold produced at most a 2.7-fold change in RFP expression, despite a 128x dilution factor. Even at low concentrations of aTc, RFP expressed at much higher levels than background. To investigate this effect, we quantified RFP expression in the absence of aTc and discovered it was generally leaky (see Supplemental Figure S3B). We observed similar leaky expression at the single cell level, both in the divergent oriented RFP and CFP MG1655Z1 strain (Figure 3) and divergent oriented RFP and mSpinach strain (Supplemental Figure S3). Since both strains used different spacing sequences of lengths ranging from 150-350 bp, we concluded these leaky effects were a function of RFP gene orientation and not of spacer identity or proximity to the Lac promoter.
We also fit the induction response of each fluorescent protein (titrating the appropriate inducer) while maximally inducing the other gene (Figure 4B-C). Our fits characterized induction response in terms of four parameters: leaky expression l, effective cooperativity n, maximum expression Vmax, and half-max induction Km. We noticed that convergent oriented RFP showed a significantly increased cooperativity coefficient, nearly four-fold more than tandem orientation and eight-fold more than divergent orientation. Also, convergent orientation consistently fitted with the highest Km value in both RFP and CFP induction curves, suggesting that orienting genes convergently raises the induction threshold.
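To make the role of the four fit parameters concrete, here is a minimal sketch of a leaky Hill function. The functional form `leak + vmax * x^n / (km^n + x^n)` is an assumption, since the text names the parameters but does not give the fitted equation, and the numerical values used are hypothetical. The helper `fold_change_10_to_90` uses the standard Hill-function identity that the inducer ratio spanning 10% to 90% of the induced response is 81^(1/n).

```python
def hill(x, leak, n, vmax, km):
    """Assumed four-parameter induction curve (illustrative form only).

    leak: expression with no inducer, n: effective cooperativity,
    vmax: maximum induced expression above leak, km: half-max inducer level.
    """
    return leak + vmax * x**n / (km**n + x**n)


def fold_change_10_to_90(n):
    """Inducer fold change taking the induced part from 10% to 90% of vmax."""
    return 81.0 ** (1.0 / n)


# Hypothetical cooperativities, chosen only to show the trend.
for n in (1.0, 2.0, 4.0):
    print(n, fold_change_10_to_90(n))
```

With this form, a higher cooperativity n compresses the inducer range needed to switch between off and on states (roughly 81-, 9- and 3-fold for n = 1, 2 and 4), which is the sense in which a larger fitted n corresponds to a more ultra-sensitive response.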
Our experimental data conclusively show that gene expression, induction, and repression can be affected by compositional context. Overall, compositional context can dramatically alter canonical properties of synthetic gene expression and thus should not be overlooked when designing synthetic gene networks.
A dynamic model incorporating supercoiling states recapitulates observed compositional context effects
Building on the work of Chong et al. (2014) and Liu & Wang (1987), we investigated whether supercoiling can explain the compositional context effects seen in our data. We constructed an ordinary differential equation (ODE) model describing transcription and translation of both genes. To describe the interplay between gene expression and accumulation of supercoiling for each gene, we introduced separate states to keep track of promoter supercoiling and coding sequence supercoiling. This model structure allowed us to study how supercoiling affected both the processes of transcription initiation and elongation (Drolet, 2006).
The kinetic rates of transcriptional initiation and transcriptional elongation have been found to be affected significantly by supercoiling buildup (Drolet, 2006). Negatively supercoiled DNA tends to facilitate transcription initiation in the promoter, while transcriptional elongation benefits from negative supercoiling up to a certain point (since negative supercoiling also facilitates melting of the DNA helix). Too much negative supercoiling can lead to the formation of R-loops, structural complexes that involve DNA binding to nascently produced RNA still attached to RNA polymerase. These R-loop complexes have been shown to cause transcriptional stalling (Drolet, 2006).
Conversely, positive supercoiling of DNA introduces torsional stress since positive supercoils overwind the right-handed DNA double helix. Such stress leads to localized regions of tightly wound DNA that is less likely to be transcribed; positive supercoils downstream of a transcription bubble can also impose torsional resistance against further unwinding of the DNA, thereby stalling transcription. When a gene expresses and produces positive supercoiling downstream of the transcription bubble, the accumulation of positive supercoiling can be exacerbated by the presence of a topological barrier, e.g. the binding of a protein such as a transcription factor, or even the presence of another active gene in a negatively supercoiled state (Chong et al., 2014). This buildup in positive supercoiling leads to the reduction in the initial rate of gene transcription, or what is often referred to as a bursty profile of gene expression. Thus, excessive twist in the DNA double helix in either direction can decrease transcription rates.
In our model we account for the above considerations by encoding a dependency of transcription rate parameters on local supercoiling density. We build on the analysis of Meyer and Beslon (Meyer et al., 2014) and consider transcription initiation rates to be dynamically dependent on supercoiling density. We model them as Hill functions of the absolute deviation of the promoter supercoiling state from a natural supercoiling state (Rhee et al., 1999). Similarly, we suppose that the elongation rate of the gene of interest (mSpinach aptamer, MG aptamer, CFP, or RFP) can be modeled as a Hill function of the supercoiling state over the transcript region. In other words, as DNA becomes too twisted in either the positive or negative direction, transcription initiation rates and elongation rates decrease. Thus, by modeling the dependency of transcription rates on supercoiling, we effectively introduce context-specific coupling between neighboring genes (Figure 5).
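One way to picture a rate that depends on the absolute deviation from a natural supercoiling state is the sketch below. It is not the model's actual rate law: the repressive Hill form, the function name `supercoil_dependent_rate`, and every parameter value (including the natural supercoiling density of -0.06) are hypothetical choices made for illustration.

```python
SIGMA_NATURAL = -0.06  # assumed natural supercoiling density (illustrative)


def supercoil_dependent_rate(sigma, k_max, k_half, n,
                             sigma_natural=SIGMA_NATURAL):
    """Rate that peaks at sigma_natural and decays with |sigma - sigma_natural|.

    Uses a repressive Hill function of the absolute deviation, so twisting
    in either direction lowers the rate. All parameters are hypothetical.
    """
    deviation = abs(sigma - sigma_natural)
    return k_max * k_half**n / (k_half**n + deviation**n)


# Hypothetical parameter values, for illustration only.
for sigma in (-0.16, -0.11, -0.06, -0.01, 0.04):
    print(sigma, supercoil_dependent_rate(sigma, k_max=1.0, k_half=0.05, n=2.0))
```

Because the rate depends only on the magnitude of the deviation, over-twisting and under-twisting by the same amount reduce it equally; an asymmetric response would need separate parameters for each direction.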
After incorporating these supercoiling hypotheses, our model was able to recapitulate compositional context trends observed in our experimental data (Figure 5B). Our simulations showed that convergent oriented mSpinach (and CFP) is able to achieve higher levels of expression than its divergent and tandem counterparts. Since mSpinach expresses in the anti-sense direction, its expression introduces negative supercoils upstream according to the twin-domain model by Liu & Wang (1987). In moderate amounts, negative supercoiling facilitates the unwinding of DNA and thus enhances the amount of transcriptional initiation and elongation occurring over the Lac promoter and mSpinach transcript. In the divergent and tandem orientation, mSpinach is expressed in the sense direction, which results in positive supercoiling buildup downstream of the promoter (Figure 5A). The outcome is that divergent and tandem mSpinach expression is reduced (compared to convergent), since the buildup of positive supercoiling inhibits initiation and elongation. This effect is more severe in the divergent orientation, since excessive positive and negative supercoiling generated by initiation of the Tet and Lac promoter can interfere with each other’s expression (Figure 5C).
In exploring the parameter space of our model, we also found that the activities of gyrase (an enzyme that relaxes positive supercoiling) and TopoI (an enzyme that relaxes negative supercoiling) are not sufficiently high to counteract the coils introduced by rapid repeated transcription events on DNA with two genes. These findings were consistent with the analysis of Chong et al. (2014), Liu & Wang (1987), and Meyer et al. (2014), which argued that buildup of transcription-induced supercoiling far outpaces the activity of supercoiling maintenance enzymes in E. coli. This explains why we are able to see compositional context effects both in vivo and in vitro where gyrase and topoisomerase enzymes are presumably present and active. These results also suggested that extended pre-incubation of plasmids with gyrase would allow us to infer the effect of relaxing positive supercoiling on gene expression in each orientation.
Relaxing positive supercoiling in plasmids significantly reduces compositional context effects
To test the effect of incubating context-variant plasmids with gyrase, we purified plasmids expressing convergent, divergent, and tandem oriented RFP and CFP from uninduced MG1655Z1 E. coli. We divided each plasmid sample into two aliquots: one aliquot was used as a control for the absence of gyrase treatment and the second aliquot was incubated with gyrase (NEB) at 37 °C overnight. All gyrase treated samples were prepared and concentrated to equimolar concentrations, so that the enzyme-substrate ratio was identical across all samples. After gyrase incubation, plasmids were purified again to wash out the gyrase buffer using Amicon 0.5 mL Ultracentrifugal Filters. Once again, we tested the expression levels of each plasmid in the cell-free TX-TL system (Shin & Noireaux, 2012).
In the absence of gyrase, we see that the expression differences between different orientations are largest when comparing convergent and divergent CFP and RFP expression levels. The convergent oriented plasmid expressed almost 300 nM more CFP and 500 nM more RFP than its divergent counterpart (see Figure 6). Similarly, the tandem oriented plasmid expressed 200 nM more CFP and 300 nM more RFP than the divergent orientation. The convergent plasmid also expressed higher than the tandem plasmid in both RFP and CFP channels.
After gyrase treatment, tandem oriented CFP and RFP expressed brighter than their convergent and divergent counterparts. We also saw that the disparity in protein expression between convergent and divergent orientation shrank to 100 nM for CFP (a 66% reduction) and 180 nM for RFP (a 64% reduction). Since gyrase serves only to relax plasmids of positive supercoiling, this confirmed that supercoiling is the mechanism underlying compositional context effects.
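The quoted percentages follow from the convergent-divergent gaps reported before and after gyrase treatment. The sketch below reproduces that arithmetic, treating the untreated CFP gap of "almost 300 nM" as exactly 300 nM, which is an approximation made here for illustration.

```python
def percent_reduction(before_nm, after_nm):
    """Percentage by which a gap shrinks from before_nm to after_nm."""
    return 100.0 * (before_nm - after_nm) / before_nm


# Convergent-minus-divergent expression gaps in nM, as quoted in the text
# (the 300 nM CFP value is rounded from "almost 300 nM").
cfp_reduction = percent_reduction(300.0, 100.0)
rfp_reduction = percent_reduction(500.0, 180.0)

print(round(cfp_reduction, 1), round(rfp_reduction, 1))
```

The CFP gap shrinks by about two thirds and the RFP gap by 64%, in line with the figures given in the text.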
We anticipated that treatment with gyrase would release positively supercoiled domains in the downstream region of tandem oriented CFP and RFP and release positive supercoiling buildup from divergently (leaky) expressed RFP (Figure 5A) and thereby reduce torsional stress in the promoters of divergently oriented CFP and RFP. Our experimental results confirmed these hypotheses, with divergent oriented CFP and RFP increasing by more than 2-fold and tandem oriented CFP and RFP increasing by 1.4-fold.
Interestingly, gyrase treatment of convergent oriented CFP and RFP appeared to reduce signal slightly, by approximately 10%. This may be because convergent oriented CFP and RFP exhibited little or no leak when uninduced (in contrast to divergent and tandem orientation); thus the purified plasmid for convergent orientation did not have as much positive supercoiling for gyrase to mitigate. Treatments with gyrase may actually have introduced too much negative supercoiling, leading to the small drop in expression observed.
These experimental outcomes are consistent with our model of supercoiling and its impact on compositional context. Gyrase relaxes positively supercoiled domains downstream of convergent and tandem oriented RFP, while in the divergent orientation, gyrase relaxes any positive supercoiling buildup near the promoter region. Once these positive supercoils are removed, the genes are able to express at much higher levels than prior to treatment.
Our data show that compositional context can have a strong effect on the dynamics of supercoiling within plasmids. Nearby transcriptionally active genes or protein-bound genes act as topological barriers to stop migration of supercoils or dispersion of localized torsional stress. Protein-bound genes, in particular, act to trap supercoiling in neighboring transcriptionally active genes; this may explain why in our IPTG induction experiments, the mere presence of a repressed RFP and MG gene (respectively) could have such a significant effect on CFP and mSpinach expression. In this way, gene orientation and placement can introduce a fundamentally different form of feedback coupling between neighboring genes. When used appropriately, these feedback effects can be beneficial or consistent with the intended architecture of the biocircuit, as we illustrate in the next section.
Compositional context improves memory and threshold detection in toggle switch
Synthetic gene networks, for the most part, have been designed primarily to avoid one type of compositional context effect: terminator leakage. Terminator leakage can cause positive correlation between a downstream gene and an upstream gene. While this is a noteworthy consideration in designing synthetic gene networks, it is a dramatic oversimplification of what causes compositional context effects.
In contrast, when we incorporate an understanding of how compositional context and supercoiling cause interference in synthetic gene networks, we can actually utilize compositional context to improve or reinforce the feedback architecture of synthetic gene networks.
The original toggle switch provides an excellent case study of how an informed understanding of compositional context can improve design. Being one of the first synthetic bio-circuits ever made, it was constructed in divergent orientation to avoid terminator leakage effects between two mutually repressing genes, LacI and TetR (Gardner et al., 2000; Kobayashi et al., 2004). From the perspective of protein regulation, two proteins provide mutual negative feedback by transcriptional repression. The LacI protein represses TetR production by binding to the Lac promoter upstream of the TetR coding sequence. The TetR protein represses LacI production by binding to the Tet promoter upstream of the LacI coding sequence.
However, we can also build the toggle switch in convergent or tandem orientation. The convergent toggle switch is most appealing, based on two experimental insights: 1) the competing dynamics of positive and negative supercoils between the two genes encode an additional layer of mutual negative feedback (Figure 7A), and 2) the coexpression profiles of RFP and CFP in the convergent orientation (Supplemental Figure S3E) and of mSpinach and RFP in the convergent orientation (Supplemental Figure S2) were strongly anti-correlated. Both of these properties of compositional context have the potential to enhance or strengthen the existing mutual negative feedback in the toggle switch.
Since our previous controls of sense and anti-sense encoded single genes showed that changing orientation of a single gene on a backbone did not affect expression more than 15%, we constructed a two-plasmid version of the toggle switch, with LacI and TetR expressed on separate plasmids. This “context-free” version of the toggle acted as a reference for how a toggle switch should function independent of genetic context.
In both versions of the toggle switch, each gene cassette in the toggle switch was bicistronic, with LacI reported by translation of RFP and TetR reported by GFP. We used stronger ribosome binding sites from the BCD RBS library (Mutalik et al., 2013a), BCD10 and BCD2, to express LacI and TetR and weak BCDs, BCD1 and BCD9, to express the downstream reporters. This was done to minimize any ribosomal loading effects from reporter translation and again, to show that even a toggle switch built de novo from existing synthetic biological parts with different CDSs, promoters, and RBSs could utilize compositional context (Figure 7). We also tested the original Gardner-Collins toggle switch, comparing performance in the original orientation to a convergent variant, see Supplemental Figure S4 and discussion in the Supplemental Information.
We first tested the ability of the toggle switch to act as a threshold detector. In theory, the phase portrait of a toggle switch consists of two locally asymptotically stable equilibrium points and a separatrix that divides the state space into their two basins of attraction, so that state trajectories on either side are driven to one of the equilibrium points (Gardner et al., 2000). As a proxy for varying the amount of actively repressing TetR and LacI, we simultaneously varied the concentration of inducers aTc and IPTG, thereby allowing us to attenuate the activity of LacI and TetR repression independently. Most notably, when the toggle switch was configured in the convergent orientation, it exhibited much sharper XOR logic and separation between high GFP-low RFP states and high RFP-low GFP states compared to its two-plasmid (and divergent) counterpart (Figure 7 and Supplemental Figure S4A-B).
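The bistability described above can be illustrated with a minimal sketch of the dimensionless two-repressor model of Gardner et al. (2000). The parameter values (alpha = 10, Hill coefficient n = 2), the forward-Euler integrator, and the function names are illustrative assumptions; they are not fitted to the circuits characterized in this work.

```python
def toggle_rhs(u, v, alpha=10.0, n=2.0):
    """Dimensionless mutual-repression model (after Gardner et al., 2000).

    u and v stand for the two repressor levels; alpha and n are
    assumed, illustrative values rather than measured parameters.
    """
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return du, dv


def simulate(u0, v0, t_end=50.0, dt=0.01):
    """Integrate the model with forward Euler and return the final state."""
    u, v = u0, v0
    for _ in range(round(t_end / dt)):
        du, dv = toggle_rhs(u, v)
        u += dt * du
        v += dt * dv
    return u, v


if __name__ == "__main__":
    # Two initial conditions on opposite sides of the separatrix (u = v
    # for this symmetric parameter choice) settle into opposite states.
    for u0, v0 in [(2.0, 1.0), (1.0, 2.0)]:
        u, v = simulate(u0, v0)
        print(f"start=({u0}, {v0}) -> final u={u:.3f}, v={v:.3f}")
```

With symmetric parameters the separatrix is the diagonal u = v, so whichever repressor starts higher ends up dominant. This is the winner-takes-all behavior that the inducer titration is designed to probe.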
The two-plasmid toggle exhibits weaker thresholding, i.e. the ability to latch onto a particular state, in two specific parameter regimes — when IPTG and aTc are both present in high concentrations and when IPTG and aTc are both present in low concentrations. When both inducers are present in high concentrations, the majority of Lac and Tet promoters are un-repressed because most repressor proteins are sequestered by inducers, leading to weak feedback. The weak feedback makes it difficult to differentiate which inducer is higher, since essentially all promoters are expressing constitutively (Figure 7F-H). When both inducers are present in low concentrations, both promoters are strongly repressed making it difficult for one promoter to gain a dominant foothold over the other sufficient to produce fluorescent signal. Thus, in the low inducer concentration regime, even if one inducer is higher in concentration than the other, neither gene is strong enough to repress the other to the point of producing detectable fluorescence (Figure 7F-H).
On the other hand, the convergent toggle shows clear separation between high GFP-low RFP states and low GFP-high RFP states in both of these parameter regimes. This improved performance can be explained by examining the effects of supercoiling and compositional context (Figure 7A-B). Suppose, for illustration, that LacI-RFP is slightly more induced than TetR-GFP. The positive supercoiling from TetR-GFP expression propagates downstream to meet the negative supercoils generated from transcription of LacI-RFP. If LacI-RFP has propagated negative supercoils, these positive and negative supercoils create a dynamic equilibrium between opposing torsional forces. As more and more LacI-RFP is expressed, it forces the positive supercoils back into the TetR-GFP coding sequence. When this happens, TetR-GFP can no longer be expressed, and its transcript region becomes available to LacI-RFP as a downstream region for dissipating its internal torsional stress. Thus, by propagating supercoils into its neighboring gene, LacI-RFP exerts a form of negative feedback independent of transcription factor-mediated repression.
This explains why the convergent toggle is able to function in regimes where IPTG and aTc are simultaneously high or low. When IPTG and aTc are both present in high concentrations, the attenuation of transcription factor repression is compensated by the presence of supercoiling-mediated repression. Thus, even though LacI and TetR are not as effective in repressing their respective promoters, the extra layer of feedback allows the convergent toggle to decide on a dominant state (LacI-RFP). Similarly, in the low parameter regime, even though both repressors are strong, the additional feedback from supercoiling favors one gene or the other (an enhancement of the winner-takes-all or XOR logic) and evidently improves the ability of the toggle switch to again allow LacI-RFP to dominate over TetR-GFP.
It is worth noting that the supercoiling feedback occurs at a faster time scale (since there is no translation, protein folding, or multimerization involved), supporting the usual mode of repression enforced by the LacI transcription factor. The more LacI-RFP is expressed, the more torsional stress it exerts on the opposing gene, which strengthens its foothold as the dominant state. Thus, there is a multi-layer feedback effect introduced by supercoiling in the convergent orientation, consistent with the intended feedback architecture of the toggle switch. In this way, we see that compositional context can be a powerful tool for encoding feedback in synthetic gene networks.
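The argument above, that an extra layer of mutual repression can restore a decision when transcription factor repression is weakened, can be illustrated with a toy calculation. The sketch below is not the supercoiling model used in this work. It assumes a dimensionless toggle in which a hypothetical factor s < 1 scales repressor effectiveness (mimicking high inducer), and it lumps supercoiling-mediated repression into a single Hill-type term with a hypothetical constant k_sc, using the neighboring gene's product level as a crude proxy for its transcriptional activity.

```python
def rhs(u, v, s, k_sc=None, alpha=10.0):
    """Toy toggle with an optional second, inducer-independent repression term.

    s    : assumed scaling of repressor effectiveness (s < 1 ~ high inducer)
    k_sc : hypothetical half-saturation constant of the extra feedback term;
           None disables that term (plain, weakened toggle)
    """
    def production(neighbor):
        rate = alpha / (1.0 + (s * neighbor) ** 2)   # transcription factor repression
        if k_sc is not None:
            rate /= 1.0 + (neighbor / k_sc) ** 2     # extra layer of mutual repression
        return rate

    return production(v) - u, production(u) - v


def simulate(u0, v0, s, k_sc=None, t_end=50.0, dt=0.01):
    """Forward-Euler integration; returns the final state (u, v)."""
    u, v = u0, v0
    for _ in range(round(t_end / dt)):
        du, dv = rhs(u, v, s, k_sc)
        u += dt * du
        v += dt * dv
    return u, v


if __name__ == "__main__":
    start = (2.0, 1.0)  # u begins slightly ahead of v
    plain = simulate(*start, s=0.1)
    layered = simulate(*start, s=0.1, k_sc=2.0)
    print("weakened toggle, no extra feedback:  u=%.3f v=%.3f" % plain)
    print("weakened toggle, with extra feedback: u=%.3f v=%.3f" % layered)
```

Under these assumed parameters, the weakened toggle without the extra term relaxes to a symmetric state in which neither gene dominates, whereas the version with the extra term latches onto the state that started ahead. The specific values carry no quantitative meaning; only the qualitative contrast is intended.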
DISCUSSION
The Link Between Compositional Context Effects and Growth Phase: Temporal Aspects of Compositional Context
Our experimental data, as well as the outcomes of several gyrase treatment experiments, support a model of how supercoiling dynamics affect transcription. Depending on its compositional context, the supercoiling state of a gene can be affected by the propagation of supercoiling from nearby coding regions. In this way, supercoiling couples the activity of two neighboring genes. The strength of that coupling and its impact on the temporal dynamics of gene expression depend on the orientation of the genes and what part of the gene is exposed to torsional stress from the neighboring gene. On the whole, these features of our model are able to recapitulate the in vitro and in vivo trends observed at steady-state, but do not account for aspects of how gyrase and topoisomerase levels are regulated during different growth states.
An interesting facet of these context effects is the temporal dynamics of supercoiled genes, topoisomerase concentrations, and their dependence on cell culture growth phase. Specifically, as E. coli cells transition from exponential to stationary phase, plasmid DNA exhibits significantly less negative supercoiling. Balke & Gralla (1987) showed that up to ten negative supercoils could be lost in the pBR322 plasmid in stationary phase cells grown in LB. Thus, gyrase activity (which maintains negative supercoiling) is attenuated as cells approach the end of their exponential growth phase. These findings are corroborated by our data; we also saw that compositional context differences become increasingly dramatic just as cells complete their exponential growth phase (Figure S3C-D).
In this work, we have not made a point to model the temporal dynamics of gyrase and topoisomerase as a function of cellular growth phase since doing so would require rigorous characterizations of gyrase and topoisomerase concentrations through the entire growth cycle. Another interesting extension would be to examine how gyrase dynamics and the compositional context of genes in core metabolic systems affect or modulate the dynamics of metabolism.
Supercoiling Dynamics Dominate Genetic Context Effects
Our analysis considered supercoiling as the physical basis for generating expression differences. In past work, the primary context effects considered in designing synthetic biocircuits are the effects of terminator leakage and transcriptional interference from overlapping promoter and RBS elements. We claim that supercoiling is the dominant source of compositional context effects observed in our data, justified by the following observations.
First, we see consistent differences in gene expression, even when only one gene is induced and the other remains repressed. If terminator leakage and transcriptional interference were the source of compositional context effects, we would not expect to see any effects in the case of single gene induction. But we do see significant context effects, even if only one gene is induced.
Also, if terminator leakage were the dominant process underlying context effects, transcription of vector coding sequences would cause interference. Thus, in our single reporter control plasmids we would expect to see similar levels of transcriptional interference, especially compared to the divergent orientation (where terminator leakage interference would be minimal from either reporter gene).
However, there is more than a 2-fold difference in expression between divergently expressed CFP and its single reporter counterpart (sense or anti-sense, compare Supplemental Figure S3C with Supplemental Figure S3F-G). The physical presence of a neighboring gene has an effect, even if it is not transcriptionally active. Thus, transcriptional interference via terminator leakage does not explain the data.
Second, if transcriptional interference were the primary driver for context effects, we would expect convergently oriented genes to achieve far weaker levels of expression than genes in divergent or tandem orientation. In theory, transcribing polymerases that managed to leak through two terminators (Larson et al. (2008) characterized the termination efficiency of our terminators at 98%) would collide in the convergent orientation, leading to an increase in abortive transcription events or transcriptional stalling.
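To give a sense of scale for the terminator leakage invoked above, the following back-of-the-envelope sketch computes the read-through probability for two terminators in series. It assumes that the 98% efficiency applies to each terminator and that termination events are independent; neither assumption is established here, and the function name is illustrative.

```python
def read_through_probability(termination_efficiency, n_terminators):
    """Probability that a polymerase passes every terminator in a series.

    Assumes each terminator stops the polymerase independently with the
    same efficiency (an illustrative simplification).
    """
    return (1.0 - termination_efficiency) ** n_terminators


if __name__ == "__main__":
    p = read_through_probability(0.98, 2)
    print(f"read-through probability for two terminators: {p:.2e}")
```

Under these assumptions the leakage is (0.02)^2 = 4 × 10^-4, i.e. roughly 4 in 10,000 polymerases, so collisions between leaked polymerases would be expected to be rare.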
Admittedly, we see from our in vivo characterizations that early-log phase CFP and RFP expression is weaker in the convergent orientation than in the divergent and tandem orientations. The fly in the ointment for this argument is that both CFP and RFP expression are higher (see Figure 4 and Supplemental Figure S3C-E) in the doubly induced case than in the singly induced case in early log phase, which contradicts the predictions of transcriptional interference theory.
Transcriptional interference also does not account for the sudden rise in convergently oriented expression relative to divergent orientation as cells approach the end of their exponential growth phase. Supercoiling theory, on the other hand, predicts that as gyrase activity wanes, the promoter regions of divergently oriented genes become more positively supercoiled, which inhibits their activity. This positive supercoiling originates from the RFP promoter as it transcribes in the anti-sense direction, thus asymmetrically inhibiting CFP expression and favoring RFP expression (see Figure S3E).
Thirdly, consider the differences in expression in vitro of convergent, divergent, and tandem transcribed RFP and CFP (Figure 6). Our experimental characterizations in vitro control for variations in plasmid copy number (as a function of orientation), since plasmid replication does not occur in the TX-TL cell-free system (Noireaux et al., 2003). Nonetheless, we see that levels of CFP (and RFP) expression in the convergent and divergent orientation differ by nearly 300 nM (and 500 nM) when plasmids are purified directly from cells in their natural supercoiled state, whereas treating with gyrase to eliminate positive supercoiling decreases the difference by nearly 70% in both genes. Relaxing positive supercoiling in divergently oriented RFP and CFP with gyrase also allows expression levels comparable to tandem orientation prior to gyrase treatment, while treating tandem oriented RFP and CFP enables expression levels higher than both post-treatment and pre-treatment convergently oriented CFP and RFP. Taken as a whole, these observations confirm that the dominant physical process driving the effects of compositional context is supercoiling.
Compositional Context and DNA Replication
While gene transcription occurs both frequently and rapidly, it is important to consider the effects of plasmid DNA replication on supercoiling. Our in vitro cell-free experimental results suggest that there are significant differences in gene expression, depending on whether genes are in convergent, divergent, or tandem orientation, both at the transcriptional and translational level. Since plasmid DNA replication does not occur in this system, gene orientation and placement have a significant impact on gene expression in the absence of plasmid DNA replication (see Figures 2 and 5). Moreover, the majority of these in vitro trends are qualitatively consistent with the trends observed in vivo. Thus, our data suggest that either the trends observed are dominated by the context defined by the reporter genes (mSpinach, MG, CFP, and RFP) of interest or that plasmid replication only strengthens these trends in vivo.
Admittedly, we found a small effect of reversing the direction of a single reporter gene on the ColE1 plasmid backbone, to encode it in either the sense or anti-sense direction. Comparing the expression levels in Supplemental Figure S3F-G, where t = 550 minutes after induction, there is a consistent 15% drop in expression if a sense reporter was expressed in the anti-sense direction. This reduction was observed independent of the choice of CFP or RFP coding sequence or choice of promoter. Examining the growth curves, we see that t = 550 minutes is still in log-phase for both induced RFP sense and anti-sense strains. Thus, it appears there is a small effect of orientation on CFP and RFP expression. The context of a single gene in this case is the replication origin and the resistance marker, each of which is separated from the reporter gene by approximately 200-250 bp.
However, when we compare the signal of a single induced sense or anti-sense CFP against its convergent, divergent, or tandem counterparts on the same plasmid backbone at the same point in time (Supplemental Figure S3C), we observe up to a 5.8-fold drop in CFP signal or a 2.4-fold drop in RFP signal even when a repressed adjacent gene is introduced, holding all other genes on the plasmid constant. These results strongly suggest that the primary source of context effects is the neighboring reporter gene rather than the plasmid. Plasmid replication may indeed have a small impact on genetic context effects, but the observed trends in this study are dominated by variables of compositional context pertaining to our reporter genes of interest. Since the Lac and Tet promoters used in this study (Lutz & Bujard, 1997) are quite strong compared to average wild-type promoters in E. coli, it remains the subject of future work to investigate whether plasmid replication dynamics play a larger role in defining genetic context when synthetic promoter strengths are attenuated.
The Role of Compositional Context in Synthetic Biocircuit Design
Our findings show that compositional context significantly alters gene expression in synthetic gene networks. When appropriately harnessed, compositional context can be used to strengthen or enhance existing feedback loops in the intended biocircuit design. These findings validate prior analysis underscoring the value of accounting for compositional context effects in synthetic biocircuit design (Cardinale & Arkin, 2012).
Broadly speaking, there are many levels of abstraction and ways to define compositional context. Cox et al. investigated how different regulatory elements in existing promoters could be assembled in distal, core, and proximal sites to define a library of new combinatorial promoters (Cox et al., 2007). Similarly, Mutalik et al. showed that the compositional context of a ribosome binding site, specifically the sequences downstream of the ribosome binding site, could have a significant impact on the effective binding strength of the ribosome (Mutalik et al., 2013a). Using a bicistronic design approach, they showed they were able to better insulate against downstream sequence variability to produce predictable parts. These are examples of the importance of understanding and insulating against intragenic compositional context.
The results of our experimental studies emphasize the importance of understanding intergenic compositional context effects, i.e. composition of entire genes. We have seen that compositional context effects can cause 3- to 4-fold variations in expression of the same gene (promoter, RBS, coding sequence, etc.) simply by rearranging its orientation and the orientation of other neighboring genes. The significance of these outcomes raises an important issue. As elements of intragenic context, e.g. choice of BCD, promoter design, and polycistronic design, are optimized to produce a functional gene cassette with model-predicted gene expression levels (Kosuri et al., 2013; Mutalik et al., 2013b), how do we ensure these predictions are not confounded by intergenic context as genes are composed?
One solution is to separate genes that need to have precisely regulated expression levels onto different plasmids. This strategy has been successful, especially if the goal is not to use context effects to enforce additional layers of feedback. For example, a relaxation-based genetic oscillator (Stricker et al., 2008) was developed by keeping the transcriptional units for LacI and AraC on separate plasmids. This oscillator exhibited remarkable robustness, oscillating over a range of experimental conditions, and had the beneficial feature of tunability.
However, the drawback of this approach is that separating genes on different plasmids introduces imbalances in gene copy number, which in turn can lead to additional design-build-test cycles to rebalance circuit dynamics. Also, it is often the case that there are too many genes in a biocircuit to isolate individually on separate plasmids. In such settings, the findings of this work are important to consider, as they can be used to inform how to optimally compose adjacent genes.
The effects of adding spacing sequences between genes are complex. Specifically, we varied the amount of spacing between mSpinach and MG aptamer in convergent, divergent, and tandem orientation by adding increments of 100 bp between genes and found that spacing did not have a monotonic effect on decreasing the fold-change across orientations (see Supplemental Figure S1A-C). Most unusual was the sudden drop in signal observed in the divergent and tandem orientation, but not in the convergent orientation, with 450 bp of spacing between the two genes. It is possible that since the persistence length of DNA is 150 base pairs, 450 base pairs of spacing facilitates formation of plectonemes (with DNA loops consisting of three 150 bp domains) in the spacing region, which induce torsional stress and inhibit formation or movement of the transcription bubble. The physical basis for these observed trends will be a subject of future research.
In general, genes responded well to induction when induced one by one, though their raw expression levels varied depending on compositional context. This may explain why some circuits in the past have been successfully engineered, with little consideration given to the effects of supercoiling and compositional context. For example, the original toggle switch (oriented divergently) was designed to respond and latch to the presence of a single inducer (Gardner et al., 2000) — it did this well and latched to LacI or TetR dominant states. In contrast, the threshold detection abilities of the toggle were not explored.
Likewise, the fold-change in ’off’ vs ’on’ states of three input and four input AND gates developed by Moon and colleagues (Moon et al., 2012) was strongest when comparing singly induced expression levels against the corresponding fully induced state. Interestingly, the four layer and three logic gates in these biocircuits were composed so that no two genes involved in any constituent layer of logic were placed adjacent to each other. Pairs of genes involved in logic gates were always separated by an auxiliary backbone gene or placed on separate plasmids. Overall, the success of this work suggests that genes can be insulated by inserting short ‘junk’ transcriptional units between them. Engineering approaches for attenuating compositional context effects are a subject of future research.
Finally, we remark that the largest effects from varying pairwise-gene orientation occur when both genes are induced. However, we noticed that tandem oriented CFP and RFP, mSpinach and RFP, and mSpinach and MG aptamer responded best to both single and double induction (see Figure 1 and Supplemental Figure S3C-D). Although tandem oriented CFP and RFP experienced roughly a 25% drop in signal compared against their single gene counterparts (Supplemental Figure S4), they respond well to induction when individually induced as well as when induced simultaneously. Additionally, although gyrase treatment of tandem oriented plasmids revealed positive supercoiling build-up, tandem oriented plasmids maintained significant levels of expression prior to treatment, comparable to convergent expression and superior to divergent expression. These findings are also consistent with successfully built biocircuits using tandem orientation, e.g. the repressilator (Elowitz & Leibler, 2000), recently developed 3-node and 5-node novel repressilators (Niederholtmeyer et al., 2015), and an eight gene event detector (data not shown).
EXPERIMENTAL PROCEDURES
Plasmid Construction, Assembly, and Strain Curation
Plasmids were designed and constructed using either the Gibson isothermal DNA assembly technique (Gibson et al., 2009) or the Golden Gate DNA assembly approach (Engler et al., 2008) using the BsaI type IIS restriction enzyme. All plasmids were cloned into JM109 E. coli (Zymo Research T3005) or NEB Turbo E. coli (NEB C2984H) strains and sequence verified. Sequence verified plasmids were transformed into MG1655Z1 and MG1655ΔLacI (also lacking TetR) strains of E. coli. All plasmids with ColE1 replication origin were transformed and cloned at 29° C to maintain low copy number of the ColE1 replication origin. Sequence verified colonies were grown in LB and the appropriate antibiotic and stored as glycerol stocks (17% glycerol) at -80° C.
Single Cell Fluorescence Microscopy
Based on the principles elucidated by Han et al. (2013), we ran all our experiments at 29° C when imaging mSpinach. Cells were revived from glycerol stock overnight at 29° C in LB, diluted to an OD of 0.1 and recovered for 2 hours in log-phase. Cells were then diluted to a density of approximately 5 × 10^6 cells/mL of LB and loaded into a CellASIC plate. Separate solutions for flowing LB with 200 µM DFHBI and LB with 200 µM DFHBI and 1 mM IPTG were prepared and loaded into reagent wells in the CellASIC ONIX B04A plate for imaging.
Fluorescence and bright field images from time-lapse microscopy were cropped using ImageJ and analyzed in MATLAB with Schnitzcell (Young et al., 2012). For characterizing coexpression of mSpinach and MG RNA aptamer, we used single cell agar pad microscopy, with all cells grown shaking at 29° C in a 96 well plate from overnight recovery until they reached log-phase (~4 hours). Induction occurred by transferring 10 µL of cultures into another 96 well plate into 350 µL of LB with 1 mM IPTG and 200 ng/mL aTc.
Plate Reader Experiments
For plate reader experiments, all cultures were revived from glycerol stock at 37° C in LB and the appropriate antibiotic, followed by redilution to OD 0.05-0.1, recovered at log-phase for 2 hours at 37° C, and then pipetted into a 96 square well glass bottom plate (Brooks Life Sciences MGB096-1-2-LG-L) with the appropriate media, antibiotic and inducer. All measurements were taken on Biotek Synergy H1 plate readers, using the internal monochromator with the following excitation (and emission) wavelengths: mSpinach at 469 nm (and 501 nm) at gain 100, MG aptamer at 625 nm (and 655 nm) at gain 150, CFP at 430 nm (and 470 nm) at gain 61 and 100, and RFP at 580 nm (and 610 nm) at gain 61 and 100. For RNA aptamer imaging, all in vitro and in vivo experiments were performed at 29° C with 200 µM DFHBI (for mSpinach) and 50 µM of Malachite Green dye.
CONFLICTS OF INTEREST
The authors have declared that no conflicts of interest exist.
ACKNOWLEDGMENTS
We thank Ophelia Venturelli for her invaluable inspiration and guidance in this project, Victoria Hsiao, Jin Park, Anu Thubagere, Adam Rosenthal for wonderful ideas on imaging, David Younger, Ania Baetica, Vincent Noireaux, Clarmyra Hayes, and Zachary Z. Sun for guidance and assistance with TX-TL experiments, and Lea Goentoro, Johann Paulsson, Long Cai, Jennifer Brophy, John Doyle, Eric Klavins, and Julius Lucks for insightful conversations.
This work was supported in part by a Charles Lee Powell Foundation Fellowship, a Kanel Foundation Fellowship, a National Science Foundation Graduate Fellowship, National Defense Science and Engineering Graduate Fellowship, Air Force Office of Scientific Research Grant (AFOSR) FA9550-14-1-0060, Defense Threat Reduction Agency Grant HDTRA1-14-1-0006, and Defense Advanced Research Projects Agency Grant HR0011-12-C-0065.