Abstract
Auto-regulatory feedback loops occur in the regulation of molecules ranging from ATP to MAP kinases to zinc. Negative feedback loops can increase a system’s robustness, while positive feedback loops can mediate transitions between cell states. Recent genome-wide experimental and computational studies predict hundreds of novel feedback loops. However, not all physical interactions are regulatory, and many experimental methods cannot detect self-interactions. Our understanding of regulatory feedback loops is therefore hampered by the lack of high-throughput methods to experimentally quantify the presence, strength, and temporal dynamics of auto-regulatory feedback loops. Here we present a mathematical and experimental framework for high-throughput quantification of feedback regulation, and apply it to RNA binding proteins (RBPs) in yeast. Our method is able to determine the existence of both direct and indirect positive and negative feedback loops, and to quantify the strength of these loops. We experimentally validate our model using two RBPs which lack native feedback loops, and by the introduction of synthetic feedback loops. We find that the RBP Puf3 does not natively participate in any direct or indirect feedback regulation, but that replacing the native 3’UTR with that of COX17 generates an auto-regulatory negative feedback loop which reduces gene expression noise. Likewise, the RBP Pub1 does not natively participate in any feedback loops, but a synthetic positive feedback loop involving Pub1 results in increased expression noise. Our results demonstrate a synthetic experimental system for quantifying the existence and strength of feedback loops using a combination of high-throughput experiments and mathematical modeling. This system will be of great use in measuring auto-regulatory feedback by RNA binding proteins, a regulatory motif that is difficult to quantify using existing high-throughput methods.
INTRODUCTION
Homeostatic maintenance of cell state and transitions between states are often mediated by sets of feedback loops1. These loops can be positive or negative, and either directly auto-regulatory or indirect, acting through any number of intermediate genes. Both positive and negative feedback loops are used by both organisms and synthetic biologists to perform a wide range of tasks, e.g., ATP biosynthesis2, MAPK signaling3 and zinc homeostasis4. Genome-wide experimental measurements and computational predictions of protein-protein and protein-RNA interactions suggest the existence of thousands of feedback loops5–11. However, not all genes that physically interact with each other regulate each other. Interaction does not necessitate regulation. For example, the mRNAs bound by a given RBP and the RNAs that change expression upon deletion of that RBP show surprisingly little overlap12. Therefore, high-throughput experimental and computational methods that are increasingly good at correctly identifying physical interactions must be complemented by high-throughput methods for quantifying the sign and strength of these regulatory interactions.
Feedback loops can be described by two sets of properties: positive or negative and direct or indirect13,14. A direct auto-regulatory feedback loop is one in which a protein activates or inhibits itself, while in an indirect loop this feedback occurs through one or more intermediate genes. Furthermore, feedback loops can be positive, in which a gene increases its own expression or activity (e.g., via phosphorylation), or negative, in which a gene represses or inactivates itself. Direct auto-regulatory negative feedback loops have an intrinsic ability to reduce sensitivity to intrinsic and extrinsic perturbations15–17. In addition, the negative feedback network motif can shorten the response time of a network, as is found in the SOS DNA repair pathway and in ribosome biogenesis in E. coli18–20, and can alter the response curve of a gene to changes in inducer concentration21,22. Time-separation in negative feedback loops, often in indirect feedback via intermediate genes, can generate oscillations and irreversible transitions, as are found in the circadian clock and the cell-cycle23,24. Like negative feedback, positive feedback can produce sustained oscillations25. However, biologically, positive feedback loops play an entirely different set of roles26, the most common of which is bi-stability, or all-or-none transitions27. These network motifs are common in the cell-cycle and in cell-differentiation, both of which often display a fast positive feedback loop and a delayed negative feedback loop in order to achieve an irreversible transition between two stable states, such as in the cell-cycle or in sex differentiation24,28.
Functional genomic and bioinformatic methods predict that many RNA binding proteins (RBPs) participate in feedback loops, in which the RBP regulates the level of its own protein, either directly or indirectly10. Direct negative feedback auto-regulation appears to be a common motif for RBPs, and many genes that lack a canonical RNA binding domain may bind to their own mRNA and regulate their own translation10. However, very few of these feedback interactions have been experimentally tested, as no high-throughput methods exist for such validation. The typical high-throughput test to measure the regulatory effect of a particular RBP involves deletion of that RBP, and therefore these methods are incapable of determining the existence of feedback loops. In order to determine the complete set of RBPs that control their own expression via direct or indirect feedback regulation, we developed a mathematical model and a high-throughput experimental method that work together to identify the presence of such loops and to measure their relative strength.
RESULTS
A mathematical model for detecting feedback loops using a synthetic inducible promoter
In a simple feedback-free model of gene expression (see Methods), two proteins under the control of the same inducible promoter will show similar induction curves (Figure 1A-D). Differences in the transcription, translation, and degradation rates of these genes result in offset induction curves (Figure 1B). However, the offset between these two curves is constant, and therefore the log-ratio of expression between a protein of interest and a reference protein (e.g., GFP) under control of the same promoter will remain constant across induction levels (Figure 1C). This effect can also be visualized by plotting expression of the protein of interest against the reference; a change in the transcription rate, translation rate, or degradation rate will result in diagonal lines with the same slope (Figure 1D). However, if one of the proteins participates in a feedback loop, the shape of the induction curve will change (Figure 1E,F). At low levels of induction (low TF concentration, and therefore low protein levels), the feedback loop will be negligible, and expression will be similar between the two proteins. As the induction increases, and therefore the expression of both proteins increases, the effect of the feedback loop on expression will increase, resulting in larger differences in the ratio between the expression of the two proteins (Figure 1G). This can also be visualized as a change in the slope when comparing the expression of one protein against the other (Figure 1H). Therefore, a relatively simple model of gene expression suggests that it should be possible to detect feedback loops by placing two proteins under the control of the same inducible promoter, and determining how the ratio between the two proteins changes as a function of expression level.
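The prediction above can be illustrated numerically. The following is a minimal sketch, not the model fitted in this work: it assumes a Hill transfer function with hypothetical parameters (k50, n), lumps all rate constants into a single hypothetical `gain`, and assumes a power-law feedback form in which production scales with the protein level raised to the power `feedback`. Under these assumptions the log-ratio to the reference is constant without feedback and changes with induction when feedback is present.

```python
import numpy as np

def hill(inducer, k50=10.0, n=2.0):
    """Promoter activation as a Hill function of inducer (hypothetical k50, n)."""
    inducer = np.asarray(inducer, dtype=float)
    return inducer**n / (k50**n + inducer**n)

def steady_state(alpha, gain, feedback=0.0):
    """Steady-state protein level under an ASSUMED power-law feedback form.

    Without feedback: P = gain * alpha.
    With feedback:    P = gain * alpha * P**feedback,
                      so P = (gain * alpha) ** (1 / (1 - feedback)).
    """
    return (gain * alpha) ** (1.0 / (1.0 - feedback))

inducer = np.logspace(0, 3, 7)                           # arbitrary units
alpha = hill(inducer)

gfp = steady_state(alpha, gain=1000.0)                   # reference protein
no_fb = steady_state(alpha, gain=200.0)                  # different rates, no feedback
neg_fb = steady_state(alpha, gain=200.0, feedback=-0.5)  # negative feedback

# Log-ratio of the protein of interest to the reference at each induction level
ratio_no_fb = np.log2(no_fb / gfp)
ratio_neg_fb = np.log2(neg_fb / gfp)

# Slope of protein of interest vs reference on log-log axes
slope_no_fb = np.polyfit(np.log10(gfp), np.log10(no_fb), 1)[0]
slope_neg_fb = np.polyfit(np.log10(gfp), np.log10(neg_fb), 1)[0]

print("no feedback  : ratio range", np.ptp(ratio_no_fb), "slope", slope_no_fb)
print("neg. feedback: ratio range", np.ptp(ratio_neg_fb), "slope", slope_neg_fb)
```

In this sketch a constant offset (a different `gain`) shifts the curve without changing the log-log slope, whereas a nonzero `feedback` changes the slope, which is the signature used to detect a loop.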
Design of a feedback-detector master strain
In order to experimentally test the above predictions, we developed a feedback detector control strain (bud9::Z3EVpr-GFP-GAL80 3’UTR, his3::Z3EVpr-mCherry-HIS3 3’UTR, hereafter referred to as Y197) in which both GFP and mCherry are driven by the same Z3EV promoter under the control of the β-estradiol inducible synthetic transcription factor Z3EV (Figure 2A,B). Addition of β-estradiol results in increased activity of the Z3EV transcription factor and increased expression of both GFP and mCherry (Figure 2C). In the absence of feedback, the two proteins show identical induction curves, and the ratio between the two proteins remains constant (Figure 2C,D). Our model (see Methods) predicts that if we use the Z3EVpr-mCherry construct to N-terminally tag an RNA binding protein that participates in a feedback loop (Figure 2E,F), we would alter the induction curve of mCherry but not GFP, and hence the ratio between the two fluors would change as a function of TF concentration (Figure 2G,H). In order to experimentally test our feedback model, we built synthetic gene circuits with and without feedback regulation.
A Puf3-COX17 construct participates in an auto-regulatory negative feedback loop
Puf3 is an RNA binding protein that binds to sequence elements in the COX17 3’UTR, resulting in a destabilized COX17 mRNA29. We therefore hypothesized that a Z3EVpr-mCherry-Puf3-COX17 3’UTR strain (LBCY209, hereafter referred to as PUF3COX17) should have a negative feedback loop, while a Z3EVpr-mCherry-Puf3-PUF3 3’UTR strain (LBCY200, PUF3PUF3) should lack feedback regulation. We built these two strains and measured the expression of mCherry-Puf3 as a function of β-estradiol. We find that, as predicted by the model, PUF3PUF3 shows the same induction curve as Y197, albeit shifted towards lower mCherry expression (Figure 3B,C). In contrast, the mCherry signal for PUF3COX17 flattens out at high β-estradiol (Figure 3B,C). As a result, the log2(mCherry/GFP) ratio shows identical behavior between Y197 and PUF3PUF3 (Figure 3D). In contrast, for PUF3COX17, the ratio decreases with increasing β-estradiol, consistent with the presence of a negative feedback loop. Further consistent with the presence of a negative feedback loop in PUF3COX17 but not in PUF3PUF3, the slope of GFP vs mCherry for the PUF3COX17 strain is lower (Figure 3E), as predicted by the model for a negative feedback loop.
Gene expression is a stochastic process in which promoters switch on and off, generating bursts of protein production30. Therefore gene expression can be decomposed into two parts: burst frequency (the rate at which a promoter switches on) and burst size (the number of protein molecules made each time a promoter switches on)31. We hypothesized that a negative feedback loop that acts at the level of mRNA stability would decrease burst size. Consistent with this hypothesis, the burst frequency for the two strains is identical, while the burst sizes are vastly different (Figure 4A-D). Interestingly, we find that, at low induction levels, PUF3COX17 has a higher burst size than PUF3PUF3, though the burst size of PUF3COX17 increases more slowly than that of PUF3PUF3 (Figure 4B), suggesting that two competing processes control expression of PUF3COX17: an increase in expression from increasing β-estradiol, and a decrease in expression from decreasing stability of the mRNA due to negative feedback. The final result of the negative feedback loop is that the same mean expression is reached at different β-estradiol concentrations, though the single-cell expression distribution of mCherry-Puf3 is wider without the negative feedback loop (Figure 4E), suggesting that cells may use auto-regulatory negative feedback loops to decrease cell-to-cell variability in RBP expression.
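How burst frequency and burst size were estimated from the single-cell data is not specified in the text above. One common moment-based approach, shown here only as a hedged sketch, assumes that steady-state protein levels are approximately gamma distributed, so that burst size is estimated by the Fano factor (variance / mean) and burst frequency by the inverse squared coefficient of variation (mean² / variance). The function name and the simulated populations are hypothetical.

```python
import numpy as np

def burst_parameters(expression):
    """Moment-based burst estimates, ASSUMING gamma-distributed protein levels.

    Returns (burst_frequency, burst_size), where
      burst_frequency ~ mean**2 / variance   (1 / CV**2)
      burst_size      ~ variance / mean      (Fano factor)
    """
    x = np.asarray(expression, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)
    return mean**2 / var, var / mean

rng = np.random.default_rng(0)

# Hypothetical single-cell populations: same burst frequency (gamma shape),
# different burst size (gamma scale). Values are illustrative, not measured.
small_bursts = rng.gamma(shape=4.0, scale=50.0, size=50_000)
large_bursts = rng.gamma(shape=4.0, scale=200.0, size=50_000)

freq_small, size_small = burst_parameters(small_bursts)
freq_large, size_large = burst_parameters(large_bursts)

print("small bursts: frequency", freq_small, "size", size_small)
print("large bursts: frequency", freq_large, "size", size_large)
```

With real flow cytometry data, fluorescence is in arbitrary units, so burst size estimated this way is only meaningful as a relative quantity between strains measured under the same settings.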
Z3EVpr-Pub1 participates in an indirect positive feedback loop
Pub1 is a poly(U) binding protein that binds to and stabilizes up to 10% of yeast mRNAs. We found that a pub1Δ strain exhibits reduced Z3EVpr-GFP expression (Figure 5A). However, this reduction is not constant (Figure 5B), as would be expected from altered stability of the GFP mRNA. Instead, the effect depends on the level of induction, suggesting that Pub1 acts upstream of GFP, i.e., at the level of Z3EV activity. This suggests that Pub1 may increase expression of all Z3EV targets in a dose-dependent manner. In other words, a Z3EVpr-Pub1 construct will generate a positive feedback loop. To test this hypothesis we generated a Z3EVpr-mCherry-Pub1 strain (Figure 5C) and measured both GFP and mCherry as a function of β-estradiol concentration. Consistent with our hypothesis, both GFP and mCherry show steeper induction curves in a Z3EVpr-mCherry-Pub1 strain than they do in the wild-type control (Figure 5D,E,F). In contrast to the negative feedback loop mediated by Puf3, the positive feedback loop mediated by Pub1 results in an increase in the width of the single-cell distribution (Figure 5G). Thus, Z3EVpr-mCherry-Pub1 drives a positive feedback loop that results in increased expression of both Z3EVpr-mCherry and Z3EVpr-GFP.
A mathematical model in which tagging RBPs with mCherry can explain all of the data
In the above experiments we measured expression of four different strains, all of which contain Z3EVpr-GFP as an internal control, and each of which contains a unique Z3EVpr-mCherry derived construct. The different C-terminal ends on mCherry (mCherry, mCherry-Puf3 and mCherry-Pub1) and the different 3’UTRs are likely to affect the stability of the mCherry mRNA and protein. These can be modeled as constant, transcription factor independent changes in mCherry expression level. In contrast, feedback loops introduce transcription factor dependent changes in mCherry expression. To determine if all of our data can be explained by this model, we fit a model without feedback to Y197 (mCherry) data, and then used the same parameters, but allowed only k’b (TF-independent transcription & translation) to vary (see Methods). We find that this model can fit Y200 (mCherry-PUF3PUF3) but not the other strains, consistent with our above hypothesis that this strain lacks any feedback loops (Figure 6A,B). We next fit the same model but allowed both k’b and F to vary. We find that this model can fit all measured strains, and that all good fits to Y209 (mCherry-PUF3COX17) have a negative value of F, and all good fits to the mCherry-PUB1 strain have a positive value of F, consistent with our hypothesis that these strains have negative and positive feedback loops, respectively (Figure 6C,D). Thus, our two-color Z3EVpr feedback detector system is able to identify the presence of both direct and indirect feedback loops regulated by RNA binding proteins.
DISCUSSION
In summary, we have developed an experimental and mathematical framework to measure the strength of native regulatory feedback loops. Through the use of an experimental system in which two fluorescent reporters are driven by the same transcription factor, but one of these is coupled to an RNA binding protein, we can not only detect the presence of both positive and negative feedback loops, but also quantitatively measure the strength of these loops. We applied this system to a synthetic auto-regulatory negative feedback loop with the RBP Puf3, and found that introduction of this loop reduces the noise in Puf3 expression. In addition, we applied this system to the RBP Pub1, and detected a positive feedback loop that increases expression noise. However, this feedback loop is not specific to Pub1, but acts on both Z3EVpr-mCherry-Pub1 and Z3EVpr-GFP, suggesting that it acts upstream of Pub1, possibly by directly regulating the concentration of the Z3EV TF. Pub1 binds the ENO2 3’UTR and the ACT1 5’UTR12,32. This observation highlights an important strength of our dual-reporter method, as opposed to more traditional approaches in which the gene of interest is overexpressed or deleted without an internal control in the same cell. Many genetic perturbations result in both direct and indirect effects. Varying Pub1 changes both mCherry and GFP, showing that the effect is non-specific. The vast majority of changes in mRNA levels observed in deletion and overexpression experiments are indirect33. GFP serves as an internal control for the state of the cell and for the state of the synthetic gene circuit. Thus, by combining the dual reporter system with a mathematical model we can accurately differentiate direct and specific effects from indirect pathway-specific or global effects.
Consistent with past theory and experiments15,17,34–37, we find that a negative feedback loop decreases noise, while a positive feedback loop increases noise. Interestingly, the increase in noise is far larger than the decrease in noise. However, at this point we cannot say if this difference in the magnitude of the changes is a general property of positive vs negative loops or of direct auto-regulatory loops vs pathway-specific or global feedback loops. It will be interesting to determine if global regulators of expression, such as Pub1, also act as global regulators of expression noise, or if the large increase in noise is due to a pathway-specific positive feedback loop.
Interestingly, we find that the parameter regime that fits Z3EVpr-mCherry-Puf3 is exactly continuous with the regime that fits Z3EVpr-mCherry-Puf3-COX17 (Figure 6D). In addition, the induction curves intersect at around the position of half-maximal expression (Figure 6C). This suggests that, at half-maximal induction, the two constructs have identical transcription and translation rates, but that at low induction the COX17 3’UTR mRNA is more stable (less Puf3), while at high induction the reverse is true.
Finally, this system may serve as a platform for the design and characterization of synthetic RBPs38,39. Synthetic regulatory circuits with designed sequence specificities have many advantages over the repurposing of native circuits, such as Gal4-UAS40. However, it is difficult to design such circuits so that they do not interfere with the host cell41. The dual-reporter aspect of our system ensures that secondary effects of synthetic circuits can easily be detected, and constructs that lack secondary effects chosen.
MATERIALS AND METHODS
Yeast strains and media
All yeast strains are listed in Supplementary Table 1. As a non-inducible and autofluorescence control we used FY4, a wild-type prototrophic yeast strain42. The parental strain for all Z3EV strains is DBY1905443. To generate LBCY197, we generated a PCR amplicon containing KanMX-Z3EVpr-mCherry using primers 196 & 197 and LBCP80, and transformed this amplicon into DBY19054. To generate yeast strains LBCY200 and LBCY201 we first generated plasmids LBCP94 and LBCP95, amplified these plasmids using primer pairs 325, 326 and 365, 366 respectively, and transformed these PCR amplicons into DBY19054. In order to generate LBCY209 we first generated a puf3::HYGR strain (LBCY203), and then generated LBCY209 using primers 325 & 327 and plasmid LBCP96. Colony PCR was used to confirm correct integrations of all strains, and to verify that the Z3EV promoter has the correct number of Z3EV binding sites. All transformations were performed using the standard lithium acetate method44. PCR for transformation was performed with Phusion DNA Polymerase (Sigma Aldrich). Colony PCR was performed using Taq Polymerase 2x Master Mix (Sigma Aldrich). Selection for drug resistant transformants was done on YPD plates with Hygromycin B (IBIAN), CloNAT (Werner bioreagents), or G418 (VWR).
Plasmid construction
In order to create plasmid LBCP80, a 1.5 kb PCR amplicon containing the Z3EVpr from gDNA of DBY19054 was amplified using two rounds of PCR, first with primers 316 & 317, then with primers 308 & 309. This PCR product, along with mCherry, was cloned by Gibson assembly into plasmid PYM-N1445, which had been cut with SacI and EcoRI to remove the GPD promoter. To create plasmids LBCP94 & LBCP95, the PUF3 and PUB1 ORFs were PCR amplified using primer pairs 318, 319 and 329, 330 and Gibson cloned into EcoRI linearized LBCP80. Plasmid LBCP96 was created using Gibson assembly with EcoRI linearized LBCP80, the PUF3 ORF, and a 200 bp 3’UTR region of COX17 PCR amplified from the genome using primers 321 & 322. After Gibson assembly, each plasmid was transformed into E. coli by electroporation, and transformants were confirmed by colony PCR, miniprepped, and checked by Sanger sequencing and multi-site restriction digest.
Flow Cytometry
Single colonies were picked from YPD plates and cultured overnight in SCD media, then inoculated at OD600 = 0.02 into SCD containing different concentrations of β-estradiol (Sigma E8875) and measured after 7.5 h of growth (at which point OD600 was between 0.25 and 0.5). Flow cytometry was performed on a BD LSRFortessa (BD Biosciences) with 488 nm and 561 nm lasers and 530/28 and 610/20 filters for GFP and mCherry, respectively. All data analysis was performed using MATLAB as previously described4.
Mathematical modeling and fitting to data
To construct a mathematical model that could explain the fluorescence values over the range of inducer concentrations, we assumed a constant rate of mRNA synthesis, an mRNA degradation rate proportional to the mRNA concentration, a protein synthesis rate proportional to the mRNA concentration, and a protein degradation rate proportional to the protein concentration. We thus defined the following ODE system, which constitutes the background model, in which kb and k’b are the rates of transcription and translation, respectively, and kd and k’d are the rates of mRNA degradation and protein degradation, respectively. The factor ɑ is the transfer function that relates inducer concentration to promoter activation, and is assumed to follow a Hill equation. K50 is the concentration of β-estradiol at which expression reaches its half-maximal value. F is a feedback constant, with negative values for negative feedback interactions and positive values for positive feedback. Considering (1) and (2) at equilibrium, we can write
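As a minimal sketch, the feedback-free background model described in words above can be written as follows. The symbols kb, k’b, kd, k’d, ɑ and K50 follow the text; the inducer concentration [E], the Hill coefficient n, and the choice to let ɑ multiply the transcription rate are assumptions introduced here for illustration. The term involving the feedback constant F is deliberately left out, because its functional form is not stated in the text.

```latex
\begin{align}
\frac{d[\mathrm{mRNA}]}{dt} &= \alpha\, k_b - k_d\,[\mathrm{mRNA}] \\
\frac{d[\mathrm{Prot}]}{dt} &= k'_b\,[\mathrm{mRNA}] - k'_d\,[\mathrm{Prot}] \\
\alpha &= \frac{[E]^{n}}{K_{50}^{\,n} + [E]^{n}}
\end{align}
% Setting both derivatives to zero gives the feedback-free steady state:
\begin{equation}
[\mathrm{mRNA}]_{ss} = \frac{\alpha\, k_b}{k_d},
\qquad
[\mathrm{Prot}]_{ss} = \alpha\,\frac{k_b\, k'_b}{k_d\, k'_d}
\end{equation}
```

Under these assumptions the four rate constants enter the steady state only through the single combination (kb k’b)/(kd k’d), which is why they cannot be separated using steady-state fluorescence alone.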
We note that this feedback model is invalid with regards to the rate of protein production when [Prot] approaches 0, which is not what we would expect for such a biological system. We therefore used fluorescence data, which always has values greater than zero, as a proxy for [Prot]. We note that all tested strains show GFP and mCherry signals significantly above background in the absence of β-estradiol, suggesting that, averaged across the population, [Prot] is always greater than zero. Furthermore, we note that the individual parameters within (kd *k’d)/(kb *k’b) cannot be measured separately using our system. We therefore vary only one of these parameters and keep the other three constant.
All data were first normalized by subtracting either autofluorescence, as measured from a strain lacking GFP and mCherry, or basal expression, as measured at 0 nM β-estradiol. All analyses, figures, and model fitting were performed using both normalization methods; the results are qualitatively identical. The latter corrects for differences in the basal expression level between strains, and is used when plotting GFP vs mCherry. The former explicitly shows differences in basal expression level and is used for fitting the model to data. Prior to fitting, normalized fluorescence measurements were log10 transformed to prevent the high expression values from dominating the fit. The model was first fit to data from Y197. Then, either F, k’b, or both F and k’b were varied over two orders of magnitude and the R2 was calculated between model and data.
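The parameter scan described above can be sketched as a grid search. This is an illustration under stated assumptions rather than the analysis code used here (which was written in MATLAB): the power-law feedback form, the Hill parameters, the reference value of k’b, the linear grid for F, and the synthetic "data" are all hypothetical.

```python
import numpy as np

def model(inducer, kb_prime, F, k50=10.0, n=2.0):
    """Steady-state expression under an ASSUMED power-law feedback form.

    alpha is a Hill function of inducer; all other rate constants are
    held fixed and absorbed into kb_prime for this illustration.
    """
    alpha = inducer**n / (k50**n + inducer**n)
    return (kb_prime * alpha) ** (1.0 / (1.0 - F))

def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

inducer = np.logspace(0, 3, 8)   # nonzero inducer levels, arbitrary units

# Synthetic stand-in for normalized fluorescence, log10 transformed so that
# high expression values do not dominate the fit.
data = np.log10(model(inducer, kb_prime=300.0, F=-0.4))

# Scan kb_prime over two orders of magnitude around the reference value,
# and F over a linear grid (the assumed form requires F < 1).
kb_grid = 300.0 * np.logspace(-1, 1, 41)
F_grid = np.linspace(-0.9, 0.9, 37)

r2 = np.array([[r_squared(data, np.log10(model(inducer, kb, F)))
                for F in F_grid] for kb in kb_grid])

i, j = np.unravel_index(np.argmax(r2), r2.shape)
best_kb, best_F = kb_grid[i], F_grid[j]

# "Good" fits are those above the R2 threshold; inspect the sign of F among them.
good = r2 >= 0.95
good_F = F_grid[np.where(good)[1]]
print("best kb':", best_kb, "best F:", best_F)
print("F range among good fits:", good_F.min(), "to", good_F.max())
```

Reporting the full set of parameter pairs above the threshold, rather than only the single best fit, makes it possible to ask whether every acceptable fit agrees on the sign of F.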
In order to quantitatively decide what R2 constitutes a good fit, for each strain, we fit the model to one biological replicate and then calculated the R2 of that model against a different biological replicate of the same strain. The R2 is always greater than 0.95 for all strains (Supplementary Table 2); we therefore chose 0.95 as the threshold.
AUTHOR CONTRIBUTIONS
The authors have made the following declarations about their contributions: Conceived and designed the experiments and wrote the paper: LBC, with help from MAST & JDE. Performed the experiments and analyzed the data: MAST, CTO, LBC. Contributed reagents and materials: MAST, CTO, JDE, LE.
ACKNOWLEDGEMENTS
We’d like to thank Gian Tartaglia, Marçal Gabaldá and Elena Abad for useful discussions. This work was supported by startup funds from the department (DCEXS) and a grant from the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) to L.B.C. M.A.S.T. was supported by an Evolutionary Biology and Complex Systems (BESC) Program undergraduate fellowship. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.