Abstract
The central dogma indicates the basic direction of gene expression pathways. For activated gene expression, the quantitative relationship between various links from the binding of transcription factors (TFs) to DNA to protein synthesis remains unclear and debated. There is consensus that at a steady state, protein levels are largely determined by the mRNA level. How can we find this steady state? Taking p53 as an example, based on the previously discovered Hill-type equation that characterizes mRNA expression under p53 pulsing, I proved that the same equation can be used to describe the average steady state of target protein expression. Therefore, at steady state, the average fold changes of mRNA and protein expression under TF pulsing were the same. This consensus has been successfully demonstrated. For the p53 target gene BAX, the observed fold changes in mRNA and protein expression were 1.72 and 1.28, respectively; the change in mRNA and protein expression calculated using the Hill-type equation was 1.35. Therefore, using this equation, we can not only fine-tune gene expression, but also predict the proteome from the transcriptome. Furthermore, by introducing two quantitative indicators, we can determine the degree of accumulation and stability of protein expression.
Introduction
According to the central dogma of molecular biology, the gene expression pathway is from DNA to RNA to protein 1,2, which is the process of transcribing gene-encoded information into mRNA and translating it into functional proteins. For gene expression, mRNAs and proteins provide useful readouts that connect genes and phenotypes 3. Generally, there are two different gene expression pathways involved. One pathway is the basal pathway, which regulates homeostatic expression, and the other pathway is the activated or repressed pathway, which dynamically regulates expression in response to environmental stimuli 4.
For basal gene expression, steady-state protein levels depend on the transcription rate, mRNA half-life, translation rate constant, and protein half-life 5,6. Moreover, situations with high transcription rates and low translation rate constants sho uld be ruled out 6. Under environmental stimuli, for gene expression activated by transcription factors (TFs), studies involving transcripteomics and proteomics indicate that the protein level at steady state is largely contributed by the mRNA level 3,7,8, however, there is little relationship between the protein level and synthesis or protein half-life.
Here, the steady state can be defined as a relatively long-term process experienced by cells 7. Therefore, both cell proliferation and differentiation can be considered steady states 7. Apoptosis and senescence also occur in a steady state 9,10. The steady state achieved after stimulation differs from that before perturbation 11.
For the steady state before stimulation in dendritic cells, mRNA levels explain 68% of the change in protein expression, and for the approximate steady state after stimulation with lipopolysaccharide, mRNA levels explain 90% of the change in protein expression 8,12. In addition, during the development of C elegans, the fold changes in mRNA and protein expression were almost identical. The transcript fold change was 2.02, and the protein level was 2.05 13. Therefore, we can speculate that the fold-changes in mRNA and protein expression in the steady state are equal. Could we theoretically determine this special steady state of protein expression? Which factors determine protein expression levels in the steady state?
Gene expression under environmental stimuli is driven by TFs, which map the corresponding stimulus. TF dynamics encode specific stimuli 14. The binding components of TF and DNA constitute the decoder of TF dynamics. This initiates the expression of the corresponding target gene and fulfils the relevant function. Some TFs exhibit oscillatory dynamic behaviors, in which the duration, frequency, and amplitude can all encode components of gene expression, thereby leading to different cell fates 15,16.
The tumor suppressor p53 is the most extensively studied TF. In response to DNA damage, p53 concentrations exhibit a series of pulses of fixed duration, frequency, and amplitude, whereas the number of p53 pulses increases with the degree of DNA damage caused by γ irradiation 17. Changing p53 dynamics from pulsed to sustained behavior leads to an altered cell fate from DNA repair to senescence 18. For gene expression driven by p53 pulsing, there is an interesting phenomenon in which the levels of mRNA and target protein expression are very similar. For example, for the MDM2 gene, 10 h or 24 h after stimulation, the fold change in mRNA was 2.2 (10 h)19 or 2.0 (24 h)20, while the protein expression was 1.8 (24 h), which can be regarded as the average steady state over the cell population 19. Therefore, without loss of generality, I will use target gene expression under p53 pulsing as an example to determine the principle governing target protein expression at a steady state.
Following the genetic information flow specified by the central dogma, we obtained a modified Hill equation to characterize the average p53 DNA-binding probability 21. I also found a Hill-type equation that could predict the fold-change in target mRNA expression under p53 pulsing 22. where Δ denotes the duration, T is the period, A is the amplitude, β is the maximal fold change in mRNA expression, and KA is the dissociation constant.
Here, I will complete the last step in describing the central dogma, from mRNA to protein. In response to pulsed p53 input dynamics, p21 mRNA dynamics show pulsing expression; however, p21 protein dynamics exhibit rising expression 23. The half–life of mRNA and protein determines the stability of expression dynamics 19,20,24. However, the relationship between steady-state mRNA and protein expression levels remains unclear. Therefore, I tried to prove that the average steady-state fold changes in mRNA and protein expression under p53 pulsing were equal, i.e., the average protein fold change at steady state was equal to .
Methods
Mathematical model of p53 target protein expression dynamics and its analytical solution
To achieve this goal, we must develop a useful and accurate mathematical model for gene expression under pulsed TF dynamics 12,25. Some models do not include the basal transcription term,26 but some early models provided necessary terms for environmental stimuli to activate gene expression.27 The basal transcription term is necessary for introducing fold change. The ordinary differential equation for mRNA dynamics is 22 where mRNA(t) denotes the target mRNA concentration, S(t) is the square wave function of the TF p53 pulses, is the basal transcription rate, β‣ is the maximal transcription rate activated by p53, α is the mRNA decay rate constant, and n is the Hill coefficient. The initial condition is mRNA(0) = m0, where m0 denotes the basal mRNA concentration.
The equation for the protein expression dynamics is 5,6,19 where protein(t) represents the target protein concentration, and σ and μ denote the rate constants of translation and degradation, respectively. The initial protein concentration was determined using protein(0) = p0.
For activated expression dynamics, considering , p0 = σ m0 μ, letting become m(t) = mRNA(t) m0, P(t) = protein(t)p0, Equation 2 and Equation 3 become Where m(t) and P(t) represent the fold changes in mRNA and protein expression, respectively, and denotes the maximal fold change in transcription. It seems that I have never seen gene expression levels appear in a dimensionless form in the differential equation. This is a crucial step. According to equation 5, the steady state levels of mRNA and target proteins are the same. Because Equation 4 has been solved, the solution from Reference 22 is rewritten as where mi (ξi) represents the mRNA fold change during the i-th TF pulse.
Solving Equation (5) is challenging. Reference 28 provides a method for solving such equations. Assuming a ≠μ, we can obtain analytical solutions for target protein expression in response to p53 pulsing as follows: Pi (ξi) represents the protein fold-change under the i-th p53 pulse. Here, a detailed derivation of Ai and Bi is presented in Appendix. The detailed solving process for Equation 5 is provided in Appendix.
Results
Fold changes in target protein expression at steady state exhibits oscillations
The steady state is the relatively long-term behavior of the cells. Neither Reference 28 nor 21 conducted this steady-state analysis. Reference 22 began this type of analysis. By inspecting Equation 7, for sufficient p53 input pulses, letting i →∞, we can obtain the steady state of the target protein expression dynamics: Pss (ξ) reached its maximum and minimum at Δ and 0 or T, respectively. Therefore, the steady-state of target protein expression dynamics is a repetitive and invariant oscillation. The maximal fold change or peak of oscillations was The minimal fold change or valley is For a given target protein, β and KA are relatively fixed, and the amplitude depends on the p53 dynamic parameters and the half-life of the mRNA and protein. Apparently, the smaller the amplitude is, the more stable the steady state.
A Hill-type equation can characterize the constant steady state
Let us now examine the characteristics of the steady state within the limits of α and μ. From Equations A6 and A7, when αT =1, μT ≪1was applied, Equation 11 is the same as Equation 1, which is exactly the Hill-type equation found previously 22.
Similarly, when αT =1, μΔ ≫1, i.e. μT ≫ 1, there are When μT ≪ 1, aΔ ≫1, i.e. aT ?1, there are When αΔ ≫1, μΔ ≫1, i.e. αT ≫1, μT ≫1, there are Equation 12 is the same as the classical Hill equation, which governs the steady-state mRNA fold-change under sustained p53 dynamics 22. Therefore, in the limit of a very long or short half-life for mRNA and protein, oscillations contract into a constant line that is very stable.
For several target genes of p53, Table 1 lists the results for α, μ, β, and KA. Only 3 genes had complete data. α and μ determine the trajectory of target protein expression dynamics 19. The four scenarios discussed above correspond to the four sets of protein expression dynamics defined in reference 19.
As shown in Table 1, the mRNA decay rate constants of PUMA, MDM2, and p21 are from 24, and the mRNA half-life of BAX is 38.77 h 29, thus, the mRNA decay rate is α = ln 2 38.77 ≈ 0.018 h-1 19. The protein degradation rate constants of PUMA, MDM2, p21, and BAX were obtained from 19. The dissociation constants of PUMA, MDM2, p21, BAX, and GADD45A are from 30. For maximal mRNA expressions of p21(CDKN1A), GADD45A, MDM2, and BAX, l o g2 (m a x i m a l m R N A f o were 3.9008(3 h), 3.1949(3 h), 2.7861(3 h), and 1.2373(9 h)20, thus, the maximal fold changes were 14.937, 9.157, 6.898, and 2.358, respectively.
The average fold changes in mRNA and protein expression were the same at steady state
The average fold change of the target protein over the i-th period can be calculated as follows (supplementary material) : Letting i →∞, the average steady state is As we expect Therefore, we proved that at steady state, the average fold changes in mRNA and protein expression were the same. This result also helds for the average levels of different proteins and their corresponding mRNAs within a single cell, which is consistent with the observed results 13. Similarly, for the average of the same proteins across the cell population, the average mRNA and protein expression levels were also the same.
Next, we considered the target protein BAX as an example to examine the predictability of the Hill-type equation for protein expression. The observed fold-change in the expression of the BAX protein 24 h after stimulation was 1.28 19, and that of the BAX mRNA was 20.7802 = 1.72 20. For any cell, assuming that Δ /T remains unchanged, the average BAX expression over the cell population can be calculated using the Hill-type equation 22: The values for β and KA are listed in Table 1. The Hill coefficient is n = 1.830. A = 60 nM 21. The duty cycle of p53 pulsing γ can be written as where represents the area under the curve p53(t). For a general periodic function, p53(t), during [0,T ] can be approximated by a square wave function S(t), as long as Thus, Here, γ is taken as 0.37 to minimize the error between the predicted p21 mRNA fold change and the observed fold change22.
A longer mRNA or protein half-life determines the relaxation time of target protein expression dynamics
We previously obtained the relaxation time for mRNA dynamics 22. Similarly, the time of protein dynamic trajectory relaxation to the average steady state can be calculated by 21,22,28 When αT =1, μT ≪ 1, τ pulsed can be expanded in the Taylor series: Thus, the number of p53 pulses required to reach the average steady state is For sustained p53 input dynamics, Δ= T, thereby Therefore, compared to sustained input, pulsed p53 dynamics cause target protein dynamics to reach maximum more quickly. In other words, if the half-lives of mRNA and protein and T (1-γ) 2 have the same order of magnitude, oscillatory p53 input enhances the sensitivity of protein expression. On the other hand, as shown in Table 1, the gene encoding the BAX, which has a longer half-life of mRNA and protein requires multiple p53 pulse inputs to reach a steady state, which not only provides sufficient time for DNA repair but also leads to the accumulation of sufficient expression levels required for triggering apoptosis.
The index of target protein accumulation under multiple p53 pulses
To understand the degree of accumulation of the target protein in response to multiple p53 input pulses, the index of protein accumulation can be defined as which is the ratio of the maximal protein fold change at steady state to the maximal protein fold change during the 1st pulse.
Pss,max is given by Equation 9, and When αT =1, μT ≪ 1, there are Thus Similarly, when αT =1, μΔ ≫ 1, i.e. μT ≫ 1, there is When αΔ ≫1, i.e. aT≫ 1, and μT ≪ 1, there is When aΔ ≫1, i.e., αT 1and μΔ ≫ 1,i.e. μT ≫ 1, there are Thus Therefore, proteins with longer half-lives have a higher accumulation; however, proteins with shorter half-lives have a lower accumulation.
The first three cases discussed above broadly correspond to the rising expression dynamics observed in reference 19, and the last corresponds to the oscillatory expression dynamics. In other words, the rising expressions have higher accumulation, and the pulsing expressions have lower accumulation.
The peak-to-valley ratio can measure the stability of mRNA and protein expression dynamics under multiple p53 pulses
According to Equation 6, the mRNA expression dynamics at steady state also exhibit oscillatory behavior. Let i →∞, The steady state for mRNA dynamics can be written as The peak is And, the valley is For a given gene, the peak-to-valley ratio for target mRNA expression can be defined as The closer ρm is to 1, the more stable the mRNA expression dynamics are. In particular, when αT =1 Thus Therefore, the longer the mRNA half-life is, the more stable the mRNA expression dynamics are. Similarly, when aΔ ≫1, i.e. aT ≫1, Therefore, Therefore, target mRNA dynamics with shorter half-lives are unstable.
Similarly, we can define the peak-to-valley ratio that characterizes the stability of the target protein expression dynamics as: As seen from the previous discussion, for the three extreme cases of α and μ, ρp = 1. However, for the short half-lives of mRNA and target protein, according to Equation 12, ρp = 1+ md > 1, therefore, the target protein is unstable in this case.
When mRNA has a short half-life, ρm = 1+ md, therefore, translation of mRNA does not improve its stability. By calculating the peak-to-valley ratio of each target gene, we can compare the stability of the expression of different genes.
The index of mRNA accumulation can be defined as From Equation 6, we obtain For αT ≪1, λm = 1+ γ md, and for αT ≫1, λm = 1. Therefore, stable mRNA has a high degree of accumulation.
As shown in Table 2, for the four combinations of half-lives, both long mRNA and short protein half-lives, as well as short mRNA and long protein half-lives can produce stable proteins. Only short mRNA and short protein half-lives produce unstable proteins. Stable proteins always have high accumulation.
The regulatory principle of protein expression dynamics under p53 pulsing
Let us examine the Hill type equation. For a very high binding affinity, i.e. KA ≪ A, Therefore, the expression of target protein with high p53 DNA-binding affinity is insensitive to amplitude. Only through duration and frequency can fine-tuning of p21 protein expression be achieved, demonstrating the regulatory ability of duration and frequency beyond saturation (Fig 1). In addition, it is important to note that Δ /T < 1,namely, the frequency must be less than that of 1 /Δ. Thus, a situation in which too high a frequency causes protein expression to decrease is avoided. Furthermore, the results from the experiment showed that gene expression increased proportionally with TF frequency 31, which agrees with the prediction from the equation.
The maximum mRNA expression of β was determined using binding affinity. The higher the affinity, the greater the maximal mRNA level 20,30. For example, as shown in Table 1, for the p21, GADD45, MDM2, and BAX genes, KA are 4.9 nM, 7.7 nM, 12.3 nM, and 73 nM 30, respectively, and the corresponding β values are 14.937, 9.157, 6.898, and 2.358, respectively 20. Therefore, the maximal gene expression level is mainly determined by the TF DNA-binding affinity, whereas the stability of gene expression is determined by the half-life of mRNA and protein. Therefore, the regulation of binding affinity has become a key issue in gene expression control. Gurdon et al. can greatly improve the binding affinity by extending the residence time of the TF target 32, allowing the duration and frequency of TF dynamics to finely tune gene expression.
The fold changes in basal mRNA and protein expression are equivalent at steady state
For basal gene expression, let S(t) = 0, Equations 3 and 4, which describe the fold changes in mRNA and protein expression, respectively, become Therefore, at steady state the fold changes in basal mRNA and protein expression are Where mbasal and Pbasal are the fold changes in basal mRNA and protein expression, respectively. By letting Δ= 0 or A = 0 in Hill-type equation 14, we can also obtain Equation 39.
Similarly, for basal gene expression, letting S(t) = 0, Equations 1 and 2, which describe the concentrations of mRNA and protein expression, respectively, become The steady state is Where mRNAss and proteinss represent the concentrations of basal mRNA and protein at steady state, respectively. All the symbols and definitions used in this article are listed in Table 3. Therefore, for the basal gene expression system, the absolute levels of protein expression at steady state are determined by the rate constant of translation and degradation and mRNA levels. For any given gene, it must be noted that the above system has only one steady state.
The fold changes in basal mRNA and protein expression at steady state are which is the same as Equation 39.
Discussion
TFs are nodes of a complex network, which is similar to a bridge connecting cellular signal transduction networks and transcription networks. Using the well-known TF p53 as an example, I proved that the steady-state fold changes in mRNA and target protein expression driven by p53 pulsing are the same. The Hill-type equation obtained may also be expected from previous methods 3,7,9,12. Using this equation, we not only clearly understand the regulatory principle of gene expression, but also put this equation into practice. In addition, I provide two quantitative indicators to determine the degree of accumulation and stability of protein expression under multiple TF input pulses.
For repressed gene expression, It was also proven that the fold changes in mRNA and target protein expression are the same at steady state, and a Hill-type equation with a negative coefficient can describe the fold change of target gene expression 33(full text has not been submitted). Therefore, for both basal and activated or repressed expression, we may reach the consensus that the fold changes in mRNA and protein expression are the same at steady state and can be characterized by the Hill-type equation.
It must be emphasized that the experimental data were obtained without the researchers realizing that the fold changes in mRNA and protein expression are the same at the steady state. After completing the theoretical proof, I tried to search for experimental data to support my findings. This confirms what Einstein said, “It is the theory which decides what we can observe”. In addition, the experimental data have errors and come from different laboratories. In the future, single-cell experiments may be performed according to the requirements of the Hill-type equation to verify the theoretical findings.
In References 28 and 21, for slow pulsing or fast dissociation, an equation similar to Equation 1 is introduced, and cannot be directly derived from the trajectory of the system. This may be an average steady state. In the limit of slow pulsing, the peak-to-valley ratio tends to infinity. Thus, this average steady state is just zero. However, for fast pulsing or slow dissociation, another steady state can be derived not only in the trajectory but also in the average equation. Therefore, it is observable. In this case the peak-to-valley ratio is 1, so the system is stable. This equation is called the modified Hill equation which obtains from a ligand-receptor association and dissociation system under pulsing signals. Correspondingly, Equation 1 is called the Hill-type equation which comes from the synthesis and degradation of products upon oscillating signals 22. At present, there are only two types of Hill-type equations.
There are many models of gene expression. Some have been cited in previous research from TF to DNA to mRNA. Gene expression exhibits stochastic characteristics.34-36. The impact of stochastic effects on gene expression has also been explained in the research from DNA to mRNA. The classic Hill equation has long been used in the modeling of biological systems 37-42. Mathematically, the Michaelis-Menten equation is equivalent to the Hill equation if the Hill coefficient is unity43. The purpose of this study was to determine the principle of gene expression, and the results obtained needed to withstand practical testing, and the parameters of the results were measurable. Therefore, we can only flexibly develop a minimal model. Evidently, the basic purpose has been achieved. Hill initially only used the equation to fit experimental data. This Hill-type equation reveals the principle of gene expression upon TF pulsing. If Hill could see this equation, he would definitely feel very pleased.
Although gene expression dynamics are random and complex, the regulation principle is always deterministic and simple. “When a process depends on a range of different sources of randomness, instead of getting more complicated, it is possible for the different random factors to compensate for each other and produce more predictable results. Talagrand has given sharp quantitative estimates for this”44. The Hill-type equation reflects the principle of deep simplicity of gene expression dynamics. The history of mechanics, physics, and chemistry indicates that the essence of nature is simple. Complex phenomena have evolved on the basis of the principle of simplicity. According to the regulation principle revealed by this equation, we can control gene expression from upstream of the genetic information flow described by the central dogma. Only four parameters need to be adjusted, namely, the duration, frequency, and amplitude of TF and binding affinity, and we can control the steady-state levels of mRNA and protein expression.
Because the levels of mRNA and protein expression at steady state are equal, and a large amount of transcriptome data has accumulated over the years; therefore, using this equation, we may predict the proteome simply by the transcriptome.
The classical Hill equation is applied when the TF dynamics are constant. When Δ= T is applied, the equation is reduced to the classical Hill equation, therefore, this equation broadens the application of classic biochemical theory. Through the derivation process, this generalized equation, which reveals the regulatory mechanism of gene expression, can be applied to any activated gene expression pathway driven by TF dynamics.
Hill wrote: “My object was rather to see whether an equation of this type can satisfy all the observations, than to base any direct physical meaning on n and K 45,46.’’ More than 100 years later, we have not forgotten his caveat 46.
Ethics
This work did not require ethical approval from a human subject or animal welfare committee.
Data availability
The Data used in this article were obtained from [17, 23-26, 28, 30, 31] or are included in the text.
Author contributions
X.S.: conceptualization, formal analysis, investigation, methodology, writing—original draft, writing—review, and editing.
Conflict of interest declaration
I declare that I have no competing interests.
Funding
This study received no funding.
Acknowledgements
I thank anonymous reviewers for several helpful comments.
Appendix
The solution for Equation 5
The equation for target protein expression dynamics is or Denoting ξi = t −(i −1)T 28, then Equation A2 become To solve Equation A3, letting general solution Then general solution to the homogeneous equation of Equation A3 is Assuming a ≠μ, then a special solution for Equation A3 is Where Thus, we can obtain the general solution for Equation A3: Using the condition28 We can obtain the iterative equation Considering P1 (0) = 1, thus, Denoting , thus Thus, Generally, we have Thus, we have the solution Equation 7.
The average fold change of target protein expression
The average target protein expression levels is defined as28 Where Thus
Footnotes
Inserted supplementary material as an appendix into main text;deleted a paragraph and polished the English.