Abstract
Synthesis of gene products in bursts of multiple molecular copies is an important source of gene expression variability. This paper studies large deviations in a Markovian drift–jump process that combines exponentially distributed bursts with deterministic degradation. Large deviations occur as a cumulative effect of many bursts (as in diffusion) or, if the model includes negative feedback in burst size, in a single big jump. The latter possibility requires a modification of the WKB solution in the tail region. The main result of the paper is the construction, via a modified WKB scheme, of matched asymptotic approximations to the stationary distribution of the drift–jump process. The stationary distribution possesses a heavier tail than predicted by a routine application of the scheme.
MSC 2020 92C40; 60J76, 45D05, 41A60
1 Introduction
Bursty production of gene products (mRNA or protein molecules) makes an important contribution to the overall gene expression noise [1–4]. Bursts can be modelled as instantaneous jumps of a random process. Burst sizes have been suggested to follow geometric (in a discrete process) or exponential (in a continuous process) distributions [5, 6]; we focus on the latter. Production of gene products is balanced by their degradation and/or dilution. Combining randomly timed and sized production bursts with deterministic decay leads to a Markovian drift–jump model of gene expression [7–10]. More fine-grained models of gene expression are based on a purely discrete [11–14] or a hybrid discrete–continuous state space [15–18]. The drift–jump model can be derived from the fine-grained processes using formal limit procedures [19–24].
In its basic formulation, the drift–jump model for gene expression admits a gamma stationary distribution [25]. The model also possesses an explicit stationary distribution in the presence of a Hill-type feedback in burst frequency [26]. Such regulation can result from common transcriptional control mechanisms [27]. In addition to feedback in burst frequency, there is evidence of feedback mechanisms that act on burst size or protein stability [28–30]. The explicit stationary solution to the drift–jump model has been extended to the case of feedback in protein stability [31]. However, in the case of feedback in burst size, an explicit solution is unavailable, save for the special case of a Michaelis–Menten-type response [32].
The near-deterministic regime of frequent and small bursts can be analysed using the Wentzel–Kramers–Brillouin (WKB) method; the WKB approximations closely agree with numerically obtained exact distributions even at moderate noise levels [33]. Bursty production has also been formulated and analysed with the WKB method in the discrete state space [34–38]. Similar approaches have been used earlier for queueing systems [39, 40]. The standard WKB-type (diffusion-like) results are guaranteed to apply for jump-size distributions with super-exponentially decaying tails [41]. By contrast, in the sub-exponential case, large deviations are driven by single big jumps [42]. For random walks, the exponential case can combine both phenomena: the Cramer/WKB-type result applies in a region of the sample space called the Cramer zone, while single big jumps contribute to deviations beyond the Cramer zone [43, 44].
In this paper, the standard WKB-type approach will be shown to be suitable for the drift–jump gene expression model with positive feedback in burst size. If the feedback is negative, the WKB approach will be shown to apply below a certain threshold (referred to, by analogy with random walks, as the Cramer zone), whereas beyond the threshold (referred to as the tail zone) single big jumps contribute to large deviations. Matched asymptotic approximations to the stationary distribution in the Cramer zone, in the tail zone, and on their boundary will be constructed using a formal singular perturbation approach [45–47].
The structure of the paper is as follows. Section 2 formulates the model. Section 3 presents the standard WKB approximation scheme. The core of the paper is Section 4, in which the modified WKB scheme is given. The boundary layer is treated in Section 5. The asymptotic results are cross-validated by simulations in Section 6. The paper is concluded in Section 7.
2 Model formulation
The drift–jump gene-expression model is a Markov process with piecewise continuous sample paths (Figure 1, left panel). The state x of the process represents the concentration of a gene product (say a protein, for concreteness). The discontinuities in the sample path are the production bursts. Between bursts, the protein concentration decays deterministically at rate γ(x), i.e. as per $dx/dt = -\gamma(x)$. Bursts occur with state-dependent frequency (propensity) $\varepsilon^{-1}\alpha(x)$. Burst sizes are drawn from an exponential distribution with rate parameter $\varepsilon^{-1}\nu(x)$, in which x is the state of the process immediately before the burst; the reciprocal $\varepsilon/\nu(x)$ of the rate parameter gives the mean burst size. Decreasing the noise strength ε makes bursts more frequent and smaller. The functions α(x), ν(x), and γ(x) can implement feedback in burst frequency, burst size, and protein stability, respectively (Figure 1, right panel).
The probability density function p(x, t) of being at state x at time t satisfies the integro-differential equation

$$\frac{\partial p}{\partial t} + \frac{\partial J}{\partial x} = 0, \tag{1}$$

$$J(x,t) = -\gamma(x)\,p(x,t) + \frac{1}{\varepsilon}\int_0^x \alpha(y)\,e^{-\nu(y)(x-y)/\varepsilon}\,p(y,t)\,dy. \tag{2}$$

In the conservation equation (1), J = J(x, t) gives the flux of probability across a reference state x at time t. By (2), it consists of a negative local flux due to deterministic decay and a positive non-local flux due to stochastic bursts. The non-local term integrates, over all states y < x, the rate $\varepsilon^{-1}\alpha(y)p(y,t)$ at which bursts occur from state y, multiplied by the exponential probability that the burst goes beyond the reference state x.
Estimating the integral in (2) by the Laplace method [48] as ε → 0, we obtain J ∼ (α(x)/ν(x) − γ(x))p(x, t), which is the probability flux of the purely deterministic process

$$\frac{dx}{dt} = \frac{\alpha(x)}{\nu(x)} - \gamma(x). \tag{3}$$

Equation (3) is the deterministic limit of (1)–(2) (sometimes also referred to as the fluid limit or the law-of-large-numbers limit). Retaining a further term in the asymptotic expansion of the non-local term leads to an ad hoc drift–diffusion approximation to the drift–jump process [49]. Such truncations exhibit different ε → 0 asymptotics than the original problem [50].
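The Laplace estimate of the burst term can be checked numerically. The following is a minimal sketch, not the author's code: `burst_flux` is a hypothetical helper that evaluates the non-local term of (2) for a given density by adaptive quadrature (SciPy's `quad`), and the ingredients α(y) = 1, ν(y) = 1 + y, p(y) = e^{−y} are arbitrary illustrative choices. The ratio to α(x)p(x)/ν(x) should approach 1 as ε decreases.

```python
import math
from scipy.integrate import quad


def burst_flux(x, eps, alpha, nu, p):
    """Non-local burst term of the flux (2) at state x for a given density p.

    Evaluates eps**-1 * int_0^x alpha(y) exp(-nu(y) (x - y) / eps) p(y) dy after
    substituting y = x - eps * s, which absorbs the 1/eps prefactor.
    """
    def integrand(s):
        y = x - eps * s
        return alpha(y) * math.exp(-nu(y) * s) * p(y)

    s_max = x / eps
    # Assumes nu(y) >= 1 on (0, x), so the integrand is below exp(-50) beyond s = 50.
    cut = min(s_max, 50.0)
    head, _ = quad(integrand, 0.0, cut, limit=200)
    tail, _ = quad(integrand, cut, s_max, limit=200) if s_max > cut else (0.0, 0.0)
    return head + tail


# Illustrative ingredients (hypothetical): alpha = 1, nu(y) = 1 + y, p(y) = exp(-y).
alpha = lambda y: 1.0
nu = lambda y: 1.0 + y
p = lambda y: math.exp(-y)

x = 1.0
laplace = alpha(x) * p(x) / nu(x)  # leading-order Laplace estimate of the burst term
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, burst_flux(x, eps, alpha, nu, p) / laplace)
```

For these choices, expanding the integrand to first order in ε (our own calculation) gives a relative correction of approximately +ε, so the printed ratios should approach 1 roughly linearly in ε.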
Equating the flux in (2) to zero, we obtain the Volterra integral master equation

$$\gamma(x)\,p(x) = \frac{1}{\varepsilon}\int_0^x \alpha(y)\,e^{-\nu(y)(x-y)/\varepsilon}\,p(y)\,dy \tag{4}$$

for the stationary distribution. Multiplying a solution p(x) to (4) by a constant gives another solution. The multiplicative constant can be fixed by requiring that the total probability integrate to one. However, the dependence of the normalisation constant on ε introduces unnecessary complications in the asymptotic expansions; we defer the normalisation until Section 6.
The principal aim of Sections 3–5 is to characterise the ε → 0 asymptotics of solutions p(x) = p(x; ε) to the Volterra master equation (4).
3 Standard WKB scheme
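We seek an approximate solution to (4) in the WKB form

$$p(x;\varepsilon) \sim r(x;\varepsilon)\,e^{-\Phi(x)/\varepsilon}, \tag{5}$$

$$r(x;\varepsilon) = r_0(x) + \varepsilon\, r_1(x) + O(\varepsilon^2), \tag{6}$$

where a regular dependence of the prefactor on ε is postulated. The function Φ(x) in (5) is referred to as the quasipotential.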
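Inserting (5) into (4) gives

$$\gamma(x)\,r(x;\varepsilon)\,e^{-\Phi(x)/\varepsilon} = \frac{1}{\varepsilon}\int_0^x \alpha(y)\,r(y;\varepsilon)\,e^{-\Psi(x,y)/\varepsilon}\,dy, \tag{7}$$

where

$$\Psi(x,y) = \Phi(y) + \nu(y)(x-y). \tag{8}$$

Differentiating (8) with respect to y and setting y = x gives the relations

$$\Psi(x,x) = \Phi(x), \qquad \partial_y\Psi(x,y)\big|_{y=x} = \Phi'(x) - \nu(x), \qquad \partial_y^2\Psi(x,y)\big|_{y=x} = \Phi''(x) - 2\nu'(x), \tag{9}$$

which tie up the local behaviour of Ψ(x, y) near the boundary y = x and that of the (yet unknown) quasipotential.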
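Provided that

$$\Psi(x,y) > \Psi(x,x) \quad \text{for all } 0 < y < x, \tag{10}$$

the dominant contribution to the integral on the right-hand side of (7) comes from an O(ε)-wide neighbourhood of the right boundary y = x. Estimating the integral in (7) by the Laplace method and cancelling the common exponential term gives

$$\gamma(x)\,r(x;\varepsilon) = \frac{\alpha(x)\,r(x;\varepsilon)}{-\partial_y\Psi(x,x)} - \varepsilon\,\frac{\big(\alpha(x)\,r(x;\varepsilon)\big)'}{\big(\partial_y\Psi(x,x)\big)^2} + \varepsilon\,\frac{\alpha(x)\,r(x;\varepsilon)\,\partial_y^2\Psi(x,x)}{\big(\partial_y\Psi(x,x)\big)^3} + O(\varepsilon^2), \tag{11}$$

where the y-derivatives of Ψ are evaluated at y = x. Inserting (6) and (9) into (11), and collecting O(1) terms, yields the quasipotential

$$\Phi(x) = \int \Big(\nu(x) - \frac{\alpha(x)}{\gamma(x)}\Big)\,dx, \tag{12}$$

while collecting O(ε) terms determines the prefactor

$$r_0(x) = \frac{1}{\gamma(x)}\exp\Big(\int \frac{\gamma(x)\,\nu'(x)}{\alpha(x)}\,dx\Big). \tag{13}$$

The constants of integration in the indefinite integrals in (12)–(13) add up to the normalisation constant in the probability distribution (5) and can be chosen arbitrarily.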
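The algebra leading from the Laplace estimate to the quasipotential and the prefactor can be spelled out as follows. This is a sketch of our own working under the expansion (6), writing λ(x) = ν(x) − Φ'(x) for −∂yΨ(x, y) at y = x and using ∂²yΨ = Φ'' − 2ν' there; the r1 contributions cancel at O(ε) because of the O(1) balance.

$$\gamma r = \frac{\alpha r}{\lambda} - \varepsilon\left[\frac{(\alpha r)'}{\lambda^2} + \frac{\alpha r\,(\Phi''-2\nu')}{\lambda^3}\right] + O(\varepsilon^2), \qquad \lambda = \nu - \Phi',$$

$$O(1):\quad \lambda = \frac{\alpha}{\gamma} \;\Longrightarrow\; \Phi' = \nu - \frac{\alpha}{\gamma},$$

$$O(\varepsilon):\quad \frac{(\alpha r_0)'}{\alpha r_0} = \frac{2\nu' - \Phi''}{\lambda} = \frac{\lambda' + \nu'}{\lambda} \;\Longrightarrow\; \alpha r_0 \propto \lambda\exp\Big(\int\frac{\nu'}{\lambda}\,dx\Big) = \frac{\alpha}{\gamma}\exp\Big(\int\frac{\gamma\nu'}{\alpha}\,dx\Big).$$

Dividing by α gives the prefactor; for constant ν the exponential factor equals 1 and r0 = 1/γ.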
The weak point of this section is the assumption (10). Combining (8) and (12), we see that

$$\partial_y\Psi(x,y) = \nu'(y)(x-y) - \frac{\alpha(y)}{\gamma(y)}. \tag{14}$$

If ν(x) is decreasing (positive feedback in burst size), then ∂yΨ(x, y) < 0 for y ≤ x, which confirms (10) post hoc. If ν(x) is constant (no feedback in burst size), then $r_0(x)e^{-\Phi(x)/\varepsilon}$ with (12)–(13) is the exact solution to (4) [31]. The case of negative feedback in burst size requires a subtler analysis, which is the subject of the rest of the paper.
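Both statements can be probed numerically by evaluating the residual of (4) for the WKB form. The sketch below is our own illustration: `relative_residual` is a hypothetical helper; Φ and r0 are taken in the forms Φ' = ν − α/γ and r0 = γ^{-1} exp∫γν'/α (our reading of (12)–(13)); and ν(x) = 2/(1 + x) with α = 1, γ(x) = x is an arbitrary positive-feedback example. For constant ν the residual should vanish up to quadrature error. With r0 alone, our own expansion predicts a relative residual of order ε², so halving ε should divide it by about 4.

```python
import math
from scipy.integrate import quad


def relative_residual(x, eps, alpha, gamma, nu, phi, r0):
    """Relative residual RHS/LHS - 1 of p = r0 * exp(-phi/eps) in equation (4).

    The common factor exp(-phi(x)/eps) is divided out analytically, and the
    substitution y = x - eps * s absorbs the 1/eps in front of the integral.
    """
    def integrand(s):
        y = x - eps * s
        exponent = -(phi(y) - phi(x)) / eps - nu(y) * s
        return alpha(y) * r0(y) * math.exp(exponent)

    # Truncation tuned to the examples below, where the integrand is negligible beyond it.
    s_max = min(0.9 * x / eps, 200.0)
    rhs, _ = quad(integrand, 0.0, s_max, limit=400, epsabs=1e-14, epsrel=1e-11)
    return rhs / (gamma(x) * r0(x)) - 1.0


one = lambda x: 1.0
ident = lambda x: x

# No feedback in burst size (nu = 1): the WKB form is the exact gamma-type solution.
res_const = relative_residual(1.0, 0.05, one, ident, one,
                              lambda x: x - math.log(x), lambda x: 1.0 / x)

# Positive feedback in burst size, nu(x) = 2 / (1 + x) (illustrative choice).
nu_pf = lambda x: 2.0 / (1.0 + x)
phi_pf = lambda x: 2.0 * math.log(1.0 + x) - math.log(x)
r0_pf = lambda x: math.exp(-2.0 / (1.0 + x)) / (x * (1.0 + x) ** 2)
res_pf = [relative_residual(1.5, eps, one, ident, nu_pf, phi_pf, r0_pf)
          for eps in (2e-3, 1e-3)]
print(res_const, res_pf, res_pf[0] / res_pf[1])
```

Working in the substituted variable s keeps all exponents non-positive, so nothing overflows even though exp(−Φ/ε) itself would underflow for small ε.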
4 Modified WKB scheme
From now on, we refer to the function Φ(x) defined by (12) as the local potential. The name reflects the fact that its derivation involved a local estimate of the integral in the Volterra master equation (7). We assume that the local potential satisfies the conditions (15). These are satisfied e.g. by choosing

$$\alpha(x) = 1, \qquad \gamma(x) = x, \qquad \nu(x) = x^m. \tag{16}$$

Graphical examples in this section pertain to the parametric choice (16). The following subsection examines the behaviour of Ψ(x, y) defined by (8) and constructs a modified potential.
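For the power example α(x) = 1, γ(x) = x, ν(x) = x^m, the integrals in (12)–(13) can be evaluated in closed form. The expressions below are our own evaluation, assuming Φ' = ν − α/γ and r0 = γ^{-1} exp∫γν'/α, with the integration constants set to zero:

$$\Phi(x) = \frac{x^{m+1}}{m+1} - \ln x, \qquad r_0(x) = \frac{1}{x}\exp\Big(\frac{m\,x^{m+1}}{m+1}\Big),$$

$$\Psi(x,y) = \frac{y^{m+1}}{m+1} - \ln y + y^m(x-y), \qquad \partial_y\Psi(x,y) = m\,y^{m-1}(x-y) - \frac{1}{y}.$$

The local potential is minimised at x = 1, the stable steady state of (3). For m > 0, setting ∂yΨ = 0 gives x = y + 1/(m y^m); as a function of y > 0, the right-hand side attains its minimum 1 + 1/m at y = 1. Hence Ψ(x, ·) has critical points only for x ≥ 1 + 1/m, and for larger x it has two, the lower one being a local minimum.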
4.1 Modified potential
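For any fixed y > 0, equation Ψ(x, y) = Φ(x) in the unknown x has two roots, the trivial one x = y, and a non-trivial one such that x > y (Figure 2, left). As a function of x, Ψ(x, y) is a straight line of slope ν(y); it lies above Φ immediately to the right of x = y and is crossed by Φ from below at the non-trivial root. Comparing the slopes of Φ(x) and Ψ(x, y) at their non-trivial intersection, we therefore obtain

$$\Phi'(x) > \nu(y). \tag{17}$$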
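Let us look at the same equation but reverse the dependency between the two variables. For any fixed x > 0, equation Ψ(x, y) = Φ(x) in the unknown y has a trivial root y = x, a non-trivial root y = y* < x* if x = x*, and two non-trivial roots if x > x* (Figure 2, right, dotted line); the critical pair (x*, y*) satisfies

$$\Psi(x^*,y^*) = \Phi(x^*), \qquad \partial_y\Psi(x^*,y^*) = 0. \tag{18}$$

Note that (17) implies that

$$\Phi'(x^*) > \nu(y^*). \tag{19}$$

For x ≥ x*, the function Ψ(x, y) is minimised with respect to y ∈ (0, x] by (cf. Figure 2, right, solid line)

$$\min_{0<y\le x}\Psi(x,y) = \Psi\big(x, y_m(x)\big), \tag{20}$$

where y_m(x) is the lower branch of the critical equation

$$\partial_y\Psi(x,y) = \nu'(y)(x-y) - \frac{\alpha(y)}{\gamma(y)} = 0. \tag{21}$$

We define the modified potential as

$$\tilde\Phi(x) = \begin{cases} \Phi(x), & x \le x^*, \\ \Psi\big(x, y_m(x)\big), & x > x^*. \end{cases} \tag{22}$$

The region x < x* will be referred to as the Cramer zone, and the complementary region x > x* as the tail zone. Since ∂yΨ(x, y_m(x)) = 0, the derivative of the potential in the tail zone satisfies

$$\tilde\Phi'(x) = \partial_x\Psi\big(x,y_m(x)\big) = \nu\big(y_m(x)\big), \qquad x > x^*. \tag{23}$$

Combining (12), (23), and (19), and noting that y_m(x*) = y*, we find

$$\tilde\Phi'(x^{*+}) = \nu(y^*) < \Phi'(x^*) = \nu(x^*) - \frac{\alpha(x^*)}{\gamma(x^*)} = \tilde\Phi'(x^{*-}), \tag{24}$$

meaning that the derivative of the modified potential is discontinuous at the boundary of the Cramer zone.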
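For the power example with α = 1 and γ(x) = x, the critical pair and the modified potential can be computed with a few one-dimensional root solves. This is a minimal sketch under stated assumptions: m = 2, Φ(x) = x^{m+1}/(m+1) − ln x (the integral (12) with zero integration constant), hypothetical function names, and root brackets that were only checked for moderate m. It uses the fact that on the critical curve m y^m (x − y) = 1 the difference Ψ(x, y) − Φ(x) reduces to Φ(y) + 1/m − Φ(x).

```python
import math
from scipy.optimize import brentq

M = 2.0  # exponent m in nu(x) = x**m; an illustrative value


def phi(x, m=M):
    """Local potential (12) for alpha = 1, gamma(x) = x, nu(x) = x**m."""
    return x ** (m + 1) / (m + 1) - math.log(x)


def psi(x, y, m=M):
    """Psi(x, y) = Phi(y) + nu(y) * (x - y), as in (8)."""
    return phi(y, m) + y ** m * (x - y)


def critical_x(y, m=M):
    """The x at which d Psi / dy vanishes at y, i.e. m * y**m * (x - y) = 1."""
    return y + 1.0 / (m * y ** m)


def critical_pair(m=M):
    """Solve Psi(x*, y*) = Phi(x*) together with d Psi / dy (x*, y*) = 0."""
    gap = lambda y: phi(critical_x(y, m), m) - phi(y, m) - 1.0 / m
    y_star = brentq(gap, 0.1, 1.0)  # bracket checked for m = 1, 2, 3 only
    return critical_x(y_star, m), y_star


def y_min(x, m=M):
    """Lower branch y_m(x) < 1 of the critical equation (requires x >= 1 + 1/m)."""
    lo = (1.0 / (2.0 * m * x)) ** (1.0 / m)  # here critical_x(lo) > 2x > x
    return brentq(lambda y: critical_x(y, m) - x, lo, 1.0)


def modified_potential(x, x_star, m=M):
    """Modified potential: Phi(x) in the Cramer zone, Psi(x, y_m(x)) in the tail zone."""
    if x <= x_star:
        return phi(x, m)
    return psi(x, y_min(x, m), m)


x_star, y_star = critical_pair()
print(f"x* = {x_star:.4f}, y* = {y_star:.4f}")
```

In the tail zone, a finite-difference slope of `modified_potential` should agree with ν(y_m(x)) = y_m(x)^m, as the envelope argument behind (23) predicts, while the slope of Φ at x* exceeds ν(y*).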
The purpose of the remainder of this section is to use the modified potential (22) as a basis for a WKB-type approximation to the solution p(x; ε) to the integral equation (4). In the Cramer zone, condition (10) is satisfied and the standard procedure of Section 3 yields

$$p(x;\varepsilon) \sim r_0(x)\,e^{-\Phi(x)/\varepsilon}, \qquad x < x^*, \tag{25}$$

where the prefactor is defined by (13). The next subsection argues that the modified potential (22) is appropriate outside the Cramer zone.
4.2 Dominant balance
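If we look for a solution p(x; ε) to (4) in a form that is logarithmically equivalent to $e^{-\tilde\Phi(x)/\varepsilon}$, where $\tilde\Phi$ is the modified potential (22), then the integrand on the right-hand side is logarithmically equivalent to $e^{-\tilde\Psi(x,y)/\varepsilon}$, where

$$\tilde\Psi(x,y) = \tilde\Phi(y) + \nu(y)(x-y). \tag{26}$$

Let us investigate the behaviour of $\tilde\Psi(x,y)$ as a function of y ∈ (0, x] for a fixed x > x*. The Cramer and the tail regions are treated separately: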
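y ≤ x*. Here (22) and (26) give

$$\tilde\Psi(x,y) = \Psi(x,y), \tag{27}$$

and, by (20),

$$\Psi(x,y) \ge \Psi\big(x,y_m(x)\big) = \tilde\Phi(x), \tag{28}$$

with equality if y = y_m(x).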
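y ≥ x*. Here

$$\tilde\Psi(x,y) = \Psi\big(y,y_m(y)\big) + \nu(y)(x-y) \tag{29}$$

$$\ge \Psi\big(y,y_m(y)\big) + \nu\big(y_m(y)\big)(x-y) = \Psi\big(x,y_m(y)\big) \tag{30}$$

$$\ge \Psi\big(x,y_m(x)\big) = \tilde\Phi(x), \tag{31}$$

where the estimate (30) holds for a non-decreasing ν(x) (negative feedback in burst size) and the estimate (31) follows from (20); both estimates become equalities if y = x.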
The upshot of (27)–(31) is that

$$\min_{0<y\le x}\tilde\Psi(x,y) = \tilde\Phi(x), \tag{32}$$

the minimum being attained at y = y_m(x) and at y = x. The integral on the right-hand side of (4) will therefore be logarithmically equivalent to $e^{-\tilde\Phi(x)/\varepsilon}$, which is the asymptotics postulated for the solution. The use of the modified WKB potential (22) thus balances the two sides of the master equation (4), at least to logarithmic precision.
Important contributions to the integral term in (4) come from the neighbourhoods of the minimisers y = y_m(x) and y = x of $\tilde\Psi(x,y)$ (as a function of y ∈ (0, x] for a fixed x > x*). The function $\tilde\Psi(x,\cdot)$ is locally parabolic near the internal minimiser y = y_m(x) < x*, but locally linear near the boundary minimiser y = x > x*. By the Laplace method [48], an O(ε^{1/2}) neighbourhood of the parabolic minimiser, but only an O(ε) neighbourhood of the linear minimiser, contributes. In order to balance the contributions, we compensate at the level of the prefactor, seeking the solution outside the Cramer zone in the form

$$p(x;\varepsilon) \sim \varepsilon^{-1/2}\rho(x)\,e^{-\tilde\Phi(x)/\varepsilon}, \qquad x > x^*. \tag{33}$$

The next subsection determines the prefactor ρ(x) outside the Cramer zone.
4.3 The prefactor outside the Cramer zone
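Inserting the WKB expansions (25) and (33) into the Volterra master equation (4), we find that for ε^{1/2} ≪ δ ≪ 1 we have

$$\gamma(x)\,\frac{\rho(x)}{\varepsilon^{1/2}}\,e^{-\tilde\Phi(x)/\varepsilon} \sim \frac{1}{\varepsilon}\int_{y_m(x)-\delta}^{y_m(x)+\delta}\alpha(y)\,r_0(y)\,e^{-\Psi(x,y)/\varepsilon}\,dy + \frac{1}{\varepsilon^{3/2}}\int_{x-\delta}^{x}\alpha(y)\,\rho(y)\,e^{-\tilde\Psi(x,y)/\varepsilon}\,dy. \tag{34}$$

Estimating the integrals by the Laplace method, cancelling the common exponential term, and collecting at the leading order, we obtain

$$\gamma(x)\rho(x) = \alpha\big(y_m(x)\big)\,r_0\big(y_m(x)\big)\sqrt{\frac{2\pi}{\partial_y^2\Psi\big(x,y_m(x)\big)}} + \frac{\alpha(x)\rho(x)}{-\partial_y\tilde\Psi(x,x)}. \tag{35}$$

Differentiating (26) with respect to y and using (23) gives

$$\partial_y\tilde\Psi(x,y) = \nu\big(y_m(y)\big) - \nu(y) + \nu'(y)(x-y), \qquad y > x^*. \tag{36}$$

We set y = x in (36) and insert the result into (35), arriving at

$$\rho(x) = \frac{\alpha\big(y_m(x)\big)\,r_0\big(y_m(x)\big)\sqrt{2\pi/\partial_y^2\Psi\big(x,y_m(x)\big)}}{\gamma(x) - \alpha(x)\big/\big(\nu(x) - \nu(y_m(x))\big)}. \tag{37}$$

Inequality (17) ensures that the denominator in (37) is positive (including at the boundary x = x*).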
In the next section, we tie up the loose ends in the approximation scheme by constructing an inner solution in a neighbourhood of the Cramer boundary x = x* that matches (25) to the left and (33) to the right.
5 Boundary layer
The discontinuity in the potential derivative (24) and the mismatch of prefactor magnitudes in (25) and (33) suggest the presence of a boundary layer near x = x*. We define the inner variable ξ via the transformation

$$x = x^* + \kappa\,\varepsilon\ln\varepsilon + \varepsilon\,\xi, \tag{38}$$

where the constant κ > 0 will be specified later. Qualitatively, as x increases towards x*, the integral in the Volterra equation begins to feel the ‘ghost’ of the internal minimum of Ψ(x*, y) (Figure 2, right panel): the local approximation of Section 3 breaks down before x* is reached. The qualitative notion is made quantitative in the rest of the section. Subsection 5.1 constructs the inner solution that is valid in the boundary layer ξ = O(1). Subsection 5.2 matches the inner solution to the WKB approximations that are valid outside the boundary layer.
5.1 Inner solution
The inner solution is sought to be proportional to a regular function of the inner variable: We divide the integration interval in (4) into $0 < y < x_o$ and $x_o < y < x$, where $x_o$ belongs to the overlap of the WKB approximation (25) and the inner approximation (39).
In the first interval, the integral is estimated by means of the WKB approximation (25) and the Laplace method as In the second interval, the substitution $y = x^* + \kappa\varepsilon\ln\varepsilon + \varepsilon\eta$ and the inner approximation (39) give an asymptotic estimate Requiring that (40) and (41) be of the same order implies for the proportionality constant in the inner solution (39).
Inserting (39), (40), and (41) into the Volterra master equation (4), and then dividing by C(ε), yields Multiplying (43) by and differentiating with respect to ξ turns the integral equation (43) into a differential equation Solving (44) yields where is found by the method of undetermined coefficients and A is a constant of integration, which will be determined by asymptotic matching to the outer solution.
5.2 Matching
Two constants need to be determined to complete the inner solution, namely the coefficient κ in the transformation (38) and the constant of integration A in (45).
These will be calculated in Section 5.2.2 by matching to the WKB solution (25) inside the Cramer zone. Before doing so, we demonstrate that the inner solution asymptotically matches the WKB solution (33) outside the Cramer zone.
5.2.1 Matching to the right
Owing to the inequality (17), the second term in the inner solution (45) dominates for ξ → ∞; inserting it and (42) into (39) gives in the overlap of the inner solution and the outer solution to its right.
On the other hand, inserting the transformation (38) into the outer solution (33), re-expanding, and using (23) gives in the overlap. Comparing (47) and (48), we find B = ρ(x*), which is consistent with (37) and (46).
5.2.2 Matching to the left
As ξ → −∞, the first term in (45) dominates, so that in the overlap of the inner solution and the outer solution to its left. On the other hand, inserting (38) into the outer solution (25) gives Comparing (49) with (50) yields the constants (51); inequality (19) thereby guarantees that κ > 0, as announced at the beginning of the boundary-layer analysis. Equations (51) complete the inner solution and thus the asymptotic analysis of (4).
6 Numerical solution
Before being compared to a numerical solution, the asymptotic solutions are normalised by their total probability (52). The integral of the WKB solution over the tail zone is exponentially smaller than the integral over the Cramer zone and can be neglected in (52). In principle, the Cramer-zone integral can be estimated by the Laplace method, i.e. by the local contribution from the minimiser of the potential Φ(x). In practice, however, doing so introduces a relatively large numerical error. Instead, the normalisation constant can be calculated by numerical quadrature of (52).
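Quadrature of an exponentially peaked density needs some care to avoid overflow and underflow. A minimal sketch, assuming the unnormalised density is supplied through its logarithm (in the setting above, log p = ln r0(x) − Φ(x)/ε on the Cramer zone 0 < x < x*); `log_normalisation` is a hypothetical helper built on SciPy's `quad`. The usage below checks it on a gamma density, whose normalisation constant Γ(a)ε^a is known exactly.

```python
import math
import numpy as np
from scipy.integrate import quad


def log_normalisation(log_p, x_lo, x_hi, n_grid=2001):
    """Logarithm of int_{x_lo}^{x_hi} exp(log_p(x)) dx by adaptive quadrature.

    The integrand is rescaled by its maximum on a grid to avoid overflow/underflow;
    the grid skips the end points, where log_p may be singular.
    """
    grid = np.linspace(x_lo, x_hi, n_grid)[1:-1]
    shift = max(log_p(x) for x in grid)
    value, _ = quad(lambda x: math.exp(log_p(x) - shift), x_lo, x_hi, limit=200)
    return shift + math.log(value)


# Check on the gamma-type solution x**(a - 1) * exp(-x / eps) with a = 1 / eps.
eps = 0.05
a = 1.0 / eps
log_p = lambda x: (a - 1.0) * math.log(x) - x / eps
print(log_normalisation(log_p, 0.0, 5.0), math.lgamma(a) + a * math.log(eps))
```

Working with log p rather than p keeps the computation stable even when exp(−Φ/ε) is far below the smallest representable double.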
For the numerical solution, sample paths $x_i(t)$, i = 1, …, N, 0 ≤ t ≤ T, subject to $x(0) = x_0$ are generated using the exact stochastic simulation algorithm (see the Appendix). The solution is constructed by the histogram method from the dataset of final-time values $\{x_i(T)\}_{i=1,\dots,N}$. Specifically, we divide an interval $[0, x_{\max}]$ into n equally sized bins, count the number of data points in each bin, and divide the counts by $N x_{\max}/n$ so as to normalise them into a probability density. The histogram estimate is close to the exact solution p(x; ε) to the Volterra master equation (4) if the number of samples N is large (so that the statistical error is small) and the simulation end time T is large (so that the process equilibrates to steady state).
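The histogram step can be written in a few lines of NumPy. This is a sketch with a hypothetical function name; it follows the normalisation described above, dividing by N x_max/n with N counting all samples, including any that fall outside [0, x_max].

```python
import numpy as np


def histogram_density(samples, x_max, n_bins):
    """Histogram estimate of the density on [0, x_max] using n_bins equal bins.

    Counts are divided by N * x_max / n_bins, where N is the total number of
    samples (samples beyond x_max still count towards N).
    """
    samples = np.asarray(samples, dtype=float)
    counts, edges = np.histogram(samples, bins=n_bins, range=(0.0, x_max))
    density = counts / (samples.size * x_max / n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density


centres, density = histogram_density([0.1, 0.2, 0.6, 0.7], x_max=1.0, n_bins=2)
print(centres, density)
```

Unlike `np.histogram(..., density=True)`, which renormalises over the in-range samples only, this keeps the estimate comparable with a density normalised over the whole half-line.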
Figure 3 compares the three matched asymptotic approximations to the numerical solution for selected values of the noise strength ε. Decreasing ε leads to close agreement between the numerical solution and the asymptotic approximations in their respective regions of validity (Figure 3, top panels). As ε decreases further (Figure 3, bottom panels), the Cramer-boundary and tail behaviour become exponentially improbable and cannot be reliably estimated from a feasible number (say a billion) of samples. Nevertheless, the chosen examples demonstrate that the naive solution, which extends (25) outside the Cramer zone, underestimates the tail of the stationary distribution, whereas the boundary-layer and tail approximations provide an adequate description.
7 Conclusion
This paper provides matched asymptotic approximations to the stationary distribution of a drift–jump model for stochastic gene expression. The analysis revolves around the estimation of the integral term in the Volterra master equation (4). The integral term represents the flux of probability due to production bursts through a reference state x. In the Cramer region (x < x*), the flux consists solely of local contributions (y ≈ x), whereas in the tail region (x > x*), a contribution also comes from the interior of the integration interval (y ≈ y_m(x) < x*). The latter corresponds to the ‘single big jumps’ mentioned in the abstract.
Negative feedback in burst size is a prerequisite for the singular behaviour in question. Conceptually, in the presence of negative feedback in burst size, it is ‘cheaper’ to hunker down and then take a giant leap, than to climb up with tiny steps. The result is thus in agreement with the broad principle that any large deviation occurs in the least unlikely of all the unlikely ways [51].
The analysis is formulated for general feedback responses satisfying certain constraints. A particular specimen, the power non-linearity $\nu(x) = x^m$, has been the main example throughout this text. The coefficient m can be interpreted as the number of protein molecules that need to cooperate to repress the production burst. The solution to the Volterra equation (4) with a power non-linearity has previously been shown to satisfy as x → 0 and as x → ∞, where $c_1, c_2 > 0$ [52]. The same study provided a central-limit-theorem-type approximation that is valid as ε → 0 for |x − 1| = O(ε^{1/2}). The current study thus contributes approximations that apply as ε → 0 throughout the state space x > 0. The popular Hill-type non-linearity $\nu(x) = 1/(1 + (x/K)^m)$ can be reduced to the power non-linearity by means of a simple transformation [52]. The conclusions arrived at for the power non-linearity thus extend easily to the Hill-type response.
Earlier studies argued that the subtleties that arise with feedback in burst size are an artefact of delay [32, 33]. Indeed, the memoryless property of the exponential distribution of burst sizes implies a lack of control at the infinitesimal timescale of burst growth. In light of this argument, the current results contribute to the understanding of the interplay between bursting and delay in biological systems [53–57].
Appendix
Stochastic simulation algorithm
Here we provide a stochastic simulation algorithm that can be used to generate a sample path x(t) of the process on a time interval [0, T] subject to an initial condition x(0) = x0. Like the well-known Gibson–Bruck/Gillespie algorithm, the algorithm does not introduce truncation errors, but only statistical and round-off errors, and in this specific sense it is an exact simulation algorithm. For simplicity, we focus on the situation in which the feedback acts only on burst size but not on burst frequency or protein stability; the general case is discussed at the end of the appendix.
Each sample path is generated iteratively as follows. Assume that the sample path x(t) has already been generated on an interval $0 \le t \le t_{\rm cur}$ (initially $t_{\rm cur} = 0$ and x(0) = x0 is an initial value). Assuming the absence of feedback in burst frequency (α(x) = 1), the exponentially distributed waiting time until the coming burst is sampled by the inversion method as

$$\tau = -\varepsilon\ln\theta, \tag{53}$$

where θ is drawn from the uniform distribution on the unit interval. Assuming the absence of feedback in protein stability (γ(x) = x), the sample path decays exponentially until the coming burst:

$$x(t) = x(t_{\rm cur})\,e^{-(t - t_{\rm cur})}, \qquad t_{\rm cur} \le t < t_{\rm cur} + \tau. \tag{54}$$

At the time of the next burst, the sample path is increased by the exponentially distributed burst size:

$$x(t_{\rm cur} + \tau) = x(t^-) - \frac{\varepsilon}{\nu\big(x(t^-)\big)}\ln\theta', \tag{55}$$

where $x(t^-) = x(t_{\rm cur})e^{-\tau}$ denotes the state of the sample path immediately before the burst; the variate θ′ is drawn from the uniform distribution on the unit interval independently of θ. Thus one round of iteration via (53), (54), and (55) extends the sample path from the interval $[0, t_{\rm cur}]$ to the interval $[0, t_{\rm cur} + \tau]$. The algorithm is repeated until the state x(T) at a required end time T > 0 is found.
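The iteration (53)–(55) translates directly into a short loop. This is a minimal Python sketch, not the author's implementation: the function name `simulate_final_state` and its signature are hypothetical, α(x) = 1 and γ(x) = x are assumed as in the text, and the uniform variates are drawn as `1 - rng.random()` so that they lie in (0, 1] and the logarithms stay finite. If the next burst would occur after T, the path is simply decayed up to T.

```python
import math
import numpy as np


def simulate_final_state(x0, T, eps, nu, rng):
    """One exact sample path of the drift-jump process, returning x(T).

    Assumes alpha(x) = 1 (burst frequency 1/eps) and gamma(x) = x (linear decay);
    nu(x) is the burst-size rate function, so bursts have mean eps / nu(x).
    """
    t, x = 0.0, x0
    while True:
        theta = 1.0 - rng.random()            # uniform on (0, 1], keeps log finite
        tau = -eps * math.log(theta)          # (53): waiting time to the next burst
        if t + tau >= T:
            return x * math.exp(-(T - t))     # (54): decay up to T, no burst before T
        x_minus = x * math.exp(-tau)          # (54): decay until just before the burst
        theta_b = 1.0 - rng.random()          # independent uniform variate for the burst
        x = x_minus - eps / nu(x_minus) * math.log(theta_b)  # (55): exponential burst size
        t += tau


rng = np.random.default_rng(1)
# Negative feedback nu(x) = x**2 (the power example with m = 2), eps = 0.1.
finals = np.array([simulate_final_state(1.0, 10.0, 0.1, lambda x: x ** 2, rng)
                   for _ in range(1000)])
print(finals.mean(), finals.std())
```

The final states can then be binned by the histogram method of Section 6. As a sanity check, with constant ν = 1 the stationary distribution is a gamma distribution with mean 1 and variance ε.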
The algorithm can be modified to account for feedback in burst frequency and protein stability. If feedback in burst frequency is present, the waiting time needs to be drawn from a distribution with a non-constant hazard function [8]. If feedback in protein stability is present, the sample path needs to be evolved as per $dx/dt = -\gamma(x)$ between bursts.
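One way to handle both generalisations at once is to integrate the decay ODE together with the cumulative hazard and stop when the hazard reaches −ln θ. This is our own sketch, not necessarily the method of [8]; it uses SciPy's `solve_ivp` with a terminal event, and the function name `next_burst` is hypothetical. It returns the waiting time, the state immediately before the burst, and a flag; if no burst occurs before `t_max`, it returns the state at `t_max` instead.

```python
import math
from scipy.integrate import solve_ivp


def next_burst(x, eps, alpha, gamma, theta, t_max):
    """Waiting time to the next burst with state-dependent frequency and decay.

    Integrates dx/dt = -gamma(x) together with the cumulative hazard
    H(t) = int_0^t alpha(x(s)) / eps ds and stops when H reaches -ln(theta).
    Returns (tau, x_before_burst, True), or (t_max, x(t_max), False) if no burst
    occurs before t_max.
    """
    target = -math.log(theta)

    def rhs(t, z):
        # z[0] is the protein concentration, z[1] the cumulative hazard
        return [-gamma(z[0]), alpha(z[0]) / eps]

    def hazard_reached(t, z):
        return z[1] - target
    hazard_reached.terminal = True
    hazard_reached.direction = 1

    sol = solve_ivp(rhs, (0.0, t_max), [x, 0.0], events=hazard_reached,
                    rtol=1e-10, atol=1e-12)
    if sol.t_events[0].size == 0:
        return t_max, sol.y[0, -1], False     # no burst before t_max
    return sol.t_events[0][0], sol.y_events[0][0][0], True


# With alpha = 1 and gamma(x) = x this reproduces (53)-(54): tau = -eps * ln(theta).
print(next_burst(2.0, 0.1, lambda x: 1.0, lambda x: x, 0.3, 10.0))
```

The burst itself is then applied as in (55), using the returned pre-burst state.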
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].