Accelerating computational Bayesian inference for stochastic biochemical reaction network models using multilevel Monte Carlo sampling

Investigating the behavior of stochastic models of biochemical reaction networks generally relies upon numerical stochastic simulation methods to generate many realizations of the model. For many practical applications, such numerical simulation can be computationally expensive. The statistical inference of reaction rate parameters based on observed data is, however, a significantly greater computational challenge; often relying upon likelihood-free methods such as approximate Bayesian computation, which require the generation of millions of individual stochastic realizations. In this study, we investigate a new approach to computational inference, based on multilevel Monte Carlo sampling: we approximate the posterior cumulative distribution function through a combination of model samples taken over a range of acceptance thresholds. We demonstrate this approach using a variety of discrete-state, continuous-time Markov models of biochemical reaction networks. Results show that a computational gain over standard rejection schemes of up to an order of magnitude is achievable without significant loss in estimator accuracy.

Author Summary

We develop a new method to infer the reaction rate parameters for stochastic models of biochemical reaction networks. Standard computational approaches, based on numerical simulations, are often used to estimate parameters. These computational approaches, however, are extremely expensive, potentially requiring millions of simulations. To alleviate this issue, we apply a different method of sampling allowing us to find an optimal trade-off between performance and accuracy. Our approach is approximately one order of magnitude faster than standard methods, without significant loss in accuracy.


Introduction
Stochastic models of biochemical reaction networks often provide a more accurate description of system dynamics than deterministic models [1]. In many cases, this is due to the inherent stochastic nature of many biochemical processes, in which the system dynamics is significantly influenced by relatively low populations of certain chemical species [2]. For example, in eukaryotic cells, molecules that regulate gene expression occur in relatively low numbers; as a result, stochastic fluctuations have a direct effect on the production rates of proteins [3,4].
A common approach to modeling biochemical systems is to consider a well-mixed collection of molecules that react according to some known chemical reactions. The well-mixed assumption simplifies the model by removing the spatial component [5,6]. If the model is deterministic, the evolution of the concentrations of each chemical species is governed by a system of ordinary differential equations (ODEs). Alternatively, a stochastic model will typically consider the evolution of copy numbers (i.e., the numbers of molecules) of each species, with each reaction occurring stochastically [6].
In the case of the stochastic model, the probability density function (PDF) for the state of such a system at time t evolves according to a very large system of ODEs known as the chemical master equation (CME), which is in general intractable due to the very large, or countably infinite, number of possible system states [5,7]. As a result, stochastic simulation techniques such as the exact Gillespie direct method (GDM) or approximations like the tau-leaping method are applied to study these models [8,9]. However, accurate stochastic simulation is a computationally expensive task; the computation time for the GDM, for example, scales with the number of possible reactions, yet the performance improvements gained using approximations can introduce approximation errors. Therefore, the development of efficient and accurate stochastic simulation algorithms is an area of active research [10][11][12][13][14][15][16].
In order to make quantitative predictions of real biochemical systems, or to perform model validation, unknown reaction rate parameters must be determined through inference. The Bayesian approach to estimating an unknown parameter vector, $\theta$, given some observational data, $D$, is based on Bayes' Theorem,

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \tag{1}$$

where $p(\theta)$ is the a priori PDF of the unknown parameter, $p(D \mid \theta)$ is the likelihood of making observations $D$ under the assumption of a particular value for $\theta$, $p(D)$ is often referred to as the evidence, and $p(\theta \mid D)$ is the a posteriori PDF of $\theta$ given the observations [17]. Informally, Equation (1) represents the process of updating current understanding based on previous experience and observational data. The classical approach to inference is to maximize the right hand side of Equation (1) to determine the mode of the posterior. However, more generally, the Bayesian approach can be used to quantify the level of uncertainty associated with such parameter estimates.
Theoretically, given perfect observational data from a biochemical reaction network, it is possible to determine a closed form expression for the likelihood term in Equation (1), and the method of maximum likelihood may be directly applied [6]. In practice, however, such a process is sampled imperfectly and is subject to measurement errors; thus requiring solution of the CME to form the likelihood term. However, as we have noted, exact closed form solutions of the CME are rarely available for practical applications.
Approximate Bayesian computation (ABC) refers to a family of computational methods for performing inference for problems with intractable likelihoods [17,18]. As a result, ABC methods are routinely applied to practical inference problems [19][20][21][22][23][24]. The fundamental concept is to approximate the posterior PDF, $p(\theta \mid D)$, by $p(\theta \mid \rho(D_s, D) < \varepsilon)$, where $D_s$ is simulated data, $\rho$ is a suitable distance function and $\varepsilon$ is the acceptance threshold. If a simulation process exists for the prior distribution, $p(\theta)$, and the underlying model of interest can be simulated to estimate $p(D_s \mid \theta)$, then the approximated posterior can be sampled using an ABC approach.
The computational overhead for ABC inference is significantly greater than that of stochastic simulation alone. This is because the computation time is inversely proportional to the probability that $\rho(D_s, D) < \varepsilon$ over all possible parameter values $\theta$; that is, many stochastic simulations are required for every sample computed from the posterior [17]. A considerable computational problem arises from this, because the acceptance rate decreases exponentially as the number of unknown parameters increases [25]. This problem, referred to as the curse of dimensionality, is mitigated to some degree for certain classes of problems through the use of more advanced ABC techniques [26][27][28][29]. However, in general, the curse of dimensionality is an unresolved problem [17].
In this study, we propose, implement and analyze a method for accelerating ABC inference using a multilevel Monte Carlo (MLMC) approach [30]. MLMC is a framework for constructing computationally efficient and accurate estimators of system statistics for stochastic processes. MLMC was developed by Giles et al. [31,32] as a stochastic variant of multigrid methods used for obtaining numerical solutions to differential equations. Since then many other applications have benefitted from MLMC, including Markov process simulation [14,33,34], uncertainty quantification [35] and univariate probability distribution approximation [36,37]. To the best of our knowledge, our work represents the first application of MLMC to full Bayesian inference with intractable likelihoods.
To summarize our approach: we construct an approximation to the posterior cumulative distribution function (CDF) using an MLMC estimator that is built from a sequence of ABC approximate posteriors. While we focus on stochastic biochemical reaction network models, our inference method is applicable to any problem that could traditionally utilize ABC inference. We demonstrate that our method is guaranteed to outperform standard ABC methods under a few reasonable assumptions.

Methods
In this section, we describe a commonly used stochastic approach to modeling biochemical reaction networks, along with standard algorithms for both simulation and parameter inference. We also describe the fundamental concepts of MLMC. Finally, we present our multilevel approach to ABC inference and derive the asymptotic performance improvement of the method.

Discrete-state, continuous-time Markov processes
Consider a biochemical reaction network involving $N$ chemical species with copy numbers $X_1, X_2, \ldots, X_N$ that react via a network of $M$ chemical reactions of the form

$$\sum_{i=1}^{N} \nu^-_{i,j} X_i \xrightarrow{k_j} \sum_{i=1}^{N} \nu^+_{i,j} X_i, \quad j = 1, \ldots, M, \tag{2}$$

where $k_j$ is the kinetic rate constant of reaction $j$, and $\nu^-_{i,j}$ and $\nu^+_{i,j}$ are, respectively, the reactant and product stoichiometries for the species $X_i$ in reaction $j$. Under the assumption that the molecules are well-mixed, the probability that the $j$th reaction occurs in the time interval $(t, t+\Delta t]$ is given by $a_j(\mathbf{X}(t); k_j)\Delta t$, where $a_j$ is the propensity function of the reaction; for mass-action kinetics it is given by

$$a_j(\mathbf{x}; k_j) = k_j \prod_{i=1}^{N} \binom{x_i}{\nu^-_{i,j}}, \tag{3}$$

where $\mathbf{X}(t) = [X_1(t), X_2(t), \ldots, X_N(t)]^T$ is the column vector of copy numbers representing the system state at time $t$.
We can model the reaction network defined by Equation (2) as a discrete-state, continuous-time (DSCT) Markov process in which each reaction channel is governed by a time-varying Poisson process with rate parameter $\lambda_j(t) = \int_0^t a_j(\mathbf{X}(s); k_j)\, ds$. Such a DSCT Markov process can be represented according to the Kurtz representation [38] as

$$\mathbf{X}(t) = \mathbf{X}(0) + \sum_{j=1}^{M} \nu_j\, Y_j\!\left(\int_0^t a_j(\mathbf{X}(s); k_j)\, ds\right), \tag{4}$$

where the $Y_j(\lambda)$ are unit time Poisson processes with rate parameter $\lambda$, and $\nu_j$ is the state transition that results when reaction $j$ takes place; that is, the vector with entries $(\nu_j)_i = \nu^+_{i,j} - \nu^-_{i,j}$. Let $p(\mathbf{x}, t \mid \mathbf{y}, s)$ denote the transitional density function of the DSCT Markov process given in Equation (4), that is, the probability $\mathbf{X}(t) = \mathbf{x}$ given $\mathbf{X}(s) = \mathbf{y}$ where $t > s$.
Given an initial condition, $\mathbf{X}(0) = \mathbf{x}_0$, the evolution of $p(\mathbf{x}, t \mid \mathbf{x}_0, 0)$ is governed by the chemical master equation,

$$\frac{d\, p(\mathbf{x}, t \mid \mathbf{x}_0, 0)}{dt} = \sum_{j=1}^{M} \left[ a_j(\mathbf{x} - \nu_j; k_j)\, p(\mathbf{x} - \nu_j, t \mid \mathbf{x}_0, 0) - a_j(\mathbf{x}; k_j)\, p(\mathbf{x}, t \mid \mathbf{x}_0, 0) \right]. \tag{5}$$

It should be noted that Equation (5) is actually a system of ODEs that is potentially countably infinite, since $\mathbf{x} \in \mathbb{N}^N$. With the exception of reaction networks involving only zeroth and first order reactions, the CME has no closed form solution, and it is generally only computationally tractable when the number of possible states is small [7,39].

The chemical master equation and Bayesian inference
The solution of Equation (5) is of critical importance to the Bayesian approach to parameter inference for DSCT Markov processes [6]. Given a realization of Equation (4) observed at $N_t$ discrete time points, the likelihood, $p(\mathbf{X}_d(t_1), \mathbf{X}_d(t_2), \ldots, \mathbf{X}_d(t_{N_t}) \mid \theta)$, can be expressed in terms of the transitional density function $p(\mathbf{x}, t \mid \mathbf{y}, s; \theta)$; that is, the solution to the CME parameterized by the kinetic rate parameter vector $\theta$. By the Markov property, the likelihood term becomes

$$p(\mathbf{X}_d(t_1), \ldots, \mathbf{X}_d(t_{N_t}) \mid \theta) = \prod_{i=1}^{N_t} p(\mathbf{X}_d(t_i), t_i \mid \mathbf{X}_d(t_{i-1}), t_{i-1}; \theta), \tag{7}$$

where $t_0 = 0$ and $\mathbf{X}_d(0) = \mathbf{x}_0$.
We now introduce two models with convenient closed form solutions to the CME for theoretical purposes. Our aim in studying these two relatively simple models with closed form solutions is to illustrate the mathematical rigor of our analysis. Once we have established this, we will then apply our method to more practical examples where the CME is intractable.

Example 1: Degradation
The simplest chemical reaction model one could conceive is the stochastic form of exponential decay; that is, the degradation model. This model has a single reaction,

$$X \xrightarrow{k} Y, \quad X(0) = x_0, \tag{8}$$

where $Y$ represents any chemical species that is not of interest, $k$ is the kinetic rate constant for the reaction, and $x_0$ is the initial condition. Figure 1(a) shows some typical realizations of this model generated using the GDM; note that $X$ can never increase in time. Let $p(x, t \mid x_0, 0; k)$ be the transitional density function of the DSCT Markov process for Equation (8); that is, the solution to the CME parameterized by the kinetic rate parameter, $k$. The closed form solution to the CME can be obtained by noting that $p(x, t \mid x_0, 0; k) = 0$ for any $x > x_0$, hence the CME can be truncated into a finite system of ODEs that may be solved through induction. The solution is given by [40]

$$p(x, t \mid x_0, 0; k) = \binom{x_0}{x} e^{-kxt} \left(1 - e^{-kt}\right)^{x_0 - x}, \quad x = 0, 1, \ldots, x_0. \tag{9}$$

The evolution of $p(x, t \mid x_0, 0; k)$ is shown in Figure 1(b).
For the purposes of inference of $k$, if it is given that $X(t) = x$, we can view Equation (9) as the likelihood term in Equation (1) when the number of observation times is $N_t = 1$. This enables the direct evaluation of the posterior PDF and cumulative distribution function (CDF) for the degradation model. Examples are given in Figure 1(c)-(d).
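The direct evaluation described above is easy to make concrete. The following Python sketch (ours, not part of the paper's code) evaluates the binomial CME solution of Equation (9) and tabulates the exact posterior over k for a single observation; the uniform prior on [0, 1] and the specific values x0 = 100, t = 5, x = 60 are illustrative assumptions only.

```python
import math

def p_degradation(x, t, x0, k):
    """Binomial CME solution for X -k-> Y: each of the x0 initial molecules
    survives to time t independently with probability exp(-k*t)."""
    s = math.exp(-k * t)
    return math.comb(x0, x) * s ** x * (1.0 - s) ** (x0 - x)

# exact posterior of k on a grid, for one observation X(t) = x_obs,
# assuming a uniform prior on [0, 1] (illustrative choice)
x0, t, x_obs = 100, 5.0, 60
ks = [i / 1000 for i in range(1, 1001)]
lik = [p_degradation(x_obs, t, x0, k) for k in ks]
norm = sum(lik) * 0.001                    # Riemann-sum normalization
post_pdf = [v / norm for v in lik]         # posterior PDF on the grid
post_cdf, acc = [], 0.0                    # posterior CDF by cumulative sum
for v in post_pdf:
    acc += v * 0.001
    post_cdf.append(acc)
```

With these values the posterior mode sits near k = -ln(x/x0)/t, matching the maximum likelihood intuition for a single observation.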

Example 2: Degradation/Production
A natural extension to the degradation model (Equation (8)) is to incorporate a production reaction,

$$X \xrightarrow{k_1} Y, \quad Z \xrightarrow{k_2} X, \quad X(0) = x_0, \tag{10}$$

where $k_1$ and $k_2$ are, respectively, the degradation and production kinetic rate constants and $x_0$ is the initial condition. Again, $Y$ denotes any chemical species that is not of interest, and $Z$ represents a chemical species or process that produces $X$. We note that the copy number of $Z$ is constant, thus it is not required to be included in the CME.
The example realizations shown in Figure 2(a) demonstrate that X is not strictly decreasing in time; thus the CME cannot be truncated into a finite system of ODEs without approximation.
In this case, we denote the transitional density function as $p(x, t \mid x_0, 0; k_1, k_2)$. Despite the countably infinite nature of the CME in this case, it can also be solved analytically [41] to give

$$p(x, t \mid x_0, 0; k_1, k_2) = \sum_{z=0}^{\min(x, x_0)} \binom{x_0}{z} e^{-k_1 z t}\left(1 - e^{-k_1 t}\right)^{x_0 - z}\, \frac{b(t)^{x-z}\, e^{-b(t)}}{(x-z)!}, \tag{11}$$

where $b(t) = (k_2/k_1)\left(1 - e^{-k_1 t}\right)$; that is, the distribution of $X(t)$ is the convolution of a $\mathrm{Binomial}(x_0, e^{-k_1 t})$ distribution and a $\mathrm{Poisson}(b(t))$ distribution. The system reaches a steady state by approximately $t = 30$ (sec), as shown in Figure 2(b). Just as with the degradation model, the exact posterior PDF can be derived using Equations (7) and (11) for the purposes of inference of both $k_1$ and $k_2$ given $X$ at discrete points in time.

Stochastic simulation: Gillespie direct method
In general, only models dealing with zeroth and first order reactions have closed form solutions [41]. If we wish to study stochastic models of chemical reaction networks that have higher order reactions, then stochastic simulation must be utilized. The most well known exact stochastic simulation algorithm is the GDM [8].
The GDM arises naturally from Equation (4) by recalling that the time to the next event of a Poisson process with rate parameter $\lambda$ is exponentially distributed with parameter $\lambda$. Therefore, at time $t$, if $a_0 = \sum_{j=1}^{M} a_j(\mathbf{X}(t))$, then $\Delta t \sim \mathrm{Exp}(a_0)$, where the next reaction occurs at $t + \Delta t$. The next reaction, $R$, to take place can be determined by sampling the probability mass function $p(R = j) = a_j/a_0$. The state vector can then be updated by adding $\nu_j$. The result of repeating this process up to a given end time, $T$, is the GDM, as shown in Algorithm 1.
Algorithm 1 The Gillespie direct method
1. Initialize X and t.
2. Compute the propensities a_j(X), j = 1, …, M, and a_0 = Σ_{j=1}^{M} a_j(X).
3. Sample the time to the next reaction, Δt ∼ Exp(a_0).
4. If t + Δt > T, go to line 8, otherwise continue to line 5.
5. Select reaction j with probability a_j(X)/a_0.
6. Update the state, X ← X + ν_j.
7. Update the time, t ← t + Δt, and go to line 2.
8. Terminate and return X.
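A minimal Python sketch of the GDM may help make the algorithm concrete (our illustration; the function signature and the degradation example at the end are ours, not the paper's):

```python
import random

def gdm(x0, nu, propensities, k, T):
    """Gillespie direct method: simulate one exact trajectory up to time T.

    x0: initial copy numbers, nu: state-change vectors nu_j,
    propensities: function (x, k) -> [a_1(x), ..., a_M(x)].
    """
    x, t = list(x0), 0.0
    while True:
        a = propensities(x, k)
        a0 = sum(a)
        if a0 == 0.0:                      # no reaction can ever fire again
            return x
        dt = random.expovariate(a0)        # time to the next reaction
        if t + dt > T:                     # next reaction falls past the end time
            return x
        # select reaction j with probability a_j / a0
        r, acc = random.random() * a0, 0.0
        for j, aj in enumerate(a):
            acc += aj
            if r < acc:
                break
        x = [xi + dxi for xi, dxi in zip(x, nu[j])]
        t += dt

# example: degradation model X -k-> Y with X(0) = 100, k = 0.1, T = 10
random.seed(1)
x_final = gdm([100], [[-1]], lambda x, k: [k * x[0]], 0.1, 10.0)
```

The same function handles any network once the state-change vectors and propensity function are supplied.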
In this study, we restrict our experimentation and discussion to the GDM for stochastic simulation to ensure the only source of approximation error is due to our inference method.
As a result, we do not consider approximations such as the tau-leaping method [9].

Parameter inference: ABC rejection
ABC methods provide a means of sampling an approximation to the posterior when the stochastic process of interest has an intractable likelihood but can be simulated [6,17].
In the context of biochemical reaction networks, this means that repeated sampling of the model using the GDM replaces the explicit solution of the CME.
Given a data set, $D$, and a prior distribution for the parameter of interest, $\theta$, we approximate Equation (1) with

$$p(\theta \mid D) \approx p(\theta \mid \rho(D_s, D) < \varepsilon). \tag{12}$$

In Equation (12), $D_s$ is simulated data from the model of interest, $\rho$ is a suitable distance metric and $\varepsilon$ is a sufficiently small acceptance threshold, such that $p(\theta \mid \rho(D_s, D) < \varepsilon) \approx p(\theta \mid D)$. The most direct approach to sampling the posterior in Equation (12), given in Algorithm 2, is the ABC rejection sampling method [6,17,18].
For this study, the data set, $D$, for the model of interest is assumed to be the observation of a single realization, $\mathbf{X}_d(t)$, observed at $N_t$ uniformly spaced time points, $t_1, t_2, \ldots, t_{N_t}$. Simulated data, $D_s$, is generated in the same way from $\mathbf{X}_s(t)$, where $\mathbf{X}_s(t)$ is a sample path generated by the GDM using the candidate parameter vector, $\theta^*$, that has been sampled from the prior distribution $p(\theta)$. A natural distance metric, $\rho$, for such data is

$$\rho(D_s, D) = \sqrt{\sum_{i=1}^{N_t} \left\| \mathbf{X}_s(t_i) - \mathbf{X}_d(t_i) \right\|_2^2}, \tag{13}$$

where $\|\cdot\|_2$ is the Euclidean norm.
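The rejection scheme can be sketched in a few lines of Python (ours; the degradation-model simulator, the uniform prior and the tolerance are illustrative choices, not settings from the paper):

```python
import math, random

def simulate_degradation(x0, k, times):
    """Exact SSA for X -k-> Y, recording the copy number at each observation time."""
    x, t, out = x0, 0.0, []
    for T in times:
        while x > 0:
            dt = random.expovariate(k * x)
            if t + dt > T:
                break              # memorylessness lets us discard the overshoot
            t, x = t + dt, x - 1
        t = T
        out.append(x)
    return out

def abc_rejection(data, times, prior_sample, eps, n_accept):
    """Draw from the prior until n_accept candidates satisfy rho < eps."""
    accepted = []
    while len(accepted) < n_accept:
        k_star = prior_sample()                          # candidate from prior
        sim = simulate_degradation(100, k_star, times)   # simulated data D_s
        rho = math.sqrt(sum((s - d) ** 2 for s, d in zip(sim, data)))
        if rho < eps:
            accepted.append(k_star)
    return accepted

random.seed(0)
times = [2.0, 4.0, 6.0, 8.0]
data = simulate_degradation(100, 0.1, times)   # synthetic "observed" data
post = abc_rejection(data, times, lambda: random.uniform(0.01, 1.0), 20.0, 50)
```

The accepted values of k concentrate around the true value 0.1, at a computational cost inversely proportional to the acceptance rate, as discussed above.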

Multilevel Monte Carlo sampling
To explain the basics of MLMC, consider the task of computing $E[f(X)]$ for a random variable $X$ with unknown distribution and a suitably defined functional $f$. If we have another random variable $Y$ that approximates $X$, then we have

$$E[f(X)] = E[f(Y)] + E[f(X) - f(Y)]. \tag{14}$$

That is, an unbiased estimator for $E[f(Y)]$ will be a biased estimator for $E[f(X)]$. If it is possible to simulate $X$, then we can correct for this bias by using an estimator for the difference, $E[f(X) - f(Y)]$ [31]. If $Y$ can be constructed in such a way that the estimator for $E[f(Y)]$ and the bias correction $E[f(X) - f(Y)]$ can be computed more efficiently than the estimator for $E[f(X)]$, then we have a net computational gain.
MLMC extends this idea to the case when there exists a sequence of random variables $\{Y_\ell\}_{0 \le \ell \le L}$, such that as $\ell$ increases the simulation time for $E[f(Y_\ell)]$ increases and the bias decreases at a suitable rate [31,37]. The resulting recursive application of Equation (14) yields the telescoping sum

$$E[f(Y_L)] = E[f(Y_0)] + \sum_{\ell=1}^{L} E[f(Y_\ell) - f(Y_{\ell-1})]. \tag{15}$$

Under certain conditions, constructing an estimator based on Equation (15) is significantly faster than estimating $E[f(X)]$ directly [31,33,34].
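The telescoping idea can be illustrated with a toy example in Python (entirely ours, unrelated to the biochemical models): approximate E[X] for X ~ Exp(1), where the level-l approximation Y_l rounds X down to a grid of width 2^-l, and coupled pairs share the same underlying sample so the correction terms have small variance.

```python
import random, statistics

def sample_pair(level):
    """One coupled sample (Y_level, Y_{level-1}) built from the same X."""
    x = random.expovariate(1.0)
    h = 2.0 ** -level
    y_fine = h * int(x / h)                  # level-l approximation of x
    if level == 0:
        return y_fine, None
    hc = 2.0 ** -(level - 1)
    return y_fine, hc * int(x / hc)          # coupled coarser approximation

def mlmc_estimate(L, n_per_level):
    """Telescoping estimator: level 0 mean plus L bias corrections."""
    est = 0.0
    for level in range(L + 1):
        pairs = [sample_pair(level) for _ in range(n_per_level[level])]
        if level == 0:
            est += statistics.mean(yf for yf, _ in pairs)        # E[f(Y_0)]
        else:
            est += statistics.mean(yf - yc for yf, yc in pairs)  # correction
    return est

random.seed(3)
est = mlmc_estimate(5, [10000, 4000, 2000, 1000, 500, 250])  # approximates E[X] = 1
```

Because each correction term is bounded by the coarse grid width, far fewer samples are needed at the expensive fine levels than a direct estimate at level L would require.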

A multilevel approach to inference
In this section, we develop a new approach to ABC inference using MLMC and provide key theoretical results about the performance gain of our approach. The method, which we refer to as multilevel approximate Bayesian computation (MLABC), combines ABC rejection sampling over a sequence of acceptance thresholds to approximate the CDF of the posterior to within a prescribed level of accuracy, defined in terms of the root mean squared error (RMSE). For brevity, we only present the key features of the analysis here; for a detailed analysis of the method, based on the work of Giles et al. [37], see Appendices S1-S3.

Derivation
We present, for the sake of simplicity, our MLABC method in terms of the degradation/production model (Equation (10)). However, we note that applying the ideas to models with different numbers of unknown parameters is a straightforward extension of the degradation/production case. Given observed trajectory data, $D(\mathbf{X}_d)$, the task is to approximate, at a point $(s_1, s_2) \in \mathbb{R}^2$, the posterior CDF given by

$$F(s_1, s_2) = \mathbb{P}\left(k_1 \le s_1, k_2 \le s_2 \mid D(\mathbf{X}_d)\right). \tag{16}$$

We can reformulate Equation (16) as the expectation

$$F(s_1, s_2) = E\!\left[\mathbf{1}_{(0,s_1] \times (0,s_2]}(k_1, k_2)\right], \tag{17}$$

where $\mathbf{1}_{(0,s_1] \times (0,s_2]}$ is the indicator functional, equal to one if $(k_1, k_2) \in (0,s_1] \times (0,s_2]$ and zero otherwise. Assuming distance metric $\rho$, as defined in Equation (13), along with a suitably chosen acceptance threshold, $\varepsilon$, the standard Monte Carlo estimator is

$$\hat{F}(s_1, s_2) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(0,s_1] \times (0,s_2]}\!\left((k_1, k_2)_\varepsilon^{(i)}\right), \tag{18}$$

where the number of samples, $n$, is sufficiently large and the $(k_1, k_2)_\varepsilon^{(i)}$ are independent samples from the ABC posterior $p(k_1, k_2 \mid \rho(D_s, D) < \varepsilon)$. Now, consider a geometric sequence of acceptance thresholds $\varepsilon_\ell = \varepsilon_0 K^{-\ell}$ for integer $K \ge 2$ and $\ell = 0, 1, 2, \ldots, L$, and denote by $(k_1, k_2)_\ell$ the random vector distributed according to the ABC posterior with threshold $\varepsilon_\ell$. Using this sequence of posterior distribution approximations, a multilevel estimator for the posterior CDF at the point $(s_1, s_2)$ can be determined as

$$\hat{F}_{\mathrm{ML}}(s_1, s_2) = \hat{\mu}_0 + \sum_{\ell=1}^{L} \hat{\mu}_\ell^{\mathrm{bias}}, \tag{19}$$

with

$$\hat{\mu}_0 = \frac{1}{n_0} \sum_{i=1}^{n_0} \mathbf{1}\!\left((k_1, k_2)_0^{(i)}\right), \qquad \hat{\mu}_\ell^{\mathrm{bias}} = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} \left[\mathbf{1}\!\left((k_1, k_2)_\ell^{(i)}\right) - \mathbf{1}\!\left((k_1, k_2)_{\ell-1}^{(i)}\right)\right],$$

where the sample sizes, $n_\ell$, are sufficiently large to ensure the estimator variance is below some predetermined value. In terms of bias, the multilevel estimator in Equation (19) is equivalent to the standard Monte Carlo estimator with $\varepsilon = \varepsilon_L$ in Equation (18). By evaluating Equation (19) at a set of points in $\mathbb{R}^2$, combined with a suitable interpolation scheme (see Appendix S3), an approximation of the entire posterior CDF can be constructed.
The estimator given in Equation (19) is the essence of the MLABC method. Although we present the method in terms of the degradation/production model with two unknown parameters, an equivalent general estimator can be formed for any stochastic model with $M$ unknown parameters by evaluating the indicator functional over the region $(-\infty, s_1] \times \cdots \times (-\infty, s_M]$. We now aim to analyze and demonstrate that, under certain reasonable assumptions, this approach can always provide a computational gain over standard ABC rejection sampling for a sufficiently small target bias level.

Assumptions
There are three main assumptions required for the analysis of Equation (19) [37]. The first two assumptions are related to the convergence rates of the posterior approximation as $\ell \to \infty$. The third is a condition on the computation time for generating a sample pair, which is required when computing the bias correction term $\hat{\mu}_\ell^{\mathrm{bias}}$. Such a sample pair is a sample from the joint distribution of $\left((k_1, k_2)_\ell, (k_1, k_2)_{\ell-1}\right)$. More specifically, given a geometric sequence of acceptance thresholds $\varepsilon_\ell = \varepsilon_0 K^{-\ell}$ for integer $K \ge 2$, we assume there exist constants $\alpha > 0$, $\beta > 0$ and $\gamma > 0$ such that:

1. as $\ell \to \infty$, $(k_1, k_2)_\ell$ converges weakly to $(k_1, k_2)$ with order $\alpha$; that is, the bias satisfies $\left|E[f((k_1, k_2)_\ell)] - E[f((k_1, k_2))]\right| = O(\varepsilon_\ell^\alpha)$ for suitable functionals $f$;
2. as $\ell \to \infty$, $(k_1, k_2)_\ell$ converges strongly to $(k_1, k_2)$ with order $\beta$; that is, $E\!\left[\left\|(k_1, k_2)_\ell - (k_1, k_2)\right\|^2\right] = O(\varepsilon_\ell^\beta)$;
3. the computation time for sampling the distribution of $(k_1, k_2)_\ell$ is $O(\varepsilon_\ell^{-\gamma})$.

Assumptions 1 and 2 are reasonable due to the convergence properties of ABC rejection itself [18]. However, obtaining exact values or estimates for $\alpha$ and $\beta$ is generally difficult. In practice, exact values of $\alpha$ and $\beta$ are not required, though we assume these are known in order to analyze the asymptotic computational gain.

The constant $\gamma$ depends on the dimensionality of the parameter space, since the computation time is inversely proportional to the average acceptance rate of the ABC rejection method. The general proof of Assumption 3 is given in Appendix S1. For the degradation/production model we have $\gamma = 2$.

Theoretical performance gain
In this section, we present asymptotic bounds on both the RMSE of the standard ABC inference and our MLABC method.We then express the asymptotic computation time given a target RMSE, h.This allows us to construct an expression for the asymptotic computational gain in terms of the convergence parameters α and β.We do this in the context of a two parameter inference problem, such as the degradation/production model (Equation ( 10)), leaving more general analysis to Appendix S2.
Assume that the posterior PDF has compact support on a region $R \subset \mathbb{R}^2$. Such an assumption is reasonable for a sufficiently large number of observation times $N_t$; for the degradation/production model, this assumption is valid for $N_t \ge 2$. Now apply a regular discretization to $R$ using $D$ divisions in each dimension, resulting in $(D+1)^2$ grid points $\{(s_1^i, s_2^j)\}_{0 \le i,j \le D}$. By applying the estimators in Equation (18) or Equation (19), we obtain a discrete approximation, $\mu_{i,j} \approx F(s_1^i, s_2^j)$, to the true posterior CDF. The RMSE of this approximation is

$$\mathrm{RMSE}(\mu) = \sqrt{\frac{1}{(D+1)^2} \sum_{i,j=0}^{D} E\!\left[\left(\mu_{i,j} - F(s_1^i, s_2^j)\right)^2\right]}.$$

For simplicity, we will ignore the interpolation of $\mu_{i,j}$ for a continuous approximation to $F$ over the entire region $R$. It is important to note, however, that our analysis (Appendices S2 and S3) accounts for this, and it is crucial to our results.
The standard Monte Carlo estimator is given by $\mu = \{\hat{F}(s_1^i, s_2^j)\}_{0 \le i,j \le D}$, where $\hat{F}(s_1^i, s_2^j)$ is computed using Equation (18) with $\varepsilon = \varepsilon_L$. If the convergence parameter $\alpha$ is known, the RMSE for the standard Monte Carlo estimator, $\mu$, is bounded, for some constant $c_1$, by

$$\mathrm{RMSE}(\mu) \le c_1 \left( \varepsilon_L^\alpha + \sqrt{\frac{v_L}{n}} \right), \tag{20}$$

where $v_L$ is the variance of the indicator functional evaluated at $(k_1, k_2)_L$ and $n$ is the number of Monte Carlo samples.
Under Assumption 3, and by taking $L = (1/\alpha)\log_K(1/h)$, the cost of using standard Monte Carlo to construct an approximation, $\mu$, to the posterior CDF with $\mathrm{RMSE}(\mu) = O(h)$ is bounded, for some constant $c_2$, by

$$C(\mu) \le c_2\, h^{-2 - \gamma/\alpha}. \tag{22}$$

This bound is identified by bounding the right hand side of Equation (20) by $h$ and solving for a lower bound on the number of accepted samples, $n$.
Bounding the RMSE for the multilevel estimator, $\mu_{ML} = \{\hat{F}_{\mathrm{ML}}(s_1^i, s_2^j)\}_{0 \le i,j \le D}$, with $\hat{F}_{\mathrm{ML}}(s_1^i, s_2^j)$ computed using Equation (19), is a relatively straightforward application of the method for obtaining Equation (20), along with invoking Assumption 2. The equivalent result for the multilevel estimator, for some constant $c_3$, is

$$\mathrm{RMSE}(\mu_{ML}) \le c_3 \left( \varepsilon_L^\alpha + \sqrt{\sum_{\ell=0}^{L} \frac{v_\ell}{n_\ell}} \right), \tag{23}$$

where $v_0$ is the variance of the indicator functional evaluated at $(k_1, k_2)_0$, $v_\ell$ for $\ell \ge 1$ is the variance of the $\ell$th bias correction term, and the $n_\ell$ are the Monte Carlo sample numbers for each level.
The upper bound on the computation time for $\mu_{ML}$ depends on the choice of the number of samples for each level, $n_\ell$. We can use a Lagrange multiplier method to choose the $n_\ell$ such that the asymptotic cost is minimized under the constraint $\mathrm{RMSE}(\mu_{ML}) = O(h)$; the optimal choice is

$$n_\ell = h^{-2} \sqrt{\frac{v_\ell}{c_\ell}} \sum_{m=0}^{L} \sqrt{v_m c_m}, \tag{24}$$

where $c_\ell$ is the expected computation time for a level $\ell$ sample pair. Using Assumption 3 and the optimal $n_\ell$ given by Equation (24), we arrive at an expression for the asymptotic bound on the computation time for evaluating $\mu_{ML}$. In the case of $\beta < \gamma$, there exists some constant $c_4$ such that

$$C(\mu_{ML}) \le c_4\, h^{-2 - (\gamma - \beta)/\alpha}. \tag{25}$$

The results from Equation (22) and Equation (25) directly yield a reduction in the order of the computation time upper bound from using the multilevel method. We denote this reduction as the asymptotic computational gain, given by

$$G(h) = \frac{C(\mu)}{C(\mu_{ML})} = O\!\left(h^{-\beta/\alpha}\right). \tag{26}$$

Note that Equation (26) indicates that, assuming the upper bounds are achieved asymptotically, it is always possible to choose a target RMSE, $h$, such that the multilevel approach achieves some desired level of computational gain. The rate of growth of this gain as a function of $h$ depends on the convergence characteristics of the sequence of ABC posterior approximations; however, it is always an improvement, since $\alpha > 0$ and $\beta > 0$.

Practical application of MLABC
In practice, the convergence parameters α and β are unknown. In this section, we present a practical implementation of an algorithm for MLABC that does not depend on explicit knowledge of α and β. We find that our algorithm can obtain up to an order of magnitude performance improvement while still maintaining a desired level of accuracy.

Removing dependence on convergence parameters
First, note that when ABC methods such as ABC rejection are used in practice, there are certain assumptions made about the weak convergence rate, $\alpha$. If we expect $n$ accepted samples using acceptance threshold $\varepsilon$ to provide an acceptable estimator of the real posterior CDF, then we are implicitly assuming

$$\alpha \ge \frac{\log_e\!\left(\log_e\!\left((D+1)^2\right)/n\right)}{\log_e(\varepsilon)}.$$
Therefore, we can determine, for a given scale factor, $K$, and base level threshold, $\varepsilon_0$, the number of levels, $L$, required to match the final bias of the standard ABC rejection method. As a result, we can treat $\alpha$ as a known parameter in the sense of equivalence to the standard ABC rejection estimator.
Second, consider the role of $\beta$ in determining the $n_\ell$ in Equation (24). While the details are given in Appendix S2, we informally state that the summation in Equation (24) can be derived by showing that Assumption 2 implies, for some constant $c_5$,

$$\mathrm{Var}\!\left[\mathbf{1}\!\left((k_1, k_2)_\ell\right) - \mathbf{1}\!\left((k_1, k_2)_{\ell-1}\right)\right] \le c_5 K^{-\beta \ell}. \tag{27}$$

That is, $K^{-\beta \ell}$ is used to bound the variances of the bias correction terms in Equation (19).
It is important to note that the rigorous version of Equation (27) requires a smoothing process to be applied to the indicator functional to ensure the Lipschitz continuity conditions discussed in the supporting information (Appendices S2 and S3).
If we knew exactly, for each level $\ell$, the computation time to generate a posterior realization, $c_\ell$, and the variance, $v_\ell$, of each bias correction term, $\hat{\mu}_\ell^{\mathrm{bias}}$, then we could calculate the optimal $n_\ell$ using the same Lagrange multiplier approach as for Equation (24). The result would be

$$n_\ell = h^{-2} \sqrt{\frac{v_\ell}{c_\ell}} \sum_{m=0}^{L} \sqrt{v_m c_m}. \tag{28}$$

In practice, we can only estimate $c_\ell$ and $v_\ell$. Using the approach of Anderson et al. [33] and Lester et al. [34], we generate a relatively small number of trial samples at each level to obtain estimates for $c_\ell$ and $v_\ell$ that work well in practice. Using this approach we do not have the same theoretical bounds on the RMSE. However, accurate approximations of the variances will result in estimators with an RMSE that is close to the target in practice.

Algorithm 3 Sampling method for variance reduction of bias correction term µ_ℓ^bias
1. Let n_d denote the number of samples accepted at both levels ℓ and ℓ − 1; initialize n_d ← 0.
2. Initialize j ← 1.
3. Repeat steps 3.1-3.5:
3.1 Sample the prior distribution to obtain a candidate (k_1, k_2)*.
3.2 Generate X_s(t) using the GDM with kinetic rates (k_1, k_2)*.
3.3 If ρ(D(X_s), D(X_d)) ≥ ε_{ℓ−1}, go to line 3.1.
3.4 If ρ(D(X_s), D(X_d)) < ε_ℓ, accept (k_1, k_2)* as both a level ℓ and a level ℓ − 1 sample, increment n_d ← n_d + 1 and go to line 3.5; otherwise, accept (k_1, k_2)* as a level ℓ − 1 sample only.
3.5 If j < n_ℓ, increment j ← j + 1 and go to line 3.1, otherwise go to line 4.
4. Repeat steps 4.1-4.3 until n_ℓ − n_d additional level ℓ samples have been accepted:
4.1 Sample the prior distribution to obtain a candidate (k_1, k_2)*.
4.2 Generate X_s(t) using the GDM with kinetic rates (k_1, k_2)*.
4.3 If ρ(D(X_s), D(X_d)) < ε_ℓ, accept (k_1, k_2)* as a level ℓ sample.
5. Terminate and return all accepted sample pairs.
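The allocation in Equation (28) is straightforward to compute once pilot estimates are in hand; a Python sketch (ours, with hypothetical pilot values chosen only to illustrate the shape of the allocation):

```python
import math

def optimal_sample_sizes(costs, variances, h):
    """Per-level sample numbers following Equation (28):
    n_l = h**-2 * sqrt(v_l / c_l) * sum_m sqrt(v_m * c_m),
    where c_l, v_l are pilot estimates of per-pair cost and correction variance."""
    total = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [max(1, math.ceil(h ** -2 * math.sqrt(v / c) * total))
            for v, c in zip(variances, costs)]

# hypothetical pilot estimates for L = 3: cost grows and variance shrinks with level
costs = [1.0, 4.0, 16.0, 64.0]
variances = [0.25, 0.06, 0.015, 0.004]
n_levels = optimal_sample_sizes(costs, variances, h=0.05)
```

As expected, most samples are allocated to the cheap coarse level, with only a handful required at the expensive fine levels.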
Algorithm 4 MLABC (multilevel approximate Bayesian computation): estimates the posterior CDF F(s_1, s_2)
1. Let n, ε and D be the parameters of the equivalent ABC estimator.
3. Calculate n_0, n_1, …, n_L using Equation (28), with c_ℓ and v_ℓ estimated using 100 samples on each level.
6. Repeat steps 6.1-6.4 for every point (s_1, s_2) ∈ G, where G is a grid of (D + 1)² points over the support of the posterior.

Improving performance with variance reduction
In Equation (28), note that $n_\ell \propto \sqrt{v_\ell}$. Furthermore, note that for the bias correction terms in Equation (19), our only concern is the expected difference between indicator function values at each level, rather than the expected values themselves. As a result, if we can introduce some correlation in the generation of accepted samples at each level, then the variance of the estimators, and hence the number of samples required, will decrease. In this work, we implement a simple method to ensure such correlation. Given a simulated trajectory, $\mathbf{X}_s(t)$, if $\rho(D(\mathbf{X}_s), D(\mathbf{X}_d)) < \varepsilon_\ell$ then also $\rho(D(\mathbf{X}_s), D(\mathbf{X}_d)) < \varepsilon_{\ell-1}$, since $\varepsilon_\ell < \varepsilon_{\ell-1}$. As a result, we can sample level $\ell - 1$ first and keep track of any samples that can also be accepted at level $\ell$; such samples can be validly taken as samples from both levels. A pair constructed this way will make no contribution to the bias correction term, hence reducing the variance. We arrive at the method presented in Algorithm 3 for reducing the variance when sampling for the bias correction term $\hat{\mu}_\ell^{\mathrm{bias}}$.
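The reuse of coarse-level samples can be sketched as follows (our toy illustration: the distance function here is a stand-in for ρ(D(X_s), D(X_d)), so the coupling logic is visible without a full simulator):

```python
import random

def coupled_level_samples(distance_fn, prior_sample, eps_fine, eps_coarse, n):
    """Draw n level (l-1) samples (threshold eps_coarse); any draw that also
    satisfies the finer threshold eps_fine is reused as a level-l sample,
    so that pair contributes exactly zero to the bias correction term."""
    pairs = []        # (fine_sample or None, coarse_sample)
    shared = 0
    while len(pairs) < n:
        theta = prior_sample()
        rho = distance_fn(theta)
        if rho >= eps_coarse:
            continue                      # rejected even at the coarse level
        if rho < eps_fine:
            pairs.append((theta, theta))  # accepted at both levels
            shared += 1
        else:
            pairs.append((None, theta))   # coarse-only; fine slot sampled later
    return pairs, shared

random.seed(7)
toy_distance = lambda th: abs(th - 0.5)   # stand-in for rho(D(X_s), D(X_d))
pairs, shared = coupled_level_samples(toy_distance, random.random, 0.1, 0.2, 200)
```

The larger the fraction of shared samples, the smaller the variance of the bias correction estimator, exactly as the argument above suggests.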

The multilevel ABC algorithm
By combining Equation (28) with Algorithm 3, we construct a practical implementation of the MLABC estimator (Equation (19)). This MLABC method, presented in Algorithm 4 for the degradation/production model, is implemented as a prototype using the C programming language. The prototype (Code S3) is specific to biochemical reaction network models; however, only minor changes are required to change the target application.
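The paper's prototype is in C (Code S3); as a readable stand-in, here is an end-to-end Python sketch (ours) of the estimator in Equation (19) for a one-parameter degradation model. For brevity, the levels are sampled independently here, without the coupling of Algorithm 3, so the telescoping structure is visible at the cost of extra variance; all numerical settings are illustrative.

```python
import math, random

def simulate(x0, k, times):
    """Exact SSA for X -k-> Y, recorded at the observation times."""
    x, t, out = x0, 0.0, []
    for T in times:
        while x > 0:
            dt = random.expovariate(k * x)
            if t + dt > T:
                break
            t, x = t + dt, x - 1
        t = T
        out.append(x)
    return out

def abc_sample(data, times, eps, n):
    """n independent ABC-posterior samples of k at threshold eps."""
    out = []
    while len(out) < n:
        k = random.uniform(0.01, 1.0)            # uniform prior (assumed)
        sim = simulate(100, k, times)
        if math.sqrt(sum((s - d) ** 2 for s, d in zip(sim, data))) < eps:
            out.append(k)
    return out

def mlabc_cdf(grid, data, times, eps0, K, L, n_l):
    """Telescoping CDF estimate: level 0 plus L bias corrections."""
    base = abc_sample(data, times, eps0, n_l[0])
    cdf = [sum(k <= s for k in base) / n_l[0] for s in grid]
    for level in range(1, L + 1):
        fine = abc_sample(data, times, eps0 * K ** -level, n_l[level])
        coarse = abc_sample(data, times, eps0 * K ** -(level - 1), n_l[level])
        for i, s in enumerate(grid):
            cdf[i] += (sum(k <= s for k in fine)
                       - sum(k <= s for k in coarse)) / n_l[level]
    return cdf

random.seed(11)
times = [3.0, 6.0]
data = simulate(100, 0.1, times)                 # synthetic observed data
grid = [i / 50 for i in range(26)]               # CDF evaluated for k in [0, 0.5]
cdf = mlabc_cdf(grid, data, times, eps0=20.0, K=2, L=2, n_l=[80, 40, 20])
```

Swapping in coupled sample pairs per Algorithm 3 would shrink the correction variances and hence the required n_l.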
Results

In this section, we examine the accuracy and performance of MLABC using the previously presented degradation and degradation/production models, because we can directly evaluate the estimator RMSE using the exact solution to the CME. We then compare MLABC with standard ABC rejection using two more complex biochemical reaction networks.

Estimating convergence parameters
The asymptotic computational gain using MLABC is presented in Equation (26). To apply this result in practice, estimates of the convergence parameters α and β are required. Using these estimates of α and β, it is possible to predict the asymptotic growth of the computational gain from MLABC based on sample numbers chosen according to Equation (24). These computational gains are given in Figure 3(c)-(d) for N_t = 2, 3, …, 10. By using Equation (24), we expect minimal increase in the RMSE compared with standard ABC rejection.

Performance using estimated convergence parameters
We now compare numerical simulation results against the theoretical performance and error analysis. Throughout, we refer to the measured computational gain of our MLABC approach, given by

$$\text{Gain} = \frac{C(\mu)}{C(\mu_{ML})}, \tag{29}$$

where $\mu$ and $\mu_{ML}$ are the standard ABC rejection and MLABC estimators, respectively, and $C(\cdot)$ is the average measured computation time to evaluate the estimator.
For the degradation model, 20 independent MLABC and ABC rejection CDF estimators are computed for target RMSE $h \in [0.1, 0.25]$. The CDF is approximated over the interval $[0, 0.5]$ on a grid of 1,000 nodes using data with $N_t = 8$. The sample numbers, $n_\ell$, are computed according to Equation (24) using the estimated convergence parameter values of $\alpha \approx 0.96$ and $\beta \approx 1.54$. As shown in Figure 4(a), the increase in computational gain (Equation (29)) is of the same order of magnitude as the theoretical asymptotic prediction (Equation (26)), albeit at a smaller absolute scale. In Figure 4(b), we see that the target of RMSE ≤ h is achieved with the exception of h = 0.1; however, the fact that there is also an increase in RMSE at h = 0.1 for the standard ABC estimator indicates that this is a feature of the problem that can probably be improved with the application of smoothing to the indicator functional (see Appendix S3). An important remark must be made about these results: they do not represent the best computational gain that can be achieved in practice, but rather they provide experimental validation for our theory. If the values of α and β are known, then it is possible to predict the asymptotic computational gain available whilst maintaining control on the RMSE of the estimator. Furthermore, we note that, as predicted by the theory, the computational gain grows proportionally to a power of $h^{-1}$.

Performance using empirical sample numbers
We now turn to the more practical approach to MLABC, as presented in Algorithm 4.
Here we specifically focus on the degradation/production model. Given the joint CDF, F (s 1 , s 2 ), the marginal CDFs, F 1 (s) and F 2 (s), can be determined using F 1 (s) = F (s, ∞) and F 2 (s) = F (∞, s). For some significance level, a, the (1 − a)100% credible interval for each parameter can then be obtained from the corresponding marginal CDF. The credible interval provides a measure of uncertainty around the parameter estimates.
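These quantities can be computed directly from posterior samples. The sketch below (with illustrative helper names, not the paper's prototype code) forms an empirical marginal CDF and an equal-tailed credible interval.

```python
import numpy as np

def marginal_cdf(samples, grid):
    """Empirical marginal CDF F_i(s), evaluated on grid points,
    from one-dimensional posterior samples."""
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, grid, side="right") / samples.size

def credible_interval(samples, a=0.1):
    """Equal-tailed (1 - a)100% credible interval from samples."""
    lo, hi = np.quantile(samples, [a / 2.0, 1.0 - a / 2.0])
    return lo, hi

# Example: 90% credible interval for a synthetic parameter sample
rng = np.random.default_rng(0)
theta = rng.normal(1.0, 0.1, size=10_000)
lo, hi = credible_interval(theta, a=0.1)
cdf = marginal_cdf(theta, np.linspace(0.5, 1.5, 11))
```

In practice the marginals would come from the joint MLABC CDF estimator rather than raw samples, but the credible-interval construction is the same.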

Higher dimensional and higher order models
We now investigate the validity of our MLABC approach to inference for higher dimensional and higher order models. First, we consider the inference problem in four-dimensional parameter spaces for first order reactions. We then turn to a more biologically inspired enzyme kinetics model that includes a second order reaction.

Mono-molecular chains
A general mono-molecular chain biochemical reaction network has the form of a sequence of first order reactions, ∅ → X 1 → X 2 → · · · → X n → ∅. Since the curse of dimensionality inversely affects the order of the acceptance rate, γ, it is logical to conclude that there is also an additional influence on the convergence rates α and β.
More experimental and theoretical work is needed to analyze the relationship between these convergence rates and the dimensionality of the parameter space.
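For concreteness, realizations of a mono-molecular chain can be generated with the Gillespie direct method. The sketch below simulates a two-species chain, ∅ → X 1 → X 2 → ∅; the rates, initial state and time horizon are chosen arbitrarily for illustration and are not from the paper.

```python
import numpy as np

def gillespie_chain(k, x0, t_final, rng):
    """Gillespie direct method for the mono-molecular chain
    0 -> X1 -> X2 -> 0 with first order rates k = (k1, k2, k3).
    Reaction 0: 0 -> X1; reaction 1: X1 -> X2; reaction 2: X2 -> 0."""
    stoich = np.array([[1, 0], [-1, 1], [0, -1]])
    x = np.array(x0, dtype=int)
    t = 0.0
    while True:
        # Mass-action propensities for the three reactions
        a = np.array([k[0], k[1] * x[0], k[2] * x[1]])
        a0 = a.sum()
        # Sample the time to the next reaction event
        t += rng.exponential(1.0 / a0)
        if t > t_final:
            return x
        # Select which reaction fires, weighted by propensity
        j = rng.choice(3, p=a / a0)
        x += stoich[j]

rng = np.random.default_rng(42)
x = gillespie_chain((10.0, 1.0, 0.5), (0, 0), t_final=50.0, rng=rng)
```

Each additional species in the chain adds one rate parameter to infer, which is what drives the dimensionality effects discussed above.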

Higher order models
We now test the applicability of MLABC to more general biochemical reaction networks with higher order reactions. Such networks rarely yield a tractable solution to the CME [7,39]. As a result, such higher order models are practical target applications for MLABC.
We consider a Michaelis-Menten enzyme kinetic model [42], which describes the dynamics of an enzyme-catalyzed reaction of a substrate S into a product P with the enzyme E acting as a catalyst. A three reaction Michaelis-Menten model is given by the scheme E + S → ES, ES → E + S and ES → E + P, with rate parameters k 1 , k 2 and k 3 , respectively, and fixed initial conditions. An example realization demonstrating the additional complexity of the dynamics of this model is provided in Figure 8. The realization displays the conversion of S into P. Note that as S(t) → (k 2 + k 3 )/k 1 , the propensity function of the third reaction, a 3 (ES(t); k 3 ) → k 3 e 0 /2, where e 0 denotes the initial enzyme copy number; the shape of the ES curve therefore depends crucially on this ratio of parameters. As a result, more observation time points are required to ensure the assumption of compact support for the posterior is reasonable. We choose the range of N t to highlight the fact that, as N t is increased, the computational gain reaches a maximum and then plateaus. We conjecture that this reflects a point at which little information is gained through the additional observations. Just as with our other models, we compute 20 independent CDF estimators using ABC and MLABC for each value of N t with target RMSE h = 0.2. In this case, we take a uniform prior with compact support.
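Under mass-action kinetics, the propensity functions of the three Michaelis-Menten reactions take the following form (a minimal sketch; the function name is illustrative):

```python
def mm_propensities(S, E, ES, k1, k2, k3):
    """Mass-action propensities for the Michaelis-Menten scheme
    E + S -> ES (k1), ES -> E + S (k2), ES -> E + P (k3)."""
    a1 = k1 * E * S  # binding (the second order reaction)
    a2 = k2 * ES     # unbinding
    a3 = k3 * ES     # catalysis, producing P

    return a1, a2, a3

# Illustrative check: at S = (k2 + k3)/k1 with the total enzyme e0
# split evenly between E and ES, the catalysis propensity a3 equals
# k3 * e0 / 2, consistent with the limit noted in the text.
a1, a2, a3 = mm_propensities(S=15.0, E=5.0, ES=5.0,
                             k1=0.005, k2=0.025, k3=0.05)
```

Note that at this point the binding propensity balances unbinding plus catalysis (a1 = a2 + a3), which is the quasi-steady-state condition underlying the classical Michaelis-Menten approximation.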

Conclusion
In this study, we present a new approach to computational Bayesian inference using MLMC sampling. We perform a general analysis based on the approximation of posterior CDFs using MLMC techniques developed by Giles et al. [37], to show that, under our convergence assumptions, a net computational gain is always asymptotically achievable for some sufficiently small value of RMSE, h, and simulation results confirm this prediction.
We also develop a practical implementation of the MLABC method that does not require the convergence rate parameters to be known a priori. Numerical estimates of the posterior CDF over a range of models suggest that a computational gain of four to 20 times can often be achieved over standard ABC rejection, with larger data set dimensionality improving this gain, up to some maximum. Under the right conditions, such as one-dimensional inference problems, a computational gain of up to 60 times is achievable.
Though the target application of this work is parameter inference for stochastic biochemical reaction network models, the MLABC method is as general as ABC rejection.
From a practical perspective, MLABC can be used in place of ABC rejection if the standard assumptions on weak and strong convergence hold. Minor modifications to the provided prototype code are required to achieve this.
While our current approach is a step towards dealing with the curse of dimensionality, there is still much work to be done. For the purposes of this initial investigation, we use ABC rejection as our benchmark inference method and as the basis of the MLABC method. The most natural extension of this work is the application of the MLMC framework to other ABC methods. We acknowledge that more advanced approaches, such as likelihood-free Markov chain Monte Carlo [26] and likelihood-free sequential Monte Carlo [27], will generally deal with higher dimensional models and data more efficiently than ABC rejection. Our MLABC framework, however, is sufficiently general that it is not intimately tied to ABC rejection, and there will be future opportunities to apply this approach to these more advanced methods.

Figure 2(c)-(d) shows examples of the posterior PDF and CDF.

For the degradation/production model, 20 independent MLABC and ABC rejection CDF estimators are computed for target RMSE h ∈ [0.1, 0.25]. The CDF is approximated over the region [0, 1] × [0, 10] on a grid of 100 × 100 nodes using data with N t = 4. The sample numbers, n , are computed according to Equation (24) using the estimated convergence parameter values of α ≈ 0.75 and β ≈ 0.25. Figure 4(c)-(d) provides, for the degradation/production model, the computational gain using MLABC over ABC rejection and the RMSE versus the target RMSE. Compared with the degradation model, the computational gain is significantly smaller; however, it is consistent with the theoretical results. Similarly, Figure 4(d) shows practically no increase in the RMSE of the CDF estimator.

Figure 4. Measured performance gain and error. (a) Computational gain and (b) RMSE for the degradation model, using data with N t = 8 observation times and convergence rate estimates α ≈ 0.96 and β ≈ 1.54. (c) Computational gain and (d) RMSE for the degradation/production model, using data with N t = 4 observation times and convergence rate estimates α ≈ 0.75 and β ≈ 0.25.

Figure 5. Measured performance gain and error for the degradation/production model, using data with N t = 4 observation times and 100 samples at each level to select sample numbers n . (a) Measured computational gain and (b) RMSE as functions of h.

Figures 6(c) and 6(e) present the estimates produced by standard ABC for different values of N t . These are to be compared with the estimates produced by MLABC shown in Figures 6(d) and 6(f). These results show that, from a practical perspective, the MLABC method is just as appropriate as ABC rejection: the true parameter values lie within the credible regions of the posteriors, and the credible intervals for MLABC are almost identical to those for ABC rejection. Both methods yield very similar mean and credible interval values.

Figure 7. Comparison of ABC and MLABC: four reaction mono-molecular chain. Performance of ABC and MLABC for the four reaction mono-molecular chain as the number of observation times, N t , increases. (a) Computation time. (b) Computational gain. (c)-(f) ABC parameter estimates. (g)-(j) MLABC parameter estimates. The true parameter values, (k 1 , k 2 , k 3 , k 4 ) = (1, 0.1, 0.01, 0.01), are indicated with dashed lines; posterior means and 90% credible intervals are indicated with solid lines and shaded areas, respectively.

Figure 10. Comparison of ABC and MLABC: Michaelis-Menten model. Performance of ABC and MLABC for the Michaelis-Menten model as the number of observation times, N t , increases. (a) Computation time. (b) Computational gain. (c), (e) and (g) ABC parameter estimates. (d), (f) and (h) MLABC parameter estimates. The true parameter values, (k 1 , k 2 , k 3 ) = (0.005, 0.025, 0.05), are indicated with dashed lines; posterior means and 90% credible intervals are indicated with solid lines and shaded areas, respectively.

The inference problem is to determine the posterior PDF, p (θ | X d (t 1 ), X d (t 2 ), . . ., X d (t Nt )), for the kinetic rate parameters, θ = [k 1 , k 2 , . . ., k M ]. If we denote p (θ) as the prior PDF, representing some prior knowledge about θ, then the posterior PDF is given through application of Bayes' theorem.