Inverse sensitivity analysis of mathematical models avoiding the curse of dimensionality



Abstract
Biological systems have evolved a degree of robustness with respect to perturbations in their environment and this capability is essential for their survival. In applications ranging from therapeutics to conservation, it is important to understand not only the sensitivity of biological systems to changes in their environments, but which features of these systems are necessary to achieve a given outcome. Mathematical models are increasingly employed to understand these mechanisms. Sensitivity analyses of such mathematical models provide insight into the responsiveness of the system when experimental manipulation is difficult. One common approach is to seek the probability distribution of the outputs of the system corresponding to a known distribution of inputs. By contrast, inverse sensitivity analysis determines the probability distribution of model inputs which produces a known distribution of outputs. The computational complexity of the methods used to conduct inverse sensitivity analyses for deterministic systems has limited their application to models with relatively few parameters. Here we describe a novel Markov Chain Monte Carlo method we call "Contour Monte Carlo", which can be used to invert systems with a large number of parameters. We demonstrate the utility of this method by inverting a range of frequently-used deterministic models of biological systems, including the logistic growth equation, the Michaelis-Menten equation, and an SIR model of disease transmission with nine input parameters. We argue that the simplicity of our approach means it is amenable to a large class of problems of practical significance and, more generally, provides a probabilistic framework for understanding the inversion of deterministic models.

Author summary
Mathematical models of complex systems are constructed to provide insight into their underlying functioning. Statistical inversion can probe the often unobserved processes underlying biological systems, by proceeding from a given distribution of a model's outputs (the aggregate "effects") to a distribution over input parameters (the constituent "causes"). The process of inversion is well-defined for systems involving randomness and can be described by Bayesian inference. The inversion of a deterministic system, however, cannot be performed by the standard Bayesian approach. We develop a conceptual framework that describes the inversion of deterministic systems with fewer outputs than input parameters. Like Bayesian inference, our approach uses probability distributions to describe the uncertainty over inputs and outputs, and requires a prior input distribution to ensure a unique "posterior" probability distribution over inputs. We describe a computational Monte Carlo method that allows efficient sampling from the posterior distribution even as the dimension of the input parameter space grows. This is a two-step process where we first estimate a "contour volume density" associated with each output value which is then used to define a sampling algorithm that yields the requisite input distribution asymptotically. Our approach is simple, broadly applicable and could be widely adopted.
physical principles and simplifying assumptions. The Navier-Stokes equations include a small number of parameters (e.g., viscosity, surface tension) that can be independently measured with high precision, and laboratory-scale experiments can be conducted in well-controlled environments with a high degree of reproducibility. Modelling in the biological sciences, by contrast, is currently in a rapid state of development: many models are highly parametrized, and their parameters cannot be independently measured but must be inferred from the output of the model. Even in situations where the mathematical model is well established, such as the Hodgkin-Huxley equations, some parameters may be well characterized from experimental observations, while others are not. A recent study of the Hodgkin-Huxley equations [9] indicates that sufficient membrane potential data can provide a reasonable fit to the three major conductance parameters. However, no matter how complete and accurate the membrane potential data, the time constants associated with the rate parameters in the model are practically unidentifiable. Whereas parameter estimation seeks a single value for model parameters that could produce a set of output values, the corresponding inverse sensitivity problem seeks the distribution of model parameters that is compatible with an observed output distribution. For example, in the Hodgkin-Huxley system, an inverse sensitivity analysis can determine how variable the unobserved rate parameters can be whilst having no real effect on the peak-to-trough voltage.

For a large class of models of practical interest in biology, including ordinary differential equations (ODEs) and partial differential equations (PDEs), analytic methods for inverse sensitivity analysis are unavailable. Further, it is common for the dimension of the inputs to exceed that of the outputs. For deterministic systems, this means that there exist non-singular sets of inputs that can yield a given output, and that any feasible output distribution may be caused by any member of a collection of possible input distributions. In Bayesian inference of stochastic systems, prior distributions are specified for parameters of the likelihood, ensuring a unique input distribution for a given set of output data. For deterministic models, however, there is no uncertainty in the input-to-output map, meaning standard Bayesian techniques cannot be used without introducing an artificial error distribution. Here we describe a conceptual framework to understand the inversion of deterministic systems. We then describe a novel Markov Chain Monte Carlo method we call "Contour Monte Carlo" and use this approach to invert a number of systems, including the logistic growth equation, the Michaelis-Menten equation, and an SIR model of disease transmission. In contrast to methods that use a grid to discretize the parameter domain to determine a numerical output-to-input map [10], our method uses stochastic sampling to instead approximate the multiplicity of a given output value; that is, the number of inputs that map to that value. By using random sampling, the computational complexity of our approach does not grow with parameter dimension, meaning that it can be applied to large model systems. We also argue that the simplicity of our approach means it is amenable to a large class of problems of practical significance.

Consider a deterministic function from a vector of inputs Λ = (λ1, λ2, ..., λK) ∈ R^K to output quantities Q = (Q1(Λ), Q2(Λ), ..., QP(Λ)) ∈ R^P, where P < K. Since the dimensionality of the input parameters exceeds that of the output quantities, each feasible output value permits a set of possible input causes. This means that without further information it is not possible to determine a single cause for an output value, unless that output value is "singular". That which is true for individual output values is also true for output distributions. For any given output distribution, there is a collection of possible input distributions that could yield it.

The distribution that we want to estimate is a conditional density of the form p(Λ|Q(.), data), where Q(.) denotes the particular form of input-to-output function used, and data denotes a collection of output data points. In analogy to Bayesian inference, we term this conditional density over inputs the "posterior distribution". In what follows we implicitly assume a dependence on Q(.), and omit it for brevity. To sample from this distribution efficiently, we first consider the joint distribution,

p(Λ, Q|data) = p(Λ|Q(Λ)) p(Q|data),    (1)

where the prior term p(Λ|Q(Λ)) does not contain any dependence on the data since it solely depends on the geometry of the input-output map, which is determined by Q(.). We assume that the output data distribution has been fitted with an appropriate distribution, to yield a density value for any value of output, p(Q|data), which we term the "target distribution". An alternative way to decompose the joint distribution is,

p(Λ, Q|data) = p(Q|Λ, data) p(Λ|data).    (2)

For a deterministic model p(Q|Λ, data) = p(Q|Λ), since Q is fully determined by the inputs Λ. Because of this determinacy the term becomes,

p(Q|Λ) = δ(Q − Q(Λ)),    (3)

where δ denotes the Dirac delta. Equating the right hand sides of equations (1) and (2), we obtain an expression for the posterior input distribution,

p(Λ|data) = p(Λ|Q(Λ)) p(Q(Λ)|data).    (4)

To sample from the posterior distribution specified by equation (4), we use estimates of the prior that are obtained by independent sampling of the inputs uniformly within their bounds (we consider unbounded non-uniform priors in §4.2) to produce an estimator of the distribution,

p̂(Λ|Q(Λ)) = 1/V(Q(Λ)),    (5)

where V(Q) is the (potentially un-normalised) contour volume at an output value of Q, a measure of the size of the input set which maps to Q. Due to the continuous nature of the output, the distribution V(Q) actually defines a contour volume density, which can be used to estimate the posterior input distribution,

p̂(Λ|data) ∝ p(Q(Λ)|data) / V(Q(Λ)).    (6)

Since the above probability distribution is non-standard and potentially un-normalised, we use MCMC to sample from it, and use Random Walk Metropolis sampling [11] for this process with an acceptance probability of,

r = min(1, [p(Q(λ*)|data) V(Q(λi−1))] / [p(Q(λi−1)|data) V(Q(λ*))]),    (7)

where λ* denotes the proposed parameter vector and λi−1 the current one. We recognise that since an approximate form of the un-normalised distribution, as well as its gradients, is known, more efficient forms of MCMC algorithm, such as adaptive Metropolis [12] or Hamiltonian Monte Carlo (HMC) [13], could also be used.

To summarise, our "Contour Monte Carlo" (CMC) algorithm is composed of two procedures, which together constitute Algorithm 1. The first estimates the contour volume density:

    for i = 1, ..., N1 do
        Sample λi uniformly within the parameter bounds
        Calculate the corresponding output value Qi = Q(λi)
    end for
    Fit kernel density estimator to output values
    return V(Q)

The second, JacobianModifiedMetropolis(V, p), samples from the posterior input distribution:

    λ0 ∼ π(.)    (sample from an arbitrary initialisation distribution)
    for i = 1, ..., N2 do
        Propose new parameter values λ* for the inputs
        Calculate the acceptance probability r (Eq (7))
        u ∼ U(0, 1)    (sample from the uniform distribution)
        if r > u then λi = λ* (accept proposal) else λi = λi−1 (reject proposal)
    end for
    return (λ0, λ1, λ2, ..., λN2)

Since we are reconstructing the posterior input distribution by MCMC sampling, it is crucial to determine when our sampling distribution has converged. To do so, we use the same convergence diagnostic criteria that are used in applied Bayesian inference.
We run a number of Markov chains in parallel and compare their within-chain variance to that between the chains by calculating R̂ [14], and use a convergence threshold of R̂ ≲ 1.1 on all input parameters. Once convergence is achieved, we discard an initial portion of the chains as "warm-up" [11,15] and use the rest to form our posterior distribution.

The type of uniform prior we have described thus far appears sensible since it allocates probability mass in exact accordance with the geometry of the input-output map. In the absence of further information it seems reasonable to suppose that all inputs which produce a given output value are equally likely causes of it, and Algorithm 1 describes how to sample from the resultant posterior input distribution. It is unclear, however, how to use our existing CMC algorithm to handle unbounded prior spaces, where uniform prior distributions cannot be specified. We now extend the framework described in §4.1.1 to handle arbitrary prior distributions. We start with the expression for the posterior distribution,

p(Λ|data) = p(Λ|Q(Λ)) p(Q(Λ)|data) = [p(Q(Λ)|Λ) p(Λ) / p(Q(Λ))] p(Q(Λ)|data),

where we have used Bayes' rule to rewrite the conditional prior term. We then recognise that for a deterministic model p(Q(Λ)|Λ) = 1, resulting in the following general expression for the posterior distribution,

p(Λ|data) = p(Λ) p(Q(Λ)|data) / p(Q(Λ)).    (8)

If the parameter bounds are non-infinite and consist of a total volume V in input space, and we suppose that all values of Λ are equally likely a priori, then the above becomes,

p(Λ|data) = p(Q(Λ)|data) / (V p(Q(Λ))).

However, if all parameter values are equally probable then p(Q(Λ)) is simply the proportion of input space defined by the set {Λ' : Q(Λ') = Q(Λ)}. Multiplying this by the overall volume of input space yields the volume of input space occupied by this set, V(Q(Λ)), and we recapitulate the expression for the posterior that we approximated in equation (6).
We thus see that the assumption of a uniform prior within a contour volume is equivalent to the assumption of a uniform prior throughout the input domain.

More generally, we can imagine setting informative priors on parameters that may or may not be finitely bounded. For parameters with infinite bounds a uniform prior is improper, and we therefore resort to using valid probability distributions that have support over an infinity of values. In general, the priors that we set will not be aligned with the shape of input contours, meaning that there can be variation in the probability density within a given contour volume. How should we handle parameter inference in this new framework? Equation (8) suggests a modified procedure, GeneralContourMonteCarloGeneralPrior(p(Q|data), p(λ)), whose first phase samples inputs from the prior p(λ) rather than uniformly, calculates the corresponding output values, and fits a kernel density estimator to them to approximate p(Q(Λ)); its second phase then samples from the posterior input distribution of equation (8).

Whilst our approach is intended for systems where the number of inputs exceeds that of the outputs, simpler systems with one input and one output (one-to-one) can be used to build an intuitive understanding of the method. Here the usual Jacobian of a coordinate transformation plays the role of the prior term, p(Λ|Q(Λ)).

Illustrative example #1

Consider an input (λ) to output (Q) map of the form,

Q(λ) = λ²,    (10)

where λ ∈ [0, 1]. Suppose that we seek a distribution over inputs p(λ) such that the corresponding distribution over outputs follows Q(λ) ∼ beta(2, 2) (see the black lines in the right hand column of Figure 1). A straightforward way to generate samples from p(λ) is to first sample Qi ∼ beta(2, 2) then use the inverse map to yield samples λi = √Qi. For most interesting models in mathematical biology, however, including ODEs and PDEs, an inverse map cannot be calculated or does not exist. As such, we aim to generate samples from p(λ) using only the forward map.
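For this one-to-one map, the inverse-sampling route can be sketched directly. The snippet below is a minimal illustration (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from the target output distribution, then pull back through the inverse map
q = rng.beta(2, 2, size=100_000)   # Q ~ beta(2, 2)
lam = np.sqrt(q)                   # inverse of Q(lambda) = lambda^2 on [0, 1]

# By the change-of-variables formula, the induced input density is
# p(lambda) = 12 lambda^3 (1 - lambda^2), which has mean 24/35
```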

It may appear that a valid approach to generate input samples from p(λ) is to use a Markov chain Monte Carlo (MCMC) algorithm like random walk Metropolis [16], where proposed values λ* are accepted as samples with probability,

r = min(1, p(Q(λ*)) / p(Q(λ))),

where p(Q(λ)) is the target density (here a beta(2, 2) distribution) evaluated at Q = λ². This approach, however, results in an input distribution (top-left panel in Figure 1) which, when transformed, yields an output distribution that does not match the target (top-right panel in Figure 1). The reason for this bias is that it neglects to include a Jacobian term in the density which accounts for the nonlinear change of measure in going from input to output space. If this Jacobian term is included in the Metropolis accept-reject rule,

r = min(1, [p(Q(λ*)) J(Q(λ*))] / [p(Q(λ)) J(Q(λ))]),    (12)

where, in this case, J(Q) = dQ/dλ = 2√Q, then this results in an input distribution (middle-left panel in Figure 1) which, when transformed, results in an output distribution that corresponds to the target (middle-right panel in Figure 1).

Indeed, in this simple example, it is possible to exactly determine the input distribution that results in a beta(2, 2) target. This is given by the Jacobian transformation multiplied by the target density,

p(λ) = p(Q(λ)) J(Q(λ)) = 12λ³(1 − λ²),

which is shown by the black lines in the left hand column of Figure 1.

Estimating the Jacobian transformations by sampling

Whilst the Jacobian may be calculated analytically and the marginalization performed for simple input-to-output maps, for maps of even modest difficulty this term cannot typically be derived analytically. We now illustrate how approximate Jacobian transformations can be obtained by sampling. To do so, we first sample uniformly from the bounds of λ (Figure 2A) which, when transformed by equation (10), yields a sampling distribution for Q (Figure 2B).

By fitting a Gaussian kernel density estimator to this sampling distribution (dashed blue lines in Figure 2B), we can well approximate the analytic inverse Jacobian density (grey line) with a modest number of samples. This allows us to use the Jacobian-corrected Metropolis sampler defined by the acceptance ratio of equation (12) to generate samples from the input distribution p(λ).

Once the inverse Jacobian transformation has been estimated, we simply replace the actual Jacobian in our Metropolis-based rule given by equation (12) with the sample-based estimate Ĵ,

r = min(1, [p(Q(λ*)) Ĵ(Q(λ*))] / [p(Q(λ)) Ĵ(Q(λ))]).

Our approach, therefore, consists of two distinct steps: first, independent sampling from the prior space (which is currently assumed to be bounded but, in §4.2, this is relaxed) and transformation of these inputs to form a sample of outputs, followed by the fitting of an empirical density estimator to the output samples; second, Jacobian-modified Metropolis sampling using the Jacobian transformations estimated in the first step. As the sample size of the first step approaches infinity, the Markov chain in the second step converges asymptotically to the target density. For a finite first-step sample size, the sampling distribution of the Markov chain in the second step does not exactly converge to the target distribution, due to Jensen's inequality. For even modest sample sizes, however, we have found this bias negligible compared to other sources of uncertainty.
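The two steps can be sketched end-to-end for the map Q(λ) = λ² with a beta(2, 2) target. This is a minimal illustration, not the authors' code: the kernel bandwidth, proposal step size, chain length, and warm-up fraction are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
Q = lambda lam: lam**2                      # forward map on [0, 1]
target = lambda q: 6.0 * q * (1.0 - q)      # beta(2, 2) density for the output

# Step 1: uniform input samples -> kernel density estimate of the output
# sampling density (proportional to the inverse Jacobian density)
s = Q(rng.uniform(0.0, 1.0, 50_000))
bw = 0.02
V_hat = lambda q: np.mean(np.exp(-0.5 * ((q - s) / bw)**2)) / (bw * np.sqrt(2.0 * np.pi))

# Dividing by the estimated output density is equivalent to multiplying by J-hat
post = lambda lam: target(Q(lam)) / V_hat(Q(lam))   # un-normalised input density

# Step 2: Jacobian-modified random walk Metropolis on the input
lam, p_lam, chain = 0.5, post(0.5), []
for _ in range(30_000):
    prop = lam + rng.normal(0.0, 0.15)
    if 0.0 < prop < 1.0:                    # reject proposals outside the bounds
        p_prop = post(prop)
        if rng.uniform() < p_prop / p_lam:
            lam, p_lam = prop, p_prop
    chain.append(lam)
samples = np.array(chain[5_000:])           # discard warm-up

# analytic input density: p(lambda) = 12 lambda^3 (1 - lambda^2), mean 24/35
```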

In the bottom row of Figure 1, we illustrate how using our two-step method allows us to generate an input distribution which maps to the target.

For many-to-one maps, no inverse map exists, meaning that a given output value (unless a singular point) corresponds to a number of input vectors. This means that a given output distribution can be produced from a variety of input distributions. As in Bayesian inference, to reduce the set of allowable input distributions to one, we specify a prior distribution over input parameters. In contrast to Bayesian inference, this prior is specified as a conditional probability distribution, where we condition on particular output values.

A natural way to specify such a prior is to assume that the probability distribution is uniform within the set of possible inputs that generate a given output value. The prior distribution is then given by the probability density,

p(Λ|Q(Λ)) = 1/V(Q(Λ)),

where V(Q) = |Λ'| is the volume of the input set Λ' = {Λ : Q(Λ) = Q}. We term the size of the region of input space that produces a particular output value its "contour volume". Note that we use the term "volume" liberally here: for a two-dimensional input space and a scalar output, for example, the "contour volume" is the length of a contour line.

Illustrative example #2

To explain how this process works we consider the case of two-dimensional inputs (λ1, λ2) bounded by the region λ1² + λ2² ≤ 2², and a uni-dimensional output,

Q(λ1, λ2) = 1/(1 + λ1² + λ2²).    (14)

A plot of this function is shown in Figure 3, where it is observed that each value of the function maps to any point along circular "contours" (dashed lines) in input space (λ1, λ2), with the exception of the singular point at the origin.

We suppose that we want to produce an input distribution which results in a target output distribution which is uniform across allowed function values Q ∈ [0.2, 1], since the function is bounded by its extreme values at λ1² + λ2² = 2² (where Q = 0.2) and (λ1, λ2) = (0, 0) (where Q = 1).

We seek an analytic expression for the input distribution which is consistent with this target distribution given a uniform prior distribution over inputs. The prior distribution which is uniform over inputs can be transformed from Cartesian form (p(λ1, λ2) = 1/(4π)) to polar coordinates,

p(r, θ) = r/(4π),    (15)

where r² = λ1² + λ2² and θ = arctan(λ2/λ1). Marginalising over θ, the above becomes p(r) = r/2.

Note: the "Jacobian" associated with a nonlinear change in measure for a mapping from R^n to R^m, where m < n, is constructed in Theorem 2.1.1 of [17].

For this example, we can analytically calculate the prior term in equation (4) using equation (15) and the normalizing constant equal to the area of the domain Ω (= 4π),

p(Λ|Q(Λ)) = 1/V(Q(Λ)) = Q(Λ)²/π.

Then using Eq (4), and targeting a uniform output distribution across the range of Q, p(Q|data) = 1/(4/5) = 5/4, we calculate an analytic form of the posterior input distribution,

p(λ1, λ2|data) = 5 / (4π (1 + λ1² + λ2²)²).    (20)

This can be alternatively expressed in polar coordinates as,

p(r|data) = (5/2) r / (1 + r²)².    (21)
In most applied cases, analytic calculation of the contour volume distribution, as given by equation (19) in this example, is not possible, and we must instead approximate it by sampling. We now illustrate this process for the current example. To start, we idealise the problem and assume the function value Q is constant within annuli of a given width and distance from the origin (Figure 4A). Since our function and input domain are well behaved, it is possible to analytically calculate the area of each annulus (Figure 4B).
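As a sanity check on this example, two densities that follow from the change of variables above should both integrate to one: the contour volume density of Q under uniform sampling of the disc, p(Q) = 1/(4Q²) on Q ∈ [0.2, 1], and the marginal posterior for r, p(r) = (5/2) r/(1 + r²)² on r ∈ [0, 2]. A quick numerical check (the grid size is arbitrary):

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoidal rule, written out to avoid version-specific numpy helpers
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# contour volume density of Q = 1/(1 + r^2) under uniform sampling of the disc
qs = np.linspace(0.2, 1.0, 100_001)
p_q = 1.0 / (4.0 * qs**2)
I_q = trapezoid(p_q, qs)     # should be ~1

# marginal posterior for r under the uniform output target
rs = np.linspace(0.0, 2.0, 100_001)
p_r = 2.5 * rs / (1.0 + rs**2)**2
I_r = trapezoid(p_r, rs)     # should be ~1
```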

In the above explanation we assumed that the function value was constant within a given annulus; however, the continuous nature of the function means that the contours are actually one-dimensional lines (Figure 3). Given the simple nature of this output function, we can analytically determine the length (which we call a contour volume) of each of these lines. In our example, the output space is a foliation of these one-dimensional lines, whose non-uniform spacing determines a density of contour volumes (roughly, the area of input space that maps to an infinitesimal range of output). In Algorithm 1, we estimate the contour volume distribution by sampling uniformly from the input space (Figure 4C), and for each set of inputs (λ1i, λ2i), we calculate a corresponding Qi. To approximate p(Q) in equation (18) we then fit a kernel density model to (Q1, Q2, ..., QN), where N is the number of input samples (the black line in Figure 4D). After relatively few input samples, the estimated contour volume distribution well approximates the true distribution of equation (18) (the orange line in Figure 4D).
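The sampling estimate just described can be sketched as follows. The sample size and bin count are arbitrary, and a histogram stands in for the kernel density estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform samples from the disc lambda1^2 + lambda2^2 <= 2^2, by rejection
n = 200_000
pts = rng.uniform(-2.0, 2.0, size=(4 * n, 2))
pts = pts[np.sum(pts**2, axis=1) <= 4.0][:n]

# Push each input pair through the map Q = 1 / (1 + lambda1^2 + lambda2^2)
q = 1.0 / (1.0 + np.sum(pts**2, axis=1))

# Histogram estimate of the contour volume density; under uniform sampling
# of the disc the analytic density is p(Q) = 1/(4 Q^2) on [0.2, 1]
edges = np.linspace(0.2, 1.0, 41)
est, _ = np.histogram(q, bins=edges, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
analytic = 1.0 / (4.0 * mids**2)
max_abs_err = float(np.max(np.abs(est - analytic)))
```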

Note that the contour volume distribution p(Q) is equivalent to the two-to-one inverse Jacobian transformation for this problem. Specifically, Figure 4D is analogous to Figure 2B, except for a two-dimensional input example. In all cases where the input spaces are bounded and the input priors are uniform, what we call the "contour volume distribution" is equivalent to the density defined by the inverse Jacobian transformation.

We compare the estimated marginal posterior distribution for r that results from the CMC algorithm with the analytic result of equation (21). In the first phase of CMC, we estimate the contour volume for any output value using a kernel density estimator fit to the function evaluations from uniformly sampled inputs (Figure 4D), and use this in a second phase targeting a uniform output distribution (within bounds). In this case, our estimated posterior density is of the simple form,

p̂(Λ|data) ∝ 1/V̂(Q(Λ));

in other words, we step in inverse proportion to the estimated contour volumes. This results in a marginal posterior density for inputs which is concentrated towards the centre of the input domain (Figure 5A), and a uni-dimensional distribution for r = √(λ1² + λ2²) which is in agreement with the analytic result of equation (21) (Figure 5B). A benefit of our algorithm is that we can validate it by comparing the marginal output posterior with the target. Since in this case our output distribution looks approximately uniform (Figure 5C), it appears that the algorithm is working correctly. The kernel density estimator for the volume of contours is shown in Figure 4D and resulted from 50,000 independent samples of the inputs (Figure 4D). To correct for the over-representation of points with low function values, we account for their longer contour lengths by estimating V and using the inverse of this as a prior.

A more complex test problem, comprised of a pair of nonlinear maps (two outputs) with three parameters and introduced in [10], is analysed in S3 (A nonlinear map).

We apply the function specified by equation (16) to a large number of independent samples from the above distribution, and we fit a normal distribution to the output distribution (Figure 6B).

(a) Uniform prior distribution

We then use the output distribution in Figure 6B as a target for CMC. The resultant posterior distribution is concentrated among points that lie within the same annulus in which the bulk of the probability mass of the original input distribution lies (Figure 6C). This is because our prior distribution gives equal weight to all points within a given contour. Since the function values in the annulus are similar to those obtained from using the input distribution, we obtain a posterior probability mass that is distributed uniformly around the annulus.

Note that the posterior distribution is not the same as the input distribution. This example demonstrates an important characteristic of deterministic models whose input dimension exceeds that of the outputs: a given input distribution induces an output distribution which, when inverted, produces an input distribution that is not necessarily the same as its cause. Information is lost in transforming from the inputs to the outputs that cannot necessarily be regained by inverting the output map.

(b) Normal prior distribution

We now change the priors such that they coincide with the input distribution given in equation (23). We sample from this prior distribution (Figure 7A) and use each (λ1i, λ2i) pair as inputs to Q(λ1i, λ2i). We then obtain an approximate output distribution by fitting a kernel density estimator to the resultant output values (Figure 7B). Note that because our prior distribution has changed, our prior output distribution V(Q(Λ)) changes (compare Figure 7B with Figure 4B). We use this estimated output distribution in the second phase of the algorithm.

The posterior distribution that results is located near the bulk of probability mass of the prior distribution (Figure 7C). Comparing this with the distribution that resulted from using a uniform prior (Figure 6C), we see that more informative priors allow us to better identify the input parameter distribution. In both cases, however, the output distribution matched the target (Figure 6D and Figure 7D).

To assess the performance of the CMC algorithm we consider three biological models of increasing complexity. We first consider the logistic growth equation,

dy/dt = r y (1 − y/κ),

with initial condition y(0) = y0, where y(t) represents the density of individuals at time t in a population that experiences competition for resources, for example, a bacterial colony confined to a Petri dish. Here r represents the initial (exponential) growth rate of the colony and κ is the carrying capacity. An output distribution was generated by calculating Q1 = y(8)

for each pair of samples (r, κ) drawn from independent input distributions, where we assumed that the initial population density was fixed at y0 = 0.1. The resultant output distribution was fit using a normal distribution and specified as the target. We assumed uniform priors for the input parameters. The resulting posterior identifies the growth rate but not the carrying capacity.

To explain this pattern of identification, the elasticity of Q1 with respect to each parameter was calculated using the method described in [20] and is presented in Figure 9. At early times (e.g., t = 8), the population density is sensitive to the growth rate parameter r but is relatively insensitive to the carrying capacity κ, since competition for resources has yet to predominate, and we are unable to identify this parameter. (Figure 9 shows the elasticities for Q1 = y(8) and Q2 = y(29); these curves were generated by assuming y0 = 0.1, r = 0.5 and κ = 10.)

We next chose Q2 = y(29) as the output statistic and used the same input distribution to generate a target output distribution. The posterior input distribution generated by CMC (Figure 8B) identifies the carrying capacity but not the growth rate. At later times (e.g., t = 29) the elasticities with respect to the parameters r and κ are reversed (Figure 9). For t > 20, the population density is essentially determined by the (equilibrium) carrying capacity, and we are unable to identify the intrinsic growth rate other than to determine that it must exceed some minimum value in order for the population to grow to near its equilibrium value by t = 29.
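The reversal of sensitivities described above can be reproduced from the closed-form logistic solution. The sketch below uses central finite differences for the elasticities (a simple stand-in for the method of [20]) and the parameter values quoted for Figure 9:

```python
import numpy as np

def y(t, r, kappa, y0=0.1):
    # closed-form solution of dy/dt = r y (1 - y/kappa), y(0) = y0
    return kappa / (1.0 + (kappa - y0) / y0 * np.exp(-r * t))

def elasticity(t, r, kappa, wrt):
    # relative change in y(t) per relative change in a parameter
    h = 1e-6
    if wrt == "r":
        d = (y(t, r + h, kappa) - y(t, r - h, kappa)) / (2 * h)
        return d * r / y(t, r, kappa)
    d = (y(t, r, kappa + h) - y(t, r, kappa - h)) / (2 * h)
    return d * kappa / y(t, r, kappa)

r, kappa = 0.5, 10.0
e_r_early = elasticity(8.0, r, kappa, "r")       # growth rate dominates early
e_k_early = elasticity(8.0, r, kappa, "kappa")
e_r_late = elasticity(29.0, r, kappa, "r")       # carrying capacity dominates late
e_k_late = elasticity(29.0, r, kappa, "kappa")
```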

Finally, we used a bivariate normal target distribution over (Q1, Q2). The prior distributions were unchanged. The posterior density (Figure 8C) indicates that in this case both input parameters were well identified: the combined distribution of the two previous output statistics is sensitive with respect to each of the parameters. Note that in this case the posterior distribution ascribes probability mass to a region of parameter space similar to that of the original (causative) input distribution. As discussed in §4.1, this is a special case, not typical of underdetermined input-output maps. It arises because, in the univariate output cases, the estimates of the individual parameters are not correlated with one another and lead to unbiased identification of r (for Q1) and κ (for Q2) respectively. These constraints ensure that using the joint output distribution produces an unbiased estimate of both parameters.

As might be expected, informative priors affect the recovered posterior distribution. To illustrate, we use Q2 = y(29) as our univariate output summary statistic. Given uniform priors, the posterior distribution (Figure 10A and Figure 8B) indicates that, although we can identify the carrying capacity, we cannot identify the growth rate. If, instead of a uniform prior, we use r ∼ Γ(2.5, 0.2) as the prior distribution, which allocates probability mass around r = 0.5, the resultant posterior narrows with respect to this parameter (Figure 10B). Using a prior distribution that is narrower still, r ∼ Γ(40, 0.0125), we strongly identify both parameters (Figure 10C).
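The narrowing effect of these two priors can be checked directly. Reading Γ(k, θ) in the shape/scale parameterisation is an assumption here, though it is the reading under which both priors share the mean 0.5 that the text describes:

```python
from scipy.stats import gamma

# Shape/scale reading of the two priors on r (assumed parameterisation):
# mean = shape * scale, sd = sqrt(shape) * scale.
wide = gamma(a=2.5, scale=0.2)        # Gamma(2.5, 0.2)
narrow = gamma(a=40.0, scale=0.0125)  # Gamma(40, 0.0125)

# Both allocate probability mass around r = 0.5, but the second is far
# more concentrated, which is why it identifies r so much more strongly.
print(wide.mean(), wide.std())      # 0.5, sd ~ 0.316
print(narrow.mean(), narrow.std())  # 0.5, sd ~ 0.079
```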
We next consider a model of Michaelis-Menten enzyme kinetics, in which the reaction scheme E + S ⇌ ES → E + P is described by mass-action ODEs, where k_f is the rate constant for the forward reaction E + S → ES, k_r is the rate constant for the reverse reaction ES → E + S, and k_cat is the catalytic rate at which the product is formed by the reaction ES → E + P. We assume independent uniform priors on each of the parameters (k_f, k_r, k_cat), with upper and lower bounds indicated by the axes in Figure 11. We target a bivariate output distribution over the enzyme and substrate concentrations. The parameters k_f and k_cat are localised within their prior bounds (Figures 11 & 12). The rate of the backward reaction k_r, however, cannot be recovered from this output target. The contour volume distribution was estimated using a Gaussian kernel density estimator with default bandwidth fitted to 120,000 iterations (using the "kde" function in the "ks" R [21] package with grid sizes of 300 and maxima of 4 in each dimension [22]), and the posteriors were estimated using c. 300,000 MCMC iterations.
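The mass-action system behind these three reactions can be integrated numerically; the sketch below uses illustrative rate constants and initial concentrations (assumed values, not the paper's), and samples the two outputs at the times named later in the text.

```python
from scipy.integrate import solve_ivp

def mm_rhs(t, y, kf, kr, kcat):
    """Mass-action kinetics for E + S <-> ES -> E + P."""
    E, S, ES, P = y
    bind = kf * E * S   # E + S -> ES
    unbind = kr * ES    # ES -> E + S
    cat = kcat * ES     # ES -> E + P
    return [-bind + unbind + cat,  # dE/dt
            -bind + unbind,        # dS/dt
             bind - unbind - cat,  # dES/dt
             cat]                  # dP/dt

# Illustrative parameters and initial concentrations (assumptions).
kf, kr, kcat = 1.0, 0.5, 2.0
y0 = [1.0, 5.0, 0.0, 0.0]  # E, S, ES, P
sol = solve_ivp(mm_rhs, (0.0, 2.0), y0, args=(kf, kr, kcat),
                t_eval=[1.0, 2.0], rtol=1e-8, atol=1e-10)
S_at_1, E_at_2 = sol.y[1, 0], sol.y[0, 1]  # the two sampled outputs
```

Note that total enzyme (E + ES) and total substrate-derived material (S + ES + P) are conserved, a useful sanity check on any such integration.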

Some insight into this pattern of identifiability can be gained by examining the elasticities of E and S with respect to each of the input parameters (Figure 13). The enzyme and substrate concentrations at times t = 2 and t = 1, respectively, are most sensitive to changes in k_cat, and this parameter is the most clearly identified. The enzyme and substrate concentrations at our sampling times are less sensitive to the forward rate of reaction, k_f, but these values are apparently sufficient to localise this parameter to a subregion of its prior bounds (Figure 11). The rate of the backward reaction, however, does not strongly influence the sampled values of either the enzyme or substrate concentrations at our sampling times, and this parameter is practically unidentifiable.

An SIR model of disease transmission incorporating a carrying capacity for the population is described by a system of ODEs [23], in which N(t) = S(t) + I(t) + R(t) denotes the total population density at time t.

Including the initial conditions (S0, I0, R0), we have nine uncertain parameters, each of which is assumed to have a uniform prior distribution. The upper and lower bounds of these distributions are provided by the axes in Figure 14. We use a three-dimensional output target distribution. We restrict the dimension of the output space in order to examine a highly underdetermined system and to avoid issues associated with kernel density estimation in higher-dimensional output spaces. A further discussion of this issue appears in §6.
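The excerpt does not reproduce the system from [23], so the sketch below is only one common formulation of an SIR model with logistic population growth. The parameters r, κ, µ and γ match those discussed in the text; the transmission rate β and the disease-induced death rate α are assumed placeholders to bring the count of rate parameters, together with (S0, I0, R0), to nine.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, r, kappa, beta, gamma, mu, alpha):
    """A common SIR variant with logistic population growth; a sketch
    only, not necessarily the exact system of [23]."""
    S, I, R = y
    N = S + I + R
    births = r * N * (1.0 - N / kappa)  # density-dependent recruitment
    return [births - beta * S * I / N - mu * S,
            beta * S * I / N - (gamma + mu + alpha) * I,
            gamma * I - mu * R]

# Illustrative parameter values and initial conditions (assumptions).
params = (0.5, 100.0, 2.0, 0.5, 0.05, 0.1)  # r, kappa, beta, gamma, mu, alpha
y0 = [50.0, 1.0, 0.0]                       # S0, I0, R0
sol = solve_ivp(sir_rhs, (0.0, 40.0), y0, args=params,
                t_eval=np.linspace(0.0, 40.0, 81), rtol=1e-8, atol=1e-10)
```

With logistic recruitment, the total density N remains bounded by the carrying capacity κ, which is the feature the text's identifiability discussion turns on.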

The posterior distribution over inputs indicates modest identification of the parameter r, which dictates the rate of initial exponential growth of the susceptible population (Figure 14r), and of the carrying capacity term κ. The posterior distributions of the death rate µ and of the recovery rate γ were also constrained to a subset of input space (Figures 14κ, 14µ, 14γ). The remaining parameters of the model were unidentified using these output statistics (Figure 14).

Mathematical models are abstract representations of their biological counterparts. Assessing the sensitivity of key characteristics of model outputs to perturbations in input parameters provides insight into the sensitivity of the system to each of its constituent elements. These so-called sensitivity analyses of mathematical models allow us to probe the biological system even when biological experiments are infeasible. Inverse sensitivity analysis inverts this process: instead of determining how model outputs vary in response to changes in the input parameter values, it estimates a distribution over inputs which achieves a given distribution over outputs.

In this paper, we introduce an approach to inverse sensitivity analysis which can be applied to systems with many input parameters, mitigating the curse of dimensionality that limits the scope of methods which rely on grid-based approaches to build an explicit output-to-input map [10]. We have demonstrated that our algorithms can perform inverse sensitivity analysis on mathematical models of biological systems across a range of complexities, including the logistic growth model (2 inputs & 1 output target), Michaelis-Menten kinetics (3 inputs & 2 output targets), and the SIR model with uncertain initial population sizes (9 inputs & 3 output targets). As well as detailing our algorithms, we provide a probabilistic framework for understanding inverse sensitivity analysis, in which "prior" probability distributions are set on the inputs.

These prior beliefs over input values are consistent with a "posterior" input distribution which, when transformed through the input-to-output map, results in a "target" output distribution. To sample from these posterior input distributions, we introduce a two-step sampling algorithm. In the first step, input parameters are independently sampled from their prior distributions and a kernel density estimator is fitted to the resultant output values; this provides an approximate Jacobian transform (which we interchangeably term a "contour volume distribution"), which is then used in the second step, involving Markov chain Monte Carlo. A similar algorithm for inverse sensitivity analysis has recently been derived from a measure-theoretic perspective by [24]. These authors also investigate the stability of the posterior distribution with respect to the observed output distribution, the assumed prior distribution and the approximation of the contours of the forward map. We believe that the different path we take to the shared goal offers complementary insight into the algorithm's mechanism and, more generally, provides an intuitive way to understand inverse sensitivity analysis.
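The two steps can be sketched on a toy problem. The map Q, the prior box and the target below are stand-ins chosen only to keep the example self-contained; the structure of the sampler, reweighting the target density by the estimated contour volume density, is the point being illustrated.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(1)

# A toy underdetermined map: two inputs, one output (a stand-in for the
# model of interest; the algorithm does not depend on Q's form).
def Q(theta):
    return theta[..., 0] * theta[..., 1]

lo, hi = 0.0, 2.0                  # uniform prior box on each input
target = norm(loc=1.0, scale=0.2)  # desired output distribution f

# Step 1: estimate the contour volume distribution v by pushing prior
# samples through Q and fitting a kernel density estimator.
prior_draws = rng.uniform(lo, hi, size=(5_000, 2))
v_hat = gaussian_kde(Q(prior_draws))

# Step 2: random-walk Metropolis targeting pi(theta) * f(Q(theta)) / v(Q(theta));
# the flat prior pi cancels inside the box.
def log_weight(theta):
    if np.any(theta < lo) or np.any(theta > hi):
        return -np.inf
    q = Q(theta)
    return target.logpdf(q) - np.log(v_hat(q)[0])

theta = np.array([1.0, 1.0])
lw = log_weight(theta)
samples = []
for _ in range(4_000):
    prop = theta + rng.normal(scale=0.25, size=2)
    lw_prop = log_weight(prop)
    if np.log(rng.uniform()) < lw_prop - lw:  # symmetric proposal
        theta, lw = prop, lw_prop
    samples.append(theta.copy())
samples = np.asarray(samples)
```

Asymptotically, pushing the sampled inputs back through Q reproduces the target: the ratio f/v̂ is precisely what corrects the prior's pushforward into the requested output distribution.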

There are several subtleties in the first steps of the processes described in Algorithms 1 and 2 which must be understood in order to ensure a valid input distribution is obtained. Indeed, these intricacies complicated our own efforts in testing the algorithms. Provided the output is well behaved over the space of possible input values, a univariate output distribution can be approximated, given a relatively modest number of samples from the input priors, using standard kernel density estimation (KDE). Here, for the univariate output target distributions, we found that KDE with a Gaussian kernel using default bandwidths from each software package used (Matlab, Mathematica and R [18, 19, 21]) was able to represent the output distribution with sufficient fidelity to ensure the input posterior recaptured the output target. The number of input samples necessary to ensure convergence to the true posterior input distribution, however, depends on the exact output distribution being targeted. If the bulk of probability mass for the target output distribution lies at a location in output space where the contour volume is rapidly varying, then the input distribution obtained will be sensitive to errors in kernel density estimates of the contour volume distribution, and many samples will be required. Similarly, if a region of low contour volume is targeted, then kernel density estimates with few samples will be relatively noisy and more samples will be necessary. Here we have assumed that numerical errors in solving the map are negligible and independent of the parameters. Neither assumption is likely to hold for sophisticated partial differential equation models, and the interaction between numerical and sampling errors is the subject of ongoing analysis. KDE thus introduces a further source of error, which must be carefully managed to ensure reasonable results are obtained.
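The last point, that low-density regions of output space are estimated less reliably, is easy to demonstrate with a synthetic check, unrelated to any specific model above:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(2)

def mean_rel_err(n, x, reps=50):
    """Average relative error of a default-bandwidth Gaussian KDE of a
    standard normal sample of size n, evaluated at the point x."""
    truth = norm.pdf(x)
    errs = [abs(gaussian_kde(rng.standard_normal(n))(x)[0] - truth) / truth
            for _ in range(reps)]
    return float(np.mean(errs))

# With few samples, the error at a low-density point (x = 2.5) is much
# larger than at the mode (x = 0); both shrink as n grows.
e_mode_small = mean_rel_err(200, 0.0)
e_tail_small = mean_rel_err(200, 2.5)
e_mode_large = mean_rel_err(5_000, 0.0)
```

The same effect, applied to the contour volume estimate v̂, is what drives up the sample requirements when the target sits in a region of low contour volume.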

Our algorithms avoid the curse of dimensionality of the input space which plagues grid-based approaches to inverse sensitivity analysis. The necessity of fitting a probability distribution to the output samples resulting from sampling the prior input distributions means that, at present, our approach is limited to problems with relatively few outputs. In §5.1.3, the output target was a three-dimensional distribution, and we expended considerable effort finding a KDE method that adequately approximated the three-dimensional contour volume distribution. We ultimately found that the most effective approach was to use the "kde" function within the "ks" R package [21, 22], which uses the data to estimate unconstrained bandwidth matrices, which are then used to fit kernel density estimates to data with up to six dimensions.

Density estimation, however, is currently an active area of research, and software packages exist implementing many different variants of KDE (see [25] for a review of the R packages already available in 2011). Vine copulas have recently been suggested as an approach which avoids the curse of dimensionality in density estimation [26]. If this promise is realised, then our algorithm will be applicable to output target distributions of higher dimensions.

Mathematical models have proved indispensable tools for improving our understanding of biological systems, which are frequently not amenable to direct experimentation.

Biological systems are often robust to perturbations of particular constituent processes, and we can use mathematical models to explore these sensitivities. Inverse sensitivity analysis is a relatively recent addition to the modeller's toolbox that allows one to determine an input distribution, consistent with prior beliefs, that can generate a given distribution of outputs. Here we introduce a Monte Carlo method which extends the range of models for which inverse sensitivity analysis can be performed, and illustrate its utility for several problems of interest in computational biology. It is our hope that, by publishing this method, others will be encouraged to undertake inverse sensitivity analysis, which we have found insightful for building and analysing mathematical models.