Abstract
This poster describes a novel reparametrization of a fre-quently used non-linear ordinary differential equation (ODE) model of gene regulation. We show that in its commonly used form, the model cannot reliably distinguish between both quantitatively and qualitatively different parameter combinations. The proposed reparametrization makes inference over the model stable and amenable to fully Bayesian treatment with state of the art Hamiltonian Monte Carlo methods.
Complete source code and a more detailed explanation of the model is available at https://github.com/cas-bioinf/genexpi-stan.
1 Introduction
Transcriptional regulation has historically been treated with a vast range of models from Boolean networks, to full Michaelis-Menten simulation of individual chemical reactions. A middle ground is occupied by linear and non-linear ODE models that approximate Michaelis-Menten kinetics. In our attempts to reimplement a Bayesian version of a popular sigmoid ODE model [1] we noticed that very different parameter values - including different sign of the regulatory effect - may result in almost identical behavior, which introduces computational difficulties and questions the interpretation of previous results achieved with the model. Here we describe novel reparametrization of the model implemented in the Stan probabilistic programming language [2] that mitigates these problems.
2 The Model
The ODE model we use is based on [3]. For a bacterial regulon with a single regulator y and targets x1…xn the model takes form
Where si, wi, bi and di are parameters to be fit. While bacterial regulons are our primary interest, the model also works with multiple regulators. The model assumes that both the regulator and targets are measured at multiple time points, letting us to solve the ODE. When using microarray data we assume normal observation noise where the standard deviation has two components: one constant and one proportional to the expression. The regulator is fit using B-splines where the spline coefficients are also treated as parameters of the model which let us to both handle uncertainty in regulator measurements and obtain the regulator at arbitrarily small time resolution required for solving the ODE numerically. We note that the model is only weakly identifiable in its direct form (1), as very different parameter sets can result in similar solutions for xi. Weak identifiability poses computational problems for Bayesian inference, reduces stability of maximum-likelihood estimates and limits interpretability of the model results. Critically, the sign of wi (whether the regulator is an activator or a repressor) is not well determined for certain target profiles (Fig. 1a) and it may be impossible to determine whether wi ≃ 0 (Fig. 1b). Negligible effect on xi can also be observed under linear transformations of (wi, bi) when |wiy+bi| ⪢ 0 (Fig. 1c) and under approximately linear transformations of (si,di) (Fig. 1d).
Simulated examples of weakly identified parameters in the sigmoid model. Expression of the regulator (red, solid line) and multiple significantly different parameter values that give raise to similar solutions of the ODE (dashed lines) are shown together with possible measured target profiles (dots) that are insufficient to distinguish between the solutions. a) (w,b,s,d) ∈ {(5, −1, 3, 2), (−5,10, 2.5,1.4)} b) (w,b,s,d) ∈ {5, −1, 3, 3), (0, 0, 6, 3.4), (0, 0,16,10) c) s = 1, and (w,b) ∈ {(5, −3), (10, −6), (50, −30)} d) w = 1,b =-1and(s,d) ∈ {(10, 5), (19,10), (180,100)}
The aforementioned identifiability problems can be mitigated by a) fixing Ii = sgn(wi) and running separate fits when both signs have to be tested and b) replacing wi,bi and si with , the mean regulatory input
, the std. deviation of the regulatory input and ai - normalized ratio of si to di. The original parameters can then be recovered as:
where E and sd correspond to the sample mean and standard deviation and
is the observed expression of gene i.
To our knowledge, this is the first time the identifiability issues are reported and handled for the sigmoid ODE model. Caution should therefore be exercised when interpreting the parameter values found in previous works using this model.
3 Workflow and Remarks
A model fit is in itself insufficient to determine whether a regulation is plausible. To assess plausibility, we test whether the model improves over two baselines: a) constant synthesis (wi = 0) and b) a model where the regulator spline is not constrained by regulator measurements data. We use the LOO-IC criterion [4] to compare the models.
When there are known regulations, the model can be trained by fitting those known regulations first. The posterior estimates of the expression of the regulator can then be used to decrease uncertainty when fitting putative novel targets. Performance-wise, the training phase takes several minutes and fitting 881 putative targets individually on an 8-core machine takes 1 hour which promises faster performance than [1].
Footnotes
* Supported by C4Sys research infrastructure (MEYS project No: LM20150055).