RT Journal Article
SR Electronic
T1 A pipeline for systematic comparison of model levels and parameter inference settings applied to negative feedback gene regulation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.05.16.444348
DO 10.1101/2021.05.16.444348
A1 Adrien Coulier
A1 Prashant Singh
A1 Marc Sturrock
A1 Andreas Hellander
YR 2021
UL http://biorxiv.org/content/early/2021/05/17/2021.05.16.444348.abstract
AB Quantitative stochastic models of gene regulatory networks are important tools for studying cellular regulation. Such models can be formulated at many different levels of fidelity. A practical challenge is to determine what model fidelity to use in order to get accurate and representative results. The choice is important, because models of successively higher fidelity come at a rapidly increasing computational cost. In some situations, the level of detail is clearly motivated by the question under study. In many situations, however, many model options could qualitatively agree with available data, depending on the amount of data and the nature of the observations. Here, an important distinction is whether we are interested in inferring the true (but unknown) physical parameters of the model or whether it is sufficient to capture and explain the available data. The situation becomes complicated from a computational perspective because inference and model selection need to be approximate. Most often they are based on likelihood-free Approximate Bayesian Computation (ABC), and in that setting determining which summary statistics to use, as well as how much data is needed to reach the desired level of accuracy, are difficult tasks. Ultimately, all of these aspects (the model fidelity, the available data, and the numerical choices for inference and model selection) interplay in a complex manner. In this paper we develop a computational pipeline designed to systematically evaluate inference accuracy for a wide range of true known parameters. We then use it to explore inference settings for negative feedback gene regulation. In particular, we compare a spatial stochastic model, a coarse-grained multiscale model, and a simple well-mixed model for several data scenarios and for multiple numerical options for parameter inference. Practically speaking, this pipeline can be used as a preliminary step to guide modelers prior to gathering experimental data.
By training Gaussian processes to approximate the distance metric, we are able to significantly reduce the computational cost of running the pipeline.
NO Competing Interest Statement: The authors have declared no competing interest.