TY - JOUR
T1 - Countering reproducibility issues in mathematical models with software engineering techniques: A case study using a one-dimensional mathematical model of the atrioventricular node
JF - bioRxiv
DO - 10.1101/2021.02.19.431951
SP - 2021.02.19.431951
AU - Schölzel, Christopher
AU - Blesius, Valeria
AU - Ernst, Gernot
AU - Goesmann, Alexander
AU - Dominik, Andreas
Y1 - 2021/01/01
UR - http://biorxiv.org/content/early/2021/02/19/2021.02.19.431951.abstract
N2 - One would expect in silico experiments in systems biology to be less susceptible to reproducibility issues than their wet-lab counterparts, because they are free from natural biological variation and their environment can be fully controlled. However, recent studies show that only half of the published mathematical models of biological systems can be reproduced without substantial effort. In this article, we examine the potential causes of failed or cumbersome reproductions in a case study of a one-dimensional mathematical model of the atrioventricular node, which took us four months to reproduce. The model features almost all common types of reproducibility issues, including missing information, errors in equations and parameters, a lack of available data files, non-executable code, missing or incomplete experiment protocols, and missing semantic information about the rationale behind equations. Many of these issues resemble problems that software engineering has already solved with techniques such as unit testing, regression tests, continuous integration, version control, archival services, and a thorough modular design with extensive documentation. Applying these techniques, we reimplement the examined model in the modeling language Modelica. The resulting workflow can be applied to any mathematical model. It guarantees methods reproducibility by executing automated tests in a virtual machine on a server that is physically separated from the development environment. Additionally, it facilitates results reproducibility, because the model is more understandable and because the complete model code, experiment protocols, and simulation data are published and can be accessed in the exact version that was used in this article.
While the increased attention to design aspects and documentation required considerable effort, we found it justified even when considering only the immediate benefits during development, such as easier and faster debugging, increased understandability of equations, and a reduced need to look up details in the literature.
Author summary: Reproducibility is one of the cornerstones of the scientific method. To draw reliable conclusions, an experiment must yield the same results when it is repeated using the same methods. However, biological systems are complex, which makes experiments cumbersome. It is therefore desirable to build a mathematical representation of the biological system that captures its essential behavior in a set of variables and equations and allows for easier and faster experimentation. Unfortunately, recent studies have shown that half of the published mathematical models are not immediately reproducible due to missing information, mathematical errors, and incomplete documentation. These issues are similar to those faced in software engineering: a single missing file or a buggy line of code can render any kind of software useless. Software engineering has turned to rigorous software testing, automated development pipelines, and version control systems to overcome these challenges, but these techniques are not yet widely applied to mathematical modeling. In this paper, we demonstrate their benefit for the reproducibility of a large mathematical model of the atrioventricular node. The software engineering solutions that we employ can be applied to any mathematical model and could therefore facilitate scientific progress by encouraging and simplifying model reuse.
ER -