The impact of mathematical modeling languages on model quality in systems biology: A software engineering perspective

Christopher Schölzel; Valeria Blesius; Gernot Ernst; Andreas Dominik

doi:10.1101/2019.12.16.875260

Abstract

Reproducible, understandable models that can be reused and combined into true multi-scale systems are required to solve the present and future challenges of systems biology. However, many mathematical models are still built for a single purpose, and reusing them in a different context is challenging. These challenges are very similar to those faced in the engineering of large software systems. It is therefore likely that addressing model quality at the software engineering level will also be beneficial in systems biology. To do this, researchers cannot rely solely on using an accepted standard language. They need to be aware of the characteristics that make a language desirable, and they need guidelines on how to use these characteristics to make their models more reproducible, understandable, reusable, and extensible. We therefore propose a list of desirable language characteristics and provide guidelines on how to incorporate them in a model. In our opinion, a mathematical modeling language used in systems biology should be modular, human-readable, hybrid (i.e., support multiple formalisms), open, and declarative, and it should allow models to be represented graphically. We compare existing modeling languages with respect to these characteristics and show that there is no single best language but that trade-offs always have to be considered. We also illustrate the benefits of the individual language characteristics by translating a monolithic model of the human cardiac conduction system into a modular version, using the modeling language Modelica as an example. Our experiment illustrates how each characteristic can have a substantial effect on the quality of the resulting model. When applied consistently, these characteristics can facilitate the creation of complex, multi-scale models. We therefore recommend considering these criteria when choosing a programming language for any biological modeling task.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

andreas.dominik@mni.thm.de
This revision is based on the much valued feedback of our reviewers from our submission to npj Systems Biology and Applications. The manuscript was completely restructured in order to place more emphasis on the comparison of different modeling languages and less emphasis on the physiological details of the model of the cardiac conduction system. In particular, we introduced the following changes:
1. We moved all the details of our example model that require too much physiological knowledge to the supplement. This allows us to put a clear focus on the MoDROGH characteristics as our main finding and makes room for the following improvements.
2. We explain and examine the base model of the cardiac conduction system in greater detail to increase the value of the article for researchers not familiar with this aspect of physiology.
3. We motivate the choice of our example model by discussing the relevance of the Seidel-Herzel model (SHM) and the applicability of the shown techniques to larger models.
4. We introduce the features of our example language Modelica to the reader, explaining the main keywords and syntactic structures in detail. While we still do not want to focus too much on Modelica, this allows us to show how the MoDROGH characteristics of Modelica influence the example model.
5. We moved the language comparison from the supplement to the main manuscript and greatly increased its level of detail. We believe that this adds a novel aspect to the article, since we are not aware of any work that includes a modeling language comparison of similar extent.
6. We also increased the level of detail of the description of our MoDROGH criteria, clarified vague passages, and added further references for their relevance.
7. We specifically discuss which languages could have been used for our example model instead of Modelica, including possible advantages and trade-offs.
https://github.com/CSchoel/shm-conduction
¹ Note that although the journal article [10] was published one year after the PhD thesis [9], the PhD thesis actually contains the latest version of the model with many small improvements.
² Seidel probably meant to include the refractory behavior of the ventricles and not the SA node. The actual implementation, however, checks the refractory state before the delay between SA node and ventricles is applied.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.