The impact of mathematical modeling languages on model quality in systems biology: A software engineering perspective

Christopher Schölzel; Valeria Blesius; Gernot Ernst; Andreas Dominik

doi:10.1101/2019.12.16.875260

Abstract

Reproducible, understandable models that can be reused and combined into true multi-scale systems are required to solve the present and future challenges of systems biology. However, many mathematical models are still built for a single purpose, and reusing them in a different context is challenging. To overcome these challenges, model quality needs to be addressed at the (software-)engineering level. Instead of just declaring standard modeling languages, researchers need to be aware of the characteristics that make these languages desirable, and they need to utilize them consistently. We therefore propose a list of desirable language characteristics and provide guidelines on how to incorporate them in a model. In our opinion, a mathematical modeling language used in systems biology should be modular, human-readable, hybrid (i.e., support multiple formalisms), open, and declarative, and it should allow models to be represented graphically. We demonstrate the benefits of these characteristics by translating a monolithic model of the human cardiac conduction system to a modular version and extending it with a trigger for premature ventricular contractions. For this task we use the modeling language Modelica as an example; it has all the aforementioned characteristics but is not well known in systems biology. Our experiment illustrates how each characteristic can have a substantial effect on the quality and reusability of the resulting model. When applied consistently, these characteristics facilitate and simplify the creation and especially the extension of the modular model. We therefore recommend considering these guidelines when choosing a programming language for any biological modeling task.

Footnotes

christopher.schoelzel{at}mni.thm.de, andreas.dominik{at}mni.thm.de
This revision is based on the much-valued feedback of our reviewers from our submission to npj Systems Biology and Applications. In general, it addresses our too-narrow focus on Modelica; adds a missing overview of the state of the art, including Python-based modeling languages/environments; and adds more detail to the discussion of the required language characteristics that we identified and their effects on other aspects of modeling. In particular:
1. We change the title in order to not convey the impression that we want to define new standards for the whole field.
2. We add an overview of current state-of-the-art choices for mathematical modeling languages in systems biology in Section 1.
3. We add Python-based alternatives to our Supplementary Note.
4. We introduce the acronym MoDROGH for Modular, Descriptive, (human)-Readable, Open, Graphical, and Hybrid. Instead of referring to Modelica specifically, this allows us to use the general term "MoDROGH language" for any language that has the aforementioned characteristics. We change this wherever possible in the manuscript.
5. In addition to the existing note at the end of the discussion, we also clarify in Section 1 that we do not believe Modelica to be inherently superior to other choices.
6. We specifically add remarks in Section 3 that state where Modelica is not ideal and suggest alternative languages that would be a better fit in the respective cases.
7. We further discuss trade-offs between different language design principles in the introduction of our MoDROGH characteristics in Section 2.1.
8. We revise our list of characteristics and their descriptions and introduce the reviewers' arguments about missing aspects as consequences of one of our characteristics.
9. We clarify the meaning of the "open-source" characteristic and rename it to "open".
10. We discuss a shortcoming of our example model in Section 3 in the "modular" characteristic.
11. We added some additional smaller changes and improvements that are not named here for the sake of brevity.
https://github.com/CSchoel/shm-conduction

The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.