Characteristics of mathematical modeling languages that facilitate model reuse in systems biology: A software engineering perspective

Christopher Schölzel; Valeria Blesius; Gernot Ernst; Andreas Dominik

doi:10.1101/2019.12.16.875260

Abstract

Reproducible, understandable models that can be reused and combined into true multi-scale systems are required to solve the present and future challenges of systems biology. However, many mathematical models are still built for a single purpose, and reusing them in a different context can be challenging due to an inflexible monolithic structure, confusing code, missing documentation, or other issues. These challenges are very similar to those faced in the engineering of large software systems. It is therefore likely that addressing model design at the software engineering level will also be beneficial in systems biology. To do this, researchers cannot just rely on using an accepted standard language. They need to be aware of the characteristics that make this language desirable, and they need guidelines on how to utilize them to make their models more reproducible, understandable, reusable, and extensible. Drawing upon our experience with translating and extending a model of the human baroreflex, we therefore propose a list of desirable language characteristics and provide guidelines and examples for incorporating them in a model: In our opinion, a mathematical modeling language used in systems biology should be modular, human-readable, hybrid (i.e., support multiple formalisms), open, and declarative, and it should support the graphical representation of models. We compare existing modeling languages with respect to these characteristics and show that there is no single best language; trade-offs always have to be considered. We also illustrate the benefits of the individual language characteristics by translating a monolithic model of the human cardiac conduction system into a modular version, using the modeling language Modelica as an example. Our experiment can be seen as emblematic of model reuse in a multi-scale setting. It illustrates how each characteristic, when applied consistently, can facilitate the reuse of the resulting model.
We therefore recommend that modelers consider these criteria when choosing a programming language for any biological modeling task and hope that our work sparks a discussion about the importance of software engineering aspects in mathematical modeling languages.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

andreas.dominik{at}mni.thm.de
This revision is based on the much-valued feedback of our reviewers on our submission to npj Systems Biology and Applications. It contains several small to medium-sized changes; the most prominent are the following:
1. We change the title of the manuscript to avoid the term "quality".
2. We rewrite the paragraph about existing efforts to increase reusability, rectifying our misleading statements about COMBINE and MIRIAM and describing the research gap that we identified more precisely and fairly.
3. We stress that we chose the cardiac conduction system of the SHM as an example to share our experience with a real-world model that we think generalizes to large parts of systems biology.
4. We add examples from our implementation of the SHM model that show the benefit of each of the MoDROGH characteristics in the context of a larger model.
5. We completely rework our description and discussion of the "human-readable" characteristic throughout the manuscript to better explain our arguments against XML-based formats. At the same time, we soften our statements about the human-readability of SBML.
6. We discuss the trade-off between rigid interfaces and unstructured connections between components in more detail in the description of the "modular" characteristic.
7. We define what we mean by the term "modeling language" to explain our selection of languages and to better distinguish between languages and tools.
8. We add a paragraph at the end of the discussion that clearly states that we do not believe Modelica to be an optimal choice for systems biology and that further development is needed.
9. We add a checklist for the utilization of the MoDROGH criteria in the supplement. This is not connected to any of the reviewers' comments, but we think that it will be valuable for our readers.
https://github.com/CSchoel/shm-conduction
1 Note that although the journal article [11] was published one year after the PhD thesis [10], the PhD thesis actually contains the latest version of the model with many small improvements.
2 Seidel probably meant to include the refractory behavior of the ventricles and not that of the SA node. The actual implementation, however, checks the refractory state before the delay between the SA node and the ventricles is applied.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.