Required characteristics for modeling languages in systems biology: A software engineering perspective

,

bined to true multi-scale systems are required to solve the present and future challenges of systems biology. However, many mathematical models are still built for a single purpose and reusing them in a different context is challenging. To overcome these challenges model quality needs to be addressed at the (software-)engineering level. Instead of just declaring standard modeling languages, researchers need to be aware of the characteristics that make these languages desirable and they need to utilize them consistently. We therefore propose a list of required characteristics and provide guidelines how to incorporate them in a model: In our opinion, a modeling language used for models in systems biology should be modular, human-readable, hybrid (i.e. support multiple formalisms), open-source, declarative, and allow to represent models graphically. We demonstrate the benefits of these characteristics by translating a monolithic model of the human cardiac conduction system to a modular version and extending it with a trigger for premature ventricular contractions. For this task we use the modeling language Modelica, that has all the aforementioned characteristics, but is not well known in systems biology. Our example illustrates how each characteristic can have a substantial effect on the quality and reusability of the resulting model. Together they facilitate and simplify the creation and especially the extension of the modular model. We therefore recommend to consider these guidelines when choosing a programming language for any biological modeling task.

Introduction
As the understanding of biological systems grows, it becomes more and more apparent that their behavior cannot be reliably predicted without the help of mathematical models. In the past, these models were confined to single phenomena, such as the Hodgkin-Huxley model of the generation of neuronal action potentials [1]. They have served their purpose up to a point where now it is necessary to take into account the upward and downward causations that link all levels of organization in a biological system from genes to proteins to cells to tissue to organs to whole organisms, populations and ecosystems [2]. These causations span effects on multiple scales of space and time that need to be included in models. This can be achieved by two different approaches. A micro-level model combines thousands of individual homogeneous submodels to reach the next higher scale. This approach requires a vast amount of computing power and is therefore usually limited to span a distance of only two scales. More wide-spanning multi-scale models can be achieved by the multi-level approach that combines both macro-and micro-level descriptions of a system by different models [3]. While micro-level parts of such a model may look as described above, the macro-level parts feature heterogeneous descriptions of subsystems and their high-level interactions. For this approach, a wide variety of techniques exist that reduce the computational complexity of resulting models [4]. While both approaches require the reuse of existing models, the multi-level approach additionally involves the combination of independently designed models. These models may even use different modeling formalisms, thus forming a multi-class model [5].
The first step in building a model consisting of several submodels is to regenerate the individual submodels from the literature. This can already be a challenge due to several issues with reproducibility in systems biology including incomplete model descriptions, errors in formulas, availability of the code or missing descriptions of experiment setup or design choices [6,7]. As an extreme example, Topalidou et al. report requiring three months to reproduce a neuroscientific model of the basal ganglia [8].
We experienced similar reproducibility issues first-hand when we translated the Seidel-Herzel model (SHM) of the human baroreflex to the modeling language Modelica [9,10]. Even though we could reach out to the author of the model to obtain his original implementation in C, the translation process was still quite challenging. The C code was monolithic and imperative in nature, describing calculation steps instead of mathematical relations and containing details that where not described in the corresponding PhD thesis. We had to carefully extract the meaning of each line of code in order to build a modular declarative version that produced the same simulation results. However, when we wanted to extend the model with a trigger for premature ventricular contraction (PVC), it turned out that even the Modelica version was not suitable for reuse. In fact, the component that described the cardiac conduction system remained monolithic and lacked a graphical representation, which made it hard to identify the equations and variables that would have to be changed. From this example, it becomes apparent that issues with reproducibility and reuse reach down to the engineering level. The modeling language and the design principles applied to the construction of a model can facilitate or hamper further use. This also holds for the aforementioned case of Topalidou et al., since the original model was implemented in Delphi, which is also an imperative language that is not well suited for mathematical modeling [8].
Even though the need for guidelines on the engineering level is apparent, most publications about model quality and best practices for reproducibility and reusability do not address it. Instead, existing approaches broadly fall into three (overlapping) categories. They tend to focus on a) biological validity [11][12][13][14][15][16], b) high-level choices of modeling formalisms and techniques [17][18][19], or c) model documentation, annotation and distribution [7,20,21]. When modeling languages are discussed, it is in the form of stating accepted standard languages. Both COmputational Modeling in BIology NEtwork (COMBINE) [20] and Minimal Information Required In the Annotation of Models (MIRIAM) [22] suggest to use CellML and systems biology modeling language (SBML), but neither go into detail which characteristics make these languages more desirable than other choices. Our previous example of the translation of the SHM shows that using a suitable language is a necessary but not sufficient criterion for the model to actually be reusable. Additionally these languages cannot cover all use cases, especially for multi-class models that combine entirely different model formalisms that may not even be representable in a single language [7]. Researches will always need guidelines to choose between different language candidates and to write model code that actually uses the desirable characteristics of the chosen language.
In recent years there has been increasing interest to apply techniques from software engineering (such as unit testing, version control, or object-oriented programming) to modeling in systems biology [23][24][25]. Hellerstein et al. even go so far to suggest that systems biologists should rethink the whole modeling process as "model engineering" [23]. To date they are the only authors that we are aware of who actually give explicit guidelines for how to write model code (e.g. they suggest to use human-readable variable names). In this paper we expand on the idea of model engineering in two ways. First, we propose a list of desirable characteristics that make a model language suitable for building reusable multi-scale models. Second, we give guidelines how these characteristics can be exploited to increase the code quality and thus reproducibility and reusability of a particular model.
To demonstrate the reasoning behind those characteristics and their benefits in a concrete example we use the language Modelica. Modelica is an opensource declarative modeling language primarily used in engineering [26]. There are other languages with similar properties, such as MATLAB (https://www. mathworks.com/products/matlab.html), SBML [27], CellML [28] and Julia [29] (which we discuss in detail in the supplement), but in our opinion Modelica implements the required characteristics to the fullest extend. Additionally, it will be interesting to investigate Modelica in this context since it is not well known in systems biology.
To give an application example of our guidelines we transform the aforementioned monolithic model of the human cardiac conduction system into a modular structure and show that this version can easily be extended by a trigger for PVCs. Finally this example is utilized to reflect on the choice of language characteristics and the impact that a modeling language can have on the usefulness of the model.

Required characteristics for a modeling language for systems biology
The following characteristics were developed from our personal experience with Modelica and the SHM and/or from literature review. Each characteristic will be introduced with arguments for its usefulness, a brief set of guidelines how it may be applied to full effect and references to other authors that advocate this feature.

Modular
In order to replace parts of a model, they have to be identified in the code. The modeling language should make this as easy as possible, using separable components with clear interfaces. Modularization (and information hiding) are reliable tools to handle complexity in large software projects, so it is reasonable to expect that they will also be able to manage the complexity of biological systems. One could even make the case that modularization is the only way to handle this complexity since monolithic models will become unmaintainable quite fast. Even the Guyton model, which is probably among the most complicated monolithic models in systems biology, tries to explain the circuits by grouping them in (unfortunately tightly coupled) modules [30].

Guidelines:
Modules should be small enough to be understandable at first glance, but still self-contained. If a formula or concept is used multiple times in a model, it should be defined as a module once and then referenced (in software engineering this concept is called DRY for "don't repeat yourself"). Modules should have clearly defined interfaces that explicitly state possible connection points to the outside world. If possible, each module should be tested individually (this is called a "unit test" in software engineering).

References:
There is a general consensus that multi-scale modeling requires some form of modularity for hierarchical composition [25,[31][32][33][34][35]. More specifically, Hellerstein et al. and Mulugeta et al. both suggest that object-oriented programming might be an especially promising way to implement modularity [23,24].
(Human-)readable Models may not be reused or reproduced in the same language in which they are originally written. Therefore the code should be expressive enough that one can extract the relevant mathematical definitions without detailed knowledge about the tools that are associated with the modeling language. Additionally, the more readable the code is the more likely a reader may spot errors that can be corrected before or after publication.
Guidelines: Variables and parameters should have speaking names instead of single-letter identifiers. They should also be documented with a short sentence that explains their meaning. The number of variables per equation should be kept low, complex formulas should be split up. Optimizations and workarounds that are not intuitive should be documented in the code.

References:
As mentioned in the introduction, Hellerstein et al. also advocate for speaking variable names [23].
Hybrid A language is hybrid if it supports multiple language formalisms and thus multi-class models. The most common form of hybrid models and languages cover both continuous ordinary differential equations (ODEs) and discrete events, but other combinations are possible. It is important to note that we do not argue that a modeling language should support as many formalisms as possible, but rather a combination of formalisms that go well together. If other formalisms are required, the language should rather aim to allow coupling of models across languages with standardized interfaces such as the functional mock-up interface (FMI) [36,37].

Guidelines:
A model should clearly indicate which variables are discrete and which are continuous. Event triggers that define the transition between discrete and continuous parts of the model should be examined and tested with extra care.

References:
In their 2017 review Bardini et al. argue that multi-scale models in systems biology in general should strive towards a hybrid approach [2].
Open-source As a prerequisite for reproducibility and collaboration, models and simulation tools need to be accessible for everybody. In particular the hurdle to run a quick simulation of a model to determine its usefulness for a specific task should be as low as possible. An openly accessible model definition also means that readers can offer feedback and corrections to improve the model quality.
Guidelines: Readers of a paper should be able to download the model code and to simulate it with open-source tools. The download should also include explicit licensing information. The model repository should include everything necessary to reproduce plots and other results of the corresponding paper. It should also be under version control and include a human-readable changelog. Other researchers should be able to point out and suggest corrections.
References: Many large projects and databases such as Physiome [16], the virtual liver network [38], Plants in silico [31] and SEEK [21] already provide open-source implementations of models. Mulugeta et al. also specifically advocate for more version control and changelogs (in the form of e-notebooks) [24].
Declarative The mathematical formalism for biological models is complicated enough. A modeling language should not require the adaptation of the model to the execution logic of the language, obscuring the original definition. Instead, the language should adapt to the model if it is presented in a clean mathematical formulation. This way the code can focus on expressing meaning rather than structure, which facilitates understanding. One aspect of this is also that the language should allow the declaration and automatic checking of proper units for variables and parameters.
Guidelines: Models should follow strict mathematical rules to fit nicely into the chosen formalism. If a model needs workarounds, it may be worthwhile to revisit design choices and check the mathematical soundness of the model. In our own experience we found that most workarounds could be removed and the resulting model behaved more soundly and was easier to understand.
References: Few researchers in systems biology make the distinction between imperative and declarative languages explicitly. Loew & Schaff mention that the language that they chose for the Virtual Cell environment is declarative, but do not go into detail why this is important [39]. Zhu et al. state that declarative languages are desirable, because it allows to describe the biological processes "in a natural way" [40].
Graphical Discussing or even just understanding a model is difficult if the model is only described in the form of code or mathematical equations. This is especially true when the input of domain experts is required, who are not computer scientists or mathematicians. For this purpose most papers in biology use some kind of diagram to transport the general structure of the model in a graphical way. The modeling language should therefore provide a way to describe the model with similar diagrams. This is more effective if the diagram is directly coupled to the model equations to ensure that the graphical documentation always stays up to date when the code changes. However, fully automated diagrams tend to degenerate to an unstructured graph composed of hundreds of nodes with cryptic variable names. The modeler should therefore be able to manually alter at least some aspects (such as the positioning of elements) of the diagram.

Guidelines: All interactions between the individual modules in a model
should have a graphical representation in the corresponding diagram. Each diagram should only have a few components. If it becomes too crowded some components should be grouped together to form a hierarchical structure. Each individual component in the diagram should be represented with an intuitive symbol that either corresponds to the appearance or function of its biological equivalent.

References:
The Physiolibrary is a Modelica library for physiological models that has graphical representations for each component [41]. Pro-MoT is a modeling tool that allows to compose modular models in a graphical way [33]. Alves et al. compare 12 different simulator tools, giving higher ratings to those that have graphical representations for model components [42].

Modularizing a model of the human cardiac conduction system increases understandability
The Seidel-Herzel model (SHM) describes the autonomic control of the heart rate in humans at a high level of abstraction [9]. It can be classified as a hybrid (discrete and continuous), deterministic, quantitative, macro-level model. Henrik Seidel developed and implemented it in his PhD Thesis using the programming language C. In a previous paper we translated it to Modelica [10].
All effects in the model are described on the organ level, including a blood pressure curve generated by the pumping of the heart; the Windkessel effect of the expanding arteries dampening the initial rise in blood pressure; the arterial baroreceptors generating a neural signal depending on the absolute value and the increase in blood pressure; the autonomic nervous system emitting norepinephrine and acetylcholine as hormone and neurotransmitter based on signals from the baroreceptor and the lungs; and the cardiac conduction system with the sinoatrial node (SA node) as main pacemaker and the atrioventricular node (AV node) as a fallback system. In the following, only the conduction system is examined. It takes an input signal from the SA node (based on norepinephrine and acetylcholine concentrations) and includes the refractory behavior of the SA node 1 limiting the maximum signal frequency, the delay between a signal from the SA node and the actual ventricular contraction, and the AV node generating a signal if no signal has been received for a given period of time. In the original model, these effects where tightly coupled within a single piece of code comprising five parameters, 13 variables, and 12 equations-not counting additional parameters and variables for initial conditions. We found that this complexity makes it hard to understand and modify the model, which is why we translated it into a modular structure using Modelica.
The modular version separates the code into three components RefractoryGate , Pacemaker and AVConductionDelay. These components are connected via an unifying interface using a base class UnidirectionalConductionComponent that takes a Boolean signal as an input and produces a Boolean output. These inputs and outputs are only true for the exact point in time where a signal is 1 Seidel probably meant to include the refractory behavior of the ventricles and not the SA node. The actual implementation, however, checks the refractory state before the delay between SA node and ventricles is applied. issued (i.e., they behave as Kronecker deltas). For the RefractoryGate, the output equals the input except that after each signal there is a time period T_refrac in which incoming signals are ignored. The Pacemaker lets incoming signals pass through but also issues a signal on its own when the output has been silent for the duration of its period T. The AVConductionDelay delays an incoming signal by a duration that depends on the elapsed time since the last output signal has been issued. To give an example of the Modelica syntax, the resulting code for RefractoryGate looks as follows: To explain the interaction between the described models, a graphical representation is introduced for each component. As seen in Figure 1 we chose an open garden gate for the refractory gate, a metronome for the pacemaker and an hourglass for the delay. The components are simply connected in order with the exception that the reset of the pacemaker component is only triggered if the signal also passed the refractory component. This structure deviates from the original SHM because the refractory behavior is situated at the AV node instead of the SA node. Additionally, the delay component models the complete delay from the SA node to the ventricles but is actually applied after the components for the AV node. To remain closer to physiology, one could split the delay component into two delays-one before and one after the AV node-and similarly add another refractory gate for the SA node. However, we will show in the following sections that this simplified structure closely replicates the behavior of the SHM.

The modular model behaves similar to the original version
Although the modular version of the conduction model covers the same physiological effects as the original version, the implementation differs not only in structure but also in the mathematical representation. The original model used the elapsed time since the last contraction as a reference for the refractory period and the pacemaker effect of the AV node instead of the elapsed time since the last sinus signal was received. This means that the duration of the refractory period needs to be adjusted. In the old model, the performed check was whether t − (t sinus + T delay ) < T refrac where t is the current time stamp, t sinus is the time stamp of the last sinus signal, T delay is the duration of the delay and T refrac is c c AVN Figure 1: Diagram of the modular conduction model with symbols for the components. From left to right: Pacemaker for the pacemaker effect of the AV node, RefractoryGate for the refractory behavior of the AV node and AVConductionDelay for the combined delay between the SA node and the ventricles. The C in a black box indicates that the main variable of the component is held constant while the stopwatch symbol for the delay should indicate that the duration is time-dependent. Components have their input on the left, their output on the right and the pacemaker has the additional reset input at the bottom.
the refractory period. Since the check is now whether t − t sinus < T refrac one can deduce that if the same behavior is desired, T refrac should equal T refrac + T delay , that is T refrac must be increased by the average delay duration. The pacemaker component does not have to be changed at all, because although the pacemaker signal is delayed, the pacemaker clock is also started earlier. Effectively the delay duration is added and then subtracted from the resulting time stamp of the next contraction. Figure 2 shows a comparison of the resulting RR values (i.e. the time passed between two contractions) for both models with input of varying frequency. For the most part both versions behave identically. Only when the sinus cycle length drops below the refractory period you can see a difference in the plots. In these areas the RR values of the modular model fluctuate with a higher amplitude and lower frequency compared to the original SHM.

Extending the modular model with a trigger for premature ventricular contractions is easy
To show that the modular structure and the language Modelica facilitate reuse and extension we added a trigger for premature ventricular contractions (PVCs) to the model. PVCs arise if some part of the ventricular tissue generates a signal without stimulation from the AV node. This can happen in a healthy individual, but the heart rate response to such an ectopic beat can be used as a risk indicator during pathological conditions. [43]. An ectopic beat in the ventricles leads to a stimulation of the AV node that travels back upwards to the SA node either cancelling out an oncoming downward signal or (in rare cases) resetting the clock of the SA node. Therefore, the correct way to model a PVC would be to include components that are . The plot was obtained with an artificial sinus signal that switches its cycle duration every ten seconds according to the following schedule: 1 s, 3 s, 0.05 s, 0.8 s, 0.2 s, 1.8 s. This schedule was chosen to cover a large range with different cycle durations that are a) below the refractory period of the AV node (0.2 s, 0.05 s), b) above the refractory period, but below the pacemaker period of the AV node (0.8 s, 1 s), or c) above the pacemaker period (3 s, 1.8 s). The schedule is not in any particular order so that both reactions to a sudden increase and a sudden decrease in sinus frequency can be observed. Only the cycle durations of 0.05 s and 0.2 s produce qualitative differences.

bidirectional.
To keep it simple, the unidirectional components are used and it is assumed that a PVC will always reset the pacemaker and refractory time of the AV node but never reach the SA node. We modeled this by extending the components RefractoryGate and ConductionDelay with a "reset" input similar to the one that already exists for the Pacemaker component.
With these changes, there could still be two beats arbitrarily close to each other when a PVC is triggered right after a normal beat. Therefore we also modeled the refractory behavior of the ventricles themselves. The only change needed for this was the addition of a second instance of the already existing RefractoryGate component that receives input from the PVC trigger and the delay component (combined with a logical OR). The output of this additional component was used to ensure that the reset of the AV node would only happen if the PVC actually did trigger a contraction. This was achieved by an additional logical AND gate with input from the PVC signal and the contraction output. A graphical representation of the resulting model can be seen in Figure 3.

The PVC extension shows plausible results
The behavior of the resulting model is shown in Figure 4. a beat is delayed, it replaces the normal beat, leading to one RR interval that is shorter than normal followed by one interval that is larger than normal by the same magnitude. The same behavior can be observed for a PVC happening directly before a sinus signal is issued. A PVC that follows a normal beat within the ventricular refractory period is completely ignored and a PVC right between two normal beats leads to two RR intervals that are shorter than normal since all three beats (two normal, one ectopic) lead to a contraction.
To test the behavior of the AV node in the presence of ectopic beats the experiment was repeated without any sinus signal. Here, the behavior is the same for PVCs while an AV signal is delayed, during the ventricular refractory period and directly before an AV signal. A PVC right between two normal beats does not result in two reduced RR intervals but in one reduced and one increased interval. This is due to the reset of the pacemaker clock that will issue the next signal after the pacemaker period has passed. This signal has to travel through the delay component which increases the interval duration.

Discussion
The separation of the model of the human cardiac conduction system into individual components with a clear interface required structural changes. Therefore, before discussing the impact of our language and design choices on the understandability and reproducibility of the model, possible implications for its accuracy need to be addressed.
The simulation of the modular model shows a similar but not identical behavior with respect to the original version. Increased amplitude and lower frequency  Physiologically the refractory period indeed changes when the sinus frequency is increased, but the effect is a decrease instead of an increase as in the monolithic model [44]. It can therefore be said that the modular design both helps identifying plausibility issues and can be more easily adapted to the biological reality.
The PVC model also behaves as expected. In the simulation with a sinus frequency of 75 bpm the stimulations very close to a sinus signal effectively replace that signal, leading to a short coupling interval and a long compensatory pause. Stimulations during the refractory period are correctly ignored and a stimulation between two sinus signals is just treated as an additional contraction, leading to a series of two subsequent RR intervals that are shorter than the sinus cycle length. The third RR interval is also a little bit shorter than normal, because although the cycle length has not changed the sinus signal immediately after the PVC has an increased delay duration shifting it closer to its successor. In the simulation without a sinus signal, the only qualitative difference can be observed in the case of an additional signal in between two AV node signals.
The first RR interval is reduced, but the second interval is actually increased. Physiologically this can be explained by the signal of the PVC traveling upwards stimulating the AV node in the same way a signal from the sinus node would do. The next contraction can therefore only happen after the normal AV cycle length and the delay duration has passed.
With the question of the validity of the example models out of the way, the focus can now return to our initial research question to assess how the modular, readable, hybrid, open, declarative and graphical nature of Modelica (or similar languages) helped in the modeling process.

Modular
We have shown that extending the modular model to simulate PVCs was quite simple. Using the component diagram as a basis, the discussion about where to include the trigger signal for an ectopic beat became much easier than with the tightly coupled model. When we originally tried to implement the same extension in the monolithic version, we found it extremely hard to pinpoint the lines of code that would need to change. Now, with the modular version, the question was not "Which variables do I have to change?" but "Which influence does a PVC have on physiological component X?". The discussion shifted from technological considerations to physiological ones. Those were again followed by technological questions, since at the end some variables had to be changed and introduced, but due to the separation in small components each change only required the review of a few lines of code. In fact, the code for the RefractoryGate could be reused by simply adding another instance of this component to represent the refractory period of the ventricles. In the original model, several variables and equations would have to be added, making the already complicated system almost unmanageable.
This becomes even more apparent if we consider a bidirectional conduction model, which is probably only feasible using a modular structure. Extending each component with an additional input and output in the opposite direction is very possible, if one only has to consider one effect at a time. The diagram view would even require less changes, mainly doubling each connection line between two subsequent components. Doing the same for the original model in C or even the non-modular version in Modelica would increase the amount of variables and equations substantially and, therefore, further diminish the already low readability and instead introduce multiple possible sources for errors that would be hard to track down.
(Human-)readable The code of the models presented in this paper needed little additional words for explanation. We think that readers that are not familiar with Modelica will also be able to understand the general idea, since the code directly represents meaningful concepts with little additional clutter. The same could probably not be said about a piece of C-code or an XML file generated by an SBML tool. In fact, the required explanatory words can be put in the code by using documentation strings that can be attached to models, variables, parameters and even individual equations and that can be interpreted by graphical tools.
Hybrid The example model is (almost) purely discrete. This does not allow us to showcase the hybrid nature of Modelica directly. However, since this model of the cardiac conduction system has been developed as a replacement of the original, it can be integrated into the full SHM, which also has continuous variables, by simply changing the name of the referenced submodel from MonolithicConduction to ModularConduction. We could also show that although Modelica is not purely built for discrete modeling, formulation of discrete events requires little effort. You only have to be careful to use pre() when you are referencing the value of a variable before an event.

Open-source Since
Modelica is an open-source language with open-source tools, the reader can try out the models discussed here at https:// github.com/CSchoel/shm-conduction. You simply have to download the latest release from the Github repository, download and install Open-Modelica and load the models using the "Load library" option in the "File" menu. With JModelica there also exists a second free compiler that also offers optimization features [45]. If you want to combine Modelica models with models written in different languages, this may be possible using the FMI specification, that was developed for Modelica but is now used by many other free and commercial modeling tools [36,37].
Declarative The separation of models into components with clear interfaces enhances the benefits of a declarative language. Subjecting the component interfaces to strict mathematical rules makes existing design flaws apparent. The original model contained some design choices that were probably taken because they were convenient for programming, but they showed to be harmful for understanding and maybe even for the physiological plausibility of the model. For example, the original model mixed variables that represent actual signals and time stamp variables that schedule signals for the future (see Supplementary Figure 1). An incoming sinus signal was first translated to a scheduled contraction time which then would only take effect if there was not already a contraction that was scheduled to happen before the current signal. Upon the actual contraction, a scheduled time for a contraction triggered by the AV node was calculated.
To comprehend these formulas a context switch from the physiological meaning to the technical representation is required. Another hurdle for understanding the model is the unclear causality. In the SHM every effect is triggered by the contraction, even if there is no actual signal feedback from the ventricles to the AV node on a physiological level (unless in the case of an ectopic beat). By separating the model into smaller physiologically meaningful modules with a unified interface the Modelica compiler automatically hinted at these concerns, e.g. because variables where missing. A correct implementation was easier when the different effects where clearly separated.
Graphical The diagrams in Figure 1 and 3 not only help to understand the model at first glance, but they can also be defined as part of the Modelica code by an annotation() statement. This allows for building more complex models or small test cases using drag and drop in a graphical tool like OpenModelica [46]. The connections between the model components are not arbitrary, but are closely tied to the semantics of the model.
To sum up, we could show that using a language that is modular, readable, open-source, declarative, hybrid and graphical directly benefits the modeling process and leads to models that are both more understandable and easier to reuse and extend.
At this point we have to note that it is not necessary to use Modelica to gain these benefits. Modelica is one language among several choices, some of which are presented in the supplement, that realize our requirements to a great extent. What makes Modelica interesting is that it is tested and proven in an industrial setting, but only little known in systems biology. However, similar results could also be achieved using CellML or MATLAB with the Simulink environment and the Simscape language. For example, one downside of Modelica in contrast to CellML is the mostly missing support for partial differential equations (PDEs).
In general it can be said that in order to build reusable multi-scale, multilevel and multi-class models, one needs to be careful to chose a language that can handle the complexity of their models. It is, however, not sufficient just to chose a language that is suitable in principle. Researchers must be aware of the beneficial characteristics of the language and must be able to utilize them consistently. We all have to think as (software-)engineers and not only implement models for a single purpose, but as reliable part of a solution for larger challenges.

Material
We used Mo|E version 0.6.3 [47] to write the code of our models and OpenModelica version 1.13.0 [46] as well as Inkscape version 0.91 (https://inkscape. org/) to add the component icons. OpenModelica was also used for all simulations. In the following we will show and explain our Modelica code for the models and simulations. To keep it short, we do not show the code of the original monolithic version and of the graphical annotations. They can be found in Supplementary Listing 1-26 and at https://github.com/CSchoel/shm-conduction.

Modular conduction model
The first part of the modular model of the human cardiac conduction system is the interface component UnidirectionalConductionComponent that serves as a base class for all other components. It defines the input and output connectors inp and outp, which are Booleans that are wrapped in a custom type InstantSignal to indicate that they behave as Kronecker deltas: type InstantSignal = Boolean(quantity="InstantSignal"); connector InstantInput = input InstantSignal; connector InstantOutput = output InstantSignal; partial model UnidirectionalConductionComponent InstantInput inp "input connector"; InstantOutput outp "output connector"; end UnidirectionalConductionComponent ; The RefractoryGate has already been shown in the results section. The component passes on its input signal as output signal, but only when the elapsed time since the last signal left the component is larger than the refractory period: The function pre() is used here to denote the value right before an event instead of the value right after the event.
The Pacemaker model propagates incoming signals, but also adds an own signal if there was no input for a certain period of time. Additionally it has to be considered that during the refractory period incoming signals are ignored entirely and therefore should not reset the timer of the pacemaker. To accomplish this an external reset signal is added that will only be triggered if the signal passes not only the pacemaker but also the subsequent RefractoryGate component. The pacemaker component itself only resets when a spontaneous output signal is generated to maintain the invariant that the output signal will not be true for a prolonged period of time: The ConductionDelay model puts incoming signals on hold and releases them after a certain time has passed. Physiologically the duration of the delay for each signal depends on the time that has passed between the last signal leaving the component and the current input signal. The original model silently assumed that there will never be a second input signal while a signal is put on hold. Therefore this assumption is kept, but made more explicit by using the helper variable delay_passed in the when condition: As you can see, the model ModularConduction is again a UnidirectionalConductionComponent and can therefore be used as a component in a larger model (such as the SHM).

PVC model
For the PVC model we will now only discuss the changes required for the existing components. The full code can be found in Supplementary Listing 1-26.
In the RefractoryGate, the condition when outp simply has to be replaced with when outp or reset. For ConductionDelay the process is a little bit more complicated since oncoming signals have to be canceled. This is achieved by temporarily setting t_next to a very large value (larger than the total simulation time). The equations section changes as follows: when pre(reset) or (inp and pre(delay_passed)) then T = time -pre(t_last); end when; when pre(reset) then t_next = 1e100; elsewhen inp and pre(delay_passed) then t_next = time + duration; end when; when outp or reset then t_last = time; end when; The resulting extended model ModularConductionX looks as follows:

Modular contraction experiment setup
Simulation experiments can also be defined directly in Modelica syntax. The following code was used to produce For the results the variables monC.T_cont and modC.T were plotted against simulation time.