ISiCell: involving biologists in the design process of agent-based models in cell biology

Agent-based models are commonly used in biology to study tissue-scale phenomena by reproducing the individual behavior of cells. They make it possible to study biology at the scale of the individual cell and to explore the basic cell behaviors that are responsible for the emergence of more complex phenomena at the tissue scale. Additionally, they can serve as predictive tools that help in making decisions about biological experiments based on in silico simulations. However, these models require good mutual understanding between biologists and modelers, and it may therefore take weeks or months to provide a usable prototype. To address this limitation, we propose a new methodology to facilitate the dialog between biologists and modelers and to improve biologists’ involvement in the design of the model. For this purpose, UML diagrams, in particular state-transition and activity diagrams, are used. They give biologists a better understanding of the model and offer a general framework for structuring models. Visualization of simulations is also used to obtain qualitative feedback from the biologists on the model. This feedback is instrumental in validating or refining the prototype before exploring it. Alongside this methodology, we propose a web platform for building state-transition and activity diagrams that describe a model and for translating them into code. The generated code is then compiled on the fly, and simulations are ready to visualize and explore. The platform also provides tools to directly visualize and manually explore the model. These tools allow for qualitative validation of the model and additional interaction with the biologists. Finally, in this article, we show the capacity of our platform to reproduce models from the literature and to build new models starting from workshops with biologists. Its range of application is wide and includes immunology, oncology and cell biology.
Author summary

We developed a methodology based on diagrams to facilitate the dialog between computer scientists and biologists when building in silico models. The main idea is to limit misunderstandings and improve the involvement of the biologists in the prototyping process. For this purpose, we use visual methods to simplify the modeling phase. Alongside this methodology, we propose a web platform, called ISiCell, which makes it possible to program visually with diagrams that are translated into code. The platform compiles the generated code on the fly and lets users visualize and explore the model directly. The strong advantage of the platform is that a one-day workshop between biologists and modelers is enough to build a new model. Additionally, we were able to reproduce models from the literature within the modeling platform, showing the versatility of the tool. Our long-term objective is to use our methodology and platform in new contexts to develop new models. We intend to make the platform more user-friendly in order to expand the community of users. Involving biologists in the design of in silico models might improve the acceptance of these models in the community.

Introduction

[...] describe individual-level phenomena) cellular automata make it possible to discretize the environment and thus to study local and individual mechanisms spatially. ABMs have many advantages over those methods for modeling cell biology phenomena, especially for testing hypotheses at the cell scale. We present these advantages in the following subsection.

ABM in biology

Agent-based modeling is a widely used approach to quantitatively simulate dynamic systems [5]. Over the last decade, this modeling approach has become more popular and has been applied to a wide range of biological systems [6][7][8]. Each ABM is defined by a set of autonomous agents and a number of stochastic or deterministic rules. These rules govern the interactions of each agent with its neighbors and its environment. Unlike equation-based approaches, ABMs are decentralized, meaning that the overall behavior of the system emerges from the collective behavior of each individual agent in the system.

Each cellular agent can follow a completely independent behavioral trajectory governed by individual parameters reflecting the heterogeneity of a cell population. Modelers can directly implement cellular rules that reflect behavior observations at the cellular level and inter-cellular interactions, allowing them to rapidly translate biological hypotheses into algorithmic rules.

In this way, it is possible to perform simulations that explore the emergent behaviors of these hypotheses and compare them with new data to iteratively confirm, reject or [...]

[...] which the interactions are nearly impossible, thus inducing iterative production cycles (Fig 1) with biological discussions to guide the modelers followed by a development phase. This deferred development can additionally induce misunderstandings that translate into faulty behaviors in the model, while making it difficult for the biologists to point out these design mistakes.
All of these issues combined cause prolonged development phases. This long modeling process may make biologists reluctant to start such a complex and time-consuming collaboration with modelers.

Each modeling project starts with a first meeting where biologists present their work and the phenomena they study, and discuss with the modelers their needs and expectations for the future model. The modelers then synthesize what they understood of the biological reality and develop a first prototype. Once a first functional prototype exists, a second meeting is scheduled to discuss the model and improve it. This process is repeated until the project is abandoned or a sufficiently satisfying model is obtained. Each loop can take weeks or months depending on the complexity of the model, misunderstandings and schedules.

[...] development of the model. This is allowed by its own visual programming paradigm [...]

Methodology to co-design models

We propose a dedicated methodology that aims to reduce the design duration of in silico models. Additionally, we expect our methodology to further involve expert biologists in the design loop of the models. An additional objective of our methodology is to ensure that the expert biologists fully understand the inner mechanisms of the model by participating in its design process.

Graphical representation of the model
Our methodology is based on five successive steps. Even though we suggest following these steps in the described order, they can be revisited during the modeling session to take into account elements that were not anticipated. Note that the steps are also mostly based on graphical representations to help biologists grasp the model.

Step 1: environment

In this first step, we determine the environmental conditions of the cells. Modelers and biologists must decide the type of environment to be simulated: 2D/3D, discrete/continuous, physical model (no physics, mass-spring-damper system, Hertzian physics, etc.), molecular diffusion, etc. The time granularity must also be discussed and set. These choices must be made early in the design process in order to evaluate the computational effort necessary to run future simulations.

To guide these choices, modelers need to discuss the maximum number of cells in the simulation, the accuracy required for the physical interactions and any other components that will require substantial computational effort. The objective is to keep future simulations within acceptable ranges of computational cost, depending on the future use of the model (parameter exploration, protocol optimization, etc.).

Step 2: cell types and attributes

Once the environment and its governing rules are set, we define the cell types involved in the modeled processes. By cell type, we refer to cells that will have different parameters, behaviors and interactions in the simulation. We expect them to mostly correspond to known biological classifications. The objective is to separate the different behaviors and clarify their description in the following steps. Additionally, the attributes (i.e. internal characteristics) of the cells are described. Attributes can be cell radius, division counter, various protein production/consumption capacities or any other value necessary for the development of the model.
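
As an illustration, the cell types and attributes listed in this step could be captured in a small data structure. The following Python sketch is not ISiCell code: the class names, attribute names and numeric values are hypothetical and serve only to show how a type and its attributes might be written down once they have been agreed upon.

```python
from dataclasses import dataclass, field
from enum import Enum


class CellType(Enum):
    # Hypothetical cell types, chosen only for illustration.
    STEM = "stem"
    ENTEROCYTE = "enterocyte"


@dataclass
class Cell:
    cell_type: CellType
    radius: float = 5.0        # illustrative value, arbitrary units
    division_counter: int = 0  # number of divisions performed so far
    # Protein name -> production (+) or consumption (-) per time step.
    protein_rates: dict = field(default_factory=dict)


# Two cells of different types, each with its own attribute values.
stem = Cell(CellType.STEM, protein_rates={"wnt": 0.1})
enterocyte = Cell(CellType.ENTEROCYTE, radius=4.0)
```

Keeping the attributes explicit in one place makes it easier to check with the biologists that nothing needed by the later steps is missing.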
Once the cell types are defined, we describe the behaviors of the cells. To this end, we use state-transition diagrams (Fig 2.A). These diagrams can be easily translated into code and have the advantage of being easily understandable for non-specialists. They are widely used in software engineering to understand customer needs in the Unified Modeling Language (UML) methodology [20,21]. In this diagram, each rounded [...]

B: Cells in the differentiation state can differentiate into 3 different types or stay/return to the stem cell state. If the notch quantity is lower than the notch threshold, the cell can become one of the secretory types depending on the wnt quantity (paneth or goblet); otherwise, if the wnt quantity is lower than 1, it becomes an enterocyte instead of a stem cell.
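
The decision described in this caption can be written as a short function. The Python sketch below is only an illustration under stated assumptions: the caption does not say how the wnt quantity selects between paneth and goblet, so the parameter `wnt_secretory_threshold` and the rule "high wnt gives paneth" are hypothetical. The notch test and the "wnt lower than 1" test follow the caption.

```python
def differentiate(notch, wnt, notch_threshold, wnt_secretory_threshold):
    """Return the next type of a cell that is in the differentiation state."""
    if notch < notch_threshold:
        # Secretory branch (from the caption).
        # ASSUMPTION: the split rule below is hypothetical; the caption only
        # says that the choice depends on the wnt quantity.
        if wnt >= wnt_secretory_threshold:
            return "paneth"
        return "goblet"
    # Non-secretory branch (from the caption): enterocyte if wnt < 1,
    # otherwise the cell stays or returns to the stem cell state.
    if wnt < 1:
        return "enterocyte"
    return "stem"


# Example call with arbitrary illustrative values.
next_type = differentiate(notch=0.8, wnt=0.5,
                          notch_threshold=0.5, wnt_secretory_threshold=1.5)
```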
Step 4: cell actions

We then determine the action sequence the cell follows in each state. For this purpose, we use activity diagrams [22,23] (Fig 2.B). In this diagram, rectangles describe basic actions (such as growing, randomly moving or dividing) and diamonds binary conditions. The arrows, for their part, describe the sequence of actions each cell will [...] basic action (i.e. the rectangles).

Step 5: experimental protocol

Finally, the experimental protocol must be designed. Here we describe the initial conditions and the external events that will be triggered at different times of the virtual experiments. This parallels the wet lab protocols usually designed by the biologists prior to any experiment.

[...] are also available and can be added as is or modified in the activity diagram.

Once designed, the model can be automatically translated into code in one click. The simulated environment is set up by plugging the selected modules into the program.

The state-transition diagram is then transformed into two functions:

• The behavior function executes the code associated with each state. The code of each state is generated based on the flow described in the activity diagrams: each box of these diagrams corresponds to a function call whose code has been implemented in the C++ editor during the model design.
• The transition function updates the state of the cells depending on their current state and the conditions implemented on the transition arrows.
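
The two functions above can be sketched as follows. ISiCell generates C++; Python is used here only for brevity, and this is not the generated code. The state names, the actions, the threshold value and the choice to run all behaviors before all transitions within a time step are assumptions made for the example.

```python
def grow(cell):
    cell["radius"] += 0.5


def divide(cell):
    # Simplified division: the daughter cell is deliberately omitted.
    cell["radius"] /= 2
    cell["divisions"] += 1


# One entry per state: the ordered actions of its activity diagram.
BEHAVIORS = {"growing": [grow], "dividing": [divide]}

# One entry per state: (condition, next state) pairs, one per outgoing arrow.
TRANSITIONS = {
    "growing": [(lambda cell: cell["radius"] >= 2.0, "dividing")],
    "dividing": [(lambda cell: True, "growing")],
}


def behavior(cell):
    """Execute the actions associated with the current state of the cell."""
    for action in BEHAVIORS[cell["state"]]:
        action(cell)


def transition(cell):
    """Move the cell to the first state whose arrow condition holds."""
    for condition, next_state in TRANSITIONS[cell["state"]]:
        if condition(cell):
            cell["state"] = next_state
            break


def step(cells):
    """Advance every cell by one time step."""
    for cell in cells:
        behavior(cell)
    for cell in cells:
        transition(cell)


cells = [{"state": "growing", "radius": 1.0, "divisions": 0}]
for _ in range(3):
    step(cells)
```

Storing the diagrams as tables keeps a one-to-one correspondence between each box or arrow and one entry in the code, which is the property that makes the translation automatic.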

All in all, the generated code is guaranteed to be exactly equivalent to the diagrams drawn and is then compiled to obtain an executable program.

[...] This tool is based on editable Python scripts to generate plots and launch simulations. So that the script has access to the simulation data at each time step, the code is specifically wrapped and recompiled. This wrapping makes it possible to interact directly and simply with simulations using a programming language, Python, that is well suited to data analysis. As it is written in Python, the tool also makes it easy to load biological data in order to plot them alongside the simulation for comparison.

ISiCell Explorer allows for exploring one or more parameters at a time within selected ranges. Simulations are launched with variations of the selected parameters following a Latin Hypercube Sampling [24,25]. The generated results can be selected to relaunch the exploration using another set of parameters as a basis. It is thus possible to iterate until an interesting set of parameters is found. The interface builds a tree of the iterations, making it possible to go back and forth between trials (as shown in [...]

Our methodology offers the advantage of making models easier to understand and to reproduce, even for people who did not work on them, thanks to its diagrams.

[...] where they indefinitely roam without attacking anymore. The corresponding state-transition diagram is shown in Figure 9. The Hertzian physics module [34,40] was used to shape this 2D continuous model and manage the cells' movements. An example protocol of the model is shown in Figure 10. The behavior is split between the CTL and the cancerous cells.

[...] the subject of several computational models [43][44][45] for many years.
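
For reference, the Latin Hypercube Sampling used by ISiCell Explorer for parameter exploration can be illustrated with a minimal, standard-library Python sketch. It is not the platform's implementation: the function name and the parameter names are hypothetical. Each parameter range is cut into as many equal strata as there are samples, and each stratum is used exactly once per parameter.

```python
import random


def latin_hypercube(ranges, n_samples, rng=None):
    """Draw n_samples parameter sets; ranges maps a name to (low, high)."""
    rng = rng or random.Random()
    samples = [{} for _ in range(n_samples)]
    for name, (low, high) in ranges.items():
        # Assign each stratum of this parameter to exactly one sample.
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for sample, stratum in zip(samples, strata):
            # Uniform draw inside the stratum, rescaled to [low, high).
            u = (stratum + rng.random()) / n_samples
            sample[name] = low + u * (high - low)
    return samples


# Hypothetical parameters and ranges, for illustration only.
samples = latin_hypercube(
    {"division_rate": (0.01, 0.1), "migration_speed": (0.5, 2.0)},
    n_samples=10,
    rng=random.Random(42),
)
```

Compared with independent uniform draws, this guarantees that every part of each range is visited, which matters when each sample costs one full simulation.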

The fourth case study is based on [46], which describes a 2D discrete toric grid of [...] was developed using NetLogo [47].

[...] TNF-α, undifferentiated macrophages by MCP-1 or DAMP and myoblasts by IL-6.

After getting at least one of the previously mentioned cells, a blood vessel has to cool down for a certain time. This cool-down mechanism was not mentioned in the original article but seemed necessary to regulate the number of cells entering the simulation.
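
A cool-down of this kind can be sketched with a simple counter. The Python example below is a hypothetical illustration, not the code of the case study: the class name, the method names and the cool-down duration are assumptions, and "getting a cell" is interpreted here as the vessel introducing one cell into the simulation.

```python
class BloodVessel:
    """Vessel that must wait a fixed number of steps between recruitments."""

    def __init__(self, cooldown_steps):
        self.cooldown_steps = cooldown_steps
        self.remaining = 0  # steps left before the vessel is available again

    def tick(self):
        """Advance the cool-down by one time step."""
        if self.remaining > 0:
            self.remaining -= 1

    def try_recruit(self, signal_present):
        """Return True if a cell enters; this starts a new cool-down."""
        if signal_present and self.remaining == 0:
            self.remaining = self.cooldown_steps
            return True
        return False


vessel = BloodVessel(cooldown_steps=2)
first = vessel.try_recruit(signal_present=True)   # vessel is available
second = vessel.try_recruit(signal_present=True)  # vessel is cooling down
```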

In this case study, bacteria are not treated as agents but as local values in the grid with their own management. There are two types of bacteria: the regular ones, which can grow on damaged IMCs, and the virulent ones, which multiply more slowly but can grow on healthy IMCs. The first type produces a virulent bacteria for one thousand regular [...] quantity; virulent bacteria also have a better resistance to ROS than regular ones.

In order to reproduce the diffusion of molecules in the grid, the platform provides a module to manage 2D diffusion, which is coupled here with a 2D discrete environment for the cells. To confirm the coherence of the cells' behaviors, we use ISiCell Viewer, which makes it possible to qualitatively reproduce the dynamics of the original article (as shown in Fig 13). To calibrate the model, we used the ISiCell Explorer tool to qualitatively match the population dynamics graphs of the original paper (as shown in Fig 14). This tool proved useful for studying the impact of each molecule's secretion on the dynamics of the cell types, helping us to reproduce the dynamics of the original article.
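
To make the idea of a 2D diffusion module on a grid concrete, here is a minimal Python sketch of one explicit diffusion step with wrap-around (toric) borders. It does not describe the scheme actually used by the ISiCell module, which is not detailed here: the four-neighbor stencil, the function name and the `rate` parameter are assumptions, and `rate` should stay at or below 0.25 for this explicit scheme to remain stable.

```python
def diffuse_step(grid, rate):
    """Return a new grid after one explicit diffusion step on a torus."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Four direct neighbors, with indices wrapping around the borders.
            neighbors = (
                grid[(r - 1) % rows][c]
                + grid[(r + 1) % rows][c]
                + grid[r][(c - 1) % cols]
                + grid[r][(c + 1) % cols]
            )
            new_grid[r][c] = grid[r][c] + rate * (neighbors - 4 * grid[r][c])
    return new_grid


# A single source of molecule in the middle of an otherwise empty grid.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
grid_after = diffuse_step(grid, rate=0.1)
```

With periodic borders, what leaves one cell always enters a neighbor, so the total quantity of molecule is conserved up to floating-point rounding.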
[...] diagram (see Fig 15) [...] the behavior also helps the interactions with the biologists.

The platform is part of the agent-based paradigm and thus excludes the use of other types of models. However, the difficulty of evaluating and quantifying the efficiency and relevance of one methodology compared to another makes it nearly impossible to prove that one paradigm is better than another for a specific model [48]. Nevertheless, with the goal of facilitating interactions, ABMs offer solid ground for better mutual understanding between biologists and modelers. Moreover, the platform is currently biased by the modelers who work with it, but opening the platform to a wider public might resolve this problem by providing more user feedback.

Currently, using the ISiCell platform requires a small amount of training for a modeler, mainly because of the small amount of C++ code that must be written to implement compartmental blocks. AI-based pair-programming tools such as GitHub Copilot [49,50] would be interesting for helping new modelers get started with the platform, especially for the creation of new modules. More generally, the increasingly widespread use of Large Language Models (LLMs) [51,52] such as OpenAI's GPT-3 [53,54], which are able to generate coherent text from a natural language request, could lead to the development of a myriad of coding tools making programming even more efficient.

These AIs already show interesting results for pair programming or generating code from prompts [55,56] and could, in the medium term, be interesting tools to improve the usability of ISiCell Builder for modelers.

Although the platform does not need a lot of coding to obtain a functional model, it still requires a minimum of C++ knowledge. A way to make this platform more accessible and usable for biologists could be to develop a metalanguage, in the same way as the GAMA platform and its GAML [57], to simplify the coding for non-computer scientists.

Expanding its community is one of the future goals of ISiCell. Involving more modelers in this project will lead to a virtuous and continuous improvement of the platform while leading to the development of new models. These models will bring new modules specifically created to address new problems. If the community is active