Abstract
For many cell-biological systems, a variety of simulation models exist. A new simulation model is rarely developed from scratch, but rather revises and extends an existing one.
A key challenge, however, is to decide which model might be an appropriate starting point for a particular problem and why. To answer this question, we need to identify and look at entities and activities that contributed to the development of a simulation model.
Therefore, we exploit the PROV Data Model (PROV-DM) and, building on previous work, continue developing a PROV ontology for simulation models. Based on a concrete case study of 19 Wnt/β-catenin signaling models, we identify crucial entities and activities as well as useful metadata to both capture the provenance information of individual simulation studies and relate these studies to one another, forming a family of models. The approach is implemented in WebProv, which allows one to insert and query provenance information.
Our specialization of PROV-DM contains the entities Research Question, Assumption, Requirement, Qualitative Model, Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data as well as activities referring to building, calibrating, validating, and analyzing a simulation model. We show that most Wnt simulation models are connected to other Wnt models by using (parts of) these models. However, the overlap, especially regarding the Wet-lab Data used for calibration or validation of Simulation Models, is small.
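To make the structure of this specialization concrete, the following is a minimal Python sketch of the entity and activity types named above, together with one query for models that reuse a model from another study. It is an illustration only and does not reflect WebProv's actual data model or API: the class names, the `study` field, the function `reused_models`, and the identifiers `SM_A`, `WD_A`, and `SM_B` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    # Entity types listed in the abstract.
    RESEARCH_QUESTION = "Research Question"
    ASSUMPTION = "Assumption"
    REQUIREMENT = "Requirement"
    QUALITATIVE_MODEL = "Qualitative Model"
    SIMULATION_MODEL = "Simulation Model"
    SIMULATION_EXPERIMENT = "Simulation Experiment"
    SIMULATION_DATA = "Simulation Data"
    WET_LAB_DATA = "Wet-lab Data"


class ActivityType(Enum):
    # Activity types listed in the abstract.
    BUILDING = "building"
    CALIBRATING = "calibrating"
    VALIDATING = "validating"
    ANALYZING = "analyzing"


@dataclass(frozen=True)
class Entity:
    id: str
    type: EntityType
    study: str  # hypothetical field: the simulation study the entity belongs to


@dataclass
class Activity:
    id: str
    type: ActivityType
    used: list[Entity] = field(default_factory=list)       # inputs (PROV "used")
    generated: list[Entity] = field(default_factory=list)  # outputs (PROV "wasGeneratedBy")


def reused_models(activities: list[Activity]) -> set[tuple[str, str]]:
    """Return (derived_model_id, source_model_id) pairs that cross study boundaries."""
    pairs = set()
    for act in activities:
        sources = [e for e in act.used if e.type is EntityType.SIMULATION_MODEL]
        for out in act.generated:
            if out.type is not EntityType.SIMULATION_MODEL:
                continue
            for src in sources:
                if src.study != out.study:
                    pairs.add((out.id, src.id))
    return pairs


# Hypothetical example: study_B builds its model from the model of study_A.
m_a = Entity("SM_A", EntityType.SIMULATION_MODEL, "study_A")
w_a = Entity("WD_A", EntityType.WET_LAB_DATA, "study_A")
m_b = Entity("SM_B", EntityType.SIMULATION_MODEL, "study_B")
activities = [
    Activity("cal_A", ActivityType.CALIBRATING, used=[w_a], generated=[m_a]),
    Activity("build_B", ActivityType.BUILDING, used=[m_a], generated=[m_b]),
]
links = reused_models(activities)
```

In this sketch, the same traversal restricted to `EntityType.WET_LAB_DATA` inputs would indicate which studies share calibration or validation data.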
Making these aspects of developing a model explicit and queryable is a crucial step for assessing and reusing simulation models more effectively. The unambiguous specification of information helps to integrate a new simulation model within the family of existing ones. Our approach opens up a wealth of knowledge that may lead to the development of more robust and valid simulation models.
We hope that it becomes part of a standardization effort and that modelers make use of provenance information when considering or creating simulation models.
Author summary
We revise a provenance ontology for simulation studies of cellular biochemical models. Provenance information is useful for understanding the creation of a simulation model because it contains not only information about the entities and activities that have led to a simulation model but also the relations between them, which can be visualized. It provides additional structure, as research questions, assumptions, and requirements are singled out and explicitly related, along with data, qualitative models, simulation models, and simulation experiments, through a small set of predefined but extensible activities.
We have applied our concept to a family of 19 Wnt signaling models and implemented a web-based tool (WebProv) to store the provenance information of these studies. The resulting provenance graph visualizes the story line within simulation studies, showing the creation and calibration of simulation models as well as the successive attempts at validation and extension. Beyond an individual simulation study, it also shows how the Wnt models are related. Thereby, the steps and sources that contributed to a simulation model are made explicit.
Our approach complements other approaches aimed at facilitating the reuse and assessment of simulation products in systems biology, such as model repositories and annotation and documentation guidelines.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* kai.budde{at}uni-rostock.de
Added funding information to manuscript.