Abstract
Mathematical models continue to be essential for deepening our understanding of biology. At one extreme, simple or small-scale models help delineate general biological principles. However, the parsimony of detail in these models, as well as their assumption of modularity and insulation, makes them inaccurate for describing quantitative features. At the other extreme, large-scale and detailed models can quantitatively recapitulate a phenotype of interest, but they have to rely on many unknown parameters, which often makes them difficult to parse mechanistically and to use for extracting general principles. We discuss some examples of a new approach, complexity-aware simple modeling, that can bridge the gap between the small- and large-scale approaches.
Highlights
Simple or small-scale models allow deduction of fundamental principles of biological systems
Detailed or large-scale models can be quantitatively accurate but difficult to analyze
Complexity-aware simple models can extract principles that are robust to the presence of unknown complex interactions
Introduction
Mathematical models have long been the crutch for our intuition in many fields of science. Models have also rapidly become accepted tools in the biological sciences, used to organize data and knowledge, understand how biological phenomena arise from the collective action of components [1] and predict emergent organizational properties [2, 3].
In general, the most useful model for a particular process would depend on the specific question at hand as well as the information available (previous knowledge and attainable experimental data) [4, 5, 6]. As a result, a single model is rarely appropriate for all possible instances of a problem [7]. This said, modelers of biology have long argued (and continue to do so) about the most useful approach, creating some tension between the supporters of large‐ and small-scale models [8]. Detailed or large-scale models attempt to incorporate most or all the available information about a system that is being modeled, resulting in many components and interactions explicitly stated in the resulting model. These models are criticized for being poorly parametrized and not easily amenable to abstraction and general insight. On the other hand, simple or small-scale models actively seek to discern the minimal essential components and interactions required to explain a particular behavior. As a result, the quantitative predictive power of small-scale models is often questioned. All models, by definition, fail to incorporate every mechanistic detail of a biological system [9]. Simply stated, since models are necessarily approximations of reality, even the most elaborate model contains a set of assumptions, and any conclusion derived from this model would be dependent on the validity of these assumptions.
While keeping this in mind, we discuss some examples of “small-scale” models and “large-scale” models. We then introduce the potential hybridization of these two through an approach we call “complexity-aware simple modeling”. We discuss two recent examples of this promising approach.
Small-scale models: The power of simplicity
The motivation behind small-scale models is that the most parsimonious set of components and interactions that can explain a phenotype also provides the most power for unraveling its underlying requirements. These models provide a major benefit: by using a small number of components, the number of unknown parameters is minimal and their associated assumptions are tractable. This greatly facilitates interpretation and provides an opportunity for vetting the generality of conclusions. As a result, small-scale models are often associated with the quest for uncovering principles.
In support of this notion, many concepts that are deeply embedded in our current knowledge of biological circuits result from small-scale models (see [10]). These include prominent principles such as the need for positive feedback for multistability [11], and negative feedback and time delays to produce oscillations [12]. Such simple but powerful guiding principles have been crucial for the study of many biological systems, ranging from circadian rhythms [13, 14, 15] to cell cycle regulation [16, 17, 18], and have led to profound insights into these complex systems.
While many small-scale models are derived with a biological system in mind, some are constructed to probe the general requirements of a broad biological property. For example, small-scale models have been used to pinpoint specific structural attributes of the biochemical networks that produce “absolute concentration robustness” (ACR). An ACR network is one in which one of the molecular species maintains a constant concentration, irrespective of fluctuations of other circuit components. A simple model of two molecular species A and B, present at a total concentration A + B = θ and interconverting through the two reactions A + B → 2B and B → A (at rates α and β respectively), shows ACR: whenever B is present at steady state, the concentration of A is constant (at β/α) irrespective of the total concentration θ. This example motivated the development of a broad theory for defining large classes of ACR networks [19, 20] that can produce this property irrespective of biochemical parameter values.
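The two-reaction example above can be checked numerically. Under mass-action kinetics, dA/dt = −αAB + βB = B(β − αA), so any steady state with B > 0 has A = β/α. The following is a minimal sketch, assuming deterministic mass-action kinetics and forward-Euler integration; the function name `simulate_acr`, the seed amount `b0`, the step size, and the parameter values are illustrative choices, not taken from the cited works.

```python
def simulate_acr(theta, alpha=1.0, beta=1.0, b0=0.1, dt=1e-3, t_end=50.0):
    """Integrate A + B -> 2B (rate alpha) and B -> A (rate beta) by forward Euler.

    theta is the conserved total A + B; b0 is a small, arbitrary seed of B.
    Returns the final concentrations (a, b).
    """
    a, b = theta - b0, b0
    for _ in range(int(t_end / dt)):
        # Net amount converted from A to B in one step:
        # autocatalytic gain minus first-order loss.
        flux = (alpha * a * b - beta * b) * dt
        a -= flux
        b += flux
    return a, b


# Final A for several totals, all chosen above beta/alpha = 1 (illustrative values).
acr_results = {theta: simulate_acr(theta)[0] for theta in (2.0, 5.0, 10.0)}

if __name__ == "__main__":
    for theta, a_final in acr_results.items():
        print(f"theta = {theta:5.1f}  ->  final A = {a_final:.4f}")
```

In this sketch the final value of A should sit at β/α for every total tested, while B absorbs the difference θ − β/α. The totals are deliberately chosen above β/α; if θ were smaller than β/α, B would decay to zero and A would simply equal θ.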
Often, however, the most meaningful understanding derives from a convergence of the general investigations of “principles” with the focused investigations of a concrete biological network. Unraveling how frog oocytes implement an irreversible differentiation switch was accomplished through a keen interest in this biological question as well as exploration of the properties of ultrasensitivity and positive feedback using simple models [21, 22, 23]. Another example is that of perfect adaptation, in which a functional quantity of a biological circuit can maintain a steady-state value that is constant despite a perturbing input. A multi-decade interest in understanding how bacteria implement perfect adaptation in chemotaxis [24, 25, 26, 27, 28, 29] led to a compelling formulation of this problem, with renewed interest generated by the identification of perfect adaptation in other systems [30, 31, 32]. Here again, insights gained from simple models of biological systems that feature perfect adaptation (see [33]) converged with general inquiries about motifs and topological features that can generate such a property (see [34]) to produce a meaningful and deep understanding. In particular, many of these studies converged on the use of integral feedback control, which was mathematically demonstrated decades ago in the field of control theory to ensure perfect adaptation [26, 35, 36].
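The reason integral feedback yields perfect adaptation can be stated in one line. As a sketch, assume a controller variable z that accumulates the deviation of an output y from a set point y_0 with gain k_I (these symbols are notation introduced here for illustration, not taken from the cited works):

```latex
\frac{dz}{dt} = k_I \bigl( y_0 - y(t) \bigr)
\quad\Longrightarrow\quad
\left.\frac{dz}{dt}\right|_{\text{steady state}} = 0
\;\iff\; y^{*} = y_0 .
```

Because this condition does not involve the rest of the system, any stable steady state must have y* = y_0 regardless of constant perturbations elsewhere in the loop. The argument presupposes that the closed loop actually settles to a steady state.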
Despite the success of small-scale and simple models, biology itself is neither small-scale nor simple. And, when using small-scale models to describe local nodes of a bigger biological network or to simplify elaborate interactions, we should continuously challenge our conclusions by asking about the effects of the surrounding complexity. What happens if the network motif that ensures perfect adaptation has an extra link or is connected to another network, or if the positive feedback that implements a switch is also entangled in a negative feedback loop? Each of these cases would have to be explored thoroughly, building our understanding from the bottom-up.
Large-scale models: Embracing biological complexity
The premise of large-scale models is that all components and interactions that comprise a system might be needed in order to reproduce its quantitative behavior accurately. The construction and simulation of such models that are faithful to details and complexity is now facilitated by acceleration in experimental data collection and growth in computational power (e.g. see [37]).
At the extreme of this spectrum are studies that attempt whole-cell modeling, seeking to describe how the phenotype arises from the genotype by accounting for all genes/proteins and interactions in a cell (e.g. the human pathogen Mycoplasma genitalium). These studies integrate multiple sources of data, such as the transcriptome, proteome, and metabolome in a condition of interest, as well as more general properties of the cell, such as mass, geometry, and cell-cycle state [38, 39]. The resulting models have so far included hundreds of variables and thousands of parameters whose values have to be mostly assumed. Insights generated by these models include the identification of new gene functions and the prediction of biological processes not directly accessible by existing experimental measurements [38].
Large-scale models also arise from efforts to reconstruct cellular networks in an unbiased way (top-down) from high-throughput data [40]. These reconstructions have proven to be useful to provide an overview of cellular connectivity, but the analyses of the resulting models have often focused on isolating a few structural components and interactions associated with a phenotype of interest [41].
A third general approach for large-scale modeling is one in which the complexity is built stepwise, first by building simple models and then embedding them in a more elaborate physiological reality. For example, Spiesser et al. [42] presented a multiscale simulation platform to integrate an osmostress response model with its physiological context (e.g. cell division cycle), as well as the cell-to-cell variation expected in a cellular culture. This model revealed previously under-estimated features that are dependent on population dynamics, such as partial synchronization during osmoadaptation [42].
Overall, while large-scale models are undeniably closer to biological reality, the task of interpreting their findings remains difficult. The many parameters of these models, most of which are poorly measured or not measured at all, make it difficult to differentiate conclusions and predictions that are dependent on parameter choices from those that are robust and general.
A new approach: Accounting for complexity without getting entangled in it
Recent years have seen the emergence of an exciting modeling approach, which we call here “complexity-aware simple modeling”. The goal is to preserve the small-scale modeling approach for representing a biological process, but without ignoring the complexity surrounding it. In fact, the point is to identify the largest and most complex class of interactions that, when connected to the simple model, fail to perturb its behavior. In this framework, the biological process of interest is modeled with the resolution needed, but the surrounding complexity (e.g. connected networks) is deliberately kept undefined or defined by the most abstract representation possible. Statements about the behavior of the system of interest are then formulated and demonstrated to hold even in the presence of the unmodeled interactions (Figure 1).
A representative example of this approach asked whether there is a simple biochemical motif (with its corresponding simple model) that can achieve integral feedback when connected to an arbitrarily complex network (with any number of components and interactions, as well as undefined parameter values) [43]. The result was the so-called “antithetic motif”, where two molecular species bind to each other and annihilate each other’s function through this binding. Now, imagine that one of the “antithetic” molecular species controls the input of the complex network while the other is produced by the output of the same network. In this case, it can be mathematically demonstrated that the steady-state value of the network’s output perfectly adapts regardless of any step perturbation inside this network (Figure 1). The antithetic motif used in this configuration therefore implements integral feedback action. One requirement of this adaptation is that the only source of decay for the two molecules of the antithetic motif is their mutual annihilation, not their individual degradation or inactivation. While such perfect adaptation holds for an arbitrarily complex network connected to the antithetic motif, it is of course contingent on the network being responsive to the input from this motif. Still, this is a remarkable property with two main implications. First, it provides a recipe for building a simple “adduct” to a very complex network that makes it perfectly adapting. Second, when this antithetic motif is found in endogenous biological networks, we can isolate the motif, ignore the rest, and declare the network perfectly adapting without the need to detail its complexity in order to infer the perfect adaptation property. A similar approach has been implemented to identify other integral control motifs [44] and to prescribe a general and robust cell fate reprogramming strategy [45].
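The adaptation argument above can be illustrated with a small simulation. The sketch below is a deterministic (ODE) caricature and rests on several assumptions that are not in the text: the "complex network" is replaced by a hypothetical two-step cascade (z1 → x1 → x2), all rate values are arbitrary, the perturbation is a doubling of one internal gain, and integration is by forward Euler. The names `simulate_antithetic`, `leak`, `mu`, `theta`, and `eta` are introduced here for illustration. Subtracting the two motif equations gives d(z1 − z2)/dt = μ − θ·x2 when there is no individual degradation, so a stable steady state must have x2 = μ/θ.

```python
def simulate_antithetic(leak=0.0, mu=2.0, theta=1.0, eta=10.0,
                        dt=1e-3, t_phase=200.0):
    """Forward-Euler simulation of an antithetic motif around a toy network.

    z1 is produced at constant rate mu and drives the network input; z2 is
    produced in proportion to the network output x2; z1 and z2 annihilate
    each other at rate eta. `leak` is an optional individual degradation
    rate for z1 and z2 (zero in the ideal motif).
    Returns the output x2 at the end of the unperturbed phase and at the
    end of the phase that follows a step perturbation.
    """
    # Hypothetical two-step network: z1 -> x1 -> x2 with first-order decay.
    net = {"k1": 0.5, "k2": 1.0, "g1": 1.0, "g2": 1.0}
    z1 = z2 = x1 = x2 = 0.0
    outputs = []
    for phase in range(2):
        if phase == 1:
            net["k2"] = 2.0  # step perturbation inside the network
        for _ in range(int(t_phase / dt)):
            bind = eta * z1 * z2  # mutual annihilation flux
            dz1 = mu - bind - leak * z1
            dz2 = theta * x2 - bind - leak * z2
            dx1 = net["k1"] * z1 - net["g1"] * x1
            dx2 = net["k2"] * x1 - net["g2"] * x2
            z1 += dz1 * dt
            z2 += dz2 * dt
            x1 += dx1 * dt
            x2 += dx2 * dt
        outputs.append(x2)
    return tuple(outputs)


# Ideal motif (no individual degradation) versus a "leaky" variant.
ideal_before, ideal_after = simulate_antithetic(leak=0.0)
leaky_before, leaky_after = simulate_antithetic(leak=0.1)

if __name__ == "__main__":
    print("ideal motif:", ideal_before, ideal_after)
    print("leaky motif:", leaky_before, leaky_after)
```

With these illustrative parameters, the ideal motif should return the output to μ/θ after the perturbation, whereas the leaky variant should settle at different values before and after it. This is consistent with the requirement that mutual annihilation be the only source of decay for the two antithetic species. The sketch says nothing about stochastic behavior or about networks for which the closed loop is unstable.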
A variation on this theme seeks to find bounds on behavior for classes of systems that share a small number of parts but can be arbitrarily different in others. For instance, being cognizant that all molecular interactions in cells are probabilistic, it is possible to define general relationships and bounds on the cell-to-cell variability in the antithetic motif (explained above) that hold irrespective of any complex network connected to it. Assuming stochastic birth, death, and binding reactions of the molecules in this motif, algebraic expressions relating average abundances and covariances for the two antithetic species can be combined with simple mathematical properties of normalized covariances to derive a bound on the fluctuations of these species. This approach showed a fundamental trade-off: molecular fluctuations in the counts of free molecules have to increase if a higher efficiency of binding between the antithetic molecules (and formation of their bimolecular complex) is desired [46]. This bound holds irrespective of, and cannot be alleviated by, connectivity to any network of arbitrary size or complexity. Therefore, this relationship is based only on a few specified interactions and is invariant to any networks in which those interactions might be embedded. This idea can further be exploited to rule out the plausibility of specific classes of interactions for an underlying biological process given experimental data. For example, if one is hypothesizing the presence of an antithetic motif, but experimental measurements of fluctuations and complex formation efficiency are outside the general bound delineated by their theoretically determined relationship, one can efficiently rule out the involvement of this motif irrespective of other uncharacterized components present in the network.
Such an approach was productively used in a different context to determine that a common class of gene expression models in which protein synthesis is proportional to mRNA levels cannot account for experimental single cell measurements in E. coli [47, 48]. This determination was based on the fact that the measured values of covariance metrics between mRNA and protein differ significantly from predicted relationships. Here again, predicted values are based on a simple model but are invariant to the potential complex connectivity of this model.
Final thought
Systems biology is often thought of as “the tool to unravel black boxes” [49]. We pose here the question of whether, when modeling biological systems, it is sometimes more productive to deliberately keep some boxes closed through what we have called “complexity-aware simple models”, or other approaches that adopt a similar philosophy. We find this idea appealing, and advocate for considering its implications. Might it be a fruitful way to approach the coarse-graining that is needed to traverse the different scales of biological organization? Might it be a useful replacement for detailed descriptions of certain processes in whole-cell models?
At the same time, we caution that only a few successful examples of this approach exist and that there is still no clear, disciplined way to implement such analyses in a general sense. We also caution that models are used for a variety of reasons and for asking different questions [5]. Therefore, we must continue to define models at a resolution that enables them to be useful tools for answering these questions. Fundamentally, we aim our discussion of complexity-aware simple models to provide some food for thought and, hopefully, a subject for vigorous scientific debate.
Acknowledgements
This work was supported by the Paul G. Allen Family Foundation and the National Science Foundation (NSF-MCB 1715108) to H.E.S. Hana El-Samad is a Chan Zuckerberg Biohub investigator.
Footnotes
Email addresses: Mariana.GomezSchiavon@ucsf.edu (Mariana Gómez-Schiavon), Hana.El-Samad@ucsf.edu (Hana El-Samad)