Summary
Development combines three basic processes — asymmetric cell division, signaling and gene regulation — in a multitude of ways to create an overwhelming diversity of multicellular life-forms. Here, we attempt to chart this diversity using a generative model. We sample millions of biologically feasible developmental schemes, allowing us to comment on the statistical properties of cell-differentiation trajectories they produce. Our results indicate that, in contrast to common views, cell-type lineage graphs are unlikely to be tree-like. Instead, they are more likely to be directed acyclic graphs, with multiple lineages converging on the same terminal cell-type. Additionally, in line with the hypothesis that whole body regeneration is an epiphenomenon of development, a majority of the ‘organisms’ generated by our model can regenerate using pluripotent cells. The generative framework is modular and flexible, and can be adapted to test additional hypotheses about general features of development.
1 Introduction
Contrary to intuition, the key molecules and mechanisms that go into the development of a human (>200 cell-types (Milo et al., 2009)) are the same as those required to produce a hydra (just 7 cell-types (Hwang et al., 2007)). More generally, there is a huge diversity of forms and complexity across multicellular organisms, but key molecules of development in Metazoa and in multicellular plants are conserved across the respective lineages (Meyerowitz, 2002). The basis of this diversity is illustrated by mathematical models of development which explore possible mechanisms of producing distinctive patterns found in different organisms, for example, segments in Drosophila (Von Dassow et al., 2000), stripes in zebrafish (Volkening and Sandstede, 2015), and dorso-ventral patterning in Xenopus larvae (Ben-Zvi et al., 2014). At a much broader scale, single cell transcriptomics and lineage tracing techniques have made it possible to map the diversity of forms of extant multicellular organisms (Kester and van Oudenaarden, 2018). Here, we ask about the limits of diversity that development can generate. And reciprocally, we ask what is common among all organisms that undergo development.
Biological development is modular (Bolker, 2000), and its out-come rests on gene regulation that is switch-like, rather than continuous (Albert and Othmer, 2003; Garfield et al., 2013). Keeping this in mind, we constructed a generative model of development with three basic ingredients: asymmetric cell division, signaling and gene regulation (Alberts et al., 2002). Although much is known about the detailed molecular machinery of development (Gilbert and Barresi, 2017), naturally, these details come from studies on a few model organisms. We choose to not include all these important particular features in our model for the sake of efficiently and systematically sampling a broad space of developmental schemes. Nonetheless, our model is capable of expressing specific examples of known developmental pathways, which we demonstrate using the Drosophila segment polarity network analysed in Albert and Othmer (2003).
We encode organisms in our model as lineage graphs, which show differentiation trajectories of the various cell-types in the organism. Traditionally, mathematical models in the literature elucidate developmental mechanisms responsible for known differentiation trajectories (Sharpe, 2017). Here we take the in-verse approach, and at a much broader scale; we sample across millions of biologically plausible developmental rules and map out the lineage graphs they produce. By tuning just three biologically meaningful parameters — which control signaling, cellular connectivity and cell division asymmetry — our model produces a rich collection of organisms with diverse cell-type lineage graphs, ranging from those with a single cell-type, to organisms with close to a hundred cell-types. Notably, tree-like lineage graphs are rare in our model. This could indicate that, contrary to popular belief, lineage graphs of real organisms are not tree-like; they are more likely to be directed acyclic graphs (DAGs). Additionally, an unanticipated outcome of our model is that most organisms we generate are capable of whole body regeneration. Our result supports the hypothesis that regeneration is an epiphenomenon of development (Goss, 1992). Despite the coarse-grained nature of our model, it gen-erates ‘organisms’ that reproduce hallmarks of real biological organisms. The model also produces concrete predictions, and we discuss how these predictions can be experimentally tested on animals like Planaria, in which regeneration is based on adult pluripotent cells (Reddien, 2018).
2 Generative model of development
Organisms in the model contain genomes with N distinct genes. By ‘Genes’, we refer not to single genes, but to gene regula-tory modules that control cellular differentiation (Mochizuki et al., 2013). In different cell-types of an organism, products of different sets of genes can be present (1) or absent (0). We represent a cell-type as a N-length binary string. For example, for N = 3, a cell-type C = [101] contains products of genes 1 and 3 but not gene 2 products. (In Supplementary section 6.9, we demonstrate how ‘Genes’ can also be used to encode spatial information using the well-known Drosophila segment polarity network as an example(Figs.S10, S11).
Cell-types are ordered according to standard binary ordering, i.e., the cell [101] can equivalently be written as C5. We only look at whether a given cell-type is present or absent in organisms, rather than the number of cells of any given cell-type. Therefore, since each of the N genes can be either 1 or 0, there are at most 2N distinct cell-types in an organism, and cell-type compositions for organisms (Fig.1(A)). Note that the number of distinct organisms is larger than , since different organisms may have the same set of cell-types but distinct lineage graphs (Fig.1(G)).)
We represent development as a repeated sequence of cell division, intercellular signaling, and gene regulation:
Cell division
Cells in the model undergo asymmetric cell-division. Although in real multicellular organisms, a single cell only divides into two daughter cells, a single cell-type may represent a population of cells, which need not all behave in the same way (Altschuler and Wu, 2010; Klein and Simons, 2011). We capture this heterogeneity by allowing cells in our model to divide into more than two types of daughter cells. Asymmetry of cell division is controlled by the parameter Pasym ∈ [0, 1], which is the probability that a daughter cell does not inherit the product of a given gene from the mother cell. That is, Pasym = 0 implies symmetric division, where all daughter cells inherit all gene products from the mother cell, and at Pasym = 1, no daughter cell receives any gene products from the mother cell. We assume that in the instant directly following division, no new gene products are formed. Therefore, genes that were in a 1 state in the mother cell can switch to a 0 state in daughter cells due to unequal and insufficient partitioning of the gene product during division (Knoblich, 2008), but genes that are in a 0 state in the mother cell are necessarily in a 0 state in the daughter cells as well. For any given organism in the model, we predetermine the sets of daughter cells produced by different cell-types randomly according to Pasym, and encode this in a binary matrix CD (Fig.1(B)).
Signaling
The number of distinct signaling molecules in an organism is controlled by the parameter Psig ∈ [0, 1], which is the probability that the product of any particular gene is a signaling molecule. Parameter Padj ∈ [0, 1] controls signal reception; a cell-type Ci can receive signals from a cell-type Cj with probability Padj. As in the case of cell-division, for each organism, the set of signaling molecules, and the pairs of cells that are allowed to exchange signals are predetermined and stored in a binary vector SG, and a binary matrix A, respectively (Fig.1(C,D)). Cells can only receive signals from other cell-types present in the same time step, and recipient cells receive all the signals produced by donor cells. In recipient cell-types, in response to incoming signals, the corresponding genes are set to a 1 state (Fig.1(F)).
Gene regulation
We model gene regulation as random Boolean networks (RBNs) (Gershenson, 2004); the states of genes depend on each other through arbitrarily complex Boolean rules. Updates in gene states result in updates in cell-types. In this scheme, some cell-types update to themselves (stable states), and other cell-types ultimately update to one of the stable cell-types, that is, they lie in the basin of a stable state. Here, instead of encoding RBNs explicitly, we describe gene regulation directly as the set of stable cell-types and their basins. For each organism, we predetermine its gene regulation and encode it in a binary matrix GR (Fig.1(E)). Our model is deterministic; once the matrices CD, SG, A and GR are determined for an organism, they remain fixed for the rest of the simulation. The model is also synchronous; all cell-types in the organism divide simultaneously, after which the developing organism is composed only of all daughter cells produced in this step. These daughter cells simultaneously exchange signals, in response to which the states of all the genes, in each daughter cell-type are updated simultaneously according to GR (Fig.1(F)). A time-step in the model represents a single repeat of cell-division, signaling and gene regulation.
The process of development ends when the set of cell-types in a developing organism repeats itself. We call this set of cell-types the steady state of the organism, and the number of time-steps between two repeats the period of the steady state. Since this is a finite and deterministic system, starting from any initial condition, such a steady state can always be reached. We call period-1 steady states homeostatic organisms (Fig.1(F)). Although organisms with complex, period>1 life-cycles, such as land plants with alternation of generation (Graham and Wilcox, 2000) exist in nature, in this study, we focus on homeostatic organisms. We represent homeostatic organisms as their cell-type lineage graphs (Fig.1(G)). The nodes of this graph represent cell-types in the homeostatic organism, and directed edges rep-resent lineage relationships between these cell-types. Let some cell-types A and B in a homeostatic organism be represented by nodes Va and Vb, respectively, in its lineage graph. Then, there is an edge from Va to Vb if one of the daughter cells of A gives rise to B after one round of cell-signaling and gene regulation. Note that the lineage graphs in the model are for the adult homeostatic organism, and do not represent developmental trajectories which map transitions of embryonic cell-types.
3 Results
3.1 Homeostatic organisms span a large range of sizes
We looked at millions of homeostatic organisms, spanning sys-tems with N = [3,4,5,6,7] number of genes (Fig.2(A),inset). Therefore, the largest possible organism in our data can contain 27 = 128 cell-types. 99.88% of these homeostatic organisms had lineage graphs with a single connected component. In the following, we describe lineage graphs of these single-component homeostatic organisms. While a majority of graphs in our data are small (1-5 nodes), the largest graphs have 89 nodes (Fig.2(A)). Naturally, the number of edges in lineage graphs increases with the number of nodes, but this increase is slower than that expected for simple random graphs (Fig.2(B), Fig.S1(A)). The number of nodes in lineage graphs follows closely the diversity of daughter cell-types produced (Fig.S1(C,D)). At very low Pasym, cells produce daughters cells identical to themselves, and at very high Pasym, most daughter cells are of the type [0, 0, …0]. Therefore at these values, diversity of daughter cells, and correspondingly the number of nodes in lineage graphs, is low. At other values of Pasym, the number of nodes stays level and decreases slowly beyond Pasym = 0.5 (Fig.2(C)). Number of nodes decreases as Psig increases (Fig.2(D)). Intuitively, high levels of signaling causes genes in a 1 state to ‘spread out’, effectively leading to a homogenization of cell-types. The sharp decrease in the number of nodes in response to increase in Padj indicates that a low level of inter-cellular connectivity is sufficient for signals to percolate throughout the organism (Fig.2(E), Fig.S1(B)).
3.2 Diversity of lineage graph topologies and the dearth of tree-like lineage graphs
Paths in a lineage graph represent differentiation trajectories of the organism’s cell-types. Here, we classify lineage graphs into six topologies, each of which contain qualitatively different paths: (i) unicellular (single cell-type), (ii) SCC (Strongly Connected Component – all paths are cyclic), (iii) cyclic (contains both cyclic and acyclic paths), (iv) chains (acyclic graphs with no branches), (v) trees (acyclic graphs with branches) and (vi) DAGs (Directed Acyclic Graphs, which contain edges connecting different branches. These edges represent the convergence of multiple cell-lineages to the same terminal cell-type). We ignore self-edges during lineage graph classification.
In our data, unicellular graphs are the most abundant (36%). Acyclic graphs (chains, trees and DAGs) comprise about 25% of our graphs. Of these, trees are the rarest (<1% across all graphs) and chains are the most abundant (14.3% across all graphs) (Fig.3(A)). Although all topologies are spread widely across parameter space, different topologies are enriched in different regions of parameter space (Fig.3(C)). No parameter region is monopolized by a single topology, except at extreme values of Pasym, where, as discussed earlier, most graphs are unicellular. To a large extent, these topologies can be characterized by their in-degree and out-degree distributions. For instance, in chains, in-degrees and out-degrees are at most 1, whereas in SCCs, in-degrees and out-degrees are at least 1 (Fig.3(B)). Therefore, for the most part, we can explain the model’s propensity to generate certain topologies, in terms of its propensity to generate certain in-degree and out-degree distributions. However, we find that acyclic graphs are slightly more enriched in our data than in randomized graphs with the same in-degree and out-degree distributions (Fig.S2).
To test whether graphs produced by our model are realistic, we compared our lineage graphs with those of real organisms (Fig.3(D,E,F)). 98% of all chain-type graphs in our data match exactly the Volvox lineage graph (Fig.3(D)). While we do not find model generated graphs which exactly match the lineage graphs for Hydra and the human hematopoeitic system, we do find graphs that are identical to parts of the real lineage graphs (Fig.3(E,F)). Recently, Plass et al. performed lineage reconstruction for the whole adult planarian worm, and report the best supported spanning tree for its lineage graph (Plass et al., 2018). We are unable to include this graph here, because the small sizes of our tree-like graphs makes a comparison unsuitable.
3.3 Homeostatic organisms contain pluripotent cells
We wanted to test whether these ‘homeostatic organisms’ could self-reproduce. As a test, we looked at whether single cell-types taken from homeostatic organisms develop into the same organism using the same rules (GR, CD, A and SG) used to generate the organism from a random initial cell-type. In other words, we looked for pluripotent cell-types. We find t hat in about 97% of all homeostatic organisms (and 95.2% non-unicellular organisms) there is at least one such pluripotent cell-type. This is surprising, since cells, taken out of the context of signaling from other cell-types in the organism, are not expected to be regenerative. Additionally, regeneration trajectories, starting from these pluripotent cell-types tend to be short (Fig.S7).
We tested whether this high level of pluripotency could be a trivial consequence, arising because perhaps the lineage graphs sampled in our data are the most probable graphs produced by these dynamics. But we find that in about 73.3% of lineage graphs (73.1% non-unicellular graphs), cell-types that are taken from homeostatic organisms are much more likely to generate it than cell-types not present in the homeostatic organism (Fig.4(A)). We therefore measure regenerative capacity of an organism as the fraction of pluripotent cells in the organism divided by the fraction of all cell-types (irrespective of whether it is a part of the organism, or not) that generate the organism. We call an organism regenerative if its regenerative capacity is greater than 1.
Regenerative capacity differs among different topologies (Fig.4(B), Fig.S6). In particular, most tree-type graphs have low regenerative capacity. Regenerative capacity also depends on model parameters: at Pasym = 0, where cells divide to produce identical daughter cells, as expected, organisms are maximally regenerative. At Pasym = 1, where all cell-types produce the same daughter cell-type ([0, 0, …0]), regenerative capacity is lowest (Fig.4(C)). Regenerative capacity increases with Psig and Padj (Fig.4(D,E)).
3.4 Pluripotent cells removed from their organisms retain their cell fates
In order to find the source of the high regenerative capacity of organisms in our data, we test how much the fate of a cell in the model depends on signaling. We define the fate of a cell-type C in a homeostatic organism as the set of all cell-types that receive an edge from cell-type C in the lineage graph of the organism. We call a cell-type independent if it has the same cell-fate when taken out of the homeostatic organism, as it does within the organism. Note that pluripotency and independence, while related, are not synonymous; the differentiation trajectory of a pluripotent cell during regeneration could be different than its differentiation trajectory during homeostasis.
Surprisingly, we find that across all parameter regions, homeostatic organisms are enriched in independent cell-types (Fig.4(F),Fig.S9). About 62% of all independent cells, pooled from all non-unicellular graphs, are pluripotent. That is, overall, independent cells are slightly more likely to be pluripotent than not. But, 85.2% of all pluripotent cells, pooled from all non-unicellular graphs, are independent (see Fig.S8 for a breakdown according to number of pluripotent cell-types in an organism). This explains the most likely mechanism of pluripotency in our model: if cells produced by a single cell plucked out of the homeostatic organism also belong to the organism, the whole organism can be built up step-by-step starting from such a single cell.
Interestingly, while the proportion of independent cell-types pooled from all non-unicellular graphs is similar (≥ 75%) across all topologies, different topologies have very different proportions of pluripotent cells (Fig.4(F), fourth panel). Notably, in SCC-type lineage graphs, where all differentiation paths are cyclic, 99.8% of all independent cells are pluripotent. Whereas in lineage graphs that contain acyclic differentiation paths, the proportion of pluripotent independent cells is lower; particularly in tree-type lineage graphs, where only 2.4% of the independent cells are pluripotent. More generally, this indicates that not only the number of independent cell-types, but also their connectivity in the lineage graph is an important factor contributing to an organism’s regenerative capacity.
4 Discussion
The process of development and its molecular mechanism is inherent in all metazoans and in all plants (Meyerowitz, 2002). This makes it difficult to design experiments to distinguish between emergent traits associated with development and traits that have evolved on top of it. Here, we have developed a minimal model where we can look at development in the absence of complications due to cross-talk with other biological processes. In our model, we include only those ingredients of development that are shared across all multicellular organisms, while not ascribing any particular form or mechanism to these processes. This allows us to identify traits that stem from the fact that the organisms undergo development, regardless of the details of the process. Note that such basic traits can still be subject to selection through regulatory processes on top of the key ingredients of development. We see such an emergent trait in our model: ability of whole body regeneration (WBR) through pluripotency. WBR, though widely spread across basal metazoan phyla, is curiously absent in mammals and birds (Ecdysozoa). Below, we discuss major assumptions and limitations of our model, and contrast these with mechanisms that occur in biological organisms which could effect regenerative ability.
Independent processes: Cell-division, signaling, and gene regulation are treated as independent processes in the model. This is likely to be false in real animals. Primarily, this implies that not all regions of parameter space explored in this work are biologically feasible. In particular, cells in the model follow a simple program for asymmetric cell division that is intrinsic to cell-types, but extrinsic control of asymmetric cell division, involving cues from surrounding cells, does occur in animals (Knoblich, 2008). Extrinsic control of asymmetric cell-division could lead to a decrease in the independence of cell-fates on cellular context which we see in the model, and thus lower regenerative capacity.
Chemical signaling: We have encoded a form of signaling which depends on spatial ordering of cells: cell-types that are arranged closer in some sense to other cell-types are adjacent to them and accept all the signals produced by them. Whereas, in real organisms, cells contain receptors that recognize signal molecules, rather than recognizing the donor cells that produce those signals. Firstly, since there are fewer kinds of signal molecules than there are cell-types, with this chemical recognition based signaling, on average, more cells are expected to exchange signals. In our model, level of signal exchange is controlled by Padj, and we find that pluripotency increases with Padj (Fig.4(E)). Therefore, it is likely that a switch to a chemical recognition based signaling will preserve the high level of pluripotency. Secondly, in the chemical recognition scheme, it is likely that a cell-type will receive the same set of signals even if some other cell-types in the organism are changed. That is, cell-fate is also likely to be more robust to changes in cellular context. Therefore, cell-types are also likely to be independent of cellular context as in the current model.
Other schemes: In the current model, we use the following scheme of development: cell-division, followed by signaling among daughter cells and gene regulation in response to signals exchanged. But there are other reasonable schemes which can also be considered. For example, a scheme where cell-division is followed by an additional step of gene regulation before signal exchange is also plausible. In the current model, daughter cells contain subsets of the contents of the mother cell, and in this sense are more similar to each other than to daughters of other mother-cells. Therefore, in the current scheme, signals received from a sister cell are likely to be less effective in changing cell-state than are signals received from other daughter cells. Gene regulation right after cell division would lead to a diversification of daughter cells, which is therefore likely to increase the level of effective signaling among daughter cells. But, as discussed earlier, we expect that an increase in the level of signal exchange to still lead to high regenerative capacity.
Additional parameters: The effect of processes such as asynchronous gene state updates (Chaves et al., 2005), and time delays involved in transfer of information about gene state updates (Cheng et al., 2013) has been tested on the Drosophila segement polarity network, and found to have interesting effects on the robustness of phenotypes. Such processes could add to the richness of lineage graphs we obtain from our model, but come with the cost of additional parameters, which would limit the breadth of the sampling.
Cell death: Cells in the model do not die. Not including cell death in the model results in lineage graphs where each node has at least one out-edge. We anticipate that including cell death would reduce the number of cycles in lineage graphs, leading to an increase in the proportion of acyclic graphs. Since regenerative capacity is linked to lineage graph topology, cell-death could be an important factor in determining regenerative capacity.
Although here we only provide intuitive arguments for what alternate versions of the model might yield, the framework of the model is easily amenable to manipulations, and differently constructed versions can be tested in the future.
The present model makes several predictions regarding general features of development and multicellular organisms. It suggests that presence of adult pluripotent cells should be a widespread trait in multicellular life-forms. In plants, we are already aware of pluripotent cells in the root and shoot meristems. But among animals, a wider investigation of regeneration and its mechanisms will be required to test this idea. A recent example of such a study is (Zattara et al., 2019), where the authors test the ancestral nature of regeneration in Nemertaean worms, which are not classical model organisms.
The distribution of lineage graph topologies in our data reflect the complexity and diversity of forms of multicellular animals that biological development is expected to produce. Under normal circumstances, cellular differentiation is expected to be irreversible. Therefore, we restrict our discussion here to the acyclic graphs that our model produces. Small (2-node) chains are the most abundant acyclic lineage graphs in our data (Fig.3(A), Fig.S3(C)). In line with this, the simplest multicellular organisms, such as Volvox carteri (Matt and Umen, 2016), an alga which evolved multicellularity only recently, has a chain like lineage graph. Interestingly, some cyanobacteria, such as Anabaena spaerica (Claessen et al., 2014), which display multicellularity during nitrogen starvation, also have chain-like lineage graphs.
Tree-type lineage graphs are rare in our data and tend to be small, and convergent rather than divergent (Figs. 3(A), S3(E), S4). This could indicate one of two things: This could imply that lineage graphs of complex organisms are unlikely to be tree-like. Our data suggests that they are more likely to be DAGs (directed acyclic graphs), i.e., organisms have higher levels of trans-differentiation than expected (Figs. S3(D), S5). Or, it could mean that more complex regulation, on top of the ingredients of this model, are at play in real organisms which lead to complex tree-like lineage graphs. A perhaps presumptuous, but interesting possibility is that tree-like lineage graphs were selected for because of their low regenerative capacity. There exist arguments and speculation over whether the Ecdysozoans selectively lost the ability to regenerate, and why (Bely, 2010).
These questions surrounding the topologies of lineage graphs are likely to be resolved very soon in the future, given the rapid developments in single cell transcriptomics technology. A notable recent study is that of Plass et al. (Plass et al., 2018), where they assemble the whole organism lineage graph for Planaria. A possible hurdle comes from the fact that current methods for lineage reconstruction using single cell transcriptomics data are not unbiased; in (Plass et al., 2018), although lineage reconstruction yielded a complex graph, the authors highlight the best supported spanning tree of this graph. Current lineage reconstruction methods work best if a particular topology for lineage graphs is already anticipated, and most methods are designed to only find chains and trees (Saelens et al., 2019; Tritschler et al., 2019). A study by Wagner et al. (Wagner et al., 2018), where single cell transcriptomics is used in conjunction with cellular barcoding, provides an example of a lineage reconstruction method which is unbiased towards particular topologies. In agreement with our result, the authors of this study found that zebrafish development is best represented by a DAG.
Lastly, we discuss how certain predictions of our work can be experimentally tested. Our results suggest that in organisms, such as Planaria, where regeneration is based on adult pluripotent cells called c-neoblasts, these cells are likely to be independent of cellular context, that is their cell fates should not change when taken out of the body, or transplanted to other cellular contexts. c-Neoblast independence could explain the coarse pattern of distribution of specialized neoblasts across the planarian body, and also why the distribution of specialized neoblasts produced does not depend on which organ is amputated (Reddien, 2018). Recent development of a method to culture neoblasts in the lab (Lei et al., 2019), make it possible to experimentally test neoblast independence. Additionally, in the model, not only pluripotent cells, but also non-regenerative cells display independence. Therefore, we also predict that, at least in organisms such as Planaria, cell-fate trajectories in organisms in homeostasis should reflect cell-fate trajectories in regenerating organisms. This can be addressed by lineage reconstruction experiments that compare lineage graphs of planarians in homeostasis with lineage graphs of regenerating planarians.
5 Methods
5.1 Surveying the combinatorial space of developmental schemes
We considered organisms with N = {3, 4, 5, 6, 7} genes. For each N, we have looked at {100, 100, 100, 92, 25} randomly generated gene regulation matrices (GR), respectively. For each GR, all values from [0, 0.1, 0.2, …, 1.0] were used for the parameters Pasym, Psig and Padj. At each parameter value, 10 randomly chosen cell-types were used as initial conditions (in case of N = 3, all 8 cell-types were used). A distinct set of developmental rules matrices (CD, A and SG) was used in combination with each initial cell-type. In all, we have looked at about ((100 + 100 + 100 + 92 + 25) × 113 × 10) ≈ 5.5 × 106 systems. 4858643 of these converged within 1000 time-steps into homeostatic organisms.
5.2 Model details
5.2.1 Asymmetric cell division
In our model, for any cell-type Ci, we generate different sets of daughter cell-types Di using the parameter Pasym ∈ [0, 1]; for any daughter cell-type , ∀k ≤ N,
We encode cell-division in a binary matrix ; CD(i, j) = 1 if cell-type Cj ∈ Di, else CD(i, j) = 0 (Fig.1(B)).
5.2.2 Signaling
The probability that a gene in the model produces a signaling molecule is Psig ∈ [0, 1]. Formally, let SG = {0, 1}N be a binary vector. Then gene k produces a signaling molecule if SG(k) = 1, where SG(k) = Ber(Psig) (Fig.1(C)). Let SGj = {0, 1}N be the set of signals produced by cell-type Cj. For any gene k, SGj (k) = 1 ⇔ (Cj(k) = 1) ∧ (SG(k) = 1).
Parameter Padj ∈ [0, 1] gives the probability of signal reception. We encode signal reception in a binary matrix .
Cell-type Ci receives all signals produced by cell-type Cj if A(j, i) = 1, where A(j, i) = Ber(Padj). Ci receives no signals from cell-type Cj if A(j, i) = 0 (Fig.1(D)). Cells can only receive signals from other cell-types present in the same time step. Let be a binary vector, where Tt(i) = 1 if cell-type Ci is present in the time step t. Tt represents the state of the organism at time step t. For some cell-type Ci present at time step t, let represent its state immediately after signal exchange. In cell-types that receive a signal, the corresponding genes are set to 1 (Fig.1(F)). That is,
5.2.3 Gene regulation
A cell-type in the model need not be a fixed point (single cell-state) of the gene regulatory network, it can also be an oscillation (multiple cell-states) (Xia and Yanai, 2019). In the latter case, the cell-type is represented by all cell-states that are part of the oscillation. We are only concerned with the set of cell-states in the stable state, and not with the sequence of cell-states in oscillations.
Formally, a system with N genes can have n ≤ 2N stable cell-types {S1, S2, …, Sn}; where Sx is itself a collection of nx cell-states such that . For any two cell-types Sx and Sy, if x < y then x1 < y1.
We encode gene regulation in a binary matrix . To generate GR for a given organism, we pick the number of stable cell-types n ≤ 2N according to uniform random distribution. First, we assign cell-states that form the basins of these stable cell-types: Cell-states are uniform randomly partitioned among the n basins. We then choose cell-states that form the stable cell-type from within the corresponding basins. Let Bx be a basin, then for some j such that (Cj ∈ Bx), (Cj ∈ Sx) with probability 0.5. For all i such that Ci ∈ Bx, GR(i, j) = 1 if (Cj ∈ Sx).
5.2.4 Homeostatic organisms and their cell-type lineage graphs
Let us consider an organism in state Tt at time step t. Right after cell division, let the state of the organism be represented by . After division, the organism is composed of all the daughter cells produced in that time step. That is,
These daughter cells exchange signals among themselves. Let represent the state of the organism right after signal exchange. Then,
The signals received by a cell-type activates its gene regulatory network. Gene regulation updates the set of cell-types according to the following expression: ∀i ≤ 2N,
Therefore, the organism is only composed of stable cell-types. Let the system have n ≤ 2N stable cell states. Then, we can equivalently represent the state of the organism at time step t as a binary vector , such that for x ∈ {1, 2, …n}.
We call states of the organism such that homeostatic organisms (Fig.1(F,G)).
We represent the homeostatic organism as a cell-type lineage graph. The nodes of the graph represent stable cell states that are present in the homeostatic organism, and directed edges represent lineage relationships between these stable cell states. Let the stable cell states Sx1 and Sx2 both be present in the final organism, and let them be represented by nodes Va and Vb of the lineage graph respectively. Then, there is an edge from Va to Vb if one of the daughter cells of Sx1 gives rise to Sx2 after one round of cell-signaling and gene regulation (Fig.1(G)). That is,
5.3 Assignment of topologies to lineage graphs
We categorize lineage graphs into 6 topologies: unicellular, strongly connected component(SCC), cyclic, chain, tree and other directed acyclic graphs (DAG). We ignore self-edges while assigning these topologies. A lineage graph is called unicellular if it has only a single node. For all other topologies, we used the networkx (version 2.2) module of Python3.6. A lineage graph is called SCC if the graph has more than 1 node and contains a single strongly connected component, it is called cyclic if the graph contains cycles and has more than one strongly connected component, it is called a chain if networkx classifies it as a tree and the maximum in-degree and out-degree are 1, it is called a tree if networkx classifies it as a tree and maximum in-dergee or out-degree is greater than 1, and it is called a DAG if networkx classifies it as a directed acyclic graph but not a tree.
5.4 Lineage graph randomization protocol
We represent a lineage graph with e edges as a matrix Ee×2, where E(i, 1) and E(i, 2) represent the source and the target node of edge i respectively. To randomize lineage graphs, we used a protocol that preserves in and out degrees of each node; we randomly choose pairs of edges from the graph and swap their target nodes. Let the randomized graph Erand be initially identical to E. Then, for any two edges of the lineage graph i, j, we propose a swap
The swap is accepted if there is no edge k such that
The above condition ensures that the total number of unique edges in E and Erand remain the same. We swap edges 1000 times for each lineage graph to randomize it.
5.5 Independent and intrinsically independent cell-types
We call a cell-type independent if it has the same cell fate when grown outside the organism as it does when it is a part of the organism. The cell fate of some cell-type Ci in the organism is given by the set of cell-types receiving an edge from the node Ci in the organism’s lineage graph. To decide whether a given cell-type Ci is independent or not, we separate this cell-type from the rest of the organism, and allow it to undergo one round of cell division, signaling and gene regulation, according to the same matrices CD, SG, A and GR that were used to generate the organism from which it was taken. Let us call the resulting set of cell-types . We call the cell Ci independent if is identical to .
For some cell-types, the basis of their independence is an insensitivity to signals produced in the organism. In such a case, the set of signals produced by the daughter cells of the cell-type is sufficient to satisfy the maximum set of signals that each of the daughter cells can receive.
Let the set of daughter cells of cell-type Ci in an organism be Di. ∀Cj ∈ Di let represent the maximal set of signals that it can receive, when all 2N possible cell-types are present together. i.e., For all signaling molecules k such that SG(k) = 1,
And, let be the set of signals it receives from within the set of cells Di. i.e.,
If, for all Cj ∈ Di, , Ci is intrinsically independent.
Author Contributions
S.M. conceived the project, developed code for simulations and performed analysis; S.M. and T.T. designed research; S.M. anT.T. wrote the paper.
Declaration of interests
The authors declare no competing interests.
6 Supplementary material
6.1 Dissection of the effect of parameters on graph size
We compared the properties of lineage graphs in our data with those generated with Erdos-Renyi random graphs (ER graphs) of similar size. ER graphs are generated using a fixed probability, p of any two nodes in the graph being connected by an edge (Erdös and Rényi, 1959). Therefore, on average, the number of edges in a graph with n nodes is proportional to n2. Increasing the number of nodes to c ∗ n increases the number of edges to c2 ∗ n2. We determined the number of edges in lineage graphs with n = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] nodes, and calculated the number of edges expected in ER graphs with n = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20] nodes. Compared to ER graphs, the rate of growth of number of edges in lineage graphs in our data is noticeably slower (Fig.S1(A)).
The number of nodes in lineage graphs decreases sharply with the parameter Padj (Fig.2(E)). We show here that this occurs because even at low values of Padj, cell-types in organisms are connected enough that the fraction of cell-types receiving all signals produced in the organisms reaches a maximum (Fig.S1(B)).
The effect of the parameter Pasym on the number of nodes in lineage graphs can be explained in terms of its effect on the number of distinct daughter cell-types produced (Fig.S1(C)). The number of distinct daughter cells produced at different values of Pasym is related to the average fraction of genes in a 1 state in these daughter cells. Among all possible cell-types with N genes, most cell-types tend to have about half their genes in a 1 state, and very few cell-types contain fewer, or more genes in a 1 state (Fig.S1(D:inset)). Therefore, when 0.2 < Pasym < 0.6, where on average, daughter cells have about half their genes in a 1 state, organisms produce the most number of distinct daughter cells, and at Pasym < 0.2 and Pasym > 0.6, fewer distinct daughter cells are produced (Fig.S1(D)).
6.2 Randomization of lineage graphs
We randomized lineage graphs generated with our model while keeping node in-degrees and out-degrees unchanged. Topology distribution largely remains unchanged upon randomization (Fig.S2(A,B)), and not many graphs change their topology upon randomization (Fig.S2(C)). Although, we find that the proportion of acyclic graphs decreases slightly, from 24% in model generated graphs, to 19% in randomized graphs.
6.3 Characteristics of lineage graphs with different topologies: graph size
Different graph topologies are different in their graph size distributions. While SCC and cyclic graphs span a large range of graph sizes (Fig.S3(A,B)), trees and chains tend to be notably small (Fig.S3(C,E)). DAG type graphs can have moderately large number of nodes (Fig.S3(D)).
6.4 Characteristics of tree-type lineage graphs
Tree-type graphs can be further characterized as divergent or convergent trees. We call graph nodes with in-degrees > 1 convergent, and nodes with out-degrees > 1 divergent. Note that by this definition, the same node is allowed to be both convergent and divergent. For some tree-like graph with n nodes and ne edges, let ini and outi be the in-degree and out-degree of the ith node respectively. We define for this graph a number gc as the sum of in-degrees of all convergent nodes, and a number gd as the sum of out-degrees of all divergent nodes, i.e.;
We define the degree of divergence of this graph as (gd − gc)/ne. For a perfectly divergent tree, such as the tree to the left in Fig.S4(A), the degree of divergence is 1. And for a perfectly convergent tree (e.g. the tree to the right in Fig.S4(A)), degree of convergence is −1. We find that most tree-like graphs in our data tend to be more convergent than divergent (Fig.S4(B)). Lineage graphs that are divergent lead to an increase in cell-type diversity starting from a few initial cell-types. Lineage graphs of real organisms are believed to be divergent trees. Larger trees tend to be more divergent (Fig.S4(C)). Degree of divergence decreases as Pasym increases, it is relatively insensitive to Psig and Padj.
6.5 Characteristics of DAG-type lineage graphs
DAG-type graphs differ from tree-like graphs in having edges that link different branches. If the edges in the DAG are rendered undirected, these edges are parts of cycles, or loops (Fig.S5(A)). The number of such edges in DAGs can be determined by subtracting the number of edges in the spanning tree of the graph from the total number of edges. For a graph with n nodes, the spanning tree has n − 1 edges. For a given DAG-type graph, we call the fraction of its edges that forms loops, its loop-fraction. Loop-fractions of DAG-type lineage graphs indicate the level of trans-differentiation. DAG-type graphs in our data have high loop-fractions (Fig.S5(B)), and loop-fraction increases with graph size (Fig.S5(C)).
6.6 Distribution of regenerative capacities
In order to infer whether a lineage graph is regenerative, we only look at whether its regenerative capacity is greater than 1, or not. In Fig.S6(A), we show the spread of regenerative capacities for different topologies. For most topologies, median regenerative capacity is greater than 1. The actual value of regenerative capacity is less meaningful, except in the case of tree-type graphs, where most trees have a regenerative capacity of 0. This implies that most trees contain no pluripotent cells. We also find that while median regenerative capacity decreases with N, the range of regenerative capacities increases with N (Fig.S6(B)).
6.7 Organisms in the model have short regeneration trajectories
Regeneration trajectories from pluripotent cells to the homeostatic organism tend to be short (Fig.S7). On average, trajectory lengths do not depend on whether the pluripotent cell is independent or not. At first, the observation that trajectory lengths decrease with graph size (Fig.S7(D)), might seem incongruous. But it can be understood intuitively by seeing that number of nodes in lineage graphs is proportional to the number of distinct daughter cells produced in every step of development. Since the same rules are followed for development and regeneration in the model, we can expect the the number of new cell-types added at each step of the regeneration trajectory is larger for organisms with more nodes in their lineage graphs.
6.8 Intrinsically independent cell-types are enriched in lineage graphs
We find in the model that the cell-fate of most pluripotent cells is independent of cellular context (Fig.S8). We wondered whether the large number of independent cell-types in lineage graphs in our data could be attributed to an insensitivity of these cell-types to signals from other cell-types. Alternatively, these cell-types could be independent despite being responsive to signals from other cell-types. We find that the former case tends to be true. We call a cell-type intrinsically independent if the full set of signals that can potentially be received by each of its daughter cells is already satisfied by signaling among these daughter cells themselves. In other words, no further external signals can influence the fates of the daughter cells of intrinsically independent cell-types. We calculated the fraction of intrinsically independent cell types across all 2N possible cell-types across all systems in our data. We find that cell-types that are part of lineage graphs are much more likely to be intrinsically independent irrespective of parameter region (Fig.S9). Thus cell-types in lineage graphs are predisposed to be independent.But, not all independent cell-types in lineage graphs are intrinsically independent (overall, about 20% the independent cells across all lineage graphs are not intrinsically independent), and not all independent cell-types are pluripotent (Fig.4(F)).
6.9 Drosophila segement polarity network expressed in terms of the generative model
In Drosophila embryos, segment polarity genes maintain borders of parasegments, which are 4 cells wide. Within each parasegment, the polarity genes are expressed in characteristic stripes. In Albert and Othmer (2003), the authors demonstrated that the gene regulatory network responsible for the pattern of gene expression in this system can be modeled as a Boolean logical network. In the following, we examine the Drosophila segment polarity network in terms of our generative model.
The network consists of 15 nodes: en, wg, hh, ptc and ci represent mRNAs, and SLP, EN, WG, HH, PTC, SMO, PH, CI, CIA and CIR represent proteins. Of these, WG, hh and HH act as signals. Signaling molecules HH and WG do not participate in regulation within the cells that produce them, rather they act only in cells that receive them as signals. In order to incorporate this feature, we represent each cell in the parasegment as two model cells; production of all non-signal molecules takes place in one of the cells, and molecules responsible for regulation of signal molecule production are exported to the second cell, from which signal molecules are secreted (Fig.S10(A,B)). In this sense, the second cell acts as a special compartment which insulates the gene network in the first cell from regulation by signal molecules produced within the same cell.
In this system, signals are exchanged only between neighbouring cells. Accordingly, in our model, cell positions can be expressed as additional ‘genes’, whose states do not change. For example, to express the positions of the 4 × 2 cells in this system, we use 3 additional ‘genes’ (Fig.S10(C)). In this system, signal exchange only depends on these ‘positional genes’, and does not depend on the states of the other genes.
Thus, our model is capable of expressing spatial arrangement of cells, and complex signaling mechanisms, although, it comes at the cost of an increase in system size. In Fig S11, we show the signaling vector SG, and portions of the cellular adjacency matrix A, and gene regulation matrix GR relevant to the steady state of the wild-type segment polarity network. The authors assume symmetric cell-division in Albert and Othmer (2003), and we do the same; therefore we do not show the cell-division matrix CD here.
Acknowledgements
We thank Luca Peliti, Albert Libchaber and Mukund Thattai for useful discussions, and John McBride for assistance with writing Python code for analysis. This work was supported by the taxpayers of South Korea through the Institute for Basic Science, Project Code IBS-R020-D1.
Footnotes
Extensions of the model, more examples, and corrections.