Abstract
By linking genetic sequences to phenotypic traits, genotype-phenotype maps represent a key layer in biological organisation. Their structure modulates the effects of genetic mutations, shaping evolutionary outcomes. Recent work based on algorithmic information theory introduced an upper bound on the likelihood of a random genetic mutation causing a transition between two phenotypes, using only the conditional complexity between them. Here we evaluate how well this bound works for a range of biologically relevant genotype-phenotype maps, including a differential equation model for circadian rhythm, a matrix-multiplication model of gene regulatory networks, a developmental model of tooth morphologies for ringed seals, a polyomino-tile shape model of biological self-assembly, and the HP lattice protein model. By assessing three levels of predictive performance, we find that the bound provides meaningful estimates of transition probabilities across these complex systems. These results suggest that transition probabilities can be predicted to some degree directly from the phenotypes themselves, without needing detailed knowledge of the underlying genotype-phenotype map. This offers a powerful approach for understanding evolutionary dynamics in complex biological systems.
I. INTRODUCTION
Numerous mathematical models and frameworks, such as those in quantitative genetics, have been developed to study and predict organismal evolution. These efforts have primarily focused on how natural selection shapes organisms to achieve adaptation and complexity [1, 2]. Less emphasis has traditionally been placed on the effect of the arrival of phenotypic variation [3], although the growth of the field of evolution and development (evo-devo) is changing this historic imbalance [4]. To uncover the quantitative laws underlying biological diversity, it is crucial to go beyond studying natural selection alone. A comprehensive understanding must also encompass the mechanisms generating and structuring phenotypic variation [5, 6].
Among the many fruits of evo-devo’s return to mechanistic approaches has been the realisation that upon genetic mutations, certain phenotypes are more likely to appear than others, so that variation is not isotropic. This includes a bias towards certain phenotypic outcomes [7–9]. At a more basic level, various types of mutation bias [10–13] also imply that certain mutations are more likely to occur than others, and that these biases can be observed in evolutionary outcomes.
In studies of genotype-phenotype (GP) maps, i.e., the association of gene sequences to biological traits, it has been repeatedly observed that the distribution of phenotypes in genotype space can be extremely anisotropic [14–20]. This means that uniformly random sampling of genotypes leads to highly non-uniform sampling of phenotype space, with only a small set of possible phenotypes appearing with much higher frequency, a phenomenon called phenotype bias. Such bias in the introduction of variation has been shown to impose directionality on evolutionary trajectories [21–23], not merely to constrain or limit the possible outcomes. This substantial and positive directional role for bias has been observed more generally in development [9, 24], at the level of genetic mutations [11, 12, 25], and for a range of genotype-phenotype maps [20, 26–28]. When the bias is strong enough, it can lead to the fixation of phenotypes that are not the fittest ones accessible [23, 29–32]. Surprisingly, it has also been reported that the abundances of biomolecules in nature such as functional RNA can be predicted using the bias as a null-model [27, 28, 33, 34]. These observations are noteworthy because a basic tenet of biochemistry, that structure determines function, would suggest that biomolecular shapes should be finely ‘tuned’ by selection, and that the intrinsic biases in the introduction of variation would have a less pronounced effect.
The preceding discussion highlights the importance of studying bias in the arrival of variation, whether this is cashed out as developmental bias, mutation bias, or phenotype bias. In this work we will build on work which connects genotype-phenotype map bias to information complexity theory. In ref. [35] it was shown that in a general input-output map setting, under certain circumstances, randomly chosen inputs will lead to a bias for simpler, more regular or symmetric output patterns, a phenomenon called simplicity bias. In particular, an upper bound on the probability of outputs was presented, which was based on the complexity of the output shapes. This work was also applied to biological genotype-phenotype maps, which can be viewed as input-output maps, and in particular protein quaternary structures, RNA secondary structures, and gene-regulatory network concentration profiles were shown to exhibit simplicity bias [33]. These works showed how informational complexity can be a source of bias in biology. See also refs. [36] and [37] for related studies of biological complexity. These observations motivate further investigation of information and complexity arguments for explaining other aspects of biological forms and patterns.
As an extension to earlier simplicity bias work, Dingle et al. [38] derived a conditional form of the simplicity bias upper bound. This conditional version is intended to give an upper bound on the probability P (x → y) that some phenotype shape x transitions to some phenotype shape y, upon a genetic point mutation to an underlying genotype. The bound was based on the conditional complexity of shapes y given x, which measures how much extra information is required to make shape y, given shape x. Because the value of transition probabilities P (x → y) depends on the details of the genotype-space architecture, i.e., how exactly genotypes are assigned to phenotypes, it is not at all obvious that any predictions can be made regarding P (x → y) just from examining the phenotype patterns x and y while completely ignoring the genotypes, or whether any common principles exist applicable to different maps. Hence, the discovery of an upper bound with nontrivial predictive success is noteworthy. One advantage of such a bound is that it can be applied in settings in which the genotype-phenotype mapping is not known, but only the phenotypes are observed. Another advantage is that the general nature of the bound — based on information arguments rather than details of the underlying map — suggests that it may be a general property of genotype-phenotype maps, and therefore advance the ambition of theoretical biology to find generally applicable quantitative laws in biology.
According to the derived bound on P (x → y), point mutations are a priori more likely to change a shape x to a shape y which is (i) very similar to x, or (ii) a very simple/trivial pattern (while in this case not necessarily being similar to x). Because the most similar shape to x is x itself, the authors of ref. [38] pointed out that this conditional bound naturally gives rise to high genetic robustness as a null model for genotype-phenotype maps (under some conditions), while the origin of the high robustness in genotype-phenotype maps had previously been seen as something of a mystery [39, 40].
Dingle et al. used their result to bound the probability of phenotype shape transitions for both RNA and protein secondary structures in computer simulations of the effects of mutations. Despite the fact that these bounds are agnostic to the underlying GP map, and only depend on the conditional complexity of phenotypes, they worked remarkably well. Nevertheless, despite the success of these predictions, a significant observation is that for both of these molecular examples, the genotype-phenotype connection is quite simple and direct. Therefore, it remains an open question whether the bound on P (x → y) can still describe the probability of a phenotypic transition for more complex and realistic genotype-phenotype maps, which describe a much wider range of biological phenomena.
In this work, we seek to address this question by expanding the investigation of the conditional bound and testing its applicability in a range of genotype-phenotype maps, chosen specifically because they are either more complicated (in some sense), more realistic, or have a less direct connection between genotypes and phenotypes. As a second extension, here we also use a slightly broader variety of complexity measures, beyond the compression-based methods used earlier. The maps we study are: an ordinary differential equation circadian rhythm model; a matrix-multiplication model of gene regulatory networks; a complex model of tooth shape formation, with different ways to measure complexity; a polyomino self-assembling model of protein quaternary shapes; and the well-known HP protein model. We assess three levels of transition probability prediction, of increasing stringency. Our main findings are that conditional simplicity bias appears in these more challenging maps (except perhaps the HP proteins), but rarely are all three levels of prediction achieved, and we discuss possible reasons for these failures.
II. BACKGROUND AND RELEVANT THEORY
A. AIT and algorithmic probability
Before looking at the applications of simplicity bias, we will briefly cover some of the related background theory. These details are given just for the sake of completeness, but will not (or only rarely) be directly invoked or used in this work.
Within theoretical computer science, algorithmic information theory [41–43] (AIT) connects computation, computability theory, and information theory. The central quantity of AIT is Kolmogorov complexity, K(x), which measures the complexity of an individual object x as the amount of information required to describe or generate x. K(x) is more technically defined as the length of a shortest program which runs on an optimal prefix universal Turing machine (UTM) [44], generates x, and halts. Intuitively, K(x) measures the size of a compressed version of a data object. Objects containing simple or repeating patterns like 01010101 will have low complexity, while random objects lacking patterns will have high complexity.
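To make the notion of compressibility concrete, here is a minimal sketch of a Lempel-Ziv-style complexity estimate: the number of phrases in a left-to-right parsing of a string. Repetitive strings like 01010101 parse into few phrases, while irregular strings need many. The function name `lz76_phrases` is ours, and this is only one of several LZ76 parsing variants, not necessarily the exact implementation used in the works cited here.

```python
def lz76_phrases(s):
    """Count phrases in a left-to-right Lempel-Ziv (1976-style) parsing.

    Each phrase is the shortest substring, starting at the current
    position, that has not appeared in the prefix before it; simple,
    repetitive strings parse into few phrases.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # grow the candidate phrase while it still occurs in the prefix
        while i + length <= n and s[i:i + length] in s[:i]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

For example, the periodic string "0101010101010101" parses into far fewer phrases than an irregular string of the same length, matching the intuition that repeating patterns have low complexity.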
An increasing number of studies show that AIT and Kolmogorov complexity can be successfully applied in the natural sciences, including thermodynamics [45–47], quantum physics [48], entropy estimation [49, 50], biology [33, 51, 52], other natural sciences [53], as well as engineering and other areas [54–56].
An important result in AIT is Levin’s coding theorem [57], establishing a fundamental connection between K(x) and probability predictions. Mathematically, it states that P (x) ∼ 2−K(x) where P (x) is the probability that an output x is generated by a (prefix optimal) UTM fed with a random binary program. Probability estimates P (x) based on the Kolmogorov complexity of output patterns are called algorithmic probability. Given the broad-reaching and striking nature of this theorem, it is somewhat surprising that it is not more widely studied in the natural sciences. Part of the reason for this inattention is that AIT results are often difficult to apply directly in real-world contexts, due to a number of issues including the fact that K(x) is formally uncomputable and the ubiquitous use of UTMs which may not be common in nature. See Appendix A for more discussion on applied AIT.
B. Simplicity bias
Conscious of these difficulties, approximations to algorithmic probability in real-world input-output maps have been developed, leading to the observation of a phenomenon called simplicity bias [35]. Simplicity bias is captured mathematically as

P(x) ≤ 2^(−aK̃(x)−b)     (1)

where P (x) is the (computable) probability of observing output x on a random choice of inputs, and K̃(x) is the approximate Kolmogorov complexity of the output x: complex outputs from input-output maps have lower probabilities, and high probability outputs are simpler. The constants a > 0 and b can be fitted with little sampling and often even predicted without recourse to sampling [35].
Examples of systems exhibiting simplicity bias are by now wide ranging and include molecular shapes such as protein structures and RNA [33], finite state machine outputs [58], as well as models of financial market time series and ODE systems [35], deep neural networks from machine learning [59–61], and dynamical systems [62, 63], among others. The ways in which simplicity bias differs from Levin’s coding theorem mentioned above include that it does not assume UTMs, uses approximations of complexities, and for many outputs P (x) ≪ 2−K(x). Hence the abundance of low complexity, low probability outputs [58, 64] is a signature of simplicity bias.
A full understanding of exactly which systems will and will not show simplicity bias is still lacking, but the phenomenon is expected to appear in a wide class of input-output maps under fairly general conditions. Some of these conditions were suggested in ref. [35], including (1) that the number of inputs should be much larger than the number of outputs, (2) the number of outputs should be large, and (3) that the map should be ‘simple’ (technically of O(1) complexity) to prevent the map itself from dominating over inputs in defining output patterns. See Appendix B for more discussion on map complexity. Finally (4), because many AIT applications rely on approximations of Kolmogorov complexity via standard lossless compression algorithms [65, 66] (but see [67–69] for a fundamentally different approach), another condition proposed is that the map should not generate pseudo-random outputs like π = 3.1415… which standard compressors cannot handle effectively. The presence of such outputs may yield high probability outputs which appear ‘complex’, hence apparently violating simplicity bias, but which are in fact simple.
Note that it remains to be seen whether simplicity bias can appear in situations where one or more of these conditions is not met. For example, it has been shown that simplicity bias appears in the logistic map from chaos theory [62], a mapping known to be able to create pseudo-random patterns, hence potentially violating condition (4).
C. Conditional simplicity bias
The upper bound in Eq. (1) is relevant when genotypes are randomly sampled from the full space of possible genotypes. However, in biological evolution it is also important to consider phenotype transitions, that is, the probability that upon a genetic mutation the phenotype x becomes phenotype y. While single point mutations are perhaps the most common setting in mutagenesis experiments and evolutionary biology, studying situations where not just one but a few mutations are imposed is also interesting. With phenotype transitions in mind, in ref. [38] (but Cf. [70]) a conditional form of simplicity bias was derived, taking the form

P(x → y) ≤ 2^(−aK̃(y|x)−b)     (2)
To elaborate, this bound says that the probability P (x → y) that phenotype x becomes phenotype y upon one (or a few) point mutations is modulated by the conditional complexity of y given x. It is necessary that the number of mutations be small: if large numbers of mutations are introduced, then the result is effectively the same as a completely random genotype, for which Eq. (1) is more relevant. For y which are either simple or similar to x, the probability P (x → y) can be high; for y which are complex and different to x, P (x → y) must be low.
For conditional complexity, the following relation was used [38]

K̃(y|x) ≈ K̃(xy) − K̃(x)     (3)

where xy denotes the concatenation of the patterns x and y.
In this work we will use the complexity measure based on Lempel-Ziv 1976 [65], denoted CLZ as used earlier in multiple simplicity bias studies [35, 38, 58].
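As an illustration of the relation in Eq. (3), the conditional complexity can be approximated by comparing the compressed description of the concatenation xy with that of x alone. The sketch below uses a simple phrase-counting Lempel-Ziv variant as the complexity estimate; it is an illustrative stand-in for the CLZ measure of refs. [35, 38], not the exact implementation used in those works.

```python
def lz76_phrases(s):
    """Number of phrases in a simple left-to-right Lempel-Ziv parsing:
    each phrase is the shortest substring starting at the current
    position that does not occur in the prefix before it."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i]:
            length += 1
        phrases += 1
        i += length
    return phrases

def cond_complexity(x, y):
    """Conditional complexity as in Eq. (3): extra phrases needed to
    describe the concatenation xy beyond those needed for x alone."""
    return lz76_phrases(x + y) - lz76_phrases(x)
```

As expected, y identical to x costs almost nothing extra, while an unrelated irregular y costs substantially more, which is exactly the ordering that the bound of Eq. (2) exploits.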
The conditional upper bound Eq. (2) can be seen as formalising and quantifying the intuition, perhaps held by biologists familiar with experimenting with genetic mutations, that mutations often do not have large effects [71], but occasionally rare mutations can have large effects. Going beyond a mere intuition however, the bound can be used to make predictions about which phenotypes are more or less likely to appear, and by how much [38]. It is noteworthy that we can make these nontrivial predictions, even while being agnostic about the details of the underlying genotype-phenotype map.
Clearly, if the details of the genotype-phenotype map are known, or if data recording the effects of genetic mutations is available, then these can be used to make more accurate predictions of P (x → y) than by Eq. (2) alone. The primary use of Eq. (2) is to be able to make some nontrivial predictions even when such map details or data are not available.
III. EXPERIMENT SET-UP
Before undertaking the numerical experiments, we will describe the general approach to be employed, and the different levels of prediction which will be assessed.
A. Outline of methods and predictions
The protocol for the numerical experiments will be as follows: for each genotype-phenotype map, a suitable genotype and (discretised) phenotype will be defined. Then, via computational sampling and mutating genotypes, we will computationally estimate the transition probability P (x → y), representing the probability that the shape/pattern y appears as a phenotype when a random single point mutation is introduced to one random genotype underlying x. These transition probability estimates will be plotted against the upper bound in Eq. (2), and thereby test the accuracy of the bound.
B. Estimating the constants in the bound
Eq. (2) has two parameters, a > 0 and b. These parameters are discussed in refs. [35, 38]. Following these references, we assume b = 0, and we scale the complexity estimates so as to make a ≈ 1, using the following expression

K̃(y|x) = log2(Ny(x)) · CLZ(y|x) / max CLZ(y|x)     (4)
which scales the complexity to range between ∼0 and ∼log2(Ny(x)), as is theoretically desirable. Ny(x) is the number of different phenotypes y such that P (x → y) > 0, i.e., the number of accessible phenotypes via mutation from x. Naturally, Ny(x) will vary between different starting phenotypes x.
The main requirement for the scaling in Eq. (4) is to have an estimate of Ny(x). In some genotype-phenotype maps estimating Ny(x) is possible a priori. For example, if the set of accessible phenotypes corresponds to the set of all possible binary strings of length L, then this implies Ny(x) = 2L. In many cases, the set of possible accessible phenotypes is not known, or cannot easily be estimated, and so Ny(x) can be estimated by random sampling. Consequently, if estimating Ny(x) is not possible then the upper bound predictions may be much less accurate. Estimating this value of a represents a main challenge in determining the upper bound, and hence in making a priori transition probability predictions.
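The rescaling can be sketched as follows, assuming the simple max-normalisation form of Eq. (4); the helper name `scale_complexity` is ours.

```python
import math

def scale_complexity(c_raw, c_max, n_accessible):
    """Rescale a raw complexity value (e.g. a CLZ estimate) so that the
    largest observed value maps to log2(Ny(x)); with this scaling the
    slope a in the bound is approximately 1."""
    return math.log2(n_accessible) * c_raw / c_max
```

For example, if all length-19 binary strings were accessible from x, then Ny(x) = 2^19 and the maximum raw complexity would be mapped to 19 bits.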
Having said this, as explored in ref. [28] it is still possible to make nontrivial and useful predictions, such as which of two possible phenotypes y1 and y2 is more or less likely to appear upon random mutation. The reason this is possible is that according to the bound, constants a and b are not required to infer which phenotype is more or less likely, only to make an estimate of the actual probability value. Relatedly, if a can be estimated but b cannot, then it is still possible to estimate the relative probabilities because the value of P (x → y1)/P (x → y2) does not depend on b.
C. Levels of conditional simplicity bias
In terms of predictions deriving from conditional simplicity bias theory discussed above, we are interested in three closely related but nonetheless non-equivalent phenomena. We will call these three levels of conditional simplicity bias:
Level (I): log10 P (x → y) tends to decrease with increasing conditional complexity K̃(y|x), but not necessarily with a linear upper bound.

Level (II): log10 P (x → y) tends to decrease with increasing conditional complexity K̃(y|x), with a linear upper bound, but not necessarily with the predicted slope.

Level (III): log10 P (x → y) tends to decrease with increasing conditional complexity K̃(y|x), with a linear upper bound, including correctly predicting the slope.
These three situations are increasingly precise and stringent. If any of these levels are apparent in the example genotype-phenotype maps, we will consider it a predictive success for the theory. The slope especially is difficult to estimate (as discussed above), hence achieving Level (III) will be a challenge. In the earlier study of ref. [38] looking at RNA and protein secondary structure, Level (I) and Level (II) were achieved in all cases, while Level (III) was achieved for only some test-case phenotypes.
To test the statistical significance of the three levels, we do the following tests: For Level (I) we use Spearman’s rank correlation coefficient ρ using all the data points available in each test case (not just the upper bound). A negative ρ indicates that Level (I) is achieved. For Level (II) we use Pearson’s R2 to measure the degree of linearity of the upper bound of the relationship between conditional complexity and transition probability for each test case. In this context, R2 represents the proportion of the variance in the transition probability that can be explained by the conditional complexity using a linear model. High values of R2 indicate a strong linear relationship. Note that by ‘upper bound’ we mean the highest log10 P (x → y) value for each unique conditional complexity value K̃(y|x).
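The Level (I) and Level (II) checks can be sketched as follows, assuming the data for one starting phenotype x is given as arrays of conditional complexities and log10 transition probabilities. The rank correlation here is a simplified Spearman (ties broken arbitrarily); a production analysis would use `scipy.stats.spearmanr`, which averages tied ranks.

```python
import numpy as np

def level_metrics(cond_K, log10_P):
    """Assess Levels (I) and (II) for one starting phenotype x.

    Level (I): rank correlation between conditional complexity and
    log10 P(x -> y) over all points (negative rho = achieved).
    Level (II): Pearson R^2 of the upper bound, i.e. the maximum
    log10 P value at each unique complexity value.
    """
    cond_K = np.asarray(cond_K, float)
    log10_P = np.asarray(log10_P, float)
    # simplified Spearman: Pearson correlation of the ranks
    rank = lambda v: np.argsort(np.argsort(v))
    rho = np.corrcoef(rank(cond_K), rank(log10_P))[0, 1]
    # upper bound: best log10 P at each unique complexity value
    ks = np.unique(cond_K)
    ub = np.array([log10_P[cond_K == k].max() for k in ks])
    r2 = np.corrcoef(ks, ub)[0, 1] ** 2
    return rho, r2
```

On synthetic data with a linearly decaying upper bound, this returns a clearly negative ρ and an R2 near 1, i.e. both Level (I) and Level (II) achieved.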
For Level (III), we use a bootstrap method that determines whether two slopes are significantly different by generating a distribution of slope differences. Starting with the slope given by the bound model, we repeatedly resample the upper bound data points of each conditional complexity category of the test case, with replacement, to create 1000 bootstrap samples. For each sample, we calculate a slope and subtract the bound model slope from it, producing a distribution of differences. We then check whether 0 lies within the 95% confidence interval (CI) of these differences. If 0 is inside the CI, the slopes are not significantly different, indicating they are similar; if 0 is outside the CI, the slopes are significantly different. Before applying this test, we require that the upper bound itself be well described by a linear model, satisfying both R2 > 0.5 and a statistically significant linear fit (p-value < 0.05).
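A minimal sketch of this bootstrap comparison is given below; the function name, tolerance, and resampling details are ours and may differ from the exact analysis code used for the paper.

```python
import numpy as np

def slopes_similar(ks, ub, bound_slope, n_boot=1000, seed=0, tol=1e-9):
    """Bootstrap test of Level (III): resample the upper-bound points
    with replacement, fit a slope to each resample, and check whether
    0 lies in the 95% CI of (fitted - bound) slope differences
    (within a small numerical tolerance)."""
    ks, ub = np.asarray(ks, float), np.asarray(ub, float)
    rng = np.random.default_rng(seed)
    diffs, n = [], len(ks)
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, n)
        if np.unique(ks[idx]).size < 2:  # need >= 2 distinct x to fit a line
            continue
        diffs.append(np.polyfit(ks[idx], ub[idx], 1)[0] - bound_slope)
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return (lo <= tol) and (hi >= -tol)
```

For upper-bound points lying on a line of slope −1, the test accepts a bound slope of −1 and rejects a bound slope of +1, as intended.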
It should be kept in mind that the metrics used to measure Level (I), (II), and (III) have limitations, and are only roughly indicative of the success of the conditional simplicity bias predictions. By this we mean that a high Spearman’s correlation value or a high R2 value, for example, do not imply that Level (I) or (II) are convincingly achieved. On the other hand, we need some kind of metrics to assess the prediction performance, so these will suffice.
IV. NUMERICAL EXPERIMENTS
We now proceed to a number of numerical experiments, testing the conditional simplicity bias bound Eq. (2) for a variety of different maps. In addition to testing a wider range of genotype-phenotype maps in this current study, where appropriate we also test some other complexity measures.
A. Circadian rhythm
For our first genotype-phenotype map, we will study transition probabilities of a discretised circadian rhythm differential equation model, developed by Vilar et al. [72]. The model was originally introduced to study strategies that biological systems use to minimize noise in circadian clocks, but was also studied in ref. [35] in the context of simplicity bias, and we follow their definitions of inputs (genotypes) and outputs (phenotypes). In this model, ‘genotypes’ are the values of the 15 equation parameters, each defined as an integer between 0 and 7. Hence there are 8^15 ≈ 3.5 × 10^13 possible genotypes. In biology, a genotype is more commonly taken to mean the actual DNA genome sequence underlying a trait. However, it is common practice in genotype-phenotype studies to take as the genotype some less basic level of biological organisation, which is itself dependent on the true underlying genetic nucleotide sequence.
As for the resulting phenotype x associated to each parameter-combination genotype in this model, again following ref. [35] we take the chemical time-concentration profile of the activator, as defined in [72]. This profile has a certain shape which depends on the genotype parameters, and the shape denotes the concentration level of a product at the end of the regulatory cascade over time. To define the phenotype profile x, we discretise the shape in an “up-down” fashion [73, 74]. For this, the slope of the curve is computed at regular time intervals of 25,000 units, and we write “1” if the slope is positive, and “0” if it is negative. The reason for discretising is that calculating both the complexity and probability of continuous curves is problematic, and simplicity bias theory has been developed in the context of discrete output shapes/patterns, which are typically binary strings. In this manner, we have framed the genotype-phenotype map as a map from 15 parameters to length 19 binary strings. The standard Lempel-Ziv [65] compression-based complexity method will be used for this map. The choice of phenotype length to be 19 comes as a trade-off between not being too short, such that only very few phenotypes are possible, and not being too long, such that so many phenotypes are possible that obtaining decent statistics from frequency sampling becomes overly taxing.
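The up-down discretisation can be sketched as follows, assuming the concentration curve has already been sampled at the regular time intervals (the helper name `up_down` is ours):

```python
def up_down(samples):
    """Discretise a sampled concentration curve into an 'up-down'
    binary string: '1' where the curve rises between consecutive
    samples, '0' where it falls or stays flat."""
    return "".join("1" if b > a else "0" for a, b in zip(samples, samples[1:]))
```

Sampling the activator concentration at 20 points yields a length-19 binary phenotype string, matching the phenotype length used here.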
It will be apparent that in this circadian rhythm model, the connection between the input parameter genotypes and output shapes is nontrivial and also not at all direct. Hence this map will act as a good test-case for the applicability of Eq. (2).
To test the predictive capacity of Eq. (2), we need to have good estimates of the true value of P (x → y), for some phenotype x, to compare against. In principle, the exact value of P (x → y) could be found via complete enumeration of all genotypes and computing all possible single point mutations, but typically that approach is computationally very demanding. Further, to compute P (x → y) for all possible pairs of x and y is even more demanding, so in practice we will choose only a few test-case phenotypes x, and then undertake uniform random sampling of the neutral space of x (i.e., the set of genotypes associated to x). The test-cases are obtained from sampling 100,000 random genotypes. These samples were randomly chosen while imposing the condition that the test-cases should have a spread of complexity and/or probability values. The reason for imposing this spread is to test the bound of Eq. (2) in a variety of complexity and/or probability cases. If instead we simply took the first few phenotypes to appear on random genotype sampling, then potentially the only phenotypes to appear would be very high probability, and therefore likely simple, or have some other narrow set of properties not typical of the full space of phenotypes.
For each of the collection of (n = 35) test-case starting phenotypes x, we found the genotypes that give rise to the same phenotype in the original 100,000 sample, that is, we found all the genotypes belonging to the same neutral set that exist within the 100,000 sample. For each of these x, we mutated each sequence with all possible single point mutations. In the case of the circadian rhythm map, there are 15 parameters, each taking 8 possible values, so there are 7 possible mutations for each parameter. Therefore, for each sequence there are 15 × 7 = 105 possible resulting sequences after all possible single point mutations are imposed. Note that these resulting sequences will not necessarily be distinct. For each of these 105 resulting sequences, we can use the genotype-phenotype map to find the resulting phenotype y. By counting the frequency with which each x transitions to each of these y phenotypes, computational estimates of P (x → y) can be found.
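The 1-mutant neighbourhood enumeration can be sketched as follows for genotypes represented as tuples of integers. The helper name `transition_probs` and the toy parity map used in the test are ours, for illustration only, and stand in for the actual circadian rhythm model.

```python
from collections import Counter

def transition_probs(neutral_set, gp_map, n_values=8):
    """Estimate P(x -> y) for the phenotype x underlying neutral_set:
    apply every possible single point mutation to every sampled
    genotype, map each mutant through the genotype-phenotype map,
    and normalise the transition counts into probabilities."""
    counts, total = Counter(), 0
    for g in neutral_set:            # genotypes: tuples of ints in 0..n_values-1
        for pos in range(len(g)):
            for v in range(n_values):
                if v == g[pos]:      # skip the non-mutation
                    continue
                mutant = g[:pos] + (v,) + g[pos + 1:]
                counts[gp_map(mutant)] += 1
                total += 1
    return {y: c / total for y, c in counts.items()}
```

For a length-15 genotype with 8 values per site this enumerates exactly the 15 × 7 = 105 single point mutants per genotype described above.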
Figure 1 shows the results of the above-described computational experiments. In panel (a), the basic probability-complexity plot is displayed for completeness. As can be seen, there is a roughly linear upper bound decay in log P (x) with increasing complexity K̃(x), as in Eq. (1). In (b), (c), and (d) the conditional simplicity bias data are plotted for three test-case starting phenotypes x. In Figure 1, and subsequent figures, the examples illustrate varying degrees of success in predicting simplicity bias levels. The progression starts with cases where all levels are successfully predicted, followed by examples with fewer levels predicted, when such examples exist. Within each category of success, the examples were selected randomly; all the cases we explored are shown in the supplementary information. As is apparent in (b), there is a decay in transition probability P (x → y) with increasing conditional complexity, as seen in their negative correlation, fulfilling the requirements for Level (I). Additionally, in these examples, (b) and (c) show R2 > 0.5, which indicates a linear correlation of the upper bound, whereas (d) fails to fulfill the requirements of Level (II). Finally, comparing the slopes of the bound model (in red) with the slopes in the 95% CI of the upper bound data points, calculated using bootstrapping with replacement, we determined whether the slopes are significantly different, i.e. whether the bound model correctly predicted the slope of the upper bound decay. In the examples shown here, only (b) correctly predicts the slope. The plots show a fitted upper bound, in black, and also an upper bound prediction, in red, assuming Eq. (4) works for the bound slope.
Circadian rhythm map. (a) Simplicity bias observed for uniform random sampling of the circadian rhythm map genotype sequences. The black line shows the linear regression using the maximal values for each complexity value K̃(x). The following examples illustrate cases where the three levels of simplicity bias succeed to varying degrees. On top of each plot we show the phenotype x on the left side and, on the right, an example of a phenotype y into which phenotype x has mutated; on the plots, the * indicates where the example y can be found. The three examples are chosen to represent different levels of success, whenever such an example exists: (b) shows an example where all levels are achieved, (c) where only Levels (I) and (II) are achieved, and (d) where only Level (I) is achieved. This approach to choosing display figures will be used for all the maps we study. (b) Example of the transition probabilities P (x → y) for a starting phenotype x. Each blue point shows the conditional complexity K̃(y|x) of one of the mutants found in the 1-mutant neighborhood exploration. The black line and gray shaded area show the linear regression performed using a bootstrap approach as described in the main text. The red line shows the bound model calculated using Eq. (2). In this example Spearman’s ρ = −0.44; this negative correlation confirms Level (I). Level (II) is also achieved, because R2 = 0.71. Finally, Level (III) is also achieved, because the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, Level (I) is achieved with ρ = −0.34. Level (II) is also achieved, with R2 = 0.85; however, Level (III) is not attained, because the slopes of the bound and fitted models are significantly different. (d) In this example, Level (I) is achieved with ρ = −0.47, even though the negative relation is very weak. However, Level (II) is not achieved, since R2 = 0.04 is below the 0.5 threshold we established. Level (III) is not achieved since the slopes are significantly different.
The results shown in Table I summarize the data by calculating the average and variability for each phenotype, considering all the genotypes that were found. When calculating the averages, the results from each genotype linked to the phenotype are equally weighted. Thus, if multiple genotypes were found for a phenotype, each contributes to the final averages and standard deviations, reflecting the expected outcome if one were to randomly select a genotype for a given phenotype. In addition, in Table II we show similar results, but focusing on the average and variation across phenotypes, without considering how many genotypes map to each of them. For the circadian rhythm map, Table I shows the results for the n = 35 phenotypes studied here, weighted by the number of genotypes found for each phenotype. Level (I) is achieved, with all cases showing a negative correlation. Similarly, Level (II) is achieved in many cases, with R2 > 0.5. Level (III) shows a lower degree of success, with only 14% of the slopes being correctly predicted.
It should also be noted that we expect point mutations on x, like those used in this study, to mostly produce phenotypes y with similar or lower conditional complexity (i.e. P (x → y) > 0). Conversely, mutations rarely produce phenotypes y with higher conditional complexity (i.e. P (x → y) ≈ 0). This outcome is expected because the conditional complexity bound assigns higher probabilities to phenotypes with similar or lower conditional complexity, effectively excluding those with higher complexity. We tested this conjecture (see Appendix B) and observed that indeed, in all studied cases, more frequently found phenotypes consistently show lower conditional complexity.
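Concretely, the conditional bound used throughout this section can be sketched as below. This is a minimal illustration assuming Eq. (2) takes the standard simplicity-bias shape log2 P(x → y) ≤ −a K(y|x) − b, with map-dependent fitted constants a and b; the function name is ours.

```python
def transition_bound(k_cond, a=1.0, b=0.0):
    """Upper bound on the transition probability P(x -> y), assuming the
    standard simplicity-bias form P(x -> y) <= 2 ** (-a * K(y|x) - b).

    k_cond : estimate of the conditional complexity K(y|x) in bits.
    a, b   : map-dependent constants fitted from data.
    """
    return 2.0 ** (-a * k_cond - b)
```

The bound decays exponentially with the conditional complexity, which is why the log-probability plots in the figures show (at most) a linear upper-bound decay.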
B. Gene regulation network vector-matrix map
For our second genotype-phenotype map, we will use the vector-matrix multiplication map, which has been used to model genotype-phenotype maps in the context of gene regulatory networks [15, 35]. The map is defined by the equation x = H(D · g),
where g is a length-15 binary string genotype and D is a square matrix made up of randomly chosen entries taking values in {−1, 0, 1}. These values represent gene regulation, including the promotion and suppression of gene expression levels. Finally, the Heaviside function H is applied component-wise: if component j of the vector D · g is positive, then the jth component of x is set to 1; otherwise it is set to 0. In this manner, we have a map from binary strings of length 15 to binary strings of the same length.
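A minimal sketch of this map, assuming a NumPy representation (the function name `matrix_map` is ours, not from the original model):

```python
import numpy as np

def matrix_map(g, D):
    """Map a binary genotype g to a binary phenotype via x = H(D . g).

    g : length-L array of 0/1 entries (the genotype).
    D : L x L matrix with entries in {-1, 0, 1} (the regulatory matrix).
    Returns a length-L 0/1 array (the phenotype).
    """
    # Heaviside step: component j of the phenotype is 1 iff (D @ g)_j > 0.
    return (D @ g > 0).astype(int)

rng = np.random.default_rng(0)
L = 15
D = rng.integers(-1, 2, size=(L, L))   # random entries in {-1, 0, 1}
g = rng.integers(0, 2, size=L)         # random binary genotype
x = matrix_map(g, D)
```

Sampling genotypes g uniformly while holding D fixed reproduces the setup whose output distribution is examined in Figure 2(a).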
While the connection between genotypes and phenotypes is quite direct and simple here, this map is interesting because ref. [35] showed that it does not exhibit simplicity bias. The reason, as discussed in [35], is that the map itself has high information content: the matrix D contains L2 random values for a genotype of length L. Therefore the information content of the map itself is typically much higher than that of any genotype (L2 ≫ L). One of the conditions proposed for observing simplicity bias is a simple (technically O(1) complexity) map, which does not hold in this case. Even though the map does not exhibit simplicity bias when sampling over the full range of inputs (genotypes), it is interesting to see if there is a kind of conditional simplicity bias in the transitions P (x → y).
In Figure 2(a) we see, as already shown in [35], that there is no simplicity bias with this map. Although there appears to be some positive relation between complexity and probability in this plot, this is because there are many more higher-complexity binary strings, so there is a greater chance that at least some of them have higher probability. Looking at Figure 2(b), (c), and (d), we see that Level (I) is still achieved, as also shown in Table I: in most cases there is conditional simplicity bias, with the fitted upper bound decaying with increasing conditional complexity. Because the median correlation value is only about −0.18, the relation is not very strong. The upper bound decay is often significantly linear, as seen in Table I, therefore achieving Level (II), although there is a wide spread of cases which cannot be considered to decay in a linear way. However, only in 30% of cases is the slope of the upper bound model correctly predicted. It appears that the upper bound slopes, while roughly linear, are not very steep. In ref. [63] it was observed that the slope of the decay in the upper bound can be reduced when random noise is introduced to the outputs. Speculating, it may be that the slopes here are not steep due to the high complexity (hence randomness) of the map itself.
Matrix multiplication map. (a) No simplicity bias observed for the uniform random sampling of the matrix multiplication map. On top of each plot we can see the phenotype x on the left side and an example of a phenotype y, into which phenotype x has mutated. On the plots, the * indicates where the example y can be found. As previously explained, the following examples illustrate varying levels of success corresponding to different degrees of simplicity bias. (b) Example for the transition probabilities P (x → y) of a phenotype with complexity . In this example σ = −0.26; this negative correlation confirms Level (I). Level (II) is also achieved, with an R2 = 0.89. Finally, Level (III) is also achieved, since the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.24. Level (II) is also achieved, with an R2 = 0.64; however, Level (III) is not attained, as the slopes of the bound and fitted models are significantly different. (d) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.22. However, Level (II) is not achieved, since R2 = 0.10; failing at Level (II) also means that Level (III) is not achieved.
How is it possible that we observe conditional simplicity bias, but not simplicity bias when sampling purely random genotypes? The reason is probably the following: if a map assigns genotypes to phenotypes in a purely random manner, then it is a maximally complex map. We can estimate the Kolmogorov complexity of a random map by taking the logarithm of the total number of possible ways to assign ng genotypes to np phenotypes. For this computation, we can use Stirling numbers of the second kind, denoted S(ng, np), multiplied by np!. For large ng with ng ≫ np, which is typical in genotype-phenotype maps, it is simpler to approximate the number of maps as np^ng. Hence the complexity of a random assignment of genotypes to phenotypes is roughly ng log2(np) bits. On the other hand, the complexity of the matrix map can be estimated as O(L2) bits, because there are L2 entries in the matrix, where L = log2(ng) is the length of the binary genotype. Clearly, ng log2(np) ≫ O(L2), and hence the map is far from being completely random. It follows that the mapping is, we could say, of medium complexity: it has higher complexity than any one genotype and so is complex enough to impact the probability-complexity connection, but at the same time it contains much less information than a purely random map (for which we would expect no connection between probability and complexity). Due to the non-random nature of the matrix map, we still expect to see some structure and pattern in how inputs are assigned to outputs. This could explain the observation of conditional simplicity bias, even without the original form of plain simplicity bias.
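The back-of-envelope comparison above can be made concrete for L = 15, treating np as at most 2^L; these numbers are illustrative, not taken from the paper's data:

```python
import math

L = 15                    # genotype length (binary string)
n_g = 2 ** L              # number of genotypes
n_p = 2 ** L              # upper bound on the number of phenotypes

# Complexity of a purely random genotype-to-phenotype assignment:
# roughly log2 of the number of possible maps, n_p ** n_g.
random_map_bits = n_g * math.log2(n_p)

# Complexity of the matrix map itself: L*L entries drawn from {-1, 0, 1},
# i.e. about log2(3) bits per entry -- O(L^2) bits in total.
matrix_map_bits = L * L * math.log2(3)
```

With these numbers the random-map complexity (about 4.9 × 10^5 bits) exceeds the matrix-map complexity (a few hundred bits) by more than three orders of magnitude, which is the sense in which the matrix map is "medium complexity".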
C. Tooth developmental model
Development of complex organs typically encompasses sets of different generative factors that can interact in nonlinear ways [75]. For instance, genes may interact with each other dynamically to orchestrate changes in cells and tissues in a spatially and temporally distinct manner, while bio-mechanical parameters may bias which specific shape changes are possible or facilitated [76, 77]. Thus, numerical models exploring such developmental dynamics often feature heterogeneous input variables that transform simple patterns into complex ones whose values may be continuous and non-finite. This means that the emerging genotype-phenotype maps differ from many of the previously studied ones, namely regarding both non-discreteness of input and output variables as well as the heterogeneity of the mechanics of their interactions. It is therefore an interesting and important task to assess to what extent the mathematical laws established through the study of simpler models apply to this class of models too.
A representative numerical model of tooth development is presented in ref. [78], which allows testing of the contribution of genetic, cellular, and mechanical factors to the formation of realistic tooth shapes as folded 3D meshes. This tool has been used and modified throughout a number of studies in different organisms, namely rodents, seals, prehistoric mammals and, recently, sharks [78–82], underscoring its versatility and scientific pertinence. Besides testing mechanistic developmental hypotheses, this model allows for the study of trait evolution by parameter mutations [83]. It has also been a useful tool to explore genotype-phenotype map properties, revealing a bias against complex shapes [37] and morphospace degeneracy [81]. Here we build on this work by systematically assessing whether we observe phenotype transition probabilities comparable to those of the previous models. We take advantage of the versatility of this tooth model by applying it to its original species of study, the seal Phoca hispida ladogensis [78], and quantifying phenotypic complexity in two complementary ways, thus more rigorously testing the generality of our hypotheses.
The first way to quantify tooth complexity is by simply counting the number of cusps in a tooth [79, 84]. Here, we identified cusps as local elevations of the in silico mesh representing the epithelial-mesenchymal interface (i.e., mesh nodes whose z-coordinates are higher than those of their neighbours).
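A minimal sketch of this cusp-counting rule on a simplified height-map grid; the real model operates on a folded 3D mesh, and `count_cusps` is our illustrative name:

```python
import numpy as np

def count_cusps(z):
    """Count cusps on a height map: grid nodes whose z value is strictly
    higher than all of their (up to 8) immediate neighbours.

    z : 2D array of surface heights, a simplified stand-in for the
        epithelial-mesenchymal interface mesh.
    """
    rows, cols = z.shape
    cusps = 0
    for i in range(rows):
        for j in range(cols):
            # Gather the heights of all neighbouring nodes (clipped at edges).
            neighbours = [z[a, b]
                          for a in range(max(i - 1, 0), min(i + 2, rows))
                          for b in range(max(j - 1, 0), min(j + 2, cols))
                          if (a, b) != (i, j)]
            if all(z[i, j] > v for v in neighbours):
                cusps += 1
    return cusps
```

A flat surface has zero cusps under this rule, matching the intuition that mono-cuspid and multi-cuspid teeth differ only in the number of strict local maxima.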
The second way to measure complexity is by using Orientation Patch Count Rotated (OPCR), a widely used, high-resolution metric for quantifying the surface complexity of teeth [82, 85–88]. A patch is defined as a group of contiguous points on the tooth surface facing the same “compass” direction, such that they have similarly angled normal vectors when projected onto the XY plane [85, 87]. Orientation Patch Count (OPC) counts these distinct patches and approximates the number of ‘tools’ on the tooth crown used for breaking down food [85]. OPC has been shown to correlate with diet [85, 88, 89], with dental complexity increasing when moving from hypercarnivorous through omnivorous to herbivorous species [85, 86]. This increase in surface complexity may reflect the increased demands of mechanical processing in herbivore diets, compared to those of carnivores.
Patch count provides a more sensitive measure of complexity compared to landmark-based methods due to its finer resolution of surface data [87]. OPC has been employed to measure the surface complexity of teeth in primates [87, 88, 90, 91], multituberculates [92], carnivorans [85, 86], rodents [79, 82, 85, 86], bats [89], and generalized models of tooth development and adaptation [37, 93]. OPCR further improves upon OPC by reducing sensitivity to tooth orientation [86, 87]. Using MorphoTester, a GIS software, we divide the tooth surfaces into patches of equivalent orientation, with a minimum patch size of three grid points [87]. We rotate individual molar specimens eight times across a total arc of 45° (5.625° per rotation), calculating OPC at each rotation, and averaging these eight values to obtain OPCR [87]. OPCR can then be visualized by coloring surface patches one of eight colors corresponding to patch orientation [87].
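The patch-counting step can be sketched roughly as follows for a toy grid of projected normal vectors. This is a simplified stand-in for MorphoTester's computation, not its actual implementation: 8 compass bins, a minimum patch size of three, and no rotation averaging (so it corresponds to OPC rather than OPCR):

```python
import math
from collections import deque

def opc(nx, ny, min_patch=3):
    """Simplified Orientation Patch Count on a grid.

    nx, ny : 2D lists of the x and y components of each grid cell's
             surface normal, projected onto the XY plane.
    A patch is a 4-connected group of cells whose normals fall in the
    same of 8 'compass' bins; patches smaller than min_patch are ignored.
    """
    rows, cols = len(nx), len(nx[0])
    # Bin each cell's orientation into one of 8 compass sectors of 45 degrees.
    bins = [[int(((math.atan2(ny[i][j], nx[i][j]) + 2 * math.pi)
                  % (2 * math.pi)) // (math.pi / 4))
             for j in range(cols)] for i in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j]:
                continue
            # Flood-fill the connected region sharing this compass bin.
            queue, size = deque([(i, j)]), 0
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                size += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    r, c = a + da, b + db
                    if (0 <= r < rows and 0 <= c < cols and not seen[r][c]
                            and bins[r][c] == bins[a][b]):
                        seen[r][c] = True
                        queue.append((r, c))
            if size >= min_patch:
                patches += 1
    return patches
```

OPCR would then be obtained by recomputing this count at eight small rotations of the specimen and averaging the results.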
To study this model we adjusted the strategy followed for the other models presented here, primarily to accommodate the continuous parameter inputs, which are more challenging to discretise than those of the circadian model. To circumvent this issue, we first define our map inputs as the unique combinations of 26 parameters responsible for cellular and genetic interactions in seal tooth development [78]. We then establish a biologically realistic range for each variable parameter by individually varying parameters until the tooth produced either an unrealistically flat structure or unrealistic globular clusters of cusps. Using these ranges, we apply Latin hypercube sampling to divide each range into 19,000 equal-probability strata, selecting one sample from each stratum. For the conditional simplicity bias experiments, we begin with a range of discretized genotypes (parameter combinations) that replicate real seal teeth found in nature [78]. Mutations are then introduced by modifying the model parameters. Specifically, for each mutant, we changed a randomly chosen parameter p to a value lying between the value of the respective parameter in the “parental” tooth p0 and either the minimal or the maximal allowed parameter value pm (as in [81]). Since this model uses continuous values, we could not explore every single possible mutation; instead, we explored 19,000 mutants per “parental” tooth. We calculated each mutant’s conditional complexity as the smaller of the mutant’s OPCR and the absolute difference in OPCR between the parent and mutant tooth.
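The mutation scheme described above can be sketched as follows, assuming a plain list-of-floats parameter vector (the function name is ours):

```python
import random

def mutate(parent, minima, maxima, rng=None):
    """One 'point mutation' on a continuous parameter vector: pick a
    random parameter p0 and replace it with a value drawn uniformly
    between p0 and either the minimal or the maximal allowed value for
    that parameter (chosen at random).
    """
    rng = rng or random.Random(0)
    mutant = list(parent)
    i = rng.randrange(len(parent))
    # Move toward one randomly chosen boundary of the allowed range.
    bound = minima[i] if rng.random() < 0.5 else maxima[i]
    mutant[i] = rng.uniform(parent[i], bound)
    return mutant
```

Because the new value is drawn between the parental value and a range boundary, mutations of arbitrarily small effect are possible, which is relevant to the robustness discussion below.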
The results for the tooth model using cusp number as the complexity measure can be seen in Figure 3. In Figure 3(a) we show that despite the added complexity of the tooth model genotype-phenotype map, the probability of finding simple teeth is consistently larger than that of finding more complex teeth, which is in line with the other models and was to be expected. In fact, the decrease of the logarithm of frequency with increasing phenotypic complexity follows a near-linear curve, with the notable exception of mono-cuspid teeth, the lowest possible complexity. The high frequency of very simple shapes (1 cusp) might reflect that this minimum complexity exists for free, i.e., without the activation of specific mechanisms, and can be equally accessed from any part of the morphospace. As shown in Figure 3(b-d), Level (I) is achieved in these examples. In 3(b-c) Levels (II) and (III) are also achieved, showing that the decay is linear and the slope of decay can be predicted. However, in 3(d), although Level (II) is achieved, Level (III) is not. In Table I we can see that in all cases Levels (I) and (II) are achieved, but Level (III) is successful in 57% of the cases.
Tooth model with complexity as cusp number. (a) Simplicity bias found for the uniform random sampling of the tooth model, where we measure complexity as the number of cusps. The black line shows the linear regression. On top of each of the following plots we can see the phenotype x on the left side and an example of a phenotype y, into which phenotype x has mutated. On the plots, the * indicates where the example y can be found. As previously explained, the following examples illustrate varying levels of success corresponding to different degrees of simplicity bias. (b) Example for the transition probabilities of a phenotype with complexity . In this example σ = −0.20; this negative correlation confirms Level (I). Level (II) is also achieved, with an R2 = 0.95. Finally, Level (III) is also achieved, since the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −1.0. Level (II) is also achieved, with an R2 = 0.97, and Level (III) is also achieved. (d) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −1.00. Level (II) is achieved, with an R2 = 0.79; however, Level (III) is not achieved, since the slopes of the fitted model and the bound model are significantly different.
The results for teeth using OPCR complexities are plotted in Figure 4. As can be seen in Figure 4(a), the tooth model with this new complexity measure likewise exhibits simplicity bias, as teeth with lower patch counts occur much more frequently in our exploration. In Figure 4(b-d) we generate three in silico teeth with complexity values of 86.25, 46.13, and 103.75, respectively, and introduce point mutations on single parameters within biologically realistic ranges. Mutants in Figure 4(b-d) exhibit Level (I) conditional simplicity bias: there is decay, and as Level (II) is also achieved in all cases, this decay is mostly linear. However, in 4(c-d) Level (III) is not achieved. In Table I we show that Level (III) is achieved in only 34% of all the studied cases, i.e. only then are the slopes correctly predicted by the upper bound model. Notice that here, because the phenotype categories are not as distinctly separated, we divide the conditional complexity into 10 groups and select the point in each group with the highest transition probability to perform the bootstrap (the black points in 4(b-d)). Otherwise, the procedure to determine whether Level (III) was achieved is the same as explained in Section III-C.
Tooth model with OPCR as complexity. (a) Simplicity bias found for uniform random sampling of teeth using OPCR as the measure of complexity. The black line shows the linear regression using the maximal values for each category of complexity. On top of each of the following plots we can see the phenotype x on the left side and an example of a phenotype y, into which phenotype x has mutated. On the plots, the ∗ indicates where the example y can be found. As previously explained, the following examples illustrate varying levels of success corresponding to different degrees of simplicity bias. (b) Example for the transition probabilities of a phenotype with complexity . In this example σ = −0.45; this negative correlation confirms Level (I). Level (II) is also achieved, with an R2 = 0.74. Finally, Level (III) is also achieved, since the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.80. Level (II) is also achieved, with an R2 = 0.78; however, Level (III) is not attained, as the slopes of the bound and fitted models are significantly different. (d) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.30. Level (II) is achieved, with an R2 = 0.64. Finally, Level (III) is not achieved.
Taking advantage of a complex model producing realistic shapes, we conclude that the complexity of the mechanics of a generative system does not necessarily weaken simplicity bias. This is corroborated by the fact that we see similar results irrespective of the complexity measure, and even when the model is applied to another species (see Appendix D for the results using the tooth model adapted for sharks). Overall, the progressive rarity of complex tooth shapes in the tooth model does not come as a surprise: many developmental parameters need to be fine-tuned in order to achieve some level of phenotypic complexity and stability [77], and there are always multiple ways in which parameter changes will result in failure to reproduce a phenotype, leading to an unavoidable bias towards simpler shapes [37]. Notably, this theoretical argument has been corroborated experimentally [94]. As noted before, even though the map here is highly complex, it is not completely random. The tooth model follows some biomechanical rules and was conceived to reproduce the natural variation found in seals [78] and, in a more recent version, sharks [81]. This involved a choice of tunable mechanisms informed by knowledge about tooth development. Therefore, the interactions between the different components during development follow some non-random patterns, which are the result of an evolutionary process in which some soft matter dynamics and biomechanical interactions are more likely than others [76]. This is quite different from the matrix multiplication map, where all interactions are completely random, leading to a medium complexity map (discussed above), so that the output may depend more on the map itself than on the input information.
Despite a generally monotonic decrease of the conditional probability of occurrence with increasing complexity, we have seen that this decrease becomes smaller towards the right side of the diagrams. This may be interpreted in terms of relatively high robustness, meaning that especially the more complex teeth tend to reproduce themselves upon mutations. This may reflect disparities between the effects different parameters have on phenotypic changes, especially since the step size of mutations was set to be gradual, permitting negligible changes in values. Alternatively, the iso-morphological walk can be considered a proxy of an evolutionary process that is more likely to find robust phenotypes within the morphospace. In addition, our observation may reflect another potentially general property of genotype-phenotype maps: different complex phenotypes tend to be clustered in islands within morphospaces, facilitating transitions between them [37]. This would suggest that mutants of very complex phenotypes might either be extremely simple due to failed development, or be, more often than expected, only slightly less complex, thereby affecting the shape of the conditional probability function.
One possible caveat lies in the arbitrary end point of development, which excludes several in silico teeth whose complexity unfolds too slowly. Although this issue would occur for any choice of end point, it may partially explain the conspicuous differences between the frequency of the simplest (1-cuspid) category and all other complexity categories: since development never decreases complexity, only the former is never affected by end point choices.
Interestingly, our results do not seem to be strongly affected by how complexity is discretized. Thus, we suggest that the difficulty of choosing the most suitable complexity measure and data discretization method may not be a key obstacle in quantifying complexity biases in biologically relevant complex traits.
D. Polyominos
Polyominos are 2D square lattice tile shapes, formed of self-assembled individual square blocks [95, 96]. Each individual square block has labelled edges, with certain labels allowed to stick to certain other labels, and certain labels prohibited from sticking to certain other labels. In this genotype-phenotype map, the genotype specifies the rule-set determining which edge type can stick to which other edge type, and the overall multi-tile shape of the self-assembled polyomino defines the phenotype. For example, a single square block with no bonds is a (fairly trivial) phenotype; and a two-by-two square is an example of another phenotype. With a large number of tile-types and many tiles, a whole array of different phenotype shapes can be formed.
Despite the abstract nature of this genotype-phenotype map model, polyominos have been used to model biological self-assembly, for example in terms of protein quaternary structure [96], including to successfully explain certain aspects of protein evolution [26, 33]. The specific polyomino model we use here has a genotype which is a binary string specifying which tile faces can stick to which other tile faces. The data set comes from ref. [26]; it contains 22 different phenotype shapes.
The polyomino map has been examined in terms of simplicity bias in the sense of Eq. (1), and it was shown that clear simplicity bias is observed [33]. However, the complexity measure used in that study was specifically designed with polyominos in mind, rather than being a generic map-agnostic complexity measure for 2D tile shapes. Naturally, a complexity measure designed with the specific map in mind will likely produce more accurate probability estimates or bounds than a generic complexity measure. However, if the goal is to create an information complexity theory that applies at least somewhat to a whole range of maps, without having to know the details of the mapping process, then using a map-specific complexity measure is not desirable. Hence, here we introduce a fairly generic complexity measure, which we call path complexity, and study simplicity bias and also conditional simplicity bias. The path complexity method is based on ‘walking’ around the perimeter of the polyomino and recording the direction of each step: F (forward), L (left), R (right). This yields a string of characters, which can be compressed. For example, a two-by-two polyomino made up of four tiles would have a short and repetitive path describing its perimeter, and hence a low complexity value. By contrast, a larger and more irregular polyomino, yielding a longer and more irregular path, would have a higher complexity value. Very few of the relatively small polyominos in the current data set have holes in them (i.e., missing tiles on the inside of the shape), so ‘walking’ around the perimeter of the shape is sufficient to describe the entire shape. For the few shapes which do have holes, we separately record the complexity of the hole and add it to the path complexity of the perimeter.
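To illustrate that a repetitive perimeter path compresses well, here is a toy phrase-count complexity (an LZ78-style proxy for the CLZ measure used in the paper) applied to two hypothetical F/L/R path strings. The strings are made up for illustration and are not actual polyomino perimeters:

```python
def lz78_phrases(s):
    """Number of distinct phrases in an LZ78-style parsing of s.

    Repetitive strings parse into few phrases (low complexity),
    irregular strings into many (high complexity).
    """
    phrases, current = set(), ""
    for ch in s:
        current += ch
        if current not in phrases:
            # New phrase found: record it and start the next one.
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

# Two hypothetical same-length paths: a repetitive, square-like walk
# and an irregular one.
simple_path = "FLFLFLFLFLFL"
irregular_path = "FRLLFFRLRFLR"
```

Under this proxy the repetitive path yields fewer phrases than the irregular one of the same length, capturing the intended ordering of path complexities.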
In this manner, path complexity corresponds to a descriptional complexity approach to assigning complexities, which is the essence of how Kolmogorov complexity quantifies complexity (see also ref. [97] for a Kolmogorov complexity-based approach to estimating the complexity of polyominos). In Figure 5(a) we show that this new path complexity measure works quite well, enabling a priori prediction of the probabilities based only on the complexities of the shapes.
Self-assembling polyomino tiles. (a) Simplicity bias found for the uniform random sampling of polyominos. The black line shows the linear regression using the maximal values for each category of complexity. On top of each plot we can see the phenotype x on the left side and an example of a phenotype y, into which phenotype x has mutated. On the plots, the * indicates where the example y can be found. As previously explained, the following examples illustrate varying levels of success corresponding to different degrees of simplicity bias. (b) Example for the transition probabilities of a phenotype with complexity . In this example σ = −0.26; this negative correlation confirms Level (I). Level (II) is also achieved, with an R2 = 0.62. Finally, Level (III) is also achieved, since the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.58. Level (II) is also achieved, with an R2 = 0.75; however, Level (III) is not attained, as the slopes of the bound and fitted models are significantly different. (d) In this example, using a phenotype of complexity
, Level (I) is achieved with σ = −0.29. However, Level (II) is not achieved, with R2 = 0.42; failing at Level (II) also means that Level (III) is not achieved.
Next we consider predictions of the transition probabilities, P (x → y). This genotype-phenotype map provides a somewhat challenging case study because estimating the conditional complexity is not straightforward. The conditional complexity should measure how much extra information is required to build polyomino shape y given polyomino x. In some cases this is straightforward: for example, if x is a 2-by-1 polyomino and y is an L-shaped polyomino made up of three blocks, then it is easy to see that y can be made by adding just one tile to x, and hence a method to estimate the conditional complexity is clear. Another easy case is when the starting phenotype x is just a single block, because in this case K(y|x) ≈ K(y), due to the fact that the single block does not provide any useful information to aid in describing y, except perhaps if y is also the single block. Also, if x and y are completely unrelated shapes, then we can make the estimate K(y|x) ≈ K(y), which is theoretically well-founded because it is well known that most patterns x and y share no common information [55], implying that K(y|x) ≈ K(y). In other cases, where x and y do share some information and x can provide useful information for constructing y, it is not completely trivial how to estimate K(y|x), or how to invoke Eq. (3).
Nonetheless, we propose one method to estimate the conditional complexity here: we read the 2D tile grid row by row; if we find an occupied square, we record it as 1, and if it is an empty square, as 0. In this manner, the polyomino on the lattice grid can be described via zeros and ones. For example, a hollow four-by-four square would be read as: 1111 1001 1001 1111. Eq. (3) requires the complexity of the concatenated description of x and y, and this is obtained by concatenating these binary strings into a single binary string to which the Lempel-Ziv complexity measure CLZ can be applied.
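A sketch of this procedure, using an LZ78-style phrase count as a stand-in for CLZ and assuming the conditional estimate takes the common form CLZ(xy) − CLZ(x); we do not reproduce the exact form of Eq. (3) here, so this is illustrative only:

```python
def grid_to_bits(grid):
    """Read a 2D tile grid row by row: occupied cell -> '1', empty -> '0'."""
    return "".join("1" if cell else "0" for row in grid for cell in row)

def lz78_phrases(s):
    """LZ78-style phrase count, a simple stand-in for the CLZ measure."""
    phrases, current = set(), ""
    for ch in s:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

def conditional_complexity(x_grid, y_grid):
    """Estimate of K(y|x): the extra phrases needed to describe the
    concatenation xy beyond those needed for x alone."""
    x_bits = grid_to_bits(x_grid)
    xy_bits = x_bits + grid_to_bits(y_grid)
    return max(lz78_phrases(xy_bits) - lz78_phrases(x_bits), 0)

# The hollow 4x4 square from the text, which reads as 1111 1001 1001 1111.
hollow = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
```

As expected, describing a shape given an identical copy of itself costs fewer phrases than describing it from scratch.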
Turning to Figure 5(b), (c), and (d), we see that a roughly linear upper bound decay appears, but the slope is not accurately predicted. Hence we have achieved Level (II) for this map, but not Level (III). See also Table I.
E. HP proteins
A popular model in computational studies of genotype-phenotype maps is the HP protein model [98, 99]. In this model, the process of protein folding is simplified so that all amino acids are either hydrophobic (H) or polar (P), instead of having the full suite of 20 naturally occurring amino acids. Sequences ‘fold’ to their minimum energy structures, where energy values come from counting nearby energetically favourable interactions. Further, the HP protein structures are confined to a lattice, so that there are only finitely many possible structures for a chain of a given length. In this genotype-phenotype map, the connection between the inputs and outputs is direct, much like in the frequently studied RNA and protein secondary structure models. Here, we explore simplicity bias for the HP protein model with sequence length n = 25 (data from [99]).
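The energy-counting rule can be sketched as follows for the standard 2D HP model, in which each pair of H monomers that are lattice neighbours but not adjacent in the chain contributes −1 to the energy (the function name is ours):

```python
def hp_energy(sequence, fold):
    """Energy of an HP chain on a 2D square lattice.

    sequence : string over {'H', 'P'}.
    fold     : list of (x, y) lattice coordinates, one per monomer,
               forming a self-avoiding walk.
    Each pair of H monomers that are lattice neighbours but not
    consecutive along the chain contributes -1.
    """
    pos = {p: i for i, p in enumerate(fold)}
    energy = 0
    for i, p in enumerate(fold):
        if sequence[i] != "H":
            continue
        # Check only the +x and +y neighbours so each pair is counted once.
        for q in ((p[0] + 1, p[1]), (p[0], p[1] + 1)):
            j = pos.get(q)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

A sequence's phenotype is then the fold (among all self-avoiding walks of that length) minimising this energy, which is what makes exhaustive enumeration feasible only for short chains such as n = 25.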
Because the rule system defining the map is of fixed O(1) complexity, we might expect to see simplicity bias. Additionally, earlier studies of HP protein maps have found that there is bias in the map, and that the structure with the highest probability displays overall symmetry [98]. However, in Figure 6(a) we see that there is no simplicity bias, and it is not completely clear why this is, but we can propose some possible reasons: Firstly, it may be that the complexity measure we use is not able to detect the relevant patterns in HP proteins. Secondly, it could be that the structures we use are too small to show clear simplicity bias. Thirdly, it is apparent that there is relatively little bias in this map, because the probabilities vary only over one or two orders of magnitude, despite the broad range in complexity values. Clearly, without strong bias (i.e., strongly non-uniform probabilities), there cannot be pronounced simplicity bias (see [62] for a discussion and example of this). See below for more on the strength of bias. Relatedly, there do not appear to be any HP protein structures that have very high probability.
The HP protein map. (a) No simplicity bias found for the uniform random sampling in the HP protein map. The black line shows the linear regression using the highest probability values for each unique complexity value. On top of each plot we can see the phenotype x on the left side and an example of a phenotype y, into which phenotype x has mutated. On the plots, the * indicates where the example y can be found. As previously explained, the following examples illustrate varying levels of success corresponding to different degrees of simplicity bias. (b) Example for the transition probabilities of a phenotype with complexity . In this example σ = −0.45; this negative correlation is suggestive of achieving Level (I), but a visual inspection of the data highlights that in fact there is little evidence of a trend. The metrics suggest Level (II) and Level (III) are also achieved, with an R2 = 0.73, but again visual inspection of the data substantially reduces our confidence in these conclusions. (c) In this example, using a phenotype of complexity
, Level (I) is apparently achieved with σ = −0.10, but the same comments as for (b) apply. Level (II) is also apparently achieved, with an R2 = 0.53, and even Level (III) is achieved according to the metrics, but visual inspection of the data implies that the evidence of conditional simplicity bias is weak. (d) In this example, using a phenotype of complexity
, we have σ = −0.84 and an R2 = 0.93, while Level (III) is not achieved. Again, the paucity of data and the lack of intermediary complexity and probability points make it hard to draw conclusions.
Turning to Figure 6(b), (c), and (d), the data do not show clear evidence of conditional simplicity bias. In each panel, there are two clusters, one at high probability and low conditional complexity, and the other at high complexity and low probability. However, between these two clusters there is an absence of intermediary complexities and probabilities. Hence these data are inconclusive, and do not provide strong evidence of conditional simplicity bias.
It is worth highlighting that, unlike in panel (a), panels (b), (c), and (d) show large variations in probability (3 to 3.5 orders of magnitude) for a similar range of complexity values (around 30 bits); that is, there is stronger bias here even though panel (a) showed little bias overall. We can conclude that there is some very modest evidence of Level (I), but no clear conclusions can be made regarding Level (II) and Level (III).
V. DISCUSSION
We have investigated an approach to predict, or at least bound, the probabilities of phenotype transitions upon random genetic mutations, using arguments inspired by algorithmic information theory, and especially the phenomenon of conditional simplicity bias (Eq. (2)). Earlier [38], it was shown that the transition probabilities in computational simulations of RNA and protein secondary structure genotype-phenotype maps could be upper-bounded by estimating the complexity of the starting and resulting phenotypes. The ability to make such predictions was noteworthy because it suggested that map-agnostic bounds, relying only on information complexity arguments, could provide nontrivial predictions of transition probabilities, which may be useful in cases where the details of the underlying genotype-phenotype map are not known. More broadly, the ability to make such predictions supported the exploration of information complexity arguments for developing mathematical laws in biology.
In the present study, we have extended this research direction by applying the conditional simplicity bias bound to several other genotype-phenotype maps; in particular, more ‘challenging’ maps were chosen which in one way or another push the limits of the applicability of the conditional simplicity bias bound. These included more intricate and complex maps: a differential equation model of a circadian rhythm, a gene regulatory network matrix map, a detailed tooth development model (with two types of complexity estimate), a polyomino self-assembled protein complex map, and an HP lattice protein map. Overall, the numerical experiments show that some degree of transition probability predictability can be achieved, varying between maps: in nearly all cases, Level (I) conditional simplicity bias was achieved, meaning some general inverse relation between probability and complexity. In several cases, Level (II) was achieved, in which the upper bound on log P(x → y) was found to decay roughly linearly with conditional complexity. Not many cases achieved Level (III), in which the slope was also correctly identified.
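The three-level assessment summarised above can be sketched numerically. The snippet below is our reading of the criteria, not the authors' exact code: σ is taken here as a Pearson correlation, the upper envelope is defined as the maximum log-probability per unique complexity value, and the slope tolerance `tol` is an illustrative choice.

```python
import numpy as np

def assess_levels(cond_complexity, log10_prob, bound_slope, tol=0.25):
    """Sketch of a three-level assessment.
    Level (I): negative correlation between conditional complexity and
    log-probability. Level (II): the upper envelope is roughly linear.
    Level (III): the envelope slope matches the predicted bound slope."""
    x = np.asarray(cond_complexity, dtype=float)
    y = np.asarray(log10_prob, dtype=float)
    # Level (I): overall inverse trend (sigma taken as Pearson r here)
    sigma = np.corrcoef(x, y)[0, 1]
    level1 = sigma < 0
    # Upper envelope: maximum log-probability per unique complexity value
    env_x = np.unique(x)
    env_y = np.array([y[x == v].max() for v in env_x])
    slope, intercept = np.polyfit(env_x, env_y, 1)
    pred = slope * env_x + intercept
    ss_res = np.sum((env_y - pred) ** 2)
    ss_tot = np.sum((env_y - env_y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    level2 = level1 and r2 > 0.5                 # rough linearity (assumed cutoff)
    level3 = level2 and abs(slope - bound_slope) <= tol * abs(bound_slope)
    return sigma, r2, slope, (level1, level2, level3)
```

For instance, data whose envelope falls by one order of magnitude per bit of conditional complexity, tested against a predicted slope of −1, would satisfy all three levels under these assumed cutoffs.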
From the above example maps, a couple deserve to be highlighted. The matrix-multiplication map (and, in a weak sense, the HP protein map) showed conditional simplicity bias while not showing simplicity bias. Different reasons were suggested for this, but it perhaps points to the possibility that conditional simplicity bias is a more broadly occurring phenomenon than simplicity bias. The model of tooth development, which is highly intricate and biologically realistic, shows clear conditional simplicity bias (and simplicity bias). This is noteworthy because in this model the connection between genotype and phenotype is indirect, suggesting that the bound may be applicable at higher or other levels of biological organisation.
As alluded to above, other methods for predicting phenotype transition probabilities, in particular those which invoke biophysical details of the map, or other details of the relevant genotype-phenotype map, will no doubt yield more accurate predictions for transition probabilities (e.g., [100]). Nonetheless, these other methods have a different aim and a different list of requirements and assumptions, and hence cannot be meaningfully compared to the predictions made here.
A weakness of our predictions is that they only constitute an upper bound on the probabilities, with many phenotype patterns falling far below their respective upper bounds. These phenotypes have low probability values while at the same time having low complexity values. Following the hypothesis from refs. [58, 64], these low-complexity, low-probability outputs are presumably patterns which the genotype-phenotype map finds ‘hard’ to make even though they are not intrinsically very complex. It may be possible to extend approaches developed in [58, 64] that also take into account the complexity of the genotypes, to help explain which types of patterns occur far from the bound and to find improved estimates for their probabilities. Having described this weakness, from a different perspective the problem is not as severe as it might seem: it is known that randomly sampled genotypes are likely to generate phenotypes which are close to the bound [58]. In other words, even if many of the phenotypes are far below the bound, most of the mutations map to phenotypes which are close to the bound. A related weakness is that we can rarely predict the value of the slope of the upper bound; that is, Level (III) was rarely achieved. Even with this weakness, we can still predict other properties of interest, such as which of two phenotypes is more likely [38].
An extreme case of low-probability transitions arises when P (x → y) = 0, indicating no direct mutational pathways between two phenotypes. Such connections are crucial for navigating fitness landscapes, as they determine the accessibility of evolutionary pathways [101]. The navigability of a fitness landscape can be analysed in terms of a directed phenotype network [102]: When most phenotypes are connected, navigability is high due to potential high-dimensional bypasses. Conversely, if most phenotypes lack connections, the fitness landscape becomes rugged, making it hard for evolving populations to locate fitness peaks. Additionally it is reasonable to assume that fitness differences may also be linked to conditional complexity. Phenotypes with lower conditional complexity relative to a high-fitness phenotype are likely to exhibit higher fitness, while the opposite may also hold true. This effect would suggest that larger transition probabilities are typically towards phenotypes with more similar fitness, generating interesting correlations in fitness landscapes, and potentially causing the relevant fitness landscapes to be smooth. All this suggests that finding genotype-phenotype map agnostic information theory arguments to predict the topology of a genotype-phenotype-fitness map may be a fruitful future research programme.
In this study we assume simple uniformly random mutations, which ignores mutational biases [13]. While this is a simplification of reality, it is unlikely that incorporating these biases would drastically alter the global patterns of transition probabilities we demonstrated here; see e.g. [27] for an example. Nevertheless, there may be interesting directions to study in which mutational biases and phenotypic biases interact. Similarly, compositional biases in genomes, such as GC bias, may affect phenotypic biases.
The word “complexity” can take on many meanings in the scientific literature [103, 104], which can lead to confusion. In this work we mean “complexity” in the sense of Kolmogorov complexity. While Kolmogorov complexity is strictly uncomputable, it can be estimated using methods such as data compression, or other computable measures of descriptional complexity. In practice we have mainly employed complexity measures based on lossless compression, which is a standard and theoretically motivated approximation to the true uncomputable quantity. We have also employed other descriptional complexity measures here, such as cusp count and OPCR. The former is a biologically intuitive but quite coarse-grained measure of the descriptional complexity of tooth shape, while the latter is a more fine-grained descriptional measure of the variability of the tooth's surface. Previous work has shown that well-motivated approximations to descriptional complexity can work well in capturing the biases predicted by the AIT-derived bounds [33, 35, 59–61].
It is also noteworthy that while we have studied applications of simplicity bias to biological genotype-phenotype maps, others have applied the same theory to genetic programming problems, and have also observed simplicity bias in that context [105–107]. Genetic programming and genetic algorithms for optimization are inspired by biology, but the applications to computer science are quite far removed from living systems. Hence these applications can be construed as beyond biology. It would be interesting for future work to see if the conditional form of simplicity bias explored here, not just simplicity bias, can be fruitfully applied in the context of genetic programming.
Interest in the topic of biases in the introduction of variation has grown in recent years [4, 9, 13]. This recognition contrasts with a common (even if tacit) assumption that variation is either roughly uniform, or that biases do not have much impact on evolution [9]. It also contrasts with the view that developmental biases merely limit certain possibilities [7], as opposed to positively affecting direction of evolution [24, 108]. Within this framework, we see that simplicity bias and conditional simplicity bias provide further sources of what can be quite a pronounced bias in the introduction of variation.
Related to this, Salazar-Ciudad [109] raises the interesting objection that the term “bias” is improper: there is no bias, but simply the action of development. The argument is that it is not correct to first imagine possible phenotypes in a morphospace, and then react with surprise when these imagined possibilities do not manifest, or when only a small fraction of them appear. While we agree that a valid point is raised here, especially in the context of development, we maintain the appropriateness of the word “bias” in our context. This is because we use “bias” to mean that, assuming a uniform distribution of random mutations, a non-uniform (biased) distribution over phenotypes results, and that this bias has certain predictable properties, in particular a preference for simplicity. This is the sense in which we mean simplicity “bias”.
It is interesting to open the discussion regarding whether or not bias itself is an evolved property [9, 108]. Some questions are: Have biases changed over time? Were some biases selected for? Can biases in genotype-phenotype maps be tuned via evolution? It is conceivable that some biases may have adapted to the needs of the organism; that is, it may be that the genotype-phenotype map in some cases adapts itself via evolution so that the phenotypes which are ‘favored’ by bias are the ones most needed to adapt to the environment. In many biological contexts this type of adaptive argument is plausible, but as argued by Ghaddar and Dingle [34], there are other types of bias which result from basic physical, chemical, or information constraints. In these cases, it is difficult to see how selection could alter these fundamental properties, which would be needed to alter the biases. This observation also applies to biases arising from conditional simplicity bias. Having said that, one area in which there is still potentially room for adaptation to tune the bias is in the context of low-complexity, low-probability phenotypes [58, 64]. The information constraints apply to the upper bound probability, but not directly to how far a given phenotype’s probability is below the bound. Hence there could conceivably be some tuning of the bias in relation to the distance from the upper bound.
Concluding the discussion of biases, it is interesting that, unlike other developmental biases, the biases arising from (conditional) simplicity bias do not depend on evolutionary history, making it easier to predict the direction and type of bias. This follows from the fact that (conditional) simplicity bias derives from intrinsic information arguments and the relative simplicity or complexity of phenotypes, quantities which can be estimated a priori. This predictability contrasts with other cases of developmental bias, like the famous example of patterns of digit reduction in amphibians [110]: the patterns of reduction observed were (presumably) contingent and not predictable a priori, but could only be uncovered via direct observation.
Looking to future work, three directions stand out. Firstly, there is the question of how the link between conditional complexity, transition probabilities, and fitness differences affects the topology of fitness landscapes. The second is more mathematical: for the basic simplicity bias described by Eq. (1) we can predict the slope quite well [35], but in the conditional simplicity bias plots which employ Eq. (2), our ability to predict the slope is not as good (as we have seen in the current study). Can we explain the origin of this discrepancy, and perhaps find better predictors? Finally, while the current study has focused on genotype-phenotype maps, which is a biological context, the conditional simplicity bias bound in Eq. (2) should be a generic property of input-output maps, and be applicable far beyond biology (cf. [70, 111]). Hence there is a potentially large number of applications and extensions for this line of research.
Data availability
The code and data used in this work are available at GitHub: https://github.com/hagolani/Bounding-phenotypemorphology-transition-probabilities-via-conditional-complexities
Acknowledgments
This project has been partially supported by Gulf University for Science and Technology and the CAMB research center under project code: ISG Case 9 and ISG Case 44. We thank Iain Johnston and Sam Greenbury for providing polyomino data.
Appendix A: Applications of Kolmogorov complexity
The application of AIT to real-world science problems suffers from several problems, including that (a) Kolmogorov complexity is uncomputable, (b) the results are framed and proved in the context of universal Turing machine (UTMs) while many real world maps are not Turing complete, and (c) results are valid up to O(1) terms and therefore, strictly, only accurate in the asymptotic limit of large complexities. Given these, it is surprising that AIT and algorithmic probability can be successfully applied at all. However, several lines of reasoning can help us understand why they are, in fact, applicable, at least approximately.
Firstly, in response to (a): although technically uncomputable, Kolmogorov complexity is fundamentally a measure of the size in bits of the compressed version of a data object. Hence, Vitanyi [54, 112] points out that because naturally generated data is unlikely to contain deeply hidden pseudo-random patterns like the digits of π, the true complexity is unlikely to be much shorter than that achievable by every-day compressors. For more on this discussion, see ref. [112], and see ref. [113] for work on estimating short programs via short lists of candidates. Secondly, in response to (b), it is worth noting that the simplicity bias bound is specifically relevant in the computable (i.e., non-UTM) setting. Moreover, building on pioneering work with small Turing machines [68, 69], Zenil et al. [114] have numerically studied algorithmic probability at different levels of the Chomsky hierarchy and found that it persists; indeed, they also found close agreement between complexity estimates obtained from different levels, for the short binary strings which they studied. From the theoretical side, Calude et al. [115] developed an AIT for finite state machines (the lowest level of the Chomsky hierarchy), deriving analogous results and suggesting that many fundamental ideas and results from AIT need not apply only to UTMs. See also somewhat similar results in ref. [116], and the success of the Minimum Description Length (MDL) approach to statistics [117], which is a kind of computable version of AIT. Given that UTMs (at the top of the Chomsky hierarchy) and finite state machines (at the bottom) share many similar mathematical results related to AIT, it is not unreasonable to assume that the results hold for other systems (such as some biological systems) which sit between the two extremes of the hierarchy.
Thirdly, we argued earlier [35] that it is common in physics that mathematical formulae apply quite accurately well outside the (e.g., asymptotic) regions for which they have been proven. Although it is not possible to completely remove O(1) terms in AIT [118], it is still an interesting question in theoretical computer science why asymptotic analysis (such as ignoring O(1) terms) is valid and works so well in practical applications [119].
Further reasons to understand why the AIT coding theorem should work in real-world applications can be found in information theory research developed largely independently of algorithmic probability. The fundamental connection between probability and data compression has also been studied by Cover [120], Langdon [121], and Rissanen [122]. Since then, different communities — e.g., information theory [123, 124], optimal gambling strategies [125], and password guessing [111] — have studied and exploited the probability-compression connection without explicitly invoking Kolmogorov complexity or UTMs.
In a review, Merhav and Feder [126] surveyed results in the area known as universal prediction and explicitly point to 2^(-LZ(x)) as an effective universal probability assignment for prediction, based on the results of ref. [127] and others, where LZ(x) is the 1978 Lempel-Ziv compression complexity measure, essentially the same as the one we use here. Additionally, Merhav and Cohen [111] use a conditional coding theorem predictor 2^(-LZ(y|x)), which is again closely analogous to the AIT conditional coding theorem relation 2^(-K(y|x)) [70]. These results, together with the arguments given above in response to (a)-(c), all support and motivate the application of AIT coding theorem work in science and engineering, as well as extending fundamental research related to the coding theorem and algorithmic probability [128].
The logic of this current study, and previous simplicity bias work [35, 38], is that we use AIT arguments to derive mathematical relations which can be proven to hold only in asymptotic limits, or under other conditions, but then apply these relations in other settings such as biology, and then simply test empirically whether or not the relations hold. Hence we have called this type of work “AIT-inspired arguments”, because it is perhaps not strictly AIT, which is a precise and abstract mathematical field.
Appendix B: Map complexity
The complexity of the map is an important consideration in simplicity bias studies. It has been suggested [35] that the map should be simple, i.e., of O(1) complexity. This condition imposes that the complexity of the output patterns should be due to the complexity of the input, rather than complexity hidden in the mapping procedure.
As an extreme example, if there were only two possible inputs, 0 and 1, and 0 mapped to Shakespeare’s Romeo and Juliet while 1 mapped to Hamlet, then the complexity of the outputs (Shakespeare’s plays) would be due to the fact that the texts must already be programmed into the map. In this case, the inputs are very simple and in no way account for the complexity of the outputs. On the other hand, if a competent programmer writes many lines of code to produce a complex output via only a basic computer with few built-in functions, then the complexity of the output would be due to the complexity of the input program, and not merely to the map itself.
It follows that if an arbitrarily complex map is permitted, then the map could be chosen so that complex outputs have high probabilities and simple outputs have low probabilities, or the assignments of inputs to outputs could be made essentially arbitrary, destroying any systematic relation between complexity and probability.
Appendix C: Conditional complexity for found vs. not found phenotypes
For any given starting phenotype x, random mutations to its underlying genotypes may or may not yield all possible other phenotypes. In fact, it is quite likely that only a reduced set of phenotypes is accessible via single-point mutations from a given starting phenotype x; in other words, P(x → y) = 0 for most y in the set of all possible phenotypes. We might expect that those phenotypes with lower conditional complexity will be ‘found’ via point mutations, where ‘found’ means that P(x → y) > 0, i.e., the phenotype y is present both in the uniform random sampling and in the 1-mutational neighborhood of the given phenotype x. Similarly, we would conjecture that those with higher conditional complexity will not be found. The intuition here is that the conditional simplicity bias bound gives higher a priori probability to similar (or rather, lower conditional complexity) phenotypes, and so presumably the ‘not-found’ phenotypes will be those of highest conditional complexity.
We test this conjecture for the maps studied above by calculating the conditional complexity of all the found phenotypes and the conditional complexity of the phenotypes that were not found. Next, we calculate the median conditional complexity for both groups and determine the difference between them, Δ = median(found) − median(not found). This process is repeated for each explored starting phenotype x. To facilitate comparison between the different models, we normalize each group by dividing by the absolute maximum value within that group. According to the conjecture stated above, we expect Δ to be negative (or at least not positive). In Figure C.1 the results of the numerical experiments are shown. Most cases display a negative value of Δ, and for none of them is it positive, in line with our expectation: most of the found phenotypes have a lower conditional complexity than those not found, with only a few exceptions.
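The found-versus-not-found comparison described in this appendix can be sketched as follows; the per-group normalisation by each group's absolute maximum is our reading of the description above, and the function name is ours:

```python
import statistics

def delta_median_cond_complexity(found_K, not_found_K):
    """Difference between the median conditional complexities of 'found'
    (P(x -> y) > 0) and 'not found' (P(x -> y) = 0) phenotypes, each group
    normalised by its own absolute maximum. Negative values mean the found
    phenotypes have lower conditional complexity, as conjectured."""
    m_found = statistics.median(found_K) / max(abs(k) for k in found_K)
    m_not = statistics.median(not_found_K) / max(abs(k) for k in not_found_K)
    return m_found - m_not
```

For example, if the found phenotypes have conditional complexities clustered below those of the not-found phenotypes, the returned value is negative, matching the sign convention of Figure C.1.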
Phenotypes that are more frequently found tend to have lower conditional complexity. This figure shows the difference between the median conditional complexity of “found” and “not found” phenotypes, normalized by their maximal value. A “found” phenotype is one identified both through uniform random sampling and through single-point mutations of a phenotype x, such that P(x → y) > 0, while a “not found” phenotype is one observed only in uniform random sampling and not through single-point mutations, meaning P(x → y) = 0. The median conditional complexity is calculated for both groups, and for each phenotype x the difference between these medians is determined. Negative values indicate that the conditional complexity of the “not found” phenotypes is higher. The boxplots present the results of these calculations for various systems, with the number of phenotypes x explored specified for each: circadian rhythm (n = 35), vector matrix map (n = 310), polyominos (n = 21), HP protein map (n = 26), teeth model: OPCR (n = 12), and teeth model: cusps (n = 12).
In this analysis, it is important to note that, aside from the polyominos and the HP protein map, the sampling process is incomplete. Specifically, we are unable to identify all possible phenotypes or fully explore the entire 1-mutational neighborhood for each of the studied phenotypes, as both genotypes and phenotypes are continuous. A more comprehensive exploration could potentially uncover very rare phenotypes, which might influence the results presented here. However, these rare phenotypes are likely to be more complex than those identified in our current exploration. Under such circumstances, our conjecture would likely become even more apparent in this analysis.
Appendix D: Tooth developmental model adapted for shark teeth
Here, instead of using the tooth model as in [78], we used a modified version adapted to shark tooth development [81]. As with the original tooth model, studying the shark tooth model required some modifications to the strategy followed for the other models presented here, mainly to account for the continuous parameter inputs; this strategy also differs from the one used for the original tooth model, as we describe in the following paragraphs. We first identified one tooth for each of the following numbers of cusps: nc = 1, 3, 5, 7, 9, 11. Teeth with an even number of cusps were excluded since their appearance was rare due to the symmetric default development [129]. These initial teeth were taken from random morphospace explorations performed in [81]. In order to use several different phenotypes with diverse parameter values, we then performed 100 iso-morphological walks [36] for each of these initial teeth. In each iso-morphological walk, teeth underwent 100 sequential steps of mutation, under the condition that the initial number of cusps remained unchanged in every tooth along the walk. Mutations are introduced by modifying the model parameters. Specifically, for each mutant, we changed two randomly chosen parameters p to values ranging between the value of the respective parameter in the “parental” tooth, p0, and either the minimal or the maximal allowed parameter value pm (as in [81]), in the following manner: p = p0 + α²·(pm − p0), with α a uniformly distributed random number between 0 and 1. This procedure differed slightly from most other genotype-phenotype map explorations in this study, which typically introduce a single point mutation at a time. However, we found that introducing two mutations was the most incremental way that allowed for an exploration of changes in dynamic parameter interactions and, thus, for an efficient detection rate of diverse tooth phenotypes.
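The two-parameter mutation scheme can be sketched as below. The function and parameter names are ours, and the uniformly random choice between the minimal and maximal boundary is an assumption where the text says "either the minimal or the maximal allowed parameter value":

```python
import random

def mutate_parameters(p0, p_min, p_max, n_mut=2, rng=random):
    """Sketch of the mutation scheme: each of n_mut randomly chosen
    parameters moves from its parental value toward a randomly selected
    boundary by a squared uniform factor, p = p0 + alpha^2 * (pm - p0)."""
    p = list(p0)
    for i in rng.sample(range(len(p)), n_mut):
        pm = rng.choice((p_min[i], p_max[i]))  # target boundary (assumed random)
        alpha = rng.random()                   # uniform in [0, 1)
        p[i] = p[i] + alpha ** 2 * (pm - p[i])
    return p
```

Note that squaring α biases mutations toward small parameter changes while still allowing occasional large jumps toward a boundary, which fits the "most incremental" exploration described above.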
This also aligns more closely with natural evolution, where a small number of mutations often affects multiple developmental mechanisms. By performing these iso-morphological walks, we obtain 100 teeth per cusp-number category nc. Finally, in order to measure the conditional complexity, we explored the two-point-mutation neighborhood of each tooth in each category. Since this model uses continuous values, we could not explore every possible mutation; instead we explored 200 mutants per “parental” tooth. To account for these differences, when analysing the results of this exploration we calculate the average frequency per complexity and conditional-complexity category, since we are analysing the iso-morphological space of cusp numbers rather than of specific phenotypes, the phenotype and its complexity being defined in the same way here. In Figure D.2(a) we show that, despite the added complexity the tooth model introduces into the genotype-phenotype map, the probability of finding simple teeth is consistently larger than that of finding more complex teeth, in line with the other models and as expected. In fact, the decrease of the logarithm of frequency with increasing phenotypic complexity follows a near-linear curve, with the notable exception of mono-cuspid teeth, i.e., the lowest possible complexity. It could be argued that the linear probability decay may reflect that there is no qualitative difference between the mechanisms underlying, e.g., a 3-to-5 and a 5-to-7 cusp change. The high frequency of very simple shapes (1 cusp) might reflect that this minimum complexity exists for free, i.e., without the activation of specific mechanisms, and can be equally accessed from any part of the morphospace. As shown in Figure D.2(b-d), conditional simplicity bias appears to hold for the shark tooth model, with transition probabilities generally falling below the fitted upper bound and substantially below the bound model.
Overall, the upper bound model appears to be a good predictor of the maximum transition probabilities. However, the monotonicity with which transition probabilities decrease appears weaker for more complex parental teeth. Levels (I) and (II) are achieved in all the studied cases, meaning that we always see a linear fitted upper bound decaying with increasing conditional complexity. Even Level (III) is achieved in 73% of the cases.
Shark tooth morphology model. (a) Simplicity bias found for the uniform random sampling of the tooth model, where we measure complexity as the number of cusps. As specified in this section, this plot shows the average mean probabilities found for each of the complexity categories (number of cusps). (b) Example of the transition probabilities for a phenotype with complexity . In this example σ = −0.95; such a negative correlation confirms Level (I). Level (II) is also achieved, since for the linear model built with these data, pval = 0.0077. Finally, Level (III) is also achieved, since the differences between the slopes calculated with the bootstrap method and the slope of the bound model are not significantly different from 0. (c) In this example, using a phenotype of complexity , Level (I) is achieved with σ = −1.0. Level (II) is also achieved, the linear model having pval = 0.0281, and Level (III) is also achieved. (d) In this example, using a phenotype of complexity , Level (I) is achieved with σ = −0.95 and Level (II) is achieved, the linear model's pval being 0.0009; however, Level (III) is not achieved, since the slopes of the fitted model and the bound model are significantly different.
References