## Abstract

A reduction in high-dimensional phenotypic states to a few degrees of freedom is essential to understand biological systems. One possible origin of such a reduction (as recently discussed) is the steady growth of cells that constrains each component’s replication rate. Here, in contrast, our aim is to investigate consequences of evolutionary robustness, which is shown to cause a stronger dimensional reduction in possible phenotypic changes in response to a variety of environmental conditions. First, we examined global protein expression changes in *Escherichia coli* after various environmental perturbations. Remarkably, they were proportional across components, across different types of environmental conditions, while the proportion coefficient corresponded to the change in growth rate. Because such global proportionality is not generic to all systems under a condition of steady growth, a new conceptual framework is then needed. We hypothesized that such proportionality is a result of evolution. To test this hypothesis, we analyzed a cell model—with a huge number of components, that reproduces itself via a catalytic reaction network—and confirmed that common proportionality in the concentrations of all components is shaped through evolutionary processes to maximize cell growth (and therefore fitness) under a given environmental condition. Furthermore, we found that the changes in concentration across all components in response to environmental and evolutionary changes are constrained to the changes along a one-dimensional major axis within a huge-dimensional state space. On the basis of these observations, we propose a theory in which high-dimensional phenotypic changes after evolution are constrained to the points near a one-dimensional major axis that correlates with the growth rate, to achieve both evolutionary robustness and plasticity. By formulating this proposition in terms of dynamical systems, broad experimental and numerical results on phenotypic changes caused by evolution and adaptation are coherently explained.

**Summary** Cells generally consist of thousands of components whose abundance levels change through adaptation and evolution. Accordingly, each steady cell state can be represented as a point in a high-dimensional space of component concentrations. In the context of equilibrium statistical thermodynamics, even though the state space is high-dimensional, macroscopic description only by a few degrees of freedom is possible for equilibrium systems; however, such characterization by a few degrees of freedom has not yet been achieved for cell systems. Given that they are not in equilibrium, we need some other constraint to be imposed. Here, by restricting our focus to a cellular state with steady growth that is achieved after evolution,we examine how the expression levels of its several components change under different environmental conditions. On the basis of analysis of protein expression levels in recent bacterial experiments as well as the results of simulations using a toy cell model consisting of thousands of components that are reproduced by catalytic reactions, we found that adaptation and evolutionary paths in high-dimensional state space are constrained to changes along a one-dimensional curve, representing a major axis for all the observed changes. Moreover, this one-dimensional structure emerges only after evolution and is not applicable to any system showing steady growth. This curve is determined by the growth rate of a cell, and thus it is possible to describe an evolved system by means of a growth rate function. All the observed results are consistent with the hypothesis that changes in high-dimensional states are nearly confined to the major axis in response to environmental, evolutionary, and stochastic perturbations. This description opens up the possibility to characterize a cell state as a macroscopic growth rate, as is the case for the thermodynamic potential. This approach can provide estimates of which phenotypic changes are theoretically more evolvable, as predicted simply from their observed environmental responses.

## INTRODUCTION

Cells contain a huge variety of components whose concentrations change through adaptation and evolution. To realize a theoretical description of such a system with many degrees of freedom, it is important to be able to characterize the system based on a few macroscopic variables, as demonstrated by the success in traditional statistical thermodynamics. In thermodynamics, by restricting the focus to an equilibrium state and its vicinity, a state can be described by a few macroscopic variables even though the number of microscopic degrees of freedom is huge. Accordingly, we sought to evaluate the possibility of adopting a similar approach to a biological system that contains a large number of components.

Of course, the strategy of statistical thermodynamics cannot be directly applied to a biological system given that cells are not in a thermodynamic equilibrium. Cells grow (and divide) over time in a manner that is driven by complex intracellular dynamics. Despite this difference, however, it may still be possible to establish a description of cells with only a few degrees of freedom by restricting our focus to only cells in a steady growth state. Indeed, for such cases, all the intracellular components have to be roughly doubled before cell division occurs for reproduction. This restriction imposes a general constraint on the possible changes in expression levels across the many components; this situation opens the possibility to characterize a cellular state with a few variables. Indeed, several research groups have attempted to search for universal laws in the steady growth state, as pioneered by the work of Monod [1] followed by subsequent studies [2–5].

Currently, research on the existence of such constraints is generally carried out by examining changes in expression occurring across thousands of genes in transcriptome or proteome data. In fact, a general trend across changes in the overexpression of many components has been observed [6–10] in the course of adaptation and evolution [11–16]. In particular, using transcriptome analysis of bacteria, we recently found that changes in (logarithmic) expression levels are proportional across many components under given stress conditions of different intensities, and the proportion coefficient corresponds to the change in the growth rate [17, 18]. To account for the experimental findings, a theory was proposed based on steady growth of cells and linear approximation under the assumption of small changes in expression levels. The theory can explain the proportional change in expression levels across many components, in response to different intensities of the *same* environmental stress, as long as the change is *small*. Experimental data, however, suggest that such proportionality in the expression levels across many genes also holds across different types of environmental conditions. Just as the previous theory concerned with the change only along a given single environmental change (i.e., changes characterized only by one parameter), it cannot explain the observed global proportionality across different directions in environmental changes. Furthermore, the experimental data suggest that the proportionality in expression levels is valid even in the case of substantial stress intensity, leading to a decrease down to 20% in the growth rates.

To sum up, the data, albeit preliminary, point to some novel theory that is not explained simply by the steady growth. Therefore, we first need experimental confirmation for the existence of such global proportionality across a variety of environmental conditions. If this confirmation is obtained, then we will seek a general theory beyond the simple steady growth.

In the present study, we first analyzed the changes in protein expression levels under a variety of environmental conditions using recent experimental data. We confirmed the pattern of proportional changes across all protein expression levels, with the proportion coefficient approximately consistent with the cell growth rate, representing a basic macroscopic quantity for steady-growth cells. The results are not explained by the previous simple theory because they deal with the proportionality across a broad range of different environmental conditions. It is not a generic property corresponding to simple steady growth and requires a novel theory to take into account some other factor.

At this stage, it is important to recall that biological systems are an outcome of evolution. Hence, it is natural to expect that the missing link may be provided as a result of evolution under a given environmental condition. Accordingly, we here examine validity of the hypothesis that evolution under a given condition will constrain intracellular dynamics so that the concentration changes across many components are mainly governed by changes in the growth rate. Extensive evolutionary experiments are needed to test this hypothesis, and this work may be too difficult to do in a laboratory. Nonetheless, a numerical evolution approach offers a possible way to address this question.

In fact, several models have been proposed that capture the evolution of a cellular state with many phenotypic variables. Although such models may not be able to describe the entire breadth of the complexity of cells, they are sufficiently complex to enable interpretations on the evolution of genotype—phenotype mappings [19–23]. Here, we study a cell model consisting of a large number of protein species that catalyze each other’s reactions transforming environmental resources for cellular growth. We demonstrate that after evolution, the changes across all catalysts are constrained to points near a one-dimensional curve that is governed by the growth rate.

According to these findings of numerical simulations, with support from laboratory experiments, we formulate the following hypothesis of an evolved system: phenotypic changes of many components after environmental or genetic alterations mainly take place along a single major axis that parameterizes the growth rate. Indeed, this theory can explain the results of phenotypic changes occurring by adaptation and evolution quite well. We then discuss the general consequences and biological relevance of this hypothesis.

## GLOBAL PROPORTIONALITY OBSERVED IN PROTEIN EXPRESSION LEVELS

By imposing linearization approximation with steady growth, we previously considered how each concentration of each intracellular component *x*_{j}, say mRNA expression level measured by transcriptome analysis, changes when the intensity of a given environmental stressor increases. Specifically, we are concerned with the logarithmic change in concentration *x*_{j}, i.e., *δX*_{j}(*E*)*= X*_{j}(*E*) *- X*_{j}(*O*), where *O* is the original nonstress condition. By assuming the steady growth of cells and linearizing all the dynamics around the original state, expression change *δX*_{j}(*E*) is obtained in response to the change in single scalar parameter *E* representing the value of a given single environmental condition (e.g., temperature change). From the steady-growth solution, it then follows that change *δX*_{j} across all components *j* in a cell undera given environmental (stress) condition, , where *μ* is the growth rate of a cell in the steady-growth state, and *E* and *E*^{′} are two intensities of a given single environmental stressor. Because the right-hand side is independent of each component *j*, the above equation implies that the changes in all components are proportional, with the slope given by the growth rate change. By means of the transcriptome analysis of *Escherichia coli*, the relation is found to hold rather well for the change in mRNA abundance under given stress types [17].

So far, the environmental condition was a given scalar variable. Nevertheless, there is a variety of environmental conditions. Then, is this relation valid not simply toward the intensity of a given stressor but across a variety of environmental conditions? In other words, for different types (i.e., for different vectors **E _{1}** and

**E**, does the relation hold? This is not an incremental question, because such a relation, if it exists, cannot be explained simply by the steady growth but reflects some other basic property inherent in (evolved) biological systems, which necessitates a novel conceptual framework leading to a strong constraint in possible phenotypic changes.

_{2}To also allow for a direct comparison with protein expression dynamics, we examined the proteome data of growing *E. coli* under various environmental conditions reported in ref. [24]. The environmental conditions considered include different culture media with various carbon sources, application of stressors such as high temperature and osmotic pressure, and glucose-limited chemostat cultures with different dilution rates (which are equivalent to specific growth rates in the steady state). Measurement of intracellular protein concentrations was carried out by mass spectrometry, which provided data on the absolute concentrations of more than a thousand protein species.

From the proteomic analysis under various environmental conditions, we calculated the change in protein concentrations between the standard (reference) state and that under different environmental conditions. In the following analysis, a minimal medium with glucose as a sole carbon source was selected as the standard state, and essentially the same results were obtained when other data served as the standard state to compute changes *δX*_{j}(**E**) and *δμ*. The log-transformed expression change of the *j*-th protein is calculated as *δX*_{j}(**E**) = log(*x*_{j}(**E**)*/x*_{j}(**S**)), where *x*_{j}(**E**) and *x*_{j}(**S**) represent the protein concentration in given environment **E** and the standard condition, respectively. Here, we employ the logarithmic scale because protein expression levels typically change on this scale [26–29], and this choice facilitates a comparison with the theory described above and discussed below.

To determine whether the common proportionality of protein expression changes in Eq. (1) is also valid across different environmental conditions, we analyzed the relation of protein concentration changes *δX*_{j} observed in two different environmental conditions. In Fig. 1(a) and (b), we show some examples of the relation between protein expression changes *δX*_{j}(**E _{1}**) and

*δX*

_{j}(

**E**), where

_{2}**E**and

_{1}**E**are two different culture conditions. Using the data, we plotted (

_{2}*δX*

_{j}(

**E**)

_{1}*, δX*

_{j}(

**E**)) for all pairs of 20 different environmental conditions (

_{2}**E**,

_{1}**E**). As shown in the figure, the expression changes under different environmental conditions strongly correlate over a large number of proteins. By means of the plots, we computed the slope of the common trend in protein expression changes, which is in agreement with that calculated from the ratio of growth rate changes as in Eq. (1) (blue lines in Fig. 1(a) and (b)). In Fig. 1(c), we compared the slope of the common trend in protein expression obtained by linear regression, i.e., a comparison of common ratio

_{2}*δX*

_{j}(

**E**)

_{1}*/δX*

_{j}(

**E**) with the ratio of growth rate change

_{2}*δμ*(

**E**)

_{1}*/δμ*(

**E**), for all possible combinations by choosing a standard condition (see the caption of Fig. 1 for details) [25].

_{2}Here, we employ proteomics data, whereas another possible choice will be the use of transcriptome data. Either is fine for our purpose, and in fact some analysis involving transcriptome data also indicates such global proportionality in expression levels [6–10] although the relation with the growth rate is not explored. One advantage of the proteomic analysis over the transcriptomic analysis is that the former reflects post-transcriptional regulation, which is believed to be involved in the growth rate regulation [30]. In fact, the proteome data we are dealing with [24] have widely been used to analyze growth rate— dependent regulation of protein expression levels and resource allocation [31–35]. Thus, application of the proteome data will be more appropriate to analyze the relation between high-dimensional phenotypic dynamics and the growth rate in Eq. (1).

The present global proportionality toward *various* environmental conditions is much stronger than in the earlier results and is beyond the scope of the simple theory based only on steady growth [17] (and linear approximation), which can account only for the change along a *given single* environmental condition. There, *E* and *E*^{′} represented the same environmental stress and differed only in the intensity, for example, a change in the culture temperature. They changed along the same vector. In contrast,the present proportionality deals with different types of stress conditions (e.g., temperature versus a nutrient), i.e., with vectors of different directions. Therefore, the observed proportionality cannot be derived from the original theoretical formulation and is not a generic property of the simple steady growth. Nevertheless, the data suggest that the proportionality in Eq. (1) is still valid in this case to a certain degree.

Aside from this unexpected proportionality across different dimensions, the obtained data are suggestive of a broad range of linearity. The reaction or expression dynamics are generally highly nonlinear, with catalytic reactions or expression dynamics showing high Hill coefficients. Nonetheless, the experimental data still satisfy the linear relation even under a stress condition that reduces the growth rate below 20% or so as compared to the standard. Such expansion of the linear regime is not a general property of a system with steady growth either.

To sum up, other factors aside from the steady growth are involved in order for the relation in Eq. (1) to hold across a variety of environmental conditions up to their large change. Now, recall that biological systems not only are constrained by steady growth but also are a product of evolution. With evolution, cells can efficiently and robustly reproduce themselves under a given external condition. Thus, it is important to examine validity of the hypothesis that the above-mentioned global proportionality is a result of, and further strengthened by, evolution.

## CATALYTIC-REACTION NETWORK MODEL FOR NUMERICAL EVOLUTION

To validate the above hypothesis, we evaluated how reaction (or protein expression) dynamics are shaped by evolution. To this end, we utilize a simple cell model consisting of a large number of components and numerically evolve it under a given fitness condition to determine how the phenotypes of many components evolve. Here, phenotypes are given by the concentrations of chemicals, which change due to catalytic chemical reactions of enzymes that are in turn synthesized by some other catalytic reactions. Here, genes provide a rule for the possible reactions such as the parameters and structure of a catalyticreaction network.

The phenotype of each organism as well as the growth rate (fitness) of a cell is determined by such reaction dynamics, while the evolutionary process consists of selection according to the associated fitness and genetic change in the reaction network (i.e., rewiring of the pathway). In the analysis of a population of such cells that slightly differ in their genotypes, and accordingly in their relative fitness, the offspring are generated in accordance with the fitness value. Through these dynamics, the evolutionary changes in the genotypes and corresponding phenotypes are traced, and the evolution of phenotypic changes is studied by means of this combination of dynamical systems and a genetic algorithm, which enables testing of the hypothesis at hand.

The specific cell model applied for this purpose consists of *k* components. Thus, the cellular state is represented by the numbers of *k* components (*N*_{1}*, N*_{2},…, *N*_{k}) and their concentrations *x*_{i} = *N*_{i}*/V* witthe volume of the cell *V* [37, 38]. There are *m*(< *k*) resource chemicals *S*_{1}*, S*_{2},…, *S*_{m} whose concentra-tions in an environment and a cell are given by *s*_{1},…, *s*_{m} and *x*_{1},…, *x*_{m}, respectively. The resource chemicals are transported into the cell with the aid of other chemical components named “transporters.” The other chemical species work as catalysts, which are synthesized from other components, including resource chemicals, in catalytic reactions driven by some other catalysts, while the resource and transporter chemicals are noncatalytic. With the transport of external resources and their conversion to other components, the cell increases its volume and divides in two when it reaches a size larger than a given threshold. The growth rate depends on the catalytic network, i.e., genotype.

We adopted this model of a catalytic reaction network for simplicity because this model shows power law statistics, log-normal fluctuation, and adaptation with fold change detection; these characteristics are consistent with properties of cells [37, 38]. Detailed description is given below.

**Catalytic reaction dynamics**: For intracellular reaction dynamics, we consider a catalytic network among these *k* chemical species, where each reaction leading from some chemical *i* to some other chemical *j* was assumed to be catalyzed by a third chemical *ℓ*, i.e., (*i* + *ℓ→ j* + *ℓ*). In the zeroth generation, the catalytic network was generated randomly, where the path probability that a reaction from chemical *i* to some other chemical *j* exists is given by *ρ*. For simplicity’s sake, all reaction coefficients were set to be equal.

The nutrient chemicals (that are not catalysts) *S*_{1}, .. ,*S*_{m} are transported into the cell with the help of other chemicals called “transporters.” Here, we assumed that the uptake flux of nutrient *i* from the environment is proportional to *Ds*_{i}*x*_{t}*i*, where chemical *t*_{i} acts as the transporter for nutrient *i*, and *D* is a transport constant; in addition, for each nutrient, there is one corresponding transporter, represented by *t*_{i} = *m* + *i*. Both the nutrients and transporters have no catalytic activity, whereas the other *k-*2*m* chemical species are catalysts. Via catalytic reactions, these nutrients are transformed into other chemicals, including the transporters.

**Cell growth**: With the uptake of nutrient chemicals from the environment, the total number of chemicals *N* =∑ _{i}*N*_{i} in a cell can increase; consequently, we assumed that cell volume *V* is proportional to total number *N*. For simplicity, cell division is assumed to happen when the cell volume exceeds a given threshold. Through cell division, the chemicals in the parent cell are evenly distributed between the two daughter cells.

**Reaction Procedure**: In our numerical simulations, we randomly picked up a pair of molecules in a cell and transformed them if they are a substrate and the corresponding catalyst in the reaction network. In the same way, a nutrient molecule is transported across the plasma membrane if a randomly chosen intracellular molecule is a corresponding transporter for a randomly chosen extracellular nutrient molecule. The parameters were set as follows: *k* = 1000, *m* = 10, *ρ* = 0.075, and *D* = 0.001.

**Evolution**: In each generation, a certain subset of cells with higher growth rates is selected to produce offspring, resulting in a given mutational change in the network. First, *n* parent cells were generated, whose catalytic-reaction networks are randomly generated by means of connection rate *ρ*. From each of the *n* parent cells, *L* mutant cells were generated by random replacement of *mρk*^{2} reaction paths, where *ρk*^{2} is the total number of reactions, and *m* denotes the mutation rate per reaction per generation. To determine the growth rate of each cell, i.e., the inverse of the average time required for division, reaction dynamics were simulated for each of the *nL* cells. Within the cell population, *n* cells with faster growth rates were selected to be the parent cells of the next generation, from which *nL* mutant cells were again generated in the same manner. The parameters were set as follows: *n* = 1000, *L* = 50,and *m* = 1 *×* 10^{−3}. The simulation of evolutionary dynamics was performed under a constant (original)environmental condition , where.

## GLOBAL PROPORTIONALITY EMERGED THROUGH EVOLUTION

Using the model described in the preceding section, we analyzed the response of the component concentrations to the environmental change from the original condition. Here, the environmental condition is given by the external concentration *s*_{1},…, *s*_{m}. We then changed the condition to ,where *ε* is the intensity of the stressor and denotes the vector of the new, stressed environment, in which the values of component were determined randomly to satisfy For each environment, we computed the reaction dynamics of the cell, and computed the concen-tration from the steady growth state to obtain the logarithmic change in the concentration ; the change in growth rate *μ*, designated as *δμ*^{(ε,E)}, was computed at the same time.

We next determined whether changes satisfy the common proportionality trend across all components toward a variety of environmental changes, and tested whether the proportion coefficient is consistent with *δμ*^{(ε,E)}, relative to all the possible ranges in 0 *≤ ε ≤* 1. We examined the proportionality for the network both before and after evolution, i.e., the random network and the evolved network, under the given environmental condition.

First, we computed the response in expression to the same type of stress, i.e., the same vector **E** with different intensities *ε*. Fig. 2(a) shows the coefficients of correlation between the changes in component concentrations and caused by different magnitudes of environmental change (*ε*_{2} = *ε*_{1}+*ε*, at *ε >* 0). This result was obtained using a variety of randomly chosen environmental vectors **E**. Asshown, for a small environmental change (*ε* = 0.02), the correlation is sufficiently strong both for the random and evolved networks, whereas for a larger environmental change (*ε* =. 08), the correlation coefficients are significantly smaller for the random networks. Fig. 2(b) shows the relation between the ratio of the growth rate changes and the slope in obtained by fitting the concentration changes of all components. Fig. 2(c) shows the ratio of the slope to (which turns to unity when Eq. (1) is satisfied) as a function of the magnitude of environmental change *ε*_{1}. These results clearly showed that for a small environmental change, the relation in Eq. (1) is valid for both the random and evolved networks, while for the large environmental change, Eq. (1) holds only for the evolved networks.

Next, we examined the correlation of concentration changes across different types of environmental stressors: the main topic of the present paper. Fig. 3(a) depicts examples of obtained by three networks from different generations. For initial random networks, there is no correlation, whereas a modest correlation emerges in the 10th generation. Later, with evolution, common proportionality takes shape as seen for instance in the 150th generation in Fig. 3(a), where the proportionality reaches more than two digits. To demonstrate generality of the proportionality over a variety of environmental variations, we computed the coefficients of correlation between and for a random choice of different vectors **E**_{α}and **E**_{β}. Fig. 3(b) shows the distributions of correlation coefficients obtained by the random network and evolved network (150th generation). Furthermore, Fig. 3(c) shows how they change as a function of generation: The correlation coefficients increase with generations, and those for the evolved network are much greater than coefficients in the initial random networks at generation= 0 and are closer to unity. It is remarkable that global proportionality develops even for different environmental conditions that do not happen in the course of evolution.

Next,the relation between and the slope inis presented in Fig. 3(d), while the ratio of the slope in to is shown in Fig.3(e) as a function of *ε*_{2}. The agreement of the slope of *δX* with the growth rate change given by Eq. (1) is much more remarkable for the evolved networks than for the random networks.

The global proportionality over all components across a variety of environmental conditions sug-gests that data across different environmental conditions are constrained to changes along one-dimensional manifold after evolution (under a single environmental condition). To examine this possibility, we analyzed the changes in across a variety of environmental changes **E** and *ε* by principal component analysis. Fig. 4 clearly reveals that in the evolved network, high-dimensional data from are located along a one-dimensional curve, while the principal-component plot is consistent with growth rate *μ* rather well (see Fig. 5). (Note that the contribution of the major-axis mode [i.e., the first principal component] reaches 74% in the data.) In contrast, the data from the random network are scattered and no clear structure is visible.

After that, we examined the evolutionary course of the phenotype projected on the same principalcomponent space as that depicted in Fig. 6. As shown in Fig. 6(a), the points from *{X*_{j} *}* generated by random mutations to the reaction network are again located along the same one-dimensional curve. Furthermore, those obtained by environmental variation or via noise in the reaction dynamics also lie on this one-dimensional curve, as shown in Fig. 6(b). Hence, the phenotypic changes are highly restricted, both genetically and nongenetically, within an identical one-dimensional curve.

As shown in Figs. 4 and 6, the variation of concentration owing to perturbations is much larger along the first principal component than along other components. This finding suggests that relaxation is much slower in the direction of the first component than in other directions. To confirm that the first-principal-component mode has a predominantly slow time scale, we estimated the relaxation time of autocorrelation function for each principal component. To be specific, we computed the score of the *i*th principal component at the *j*th cell division during steady growth, and from the time series of component scores, we derived autocorrelation function. Then, we estimated relaxation time *τ*_{i} by fitting the autocorrelation function to exp(*-t/τ*_{i}) with a constant, .The time scales for each principal component are not substantially different for initial generations. As shown in Fig. 7, however, in the evolved network (i.e., generation=120, 150), the relaxation time of the first principal component is significantly larger than that of the others. This means that the fluctuation of cellular state under the influence of noise tends to be dominated by slow relaxation along the first component, whereas the fluctuations along orthogonal directions are relaxed much faster.

In summary, we observed emergent global proportionality beyond trivial linearity in response to even tiny environmental changes. After evolution, the linearity region expands to a level with an order-ofmagnitude change in the growth rate. Notably, the proportionality over different components across different environmental conditions is enhanced through evolution. Changes in high-dimensional phenotype space across a variety of environmental conditions, genetic variations, and due to noise are nearly confined to a single one-dimensional manifold, i.e., the dimensionality in dynamics is strongly reduced from high-dimensional phenotypic space. This one-dimensional line given by the first principal-component axis highly correlates with the growth rate, and therefore the changes caused by environmental and evolutionary perturbations are constrained to changes along the same manifold.

## DOMINANCE OF THE MAJOR AXIS CORRESPONDING TO THE GROWTH RATE

To formulate the observed global proportionality, we advanced the following hypothesis and discussed its consequences: Both environmentand evolution-induced changes in a high-dimensional phenotype space are constrained mainly to points near a one-dimensional major axis in principal component analysis [39]. Phenotypic dynamics are slower along this axis than across it. The axis forms a ridge in the fitness landscape. Evolution progresses along this axis, whereas the fitness function is steep across the axis. Indeed, dominance of the first principal mode in phenotypic change emerges after evolution simulation of the cell model (see Fig. 4). Moreover, expression data from bacterial evolution studies support this hypothesis (see Fig. 6 in ref. [18]).

Although there is no mathematical derivation of this hypothesis, it may be plausible, considering the evolutionary robustness of phenotypes. In phenotypic evolution, it is natural to assume that (1) the dynamics shaping phenotypes are high-dimensional and complex and are subject to stochastic perturbations, (2) dynamics (or genetic rules that govern them) shaping higher fitness states are rarer [42]. Then, via evolution, dynamics leading to a higher fitness state will achieve robustness (resistance to perturbations), as discussed elsewhere [21, 22]. Accordingly, in the state space, in order to shape phenotypes, there will be a flow from most directions to reach the selected phenotype, as shown schematically in Fig. 8. On the other hand, during the generations while the evolution is in progress, the states shift, thus yielding a further higher fitness state with mutational changes in the dynamics. Therefore, along the direction of such (mutational) perturbation of dynamics, phenotype states can be changed rather easily by perturbations. This scheme leads to the situation schematically shown in Fig. 8, i.e., only in the direction in which evolutionary change proceeds, is the relaxation slow, while for other directions orthogonal to it, change is much faster [40].

Now let us consider the relaxation dynamics to the attractor in a given generation. The magnitude of (negative) eigenvalues except for one (or a few) eigenvector(s) will be large, while that along the direction in which evolutionary change has occurred (and will proceed) will be much closer to zero, i.e., the relaxation in this direction is slower. Accordingly, the variance along the largest principal component will be dominant, as confirmed in Figs. 4 and 6 for numerical evolution of the model.

Now, by following the dynamical-system representation of phenotypic dynamics during steady growth [17], we will formulate the above hypothesis as follows. First, let us consider the high-dimensional phenotypic dynamics
where *x*_{i} (*i* = 1, 2, .. ,*N*) is the concentration of components (e.g., proteins) in a cell, and *μ* is the growth rate of the cell. By means of *X*_{i} = log(*x*_{i}) and *x*_{i}*F*_{i} = *f*_{i}, the above equation is rewritten as

Thus, the stationary solution for a given environmental parameter vector **E** is expressed as

By linearizing the equation around a given stationary solution using the Jacobi matrix **J** and susceptibility vector *γ*_{i} = *∂F*_{i}*/∂***E**, we get

Thus, because of **L** = **J**^{−1},
where **I** is a unit vector (1, 1, 1, ..1)^{T}.

Let us denote eigenvalues of matrix **L** as *λ*_{k}, with the corresponding right (left) eigenvalues **w**_{k} (**v**_{k}),respectively. Then matrix *L* is represented by .

Hence, we get

The major-axis hypothesis postulates that the magnitude of the smallest eigenvalue of **J** (which is denoted as *k* = 0) is much smaller than that of the others (or the absolute eigenvalue of **L** = **J**^{−1} is much greater than that of the others), whose eigenmode dominates the long-term phenotypic dynamics, so that the change in *δ***X** is nearly confined to the major axis corresponding to its eigenvector (see the schematic in Fig. 8, and see Figs. 4—7 for possible numerical support).

Thus, to study the major response to environmental changes, only the largest eigenvalue of *L* (i.e., the smallest eigenvalue of **J**) with the corresponding eigenvector **w _{0}** is dominant. The major change in the dynamics is projected onto this axis

**w**, as in the collective motion for a macroscopic variable in statistical physics [41]. In other words, the one-dimensional major axis along which

_{0}*δ*

**X**are located, as uncovered in the preceding section (see Fig. 4) corresponds to

**w**. Using the reduction to this mode

_{0}**w**, we get

_{0}Hence, across different environmental conditions, the following relation is obtained:

This relation results in proportionality of *δX* across all components *j*. Reduction to the change along **w _{0}** explains the proportionality across all components, over different environmental conditions. Now, we will discuss the proportion coefficient. By setting

*δE*=

*δμ/α*(

**E**) assuming linearity, we get

Therefore, the proportion coefficient tightly correlates with *δμ*, but there is a correction from *δμ* by the factor (1 *-* **v _{0}**

*γ*(

**E**)

*/*(

*α*

**v**)). Nevertheless, this correction term would not be so large considering that the projection to the major axis is

_{0}I**w**. Because the dynamics are robust from various directions except for major axis

_{0}**w**, perturbations to original dynamics except for this direction should decay rapidly. In other words, the change in the dynamics

_{0}*d*

**X**

*/dt*=

**F**(

**X**;

**E**) caused by a perturbation may be large, so that the original state is regained (see Fig. 8). Then it is expected that

*γ*, sensitivity of

**F**to an external change, takes a larger value for directions other than

**w**, i.e.,

_{0}*γ*vector consists mostly of components of

**w**at

_{j}*j*≠ 0. Given that

**v**= 0 for

_{0}w_{j}*j*≠ 0, this state of affairs suggests that (

**v**·

_{0}*γ*) will be negligible. Hence, the correction term will be reduced to unity, independently of the direction of environmental perturbation. Thus, Eq. (1) follows: the proportion coefficient across different environmental conditions is approximately consistent with the ratio of the growth rates.

Here, we should note that under the assumption **v _{0}**·

*γ~*0, we do not need the linearity between

*δμ*and

*δ∊*to derive the linear relation between

*δ*

**X**and

*δμ*. From Eq. (9), we directly obtain the relation without requiring the linear approximation

*δμ ∝ δε*. In this scenario, even though

*δμ*and

*δ*

**X**show nonlinear dependence on

*δ∊*, the agreement of the slope across the component changes

*δX*

_{j}with the change in the growth rate

*δμ*is derived. Indeed, the proportionality in the simulation results described in the last section holds up for the regime in which

*δμ*is no longer proportional to stress intensity

*δ∊*(Fig. 9).

The above calculation based on linear algebra is also interpreted as follows (with possible nonlinear extension by means of the stable manifold). Indeed, according to the numerical results in Figs. 4— 7 and the schematic picture in Fig. 8, the change occurs along one-dimensional manifold **W**, which corresponds to the first principal coordinate and is a nonlinear extension of **w _{0}**, whereas the variation of the environmental change

*δE*is expressed in coordinate

*W*as

*δW*=

*W*(

**E**+

*δE*)

*-W*(

**E**). Then, after introduction of the projection from manifold

**W**onto

*X*

_{j}as , the variation satisfies

Because is determined only by the vector **W** and independent of environmental vectors **E**, the global proportionality in *δX*_{j} across components *j* is obtained. Here, we should note that the change along **W** is parametrized by growth rate *δμ* because it has tight one-to-one correspondence with the principal coordinate. Then *δW* is represented as a function of *f* (*δμ*). Given that *δW∝ δμ*(**E**) for small *δμ*, linearization leads to Eq. (11).

To sum up, under the major-axis hypothesis, we could explain two basic features that have thus far been observed in experiments and simulations:

(1) **Overall proportionality is observed in expression level changes across most components and across a variety of environmental conditions**. This is because high-dimensional changes are constrained to changes along the major axis, i.e., eigenvector **w _{0}**.

(2) **There is an extended region for global proportionality**. Because the range in the variation along **w _{0}** is large, the change in phenotype is constrained to points near this eigenvector, so that the proportionality range of phenotypic change is extended via evolution. Furthermore, as long as the changes are nearly confined to the manifold along the major axis, global proportionality reaches the regime nonlinear to

*δ∊*.*

## CONSEQUENCES OF THE THEORY FOR PHENOTYPIC EVOLUTION

**Congruence of phenotypic changes due to evolutionary and environmental changes**

The phenotypic changes constrained to the small space around the major axis are shaped by evolution, which further constrains any subsequent evolutionary potential. Because both the environmentally and genetically induced changes of phenotype, *δ***X**(**E**) and *δ***X**(**G**), respectively, are constrained to changes along the same axis, they inevitably correlate (recall Fig. 6). Consequently, if we apply the argument for *δ***X**(**E**) to evolutionary (genetic) change *δ***X**(**G**), then the two changes are given by

Following the projection to the major axis again (and taking only the eigenmode), we get

This reasoning shows proportionality between *δ***X**(**E**) and *δ***X**(**G**) across all components. As for the proportion coefficient, there may be a slight correction from *δμ* due to the factor . Nonetheless,this value is not expected to be so large as long as the projection of the γ vector to the major axis is not so large as already discussed in the preceding section (also recall Figs. 6 and 8, which show that the major phenotypic changes caused by environmental and genetic perturbations are located near the same manifold **W**). Then,
holds. This proportionality between environmentally and evolutionarily induced changes, in the form of Eq. (15), is reported both in bacterial evolution experiments and numerical evolution of a cell model [18]; this pattern is well explained by the present theory.

**Fluctuation relation**

Because changes due to noise and genetic variations progress along the common major axis, the phenotypic variances due to the former (*V*_{g}) and those due to the latter (*V*_{ip}) are proportional across all components. In fact, given that relaxation of phenotypic changes is much slower along major axis **W**, the phenotypic fluctuations due to noise are nearly confined to this axis. Thus, the variance of each component *X*_{i} is given by the means of the variance in variable *W* of the principal mode (along the major axis) due to noise,

Likewise, the variation due to the genetic change of each component is mostly constrained to points near the axis, so that
where (**w**_{0})_{i} is the *i*th component of **w _{0}** (i.e.,

*δW*(

**w**)

_{0}_{i}is in agreement with the projection of

*δX*

_{i}onto the major axis

**w**). Hence, holds across components.

^{0}Note that in the derivation of relation (18), the growth rate term is not involved. What we need is simply the dominance of the major-axis mode **w _{0}** in the expression changes. Then, the major variance in the high-dimensional expression change occurs along the single dimension

**w**, irrespectively of the source of perturbations, e.g., environmental change, noise, or genetic variation. Indeed, aside from the numerical evolution of the present cell model with the catalytic and reaction network [18], this relation is also observed in the gene regulation network model where the growth rate dilution is not included [42], whereas some experimental data on

_{0}*Drosophila*[43] and yeast [44, 45] support relation (18) at least partially. The present theory accounts for the origin of the empirical relation as a direct consequence of the hypothesis about the dominance of major-axis mode, in which changes incurred both by (environmental) noise and (genetic) mutations are constrained to the small space around the axis. Thus, agreement with simulation results can provide further support for the hypothesis.

Here, recall that the evolutionary rate of a phenotype is proportional to its variance due to genetic change, *V*_{g} [46, 47]. Hence, the *V*_{g} − *V*_{ip} relation suggests that it is easier for a phenotype with larger *V*_{ip} to evolve under selective pressure. Thus, the specific phenotype that is most likely to evolve is predetermined according to the variability due to noise prior to application of the mutation selection process.

## DISCUSSION

In the present paper, we first confirmed the global proportionality across protein expression levels toward a variety of environmental conditions using the experimental data from a bacterial transcriptome and proteome. The rate of change in logarithmic concentrations is rather well consistent with that of the cellular growth rate. By carrying out an evolutionary simulation of a cell model with the catalytic reaction network, we showed that such proportionality is shaped through evolution, where expression changes in the high-dimensional state space of the phenotype are constrained to changes along a major axis that corresponds to the growth rate.

These results are explained by the assumption that most of the expression changes are constrained to points near the major axis **w _{0}**. This axis

**w**is the eigenvector for the smallest eigenvalue of the Jacobi matrix in the expression reaction dynamics; accordingly, the variation in phenotype

_{0}*δ*

**X**is larger along this axis. Biologically speaking, this

**w**might be represented by a concentration change of a specific component, but more naturally, this axis could be represented by a collective variable generated from the concentrations (expression levels) of many components. In other words, changes of the cellular state that progresses in a high-dimensional space are nearly confined to changes along this major axis

_{0}**w**.

_{0}Note that once the dominance in major axis **w _{0}** is shaped by evolution, linearization against environmental condition

*δ*

**E**is no longer necessary. Furthermore, as long as the dominance is shaped through evolution under given fitness [42], the global proportionality holds even if fitness is not determined by the growth rate. In fact, using the cell model that we described here but selecting higher concentration of a given target chemical as the fitness for selection (instead of the growth rate), we again observed global proportionality over all components after evolution. In fact, as shown in Fig. S2(a), the correlation of concentration changes across different types of environmental changes increases with generations, indicating that global proportionality for different environmental conditions is achieved. After the evolution, the slope in (with random selection of different vectors

**E**

_{α}and

**E**

_{β}and magnitudes

*ε*

_{1}and

*ε*

_{2}) is in good agreement with the ratio of fitness change

*δf*

^{(ε1, Eα)}/

*δf*

^{(ε2, Eβ)}, where

*δf*

^{(ε,E)}indicates the change of the fitness (i.e., the concentration of target chemical) observed in the corresponding environment (Fig. S2(b)). The principal component analysis also supported the notion that concentration changes caused by random environmental perturbations are confined to a low-dimensional curve (Fig. S2(c)), where the first principal component corresponds to fitness (Fig. S2(d)). These results in Fig. S2 strongly indicate that the arguments in the present paper are valid during evolution of given fitness.

Because the phenotypic change due to noise or environmental or evolutionary change is mainly constrained and larger along this major axis **w _{0}**, the phenotypic properties along this axis can be considered plastic (see Figs. 4 and 6), while the robustness of fitness (resistance to mutation and noise) develops [21, 22, 42] in the course of evolution under a fixed condition, as discussed in the nearly-neutral theory by Ohta [48]. The environmental or genetic perturbation is buffered to the change along the major axis,

*δ*

**X w**, while the fitness (growth rate) change achieves robustness after evolution. Indeed, the compatibility of plasticity with robustness remains one of the major issues in biology [49–54].

_{0}The question remains as to why such an axis, along which the change is much larger than that occurring in other directions, is shaped by evolution. As already discussed, robustness is shaped by evolution, so that stronger attraction is achieved in the state space for most directions. By contrast, along the axis in which phenotypes change via evolution, plasticity, i.e., changeability, remains. Consequently, we argued that phenotype dynamics in various directions except for those relevant to the evolutionary changes are expected to undergo strong contraction as evolution progresses. This result leads to the dominance of change along the major axis, as discussed here.

Note that the dominance of changes along the major axis, once it is shaped by evolution, can facilitate further evolution. If there were no such directions, and instead random mutations in many directions were the main source of phenotypic change, the changes would mostly cancel each other out, so that the phenotypic changes corresponding to specific fitness growth would be suppressed, resulting in the reduction of evolvability. Instead, if all the changes are confined to the mode along the major axis **w _{0}**, then a certain degree of phenotypic change would be assured to foster subsequent evolution.

In fact, the variation in *X* due to environmental or evolutionary change remains rather large. Both the principal component analyses of bacterial evolution [18] and the results of the present study suggest that the evolutionary course in gene expression patterns can be described by one (or a few) major degrees of freedom, and this “major axis” extracted by the principal component analysis highly correlates with the growth rate of the cell.

In the present paper, we discuss the emergence of a dominant mode of evolution in a fixed environment. One might wonder whether such single mode forms if the evolution progresses during fluctuation across several environmental or fitness conditions. In this case, one would expect, that not only a single mode but multiple principal components would make a larger contribution through evolution. For example, when we consider evolution under the influence of a variety of antibiotics, reduction to a single dominant mode may be too simple. Still, we naturally expect the reduction to a few modes, in accordance with the notion of robustness shaped by evolution. Indeed, one study on antibiotic-resistant *E. coli* strains revealed that the resistance to 26 antibiotics via various mechanisms of action can be quantitatively represented by 7 or 8 modes represented by expression levels of a few thousand genes [55]. This result suggests that there is a small number of major modes that govern high-dimensional expression dynamics leading to antibiotic resistance.

Formation of such low-dimensional structure from a complex high-dimensional state space was recently discussed in terms of a variety of biological problems, such as neural-network dynamics [56], protein dynamics [57], and laboratory ecological evolution [58], whereas separation of a slow eigenmode has also been discussed for biological data [59] and simulations [60]. Formation of a dominant mode through evolutionary robustness will be important for discussing universality of such a dimension reduction in biological systems.

To sum up, the present study opens up the possibility to describe adaptation and evolution in response to environmental changes by only a few collective variables. By means of this representation, evolvability, i.e., the changeability of a phenotype via evolution, can be quantitatively characterized. This changeability is also represented by phenotypic fluctuations due to noise or environmental variation, as in the evolutionary fluctuation—response relation [29, 42]. In other words, the phenotype that is more evolvable is predetermined before a genetic change occurs; this mechanism is a manifestation of the genetic assimilation concept proposed by Waddington [49].

Of course, further studies are needed to confirm the generality of the evolution of dominance of majoraxis mode, both in experiments and numerical simulations. As discussed above, during evolution in terms of given fitness, which is not necessarily the growth rate, dominance of the major-axis mode is shaped, thereby leading to the global proportionality of concentration changes (see Fig. S2). Moreover, we previously studied gene regulation networks in which the expression of many genes mutually activates or suppresses the expression of other genes [22, 42]. Preliminary results suggest that global proportionality is a general property of evolved networks; that is, all expression levels change in proportion up to a large change in fitness, and the expression changes are located along a one-dimensional manifold, as represented by the principal component that correlates with fitness. In this case again, the dominance in the major-axis (or slow-manifold) mode itself seems to be a general consequence of evolution toward robustness [22], even without the use of growth rate *μ* as fitness. When global dilution by the growth rate is imposed and fitness correlates with the growth rate, proportionality of the component with *δμ* is achieved.

Another important future extension of our theory will be formulation not associated with the steady growth state. In cells, there are states in which growth is suppressed, such as the stationary state [1, 61, 62], and some theoretical approaches have been proposed to describe these nongrowth states [5]. It will be worthwhile to determine whether (or how) the incorporation of more degrees of freedom in addition to the major axis can capture the state with *μ ~* 0.

## Acknowledgment

The authors would like to thank Matthias Heinemann for useful discussions of his experiment. The authors are grateful to Tetsuhiro Hatakeyama, Pablo Sartori, and Bingkan Xue for stimulating discussions. This research was partially supported by the Platform for Dynamic Approaches to Living Systems from the Japan Agency for Medical Research and Development (AMED) [to K.K], Grant-in-Aid for Scientific Research (S) (15H05746 [to K.K. and C.F]), and by Grant-in-Aid for Scientific Research (B) (15H04733, 15KT0085 [to CF]) from the Japan Society for the Promotion of Science (JSPS). KK would also like to acknowledge Charles L. Brown Membership at the Institute for Advanced Study at Princeton.

## References

- [1].↵
- [2].↵
- [3].
- [4].
- [5].↵
- [6].↵
- [7].
- [8].
- [9].
- [10].↵
- [11].↵
- [12].
- [13].
- [14].
- [15].
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].
- [28].
- [29].↵
- [30].↵
- [31].↵
- [32].
- [33].
- [34].
- [35].↵
- [36].
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].
- [51].
- [52].
- [53].
- [54].↵
- [55].↵
- [56].↵
- [57].↵
- [58].↵
- [59].↵
- [60].↵
- [61].↵
- [62].↵
- [63].↵