Abstract
Understanding the allocation of the cellular proteome to different cellular processes is central to unraveling the organizing principles of bacterial physiology. Proteome allocation to protein translation itself is maximally efficient, i.e., it represents the minimal allocation of dry mass able to sustain the observed protein production rate. In contrast, recent studies on bacteria have demonstrated that the concentrations of many proteins exceed the minimal level required to support the observed growth rate, indicating some heterogeneity across pathways in their proteome efficiency. Here, we systematically analyze the proteome efficiency of metabolic pathways, which together account for more than half of the E. coli proteome during exponential growth. Comparing the predicted minimal and the observed proteome allocation to different metabolic pathways across growth conditions, we find that the most costly biosynthesis pathways – those for amino acid biosynthesis and cofactor biosynthesis – are expressed for near optimal efficiency. Overall, proteome efficiency increases along the carbon flow through the metabolic network: proteins involved in pathways of nutrient uptake and central metabolism tend to be highly over-abundant, while proteins involved in anabolic pathways and in protein translation are much closer to the expected minimal abundance across conditions. Our work thus provides a bird’s-eye view of metabolic pathway efficiency, demonstrating systematic deviations from optimal cellular efficiency at the network level.
Importance
Protein translation is the most expensive cellular process in fast-growing bacteria, and efficient proteome usage should thus be under strong natural selection. However, recent studies show that a considerable part of the proteome is unneeded for instantaneous cell growth in E. coli. We still lack a systematic understanding of how this excess proteome is distributed across different pathways as a function of the growth conditions. We estimated the minimal required proteome across growth conditions in E. coli and compared the predictions with experimental data. We found that the proteome allocated to the most expensive internal pathways, including translation and the synthesis of amino acids and cofactors, is near the minimally required level. In contrast, transporters and central carbon metabolism show much higher proteome levels than the predicted minimal abundance. Our analyses show that the proteome fraction unneeded for instantaneous cell growth decreases along the nutrient flow in E. coli.
Introduction
Proteins account for more than half of the cell dry mass in E. coli (1) and drive most biological processes. How and why the proteome is allocated to different cellular processes and pathways is a vital question for understanding the principles behind bacterial physiology (2). Proteome allocation to different groups of genes is growth rate-dependent (3). When partitioning the proteome into specific, coarse-grained “sectors”, the corresponding proteome fractions follow simple, empirical growth laws, increasing or decreasing linearly with the growth rate μ (4–7). For example, the proteome fraction allocated to the ribosome and ribosome-affiliated proteins (the R-sector (6)) scales as a linear function of growth rate under nutrient-limiting conditions (4).
Why does the proteome composition scale with the growth rate? Protein is the most abundant and costly macromolecule in bacterial cells. It has thus been speculated that the proteome composition is adjusted to the specific growth condition to maximize the growth rate (8). If this were true, all protein concentrations would be at the minimal level required to sustain the observed cellular growth rate. This simple assumption has been widely used in computational models of cellular growth (9–15). However, even if proteome allocation had evolved to be maximally efficient, it is not obvious that this efficiency would simply maximize the instantaneous growth rate. Instead, it appears likely that proteome allocation has evolved to maximize cellular fitness in unpredictable, dynamic environments with varying nutrients and involving periods of famine and stresses (8). Indeed, recent experimental work indicates that the proteome is not expressed for maximal efficiency in unevolved E. coli strains, at least not in the naïve sense of maximizing the instantaneous growth rate. First, a large fraction of the expressed proteome is unneeded for the current environment, especially at low growth rates (16). Second, the growth rate can increase by approx. 20% in a few hundred generations in adaptive laboratory evolution experiments on minimal media (17), a process associated with reductions in the abundance of unused proteins (16). Finally, the fluxes through some cellular processes, e.g., nutrient transport and energy production, are not limited by specific proteins in these pathways at low growth rates (18). Thus, E. coli proteome allocation seems not to be globally optimized for maximizing the instantaneous growth rate.
On the pathway level, however, proteome allocation to at least one cellular process – protein translation – is optimized for maximal efficiency at the given protein synthesis rate (18–21). This indicates that while the global allocation of proteins is not always optimized for maximal growth rate, the proteome allocation to some cellular pathways is at a local optimum – i.e., the individual pathway utilizes the minimal protein mass required to support the observed pathway output. In contrast, proteome allocation to transporters scales contrary to the optimal demand with decreasing growth rate in E. coli: at increasingly lower growth rates, bacterial cells express more and more transporters for nutrients that are currently not available (18, 22).
Why do cells optimize resource allocation to certain pathways (translation) but not others (transporters)? From a cellular pathway topology perspective, transporters are located at the interface with the environment, while proteins for translation are located at the end of nutrient flow. Bacteria such as E. coli are living in constantly changing environments, but have only a very limited ability to sense external nutrient levels. Therefore, transporters should not only transport enough nutrients for cell growth under the current conditions, but also allow the cell to quickly import alternative substrates that become available in upcoming conditions. To maximize fitness across changing environments, it is plausible that bacteria growing on preferred nutrients should invest much of their resources into proteins required for instantaneous growth, while bacteria growing on unpreferred nutrients should allocate more resource to the preparation for future environments. Unlike transporters, translation proteins are located in the interior of cellular processes and rarely have direct connections to the environments. Moreover, in contrast to sensing the large number of potential nutrients and their combinations, sensing an increased or decreased demand of protein production is essentially a one-dimensional problem. Thus, the cell might have evolved a simple and efficient way to regulate the expression of the translation machinery to the minimal required level for instantaneous cell growth, rather than making them dependent on specific nutrients. Indeed, in E. coli, the ribosomal genes are mainly regulated by the concentration of a single molecule species, ppGpp (23, 24).
Based on these observations, we speculated that more generally, the proteomic efficiency of pathways might depend on their positions in the metabolic network. We hypothesized that proteome efficiency – defined as the ratio between minimally required and observed protein concentrations – increases along the carbon flow, from transporters at the network periphery to translation at the network core. In E. coli growing on minimal media, more than half of the proteome by mass is metabolic enzymes (22). Computational models can predict the optimally efficient proteome allocation to each metabolic pathway (9, 12, 14, 16), and quantitative proteomics data is available for E. coli growing on a wide range of minimal media with different carbon sources (22). To test our hypothesis, we exploit these resources to compare experimental data across diverse minimal media conditions (22) to the predicted optimal pathway expression at the observed growth rate. As expected, we find that pathways differ systematically in how much excess protein mass is allocated to them compared to the local optimum, with decreasing excesses over optimal allocation along the carbon flow from nutrient import to protein production.
Results and Discussion
Modeling proteome allocation with linear enzyme kinetics and growth-rate dependent biomass composition
To analyze local pathway efficiency, we first predict the local optima of all metabolic enzymes with an improved version of FBA with molecular crowding (9, 15). We modelled E. coli metabolism based on the constraint-based iML1515 model (25). The standard model assumes a constant composition of biomass across conditions. As the RNA/protein mass ratio (4) and the cell surface/volume ratio (26) can be expressed as functions of growth rate under the investigated conditions, we re-formulated the biomass function of iML1515 with growth rate-dependent contents of RNA, protein, and cell envelope components (murein, lipopolysaccharides, and lipid) (see Methods, Suppl. Fig. S1, and Suppl. Table S1).
We performed calculations using MOMENT (MetabOlic Modeling with ENzyme kineTics) (9, 27, 28), a version of flux balance analysis (FBA) with molecular crowding (15). Similar to other constraint-based approaches (12, 14), MOMENT estimates the enzyme concentration required to support a given flux v_i as [E_i] = v_i / k_i, where k_i is the effective turnover number of the enzyme. This effective turnover number was assumed to be constant across conditions, a zero-order approximation to the true growth rate-dependence (29). Maximal in vivo effective enzyme turnover number (kapp,max) represents turnover in the cellular environment better than in vitro estimates of enzyme turnover numbers (kcat) (30, 31). We thus parameterized the reactions of the iML1515 model with the kapp,max from Ref. (31) by replacing the original kcat (27) when kapp,max was available. For reactions for which neither kapp,max nor kcat was available, we used enzyme turnover numbers predicted by machine learning (31) in the simulation (see Suppl. Table S2 for the enzyme turnover numbers and their sources). Most enzymes in the metabolic model have experimentally estimated parameters (kapp,max or kcat); these account for ~70% of total enzyme by mass in the whole metabolic network, and for ~80% when excluding transport reactions (Suppl. Fig. S2).
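The relation between flux and enzyme demand can be illustrated with a minimal sketch. The function name, the unit conventions (flux in mmol/gDW/h, turnover number in 1/s, molecular weight in g/mol), and the two reactions with their numbers are assumptions made for illustration only; they are not taken from the iML1515 model or from MOMENT itself.

```python
def enzyme_mass_demand(flux_mmol_gdw_h, k_eff_per_s, mw_g_per_mol):
    """Minimal enzyme mass (g/gDW) needed to carry a flux, assuming [E] = v / k."""
    if k_eff_per_s <= 0:
        raise ValueError("turnover number must be positive")
    # Convert k from 1/s to 1/h so that v / k is in mmol enzyme per gDW.
    enzyme_mmol_gdw = abs(flux_mmol_gdw_h) / (k_eff_per_s * 3600.0)
    # mmol/gDW * g/mol = mg/gDW; divide by 1000 to obtain g/gDW.
    return enzyme_mmol_gdw * mw_g_per_mol / 1000.0


# Hypothetical reactions (illustrative values, not from the study).
reactions = {
    "rxn_A": {"flux": 10.0, "k": 100.0, "mw": 50_000.0},
    "rxn_B": {"flux": 2.0, "k": 5.0, "mw": 40_000.0},
}

demand = {
    name: enzyme_mass_demand(r["flux"], r["k"], r["mw"])
    for name, r in reactions.items()
}
total_demand = sum(demand.values())
```

In this toy case the slow enzyme (rxn_B) dominates the total demand despite carrying the smaller flux, which is why the choice of turnover numbers matters for the predicted allocation.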
With the growth rate-dependent biomass function and updated enzyme turnover numbers, we identified the minimal total mass concentration of enzymes and transporters (in units of gram per gram of dry weight, g/gDW) that can support the observed growth rate on the given carbon source (see Methods; the predicted and measured concentrations of individual proteins are listed in Suppl. Table S3). Thus, our predictions do not reflect globally optimal resource allocation, but quantify the minimal proteome allocation into pathways required to sustain the observed growth rate (local optimality). Note that the calculation of required concentrations assumes that all enzymes are fully saturated with their substrates; this means that our estimates provide a lower bound of proteome allocation into pathways, which is expected to deviate increasingly from the actual demand at lower growth rates (29).
Proteome efficiency increases along nutrient flow in coarse-grained pathways
Following earlier work (16), we first compared the predicted minimal required proteome with experimental data across the whole metabolic network. As E. coli uses different central metabolic reactions for growth on glycolytic and gluconeogenic carbon sources and most of the proteome data in Ref. (22) were measured on glycolytic carbon sources, we focus on the proteome efficiency of metabolic pathways on glycolytic carbon sources here; results for gluconeogenic carbon sources are shown in Suppl. Table S4. We classified proteins into three groups on the basis of their experimental and predicted expression. An individual protein is labeled as:
“shared” if its presence is predicted under local optimality and is confirmed in the experiment (these proteins were labeled “utilized” in Ref. (16));
“measured-only” if it is found in the experiment but predicted to be absent (these proteins were labeled “un-utilized” in Ref. (16));
“predicted-only” if its presence is predicted but not confirmed in the experiment.
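The three labels above can be expressed as a simple rule. The sketch below is only illustrative: the function name, the detection threshold `eps`, and the extra label "absent" (for proteins that are neither predicted nor measured) are assumptions that do not appear in the text.

```python
def classify_protein(predicted, measured, eps=0.0):
    """Label a protein by comparing predicted and measured abundance.

    `eps` is a hypothetical detection threshold; abundances at or below it
    are treated as absent.
    """
    is_predicted = predicted > eps
    is_measured = measured > eps
    if is_predicted and is_measured:
        return "shared"
    if is_measured:
        return "measured-only"
    if is_predicted:
        return "predicted-only"
    return "absent"  # not part of the paper's scheme; added for completeness


# Hypothetical abundances as (predicted, measured) pairs in g/gDW.
examples = {
    "protein_1": (0.004, 0.006),
    "protein_2": (0.0, 0.002),
    "protein_3": (0.001, 0.0),
}
labels = {name: classify_protein(p, m) for name, (p, m) in examples.items()}
```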
The predicted-only proteins account for only a very small fraction of the total predicted proteins (<1%) in all studied pathways, except for nutrient transport and proteins without assigned pathways in this study (“others”) (Suppl. Fig. S3). We thus do not include the predicted-only proteins in the following figures.
Metabolic enzymes account for a decreasing fraction of the proteome with growth rate, with observed proteome fractions ranging from 67% to 53% (Suppl. Fig. S4). In agreement with earlier work (16), we found that the total abundance of shared proteins – those required for maximally efficient growth – increases with growth rate, but far exceeds the predicted globally optimal abundance especially at lower growth rates (Suppl. Fig. S4).
To assess the pathway-specific proteome efficiency, we examined the following four measures.
(1) For a given pathway, we summed the mass concentrations of all shared proteins – those that are predicted to be active and found experimentally – in each growth condition, both for the observed proteins and for the locally optimal prediction. We then calculated the Pearson correlation coefficient r between the two combined mass concentrations across conditions (denoted as rpathway). For locally optimal proteome allocation, and if the assumption of constant enzyme saturation held, this correlation should approach r = 1, independent of enzyme kinetic parameter values.
(2) The geometric mean fold-error (GMFE) of predicted vs. observed protein concentrations of the pathway’s shared proteins (denoted as GMFEpathway), calculated across proteins and growth conditions. The GMFE shows by which factor the observed concentrations deviate from predicted values on average.
(3) The experimentally observed mass fraction of measured-only proteins of the pathway in a given growth condition (denoted as fmeasured-only). This is the proteome fraction that makes no contribution to growth according to our predictions.
(4) The squared Pearson correlation coefficient between predicted and measured abundances across individual proteins in a given growth condition (denoted as rindividual2). While measures (1)-(3) assess optimality at the pathway level, this last measure quantifies the relationships between proteins within the pathway: a value close to 1 indicates that all proteins are equally close to – or equally distant from – the optimal prediction. Note that in contrast to measure (1), the comparison across individual proteins relies strongly on the accuracy of the individual turnover numbers. As the latter are only known approximately, we expect these estimates to be noisy.
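The four measures can be sketched on a toy pathway as follows. This is a minimal illustration under stated assumptions: the GMFE is taken in its common form, exp(mean |ln(predicted/observed)|), which the text does not spell out; correlations are computed on untransformed values; and the four proteins, three conditions, and all abundances are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gmfe(predicted, observed):
    """Geometric mean fold-error (assumed definition, see lead-in)."""
    log_errors = [abs(math.log(p / o)) for p, o in zip(predicted, observed)]
    return math.exp(sum(log_errors) / len(log_errors))


# Hypothetical pathway: protein -> (predicted, measured), one dict per condition.
conditions = [
    {"P1": (1.0, 2.0), "P2": (2.0, 1.0), "P3": (3.0, 3.0), "P4": (0.0, 1.0)},
    {"P1": (2.0, 4.0), "P2": (4.0, 2.0), "P3": (6.0, 6.0), "P4": (0.0, 1.0)},
    {"P1": (3.0, 6.0), "P2": (6.0, 3.0), "P3": (9.0, 9.0), "P4": (0.0, 1.0)},
]


def shared(cond):
    """Pairs for proteins that are both predicted and measured."""
    return [(p, m) for p, m in cond.values() if p > 0 and m > 0]


# (1) Correlation of summed shared-protein mass across conditions.
pred_totals = [sum(p for p, _ in shared(c)) for c in conditions]
obs_totals = [sum(m for _, m in shared(c)) for c in conditions]
r_pathway = pearson_r(pred_totals, obs_totals)

# (2) GMFE across all shared proteins and conditions.
pairs = [pm for c in conditions for pm in shared(c)]
gmfe_pathway = gmfe([p for p, _ in pairs], [m for _, m in pairs])

# (3) Mass fraction of measured-only proteins in the first condition.
first = conditions[0]
f_measured_only = (
    sum(m for p, m in first.values() if p == 0 and m > 0)
    / sum(m for _, m in first.values())
)

# (4) Squared correlation across individual shared proteins, first condition.
r_individual_sq = pearson_r(
    [p for p, _ in shared(first)], [m for _, m in shared(first)]
) ** 2
```

In this constructed example the pathway totals agree perfectly across conditions, while individual proteins deviate by up to two-fold, so measure (1) is high even though measures (2) and (4) reveal disagreement at the level of single proteins.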
Table 1 shows the pathway proteome efficiency measures on glycolytic carbon sources, which are discussed in the following subsections.
To test if the proteome efficiency of pathways increases with carbon flow, we first assigned the metabolic proteins in the iML1515 model into four coarse-grained sets (see Methods, and Suppl. Tables S5 and S6 for pathway membership): (1) transporters, which shuttle metabolites across the outer or inner membrane; (2) central metabolism, which produces precursor metabolites and energy for all other cellular processes; (3) biosynthesis pathways, which utilize precursors and energy generated by central metabolism to produce building blocks of macromolecules; (4) other enzymes, that is, all enzymes in the iML1515 model not included in (1)-(3) (denoted as “others”; these proteins are not assigned to a specific position along the nutrient flow). The iML1515 model does not include a representation of translation processes. To provide a more complete birds-eye view of nutrient flow, we also included in our analyses the proteome efficiency of the translation machinery (predicted and measured expression of ribosome, elongation factor Tu, and elongation factor Ts) from our previous work (19), which was based on the same proteomics data analyzed here (22).
In these coarse-grained pathways, carbon and other nutrients flow from transporters to central metabolism to biosynthesis pathways to translation. For all four aspects assessed, the proteome efficiency gradually increases along the nutrient flow (Fig. 1 and Table 1): rpathway increases from −0.75 to 0.93, GMFEpathway decreases from 3.39 to 1.35, fmeasured-only decreases from 0.92 to 0, and rindividual2 increases from 0.13 to 0.98.
Fig. 1. Predicted and observed proteome allocation to (a) translation machinery, (b) biosynthesis pathways, (c) central metabolism, and (d) transporters. (e) Schematic diagram of nutrient flow and proteome efficiency.
Proteome allocation to translation is near the optimal prediction (Fig. 1a, Table 1), with no expression of unneeded proteins (fmeasured-only = 0), a very high correlation between observed and predicted total investment across conditions (rpathway2 = 0.87), a mean deviation between predicted and observed individual protein concentrations of only 35% (GMFEpathway = 1.35), and a strong correlation between observed and predicted individual protein concentrations (median across the 14 glycolytic conditions: rindividual2 = 0.98). The remaining discrepancy between measured and predicted data is largely caused by the presence of deactivated ribosomes and elongation factor Tu at the studied growth rates (19), which cannot be predicted by optimization.
Proteome allocation to biosynthesis pathways is quantitatively consistent with the predictions for shared proteins, i.e., those whose presence is both predicted and observed (Fig. 1b; rpathway2 = 0.84; GMFEpathway = 1.70; rindividual2 = 0.45). However, about a quarter of the biosynthesis protein mass present in the cell is not predicted (fmeasured-only = 0.26).
In central metabolism, the abundance of shared proteins is almost constant across growth rates in measured data, whereas it should increase with growth rate according to the predictions (Fig. 1c). Remarkably, the abundance of measured-only proteins is very high at low growth rates and decreases sharply with growth rate.
In stark contrast to all other pathways, the vast majority of transporters – more than 90% – are measured-only, i.e., the experimentally observed proteins are not part of the predicted optimal proteome (fmeasured-only = 0.92; Fig. 1d; see Methods for the treatment of carbon transporters). Moreover, proteome allocation to transporters decreases with growth rate in measured data (both shared and measured-only), whereas it increases with growth rate in the locally optimal predictions (rpathway = −0.75, p = 1.9×10⁻³). We note that when the concentration of a substrate is the limiting factor for cell growth, the optimal expression of its transporter increases with decreasing growth rate (10). Here, to compare transporters across growth conditions, we excluded all carbon transporters used in the studied conditions. Since these carbon sources are the only nutrients that differ across growth conditions (22), the data shown here represent the non-growth-limiting transporters, and their abundance indeed scales contrary to optimal demand. The true deviation from optimality may be smaller than this estimate due to the existence of many alternative transporters (25) and due to inaccurate turnover number estimates for transporters; only 24 out of 774 transport reactions have experimentally measured turnover numbers.
A large mass fraction of the proteins that cannot be assigned to one of the pathways described above (others) is also not expected to be present in the cell according to our predictions (fmeasured-only = 0.91; Suppl. Fig. S5a). About 40% of this unexpected protein mass is related to degradation pathways. At the same time, the abundance of shared proteins is similar to the predictions (GMFEpathway = 1.79).
In sum, proteome efficiency increases along the nutrient flow in the four coarse-grained pathways (Fig. 1e). Transporters represent the metabolic interface of the cell to the environment. In the absence of external sensors, the expression of a transporter for a potential nutrient is a necessary condition for its detection by the cell; thus, non-optimal transporter expression serves an important cellular function unrelated to steady-state growth. Central metabolism acts as a hub that connects all other pathways. When nutrients are transported into the cell, they either directly enter central metabolism, or they first need to be degraded by catabolism. For this reason, optimal proteome allocation to central metabolism is strongly environment-dependent. Just as is the case for transporters, keeping a certain fraction of central metabolism enzymes in standby for environmental changes will thus be beneficial in transitions between physiological states. Moreover, the optimal expression of central metabolism proteins would require detailed, environment-dependent regulation, which may be difficult to achieve without substantial cellular investment into sensing and regulation. In contrast, optimal resource allocation into translation and the biosynthesis (anabolic) pathways, which synthesize building blocks for the cell, is largely independent of nutrients across minimal environments, and depends almost exclusively on the growth rate. Their optimal regulation is thus a one-dimensional problem that requires only a sensor for growth rate itself, and can be implemented relatively easily. Consistent with this speculation, biosynthesis and translational genes are regulated by fewer transcriptional factors than transporters and central metabolic genes (Suppl. Fig. S6). At the same time, our observations are consistent with a reserve of unused biosynthesis enzymes at low growth rates (Fig. 1a and 1b), which can benefit the cell in fluctuating conditions (32, 33).
The most expensive biosynthesis pathways are consistent with optimality
To determine whether proteome efficiency varies among biosynthesis pathways, we further divided them into five sets: amino acid biosynthesis; nucleotide biosynthesis; cofactor biosynthesis; cell envelope component biosynthesis; and all other biosynthesis enzymes. The predicted proteome fractions of these pathways are almost linear functions of the growth rate (Fig. 2), as mostly the same reactions are expected to be used for biosynthesis across the studied minimal conditions.
See Suppl. Fig. S5b for biosynthetic proteins not covered here.
A large fraction of the proteome is allocated to amino acid biosynthesis pathways at high growth rates on minimal media (about 15%, Fig. 2a). Similar to the situation for translation, proteome allocation to amino acid biosynthesis pathways is strongly correlated with predictions (Fig. 2a; rpathway2 = 0.77; GMFEpathway = 1.40; rindividual2 = 0.45; Table 1). However, in contrast to translation, a sizeable proteome fraction for amino acid biosynthesis is invested into proteins not predicted to be active (fmeasured-only = 0.30).
For nucleotide biosynthesis pathways, predicted and observed abundances of shared proteins are also strongly correlated (rpathway2 = 0.67), but their magnitudes differ by more than 3-fold (GMFEpathway = 3.32; Fig. 2b). Moreover, the expression of individual enzymes in this pathway cannot be explained well by the predictions (rindividual2 = 0.15).
Cell envelope biosynthesis pathways encompass lipid, peptidoglycan, and lipopolysaccharide (LPS) biosynthesis. While predicted and observed expression of shared enzymes in these pathways show a statistically significant correlation (rpathway2=0.43; Table 1), the slopes of their growth rate dependences differ markedly. The observed proteome allocation is almost constant across growth conditions; in contrast, the predicted proteome allocation increases proportionally with growth rate (Fig. 2c). It is noteworthy that this disagreement does not stem from an incorrect assumption of constant biomass composition across conditions: our model explicitly accounts for the changing biomass fractions of cell envelope components (Methods), which are in particular due to changes in cell size. Theoretically, the predicted optimal proteome allocation should provide a lower limit on the required proteome investment; that predictions substantially exceed observed proteome allocation for cell envelope biosynthesis at faster growth suggests that one or more enzymes were assigned turnover numbers that are much lower than the true values.
Similar to amino acid biosynthesis pathways, cofactor biosynthesis pathways are also highly abundant at high growth rates (about 10% of the total proteome, Fig. 2d). Proteome allocation to cofactor biosynthesis pathways is highly consistent with the optimal predictions (rpathway2 = 0.84; GMFEpathway = 1.24; fmeasured-only = 0.11; rindividual2 = 0.59).
In sum, proteome efficiency varies substantially across biosynthesis pathways. While observed proteome investment only increases by roughly two-fold for amino acid, nucleotide, and cofactor biosynthesis and shows almost no increase in envelope and other biosynthesis pathways, predicted investment increases by almost a factor of 5.5 (which is the fold-change of growth rate across the examined conditions). At lower growth rates, we expect decreasing enzyme saturation (29) and thus a progressively stronger underestimation of the required proteome by the model; accordingly, Figs. 2a and 2d appear to be highly consistent with an optimal expression of the shared proteins of amino acid and cofactor biosynthesis pathways. On the other hand, proteome allocation to nucleotide, envelope, and other biosynthesis pathways (Suppl. Fig. S5b) appears to be sub-optimal.
Central metabolism: precursor metabolite and energy generation pathways appear not to be regulated for optimality
The enzymes of central metabolism show little systematic variation with growth rate, and their abundance is at most weakly correlated with the predicted concentrations (rpathway2 = 0.024; GMFEpathway = 2.32). To examine if individual pathways show a stronger agreement between observations and predictions, we examined six central metabolic pathways: glycolysis; pentose phosphate pathway; TCA cycle; energy generation pathways, comprising the electron transport chain and ATP synthase; glyoxylate shunt; and other central metabolic enzymes.
Proteome allocation to glycolysis increases markedly with growth rate and is strongly correlated with predicted values (Fig. 3a; rpathway2 = 0.63; fmeasured-only = 0.08). However, protein levels are substantially higher than predicted (GMFEpathway = 2.21). A potential reason for this discrepancy is that most of the reactions in glycolysis are reversible, while the simple approximation for enzyme activity used here (kcat) cannot capture the demand of enzymes close to thermodynamic equilibrium (34). Moreover, many of the enzymes in glycolysis are regulated allosterically (35), and may hence act at lower activities than assumed in the simulations.
See Suppl. Fig. S5c for central metabolic proteins not covered here.
The pentose phosphate pathway also shows significant signs of partial optimality: the measured abundance of shared proteins is close to and strongly correlated with the predictions (Fig. 3b; rpathway2 = 0.72; GMFEpathway = 1.3). However, measured-only proteins account for 39% of the pathway proteome.
Enzyme expression in the TCA cycle is decidedly non-optimal. The expression of shared enzymes decreases with growth rate, while predictions indicate it should increase (Fig. 3c; rpathway = −0.65). In addition, enzyme abundance is massively higher than predicted across all growth rates (GMFEpathway = 6.4). At the same time, measured-only proteins account for only a very small fraction of the pathway (fmeasured-only = 0.10), and the measured abundances of individual proteins are also correlated with the predictions (rindividual2 = 0.38, p = 0.03).
The proteome fraction allocated to energy generation pathways – comprising the electron transport chain and ATP synthase – is almost independent of the growth rate, while predictions increase with growth rate (Fig. 3d). Similar to the TCA cycle, measured-only proteins make up only a small fraction of the pathway (6%). E. coli fully oxidizes carbon sources to CO2 at low growth rates under aerobic conditions (aerobic respiration), while at high growth rates it only partially oxidizes some carbon sources – in particular glucose and fructose – resulting in the excretion of acetate (aerobic fermentation, leading to overflow metabolism). Along with the metabolic switch from aerobic respiration to aerobic fermentation, the TCA cycle is gradually down-regulated (36). In our predictions, aerobic fermentation is more efficient than aerobic respiration for all conditions, so that only aerobic fermentation was active in the predictions. However, even with a model that predicted the switch to fermentation, our conclusions would likely not change; this is because the switch would not affect lower growth rates, and because the predicted demand into the TCA cycle would only change slightly.
We were surprised to find that the proteins of the glyoxylate shunt (comprising AceA, AceB, and GlcB) are highly abundant at low growth rates (~12% of the proteome at μ = 0.12 h⁻¹; Fig. 3e), with a proteome fraction almost twice that of its alternative pathway, the TCA cycle (Fig. 3c). This high abundance at low growth rates does not appear to be specific to the BW25113 strain, as it is mirrored in the MG1655 strain (Suppl. Fig. S7a) (3, 37). Fluxomics data show that across many conditions with low growth rates, flux into the glyoxylate shunt is roughly equal to the flux into the TCA cycle (38–43) (Suppl. Fig. S7b). In contrast, the model predicts the glyoxylate shunt to be inactive except in growth on acetate.
In sum, proteome allocation to the pathways of central metabolism is not well explained by optimal proteome efficiency alone, at least not as far as can be discerned with the type of model employed here. This is particularly true for the metabolic switches from aerobic respiration to aerobic fermentation and from the glyoxylate shunt to the TCA cycle.
Utilization of alternative pathways cannot be explained by optimal proteome efficiency
With increasing growth rate, metabolic fluxes may shift between alternative pathways. For example, energy production from glucose switches from aerobic respiration to aerobic fermentation (overflow metabolism) (36). Consistent with previous studies (38, 40), we found that with increasing growth rate, flux gradually transitioned from the PEP-glyoxylate cycle to the TCA cycle (Suppl. Fig. S7).
Neither aerobic respiration nor the glyoxylate shunt is used in the predicted flux distributions. In constraint-based models, overflow metabolism emerges when a previously redundant, additional growth-limiting constraint becomes active (44). While there is evidence that overflow metabolism is rooted in a limit on proteome investment into catabolic enzymes (36, 45), this effect cannot be reproduced in mechanistic models without corresponding empirical adjustments. For example, one way of enforcing aerobic fermentation is to impose a decrease in proteome usage and an increase in energy production with increasing growth rate (36, 46); another is to allocate a constant empirical mass of proteins to energy production (47).
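How an additional constraint can switch the optimal pathway usage may be illustrated with a deliberately simple two-pathway toy problem. This is not the model used in this study: the function name, the ATP yields, the per-flux protein costs, and the proteome budget are all hypothetical values chosen so that respiration has the higher ATP yield per carbon while fermentation has the higher ATP yield per unit of protein. The optimum of this two-variable linear program is found by enumerating the vertices of the feasible region.

```python
def best_atp_allocation(uptake, budget,
                        y_resp=26.0, y_ferm=12.0,
                        c_resp=1.0, c_ferm=0.25):
    """Maximize ATP = y_resp*v_r + y_ferm*v_f subject to
    v_r + v_f <= uptake (carbon) and c_resp*v_r + c_ferm*v_f <= budget
    (proteome), with v_r, v_f >= 0. Returns the optimal (v_r, v_f).
    All default parameters are illustrative assumptions."""
    candidates = [
        (0.0, 0.0),
        (min(uptake, budget / c_resp), 0.0),   # respiration only
        (0.0, min(uptake, budget / c_ferm)),   # fermentation only
    ]
    # Vertex where both constraints are active (requires c_resp != c_ferm).
    v_r = (budget - c_ferm * uptake) / (c_resp - c_ferm)
    v_f = uptake - v_r
    if v_r >= 0 and v_f >= 0:
        candidates.append((v_r, v_f))
    return max(candidates, key=lambda v: y_resp * v[0] + y_ferm * v[1])


# Low carbon uptake: the proteome constraint is slack, respiration is optimal.
low_uptake = best_atp_allocation(uptake=0.5, budget=1.0)
# Higher uptake: the proteome constraint becomes active, mixed usage is optimal.
high_uptake = best_atp_allocation(uptake=2.0, budget=1.0)
```

At low uptake only the carbon constraint is limiting and all flux goes through respiration; once the proteome budget also becomes limiting, part of the flux shifts to fermentation, mimicking the onset of overflow metabolism in this toy setting.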
The PEP-glyoxylate cycle, which contains the glyoxylate shunt, represents an alternative route to the TCA cycle (38). Compared to the TCA cycle, the PEP-glyoxylate cycle produces an additional NADH instead of one NADPH (38). Since NADPH is a common cofactor in anabolic pathways in E. coli, it was suggested that the cell should favor the pathway that produces more NADPH (the TCA cycle) at high growth rates (38). However, the interconversion between NADPH and NADH is a very common process in E. coli (48), and it is not clear how the small difference in pathway output (1 NADPH vs. 1 NADH) could explain the massive resource allocation (~12% of the proteome) into the glyoxylate shunt at low growth rates. Recent studies showed that overexpression of glyoxylate shunt enzymes can reduce the lag time when E. coli experiences a transition from a glycolytic carbon source to a gluconeogenic carbon source (49, 50). However, it is still challenging to develop mechanistic models that explain the growth rate-dependent expression of alternative pathways and lag times from first principles.
Conclusions
In this study, we systematically assessed proteome efficiency at the pathway level in E. coli. Overall, we found that the proteome efficiency of pathways increases along the nutrient flow, from transporters to central metabolism to biosynthesis pathways to translation. We note that this gradient is analogous to a gradient of genomic stability observed on much longer time scales, with central reactions being more stable over evolutionary time than reactions at the interface to the environment (51), which we found here to also be less efficient. Above, we showed that proteome allocation is near the optimal demand for the most expensive biosynthesis pathways, including translation as well as amino acid and cofactor biosynthesis pathways; the same pathways are located in the interior of the cellular biosynthetic network. In contrast, about half of the metabolic pathways by mass show a growth rate dependence contrary to that expected for optimal demand, including the TCA cycle, glyoxylate shunt, and transporters; typically, these pathways are located at the periphery of the cellular network. We hypothesize that these patterns of local optimality and sub-optimality arise from two tradeoffs and their interactions: on the one hand, the tension between maximal instantaneous growth and the cell’s ability to quickly and efficiently transition its physiological state in response to environmental change; and on the other hand, the tension between the benefits of precise and optimal control of cellular resource allocation and the resource investment required for the corresponding control systems. Quantifying these tradeoffs and their joint influence on cellular physiology will require an enhanced, quantitative understanding of the evolutionarily relevant patterns of environmental changes as well as of the costs and effectiveness of regulatory strategies available to bacteria such as E. coli.
Methods
Growth rate-dependent biomass composition
The original biomass composition in the iML1515 model is very similar to that of the iAF1260 model, formulated for a doubling time of 40 min or μ = 1.04 h⁻¹ (52). However, biomass composition varies across growth rates. The two most significant changes are those of the RNA/protein mass ratio and the cell volume, which determines the surface/volume ratio (S/V). Both ratios can be expressed as functions of the growth rate; accordingly, we estimated the growth rate-dependent biomass fractions of RNA, protein, and cell envelope components (including murein, lipopolysaccharides, and lipid), as described below.
We first fitted experimental data for the RNA/protein mass ratio (4, 53) and the surface/volume ratio (S/V) (26) to linear functions of the growth rate (Suppl. Fig. S1), resulting in the relationships
Assuming that the biomass contribution of cell envelope components (m_envelope) is proportional to the surface/volume ratio gives
The growth rate-dependent biomass fraction of cell envelope components (m_envelope) can then be estimated by equation (3) given equation (2) and m_envelope at μ = 1.04 h⁻¹. The relative composition of murein, lipopolysaccharides, and lipid was assumed to be constant.
The biomass fractions of cellular components other than RNA, protein, and cell envelope components (m_others) were assumed to be independent of the growth rate. The sum of RNA and protein is given by:
Combining equations (1) and (4), the content of RNA and protein can be calculated for all conditions (Suppl. Fig. S1). The relative contributions of individual nucleotides to total RNA and of individual amino acids to total protein were assumed to be growth rate-independent. The resulting growth rate-dependent biomass compositions are listed in Suppl. Table S1.
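The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes that equation (3) rescales the envelope fraction at μ = 1.04 h⁻¹ by the ratio of S/V values, and that the biomass fractions of RNA, protein, envelope, and other components sum to one. The function names and all numerical coefficients in the usage example are hypothetical placeholders; the fitted values are those shown in Suppl. Fig. S1.

```python
MU_REF = 1.04  # h^-1, growth rate of the original biomass composition


def linear(mu, slope, intercept):
    """Evaluate a linear fit y = slope * mu + intercept."""
    return slope * mu + intercept


def biomass_fractions(mu, rp_fit, sv_fit, m_envelope_ref, m_others):
    """Return (m_rna, m_protein, m_envelope) as dry-mass fractions at growth rate mu.

    rp_fit, sv_fit : (slope, intercept) of the linear fits for the RNA/protein
                     mass ratio and the surface/volume ratio (hypothetical inputs).
    m_envelope_ref : envelope fraction at MU_REF.
    m_others       : growth rate-independent fraction of all other components.
    """
    rna_protein_ratio = linear(mu, *rp_fit)
    # Assumption: envelope fraction scales in proportion to S/V.
    m_envelope = m_envelope_ref * linear(mu, *sv_fit) / linear(MU_REF, *sv_fit)
    # Assumption: all biomass fractions sum to 1 g/gDW.
    rna_plus_protein = 1.0 - m_envelope - m_others
    m_protein = rna_plus_protein / (1.0 + rna_protein_ratio)
    m_rna = rna_plus_protein - m_protein
    return m_rna, m_protein, m_envelope


# Placeholder coefficients for illustration only (not the fitted values).
m_rna, m_protein, m_envelope = biomass_fractions(
    mu=0.5, rp_fit=(0.2, 0.1), sv_fit=(-1.0, 6.0),
    m_envelope_ref=0.15, m_others=0.10,
)
```

By construction, evaluating the function at μ = 1.04 h⁻¹ returns the reference envelope fraction, and the ratio of the returned RNA and protein fractions equals the fitted RNA/protein ratio at that growth rate.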
Implementation of MOMENT
To perform flux balance analysis with molecular crowding, we used ccFBA (27), which implements the MOMENT algorithm (9) with an improved treatment of co-functional enzymes (28). For enzymes for which maximal in vivo effective enzyme turnover numbers (k_app,max) were available from Ref. (31), we used these to replace the original in vitro k_cat values (see Suppl. Table S2 for the turnover numbers used).
Instead of maximizing the growth rate at a given nutrient condition, we solved the complementary optimization problem that estimates the minimal required proteome (C) able to support the observed growth rate on the given carbon source. However, as the objective function in ccFBA is the growth rate, we used an indirect procedure for the solution. In the constraint-based type of model employed here, there is a linear relationship between proteome investment and predicted growth rate, C = aμ + b, for two constants a and b. Note that due to a non-zero non-growth-related maintenance energy term included in the model, b > 0. The constants a and b can be determined from any two pairs of proteome budget and growth rate.
For each experimental condition with observed growth rate μ′ according to Ref. (22), we first estimated the biomass composition at μ′. At this biomass composition, we then predicted the growth rates at proteome budgets of C=0.1 g/gDW (C0.1) and C=0.2 g/gDW (C0.2), denoted as μ0.1 and μ0.2, respectively. The constants a and b were then calculated from μ0.1, μ0.2, C0.1, and C0.2. The total minimal required proteome (C′) at the observed growth rate was then obtained as C′ = aμ′ + b.
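The two-point procedure can be illustrated with the following sketch. It only covers the final arithmetic step: the growth rates predicted at the two proteome budgets would come from the constraint-based model, and the values used in the usage example are hypothetical. The function names are illustrative and not part of ccFBA.

```python
def fit_line(mu_1, c_1, mu_2, c_2):
    """Return (a, b) of the line C = a * mu + b through two (mu, C) pairs."""
    if mu_1 == mu_2:
        raise ValueError("the two growth rates must differ")
    a = (c_2 - c_1) / (mu_2 - mu_1)
    b = c_1 - a * mu_1
    return a, b


def minimal_proteome(mu_obs, mu_01, mu_02, c_01=0.1, c_02=0.2):
    """Minimal required proteome (g/gDW) at the observed growth rate mu_obs.

    mu_01, mu_02 : growth rates predicted by the model at proteome
                   budgets c_01 and c_02 (g/gDW).
    """
    a, b = fit_line(mu_01, c_01, mu_02, c_02)
    return a * mu_obs + b


# Hypothetical model predictions, for illustration only.
a, b = fit_line(0.4, 0.1, 0.9, 0.2)
c_min = minimal_proteome(mu_obs=0.6, mu_01=0.4, mu_02=0.9)
```

Because the relationship is linear, any two distinct proteome budgets would yield the same a and b; the budgets of 0.1 and 0.2 g/gDW are simply the pair chosen in the text.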
For a given protein i, its minimal demand at the observed growth rate μ′ (p_i,μ′), in units of g/gDW, can be expressed as
with p_i,μ0.1 the minimal demand for protein i at C0.1.
With the protein content in dry mass at μ′ (m_protein,μ′) estimated using equation (4), the proteome fraction of protein i at μ′ (m_i,μ′) can be written as
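As a dimensional check, the conversion from a dry-mass demand to a proteome fraction can be sketched as below. The sketch assumes that the proteome fraction is the simple ratio of the demand for protein i (g per gDW) to the total protein content of dry mass (g protein per gDW); the function name and the numbers are illustrative only.

```python
def proteome_fraction(p_i, m_protein):
    """Convert a protein demand in g/gDW into a fraction of total protein mass.

    Assumption: the fraction is the ratio of the protein's share of dry mass
    (p_i, g/gDW) to the total protein content of dry mass (m_protein, g/gDW).
    """
    if m_protein <= 0:
        raise ValueError("m_protein must be positive")
    return p_i / m_protein


# Hypothetical values: 0.006 g/gDW of one protein, 0.6 g protein per gDW.
fraction = proteome_fraction(0.006, 0.6)
```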
Pathway membership
Proteins were characterized as transporters if the corresponding genes are assigned to transport processes according to the iML1515 annotation (25). The carbon source is the only nutrient that differs between the minimal media used in the proteomics experiments (22). To make the transporters comparable across conditions, we thus excluded inner and outer membrane transporters for all carbon sources used in the studied conditions (22) and analyzed only the transporters for other metabolites.
We used the pathway ontology in EcoCyc (54) (downloaded on 13 January 2021) to assign enzymes to the other metabolic pathways.
Proteins are labeled as biosynthetic enzymes based on the EcoCyc pathway ontology annotation “biosynthesis” (54). Pathways in this category are: (1) amino acid biosynthesis (“Amino Acid Biosynthesis” in EcoCyc), (2) nucleotide biosynthesis (“Nucleoside and Nucleotide Biosynthesis”), (3) cofactors (“Cofactor, Carrier, and Vitamin Biosynthesis”), and (4) cell envelope components (“Cell Structure Biosynthesis” and “Fatty Acid and Lipid Biosynthesis”), including lipid, peptidoglycan, and LPS. All other biosynthetic enzymes are merged into (5) other biosynthetic pathways. See Suppl. Table S5 for the corresponding hierarchy levels in the EcoCyc pathway ontology.
Enzymes are designated as being involved in the generation of precursor metabolites and energy according to the EcoCyc pathway ontology annotation “Generation of Precursor Metabolites and Energy”. Pathways in this category are: (1) glycolysis, (2) pentose phosphate pathways, (3) TCA cycle, (4) glyoxylate shunt (EcoCyc does not list a pathway for the glyoxylate shunt; the three genes classified as glyoxylate shunt are aceA, aceB, and glcB), (5) energy production (“Electron Transfer Chains and ATP biosynthesis”), and (6) other enzymes.
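Given such a gene-to-pathway assignment, pathway-level proteome fractions follow by summing the fractions of the member proteins. The sketch below is one possible way to do this and makes two simplifying assumptions: each gene is assigned to exactly one category, and genes without an assignment fall into a catch-all category. Only the glyoxylate shunt genes are taken from the text; the other gene names and all numerical values are hypothetical.

```python
from collections import defaultdict

# Gene-to-category map. The glyoxylate shunt genes are those named in the text;
# "geneX" is a hypothetical placeholder.
PATHWAY_OF = {
    "aceA": "glyoxylate shunt",
    "aceB": "glyoxylate shunt",
    "glcB": "glyoxylate shunt",
    "geneX": "TCA cycle",
}


def pathway_fractions(protein_fractions, pathway_of, default="other enzymes"):
    """Sum per-protein proteome fractions within each pathway category."""
    totals = defaultdict(float)
    for gene, fraction in protein_fractions.items():
        # Unassigned genes are collected under the default category.
        totals[pathway_of.get(gene, default)] += fraction
    return dict(totals)


# Hypothetical proteome fractions (not measured values).
example = {"aceA": 0.05, "aceB": 0.04, "glcB": 0.03, "geneX": 0.02, "geneY": 0.01}
totals = pathway_fractions(example, PATHWAY_OF)
```

The same function can be applied to both predicted and observed per-protein fractions, so that the two are compared at the same level of aggregation.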
Treatment of enzymes involved in the nucleotide salvage pathway
In the range of studied growth rates, the transcription of mRNA accounts for more than half of the total RNA transcription (1). The half-life of mRNA is very short (~5.5 min) (55) compared to the doubling time, and nucleotides from degraded mRNA are reused through the nucleotide salvage pathway. However, our model only predicts the expression of de novo biosynthesis pathways. To make the predictions comparable with the observed data, enzymes of the nucleotide salvage pathway were thus excluded from the “nucleotide biosynthesis” category.
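The comparison between the mRNA half-life and the doubling time can be made explicit with a short calculation. The sketch uses the standard relation between doubling time and exponential growth rate, T = ln(2)/μ, together with the half-life quoted above; the function names are illustrative.

```python
import math

MRNA_HALF_LIFE_MIN = 5.5  # approximate mRNA half-life quoted in the text


def doubling_time_min(mu_per_hour):
    """Doubling time in minutes for exponential growth at rate mu (h^-1)."""
    return 60.0 * math.log(2) / mu_per_hour


def half_lives_per_doubling(mu_per_hour):
    """Number of mRNA half-lives that fit into one doubling time."""
    return doubling_time_min(mu_per_hour) / MRNA_HALF_LIFE_MIN
```

At μ = 1.04 h⁻¹, this gives a doubling time of about 40 min, or roughly seven mRNA half-lives; at lower growth rates the ratio is correspondingly larger.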
Transcriptional regulation data
Experimental datasets of RegulonDB v10.9 (56) were used for counting the number of transcription factors regulating each protein.
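Counting regulators per gene amounts to collecting the distinct transcription factors listed for each target. The sketch below assumes the interactions have already been parsed into (transcription factor, target gene) pairs; this input format is an assumption for illustration and does not reflect the RegulonDB file layout. All names in the usage example are hypothetical.

```python
from collections import defaultdict


def count_regulators(interactions):
    """Return {target_gene: number of distinct TFs regulating it}.

    interactions : iterable of (tf, target_gene) pairs; repeated pairs
                   (e.g. several binding sites) are counted once.
    """
    regulators = defaultdict(set)
    for tf, target in interactions:
        regulators[target].add(tf)
    return {gene: len(tfs) for gene, tfs in regulators.items()}


# Hypothetical interactions, for illustration only.
interactions = [
    ("tfA", "gene1"),
    ("tfB", "gene1"),
    ("tfA", "gene1"),  # duplicate pair, counted once
    ("tfA", "gene2"),
]
counts = count_regulators(interactions)
```

Genes that do not appear as a target in any interaction are absent from the result and would be treated as having zero known regulators.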
Acknowledgments
This work was supported by the Volkswagen Foundation under the “Life?” initiative, and by the German Research Foundation (DFG) through grant CRC 1310. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
We thank Hugo Dourado, Deniz Sezer, and Peter Schubert for helpful discussions.
XPH conceived and designed the study and performed the analysis. SS performed the analysis of transcription factor regulation. MJL supervised the study. XPH and MJL interpreted the results and wrote the manuscript.
The authors declare that no competing interests exist.