Abstract
Saccharomyces cerevisiae is an important model organism and a workhorse in biochemical production. Here, we reconstructed a compact and tractable genome-scale resource balance analysis (RBA) model (i.e., scRBA) to analyze metabolic fluxes and proteome allocation in a computationally efficient manner. Resource capacity models such as scRBA provide the quantitative means to identify bottlenecks in biosynthetic pathways due to enzyme and/or ribosome availability limitations. ATP maintenance rate and in vivo apparent turnover numbers (kapp) were regressed from metabolic flux and protein concentration data to capture observed physiological growth yield and proteome efficiency and allocation, respectively. Estimated parameter values were found to vary with oxygen and nutrient availability. Overall, this work (i) provides condition-specific model parameters to recapitulate phenotypes corresponding to different extracellular environments, (ii) alludes to the enhancing effect of substrate channeling and post-translational activation on in vivo enzyme efficiency in glycolysis and electron transport chain, and (iii) reveals that the Crabtree effect is underpinned by ribosome availability limitations and reserved protein capacity.
Introduction
Saccharomyces cerevisiae is a prototrophic yeast (a unicellular fungus) that has been domesticated for making bread and wine since ancient times1. It occupies a wide spectrum of natural habitats, including soils, insects, grapes, and the leaves and trunks of plant species1. The organism is well-known for its ethanol production (and tolerance) even under aerobic conditions (i.e., the Crabtree effect)2. S. cerevisiae has been considered a “generally recognized as safe” (GRAS) organism and is used extensively in biological research as a model eukaryotic organism3 and in large-scale fermentation4. A prime example is its use in producing bioethanol5, for which global demand was 28.91 billion gallons in 2019 6. S. cerevisiae has also been extensively re-engineered by metabolic engineering to produce various compounds such as fatty acid derivatives and biofuels7, building block organic acids8, biopharmaceutical proteins9, natural products10,11, and food additives12. For example, industrial-scale production of precursors of the antimalarial drug artemisinin has been achieved using yeast strains with heterologous expression of Artemisia annua enzymes13. These numerous applications and adaptations stem from the organism’s robustness in industrial settings (e.g., resistance to growth inhibitors, pH, osmotic, and ethanol stresses)14 and a strong engineering foundation established by well-characterized genome sequences and annotations15,16 and the availability of synthetic biology tools17.
There have been a number of genome-wide systems biology studies for S. cerevisiae focusing on cellular expression under growth condition perturbations18, regulation19,20, metabolism21, and genotype-phenotype correlation22. The study of S. cerevisiae metabolism has received considerable attention. Starting from the first genome-scale metabolic (GSM) model reconstructed in 200323, successive models have been developed with improved coverage of genome and metabolic functions24,25. Using as inputs only biomass composition, gene annotations, gene to protein to reaction (GPR) associations, and reaction reversibility information, GSM models have been shown to predict metabolic fluxes and theoretical yields reasonably well21. They have been used extensively to suggest genetic perturbation strategies for metabolic engineering24. Recently, GSM models have been upgraded to account for protein and enzyme availability limitations, improving model prediction by imposing upper limits on metabolic fluxes. Sánchez and coworkers developed the GECKO framework that imposes flux upper bounds derived from protein concentration measurements26. Oftadeh and coworkers presented the expression and thermodynamics flux (ETFL) model that accounts for the cellular expression system and reaction thermodynamics27. These models display enhanced predictions for batch culture conditions where nutrients are in excess and enzyme production capacity becomes the bottleneck. Beyond metabolism and expression processes, a whole-cell model containing 26 sub-models has also been developed by Ye and coworkers to capture cellular processes holistically28.
In this paper, we put forth a resource balance analysis (RBA) model for S. cerevisiae, referred to as scRBA, that is computationally tractable to parameterize and simulate and that focuses on metabolism and enzyme and ribosome production. The goal is to construct a model at a level of detail for which experimental data can support parameter estimation. Sets of growth and non-growth associated ATP maintenance parameters specific to growth conditions are regressed from a large collection of S. cerevisiae growth phenotype data29–48 to accurately predict condition-dependent growth yield.
The inferred ATP maintenance rates increase significantly in the presence of oxygen and under carbon and nitrogen limitation, in agreement with known yeast physiology49–56. In vivo enzyme turnover parameters (kapp) (indicating enzyme efficiency) were regressed using multiple measured extracellular fluxes and protein concentration datasets29,30,34,37. We found that kapp values often differ substantially from catalogued in vitro turnover numbers (kcat)57, which are typically used in proteome allocation metabolic models. Notably, for 4 out of 10 enzymes in glycolysis, 4 out of 4 in the electron transport chain, ATP synthase, and 10 out of 20 in amino acyl-tRNA synthetase pathways, estimated kapp values were significantly larger (i.e., by up to 189-fold for hexokinase) than tabulated kcat data, reflecting higher in vivo enzymatic efficiency. While insight into the exact mechanism for this enhancement is not revealed by the parameterization of the RBA model, there is ample literature evidence58–64 for the presence and function of in vivo metabolons and enzyme post-translational activations. Interestingly, inferred kapp values were generally lower for (i) alternate carbon substrates, (ii) anaerobic conditions, or (iii) carbon/nitrogen limitations, in accordance with lower enzymatic efficiencies implied by the phenotypic data. The parameterized scRBA model for S. cerevisiae successfully predicted ethanol overflow under abundant glucose and oxygen conditions. We found that growth was limited by rRNA capacity rather than enzyme availability. Only about 72% of protein capacity is used for growth-coupled glycolysis and biomass production, implying that glucose can be redirected for the production of ethanol and the replenishment of the NAD+ pool using the reserved proteome capacity. The scRBA model captured the flux-limiting effect of enzyme and/or ribosome availability (i.e., upper bounds below 20% of FBA-predicted fluxes) for 65% of metabolic reactions under glucose uptake conditions.
scRBA-predicted maximal product yields for 28 biochemicals were sometimes significantly reduced compared to FBA-calculated values. For example, whereas the maximal predicted yield for succinate was only 21% of the FBA yield due to the enzymatically inefficient pyruvate-to-succinate pathway in yeasts, the predicted maximal yields for fatty acid derived products (i.e., free fatty acid, fatty alcohol, and triacylglycerol) were as high as 87-88% of FBA yield values. Overall, this work puts forth the scRBA model for S. cerevisiae that draws from condition-specific parameters to improve prediction accuracy. Its applicability is demonstrated through case studies of S. cerevisiae’s metabolic flux and product yield limit estimations.
Methods
RBA model reconstruction
The model scRBA consists of (macro)molecules and reactions for metabolism and cellular machinery production linked through steady-state mass balance constraints as in FBA65 (see Fig. 1A for a schematic representation). Here, we provide data sources necessary for reconstruction and briefly explain the model elements linking (macro)molecules and reactions. The reconstruction method is described in detail in the Supplementary Methods, with user instructions, formulation (adapted from Goelzer et al., 201166), and indexing. Metabolites and metabolic reactions are ported from iSace1144 (available at https://github.com/maranasgroup/iSace_GSM). Blocked reactions identified by flux variability analysis67 were excluded from scRBA. Protein translation reactions entail amino acids (in the form of charged-tRNA), cofactors, and energy in the form of GTP and ATP68,69 with stoichiometric coefficients designed to match the corresponding protein sequences16 using the Uniprot database70. The mass action contribution of ribosomes is set to be proportional to the sum of all protein translation fluxes. Ribosomes, in turn, are synthesized from proteins and rRNAs. In yeast, most of the genes (i.e., 1,201 out of 1,208 genes in the model) are encoded in the nuclear chromosomes and translated by the cytosolic ribosome. The remaining mitochondrial genes71 (i.e., 7 genes in the model) are translated using reactions that utilize the mitochondrial ribosome, which is treated as separate from the cytosolic counterpart. The stoichiometric coefficients for the proteins and rRNAs associated with ribosome production are obtained from the SGD database16 and ribosome structure observations72. rRNA relative ratios are sourced from the RNAcentral database73. Enzymes are formed from the corresponding protein subunits whose stoichiometric coefficients are obtained from the Uniprot database70.
Biomass precursor producing reactions are included in scRBA to inventory all enzymes, ribosomes, and all other macromolecules needed to form biomass. Precursors (and their relative compositions) of DNA, lipids, carbohydrates, metal ions, sulphate, phosphate, and cofactors are ported from the GSM model iSace1144. Based on experimental macromolecular measurements29,38,74,75 the mass fractions of most macromolecules are assumed to remain invariant except for protein, RNA, and carbohydrate fractions that change with increasing growth rate (see Supplementary Data 1). Instead of reconstructing multiple models with different biomass coefficients at different growth rates, the biomass reaction is recast as a set of precursor sink reactions whose fluxes are equal to the coefficients multiplied by the growth rate. We ensure that the biomass molecular weight is always 1 g mmol−1 so that growth yield and rate predictions are consistent76,77. Detailed biomass formulation is provided in the Supplementary Methods and Supplementary Data 1.
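The recasting of the biomass reaction into precursor sinks can be illustrated with a minimal sketch, in which each sink flux is the precursor coefficient multiplied by the growth rate. The function name, precursor names, and coefficient values are hypothetical placeholders and are not taken from Supplementary Data 1; in scRBA some coefficients themselves change with growth rate, which this sketch does not attempt to capture.

```python
def precursor_sink_fluxes(coefficients, growth_rate):
    """Return sink fluxes (mmol gDW^-1 h^-1) for each biomass precursor.

    coefficients: dict of precursor name -> coefficient (mmol gDW^-1)
    growth_rate:  specific growth rate (h^-1)
    """
    return {name: coeff * growth_rate for name, coeff in coefficients.items()}


# Hypothetical coefficients, for illustration only
example_coefficients = {"precursor_a": 0.45, "precursor_b": 0.06}
example_fluxes = precursor_sink_fluxes(example_coefficients, growth_rate=0.2)
```

Fixing the sink fluxes in this way lets a single model serve all growth rates, since only the right-hand-side values change when the growth rate is updated.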
(B) Overview of the bisection method and the RBA linear programming (RBA-LP) formulation that are solved iteratively to obtain the maximal growth rate. Flux variables are highlighted in red and the growth rate variable is highlighted in green. The topology of all variables captured by the scRBA model is shown in Fig. 1A. Model parameters are briefly explained in the text and formulation details are available in the Supplementary Methods.
The total amount of protein, enzyme, and ribosome produced is determined by the reaction-enzyme and protein-ribosome coupling constraints and limited by the protein and rRNA capacity constraints (see Fig. 1B for a formulation overview, Supplementary Methods for the complete formulation details, and Goelzer et al., 2011 66 for derivations). The kapp parameter values in the reaction-enzyme coupling constraint are derived from experimental flux and proteomics data (see “Estimation of in vivo kapp” in Methods). In the protein-ribosome coupling constraint, the protein sequence length values are taken from the SGD database16 and the ribosome efficiency parameter (kribo) is fitted using growth phenotype data30. Starting from the experimentally derived78 average value of 10.5 amino acids elongated per ribosome per second, we re-parameterized kribo by successively increasing it in increments of 0.1 until the predicted growth rate matched the highest reported experimental value of 0.49 h−1 (in rich media)30. This was met at a slightly higher kribo value of 13.2 amino acids per ribosome per second. In the scRBA model, enzyme and ribosome production is limited by the experimentally measured protein and rRNA levels29,38,74,75 through capacity constraints. Molecular weights of protein and rRNA (i.e., WPro and WrRNA) are used to convert molar amounts to grams, which are constrained in the capacity constraints to be less than the experimentally found limits.
scRBA directly accounts only for proteins participating in metabolism and translation/elongation. Proteins involved in other processes such as chaperone-assisted protein folding and cellular maintenance are not functionally part of scRBA. We assumed that the modeled proteome, including metabolic and ribosomal proteins, is 55% of the total proteome 27. The cost of producing the remaining 45% is accounted for in an aggregate manner assuming average amino acid composition74. In resource allocation models this is formulated by adding a reaction producing a non-functional, so-called dummy protein27,79. The model explicitly accounts for the six rRNA species that are part of ribosomes (see Supplementary Methods for details), which constitute as much as 80% of total RNA80. For computational efficiency, mRNA and tRNA demand (i.e., reserving 5% and 15% of total RNA, respectively) are accounted for in an aggregate manner assuming average composition23. Similar to the proteome, a reaction producing a non-functional RNA is added to the model. S. cerevisiae maintains reserved ribosome78 and enzyme35 capacity, which makes up the difference between experimentally observed and required amounts (estimated under nitrogen limited conditions35,78). The reserve proteome capacity is maintained to prepare cells for changes in growth conditions in the same manner that reserve ribosome capacity enables faster growth immediately upon nutrient availability upshift35,78. In the model, the amounts of non-functional protein and RNA representing reserved capacity are treated as fitted variables so as to recapitulate reserved capacity being present or exhausted depending on growth conditions.
Estimation of ATP maintenance rates
ATP maintenance rates are parameters used in both FBA and RBA models to account for the energy cost of replicating cells. The growth-associated ATP maintenance (GAM) rate (in mmol gDW−1) captures the energy demand per unit of produced biomass. The non-growth associated ATP maintenance (NGAM) rate (in mmol gDW−1 h−1) captures the energy demand associated with cellular processes such as repair and maintenance81. GAMFBA (i.e., GAM in the FBA model) and NGAM parameters were regressed from growth phenotype datasets recorded at different growth rates using the FBA model iSace1144. For every dataset, the ATP maintenance rate was estimated by maximizing flux through the ATP hydrolysis reaction (i.e., ATP + H2O → ADP + Pi + H+) subject to experimental extracellular fluxes and growth rate29–37,41–48. The NGAM parameter was set equal to the maximal ATP hydrolysis flux estimated from data of growth-arrested conditions38–40. GAMFBA is the slope of a linear regression of maximal ATP hydrolysis flux vs. growth rate values whereas the intercept is the NGAM value. GAMRBA (i.e., GAM in the RBA model) is estimated by subtracting from the GAMFBA value the portion equivalent to the energy cost of protein translation elongation, which is approximately 2 mmol of ATP per mmol of amino acid27. The subtracted amount is 7.6 mmol ATP gDW−1, derived from experimental amino acid measurements74. The NGAM parameter is estimated from growth-arrested conditions data where neither biomass nor protein synthesis is underway and thus the parameter is the same for both FBA and RBA. Different GAMFBA and NGAM parameter sets were regressed from datasets under the following growth conditions: (i) (nutrient-abundant) batch and anaerobic or microaerobic, (ii) C-limited chemostats and anaerobic or microaerobic, (iii) batch or C-limited chemostats and aerobic, (iv) N-limited chemostats and aerobic. Experimental flux inputs and calculated results are recorded in Supplementary Data 2.
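The regression step can be sketched as follows. This is one possible reading of the procedure, assuming the intercept is held fixed at the NGAM value obtained from growth-arrested data and only the slope (GAM in FBA) is fitted by least squares; the function name is hypothetical and the data points are synthetic, generated from the aerobic parameter values reported in the Results rather than from the measured datasets.

```python
def fit_gam_fixed_intercept(growth_rates, atp_fluxes, ngam):
    """Least-squares slope of ATP hydrolysis flux vs. growth rate,
    with the intercept fixed at NGAM (assumption of this sketch)."""
    numerator = sum(mu * (v - ngam) for mu, v in zip(growth_rates, atp_fluxes))
    denominator = sum(mu * mu for mu in growth_rates)
    return numerator / denominator


# Synthetic points on the line v_ATP = NGAM + GAM * mu (not measured data)
NGAM = 1.0                      # mmol gDW^-1 h^-1
growth_rates = [0.1, 0.2, 0.3, 0.4]                      # h^-1
atp_fluxes = [NGAM + 92.0 * mu for mu in growth_rates]   # mmol gDW^-1 h^-1

gam_fba = fit_gam_fixed_intercept(growth_rates, atp_fluxes, NGAM)

# Translation elongation cost stated in the text (mmol ATP gDW^-1)
TRANSLATION_ATP_COST = 7.6
gam_rba = gam_fba - TRANSLATION_ATP_COST
```

With real data, each ATP flux value would be the maximal ATP hydrolysis flux returned by an FBA run constrained to one experimental dataset.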
Estimation of in vivo kapp
kapp was calculated by dividing estimated intracellular metabolic fluxes by experimental enzyme concentrations82 (see Supplementary Methods for the workflow and Supplementary Data 3 for details). From literature-reported data29,30,34,37, different kapp parameter sets were determined for growth conditions of: (i) (nutrient-abundant) batch/glucose, (ii) batch/galactose, (iii) batch/maltose, (iv) batch/trehalose, (v) C-limited chemostats/glucose D = 0.1 h−1 and (vi) D = 0.3 h−1, and (vii) N-limited chemostats/glucose D = 0.1 h−1 and (viii) D = 0.3 h−1.
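The division of flux by enzyme concentration involves a unit conversion that is easy to get wrong, so a minimal sketch is given here. It assumes fluxes in mmol gDW−1 h−1 and enzyme concentrations in nmol gDW−1 (the units used elsewhere in this paper) and returns kapp in s−1; the function name and the example numbers are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0
MMOL_PER_NMOL = 1e-6


def kapp_per_second(flux_mmol_gdw_h, enzyme_nmol_gdw):
    """Apparent turnover number (s^-1) from a flux and an enzyme level."""
    if enzyme_nmol_gdw <= 0:
        raise ValueError("enzyme concentration must be positive")
    enzyme_mmol_gdw = enzyme_nmol_gdw * MMOL_PER_NMOL
    # Absolute flux: the direction of a reaction does not change its kapp
    return abs(flux_mmol_gdw_h) / enzyme_mmol_gdw / SECONDS_PER_HOUR


# Hypothetical example: 3.6 mmol gDW^-1 h^-1 carried by 10 nmol gDW^-1 enzyme
example_kapp = kapp_per_second(3.6, 10.0)
```

Reactions with no measured enzyme or with zero estimated flux cannot be assigned a kapp this way and would need to be handled separately.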
Growth maximization, flux variability analysis, and predicted yield using scRBA and FBA
An overview of the RBA optimization model that identifies the maximal growth rate is provided in Fig. 1B. By fixing the growth rate, the scRBA model is converted into a linear programming (LP) formulation (i.e., RBA-LP) which can be solved efficiently. An iterative bisection method is employed to converge the upper (infeasible) and lower (feasible) bounds on growth rate to within a tolerance criterion of 10−5 h−1.
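The bisection on growth rate can be sketched as follows. The feasibility check stands in for solving RBA-LP at a fixed growth rate; here it is replaced by a hypothetical toy function with a hard-coded threshold so that the example is self-contained. The function names and the initial bounds are assumptions of this sketch.

```python
def max_growth_rate(is_feasible, lower=0.0, upper=2.0, tol=1e-5):
    """Bisection between a feasible lower and an infeasible upper growth rate.

    is_feasible: callable taking a growth rate (h^-1) and returning True
                 when the fixed-growth-rate problem has a solution.
    """
    if not is_feasible(lower):
        raise ValueError("lower bound must be feasible")
    if is_feasible(upper):
        raise ValueError("upper bound must be infeasible")
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if is_feasible(mid):
            lower = mid   # mid is attainable, raise the feasible bound
        else:
            upper = mid   # mid is not attainable, lower the infeasible bound
    return lower


def toy_feasibility(mu, threshold=0.47):
    """Hypothetical stand-in for an RBA-LP solve at growth rate mu."""
    return mu <= threshold


mu_max = max_growth_rate(toy_feasibility)
```

Because each iteration halves the interval, a starting range of 2 h−1 reaches the 10−5 h−1 tolerance in 18 LP solves.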
In analogy to flux variability analysis (FVA)67, lower and upper bounds of reaction fluxes can be calculated using scRBA and FBA models by updating the objective function of the model (see Fig. 1B) to the minimization or maximization of the flux in question and imposing the experimental glucose uptake rate of 13.2 mmol gDW−1 h−1 and growth rate of 0.42 h−1 30. Experimental (absolute) glucose uptake and growth rates were used in the simulations to be consistent with model parameters derived from absolute flux and concentration measurements. Flux ranges under FBA and RBA are contrasted to elucidate the role of capacity constraints on the flux allocation flexibility. Maximal compound production rate was identified by maximizing the corresponding (sink) exchange reaction flux variable subject to a glucose uptake of 13.2 mmol gDW−1 h−1 30 and growth rate set at a minimum of 0.1 h−1. Hexadecanoic acid and hexadecanol (i.e., C16) were used as proxies in the model for the in vivo mixture of free fatty acids and fatty alcohols of different chain lengths, respectively. Heterologous pathways for the synthesis of tested compounds were reconstructed based on previous studies7,83–109, as necessary. The RBA-predicted maximal production yield (i.e., $Y_{p}^{\mathrm{RBA}}$, in g g-Glucose−1) was calculated using the following equation:

$$Y_{p}^{\mathrm{RBA}} = \frac{v_{p}^{\mathrm{max}} \, MW_{p}}{v_{Glc} \, MW_{Glc}} \qquad (1)$$

where $v_{p}^{\mathrm{max}}$ is the RBA-predicted maximal production rate, MWp is the product molecular weight, vGlc is the RBA-predicted glucose uptake rate, and MWGlc is the molecular weight of glucose. FBA-calculated maximal production yield (i.e., $Y_{p}^{\mathrm{FBA}}$) was determined using Eq. 1 but with FBA-predicted rather than RBA-predicted quantities. Fluxes are in mmol gDW−1 h−1 and molecular weights are in g mmol−1.
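The yield calculation described above can be written as a short function. This is a minimal sketch with a hypothetical function name; the example uses the stoichiometric limit of two ethanol per glucose purely as an arithmetic illustration and is not a model prediction.

```python
MW_GLUCOSE = 0.18016   # g mmol^-1 (180.16 g mol^-1)


def production_yield(v_product, mw_product, v_glucose, mw_glucose=MW_GLUCOSE):
    """Mass yield in g product per g glucose.

    v_product, v_glucose: fluxes in mmol gDW^-1 h^-1
    mw_product, mw_glucose: molecular weights in g mmol^-1
    """
    if v_glucose <= 0:
        raise ValueError("glucose uptake rate must be positive")
    return (v_product * mw_product) / (v_glucose * mw_glucose)


# Arithmetic illustration: 2 mmol ethanol (46.07 g mol^-1) per mmol glucose
# at the glucose uptake rate of 13.2 mmol gDW^-1 h^-1 used in the text
example_yield = production_yield(v_product=26.4, mw_product=0.04607,
                                 v_glucose=13.2)
```

The same function applies to FBA and RBA results; only the source of the flux values differs.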
Software implementation
COBRApy110 with the IBM ILOG CPLEX solver (version 12.10.0.0) was used for FBA model optimization. The FBA model iSace1144 is available at https://github.com/maranasgroup/iSace_GSM111. The General Algebraic Modeling System (GAMS) programming language (version 39.1.0, GAMS Development Corporation) with the SoPlex solver (version 6.0)112 was used for RBA model kapp parameterization and optimization. Input files in the form of Excel spreadsheets were used to build the RBA model in GAMS format. Python 3.6 was used as the central platform to run all mentioned processes. All scripts and input and output files are available in the GitHub repository https://github.com/maranasgroup/scRBA.
Results
Estimation of growth and non-growth associated ATP maintenance
ATP maintenance rate parameters, GAM and NGAM, are necessary to accurately account for the fraction of glucose uptake apportioned towards energy production and ultimately growth77. We estimated GAM and NGAM parameter values using an ATP synthase proton/ATP ratio of 10/3. This reflects the fact that the S. cerevisiae ATP synthase consists of 10 c-ring subunits for every 3 F1F0 subunits113, resulting in 10 protons translocated across the mitochondrial membrane per 3 ATP molecules produced. Note that earlier GSM models used a proton/ATP ratio of 12/3 23,114. An overview of the literature-reported experimental datasets, methods, and GAM/NGAM values is provided in Fig. 2. We calculated from metabolic fluxes of growth-arrested cells38–40 the NGAM values (in mmol gDW−1 h−1) for (i) aerobic C-limited (NGAM = 1.0), (ii) anaerobic C-limited (NGAM = 1.0), and (iii) aerobic N-limited (NGAM = 3.9) conditions. The results reveal a largely unchanging NGAM value across C-limited aerobic and anaerobic conditions even though earlier studies pointed at some small differences39. The new results are likely due to the updated ATP synthase reaction stoichiometry reflecting recent literature information113. Note that the calculated NGAM value is nearly 4-fold higher under nitrogen-limited conditions, indicating that more energy is expended towards cellular maintenance due to the limited nitrogen pool.
(A) Summary of the experimental datasets. (B) Overview of the FBA procedure to calculate maximal ATP hydrolysis rate. (C) Non-growth associated ATP maintenance (NGAM) values. (D) Growth-associated ATP maintenance (GAM) values. R2 is the coefficient of determination for the linear regression to determine the GAM value as the slope. Overall, GAM and NGAM values increase under nutrient limitation and in the presence of oxygen.
GAMFBA values (in mmol gDW−1) are regressed from metabolic flux datasets at different growth rates. Significantly different GAMFBA values were inferred for subsets of data under different experimental oxygen and nutrient availability (see Fig. 2D). This means that GAMFBA values must be tailored to the growth conditions to maintain fidelity of prediction. The lowest GAM value (i.e., GAMFBA = 46.9) is associated with anaerobic batch conditions where nutrients are abundant. Carbon limitation (i.e., GAMFBA = 76.0), presence of oxygen (i.e.,GAMFBA = 92.0), or nitrogen limitation (i.e., GAMFBA = 136.7) increase GAMFBA by 1.6-fold, 2-fold, and 2.8-fold, respectively. In the presence of oxygen, energy consumption is diverted towards dissipating reactive oxygen species from respiration49,50. Respiration also requires functional mitochondria whose synthesis and damage repair require energy input51,52. The lower GAMFBA value under anaerobic conditions may also be in part due to S. cerevisiae adaptation to conditions with abundant fruit-borne sugar but low oxygen availability due to diffusion limitations115. Under glucose-limitation, higher ATP maintenance rate is likely needed for the synthesis of the catabolic catalytic apparatus to scavenge alternate carbon sources other than glucose (e.g., through the Snf1 regulatory pathway)53. While this is an important adaptation in the natural habitat, it is counterproductive in laboratory settings where alternative carbon sources are not provided. Under N-limitation, higher ATP maintenance rate is associated with the degradation of selected proteins (e.g., ribosomal proteins)54,55 to replenish depleted pools of amino acids and other N-containing compounds56. Because nitrogenous metabolite pools are not conserved by downregulating protein synthesis but rather by engaging an ATP-consuming protein synthesis-degradation cycle, this leads to a significantly higher GAMFBA value under N-limitation. 
Overall, we estimated condition-specific GAM and NGAM values and provided hypotheses for mechanistic bases for S. cerevisiae’s varied ATP maintenance under different growth conditions. Interestingly, we observed that under P-limitation36 ATP maintenance does not follow a linear trend (as shown in Fig. 2D) and GAM value (i.e., slope) becomes dependent on the degree of limitation (see Supplementary Fig. 1).
Estimation of in vivo apparent kapp values
Enzyme turnover numbers kapp are RBA model parameters that directly affect proteome allocation needs as their product with enzyme levels must match the observed metabolic flux values. Underestimated values for kapp result in higher than needed requirements for protein levels and vice versa. Even though in vitro turnover numbers (kcat) for many reactions are available in the literature and biochemical databases such as BRENDA57, they are not immediately usable in RBA models. This is because kcat entries do not capture the nuances of the in vivo environment (e.g., sub-saturation of enzyme, substrate channeling, post-translational modifications, etc.) that can dramatically alter enzymatic efficiencies82,116. In model scRBA, we instead rely on kapp values supported by measured metabolic fluxes and corresponding enzyme levels. Assuming that the in vivo substrate concentration is below the saturated level, kapp values are by definition lower than kcat values57 (i.e., in the Michaelis-Menten expression, kapp = kcat([S]/(KM + [S])) ≤ kcat). However, we found, in agreement with earlier studies82,117, that the reverse is true for several enzymes (e.g., 46 out of 132 enzymes in E. coli based on available data82), indicating that enzyme efficiency is often enhanced in vivo, possibly through substrate channeling and/or enzyme activation (see Fig. 3 for a visual and Supplementary Data 4 for tabulated values). Enzymes with experimental evidence confirming in vivo activity enhancement are reported in Table 1. Enzymes without direct evidence but with predicted marked enhancements in vivo are reported in Table 2 as candidates for further testing. These candidates include a hypothetical metabolon involving the ATP synthase, proposed for yeast in analogy to the mammalian counterpart118. We also found that kapp values are lower than kcat for four enzymes in the glycolysis metabolon58 (see Fig. 3), which suggests that the reducing effect of enzyme sub-saturation is stronger than any enhancing effect of substrate channeling. We find that in vivo enhancements occur predominantly in high-flux glycolysis, electron transport chain, and ethanol fermentation, alluding to a mechanism to reduce proteome investment for high-capacity enzymes. Note that proteome allocation needs towards these pathways would have been significantly higher if the in vivo turnover values were unaltered from the in vitro ones. This makes sense as pathways with high flux would be under stronger selection to achieve in vivo kapp enhancements through a variety of mechanisms in comparison to low flux routes. This result further reinforces the need to estimate and utilize condition-specific in vivo kapp values to faithfully recapitulate in vivo enzymatic efficiency and predict proteome allocation with RBA models.
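The Michaelis-Menten bound discussed in this section can be made concrete with a small sketch. It only illustrates why sub-saturation alone cannot yield kapp above kcat, so that any kapp exceeding kcat must come from a mechanism outside this expression; the function name and all numerical values are hypothetical.

```python
def kapp_from_saturation(kcat, substrate_conc, km):
    """Michaelis-Menten apparent turnover: kcat * [S] / (KM + [S]).

    substrate_conc and km must share the same concentration unit.
    """
    if km <= 0 or substrate_conc < 0:
        raise ValueError("km must be positive and substrate_conc non-negative")
    return kcat * substrate_conc / (km + substrate_conc)


# Hypothetical enzyme: kcat = 100 s^-1, KM = 0.5 mM
half_saturated = kapp_from_saturation(kcat=100.0, substrate_conc=0.5, km=0.5)
near_saturated = kapp_from_saturation(kcat=100.0, substrate_conc=50.0, km=0.5)
```

At [S] = KM the enzyme works at half of kcat, and even a 100-fold excess of substrate only approaches kcat from below.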
kapp values for the growth condition of batch, aerobic, glucose, and minimal (YNB) media were used in comparison. Experimentally validated enzymes with enhanced in vivo efficiency are indicated by red and yellow dots. Gray dots indicate enzymes for which we did not identify such information.
kapp parameter sets were subsequently regressed separately under different growth conditions (see Section 2.3) to assess their impact on enzyme efficiency. Perturbation instances from the reference condition (i.e., batch, aerobic, glucose, minimal (YNB) media) include (i) from aerobic to anaerobic, (ii) from minimal to rich (YNB + amino acids) media, (iii) from batch to chemostats (C- or N-limited), and (iv) from glucose to an alternative carbon substrate. Plots of ratio values are shown in Fig. 4.
The ratio values reflect increased, constant, or reduced enzymatic efficiency under the perturbed condition relative to the reference condition. Overall, we found that replacing glucose with an alternate substrate leads, on average, to as much as an 85% reduction in enzymatic performance (i.e., when switching to trehalose) (see Fig. 4). This enzymatic performance reduction is mirrored in the observed slower growth rates (GR) on galactose (GR = 0.17 h−1), maltose (GR = 0.28 h−1), and trehalose (GR = 0.05 h−1) compared to glucose (GR = 0.42 h−1). scRBA results allude to a likely strong adaptation of S. cerevisiae for proteome-efficient growth on glucose but not on the other three sugar substrates. Specifically, under galactose growth, galactose 1-phosphate accumulates while the fructose 6-phosphate pool is depleted36, suggesting the formation of a metabolic bottleneck at one of: (i) UDP-glucose – hexose-1P uridylyltransferase, (ii) UDP-glucose 4-epimerase, or (iii) phosphoglucomutase. Under maltose utilization, the maltose/proton symporter (uptake) and the alpha-glucosidase (maltose to glucose reaction) are likely rate-limiting as S. cerevisiae is susceptible to maltose hypersensitivity shock119. Under trehalose growth, the turnover number of the first enzymatic step, trehalase (kapp of 355 s−1), is comparable to that of the first step under glucose growth, hexokinase (kapp of 319 s−1). However, a closer look at the measured enzyme concentrations30,37 reveals an approximately 60-fold lower availability of trehalase (i.e., 0.22 nmol enzyme gDW−1) compared to hexokinase (i.e., 14 nmol enzyme gDW−1) for trehalose and glucose growth, respectively.
Distributions are visualized using standard box plots (i.e., box: interquartile range (IQR), whiskers: 1.5*IQR, and dots: outliers). The value of “n” indicates the number of overlapping reactions of the respective two sets of reactions whose kapp can be estimated from available flux and protein measurements.
Lack of oxygenation and nutrient limitation generally negatively affect enzymatic efficiency (see Fig. 4). Under C- and N-limitation, network-wide efficiency reduction is likely due to depleted intracellular metabolite pools56, which introduce substrate level limitations for many enzymes. Under anaerobic conditions, two-fold efficiency reductions are predicted for enzymes in the TCA cycle performing a biosynthetic role, ethanol fermentation (i.e., specifically alcohol dehydrogenase), fatty acid biosynthesis and elongation, and nucleotide biosynthesis.
Amino acid supplementation to the (minimal) YNB media does not appear to significantly affect enzyme efficiency (see Fig. 4). Overall, using model scRBA, we found that by estimating condition-specific in vivo kapp values we can elucidate changes in overall enzymatic efficiency utilization as a function of growth conditions and extracellular nutrient availability.
scRBA model simulation of Crabtree-positive phenotypes
We next contrasted scRBA model predictions against experimental growth phenotypes and proteome allocations at varying glucose uptake rates29,30,33–37,39,41–44,47. GAMRBA, NGAM, and in vivo kapp values for batch aerobic conditions were used in the simulations. Overall, model scRBA recapitulated both the Crabtree-negative phenotype (i.e., no ethanol overflow) at low glucose uptake rates and the Crabtree-positive phenotype (i.e., ethanol overflow) at high glucose uptake rates (see Fig. 5A). The translation parameter kribo (amino acids per ribosome per second) was set at 13.2 (see Methods), which is only slightly higher than the value of 10.5 estimated from dynamic labeled peptide measurements78. Note that FBA using biomass yield as the objective function predicts the Crabtree-negative phenotype but not the Crabtree-positive phenotype, as growth rate will simply scale with glucose uptake rate without an upper limit. This implies that in the scRBA model growth rate is limited by rRNA and protein capacity constraints. scRBA predicts that rRNA capacity is exhausted for cells that grow at the maximal rate of 0.47 h−1, which matches the experimentally derived growth rates (see Fig. 5B). In contrast to the fully utilized rRNA capacity, only 72% of protein capacity is needed to produce biomass at the maximal growth rate. This means that more than one fourth of the proteome capacity can be allocated to other cofactor-balanced pathways such as ethanol production and/or complementary glycolysis enzymes.
(A) Predicted (lines) and experimental (dots) growth and ethanol secretion rates at different glucose uptake rates. (B) Protein and rRNA capacity usage at different glucose uptake rates. Overall, model scRBA determines that limited rRNA capacity prevents faster growth and excess protein capacity accommodates ethanol overflow.
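The capacity argument above can be illustrated with a deliberately simplified sketch. It is not the scRBA optimization problem: it only assumes that growth is the smaller of a substrate-limited rate and a ribosome-limited ceiling, and that any glucose not needed for growth is available for overflow. The function name `toy_overflow` and the biomass yield `TOY_YIELD` are hypothetical and chosen for illustration; only the ceiling of 0.47 h−1 is taken from the text.

```python
def toy_overflow(glucose_uptake, biomass_yield, mu_max):
    """Return (growth rate, glucose left over for overflow) for a toy cell.

    glucose_uptake: mmol gDW^-1 h^-1
    biomass_yield:  gDW per mmol glucose (hypothetical, not from the paper)
    mu_max:         ribosome-limited growth ceiling in h^-1
    """
    # Without a capacity constraint, growth would scale linearly with uptake
    # (the behaviour attributed to yield-maximizing FBA in the text).
    substrate_limited_mu = biomass_yield * glucose_uptake
    # The capacity constraint caps growth at mu_max.
    mu = min(substrate_limited_mu, mu_max)
    # Glucose beyond what growth consumes is routed to overflow in this toy.
    glucose_for_growth = mu / biomass_yield
    overflow = max(glucose_uptake - glucose_for_growth, 0.0)
    return mu, overflow


MU_MAX = 0.47     # h^-1, maximal growth rate reported in the text
TOY_YIELD = 0.1   # gDW per mmol glucose, ASSUMED for illustration only

for uptake in (2.0, 4.7, 10.0, 13.2):
    mu, overflow = toy_overflow(uptake, TOY_YIELD, MU_MAX)
    print(f"uptake={uptake:5.1f}  mu={mu:.2f} h^-1  overflow glucose={overflow:.2f}")
```

Below the critical uptake rate the toy cell shows no overflow; above it, growth plateaus and the surplus glucose grows linearly with uptake. The real model additionally checks that the overflow pathway fits within the remaining protein capacity.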
Model scRBA identifies that ethanol overflow is a consequence of remaining proteome capacity upon rRNA capacity exhaustion. This raises the question of why rRNA and protein capacities in S. cerevisiae are not apportioned to facilitate even faster growth rates. For example, Escherichia coli has a much larger RNA capacity (i.e., 20.5%120 vs. 6.6% gRNA gDW−1 in S. cerevisiae74) and a slightly more efficient ribosome (i.e., kribo of 17 121 vs. 13.2 in S. cerevisiae (this work)). This enables E. coli to access a significantly higher maximal growth rate than S. cerevisiae (i.e., max growth of 1.2 h−1 122 vs. 0.49 h−1 in S. cerevisiae30). While at first glance the apportionment in S. cerevisiae is counter-intuitive, the slower-growing ethanol overflow phenotype offers a number of evolutionary advantages: (i) out-competing other microbes in sugar consumption, (ii) producing ethanol that is toxic to bacterial competitors, and (iii) adapting easily to anaerobic conditions with a ready-to-deploy respiro-fermentative proteome allocation111,123. The adaptive advantage of S. cerevisiae’s proteome was demonstrated in anaerobic and cyclically oxygenated cultures where higher abundances of S. cerevisiae cells competing with the respiratory-favoring yeast Issatchenkia orientalis were measured111. Overall, model scRBA results corroborate literature-reported observations and strongly suggest that rRNA limitations coupled with reserved protein capacity are the key drivers of the Crabtree effect in S. cerevisiae.
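The E. coli comparison can be put in rough numerical terms using only the values quoted above. The sketch below assumes, as a crude proxy that is not part of the scRBA formulation, that translation throughput scales with the product of RNA mass fraction and per-ribosome elongation rate; it ignores differences in protein content, ribosome composition, and the rRNA share of total RNA.

```python
# Values quoted in the text; dictionary and function names are illustrative.
rna_fraction = {"E. coli": 0.205, "S. cerevisiae": 0.066}   # gRNA per gDW
k_ribo = {"E. coli": 17.0, "S. cerevisiae": 13.2}           # amino acids per ribosome per s
mu_max = {"E. coli": 1.2, "S. cerevisiae": 0.49}            # h^-1


def ecoli_over_yeast(values):
    """Ratio of the E. coli value to the S. cerevisiae value."""
    return values["E. coli"] / values["S. cerevisiae"]


# Crude proxy for relative translation capacity (ASSUMPTION: capacity ~ RNA x rate).
capacity_ratio = ecoli_over_yeast(rna_fraction) * ecoli_over_yeast(k_ribo)
growth_ratio = ecoli_over_yeast(mu_max)

print(f"translation capacity proxy (E. coli / yeast): {capacity_ratio:.2f}")
print(f"maximal growth rate ratio  (E. coli / yeast): {growth_ratio:.2f}")
```

Under this proxy the capacity ratio comes out larger than the growth rate ratio, which is in line with the qualitative statement that the larger RNA capacity and faster ribosome give E. coli access to faster growth, without implying a one-to-one scaling.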
Effect of protein capacity limit on metabolic flux upper bounds and maximal product yields
Enzyme availability bottlenecks can add further barriers to reaching the FBA-calculated maximum theoretical limits. Identifying these yield-limiting enzymes is important to guide specific gene overexpression strategies that remedy these shortcomings without wasting resources on enzymes that are not limiting. To this end, we contrasted the flux bounds calculated by flux variability analysis (FVA) using model scRBA (with kapp parameters for batch aerobic conditions typically used in compound production) and model iSace1144 using FBA. RBA/FBA absolute upper bound flux ratios are calculated for 800 flux-carrying metabolic reactions under glucose uptake conditions (see Fig. 6A for a histogram and Supplementary Data 5 for details). RBA/FBA ratios are less than 20% for as many as 516 out of 800 flux-carrying reactions (see Fig. 6A), indicating that catalytic resource limitations as encoded in model scRBA are propagated to most reactions in the metabolic network. In contrast, two spontaneous reactions, 101 reactions coupled to biomass production, six glucose uptake-associated reactions, and fifteen product secretion reactions are (nearly) unconstrained by proteome allocation (i.e., ratio value > 90%). In central metabolism, FBA (through FVA) allows for maximal glycolysis and pentose phosphate pathway (PPP) fluxes that are up to an order of magnitude larger than the glucose uptake rate (i.e., 13.2 mmol gDW−1 h−1) (see Fig. 6B and Supplementary Data 5). These very high fluxes are caused by activating ATP-consuming cycles. For example, the FBA-calculated maximal flux of the phosphofructokinase (ID: PFK_c) reaction in glycolysis is 225 mmol gDW−1 h−1, of which 95% is carried by an ATP-consuming cycle with the fructose bisphosphate phosphatase reaction in gluconeogenesis. These cycles are retained in FBA because, without any additional constraints, extra glucose can be used to produce ATP at a yield of up to 25.6 mol ATP / mol glucose.
Imposing protein and ribosome availability constraints in RBA greatly reduces the extent of ATP-consuming cycles (see Fig. 6B and 6C). For example, the RBA-calculated PFK_c maximal flux is 27.6 mmol gDW−1 h−1, which is roughly 8-fold smaller than the FBA-calculated one. The same “tightening” of the upper bound through RBA applies to the fluxes of succinate dehydrogenase (ID: SUCDq6_m) and malate dehydrogenase (ID: MDH_m) that are part of the reduced cofactor cycling between cytosol and mitochondria.124–126 The flux around the cycle is determined exactly by the mitochondrial proton gradient. Therefore, protein capacity constraints are needed to control flux through futile cycles, especially for organisms with highly active overflow metabolism. Enzyme availability also limits TCA cycle fluxes to approximately 80% of the metabolic limit determined by the (glucose-derived) acetyl-CoA surplus (see Fig. 6B and 6C). In contrast, fluxes of the six non-cycling glycolysis reactions are well resolved through FBA as they are fully coupled to the pre-specified glucose uptake (see Fig. 6C). Counterintuitively, their upper bounds are slightly higher using RBA than with FBA, by about 1.5–1.8% (see Fig. 6C). This is because a slightly higher glycolysis/PPP split ratio for a given glucose uptake is predicted by optimizing NADPH usage in the scRBA model. The lower flux through the NADPH-generating PPP arises because the actual NADPH needs for amino acid synthesis (accounted for in detail by RBA) are slightly less than the lumped amount of the stoichiometric description used in FBA.
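The RBA/FBA upper bound ratio used in this comparison is simple to reproduce for any reaction once both bounds are known. The sketch below uses the PFK_c values quoted above; the function names and the label strings are illustrative and are not taken from the scRBA code base, and the 20% and 90% thresholds are the ones used in the text.

```python
def upper_bound_ratio(rba_ub, fba_ub):
    """Ratio of absolute RBA to FBA flux upper bounds for one reaction."""
    if fba_ub == 0:
        raise ValueError("FBA upper bound is zero; the reaction carries no flux")
    return abs(rba_ub) / abs(fba_ub)


def classify(ratio):
    """Bin a ratio using the 20% and 90% thresholds quoted in the text."""
    if ratio < 0.20:
        return "severely restricted"
    if ratio > 0.90:
        return "(nearly) unconstrained"
    return "partially restricted"


PFK_FBA_UB = 225.0   # mmol gDW^-1 h^-1, FBA-calculated maximum for PFK_c
PFK_RBA_UB = 27.6    # mmol gDW^-1 h^-1, RBA-calculated maximum for PFK_c

pfk_ratio = upper_bound_ratio(PFK_RBA_UB, PFK_FBA_UB)
cycle_flux = 0.95 * PFK_FBA_UB      # share of the FBA maximum attributed to cycling
restricted_fraction = 516 / 800     # reactions with a ratio below 20%

print(f"PFK_c RBA/FBA ratio: {pfk_ratio:.3f} ({classify(pfk_ratio)})")
print(f"FBA cycling flux through PFK_c: {cycle_flux:.2f} mmol gDW^-1 h^-1")
print(f"fraction of reactions below 20%: {restricted_fraction:.1%}")
```

Applying `classify` to every reaction in Supplementary Data 5 would reproduce the binning behind the histogram, assuming the bounds are exported in the same units for both models.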
(A) Histogram of RBA/FBA flux upper bound ratio values. (B) RBA- (white bars) and FBA-calculated (black bars) flux ranges for reactions in central metabolism subject to experimental glucose uptake and growth rates30. Reaction IDs are in BiGG format127 and reaction details are available in the scRBA GitHub repository. (C) Central metabolism network (drawn by the Escher software128) with overlaid reaction IDs and corresponding RBA/FBA flux upper bound ratio values (annotated as colors of arrows).
Next, we evaluated the effect of protein capacity limits on the production yields for 28 compounds by contrasting their maximal yields from the scRBA and iSace1144 models (see Table 3 and Supplementary Data 5 for RBA-predicted fluxes and proteome allocations). Overall, the maximal yields for 26 out of 28 product metabolites are only marginally restricted by protein capacity, with RBA-calculated yields within 80-100% of the FBA-calculated values (see Table 3). Butanediol, citramalate, ethanol, and lactate production are least affected, with RBA-predicted yields retained at 100% of FBA-calculated yields. This is consistent with the previously determined high protein reserve capacity of S. cerevisiae for overflow metabolism. In contrast, succinate and reticuline maximal yields are significantly affected by the protein capacity limit (i.e., down to 21% and 68% of FBA-calculated yields, respectively). Succinate production is directly limited by the enzyme efficiency of its cytosolic pathway. We calculated for S. cerevisiae kapp values of 13, 30, 71, and 1,582 s−1 for pyruvate-to-succinate enzymes compared to kapp values of 539 and 612 s−1 for pyruvate-to-ethanol enzymes. This kinetic bottleneck is absent in E. coli, whose pyruvate-to-succinate enzyme kcat values (archived in BRENDA57) are 250, 540, 931, and 1,150 s−1. The model scRBA-predicted succinate yield (i.e., 0.20 g g-Glucose−1) falls within the experimentally observed yield range for engineered S. cerevisiae strains, which spans from 0.11 for an unevolved strain88 to 0.43 for a laboratory-evolved strain129. These achieved succinate yields are much lower than the FBA-calculated maximum theoretical yield of 0.95 or that of an engineered E. coli strain (i.e., 0.81)130. As demonstrated in this succinate case study, RBA results can provide a roadmap for relieving production bottlenecks through gene overexpression and/or replacement of enzymes with more efficient heterologous versions.
The production of reticuline, on the other hand, is not limited by the efficiency of the enzymes in the biosynthetic pathway but rather by the proteome cost of recharging the needed cofactors (i.e., two NADPH and three S-adenosyl methionine (SAM) per reticuline molecule)98. Using model scRBA we predicted that supplying NADPH (i.e., through PPP) for reticuline consumes only 1.6% of the total available proteome. However, the recovery of SAM (C1 donor) from S-adenosyl homocysteine (SAH) for reticuline production is approximately 20-fold more proteome consuming than recovering NADPH, taking up to 37% of the total proteome. The significant metabolic burden of recovering cofactor SAM has also been reported for E. coli, where a 12-fold increase in reticuline titer was achieved when the dopamine-to-reticuline conversion occurred ex vivo with the growth medium supplemented in excess with SAM131. The costliest product metabolite in terms of NADPH consumption is fatty alcohol, requiring 16 moles of NADPH per mole of C16. Despite this high NADPH cost, the corresponding protein allocation for NADPH recharging is only 6.4% of the total proteome. scRBA results demonstrate that a high demand for expensive-to-recharge cofactors can create significant proteome allocation burdens that can negatively affect maximal production yields. This suggests that more enzyme-efficient cofactor recharging variants may be a promising route for de-bottlenecking flux through the product biosynthetic pathway. Although protein capacity is not flagged as an issue for most metabolic products, experimentally achieved yields are usually significantly lower than RBA model predicted values except for ethanol and l-lactate (see Table 3). This underperformance suggests that there is likely untapped potential to further optimize yield through the application of metabolic engineering strategies.
Model scRBA predicted proteome allocations for the maximal production of 28 metabolic products are provided in Supplementary Data 6. Contrasting these values with experimentally measured quantitative proteomic levels can help guide further strain redesign efforts.
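The succinate bottleneck can be made concrete with a back-of-the-envelope calculation on the turnover numbers quoted above. The sketch assumes, for illustration only, that every step in a pathway carries the same flux and that the enzyme needed for a step scales as flux divided by kapp; it ignores enzyme molecular weights, which the full RBA formulation accounts for. The function and variable names are hypothetical.

```python
def enzyme_demand_per_flux(turnover_numbers):
    """Total enzyme (mol) needed per unit pathway flux (mol/s).

    ASSUMPTION: equal flux through every step, one enzyme per step,
    so each step needs 1/k enzyme per unit flux.
    """
    return sum(1.0 / k for k in turnover_numbers)


SC_SUCCINATE_KAPP = [13, 30, 71, 1582]     # s^-1, S. cerevisiae pyruvate-to-succinate
SC_ETHANOL_KAPP = [539, 612]               # s^-1, S. cerevisiae pyruvate-to-ethanol
EC_SUCCINATE_KCAT = [250, 540, 931, 1150]  # s^-1, E. coli pyruvate-to-succinate (BRENDA)

sc_succ = enzyme_demand_per_flux(SC_SUCCINATE_KAPP)
sc_etoh = enzyme_demand_per_flux(SC_ETHANOL_KAPP)
ec_succ = enzyme_demand_per_flux(EC_SUCCINATE_KCAT)

# Share of the yeast succinate pathway demand caused by the slowest step alone.
slowest_step_share = (1.0 / min(SC_SUCCINATE_KAPP)) / sc_succ

# Ratio of RBA-predicted to FBA-calculated maximal succinate yield (g per g glucose).
yield_ratio = 0.20 / 0.95

print(f"yeast succinate vs ethanol enzyme demand: {sc_succ / sc_etoh:.1f}x")
print(f"yeast vs E. coli succinate enzyme demand: {sc_succ / ec_succ:.1f}x")
print(f"share of demand from the slowest step:   {slowest_step_share:.0%}")
print(f"RBA/FBA succinate yield ratio:            {yield_ratio:.0%}")
```

Under these assumptions the slowest enzyme dominates the pathway demand, which is the kind of target that overexpression or heterologous replacement would address first. Note that the comparison with E. coli mixes in vivo kapp with in vitro kcat values, as the text does.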
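The relative burden of the two cofactors in reticuline production can be checked directly from the proteome shares and stoichiometries quoted above. The sketch below is plain arithmetic on those reported values; the per-molecule normalization is an added illustration that is valid only because both cofactors are consumed at the same reticuline production flux. The dictionary layout is hypothetical.

```python
# Reported values for maximal reticuline production (from the text).
RETICULINE_COFACTORS = {
    "NADPH": {"per_product": 2, "proteome_pct": 1.6},
    "SAM": {"per_product": 3, "proteome_pct": 37.0},
}


def proteome_pct_per_cofactor(entry):
    """Proteome share (%) spent per cofactor molecule consumed per product."""
    return entry["proteome_pct"] / entry["per_product"]


nadph = RETICULINE_COFACTORS["NADPH"]
sam = RETICULINE_COFACTORS["SAM"]

# Total burden ratio: the basis for the "approximately 20-fold" statement.
total_ratio = sam["proteome_pct"] / nadph["proteome_pct"]
# Burden ratio normalized by how many molecules of each cofactor are needed.
per_molecule_ratio = proteome_pct_per_cofactor(sam) / proteome_pct_per_cofactor(nadph)

print(f"SAM vs NADPH total proteome burden:        {total_ratio:.1f}x")
print(f"SAM vs NADPH burden per cofactor molecule: {per_molecule_ratio:.1f}x")
```

Even after normalizing by stoichiometry, SAM recovery remains more than an order of magnitude costlier than NADPH recharging in this calculation.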
Discussion
Model scRBA provides insights into S. cerevisiae metabolism under a variety of growth conditions by imposing enzyme and ribosome availability limitations on top of mass balances and biomass synthesis needs. scRBA was designed to reduce the number of variables and parameters to only the ones that can be supported by the available proteomic and fluxomic data. Biological processes not contributing to the functional parts of the model (e.g., transcriptional machinery) are modeled in an aggregate manner as sinks for redox and carbon resources without tracking individual steps (see Methods). This makes scRBA more computationally tractable in analyzing metabolic proteome allocation compared to models of larger scope27,28. If details on transcription and mRNA availability are available, then the ETFL framework27 provides an alternative. Whole-cell modeling28 has also been performed for S. cerevisiae, but computational efficiency and missing parameter values (as noted by the authors) remain a challenge. A distinguishing feature of scRBA is that parameters GAM, NGAM, and kapp are derived directly from in vivo data, thus enabling the condition-specific prediction of growth yield, rate, and proteome allocation. We found significant variation in these parameters depending on nutrient and oxygen availability as well as carbon substrate choice. It is unclear whether parameter values will ultimately converge to the ones for the reference conditions upon adaptation through laboratory evolution for different growth conditions, substrates, and/or genetic backgrounds, as seen for E. coli116.
scRBA results suggest that biomass production under maximum growth consumes all available rRNA but only part of the proteome capacity in S. cerevisiae. This leaves reserve capacity for ethanol overflow, which provides a mechanism for recharging NAD+ and an electron sink. The predicted link between reserve proteome capacity and metabolic overflow in S. cerevisiae raises a question on its generality. Analogous overflow phenotypes are present for E. coli with acetate122 and cancer cells with lactate132 overflow. A quantitative framework similar to scRBA could in principle be applied to confirm the presence of excess proteome at the point of rRNA exhaustion.
The key design feature of scRBA, the use of kapp as opposed to kcat values, unlocks the opportunity to parameterize many more reactions by leveraging the availability of quantitative proteomic and fluxomic datasets29,30,34,37. scRBA makes use of 336 kapp values pegged to in vivo protein and flux measurements, whereas only 80 kcat values for S. cerevisiae under batch aerobic conditions with glucose as substrate can be recovered from BRENDA57. Using kapp values also obviates the need to “approximate” missing kcat values with measurements from other organisms133. Contrasting in vivo kapp values with in vitro kcat values can be an effective method for leveraging multi-omics data to systematically identify enzymes with in vivo enhancements. While the mechanism of enhancements may remain elusive, their presence and extent could reveal important biological insight as to the spatial organization of pathways and regulation of enzymes.
Consistent with metabolomics data36,56, we observe a lowering of kapp values under substrate and/or nitrogen limitations caused by reduced metabolite concentrations. This further reinforces the need to re-estimate kapp parameters whenever simulation conditions change. An alternative approach could be to track enzyme turnover and enzyme saturation as carried out in the growth balance analysis (GBA) framework134. GBA uses formal (nonlinear) kinetic descriptions (e.g., Michaelis-Menten) for the reaction fluxes and mass balance equations for all macromolecules as in RBA. This enhanced level of detail in description comes at the expense of having to generate absolute intracellular metabolite concentration measurements (on top of flux and enzyme measurements) as inputs to estimate enzyme saturation parameters (e.g., KM in the Michaelis-Menten equation). Therefore, even though RBA sacrifices the ability to directly model enzyme saturation, it can quantify proteome allocation at a genome-scale in a computationally and data efficient manner.
Model scRBA is shown to be better at recapitulating S. cerevisiae metabolic phenotypes than the corresponding FBA model iSace1144 111. In particular, scRBA can both identify reaction step bottlenecks and often pinpoint the functional reason for them, such as an inefficient enzyme with low kapp or proteome-costly cofactor cycling. We found that a significant fraction of S. cerevisiae reactions has an upper bound that is severely restricted due to proteome allocation limits (i.e., more than 60% have an upper bound that is < 20% of FBA theoretical). Despite this, scRBA-predicted product yields are generally only marginally lower than the FBA-based theoretical limit. This is primarily because most upper bound restriction by RBA happens for reactions that can participate in ATP-consuming cycles. While this ATP overhead is tolerated in FBA calculations by draining from a large glucose surplus, the associated increased proteome allotments are severely curtailed in RBA calculations. Because product synthesis pathways do not generally involve ATP-consuming cyclic steps, reductions in predicted maximum yields under RBA are not overly taxing. It is important to stress that the RBA modeling framework does not account for many other factors that could affect production, such as regulatory feedback and metabolite pool bottlenecks135,136. For example, release of inhibition of several upstream enzymatic steps by high concentrations of tyrosine and phenylalanine needs to be addressed when producing shikimate-derived metabolites in S. cerevisiae90. Overall, we envision the scRBA model as part of a combined experimental and computational toolkit to investigate cellular metabolism and assess metabolic proteome allocation of S. cerevisiae. The relatively compact nature of the RBA framework presented herein makes it tractable for non-model organisms leveraging advances in GSM reconstructions137.
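The link between metabolite depletion and lower kapp values can be illustrated with the textbook Michaelis-Menten saturation term. The sketch below assumes an irreversible single-substrate enzyme, so that the apparent turnover is kcat scaled by s/(KM + s); it is not the GBA formulation itself, and the kcat, KM, and concentration values are hypothetical numbers chosen only to show the trend.

```python
def apparent_turnover(kcat, km, substrate):
    """Apparent turnover number under irreversible Michaelis-Menten kinetics.

    kcat:      maximal turnover number (s^-1)
    km:        Michaelis constant (same concentration units as substrate)
    substrate: substrate concentration
    """
    if km <= 0 or substrate < 0:
        raise ValueError("km must be positive and substrate non-negative")
    saturation = substrate / (km + substrate)   # fraction of enzyme saturated
    return kcat * saturation


KCAT = 100.0   # s^-1, HYPOTHETICAL
KM = 1.0       # mM, HYPOTHETICAL

# Hypothetical intracellular concentrations, from replete to depleted pools.
for conc_mM in (10.0, 1.0, 0.1):
    k = apparent_turnover(KCAT, KM, conc_mM)
    print(f"[S] = {conc_mM:5.1f} mM  ->  apparent turnover = {k:6.1f} s^-1")
```

When the substrate pool falls from well above KM to well below it, the apparent turnover drops from near kcat to a small fraction of it, which is why a single kapp value estimated under one condition cannot be expected to transfer to a nutrient-limited one.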
Data availability
Data is available in the GitHub repository https://github.com/maranasgroup/scRBA.
Code availability
Software scripts are available in the GitHub repository https://github.com/maranasgroup/scRBA. The user manual is provided in the Supplementary Methods.
Author contributions
Conceptualization: H.V.D. and C.D.M.; methodology, software, and data curation: H.V.D.; analysis: H.V.D. and C.D.M.; resources, supervision, and funding acquisition: C.D.M.; writing: H.V.D. and C.D.M.
Competing interests
The authors declare no competing interests for this manuscript.
Acknowledgements
We thank Patrick F. Suthers (from Penn State) for his comments. Computations for this research were performed on the Pennsylvania State University’s Institute for Computational and Data Sciences’ Roar supercomputer. This work was funded by the DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under Award Number DE-SC0018420). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Energy.