Abstract
The metabolism of most organisms is controlled by a diverse cast of regulatory processes, including transcriptional regulation and post-translational modifications (PTMs). Yet how metabolic control is distributed between these regulatory processes is unknown. Here we present Comparative Analysis of Regulators of Metabolism (CAROM), an approach that compares regulators based on network connectivity, flux, and essentiality of their reaction targets. Using CAROM, we analyze transcriptome, proteome, acetylome and phospho-proteome dynamics during transition to stationary phase in E. coli and S. cerevisiae. CAROM uncovered that the targets of each regulatory process shared unique metabolic properties: growth-limiting reactions were regulated by acetylation, while isozymes and futile-cycles were preferentially regulated by phosphorylation. Reversibility, essentiality, and molecular-weight further distinguished reactions controlled through diverse mechanisms. While every enzyme can be potentially regulated by multiple mechanisms, analysis of context-specific datasets reveals a conserved partitioning of metabolic regulation based on reaction attributes.
Author summary
There are several ways to regulate an enzyme’s activity in a cell. Yet, the design principles that determine when an enzyme is regulated by transcription, translation or post-translational modifications are unknown. Each control mechanism, such as transcription, comprises several regulators that control a distinct set of targets. So far, it is unclear if similar partitioning of targets occurs at a higher level, between different control mechanisms. Here we systematically analyze patterns of metabolic regulation in model microbes. We find that five key parameters can distinguish the targets of each mechanism. These key parameters provide insights on specific roles played by each mechanism in determining overall metabolic activity. This approach may help define the basic regulatory architecture of metabolic networks.
Introduction
A myriad of control mechanisms regulate microbial metabolic adaptation to new environments [1–8]. Nevertheless, microbes deploy distinct regulatory mechanisms to regulate enzyme activity in response to specific environmental challenges. For example, B. subtilis cells primarily utilize transcriptional regulation when glucose is available, but post-transcriptionally regulate metabolic enzymes after malate addition [9]. In both E. coli and yeast, some pathways, such as glycolysis, are predominantly regulated post-transcriptionally, while others, such as the TCA cycle, are regulated at the transcriptional level [1,3,10]. This suggests that, apart from differences in response time, specific mechanisms are deployed for specialized regulatory tasks. Nevertheless, it is unclear why some enzymes are regulated by acetylation rather than by other PTMs such as phosphorylation [3,4].
Numerous advantages of regulation by PTMs have been proposed over the past five decades [11–13]. These include low energy requirements, rapid response, and signal amplification. Yet these characteristics are not unique to PTMs, and they do not differentiate between PTMs such as acetylation and phosphorylation. The staggering complexity of each regulatory process has limited the comparative analysis of metabolic regulation at a systems level [3]. Existing studies have focused on a small set of metabolic pathways or on a single regulatory process [4,10,14–20]. Such studies have revealed reaction reversibility and metabolic network structure to be predictive of regulation [8,16,21–24]. Yet these studies do not shed light on the differences between each regulatory process. In sum, although some general network principles of regulation are known, how regulation is partitioned among various regulatory mechanisms is unclear.
We hence developed a data-driven approach, called Comparative Analysis of Regulators of Metabolism (CAROM), to identify unique features of each regulatory process. CAROM achieves this by comparing various properties of metabolic enzymes, including essentiality, flux, molecular weight and topology. It identifies those properties that are more enriched among the targets of each process than expected by chance.
Results and Discussion
Here we focus on four well-studied control mechanisms with available omics datasets: transcription, post-transcription, phosphorylation and acetylation. We analyzed the dynamics of metabolic regulation during a well-characterized process in yeast, namely, transition to stationary phase. We obtained RNA sequencing, time-course proteomics, acetylomics, and phospho-proteomics data from the literature [25–27]. Targets for each process were determined based on differential levels between stationary and exponential phase (Methods). We assumed that PTMs and other regulatory sites that are dynamic and conditionally regulated are likely to be functional [28].
The targets of diverse regulatory mechanisms were used as input to CAROM. CAROM analyzes the properties of the targets in the context of a genome-scale metabolic network model of yeast [29]. We hypothesized that differences in target preferences between diverse regulators can be inferred from the network topology and fluxes. Protein and gene targets of each process were mapped to corresponding metabolic reactions in the model. There was significant overlap among reactions regulated through changes in both the transcriptome and proteome, and transcriptome and acetylome (hypergeometric p-value = 5 × 10^−25 and 1 × 10^−15 respectively; S. Table 1). In contrast, there was little overlap between the targets of phosphorylation and those of other mechanisms (p-value > 0.1; S. Table 1). While prior studies found higher overlap between targets of PTMs [30,31], they used all possible sites that can be acetylated or phosphorylated. However, only a fraction of PTM sites are likely to be active and functional in a single condition. Overall, each regulatory mechanism had a distinct set of targets (Figure 1A).
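The overlap significance described here can be illustrated with a minimal sketch of a one-sided hypergeometric test. The function name, argument names, and counts below are hypothetical and are not taken from the study; the original analysis was performed in MATLAB, so this is only an illustration of the test, not the authors' implementation.

```python
from math import comb

def hypergeom_overlap_pvalue(population, set_a, set_b, overlap):
    """P(X >= overlap) when set_b items are drawn without replacement
    from a population that contains set_a 'marked' items."""
    upper = min(set_a, set_b)
    total = comb(population, set_b)
    # Sum the upper tail of the hypergeometric distribution exactly
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts (assumption): 1000 reactions, 100 regulated by one
# mechanism, 80 by another, 25 regulated by both.
p_value = hypergeom_overlap_pvalue(1000, 100, 80, 25)
```

A small p-value indicates that the two target lists share more reactions than expected if targets were assigned at random.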
What are the common features of enzymes that are regulated by each mechanism? To answer this, we used CAROM to compare the regulation of enzymes that are essential for growth in minimal media. Essential enzymes in the yeast metabolic model were determined using Flux Balance Analysis (FBA) [32]. Surprisingly, this set of enzymes was highly enriched among those regulated by acetylation but not by other processes (ANOVA p-value < 10^−16; Figure 1B; S. Table 2). Since regulation can be optimized for fitness across multiple conditions [33], we identified enzymes that impact growth in 87 different nutrient conditions comprising various carbon and nitrogen sources using FBA. This set of essential enzymes was once again enriched for acetylation relative to other mechanisms (ANOVA p-value < 10^−16; S. Figure 1). This trend was also observed using an experimentally derived list of essential genes (hypergeometric p-value = 2 × 10^−7 for acetylation). Interestingly, in contrast to acetylation, genes regulated at the proteomic level were significantly under-represented among the essential genes (hypergeometric p-value of depletion = 8 × 10^−11). Thus, essential enzymes are likely to be constitutively expressed and their activity modulated through acetylation. This may explain why transcriptional regulation has minimal impact on fluxes in central metabolism, which contains several growth-limiting enzymes [3,10].
We next used CAROM to determine the impact of reaction position in the network on its regulation. We counted the number of pathways each reaction is involved in, along with other topological metrics, such as closeness, degree and PageRank. We found that the regulation of enzymes differed significantly based on network topology (Figure 1C). First, reactions with low connectivity, measured through any of the topological metrics, were highly likely to be unregulated. In contrast, highly connected enzymes linking multiple pathways were more likely to be regulated by PTMs. Interestingly, reactions regulated by both PTMs had the highest connectivity (S. Figures 2, 3). Several key hubs, such as acetyl-CoA acetyltransferase, hexokinase and phosphofructokinase, are regulated by at least 2 different mechanisms (S. Table 3).
We next assessed how regulation differs based on the magnitude and direction of flux through the network. We inferred the full range of fluxes possible through each reaction using flux variability analysis (FVA) [34]. Since yeast cells may not optimize their metabolism for biomass synthesis during transition to stationary phase, we also performed FVA without assuming biomass maximization. We found that irreversible reactions were highly likely to be regulated (S. Figure 4). A recent study found the same trend for allosteric regulation as well [21]. However, reversibility alone did not differentiate between regulatory mechanisms.
Interestingly, reactions that have the potential to carry high fluxes were predominantly regulated by phosphorylation (Figure 1D; ANOVA p-value < 10^−16). This set of phosphorylated reactions comprises several kinase-phosphatase pairs, enzymes that are part of loops that consume energy (“futile cycles”), or reactions that have isozymes in compartments such as vacuoles or nucleus (S. Table 4). Thus, phosphorylation in this condition selectively regulates reactions to avoid futile cycling between antagonizing reactions or those operating in different compartments. Experimentally constrained fluxes from the Hackett et al study [21] revealed similar patterns of regulation (S. Figure 5). Reactions with the highest flux, such as ATP synthase, phosphofructokinase, and nucleotide kinase, were also regulated by multiple mechanisms.
Finally, we compared regulation based on fundamental enzyme properties: catalytic activity and molecular weight. While catalytic activity was similar across the targets of all mechanisms, targets of phosphorylation had the highest molecular weight (p-value < 10^−16) (S. Figures 6, 7). The correlation between molecular weight and maximum flux is negligible (Pearson’s correlation R = 0.02), suggesting that maximum flux and molecular weight are likely to be independent predictors of regulation by phosphorylation.
To check whether this pattern of regulation is observed in other conditions, we used CAROM to analyze data from the nitrogen starvation response and the cell cycle in yeast, where both phospho-proteomics and transcriptomics data are available [35–38]. A similar trend of regulation was observed in these conditions, with phosphorylation regulating isozymes and enzymes that can carry high fluxes (futile cycles) (Figure 2). Since isozymes arise frequently from gene duplication, our results may explain the observation that duplicated genes are more likely to be regulated by phosphorylation [39].
Since many mechanisms of metabolic regulation are evolutionarily conserved, we next analyzed data from E. coli cells during stationary phase [40–42]. By analyzing transcriptomics, proteomics, acetylomics and phosphoproteomics data using the E. coli metabolic network model, CAROM uncovered that the pattern of regulation observed in yeast was also observed in E. coli (Figure 3).
Reactions that were regulated in E. coli had higher topological connectivity compared to those that were unregulated. Further, essential reactions were enriched for regulation by acetylation, and reactions with high maximum flux or large enzyme molecular weight were enriched for regulation by phosphorylation. However, in contrast to yeast, phosphorylation impacted very few metabolic genes in E. coli, and may play a relatively minor role in this specific context. Phosphorylation had 20-fold fewer targets compared to other mechanisms, and its targets overlapped significantly with other processes (S. Tables 5–6).
In sum, our analysis reveals a unique distribution of regulation within the metabolic network (Figure 4). Within each process, it is well known that individual regulators such as transcription factors or kinases have their own unique set of targets. Here we find that similar specialization occurs at a higher scale, involving diverse processes. Reaction properties identified by CAROM to be associated with distinct regulatory mechanisms may be related to specific functions performed by each regulator. For example, phosphorylation may represent a mechanism of feedback regulation to control futile cycles and high flux reactions that consume ATP [6,43]. Finally, this pattern of regulation is context specific – predictive features such as reaction flux or essentiality can change between conditions and influence regulation. Further, while most essential reactions were regulated, a small subset (14%) were not found to be regulated by any mechanism. These enzymes could be sites of allosteric regulation or other regulatory mechanisms not covered here due to the lack of context specific datasets (S. Table 7). Overall, these results are robust to the thresholds used for finding differentially regulated sites, using data from different sources, and other modeling parameters (S. Tables 8–12).
Since microbes exhibit a wide range of metabolic behaviors, it is not possible to uncover regulation in each condition through experiments. We need tools like CAROM to identify factors that determine the deployment of regulatory mechanisms in a metabolic context. Although flux balance analysis of metabolic models can accurately forecast optimal flux distribution, it does not provide insights on how the flux rewiring is achieved. Our analysis predicts regulatory mechanisms that will likely orchestrate flux adjustments based on reaction attributes. This can guide drug discovery and metabolic engineering efforts by identifying regulators that are dominant in different parts of the network [44]. CAROM can be applied to uncover target specificities of other regulators such as non-coding RNAs and PTMs, and help understand the architecture of metabolic regulation in a wide range of organisms.
Methods
CAROM
The CAROM approach takes as input a list of genes that are the targets of one or more regulatory processes. It compares the properties of the targets and identifies significant differences in target properties between mechanisms using ANOVA. Overall, CAROM compares the following 13 properties:
Impact of gene knockout on biomass production, ATP synthesis, and viability across 87 different conditions
Flux through the network measured through Flux Variability Analysis (FVA) and PFBA, and reaction reversibility
Enzyme molecular weight and catalytic activity
The total pathways each reaction is involved in, its Degree, Closeness and PageRank
The CAROM source-code is available from the Synapse bioinformatics repository https://www.synapse.org/CAROM
Processing omics data
We used RNA-sequencing data from Treu et al 2014 that compared the expression profile of S. cerevisiae between mid-exponential growth phase and early stationary phase [27]. A 2-fold change threshold was used to identify differentially expressed genes. Lysine acetylation and protein phosphorylation data were obtained from the Weinert et al 2014 study that compared PTM levels between exponentially growing and stationary phase cells using stable isotope labeling with amino acids in cell culture (SILAC) [26]. A 2-fold change threshold of the protein-normalized PTM data was used to identify differentially expressed PTMs. Proteomics data was taken from the Murphy et al time-course proteomics study [25]. The Hotelling T² statistic defined by the authors was used to identify proteins differentially expressed during diauxic shift; the top 25% of the differentially expressed proteins were assumed to be regulated. Proteomics data from Weinert et al was also used as an additional control, and we observed the same trends using this data (S. Table 10). Further, we repeated the analysis after removing genes that were not expressed during transition to stationary phase; the transcripts for a total of 12 genes out of the 910 in the model were not detected by RNA-sequencing in the Treu et al study [27]. Removing the 12 genes did not impact any of the results (S. Table 9).
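The fold-change rule described above can be sketched as follows. This is a minimal illustration that assumes abundances are available as plain dictionaries keyed by gene; the function name, the input layout, the inclusive comparison at exactly 2-fold, and the example values are all assumptions rather than details from the study.

```python
import math

def regulated_genes(exp_levels, stat_levels, fold=2.0):
    """Return genes whose level changes by at least `fold` in either
    direction between exponential and stationary phase."""
    cutoff = math.log2(fold)
    hits = {}
    for gene, exp_val in exp_levels.items():
        stat_val = stat_levels.get(gene)
        # Skip undetected or zero measurements rather than guessing a ratio
        if stat_val is None or exp_val <= 0 or stat_val <= 0:
            continue
        log_fc = math.log2(stat_val / exp_val)
        if abs(log_fc) >= cutoff:
            hits[gene] = log_fc
    return hits

# Hypothetical abundances (assumption), arbitrary units
exponential = {"geneA": 10.0, "geneB": 10.0, "geneC": 10.0, "geneD": 0.0}
stationary = {"geneA": 40.0, "geneB": 12.0, "geneC": 2.5, "geneD": 5.0}
targets = regulated_genes(exponential, stationary)
```

Both up- and down-regulated genes are returned, which matches the statement in this section that changes in either direction count as regulation.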
As additional validation, we used periodic data from the yeast cell cycle. Time-course SILAC phospho-proteomics data was obtained from Touati et al [37]. Phospho-sites whose abundance declined to less than 50% or increased by more than 50% in at least two consecutive timepoints were considered dephosphorylated or phosphorylated, respectively, as defined by the authors. Transcriptomics data was taken from the Kelliher et al study that identified 1246 periodic transcripts using periodicity-ranking algorithms [38].
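The consecutive-timepoint rule can be sketched as below. The sketch assumes each phospho-site is represented by a list of abundances relative to a reference (so 0.5 means 50% and 1.5 means a 50% increase); the function name, the labels, and the choice to check increases before decreases are illustrative assumptions.

```python
def classify_site(ratios, low=0.5, high=1.5, min_run=2):
    """Classify a phospho-site from its relative abundances over time."""
    def has_run(flags):
        # True if `min_run` or more consecutive flags are set
        run = 0
        for flag in flags:
            run = run + 1 if flag else 0
            if run >= min_run:
                return True
        return False

    if has_run(r > high for r in ratios):
        return "phosphorylated"
    if has_run(r < low for r in ratios):
        return "dephosphorylated"
    return "unchanged"

# Hypothetical time courses (assumption)
rising = classify_site([1.0, 1.6, 1.7, 1.0])
falling = classify_site([0.4, 0.3, 1.0])
noisy = classify_site([1.0, 0.4, 1.0, 0.4])
```

Requiring a run of consecutive timepoints means that a single outlying measurement, as in the last example, does not mark a site as regulated.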
The phospho-proteomics and transcriptome data during nitrogen shift was obtained from Oliveira et al [35,36]. The nitrogen shift studies compared the impact of adding glutamine to yeast cells growing on a poor nitrogen source (proline alone or glutamine depletion) with cells growing on a rich nitrogen source (glutamine plus proline). A 2-fold change threshold was used to identify differentially expressed transcripts and phospho-sites.
E. coli acetylation data was taken from the Weinert et al study comparing actively growing exponential phase cells to stationary phase cells [42]. Proteomics and transcriptomics data were from the Houser et al study of E. coli cells in early exponential phase and stationary phase [41]. Phospho-proteomics data for exponential and early stationary phase E. coli cells was taken from Soares et al [40]. We used a 2-fold change (p < 0.05) threshold for all studies.
The results are robust to the thresholds used for identifying differentially expressed genes or proteins (S. Table 11). In all studies, genes and proteins that are either up or down regulated were considered to be regulated. The final data set table used for all comparative analyses is provided as a supplementary material (S. Table 13).
Genome scale metabolic modeling
We used the yeast metabolic network reconstruction (Yeast 7) by Aung et al, which contains 3,498 reactions, 910 genes and 2,220 metabolites [29]. The analysis of E. coli data was done using the iJO1366 metabolic model [45]. All analyses were performed using the COBRA toolbox for MATLAB [46].
The impact of gene knockouts on growth was determined using flux balance analysis (FBA). FBA identifies an optimal flux through the metabolic network that maximizes an objective, usually the production of biomass. A minimal glucose media (default condition) was used to determine the impact of gene knockouts. Further, gene knockout analysis was repeated in a set of 87 different minimal nutrient conditions to identify genes that impact growth across diverse conditions; these conditions span all carbon and nitrogen sources that can support growth in the Yeast 7 model. The number of times each gene was found to be lethal (growth < 0.01 units) across all conditions was used as a metric of essentiality.
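The essentiality metric described above can be sketched as a simple count, assuming the knockout growth rates have already been computed by FBA for every gene and condition. The FBA step itself is not shown; the function name, input layout, and example values are hypothetical.

```python
def essentiality_scores(knockout_growth, lethal_cutoff=0.01):
    """Count, per gene, the conditions in which its knockout is lethal.

    knockout_growth maps gene -> list of predicted growth rates, one per
    nutrient condition (hypothetical input layout).
    """
    return {
        gene: sum(1 for rate in rates if rate < lethal_cutoff)
        for gene, rates in knockout_growth.items()
    }

# Hypothetical growth rates for three genes across four conditions
growth = {
    "geneA": [0.0, 0.0, 0.005, 0.0],   # lethal in every condition
    "geneB": [0.3, 0.0, 0.25, 0.31],   # lethal in one condition
    "geneC": [0.3, 0.28, 0.25, 0.31],  # never lethal
}
scores = essentiality_scores(growth)
```

With 87 conditions, the score for each gene would range from 0 (never lethal) to 87 (lethal in every condition).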
To infer topological properties, a reaction adjacency matrix was created by connecting reactions that share metabolites. We used the Centrality toolbox function in MATLAB to infer all network topological attributes including centrality, degree and PageRank.
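The construction of a reaction adjacency structure and the three topological metrics can be sketched in plain Python. This is an illustrative reimplementation under simple assumptions (undirected, unweighted edges, a damping factor of 0.85, a fixed number of power iterations), not the MATLAB function used in the study; the reaction and metabolite names are hypothetical.

```python
from collections import deque
from itertools import combinations

def reaction_graph(reaction_mets):
    """Connect two reactions when they share at least one metabolite."""
    adj = {r: set() for r in reaction_mets}
    for a, b in combinations(reaction_mets, 2):
        if reaction_mets[a] & reaction_mets[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree(adj):
    return {r: len(nbrs) for r, nbrs in adj.items()}

def closeness(adj, source):
    """Reciprocal of the mean shortest-path length to reachable reactions."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def pagerank(adj, damping=0.85, iters=100):
    """PageRank by power iteration on the undirected reaction graph."""
    n = len(adj)
    rank = {r: 1.0 / n for r in adj}
    for _ in range(iters):
        new = {r: (1.0 - damping) / n for r in adj}
        for r, nbrs in adj.items():
            # Isolated reactions spread their rank evenly over all nodes
            receivers = nbrs if nbrs else adj.keys()
            share = damping * rank[r] / len(receivers)
            for nbr in receivers:
                new[nbr] += share
        rank = new
    return rank

# Hypothetical toy network: a three-step chain plus one isolated reaction
toy = {
    "R1": {"m1", "m2"},
    "R2": {"m2", "m3"},
    "R3": {"m3", "m4"},
    "R4": {"m5", "m6"},
}
graph = reaction_graph(toy)
ranks = pagerank(graph)
```

In the toy network the middle reaction of the chain has the highest degree, closeness, and PageRank, while the isolated reaction scores lowest on all three.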
Flux Variability Analysis (FVA) was used to infer the range of fluxes possible through every reaction in the network. Two sets of flux ranges were obtained with FVA – the first with optimal biomass and the latter without assuming optimality. In the second case, the fluxes are limited by the availability of nutrients and energetics alone, thus it reflects the full range of metabolic activity possible in a cell. Reactions with maximal flux above 900 units were assumed to be unconstrained and were excluded from the analysis, as they are likely due to thermodynamically infeasible internal cycles [47]; the choice of this threshold for flagging unconstrained reactions did not impact the distribution between regulators over a wide range of values (S. Table 12).
For fitting experimentally derived flux data from Hackett et al [21], reactions were fit to the fluxes using linear optimization, and the flux through the remaining reactions that do not have experimentally derived flux data was inferred using FVA. Analysis using a related approach for inferring fluxes, PFBA, did not reveal any significant difference, as PFBA eliminates futile cycles and redundancy by minimizing total flux through the network while maximizing for biomass [48] (S. Figure 5).
Reaction reversibility was determined directly from the model annotations. We also used additional reversibility annotation from Martinez et al based on thermodynamic analysis of the yeast metabolic model [49]. Pathway annotations, enzyme molecular weight and catalytic activity values were obtained from Sanchez et al [50]. The comparative analysis of regulatory mechanisms was also repeated using the updated Yeast 7.6 model and yielded similar results (S. Table 8) [50].
The comparative analysis of target properties was done using gene-reaction pairs rather than genes or reactions alone; the gene-reaction pairs account for regulation involving all possible combinations of genes and associated reactions, including isozymes that may involve different genes but the same reaction, or multi-functional enzymes involving the same gene associated with different reactions. The 910 genes and 2310 gene-associated reactions resulted in 3375 unique gene-reaction pairs in yeast.
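The expansion into gene-reaction pairs can be sketched as below, assuming each reaction carries a Boolean gene rule string of the kind commonly stored in genome-scale models. The rule syntax, identifiers, and function name are hypothetical, and the sketch simply lists every gene mentioned in a rule without evaluating the Boolean logic.

```python
import re

def gene_reaction_pairs(gene_rules):
    """Expand {reaction: rule string} into unique (gene, reaction) pairs."""
    pairs = set()
    for rxn, rule in gene_rules.items():
        # Tokens are gene identifiers or the Boolean operators 'and' / 'or'
        for token in re.findall(r"[A-Za-z0-9_.\-]+", rule):
            if token.lower() not in ("and", "or"):
                pairs.add((token, rxn))
    return sorted(pairs)

# Hypothetical rules (assumption): RXN1 needs an enzyme complex, RXN2 has
# three isozymes, and G5 is a multi-functional enzyme that also drives RXN3.
rules = {
    "RXN1": "G1 and G2",
    "RXN2": "G3 or G4 or G5",
    "RXN3": "G5",
}
pairs = gene_reaction_pairs(rules)
```

Counting pairs rather than genes or reactions lets an isozyme and a multi-functional enzyme each contribute one row per gene-reaction combination.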
All statistical tests were performed using MATLAB. Significance of overlap between lists was estimated using the hypergeometric test. Significance of the differences in the distribution of target properties between mechanisms was determined using ANOVA and the non-parametric Kruskal-Wallis test, and was also assessed after multiple hypothesis correction (S. Table 8).
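The two group-comparison statistics named above can be sketched in plain Python. The sketch computes only the test statistics (the one-way ANOVA F and the Kruskal-Wallis H without a tie correction); converting them to p-values would require the F and chi-square distributions from a statistics library. The function names and example values are hypothetical, and the original analysis used MATLAB.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def kruskal_h(groups):
    """Kruskal-Wallis H statistic using average ranks for tied values."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the mean rank
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)

# Hypothetical property values (assumption) for targets of three mechanisms
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat = anova_f(groups)
h_stat = kruskal_h(groups)
```

Reporting both a parametric and a rank-based statistic guards against conclusions that depend on the property values being normally distributed.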
Funding
This work was supported by faculty start-up funds from the University of Michigan to SC.
Author contributions
SC conceived the study, designed and performed research, and wrote the manuscript.
Competing interests
Authors declare no competing interests.
Data and materials availability
All datasets are available in the supplementary materials.
S. Table 2. Essential reactions regulated by acetylation (Spreadsheet file)
S. Table 3. Top 50 reactions sorted based on topological connectivity (Spreadsheet file)
S. Table 4. Top 50 reactions with maximum reaction flux regulated by phosphorylation (Spreadsheet file)
S. Table 7. Gaps in regulation – Essential genes that are unregulated. One representative reaction is shown for each gene in case there are multiple reactions associated with it (Spreadsheet file)
S. Table 13. Raw dataset containing all yeast genes and associated reactions, the corresponding regulators, and the reaction properties (Spreadsheet file).