Division of labor in metabolic regulation by transcription, translation, acetylation and phosphorylation

The metabolism of most organisms is controlled by a diverse cast of regulatory processes, including transcriptional regulation and post-translational modifications (PTMs). Yet how metabolic control is distributed between these regulatory processes is unknown. Here we present Comparative Analysis of Regulators of Metabolism (CAROM), an approach that compares regulators based on network connectivity, flux, and essentiality of their reaction targets. Using CAROM, we analyze transcriptome, proteome, acetylome and phospho-proteome dynamics during transition to stationary phase in E. coli and S. cerevisiae. CAROM uncovered that the targets of each regulatory process shared unique metabolic properties: growth-limiting reactions were regulated by acetylation, while isozymes and futile-cycles were preferentially regulated by phosphorylation. Reversibility, essentiality, and molecular-weight further distinguished reactions controlled through diverse mechanisms. While every enzyme can be potentially regulated by multiple mechanisms, analysis of context-specific datasets reveals a conserved partitioning of metabolic regulation based on reaction attributes. Author summary There are several ways to regulate an enzyme’s activity in a cell. Yet, the design principles that determine when an enzyme is regulated by transcription, translation or post-translational modifications are unknown. Each control mechanism, such as transcription, comprises several regulators that control a distinct set of targets. So far, it is unclear if similar partitioning of targets occurs at a higher level, between different control mechanisms. Here we systematically analyze patterns of metabolic regulation in model microbes. We find that five key parameters can distinguish the targets of each mechanism. These key parameters provide insights on specific roles played by each mechanism in determining overall metabolic activity. This approach may help define the basic regulatory architecture of metabolic networks.


21
There are several ways to regulate an enzyme's activity in a cell. Yet, the design principles that 22 determine when an enzyme is regulated by transcription, translation or post-translational modifications 23 are unknown. Each control mechanism, such as transcription, comprises several regulators that control 24 a distinct set of targets. So far, it is unclear if similar partitioning of targets occurs at a higher level, 25 between different control mechanisms. Here we systematically analyze patterns of metabolic regulation 26 in model microbes. We find that five key parameters can distinguish the targets of each mechanism. 27 These key parameters provide insights on specific roles played by each mechanism in determining 28 overall metabolic activity. This approach may help define the basic regulatory architecture of metabolic 29 networks.

31
A myriad control mechanisms regulate microbial metabolic adaptation to new environments [1-8]. 32 Nevertheless, microbes deploy distinct regulatory mechanisms to regulate enzyme activity in response 33 to specific environmental challenges. For example, B. subtilis cells primarily utilize transcriptional 34 regulation when glucose is available, but post-transcriptionally regulate metabolic enzymes after malate 35 addition [9]. In both E. coli and yeast, some pathways, such as glycolysis, are predominantly regulated 36 by post-transcriptional regulation, while others, such as the TCA cycle, are regulated at the 37 transcriptional level [1,3,10]. This suggest that apart from differences in response time, specific 38 mechanisms are deployed for specialized regulatory tasks. Nevertheless, it is unclear why some 39 enzymes are regulated using acetylation or via other PTMs such as phosphorylation [3,4].

40
Numerous advantages of regulation by PTMs have been proposed over the past five decades [11][12][13]. 41 These include low energy requirements, rapid response, and signal amplification. Yet these 42 characteristics are not unique to PTMs, and these features also do not differentiate between PTMs 43 such as acetylation and phosphorylation. The staggering complexity of each regulatory process has 44 limited the comparative analysis of metabolic regulation at a systems level [3]. Existing studies have 45 focused on a small set of metabolic pathways or on a single regulatory process [4,10,[14][15][16][17][18][19][20]. Such 46 studies have revealed reaction reversibility and metabolic network structure to be predictive of 47 regulation [8,16,[21][22][23][24]]. Yet these studies do not shed light on the differences between each regulatory 48 process. In sum, although some general network principles of regulation are known, how it is 49 partitioned among various regulatory mechanisms is unclear.

50
We hence developed a data-driven approach, called Comparative Analysis of Regulators of Metabolism 51 (CAROM), to identify unique features of each regulatory process. CAROM achieves this by comparing 52 various properties of metabolic enzymes, including essentiality, flux, molecular weight and topology. It 53 identifies those properties that are highly enriched among targets of each process than expected out of 54 random chance.

56
Here we focus on four well-studied control mechanisms with available omics datasets -transcription, 57 post-transcription, phosphorylation and acetylation. We analyzed the dynamics of metabolic regulation 58 during a well-characterized process in yeast, namely, transition to stationary phase. We obtained RNA 59 sequencing, time-course proteomics, acetylomics, and phospho-proteomics data from the literature 60 [25][26][27]. Targets for each process were determined based on differential levels between stationary and 61 exponential phase (Methods). We assumed that PTMs and other regulatory sites that are dynamic and 62 conditionally regulated are likely to be functional [28].

63
The targets of diverse regulatory mechanisms were used as input to CAROM. CAROM analyzes the 64 properties of the targets in the context of a genome-scale metabolic network model of yeast [29]. We 65 hypothesized that differences in target preferences between diverse regulators can be inferred from the 66 network topology and fluxes. Protein and gene targets of each process were mapped to corresponding 67 metabolic reactions in the model. There was significant overlap among reactions regulated through 68 changes in both the transcriptome and proteome, and transcriptome and acetylome (hypergeometric p-69 value = 5 x 10 -25 and 1 x 10 -15 respectively; S. Table 1). In contrast, there was little overlap between 70 targets of phosphorylation with other mechanisms (p-value > 0.1; S. Table 1). While prior studies found 71 higher overlap between targets of PTMs [30,31], they used all possible sites that can be acetylated or 72 phosphorylated. However, only a fraction of PTM sites are likely to be active and functional in a single 73 condition. Overall, each regulatory mechanism had a distinct set of targets ( Figure 1A).

74
What are the common features of enzymes that are regulated by each mechanism? To answer this, we 75 used CAROM to compare the regulation of enzymes that are essential for growth in minimal media.

76
Essential enzymes in the yeast metabolic model were determined using Flux Balance Analysis (FBA) 77 [32]. Surprisingly, this set of enzymes was highly enriched among those regulated by acetylation but 78 not by other processes (ANOVA p-value < 10 -16 ; Figure 1B; S. Table 2). Since regulation can be 79 optimized for fitness across multiple conditions [33], we identified enzymes that impact growth in 87 80 different nutrient conditions comprising various carbon and nitrogen sources using FBA. This set of 81 essential enzymes was once again enriched for acetylation relative to other mechanisms (ANOVA p-82 value < 10 -16 ; S. Figure 1). This trend was observed using experimentally derived list of essential genes 83 as well (hypergeometric p-value = 2 x 10 -7 for acetylation). Interestingly, in contrast to acetylation, 84 genes regulated at the proteomic level were significantly under-represented among the essential genes 85 (hypergeometric p-value of depletion = 8 x 10 -11 ). Thus, essential enzymes are likely to be constitutively 86 expressed and their activity modulated through acetylation. This may explain why transcriptional 87 regulation has minimal impact on fluxes in central metabolism, which contain several growth-limiting 88 enzymes [3,10].

89
We next used CAROM to determine the impact of reaction position in the network on its regulation. We 90 counted the number of pathways each reaction is involved in, along with other topological metrics, such 91 as the closeness, degree and page rank. We found that the regulation of enzymes differed significantly 92 based on network topology ( Figure 1C). First, reactions with low connectivity, measured through any of 93 the topological metrics, were highly likely to be unregulated. In contrast, highly connected enzymes 94 linking multiple pathways were more likely to be regulated by PTMs. Interestingly, reactions regulated 95 by both the PTMs had the highest connectivity (S. Figures 2, 3). Several key hubs, such as acetyl-CoA 96 acetyltransferase, hexokinase and phosphofructokinase are regulated by at least 2 different 97 mechanisms (S. Table 3).

98
We next assessed how regulation differs based on the magnitude and direction of flux through the 99 network. We inferred the full range of fluxes possible through each reaction using flux variability 100 analysis (FVA) [34]. Since yeast cells may not optimize their metabolism for biomass synthesis during 101 transition to stationary phase, we also performed FVA without assuming biomass maximization. We 102 found that irreversible reactions were highly likely to be regulated (S. Figure 4). A recent study found 103 the same trend for allosteric regulation as well [21]. However, reversibility alone did not differentiate 104 between regulatory mechanisms.

105
Interestingly, reactions that have the potential to carry high fluxes were predominantly regulated by 106 phosphorylation ( Figure 1D; ANOVA p-value < 10 -16 ). This set of phosphorylated reactions comprise 107 several kinase-phosphatase pairs, enzymes that are part of loops that consume energy ("futile cycles"), 108 or reactions that have isozymes in compartments such as vacuoles or nucleus (S. Table 4). Thus, 109 phosphorylation in this condition selectively regulates reactions to avoid futile cycling between 110 antagonizing reactions or those operating in different compartments. Using data from experimentally 111 constrained fluxes from Hackett et al study [21] revealed similar patterns of regulation (S. Figure 5).

112
Reactions with the highest flux, such as ATP synthase, phosphofructokinase, and nucleotide kinase, 113 were also regulated by multiple mechanisms.

114
Finally, we compared regulation based on fundamental enzyme properties: catalytic activity and 115 molecular weight. While catalytic activity was similar across the targets of all mechanisms, targets of 116 phosphorylation had the highest molecular weight (p-value < 10 -16 ) (S. Figures 6,7). There is a weak 117 correlation between molecular weight and maximum flux (Pearson's correlation R = 0.02), suggesting 118 that both maximum flux and molecular weight are likely to be independent predictors of regulation by 119 phosphorylation.

120
To check if this pattern of regulation is observed in other conditions, using CAROM, we analyzed data 121 from nitrogen starvation response and the cell cycle in yeast, where both phospho-proteomics and 122 transcriptomics data are available [35][36][37][38]. A similar trend of regulation was observed in these 123 conditions with phosphorylation regulating isozymes and enzymes that can carry high fluxes (futile 124 cycles) (Figure 2). Since isozymes arise frequently from gene duplication, our results may explain the 125 observation that duplicated genes are more likely to be regulated by phosphorylation [39].

126
Since many mechanisms of metabolic regulation are evolutionarily conserved, we next analyzed data 127 from E. coli cells during stationary phase [40][41][42]. By analyzing transcriptomics, proteomics, 128 acetylomics and phosphoproteomics data using the E. coli metabolic network model, CAROM 129 uncovered that the pattern of regulation observed in yeast was also observed in E. coli (Figure 3).

130
Reactions that were regulated in E. coli had higher topological connectivity compared to those that 131 were unregulated. Further, essential reactions were enriched for regulation by acetylation, and 132 reactions with high maximum flux or large enzyme molecular weight were enriched for regulation by 133 phosphorylation. However, in contrast to yeast, phosphorylation impacted very few metabolic genes in 134 E. coli, and may play a relatively minor role in this specific context. Phosphorylation had 20-fold fewer 135 targets compared to other mechanisms, and its targets overlapped significantly with other processes 136 (S . Tables 5-6).

137
In sum, our analysis reveals a unique distribution of regulation within the metabolic network ( Figure 4). 138 Within each process, it is well known that individual regulators such as transcription factors or kinases 139 have their own unique set of targets. Here we find that similar specialization occurs at a higher scale, 140 involving diverse processes. Reaction properties identified by CAROM to be associated with distinct 141 regulatory mechanisms may be related to specific functions performed by each regulator. For example, 142 phosphorylation may represent a mechanism of feedback regulation to control futile cycles and high 143 flux reactions that consume ATP [6,43]. Finally, this pattern of regulation is context specific -predictive 144 features such as reaction flux or essentiality can change between conditions and influence regulation. 145 Further, while most essential reactions were regulated, a small subset (14%) were not found to be 146 regulated by any mechanism. These enzymes could be sites of allosteric regulation or other regulatory 147 mechanisms not covered here due to the lack of context specific datasets (S. Table 7). Overall, these 148 results are robust to the thresholds used for finding differentially regulated sites, using data from 149 different sources, and other modeling parameters (S . Tables 8-12).

150
Since microbes exhibit a wide range of metabolic behaviors, it is not possible to uncover regulation in 151 each condition through experiments. We need tools like CAROM to identify factors that determine the 152 deployment of regulatory mechanisms in a metabolic context. Although flux balance analysis of 153 metabolic models can accurately forecast optimal flux distribution, it does not provide insights on how 154 the flux rewiring is achieved. Our analysis predicts regulatory mechanisms that will likely orchestrate

170
The CAROM approach takes as input a list of genes that are the targets of one or more regulatory 171 processes. It compares the properties of the targets and identifies significant differences in target 172 properties between mechanisms using ANOVA. Overall, CAROM compares the following 13 properties: 173  Impact of gene knockout on biomass production, ATP synthesis, and viability across 87 different were assumed to be regulated. Proteomics data from Weinert et al was also used as an additional 194 control and we observed the same trends using this data as well (S . Table 10). Further, we repeated 195 the analysis after removing genes that were not expressed during transition to stationary phase; the 196 transcripts for a total of 12 genes out of the 910 in the model were not detected by RNA-sequencing in 197 the Treu et al study [27]. Removing the 12 genes did not impact any of the results (S. Table 9).

198
As additional validation, we used periodic data from the yeast cell cycle. Time-course SILAC phospho-199 proteomics data was obtained from Touati et al [37]. Phospho-sites whose abundance declined to less 200 than 50% or increased by more than 50% at least two consecutive timepoints were considered 201 dephosphorylated or phosphorylated respectively as defined by the authors.

214
The results are robust to the thresholds used for identifying differentially expressed genes or proteins 215 (S. Table 11). In all studies, genes and proteins that are either up or down regulated were considered to 216 be regulated. The final data set table used for all comparative analyses is provided as a supplementary 217 material (S.

223
The impact of gene knockouts on growth was determined using flux balance analysis (FBA). FBA 224 identifies an optimal flux through the metabolic network that maximizes an objective, usually the 225 production of biomass. A minimal glucose media (default condition) was used to determine the impact 226 of gene knockouts. Further, gene knockout analysis was repeated in a set of 87 different minimal 227 nutrient conditions to identify genes that impact growth across diverse conditions; these conditions 228 span all carbon and nitrogen sources that can support growth in the Yeast 7 model. The number of 229 times each gene was found to be lethal (growth < 0.01 units) across all conditions was used as a metric 230 of essentiality.

231
To infer topological properties, a reaction adjacency matrix was created by connecting reactions that 232 share metabolites. We used the Centrality toolbox function in MATLAB to infer all network topological 233 attributes including centrality, degree and PageRank.

234
Flux Variability Analysis (FVA) was used to infer the range of fluxes possible through every reaction in 235 the network. Two sets of flux ranges were obtained with FVA -the first with optimal biomass and the 236 latter without assuming optimality. In the second case, the fluxes are limited by the availability of 237 nutrients and energetics alone, thus it reflects the full range of metabolic activity possible in a cell. 238 Reactions with maximal flux above 900 units were assumed to be unconstrained and were excluded 239 from the analysis, as they are likely due to thermodynamically infeasible internal cycles [47]; the choice 240 of this threshold for flagging unconstrained reactions did not impact the distribution between regulators 241 over a wide range of values (S. Table 12).

242
For fitting experimentally derived flux data from Hackett et al [21], reactions were fit to the fluxes using 243 linear optimization and the flux through remaining reactions that do not have experimentally derived flux 244 data were inferred using FVA. Analysis using a related approach for inferring fluxes -PFBA, did not 245 reveal any significant difference as PFBA eliminates futile cycles and redundancy by minimizing total 246 flux through the network while maximizing for biomass [48] (S. Figure 5).

247
Reaction reversibility was determined directly from the model annotations. We also used additional

252
The comparative analysis of target properties was done using gene-reaction pairs rather than genes or 253 reactions alone; the gene-reaction pairs accounts for regulation involving all possible combinations of 254 genes and associated reaction, including isozymes that may involve different genes but the same 255 reaction or multi-functional enzymes involving same the gene associated with different reactions. The 256 910 genes and 2310 gene-associated reactions resulted in 3375 unique gene-reaction pairs in yeast.

257
All statistical tests were performed using MATLAB. Significance of overlap between lists was estimated 258 using the hypergeometric test. Significance of the differences in distribution of target properties 259 between mechanisms were determined using ANOVA, the non-parametric Kruskal-Wallis test, and after 260 multiple hypothesis correction (S. Table 8). Only 2 genes were found to be regulated by all four mechanisms. Targets of phosphorylation did not show any significant overlap with other mechanisms, while transcriptome and proteome showed the highest overlap (S. Table 1). B. Enzymes that impact growth when knocked out are highly likely to be acetylated. C. Enzymes with poor connectivity, as measured through the network connectivity metriccloseness, are more likely to be Unregulated. D. Enzymes catalyzing reactions with high maximum flux are likely to be either regulated through phosphorylation or to be unregulated. The Anova p-value comparing the differences in means is shown in the title.     Maximum flux through each reaction was calculated using FVA without assuming that cells maximize 500 their biomass using the Yeast 7.6 model (Yeast 7 model was used for all analyses). C. The flux through 501 the model was first fit to the experimentally inferred flux data from Hackett et al [21]. The maximum flux 502 through all reactions was then determined using FVA. D. The flux through each reaction was inferred 503 from Parsimonious FBA (PFBA). Note that PFBA does not provide the maximum flux but the flux value 504 that minimizes the sum of flux through all reactions while maximizing the biomass objective. Hence it 505 does not reveal any futile cycles or redundancy in the network. E. The heatmap shows the distribution 506 of regulation based on magnitude of maximum possible flux (Vmax) through of each reaction. 507 Reactions are sorted based on Vmax inferred from FVA. The columns correspond to each reaction-508 gene pair. Those that are regulated by each mechanism are shown in yellow, while those that are not 509 regulated by a specific mechanism are in blue.  S. Table 6. Overlap between targets of various mechanisms in E. coli -transcription (TRANS), post-573 transcription (PROT), acetylation (ACET), phosphorylation (PHOS).
574 575 576 S. Table 7. Gaps in regulation -Essential genes that are unregulated. One representative reaction is 577 shown for each gene in case there are multiple reactions associated with it (Spreadsheet file) 578 579