RMetD2: a tool for integration of relative transcriptomics data into Genome-scale metabolic models

Relative Metabolic Differences version 2 (RMetD2) is a tool for integration of differentially expressed (DE) genes into genome-scale metabolic models (GEMs) for revealing the altered metabolism between two biological conditions. This method provides a robust evaluation of the metabolism by using flux ranges instead of a single set of flux distributions. RMetD2 classifies reactions into three different groups, namely up-regulated, down-regulated and unchanged, which enables systematic interpretation of the metabolic differences between two different conditions. We employed this method in three different case studies using mice and human datasets, and compared it with state-of-the-art methods used for studying condition-specific metabolic differences using GEMs. We observed that RMetD2 is capable of capturing experimentally-observed features that are missed by other methods, highlighting its potential use in biotechnology and systems medicine applications. RMetD2 is implemented in Matlab and it is available without any limitation at https://sourceforge.net/projects/rmetd.


Introduction
Genome-scale metabolic models (GEMs) are the comprehensive collections of known biochemical reactions and associated protein coding genes that are present in specific cells, tissues and organisms. They have been applied in metabolic engineering, antibiotic design, and increasingly used in revealing the underlying disease mechanism, discovery of biomarkers and identification of drug targets [1][2][3][4][5][6][7].
Many GEM based methods have been developed to help researchers understand biological data by integrating transcriptomic data obtained from different conditions [8].
Compared to simple enrichment analysis, GEM based methods could interpret the data in a connected manner based on network topology, and infer changes in remote pathways. For instances, Jerby et al. developed iMAT to construct context specific GEM for liver and predicted their hepatic flux changes which were later validated with experimental measurements [9]. In addition, Agren et al. developed a method called INIT which uses transcriptomic data and network topology of GEM to build cell type specific GEMs that exhibited different metabolic capabilities [10]. In another study, mCADRE was developed and used for reconstruction of GEMs for 126 human tissues and tumors, and proposed potential drug targets that specifically affect tumors based on their different topology [11].
Even though detailed insights can be gained based on network topology, predicting actual fluxes based on absolute transcriptomic data is a challenging task [12]. In this context, MADE [13], tFBA [14] and several other methods [15,16] have been developed and applied to investigate the altered metabolism using GEMs and relative changes of gene expression.
However, since there is a large solution space even when the objective function is optimized [17], it is difficult to identify consistently changed metabolic reactions using the above mentioned methods. On the other hand, there are also methods, such as reporter subnetworks [18], that identifies key metabolic subnetworks using relative transcriptomic data and the topological structure of GEMs. But these methods neglect stoichiometric information in the model and could likely miss some key reactions [19].
Here, we present the systematic formulation of a method called Relative Metabolic Differences version 2 (RMetD2). RMetD2 integrates relative gene expressions together with the stoichiometric structure of GEMs to identify key up-/down-regulated metabolic reactions that are most relevant to the changes in the transcriptome. The concept of RMetD has been successfully used in number of recent studies [4,20,21]. Unlike previous methods, RMetD2 stretches the potential biological effects into many steps by setting gradient constraints to the differentially expressed reactions. In this way, RMetD2 could be used to identify significantly up-/down-regulated reactions by correlation of flux ranges of all reactions, which allows users to prioritize and refine the key metabolic reactions and pathways, and enables detailed systematic interpretation of the biological data.

GEM and simulation
Three GEMs have been employed for case studies. The first one is iHepatocyte2322, which is a GEM for human liver generated based on both transcriptomics and proteomics data in a previous study [22] and the irreversible format is used, which included 10565 reactions, 5195 metabolites and 2328 genes. The constraints for the model were taken from another study where the fluxes in liver tissue were measured and derived based on the body composition [4]. Mouse intestine GEMs, iMouse-Smallintestine, has been reconstructed in a previous study based on proteomics data, and the constraints for both conventional raised and germ-free mice are calculated based on the dietary consumption of the mice considering the effect of gut microbiota [4]. The irreversible version of this GEM has 8055 reactions, 4408 metabolites and 2337 genes. The secretion of chylomicron and HDL secretion were ensured by setting them as mandatory functions for the model according to the original study. The differentially expressed (DE) genes between the intestine of germfree and conventionally raised mice were obtained from the same study [4]. The GEM for human cancer cell line HepG2 is generated in another recent study [21] based on transcriptomic data and exchange constraints calculated from the formulation of the cultivating medium. The irreversible form of the model which has 5403 reactions, 2784 metabolites and 1957 genes was used by RMetD2 in this study.

RMetD2
To clearly demonstrate the rationality of this method, we provide a toy network with 10 reactions and 3 genes as shown in Figure 1A. In this simplest case, if G1 which is associated with R3 is up-regulated, and G2 and G3 which are respectively associated with R5 and R8 are down-regulated, we could easily tell that the flux through R3 and R4 should be increased and fluxes via R10 should be decreased based on the topology. However, if G2 and G3 are changes in opposite direction, it will be tricky to predict the changes of flux via R10, since it will be dependent on the original flux scale and stoichiometric coefficients of both R6 and R9. And this scenario is very common in real cases studying biological changes in a genomescale.
The principal idea of RMetD2 is: to push the constraints of the flux for up-and downregulated reactions respectively up and down using DE genes as indicator of directions (as shown in Figure 1B). The key concept of RMetD2 is to divide this process into several steps so that the consistency of the changes could be evaluated. The input for RMetD2 includes a GEM in either COBRA [23] or RAVEN [24] format (where objective function is optional), the DE genes and their fold changes (perturbed vs. reference). Additional constraints for the perturbed state model may optionally be added in case there are known metabolic differences between the two biological states (e.g. change of medium or gene knock-out).
Otherwise, the original constraints of the input model are used. Using these inputs, RMetD2 classifies reactions into 3 different groups, namely up-regulated, down-regulated, and unchanged, based on the changes in flux ranges, and provide correlations and significance of relevance of each reaction to the biological change (as shown in Figure 1C). For instance, if G1 and G2 are up-regulated, and G3 is down-regulated, R4, R9 and R10 will probably be respectively classified as up-regulated, down-regulated and unchanged. It should be noted that, we used 'constraints' for the upper and lower bounds allowed for the reactions, and 'flux ranges' for the actual achievable maximum and minimal fluxes (with objective function maximized if there is any).
The implementation of RMetD2 includes several steps: (1) The given GEM is converted to an irreversible form so that all reactions will not carry any negative fluxes. (2) We identified up-/down-regulated reactions by associating DE genes using gene-protein- The mathematical expression is defined as below: / min , ( = 1, 2, … , ) it is defined as unchanged.

MADE
We compared the predictions of RMetD2 with those of MADE, a tool frequently used to compare two conditions using GEMs. MADE is implemented through the TIGER toolbox [25], and the time limit is set to 36000s. The flux ranges for all reactions were also calculated by the built-in function 'fva' in TIGER toolbox designed specifically for TIGER MADE GEMs.

Metabolic subsystem enrichment test
Each reaction in GEMs is annotated with a specific metabolic subsystem. Therefore, the significance of overlapping between reactions of different RMetD2 classes and metabolic subsystems of GEMs may be assessed using hypergeometric test. In this study, a RMetD2 class is enriched in a metabolic subsystem if the adjusted hypergeometric test P value of the overlap is less than 0.05 after Benjamini and Hockberg correction for multiple hypothesis testing.
For simple enrichment analysis, we used again GEMs for fair comparison. All genes associated with a specific metabolic subsystem in GEM is retrieved based on its geneprotein-reaction annotation, and up-/down-regulated genes are enriched in a metabolic system if the overlap is statistically significant using the same cutoff as metabolic subsystem enrichment test.

Availability and requirement
RMetD2 is a method build in Matlab (version R2017a) environment, and the source code is available at https://sourceforge.net/projects/rmetd/. In order to have fair comparison wherever relevant, the IBM ILOG CPLEX Optimization Studio v12.8 was employed for solving all linear programming and mixed integer linear programming in this study.

Case 1: Liver before and after low-carbon diet
For benchmarking, a show case is provided where RMetD2 was used to reveal the metabolic differences in liver before and after carbon-restricted diet based on transcriptomics data generated in a previous study [4]. Tissue biopsy samples were taken from patients with fatty liver diseases before and one week after dietary change. In addition, the liver fat content is measured for both time points and found to be significantly decreased in subjects.
The data used as an experimental benchmarking point for the in silico methods. We found that 952 and 3263 genes respectively significantly up-and down-regulated (FDR<0.05) between two different conditions. With implementation of RMetD2 with no predefined objective function, we identified 808 and 4987 significantly up-and down-regulated reactions (FDR<0.05), respectively. As expected, the reaction 'HMR_9883', which refers to triacylglycerol pool generation and thus indicates the accumulation of liver fat in the GEM is among the ones that are significantly down-regulated. For comparison, we also generated condition specific GEMs using MADE which is one of the most frequently used methods for comparing two conditions, and does not require predefined objective function. We compared the mean flux range before and after dietary change obtained by MADE. We identified 4267 and 811 reactions that respectively up-and down-regulated. Interestingly, as shown in Figure 2 we found that MADE predicted an increased flux through liver fat accumulation, which is the opposite to what we observed experimentally and computationally using RMetD2. Our analysis strongly demonstrated the advantage of using RMetD2 in biological applications.

Case 2: Liver cancer cell line before and after gene inhibition
We also applied RMetD2 to investigate the metabolic difference between wild-type and PKLR inhibited liver cancer cell line, HepG2, using constraints and DE genes from our recent study [21]. The reference GEM of HepG2 was reconstructed in the same study based on transcriptomic data as well as composition of culture medium using tINIT [26]. The DE genes strongly related to silencing of PKLR (adjusted P <0.0001, n = 1242) were used as input for RMetD2. As a result, RMetD2 suggested 274 and 2078 reactions to be up-and down-regulated, respectively. Among the down-regulated reactions, we found 'HCC_biomass' and 'TAG_accumulation', and these suggest decreased growth and triacylglycerol accumulation in PKLR inhibited HepG2 cell line which was also validated experimentally in the same study. In addition, we also found 'HMR_4381', which is the conversion reaction from glucose-6-phosphate to fructose-6-phosphate and a key step in glycolysis, is classified as down-regulated. This suggested a decreased glucose consumption through glycolytic pathway, and as expected, we also found a deceased glucose consumption in PKLR inhibited cells in our experimental validation. These results further demonstrated the power of RMetD2.

Case 3: Intestine of conventional raised mice vs. germ-free mice
To further demonstrate the use of RMetD2 in integration of additional constraints, another show case is provided. RMetD2 was used to reveal the metabolic differences between germfree and conventional raised mice based on the dietary inputs and DE genes obtained from the previous study [21]. Since most of the frequently used methods could not be used for integrating additional constraints together with transcriptomic changes, we only compared the RMetD2 result with experimental observations. In summary, we found that 323 and 616 genes significantly up-and down-regulated, respectively (FDR<0.05), and found that 130 and 2446 reactions are up-and down-regulated, respectively between between germ-free and conventional raised mice. In addition, we found that the maximal production of chylomicron in the last step model is decreased in conventional raised mice which is consistent with the conclusion of the original study [23]. Moreover, the HDL production reaction is among the down-regulated reactions, which is also in agreement with the previous study [23]. Additionally, the flux ranges for these two reactions are calculated with only constraint changes while neglecting the gene expression changes. The flux range of chylomicron secretion is decreased which is the same as we found using RMetD2, but the HDL production capability is increased. This implies that the key to infer the decrease of chylomicron is the constraint changes, while the decrease of HDL secretion seems like a result of transcriptomic changes.
In addition, we performed metabolic subsystems enrichment analysis for up-and downregulated reactions obtained from RMetD2. As shown in Figure 3, the enriched up-regulated metabolic pathways in conventionally raised mice include histidine metabolism, phenylalanine, tyrosine and tryptophan metabolism, branch amino acids metabolism and oxidative phosphorylation. These might reflect the difference in diet, since we observed that conventional mice ate around 20% more in general. On the other hand, the down regulated reactions are enriched in many different pathways, including almost all fatty acids metabolic process such as beta oxidation of fatty acids, fatty acids biosynthesis, carnitine shuttle and glutathione metabolism. The down-regulation of these pathways suggests that there might be less free fatty acids in blood of conventional raised might to deal with.
We also compared the enriched subsystems with the ones suggested by simple DE gene enrichment analysis, and we found many differences as shown in Figure 3. Most of the significantly enriched metabolic pathways disappeared in simple gene enrichment analysis, which is expected since it will miss metabolic pathways that have no association with DE genes but directly or indirectly linked with those significant changed pathways through the topology. One good example is glutathione metabolism, which is indirectly linked to many different metabolic pathways and account for the cofactor balancing in the whole system.
Glutathione level is shown to be decreased in conventional raised mice according to RMetD2, and this is in very good agreement with the original study in which the key finding is that gut microbiota modulates host glutathione metabolism. However, simple enrichment analysis completely missed it, which highlighted the advantage of RMetD2.

Discussion
In this study, we proposed a novel method called RMetD2 which can be used to integrate relative transcriptomic changes into GEMs and identify consistently up-/down-regulated reactions. We used RMetD for interpretation of the metabolic shift between two conditions at systems level. Unlike other methods, RMetD2 focuses on identifying relative change of reactions instead of having quantitative predictions since we are very well aware of the limitation of currently available data. To quantitatively predict the flux distribution using transcriptomics data, theoretically we need to know the translational efficiency of each gene, and kinetic values for all enzymes. Unfortunately, very few of them are currently available at genome-scale for any organism to the best of our knowledge. Therefore, we defined the main purpose of RMetD2 to identify key metabolic reactions or pathways in GEMs that are most relevant to the biological changes between conditions by integrating transcriptomic changes as well as any additional fluxomics data whenever it is available. In addition, RMetD2 could be implemented without predefined objective function, which is a great advantage for tissue modelling where there is no clear objective. We have shown in our case study that RMetD2 exhibited better predictions over MADE which is frequently used and thought to be among the best state-of-the-art methods.
It should be noted that in this method, gene-expression data are integrated as hard constraints which is the same as many other methods, such as GX-FBA [27]. Therefore, falsely annotated genes might have a significant impact on the results. So, the use of high quality GEM is very important and will likely improve the performance of RMetD2.