## Abstract

**Background** Current metabolic modeling tools suffer from a variety of limitations, from scalability to simplifying assumptions, that preclude their use in many applications. We recently created a modeling framework, LK-DFBA, that addresses a key gap: capturing metabolite dynamics and regulation while retaining a potentially scalable linear programming structure. Key to this framework’s success are the linear kinetics and regulatory constraints imposed on the system. Here, we present improvements to these constraints to improve the predictivity of LK-DFBA models and their applicability to biological systems.

**Method** Three new constraint approaches were created to better model interactions between metabolites and the reactions they regulate. These new approaches (and the original LK-DFBA approach) were tested on several synthetic and biological systems to determine their performance when using both noiseless and noisy data. To validate our framework, we compared experimental data to metabolite dynamics predicted by LK-DFBA.

**Results** There was no single optimal type of constraints across all synthetic and biological systems; rather, any one of the four approaches could perform best for a given system. The optimal approach for fitting to wildtype data of a given model was consistently the best approach when predicting new phenotypes for that model. Furthermore, many of LK-DFBA’s predictions qualitatively agreed with experimental data.

**Conclusions** LK-DFBA can be improved by using several kinetics constraint approaches, with the ideal one selected based on wild-type training data. LK-DFBA’s ability to predict metabolic trends in experimental data further supports its potential for modeling metabolite dynamics in systems of all sizes.

## Introduction

Mathematical and computational models are often used to study metabolism, the set of reactions that supply the chemical precursors necessary for almost all cellular processes. These metabolic models are significantly cheaper and faster to run than laboratory experiments, meaning that they can be of tremendous value when they are able to predict how changes in or to a metabolic system can affect its state. While a few pathways and sections of metabolism (e.g., glycolysis and central carbon metabolism) have been modeled and characterized quite well in a few organisms (e.g., *Saccharomyces cerevisiae* and *Escherichia coli*) [1, 2], genome-scale models that capture metabolism at a systems scale have been more difficult to develop.

Metabolism involves many interconnected reactions and pathways, making it critical to include as much of metabolism as possible in metabolic models to better represent the system and generate accurate predictions. Metabolomics, which is the systems-scale measurement of metabolites in biological systems, thus has great potential to provide the information necessary to drive systems-scale metabolic models. However, creating genome-scale metabolic models that capture critical system behaviors like metabolic dynamics remains an outstanding challenge in the field, which has prevented the value of metabolomics data in this context from being fully realized.

The most popular type of frameworks for metabolic modeling are constraint-based models, including flux balance analysis (FBA) [3, 4], and ordinary differential equation (ODE) models. FBA assumes that the metabolic system is at steady state, which allows it to be modeled as a linear program (LP) that can be efficiently solved (even at the genome scale) but precludes modeling metabolite dynamics without substantial changes to the framework. ODE models are more widely used when dynamics are important, but are typically limited to smaller-scale modeling of well-studied pathways and the best-studied organisms due to uncertainty in the mathematical form and parameter values for the reaction kinetics terms. Only a few exceptions [5-8] have approached genome-scale ODE models, and they still require lengthy parameter estimation steps, prior information about kinetic constants, or have only been shown to be useful near the reference state of the system. As a result, steady-state fluxes continue to be the almost exclusive focus of study for genome-scale models. Modeling frameworks that can predict various metabolic phenotypes at the genome scale in a computationally tractable way have great potential for understanding, predicting, and controlling metabolism.

To address this problem, we recently developed Linear Kinetics-Dynamic Flux Balance Analysis (LK-DFBA), a modeling strategy to efficiently track metabolite dynamics [9]. LK-DFBA combines advantages of both constraint-based and ODE models, unrolling the temporal aspect of the system into a larger stoichiometric matrix that captures metabolite dynamics while retaining a LP structure. The most critical element to accomplishing this goal is the addition of linear kinetics constraints that model the interactions between metabolites and the reactions whose fluxes they affect, including mass action kinetics and allosteric regulatory interactions. The number of parameters in LK-DFBA that need to be estimated can be far fewer than in ODE models due to these linear kinetics constraints. This enables LK-DFBA to potentially be applied to metabolic systems of all sizes, with a smaller increase in computational burden compared to ODE models. Furthermore, because LK-DFBA retains a linear structure, it can potentially be used with many existing metabolic modeling tools that require constraint-based models, such as OptKnock [10]. We have previously shown that LK-DFBA can outperform ODE-based modeling approaches when used in conditions most relevant to metabolomics data (low sampling frequency and high noise). A framework such as LK-DFBA that can model systems at the genome scale is essential to take full advantage of metabolomics data.

In our initial description of LK-DFBA, we explored two different approaches for model parameterization. The first approach, LK-DFBA (LR), parameterizes constraints solely via linear regression of interacting metabolite concentration and flux data. The second approach, LK-DFBA (LR+), uses the parameters from the linear regressions as initial seeding values for a secondary optimization to identify the optimal constraints for each interaction. While LK-DFBA (LR+) yields better fits to training data than LK-DFBA (LR), the latter approach estimates its parameters with trivial computational effort while still producing results that are similar in error to ODE models. As a result, LK-DFBA (LR) may be the preferable approach for the efficient construction and parameterization of metabolic models at the genome scale.

However, the overall LK-DFBA framework still has some limitations in terms of how accurately it represents the underlying biology and biochemistry of the system. For example, the linear kinetics constraints used in LK-DFBA (LR) may be viewed as crude approximations of the interactions between metabolites and fluxes, which are typically non-linear in nature. While kinetic equations found in ODE models (such as Michaelis-Menten or biochemical system theory (BST) representations [11, 12]) can capture the non-linearity of these interactions, the current linear framework in LK-DFBA cannot. Additionally, when allosteric regulatory information is considered (which LK-DFBA includes in its framework), reaction fluxes are often controlled by multiple metabolites. Currently, LK-DFBA creates separate constraints for each metabolite that controls a flux, which precludes modeling how multiple metabolites simultaneously interact with a reaction flux.

Since the linear kinetics constraints are so critical in LK-DFBA’s functioning, it is likely that improving those constraints could have a substantial impact on LK-DFBA’s ability to capture and predict biological phenomena. Accordingly, we devised three new types of kinetics constraints for LK-DFBA to account for biologically relevant features like non-linearity and simultaneous regulation by multiple metabolites. These new approaches were compared to the original LK-DFBA (LR) constraints by testing on synthetic model systems as well as models based on *Lactococcus lactis* and *Escherichia coli* [1, 2]. We also probed these constraint approaches for their robustness to model perturbation and their ability to predict phenomena not captured in training data. We found that these new constraint approaches can improve model performance, and that the optimal constraint approach varied depending on the system being modeled but was consistent across perturbations for any given model. We also showed that the LK-DFBA approach chosen for the *L. lactis* and *E. coli* models can be used to predict changes in several critical metabolites and fluxes in agreement with literature experimental results. These improvements to LK-DFBA and demonstration of its effectiveness on new metabolic models support its attractiveness as a framework for modeling increasingly large metabolic systems in the future.

## Methods

### LK-DFBA

Linear Kinetics-Dynamics Flux Balance Analysis (LK-DFBA) is a recently developed modeling strategy that is both scalable and capable of capturing metabolite dynamics. The full details of this approach have been described in detail previously [9], so we only outline the most important aspects of our framework here. In brief, LK-DFBA uses an LP-based structure with temporal dynamics modeled by discretizing time and unrolling the system into a larger matrix representing each time point separately, with an objective function that reflects the unrolling of the model. Linear inequality constraints that model mass action kinetics and metabolite-dependent regulation are included in the model; they are the driving force behind metabolite accumulation and depletion by limiting the maximum flux allowed based on the availability of metabolites over time. To parameterize these constraints, the LK-DFBA (LR) approach uses linear regression on assumed available metabolomics and fluxomics data, as described in the next section. If fluxomics data are unavailable, dynamic flux estimation (DFE) can be used to infer flux values from concentration data [13]. In the LK-DFBA (LR+) method, the parameters from the LK-DFBA (LR) approach are used as initial conditions in a secondary optimization step that finds improved kinetics constraint parameters, though at the cost of computational time. Because LK-DFBA retains an LP structure, it is readily scalable and has the potential to be used with current constraint-based modeling tools.

### Constraint Approaches

#### LK-DFBA (LR)

The original LK-DFBA approach uses linear kinetics constraints to model the interaction between a metabolite and a flux, parameterized using available metabolomics and fluxomics data. These constraints take the form of *ν* < *ax* + *b*, where *v* is the flux being constrained, *x* is the concentration of a metabolite that interacts with the flux, and *a* and *b* are the linear constraint parameters. These interactions may be due to mass action kinetics, where the interactions are known based on the stoichiometric topology of the system, or they may stem from allosteric regulation. While we have previously shown that these linear approximations of metabolic interactions can be effective for modeling metabolism, they are still approximations of the true non-linear and interconnected biochemical relationships in metabolism. Below, we discuss three new constraint approaches to address these potential limitations.

#### LK-DFBA (NLR)

While the key advantage of using constraint-based models is their LP structure that enables efficient identification of the optimal solution of the problem, most metabolite-flux interactions exhibit non-linear behavior that may not be captured well by linear equations. Recently, computational solvers have improved such that quadratically constrained programs (QCPs) are not much more computationally expensive than LPs. Accordingly, we implemented quadratic constraints into the LK-DFBA framework to explore their potential for improving model accuracy with only a modest increase in computational time. One important aspect of LPs and QCPs is that all of the constraints must create a convex feasible solution space in order to guarantee that a global optimum can be found [14]. If *ν* < *ax*^{2} + *bx* + *c* represents a quadratic constraint, where *v* is the flux being constrained, *x* is the concentration of a metabolite that interacts with *v*, and *a, b*, and *c* are the parameters of the quadratic constraint, *a* must be a negative value to retain a convex solution space. If *a* is found to be a positive value during parameterization, we convert the quadratic constraint into its original linear form as found in LK-DFBA (LR). We refer to this overall approach as LK-DFBA (NLR).

#### LK-DFBA (DR)

Enzymatic reactions are often controlled by more than a single metabolite that can either induce or inhibit enzyme activity, which should ideally be captured in the model constraints. To model such regulation of a reaction by multiple metabolites, LK-DFBA (LR) creates individual linear constraints for each controller metabolite that are independent of each other and are thus unable to capture the synergistic or antagonistic effects of multiple metabolites working in conjunction to regulate a flux. We implemented a new strategy that uses dimensionality reduction to consolidate information content from all controller metabolites for a flux into a single constraint. Dimensionality reduction is often used in data analysis, including analysis of metabolomics data, to more easily represent and digest datasets with many measured variables. Principal component analysis (PCA) is one of the most commonly used dimensionality reduction approaches; it linearly transforms the original variables into new, orthogonal composite variables called principal components that capture as much variance in the original variable data in as few principal components as possible [15]. Ideally, the first or first few principal components capture the majority of the variance in the original dataset, which allows one to focus only on those composite variables rather than all of the original variables at once. Here, we use PCA to capture the maximal variance of the controller metabolite data in a single principal component and use that composite variable as the regressor during linear regression with the target flux data. These new constraints are represented as *ν* < *aPC*_{1} + *b*, where *v* is the flux being constrained, *PC*_{1} is the metabolite concentration data projected into the first principal component, and *a* and *b* are the constraint parameters. We refer to this dimensionality reduction approach as LK-DFBA (DR).

#### LK-DFBA (HP)

Another approach for modeling interactions with multiple metabolites is to use hyperplane constraints. Unlike LK-DFBA (DR), which always builds constraints in two dimensions (i.e. the target flux vs. the first principal component), the hyperplane constraint exists in (*n* + 1) dimensions, where *n* is the number of metabolites that control a target flux. This approach may avoid loss of information content from metabolite data as is possible during dimensionality reduction: as the number of metabolites in an interaction increases, the likelihood of the first principal component not capturing the majority of variance in the data increases. The hyperplane constraint equation can be represented as , where *v* is the flux being constrained, *n* is the number of metabolites that interact with *v, x*_{i} is the concentration of metabolite *i, a*_{i} is the constraint parameter for metabolite *x*_{i}, and *b* is another constraint parameter. We refer to the hyperplane approach as LK-DFBA (HP).

### Translating constraints to contain training data

We found that translating the constraints such that all training data fall in the region under the inequality constraint decreased the possibility of the computational solver encountering infeasible solutions when simulating metabolite dynamics. Thus, for all LK-DFBA approaches, each constraint was translated to contain the training data by increasing the intercept of the constraint (i.e. the *b* parameter in LK-DFBA (LR), LK-DFBA (DR), and LK-DFBA (HP), and the *c* parameter in LK-DFBA (NLR)) until no training data were above each constraint.

### Test Models

#### Synthetic model

The first system we examine is a simple synthetic model with five metabolites and five fluxes that was derived from a branched pathway model used previously [9]. This system is based on an ODE-based modeling framework that uses power-law kinetics to represent reaction fluxes [12]. The kinetic equations for each pathway are shown in Figure 1. To create a variety of synthetic models with the same stoichiometric topology, we randomly generated *a* and *b* parameters in each kinetic equation. The parameters for each model can be found in Table S1. Time course metabolite and flux data were generated by solving the ODE system in MATLAB (2018b).

*Lactococcus lactis* model

This model was created by Costa *et al*. and comprises central metabolism and production pathways for important metabolites such as mannitol and 2,3-butanediol [2]. The *L. lactis* model consists of 26 metabolites and 21 fluxes and is publicly available on KiMoSys [16]. Noiseless data were generated in COPASI 4.24 (Build 197) using the default initial conditions and parameters over a simulation time of two hours.

*Escherichia coli* model

The *E. coli* model developed by Chassagnole *et al*. encompasses glycolysis and the pentose phosphate pathway [1]. This model is publicly available on KiMoSys, but was rebuilt within MATLAB to allow easy creation of new models that use the original *E. coli* model’s topology and stoichiometry. Noiseless data for the original *E. coli* model were generated in MATLAB (2018b) using the default initial conditions and parameters, while random initial conditions and parameters were used for the new models with the *E. coli* topology. To be consistent with our previous work, we used a simulation time of ten seconds [9]. More information about the model parameters can be found in the Supplementary Methods.

### LK-DFBA Objective Functions

Like other constraint-based methods, LK-DFBA requires an objective function, which is usually tied to some presumed goal of the system (such as maximizing biomass or ATP production) that stems from evolutionary pressure. FBA models for specific organisms commonly have a separate flux reaction dedicated to biomass, made up of precise ratios of different metabolites. While LK-DFBA models with tuned objective functions can be created, the biological models we sought to use here do not have pre-existing tuned objective functions, so we instead focused on LK-DFBA’s performance using generic objective functions.

Here, we have chosen flux v_{5} as the objective function for the synthetic model, as it is the only efflux out of the system. For the *L. lactis* model, we use the LDH pathway as the objective function to maximize production of lactate because it is a key metabolite in the organism (which is commonly used for dairy products) and was the metabolite produced at the highest levels in the original *L. lactis* model [2]. The objective function used for the *E. coli* model was to maximize all effluxes from the system, which included murein synthesis, glycerol-3-phosphate dehydrogenase, serine synthesis, PEP carboxylase, DAHP synthesis, pyruvate dehydrogenase, ribose phosphate pyrophosphokinase, glucose-1-phosphate adenyltransferase, the synthesis of murein and chorismate from PEP, and the synthesis of isoleucine, alanine, α-ketoisovalerate, and diaminopimelate from pyruvate. While we have observed that these objective functions can be further improved, and approaches have been developed for finding an optimal objective function for a model by creating a bilevel optimization problem and then leveraging the duality theorem [17, 18], our chosen objective functions were sufficient to at least qualitatively model the synthetic, *L. lactis*, and *E. coli* systems.

### Pathway Perturbations

To test the ability of LK-DFBA to predict metabolic behaviors not represented in the training data, we introduced perturbations into each system either through down-regulation (indicated with a prefix ‘d’ in all figures) or up-regulation (indicated with a prefix ‘u’) of reaction fluxes. For the synthetic models, we down-regulated v_{2}, v_{3}, and v_{4} by multiplying their constraint equation parameters (which restricts their maximum allowable flux value) by 0.5x and up-regulated these pathways by doubling the constraint equation parameters. The pathways and reactions to be perturbed in the *L. lactis* [2, 19-22] and *E. coli* [23-27] models were chosen based on previous literature. Reactions in the *L. lactis* model (lactate dehydrogenase, phosphofructokinase, acetate kinase, mannitol 1-phosphatase) were down-regulated to 0.1x their original parameter values (since completely knocking out reactions would often produce infeasible solutions for the linear program) and up-regulated to 2x their original parameter values, magnitudes that were necessary to effect significant perturbations to the system’s behavior. Reactions in the *E. coli* model (pyruvate kinase, phosphoglucose isomerase, glyceraldehyde-3-phosphate dehydrogenase, phosphofructokinase, triose-phosphate isomerase, ribulose-phosphate epimerase, phosphoglucomutase) were down-regulated to 0.1x and up-regulated to 2x their original parameter values.

### Generating Noisy Data

Noise was introduced into the system by down-sampling the original noiseless data (originally 50 time points) into nT time points that are evenly spaced over the time interval of interest. Both metabolite and flux values were then replaced with a random value drawn from *N*_{i,k} ∼ (*y*_{i}(*t*_{k}),*CoV·y*_{i}(*t*_{k})), where *y*_{i}(*t*_{k}) is the value of species (metabolite or flux) *i* at time point *k*, and CoV is a coefficient of variance. For each sampling frequency and CoV condition, ten noisy datasets were generated.

### Error Calculation

The error of the predictions made by LK-DFBA was calculated using a normalized root mean squared error (NRMSE) between the LK-DFBA predicted metabolite concentrations and the noiseless ODE concentration or experimental data. *P*_{ik} and *R*_{ik} are the predicted (e.g. results from LK-DFBA) and reference (e.g. ODE model or experimental) data from a system with *m* metabolites and *nT* time points. is the mean of the concentrations of reference metabolite *i* across all time points to normalize the data and *N* is the total number of data points used in the NRMSE calculation.

### Pearson Correlation Calculation

The available *E. coli* knockout experimental data consisted of steady-state flux data, so to compare these to the knockout predictions made by LK-DFBA (which did not yield a steady state over the ten second time interval of the model) we used the average flux of our time course predictions. Because the average flux of our predictions and the steady-state fluxes of the experimental data are different measurements and therefore not directly comparable using NRMSE, we chose to use a Pearson correlation coefficient to evaluate our framework, which was recently used in a similar comparative analysis of metabolic models [28]. High correlations between steady-state flux experimental data and the average flux predictions would indicate that LK-DFBA can effectively predict if gene knockouts lead to an increase or decrease in flux for modeled reactions. The calculation for the Pearson correlation coefficient is shown below, where *A*_{i} is the average of the predicted flux profile for the *i*th flux, *v*_{i} is the flux value of the *i*th flux from the experimental data, *Ā* is the mean across all fluxes for the average of computationally predicted fluxes, is the mean flux value across all fluxes for the experimental data, and *n* is double the number of fluxes that are shared between both the *E. coli* model and experimental data because it includes flux values before and after the gene knockout (*n* = 28).

## Results

### Fitting and predicting phenotypes in synthetic models

We generated twenty random sets of parameters and initial conditions for the kinetic equations in the synthetic model to examine if different constraint approaches were more suitable for different models. We produced *in silico* metabolite concentration and flux data over a time interval of ten seconds by solving the ODEs in each synthetic system. The four constraint approaches were used for parameterization of LK-DFBA models to the twenty datasets. The fitted LK-DFBA models were then simulated over the same ten second interval using the initial conditions for each respective synthetic system to compare against the original ODE data. This process was performed on both noiseless (nT = 50, CoV = 0) and noise-added data with different sampling frequencies (nT = 50 or 15) and levels of noise (CoV = 0.05 or 0.15). To test the ability of each LK-DFBA approach to predict the effects of defined genetic perturbations, we down- and up-regulated the v_{2}, v_{3}, and v_{4} pathways in the original kinetic equations by multiplying the kinetic coefficient parameters (*a* parameters in the inset of Figure 1) by 0.5x or 2x, respectively, and generating new ODE data. We then simulated the LK-DFBA model after adjusting the fitted LK-DFBA constraints to reflect the down- or up-regulation by multiplying the kinetics constraint parameters by 0.5x and 2x, respectively. The NRMSE between the predicted LK-DFBA metabolite concentrations and the ODE concentration data from the perturbed synthetic models was then calculated.

For the noiseless cases with no genetic perturbation (WT) as shown in the first row below each bold line in Figure 2A, the best-fitting constraint approach (dark green) varied across the different models. All four approaches performed best for at least one of the models. When fluxes were either down- or up-regulated via *in silico* genetic perturbations and LK-DFBA models fitted to the WT ODE data were used to predict these changes, the best constraint approach across all perturbations (dV2 through uV4) was generally consistent with the best approach in the absence of perturbations. The NRMSE heatmap with quantitative error values can be found in Figure S1.

When using noisy data, similar trends were observed (representative example in Figure 2B). While the best constraint approach for WT noisy data was not always the same as the best approach for noiseless data, the best constraint approach for a given noisy WT dataset was still generally the best for predicting the impacts of *in silico* genetic perturbations in the same model (dV2 through uV4). Interestingly, noisy data negatively affected the performance of LK-DFBA (HP) to a much greater extent than the other approaches, which caused LK-DFBA (HP) to never be identified as the best approach under the most realistic conditions (nT = 15, CoV = 0.15). NRMSE heatmaps with quantitative error values for various sampling frequencies and noise levels can be found in Figure S2-S5.

We also tested the effect of smoothing the noisy (nT = 15, CoV = 0.15) metabolite concentration and flux time course profiles by fitting to a previously described [29] impulse function (Figure S6). Smoothing the noisy data can often lead to lower error of the final model, but requires increased computation time for estimating the parameters of the impulse function and in certain cases can actually increase error if a specific dataset deviates significantly from all of the profiles that an impulse function can capture. The best constraint approach for WT smoothed data was the same as for unsmoothed data in 19 of the 20 models. As with the unsmoothed cases, the best constraint approach for smoothed data was typically consistent between WT and *in silico* genetic perturbations, and there were no cases where LK-DFBA (HP) performed the best (and it was generally the worst out of the four approaches) for smoothed data.

### Fitting and predicting phenotypes in *L. lactis* and *E. coli* models

For the *L. lactis* model, we tested the four constraint approaches on noiseless data and noisy data under different conditions (nT = 50 or 15, CoV = 0.05 or 0.15). On the noiseless data, the best constraint approach for the WT system was LK-DFBA (HP), which also had the lowest NRMSE when predicting the results of perturbations to five different pathways (Figure 3A). At high sampling frequencies and low noise (nT = 50, CoV = 0.05), LK-DFBA (HP) still performed the best, but as more noise was added or lower sampling frequencies were used, LK-DFBA (NLR) was optimal. This is consistent with the findings described above for the small synthetic systems where LK-DFBA (HP) can produce low NRMSE with noiseless data, but has difficulties under more realistic conditions.

As with the *L. lactis* model, we tested all constraint approaches on both noiseless and noisy data from the *E. coli* model under different conditions (nT = 50 or 15, CoV = 0.05 or 0.15). For this model, LK-DFBA (NLR) was the best constraint approach for noiseless data (Figure 3B). Noisy *E. coli* data produced the same results: for all noisy conditions, LK-DFBA (NLR) was optimal for the WT system. It was also optimal for almost all of the *in silico* genetic perturbations, showing once again that the same constraint approach that was optimal for the WT system at a given sampling condition was generally also optimal for the perturbed systems.

We also perturbed the original parameters and initial conditions (drawing from the random normal distribution *N*_{i} ∼ (*p*_{i},*p*_{i}), where *p*_{i} is the original value of the *i*th parameter) of the *E. coli* model to create five new models with the same topology (more information about parameter randomization can be found in the Supplementary Methods). As with the twenty different versions of the small synthetic system, we found that the best constraint approach was not conserved across models with the same topology as the original *E. coli* model when tested on noiseless data (Figure S8). Instead, the rates of individual reactions and how they affect overall model dynamics appear to be important factors in determining the optimal constraint approach.

### Improved LK-DFBA predictions yield qualitative consistency with experimental *L. lactis* metabolite concentration data

To further assess how well LK-DFBA performs when predicting different phenotypes, we compared the predictions of LK-DFBA to available experimental data for the first time. The previously described ODE-based *L. lactis* model was originally parameterized using experimental metabolite time course data from *L. lactis* cultures grown with an initial glucose concentration of 40 mM [2] and validated by comparison to some experimental data from cultures grown at initial concentrations of 20mM and 80 mM glucose. Here, we similarly fitted all LK-DFBA approaches to data generated by the ODE model at 40 mM glucose and then simulated the LK-DFBA model using the best constraint approach at 20 mM and 80 mM initial concentrations of glucose for validation.

Figures 4A, 4B, and 4C depict the metabolite concentrations predicted by LK-DFBA (HP) (the best approach for noiseless data in the *L. lactis* model) when trained on noiseless data. For multiple initial glucose concentrations, LK-DFBA (HP) captured the general qualitative trends of glucose (depletion) and lactate (accumulation), two key metabolites in *L. lactis* that are often studied [30, 31]. For cofactor metabolites that participate in many different reactions, such as ATP, NAD(H), and inorganic phosphate (Figure S9), it was more challenging for LK-DFBA (HP) to predict their concentration profiles over the simulation interval, which is a problem found in other modeling frameworks [5]. Although LK-DFBA’s predictions were overall not as smooth or quantitatively accurate as the ODE model, this is to be expected due to the lack of a validated objective function for this constraint-based model; the objective function we used was a gross approximation that likely does not reflect the cell’s true “goal”, and it is known that the objective function can significantly affect the predictions of FBA approaches. Nevertheless, as presented here, LK-DFBA can still qualitatively track important metabolite dynamics even when using a crude objective function. This is important to note, as many organisms that are not well-studied have no readily available objective function to use.

Figures 4D, 4E, and 4F depict the glucose and lactate concentration profiles predicted by LK-DFBA (NLR) (the best approach for noisy data in the *L. lactis* model) after being fitted to 10 noisy datasets generated by the ODE model and simulated at 20 mM, 40 mM, and 80 mM initial glucose, respectively. Again, the LK-DFBA framework generally captured the qualitative trends of major metabolites such as glucose and lactate, though unsurprisingly not as accurately as when noiseless data are used and with difficulties predicting cofactor concentrations (Figure S10). Because LK-DFBA (NLR) contains quadratic constraints, its results are generally smoother compared to the other LK-DFBA approaches, which helped it predict some metabolites, such as PEP, arguably better than in the noiseless case. Furthermore, LK-DFBA (NLR) is less susceptible to noise for some metabolites, such as glucose and lactate, as observed in predicting similar time courses across the 10 noisy datasets. This could be advantageous if one is modeling a system with multiple noisy data sets and requires consistent predictions for certain metabolites. Likewise, if only using a single dataset, LK-DFBA (NLR) can ensure that these metabolic profiles would not dramatically change if a different dataset had been used. Other methods, such as the original LK-DFBA (LR) approach, can result in more varied predictions (Figure S11) depending on the noisy dataset used; some appear to produce better predictions than LK-DFBA (NLR), while others are worse (though all predictions follow the same trends). These observations reiterate that the best approach is dependent on the systems and datasets being studied, so having multiple LK-DFBA approaches available is an improvement over only using the LK-DFBA (LR) framework.

### Changes in LK-DFBA flux profiles due to gene knockouts are correlated with experimental *E. coli* steady-state flux data

We also compared the predictions of the best LK-DFBA approach on the *E. coli* model to experimental steady-state flux data obtained through gene knockout experiments by Ishii *et al*. [24]. Because the Chassagnole model, which LK-DFBA is fitted to, only encompasses central carbon metabolism, we focused on 13 gene knockouts and 14 fluxes that are included in both the Chassagnole model and the Ishii steady-state flux results. We used the dilution rate of 0.2 h^{-1} for all experimental data. To emulate a gene knockout in the LK-DFBA model, we down-regulated the pathway(s) that correspond with the gene by multiplying the parameters of the relevant constraints by 0.1x instead of completely removing the reaction, as we found that this sufficiently reduced the possible flux reaction rate without causing infeasible solutions from the solver. Additionally, it is not uncommon for enzymatic activity to remain in a pathway after single gene knockouts due to paralogous enzymes and enzyme promiscuity. Because the LK-DFBA predictions do not reach steady-state for the simulation time examined in this work and our previous work (ten seconds), we instead used the average flux of the predicted time course to describe how LK-DFBA’s predictions change from the wildtype to gene knockout phenotype. The average flux before and after a gene knockout should reflect whether the reaction rate generally increases or decreases across the studied time interval after a system perturbation. We used a Pearson correlation to determine if the average flux profiles predicted by LK-DFBA changed similarly to the experimental data after a gene knockout. This assessment method has been used previously by Lima et al. to compare multiple *E. coli* models, including the Chassagnole model, to the Ishii dataset [28].

To evaluate how our framework compares to *E. coli* experimental data, we examined LK-DFBA (NLR), as it was the best approach in the case of low sampling frequency and high noise (Figure 3B). Figure 5 shows the average Pearson correlation of the LK-DFBA (NLR) flux predictions (after being fitted to ten noisy datasets with nT = 15 and CoV = 0.15) and the average correlation of the ODE model flux predictions with the experimental steady-state flux data. Of the gene knockouts and fluxes examined, LK-DFBA (NLR) generally gave reliable predictions for whether fluxes increased or decreased due to gene knockouts, with correlation values greater than 0.6 in all but two cases and correlations greater than 0.7 in 6 out of 13 cases. These correlations were very similar to the correlations yielded by the ODE-based model. In 10 out of 13 knockouts, the correlations calculated for LK-DFBA outperformed or were within 5% of the correlations calculated with the ODE-based model. These results support the significant promise of LK-DFBA approaches for predictivity comparable to that of standard models but with the additional benefits (including relative model simplicity and potential scalability) that accrue from using a LP-based formulation.

## Discussion

At the outset of this work, we sought to find a single LK-DFBA constraint approach that would improve upon the originally published framework. Instead, we have shown that the best constraint approach is highly dependent on the system being modeled. Despite each of the 20 small synthetic models having the exact same stoichiometry and allosteric regulatory interactions, the optimal LK-DFBA approach varied for both noiseless and noisy training datasets, with one of the new constraint approaches performing the best in the majority of cases. This finding suggests that the topology of the system is less important than the emergent dynamics from the collective metabolic reactions. It also supports the importance of having multiple types of constraints to choose from, as presented in this work, to allow more accurate modeling of any given system.

These conclusions are reinforced by analysis of biological systems, where LK-DFBA (HP) performs the best on *L. lactis* noiseless data and LK-DFBA (NLR) performs the best on *E. coli* noiseless data (though LK-DFBA (NLR) is superior for both systems when using low sampling frequency and high noise data). We further confirmed that topology is not the determining factor by randomizing parameters in the *E. coli* model (Figure S8): again, the best constraint approach varied across these topologically identical new models. Many metabolic pathways are conserved topologically across many species (e.g. glycolysis), though the kinetic parameters within these pathways can be vastly different. This suggests that having multiple LK-DFBA constraint approaches to choose from will improve our ability to model different systems.

While the best constraint approach varied across different model parameterizations and topologies, the best approach (in terms of predicting metabolic phenotypes) for a given model was generally consistent across a wide range of pathway perturbations. This trend remained true whether using noiseless data, data with low sampling frequency and high noise, or noisy data that had been smoothed. These results instill confidence that the best constraint approach found when fitting to a wildtype metabolic system will also be the best approach when predicting changes to that system, meaning that an approach that can select the best-fitting of multiple constraint frameworks is viable and likely to be successful. One possible reason for the success of this approach is that when pathways are down- or up-regulated, it is common for only the nearest neighboring pathways to be significantly affected if the change to the system is not drastic or the perturbed pathway is not essential for cell survival, meaning that the emergent behavior from the system would not change too greatly and thus the same constraint approach would be optimal. To easily construct the optimal LK-DFBA model for a given biological system, we envision the workflow presented in Figure 6. After compiling the relevant system stoichiometry, regulatory information, and metabolomics and fluxomics data, one can fit each of the four LK-DFBA approaches to the data and determine which constraint approach most likely works best for predicting the results of different perturbations.

Using ODE models and experimental data from *L. lactis* and *E. coli*, we found that LK-DFBA can effectively predict qualitative trends in concentration profiles of some important metabolites. While we have previously shown that LK-DFBA captures metabolite dynamics in synthetic data generated by ODE models, this is the first time LK-DFBA predictions have been validated with experimental data. For key metabolites that are important inputs of outputs of the system (e.g. carbon sources or end products), LK-DFBA can qualitatively predict if their concentration profiles are expected to decrease or increase, which is an important capability if one is using LK-DFBA to engineer organisms to efficiently produce certain metabolites. Cofactors, on the other hand, are more difficult to model using LK-DFBA but are still typically predicted to be within an order of magnitude of the experimental data in most cases. This capability could be useful when assessing levels of accumulating toxic metabolites or cofactor imbalances if exact concentrations are not necessary. We also found that LK-DFBA flux profile predictions were highly correlated with experimental flux data from genetic knockout experiments. Furthermore, these correlations were comparable to those found when using the ODE-based model. We note, though, that this comparable predictivity is limited by the fact that LK-DFBA was trained using ODE-generated data; if it had instead been fitted to actual metabolomics and fluxomics time course data used in the Ishii experiments (which is not available), these correlation values could possibly be even higher. Similarly, an improved objective function over the reasonable but arbitrary and unoptimized one used here could also lead to significant improvements in the performance of LK-DFBA.

By showing for the first time that LK-DFBA can predict changes in metabolite concentrations and flux profiles qualitatively, we have demonstrated LK-DFBA’s potential as a widely-applicable metabolic modeling tool. Unlike many ODE-based modeling approaches that require specific kinetic equations for each flux reaction, LK-DFBA is generalizable. With four types of kinetics constraints that account for different biological interaction phenomena between metabolites and fluxes, we have improved LK-DFBA to be amenable to many different systems. Additionally, applying the four LK-DFBA approaches to these models of *L. lactis* and *E. coli* has established that our framework can handle various biological systems of substantial size without the need for computationally taxing parameter estimation steps. Because each of the four LK-DFBA approaches maintains an easily solvable LP or QCP structure, LK-DFBA is a prime candidate for being one of the first frameworks able to model a variety of genome-scale systems while also capturing their metabolite dynamics.

While the addition of new constraint approaches has significantly improved the original LK-DFBA (LR) framework, there are still several areas where LK-DFBA can be improved. If computational resources when building the model are not a concern, a secondary optimization step can be used, as in the LK-DFBA (LR+) approach, to improve the parameters in each of the new constraint approaches. In addition, as previously noted the objective function used in LK-DFBA is also a ripe target for future efforts to improve this modeling framework. Here we have chosen objective functions that lead to the maximization of putatively important fluxes, but unlike many other constraint-based models, there was no specific biomass or other objective flux to use. Optimizing the weight of each flux or metabolite in the objective function could lead to even lower observed errors compared to experimental data and may also provide insight into what real biological systems tend to maximize.

## Conclusion

In this work, we have shown that the LK-DFBA modeling framework can be improved by implementing more complex constraints with increased biological relevance. We showed that there is no single best LK-DFBA constraint approach for all models, and the optimal approach depends not just on the topology of the biochemical system but also its kinetics and parameters. The constraint approach that performs the best in recapitulating training data consistently outperforms other constraint approaches at predicting the results of metabolic perturbations on the same system. With these new constraint approaches, we are able to model a variety of metabolic systems more accurately than if we were to just use the original LK-DFBA (LR) method. Moreover, based on comparisons to experimental data we showed that the improved LK-DFBA approaches can reasonably capture the qualitative dynamics of important metabolites and fluxes of interest to researchers. While these predictions may not be smooth or quantitative, the qualitative prediction of trends of metabolite dynamics in response to major perturbations is arguably the most critical aspect needed for creating metabolic models that give insight on how pathways can be further optimized or how metabolic resources can be rerouted to produce valuable chemicals: knowing that a specific knockout will increase or decrease flux is often sufficient to justify the expense of experimental implementation of such genetic perturbations. Moreover, we expect this computational framework to (with future effort) provide opportunities for computationally reasonable scale-up to the genome scale. While the acquisition of quality metabolomics and fluxomics data to build the constraints in LK-DFBA is still a challenge, the work we have presented here lays the groundwork needed to take full advantage of these types of datasets as they become increasingly more readily available.

## Data availability

The code and datasets generated during the current study are available at https://github.com/gtstylab/lk-dfba-constraints.

## Declaration of competing interest

The authors declare there is no conflict of interest

## Funding

Funding for JYL and MPS was provided by the National Institutes of Health (www.nih.gov, R35-GM119701) and National Science Foundation (www.nsf.gov, 1254382). The funders had no role in the design of the study or collection, analysis, or interpretation of data or writing of the manuscript.