Estimates of in vivo turnover numbers by simultaneously considering data from multiple conditions improve metabolic predictions

Turnover numbers characterize a key property of enzymes, and their usage in constraint-based metabolic modeling is expected to increase prediction accuracy of diverse cellular phenotypes. In vivo turnover numbers can be obtained by ranking of estimates obtained by integrating reaction rate and enzyme abundance measurements from individual experiments; yet, their contribution to improving predictions of condition-specific cellular phenotypes remains elusive. Here we show that available in vitro and in vivo turnover numbers lead to poor prediction of condition-specific growth rates with protein-constrained models of Escherichia coli and Saccharomyces cerevisiae, particularly in the ultimate test scenario when protein abundances are integrated in the model. We then demonstrate that in vitro turnover numbers can be corrected via a constraint-based approach that simultaneously leverages heterogeneous physiological data from multiple experiments. We find that the resulting estimates of in vivo turnover numbers lead to improved prediction of condition-specific growth rates, particularly when protein abundances are used as constraints, and are more precise than the available in vitro turnover numbers. Therefore, our approach provides the means to decrease the bias of in vivo turnover numbers and paves the way towards cataloguing in vivo kcatomes of other organisms.

Measuring the kcatome of an organism based on in vitro characterization is limited by the impossibility of purifying specific enzymes, the unavailability of substrates, and the lack of knowledge of required cofactors, such that their relevance for studies of in vivo phenotypes remains questionable (10, 11).

Proxies for turnover numbers, referred to as maximal in vivo catalytic rates, can be obtained by

includes 45%, 41%, and 14% measured over all, at least one (but not all), and none of the 27 used conditions, respectively. Therefore, the data support for correcting the values differs among these classes of proteins. PRESTO relies on solving a linear program that minimizes a weighted linear combination of the average relative error for predicted specific growth rates and the correction of the initial turnover numbers integrated in the pcGEM model (Fig 1, Methods). It further employs K-fold cross validation (here, K = 3) with 10 repetitions while ensuring steady state and integrating protein constraints for proteins measured over all conditions (Fig 1, Methods).

[...] predicts a growth rate that is at most 10% smaller than the measured specific growth rate for that condition or no additional value that strongly constrains the solution can be found. In contrast to this procedure, PRESTO corrects at once the turnover numbers of multiple enzymes that are measured in all investigated conditions by simultaneously leveraging the data from the different conditions. As a result, rather than deriving condition-specific corrected values, which are difficult to use in making predictions for unseen scenarios, PRESTO results in a single set of corrected values.
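The cross-validation layout described above (K = 3 folds, 10 repetitions, 27 conditions) can be illustrated with a minimal sketch. It assumes that the conditions are reshuffled once per repetition and dealt into folds of equal size; the function name `repeated_kfold`, the condition labels, and the random seed are hypothetical and are not taken from the PRESTO implementation. Solving the linear program on each training set is not shown.

```python
import random


def repeated_kfold(conditions, k=3, repetitions=10, seed=0):
    """Yield (repetition, fold, training, validation) splits of the conditions.

    Assumption: conditions are reshuffled for every repetition and dealt
    round-robin into k folds; each fold serves once as the validation set.
    """
    rng = random.Random(seed)
    for rep in range(repetitions):
        shuffled = list(conditions)
        rng.shuffle(shuffled)
        # Deal the shuffled conditions into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            training = [c for j, fold in enumerate(folds) if j != i for c in fold]
            yield rep, i, training, validation


# Hypothetical condition labels standing in for the 27 experimental conditions.
conditions = [f"cond_{n:02d}" for n in range(27)]
splits = list(repeated_kfold(conditions))
```

With these settings, each of the 30 splits trains on 18 conditions and validates on the remaining 9, so every condition is used for validation exactly once per repetition.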

Next, we compared the performance of PRESTO with the heuristic implemented in GECKO in three modeling scenarios that consider: (i) only condition-specific total protein content, (ii) both total protein content and uptake constraints, and (iii) additional constraints from abundances of enzymes measured in all conditions. For corrections of turnover number from PRESTO, we observed that the relative error spans the range from 0.15 to 0.88 in the least constrained scenario (i) (Fig 2a) and from 0.67 to 0.98 in the most constrained scenario (iii) (Fig 2c). In contrast, the relative error with the corrections of turnover numbers from the GECKO heuristic is in the range from 0.94 to 1.00 in scenario (iii) (Fig 2c). In addition, in scenario (iii), the relative error in the case of the GECKO heuristic for each condition is larger than the relative error of the PRESTO predicted growth rate (Fig 2c). We observed that predictions from flux balance analysis, considering enzyme abundances, without a constraint on the total protein content, led to average relative error of 0

We also performed sensitivity analysis by investigating a smaller value, of 10⁻¹⁰, for the weighting factor λ used in the PRESTO objective. We found that when the weighting factor is 10⁻¹⁰ (at which [...] Table S1B). We did not find a significant Spearman correlation (ρ = 0.19, P = 0.385) between the log-transformed values in this intersection (Fig 3c), owing to the different principles employed in the two procedures. In addition, the intersection between enzymes with manually corrected values and those corrected by the GECKO heuristic is larger than with PRESTO. This is expected since the manual curation partly aimed at correcting the most constraining turnover numbers (7).

We also compared the values adjusted by GECKO against values for the maximum apparent [...] maximum of all condition-specific GECKO corrections (Fig S8a, b). In the enzyme abundance constrained scenario, the model with turnover numbers obtained from pFBA performed slightly better than GECKO but still only achieved a minimum relative error of 0.93, which is larger than 0.71 resulting from PRESTO (Fig S8c).

By applying a three-fold cross-validation, we found the optimal value for the λ parameter to be 10⁻⁵ (Fig S9a). This value was associated with an average relative error of 1.95 (overall average: 3.32) and 73 corrected turnover numbers, while on average 156 values were corrected across all explored values for λ. On average, the Jaccard distance between cross-validation folds was 0.13 (Fig S9b), while the average Jaccard distance between unique sets of enzymes with corrected turnover numbers for each λ parameter was three-fold larger (0.4, Fig S9c).
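For readers unfamiliar with the measure, the Jaccard distance between two sets of corrected enzymes can be sketched as follows. The standard definition (one minus the ratio of shared to total elements) is assumed; the enzyme identifiers are hypothetical placeholders and not results from the study.

```python
def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical sets of enzymes whose turnover numbers were corrected in two folds.
fold_1 = {"E1", "E2", "E3", "E4"}
fold_2 = {"E2", "E3", "E4", "E5"}

# Three enzymes are shared out of five in total.
distance = jaccard_distance(fold_1, fold_2)
```

A distance of 0 indicates that two folds corrected exactly the same enzymes, whereas a distance of 1 indicates no overlap, so small values between folds point to a stable selection of corrected enzymes.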

The performance of PRESTO was assessed and compared to GECKO using scenarios (i) and (iii) since no condition-specific uptake rates were available. With default uptake rates, the relative error for predicted growth ranged between 0.01 and 8.56 in the less constrained scenario (i). Further, we obtained relative errors between 0.01 and 0.88 for the more constrained scenario (iii), when using the values corrected by PRESTO (Fig 4a, b). In contrast, when using the values from the GECKO approach, the relative error was in the range between 0.01 and 4.89 for scenario (i) and between 0.89 and 0.99 for scenario (iii). In this scenario, too, we observed that the relative error using values corrected by GECKO was consistently larger than the relative error resulting from the single set of corrected values obtained by PRESTO (Fig 4a, b).
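The ranges reported above can be interpreted with a small sketch. It assumes the common definition of the relative error, |predicted − measured| / measured, which is not restated in this section; the function name and the numerical values are illustrative only and are not taken from the study.

```python
def relative_error(predicted, measured):
    """Absolute deviation of the predicted growth rate, scaled by the measurement.

    Assumption: this is the common definition; the study may use a variant.
    """
    if measured <= 0:
        raise ValueError("measured growth rate must be positive")
    return abs(predicted - measured) / measured


# Illustrative values only (specific growth rates in 1/h).
measured = 0.40
under = relative_error(0.05, measured)  # growth predicted too low
over = relative_error(2.00, measured)   # growth predicted too high
```

Under this definition, a non-negative prediction below the measured value can never give an error above 1, so errors close to 1 correspond to almost no predicted growth, while errors above 1 can only arise from over-prediction.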

The sum of introduced corrections reached a plateau at 10⁻¹¹ for the weighting factor in the PRESTO objective. We found that the relative validation error at this value was 5.26, which is 2.7-fold larger than the relative error obtained using the optimal λ. Hence, when only the pool constraint is applied, allowing for more and larger corrections results in a decrease of the overall relative error in PRESTO, at the cost of predicting growth that is consistently higher than observed. This

where only the pool constraint is considered. We conclude that the prediction performance of the 258 eciML1515 model was improved by using turnover numbers corrected by PRESTO only when 259 enzyme abundances are integrated.

To assess the precision of the introduced corrections, we performed variability analysis and sampling (see Methods) of the introduced corrections to the initial values for two values of the weighting factor λ, namely 10⁻⁵ and 10⁻¹¹. We observed that the 25th and 75th percentiles enclose a narrow interval around the values resulting from PRESTO (Fig S10) and are thus not evenly distributed across the respective interval determined by the variability analysis. We further noted that here, the predictions of smaller δ are generally more precise than the large corrections (δ ≥ p50), which span 2.12 orders of magnitude (small δ (< p50): 1.83, Fig S10). However, we also observed that the precision decreased when more corrections were allowed in PRESTO. This further justified our choice for the optimal parameter λ, which results in a lower number of 73 corrections compared to 204 at λ = 10⁻¹¹, and moreover guarantees more precise estimates (Fig S11). In conclusion, the application of PRESTO is not limited to a single species but presents a versatile tool for the correction of turnover numbers across species.

In contrast to the observations made in S. cerevisiae, we found that a model parameterized with the turnover numbers estimated by pFBA (12)

The low number of corrections introduced by GECKO leads to an overlap of only 3 enzymes (75%) whose values were also corrected by PRESTO (Table S1C, Fig S12a). The pathway enrichment analysis for PRESTO corrections at λ = 10⁻⁵ identifies amino acid and secondary metabolite synthesis as significantly enriched terms among the enzymes with corrected turnover numbers (Fig S12b). These results argue for a systematic underestimation of in vivo turnover numbers in these pathways compared to in vitro data, irrespective of the investigated organism.

Model preparation

The proposed approach aims at parsimonious correction of turnover values in genome-scale enzyme-constrained metabolic models using measured protein abundances. Therefore, it is important to consider differential association between enzymes and reactions, i.e., isozymes, enzyme complexes, and promiscuous enzymes. We decided to use the GECKO formalism (7), (2)

We justify making this assumption based on our observation that most enzymes in the S. cerevisiae model are associated with no more than four reactions. Further, the vast majority of enzymes are assigned a single unique turnover number even though they catalyse multiple reactions (Fig S14).

We then introduced a correction factor δ, which is added to each turnover number if the protein abundances for the underlying enzyme were available: [...] models and validated the obtained corrections on the remaining fold of condition-specific models.

The relative errors and the sum of δ were then used to calculate the scores, which helped us choose the optimal value for λ: [...] (7). Maximum growth was predicted in three different constraint scenarios: (i) using only the protein pool constraint and default uptake rates (1000 mmol/gDW/h), (ii) using the pool constraint and experimentally measured uptake rates, and (iii) using the previous constraints plus the absolute enzyme abundance.

The two studies which generated in vivo estimates from pFBA (12, 15)