Error sensitivity and optimization of steady-state kinetic parameters using multidimensional chemical kinetic analysis

Enzyme behavior has been described using the Michaelis-Menten mechanism. The analysis of extended time domains provides a means to extract the Michaelis-Menten constants through direct fitting of raw data. We have developed a scheme for determining Michaelis-Menten rate constants by appropriate fitting of multidimensional experimental data sets to the closed form of the Michaelis-Menten model. We considered how varying parameters in experimental data affect the accuracy of the remaining parameter estimates. We determine how to improve experimental design to achieve a given accuracy, relative to the amount of intrinsic or external error. We analyze this scheme on data sets built around 20 hypothetical and 2 natural enzymes (kinesin and apyrase) to test error sensitivity in different parameter regimes. Overall, we provide evidence that our data fitting regime will tolerate significant experimental error in the raw data and still converge on the four Michaelis-Menten constants.

120 Kinetic assays. 121 For microtubule-stimulated ATPase measurements, Eg5 at 0.2 M was complexed with 122 paclitaxel-stabilized microtubules (MTs) at 2 M and reacted with varying concentrations of 123 ATP substrate (95-310 M) and added ADP product (0 or 1500 M). Samples of the reaction 124 (10 L each) were quenched in 1 N HCl (2.5 L of 5 N stock) followed by addition of malachite 125 green reagent (150 L; room temperature; [34]). After thorough mixing by vortexing, 150 L of 126 this mixture was transferred to a 96-well plate for endpoint absorbance (=650 nm) using a 127 BioTek Epoch microplate spectrophotometer. A standard curve from 10-400 M 128 Na 2 HPO 4 /NaH 2 PO 4 (pH 7.2) was used to convert A 650 to inorganic phosphate (P i ) concentration. 129 For each progress curve, P i produced by Eg5 ATPase (total P i minus background P i ) was plotted 130 against time.

131
For ADPase measurements, Apy at 0.13 M was reacted with varying ADP substrate 132 (40-260 M) and added AMP product (0 or 10 mM). P i product was monitored using the 133 malachite green assay as described above. For each progress curve, P i produced by Apy ADPase 134 was plotted against time. Stock ATP and ADP concentrations were experimentally determined 135 using absorbance (=260 nm) with molar extinction coefficient (15,400 L mol -1 cm -1 ).
136 Mechanisms. 137 Expansions of the Michaelis-Menten (MM) mechanism establish irreversible product 138 formation/release (MM model; Equation 1) as well as competitive product inhibition (PI) of 139 enzyme activity with irreversible chemistry step (PI model; Equation 2) and the fully reversible 140 (FR) three-step mechanism (FR model; Equation 3).

164
To generate variably 'noisy' data sets that realistically represent a range of experimental 165 observations, normally distributed random additive error (errA) was added to the quantity of 166 product formed during the simulations using Scheme 1 and the following equation:

167
(4) 168 169 where represents the fraction of converted to product at equilibrium defined as: We used the closed form of the model to determine the near-optimal time parameter 175 interval to maximize the efficiency of the fitting algorithms given a fixed sampling frequency. 176 The most valuable portion of the curve is where the absolute value of the second derivative is 177 bounded away from zero, since the remaining portions are nearly linear where the data points are 178 quite redundant for the nonlinear parameter estimation. Toward this end, we compute the 179 derivative of the product formula (Scheme 1). The derivative of the Lambert -function is: 182 Hence simplifies to:

184
[P] ' = -  204 Judging the quality of the final nonlinear least squares fitting for our SIM1, SIM2, and 205 SIM3 simulations required a scoring function that incorporated both accuracy (i.e. estimated 206 constant from curve fitting divided by the expected constant) and precision (i.e. symmetric 95% 207 confidence interval). In general, more meaningful fitted constants correlate with higher accuracy 208 (~1) and lower confidence intervals. A scoring function was developed to assess each fitted 209 constant: where was the estimated constant from nonlinear least squares fitting, was the 213 simulated (i.e. expected) constant, and was the confidence interval (95%) for each CI 95% 214 measurement based on a simple multiple of the standard error. , , and  are weighting 215 constants chosen to provide a score of 0.5-1 when normalized constants ( ) are 0.5-1.5 / 216 with narrow confidence intervals (< 50%). For all score calculations presented in this 217 manuscript,  = 1,  = 0.75, and  = 0.5 ( Fig S2b).

Analysis of experimental kinetic data for Apy and Eg5
220 Experimental data for kinesin-5 (Eg5) and apyrase (Apy) were fit to either one of three 221 mechanistic models, represented by the algebraic rate equations listed in Scheme 1. The three 222 models, in order of increasing complexity, are designated as "MM", "PI", and "FR", 223 respectively. Model MM represents the simplest Michaelis-Menten kinetic model, whereby the 224 reaction product does not rebind and inhibit the enzyme and the reaction proceeds only in the 225 forward direction (Equation 1). Model PI represents the product inhibition model, whereby the 226 reaction product rebinds and inhibits the enzyme but the overall reaction can only proceed in the 227 forward direction (Equation 2). Model FR represents the fully reversible kinetic model, whereby 228 the reaction product not only rebinds and inhibits the enzyme, but also the overall reaction can 229 proceed in either the forward or reverse direction (Equation 3).

230
Apy (type VII; EC 3.6.1.5) catalyzes the hydrolysis ADP to AMP and P i [45]. To 231 measure the ADPase kinetics of Apy, we performed a series of reactions at different ADP 232 substrate (40-260 µM) and different AMP product (0 and 10 mM) concentrations and used the 233 malachite green colorimetric assay to quantify P i produced over time. Figs 1a-b summarize the 234 results from fitting the Apy ADPase data using basic initial velocity analysis as well as to the 235 three independent models (Table 1). Using both AIC and BIC to assess the goodness of fit 236 relative to the complexity of the model, the apyrase data were best modeled to the FR 237 mechanism (Equation 3) given the lower AIC and BIC values compared to the PI model (∆ 238 . However, asymmetric confidence analysis (Table 2; Fig S1a) AIC =-17 and ∆BIC =-14) 239 indicated that not all the parameters were well constrained. Specifically, we observed large upper 240 bounds for and and no lower bound for . These results suggest that AMP product 241 weakly binds the enzyme yet will proceed in the reverse direction to reform ADP substrate (i.e. 242 the equilibrium constant for the chemistry step was 7 at the best fit values).  Table 1.

256
Eg5 (EC 3.6.4.4) catalyzes the hydrolysis of ATP to ADP and P i products, which can be 257 monitored using the malachite green colorimetric assay [34]. We designed a matrix of 258 microtubule-stimulated Eg5 ATPase reactions with varying ATP substrate (95-310 µM) and 259 added ADP product (0 and 1500 µM) concentrations for our analysis. Figs 1c-d summarize the 260 results from fitting the microtubule-stimulated Eg5 ATPase data using the three independent 261 models from above with a direct comparison with previously published results ( . These results are consistent with the known chemistry of Eg5, which AIC = 2 and ∆BIC = 5) 269 shows no evidence of ATP synthesis in the presence of microtubules [33]. Asymmetric 270 confidence analysis (Table 2; Fig S1b) showed no lower bounds for and . Thus, our 271 analysis provides progress curve fitting to extract the binding constants for substrate and product 272 as well as the forward catalytic rate constant with minimal data sets (i.e. 6-8 time courses).
273 Simulated enzyme mechanisms 274 To demonstrate the ability of our multidimensional chemical kinetics analysis of SIM1, 275 SIM2, and SIM3 simulations to extract the four Michaelis-Menten (MM) constants from 'noisy' 276 data, we established 2 enzymes based on the best fit constants of Apy and Eg5 as well as 20 277 hypothetical enzymes that show a wide range of kinetic mechanisms (E1-E20, Table 3, Fig S2a). 278 These enzyme mechanisms display 1) widely varying equilibrium constants that span 279 biologically relevant enzymes , 2) a wide variety of ( * = = 1.0 -2.4x10 4 ) 280 substrate and product binding constants (K S and K P , respectively) including various combinations 281 seen in natural enzymes, 3) a combinatorial spectrum of forward and reverse catalytic constants ( 282 , respectively) as well as 4) a range of catalytic efficiencies from and ( / ) 283 extremely inefficient enzymes to a 'catalytically perfect' enzyme (Apy = 0.13 μM -1 s -1 ) (E1 284 . To generate 'noisy' data sets that realistically represent a range of = 1003 μM -1 s -1 ) 285 experimental observations, normally distributed random additive error was added to the quantity 286 of product formed during the simulations using Equations 4 and 5 (Fig S3). Given that the four were defined in these simulations, we could assess our nonlinear least ( , , , ) 288 squares fitting for both accuracy and precision using a scoring algorithm (Equation 9), which is a 289 function of both the normalized constant (defined as the fitted constant divided by the actual 290 constant) and symmetric 95% confidence interval (Fig S4).

Data set design 292
For the SIM1 simulations, the concentration of substrate was varied over a range of 293 concentrations from 0.5*K S to 5*K S with no added product at the start of the simulations.

([P o ])
294 For the SIM2 simulations, the substrate concentration was kept constant at 5*K S and the 295 concentration of initial added product at the start of each simulation was varied from 0 to 296 5*K P ( Table 2). The SIM3 simulations provided a matrix (3x3) using the variation of initial 297 substrate and initial added product concentrations as described for SIM1 and SIM2 298 simulations, respectively (see Table S1 for an example of simulation conditions). The calculated 299 time domain was optimized through the  ratio of final instantaneous velocity (v f ) to initial 300 velocity (v o ) to normalize the data point density across each time domain (see Materials and 301 Methods for details). Each simulation provided a unique experimental data set with a 302 pseudorandom Gaussian error distribution and these data sets were fit using NonlinearModelFit 303 in Mathematica. For Apy and E1-E20, all four constants were floated during ( , , , ) 304 the analysis. Since Eg5 demonstrated a minimal PI mechanism, only three constants ( , , ) 305 were floated during the fitting. We assessed the sensitivity of the SIM1, SIM2, and SIM3 designs 306 against varying additive error for Apy and Eg5 (Fig 2). For Apy, SIM1 was significantly inferior 307 to the SIM2 and SIM3 designs based on overall fitting scores (Figs 2a-d). However, SIM3 was a 308 significantly better design compared to the SIM2 design (Fig 2d), suggesting that varying both 309 initial substrate and product concentrations provided better overall fitting scores. For Eg5, SIM1 310 and SIM3 were comparable for estimates (Figs 2e-g), but SIM3 provided a and 311 significantly improved mean fitting score for (Fig 2h). , cyan) plotted against error using SIM1, SIM2, and SIM3 316 conditions, respectively. (d) At constant 3% error, the mean fitting score for each constant was 317 determined under different Apy simulation designs (as indicated). n=200. Error bars represent 318 the standard deviation of the mean. (e-g) Fitting results for Eg5 simulations with the fitting score 319 for each constant plotted against error using SIM1, SIM2, and SIM3 conditions, respectively. (h) 320 At constant 3% error, the mean fitting score for each constant was determined under different 321 Eg5 simulation designs (as indicated). n=200. N/A, not applicable.

323
We tested the sensitivity of all three simulation designs (SIM1, SIM2, SIM3) against 324 constant additive error for our hypothetical enzymes, E1-E20 (Fig S5). For the hypothetical 325 enzymes with (E1-E10), SIM2 provided very low (or indistinguishable from zero) ≥ 326 fitting scores for and . SIM3 provided comparable fitting scores for , and , but 327 consistently higher scores for . For the hypothetical enzymes with (E11-E20), SIM1 < 328 provided very low (or indistinguishable from zero) fitting scores for and . SIM3 provided 329 consistently higher scores for all four constants compared to SIM2. There was no obvious 330 correlation to better fitting scores when comparing 1) relative to , 2) overall equilibrium 331 constant , or 3) enzymatic efficiency . Overall, varying both initial substrate and ( * ) ( ) 332 added product was successful to attain higher fitting scores regardless of the enzyme mechanism.

333
We assessed the sensitivity of the SIM3 design against varying additive error for our 334 hypothetical enzymes, E1-E20 (Fig S6). All enzymes showed fitting scores that dropped below 335 0.5 at >3% error except for E10, which had fitting scores that dropped below 0.5 around 2% 336 error. Using asymmetric confidence estimate analysis, simulated Apy SIM3 designs were tested 337 at various additive errors (Fig S7). Since the asymmetric confidence interval was consistently 338 larger than the symmetric 95% confidence estimates obtained from nonlinear least squares 339 fitting, it was difficult to directly compare quality scores (Fig S8). Given the wide range of 340 different hypothetical enzymes tested in this analysis, the SIM3 design (where both initial 341 substrate and added product is varied) is likely to be successful at estimating the four MM 342 parameters from fairly 'noisy' data for any enzyme found in natural systems.
343 Correlations between the estimated constants during the fitting 344 We next examined the six possible correlations between the four fitted parameters: , 345 (Fig 3). Using the standard asymptotically unbiased sample correlation matrix, we , , 346 computed the estimated correlations as we varied through E1-E20, Apy, and Eg5 with constant 347 error and fixed starting values. We observed that the correlations remain roughly the same, but 348 there was some variability as the parameters for the different enzymes vary. The relationship in 349 the model accounts for most of the nonzero correlation especially considering a given fitting 350 curve determined only three of the four parameters uniquely as discussed (Equations S18-S29). 351 The simplicity of the symmetry between K S and K P accounts for the near perfect correlation 352 between these two constants. 353 354 Fig 3. Parameter correlations for SIM3. All six parameter correlation coefficients () are 355 shown for E1-E20, Apy, and Eg5 for the SIM3 design. Error was held constant at 3%. E1-E20 356 and Apy were modeled using the FR mechanism whereas Eg5 was modeled using the PI 357 mechanism.
358 Time domain length effects on the parameter estimation 359 We investigated the sensitivity of the SIM3 design for the length of the time domain 360 based on the ratio of instantaneous velocity from the start of the reaction till the end ( ; = 361 Fig 4). We observed similar fitting scores for Apy and Eg5 over a broad range of time domains, 362 indicating a vast insensitivity of the SIM3 design on the short time domain for data simulation 363 (Figs 4a-c and S9). At ratios greater than 0.7, the fitting scores became markedly lower, 364 indicating that the fitting relies on a significant quantity of product formed during the reaction 365 (i.e. 60-85% of initial substrate converted to product during the time course). However, if the 366 time domain was extended beyond the optimal range and thus few data were found in the initial 367 velocity phase of the curve, the fitting scores steadily decreased for Apy and Eg5 (Figs 4d-f). For 368 the hypothetical enzymes with (E1-E10), fitting scores for and decreased rapidly ≥ 369 as the time domain was lengthened (Fig S10). For the hypothetical enzymes with (E11-< 370 E20), the overall fitting scores were less sensitive to the longer time domains (Fig S10). The total degrees of freedom correspond to the length of each progress curve (i.e. 25 data 381 points per curve) multiplied by the number of progress curves. The number of data points per 382 reaction was easy to modify since the total time domain was divided evenly from 2-1024 data 383 points (i.e. changing the time constant for the calculation of ). Figs 5a-b highlights the SIM3 [P] 384 results for Apy and Eg5 by varying these conditions. As expected, the fitting scores increased 385 proportionally with the number of data points per time course. These results were not specific to 386 Apy and Eg5, but similar observations were made for all hypothetical enzymes tested (Fig S11). 387 We observed a similar increase of the fitting scores by holding the number of data points per 388 curve constant at 25 and varying the number of progress curves (data not shown). However, 389 using methods to solve the asymmetric confidence estimates, we observed nearly constant fitting 390 score as a function of data points per curve (Fig 5c). Therefore, the asymmetric confidence 391 estimates solved from fitting two constants while fixing the values of the remaining two 392 constants provided a robust estimate of the confidence interval independent of the number of 393 total data points in the data set.  402 We have tested the sensitivity of three different experimental designs centered on the 403 closed form (Scheme 1) of the expanded Michaelis-Menten (MM) mechanism for enzyme 404 activity. To simultaneously fit the raw data from individual progress curves to attain the four 405 MM constants , we fit real experimental data for two enzymes (Apy and Eg5) as ( , , , ) 406 well as simulated data for 20 different enzymes (E1-E20). These hypothetical enzymes have a 407 wide range of equilibrium constants, substrate and product binding affinities, catalytic constants, 408 and enzymatic efficiencies (Fig S2a, Table 3). Despite all the divergent mechanisms tested in 409 this study, the SIM3 design was robust in accurately and precisely estimating all four MM 410 constants from an experimentally feasible data set.

Scoring function provides standard assessment
Given the complex nature of the data analysis from this study, we developed a simple 413 exponential scoring function (Equation 9; Fig S2b) that yielded a score from 0 to 1 based on 414 nonlinear least squares fitting accuracy (defined as the fit estimate divided by the actual constant) 415 and fitting precision (defined as the symmetric 95% confidence interval for each estimate). In 416 general, more meaningful fitted constants correlate with higher accuracy and lower confidence 417 intervals. The weighting parameters that govern the scoring function were empirically 418 determined to provide a fitting score of 0.5-1 when accuracy and precision were less than 50%. 419 The same weighting constants were used for the entire analysis, and depending on the nature of 420 the questions being addressed, these weighting factors can be scaled accordingly to provide more 421 stringent or more flexible scoring outcomes. The weighting factors employed in this study 422 provide a reasonably conservative approach to evaluating the fitting results. = 0 427 experiments are designed by only varying the substrate concentration as in SIM1. However, we 428 found that adding initial product in the beginning of the reaction will lead to better parameter 429 fitting results from both experimental and simulated kinetic data. Indeed, the best fitting regime 430 happened with varying both substrate and initial product concentrations in the 3x3 matrix (Fig  431 2). We found that >50% of initial substrate conversion to product was necessary for accurately 432 defining all four parameters (Fig 4). If experimental error remains constant, having greater than 433 24 data points per curve (assuming a 3x3 matrix) provided sufficiently accurate parameter 434 estimates, but more data narrowed the symmetric confidence limits (Fig 5). We also observed 435 that most hypothetical enzyme mechanisms tolerated a wide range of initial substrate and initial 436 added product concentrations for achieving accurate parameter estimates (data not shown). 437 Given the capabilities of Mathematica, future investigations aim to extend this analysis to 438 mechanisms involving two substrates and products during the reaction (i.e. bi-bi mechanisms). 440 When studying a new uncharacterized enzyme, the decisions made by the enzymologist 441 for initial substrate concentrations, total enzyme concentration, length of time domains, the 442 number of data points per time domain, and the number of data sets to collect are difficult. 443 Interestingly, global fitting under different simulation designs tolerated a wide spectrum of initial 444 substrate and/or added product concentrations (~2-orders of magnitude around the and 445 values; data not shown). The total enzyme concentration and the length of the time domain are 446 directly related, though limits are often encountered based on the solubility limit of the enzyme, 447 the dissociation of higher order oligomers of the enzyme, the minimum time constant for data 448 sampling, and/or the stability of the enzyme under the experimental conditions. Therefore, these 449 caveats make it difficult to achieve the optimal conditions to meet MM steady-state assumptions. 450 Nevertheless, we found that our SIM3 design converged on all four parameters for all enzymes 451 tested despite the length of time domain used.

452
As expected, we observed higher fitting scores when data sampling frequency increased 453 over the course of the progress curve (Figs 5a-b and S11), yet, surprisingly, the scores did not 454 linearly improve with the added number of data sets in the matrix (data not shown). These results (μM -1 s -1 ) 2.1 9.9 13 1.2