Estimating the conjugative transfer rate of antibiotic resistance genes: Effect of model structural errors

The spread of antibiotic resistance genes (ARG) occurs widely through plasmid transfer majorly facilitated via bacterial conjugation. To assess the spread of these mobile ARG, it is necessary to develop appropriate tools to estimate plasmid transfer rates under different environmental conditions. Process-based models are widely used for the estimation of plasmid transfer rate constants. Empirical studies have repeatedly highlighted the importance of subtle processes like delayed growth, the maturation of transconjugants, the physiological cost of plasmid carriage, and the dependence of conjugation on the culture’s growth stage. However, models used for estimating the transfer rates typically neglect them. We conducted virtual mating experiments to quantify the impact of these four typical structural model deficits on the estimated plasmid transfer rate constants. We found that under all conditions, the plasmid cost and the lag phase in growth must be taken into account to obtain unbiased estimates of plasmid transfer rate constants. We observed a tendency towards the underestimation of plasmid transfer rate constants when structurally deficient models were fitted to virtual mating data. This holds for all the structural deficits and mating conditions tested in our study. Our findings might explain an important component of the negative bias in model predictions known as the plasmid paradox. We also discuss other structural deficits that could lead to an overestimation of plasmid transfer rate constants and we demonstrate the impact of ill-fitted parameters on model predictions.


Forward model 82
The core of the forward model is given by the set of differential equations (Eq. 1) proposed by Levin et al. (1979). Compare fitted parameter value(s) to the "true" values as employed in the FM Repeat for different settings to cover scenarios of interest Figure 1: Flow chart summarizing the steps followed in the VSE study. The initial concentrations of the four state variables and six parameters (known from previous studies) are used as inputs for the Forward model (FM). Samplings at pre-determined frequencies are conducted from the generated "truth". The inverse model (IM) is fitted to this data for parameter estimation under ten modeling scenarios and the results are compared with the known or "true" parameter values. The overall results are validated for different experimental conditions. of the strains (C denotes the resource) ψ strain in rows 1-3 and a bulk plasmid transfer rate constant γ (ml cells −1 hour −1 ). In most cases, resource limitation is expressed by a Monod model or a carrying capacity. For simplicity, 87 Levin et al. (1979) model assumed all strains share the same intrinsic growth rate constant (i.e., ψ D =ψ R =ψ T =ψ). The

88
FM in our study is developed by extending Eq. 1. We added four processes to the original model and investigated the 89 contribution of each one of them towards the accuracy in the estimated plasmid transfer rate constant.

90
• Process 1 -(CS T : takes into account the reduction in the growth rate due to plasmid-bearing costs (Carroll and 91 Wong, 2018).

92
• Process 2 -(LAG): caters to the initial "no-growth" phase encountered when the D and R are confronted with 93 new experimental conditions. This is otherwise known as the initial adjustment phase or inactive stage, or 94 the lag phase in growth (Rolfe et al., 2012). We assume that the cells newly formed during the course of our 95 experiment do not have to undergo an additional no-growth phase as there is no change in the experimental 96 conditions and medium.

97
• Process 3 -(DNS ): introduces a dependency of plasmid transfer on growth. Therefore, we assume a decline 98 in the plasmid transfer rate as the system approaches the stationary phase, e.g., due to resource limitation, as 99 reported by empirical studies (Levin et al., 1979;Sysoeva et al., 2020).

100
• Process 4 -(MAT ): includes the maturation time required for a newly formed T to acquire the full capabilities 101 of a donor cell. Immature T cannot transmit the plasmid to further recipients (Zhong et al., 2010).

102
The FM, including all the four processes listed above, is expressed in Eq. 2. In our model, the processes are 103 represented in the form of functions with the same names as the processes (CS T , LAG, DNS , MAT ) and are explained 104 in Eq. 3.

105
[ where t stands for time (hours), N stands for the total density of the population (D+R+T), total transconjugants T = T young +T mature . All the parameters occurring in Eq. 3 and 4 are defined in Table 1. Eq. 3b is based on the mechanistic 107 model for simultaneously determining the lag phase (Huang, 2011(Huang, , 2008     The frequency of sampling (varied between 15 minutes and 3 hours) had a minor impact on the accuracy of the the errors in estimated γ were the highest (by 3.2 orders of magnitude for sampling done every 15 minutes) in scenario 1, which was the basic model from Eq. 1 without any additional processes.

167
The errors in the estimated logγ in scenarios with combinations of processes (scenarios 6 -10 Fig   the scenarios, R and D are consistently overestimated by the models until the stationary phase is reached, except for the anomaly observed in scenario 1 (no additional processes) -where the D strain is grossly overestimated throughout 198 the simulation (not visible because the graph collapses with that of the R strain). In all scenarios (except 3, 6, and 8),

199
T was overestimated until saturation and underestimated afterward. We did not find a notable effect of the value of 200 logγ true (assumed in the FM) on the performance of the models in estimating the densities of the strains. Effect of the maturation time. At the default value of logγ = -12, the change in the maturation time showed no effect on the estimated logγ (Fig. 4e). The error in estimation increased with the maturation time in all models which 226 were subjected to structural deficits (scenarios 1-5). Under default settings, the contribution of the single functions in 227 reducing the estimation error followed the order LAG > CS T » DNS (> MAT ≈ 0).

228
Effect of the D:R ratio. While the errors in the estimated parameter in scenario 2 (including CS T ) remained fairly 229 constant with the variation in the initial D:R ratios, scenario 3 (including LAG) showed a variation of < 1 order in the 230 estimated error between the extremes (compare D:R 1000:1 and 1:1000 for scenario 3 in Fig. 4f). Other scenarios 231 recorded high variability in the estimation errors ≈ 2 orders between the different test conditions (Fig. 4f). For had no pronounced impact on the model performance (compare scenarios 6 and 8 in Fig. 5b). At higher growth rates, 245 excluding LAG (scenarios 7 and 9), resulted in an increase in the error in the estimated logγ.

246
Effect of the logγ. At lower plasmid transfer rates (e.g., logγ < -10), the combination scenarios were insensitive to 247 varying logγ (like the individual functions in Fig. 4c). However, as conjugation became a relevant process (logγ >

248
-10), we observed a rise in the contribution of DNS (compare scenario 6 with 8 in Fig. 5c).

249
Effect of the lag phase. Scenarios 7 and 9 (where LAG is omitted) recorded a steep increase in the estimation error 250 (from 0 to 1.7 orders of magnitude) with the duration of the lag phase (Fig. 5c). At shorter lag phases (< 0.5 hours) q q q q q q q Figure 5: Sensitivity of model outcome (in terms of the error in estimated logγ) to variable initial parameters for scenarios 6 to 10 (with combinations of processes). Samples from the FM were collected every 2 hours. Note that for each study (subfigures: a-e), only the one mentioned parameter was altered from the default settings. In subfigure (f), Sensitivity analysis was performed with respect to a variable initial D:R ratio while the parameters were the same as for the default settings (Tab. 1). Negative errors indicate an underestimation of the logγ by structurally deficient models. Function definitions-CS T : Plasmid cost, LAG: Lag phase, DNS : Saturation MAT : Maturation time.
Effect of the maturation time. For all tested values, the maturation time had no impact on the performance of the various IMs (Fig. 5e). Including DNS in scenario 6 (resulting in scenario 8) improved the estimate by 0.4 orders,

254
whereas including LAG to scenario 7 (also resulting in scenario 8) improved the estimate by ≈ 1 order of magnitude.

255
Effect of the D:R ratio. While the errors in the estimated parameter in scenarios 8 (and 10) remained constant with 256 the variation in the initial D:R ratios, scenarios 6 showed a negligible variation of < 0.2 orders between the extremes 257 ( Fig. 5f). However, estimates from scenarios 7 and 9 varied over > 0.6 orders of magnitude with initial D:R ratios. was not taken into consideration. The logγ estimated using the scenarios with CS T were less sensitive to initial D:R.
In the other scenarios, we observed a large dependence of the estimated logγ on initial D:R ratios.

282
Though we observed an underestimation in the logγ due to the neglect of CS T , we are aware of some studies 283 that reported a growth advantage for some plasmid-bearing cells (Wu et al., 2018). For plasmids that have a positive 284 impact on the growth of D and T, we would observe an overestimation in the plasmid transfer rate constant when CS T 285 is neglected. In such a situation, the overestimation caused by neglecting CS T can be countered by the exclusion of 286 LAG (which always creates a negative bias) to produce an IM which could give unbiased estimates of the transfer rates, 287 despite the structural deficits. However, VME must be used to quantify the bias in the parameter estimates arising 288 from such structural deficits (as an individual deficit or in combination) to ensure an apprised model development.

289
Lag phase (LAG). Studies suggest that the lag phase in bacteria could last from anything between 20 minutes to a 290 few hours (Rolfe et al., 2012;Mellefont and Ross, 2003). In the absence of LAG, the growth parameter (µ) is wrongly 291 assumed to be constant throughout the simulation. This leads to an overestimation of the densities of R and D. The

292
inevitable overestimation of T, however, is prevented by an underestimation of the logγ in the fitting process. Under 293 the default parameterization of FM, LAG was found to reduce the total error in the parameter estimate by 1.3 orders 294 of magnitude (which is ≈ 60% of the total error). In systems with a shorter lag phase (e.g., < 30 minutes), the error in 295 the estimated logγ due to the omission of LAG is of minor importance only (bias in the estimated parameter < 0.5).

296
Saturation (DNS). Empirical studies have reported that the logγ reduces by ≈ 2 orders of magnitude as the mating 297 culture approaches the stationary growth phase (Levin et al., 1979). Function DNS defines the conjugation process 298 rate to be dependent on resource availability (or, in our case, total cell density. Eq. 3c). Ignoring DNS allows for 299 constant high production of T throughout the simulation, which is necessarily compensated by a negatively biased 300 estimate of the logγ. Under the default parametrization, the inclusion of DNS reduced the total error in the estimated 301 logγ by only 0.4 orders of magnitude. At lower growth rates < 1.5 hour −1 , DNS was found to be of minor importance 302 because in such cultures, the stationary phase is reached much later and the problem of an overestimation of T is no 303 longer important unless the mating experiments continue for more than 12 hours. On the other hand, when the growth 304 rate or the initial total cell density was high, the relevance of including DNS in the IM also increased because a larger 305 part of the observations now fell in the stationary phase. wrongly fit model is used to estimate the parameters from empirical data, and then the same model is used for predic-362 tions of population densities, we might see a compensatory effect between several wrongly fit process rates, leading 363 to correct prediction of some strains while others continue to be biased. For example, ignoring CS T might lead to 364 underestimation of the true logγ, but the T produced due to an increased growth rate and lower plasmid transfer rate 365 might still be correctly estimated, whereas the D and R in this case with be grossly over and underestimated due to 366 the wrongly fitted growth rate constant.

367
Virtual mating experiments can be used to compare and segregate the bias arising from known sources (e.g., 368 data error and model structural error). The most important outcome of our study is the fact that, under all studied 369 scenarios, the four structural deficits (CS T, LAG, DNS , MAT ) of the fitted models result in underestimation of the 370 plasmid transfer rate constant. That negative bias in transfer rate constants may -at least in parts -explain why simu-371 lation models tend to predict a too quick disappearance of the plasmid-bearing strains in the absence of advantageous 372 20 plasmid-encoded traits -a fact that is known as the "plasmid paradox" (Carroll and Wong, 2018). We also discussed other possible structural deficits which might have an opposite effect on the estimates of plasmid transfer rate con-374 stants. Because structurally deficient inverse models were used extensively in past decades, a considerable proportion 375 of plasmid transfer rate constants reported in the literature is probably subject to biases. Our study demonstrates that 376 the fitting of the structurally adequate inverse models can reduce the bias in estimated transfer rate constants and thus