## Abstract

Fitness is not well estimated from growth curves of individual isolates in monoculture. Rather, competition experiments, which measure relative growth in mixed microbial cultures, must be performed to better infer relative fitness. However, competition experiments require unique genotypic or phenotypic markers, and thus are difficult to perform with isolates derived from a common ancestor or non-model organisms. Here we describe *Curveball*, a new computational approach for predicting relative growth of microbes in a mixed culture utilizing mono- and mixed culture growth curve data. We implemented *Curveball* in an open-source software package (http://curveball.yoavram.com) and validated the approach using growth curve and competition experiments with bacteria. *Curveball* provides a simpler and more cost-effective approach to predict relative growth and infer relative fitness. Furthermore, by integrating several growth phases into the fitness estimation, *Curveball* provides a holistic approach to fitness inference from growth curve data.

Growth curves are commonly used in microbiology, genetics, and evolutionary biology to estimate the fitness of individual microbial isolates. Growth curves describe the density of cell populations in liquid culture over a period of time and are usually acquired by measuring the optical density (OD) of one or more cell populations. The simplest way to infer fitness from growth curves is to estimate the growth rate during the exponential growth phase by inferring the slope of the log of the growth curve^{1} (see example in Figure 1). Indeed, the growth rate is often used as a proxy of the selection coefficient, *s*, which is the standard measure of relative fitness in population genetics^{2,3}. However, exponential growth rates do not capture the dynamics of other phases of a typical growth curve, such as the length of lag phase and the cell density at stationary phase^{4} (Figure 1). Thus, it is not surprising that growth rates are often poor estimators of relative fitness^{5,6}.
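
The log-slope estimate described above can be sketched in a few lines. This is a minimal illustration, not the *Curveball* implementation: the function name `exponential_growth_rate`, the OD window bounds `lo` and `hi`, and the synthetic logistic curve are all assumptions made for the example.

```python
import numpy as np

def exponential_growth_rate(t, od, lo, hi):
    """Slope of log(OD) against time for readings with lo <= OD <= hi.

    The window (lo, hi) is a hypothetical, user-chosen stand-in for the
    exponential phase; it is not defined in the text.
    """
    t = np.asarray(t, dtype=float)
    od = np.asarray(od, dtype=float)
    mask = (od >= lo) & (od <= hi)
    slope, _intercept = np.polyfit(t[mask], np.log(od[mask]), 1)
    return slope

# Synthetic logistic growth curve (illustrative values, not data from the study)
t = np.linspace(0, 12, 49)                 # hours, one reading every 15 minutes
r_true, K, N0 = 0.8, 1.0, 0.01
od = K / (1 + (K / N0 - 1) * np.exp(-r_true * t))

rate = exponential_growth_rate(t, od, lo=0.01, hi=0.05)
doubling_time = np.log(2) / rate           # hours
```

Even on this noise-free curve the estimate falls slightly below `r_true`, because growth already decelerates inside the chosen window, and the estimate carries no information about lag or stationary phase.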

Evolutionary biologists use competition experiments to infer relative fitness in a manner that accounts for all growth phases^{7}. In pairwise competition experiments, two strains are grown together in a mixed culture: a reference strain and a strain of interest. The frequency of each strain in the mixed culture is measured during the course of the experiment using specific markers^{7}, such as drug resistance markers scored by colony counts, fluorescent markers monitored by flow cytometry^{8}, or deep sequencing read counts^{9,10}. The selection coefficient of the strains of interest can then be estimated from changes in their frequencies during the competition experiments. These methods can infer relative fitness with high precision^{8}, as they directly estimate fitness from changes in isolate frequencies over time. However, competition experiments are more laborious and expensive than monoculture growth curve experiments, requiring the development of genetic or phenotypic assays (see Concepción-Acevedo et al.^{5} and references therein). Moreover, competition experiments are often impractical in non-model organisms. Therefore, many investigators prefer to use proxies of fitness such as growth rates.

Even when competition experiments are a plausible approach (for example, in microbial lineages with established markers^{7}), methods for interpreting and understanding how differences in growth contribute to differences in fitness are lacking. Such differences have a crucial impact on our understanding of microbial fitness and the composition of microbial populations and communities.

Here we present *Curveball*, a new computational approach implemented in an open-source software package (http://curveball.yoavram.com). *Curveball* provides a predictive and descriptive framework for estimating growth parameters from growth dynamics, predicting relative growth in mixed cultures, and inferring relative fitness.

## Results

*Curveball* consists of three stages: (a) fitting growth models to monoculture growth curve data, (b) fitting competition models to mixed culture growth curve data and using the estimated growth and competition parameters to predict relative growth in a mixed culture, and (c) inferring relative fitness from the predicted relative growth. The following experimental setting was used to test this approach.

### a. Monoculture growth

In each experiment, two *Escherichia coli* strains, each labeled with a green or red fluorescent protein (GFP or RFP), were propagated in a monoculture and in a mixed culture, and the cell density was measured for each strain for at least 7 hours (Figure 2).

#### Growth model

The Baranyi-Roberts model^{11} is used to model growth composed of several phases: lag phase, exponential phase, deceleration phase, and stationary phase^{1}. The model assumes that growth rate accelerates as cells adjust to new growth conditions, then decelerates as resources become scarce, and finally halts when resources are depleted. The model is described by the following ordinary differential equation [see eqs. 1c, 3a, and 5a in^{11}]:

$$\frac{dN}{dt} = r \, \alpha(t) \, N \left(1 - \left(\frac{N}{K}\right)^{v}\right) \qquad (1)$$

where *t* is time, *N* = *N*(*t*) is the population density at time *t*, *r* is the specific growth rate in low density, *K* is the maximum density, *v* is a deceleration parameter, and *α*(*t*) is the adjustment function. For a derivation of eq. 1 and further details, see Supporting text 1.

The adjustment function describes the fraction of the population that has adjusted to the new growth conditions by time *t* (*α*(*t*) ≤ 1). Typically, an overnight liquid culture of microorganisms that has reached stationary phase is diluted into fresh media. Following dilution, cells enter lag phase until they adjust to the new growth conditions. We chose the specific adjustment function suggested by Baranyi and Roberts^{11}, which is both computationally convenient and biologically interpretable:

$$\alpha(t) = \frac{q_0}{q_0 + e^{-mt}}$$

where *q*_{0} characterizes the physiological state of the initial population, and *m* is the rate at which the physiological state adjusts to the new growth conditions.
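
As a rough illustration of how such an adjustment function behaves, the sketch below assumes the standard Baranyi-Roberts form *α*(*t*) = *q*_{0} / (*q*_{0} + e^{−*mt*}) and checks its closed-form integral against a trapezoidal sum. The function names and parameter values are chosen for the example and are not taken from the *Curveball* package.

```python
import numpy as np

def adjustment(t, q0, m):
    """Assumed form alpha(t) = q0 / (q0 + exp(-m t)): adjusted fraction at time t."""
    return q0 / (q0 + np.exp(-m * t))

def integrated_adjustment(t, q0, m):
    """A(t) = integral of alpha from 0 to t, written in closed form."""
    return t + np.log((np.exp(-m * t) + q0) / (1 + q0)) / m

q0, m = 0.05, 2.0                          # illustrative values only
t = np.linspace(0, 10, 2001)
alpha = adjustment(t, q0, m)

# Trapezoidal check of the closed-form integral
steps = np.diff(t) * (alpha[1:] + alpha[:-1]) / 2
A_numeric = np.concatenate([[0.0], np.cumsum(steps)])
max_error = np.max(np.abs(A_numeric - integrated_adjustment(t, q0, m)))
```

Under this form, a small *q*_{0} means that few cells are adjusted at inoculation (*α*(0) = *q*_{0} / (1 + *q*_{0})), and a large *m* means that the population approaches full adjustment quickly.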

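The Baranyi-Roberts differential equation (eq. 1) has a closed form solution:

$$N(t) = \frac{K}{\left[1 - \left(1 - \left(\frac{K}{N_0}\right)^{v}\right) e^{-r v A(t)}\right]^{1/v}}, \qquad A(t) = \int_0^t \alpha(s) \, ds \qquad (2)$$

where *N*_{0} = *N*(0) is the initial population density. For a derivation of eq. 2 from eq. 1, see Supporting text 1.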
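
The agreement between the differential equation and its closed-form solution can be checked numerically. The sketch below assumes the Richards-type form d*N*/d*t* = *r* *α*(*t*) *N* (1 − (*N*/*K*)^{*v*}) with *α*(*t*) = *q*_{0} / (*q*_{0} + e^{−*mt*}); the function names and parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.integrate import solve_ivp

def alpha(t, q0, m):
    """Assumed adjustment function."""
    return q0 / (q0 + np.exp(-m * t))

def baranyi_roberts_ode(t, N, r, K, v, q0, m):
    """Right-hand side of the assumed growth equation."""
    return r * alpha(t, q0, m) * N * (1 - (N / K) ** v)

def baranyi_roberts(t, N0, K, r, v, q0, m):
    """Closed-form solution of the equation above."""
    A = t + np.log((np.exp(-m * t) + q0) / (1 + q0)) / m   # integral of alpha
    return K / (1 - (1 - (K / N0) ** v) * np.exp(-r * v * A)) ** (1 / v)

params = dict(r=0.6, K=0.8, v=1.5, q0=0.1, m=1.0)          # illustrative values
N0 = 0.02
t = np.linspace(0, 24, 97)

closed = baranyi_roberts(t, N0, **params)
sol = solve_ivp(
    baranyi_roberts_ode, (t[0], t[-1]), [N0], t_eval=t,
    args=tuple(params[k] for k in ("r", "K", "v", "q0", "m")),
    rtol=1e-9, atol=1e-12,
)
max_abs_diff = np.max(np.abs(sol.y[0] - closed))
```

Having a closed form matters in practice: fitting can evaluate the curve directly at the measured time points instead of integrating the equation at every optimizer step.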

#### Model fitting

We estimated the growth model parameters by fitting the model (eq. 2) to the monoculture growth curve data of each strain. The best fit is shown in Figure 2D-F (see Table S1 for the estimated growth parameters). From this model fit we also estimated the maximum specific growth rate, the minimal specific doubling time, and the lag duration (Table 1). The strains differ in their growth parameters; for example, in experiment A (Figure 2A,D), the red strain grows 40% faster than the green strain, has 23% higher maximum density, and a 60% shorter lag phase.
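
A fit of this kind might look as follows. This is a hedged sketch using `scipy.optimize.curve_fit` on synthetic data rather than the LMFIT-based procedure used by *Curveball*; it assumes the closed form of the growth model with *v* fixed to 1, and the starting guesses, bounds, and noise level are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def growth_model(t, N0, K, r, q0, m):
    """Assumed closed-form growth curve with v fixed to 1 (one possible nested model)."""
    A = t + np.log((np.exp(-m * t) + q0) / (1 + q0)) / m
    return K / (1 - (1 - K / N0) * np.exp(-r * A))

# Synthetic OD readings every 15 minutes (not data from the study)
rng = np.random.default_rng(0)
t = np.arange(0, 16.25, 0.25)
true = dict(N0=0.02, K=0.7, r=0.9, q0=0.05, m=1.5)
od = growth_model(t, **true) + rng.normal(0, 0.005, t.size)

# Arbitrary starting point and positivity bounds for the example
guess = [0.03, od.max(), 0.5, 0.1, 1.0]
lower = [1e-4, 1e-4, 1e-4, 1e-4, 1e-4]
upper = [1.0, 5.0, 5.0, 10.0, 10.0]
popt, pcov = curve_fit(growth_model, t, od, p0=guess, bounds=(lower, upper))
N0_hat, K_hat, r_hat, q0_hat, m_hat = popt

fitted = growth_model(t, *popt)
rms = np.sqrt(np.mean((fitted - od) ** 2))

# Derived quantity: maximum specific growth rate of the fitted curve
max_specific_rate = np.max(np.gradient(np.log(fitted), t))
min_doubling_time = np.log(2) / max_specific_rate
```

The maximum density is usually well constrained by the plateau, whereas *r*, *q*_{0}, and *m* can trade off against each other, which is one reason to compare nested models instead of always fitting every parameter.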

### b. Mixed culture growth

#### Competition model

To model growth in a mixed culture, we assume that interactions between the strains are solely due to resource competition. We derived a new two-strain Lotka-Volterra competition model^{12} based on resource consumption (see Supporting text 2):
*N*_{i} is the density of strain *i* = 1,2 and *r*_{i}, *K*_{i}, *v*_{i}, *a*_{i}, *q*_{0,i}, and *m*_{i} are the values of the corresponding parameters for strain *i* obtained from fitting the monoculture growth curve data. *a*_{i} are competition coefficients, the ratios between inter- and intra-strain competitive effects.

This competition model explicitly assumes that interactions between the strains are solely due to resource competition. Therefore, all interactions are described by the deceleration of the growth rate of each strain in response to growth of the other strain. Of note, each strain can have a different limiting resource and resource efficiency, based on the maximum densities *K*_{i} and competition coefficients *a*_{i} determined for each strain.

Eq. 3 is fitted to the growth curve of a mixed culture that includes both strains, in which the combined OD of the strains is recorded over time (but not the frequency or density of each individual strain). This fit is performed by minimizing the squared differences between *N*_{1} + *N*_{2} (eq. 3) and the observed OD from the mixed culture and yields estimates for the competition coefficients *a*_{i} (Figure 3A-C).
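
The fitting step can be sketched as a least-squares problem over the two competition coefficients, with the monoculture parameters held fixed. The competition equations used below are a simplified stand-in (a classic Lotka-Volterra form with *v*_{i} = 1 and no lag) and an assumed index convention; they are not the authors' eq. 3, and all values are synthetic.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def competition(t, N, r, K, a):
    """Simplified two-strain competition (assumption: v = 1, no lag phase)."""
    N1, N2 = N
    dN1 = r[0] * N1 * (1 - (N1 + a[1] * N2) / K[0])
    dN2 = r[1] * N2 * (1 - (a[0] * N1 + N2) / K[1])
    return [dN1, dN2]

def total_density(a, t, N0, r, K):
    """Combined density N1 + N2, the quantity that OD readings reflect."""
    sol = solve_ivp(competition, (t[0], t[-1]), N0, t_eval=t,
                    args=(r, K, a), rtol=1e-10, atol=1e-12)
    return sol.y.sum(axis=0)

# Parameters treated as known from monoculture fits (illustrative values)
r, K, N0 = (0.9, 0.6), (0.8, 0.6), (0.01, 0.01)
t = np.linspace(0, 20, 81)

# Synthetic mixed-culture OD generated with known coefficients
a_true = np.array([0.7, 1.4])
od_mixed = total_density(a_true, t, N0, r, K)

def residuals(a):
    return total_density(a, t, N0, r, K) - od_mixed

a_start = np.array([1.0, 1.0])
initial_cost = 0.5 * np.sum(residuals(a_start) ** 2)
fit = least_squares(residuals, a_start, bounds=(0.0, 5.0), diff_step=1e-3)
a_hat = fit.x
```

Because only the summed density is observed, different coefficient pairs may fit the total curve almost equally well, so it is worth inspecting the residuals and the sensitivity of the prediction to `a_hat`.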

Using the estimated parameters, eq. 3 is solved by numerical integration, providing a joint prediction for the densities *N*_{1}(*t*) and *N*_{2}(*t*). From the predicted densities, the frequencies of each strain over time can be inferred:

$$f_i(t) = \frac{N_i(t)}{N_1(t) + N_2(t)}, \qquad i = 1, 2$$
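
The prediction step amounts to integrating the competition equations and normalizing the two densities. The sketch below uses an assumed, simplified competition model (*v*_{i} = 1, with an adjustment function of the form *q*_{0} / (*q*_{0} + e^{−*mt*})) and made-up parameter values; the exact form and index convention of eq. 3 may differ.

```python
import numpy as np
from scipy.integrate import solve_ivp

def alpha(t, q0, m):
    """Assumed adjustment function (fraction of adjusted cells)."""
    return q0 / (q0 + np.exp(-m * t))

def competition(t, N, p1, p2):
    """Assumed two-strain model: each strain is slowed by both densities."""
    N1, N2 = N
    dN1 = p1["r"] * alpha(t, p1["q0"], p1["m"]) * N1 * (1 - (N1 + p2["a"] * N2) / p1["K"])
    dN2 = p2["r"] * alpha(t, p2["q0"], p2["m"]) * N2 * (1 - (p1["a"] * N1 + N2) / p2["K"])
    return [dN1, dN2]

# Hypothetical fitted parameters for the two strains
strain1 = dict(r=0.9, K=0.8, q0=0.05, m=1.5, a=1.0)
strain2 = dict(r=0.7, K=0.6, q0=0.20, m=1.5, a=1.0)

t = np.linspace(0, 12, 49)
sol = solve_ivp(competition, (t[0], t[-1]), [0.01, 0.01], t_eval=t,
                args=(strain1, strain2), rtol=1e-8, atol=1e-10)
N1, N2 = sol.y

# Predicted strain frequencies over time
f1 = N1 / (N1 + N2)
f2 = N2 / (N1 + N2)
```

These predicted frequencies are the quantities that can be compared with flow cytometry measurements taken from the mixed culture.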

#### Prediction validation

To test this method, we performed growth curve and competition experiments with two different sets of *E. coli* strains marked with fluorescent proteins. In experiments A and B we competed DH5α-GFP vs. TG1-RFP; in experiment C we competed JM109-GFP with MG1655-Δfnr-RFP (see Figure 2A-C).

In each experiment, 32 replicate monocultures of the GFP strain, 30 replicate monocultures of the RFP strain alone, and 32 replicate mixed cultures containing the GFP and RFP strains together, were grown in a 96-well plate, under the same experimental conditions. The optical density of each culture was measured every 15 minutes using an automatic plate reader. Samples were collected from the mixed cultures every hour for the first 7–8 hours, and the relative frequencies of the two strains were measured using flow cytometry (see Materials and Methods).

Empirical competition results (green and red error bars), *Curveball* predictions (green and red dashed lines), and exponential model prediction (dashed black lines; see Figure 1 for details) for three different experiments are shown in Figure 3D-F. *Curveball* performs well and clearly improves upon the exponential model for predicting competition dynamics in a mixed culture.

### c. Fitness inference

The best way to infer the relative fitness of two strains is to perform pairwise competition experiments^{7}: growing both strains in a mixed culture and measuring the change in their frequencies over time. Using *Curveball*, pairwise competitions can be predicted by simply measuring the optical density during growth in mono- and mixed cultures, without directly measuring strain frequencies.

Relative fitness (given by 1+*s*, where *s* is the *selection coefficient* of the strain of interest) can be estimated from pairwise competition results using^{3}:
where *N*_{1}(*t*) and *N*_{2}(*t*) are the frequencies or densities of the strains and *t* is time. Using *Curveball*, we inferred the average selection coefficient of the red strain based upon eq. 4 applied to the predicted densities of the strains (Figure 3D-F): *s*=0.011, 0.01, and 0.021 in experiments A, B, and C, respectively.
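
One common way to turn competition trajectories into a selection coefficient is to take the rate of change of the log ratio of the two strains. The sketch below assumes that estimator, *s*(*t*) = d/d*t* log(*N*_{1}(*t*) / *N*_{2}(*t*)), which may not be identical to eq. 4; the function name and the synthetic trajectories are illustrative only.

```python
import numpy as np

def selection_coefficient(t, N1, N2):
    """Assumed estimator: time derivative of log(N1 / N2), per unit of t."""
    log_ratio = np.log(np.asarray(N1, dtype=float) / np.asarray(N2, dtype=float))
    return np.gradient(log_ratio, t)

# Synthetic trajectories: two strains growing exponentially at different rates
t = np.linspace(0, 8, 33)                  # hours
N1 = 0.01 * np.exp(0.50 * t)
N2 = 0.01 * np.exp(0.45 * t)

s_t = selection_coefficient(t, N1, N2)     # selection coefficient at each time point
s_mean = s_t.mean()                        # average over the competition
```

For purely exponential growth the estimate is constant and equals the difference in growth rates (0.05 per hour here). With lag and stationary phases it varies over time, which is why an average over the whole competition is reported.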

## Discussion

We developed *Curveball*, a new computational approach to predict relative growth in a mixed culture from growth curves of mono- and mixed cultures, without measuring frequencies of single isolates within the mixed culture. We tested and validated this new approach, which performed better than the exponential model commonly used in the literature. *Curveball* only assumes that the assayed strains grow in accordance with the growth and competition models: namely, that growth only depends on resource availability. Therefore, this approach can be applied to data from a variety of organisms, experiments, and conditions.

We have released an open-source software package which implements *Curveball* (http://curveball.yoavram.com). This software is written in Python^{13} and includes a user interface that does not require prior knowledge in programming. It is free and open, such that additional data formats, growth and competition models, and other analyses can be added by the community to extend its utility.

Growth curve experiments, in which only optical density is measured, require much less effort and resources than pairwise competition experiments, in which the cell frequency or count of each strain must be determined^{5,7,8,14}. Current approaches to estimating fitness from growth curves mostly use the growth rate or the maximum population density as a proxy for fitness. However, the growth rate and other proxies for fitness based on a single growth parameter cannot capture the full scope of effects that contribute to differences in overall fitness^{15}.

In contrast, *Curveball* integrates several growth phases into the fitness estimation, allowing a more holistic approach to fitness inference from growth curve data and providing information on the specific growth traits that contribute to differences in fitness. We hope that *Curveball* will improve and ideally standardize the way fitness is estimated from growth curves, thereby improving communication between empirical and theoretical evolutionary biologists and ecologists.

## Conclusions

We developed and tested a new method to analyze growth curve data, and applied it to predict growth of individual strains within a mixed culture and to infer their relative fitness. The method improves fitness estimation from growth curve data, has a clear biological interpretation, and can be used to predict and interpret growth in a mixed culture and competition experiments.

## Materials and Methods

### Strains and plasmids

*Escherichia coli* strains used were DH5α (Berman lab, Tel-Aviv University), TG1 (Ron lab, Tel-Aviv University), JM109 (Nir lab, Tel-Aviv University), and K12 MG1655-Δfnr (Ron lab, Tel-Aviv University). Plasmids contained a GFP or RFP gene and genes conferring resistance to kanamycin (Kan^{R}) and chloramphenicol (Cap^{R}) (Milo lab, Weizmann Institute of Science^{16}). All experiments were performed in LB media (5 g/L Bacto yeast extract (BD, 212750), 10 g/L Bacto Tryptone (BD, 211705), 10 g/L NaCl (Bio-Lab, 190305), DDW 1 L) with 30 μg/mL kanamycin (Caisson Labs, K003) and 34 μg/mL chloramphenicol (Duchefa Biochemie, C0113). Green or red fluorescence of each strain was confirmed by fluorescence microscopy (Nikon Eclipse Ti, Figure S1).

### Growth and competition experiments

All experiments were performed at 30°C. Strains were inoculated into 3 mL LB+Cap+Kan and grown overnight with shaking. Saturated overnight cultures were diluted into fresh media so that the initial OD was detectable above the OD of media alone (1:1–1:20 dilution rate). In experiments that avoided a lag phase, cultures were pre-grown until the exponential growth phase was reached as determined by OD measurements (usually 4–6 h). Cells were then inoculated into 100 μL LB+Cap+Kan in a 96-well flat-bottom microplate (Costar):

- 32 wells contained a monoculture of the GFP-labeled strain
- 30 wells contained a monoculture of the RFP-labeled strain
- 32 wells contained a mixed culture of both GFP- and RFP-labeled strains
- 2 wells contained only growth medium

The cultures were grown in an automatic microplate reader (Tecan infinite F200 Pro), shaking at 886.9 RPM, until they reached stationary phase. OD_{595} readings were taken every 15 minutes with continuous shaking between readings.

Samples were collected from the incubated microplate at the beginning of the experiment and once an hour for 6–8 hours: 1–10 μL were removed from 4 wells (different wells for each sample), and diluted into cold PBS buffer (DPBS with calcium and magnesium; Biological Industries, 02–020–1). These samples were analyzed with a fluorescent cell sorter (Miltenyi Biotec MACSQuant VYB) with GFP detected using the 488nm/520(50)nm FITC laser and RFP detected with the 561nm/615(20)nm dsRed laser. Samples were diluted further to eliminate “double” events (events detected as both “green” and “red” due to high cell density) and noise in the cell sorter^{8}.

Fluorescent cell sorter output data was analyzed using R^{17} with the *flowPeaks* package that implements an unsupervised flow cytometry clustering algorithm^{18}.

### Data analysis

Growth curve data were analyzed using *Curveball*, a new open-source software written in Python^{13} that implements the approach presented in this manuscript. The software includes both a programmatic interface (API) and a command line interface (CLI), and therefore does not require programming skills. The source code makes use of several Python packages: NumPy^{19}, SciPy^{20}, Matplotlib^{21}, Pandas^{22}, Seaborn^{23}, LMFIT^{24}, Scikit-learn^{25}, and SymPy^{26}.

### Model fitting

To fit the growth and competition models to the growth curve data we use the *leastsq* non-linear curve fitting procedure^{20,24}. We then calculate the Bayesian Information Criterion (BIC) of several nested models, defined by fixing some of the parameters (see Supporting text 1, Figure S2, and Table S1). BIC is given by:

$$\mathrm{BIC} = n \log\left(\frac{\sum_{i=1}^{n}\left(N(t_i) - \hat{N}(t_i)\right)^{2}}{n}\right) + k \log n$$

where *k* is the number of model parameters, *n* is the number of data points, *t*_{i} are the time points, *N*(*t*_{i}) is the optical density at time point *t*_{i}, and $\hat{N}(t_i)$ is the expected density at time point *t*_{i} according to the model. We selected the model with the lowest BIC^{27,28}.
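
Model selection by BIC can be illustrated with a short sketch. It assumes the usual least-squares form, BIC = *n* log(RSS / *n*) + *k* log *n*, where RSS is the residual sum of squares; the helper name `bic` and the toy numbers are made up for the example.

```python
import numpy as np

def bic(observed, predicted, k):
    """Assumed least-squares BIC: n * log(RSS / n) + k * log(n)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = observed.size
    rss = np.sum((observed - predicted) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# Toy OD readings and two hypothetical nested model fits
od = np.array([0.10, 0.20, 0.40, 0.60])
fit_a = np.array([0.11, 0.19, 0.42, 0.58])   # simpler model, 3 free parameters
fit_b = np.array([0.10, 0.21, 0.40, 0.59])   # fuller model, 5 free parameters

bic_a = bic(od, fit_a, k=3)
bic_b = bic(od, fit_b, k=5)
best = "a" if bic_a < bic_b else "b"
```

The *k* log *n* term penalizes additional parameters, so a fuller model is preferred only when it reduces the residuals enough to offset that penalty.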

### Data availability

Data were deposited on *figshare* (doi:10.6084/m9.figshare.3485984).

### Code availability

Source code will be available upon publication at https://github.com/yoavram/curveball; an installation guide, tutorial, and documentation will be available upon publication at http://curveball.yoavram.com.

### Figure reproduction

Data was analyzed and figures were produced using a Jupyter Notebook^{29} that will be available as a supporting file.

## Author contributions

All authors designed the experiments, analyzed data, discussed the results and edited the manuscript. Y.R. and L.H. developed the model and wrote the manuscript. U.O. advised on statistical analysis. Y.R. wrote the source code. Y.R., E.D.G. and M.B. performed the experiments. M.B. performed fluorescent microscopy. J.B. advised and gave support to all experiments. L.H. supervised all the work.

## Acknowledgments

We thank Y. Pilpel, D. Hizi, I. Françoise, I. Frumkin, O. Dahan, A. Yona, T. Pupko, A. Eldar, I. Ben-Zion, E. Even-Tov, H. Acar, and E. Rosenberg for helpful discussions and comments, and L. Zelcbuch, N. Wertheimer, A. Rosenberg, A. Zisman, F. Yang, E. Shtifman Segal, I. Melamed-Havin, and R. Yaari for sharing materials and experimental advice. This research has been supported in part by the Israel Science Foundation 1568/13 (LH) and 340/13 (JB), the Minerva Center for Lab Evolution (LH), Manna Center Program for Food Safety & Security (YR), the Israeli Ministry of Science & Technology (YR), TAU Global Research and Training Fellowship in Medical and Life Science and the Naomi Foundation (MB), and the European Research Council (FP7/2007–2013)/ERC grant 340087 (JB).