Chaos in small microbial communities

Predictability is a fundamental requirement in biological engineering. As we move to building coordinated multicellular systems, the potential for such systems to display chaotic behaviour becomes a concern. Therefore understanding which systems show chaos is an important design consideration. We developed a methodology to explore the potential for chaotic dynamics in small microbial communities governed by resource competition, intercellular communication and competitive bacteriocin interactions. We show that we can expect to find chaotic states in relatively small synthetic microbial systems, understand the governing dynamics and provide insights into how to control such systems. This work is the first to query the existence of chaotic behaviour in synthetic microbial communities and has important ramifications for the fields of biotechnology, bioprocessing and synthetic biology.


Introduction
Chaos can be defined as deterministic behaviour that displays aperiodic orbits and sensitivity to initial conditions [1]. Infinitesimally small differences in initial conditions of a chaotic system will become amplified over time, making forecasting and prediction of behaviour impossible [2]. Despite being deterministic, chaotic systems possess an inherent uncertainty due fact that we can never describe the initial conditions of a system in sufficient detail. Building systems which behave in a predictable and repeatable manner is essential across fields invested in engineering biology and its applications. Evidence from studies of neural networks suggests the increasing probability of chaotic behaviour as the number of dimensions in the network grow [3,4,5]. Therefore we might expect opportunities for unpredictable behaviour to become more probable as we try and implement larger synthetic communities, or edit existing networks such as the human gut microbiome. Steps to date have not been taken to investigate the existence of chaos in small synthetic microbial networks.
A long-term goal of engineering biology is to create truly scalable and robust synthetic microbial communities [6,7]. Therefore understanding and evaluating the possibility of chaotic behaviour in a system becomes an important consideration.
Observations of chaotic behaviour in biological systems have been reported. A three species system containing one predator and two prey species has been demonstrated to produce chaotic behaviour, with dilution rate a key parameter in enabling aperiodic behaviours [8]. An eight year study of a planktonic food web measured chaotic behaviours, resulting in subpopulation abundance predictability being limited to 15-30 days, despite constant external conditions [9]. These experimental examples demonstrate that a low number of species are capable of producing chaotic behaviour and are therefore unpredictable.
In order to predict chaotic behaviour in synthetic microbial communities, we need to develop models that capture interactions between different community species. Generalised competitive Lotka-Volterra equations (gLV) have previously been used to model pair-wise interactions and infer inter-species relationships [10]. However in other circumstances, gLV models provide an incomplete description of interactions we expect to find in microbial communities. They are unable to capture the existence of chaos in three species networks [11]. Furthermore, gLV models have failed to predict community formation from pairwise interactions in microbial communities [12]. gLV models lack dynamics that occur with the accumulation and depletion of extracellular species, which can be important for predicting the true dynamics of a community [13]. Modified Lotka-Volterra equations produce chaotic behaviour in predator-prey systems by including time-delayed feedback [14,13], or in one predator two prey systems, by adding dampening effects [15]. While these abstractions are suitable in some circumstances, using them to inform gene regulation networks and community design can be difficult. By modelling the intermediates involved in competitive interactions we can include experimentally measurable mechanisms and parameters. In previous work, we have modelled quorum sensing (QS) to regulate bacteriocin expression and engineer inter-population interactions.
These methods allowed us to tune experimental parameters of an existing two strain system [16], and predict the most promising topologies for producing stability in two and three strain systems [17].
The existence of chaos in dynamical models and identification of chaotic parameter space can be identified using various optimisation techniques. The unscented Kalman filter has previously been used to investigate chaos in electrical circuits and biological systems, obtaining parameters yielding chaos [18]. Simulated annealing has been applied to finding chaotic parameters in four species standard Lotka Volterra models [19]. Evidence also suggests that perturbation of system parameters can be used to drive systems towards or away from chaotic attractors [20]. The possibility of chaos in synthetic microbial communities, to our knowledge, has not been previously considered.

Searching for chaos in microbial community models
In previous work we developed a model framework to describe QS regulated bacteriocin interactions in a three strain model space, and predicted topologies that form stable communities [17]. Here we use this same model space to investigate the existence of chaos in three strain synthetic microbial communities. Figure 1a shows the pipeline we developed to search for chaos in synthetic three strain systems. The initial model space describes an enumeration of possible combinations of bacteriocin and QS systems. Prior parameter distributions describe the range of characteristics for the different parts (Table 1). We expected the existence of chaos to be sparse in this three strain model space, and therefore computationally expensive to explore. Approximate Bayesian Computation Sequential Monte Carlo (ABC SMC) is a method that can be used for model selection and parameter inference in dynamical systems [21]. The definitions we use to classify oscillatory and chaotic behaviours are described in Methods. We also define an extinction threshold of 10 −5 , if a strain population falls below this it is classified as extinct. In order to narrow down the search, we first performed ABC SMC for an oscillations objective. Oscillations are a known route to chaos [1]. Although this is not an exhaustive search, we assumed it to be sufficient for prioritising in our search for chaos. We identified 117 models capable of producing oscillations, these models were then considered for the chaos objective (Figure 1a).
The chaos objective is defined by calculating the maximal Lyapunov exponent (λ 1 ). We calculate λ 1 by initialising two nearby orbits and measuring their divergence or convergence over the course of a simulation (Methods). λ 1 < 0 corresponds to linear stability, λ 0 = 0 corresponds to periodic oscillations, and λ 1 > 0 corresponds to chaos. Due to the limited time frame from which calculate the Lyapunov exponent, oscillations are unlikely to give rise to precisely λ 1 = 0. By running ABC SMC for an objective of λ > 0 and manually inspecting the trajectories in this population we determined a threshold of λ 1 > 0.003, above which we are confident only chaotic behaviour exists, and below which only periodic oscillatory behaviour exists. Performing ABC SMC for the chaotic objective (λ > 0.003), we identified 25 models that produced chaotic behaviour. The posterior probabilities of the models are shown in Figure 1b. Figure 1c shows a representative chaotic trajectory, demonstrating aperiodic non-repeating behaviour, satisfying the qualitative features of chaos.

Properties of chaotic models
We next explored some of the properties of chaotic topologies we identified using ABC SMC. classic debate on the complexity-stability relationship in theoretical ecology is likely highly dependent on the nature of the biological interactions involved [22,23], but here we see some evidence for a peak in the probability of chaotic behaviour at four parts. This is in contrast to our previous findings, where system stability increased with the number of parts [17].
Strains QS x x Initial model space Figure 1: Overview of the pipeline for identifying chaotic topologies a By combining engineering options in different combinations, we generate 4182 models that form our initial model space. We then perform ABC SMC for an oscillatory objective which yielded 117 models that were capable of producing oscillations. These form the prior model space for the chaos objective. b The barchart shows the probability of models for the chaotic objective. The error bars represent the standard deviation. c An example time series representative of the dataset; it shows sustained, nonrepetitive oscillatory behaviour for the three species community.  teriocin produced by a different strain. Both SL only and a combination of SL and OL interactions are associated with producing chaotic behaviour. These observations are interesting in comparison to other work on ecological systems. Cooperative interactions were previously found to give rise to unstable systems, whereas competition was more indicative of stability [24]. The same effect might occur here in systems with one QS, rather than two, as the system would be expected to have increased correlation. While chaotic behaviour appears to be very different from linear stability, both behaviours share the necessity for coexistence. This may explain why we see tendencies for topologies to share a mixture of stability associated SL interactions, and instability associated OL interactions. We also find models with three bacteriocins, and hence higher suppression of growth, have a low posterior probability for chaos.

Parameter importance for chaos
The model with the highest posterior probability for chaotic behaviour was m 850 , the topology is shown in Figure 3a. It consists of a single QS system, produced by N 1 , that positively regulates two bacteriocins. B 1 is produced by N 1 and N 2 but it inhibits the growth of N 1 only. B 2 is produced by N 3 and inhibits the growth of N 3 only. The system in total consists of four expressed parts. m 850 also ranked highly for the oscillatory objective, ranking 3rd out of the initial 4182 models. This presents an interesting problem whereby a model that has promising use as an oscillator also has a high potential to produce chaos, relative to other candidate models. Identifying the parameters and initial conditions important for differentiating between chaotic and oscillatory behaviour gives us insight into how to control this behaviour when constructing genetic circuits or selecting chemostat settings.
As a first step, we analyzed the model to quantify the possible steady states and basins of attraction. Our analysis gave analytical conditions for the existence and stability for complete extinction and for single strain survival (See Methods). For three-strain co-existence, we find the following necessary conditions: This shows that for three-strain co-existence, the maximal growth rate of N 2 has to lie between certain upper and lower bounds. In particular, it has to be smaller than the maximal growth rate of N 1 or N 3 . We can see from the topology of m 850 (Figure 3a) that the growth of N 2 is not limited by any bacteriocin, therefore the only limitation on growth comes through resource competition. If N 2 had a higher growth rate than N 1 or N 3 , it would out compete these strains and cause an extinction event.
We then wanted to explore the most important parameters that separate oscillatory and chaotic behaviours in m 850 only. We refer to a set of parameters and initial conditions as an input vector. Using ABC SMC, we performed parameter inference on m 850 for the chaotic and oscillatory objectives, generating 3750 input vectors for each objective. We can use this dataset of labelled input vectors to understand the importance of individual parameters, initial conditions and nearby steady states. Figure 3b shows multivariate parameter distributions for the oscillator and chaotic objectives for the experimentally accessible parameters. The dilution rate (D) is a directly controllable parameter of the chemostat. The production rate of A 1 (kA 1 ) can be tuned by using an inducible promoter to control expression of the AHL synthase species. Strain maximal growth rates (µ max1 , µ max2 , µ max3 ) can be controlled by using different base strains or through the combined use of auxotrophic strains and defined media. Finally, the initial population densities (N 1 , N 2 , N 3 ) can easily be set when inoculating the initial culture. Divergence between two parameter distributions indicates its importance in differentiating between the two objectives. We can see that the oscillatory objective distributions for D, N 1 and µ max2 are all constrained towards lower values relative to the prior. However, for all these distributions we can see that the chaotic and oscillatory regions overlap. This again implies that chaotic and oscillatory behaviour exist close to one another in parameter space, and highlights the multidimensional nature that determines the behaviour.
To further investigate the importance of parameters and initial conditions we trained a random forest classifier model using the input vectors as features. This classifier model was able to classify the test set with a~90% accuracy (Methods, Figure 6). Figure 3c shows the average information gain across all decision tree classifiers in the forest for all free parameters. This can be used as an indicator of feature importance in correctly classifying an input vector. K A 1 B 1 and K A 1 B 2 describe the concentration of A 1 required to produce half-maximal repression of bacteriocins B 1 and B 2 respectively. While the feature importance indicates these parameters are the most important, they are more difficult to tune compared with other parameters in this system. The error bars indicate the variability in the importance of a feature across all trees in the forest. Large error bars suggest single features are not essential for classification, and that redundancy exists between the features used [25].
From the set of chaotic input vectors, we used numerical methods to identify nearby steady states by changing the initial state of the system. Figure 3d shows the sensitivity analysis of a chaotic input vector, with a nearby three-strain stable steady state shown in green. Starting from the stable steady state (green), we perturbed the initial species values of either N 1 , B 1 or B 2 individually. The plots show how changing these initial states yields different Lyapounv exponents, highlighting the chaotic region in red. The range of Lyapunov exponents shown in Figure 3d suggest that by changing the initial conditions only we are able to produce a range of different behaviours. Perturbing N 2 , N 3 or A 1 did not produce chaotic behaviour. It is interesting that the initial state of N 1 as the A 1 producing strain appears to more important whereas the initial concentration of A 1 itself is not.

Exploring the parameters in the transition to chaos
Being able to move a system from a chaotic state to a fixed point could be important in a bioprocess control scenario so we explored this in more detail. Previous studies have frequently identified the bioreactor dilution rate as an important parameter for transitioning between different population dynamics [26,27,28]. Figures 3b strongly indicated D to be important for defining chaotic behaviour. We previously identified the QS production rate, k A1 and the dilution rate, D, as important parameters for transitioning between co-existence and extinction states [16]. We hypothesised that the antagonistic effect of k A1 to D would make it also make it a useful parameter for controlling population behaviour.
First, we took an input vector known to produce chaotic behaviour and randomly sampled new values for k A1 and D from the prior and calculated the Lyapunov exponent of the new input vector. Figure 4a shows the results where filled colour indicates the maximal Lyapunov exponent calculated at each grid reference. The grid outline indicates the classification range and the red grid region of Figure 4a shows the chaotic region. As can clearly be seen, changing D and k A1 affects the Lyapunov exponent. The bifurcation diagrams in Figure 4b and c for k A1 and D respectively, illustrate the antagonistic transitions in behaviour that occur when changing the two parameters. Figure 4a and b show transitions through one strain extinctions (N x < 10 −5 , stable steady state, oscillations and chaotic behaviour. Figure 4a and c both show that increasing k A1 results in transitions from   Table 2 stable co-existence, through oscillations and then to chaos, followed abruptly by an extinction event. Figure 4b and c both show that low a lower dilution rate is associated with chaos; increasing the dilution rate reduces instability to produce oscillations, which abruptly transitions to a stable extinction state.
In a bioreactor control scenario it is interesting to understand if a community could be switched between states in real time. Figures 4d and e show how this is possible by modifying k A1 and D respectively. The red arrows on Figure 4a indicate the position of the single start point and two end points in these real-time transitions. It's important to note that when ramping up the dilution rate in real-time, we reach stable steady state in a region that would not be obtainable with a fixed dilution rate.

Discussion/Conclusions
We developed a novel methodology to explore parameter regions that give rise to chaotic dynamics.
We applied it to the exploration of chaotic dynamics in synthetic microbial communities and found a high prevalence of such dynamics in these systems. This work is the first to query the existence of chaotic behaviour in synthetic microbial communities. We show that we can expect to find chaotic states in relatively small synthetic microbial systems, which has important ramifications for the field.
We expect it will become increasingly important to consider the location of chaotic attractors in parameter space as the microbial communities we build or interact with become more complex.
These methods can easily be applied to parametrise different models. It would be interesting to compare the existence of chaotic attractors in systems that use toxin-antitoxin systems [29], combination of cooperative and competitive interactions [30], or mutualistic only interactions [31]. Full scale metabolic models contain a large number of linear reactions [32], they can be combined to describe microbial communities and used to model industrial bioprocesses [33,34]. Given the high dimensional nature of metabolic networks, it would be interesting to investigate whether these models yield chaotic behaviour in small community networks.
To conclude, we have developed methods for identifying chaotic parameter regions using ABC SMC. Although chaotic attractors are generally thought to be sparse in low dimensional systems, we have shown their existence in realistic synthetic microbial systems. They may also exist in close  proximity to stable steady state regions. This work demonstrates that deterministic chaos will be an important factor in microbial community design and should be studied in much more detail.

Model space
Models are generated from a set of parts, which are expressed by different strains in the system. We Two strain model space: Three strain model space: This enables us to build possible part combinations that can be expressed by a population. Let P c be a family of sets, where each set is a unique combination of parts.
Each strain in a system can be sensitive to up to one bacteriocin. Let I represent the options for strain sensitivity. In the two strain model space, the options are insensitive, sensitive to B 1 or sensitive to B 2 (0, 1 and 2 respectively). In the three strain model space, where the options are insensitive, sensitive to B 1 , sensitive to B 2 or sensitive to B 3 (0, 1, 2 and 3 respectively).
Two strain model space: Three strain model space: Each strain is defined by it's sensitivities, and expression of parts. Let P E be all unique engineered strains: Which can be combined to form a model, yielding unique combinations in two strains and three strains: Two strain model space: Three strain model space: Finally, we use a series of rules to remove redundant models. A system is removed if: 1. Two or more strains are identical, concerning bacteriocin sensitivity and combination of expressed parts.
2. The QS regulating a bacteriocin is not present in the system.

3.
A strain is sensitive to a bacteriocin that does not exist in the system.

A bacteriocin exists that no strain is sensitive to.
This cleanup yields the options which are used to generate ODE equations for a system.

System equations
State variables in each system are rescaled to improve speed of obtaining numerical approximations.
N X describes the concentration of a strain, B z describes the concentration of a bacteriocin and A y describes the concentration of a quorum molecule. C N , C B and C A are scaling factors: Each model is represented as sets where N defines the number of strains, B defines the set of bacteriocins and A defines the set of QS systems. The following differential equations are used to represent each model.
Growth modelled by Monod's equation for growth limiting nutrient, S. mu xmax defines the maximal growth rate of the strain and K X defines the concentration of substrate required for half-maximal growth.
Killing by bacteriocin is defined by ω(B z ), where ω max defines the maximal killing rate which is set to 0 if the strain is insensitive. K ω defines the concentration at which half-maximal killing occurs.
Induction or repression of bacteriocin expression by QS, is defined by k B (z, y ), where z defines the bacteriocin being expressed and y defines the quorum molecule regulating its expression. KB max z is the maximal expression rate of the bacteriocin and K Bz is the concentration of quorum molecule at which bacteriocin is half-maximal. n z defines the cooperativity of the AHL binding.

Software packages and simulation settings
The ABC SMC model selection algorithm was written in python using Numpy [35], Pandas and Scipy [36]. ODE simulations were conducted in C++ with a Rosenbrock 4 stepper from the Boost library [37]. All simulations use an absolute error tolerance of 1e−9, and relative error tolerance of 1e − 4. Non negative matrix factorisation was conducted using Scikit-learn [38]. Simulations were conducted for 5000hrs, and were stopped early if the population of any strain fell below 1e − 5 (extinction event). Simulations with an extinction event have distances set to maximum in order to prevent excessive time spent simulating collapsed populations.

Oscillatory population dynamic objective
We define the oscillatory population dynamic using three summary statistics for each strain. First, we use Fourier transform of the population signal to find the maximum frequency, f , and convert this  to the period, T.

T = 1/f
We set a minimum period of t/2 where t is the simulation time, giving us d o 1 . d o 1 . Any simulations in which T < t/2, d o 1 is set to 0, this distance ensures that all we have frequencies of oscillations that are on a scale relevant to the time period being measured. It was found that using the signal frequency alone resulted in acceptance of many simulations with very small oscillations, or simulations that rapidly dampen. We therefore generated two additional distances that account for oscillation amplitudes to select for sustained oscillations only. We can define the number of expected peaks in the simulation, p.
Peaks in the trajectory are identified by changes from a positive gradient to a negative gradient, and troughs via changes from negative gradient to positive gradient. The peak-to-peak amplitudes are calculated by differences between consecutive peaks and troughs. A K is the number of amplitudes above the threshold, K = 0.05. d o 2 is the difference between the number of expected oscillations in the simulation, and the count of above threshold oscillations. Because incomplete oscillations at the time the simulation ends can impact the distance measurement, we set a lenient final distance threshold for d o 2 . d o 3 compares the final amplitude A F in the simulation to the threshold. We set If any strain falls below an OD of 10 −5 , the population is deemed extinct and the particle rejected.

Maximal Lyapunov exponent calculation
Lyapunov exponents can be used to measure chaotic behaviour; they describe the average exponential rate of divergence between two near trajectories of a dynamical system. The maximal Lyapunov exponent, λ 1 , can be used as determinant of chaotic behaviour. Using a method described by Sprott et al. [39], We evolve two nearby orbits and measure their average rate of separation.
This directly investigates whether small changes to an initial state will produce a disproportionate separation. By periodically readjusting the distance of divergence after each time step we measure separation across a period of time, preventing a single event dominating subsequent states (

Chaos population dynamic objective
d C 1 is the only distance for the chaotic objective. If d C 1 < 0, the particle is rejected. The final distance threshold, C , is equivalent to all λ 1 > 0.003.
For each sampled particle a prescreening process was performed to minimise time spent conducting the more computationally time consuming dual-orbit method. Simulations in which an strain fell below 1e-5 were rejected. The number of oscillations with an amplitude greater than 0.05 was counted for each strain signal. If any strain showed less than 2 oscillations the particle was rejected.
ABC SMC was conducted with population sizes of 10, repeated 255 times yielding a combined final population of 2550 particles.

Random forest classifier model
Using the sci-kit learn (sklearn) python package [38], a random forest classifier was trained using 2000 estimators. The data used consisted of 3750 oscillatory input vectors, and 3750 chaotic input vectors. Training and test datasets were generated with a ratio of 0.5 by random sampling. Figure 6 shows the performance of the classifier model on the test data.
By setting the right hand side of (1) to 0 we find a number of steady states P = (N 1 , The trivial steady state. P 0 = (0, 0, 0, S 0 , 0, 0, 0). The Jacobian of the linearisation has eigenvalues Consequently the trivial steady state always exists and is linearly stable for This shows that if the dilution rate is high enough, no strain can survive.

One strain only steady states
There are three steady states where only one strain survives, P 1 , P 2 , P 3 . While P 2 , and P 3 can be calculated explicitly, P 1 is given implicitly (see below).
We start with P 2 : We see that P 2 exists provided The linearisation at P 2 has eigenvalues This shows that P 2 exists and is linearly stable if The situation for P 3 is very similar: We see that P 3 exists provided The linearisation at P 3 has eigenvalues This shows that P 3 exists and is linearly stable if The steady state P 1 = (N 1 , 0, 0, S, B 1 , 0, A 1 ) is more complicated and can not be expressed explicitly. Instead it is given as follows: Assume there exists a solution S to the following equation where If such a solution S exists then B 1 = B 1 (S), A 1 = A 1 (S) and N 1 = DC B k A 1 C N A 1 .

Lemma 1
There exists a unique steady state P 1 if and only if Proof: We need the solution to (2) to fulfil S < S 0 in order for A 1 to be positive. We interpret the right and left-hand-sides of (2) as a functions of S, denoting them by R(S) and L(S) respectively.
It is easy to see that A 1 (S) is a decreasing function of S, B 1 (S) increases as a function of . To summarise, the single-strain survival steady state requires the corresponding maximal growth rate to be large compared to other parameters.
From the equation for N 2 we obtain that , ω max µ 2max µ 3max − µ 2max } Stability. Solved numerically using MATLAB. For each of the 3750 chaotic input vectors we used numerical root finding to calculate P 123 , and determined its stability by numerically determining the eigenvalues of the Jacobian. We found P 123 existed for all 3750 input vectors and was stable for 7.8% of them.