Microbial diversity and ecological complexity emerging from environmental variation and horizontal gene transfer in a simple mathematical model

Microbiomes are generally characterized by high diversity of coexisting microbial species and strains that remains stable within a broad range of conditions. However, under fixed conditions, microbial ecology conforms with the exclusion principle under which two populations competing for the same resource within the same niche cannot coexist because the less fit population inevitably goes extinct. To explore the conditions for stabilization of microbial diversity, we developed a simple mathematical model consisting of two competing populations that could exchange a single gene allele via horizontal gene transfer (HGT). We found that, although in a fixed environment, with unbiased HGT, the system obeyed the exclusion principle, in an oscillating environment, within large regions of the phase space bounded by the rates of reproduction and HGT, the two populations coexist. Moreover, depending on the parameter combination, all three major types of symbiosis obtained, namely, pure competition, host-parasite relationship and mutualism. In each of these regimes, certain parameter combinations provided for synergy, that is, a greater total abundance of both populations compared to the abundance of the winning population in the fixed environments. These findings show that basic phenomena that are universal in microbial communities, environmental variation and HGT, provide for stabilization of microbial diversity and ecological complexity.

In this section we are going to derive the deterministic equations governing the populations sizes n(t) and m(t) of A and B populations, respectively, from the more elementary processes.The derivation is based on the approach developed in [1,2] We assume that the carrying capacity of the environment is fixed K = const.The available resources in the environment at time t is K − n(t) − m(t).We assume that in each small time interval ∆t, either one or two entities are selected.These selections are governed by probabilities µ and 1 − µ, respectively.Specifically, for type A (and analogously for B), the elementary processes are as follows: Processes with two entities (1) AE → AA, reproduction, AA → AE, intraspecies competition, AB → EB, interspecies competition, AB → AA, gene trasfer A → B, One entity process A → ∅, death of A We denote the probability of encountering n(t) units of type A and m(t) units of type B at time t as P (n, m, t).The temporal evolution of P (n, m, t) follows the Markov dynamics, characterized by the following transition rates: Here ρ A and δ A are reproduction and death rates respectively.c AA , c AB , and standing for intra and inter-species competition, γ AB is the gene transfer rate from B → A once A and B are chosen.to get the intuition, let us consider the transition rate T (n + 1, m|n, m), that is the reproduction of A. The rate is defined as follows, with probability µ two ball process is chosen, then with probability n/K an entity of A type and K − n − m/K − 1 an entity of E is chosen, and finally with ρ A the birth happens.
The master equation for P (n, m, t) has the following form We multiply (3) by n and average over possible values of n, m ∈ [0, K].That is defining the average < n >= K n,m=0 nP (n, m, t) we get from (3) here S ′′ = {{n, m + 1} , {n, m − 1}} is the set of states that include variations in m only, the sum over these states nullifies in.Indeed, from (4) we get Changing the variable in the summation m = m ′ − 1 in the first term and m = m ′ + 1 in the third one, and noting that T (n, −1|n, 0) = T (n, 0|n, −1) = T (n, K|n, K +1) = T (n, K +1|n, K) = 0 we get that the right hand-side nullifies.
For the states in S ′ /S ′′ we do the same trick, that is changing the summation variables for n and m.We further assume that the averages in the form < nm >=< n >< m >, that is the mean-field approach, and finally neglecting the terms of 1/K in intra and inter species transition rates, that is K(K−1) (repeating the same procedure for B), we get the deterministic equations governing the population sizes n(t) =< n(t) > and m(t where , here <> are standing for averaging over P (n, m).Those for B are obtained in similar way.
From (6,7) one can derive the equivalent representation of the evolutionary processes in terms of the total abundance of both populations N (t) = n(t) + m(t) and the fractions of A and B populations in the total abundance p A = n(t) N (t) and p B = m(t) N (t) , represented in the (3,4) in the main text.
In the main text, we focus on the situation where all competition rates are equal a ij = a and γ = 0.In this case we get α = r B r A and δ = r A r B , thus, if r A ̸ = r B , then there is no coexistence of both populations, even more there is no bistability either.The only stable equilibrium state in this case is either (1, 0) or (0, 1), depending on the reproduction rates r A and r B .The, one with greater reproduction rate outcompetes the opponent.

II. OSCILLATIONS IN THE ENVIRONMENT: COARSE-GRAINED DESCRIPTION
In the fixed environment the fractions and total abundance of both populations are found by the following system of differential equations where p A + p B = 1.The fitnesses of the populations can be interpreted as expected payoffs in the following game Thus, the fitness of A and B populations are equal to ), where we have taken into account p A + p B = 1.Note, that the total abundance of both populations proliferates with by the mean fitness of both populations.In the equilibrium, the fitnesses of each population, presented in the equilibrium, nullifies.
We now proceed to the derivation of coarse-grained behavior of the fractions and total abundance of both populations in oscillating environment.Though, it has to be noted that the same results could be obtained by considering the population sizes n(τ ), m(τ ) described by (6,7).
We assume the environment is periodically varying with frequency ω > 1 and period 2π.We assume the environment oscillates in the fast time scale τ = ωt.
The variation in the environment induce oscillations in the reproduction rates and gene transfer balance rate, that is while the within and between-competition rates are time independent a ij = const.We will further assume that the time dependent rates can be represented as a sum of constant and oscillating parts r i (τ ) = r i + ri , i = A, B, such that r i (τ ) ≡ 2π 0 dτ 2π r i (τ ) = r i , and γ(τ ) = γ.Thus, in terms of the average rates we recover the deterministic description of the model in the constant environment (14,15) in the main text.
The fitnesses of each population in the oscillating environment can be represented as sum of terms with the mean rates and oscillating parts, for population A these are here we used the normalizations p A (τ ) + p B (τ ) = 1.We will discuss only for p A (τ ) = p(τ ).The case for B can be easily recovered then.
The representations of the fitnesses as a sum of two terms allow us to rewrite the right hand-side of (14) as a sum of two term G(p(τ ), N (τ )) + G(p(τ ), N (τ )), where the last term includes only oscillating rates.Similarly, for (15) we represent the right hand-side as a F (p(τ ), N (τ )) + F (p(τ ), N (τ )).
We seek the solution of (14,15) with time-dependent rates as a sum of slowly-varying (varying in t) and oscillating parts (varying in τ ) The oscillating parts averages to zero π(p(t), N (t), τ ) = V(p(t), N (t), τ ) = 0. Now we put (20,21) into the equations, and expand the right-hand side over π(p(t), N (t), τ ) and V(p(t), N (t), τ ) The oscillating parts are searched for via expanding by 1 ω , that is π = ∞ k=1 1 Note, that the averages over the period of environmental variations is again zero for π k , V k .We insert them back into (22,23), and keep the terms up to 1 ω 2 .The obtained equations containt terms in various order of magnitude and varying on different time scales.We first collect the terms of O(1) that vary on the fast time.These are These equations could be directly integrated, since the left hand side depend of both equations depend on τ through the oscillating rates.Denoting the primitives of reproduction and gene transfer balance rates as ri , i = 1, 2 and γ, that is ∂ τ ri = ri and ∂ τ γ = γ, then we get π 1 = Ĝ and V 1 = F , where Ĝ and F are the right hand sides of (24,25) with primitives of the rates.Now, we average (22,23) over the period of oscillations, note that the terms G and F .Also, the following terms nullify π 1 τ depend on τ , and these have zero averages.Similarly, on the left hand side the terms of the form ṗ ∂ ∂p π, also nullify after averaging.Thus, we end up with the following set of differential equations In the averaging process, we use the solutions of (24,25).Note, the following identities between the oscillating rates ĥ1 h1 = 0 and ĥ1 h2 = − h1 ĥ2 , where h 1 , h 2 stand for reproduction and gene transfer balance rates.Using the last observation, one can show that the diagonal terms π 1 here we have used the above identities.The remaining non-zero term is found in similar way Putting these terms back into the system (26,27) we obtain where we have denoted ξ = 1 ω rB γ and κ = 1 ω rA γ.Note, that the fitnesses f A , f B in (30,31) are the same as in the fixed/averaged environment.
The added terms could be represented in the form of additional fitnesses.That is Solving these equations for ϕ A , ϕ B we find that ϕ A = N ξ(1 − p) and ϕ B = −N κp.
Thus, dynamical analysis of the (I B) holds with these substitution.
For example, we obtained that coexistence is possible if α < 1,δ < 1 and αδ < 1.Now we substitute a 12 and a 21 by a 12 − ξ and a 21 + κ, respectively, and end up with the following conditions Particularly, for the γ = 0 and a ij = a case we get the conditions discussed in the main text As we have discussed in the SectionI B, the coarse grained approach failed in the regime where (39) is valid, but (40) holds in reverse.In this case, the community matrix A, representing the competition rates within and between populations has negative determinant.Indeed, the community matrix has the following form from which one get that DetA = a(ξ − κ) + ξκ, then we get DetA < 0 if (40) holds with reverse inequality sign.Note, that the DetA stands in the denominator of the total abundance of both populations in the coexisting regime N * c .Indeed, as one get closer to the a(ξ − κ) + ξκ = 0 line as greater becomes N * c , thus increasing the discrepancy between the results of the coarse-grained approach and the time average of the solution obtained via time-dependent rates.Before going further into the details, let us focus on the other limitation of the model.
The new payoff terms are determined via the averages of the product of gene transfer balance rate γ and the primitives of the oscillating reproduction rate ri , i = A, B, as it is seen from the form of new payoff terms ξ = 1 ω rB γ and κ = 1 ω rA γ.Therefore, γ and −γ may provide the same payoffs ξ and κ, as long as the sign of the reproduction rates are also changed.In terms of the coarse-grained approach these two cases provide the same outcome, however, the numerical solution of the model with time-dependent rates shows that the change in the sign of the rates may change the dynamics drastically, see Fig. (1).As it seen from the figure, the coarse-grained behavior of the fraction and total abundance of both populations is in accordance with γ = 1 2 sin τ .Thus, the coarse-grained approach reveals the behavior of the oscillatory behavior of one of the possible signs, but not both.
To analyze further the limitation and discrepancy between the results obtained via coarse-grained approach and the solution of the model with time-dependent rates we introduce the following binary variable in the gene transfer balance γ = zh(τ ), where z = ±1 and h(τ ) is a periodic function.
Then for each ξ, κ we calculate the following averages for both z ± 1 and two different initial fractions p ′ = 0.1, p ′′ = 0.9, while N (0) = const.The corresponding initial conditions for coarse-grained approach are found from (36 where p(τ ), N (τ ) are obtained via numerically solving (14,15).The limits of integration should be distinct enough to yield meaningful averaging.We compare these quantities with their counterparts obtained via coarse-grained approach at time T f .As a measure of discrepancy we use Euclidean distance and relative error |N − ⟨N ⟩|/N for the fractions and total abundance of both populations, respectively, here p A = p A (T f ) and N = N (T f ).Thus, the maximum error in the fractions is √ 2. We use relative error in the total abundance, since the average ⟨N ⟩ remains bounded, in general, while N (T f ) may be unbounded,as we have seen above.
The last step is we choose minimize the discrepancy between the fractions of both populations with respect to z = ±1 and p(0) = (p ′ , p ′′ ), that is min z=±1,p ′ ,p ′′ ||p A − ⟨p A ⟩|| (44) While, the relative errors of the total abundance of both populations are calculated at z, p(0) for which the above minimization is obtained, for any given ξ and κ.
The criterion (44) is used to construct the heat map of the euclidean distances presented in the Fig. 2. As it is seen from the figure, the discrepancy between the coarse grained approach and the numerical solution obtained via time-dependent rates is negligible in the regions where either A or B wins.The discrepancy occurs in the coexistence regime, as one might expect.Comparing the relative error in the total abundance of both populations we see that it increases as one get closer to the line a(ξ − κ) + ξκ = 0, as we have expected.
The worst case is presented in Fig. 3, that is max ξ,κ min z±1,p ′ ,p ′′ ||p A − ⟨p A ⟩||.Even in the worst case, both methods represent coexistence between two populations, although the fractions are different.
VALIDATION AND LIMITATION OF THE MODEL.

FIG. 2 :
FIG. 2: Discrepancies between the solutions obtained via coarse-grained approach (34,35) and (14,15) with time-dependent rates.The top and bottom rows shows the discrepancy between the fractions and total abundance of both populations, respectively.Here γ = z/2 sin τ , rA = a cos τ + 0.5 sin τ , rB = b cos τ , where a, b are found from the values of ξ, κ.The remaining model parameters are T f = 200,T 1 = 100 ω = 3, a = 0.1, r A = 1.8 and r B = 1.