Functional convergence in slow-growing microbial communities arises from thermodynamic constraints

The dynamics of microbial communities is complex, determined by competition for metabolic substrates and cross-feeding of byproducts. Species in the community grow by harvesting energy from chemical reactions that transform substrates to products. In many anoxic environments, these reactions are close to thermodynamic equilibrium and growth is slow. To understand the community structure in these energy-limited environments, we developed a microbial community consumer-resource model incorporating energetic and thermodynamic constraints on an intercon-nected metabolic network. The central element of the model is product inhibition, meaning that microbial growth may be limited not only by depletion of metabolic substrates but also by accu-mulation of products. We demonstrate that these additional constraints on microbial growth cause a convergence in the structure and function of the community metabolic network—independent of species composition and biochemical details—providing a possible explanation for convergence of community function despite taxonomic variation observed in many natural and industrial en-vironments. Furthermore, we discovered that the structure of community metabolic network is governed by the thermodynamic principle of maximum free energy dissipation. Our results predict the decrease of functional convergence in faster growing communities, which we validate by ana-lyzing experimental data from anaerobic digesters. Overall, the work demonstrates how universal thermodynamic principles may constrain community metabolism and explain observed functional convergence in microbial communities.

nected metabolic network. The central element of the model is product inhibition, meaning that 23 microbial growth may be limited not only by depletion of metabolic substrates but also by accu-24 mulation of products. We demonstrate that these additional constraints on microbial growth cause 25 a convergence in the structure and function of the community metabolic network-independent 26 of species composition and biochemical details-providing a possible explanation for convergence 27 of community function despite taxonomic variation observed in many natural and industrial en-28 vironments. Furthermore, we discovered that the structure of community metabolic network is 29 governed by the thermodynamic principle of maximum free energy dissipation. Our results predict  116 The flux through the reaction is determined by reversible Michaelis-Menten kinetics 42 : where E is the enzyme concentration, k cat , K S , K P are enzyme-specific parameters describing 118 the chemical kinetics, T is the temperature, and R is the gas constant. In thermodynamic 119 parlance, P S exp ∆G 0 SP RT is the Boltzmann factor of the free energy difference of the reaction 120 after accounting for resource concentration gradient, ∆G ′ SP = ∆G 0 SP + RT log P S . Note that To study the features shared across slow-growing communities in energy-limited environ-162 ments, we repeated the community assembly experiment from five separate pools of 600 163 species each. Each species is characterized by its enzyme budget, chosen from a lognormal 164 distribution, fractionally allocated to a random subset of 15 possible reactions. Species were 165 not shared across pools. We also varied enzymatic parameters (chemical kinetic parameters) 166 to vary between pools. The environment and its energetics was shared across the experi-167 ments, i.e., the resource supply, resource energies, and energy assimilated in each reaction 168 (which is constrained by stoichiometry) was the same across experiments. Simulations of all 169 pools converged to a steady-state maintained by a community of the surviving species. This 170 resembles observations in many microbial community models and experiments 50 . 171 We analyzed the final steady-state communities to find emergent properties of energy-172 limited communities that were shared across pools that differed in species and enzyme con-  Fig. 2A). This subset of reactions was shared across communities, despite starting from 178 different species pools ( Fig. 2A,B). Hence, the reactions utilized by the surviving species 179 were determined by the environment. Furthermore, the metabolic network structure was 180 constrained. The in-degree of each node was 1, i.e., each resource was produced by only a 181 single reaction.

199
Taken together, the three observations above mirror the functional convergence despite 200 taxonomic divergence observed in many natural communities 30-32 . We investigated the model 201 analytically to understand why these slow-growing, energy-limited communities share these 202 emergent properties. We uncovered fundamental thermodynamic principles governing func-203 tional convergence in these energy-limited communities, which we explain in the next section.

204
Principle of maximum free energy dissipation explains selected community metabolic 205 network 206 We first sought to understand why the same metabolic reactions are realized in the final 207 community across species pools. We were able to make analytical progress by focusing our 208 attention on slow-growing communities growing on low-energy reactions. At steady-state, 209 the biomass growth in communities replaces loss by dilution, δ. In slow-growing communities, the growth at steady-state is much smaller than the maximum growth rate of the species g max .

211
Motivated by conditions in ocean sediments and bioreactors, the substrate concentration in 212 the resource supply in simulations was larger than the typical K S 3,30,48,51 . The high substrate 213 levels in the supply mean that growth was primarily slowed by the accumulation of products.

214
Since growth is slowed down by product accumulation, we can relate the concentrations 215 of substrates and products of an active reaction in the final steady-state community. As an 216 example, consider reaction R 0 → R 1 in a community with four resources (Fig. 3A). If the 217 reaction is active in the final steady-state community, then we have: where R * 1 , R * 0 are the steady-state concentrations of the resources and −∆G 0 01 is the free 219 energy dissipated in the reaction R 0 → R 1 , defined as the negative of the standard free energy 220 difference of the reaction (Eq. (1)). This condition implies that reactions are slowed down to 221 near thermodynamic equilibrium due to product accumulation. The first (and leading order) 222 term depends only on the thermodynamics of the reaction. The second term, δ gmax , which 223 contains all the properties specific to the species and enzyme, is only a small correction. 224 We now consider the production of the second resource, R 2 , which can be produced via 225 two possible reaction paths (Fig. 3C). To analyze this scenario, we derive a crucial extension 226 to Eq.(3) that applies to reaction paths, composed of multiple reactions, that are active in 227 the final community (see SI for derivation). If a reaction path proceeding from R i to R j is 228 active in the final community, we have: network is always one (c.f. Fig. 2A,B).

235
To understand which of the two paths is selected in the final community, we can consider 236 the competition of two species utilizing reactions R 1 → R 2 and R 0 → R 2 added to the 237 community with reaction R 0 → R 1 (see Fig. 3B). The species utilizing the reaction R 1 → R 2 238 (and hence path R 0 → R 1 → R 2 ) will drive product accumulation until while the species utilizing the reaction R 0 → R 2 will drive product accumulation until ≈ 240 e 0.5/RT . Since e 0.7/RT > e 0.5/RT , the former species can invade and grow in the steady-state 241 environment of the latter, driving the product accumulation to levels where the latter cannot 242 grow. Thus, the species utilizing the path dissipating more free energy is able to invade and 243 displace its competitor, and create conditions that cannot be invaded by the other. Hence, 244 the path to R 2 that dissipates more free energy is realized in the final community (Fig. 3D).

245
The same argument can be extend to resources with multiple paths leading to them, such 246 as R 3 . We consider three paths to R 3 (Fig. 3E). Note that we exclude the path R 0 → R 2 → 247 R 3 since we previously found the maximally dissipative path to R 2 ; this is an application 248 of Prim's algorithm to find the maximal spanning tree in a network 52 . Comparing the free 249 energy dissipated among the three paths, we find the maximally dissipative path, R 0 → 250 R 1 → R 3 . A species utilizing this path is able to raise the concentration of R 3 high enough 251 so as to make the other reactions infeasible, driving competitors relying on the paths extinct.
where h 0 is the steady state concentration of R 0 in absence of any consumption and −∆G 0 0⇒j 261 is the free energy dissipated in the reaction path to R j realized in the community (see SI).

262
Using Eq. (4), we can obtain the the steady-state concentrations of all resources: where the sum is over all resources, R j , that appear downstream of R i in the reaction dicts that differences in species' enzyme budgets and allocation, and in enzymatic param-296 eters, will start to matter as growth rate increase, and will cause greater differences in the 297 community metabolic network structure and function of communities from different pools.

298
To study community metabolic network structure and function as the steady-state growth 299 rate increases, we simulated community assembly from 10 separate pools of 600 species at 300 different dilution rates δ. Since the growth rate of a community at steady-state is equal to 301 the dilution rate, these simulations provided us with 10 different final communities at each 302 steady-state growth rate. averaged across the 10 species pools, increases with the community growth rate (Fig. 4D).

312
Communities deviate from our predictions at higher dilution rates due to a combination 313 of factors. First, due to the low energy yields of reactions on the maximally dissipative 314 path, species utilizing these reactions need to catalyze a larger flux to achieve high growth 315 rates than species utilizing reactions with higher energy yields on competing paths. Second, 316 due to the variation in enzyme budgets and enzyme allocation strategies, a species utilizing 317 reactions on a maximally dissipative path can get out-competed by a species with a larger 318 enzyme budget allocated to reactions on a competing path. Third, due to the variation in 319 enzymatic parameters, a reaction on a competing path might be preferred because it has 320 better enzymes, (e.g. with higher k cat ).

321
As the idiosyncratic properties of species pools and enzymes start to play a bigger role in 322 determining the community metabolic network, functional convergence weakens. The aver-323 age distance between community fluxes (i.e., the degree of functional divergence) increases 324 in fast-growing communities (Fig. 4E). The dashed line shows the expected distance between 325 two random flux vectors; this distance is higher than the most functionally distant commu-326 nities because the random flux vectors are not subject to ecological and flux constraints.

327
Together, these observations show that the maximum dissipation principle and functional 328 convergence will be stronger in low-energy communities that grow slowly, yet some degree 329 of convergence is expected to persist even for faster growth rates.

330
Validating predictions using experimental data from anaerobic digesters 331 We wanted to validate some of our predictions using real-world experimental data. Anaer-332 obic digesters present a good test system because the lack of strong electron acceptors forces reaction R 0 → R 5 was 25RT (60 kJ/mol). At this higher energy, the theoretical predictions 374 start to disagree with the realized community network (disagreements are highlighted in 375 red). Despite these disagreements, 65% of the reactions are predicted correctly even at this 376 higher energy. 377 We quantified the degree of deviation from theoretical predictions using two separate mea-378 sures. First, we quantified the network prediction error as the fraction of observed reactions 379 that disagreed with theoretical predictions (Fig. 6C). Then, we quantified functional differenergies around 20RT (50 kJ/mol). We also quantified the degree of functional convergence 383 in the observed communities, independent of the theoretical predictions, using the average 384 Jensen-Shannon distance between the fluxes observed in the communities assembled from 385 different species pools (Fig. 6E). We observe that functional convergence is progressively lost because energy assimilated will be zero. Thus, microbes have to find a balance between 437 these opposing forces. Our results suggest that, at small dilution rates, the effect of ecologi-438 cal competition will push microbes to evolve towards dissipating more energy. However, in 439 practice this evolutionary drive to decrease energy assimilation will not continue forever. It consequences of the selective pressures on energy assimilation is reserved for future work. 447 We have shown that in sufficiently low-energy environments, the thermodynamics of mi- dynamics of species α with abundance N α is described by where F ij is the flux of the reaction R i → R j catalyzed by a single enzyme, E concentration R i are given by where h(R i ) is the resource supply. Only the top resource R 0 is supplied, so, h(R 0 ) = δ · h 0 527 and zero otherwise. h 0 is the concentration of resource in the supply. The second and third 528 terms describe consumption and production fluxes of the resource by the species, with j 529 indexing the resources that could be produced from i and k indexing the resources that 530 could produce i. The enzyme budget of a species was chosen from a lognormal distribution with lognormal 544 parameters µ = 1, σ = 0.1. A species utilized a random subset of the 15 possible reactions.

545
So species in the pool were not biased towards extreme generalists or specialists. The enzyme 546 budget was allocated to the selected reactions uniformly, by using a Dirichlet distribution 547 with the Dirichlet parameter, α = 1, for the selected reactions. There were 600 species in 548 each of the 5 pools in Fig. 2 and 10 pools in Fig. 4; species were not shared across pools.

550
4RT , 3RT , 2RT , 1RT, and 0RT . For each reaction, a random fraction between 15% and 551 85% of the standard energy gap between product and substrate (E S − E P ) was assimilated, 552 and the rest was dissipated. This assimilated energy, along with resource energies, was fixed 553 across the experiments with different species pools. The concentration of R 0 in the supply, 554 h 0 = 80. The chemical kinetic parameters K P ,K S , and k cat were chosen from lognormal 555 distributions with lognormal parameters µ = 0, σ = 0.01, and so, their mean was close to 1.

556
The energy to biomass yield factor, Y = 1. The dilution rate, δ = 0.01 in Fig. 2; this rate 557 was small enough that a typical species would survive if it was alone.

558
In Fig. 5, the input flux of R 0 was held fixed across dilution rates to mimic the experiment ( 559 R 0 δ = 10). We simulated four different species pools, starting with 600 species in each pool.

560
The energies of resources R 0 , R 1 , R 2 , R 3 , R 4 , and R 5 were 5RT, 4.2RT, 3.7RT, 2.4RT, 1.4RT, 561 and 0RT . Motivated by real-world scenarios, the energy gaps were not uniformly spaced to 562 demonstrate the robustness of our results to differences in spacing. For the same reason, 563 we increased the degree of variation in chemical kinetic parameters K P ,K S , and k cat , which 564 were chosen from lognormal distributions with lognormal parameters µ = 0, σ = 0.15. The 565 concentration of R 0 in the supply h 0 = 80 and energy to biomass yield factor, Y = 1.

566
In Fig. 6, the energies of resources R 0 , R 1 , R 2 , R 3 , R 4 , and R 5 were selected to be pro-  The error in the network predicted by maximum dissipated was calculated as 1−accuracy.

585
The accuracy was defined as the fraction of reactions correctly predicted: 586 accuracy = n correct n observed + n predicted − n correct , where n observed is the number of reactions in the observed network, n predicted is the number of 587 reactions in the predicted network, and n correct is the number of reactions correctly predicted.

588
The numerator can be understood as the number of elements in the intersection of the  possible paths from R 0 to the next resource, R 2 , that could be added to the community. The paths R 0 → R 2 and R 0 → R 1 → R 2 dissipate 0.5 and 0.7 units of energy. D) The path that dissipates more free energy, R 0 → R 1 → R 2 , is selected in the final community because species utilizing this path will drive concentration of R 2 higher (such that R * 2 R * 0 ≈ exp (0.7/RT )) than species utilizing R 0 → R 2 E) The three possible paths from R 0 to R 3 , that could be added to the community.
We do not consider the path R 0 → R 2 → R 3 because we have already found that the maximally dissipative path to R 2 is R 0 → R 1 → R 2 and not R 0 → R 2 . F)The path that dissipates most free energy, R 0 → R 1 → R 3 , is selected in the final community because it can drive the concentration . Thus, the active reactions in the final community network will be determined by the paths to each resource that dissipates the most free energy. The error in the predicted network increases with the community growth rate. E) The functional distance between two communities was quantified by the Jensen-Shannon distance between the two community flux vectors. The average functional distance increases with the growth rate. Thus, the observed functional convergence decreases as the communities grow faster. The dashed line indicates the average distance between two random flux vectors. A random flux vector had the same length (15), and had each element picked from a uniform distribution between 0 and 1. The colored boxes in panels D and E correspond to the growth rates shown in the panels in A,B,C. The maximum growth rate, g max , was used to normalize the steady-state growth rate. g max was defined as the growth rate obtained on an irreversible reaction between consecutive resources by a species investing its entire budget in the reaction without any energy dissipation (see Methods).