Abstract
Multi-species microbial communities often display “community functions” stemming from interactions of member species. Interactions are often difficult to decipher, making it challenging to design communities with desired functions. Alternatively, similar to artificial selection for individuals in agriculture and industry, one could repeatedly choose communities with the highest community functions to reproduce by randomly partitioning each into multiple “Newborn” communities for the next cycle. However, previous efforts in selecting complex communities have generated mixed outcomes that are difficult to interpret. To understand how to effectively enact community selection, we simulated community selection to improve a community function that requires two species and imposes a fitness cost on one or both species. Our simulations predict that improvement can easily stall unless various aspects of selection, including species choice, selection regimen parameters, and stochastic populating of Newborn communities, are carefully considered. When these considerations were addressed in experimentally feasible ways, community selection could overcome natural selection to improve community function, and in some cases, even force species to evolve to coexist. Our conclusions hold under various alternative model assumptions, and are thus applicable to a variety of communities.
Introduction
Multi-species microbial communities often display important functions, defined as biochemical activities not achievable by member species in isolation. For example, a six-species microbial community, but not any member species alone, cleared relapsing Clostridium difficile infections in mice [1]. Community functions arise from interactions where an individual alters the physiology of another individual. Thus, to improve community function, one could identify and modify interactions [2, 3]. In reality, this is no trivial task: each species can release tens of compounds or more, many of which may influence the partner species in diverse ways [4, 5, 6, 7]. From this myriad of interactions, one would then need to identify those critical for community function, and modify them by altering species genotypes or the abiotic environment. One could also artificially assemble different combinations of species or genotypes at various ratios to screen for high community function. However, the number of combinations becomes very large even for a moderate number of species and genotypes, and some species may not be culturable in isolation.
In an alternative approach, artificial selection of whole communities could be carried out over cycles to improve community function [8, 9, 10, 11, 12] (reviewed in [13, 14, 15]). A selection cycle starts with a collection of low-density communities with artificially-imposed boundaries (e.g. inside culture tubes). These low-density communities are incubated for a period of time during which community members multiply, interact with each other, and possibly mutate, and the community function of interest develops. At the end of incubation, desired communities are chosen to “reproduce”, whereby each is randomly partitioned into multiple low-density communities to start the next cycle. Superficially, this process may seem straightforward since “one gets what one selects for”. After all, artificial selection on individuals has been successfully implemented to obtain, for example, proteins of enhanced activities (Figure S1). However, compared to artificial selection of individuals or mono-species groups, artificial selection of multi-species communities is more challenging. One reason is that community function has limited heredity, since species and genotype compositions can change rapidly from one selection cycle to the next due to ecological and evolutionary forces (see detailed explanation in Figure S1). For example, member species critical for community function may get lost during growth and selection cycles.
Artificial selection can be applied to any population of entities [96]. An entity can be an individual (A), a mono-species group (B), or a multi-species community (C). Unlike natural selection, which favors the fastest-growing cells, artificial selection generally selects for traits that are costly to individuals. In each selection cycle, a population of “Newborn” entities grow for maturation time T to become “Adults”. Adults expressing a higher level of the trait of interest (darker shade) are selected to reproduce. An individual reproduces by making copies of itself, while an Adult group or community can reproduce by randomly splitting into multiple Newborns of the next selection cycle. Successful artificial selection requires that i) entities display trait variations; ii) trait variations can be selected to result in differential entity survival and reproduction; and iii) entity trait is sufficiently heritable from one selection cycle to the next [97]. In all three types of selection, entity variations can be introduced by mutations and recombinations in individuals. However, heredity can be low in community selection. (A) Artificial selection of individuals has been successful [98, 99, 100], since a trait is largely heritable so long as mutation and recombination are sufficiently rare. (B, C) In group and community selection, if T is small so that newly-arising genotypes cannot rise to high frequencies within a selection cycle, then Adult trait is mostly determined by Newborn composition (the biomass of each genotype in each member species). Then, variation can be defined as the dissimilarity in Newborn composition within a selection cycle, and heredity as the similarity of Newborn composition from one cycle to the next for Newborns connected through lineage (tubes with same-colored outlines in Figure). (B) Artificial selection of mono-species groups has been successful [40, 42, 13]. 
Suppose cooperators but not cheaters pay a fitness cost to generate a product (shade). Artificial selection for groups producing high total product favors cooperator-dominated groups, although within a group, cheaters grow faster than cooperators. At a large Newborn population size (top), all Newborns will harbor similar fractions of cheaters, and thus inter-group variation will be small [57]. During maturation, cheater frequency will increase, thereby diminishing heredity. In contrast, when Newborn groups are initiated at a small size such as one individual (bottom), a Newborn group will comprise either a cooperator or a cheater, thereby ensuring variation. Furthermore, even if cheaters were to arise during maturation, a fraction of Newborns of the next cycle will by chance inherit a cooperator, thereby ensuring some level of heredity. Thus, group selection can work when Newborn size is small. (C) Artificial selection of multi-species communities may be hindered by insufficient heredity. During maturation, the relative abundance of genotypes and species can rapidly change due to ecological interactions and evolution, which compromises heredity. During community reproduction, stochastic fluctuations in Newborn composition further reduce heredity.
The few attempts at community selection have generated interesting results. One theoretical study simulated artificial selection on multi-species communities based on their ability to modify their abiotic environment [10]. Communities responded to selection, but the response quickly leveled off, and could occur without mutations. Thus, in this case, selection acted on species types instead of new genotypes [10]. In experiments, complex microbial communities were selected to improve their abilities to degrade a pollutant or to alter plant physiology [8, 9, 12, 11]. For example, microbial communities selected to promote early or late flowering in plants were dominated by distinct species types [11]. However, in other cases, a community trait may fail to improve despite selection, and may improve even without selection [8, 9].
Because communities used in these selection attempts were complex, much remains unknown. First, was the trait under selection a community function or achievable by a single species? If the latter, then community selection may not even be needed. Second, did selection act solely on species types or also on newly-arising genotypes? If the former ([10, 11]), then without immigration of new species, community function may quickly level off [10]. If the latter, then community function could continue to improve as new genotypes evolve. Finally, why might a community trait sometimes fail to improve despite selection [8, 9]?
In this study, we simulated artificial selection on communities with defined species. Our goal is to improve a “costly” community function via selecting for genotypes that promote community function. A community function is costly if any community member’s fitness is reduced by contributing to that community function. Costly community functions are common when microbes are engineered to make a product [16]. Community function can also be costly if high community function requires some species to restrain their growth so as not to outcompete other species. To improve a costly community function, artificial community selection must overcome natural selection, which favors low community function.
To understand how to effectively enact community selection to improve a costly community function, we simulated artificial selection of communities consisting of two defined species whose phenotypes can be modified by random mutations. A simplified two-species community would allow us to mechanistically investigate how community members evolved under community selection. Simulations allow us to compare the efficacy of different selection regimens with relative ease. We also designed our simulations to mimic real lab experiments so that our conclusions could guide future experiments. For example, our simulations incorporated not only chemical mechanisms of species interactions (as advocated by [17, 18]), but also experimental procedures (e.g. pipetting cultures during community reproduction). Model parameters, including species phenotypes, mutation rate, and distribution of mutation effects, were based on a wide variety of published experiments. Note that most previous models focused on binary phenotypes (e.g. contributing or not contributing to community function) [19], and therefore could not model community function improvement if all members started as contributors. We show that artificial community selection can improve a costly community function, but only after circumventing a multitude of failure traps.
Results
We will first introduce the subject of our community selection simulation: a commensal two-species community that converts substrates to a valued product. We will define community function and show that inappropriate definitions lead to selection failures. We will then describe how we simulate artificial community selection. From simulation results, we will demonstrate critical measures that make community selection effective, including promoting species coexistence, suppressing non-contributors, and being mindful about how routine experimental procedures can impede selection. Finally, we show that our conclusions are robust under alternative model assumptions, applicable to mutualistic communities and communities whose member species normally do not coexist. To avoid confusion, we will use “community selection” or “selection” to describe the entire process of artificial community selection (community formation, growth, selection, and reproduction), and use “choose” to refer to the selection step.
A Helper-Manufacturer community that converts substrates into a product
Motivated by previous successes in engineering two-species microbial communities that convert substrates into useful products [20, 21, 22], we numerically simulated selection of such communities. In our community, Manufacturer M can manufacture Product P of value to us (e.g. a bio-fuel or a drug) at a fitness cost to self, but only if helped by Helper H (Figure 1). Specifically, Helper but not Manufacturer can digest an agricultural waste (e.g. cellulose), and as Helper grows biomass, Helper releases Byproduct B at no fitness cost to itself. Manufacturer requires H’s Byproduct (e.g. carbon source) to grow (obligatory commensalism). In addition, Manufacturer invests fP (0 ≤ fP ≤ 1) fraction of its potential growth to make Product P while using the rest (1-fP) for its biomass growth. Both species also require a shared Resource R (e.g. nitrogen). Thus, the two species together, but not any species alone, could convert substrates (Waste and Resource) into Product.
Helper H consumes Waste (present in excess) and Resource to grow biomass, and concomitantly releases Byproduct B at no fitness cost to itself. H’s Byproduct B is required by Manufacturer M. M consumes Resource and H’s Byproduct, and invests a fraction fP of its potential growth gM to make Product P while channeling the remainder to biomass growth. When biomass growth ceases, Byproduct and Product are no longer made. The five state variables (italicized) H, M, R, B, and P correspond to the amount of H biomass, M biomass, Resource, Byproduct, and Product in a community, respectively.
During each community selection cycle (Figure 2), low-density “Newborn” H-M communities were assembled, each supplied with a fixed amount of Resource and excess Waste. These Newborn communities were allowed to grow (“mature”) over a fixed time T into high-density “Adult” communities. We define community function as the total amount of Product accumulated as a low-density Newborn community grows into an Adult community over maturation time T, i.e. P (T) (Figure 2, top two rows). In Methods Section 7, we explain problems associated with alternative definitions of community function (e.g. per capita production; Figure S2). Community function P (T) is not costly to Helpers, but reduces M’s growth rate by fraction fP (Figure 1). Therefore, artificial selection is necessary to improve community function, since natural selection always favors non-producing M (fP = 0). Later, we will show that for a community function that is costly to both H and M, our conclusions also hold.
Over the range of fP where M and H can coexist, P (T)/M (0) increases as ϕM (0) decreases. As a result, selection for higher P (T)/M (0) would select communities with lower ϕM (0) and can drive M to extinction.
In our simulations, cycles of selection were performed on a total of ntot = 100 communities. At the beginning of the first cycle, each Newborn had a total biomass of the target biomass (BMtarget = 100; 60 M and 40 H each of biomass 1). In subsequent cycles, species ratio would converge to the steady state value (Figure 3 bottom). Waste (not drawn) was in excess. The amount of Resource in each Newborn (not drawn) was fixed at a value that could support a total biomass of 10⁴ (unless otherwise stated). The maturation time T was chosen so that for an average community, Resource was not depleted by time T (in experimental terms, this would avoid complications of the stationary phase). During maturation, Resource R, Byproduct B, Product P, and each cell’s biomass were calculated from differential equations (Methods, Section 6). Once a cell’s biomass had grown from 1 to 2, it divided into two identical daughter cells. Death occurred stochastically to individual cells. After division, mutations (different shades of oval and rod) occurred stochastically to change a cell’s phenotypes (maximal growth rate, affinity for metabolites, and M’s fP). At the end of a cycle (time T), the top-functioning Adult with the highest Product P (T) was chosen and diluted into as many Newborns as possible so that on average, each Newborn had a total biomass of approximately the target biomass BMtarget. The next top-functioning Adult was then reproduced until ntot = 100 Newborns were generated for the next selection cycle.
Simulating community selection
We simulated four stages of community selection (Figure 2): forming Newborn communities; Newborn communities maturing into Adult communities; choosing the highest-functioning Adult communities; and reproducing the chosen Adult communities by splitting each into multiple Newborn communities of the next cycle. Our simulation was individual-based. That is, it tracked the phenotypes and biomass of individual H and M cells in each community as cells grew, divided, mutated, or died. Our simulations also tracked dynamics of chemicals (including Product) in each community, and accounted for actual experimental steps such as pipetting cultures during community reproduction. Below is a brief summary of our simulations, with more details in Methods.
Each simulation (Methods Section 6) started with ntot number of Newborn communities. Each Newborn community always started with a fixed amount of Resource and a total biomass close to a target value BMtarget (see Methods Section 7 for problems associated with not having a biomass target). Waste was always supplied in excess and thus did not enter our equations. Note that except for the first cycle, the relative abundance of species in a Newborn community inherited that of the parent Adult community.
During community maturation, biomass of individual cells grew. The biomass growth rate of an H cell depended on Resource concentration (Monod Equation; Figure S3A; Eq. 23). As H grew, it consumed Resource and simultaneously released Byproduct (Eqs. 21 and 22). The potential growth rate of an M cell depended on the concentrations of Resource and H’s Byproduct ([23]; Figure S3B; see experimental support in Figure S4). An M cell’s actual biomass growth rate was (1 – fP) fraction of M’s potential growth rate (Eq. 24). As M grew, it consumed Resource and Byproduct (Eqs. 21 and 22), and released Product at a rate proportional to fP and M’s potential growth rate (Eq. 8). Once an H or M cell’s biomass grew from 1 to 2, it divided into two cells of equal biomass with identical phenotypes, thus capturing experimental observations of continuous biomass increase (Figure S5) and discrete cell division events [24]. Meanwhile, H and M cells died stochastically at a constant death rate. Although mutations can occur during any stage of the cell cycle, we assigned mutations immediately after cell division, where each phenotype of both cells mutated independently.
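The per-cell rules above (growth, Product diversion, division at biomass 2, constant-rate death) can be sketched as a single time step. This is a minimal illustration under stated assumptions, not the authors' code: the names `Cell`, `potential_rate`, and `step` are hypothetical, growth follows Monod kinetics on Resource only (as for H), Resource is held fixed within the step, and the Product-release proportionality constant is assumed to be 1.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass
class Cell:
    biomass: float
    g_max: float       # maximal growth rate (per time unit)
    K_R: float         # Monod constant for Resource (hypothetical value)
    f_P: float = 0.0   # fraction of potential growth diverted to Product (0 for H)


def potential_rate(cell: Cell, R: float) -> float:
    # Monod kinetics: half-maximal growth rate when R == K_R
    return cell.g_max * R / (R + cell.K_R)


def step(cells, R, dt, death_rate, rng):
    """Advance all cells by dt; return (cells, biomass gained, Product made)."""
    out, gained, product = [], 0.0, 0.0
    for cell in cells:
        if rng.random() < death_rate * dt:
            continue  # stochastic death at a constant per-capita rate
        g = potential_rate(cell, R)
        # Only the (1 - f_P) share of potential growth becomes biomass
        new_biomass = cell.biomass * math.exp((1 - cell.f_P) * g * dt)
        gained += new_biomass - cell.biomass
        # Product proportional to f_P and potential growth (constant assumed = 1)
        product += cell.f_P * g * cell.biomass * dt
        cell.biomass = new_biomass
        if cell.biomass >= 2.0:
            # Divide into two daughters of equal biomass, identical phenotypes
            half = cell.biomass / 2
            out += [replace(cell, biomass=half), replace(cell, biomass=half)]
        else:
            out.append(cell)
    return out, gained, product
```

A caller would subtract a consumption coefficient times `gained` from Resource, add released Byproduct, and apply mutations to the two daughters right after each division; those steps are left out to keep the sketch short.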
(A) H growth follows Monod kinetics, reaching half maximal growth rate when R = KHR. (B) M growth follows dual-substrate Mankad-Bungay kinetics. When Resource R is in great excess (RM ≫BM) or Byproduct B is in great excess (BM ≫RM), we recover mono-substrate Monod kinetics (A).
Suppose that cell growth rate depends on each of the two substrates S1 and S2 in a Monod-like, saturable fashion. When S2 is in excess, the S1 at which half maximal growth rate is achieved is K1. When S1 is in excess, the S2 at which half maximal growth rate is achieved is K2. (A) In the “Double Monod” model, growth rate depends on the two limiting substrates in a multiplicative fashion. (B) In the model proposed by Mankad and Bungay, growth rate takes a different form. In both models, when one substrate is in excess, growth rate depends on the other substrate in a Monod fashion. However, when S1 = K1 and S2 = K2, the growth rate is predicted to be gmax/2 by the Mankad-Bungay model and gmax/4 by the Double Monod model. The Mankad-Bungay model outperforms the Double Monod model in describing experimental data of S. cerevisiae and E. coli growing on low glucose and low nitrogen. The figures are plotted using data from Ref. [23].
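The two kinetic forms can be compared numerically. The Mankad-Bungay expression below is our assumption, since the caption does not write it out; it is expressed in scaled concentrations x1 = S1/K1 and x2 = S2/K2, and it reproduces both properties stated above: Monod dependence on one substrate when the other is in excess, and gmax/2 (versus gmax/4 for Double Monod) at S1 = K1 and S2 = K2.

```python
def double_monod(s1, s2, k1, k2, g_max=1.0):
    # Multiplicative dependence on the two limiting substrates
    return g_max * (s1 / (s1 + k1)) * (s2 / (s2 + k2))


def mankad_bungay(s1, s2, k1, k2, g_max=1.0):
    # Assumed Mankad-Bungay form in scaled concentrations
    x1, x2 = s1 / k1, s2 / k2
    if x1 == 0 or x2 == 0:
        return 0.0  # no growth without either substrate
    return g_max * (x1 * x2 / (x1 + x2)) * (1 / (x1 + 1) + 1 / (x2 + 1))


# At S1 = K1 and S2 = K2: 0.25 * g_max (Double Monod) vs 0.5 * g_max (Mankad-Bungay)
print(double_monod(1, 1, 1, 1), mankad_bungay(1, 1, 1, 1))
```

When S1 is very large, `mankad_bungay` tends to g_max * S2 / (S2 + K2), the mono-substrate Monod limit shown in Figure S3A.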
We model exponential biomass growth in excess metabolites. Thick black line: analytical solution with biomass growth rate (0.7/time unit). Grey dashed line: simulation assuming that biomass increases exponentially at 0.7/time unit and that cell division occurs upon reaching a biomass threshold, an assumption used in our model. Color dotted lines: simulations assuming that cell birth occurs at a probability equal to the birth rate multiplied by the length of the simulation time step (Δτ = 0.05 time unit). When a cell birth occurs, biomass increases discretely by 1, resulting in a step-wise increase in color dotted lines at early time.
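Why the threshold-division scheme (grey dashed line) tracks the analytical curve can be checked directly: division splits biomass without changing it, so total biomass follows exp(g t) exactly while cell number rises in discrete steps. A minimal sketch, using the caption's growth rate (0.7/time unit) and time step (0.05 time unit); the function name is hypothetical.

```python
import math


def threshold_division_biomass(g=0.7, t_end=5.0, dt=0.05, b0=1.0):
    """Total biomass and cell count when each cell grows exponentially at rate g
    and divides into two equal halves once its biomass reaches 2."""
    cells = [b0]
    for _ in range(round(t_end / dt)):
        grown = [b * math.exp(g * dt) for b in cells]
        cells = []
        for b in grown:
            if b >= 2.0:
                cells += [b / 2, b / 2]  # division conserves biomass
            else:
                cells.append(b)
    return sum(cells), len(cells)


total, n_cells = threshold_division_biomass()
print(total, math.exp(0.7 * 5.0), n_cells)
```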
Mutable phenotypes included H and M’s maximal growth rates and affinities for nutrients (“growth parameters”), and M’s fP (the fraction of potential growth diverted for making Product), since these phenotypes have been observed to rapidly change during evolution ([25, 26, 27, 28]). Mutated phenotypes could range between 0 and their respective evolutionary upper bounds. On average, half of the mutations abolished the function (e.g. zero growth rate, zero affinity, or fP = 0) based on experiments on GFP, viruses, and yeast [29, 30, 31]. The effects of the remaining 50% of mutations followed a bilateral exponential distribution, enhancing or diminishing a phenotype by a few percent, based on our re-analysis of published yeast data sets [32] (Figure S6). We held death rates constant, since death rates were much smaller than growth rates and thus mutations in death rates would be inconsequential. We also held release and consumption coefficients constant. This is because, for example, the amount of Byproduct released per H biomass generated is constrained by biochemical stoichiometry.
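The mutation model described above can be sketched as a sampler. The mean effects s+ = 0.050 and s- = 0.067 come from Figure S6; three details are assumptions made only for illustration: effects act multiplicatively (value × (1 + s)), non-null mutations are equally likely to enhance or diminish (`p_enhance = 0.5`), and `mutate` is a hypothetical name.

```python
import random


def mutate(value, upper_bound, rng, s_plus=0.050, s_minus=0.067,
           p_null=0.5, p_enhance=0.5):
    """Return a mutated phenotype value (sketch).

    With probability p_null the mutation abolishes the phenotype (value -> 0).
    Otherwise a relative effect s is drawn from a bilateral exponential:
    enhancing effects have mean s_plus, diminishing effects mean s_minus.
    p_enhance is an assumed split between enhancing and diminishing."""
    if rng.random() < p_null:
        return 0.0
    if rng.random() < p_enhance:
        s = rng.expovariate(1 / s_plus)    # expovariate takes rate = 1/mean
    else:
        s = -rng.expovariate(1 / s_minus)
    new_value = value * (1 + s)
    # Keep the phenotype between 0 and its evolutionary upper bound
    return min(max(new_value, 0.0), upper_bound)


rng = random.Random(1)
print([round(mutate(0.5, 1.0, rng), 3) for _ in range(5)])
```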
We derived µΔs(Δs) from the Dunham lab data [32] where bar-coded mutant strains were competed under sulfate-limitation (red), carbon-limitation (blue), or phosphate-limitation (black). Error bars represent uncertainty δµΔs (the lower error bar is omitted if the lower estimate is negative). In the leftmost panel, green lines show non-linear least-squares fitting of data to Eq. 19 using all three sets of data. Note that data with larger uncertainty are given less weight, and thus deviate more from the fit. For an exponentially-distributed probability density function p(x) = exp(-x/r)/r where x, r > 0, the average of x is r. When plotted on a semi-log scale, ln p(x) is a straight line whose slope has magnitude 1/r, and inverting this magnitude gives the average effect r. From the green line on the right side, we obtain the average effect of enhancing mutations s+ = 0.050 ± 0.002, and from the green line on the left side, we obtain the average effect of diminishing mutations s- = 0.067 ± 0.003. The probability of a mutation altering a phenotype by ±α is the shaded area.
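The semi-log slope argument can be tried on synthetic data. This sketch draws samples from an exponential distribution with a known mean (0.050, the value reported for s+), bins them, and fits a straight line to ln(counts) versus effect size. Weighting each bin by the square root of its count is our choice here, echoing the caption's point that less certain data get less weight; it is not the authors' fitting procedure for Eq. 19.

```python
import numpy as np

rng = np.random.default_rng(0)
r_true = 0.050                                   # synthetic mean effect
x = rng.exponential(r_true, 100_000)

counts, edges = np.histogram(x, bins=40, range=(0, 0.25))
centers = (edges[:-1] + edges[1:]) / 2
keep = counts > 0

# ln p(x) = -x/r - ln r, so ln(counts) vs x is linear with slope -1/r.
# Poisson noise gives Var[ln(count)] ~ 1/count, hence weights sqrt(count).
slope, intercept = np.polyfit(centers[keep], np.log(counts[keep]), 1,
                              w=np.sqrt(counts[keep]))
r_est = -1 / slope
print(f"estimated mean effect r = {r_est:.4f}")
```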
At the end of community maturation time T, we obtained community function P (T) (the total amount of Product at time T) for each Adult community. The highest-functioning Adult was randomly partitioned into Newborns of the target total biomass BMtarget. For example, if the chosen Adult had a total biomass of 60BMtarget, then each cell would be assigned a random integer from 1 to 60, and those cells with the same random integer would be allocated to the same Newborn. Experimentally, this is equivalent to volumetric dilution using a pipette. Thus, for each Newborn, the total biomass and species ratio fluctuated around their expected values in a fashion associated with pipetting (Methods Section 9). When the highest-functioning Adult was used up for making Newborns, the next highest-functioning Adult was chosen and reproduced until ntot Newborns were generated for the next selection cycle.
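The reproduction step can be sketched as two functions: random assignment of each cell to one of n Newborns (the volumetric-dilution analogue of the random-integer scheme above), and reproduction of Adults in decreasing order of function until ntot Newborns exist. Cells are represented as hypothetical `(species, biomass)` tuples, and `pipette_partition` and `reproduce` are illustrative names, not the authors' code.

```python
import random


def pipette_partition(cells, n_newborns, rng):
    """Assign each cell a uniformly random Newborn index, so Newborn biomass
    and species ratio fluctuate as they would with pipetting."""
    newborns = [[] for _ in range(n_newborns)]
    for cell in cells:
        newborns[rng.randrange(n_newborns)].append(cell)
    return newborns


def reproduce(adults, functions, bm_target, n_tot, rng):
    """Split Adults, highest function first, until n_tot Newborns are made."""
    ranked = sorted(range(len(adults)), key=lambda i: functions[i], reverse=True)
    newborns = []
    for i in ranked:
        total = sum(b for _, b in adults[i])
        n = max(1, int(total // bm_target))  # as many Newborns as possible
        for nb in pipette_partition(adults[i], n, rng):
            if len(newborns) == n_tot:
                return newborns
            newborns.append(nb)
    return newborns
```

For example, an Adult with total biomass 60BMtarget yields 60 Newborns; if ntot = 100, the next-best Adult then supplies the remaining 40.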
Overcoming the ecological and evolutionary fragility of species coexistence
In order to improve community function, species need to coexist throughout selection cycles. That is, all species must grow at a similar average growth rate within each cycle. Furthermore, species ratio should not be extreme because otherwise, the low-abundance species could be lost by chance during Newborn formation. Species coexistence at a moderate ratio has been experimentally realized in engineered communities [20, 21, 33, 34].
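Why an extreme species ratio is risky can be quantified roughly. If Newborns are formed by random pipetting, the number of cells of a given species landing in a Newborn is approximately Poisson, so the chance that a Newborn receives none is exp(-λ), where λ is the expected number of that species' cells. This sketch assumes about 100 cells per Newborn and treats the biomass fraction as a cell fraction; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def prob_species_lost(phi, cells_per_newborn):
    """Poisson approximation: probability that a Newborn receives zero cells of
    a species making up fraction phi of the parent Adult's cells."""
    return math.exp(-phi * cells_per_newborn)


for phi in (0.5, 0.1, 0.01):
    print(phi, prob_species_lost(phi, 100))
```

At a moderate ratio (phi = 0.5) loss is negligible, whereas at phi = 0.01 roughly a third of Newborns would lack the rare species entirely.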
To achieve species coexistence at a moderate ratio in the H-M community, three considerations apply. First, the fraction of growth M diverted for making Product (fP) must not be too large, or else M would always grow slower than H and thus go extinct (Figure 3 top). Second, upon Newborn formation, H can immediately start to grow on Waste and Resource, while M cannot grow until H’s Byproduct has accumulated to a sufficiently high level. Thus, H and M’s growth parameters (maximal growth rates in excess nutrients; affinities for nutrients) should ideally allow M to grow faster than H at some point during community maturation. Third, to achieve a moderate steady-state species ratio, metabolite release and consumption need to be balanced. Otherwise, the ratio between metabolite releaser and consumer can be extreme [33].
Here, we plotted the fraction of M biomass in a community over two maturation cycles. Top: When fP, the fraction of potential growth Manufacturer diverts for making Product, is high (e.g. fP = 0.8), M goes extinct. Bottom: At low fP (e.g. fP = 0.1), H and M can stably coexist. That is, different initial species ratios will converge to a steady state value. Calculations were based on equations 6-10 with parameters in the last column of Table 1 (i.e. growth parameters at evolutionary upper bounds). At the end of the first cycle (time T = 17), Byproduct and Resource were re-set to the initial conditions at time zero (0 and 10⁴, respectively), and total biomass was reduced to the target value BMtarget while the fraction of M biomass ϕM remained the same as that of the parent community. See main text for how values of maturation time and Resource were chosen.
Based on these considerations and published yeast and E. coli measurements, we chose H and M’s ancestral growth parameters and their evolutionary upper bounds, as well as release, consumption, and death parameters (Table 1, Methods Section 2). This ensured that throughout selection cycles, different species ratios would converge toward a moderate steady state value during community maturation (Figure 3, bottom). Note that if species were not chosen properly, selection might fail due to insufficient species coexistence (e.g. Figure 7A), although as we will show later, community selection could enforce species coexistence when executed properly (Figure 7).
Parameters for ancestral and evolved (growth- and mono-adapted) H and M. Parameters in the “Evolved” column are used for most simulations and figures unless otherwise specified. For maximal growth rates, * represents evolutionary upper bound. For KSpeciesMetabolite, * represents evolutionary lower bound, which corresponds to evolutionary upper bound for Species’s affinity for Metabolite (1/KSpeciesMetabolite). # is from Figure S23. In Methods Section 2, we explain our parameter choices (including why we hold some parameters constant during evolution).
Choosing selection regimen parameters to avoid known failure modes
After having chosen member species with appropriate phenotypes, we need to consider parameters of the selection regimen. These parameters include the total number of communities under selection (ntot), Newborn target total biomass (BMtarget), the amount of Resource added to each Newborn (R(0)), the amount of mutagenesis which controls the rate of phenotype-altering mutations (µ), and maturation time (T). Compared to the well-studied problem of group selection where the unit of selection is a mono-species group [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], community selection is more challenging (Discussions; Figure S1). However, the two types of selection do share common aspects (Discussions; Figure S1). Thus, we can apply group selection theory, together with other practical considerations, to better design the selection regimen.
If the total number of communities ntot is very large, then the chosen community will likely display a higher community function than if ntot is small, but the experimental setup is more challenging. We chose a total of 100 communities (ntot=100).
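The benefit of a larger ntot can be illustrated with a toy model in which each community's function is an independent draw from a standard normal distribution (an illustrative assumption, not a property of the H-M model). The expected best function rises with ntot but with diminishing returns, while the experimental burden grows linearly. `mean_best` is a hypothetical name.

```python
import numpy as np


def mean_best(n_tot, trials=2000, seed=0):
    """Average maximum of n_tot i.i.d. standard-normal community functions."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((trials, n_tot))
    return draws.max(axis=1).mean()


for n in (10, 100, 1000):
    print(n, round(mean_best(n), 2))
```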
If the mutation rate is very low, then community function cannot rapidly improve. If the mutation rate is very high, then non-producers will be generated at a high rate, and as the fast-growing non-producers take over, community function is likely to collapse. Here, we chose µ, the rate of phenotype-altering mutations, to be biologically realistic (0.002 per cell per generation per phenotype, which is lower than the highest values observed experimentally; Methods Section 4).
If Newborn total biomass BMtarget is very large, or if the number of doublings within T is very large, then non-producers will take over in all communities during maturation (Figure S8, compare B-D with A), as predicted by group selection theory. On the other hand, if both BMtarget and the number of generations within T are very small, then mutations will be rare within each cycle, and many cycles will be required to improve community function. Finally, if BMtarget is very small, then a member species might be lost by chance during Newborn formation. In our simulations, we chose a Newborn target total biomass of BMtarget = 100 biomass units (50∼100 cells). Unless otherwise stated, we fixed the input Resource R(0) to support a maximal total biomass of 10⁴, and chose maturation time T so that total biomass would undergo ∼6 doublings (increasing from ∼100 to ∼7000). Thus, by the end of T, ≤ 70% of Resource would be consumed by an average community. This meant that when implemented experimentally, we could avoid complications of Resource depletion and stationary phase, while not wasting too much Resource.
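The regimen arithmetic above can be checked in a few lines. Six doublings from 100 give 6400, close to the ∼7000 quoted, and consume about 64% of a Resource supply that supports 10⁴. The mutation count rests on two labeled assumptions: one biomass unit is treated as roughly one cell, and each division yields two daughters whose phenotypes each mutate at rate µ = 0.002.

```python
bm_target = 100           # Newborn target total biomass (BMtarget)
resource_capacity = 1e4   # total biomass that R(0) can support
doublings = 6             # approximate doublings within maturation time T
mu = 0.002                # phenotype-altering mutations per cell per generation per phenotype

adult_biomass = bm_target * 2 ** doublings          # 6400
resource_used = adult_biomass / resource_capacity   # fraction of Resource consumed

# Assumption: 1 biomass unit ~ 1 cell, so divisions ~ (Adult cells - Newborn cells),
# and each division yields two daughters whose phenotypes each mutate at rate mu.
divisions = adult_biomass - bm_target
mutations_per_phenotype = mu * 2 * divisions

print(adult_biomass, round(resource_used, 2), round(mutations_per_phenotype, 1))
# prints: 6400 0.64 25.2
```

Under these assumptions, each community accumulates a few tens of mutations per phenotype per cycle, enough to generate variation without letting non-producers dominate within a single cycle.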
Community selection sometimes improves community function through promoting individual growth
We simulated community selection while allowing H and M’s growth parameters (maximal growth rates in excess metabolites; affinities for metabolites) and M’s fP to be modified by mutations. As expected, in control simulations where Adult communities were randomly chosen to reproduce, community function was driven to zero by natural selection as fast-growing non-producing M took over (Figure S9C; average fP declining to zero in Figure S9B).
When we chose highest-functioning Adult communities to reproduce, community function improved (Figure 4B, compare i and ii). Concurrently, fP remained nearly unchanged while H and M’s growth parameters improved (Figure S10), leading to faster H and M growth rates (Figure 4A). Community function improvement was largely due to improved H and M growth, since disallowing mutations in growth parameters greatly diminished community function improvement (Figure 4B, compare i and iii). Here, improved individual growth promoted community function because of our choices of evolutionary upper bounds: since H could not evolve to grow fast enough to overwhelm M, species ratio was maintained at a moderate level (Figure 3B), and faster H and M growth resulted in more Byproduct, larger M populations, and consequently a higher Product level. With different choices of evolutionary upper bounds, increasing growth parameters decreased community function due to H dominance (Figure S7), although we will demonstrate that even in this scenario, properly executed community selection can improve community function while promoting species coexistence (Figure 7).
Here, the evolutionary upper bound for was larger than that of
, opposite to that in Figure 4. If community reproduction occurred through volumetric dilution via pipetting, and if fP and growth parameters were allowed to mutate, community function declined to below the ancestral level (left). For detailed dynamics, see Figure S21. Resource supplied to Newborn communities could support 10⁵ total biomass to accommodate faster growth rates.
gHmax and gMmax, H and M’s maximal growth rates in excess nutrients, have evolutionary upper bounds (Table 1). (A-B) Data for selected communities from the 2000th cycle are plotted, with error bars calculated from three independent selections. (A) The growth rates of M (blue) and H (brown) improved over ancestral values during selection. Growth rates were averaged over one selection cycle. (B) Community function improved largely due to mutations in growth parameters. Compared to ancestral community function where no mutations were allowed (i), community function improved when growth parameters and fP were allowed to mutate (ii). Preventing mutations in growth parameters diminished community function improvement (iii). (C) Community function could be further improved. When Newborn total biomass was fixed to 100 cells and maturation time T = 17 time units, maximal P (T) was achieved at an intermediate fP
(magenta dashed line), as plotted here for the species composition optimal for community function (46 H and 54 M cells in Newborn). Note that at zero fP, no Product would be made; at high fP, M would go extinct (Figure 3 top panel). For comparison, we plot data from B(i) (ancestral fP and growth parameters; light grey star) and B(ii) (after community selection where fP remained ancestral while growth parameters improved to upper bounds; dark grey star). The maximal P*(T) could not be further improved even if we allowed all growth parameters and fP to mutate (Figure S12). Thus, P*(T) is locally maximal in the sense that small deviation will always reduce P (T). Similar to other simulations, Newborns had a fixed Resource (104) and excess Waste.
Community selection may not be effective under conditions reflecting common lab practices
While community function improved, it could have improved significantly more. Specifically, when growth parameters were fixed to their respective upper bounds, as occurred during community selection (Figure S10C), maximal community function P*(T) could be achieved at an intermediate fP (Figure 4C). P*(T) was much higher than what was achieved during community selection (Figure 4C ii, dark grey star).
For simplicity, we modeled the growth of Newborn groups of M cells. From a Newborn biomass of 10^2 or 10^4 wild-type M cells, the M population multiplied for 6 or 100 generations. Immediately following cell division, wild-type daughter cells mutated to non-producers with a probability of 10^-3. Wild-type and mutant cells followed exponential growth. The growth rate of wild-type cells was 0.87 times that of mutants. The fraction of biomass made up by mutants at each wild-type doubling is shown. Note the different scales.
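The expected mutant takeover described in this caption can be approximated by a deterministic recursion. The sketch below is our simplification, not the simulation behind the figure. It tracks wild-type (w) and non-producer (n) biomass once per wild-type doubling. Each daughter of a wild-type division is assumed to become a non-producer with probability 10^-3, and mutants are assumed to grow 1/0.87 times as fast as wild type. Because the recursion is deterministic, the starting size cancels out. Any dependence on Newborn size must therefore come from stochastic mutation events, which this sketch ignores.

```python
def mutant_fraction(n_doublings, mu=1e-3, rel_growth=0.87):
    """Expected non-producer fraction after each wild-type doubling (deterministic sketch)."""
    w, n = 1.0, 0.0                       # start from pure wild type; only ratios matter here
    mutant_fold = 2 ** (1 / rel_growth)   # mutants multiply faster: wild-type rate = 0.87 x mutant rate
    fractions = []
    for _ in range(n_doublings):
        n = n * mutant_fold + 2 * mu * w  # existing mutants grow; new mutants arise among daughters
        w = 2 * (1 - mu) * w              # daughters that stay wild type
        fractions.append(n / (w + n))
    return fractions

f = mutant_fraction(100)
print(f"after 6 doublings: {f[5]:.4f}; after 100 doublings: {f[99]:.4f}")
```

Under these assumptions the mutant fraction stays below 1% after 6 doublings but approaches 1 after 100 doublings. This matches the caption's qualitative message that long maturation lets non-producers take over.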
When random Adults were selected to reproduce, natural selection favored fast growers with improved maximal growth rates and improved affinities for nutrients (A), and zero fP (B). Consequently, P(T) decreased to zero (C). The maximal growth rates of H and M (gHmax and gMmax), H’s affinity for Resource 1/KHR, and M’s affinity for Byproduct 1/KMB rapidly improved to their respective upper bounds, while M’s affinity for Resource 1/KMR improved more slowly. This is consistent with M’s growth being more limited by Byproduct. Green dashed lines: upper bounds of phenotypes. Magenta dashed lines: fP optimal for community function, and maximal P(T) when all five growth parameters are fixed at their upper bounds and ϕM(0) is also optimal for P(T). Black, cyan, and gray curves show three independent simulations. P(T) is averaged across selected Adults. fP, gHmax, and gMmax are obtained by averaging within each selected Adult and then averaging across selected Adults. KSpeciesMetabolite values are averaged within each selected Adult, then averaged across selected Adults, and finally inverted to represent average affinity. Note the different x-axis scales. The maximal growth rates (gMmax and gHmax) have the unit of 1/time. Affinity for Resource (1/KMR, 1/KHR) is expressed in units scaled by the initial amount of Resource in a Newborn. Affinity for Byproduct (1/KMB) is expressed in units scaled by the amount of Byproduct released per H biomass produced. Product P is expressed in units of the amount of Product released at the cost of one M biomass. More details can be found in Table 1.
Community function P(T) increased upon community selection (A). Since fP remained unchanged (B), this increase in P(T) must be due to improved growth parameters (C). Other legend details are the same as in Figure S9.
To investigate why community selection failed to improve community function to the maximum P*(T), we repeated our simulations, except that we fixed H and M’s growth parameters to their upper bounds (as occurred during community selection; Figure S10C) and allowed only fP to mutate. This simplification allowed us to focus on the evolutionary dynamics of fP. It is justified because we obtained similar conclusions regardless of whether we fixed growth parameters (Figure S13).
The fraction of M biomass in 10 Newborns is plotted as filled dots at Time 0. Since these Newborn communities were generated via volumetric dilution by pipetting, these fractions fluctuated stochastically around that of the parent Adult community (open black circle). Since Newborns with a lower fraction of M tend to achieve higher community function (Figure 6), here the Newborn marked by the magenta dot would achieve the highest community function and thus be chosen to reproduce. In addition, its fraction of M biomass approached a steady-state value (black dashed line) upon maturation. Since the fraction of M increased from below steady state to steady state, the average growth rate of M was higher than that of H. Data here are from Cycle 100 of the simulation plotted in Figure 4 (i).
We started each Newborn community with total biomass BM(0) = 100, all five growth parameters at their upper bounds, and fP and ϕM(0) at the values that achieve P*(T). We then allowed all five growth parameters and fP to mutate while applying community selection. To ensure effective community selection (Figure 5), BM(0) was fixed to 100 during community reproduction, and ϕM(0) was fixed to ϕM(T) of the selected Adult community from the previous cycle. We found that all five growth parameters remained at their respective evolutionary upper bounds. At the end of the first cycle (Cycle = 1 in insets), even though the average fP had not changed, P(T) had already declined from the original magenta dashed line. This is because species interactions had driven ϕM(0) from its optimal value to near the steady-state value (ϕM = 0.72; compare with ϕM,SS represented by the green dashed line in Figure 1C, bottom panel). Later, over hundreds of cycles, the average fP gradually increased, which increased P(T). However, P(T) was still below maximal, because species composition gravitated toward the steady state ϕM,SS, which deviated from the optimal ϕM(0). Other legend details can be found in Figure S9.
(A) and (B)
of selected communities over 2000 selection cycles when the maturation time
and
. All other simulation parameters are in Table 1. Compared to the simulations whose results are presented in Figure 5, the simulations for this figure allowed the growth parameters of each M and H cell, and the fP of each M cell, to vary. The legends are the same as in Figure 5.
Could community selection increase fP from its ancestral value to the value optimal for community function, despite natural selection favoring lower fP? Despite thousands of selection cycles, fP and community function P(T) barely improved, and both remained far below their theoretical optima (Figure 5A and B).
(A-L) Outcome of community selection when maturation time T was sufficiently short to avoid Resource depletion and stationary phase (T = 17). (A-C) When the selected Adults were diluted into Newborns through pipetting (i.e. BM(0) and ϕM(0) fluctuated around their respective mean values), community selection was not effective. Average fP and community function failed to improve to their theoretical optima (magenta dashed lines), and community function correlated poorly with its heritable determinant, the average fP. Black and magenta dots: unselected and selected Adult communities from one selection cycle, respectively. (J-L) When a fixed H biomass and M biomass from the selected Adults were sorted into Newborns, community selection was successful, and community function correlated with its heritable determinant, the average fP. Here, Newborn total biomass BM(0) and fraction of M biomass ϕM(0) were fixed to BMtarget = 100 and to ϕM(T) of the selected Adult of the previous cycle, respectively. (D-I) Fixing only one of BM(0) and ϕM(0) did not significantly improve community selection. (M-O) When maturation time was long (T = 20), such that most Resource was consumed by the end of T, community selection was successful even without fixing BM(0) or ϕM(0). Black, cyan and gray curves are three independent simulation trials.
P(T) was averaged across the two selected Adults. The average fP was obtained by first averaging among M cells within each selected Adult and then averaging across the two selected Adults.
Common lab practices can generate sufficiently large non-heritable variations in community function to interfere with selection
Why did community selection fail to improve fP and community function? One possibility is that community function was not sufficiently heritable from one cycle to the next (Figure S1). We therefore investigated the heredity of community function by examining the heredity of its determinants.
Community function P(T) was largely determined by the phenotypes of cells in the Newborn community. This is because maturation time was sufficiently short (∼6 doublings) that new genotypes could not rise to high frequencies and significantly affect community function. Since all phenotypes except fP were fixed, community function had three independent determinants: the Newborn’s total biomass BM(0), the Newborn’s fraction of M biomass ϕM(0), and the average fP over all M cells in the Newborn (Eq 6-10).
A community function determinant is considered heritable if it is correlated between Newborns of one cycle (Figure 6A, bottom row) and their respective progeny Newborns in the next cycle (Figure 6A, color-matched top row). Among the three determinants, the average fP was heritable (Figure 6B). If a Newborn community had a high average fP, so would the mature Adult community and the Newborn communities reproduced from it. On the other hand, Newborn total biomass BM(0) was not heritable (Figure 6C). When an Adult community reproduced via pipette dilution, the dilution factor was adjusted so that the total biomass of a progeny Newborn community equaled the target biomass BMtarget on average, regardless of the parent’s biomass. The Newborn’s fraction of M biomass ϕM(0), which fluctuated around that of its parent Adult, was not heritable either (Figure 6D). Regardless of the species composition of Newborns, Adults would reach a similar steady-state species composition (Figure 3B), and so would their offspring Newborns.
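The heritability test in this paragraph is a correlation across lineages between one cycle and the next. The sketch below illustrates it with made-up numbers; it is not the paper's simulation. Average fP is assumed to be passed from parent to offspring with small sampling noise. Pipetting is assumed to redraw Newborn size as a Poisson count around BMtarget, independent of the parent. For simplicity, each lineage contributes a single offspring, rather than the mean over several offspring used in Figure 6. All parameter names and values are hypothetical.

```python
import numpy as np

def lineage_correlations(n_lineages=100, bm_target=100, fp_mean=0.1,
                         fp_spread=0.02, fp_noise=0.002, seed=0):
    """Correlate each determinant between consecutive cycles (one offspring per lineage)."""
    rng = np.random.default_rng(seed)
    # Previous cycle: Newborns differ in average fP (hypothetical between-lineage spread)
    fp_prev = fp_mean + fp_spread * rng.standard_normal(n_lineages)
    # Pipetting: Newborn size scatters around BMtarget (Poisson cell count assumed)
    bm_prev = rng.poisson(bm_target, n_lineages)
    # Current cycle: offspring inherit the parent's average fP up to small sampling noise...
    fp_curr = fp_prev + fp_noise * rng.standard_normal(n_lineages)
    # ...but the dilution factor is re-tuned, so size is redrawn around BMtarget
    bm_curr = rng.poisson(bm_target, n_lineages)
    r_fp = np.corrcoef(fp_prev, fp_curr)[0, 1]
    r_bm = np.corrcoef(bm_prev, bm_curr)[0, 1]
    return float(r_fp), float(r_bm)

r_fp, r_bm = lineage_correlations()
print(f"average fP: r = {r_fp:.2f}; BM(0): r = {r_bm:.2f}")
```

A heritable determinant gives a correlation near 1. A determinant that is reset every cycle gives a correlation near 0, which is the pattern Figure 6B-D reports for the average fP versus BM(0).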
(A) Schematic. We analyzed two consecutive cycles. In the “previous cycle” (bottom), 100 Newborns matured into 100 Adults, and each Adult was then “pipetted” into multiple Newborns of the “current cycle” (top), forming 100 lineages (tubes with the same color outline belong to the same lineage). (B-D) Among the three determinants of community function, the average fP (fP averaged among M cells in a Newborn) is heritable, but BM(0) (Newborn total biomass) and ϕM(0) (Newborn fraction of M biomass) are not. For each lineage, the community function determinant in the previous cycle was plotted against that in the current cycle. For example, the abscissa of one point in B (e.g. red circle) indicates the average fP of one Newborn community from a particular lineage in the previous cycle (e.g. bottom tube of the red lineage in A). The ordinate and error bar of that point indicate the mean and one standard deviation of the average fP among the Newborn communities of that lineage in the current cycle (e.g. red box in A). (E-G) Correlation between each Adult’s community function P(T) and the community function determinants of its Newborn community. Each dot represents one community, and the two magenta dots indicate the two “successful” Newborns that achieved the highest community function at adulthood. During ineffective community selection (Figure 5B), P(T) correlates weakly with the heritable determinant (the average fP), but strongly with the non-heritable determinants BM(0) and ϕM(0).
Here, the evolutionary upper bound for gHmax was larger than that of gMmax, opposite to that in Figures 4-6. (A) When the selected Adult community reproduced through pipetting, such that BM(0) and ϕM(0) were not fixed, M was almost outcompeted by H (right) as H evolved to grow faster than M (Figure S21). Although M would ordinarily go extinct, community selection managed to maintain M at a very low level (Figure S21A, vi). This imbalanced species ratio resulted in very low community function (left). (B) When the selected Adult community reproduced through cell sorting, such that H and M biomass were fixed in Newborns, community selection successfully improved community function and the average fP. Strikingly, H’s growth parameters did not increase to their upper bounds during effective community selection (Figure S21B), allowing a balanced species ratio (right) and high community function (left). Resource supplied to Newborn communities here supports 10^5 total biomass to accommodate faster growth rates (hence community function is larger than in other figures). The legend is the same as in the top and middle panels of Figure 5.
In successful community selections, variations in community function should be caused mainly by variations in its heritable determinants. However, we found that community function P(T) correlated weakly with its heritable determinant (the average fP) but strongly with its non-heritable determinants (Figure 6E-G). For example, the Newborn that would achieve the highest function had a below-median average fP (left magenta dot in Figure 6E), but it had a high total biomass BM(0) and a low fraction of M biomass ϕM(0) (Figure 6F, G). In other words, variation in community function is largely non-heritable, because it comes mostly from variations in non-heritable determinants.
The reason for the strong correlations between P(T) and the two non-heritable determinants became clear when we examined community dynamics. Recall that we had chosen the maturation time so that Resource remained in excess, avoiding stationary phase. Thus, a “lucky” Newborn community starting with a higher-than-average total biomass would convert more Resource to Product (dotted lines in top panels of Figure S14). Similarly, if a Newborn started with a higher-than-average fraction of H biomass, then H would produce more Byproduct than average. M would then endure a shorter growth lag and make more Product (dotted lines in bottom panels of Figure S14).
An average Newborn community (solid lines) has a total biomass of 100 with 75% M. (A) A “lucky” Newborn community (dotted lines), by stochastic fluctuations, has a total biomass of 130 with 75% M. Even though the two communities share an identical fP = 0.1, the M biomass in the Newborn starting with a total biomass of 130 grows to a higher value (left), depletes more Resource (middle), and makes more Product (right) by the end of a short T (T = 17). (B) A “lucky” Newborn community (dotted lines), by stochastic fluctuations, has a total biomass of 100 with 65% M. Even though the two communities share an identical fP = 0.1, the higher fraction of H biomass results in faster accumulation of Byproduct. Consequently, M cells in the Newborn with lower ϕM(0) (dotted) experience a shorter growth lag, grow to a larger size (left), deplete more Resource (middle), and make more Product (right) by the end of a short T (T = 17). In both cases, the difference between lucky (dotted) and average (solid) communities is smaller at longer T (T = 20) than at shorter T (T = 17, dash-dot line).
To summarize, when community function significantly correlated with its non-heritable determinants (Figure 6F & G), community selection failed to improve community function (Figure 5B).
Reducing non-heritable variations in an experimentally feasible manner promotes artificial community selection
Reducing non-heritable variations in community function should enable community selection to work. One possibility would be to reduce the stochastic fluctuations in the non-heritable determinants BM(0) and ϕM(0). Indeed, when each Newborn received a fixed biomass of H and M (Methods, Section 6), P(T) became strongly correlated with the average fP (Figure 5L). In this case, both the average fP and community function P(T) improved under selection to near their optima (Figure 5, J and K). No P(T) improvement was seen if either Newborn total biomass or species fraction was allowed to fluctuate stochastically (Figure 5, D-I). P(T) also improved if fixed numbers of H and M cells (instead of fixed biomass) were allocated to each Newborn (Figure S15, A and B; Methods, Section 6). Allocating a fixed biomass or number of cells from each species to Newborn communities could be realized experimentally with a cell sorter if the species have different fluorescence [50].
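The contrast between pipetting and sorting can be made concrete with a short sampling sketch. The modeling choices here are our own assumptions, not the procedure in Methods Section 6. Pipetting is modeled as each Adult cell landing in the Newborn tube independently (binomial sampling), with the dilution fraction tuned so that the expected total hits the target. Sorting is modeled as delivering exact H and M counts that preserve the Adult's M fraction. Function names and the Adult cell counts are hypothetical.

```python
import numpy as np

def pipette_newborn(n_h, n_m, target_total, rng):
    """Volumetric dilution: every Adult cell ends up in the Newborn independently."""
    frac = target_total / (n_h + n_m)  # dilution chosen so the *expected* total equals the target
    return int(rng.binomial(n_h, frac)), int(rng.binomial(n_m, frac))

def sort_newborn(n_h, n_m, target_total):
    """Cell sorting: exact H and M counts that preserve the parent Adult's M fraction."""
    n_m_new = round(target_total * n_m / (n_h + n_m))
    return target_total - n_m_new, n_m_new

rng = np.random.default_rng(1)
adult_h, adult_m = 2_000, 6_000  # hypothetical Adult cell counts
pipetted = [pipette_newborn(adult_h, adult_m, 100, rng) for _ in range(1000)]
totals = [h + m for h, m in pipetted]
print("pipetting: mean total", np.mean(totals), "sd", np.std(totals))
print(sort_newborn(adult_h, adult_m, 100))  # → (25, 75)
```

Under these assumptions, pipetted Newborns average 100 cells but scatter by roughly 10 cells, and their M fraction scatters as well. Sorted Newborns are identical every time, which removes the non-heritable variation in BM(0) and ϕM(0).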
For left panels, the total cell number in Newborn communities was fixed to ⌊BMtarget/1.5⌋, where ⌊x⌋ means rounding x down to the nearest integer. For center panels, the ratio between M and H cell numbers in Newborn communities was fixed to IM(T)/IH(T), where IM(T) and IH(T) were the numbers of M and H cells in the selected Adult community from the previous cycle, respectively. For right panels, the total cell number of Newborn communities was fixed to ⌊BMtarget/1.5⌋ and the ratio between M and H cell numbers was fixed to IM(T)/IH(T). See Methods Section 6 for details of simulating community reproduction. Other legend details can be found in Figure 5.
Non-heritable variations in P(T) could also be curtailed by reducing the dependence of P(T) on non-heritable determinants. For example, we could extend the maturation time T until Resource is nearly depleted. In this selection regimen, Newborns would still experience stochastic fluctuations in Newborn total biomass BM(0) and fraction of M biomass ϕM(0). However, all communities would end up with similar P(T), since “unlucky” communities would have time to “catch up” while “lucky” communities wait in stationary phase. Indeed, with this extended T, community function became strongly correlated with the average fP, and it improved without fixing Newborn total biomass or species composition (Figure 5, M-O; Figure S15, C and D). In practice, however, non-heritable variations in community function could still arise from stochastic fluctuations in the duration of stationary phase, which could affect cell survival or the length of recovery time in the next selection cycle.
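The “catch-up” argument can be illustrated with a toy model in which a single population grows by Monod kinetics on a finite Resource pool. This is a deliberate simplification of the H-M community, and every parameter value is hypothetical. Each unit of new biomass is assumed to consume one unit of Resource, so biomass can never exceed the initial biomass plus the initial Resource. The sketch compares an average Newborn (100) with a lucky one (130) at a short and a long maturation time.

```python
def monod_toy(b0, t_end, r0=1e4, g_max=0.25, k=100.0, dt=0.01):
    """Forward-Euler Monod growth on a finite Resource pool (hypothetical parameters)."""
    b, r = float(b0), float(r0)
    for _ in range(round(t_end / dt)):
        growth = g_max * r / (r + k) * b * dt
        growth = min(growth, r)  # cannot consume more Resource than is left
        b += growth
        r -= growth
    return b

short_ratio = monod_toy(130, 17) / monod_toy(100, 17)  # Resource still in excess
long_ratio = monod_toy(130, 20) / monod_toy(100, 20)   # Resource nearly exhausted
print(f"lucky/average biomass: T=17 -> {short_ratio:.3f}, T=20 -> {long_ratio:.3f}")
```

With these made-up numbers, the lucky community is ahead by roughly 30% at T = 17. By T = 20 the advantage is almost gone, because both communities have exhausted the same Resource pool.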
As expected, the effectiveness of community selection also depends on the uncertainty in community function measurements, which is another source of non-heritable variation. We added to each P(T) a measurement noise, normally distributed with a mean of zero and a standard deviation of 5% of the ancestral P(T) value. Community function then improved more slowly than with zero measurement uncertainty (compare Figure S16 left panel with Figure 5 J & K). When measurement uncertainty doubled, community selection failed (Figure S16 right panels). Thus, repeating measurements to reduce measurement uncertainty can make community selection more effective.
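The selection step with measurement noise can be sketched as ranking Adults by measured P(T), defined as the true P(T) plus Gaussian noise, and keeping the top two. The noise standard deviation is set as a fraction of the ancestral P(T), as in the text. The true P(T) values and the ancestral value below are hypothetical, and the function name is ours.

```python
import numpy as np

def select_top(true_p, n_select, noise_sd, rng):
    """Pick the n_select communities with the highest *measured* P(T) = true P(T) + noise."""
    measured = np.asarray(true_p, dtype=float) + rng.normal(0.0, noise_sd, size=len(true_p))
    return np.argsort(measured)[::-1][:n_select]

rng = np.random.default_rng(2)
true_p = np.linspace(900.0, 1000.0, 100)  # hypothetical true P(T) of 100 Adults
ancestral_p = 1000.0                      # hypothetical ancestral P(T)
exact = select_top(true_p, 2, 0.0, rng)
noisy = select_top(true_p, 2, 0.10 * ancestral_p, rng)
print(exact.tolist())  # → [99, 98]
print(noisy.tolist())
```

With zero noise, the truly best communities are always chosen. When the noise (here 10% of the ancestral value) is comparable to the spread in true P(T), the chosen communities are often not the best. This is one way measurement uncertainty can stall improvement.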
Dynamics of the average fP and P(T) of selected Adult communities when Adult communities were chosen to reproduce based on “measured P(T)”, the sum of the actual P(T) and an “uncertainty term” randomly drawn from a normal distribution with zero mean. In the left, center, and right panels, the uncertainty term was drawn from normal distributions with standard deviations of 5%, 7.5%, and 10% of the ancestral P(T), respectively. The middle and lower panels show the average actual P(T) and the average measured P(T), respectively.
Robust conclusions under alternative model assumptions
We have demonstrated that during selection for high H-M community function, seemingly innocuous experimental procedures (e.g. pipetting) could be problematic, and more precise procedures might be required. Our conclusions hold when we used a much lower mutation rate (2 × 10^-5 instead of 2 × 10^-3 mutations per cell per generation per phenotype; Figure S17), although the lower mutation rate slowed community function improvement. Our conclusions also hold when we used a different distribution of mutation effects (a non-null mutation increased or decreased fP by 2% on average; Figure S18), or incorporated epistasis (a non-null mutation would likely reduce fP if the current fP was high, and enhance fP if the current fP was low; Figure S19; Figure S20; Methods Section 5).
(A, B) At short maturation time (T = 17; Resource was not exhausted in an average community), fixing both BM(0) and ϕM(0) was required for community function to improve. (C, D) At long maturation time (T = 20; Resource was nearly exhausted in an average community), community function improved without fixing BM(0) or ϕM(0), and improved even faster when both were fixed. At this mutation rate, because the population size of a community never exceeds 10^4, a mutation occurs on average every 5 cycles. This results in step-wise improvement in both the average fP and P(T). Other legend details can be found in Figure 5.
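The “every 5 cycles” figure can be checked with a back-of-the-envelope estimate. This is our rough reading, not a derivation from the paper. We assume that a community growing from about 10^2 to at most 10^4 cells undergoes roughly 10^4 divisions per cycle, and we count mutations in a single phenotype such as fP within one community:

$$\underbrace{2\times 10^{-5}}_{\text{mutations per division}} \times \underbrace{\sim 10^{4}}_{\text{divisions per cycle}} \approx 0.2\ \text{mutations per cycle} \quad\Longrightarrow\quad \frac{1}{0.2} = 5\ \text{cycles between mutations}$$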
Distributions of mutation effects at different current fP values (marked on top) are plotted. (Top) Without epistasis, the distribution of mutational effects on fP (ΔfP) is identical regardless of the current fP. (Middle and Bottom) With epistasis (see Methods Section 5 for the definition of the epistasis factor), mutational effects on fP depend on the current value of fP. If the current fP is low (left), enhancing mutations are more likely to occur (the area to the right of ΔfP = 0 becomes bigger) and their mean effect becomes larger (mean = 1/slope becomes larger due to a smaller slope), while diminishing mutations are less likely to occur and their mean effect is smaller. If the current fP is high (right), the opposite is true.
When we incorporated different epistasis strengths (epistasis factor of 0.3 and 0.8), we obtained essentially the same conclusions as when epistasis was not considered (Figure 5). Other legend details can be found in Figure 5.
To further test the generality of our conclusions, we simulated community selection on a mutualistic H-M community. Specifically, we assumed that Byproduct was inhibitory to H. Thus, H benefited M by providing Byproduct, and M benefited H by removing the inhibitory Byproduct, similar to the syntrophic community of Desulfovibrio vulgaris and Methanococcus maripaludis [51]. We obtained similar conclusions in this mutualistic H-M community (Figure S22).
Community selection can enforce species coexistence
In most communities, species coexistence may not be guaranteed, due to competition for shared resources. Here, we show that properly executed community selection could also improve the functions of such communities, in part by forcing species coexistence. Consider an H-M community where H had the evolutionary potential to grow much faster than M. In this case, high community function not only required M to pay a fitness cost of fP, but also required H to grow sufficiently slowly to not outcompete M. When community selection was ineffective (“pipetting”; Figure 7A), H’s maximal growth rate evolved to exceed M’s maximal growth rate (Figure S21A, compare i and iv). This drove M nearly to extinction, and community function was very low (Figure 7A; Figure S21A, vi). During effective community selection (fixing H and M’s biomass in Newborns; Figure 7B), H’s maximal growth rate remained far below its evolutionary upper bound, and H’s affinity for Resource even decreased from its ancestral value (Figure S21B, iv and v). In this case, H and M could coexist at a moderate ratio, and community function improved (Figure 7B).
As in Figures S7 and 7, the evolutionary upper bound for gHmax was larger than that of gMmax, opposite to that in Figure 4. (A) Selected Adult communities were reproduced through pipetting such that both BM(0) and ϕM(0) could fluctuate stochastically. Eventually, gHmax and gMmax evolved to their respective upper bounds, and thus gHmax > gMmax (compare i and iv). This would ordinarily lead to the extinction of M. However, community selection managed to maintain M at a very low level (vi). (B) Selected Adult communities were reproduced through biomass sorting so that both BM(0) and ϕM(0) were fixed. Community selection worked in the sense that both the average fP and P(T) improved over cycles (Figure 7). Strikingly, gHmax did not increase to its upper bound, and H’s affinity for Resource even decreased from the ancestral level. Here, Resource supplied to Newborn communities could support 10^5 total biomass to accommodate faster growth rates. The fractions of M biomass in Adult communities were obtained by averaging across selected Adults. Other legend details can be found in Figure S9.
In summary, our conclusions seem general under a variety of model assumptions and apply to a variety of communities.
Discussion
How might we improve the functions of multi-species microbial communities via artificial selection? A common approach is to identify appropriate combinations of species types [8, 9, 12, 11, 15]. However, if we rely solely on species types, then without a constant influx of new species, community function will likely level off quickly [10]. Here, we consider artificial selection of communities with defined member species, so that improving community function requires new genotypes that contribute more to the community function at a cost to themselves.
Artificial selection of whole communities to improve a costly community function requires careful considerations. These considerations include the definition of community function (Methods, Section 7), species choice (Figures 3 and 4), mutation rate, the total number of communities under selection, Newborn target total biomass (Figure S8), the number of generations during maturation (which in turn depends on the amount of Resource added to each Newborn and the maturation time; Figure S8), how we reproduce a selected Adult (e.g. volumetric dilution versus cell sorting, Figure 5), and the uncertainty in community function measurements (Figure S16).
Some of these considerations concern the heredity of the community function under selection. If a community function is highly sensitive to species biomass in Newborn communities, such as P (T) of the H-M community, community selection faces a dilemma: On the one hand, a large Newborn size (BMtarget) would lead to reproducible take-over by non-producers (Figure S8). On the other hand, a small Newborn size means that large non-heritable variations in community function can readily arise (e.g. during pipetting) and interfere with selection (Figure 5A-C). In this case, suppressing such non-heritable variations (e.g. sorting a fixed biomass or a fixed cell number of each species into Newborns) was critical to successful community selection (Figure 5J-L; Figure S15). Similar conclusions hold when we varied model assumptions (Results).
In the work of [8], the authors tested two selection regimens with Newborn sizes differing by 100-fold. The authors hypothesized that smaller Newborns would have a higher level of variation, which should facilitate selection. However, the hypothesis was not corroborated by experiments. As a possible explanation, the authors invoked the “butterfly effect” (the sensitivity of chaotic systems to initial conditions). Our results suggest that even for non-chaotic systems like the H-M community, selection could fail due to interference from non-heritable variations. In Newborns of small size, fluctuations in community composition can be large, which compromises the heredity of the community trait.
In certain regards, community selection is similar to selection of mono-species groups. Group selection, and in a related sense, kin selection [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], have been extensively examined to explain, for example, the evolution of traits that lower individual fitness but increase the success of a group (e.g. sterile ants) 1. In both group selection and community selection, Newborn size must not be too large [57, 58] and maturation time must not be too long. Otherwise, all entities (groups or communities) will accumulate non-contributors in a similar fashion, and this low inter-entity variation impedes selection (Price equation [59]; Figure S1B; Figure S8).
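For reference, the Price equation [59] in its standard textbook form is shown below. Here zᵢ is a trait of entity i (for example, the community function of a community), wᵢ is the number of offspring entities it contributes to the next cycle, and Δzᵢ is the change in the trait between an entity and its offspring:

$$\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left(w_i\,\Delta z_i\right), \qquad \operatorname{Cov}(w_i, z_i) = \beta_{wz}\,\operatorname{Var}(z_i)$$

One way to read this in the context above is as follows. If all entities accumulate non-contributors in a similar fashion, Var(zᵢ) among entities shrinks, so the selection term Cov(wᵢ, zᵢ) becomes small. Meanwhile, the within-entity decline captured by E(wᵢ Δzᵢ) continues.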
Community selection and group selection do differ in other aspects. First, species interactions in a community could drive species composition to a value sub-optimal for community function ([60]). This problem does not exist for group selection especially when a group does not differentiate into interacting subgroups. Second, in group selection, when a Newborn group starts with a small number of individuals (e.g. one individual), a fraction of Newborn groups of the next cycle will be highly similar to the original Newborn group (Figure S1B, bottom panel). This facilitates group selection. In contrast, when a Newborn community starts with a small number of total individuals, large stochastic fluctuations in Newborn composition can interfere with community selection (Figure 5). In the extreme case, a member species may even be lost by chance. Even if a fixed biomass of each species is sorted into Newborns, heredity is much reduced during community selection due to random sampling of genotypes from member species 2.
Community function may not be maximized by pre-optimizing member species in monocultures, because community dynamics are difficult to recapitulate in monocultures. For example, we could start with H and M with all growth parameters at their respective upper bounds (similar to Figure 5), and then improve M’s fP by mono-species group selection (Figure S1B). Specifically, we could start with ntot = 100 Newborn M groups, each inoculated with one M cell (to facilitate group selection; Figure S1B bottom panel) [57]. We would supply each Newborn M group with the same amount of Resource as we would for H-M communities, and with excess Byproduct (since it is difficult to reproduce community Byproduct dynamics in M groups) 3. After these M groups were incubated for the same maturation time T, the group with the highest level of Product, P(T), would be selected and reproduced into Newborn M groups for the next cycle. The fP optimal for monoculture P(T) occurred at an intermediate value (Figure S23; Figure S24). However, this optimal fP for monoculture P(T) is much lower than the optimal fP for community P(T) (Figure S23; see Methods Section 8 for an explanation). Thus, optimizing monoculture activity does not necessarily optimize community function.
In the mutualistic H-M community, H generates Byproduct which is essential for M but inhibitory to H. (A) H can grow to a high density in the presence of M (top) but not in the absence of M (bottom). (B) Similar to community selection on commensal H-M communities, selection worked when non-heritable variations in P (T) were suppressed either via fixing both BM (0) and ϕM (0) at short T (T = 17) or via extending T (T = 20). Other legend details can be found in Figure 5.
Suppose that a Newborn M group starts with a single Manufacturer, supplied with excess Byproduct and the same amount of Resource as a Newborn H-M community. Then maximal group function is achieved at an fP (“mono-adapted”, dashed line) lower than the optimal fP for community function (Figure ??). Here, the growth parameters of M and H are all fixed at their upper bounds, and P(T) is expressed in units of the amount of Product released at the cost of one M biomass.
Phenotypes averaged over selected groups are plotted for 500 selection cycles. Because Byproduct is in excess, KMB terms are no longer relevant in equations (Figure S4, RM≪ BM). Upper bounds of gMmax and 1/KMR are marked with green dashed lines. Magenta lines mark fP optimal for community function and maximal P (T) when gMmax and 1/KMR are fixed at their upper bounds and when Byproduct is in excess.
A general ramification of our work is that before launching a selection experiment, one should carefully consider the selection regimen. Although some community functions are not sensitive to fluctuations in Newborn biomass compositions (e.g. steady state ratio or growth rate of mutualistic communities [61, 33]), many are. How might we check? In the first method, one could initiate Newborn community replicates and measure community functions using the most precise method (e.g. cell sorting during Newborn formation; many repeated measurements of community function). Despite this, some levels of non-heritable variations in community function are inevitable due to, for example, non-genetic phenotypic variations among cells [62] or stochasticity in cell birth and death. If “noises” (variations among community replicates) are small compared to “signals” (variations among communities with different genotypes and thus different community functions), then one can test and possibly adopt less precise procedures (e.g. cell culture pipetting during Newborn formation; fewer repeated measurements of community function). In the second method, if significant variations in community function naturally arise within the first few cycles, one could experimentally evaluate whether community functions of the previous cycle (Figure 6A, bottom row) are correlated with community functions of the current cycle (Figure 6A, top row) across independent lineages.
Microbes can co-evolve with each other and with their host in nature [63, 64, 65]. Some have proposed that complex microbial communities such as the gut microbiota could serve as a unit of selection [14]. Our work suggests that if selection for a costly microbial community function should occur in nature, then mechanisms for suppressing non-heritable variations in community function should be in place.
Methods
1 Equations
H, the biomass of H, changes as a function of growth and death,
$$\frac{dH}{dt} = g_H(\hat{R})\,H - \delta_H H.$$
Growth rate $g_H$ depends on the level of Resource $\hat{R}$ (hat ^ representing pre-scaled value) as described by the Monod growth model
$$g_H(\hat{R}) = \frac{g_{Hmax}\,\hat{R}}{\hat{R} + \hat{K}_{HR}},$$
where $\hat{K}_{HR}$ is the $\hat{R}$ at which $g_{Hmax}/2$ is achieved. δH is the death rate of H. Note that since Waste is in excess, the Waste level does not change and thus does not enter the equation.
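M, the biomass of M, changes as a function of growth and death,
$$\frac{dM}{dt} = (1 - f_P)\,g_M(\hat{R},\hat{B})\,M - \delta_M M.$$
The total potential growth rate of M, $g_M$, depends on the levels of Resource and Byproduct ($\hat{R}$ and $\hat{B}$) according to the Mankad-Bungay model [23], chosen because of its experimental support:
$$g_M(\hat{R},\hat{B}) = g_{Mmax}\,\frac{R_M B_M}{R_M + B_M}\left(\frac{1}{R_M + 1} + \frac{1}{B_M + 1}\right),$$
where $R_M = \hat{R}/\hat{K}_{MR}$ and $B_M = \hat{B}/\hat{K}_{MB}$ (Figure S3). A fraction 1 − fP of M growth is channeled to biomass increase, and a fraction fP is channeled to making Product:
$$\frac{d\hat{P}}{dt} = f_P\,g_M(\hat{R},\hat{B})\,M\,\tilde{c}_{PM},$$
where $\tilde{c}_{PM}$ is the amount of Product made at the cost of one M biomass (tilde ∼ representing scaling factor, see below and Table 1).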
Resource is consumed proportionally to the growth of M and H; Byproduct $\hat{B}$ is released proportionally to H growth and consumed proportionally to M growth:
$$\frac{d\hat{R}}{dt} = -g_M(\hat{R},\hat{B})\,M\,\hat{c}_{RM} - g_H(\hat{R})\,H\,\hat{c}_{RH},$$
$$\frac{d\hat{B}}{dt} = g_H(\hat{R})\,H\,\tilde{c}_{BH} - g_M(\hat{R},\hat{B})\,M\,\hat{c}_{BM}.$$
Here, $\hat{c}_{RM}$ and $\hat{c}_{RH}$ are the amounts of $\hat{R}$ consumed per potential M biomass and H biomass, respectively. $\hat{c}_{BM}$ is the amount of $\hat{B}$ consumed per potential M biomass. $\tilde{c}_{BH}$ is the amount of $\hat{B}$ released per H biomass grown. Our model assumes that Byproduct or Product is generated proportionally to H or M biomass grown, which is reasonable given the stoichiometry of metabolic reactions and experimental support [66]. The community volume is set to 1, and thus cell or metabolite quantities (which are considered here) are numerically identical to cell or metabolite concentrations.
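In the equations above, scaling factors are marked by “∼” and will become 1 after scaling. Variables and parameters with hats will be scaled and lose their hats afterwards. Variables and parameters without hats will not be scaled. We scale the Resource-related variable and parameters ($\hat{R}$, $\hat{K}_{MR}$, $\hat{K}_{HR}$, $\hat{c}_{RM}$, and $\hat{c}_{RH}$) against $\tilde{R}(0)$ (Resource supplied to Newborn), the Byproduct-related variable $\hat{B}$ and parameters ($\hat{K}_{MB}$ and $\hat{c}_{BM}$) against $\tilde{c}_{BH}$ (amount of Byproduct released per H biomass grown), and the Product-related variable $\hat{P}$ against $\tilde{c}_{PM}$ (amount of Product made at the cost of one M biomass). For biologists who usually think of quantities with units, the purpose of scaling (and getting rid of units) is to reduce the number of parameters. For example, the H biomass growth rate can be re-written as:
$$g_H(\hat{R}) = \frac{g_{Hmax}\,\hat{R}/\tilde{R}(0)}{\hat{R}/\tilde{R}(0) + \hat{K}_{HR}/\tilde{R}(0)} = \frac{g_{Hmax}\,R}{R + K_{HR}} = g_H(R),$$
where $R = \hat{R}/\tilde{R}(0)$ and $K_{HR} = \hat{K}_{HR}/\tilde{R}(0)$. Thus, the unscaled $g_H(\hat{R})$ and the scaled $g_H(R)$ share identical forms (Figure S3). After scaling, the value of $\tilde{R}(0)$ becomes irrelevant (1 with no unit). Similarly, since $R_M = \hat{R}/\hat{K}_{MR} = R/K_{MR}$ and $B_M = \hat{B}/\hat{K}_{MB} = B/K_{MB}$, we have $g_M(\hat{R},\hat{B}) = g_M(R,B)$ (Figure S4).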
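Thus, the scaled equations are
$$\frac{dH}{dt} = g_H(R)\,H - \delta_H H, \tag{6}$$
$$\frac{dM}{dt} = (1 - f_P)\,g_M(R,B)\,M - \delta_M M, \tag{7}$$
$$\frac{dP}{dt} = f_P\,g_M(R,B)\,M, \tag{8}$$
$$\frac{dR}{dt} = -g_M(R,B)\,c_{RM}\,M - g_H(R)\,c_{RH}\,H, \tag{9}$$
$$\frac{dB}{dt} = g_H(R)\,H - c_{BM}\,g_M(R,B)\,M. \tag{10}$$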
We have not scaled time here, although time can also be scaled by, for example, the community maturation time. Here, time has the unit of unit time (e.g. hr), and to avoid repetition, we often drop the time unit. After scaling, values of all parameters (including scaling factors) are in Table 1, and variables in our model and simulations are summarized in Table 2.
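To make the scaled model concrete, the sketch below integrates Eqs. 6-10 deterministically over one maturation cycle with a fixed-step fourth-order Runge-Kutta scheme in plain Python. It is an illustration under stated assumptions, not the simulation code used in this work (which tracked individual cells; see Methods Section 6). Growth parameters are set to upper bounds inferred from the text (gHmax = 1.2 × 0.25 = 0.3, gMmax ≈ 0.7, KHR = 1/5, and KMR = 1/3, the last assuming an ancestral KMR = 1 equal to KHR), with cRH = cRM = 10⁻⁴, cBM = 1/3, and death rates at 0.5% of maximal growth rates. KMB = 50 is a hypothetical placeholder because its value is not reproduced here; the names `PARAMS`, `rhs`, and `simulate` are likewise illustrative.

```python
import math  # noqa: F401  (kept for users extending the sketch)

# Illustrative scaled parameters (growth parameters at their upper bounds).
# KMB is a HYPOTHETICAL placeholder; the actual value is listed in Table 1.
PARAMS = {
    "gHmax": 0.3, "gMmax": 0.7,              # maximal growth rates
    "KHR": 0.2, "KMR": 1 / 3, "KMB": 50.0,   # Monod constants (KMB hypothetical)
    "cRH": 1e-4, "cRM": 1e-4, "cBM": 1 / 3,  # consumption per (potential) biomass
    "dH": 0.005 * 0.3, "dM": 0.005 * 0.7,    # death rates: 0.5% of gmax
}


def g_H(R, p):
    """Monod growth rate of H on Resource."""
    return p["gHmax"] * R / (R + p["KHR"])


def g_M(R, B, p):
    """Mankad-Bungay potential growth rate of M on Resource and Byproduct."""
    RM, BM = R / p["KMR"], B / p["KMB"]
    if RM + BM == 0:
        return 0.0
    return p["gMmax"] * RM * BM / (RM + BM) * (1 / (RM + 1) + 1 / (BM + 1))


def rhs(y, fP, p):
    """Right-hand side of Eqs. 6-10; y = (H, M, P, R, B)."""
    H, M, _P, R, B = y
    R, B = max(R, 0.0), max(B, 0.0)  # guard against small negative overshoots
    gh, gm = g_H(R, p), g_M(R, B, p)
    return (
        gh * H - p["dH"] * H,                    # Eq. 6
        (1 - fP) * gm * M - p["dM"] * M,         # Eq. 7
        fP * gm * M,                             # Eq. 8
        -gm * p["cRM"] * M - gh * p["cRH"] * H,  # Eq. 9
        gh * H - p["cBM"] * gm * M,              # Eq. 10
    )


def simulate(fP, H0=40.0, M0=60.0, T=17.0, dt=0.005, p=PARAMS):
    """Integrate one maturation cycle with fixed-step 4th-order Runge-Kutta."""
    y = (H0, M0, 0.0, 1.0, 0.0)  # P(0) = 0, R(0) = 1, B(0) = 0
    for _ in range(int(round(T / dt))):
        k1 = rhs(y, fP, p)
        k2 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)), fP, p)
        k3 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)), fP, p)
        k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)), fP, p)
        y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
    return dict(zip("HMPRB", y))


if __name__ == "__main__":
    for fP in (0.13, 0.41):
        print(fP, round(simulate(fP)["P"], 2))
```

Looping `simulate` over fP and the initial M fraction would give a grid search of the kind described in Methods Section 3, but with the placeholder KMB its optimum need not match the values reported in the text.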
A summary of variables used in the simulation.
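From Eq. 10:
$$B(T) - B(0) = \int_0^T g_H(R)\,H\,dt - c_{BM}\int_0^T g_M(R,B)\,M\,dt. \tag{11}$$
If we approximate Eqs. 6-7 by ignoring the death rates so that $dH/dt \approx g_H H$ and $dM/dt \approx (1 - f_P)\,g_M M$, Eq. 11 becomes
$$B(T) - B(0) \approx H(T) - H(0) - \frac{c_{BM}}{1 - f_P}\left[M(T) - M(0)\right]. \tag{12}$$
If B is the limiting factor for the growth of M so that B is mostly depleted, we can approximate B ≈ 0 (recall that B(0) = 0). If T is large enough that both M and H have multiplied significantly, so that H(T) ≫ H(0) and M(T) ≫ M(0), Eq. 12 becomes
$$H(T) \approx \frac{c_{BM}}{1 - f_P}\,M(T),$$
so the M:H ratio at time T is
$$\frac{M(T)}{H(T)} \approx \frac{1 - f_P}{c_{BM}}.$$
The steady-state ΦM, ΦM,SS, is then
$$\Phi_{M,SS} = \frac{M}{M + H} \approx \frac{1 - f_P}{1 - f_P + c_{BM}}, \tag{14}$$
because if a community has ΦM(0) = ΦM,SS at its Newborn stage, it has the same ΦM(T) = ΦM,SS at its Adult stage.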
In our simulations, because we supplied the H-M community with abundant R to avoid stationary phase, H grows at almost the maximal rate throughout T and releases B. If fP is not too large (fP < 0.4, which is satisfied in our simulations), M grows at the maximal rate allowed by B and keeps B at a low level. Thus, Eq. 14 is applicable and predicts the steady-state ΦM,SS well (see Figure S25). Note that significant deviation occurs when fP > 0.4. This is because when fP is large, M’s biomass does not grow fast enough to deplete B, so we can no longer approximate B(T) ≈ 0.
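Eq. 14 can be evaluated directly. The helper below (a hypothetical name, `phi_m_ss`) uses cBM = 1/3, the scaled value implied by the statement in Methods Section 2 that Byproduct released during one H biomass growth suffices for 3 potential M biomass; it inherits the approximations above and is only meant for fP below about 0.4.

```python
def phi_m_ss(f_P, c_BM=1 / 3):
    """Approximate steady-state fraction of M biomass, Eq. 14.

    Assumes Byproduct is nearly depleted by T and that both species grew
    many-fold during maturation (reported to hold well for f_P < 0.4).
    """
    if not 0.0 <= f_P < 1.0:
        raise ValueError("f_P must lie in [0, 1)")
    return (1.0 - f_P) / (1.0 - f_P + c_BM)


if __name__ == "__main__":
    for f_P in (0.0, 0.13, 0.3):
        print(f"f_P = {f_P:.2f}: Phi_M,SS ~ {phi_m_ss(f_P):.3f}")
```

For example, fP = 0.13 gives ΦM,SS = 0.87/(0.87 + 1/3) ≈ 0.72, and ΦM,SS falls as fP rises because a larger share of M growth is diverted to Product.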
2 Parameter choices
Our parameter choices are based on experimental measurements from a variety of organisms. Additionally, we chose growth parameters (maximal growth rates and affinities for metabolites) of ancestral and evolved H and M so that 1) the two species can coexist at a moderate ratio for a range of fP over multiple selection cycles and 2) improving all growth parameters up to their evolutionary upper bounds generally improves community function (Methods Section 3). This way, we could simplify our simulation by fixing growth parameters at their respective evolutionary upper bounds. With only one mutable parameter (fP), we can identify the optimal fP, $f_P^*$, associated with maximal community function (Figure ??).
For ancestral H, we set gHmax = 0.25 (equivalent to a 2.8-hr doubling time if we choose hr as the time unit), KHR = 1 and cRH = $10^{-4}$ (both in units of $\tilde{R}(0)$) (Table 1). This way, ancestral H can grow by about 10-fold by the end of T = 17. These parameters are biologically realistic. For example, for a lys⁻ S. cerevisiae strain with lysine as Resource, the un-scaled Monod constant is $\hat{K}_{HR} = \ldots$, and consumption ĉ is 2 fmole/cell (Ref. [34], Figure 2 Source Data 1, bioRxiv). Thus, if we choose 10 µL as the community volume and 2 µM as the initial Resource concentration, then $\tilde{R}(0) = 2\ \mu\text{M} \times 10\ \mu\text{L} = 2 \times 10^{4}$ fmole. After scaling, $K_{HR} = \hat{K}_{HR}/\tilde{R}(0)$ and $c_{RH} = \hat{c}_{RH}/\tilde{R}(0) = 2\ \text{fmole}/(2 \times 10^{4}\ \text{fmole}) = 10^{-4}$, comparable to values in Table 1.
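The unit conversion above can be checked in a few lines. This sketch uses only the numbers quoted in the text (10 µL, 2 µM, 2 fmole/cell); the un-scaled Monod constant is left out because its value is not reproduced here, and the variable names are illustrative.

```python
# Worked unit conversion for the lys- S. cerevisiae example in the text.
FMOLE = 1e-15                 # mol per fmole

volume_L = 10e-6              # 10 µL community volume
R0_molar = 2e-6               # 2 µM initial lysine (Resource)
c_hat_RH = 2.0                # consumption: 2 fmole lysine per cell

# Total Resource supplied to a Newborn, expressed in fmole
R0_fmole = R0_molar * volume_L / FMOLE
# Scaled consumption per biomass: divide by the Resource supplied to a Newborn
c_RH = c_hat_RH / R0_fmole

print(f"R~(0) = {R0_fmole:.3g} fmole, scaled c_RH = {c_RH:.3g}")
```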
To ensure the coexistence of H and M, M must grow faster than H for part of the maturation cycle. Since we have assumed M and H to have the same affinity for R (Table 1), gMmax must exceed gHmax (Figure 1), and M’s affinity for Byproduct (1/KMB) must be sufficiently large. Moreover, metabolite release and consumption need to be balanced to avoid extreme ratios between metabolite releaser and consumer. Thus for ancestral M, we chose gMmax = 0.58 (equivalent to a doubling time of 1.2 hrs). We set $c_{BM} = 1/3$, meaning that Byproduct released during one H biomass growth is sufficient to generate 3 potential M biomass, which is biologically achievable ([33, 67]). With $K_{MB}$ chosen as listed in Table 1, H and M can coexist for a range of fP (Figure 3). This value is biologically realistic. For example, suppose that H releases hypoxanthine as Byproduct. A hypoxanthine-requiring S. cerevisiae M strain evolved under hypoxanthine limitation could achieve a Monod constant for hypoxanthine of 0.1 µM (bioRxiv). If the volume of the community is 10 µL, then this $K_{MB}$ corresponds to an absolute release rate of $\tilde{c}_{BH} = \hat{K}_{MB}/K_{MB} \approx 6$ fmole per releaser biomass born. At an 8-hour doubling time, this translates to 6 fmole/(1 cell × 8 hr) ≈ 0.75 fmole/cell/hr, within the ballpark of experimental observation (∼0.3 fmole/cell/hr, bioRxiv). As a comparison, a lysine-overproducing yeast strain reaches a release rate of 0.8 fmole/cell/hr (bioRxiv) and a leucine-overproducing strain reaches a release rate of 4.2 fmole/cell/hr ([67]). Death rates δH and δM were chosen to be 0.5% of H and M’s respective upper bounds of maximal growth rate, which are within the ballpark of experimental observations (e.g. the death rate of a lys⁻ strain in a lysine-limited chemostat is 0.4% of its maximal growth rate, bioRxiv).
We assume that H and M consume the same amount of R per new cell (cRH = cRM), since the biomass of various microbes shares similar elemental (e.g. carbon or nitrogen) compositions [68]. Specifically, cRH = cRM = $10^{-4}$ (in units of $\tilde{R}(0)$), meaning that the Resource supplied to each Newborn community can yield a maximum of $10^{4}$ total biomass.
In simulations shown in Figures 4B, S9, S10, S13, growth parameters (maximal growth rates gMmax and gHmax and affinities for nutrients 1/KMR, 1/KMB, and 1/KHR) and the production cost parameter (0 ≤ fP ≤ 1) were allowed to change from ancestral values during community maturation, since these phenotypes have been observed to evolve rapidly within tens to hundreds of generations ([25, 26, 27, 28]). For example, several-fold improvement in nutrient affinity and ∼20% increase in maximal growth rate have been observed in experimental evolution [28, 26]. We therefore allowed affinities 1/KMR, 1/KHR, and 1/KMB to increase by up to 3-fold, 5-fold, and 5-fold respectively, and allowed gHmax and gMmax to increase by up to 20%. These bounds also ensured that evolved H and M could coexist for fP < 0.5, and that Resource was on average not depleted by T, so that cells did not enter stationary phase.
We also simulated community selection where improved growth parameters could reduce community function (Figure 4A). In this simulation, gHmax was allowed to increase by up to 220%, and each Newborn community was supplied with R that can support up to $10^{5}$ cells (10 units of $\tilde{R}(0)$).
Although maximal growth rate and nutrient affinity can sometimes show a trade-off (e.g. Ref. [26]), for simplicity we assumed here that they are independent of each other. We held metabolite consumption (cRM, cBM, cRH) constant because conversion of essential elements such as carbon and nitrogen into biomass is unlikely to evolve quickly and dramatically, especially when these elements are not in large excess ([68]). Similarly, we held the scaling factors $\tilde{c}_{BH}$ and $\tilde{c}_{PM}$ constant, assuming that they do not change rapidly during evolution due to stoichiometric constraints of biochemical reactions. We held death rates (δM, δH) constant because they are generally much smaller than growth rates, and thus any changes are likely inconsequential.
3 Choosing growth parameter ranges so that we can fix growth parameters to upper bounds
Improving individual growth (maximal growth rate and affinity for metabolites) does not always lead to improved community function (Figure 4A). However, we have chosen H and M growth parameters so that improving them from their ancestral values up to upper bounds generally improves community function (see below). When Newborn communities are assembled from “growth-adapted” H and M with growth parameters at upper bounds, two advantages are apparent.
First, after fixing growth parameters of H and M to their upper bounds, we can identify a locally maximal community function. Specifically, for a Newborn with total biomass BM(0) = 100 and fixed Resource R, we can calculate P(T) under various fP and ΦM(0), assuming that all M cells have the same fP. Since both numbers range between 0 and 1, we calculate P(T, fP = 0.01 × i, ΦM(0) = 0.01 × j) for integers i and j between 1 and 99. There is a single maximum for P(T) at i = 41 and j = 54. In other words, if M invests 41% of its potential growth to make Product (fP = 0.41) and if the fraction of M biomass in the Newborn is ΦM(0) = 0.54, then maximal community function P*(T) is achieved (Figure ??A; magenta dashed line in Figure 5).
Second, growth-adapted H and M are evolutionarily stable in the sense that deviations (reductions) from upper bounds will reduce both individual fitness and community function, and are therefore disfavored by natural selection and artificial selection on the community function.
Below, we present evidence that within our parameter ranges (Table 1), improving growth parameters generally improves community function. When fP is optimal for community function (fP = 0.41), if we fix four of the five growth parameters to their upper bounds, then as the remaining growth parameter improves, community function increases (magenta lines in top panels of Figure S26). Moreover, mutants with a reduced growth parameter are out-competed by their growth-adapted counterparts (magenta lines in bottom panels of Figure S26).
In all panels, solid and dashed lines respectively represent calculations with fP = 0.41 (optimal for community function; Figure ??) and fP = 0.13
(optimal for M monoculture production when Byproduct is in excess; Figure S23). Except for the indicated growth parameter, all other growth parameters were set to their respective upper bounds. (A-D) Community function increases as the indicated growth parameter increases. For example in (A), all growth parameters except for gMmax were set to their upper bounds. For each gMmax, the steady-state ϕM,SS was calculated using equations in Methods Section 1. This steady-state ϕM,SS was then used to calculate P (T). (F-I) The ratio between mutant population (whose indicated growth parameter was 10% lower than the upper bound) and growth-adapted population over maturation time T = 17. The decreasing ratio indicates that the mutant has a lower fitness compared to the growth-adapted cells. For example in (F), a Newborn community had 70 M and 30 H. 90% of M were growth-adapted and had upper bound gMmax = 0.7 (“upper bound”). 10% of M had gMmax = 0.63, 10% less than the upper bound (“mutant”). The ratio between “mutant” and “upper bound” cells declined over maturation time, indicating that mutant M cells had a lower fitness. (E, J) When fP = 0.13 (black dashed line) but not when fP = 0.41 (magenta line), increasing M’s affinity for Resource (1/KMR) slightly decreases individual fitness, and barely affects community function.
When fP = 0.13 (optimal for M-monoculture function in Figure S23; the starting genotype for most community selection trials in this paper), community function and individual fitness generally increase as growth parameters improve (black dashed lines in Figure S26). However, when M’s affinity for Resource (1/KMR) is reduced from its upper bound, fitness improves slightly (black dashed line in Panel J, Figure S26). Mathematically speaking, this is a consequence of the Mankad-Bungay model [23] (Figure S4B). Let RM = R/KMR and BM = B/KMB. Then,
$$g_M(R,B) = g_{Mmax}\,\frac{R_M B_M}{R_M + B_M}\left(\frac{1}{R_M + 1} + \frac{1}{B_M + 1}\right).$$
If RM ≪ 1 ≪ BM (corresponding to limiting R and abundant B),
$$g_M \approx g_{Mmax}\,R_M = \frac{g_{Mmax}\,R}{K_{MR}},$$
and thus $\partial g_M/\partial K_{MR} \approx -g_{Mmax}R/K_{MR}^2 < 0$. This is the familiar case where growth rate increases as the Monod constant decreases (i.e. affinity increases). However, if BM ≪ 1 ≪ RM (corresponding to limiting B and abundant R),
$$g_M \approx g_{Mmax}\,B_M\left(1 + \frac{1}{1 + R_M}\right) \approx \frac{g_{Mmax}\,B}{K_{MB}}\left(1 + \frac{K_{MR}}{R}\right), \tag{16}$$
and thus $\partial g_M/\partial K_{MR} \approx g_{Mmax}B/(K_{MB}R) > 0$. In this case, growth rate decreases as the Monod constant decreases (i.e. affinity increases). In other words, decreased affinity for the abundant nutrient improves growth rate. Transporter competition for membrane space [69] could lead to this result, since reduced affinity for the abundant nutrient may increase affinity for the rare nutrient. At the beginning of each cycle, R is abundant and B is limiting (Eq. 16). Therefore M cells with lower affinity for R will grow faster than those with higher affinity (Figure S27). At the end of each cycle, the opposite is true (Figure S27). As fP decreases, M diverts more toward biomass growth and the first stage of B limitation lasts longer. Consequently, M can gain a slightly higher overall fitness by lowering its affinity for R (Figure S27A).
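The two limits of the Mankad-Bungay model can be checked numerically. In this sketch the nutrient levels (R = 0.01 or 100, B = 1000 or 0.01), KMB = 1, and the two KMR values are hypothetical numbers chosen only to place the model deep in each regime; gMmax = 0.7 is the upper bound used in the text.

```python
def g_M(R, B, gMmax, KMR, KMB):
    """Mankad-Bungay growth rate of M in scaled units."""
    RM, BM = R / KMR, B / KMB
    if RM + BM == 0:
        return 0.0
    return gMmax * RM * BM / (RM + BM) * (1 / (RM + 1) + 1 / (BM + 1))


# Illustrative (hypothetical) nutrient levels and parameters.
G, KMB = 0.7, 1.0
HIGH_AFF, LOW_AFF = 1 / 3, 1.0   # K_MR values: smaller K_MR = higher affinity

# R-limited, B-abundant regime (R_M << 1 << B_M)
r_lim = {k: g_M(0.01, 1000.0, G, k, KMB) for k in (HIGH_AFF, LOW_AFF)}
# B-limited, R-abundant regime (B_M << 1 << R_M)
b_lim = {k: g_M(100.0, 0.01, G, k, KMB) for k in (HIGH_AFF, LOW_AFF)}

print("R-limited:", r_lim)  # higher affinity for R grows faster
print("B-limited:", b_lim)  # lower affinity for R grows (slightly) faster
```

In the B-limited regime the advantage of lower affinity is under 1% here, in line with the text's point that this effect on fitness is slight.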
(A) The ratio between M_LowAff, with low affinity for R, and M_HighAff, with high affinity for R, when their fP is equal to 0.1 (solid line), 0.2 (dotted line) and 0.3 (dashed line), plotted over one maturation cycle. (B) P(T) improves with increasing affinity 1/KMR when fP is 0.1 (solid line), 0.2 (dotted line) and 0.3 (dashed line). The dependence of P(T) on 1/KMR is rather weak for low fP. For example, when 1/KMR increases from 1 to 3, P(T) increases by only 2% and 0.6% for fP = 0.2 and fP = 0.1, respectively.
Regardless, decreased M affinity for Resource (1/KMR) only leads to a very slight increase in M fitness (Figure S26J) and a very slight decrease in P(T) (Figure S27B). Moreover, this only occurs at low fP at the beginning of community selection, and thus may be neglected. Indeed, if we start with all growth parameters at their upper bounds and fP = 0.13, and perform community selection while allowing all parameters to vary (Figure S28), then 1/KMR decreases somewhat, yet the dynamics of fP are similar to those when we only allow fP to change (compare Figure S28D with Figure 5A).
In the Newborn communities of the first cycle of community selection, all growth parameters of H and M were at their upper bounds and fP = 0.13 (Figure S23). When we simulated community selection while allowing all growth parameters and fP to vary, M’s affinity for R (1/KMR) decreased slightly because at low fP = 0.13, M with a lower affinity for R (lower 1/KMR) has a slightly improved individual fitness (Figure S27). Other growth parameters (gHmax, gMmax, 1/KHR and 1/KMB) remained mostly constant during community selection because mutants with lower-than-maximal values were selected against by natural selection and by artificial selection for higher community function (Figure S26). Other legend details can be found in Figure S9.
4 Mutation rate and the distribution of mutation effects
Literature values of mutation rate and the distribution of mutation effects are highly variable. Below, we briefly review the literature and discuss rationales of our choices.
Among mutations, a fraction is neutral in that they do not affect the phenotype of interest. For example, the vast majority of synonymous mutations are neutral [70]. Furthermore, mutations with small effects may appear neutral, depending on the effective population size and selection condition. For example, in a small population subject to genetic drift (i.e. changes in allele frequencies due to chance), a beneficial or deleterious mutation may be neither selected for nor selected against, and is thus neutral with respect to selection [71, 72]. As another example, the same mutation in an antibiotic-degrading gene can be neutral under low antibiotic concentrations, but deleterious under high antibiotic concentrations [73]. We term all these cases “neutral” mutations.
Since a larger fraction of neutral mutations is equivalent to a lower rate of phenotype-altering mutations, our simulations define “mutation rate” as the rate of non-neutral mutations that either enhance a phenotype (“enhancing mutations”) or diminish a phenotype (“diminishing mutations”). Enhancing mutations of maximal growth rates (gHmax and gMmax) and of nutrient affinities (1/KHR, 1/KMR, 1/KMB) enhance the fitness of an individual (“beneficial mutations”). In contrast, enhancing mutations in fP diminish the fitness of an individual (“deleterious mutations”).
Depending on the phenotype, the rate of phenotype-altering mutations is highly variable. Although mutations that cause qualitative phenotypic changes (e.g. drug resistance) occur at a rate of $10^{-8}$ to $10^{-6}$ per genome per generation in bacteria and yeast [74, 75], mutations affecting quantitative traits such as growth rate occur much more frequently. For example, in yeast, mutations that increase growth rate by ≥ 2% occur at a rate of ∼$10^{-4}$ per genome per generation (calculated from Figure 3 of Ref. [76]), and mutations that reduce growth rate occur at a rate of $10^{-4}$ to $10^{-3}$ per genome per generation [31, 77]. Moreover, mutation rate can be elevated by as much as 100-fold in hyper-mutators where DNA repair is dysfunctional [78, 79, 77]. In our simulations, we assume a high, but biologically feasible, rate of $2 \times 10^{-3}$ phenotype-altering mutations per cell per generation per phenotype to speed up computation. At this rate, an average community would sample ∼20 new mutations per phenotype during maturation. We have also simulated with a 100-fold lower mutation rate. As expected, evolutionary dynamics slowed down, but all of our conclusions still held (Figure S17).
Among phenotype-altering mutations, tens of percent create null mutants, as illustrated by experimental studies on proteins, viruses, and yeast [29, 30, 31]. Thus, we assumed that 50% of phenotype-altering mutations were null (i.e. resulting in zero maximal growth rate, zero affinity for metabolite, or zero fP). Among non-null mutations, the relative abundances of enhancing versus diminishing mutations are highly variable in different experiments. These abundances can be affected by effective population size. For example, with a large effective population size, the survival rate of beneficial mutations is 1000-fold lower due to clonal interference (competition between beneficial mutations) [80]. The relative abundance of enhancing versus diminishing mutations also strongly depends on the starting phenotype [29, 73, 71]. For example, with ampicillin as a substrate, the wild-type TEM-1 β-lactamase is a “perfect” enzyme. Consequently, mutations were either neutral or diminishing, and few enhanced enzyme activity [73]. In contrast, with a novel substrate such as cefotaxime, the enzyme had undetectable activity, and diminishing mutations were not detected while 2% of tested mutations were enhancing [73]. When modeling H-M communities, we assumed that the ancestral H and M had intermediate phenotypes that can be enhanced or diminished.
We based our distribution of mutation effects on experimental studies where a large number of enhancing and diminishing mutants have been quantified in an unbiased fashion. An example is a study from the Dunham lab where the fitness effects of thousands of S. cerevisiae mutations were quantified under various nutrient limitations [32]. Specifically, for each nutrient limitation, the authors first measured ΔsWT, the deviation in relative fitness of thousands of barcoded wild-type control strains from the wild-type mean fitness (i.e. selection coefficients). Due to experimental noise, ΔsWT is distributed with zero mean and non-zero variance. Then, the authors measured thousands of ΔsMT, each corresponding to the relative fitness change of a barcoded mutant strain with respect to the mean of wild-type fitness. From these two distributions, we derived µΔs, the probability density function (PDF) of the relative fitness change caused by mutations, Δs = ΔsMT − ΔsWT (see Figure S6 for interpreting PDF), in the following manner.
First, we calculated µm(ΔsMT), the discrete PDF of the relative fitness change of mutant strains, with bin width 0.04. In other words, µm(ΔsMT) = (counts in the bin [ΔsMT − 0.02, ΔsMT + 0.02])/(total counts)/0.04, where ΔsMT ranges from −0.6 to 0.6, which is sufficient to cover the range of experimental outcomes. The Poissonian uncertainty of µm(ΔsMT) is δµm = √(counts in the bin)/(total counts)/0.04. Repeating this process for the wild-type collection, we obtained µw(ΔsWT), the PDF of the relative fitness change of wild-type strains, and its uncertainty δµw. Next, from µw(ΔsWT) and µm(ΔsMT), we derived µΔs(Δs), the PDF of Δs with bin width 0.04:
$$\mu_{\Delta s}(i) = \sum_{j} \mu_m(i + j)\,\mu_w(j) \times 0.04,$$
assuming that ΔsMT and ΔsWT are independent of each other. Here, i is an integer from −15 to 15, and the sum runs over all j for which both bins lie within [−0.6, 0.6]. The uncertainty for µΔs was calculated by propagation of error. That is, if f is a function of $x_i$ (i = 1, 2, …, n), then $s_f$, the error of f, is
$$s_f = \sqrt{\sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2 s_{x_i}^2},$$
where $s_{x_i}$ is the error or uncertainty of $x_i$. Thus,
$$\delta\mu_{\Delta s}(i) = 0.04\,\sqrt{\sum_{j}\left[\mu_w(j)^2\,\delta\mu_m(i + j)^2 + \mu_m(i + j)^2\,\delta\mu_w(j)^2\right]},$$
where µw(j) is short-hand notation for µw(ΔsWT = j × 0.04) and so on. Our calculated µΔs(Δs) with error bar of δµΔs is shown in Figure S6.
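The binning, the PDF of the difference, and the propagated error can be sketched as follows. The sketch assumes, as in the text, bin width 0.04, bins spanning −0.6 to 0.6, Poisson counting errors, and independence of ΔsMT and ΔsWT, so that the PDF of their difference is a discrete cross-correlation of the two binned PDFs. The function names are hypothetical, and this is an illustration rather than the analysis code behind Figure S6.

```python
import math

BIN = 0.04
IDX = range(-15, 16)  # bin centres i * BIN span [-0.6, 0.6]


def binned_pdf(values):
    """Discrete PDF (bin width BIN) and its Poisson uncertainty, keyed by bin index."""
    counts = dict.fromkeys(IDX, 0)
    for v in values:
        i = round(v / BIN)          # bin [i*BIN - 0.02, i*BIN + 0.02]
        if i in counts:
            counts[i] += 1
    total = len(values)
    pdf = {i: c / total / BIN for i, c in counts.items()}
    err = {i: math.sqrt(c) / total / BIN for i, c in counts.items()}
    return pdf, err


def difference_pdf(mu_m, d_m, mu_w, d_w):
    """PDF of ds = ds_MT - ds_WT for independent ds_MT, ds_WT, with propagated error."""
    mu, dmu = {}, {}
    for i in IDX:
        s = var = 0.0
        for j in IDX:
            k = i + j               # ds_MT bin such that ds_MT - ds_WT = i * BIN
            if k not in mu_m:
                continue
            s += mu_m[k] * mu_w[j] * BIN
            var += (mu_w[j] * d_m[k] * BIN) ** 2 + (mu_m[k] * d_w[j] * BIN) ** 2
        mu[i], dmu[i] = s, math.sqrt(var)
    return mu, dmu
```

With real data, `mu` and `dmu` would play the roles of µΔs(i) and δµΔs(i), and µΔs(i)/δµΔs(i) would then serve as the fitting weight described below.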
Our reanalysis demonstrated that distributions of mutation fitness effects µΔs(Δs) are largely conserved regardless of nutrient conditions and mutation types (Figure S6B). In all cases, the relative fitness changes caused by beneficial (fitness-enhancing) and deleterious (fitness-diminishing) mutations can be approximated by a bilateral exponential distribution with means s+ and s− for the positive and negative halves, respectively. After normalizing the total probability to 1, we have:
$$\mu_{\Delta s}(\Delta s) = \begin{cases} \dfrac{1}{s_+ + s_-}\exp\left(-\dfrac{\Delta s}{s_+}\right), & \Delta s \ge 0,\\[2ex] \dfrac{1}{s_+ + s_-}\exp\left(\dfrac{\Delta s}{s_-}\right), & \Delta s < 0. \end{cases} \tag{19}$$
We fitted the Dunham lab haploid data (since microbes are often haploid) to Eq. 19, using µΔs(i)/δµΔs(i) as the weight for non-linear least-squares regression (green lines in Figure S6B). We obtained s+ = 0.050 ± 0.002 and s− = 0.067 ± 0.003.
Interestingly, an exponential distribution described the fitness effects of deleterious mutations in an RNA virus remarkably well [29]. Based on extreme value theory, the fitness effects of beneficial mutations were predicted to follow an exponential distribution [81, 82], which has gained experimental support from bacteria and viruses [83, 84, 85] (although see [86, 76] for counterexamples). Evolutionary models based on exponential distributions of fitness effects have shown good agreement with experimental data [80, 87].
We have also simulated smaller average mutational effects based on measurements of spontaneous or chemically-induced (instead of deletion) mutations. For example, the fitness effects of nonlethal deleterious mutations in S. cerevisiae were mostly 1%∼5% [31], and the mean selection coefficient of beneficial mutations in E. coli was 1%∼2% [83, 80]. As an alternative, we also simulated with s+ = s- = 0.02, and obtained the same conclusions (Figure S18).
5 Modeling epistasis on fP
Epistasis, where the effect of a new mutation depends on prior mutations (“genetic background”), is known to affect evolutionary dynamics. Epistatic effects have been quantified in various ways. Experiments on viruses, bacteria, yeast, and proteins have demonstrated that if two mutations were both deleterious or random, viable double mutants experienced epistatic effects that distributed nearly symmetrically around a value close to zero [88, 89, 90, 91, 92]. In other words, a significant fraction of mutation pairs show no epistasis, and a small fraction show positive or negative epistasis (i.e. a double mutant displays a stronger or weaker phenotype than expected from additive effects of the two single mutants). Epistasis between two beneficial mutations can vary from being predominantly negative [89] to being symmetrically distributed around zero [90]. Furthermore, a beneficial mutation tends to confer a lower beneficial effect if the background already has high fitness (“diminishing returns”) [93, 90, 94].
A mathematical model by Wiser et al. incorporates diminishing-returns epistasis [87]. In this model, beneficial mutations of advantage s in the ancestral background are exponentially distributed with probability density function (PDF) α exp(−αs), where 1/α > 0 is the mean advantage. After a mutation with advantage s has occurred, the mean advantage of the next mutation is reduced to 1/[α(1 + gs)], where g > 0 is the “diminishing returns parameter”. Wiser et al. estimated g ≈ 6. This model quantitatively explains the fitness dynamics of evolving E. coli populations.
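Based on the above experimental and theoretical literature, we modeled epistasis on fP in the following manner. Let the relative mutation effect on fP be ΔfP = (fP,mut − fP)/fP (note ΔfP ≥ −1). Then, µ(ΔfP, fP), the PDF of ΔfP at the current fP value, is described by a form similar to Eq. 19, truncated at ΔfP = −1 and renormalized:
$$\mu(\Delta f_P, f_P) = \begin{cases} \dfrac{\exp\left(\Delta f_P/s_-(f_P)\right)}{s_+(f_P) + s_-(f_P)\left[1 - \exp\left(-1/s_-(f_P)\right)\right]}, & -1 \le \Delta f_P < 0,\\[2ex] \dfrac{\exp\left(-\Delta f_P/s_+(f_P)\right)}{s_+(f_P) + s_-(f_P)\left[1 - \exp\left(-1/s_-(f_P)\right)\right]}, & \Delta f_P \ge 0. \end{cases} \tag{20}$$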
Here, s+(fP) and s−(fP) are respectively the mean ΔfP for enhancing and diminishing mutations at the current fP. We assigned $s_+(f_P) = s_{+init}/[1 + g(f_P/f_{P,init} - 1)]$, where fP,init is the fP of the initial background in a community selection simulation, s+init is the mean enhancing ΔfP occurring in the initial background, and 0 < g < 1 is the epistatic factor. Similarly, $s_-(f_P) = s_{-init}[1 + g(f_P/f_{P,init} - 1)]$ is the mean |ΔfP| for diminishing mutations at the current fP. In the initial background, since fP = fP,init, we have s+(fP) = s+init and s−(fP) = s−init (s+init = 0.050 and s−init = 0.067 in Figure S6). Consistent with the diminishing-returns principle, for subsequent mutations that alter fP, if the current fP > fP,init, then a new enhancing mutation became less likely and its mean effect smaller, while a new diminishing mutation became more likely and its mean effect bigger (ensured by g > 0; Figure S19 right panel). Similarly, if the current fP < fP,init, then a new enhancing mutation became more likely and its mean effect bigger, while a diminishing mutation became less likely and its mean effect smaller (ensured by 0 < g < 1; Figure S19 left panel). In summary, our model captured not only diminishing-returns epistasis, but also our understanding of mutational effects on protein stability [71].
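The epistasis rule can be written as two small functions. In this sketch g = 0.5 is a hypothetical default (the text only requires 0 < g < 1), and the probability that a non-null mutation is enhancing follows from integrating the ΔfP ≥ 0 half of Eq. 20; the function names are illustrative.

```python
import math


def mean_effects(f_P, f_P_init, s_plus_init=0.050, s_minus_init=0.067, g=0.5):
    """Mean enhancing / diminishing relative effects on f_P at the current f_P.

    g is the epistatic factor (0 < g < 1); g = 0.5 is a HYPOTHETICAL default.
    """
    if not 0.0 < g < 1.0:
        raise ValueError("epistatic factor g must satisfy 0 < g < 1")
    factor = 1.0 + g * (f_P / f_P_init - 1.0)
    return s_plus_init / factor, s_minus_init * factor


def prob_enhancing(s_plus, s_minus):
    """Probability that a non-null mutation enhances f_P under Eq. 20."""
    norm = s_plus + s_minus * (1.0 - math.exp(-1.0 / s_minus))
    return s_plus / norm
```

Doubling fP from fP,init (with g = 0.5) shrinks the mean enhancing effect to two thirds of s+init and enlarges the mean diminishing effect by half, and `prob_enhancing` drops accordingly, which is the diminishing-returns behavior described above.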
6 Simulation code of community selection
As described in the main text, our simulations tracked the biomass and phenotypes of individual cells as well as the amounts of Resource, Byproduct, and Product in each community throughout community selection. Cell biomass growth, cell division, and changes in chemical concentrations were calculated deterministically. Stochastic processes including cell death, mutation, and the partitioning of cells of a selected Adult community into Newborn communities were simulated using the Monte Carlo method.
Specifically, each simulation was initialized with a total of ntot = 100 Newborn communities with identical configuration:
each community had 100 total cells of biomass 1. Thus, total biomass BM (0) = 100.
40 cells were H. 60 cells were M with identical fP. Thus, M biomass M (0) = 60 and fraction of M biomass ϕM (0) = 0.6.
Our community selection simulations did not consider mutations arising during pre-growth prior to inoculating Newborns of the first cycle, because incorporating pre-growth had little impact on evolution dynamics (Figure S29).
(Top Panels) Histograms of the number of Newborn communities free of non-producer M mutants when Newborn communities from the first cycle were inoculated from a single M monoculture (Left panel) or from independently-grown M monocultures (Right panel). (Middle and Bottom panels) Improvement in and
was only slightly slower when Newborn communities from the first cycle were inoculated by the same M monoculture (Left panel) than by distinct monocultures (Right panel). Here we assumed that each M monoculture grew from a single non-null M cell. This M cell went through ∼23 doublings and therefore multiplied into ∼$10^{7}$ cells. Every time a non-null M cell divides, the mother and daughter cells can independently mutate and become a null M cell (fP = 0) at a fixed probability of $10^{-3}$. If a non-null M cell has fP = 0.13, then it will grow at a rate 87% of that of a null cell. After ∼23 doublings, the M monocultures have on average ∼3% null mutants. Sixty randomly chosen M cells from the same monoculture or from distinct monocultures, together with 40 H cells, were used to inoculate each of the 100 Newborns for the first selection cycle. To generate the histograms in (Top panel), the pre-growth and inoculation process was repeated 100 times.
At the beginning of each selection cycle, a random number was used to seed the random number generator for each Newborn community. This number was saved so that the maturation of each Newborn community could be replayed. In most simulations, the initial amount of Resource was 1 unit of $\tilde{R}(0)$ unless otherwise specified, the initial Byproduct was B(0) = 0, and the initial Product was P(0) = 0.
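The maturation time T was divided into time steps of Δτ = 0.05. Resource R(t) and Byproduct B(t) during each time interval [τ, τ + Δτ] were calculated by solving the following equations (similar to Eqs. 9-10) using the initial conditions R(τ) and B(τ) via the ode23s solver in Matlab:
$$\frac{dR}{dt} = -g_M(R,B)\,c_{RM}\,M(\tau) - g_H(R)\,c_{RH}\,H(\tau), \tag{21}$$
$$\frac{dB}{dt} = g_H(R)\,H(\tau) - c_{BM}\,g_M(R,B)\,M(\tau), \tag{22}$$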
where M (τ) and H(τ) were the biomass of M and H at time τ (treated as constants during time interval [τ, τ +Δτ]), respectively. The solutions from Eq. 21 and 22 were used in the integrals below to calculate the biomass growth of H and M cells.
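Suppose that H and M were rod-shaped organisms with a fixed diameter. Thus, the biomass of an H cell at time τ could be written as the length variable LH(τ). The continuous growth of LH between τ and τ + Δτ could be described as
$$\frac{dL_H(t)}{dt} = g_H(R(t))\,L_H(t)$$
or
$$d\ln L_H(t) = g_H(R(t))\,dt.$$
Thus,
$$L_H(\tau + \Delta\tau) = L_H(\tau)\exp\left(\int_{\tau}^{\tau+\Delta\tau} g_H(R(t))\,dt\right).$$
Similarly, let the length of an M cell be LM(τ). The continuous growth of M could be described as
$$\frac{dL_M(t)}{dt} = (1 - f_P)\,g_M(R(t),B(t))\,L_M(t).$$
Thus for an M cell, its length LM(τ + Δτ) could be described as
$$L_M(\tau + \Delta\tau) = L_M(\tau)\exp\left((1 - f_P)\int_{\tau}^{\tau+\Delta\tau} g_M(R(t),B(t))\,dt\right).$$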
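From Eq. 7 and 8, within Δτ, each M cell makes Product in proportion to its biomass growth,
$$\frac{dP}{dt} = \frac{f_P}{1 - f_P}\,\frac{dL_M}{dt},$$
and therefore
$$P(\tau + \Delta\tau) = P(\tau) + \sum_{\text{M cells}} \frac{f_P}{1 - f_P}\left[L_M(\tau + \Delta\tau) - L_M(\tau)\right],$$
where each M cell contributes with its own fP. The total M biomass at τ + Δτ, M(τ + Δτ) = Σ LM(τ + Δτ), is the sum of the biomass (or lengths) of all M cells at τ + Δτ.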
At the end of each Δτ, each H and M cell had a probability of δHΔτ and δMΔτ to die, respectively. This was simulated by assigning a random number between 0 and 1 to each cell; cells assigned a random number less than δHΔτ or δMΔτ were eliminated. For surviving cells, if a cell’s length was ≥ 2, this cell would divide into two cells with half the original length.
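The per-step bookkeeping for individual cells (grow, then die with probability δΔτ, then divide at length ≥ 2) can be sketched in Python, although the original simulations used Matlab. This sketch assumes that the integral of g_H or g_M over the step has already been obtained from the solution of Eqs. 21-22 and is shared by all cells of a species, while each M cell carries its own fP; `step_cells` is a hypothetical helper, not the authors' code.

```python
import math
import random


def step_cells(cells, int_g, death_prob, rng):
    """One time step Δτ for the cells of one species.

    cells      : list of (length, f_P) pairs; use f_P = 0 for H cells
    int_g      : integral of g over [τ, τ+Δτ] (g_H for H, g_M for M)
    death_prob : δ·Δτ, the per-cell death probability in this step
    """
    out = []
    for length, f_P in cells:
        length *= math.exp((1.0 - f_P) * int_g)   # deterministic growth
        if rng.random() < death_prob:              # stochastic death
            continue
        if length >= 2.0:                          # divide into two halves
            out += [(length / 2, f_P), (length / 2, f_P)]
        else:
            out.append((length, f_P))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    cells = [(1.9, 0.0), (1.0, 0.13)]
    print(step_cells(cells, int_g=0.1, death_prob=0.005 * 0.05, rng=rng))
```

Mutation (next paragraph) would be applied to the cells returned by this step, since it happens after division.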
After division, each mutable phenotype of each cell had a probability of Pmut to be modified by a mutation (Methods, Section 4). As an example, let’s consider mutations in fP. If a mutation occurred, then fP would be multiplied by (1 + ΔfP), where ΔfP was determined as below.
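First, a uniform random number u1 between 0 and 1 was generated. If u1 ≤ 0.5, ΔfP = −1, which represented a 50% chance of a null mutation (fP = 0). If 0.5 < u1 ≤ 1, ΔfP followed the distribution defined by Eq. 20 with s+(fP) = 0.05 for fP-enhancing mutations and s−(fP) = 0.067 for fP-diminishing mutations when epistasis was not considered (Methods, Section 4). In the simulation, ΔfP was generated via inverse transform sampling. Specifically, C(ΔfP), the cumulative distribution function (CDF) of ΔfP, could be found by integrating Eq. 19 from −1 to ΔfP:
$$C(\Delta f_P) = \begin{cases} \dfrac{s_-\left[\exp(\Delta f_P/s_-) - \exp(-1/s_-)\right]}{Z}, & -1 \le \Delta f_P \le 0, \\[1ex] \dfrac{s_-\left[1 - \exp(-1/s_-)\right] + s_+\left[1 - \exp(-\Delta f_P/s_+)\right]}{Z}, & \Delta f_P > 0, \end{cases} \qquad (25)$$
where $Z = s_+ + s_-\left[1 - \exp(-1/s_-)\right]$ normalizes the distribution.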
The two parts of Eq. 25 join at $C(\Delta f_P = 0) = s_-\left[1-\exp(-1/s_-)\right] / \left\{s_+ + s_-\left[1-\exp(-1/s_-)\right]\right\}$.
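To generate ΔfP following the distribution in Eq. 19, a uniform random number u2 between 0 and 1 was generated, and we set C(ΔfP) = u2. Inverting Eq. 25 yielded
$$\Delta f_P = \begin{cases} s_-\ln\left[\dfrac{u_2 Z}{s_-} + \exp(-1/s_-)\right], & u_2 \le C(0), \\[1ex] -s_+\ln\left[\dfrac{(1-u_2)\,Z}{s_+}\right], & u_2 > C(0), \end{cases} \qquad (26)$$
where $Z = s_+ + s_-\left[1 - \exp(-1/s_-)\right]$ and C(0) is the value of the CDF at ΔfP = 0.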
When epistasis was considered, $s_+(f_P) = s_{+,\mathrm{init}}/\left[1 + g\,(f_P/f_{P,\mathrm{init}} - 1)\right]$ and $s_-(f_P) = s_{-,\mathrm{init}}\left[1 + g\,(f_P/f_{P,\mathrm{init}} - 1)\right]$ were used in Eq. 26 to calculate ΔfP for each cell (Methods, Section 5).
If a mutation increased or decreased the phenotypic parameter beyond its bound (Table 1), the phenotypic parameter was set to the bound value.
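The mutation step can be sketched in Python. The sketch assumes that the density behind Eq. 25 is proportional to exp(ΔfP/s−) for −1 ≤ ΔfP ≤ 0 and to exp(−ΔfP/s+) for ΔfP > 0, which is what the expression for C(ΔfP = 0) implies. The bounds `lower=0.0` and `upper=1.0` in `mutate_fp` are hypothetical placeholders for the values in Table 1, and all function names are illustrative.

```python
import math
import random

def sample_delta_fp(s_plus, s_minus, rng):
    """Draw a relative effect ΔfP: 50% null mutations, otherwise inverse-transform sampling."""
    if rng.random() <= 0.5:
        return -1.0  # null mutation: fP becomes 0
    neg_mass = s_minus * (1.0 - math.exp(-1.0 / s_minus))  # unnormalized weight of [-1, 0]
    z = s_plus + neg_mass                                  # normalization constant
    u2 = rng.random()
    if u2 <= neg_mass / z:  # u2 <= C(0): invert the diminishing branch
        return s_minus * math.log(u2 * z / s_minus + math.exp(-1.0 / s_minus))
    return -s_plus * math.log((1.0 - u2) * z / s_plus)  # invert the enhancing branch

def epistatic_scales(f_p, f_p_init, s_plus_init, s_minus_init, g):
    """For g > 0, enhancing effects shrink and diminishing effects grow as fP rises above f_p_init."""
    factor = 1.0 + g * (f_p / f_p_init - 1.0)
    return s_plus_init / factor, s_minus_init * factor

def mutate_fp(f_p, delta, lower=0.0, upper=1.0):
    """Apply fP -> fP * (1 + ΔfP), then clamp to hypothetical bounds."""
    return min(max(f_p * (1.0 + delta), lower), upper)
```

With s+ = 0.05 and s− = 0.067, about half of the draws are null mutations, and among the rest the mean enhancing and diminishing effects come out near +0.05 and −0.067, respectively.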
The above growth/death/division/mutation cycle was repeated from time 0 to T. Note that because the biomass (length) of each M and H cell is at least 1 and usually larger, the integer numbers of M and H cells, IM and IH, are generally smaller than the numerical values of biomass M and H, respectively. At the end of T, Adult communities were sorted according to their P(T) values. The Adult community with the highest P(T) (or a randomly chosen Adult in control simulations) was selected for reproduction.
Before community reproduction, the current random number generator state was saved so that the random partitioning of Adult communities could be replayed. To mimic partitioning Adult communities via pipetting into Newborn communities with an average total biomass of BMtarget, we first calculated the fold by which this Adult would be diluted as nD = ⌊(M(T) + H(T))/BMtarget⌋. Here BMtarget = 100 was the pre-set target for Newborn total biomass, and ⌊x⌋ is the floor (round down) function, which returns the largest integer not greater than x. If the Adult community had IH(T) H cells and IM(T) M cells, IH(T) + IM(T) random integers were drawn uniformly between 1 and nD, so that each M and H cell was assigned a random integer between 1 and nD. All cells assigned the same integer were then placed into the same Newborn, generating nD Newborn communities. This partition regimen can be experimentally implemented by pipetting 1/nD volume of an Adult community into a new well. If nD was less than ntot (the total number of communities under selection), all nD Newborn communities were kept, and the Adult with the next highest function was partitioned to obtain an additional batch of Newborns until ntot Newborns were obtained. The next cycle then began.
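The random-label partitioning and the carry-over to the next-ranked Adult can be sketched in Python. Cells are hypothetical (species, length) tuples, and `partition_adult` and `next_generation` are illustrative names. What happens when a single Adult yields more than ntot Newborns is not stated above; this sketch simply keeps the first ntot, which is an assumption. The generator state is saved before each partition, mirroring the replay mechanism described in the text.

```python
import math
import random

def partition_adult(cells, bm_target, rng):
    """Split one Adult into n_D Newborns by giving each cell a uniform random label in 1..n_D."""
    n_d = math.floor(sum(length for _, length in cells) / bm_target)  # dilution fold
    if n_d < 1:
        raise ValueError("Adult biomass is below the Newborn target")
    newborns = [[] for _ in range(n_d)]
    for cell in cells:
        newborns[rng.randint(1, n_d) - 1].append(cell)
    return newborns

def next_generation(ranked_adults, bm_target, n_tot, rng):
    """Partition Adults from highest to lowest function until n_tot Newborns exist."""
    newborns, saved_states = [], []
    for adult in ranked_adults:
        saved_states.append(rng.getstate())  # allows this partition to be replayed
        newborns.extend(partition_adult(adult, bm_target, rng))
        if len(newborns) >= n_tot:
            break
    return newborns[:n_tot], saved_states  # assumption: surplus Newborns are discarded
```

Restoring `saved_states[0]` with `rng.setstate` and calling `partition_adult` again on the top Adult reproduces its Newborns exactly.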
To fix BM(0) to BMtarget and ϕM(0) to ϕM(T) of the parent Adult, the code randomly assigned M cells from the selected Adult until the total biomass of M came closest to BMtargetϕM(T) without exceeding it. H cells were assigned similarly. Because each M and H cell had a length between 1 and 2, the biomass of M could vary between BMtargetϕM(T) − 2 and BMtargetϕM(T), and the biomass of H could vary between BMtarget[1 − ϕM(T)] − 2 and BMtarget[1 − ϕM(T)]. Variations in BM(0) and ϕM(0) were sufficiently small that community function improved under community selection (Figure 5 K and L). We also simulated sorting cells so that the H and M cell numbers (instead of biomass) were fixed in Newborns. Specifically, ⌊BMtargetϕM(T)/1.5⌋ M cells and ⌊BMtarget[1 − ϕM(T)]/1.5⌋ H cells were sorted into each Newborn community, where we assumed that the average biomass of a cell was 1.5, and ϕM(T) = IM(T)/[IM(T) + IH(T)] was calculated from cell numbers. We obtained the same conclusion (Figure S15, right panels).
To fix Newborn total biomass BM(0) to the target total biomass BMtarget while allowing ϕM(0) to fluctuate (Figure 5 D and E), H and M cells were randomly assigned to a Newborn community until BM(0) came closest to BMtarget without exceeding it (otherwise, P(T) might exceed the theoretical maximum). For example, suppose that a certain number of M and H cells had been sorted into a Newborn so that the total biomass was 98.6. If the next cell, either M or H, had a biomass of 1.3, this cell would go into the community so that the total biomass would be 98.6 + 1.3 = 99.9. However, if a cell of mass 1.6 happened to be picked, this cell would not go into this community, so that this Newborn had a total biomass of 98.6 and the cell of mass 1.6 would go to the next Newborn. Thus, each Newborn might not have a biomass of exactly BMtarget, but rather between BMtarget − 2 and BMtarget. Experimentally, total biomass can be determined from the optical density, or from the total fluorescence if cells are fluorescently labeled [50]. To fix the total cell number (instead of total biomass) in a Newborn, the code randomly assigned a total of ⌊BMtarget/1.5⌋ cells into each Newborn, assuming an average cell biomass of 1.5. We obtained the same conclusion, as shown in Figure S15.
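Both biomass-fixing regimens can be sketched with one routine: shuffle the donor cells, deal them into consecutive Newborns, and close a Newborn as soon as the next cell would push it past the target, handing that cell to the next Newborn as in the 98.6 + 1.6 example. Applied to all cells with target BMtarget, it fixes BM(0); applied separately to the M and H pools with targets BMtarget·ϕM(T) and BMtarget[1 − ϕM(T)], it fixes both BM(0) and ϕM(0). `fill_newborns` is a hypothetical name, cells are represented only by their lengths, and dropping the final under-filled Newborn is an assumption.

```python
import random

def fill_newborns(cells, target, n_newborns, rng):
    """Deal shuffled cells into Newborns, closing each one before it would exceed target."""
    pool = list(cells)
    rng.shuffle(pool)
    newborns, current, biomass = [], [], 0.0
    for length in pool:
        if biomass + length > target:
            # The overshooting cell is not discarded: it opens the next Newborn.
            newborns.append(current)
            if len(newborns) == n_newborns:
                return newborns
            current, biomass = [], 0.0
        current.append(length)
        biomass += length
    return newborns  # assumption: a final, under-filled Newborn is dropped
```

Because every cell is shorter than 2, each returned Newborn has biomass in (target − 2, target].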
To fix ϕM(0) to ϕM(T) of the selected Adult community from the previous cycle while allowing BM(0) to fluctuate (Figure 5 G and H), the code first calculated the dilution fold nD as described above. If the Adult community had IH(T) H cells and IM(T) M cells, IM(T) random integers in [1, nD] were then generated, one for each M cell. All M cells assigned the same random integer joined the same Newborn community. The code then randomly dispensed H cells into each Newborn until the total biomass of H came closest to M(0)[1 − ϕM(T)]/ϕM(T) without exceeding it, where M(0) was the biomass of all M cells in this Newborn community. Again, because each M and H cell had a biomass (or length) between 1 and 2, ϕM(0) of each Newborn community might not be exactly ϕM(T) of the selected Adult community. We also performed simulations in which the ratio between M and H cell numbers in the Newborn community, IM(0)/IH(0), was set to IM(T)/IH(T) of the Adult community, and obtained the same conclusion (Figure S15, center panels).
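The ϕM(0)-fixing regimen can be sketched the same way. In this hypothetical `ratio_fixed_newborns`, cells are plain lengths, M cells are dealt out by uniform labels in 1..nD, and a shuffled H pool is then dispensed so that each Newborn's H biomass comes closest to M(0)[1 − ϕM(T)]/ϕM(T) without exceeding it. An H cell that would overshoot stays in the pool for the next Newborn; that detail is an assumption carried over from the biomass-fixing example in the text.

```python
import math
import random

def ratio_fixed_newborns(m_lengths, h_lengths, bm_target, phi_m, rng):
    """Partition M cells by random labels, then top up each Newborn with H to track phi_m."""
    n_d = math.floor((sum(m_lengths) + sum(h_lengths)) / bm_target)  # dilution fold
    m_groups = [[] for _ in range(n_d)]
    for length in m_lengths:
        m_groups[rng.randint(1, n_d) - 1].append(length)
    h_pool = list(h_lengths)
    rng.shuffle(h_pool)
    newborns = []
    for m_cells in m_groups:
        h_target = sum(m_cells) * (1 - phi_m) / phi_m  # H biomass that restores phi_m
        h_cells, h_mass = [], 0.0
        # Stop before the next H cell would overshoot; it stays in the pool for the next Newborn.
        while h_pool and h_mass + h_pool[-1] <= h_target:
            h_mass += h_pool[-1]
            h_cells.append(h_pool.pop())
        newborns.append((m_cells, h_cells))
    return newborns
```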
7. Problems associated with alternative definitions of community function and alternative means of reproducing an Adult
Here we describe problems associated with two alternative definitions of community function and one alternative method of community reproduction.
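One alternative definition of community function is Product per M biomass in an Adult community: P(T)/M(T). To illustrate problems with this definition, let’s calculate P(T)/M(T) assuming that cell death is negligible. From Eq. 7 and 8,
$$\frac{dM}{dt} = (1-f_P)\,g_M M, \qquad \frac{dP}{dt} = f_P\,g_M M,$$
where biomass growth rate gM is a function of B and R. Thus,
$$\frac{dP}{dt} = \frac{f_P}{1-f_P}\,\frac{dM}{dt}, \qquad P(T) = \frac{f_P}{1-f_P}\left[M(T) - M(0)\right],$$
and we have
$$\frac{P(T)}{M(T)} = \frac{f_P}{1-f_P}\left[1 - \frac{M(0)}{M(T)}\right] \approx \frac{f_P}{1-f_P}$$
if M(T) ≫ M(0) (true if T is long enough for cells to double at least three or four times).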
If we define community function as P(T)/M(T), then higher community function requires higher fP/(1 − fP), or equivalently higher fP. However, if we select for very high fP, then M can go extinct (Figure 3).
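If community function is instead defined as P(T)/M(0), then
$$\frac{P(T)}{M(0)} = \frac{f_P}{1-f_P}\left\{\exp\left[(1-f_P)\int_0^T g_M\,dt\right] - 1\right\}. \qquad (27)$$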
From Eq. 27, at a fixed fP, P(T)/M(0) increases as $\int_0^T g_M\,dt$ increases. In turn, $\int_0^T g_M\,dt$ increases as ϕM(0) decreases, because the larger the fraction of Helpers, the faster Byproduct accumulates (Figure S14B). As a result, when we select for higher P(T)/M(0), we end up selecting communities with small ϕM(0) (Figure S2). This means that Manufacturers could get lost during community reproduction, and community selection then fails.
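A quick numerical check of these two results, assuming a hypothetical, externally imposed growth rate gM(t) = 0.5 + 0.2 sin t in place of the model's gM(R, B): with death neglected, integrating dM/dt = (1 − fP)gM M and dP/dt = fP gM M should give P(T)/M(T) ≈ fP/(1 − fP) once M(T) ≫ M(0), and P(T)/M(0) = fP/(1 − fP){exp[(1 − fP)∫gM dt] − 1} (Eq. 27). Forward Euler is used here for simplicity; it is not a claim about the original code.

```python
import math

def euler_m_and_p(f_p, growth_rate, t_end, m0=1.0, steps=100_000):
    """Forward-Euler integration of dM/dt = (1-fP) gM M and dP/dt = fP gM M (no death)."""
    dt = t_end / steps
    m, p, g_int = m0, 0.0, 0.0
    for k in range(steps):
        g = growth_rate(k * dt)
        dm = (1 - f_p) * g * m * dt  # biomass kept by M
        dp = f_p * g * m * dt        # biomass diverted to Product
        m, p, g_int = m + dm, p + dp, g_int + g * dt
    return m, p, g_int

f = 0.1
m_T, p_T, g_total = euler_m_and_p(f, lambda t: 0.5 + 0.2 * math.sin(t), 20.0)
eq27 = f / (1 - f) * (math.exp((1 - f) * g_total) - 1)  # P(T)/M(0) with M(0) = 1
```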
In our community selection scheme, the average total biomass of Newborn communities was set to a constant BMtarget. Alternatively, each Adult community can be partitioned into a constant number of Newborn communities. If Resource is not limiting, there is no competition between H and M, and P(T) increases as M(0) and H(0) increase. Therefore, selection for higher P(T) results in selection for higher Newborn total biomass (instead of higher fP). This will continue until Resource becomes limiting, at which point communities will enter stationary phase.
8 $f_P^*$ is smaller for M group than for H-M community
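For groups or communities with a given $\int_0^T g_M\,dt$, we can calculate the fP optimal for community function from Eq. 27 by setting
$$\frac{\partial}{\partial f_P}\left[\frac{P(T)}{M(0)}\right] = 0.$$
We have
$$\exp\left[(1-f_P)\int_0^T g_M\,dt\right] - 1 = f_P(1-f_P)\left(\int_0^T g_M\,dt\right)\exp\left[(1-f_P)\int_0^T g_M\,dt\right],$$
or
$$1 - \exp\left[-(1-f_P)\int_0^T g_M\,dt\right] = f_P(1-f_P)\int_0^T g_M\,dt.$$
If $\int_0^T g_M\,dt \gg 1$ and fP is very small, the exponential term is negligible, so $f_P(1-f_P)\int_0^T g_M\,dt \approx 1$ and the optimal fP for P(T) is
$$f_P^* \approx \frac{1}{\int_0^T g_M\,dt}. \qquad (28)$$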
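The optimum can also be found numerically. This sketch maximizes the logarithm of Eq. 27, P(T)/M(0) = fP/(1 − fP){exp[(1 − fP)G] − 1}, over fP by golden-section search, treating G = ∫gM dt as a fixed number as in the derivation above. `optimal_fp` is a hypothetical helper, and the search assumes the function has a single interior maximum, which holds for the large G of interest here.

```python
import math

def log_function_per_m0(f_p, g_int):
    """log of P(T)/M(0) = fP/(1-fP) * (exp((1-fP) G) - 1), with G = ∫ gM dt held fixed."""
    return math.log(f_p) - math.log(1.0 - f_p) + math.log(math.expm1((1.0 - f_p) * g_int))

def optimal_fp(g_int, lo=1e-6, hi=1.0 - 1e-6, tol=1e-10):
    """Golden-section search for the fP that maximizes P(T)/M(0) at fixed G."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if log_function_per_m0(c, g_int) > log_function_per_m0(d, g_int):
            b, d = d, c  # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)
```

For G = 20, the search returns about 0.053, close to both the root of fP(1 − fP)G = 1 and the cruder estimate 1/G = 0.05; larger G gives a smaller optimum, which is the monoculture-versus-community comparison made in the text.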
M grows faster in monoculture than in community because Byproduct is supplied in excess in monoculture, while in community, H-supplied Byproduct is initially limiting. Thus, $\int_0^T g_M\,dt$ is larger in monoculture than in community. According to Eq. 28, $f_P^*$ is smaller for monoculture than for community.
9 Stochastic fluctuations during community reproduction
The number of cells in a Newborn community is approximately $BM_{target}/\bar{L}$, where $\bar{L}$ is the average biomass (or length) of M and H cells. This number fluctuates in a Poissonian fashion with a standard deviation of $\sqrt{BM_{target}/\bar{L}}$. As a result, the biomass of a Newborn community fluctuates around BMtarget with a standard deviation of $\bar{L}\sqrt{BM_{target}/\bar{L}} = \sqrt{\bar{L}\,BM_{target}}$.
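Similarly, M(0) and H(0) fluctuate independently with standard deviations of $\sqrt{\bar{L}\,E[M(0)]}$ and $\sqrt{\bar{L}\,E[H(0)]}$, respectively, where "E" means the expected value and $\bar{L}$ is the average cell biomass. Therefore, M(0)/H(0) fluctuates with a variance of
$$\mathrm{Var}\left[\frac{M(0)}{H(0)}\right] \approx \frac{E[M(0)]^2}{E[H(0)]^2}\left\{\frac{\mathrm{Var}[M(0)]}{E[M(0)]^2} + \frac{\mathrm{Var}[H(0)]}{E[H(0)]^2} - \frac{2\,\mathrm{Cov}[M(0),H(0)]}{E[M(0)]\,E[H(0)]}\right\} = \frac{\phi_M(T)\,\bar{L}}{\left[1-\phi_M(T)\right]^3 BM_{target}},$$
where "Cov" means covariance and "Var" means variance, and ϕM(T) is the fraction of M biomass in the Adult community from which Newborns are generated. The last step uses Cov[M(0), H(0)] = 0, E[M(0)] = ϕM(T)BMtarget, and E[H(0)] = [1 − ϕM(T)]BMtarget.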
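The biomass fluctuation estimate can be checked by simulation. This sketch uses hypothetical parameter choices (cell lengths uniform on [1, 2), so the average cell biomass is 1.5, and 200 Newborns per Adult), deals cells to Newborns by uniform random labels as in the partitioning regimen, and reports the empirical standard deviation of Newborn biomass, to be compared with the square root of (average cell biomass × BMtarget). `newborn_biomass_sd` is an illustrative name.

```python
import math
import random
import statistics

def newborn_biomass_sd(bm_target, n_d, n_adults, rng):
    """Empirical SD of Newborn biomass when cells are dealt to n_d Newborns by random labels."""
    mean_length = 1.5  # assumption: cell lengths uniform on [1, 2)
    n_cells = round(bm_target * n_d / mean_length)  # Adult sized so each Newborn averages bm_target
    biomasses = []
    for _ in range(n_adults):
        newborn_mass = [0.0] * n_d
        for _ in range(n_cells):
            newborn_mass[rng.randrange(n_d)] += 1.0 + rng.random()
        biomasses.extend(newborn_mass)
    return statistics.pstdev(biomasses)

sd = newborn_biomass_sd(100, 200, 20, random.Random(4))
predicted = math.sqrt(1.5 * 100)  # about 12.2
```

The two values should agree to within a few percent. The analytical estimate ignores cell-to-cell variation in length, which adds a small amount of extra variance (E[L²] rather than the squared mean length per cell).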
10 Mutualistic H-M community
In the mutualistic H-M community, Byproduct inhibits the growth of H. According to [95], the growth rate of E. coli decreases exponentially as the exogenously added acetate concentration increases. Thus, we only need to modify the growth rate of H by a factor of exp(−B/B0), where B is the concentration of Byproduct and B0 is the concentration of Byproduct at which H's growth rate is reduced to e^−1 ≈ 0.37 of its uninhibited value:
$$g_H(R, B) = g_H(R)\exp(-B/B_0).$$
The larger B0, the weaker the inhibitory effect of Byproduct on H; when B0 → +∞, Byproduct does not inhibit the growth of H. For simulations in Figure S22, we set B0 = 2KMB.
Acknowledgment
We thank the following for discussions: Lin Chao (UCSD), Maitreya Dunham (UW Seattle), Corina Tarnita (Princeton), Harmit Malik (Fred Hutch), Jeff Gore (MIT), Daniel Weissman (Emory), and Alvaro Sanchez (Yale). Some of these discussions took place at the 2017 “Systems Biology and Molecular Economy of Microbial Communities” workshop at the International Centre for Theoretical Physics, Trieste, Italy and at the 2018 “Physical Principles Governing the Organization of Microbial Communities” workshop at the Aspen Center for Physics, Colorado, USA. We thank Chichun Chen, Bill Hazelton, Samuel Hart, David Skelding, Doug Jackson, Maxine Linial, Delia Pinto-Santini, Kirill Korolev (Boston University), and Alex Sigal (K-RITH) for feedback on the manuscript. We are particularly indebted to Jim Bull (UT Austin) who generously provided sentence-by-sentence critique. This research was supported by the High Performance Computing Shared Resource of the Fred Hutch (P30 CA015704).
Footnotes
1 Group selection is often applied in a broader sense to spatially-structured populations to explain the evolution of cooperative traits [52, 53]. In these cases, individuals form groups. Within each cycle, individuals grow based on their genotype (e.g. cooperators or cheaters) and group environment (cooperator-dominated or cheater-dominated). At the end of each cycle, individuals migrate among groups. However, if there are no births or deaths of groups, then selection acts on individuals instead of on groups [54, 55, 56].
2 For example, if Newborn groups are initiated with a single contributor and if the highest-functioning Adult group has accumulated 50% non-contributors, then 50% of Newborns of the next cycle will be initiated with a single contributor. In contrast, if a Newborn community starts with one contributor from each of the two species and if the highest-functioning Adult has accumulated 50% non-contributors in each species, then only 50% × 50% = 25% of Newborns of the next cycle will be initiated with pure contributors.
3 Since Newborn groups start with a single M individual, artificial group selection here can also be viewed as artificial individual selection where the trait under selection is an individual M’s ability to make Product over time T as the individual grows into a population.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.