Novel artificial selection method improves function of simulated microbial communities

There is increasing interest in artificially selecting or breeding microbial communities, but experiments have reported modest success and it remains unclear how to best design such a selection experiment. Here, we develop computational models to simulate two previously known selection methods and compare them to a new “disassembly” method that we have developed. Our method relies on repeatedly competing different communities of known species combinations against one another, and sometimes changing the species combinations. Our approach significantly outperformed previous methods that could not maintain enough between-community diversity for selection to act on. Instead, the disassembly method allowed many species combinations to be explored throughout a single selection experiment. Nevertheless, selection at the community level in our simulations did not counteract selection at the individual level. Species in our model can mutate, and we found that they evolved to invest less into community function and more into growth. Increased growth compensated for reduced investment, however, and overall community performance was barely affected by within-species evolution. Our work provides important insights that will help design community selection experiments.


23
Humans have been breeding plants and animals for centuries 24 by allowing individuals with the most desirable traits to se-25 lectively produce offspring. Also known as "artificial selec-26 tion" or "directed evolution", breeding has altered traits such 27 as the size of fruits or the enzymatic activity of proteins used 28 in biotechnology (1). More recently, we have started to ap-29 preciate that microbes -often multi-species communities of 30 microbes -play an important role for health and the envi-31 ronment. One way to improve or optimize the functions and 32 services that these microbes provide is to select for their traits 33 in the same way as traditional breeding. 34 However, breeding microbial communities is less straightfor-      signed-rank test, p < 10 −3 , n = 50), we did not find any 298 significant improvements in the maximum degradation score 299 compared to round 0 (p = 0.9, n = 50, Fig. 2D). Further,

300
PIS finds higher-ranking communities than PS in both the 301 IBM (two-sided Wilcoxon signed-rank test for differences in 302 ranks between PIS and PS, p < 10 −6 , n = 50, Fig. 2F) and     invasion can be more stable than disassembly (Fig. 6A).

487
Another factor that we do not explore in detail here is the

527
From an experimenter's perspective, it is also useful to con-528 sider where one can reduce the size and complexity of the ex-529 periment. We find that the parameter with the biggest effect 530 on the method is the size of the initial species set and that 531 the disassembly method is more efficient for larger sets of 532 species (Fig. 6). Disassembling rich communities may how-533 ever prove quite challenging in practice, and future experi-534 mental work will aim to make this step more efficient. We 535 also find that for disassembly, a higher number of commu-536 nities grown in each selection round can compensate for the 537 number of selection rounds needed since we are then able to 538 search more distinct communities per round of selection.

D R A F T
We have made a number of assumptions for our models.  Table 1 and     Table 1, Fig. S12).

659
Cell division consists of two steps, here called "activation" The max_uptake scales down the amount of degradation 672 or the probability to activate when not all the nutrients con- wheren ij is a re-scaled version of n ij : if some nutrients are Activation probability Beta(2,2) r i Replication probability Beta(2,2) n ij Consumption rate of nutrient j Uni(0,1), Sparse, Rescaled so that Fraction of consumed nutrients invested Uni(0,1), Sparse, into degradation of toxic compound k Rescaled so that Death rate of strain i due to toxic compound k Uni(0.001,0.02), Sparse Where T k is the current concentration of toxic compound k 724 and the constant K = 700.
725 Population-level model. As the IBM above, the ODE model simulates well-mixed batch cultures with nutrients and toxic compounds, extending a previous model (25). In the model, the population size S i of each strain i in a community grows in relation to the concentrations of nutrients N j and decline by the toxic compounds T k (Fig. S13) by the model parameters in Table 2. Growth, death, nutrient uptake and degradation is described by the following ODE system: The bold-face N, T denote the vectors of all nutrients and toxic compounds, respectively. We assume Monod and Hill functions for the per-capita growth and death rates ρ i , µ i .
The system of equations Eq.

D R A F T
(Average) biomass yield lognormal(log(10 −3 ), log(5)) with respect to the nutrients δ i (Average) degradation efficiency lognormal(log(10 −4 ), log(5)) with respect to the toxic compounds Table 2. Parameters defining a microbial strain i in the ODE model. Growth and degradation parameters in relation to nutrients and toxic compounds Nj and T k . All parameters are assumed to be positive, and the investment f ik is limited to the interval [0, 1]. The matrices of growth rates, death rates and degradation investment rij , m ik and f ik are made sparse by multiplying them by matrices drawn from Bernoulli(0.5), i.e. flipping a coin for each entry. In this way, each species takes up approximately half of the nutrients, is affected by half of the toxic compounds and degrades half of the toxic compounds.

Artificial selection methods.
After scoring all communi-752 ties in a given round (see above), a fraction of these "parent" communities is propagated to "offspring" for the next 754 round of growth (Fig. 1A, Alg. 1). Here we implement 755 several methods of propagating the selected communities 756 as described below (Fig. 1B, C), and compare them to a     (Fig. 1D, Alg. 10). By disassembling these communities, 811 we maintain a record of samples of each species that were 812 present in at least one selected community in each round. If 813 a species is present in more than one selected community, we 814 sample from the highest-scoring community that this species 815 was part of. In this way, we are able to re-introduce any 816 species that went extinct.

817
To select against extinctions and communities whose mem- Having found a number of species to remove, we choose 844 the species to remove with uniform probability, but avoid re-845 moving any species that is present in only this community.

846
Next, we introduce one or more invader species -as above,

859
We use the scipy (36) implementations for all three methods. 860 We quantify species diversity within a community (Fig. 3E,

Pseudo-code for implementation of models and selection methods
Input: A set of 15 species, defined by model parameters. Input: Experimental parameters: Number of communities, time span [t 0 , t end ] of growth in batch. Community bottleneck β = 1/3, dilution ratio d ∈ [0, 1]. Initial conditions S i (t 0 ), N j (t 0 ), T k (t 0 ). Assemble 21 communities by randomly drawing 4 species with replacement from the species set. Ensure that each species is present in at least one initial community.
Deactivate cells that cannot afford to stay activated, consume the corresponding nutrients for cells that remain activated, set S max i := 0 end // Newly activated cells cells_activate := Poisson(a i · p i0 · max_uptake · (1 − k f ik ) · j (n ij · N j /N 0 )) if cells_activate > p i0 then cells_activate := p i0 end if cells_activate > S max i then cells_activate := S max i end p i0 := p i0 − cells_activate p i1 := p i1 + cells_activate for Each nutrient N j do N j := N j −n ij · cells_activate · (1 − k f ik ) end end else Deactivate all the activated cells end return Populations p i0 , p i1 for strain i, current nutrient concentrations N j Algorithm 3: Activation of cells, the first step of cell division in the IBM described in Alg. 2.

D D R A F T
Input: Communities with degradation scores D and strains S i with total population S i = p i0 + p i1 . End-state concentration of toxic compounds T k (t end ). Input: Experimental parameters: selection bottleneck β = 1/3, dilution ratio d. Rank the communities by D Select the top N β = 7 of communities with the highest ranks // Re-populate the new set of tubes Allocate 1/β new tubes for each selected community for Each selected community 1, 2, . . . , 7 do for Each strain i in the community do // Deactivate cells p i0 (t end ) = S i (t end ) p i1 (t end ) = 0 // Dilute the population, new cells will be inactivated p i0 (t 0 ) :=Poisson(d · S i (t end )) if p i0 (t 0 ) > S i (t end ) then p i0 (t 0 ) = S i (t end ) end if p i0 (t 0 ) > 0 then Append strain i with population p 0 (t 0 ) to the new tube end // Delete selected cells to not chose them again S i (t end ) = S i (t end ) − p i0 (t 0 ) end end Algorithm 8: Implementation of the propagule selection method for the IBM.

D R A F T
Input: Communities with populations S i (in the IBM S i = p i0 + p i1 ) with model parameters from (Tab. 1, Tab. 2) and degradation scores D. Input: Experimental parameters: selection bottleneck β = 1/3, initial population size S 0 and number of new communities to emigrate species from N emi = 5, and immigrate species to N immi = 5. // Rank the communities Rank the communities by the degradation score D // Update the fossil record with the top communities in this round for Each selected community 1, 2, . . . , 7 do for Each species l in the selected community do if The record of species l is not yet updated in this round of selection then Add all strains i of species l to the fossil record, including the corresponding parameters and population sizes end end end // Propose new communities in proportion to their degradation scores and survival Follow Alg. 11 // Emigration Draw N emi = 5 communities with uniform probability for Each chosen community 1, . . . , N emi do // Find emigrating species N umber_of _emigrants = 1 + Poisson(0.5) Verify that at least one species will remain for Each emigrant do Choose the emigrant at random, with priority for species that occur in more than one community Remove all strains of this species from the community end end // Immigration Draw N immi = 5 communities with uniform probability for Each chosen community 1, . . . , N immi do // Find immigrating species N umber_of _immigrants = 1 + Poisson(0.5) for Each immigrant do if There are species that do not feature in any community then Choose one of them at random end else Choose a species that is not already in the community with uniform probability end Take all strains of the species from the species record, add them to the offspring community Set the population size to S 0 (approximately S 0 in the IBM, see Alg. 11), in proportion to strain relative abundance end end Algorithm 10: Implementation of the disassembly method.

D R A F T
Input: The subset of selected communities, their degradation scores D. Population sizes S i of all strains (in the IBM, S i = p i0 + p i1 ). Input: Parameters: Number of species N spc that were inoculated in the different communities. Inoculum size per species S 0 . // Scale degradation scores by species extinctions for Each selected community 1, 2, . . . , 7 do // Count the number of surviving species in this community Set the extinction counter E = 0 for Each species l in the community do if S i = 0 for all strains of species l then Set E := E + 1 // Re-introduce species l from the record Take all strains of species l from the most recent record end end // Scale the degradation score D by the fraction of surviving species SetD := D · (N spc − E)/N spc end // Calculate a probability distribution based on degradation scores Set p n :=D n / mD m for each community n // Propose new communities randomly in proportion to p n for Each offspring community do Choose a parental community at random, by the probability distribution p n for Each species l in the parental community do Copy the growth parameters and f ik of all strains i to the offspring community // ODE: Set the population size of each species to S 0 in total, in proportion to the relative abundance of the strains. // IBM: Sample approximately S 0 cells with replacement as follows: for Each strain i (with population S i ) of species l (with population S l ) do //Deactivate cells p i0 (t end ) = S i (t end ) p i1 (t end ) = 0 // Draw new population if p i0 (t 0 ) > 0 then Add strain i with population p i0 (t 0 ) to the new community end end end end Algorithm 11: Method to propose new communities based on their degradation scores from the previous round. Called by Alg. 10