Controlling complex microbial communities: a network-based approach

Microbes comprise nearly half of all biomass on Earth. Almost every habitat on Earth is teeming with microbes, from hydrothermal vents to the human gastrointestinal tract. Those microbes form complex communities and play critical roles in maintaining the integrity of their environment or the well-being of their hosts. Controlling microbial communities can help us restore natural ecosystems and maintain healthy human microbiota. Yet, our ability to precisely manipulate microbial communities has been fundamentally impeded by the lack of a systematic framework to control them. Here we fill this gap by developing a control framework based on the new notion of structural accessibility. This framework allows identifying minimal sets of “driver species” through which we can achieve feasible control of the entire microbial community. We numerically validate our control framework on large microbial communities, and then we demonstrate its application for controlling the gut microbiota of gnotobiotic mice infected with Clostridium difficile and the core microbiota of the sea sponge Ircinia oros.


INTRODUCTION
In our modeling framework, we focus on exploring the impact that manipulating a subset of species 11 has on the abundances of other species. We thus consider a microbial community whose state at 12 time t can be determined from the abundance profile x(t) ∈ R N of its N species, where the i-th 13 entry x i (t) of x(t) represents the abundance of the i-th species at time t. Let us assume that the 14 state evolves according to some general population dynamics where the function f : R N → R N models the intrinsic growth and inter/intra-species interactions 16 of the community (see Supplementary Note 1 for details). For most microbial communities the 17 function f is unknown and difficult to infer due to the manifold of interaction mechanisms between 18 microbes, such as cross-feeding and modulation by the host immune system 18 . Thus we assume 19 encode a combination of M control actions that are simultaneously applied to the community at 1 time t. There are four types of control actions that we consider. If u j (t) < 0, the j-th control 2 action at time t is either a bacteriostatic agent or a bactericide, which decreases the abundance 3 of the species it actuates by inhibiting their reproduction or directly killing them, respectively 29 . 4 If u j (t) > 0, the j-th control action at time t is either a prebiotic 30 or a transplantation, which to actuate in order to drive the entire community. We call those species driver species. We will 1 also study if the impulsive control scheme can be as effective as the continuous control scheme for 2 controlling microbial communities. Indeed, the former is more feasible than the latter, especially 3 for human-associated microbial communities. Finally, we will design the control inputs that should 4 be applied to the identified driver species to drive the whole community towards the desired state.

5
IDENTIFYING DRIVER SPECIES 6 Driver species are characterized by the absence of autonomous elements. To understand when 7 a set of actuated species is a set of driver species, consider a three-species community with the clas-8 sical Generalized Lotka-Volterra (GLV) population dynamics (Fig. 2a). This toy community has 9 one control input actuating the third species x 3 . Actuating this species alone creates an autonomous 10 element -namely, a constraint between some species abundances that the control input cannot 11 break, confining the state of the community to a low-dimensional manifold. More precisely, our 12 mathematical formalism reveals that ξ = x 1 x 2 is an autonomous element for this microbial com-13 munity (Example 2 in Supplementary Note 2). Indeed, differentiating ξ with respect to time yields 14ξ = x 1 x 2 (1 − x 3 ) + x 1 x 2 (−1 + x 3 ) ≡ 0, which implies that the state of the community is con-15 strained to the low-dimensional manifold {x ∈ R 3 |x 1 x 2 = x 1 (0)x 2 (0)} for all control inputs ( Fig.   16 2a right). Intuitively, an autonomous element exists because the control input cannot change the 17 abundance of species x 1 without changing the abundance of species x 2 in a predefined way (i.e., 18 x 2 = x 1 (0)x 2 (0)/x 1 ). It is thus impossible to drive the whole community in its three-dimensional 19 state space, implying that x 3 cannot be a driver species for this community. Introducing a second 20 control input actuating species x 1 eliminates this autonomous element by helping the system to 1 jump out of the low-dimensional manifold. Hence, the community can be driven in any direction 2 within its three-dimensional state space (Fig. 2b). This indicates that {x 1 , x 3 } is a minimal set of 3 driver species for this community. Actually, by using these two driver species we can steer the 4 community to any desired state with positive abundances (Example 6 in Supplementary Note 5).

5
In the general case of N species and M control inputs, we define a set of actuated species 6 as a set of driver species if the corresponding controlled population dynamics {f, g} of the micro-7 bial community lacks autonomous elements. For a given pair {f, g}, the absence of autonomous 8 elements can be mathematically deduced using a formalism based on differential one-forms (Sup-9 plementary Note 2). Indeed, for the continuous control scheme of Eq.
(2), the conditions for the 10 absence of autonomous elements are well understood because they define when a system is accessi-11 ble 32 . As a cornerstone concept in nonlinear control theory, accessibility has been instrumental for 12 developing technological advances such as robotics. Since it is more natural to control microbial 13 communities with impulsive control actions, in this paper we extended the study of autonomous depend on such a pair, this might suggest that it is impossible to predict their presence and thus to 10 identify the driver species of complex microbial communities. We now show that this seemingly 11 unavoidable limitation can be solved by focusing on the topology of the controlled ecological 12 network of the community.

13
Define the graph G f,g = (X ∪ U, E f,g ∪ B f,g ) associated with a meromorphic function pair 14 {f, g} as follows. First, the edge (x j → x i ) ∈ E f,g exists if x j appears in the right-hand side ofẋ i 15 or x i (t + ) in Eqs.
(2) or (3), respectively. Second, the edge (u j → x i ) ∈ B f,g exists if g ij ≡ 0. In 16 this definition, the interaction x j → x i can originate in the uncontrolled population dynamics (i.e., 17 f i (x) depends on x j ) or, in a more general case, also in the controlled dynamics (i.e., the i-th row 18 of g(x) depends on x j ). Using this definition and given a controlled ecological network G c , we can 19 describe the class D of all possible controlled population dynamics that the controlled microbial 20 community can have. Mathematically, we describe the class D as containing all base models 1 {f * , g * } such that G f * ,g * = G c , together with all deformations {f, g} of each of those base models.

2
The base models characterize the simplest controlled population dynamics that the community can 3 have. We have chosen them as controlled GLV models with constant susceptibilities: for i = 1, · · · , N . The base models are parametrized by A = (a ij ) ∈ R N ×N , r = (r i ) ∈ R N , 5 and B = (b ij ) ∈ R N ×M , representing the interaction matrix, the intrinsic growth rate vector, 6 and the susceptibility matrix of the community, respectively. Thus, the base models in D are all 7 controlled GLV models such that their graph matches G c . As a classical population dynamics 8 model, the GLV model has been applied to microbial communities in lakes, soils, and human 9 bodies 14, 15, 20, 33-39 . Notice that in a microbial community, any species that gets extinct cannot 10 "resurrect" by itself without some external influence such as a transplantation or migration. Eq.

11
(4) is the simplest population dynamics that satisfies this condition in the following sense: it is 12 obtained by considering population dynamics of the form f i (x) = x i F i (x), and then choosing the 13 functions F i (x) to be simple affine functions.
14 Next, we say that a meromorphic pair {f, g} is a deformation of a base model {f * , g * }

15
if it satisfies the following three conditions: (i) it has the same graph as the base model (i.e., {f (x; θ),g(x; θ)}; and (iii) the identity {f (x; 0),g(x; 0)} = {f * (x), g * (x)} holds. The minimal model. A rather general class of controlled population dynamics can be described by deformations 1 of the base model of Eq. (4), such as for i = 1, · · · , N . In Eq. (5), the parameters θ i,1 are migration rates from/to neighboring habi-3 tats, θ −1 i,2 are the carrying capacities of the environment, θ −1 i,3 are the Allee constants, and the rest 4 {θ ij,k } 7 k=4 characterize the saturation of the functional responses 40 . Note that θ i,1 > 0 can also 5 model species like C. difficile that sporulate into "inactive" forms and then recover. Note also 6 that "higher-order" interactions can be described as deformations. For example, if species x i is 7 directly affected by species x j and x k , then a deformation can include the third-order interaction Similarly, deformations allow cases when the susceptibility of the i-th species to j-9 th control input is mediated by the abundance of other species. For example, the deformation 10 g ij (x; θ) = b ij + θ ijk x k models a case when the i-th species is actuated by the j-th control input 11 but its effect is mediated by the abundance of the k-th species. 12 We call the class D structurally accessible if almost all of its base models and almost all 13 of their deformations lack autonomous elements. This means that, except for a zero-measure Note 3, and Fig. 2c for an illustration). This result reduces the search for autonomous elements to 20 the deformations in D with minimal size C = 0. That is, to all base models whose graph matches 1 G c . Finally, we proved that D is structurally accessible if and only if G c satisfies the following two 2 conditions: (i) each species is the end-node of a path that starts at a control input node; and (ii) 3 there is a disjoint union of cycles (excluding self-loops) and paths that cover all species nodes (see 4 Theorem 3 of Supplementary Note 3). If these two graph conditions are satisfied, we also call G c 5 structurally accessible.

6
The notion of structural accessibility introduced above is a nonlinear counterpart of the no-  Note that once G c is structurally accessible this network cannot lose its structural accessibil- work of the community. Therefore, it is possible to find the driver species of a microbial community 10 using an "incomplete" ecological network that only includes some of the ecological interactions 11 (e.g., high-confidence interactions).

13
Next we turn to the question of calculating the control signal u(t) that needs to be applied to a 14 set of driver species to drive the whole community towards the desired state. We will show that 15 impulsive control actions can make this calculation easier.

16
Calculating optimal control strategies for microbial communities with known population dy-17 namics. To calculate the impulsive control inputs {u(t k ), t k ∈ T} needed to drive the micro-18 bial community to the desired state x d we adopt a model predictive control (MPC) approach 41 .

19
First, based on the current state of the community x(t k ) at the intervention instant t k ∈ T, we 1 use knowledge of its controlled population dynamics to predict the sequence of statesX k,L = 2 {x(t k+1 ), · · · ,x(t k+L+1 )} that the community will take in response to a sequence of L impulsive The prediction horizon L > 0 quantifies how far 4 into the future we predict. We then choose is the first element of the 5 optimal control input sequence U * k,L calculated by solving the following optimization problem: Here Ω ⊆ R M ×L is a set that specifies constraints in the control inputs we can use, and J x d is some 7 cost function penalizing deviations of the predicted trajectoryX k,L from the desired state x d . For functions. Solving such optimization problem is apparently more difficult, significantly limiting 5 our ability to calculate optimal continuous control actions. 6 We studied the performance of the above MPC strategy in the three-species microbial com-7 munity with a solo driver species of Fig. 1. Given the dynamics of this community (see caption 8 in Fig. 1), we find that L = 3 impulsive control inputs are sufficient to drive the whole commu-9 nity (Example 4 in Supplementary Note 5). To calculate the optimal control inputs we selected . Solving the optimization problem using DIRECT 11 yields the MPC strategy u * (t 1 ) = −0.8815, u * (t 2 ) = 2.0089 and u * (t 3 ) = −10 −4 (pink in Fig.   12 3a). We use this example to compare the performance of applying two other control strategies 13 to drive this community. The first strategy uses a transplantation to restore the abundance of the 14 driver species (i.e., increase its abundance to its desired value), expecting that such control action 15 will drive the rest of the community to the desired state (purple in Fig. 3a). This control strat-16 egy is reminiscent of a probiotic administration that restores the "healthy" abundance of the driver 17 species. The second control strategy ignores the driver species of this community, using two con-18 trol inputs (instead of one) to set the abundance of the non-driver species to their desired values 19 (blue in Fig. 3a).

20
Among the above control strategies, only the MPC applied to the driver species succeeds 1 (Fig. 3b). Actually, this strategy succeeds in a somewhat unconventional way: despite the driver 2 species is more abundant in the desired state than in the initial state, the first control action de-3 creases further its abundance. This first control action makes the non-driver species reach their 4 desired abundances and, once that happens, the abundance of the driver species is finally increased 5 to its desired value (pink in Fig. 3b). The second control strategy succeeds in driving species x 2 6 and x 3 , but it fails to drive x 1 to the desired abundance because it approaches the desired state from 7 an unstable direction (purple in Fig. 3b). Finally, not actuating the driver species results in the 8 worst strategy, failing to drive a single species to the desired state (blue in Fig. 3b). This example 9 demonstrates the importance of actuating the driver species.

10
Calculating control strategies for microbial communities with uncertain population dynam-11 ics or a large number of species. In general, solving the non-convex optimization problem of 12 Eq. (6) is challenging as the number of species or prediction horizon increase. Also, a prerequisite 13 for solving this optimization problem is a reasonable knowledge of the controlled population dy-14 namics of the community, which may not available. To circumvent these two drawbacks, next we 15 leverage the network underlying the controlled microbial community.

16
Consider that it is possible to obtain a weighted adjacency matrixÂ ∈ R N ×N from the 17 ecological network G of the community, providing a proxy for its interaction matrix. Without In the above equation, the positive definite matrices Q = Q ∈ R N ×N and R = R ∈ R M ×M bations (w x , w u ). As R increases, the performance of the linear MPC deteriorates, first using more 15 interventions to reach the desired state (green in Fig. 3d), and finally failing to drive the system to 16 the desired state (blue in Fig. 3d). Indeed, since R > 0 quantifies the "cost" of using control inputs,

17
increasing R reduces the magnitude of the control inputs, to the point they are not large enough to 18 drive the system towards the desired state. We emphasize that, in general, the performance of the

18
To quantify the success of our control framework on a particular community, we generate 300 19 initial species abundances that are uniformly distributed at a distance d > 0 from the desired state 20 (distance is measured using the Euclidean norm). Then, the success rate of our control framework 21 at distance d is defined as the proportion of those initial conditions that are driven to the desired 1 state only when the linear MPC is applied to a minimal set of driver species of the community 2 ( Fig. 4b-d). Namely, the success rate discards all initial conditions that naturally evolve to the 3 desired state. Finally, we calculated the mean success rate by averaging the success rate over 100 4 randomly constructed ecological networks (see items 7 and 8 of Supplementary Note 8 for details).

5
The mean success rate of our control framework changes with the distance to the desired 6 state, being close to 1 for small distances regardless of the parameters of the microbial community 7 (Fig. 4e-f). This result agrees well with the theoretical prediction that success is guaranteed pro-8 vided that the distance to the desired state is small enough. We next investigated how the success 9 rate changes with the distance d for different interspecies interaction strengths, and for different 10 connectivities of the ecological network underlying the community. The success rate decreases 11 as the interspecies interaction strength increase, especially for large distances (Fig. 4e). Since 12 increasing the interspecies interaction strength damages the stability of the population dynamics 44 , 13 this result suggests that microbial communities become "harder" to control as they lose stability.
14 The success rate of our control framework is also higher in microbial communities whose ecolog-15 ical networks have lower connectivity (Fig. 4f). Note that, in general, the size of a minimal set of 16 driver species decreases as the network connectivity increases. Therefore, this observations sug-17 gest that the success rate may increase as the number of driver species increases. Indeed, regardless 18 of the distance to the desired state, we find that our control framework attains a success rate > 0.8 19 provided that we drive at least 6 of the 100 species (Fig. 4g). This last result also suggests that 20 the success rate of our control framework can be enhanced by directly controlling a few additional 21 species. 1 Finally, we investigated the robustness of our control framework to errors in the ecological 2 network used for both identifying the driver species, and for calculating the linear MPC. Note that, to a 5% error). Our control framework is robust to these errors, in the sense that the success rate 9 deteriorates but remains larger than zero despite large errors (Fig. 4h). However, just a 5% error 10 decreases the success rate in about 30%. This result illustrates that our framework is feasible for 11 controlling large microbial communities provided we have an accurate map of their ecological 12 networks.

14
Mapping the ecological network of a microbial community allow us to identify its driver species. 15 We identified a minimal set of driver species in the gut microbiota of germ-free mice that are 16 pre-colonized with a mixture of human commensal bacterial type strains and then infected with 17 Clostridium difficile spores 22 . We identified a minimal set of five driver species in this 14-species 18 community: R. obeum (x 1 ), R. mirabilis (x 12 ), B. ovalus (x 2 ), C. ramnosum (x 6 ) and A. muciniphila 19 (x 10 ), see Fig. 5a. We also used the ecological network underlying the core microbiota of the sea 1 sponge Ircinia oros 23 , finding ten driver species in this twenty-species community (Fig. 5b). 2 We studied by simulation the efficacy of the identified driver species and the linear MPC 3 method for these two microbial communities, assuming that their dynamics are uncertain (see 4 Supplementary Note 7 for details of the dynamics used for the simulation). For the mice gut 5 microbiota, our framework succeeds in driving the whole community from an initial state where 6 Clostridium difficile is overabundant, towards a desired state with a better balance of species (Figs. 7 5c and 5d). Similar results were obtained for controlling the core microbiota of Ircinia oros, using 8 the ten identified driver species to drive the twenty species constituting this microbial community 9 (Figs. 5e and 5f). The success of our control framework shows again that the linear MPC method is 10 robust enough to drive microbial communities despite the presence of the perturbations (w x , w u ).

12
An influential method to understand and manage complex ecosystems has been identifying species 13 with a "big impact" on the entire ecosystem, leading to notions such as keystone 45, 46 or core 47 14 species. In general, the keystone or core species of an ecosystem are not necessarily its driver 15 species. For example, the driver species of an ecosystem do not depend on their abundance, while 16 the definition of keystone species does depend on the abundance -namely, species whose removal 17 cause a disproportionate deleterious effect relative to their abundance 45 .

18
It was suggested that notion of controllability -the ability to drive a system between any two

15
In this paper, we used a maximum matching based algorithm to identify a minimum set of 16 driver species from the ecological network of a given microbial community. In principle, there 17 could be multiple maximum matchings associated with the same network, rendering potentially 18 different minimum sets of driver species. Note that those minimum driver species sets share the 19 same cardinality. We claim that a minimum set of driver species is optimal only in the sense that 20 its cardinality is minimal. If the cost of choosing any species as a driver species is known, one can 21 develop a combinatorial optimization scheme to further pick up the best driver species set. But we 1 feel this is beyond the scope of the current work and hence leave it for future work.  In conclusion, by identifying driver species, our framework shows that an accurate map of 16 the ecological network underlying a microbial community opens the door for an efficient and sys-17 tematic control. The driver species can be identified despite missing interactions in the ecological 18 network, but our methods to calculate the adequate control actions can be sensitive to them. The

19
design of controllers that are robust to missing interactions will be a necessary step for controlling 20 real microbial communities. To fully harvest the potential benefits of controlling microbial com-1 munities a stronger synergy between microbiology, ecology, and control theory will be necessary.  b. Initial and desired abundance profiles (bars). Controlling the community consists in driving its state from the initial state x 0 to the desired state x d , represented by two points in the state space of the community. c. In the continuous control scheme, the control inputs u(t) are continuous signals modifying the growth of the actuated species. The controlled population dynamics of this community is given bẏ In the absence of control, this community has two equilibria x 0 = (3.14, 4.58, 1) and x d = (4.57, 4.73, 2) , chosen as the initial and desired states, respectively. d. In the impulsive control scheme, the control inputs u(t) are impulses applied at the intervention instants T = {t 1 , t 2 , · · · }, instantaneously changing the abundance of the actuated species. The controlled population dynamics is the same as in panel c, except thatẋ  x 3 (t) + u 1 (t) for t ∈ T. With this controlled population dynamics, our mathematical formalism 5 reveals the autonomous element x 1 x 2 that constraints the state of this microbial community to 6 the low-dimensional manifold {x ∈ R 3 |x 1 x 2 = x 1 (0)x 2 (0)} (gray) for all control inputs. Five state 7 trajectories (in colors) with random control inputs illustrate this fact. Hence, {x 3 } alone cannot be 8 a set of driver species for this controlled population dynamics. b. Including a second control input 9 u 2 (t) actuating x 1 (i.e., x 1 (t + ) = x 1 (t) + u 2 (t) for t ∈ T) eliminates the autonomous element, since 10 the state of the microbial community (colors) can explore a three-dimensional space (gray). Hence 11 {x 1 , x 3 } is a minimal set of driver species for this community with GLV dynamics. c. We proved 12 that, generically, increasing the complexity of the controlled population dynamics cannot create 13 autonomous elements. In this example, increasing the deformation size C from the GLV in panel a 14 (with C = 0) to the controlled population dynamics in Fig.1 (with C > 0) eliminates the autonomous 15 element that was present by actuating x 3 alone (Example 1 in Supplementary Note 2). Therefore, 16 increasing the complexity of the population dynamics makes {x 3 } a solo driver species.   species to match its value at the desired state x 3 (t 1 ) = x 3,d (purple dots). The third control 5 strategy does not actuate the driver species, but actuates the other two species {x 1 , x 2 } by setting 6 their abundance to their desired values (i.e., x 1 (t k ) = x 1,d and x 2 (t k ) = x 2,d , solid and hollow blue 7 dots, respectively). b. The response of the microbial community to these three control strategies. 8 Here and in panel d, the "jumps" produced by the control inputs are depicted by dashed lines. The 9 equilibria of the population dynamics are shown as gray dots. Only the first strategy applying MPC 10 to the driver species succeeds in driving the community to x d . c. Control strategies obtained by 11 using the linear MPC with parameters Q = diag(20, 1, 10) and different values for R: 10 −4 (pink), connectivity c = 0.03. We used our framework to identify a minimal set of M = 6 driver species. 5 The desired state is chosen as x d = (1, · · · , 1) . b. We randomly set the initial abundance x 0 of 6 species at a distance d = 0.4 from the desired state. Without control, the state of the microbial 7 community does not reach the desired state x d . c. and d. For the same community and initial 8 abundance as in panel b, we apply the control input generated by the linear MPC (panel c) to 9 the six driver species we identified. This control strategy successfully drives the state of the com-1 munity to the desired state (panel d). e., f. and h. Mean success rate of our control framework 2 as a function of the distance d of the initial state from the desired state. Error bars denote the  Figure 5 Controlling host-associated microbial communities. a. Inferred ecological network of 2 the gut microbiota of germ-free mice pre-colonized with a mixture of human commensal bacterial 3 type strains and then infected with C. difficile (species 7). b. Inferred ecological network of the 4 core microbiota of the sea sponge Ircinia oros. In both networks, self-loops are omitted to improve 5 readability. A minimal set of driver species is shown, providing a disjoint union of paths (purple) 6 and cycles (green) covering all species nodes. Refer to Table 1 in Supplementary Note 7 for the 7 species name. The controlled population dynamics of both microbial communities were simulated 8 using the cGLV equations (see Supplementary Note 7 for details). The intrinsic growth rates 9 were adjusted such that the community has an initial "diseased" equilibrium state x 0 in which one 10 species (C. difficile for the mice gut microbiota) is overabundant compared to the rest of species. 11 We chose the desired state x d as another equilibrium with a more balanced abundance profile. c, e.

12
Control actions obtained using the linear MPC for the impulsive and continuous control schemes.
13 d, f. Projection of the high-dimensional abundance profiles (states of the microbial communities) 14 into their first three principal components (PCs). See Supplementary Fig.S1 for the temporal 15 response of each species. The calculated control strategies applied to the driver species succeed 16 in driving the community to the desired state, using either continuous or impulsive control. 17