## Abstract

Mathematical models have been used successfully at diverse scales of biological organization, ranging from ecology and population dynamics to stochastic reaction events occurring between individual molecules in single cells. Generally, many biological processes unfold across multiple scales, with mutations being the best studied example of how stochasticity at the molecular scale can influence outcomes at the population scale. In many other contexts, however, an analogous link between micro- and macro-scale remains elusive, primarily due to the challenges involved in setting up and analyzing multi-scale models. Here, we employ such a model to investigate how stochasticity propagates from individual biochemical reaction events in the bacterial innate immune system to the ecology of bacteria and bacterial viruses. We show analytically how the dynamics of bacterial populations are shaped by the activities of immunity-conferring enzymes in single cells and how the ecological consequences imply optimal bacterial defense strategies against viruses. Our results suggest that bacterial populations in the presence of viruses can either optimize their initial growth rate or their steady state population size, with the first strategy favoring simple and the second strategy favoring complex bacterial innate immunity.

## Introduction

One of the major challenges in biology is to understand how interactions between individual molecules shape living organisms and ultimately give rise to emergent behaviors at the level of populations or even ecosystems. At the very bottom of this hierarchy, inside single cells, interacting biomolecules such as DNA or proteins are often present in small numbers, giving rise to intrinsic stochasticity of individual reaction events [1, 2]. As a result, genetically identical organisms occupying identical environments can express different phenotypes [3, 4] and make different decisions when presented with identical environmental cues [5, 6]. This *molecular noise* is known to be the cause of biologically and medically important traits of bacteria such as persistence in response to antibiotics [7, 8] and competence during acquisition of heterologous DNA [9]. However, while its causes and consequences are relatively well-studied at the organismal level [10, 11, 12], how molecular noise propagates to higher scales of biological organization to affect the ecology and evolution of organisms remains mostly unknown [4]. Many ecosystems have been shown to follow surprisingly deterministic trajectories despite the prevalence of stochastic events [13, 14], yet these trajectories could themselves be strongly influenced by molecular noise. Thus, the extent to which ecological interactions are affected by molecular noise, and the extent to which these ecological consequences feed back to reshape individual traits, remain to be explored.

Perhaps the most prevalent biological systems in which molecular noise plays an important role are restriction-modification (RM) systems [15]. Present in nearly all prokaryotic genomes [16], RM systems are a highly diverse class of genetic elements. They have been shown to play multiple roles in bacteria as well as archaea, including regulation of genetic flux [17] and stabilization of mobile genetic elements [18], but have most frequently been described as primitive innate immune systems due to their ability to protect their hosts from bacterial viruses [19]. When a virus (bacteriophage or phage) infects a bacterium carrying an RM system, the DNA of the phage gets cleaved with a very high probability, thus aborting the infection. With a very small probability, however, the phage can escape and become immune to restriction by that specific RM system through epigenetic modification, leading to its spread and potentially to the death of the whole bacterial population in the absence of alternative mechanisms of phage resistance [20]. Thus, in the context of RM systems, molecular noise occurring at the level of individual bacteria can have profound ecological and evolutionary consequences. Because RM systems are ultimately based on only two very well characterized enzymatic activities (restriction and modification) [21], they represent a simple and tractable biological system in which we can investigate propagation of effects of molecular noise across different scales of biological organization.

Here, we mathematically model the action of RM systems from individual molecular events occurring inside a single cell, through individual bacteria competing in a population, to interactions between populations of bacteria and phages in a simple ecological setting, as shown in Fig 1. We demonstrate that, by imposing a tradeoff between the efficiency and cost of immunity, molecular noise in RM systems occurring at the level of individual bacteria has consequences that propagate all the way up to the ecological scale, and that the ecological consequences in turn imply the existence of optimal bacterial defense strategies against phages.

## Results

### Self-restriction in single cells and in growing populations

*RM*-systems consist of two enzymes: a restriction-endonuclease *R* that recognizes and cuts specific DNA sequences (restriction sites), and a methyl-transferase *M* that recognizes the same DNA sequences and ensures that only invading phage DNA can be cut by the endonuclease while the bacterial DNA remains methylated and protected. However, since chemical reactions occur stochastically, *RM*-systems can produce errors and fully methylate invading phage DNA before it is cut and degraded (phage escape) [22]. Similarly, it is possible that newly replicated restriction sites on the bacterial DNA, which are originally unmethylated, are accidentally cleaved instead of methylated (self-restriction) [23].

Inside a single cell, the probability of such self-restriction events depends on the total activity, *r*, of all restriction enzyme molecules *R*, the total activity, *m*, of all methylation enzyme molecules *M*, as well as the bacterial replication rate *λ*, since *λ* determines the rate at which new unmethylated restriction sites are generated. To investigate how self-restriction depends on these parameters, we model the corresponding biochemical reactions at each individual restriction site on the bacterial DNA with the stochastic reaction network displayed in Fig 2a (see SI Appendix Section S.1). The time *τ*_{S} until the first self-restriction event in a given cell (i.e., until that cell’s death or substantial reduction in growth rate) can be obtained as the time when the first restriction site is cut, that is, as *τ*_{S} = min_{*i*∈{1,…,*N*_{S}}} *τ*_{i}, for bacterial DNA with *N*_{S} restriction sites, where *τ*_{i}, *i* = 1, …, *N*_{S}, are the waiting times for cutting events at individual sites. It can be shown that all *τ*_{i} follow a phase-type distribution (see [24] and Fig 2b,c):
with **p**_{Q} = [*p*_{0} *p*_{1} *p*_{2}] being the initial methylation configuration, i.e., the proportion of restriction sites that are unmethylated (*p*_{0}), hemi-methylated (*p*_{1}) and doubly-methylated (*p*_{2}); see SI Appendix Section S.1.

Equation [1] allows us to derive the expected time until self-restriction of a single site as
more generally, Fig 2b shows how the distribution of waiting times depends on the restriction rate *r* (increasing the probability of the site getting cut when it is unmethylated) and the magnitude of *m* relative to *λ* (which decreases the probability that the site is unmethylated in the first place).

Fig 2c shows that the time to self-restriction at a single site depends essentially on an unknown quantity, the methylation configuration **p**_{Q}. Here we argue that by shifting the focus from the single-cell scale to the population scale, the configuration **p**_{Q} can no longer be freely chosen, and has to be determined self-consistently instead. Intuitively, this is because when the bacterial population is in steady-state growth, new unmethylated sites are constantly replenished by replication, while cells with more unmethylated sites are simultaneously and preferentially being removed, as illustrated in Fig 3a and required by Eq [2]. These two forces, generation of new unmethylated sites and their preferential removal, will push any initial **p**_{Q} towards a unique steady state equilibrium.

Mathematically, assuming that the methylation dynamics in all cells are equilibrated and that cells cannot be distinguished, the internal methylation configuration of any randomly chosen cell at any time during growth of the population can be derived from the quasistationary distribution **p**_{QSD}(*r, m*) of the individual-site methylation process in Fig 2a (see SI Appendix Section S.1). **p**_{QSD}(*r, m*) is the equilibrium distribution of the stochastic process conditional on it not having reached the absorbing state where the DNA is cut and the cell has died (Fig 3a); in short, methylation and growth equilibrate “*in all directions except the one leading towards self-restriction*”. Then, setting **p**_{Q} = **p**_{QSD}(*r, m*) in Eq [1] reduces the phase-type distribution *f*(*τ*_{i}) for the time *τ*_{i} until self-restriction at an individual restriction site to a single exponential, implying further that the waiting time *τ*_{S} = min_{*i*∈{1,…,*N*_{S}}} *τ*_{i} until self-restriction of any site in the cell is also exponentially distributed. Consequently, we are led to the main result of this section: growth with self-restriction can be rigorously modeled at the population level with a Markov birth-death process for which the expected population size *n*(*t*) follows a simple ordinary differential equation
where *λ*_{e}(*r, m, λ*) = *λ − µ*(*r, m, λ*) is the effective growth rate and *µ*(*r, m, λ*) is the rate of self-restriction, defined as the inverse of the per-cell expected waiting time until self-restriction
with *γ*_{1} being the largest eigenvalue of B (an explicit stochastic simulation validating this analytical result is provided in the SI Appendix Section S.2).
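The quasistationary argument can be made concrete with a small numerical sketch. The three-state site model below is an assumption made for illustration only (the actual network is the one in Fig 2a and SI Appendix Section S.1): methylation moves a site from unmethylated to hemi-methylated to doubly methylated at rate *m*, replication moves it back one step at rate *λ*, and only the unmethylated state can be cut, at rate *r*. The rate values and the helper name `quasi_stationary` are likewise hypothetical. Under these assumptions the quasistationary distribution is the normalized left eigenvector of the sub-generator B for its largest eigenvalue *γ*_{1}, and the per-cell self-restriction rate for *N*_{S} independent sites is −*N*_{S}*γ*_{1}.

```python
import numpy as np

def quasi_stationary(B):
    """Return (gamma_1, p_qsd) for a sub-generator matrix B.

    B holds the transition rates among the transient states only; each row
    sums to minus the exit rate from that state into the absorbing state.
    """
    eigvals, left_vecs = np.linalg.eig(B.T)   # left eigenvectors of B
    idx = np.argmax(eigvals.real)             # eigenvalue with the largest real part
    gamma_1 = eigvals[idx].real
    p = np.abs(left_vecs[:, idx].real)        # Perron vector has a single sign
    return gamma_1, p / p.sum()

# Hypothetical rates (per minute), chosen only for illustration.
r, m, lam, N_S = 1.0, 10.0, 0.017, 5

# Assumed states: 0 = unmethylated, 1 = hemi-methylated, 2 = doubly methylated.
# Assumed transitions: methylation 0->1->2 at rate m, replication 2->1->0 at
# rate lam, cutting of the unmethylated site at rate r (absorbing state).
B = np.array([
    [-(m + r), m,          0.0 ],
    [lam,      -(lam + m), m   ],
    [0.0,      lam,        -lam],
])

gamma_1, p_qsd = quasi_stationary(B)
mu = -N_S * gamma_1   # rate of the minimum of N_S independent exponential times
print("p_QSD =", p_qsd, " mu =", mu)
```

Starting a site in `p_qsd` gives a survival probability of exp(*γ*_{1}*t*), which is why the phase-type distribution collapses to a single exponential in this sketch.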

Equation [4] allows us to straightforwardly evaluate the reduction in the population growth rate due to random self-restriction events in single cells for any given pair of enzyme activities, *r* and *m*. To study possible qualitative effects of self-restriction, we explore in Fig 3b a wide range of enzyme activities for a system with *N*_{S} = 5 restriction sites (chosen, for illustration purposes, significantly smaller than the typical number of sites recognized by real RM systems). We find that the main determinant of self-restriction is the activity *m* of the methyl-transferase and that the effects of molecular noise can be suppressed by sufficiently increasing *m*. Furthermore, so long as *m* is large enough such that unmethylated restriction sites are only rarely available, *µ*(*r, m, λ*) lies on a large plateau of low self-restriction and changes only little with *r* and *m*, suggesting that stochastic fluctuations in enzyme activities would only have minor consequences for the population, especially when they are positively correlated, as would be the case if *R* and *M* enzymes were expressed from the same operon (SI Appendix Section S.3).

The (*r, m*) plane in Fig 3b contains a transition region that separates the large plateau with low self-restriction from the plateau where self-restriction is severe enough to stop the population growth altogether. We have chosen our reference (red) parameter values (*r*_{ref}, *m*_{ref}) to lie in this transition region, and explored the regimes with *e*-fold higher rates (“large r & m”, indicated by green) and with 2*e*-fold lower rates (“small r & m”, indicated by blue) in Fig 3b, c. The comparison of these three regimes in Fig 3c is clearest when the effective growth rate is shown as a function of *λ*, the rate at which the cells, and thus the restriction sites, are replicated. In the “small r & m” regime, self-restriction is so infrequent that it can easily be outgrown by replication (except at very low *λ*). In the “large r & m” regime, *m* is sufficiently high to keep the restriction sites protected and thus self-restriction is rare, except at extremely large *λ*, where the green curve falls below the blue curve. In the reference regime, *r* is too large and *m* not high enough to protect, so self-restriction cannot be “outgrown”; effective growth thus falls significantly below *λ*. Our numerical analyses further show that the self-restriction rate *µ*(*r, m, λ*) grows faster-than-linearly with *λ* (SI Appendix Section S.1), causing the effective population growth to slow down and ultimately drop to zero at high enough *λ*.

We end this section by highlighting a non-trivial interaction between the single-cell and population-scale processes. While increasing the activity *r* of the endonuclease always decreases the effective growth rate of the population due to self-restriction, the effect can be smaller than expected from the single-cell analysis (dashed lines in Fig 3c). This is because high values of *r* feed back through the population scale to bias the steady-state distribution of methylation configurations away from cells with lots of unmethylated sites, as shown in Fig 3a, making self-restriction less likely. Implicit feedback effects of this type frequently give rise to complex dynamics in multi-scale models.

### Phage escape

*RM*-systems lower the growth rate of the population due to self-restriction, especially when the activity *m* of the methyl-transferase is small. Upon infection by a phage, however, small values of *m* are advantageous, making it less likely that the unmethylated phage DNA will get methylated and escape the immune system before it can be cut by the restriction enzyme.

Assuming that all restriction sites are identical and independent, the probability of phage escape can be calculated [25] as
where *N*_{V} is the number of restriction sites on the phage DNA. From Eq [5] it is straightforward to see that *p*_{V}(*r, m*) is monotonically increasing in *m* and decreasing in *r*. One might therefore expect that the balance between avoiding self-restriction, which favors high *m* (Eq [4]), and minimizing phage escape, which favors low *m* (Eq [5]), would impose a tradeoff and thus lead to an optimal value of *m*. However, this is not the case, because phage escape probability *p*_{V}(*r, m*) and the population self-restriction rate *µ*(*r, m, λ*) can both approach zero so long as *r* and *m* both increase to infinity but *r* does so faster. While mathematically possible, this limit is, however, not biologically relevant: large enzyme expression levels should incur a cost (metabolic or due to toxicity) for the cells [26, 27], which we sought to incorporate into our model by including a growth rate penalty proportional to the activity of restriction and methylation enzymes, i.e., *λ*_{e}(*r, m, λ*) = *λ − µ*(*r, m, λ*) *− c*_{r}*r − c*_{m}*m*. Interestingly, it can be verified that our reasoning is valid only because two subsequent demethylation events need to occur to create a restriction-susceptible site on the bacterial DNA (SI Appendix Section S.1). If hemi-methylated sites could be recognized by the restriction endonuclease, or if both methyl groups could be lost in a single event, our initial expectation about the existence of the tradeoff would be correct, and a particular choice of *r* and *m* values would simultaneously minimize the phage escape and self-restriction, even in the absence of the expression cost for R and M.
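The qualitative behavior described above can be checked with a short sketch. The functional form used here is an assumption, not a transcription of Eq [5]: with identical, independent sites, each methylated at rate *m* and cut at rate *r*, the chance that all *N*_{V} sites are methylated before any one is cut is taken to be (*m*/(*r* + *m*))^{*N*_{V}}. The function name and all numerical values are hypothetical.

```python
def escape_probability(r, m, n_v):
    """Assumed phage escape probability: all n_v sites get methylated
    (rate m each) before any of them is cut (rate r each)."""
    return (m / (r + m)) ** n_v

n_v = 5  # hypothetical number of restriction sites on the phage DNA

base = escape_probability(r=100.0, m=10.0, n_v=n_v)
more_m = escape_probability(r=100.0, m=20.0, n_v=n_v)   # higher m: escape more likely
more_r = escape_probability(r=200.0, m=10.0, n_v=n_v)   # higher r: escape less likely

# Let r and m both grow, with r growing faster (here r = m**2):
# the assumed escape probability keeps shrinking.
limit_trend = [escape_probability(r=m**2, m=m, n_v=n_v) for m in (10.0, 100.0, 1000.0)]

print(base, more_m, more_r, limit_trend)
```

In this form the escape probability depends only on the ratio *r*/*m*, so it can be driven towards zero while *m* itself stays large enough to suppress self-restriction, which is the limit that the expression cost rules out.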

Our model can be generalized to multiple coexisting *RM*-systems that recognize different restriction sites and operate in parallel, as is often observed for bacteria in the wild [16]. This provides increased protection from phages since the phage has to escape all *RM*-systems to infect successfully. However, multiple *RM*-systems also imply that the bacteria either have to pay higher expression and self-restriction costs or that they have to rebalance the expression levels of the enzymes such that lower self-restriction rates per *RM*-system are obtained with the same overall enzyme activity. Allowing bacteria to have multiple *RM*-systems, but assuming for the sake of simplicity that these systems are all equivalent in terms of enzyme activities and number of recognition sites, we obtain the phage escape probability for *k* *RM*-systems as *p*_{V}(*r, m, k*) = *p*_{V}(*r, m*)^{*k*}, with the corresponding growth rate being *λ*_{e}(*r, m, k, λ*) = *λ − kµ*(*r, m, λ*) *− kc*_{r}*r − kc*_{m}*m*.

### Population dynamics in the presence of phages

What is the combined effect of phage escape and self-restriction in simple bacteria-phage ecologies? To answer this question quantitatively, we model the dynamics of the bacterial population, *n*(*t*), in the presence of obligately lytic phage as follows:

The first term on the right-hand side describes the growth of the population at an effective rate of *λ*_{e}(*r, m, k, λ*) and already accounts for self-restriction. The second term accounts for phage escape events. Here, *v* is the phage density which we take to be a constant parameter of the environment; phages enter bacterial cells at a rate proportional to *vn*(*t*), as prescribed by mass-action kinetics, with a proportionality constant given by the phage adsorption rate, *ρ*. The rate of successful infections is then given by *ρvp*_{V}(*r, m, k*)*n*(*t*). We assume that each successful infection event wipes out a fraction, 0 *< l <* 1, of the total population (i.e., on average, *n*(*t*)*/l* bacteria die following phage escape), yielding the full Eq [7]. For simplicity, we assume all infections to be lytic and do not consider the possibility of the phage lysogenizing the host bacteria.

The unknown parameters *ρ*, *v*, and *l* enter Eq [7] as a product, which can be thought of as defining a new unit, *n*_{0} = *l/*(*ρv*), for the population size. The growth dynamics has one biologically-relevant fixed point that represents the steady-state bacterial population size:

The same quantity, *n*_{s}, also emerges as relevant in an alternative ecological model where bacteria grow exponentially until the occurrence of the first phage escape event, which then wipes out the complete population. This alternative ecology may be representative for bacteria that rarely encounter long-term stable environments (e.g. marine bacteria feeding on small organic particles in the ocean [28]) such that maximizing the number of offspring colonizing new environments may be more important than maintaining high steady-state population size (see SI Appendix S.4.1). In both ecological scenarios, two key quantities summarize the fate of bacterial populations coexisting with bacteriophages: *λ*_{e} quantifies the short-term growth rate before rare but potentially catastrophic phage escape events are likely to occur, and *n*_{s} quantifies the long-term cost of phage escape. Importantly, both *λ*_{e} and *n*_{s} depend solely on three single-cell parameters: the restriction rate *r*, the methylation rate *m*, and the number of concurrently active RM-systems, *k*.
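A minimal numerical sketch of the steady-state scenario follows. It rests on an assumed reading of the verbal description, not on the displayed equations: growth at rate *λ*_{e}, successful infections at rate *ρvp*_{V}*n*, and a loss of *n*/*l* bacteria per infection, which gives d*n*/d*t* = *λ*_{e}*n* − (*p*_{V}/*n*_{0})*n*² with *n*_{0} = *l*/(*ρv*) and a nonzero fixed point *n*_{s} = *n*_{0}*λ*_{e}/*p*_{V}. The function name, parameter names, and values are hypothetical.

```python
def simulate(lam_e, p_v, n0_unit, n_init, t_end, dt=0.01):
    """Forward-Euler integration of the assumed dynamics
    dn/dt = lam_e * n - (p_v / n0_unit) * n**2."""
    n = n_init
    for _ in range(int(t_end / dt)):
        n += dt * (lam_e * n - p_v * n * n / n0_unit)
    return n

# Hypothetical values: effective growth rate (1/min), escape probability,
# and the population-size unit n_0 = l / (rho * v).
lam_e, p_v, n0_unit = 0.015, 1e-4, 1.0

n_s = n0_unit * lam_e / p_v   # fixed point of the assumed dynamics
n_final = simulate(lam_e, p_v, n0_unit, n_init=1.0, t_end=3000.0)

print("predicted n_s:", n_s, " simulated:", n_final)
```

In this sketch *λ*_{e} sets how quickly the population approaches the plateau, while the ratio *λ*_{e}/*p*_{V} sets the plateau itself, which mirrors the two summary quantities named in the text.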

### Tradeoffs and optimality in bacterial immunity

Can bacteria tune the single-cell parameters over evolutionary timescales in order to maximize the short-term growth rate, *λ*_{e}, and the steady-state population size, *n*_{s}? Equations [6,8] assert that these two quantities are necessarily in a tradeoff and cannot be maximized simultaneously, which is illustrated in Fig 4a. This tradeoff is the first key result of the section.

With no single optimum possible, we look instead for Pareto-optimal parameter combinations, (*r, m, k*), i.e., solutions for which *λ*_{e} cannot be further increased without reducing *n*_{s} and vice versa [29, 30]. Different Pareto-optimal solutions trace out a “front” in the plot of *λ*_{e} vs *n*_{s} in Fig 4b that jointly maximizes growth rates and population sizes to the extent possible. Points in the interior of the front are sub-optimal and could be improved by adjusting parameter values, while points beyond the front are inaccessible to any bacterial population. Which Pareto-optimal solution ultimately emerges as an evolutionary stable strategy depends on the actual bacterial and phage species considered as well as their biological context. Rather than focusing on specific examples, we next establish several general results of our analysis, contrasting in particular “fast growth” bacterial strategies that maximize *λ*_{e} with “large size” strategies that maximize *n*_{s}.
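The notion of Pareto optimality used here can be illustrated with a generic non-dominated filter over candidate (*λ*_{e}, *n*_{s}) pairs. The candidate values below are made up for illustration and are not model outputs; in practice each pair would come from evaluating the model at one parameter combination (*r, m, k*). The function name `pareto_front` is hypothetical.

```python
def pareto_front(points):
    """Return the points that no other point dominates, maximizing both
    coordinates (growth rate, steady-state size)."""
    front = []
    for i, (g, s) in enumerate(points):
        dominated = any(
            (g2 >= g and s2 >= s) and (g2 > g or s2 > s)
            for j, (g2, s2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((g, s))
    return sorted(front)

# Made-up (lambda_e, n_s) pairs standing in for different (r, m, k) choices.
candidates = [(0.2, 9.0), (0.5, 7.0), (0.4, 6.0), (0.8, 3.0), (0.7, 3.0), (0.9, 1.0)]

front = pareto_front(candidates)
print(front)
```

Points such as (0.4, 6.0) are removed because another candidate is at least as good in both quantities and strictly better in one; the surviving points trace the front along which growth rate can only be gained by giving up population size.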

We start by examining in Fig 4c the optimal enzyme activities, *m*_{opt} and *r*_{opt}, along the Pareto fronts. For the “large size” regime at low *λ*_{e}, the bacterial population primarily needs to defend against phage escape, favoring low *m* and high *r*, even at the cost of self-restriction. As we move towards the “fast growth” regime, *r* can drop to decrease the cost, but *m* must increase to protect against self-restriction, until maximal *m*_{opt} is reached. For even higher *λ*_{e}, it is optimal to “shut down” the RM-systems altogether to save on the cost, by tuning *r* and *m* simultaneously to zero. Numerical analysis (SI Appendix S.4.2) reveals that along the Pareto front of Fig 4b, the *total* cost of running the RM systems varies in precise inverse linear relationship with *λ*_{e}. Pareto-optimal solutions are further characterized by the fact that the reduction in growth rate, *λ − λ*_{e}, is split equally between the cost of running RM systems, *c*(*r* + *m*), and self-restriction. If this were not the case and the cost were larger (or smaller) than self-restriction cost to growth, cells could always down-(or up-)regulate the RM-system activity to trade cost for self-restriction and obtain an overall smaller total growth reduction. This universal equality of cost of running RM systems and self-restriction at optimality is the second key result of the section.

A detailed examination of the Pareto front in Fig 4b reveals a striking shift in the structure of optimal solutions as we move from the “fast growth” to the “large size” regime. In situations where fast growth is favored, we observe that a single RM-system (*k* = 1) is optimal. In contrast, large steady-state bacterial population sizes favor *k*_{opt} *>* 1 RM-systems, with the optimal number, *k*_{opt}, set by the costs, *c*_{m} and *c*_{r}, of operating the RM-systems. Figure 4d makes this point explicit by comparing the growth curves of bacterial populations with the same *λ*_{e} = 0.5*λ*_{ref} but different numbers of RM-systems; here, populations with *k*_{opt} = 2 reach the largest size. These results are quantitatively robust to changes in replication rate, *λ*, as shown in Fig 4e, where Pareto fronts for different *λ* are nearly rescaled versions of each other. These results are also qualitatively robust to changes in the cost *c* = *c*_{r} = *c*_{m} so long as the cost is nonzero, as shown in Fig 4f.

Establishing that the “fast growth” regime favors simple innate immunity with a single RM-system while the “large size” regime favors complex innate immunity with multiple RM-systems is the third key result of this section. This result can be understood intuitively by considering under what conditions, if any, multiple RM systems could be optimal at “fast growth”. If costs for *R* and *M* enzymes are vanishingly small, a single RM-system can provide arbitrarily good protection, as we showed previously. If the costs are not vanishingly small, multiple RM systems must be more costly than a single system at comparable phage escape and self-restriction rates: to keep self-restriction constant with *k* RM systems, not only does the cell require *k* times more *M* molecules than at *k* = 1, but their individual activities need to be higher as well, leading to a higher cost for *M* and thus a lower effective growth rate. This argument does not apply over longer timescales, where multiple RM-systems can better protect against phage escape and enable higher bacterial population sizes.

Lastly, we sought to put our results into perspective by relating them to a typical *E. coli* strain. Recent measurements [23] quantified the self-restriction rate in a bacterial population with the EcoRI system replicating at *λ* = 0.017 min^{−1} to be *µ ≈* 10^{−3} min^{−1}. The cost of RM-systems was not detectable in the WT strain but could be detected in strains overexpressing *M* enzymes (unpublished data). Treating the cost *c* as unknown and assuming that *E. coli* is Pareto-optimal (black dots in Fig 4e,f), we predict the following parameter values for the RM systems: cost *c ≈* 3.7 × 10^{−7}, enzyme activities *r ≈* 1.2 × 10^{3} min^{−1}, *m ≈* 1.5 × 10^{2} min^{−1}, with the optimal number of RM-systems being at the boundary between *k* = 1 and *k* = 2. Clearly, this prediction depends not only on the optimality assumption but also on at least two strong simplifications: (i) we do not know the ecology and the distribution of typical growth rates for different bacterial species or isolates; (ii) we do not know the corresponding number of RM-systems and their individual properties; here we simply assumed that all RM systems are the same.

## Discussion

Despite the ubiquity of RM systems in prokaryotic genomes [16], basic ecological and evolutionary aspects of these otherwise simple genetic elements are poorly understood [19]. Although RM systems were discovered more than six decades ago due to their ability to protect bacteria from phage [31] and this is often assumed to be their main function [32], only a few experimental studies have focused on the ecological and evolutionary dynamics of interactions between RM systems and phage [33, 34]. Similarly, effects of RM systems on their host bacteria, such as their cost in individual bacteria due to self-restriction, began to be addressed quantitatively only recently [23, 35]. In this work, we bridged these two scales using mathematical modeling. Our model captures the stochastic nature of RM systems originating at the level of interacting molecules in individual bacteria and extends it all the way to the dynamics of interactions between growing bacterial and phage populations.

Using this approach, we analytically described the tradeoff between the cost and the efficiency of immunity conferred by RM systems. The existence of a tradeoff was previously indicated by quantitative single-cell experiments with two RM systems isolated from *Escherichia coli* [23]. We extended the mathematical model of restriction and modification in individual bacteria to a simple ecological setting, where a bacterial population grows in the presence of phages, and thus showed that the tradeoff between the cost and the efficiency of immunity can profoundly affect the resulting population dynamics. As an important consequence of the tradeoff between the cost and the efficiency of immunity, there exists no optimal pair of R and M enzymatic activities suited for all ecological settings. Instead, we can expect the observed expression levels and enzymatic activities of naturally occurring RM systems to represent adaptations to specific environmental pressures. Such “tuning” of expression levels towards optimality has previously been directly experimentally shown in different molecular systems [26]. The expression levels of both R and M should be readily tunable by mutations in the often complex gene-regulatory regions [36].

While our predictions should be viewed as approximate, our analysis highlights two important conclusions. First, optimality can make clear and quantitative predictions in the parameter regimes relevant for real strains, and improving the predictions to take into account more relevant biological detail (if needed and known) remains only a technical, rather than conceptual, challenge. Second, parameter values measured for an *E. coli* RM system put optimal solutions into a regime that permits a large variation in the optimal number of RM systems, between one and six, with relatively small changes in the effective growth rate. This observation allows us to advance the following hypothesis: the number of RM-systems in different bacterial strains and species is not a historical contingency, but an evolutionary adaptation to different ecological niches characterized by different typical growth rates. In other words, the tradeoff between the cost and the efficiency of immunity can be partially alleviated in bacteria employing multiple RM systems. It is therefore interesting to note that many bacterial species carry multiple RM systems and the number of RM systems varies significantly among bacteria with different genome sizes and lifestyles [16, 15]. Our results indicate that different numbers of RM systems would be optimal in populations under different selection pressures (phage predation/resource limitation).

The analytical model presented here makes several simplifying assumptions. First, we consider only interactions between a single species of bacteria and a single species of phage. In natural environments, many bacterial and phage species interact and this diversity will certainly impact the resulting ecological and evolutionary dynamics [37, 38, 33, 39]. Second, we assumed the key parameters such as the numbers of restriction sites in bacterial and phage genomes to be constant in time and thus disregarded the long-term evolutionary dynamics. Bioinformatic studies have shown that many bacteria and phage avoid using restriction sites in their genomes [40, 41, 42]. Restriction site avoidance can represent an adaptive mechanism for increasing the likelihood of escape in phages [41, 43] and decreasing the likelihood of self-restriction in bacteria [44, 23]. The stochastic nature of RM systems observed at the level of individual cells is thus likely to critically shape the ecological and evolutionary dynamics of interactions between bacteria, RM systems and phage.

## Acknowledgements

The authors would like to thank Moritz Lang, Gregory Batt, Eugenio Cinquemani and Christoph Zechner for helpful discussions. JR acknowledges support from the Agence Nationale de la Recherche (ANR) under Grants No. ANR-16-CE33-0018 (MEMIP) and ANR-16-CE12-0025 (COGEX). CCG was supported by the HFSP Young Investigators’ grant. MP was a recipient of a DOC Fellowship of the Austrian Academy of Science at the Institute of Science and Technology Austria. GT was supported in part by the Austrian Science Fund grant FWF P28844.