## Abstract

Correct decision making is fundamental for all living organisms to thrive under environmental changes. The patterns of environmental variation and the quality of available information define the most favourable strategy among multiple options, including sensing and reacting to environmental cues or randomly adopting a phenotypic state. Memory – a phenomenon often associated with, but not restricted to, higher multicellular organisms – can help when temporal correlations exist. How does memory manifest itself in unicellular organisms? Through a combination of deterministic modelling and stochastic simulations, we describe the population-wide fitness consequences of phenotypic memory in microbial populations. Moving beyond binary switching models, our work highlights the need to consider a broader range of switching behaviours when describing microbial adaptive strategies. We show that multiple cellular states capture the empirical observations of lag time distributions, overshoots, and ultimately the phenomenon of phenotypic heterogeneity. We emphasise the implications of our work in understanding antibiotic tolerance, and, in general, survival under fluctuating environments.

## Introduction

In an ideal world, living organisms would be able to adapt instantly and reliably to changing environmental conditions in order to maximise their instantaneous performance. However, conditions may change abruptly and unpredictably, making it ineffective to mount a specific rapid response. Also, some responses require the synthesis of complex molecules (secretion systems or capsules in bacteria) or entry into a physiological state (dormancy) that cannot be reverted instantaneously if need be. Switching to a new phenotype may thus commit the cell to a response that lasts for a specific timescale different from the duration of the changed environment. Besides phenotypic switching, diversity in response rates can result in intricate patterns of phenotypic heterogeneity. We postulate that instantiating memory in the switching mechanisms underpinning phenotypic heterogeneity will affect population composition, size and ultimately fitness. While classical models of phenotypic heterogeneity use simple ‘on’ - ‘off’ dynamics, the accuracy of modern experimental methods requires us to delve deeper into the dynamics. Here we provide a theoretical framework for phenotypic switching rooted in a mechanistic concept of molecular memory.

In bacteria, memory is often discussed in the context of phenotypic heterogeneity [1, 2, 3]. In homogeneous environments, non-genetic individuality can arise through fluctuations in molecular concentrations of signaling molecules during transcriptional bursts [4], unequal partitioning at cell division [5] or via other epigenetic mechanisms [6, 7, 8, 9, 10]. Such ‘in-built’ mechanisms for phenotypic variation help bacterial populations adapt to harsh environments [11, 12, 10, 13]. When environmental fluctuations show stereotypical patterns, unicellular organisms may harness this temporal information to adjust their mode of phenotypic adaptation to match, or even anticipate, environmental fluctuations [14]. Such strategies can be embedded in genetic regulatory networks [15, 16] or arise from epigenetic phenotypic switches [17, 18]. These forms of fitness optimization by associative learning can arguably be assimilated to memory-based processes [19, 20].

A large body of theoretical work recognises the role of phenotypic heterogeneity by analysing its effect on bacterial fitness [12, 21, 22, 23]. However, depending on the details of the underlying genetic network controlling switching, its dynamics may not be adequately described by simple ‘on’ - ‘off’ dynamics with constant rates of switching between distinct phenotypic states. When phenotypic switching occurs at a constant rate, the residence time in a given phenotypic state is best described mathematically as a memory-less process [24]. The probability of switching, then, does not depend on the history of the cell (but only on the fact that it is in a given state), leading to exponentially-distributed residence times. By contrast, when the probability of switching depends on the time already spent in a given state, residence time no longer follows an exponential distribution and multigenerational memory occurs [25]. A simple constant phenotypic switch is, therefore, not a generic model when approaching the problem of intergenerational memory. For example, such a model is inadequate when explaining phenomena such as broad time-lag distributions and both over- and undershooting behaviour of specific phenotypes observed in bacteria or eukaryotic organisms [17, 26, 27].

Motivated by the discovery of different forms of memory in bacteria [25, 24], we propose a unifying approach to define memory. We ground our model in mechanistic processes occurring in a cell. These processes can be experimentally detected, helping us classify exact phenotypes. First, we derive deterministic dynamics from first principles at the level of individual cells and track the behaviour of a cell population emerging from different switching genotypes. Using this model, we can consistently explain the experimental observation of over/undershoots and wide time lag distributions. Then, paying particular attention to the characterisation of transient and equilibrium states of the system, we discuss their consequences for the fitness of a lineage in fluctuating environments. Fluctuations between stressed and relaxed environments, as can occur during antibiotic treatments, highlight the applicability of our approach. The classically studied bi-stable switch is a special case of our more general model.

## Model & Results

### 2.1 Building memories

We simulate the ecological dynamics of isogenic populations of unicellular organisms that can exist in two different phenotypic states: ‘on’ and ‘off’. Switching from ‘off’ to ‘on’ is unidirectional and stochastic, and occurs at rate *μ*. After turning ‘on’, cells cascade via a deterministic, multi-step process through *n* compartments eventually returning to the ‘off’ phenotype. These compartments represent internal molecular states (potential), such as the dilution of cytoplasmic or membrane protein concentrations [2, 5, 3], that may determine the qualitatively different ‘on’ - ‘off’ phenotypes through a threshold-based mechanism [28].

Immediately after turning ‘on’, cells have the highest potential. While retaining the ‘on’ state, cells gradually lose potential by transitioning through the successive compartments at a leaching rate $\epsilon$. This movement reflects the process of protein degradation in a simplistic manner; more complicated forms can be formulated (with bumps or plateaus on the landscape, see SI). While the compartments ($i$) are a representation of the phenotypic dynamics, the growth (birth and death) dynamics of the cells occur separately at rates $b_i$ and $d_i$, respectively.

Fig. 1 visualizes this compartmental model for $n = 4$. The reactions in which an individual cell $X_i$ in compartment $i$ is involved are then,

$$
X_i \xrightarrow{b_i} X_i + X_i, \qquad
X_i \xrightarrow{d_i} \emptyset, \qquad
X_0 \xrightarrow{\mu} X_n, \qquad
X_i \xrightarrow{\epsilon} X_{i-1} \quad (i = 1,\dots,n). \tag{1}
$$

These reactions have a deterministic counterpart in the following linear system of differential equations,

$$
\begin{aligned}
\dot{x}_0 &= (b_0 - d_0 - \mu)\,x_0 + \epsilon\,x_1,\\
\dot{x}_i &= (b_i - d_i - \epsilon)\,x_i + \epsilon\,x_{i+1},\\
\dot{x}_n &= (b_n - d_n - \epsilon)\,x_n + \mu\,x_0,
\end{aligned} \tag{2}
$$

which describe the time evolution of the abundances $x_j$ of cells in the compartments $j = 0,\dots,n$, with $i = 1,\dots,n-1$. In this model, we can distinguish growth dynamics as specified by birth and death rates from phenotypic dynamics as specified by switching and leaching rates. Their current state determines the movement of cells. Once a cell is in an ‘off’ state it stays ‘off’ until an (external or internal) perturbation switches it back ‘on’. We can imagine the perturbations to be abiotic or biotic [11, 29]. The perturbations have to be large enough to shock the system out of the ‘off’ equilibrium.
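The compartment structure described above can be sketched numerically. The snippet below is a minimal illustration rather than the authors' published code: it integrates the linear system with a forward-Euler step, assumes that offspring stay in their parent's compartment, and uses hypothetical function names (`derivatives`, `integrate`) and arbitrary parameter values.

```python
def derivatives(x, mu, eps, b, d):
    """Right-hand side of the compartment model.

    x[0] is the 'off' abundance; x[1..n] are the 'on' compartments,
    with x[n] the highest potential. b and d are per-compartment lists.
    """
    n = len(x) - 1
    dx = [0.0] * (n + 1)
    # 'off' cells: growth, loss by switching 'on', gain from compartment 1
    dx[0] = (b[0] - d[0] - mu) * x[0] + eps * x[1]
    # intermediate 'on' compartments: growth, leaching out, leaching in
    for i in range(1, n):
        dx[i] = (b[i] - d[i] - eps) * x[i] + eps * x[i + 1]
    # highest-potential compartment: fed by switching from 'off'
    dx[n] = (b[n] - d[n] - eps) * x[n] + mu * x[0]
    return dx


def integrate(x0, mu, eps, b, d, t_end, dt=1e-3):
    """Forward-Euler integration from the initial abundances x0."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = derivatives(x, mu, eps, b, d)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Illustrative values only: n = 4 'on' compartments, no birth or death,
# and a population that starts entirely in the 'off' state.
n = 4
x_final = integrate([1.0] + [0.0] * n, mu=0.2, eps=1.0,
                    b=[0.0] * (n + 1), d=[0.0] * (n + 1), t_end=50.0)
print([round(v, 4) for v in x_final])
```

With birth and death set to zero the total abundance is conserved, which gives a quick sanity check on the bookkeeping of the switching and leaching fluxes.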

In our system, we track the amount of time a cell spends in a particular compartment. Following the trajectories of individual cells, qualitatively different distributions of the time spent in the ‘on’ state emerge as the number of compartments *n* increases. In particular, departure from an exponential distribution is a hallmark of memory [25] (Fig. 2). For a constant leaching rate, and negligible growth parameters, the length of memory acts like a timer (a deterministic time as termed in [25] for the residence time in the motile state). The overall density function can be captured by a combination of multiple exponential waiting times which results in a gamma distribution with a shape parameter given by the memory size *n* (Fig. 2). We can thus use this model to study ‘long-term’ memory. Codes for implementing our algorithm as well as for reproducing the relevant figures are available at GitHub.
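The emergence of a gamma-distributed residence time can be illustrated with a few lines of stochastic simulation. This is a sketch under the assumptions stated above (constant leaching rate, negligible growth); the function name `residence_time`, the seed and the sample size are illustrative choices, not taken from the authors' repository.

```python
import random
import statistics


def residence_time(n, eps, rng):
    """Time spent 'on': n successive exponential leaching steps at rate eps."""
    return sum(rng.expovariate(eps) for _ in range(n))


rng = random.Random(1)
eps = 1.0
for n in (1, 4, 16):
    samples = [residence_time(n, eps, rng) for _ in range(20000)]
    mean = statistics.fmean(samples)
    cv = statistics.pstdev(samples) / mean
    # For Gamma(shape=n, rate=eps) the mean is n/eps and the coefficient
    # of variation is 1/sqrt(n): the distribution sharpens as n grows.
    print(f"n={n}: mean={mean:.2f}, CV={cv:.2f}")
```

For *n* = 1 the coefficient of variation stays close to 1 (the exponential, memoryless case); as *n* grows it shrinks, which is the timer-like behaviour described in the text.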

### 2.2 Asymptotic properties

The model is Markovian, as the current system state entirely determines cell dynamics. Eventually, as $t$ gets very large, the distribution of cells across the compartments reaches an equilibrium. This equilibrium can be recovered from an eigenvector of the matrix that captures the system in Eq. 2 (SI). Under complete symmetry in the growth dynamics ($b_i = b$ and $d_i = d$) we obtain a simple, closed-form expression for this equilibrium for any number $n$ of compartments (SI). We begin with a focus on this symmetric case, where the asymptotic growth rate $b - d$ of the population is independent of the number of compartments and corresponds to the dominant eigenvalue of the matrix model. The assumption is relaxed later in the manuscript. Our interest is in the effect of changes in the leaching rate $\epsilon$ and memory $n$. We observe that as $\epsilon$ decreases or $n$ increases, both the time a cell spends in the ‘on’ state and the equilibrium fraction of ‘on’ cells increase. However, for a small number of compartments, or faster transition through ‘on’ states, the system is dominated by ‘off’ cells.
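The dependence of the equilibrium on the leaching rate and on memory can be checked directly from the closed-form expression given in the SI for the symmetric case ($\epsilon/(\epsilon + n\mu)$ for the ‘off’ fraction, $\mu/(\epsilon + n\mu)$ for each ‘on’ compartment). The sketch below uses hypothetical helper names and arbitrary parameter values; `net_flux` only verifies that the switching and leaching fluxes balance at that composition.

```python
def equilibrium(n, mu, eps):
    """Closed-form equilibrium fractions for symmetric growth (b_i = b, d_i = d).

    Returns [off, on_1, ..., on_n]; every 'on' compartment holds the same share.
    """
    total = eps + n * mu
    return [eps / total] + [mu / total] * n


def net_flux(u, mu, eps):
    """Phenotypic inflow minus outflow per compartment.

    Under symmetric growth the birth and death terms cancel out of the
    fractions, so a stationary composition must have zero net flux.
    """
    n = len(u) - 1
    flux = [0.0] * (n + 1)
    flux[0] = -mu * u[0] + eps * u[1]
    for i in range(1, n):
        flux[i] = -eps * u[i] + eps * u[i + 1]
    flux[n] = -eps * u[n] + mu * u[0]
    return flux


for n, eps in [(1, 1.0), (5, 1.0), (5, 0.2)]:
    u = equilibrium(n, mu=0.1, eps=eps)
    print(f"n={n}, eps={eps}: 'on' fraction = {1.0 - u[0]:.3f}")
```

Raising *n* or lowering the leaching rate both increase the ‘on’ fraction, in line with the observation above.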

### 2.3 Transient properties

Under constant switching (*μ*), a decrease in *ϵ* is qualitatively equivalent to an increase in *n*, in terms of asymptotic properties like the equilibrium cell distribution and the time spent in either state. However, this equivalence breaks down when considering transient dynamics. As the number of compartments increases, the frequencies of the ‘on’ and ‘off’ cells do not approach their equilibrium values directly. Instead, the frequencies overshoot, in the case of ‘on’ cells, and undershoot, in the case of ‘off’ cells. The frequency dynamics oscillate when approaching the equilibrium (Fig. 3 a). Interestingly, overshooting and oscillations in the transients of cell frequencies as seen in our model recapitulate observations in cancer cell dynamics [26] and in bacterial growth rate recovery following antibiotic treatments [30]. This transient effect directly relates to the length of memory. While modelling the ‘off’ and ‘on’ states as two compartments would appear more parsimonious, it falls short of replicating distinctive empirical results. Adding compartments intensifies transients, an effect attributable to the spectrum of the matrix model underlying Eq. 2 (see SI). The presence of multiple compartments introduces oscillations due to complex subdominant eigenvalues, absent when there is a single ‘on’ compartment. As complex subdominant eigenvalues are closer in real part to the dominant eigenvalue, increasing the number of compartments magnifies their influence on cell dynamics accordingly.

Cells switch back to the ‘off’ state after spending a certain amount of time in the ‘on’ state. The presence of multiple compartments extends memory and affects population composition as well as population density when evaluated at a specific time-point (Fig. 3 a). To capture the transient dynamics when increasing memory, we estimate the peak-to-peak amplitude (the difference between the largest overshoot and undershoot in the frequency of ‘on’ cells) (Fig. 3 b). Typically, the amplitude increases monotonically with memory length. The relationship breaks down when the equilibrium value of ‘on’ cells approaches 1. Close to the boundary of complete fixation of ‘on’ cells, there is not enough space for the frequency to fluctuate, hence the drop in amplitude (Fig. 3 b).
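The peak-to-peak amplitude can be estimated from a simulated trajectory. The sketch below is an illustration under assumptions that are not taken from the paper: symmetric growth (so birth and death drop out of the frequencies), a population starting entirely ‘off’, forward-Euler integration, arbitrary rates *μ* = *ϵ* = 1, and an operational definition of the amplitude as the highest peak minus the lowest value that follows it.

```python
def on_frequency_trajectory(n, mu, eps, t_end=40.0, dt=0.005):
    """Forward-Euler trajectory of the 'on' frequency, starting all 'off'."""
    x = [1.0] + [0.0] * n          # x[0] = 'off', x[n] = highest potential
    traj = []
    for _ in range(int(round(t_end / dt))):
        dx = [0.0] * (n + 1)
        dx[0] = -mu * x[0] + eps * x[1]
        for i in range(1, n):
            dx[i] = -eps * x[i] + eps * x[i + 1]
        dx[n] = -eps * x[n] + mu * x[0]
        x = [xi + dt * di for xi, di in zip(x, dx)]
        traj.append(1.0 - x[0] / sum(x))
    return traj


def peak_to_peak(traj):
    """Largest overshoot minus the deepest undershoot that follows it."""
    peak_index = max(range(len(traj)), key=traj.__getitem__)
    return traj[peak_index] - min(traj[peak_index:])


for n in (1, 2, 5, 10):
    traj = on_frequency_trajectory(n, mu=1.0, eps=1.0)
    print(f"n={n}: peak={max(traj):.3f}, amplitude={peak_to_peak(traj):.3f}")
```

With a single ‘on’ compartment the frequency rises monotonically to its equilibrium, so the amplitude is zero; with longer memory the trajectory overshoots before settling.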

### 2.4 Environmental variation

While cell lineages can stochastically switch phenotypes even in static conditions [31], such behavior might be evolutionarily advantageous under fluctuating environments [32]. As the environment changes, bacteria can hedge their bets via the well-documented phenomenon of persistence [33, 34]. A salient case is their response to different antibiotic treatment strategies.

To explore the effect of memory on bacterial fitness, we consider the case of persisters subjected to transient antibiotic exposure. Persistence is a common phenomenon in bacteria where a subpopulation of bacterial cells does not grow but is able to survive antibiotic treatments. Persisters can arise stochastically via epigenetic switching, but environmental conditions (nutrient limitation, high cell densities or antibiotic treatments) can also induce their formation (‘triggered’ persistence or type I persisters [29]). We track population composition and fitness (population size) of bacterial lineages under fluctuating environments, consisting of transient exposure to antibiotics. We assign ‘off’ cells to the normal physiological state (i.e. growing in permissive conditions, dying under antibiotic treatment). On the other hand, ‘on’ cells are persisters (no growth in either environment and tolerant to antibiotic treatment; see description of parameters in Fig. 4 and example dynamics in Fig. SI.4). While the leaching rate is kept constant, switching from ‘off’ to ‘on’ is triggered only under stress.

We subject a growing population of cells to a set of 84 environmental sequences in which bacteria are exposed to drug treatment (each horizontal line in Fig. 4 A corresponds to a different environmental sequence). The total duration of the procedure is kept constant and the duration of drug treatment varies, with exposure to the permissive environment before and after drug exposure (also see SI). Using these environmental sequences, we compute for each condition the fitness (proxied by the final population size) of a set of lineages with different memory sizes. We observe a non-linear relationship between memory size and fitness (Fig. 4 B). While there is a general trend that more memory is beneficial as drug treatment length increases, local maxima emerge at intermediate drug treatment lengths (Fig. SI.2). This effect arises from the trade-off between producing a high proportion of ‘on’ cells (occurring with higher memories) and exiting rapidly from the ‘on’ state when the environment switches from drug to permissive (occurring with smaller memories). For extended treatment lengths, longer memory staves off an inevitable population collapse.

However, memory is coupled to the transients as well as to the final frequency of ‘on’ cells observed at the end of each sequence. To disentangle the respective effects of these two factors, we compared the fitness of a lineage with memory (*n* > 1) to a lineage without memory (*n* = 1). To ensure a fair comparison, the memory lineage has a different leaching rate, which results in the same equilibrium ‘on’ frequency under sustained treatment (Fig. 4 C). The relative growth rate is the difference between the growth rate of a lineage with memory and that of a lineage without memory (but with an adjusted leaching rate), following http://myxo.css.msu.edu/ecoli/srvsrf.html. When the treatment length is short, longer memory is disadvantageous compared with a memoryless lineage with an adjusted leaching rate. The disadvantage arises because the cells stay in the ‘on’ state even after the short treatment has elapsed. For intermediate treatment lengths, more memory (i) produces more ‘on’ cells and (ii) lengthens the residual time in the ‘on’ state. Long residual times correspond to longer lag times, which could then conflict with the total sequence length over which fitness is measured. Hence very long memory is also not useful, as the cells then take much longer to exit the ‘on’ state than the sequence lasts. Thus we see the presence of local maxima in memories (driven partly by the absolute fitness of the memoryless lineage, see SI). Under lengthy sustained treatment, all lineages are at the ‘on’ equilibrium, and memory length is inconsequential. This analysis reveals that, for a range of conditions, memory outperforms ‘classical’ memoryless switching (for an equal equilibrium value of ‘on’ cells).

## Discussion & Conclusion

Studies on phenotypic memory have typically focused on models assuming two distinct states, ‘on’ and ‘off’ [35]. We show here that this often-used assumption hinders us from understanding the rich dynamics observed in experimental and empirical studies. We have addressed this gap between observation and theory by extending the analysis of the two-type model to multiple underlying states.

Assuming two phenotypic observables (‘on’ and ‘off’) is already a simplification; proceeding with it, however, we have extended the underlying possible cellular states of the ‘on’ type. The number of ‘on’ states acts essentially as memory, since an increase in the number of compartments will delay the time required to get back to the ‘off’ state. Memory, defined as such, results in a non-exponential distribution of residence time in the ‘on’ state [24].

Thinking of memory as a multi-state process appears to explain otherwise anomalous observations. At the single-cell level, the lag is the time it takes to resume division for a single cell taken out of a tolerant population that has just exited a prolonged antibiotic treatment [17, 27]. Under artificial selection experiments, cells adapt their lag time to match treatment duration [17, 27] while displaying within-population variability. This variability increases with the mean lag time (see SI). Initially, this observation seems counterintuitive, as the best strategy would be for all cells to resume growth as soon as the treatment is over. We can view the time a cell takes to reach the ‘off’ state, where growth starts again, as its lag time. If the cell is in compartment *i* = *n*,…, 1, this time is gamma distributed with shape parameter *i* and rate parameter *ϵ*, while if the cell is in the zeroth compartment, its lag time is zero. Incidentally, we note that the gamma distribution is a good model of lag times according to experimental results [27]. As ‘on’ cells are distributed over multiple compartments, variability is produced in the form of cells having lag times following different gamma distributions. Moreover, according to our framework, changing the mean lag time of a cell population to match treatment length could be possible in a variety of ways: evolution in the memory length, in the leaching rate, in both, or competition between lineages with variable properties of the gamma distribution.

The transients of the dynamics can be understood in terms of eigenvalue analysis. Complex eigenvalues introduce oscillations in the dynamics (SI). Larger memory sizes correspond to more complex-valued eigenvalues, which are reflected in the dynamics as more oscillations (Fig. 3). The magnitude of the effect hinges on how fast cells experience the memory (leaching rate) and on the initial switch rate. For any leaching rate, however, as the memory increases, the oscillations (captured by the peak-to-peak amplitude) increase, but only up to a limit (Fig. 3 b). At the population level, the difference is solely in the fraction of the ‘on’ cells and not in the population size. This result is based on our assumption of both the ‘on’ and ‘off’ cells having the same birth and death rates. Forgoing this assumption would lead to a further divergence between the composition and size of lineages with different memories. Whether fitness depends on composition or size of a lineage, memory will bring unique dynamical properties that might impact survival. Ecological context will set the timing of fitness evaluation, realising Darwinian selection on lineages (Fig. 3).

Phenotypic heterogeneity, while advantageous for a lineage [36, 37], can be a nuisance when population expansion is harmful. Since Hobby and Bigger [33, 34], persisters have been a fly in the ointment for antibiotic treatment, a problem only exacerbated now by the antibiotic crisis. Similarly, in cancer, quiescent subclones are a persistent problem leading to relapse [38, 39, 40]. The subclone population dynamics also show overshooting [41, 26]. We have presented the structure of multi-state memory and its application to antibiotic tolerance. The persisters as defined in our case are a subpopulation of tolerant cells appropriately defined as “heterotolerant” [29]. While tolerance does not affect minimum inhibitory concentration, the duration of treatment will be crucial for the eradication of bacteria. Considering memory brings an additional time-scale which can be exploited for controlling pathogenic populations.

Numerous extensions of our approach are possible. For example, we have focused on heterogeneity resulting from only environmentally triggered switches [42, 29]. Alternative switching mechanisms may depend on the intrinsic properties of the population such as density or composition. Similarly, the leaching mechanism is predetermined and constant across the compartments. It is possible that such processes are under complex, joint control of the organism as well as the environment. Although detailed experimental characterisation of bidirectional switching behaviors remains rare (due to technical challenges), we expect that memory-based switching is not an exception and is triggered by excitable genetic switches [24]. While theoretical developments are essential, the applied aspect can be further exploited. Beyond antibiotics and cancer treatment, bioengineering and the understanding of the formation of microbial consortia could be informed by our approach, especially when harsh environments and time-lags are of importance, such as in niche construction and the evolution of multicellularity [25, 43].

Understanding gene regulatory networks on a developmental landscape, à la Waddington [44], poses exponentially complex computational challenges (e.g. the explosion of possible attractors when considering multiple switches [45] and multiple phenotypes). We show the existence of multiple local maxima for memory, which may change depending on the definition of fitness (population size or composition). This further forces us to rethink our concepts about possible phenotypic states and how they are determined by a plethora of molecular constructs (see SI for an extended discussion). Work on how epigenetic memory functions over generations would focus on understanding how the phenotypic clock (molecules, appropriate histone modifications) is inherited across generations and on its rate of degradation.

Phenotypic heterogeneity forces us to reassess the genotype-phenotype map in a fundamental manner. Choosing the appropriate phenotypic response to complex and varied environments is possible via a multitude of processes such as environmental sensing, epigenetic triggers and controlled molecular concentrations. Such processes that interpret the genetics to a large but finite phenotypic space to survive in a possibly infinite environmental space are extremely relevant for natural selection. Theories as described herein coupled with experiments exploring a number of environments will help us elucidate the variety of possible interpreting mechanisms bridging the genotype-phenotype divide.

## Supplementary material

### Matrix model and solution

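The system in Eq. 2 of the main text can be written in matrix form as

$$
\frac{d\mathbf{x}(t)}{dt} = A\,\mathbf{x}(t), \tag{SI.1}
$$

where

$$
A = \begin{pmatrix}
b_0 - d_0 - \mu & 0 & 0 & \cdots & \epsilon \\
\mu & b_n - d_n - \epsilon & 0 & \cdots & 0 \\
0 & \epsilon & b_{n-1} - d_{n-1} - \epsilon & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \epsilon & b_1 - d_1 - \epsilon
\end{pmatrix} \tag{SI.2}
$$

and $\mathbf{x}(t) = (x_0, x_n, \dots, x_1)^T$; here the superscript $T$ indicates vector transposition. The solution to Eq. SI.1 is $\mathbf{x}(t) = e^{tA}\mathbf{x}_0$, where $\mathbf{x}_0$ is the initial population vector. The matrix $A$ is essentially non-negative (i.e. off-diagonal entries are non-negative) and irreducible (i.e. any compartment can be reached from any other compartment). This ensures that $A$ has a real dominant eigenvalue $\lambda$ such that all other eigenvalues have smaller real part. As a consequence, asymptotically ($t \to \infty$) the cell population grows exponentially with rate $\lambda$ and the distribution of cells in the compartments approaches the right eigenvector $\mathbf{u}$ corresponding to $\lambda$, when this eigenvector is scaled so that its components add up to unity.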

#### Symmetric growth dynamics

Under complete symmetry in the growth dynamics for all compartments, i.e. $b_i = b$ and $d_i = d$, we have that $\lambda = b - d$, i.e. the growth rate equals the birth rate minus the death rate. Solving the eigenvector equation $A\mathbf{u} = \lambda\mathbf{u}$ for the symmetric case, the equilibrium fraction of cells in any ‘on’ compartment ($i = n, n-1, \dots, 1$) is equal to $\mu/(\epsilon + n\mu)$. Thus, the equilibrium fraction of cells in the ‘off’ state, i.e. in the zeroth compartment, is equal to $\epsilon/(\epsilon + n\mu)$. This expression shows that a change in the number of ‘on’ compartments and a change in the ratio of the switching rate to the leaching rate have the same effect on this equilibrium: if $n$ or the $\mu$-to-$\epsilon$ ratio increases (decreases), the equilibrium fraction of cells in the ‘on’ state increases (decreases). However, the equivalence breaks down when one looks at transient dynamics (Fig. SI.1). To understand why, let $\lambda_1, \lambda_2, \dots, \lambda_{n+1}$ be the eigenvalues of $A$ in Eq. SI.2 in order of decreasing real part and $\mathbf{u}_1, \dots, \mathbf{u}_{n+1}$ be the corresponding eigenvectors, which are assumed linearly independent. Then, the solution to Eq. SI.1 can be written as

$$
\mathbf{x}(t) = \sum_{j=1}^{n+1} a_j\, e^{t\lambda_j}\, \mathbf{u}_j,
$$

where $a_j$ are fixed scalars corresponding to the coordinates of $\mathbf{x}_0$ in the basis given by the eigenvectors of $A$. This solution highlights how subdominant eigenvalues may influence dynamics for small $t$, i.e. before the asymptotic phase when these get dominated by $\lambda_1$. The evolution of $\mathbf{x}(t)$ relative to the influence of the dominant eigenvalue over time is given by

$$
\frac{\mathbf{x}(t)}{e^{t\lambda_1}} = a_1 \mathbf{u}_1 + \sum_{j=2}^{n+1} a_j\, \frac{e^{t\lambda_j}}{e^{t\lambda_1}}\, \mathbf{u}_j .
$$

Since $\lambda_1$ dominates the other eigenvalues, we have that $|e^{\lambda_j}/e^{\lambda_1}| < 1$ for $j > 1$ and we get that $\mathbf{x}(t)/e^{t\lambda_1} \to a_1\mathbf{u}_1$ as $t \to \infty$. But for small $t$, the quantities $a_j\,(e^{t\lambda_j}/e^{t\lambda_1})\,\mathbf{u}_j$ may not be negligible. Typically, the closer subdominant eigenvalues are in their real part to the dominant eigenvalue, the larger and more protracted the transients. Complex eigenvalues also introduce oscillations. In our case, as Fig. SI.1 shows, the more ‘on’ compartments are present, the more complex eigenvalues that are closer in real part to the dominant one are found in the spectrum of $A$. Dynamically, this produces greater overshooting and slower convergence. This is not observed when the $\mu$-to-$\epsilon$ ratio increases yet a single compartment is present. For $n = 1$, the only subdominant eigenvalue is $\lambda - \mu - \epsilon$, which is real and gets closer to the dominant eigenvalue $\lambda$ as the $\mu$-to-$\epsilon$ ratio increases. But while this increase leads to an increase in the equilibrium fraction of ‘on’ cells, see formula above, no overshooting ensues, as Fig. 4 in the main text shows.
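The appearance of complex subdominant eigenvalues can be made explicit in one special case that is an assumption of this sketch, not a result stated in the paper: symmetric growth with equal switching and leaching rates ($\mu = \epsilon$). The compartments then form a uniform cycle, and the eigenvalues are $(b - d) + \epsilon(\omega^k - 1)$, where $\omega$ is a primitive $(n+1)$-th root of unity. The code below builds these eigenpairs (hypothetical helper names) so that they can be checked against the action of the matrix.

```python
import cmath


def cycle_eigenpairs(n, rate, growth=0.0):
    """Eigenpairs of the symmetric model when mu == eps == rate (assumed).

    Compartments are indexed by their position q along the cycle
    off -> n -> n-1 -> ... -> 1 -> off, so q = 0 is the 'off' state.
    """
    size = n + 1
    pairs = []
    for k in range(size):
        omega = cmath.exp(2j * cmath.pi * k / size)
        eigenvalue = growth + rate * (omega - 1)
        eigenvector = [omega ** (-q) for q in range(size)]
        pairs.append((eigenvalue, eigenvector))
    return pairs


def apply_model(v, rate, growth=0.0):
    """Apply the matrix A, written in cycle ordering, to a vector v.

    Each position loses cells at `rate`, gains them from the previous
    position at `rate`, and grows at the common net rate `growth`.
    """
    return [(growth - rate) * v[q] + rate * v[q - 1] for q in range(len(v))]


for n in (1, 4):
    eigenvalues = [lam for lam, _ in cycle_eigenpairs(n, rate=1.0)]
    n_complex = sum(abs(lam.imag) > 1e-12 for lam in eigenvalues)
    print(f"n={n}: {n_complex} complex eigenvalues out of {n + 1}")
```

For $n = 1$ the only subdominant eigenvalue is real and equals $-2\epsilon = -(\mu + \epsilon)$ relative to the dominant one, in agreement with the expression above; complex pairs appear as soon as $n \geq 2$.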

#### Environmental Variation

Populations of cells can respond to stressful environmental conditions in a variety of ways. We focus on sequences of environments composed of multiple seasons as a source of environmental stress. Each season is either conducive or harmful to the population. As a response, a lineage of cells is bestowed with different growth rates for the ‘off’ and ‘on’ compartments. The ‘off’ cells thus correspond to the growing fraction, typically the observable quantity of a lineage. The ‘on’ cells do not grow but flow through the compartments. We look at ‘triggered’ persistence [29], i.e. the cells switch from ‘off’ to ‘on’ only when the season is harmful.

### 4.1 Sequence diversity

Antibiotic treatment schedules are apt examples of controlled fluctuating environments. Treatment regimes take various forms [17, 46]. A good understanding of the effect of sequence, and even of the sequence memory of cells, can help design effective treatment regimes aimed at minimising collateral resistance to multiple drugs.

In the main text we focused on a large number of treatment sequences (84 in total). The sequences ranged from no treatment to all treatment, where each season lasted one time unit. The growth rates of lineages of cells with different memories were calculated as $g_m(i) = \log(N_{t_{max}}/N_0)/t_{max}$ for each memory size $i$. We compare this Malthusian growth rate to that of a lineage which gives the same eventual ‘on’ frequency but for $n = 1$, i.e. a memoryless process (which will have a different leaching rate $\epsilon$). Thus $r = g_m(i) - g_{mless}(n = 1, \epsilon)/t_{max}$ is the difference in the Malthusian growth rates of the two lineages if under direct competition (http://myxo.css.msu.edu/ecoli/srvsrf.html).

Using our minimal setup we can explore the effects of relaxing treatment for different lengths of time. However, it would be useful to reduce the number of sequences to perform a thorough analysis. In the main text the fluctuating sequences last for 165 time-steps, leading to 84 treatment regimes. We let each season last for 15 time-steps instead of 1, which yields only 7 different sequences of 11 seasons each. The analysis of these sequences is shown in Fig. SI.3 and Fig. SI.4.
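The growth-rate calculation can be written as a short helper. This is a sketch with hypothetical function names and made-up population sizes; it reads the relative growth rate as the plain difference between the two Malthusian growth rates, which is one interpretation of the definition of $r$ above.

```python
import math


def malthusian_growth_rate(n_final, n_initial, t_max):
    """g = log(N_tmax / N_0) / t_max."""
    return math.log(n_final / n_initial) / t_max


def relative_growth_rate(n_final_memory, n_final_memoryless, n_initial, t_max):
    """Difference between the growth rates of the memory lineage and the
    memoryless lineage (assumed reading of r)."""
    return (malthusian_growth_rate(n_final_memory, n_initial, t_max)
            - malthusian_growth_rate(n_final_memoryless, n_initial, t_max))


# Made-up final sizes after a sequence of 165 time-steps, for illustration only.
r = relative_growth_rate(n_final_memory=5.0e4, n_final_memoryless=2.0e4,
                         n_initial=100.0, t_max=165.0)
print(f"relative growth rate r = {r:.5f}")
```

A positive value means the lineage with memory ends the sequence larger than its memoryless counterpart.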

### 4.2 Condition dependent switching

In the main text we have assumed ‘triggered persistence’, i.e. when not under treatment the switch from ‘off’ to ‘on’ does not operate. Relaxing this assumption, in Fig. SI.5 we show the result of the dynamics on the spectrum from constitutive switching ($\mu_{\text{treatment}} = \mu_{\text{no treatment}} = 0.2$) to strictly antibiotic-triggered switching ($\mu_{\text{treatment}} = 0.2$, $\mu_{\text{no treatment}} = 0.0$). If a lineage switches to the ‘on’ state constitutively, then the eventual population size is much smaller, as the lineage loses cells to the non-growing state and growth can only proceed after they have exited the ‘on’ state by leaching through all the memory states. The exact shape of the plotted clines depends further on the specific sequence chosen to be explored.

### 4.3 Long term growth dynamics for cyclic environments

Using the analysis in [11], we also consider long term dynamics when the population goes through the same seasonal sequence repeatedly. We form two matrices $A_R$ and $A_T$ with the same structure as in Eq. SI.2. Matrix $A_R$ is parametrized to reflect the growth and phenotypic dynamics of the population under a relaxed season, while matrix $A_T$ is parametrized to reflect the growth and phenotypic dynamics of the population under a treatment season. Given an initial population composition $\mathbf{x}(0)$, the final population composition $\mathbf{x}(n\tau)$ after a sequence comprising $n$ seasons each lasting $\tau$ units of time is obtained as

$$
\mathbf{x}(n\tau) = e^{\tau A_{i_n}} \cdots e^{\tau A_{i_2}}\, e^{\tau A_{i_1}}\, \mathbf{x}(0) = B\,\mathbf{x}(0),
$$

where $i_j \in \{R, T\}$, $j = 1, \dots, n$. When the population cycles $m$ times through the same sequence, the population composition at the end of the $m$-th cycle is $\mathbf{x}(n\tau m) = B^m \mathbf{x}(0)$. As $m$ gets large, the population size grows by a factor equal to the dominant eigenvalue of $B$ after having gone through a sequence. Using this approach, it can be shown (Fig. SI.6) that short term population growth for different memory sizes, as explored in Fig. 5 of the main text, displays behavior qualitatively similar to long term growth.

However, $B$, as a product of matrices, has eigenvalues that are invariant to how matrices are arranged in this product. Therefore, this approach does not extend to our analysis in the next section where the effects of season permutations are explored.
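The long-term growth factor per cycle can be approximated without forming $B$ explicitly, by pushing a population vector through the season sequence repeatedly (a power iteration). The sketch below rests on several assumptions that are not from the paper: forward-Euler steps stand in for the matrix exponentials, ‘on’ cells neither divide nor die, and the season parameters (`mu`, `growth_off`) are arbitrary illustrative values.

```python
def season_step(x, mu, eps, growth_off, tau, dt=0.01):
    """Advance the population through one season of length tau.

    x[0] is 'off' (net growth rate growth_off); x[1..n] are 'on'
    compartments that only leach towards 'off' at rate eps.
    """
    n = len(x) - 1
    for _ in range(int(round(tau / dt))):
        dx = [0.0] * (n + 1)
        dx[0] = (growth_off - mu) * x[0] + eps * x[1]
        for i in range(1, n):
            dx[i] = -eps * x[i] + eps * x[i + 1]
        dx[n] = -eps * x[n] + mu * x[0]
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def growth_factor_per_cycle(sequence, seasons, n, eps, tau, cycles=50):
    """Apply the same season sequence `cycles` times and return the factor
    by which the population size changed over the last cycle."""
    x = [1.0] + [0.0] * n
    factor = 1.0
    for _ in range(cycles):
        before = sum(x)
        for label in sequence:
            mu, growth_off = seasons[label]
            x = season_step(x, mu, eps, growth_off, tau)
        total = sum(x)
        factor = total / before
        x = [xi / total for xi in x]   # renormalise to avoid over/underflow
    return factor


# Hypothetical seasons: relaxed (no switching, 'off' cells grow) and
# treatment (triggered switching, 'off' cells die).
seasons = {"R": (0.0, 0.5), "T": (0.2, -1.0)}
for n in (1, 3, 6):
    f = growth_factor_per_cycle("RTTR", seasons, n, eps=0.5, tau=1.0)
    print(f"n={n}: growth factor per cycle = {f:.4f}")
```

After enough cycles the returned factor approaches the dominant eigenvalue of the (Euler-approximated) product matrix, so lineages with different memory sizes can be compared on their long-term growth.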

### 4.4 Lag time distributions

Upon sufficiently long exposure to sustained treatment, the population reaches a stable compartmental distribution given by the right dominant eigenvector $\mathbf{w}_T = (w_0, \dots, w_n)^T$ of the $A_T$ matrix, scaled so that the eigenvector components add up to 1. This enables us to compute the lag time distribution when cells in the ‘on’ state have no growth dynamics, i.e. $b_i - d_i = 0 - 0$ for $i = 1, \dots, n$, as assumed above. A randomly sampled cell from the exposed population is in compartment $i$ with probability $w_i$. When the cell is in the zeroth compartment, i.e. the ‘off’ state, its lag time is 0, as it is already dividing. When the cell is in any ‘on’ compartment, i.e. $i = 1, \dots, n$, its lag time is the time it takes for the cell to reach the ‘off’ state, i.e. the zeroth compartment, where division occurs. This time $\tau$ is gamma distributed with shape parameter $i$ and rate parameter $\epsilon$, so that a cell from compartment $i$ has lag time $\tau$ with probability $\mathrm{Gamma}(i, \epsilon, \tau)$. The probability that a sampled cell from the treated population has lag time $\tau > 0$ is then given by

$$
P(\tau) = \sum_{i=1}^{n} w_i\, \mathrm{Gamma}(i, \epsilon, \tau),
$$

with the remaining probability $w_0$ concentrated at $\tau = 0$.
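The lag-time distribution described above is a mixture of gamma densities weighted by the compartmental distribution. The sketch below evaluates that mixture for an illustrative, made-up weight vector `w` (not computed from $A_T$); the function names are hypothetical.

```python
import math


def gamma_pdf(shape, rate, tau):
    """Gamma density for an integer shape parameter (Erlang distribution)."""
    return (rate ** shape * tau ** (shape - 1) * math.exp(-rate * tau)
            / math.factorial(shape - 1))


def lag_time_density(w, eps, tau):
    """Density of positive lag times: sum over 'on' compartments of
    w[i] * Gamma(i, eps, tau).

    w[0] is the 'off' fraction; it corresponds to a lag time of exactly
    zero and is therefore not part of the density.
    """
    return sum(w[i] * gamma_pdf(i, eps, tau) for i in range(1, len(w)))


def mean_lag_time(w, eps):
    """Mean lag: a cell in compartment i needs i / eps on average."""
    return sum(w[i] * i / eps for i in range(1, len(w)))


w = [0.1, 0.3, 0.3, 0.3]   # illustrative fractions for n = 3, summing to 1
eps = 1.5
for tau in (0.5, 1.0, 2.0, 4.0):
    print(f"tau={tau}: density={lag_time_density(w, eps, tau):.4f}")
print(f"mean lag time = {mean_lag_time(w, eps):.3f}")
```

Because cells sit in different compartments, the population-level lag distribution is broader than any single gamma component, which is the source of the within-population variability discussed in the main text.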

### 4.5 Permutations

Another way of designing treatment regimes is to permute a given number of treatment seasons. As a template we look at the permutations of 4 treatment seasons and 4 relaxed seasons, for a total of 70 possible sequences. We check the final size of a lineage once it has experienced all 8 seasons. We do so for cell lineages with different memory lengths. Plotting the final population size against the final ‘on’ cell frequency, we see that an increase in the number of compartments typically results in an increase in the final ‘on’ frequency (but not always, see the loops in Fig. SI.7). Also, the population size typically peaks at intermediate memory length. An intermediate memory size is then advantageous when selection operates on the surviving population at the end of the treatment sequence.
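The count of 70 sequences follows from choosing which 4 of the 8 seasons are treatment seasons. A minimal enumeration, with a hypothetical function name and ‘T’/‘R’ labels chosen for illustration, could look like this:

```python
from itertools import combinations


def season_sequences(n_treatment=4, n_relaxed=4):
    """All distinct orderings of treatment ('T') and relaxed ('R') seasons."""
    total = n_treatment + n_relaxed
    sequences = []
    for treated in combinations(range(total), n_treatment):
        treated = set(treated)
        sequences.append("".join("T" if s in treated else "R"
                                 for s in range(total)))
    return sequences


sequences = season_sequences()
print(len(sequences))   # 70, i.e. the binomial coefficient C(8, 4)
print(sequences[:3])
```

Each string can then be fed, season by season, to whatever simulation of the lineage dynamics is being used.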

#### Phenotypic determinism

Phenotypes can be determined jointly by the internal states of a cell (e.g. intracellular protein concentrations) and external effects (available metabolites). Furthermore, the number of possible internal determinants can be large. Irrespective of the possible complexity of phenotypic determinism, in our study we have assumed a system which is best depicted by Fig. SI.8 (a). A trigger (ecological, biotic or abiotic) or a stochastic process is assumed to increase the concentration of a certain phenotypic determinant that can be tracked. We have assumed only two phenotypic states, ‘on’ and ‘off’. Also, the decay landscape is gradual. As shown in Fig. SI.8 (a), it is possible that the decay process of the determinant is not smooth, which can lead to massive variation in the time spent in the two states. Fig. SI.8 (c) highlights the case when the concentration can determine the same two phenotypes in a redundant fashion. Both very low and very high concentrations generate the same phenotype ‘off’, whereas the intermediate is the ‘on’ state. Indeed, an exact distinction between phenotypes is limited by the tests available to differentiate between the phenotypes and by their precision. Multiple phenotypes, as shown in Fig. SI.8 (d), will disrupt our classical analysis when considered, but will need to be included as molecular tools get more and more precise.

## Acknowledgements

We thank Silvia De Monte and Orso Romano for helpful discussions. Funding from the Max Planck Society is gratefully acknowledged.