## Abstract

In unicellular organisms such as bacteria and in most viruses, mutations mainly occur during reproduction. Thus, genotypes with a high birth rate should have a higher mutation rate. However, standard models of asexual adaptation such as the ‘replicator-mutator equation’ often neglect this generation-time effect. In this study, we investigate the emergence of a positive dependence between the birth rate and the mutation rate in models of asexual adaptation and the consequences of this dependence. We show that it emerges naturally at the population scale, based on a large population limit of a stochastic time-continuous individual-based model with elementary assumptions. We derive a reaction-diffusion framework that describes the evolutionary trajectories and steady states in the presence of this dependence. When this model is coupled with a phenotype to fitness landscape with two optima, one for birth, the other one for survival, a new trade-off arises in the population. Compared to the standard approach with a constant mutation rate, the symmetry between birth and survival is broken. Our analytical results and numerical simulations show that the trajectories of mean phenotype, mean fitness and the stationary phenotype distribution are in sharp contrast with those displayed for the standard model. The reason for this is that the usual weak selection limit does not hold in a complex landscape with several optima associated with different values of the birth rate. Here, we obtain trajectories of adaptation where the mean phenotype of the population is initially attracted by the birth optimum, but eventually converges to the survival optimum, following a hook-shaped curve which illustrates the antagonistic effects of mutation on adaptation.

## 1. Introduction

The effect of the mutation rate on the dynamics of adaptation is well-documented, both experimentally (e.g., Giraud et al., 2001; Anderson et al., 2004) and theoretically. Regarding theoretical work, since the first studies on the accumulation of mutation load (Haldane, 1937; Kimura and Maruyama, 1966), several modelling approaches have investigated the effect of the mutation rate on various aspects of the adaptation of asexuals. This includes lethal mutagenesis theory (Bull et al., 2007; Bull and Wilke, 2008), where excessively high mutation rates may lead to extinction, evolutionary rescue (Anciaux et al., 2019) and the invasion of a sink (Lavigne et al., 2020). The evolution of the mutation rate per se is also the subject of several models (André and Godelle, 2006; Lynch, 2010).

Given a fixed mutation rate per generation, mutation rates per unit time should be higher in species with a shorter generation time. This is called the *generation-time effect*, and it has been discussed by Gillespie (1991). The within-species consequences of the generation-time effect have attracted less attention. In unicellular organisms such as bacteria, mutations occur during reproduction by binary fission (Van Harten, 1998; Trun and Trempy, 2009). Individuals with a high birth rate should therefore have a higher mutation rate, because they produce more mutant offspring per unit of time. This is also true for viruses, as mutations mostly arise during replication (Sanjuán and Domingo-Calap, 2016). The probability of mutation during replication is even greater in RNA viruses, as their polymerase lacks the proofreading activity found in the polymerase of DNA viruses (Lauring et al., 2013). Some cancer studies report dose-dependent mutation rates (Liu et al., 2015), which suggests that the mutation rate of cancer cells at the population scale can also be correlated with reproductive success, through the individual birth rate. On the other hand, most models that describe the dynamics of adaptation of asexual phenotypically structured populations assume a constant mutation rate across phenotypes (e.g., Gerrish et al., 2007; Sniegowski and Gerrish, 2010; Desai and Fisher, 2011; Alfaro and Carles, 2014; Gandon and Mirrahimi, 2017; Gil et al., 2019). Variations in the individual mutation rate per generation can be caused by genotypic variability (Sharp and Agrawal, 2012), by environmental factors (Hoffmann and Hercus, 2000) or, more generally, by 'G x E' interactions. The above-mentioned modelling approaches ignore these processes, but they do take into account some variability in reproductive success.
The main goals of the current study are to determine in which contexts the generation-time effect should be taken into account in these models, and to understand the consequences of such a birth rate - mutation rate dependence on the evolutionary trajectory of the population.

These consequences are not easy to anticipate, because the birth rate is also involved in trade-offs with other life-history traits. Such trade-offs play a crucial role in shaping evolution (Stearns, 1989). They create evolutionary compromises, for instance between dispersal and reproduction (Nathan, 2001; Smith et al., 2014; Helms and Kaspari, 2015; Xiao et al., 2015) or between the traits related to survival and those related to birth (Taylor, 1991). In this last case, we expect that the consequences of the trade-off on the dynamics of adaptation strongly depend on the existence of a positive correlation between the birth rate and the mutation rate. High mutation rates tend to promote adaptation when the population is far from equilibrium (Sniegowski et al., 2000). However, they eventually have a detrimental effect due to a higher mutation load when the population approaches a mutation-selection equilibrium (Anciaux et al., 2019). This ambivalent effect of mutation may therefore lead to complex trajectories of adaptation when the birth and mutation rates are correlated.

In the classical models describing the dynamics of adaptation of a phenotypically structured population, the breeding values at a set of *n* traits are described by a vector $\mathbf{x} \in \mathbb{R}^n$. The breeding value for a phenotypic trait is usually defined as the total additive effect of its genes on that trait (see Falconer and Mackay, 1996; Kruuk, 2004). Given the genotype, it is independent of the environmental conditions. For simplicity and consistency with other modeling studies, we will call **x** the 'phenotype' in the following, although it still represents breeding values. The effect of mutations on the phenotype distribution is described through a linear operator which does not depend on the parent phenotype **x**. This operator can be described either with a convolution product involving a mutation kernel (Champagnat et al., 2006; Gil et al., 2017) or with a Laplace operator (Kimura, 1964; Lande, 1975; Alfaro and Carles, 2014; Hamel et al., 2020). The Laplace operator corresponds to a diffusion approximation of the mutation effects. Under this approximation, the mutation operator acting on the phenotype density *f* reads $D\,\Delta f$, where *D* > 0 is a constant coefficient proportional to the mutation rate and to the mutational variance at each trait. The Malthusian fitness *m*(**x**), i.e. the Malthusian growth rate of individuals with phenotype **x**, is defined as the difference between the birth rate *b*(**x**) and the death rate *d*(**x**) of this class of individuals:

$$m(\mathbf{x}) = b(\mathbf{x}) - d(\mathbf{x}).$$

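The following generic equation then describes the combined effects of mutation and selection on the dynamics of the phenotype density $f(t,\mathbf{x})$ under a diffusive approximation of the mutation effects:

$$\partial_t f(t,\mathbf{x}) = D\,\Delta f(t,\mathbf{x}) + m(\mathbf{x})\,f(t,\mathbf{x})$$

in the absence of density-dependent competition, or

$$\partial_t f(t,\mathbf{x}) = D\,\Delta f(t,\mathbf{x}) + \big(m(\mathbf{x}) - c\,N(t)\big)\,f(t,\mathbf{x}), \qquad c > 0,$$

if density-dependent competition is taken into account. Here $N(t) = \int_{\mathbb{R}^n} f(t,\mathbf{x})\,d\mathbf{x}$ is the total population size. In both cases, the frequency $q(t,\mathbf{x}) = f(t,\mathbf{x})/N(t)$ satisfies the following equation, which we refer to as the standard model:

$$\partial_t q(t,\mathbf{x}) = D\,\Delta q(t,\mathbf{x}) + \big(m(\mathbf{x}) - \overline{m}(t)\big)\,q(t,\mathbf{x}),$$

where the mean fitness in the population is

$$\overline{m}(t) = \int_{\mathbb{R}^n} m(\mathbf{x})\,q(t,\mathbf{x})\,d\mathbf{x}.$$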

These models have produced a broad range of results in various biological contexts: concentration around specific traits (Diekmann et al., 2005; Lorz et al., 2011; Martin and Roques, 2016); explicit solutions (Alfaro and Carles, 2014; Biktashev, 2014; Alfaro and Carles, 2017); moving and/or fluctuating optimum (Figueroa Iglesias and Mirrahimi, 2019; Roques et al., 2020); anisotropic mutation effects (Hamel et al., 2020). They can also be extended to take migration events into account (Débarre et al., 2013; Lavigne et al., 2020).

With these models, the dynamics of adaptation and the equilibria only depend on the birth and death rates through their difference *m*(**x**) = *b*(**x**) – *d*(**x**). Thus, these models do not discriminate between phenotypes for which both birth and death rates are high compared to those for which they are both low, given that the difference is constant. However, as explained above, the mutation rate may be positively correlated with the birth rate which could generate an imbalance in favour of one of the two strategies: having a high birth rate vs. having a high survival rate. To acknowledge the role of phenotype-dependent birth rate and the resulting asymmetric effects of fertility and survival in a deterministic setting, a new paradigm is necessary.

In this work, we consider the case of mutations that occur during the reproduction of asexual organisms. We assume that the probability of mutation per birth event, *U*, does not depend on the phenotype of the parent. On the other hand, following classical adaptive landscape approaches (Tenaillon, 2014), the birth and death rates do depend on the phenotype. Using these basic assumptions, we consider in Section 2 standard stochastic individual-based models of adaptation with mutation and selection. We show how the standard model appears naturally as a large population limit of both a discrete-time model and a continuous-time model. This holds when the variance of mutation effects is small and when selection is weak, i.e. when the birth and death rates vary very little across the phenotype space. In this work, however, we are interested in a particular setting where this assumption is not satisfied. In this case, we use results from Fournier and Méléard (2004) for the continuous-time model. We argue that, when the mutation variance is small, a more accurate approximation of the mutation operator is given by $D\,\Delta\big(b(\mathbf{x})\,q\big)$. This leads to a new equation of the form

$$\partial_t q(t,\mathbf{x}) = D\,\Delta(b\,q)(t,\mathbf{x}) + \big(m(\mathbf{x}) - \overline{m}(t)\big)\,q(t,\mathbf{x}).$$

Here, the mutation operator *D*Δ(*b*(**x**)*q*)(*t*, **x**) depends on the phenotype **x** through the birth rate *b*(**x**). This reflects the fact that new mutants appear at a higher rate when the birth rate increases. Models comparable to this equation, but with a discrete phenotype space, appear in the literature (Hofbauer, 1985; Baake and Gabriel, 2000). In some cases they lead to results quite similar to those of the standard model. However, as shown in this contribution, this is not always the case.

In Section 3, we use the model with birth-dependent mutation rate to study the evolution of the phenotype distribution when the population is subjected to a trade-off between a birth optimum and a survival optimum. We highlight the main differences with the standard approach. More specifically, we study the evolution of the phenotype distribution in the presence of two optima. At the birth (or reproduction) optimum, *b* and *d* are both large. At the survival optimum, *b* and *d* are both small. The two optima are chosen such that the difference *b* – *d* is symmetrical.

Based on analytical results and numerical simulations, we compare the trajectories of adaptation and the equilibrium phenotype distributions between these two approaches and we check their consistency with the underlying individual-based models. We discuss these results in Section 4.

## 2. Emergence of a birth-dependent mutation rate in an individual-based setting

In this section, we present how the standard model and the new model with birth-dependent mutation rate are obtained from large population limits of stochastic individual-based models. We first state a convergence result due to Fournier and Méléard (2004). It provides the convergence of the phenotype distribution of the population to the solution of an integro-differential equation when the size of the population tends to infinity. We then show that, when the variance of the mutation effects is small, this equation yields the new model with birth-dependent mutation rate. This shows how a dependence arises between the birth rate and the rate at which new mutants appear in the population, even though the probability of mutation per birth event *U* does not depend on the phenotype of the parent. We then treat the case of weak selection. Using results in Champagnat et al. (2008), we show how the standard model is obtained as a large population limit of the phenotype distribution with a specific time scaling. We also state an analogous result for the discrete-time model. In the same regime of weak selection and small mutation effects, we show its convergence to the solution of the standard model as the population size tends to infinity, on the same timescale as for the continuous-time model.

In this individual-based setting, we consider a finite population of size $N_t$, where each individual carries a phenotype in a bounded open set $\Omega \subset \mathbb{R}^n$. If the individuals at time *t* have phenotypes $\{\mathbf{x}_1, \dots, \mathbf{x}_{N_t}\}$, we record the state of the population through the empirical measure

$$\nu_t^K = \frac{1}{K} \sum_{i=1}^{N_t} \delta_{\mathbf{x}_i},$$

where $\delta_{\mathbf{x}}$ is the Dirac measure at the point $\mathbf{x} \in \Omega$. Note that the number of individuals $N_t$ in this stochastic individual-based setting does not correspond to the quantity $N(t)$ defined in the introduction. The two quantities are related via the scaling parameter $K > 0$: $N(t) = \lim_{K\to\infty} N_t/K$. If time is rescaled, the relation becomes $N(t) = \lim_{K\to\infty} N_{t/\varepsilon_K}/K$ (see the definition of $\varepsilon_K$ below).

Let $M_F(\Omega)$ denote the space of finite measures on Ω, endowed with the topology of weak convergence. For any $\nu \in M_F(\Omega)$ and any measurable and bounded function $\varphi: \Omega \to \mathbb{R}$, we shall write

$$\langle \nu, \varphi \rangle = \int_\Omega \varphi(\mathbf{x})\,\nu(d\mathbf{x}).$$

### 2.1. Derivation of the model with birth-dependent mutation rate

We first consider a continuous-time stochastic individual-based model, in which individuals die and reproduce at random times that depend on their phenotype and on the current population size. We let $b: \Omega \to \mathbb{R}_+$ and $d: \Omega \to \mathbb{R}_+$ be two bounded and measurable functions. We assume that an individual with phenotype $\mathbf{x} \in \Omega$ reproduces at rate $b(\mathbf{x})$ and dies at rate $d(\mathbf{x}) + c_K N_t$, for some $c_K > 0$. The parameter $c_K$ measures the intensity of competition between the individuals in the population, and it prevents the population size from growing indefinitely. With probability $1 - U$, each newborn individual carries the phenotype of its parent. With probability *U*, it instead carries a phenotype $\mathbf{y}$ chosen at random from some distribution $\rho(\mathbf{x}, \mathbf{y})\,d\mathbf{y}$, where $\mathbf{x}$ is the phenotype of its parent.
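The individual-based dynamics can be made concrete with a minimal Gillespie-type sketch in Python for *n* = 1. It assumes that Ω is an open interval. As a hypothetical choice of ρ, the mutation step is Gaussian with standard deviation `sigma` and is redrawn until the offspring lands in Ω. The competition coefficient is taken as $c_K = c/K$, as in the convergence result stated next. The function and parameter names are illustrative only.

```python
import numpy as np

def simulate_ibm(b, d, x0, n0, K, c, U, sigma, t_max, omega=(-1.0, 1.0), seed=0):
    """Gillespie-type sketch of the continuous-time birth-death-mutation model, n = 1.

    b, d: vectorised birth and death rate functions on Omega = omega (open interval).
    Competition: each individual dies at the extra rate c_K * N_t, with c_K = c / K.
    Mutation kernel (hypothetical choice): Gaussian step of s.d. sigma around the
    parent phenotype, redrawn until the offspring falls inside Omega.
    """
    rng = np.random.default_rng(seed)
    c_K = c / K
    x = np.full(n0, float(x0))                   # phenotypes of the current population
    t = 0.0
    while x.size > 0:
        birth = b(x)                             # birth rate of each individual
        death = d(x) + c_K * x.size              # death rate d(x) + c_K N_t
        total = birth.sum() + death.sum()
        dt = rng.exponential(1.0 / total)        # waiting time until the next event
        if t + dt > t_max:
            break
        t += dt
        # pick one event (birth or death of one individual) proportionally to its rate
        k = rng.choice(2 * x.size, p=np.concatenate([birth, death]) / total)
        if k < x.size:                           # birth by individual k
            child = x[k]
            if rng.random() < U:                 # mutation with probability U
                while True:
                    y = child + sigma * rng.normal()
                    if omega[0] < y < omega[1]:
                        break
                child = y
            x = np.append(x, child)
        else:                                    # death of individual k - N_t
            x = np.delete(x, k - x.size)
    return t, x
```

Dividing the returned population by *K* gives the rescaled empirical measure. The loop structure (an exponential clock, then one event chosen proportionally to its rate) is the standard exact simulation of such a jump process.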

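We can now describe the limiting behaviour of this model when the parameter *K* tends to infinity. The following convergence result can be found, for example, in Fournier and Méléard (2004, Theorem 5.3) and in Champagnat et al. (2008, Theorem 4.2). Let $D([0,T], M_F(\Omega))$ denote the Skorokhod space of càdlàg functions taking values in $M_F(\Omega)$.

**Proposition 2.1.** *Assume that $\nu_0^K$ converges weakly to a deterministic $f_0 \in M_F(\Omega)$ as $K \to +\infty$, and that $c_K = c/K$ for some $c > 0$. Also assume that $\sup_K \mathbb{E}\big[\langle \nu_0^K, 1 \rangle^3\big] < \infty$. Then, for any fixed $T > 0$, as $K \to +\infty$,*

$$\big(\nu_t^K,\ t \in [0,T]\big) \longrightarrow \big(f_t,\ t \in [0,T]\big)$$

*in distribution in $D([0,T], M_F(\Omega))$. Here $(f_t,\ t \in [0,T])$ is such that, for any bounded and measurable $\varphi: \Omega \to \mathbb{R}$,*

$$\langle f_t, \varphi \rangle = \langle f_0, \varphi \rangle + \int_0^t \Big( \big\langle f_s, \big(b - d - c\,\langle f_s, 1 \rangle\big)\,\varphi \big\rangle + U\,\big\langle \mathcal{M}(b f_s), \varphi \big\rangle \Big)\,ds, \tag{5}$$

*where, for any $\nu \in M_F(\Omega)$, $b\nu$ denotes the measure $b(\mathbf{x})\,\nu(d\mathbf{x})$ and $\mathcal{M}(\nu)$ is the signed measure defined by*

$$\langle \mathcal{M}(\nu), \varphi \rangle = \int_\Omega \int_\Omega \big(\varphi(\mathbf{y}) - \varphi(\mathbf{x})\big)\,\rho(\mathbf{x}, \mathbf{y})\,d\mathbf{y}\,\nu(d\mathbf{x}).$$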

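We note that, if $f_0$ admits a density with respect to the Lebesgue measure, then $f_t$ admits a density, denoted by $f(t, \cdot)$, for all $t \ge 0$. In this case, we set $m(\mathbf{x}) = b(\mathbf{x}) - d(\mathbf{x})$ and $q(t,\mathbf{x}) = f(t,\mathbf{x})/\langle f_t, 1 \rangle$. The phenotype distribution *q* then solves the following equation:

$$\partial_t q(t,\mathbf{x}) = U\,\mathcal{M}(bq)(t,\mathbf{x}) + \big(m(\mathbf{x}) - \overline{m}(t)\big)\,q(t,\mathbf{x}), \tag{6}$$

where, for a density *g*, $\mathcal{M}(g)(\mathbf{x}) = \int_\Omega \rho(\mathbf{y},\mathbf{x})\,g(\mathbf{y})\,d\mathbf{y} - g(\mathbf{x})$, and $\overline{m}(t) = \int_\Omega m(\mathbf{x})\,q(t,\mathbf{x})\,d\mathbf{x}$.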
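The passage from the measure-valued limit to (6) can be checked formally. The computation below assumes that $f_t$ has a density $f(t,\cdot)$ and writes $\mathcal{M}(g)(\mathbf{x}) = \int_\Omega \rho(\mathbf{y},\mathbf{x})\,g(\mathbf{y})\,d\mathbf{y} - g(\mathbf{x})$ for a density *g*. It also uses the fact that each $\rho(\mathbf{y},\cdot)$ is a probability density on Ω. Under these conditions, the mutation term carries no mass, and the competition term cancels in the frequency equation:

$$
\begin{aligned}
\partial_t f &= U\,\mathcal{M}(bf) + \big(m - c\,N(t)\big)\,f, \qquad N(t) = \int_\Omega f(t,\mathbf{x})\,d\mathbf{x},\\
\int_\Omega \mathcal{M}(g)\,d\mathbf{x} &= \int_\Omega g(\mathbf{y}) \underbrace{\int_\Omega \rho(\mathbf{y},\mathbf{x})\,d\mathbf{x}}_{=\,1}\,d\mathbf{y} - \int_\Omega g\,d\mathbf{x} = 0
\;\Longrightarrow\; N'(t) = \big(\overline{m}(t) - c\,N(t)\big)\,N(t),\\
\partial_t q &= \frac{\partial_t f}{N} - \frac{f\,N'}{N^2}
= U\,\mathcal{M}(bq) + \big(m - c\,N\big)\,q - \big(\overline{m} - c\,N\big)\,q
= U\,\mathcal{M}(bq) + \big(m - \overline{m}(t)\big)\,q .
\end{aligned}
$$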

In the model (6), the coefficient *b* in $\mathcal{M}(bq)$ means that mutant individuals appear at a higher rate in regions where *b* is higher. As we shall see below, this has wide-ranging consequences on the qualitative behaviour of the phenotype distribution, which the standard model does not capture. However, the analysis of the integro-differential equation (6) is very intricate. If the variance of mutation effects is sufficiently small, we can instead study a diffusive approximation of equation (6). Assume that the effects of mutation on phenotype can be described by a mutation kernel *J*, such that *ρ*(**y**, **x**) = *J*(**x** – **y**). Namely,

$$\mathcal{M}(bq)(t,\mathbf{x}) = \int_\Omega J(\mathbf{x}-\mathbf{y})\,(bq)(t,\mathbf{y})\,d\mathbf{y} - (bq)(t,\mathbf{x}).$$

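Formally, we neglect the boundary of Ω (i.e., we integrate over $\mathbb{R}^n$) and change variables $\mathbf{y} \to \mathbf{x} - \mathbf{y}$. This gives $\mathcal{M}(bq)(t,\mathbf{x}) = \int J(\mathbf{y})\,(bq)(t,\mathbf{x}-\mathbf{y})\,d\mathbf{y} - (bq)(t,\mathbf{x})$. We then write a Taylor expansion of $(bq)(t,\mathbf{x}-\mathbf{y})$ at $\mathbf{x} \in \Omega$:

$$(bq)(t,\mathbf{x}-\mathbf{y}) = \sum_{k_1,\dots,k_n \ge 0} \frac{(-1)^{k_1+\dots+k_n}}{k_1!\cdots k_n!}\; y_1^{k_1}\cdots y_n^{k_n}\;\partial_{x_1}^{k_1}\cdots\partial_{x_n}^{k_n}(bq)(t,\mathbf{x}).$$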

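We define the central moments of the distribution *J*:

$$\omega_{k_1,\dots,k_n} = \int_{\mathbb{R}^n} y_1^{k_1}\cdots y_n^{k_n}\,J(\mathbf{y})\,d\mathbf{y}.$$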

We make a symmetry assumption on the kernel *J*, which implies that $\omega_{k_1,\dots,k_n} = 0$ if at least one of the $k_i$'s is odd. Moreover, we assume the same variance *λ* at each trait, $\omega_{0,\dots,0,k_i=2,0,\dots,0} = \lambda$. We also assume that the moments of order $k_1 + \dots + k_n \ge 4$ are of order $O(\lambda^2)$. These assumptions are satisfied by the classic isotropic Gaussian distribution of mutation effects on phenotype. For $\lambda \ll 1$, we obtain:

$$\mathcal{M}(bq)(t,\mathbf{x}) = \frac{\lambda}{2}\,\Delta(bq)(t,\mathbf{x}) + O(\lambda^2).$$

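Thus, when the variance *λ* of the (symmetric) mutation kernel *J* is small, we expect the solution to (6) to behave as the solution to:

$$\partial_t q(t,\mathbf{x}) = D\,\Delta(bq)(t,\mathbf{x}) + \big(m(\mathbf{x}) - \overline{m}(t)\big)\,q(t,\mathbf{x}),$$

where *D* = *λU*/2.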
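The diffusive approximation can be checked numerically in dimension *n* = 1. The sketch below makes two hypothetical choices. The kernel *J* is a centred Gaussian of variance λ, whose fourth moment is 3λ². The smooth test function $g(x) = e^{-x^2/2}$ stands in for $(bq)(t,\cdot)$. If the expansion above is right, the gap between $\mathcal{M}(g)$ and $(\lambda/2)\,g''$ should shrink like λ². Halving λ should then divide the error by about 4.

```python
import numpy as np

def mutation_operator(g, x, lam, n_quad=4001):
    """M(g)(x) = ∫ J(y) g(x - y) dy - g(x) for a centred Gaussian kernel J of variance lam.

    The integral is taken over R (boundary of Omega ignored) and approximated by a
    Riemann sum on [-8 sd, 8 sd], which is very accurate for Gaussian integrands.
    """
    sd = np.sqrt(lam)
    y = np.linspace(-8 * sd, 8 * sd, n_quad)
    dy = y[1] - y[0]
    J = np.exp(-y**2 / (2 * lam)) / np.sqrt(2 * np.pi * lam)
    conv = (J[None, :] * g(x[:, None] - y[None, :])).sum(axis=1) * dy
    return conv - g(x)

g = lambda x: np.exp(-x**2 / 2)                      # stands in for (b q)(t, .)
g_second = lambda x: (x**2 - 1) * np.exp(-x**2 / 2)  # its exact second derivative

x = np.linspace(-3, 3, 61)
errors = {}
for lam in (1e-2, 5e-3):
    approx = lam / 2 * g_second(x)                   # diffusive approximation (lambda/2) g''
    errors[lam] = np.max(np.abs(mutation_operator(g, x, lam) - approx))
ratio = errors[1e-2] / errors[5e-3]                  # close to 4 if the error is O(lambda^2)
```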

*We recall that the assumption here is that mutations occur during reproduction (e.g. in unicellular organisms or viruses). Suppose instead that mutations took place at a constant rate during each individual's lifetime, rather than at reproduction events. The factor b would then not appear in the mutation term of* (5)*, and we would have obtained the standard model instead of the model with birth-dependent mutation rate.*

### 2.2. Derivations of the standard model

The standard model is classically derived by letting the variance of the mutation kernel tend to zero and by rescaling time to compensate for the fact that mutations have very small effects. In order to obtain the convergence of the process in this regime, one also has to assume that the intensity of selection (measured by *b* – *d*) is of the same order of magnitude as the variance of the mutation kernel. This corresponds to a weak selection regime, where *b* and *d* are almost constant on Ω.

#### Large population limit of the continuous-time model in rescaled timescale

We consider the same stochastic individual-based model as above, but we allow *b*, *d* and *ρ* to depend on *K*. We thus let $b_K(\mathbf{x})$ denote the birth rate of individuals with phenotype **x**, $d_K(\mathbf{x})$ their death rate, and $\rho_K$ the mutation kernel. We then make the following assumptions.

##### Assumption (SE) (frequent mutations with small effects)

Let $\varepsilon_K = K^{-\eta}$ for some $0 < \eta < 1$. Assume that $\rho_K$ is a symmetric kernel satisfying the following conditions for all $1 \le i \le n$ and all $\mathbf{x} \in \Omega$. Its second moment along trait *i* is of order $\lambda\,\varepsilon_K$, and its moments of order $2 + \delta$ are controlled accordingly, for some $\lambda > 0$ and $\delta \in\, ]0, 2]$.

This assumption is what justifies the so-called diffusive approximation, where the effect of mutations on the phenotype density is modelled by a Laplacian in continuous-time.

##### Assumption (WS) (weak selection)

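Assume that

$$b_K(\mathbf{x}) = 1 + \varepsilon_K\,b(\mathbf{x}), \qquad d_K(\mathbf{x}) = 1 + \varepsilon_K\,d(\mathbf{x}), \qquad c_K = \frac{\varepsilon_K\,c}{K},$$

for some bounded functions $b, d: \Omega \to \mathbb{R}$ and some positive *c*.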
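The scaling of $D$ can be understood through a heuristic rate count. This is only a formal consistency check, not part of the proof. It assumes that each trait of a mutation drawn from $\rho_K$ has variance $\lambda\,\varepsilon_K$, and that $b_K(\mathbf{x}) = 1 + O(\varepsilon_K)$, as under weak selection. On the rescaled timescale $t/\varepsilon_K$, the mutation term and the net growth term then both stay of order one:

$$
\underbrace{\frac{1}{\varepsilon_K}}_{\text{time rescaling}}
\times \underbrace{U\,b_K(\mathbf{x})}_{=\,U + O(\varepsilon_K)}
\times \underbrace{\frac{\lambda\,\varepsilon_K}{2}\,\Delta}_{\text{small mutation effects}}
= \frac{\lambda U}{2}\,\Delta + O(\varepsilon_K)
\;\xrightarrow[K\to\infty]{}\; D\,\Delta,
\qquad
\frac{b_K(\mathbf{x}) - d_K(\mathbf{x})}{\varepsilon_K} = O(1).
$$

In this count, $b_K$ only enters the limiting mutation term through its leading constant. This is why the birth rate drops out of the mutation term in the weak selection limit.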

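The following result then corresponds to Theorem 4.3 in Champagnat et al. (2008). Recall that Ω is assumed to be a bounded open set, and further assume that it has a smooth boundary ∂Ω. Let $\mathcal{C}^2_{\mathrm{N}}(\overline{\Omega})$ be the set of twice continuously differentiable functions $\varphi: \overline{\Omega} \to \mathbb{R}$ such that

$$\nabla\varphi(\mathbf{x}) \cdot \mathbf{n}(\mathbf{x}) = 0 \quad \text{for all } \mathbf{x} \in \partial\Omega,$$

where $\mathbf{n}(\mathbf{x})$ is the outward unit normal to ∂Ω.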

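**Proposition 2.3.** *Let Assumptions (SE) and (WS) be satisfied. Also assume that $\nu_0^K$ converges weakly to a deterministic $f_0 \in M_F(\Omega)$ as $K \to \infty$, and that $\sup_K \mathbb{E}\big[\langle \nu_0^K, 1 \rangle^3\big] < \infty$. Then, for any fixed $T > 0$, as $K \to +\infty$,*

$$\big(\nu^K_{t/\varepsilon_K},\ t \in [0,T]\big) \longrightarrow \big(f_t,\ t \in [0,T]\big)$$

*in distribution in $D([0,T], M_F(\Omega))$. Here $(f_t,\ t \in [0,T])$ is such that, for any $\varphi \in \mathcal{C}^2_{\mathrm{N}}(\overline{\Omega})$,*

$$\langle f_t, \varphi \rangle = \langle f_0, \varphi \rangle + \int_0^t \Big\langle f_s,\ D\,\Delta\varphi + \big(m - c\,\langle f_s, 1 \rangle\big)\,\varphi \Big\rangle\,ds,$$

*with $m = b - d$ and $D = \lambda U/2$.*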

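For all $t > 0$, if $f_0$ admits a density with respect to the Lebesgue measure, then $f_t$ admits a density $f(t, \cdot) \in L^1(\Omega)$. The phenotype distribution $q(t,\mathbf{x}) = f(t,\mathbf{x})/\langle f_t, 1 \rangle$ then solves the standard model with Neumann boundary conditions:

$$\partial_t q(t,\mathbf{x}) = D\,\Delta q(t,\mathbf{x}) + \big(m(\mathbf{x}) - \overline{m}(t)\big)\,q(t,\mathbf{x}) \ \text{ in } \Omega, \qquad \nabla q \cdot \mathbf{n} = 0 \ \text{ on } \partial\Omega.$$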

As we can see, taking this limit removes the factor *b* from the mutation term. This comes from Assumption (WS), which states that $b_K(\mathbf{x}) = 1 + O(\varepsilon_K)$. As a result, this equation does not distinguish the birth optimum from the survival optimum (see Section 3).

#### Large population limit of an individual-based model with non-overlapping generations

We now consider a model where generations are non-overlapping, meaning that, between two generations (denoted *t* and *t* + 1), all the individuals alive at time *t* first produce a random number of offspring and then die. The population at time *t* + 1 is thus only comprised of the offspring of the individuals alive at time *t*.

Let $w_K: \Omega \to \mathbb{R}_+$ be a measurable and bounded function. Assume that an individual with phenotype $\mathbf{x} \in \Omega$ produces a random number of offspring which follows a Poisson distribution with parameter $w_K(\mathbf{x})$. In order to include competition, we assume that each of these offspring survives with probability $e^{-c_K N_t}$, for some $c_K > 0$, where $N_t$ is the number of individuals in generation *t*. With probability $1 - U$, each newborn individual carries the phenotype of its parent. With probability *U*, it instead carries a phenotype $\mathbf{y}$ chosen at random from some distribution $\rho_K(\mathbf{x}, \mathbf{y})\,d\mathbf{y}$, where $\mathbf{x}$ is the phenotype of its parent.
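One generation of this model can be sketched in Python for *n* = 1. The sketch assumes that Ω is an open interval and, as a hypothetical kernel, uses Gaussian mutation steps redrawn until they land in Ω. It relies on a standard property of Poisson variables: independently thinning Poisson($w_K(\mathbf{x})$) offspring with survival probability $e^{-c_K N_t}$ gives a Poisson($w_K(\mathbf{x})\,e^{-c_K N_t}$) number of surviving offspring per parent. The function name `next_generation` and its parameters are illustrative.

```python
import numpy as np

def next_generation(x, w, c_K, U, sigma, rng, omega=(-1.0, 1.0)):
    """One step t -> t + 1 of the non-overlapping-generations model, n = 1.

    x: phenotypes of generation t; w: vectorised Darwinian fitness w_K.
    Mutation kernel (hypothetical): Gaussian step of s.d. sigma, redrawn until inside Omega.
    """
    n_off = rng.poisson(w(x))                            # Poisson(w_K(x)) offspring per parent
    children = np.repeat(x, n_off)                       # offspring first copy their parent
    keep = rng.random(children.size) < np.exp(-c_K * x.size)
    children = children[keep]                            # each survives with prob. exp(-c_K N_t)
    for i in np.flatnonzero(rng.random(children.size) < U):   # each newborn mutates w.p. U
        while True:
            y = children[i] + sigma * rng.normal()
            if omega[0] < y < omega[1]:
                children[i] = y
                break
    return children
```

With a constant fitness $w$, the expected population size follows the Ricker map $N \mapsto N\,w\,e^{-c_K N}$. This map has its positive equilibrium at $N = \ln(w)/c_K$.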

We now make several assumptions in order to obtain an approximation of the process as the population size tends to infinity. For the limiting process to be continuous in time, we need to assume that the change in the composition of the population from one generation to the next is very small, and then rescale time by the appropriate factor. This ties our hands somewhat, and we need to assume that *w _{K}* is very close to one everywhere in Ω. More precisely, we make the following assumption.

##### Assumption (WS’)

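Let $\varepsilon_K = K^{-\eta}$ for some $0 < \eta < 1$, and assume that

$$w_K(\mathbf{x}) = 1 + \varepsilon_K\,m(\mathbf{x}), \qquad c_K = \frac{\varepsilon_K\,c}{K},$$

for some bounded function $m: \Omega \to \mathbb{R}$ and some positive *c*.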

Here, $w_K(\mathbf{x})$ corresponds to the Darwinian fitness, i.e. the average number of offspring of an individual with phenotype **x**. In contrast, *m*(**x**) corresponds to the Malthusian fitness, i.e. the growth rate of the population of individuals with phenotype **x**. We further assume that $\rho_K$ satisfies Assumption (SE) above.

The large population limit of this process is then given by the following result. It is analogous to similar results in continuous time (for example in Champagnat et al. (2008)). For the sake of completeness, we give its proof in Appendix A.1.

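**Proposition 2.4.** *Assume that Assumption (WS') is satisfied, along with (SE). Also assume that $\nu_0^K$ converges weakly to a deterministic $f_0 \in M_F(\Omega)$. Then, for any fixed $T > 0$, as $K \to +\infty$,*

$$\big(\nu^K_{\lfloor t/\varepsilon_K \rfloor},\ t \in [0,T]\big) \longrightarrow \big(f_t,\ t \in [0,T]\big)$$

*in distribution in $D([0,T], M_F(\Omega))$. Here $(f_t,\ t \in [0,T])$ is such that, for any $\varphi \in \mathcal{C}^2_{\mathrm{N}}(\overline{\Omega})$,*

$$\langle f_t, \varphi \rangle = \langle f_0, \varphi \rangle + \int_0^t \Big\langle f_s,\ D\,\Delta\varphi + \big(m - c\,\langle f_s, 1 \rangle\big)\,\varphi \Big\rangle\,ds,$$

*where $D = \lambda U/2$.*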

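For all $t > 0$, if $f_0$ admits a density with respect to the Lebesgue measure, then $f_t$ admits a density $f(t, \cdot) \in L^1(\Omega)$. Then $f(t, \cdot)$ solves the equation

$$\partial_t f(t,\mathbf{x}) = D\,\Delta f(t,\mathbf{x}) + \big(m(\mathbf{x}) - c\,N(t)\big)\,f(t,\mathbf{x}), \qquad N(t) = \int_\Omega f(t,\mathbf{y})\,d\mathbf{y},$$

with Neumann boundary conditions on ∂Ω.

We also note that the phenotype distribution $q(t,\mathbf{x}) = f(t,\mathbf{x})/\langle f_t, 1 \rangle$ solves the standard model.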

Propositions 2.3 and 2.4 show how the standard model arises as a large population limit of individual-based models in the weak selection regime with small mutation effects. However, Proposition 2.1 shows that the absence of the birth rate from the mutation term is a consequence of the weak selection assumption. In the next section, we focus on a situation corresponding to a strong trade-off between birth and survival, in which the weak selection assumption is not satisfied. The new model with birth-dependent mutation rate should therefore be more appropriate to study the dynamics of adaptation, at least when generations are overlapping.

In the model with non-overlapping generations, we expect the standard model to emerge even when the weak selection assumption is not satisfied. Intuitively, in this model the expected number of mutants per generation is *U N*(*t*). Thus, if *N*(*t*) is close to the carrying capacity, the overall number of mutants should not depend on the phenotype distribution in the population. However, one can take a large population limit of the discrete-time model in the same regime as in Proposition 2.1, keeping *w* and *ρ* fixed and letting the population size tend to infinity. The phenotype distribution then converges to the solution of a deterministic recurrence equation of the form

$$q_{t+1}(\mathbf{x}) = \frac{w(\mathbf{x})\,q_t(\mathbf{x}) + U\,\mathcal{M}(w\,q_t)(\mathbf{x})}{\int_\Omega w(\mathbf{y})\,q_t(\mathbf{y})\,d\mathbf{y}},$$

where $\mathcal{M}$ is as in (5). We do not study this equation here. It is interesting to note, however, that the fitness affects the mutations in this equation, albeit in a way quite different from that in (5).
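The recurrence can be iterated on a grid, as in the Python sketch below for *n* = 1. The kernel ρ is a hypothetical choice: a Gaussian in the offspring-minus-parent displacement, renormalised on the grid so that every parent's offspring stay in Ω. With this normalisation, the discretised $\mathcal{M}$ carries no mass, so $q_t$ remains a probability density at every step.

```python
import numpy as np

def recurrence_step(q, w, x, U, sigma):
    """One step of q_{t+1} = (w q_t + U M(w q_t)) / ∫ w q_t on a uniform 1D grid x.

    M(g)(x) = ∫ rho(y, x) g(y) dy - g(x). Hypothetical kernel: Gaussian of s.d. sigma,
    renormalised on the grid so that each parent's offspring stay inside Omega.
    """
    dx = x[1] - x[0]
    R = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))   # R[i, j] ~ rho(x_j, x_i)
    R /= R.sum(axis=0, keepdims=True) * dx                       # each column integrates to 1
    g = w(x) * q                                                  # offspring density w q_t
    Mg = R @ g * dx - g                                           # discretised M(g)
    return (g + U * Mg) / (g.sum() * dx)
```

Positivity is preserved because $g + U\,\mathcal{M}(g) = (1-U)\,g + U\,(Rg)\,dx \ge 0$.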

In the following section, we use the model with birth-dependent mutation rate to study the consequences of a birth-dependent mutation rate on the trade-off between birth and survival. We compare our results to the standard approach and to individual-based simulations.

## 3. Consequences of a birth-dependent mutation rate on the trade-off between birth and death

We focus here on the trajectories of adaptation and the large time dynamics given by the model with birth-dependent mutation rate. We pay special attention to the differences with the standard approach, which neglects the dependence of the mutation rate on the birth rate.

In most related studies, the relationship between the phenotype **x** and the fitness *m*(**x**) is described with the standard Fisher's Geometrical Model (FGM), where $m(\mathbf{x}) = r_{max} - \|\mathbf{x}\|^2/2$. This phenotype to fitness landscape model is widely used (see e.g. Tenaillon, 2014; Martin and Lenormand, 2015). It has shown robust accuracy to predict distributions of pathogens (Martin and Lenormand, 2006; Martin et al., 2007), and to fit biological data (Perefarres et al., 2014; Schoustra et al., 2016). Here, however, we want to study the trade-off between birth and survival. We therefore assume that the death rate *d* takes the form $d(\mathbf{x}) = r - s(\mathbf{x})$, for some $r > 0$ and some function $s: \Omega \to [0, r]$. The functions *b* and *s* are symmetric about the hyperplane $x_1 = 0$, in the sense of (9). We also assume that *b* has a global maximum that is not on this hyperplane. As a result, *s* also has a global maximum, which is the mirror image of that of *b*. The positive constant *r* has no impact on the dynamics of the phenotype distribution *q*(*t*, **x**) in the model with birth-dependent mutation rate, because it cancels in the term $m(\mathbf{x}) - \overline{m}(t)$. To keep the model biologically relevant, the only constraint is that *r* must be chosen such that *d*(**x**) > 0 for all **x** ∈ Ω.

We assume that *b*(**x**) reaches its maximum at $\mathbf{O}_b \in \Omega$ and *s*(**x**) reaches its maximum at $\mathbf{O}_s \in \Omega$. If one of the optima leads to a higher fitness value, we expect the corresponding strategy (high birth vs. high survival) to be selected. To avoid such 'trivial' effects, we want to analyse the trade-off between birth and survival independently of any fitness bias towards one or the other. We therefore make the following assumptions. The domain Ω is symmetric about the hyperplane $\{x_1 = 0\}$. Next, *b* and *s* are positive, continuous over $\overline{\Omega}$ and symmetric in the following sense:

$$b(\mathbf{x}) = s\big(\iota(\mathbf{x})\big) \quad \text{for all } \mathbf{x} \in \Omega, \qquad \text{where } \iota(x_1, x_2, \dots, x_n) = (-x_1, x_2, \dots, x_n). \tag{9}$$

The optima are then also symmetric about the hyperplane $\{x_1 = 0\}$:

$$\mathbf{O}_s = \iota(\mathbf{O}_b), \qquad (\mathbf{O}_b)_1 = \beta, \quad (\mathbf{O}_s)_1 = -\beta,$$

for some *β* > 0. The birth optimum is thus situated to the right of $x_1 = 0$, and the survival optimum to the left of $x_1 = 0$. Figure 1 gives a schematic representation of the birth and survival terms, and of the corresponding fitness function, along the first dimension $x_1$.

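Finally, we assume that the birth rate is larger than the survival rate in the whole half-space containing $\mathbf{O}_b$. Conversely, from (9), the survival rate is higher in the other half-space. In other terms:

$$b(\mathbf{x}) > s(\mathbf{x}) \ \text{ for } \mathbf{x} \in \Omega \text{ with } x_1 > 0, \qquad b(\mathbf{x}) < s(\mathbf{x}) \ \text{ for } \mathbf{x} \in \Omega \text{ with } x_1 < 0.$$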

From the symmetry assumption (9), we know that the hyperplane $\{x_1 = 0\}$ consists of critical points of *b* + *s* in the direction $x_1$. That is, $\partial_{x_1} b(0, x_2, \dots, x_n) = -\partial_{x_1} s(0, x_2, \dots, x_n)$.
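 
The Python sketch below gives one concrete landscape that satisfies these assumptions. It is chosen purely for illustration and is not taken from the paper's simulations. In *n* = 2, *b* is a Gaussian bump centred at (β, 0), and $s = b \circ \iota$. Because the two bumps have equal widths, *b* > *s* exactly on $\{x_1 > 0\}$, and $m = b + s - r$ is symmetric. The sketch checks these properties numerically, together with the identity $\partial_{x_1} b = -\partial_{x_1} s$ on $\{x_1 = 0\}$. The values of β, of the width and of *r* are hypothetical.

```python
import numpy as np

BETA, V, B_MAX = 1.0, 0.5, 1.0    # hypothetical optimum position, squared width, height
R_CONST = 1.5                     # r > max s, so that d = r - s > 0

def iota(x):
    """Reflection (x1, x2, ..., xn) -> (-x1, x2, ..., xn), applied along the last axis."""
    y = np.array(x, dtype=float)
    y[..., 0] *= -1
    return y

def b(x):
    """Birth rate: Gaussian bump with its maximum at O_b = (BETA, 0)."""
    o_b = np.array([BETA, 0.0])
    return B_MAX * np.exp(-np.sum((x - o_b) ** 2, axis=-1) / (2 * V))

def s(x):
    """Survival term defined through the symmetry assumption b(x) = s(iota(x))."""
    return b(iota(x))

def m(x):
    """Malthusian fitness m = b - d with d = r - s."""
    return b(x) + s(x) - R_CONST
```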

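We assume reflective (Neumann) boundary conditions. They are needed for the well-posedness of the model with birth-dependent mutation rate, and they ensure that the integral of *q*(*t*, **x**) over Ω remains equal to 1 (recall that *q*(*t*, ·) is a probability distribution):

$$\nabla(b\,q)(t,\mathbf{x}) \cdot \mathbf{n}(\mathbf{x}) = 0, \qquad t > 0,\ \mathbf{x} \in \partial\Omega,$$

where $\mathbf{n}(\mathbf{x})$ is the outward unit normal to ∂Ω, the boundary of Ω. We also assume a compactly supported initial condition $q_0(\mathbf{x}) = q(0, \mathbf{x})$, with integral 1 over Ω.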

### 3.1. Trajectories of adaptation

The methods developed in Hamel et al. (2020) provide analytic formulas describing the full dynamics of adaptation for models of the form of the standard model, i.e., with a constant mutation rate. In particular, they describe the dynamics of the mean fitness $\overline{m}(t)$. For the model with birth-dependent mutation rate, however, the birth-dependent term in the mutation operator *D*Δ(*bq*) seems to put comparable explicit formulas out of reach. To circumvent this issue, we use numerical simulations to exhibit some qualitative properties of the adaptation dynamics, and we then prove some of them. We focus on the dynamics of the mean phenotype $\overline{\mathbf{x}}(t)$ and of the mean fitness $\overline{m}(t)$. We compare them to the 'standard' case, where the mutation rate does not depend on the phenotype, and to individual-based stochastic simulations with the assumptions of Section 2. In the PDE setting, the mean phenotype and mean fitness are defined by:

$$\overline{\mathbf{x}}(t) = \int_\Omega \mathbf{x}\,q(t,\mathbf{x})\,d\mathbf{x}, \qquad \overline{m}(t) = \int_\Omega m(\mathbf{x})\,q(t,\mathbf{x})\,d\mathbf{x}.$$

#### Numerical simulations

Our numerical computations are carried out in dimension *n* = 2, starting with an initial phenotype distribution concentrated at some point **x**_0 in Ω. We solved the PDEs with a method of lines (the Matlab codes are available in the Open Science Framework repository: https://osf.io/g6jub/). Fig. 2(a) shows the trajectories given by the PDE with a birth-dependent mutation rate, together with 10 replicate simulations of a stochastic individual-based model with overlapping generations (see Section 2). The mean phenotype is first attracted by the birth optimum $\mathbf{O}_b$. Later, it converges towards the survival optimum $\mathbf{O}_s$. This pattern leads to a trajectory of mean fitness which exhibits a small 'plateau'. During some period of time, the mean fitness seems to stabilize at a value smaller than its ultimate value, and it grows again at larger times. The trajectories given by individual-based simulations exhibit the same behaviour.

On the other hand, simulating the standard model, in which the mutation rate does not depend on the phenotype (with Neumann boundary conditions), leads to standard saturating trajectories of adaptation (see Fig. 2(b); this was already observed with this model in Martin and Roques, 2016). This time, the trajectories given by the model are in good agreement with those given by an individual-based model with non-overlapping generations (see Section 2).

In the standard model, if the initial population density $q_0$ is symmetric about the hyperplane $\{x_1 = 0\}$, then so is $q(t,\mathbf{x})$ at all positive times. This is a consequence of the uniqueness of the solution of the standard model, which follows from Hamel et al. (2020). Indeed, if $q(t,\mathbf{x})$ is a solution of the standard model with initial condition $q_0$, then so is $q(t, \iota(\mathbf{x}))$. By uniqueness, $q(t,\mathbf{x}) = q(t,\iota(\mathbf{x}))$ at all times. This in turn implies that the mean phenotype $\overline{\mathbf{x}}(t)$ remains on the hyperplane $\{x_1 = 0\}$, i.e., at the same distance from the two optima $\mathbf{O}_b$ and $\mathbf{O}_s$. Now suppose that $q_0$ is not symmetric about $\{x_1 = 0\}$, i.e., that the initial phenotype distribution is biased towards one of the two optima. The trajectory of $\overline{\mathbf{x}}(t)$ would still ultimately converge to the hyperplane $\{x_1 = 0\}$. Again, this is a consequence of the uniqueness of the positive stationary state of the standard model with integral 1. That uniqueness in turn follows from the uniqueness, up to multiplication, of the principal eigenfunction of the operator $\varphi \mapsto D\,\Delta\varphi + m\,\varphi$ with Neumann boundary conditions. This uniqueness result is classical (see e.g. Alfaro and Veruete, 2018).

#### Initial bias towards the birth optimum, a multidimensional feature

One of the qualitative properties observed in the simulations (Figure 2(a)) is that the trajectory of the mean phenotype initially tends to go towards the birth optimum $\mathbf{O}_b$. We show here that this is a general feature, conditioned by the shape of selection along other dimensions. For simplicity, we denote by $\overline{x}_1(t)$ the mean value of the first trait, that is, the first coordinate of $\overline{\mathbf{x}}(t)$. We consider initial conditions $q_0$ that are symmetric about the hyperplane $\{x_1 = 0\}$ and localized around a phenotype $\mathbf{x}_0 \in \{x_1 = 0\}$. By localized, we mean that $q_0$ vanishes outside some compact set that contains $\mathbf{x}_0$. We denote by $K_0$ the support of $q_0$, and we define its 'right part' $K_0^+ = K_0 \cap \{x_1 > 0\}$. We prove the following result (the proof is detailed in Appendix A.2).

*Let q be the solution of , with an initial condition q _{0} which satisfies the above assumptions. Then the following holds*.

*If*Δ(*x*_{1}*m*) ≥ 0 (*and*≢ 0)*on , then the solution is initially biased towards the birth optimum, that is**If*Δ(*x*_{1}*m*) ≤ 0 (*and*≢ 0)*on , then the solution is initially biased towards the survival optimum, that is*

A surprising feature of this proposition is the role of the sign of the quantity Δ(*x*_{1}*m*): it shows that the local convexity (or concavity) of *m* around the initial phenotype matters, and this sign is in turn determined by the overall shape and symmetry of *m*. We first illustrate this in dimension 1, where the quantity Δ(*xm*) simply becomes

$$\Delta(x\,m)(x) = (x\,m)''(x) = 2\,m'(x) + x\,m''(x) =: g(x).$$

By the symmetry assumption (9), we know that *m*′(0) = 0 and thus *g*(0) = 0. Since *g*(*x*) ≈ *g*′(0) *x* for small *x*, in this one-dimensional case the discussion of Proposition 3.1 about the sign of Δ(*xm*) near 0 reduces to the sign of *g*′(0) = 3*m*″(0), that is, to the local convexity of *m* around 0. Equivalently, it is dictated by the shape of *m*: the initial bias differs depending on whether *m* has two symmetric optima (camel shape, Fig. 1(a)) or a single one located at 0 (dromedary shape, Fig. 1(b)). If *m* has a camel shape, then *m* necessarily has a local minimum at 0. Therefore, as a consequence of Proposition 3.1, there is an initial bias towards the birth optimum. If *m* has a dromedary shape, the critical point 0 is a global maximum of *m*, and Proposition 3.1 yields the reverse conclusion: there is an initial bias towards the survival optimum.
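The one-dimensional criterion is easy to check numerically. The sketch below uses two hypothetical fitness functions, a camel-shaped *m* = *b* + *s* − 1 with Gaussian bumps at ±1 and a dromedary-shaped *m*(*x*) = exp(−*x*²), approximates *g* = (*xm*)″ by central finite differences, and compares *g*′(0) with 3*m*″(0). The step sizes are illustrative choices.

```python
import math


def camel(x):
    """Hypothetical camel-shaped fitness m = b + s - 1 with optima near +/-1."""
    return math.exp(-(x - 1.0) ** 2) + math.exp(-(x + 1.0) ** 2) - 1.0


def dromedary(x):
    """Hypothetical single-peaked fitness with its maximum at 0."""
    return math.exp(-x ** 2)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


def g(m, x):
    """g(x) = (x m)''(x), i.e. Delta(x m) in dimension 1."""
    return second_derivative(lambda y: y * m(y), x)


def g_prime_at_zero(m, h=1e-2):
    """Central finite-difference approximation of g'(0)."""
    return (g(m, h) - g(m, -h)) / (2.0 * h)


for name, m in [("camel", camel), ("dromedary", dromedary)]:
    print(f"{name}: g(0.1) = {g(m, 0.1):+.3f}, g'(0) = {g_prime_at_zero(m):+.3f}, "
          f"3 m''(0) = {3 * second_derivative(m, 0.0):+.3f}")
```

For the camel shape, *g* is positive just to the right of 0 (bias towards the birth optimum); for the dromedary shape it is negative (bias towards the survival optimum).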

The dependence of the initial bias on the shape of *m* can be explained as follows. In the case where *m* has two optima, the population initially sits around a minimum of fitness. By symmetry of *m*, there is no fitness benefit in choosing either optimum. However, individuals on the right have a higher mutation rate, which generates variance that fuels and speeds up adaptation; this explains the initial bias towards the right. On the other hand, if 0 is the unique optimum of the fitness function, the initial population is already at the optimum. Generating more variance then does not speed up adaptation but, on the contrary, increases the mutation load, which explains the initial bias towards the left.
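This mechanism can be observed in a small simulation. The sketch below assumes a one-dimensional model of the form $\partial_t q = D\,(b\,q)'' + (m - \bar m)\,q$ with Neumann conditions (our reading of the model, stated as an assumption), compares it with a standard model in which $(b\,q)''$ is replaced by $q''$, and uses the hypothetical camel-shaped landscape $b(x) = e^{-(x-1)^2}$, $s(x) = e^{-(x+1)^2}$, $m = b + s - 1$, with $D = 0.01$ and a narrow symmetric initial condition. An explicit Euler scheme is used for simplicity, and the mass is renormalized at each step.

```python
import numpy as np

# Hypothetical 1D setting (illustrative): birth optimum at +1, survival
# optimum at -1, fitness m = b + s - 1, small mutation parameter D.
x = np.linspace(-4.0, 4.0, 401)
dx = x[1] - x[0]
b = np.exp(-(x - 1.0) ** 2)
s = np.exp(-(x + 1.0) ** 2)
m = b + s - 1.0
D = 0.01


def laplacian(v):
    """Second difference of v with ghost-cell Neumann boundaries."""
    lap = np.empty_like(v)
    lap[1:-1] = v[2:] - 2.0 * v[1:-1] + v[:-2]
    lap[0] = 2.0 * (v[1] - v[0])
    lap[-1] = 2.0 * (v[-2] - v[-1])
    return lap / dx ** 2


def mean_phenotype(t_end, birth_dependent, dt=0.005):
    """Explicit Euler scheme for q_t = D * M[q] + (m - mbar) q, where the
    mutation term M[q] is (b q)'' or, for the standard model, q''."""
    q = np.exp(-x ** 2 / (2 * 0.1 ** 2))   # narrow, symmetric initial condition
    q /= q.sum() * dx
    for _ in range(int(round(t_end / dt))):
        mbar = (m * q).sum() * dx
        mutation = laplacian(b * q) if birth_dependent else laplacian(q)
        q = q + dt * (D * mutation + (m - mbar) * q)
        q /= q.sum() * dx                   # re-impose unit mass
    return (x * q).sum() * dx


print(mean_phenotype(1.0, birth_dependent=True))   # small but positive
print(mean_phenotype(1.0, birth_dependent=False))  # zero up to rounding
```

Only the birth-dependent mutation term breaks the symmetry; with the constant-coefficient term, the mean phenotype stays on the axis.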

In a multidimensional setting, the same explanations apply, although another phenomenon can arise. The reason lies in the following formula:

$$\Delta(x_1 m) = \partial^2_{x_1}(x_1 m) + x_1 \sum_{i=2}^{n} \partial^2_{x_i} m. \qquad (13)$$

Suppose that, as in Fig. 1(a), there is a local minimum around **x**_{0} in the first dimension. Then the first term of (13) is positive in a neighborhood of **x**_{0} as soon as *x*_{1} > 0, as explained previously. In dimension *n* ≥ 2, if the sum of the second derivatives with respect to the other directions is negative, the overall sign of Δ(*x*_{1} *m*) may change. Such a situation can arise in dimension 2 if **x**_{0} is a saddle point. This phenomenon can be observed in Fig. 3. In both Fig. 3(a) and Fig. 3(b), the fitness function *m* is camel-like along the first dimension, as pictured in Fig. 1(a). However, depending on the second dimension, we may or may not observe an initial bias towards the birth optimum. As in the one-dimensional case, if the mutation load is too large, here along the second dimension, this initial bias disappears. This of course cannot happen if **x**_{0} is a local minimum of *m*.
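A minimal worked example of this competition, using a hypothetical separable landscape (not the one of Fig. 3): in dimension *n* = 2, take *m*(**x**) = *f*(*x*_{1}) − *c* *x*_{2}² with *f* even, *f*″(0) > 0 and *c* > 0, so that **x**_{0} = 0 is a saddle point. Formula (13) then gives, near **x**_{0},

$$
\Delta(x_1 m)(\mathbf{x})
  = \underbrace{2 f'(x_1) + x_1 f''(x_1)}_{\partial^2_{x_1}(x_1 m)}
  \;+\; \underbrace{x_1\,\partial^2_{x_2} m(\mathbf{x})}_{=\,-2c\,x_1}
  = x_1\bigl(3 f''(0) - 2c\bigr) + O(x_1^3).
$$

Hence, for small *x*_{1} > 0, Δ(*x*_{1}*m*) has the sign of 3*f*″(0) − 2*c*: the initial bias towards the birth optimum persists when the curvature along *x*_{2} is mild (2*c* < 3*f*″(0)) and is reversed when it is strong (2*c* > 3*f*″(0)).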

#### Large time behaviour

We now analyze whether the convergence towards the survival optimum at large times observed in Fig. 2(a) is a generic behavior. To that end, we focus on the stationary distribution *q*_{∞} associated with the model. It satisfies the equation
for some . Setting
this reduces to a more standard eigenvalue problem, namely
supplemented with Neumann boundary conditions:

As the weight 1/*b*(**x**) appearing in (15) is strictly positive, we can indeed apply the standard spectral theory of Courant and Hilbert (2008) (see also Cantrell and Cosner, 2003). Precisely, there is a unique pair satisfying (15)–(16) (with a normalization condition) such that *v*(**x**) > 0 in Ω. The ‘principal eigenvalue’ is provided by the variational formula
where *W*^{1,2}(Ω) is the standard Sobolev space and

An immediate consequence of formula (17) is that the principal eigenvalue is a decreasing function of the mutational parameter *D*. This means that, as expected, the mutation load increases when the mutational parameter is increased.
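A discrete counterpart of this monotonicity can be checked directly. The sketch below assumes that the stationary problem reads $D\,(b\,q)'' + m\,q = \lambda q$ in dimension 1 with Neumann conditions (our reading of the model, stated as an assumption) and uses the hypothetical landscape $b(x) = e^{-(x-1)^2}$, $s(x) = e^{-(x+1)^2}$, $m = b + s - 1$. It computes the largest eigenvalue of the discretized operator for increasing values of *D*.

```python
import numpy as np

# Hypothetical 1D landscape (illustrative): Gaussian birth and survival bumps
# at +1 and -1, fitness m = b + s - 1.
x = np.linspace(-4.0, 4.0, 201)
dx = x[1] - x[0]
b = np.exp(-(x - 1.0) ** 2)
s = np.exp(-(x + 1.0) ** 2)
m = b + s - 1.0

# Ghost-cell Neumann second-difference matrix.
lap = -2.0 * np.eye(x.size) + np.eye(x.size, k=1) + np.eye(x.size, k=-1)
lap[0, 1] = lap[-1, -2] = 2.0
lap /= dx ** 2


def principal_eigenvalue(D):
    """Largest eigenvalue of the discretized operator q -> D (b q)'' + m q."""
    a = D * lap @ np.diag(b) + np.diag(m)
    return np.linalg.eigvals(a).real.max()


lambdas = [principal_eigenvalue(D) for D in (0.001, 0.01, 0.1, 1.0)]
print([round(lam, 4) for lam in lambdas])  # a decreasing sequence
```

The discrete operator inherits the argument behind (17): the ghost-cell Laplacian is self-adjoint and nonpositive for a diagonally weighted inner product, so the discrete Rayleigh quotient can only decrease as *D* grows.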

We expect the stationary state to ‘lean mainly to the left’, meaning that the survival optimum is selected at large times, but rigorously deriving the precise shape of *q*_{∞} seems highly involved. Still, formula (17) gives us some intuition. First, multiplying (15) by *v* and integrating, we observe that the principal eigenvalue equals *Q*(*v*). Thus, formula (17) shows that the shape of *q*_{∞} should be such that *v* maximizes the Rayleigh quotient *Q*.

We thus consider each term of *Q* separately. From the Hardy-Littlewood-Pólya rearrangement inequality, the term is larger when *ψ* is arranged like *m*, i.e., *ψ* takes its largest values where *m* is large and its smallest values where *m* is small. Thus, this term tends to promote shapes of *ψ* which look like *m*. The other term tends to promote functions *ψ* which are proportional to . Overall, the stationary distribution *q*_{∞} should therefore realize a compromise between 1/*b* and . As both functions take their largest values where *b* is small, we expect *q*_{∞} to be larger close to the survival optimum.
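The expected leaning towards the survival optimum can be checked on the same kind of hypothetical one-dimensional landscape (Gaussian bumps for *b* and *s* centred at +1 and −1, so that *m* = *b* + *s* − 1 is exactly symmetric, and *D* = 0.01), again assuming the stationary problem $D\,(b\,q)'' + m\,q = \lambda q$ with Neumann conditions. The sketch computes the principal eigenvector and compares the mass on each side of 0.

```python
import numpy as np

# Hypothetical setting: birth optimum at +1, survival optimum at -1.
x = np.linspace(-4.0, 4.0, 201)
dx = x[1] - x[0]
b = np.exp(-(x - 1.0) ** 2)
s = np.exp(-(x + 1.0) ** 2)
m = b + s - 1.0          # symmetric landscape: m(-x) = m(x)
D = 0.01

lap = -2.0 * np.eye(x.size) + np.eye(x.size, k=1) + np.eye(x.size, k=-1)
lap[0, 1] = lap[-1, -2] = 2.0   # ghost-cell Neumann boundaries
lap /= dx ** 2

# Principal eigenvector of the assumed stationary problem D (b q)'' + m q = lambda q.
eigvals, eigvecs = np.linalg.eig(D * lap @ np.diag(b) + np.diag(m))
q_inf = eigvecs[:, np.argmax(eigvals.real)].real
q_inf = q_inf / (q_inf.sum() * dx)     # positive, unit mass

mass_left = q_inf[x < 0].sum() * dx    # side of the survival optimum
mass_right = q_inf[x > 0].sum() * dx   # side of the birth optimum
mean = (x * q_inf).sum() * dx
print(f"mass left = {mass_left:.3f}, mass right = {mass_right:.3f}, mean = {mean:.3f}")
```

Although *m* is symmetric, the stationary state concentrates on the survival side, where the birth rate, and hence the mutation load, is smaller.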

More rigorously, define . As realises a maximum of *Q*, we have

Recalling *s*(**x**) = *b*(ι(**x**)) and using the symmetry of *m*, this implies that

Now, we illustrate that, heuristically, this gradient inequality means that the stationary distribution tends to be closer to the survival optimum than to the birth optimum. In dimension *n* = 1, assume that *b*(*x*) = exp(–(*x* – *β*)^{2}) and *s*(*x*) = exp(–(*x* + *β*)^{2}), so that the birth and survival optima are located at *x* = *β* and *x* = –*β*, respectively. Assume also that the domain is large enough for the integrals over Ω to be accurately approximated by integrals over (–∞, +∞). Among all functions of the form *h*_{γ}(*x*) = exp(–(*x* – *γ*)^{2}), a straightforward computation shows that the inequality (18) is satisfied by the functions *h*_{γ} whose maximum is reached at a value *x* = *γ* closer to –*β* than to *β*.

#### Large mutation effects

This advantage of adaptation towards the survival optimum becomes more obvious when the mutation effects are large. We observed above that the principal eigenvalue (seen here as a function of *D*) is decreasing. Moreover, from (17), we have, for all *D* > 0,

Thus the principal eigenvalue admits a limit as *D* → ∞. Moreover, the corresponding stationary states satisfy . Standard elliptic estimates and Sobolev embeddings imply that, up to the extraction of some subsequence *D*_{k} → ∞, the functions converge, as *k* → ∞, in *C*^{2}(Ω) to a nonnegative solution (with mass 1) of Δ(*bq*) = 0 with Neumann boundary conditions. As such a solution is unique and given by *C*/*b*(**x**), with *C* = (∫_{Ω} 1/*b*)^{–1} fixed by the unit-mass condition, the whole sequence converges to *C*/*b*(**x**) as *D* → ∞. Thus, in order to reduce the mutation load, the phenotype distribution tends to become inversely proportional to *b* in the large mutation regime.
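The large-mutation limit can be illustrated in the same way. In this sketch the birth rate is bounded away from 0, with *b* = 0.5 + exp(−(*x* − 1)²), *s* = 0.5 + exp(−(*x* + 1)²) and *m* = *b* + *s* − 2 (all hypothetical choices), so that 1/*b* stays moderate. The assumed stationary problem $D\,(b\,q)'' + m\,q = \lambda q$ is solved for increasing *D*, and the unit-mass stationary state is compared with *C*/*b*.

```python
import numpy as np

# Hypothetical setting: birth rate bounded away from zero on [-3, 3].
x = np.linspace(-3.0, 3.0, 121)
dx = x[1] - x[0]
b = 0.5 + np.exp(-(x - 1.0) ** 2)
s = 0.5 + np.exp(-(x + 1.0) ** 2)
m = b + s - 2.0

lap = -2.0 * np.eye(x.size) + np.eye(x.size, k=1) + np.eye(x.size, k=-1)
lap[0, 1] = lap[-1, -2] = 2.0   # ghost-cell Neumann boundaries
lap /= dx ** 2


def stationary_state(D):
    """Principal eigenvector (unit mass) of q -> D (b q)'' + m q."""
    eigvals, eigvecs = np.linalg.eig(D * lap @ np.diag(b) + np.diag(m))
    q = eigvecs[:, np.argmax(eigvals.real)].real
    return q / (q.sum() * dx)


limit = (1.0 / b) / ((1.0 / b).sum() * dx)   # C / b with unit mass
for D in (0.01, 1.0, 1e4):
    err = np.abs(stationary_state(D) - limit).max() / limit.max()
    print(f"D = {D:g}: max relative deviation from C/b = {err:.1e}")
```

For small *D* the stationary state is concentrated near the fitness peaks, whereas for large *D* it is close to *C*/*b*, in line with the limit above.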

#### An analytically tractable example

Consider the following form for the birth rate, in dimension *n* = 1:

With the assumptions (8) and (19), we get:
and *m*(*x*) = –(*r* + 2) *M* outside (*–a, a*). Then, we consider the corresponding 1D eigenvalue problem (14) in an interval Ω containing (–a, a). Assuming that the phenotypes are extremely deleterious outside (–a, a) (i.e., *M* ≫ 1), we make the approximation *q*_{∞}(±a) = 0. In this case, we prove (see Appendix A.3) that

In other words, the stationary distribution has a larger mass to the left of 0 (where *s* is larger) than to the right (where *b* is larger).

## 4. Discussion

We found that a positive dependence between the birth rate and the mutation rate emerges naturally at the population scale, from elementary assumptions at the individual scale. Based on a large population limit of a stochastic individual-based model in a small mutation variance regime, we derived a reaction-diffusion framework that describes the evolutionary trajectories and steady states in the presence of this dependence. We compared this approach with stochastic replicate simulations of finite-size populations, which showed good agreement with the behaviour of the reaction-diffusion model. These simulations and our analytical results demonstrate that taking this dependence into account, or conversely omitting it as in the standard model, has far-reaching consequences for the description of the evolutionary dynamics. In light of our results, we discuss below the causes and consequences of the positive dependence between the birth rate and the mutation rate.

### Birth-dependent mutation rate: causes

Even though the probability of mutation per birth event *U* does not depend on the phenotype of the parent, and therefore depends neither on its fitness nor on its birth rate, a higher birth rate implies more mutations per unit of time at the population scale. This holds true when mutations mainly occur during reproduction, which is the case for bacteria and viruses (Van Harten, 1998; Trun and Trempy, 2009). The mathematical derivation of the standard model, which does not account for this dependence, generally relies on a weak selection assumption, which *de facto* implies a very mild variation of the birth rate with the phenotype. More precisely, the mutation variance and the difference between birth rates and death rates should both be small and of comparable magnitude, uniformly over the phenotype space explored by the population (see Assumptions SE and WS in Section 2). This is usually achieved by assuming that the leading-order term of the birth rate does not depend on the phenotype. In such cases, the mutation rate can safely be assumed to be phenotype-independent at the population scale, even though it is positively correlated with the birth rate, as already observed by Hofbauer (1985) and Baake and Gabriel (2000). When there is a single optimum, this weak selection regime is often relevant. In particular, a rescaling of the phenotype space shows that taking small mutation effects is equivalent to assuming weak selection. Thus, in a regime with small mutation variance, which is required for the diffusion approximation, and with a single fitness optimum, the model with a birth-dependent mutation rate and the standard model should lead to very similar results. However, in a much more complex phenotype to fitness landscape with several optima, this approximation does not hold. In particular, if the birth rates at the different optima are very different from one another, then even with a small mutation variance, the mutation term Δ(*bq*) will differ markedly from one optimum to another.
In such situations, our approach reveals that the model with a birth-dependent mutation rate is more relevant and leads to more accurate predictions of the behaviour of the individual-based model.
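The population-scale effect described at the beginning of this paragraph can be seen in a minimal stochastic sketch (illustrative values, not taken from the text): a genotype gives birth at the times of a Poisson clock of rate *b*, and each birth carries a mutation with a fixed probability *U* that does not depend on *b*. The number of mutations per unit of time should then be close to *U b*.

```python
import random


def mutations_per_unit_time(birth_rate, u, t_max, seed):
    """Count mutations produced by one genotype whose births follow a Poisson
    clock of rate `birth_rate`; each birth is mutated with probability `u`."""
    rng = random.Random(seed)
    t, mutations = 0.0, 0
    while True:
        t += rng.expovariate(birth_rate)   # waiting time to the next birth
        if t > t_max:
            return mutations / t_max
        if rng.random() < u:               # per-birth probability, independent of b
            mutations += 1


U = 0.1
slow = mutations_per_unit_time(1.0, U, 10_000.0, seed=1)
fast = mutations_per_unit_time(3.0, U, 10_000.0, seed=2)
print(slow, fast, fast / slow)   # roughly 0.1, 0.3 and 3
```

Tripling the birth rate roughly triples the number of mutations per unit of time, although the per-birth probability *U* is unchanged.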

An exception corresponds to organisms with non-overlapping generations: the simulations in Fig. 2(b) indicate that, even with a fitness function that strongly depends on the phenotype, the trajectories of adaptation are adequately described by the standard model. Species with non-overlapping generations include annual plants (although some overlap may exist due to seed banks), many insect species (e.g., processionary moths, Roques, 2015; again, some overlap may exist due to prolonged diapause) and fish species (such as some killifishes with annual life cycles, Turko and Wright, 2015).

In our study, the birth and survival functions have the same height and width, so that the resulting fitness landscape *m*(*x*) is double-peaked and symmetric. We chose this particular landscape in order to avoid a trivial advantage for one of the two strategies, and to check whether an asymmetric behaviour can emerge from a symmetric fitness landscape. Of course, if the birth optimum corresponds to a much higher fitness than the survival optimum, we expect that at large times the mean phenotype will converge to the birth optimum. However, our approach reveals a tendency of the trajectory to be attracted by the survival optimum, which clearly shows up in the symmetric case considered here and remains true in intermediate situations, as observed in Fig. B.2. More precisely, in an asymmetric double-peaked fitness landscape where the two peaks have different heights, we observed that having a high survival rate remains more advantageous at equilibrium than having a high birth rate as long as the difference between the fitness peaks remains lower than the difference between the mutation loads generated at each optimum (see Appendix B).

Another feature of the model is that the transient trajectory of mean fitness displays plateaus, as observed for instance in Fig. 2. This phenomenon of several epochs in adaptation is well documented thanks to the longest-running evolution experiment, the ‘Long Term Evolution Experiment’ (LTEE). Experimenting on *Escherichia coli* bacteria, Wiser et al. (2013) found that even after 50,000 generations, fitness had not reached its maximum, apparently challenging the very existence of such a maximum, which is the essence of Fisher’s Geometrical Model. It was then argued that the data could be explained by a two-epoch model (Good and Desai, 2015), with or without saturation. A similar pattern was observed for an RNA virus (Novella et al., 1995). Recently, Hamel et al. (2020) showed that the FGM with a single optimum but anisotropic mutation effects also leads to plateaus, and they obtained a good fit with the LTEE data. Our study shows that, when coupled with a phenotype to fitness landscape with two optima, the model with a birth-dependent mutation rate is also a possible candidate to explain these trajectories of adaptation.

### The model in the mathematical literature

Some authors have already considered operators that are closely related to the mutation operator of our model. For instance, Lorz et al. (2011) considered non-homogeneous diffusion operators within the framework of constrained Hamilton-Jacobi equations. However, this operator does not emerge as the limit of a microscopic diffusion process or as an approximation of an integral mutation operator. It is better suited to the study of heat conduction, as it notably tends to homogenize the solution compared to the Fokker-Planck operator Δ(*b*(**x**)*q*(*t*, **x**)), see Figure II.7 in Roques (2013). Finally, the flexible framework of Bürger (2000) allows for heterogeneous mutation rates. Due to the complicated nature of the operators involved (compact or power-compact kernel operators), the theoretical framework is in turn very intricate. Consequently, quantitative results are relatively few: they typically consist of existence and uniqueness of solutions and upper or lower bounds on the asymptotic mean fitness (Bürger, 1998, 2000), or concern simpler models (with a discretization of time or of the phenotypic space), see (Hermisson et al., 2002; Redner, 2004; Hofbauer, 1985).
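The contrast between the two kinds of operators can be seen on their stationary states. This sketch takes the divergence-form operator (*b q*′)′ as a representative heat-conduction-type operator (an assumption: the exact operator of Lorz et al. (2011) is not reproduced here) and compares it with the Fokker-Planck operator (*b q*)″, both with no-flux boundaries and a hypothetical birth rate *b* = 0.5 + exp(−(*x* − 1)²) on [−2, 2].

```python
import numpy as np

# Hypothetical birth rate, bounded away from zero.
x = np.linspace(-2.0, 2.0, 81)
dx = x[1] - x[0]
b = 0.5 + np.exp(-(x - 1.0) ** 2)
n = x.size

# Fokker-Planck-type operator q -> (b q)'' with ghost-cell Neumann boundaries.
lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
lap[0, 1] = lap[-1, -2] = 2.0
fokker_planck = lap @ np.diag(b) / dx ** 2

# Divergence-form (heat-conduction-type) operator q -> (b q')' with zero flux
# at both ends; b is averaged at the interfaces between grid points.
b_face = 0.5 * (b[1:] + b[:-1])
heat = np.zeros((n, n))
for i, bf in enumerate(b_face):
    heat[i, i] -= bf
    heat[i, i + 1] += bf
    heat[i + 1, i + 1] -= bf
    heat[i + 1, i] += bf
heat /= dx ** 2


def steady_state(a):
    """Eigenvector of the largest eigenvalue (here 0), scaled to unit mass."""
    eigvals, eigvecs = np.linalg.eig(a)
    q = eigvecs[:, np.argmax(eigvals.real)].real
    return q / (q.sum() * dx)


q_heat = steady_state(heat)
q_fp = steady_state(fokker_planck)
print("heat-type: max/min of q =", q_heat.max() / q_heat.min())
print("Fokker-Planck: max/min of b*q =", (b * q_fp).max() / (b * q_fp).min())
```

Both ratios should be 1 up to rounding: the divergence-form operator has flat stationary states, whereas the Fokker-Planck operator has stationary states proportional to 1/*b*.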

### Sexual reproduction

How to take into account a phenotype-dependent birth rate with a sexual mode of reproduction is, to the best of our knowledge, an open question. A classical operator to model sexual genetic inheritance in the framework adopted in this article is the *infinitesimal operator*, introduced by Fisher (1918), see Slatkin (1970); Cotto and Ronce (2014) or the review of Turelli (2017). It describes the trait deviation of the offspring around the mean phenotype of the parents, drawn from a Gaussian distribution. Mathematically, few studies have tackled this operator, the only notable exceptions being the derivation from a microscopic point of view by Barton et al. (2017), the small variance and stability analyses of Calvez et al. (2019) and Patout (2020), and finally, in Mirrahimi and Raoul (2013) and Raoul (2017), with an additional spatial structure, the convergence of the model towards the Kirkpatrick-Barton model when the reproduction rate is large. In all those cases, the reproduction term is assumed to be constant. With the formalism of (2), at the population scale, mating and birth should be positively correlated, which should lead to considering the following variation on the infinitesimal operator, which acts upon the phenotype (for simplicity, we take *n* = 1 here for the dimension of the phenotype):

This operator can be interpreted as follows. It describes how an offspring with trait *x* appears in the population. First, an individual *x*_{1} rings a birth clock, at a rate given by its trait and the distribution of birth events *b*, as in (1). Next, this individual mates with a second parent *x*_{2}, chosen according to the weight *ω*. Then, the trait of the offspring is drawn from a normal law centred at the mean parental trait (*x*_{1} + *x*_{2})/2.

As the birth rate of individuals seems a decisive factor in being chosen as a second parent, a reasonable choice would be *ω* = *b* in the formula above. Again, to the best of our knowledge, no mathematical tools have been developed to tackle the issues raised in this article with this new operator. We mention the recent work of Raoul (2021) on similar operators.
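The three-step description of this operator translates directly into a sampling procedure for a finite population. The sketch below is hypothetical: the offspring standard deviation `sigma`, the birth rate `birth` and the population are illustrative choices, self-mating is not excluded, and the code only makes the mechanism concrete; it does not reproduce operator (20).

```python
import numpy as np


def sample_offspring(traits, birth_rate, weight, sigma, n_offspring, rng):
    """Hypothetical sampler for the mating scheme described above (1D traits).

    First parent: probability proportional to birth_rate(trait).
    Second parent: probability proportional to weight(trait), i.e. omega.
    Offspring trait: normal, centred at the mean of the two parental traits,
    with an illustrative standard deviation sigma (not a value from the text).
    """
    p_first = birth_rate(traits)
    p_first = p_first / p_first.sum()
    p_second = weight(traits)
    p_second = p_second / p_second.sum()
    x1 = rng.choice(traits, size=n_offspring, p=p_first)
    x2 = rng.choice(traits, size=n_offspring, p=p_second)
    return rng.normal(0.5 * (x1 + x2), sigma)


def birth(z):
    """Hypothetical birth rate, increasing with the trait z."""
    return 1.0 + 0.5 * np.tanh(z)


rng = np.random.default_rng(0)
population = rng.normal(0.0, 1.0, size=2_000)

kids = sample_offspring(population, birth, birth, 0.1, 20_000, rng)                # omega = b
kids_uniform = sample_offspring(population, birth, np.ones_like, 0.1, 20_000, rng)  # uniform omega
print(population.mean(), kids_uniform.mean(), kids.mean())
```

With *ω* = *b*, both parents are drawn preferentially among individuals with high birth rates, so the offspring distribution is shifted further towards high birth rates than with a uniform *ω*.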

A new trade-off, similar to the one discussed in this article, can also arise with the operator (20). Indeed, coupled with a selection term, as in (1) for instance, a trade-off between birth and survival can appear if *b* (or *ω*) and *d* have different optima. It would be very interesting to follow the trajectories of fitness over time, as in Fig. 2, to determine whether, and under which conditions, the effects highlighted in this paper for asexual reproduction persist with sexual reproduction. Of course, a third factor in the trade-off is also present, through the weight *ω* governing the choice of the second parent. If an external factor favors second parents around a third optimum, then its effect on the population should also be taken into account. The relevance of such a model in an individual-based setting, as in Section 2, is also an open question for the operator (20). With the assumption *ω* = *b*, the roles of first and second parents are symmetric in the operator (20), and an investigation of the balance between birth and survival could be carried out without additional assumptions.

## Declarations of interest

none

## Acknowledgements

This work was supported by the French Agence Nationale de la Recherche (ANR-18-CE45-0019 ‘RESISTE’). We thank Guillaume Martin for many fruitful discussions.

## APPENDICES

## A. Proofs

### A.1. Proof of Proposition 2.4

For measurable and bounded,

For , for any measurable and bounded where is a local martingale with quadratic variation

*Proof*. From the definition of the model, for any measurable and bounded,

Hence

We now wish to compute

To do this, let and let *N*_{i} be the number of offspring of individual *i* at time *t* + 1 and let (*Y*_{i,j}, 1 ≤ *j* ≤ *N*_{i}) denote their types. From the definition of the model, *N*_{i} is a Poisson random variable with parameter *w*_{K}(**x**_{i}) *e*^{-cKN} and the (*Y*_{i,j}, *j* ≥ 1) are i.i.d. with

Then we write

Since the third term depends only on and the first two terms are uncorrelated,

Rearranging, we arrive at

This concludes the proof of the lemma.

Note that, setting equation (21) can also be written

Using Assumption (A1), we then have

We then note that, in the case of Assumption (FE), , while in the case of Assumption (SE), by a Taylor expansion, for any ,
uniformly in **x** ∈ Ω. Finally, note that the first term in (22) is of the order of 1/*K* while the second term is of the order of .

For *N* ≥ 1, define a stopping time by

For any fixed *N* ≥ 1 and *T* > 0, for any ,
in probability as *K* → ∞.

*Proof*. By Doob’s martingale inequality,

Clearly, for ,

We then note that there exists a function such that, for all ,

With this notation,

Hence, using the fact that *r*_{K}(**x**) has the same sign as *x*,

Finally, , and, for , under either Assumption (FE) or (SE), we thus have

Plugging (24), (25) and (26) in (23), we obtain

Since the right-hand-side tends to zero as *K* → ∞, this concludes the proof of the lemma.

Fix *T* > 0, and let

Then (*X*_{K}, *K* > 0) is tight in . Moreover, for any *δ* > 0, *N* can be chosen such that

*Proof*. Looking at the statement of Lemma A.1, we note that and that
for some constant *C* > 0, using the fact that *m* is bounded. As a consequence,

By Gronwall’s inequality, we obtain

Hence,

As a result,

Since is tight, for any *δ* > 0 we can choose *N* large enough such that

In addition, by Lemma A.2, for any *N* ≥ 1,

Hence we can choose *N* large enough such that
concluding the proof.

Let denote the closure of Ω. Then, for any *T* > 0, the sequence of processes
is C-tight in .

*Proof*. Since is compact, Lemma A.3 implies that the compact containment condition of Theorem 3.9.1 in (Ethier and Kurtz, 1986) holds. It remains to show that, for a dense subset *H* of the set of bounded and continuous functions on (in the topology of uniform convergence on compact sets), is C-tight for every *h* ∈ *H*. We shall take

By Lemma A.3, it is sufficient to show that is tight for any *N* large enough. Now, using (25) and (26), for ,
for some constant *C*_{ϕ,N} > 0 depending only on *ϕ* and *N*. As a result, if *w*_{θ}(*g*) denotes the modulus of continuity of , *i.e*. we obtain

Hence, by Lemma A.2, for any *δ* > 0 and *ε* > 0, there exists *θ* > 0 such that

Combined with Lemma A.3, this shows that is C-tight for any and *N* > 0 (see for example (Jacod and Shiryaev, 2003, Proposition VI.3.26)), and the result is proved.

We can now conclude the proof of the main result.

Consider a converging subsequence, still denoted
and let (*f*_{t}, *t* ∈ [0, *T*]) be its limit. Since the sequence is C-tight, *t* ↦ *f*_{t} is continuous and the convergence holds uniformly on [0, *T*].

The result will be proved if we show that *f*_{t} solves (7). Let *ϕ* ∈ *C*^{2}(Ω). By Lemma A.2, in probability as *K* → ∞. In addition,

Hence, for and ,

As a result,

Hence, on the event ,

Combined with Lemma A.3 and the (uniform) convergence of to *f*, this shows that, for any *ε* > 0,

It follows that (*f*_{t}, *t* ∈ [0, *T*]) solves (7); hence the sequence converges in distribution, and in probability, to (*f*_{t}, *t* ∈ [0, *T*]) in . Since in fact *f*_{t} ∈ *M*_{F}(Ω) for any *t* ≥ 0, this concludes the proof of the result.

### A.2. Proof of Proposition 3.1

Multiplying equation by *x*_{1}, integrating over **x** ∈ Ω and evaluating at *t* = 0, we get

From Green's formula we infer
since *q*_{0} is compactly supported in Ω. Moreover, since *q*_{0} and *m* are both symmetric, i.e., *m*(*ι*(**x**)) = *m*(**x**) and *q*_{0}(*ι*(**x**)) = *q*_{0}(**x**),

This shows that .

We next turn to the second derivative . We differentiate equation with respect to time, multiply by *x*_{1}, integrate over **x** ∈ Ω and evaluate at *t* = 0 to reach

From the above computation, this reduces to

Moreover, since *∂*_{t}*q*(0, **x**) is also compactly supported (this follows from the model equation), Green's formula yields and we are left with

We multiply equation by *x*_{1} *m*(**x**), integrate over **x** ∈ Ω and evaluate at *t* = 0 to obtain

By symmetry, the last two terms vanish, and another application of Green's formula leads to

Then, we observe that
as *q*_{0} and *m* are symmetric about {*x*_{1} = 0}, and from (9). Thus,
with *K*_{0} the support of *q*_{0} (containing **x**_{0}). From this, (27) and (28), we end up with

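We know from (10) that (*b* – *s*)(**x**) > 0 in Ω ∩ {*x*_{1} > 0}. As a result, if Δ(*x*_{1}*m*(**x**)) is nontrivial and nonnegative (respectively nonpositive) on *K*_{0} ∩ {*x*_{1} > 0}, then the second time derivative at *t* = 0 of the mean of the first trait is positive (respectively negative). This concludes the proof of Proposition 3.1.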

### A.3. An explicit solution of the eigenvalue problem

We assume that the dimension is *n* = 1 and
and we consider the eigenvalue problem (14) with Dirichlet boundary conditions. As *b* is discontinuous, the eigenvalue problem must be understood in the weak sense. In particular, we have to solve
with the boundary, continuity and flux conditions:
the positivity conditions *q*_{1,∞}, *q*_{2,∞} > 0 and .

Set and . We have

The equality *q*_{1,∞}(0) = *q*_{2,∞}(0) thus implies:

The positivity of *q*_{1,∞}, *q*_{2,∞} implies that 0 < *a B* < *π*/2. The equation (32) thus admits a unique solution (*a B* ≈ 1.338761890). Additionally, we have:
and using (32),
with *j*(*x*) := (1 – cos(*x*))/sin(*x*) = tan(*x*/2). As is increasing on , we get:

As and since *j* is increasing on ,

Finally,

## B. Asymmetric fitness landscapes

In the main text, the birth and survival functions have the same height and width, so that the resulting fitness landscape *m*(**x**) is symmetric, double-peaked, and both peaks have equal height. We consider here an asymmetric case, where the two peaks have different heights. Namely, we consider the case:
for *γ* ≠ 1 (the case *γ* = 1 is treated in the main text), and with a function *b*_{1} with a single optimum at , with , and which decays to 0 away from . We recall that *ι*(**x**) = *ι*(*x*_{1}, *x*_{2}, …, *x*_{n}) = (–*x*_{1}, *x*_{2}, …, *x*_{n}).

In this framework the fitness of the birth optimum is and the fitness of the survival optimum is . In both cases, . We assume here that *ε* ≪ 1, meaning that the phenotype has a birth rate very close to the baseline value *b*_{0}. Similarly, the phenotype has a survival rate *b*_{0} + *γε* close to the baseline survival rate *s*_{0} = *b*_{0}, see the scheme in Figure B.1.

In the main text, with *γ* = 1, we have shown that the trajectories are attracted by the survival optimum. We check here whether this remains true for asymmetric fitness functions (*γ* ≠ 1). In Figure B.2, we depict the position of the mean phenotype (first coordinate) depending on the value of *γ*, at small times (*t* = 40), large times (*t* = 500) and infinite time (in this case, we directly solve the eigenvalue problem (15) (main text) with the Comsol Multiphysics eigenvalue solver). We observe that, at small times, the trajectories are attracted by the birth optimum, whatever the value of *γ*, and reach positions closer to *β* (the first component of the birth optimum) as *γ* is increased. At larger times, we observe a bifurcation threshold *γ** > 1 such that the trajectories are still attracted by the survival optimum when *γ* < *γ**, while they are attracted by the birth optimum for *γ* > *γ**.

We conjecture that the trajectories are attracted by the survival optimum as long as the difference between the fitness peaks is smaller than the difference between the mutation loads that would be associated with an equilibrium distribution around vs around . To check this conjecture, we consider, as in the figures of the main text, a function , and we assume a single-peak landscape with a unique optimum at . The corresponding fitness is *m*^{b}(**x**) = 2*b*_{0} + *γb*_{1}(**x**) – *r*. We make the weak selection approximation in the model and . The results in (Martin and Roques, 2016; Hamel et al., 2020) imply that the equilibrium mean fitness is . The mutation load is: .

Now, consider a single-peak landscape with a unique optimum at , with a fitness function *m*^{s}(**x**) = 2*b*_{0} + *b*_{1}(*ι*(**x**)) – *r*, and make a weak selection approximation Δ(*bq*) ≈ *b*_{0}Δ*q* in the model and . This time, we get , and the mutation load is . Finally, the difference between the fitness peaks is smaller than the difference between the corresponding mutation loads if

With the parameter values in Figure B.2, this leads to *γ** = 1.03, which is fully consistent with the numerical results. More generally, the above formula shows that *γ** is an increasing function of *nD* and 1/*σ*.