Imperfect Strategy Transmission Can Reverse the Role of Population Viscosity on the Evolution of Altruism

Population viscosity, i.e., low emigration out of the natal deme, leads to high within-deme relatedness, which is beneficial to the evolution of altruistic behavior when social interactions take place among deme-mates. However, a detrimental side effect of low emigration is the increase in competition among related individuals. The evolution of altruism depends on the balance between these opposite effects. This balance is already known to be affected by details of the life cycle; we show here that it further depends on the fidelity of strategy transmission from parents to their offspring. We consider different life cycles and identify thresholds of parent–offspring strategy transmission inaccuracy, above which higher emigration can increase the frequency of altruists maintained in the population. Predictions were first obtained analytically assuming weak selection and equal deme sizes and then confirmed with stochastic simulations relaxing these assumptions. Contrary to what happens with perfect strategy transmission from parent to offspring, our results show that higher emigration can be favorable to the evolution of altruism.


Introduction
In his pioneering work on the evolution of social behavior, Hamilton suggested that altruistic behavior would be associated with limited dispersal [17, p. 10]. This notion that tighter links between individuals are beneficial to the evolution of altruism has been shown to hold in a number of population structures (see e.g., [2,23,29,42]). The rationale is that altruism is favored when altruists interact more with altruists than defectors do ( [18, p. 141], [13]), a condition that is met in viscous populations, i.e., populations with limited dispersal.
Yet, living next to your kin also implies competing against them [31,48], which is detrimental to the evolution of altruism. The evolution of social traits hence depends on the balance between the positive effects of interactions with related individuals and the detrimental consequences of kin competition. Under specific conditions, the two effects can even compensate each other, thereby annihilating the impact of population viscosity on the evolution of altruism. First identified with computer simulations [50], this cancelation result was analyzed by Taylor [38] in a model with synchronous generations (i.e., Wright-Fisher model) and a subdivided population of constant, infinite size. The cancelation result was later extended to heterogeneous populations ([32], with synchronous generations and infinite population size), and other life cycles, with generic regular population structures ([40], with synchronous generations but also with continuous generations and Birth-Death updating). However, small changes in the model's assumptions, such as overlapping generations [44] or the presence of empty sites [1], can tip the balance in favor of altruism. This high dependence on life cycle specificities highlights the difficulty of making general statements about the role of spatial structure on the evolution of altruism.
Three different life cycles are classically used in studies on altruism in structured populations: Wright-Fisher, where the whole population is renewed at each time step, and two Moran life cycles (Birth-Death and Death-Birth), where a single individual dies and is replaced at each time step. We will consider the three of them in this study, because even though they differ by seemingly minor details, they are known to have very different outcomes in models with perfect parent-offspring transmission (e.g., [23,29,33,38,39]).
A large number of studies on the evolution of social behavior consider simple population structures (typically, homogeneous populations sensu Taylor et al. [42]) and often also infinite population sizes (but see Allen et al. [2] for results on any structure). These studies also make use of weak selection approximations and commonly assume rare (e.g., [8,25,37,43]) or absent mutation (for models assuming infinite population sizes, or models concentrating on fixation probabilities; see [24,46], for recent reviews). These simplifying assumptions are often a necessary step toward obtaining explicit analytical results. Simple population structures (e.g., regular graphs, or subdivided populations with demes of equal sizes) help reduce the dimensionality of the system under study, in particular when the structure of the population displays symmetries such that all sites behave the same way in expectation. Weak selection approximations are crucial for disentangling spatial moments [26], that is, changes in global versus local frequencies (though they can in some cases be relaxed, as in Mullon and Lehmann [28]). Mutation, however, is usually ignored by classical models of inclusive fitness because these models assume infinite population sizes, so that there is no need to add mechanisms that restore genetic diversity [37]. In populations of finite size, this diversifying effect can be obtained thanks to mutation.
When strategy transmission is purely genetic, it makes sense to assume that mutation is relatively infrequent. Even in this case, though, mutations from "social" to "nonsocial" types cannot always be neglected. For instance, experiments with the bacterium Pseudomonas fluorescens have identified transitions between populations dominated by the ancestral "solitary" Smooth Morph type and mat-forming "social" Wrinkly Spreaders, which can be re-invaded by Smooth Morphs not contributing to the formation of the mat (hence described as "cheaters"). The transitions between the different types are due to spontaneous mutations occurring over the timescale of the experiment [19]. In addition to genetic transmission, a social strategy can also be culturally transmitted from parent to offspring. In this case, "rebellion" (as in Frank's Rebellious Child Model [14]), i.e., adopting a social strategy different from one's parents, does not have to be infrequent. Since it is known that imperfect strategy transmission can alter the evolutionary dynamics of social traits, in particular in spatially structured populations (see e.g., [5,10], for graph-structured populations), it is important to understand the impact of imperfect strategy transmission on the evolution of social behavior.
Here, we want to explore the consequences of imperfect strategy transmission from parents to their offspring on the evolution of altruistic behavior in subdivided populations. The question was tackled by Frank [14], but with a non-"fully dynamic model" ([14], legend of Fig. 7). Relatedness was treated like a parameter, which precluded the exploration of the effects of population viscosity on the evolution of altruism.
For each of the three life cycles that we consider, we compute the expected (i.e., long-term) frequency of altruists maintained in a subdivided population, and investigate how this frequency is affected by mutation and emigration. We find that, contrary to what happens with perfect strategy transmission, higher emigration can increase the expected frequency of altruists in the population.

Assumptions
We consider a population of total size $N$, subdivided into $N_D$ demes connected by dispersal, each deme hosting exactly $n$ individuals (i.e., each deme contains $n$ sites, each of which is occupied by exactly one individual; $n N_D = N$). Each site has a unique label $i$, $1 \le i \le N$. There are two types of individuals in the population, altruists and defectors. The type of the individual living at site $i$ ($1 \le i \le N$) is given by an indicator variable $X_i$, equal to 1 if the individual is an altruist and to 0 if it is a defector. The state of the entire population is given by a vector $\mathbf{X} = \{X_i\}_{1 \le i \le N}$. For a given population state $\mathbf{X}$, the proportion of altruists is $\overline{X} = \sum_{i=1}^{N} X_i / N$. All symbols are summarized in Table 1.
Reproduction is asexual. The offspring of altruists are altruists themselves with probability $1 - \mu_{1\to 0}$ and are defectors otherwise ($0 < \mu_{1\to 0} \le 1/2$). Similarly, the offspring of defectors are defectors with probability $1 - \mu_{0\to 1}$ and are altruists otherwise ($0 < \mu_{0\to 1} \le 1/2$). Our calculations will be simpler if we introduce the following change in parameters:
$$\nu = \frac{\mu_{0\to 1}}{\mu_{1\to 0} + \mu_{0\to 1}}, \tag{1a}$$
$$\mu = \mu_{1\to 0} + \mu_{0\to 1}. \tag{1b}$$
The composite parameter ν corresponds to the expected frequency of altruists in the population at the mutation-drift balance (i.e., in the absence of selection; see "Appendix A" for details). We call ν the "mutation bias" parameter. Parameter μ is the sum of the two mutation probabilities. In the absence of selection, at the mutation-drift equilibrium, the correlation between offspring type and their parent's type is $1 - \mu$ (see "Appendix A" for details of the calculation). We call μ the mutation intensity.
An individual of type $X_k$ expresses a social phenotype $\phi_k = \delta X_k$, where δ is assumed to be small ($\delta \ll 1$). This assumption of small phenotypic differences leads to weak selection. This type of weak selection is called "δ-weak selection" in Wild and Traulsen [49]. Social interactions take place within each deme; a focal individual interacts with its $n - 1$ other deme-mates.
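The change of parameters can be checked numerically. The following minimal sketch treats a single neutral parent-offspring lineage as a two-state Markov chain and verifies that its stationary altruist frequency is ν and that the parent-offspring type correlation is 1 − μ. The function names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
def reparametrize(mu_10, mu_01):
    """Return (nu, mu): mutation bias and mutation intensity.

    mu_10: probability that the offspring of an altruist is a defector.
    mu_01: probability that the offspring of a defector is an altruist.
    """
    mu = mu_10 + mu_01
    nu = mu_01 / mu
    return nu, mu


def neutral_lineage_stats(mu_10, mu_01):
    """Stationary altruist frequency and parent-offspring type correlation
    along one lineage, in the absence of selection."""
    p_alt_given_alt = 1.0 - mu_10  # P(offspring altruist | parent altruist)
    p_alt_given_def = mu_01        # P(offspring altruist | parent defector)
    # Stationary frequency pi solves:
    #   pi = pi * p_alt_given_alt + (1 - pi) * p_alt_given_def
    pi = p_alt_given_def / (1.0 - p_alt_given_alt + p_alt_given_def)
    # Correlation of two 0/1 variables that share the same marginal pi.
    covariance = pi * p_alt_given_alt - pi * pi
    variance = pi * (1.0 - pi)
    return pi, covariance / variance


# Hypothetical mutation probabilities, chosen only for illustration.
nu, mu = reparametrize(0.03, 0.01)
pi, corr = neutral_lineage_stats(0.03, 0.01)
print(nu, mu, pi, corr)
```

An equivalent reading of the same parametrization: with probability μ the offspring draws a random type (altruist with probability ν), and with probability 1 − μ it copies its parent, since then $\mu_{1\to 0} = \mu(1-\nu)$ and $\mu_{0\to 1} = \mu\nu$.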
We assume that social interactions affect individual fecundity; $f_k$ denotes the fecundity of the individual at site $k$ ($1 \le k \le N$), which depends on deme composition. We denote by $b$ the sum of the marginal effects of deme-mates' phenotypes on the fecundity of a focal individual and by $-c$ the marginal effect of a focal individual's phenotype on its own fecundity ($c \le b$; see system (A22) for formal definitions).
Offspring remain in the parental deme with probability $1 - m$ and land on any site of the parental deme with equal probability (including the very site of their parent). With probability $m$, offspring emigrate to a different deme, chosen uniformly at random among the $N_D - 1$ other demes. Denoting by $d_{ij}$ the probability of moving from site $i$ to site $j$, we have
$$d_{ij} = \begin{cases} d_{\mathrm{in}} = \dfrac{1-m}{n} & \text{if sites } i \text{ and } j \text{ are in the same deme;} \\ d_{\mathrm{out}} = \dfrac{m}{(N_D - 1)\, n} & \text{if they are in different demes,} \end{cases} \tag{2}$$
with $0 < m < 1 - 1/N_D$. This upper bound is here to ensure that within-deme relatedness $R$, which will be defined later in the article, remains positive. When the emigration probability $m$ is equal to the upper bound $1 - 1/N_D$, the population is effectively well mixed ($d_{\mathrm{in}} = d_{\mathrm{out}}$).
We denote by $B_i = B_i(\mathbf{X}, \delta)$ the expected number of successful offspring of the individual living at site $i$ ("successful" means alive at the next time step) and by $D_i = D_i(\mathbf{X}, \delta)$ the probability that the individual living at site $i$ dies. Both depend on the state of the population $\mathbf{X}$, but also on the way the population is updated from one time step to the next, i.e., on the chosen life cycle (also called updating rule). Because this term appears in our calculations, we also define
$$W_i = (1 - \mu)\, B_i - D_i. \tag{3}$$
This is a particular definition of fitness, where the number of offspring produced ($B_i$) is scaled by the parent-offspring-type correlation ($1 - \mu$).
We will specifically explore three different life cycles. At the beginning of each step of each life cycle, all individuals produce a large (effectively infinite) number of offspring, in proportion to their fecundity; some of these offspring can be mutated. Then, these juveniles move, within the parental deme or outside of it, and land on a site. The next events occurring during the time step depend on the life cycle:
Moran Birth-Death: One of the newly created juveniles is chosen at random; it kills the adult who was living at the site, and replaces it; all other juveniles die.
Moran Death-Birth: One of the adults is chosen to die (uniformly at random among all adults). It is replaced by one of the juveniles who had landed in its site. All other juveniles die.
Wright-Fisher: All the adults die. At each site of the entire population, one of the juveniles that landed there is chosen and establishes at the site.
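The dispersal rule described above can be written out as a site-to-site matrix. The sketch below assumes that offspring staying at home (probability 1 − m) land uniformly on the n sites of their deme, and that emigrants (probability m) land uniformly on the (N_D − 1) n sites of the other demes; the function name and argument names are illustrative.

```python
def dispersal_matrix(n_demes, deme_size, m):
    """Return d[i][j], the probability that an offspring born at site i
    lands on site j, in an island model with equal deme sizes."""
    total_sites = n_demes * deme_size
    d_in = (1.0 - m) / deme_size               # same deme (own site included)
    d_out = m / ((n_demes - 1) * deme_size)    # any site of another deme
    deme_of = [site // deme_size for site in range(total_sites)]
    return [
        [d_in if deme_of[i] == deme_of[j] else d_out
         for j in range(total_sites)]
        for i in range(total_sites)
    ]


# Illustrative values: 15 demes of 4 sites each.
d = dispersal_matrix(n_demes=15, deme_size=4, m=0.2)
# At the upper bound m = 1 - 1/N_D, within- and between-deme entries coincide.
d_mixed = dispersal_matrix(n_demes=15, deme_size=4, m=1.0 - 1.0 / 15)
```

Every row sums to one, and at the upper bound of m all entries equal 1/N, which is the well-mixed case mentioned in the text.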
Previous studies have shown that, when social interactions affect fecundity, altruism is disfavored under the Moran Birth-Death and Wright-Fisher life cycles, because the expected frequency of altruists under these life cycles is lower than what it would be in the absence of selection (e.g., [10,38-40]). However, we are interested in the actual value of the expected proportion of altruists in the population, not just whether it is higher or lower than the neutral expectation. This is why we are still considering the Moran Birth-Death and Wright-Fisher life cycles in this study.

Analytical Part
The calculation steps to obtain the expected (i.e., long-term) proportion of altruists are given in "Appendix B." They go as follows: First, we write an equation for the expected frequency of altruists in the population at time $t + 1$, conditional on the composition of the population at time $t$; we then take the expectation of this quantity and consider large times $t$. After this, we write a first-order expansion for phenotypic differences δ close to 0. (This corresponds to a weak selection approximation.) The formula involves quantities that can be identified as neutral probabilities of identity by descent $Q_{ij}$. These quantities correspond to the probability that individuals living at sites $i$ and $j$ share a common ancestor and that no mutation occurred on either lineage since that ancestor, in a model with no selection (δ = 0) and with mutation intensity μ; this is the "mutation definition" of identity by descent [34]. In a subdivided population like the one we consider, there are only three possible values of $Q_{ij}$: $Q_{ij} = 1$ when $i = j$, $Q_{ij} = Q_{\mathrm{in}}$ when $i \neq j$ and both sites are in the same deme, and $Q_{ij} = Q_{\mathrm{out}}$ when the two sites are in different demes.

Stochastic Simulations
To check our results and also relax some key assumptions, we ran stochastic simulations. The simulations were run for $10^8$ generations. (One generation is one time step for the Wright-Fisher life cycle and $N$ time steps for the Moran life cycles.) For each set of parameters and life cycle, we estimated the long-term frequency of altruists by sampling the population every $10^3$ generations and computing the average frequency of altruists. All scripts are available in Ref. [11].
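As a rough illustration of this kind of simulation (not the scripts of Ref. [11]), here is a minimal Wright-Fisher sketch using only the Python standard library. Several ingredients are assumptions made for the example: the linear fecundity function 1 + δ(b × mean phenotype of deme-mates − c × own phenotype), the default parameter values, the much shorter run length, and sampling at every generation after a burn-in.

```python
import random


def simulate_wright_fisher(n_demes=15, deme_size=4, m=0.2, mu=0.1, nu=0.45,
                           delta=0.0, b=15.0, c=1.0,
                           generations=1000, burn_in=200, seed=1):
    """Time-averaged frequency of altruists under a Wright-Fisher life cycle.

    Assumed fecundity (hypothetical form): 1 + delta * (b * mean type of the
    n - 1 deme-mates - c * own type). Requires deme_size >= 2.
    """
    rng = random.Random(seed)
    total_sites = n_demes * deme_size
    d_in = (1.0 - m) / deme_size
    d_out = m / ((n_demes - 1) * deme_size)
    x = [1 if rng.random() < nu else 0 for _ in range(total_sites)]
    running_sum, n_samples = 0.0, 0
    for t in range(generations):
        # Fecundities depend on the composition of each deme.
        fecundity = []
        for deme in range(n_demes):
            members = x[deme * deme_size:(deme + 1) * deme_size]
            n_altruists = sum(members)
            for own in members:
                mates = (n_altruists - own) / (deme_size - 1)
                fecundity.append(1.0 + delta * (b * mates - c * own))
        # All adults die; each site draws one juvenile among those landing there.
        new_x = []
        for deme in range(n_demes):
            weights = [fecundity[i] * (d_in if i // deme_size == deme else d_out)
                       for i in range(total_sites)]
            parents = rng.choices(range(total_sites), weights=weights,
                                  k=deme_size)
            for parent in parents:
                if rng.random() < mu:
                    # Imperfect transmission: random type, altruist w.p. nu.
                    new_x.append(1 if rng.random() < nu else 0)
                else:
                    new_x.append(x[parent])
        x = new_x
        if t >= burn_in:
            running_sum += sum(x) / total_sites
            n_samples += 1
    return running_sum / n_samples


if __name__ == "__main__":
    # Neutral run (delta = 0): the average should stay close to nu.
    print(simulate_wright_fisher(mu=0.2, generations=3000))
```

With δ = 0 the time average should fluctuate around ν, which gives a quick sanity check before switching selection on.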

Expected Frequencies of Altruists for Each Life Cycle
For each of the life cycles that we consider, the expected frequency of altruists in the population, $E[\overline{X}]$, can be approximated as
$$E[\overline{X}] \approx \nu + \delta\, \frac{\nu (1 - \nu)}{\mu B^*}\, (1 - Q_{\mathrm{out}}) \left[ \underbrace{\frac{\partial W}{\partial \phi_\bullet}}_{-\mathcal{C}} + \underbrace{(n - 1) \frac{\partial W}{\partial \phi_{\mathrm{in}}}}_{\mathcal{B}} \times \underbrace{\frac{Q_{\mathrm{in}} - Q_{\mathrm{out}}}{1 - Q_{\mathrm{out}}}}_{R} \right], \tag{5}$$
with $W$ as defined in Eq. (3). Calculations leading to Eq. (5) are presented in "Appendix B"; notations are recapitulated in Table 1. In particular, $B^*$ is the expected number of offspring produced by an adult, in the absence of selection (when δ = 0; $B^* = 1$ for the Wright-Fisher life cycle and $B^* = 1/N$ for the Moran life cycles). Subscript "•" denotes a focal individual itself and "in" a deme-mate. Partial derivatives are evaluated for δ = 0. The expected frequency of altruists in the population is approximated, under weak selection ($\delta \ll 1$), by the sum of what it would be in the absence of selection [$E_0[\overline{X}] = \nu$, first term in Eq. (5)], plus a deviation from this value, scaled by δ. The $-\mathcal{C}$ term corresponds to the effects of a change in a focal individual's phenotype on its own fitness [with the fitness definition given in Eq. (3)]. The $\mathcal{B}$ term corresponds to the sum of the effects of the change in deme-mates' phenotypes on an individual's fitness. It is multiplied by $R$, which is relatedness.
The parametrization proposed in Eq. (1) allows us to decouple the effects of the two new mutation parameters, ν and μ. The mutation bias ν, which was defined in Eq. (1a), does not affect the sign of the second ("deviation") term in Eq. (5); it only appears in the $\nu(1 - \nu)$ product. The mutation intensity μ, however, affects the values of $W$, $Q_{\mathrm{in}}$ and $Q_{\mathrm{out}}$. The presence of μ in the denominator in Eq. (5) may look ominous; however, both $R$ and $(1 - Q_{\mathrm{out}})/\mu$ have a finite limit when $\mu \to 0$.
The different terms depend on the chosen life cycle. We first focus on relatedness R.

Relatedness R
Within-deme relatedness $R$ depends on the number of individuals that are born at each time step and hence on the chosen life cycle. In a Moran life cycle (denoted by M), one individual is updated at each time step, while under a Wright-Fisher life cycle (denoted by WF), $N$ individuals (the whole population) are updated at each time step. The formulas for relatedness, $R^{\mathrm{M}}$ and $R^{\mathrm{WF}}$, calculated for any number of demes $N_D$ and mutation intensity μ, are presented in "Appendix C.2" (Eqs. A44 and A50). When we let the number of demes go to infinity ($N_D \to \infty$) and the intensity of mutation be vanishingly small ($\mu \to 0$), we recover the classical formulas for relatedness as limit cases (Eqs. A45 and A51). The effects of emigration $m$ and mutation intensity μ on relatedness are represented in Fig. 1. For $0 < m < 1 - 1/N_D$, within-deme relatedness is positive, and it decreases with $m$ and with μ. (The mutation bias ν has no effect.) The effect of the mutation intensity μ on relatedness is strongest at low emigration probabilities $m$. As $m$ increases, the relatedness values for different mutation intensities get closer, until they all hit zero for $m = 1 - 1/N_D$ (which is the upper bound for the emigration values that we consider, a value such that there is no proper population subdivision anymore).

Primary and Secondary Effects
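We now turn to the $\mathcal{B}$ and $-\mathcal{C}$ terms of Eq. (5), which also depend on the chosen life cycle. We further decompose these terms into primary (subscript P) and secondary (subscript S) effects [47]: $\mathcal{B} = \mathcal{B}_{\mathrm{P}} + \mathcal{B}_{\mathrm{S}}$ and $-\mathcal{C} = -\mathcal{C}_{\mathrm{P}} - \mathcal{C}_{\mathrm{S}}$.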

Primary Effects
Primary effects are the same for all the life cycles that we consider [Eq. (7)], and they do not depend on the emigration probability $m$ (see "Appendix B.2" for details of the calculations).
As we have seen above, the relatedness terms $R^{\mathrm{M}}$ and $R^{\mathrm{WF}}$ decrease with $m$ (keeping $m < 1 - 1/N_D$; see Fig. 1). Consequently, if we ignored secondary effects, we would conclude that the expected frequency of altruists in the population $E[\overline{X}]$ decreases as the emigration probability $m$ increases. However, secondary effects play a role as well.

Secondary Effects
Secondary effects take competition into account, that is, how the change in the fecundity of an individual affects the fitness of another one. As shown already in models with nearly perfect strategy transmission [16], competition terms depend on the chosen life cycle, because life cycle details affect the distance at which competitive effects are felt. Given the way the model is formulated, $-\mathcal{C}_{\mathrm{S}} = \mathcal{B}_{\mathrm{S}}/(n - 1)$ holds for all the life cycles that we consider (see "Appendix B.2" for details of the calculations).
Under the Moran Birth-Death life cycle, both the probability of reproducing and the probability of dying depend on the composition of the population; the resulting secondary effects are given in Eq. (8a). The competitive effects are the same for the Moran Death-Birth and Wright-Fisher life cycles. In both cases, the probabilities of dying are constant, so we can factor $(1 - \mu)$ in the equations, which yields Eq. (8b). These secondary effects (Eqs. 8a and 8b) remain negative for the range of emigration values that we consider ($0 < m < 1 - 1/N_D$), and increase with $m$. In other words, the intensity of competition decreases as emigration $m$ increases.
While the value of these secondary effects increases with emigration $m$, relatedness $R$, by which they are eventually multiplied in Eq. (5), decreases with $m$. We therefore cannot determine the overall effect of emigration $m$ on the expected frequency of altruists in the population by inspecting the different terms of Eq. (5) in isolation. For each life cycle, we need to consider the entire equations to know the overall effect of the emigration probability $m$ on the expected frequency of altruists $E[\overline{X}]$ and on how it is affected by the (in)fidelity of parent-offspring transmission μ.

Changes of the Expected Frequency of Altruists with the Emigration Probability m
The rather lengthy formulas that we obtain are relegated to the Appendix and a supplementary Mathematica file, and we concentrate here on the results.

Moran Birth-Death
For the Moran Birth-Death life cycle, we find that the expected frequency of altruists $E[\overline{X}]$ is a monotonic function of the emigration probability $m$. The direction of the change depends on the value of the mutation probability μ compared to a threshold value $\mu_c^{\mathrm{BD}}$. When $\mu < \mu_c^{\mathrm{BD}}$, $E[\overline{X}]$ decreases with $m$, while when $\mu > \mu_c^{\mathrm{BD}}$, $E[\overline{X}]$ increases with $m$. The critical value $\mu_c^{\mathrm{BD}}$ is given by Eq. (9) (recall that $N$ is the total size of the population, $N = n N_D$). This result is illustrated in Fig. 2b; with the parameters of the figure, $\mu_c^{\mathrm{BD}} \approx 0.026$. The threshold value increases with both deme size $n$ and number of demes $N_D$, up to a maximum value $1 - \sqrt{1 - c/b}$ (equal to 0.034 with the parameters of Fig. 2b).
With this life cycle, however, the expected frequency of altruists $E[\overline{X}]$ remains lower than ν, its value in the absence of selection (i.e., when δ = 0).

Moran Death-Birth
The relationship between $E[\overline{X}]$ and $m$ is a bit more complicated for the Moran Death-Birth life cycle. For simplicity, we concentrate on what happens starting from low emigration probabilities (i.e., on the sign of the slope of $E[\overline{X}]$ as a function of $m$ when $m \to 0$). If the benefits $b$ provided by altruists are relatively low ($b < c(n + 1)$), $E[\overline{X}]$ initially increases with $m$ provided the mutation probability μ is greater than a threshold value $\mu_c^{\mathrm{DB}}$ given in Eq. (10); otherwise, when the benefits are high enough, $E[\overline{X}]$ initially increases with $m$ for any value of μ. Combining these results, we obtain Eq. (10). When $b < c(n + 1)$, the mutation threshold does not depend on the number of demes $N_D$, but increases with deme size $n$. In Fig. 2a, the parameters are such that $\mu_c^{\mathrm{DB}} = 0$. When $\mu > \mu_c^{\mathrm{DB}}$, the expected frequency of altruists $E[\overline{X}]$ reaches a maximum at an emigration probability $m_c^{\mathrm{DB}}$ (whose complicated equation is given in the supplementary Mathematica file), as shown in Fig. 2a. When the mutation probability gets close to 0 ($\mu \to 0$), $m_c^{\mathrm{DB}}$ also gets close to 0. With the Death-Birth life cycle, the expected frequency of altruists is higher than its neutral value ν for intermediate values of the emigration probability $m$ (unless $\mu \to 0$, in which case the lower bound tends to 0).

Wright-Fisher
Under a Wright-Fisher updating, the expected frequency of altruists in the population reaches an extremum at the highest admissible emigration value $m = 1 - 1/N_D$. This extremum is a maximum when the mutation probability is higher than a threshold value $\mu_c^{\mathrm{WF}}$ given by Eq. (11), and it is a minimum otherwise. With the parameters of Fig. 2c, $\mu_c^{\mathrm{WF}} = 0.034$.
With the Wright-Fisher life cycle, however, the expected frequency of altruists remains below its value in the absence of selection, ν.

Relaxing Key Assumptions
To derive our analytical results, we had to make a number of simplifying assumptions, such as the fact that selection is weak ($\delta \ll 1$) and the fact that the structure of the population is regular. (All demes have the same size $n$.) We checked with numerical simulations the robustness of our results when these key assumptions are relaxed.

Strong selection
When selection is strong, the patterns that we identified not only still hold but are even more marked, as shown in Fig. 4.

Heterogeneity in deme sizes
To relax the assumption of equal deme sizes, we randomly drew deme sizes at the beginning of simulations, with sizes ranging from 2 to 6 individuals and on average n = 4 individuals per deme as previously. As shown in Fig. 5, the patterns initially obtained with a homogeneous population structure are robust when the structure is heterogeneous.
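One possible way to draw such deme sizes is sketched below. The text does not specify the sampling scheme, so the uniform integer draws with rejection until the total matches the target population size are an assumption, as are the function name and the default of 15 demes.

```python
import random


def draw_deme_sizes(n_demes=15, low=2, high=6, mean_size=4, seed=0):
    """Draw integer deme sizes in [low, high] whose average is mean_size.

    Assumed scheme: independent uniform draws, rejected until the sizes sum
    to n_demes * mean_size.
    """
    target = n_demes * mean_size
    if not low * n_demes <= target <= high * n_demes:
        raise ValueError("target total size is not reachable with these bounds")
    rng = random.Random(seed)
    while True:
        sizes = [rng.randint(low, high) for _ in range(n_demes)]
        if sum(sizes) == target:
            return sizes


sizes = draw_deme_sizes()
print(sizes, sum(sizes) / len(sizes))
```

Conditioning on the total keeps the population size N identical to the homogeneous case, so that only the distribution of sizes across demes changes.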

No self-replacement
For the Moran model, it may seem odd that an offspring can replace its own parent (which can occur since $d_{ii} \neq 0$). Figure 6, plotted with dispersal probabilities preventing immediate replacement of one's own parent (for all sites $i$, $d_{ii} = d_{\mathrm{self}} = 0$; $d_{\mathrm{in}} = (1 - m)/(n - 1)$ for two different sites in the same deme, $d_{\mathrm{out}}$ remaining unchanged), confirms that this does not affect our conclusions.

Infinite number of demes
Our results are obtained in a population of finite size (the figures are drawn with $N_D = 15$ demes), but still hold when the size of the population is larger. Figure 3b shows the range of emigration and mutation values such that altruism is favored, plotted also for $N_D \to \infty$.

Same graphs for dispersal and social interactions
Compared to graphs classically used in evolutionary graph theory (e.g., regular random graphs, grids), the island model is particular because the interaction graph and the dispersal graph are different: Interactions take place only within demes ($e_{\mathrm{out}} = 0$), while offspring can disperse out of their natal deme ($d_{\mathrm{out}} > 0$). One may wonder whether our result depends on this difference between the two graphs. Figure 7 shows that the result still holds when the dispersal and interaction graphs are the same. In this figure indeed, we let a proportion $m$ (equal to the dispersal probability) of interactions occur outside of the deme where the individuals live, and set $d_{\mathrm{self}}$, the probability of self-replacement, equal to 0, so that the dispersal and interaction graphs are the same. Our conclusions remain unchanged.

The Expected Frequency of Altruists in a Subdivided Population Can Increase with the Probability of Emigration
Assuming that the transmission of a social strategy (being an altruist or a defector) from a parent to its offspring could be imperfect, we found that the expected frequency of altruists maintained in a population could increase with the probability m of emigration out of the parental deme, a parameter tuning population viscosity. This result can seem surprising, because it contradicts the conclusions obtained under the assumption of nearly perfect strategy transmission (i.e., in the case of genetic transmission, when mutation is very weak or absent). Under nearly perfect strategy transmission indeed, increased population viscosity (i.e., decreased emigration probability) is either neutral ( [38], and dashed lines in Fig. 2b-c) or favorable ( [42], and dashed lines in Fig. 2a) to the evolution of altruistic behavior.

Quantitative Versus Qualitative Measures
Often, evolutionary success is measured qualitatively, by comparing a quantity (an expected frequency, or, in models with no mutation, a probability of fixation) to the value it would have in the absence of selection. In our model, this amounts to saying that altruism is favored whenever $E[\overline{X}] > \nu$. (ν is plotted as a horizontal dashed line in Fig. 2.) Some of our conclusions change if we use this qualitative measure of evolutionary success: Under the Moran Birth-Death and Wright-Fisher life cycles, population viscosity does not promote the evolution of altruism; actually, these two life cycles cannot ever promote altruistic behavior for any regular population structure [40], whatever the probability of mutation [10]. However, under a Moran Death-Birth life cycle (Fig. 2a), altruism can be favored only at intermediate emigration probabilities. Starting from initially low values of $m$, increasing the emigration probability can still favor the evolution of altruism under this qualitative criterion (see Fig. 3b).

Interpreting the Effect of $m$ on $E[\overline{X}]$
To better understand the role played by the mutation intensity μ, we focus on the qualitative condition for the evolution of altruism ($E[\overline{X}] > \nu$) and on the Death-Birth life cycle, since this qualitative condition is not satisfied in the two other life cycles. Having made sure that $\mathcal{B}^{\mathrm{DB}} > 0$ (as shown in the supplementary Mathematica file), the qualitative condition for altruism to be favored is given by
$$R^{\mathrm{M}} > \frac{\mathcal{C}^{\mathrm{DB}}}{\mathcal{B}^{\mathrm{DB}}}. \tag{12}$$
With the Death-Birth life cycle, the $\mathcal{C}^{\mathrm{DB}}/\mathcal{B}^{\mathrm{DB}}$ ratio does not change with the mutation probability μ (the $(1 - \mu)$ factors cancel out), but the ratio decreases with the emigration probability $m$ (with $0 < m < 1 - 1/N_D$; see the thick black curve in Fig. 3a). This decrease in the $\mathcal{C}^{\mathrm{DB}}/\mathcal{B}^{\mathrm{DB}}$ ratio is due to secondary effects (competition) diminishing as emigration increases. Relatedness, on the other hand, decreases with both μ and $m$ (see Fig. 3a). We need to explain the effect of the emigration probability $m$ on condition (12) for different values of mutation intensity μ.
When the emigration probability $m$ is high, relatedness gets closer to zero for all values of mutation intensity μ, while the $\mathcal{C}^{\mathrm{DB}}/\mathcal{B}^{\mathrm{DB}}$ ratio remains positive; condition (12) is not satisfied. On the other hand, when the emigration probability $m$ is vanishingly small, $\lim_{m \to 0} R^{\mathrm{M}} \le \lim_{m \to 0} \mathcal{C}^{\mathrm{DB}}/\mathcal{B}^{\mathrm{DB}}$, the two only being equal when μ = 0. Hence, condition (12) is satisfied for vanishingly low $m$ only when strategy transmission is perfect. Finally, as $m$ increases to intermediate values, the $\mathcal{C}^{\mathrm{DB}}/\mathcal{B}^{\mathrm{DB}}$ ratio decreases with a steeper slope than relatedness $R$, so that the curves can cross provided the mutation probability μ is not too high, i.e., that $R$ was not initially too low already. Hence, for not too high mutation intensity, there is a range of emigration values $m$ such that condition (12) is satisfied.
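The crossing argument can be made concrete with a small numerical scan: given relatedness and the cost-to-benefit ratio sampled on a grid of emigration values, keep the values of m where relatedness exceeds the ratio. The helper below is generic; the two curves fed to it are toy shapes invented for this illustration (a linearly decreasing relatedness and a ratio that drops steeply at low m), not the formulas of the Appendix.

```python
def favorable_emigration_values(m_grid, relatedness, cost_benefit_ratio):
    """Return the grid values of m at which relatedness exceeds the
    cost-to-benefit ratio (the qualitative condition for altruism)."""
    return [m for m, r, ratio in zip(m_grid, relatedness, cost_benefit_ratio)
            if r > ratio]


# Toy curves, hypothetical and for illustration only.
n_demes = 15
m_max = 1.0 - 1.0 / n_demes            # upper bound on emigration
m_grid = [i / 100 for i in range(0, 94)]
toy_relatedness = [0.9 * (1.0 - m / m_max) for m in m_grid]  # reaches 0 at m_max
toy_ratio = [1.0 / (1.0 + 20.0 * m) for m in m_grid]         # steep drop at low m

favorable = favorable_emigration_values(m_grid, toy_relatedness, toy_ratio)
print(min(favorable), max(favorable))
```

With these toy shapes the condition fails at m = 0 and near the upper bound, and holds on an intermediate range, which mirrors the qualitative pattern described in the text.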

The Result is Due To Secondary Effects
The result that the frequency of altruists can increase with the emigration probability $m$ may seem counterintuitive. It is the case because verbal explanations for the evolution of altruism often rely on primary effects only. Relatedness $R$ decreases with $m$, so it may be tempting to conclude that increases in the emigration probability $m$ are necessarily detrimental to the evolution of altruism. However, secondary effects play an opposite role, as competition decreases with $m$, and the effect is strongest at low values of $m$ (see the black curve in Fig. 3a; in the absence of secondary effects, it would just be a horizontal line).
Secondary effects are less straightforward to understand than primary effects, and yet they play a crucial role for social evolution in spatially structured populations. Competition among relatives is for instance the reason for Taylor's [41] cancelation result. Similarly, the qualitative differences between the Moran Birth-Death and Moran Death-Birth life cycles are explained by the different scales of competition that the two life cycles produce [12,16]. Secondary effects are also behind the evolution of social behaviors such as spite [47].

How Small is Small and How Large is Large?
Our results were derived under the assumption of weak selection, assuming that the phenotypic difference between altruists and defectors is small ($\delta \ll 1$). We considered any fidelity of transmission (any μ between 0 and 1) and population size. However, most models considering subdivided populations assume nearly perfect strategy transmission ($\mu \to 0$) and infinite population sizes (number of demes $N_D \to \infty$). The point is technical, but it is important to know that the order in which these limits are taken matters, i.e., one needs to specify how small μ and δ are compared to the inverse size of the population $1/N$. This is in particular the case for the probability of identity by descent of two individuals in different demes, $Q_{\mathrm{out}}$: If we first take the small mutation limit, $\lim_{\mu \to 0} Q_{\mathrm{out}} = 1$, while if we first take the large population limit, $\lim_{N \to \infty} Q_{\mathrm{out}} = 0$ (see "Appendix C.2" for details). This remark complements findings by Sample and Allen [35], who highlighted the quantitative differences between different orders of weak selection and large population limits.
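The sensitivity to the order of limits can be illustrated with a simpler, textbook analog: a well-mixed haploid Wright-Fisher population of size N, in which each lineage escapes mutation with probability 1 − μ per generation. This is not the formula for Q_out from Appendix C.2; the recursion below and the function name are assumptions introduced for illustration. For two distinct individuals, the stationary probability of identity Q satisfies Q = (1 − μ)² [1/N + (1 − 1/N) Q].

```python
def identity_well_mixed(mu, pop_size):
    """Stationary probability of identity by descent of two distinct
    individuals in a well-mixed Wright-Fisher population (textbook analog).

    Solves Q = (1 - mu)**2 * (1/N + (1 - 1/N) * Q) for Q.
    """
    no_mutation = (1.0 - mu) ** 2
    return no_mutation / (pop_size - (pop_size - 1) * no_mutation)


# Small mutation first (mu -> 0 at fixed N): identity approaches 1.
q_small_mu = identity_well_mixed(mu=1e-9, pop_size=100)
# Large population first (N -> infinity at fixed mu): identity approaches 0.
q_large_n = identity_well_mixed(mu=0.01, pop_size=10**9)
print(q_small_mu, q_large_n)
```

The same expression therefore tends to 1 or to 0 depending on whether μ or 1/N is sent to zero first, which is why the relative sizes of μ and 1/N need to be specified.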

Imperfect Transmission and Rebellious Children
Our model bears resemblance to the Rebellious Child Model by Frank [14], who studied the evolution of a vertically transmitted cultural trait in an asexually reproducing population. In Frank's model, however, relatedness r is treated as a fixed parameter ( [14], legend of Figure 7). Our model is mechanistic; relatedness r necessarily depends on the mutation probability μ, because probabilities of identity by descent do.
Mutation was also previously included in models investigating the maintenance of cooperative microorganisms in the presence of cheaters [7,15]. In both of these models, however, only loss-of-function mutation was considered, which corresponds to setting the mutation bias at ν = 0 in our model. This means that the all-cheaters state is absorbing; no matter how favored cooperators may otherwise be, in the long run, a finite population will only consist of cheaters.

Cultural Transmission
Strategy transmission does not have to be genetic: It can be cultural. In our model, strategy transmission occurs upon reproduction, so this is a case of vertical cultural transmission.
The model could nevertheless be interpreted as a representation of horizontal transmission, if we described reproduction as an instance of an individual convincing another one to update its strategy. The Moran Death-Birth model can be interpreted as a modified imitation scheme [6,29,45] (with a specific function specifying who is imitated) with mutation [20], or as a voter model [36]. First, we choose uniformly at random an individual who may change its strategy; with probability μ, the individual chooses a random strategy (altruistic with probability ν), and with probability 1−μ, it imitates another individual. Who is imitated depends on the distance to the focal individual (with probability m it is a random individual in another deme) and on the "fecundities" of those individuals (as shown in Table 2). With this interpretation of the updating rule, however, there is neither reproduction nor death anymore.
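The imitation reading of the Death-Birth update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sites are stored deme by deme, the function name `imitation_step` is hypothetical, fecundities are passed in as a plain list (the formulas of Table 2 are not reproduced here), and the within-deme and between-deme weights $(1-m)/n$ and $m/((N_D-1)n)$ are assumed, with the focal individual included among its own deme-mates.

```python
import random

def imitation_step(types, fecundities, n, m, mu, nu, rng=random):
    """One imitation update (hypothetical helper, not the paper's code).

    types       : list of 0/1 strategies (1 = altruist), sites grouped by deme
    fecundities : list of positive weights, one per site (assumed given)
    n           : deme size; m : emigration probability
    mu, nu      : probability of picking a random strategy, and of that
                  random strategy being altruistic
    """
    n_sites = len(types)
    n_demes = n_sites // n
    focal = rng.randrange(n_sites)          # individual who may change strategy
    if rng.random() < mu:
        # Random strategy: altruistic with probability nu
        types[focal] = 1 if rng.random() < nu else 0
        return focal
    # Otherwise imitate a model, weighted by distance and fecundity
    focal_deme = focal // n
    weights = []
    for j in range(n_sites):
        if j // n == focal_deme:
            d = (1.0 - m) / n               # assumed within-deme weight
        else:
            d = m / ((n_demes - 1) * n)     # assumed between-deme weight
        weights.append(d * fecundities[j])
    model = rng.choices(range(n_sites), weights=weights, k=1)[0]
    types[focal] = types[model]
    return focal

# Example: 2 demes of 4 sites, neutral fecundities
rng = random.Random(42)
population = [1, 1, 0, 0, 0, 0, 0, 0]
for _ in range(100):
    imitation_step(population, [1.0] * 8, n=4, m=0.1, mu=0.05, nu=0.5, rng=rng)
print(population)
```

No individual is born or dies in this loop; only strategies are overwritten, which is the point of the horizontal-transmission interpretation.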
It remains to be investigated how imperfect strategy transmission would alter the effect of population viscosity on the evolution of altruism in a model implementing both reproduction and horizontal cultural transmission (as in Lehmann et al. [22]). Such a model could then contrast the effects of imperfect genetic transmission and imperfect horizontal cultural transmission.

Coevolution of Dispersal and Social Behavior
This work also raises the question of what would happen if dispersal (e.g., the emigration probability m) could evolve as well. Recent work on the topic has shown that under some conditions disruptive selection could take place, leading to a polymorphism between sessile altruists and mobile defectors [27,30], though more complex coevolutionary patterns can be obtained when considering the coevolution of altruism and mobility instead of natal dispersal, and unsaturated populations [21]. The assumptions of these studies, however, differ from ours in important ways: they consider continuous traits and use an adaptive dynamics framework, where, notably, mutations are assumed to be very rare. It remains to be investigated how non-rare and potentially large mutations would affect their results.

A.1 Expected Frequency of Altruists at the Mutation-Drift Balance
We assume that there is no selection acting (δ = 0), but that there still are two types of individuals in the population. Let Y be the type of a randomly chosen individual (Y = 1 if the individual is an altruist and Y = 0 if it is a defector) in the population, given a proportion y of altruists in the population.
In expectation, we have
$$E[Y] = y.$$
Let $Y'$ be the type of a randomly chosen individual at the next time step, given the frequency $y$ at the previous time step. This randomly chosen individual is an altruist if its parent was (which happens with probability $y$) and it did not mutate (probability $1 - \mu_{1\to 0}$), or if its parent was not an altruist (probability $1 - y$) but the offspring mutated into one (probability $\mu_{0\to 1}$). We obtain
$$E[Y'] = y\,(1 - \mu_{1\to 0}) + (1 - y)\,\mu_{0\to 1}.$$
The expected frequency of altruists at the mutation-drift balance, denoted by $\nu$, is found by solving $E[Y'] = E[Y] = \nu$:
$$\nu = \frac{\mu_{0\to 1}}{\mu_{1\to 0} + \mu_{0\to 1}}. \tag{A2}$$
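As a quick numerical check of this balance, the neutral recursion described above can be iterated until it stops changing and compared with the fixed point $\mu_{0\to1}/(\mu_{1\to0}+\mu_{0\to1})$. The sketch below is illustrative only; the function names and the parameter values are assumptions, not taken from the article.

```python
def next_expected_frequency(y, mu_10, mu_01):
    """Expected altruist frequency one step later, without selection.

    An offspring is an altruist if its parent was one and it did not
    mutate, or if its parent was a defector and it mutated.
    """
    return y * (1.0 - mu_10) + (1.0 - y) * mu_01

def mutation_drift_balance(mu_10, mu_01, y0=0.5, tol=1e-12, max_iter=1_000_000):
    """Iterate the recursion from y0 until successive values differ by < tol."""
    y = y0
    for _ in range(max_iter):
        y_next = next_expected_frequency(y, mu_10, mu_01)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y

# Illustrative (assumed) mutation probabilities
mu_10, mu_01 = 0.03, 0.01
nu_iterated = mutation_drift_balance(mu_10, mu_01)
nu_closed_form = mu_01 / (mu_10 + mu_01)
print(nu_iterated, nu_closed_form)
```

The iteration converges geometrically at rate $1 - \mu_{1\to0} - \mu_{0\to1}$, so the starting frequency `y0` does not affect the result.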

A.2 Parent-Offspring Correlation at the Mutation-Drift Balance
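We can then compute the parent-offspring type correlation at the mutation-drift balance, where $E[Y] = E[Y'] = \nu$. First, let us compute the parent-offspring covariance:
$$\mathrm{Cov}(Y, Y') = E[Y Y'] - E[Y]\,E[Y'] = \nu\,(1 - \mu_{1\to 0}) - \nu^2. \tag{A3}$$
Remember that $Y$ and $Y'$ are indicator variables and therefore take values in $\{0, 1\}$, so that $Y^2 = Y$ (likewise for $Y'$). Then, the standard deviations are given by
$$\sigma_Y = \sqrt{E[Y^2] - E[Y]^2} = \sqrt{\nu (1 - \nu)} \tag{A4}$$
and
$$\sigma_{Y'} = \sqrt{E[Y'^2] - E[Y']^2} = \sqrt{\nu (1 - \nu)}. \tag{A5}$$
Finally, the parent-offspring correlation is given by $\mathrm{Cov}(Y, Y') / (\sigma_Y \sigma_{Y'})$; using Eqs. (A3)-(A5), and replacing $\nu$ by its value [mutation-drift equilibrium, Eq. (A2)], we obtain
$$\frac{\mathrm{Cov}(Y, Y')}{\sigma_Y\, \sigma_{Y'}} = 1 - \mu_{1\to 0} - \mu_{0\to 1} = 1 - \mu, \tag{A6}$$
where $\mu = \mu_{1\to 0} + \mu_{0\to 1}$.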
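The correlation can also be checked by enumerating the four possible parent-offspring pairs at the mutation-drift balance. The sketch below does so exactly (no simulation); the function name and the numerical values are illustrative assumptions.

```python
from math import sqrt

def parent_offspring_correlation(mu_10, mu_01):
    """Exact parent-offspring type correlation at the mutation-drift balance."""
    nu = mu_01 / (mu_10 + mu_01)  # equilibrium frequency of altruists
    # Joint distribution of (parent type, offspring type)
    joint = {
        (1, 1): nu * (1.0 - mu_10),
        (1, 0): nu * mu_10,
        (0, 1): (1.0 - nu) * mu_01,
        (0, 0): (1.0 - nu) * (1.0 - mu_01),
    }
    e_parent = sum(p * a for (a, _), p in joint.items())
    e_offspring = sum(p * b for (_, b), p in joint.items())
    e_product = sum(p * a * b for (a, b), p in joint.items())
    covariance = e_product - e_parent * e_offspring
    # Indicator variables: variance is E[Y] (1 - E[Y])
    sd_parent = sqrt(e_parent * (1.0 - e_parent))
    sd_offspring = sqrt(e_offspring * (1.0 - e_offspring))
    return covariance / (sd_parent * sd_offspring)

rho = parent_offspring_correlation(0.03, 0.01)
print(rho)  # equals 1 - 0.03 - 0.01 up to rounding
```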

A.3 Redefining the Mutation Scheme
With the new mutation parameters $\mu$ and $\nu$, we can describe the mutation scheme differently. If we denote by $X_i$ the type of a given parent and by $X'$ the type of one of its offspring, then the expected type of this offspring is
$$E[X' \mid X_i] = X_i\,(1 - \mu_{1\to 0}) + (1 - X_i)\,\mu_{0\to 1}. \tag{A7a}$$
Replacing $\mu_{1\to 0}$ and $\mu_{0\to 1}$ by equivalent combinations of $\mu$ and $\nu$ as defined in Eqs. (A6) and (A2), i.e.,
$$\mu_{1\to 0} = \mu\,(1 - \nu), \qquad \mu_{0\to 1} = \mu\,\nu, \tag{A7b}$$
then Eq. (A7a) becomes
$$E[X' \mid X_i] = (1 - \mu)\,X_i + \mu\,\nu. \tag{A7c}$$
We can redefine the mutation scheme and interpret Eq. (A7c) as follows. Parents transmit their strategy to their offspring with probability $1 - \mu$; with probability $\mu$, offspring do not inherit their strategy from their parent but instead get one randomly: with probability $\nu$, they become altruists; with probability $1 - \nu$, they become defectors. With this alternative description, individuals that we call "mutants" may have the same type as their parent.
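The equivalence between the two descriptions of mutation can be verified directly, assuming the correspondence $\mu = \mu_{1\to0} + \mu_{0\to1}$ and $\nu = \mu_{0\to1}/\mu$ implied by the mutation-drift balance and the parent-offspring correlation. The helper names below are hypothetical.

```python
def to_mu_nu(mu_10, mu_01):
    """Map directional mutation probabilities to (mu, nu).

    Assumes mu = mu_10 + mu_01 and nu = mu_01 / mu.
    """
    mu = mu_10 + mu_01
    return mu, mu_01 / mu

def expected_offspring_directional(parent_type, mu_10, mu_01):
    """Expected offspring type with type-specific mutation probabilities."""
    return parent_type * (1.0 - mu_10) + (1 - parent_type) * mu_01

def expected_offspring_alternative(parent_type, mu, nu):
    """Expected offspring type when a 'mutant' draws a random strategy."""
    return (1.0 - mu) * parent_type + mu * nu

mu_10, mu_01 = 0.03, 0.01
mu, nu = to_mu_nu(mu_10, mu_01)
for parent_type in (0, 1):
    print(parent_type,
          expected_offspring_directional(parent_type, mu_10, mu_01),
          expected_offspring_alternative(parent_type, mu, nu))
```

Under the alternative scheme, an altruist parent produces a "mutant" that is nevertheless an altruist with probability $\mu\nu$, which is why mutants and parents can share a type.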

B.1 For a Generic Life Cycle
We want to compute the expected proportion of altruists in the population. We represent the state of the population at a given time $t$ using indicator variables $X_i(t)$, $1 \le i \le N$, equal to 1 if the individual living at site $i$ at time $t$ is an altruist and equal to 0 if it is a defector; these indicator variables are gathered in an $N$-long vector $X(t)$. The set of all possible population states is $\Omega = \{0, 1\}^N$. The proportion of altruists in the population is written
$$\bar{X}(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t).$$
We denote by $B_{ji}(X(t), \delta)$, written $B_{ji}$ for simplicity, the probability that the individual at site $j$ at time $t + 1$ is the newly established offspring of the individual living at site $i$ at time $t$. The expected number of successful offspring produced by the individual living at site $i$ at time $t$ is given by $B_i = \sum_{j=1}^{N} B_{ji}$. We denote by $D_i(X(t), \delta)$ ($D_i$ for simplicity) the probability that the individual living at site $i$ at time $t$ has been replaced (i.e., died) at time $t + 1$. These quantities depend on the chosen life cycle and on the state of the population; they are given in Table 2 for each of the life cycles that we consider. Since a dead individual is immediately replaced by one new individual (i.e., population size remains constant and equal to $N$),
$$D_i = \sum_{j=1}^{N} B_{ij}$$
holds for all sites $i$ and all life cycles. The structure of the population is also such that in the absence of selection ($\delta = 0$, so that $f_i = 1$ for all sites $1 \le i \le N$), all individuals have the same probability of dying and the same probability of having successful offspring (i.e., of having offspring that become adults at the next time step), so that
$$B_i^0 = D_i^0 = B^*, \tag{A8b}$$
where the 0 index means that the quantities are evaluated for $\delta = 0$. This also implies that
Given that the population is in state $X(t)$ at time $t$, the expected frequency of altruists at time $t + 1$ is given by
$$E\big[\bar{X}(t+1) \mid X(t)\big] = \frac{1}{N} \sum_{i=1}^{N} \Big[ B_i\,(1 - \mu)\,X_i + (1 - D_i)\,X_i + B_i\,\mu\,\nu \Big]. \tag{A9a}$$
The first term within the brackets corresponds to births of unmutated offspring from parents who are altruists ($X_i$). The second term corresponds to the survival of altruists. The third term corresponds to the births of mutants who became altruists (which occurs with probability $\nu$), whichever the type of the parent.

A lost strategy can always be created again by mutation, so there is no absorbing population state. There exists a stationary distribution of population states (Theorem 1 in Allen and Tarnita [4]). In other words, for large times $t$, the expected frequency of altruists does not change anymore. (Of course, realized frequencies keep changing over time.) We denote by $\xi(X, \delta, \mu)$ the probability that the population is in state $X$, given the strength of selection $\delta$ and the mutation probability $\mu$. Taking the expectation of Eq. (A9a) ($E[X] = \sum_{X \in \Omega} X\, \xi(X, \delta, \mu)$), we obtain, after reorganizing: Now, we use the assumption of weak selection ($\delta \ll 1$) and consider the first-order expansion of Eq. (A10) for $\delta$ close to 0.
where all the derivatives are evaluated for $\delta = 0$. The first line of Eq. (A11) is equal to zero, because $B_i^0 - D_i^0 = 0$ (Eq. A8b) and because in the absence of selection ($\delta = 0$), the expected state of every site $i$ is $E_0[X_i] = \sum_{X \in \Omega} X_i\, \xi(X, 0, \mu) = \nu$ (by definition of $\nu$, see "Appendix A.1"). The second term of the second line is zero, because for all the life cycles that we consider, the total number of births in the population during one time step ($\sum_{i=1}^{N} B_i$) does not depend on population phenotypic composition (it is exactly 1 for the Moran life cycles, and exactly $N$ for the Wright-Fisher life cycle); since it is a constant, its derivative is 0. The third line simplifies by noting again that $B_i^0 = D_i^0$ (first term) and that $\sum_{X \in \Omega} \partial \xi(X, \delta, \mu) / \partial \delta = 0$ since $\xi$ is a probability distribution (so the second term is zero). Equation (A11) then becomes where the derivatives are evaluated at $\delta = 0$. For conciseness, we define a measure of fitness counting offspring only when they are unmutated (in the sense of the alternate mutation scheme described in "Appendix A.3"). With this, using the expectation notation and denoting by $E_0$ expectations under $\delta = 0$, we can rewrite and reorganize Eq. (A12) as Now, we use the chain rule a first time, taking individual phenotypes $\phi_k$ as intermediate variables: by definition of $\phi_k$ ($\phi_k = \delta X_k$), and where the derivatives are evaluated for all $\phi_i = 0$, $1 \le i \le N$. Introducing the notation $P_{ij} = E_0[X_i X_j]$ (expected state of a pair of sites), Eq. (A14) becomes We note that $P_{ii} = E_0[X_i X_i] = E_0[X_i] = \nu$ ($X_i$ being an indicator variable, it is either equal to 0 or 1, so $X_i^2 = X_i$). Given that the size of the population is fixed ($\sum_{i=1}^{N} (B_i - D_i) = 0$) and given that the total number of births does not depend on population composition in the life cycles that we consider, we have Using the decomposition in Eq. (A15), which is valid for any population composition, and so in particular for $X = 1$, Eq. (A17a) becomes

So far, we have not used the specificities of the population structure that we consider. First, the population is homogeneous (sensu [42]). Because the population is homogeneous, Eq. (A17b) is valid for all $i$ (not just their sum). Secondly, we are considering an island model. Once we have fixed a focal individual $i$, in expectation there are only three types of individuals: the focal itself (denoted by "•"), $n - 1$ other individuals in the focal's deme (denoted by "in"), and $N - n$ individuals in other demes (denoted by "out"). With these considerations, Eq. (A17b) becomes (as previously shown by Rousset and Billiard [34, p. 817-818]). Using this island model-specific notation, Eq. (A16) becomes Injecting Eq. (A17c) into Eq. (A16), we obtain We can also replace the $P$ terms as follows:
$$P_{ij} = \nu^2 + \nu\,(1 - \nu)\,Q_{ij}. \tag{A19}$$
In "Appendix C.1," using recursions on $P_{ij}$, we will see that $Q_{ij}$ can be interpreted as a probability of identity by descent, i.e., the probability that the individuals at sites $i$ and $j$ have a common ancestor and that no mutation (using the alternative mutation scheme described in "Appendix A.3") has occurred on either lineage since the ancestor. Replacing the $P$ terms with Eq. (A19) and noting that $Q_{ii} = 1$, Eq. (A18) becomes Eq. (A20).

We can further decompose the derivatives, now using the fecundities $f_\ell$ as intermediate variables, i.e., The term $\partial f_\ell / \partial \phi_k$ is the marginal effect of a change in the phenotype of the individual living at site $k$ on the fecundity of the individual living at site $\ell$. By assumption, social interactions take place within demes only, so whenever sites $\ell$ and $k$ are in different demes, we have We then need to characterize the effect of one's own phenotype (i.e., $k = \ell$) and of another deme-mate's phenotype ($k$ and $\ell$ being different sites in the same deme) on fecundity.
For this, we define $b$ and $c$ so that: Equation (A20) then becomes Eq. (A23) (using notation • to refer to the focal individual itself, and where $W = W_i$, since the derivatives are the same for all $i$). (As previously, all derivatives are evaluated at $\delta = 0$.) Finally, we write a first-order approximation of the expected frequency of altruists in the population:
$$E[\bar{X}] \approx E_0[\bar{X}] + \delta \left. \frac{\partial E[\bar{X}]}{\partial \delta} \right|_{\delta = 0}.$$
The first term, $E_0[\bar{X}]$, is the expected frequency in the absence of selection; it is equal to $\nu$ (as introduced in Eq. A2). The derivative $\left. \partial E[\bar{X}] / \partial \delta \right|_{\delta = 0}$ is obtained from Eq. (A23). We then need to replace the $B_i$ and $D_i$ terms by their formulas for each life cycle; they are given in Table 2. This is how the expected frequency of altruists in the population is approximated.
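The one-step recursion at the start of this derivation (births of unmutated altruist offspring, survival of altruists, births of mutants that become altruists) can be written as a short function. The sketch below is illustrative: the name `expected_next_frequency` and the matrix layout `b[j][i]` are assumptions, and the example uses a neutral, well-mixed Moran update ($B_{ji} = 1/N^2$) rather than the formulas of Table 2.

```python
def expected_next_frequency(x, b, mu, nu):
    """Expected altruist frequency at t+1 given the state x at time t.

    x  : list of 0/1 types, one per site
    b  : b[j][i] = probability that site j at t+1 holds the newly
         established offspring of the individual at site i at time t
    mu : mutation probability; nu : probability a mutant is an altruist
    """
    n_sites = len(x)
    total = 0.0
    for i in range(n_sites):
        births_i = sum(b[j][i] for j in range(n_sites))  # B_i
        death_i = sum(b[i][j] for j in range(n_sites))   # D_i (site i replaced)
        total += (births_i * (1.0 - mu) * x[i]           # unmutated offspring of altruists
                  + (1.0 - death_i) * x[i]               # surviving altruists
                  + births_i * mu * nu)                  # mutants becoming altruists
    return total / n_sites

# Neutral, well-mixed Moran update on 8 sites (assumed example)
n_sites = 8
b_neutral = [[1.0 / n_sites**2] * n_sites for _ in range(n_sites)]
state = [1, 1, 0, 0, 0, 0, 0, 0]   # current frequency 0.25
print(expected_next_frequency(state, b_neutral, mu=0.1, nu=0.25))
```

When the current frequency equals $\nu$, the neutral expectation is unchanged, which is the stationarity property used in the derivation.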

B.2 Derivatives for the Specific Life Cycles
We use the formulas presented in Table 2 and the definition of $W = W_i$ given in Eq. (A13) for each life cycle. In Eqs. (A26), (A28), and (A30), the first lines within parentheses correspond to primary effects and the second lines to secondary effects.

Moran Birth-Death
Under this life cycle, we obtain With these derivatives, Eq. (5) becomes

Moran Death-Birth
Under this life cycle, we obtain
With the Death-Birth life cycle, Eq. (5) becomes With this life cycle, death occurs first, and the probability of dying is independent of the state of the population (since we assume that social interactions affect fecundity). We can therefore factor $(1 - \mu)$ in all terms. The primary effects (first lines in the parentheses) remain the same as with the Birth-Death life cycle. However, the Death-Birth life cycle leads to different secondary effects compared to the Birth-Death life cycle: competition occurs at a different scale [16]. Finally, with this life cycle as we defined it, the probabilities of identity by descent $Q$ are the same as with the Birth-Death model.

Wright-Fisher
Under this life cycle, we obtain For the Wright-Fisher life cycle, we have $B^*_{WF} = 1$. Substituting the derivatives presented in Eq. (A29) into Eq. (5), we obtain The only (but important) difference between Eqs. (A30) and (A28) is the value of the probabilities of identity by descent $Q$, because the number of individuals that are updated at each time step differs.

C.1 Expected State of Pairs of Sites and Probabilities of Identity by Descent
Here, we show the link between the expected state of a pair of sites $P_{ij}$ and probabilities of identity by descent $Q_{ij}$. In our derivation of $E[\bar{X}]$, $P_{ij}$ is the quantity that appears, but most studies use $Q_{ij}$. Both are evaluated in the absence of selection ($\delta = 0$).

C.1.1 Moran Model
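These calculations apply to both the Death-Birth and Birth-Death updating rules. In a Moran model, exactly one individual dies and one individual reproduces during one time step; in the absence of selection, the offspring of the individual at site $k$ replaces the individual at site $i$ with probability $d_{ki}/N$, where $d_{ki}$ is the probability of dispersal from site $k$ to site $i$. Given a state $X$ at time $t$, at time $t + 1$ both sites $i$ and $j \neq i$ are occupied by altruists if (i) it was the case at time $t$ and neither site was replaced by a non-altruist [first term in Eq. (A31)], or (ii) exactly one of the two sites was occupied by a non-altruist at time $t$, but this site was replaced by an altruist (second and third terms of Eq. A31):
$$\begin{aligned} E\big[X_i X_j (t+1) \mid X(t)\big] = {} & X_i X_j \Big( 1 - \sum_{k=1}^{N} \frac{d_{ki} + d_{kj}}{N} \big[ 1 - (1 - \mu) X_k - \mu \nu \big] \Big) \\ & + (1 - X_i)\, X_j \sum_{k=1}^{N} \frac{d_{ki}}{N} \big[ (1 - \mu) X_k + \mu \nu \big] \\ & + X_i\, (1 - X_j) \sum_{k=1}^{N} \frac{d_{kj}}{N} \big[ (1 - \mu) X_k + \mu \nu \big]. \end{aligned} \tag{A31}$$
We take the expectation of this quantity and consider that the stationary distribution is reached ($t \to \infty$); then $E[X_i X_j (t+1)] = E[X_i X_j (t)]$, and we obtain after a few lines of algebra, for $j \neq i$:
$$P_{ij} = \mu\, \nu^2 + \frac{1 - \mu}{2} \sum_{k=1}^{N} \big( d_{ki}\, P_{kj} + d_{kj}\, P_{ki} \big), \tag{A32}$$
while $P_{ii} = \nu$. Now we substitute $P_{ij} = \nu^2 + \nu (1 - \nu) Q_{ij}$ in Eq. (A32); we obtain
$$Q_{ij} = \frac{1 - \mu}{2} \sum_{k=1}^{N} \big( d_{ki}\, Q_{kj} + d_{kj}\, Q_{ki} \big), \tag{A33}$$
and we recognize that $Q_{ij}$ is the probability that the individuals at sites $i$ and $j \neq i$ are identical by descent (e.g., [40], equation (S1.11); Allen and Nowak [3], Eq. 4). Indeed, to compute it, we need to pick which site was last updated ($i$ or $j$, with equal probabilities 1/2), then sum over the possible parents ($k$); the other individual needs to be identical by descent to the parent ($Q_{kj}$, $Q_{ki}$), the offspring needs to disperse to the considered site ($d_{ki}$, $d_{kj}$), and no mutation should have occurred ($1 - \mu$).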

C.1.2 Wright-Fisher Model
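In a Wright-Fisher model, all individuals are replaced at each time step, so we directly consider the state of the parents (in the absence of selection, the parent of the individual at site $i$ is the individual previously at site $k$ with probability $d_{ki}$):
$$\begin{aligned} E\big[X_i X_j (t+1) \mid X(t)\big] = \sum_{k, \ell} d_{ki}\, d_{\ell j} \Big[ & X_k X_\ell\, (1 - \mu + \mu \nu)^2 \\ & + \big( X_k (1 - X_\ell) + (1 - X_k) X_\ell \big) (1 - \mu + \mu \nu)\, \mu \nu \\ & + (1 - X_k)(1 - X_\ell)\, (\mu \nu)^2 \Big]. \end{aligned} \tag{A34}$$
The first term of Eq. (A34) corresponds to both parents being altruists and having altruist offspring; the second line corresponds to exactly one parent being an altruist; and the third line corresponds to both parents being non-altruists. (In this latter case, the two offspring both have to be mutants to be altruists.) Taking the expectation and simplifying, we obtain, for $j \neq i$,
$$P_{ij} = \big( 1 - (1 - \mu)^2 \big)\, \nu^2 + (1 - \mu)^2 \sum_{k, \ell} d_{ki}\, d_{\ell j}\, P_{k\ell}, \tag{A35}$$
and, substituting $P_{ij} = \nu^2 + \nu (1 - \nu) Q_{ij}$,
$$Q_{ij} = (1 - \mu)^2 \sum_{k, \ell} d_{ki}\, d_{\ell j}\, Q_{k\ell}. \tag{A36}$$
Again, $Q_{ij}$ corresponds to a probability of identity by descent: the individuals at sites $i$ and $j$ are identical by descent if their parents were and if neither mutated ($(1 - \mu)^2$).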

C.2 Probabilities of Identity by Descent in a Subdivided Population
Two individuals are said to be identical by descent if there has not been any mutation on either lineage since their common ancestor. Because of the structure of the population, there are only three types of pairs of individuals, and hence three different values of the probabilities of identity by descent of pairs of sites $Q_{ij}$:
$$Q_{ij} = \begin{cases} 1 & \text{when } i = j; \\ Q_{in} & \text{when } i \neq j \text{ and both sites are in the same deme}; \\ Q_{out} & \text{when sites } i \text{ and } j \text{ are in different demes}. \end{cases} \tag{A37}$$
The values of Q in and Q out depend on the type of life cycle that we consider.
When the number of demes is infinite, $Q_{in}$ is relatively easily obtained using recurrence equations and noting that $Q_{out} = 0$. However, writing the recurrence equations for $Q_{in}$ and $Q_{out}$ is much more tedious for finite populations. Hence, for finite populations, we will use formulas already derived in Débarre [10] for "two-dimensional population structures." The name comes from the fact that we only need two types of transformations to go from any site to any other site in the population: permutations on the deme index and permutations on the within-deme index. We rewrite site labels $i$ ($1 \le i \le N$) as $(\ell_1, \ell_2)$, where $\ell_1$ is the index of the deme ($1 \le \ell_1 \le N_D$) and $\ell_2$ the position of the site within the deme ($1 \le \ell_2 \le n$). Then, we introduce notations $\tilde{d}_{i_1 i_2}$ and $\tilde{Q}_{i_1 i_2}$, which correspond to the dispersal probability and the probability of identity by descent to a site at distances $i_1$ and $i_2$ in the among-demes and within-deme dimensions. Also, in this section, we distinguish between $d_{self} = d_{ii}$ and $d_{in}$ (in the main text, $d_{self} = d_{in}$).

C.2.1 Moran Model
In Débarre [10], it was shown that and $\lambda_M$ such that $\tilde{Q}_{0 0} = 1$. Let us first compute $\mathcal{D}_{q_1 q_2}$ in the case of a subdivided population, with $N_1 = N_D$ and $N_2 = n$: ($\delta_q$ is equal to 1 when $q$ is equal to 0 modulo the relevant dimension, and to 0 otherwise). So for the three types of distances that we need to consider (distance 0, distance to another deme-mate, distance to an individual in another deme), and with $N_1 = N_D$ and $N_2 = n$, we obtain So for $\mathcal{Q}$, using system (A40) in Eq. (A38a), In particular, We find $\lambda_M$ using Eq. (A42a). Let us now go back to Eq. (A41): When $r_1 = 0$, the two individuals are in the same deme. The two individuals are different when $r_2 \not\equiv 0$, and so: And when $r_1 \not\equiv 0$, the two individuals are in different demes: With $d_{self} = d_{in} = (1 - m)/n$, we eventually obtain: .
The probability that two different deme-mates are identical by descent, $Q^M_{in}$, decreases monotonically with the emigration probability $m$, while $Q^M_{out}$ monotonically increases with $m$ (see Fig. 8a).
When the mutation probability $\mu$ is vanishingly small ($\mu \to 0$), both $Q^M_{in}$ and $Q^M_{out}$ are equal to 1: indeed, in the absence of mutation, the population ends up fixed for one of the two types, and all individuals are identical by descent. Note that we obtain a different result if we first assume that the size of the population is infinite ($N_D \to \infty$), because the order of limits matters; for instance, $\lim_{N_D \to \infty} Q^M_{out} = 0$. Relatedness $R$ was defined in Eq. (A20) as . (A45)
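The order-of-limits statement can be illustrated numerically without the closed-form expressions. The sketch below does not use the formulas of Débarre [10]; instead it assumes the neutral Moran recursion described in "Appendix C.1.1" (pick the last updated site, sum over parents, require identity by descent with the parent and no mutation), specialized to an island model with assumed dispersal probabilities $d_{self} = d_{in} = (1-m)/n$ and $d_{out} = m/((N_D-1)n)$. This yields two linear equations in $Q_{in}$ and $Q_{out}$, solved by Cramer's rule. Function names are hypothetical.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] (x, y) = (b1, b2) by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

def moran_ibd(n, n_demes, m, mu):
    """Return (Q_in, Q_out) for a neutral Moran island model (sketch).

    Assumed recursions (parents summed over all sites):
      Q_in  = (1-mu) [ d_in (1 + (n-1) Q_in) + m Q_out ]
      Q_out = (1-mu) [ (1-m) Q_out
                       + d_out (1 + (n-1) Q_in + (n_demes-2) n Q_out) ]
    """
    d_in = (1.0 - m) / n
    d_out = m / ((n_demes - 1) * n)
    s = 1.0 - mu
    a11 = 1.0 - s * d_in * (n - 1)
    a12 = -s * m
    b1 = s * d_in
    a21 = -s * d_out * (n - 1)
    a22 = 1.0 - s * ((1.0 - m) + d_out * (n_demes - 2) * n)
    b2 = s * d_out
    return solve_2x2(a11, a12, a21, a22, b1, b2)

# Small mutation first (finite population): Q_out approaches 1
print(moran_ibd(n=4, n_demes=5, m=0.1, mu=1e-6))
# Large population first (fixed mutation): Q_out approaches 0
print(moran_ibd(n=4, n_demes=10**7, m=0.1, mu=0.01))
```

The two calls differ only in which parameter is pushed to its limit, yet $Q_{out}$ ends up near opposite ends of the unit interval, which is the sense in which the limits do not commute.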

C.2.2 Wright-Fisher
For the Wright-Fisher updating, the equation for $\mathcal{Q}$ is different: with $\mathcal{D}$ given in Eq. (A38b). In a subdivided population, with $N_1 = N_D$ and $N_2 = n$, this becomes To find $\lambda_{WF}$, we solve $\tilde{Q}_{0 0} = 1$, i.e., Then, from Eq. (A47) we deduce and (A48c) (These formulas are compatible with, e.g., results presented by Cockerham and Weir [9], adapted for haploid individuals.) In the Wright-Fisher life cycle, $Q^{WF}_{in}$ decreases with $m$ until $m = m^{WF}_c = (N_D - 1)/N_D$, while $Q^{WF}_{out}$ follows the opposite pattern. The threshold value $m^{WF}_c$ corresponds to an emigration probability so high that $d_{in} = d_{out}$.
The two probabilities of identity by descent go to 1 when the mutation probability $\mu$ is very small ($\mu \to 0$), except if we first assume that the number of demes is very large ($N_D \to \infty$); for instance, with this life cycle as well, $\lim_{N_D \to \infty} Q^{WF}_{out} = 0$. Also, because more sites (all of them, actually) are updated at each time step, $Q_{in}$ is lower for the Wright-Fisher updating than for a Moran updating, under which only one site is updated at each time step (compare Fig. 8a and b).
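The threshold $m^{WF}_c = (N_D - 1)/N_D$ can be checked with a small numerical sketch. As an assumption (the closed-form expressions are not used here), the two offspring of a Wright-Fisher step are taken to be identical by descent if neither mutated and their parents, drawn independently, either coincide or are themselves identical by descent; each parent comes from the home deme with probability $1 - m$ and otherwise from a uniformly chosen other deme. Function names are hypothetical.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] (x, y) = (b1, b2) by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

def wright_fisher_ibd(n, n_demes, m, mu):
    """Return (Q_in, Q_out) for a neutral Wright-Fisher island model (sketch)."""
    s = (1.0 - mu) ** 2                      # neither offspring mutated
    # Probability that the two parents live in the same deme
    same_in = (1.0 - m) ** 2 + m ** 2 / (n_demes - 1)            # deme-mates
    same_out = (2.0 * m * (1.0 - m) / (n_demes - 1)
                + (n_demes - 2) * m ** 2 / (n_demes - 1) ** 2)   # different demes
    # Q_x = s [ same_x (1/n + (1 - 1/n) Q_in) + (1 - same_x) Q_out ]
    a11 = 1.0 - s * same_in * (1.0 - 1.0 / n)
    a12 = -s * (1.0 - same_in)
    b1 = s * same_in / n
    a21 = -s * same_out * (1.0 - 1.0 / n)
    a22 = 1.0 - s * (1.0 - same_out)
    b2 = s * same_out / n
    return solve_2x2(a11, a12, a21, a22, b1, b2)

n, n_demes, mu = 4, 5, 0.01
m_c = (n_demes - 1) / n_demes                # threshold emigration probability
for m in (0.1, 0.5, m_c):
    print(m, wright_fisher_ibd(n, n_demes, m, mu))
```

At $m = m^{WF}_c$ the two parents are equally likely to share a deme whether or not their offspring do, so $Q_{in}$ and $Q_{out}$ coincide; below the threshold $Q_{in}$ exceeds $Q_{out}$.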