## Abstract

Theories on the evolutionary origins of altruistic behavior have a long history and have become a canonical part of the theory of evolution. Nevertheless, the mechanisms that allow altruism to appear and persist are still incompletely understood. The spatial structure of populations is known to be an important determinant. In both theoretical and experimental studies, much attention has been devoted to populations that are subdivided into discrete groups. Such studies typically imposed the structure and dynamics of the groups by hand. Here, we instead present a simple individual-based model in which organisms spontaneously self-organize into spatially separated colonies that themselves reproduce by binary fission and hence behave as Darwinian entities in their own right. Using software to automatically track the rise and fall of colonies, we are able to apply formal theory on multilevel selection and thus quantify the within- and among-group dynamics. This reveals that individual colonies inevitably succumb to defectors, resulting in within-colony “tragedies of the commons”. Even so, altruism persists in the population because more altruistic colonies reproduce more frequently. The emergence of the colonies themselves depends crucially on the length scales of motility, altruism, and competition. This reconfirms the general relevance of these scales for social evolution, but also stresses that their impact can only be understood fully in the light of the emergent eco-evolutionary spatial patterns. The results also demonstrate that emergent spatial population patterns can function as a starting point for transitions of individuality.

## 1 Introduction

Over the past decades, a rich body of theoretical research has been devoted to the evolution of social behaviors [1, 2]. In particular, much theory has focused on the evolution of cooperation [3, 4], and more narrowly, altruism [5, 6]: behavior that is costly to the actor but beneficial to its interaction partners. Historically, how natural selection could favor altruism has been a puzzle, but in broad terms the solution has long been understood: altruism can be selected if its benefits accrue disproportionately to altruists, thus offsetting their costs [3, 7–9]. Nevertheless, the mechanisms that allow such an interaction structure to exist and persist are still a matter of intense study and debate [4].

Many classical studies considered populations that are subdivided into distinct groups (*e*.*g*., [10–14]). In such a group or “multilevel” structure, altruistic behavior can be selected provided altruists tend to be grouped together and groups with a higher proportion of altruists tend to have higher mean fitness [11]. In nearly all theoretical models of multilevel selection, the group structure and group-level dynamics are imposed or presupposed by the definition of the model. In contrast, we here present a very simple individual-based model in which altruistic organisms *self-organize* into discrete colonies. Moreover, these colonies themselves spontaneously reproduce by growth and binary fission and hence act as Darwinian entities in their own right. In time, each individual colony is fated to collapse; but when it does, another colony grows and divides, giving rise to the kind of multi-level dynamics that in previous models had to be imposed by hand [14]. Such rudimentary, emergent higher-level entities could be a first step towards a full “transition of individuality” [15].

The model describes a spatial environment inhabited by motile organisms that reproduce and interact locally. As has long been known, local interactions combined with local mating and reproduction can foster altruism if motility is limited, because this allows altruists to aggregate in assorted neighborhoods where they mainly benefit each other [16–18]. However, mathematical and computational models have revealed an important limitation [19–22]. If not only social interactions but also competitive interactions take place locally (“soft” selection [23, 24]), altruists in assorted domains tend to compete with other altruists, in which case the benefits of altruism may be largely or fully canceled by the concomitant increased competition. This local Malthusian trap is alleviated somewhat if the local carrying capacity increases with the proportion of altruists (“elastic” selection), which allows clusters of altruists to become net population sources [25]. Importantly, it can also be avoided if competitive interactions reach beyond the social group or neighborhood, so that clusters of altruists can support each other at the expense of others [17, 19, 22, 26]. This highlights the importance of the relative *scales* of motility, altruism, and competition [1]. As a rule, altruism is favored by limited motility and local social interactions, but global competition.

Long-range competition can come in many implicit forms. For instance, the life cycle of organisms may include a dispersal stage such that individuals can first cooperate with relatives and then compete with non-relatives [19], or the group dynamics may include a global mixing stage in which groups or neighborhoods are periodically fragmented and new ones are seeded [27–29]. To study the effects of the scales of motility, altruism and competition systematically, the model presented here is deliberately designed such that these scales can be set explicitly and independently. As it turns out, their role is much more intricate than anticipated because they play an essential role in the emergence of the colonies and hence in the resulting multilevel eco-evolutionary dynamics.

To quantitatively analyze model simulations, we use software that automatically tracks the rise and fall of colonies. Subsequently, we apply existing formal theory to quantify the contributions to selection at the individual and colony levels. This demonstrates that, within colonies, natural selection favors defectors who profit from the altruists in their neighborhood but do not share in the costs. But colonies characterized by a higher average level of altruism survive longer and reproduce more frequently, resulting in positive selection at the colony level. The steady level of altruism that eventually establishes can be understood as a balance between these forces: a perpetual “tragedy of the commons” [30] within colonies, compensated by positive selection among them.

## 2 Results

### 2.1 Brief description of the model

We start with a brief specification of the model; details are supplied in the Methods.

The model considers a population of discrete individuals in a two-dimensional (2D) or one-dimensional (1D) habitat (see Fig. 1a). Individuals possess just one continuous trait *ϕ*, representing their investment in altruistic behavior, and they do only three things: move, in an unbiased fashion modeled by diffusion; die, at a constant (Poisson) rate; and reproduce asexually.

The rate of reproduction of each individual depends on three quantities. First, it decreases with the individual’s own investment in altruism: altruism is costly. A level of altruism of *ϕ* = 0.05 means that the individual sacrifices 5% of its reproduction rate relative to a defector (*ϕ* = 0) under the same conditions. Second, the reproduction rate also decreases with the population density in the individual’s local neighborhood. This models competition for resources and establishes a finite carrying capacity. The local population density is measured as a Kernel Density Estimate (KDE), using a normal distribution with standard deviation *σ*_{rc} as the kernel function. This means that individuals compete strongly with each other only if their spatial separation is of order *σ*_{rc} or less (see Fig. 1b, red line, and Fig. 1c), so that *σ*_{rc} can be interpreted as the *scale of competition*. Third, an individual’s reproduction rate increases if altruists are present in its local neighborhood (Fig. 1b, green line, and Fig. 1d). The altruism experienced at a given position **y**, denoted *A*(**y**), is again quantified as a KDE, but now individuals are weighted in proportion to their level of altruism *ϕ*. Although the model is not intended to mimic a specific altruistic behavior or mechanism, it is convenient to think of *A*(**y**) as the concentration of some public good secreted by altruistic organisms. If more and more public good is added to the local neighborhood, the benefit eventually saturates. The standard deviation of the kernel function used to calculate *A*(**y**) is called *σ*_{a} and generally differs from *σ*_{rc}. Because individuals profit significantly from the public good produced by others only if their separation is of order *σ*_{a} or less, *σ*_{a} can be interpreted as the *scale of altruism*.
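To make the two kernel-based quantities concrete, the following minimal sketch evaluates the local density and the experienced altruism in a periodic 1D habitat. It is an illustration under stated assumptions, not the simulation code: the function names (`gaussian_kernel`, `local_fields`) are hypothetical, both quantities are taken to be plain (unnormalized) sums of Gaussian kernels, and periodicity is handled with the minimum-image distance, which is adequate only when both scales are much smaller than the habitat.

```python
import numpy as np


def gaussian_kernel(distance, sigma):
    """Univariate normal density with standard deviation sigma."""
    return np.exp(-0.5 * (distance / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def local_fields(y, positions, phi, sigma_rc, sigma_a, habitat_length):
    """Return (density, altruism) at the query positions y.

    density  : sum of Gaussian kernels (scale sigma_rc) centered on individuals.
    altruism : the same construction with scale sigma_a, each individual
               weighted by its level of altruism phi.
    Assumption: unnormalized sums and minimum-image periodic distances.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    positions = np.asarray(positions, dtype=float)
    phi = np.asarray(phi, dtype=float)

    # Signed separations, wrapped to the nearest periodic image.
    diff = y[:, None] - positions[None, :]
    diff -= habitat_length * np.round(diff / habitat_length)

    density = gaussian_kernel(diff, sigma_rc).sum(axis=1)
    altruism = (phi[None, :] * gaussian_kernel(diff, sigma_a)).sum(axis=1)
    return density, altruism
```

With *σ*_{rc} larger than *σ*_{a}, an individual in this sketch contributes to the density felt by distant neighbors while its public good stays close to home.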

It is worth emphasizing that, contrary to some other models [31, 32], complete defectors (with *ϕ* = 0) are perfectly viable; altruism is not required for the survival of the population.

When an individual reproduces, the offspring appears at the coordinates of the parent; afterwards, parent and offspring move independently and thus part ways. Offspring usually inherits the trait value of the parent, but with a small probability a mutation occurs that increases or decreases it at random.

In simulations, space and time are discretized, and periodic boundary conditions are imposed. Default parameter values are listed in Table 1. Throughout the text, the time unit is the inverse of the death rate, called the “generation time”. Importantly, the scale of altruism *σ*_{a} is used as the unit of length and hence *σ*_{a} = 1 by definition. Thus, just two length scales remain: the scale of competition *σ*_{rc} and the scale of motility, *σ*_{m}. The latter is defined as the typical (that is, root-mean-square) distance traveled by an individual in a generation time (see Methods).

### 2.2 Emergent colonies and multilevel dynamics in the 2D habitat

The complex behavior of the simple model is illustrated in Fig. 2, which presents results of a single simulation run using a 2D habitat. These results are representative for the default parameters (see replicates in Fig. S1) but the parameters themselves have been chosen deliberately to enable the evolution of altruism. In particular, motility is slow and the scale of competition *σ*_{rc} is four times larger than the scale of altruism *σ*_{a}.

As shown in Fig. 2a, all individuals are initialized as defectors, but in time the mean level of altruism steadily increases (thick colored line) before reaching a plateau. To confirm that this rise is largely due to natural selection rather than random drift or mutational bias, we measured the cumulative contribution of natural selection (black line), which is consistently positive (also see Fig. S1 and Appendix A.2).

The surprising spatial dynamics of the simulation are visualized in the snapshots of Fig. 2b and, more mesmerically, in Movies S1–3. While individuals are initially distributed uniformly at random, they spontaneously organize into dense colonies surrounded by “exclusion zones”. These colonies subsequently organize into a hexagonal pattern; to illustrate this, a hexagonal grid is overlaid in the right-most panel of Fig. 2b. To further characterize the pattern we determined the radial distribution function, which is defined as the distribution of distances between all pairs of individuals, normalized by the random expectation (Fig. 2c). The long-ranged oscillations in this distribution reveal a lattice constant of *a* ≈ 8.4, consistent with estimates based on the number of colonies found in the habitat (see Methods). The mechanism producing the pattern is analogous to that of the famous Turing patterns in reaction–diffusion systems [33], as we will discuss in some detail below and in Appendix B.
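The radial distribution function described above can be sketched as follows. This is a minimal illustration, assuming a square habitat with periodic boundaries and a maximum distance of at most half the box size; the function name `radial_distribution_2d` and its parameters are hypothetical and the binning choices are not taken from the Methods.

```python
import numpy as np


def radial_distribution_2d(points, box_size, r_max, n_bins=20):
    """Pair-distance histogram normalized by the uniform-random expectation.

    points   : array of shape (n, 2) with coordinates in [0, box_size).
    r_max    : largest distance considered; should not exceed box_size / 2.
    Returns (bin centers, g), where g is close to 1 for a random pattern.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise separation vectors under the minimum-image convention.
    diff = points[:, None, :] - points[None, :, :]
    diff -= box_size * np.round(diff / box_size)
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep each unordered pair once.
    pair_dist = dist[np.triu_indices(n, k=1)]

    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(pair_dist, bins=edges)

    # Expected number of pairs per ring if individuals were placed at random.
    ring_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    expected = 0.5 * n * (n - 1) * ring_area / box_size ** 2

    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / expected
```

For a pattern of regularly spaced colonies, such a function would show a strong peak at short distances (pairs within a colony) followed by oscillations whose period reflects the colony spacing.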

The pattern, however, is not static. In Fig. 2d, enlargements are shown of a small region of the habitat. Consider the colony marked by the red circle. Initially, the colony is mostly blue, indicating that most individuals in this colony are highly altruistic. In time, however, the color degrades from blue to brown, reflecting a decline in altruism, and eventually the colony goes extinct. As best seen in Movies S1–3, this fate is bestowed on many colonies in the simulation. This suggests that altruistic colonies are sensitive to corruption by defectors that occasionally appear by mutation or invasion from neighboring colonies, resulting in a within-colony “tragedy of the commons” [30].

In the same figure, however, green arrows point to what happens after a colony disappears: a different colony nearby initially grows in size and then spontaneously divides in two, locally restoring the hexagonal pattern. Daughter colonies inherit their overall color from their parent colony. Importantly, it appears in Movies S1–3 that colonies with a high mean level of altruism divide particularly rapidly and thus manage to multiply and spread.

All in all, these observations suggest that the colonies themselves behave like Darwinian replicators: they die, reproduce by binary fission, and show heritable variation in their level of altruism. Moreover, in view of the tragedy of the commons seen within colonies, the colony-level dynamics appear crucial for the evolution of altruism.

### 2.3 Colony formation is crucial for the evolution of altruism

So far, the evidence we presented on the dynamics of colonies has been anecdotal and qualitative. To further study the behavior of the model and obtain quantitative results, we now shift to a one-dimensional habitat. Simulations with a one-dimensional habitat are considerably faster, allowing many parameter settings to be explored, and are analyzed more readily, both mathematically and computationally.

#### Automated multilevel lineage tracking

Qualitatively, the behavior of the 1D model is analogous to that of the 2D model. Fig. 3a shows a section of the space-time arena for a simulation with default parameters (see Table 1). The left-hand side of the figure (gray scale) presents the population density. The striped pattern clearly reveals the formation of regularly spaced colonies that can persist for thousands of generations. An algorithm was used to detect these colonies automatically and track them in time (see Methods). The right-hand side of the figure plots the center of mass of the tracked colonies; colors represent the mean level of altruism of the individuals populating the colonies. In the middle part of the figure, density and traces overlap to showcase their consistency. Some traces suddenly end, indicating that the colony went extinct. Such events are detected automatically and indicated with a black square. From the figure, it is apparent that prior to the death of a colony the mean level of altruism always declines, suggesting a tragedy of the commons. In other places, traces suddenly fork; such events are also detected automatically and marked with orange circles. Clearly, the colonies in the 1D habitat reproduce by binary fission (like their 2D counterparts); the daughter colonies inherit their color from their parent. Again it appears that more altruistic colonies divide more frequently.
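As a rough illustration of what automated colony detection in a periodic 1D habitat can look like, the sketch below groups individuals into colonies wherever neighboring individuals are separated by less than a chosen gap. This is explicitly not the algorithm described in the Methods: the gap-based criterion, the function name `detect_colonies_1d`, and the parameter `min_gap` are assumptions made for illustration, and tracking colonies through time is not attempted.

```python
import numpy as np


def detect_colonies_1d(positions, habitat_length, min_gap):
    """Split individuals on a ring into colonies separated by empty gaps.

    Two neighboring individuals belong to different colonies if the empty
    stretch between them is longer than min_gap (a hypothetical criterion).
    Returns a list of index arrays, one per colony.
    """
    positions = np.asarray(positions, dtype=float) % habitat_length
    order = np.argsort(positions)
    sorted_pos = positions[order]

    # Gap from each individual to the next one, including the wrap-around gap.
    gaps = np.diff(np.append(sorted_pos, sorted_pos[0] + habitat_length))
    cuts = np.flatnonzero(gaps > min_gap)
    if len(cuts) == 0:
        return [order]  # no empty stretch: a single colony

    # Rotate so that the sequence starts right after the last cut; then no
    # colony straddles the ends of the array.
    start = cuts[-1] + 1
    rotated = np.roll(order, -start)
    boundaries = np.sort((cuts - start) % len(order)) + 1
    return np.split(rotated, boundaries[:-1])
```

Applying such a function at successive time points and matching colonies by proximity would yield traces in which extinctions appear as ending traces and fissions as forks.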

#### Colonies emerge due to a linear instability and enable altruism

The mechanisms behind the emergence of colonies in the 1D habitat can be studied mathematically using linear stability analysis (LSA). We envision a population of individuals with a fixed level of altruism *ϕ* homogeneously distributed over a large habitat that is populated at carrying capacity. Next we superimpose a tiny periodic density variation with some wavelength *λ* and derive under which conditions this perturbation is expected to grow exponentially, resulting in “colonies”. This also allows us to make approximate predictions on the wavelength of the emerging pattern, *i*.*e*., the distance between colonies. Details are found in Appendix B and Fig. S2.

The LSA reveals that colonies are expected to develop only for certain combinations of the scales of altruism, competition, and motility (Fig. 3b; remember that *σ*_{a} = 1 by definition). As the LSA elegantly demonstrates (Appendix B), the appearance of colonies is determined by a tug of war between these three forces. Altruism by itself tends to amplify differences in local density: areas with a high density contain more altruists, which positively affects the reproduction rate and hence further increases the density. This drives the emergence of colonies. However, this force is weak for density variations with a wavelength shorter than ∼ *σ*_{a}, which average out within the scale of altruism. Resource competition quenches density differences because it suppresses reproduction in densely populated areas. This force, however, is weak for variations with wavelengths shorter than ∼ *σ*_{rc}, which are averaged out within the scale of resource competition. Lastly, random motility also tends to homogenize the density, but this force is ineffective against variations with wavelengths that are larger than ∼ *σ*_{m} because random motion is famously slow at large scales. Together, this means that colonies form only if *σ*_{m} is small compared to the other scales and *σ*_{a} is clearly smaller than *σ*_{rc}, so that wavelengths exist that are too long to be quenched by motility, long enough to be amplified by altruism, but too short to be suppressed by resource competition (see Fig. S2a).
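The averaging argument can be checked numerically. Smoothing a sinusoidal density variation of wavelength *λ* with a Gaussian kernel of standard deviation *σ* multiplies its amplitude by exp(−2π²*σ*²/*λ*²), a standard property of Gaussian smoothing. The sketch below compares this factor with a direct numerical average; it is not the stability analysis of Appendix B, and the function names are hypothetical.

```python
import numpy as np


def kernel_attenuation(wavelength, sigma):
    """Amplitude factor of a sinusoid after Gaussian smoothing (analytic)."""
    return np.exp(-2.0 * (np.pi * sigma / wavelength) ** 2)


def smoothed_amplitude(wavelength, sigma, n_points=20001):
    """Same factor, obtained by numerically averaging cos(2*pi*u/wavelength)
    over a Gaussian kernel centered on a crest of the wave."""
    u = np.linspace(-10.0 * sigma, 10.0 * sigma, n_points)
    du = u[1] - u[0]
    kernel = np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.sum(np.cos(2.0 * np.pi * u / wavelength) * kernel) * du


# Example: a perturbation with wavelength 8, seen through kernels of scale
# 1 (altruism) and 4 (competition), the ratio of scales used in Section 2.2.
felt_by_altruism = kernel_attenuation(8.0, 1.0)
felt_by_competition = kernel_attenuation(8.0, 4.0)
```

In this example the altruism kernel retains most of the perturbation whereas the competition kernel averages almost all of it away, which is the window of wavelengths in which colonies can grow, provided motility is slow enough.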

To test these predictions, Fig. 3c presents results from a large number of simulations using a range of values of *σ*_{m} and *σ*_{rc} (19 × 21 = 399 simulations in total) in which all individuals are given an immutable value of *ϕ* = 0.05. Each simulation quantified to what extent colonies developed by simply measuring the variance in the local population density. The results are as expected: when crossing over from the linearly stable (blue) to the linearly unstable (yellow) region of Fig. 3b the variance in the local density increases precipitously. The wavelengths of the emerging patterns —typically close to 2*σ*_{rc}— also broadly match predictions (Fig. S2c,f). We therefore conclude that the LSA accurately describes and explains the emergence of the colonies.

From the observations of both the 1D and the 2D model it appeared that the emergence of colonies is important for the evolution of altruism. This suggests that appreciable levels of altruism should evolve only in the parameter regime where colonies can emerge for reasonable levels of altruism (the linearly unstable, yellow region of Fig. 3b,c). This is confirmed by a series of simulations for various scales of motility and competition (Fig. 3d). Because the colony formation depends on the existence of altruism, but the persistence of altruism in turn depends on the formation of colonies, the process must pull itself up by the bootstraps. Random mutations plus local reproduction spontaneously result in unstable colonies with modest levels of altruism and high internal levels of drift. This occasionally produces a colony that is altruistic enough to reproduce, which starts to spread rapidly.

Factors other than the spatial length scales clearly also affect whether altruism prevails. If the scale of competition becomes too large relative to the scale of motility, the mean level of altruism suffers (Fig. 3d, *e*.*g*. at *σ*_{rc} = 5 and *σ*_{m} = 0.1). Also, the stability of colonies against corruption by defectors is affected by the rate with which such defectors are created by mutations. In line with this, the mean level of altruism decreases if the mutation probability is increased (Fig. 3e). That said, altruism emerges for a broad range of mutation probabilities.

### 2.4 Quantitative measurement of multilevel selection components

To formally analyze and quantify the role of the colony dynamics in the selection of altruism, we make use of two existing mathematical results. Both are based on subtly different formalizations of the concept of group selection that are sometimes referred to as multilevel selection (MLS) 1 and 2 [13, 35] (see Appendix A for brief derivations).

Each of the two results relies on a different application of the Price equation [34, 36, 37]. The Price equation decomposes the change in the population mean of a trait *ϕ* over a time interval Δ*t* into two parts: the selection differential *S*, which quantifies the contribution of natural selection, and the transmission term *T*, reflecting systematic differences between the trait value of ancestors and their offspring.

MLS 1 is based on the fact that, in a population that is subdivided into groups, the selection differential *S* can be split into two components: *S* = *S*_{within} + *S*_{among}. Here *S*_{within} is the (weighted) mean of the selection differential measured *within* groups, while *S*_{among} is the covariance between a group’s mean trait value and its mean fitness, which can be interpreted as the selection *among* groups.
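The decomposition can be sketched in a few lines. The example assumes that each individual's fitness is its number of descendants at the end of the time interval and that selection differentials are expressed relative to the population mean fitness; the function name `mls1_decomposition` and this normalization are illustrative assumptions rather than the exact procedure of Appendix A.

```python
import numpy as np


def mls1_decomposition(phi, fitness, group):
    """Split the selection differential S into within- and among-group parts.

    phi     : trait value of each individual at the start of the interval.
    fitness : number of descendants of each individual (assumed definition).
    group   : group (colony) label of each individual.
    Returns (S, S_within, S_among) with S = S_within + S_among.
    """
    phi = np.asarray(phi, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    group = np.asarray(group)

    n_total = len(phi)
    mean_w = fitness.mean()

    # Total selection differential: Cov(w, phi) / mean fitness.
    s_total = np.mean(fitness * phi) / mean_w - phi.mean()

    s_within = 0.0
    among_product = 0.0
    for g in np.unique(group):
        member = group == g
        weight = member.sum() / n_total
        w_g, phi_g = fitness[member], phi[member]
        # Within-group covariance between fitness and trait.
        cov_g = np.mean(w_g * phi_g) - w_g.mean() * phi_g.mean()
        s_within += weight * cov_g / mean_w
        among_product += weight * w_g.mean() * phi_g.mean()

    # Size-weighted covariance between group mean fitness and group mean trait.
    s_among = (among_product - mean_w * phi.mean()) / mean_w
    return s_total, s_within, s_among
```

A population in which altruists do worse than their group mates, but groups rich in altruists do better overall, yields a negative within-group and a positive among-group component.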

Our observations suggested that selection within colonies tends to be negative, which is to say that *S*_{within} is systematically negative. To compensate, *S*_{among} would have to be positive as a rule. To test this, we split the simulation illustrated in Fig. 3a into 2000 time intervals of 80 generations each and calculated *S*_{within} and *S*_{among} for each of these time intervals. The results, plotted in Fig. 4a, confirm the expectations. Over the last 1000 time intervals (shaded background in Fig 4), the mean level of altruism no longer changed significantly. (Mean change per time interval: (0.9 ± 2.0) × 10^{−5}, where the uncertainty denotes a 95% confidence interval; see Methods.) But over the same period, the within-colony component of selection was negative during 97.6% of the time intervals, averaging (−3.8 ± 0.4) × 10^{−4}. In contrast, the among-colony component was positive in 98.7% of the time intervals, with a mean of (4.2 ± 0.4) × 10^{−4}. Hence, selection within colonies is indeed negative (reflecting the within-colony tragedy of the commons), but this is compensated by a positive among-colonies component of selection.

The analysis of MLS 1 applies the Price equation to the population of individuals. MLS 2 instead applies it to the population of *colonies*. The mean level of altruism of individuals in a colony, Φ, is now considered a trait of that colony, and the fitness of a colony is defined as the number of offspring *colonies* that it has at the end of the time interval. The Price equation can then be used to describe the change in the mean level of altruism of the colonies. The selection differential *S* now measures whether colonies with a high value of Φ tend to produce more offspring colonies, and hence can be interpreted as the colony-level selection on Φ. The transmission term *T* now quantifies to what extent the Φ-value of offspring colonies systematically differs from those of their ancestral colonies. Hence, *T* characterizes the internal evolution of colonies.
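At the colony level, the two terms of the Price equation can be computed directly from lineage data. The sketch below assumes that, for every colony present at the start of an interval, we know its trait Φ, its number of offspring colonies at the end of the interval, and the mean Φ of those offspring; the function name `colony_price_terms` and the input layout are hypothetical.

```python
import numpy as np


def colony_price_terms(parent_trait, n_offspring, offspring_mean_trait):
    """Return (S, T) such that S + T is the change in mean colony trait.

    parent_trait         : trait of each ancestral colony.
    n_offspring          : number of offspring colonies of each ancestor
                           (0 if it died, 1 if it persisted, 2 after fission).
    offspring_mean_trait : mean trait of each ancestor's offspring colonies;
                           entries for ancestors without offspring are ignored.
    """
    z = np.asarray(parent_trait, dtype=float)
    w = np.asarray(n_offspring, dtype=float)
    z_off = np.asarray(offspring_mean_trait, dtype=float)
    mean_w = w.mean()

    # Selection: covariance between colony fitness and colony trait.
    selection = np.mean(w * z) / mean_w - z.mean()

    # Transmission: fitness-weighted mean change from ancestor to offspring.
    change = np.where(w > 0, z_off - z, 0.0)
    transmission = np.mean(w * change) / mean_w
    return selection, transmission
```

If more altruistic colonies divide more often while every colony slowly loses altruism internally, the first term is positive and the second negative.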

We suspected that colonies with a higher mean level of altruism reproduce more frequently, and hence that the colony-level selection is predominantly positive. To test this, we applied the MLS 2 framework to the same 2000 time intervals used for the MLS 1 analysis, making use of the automatically acquired colony-level lineage traces to measure the fitness values of the colonies. The result, plotted in Fig. 4b, again confirms the expectations. Over the last 1000 time intervals, the mean level of altruism of colonies no longer changed significantly. (Mean change per time interval: (0.9 ± 2.0) × 10^{−5}.) In the same window, colony-level selection was positive in 82.2% of the time intervals, and negative in only 1.6%. (In the remaining intervals, none of the colonies reproduced or died, resulting in a colony-level selection of precisely 0.) Its mean value was (4.7 ± 0.5) × 10^{−4}. This was compensated by the colony-level transmission term, which was negative in 97.5% of the intervals, with an average of (−4.6 ± 0.4) × 10^{−4}. From this we conclude that the mean level of altruism of individual colonies tends to decrease with time, compensated by an increased rate of reproduction of colonies with a higher level of altruism.

## 3 Discussion

Above, we have presented a simple model of the evolution of altruism. Despite its simplicity, the model displays complex dynamics. Under suitable parameter settings, a linear instability permits a process of evolutionary bootstrapping in which colonies of altruists emerge that themselves reproduce by binary fission. Quantitative measurements demonstrated that defectors have the upper hand within colonies, but that colonies with a higher mean level of altruism reproduce more frequently. The net effect is that a significant level of altruism persists in the population.

Complex biological systems invariably show a hierarchical organization, with collectives of individuals at one level forming entities at a higher level. The evolution of such hierarchical structures involves transitions in which collectives of individuals start to behave as Darwinian individuals. An important open question in evolutionary theory is what mechanisms and conditions allow such transitions in individuality to take place [15]. While many theoretical models study evolution in hierarchical population structures, most take this structure as given (e.g., [14, 38]). In other models the spatial dynamics do spontaneously produce hierarchical structures, but with aggregates that cannot naturally be considered Darwinian entities because they are either too short-lived or do not replicate in a clear-cut sense (e.g., [39–41]). In yet other models, the formation of collectives is initially “scaffolded” by preexisting environmental structure [15]. In this light, a distinguishing feature of the current model is that group-level Darwinian replicators emerge spontaneously by self-organization; to our knowledge, few other models have this property (but see [42]). The formation of spatial density patterns is very common in nature and can result from various mechanisms [43]. The model that we presented reconfirms that, even in the absence of a preexisting ecological scaffold, such ecological self-organization can naturally result in competition and replication at the level of aggregates. Possibly, once natural selection is able to act at the level of such emergent aggregates, this opens an avenue towards a more complete transition of individuality.

Classical theory argues that the relative scales associated with motility, social interactions and competition are of crucial importance for the evolution of altruism (see Introduction). The results of the model confirm this: as expected, altruism evolves only if motility is limited and the scale of competition is larger than the scale of altruism. The significance of the scales, however, is much more involved than anticipated because they largely determine the emerging ecological patterns, which in turn shape the evolutionary dynamics. Indeed, the evolutionary dynamics support altruism (Fig. 3d) only if the ecological dynamics support the formation of colonies (Fig. 3c). This emphasizes that it is unlikely that the eco-evolutionary behaviors of complex dynamical models can be summarized by generic rules of thumb.

Superficially, the model is reminiscent of the ecological public-good (EPG) games of Wakano *et al* [32], which also produce intricate spatial patterns, including Turing patterns. But upon closer inspection the two models differ fundamentally in multiple aspects. In the EPG model all interactions are entirely local. Its pattern formation depends on a coexistence equilibrium point that has no counterpart in the current model, and parameters are chosen such that defectors are not viable without altruists. In addition, a necessary condition for the Turing patterns of the EPG model is that defectors are more motile than altruists; this distinction does not exist in our model. Importantly, the authors do not report that their colonies replicate, although perhaps such behavior could be obtained in a particular parameter regime. To sum up, the mechanisms producing the spatial patterns qualitatively differ between the two models and the EPG model does not display similar multilevel dynamics.

The concepts of multilevel and group selection and their relation to inclusive fitness theory are the subject of a longstanding and fierce debate [44–48]. Here, we do not engage in this debate. Given the remarkable colony dynamics in the model, the multilevel perspective is particularly apt and allowed us to test relevant hypotheses. But several other theoretical frameworks and fitness-accounting schemes [9], including inclusive fitness theory, could be applied as well, to address different questions. We do note that the model violates multiple assumptions that are frequently made in the derivation of inclusive fitness results, since neither population size nor density are constant, the benefits of altruism are non-linear, interaction strengths are non-binary, competition is local, and the scales of altruism and competition differ. Many standard results therefore do not apply directly.

In the literature, multiple conceptualizations of multilevel selection exist including MLS 1, MLS 2, and Contextual Analysis [35, 49]. As to which of these methods best captures the concept of multilevel selection no consensus has yet been reached [13]. Above, we have used the decompositions of MLS 1 and 2 essentially as alternative descriptive statistics, each measuring different but well-defined properties of the system. For example, the colony-level selection term of MLS 2 confirms that more altruistic colonies have more offspring, irrespective of whether one considers this term a (or even the sole) proper measure of the concept of group selection. Similarly, the within-group contribution to selection of MLS 1 is a useful measure to test the hypothesis that selection within colonies is negative on average. We could have applied Contextual Analysis too, to confirm associations between individual or contextual properties and fitness [24, 35]. Their conceptual caveats notwithstanding, each of these methods provides a different vantage point and potentially new insights. The features of our model make it ideally suited to illustrate and test the various approaches of multilevel selection.

Models of evolution in subdivided populations usually assume that the fitness of individuals depends solely on their own traits and those of other group members. Any spatial structure at the sub-colony scale is thus ignored. Moreover, it is typically assumed that groups compete equally among each other, ignoring spatial structure at scales beyond the size of a group. In applications, these assumptions are approximations at best, and they certainly do not hold in our model because colonies are not homogeneous and the colony-level dynamics clearly result in spatial assortment of colonies (see Fig. 2b and Fig. 3a). This does not invalidate MLS theory, but serves to remind us that we cannot expect to perceive a complete picture from a single vantage point. In a forthcoming article, we describe a complementary multi*scale* approach that allows natural selection to be decomposed into contributions at each spatial scale [50]. This approach can be used to analyze the importance of structures below and beyond the colony level; it is also applicable to models that generate spatial structures such as spirals and waves that are relevant to selection but perhaps too ephemeral to be conceptualized as groups. More generally, over the years many evolutionary concepts have been formalized mathematically [51], but these results are rarely applied to computational and individual-based models. Despite each formalism’s limitations, together they provide a valuable toolbox that allows models to be scrutinized quantitatively from multiple perspectives [35, 52]. We hope that future studies take full advantage of its potential.

## 4 Methods

### 4.1 Detailed description of the model

#### Definition of the model

We envision a population of individuals living in a large habitat, which can be one- or two-dimensional. Each individual is fully characterized by its spatial coordinates plus the value of a single quantitative trait *ϕ*, which indicates its investment in altruistic behavior. The behavior of the individuals is defined by just four stochastic processes: death, motility, reproduction, and heredity with mutation.

**Death** strikes each individual at a fixed (Poisson) rate *d*; if an individual dies, it disappears from the population. The average lifespan of an individual is *d*^{−1}, which we call the *generation time*.

**Motility** is modeled as unbiased diffusion with diffusion constant *k*_{D}. It follows that, in a generation time, the root-mean-square displacement of an individual in each spatial dimension is $\sigma_\mathrm{m} = \sqrt{2 k_D / d}$, the *scale of motility*. It can be interpreted as the “typical” distance traveled by an individual during its lifetime. Note that we ignore that individuals take up space: nothing prevents multiple individuals from being at the same position at the same time.

**Reproduction** is asexual. When an individual reproduces, a new individual is placed at the same position as the parent. The rate of reproduction of each individual is negatively affected by the level of altruism of the individual itself and by competition for resources with other individuals; in contrast, it is positively affected by the altruism of others in its local environment. To implement these effects mathematically, we make use of two quantities that we will now introduce.

First, we define the local population density *D*(**y** | *σ*_{rc}) at position **y** as a conventional Kernel Density Estimate:

$$D(\mathbf{y} \mid \sigma_\mathrm{rc}) = \sum_i G_\mathrm{rc}(\mathbf{y} - \mathbf{x}_i \mid \sigma_\mathrm{rc}). \tag{1}$$

Here, the summation runs over all individuals *i* in the population; **x**_{i} is the position of individual *i*; and the kernel function *G*_{rc}(**y** | *σ*_{rc}) is the Gaussian (normal) distribution (univariate or bivariate, depending on the dimensionality of the habitat) with standard deviation *σ*_{rc}. By this definition, the population density at position **y** is high if many individuals are found within a distance of order *σ*_{rc} from **y**. The parameter *σ*_{rc} is called the *scale of competition* because, as explained below, it determines the range of competitive interactions.

Second, the altruism experienced by an individual at position **y** is measured as

$$A(\mathbf{y} \mid \sigma_\mathrm{a}) = \sum_i \phi_i \, G_\mathrm{a}(\mathbf{y} - \mathbf{x}_i \mid \sigma_\mathrm{a}). \tag{2}$$

This is again a KDE, except that each individual *i* is weighted by its level of altruism *ϕ*_{i}. It is convenient to think of *A*(**y** | *σ*_{a}) as the availability of some public good that organisms secrete locally in proportion to their level of altruism. The summation in Eq. 2 runs over all individuals, and the contribution of each individual to the public good at position **y** decreases with its distance to **y** according to a Gaussian kernel function *G*_{a}. The standard deviation *σ*_{a} of the kernel function is referred to as the *scale of altruism* because it determines the range of altruistic interactions.
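The two kernel sums defined above can be illustrated with a minimal Python sketch for a 1D habitat. It is not the implementation used for the simulations: the function names are hypothetical, the sums are evaluated directly rather than by convolution, and periodicity is approximated by the nearest periodic image, which assumes that the habitat size `L` is much larger than the kernel width.

```python
import math

def gaussian(r, sigma):
    """Univariate normal density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def periodic_distance(y, x, L):
    """Shortest distance between y and x on a ring of circumference L."""
    r = abs(y - x) % L
    return min(r, L - r)

def density(y, positions, sigma_rc, L):
    """D(y | sigma_rc): unweighted sum of Gaussian kernels over all individuals."""
    return sum(gaussian(periodic_distance(y, x, L), sigma_rc) for x in positions)

def public_good(y, positions, phis, sigma_a, L):
    """A(y | sigma_a): the same sum, each kernel weighted by the altruism phi_i."""
    return sum(phi * gaussian(periodic_distance(y, x, L), sigma_a)
               for x, phi in zip(positions, phis))
```

Because the kernels are not divided by the population size, `density` integrates to the number of individuals, so that it can be compared directly with the parameter *K*.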

In terms of these definitions, the full equation for the reproduction rate *g*_{i} of individual *i* reads
Here, *g*_{0} is the basal reproduction rate. In the subsequent factor (labeled “factor 1”), the term −*cϕ* implements a deficit in the reproduction rate owing to the individual’s investment in altruism; the parameter *c* determines the price of altruism. The last term in factor 1 implements the advantage obtained from the altruism of others. The advantage grows as *b*_{0}*A*(**x** | *σ*_{a}) if *A*(**x** | *σ*_{a}) is small but saturates at *b*_{max} if *A*(**x** | *σ*_{a}) is large. Factor 2 introduces resource competition: it decreases linearly with the local population density *D*(**x** | *σ*_{rc}) such that reproduction is locally inhibited when the density approaches *K*. (In practice, the population density stabilizes somewhat below *K*, where the average reproductive rate equals the death rate *d*.) The max[.] function is required because both factors 1 and 2 could in rare cases become negative; in that case, *g*_{i} is set to 0.
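The structure of the reproduction rate described above can be sketched in Python as follows. This is a hedged illustration only. The saturating benefit is written in a Michaelis–Menten-like form, which is an assumption chosen because it grows as *b*_{0}*A* for small *A* and saturates at *b*_{max}; other forms share these limits. Clamping each factor at zero separately is likewise an assumption, and the default values of `b0` and `b_max` are hypothetical. The defaults for `g0` and `K` are the 2D values quoted in the text, and `c = 1` follows from the choice of units.

```python
def saturating_benefit(A, b0=1.0, b_max=5.0):
    """ASSUMED functional form: slope b0 at A = 0, plateau b_max for large A."""
    return b_max * b0 * A / (b_max + b0 * A)

def reproduction_rate(phi, A, D, g0=5.0, c=1.0, b0=1.0, b_max=5.0, K=40.0):
    """Reproduction rate g_i from the individual's altruism phi, the local
    public good A, and the local population density D."""
    factor1 = 1.0 - c * phi + saturating_benefit(A, b0, b_max)  # cost and benefit
    factor2 = 1.0 - D / K                                       # resource competition
    # Negative factors are clamped so that the rate is never negative
    # (assumed reading of the max[.] function described in the text).
    return g0 * max(factor1, 0.0) * max(factor2, 0.0)
```

For a homogeneous population of defectors (`phi = 0`, `A = 0`), the rate equals the death rate *d* = 1 at density (1 − *d*/*g*_{0})*K* = 32, in line with the steady state discussed under the initial conditions.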

**Heredity and mutation** are implemented as follows. Upon reproduction, the offspring usually inherits the value *ϕ* of the parent, but with probability *µ* a mutation occurs. In that case, the *ϕ*-value of the offspring is determined by adding a random change *δϕ* to the value of the parent. The magnitude |*δϕ*| is drawn from an exponential distribution with mean *m*, and the sign of *δϕ* is positive or negative with equal probability. A concern with this procedure, however, is that the resulting trait value *ϕ* of the offspring can become negative. In simulations with a 2D habitat this was not permitted, and in such events the value was instead set to 0. Although this is a natural choice, it introduces a mutational bias (see Fig. S1), which complicates some of the analyses performed on the 1D version of the model (in particular, Fig. 3d,e). In the simulations of the 1D system, *ϕ* was therefore allowed to become negative, but the behavior of the individuals was determined by the “effective” value *ϕ*_{E} = max(*ϕ*, 0) rather than by *ϕ* itself. In Fig. 3d,e the mean of *ϕ*_{E} is plotted. In Fig. 3a the colony mean of *ϕ* is plotted, but the distinction is immaterial because in this window of the simulation negative values of *ϕ* are rare.
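The mutation rule and its two variants can be summarized in a short Python sketch. The function names and the flag `clamp_at_zero` are hypothetical; the flag switches between the treatment described for the 2D simulations (negative values set to 0) and the one described for the 1D simulations (negative values kept, behavior determined by the effective value).

```python
import random

def mutate(phi_parent, mu, m, rng, clamp_at_zero):
    """Return the trait value of one offspring of a parent with trait phi_parent."""
    if rng.random() >= mu:
        return phi_parent                 # no mutation: trait inherited unchanged
    delta = rng.expovariate(1.0 / m)      # magnitude, exponential with mean m
    if rng.random() < 0.5:
        delta = -delta                    # sign is + or - with equal probability
    phi = phi_parent + delta
    return max(phi, 0.0) if clamp_at_zero else phi

def effective_phi(phi):
    """Effective trait value phi_E = max(phi, 0)."""
    return max(phi, 0.0)
```

With `clamp_at_zero=True`, mutations of parents near *ϕ* = 0 are more likely to increase than to decrease the trait value, which is the mutational bias mentioned above.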

#### Units and parameter reduction

We are free to choose convenient units for length, time, and the trait *ϕ*; thus, three parameters can be eliminated. First, we choose the unit of length such that the scale of altruism *σ*_{a} equals 1 by definition. The two other length scales that exist in the model, *σ*_{rc} and *σ*_{m}, are therefore expressed relative to *σ*_{a}. Second, units of time are chosen such that the generation time *d*^{−1} is 1. This implies that the death rate *d* also equals 1 by definition. Third, the unit of the trait value *ϕ* is chosen such that the parameter *c* (see Eq. 3) equals 1. This simplifies the interpretation of *ϕ*: an individual with trait value *ϕ* directly sacrifices a fraction *ϕ* of its basal reproductive rate to the public good. Note, however, that the summation in Eq. 2 runs over *all* individuals, so that each individual also benefits from its *own* altruism. In the literature, a distinction is sometimes made between soft and hard altruism, depending on whether the direct benefits that altruists reap from their behavior outweigh the direct costs [38]. We always choose parameters such that the direct costs far exceed the direct benefits, modeling hard altruism. The contribution of individual *i* to the public good *A*(**x**_{i} | *σ*_{a}) at its own position **x**_{i} is given by

$$\phi_i \, G_\mathrm{a}(\mathbf{0} \mid \sigma_\mathrm{a}) = \begin{cases} \phi_i / \left(\sqrt{2\pi}\,\sigma_\mathrm{a}\right) & \text{(1D habitat)}, \\ \phi_i / \left(2\pi\sigma_\mathrm{a}^2\right) & \text{(2D habitat)}. \end{cases}$$

From Eq. 3 and the fact that *σ*_{a} = 1 by definition, it then follows that the reproductive advantage due to one’s own altruism is bounded by $b_0 \phi_i / \sqrt{2\pi}$ (in the 1D habitat) or *b*_{0}*ϕ*_{i}/(2*π*) (in the 2D habitat). This reproductive advantage cannot outweigh the deficit of *cϕ*_{i} = *ϕ*_{i} unless $b_0 > \sqrt{2\pi}$ (in the 1D case) or *b*_{0} > 2*π* (in the 2D case); we steer clear of this regime by choosing *b*_{0} appropriately small.
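The bound on the direct benefit can be checked numerically with a few lines of Python. The sketch below only uses the peak value of a Gaussian kernel, (2*πσ*^{2})^{−dim/2}; the function names are hypothetical, and the bound relies on the benefit never exceeding *b*_{0}*A*, as implied by the saturation described for Eq. 3.

```python
import math

def kernel_peak(sigma, dim):
    """Value at the origin of an isotropic Gaussian kernel in dim dimensions."""
    return (2.0 * math.pi * sigma ** 2) ** (-dim / 2.0)

def max_self_benefit(b0, phi, sigma_a=1.0, dim=1):
    """Upper bound on the reproductive advantage from one's own altruism."""
    return b0 * phi * kernel_peak(sigma_a, dim)

def is_hard_altruism(b0, c=1.0, sigma_a=1.0, dim=1):
    """True if the direct cost c*phi exceeds the bound on the direct benefit."""
    return b0 * kernel_peak(sigma_a, dim) < c
```

With *σ*_{a} = *c* = 1, `is_hard_altruism` returns `True` for *b*_{0} below √(2*π*) ≈ 2.51 in 1D and below 2*π* ≈ 6.28 in 2D.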

### 4.2 Implementation of the simulations

#### Simulation scheme

In the simulations, continuous space is approximated by a linear grid (in the 1D habitat) or a square grid (in the 2D habitat) with grid cells of linear size *δx*. Periodic boundary conditions are imposed. Time is divided into computational time steps *δt*.

During each computational time step, the state of the system at time *t* + *δt* is constructed based on the state at time *t* by the following sequence of steps:

##### Step 1. Calculate reproduction rates

First, the density *D*(**x** | *σ*_{rc}) and the availability of public good *A*(**x** | *σ*_{a}) at each position **x** are computed, taking into account the periodic boundary conditions. After this, the reproduction rates *g*_{i} of all individuals can be calculated.

##### Step 2. Reproduction and mutation

Each individual *i* in the field reproduces with probability *g*_{i}*δt*. The offspring is mutated with probability *µ*, as described above.

##### Step 3. Death

Each individual subsequently dies with probability *dδt*.

##### Step 4. Motility

Each individual is displaced in each spatial dimension by a distance drawn at random from a discrete approximation of a Gaussian distribution with mean 0 and standard deviation $\sqrt{2 k_D \, \delta t}$.
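Steps 2 to 4 of the scheme can be sketched in Python for a 1D habitat as follows. This is a simplified illustration, not the Fortran implementation. It assumes that the reproduction rates from Step 1 are supplied by the caller, that newborns are subject to death in the same time step, and that the discrete Gaussian is obtained by rounding a continuous Gaussian draw to the nearest multiple of the grid spacing. All names are hypothetical.

```python
import math
import random

def time_step(individuals, rates, d, k_D, dt, dx, L, mu, m, rng):
    """Advance a 1D population by one computational time step.

    individuals: list of (x, phi) tuples; rates: reproduction rate g_i of each.
    """
    # Step 2: reproduction (probability g_i * dt) with mutation (probability mu).
    offspring = []
    for (x, phi), g in zip(individuals, rates):
        if rng.random() < g * dt:
            child_phi = phi
            if rng.random() < mu:
                child_phi += rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / m)
            offspring.append((x, child_phi))   # placed at the parent's position
    population = individuals + offspring

    # Step 3: death with probability d * dt (assumed to include newborns).
    survivors = [ind for ind in population if rng.random() >= d * dt]

    # Step 4: motility, a Gaussian step with standard deviation sqrt(2 k_D dt),
    # rounded to the grid and wrapped by the periodic boundary conditions.
    sd = math.sqrt(2.0 * k_D * dt)
    moved = []
    for x, phi in survivors:
        step = round(rng.gauss(0.0, sd) / dx) * dx
        moved.append(((x + step) % L, phi))
    return moved
```

A call such as `time_step(pop, rates, d=1.0, k_D=0.5, dt=0.08, dx=0.0125, L=819.2, mu=0.001, m=0.005, rng=random.Random(1))` advances the population by one step; the values of `k_D`, `mu`, and `m` in this call are placeholders, not the defaults of Table 1.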

#### Initial conditions

The steady-state population density of a population of defectors (*ϕ*_{i} = 0) is approximately (1 − *d*/*g*_{0})*K*, which can be derived by solving *g*_{i} = *d* under the assumption of a homogeneous population distribution. Therefore, the initial condition was constructed by placing (1 − *d*/*g*_{0})*KL* (in the 1D habitat) or (1 − *d*/*g*_{0})*KL*^{2} (in the 2D habitat) defectors at uniformly random positions, where *L* is the linear size of the habitat.

#### Default parameters

The default values of biological parameters are listed in Table 1. Here, we also provide the computational parameters.

For simulations of the 2D model, a square habitat of linear size *L* = 102.4 was used; using *δx* = 0.1 this amounted to ≈ 10^{6} grid cells. Using the default parameter values *K* = 40 and *g*_{0} = 5, the total population size was approximately *n* = (1 − *d*/*g*_{0})*KL*^{2} ≈ 3.4 × 10^{5}. The simulations were run for *T* = 8 000 generations, with time steps *δt* = 0.08.

For simulations of the 1D model, a habitat of size *L* = 819.2 was used with *δx* = 1/80, resulting in 65 536 grid cells. Using the default parameter values *K* = 100 and *g*_{0} = 5, the total population size was approximately (1 − *d*/*g*_{0})*KL* ≈ 6.6 × 10^{4}. These simulations were run for *T* = 160 000 generations, again with time steps *δt* = 0.08.
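The population sizes and grid sizes quoted above follow directly from the steady-state density (1 − *d*/*g*_{0})*K*. The arithmetic can be reproduced with a short Python sketch; the function names are hypothetical.

```python
def steady_state_density(K, g0, d=1.0):
    """Density at which g_i = d for a homogeneous population of defectors."""
    return (1.0 - d / g0) * K

def initial_population(K, g0, L, dim, d=1.0):
    """Number of defectors placed in a habitat of linear size L."""
    return steady_state_density(K, g0, d) * L ** dim

n_2d = initial_population(K=40, g0=5, L=102.4, dim=2)    # about 3.4e5
n_1d = initial_population(K=100, g0=5, L=819.2, dim=1)   # about 6.6e4
cells_2d = round(102.4 / 0.1) ** 2                       # 1024 x 1024 grid cells
cells_1d = round(819.2 * 80)                             # delta x = 1/80
```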

Additional settings were used to calibrate the automated recognition of colonies; see below.

### 4.3 Computational procedures

#### Calculating cumulative effects of selection, drift, and mutational bias

The mathematical framework used to quantify selection, drift and mutational bias is described in Appendix A.2. We applied this calculation to each time step of the simulations, so that the cumulative effect of each of the evolutionary forces could be tracked (Fig. 2a and S1).

For the analysis we need to obtain, for each individual *i* present right before the computational time step, the expectation value 𝔼(*W*_{i}) of the number of offspring *W*_{i} it will have after the time step (also counting the individual itself if it survives). To do so, first the growth rate *g*_{i} was calculated and subsequently the reproduction and death probabilities *P*_{r} = *g*_{i}*δt* and *P*_{d} = *dδt* over this time step. From the simulation scheme (see above) the expectation value can then be derived:
This expression is used in the calculations.

We note that this expectation value is conditioned on the current state of the simulation, in particular the population density and availability of public good at the position of the individual. In other words, only the effects of the inherent randomness of reproduction and death given the state of the local neighborhood are counted as random drift; the fact that the state of the local neighborhood itself is also affected by random events in the past, such as the stochastic motility and demographics of others, is not. (See also Appendix A.2.)

#### Calculating the local population density

To efficiently calculate the local population density (Kernel Density Estimate or KDE) *D*(**x** | *σ*_{rc}) (Eq. 1), first a matrix was constructed that specifies, for each position in the habitat, the number of individuals at that position. The KDE at each position, taking into account the periodic boundary conditions, is the circular convolution of this occupancy matrix with the periodic summation of the (discretized approximation of the) Gaussian kernel *G*_{rc}(**x** | *σ*_{rc}). To perform this convolution, we use the Circular Convolution Theorem, which states that the circular convolution of two matrices can be obtained by first calculating their Discrete Fourier Transforms (DFTs) and then calculating the inverse DFT of the element-wise product of these transforms.
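The Circular Convolution Theorem can be demonstrated on a small 1D example in Python. The sketch below uses a naive O(*X*^{2}) DFT from the standard library purely for illustration; a production code would use a fast Fourier transform library instead. The function names and the truncation of the periodic summation to a few images are assumptions.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def circular_convolve_fft(a, b):
    """Circular convolution as the inverse DFT of the product of the DFTs."""
    fa, fb = dft(a), dft(b)
    return [z.real for z in dft([x * y for x, y in zip(fa, fb)], inverse=True)]

def circular_convolve_direct(a, b):
    """Reference implementation: explicit sum with indices taken modulo n."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def periodic_gaussian_kernel(n_cells, dx, sigma, images=3):
    """Gaussian kernel on the grid, summed over a few periodic images."""
    L = n_cells * dx
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((i * dx + w * L) / sigma) ** 2)
                for w in range(-images, images + 1)) / norm
            for i in range(n_cells)]
```

Convolving an occupancy vector such as `[0, 2, 0, 0, 1, 0, 0, 0]` with `periodic_gaussian_kernel(8, 1.0, 1.0)` gives the same density through either route; replacing the occupancy counts by the summed trait values per grid cell gives the public good *A* in the same way.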

#### Calculating the availability of public good

The public good available at each position, *A*(**x** | *σ*_{a}), is calculated in a similar way. First a matrix is constructed that contains, for each position in the habitat, the sum of the trait values of all individuals present at that position. The value of *A*(**x** | *σ*_{a}) for each position is now obtained as the convolution of this matrix with the (discretized approximation of the) Gaussian kernel *G*_{a}(**x** | *σ*_{a}), again using the Circular Convolution Theorem.

#### Calculating the radial distribution function

The radial distribution function or pair-correlation function *g*(*r*) is defined as the *observed* number of pairs of individuals separated by a distance *r*, relative to the *expected* number under the null model assuming that each individual is placed at a random position.

In the 1D case, the distance *r* can only take on values *kδx*, where *k* is a non-negative integer. Call the population size *n* and the size of the habitat in grid cells *X*. To calculate the expected number of pairs at distance *r* = *kδx*, written as *E*(*r*), we note that the number of individuals *o*_{x} at position *x* is binomially distributed under the null model. Its expectation value is 𝔼[*o*_{x}] = *n*/*X* and hence *E*(*r*) = *n*^{2}/*X*. (Here, we used that the occupancies of different sites are to good approximation independent.) The observed number of pairs of individuals found at a distance *r* = *kδx*, called *O*(*r*), is precisely given by the auto-correlation of the occupancy matrix, which is again efficiently calculated using the Circular Convolution Theorem. For each value *r* = *kδx*, the radial distribution function is then obtained as *g*(*r*) = *O*(*r*)/*E*(*r*).
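A minimal Python sketch of the 1D calculation is given below. It evaluates the auto-correlation by a direct sum rather than through Fourier transforms, and it assumes that the expected number of pairs at lag *k* ≥ 1 is *n*^{2}/*X*, i.e., *X* site pairs each contributing (*n*/*X*)^{2} under independent occupancies. The function names are hypothetical.

```python
def observed_pairs(occupancy, k):
    """Auto-correlation of the occupancy vector at lag k on a periodic grid."""
    X = len(occupancy)
    return sum(occupancy[x] * occupancy[(x + k) % X] for x in range(X))

def radial_distribution_1d(occupancy, k):
    """g(r) at r = k * dx for k >= 1: observed pairs relative to the null model."""
    n = sum(occupancy)
    X = len(occupancy)
    expected = n * n / X   # assumed null expectation for independent sites
    return observed_pairs(occupancy, k) / expected
```

A uniformly occupied grid gives *g*(*r*) = 1 at every lag, whereas a clustered configuration such as `[5, 5, 0, 0, 0, 0, 0, 0, 0, 0]` gives values above 1 at short distances and 0 at distances larger than the cluster.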

In the 2D case, the rectangular grid imposes that *r* can only take on values such that *r*^{2} = (*a*^{2} + *b*^{2})*δx*^{2}, where *a* and *b* are integers. In addition, in calculating the expectation under the null model, the frequency *F*(*r*) with which each distance occurs in the grid has to be taken into account. (E.g., the distance 5*δx* occurs three times more often than the distance 6*δx*.) Under the same assumptions as made for the 1D case, the expectation is *E*(*r*) = *F*(*r*)*n*^{2}/*X*^{2}. To calculate the observed number of pairs *O*(*r*), the auto-correlation matrix of the occupancy matrix is used. Then *g*(*r*) = *O*(*r*)/*E*(*r*) is calculated for each admissible value of *r*. To obtain the plot in Fig. 2c, the distances were subsequently binned.
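The frequency *F*(*r*) can be obtained by counting the integer displacement vectors (*a*, *b*) of a given squared length. The Python sketch below does this by brute force; the function names are hypothetical, and *X* is interpreted as the linear size of the grid in cells, which is an assumption.

```python
def lattice_vector_count(r_squared, max_offset):
    """Number of integer vectors (a, b) with a^2 + b^2 == r_squared."""
    return sum(1
               for a in range(-max_offset, max_offset + 1)
               for b in range(-max_offset, max_offset + 1)
               if a * a + b * b == r_squared)

def expected_pairs_2d(r_squared, n, X, max_offset):
    """Null expectation E(r) = F(r) * n^2 / X^2 for squared distance r_squared."""
    return lattice_vector_count(r_squared, max_offset) * n * n / (X * X)
```

For example, the distance 5*δx* is realized by 12 vectors, such as (±5, 0), (0, ±5), (±3, ±4), and (±4, ±3), whereas the distance 6*δx* is realized by only 4, which reproduces the factor of three mentioned above.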

#### Calculating the terms of MLS 1 and MLS 2

To obtain Figs. 4 and S3, we divided the simulation into 2 000 time intervals of Δ*t* = 80 generations and applied the analyses of MLS 1 and 2 to each time interval. The mathematical expressions for MLS 1 and 2 are briefly summarized in Appendix A.4. Here follows a description of the computational methods used.

For concreteness, let us focus on a particular interval (*t*_{1}, *t*_{2}]. The first step is to calculate the selection differential *S*, defined as the covariance of *ϕ* and relative fitness *w* (Eq. A.2 in Appendix A.1). For the purpose of this analysis, the relative fitness *w*_{i} of an individual *i* living at time *t*_{1} is the number of offspring it has at time *t*_{2} (the absolute fitness *W*_{i}), divided by the population mean $\overline{W}$. To find these offspring numbers, each individual at time *t*_{1} was assigned a unique ID that was subsequently inherited by all offspring. At time *t*_{2}, a frequency table of ID values was constructed, which directly provided the fitness of each individual at time *t*_{1}. With this information, *S* can be calculated directly.
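The bookkeeping described above can be sketched in Python with a frequency table of inherited IDs. The function name is hypothetical, and IDs are assumed to be the integers 0 to *n* − 1, assigned in the order of the individuals at time *t*_{1}.

```python
from collections import Counter

def selection_differential(phis_t1, ancestor_ids_t2):
    """Covariance of trait value and relative fitness over one interval.

    phis_t1: trait values of the individuals alive at t1 (index = unique ID).
    ancestor_ids_t2: for each individual alive at t2, the ID it inherited.
    """
    n = len(phis_t1)
    counts = Counter(ancestor_ids_t2)            # frequency table of IDs at t2
    W = [counts.get(i, 0) for i in range(n)]     # absolute fitness W_i
    W_mean = sum(W) / n
    w = [W_i / W_mean for W_i in W]              # relative fitness w_i
    phi_mean = sum(phis_t1) / n
    # Cov(w, phi) = mean(w * phi) - mean(w) * mean(phi), with mean(w) = 1.
    return sum(w_i * phi for w_i, phi in zip(w, phis_t1)) / n - phi_mean
```

For two individuals with trait values 0 and 1 that leave 1 and 3 descendants, respectively, the function returns 0.25: the more altruistic individual was the fitter one over this interval.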

The analysis of MLS 1 splits *S* into two parts (see Eq. A.8 in Appendix A.3). It is sufficient to calculate the second term, *S*_{among}, after which the first follows as *S*_{within} = *S* − *S*_{among}. To calculate *S*_{among}, first the geographical borders of all colonies at time *t*_{1} were identified; the algorithm used for this is described in the next section. Next, each individual at *t*_{1} was assigned to a colony. Then, for each colony *j*, we calculated its population size *n*_{j}, its mean relative fitness {*w* | *j*}_{w}, and its mean trait value {*ϕ* | *j*}_{w}. At this point, *S*_{among} could be calculated from its definition.

The analysis of MLS 2 describes the dynamics from the perspective of the colonies (see Eq. A.9). To determine the fitness of the colonies present at time *t*_{1}, we had to count how many offspring *colonies* they have at time *t*_{2}. This required that we define the borders between the colonies at time *t*_{2}, but also that we trace the ancestor colony at *t*_{1} for each offspring colony at *t*_{2}; the algorithm used is described in the next section. The other ingredient of Eq. A.9 is the trait value Φ of the colonies. Because Φ_{j} = {*ϕ* | *j*}_{w} (see section A.4), these quantities were already calculated for MLS 1 and both terms of Eq. A.9 can be evaluated directly.

#### Automated recognition and tracking of colonies

To perform the multilevel selection analysis, the simulation had to automatically recognize colonies and track their ancestry. Because existing clustering algorithms are inefficient for 1D systems and/or difficult to adjust to our needs, we used our own heuristics.

Where to draw the border between neighboring colonies, and when to conclude that one colony has divided into two, is to some degree arbitrary. The results of the analyses, however, do not depend sensitively on such details as long as we use reasonable definitions and apply them consistently.

The basic idea is to identify the borders between colonies with local minima of the population density. However, local minima can also occur temporarily within colonies due to random fluctuations, and such minima should not be confused with true borders between colonies. To solve this, one might exclude local minima if the density at their position exceeds a set threshold, so that only “deep” minima are considered. Such a simple threshold rule can identify most colonies correctly, but issues arise during the binary fission of colonies. During this process the depth of the local minimum that separates the two daughter colonies fluctuates, and hence it is likely to cross the threshold multiple times. Consequently, the threshold rule tends to record multiple events of fission and fusion during a single process of colony division. Similarly, when a dwindling colony is about to disappear, the threshold rule tends to infer series of deaths and resurrections of the same colony.

To prevent this, the algorithm that was used in the simulations in fact uses two density thresholds: a low and a high one, *T*_{low} and *T*_{high}. When a new local minimum appears, a new border (and hence the birth of a new colony) is inferred only when the local density at the minimum drops below *T*_{low}. In contrast, when an existing border is about to disappear, this is acknowledged only when the density at the associated local minimum rises above *T*_{high}. The result is a hysteresis of sorts: when the density at a new minimum drops below *T*_{low} for the first time, a colony is born; if afterwards the density at the border temporarily exceeds *T*_{low}, the border is maintained unless it also exceeds *T*_{high}.
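The effect of the two thresholds can be illustrated with a toy Python sketch that follows the density at a single local minimum through time. It is a simplification of the procedure, which in the simulations also involves ancestry information; the function names are hypothetical.

```python
def track_border(densities, T_low, T_high):
    """Border status over time under the two-threshold (hysteresis) rule."""
    border = False
    states = []
    for rho in densities:
        if not border and rho < T_low:
            border = True        # minimum is deep enough: a border is created
        elif border and rho > T_high:
            border = False       # minimum has filled in: the border is removed
        states.append(border)
    return states

def track_border_single_threshold(densities, T):
    """Naive rule: a border exists whenever the density is below T."""
    return [rho < T for rho in densities]

def count_events(states):
    """Number of times the border status changes (fissions plus fusions)."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)
```

For the fluctuating series `[30, 10, 7, 12, 20, 9, 29, 15]` with *T*_{low} = 8 and *T*_{high} = 28, the hysteresis rule records one appearance and one disappearance of the border, whereas a single threshold at 11 records four events.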

To be precise, when performing the MLS analysis on the interval (*t*_{1}, *t*_{2}], the borders of the colonies at time *t*_{2} were constructed as follows:

1. **Calculate a smoothed density.** The smoothed density at each position was defined as a KDE with bandwidth *σ*_{a}/2, taking into account the periodic boundary conditions.
2. **Identify local minima.** If the smoothed density at grid point *x* is written as *ρ*_{x}, each *x* such that *ρ*_{x} < *ρ*_{x+1} and *ρ*_{x} < *ρ*_{x−1} marks a local minimum. (Because of the periodic boundary conditions, all indices should be read modulo *X*, the size of the grid.)
3. **Determine *tentative* borders between colonies.** First, local minima were selected with a density *ρ*_{x} < *T*_{high}; the other minima were discarded. Each of the selected minima was then associated with a tentative border, which would later be further scrutinized. To ensure that no individual can sit exactly at a border (causing ambiguity as to which colony it belongs to), borders were positioned *between* grid points. First, the derivative of the density at the position of each minimum was approximated as *ρ*′(*x*) = (*ρ*_{x+1} − *ρ*_{x−1})/(2*δx*). If a minimum was located at grid point *x*, then a tentative border was placed at *x* + 1/2 if the derivative was negative, and at *x* − 1/2 if the derivative was positive.
4. **Assign an ancestor to each tentative colony.** Given the tentative borders, tentative colonies were also implicitly defined. For each tentative colony at time *t*_{2}, an ancestor colony at time *t*_{1} was determined. To do so, we exploited the fact that we had already traced back the ancestry of the *individuals* in the colonies. We then used the expectation that the ancestor colony *P* of a colony *Q* contains most, if not all, ancestors of the *individuals* that belong to *Q*. Based on this, we identified *P* as the colony at time *t*_{1} that contains the largest fraction of the ancestors of the individuals belonging to *Q*.
5. **Reject tentative borders that reflect fluctuations or incomplete divisions.** If the colonies on either side of a tentative border had the same ancestral colony, this suggested that a colony division might have taken place. In this case, we compared the density at the corresponding minimum to the low threshold *T*_{low}; if the density was above that threshold, the border was rejected. All other tentative borders were now accepted, so that the identification of colonies at *t*_{2} and their ancestors at time *t*_{1} also became final.
6. **Count the number of offspring of each ancestral colony.** Because the ancestors of colonies at time *t*_{2} had been identified, the number of offspring (the absolute fitness) of each ancestral colony could be tabulated. If an ancestral colony had fitness 0, it must have died between *t*_{1} and *t*_{2}. If an ancestral colony had absolute fitness > 1, it must have divided. (In practice, a fitness above 2 did not occur because Δ*t* is too short to support multiple consecutive divisions.) If a colony had fitness 1, the ancestral colony most likely survived the time interval without reproducing.
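Steps 2, 3, 4, and 6 of the procedure above can be sketched in Python for a 1D grid. The sketch is illustrative and all names are hypothetical. Borders are expressed in units of grid cells, and a minimum whose central-difference derivative is exactly zero, a case the description leaves open, is assigned to *x* − 1/2 by assumption.

```python
from collections import Counter

def local_minima(rho):
    """Grid points whose smoothed density is below both periodic neighbors."""
    X = len(rho)
    return [x for x in range(X)
            if rho[x] < rho[(x + 1) % X] and rho[x] < rho[(x - 1) % X]]

def tentative_borders(rho, T_high):
    """Half-integer border positions for all minima shallower than T_high are
    discarded; the rest are shifted by half a cell according to the slope."""
    X = len(rho)
    borders = []
    for x in local_minima(rho):
        if rho[x] >= T_high:
            continue                                 # not deep enough: discard
        slope = rho[(x + 1) % X] - rho[(x - 1) % X]  # sign of the derivative
        borders.append(x + 0.5 if slope < 0 else x - 0.5)
    return borders

def ancestor_colony(ancestor_colonies_of_members):
    """Colony at t1 that contains most ancestors of a colony's members at t2."""
    return Counter(ancestor_colonies_of_members).most_common(1)[0][0]

def colony_fitness(ancestor_of_each_offspring_colony, ancestral_colonies):
    """Number of offspring colonies at t2 for each colony present at t1."""
    counts = Counter(ancestor_of_each_offspring_colony)
    return {colony: counts.get(colony, 0) for colony in ancestral_colonies}
```

In `colony_fitness`, a value of 0 marks a colony that died during the interval and a value of 2 marks a colony that divided.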

The thresholds *T*_{high} and *T*_{low} are parameters; we found that *T*_{high} = 0.7*K* and *T*_{low} = 0.2*K* worked well.

#### Estimating the lattice constant of the hexagonal lattice by counting colonies

The lattice constant of the hexagonal pattern that emerges in the 2D model can be estimated by counting the number of colonies in the habitat. A hexagonal lattice is composed of equilateral triangles with side *a* and area $\sqrt{3}\,a^2/4$. The number of triangles is twice the number of nodes *ν*. In a large enough habitat of area *L*^{2}, the number of nodes can then be estimated as $\nu \approx 2L^2/(\sqrt{3}\,a^2)$. Conversely, after counting the number of nodes, *a* can be estimated as $a \approx L\sqrt{2/(\sqrt{3}\,\nu)}$. At the end of the simulations of Fig. 2 we find approximately 179 colonies, which, given *L* = 102.4, corresponds to *a* ≈ 8.2. This is consistent with the estimate based on the radial distribution function (Fig. 2c).
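The estimate can be reproduced with a few lines of Python; the function names are hypothetical.

```python
import math

def lattice_constant(n_colonies, L):
    """Lattice constant a of a hexagonal lattice with n_colonies nodes in area L^2."""
    return L * math.sqrt(2.0 / (math.sqrt(3.0) * n_colonies))

def expected_nodes(a, L):
    """Inverse relation: number of nodes for a given lattice constant a."""
    return 2.0 * L ** 2 / (math.sqrt(3.0) * a ** 2)

a_estimate = lattice_constant(179, 102.4)   # close to the value 8.2 quoted above
```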

#### Estimating error bars

In the Results section and Table S1 we provide 95% confidence intervals for the means of all quantities plotted in Figs. 4 and S3. Because data points in these time series are auto-correlated and the distributions of some quantities are skewed, the standard methods for calculating confidence intervals could not be used. Therefore, we applied the method described in Ref. [53].

Briefly, the idea is to divide the data series into blocks of length *l* and use the means of these blocks (rather than the original data points) to estimate the standard error of the mean (SEM). Starting with *l* = 2, if *l* is increased, the correlations between block means eventually become negligible and the estimates stabilize around a sensible value, which we determined by manual inspection and then rounded off conservatively. Moreover, because of the central limit theorem, the distribution of block means converges to a normal distribution, which justifies the use of *t*-statistics to estimate confidence intervals. Although the correct number of degrees of freedom to be used is poorly constrained (as it depends on the minimal value of *l* that is deemed large enough to remove correlations), it is in all cases large enough to ensure that the critical *t*-value for *t*_{0.05(2)} is near 2. We therefore estimated the 95% confidence interval as the sample mean ± twice the estimated SEM.
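The blocking idea can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the procedure of Ref. [53] in full: trailing data points that do not fill a complete block are dropped, and the choice of the block length, which was made by manual inspection, is left to the caller. The function names are hypothetical.

```python
import math

def block_sem(series, block_length):
    """Standard error of the mean estimated from the means of consecutive blocks."""
    n_blocks = len(series) // block_length
    means = [sum(series[b * block_length:(b + 1) * block_length]) / block_length
             for b in range(n_blocks)]
    grand_mean = sum(means) / n_blocks
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(variance / n_blocks)

def confidence_interval_95(series, block_length):
    """Approximate 95% interval: mean of the used data +/- twice the block SEM."""
    n_used = (len(series) // block_length) * block_length
    mean = sum(series[:n_used]) / n_used
    half_width = 2.0 * block_sem(series, block_length)
    return mean - half_width, mean + half_width
```

In practice one would evaluate `block_sem` for a range of block lengths and look for the plateau that indicates that the block means have become effectively uncorrelated.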

#### Software

Simulations were performed with custom software written in Fortran; the code is made available on the following GitHub repository: https://github.com/rutgerhermsen/altruism.git. Statistics were performed in R version 3.6.1. Visualization was done in R, using ggplot2, and in Wolfram Mathematica 12.

## Supplementary Figures

## Supplementary Tables

## Supplementary Movie Captions

**Movie S1**. Movie depicting the dynamics of the simulation described in Fig. 2, which is also shown in Fig. S1 as Replicate 1. Default parameters were used (Table 1). The video plots the positions of all individuals, and the level of altruism of each individual is indicated with the same color scale as in Fig. 2 and 3a. The ticks on the left-hand vertical axis show the scale of altruism, the ticks on the right-hand vertical axis the scale of competition. A high-quality version of this movie is shared here: https://doi.org/10.5281/zenodo.5727313.

**Movie S2**. Movie depicting the dynamics of the simulation Replicate 2 described in Fig. S1. Default parameters were used (Table 1). In the video, the left-hand panel shows the positions of all individuals, as in 5, using the same color scale as in Fig. 2 and 3a. The right-hand panel plots for each position in the habitat the value *A*, which can be interpreted as the amount of public good at that position, as provided by the altruists in the local environment. The ticks on the left-hand vertical axis show the scale of altruism, the ticks on the right-hand vertical axis the scale of competition. A high-quality version of this movie is shared here: https://doi.org/10.5281/zenodo.5727313.

**Movie S3**. Movie depicting the dynamics of the simulation Replicate 3 described in Fig. S1. Default parameters were used (Table 1). In the video, the left-hand panel shows the positions of all individuals, as in 5, using the same color scale as in Fig. 2 and 3a. The right-hand panel plots for each position in the habitat the value *A*, which can be interpreted as the amount of public good at that position, as provided by the altruists in the local environment. The ticks on the left-hand vertical axis show the scale of altruism, the ticks on the right-hand vertical axis the scale of competition. A high-quality version of this movie is shared here: https://doi.org/10.5281/zenodo.5727313.

## 5 Acknowledgments

I am grateful to Rens Dijkhuizen for preliminary simulations and analysis, and to Hilje Doekes for many insightful discussions and valuable feedback. This work was supported by the Human Frontier Science Program, grant nr. RGY0072/2015 (http://www.hfsp.org/funding/research-grants).

## Appendices

### A The Price equation, evolutionary forces, and MLS 1 & 2

In this article, several mathematical results are applied that have been derived long ago [13, 34, 35]. For ease of reference and to facilitate readers who are not intimately familiar with this theory, we here briefly summarize these results. Nothing in this section is new, although our notation differs somewhat from other presentations to expose the analogies between the multi*level* selection analysis presented here and the multi*scale* analysis presented elsewhere [50].

#### A.1 The Price equation

The Price equation provides a general way to formally describe changes in gene frequencies or mean trait values in evolving populations due to evolutionary forces such as selection and mutation [34, 36, 51].

In its simplest form we envision a population of entities that each possess a numerical trait *ϕ*. At time *t*_{1}, the population size is *n*, and the population mean of *ϕ* is $\overline{\phi}$. At a later time *t*_{2} = *t*_{1} + Δ*t* the mean of *ϕ* has changed by an amount $\Delta\overline{\phi}$. Each individual alive at time *t*_{2} has a unique ancestor at time *t*_{1}. (If the individual was already born at time *t*_{1}, we designate its past self as the ancestor.) Conversely, each individual *i* alive at time *t*_{1} has *W*_{i} offspring at time *t*_{2}. (If the individual is itself still alive at time *t*_{2}, it is counted as one of the offspring.) *W*_{i} is called the absolute fitness of *i*. The relative fitness *w*_{i} of this individual is defined as $w_i = W_i/\overline{W}$, where $\overline{W}$ is the population mean absolute fitness. The trait value *ϕ* of the offspring of *i* may differ from the value of individual *i* itself; the average difference among *i*’s offspring is called Δ*ϕ*_{i}.

With these definitions, the change in the mean value of *ϕ* over the time interval Δ*t* can be written as:

$$\Delta\overline{\phi} = S + T, \tag{A.1}$$

with

$$S \equiv \mathrm{Cov}(w_i, \phi_i) = \overline{w\phi} - \overline{w}\,\overline{\phi}, \tag{A.2}$$

$$T \equiv \frac{1}{n}\sum_{i=1}^{n} w_i \, \Delta\phi_i. \tag{A.3}$$

Equation A.1 is called the Price equation. The first term, *S*, is the population covariance between the trait and relative fitness. It shows that the mean value of *ϕ* tends to increase if a high value of *ϕ* is associated with a high fitness. Therefore, *S* is often considered a measure of the effect of natural selection and called the selection differential. The second term, *T*, is the average change in trait value between ancestors and their offspring. Therefore *T* is a measure of transmission bias.
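The Price equation is an identity, which can be verified numerically on any toy data set. The Python sketch below computes *S* as the covariance of relative fitness and trait value and *T* as the fitness-weighted mean of Δ*ϕ*_{i}, and compares their sum with the directly computed change in the mean trait. The function names and the example numbers are hypothetical.

```python
def price_terms(phi, W, delta_phi):
    """Selection differential S and transmission term T of the Price equation."""
    n = len(phi)
    W_mean = sum(W) / n
    w = [W_i / W_mean for W_i in W]                   # relative fitness
    phi_mean = sum(phi) / n
    S = sum(w_i * p for w_i, p in zip(w, phi)) / n - phi_mean   # Cov(w, phi)
    T = sum(w_i * dp for w_i, dp in zip(w, delta_phi)) / n      # mean of w * dphi
    return S, T

def mean_offspring_trait(phi, W, delta_phi):
    """Mean trait at t2: each ancestor contributes W_i offspring with mean
    trait phi_i + delta_phi_i."""
    return sum(W_i * (p + dp) for W_i, p, dp in zip(W, phi, delta_phi)) / sum(W)
```

For `phi = [0.0, 1.0, 2.0]`, `W = [2, 1, 3]`, and `delta_phi = [0.1, 0.0, -0.2]`, the mean trait changes from 1.0 to 1.1, and `price_terms` returns values of *S* and *T* that sum to the same 0.1.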

#### A.2 Measuring selection, random drift, and mutational bias

Although the Price equation is frequently and fruitfully used in its standard form, it has its limitations. One clear limitation is that it does not acknowledge one of the evolutionary forces that is central to canonical evolutionary theory: random drift.

The absence of random drift from the standard Price equation is a consequence of the definition of fitness used in its formulation. Above, the fitness *W*_{i} of individual *i* was defined as the actual number of offspring it has after the time interval Δ*t*. This is at odds with the usual parlance, in which fitness refers to an organism’s adaptedness to a particular environment. If an organism dies without offspring, this does not necessarily prove that it was poorly adapted to its environment: it might just have been unlucky. The term fitness, then, seems to refer more properly to a propensity or expectation than to an actually realized number of offspring [54, 55]. Deviations from the expectation due to chance are the source of what is usually called random drift.

One way to extend the Price equation is therefore to treat the number of offspring *W*_{i} as a random variable and to associate fitness with its expectation value 𝔼(*W*_{i}) [13, 51]. In that case we can write the actual number of offspring *W*_{i} as 𝔼(*W*_{i}) + *δW*_{i}, where *δW*_{i} is the deviation from the expectation. If we insert this into the standard Price equation (Eq. A.1) we arrive at
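$$\Delta\overline{\phi} = \mathrm{Cov}\!\left(\frac{\mathbb{E}(W_i)}{\overline{W}}, \phi_i\right) + \mathrm{Cov}\!\left(\frac{\delta W_i}{\overline{W}}, \phi_i\right) + T. \tag{A.4}$$

Compared to the standard Price equation, the selection differential *S* is split into two parts: one part that more properly captures the effects of natural selection, and one term that formalizes random drift.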
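Because the covariance is linear in its arguments, the split of *S* into a selection part and a drift part can be illustrated with a short Python sketch. It assumes that both parts are normalized by the realized mean fitness, so that they add up exactly to *S*; the function names and the example numbers are hypothetical.

```python
def cov(a, b):
    """Population covariance of two equally long sequences."""
    n = len(a)
    return sum(x * y for x, y in zip(a, b)) / n - (sum(a) / n) * (sum(b) / n)

def split_selection_and_drift(phi, W, expected_W):
    """Parts of S attributable to expected fitness and to chance deviations."""
    W_mean = sum(W) / len(W)
    S_selection = cov([e / W_mean for e in expected_W], phi)
    S_drift = cov([(W_i - e) / W_mean for W_i, e in zip(W, expected_W)], phi)
    return S_selection, S_drift
```

If all individuals have the same expected number of offspring, the selection part vanishes and any association between trait and realized fitness is attributed to drift.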

A complication with the above formulation is that it is not obvious how the probability distribution of *W*_{i}, and hence the expectation 𝔼(*W*_{i}), should be defined. In particular, it is unclear which variables other than the trait value *ϕ* should be taken into account, that is, which information the probability distribution should be conditioned on. The more information we incorporate into the expectation, the less uncertainty remains to power random drift. Clearly, this difficult issue is beyond the scope of this work. In the meantime, we take a pragmatic stance: through convenient choices, the above formalism can be used to examine the contributions of selected sources of randomness, regardless of whether these choices can be justified based on unique “correct” definitions of fitness, selection, and random drift.

#### A.3 Multilevel selection 1

Next, we consider a population that is subdivided into *N* distinct groups. To describe the system from the perspective of MLS 1, we start with the Price equation at the level of the individuals, Eq. A.1. The idea of the analysis is to split the selection differential *S* into two parts, *S*_{within} and *S*_{among}, where the first accounts for selection taking place *within* groups, and the second for selection *among* groups. We saw that *S* is defined as a covariance (Eq. A.2); mathematically, the decomposition is a direct application of the Law of Total Covariance. In the interest of clarity we will nevertheless rederive it from scratch.

It will be useful to introduce some notation. Let *z* be a trait or property of individuals. We will denote the value of *z* of individual *i* in group *j* as *z*_{ij}, and the size of group *j* will be written *n*_{j}. Then the mean of *z* within group *j* is written as {*z* | *j*}_{w}:

$$\{z \mid j\}_{w} \equiv \frac{1}{n_j} \sum_{i=1}^{n_j} z_{ij}$$
The label “w” stands for “within”. Whenever this does not give rise to confusion we will omit the group index *j* and write {*z*}_{w}.

Now, let *u* be a trait or property of groups. Then we define ⟨*u*⟩_{a} as the mean of *u* among groups, where the groups are weighted according to their group size *n*_{j}:

$$\langle u \rangle_{a} \equiv \frac{\sum_{j=1}^{N} n_j u_j}{\sum_{j=1}^{N} n_j}$$
The label “a” stands for “among”.

From the above definitions, one can verify that

$$\langle \{z\}_{w} \rangle_{a} = \frac{\sum_{j=1}^{N} \sum_{i=1}^{n_j} z_{ij}}{\sum_{j=1}^{N} n_j} \tag{A.7}$$

That is to say, if we know the mean value of *z* within each group, {*z*}_{w}, we can recover the population mean by averaging over all groups, provided we give larger groups a larger weight.

With the above notation and Eq. A.7 in place, the decomposition of *S* is obtained quite directly:
Here we introduced Cov_{w} (*y, z* | *j*) ≡ {*yz*}_{w} − {*y*}_{w} {*z*}_{w} as the covariance between individual properties *y* and *z* as measured within group *j*, and Cov_{a} (*u, v*) = ⟨*uv*⟩_{a} − ⟨*u*⟩_{a} ⟨*v*⟩_{a} as the covariance of group properties *u* and *v* among groups, where groups are weighted by their group size.
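The steps of this decomposition can be retraced from Eq. A.7 and the two covariance definitions. The sketch below assumes that Eq. A.2 defines *S* as the population covariance between the relative fitness and the trait value *ϕ*; writing the individual relative fitness as *ω* is a notational assumption. The second line applies Eq. A.7 to *ωϕ*, *ω*, and *ϕ* in turn; the third adds and subtracts ⟨{*ω*}_{w}{*ϕ*}_{w}⟩_{a}.

```latex
\begin{align}
S &= \operatorname{Cov}(\omega, \phi) \\
  &= \langle \{\omega\phi\}_{w} \rangle_{a}
     - \langle \{\omega\}_{w} \rangle_{a} \, \langle \{\phi\}_{w} \rangle_{a} \\
  &= \underbrace{\big\langle \{\omega\phi\}_{w} - \{\omega\}_{w}\{\phi\}_{w} \big\rangle_{a}}_{S_{\text{within}}}
     + \underbrace{\langle \{\omega\}_{w}\{\phi\}_{w} \rangle_{a}
     - \langle \{\omega\}_{w} \rangle_{a} \, \langle \{\phi\}_{w} \rangle_{a}}_{S_{\text{among}}} \\
  &= \big\langle \operatorname{Cov}_{w}(\omega, \phi \mid j) \big\rangle_{a}
     + \operatorname{Cov}_{a}\big(\{\omega\}_{w}, \{\phi\}_{w}\big)
\end{align}
```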

Eq. A.8 shows that *S*_{within} quantifies to what extent within groups the trait value *ϕ* is associated with fitness. It can hence be interpreted as the effect of selection taking place within groups. On the other hand, *S*_{among} measures whether groups with a high mean of *ϕ* tend to have a high mean fitness. It can hence be interpreted as the selection component that results from selection among groups.
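The MLS 1 decomposition can also be checked numerically. The sketch below is a minimal illustration on made-up data, assuming that *S* is the covariance between relative fitness and trait value; the group assignment, the fitness model, and all names are hypothetical and are not taken from the simulation software.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 individuals assigned at random to 5 groups.
group = rng.integers(0, 5, size=200)       # group index j of each individual
phi = rng.uniform(0.0, 0.1, size=200)      # trait value of each individual
w = rng.poisson(1.0 + 5.0 * phi)           # offspring numbers (assumed model)
omega = w / w.mean()                       # relative fitness


def cov(a, b):
    """Population covariance: mean(ab) - mean(a) * mean(b)."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)


labels = np.unique(group)
sizes = np.array([np.sum(group == j) for j in labels])    # group sizes n_j
weights = sizes / sizes.sum()                             # size-based weights

# Within-group covariances and within-group means, one entry per group.
cov_within = np.array([cov(omega[group == j], phi[group == j]) for j in labels])
mean_omega = np.array([omega[group == j].mean() for j in labels])
mean_phi = np.array([phi[group == j].mean() for j in labels])

# S_within: size-weighted mean of the within-group covariances.
s_within = np.sum(weights * cov_within)
# S_among: size-weighted covariance of the group means.
s_among = (np.sum(weights * mean_omega * mean_phi)
           - np.sum(weights * mean_omega) * np.sum(weights * mean_phi))
s_total = cov(omega, phi)

print(f"S = {s_total:.6f}, S_within = {s_within:.6f}, S_among = {s_among:.6f}")
```

The two components sum to the total up to floating-point error, and the size-weighted mean of the group means of *ϕ* reproduces the population mean, as stated by Eq. A.7.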

#### A.4 Multilevel selection 2

We note that the calculations for MLS 1 can be executed for subdivided populations regardless of whether the groups themselves can in any meaningful way be said to reproduce or die. In other words, group selection according to MLS 1 does not require that the groups can themselves be considered replicators. An alternative formalism, called MLS 2, does explicitly require Darwinian dynamics at the level of groups.

The idea of MLS 2 is that, if the groups themselves are replicators, the Price equation can be applied at the level of groups. Now the relevant population is the population of groups, and the Price equation can describe the evolution of any trait Φ that is a property of groups:
Importantly, the relative fitness *ω*_{j} in this Price equation now represents the fitness of group *j*, that is, the (relative) number of groups at time *t*_{2} that are its offspring (including the group itself, if it survives until *t*_{2}).

If we are interested in the evolution of a particular trait at the individual level *ϕ* — such as the level of altruism — we are free to choose Φ to be the group mean of *ϕ*; that is, Φ_{j} = {*ϕ* | *j*}_{w}. The first term in Eq. A.9 then measures the effect of selection at the group level on the mean trait value of groups. The second term quantifies the effect of bias in the changes in Φ between ancestral groups and their offspring; this reflects the internal evolution of groups.
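To make the two terms concrete, the following sketch applies a Price-type bookkeeping to a made-up population of groups. It assumes the standard two-term form (a covariance between relative group fitness and Φ, plus a fitness-weighted mean change in Φ between ancestral groups and their offspring) and it weights all groups equally; the fitness model, the downward bias in offspring trait values, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ancestral groups at time t1.
n_groups = 50
Phi = rng.uniform(0.0, 0.1, size=n_groups)       # group trait, e.g. mean altruism
n_offspring = rng.poisson(1.0 + 5.0 * Phi)       # descendant groups at t2 (assumed)

# Offspring groups: index of the ancestor and trait value. The negative shift
# is an assumed stand-in for the erosion of altruism inside groups.
parent = np.repeat(np.arange(n_groups), n_offspring)
Phi_offspring = Phi[parent] - 0.005 + rng.normal(0.0, 0.002, size=parent.size)

omega = n_offspring / n_offspring.mean()         # relative group fitness

# Mean change in Phi from each ancestral group to its offspring groups
# (set to zero for groups without offspring, whose weight omega is zero).
sum_offspring = np.bincount(parent, weights=Phi_offspring, minlength=n_groups)
delta_Phi = np.where(n_offspring > 0,
                     sum_offspring / np.maximum(n_offspring, 1) - Phi,
                     0.0)

selection = np.mean(omega * Phi) - np.mean(omega) * np.mean(Phi)   # first term
transmission = np.mean(omega * delta_Phi)                          # second term
total_change = Phi_offspring.mean() - Phi.mean()

print(f"total = {total_change:.6f}")
print(f"selection = {selection:.6f}, transmission = {transmission:.6f}")
```

In this toy example the first term is typically positive, because groups with a higher Φ produce more offspring groups, whereas the second term is negative by construction.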

### B Linear stability analysis

We here provide the details of the linear stability analysis for the 1D habitat that is presented in Fig. 3b,c and Fig. S2.

#### B.1 Mathematical analysis

Consider a population of individuals that each have the same level of altruism *ϕ*. If the carrying capacity is large, the dynamics of the density *ρ*(*x, t*) can be approximated by the following mean-field equation:
Here, *G*_{a} and *G*_{rc} are the kernel functions used in Eq. 1 and Eq. 2 to define the availability of public good and the local density, respectively. The notation *f* * *h* stands for the convolution of functions *f* and *h*. Eq. B.1 has a homogeneous equilibrium solution *ρ* (*x, t*) = *ρ*_{0} > 0; we ask under what conditions this solution is (linearly) unstable to periodic perturbations so that colonies can form spontaneously.

To find out, we first identify *ρ*_{0} by equating Eq. B.1 to zero and solving for *ρ* (*x, t*) = *ρ*_{0}. Ignoring the trivial solution *ρ*_{0} = 0, the equation is quadratic and can be solved straightforwardly. Out of the two solutions, one is negative and hence irrelevant. The remaining solution depends on all parameters except for the diffusion constant *k*_{D}.

We then consider a periodic perturbation

$$\rho(x, t) = \rho_0 + \epsilon(t)\,\sin(2\pi x/\lambda) \tag{B.2}$$

with a very small (infinitesimal) amplitude *ϵ*(0) and ask whether *ϵ*(*t*) will grow or decay. To obtain a dynamic equation for *ϵ*(*t*) we insert Eq. B.2 into Eq. B.1. In doing so, we have to work out the convolutions of the Gaussian kernel functions *G*_{a} and *G*_{rc} with the sine wave of Eq. B.2. From the Convolution Theorem it follows that, for any real-valued, normalized, symmetric kernel function *f* (*x*) that has a Fourier transform, the convolution with a sine wave is again a sine wave, but with a reduced amplitude:
In the specific case where *f* (*x*) is Gaussian with standard deviation *σ*, we get
The convolutions with *G*_{a} and *G*_{rc} follow directly from Eq. B.3 and Eq. B.4.
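The attenuation of a sine wave by a Gaussian kernel is easy to verify numerically. For a normalized Gaussian with standard deviation *σ*, the Fourier transform evaluated at wave number 2π/*λ* gives an amplitude factor exp(−2π²*σ*²/*λ*²). The sketch below compares this factor with a direct numerical evaluation of the convolution integral; the values of *σ* and *λ* are arbitrary illustrations, not parameters of the model.

```python
import numpy as np

sigma = 1.5    # standard deviation of the Gaussian kernel (illustrative value)
lam = 8.0      # wavelength of the sine wave (illustrative value)

# Integration grid, wide enough to cover the tails of the kernel.
u = np.linspace(-10 * sigma, 10 * sigma, 20001)
du = u[1] - u[0]
kernel = np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


def convolve_with_sine(x):
    """Riemann-sum estimate of (f * sin)(x) = integral f(u) sin(2 pi (x - u) / lam) du."""
    return np.sum(kernel * np.sin(2 * np.pi * (x - u) / lam)) * du


x_test = np.linspace(0.0, lam, 9)
numeric = np.array([convolve_with_sine(x) for x in x_test])

# Prediction: the same sine wave, multiplied by exp(-2 pi^2 sigma^2 / lam^2).
factor = np.exp(-2 * np.pi**2 * sigma**2 / lam**2)
predicted = factor * np.sin(2 * np.pi * x_test / lam)
max_error = np.max(np.abs(numeric - predicted))

print(f"amplitude factor = {factor:.4f}, max deviation = {max_error:.2e}")
```

The factor approaches 1 when *λ* is much larger than *σ* and vanishes rapidly when *λ* is smaller than *σ*, which is the origin of the exponential suppression discussed for the altruism and competition terms.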

We then expand the resulting equation to first order in *ϵ*(*t*). Because *ρ*_{0} is the homogeneous solution, the zeroth-order term vanishes. The result is a linear equation of the form:

$$\frac{\mathrm{d}\epsilon(t)}{\mathrm{d}t} = E(\lambda)\,\epsilon(t) \tag{B.5}$$
where the factor *E*(*λ*) can be written as:
The solution of Eq. B.5 is exponential, with growth rate or eigenvalue *E*(*λ*). Hence, if Eq. B.6 is positive for some wavelength *λ*, perturbations with this wavelength are predicted to grow exponentially. Because demographic noise produces perturbations of any wavelength, this is expected to eventually give rise to periodic density fluctuations with a similar wavelength.

Eq. B.6 provides considerable insight. It consists of three terms, expressing the effects of altruism, motility, and resource competition. The three length scales in the system —the scale of altruism *σ*_{a}, the scale of motility *σ*_{m}, and the scale of competition *σ*_{rc}— each appear in their appropriate term.

The first term, describing the effect of altruism, is the only positive one, and it scales with *ϕ*. This shows that altruism is required to obtain a positive eigenvalue for any wavelength *λ*. Indeed, altruism tends to amplify density differences: because the benefits of altruism grow with the number of altruists in the local neighborhood, its effect is to increase the reproduction rate in regions of high density, which tends to further increase that density. However, the equation shows that this positive contribution is exponentially suppressed if the wavelength *λ* is small relative to the scale of altruism *σ*_{a}; this is because such short waves average out within the social neighborhoods of individuals.

The second term reflects the effect of motility. Random motion (diffusion) is a homogenizing force and therefore quenches density fluctuations, as reflected in the negative sign of this term. However, because diffusion is famously slow on large length scales, only short wavelengths are strongly affected: if the wavelength *λ* exceeds the scale of motility (the typical distance traveled by an individual in a generation time) the contribution becomes small.

The third term describes the effect of resource competition. Resource competition reduces the reproduction rate in areas with a larger density and thus suppresses density differences, which explains why its contribution is negative. However, if the wavelength is small relative to the scale of competition *σ*_{rc}, the density wave averages out within the competitive neighborhood of individuals and the homogenizing effect becomes weak.

Together, this clearly indicates in which regime we ought to expect colonies. Density fluctuations are suppressed by diffusion if their wavelength *λ* is smaller than *σ*_{m}, and by resource competition if *λ* is larger than *σ*_{rc}. Instabilities are therefore expected only if there is a gap between these two regimes. At the same time, the positive contribution of altruism becomes weak if *λ* is smaller than *σ*_{a}. For altruism to be effective in the “gap”, *σ*_{a} therefore should be chosen smaller than *σ*_{rc}. In summary, instability requires that the diffusion constant is small enough, the scale of competition is large enough, and the scale of altruism is smaller than the scale of competition. Apart from these rules of thumb, Eq. B.6 can of course be evaluated numerically to make precise predictions; see Fig. S2.
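The interplay of the three terms can be explored with a schematic stand-in for the growth rate. The function below is not Eq. B.6: it only mimics the qualitative structure described above, namely a positive altruism term and a negative competition term that are each attenuated by a Gaussian factor, plus a diffusion term proportional to the squared wave number. The prefactors `a_alt`, `d_mot`, and `c_comp`, as well as all numerical values, are hypothetical placeholders.

```python
import numpy as np


def growth_rate(lam, a_alt, d_mot, c_comp, sigma_a, sigma_rc):
    """Schematic growth rate: altruism term minus diffusion and competition terms."""
    k = 2 * np.pi / lam                                       # wave number
    altruism = a_alt * np.exp(-0.5 * (sigma_a * k) ** 2)      # weak for lam << sigma_a
    motility = d_mot * k ** 2                                 # strong for short waves
    competition = c_comp * np.exp(-0.5 * (sigma_rc * k) ** 2) # weak for lam << sigma_rc
    return altruism - motility - competition


lams = np.geomspace(0.1, 100.0, 4000)

# Assumed prefactors; c_comp > a_alt so that very long waves decay.
params = dict(a_alt=1.0, c_comp=2.0, sigma_a=0.5, sigma_rc=3.0)

e_slow = growth_rate(lams, d_mot=0.005, **params)   # weak diffusion
e_fast = growth_rate(lams, d_mot=5.0, **params)     # strong diffusion

lam_star = lams[np.argmax(e_slow)]                  # fastest-growing wavelength
print(f"weak diffusion: max growth rate = {e_slow.max():.3f} at wavelength {lam_star:.2f}")
print(f"strong diffusion: max growth rate = {e_fast.max():.3f}")
```

With these placeholder values, weak diffusion leaves a window of unstable wavelengths between the scales of altruism and competition, whereas strong diffusion closes it, in line with the rules of thumb above.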

#### B.2 Validation of predictions

In Fig. 3b,c and Fig. S2 we test predictions based on the linear stability analysis using simulations. The key predictions are (i) the region of parameter space where colonies can form, and (ii) the wavelength of the resulting pattern. The following methods were used.

As illustrated with the red dot and arrow in Fig. S2a, both predictions are found by maximizing *E*(*λ*). Fig. S2b shows a contour plot of the maximal value of *E*(*λ*) under variation of the scales; Fig. S2c the corresponding wavelengths. Both values were obtained by differentiating Eq. B.6 and numerical root finding.

To test the predictions we performed a large number of simulations using different values for *σ*_{rc} and *σ*_{m}. (We used *σ*_{rc} ∈ {1, 1.2, 1.4, …, 5} and *σ*_{m} ∈ {0.0671, 0.100, 0.134, …, 0.671} in all 21 × 19 = 399 combinations.) As a simple proxy for the presence of colonies, we measured the variance of the (smoothed) population density (KDE) over space. To identify the dominant wavelength in the density pattern, we calculated the Fourier transform of the KDE and selected the mode with the largest amplitude.

The simulations were performed as usual and using default parameters except for the following adjustments:

- All individuals were initialized with a trait value *ϕ* = 0.05.
- In these simulations, we were interested in the ecological patterns of a population with fixed *ϕ*; therefore the mutation rate was set to *µ* = 0 to disable evolution.
- The simulation was run for *T* = 2 400 generations. (The colonies establish very rapidly.)
- Starting at *t* = 400 generations, after each time interval of 80 generations the following analysis was performed:
  - Calculate a KDE using a Gaussian kernel with standard deviation/bandwidth *σ*_{a}/2.
  - Calculate the variance of this KDE.
  - Calculate the Fourier transform of the KDE and identify the wave number with the largest amplitude.

After the simulation, the mean value of the variance was reported. It is this variance that is plotted in Fig. 3c and Fig. S2e. Also, the mean value of the wave number with the largest amplitude was reported; this wave number was transformed to a wavelength, which was plotted in Fig. S2f.
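The per-snapshot analysis can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures: it assumes a periodic 1D habitat, approximates the KDE by binning positions and smoothing the histogram in Fourier space, and uses hypothetical names (`density_statistics`, `n_bins`) and made-up test data.

```python
import numpy as np


def density_statistics(positions, length, bandwidth, n_bins=1024):
    """Return (variance of the smoothed density, dominant wavelength)."""
    counts, _ = np.histogram(positions, bins=n_bins, range=(0.0, length))
    dx = length / n_bins
    density = counts / dx                              # individuals per unit length
    k = 2 * np.pi * np.fft.rfftfreq(n_bins, d=dx)      # angular wave numbers
    # Gaussian smoothing is a multiplication by exp(-bandwidth^2 k^2 / 2)
    # in Fourier space (assumes periodic boundaries).
    spectrum = np.fft.rfft(density) * np.exp(-0.5 * (bandwidth * k) ** 2)
    kde = np.fft.irfft(spectrum, n=n_bins)
    amplitudes = np.abs(spectrum)
    amplitudes[0] = 0.0                                # ignore the mean density (k = 0)
    k_peak = k[np.argmax(amplitudes)]
    return kde.var(), 2 * np.pi / k_peak


# Made-up test data: 20 colonies spaced 5 length units apart, and a
# homogeneous population of the same size for comparison.
rng = np.random.default_rng(3)
length = 100.0
centers = np.arange(2.5, length, 5.0)
clustered = (np.repeat(centers, 200) + rng.normal(0.0, 0.5, size=4000)) % length
uniform = rng.uniform(0.0, length, size=4000)

var_clustered, wavelength = density_statistics(clustered, length, bandwidth=0.25)
var_uniform, _ = density_statistics(uniform, length, bandwidth=0.25)

print(f"clustered: variance = {var_clustered:.1f}, dominant wavelength = {wavelength:.2f}")
print(f"uniform:   variance = {var_uniform:.1f}")
```

Averaging the two returned quantities over the snapshots taken every 80 generations would then yield the reported mean variance and mean dominant wave number. The variance is much larger for the clustered population than for the homogeneous one, which is what makes it usable as a proxy for the presence of colonies.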