Adaptive dynamics of memory-1 strategies in the repeated donation game

Social interactions often take the form of a social dilemma: collectively, individuals fare best if everybody cooperates, yet each single individual is tempted to free ride. Social dilemmas can be resolved when individuals interact repeatedly. Repetition allows individuals to adopt reciprocal strategies which incentivize cooperation. The most basic model to study reciprocity is the repeated donation game, a variant of the repeated prisoner's dilemma. Two players interact over many rounds, in which they repeatedly decide whether to cooperate or to defect. To make their decisions, they need a strategy that tells them what to do depending on the history of previous play. Memory-1 strategies depend on the previous round only. Even though memory-1 strategies are among the most elementary strategies of reciprocity, their evolutionary dynamics has been difficult to study analytically. As a result, most previous work relies on simulations. Here, we derive and analyze their adaptive dynamics. We show that the four-dimensional space of memory-1 strategies has an invariant three-dimensional subspace, generated by the memory-1 counting strategies. Counting strategies record how many players cooperated in the previous round, without considering who cooperated. We give a partial characterization of adaptive dynamics for memory-1 strategies and a full characterization for memory-1 counting strategies.

Author summary

Direct reciprocity is a mechanism for the evolution of cooperation based on the repeated interaction of the same players. In the most basic setting, we consider a game between two players; in each round, they choose between cooperation and defection. Hence, there are four possible outcomes: (i) both cooperate; (ii) I cooperate, you defect; (iii) I defect, you cooperate; (iv) both defect.
A memory-1 strategy for playing this game is characterized by four quantities which specify the probabilities to cooperate in the next round depending on the outcome of the current round. We study evolutionary dynamics in the space of all memory-1 strategies. We assume that mutant strategies are generated in close proximity to the existing strategies, and therefore we can use the framework of adaptive dynamics, which is deterministic.


Introduction
Evolution of cooperation is of considerable interest, because it demonstrates that natural selection does not only lead to selfish, brutish behavior red in tooth and claw [1,2]. Yet in the absence of a mechanism for its evolution, natural selection opposes cooperation. A mechanism for the evolution of cooperation is an interaction structure that allows natural selection to favor cooperation over defection [3]. Direct reciprocity is one such mechanism [4-8]. This mechanism is based on repeated interactions among the same individuals. In a repeated interaction, individuals can condition their decisions on their co-player's previous behavior. By being more cooperative towards other cooperators, they can generate a favorable social environment for the evolution of cooperation.
The most basic model to illustrate reciprocity is the repeated donation game [1]. This game takes place among two players, who interact for many rounds. Each round, players independently decide whether to cooperate or defect. Cooperation implies a cost c for the donor and generates a benefit b for the recipient. Defection implies no cost and confers no benefit. Both players decide simultaneously. If they both cooperate, each of them gets payoff b−c. If both players defect, each of them gets payoff 0. If one player cooperates while the other defects, the cooperator's payoff is −c while the defector's is b. The donation game is a special case of a prisoner's dilemma if b > c > 0, which is normally assumed.
If the donation game is played for a single round, players can only choose between the two possible strategies of cooperation and defection. Based on the game's payoffs, each player prefers to defect, creating the dilemma. In contrast, in the repeated donation game, infinitely many strategies are available. For example, players may choose to cooperate if and only if their co-player cooperated in the previous round. This is the well-known strategy Tit-for-tat [5]. Alternatively, players may wish to occasionally forgive a defecting opponent, as captured by Generous Tit-for-tat [9,10]. Against each of these strategies, unconditional defection is no longer the best response. Instead, mutual cooperation is now in the co-player's best interest.
During the past decades, there has been a considerable effort to explore whether conditionally cooperative behaviors would emerge naturally (e.g., [11-23]). To this end, researchers study the dynamics in evolving populations, in which strategies are transmitted either by biological or cultural evolution (by inheritance or imitation). For such an analysis, it is useful to restrict the space of strategies that individuals can choose from. The strategy space ought to be small enough for a systematic analysis, yet large enough to capture the most interesting behaviors.
When both players adopt memory-1 strategies, there is an explicit formula to derive their average payoffs (as described in the next section). Based on this formula, it is possible to characterize all Nash equilibria among the memory-1 strategies [32-37]. In general, however, the payoff formula yields a complex expression in the players' conditional cooperation probabilities p ij . As a result, it is difficult to characterize the dynamics of evolving populations, in which players switch strategies depending on the payoffs they yield. Most previous work had to resort to individual-based simulations.
Only in special cases has an analytical description been feasible (for example, based on differential equations). One special case arises when individuals are restricted to use reactive strategies [38-43]. Reactive strategies only depend on the co-player's previous move. Within the memory-1 strategies, they correspond to the 2-dimensional subset with p CC = p DC and p CD = p DD . In addition, there has been work on the replicator dynamics among three strategies [14,44], and on the dynamics among transformed memory-1 strategies [45,46]. Here, we wish to explore the dynamics among memory-1 strategies directly, using adaptive dynamics [47,48].
We begin by describing two interesting mathematical results. First, we show that under adaptive dynamics, the 4-dimensional space of memory-1 strategies contains an invariant 3-dimensional subset. This subset comprises all "counting strategies". These strategies only depend on the number of cooperators in the previous round. They correspond to memory-1 strategies with p CD = p DC . Second, we find that for the donation game, the adaptive dynamics exhibits an interesting symmetry between orbits forward-in-time and backward-in-time. We use these mathematical results to partially characterize the adaptive dynamics among memory-1 strategies, and to fully characterize the dynamics among memory-1 counting strategies.

Model
We study the infinitely repeated donation game between two players. Each round, each player has the option to cooperate (C) or to defect (D). Players make their choices independently, not knowing their co-player's choice in that round. Payoffs in each round are given by the matrix

$$\begin{pmatrix} b-c & -c \\ b & 0 \end{pmatrix} \qquad (1)$$

The entries correspond to the payoff of the row-player, with b and c being the benefit and cost of cooperation, respectively. We assume b > c > 0 throughout. The above payoff matrix is a special case of a symmetric 2 × 2 game with matrix

$$\begin{pmatrix} R & S \\ T & P \end{pmatrix} \qquad (2)$$

The payoff matrix (1) of the donation game satisfies the typical inequalities of a prisoner's dilemma, T > R > P > S and 2R > T + S. Moreover, it satisfies the condition of 'equal gains from switching',

$$R + P = S + T.$$

This condition ensures that if players interact repeatedly, their overall payoffs only depend on how often each player cooperates, independent of the timing of cooperation.
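As a quick numerical sanity check, these conditions can be verified directly (a minimal sketch; the values b = 3 and c = 1 are our own illustrative choice, not parameters from the paper):

```python
# Sketch: check that the donation game satisfies the prisoner's dilemma
# inequalities and 'equal gains from switching'. b = 3, c = 1 are
# illustrative values, not parameters taken from the paper.
b, c = 3.0, 1.0
R, S, T, P = b - c, -c, b, 0.0   # payoffs for outcomes CC, CD, DC, DD (row player)

assert T > R > P > S             # prisoner's dilemma ordering
assert 2 * R > T + S             # mutual cooperation beats alternating exploitation
assert R + P == S + T            # equal gains from switching
print("all donation-game conditions hold")
```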
In the following we focus on repeated games among players with memory-1 strategies. Each player's decision is determined by a four-tuple p = (p CC , p CD , p DC , p DD ). Depending on the outcome of the previous round, CC, CD, DC, or DD, the focal player responds by cooperating with probability p CC , p CD , p DC , or p DD , respectively. We refer to a memory-1 strategy as a counting strategy if it satisfies p CD = p DC . A counting strategy only reacts to the number of cooperators in the previous round. If both players cooperated in the previous round, they cooperate with probability p CC . If exactly one of the players cooperated, they cooperate with probability p CD = p DC , irrespective of whether the outcome was CD or DC. If no one cooperated, the cooperation probability is p DD . Memory-1 counting strategies include all unconditional strategies (such as ALLC and ALLD), as well as the strategies GRIM = (1, 0, 0, 0) and WSLS = (1, 0, 0, 1).
If the two players employ memory-1 strategies p = (p CC , p CD , p DC , p DD ) and p′ = (p′ CC , p′ CD , p′ DC , p′ DD ), then their behavior generates a Markov chain over the four states CC, CD, DC, DD, with transition matrix

$$M = \begin{pmatrix}
p_{CC}\,p'_{CC} & p_{CC}(1-p'_{CC}) & (1-p_{CC})\,p'_{CC} & (1-p_{CC})(1-p'_{CC}) \\
p_{CD}\,p'_{DC} & p_{CD}(1-p'_{DC}) & (1-p_{CD})\,p'_{DC} & (1-p_{CD})(1-p'_{DC}) \\
p_{DC}\,p'_{CD} & p_{DC}(1-p'_{CD}) & (1-p_{DC})\,p'_{CD} & (1-p_{DC})(1-p'_{CD}) \\
p_{DD}\,p'_{DD} & p_{DD}(1-p'_{DD}) & (1-p_{DD})\,p'_{DD} & (1-p_{DD})(1-p'_{DD})
\end{pmatrix} \qquad (4)$$

Here, states are recorded from the perspective of the p-player. If s(n) = (s CC (n), s CD (n), s DC (n), s DD (n)), and s ij (n) is the probability that the p-player chooses i and the p′-player chooses j in round n, then s(n+1) = s(n)M.
For p, p′ ∈ (0, 1)^4 , the Markov chain has a unique invariant distribution v = (v CC , v CD , v DC , v DD ). This distribution v corresponds to the left eigenvector of M with respect to the eigenvalue 1, normalized such that the entries of v sum up to one.
The entries of v can be interpreted as the average frequencies of the four possible outcomes over the course of the game. Therefore we can define the repeated-game payoff of the p-player as

$$A(p, p') = (b-c)\,v_{CC} - c\,v_{CD} + b\,v_{DC}. \qquad (5)$$

For a more explicit representation of the players' payoffs, one can use the determinant formula by [49], which is shown in Methods.
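This payoff construction is easy to reproduce numerically. The sketch below is our own illustrative reimplementation (with hypothetical parameters b = 3, c = 1): it builds the transition matrix, approximates the invariant distribution by power iteration, and recovers familiar payoffs such as the mutual-cooperation payoff b − c.

```python
# Illustrative reimplementation of the memory-1 payoff machinery.
# Strategy order: (p_CC, p_CD, p_DC, p_DD); b = 3, c = 1 are hypothetical values.

def transition_matrix(p, q):
    # States CC, CD, DC, DD from the p-player's view; the co-player q
    # sees the middle two states swapped.
    qs = [q[0], q[2], q[1], q[3]]
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    # Approximate the invariant distribution by power iteration.
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    # Repeated-game payoff of the p-player: (b-c) v_CC - c v_CD + b v_DC.
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

ALLC, ALLD = (1, 1, 1, 1), (0, 0, 0, 0)
print(payoff(ALLC, ALLC))   # mutual cooperation: b - c = 2.0
print(payoff(ALLD, ALLC))   # exploiting a cooperator: b = 3.0
print(payoff(ALLC, ALLD))   # being exploited: -c = -1.0
```

Strictly speaking, the invariant distribution is only guaranteed to be unique for strategies in the open cube; for the corner strategies above, the power iteration simply converges to the relevant absorbing state.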
To explore how players adapt their strategies over time, we use adaptive dynamics [47,48]. Adaptive dynamics is a method to study deterministic evolutionary dynamics in a continuous strategy space. The idea is that the population is (mostly) homogeneous at any given time. Mutations generate a small ensemble of possible invaders, which are very close to the resident in strategy space. These invaders can take over the population if they receive a higher payoff against the resident than the resident achieves against itself. In the limit of infinitesimally small variation between resident and invader, we obtain an ordinary differential equation. For memory-1 strategies this differential equation takes the form

$$\dot p_{ij} = \left.\frac{\partial A(p, p')}{\partial p_{ij}}\right|_{p = p'}, \qquad i, j \in \{C, D\}. \qquad (6)$$

That is, populations evolve in the direction of the payoff gradient. We derive an explicit representation of this differential equation in Methods. The resulting expression defines a flow on the cube [0, 1]^4 . Our aim is to understand the properties of this flow.
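Equation (6) can be probed numerically by approximating the payoff gradient with finite differences. The sketch below (an assumed reimplementation, with illustrative b = 3, c = 1) evaluates the vector field at the unconditional random strategy (1/2, 1/2, 1/2, 1/2); there, a short calculation gives −c/4 for every component, since the resident cooperates with probability 1/2 regardless of the mutant's behavior.

```python
# Finite-difference sketch of the adaptive-dynamics vector field (6).
# Illustrative parameters b = 3, c = 1; strategy order (p_CC, p_CD, p_DC, p_DD).

def transition_matrix(p, q):
    # States CC, CD, DC, DD from the p-player's view; the co-player q
    # sees the middle two states swapped.
    qs = [q[0], q[2], q[1], q[3]]
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    # Approximate the invariant distribution by power iteration.
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    # Repeated-game payoff of the p-player.
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

def velocity(p, b=3.0, c=1.0, eps=1e-6):
    # Gradient of the mutant payoff A(p', p) with respect to p', at p' = p.
    out = []
    for k in range(4):
        up, dn = list(p), list(p)
        up[k] += eps
        dn[k] -= eps
        out.append((payoff(tuple(up), p, b, c) - payoff(tuple(dn), p, b, c)) / (2 * eps))
    return out

print(velocity((0.5, 0.5, 0.5, 0.5)))   # each component is -c/4 = -0.25
```

Against this unresponsive resident, cooperation only incurs costs, so selection pushes all four cooperation probabilities down at the same rate.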

Structural properties of adaptive dynamics
We begin by describing two general properties of adaptive dynamics in the cube [0, 1]^4 . As a first property, we show in Methods that the subspace of counting strategies, with p CD = p DC , is invariant: trajectories that start in this subspace remain in it for all time. As a consequence, the two open regions with p CD > p DC and with p CD < p DC are invariant as well. Each of these invariant subsets can be studied in isolation. In a subsequent section, we provide such an analysis for the counting strategies (with p CD = p DC ) specifically.
As a second property, we observe an interesting symmetry between different orbits of adaptive dynamics. Specifically, if (p CC , p CD , p DC , p DD )(t) is a solution to (6) on some interval t ∈ (a, b), then so is (1−p DD , 1−p DC , 1−p CD , 1−p CC )(−t) on the interval t ∈ (−b, −a). This property implies that for every orbit forward in time, there is an associated orbit backward in time that exhibits the same dynamics. This result is specific to the donation game (or more precisely, to games with equal gains from switching). The formal proof of this symmetry is in Methods. In the following we provide an intuitive argument. To this end, consider the following series of transformations applied to the payoff matrix of a 2 × 2 game with equal gains from switching (R + P = S + T):

$$(R, S, T, P) \;\to\; (-R, -S, -T, -P) \;\to\; (-P, -T, -S, -R) \;\to\; (R{+}P{-}P,\; R{+}P{-}T,\; R{+}P{-}S,\; R{+}P{-}R) = (R, S, T, P).$$

The first step negates all payoffs, the second swaps the labels of cooperation and defection, and the third adds the constant R + P to every entry. Notice that we started and ended at the same game; this property is equivalent to equal gains from switching. But now it is easy to see that solutions to the associated ordinary differential equation transform correspondingly: negating all payoffs reverses time, swapping the labels maps the strategy p to (1−p DD , 1−p DC , 1−p CD , 1−p CC ), and adding a constant to all payoffs leaves the dynamics unchanged. The upshot of this duality is that solutions to adaptive dynamics come in related pairs. We will see expressions of this duality in several of the figures below.
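The pairing of forward and backward orbits is equivalent to a pointwise identity for the vector field: at the reflected strategy (1−p DD , 1−p DC , 1−p CD , 1−p CC ), the payoff gradient equals the gradient at p with its components read in reverse order. The sketch below checks this numerically; the finite-difference payoff code and the parameters b = 3, c = 1 are our own illustrative stand-ins for the explicit system in Methods.

```python
# Numerical check of the orbit duality for the donation game (b = 3, c = 1,
# illustrative values). The payoff machinery is an assumed reimplementation.

def transition_matrix(p, q):
    qs = [q[0], q[2], q[1], q[3]]       # co-player sees CD and DC swapped
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    v = [0.25] * 4                       # invariant distribution by power iteration
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

def velocity(p, b=3.0, c=1.0, eps=1e-6):
    # Gradient of the mutant payoff A(p', p) at p' = p.
    out = []
    for k in range(4):
        up, dn = list(p), list(p)
        up[k] += eps
        dn[k] -= eps
        out.append((payoff(tuple(up), p, b, c) - payoff(tuple(dn), p, b, c)) / (2 * eps))
    return out

p = (0.8, 0.6, 0.3, 0.2)
p_bar = (1 - p[3], 1 - p[2], 1 - p[1], 1 - p[0])

F = velocity(p)
F_bar = velocity(p_bar)
# Duality: the field at the reflected point is the field at p, reversed.
print(max(abs(a - w) for a, w in zip(F_bar, F[::-1])))   # numerically zero
```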

Adaptive dynamics of memory-1 strategies
In the following, we aim to get a more qualitative understanding of the adaptive dynamics. To this end, we first examine which combinations of signs can appear in the components of the vector field (ṗ CC , ṗ CD , ṗ DC , ṗ DD ). For example, it turns out that if p CC is decreasing, p DC must be decreasing as well. Similarly, if p DD is decreasing, then so is p CD . For c/b = 0.1, the results of this sign analysis are shown in Fig 1, for a 9×9×9×9 evenly spaced grid on [0, 1]^4 . Each point is colored according to the signs of the components of ṗ = (ṗ CC , ṗ CD , ṗ DC , ṗ DD ) at that point. Therefore, the figure provides information about the direction of adaptive dynamics at each point. We observe that the combinations abcd of signs come in pairs of the form abcd, dcba. For example, there are exactly as many points having signs '+---' as '---+'. The sets of points in each pair are related to each other by reflection about the diagonal in the figure. If abcd are the signs at (x, y, z, w) ∈ (0, 1)^4 , then dcba are the signs at (1−w, 1−z, 1−y, 1−x). This is, of course, a consequence of the symmetry described in the previous section.
In a next step, we aim to find all interior fixed points of adaptive dynamics. As we show in Methods, these turn out to be the solutions to the linear system (9). In particular, the set of interior critical points forms a two-dimensional plane within the four-dimensional cube. We can solve (9) for p CC and p DD to obtain Eq. (10). It is then not hard to obtain nontrivial bounds on p CC and p DD among the interior critical points: if p ∈ (0, 1)^4 is a critical point of (6), then p CC > c/b and p DD < 1 − c/b. There are no analogous restrictions on the possible values of p CD and p DC .
By definition, critical points satisfy a local condition, ṗij = 0 for all i, j ∈ {C, D}.
However, it turns out that the critical points identified above have a shared global property. The points that satisfy Eq. (10) coincide with the equalizer strategies that have been described earlier [49,50]. An equalizer is a strategy p such that A(p′, p) is a constant, irrespective of p′. Every such strategy must be a critical point of adaptive dynamics. Our result shows that the converse is also true: every interior critical point of the system (6) needs to be an equalizer.
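The equalizer property is easy to illustrate numerically. Following the construction of [49], one can build an equalizer for b = 3, c = 1; the specific strategy below is our own illustrative example, not one taken from the paper. Every co-player then receives the same payoff against it, here 0.5.

```python
import random

# Assumed reimplementation of the memory-1 payoff machinery (b = 3, c = 1).
def transition_matrix(p, q):
    qs = [q[0], q[2], q[1], q[3]]       # co-player sees CD and DC swapped
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    v = [0.25] * 4                       # invariant distribution by power iteration
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

# For b = 3, c = 1, this strategy satisfies the equalizer condition of [49]
# (our own illustrative example): it fixes every co-player's payoff at 0.5.
p_eq = (0.7, 0.5, 0.3, 0.1)

random.seed(1)
payoffs = [payoff(tuple(random.uniform(0.05, 0.95) for _ in range(4)), p_eq)
           for _ in range(5)]
print([round(x, 6) for x in payoffs])    # every co-player earns the same payoff, 0.5
```

Because A(p′, p_eq) does not depend on p′ at all, the payoff gradient vanishes at p_eq, consistent with it being a rest point of (6); note also that p CC = 0.7 > c/b = 1/3, in line with the bound above.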
We can also examine what happens on the boundary of the strategy space. For our analysis, we define the boundary B[0, 1]^4 to be the set of all points p ∈ [0, 1]^4 with exactly one entry p ij ∈ {0, 1}. That is, we exclude corner and edge points. What remains is a set of eight 3-dimensional cubes. We call a point p ∈ B[0, 1]^4 saturated if p ij = 0 implies ṗ ij ≤ 0 and p ij = 1 implies ṗ ij ≥ 0. A point is called strictly saturated if the above inequalities are strict. A point is unsaturated if it is not saturated. Orbits that start at an unsaturated point move into the interior of the strategy space. Conversely, every strictly saturated point is the limit, forward in time, of some trajectory in the interior.
For memory-1 strategies, all eight boundary faces contain both saturated and unsaturated points for some values of 0 < c < b (Fig 2). In the following, we discuss in more detail the boundary face for which mutual cooperation is absorbing (that is, the boundary face with p CC = 1). On this boundary face, the population obtains the socially optimal payoff of b−c, irrespective of the specific values of p CD , p DC , p DD . As a result, we show in Methods that the time derivatives with respect to these components vanish, ṗ CD = ṗ DC = ṗ DD = 0. The saturated points on the face p CC = 1 are exactly those that satisfy ṗ CC ≥ 0, which yields condition (11); the explicit inequality is derived in Methods. This set of saturated points contains all cooperative memory-1 Nash equilibria, which have been characterized by [33] as the set of all strategies p that satisfy p CC = 1 and the conditions (12). We note that the conditions (12) are more strict than the conditions (11). Put another way, a boundary point can be a local maximum of the payoff function against itself without being a global maximum.
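The vanishing of ṗ CD , ṗ DC , ṗ DD on this face can be confirmed numerically: with p CC = 1 fixed for resident and mutant alike, mutual cooperation is absorbing, the payoff stays at b − c, and the corresponding finite-difference derivatives are zero (a sketch with illustrative b = 3, c = 1; the code is an assumed reimplementation of the model):

```python
# Assumed reimplementation of the memory-1 payoff machinery (b = 3, c = 1).
def transition_matrix(p, q):
    qs = [q[0], q[2], q[1], q[3]]       # co-player sees CD and DC swapped
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    v = [0.25] * 4                       # power iteration; converges to the
    for _ in range(iters):               # absorbing state CC on this face
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

def face_derivatives(p, b=3.0, c=1.0, eps=1e-6):
    # Derivatives of A(p', p) at p' = p with respect to p'_CD, p'_DC, p'_DD
    # only, keeping p'_CC = 1 fixed.
    out = []
    for k in (1, 2, 3):
        up, dn = list(p), list(p)
        up[k] += eps
        dn[k] -= eps
        out.append((payoff(tuple(up), p, b, c) - payoff(tuple(dn), p, b, c)) / (2 * eps))
    return out

p_face = (1.0, 0.4, 0.6, 0.3)            # mutual cooperation is absorbing
print(round(payoff(p_face, p_face), 6))  # socially optimal payoff b - c = 2.0
print(face_derivatives(p_face))          # all three derivatives vanish
```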
In a similar way, one can also characterize the saturated points on the boundary face with p DD = 0, where mutual defection is absorbing. We depict the set of saturated points on this face in the bottom row of Fig 2, together with the previously discussed set of saturated points with p CC = 1. As the figure suggests, the two sets exactly complement each other. For every point that is strictly saturated on the boundary face p CC = 1, there is a corresponding point on the face p DD = 0 that is unsaturated. Of course, that correspondence is again a consequence of the symmetry described earlier.
After describing the critical points in the interior and the saturated points on the boundary, we explore the 'typical' behavior of interior trajectories. To this end, we record the end behavior of solutions p(t) to Eq. (6) beginning at various initial conditions p(0). Dynamics are assumed to cease at the boundary of the strategy space. This behavior can be numerically calculated. The results, for a 9×9×9×9 grid of initial conditions, are shown in Fig 3.

Adaptive dynamics of memory-1 counting strategies
After describing the dynamics of memory-1 strategies, we proceed by analyzing the dynamics of counting strategies, with p CD = p DC . Counting strategies are especially convenient because they can be represented in three dimensions. To make this representation explicit, in the following we write counting strategies as vectors q = (q 0 , q 1 , q 2 ) ∈ [0, 1]^3 . Here, q i is the probability to cooperate if i of the two players cooperated in the previous round. The respective memory-1 representation is thus given by p CC = q 2 , p CD = p DC = q 1 , and p DD = q 0 . Correspondingly, the dynamics that we explore is given by

$$\dot q_i = \left.\frac{\partial A(q, q')}{\partial q_i}\right|_{q = q'}, \qquad i \in \{0, 1, 2\}. \qquad (13)$$

This dynamics among counting strategies is not identical to the previously considered dynamics among memory-1 strategies, even when the starting population is taken from the invariant subset with p CD = p DC . Instead, differences arise because the embedding [0, 1]^3 → [0, 1]^4 is not distance-preserving with the standard metric on each space. As a result, the gradient of the payoff function is computed slightly differently in the two spaces: the counting dynamics (13) differs from the memory-1 adaptive dynamics (6), restricted to the subspace of counting strategies, by a factor of 2 in q̇ 1 (t). The analysis in this section is thus not meant to characterize the orbits of the invariant subspace of counting strategies within the memory-1 strategies. Rather, we consider the space of counting strategies [0, 1]^3 as an interesting space in its own right, which we analyze in the following.
In a first step, we reproduce the sign analysis of the previous section: in Fig 4, we plot the signs of the components of (q̇ 0 , q̇ 1 , q̇ 2 ) at each counting strategy. As one may expect, these combinations again come in pairs, where abc is paired with cba. Some combinations, such as +++, are self-paired. Similar to the memory-1 strategies, we also want to characterize the set of interior critical points of the system (13). In Methods, we show that these points can be parametrized by

$$q = \left(s,\; s + \frac{c}{b+c},\; s + \frac{2c}{b+c}\right), \qquad s \in \left(0,\, \frac{b-c}{b+c}\right).$$

Hence the set of interior critical points forms a straight line segment. The boundary points of this line segment are

$$\left(0,\; \frac{c}{b+c},\; \frac{2c}{b+c}\right) \quad\text{and}\quad \left(\frac{b-c}{b+c},\; \frac{b}{b+c},\; 1\right).$$

The length of this line segment is √3(b−c)/(b+c), which ranges from √3 (the diagonal of the cube) to 0 as c/b ranges from 0 to 1. We can classify the stability of the critical points by finding their associated eigenvalues. The complete results are shown in Fig 5.
Five generic types of critical points are present as we vary the cost-to-benefit ratio: source, spiral source, spiral sink, sink, and saddle.
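The critical line segment can be checked numerically. For b = 3, c = 1 (illustrative values) it runs from (0, 0.25, 0.5) to (0.5, 0.75, 1); the sketch below verifies that the point with s = 0.2, that is q = (0.2, 0.45, 0.7), is a rest point of the counting dynamics. The payoff code is our own hypothetical reimplementation of the model.

```python
# Assumed reimplementation of the payoff machinery (b = 3, c = 1).
def transition_matrix(p, q):
    qs = [q[0], q[2], q[1], q[3]]       # co-player sees CD and DC swapped
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    v = [0.25] * 4                       # invariant distribution by power iteration
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

def embed(q):
    # (q0, q1, q2) -> (p_CC, p_CD, p_DC, p_DD) = (q2, q1, q1, q0)
    return (q[2], q[1], q[1], q[0])

def counting_velocity(q, b=3.0, c=1.0, eps=1e-6):
    # Gradient of the mutant payoff with respect to the mutant counting
    # strategy, evaluated at mutant = resident.
    out = []
    for k in range(3):
        up, dn = list(q), list(q)
        up[k] += eps
        dn[k] -= eps
        out.append((payoff(embed(up), embed(q), b, c)
                    - payoff(embed(dn), embed(q), b, c)) / (2 * eps))
    return out

# A point on the critical segment: q1 is the midpoint of q0 and q2,
# and q2 - q0 = 2c/(b+c) = 0.5.
print(counting_velocity((0.2, 0.45, 0.7)))   # all three components vanish
```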
In addition to these interior critical points, Fig 6 also depicts the critical points on the boundary faces B[0, 1]^3 . Using the terminology of the previous section, these critical points are saturated without being strictly saturated. On each boundary face, the respective curve thus separates the region of strictly saturated points from the unsaturated points. Because of the aforementioned symmetry of solutions, the set of boundary fixed points is symmetric under the transformation (x, y, z) → (1−z, 1−y, 1−x). We note that counting strategies have boundary properties not shared by memory-1 strategies. For example, every boundary point with q 1 = 0 is saturated. Conversely, every boundary point with q 1 = 1 is unsaturated.
To explore the dynamics in the interior, Fig 7 depicts the end behavior of solutions q(t) to Eq. (13) with initial conditions on an evenly spaced grid (analogous to Fig 3). Again, dynamics are assumed to cease at the boundary. We observe that out of 729 initial points, 190 evolve to full cooperation, 140 evolve to full defection, 229 evolve to other places on the boundary, and 170 evolve to interior critical points. The overall abundance of the four outcomes is thus similar to the respective numbers in the space of all memory-1 strategies, with the only exception being that now more orbits converge to interior fixed points.
We can also plot a few solutions q(t) of Eq. (13) in three dimensions to give an idea of the possible behaviors. Four types of behavior are shown in Fig 8. Alongside plots of the trajectory q(t), we depict the cooperation rate C(q(t)), defined as the average rate of cooperation in a large population playing the respective strategy. Previous studies show that these cooperation rates change monotonically when players are restricted to use reactive strategies (those with p CC = p DC and p CD = p DD , see [1]). Within the counting strategies, this monotonicity is violated in the third and fourth example, and the fourth converges to an intermediate cooperation rate rather than full cooperation or full defection.

Discussion and conclusion
The donation game is one of the main paradigms to explore direct reciprocity, and memory-1 strategies are among the best-studied strategy spaces in the respective literature [23-31]. These strategies are comparably simple. They only condition on the outcome of the very last round, while ignoring the outcomes of all previous rounds. Despite their simplicity, the formulas that describe the payoffs of memory-1 players are non-trivial to manipulate mathematically. As a result, many previous studies on memory-1 strategies rely on simulations. On the one hand, such simulations give valuable insights into the dynamics of reciprocity. On the other hand, they make it difficult to describe why certain strategies are favored by evolution, and how results depend on parameters such as the cost of cooperation.

To get a more analytical description of the evolution of reciprocity, we use the framework of adaptive dynamics. This framework considers homogeneous populations that move in the direction of mutants with maximum invasion fitness [47,48]. For our setup of memory-1 players in the donation game, we show that this dynamics has two remarkable mathematical properties. Our first result concerns the subspace of counting strategies. Counting strategies only depend on the number of cooperating players in the previous round. We show that the adaptive dynamics leaves the subspace of counting strategies invariant. Moreover, we show in Methods that this invariance result is not restricted to donation games or memory-1 strategies. A similar invariance arises for arbitrary repeated 2 × 2 games, or when players remember more than the very last round.
Second, we describe an interesting symmetry between forward-in-time orbits and backward-in-time orbits. This symmetry is specific to the donation game. Its importance becomes apparent in many of our figures.

We use these mathematical insights to qualitatively describe the adaptive dynamics of memory-1 strategies and of counting strategies. In particular, we describe the set of interior critical points and the set of saturated boundary points. Any converging solution of adaptive dynamics ends up in one of these two sets. While previous research has identified which memory-1 strategies are Nash equilibria [33,34], our study identifies those memory-1 strategies that satisfy a local notion of uninvadability. For example, Eq. (11) describes all memory-1 strategies that are mutually cooperative and locally stable. The respective condition is less stringent than the condition for being a Nash equilibrium. This insight allows for the following interpretation: if evolution generates mutant strategies that are phenotypically similar to the parent, there is a strictly larger set of memory-1 strategies that can maintain cooperation.
We believe these results give a more rigorous understanding of the properties of memory-1 strategies. At the same time, we hope that similar techniques can be used to explore other games and more general strategy spaces.

Derivation of the adaptive dynamics
In the main text, we have described how to define the payoff of two players with memory-1 strategies by representing the game as a Markov chain. However, to derive the adaptive dynamics, it is useful to start with an alternative representation of the payoffs. As shown by [49], the payoff expression (5) can be rewritten as the ratio of two determinants,

$$A(p, p') = \frac{D(p, p', S_p)}{D(p, p', \mathbf{1})}, \qquad
D(p, p', f) = \det\begin{pmatrix}
-1 + p_{CC}\, p'_{CC} & -1 + p_{CC} & -1 + p'_{CC} & f_{CC} \\
p_{CD}\, p'_{DC} & -1 + p_{CD} & p'_{DC} & f_{CD} \\
p_{DC}\, p'_{CD} & p_{DC} & -1 + p'_{CD} & f_{DC} \\
p_{DD}\, p'_{DD} & p_{DD} & p'_{DD} & f_{DD}
\end{pmatrix}, \qquad (16)$$

where S_p = (b−c, −c, b, 0) contains the p-player's one-round payoffs and 1 = (1, 1, 1, 1). Using this representation, we can write out the expression for adaptive dynamics (6). It is convenient to multiply the resulting equations by the common denominator D(p, p, 1)^2 in Eq. (17). This denominator is positive in the interior (0, 1)^4 of the strategy space. Hence, multiplying by the denominator only affects the timescale of evolution, but not the direction of the trajectories. After applying this modification to the system (6), the dynamics among the memory-1 strategies of the donation game takes the form given in (18), with auxiliary functions f i , g i , h i for i ∈ {1, 2, 3, 4} defined in (19). Note that we can write f i , g i , h i for i ∈ {3, 4} in terms of the same functions for i ∈ {1, 2}.

Invariance of counting strategies
Using the representations (18) and (19), it becomes straightforward to show that the space of memory-1 counting strategies remains invariant under adaptive dynamics.

Proposition 1. Let C denote the three-dimensional subspace of counting strategies among the memory-1 strategies,

$$C = \{\, p \in [0, 1]^4 \;:\; p_{CD} = p_{DC} \,\}. \qquad (20)$$

Then C is invariant under adaptive dynamics. That is, if p(t) is a solution of Eq. (18) with p(0) ∈ C, then p(t) ∈ C for all t.
Proof. By using the definitions in (19), one can verify the identities (21). In particular, if we define d := p CD − p DC , it follows from (18) and (21) that ḋ satisfies Eq. (22). For d = p CD − p DC = 0, we can therefore conclude that ḋ = 0.
While the proof of Proposition 1 shows that the set of counting strategies is invariant, it also shows that this set is not a local attractor. Instead, from Eq. (22) it follows that the distance d to the set of counting strategies decreases at a given time if and only if p ∈ (0, 1)^4 satisfies p CC + p DD > p CD + p DC .
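The invariance can also be observed directly in the vector field: whenever p CD = p DC , the two middle components of the payoff gradient coincide, so the difference d stays at zero. A numerical sketch (illustrative b = 3, c = 1; the finite-difference gradient is an assumed stand-in for the explicit system (18)):

```python
# Assumed reimplementation of the payoff machinery (b = 3, c = 1).
def transition_matrix(p, q):
    qs = [q[0], q[2], q[1], q[3]]       # co-player sees CD and DC swapped
    return [[a * w, a * (1 - w), (1 - a) * w, (1 - a) * (1 - w)]
            for a, w in zip(p, qs)]

def stationary(M, iters=20000):
    v = [0.25] * 4                       # invariant distribution by power iteration
    for _ in range(iters):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return v

def payoff(p, q, b=3.0, c=1.0):
    v = stationary(transition_matrix(p, q))
    return (b - c) * v[0] - c * v[1] + b * v[2]

def velocity(p, b=3.0, c=1.0, eps=1e-6):
    # Gradient of the mutant payoff A(p', p) at p' = p.
    out = []
    for k in range(4):
        up, dn = list(p), list(p)
        up[k] += eps
        dn[k] -= eps
        out.append((payoff(tuple(up), p, b, c) - payoff(tuple(dn), p, b, c)) / (2 * eps))
    return out

p_count = (0.8, 0.4, 0.4, 0.3)   # a counting strategy: p_CD = p_DC
F = velocity(p_count)
print(round(F[1] - F[2], 8))     # the CD- and DC-components agree, so d stays 0
```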

A symmetry between forward and backward orbits
Another direct implication of the functional form of adaptive dynamics in Eqs. (18) and (19) is that solutions come in pairs. In the main text we give an intuitive argument for this symmetry in the donation game. Here we derive the result formally: if p(t) is a solution of Eq. (18), then so is p̄(−t), where p̄ := (1−p DD , 1−p DC , 1−p CD , 1−p CC ).

Proof. We show the result for the first component; the other components follow similarly. For the first component, the definitions in (19) yield the required equality of time derivatives. Therefore, if p(t) satisfies the differential equation (18), then so does p̄(−t).
The transformation p → p̄, defined by p̄ = (1−p DD , 1−p DC , 1−p CD , 1−p CC ), reflects a point in the hypercube [0, 1]^4 with respect to the 2-dimensional plane

$$P = \{\, p \in [0, 1]^4 \;:\; p_{CC} + p_{DD} = 1,\; p_{CD} + p_{DC} = 1 \,\}.$$

That is, if one takes the line segment between p and p̄, then the midpoint of this line segment is in P. The plane P is exactly the set of points that are mapped onto themselves. Every point is mapped onto itself if the transformation is applied twice. It can be checked that the transformation p → p̄ maps critical points to critical points (see next subsection), and more generally interchanges α-limits and ω-limits.

Critical points of adaptive dynamics
In the following, we characterize the rest points of adaptive dynamics in the interior of the hypercube.
We can relate these critical points to the equalizer strategies discussed by [50] and [49].
Definition. An equalizer is a strategy p for which A(p′, p) is a constant function of p′.

It follows from the definition that every equalizer strategy is a critical point of the dynamics (18). In the interior (0, 1)^4 , the converse is also true. That is,

Proposition 4. Every interior critical point of the system (18) is an equalizer.

Proof. Our condition for critical points (28) coincides with the expression for equalizers, Eq. (8) in [49], when using the payoffs of the donation game.

As shown by [34], equalizers are the only Nash equilibria among the stochastic memory-1 strategies. Thus our above results can be summarized as follows: in the donation game, an interior point is a critical point of adaptive dynamics if and only if it is a Nash equilibrium. (Such a result does not need to hold in general, because strategies might be locally stable fixed points of adaptive dynamics without being global best responses to themselves; see [45].)

Analysis of the boundary faces
In the main text, we define the boundary of the strategy space [0, 1]^4 as the set of all (p CC , p CD , p DC , p DD ) for which exactly one entry is in {0, 1}. Therefore there are eight different boundary faces. One particularly important face is the one with p CC = 1, which corresponds to a fully cooperative population. It follows from Eq. (19) that on this boundary face, f 2 (p CC , p DD ) = f 3 (p CC , p DD ) = f 4 (p CC , p CD , p DC ) = 0. By Eq. (18) we can then conclude that ṗ CD = ṗ DC = ṗ DD = 0. A point p on this boundary face is saturated if and only if ṗ CC ≥ 0. By Eq. (18), writing out the inequality ṗ CC ≥ 0 yields condition (11).
The boundary face with p DD = 0 can be analyzed analogously.

Adaptive dynamics of memory-1 counting strategies
In the following, we identify memory-1 counting strategies with points in the 3-dimensional cube [0, 1]^3 . The entries of a counting strategy q = (q 0 , q 1 , q 2 ) correspond to the cooperation probabilities in the next round, based on the number of cooperators in the previous round. We can embed the space of counting strategies into the space of memory-1 strategies by using the mapping (q 0 , q 1 , q 2 ) → (q 2 , q 1 , q 1 , q 0 ). Using this embedding, we can compute the payoff of a q-player against a q′-player using the payoff formula (16). In the following we study the adaptive dynamics of counting strategies. Again, we consider a homogeneous population with strategy q, evolving in the direction of the gradient of the payoff function, now calculated in [0, 1]^3 . Evolution in the space of counting strategies is thus given by

$$\dot q_i = \left.\frac{\partial A(q, q')}{\partial q_i}\right|_{q = q'}, \qquad i \in \{0, 1, 2\}. \qquad (30)$$

To write out the adaptive dynamics equation (30) in full, it is again convenient to multiply the equations by the common denominator r(q 0 , q 1 , q 2 )^2 given in (31). This denominator is nonzero in the interior (0, 1)^3 of the strategy space. After this rescaling, the system of equations (30) becomes

$$\begin{aligned}
\dot q_0 &= f_0(q_1, q_2) \cdot \bigl[\, b \cdot g_0(q_0, q_1, q_2) + c \cdot h_0(q_0, q_1, q_2) \,\bigr] \\
\dot q_1 &= f_1(q_0, q_2) \cdot \bigl[\, b \cdot g_1(q_0, q_1, q_2) + c \cdot h_1(q_0, q_1, q_2) \,\bigr] \\
\dot q_2 &= f_2(q_0, q_1) \cdot \bigl[\, b \cdot g_2(q_0, q_1, q_2) + c \cdot h_2(q_0, q_1, q_2) \,\bigr]
\end{aligned} \qquad (32)$$

The auxiliary functions f i , g i , h i now take a form analogous to (19).

Critical points of adaptive dynamics of counting strategies
Again, in the following we characterize the rest points of adaptive dynamics in the interior of [0, 1] 3 .
Proposition 5. The interior critical points of the system (32) are parametrized by

$$q = \left(s,\; s + \frac{c}{b+c},\; s + \frac{2c}{b+c}\right), \qquad s \in \left(0,\, \frac{b-c}{b+c}\right).$$

Proof. Because f 0 , f 1 , f 2 do not vanish in the interior of the strategy space (0, 1)^3 , we can divide each equation of (32) by the respective factor f i . At a critical point we have q̇ 0 = q̇ 1 = q̇ 2 = 0, so the remaining expressions on the right-hand side must vanish. This implies q 0 − 2q 1 + q 2 = 0 or q 0 = q 1 = q 2 (in which case q 0 − 2q 1 + q 2 = 0 holds trivially). So q 1 = (q 0 + q 2 )/2 is a necessary condition for the strategy q to be a fixed point. To obtain a condition that is also sufficient, we plug this expression for q 1 back into the system. The resulting expression only vanishes when q 0 − q 2 = −2c/(b+c). The solutions to the two conditions q 1 = (q 0 + q 2 )/2 and q 0 − q 2 = −2c/(b+c) are exactly the strategies in the parametrization above. Conversely, it is easily checked that all of these strategies are critical points of (32).
Thus the interior critical points form a straight line segment in the interior of the cube, with endpoints (q_0, q_1, q_2) = (0, c/(b+c), 2c/(b+c)) and (1 − 2c/(b+c), 1 − c/(b+c), 1).
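The critical points of Proposition 5 coincide with so-called equalizer strategies: residents that fix every opponent's payoff, so that no mutant direction is favored. The sketch below (our own code, not the paper's; it assumes b = 1, c = 0.1, and all function names are ours) confirms this property numerically along the line.

```python
import numpy as np

def stationary(p, pp):
    # Stationary distribution of the Markov chain over (CC, CD, DC, DD).
    swap = [0, 2, 1, 3]
    M = np.empty((4, 4))
    for s in range(4):
        a, k = p[s], pp[swap[s]]
        M[s] = [a * k, a * (1 - k), (1 - a) * k, (1 - a) * (1 - k)]
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0.0, 0, 0, 0, 1]), rcond=None)
    return v

def payoff(p, pp, b=1.0, c=0.1):
    # Donation-game payoff of the p-player against the pp-player.
    v = stationary(p, pp)
    return b * (v[0] + v[2]) - c * (v[0] + v[1])

def critical_point(t, b=1.0, c=0.1):
    # Point on the line of Proposition 5, embedded as (q2, q1, q1, q0):
    # q1 = (q0 + q2)/2 and q2 - q0 = 2c/(b+c).
    d = c / (b + c)
    return np.array([t + d, t, t, t - d])

def opponent_payoffs(t, n=20, seed=1):
    # Payoffs of n random interior memory-1 opponents against the
    # critical point; for an equalizer they should all coincide.
    rng = np.random.default_rng(seed)
    resident = critical_point(t)
    return [payoff(x, resident) for x in rng.uniform(0.05, 0.95, (n, 4))]
```

Under these assumptions, every opponent of the critical point with parameter t earns the same payoff t(1 + c) − c; in particular, for t = 0.5 and c = 0.1 this common value is 0.45, so the mutant payoff gradient vanishes there.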

Extensions of the invariance result
Our Proposition 1 shows that among the memory-1 strategies of the donation game, adaptive dynamics leaves the set of counting strategies invariant. In the following, we derive two generalizations of this result. In a first step, we show that the same result holds for arbitrary repeated 2×2 games.
Proposition 6. Let C denote the three-dimensional subspace of counting strategies among the memory-1 strategies, as defined by Eq. (20). Then C is invariant under adaptive dynamics, for any repeated 2 × 2 game with payoff matrix (2).
Proof. Let M be the Markov chain of the form (4) generated by the behavior of two players with strategies p and p′, and let v denote the associated stationary distribution. The payoff to the p-player in the repeated 2 × 2 game is then given by A(p, p′) = π(v), where π : ℝ⁴ → ℝ is a linear map that depends on the payoff matrix of the game but not on p or p′.
By definition, vM = v. If we introduce an infinitesimal variation δp in the strategy p, there will be an associated δM and δv, and they satisfy (v + δv)(M + δM) = v + δv. Since vM = v, and since δv δM is disregarded as doubly infinitesimal, we have δvM + vδM = δv. Choose δp to be (0, ε, −ε, 0). Then it can be seen easily that δM has nonzero entries only in the two rows corresponding to the states CD and DC. Now suppose p and p′ are equal and furthermore that p_CD = p_DC. Then v_CD = v_DC by symmetry, and vδM manifestly vanishes. It follows from the above that δvM = δv.
Then δv is proportional to v by uniqueness of the stationary distribution. But we also demand that the sum of the components of v + δv is 1. Thus δv = 0, and there is no variation in the payoff π(v). No player gains from deviating infinitesimally off the hypersurface p_CD = p_DC, i.e. from departing the space C.
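The argument above can be checked by finite differences: at a resident on the hypersurface p_CD = p_DC, the mutant deviation δp = (0, ε, −ε, 0) must leave the payoff unchanged to first order. The sketch below (our own code, assuming the donation game with b = 1, c = 0.1; the function names are ours) verifies this for random residents in C.

```python
import numpy as np

def stationary(p, pp):
    # Stationary distribution of the Markov chain over (CC, CD, DC, DD).
    swap = [0, 2, 1, 3]
    M = np.empty((4, 4))
    for s in range(4):
        a, k = p[s], pp[swap[s]]
        M[s] = [a * k, a * (1 - k), (1 - a) * k, (1 - a) * (1 - k)]
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0.0, 0, 0, 0, 1]), rcond=None)
    return v

def payoff(p, pp, b=1.0, c=0.1):
    # Donation-game payoff of the p-player against the pp-player.
    v = stationary(p, pp)
    return b * (v[0] + v[2]) - c * (v[0] + v[1])

def off_subspace_derivative(p, b=1.0, c=0.1, eps=1e-6):
    # Directional derivative of the mutant payoff A(p + s*d, p) at s = 0
    # along d = (0, 1, -1, 0), i.e. off the counting subspace p_CD = p_DC.
    d = np.array([0.0, 1.0, -1.0, 0.0])
    return (payoff(p + eps * d, p, b, c)
            - payoff(p - eps * d, p, b, c)) / (2 * eps)
```

By Proposition 6 this derivative should vanish (up to discretization error) whenever the resident satisfies p_CD = p_DC, which is exactly what the check below asserts.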
In a second step, we ask whether a similar invariance result applies to memory-n strategies. With an argument similar to the one above, we can show that it applies at least in a restricted way.
Our notation for memory-n strategies is best introduced by example: the component p(CDC, DDC) of a memory-3 strategy of player 1 denotes the probability of cooperation if the outcomes of the most recent three rounds were CD, DD, CC, in that order (the first sequence lists one player's moves, the second the co-player's).

Proposition 7. Consider the adaptive dynamics of memory-n strategies, and let s be a fixed but arbitrary sequence of n−1 moves for one player. Then the set of strategies satisfying condition (40) is invariant for any repeated 2 × 2 game.
Proof. Similar to before, let M be the Markov chain generated by the behavior of two players with memory-n strategies p and p′, with stationary distribution v. The components of v are the average frequencies of observing each possible history of length n over the course of the game. The payoff to player 1 is given by A(p, p′) = π(v), where π : ℝ^{4ⁿ} → ℝ is again a linear function that depends on the payoff matrix of the game but is independent of p and p′. Again, we introduce an infinitesimal variation δp in the strategy p. As a result, there will be an associated δM and δv, and, as before, they satisfy δvM + vδM = δv. Now suppose that p is a memory-n strategy that satisfies condition (40), with s being an arbitrary but fixed sequence of length n−1 of C's and D's. Let e_i denote the vector with a 1 in the i-th position and zeros elsewhere, and let e_{i,j} denote the matrix with a 1 in the (i, j)-th entry and zeros elsewhere; the dimensions will be clear from context. We introduce the infinitesimal variation (41) in p; the corresponding variation in M is given by (42), and a direct computation yields (43). If p and p′ are equal, then it follows by symmetry that (44) holds. Now condition (40), applied to p′, together with (44), implies that the right-hand side of (43) vanishes. Since vδM = 0, our initial discussion shows that δvM = δv. Therefore δv is proportional to v by uniqueness of the stationary distribution. Because the sum of the components of v + δv is 1, we conclude that δv = 0. Hence there is no variation in the payoff π(v), and no player gains from making the infinitesimal variation (41).

The long-time limits for a grid of initial conditions and cost-to-benefit ratio c/b = 0.1 are shown in Fig 3. There are 6561 initial conditions. Out of those, 1835 points are observed to end at full cooperation (p_CC = 1), 1375 points at full defection (p_DD = 0), 2964 points at other places on the boundary, and 387 at interior critical points (equalizers). Unlike in Fig 1, we do not observe the symmetry described in equations (7)–(8); the choice of depicting the forward direction of time breaks the symmetry.
In Fig 1, counting strategies correspond to the points on the diagonal p_CD = p_DC of each subpanel. Fig 4 is the analog of Fig 1 for the case of counting strategies.

This symmetry also appears in Fig 1 and Fig 2, where it leads to beautiful geometric patterns.

Since 1 − p_CD + p_DC > 0 for p_CD, p_DC ∈ (0, 1), either p_CC = p_CD = p_DC = p_DD or p_CC + p_DD = p_CD + p_DC must hold. Note that if p_CC = p_CD = p_DC = p_DD, then p_CC + p_DD = p_CD + p_DC holds trivially. Hence, in both cases we have the identity p_DD = p_CD + p_DC − p_CC, which we can plug into ṗ_CC / f_1(p_CD, p_DC, p_DD) to get

$$\frac{\dot p_{CC}}{f_1(p_{CD}, p_{DC}, p_{DD})} = \bigl[\,b(p_{CD} - p_{CC}) + c(1 - p_{CC} + p_{DC})\,\bigr] \cdot \bigl[\,-1 + (p_{CD} - p_{CC})^2 + (p_{DC} - p_{CC})^2\,\bigr] \qquad (26)$$

An interior critical point thus needs to satisfy

$$b(p_{CC} - p_{CD}) - c(1 - p_{CC} + p_{DC}) = 0 \qquad \text{and} \qquad p_{CC} + p_{DD} = p_{CD} + p_{DC} \qquad (27)$$

(⇐) If a strategy satisfies the conditions (27), we can express p_CD and p_DC in terms of p_CC and p_DD:

$$p_{CD} = \frac{b\,p_{CC} - (1 + p_{DD})\,c}{b - c} \qquad \text{and} \qquad p_{DC} = \frac{(1 - p_{CC})\,c + p_{DD}\,b}{b - c}$$
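The conditions (27) can be probed numerically: strategies satisfying them act as equalizers, fixing every opponent's payoff, which is consistent with their being critical points. The sketch below (our own illustrative code, assuming b = 1, c = 0.1; the function names are ours) builds such a strategy from p_CC and p_DD via the two formulas above and checks the equalizer property against random opponents.

```python
import numpy as np

def stationary(p, pp):
    # Stationary distribution of the Markov chain over (CC, CD, DC, DD).
    swap = [0, 2, 1, 3]
    M = np.empty((4, 4))
    for s in range(4):
        a, k = p[s], pp[swap[s]]
        M[s] = [a * k, a * (1 - k), (1 - a) * k, (1 - a) * (1 - k)]
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0.0, 0, 0, 0, 1]), rcond=None)
    return v

def payoff(p, pp, b=1.0, c=0.1):
    # Donation-game payoff of the p-player against the pp-player.
    v = stationary(p, pp)
    return b * (v[0] + v[2]) - c * (v[0] + v[1])

def equalizer(p_cc, p_dd, b=1.0, c=0.1):
    # Complete a memory-1 strategy from p_CC and p_DD using the formulas
    # obtained from conditions (27).
    p_cd = (b * p_cc - (1 + p_dd) * c) / (b - c)
    p_dc = ((1 - p_cc) * c + p_dd * b) / (b - c)
    return np.array([p_cc, p_cd, p_dc, p_dd])
```

For instance, with p_CC = 0.8 and p_DD = 0.3 the construction yields an interior strategy against which (under our assumptions) every opponent earns the same payoff.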

The line of interior critical points shrinks from length √3 (the length of the diagonal of the cube) to 0 as c/b ranges from 0 to 1. We can classify the stability of these critical points by finding their associated eigenvalues. The results are complicated, but are shown in Fig 5.

Fig 1. Local adaptive dynamics for memory-1 strategies. For a 9×9×9×9 grid (= 6561 points) we show the direction of change in terms of the sign of each component of (ṗ_CC, ṗ_CD, ṗ_DC, ṗ_DD), as given by Eq. (6). The possibilities are shown on the right. We observe that for 1424 points all four components are positive (++++). For 3269 points all four components are negative (----). Seven combinations do not occur; these fall into one or both of the following categories: (i) ṗ_CC is negative and ṗ_DC is positive, and (ii) ṗ_DD is negative and ṗ_CD is positive. Both combinations are forbidden. Because of the symmetry (8), there are three pairs in which each combination occurs as often as its partner. One such pair is ++-+ and +-++ (each occurring 353 times). The configuration +--+ is its own mirror image and therefore a singleton (occurring 536 times). The reason for the symmetry in the plot is explained in the main text. Let σ : [0, 1]⁴ → [0, 1]⁴ be defined by σ(p_CC, p_CD, p_DC, p_DD) = (1−p_DD, 1−p_DC, 1−p_CD, 1−p_CC). If abcd are the signs at p, then dcba are the signs at σ(p); σ acts by reflection about the dotted diagonal line shown. Finally, eight points are critical points with (ṗ_CC, ṗ_CD, ṗ_DC, ṗ_DD) = (0, 0, 0, 0). Two points are zero in one but not all four components. The graph is created for c = 0.1.

Fig 2. Saturated points on the boundary of memory-1 strategies. The boundary of the set of memory-1 strategies consists of eight three-dimensional faces with p_ij = 0 or p_ij = 1 for exactly one pair i, j ∈ {C, D}. We omit points (p_CC, p_CD, p_DC, p_DD) for which more than one p_ij is 0 or 1; thus, the eight boundary faces do not intersect. A point p on the boundary is saturated if the payoff gradient does not point into the interior of the cube. We show the set of saturated points on all eight boundary faces. Because of the symmetry described by Eqs. (7) and (8), these eight sets of points fit together in four complementary pairs, like the curved pieces of a three-dimensional puzzle. The boundary face p_ij = 0 is paired with the face p_ī j̄ = 1 (where a bar refers to the opposite action, C̄ = D and D̄ = C). The paired boundary faces fit together after rotating one of them by 180° about the line parameterized by (t, 1/2, 1−t). Parameter c = 0.1.

Fig 3. Long-time limits of adaptive dynamics of memory-1 strategies. For a 9 × 9 × 9 × 9 grid of starting points (= 6561 points), we show the ω-limit lim_{t→∞} p(t) of a solution p(t) to Eq. (6). Dynamics are assumed to cease at the boundary of the strategy space. Generically, there are four possibilities, as shown in the legend. For 1835 points, the trajectory p(t) evolves to full cooperation, defined by p_CC = 1 (blue). For 1375 points, the trajectory p(t) evolves to full defection, defined by p_DD = 0 (red). The remaining points either evolve into other regions of the boundary (green) or approach interior critical points, which are equalizers (yellow). The symmetry described in the main text does not manifest in this plot, but reappears when we juxtapose the plot with the corresponding plot for reversed time. Parameter c = 0.1.

Fig 4. Local adaptive dynamics for counting strategies. On a 9 × 9 × 9 × 9 grid representing the space of memory-1 strategies, we depict the 729 points which are counting strategies (defined by p_CD = p_DC). They are colored according to their direction of change in terms of the sign of each component of (q̇_0, q̇_1, q̇_2). Generically, there are eight possibilities, as shown in the legend. We observe that for 156 points all three components are positive (+++), while for 373 points all three components are negative (---). Three combinations do not occur: -+-, -++, and ++-. These are combinations in which q̇_0 or q̇_2 is negative while q̇_1 is positive; such combinations are forbidden. Because of the symmetry derived in the main text there is a symmetric pair, +-- and --+, each occurring 29 times. The configuration +-+ is its own mirror image and therefore a singleton (occurring 142 times). Parameter c = 0.1.

Fig 5. Classification of interior critical points in the space of counting strategies. We show the line of interior critical points in the space of counting strategies for five values of c. The line is colored according to the type of each critical point, which is determined by the eigenvalues of the linearization of the system (13) at this point. We observe all five generic types: source, spiral source, sink, spiral sink, and saddle. The complete classification is shown in the lower right panel. Each interior critical point is an equalizer (see main text). The line is parameterized by (t − c/(1+c), t, t + c/(1+c)) as t ranges over the interval (c/(1+c), 1/(1+c)). The symmetry described in the main text is manifest in this figure: the transformation σ : (x, y, z) ↦ (1−z, 1−y, 1−x) carries the line of critical points to itself. It exchanges sinks and sources, spiral sinks and spiral sources, and saddle points with other saddle points.

Fig 6. Interior and boundary critical points in the space of counting strategies. For four values of c, we show the line of interior critical points (green) and the boundary critical points (black) in the space of counting strategies. The boundary critical points consist of three pieces: the edge defined by q_0 = 0 and q_2 = 1 (i.e. the intersection of full cooperation and full defection) and two separate curves on the faces q_0 = 0 and q_2 = 1. For example, the strategy GRIM = (1, 0, 0) is a boundary critical point. The symmetry described in the main text is visible in the rotational symmetry of the set of critical points.

Fig 7. Long-time limits of adaptive dynamics of counting strategies. On a 9 × 9 × 9 × 9 grid representing the space of memory-1 strategies, we depict the 729 points which are counting strategies (defined by p_CD = p_DC). They are colored according to the long-time limit lim_{t→∞} q(t) of a solution q(t) to Eq. (6), with starting value q(0) in the grid. Dynamics are assumed to cease at the boundary of the strategy space. Generically, there are four possibilities, as shown in the legend. For 190 points the trajectory q(t) evolves to full cooperation, defined by q_2 = 1 (blue). For 140 points the trajectory q(t) evolves to full defection, defined by q_0 = 0 (red). The remaining points either evolve into other regions of the boundary (green) or approach interior critical points, which are equalizers (yellow). This figure is not a simple restriction of Fig 3, because the restriction of Eq. (6) differs from Eq. (13) by a factor of 2. Parameter c = 0.1.

Fig 8. Trajectories of adaptive dynamics of counting strategies. We consider four different initial conditions. We plot the solutions q(t) to Eq. (13) on the left, colored by hue and marked with arrowheads to indicate the direction of evolution in the strategy space. On the right, we plot the cooperation rate C(q(t)), which is a real number between zero (full defection) and one (full cooperation). Each of the initial conditions leads to a different behavior. In the first row, for an initial condition q(0) = (0.8, 1, 1), the cooperation rate decreases monotonically from one to zero. In the second row, for q(0) = (0, 0.85, 0.6833), the cooperation rate increases monotonically from zero to one. In the third row, for q(0) = (0, 0.5, 0.6), the cooperation rate increases from zero to an intermediate value before decreasing and then increasing again to one. Finally, in the last row, for q(0) = (0, 0.75, 0.6667), the cooperation rate increases from zero before oscillating and converging to an intermediate value. The last two orbits loop around the line of interior critical points, shown in black. Parameter c = 0.1.
This result does not require the specific payoffs of the donation game; it is true for all symmetric 2 × 2 games. The result is useful because it allows us to decompose the space of memory-1 strategies into three invariant sets: the set of strategies with p_CD > p_DC, the set with p_CD = p_DC, and the set with p_CD < p_DC.