Abstract
Social hierarchies are often found in group-living animals. The hierarchy position can influence reproductive success (RS), with a skew towards high-ranking individuals. The amount of aggression in social dominance varies greatly, both between species and between males and females within species. Using game theory we study this variation by taking into account the degree to which reproductive competition in a social group is mainly local to the group, emphasising within-group relative RS, or global to a larger population, emphasising an individual’s absolute RS. Our model is similar to recent approaches in that reinforcement learning is used as a behavioural mechanism allowing social-hierarchy formation. We test two hypotheses. The first is that local competition should favour the evolution of mating or foraging interference, and thus of reproductive skew. Second, decreases in reproductive output caused by an individual’s accumulated fighting damage, such as reduced parenting ability, will favour less intense aggression but should have little influence on reproductive skew. From individual-based simulations of the evolution of social dominance and interference, we find support for both hypotheses. We discuss to what extent our results can explain observed sex differences in reproductive skew and social dominance behaviour.
Introduction
In group-living animals, positions in a social hierarchy are often established and maintained through pairwise aggressive interactions. The intensity of this aggression varies greatly, both between species and between males and females within species, with females typically showing less aggression than males [1, 2, 3]. There is also variation in the magnitude of reproductive skew caused by social dominance [4, 5, 3]. A traditional explanation for sex differences in reproductive skew and dominance behaviour is that there is greater scope for male than for female variation in reproductive success (RS), sometimes referred to as Bateman’s principle [1, 2, 3].
We analyse the evolution of reproductive skew and dominance behaviour by investigating the range from local to global competition in a metapopulation of local groups. The reasoning can apply either to males or to females. There is purely local competition if the group reproductive output is independent of reproductive skew, in which case a top-ranked individual in principle should have the capacity to make up a full group output. This individual would then have an incentive to monopolise reproduction in the group. For purely global competition, an individual’s RS should instead be measured against those in the larger population, so that there is little or no incentive to interfere with the reproductive output of other group members. The terms ‘soft selection’ and ‘hard selection’ are sometimes used for such a distinction between local and global competition [6]. For the evolution of reproductive skew and dominance behaviour, an important difference between local and global competition might then be that local competition favours mating or foraging interference (henceforth, interference). Interference can increase an individual’s relative RS in a group, whereas global competition only favours a high absolute RS. Concerning the evolution of sex differences in reproductive skew and dominance behaviour, evolution in males might be closer to local competition and in females to global competition, although with many intermediates between the extremes.
Apart from the scale of competition, various reproductive consequences of contest damage are likely to influence the evolution of dominance behaviour. Mortality from contest damage can decrease or eliminate reproduction and is one such effect. A reduced phenotypic quality that lowers parenting success could be a more widespread example, in particular in females [1, 2, 3]. Our aim is to elucidate the combined influence of the scale of competition (from local to global) and the costs of contest damage (in particular reduced parenting ability) on the evolution of dominance behaviour, interference, and reproductive skew. Our hypotheses are that interference is favoured by local competition and has a strong influence on the evolution of reproductive skew, and that other reproductive consequences of contest damage will influence the intensity of aggression, but will have a weaker effect on reproductive skew. As an alternative, we also consider situations where interference is very costly for dominants to perform and/or less effective in reducing subordinate reproduction, which should lead to less skew and less fighting. By investigating these hypotheses we aim to explain important aspects of sex-differences in reproductive skew and dominance behaviour. How local vs. global competition might influence the evolution of interference in social dominance has not previously been studied using game theory.
Our game-theory model is similar to previous approaches in using learning about differences in fighting ability as an evolving behavioural mechanism that can give rise to within-sex dominance hierarchies [7, 8, 9]. In addition, we introduce the strength of interference by dominants in subordinate reproduction as a trait that can evolve. The effect of variation in this trait spans from no interference to dominants nearly eliminating subordinate reproduction. For the case of no interference, we assume that a social hierarchy imposes a baseline level of reproductive skew, arising from such things as differences in the qualities of display arenas on a lek or of breeding sites, or possibly female preferences for males of different ranks. Interference can increase reproductive skew above this baseline.
In the following we outline the model elements and present results from individual-based evolutionary simulations. The genetically determined traits in the model are the components of a reinforcement learning mechanism, as used previously [8, 9], together with the strength of interference. We examine the evolution of these traits in one of the sexes, which could be either males or females. We then discuss to what extent our results provide a qualitative explanation of between-sex differences, and also if the factors we identify can throw light on within-sex species differences in reproductive skew and dominance behaviour. Finally, our analysis uses game theory to address the general question of why there is variation in the intensity of aggression, which was raised by Maynard Smith and Price [10] in their seminal contribution to game theory in biology, and we end by discussing sex differences in dominance behaviour from this perspective.
The model
Our model here is an extension of previous models [8, 9], by adding variation in the degree of local competition and introducing interference as a trait. In previous models, competition was local, in the sense that dominance behaviour did not influence the total group reproductive output, but there was no interference, such that an individual’s RS was assumed to be directly determined by the dominance position it achieved. This means that in previous models, the amount of reproductive skew from social dominance was a model assumption and not a consequence of trait evolution. Here we introduce interference as a separate trait that influences an individual’s relative RS and study the co-evolution of this trait with other traits that determine the formation of a social hierarchy. Previously, we examined two types of costs of fighting damage, a decrease in the effective fighting ability and a risk of mortality from damage [9]. Here we add another cost of fighting, viz. a decreased parenting ability from fighting damage.
The elements of our model are outlined in Fig. 1. First a hierarchy is formed through aggressive interactions, then there is a risk of mortality from fighting damage, followed by reproduction and interference. Interference is a trait (κ) that measures how strongly an individual of a given dominance rank acts to reduce the reproduction of those of lower rank. Interference causes a proportional reduction in acquired resources (AR), i.e. the contested resources for reproduction acquired by a subordinate (Fig. 1b). Interference is costly to perform (Fig. 1b), and we assume that effects caused by different individuals interact multiplicatively. For both local and global competition, interference has the effect of increasing reproductive skew, but for global competition there is the additional effect of reducing the total group reproductive output (Fig. 1c). In addition to interference, we assume that accumulated contest damage can cause a proportional reduction in the individual’s parenting ability (Fig. 1d).
To avoid the possibility that strong interference causes some individuals to be entirely without reproductive prospects, we assume that there is a small probability of ‘outside-option’ reproduction. The effect of this can be seen in Fig. 1c, where the curves labelled L3 and G3 come fairly close to, but do not reach zero for the bottom dominance positions (high values of k).
Many of the details of the model are the same as in previous models [8, 9], in particular the traits of the reinforcement-learning mechanism, but for completeness a full description is given in Supporting Information (SI), including a table of notation and definitions for the model (Table S1).
Evolutionary simulations
Individuals are assumed to have genetically determined traits. The evolution of the traits is studied in individual-based simulations (Table 1). The traits for individual i include the strength of interference, κi, and a number of traits of the reinforcement-learning mechanism. Of these, the degree of generalisation, fi, expresses how strongly an individual generalises learning about one opponent to other opponents, which is important for winner-loser effects. There are also the preference and value learning rates, αθi, αwi; the bystander learning rate, βi; the initial preference for the aggressive action A, θ0i; and the initial estimated value of a round, w0i. These are basic reinforcement-learning traits. Finally, the effect of observations on preference and value functions, γ0i, g0i, and the perceived reward from performing A, vi, are assumed to be genetically determined traits. See Table S1 and the SI text for further explanation.
In evolutionary simulations, each trait is determined by an unlinked diploid locus with additive alleles. Alleles mutate with a probability of 0.002 per generation, with normally distributed mutational increments. The standard deviation of mutational increments for each trait was adjusted to ensure that simulations could locate evolutionary equilibria (as seen in Table 1, the evolved traits vary in scale, and mutational increments need to reflect the scale of trait variation).
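As a rough illustration of these genetic assumptions, the following Python sketch applies mutation to an array of allelic values and computes a trait value from a diploid locus. Only the mutation probability of 0.002 and the use of additive alleles with normally distributed increments come from the text. The function names, the array layout, the mutational standard deviation passed by the caller, and the choice to take the trait as the sum of the two alleles are assumptions made for illustration.

```python
import numpy as np

MUTATION_PROB = 0.002  # per-allele mutation probability per generation (from the text)

def mutate(alleles, mut_sd, rng):
    """Add a normal increment (mean 0, SD mut_sd) to each allele with probability MUTATION_PROB.

    mut_sd is trait-specific; its value is an assumption supplied by the caller.
    """
    alleles = np.asarray(alleles, dtype=float)
    hit = rng.random(alleles.shape) < MUTATION_PROB        # which alleles mutate
    increments = rng.normal(0.0, mut_sd, size=alleles.shape)
    return alleles + hit * increments

def trait_value(maternal_allele, paternal_allele):
    """Additive diploid locus, here taken as the sum of the two allelic values (an assumption)."""
    return maternal_allele + paternal_allele

rng = np.random.default_rng(1)
mutated = mutate(np.zeros(100_000), mut_sd=0.1, rng=rng)
fraction_mutated = float(np.mean(mutated != 0.0))
```

With 100,000 alleles the mutated fraction should land close to 0.002, which is a quick check that the per-allele probability is applied as intended.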
A simulated population consisted of 500 groups of 8 individuals taking part in dominance interactions (either males or females), plus 8 individuals of the other sex, resulting in a total population size of N = 8000. Each interacting individual was assigned a quality qi, independently drawn from a normal distribution with mean zero and standard deviation σq. As a simplification we assume that all offspring disperse globally over all groups, to form the adults of the next generation. For each case reported in Table 1, simulations were performed over intervals of 5000 generations, repeated at least 100 times, to estimate mean and standard deviation of traits at an evolutionary equilibrium.
Standard parameter values
The following ‘standard values’ of parameters (Table S1) were used: proportion of local competition, λ = 0.0, λ = 0.5, or λ = 1.0; probability of outside-option reproduction, Q = 0.1; distribution of individual quality, σq = 0.50; damage cost parameters, c0 = 0.02, c1 = 0.0004, c2 = 0.00 or c2 = 0.04; interference parameters b0 = 0.1, b1 = 0.5, φ = 0.95; observations of relative quality, a0 = 0.707, σ = 0.50; perceived penalty variation, σp = 0.25. For these parameters, around 50% of the variation in the observations by individuals in a round is due to variation in relative fighting ability, qi − qj.
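The statement about 50% of the observation variance can be checked with a short calculation. The sketch below assumes that an observation has the linear form a0(qi − qj) plus an independent normal error with SD σ, and that qi and qj are drawn independently with SD σq, so that the variance of qi − qj is 2σq². The function name is hypothetical.

```python
def signal_share(a0, sigma_q, sigma):
    """Share of observation variance due to q_i - q_j, assuming xi = a0*(q_i - q_j) + error."""
    signal_var = a0**2 * 2.0 * sigma_q**2   # Var[a0*(q_i - q_j)] for independent q_i, q_j
    noise_var = sigma**2                    # variance of the observation error
    return signal_var / (signal_var + noise_var)

# Standard parameter values from the text
share = signal_share(a0=0.707, sigma_q=0.50, sigma=0.50)
```

Under these assumptions, a0² ≈ 0.5 makes the signal variance approximately equal to the error variance of 0.25, so the share is very close to one half.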
Results
The trait values that evolved in our individual-based simulations are shown in Table 1, for different degrees of local competition and absence vs. presence of a cost of decreased parenting ability from damage. The strength of interference (κi) for the three different degrees of local competition (λ = 0.0, 0.5, 1.0) corresponds approximately to the three values illustrated in Fig. 1b, with greater interference for higher degree of local competition (Table 1). In contrast, the evolved interference traits were similar for absence vs. presence of a cost of decreased parenting ability (c2 = 0.00 vs. c2 = 0.04, Table 1). These results are in accordance with our hypotheses.
Figure 2 shows different aspects of the outcome of dominance and interference interactions for the cases in Table 1. The degree of local competition (λ) had a strong effect on the distribution of RS over ranks (Fig. 2a) and thus on reproductive skew (Fig. 2c), with higher skew when competition is more local, whereas the absence vs. presence of a cost of decreased parenting ability only weakly influenced these measures (Fig. 2a, c). In contrast, both more local competition and the absence of a parenting-ability cost of damage lead to higher contest damage (Fig. 2b). These results support the hypotheses we set out to test. In addition, contest damage tended to be higher for lower-ranked individuals, in particular without a cost of decreased parenting ability (Fig. 2b).
A comparison of the total group AR contributed by the competing sex for the different cases in Table 1 appears in Fig. 2d. In the cases with full local competition (λ = 1), interference strongly decreased the group AR. To interpret this, one can note that for full local competition, members of the competing sex (e.g., males) only contribute matings, but no additional resources to offspring. These instead come from the other sex (e.g., females). The sharp decrease in AR with full local competition thus only means that interference prevents most members of the competing sex from achieving contested matings. The interpretation of the cases with intermediate degree of local competition (λ = 0.5), and intermediate strength of interference, could instead be that a substantial part of the AR contributed by the competing sex is not subject to interference (e.g., nesting sites), but that interference can exclude individuals from other substantial parts (e.g., foraging areas). With global competition (λ = 0.0), there is very little interference and the only noticeable decrease in group AR comes from a small reduction in parenting ability from fighting damage (Fig. 1d).
The amount of fighting for the different ranks, and for local vs. global competition and absence/presence of a parenting-ability cost of damage is shown in Fig. 3, with examples of single groups in Fig. S1. There is much variation in the number of fighting rounds between individuals in different groups, but the tendency is that intermediate ranks fight the most. The tendency for the lowest ranks to fight less is an example of an ‘opt-out loser effect’, which we studied previously [9]. Part of the variation in fighting is that some pairs of individuals did not fight at all (Fig. S2). This was most common for global competition with a parenting-ability cost, and tended to occur when one of the individuals had low rank and the opponent had a higher rank (Fig. S2).
The overall amount of fighting also varied substantially between the cases, being around 10 times lower for global competition with a parenting-ability cost than for local competition without parenting-ability cost (Fig. 3b vs. 3c).
We also investigated the evolutionary consequences of a substantially higher probability of outside-option reproduction (Q = 0.5, Table S2, Figs. S3, S4, S5). This did not strongly influence interference (see κi in Tables 1 and S2), but changed the learning traits in such a way that fights became shorter and less damaging (Fig. S3c and Figs. S4, S5). Because there was more outside-option reproduction, reproductive skew was reduced (Fig. S3a, c). In the absence of a parenting-ability cost, the lower ranks still tended to accumulate higher damage than the top ranks (Fig. S3b), but with full local competition and decreased parenting ability from damage the lowest ranks showed noticeably less fighting than the top ranks (Fig. S4c).
Finally, we examined the evolutionary consequences of a substantially more costly and less effective interference. As an example, for full local competition (λ = 1.0) and changing the interference parameters such that the lower curve in Fig. 1b gives the effect on self and the upper curve the effect on a subordinate, the outcome was that the interference traits evolved to near zero, leading to a baseline reproductive skew, and less fighting damage (Fig. S6).
Discussion
Our evolutionary analysis showed that more intense local (within-group) competition favours stronger mating and/or foraging interference by dominants (Table 1), reducing the reproductive success of subordinates and increasing the reproductive skew (Fig. 2). We also found that costs in the form of reduced parenting ability from contest damage can sharply reduce fighting and the damage from fighting (Figs. 2, 3). These factors, separately or acting together, have effects that are large enough to potentially explain observed sex differences in social dominance behaviour. We further examined how fighting and damage varied between dominance positions, finding that individuals of intermediate or low ranks fought most and suffered the most contest damage (Figs. 2, 3).
The results on interference were achieved by introducing an evolving interference trait into the model. This is a new element compared to previous models of social-hierarchy formation that are similar to our current model in using reinforcement learning as a behavioural mechanism [8, 9]. Interference might correspond to different types of behaviours, ranging from dominant males attacking and chasing subordinate males to prevent them from mating, to dominant females excluding subordinate females from foraging areas. Among the examples are males of Alpine ibex, for which a dominance hierarchy is established before the start of the mating season [11, 12], and dominant, lactating olive baboon females excluding subordinate females from foraging through aggression [13]. The related idea of interference competition is much used in ecology, where it is applied to interactions both within and between species [14, 15] and can involve dominance interactions [16].
Interference, as used in our model, is related to punishment in animal societies, for which social dominance is a major example [17]. One idea is that punishment serves to deter cheating and promote cooperation, and another, contrasting idea is that it serves to change the relative RS in a group [18, 19]. In our model, interference plays the latter role, and can be costly by reducing the RS of the interfering individual. Interference resembles the concept of spite as used in theoretical studies on kin selection, with the conclusion that local competition favours spite [20]. Nevertheless, because we assume interacting individuals to be unrelated, interference in our model is not spite in the kin-selection sense [21].
In our model there are two contributions to reproductive skew: the ‘starting’ distribution of acquired resources (AR) over dominance ranks, which is the distribution for zero interference (κ = 0), and the change from this due to interference by dominants. Either of these contributions can vary between situations, giving rise to many possibilities. In addition, the cost and effectiveness of interference can vary, and if the cost becomes too high or effectiveness too low, interference will not evolve (see Fig. S6). This could, for instance, correspond to situations where the synchrony of receptivity of females in a group makes it difficult or infeasible for high-ranking males to monopolise matings, as has been found in primate species [22].
To determine whether our model results explain observed sex-differences in social-dominance behaviour, one would need data on the scales of male and female competition in different species, as well as data on the influence of fighting damage on parenting ability, or similar effects of disturbance from fighting. Although these possible explanations have been put forward [1, 2], there seems to be a lack of quantitative estimates. Still, studies on female-female competition through interference in social groups support the general idea that interference is stronger when it can increase the RS of a dominant individual [23, 24, 13, 25].
Concerning the scale of competition, so-called female reproductive dominance is often assumed in life-history modelling (e.g., [26]). In species living in social groups, this would imply that male-male competition is mainly local, but there seem to be no studies directly investigating it. The concepts of hard and soft selection (the terminology is from [27]) are much used in studies of metapopulations [6] and correspond to global vs. local competition, but again there is little in the form of empirical estimates of these forms of population regulation. In the context of population management and conservation, data on culling, sterilization, or harvesting of either males or females are often studied (e.g., [28, 29, 30]), which in principle could allow estimates of the scale of competition, but up to now such data have not been used for this purpose.
One way to assess costs of dominance interactions on parenting ability is to examine genetic correlations between the corresponding traits, because such correlations could indicate an evolutionary trade-off. Evidence for such a trade-off has been found in cows [31], with lower fertility and milk production in individuals more adapted for fighting. This complements the many observations consistent with decreased parenting ability from fighting [1, 2].
Hormonal manipulation is another way to examine such a trade-off, and this has been performed on cleaner fish in the wild [32]. Cleaner fish (Labroides dimidiatus) live on coral reefs. They forage by removing ectoparasites from other fish, so-called clients [33], and are organised into dominance hierarchies of females and a top-ranking male that has undergone sex reversal [34]. The study [32] found that testosterone-injected females increased their aggression towards subordinate females and spent less time interacting with clients, which supports our model assumptions. In general, cleaner fish could be an example where female competition is relatively close to global, because they forage on non-monopolisable resources (clients) and reproduce through pelagic eggs.
There seems to be a lack of data on damage/fighting as a function of rank in social animals, but our result that high ranks suffer less damage has at least qualitative support from studies on health and longevity as a function of rank in social mammals [35]. Still, it has been argued [36] that improved health and longevity for high ranks typically applies to females, whereas high-ranking males might suffer greater costs. So far there are no theoretical analyses explaining this potential sex difference. There are a number of previous models of reproductive skew, which have a focus on cooperative breeding among related individuals [37, 38, 39, 40, 41, 42], but these models could also apply to the situations we study here. So-called ‘tug-of-war’ models show some similarity to our approach, in allowing for different individual investments into conflict, but there are several notable differences. Previous reproductive skew models have not specifically dealt with either global vs. local competition, concrete mechanisms of hierarchy formation and interference, or allowed for more than two group members. Furthermore, our model does not make use of concepts like concessions, negotiations, or threats as possible explanations for sex-differences in social dominance behaviour [42], but instead uses reinforcement learning as a mechanism that allows hierarchy formation. Our model, as well as our previous model [9], makes relatively detailed assumptions and is therefore more complex than previous reproductive skew models, but it has the advantage of a somewhat closer match to field situations. This match might be further improved by incorporating more elements, such as age structure, relatedness between interacting group members, types of dispersal, and explicit modelling of outside-option reproduction.
Game theory in biology started 50 years ago as an attempt to explain why animal contests are often settled without serious injury [43, 10]. The originally proposed explanation, that threats of escalation and retaliation limit aggression at an evolutionary equilibrium [10, 44], has not stood the test of time, whereas the idea that assessment permits settlement with no or little fighting [45] is better supported. Sex differences in aggression and fighting in social animals might in fact be one of the main areas where the general question can be explored, adding to the interest in the problem. Our analysis here suggests that the life-history context, including such things as the scale of competition and reproductive opportunities outside of a given contest, is the main factor explaining how costly contests become. A combination of ambitious empirical work, including comparative studies, and theoretical modelling might throw further light on the issue.
Supporting information
Additional tables
Model details
Brief descriptions of model elements appear in Table S1. Several elements of the model are similar to those in previous models [8, 9]. These are the observations in a round, including assumptions about individual recognition, the actions A and S, the action preferences and estimated values, the implementation of action exploration, and the learning updates, including bystander learning.
Life-history, competition and reproductive season
There is an annual life cycle with a single reproductive season (Fig. 1a). Dominance interactions occur in groups with gs members of the competing sex (gs = 8 for the individual-based simulations in Table 1). The season starts with a sequence of contests. Each contest is between a randomly selected pair of group members and there are 5gs(gs − 1) contests, i.e., on average 10 contests per pair, but as soon as dominance has been established for a pair, there is no further fighting in that pair. An individual’s survival from the contests to reproduction depends on its accumulated fighting damage. As a result of the contests, a dominance hierarchy is formed, and surviving group members obtain contested reproductive success (RS) according to their acquired resources (AR), in a way that is influenced by the proportion λ of local competition. The interpretation of λ is that for pure local competition (λ = 1) each group has the same expected total contested RS, whereas for pure global competition (λ = 0) the entire population of parents produce sufficient offspring for the next generation and each competing parent’s expected contested RS is proportional to its AR. For intermediate competition (0 < λ < 1), the expected total contested RS of a group is proportional to λARtot/G + (1 − λ)ARloc, where ARtot is the total AR in the population, G is the number of groups, and ARloc is the total AR of the group in question. Each competing individual in the group contributes to this group total in proportion to its AR. Individuals of the non-competing sex have equal chances of contributing to reproduction.
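The contest count and the role of λ can be illustrated with a small Python sketch. It computes the number of contests per season, the relative expected contested RS of each group from the expression λARtot/G + (1 − λ)ARloc, and the split within a group in proportion to AR. The function names and the example AR values are hypothetical, and the weights are only relative (the text states proportionality, not absolute offspring numbers).

```python
def n_contests(gs):
    """Number of contests per group and season, 5*gs*(gs - 1)."""
    return 5 * gs * (gs - 1)

def n_pairs(gs):
    """Number of distinct pairs in a group of gs members."""
    return gs * (gs - 1) // 2

def group_weights(ar_by_group, lam):
    """Relative expected total contested RS per group: lam*ARtot/G + (1 - lam)*ARloc."""
    ar_loc = [sum(group) for group in ar_by_group]
    ar_tot = sum(ar_loc)
    n_groups = len(ar_by_group)
    return [lam * ar_tot / n_groups + (1.0 - lam) * loc for loc in ar_loc]

def individual_weights(ar_by_group, lam):
    """Split each group's weight among its members in proportion to their AR."""
    result = []
    for group, weight in zip(ar_by_group, group_weights(ar_by_group, lam)):
        loc = sum(group)
        result.append([weight * ar / loc if loc > 0 else 0.0 for ar in group])
    return result

# Two hypothetical groups of two competitors each
example = [[4.0, 2.0], [1.0, 1.0]]
local_w = group_weights(example, lam=1.0)    # equal across groups
global_w = group_weights(example, lam=0.0)   # proportional to each group's AR
mixed_w = group_weights(example, lam=0.5)
```

For gs = 8 this gives 280 contests over 28 pairs, i.e. 10 per pair on average, as stated in the text. With λ = 1 the two example groups receive equal weight regardless of their AR, whereas with λ = 0 each individual's weight equals its own AR.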
In addition to contested RS, the model implements an ‘outside option’ of non-contested RS. With a probability Q (with Q = 0.1 and Q = 0.5 for the simulations in Table 1 and Table S2, respectively) the parents of an offspring are randomly selected among all the living adults, without dependence on AR, and with probability 1 − Q the parents are selected as described above for contested RS. A parent of the competing sex needs to survive to contribute to outside-option reproduction, and for cases with non-zero parenting cost of damage (c2 > 0) the outside-option contribution is proportional to the parenting ability. Thus, the effects depicted in Fig. 1d are taken into account in outside-option RS. The reason for having outside-option RS is to avoid the extreme case of certain individuals having no reproductive future, which could otherwise happen given our assumption of an annual life-cycle and the possibility of strong interference by dominants in subordinate reproduction. For instance, for the cases labelled L3 and G3 in Fig. 1c, the expected RS is close to 0.2 (i.e., 2Q) for the bottom rank (k = 8), whereas it would have been close to zero without an outside option (see also Fig. S3).
We assume that offspring disperse randomly over the population. More details on hierarchy formation and reproduction appear below.
Observations and actions
The model simplifies a round of interaction into two stages. In the first stage, interacting individuals make an observation. Thus, individuals observe some aspect ξ of relative fighting ability and also observe the opponent’s identity. The observation by an individual is statistically related to the difference in fighting ability between itself and the opponent, qi − qj. For the interaction between individuals i and j at time t, the observation is
\[ \xi_{ijt} = a_0 (q_i - q_j) + \varepsilon_{ijt}, \tag{S1} \]
where a0 > 0 and εijt is an error of observation, assumed to be normal with mean zero and SD σ. Note here that the observations ξijt refer to the original fighting abilities qi and qj, and not the effective fighting abilities (see below).
By adjusting the parameters σq, which is the SD of the distribution of qi, and a0 and σ from equation (S1), one can make the information about relative quality more or less accurate. The observation (ξij, j) is followed by a second stage, where individual i chooses an action, and similarly for individual j. The model simplifies to only two actions, A and S, corresponding to aggressive and submissive behaviour.
Action preferences and estimated values
For an individual i interacting with j at time t, lijt denotes the preference for A. The probability that i uses A is then
\[ p_{ijt} = \frac{1}{1 + \exp(-l_{ijt})}, \tag{S2} \]
so that the preference lijt is the logit of the probability of using A. The model uses a linear (intercept and slope) representation of the effect of ξijt on the preference, and expresses lijt as the sum of three components
\[ l_{ijt} = h_{iit} + h_{ijt} + \gamma_{0i} \xi_{ijt} = f_i \theta_{iit} + (1 - f_i) \theta_{ijt} + \gamma_{0i} \xi_{ijt}. \tag{S3} \]
Here hiit = fiθiit is a contribution from generalisation of learning from all interactions, hijt = (1 − fi)θijt is a contribution specifically from learning from interactions with a particular opponent j, and γ0iξijt is a contribution from the current observation of relative fighting ability. Note that for fi = 0 the learning about each opponent is a separate thing, with no generalisation between opponents, and for fi = 1 the intercept component of the action preference is the same for all opponents, so that effectively there is no individual recognition (although the observations ξijt could still differ between opponents). One can similarly write the estimated value of an interaction as a sum of three components:
\[ \hat{v}_{ijt} = f_i w_{iit} + (1 - f_i) w_{ijt} + g_{0i} \xi_{ijt}. \tag{S4} \]
The actor-critic method updates θiit, θijt, wiit, and wijt in these expressions based on perceived rewards, whereas fi, γ0i, and g0i are genetically determined.
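To make the structure of the preference concrete, here is a minimal Python sketch. It assumes the three components simply add, with weights fi and 1 − fi on the generalised and opponent-specific parameters as described in the text, and that the probability of A is the inverse logit of the preference. Function and argument names are hypothetical.

```python
import math

def preference(f, theta_general, theta_opponent, gamma0, xi):
    """Preference for A: generalised part + opponent-specific part + observation part."""
    return f * theta_general + (1.0 - f) * theta_opponent + gamma0 * xi

def prob_aggressive(l):
    """Inverse logit of the preference, written to avoid overflow for large |l|."""
    if l >= 0:
        return 1.0 / (1.0 + math.exp(-l))
    e = math.exp(l)
    return e / (1.0 + e)

# f = 0: only the opponent-specific parameter matters (no generalisation)
l_specific = preference(f=0.0, theta_general=1.0, theta_opponent=2.0, gamma0=0.5, xi=0.0)
# f = 1: only the generalised parameter matters (no individual recognition in the intercept)
l_general = preference(f=1.0, theta_general=1.0, theta_opponent=2.0, gamma0=0.5, xi=0.0)
p_neutral = prob_aggressive(0.0)
```

The estimated value of an interaction has the same additive form, with wiit, wijt and g0i in place of θiit, θijt and γ0i, so the same `preference` function could be reused for it.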
Exploration in learning
For learning to be efficient over longer time spans there must be exploration (variation in actions), in order to discover beneficial actions. Learning algorithms, including the actor-critic method, might not provide sufficient exploration [46], because learning tends to respond to short-term rewards. In the model, exploration is implemented as follows: if the probability in equation (S2) is less than 0.01 or greater than 0.99, the actual choice probability is assumed to stay within these limits, i.e. is 0.01 or 0.99, respectively. In principle the degree of exploration could be genetically determined and evolve to an optimum value, but for simplicity this is not implemented in the model.
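The exploration rule amounts to clamping the choice probability to the interval [0.01, 0.99]. A minimal Python sketch follows; the function names are hypothetical, and the use of a uniform random draw to pick the action is an assumption about implementation rather than something stated in the text.

```python
import random

def clip_probability(p, lower=0.01, upper=0.99):
    """Keep the probability of choosing A within [lower, upper] to guarantee exploration."""
    return min(max(p, lower), upper)

def choose_action(p, rng=random):
    """Return 'A' with the clipped probability, otherwise 'S'."""
    return "A" if rng.random() < clip_probability(p) else "S"

clipped_low = clip_probability(0.001)
clipped_mid = clip_probability(0.5)
clipped_high = clip_probability(0.9999)
```

The clamp means that even an individual with a very strong learned preference still takes the other action in about 1% of rounds.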
Fighting damage and effective fighting ability
A group member i accumulates damage Dit from fighting. Dit refers to accumulated damage up to (but not including) round t. As a consequence of the damage, the individual’s effective fighting ability is reduced from the original qi to where c0 is a parameter. Following an AA round between i and j, there is an increment to Dit: and similarly for j. The effective fighting abilities also determine the perceived costs (see below), and in this way they influence the learning.
Perceived rewards
An SS interaction is assumed to have zero rewards, Rijt = Rjit = 0. For an AS interaction, the aggressive individual i perceives a reward Rijt = vi, which is genetically determined and can evolve. The perceived reward for the submissive individual j is zero, Rjit = 0, and vice versa for SA interactions. If both individuals use A, some form of costly interaction or fight occurs, with perceived costs (negative rewards or penalties) that are influenced by the effective fighting abilities of the two individuals. The perceived rewards of an AA interaction are assumed to be where eijt is a normally distributed random influence on the perceived penalty, with mean zero and standard deviation σp, and similarly for ejit.
Learning updates
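In actor-critic learning, an individual updates its learning parameters based on the prediction error (TD error)

$$\delta_{ijt} = R_{ijt} - \hat{v}_{ijt}, \tag{S8}$$

which is the difference between the actual perceived reward Rijt and the estimated value v̂ijt from equation (S4). The learning updates for the θ parameters are given by

$$\theta_{ii,t+1} = \theta_{iit} + \alpha_{\theta i}\,f_i\,\zeta_{ijt}\,\delta_{ijt}, \qquad \theta_{ij,t+1} = \theta_{ijt} + \alpha_{\theta i}\,(1 - f_i)\,\zeta_{ijt}\,\delta_{ijt}, \tag{S9}$$

where

$$\zeta_{ijt} = \begin{cases} 1 - p_{ijt} & \text{if } i \text{ chooses A} \\ -p_{ijt} & \text{if } i \text{ chooses S} \end{cases} \tag{S10}$$

is referred to as a policy-gradient factor and αθi is the preference learning rate for individual i. Note that ζijt will be small if pijt is close to one and individual i performed action A, which slows down learning, with a corresponding slowing down if pijt is close to zero and S is chosen. There are also learning updates for the w parameters given by

$$w_{ii,t+1} = w_{iit} + \alpha_{wi}\,f_i\,\delta_{ijt}, \qquad w_{ij,t+1} = w_{ijt} + \alpha_{wi}\,(1 - f_i)\,\delta_{ijt}, \tag{S11}$$

where αwi is the value learning rate for individual i.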
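The updates to the policy parameters θ can be described using derivatives of the logarithm of the probability of choosing an action with respect to the parameters. Using equation (S2), we obtain

$$\frac{\partial \log \Pr(\text{action})}{\partial l_{ijt}} = \begin{cases} 1 - p_{ijt} & \text{for action A} \\ -p_{ijt} & \text{for action S} \end{cases}$$

for the derivative of the logarithm of the probability of choosing an action, A or S, with respect to the preference for A, which corresponds to equation (S10). From equation (S3) it follows that

$$\frac{\partial l_{ijt}}{\partial \theta_{iit}} = f_i, \qquad \frac{\partial l_{ijt}}{\partial \theta_{ijt}} = 1 - f_i,$$

and this gives the learning updates of the θ parameters in equation (S9). The updates of the w parameters of the value function can also be described using derivatives. From equation (S4) it follows that

$$\frac{\partial \hat{v}_{ijt}}{\partial w_{iit}} = f_i, \qquad \frac{\partial \hat{v}_{ijt}}{\partial w_{ijt}} = 1 - f_i,$$

and this gives the learning updates of the w parameters in equation (S11).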
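One learning step for individual i against opponent j can be sketched in Python as follows. The explicit update forms used here are an assumption. They follow from the policy-gradient description, with derivatives fi and 1 − fi for the generalised and opponent-specific parameters, and should not be read as the authors' exact implementation. All names are hypothetical.

```python
import math


def actor_critic_update(theta_ii, theta_ij, w_ii, w_ij, *, f, gamma0, g0, xi,
                        chose_A, reward, alpha_theta, alpha_w):
    """Return updated (theta_ii, theta_ij, w_ii, w_ij) after one round."""
    # Actor: preference for A and the resulting choice probability.
    l = f * theta_ii + (1.0 - f) * theta_ij + gamma0 * xi
    p = 1.0 / (1.0 + math.exp(-l))
    # Critic: estimated value of the interaction, and the TD error.
    v_hat = f * w_ii + (1.0 - f) * w_ij + g0 * xi
    delta = reward - v_hat
    # Policy-gradient factor: derivative of log Pr(chosen action) w.r.t. l.
    zeta = (1.0 - p) if chose_A else -p
    # Chain rule: dl/dtheta_ii = f and dl/dtheta_ij = 1 - f (assumed update form).
    theta_ii += alpha_theta * f * zeta * delta
    theta_ij += alpha_theta * (1.0 - f) * zeta * delta
    # Value updates use dv/dw_ii = f and dv/dw_ij = 1 - f (assumed update form).
    w_ii += alpha_w * f * delta
    w_ij += alpha_w * (1.0 - f) * delta
    return theta_ii, theta_ij, w_ii, w_ij
```

In this sketch a positive TD error after choosing A raises the preference for A, while the same error after choosing S lowers it, since the policy-gradient factor changes sign with the chosen action.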
Bystander updates
As in previous models [8, 9], bystander effects are modelled as observational learning. When there is a dominance interaction in a group, individuals other than the interacting pair i and j, for instance an individual k, can use the outcome to update the learning parameters. Assume that individual k only performs this updating if i and j end their interaction by using AS or SA (because there is no clear ‘winner’ in AA and SS interactions, and bystanders do not perceive the costs of AA interactions). The probabilities for individuals i and j to use A are pijt and pjit, from equation (S2). These are ‘true’ values and are not known by individual k. However, given that the outcome is either AS or SA, one readily derives that the logit of the probability that it is AS is

$$\operatorname{logit} \Pr(\text{AS} \mid \text{AS or SA}) = l_{ijt} - l_{jit}.$$

From equation (S3) one can see that this involves various learning parameters for i and j. For bystander learning an assumption is needed about how an individual k represents this logit. A simple assumption is that k represents the logit as which entails that k does not use any information about qi or qj. The assumption is reasonable in that a large θkit means that k behaves as if individual i is weak, and similarly for θkjt. Using the notation one possibility for the bystander updates by k is if i wins (outcome is AS) and if j wins (outcome is SA). The parameter βk is a measure of how salient or significant a bystander observation is for individual k, and this parameter is assumed to be genetically determined and can evolve. These bystander updates are similar to the direct-learning updates of the actor component of the actor-critic model and were used previously [8].
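The conditional logit for an AS outcome can be checked numerically. The sketch below assumes that i and j choose their actions independently, so that AS occurs with probability pijt(1 − pjit) and SA with probability (1 − pijt)pjit. Under that assumption the logit reduces to the difference between the two logits, that is, between the two preferences for A. Function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_as_given_decisive(p_ij, p_ji):
    """Logit of Pr(AS | outcome is AS or SA), assuming independent choices."""
    p_as = p_ij * (1.0 - p_ji)   # i aggressive, j submissive
    p_sa = (1.0 - p_ij) * p_ji   # i submissive, j aggressive
    return logit(p_as / (p_as + p_sa))
```

For any probabilities strictly between 0 and 1, `logit_as_given_decisive(p_ij, p_ji)` agrees with `logit(p_ij) - logit(p_ji)` up to rounding error.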
There is also the possibility that the salience for a bystander of a contest outcome is influenced by additional information the bystander might have, either from current or from previous observations. For instance, if the observations in equation (S1) are about relative size, a bystander might have estimates ξki and ξkj of its own size in relation to the contestants i and j. Instead of the bystander updates above we might then have if i wins (outcome is AS) and if j wins (outcome is SA). These updates entail that a bystander k pays particular attention to wins by an individual perceived to be bigger than itself, and to losses by an individual perceived to be smaller. These updates were used previously [9] and we also use them in our simulations here.
Contests
If a dominance relation has already been established between contestants i and j, there is no interaction. If not, the contestants go through a number of rounds, at minimum 10 rounds and at maximum 200 rounds of interaction. If there are 5 successive rounds where i uses A and j uses S (5 AS rounds), the contest ends and i is considered dominant over j, and vice versa if there are 5 successive SA rounds. Further, the contest ends in a draw if there are 5 successive SS rounds.
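The stopping rule for a contest can be sketched as a loop over rounds. Two points are assumptions rather than statements from the text: that the streak criteria only end the contest once the minimum number of rounds has been reached, and that a contest reaching the maximum number of rounds without a streak is treated as unresolved. The callable `play_round` is a hypothetical stand-in for one round of action choice and learning, returning "AS", "SA", "SS" or "AA" from the perspective of i.

```python
def run_contest(play_round, min_rounds=10, max_rounds=200, streak_needed=5):
    """Return (result, rounds_played) for a contest between i and j."""
    last_outcome = None
    streak = 0
    for n in range(1, max_rounds + 1):
        outcome = play_round()
        # Count how many successive rounds have had this same outcome.
        streak = streak + 1 if outcome == last_outcome else 1
        last_outcome = outcome
        # Assumption: the contest can only end after min_rounds rounds.
        if n >= min_rounds and streak >= streak_needed:
            if outcome == "AS":
                return "i_dominant", n
            if outcome == "SA":
                return "j_dominant", n
            if outcome == "SS":
                return "draw", n
    # Assumption: no streak within max_rounds leaves the relation unresolved.
    return "unresolved", max_rounds
```

A different reading of the minimum (for instance, counting streaks only from round 10 onwards) would change only the condition inside the loop.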
Details of dominance ranking
The ranking is among surviving individuals and is based on how many other group members an individual dominates (this measure has been referred to as a score structure [47]). If some individuals dominate the same number of other group members, their relative rank is randomly determined (this happened occasionally in our simulations). As an extreme example, if all individuals used action S in the contests, there would be no real dominance hierarchy, because each would dominate zero other group members, and all ranks would be randomly determined (this never happened in our simulations).
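The ranking rule can be sketched as a sort on the number of dominated group members with random tie-breaking. The data structure (a dictionary mapping each surviving individual to the set of individuals it dominates) is an assumption made for illustration.

```python
import random


def rank_individuals(dominates, rng):
    """Return surviving individuals ordered from highest to lowest rank.

    dominates: dict mapping each individual to the set of others it dominates.
    rng: a random.Random instance used to break ties.
    """
    ids = list(dominates)
    # Shuffle first; the stable sort below then keeps tied individuals in random order.
    rng.shuffle(ids)
    ids.sort(key=lambda i: len(dominates[i]), reverse=True)
    return ids
```

For example, `rank_individuals({"a": {"b", "c"}, "b": {"c"}, "c": set()}, random.Random(1))` places "a" first and "c" last, while individuals with equal scores appear in an order that depends on the random generator.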
Acquired resources and interference as a function of rank
For the competing sex, a surviving group member of rank k obtains an amount V (k) of contested AR, which is further modified by interference by dominants. Our assumptions of the amounts without interference can be seen from the shapes of the curves labelled L1 and G1 in Fig. 1c. An individual i with interference trait κi pays a cost of performing interference by multiplying its contested AR by a fraction provided the individual has others that are subordinate to it. This is illustrated by the effect on self curve in Fig. 1b. Each of the subordinates exposed to interference has its contested AR changed through multiplying by a fraction which is illustrated by the effect on others in Fig. 1b. An individual that is subordinate to more than one dominant suffers from interference from each dominant. We assume that the effects of interference on contested AR interact multiplicatively. The overall effects of this for different values of the strength of interference are shown in Fig. 1b. Our model assumptions represent just one possibility, and would likely need to be modified to describe a particular case of interference in nature.
Mortality and reduced parenting ability from fighting damage
An individual with accumulated damage Dit survives from contests to reproduction with probability The parenting ability of a surviving individual is and is illustrated in Fig. 1d. For contested RS, the interpretation is that reduced parenting ability corresponds to an effective reduction of AR. For contributions to outside-option reproduction, reduced parenting ability corresponds to a reduction in the probability of contributing.
Acknowledgements
This work was supported by a grant (2018-03772) from the Swedish Research Council to OL and a grant (310030_192673/1) from the Swiss National Science Foundation to RB.