## Abstract

Real-world agents, such as humans, animals and robots, observe each other during interactions and choose their own actions taking the partners’ ongoing behaviour into account. Yet, classical game theory assumes that players act either strictly sequentially or strictly simultaneously (without knowing each other’s choices). To account for action visibility and provide a more realistic model of interactions under time constraints, we introduce a new game-theoretic setting called transparent game, where each player has a certain probability of observing the choice of the partner before deciding on its own action. Using evolutionary simulations, we demonstrate that even a small probability of seeing the partner’s choice before one’s own decision substantially changes evolutionarily successful strategies. Action visibility enhances cooperation in a Bach-or-Stravinsky game, but disrupts cooperation in the more competitive iterated Prisoner’s Dilemma. In both games, strategies based on the “Win–stay, lose–shift” and “Tit-for-tat” principles are predominant for moderate transparency, while for high transparency strategies of the “Leader-Follower” type emerge. Our results have implications for studies of human and animal social behaviour, especially for the analysis of dyadic and group interactions.

One of the most interesting questions in the economic, biological, and social sciences is the emergence and maintenance of cooperation. A popular framework for studying cooperation (or the lack thereof) is Game Theory, which is frequently used to model interactions between “rational” decision-makers. In particular, a model for repeated interactions is provided by iterated games; two settings were previously used [1]:

Simultaneous games: players act at the same time without having any information about the current choice of the partners. Consequently, all players must make a decision under uncertainty concerning the choices of others.

Sequential games: players act in a certain order (either random or predefined [2]) and the player acting later in the sequence is guaranteed to see the choices of the preceding players. Here the burden of uncertainty only applies to the first player or – if there are more than two players – becomes lighter with every turn in the sequence.

Both settings place a simplifying restriction on the decisional context: either all players have no information about the choices of the partners (simultaneous game), or some players always have more information than others (sequential game). This simplification might be disadvantageous for modelling certain behaviours, since humans and animals usually act neither strictly simultaneously nor sequentially, but observe the choices of each other and adjust their actions accordingly [3]. Indeed, the visibility of the partner’s actions plays a crucial role in social interactions, both in laboratory experiments [4–7] and in natural environments [8–12].

For example, in soccer the penalty kicker must decide where to place the ball and the goalkeeper must decide whether to jump to one of the sides or to stay in the centre. Both players resort to statistics about the other’s choices in the past, making this more than a simple one-shot game. Since the goalkeeper must make the choice while the opponent is preparing the shot, a simultaneous game provides a crude model for such interactions [13, 14]. However, in practice, both players observe each other’s behaviour and try to anticipate the direction of the kick or of the goalkeeper’s jump from subtle preparatory cues [7]. Thanks to these observations, professional goalkeepers manage to predict the direction of the shot better than chance [13–15]. While this example represents a zero-sum game, similar considerations apply to a wide range of different interactions in real life. Yet a framework for the treatment of such cases is missing in classical game theory.

To better predict and explain the outcomes of interactions between agents by taking the visibility factor into account, we introduce the concept of transparent games, where players can monitor each other’s actions. Access to the information about the choices of other players is probabilistic; in particular, in a game between two players, three cases are possible at each round:

Player 1 knows the choice of Player 2 before making its own choice.

Player 2 knows the choice of Player 1 before making its own choice.

Neither player knows the choice of the partner.

Which of these cases applies depends on the reaction times of the players. If they act nearly at the same time, neither is able to use the information about the partner’s action; but a player who waits before making a choice has a higher probability to see the choice of the partner. Setting a time constraint (which is always present, either explicitly or implicitly, both in natural and in experimental situations) prevents players from waiting indefinitely for the partner’s choice. Then, given the reaction time distributions of the players, one can infer the probability of Player *i* to see the choice of the partner before making its own choice.

Transparent games provide a general framework that also includes the classical game-theoretical settings: simultaneous games correspond to *p*_{see}^{1} = *p*_{see}^{2} = 0, while sequential games result in *p*_{see}^{1} = 0, *p*_{see}^{2} = 1 for a fixed order of decisions in each round (Player 1 always moves first, Player 2 second) and in *p*_{see}^{1} = *p*_{see}^{2} = 0.5 for a random sequence of decisions. Note that the latter case (sequential game with random order of decisions) differs qualitatively from the general case of transparent games. The critical difference is that in transparent games there is an inherent uncertainty for the first movers because they cannot know in advance whether the other players will see their decision in a specific round. This uncertainty of the first movers about whether the second mover sees their choice may favour behaviours qualitatively different from those that yield the best performance in games with either full unidirectional transparency (sequential games) or with no transparency (simultaneous games).

The main question then is whether the probabilistic access to information in transparent games leads to the success of different behavioural strategies as compared to classic games. To answer this question, we study transparent versions of two classical two-player games: the iterated Prisoner’s Dilemma (iPD) [16] and the iterated Bach-or-Stravinsky game (iBoS, also known as Battle of the Sexes and as Hero) [17]. We selected the iPD and iBoS because they are counted among the most interesting games where cooperation is possible (non-zero-sum games) [17, 18], and because they require two distinct types of cooperative behaviour [19, 20]. While the iPD is traditionally used for studying cooperation [16], the iBoS is sometimes considered a more suitable model [21, 22]. We employ evolutionary simulations, which allow evaluating optimal strategies using principles of natural selection, and consider memory-one strategies [23, 24] that take into account the own and the partner’s choices in the previous round of the game.

We find that even a small probability of seeing the choice of the partner before one’s own decision changes the optimal behaviour in the iPD and iBoS games. A strong possibility to see the partner’s choice enhances cooperation in the generally cooperative iBoS, but disrupts cooperation in the more competitive iPD. Different transparency levels also bring qualitatively different strategies to success. In particular, we show that strategies based on the “Win–stay, lose–shift” and “Tit-for-tat” principles are the most successful in both games for low and moderate transparency, while for high transparency a new class of strategies, which we term “Leader-Follower” strategies, evolves. Although frequently observed in humans and animals (see, for instance, [25]), these strategies have up to now remained beyond the scope of game-theoretical studies, but naturally emerge in our transparent games framework.

## Results

### Analytical results for the transparent games without memory

To substantiate further discussion we summarize here analytical results for one-shot transparent versions of Prisoner’s Dilemma (PD) and Bach-or-Stravinsky (BoS) game, with payoff matrices shown in Fig. 1. For details and proofs see “Methods” section.

In each game players choose between two actions, A_{1} or A_{2} (corresponding to cooperation and defection in the PD, or to insisting and accommodating in the BoS, respectively), according to their strategies. In a one-shot transparent game a strategy is represented by a vector (*s*_{1}; *s*_{2}; *s*_{3}), where *s*_{1}, *s*_{2}, *s*_{3} are the probabilities to select A_{1} without seeing the partner’s choice, seeing the partner selecting A_{1}, and seeing the partner selecting A_{2}, respectively. For example, strategy (1; 1; 0) in the transparent PD means that the player cooperates unless it sees that the partner defects. Recall that the optimal behaviour of players is described by a Nash Equilibrium (NE): a pair of strategies such that neither player can get a higher payoff by changing its strategy. In the one-shot transparent PD all Nash equilibria are comprised by defecting strategies (0; *x*; 0), with *x* below a threshold that depends on the payoffs and *p*_{see} (Proposition 2). This means that cooperation does not survive, similar to the classical PD. However, in a finite population playing the one-shot PD [26], cooperators have better chances in the transparent PD with high *p*_{see} than in the classical settings (Proposition 3).
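The expected payoffs underlying such equilibrium arguments can be checked numerically. Below is a minimal sketch of the one-shot transparent PD, assuming the standard payoff values R = 3, S = 0, T = 5, P = 1 (illustrative; the exact entries of Fig. 1 are not reproduced here) and an equal probability *p*_{see} for each player to see the other; function and variable names are ours, not from the paper.

```python
# Assumed Prisoner's Dilemma payoffs (illustrative): R, S, T, P
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def act_prob(strategy, seen):
    """Probability to play A1 (cooperate); seen is None, 'A1' or 'A2'."""
    s1, s2, s3 = strategy
    return {None: s1, "A1": s2, "A2": s3}[seen]

def payoff(c1, c2):
    """Payoff of Player 1 when choices are c1, c2 (True = cooperate)."""
    if c1 and c2:
        return R
    if c1 and not c2:
        return S
    if not c1 and c2:
        return T
    return P

def expected_payoff(strat1, strat2, p_see):
    """Expected payoff of Player 1 in the one-shot transparent game:
    with prob p_see Player 1 sees Player 2's (unseen) choice,
    with prob p_see Player 2 sees Player 1's choice,
    with prob 1 - 2*p_see neither sees the other."""
    total = 0.0
    for c1 in (True, False):
        for c2 in (True, False):
            # Case 1: Player 1 sees Player 2, who acted without information
            p2 = act_prob(strat2, None)
            p2 = p2 if c2 else 1 - p2
            p1 = act_prob(strat1, "A1" if c2 else "A2")
            p1 = p1 if c1 else 1 - p1
            total += p_see * p1 * p2 * payoff(c1, c2)
            # Case 2: Player 2 sees Player 1
            q1 = act_prob(strat1, None)
            q1 = q1 if c1 else 1 - q1
            q2 = act_prob(strat2, "A1" if c1 else "A2")
            q2 = q2 if c2 else 1 - q2
            total += p_see * q1 * q2 * payoff(c1, c2)
            # Case 3: neither player sees the other
            r1 = act_prob(strat1, None)
            r1 = r1 if c1 else 1 - r1
            r2 = act_prob(strat2, None)
            r2 = r2 if c2 else 1 - r2
            total += (1 - 2 * p_see) * r1 * r2 * payoff(c1, c2)
    return total
```

For instance, against a population of (0; 0.2; 0) players with *p*_{see} = 0.4, a deviant who cooperates when unseen earns less than the resident payoff P, illustrating why the defecting strategies form equilibria.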

Nash equilibria for the one-shot transparent BoS depend on *p*_{see} (Proposition 4). For *p*_{see} below a critical value determined by the payoffs, there are three NE: (a) Player 1 uses (0; 0; 1) and Player 2 uses (1; 0; 1); (b) vice versa; (c) both players use strategy (*x*; 0; 1), where the mixing probability *x* is determined by the payoffs and *p*_{see}. This generalises the classical case of the one-shot simultaneous BoS. However, for *p*_{see} above the critical value, the only NE is provided by both players using (1; 0; 1). In particular, for the payoff given in Fig. 1b, there are three NE for *p*_{see} < 1/3 and one NE otherwise. As we will see below, the NE of the one-shot BoS largely determine the dynamics in the iterated game.

Thus introducing transparency influences optimal behaviour already in simple one-shot games. As a next step, we considered iterated transparent games, where players take into account the results of the last interaction.

### Evolutionary simulations for transparent games with memory

Since the dynamics of strategies with memory is too complicated to be solved analytically, we used evolutionary simulations [24] to investigate strategies evolving in transparent versions of iterated PD and BoS (iPD and iBoS). In both games, evolution results in equal mean reaction times for all players (see “Methods” section). Then the probability *p*_{see} to see the choice of the partner is equal for all players, which in a dyadic game results in *p*_{see} ≤ 0.5.

We studied an infinite population of players using the methods described in [23, 24]. The population consists of “species” of players, each defined by a strategy vector **s**_{i} and a frequency *x*_{i}(*t*) in the population, with Σ_{i} *x*_{i}(*t*) = 1. For each species *i* the strategy is represented by a vector **s**_{i} = (*s*_{1},…, *s*_{12}), where *k* = 1,…, 12 enumerates the 12 different situations in which the player can be when making the choice. These depend on the outcome of the previous round, on whether or not the player can see the current choice of the partner, and on what that choice is if it is visible. The entries *s*_{k} thus represent the conditional probabilities to select action A_{1}; specifically:

*s*_{1},…, *s*_{4} are the probabilities to select A_{1} without seeing the partner’s choice, given that in the previous round the joint choice of the player and the partner was A_{1}A_{1}, A_{1}A_{2}, A_{2}A_{1}, and A_{2}A_{2}, respectively (the first action specifies the choice of the player, the second the choice of the partner);

*s*_{5},…, *s*_{8} are the probabilities to select A_{1} when seeing the partner selecting A_{1}, given the outcome of the previous round (ordered as before);

*s*_{9},…, *s*_{12} are the probabilities to select A_{1} when seeing the partner selecting A_{2}, given the outcome of the previous round.

The probabilities to select A_{2} are given by 1 − *s*_{k}, respectively. To ensure numerical stability of the simulations, it is common to introduce a minimal possible error *ε* in the strategies such that *ε* ≤ *s*_{k} ≤ 1 − *ε*, with *ε* = 0.001, see [23, 24]. The fact that players cannot have pure strategies and are prone to errors is also closely related to the “trembling hand” effect, which makes the probability of playing any pure strategy greater than zero [23, 27].
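The indexing of the 12 situations can be made concrete with a small sketch; this is our illustrative implementation of the ordering described above (names are ours), with the previous joint choices ordered A_{1}A_{1}, A_{1}A_{2}, A_{2}A_{1}, A_{2}A_{2} within each visibility block.

```python
import random

random.seed(3)

def entry_index(prev_self, prev_partner, seen_choice):
    """0-based index into the 12-entry memory-one strategy vector.
    Entries 0-3: partner's current choice unseen; 4-7: partner seen
    choosing A1; 8-11: partner seen choosing A2. Within each block,
    the previous joint choice is ordered A1A1, A1A2, A2A1, A2A2."""
    block = {None: 0, "A1": 1, "A2": 2}[seen_choice]
    prev = (0 if prev_self == "A1" else 2) + (0 if prev_partner == "A1" else 1)
    return 4 * block + prev

def choose(strategy, prev_self, prev_partner, seen_choice):
    """Sample the player's action from the appropriate strategy entry."""
    p = strategy[entry_index(prev_self, prev_partner, seen_choice)]
    return "A1" if random.random() < p else "A2"
```

For example, a player who defected against a cooperator in the previous round and now sees the partner cooperate consults entry 6 (block 1, previous state A_{2}A_{1}).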

For every value of *p*_{see} = 0.0, 0.1,…, 0.5 we performed 80 runs of evolutionary simulations, tracing 10^{9} generations in each run. We began each run with five species having equal initial frequencies *x*_{1}(1) =…= *x*_{5}(1) = 0.2 and random strategies **s**_{i}. The frequencies *x*_{i}(*t*) evolved in time according to the replicator dynamics equation (see “Methods” section). If *x*_{i}(*t*) dropped below 0.001, the species was assumed to die out, and its share in the population was distributed proportionally among the remaining species. On average, every 100 generations a new species with a random strategy entered the population. Details of our simulations can be found in the “Methods” section.

Since the strategies in the evolutionary simulations were generated randomly, convergence to the theoretical optimum may take many generations and the observed successful strategies may deviate from the optimum. Therefore, we provide a coarse-grained description of strategies using the following notation: symbol 0 for *s*_{k} < 0.1, symbol 1 for *s*_{k} > 0.9, and symbol * as a wildcard character denoting an arbitrary probability.

Let us exemplify this notation for well-known strategies in the iPD. For instance, the Generous tit-for-tat (GTFT) strategy is encoded by (1*a*1*b*;1***;****), where 0.1 < *a*, *b* < 0.9. Indeed, GTFT cooperates with cooperators and forgives defectors. To satisfy the first property, the probability to cooperate after the partner cooperated in the previous round should be rather high, say above 0.9; thus the corresponding entries of the strategy are encoded by 1. To satisfy the second property, the probability to cooperate after the partner defected should be somewhere between zero and one, with the optimal value 1/3 [23]. Since evolving towards this optimum may take many generations, we allow a broad range of values for *s*_{2} and *s*_{4}, for instance [0.1, 0.9]. We leave *s*_{6},…, *s*_{12} arbitrary, since for low values of *p*_{see} these entries have little influence on the strategy performance, meaning that their evolution towards optimal values may take especially long. Similarly, the Always Defect strategy (AllD) is encoded by (0000;**00;**00), meaning that the probability to cooperate when not seeing the partner’s choice or after defecting is below 0.1, and other behaviour is not specified. Win–stay, lose–shift (WSLS) is encoded by (1001;1***;****), and Firm-but-fair (FbF) by (101*b*;1***;****), where 0.1 < *b* < 0.9.
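The coarse-grained encoding and pattern matching can be sketched as follows (our own illustrative helpers). Named entries such as *a* and *b*, which stand for restricted ranges, are approximated here by the wildcard *.

```python
def encode(strategy):
    """Coarse-grained code of a 12-entry memory-one strategy:
    '0' if s_k < 0.1, '1' if s_k > 0.9, '*' otherwise."""
    def sym(s):
        return "0" if s < 0.1 else ("1" if s > 0.9 else "*")
    code = "".join(sym(s) for s in strategy)
    return ";".join((code[:4], code[4:8], code[8:]))

def matches(code, pattern):
    """'*' in the pattern is a wildcard; all other symbols must agree."""
    return len(code) == len(pattern) and all(
        p in ("*", c) for c, p in zip(code, pattern))

# An example WSLS-like strategy: cooperate after mutual cooperation or
# mutual defection when the partner's current choice is unseen, and
# cooperate on seeing cooperation after mutual cooperation.
wsls = [0.99, 0.01, 0.01, 0.99,
        0.99, 0.5, 0.5, 0.5,
        0.5, 0.5, 0.5, 0.5]
```

Here `encode(wsls)` yields "1001;1***;****", which matches the WSLS pattern from the text but not the AllD pattern (0000;**00;**00).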

### Transparency suppresses cooperation in Prisoner’s Dilemma

Simulation results for the transparent iPD are presented in Table 1. Most of the effective strategies were known from earlier studies on non-transparent games, but for high transparency (*p*_{see} → 0.5) a new, previously unknown strategy emerged. We dub this strategy “Leader-Follower” (**L-F**) since, when two L-F players meet for *p*_{see} = 0.5, the player acting first (the Leader) defects, while the second player (the Follower) sees this and makes a “self-sacrificing” decision to cooperate. Note that in the next round the roles of the individuals may switch, ensuring balanced benefits from exploiting the sacrificial second move. We characterized as L-F all strategies with profile (*00*b*;****;*11*c*) with *b* < 1/3 and *c* < 2/3. Indeed, for *p*_{see} = 0.5 these entries are most important for describing the L-F strategy: after unilateral defection the Leader always defects and the Follower always cooperates. Meanwhile, mutual defection most likely takes place when playing against a defector; thus both Leaders and Followers have a low probability to cooperate after mutual defection. Behaviour after mutual cooperation is only relevant when L-F is playing against another strategy, and the success of different types of behaviour depends on the composition of the population. For instance, (100*b*;111*;100*c*) is optimal in a cooperative population. Note that L-F did not emerge for the sequential iPD in [2, 28, 29], since in these studies players were bound to the same strategy regardless of whether they made their choice before or after the partner. In contrast, transparent games allow different sub-strategies (*s*_{1},…, *s*_{4}), (*s*_{5},…, *s*_{8}) and (*s*_{9},…, *s*_{12}) for these situations.

As in the simultaneous iPD, WSLS was predominant in the transparent iPD for low and moderate *p*_{see}, which is reflected by the distinctive WSLS profiles in the final strategies of the population (Fig. 2). Note that GTFT, another successful strategy in the simultaneous iPD, disappeared completely for *p*_{see} > 0. For *p*_{see} ≥ 0.4, the game resembled the sequential iPD and the results changed accordingly. Similar to the sequential iPD [2, 28, 29], the frequency of WSLS waned, the FbF strategy emerged, cooperation became less frequent and took longer to establish itself (Fig. 3a). For *p*_{see} = 0.5 the population was taken over either by L-F, WSLS or (rarely) by FbF, which is reflected by the mixed profile in Fig. 2.

To provide a theoretical justification for our results, we analytically compared strategies most frequently emerging in simulations. Pairwise comparison of strategies (Fig. 4) helps to explain the superiority of WSLS for *p*_{see} < 0.5, the disappearance of GTFT for *p*_{see} > 0.0, and the drastic increase of L-F frequency for *p*_{see} = 0.5.

For *p*_{see} ≤ 0.3 cooperation evolved relatively quickly thanks to the predominance of WSLS. Fig. 3a shows that a further increase of *p*_{see} undermined cooperation in the iPD; this is why, in the real-life prototype of the iPD, a face-to-face interrogation would be used. However, Leader-Follower is in a sense a cooperative strategy for the iPD: it merely alternates between cooperation and defection instead of using synchronized cooperation. This brings L-F an average payoff of (*S* + *T*)/2 when playing against itself. Alternation is generally sub-optimal in the iPD since *R* > (*S* + *T*)/2; for instance, the results above are for *R* = 3 > (*S* + *T*)/2 = 2.5. To check the influence of the payoff on the predominance of strategies, we varied the value of *R* while keeping *T*, *S* and *P* the same as in Fig. 1, as was done in [23] for the simultaneous iPD. Fig. 5 shows that for *R* > 3.2 evolution in the transparent iPD favours cooperation, but *R* ≤ 3.2 is sufficiently close to (*S* + *T*)/2 to make L-F a safe and efficient strategy.
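The comparison between synchronized cooperation and L-F alternation reduces to simple per-round arithmetic, sketched below under the assumption *T* = 5 and *S* = 0 (so that (*S* + *T*)/2 = 2.5, as in the text); function names are ours.

```python
# Assumed payoff entries (illustrative): S = sucker's payoff, T = temptation
S, T = 0.0, 5.0

def wsls_self_play(R):
    """Two WSLS players synchronise on mutual cooperation: R per round."""
    return R

def lf_self_play():
    """Two L-F players alternate unilateral defection, so each earns
    S and T in turn, averaging (S + T) / 2 per round."""
    return (S + T) / 2
```

With *R* = 3 synchronized cooperation still beats alternation (3 > 2.5), but the margin is small, which is consistent with L-F remaining competitive for *R* ≤ 3.2.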

### Cooperation emergence in the transparent Bach-or-Stravinsky game

Our simulations revealed that four memory-one strategies are most effective in the iBoS at various levels of transparency. In contrast to the iPD, there exist only a few studies of iBoS strategies; therefore we describe the observed strategies in detail.

**Turn-taker** aims to enter a fair coordination regime, where players alternate between IA (Player 1 insists and Player 2 accommodates) and AI (Player 1 accommodates and Player 2 insists) states. In the simultaneous iBoS, this strategy takes the form (*q*01*q*), where *q* = 5/8 guarantees the maximal reward in a non-coordinated play against a partner with the same strategy for the payoff matrix in Fig. 1b. We classify as Turn-takers all strategies encoded by (*01*;*0**;**1*). Turn-taking was shown to be successful in the simultaneous iBoS for a finite population of agents with pure strategies (i.e., having 0 or 1 entries only, with no account for mistakes) and a memory spanning three previous rounds [20].

**Challenger** takes the form (1101) in the simultaneous iBoS. When two players with this strategy meet, they initiate a “challenge”: both insist until one of the players makes a mistake (that is, accommodates). Then the player making the mistake (the loser) submits and continues accommodating, while the winner continues insisting. This period of unfair coordination, beneficial for the winner, ends when the next mistake of either player (the winner accommodating or the loser insisting) triggers a new “challenge”. This strategy is encoded by (11*b**;****;*1**) and has two variants: **Challenger** “obeys the rules” and does not initiate a challenge after losing (*b* ≤ 0.1), while **Aggressive Challenger** may switch to insisting (0.1 < *b* ≤ 1/3). Challenging strategies were theoretically predicted to be successful in the simultaneous iBoS [30, 31].

The **Leader-Follower** (**L-F**) strategy **s** = (1111; 0000; 1111) was not considered previously. In a game between two players with this strategy, the faster player insists and the slower player accommodates. In the simultaneous game, this strategy lapses into inefficient stubborn insisting, since all players consider themselves leaders; but in transparent settings with high *p*_{see} it provides an effective and fair (because of the, on average, equal reaction times) cooperation. In particular, for *p*_{see} > 1/3 the L-F strategy is a Nash Equilibrium in the one-shot game (see Proposition 4 in the “Methods” section), and it is an evolutionarily stable strategy for *p*_{see} = 0.5. When the whole population adopts an L-F strategy, most entries of the strategy vector become irrelevant, since (i) only IA and AI states are visited and (ii) the faster player never accommodates. Therefore, we classify all strategies encoded by (*11*;*00*;****) as L-F.

**Challenging Leader-Follower** is a hybrid of the Challenger and L-F strategies, encoded by (11*b**;0*c*0*;*1**), where 1/3 < *b* ≤ 0.9 and *c* ≤ 1/3.

The results of the simulations are presented in Table 2. The entries of the population strategy averaged over all runs (Fig. 6) show considerably different profiles for various values of *p*_{see}. Challengers, Turn-takers, and Leader-Followers succeeded for low, medium and high probabilities to see the partner’s choice, respectively.

To provide additional insight into the results of the iBoS simulations, we studied analytically how various strategies perform against each other (Fig. 7). As with the iPD, this analysis helps to understand why different strategies were successful at different transparency levels. The change of behaviour for *p*_{see} > 1/3 is in line with the theoretical results indicating that for these transparency levels L-F is a Nash Equilibrium. Population dynamics for an iBoS with a payoff different from the one presented in Fig. 1b also depends on the Nash Equilibria of the one-shot game, described by Proposition 4 in the “Methods” section.

In contrast to the iPD, in the iBoS high visibility results in more effective cooperation, which is consistent with the notion that cooperation in the iBoS rests on effective coordination (rather than on trust in the good intentions of the partner). Indeed, for *p*_{see} ≥ 0.3 non-cooperative Challengers no longer constituted the majority of the population. Note that for *p*_{see} = 0.5 cooperation thrives and is established much faster than for lower transparency (Fig. 3b), thanks to the Leader-Follower strategy.

## Discussion

In this paper, we introduced the concept of transparent games, which integrates the visibility of the partner’s actions into a game-theoretic setting. Specifically, we considered iterated dyadic games where players have probabilistic access to information about the partner’s choice in the current round. When reaction times of both players are equal on average, the probability *p*_{see} of accessing this information can vary from *p*_{see} = 0.0 (corresponding to the canonical simultaneous games) to *p*_{see} = 0.5 (corresponding to sequential games with a random order of choices). Our approach is similar to the continuous-time approach suggested in [32]. However, there a game is played continuously, without any rounds at all, while we suppose that the game consists of clearly specified rounds or iterations, although the time within each round is continuous. This assumption seems natural, since many real-world interactions and behaviours are episodic, with distinct starting and end points, and hence are close to distinct rounds [6, 8, 33, 34]. Transparent games to some degree resemble random games [35, 36], since in both concepts the outcome of the game depends on a stochastic factor. However, in random games randomness immediately affects the payoff, while in transparent games it determines the chance to learn the partner’s choice. While this chance influences the payoff of the players, the effect depends on their strategies, which is not the case in random games.

The value of *p*_{see} strongly affects the evolutionary success of strategies. In particular, for the iterated Prisoner’s Dilemma (iPD) we have shown that for *p*_{see} > 0 the Generous tit-for-tat strategy is unsuccessful and Win–stay, lose–shift becomes the unquestionable evolutionary winner. For *p*_{see} = 0.5, a new strategy, Leader-Follower, triumphs. In the iterated Bach-or-Stravinsky game (iBoS), even moderate *p*_{see} helps to establish cooperative turn-taking, while high *p*_{see} again brings the Leader-Follower strategy to success.

Despite the clear differences between the two games, the predominant strategies evolving in the iPD and iBoS at various levels of transparency have some striking similarities. First of all, in both games Leader-Follower appears to be the most successful strategy for high *p*_{see} (although in the iPD the share of Leader-Followers in the population across all generations is only about 15%, other strategies are even less successful, as most of them appear only transiently and rapidly replace each other). The prevalence of the Leader-Follower strategy can be explained as follows: in a group where the behaviour of each agent is visible to the others and can be correctly interpreted, group actions hinge upon the agents initiating these actions. In both games these initiators are selfish (see Supplementary Note for an example of altruistic leadership), while the cooperativeness of Followers in the iPD is self-denying and may seem counter-intuitive. Our results for the transparent iPD demonstrate that altruistic behaviour for the sake of the species’ success may evolve in a population even without direct reciprocity.

For low and moderate values of *p*_{see} the similarities between the two games are less obvious. However, the Challenger strategy in the iBoS follows the same principle of “Win–stay, lose–shift” as the predominant strategy WSLS in the iPD, but with modified definitions of “win” and “lose”. For Challenger, winning is associated with any outcome better than the minimal payoff, which corresponds to mutual accommodation. Indeed, Challenger accommodates until mutual accommodation takes place and then switches to insisting. Such behaviour is described as “modest WSLS” in [31, 37] and is in line with the interpretation of the “Win–stay, lose–shift” principle observed in animals [38].

The third successful principle in the transparent iPD is “Tit-for-tat”, embodied in Generous tit-for-tat (GTFT) and Firm-but-fair (FbF) strategies. This principle also works in both games since turn-taking in iBoS is nothing else but giving tit for tat. In particular, the FbF strategy, which occurs frequently in iPD for *p*_{see} ≥ 0.4, is partially based on taking turns and is similar to the Turn-Taker strategy in iBoS. The same holds to a lesser extent for the GTFT strategy.

The success of specific strategies for different levels of *p*_{see} makes sense if we understand *p*_{see} as a species’ ability to signal intentions and to interpret these signals when trying to coordinate (or compete). The higher *p*_{see}, the better (more probable) is the explicit coordination. This could mean that a high ability to explicitly coordinate actions leads to coordination based on observing the leader’s behaviour. In contrast, moderate coordination ability results in some form of turn-taking, while low ability leads to simple strategies of WSLS-type. In fact, an agent utilizing the WSLS principle does not even need to comprehend the existence of the second player, since WSLS “embodies an almost reflex-like response to the pay-off” [23]). The ability to cooperate may also depend on the circumstances, for example, on the physical visibility of partner’s actions. In a relatively clear situation, following the leader can be the best strategy. Moderate uncertainty requires some (implicit) rules of reciprocity embodied in turn-taking. High uncertainty makes coordination difficult or even impossible, and may result in a seemingly irrational “challenging behaviour” as we have shown for the transparent BoS. However, when players can succeed without coordination (which was the case in iPD), high uncertainty about the other players’ actions does not cause a problem.

By taking the visibility of agents’ actions into account, transparent games can provide a simple explanation for certain biological, sociological and psychological phenomena. Here, we illustrate the potential of this approach with two examples. The first concerns the study of human conformity. Iconic studies in social psychology have shown that humans may conform to majorities who propose an obviously incorrect opinion [39, 40] and to authorities requesting unethical behaviour [41]. The question why conformity is so pervasive among humans has puzzled researchers and laypeople alike. If we frame the Leader-Follower strategy as the second-mover conforming to the first mover’s choice, our results offer a provocative, if highly speculative, answer: a certain general disposition for conformity may stem from our evolutionary heritage. Given the advantage of the Leader-Follower strategy when others’ actions are frequently observable, it is conceivable that in humans a certain disposition for conformity evolved because it facilitates coordination. If so, conformity to erroneous majority opinions and to unethical authorities described in social psychology may – at least partially – reflect a side-effect of this disposition.

Another application of transparent games is related to the burgeoning experimental research of social interactions, including the emergent field of social neuroscience that seeks to uncover the neural basis of social signalling and decision-making using neuroimaging and electrophysiology in humans and animals [42–45]. So far, most studies have focused on sequential [46, 47] or simultaneous games [48]. One of the main challenges in this field is extending these studies to direct real-time interactions that would entail a broad spectrum of dynamic competitive and cooperative behaviours. In line with this, several recent studies also considered direct social interactions in humans and non-human primates [4–6, 34, 49–53] during dyadic games where players can monitor actions and outcomes of each other. Transparent games allow modelling the players’ access to social cues, which is essential for the analysis of experimental data in the studies of this kind [22]. This might be especially useful when behaviour is explicitly compared between “simultaneous” and “transparent” game settings, as in [4, 6, 49, 53]. In particular, the enhanced cooperation in the transparent iBoS for high *p*_{see} provides a theoretical explanation for the empirical observations in [6], where humans playing an iBoS-type game demonstrated a higher level of cooperation and a fairer payoff distribution when they were able to observe the actions of the partner while making their own choice. In view of the argument that true cooperation should benefit from enhanced communication [22], the transparent iBoS can in certain cases be a more suitable model for studying cooperation than the iPD (see also [54,55] for a discussion of studying cooperation by means of iBoS-type games).

In summary, transparent games provide a theoretically attractive link between classical concepts of simultaneous and sequential games, as well as a computational tool for modelling real-world interactions. We thus expect that the transparent games framework can help to establish a deeper understanding of social behaviour in humans and animals.

## Methods

### Transparent games between two players

In this study, we focus on iterated two-player two-action games: in every round both players choose one of two possible actions and get a payoff depending on the mutual choice according to the payoff matrix (Fig. 1). A new game setting, *transparent game*, is defined by a payoff matrix and probabilities *p*^{i}_{see} (*i* = 1, 2) of Player *i* to see the choice of the other player before making its own choice. Note that *p*^{1}_{see} + *p*^{2}_{see} ≤ 1, and 1 − *p*^{1}_{see} − *p*^{2}_{see} is the probability that neither of the players knows the choice of the partner, because they act sufficiently close in time that neither player can infer the other’s action prior to making their own choice. These probabilities can be computed from the distributions of reaction times for the two players, as shown in Supplementary Fig. 2 for reaction times modelled by an exponentially modified Gaussian distribution [56, 57]. In this figure, the reaction times of both players have the same mean, which results in a symmetric distribution of reaction-time differences (Supplementary Fig. 2b) and *p*^{1}_{see} = *p*^{2}_{see}. Here we focus only on this case, since for both games considered in this study unequal mean reaction times provide a strong advantage to one of the players (see below). In general, however, *p*^{1}_{see} ≠ *p*^{2}_{see}.
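As a minimal illustration (not the authors’ code), the probabilities *p*^{i}_{see} can be estimated by Monte Carlo sampling from two exponentially modified Gaussian (ex-Gaussian) reaction-time distributions; the minimal delay `delta` needed to register and process the partner’s choice is an assumed parameter, not a value from the paper.

```python
import numpy as np

def p_see_estimates(mu1, mu2, sigma=0.05, tau=0.1, delta=0.05, n=200_000, seed=0):
    """Estimate p_see^1 and p_see^2 by sampling ex-Gaussian reaction times.

    An ex-Gaussian RT is the sum of a Gaussian and an exponential component.
    Player i is assumed to see the partner's choice if the partner acts at
    least `delta` seconds earlier (delta is a hypothetical processing delay).
    """
    rng = np.random.default_rng(seed)
    rt1 = rng.normal(mu1, sigma, n) + rng.exponential(tau, n)
    rt2 = rng.normal(mu2, sigma, n) + rng.exponential(tau, n)
    p1_see = np.mean(rt2 <= rt1 - delta)  # Player 1 sees Player 2's choice
    p2_see = np.mean(rt1 <= rt2 - delta)  # Player 2 sees Player 1's choice
    return p1_see, p2_see

p1, p2 = p_see_estimates(mu1=0.3, mu2=0.3)
# equal mean reaction times give a symmetric difference distribution,
# so p1 and p2 nearly coincide, and p1 + p2 < 1 because of the delay
```

With equal means this reproduces the symmetric case used in the simulations (*p*^{1}_{see} = *p*^{2}_{see} ≤ 0.5); shifting `mu1` or `mu2` yields the asymmetric case.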

To illustrate how transparent, simultaneous and sequential games differ, let us consider three scenarios for a Prisoner’s Dilemma (PD):

If prisoners write their statements and put them into envelopes, this case is described by simultaneous PD.

If prisoners are questioned in the same room in a random or pre-defined order, this case is described by sequential PD.

Finally, in a case of a face-to-face interrogation where prisoners are allowed to answer the questions of prosecutors in any order (or even to talk simultaneously) the transparent PD comes into play. Here prisoners are able to monitor each other and interpret inclinations of the partner in order to adjust their own choice accordingly.

While the transparent setting can be used both in zero-sum and non-zero-sum games, here we concentrate on the latter class, where players can cooperate to increase their joint payoff. For the purposes of this work, we define cooperation simply as joint actions towards mutually beneficial outcomes. In various areas more specific definitions of cooperation are used (see, for example, [8, 22] for a discussion of cooperation in animals). We consider the transparent versions of two classical games, the PD and the Bach-or-Stravinsky game (BoS). We have selected the PD and BoS as representatives of two distinct types of symmetric non-zero-sum games [19, 20]: the maximal joint payoff is awarded when players select the **same** action (cooperate) in the PD, but **different** actions in the BoS (one insists, and the other accommodates). Games of the PD type are known as *synchronization* games; other examples of synchronization games include Stag Hunt and the Game of Chicken [20]. Games with two optimal mutual choices are called *alternation* games [19, 20]; as one of these choices is more beneficial for Player 1 and the other for Player 2, to achieve fair cooperation players should alternate between these two states.

Another important difference between the two considered games is that in the BoS it is better to act before the partner, while in the PD it is better to act after the partner. Indeed, in the PD defection is less beneficial if it can be discovered by the opponent and acted upon. Meanwhile, in the BoS the player acting first has good chances to get the maximal payoff of *S* = 4 by insisting: when the second player knows that the partner insists, it is better to accommodate and get a payoff of *T* = 3 than to insist and get *R* = 2. Therefore, the optimal behaviour in the PD is to wait as long as possible, while in the BoS a player should react as quickly as possible. Consequently, when the time for making a choice is bounded from below and from above, evolution in these games favours species with marginal mean reaction times: the maximal allowed reaction time in the PD and the minimal allowed reaction time in the BoS. Species with different behaviour are easily invaded. We therefore assumed in all simulations that reaction times have a constant and equal mean. We also assumed that the reaction times of all species have equal non-zero variance, and that the distribution of reaction-time differences between any two species is symmetric (see Supplementary Fig. 2). This results in *p*_{see} being the same for all species, so that all players have equal chances to see the choices of each other.

### Analysis of one-shot transparent games

Consider a one-shot transparent game between Player 1 and Player 2 having strategies **s** = (*s*_{1}; *s*_{2}; *s*_{3}) and **r** = (*r*_{1}; *r*_{2}; *r*_{3}) and probabilities *p*^{1}_{see} and *p*^{2}_{see} to see the choice of the partner, respectively. Here *s*_{1} (*r*_{1}) is the probability to choose the first action when the partner’s choice is not seen, *s*_{2} (*r*_{2}) when the partner is seen choosing the first action, and *s*_{3} (*r*_{3}) when the partner is seen choosing the second action. The payoff of Player 1 is given by

E(**s**, **r**) = (1 − *p*^{1}_{see} − *p*^{2}_{see})[*s*_{1}*r*_{1}*R* + *s*_{1}(1 − *r*_{1})*S* + (1 − *s*_{1})*r*_{1}*T* + (1 − *s*_{1})(1 − *r*_{1})*P*]
 + *p*^{2}_{see}[*s*_{1}(*r*_{2}*R* + (1 − *r*_{2})*S*) + (1 − *s*_{1})(*r*_{3}*T* + (1 − *r*_{3})*P*)]
 + *p*^{1}_{see}[*r*_{1}(*s*_{2}*R* + (1 − *s*_{2})*T*) + (1 − *r*_{1})(*s*_{3}*S* + (1 − *s*_{3})*P*)],  (1)

where the first line describes the case when neither player sees the partner’s choice, the second line describes the case when Player 2 sees the action of Player 1, and the third the case when Player 1 sees the action of Player 2. For the sake of simplicity we assume for the rest of this section that 0 < *p*^{1}_{see} + *p*^{2}_{see} < 1; otherwise the game is equivalent to a classical sequential or simultaneous game. First we consider the one-shot transparent Prisoner’s Dilemma (PD), and then the Bach-or-Stravinsky (BoS) game.
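The three-case decomposition of the payoff (neither player sees, Player 2 sees, Player 1 sees) can be sketched in code; this is an illustrative reimplementation, not the authors’ code, with strategy entries interpreted as probabilities of choosing the first action (cooperate in PD, insist in BoS).

```python
def expected_payoff(s, r, p1_see, p2_see, payoffs):
    """Expected payoff of Player 1 (strategy s) against Player 2 (strategy r)
    in a one-shot transparent game.

    s, r: (s1, s2, s3) -- probability of the first action when not seeing
          the partner, when seeing the first action, when seeing the second.
    payoffs: (R, S, T, P) as in the payoff matrix of Fig. 1.
    """
    R, S, T, P = payoffs
    s1, s2, s3 = s
    r1, r2, r3 = r
    p0 = 1.0 - p1_see - p2_see  # neither player sees the other's choice
    return (p0 * (s1 * r1 * R + s1 * (1 - r1) * S
                  + (1 - s1) * r1 * T + (1 - s1) * (1 - r1) * P)
            # Player 2 sees Player 1's choice and reacts with r2 / r3:
            + p2_see * (s1 * (r2 * R + (1 - r2) * S)
                        + (1 - s1) * (r3 * T + (1 - r3) * P))
            # Player 1 sees Player 2's choice and reacts with s2 / s3:
            + p1_see * (r1 * (s2 * R + (1 - s2) * T)
                        + (1 - r1) * (s3 * S + (1 - s3) * P)))

PD = (3, 0, 5, 1)  # R, S, T, P for the standard Prisoner's Dilemma
```

For *p*^{1}_{see} = *p*^{2}_{see} = 0 this reduces to the simultaneous game (mutual cooperation then yields *R*), and lowering *s*_{2} in the PD never hurts, in line with Lemma 1.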

### One-shot transparent Prisoner’s Dilemma

Here we assume that *p*^{1}_{see} = *p*^{2}_{see} = *p*_{see} to simplify the discussion. Similar to the classical one-shot PD, in the transparent PD all Nash equilibria (NE) correspond to mutual defection. To show this we make an important observation: in the one-shot PD it is never profitable to cooperate when seeing the partner’s choice.

*In one-shot transparent PD with p*_{see} > 0 *any strategy* (*s*_{1}; *s*_{2}; *s*_{3}) *is dominated by strategies* (*s*_{1}; *s*_{2}; 0) *and* (*s*_{1}; 0; *s*_{3}). *The dominance of* (*s*_{1}; *s*_{2}; 0) *is strict when s*_{1} < 1, *the dominance of* (*s*_{1}; 0; *s*_{3}) *is strict when s*_{1} > 0.

*Proof*. The lemma follows immediately from (1). Since in PD *R < T*, payoff *E* of strategy (*s*_{1}; *s*_{2}; *s*_{3}) is maximized when *s*_{2} = 0. Similarly, from *S < P* it follows that the payoff is maximized for *s*_{3} = 0.□

Now we can describe the NE strategies in transparent PD:

*In the one-shot transparent PD all Nash equilibria are formed by pairs of strategies* (0; *x*; 0) *with* 0 ≤ *x* ≤ 1 *and*

*x* *p*_{see}(*R* − *S*) ≤ (1 − *p*_{see})(*P* − *S*). (2)

*Proof*. First we show that for any *x*, *y* satisfying (2), the strategies (0; *x*; 0) and (0; *y*; 0) form a Nash equilibrium. Assume that there exists a strategy (*s*_{1}; *s*_{2}; *s*_{3}) which provides a better payoff against (0; *x*; 0) than (0; *y*; 0) does. According to Lemma 1, the payoff of the strategy (*s*_{1}; 0; 0) is not less than the payoff of (*s*_{1}; *s*_{2}; *s*_{3}). It remains to find the value of *s*_{1} maximizing the expected payoff *E* of (*s*_{1}; 0; 0). From (1) we have

E = *s*_{1}[*x* *p*_{see}*R* + (1 − *p*_{see} − *x* *p*_{see})*S*] + (1 − *s*_{1})(1 − *p*_{see})*P* + *p*_{see}*P*.

Thus the expected payoff is maximized by *s*_{1} = 0 if inequality (2) holds, and by *s*_{1} = 1 otherwise. In the former case the strategy (*s*_{1}; 0; 0) results in the same payoff *P* as the strategy (0; *y*; 0), which proves that the pair of strategies (0; *x*; 0), (0; *y*; 0) is an NE. If (2) does not hold, the strategy (0; *x*; 0) is not part of an NE, since switching to (1; 0; 0) results in a better payoff.

Let us show that there are no further NE. Indeed, according to Lemma 1 if an alternative NE exists, it can only consist of strategies (1; 0; *z*) or (*u*; 0; 0) with 0 ≤ *z* ≤ 1 and 0 < *u* < 1. In both cases switching to unconditional defection is preferable, which finishes the proof.□

The one-shot transparent PD has two important differences from the classical game. First, defective strategies dominate the cooperative strategy (1; 1; 0) only for *p*_{see} < (*T* − *R*)/(*T* − *P*). Indeed, when both players stick to (1; 1; 0), their payoff is equal to *R*, while a player switching to the (0; 0; 0) strategy gets *p*_{see}*P* + (1 − *p*_{see})*T*, which is below *R* for *p*_{see} > (*T* − *R*)/(*T* − *P*). However, (1; 1; 0) is dominated by the strategy (1; 0; 0), which cooperates when it does not see the choice of the partner and defects otherwise. This strategy, in turn, is dominated by (0; 0; 0).

Second, in the transparent PD unconditional defection (0; 0; 0) is not evolutionarily stable, as players can switch to (0; *x*; 0) with *x* > 0 while retaining the same payoff. This, together with Proposition 3 below, makes possible a kind of evolutionary cycle: (1; 0; 0) → (0; 0; 0) ↔ (0; *x*; 0) → (1; 1; 0), (1; 0; 0) → (1; 0; 0). In summary, although transparency does not allow cooperation to persist when evolution is governed by deterministic dynamics, it increases the chances of cooperators under stochastic dynamics in a finite population.

*In the transparent PD the strategies* (1; 0; 0) *and* (0; *x*; 0) *have the following relations:*

- *if condition* (2) *and the following condition*

 *x* *p*_{see}(*T* − *R*) ≤ (1 − 2*p*_{see})(*T* − *R*) + *p*_{see}(*P* − *S*) (3)

 *are satisfied, then* (0; *x*; 0) *dominates* (1; 0; 0);
- *if neither* (2) *nor* (3) *is satisfied, then* (1; 0; 0) *dominates* (0; *x*; 0);
- *if* (2) *is satisfied but* (3) *is not, then the two strategies coexist;*
- *if* (3) *is satisfied but* (2) *is not, then the two strategies are bistable*.

*Proof*. We prove only the first statement since the proof of the others is almost the same.

Let Player 1 use strategy (1; 0; 0) and Player 2 strategy (0; *x*; 0). To prove that (0; *x*; 0) dominates (1; 0; 0) we need to show that Player 2 has no incentive to switch to (1; 0; 0), and that Player 1, on the contrary, would get a higher payoff by using (0; *x*; 0). The latter statement follows from Proposition 2. To show that the former also holds, we write down the payoffs *E*_{1} and *E*_{2} of the strategies (1; 0; 0) and (0; *x*; 0) when playing against (1; 0; 0):

*E*_{1} = (1 − 2*p*_{see})*R* + *p*_{see}(*S* + *T*),

*E*_{2} = (1 − *p*_{see})*T* + *p*_{see}*P* − *x* *p*_{see}(*T* − *R*).

Now it can be easily seen that *E*_{1} ≤ *E*_{2} holds whenever inequality (3) is satisfied. □
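A numerical sanity check of this relation (illustrative code using the payoff expression (1) with standard PD payoffs): for equal transparency *p*, the payoffs of (1; 0; 0) and (0; *x*; 0) against (1; 0; 0) cross at the derived threshold *x** = (1 − 2*p*_{see})/*p*_{see} + (*P* − *S*)/(*T* − *R*).

```python
def payoff(s, r, p, pay):
    """Payoff expression (1) with equal transparency p for both players."""
    R, S, T, P = pay
    s1, s2, s3 = s
    r1, r2, r3 = r
    return ((1 - 2*p) * (s1*r1*R + s1*(1-r1)*S + (1-s1)*r1*T + (1-s1)*(1-r1)*P)
            + p * (s1*(r2*R + (1-r2)*S) + (1-s1)*(r3*T + (1-r3)*P))
            + p * (r1*(s2*R + (1-s2)*T) + (1-r1)*(s3*S + (1-s3)*P)))

PD = (3, 0, 5, 1)      # R, S, T, P
p = 0.45               # high transparency, so the threshold is interior
# derived crossover point: (1-2p)/p + (P-S)/(T-R)
x_star = (1 - 2*p) / p + (PD[3] - PD[1]) / (PD[2] - PD[0])
e1 = payoff((1, 0, 0), (1, 0, 0), p, PD)           # (1;0;0) vs (1;0;0)
e2a = payoff((0, x_star - 0.1, 0), (1, 0, 0), p, PD)  # below threshold
e2b = payoff((0, x_star + 0.1, 0), (1, 0, 0), p, PD)  # above threshold
# below the threshold (0;x;0) earns more against (1;0;0), above it earns less
```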

### One-shot transparent Bach-or-Stravinsky game

Recall [58] that in the classical one-shot BoS game there are three Nash equilibria: two pure (Player 1 insists and Player 2 accommodates, or vice versa) and one mixed (each player insists with probability (*S* − *P*)/(*S* − *P* + *T* − *R*), equal to 3/4 for the payoffs used here). The latter NE is weak and suboptimal compared to the pure NE; yet it is fair in the sense that both players receive the same payoff. The Nash equilibria of the transparent BoS game are specified by the following proposition.

*Consider the one-shot transparent BoS between Players 1 and 2 with probabilities* *p*^{1}_{see} *and* *p*^{2}_{see} *to see the choice of the partner, respectively. Let* *p*^{1}_{see} ≤ *p*^{2}_{see}*; then this game has the following pure-strategy NE*.

- *Player 1 uses strategy* (0; 0; 1), *Player 2 uses strategy* (1; 0; 1) – *for*

 (1 − *p*^{1}_{see} − *p*^{2}_{see})*R* + *p*^{2}_{see}*S* ≤ (1 − *p*^{1}_{see})*T*; (4)

- *Player 1 uses strategy* (1; 0; 1), *Player 2 uses strategy* (0; 0; 1) – *for*

 (1 − *p*^{1}_{see} − *p*^{2}_{see})*R* + *p*^{1}_{see}*S* ≤ (1 − *p*^{2}_{see})*T* (5)

 (*note that this inequality holds automatically if* (4) *holds*);
- *Both players use strategy* (1; 0; 1) – *when* (5) *is not satisfied*.

*Additionally, if inequality* (4) *is satisfied, there is also a mixed-strategy NE: Player i uses strategy* (*x*_{i}; 0; 1) *with*

*x*_{i} = [(1 − *p*^{1}_{see} − *p*^{2}_{see})(*S* − *P*) + *p*^{i}_{see}(*S* − *T*)] / [(1 − *p*^{1}_{see} − *p*^{2}_{see})(*S* − *P* + *T* − *R*)]. (6)

*Thus when* (4) *holds, there are two pure-strategy NE and one mixed-strategy NE; otherwise there is one pure-strategy NE,* (1; 0; 1).

To prove the Proposition, we need two lemmas. First, similar to the Prisoner’s dilemma, for the transparent BoS we have:

*In one-shot transparent BoS any strategy* (*s*_{1}; *s*_{2}; *s*_{3}) *is dominated by strategies* (*s*_{1}; *s*_{2}; 1) *and* (*s*_{1}; 0; *s*_{3}). *The dominance of* (*s*_{1}; *s*_{2}; 1) *is strict when s*_{1} < 1, *the dominance of* (*s*_{1}; 0; *s*_{3}) *is strict when s*_{1} > 0.

The proof is identical to the proof of Lemma 1.

*In the one-shot transparent BoS, when Player 1 uses strategy* (1; 0; 1), *the best response of Player 2 is the strategy* (0; 0; 1) *when inequality* (5) *holds, and* (1; 0; 1) *otherwise*.

*Proof*. By Lemma 5 the best response of Player 2 is a strategy (*s*_{1}; 0; 1) with 0 ≤ *s*_{1} ≤ 1. When Player 2 uses this strategy against (1; 0; 1), the expected payoff of Player 2 is given by

E = *T* + *s*_{1}[(1 − *p*^{1}_{see} − *p*^{2}_{see})*R* + *p*^{1}_{see}*S* − (1 − *p*^{2}_{see})*T*].

Thus the payoff of Player 2 depends linearly on the value of *s*_{1} and is maximized by *s*_{1} = 0 if

(1 − *p*^{1}_{see} − *p*^{2}_{see})*R* + *p*^{1}_{see}*S* < (1 − *p*^{2}_{see})*T* (7)

and by *s*_{1} = 1 otherwise. Inequality (7) is equivalent to (5), which completes the proof.

Using Lemmas 5 and 6, we can now compute NE for the one-shot transparent BoS:

*Proof*. The pure-strategy NE are obtained immediately from Lemma 6. To compute the mixed-strategy NE, recall that the strategy (*x*_{1}; 0; 1) of Player 1 is an equilibrium strategy when the expected payoffs obtained by Player 2 for insisting and for accommodating coincide:

(1 − *p*^{1}_{see} − *p*^{2}_{see})[*x*_{1}*R* + (1 − *x*_{1})*S*] + *p*^{1}_{see}*S* = (1 − *p*^{1}_{see} − *p*^{2}_{see})[*x*_{1}*T* + (1 − *x*_{1})*P*] + *p*^{1}_{see}*T*.

By computing *x*_{1} from this equation and applying the same argument to Player 2, we obtain the strategy entries given in (6).□

*Consider the one-shot transparent BoS with S* = 4, *T* = 3, *R* = 2, *P* = 1, *where both players have equal probabilities p*_{see} *to see the choice of the partner. In this game there are three NE for p*_{see} < 1*/*3*: (a) Player 1 uses strategy* (1; 0; 1), *Player 2 uses strategy* (0; 0; 1); *(b) vice versa; (c) both players use strategy* (*x*; 0; 1) *with x* = (3 − 5*p*_{see})/(4(1 − 2*p*_{see})). *For p*_{see} ≥ 1*/*3, (1; 0; 1) *is the only NE*.
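This can be checked numerically (illustrative code): against (*x*; 0; 1) with the candidate mixed-equilibrium insisting probability *x* = (3 − 5*p*_{see})/(4(1 − 2*p*_{see})), derived from the indifference condition, insisting and accommodating when the partner’s choice is unseen yield the same payoff, while above *p*_{see} = 1/3 insisting is strictly better even against an insister.

```python
def payoff(s, r, p, pay):
    """Payoff expression (1) with equal transparency p; first action = insist."""
    R, S, T, P = pay  # BoS ordering: S > T > R > P
    s1, s2, s3 = s
    r1, r2, r3 = r
    return ((1 - 2*p) * (s1*r1*R + s1*(1-r1)*S + (1-s1)*r1*T + (1-s1)*(1-r1)*P)
            + p * (s1*(r2*R + (1-r2)*S) + (1-s1)*(r3*T + (1-r3)*P))
            + p * (r1*(s2*R + (1-s2)*T) + (1-r1)*(s3*S + (1-s3)*P)))

BOS = (2, 4, 3, 1)                 # R, S, T, P as in Fig. 1
p = 0.2                            # below the 1/3 threshold
x = (3 - 5*p) / (4 * (1 - 2*p))    # candidate mixed-NE insisting probability
# the strategies below differ only in the unseen-choice entry s1:
e_insist = payoff((1, 0, 1), (x, 0, 1), p, BOS)
e_accomm = payoff((0, 0, 1), (x, 0, 1), p, BOS)
# at the mixed NE both entries give the same payoff (indifference)
```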

### Analysis of iterated transparent games

For the analysis of iterated games we use the techniques described in [23, 24, 59]. Since most results for the simultaneous and sequential iPD were obtained for strategies taking into account the outcome of the last interaction only (“memory-one strategies”), here we also focus on memory-one strategies. Note that considering multiple previous rounds results in very complex strategies. To overcome this, one can, for instance, use pure strategies (see [20]), but we reserve this possibility for future research.

Consider an infinite population of players evolving in generations. For any generation *t* = 1, 2,… the population consists of *n*(*t*) “species” defined by their strategies and their frequencies *x*_{i}(*t*) in the population, with *x*_{1}(*t*) + … + *x*_{n(t)}(*t*) = 1. Besides, the probability of a player from species *i* to see the choice of a partner from species *j* is given by *p*^{ij}_{see} (in our case *p*^{ij}_{see} = *p*_{see} for all species *i* and *j*, but in this section we use the general notation).

Consider a player from species *i* playing an infinitely long iterated game against a player from species *j*. Since both players use memory-one strategies, this game can be formalized as a Markov chain whose states are the mutual choices of the two players and whose transition matrix *M* is given by

*M* = (1 − *p*^{ij}_{see} − *p*^{ji}_{see})*M*_{0} + *p*^{ij}_{see}*M*_{1} + *p*^{ji}_{see}*M*_{2},  (8)

where the matrices *M*_{0}, *M*_{1} and *M*_{2} describe the cases when neither player sees the choice of the partner, Player 1 sees the choice of the partner before making its own choice, and Player 2 sees the choice of the partner, respectively. Writing **s** = (*s*_{1},…, *s*_{4}; *s*_{5},…, *s*_{8}; *s*_{9},…, *s*_{12}) for the strategy of Player 1 (the probabilities to cooperate after the previous outcomes CC, CD, DC, DD when the partner’s current choice is unseen, seen to be cooperation, or seen to be defection, respectively), and **r** likewise for Player 2, the row of each matrix corresponding to the previous outcome with index *k* for Player 1 (and the partner-centric index *k*′ for Player 2, with (*k*, *k*′) running over (1, 1), (2, 3), (3, 2), (4, 4)) is

*M*_{0}: ( *s*_{k}*r*_{k′}, *s*_{k}(1 − *r*_{k′}), (1 − *s*_{k})*r*_{k′}, (1 − *s*_{k})(1 − *r*_{k′}) ),
*M*_{1}: ( *s*_{4+k}*r*_{k′}, *s*_{8+k}(1 − *r*_{k′}), (1 − *s*_{4+k})*r*_{k′}, (1 − *s*_{8+k})(1 − *r*_{k′}) ),
*M*_{2}: ( *s*_{k}*r*_{4+k′}, *s*_{k}(1 − *r*_{4+k′}), (1 − *s*_{k})*r*_{8+k′}, (1 − *s*_{k})(1 − *r*_{8+k′}) ).

The gain of species *i* when playing against species *j* is given by the expected payoff E_{ij}, defined by

E_{ij} = *R* *y*_{1} + *S* *y*_{2} + *T* *y*_{3} + *P* *y*_{4},  (9)

where *R*, *S*, *T*, *P* are the entries of the payoff matrix (*R* = 3, *S* = 0, *T* = 5, *P* = 1 for the standard iPD and *R* = 2, *S* = 4, *T* = 3, *P* = 1 for the iBoS, see Fig. 1), and *y*_{1}, *y*_{2}, *y*_{3}, *y*_{4} are the probabilities of being in the states associated with the corresponding payoffs when playing **s**_{i} against **s**_{j}. The vector **y** = (*y*_{1}, *y*_{2}, *y*_{3}, *y*_{4}) is computed as the unique left-hand eigenvector of the matrix *M* associated with eigenvalue one [24]:

**y***M* = **y**, *y*_{1} + *y*_{2} + *y*_{3} + *y*_{4} = 1.
The evolutionary success of species *i* is encoded by its fitness *f*_{i}(*t*): if species *i* has a higher fitness than the average fitness ⟨*f*(*t*)⟩ of the population, then *x*_{i}(*t*) increases with time; otherwise *x*_{i}(*t*) decreases and the species dies out. This evolutionary process is formalized by the replicator dynamics equation, which in discrete time takes the form

*x*_{i}(*t* + 1) = *x*_{i}(*t*) *f*_{i}(*t*) / ⟨*f*(*t*)⟩, with ⟨*f*(*t*)⟩ = Σ_{j} *x*_{j}(*t*) *f*_{j}(*t*).  (10)

The fitness *f*_{i}(*t*) is computed as the average payoff of a player from species *i* when playing against the current population:

*f*_{i}(*t*) = Σ_{j} *x*_{j}(*t*) E_{ij},

where E_{ij} is given by (9).

### Evolutionary dynamics of two strategies

To provide an example of evolutionary dynamics and introduce some useful notation, we consider a population consisting of two species playing the iPD with strategies **s**_{1} = (1, 0, 0, 1; 1, 0, 0, 1; 0, 0, 0, 0) and **s**_{2} = (0, 0, 0, 0; 0, 0, 0, 0; 0, 0, 0, 0) (recall that we write 0 instead of *ε* and 1 instead of 1 *− ε*) and initial conditions *x*_{1}(1) = *x*_{2}(1) = 0.5. That is, the first species plays WSLS, and the second uses AllD. We set *p*^{ij}_{see} = *p*_{see} for all species *i* and *j*. Note that since both players have equal chances to see the partner’s choice, it holds that *p*_{see} ≤ 0.5. Given *p*_{see} we can compute the transition matrix of the game using (8) and then calculate the expected payoffs for all possible pairs of players *i*, *j* using (9). For instance, for *p*_{see} = 0 a player from the WSLS species on average gets a payoff E_{11} = 2.995 when playing against a conspecific partner, and only E_{12} = 0.504 when playing against an AllD player.

Since *f*_{2}(*t*) > *f*_{1}(*t*) for any 0 < *x*_{1}(*t*), *x*_{2}(*t*) < 1, the AllD players take over the whole population after several generations. The dynamics of the species frequencies *x*_{i}(*t*) computed using (10) show that this is indeed the case (Fig. 8a). Note that since E_{21} > E_{11} and E_{22} > E_{12}, AllD is guaranteed to win over WSLS for any initial frequency of WSLS players *x*_{1}(1). In this case one says that AllD *dominates* WSLS and can *invade* it for any *x*_{1}(1).

As we increase *p*_{see}, the population dynamics changes. While for *p*_{see} = 0.2 AllD still takes over the population, for *p*_{see} = 0.4 WSLS wins (Fig. 8a). This can be explained by computing the expected payoffs for *p*_{see} = 0.4.

Hence *f*_{1}(*t*) > *f*_{2}(*t*) for 0 ≤ *x*_{2}(*t*) ≤ 0.5 ≤ *x*_{1}(*t*) ≤ 1, which explains the observed dynamics. Note that here E_{11} > E_{21} while E_{12} < E_{22}; that is, against WSLS and AllD players alike, a conspecific partner earns more than a partner from the other species. In this case one says that WSLS and AllD are *bistable*, and there is an unstable equilibrium fraction of WSLS players given by

*h*_{1} = (E_{22} − E_{12}) / (E_{11} − E_{12} − E_{21} + E_{22}).  (11)
We call *h*_{i} an *invasion threshold* for species *i*, since the species takes over the whole population for *x*_{i}(*t*) > *h*_{i} but dies out for *x*_{i}(*t*) < *h*_{i}. To illustrate this concept, we plot in Fig. 8b the invasion threshold *h*_{1} of the WSLS species playing against AllD as a function of *p*_{see}.

One more possible type of two-species dynamics is *coexistence*, which takes place when E_{11} < E_{21} and E_{12} > E_{22}, that is, when playing against a player from either species is less beneficial for a conspecific partner than for a partner from the other species. In this case the fraction given by (11) corresponds to a stable equilibrium: the frequency of the first species *x*_{1}(*t*) increases for *x*_{1}(*t*) < *h*_{1} but decreases for *x*_{1}(*t*) > *h*_{1}. We refer to [24] for more details.
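The replicator dynamics (10) and the invasion threshold (11) can be illustrated with a small simulation; the payoff values below are hypothetical numbers chosen to produce a bistable case (E_{11} > E_{21}, E_{12} < E_{22}), not the E_{ij} values of the WSLS/AllD example.

```python
def replicator(E, x1, generations=200):
    """Iterate the discrete-time replicator dynamics (10) for two species.

    E[i][j] is the expected payoff of species i against species j;
    x1 is the initial frequency of species 1.
    """
    for _ in range(generations):
        f1 = x1 * E[0][0] + (1 - x1) * E[0][1]
        f2 = x1 * E[1][0] + (1 - x1) * E[1][1]
        mean_fitness = x1 * f1 + (1 - x1) * f2
        x1 = x1 * f1 / mean_fitness
    return x1

# hypothetical bistable payoffs: E11 > E21 and E12 < E22
E = [[3.0, 1.0],
     [2.0, 2.0]]
# invasion threshold, equation (11): (E22 - E12) / (E11 - E12 - E21 + E22)
h1 = (E[1][1] - E[0][1]) / (E[0][0] - E[0][1] - E[1][0] + E[1][1])
x_above = replicator(E, 0.6)   # starts above the threshold -> fixates
x_below = replicator(E, 0.4)   # starts below the threshold -> dies out
```

Starting above *h*_{1} the first species takes over, starting below it goes extinct, exactly the unstable-equilibrium behaviour described for the bistable case.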

### Evolutionary simulations for transparent games

Theoretical analysis of strategies in repeated transparent games is complicated by the high dimensionality of the strategy space, which motivates the use of evolutionary simulations. For this we adopt the methods described in [23, 24]. Each run of the simulations starts with five species having equal initial frequencies: *n*(1) = 5, *x*_{1}(1) = … = *x*_{5}(1) = 0.2. Following [23], the strategy entries *s*_{k} with *k* = 1,…, 12 for these species are randomly drawn from the distribution with U-shaped probability density, favouring probability values around 0 and 1:

f(*y*) = 1 / (π √(*y*(1 − *y*)))  (12)

for *y ∈* (0, 1). Additionally, we require *ε* ≤ *s*_{k} ≤ 1 − *ε*, where *ε* = 0.001 accounts for the minimal possible error in the strategies [23].

The frequencies of the strategies *x*_{i}(*t*) change according to the replicator dynamics equation (10). If *x*_{i}(*t*) < *∊*, the species is assumed to die out and is removed from the population (its share *x*_{i}(*t*) is distributed proportionally among the remaining species); we follow [23, 24] in taking *∊* = 0.001. Occasionally (every 100 generations on average, to avoid strong synchronization), new species enter the population. The strategies of the new species are drawn from (12) and their initial frequencies are set to *x*_{i}(*t*_{0}) = 1.1*∊* [23].
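Sampling from a U-shaped density favouring values near 0 and 1 — taken here to be the arcsine density 1/(π√(*y*(1 − *y*))), an assumption consistent with that description — can be sketched via inverse-transform sampling (illustrative code, not from the paper):

```python
import numpy as np

def draw_strategy(rng, n_entries=12, eps=0.001):
    """Draw one strategy vector from the arcsine density 1/(pi*sqrt(y(1-y))).

    If U ~ Uniform(0, 1), then sin^2(pi*U/2) is arcsine-distributed; the
    entries are then clipped to [eps, 1-eps] to keep the minimal error rate.
    """
    u = rng.random(n_entries)
    y = np.sin(np.pi * u / 2.0) ** 2
    return np.clip(y, eps, 1.0 - eps)

rng = np.random.default_rng(1)
sample = np.concatenate([draw_strategy(rng) for _ in range(5000)])
# the arcsine density puts roughly 41% of its mass within 0.1 of the edges
edge_mass = np.mean((sample < 0.1) | (sample > 0.9))
```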

## Data availability

The empirical datasets generated during the current study and the source code used for this are available from the corresponding author on reasonable request.

## Contributions

A.U. conceived the original idea and performed simulations with the help and advice of S.E. and F.W.; T.S., I.K. and S.M. contributed to the interpretation of the results. All authors contributed to writing and revising of the manuscript.

## Competing interests

The authors declare no competing financial interests.

## Supplementary Note

Here we introduce a variant of the transparent iterated Prisoner’s Dilemma (iPD) with a restricted strategy space. Since in the iPD a rational player would not cooperate upon seeing that the partner defects, we can simplify iPD strategies by setting *s*_{9} = … = *s*_{12} = 0. The question is then whether such priors change the dynamics of the iPD strategies. Supplementary Fig. 3 shows that restricting the strategy space results in the same drop of cooperation as in the non-restricted iPD.

There is, however, one difference: Supplementary Fig. 4 shows that for high *p*_{see} an “inverse Leader-Follower” strategy (inverse L-F) emerges instead of the Leader-Follower strategy introduced for the non-restricted iPD. Inverse L-F is theoretically represented by **s** = (1110; 0000; 0000); that is, the player cooperates when it does not see the choice of the partner and defects otherwise. In the simultaneous iPD (*p*_{see} = 0) inverse L-F behaves almost as an unconditional cooperator and is easily beaten, but it becomes predominant in the restricted setting for *p*_{see} = 0.5. Note that inverse L-F is an extension of the strategy (1; 0; 0), which plays a special role in the one-shot PD (see the “Methods” section). However, memory provides inverse L-F with an important advantage: it can distinguish unconditional defectors (AllD) from conspecifics. Resistance to AllD is achieved by defecting after mutual defection (*s*_{4} = 0).

The spread of inverse L-F in the restricted iPD for high transparency illustrates the pervasiveness of the “Leader-Follower” principle. It also shows that the role of initiators can vary: in some cases these agents reap special benefits, while in other cases they carry a burden. Although counter-intuitive at first glance, the cooperativeness of Leaders in the L-F strategy corresponds to the behaviour of individuals who agree to do a necessary but risky or unpleasant job without immediate benefit. Examples include volunteering in human societies and acting as sentries in animal groups.

## Acknowledgements

We acknowledge funding from the Ministry for Science and Education of Lower Saxony and the Volkswagen Foundation through the program “Niedersächsisches Vorab”. Additional support was provided by the Leibniz Association through funding for the Leibniz ScienceCampus Primate Cognition and the Max Planck Society.

## References

- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15]
- [16]
- [17]
- [18]
- [19]
- [20]
- [21]
- [22]
- [23]
- [24]
- [25]
- [26]
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
- [33]
- [34]
- [35]
- [36]
- [37]
- [38]
- [39]
- [40]
- [41]
- [42]
- [43]
- [44]
- [45]
- [46]
- [47]
- [48]
- [49]
- [50]
- [51]
- [52]
- [53]
- [54]
- [55]
- [56]
- [57]
- [58]
- [59]