Abstract
Interactions of group-living primates with conspecifics range from cooperation to competition. Game theory allows testing the strategies that underlie such interactions, but in classical theory, agents act simultaneously or sequentially. Many real-world decisions, however, are made while directly observing the partner’s actions. To investigate social decision-making under conditions of face-to-face action visibility, we developed a setup where two agents observe each other and reach to targets on a shared transparent display, enabling naturalistic interactions we call “transparent games”. Here we compared human and macaque pairs in the transparent version of the coordination game “Bach or Stravinsky”, which rewards coordination but entails a conflict about which of the two individually-preferred coordinated options to choose. Most human pairs developed coordinated behavior, and 53% adopted dynamic coordination via turn-taking to equalize the payoffs. All macaque pairs also converged on coordination, but in a simpler, static way: persistently selecting one of the two coordinated options or one of the two display sides. Two animals that underwent training with a turn-taking human confederate learned to coordinate dynamically. When tested as a pair, they mostly converged on the faster monkey’s preferred option, and a dynamic coordination emerged as animals spontaneously took turns in leading to their respective preferred option and following to the other’s. The observed choices were captured by modeling a probability to see the other’s action before own movement. Importantly, such competitive turn-taking was unlike the benevolent turn-taking in humans, who equally often initiated switches to and from their preferred option. Our findings demonstrate that dynamic coordination is not restricted to humans – although it serves a selfish motivation in macaques – and emphasize the importance of action visibility in the emergence and maintenance of coordination.
Introduction
The majority of primate species lives in complex social groups, in which interactions range from intense competition to cooperation 1–5. To adjust their behavior optimally, individuals need to assess not only their own actions and goals, but also other group members’ current actions, while taking into account the history of interactions with and between these individuals 4, 6,7. Coordination is essential for maintaining cohesion between group members, avoiding conflicts, and achieving individual and joint action goals. Here we focus on understanding how such coordination can be achieved and maintained in a dyadic setting involving two individual agents, in “transparent” conditions in which both agents can observe each other’s evolving actions and social cues 8,9.
To shed light on the evolution of coordination, researchers have turned to nonhuman primates to investigate how their behavior compared to that of humans 10–12. Macaques play a particularly important role in this context, as the most common nonhuman primate model for studying the neuronal basis of higher socio-cognitive functions 4,13,14. The realization that primate brain functions are best understood under social conditions in which they evolved sparked a surge of interest in behavioral repertoire and neural correlates of economic and social factors underlying social cognition in humans and nonhuman primates 15,16.
Game theory, developed to study strategic interactions in rational decision-makers 17, offers a powerful framework to investigate dyadic social interactions 18. Game-theoretical approaches derived from mathematical and experimental economics emerged as an important tool in behavioral ecology 19 and more recently in the burgeoning fields of neuroeconomics and the neuroscience of social decision-making 20–22. One fundamental class of games is the so-called 2×2 games 23–25, in which each agent chooses one of two actions and the outcome depends on the combination of their choices, as defined by the payoff matrices describing the gain or loss the two agents earn for the four possible combinations. Such choices can be presented as one-shot games or as iterated games. In the latter, agents interact repeatedly, as is often the case in real life. Repeated interactions encourage tracking of interaction history, as well as the formulation of predictions regarding the other agent’s decisions.
A particularly interesting class of 2×2 games for understanding many realistic scenarios are non-zero-sum games, in which the agents need to coordinate their choices to maximize individual and/or joint reward 26. As opposed to the strictly competitive zero-sum games, non-zero-sum games have both cooperative and competitive elements, with aligned and opposing interests. A number of such games have been used in studies of coordination behavior in human and nonhuman primates. In the Stag Hunt (or Assurance) game, agents choose between maximizing the individual as well as joint reward and minimizing the individual risk 27. Humans (Homo sapiens), chimpanzees (Pan troglodytes), and capuchin monkeys (Cebus apella) converge at above chance level on mutually beneficial high-reward / high-risk choices in the iterated Stag Hunt game; humans did so at a higher rate than apes, and apes at a higher rate than capuchins 28–30. The performance of rhesus monkeys (Macaca mulatta) was more similar to humans than capuchins, especially when no information about the current choice of the partner was available 31. Interestingly, in a computerized Stag Hunt game, humans adapted more closely than macaques to a simulated partner’s stag choice probability, showing that different species might employ distinct but overall fairly successful strategies to maximize reward in a dyadic context 32.
Similar to Stag Hunt, in Prisoner’s Dilemma (PD), the joint reward of two agents is maximized if they both choose to cooperate. Yet, an individual agent obtains the highest reward when defecting while the partner cooperates, and in this case the cooperating partner receives the lowest reward. Thus, there is a dilemma, or conflict between the self-interested temptation to defect and a riskier cooperation. In a single round of PD rational agents should both defect. For mutual cooperation to emerge in PD, agents need to “trust” the partner, i.e. to expect the partner to cooperate for the mutual benefit. Humans often show a bias towards cooperative behavior 33–35. Remarkably, it was demonstrated that in an iterated PD game rhesus macaques, while mostly selecting defection, chose mutual cooperation significantly more and mutual defection significantly less often than expected by chance, especially after preceding cooperation 36. This suggests that macaques can at least partially overcome the selfish motives and reciprocate cooperation.
Another type of conflict is implemented in the Conflict game (also known as Hawk-Dove, or game of Chicken) 37. These anti-coordination games model a competition over a shared resource that can be monopolized by one agent, while an actual clash is highly detrimental to both. Here, compared to PD, if both agents exhibit cooperative behavior (e.g. yield), their joint reward is less than or equal to the joint reward of the two anti-coordinated choices (e.g. fight-yield and yield-fight) that maximize the individual reward of one of the agents. Hence, to achieve optimal and balanced distribution of rewards, agents need to alternate between the two anti-coordinated options. Humans, capuchins and rhesus macaques often converged on anti-coordination in this game, but importantly, only humans alternated between the two individually-optimal choices 37. Such alternating behavior resulted in the maximal and roughly equal payoff for both agents and is called “cooperative turn-taking” 38. It indicates that in the Conflict game, humans but not nonhuman primates strive for fairness.
In the games considered so far, reward-optimizing rational agents should converge on a behavior in which they either coordinate on one option (Stag Hunt and Prisoner’s Dilemma) or anti-coordinate on opposite options (Conflict game). A third type of games that emphasizes the coordination on either of two options is known as Bach or Stravinsky, or Battle of the Sexes (BoS) 39. Similarly to Stag Hunt and Prisoner’s Dilemma, but unlike the Conflict game, BoS models a cooperative interaction in which agents should optimally converge on the same option. Each agent has an individually preferred option, but coordinating on either one of these two options adds the same bonus to both agents. This renders any coordinated choice better than no coordination for both agents, but one coordinated choice best for the first agent and another for the second agent. The coordinated choices represent two pure-strategy Nash equilibria. For both agents, the rational choice is to coordinate, but unlike the conceptually simpler optimal convergence on one option in Stag Hunt and PD, BoS includes an inherent conflict about who profits the most. Consequently, when playing iteratively, turn-taking strategies are needed to avoid unequal distribution of rewards.
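The equilibrium structure described above can be checked by brute force. The following is a minimal sketch, not the authors' analysis: the payoff values (2 for one's own preferred option, 1 for the other's, plus a bonus of 2 for coordination) are illustrative assumptions that merely satisfy the verbal description, and the names `payoff` and `pure_nash_equilibria` are hypothetical.

```python
from itertools import product

# Assumed, illustrative Bach-or-Stravinsky payoffs (not taken from the text):
# 2 for one's own preferred option, 1 for the other's option,
# plus a bonus of 2 for both agents when they pick the same option.
OPTIONS = ("A", "B")  # "A" = option preferred by agent A, "B" = preferred by agent B


def payoff(choice_a, choice_b):
    """Return (reward to agent A, reward to agent B) for one trial."""
    base_a = 2 if choice_a == "A" else 1
    base_b = 2 if choice_b == "B" else 1
    bonus = 2 if choice_a == choice_b else 0
    return base_a + bonus, base_b + bonus


def pure_nash_equilibria():
    """List choice pairs from which neither agent gains by deviating alone."""
    equilibria = []
    for a, b in product(OPTIONS, repeat=2):
        reward_a, reward_b = payoff(a, b)
        a_cannot_improve = all(payoff(alt, b)[0] <= reward_a for alt in OPTIONS)
        b_cannot_improve = all(payoff(a, alt)[1] <= reward_b for alt in OPTIONS)
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria())
```

Under these assumed payoffs only the two coordinated outcomes survive the deviation check, and each of them favors a different agent, which is the conflict that turn-taking resolves in the iterated game.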
This combination of cooperation and conflict offers an interesting opportunity for studying social interactions that has not been sufficiently explored in nonhuman primates. Games of the BoS type have revealed that humans indeed often take turns – i.e. they switch from coordinating on one option (preferred by one player) to coordinating on the other option (preferred by another), again indicating human propensity for fairness 40–43. Interestingly, stable cooperative turn-taking frequently took place in 5-year-old children, but neither in 3-year-old children nor in chimpanzees 42. This raises the question of whether turn-taking requires special social abilities that are unique to humans (and potentially only emerge later in life). We therefore compared the behavior of (adult) humans and rhesus macaques in the BoS economic game.
Traditionally, economic games are either played simultaneously (neither agent knows the choice of the other before making its own decision) or sequentially in a certain order; yet real dyadic interactions often play out in real time with the partner’s actions in direct sight 44,45. Thus, the timing of one’s own and other’s actions becomes part of the strategy space 46. Such a “transparent” continuous time setting can change choice strategies of the agents as compared to the classic simultaneous and sequential settings 38. Especially for nonhuman species, coordinating based on mutual choice history might be more demanding than coordinating based on the immediately observable behavior of others. For example, visual feedback about partner’s choices improves the convergence on a high reward / high risk equilibrium in the iterated Stag Hunt game in humans, capuchins and rhesus macaques 31, and such coordination in chimpanzees is facilitated if one of the agents consistently acts faster than the partner, suggesting a leader-follower strategy 29. Similarly, there were substantial differences in capuchins’ and rhesus’ behavior in the Conflict game when they had access to the current choice of a partner 37. In humans, a real-time anti-coordination game revealed that action visibility (and the possibility to change an already initiated action) increased efficiency and fairness 47. Besides those few pioneering studies, the experimental and theoretical effects of action visibility on (anti)coordination games have not been systematically addressed.
To account for possible changes in strategies during action visibility compared to simultaneous (“opaque”) discrete choices, we recently developed the concept of “transparent games” that extends classical evolutionary game-theoretic analysis to real-time interactions in which the visibility of partners’ actions depends on their relative reaction times 8,9. Here, we use the theoretical insights from that work and analyze short- and long-term dynamics of choices, mutual information and reaction times in humans and rhesus monkeys playing a transparent version of the iterated movement-based BoS game. Since many direct interactions between primates take place face-to-face 48–52, we designed a novel dyadic interaction platform that allows two human or monkey agents to observe each other and to act on the same visual objects on a vertical touch-sensitive transparent display between them. The advantage of this configuration is that agents can monitor and react to each other’s actions in real-time, emulating naturalistic interactions while still maintaining well-controlled laboratory conditions. Extending beyond static choice proportions, our approach captures dynamics of such interactions, affording a mechanistic model of trial-to-trial choice behavior driven by within-trial temporal patterns.
To facilitate inter-species comparability, we paired human or macaque agents without explicit task instructions. We expected that the transparency and instantaneous coaction 45 would facilitate efficient coordination, as well as reciprocal cooperative turn-taking in humans. We predicted that macaques would also utilize the information about partner’s actions, and anticipated one of two outcomes: macaques might exhibit a form of turn-taking, or converge on one of the two pure equilibria, i.e. coordinate to increase individual (and joint) rewards, but without turn-taking to balance rewards between players.
Results
In our experiments, two human or macaque agents were sitting face-to-face and shared a transparent vertical workspace between them (Figure 1A,B). Agents could simultaneously see the stimuli and each other’s actions. In the variant of the BoS paradigm we used, each agent chooses one of two simultaneously presented color targets (red and blue). The left/right position of each color was randomized across trials (50% left, 50% right). The resulting rewards follow the payoff matrix shown in Figure 1C and depend on the choices of both agents. Prior to the first dyadic session, each agent in a pair was individually trained to associate one of the two color targets to a larger reward, so that they were biased to value different colors. In the dyadic trials, selecting the same target resulted in an additional reward on top of the individually trained values, hence choosing the same target was always better than choosing different targets. But in such coordinated trials, the agent whose individually preferred (“own”) color was selected received a larger reward. This paradigm probes the ability to realize that coordinated target selection results in higher rewards, and the ability to perceive and deal with the inherent unfairness introduced in each trial by the unequal rewards for selecting the same target.
Human pairs mainly converge on fair coordination
We recruited naïve human subjects to assess performance in the BoS paradigm. Each subject was trained individually for 50-100 solo trials to operate the touch panel and to associate a higher auditory pulse count/monetary reward with one of the two color targets (Table S1). We then paired these pre-trained agents and let them explore the full payoff matrix. Subjects were instructed not to talk during the experiments (see Methods and Supporting Information for the full set of instructions given to the subjects). We mainly analyzed the last 200 dyadic trials of each session, to allow the subjects to explore all choice combinations and to converge on a strategy (Methods).
Human strategies ranged from strict adherence to the pre-trained higher value colors to joint selection of the same target with alternating between the two colors trial-by-trial or in blocks. Figure 2A,B illustrates an exemplary human pair that developed such alternating turn-taking behavior, using share of own choices (SOC) - the likelihood for each agent to select the own preferred color, and the share of left choices (SLC) - the likelihood for each agent to select the target on the left side. These measures provide direct insight into each agent’s value and side biases. Here the SOC curves show that they mostly jointly selected A’s or B’s color. After an initial period of long blocks of trials, this pair switched to shorter blocks. The SLC curves show that the agents did not display a side bias.
The SOC of all 19 pairs (Figure 2C) shows 53% (10 of 19) with balanced values close to 0.5 (indicating alternating between colors), and the remaining 47% (9 of 19) with at least one agent close to one or close to zero (indicating fixed color selection). Figure 2D reveals that most agents had no side bias, since the SLC was around 0.5. Figure 2E shows the average reward (AR) for each agent and the average joint reward, which in most pairs was above chance level (2.5 units), and in about half the pairs close to the optimum for the dyad (3.5).
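The share measures and average rewards used above are simple per-trial fractions and means. The sketch below illustrates them under assumed payoffs (2 for the own color, 1 for the other's color, plus 2 for both agents when they select the same target); these values are not quoted from the text but are consistent with the stated chance level of 2.5 and dyad optimum of 3.5. All function names and the example sequences are hypothetical.

```python
def share(flags):
    """Fraction of trials in which a binary flag (e.g. 'chose own color') is set."""
    return sum(flags) / len(flags)


def trial_rewards(a_own, b_own):
    """Rewards (A, B) for one trial under the assumed payoffs:
    2 for the own color, 1 for the other's color, +2 each if both chose the same target."""
    # Both agents hit the same target exactly when one chose own and the other did not.
    coordinated = a_own != b_own
    bonus = 2 if coordinated else 0
    return (2 if a_own else 1) + bonus, (2 if b_own else 1) + bonus


def average_rewards(a_own_seq, b_own_seq):
    """Return (AR of A, AR of B, average joint reward per agent)."""
    rewards = [trial_rewards(a, b) for a, b in zip(a_own_seq, b_own_seq)]
    ar_a = sum(r[0] for r in rewards) / len(rewards)
    ar_b = sum(r[1] for r in rewards) / len(rewards)
    return ar_a, ar_b, (ar_a + ar_b) / 2


# Hypothetical block-wise turn-taking: 4 trials on A's color, then 4 on B's color.
a_own = [1, 1, 1, 1, 0, 0, 0, 0]
b_own = [0, 0, 0, 0, 1, 1, 1, 1]
print(share(a_own), share(b_own))      # SOC of A and of B
print(average_rewards(a_own, b_own))   # balanced rewards at the dyad optimum
```

The same `share` helper applied to a per-trial "chose left target" flag would give the SLC. A pair visiting all four choice combinations equally often would obtain 2.5 under these assumed payoffs.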
From the sequence of choices and achieved rewards we derived three measures of coordination (Methods). The dynamic coordination reward (DCR) describes the average amount of reward a pair earned above (or below) the reward expected for independent choices with the observed choice frequencies per agent; zero DCR indicates coordination by chance alone, and values significantly different from zero indicate above-chance levels of coordination. The DCR values (Figure 2H) and the fairly balanced average rewards (Figure 2E) show that 10 of 19 human pairs converged on a form of “turn-taking”, or reciprocal coordination. These are the same 10 pairs that showed roughly balanced SOC and SLC values around 0.5 (Figure 2C,D). To better understand what drives the high dynamic coordination seen in Figure 2H, we calculated the mutual information (MI) between the sequences of target color choice (MI target, Figure 2F) and the side choice (MI side, Figure 2G) of both agents; these measures indicate how well one agent’s choice of color or side can be predicted from the other’s choice (Methods). Comparing the DCR and the two MI measures reveals that high DCR values coincide with non-zero values of both MI side and MI target, indicating that dynamic coordination corresponds to both correlated side and color choices between the agents.
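The two kinds of measure can be illustrated with a short sketch. It is not the procedure from the Methods: the DCR formula below is one plausible reading of the verbal definition (observed mean reward minus the mean reward expected if both agents chose independently with their observed frequencies), the payoffs are assumed (2 own, 1 other, +2 for coordination), significance testing is omitted, and all names are hypothetical. The mutual information is the standard plug-in estimate in bits.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in mutual information (bits) between two equally long discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mean_reward(a_own, b_own):
    """Per-agent mean reward of one trial under the assumed payoffs."""
    bonus = 2 if a_own != b_own else 0  # same target <=> exactly one agent chose own color
    return ((2 if a_own else 1) + (2 if b_own else 1)) / 2 + bonus


def dynamic_coordination_reward(a_own_seq, b_own_seq):
    """Observed mean reward minus the reward expected for independent choices."""
    n = len(a_own_seq)
    observed = sum(mean_reward(a, b) for a, b in zip(a_own_seq, b_own_seq)) / n
    pa, pb = sum(a_own_seq) / n, sum(b_own_seq) / n
    expected = sum(
        (pa if a else 1 - pa) * (pb if b else 1 - pb) * mean_reward(a, b)
        for a in (0, 1) for b in (0, 1)
    )
    return observed - expected


# Hypothetical turn-taking pair versus a pair statically coordinating on A's color.
turn_a, turn_b = [1, 1, 0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0, 1, 1]
static_a, static_b = [1] * 8, [0] * 8

for a_own, b_own in ((turn_a, turn_b), (static_a, static_b)):
    a_color = a_own                    # 1 = agent A selected A's color
    b_color = [1 - b for b in b_own]   # 1 = agent B selected A's color
    print(dynamic_coordination_reward(a_own, b_own), mutual_information(a_color, b_color))
```

In this toy case both pairs are perfectly coordinated, but only the turn-taking pair earns reward beyond what its marginal choice frequencies predict; the static pair has zero DCR and zero target MI because each agent's choice is constant.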
The 9 human pairs that did not show significant dynamic coordination used strategies other than turn-taking. These alternative strategies included color-based strategies such as: (i) both agents largely converging on a fixed color (5, 14, 16, 10, 18; indicating that at least one of the two agents understood the value of coordination), (ii) exclusively selecting the respective own preferred color (4, 19); and (iii) mixed color-side strategies where one agent consistently selected the non-preferred color while the other seemed to switch randomly between the two colors (11) or selected the right side (15). To summarize these results, more than half of the human pairs exhibited dynamic coordination and aimed for fairness.
Macaque pairs converge to simpler coordination strategies
Macaques were individually trained for multiple weeks to internalize the individually preferred reward values of the two color targets. All six macaques developed a strong preference for the large reward (range 78-100%, mean 95% large reward color selection in the last solo session), before being paired with a conspecific (Table S2). Generally, macaques changed their choice behavior more slowly than human participants, who often exhibited clear changes within a single session. We therefore collected the data from macaque pairs for multiple sessions to see how their behavior evolved over time and what strategy they converged on, instead of only sampling each macaque pair in a single session (note that we use the term “strategy” to conform to the standard game-theoretical notion, indicating a decision but making no assumptions about the underlying understanding which led to that decision, cf. 28). Figure 3A,B shows the color and side choice behavior for the first dyadic session of a typical example pair of monkeys. The SOC curves indicate that both agents had a strong preference for their own preferred colors; the SLC curves indicate no strong side selection bias. Later, in the sixth dyadic session (Figure 3C,D), the behavior had changed, with both agents displaying less own color selection and a stronger convergence on the left side of the screen (average SLC values closer to 1 as compared to 0.5 in the early session). This shift in strategy was gradual (Figure S1A-F) and resulted in an overall better outcome for both agents with nearly equalized reward (Figure S1C). The DCR did not reach significance for any session (Figure S1F), indicating that this pair did not employ a dynamic coordination strategy.
A similar picture emerges in the set of 9 pairs tested (with 6 unique macaques) (Figures 3E-J and S2). Comparing the average rewards in early and late sessions (Figure 3G, S2C) shows that most pairs over time reached better coordination, as the average reward across all pairs increased (Figure S3; early median 2.75 versus late sessions median 3.29, N: 9; p < 0.0039, two-sided exact Wilcoxon signed rank test). In 8 of 9 pairs, the proportion of coordinated trials increased significantly (p < 0.05, Fisher’s exact test of the proportion of coordinated versus uncoordinated trials in the early and late session).
Comparing early and late SOC and SLC (Figure 3E,F, and S2A,B) shows that this increase in coordination was mostly achieved either by converging on the same color target (pairs FC, LE, TC, TF and CL) or by converging on the same side (TE, CE, MC, and MF), also evident in non-zero values of target or side MI (Figure 3H,I). The low and with one exception non-significant DCR values in Figure 3J indicate that the resulting coordination was not achieved by dynamic turn-taking.
Converging on either the same color or the same side are both simpler strategies than dynamic coordination, by virtue of allowing each agent to make decisions without paying attention to the trial-specific actions of the partner. To summarize, macaque pairs learned to coordinate using simple strategies that did not require trial-by-trial integration of partner’s action.
Comparison of behavior between species
To compare the strategies of naïve human and macaque pairs, we plotted the mutual information values for side and color choices of each pair, for the last 200 trials (Figure 4). The distance of each pair’s location to the origin denotes the strength of (anti)coordination. Locations close to the x-axis denote static side strategies, and locations close to the y-axis denote static target color strategies. Ten of 19 human pairs were close to the main diagonal, corresponding to similar values of side and target MI. These locations denote trial-by-trial or block-wise turn-taking – a signature of dynamic coordination. In contrast to humans, the nine macaque pairs developed simple static side-based (4 pairs) or single color-based (5 pairs) coordination, with different levels of coordination strength, but did not employ dynamic coordination.
Macaques paired with a human confederate learn to follow
The absence of trial-by-trial coordination in macaque pairs suggests that they did not monitor their partners in order to take their choices into account on a trial-by-trial basis. This might be a consequence of the animals’ training history (starting with extended individual solo training to associate one color with higher reward) or a general inability to consider the partner’s choice in making immediate choices, at least when a simpler strategy also helps to maximize the reward. To test the latter, we trained two monkeys in the dyadic condition with a human confederate who followed a strict pattern of alternating between the two colors in blocks of mostly 20 trials (“confederate training”). If the rhesus monkeys were insensitive to partner’s choices in our dyadic task, the pattern of confederate’s choices should not affect monkey’s behavior.
Figure 5 and Figure S4 show examples from early and late confederate training sessions for two animals. In both cases the monkeys started with a strong bias for selecting their own preferred color but over time changed to a reliable coordination (cf. Figure 5A with 5C and S4A with S4C). We call this the “following” behavior since in both animals reaction times initially were faster than the confederate’s, and did not vary strongly depending on the confederate’s actions, but in later sessions the animals reacted considerably slower in blocks where the confederate was selecting the animals’ non-preferred color (compare Figure 5B with 5D, and S4B with S4D), waiting for the confederate to commit to an action.
The aggregate measures of choice behavior (Figure S5A,B, S5G,H) and coordination development (Figure S5C-F, S5I-L) over multiple confederate-paired sessions showed similarities in monkeys’ learning (cf. Figure S5A-F and S5G-L). Both animals reached close to 0.8 DCR units, which is similar to the values reached by humans employing turn-taking (cf. Figure 2H). These results indicate that macaques can take information from their partner into consideration when making decisions in the BoS paradigm. Reaction time dynamics hinted that macaques started to actively monitor confederate’s actions and only selected their non-preferred color if the partner started to reach there first.
Due to the repeating block strategy employed by the human confederate, it is not immediately clear, however, whether the monkeys made use of the action visibility and based their decision on the action observed within a trial, or rather on the reward history (e.g. switched after several uncoordinated trials at the beginning of a new block). To test how important the immediate access to the partner’s choice is for driving the “following” behavior, we performed a control experiment in which we placed an opaque barrier on the confederate’s side of the screen for roughly the middle third of a session. The barrier blocked the monkey’s view of the confederate’s hand, while keeping the face visible. Figures 5E,F and S4E,F show the strong effect this manipulation had on the monkeys’ behavior. In both monkeys the “following” behavior ceased, shifting to selecting the own preferred colors (Figures 5E, S4E). The reaction time difference histograms (Figures 5F, S4F) confirmed this observation: the significant difference between selecting the preferred (blue) and the non-preferred (red) color in the upper half of the plot for the transparent condition disappeared in the lower half of the plot showing the results for the opaque condition. In other words, monkeys failed to coordinate without seeing the confederate’s immediate actions. These results imply that the immediate visual information about partner’s actions and not merely monitoring own reward history drove and maintained the macaques’ “following” behavior in transparent settings.
Confederate-trained macaque pair shows dynamic coordination
After two macaques had learned to monitor and follow a partner’s actions, we tested how these two animals would perform when paired with each other again. At the end of initial naïve sessions of this pair (i.e. before training with a human confederate), monkey F (agent A) mostly selected his own preferred color while monkey C (agent B) either selected the left side or monkey F’s preferred color. At the end of the confederate training, both monkeys followed the confederate with a block length of approximately 20 trials. Would they express turn-taking behavior or would they fall back to their initial static color-coordination behavior?
Figure 6 shows the share of own choices and the reaction time differences for the first three sessions after the re-pairing (Figure 6A-C), as well as the choices and aggregate measures of choice behavior and coordination over the course of all 6 sessions (Figure 6D-I). In the first session, after some initial back and forth, the pair converged on mostly selecting B’s color, with B reacting faster (Figure 6A). Animal A occasionally selected his own color repeatedly, even in the latter part of the session. In the third session, the pattern was inverted, with the pair converging mostly on A’s color, with A reacting faster (Figure 6C). This time animal B repeatedly tried to select his preferred color, and A actually followed for three short periods. The most interesting behavior, however, developed during the second session, in which the animals repeatedly alternated between the two colors in long blocks (Figure 6B). Block size was on average ∼45 trials, i.e. noticeably longer than the ∼20 trial blocks employed during the confederate training.
If action visibility is driving this form of dynamic coordination, then the question of who is leading (“insists” on own preference) and who is following (“accommodates” to the other’s preferred target) should depend on the relative reaction times of the two agents. Indeed, in the session in which the animals expressed turn-taking behavior (Figure 6B), it was the faster animal that selected the own preferred color, and the slower animal accommodated, demonstrating dynamic coordination and suggesting a competitive interaction. The mutual information measures as well as the DCR values (Figure 6G-I) indicate that in the 2nd session and the 5th session after confederate training, this pair exhibited robust dynamic coordination resulting in turn-taking behavior.
Comparison of dynamic coordination between species
At first glance, the observed turn-taking in the confederate-trained macaques resembled the long turn-taking observed in some human pairs (cf. Figure 4, FC-2,5). Yet, the analysis of reaction time distributions suggests that macaques employed a competitive version of turn-taking in which the faster agent selects its own color (insists) while the slower agent follows (accommodates). Such behavior would be in line with our theoretical predictions 8, where we show with evolutionary simulations that competitive turn-taking provides the most effective strategy for a BoS type game when players have a high probability to observe choices of the partner. To further test the hypothesis that this macaque pair developed competitive turn-taking, we performed a correlation analysis between the modeled visibility of the faster agent’s action by the slower agent and the observed likelihood of following to the faster agent’s color (the faster animal nearly always opted to select his own color). Figure 7A shows agent A’s probability to select his non-preferred color, the “share of other’s choices” (the inverted share of own choices) and the modeled probability for A to see B’s choice before A made his choice; Figure 7B shows the same for agent B (see Methods for details on modeling). In both monkeys there was a strong correlation between the model and the observation, ranging from 0.92 for smoothed data to 0.64 for unsmoothed data, all significant at p < 0.00005. This analysis further supports the notion that the confederate-trained macaque pair had developed a competitive turn-taking (alternating Leader-Follower) strategy. Strong positive correlations between action visibility and following behavior were present in all 6 post-confederate-trained sessions, indicating bouts of dynamic coordination (Table S6).
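The logic of this correlation analysis can be sketched as follows. The actual visibility model is specified in the Methods and is not reproduced here; the logistic form, the parameters `min_lead_ms` and `slope_ms`, the function names, and the reaction times below are all assumptions made purely for illustration.

```python
from math import exp, sqrt


def p_see_partner(rt_self_ms, rt_partner_ms, min_lead_ms=50.0, slope_ms=25.0):
    """Assumed logistic visibility model: the earlier the partner acted relative to
    oneself, the more likely the partner's action was seen before the own movement."""
    lead = rt_self_ms - rt_partner_ms  # positive when the partner moved first
    return 1.0 / (1.0 + exp(-(lead - min_lead_ms) / slope_ms))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical trials: reaction times (ms) of agents A and B, and whether A
# selected B's color (1) or his own color (0) in that trial.
rt_a = [620, 640, 410, 400, 650, 390]
rt_b = [430, 450, 600, 610, 440, 620]
a_chose_other = [1, 1, 0, 0, 1, 0]

visibility_for_a = [p_see_partner(a, b) for a, b in zip(rt_a, rt_b)]
print(pearson_r(visibility_for_a, a_chose_other))
```

In this constructed example agent A follows exactly in those trials in which B acted well before him, so the correlation is close to 1. With real data, both series would typically be smoothed over a window of trials before correlating, which is why smoothed and unsmoothed correlations can differ.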
In contrast, human pairs did not show a strong relationship between partner’s action visibility and choice. Even in the human pair with the strongest positive correlations (pair 12, Figure 7C,D), the “following” behavior was weak and resulted from a gradual transition from B initially following A to A following B later in the session, rather than from alternating turn-taking as seen in Figure 6B. This does not imply that reaction times did not play a role in human choices. Indeed, in 47% (9 of 19) of human pairs the relative difference in reaction times between the agents significantly differed in coordinated trials compared to non-coordinated trials (Table S3). This implies that, similarly to macaques, seeing the faster agent’s choice can help the slower agent to coordinate. But unlike the macaque pair, dynamic human coordination was not driven by the faster agent striving to select the own preferred color. Only one human pair out of 10 that exhibited dynamic coordination showed a significant difference in reaction times for coordinated choices favoring A vs. B, with each agent being faster when selecting the own preferred color (pair 12; coordination on A: mean: -88, SD: 129, N: 96; coordination on B: mean: 12, SD: 109, N: 180; t(168.8): -6.5, p: 10 × 10^-6, t-test). See Figure S6 and Tables S4-S7 for details of correlation analyses in different groups.
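The test statistic reported for pair 12 can be approximately recomputed from the summary values alone. The fractional degrees of freedom suggest an unequal-variance (Welch) t-test; that reading is an assumption, as is the helper name `welch_t`. Because the means and standard deviations are given in rounded form, the result can only be expected to match the reported values approximately.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of the two means
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reaction time differences of pair 12: coordination on A vs. coordination on B.
t, df = welch_t(-88, 129, 96, 12, 109, 180)
print(round(t, 1), round(df, 1))
```

The recomputed t value rounds to the reported -6.5, and the degrees of freedom come out near 168, in line with the reported t(168.8) given the rounding of the inputs.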
The correlation analysis shows that for the confederate-trained macaques the visibility of the other’s action before making own decision correlated with the probability to follow the choice of the other in a trial-by-trial fashion, with the faster monkey “selfishly” selecting its own preference. Turn-taking in humans, however, did not rely on temporal competition to systematically steer the pair towards the faster agent’s preferred color. To highlight this difference, we asked how the two species established and maintained the dynamic coordination (where it was present). The trial-by-trial sequences of choices suggested that confederate-trained macaques transitioned between periods of coordination using short periods of non-coordination (Figure 8A, magenta arrows), while turn-taking humans tended to switch seamlessly between the two coordination options (despite verbal communication being prohibited). Figure 8B illustrates possible uncoordinated and seamless switch patterns, sorting them into “selfish” and “benevolent”. Figure 8C-F provides a detailed quantification of different aspects of transitions as scatter plots of benevolent choices on the y-axis and selfish choices on the x-axis, in the two species. Figure 8C compares the frequency of benevolent and selfish choices after the faster agent switched seamlessly from one coordination combination to the other. Turn-taking humans show a high and roughly equal number of benevolent and selfish seamless switches, indicating overall balanced switching behavior. The four naïve macaque pairs showing the highest number of seamless switches all used a side-based strategy, which due to the color-to-side randomization (Methods) trivially generated seamless color switches. Confederate-trained macaques and non-turn-taking humans showed only very few seamless switches.
As an alternative to seamless switches, agents can also end the current coordination block by switching to non-coordination (Figure 8A,B). Figure 8D shows, for each pair, the total numbers of own-own vs. other-other non-coordinated trials. These trials can be used to initiate a switch of coordination (cf. Figure 2A, from ∼trial 275 to the end: in this human pair three coordination switches were initiated by “selfish” own-own trials and four by “benevolent” other-other trials). All macaque pairs/sessions are below the unity-diagonal, indicating a selfish preference to select the own color. Turn-taking humans tend to show fewer non-coordinated trials and lie along the diagonal, indicating that in about half of these trials there was a benevolent intention to initiate the switch to the other’s preferred color.
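The trial categories behind this comparison can be made concrete with a short sketch. It is an illustration only: the function name, the label strings, and the toy choice sequences are hypothetical and are not taken from the analysis code of the study.

```python
from collections import Counter

def classify_trial(a_chose_own: bool, b_chose_own: bool) -> str:
    """Label one trial by the joint choice of agents A and B."""
    if a_chose_own and b_chose_own:
        return "own-own"        # both insist: non-coordinated, "selfish"
    if not a_chose_own and not b_chose_own:
        return "other-other"    # both yield: non-coordinated, "benevolent"
    # Exactly one agent picked its own color, so both touched the same target.
    return "coordinated-on-A" if a_chose_own else "coordinated-on-B"

# Hypothetical toy session: True means the agent selected its own preferred color
a_own = [True, True, True, False, False, True]
b_own = [False, False, True, True, False, False]

counts = Counter(classify_trial(a, b) for a, b in zip(a_own, b_own))
print(counts["own-own"], counts["other-other"])
```

A pair plotted at (own-own count, other-other count) falls below the unity diagonal when own-own trials outnumber other-other trials, which is the pattern described above for the macaques.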
The above analysis and the inspection of choice sequences (cf. Figure 8A) show that in the confederate-trained macaques coordination epochs were separated by epochs of non-coordination that looked like each agent “challenged” the other to accommodate. This poses the question whether there was a temporal component in how a pair initiated the challenging epochs and how they resolved those challenges. Of particular interest are the confederate-trained macaques because they showed a very strong dependence between the relative reaction time difference and the choice. Focusing on the transitions from coordinated to non-coordinated epochs and vice versa, Figure S7 shows that the faster agent slowed down and the slower agent sped up during the transition to non-coordination. Conversely, the agents sped up to initiate the coordination on their own color, and the accommodating agents slowed down.
Figure 8E compares, for all coordination-to-non-coordination transitions, the number of own vs. other’s color selections by the faster agent. Here all macaque pairs/sessions are located below the diagonal, indicating that the faster agent tended to act selfishly, while turn-taking humans mostly lie along the diagonal, again indicating balanced switching behavior. Finally, Figure 8F compares the number of times the faster agent selected own or other’s color in trials in which the pair resolved a challenge by switching from non-coordination to coordination. Confederate-trained macaque sessions and 6 out of 9 naïve macaque pairs fall below the diagonal, indicating selfish choices of the faster agent. Turn-taking humans, however, lie along the diagonal, indicating balanced challenge resolution.
Taken together, this shows that in turn-taking human pairs there are no indications of competitive turn-taking, as the faster agents showed balanced selection of both coordination options. Furthermore, in 4 out of the 10 turn-taking pairs one agent was significantly faster than the other in coordinated trials as compared to non-coordinated trials, regardless of whether it was coordination on its own or the other’s color (Table S3). This means that such an agent by necessity initiated switches to and from the own color, while being able to see the faster agent’s action likely helped the slower agent to maintain coordination and to accomplish the observed high rate of seamless switches. In contrast, the confederate-trained macaques showed no seamless switches, but displayed temporal competition in which the faster agent led to his own color.
Discussion
We studied macaque and human pairs in a coordination game, which offered higher rewards for selecting the same option, but entailed an inherent conflict about which of the two coordinated options to select. Both species largely converged on coordinated behavior but in a markedly different fashion. Many human pairs converged on nearly-optimal coordination and fair (cooperative) turn-taking that equalized the rewards of the two partners. Macaques, instead, tended to exploit simpler solutions that allowed them to maximize their reward without the need to track immediate actions of the partner, i.e. using static instead of dynamic coordination. The macaques’ behavior however significantly shifted from such simple strategies to competitive turn-taking after they were trained to observe and attend to the other’s choice with a human confederate. In post-confederate-training sessions, the choice behavior was highly correlated with trial-by-trial differences in the reach reaction times: the faster monkey chose his preferred option and the slower one followed. Our results show that both humans and macaques can take information about the other’s action into account (when available) before making their own decisions. There was, however, a fundamental difference: when coordinating dynamically, monkeys showed competitive turn-taking while in humans the faster agent often “offered” switching to the partner’s preferred color, exhibiting a form of benevolent and fair turn-taking.
Coordination in naïve humans and macaques
Half of the human pairs (10 of 19) converged on dynamic coordination that balanced the reward, developing fair, cooperative turn-taking, while another 5 pairs coordinated on the same fixed color throughout the session. Similarly to the approach of Brosnan and colleagues 28,37, the human subjects in the present study had to infer the underlying payoff rules while playing, to closely match the procedure in macaques. Even under such conditions, the majority of humans arrived at optimal behavior within a single session, in our study as well as in other coordination and anti-coordination games 28,31,37. The observed turn-taking frequency was also close to other variants of BoS games with explicit instruction of the payoff matrix where approximately 60% of human subjects developed turn-taking 40–43, slightly higher than the 41% in the Conflict game 37.
In contrast to humans, macaque pairs converged on simpler strategies, either selecting the same color (56%) or selecting the same side (44%). These two strategies require less trial-by-trial coordination effort than the turn-taking. All but one macaque coordinated both via same color or same side selection depending on the partner, and no macaque persisted on the own color with all tested partners. For instance, monkey F insisted on his preferred color when playing with C, accommodated to the other’s color when playing with T and converged on his right-side when playing with M. This indicates that macaques took the partner’s actions into account, albeit not in the strict trial-by-trial fashion. Macaques started with insisting on their own color (reward size 2 drops), then one agent changed either to selecting the non-preferred color (increasing his reward to 3 and the other’s to 4 drops) or to selecting one side (increasing his reward to an average of 2.5 and the other’s to 3 drops). If the other monkey started favoring the same side, this increased both agents’ reward up to 3.5. This sequence of intermediary steps illustrates how macaque pairs might develop their strategy over the course of several sessions.
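The reward values quoted for these intermediary steps follow directly from the payoff rules given in Methods (2 units for the own color, 1 for the other color, plus 2 for each agent when both select the same target) and the equal probability of the two color-to-side layouts. The sketch below enumerates them; the strategy helpers (`insist`, `accommodate`, `pick_side`) are hypothetical names introduced for illustration, and sides are “objective” screen sides shared by both agents.

```python
PREFERRED = {"A": "red", "B": "blue"}   # color preferred by each agent

def payoff(chose_own: bool, same_target: bool) -> int:
    """Reward units for one agent on one trial."""
    return (2 if chose_own else 1) + (2 if same_target else 0)

def insist(agent):
    """Always select the own preferred color."""
    return lambda red_side: PREFERRED[agent]

def accommodate(agent):
    """Always select the partner's preferred color."""
    other = "B" if agent == "A" else "A"
    return lambda red_side: PREFERRED[other]

def pick_side(side):
    """Always select whatever color appears on a fixed objective side."""
    return lambda red_side: "red" if side == red_side else "blue"

def expected_rewards(strat_a, strat_b):
    """Average (A, B) reward over the two equally likely layouts."""
    layouts = ("left", "right")          # objective side of the red target
    total_a = total_b = 0
    for red_side in layouts:
        color_a, color_b = strat_a(red_side), strat_b(red_side)
        same = color_a == color_b
        total_a += payoff(color_a == PREFERRED["A"], same)
        total_b += payoff(color_b == PREFERRED["B"], same)
    return total_a / len(layouts), total_b / len(layouts)

print(expected_rewards(insist("A"), insist("B")))             # both insist
print(expected_rewards(accommodate("A"), insist("B")))        # A yields to B's color
print(expected_rewards(pick_side("left"), insist("B")))       # A goes for one side
print(expected_rewards(pick_side("left"), pick_side("left"))) # both use the same side
```

The four calls give (2, 2), (3, 4), (2.5, 3) and (3.5, 3.5), matching the sequence of steps described above.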
While our study is to the best of our knowledge the first implementation of a classic BoS paradigm in macaques, other related 2×2 games have been investigated in different primates (humans, chimpanzees, macaques and capuchins). In some games human and nonhuman species show qualitatively comparable capabilities to coordinate choices and to maximize payoffs. This holds for simple coordination games such as Stag Hunt 28,31, as well as for the Prisoner’s Dilemma, where macaques showed mutual cooperation significantly above chance and mutual defection below chance 36, similarly to humans 33. Conversely, the behavior of humans and nonhuman primates differs when the competition can be resolved by turn-taking. In the anti-coordination Conflict game where maximum joint reward is obtained if each agent selects a different icon 37, capuchins and macaques converged on only one of the two possible asymmetric equilibria (much like 4 of our 9 naïve macaque pairs), while 11 out of 27 human pairs balanced payoffs by alternating between the two anti-coordination equilibria (similar to the 10 turn-takers out of our 19 pairs).
Action visibility and dynamic coordination
The lack of spontaneous dynamic coordination in naïve macaques in our experiments cannot be attributed to their inability to infer the upcoming choices of the partner, because they could observe partner’s actions through the transparent display. This setup is different from traditional economic games, which are either played simultaneously or sequentially. It has been demonstrated that coordination (or anti-coordination) typically is improved with increasing information about the other’s choice by going from the strictly simultaneous to the sequential mode 53,54. The transparent game approach we adopted here differs from those two classical modes in that each subject can decide independently when to act (within a certain time window). Brosnan and colleagues used a similar approach to show that compared to an opaque simultaneous setting, macaques and capuchins improved their anti-coordination in a transparent (called asynchronous) version of a Conflict game. In this study, subjects looked at the same monitor while sitting next to each other and could observe each other’s decisions as cursor movements 37.
In our setup, agents sat opposite to each other and saw actual eye and hand movements, combining face-to-face and action visibility. In humans, face-to-face visibility in a simultaneous iterated Prisoner’s Dilemma (iPD) and in the Ultimatum game significantly improves mutual cooperation, even without action visibility 55,56. This supports the idea that non-verbal social signals and more generally, the observable presence of others influence decision-making. The ongoing action visibility per se also had a profound effect on mutual cooperation in iPD 57 and in a web interface-based anti-coordination game conceptually similar to BoS 47. In a competitive reaching task, a face-to-face transparency allowed human subjects to glean useful information from observing the relevant hand effector and from seeing the face and the full body of an opponent 58. Seeing actual movements rather than relying on abstract representations of others such as cursor motions might be especially crucial in nonhuman experiments.
The importance of action visibility is also supported by computational modeling. In the continuous-time cooperation games such as iPD and Stag Hunt where agents could observe and respond to each other’s actions in real-time, cooperation by coaction is more easily obtained and stabilized against exploitation than the cooperation that relies on delayed reciprocity 45. For the iPD and BoS transparent games, we showed that different coordination strategies are preferable for different probabilities of seeing partner’s choice 8,9. When these probabilities are low (or when agents do not utilize the action visibility), simple strategies like “Win-stay, lose-shift” are most effective. For higher probabilities of seeing partner’s choice more complex strategies emerge, such as coordinated turn-taking and temporal Leader-Follower, where the faster agent determines the choice of the slower agent. For the selfish agents this means that the faster one insists on its individually-preferred own target and the slower accommodates 8.
Given that many human pairs exhibited spontaneous dynamic coordination, while naïve macaques did not, we can ask if and how these two species utilized the action visibility. Our findings suggest that humans relied on action visibility for seamless switches from one color to another, and for maintaining the coordination within a block of trials. But apart from one pair, there was no indication that humans employed the competitive variant of the Leader-Follower strategy. Instead, faster agents were as likely to switch from their own to the other’s color, indicating benevolent turn-taking. Very different dynamics transpired in additional experiments with macaques. While naïve macaques did not seem to monitor each other’s choices in each trial, pairing them with a human confederate playing a color alternation strategy in short blocks demonstrated that macaques are capable of closely following the other’s actions and of adopting the imposed “turn-taking”. Moreover, temporarily blocking the view of the confederate’s hands abruptly changed macaque behavior, disrupting the coordination. The macaques’ coordination was thus driven by action visibility and not by simple strategies like win-stay-lose-shift or trial counting. In line with the Leader-Follower strategy, macaques were faster than the confederate when selecting own color, but waited for the confederate when selecting the confederate’s color. Intriguingly, pairing the two macaques that completed the training with a human confederate resulted in behavior different from both naïve macaque behavior and from the “following behavior” with the confederate. The confederate-trained macaques competed, with the faster agent selecting the own color and the slower agent following, such that their choices depended on the relative difference between their reaction times. This behavior resulted in either sustained coordination on one fixed color or in turn-taking.
Comparing the temporal signatures of the reaction time differences in trials around a switch to or from coordinating on a specific color further indicated that macaque turn-taking was competitive and dynamic in nature. The break of coordination was triggered by the faster agent slowing down and the slower one speeding up and selecting the own color; conversely, the transition to coordination was associated with the speeding up agent selecting the own color and the slowing down agent accommodating.
In summary, both species exploited action visibility to achieve and/or maintain the turn-taking, emphasizing the “transparency” of interactive behavior as an important determinant of emerging strategies. Human agents initiated switches to and from their preferred colors to “fairly” balance the payoffs, similarly to the Conflict game 37, while macaques established competitive dynamics, as predicted by our evolutionary simulations 8. These results add to the body of literature indicating that humans in a social setting might base their decisions not only on pure reward maximization 59. It is also plausible that such normative behavior can reflect prospective planning to ensure a stable persistence of the individually-beneficial cooperation. The lack of the cooperative turn-taking in nonhuman primates, as compared to the humans’ propensity to engage in it, might reflect cognitive limitations in long-term planning and perspective taking 60. Species differences in general cooperative behavior may explain both the reluctance of naïve macaques to coordinate dynamically and the competitive nature of their dynamic interactions. The transparent BoS game requires the agents to cooperate in close proximity in the pursuit of an immediate reward, which necessitates some degree of social tolerance by both partners. We can take natural food sharing behavior as a proxy for social tolerance. Food sharing behavior is not equally prevalent among primate species: it has been described for humans, apes, and some baboon species 61, and some New World monkey species including capuchins share food even between adults, while rhesus macaques do not even share food with their offspring 62. In line with these patterns, mutually beneficial alternating task performance and turn-taking has been observed in humans 23, apes 63–65, and capuchins 66, but not in more despotic, less tolerant rhesus macaques 5.
Limitations and future directions
There are several limitations in our study that have to be considered and possibly addressed in future experiments. Firstly, we only tested 6 macaques (and 9 pairs), all males, and only one pair was housed together (others were from neighboring but separate enclosures). Therefore, we cannot say much about the influence of social rank. Anecdotally, however, it did not seem that the more dominant agent always prevailed. For instance, one subordinate animal (monkey T), the smallest and likely to be subordinate to all other partners (cf. 67), successfully “insisted” on his own color in 2 out of 3 cases. Furthermore, we only tested two confederate-trained macaques so far. While the results are exciting and show that macaques can engage in dynamic competitive turn-taking, future experiments will need to test how generalizable this pattern is.
Secondly, due to the task target color/side randomization, selecting targets on the same side of the display resulted in efficient and fair coordination. We cannot say if fairness was a contributing factor in convergence of the four naïve macaque pairs to this strategy, although we deem it highly unlikely. A new experiment where the probability of a specific color to appear on the same side is parametrically modulated is needed to evaluate the conditions under which this strategy emerges (or what level of unfairness is accepted). Furthermore, it would be interesting to test human pairs in several sessions, similar to macaques, to see if humans might also converge on the fixed side strategy as a means of fair and seamless coordination. This would, however, require preventing the human participants from any verbal contact after and between the sessions. Finally, it would be important to vary the ratio and the range of reward magnitude (“stakes”), as it has been shown that stakes may strongly affect the individual 68 and social 47 decisions.
Notwithstanding these limitations, our results contribute novel insights to understanding social decision processes as they unfold in real-time during transparent interactions, and offer a new route to further behavioral and neural investigations of dynamic decision-making in cooperative and competitive contexts.
Funding
This work was supported by funding from the Ministry for Science and Education of Lower Saxony (“Top Level Research in Lower Saxony”) and the Volkswagen Foundation through the “Niedersächsisches Vorab”, https://www.volkswagenstiftung.de/en/funding/niedersaechsisches-vorab (JF, AG, ST, IK). Additional support was provided by the Leibniz Association through funding for the Leibniz ScienceCampus Primate Cognition, https://www.primate-cognition.eu (JF, AG, ST, IK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Authors’ contributions
JF, ST and IK wrote the key grants to support this study. SM, AMU, AG, ST and IK conceptualized and designed the task. SM implemented the task and conducted the experiments. SM and AMU analyzed the data. SM, AMU and IK prepared figures. SM, AMU and IK wrote the initial version of the manuscript. All authors discussed and interpreted the findings, and revised the manuscript.
Competing interests
The authors declare no competing interests.
Data availability
The datasets generated and analyzed in the current study are available from the corresponding author on reasonable request, and will be uploaded to Open Science Framework data repository (https://osf.io/).
Methods
Participants
Humans
38 right-handed subjects (23 females, mean age: 26.1 ± 4.1 SD, range 20 to 41 years) participated in the study as paid volunteers. Subjects were tested as 19 unique pairs, i.e. each participant contributed only once. Instructions given to the subjects before the experiment are provided in Supporting Information. In short, subjects were instructed how to operate the setup and to interpret the auditory feedback as an indicator of the earned reward. They were not given an explicit description of the task’s payoff structure beyond “Your reward will depend on your own and your partner’s choice”. Prior to the experiment, subjects were individually familiarized with the setup and practiced a single-player (“solo”) version of the task. All subjects gave written informed consent for participation after the procedures had been explained to them and before taking part in the experiment. Experiments were performed in accordance with institutional guidelines for experiments with humans and adhered to the principles of the Declaration of Helsinki. The experimental protocol was approved by the ethics committee of the Georg-Elias-Mueller-Institute for Psychology, University of Goettingen (GEMI 17-06-06 171). We excluded 4 additional human subjects (2 pairs) from the analyses due to experimental differences (one pair did not perform the initial individual training; one pair ended the experiment prematurely).
Macaques
Research with nonhuman primates represents a small but indispensable component of neuroscience research. The scientists in this study are aware of and committed to the responsibility they have in ensuring the best possible science with the least possible harm to the animals 13.
Six adult male rhesus monkeys (designated by initials C, E, F, L, M, T) participated in the study, yielding 9 pairs (each monkey participated in 2 or more pairs). Animals were extensively trained with positive reinforcement to climb into and stay seated in a primate chair. The experimental procedures were approved by the responsible regional government office (Niedersaechsisches Landesamt fuer Verbraucherschutz und Lebensmittelsicherheit (LAVES), permits 3392-42502-04-13/1100 and 3319-42502-04-18/2823). The animals were pair- or group-housed in facilities of the German Primate Center (DPZ) in accordance with all applicable German and European regulations. The facility provides the animals with an enriched environment, including a multitude of toys and wooden structures 69,70, natural as well as artificial light and access to outdoor space, exceeding the size requirements of European regulations, and a rich diet including primate biscuits, fruit and vegetables. During the study the animals had unrestricted access to food and fluid, except on the days when data were collected or the animal was trained on the behavioral paradigm. On these days, the animals were allowed access to fluid through their performance in the behavioral paradigm. The DPZ’s veterinarians, the animal facility staff and the lab’s scientists carefully monitor the animals’ welfare.
Experimental setup
For maximal comparability of the human and the monkey behavior, we developed a novel Dyadic Interaction platform in which two human or nonhuman primate subjects co-act in a shared workspace while sitting face-to-face (Figure 1A, B). Joint dyadic tasks have been previously implemented in a side-by-side setting with shared or separate workspaces for each subject 3,36,37,71,72, or in table-like settings with opposing subjects acting in a horizontal workspace that is not in the line of sight between the subjects’ faces 73,74. Only a few studies utilized a face-to-face arrangement of the subjects, using a video projector and two semitransparent mirrors virtually placing the stimuli into the shared plane 75, or incorporating physical targets into a transparent Plexiglas screen 58. Similarly, with our design we aimed for maximal availability of mutual social signaling in a face-to-face setting, a shared vertical workspace with computer-controlled stimuli in the direct line of sight, without risk of injuries due to physical contact, and suitability across species. We achieved this using a novel transparent display (1920 x 1080 pixels, 121 cm x 68 cm, 60 Hz, EYE-TOLED-5500, Eyevis, Reutlingen, Germany), amended for dual-side touch sensitivity (PQLabs, G5S, Fremont, CA, USA) with a custom-built “sandwich” construction. Sitting on either side of the display, both subjects saw each other and the same screen display. Two proximity sensors (Carlo Gavazzi CA18-CA30CAF16NA, Lainate, Italy) per subject, for the left and right hand respectively, mounted below the screen (“home” buttons), and the two touch panels mounted on either side of the screen registered hand positions of both agents at 240 Hz temporal resolution.
Experimental control and stimulus presentation were implemented using the EventIDE software package (Okazolab, Delft, The Netherlands). Liquid reward for monkeys was delivered via computer-controlled peristaltic fluid pumps, for every correctly performed trial.
Dyadic decision-making task
Bach-or-Stravinsky game
Since we are interested in the effect of mutual action visibility on coordination behavior, we implemented a transparent version of a Bach-or-Stravinsky (BoS) game, in which each player’s time-continuous visuomotor behavior can be seen by the other player. Conceptually, in the BoS game two agents are choosing between going to Bach or Stravinsky concerts. Agent A prefers Bach, Agent B prefers Stravinsky; yet, both prefer going to the concert together 76. Thus, agents wish to coordinate their behavior but have conflicting interests. This is a classic 2×2 non-zero-sum game (also known as the Battle of the Sexes) with two pure strategy Nash equilibria (Bach-Bach, or Stravinsky-Stravinsky), and one less efficient mixed strategy Nash equilibrium, where the agents go to their preferred event more often than the other 26,39.
In our implementation, the agents were choosing between two options represented by two differently colored targets placed left and right on the screen. Each target color was associated with a higher reward for one of the agents resulting in individually preferred targets. An agent selecting own preferred target was assured to get at least 2 reward units, while selecting the other target (individually non-preferred target) yielded at least 1 reward unit. Additionally, when both agents selected the same target, a bonus of 2 reward units was added to the payoff of each agent. Thus, the maximum average joint reward (3.5 units) was obtained on coordinated trials when one agent selected own preferred target (getting 4 reward units) while the other chose non-preferred target (getting 3 reward units). Note that on any given coordinated trial, i.e. when agents selected the same target for overall higher reward, this payoff matrix (Figure 1C) resulted in an unequal (i.e. unfair) reward distribution. Hence, the BoS paradigm probes both the ability to realize that coordinated target-selection results in higher rewards than non-coordinated target-selection, as well as the ability to perceive and counteract the unfairness/conflict situation.
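The payoff rules just described can be written out as a 2×2 matrix and checked for pure-strategy Nash equilibria. The following is a minimal sketch, not the task code: it uses the “red”/“blue” labels of the figures, takes the stated minimum rewards (2 and 1 units) as the base rewards, and the function names are ours.

```python
from itertools import product

COLORS = ("red", "blue")                # red: preferred by A, blue: preferred by B
PREFERRED = {"A": "red", "B": "blue"}

def reward(agent, own_choice, other_choice):
    """Reward units for `agent`: base reward plus coordination bonus."""
    base = 2 if own_choice == PREFERRED[agent] else 1
    bonus = 2 if own_choice == other_choice else 0
    return base + bonus

# payoffs[(choice of A, choice of B)] = (reward of A, reward of B)
payoffs = {
    (a, b): (reward("A", a, b), reward("B", b, a))
    for a, b in product(COLORS, COLORS)
}

def is_pure_nash(a, b):
    """True if neither agent gains by unilaterally changing its choice."""
    reward_a, reward_b = payoffs[(a, b)]
    a_is_best = all(reward_a >= payoffs[(alt, b)][0] for alt in COLORS)
    b_is_best = all(reward_b >= payoffs[(a, alt)][1] for alt in COLORS)
    return a_is_best and b_is_best

equilibria = [cell for cell in payoffs if is_pure_nash(*cell)]
print(payoffs)
print(equilibria)
```

Only the two coordinated cells, with payoffs (4, 3) and (3, 4), pass the check, and each gives an average joint reward of 3.5 units, in line with the description above.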
The task
On every trial, each subject chose between the individually-preferred and non-preferred target on the touchscreen. Targets were light blue circles of 35 mm diameter with either a red (individually-preferred target for Agent A) or a yellow (individually-preferred target for Agent B) ring (for better visualization, we replaced yellow and red rings with blue and red solid targets in the figures and the text).
Subjects had to place both hands on the two home buttons for 500 ms to start a trial after the inter-trial interval (ITI). This allowed us to control which effector (acting hand) was used (Figure 1D). Then an initial fixation target without a colored ring appeared on the screen (10 cm below eye level). Subjects had 1500 ms to touch this initial fixation target on the screen with the instructed hand. After both subjects touched the fixation target they had to hold it for 500 ms. Then two choice targets appeared at one of three different pairs of positions (140 mm to the left and right of the central fixation target and either at the same height as the central fixation target or 35 mm below or 35 mm above it). We randomized the “red” location equally over all six positions, balanced in sets of 18 trials with each of the six “red” positions appearing three times, and vertically mirrored the opposite location. Simultaneously with the targets’ appearance the initial fixation target disappeared, which served as a go signal. Subjects had 1500 ms to make their choice and touch one of the targets. After both subjects acquired their chosen target, the selected target(s) brightened up, and both subjects needed to hold the hand on the target for another 500 ms. At that point, the choices were evaluated and rewards were dispensed according to the payoff matrix. The amount of reward earned by each subject was signaled by two sequential series of auditory pulses, with different pitch for each subject. Each pulse was constructed as a harmonic series with 12 overtones and a fundamental frequency of 443 Hz for side A and 733 Hz for side B to provide distinct sounds for each agent. For the monkeys, we immediately delivered water as reward (approximately 0.14 ml per pulse) concurrently with the auditory pulses for the respective monkey’s side; humans were instructed to expect “a few” cents per pulse, and the accumulated earnings were paid out as a lump sum after the experiment.
After the reward period subjects had to wait for an inter-trial-interval of 1500 ms before they could initiate the next trial.
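The balanced position randomization described above can be sketched as follows. This is an illustration under stated assumptions, not the EventIDE implementation: coordinates are in mm relative to the central fixation target, and “mirrored” is read here as a reflection about the vertical midline, so that the second target sits at the same height on the opposite side. The names `RED_POSITIONS` and `balanced_block` are hypothetical.

```python
import random
from collections import Counter

X_OFFSET_MM = 140                 # horizontal distance from the fixation target
Y_OFFSETS_MM = (-35, 0, 35)       # below, level with, above the fixation target

# The six possible locations of the "red" target
RED_POSITIONS = [(x, y) for x in (-X_OFFSET_MM, X_OFFSET_MM) for y in Y_OFFSETS_MM]

def balanced_block(rng, repeats=3):
    """One shuffled set of trials with every red position used `repeats` times."""
    positions = RED_POSITIONS * repeats          # 6 positions x 3 = 18 trials
    rng.shuffle(positions)
    # Assumption: the other target is placed at the left-right mirrored location.
    return [{"red": (x, y), "other": (-x, y)} for x, y in positions]

rng = random.Random(0)            # fixed seed only to make the sketch reproducible
block = balanced_block(rng)
counts = Counter(trial["red"] for trial in block)
print(len(block), sorted(counts.values()))
```

Shuffling within each set of 18 keeps the red target on each side in exactly half of the trials of a set, while the order within the set remains unpredictable.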
The side of the targets was randomized on each trial, i.e., in ∼50% of trials the red target was on the right and the blue on the left, and vice versa. This design can lead to the following three coordination patterns (or any mix of the three):
Coordinating statically by each agent repeatedly selecting the same fixed color of the target, irrespective of unfair distribution of the rewards.
Coordinating statically by each agent repeatedly selecting the same fixed side. Due to color/side randomization, this pattern ensures a fair reward distribution.
Coordinating dynamically by both agents selecting the same target, while picking from both colors and both sides trial-by-trial (e.g. trial 1 red right, trial 2 red left, trial 3 blue left, … etc.). This could result in a fair or unfair reward distribution depending on the ratio of red to blue color selection.
Human procedure
Human subjects were recruited via a university job website and pairs were selected based on matching schedules. Subjects were given a brief introduction to the experiment (see Supporting Information); this material included the information “Your reward will depend on your own and your partner’s choice” and a very basic description of the task (“You will have to choose one of the two circles presented to you. […] You will have to decide and respond quickly.”), but did not include details of the payoff matrix. After the joint introduction, each subject alone performed 50 or 100 individual (solo) trials, to learn how to operate the touchscreen and to develop a preference for one of the two color targets (see SI for the exact verbal instructions given prior to solo training). Subjects were positioned ∼50 cm from the display with the height of the chairs adjusted such that both subjects’ eyes were ∼121 cm above the ground. After the solo training, both subjects entered the setup for the main dyadic task which lasted for 300 or 400 trials. Participants had to infer the task rules by exploration, similar to the macaques 31. Each session lasted approximately 1.5 hours. After the experiment we conducted an individual debriefing and paid the earned reward separately for each subject.
Macaque procedure
Macaques were brought to the setup in their individual primate chairs. The chairs were positioned such that the eyes were 30 cm from the display. The monkeys had previously been trained to perform the basic task structure (hands on proximity sensors, reach to the initial fixation target with the instructed hand, and select one of the presented choice targets by reaching to it). The animals performed the solo version of the task with differential rewards to develop a preference for one of the color targets, and were paired with a conspecific only after selecting the higher-rewarded target in ≥ 75% of the trials. Thereafter, pairs of macaques worked together in the dyadic version of the task for 11 ± 7 sessions (range 4-25).
Data analysis
We computed the following six aggregate measures of choice behavior and coordination, explained below: shares of “own” choices, shares of objective left choices, mutual information for target choice, mutual information for side choice, average reward, and dynamic coordination reward. We computed each of these measures for the last 200 trials of each session in order to assess the “steady-state” behavior after allowing for an initial period of exploration.
Shares of own and objective left choices
The share of own choices (SOC) is the fraction of trials in which an agent selected the individually preferred target. Similarly, the share of objective left choices (SLC) is the fraction of trials in which an agent selected the target on the “objective” left side of the screen, which is the left side for Agent A and the right side for Agent B. Fractions range from 0 to 1. We calculated both measures for the session as a whole and, for figures showing SOC/SLC over the course of a session, in running windows of w = 8 trials. In the latter case, SOC and SLC can take the values 0, 1/8, 2/8, …, 1. For instance, SOC = 0 means that an agent selected the individually non-preferred target for 8 trials in a row.
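The two ways of computing a share can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the alignment of the running window (here, every full block of w consecutive trials) is an assumption, since the text gives only the window length.

```python
def share(flags):
    """Fraction of trials in which the flag is set (e.g. own target chosen)."""
    return sum(flags) / len(flags)


def running_share(flags, w=8):
    """Share in every full window of w consecutive trials (alignment assumed)."""
    return [sum(flags[i:i + w]) / w for i in range(len(flags) - w + 1)]


# Hypothetical choice sequence: 1 = own (preferred) target, 0 = other target.
own = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(share(own))          # whole-session SOC
print(running_share(own))  # windowed SOC, always a multiple of 1/8
```

The same two functions apply to SLC when the flags mark objective-left choices instead of own-target choices.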
Mutual information
Mutual information (MI) represents the reduction of uncertainty regarding the values of one time series provided by knowing the values of the other time series. Here we consider mutual information for the color of the target (MIT) and side (MIS) choices of the two agents, showing how much information the target/side choices of one agent provide about the respective choices of the other. Mutual information is measured in bits. Since both target and side choices are binary (in each trial an agent selects either the preferred or non-preferred target and either the left or right side), both MIT and MIS range from 0 bit (the choices of one agent provide no information about the choices of the other) to 1 bit (the choices of one agent can be inferred precisely from the choices of the other agent).
For instance, if both agents select the objective left target in every odd trial and the objective right target in every even trial, both MIT = MIS = 1, since for every trial the target and the side selected by one agent entirely describes the target/side selected by the other. At the same time, the choices of each agent individually are highly uncertain: both sides and both targets are selected with the same probability of 0.5. If both agents constantly select the objective left side, this would result in MIT = 1 due to side-randomization of target color, but MIS = 0, since there is no uncertainty regarding the side selection and thus no additional knowledge about the other’s choice can reduce the uncertainty.
Formally, mutual information of time series X = (Xt) and Y = (Yt) is given by
$$\mathrm{MI}(X, Y) = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} p(X_t = x, Y_t = y)\, \log_2 \frac{p(X_t = x, Y_t = y)}{p(X_t = x)\, p(Y_t = y)},$$
where p(Xt = x) is the probability of the value x in time series X, p(Yt = y) is the probability of the value y in time series Y, and p(Xt = x, Yt = y) is the joint probability to simultaneously have values x and y in time series X and Y, respectively, and x and y can be either 0 or 1, so that the sum is over all four combinations.
Since in our case time series X and Y have finite length, we simply replace probabilities by relative frequencies. This is known as a naïve estimation of mutual information, but it is sufficiently precise for binary time series 77,78.
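A minimal sketch of this naïve (plug-in) estimate is given below: probabilities are replaced by relative frequencies and the logarithm is taken to base 2, so that the result is in bits. The function name and the example sequences are hypothetical; value pairs that never occur are skipped, which corresponds to the usual convention that they contribute zero.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in mutual information (bits) of two equally long binary sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each observed (x, y) pair
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        p_a = count_x[a] / n
        p_b = count_y[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Identical alternating choices: one agent's choice fully determines the other's.
print(mutual_information([0, 1] * 4, [0, 1] * 4))
# A constant choice carries no uncertainty, hence no information.
print(mutual_information([0] * 8, [0] * 8))
```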
To test whether the MI values were significantly different from zero, we generated Whittle surrogates for the given choice time series and estimated from them the threshold for the chosen significance level (p = 0.01 in our case) 79.
Average reward
Average reward (AR) is computed as the average of an agent’s payoff across the session. Note that the average reward of each individual agent can be in the range of 1 to 4 points, while the average joint reward of a pair cannot exceed 3.5 (since when one agent gets payoff of 4, the other agent gets only 3). The average reward for completely independent choices of two agents with 50% probability for either target is 2.5 (but note that an achieved reward of 2.5 is not a positive proof of independent choices).
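The value of 2.5 for independent 50/50 choices can be checked with a few lines of code. The sketch below assumes the payoffs (A, B) of (2, 2) when both agents select their own target, (4, 3) and (3, 4) when they coordinate on A's or B's preferred target, and (1, 1) when both select the other's preferred target; the (2, 2) entry is inferred from the ranges quoted here rather than stated in this section. The function name is hypothetical.

```python
from itertools import product

# Payoffs (A, B) indexed by (A chose own target, B chose own target).
# The (2, 2) entry is an assumption inferred from the quoted reward ranges.
PAYOFF = {
    (True, True): (2, 2),
    (True, False): (4, 3),
    (False, True): (3, 4),
    (False, False): (1, 1),
}


def expected_rewards(p_a_own, p_b_own):
    """Expected payoff of each agent when both choose independently."""
    exp_a = exp_b = 0.0
    for a_own, b_own in product((True, False), repeat=2):
        p_a = p_a_own if a_own else 1 - p_a_own
        p_b = p_b_own if b_own else 1 - p_b_own
        reward_a, reward_b = PAYOFF[(a_own, b_own)]
        exp_a += p_a * p_b * reward_a
        exp_b += p_a * p_b * reward_b
    return exp_a, exp_b


print(expected_rewards(0.5, 0.5))
```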
Dynamic coordination reward
Dynamic coordination reward (DCR) is the surplus reward of the two agents compared to the reward they would get by playing randomly. By playing randomly we mean that the choices of the agents in each round are independent of the history and of the current choices of the partner. The “random reward” is computed by selecting the two color targets with the same probabilities as actually observed in the two agents, but randomly permuting the choices over trials. For our payoff matrix, the range of DCR is [-1, 1], with -1 corresponding to very inefficient playing (alternating selection of the two anti-coordination options) and 1 corresponding to very efficient playing with explicit coordination (for instance, turn-taking). DCR is hence a measure of dynamic (reciprocal) coordination. For instance, if both agents coordinate statically by constantly selecting one and the same side, this results in DCR = 0, even though this coordination pattern still yields the maximum average reward of 3.5.
Formally, DCR is defined as the actual average reward of a pair (Ractual) minus the reward the agents would get if they were playing randomly (RPR). DCR = Ractual − RPR. The reward for playing randomly (RPR) depends only on eight probabilities, four for each agent. Below index i indicates the agent and stands for either A or B:
Pi,1,left - probability to select non-preferred objective left target (which is the left side for agent A and right side for agent B),
Pi,1,right - probability to select non-preferred objective right target,
Pi,2,left - probability to select preferred objective left target,
Pi,2,right - probability to select preferred objective right target.
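Note that these probabilities are not independent of each other. First, for i = A, B, it holds that Pi,1,left + Pi,2,left + Pi,1,right + Pi,2,right = 1. Second, it holds that
$$P_{A,2,\mathrm{left}} + P_{A,1,\mathrm{right}} = P_{B,1,\mathrm{left}} + P_{B,2,\mathrm{right}} = Q_{\mathrm{left}}$$
and
$$P_{A,1,\mathrm{left}} + P_{A,2,\mathrm{right}} = P_{B,2,\mathrm{left}} + P_{B,1,\mathrm{right}} = Q_{\mathrm{right}},$$
where Qleft and Qright are the probabilities of agent A’s preferred target appearing on the left and on the right, respectively (and of B’s preferred target appearing on the right and on the left). Given the independent random selection from trial to trial described above, the Q-values should approximate 0.5 for a large number N of trials.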
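The average reward of the two agents for playing randomly is computed as follows:
$$R_{PR} = \sum_{(a,b)} \frac{a + b}{2}\, p_{a,b} = 1 \cdot p_{1,1} + 2 \cdot p_{2,2} + 3.5\,\left(p_{3,4} + p_{4,3}\right),$$
where pa,b is the probability that by random playing agent A gets reward a and agent B reward b. These probabilities are given by the following equations:
$$\begin{aligned}
p_{1,1} &= \frac{P_{A,1,\mathrm{right}}\, P_{B,1,\mathrm{left}}}{Q_{\mathrm{left}}} + \frac{P_{A,1,\mathrm{left}}\, P_{B,1,\mathrm{right}}}{Q_{\mathrm{right}}}, \\
p_{2,2} &= \frac{P_{A,2,\mathrm{left}}\, P_{B,2,\mathrm{right}}}{Q_{\mathrm{left}}} + \frac{P_{A,2,\mathrm{right}}\, P_{B,2,\mathrm{left}}}{Q_{\mathrm{right}}}, \\
p_{3,4} &= \frac{P_{A,1,\mathrm{right}}\, P_{B,2,\mathrm{right}}}{Q_{\mathrm{left}}} + \frac{P_{A,1,\mathrm{left}}\, P_{B,2,\mathrm{left}}}{Q_{\mathrm{right}}}, \\
p_{4,3} &= \frac{P_{A,2,\mathrm{left}}\, P_{B,1,\mathrm{left}}}{Q_{\mathrm{left}}} + \frac{P_{A,2,\mathrm{right}}\, P_{B,1,\mathrm{right}}}{Q_{\mathrm{right}}}.
\end{aligned}$$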
To see why this is the case, consider, for instance, p1,1. Both agents get reward of 1 when they both select the other’s preferred target, either when target of agent A appears on the left side (probability of this is encoded by the first term) or on the right side (second term).
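The computation of DCR from trial data can be sketched as below. This is one reading of the definitions above, not the authors' code. It assumes that random playing means the two agents choose independently of each other within each target configuration (A's preferred target on the objective left or right), each with the conditional choice probabilities actually observed. The trial encoding (a_target_left, a_own, b_own), the function name, and the (2, 2) payoff for both agents selecting their own target are assumptions made for illustration.

```python
# Pair-average reward indexed by (A chose own target, B chose own target),
# i.e. the mean of the payoffs (2,2), (4,3), (3,4) and (1,1); (2,2) is assumed.
PAIR_REWARD = {
    (True, True): 2.0,
    (True, False): 3.5,
    (False, True): 3.5,
    (False, False): 1.0,
}


def dcr(trials):
    """DCR for a list of (a_target_left, a_own, b_own) boolean triples."""
    n = len(trials)
    r_actual = sum(PAIR_REWARD[(a, b)] for _, a, b in trials) / n
    r_random = 0.0
    for config in (True, False):  # A's preferred target on objective left / right
        subset = [(a, b) for left, a, b in trials if left == config]
        if not subset:
            continue
        q = len(subset) / n                             # Q_left or Q_right
        p_a = sum(a for a, _ in subset) / len(subset)   # P(A own | configuration)
        p_b = sum(b for _, b in subset) / len(subset)   # P(B own | configuration)
        for a_own, pa in ((True, p_a), (False, 1 - p_a)):
            for b_own, pb in ((True, p_b), (False, 1 - p_b)):
                r_random += q * pa * pb * PAIR_REWARD[(a_own, b_own)]
    return r_actual - r_random


# Both agents always select the objective left side (static side coordination).
static_side = [(True, True, False), (False, False, True)] * 50
# Alternating between A's and B's preferred target, sides balanced (turn-taking).
turn_taking = [(True, True, False), (True, False, True),
               (False, True, False), (False, False, True)] * 25
print(dcr(static_side), dcr(turn_taking))
```

Under these assumptions the sketch reproduces the limiting cases described in the text: static coordination yields a DCR of 0 despite the maximal average reward, and balanced turn-taking yields a DCR of 1.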
To compute confidence intervals, we use the fact that SLCi and SOCi for an agent i can be considered as binomially distributed; thus the radii (half-widths) of their confidence intervals, ΔSLCi and ΔSOCi, can be estimated by the classic method of approximating the distribution of error around a binomially distributed observation with a normal distribution. To obtain the confidence interval for DCR, it is sufficient to compute the maximal and minimal possible DCR given that SOC and SLC of the two agents are within the respective confidence intervals. Note that SLCi = Pi,1,left + Pi,2,left and SOCi = Pi,2,left + Pi,2,right; thus all the probabilities necessary for calculating DCR can be computed from SLCi and SOCi, given that Qleft and Qright are fixed to 0.5 in the reported experiments. Simple analysis reveals that the minimum and maximum DCR must lie at the corners of the 4-D confidence region formed by SLCi and SOCi of the two agents, which reduces the problem to testing 16 DCR values computed for SLCi ± ΔSLCi and SOCi ± ΔSOCi.
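The corner search can be sketched as follows, under several loudly stated assumptions. The confidence level is not given in the text, so z = 1.96 (95%) is a placeholder. The random-play reward uses a formula reconstructed from the definitions above (independent choices within each target configuration, pair-average payoffs 2, 3.5, 3.5 and 1). All function names are hypothetical, and corners that would imply probabilities outside [0, 1] are not treated specially.

```python
from itertools import product
from math import sqrt

Q = 0.5  # Q_left = Q_right, as stated for the reported experiments


def eight_probs(slc, soc):
    """Recover one agent's four choice probabilities from SLC and SOC (Q = 0.5)."""
    p2l = (slc + soc - Q) / 2  # preferred target on the objective left
    return {"1L": slc - p2l, "1R": Q - p2l, "2L": p2l, "2R": soc - p2l}


def r_random(slc_a, soc_a, slc_b, soc_b):
    """Pair-average reward for random play (reconstructed formula, an assumption)."""
    a, b = eight_probs(slc_a, soc_a), eight_probs(slc_b, soc_b)
    p22 = (a["2L"] * b["2R"] + a["2R"] * b["2L"]) / Q  # both choose own target
    p43 = (a["2L"] * b["1L"] + a["2R"] * b["1R"]) / Q  # both on A's target
    p34 = (a["1R"] * b["2R"] + a["1L"] * b["2L"]) / Q  # both on B's target
    p11 = (a["1R"] * b["1L"] + a["1L"] * b["1R"]) / Q  # both choose other's target
    return 2.0 * p22 + 3.5 * (p43 + p34) + 1.0 * p11


def half_width(p, n, z=1.96):
    """Normal-approximation half-width of a binomial proportion (z is assumed)."""
    return z * sqrt(p * (1 - p) / n)


def dcr_interval(r_actual, slc_a, soc_a, slc_b, soc_b, n):
    """Min and max DCR over the 16 corners of the SLC/SOC confidence region."""
    centers = (slc_a, soc_a, slc_b, soc_b)
    radii = [half_width(p, n) for p in centers]
    values = []
    for signs in product((-1, 1), repeat=4):
        corner = [c + s * r for c, s, r in zip(centers, signs, radii)]
        values.append(r_actual - r_random(*corner))
    return min(values), max(values)


# Hypothetical pair: maximal actual reward, unbiased sides and colors, 200 trials.
low, high = dcr_interval(3.5, 0.5, 0.5, 0.5, 0.5, n=200)
print(low, high)
```

Checking only the corners is sufficient in this sketch because the reconstructed random-play reward is affine in each agent's probabilities, so its extremes over a box lie at the vertices.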
For the main analysis presented in the Results, we used the last 200 trials to compute DCR values. Using the last 150 or 250 trials for the DCR analysis yielded exactly the same 10 human pairs with significant results as using the last 200.
Reaction and movement time measurements
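We measured the time from the onset of the choice targets to the release of the initial fixation target (trelease) and to the acquisition of the selected target (tacquisition), individually for each subject. We then calculated the movement time tmovement and the reaction time treaction as follows:
$$t_{\mathrm{movement}} = t_{\mathrm{acquisition}} - t_{\mathrm{release}}, \qquad t_{\mathrm{reaction}} = t_{\mathrm{release}} + \frac{t_{\mathrm{movement}}}{2}.$$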
We use treaction, the half-way point between trelease and tacquisition, i.e. the duration from target stimulus onset to the half-time of the reach movement, as a proxy for the estimated time at which the trajectory of each subject’s reach movement should be evident to the other agent. Typical movement time values (mean ± SD across trials) in our experiment were 314 ± 104 ms for humans, 171 ± 77 ms for macaques, and 180 ± 64 ms for the confederate-trained macaque pair.
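As a small worked example, the two timing measures follow directly from the description of treaction as the half-way point between release and acquisition. The function name and the example times are hypothetical; all times are in milliseconds relative to choice-target onset.

```python
def movement_and_reaction_time(t_release, t_acquisition):
    """Return (t_movement, t_reaction) in ms, both relative to target onset."""
    t_movement = t_acquisition - t_release
    t_reaction = t_release + t_movement / 2  # half-way point of the reach
    return t_movement, t_reaction


# Hypothetical trial: fixation target released at 400 ms, choice acquired at 700 ms.
print(movement_and_reaction_time(400.0, 700.0))
```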
Estimated probability to see partner’s choice
When an agent is acting slower than the partner, there is a chance for this agent to see the partner’s choice and use this information for making the own choice. We therefore modeled the probability to see the partner’s choice as a logistic function of the difference ΔT of the agents’ reaction times, in each trial. The logistic function psee(ΔT) had an inflection point at 50 ms (psee(50) = 0.5) and reached its plateau phase at 150 ms (psee(150) = 0.98). Formally, the function was given by the following equation:
$$p_{\mathrm{see}}(\Delta T) = \frac{1}{1 + e^{-k\,(\Delta T - \Delta T_0)}},$$
with k = 0.04 (steepness of the slope) and ΔT0 = 50 ms (inflection point). The values of psee were used for the analysis of correlation with the probability of selecting the other’s color (cf. Figure 7). Importantly, we also tested a wide range of these parameters (k fixed at 0.04, ΔT0 = {12.5, 25, 50, 75, 100, 200} ms and ΔT0 fixed at 50 ms, k = {0.01, 0.02, 0.04, 0.08, 0.16}), and confirmed that the correlation values were robust with respect to these parameters; only the longest ΔT0 = 200 ms resulted in a noticeable drop of the resultant correlations.
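The logistic model can be sketched in a few lines. The standard logistic form used here is the one that reproduces the two quoted anchor values (0.5 at 50 ms and about 0.98 at 150 ms) for k = 0.04 and ΔT0 = 50 ms. The function name is hypothetical, and the sign convention is an assumption: ΔT is taken as the agent's own reaction time minus the partner's, so that positive values mean the agent acted later.

```python
from math import exp


def p_see(delta_t_ms, k=0.04, delta_t0_ms=50.0):
    """Probability of seeing the partner's choice for a given RT difference (ms)."""
    return 1.0 / (1.0 + exp(-k * (delta_t_ms - delta_t0_ms)))


for delta_t in (-100, 0, 50, 150):
    print(delta_t, round(p_see(delta_t), 3))
```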
Supporting Figures
Supporting tables
Written instructions read by human subjects
Description of the study
This study is the investigation of the behavioral correlates of social decision making while playing a game with a partner. Every day we have to make decisions that depend not only on our own needs and goals but also on the needs and goals of others: for instance, while working on a project with colleagues, planning vacation with friends and family, getting on the bus or shopping groceries, etc. With the help of the task presented in this study we will investigate how people make such decisions.
The course of the study
You will complete one session of a decision making task on the computer together with your partner. You will have to choose one of the two circles presented to you. Your partner will have to perform the same task on his/her side of the touchscreen. You will have to decide and respond quickly. If either you or your partner are too slow, the trial will be aborted without any reward. Your reward will depend on your own and your partner’s choice. After the decision is made, you both will receive different auditory feedbacks, denoting your reward and the reward of your partner. Please do not talk to your partner during the session. After the session, we will ask you several questions about the experiment.
Instructions read to human subjects by experimenter
1. You rest both hands on the gray board at the two round objects (touch sensors).
2. A central touch target will appear.
3. Move your right hand to the target and hold. Use the right hand during the entire session.
4. While the target brightens up, keep holding your finger on the target.
5. Then two colored choice targets appear and the central target disappears.
6. Make your choice and touch the chosen target within 1.5 seconds.
7. All touched targets will brighten up; keep holding until the targets disappear.
Please note that both selected targets will brighten up: in case both players selected the same target, only that single target will brighten; in case both players selected different targets, both targets will brighten.
8. Now two streams of auditory beeps will signify the earned reward for each player: each beep corresponds to a few cents. Please try to learn the sound related to your reward during the training trials.
9. While the audio plays, move the hand back to the two touch sensors.
10. Go to 1.
Acknowledgements
We thank Elisheba Crecca, Tarana Nigam and Roberta Nocerino for their help with the data collection, and Daniela Lazzarini, Janine Kuntze, Sina Plümer and Klaus Heisig for technical support.
Footnotes
a Shared senior authorship
Abbreviations
- AR: Average reward
- BoS: Bach or Stravinsky game (also known as the Battle of the Sexes)
- DCR: Dynamic coordination reward
- MI: Mutual information
- MIS: Mutual information between side choices
- MIT: Mutual information between target color choices
- (i)PD: (iterated) Prisoner’s dilemma
- RT: Reaction time
- SOC: Share of own choices
- SLC: Share of left choices