Representing Multiple Observed Actions in the Motor System

There is now converging evidence that others’ actions are represented in the motor system. However, social cognition requires us to represent not only the actions but also the interactions of others. To do so, it is imperative that the motor system can represent multiple observed actions. The current fMRI study investigated whether this is possible by measuring brain activity from 29 participants while they observed two right hands performing sign language gestures. Three key results were obtained. First, brain activity in the premotor and parietal motor cortex was stronger when two hands performed two different gestures than when one hand performed a single gesture. Second, both individual observed gestures could be decoded from brain activity in the same two regions. Third, observing two different gestures compared with two identical gestures activated brain areas related to motor conflict, and this activity was correlated with parietal motor activity. Together, these results show that the motor system is able to represent multiple observed actions, and as such reveal a potential mechanism by which third-party social encounters could be processed in the brain.

There is accumulating evidence that action observation triggers a corresponding motor representation in the observer Sinigaglia 2010, 2016;Cracco et al. 2018). For instance, animal research has uncovered a subset of motor neurons in the premotor and parietal cortex that fire both when an action is executed and when the same action is observed (Gallese et al. 1996;Kilner and Lemon 2013). This is supported by human research, in which the role of motor processes in action observation has been demonstrated across a wide range of techniques, including fMRI (Gazzola and Keysers 2009;Molenberghs et al. 2012), TMS (Fadiga et al. 1995;Naish et al. 2014), M/EEG (Arnstein et al. 2011;Fox et al. 2016), and intracranial recording (Mukamel et al. 2010;Babiloni et al. 2016).
Together, this research indicates that we represent observed actions through motor simulation in shared neural circuits (Keysers and Gazzola 2006). This, in turn, has been argued to facilitate action perception Sinigaglia 2010, 2016). In support, studies have shown that disrupting sensorimotor brain areas impairs our ability to recognize and predict others' actions (Avenanti et al. 2013;Urgesi et al. 2014). In social life, however, we have to represent not only the actions but also the interactions of others (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017). Recently, it was proposed that third-party interactions could be represented by simulating the actions of the different interactors in the motor system (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017). Yet, as research has mainly focused on situations with one agent, the underlying assumption that multiple observed actions can be processed in the motor system remains to be tested.
Pointing in this direction, preliminary evidence suggests that motor activation is stronger when observing an interaction between two persons than when observing a single person (Iacoboni et al. 2004;Bucchioni et al. 2013;Aihara et al. 2015) or two independently acting persons (Centelles et al. 2011;Georgescu et al. 2014). However, a non-specific increase in motor activation does not necessarily indicate that observers simulated multiple actions. Instead, to reach this conclusion, it has to be shown that multiple motor representations were activated. Therefore, in previous work, 4 we have measured automatic imitation (Cracco et al. 2015;Brass 2018a, 2018b) and corticospinal excitability (Cracco et al. 2016) while participants observed either a single action or two identical actions. This revealed action-specific increases in both measures when two identical actions were observed, suggesting that both actions triggered the same motor representation. Nevertheless, two identical actions might still be represented as a single action in the motor system. Therefore, a crucial question is whether it is also possible to simulate two different observed actions. Indeed, when watching two interacting individuals, it is rarely the case that their actions can be reduced to a single action.
To address this question, the current fMRI study measured sensorimotor brain activity while participants watched two right hands performing one out of three sign language gestures (see Figure   1a and Supplementary Videos). We focus on three hypotheses. First, we test whether motor activation is stronger when two hands perform two different actions (2H DIFF) than when one hand performs a single action (1H). Second, we use multivariate analyses to investigate whether both 2H DIFF gestures can be decoded from brain activation in the motor system. Finally, we explore the consequences of mirroring two different gestures. In particular, we investigate whether this leads to motor conflict, based on the fact that it is impossible to simultaneously execute two gestures with one hand. We predict stronger activation in brain areas associated with motor conflict when observing two different actions (2H DIFF) than when observing two identical actions (2H SAME).
Specifically, we expect this contrast to activate the anterior cingulate cortex (ACC), which is at the core of the conflict monitoring network (Botvinick et al. , 2004Braver et al. 2001;Ridderinkhof et al. 2004).

Participants
Thirty healthy volunteers participated in the experiment for 30 euro (17 female, M age = 22.67, SD age = 2.48, range age = 18 -28), but one participant was excluded due to excessive head movement.
This resulted in a sample of twenty-nine participants (17 female, M age = 22.76, SD age = 2.47, range age = 18 -28). However, the action execution data of three participants could not be used. For one participant, a technical error caused the randomization to be lost. Furthermore, for two participants, there was excessive head movement between the action observation and action execution phase, which led to missing data in the occipital lobe as a consequence of realignment. In other words, the action observation analyses were conducted on 29 participants and the action execution analyses on 26 participants. All participants were right-handed according to the Edinburgh Handedness Inventory (Oldfield 1971), did not speak sign language, had no history of neurological or psychiatric disorder, and gave written informed consent. The study was approved by the Medical Ethic Review Board of the Ghent University Hospital.

Task and Procedure
Experimental Design. The experiment was structured in three phases. The first phase was a familiarization phase, which took place outside the scanner. In this phase, participants were acquainted with the two experimental tasks (i.e., the action observation and action execution tasks).
Furthermore, to ensure that participants were able to execute the three gestures used in the action observation task, the familiarization phase included an imitation task in which each gesture was presented 10 times. Participants had to imitate the gesture with their right hand and then press the space bar to continue. Performance on this task was monitored by the experimenter and mistakes were corrected. Following the familiarization phase, participants were put into the scanner to complete the second and third phase of the experiment, respectively. In the second phase, participants performed the action observation task. In the third phase, they performed the action execution task. Action Observation Task. In the action observation phase, participants watched short videos (5 s) of two right hands repeatedly (4 times) performing one out of three sign language gestures ( Fig.   1a and Supplementary Videos). The gestures were chosen on the basis of a pilot study in which 40 participants used a 5-point Likert scale to rate eight sign language gestures on whether the gesture was familiar, had been seen before, was clearly visible, was difficult to execute, and had a meaning.
For each gesture, participants were furthermore asked to guess its meaning. Based on the results, we then chose three gestures that were clearly visible, not too difficult to execute, and with unknown meaning. The videos were presented as a sequence of 28 frames. The first frame showed the hands in rest position and was presented for 300 ms. Next, all 28 frames were presented for 33 ms each.
Finally, the last frame remained on the screen for another 300 ms before the 28 frames were presented again for 33 ms each. This was repeated for four cycles so that each video had a total duration of 5196 ms (300 ms + 4 x 28 x 33 ms + 4 x 300 ms).
To ensure that attention was maintained throughout the entire duration of the experiment, participants had to detect a glitch appearing randomly in one seventh of the trials at a random moment in the video on the left hand, on the right hand, or on both hands. The glitches were inserted by replacing 1 of the 28 frames with a blue frame. This resulted in a brief flicker (33 ms) that was easily missed if attention was not focused on both hands (Video S4). After each glitch trial, two questions were presented on the screen. The first question required participants to indicate the gesture(s) on which the glitch had appeared. That is, a series of four pictures was presented showing the neutral hand followed by the three gestures. For each picture, participants had to indicate whether a glitch had appeared on the presented gesture or not. The second question asked participants whether the glitch had appeared on one hand or on both hands. Accuracy was 75% on the first question and 85% on the second question, indicating that the task was challenging but not too difficult. All participants scored above chance level on both questions. Trials with a glitch were not included in the analyses.

7
The action observation task comprised two runs of 126 trials each. Trials were presented randomly with the restriction that all 18 gesture combinations occurred equally often in both runs.
The following gesture combinations were included in the experiment: nG1/nG1, nG2/nG2, nG3/nG3, nG1/G1, G1/nG1, nG2/G2, G2/nG2, nG3/G3, G3/nG3, G1/G1, G2/G2, G3/G3, G1/G2, G2/G1, G1/G3, G3/G1, G2/G3, G3/G2, with G1, G2, and G3 representing the three gestures and nG1, nG2, and nG3 representing the corresponding neutral hand. For example, G2/G3 means that gesture 2 was presented left and gesture 3 was presented right. The 18 gesture combinations were combined to form four conditions. In the Neutral condition (N), neither hand performed a gesture. In the One Hand condition (1H), one hand performed a gesture. In the Two Hands Same condition (2H SAME), both hands performed the same gesture. Finally, in the Two Hands Different condition (2H DIFF), the two hands each performed a different gesture. Note that the N and 2H SAME conditions contained fewer gesture combinations and therefore fewer trials than the 1H and 2H DIFF conditions. However, they still included 36 trials each, thereby providing us with sufficient data to reliably estimate the signal in all conditions. All trials were separated by a black screen presented for a jittered duration drawn from a pseudo-logarithmic distribution with 50% short durations (200, 800, 1400, or 2000 ms), 33.3% intermediate durations (2600, 3200, 3800, or 4400 ms), and 16.7% long durations (5000, 5600, 6200, or 6800 ms). Action Execution Task. Immediately after the action observation task, participants performed the action execution task. In this task, a green or red circle was presented on the left or right side of the screen. The circle then decreased in size every second until it disappeared after four seconds. During the action execution task, participants wore gloves with a bubble wrap ball attached to the palm. When a green circle was presented, participants had to squeeze the ball each time the circle decreased in size using the hand that corresponded to the location of the circle on the screen.
For example, when a green circle appeared on the left side, participants had to squeeze the ball using their left hand. In contrast, when a red circle appeared, participants simply had to watch the screen.
The action execution task consisted of 60 trials that were randomly subdivided into 20 squeeze left 8 trials, 20 squeeze right trials, and 20 watch trials. Trials were separated by a black screen presented for a jittered duration of 4000, 5000, 6000, 7000, or 8000 ms. The action execution task was based on the task used by Arnstein et al. (2011). Importantly, these authors found that a simple squeeze task like the one used here activated a network very similar to the network activated by a more complex object manipulation task.

fMRI Parameters
MRI images were acquired with a 3T Siemens Trio scanner using a 32-channel radiofrequency head coil. Participants entered the scanner head first and supine. The scanning procedure started with an anatomical scan in which 176 high-resolution anatomical images were acquired using a T1weighted 3D MPRAGE sequence [repetition time (TR) = 2250 ms, echo time (TE) = 4.18 ms, image matrix = 256 x 256, field of view (FOV) = 256 x 256 mm, flip angle = 9°, voxel size = 1.00 x 1.00 x 1.00 mm]. This was followed by two action observation runs and one action execution run in which wholebrain functional images were obtained. These functional images were acquired using a T2*-weighted echo planar imaging (EPI) sequence, sensitive to BOLD contrast (TR = 2000 ms, TE = 28 ms, image matrix = 64 x 64, FOV = 224 mm, flip angle = 80°, distance factor = 17%, voxel size 3.5 x 3.5 x 3.0 mm, 34 axial slices).
Analyses fMRI Preprocessing. All fMRI data was analyzed using SPM8 (Wellcome Department of Imaging Neuroscience, UCL, London, U.K.; www.fil.ion.ucl.ac.uk/spm). To account for T1 relaxation, the first four scans of all runs were dummy scans. The three runs were preprocessed together according to the following steps. First, the functional images were spatially realigned using a rigid body transformation. Second, the realigned images were slice-time corrected with respect to the middle acquired slice. Third, the structural image of each subject was co-registered with its mean functional image. Fourth, the anatomical images were segmented according to the SPM8 tissue probability maps and the resulting parameters were used to normalize the functional images to MNI space. Finally, the images were resampled into 3 mm 3 voxels and spatially smoothed with an 8 mm Gaussian kernel (full-width at half maximum). However, for the representational similarity analyses, we used the unsmoothed data instead.
Univariate Analyses. The data was filtered using a 128 Hz high-pass filter. First level analyses were conducted by fitting a general linear model in SPM8. The action observation model consisted of 10 regressors per run, namely one regressor for each condition (N, 1H, 2H SAME, and 2H DIFF) and six regressors representing the realignment parameters. The signal was modeled over the entire duration of the videos and was convolved with the canonical hemodynamic response function (HRF).
The action execution model also consisted of 10 regressors, with one regressor for each condition (Watch Left, Watch Right, Squeeze Left, Squeeze Right) and six regressors representing the realignment parameters. The entire duration of the trial, from the moment the circle appeared until the moment it disappeared, was modeled and convolved with the canonical HRF.
The second level analyses of both tasks were performed using a within-subject one-way ANOVA with equal variances. Unless otherwise specified, all results are thresholded using a p < .05 FWE-corrected threshold at the peak level. Whole brain univariate analyses were used to localize action execution activation, action observation activation, shared voxel (sVx) activation, and motor conflict activation. To localize action execution activation ( The intersection of these two activation maps was used to localize sVx activation, which served as a proxy for the human mirror neuron system (Gazzola and Keysers 2009). Finally, to localize motor conflict activation, we calculated the contrast 2H DIFF > 2H SAME.

Region of Interest Analyses.
To test whether brain activity was stronger in the 2H DIFF condition than in the 1H condition, we constructed three regions of interest (ROIs) from brain activity in the action observation task. To ensure that the ROIs were not biased towards the 2H conditions, we defined them using the Nichols et al. (2005) Figure S1). Moreover, to secure statistical independence, we used a leave-one-out crossvalidation procedure in which the ROIs for each participant were calculated using the data of all participants except that one participant (Esterman et al. 2010). That is, for each participant, we performed the specified conjunction analysis on the data of all the other participants. We then defined the ROIs by constructing 5 mm spheres around the peak coordinates of the left and right visual area 5 (V5), the left and right parietal cortex (PAR), and the left premotor cortex (PMC). The right PMC was not consistently activated. Therefore, to obtain a bilateral PMC ROI, we mirrored the left PMC sphere onto the right hemisphere. Importantly, the activation pattern was highly similar in the left and right PMC ( Figures S2-S3).
The V5 peak coordinates were determined using the Squeeze > No Squeeze action execution activation as an exclusive mask (p < .001, uncorrected). Conversely, the PAR and PMC peak coordinates were determined using the Squeeze > No Squeeze action execution activation as an inclusive mask (p < .001, uncorrected). The use of an uncorrected p < .001 threshold for the masks was based on previous work (Arnstein et al. 2011). Moreover, we intersected the obtained spheres with a brain mask to ensure that the ROIs did not extend beyond the brain. The peak coordinates used to construct the ROIs are reported separately for each participant in Table S1. To perform the univariate ROI analyses, we extracted the beta values using the MARSBAR package in SPM8 (Brett et al., 2002), added the intercept to each value, and then used the resulting values to calculate the percent signal change in the 1H and 2H DIFF conditions with respect to the N condition as Representational Similarity Analysis. To investigate whether both 2H DIFF gestures could be decoded from brain activity in the three ROIs, we conducted a representational similarity analysis (Kriegeskorte et al. 2008), which was performed on the unsmoothed data. First level analyses were conducted by fitting a general linear model in SPM8. The model consisted of 24 regressors per run, namely one regressor for each of the 18 gesture combinations and six regressors for the realignment parameters. The signal was modeled over the entire duration of the videos and was convolved with the canonical HRF. Next, for each voxel in each ROI, we extracted the beta values corresponding to the 18 gesture combinations and then computed pairwise correlations across voxels to obtain an 18 x 18 correlation matrix per run per ROI. In these matrices, each cell compares the activation pattern of two gesture combinations (e.g., G1/G2 and G3/G3). To account for the non-normal distribution of the correlation coefficients, we first transformed the correlations into Fisher z-scores before using them in the analyses.
To decode the 2H DIFF gestures from brain activity in the three ROIs, we correlated the activation pattern in the 2H DIFF condition with the activation patterns in the 1H and 2H SAME conditions. This allowed us to compare two correlations with gesture overlap and one correlation without gesture overlap (Fig 4a). That is, it allowed us to compare the activation pattern when conflict activation depended on sVx activation. The ACC, which was identified using the contrast 2H 12 DIFF > 2H SAME in the action observation task, was used as seed region. PPI regressors were calculated at the first level for all four conditions (N, 1H, 2H SAME, and 2H DIFF) and were then analyzed using a within-subject one-way ANOVA with equal variances at the second level. Because we were only interested in connections with the action observation network, the PPI analysis was restricted to voxels that were significant in the action observation task (Observation > Neutral) at an uncorrected p < .001 threshold. Within this volume, the PPI results were thresholded at FWEcorrected p < .05 peak threshold.

Shared Voxel Localization
Shared voxel activation was determined by measuring the overlap in brain activation between action observation and action execution. First, to localize action execution activation, we calculated the contrast Squeeze > No Squeeze. As shown in Figure 2a, this revealed widespread activation in the sensorimotor system, consistent with previous research using this task (Arnstein et al. 2011). Next, to localize action observation activation, we compared activation in the N condition with activation in the 1H, 2H SAME, and 2H DIFF conditions using the contrast Observation > Neutral.
As expected, this yielded bilateral activation around V5, and in the inferior parietal cortex, the superior parietal cortex, the postcentral gyrus, and the superior temporal gyrus, as well as lateralized activation in the left dorsal premotor cortex (Fig 2b). Finally, to localize sVx, we determined the overlap between both tasks. In line with meta-analyses on action observation (Caspers et al. 2010;Molenberghs et al. 2012), this revealed sVx activation in the inferior parietal cortex, the superior parietal cortex, the postcentral gyrus, the superior temporal gyrus, and the dorsal premotor cortex, but not in the visual cortex (Fig 2c).

Representing Two Observed Actions in the Motor System
Is There More Activation? To test whether motor activation was stronger when observing two different actions compared with a single action, we looked at brain activity in three core regions of the action observation network, namely one visual region (V5) and two sensorimotor regions (PAR, and PMC). That is, we conducted a region x condition repeated measures MANOVA to compare the percent signal change in the 1H condition with the percent signal change in the 2H DIFF condition, both relative to the N condition, averaged across the left and right hemisphere (but see Figures S1 and S2 for the hemisphere specific results). As predicted, this revealed a main effect of condition, F(1, 28) = 79.15, p < .001, indicating stronger activity in the 2H DIFF condition than in the 1H condition. The region x condition interaction was significant as well, F(2, 27) = 86.06, p < .001 (Fig 3).
This showed that the condition effect was stronger in V5 than in both PAR, t(28) = 11.35, p < .001, and PMC, t(28) = 13.34, p < .001. However, in addition to V5, F(1, 28) = 155.60, p < .001, there was also a significant effect of condition in PAR, F(1, 28) = 16.40, p < .001, and PMC, F(1, 28) = 5.56, p = .026. In other words, these results indicate that brain activity in visual as well as motor areas was stronger when two different gestures were observed compared with when a single gesture was observed. As such, they support the hypothesis that both 2H DIFF gestures were processed in the motor system.

Can We Decode Both Actions? To further inform whether participants represented both 2H
DIFF gestures in their motor system, we then tried to decode these gestures from brain activity in the three ROIs using representational similarity analysis, which is a multivariate technique to measure the similarity between the representations of two stimuli by correlating their activation patterns (Kriegeskorte et al. 2008). Applied to the current study, we correlated the activation pattern in the 2H DIFF condition with the activation pattern in the 1H and 2H SAME conditions because this allowed us to compare two correlations with gesture overlap and one correlation without gesture overlap ( Figure 4a). More specifically, it allowed us to compare the activation pattern when observing both Overlap B conditions, t(28) = 0.53, p = .601 (Fig 4b). There was no region x overlap interaction, F(4, 25) = 0.07, p = .991. Taken together, these results thus indicate that the actions of both hands could be decoded from brain activity in visual as well as motor areas even when the hands performed different actions.

Does Representing Multiple Observed Actions Produce Motor Conflict?
Finally, we looked at the consequences of mirroring multiple observed actions. That is, because the 2H DIFF gestures are mutually exclusive in terms of motor execution, mirroring them should lead to motor conflict (Botvinick et al. , 2004) and therefore to ACC activation (Botvinick et al. , 2004Braver et al. 2001;Ridderinkhof et al. 2004). To test this hypothesis, we compared brain activity in the 2H DIFF condition with brain activity in the 2H SAME condition (2H DIFF > 2H SAME). As expected, this activated the ACC. In addition, it also activated the right anterior insula, which is known to co-activate with the ACC (Ullsperger et al. 2014), and the action observation network (Fig 5; Table 1). Next, to evaluate whether this corresponds to what is typically found in motor conflict research, we performed a Neurosynth meta-analysis on the search string (Response* | the ACC and AI were activated more consistently in studies mentioning motor conflict than in studies not mentioning motor conflict (Yarkoni et al. 2011). As shown in Fig 5, this was indeed the case, and an overlap analysis revealed substantial overlap with the pattern of activation obtained in the current study.
Crucially, however, if the ACC activity was caused by motor conflict, then it should depend on activity in the motor system. That is, according to the response conflict model of cognitive control, the ACC detects and signals the presence of motor conflict (Botvinick et al. , 2004. In particular, this model argues that response conflict can be seen as the product of the activation in two simultaneously active response nodes. As a consequence, if there is only one active response node, there is no response conflict. In contrast, if there are two active response nodes, response conflict depends on the activation of both nodes. In the current study, two identical observed gestures should trigger the same "response", whereas two different observed gestures should trigger two different "responses". Therefore, response conflict is expected in the 2H DIFF but not in the 2H SAME condition, and response conflict in the 2H DIFF condition should depend on the strength of activation in the motor system. To test this hypothesis, we calculated the PPI of the 2H DIFF > 2H SAME contrast using the ACC as seed region and the action observation activation (Observation > Neutral) as an inclusive small volume corrected mask (Fig 6). This revealed a condition-dependent association with activity in the inferior parietal cortex, and this activity completely overlapped with the activity in the action execution task (Squeeze > No Squeeze). Thus, the PPI analysis showed that activity in parietal sVx was correlated more strongly with ACC activity in the 2H DIFF condition than in the 2H SAME condition. In line with the motor conflict hypothesis, this suggests that ACC activity signaled the presence of motor conflict produced by simultaneously representing two mutually exclusive observed gestures in the motor system.

Discussion
The mirror mechanism is a fundamental neural mechanism that translates observed actions into motor programs Sinigaglia 2010, 2016). It has been argued that this mechanism supports action representation through motor simulation (Avenanti et al. 2013;Urgesi et al. 2014).
However, in contrast to action simulation, little is known about interaction simulation. A necessary condition for interaction simulation is that multiple observed actions can be represented in the motor system (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017). In the current study, we investigated this condition by asking participants to watch two right hands performing sign language gestures. Three key results were obtained. First, the motor system was activated more strongly when two different gestures were observed compared with when a single gesture was observed. Second, both individual gestures could be decoded from activity in the motor system.
Third, observing two different gestures relative to observing two identical gestures produced motor conflict related activity in the ACC, and this activity was correlated more strongly with parietal motor activity in the former condition than in the latter condition. Together, these results indicate that the motor system is able to simultaneously represent multiple observed actions, even when they cannot be simultaneously executed. Instead, when the mirrored actions violate motor constraints, this is signaled in the form of motor conflict.
Nevertheless, an alternative explanation is that participants did not represent both gestures simultaneously, but instead randomly represented one gesture on each trial. In this case, the activation pattern over trials would combine both gestures, likewise making it possible to decode both gestures from the average brain activation. However, a random sampling mechanism seems unlikely considering that our attentional task required participants to simultaneously monitor both hands. Indeed, performance on this task (85%) was well above chance (50%). In the same vein, random sampling is inconsistent with recent evidence that non-attended observed actions modulate automatic imitation even when another observed action is being attended or being imitated (Cracco and Brass 2018a). Finally, random sampling is difficult to reconcile with the other two main results, namely that observing two different gestures produced stronger motor responses and led to motor conflict. That is, if only one gesture was represented on each trial, then motor activation should not be stronger when two gestures were observed, nor should there be any motor conflict. Indeed, motor conflict is widely held to arise from the simultaneous processing of two or more actions . As a result, it should only be present if both actions were represented in parallel.
Still, one could argue that the interpretation of ACC activation as motor conflict was a reverse inference and therefore invalid. However, it should be stressed that this interpretation was not a post-hoc explanation of unexpected brain activation but rather an a-priori hypothesis restricted by theoretical as well as task constraints. Speaking more broadly, the validity of reverse inferences critically depends on the likelihood that the proposed process, as opposed to competing processes, was engaged in the experimental task (Poldrack 2006;Hutzler 2014). Looking at the literature, two plausible alternative processes can be identified. First, it could be argued that the ACC was activated due to differences in the frequency with which certain stimuli were presented, consistent with evidence that this region codes expectancy violation (Desmet et al. 2014;Fouragnan et al. 2018).
Second, it could be argued that ACC activity was driven by stimulus conflict rather than motor conflict. As stimulus conflict occurs already at the visual level (Verbruggen et al. 2006), it does not require both observed gestures to be represented in the motor system. Importantly, however, an explanation in terms of stimulus conflict is inconsistent with previous research on the neural substrates of conflict processing. In particular, this work has demonstrated that the ACC is sensitive to motor conflict, but not to stimulus conflict (Milham et al. 2001;van Veen et al. 2001;Liu et al. 2004;Liston et al. 2006;Wendelken et al. 2009;Mayer et al. 2012). Furthermore, stimulus conflict cannot easily explain why ACC activation was correlated with parietal sVx activation but not with purely visual activation. That is, if stimulus conflict was coded in the ACC, it should co-activate with the visual cortex ). More generally, potential alternative explanations are only feasible if they are also able to explain why ACC activation was correlated with motor activation, and why this correlation was stronger when the hands performed two different gestures than when they performed two identical gestures. Since this pattern was derived directly from conflict monitoring theories (Botvinick et al. , 2004, it strongly favors the motor conflict hypothesis (Poldrack 2006;Hutzler 2014).
In sum, the current study shows that at least two observed actions can be represented at the same time in the motor system. This critically extends research on mirror processes from action representation Sinigaglia 2010, 2016) to interaction representation (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017). That is, in contrast to what might intuitively be expected, it shows that motor simulation does not break down when observing multiple actions.
Instead, in those situations, the different actions are simulated together. The ability to mirror multiple observed actions fulfills the condicio sine qua non of interaction simulation. As such, the present work opens up the possibility that mirror processes contribute not only to decoding the actions (Avenanti et al. 2013;Urgesi et al. 2014) but also to decoding the interactions of others (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017). However, to fully evaluate this hypothesis, it is imperative to broaden the current research from observing isolated actions to observing meaningful social interactions.
Moreover, in addition to interaction representation, the present work has important implications for joint action. That is, previous research has shown that motor simulation facilitates interpersonal coordination in tasks that require two persons to cooperate (Colling et al. 2013;Kourtis et al. 2013). For example, a recent study found that musicians' ability to tune their own actions to those of another musician in a musical duet was impaired when the dorsal premotor cortex was disturbed (Hadley et al. 2015). However, social tasks often extend beyond the dyad (Tsai et al. 2011).
For instance, musicians in a musical ensemble have to coordinate their actions not just with one but with multiple co-musicians (Volpe et al. 2016). The results of the present study suggest that this may likewise rely on motor simulation. In particular, it suggests that mirror processes could be used to simultaneously predict the action outcomes of several co-actors (Aglioti et al. 2008;Hamilton and Grafton 2008) to achieve interpersonal coordination in multi-agent settings (Colling et al. 2013;Kourtis et al. 2013).
Finally, to the best of our knowledge, the current study is the first demonstration that motor conflict is not restricted to action planning (Botvinick et al. , 2004, but can occur during action observation as well. This has important implications for theories of cognitive control. For example, a prominent view is that motor conflict signals the need to increase attentional control (Botvinick et al. , 2004. Thus, in this view, when observers experience motor conflict, this could trigger compensatory mechanisms aimed at increasing attention towards the observed actions, and this could then facilitate social processes such as interaction understanding (Quadflieg and Koldewyn 2017;Quadflieg and Penton-Voak 2017) or interpersonal coordination (Colling et al. 2013;Kourtis et al. 2013).
It is also interesting to note that motor conflict did not activate the temporo-parietal junction (TPJ). The TPJ is known to respond to conflict between observed and planned actions (Brass et al. 2005(Brass et al. , 2009, and is therefore believed to distinguish self-from other-generated actions (Brass et al. 2009). As a consequence, it could be expected to distinguish between other-generated actions as well. However, self-other and other-other distinction do not necessarily rely on the same processes (Cracco and Brass 2018a). Furthermore, because no response was needed, our task did not require participants to distinguish between action representations. Thus, in this view, the ACC and TPJ may fulfill complementary roles with the ACC being involved in signaling motor conflict ) and the TPJ in solving the conflict when the need for an overt action requires it (Brass et al. 2009).

20
To conclude, the present work demonstrates that multiple observed actions can be represented at the same time in the motor system. This has important implications for interaction representation as well as joint action, and opens up new hypotheses on the role of motor conflict in action observation. Figure 1. Examples of the stimuli used (a) in the action observation task and (b) in the action execution task. In the action observation task, participants watched short videos of two hands repeatedly performing one out of three possible gestures. The task of participants was to detect a glitch in the video appearing randomly in one out of seven trials. In the action execution task, participants had to either squeeze a bubble wrap ball or look at the screen depending on whether the color of the circle was, respectively, green or red. In case of a green circle, participants had to squeeze the ball each time the circle decreased in size with the hand that corresponded to the location of the circle on the screen. The middle panel shows brain activation in the action execution task (Squeeze > No Squeeze). The bottom panel shows the overlap between both tasks. In line with previous work, the action execution data was thresholded at p < .001 (uncorrected) to determine shared voxels (Arnstein et al. 2011). All other images were thresholded using a p < .05 FWE-corrected threshold 29 Figure 3. Results of the ROI analyses testing whether activation was stronger in the 2H DIFF condition than in the 1H condition. The y-axis shows the % signal change with respect to the N condition.  the results of a Neurosynth meta-analysis on motor conflict. Brain activation from the current study is thresholded at p < .05 using FWE correction. Brain activation from Neurosynth is thresholded at p < .01 using FDR correction.

Figures and Tables
32 Figure 6. The top panel shows the PPI results of the 2H DIFF > 2H SAME contrast in the action observation task. The displayed coordinates are the peak coordinates of the PPI analysis (t = 4.99, k = 20). Note that the PPI results are masked with the activation of the action observation task (Observation > Neutral). The bottom panel shows the overlap between the activation of the PPI analysis and the activation of the action execution task (Squeeze > No Squeeze). There was no PPI activation that was not captured by the action execution activation. Unless otherwise specified, brain activation is thresholded with a small-volume FWE-corrected p < .05 threshold. In line with previous work, action execution data was thresholded at p < .001 to determine activation overlap (Arnstein et al. 2011). Table 1 Peak MNI Coordinates of the 2H DIFF > 2H SAME contrast.   Figure S1. Brain activation in each of the action observation conditions compared with the neutral condition. The bottom right figure shows the conjunction of the other three contrasts (Nichols et al. 2005). Images were thresholded using a p < .05 FWE-corrected threshold. Left premotor activation was also present in the conjunction analysis at an uncorrected threshold of p < .001.

Supplementary Information
36 Figure S2. Hemisphere specific results of the ROI analyses testing whether activation was stronger in the 2H DIFF condition than in the 1H condition. The y-axis shows the % signal change with respect to the N condition. Details on the % signal change calculation are provided in the methods. Post-hoc two-tailed t tests comparing 1H with 2H DIFF are displayed. Error bars are SEMs corrected for withinsubject designs (Morey 2008). * p < .05; ** p < .01; *** p < .001.
37 Figure S3. Hemisphere specific results of the representational similarity analyses testing whether the two observed gestures in the 2H DIFF condition could be decoded from brain activation in the three ROIs. Post-hoc two-tailed t tests comparing the average of Overlap A and B with No Overlap are displayed. Error bars are SEMs corrected for within-subject designs (Morey 2008). The difference between Overlap A and Overlap B was never significant. * p < .05; ** p < .01; *** p < .001.

38
Video S1. Video showing the procedure of the action execution task.
Video S2. Video showing the 1H condition of the action observation task.
Video S3. Video showing the 2H SAME condition of the action observation task.
Video S4. Video showing the 2H DIFF condition of the action observation task. This video also illustrates the glitches participants had to detect in the videos.