Parietal and Motor Cortical Dynamics Differentially Shape the Computation of Choice History Bias

Humans and other animals tend to systematically repeat (or alternate) their previous choices, even when judging sensory stimuli presented in a random sequence. Choice history biases may arise from action preparation in motor circuits, or from perceptual or decision processing in upstream areas. Here, we combined source-level magnetoencephalographic (MEG) analyses of cortical population dynamics with behavioral modeling of a visual decision process. We disentangled two neural history signals in human motor and posterior parietal cortex. Gamma-band activity in parietal cortex tracked previous choices throughout the trial and biased evidence accumulation toward choice repetition. Action-specific beta-band activity in motor cortex also carried over to the next trial and biased the accumulation starting point toward alternation. The parietal, but not motor, history signal predicted the next trial’s choice as well as individual differences in choice repetition. Our results are consistent with a key role of parietal cortical signals in shaping choice sequences.


INTRODUCTION
The tendency to systematically repeat (or alternate) choices is ubiquitous in decision-making under uncertainty. Such history biases are prevalent even when observers judge weak sensory stimuli presented in a random sequence (Fernberger, 1920;Rabbitt and Rodgers, 1977;Treisman and Williams, 1984).
Common models of perceptual decision-making posit the temporal accumulation of sensory evidence, resulting in an internal decision variable that grows with time (Bogacz et al., 2006;Gold and Shadlen, 2007;Ratcliff and McKoon, 2008;Brunton et al., 2013). When this decision variable reaches one of two bounds, a choice is made and the motor response is initiated. Fitting such accumulation-to-bound models to human behavior in five different tasks, we have previously shown that individual differences in choice history biases are predicted by history-dependent biases in the rate, rather than the starting point, of evidence integration (Urai et al., 2019).
What are the neural sources of these choice history biases? Behavioral evidence points to a central origin, neither purely sensory nor motor (Fritsche et al., 2017;Braun et al., 2018;Zhang and Alais, 2020;Feigin et al., 2021). In the primate brain, signatures of previous trial neural activity and/or choices have been identified in early visual cortex (St. John-Saaltink et al., 2016;Lueckmann et al., 2018), but the causal role of these visual cortical signals in behavior is debated (Lueckmann et al., 2018;Macke and Nienborg, 2019). The state of human motor cortex reflects previous motor actions (Pfurtscheller, 1981;de Lange et al., 2013;Pape and Siegel, 2016) and predicts people's bias to alternate their choices (Pape and Siegel, 2016). But again, this effect is subtle and its causal role and generality unknown. In rodents, posterior parietal cortex also carries history information that seems to play a causal role in decision biases (Hwang et al., 2017, 2019;Scott et al., 2017;Akrami et al., 2018). All these studies have focused on one type of neural history signal in a single brain region and have not related those history signals to the dynamics of the evidence accumulation process. So, it is currently unclear if, and to what extent, these neural history signals in different brain regions co-exist, and how they map onto the different components of evidence accumulation.
We here aimed to disentangle neural choice history signals across the sensory-motor cortical pathway, and link these to distinct computations in a visual choice task. One possibility is that biased activity states in motor circuitry add to accumulated evidence, without interacting with the accumulation process per se.
Alternatively, biases in neural circuitry upstream from the evidence accumulator (Hanks et al., 2015;Murphy et al., 2021) may bias the evidence accumulation process directly, akin to perceptual or attentional biases.
We tested these ideas by analyzing MEG and behavioral data from 60 observers who discriminated small changes in visual motion strength. Our results identify multiple choice history signals in different cortical areas and frequency bands. Detailed behavioral modeling of these choice history signals specifically highlighted the role of posterior parietal gamma-band activity in biasing evidence accumulation and shaping choice sequences.

RESULTS
Participants (n = 60) performed a two-interval motion coherence discrimination task (Figure 1a). On each trial, they saw two random dot kinematograms separated by a brief delay. They were asked to judge whether the coherence of the second stimulus (called 'test') was stronger or weaker than that of the first stimulus (called 'reference'). The coherence of the reference was always 70%, from which the coherence of the test differed (toward stronger or weaker) by a small amount that yielded a threshold accuracy (about 70% correct) for each individual (see Methods). Because the strength of the reference stimulus was fixed across trials, the task did not require comparing the test against a short-term memory representation of the reference stimulus (as is the case in widely used tasks with trial-to-trial variations of the reference stimulus, e.g. (Machens et al., 2005;Akrami et al., 2018)). Rather, participants could solve our task by using a stable category boundary (in our case for 'stronger' vs. 'weaker') learned during practice before the MEG recordings (see Methods).

STIMULUS-RELATED DYNAMICS ACROSS THE VISUAL CORTICAL HIERARCHY
We analyzed known MEG signatures of visual motion processing and action planning (Siegel et al., 2006;Donner et al., 2009;Donner and Siegel, 2011;de Lange et al., 2013;Wilming et al., 2020;Murphy et al., 2021). During coherent motion viewing (reference and test), we observed occipital enhancement of high-frequency power (from about 30-100 Hz), accompanied by a suppression of low-frequency (< 30 Hz) power (Figure 1b). The two motion stimuli also elicited a response around the screen refresh rate (60 Hz ± spectral smoothing box, see Methods), which was phase-locked to stimulus onset and hence dissociable from the high-frequency response occurring at random phase (Figure 1b, Figure S1). Our subsequent analyses focused on this non-phase locked activity in the high gamma band (65-95 Hz) (Figure S1). Such sustained modulations of high gamma-band (65-95 Hz) activity may reflect broadband population spiking (Donner and Siegel, 2011;Honey et al., 2012).
High-gamma band activity (and, less reliably, the accompanying low-frequency suppression) is known to scale monotonically with visual motion coherence (Siegel et al., 2006). Indeed, we observed gamma-power enhancement and alpha-power suppression to stronger versus weaker visual motion. Both effects were present in the MEG sensors covering occipital cortex (Figure 1c, Figure S1) and at the source-level, in multiple topographically organized visual field maps (Table S1). Figure 1e shows the time course of the high gamma-band response for V3A/B, a visual cortical area implicated in visual motion processing (Tootell et al., 1997), and Figure 2a shows the effect of the stimulus category (stronger vs. weaker) across multiple cortical regions of interest. We focused on a hierarchically ordered set of dorsal visual field maps, from V1 into intraparietal cortex IPS2/3, and a set of parietal and frontal regions carrying action-selective activity (Figure 1d; Table S1; see also (Murphy et al., 2021)). In sum, gamma- and alpha-band responses across visual cortex tracked the subtle difference in motion coherence that constituted our task's decision-relevant sensory signal.

CHOICE HISTORY SIGNALS IN POSTERIOR PARIETAL CORTEX
Gamma-band activity in the intraparietal sulcus also tracked observers' choices from the previous trial.
This effect was sustained throughout the complete trial including the pre-trial baseline interval ( Figure S2a, right), and it was robustly expressed during both the reference (Figure 2b, top) and test stimulus interval (Figure 2c,top). The signal encoding the previous choice first emerged transiently during the processing of that decision (i.e., test stimulus interval; Figure S2a, left) and then re-emerged during the inter-trial interval ( Figure S2a, right). IPS2/3 was the first ROI in the visual cortical hierarchy where gamma-band power did not encode visual motion strength (compare with Figure 2a, Figure S2a and b), again pointing to a source at the level of decision rather than sensory processing.   scaling with the test stimulus category (Figure 2a, bottom), as reported previously (Siegel et al, 2007).
Observers with a stronger choice history effect in IPS2/3 gamma showed a stronger choice history effect in IPS0/1 alpha (r = 0.4356, p = 0.0005, Bf10 = 41.4116). Yet, these two history effects showed opposite dependency on the previous decision's outcome: the IPS2/3 gamma effect was only present after correct trials, and the IPS0/1 alpha effect only after error trials ( Figure S3; no clear effect for V2-4 alpha). This suggests distinct, but coupled, functional processes driving these choice history effects in the neighboring parietal areas and distinct frequency bands.
Negative feedback induced a widespread suppression of alpha-band power across all posterior cortical areas (Figure S4), consistent with previous EEG work (van Driel et al., 2012). The overall pattern of alpha-band modulations suggests a superposition of two distinct signals following errors: a specific enhancement of alpha-power only in IPS0/1 after incorrect 'stronger' choices (Figure S3b) and a global error-related alpha-suppression that was unrelated to the previous choice (Figure S4). Indeed, the specific IPS0/1 alpha

MOTOR CORTICAL SIGNALS RELATED TO UPCOMING AND PREVIOUS ACTION
Because the mapping between choice ('stronger' vs. 'weaker') and motor action (left vs. right button press) varied between participants, we could disentangle effects of choice and action on neural activity in our group analysis. In line with previous work (Pfurtscheller et al., 1996;Donner et al., 2009;Pape and Siegel, 2016;Murphy et al., 2021), the lateralization of low-frequency power (alpha- and beta-band) in areas implicated in action preparation and execution predicted the upcoming action: alpha-/beta-power was suppressed in the hemisphere contralateral vs. ipsilateral to the upcoming response hand, an effect that built up gradually during decision formation (Figure 3a). As in previous work (Wilming et al., 2020), this effect was present in multiple frontal and parietal cortical areas, such as the hand area of primary motor cortex. After the button press, the low-frequency power reversed from suppression to enhancement relative to baseline (referred to as 'beta rebound'; Pfurtscheller et al., 1996; data not shown). The action-specific lateralization of power also reversed to enhancement contralateral to the previous button press (Figure 3).
This lateralization carried over to the next trial ( Figure 3b,c), as previously shown (Pape and Siegel, 2016).
This signal was expressed during the reference (Figure 4b, bottom), but less robustly during viewing of the test stimulus (Figure 4c, bottom, Figure S5). It was only present after correct choices (Figure S6). The motor cortical history signal (beta-power lateralization in IPS/PostCeS, PMd/v, and M1 pooled) did not correlate with the IPS2/3 gamma choice history signals (across-participant correlation: r = 0.075, p = 0.5689, Bf10 = 0.1191). In sum, we identified three neural signals that encoded different aspects of the previous choice: the perceptual decision and the motor act used to report that decision. Following 'stronger' compared to 'weaker' choices, both (i) gamma-band activity in IPS2/3 during test and reference intervals, and (ii) alpha-band activity in IPS0/1 during the test stimulus interval, were enhanced. Additionally, (iii) beta-band activity in motor areas (IPS/PostCeS, PMd, and M1) was stronger contralateral than ipsilateral to the previous motor response, only during the reference interval. These three signals had dissociable anatomical and frequency profiles, and differed in their sensitivity to the outcome (correct or error) of previous choices.
For brevity, we refer to these as IPS2/3-gamma, IPS0/1-alpha, and motor-beta (average of IPS/PostCeS, PMd, and M1), respectively. Note that the former two were extracted during the test stimulus interval, during which the decision was computed. The motor-beta signal was extracted from the reference interval, as its history modulation had vanished by the time of test stimulus viewing. In what follows, we used these signals for single-trial behavioral modeling to constrain their functional significance.

PARIETAL AND MOTOR CORTICAL SIGNALS PLAY DISTINCT ROLES IN EVIDENCE ACCUMULATION
We reasoned that these history-dependent neural signals may be the source of distinct biases in evidence accumulation that we have previously uncovered by fitting accumulation-to-bound models to behavior (Urai et al., 2019). In such models, specifically the widely used drift diffusion model (DDM), choice bias can arise from two different sources ( Figure 5a). On the one hand, an offset prior to the decision (and independent of the accumulation process) shifts the decision variable closer to one of the two decision bounds (starting point bias; Figure 5a, left column). On the other hand, the input to the evidence accumulator can be biased throughout decision formation, affecting the decision process just like a bias in the physical stimulus, and producing a stimulus-independent asymmetry in the rate of accumulation towards one versus the other bound (drift bias; Figure 5a, right column). These mechanisms can produce the same bias in choice fractions, but have distinct effects on the shape of reaction time distributions (Figure 5a).
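The distinction between the two bias mechanisms can be illustrated with a small simulation. The following is a minimal sketch, not the fitting procedure used in this study: it assumes a basic DDM with bounds at 0 and `bound`, a relative starting point `start`, unit noise, no non-decision time and no across-trial parameter variability. All function names and parameter values are hypothetical choices for illustration.

```python
import math
import random


def simulate_ddm(n_trials, drift, start, bound=1.0, noise=1.0, dt=0.005, seed=0):
    """Simulate a two-bound drift diffusion process (Euler scheme).

    start is the relative starting point (0.5 = unbiased); drift is a
    stimulus-independent drift term (0.0 = unbiased).
    Returns a list of (choice, decision_time); choice 1 = upper bound.
    """
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)
    trials = []
    for _ in range(n_trials):
        x, t = start * bound, 0.0
        while 0.0 < x < bound:
            x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
            t += dt
        trials.append((1 if x >= bound else 0, t))
    return trials


def upper_fraction(trials):
    """Fraction of trials ending at the upper bound."""
    return sum(choice for choice, _ in trials) / len(trials)


def fast_slow_bias(trials):
    """Upper-bound choice fraction for the faster and the slower half of trials."""
    ordered = sorted(trials, key=lambda tr: tr[1])
    half = len(ordered) // 2
    return upper_fraction(ordered[:half]), upper_fraction(ordered[half:])


if __name__ == "__main__":
    start_bias = simulate_ddm(3000, drift=0.0, start=0.65, seed=1)
    drift_bias = simulate_ddm(3000, drift=0.6, start=0.5, seed=2)
    for name, trials in [("starting point", start_bias), ("drift", drift_bias)]:
        fast, slow = fast_slow_bias(trials)
        print(name, round(upper_fraction(trials), 3), round(fast, 3), round(slow, 3))
```

Under these assumptions, both manipulations shift the overall choice fraction by a similar amount, but the starting point bias is concentrated in fast decisions, whereas the drift bias is expected to be spread more evenly across decision times.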

We reasoned that the parietal signals (IPS2/3-gamma and/or IPS0/1-alpha) during test interval may bias evidence accumulation toward choice repetition, while the motor beta signal (average of IPS/PostCeS, PMd, and M1) at the start of the trial (reference interval) may bias the starting point towards choice alternation. The underlying rationale was that the parietal signals were present during decision formation, occurred in regions that were likely to be upstream from a putative evidence accumulator (Brody and Hanks, 2016) and (for IPS0/1) encoded the decision-relevant sensory signal. The parietal signals also went in the direction of choice repetition (i.e. same sign as the effect of previous stimulus category). By contrast, the beta signal in action-related areas was expressed only before decision formation, and pointed toward choice alternation (opposite sign as the effect of previous choices, Figure 3a and (Pape and Siegel, 2016)).
We fitted DDMs to the neural and behavioral data to test these hypotheses. We used single-trial regressions to quantify the impact of trial-to-trial neural signals on drift bias and starting point (see Methods; (Wiecki et al., 2013)). Critically, all models included a term describing the current stimulus category, which (as expected) strongly predicted drift ( Figure S7). Thus, the regression coefficients for the neural signals quantified the impact of these neural signals on starting point and drift, over and above the impact of the current stimulus category on drift. We validated this approach by replicating the two main results from our previous, standard DDM fits (see (Urai et al., 2019); Figure 4a, 'Visual motion 2IFC (FD) #2')), specifically: (i) at the group level, previous choices had a negative effect on starting point (i.e., towards choice alternation) and a positive effect on drift (i.e., toward repetition; Figure S8a); (ii) individual differences in overt repetition behavior were better explained by the effect of choice history on drift bias, rather than starting point ( Figure S8b). We then replaced the previous choice predictors with single-trial neural signals, to assess if they predicted trial-to-trial variations in starting point or drift.
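As an illustration of what such a single-trial regression might look like, one possible formulation is written out here. This is a sketch under assumptions rather than the exact specification from the Methods: $s_t$ denotes the current stimulus category, $x_t$ a single-trial neural signal (or the previous choice), and a simple linear link is assumed for both parameters (implementations often use a logistic link for the starting point instead).

```latex
\begin{aligned}
v_t &= \beta^{v}_{0} + \beta^{v}_{s}\, s_t + \beta^{v}_{x}\, x_t \\
z_t &= \beta^{z}_{0} + \beta^{z}_{x}\, x_t
\end{aligned}
```

In this notation, $\beta^{v}_{x}$ captures the effect of the neural signal on drift bias over and above the stimulus term $\beta^{v}_{s}\, s_t$, and $\beta^{z}_{x}$ captures its effect on the starting point.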
IPS2/3-gamma predicted a positive modulation of drift bias (p = 0.0405) (i.e., in the direction of choice repetition), but no effect on starting point (p = 0.2431; Figure 5b). This effect was also present when using neural data from the reference interval ( Figure S9a), and it was robust to inclusion of a predictor for the previous behavioral choice ( Figure S9b) as well as removal of the impact of current stimulus category from the single-trial neural signals (by subtracting mean neural signal for each stimulus category; Figure S9c).
In contrast to IPS2/3-gamma, the motor-beta signal during the reference predicted a negative modulation of starting point (p = 0.0086) (i.e., in the direction of choice alternation), but no effect on drift bias (p = 0.0989; Figure 5d). This effect was not significant during the delay interval, nor during the test interval ( Figure S10).  The starting point effect of the motor-beta corroborates a previous study implicating the signal in action alternation tendencies (Pape and Siegel, 2016) and extends this work by linking it to a specific computational parameter. Critically, with IPS2/3-gamma, we also identified a novel history effect with a distinct anatomical, spectral and computational signature. Given the superposition of these two computational effects with two opposite impacts on choice (one pushing toward alternation, the other toward repetition), we then asked if these two neural choice history signals affected history biases in subjects' choices.

IMPACT OF PARIETAL AND MOTOR CORTICAL SIGNALS ON BEHAVIORAL CHOICE
In a first approach, we ran a mediation analysis based on single-trial regressions (Figure 6). In our model, the current choice was a categorical response variable (all regressions on that variable were logistic, see Methods), previous choice was a (categorical) regressor, and IPS2/3-gamma and motor-beta signals were included as candidate mediators of the impact of previous on current choice (Figure 6a). For completeness, we also included the IPS0/1-alpha (residual) signal during test as a candidate mediator of choice history and the current stimulus category as an independent factor, influencing current parietal signals and current choice (Figure 6a). Only the IPS2/3-gamma signal significantly mediated the effect of previous choice on current choice (t(59) = 3.971, p = 0.0002); neither of the other two signals did (IPS0/1 alpha: t(59) = 1.592, p = 0.1167; motor beta: t(59) = 1.178, p = 0.2434) (Figure 4b). The mediating effect on choice of IPS2/3 gamma was present also when calculated selectively for previous choices that were correct, but not incorrect (Figure S11 left). The latter may reflect a lack of power due to the lower number of error trials, which is also suggested by the similar pattern for the direct effect (Figure S11 right). The direct path was also significant (t(59) = 2.148, p = 0.0358; Figure 4b, right). This indicates that the IPS2/3-gamma signal did not fully explain choice history biases ('partial mediation'), which is not surprising given that single-trial MEG signals are noisy and coarse population proxies of the cellular-resolution signals that drive choice behavior.
[Figure caption] Individual and group-level parameter estimates for the indirect paths quantifying mediation by neural signals (IPS2/3-gamma; IPS0/1 alpha, motor-beta) as well as the direct path (c') from previous to current choice (right). Error bars, 95% confidence intervals, statistics from a simple t-test. * 0.01 < p < 0.; *** p < 0.001; filled markers, p < 0.05.
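The logic of a single-mediator analysis can be sketched as follows. This is a simplified illustration and not the pipeline used here: it assumes one mediator, a linear regression for the path from previous choice to the neural signal (path a), a logistic regression for the paths onto current choice (paths b and c'), and the product a·b as the indirect effect. The function names, the gradient-ascent fit and the toy data generator are all hypothetical.

```python
import math
import random


def fit_linear(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_logistic(rows, y, n_iter=200, lr=1.0):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    rows: predictor tuples (without intercept). Returns [intercept, w1, w2, ...].
    """
    k = len(rows[0]) + 1
    w = [0.0] * k
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * k
        for r, yi in zip(rows, y):
            xs = [1.0] + list(r)
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xs))))
            for j in range(k):
                grad[j] += (yi - p) * xs[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def mediation(prev_choice, mediator, choice):
    """Product-of-coefficients mediation estimate for one participant."""
    a, _ = fit_linear(prev_choice, mediator)            # previous choice -> signal
    _, c_prime, b = fit_logistic(list(zip(prev_choice, mediator)), choice)
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime}


def simulate_session(n, a, b, c_prime, seed=0):
    """Toy data: previous choice coded -1/+1, Gaussian mediator, binary choice."""
    rng = random.Random(seed)
    prev, med, choice = [], [], []
    for _ in range(n):
        p = rng.choice([-1.0, 1.0])
        m = a * p + rng.gauss(0.0, 1.0)
        prob = 1.0 / (1.0 + math.exp(-(c_prime * p + b * m)))
        prev.append(p)
        med.append(m)
        choice.append(1 if rng.random() < prob else 0)
    return prev, med, choice
```

In a group analysis of the kind reported above, per-participant estimates of the indirect and direct paths would then be tested against zero across participants.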

RELATION OF NEURAL HISTORY SIGNALS WITH INDIVIDUAL CHOICE HISTORY BIASES
In a second approach, we considered individual differences in choice behavior. By design, stimulus categories (test stimulus stronger vs. weaker than reference) were largely uncorrelated across trials (mean stimulus repetition probability across participants: 0.496; range: 0.482 -0.499; see Methods). The group average choice repetition probability was, likewise, close to 0.5 (mean: 0.494; range: 0.394 -0.599), and did not show a significant difference from the stimulus repetition probability (p = 0.726, permutation test).
Individual repetition behavior was stable across the two sessions (r = 0.513, p < 0.0001). Closer inspection revealed substantial individual differences, larger for the repetition probability of choice than of stimulus, which canceled at the group level (Figure 7a, compare the spreads). Similar idiosyncratic patterns of choice history biases have been identified before, across many datasets and perceptual choice tasks (Urai et al., 2017, 2019) ( Figure S12).
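For concreteness, the repetition probability of a binary sequence (choices or stimulus categories) can be computed as the fraction of trials on which the current item equals the previous one. The helper below is a minimal sketch with a hypothetical name; it ignores session and block boundaries and missing trials, which a full analysis would need to handle.

```python
def repetition_probability(sequence):
    """Fraction of consecutive trial pairs in which the item is repeated."""
    if len(sequence) < 2:
        raise ValueError("need at least two trials")
    pairs = list(zip(sequence[:-1], sequence[1:]))
    return sum(1 for prev, curr in pairs if prev == curr) / len(pairs)
```

Values above 0.5 indicate a tendency to repeat, values below 0.5 a tendency to alternate.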

DISCUSSION
Even in the face of random stimulus sequences, the tendency to systematically repeat or alternate previous choices is highly prevalent in decision-making (Fernberger, 1920;Rabbitt and Rodgers, 1977;Treisman and Williams, 1984;Fründ et al., 2014;Urai et al., 2019). Recent studies have begun to identify neural traces of choice history in several brain regions implicated in perceptual decision-making (Gold et al., 2008;Akaishi et al., 2014;Morcos and Harvey, 2016;Pape and Siegel, 2016;St. John-Saaltink et al., 2016;Akrami et al., 2018;Lueckmann et al., 2018;Hwang et al., 2019). Here, we combined human MEG recordings with behavioral modeling to disentangle neural choice history signals with distinct computational signatures. To help constrain their functional interpretation, we focused on known signatures of sensory encoding and action preparation across cortical areas involved in decision-making. Specifically, we focused on band-limited signals from extrastriate visual field maps in occipital, temporal, and parietal cortex involved in visual motion processing (from areas MT+ and V3A/B to IPS2/3; (Tootell et al., 1997;Siegel et al., 2006;Wang et al., 2015)) and more anterior parietal and frontal areas involved in action preparation information not only within (Gold and Shadlen, 2007), but also across trials, providing a bridge between sensory responses and longer-lasting beliefs about the structure of the environment.
Our findings corroborate recent animal studies, which have shown choice history signals in the posterior parietal cortex of mice (Morcos and Harvey, 2016;Hwang et al., 2017, 2019) and rats (Scott et al., 2017;Akrami et al., 2018). History biases in rodents may specifically depend on PPC neurons that project to the striatum, rather than motor cortex (Hwang et al., 2019), highlighting how specific populations of neurons may be involved in distinct decision-making computations. Because of the diminished sensitivity of our MEG recordings (Hämäläinen et al., 1993) for subcortical regions, we did not analyze subcortical areas in the present study. The correlative nature of our recordings precludes strong inferences about the causal role of human parietal history signals (Macke and Nienborg, 2019), but optogenetic inactivation of rodent posterior parietal cortex reduces history dependencies in behavior (Hwang et al., 2017;Akrami et al., 2018).
Notably, this was observed only for inactivation prior to decision formation, and parietal inactivation during the next stimulus left choice history biases unaffected (Hwang et al., 2017;Licata et al., 2017;Akrami et al., 2018). The latter observation, obtained in rodents and different tasks, seems to be at odds with our finding that the human IPS2/3-gamma signal during stimulus viewing was a significant mediator of choice history biases. This apparent inconsistency points to the need for combined inactivation and recording studies and more direct cross-species comparisons, in order to better understand the specific role of parietal cortical dynamics in shaping choice sequences.
Any biasing effect of neural activity in visual field maps on the rate of evidence accumulation ('drift bias') may reflect a biased encoding of the sensory information, or a non-sensory bias signal that feeds into the accumulator together with the stimulus information. Our results argue against the first scenario: for a history signal to reflect biased sensory encoding, it would have to be expressed in a region and frequency band that also encodes the stimulus in the first place. IPS2/3 was the first visual field map in the hierarchy where gamma-band activity did not encode the visual stimulus. This makes it unlikely that the IPS2/3-gamma signal reflects a sensory gain modulation, analogous to selective attention (Kastner and Ungerleider, 2000;Reynolds and Heeger, 2009). Rather, IPS2/3-gamma is consistent with a non-sensory signal that adds a history-dependent bias to the accumulation process, which translates into the observed stimulus-independent drift bias and resulting bias in choice behavior.
Previous behavioral work has pointed to a link between response preparation and the starting point of evidence accumulation (Leite and Ratcliff, 2011;White and Poldrack, 2014;Zhang and Alais, 2020).
Physiological work has shown that action-specific beta-power lateralization in the cortical motor system pushes the motor state away from the most recent response (Pfurtscheller, 1981;Pape and Siegel, 2016).
Combining physiology with single-trial behavioral modeling enabled us to show that this motor beta signal specifically biased the starting point of evidence accumulation toward response alternation. This beta signal is also found in pure motor tasks that do not entail decision-making, and it likely reflects idling in motor circuits (Pfurtscheller et al., 1996) that prevents action repetition. In our task, this signal did not have a detectable effect on overt choice sequences. This is largely in line with previous work: repetitive choice history biases persist, even with variable stimulus-response mapping (Akaishi et al., 2014;Pape and Siegel, 2016;Braun et al., 2018;Feigin et al., 2021), and inactivation of PPC-M2 projection neurons does not abolish choice history bias in mice (Hwang et al., 2017). In our experiment, the choice-response mapping was fixed within individuals, different from the variable mapping used by Pape and Siegel (2016).
The DDM is a simplified model of the dynamics of evidence integration (Bogacz et al., 2006). For example, it is unlikely that decision bounds are stationary and integration non-leaky. We previously established that even in such cases, choice history most strongly affects the accumulation bias, rather than the starting point of evidence integration (Urai et al., 2019). Future work could combine time-resolved sensory inputs with fitting of more complex decision-models (Brunton et al., 2013;Urai et al., 2019), to help refine the within-trial time course of choice history biases across cortical areas.
Neural correlates of decision biases in primates have commonly been studied in the context of explicit experimental manipulations of e.g., stimulus probabilities, rewards or single-trial cues. By contrast, the biases we describe here are intrinsic. They arise despite verbal instructions to focus only on sensory stimuli, cannot fully be eliminated with extensive training (Gold et al., 2008;Fründ et al., 2014), and are strongly idiosyncratic (Urai et al., 2019). The source of these individual differences remains a target for speculation.
Agents may differ in their representation of the stability of the environment, yielding different history biases (Glaze et al., 2015). The IPS2/3 gamma signal identified here was specific to observers who tended to repeat their choices, pointing to a potential role in implementing a default assumption of environmental stability.
In conclusion, our results show that choice history is an important source of trial-to-trial variability in cortical dynamics, which in turn biases subsequent decision computations and choice behavior. These results contribute to our understanding of how decision processes arise from a rich interplay of sensory information and contextual factors across the cortical hierarchy.

PARTICIPANTS
64 participants (aged 19-35 years, 43 women and 21 men) participated in the study after screening for psychiatric, neurological or medical conditions. All participants had normal or corrected to normal vision, were non-smokers, and gave their informed consent before the start of the study. The experiment was approved by the ethical review board of the University Medical Center Hamburg-Eppendorf (reference PV4648). Before each experimental session, participants were administered a pill containing donepezil (5 mg Aricept®), atomoxetine (40 mg Strattera®) or placebo (double-blind cross-over design). These pharmacological manipulations did not affect behavioral choice history biases (Urai et al., 2019), and were therefore not incorporated in the analyses presented here.
Three participants did not complete all 5 sessions of the complete experiment and were thus excluded from analyses. After rejecting trials with excessive recording artefacts (see below), we discarded one additional participant with fewer than 100 trials per session remaining. In total, 60 participants were included in the analysis.

BEHAVIORAL TASK
Participants were asked to judge if the coherence of a random-dot motion stimulus in a so-called test interval was stronger or weaker than a preceding reference stimulus, which was shown afresh on each trial at 70% coherence (Figure 1a). A red 'bulls-eye' fixation target of 1.5° diameter (Thaler et al., 2013) was present in the center of the screen throughout the experiment. Each trial started with a baseline interval of 500-1000 ms of randomly moving dots (0% coherence). A beep (50 ms, 440 Hz) indicated the onset of the reference stimulus (70% coherence) that was shown for 750 ms. The reference was followed by a variable (300-700 ms) delay (0% coherence). An identical beep indicated the onset of the test stimulus, whose motion coherence deviated from the 70% reference toward stronger or weaker coherence. The deviation was individually titrated prior to the main experiment (see below). A counterbalancing scheme ensured that each stimulus category (weaker or stronger) was followed by the same or the other category equally often (Brooks, 2012

STIMULI
Random dot kinematograms were presented in a central annulus (outer radius 14°, inner radius 2°) around fixation. The annulus was defined by a field of dots with a density of 1.7 dots/degree². Dots were white with 100% contrast from the black background and 0.2° in diameter. Signal dots were randomly selected on each frame, moved with 11.5°/second in one of four diagonal directions and had a limited 'lifetime' of four consecutive frames, after which they were replotted in a random location. Signal dots that left the annulus wrapped around and reappeared on the other side. Noise dots were assigned a random location within the annulus on each frame, resulting in 'random position' noise with a 'different' rule (Scase et al., 1996). Moreover, three independent motion sequences were interleaved on subsequent frames to preempt tracking of individual signal dots (Roitman and Shadlen, 2002).

PROCEDURE
Before their first MEG session, participants received instructions and then did one behavioral session to determine their 70% correct threshold for the main experiment. First, 600 trials with test stimuli containing 1.25, 2.5, 5, 10, 20 and 30% coherence difference (from the 70% coherence reference) were randomly interleaved. The inter-stimulus interval was 1 second, and participants took a short break after each set of 125 trials. They did not receive feedback. Stimuli were presented on an LCD screen at 1920x1080 resolution and 60 Hz refresh rate, 60 cm away from the participants' eyes. To determine each individual's psychometric threshold, we fit a cumulative Weibull as a function of absolute coherence difference c, defined as
$$\psi(c) = \delta + (1 - \delta - \gamma)\left(1 - e^{-(c/\alpha)^{\beta}}\right)$$
where δ is the guess rate (chance performance), γ is the lapse rate, and α and β are the threshold and slope of the psychometric Weibull function (Wichmann and Hill, 2001). While keeping the guess rate δ bound at 50% correct, we fit the parameters α, β and γ using a maximum likelihood procedure implemented by minimizing the negative logarithm of the likelihood function. This was done using a Nelder-Mead simplex optimization algorithm as implemented in Matlab's fminsearch function. The individual threshold was taken as the stimulus difficulty corresponding to a 70% correct fit of the cumulative Weibull.
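The thresholding step can be sketched in a few lines. The code below is an illustration under assumptions, not the original Matlab implementation: it uses the standard cumulative Weibull parameterization of Wichmann and Hill (2001) with guess rate δ = 0.5, a binomial negative log-likelihood, and a closed-form inversion of the fitted function at 70% correct. The function names and parameter bounds are hypothetical, and the optimizer itself is omitted (the negative log-likelihood could be handed to any Nelder-Mead routine).

```python
import math


def weibull(c, alpha, beta, gamma, delta=0.5):
    """Probability correct at absolute coherence difference c."""
    return delta + (1.0 - delta - gamma) * (1.0 - math.exp(-((c / alpha) ** beta)))


def neg_log_likelihood(params, coh_diffs, n_correct, n_trials, delta=0.5):
    """Binomial negative log-likelihood of (alpha, beta, gamma); delta is fixed."""
    alpha, beta, gamma = params
    # Assumed bounds: positive threshold and slope, lapse rate below 1 - delta.
    if alpha <= 0 or beta <= 0 or not (0.0 <= gamma < 1.0 - delta):
        return math.inf
    nll = 0.0
    for c, k, n in zip(coh_diffs, n_correct, n_trials):
        p = min(max(weibull(c, alpha, beta, gamma, delta), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def threshold(alpha, beta, gamma, target=0.7, delta=0.5):
    """Coherence difference at which the fitted Weibull predicts `target` correct."""
    frac = (target - delta) / (1.0 - delta - gamma)
    if not 0.0 < frac < 1.0:
        raise ValueError("target accuracy not reachable with these parameters")
    return alpha * (-math.log(1.0 - frac)) ** (1.0 / beta)
```

For example, `threshold(alpha, beta, gamma)` returns the stimulus difficulty that would be used as the starting value for the subsequent staircase.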
Second, participants performed another 100 trials using a 2-up 1-down staircase procedure. This procedure accounted for any learning effects or strategy adjustments during thresholding. The coherence difference between the two stimuli started at their 70% correct threshold as obtained from the Weibull fit. It was increased by 0.1% coherence on making an error, and decreased by 0.1% on giving two consecutive correct answers. Thresholds from this staircase ranged from 3.3% to 13.4% (mean 6.9%) motion coherence difference.
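The staircase update rule described above can be sketched in a few lines of Python. This is an illustration only; the function name, the streak counter and the `floor` argument are assumptions, not part of the original task code.

```python
def update_staircase(diff, correct, streak, step=0.1, floor=0.0):
    """Return (new coherence difference, new streak of correct responses).

    diff   : current coherence difference (% coherence)
    correct: whether the current response was correct
    streak : consecutive correct responses since the last change
    """
    if not correct:
        # An error makes the next trial easier and resets the streak.
        return diff + step, 0
    streak += 1
    if streak == 2:
        # Two consecutive correct answers make the next trial harder.
        return max(diff - step, floor), 0
    return diff, streak
```

Starting from the Weibull-derived threshold, calling this function once per trial reproduces the rule of a 0.1% increase after each error and a 0.1% decrease after two consecutive correct responses.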
Participants then performed the task at their individual motion threshold for a total of 1,200 trials during two MEG sessions (600 trials each). Between these two MEG sessions, they completed three behavioral practice sessions outside the MEG on separate days (1500 trials each, 4500 trials in total). In these practice sessions, we presented feedback immediately after the participants' response, followed by an ISI of 1 s before the next trial. The training data are not used in our current analyses.

QUANTIFICATION OF BEHAVIORAL CHOICE HISTORY BIASES
We quantified choice history strategies by fitting a logistic regression model using Matlab's fitglm:

MEG DATA ACQUISITION
MEG was recorded using a 275-channel CTF system in a shielded room. Horizontal and vertical EOG, bipolar ECG, and an electrode at location POz (about 4 cm above the inion) were recorded simultaneously.
All signals were low-pass filtered online (cut-off: 300 Hz) and recorded with a sampling rate of 1200 Hz. To minimize the displacement of the subject's head with respect to the MEG sensors, we used online head-localization (Stolk et al., 2013) to show the head position to the subject inside the MEG chamber before each block. Participants were then asked to move themselves back into their original position, correcting slow drift of their head position during the experiment. Between the two recording days, the original head position from day one was used as a template for day two.
Stimuli were projected into the MEG chamber using a projector with a resolution of 1024 x 768 pixels and a refresh rate of 60 Hz. The screen was positioned 65 cm away from participants' eyes. Horizontal and vertical gaze position and pupil diameter were recorded at 1000 Hz using an MEG-compatible EyeLink 1000 on a long-range mount (SR Research) at 60 cm from the subject's eye. The eye tracker was calibrated before each block of training.

PREPROCESSING OF MEG DATA
MEG data were analyzed in Matlab using the FieldTrip Toolbox (Oostenveld et al., 2011) and custom scripts. MEG data were first resampled to 400 Hz and epoched into single trials from baseline to 2 s after feedback. We removed trials where the displacement of the head was more than 6 mm from the first trial of each recording. Trials with SQUID jumps were detected by fitting a line to each single-trial log-transformed Fourier spectrum, and rejecting trials where the intercept was detected as an outlier based on Grubbs' test. To remove the effect of line noise on the data, we computed the cross-spectrum of the data at 50 Hz, resulting in a complex matrix of size n-by-n, where n was the number of channels. We applied singular value decomposition to this cross-spectrum and took the first singular vector (corresponding to the largest singular value) as the spatial topography reflecting line noise. The two-dimensional space spanned by the real and imaginary parts of this vector was then projected out of the data, effectively suppressing any signal that co-varied with activity at 50 Hz. Line noise around 50, 100 and 150 Hz was then removed by a band-stop filter, and each trial was demeaned and detrended.
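The spatial projection step for line noise can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the FieldTrip-based code used for the analysis: the cross-spectrum is estimated from a single Fourier coefficient per trial, and the function name and array layout are hypothetical.

```python
import numpy as np


def remove_line_noise_topography(trials, fs, f_line=50.0):
    """Project the dominant line-noise topography out of the data.

    trials: array of shape (n_trials, n_channels, n_samples)
    fs    : sampling rate in Hz
    """
    n_trials, n_chan, n_samp = trials.shape
    t = np.arange(n_samp) / fs
    # Complex Fourier coefficient at the line frequency, per trial and channel.
    coef = trials @ np.exp(-2j * np.pi * f_line * t)  # (n_trials, n_chan)
    # Trial-averaged cross-spectrum: complex n_chan x n_chan matrix.
    csd = np.einsum("ti,tj->ij", coef, coef.conj()) / n_trials
    u, _, _ = np.linalg.svd(csd)
    top = u[:, 0]  # topography with the largest singular value
    # Orthonormal basis of the space spanned by the real and imaginary parts
    # (dropping a direction if the two parts happen to be collinear).
    basis = np.column_stack([top.real, top.imag])
    q, s, _ = np.linalg.svd(basis, full_matrices=False)
    q = q[:, s > 1e-10 * s[0]]
    projector = np.eye(n_chan) - q @ q.T
    # Apply the channel-space projector to every trial.
    return projector @ trials
```

Because the projector acts on the channel dimension only, it removes any activity with the line-noise topography, at 50 Hz or otherwise, which is why a band-stop filter is still applied afterwards.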
We also removed trials containing low-frequency artefacts from cars passing by the scanner building, muscle activity, eye blinks or saccades. These were detected using FieldTrip's automated artefact rejection routines, with rejection thresholds determined per recording session by visual inspection.

SPECTRAL ANALYSIS OF MEG DATA
We computed time-frequency representations for each of the four epochs of interest, analyzing low and high frequency ranges separately. For the low frequencies (3-35 Hz in steps of 1 Hz), we used a Hanning window with a length of 400 ms in steps of 50 ms and a frequency smoothing of 2.5 Hz. For high frequencies (36-120 Hz in steps of 2 Hz), we used the multitaper technique with five discrete prolate spheroidal (Slepian) tapers (Mitra and Pesaran, 1999), a window length of 400 ms in steps of 50 ms, and 8 Hz frequency smoothing.
For sensor-level analyses, the data from each axial gradiometer were decomposed into two planar gradients before estimating power and combined afterwards, to simplify the topographical representation of task-related power modulations.
The time-frequency representations were converted into units of percent power change from the pre-trial baseline. The baseline power was computed as the across-trial mean from -300 to -200 ms before reference onset, separately for each sensor and frequency. The resulting time-frequency representations were further averaged across trials and sensors of interest.
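The baseline normalization can be written compactly in NumPy. This is a minimal sketch that assumes a four-dimensional power array (trials x sensors x frequencies x time bins) and a time axis in seconds relative to reference onset; the function name and array layout are illustrative.

```python
import numpy as np


def percent_change_from_baseline(power, times, baseline=(-0.3, -0.2)):
    """Convert power to percent change from the pre-trial baseline.

    power: array of shape (n_trials, n_sensors, n_freqs, n_times)
    times: array of shape (n_times,), in s relative to reference onset
    """
    in_base = (times >= baseline[0]) & (times <= baseline[1])
    # Across-trial mean within the baseline window,
    # computed separately for each sensor and frequency.
    base = power[..., in_base].mean(axis=(0, -1))  # (n_sensors, n_freqs)
    base = base[None, :, :, None]
    return 100.0 * (power - base) / base
```

Using a single across-trial baseline per sensor and frequency (rather than a per-trial baseline) keeps trial-to-trial fluctuations in pre-stimulus power in the normalized signal.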

MEG SOURCE RECONSTRUCTION
We estimated power modulation values for a set of cortical regions of interest (ROIs, see below) based on source-reconstructed voxel time courses from a sliding window DICS beamformer (Van Veen et al., 1997;Gross et al., 2001). A source model with 4 mm resolution was created from each individual's MRI, and warped to the Colin27 brain (Holmes et al., 1998) using a nonlinear transformation for group averaging.
Within the alpha (8-12 Hz), beta (12-36 Hz), and high-gamma (65-95 Hz) bands, we computed a common filter based on the cross-spectral density matrix estimated from the first 2 s of each trial, starting at the onset of the baseline time window. For each grid point in the brain, we then applied the beamformer (i.e., spatial filter) in a sliding window of 250 ms, with steps of 50 ms. The resulting source estimates of band-limited power were again converted into units of percent power change from the trial-average baseline, as described above. Rare outliers with values larger than 500 were removed. All further analyses and modeling were applied to the resulting power modulation values.

SELECTION OF MEG SENSORS EXHIBITING SENSORY OR MOTOR SIGNALS
To select sensors for the unbiased quantification of visual responses, we computed power modulation in the gamma-range (65-95 Hz) from 250 to 750 ms after test stimulus onset, and contrasted trials with stronger vs. weaker visual motion. We then selected the 20 most active sensors at the group level, in the first and second session separately. This procedure yielded stable sensor selection across sessions (18 sensors were selected in both sessions, two were selected only in the first session and one was selected only in the second session; see symbols in inset of Figure 1c).
Similarly, for sensors corresponding to response preparation, we contrasted trials in which the left vs. the right hand was used to respond. We computed power in the beta range (12-36 Hz) in the 500 ms before button press (Donner et al., 2009), and used the same split-half approach to define the 20 most active sensors for the contrast left vs. right, as well as the 20 most active sensors for the opposite contrast, to extract left and right motor regions. For each session, we then extracted single-trial values from the sensors defined on the other session.

DEFINITION OF CORTICAL REGIONS OF INTEREST (ROIS)
Following previous work (Wilming et al., 2020;Murphy et al., 2021), we defined a set of ROIs spanning the visuo-motor cortical pathway from the sensory (V1) to the motor (M1) periphery. The exact delineation of ROIs was based on anatomical atlases from previous fMRI work, specifically: (i) retinotopically organized visual cortical field maps (Wang et al., 2015) along the dorsal visual pathway up to IPS3; (ii) three regions exhibiting hand movement-specific lateralization of cortical activity: aIPS, IPS/PostCeS and the hand subregion of M1 (de Gee et al., 2017); and (iii) a dorsal/ventral premotor cluster of regions from a whole-cortex atlas (Glasser et al., 2016). We grouped visual cortical field maps with a shared foveal representation into clusters (Wandell et al., 2007) (Table S1), thus increasing the spatial distance between ROI centers and minimizing the risk of signal leakage (due to limited filter resolution or volume conduction). We selected all grid points located within each grouped ROI, and averaged their band-limited power signals.

TIME WINDOWS AND SIGNALS OF INTEREST
We selected two time-windows for in-depth statistical modeling: (i) the test interval during which the decision was formed (0-750 ms after test stimulus onset); and (ii) for pre-decision state, the reference interval (0-750 ms after reference stimulus onset). For each trial, we averaged the power modulation values across all time bins within these two time-windows, and used the resulting scalar values for further analyses.
The general linear modeling (see below) was applied to each ROI individually. For subsequent mediation and drift diffusion modeling, we further focused on three signals of interest: IPS2/3 gamma during the test stimulus (reflecting choice history), IPS0/1 alpha during the test stimulus (reflecting choice history after error trials), and motor beta (pooled across IPS/PostCeS, M1 and PMd/v) during the reference (reflecting action history). Choice-action mapping was counterbalanced across participants. We flipped motor lateralization signals for half of the participants, so that the lateralization was always computed with respect to the hand reporting 'stronger' choices.
The choice history-dependent IPS0/1-alpha signal was superimposed by a spatially non-specific suppression of alpha power following error feedback, which was shared by all cortical ROIs but not related to specific choice history (Figure 2 -Supplement 3). We averaged this global signal across all visual field map ROIs except IPS0/1, and removed it from the IPS0/1-alpha power modulation values via linear projection (Fox et al., 2006;Donner et al., 2008;Cardoso et al., 2012). We used this residual IPS0/1-alpha, unconfounded by the global signal, for all subsequent behavioral modelling.
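Removing a global signal via linear projection amounts to regressing it out and keeping the residual. The sketch below assumes one value per trial for both signals and mean-centers them first; whether the original analysis included an intercept is not stated, so this detail, like the function name, is an assumption.

```python
import numpy as np


def remove_global_signal(target, global_signal):
    """Return the part of `target` that is orthogonal to `global_signal`.

    target       : 1-D array, e.g. single-trial IPS0/1 alpha modulation
    global_signal: 1-D array, e.g. alpha averaged across the other ROIs
    """
    g = global_signal - global_signal.mean()
    y = target - target.mean()
    # Least-squares weight of the global signal in the target.
    beta = (g @ y) / (g @ g)
    return y - beta * g
```

By construction, the residual is uncorrelated with the global signal across trials, so any remaining history effect cannot be explained by the spatially non-specific alpha suppression.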

STATISTICAL ASSESSMENT OF POWER MODULATION VALUES
At the sensor level (see definition of sensors of interest above), we used cluster-based permutation testing across the group of participants (Maris and Oostenveld, 2007) to find clusters for which trial-averaged power modulation values differed for a given contrast of interest. For the assessment of full time-frequency representations of power modulation, clusters were two-dimensional, defined across time and frequency.
We used general linear mixed effects models (GLMEs, using Matlab's fitglme) to quantify the effect of choice history on single-trial power modulation values across all source-level ROIs, frequency ranges and the above-defined time windows. The dependent variable was the single-trial neural data (at a specific time window, region of interest and frequency band), and the model included a random intercept for each participant.
We also tested if the effect of previous choices differed between participants with choice repetition probabilities larger ('Repeaters') or smaller ('Alternators') than 0.5. We first estimated effect sizes and confidence intervals separately for these two groups, using the model above. This was repeated after randomly sub-selecting 25 'Repeaters', to match the number of participants between the two groups. We also fitted the group interaction term explicitly, with group coded as [-1, 1], reflecting whether a participant showed overall alternation vs. repetition behavior.
P-values were corrected for multiple comparisons across ROIs, frequency bands and time windows using the false discovery rate procedure (Benjamini and Hochberg, 1995).
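For reference, the Benjamini-Hochberg step-up adjustment can be sketched in NumPy. This is a generic illustration of the procedure, not the code used in the analysis; the function name is an assumption.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (same shape as input)."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    # Scale the i-th smallest p-value by m / i.
    scaled = flat[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out.reshape(p.shape)
```

An effect is then declared significant if its adjusted p-value falls below the chosen false discovery rate, with all ROI, frequency band and time window combinations entering the same correction.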

MEDIATION MODELING
To estimate the causal effect of trial-by-trial neural signals on choice behavior, we performed a mediation analysis using the lavaan package (Rosseel, 2012). We fit a set of regression equations relating the single-trial IPS2/3 gamma, the single-trial IPS0/1 alpha residual, the single-trial pooled motor beta, a vector of choices, and a vector of stimuli (-1 'weak' vs. 1 'strong').
We then defined our effects of interest from the coefficients of these equations. We fit the model separately for each participant using a WLSMV estimator, and then computed group-level statistics across the standardized individual coefficients. Data were analyzed with pandas and pingouin (Vallat, 2018) and visualized with seaborn (Waskom, 2021).

DRIFT DIFFUSION MODELLING
To fit a set of hierarchical drift diffusion models (DDMs) with trial-by-trial MEG regressors, we used the HDDMregression module of the HDDM package (Wiecki et al., 2013). We let both the starting point and drift bias of the DDM depend on single-trial neural data. The modeled parameters were the drift and the starting point (the latter with link function $1/(1 + e^{-x})$), and the equations also included the stimulus category. Individual parameter estimates were obtained from the posterior distributions across the resulting 67,500 samples. All group-level chains were visually inspected to ensure convergence. Additionally, we computed the Gelman-Rubin $\hat{R}$ statistic (which compares within-chain and between-chain variance) and checked that all group-level parameters had an $\hat{R}$ close to 1. Statistics were computed on the group-level posteriors.
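The convergence check compares within-chain and between-chain variance. The sketch below is a standalone NumPy version of the classic Gelman-Rubin statistic for one parameter; it is meant as an illustration and may differ in detail from the routine shipped with HDDM. The function name and input layout are assumptions.

```python
import numpy as np


def gelman_rubin(chains):
    """Gelman-Rubin statistic for one parameter.

    chains: array of shape (n_chains, n_samples), post burn-in.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    # W: mean of the within-chain variances.
    within = chains.var(axis=1, ddof=1).mean()
    # B: n times the variance of the chain means.
    between = n * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)
```

Values close to 1 indicate that independent chains have mixed and sample from the same distribution; values well above 1 indicate that the chains disagree.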

DATA AND CODE AVAILABILITY
All code used to run the task, process data and generate figures is available at https://github.com/anneurai/2021_Urai_choicehistory_MEG. We regret that the raw data cannot be publicly shared (due to the consent form used at the time of data collection), but they are available upon request.
Figure S1. Evoked and total visual responses.
Note that the strongly negative effect on drift rate reflects preparation for the upcoming choice; during stimulus viewing, suppression of beta-band lateralization ramps up until the button press and tracks the unfolding motor response (Figure 3a). After correcting for the between-subject stimulus-response counterbalancing, this motor beta signal points in the same direction as the current choice, therefore strongly predicting current choice. Hence it should also predict the sign of (one or both of) the bias parameters. Under the assumption that motor beta tracks the build-up of the decision variable (Murphy et al., 2021), one would expect the slope of the motor beta signal to be the primary predictor of drift. The current analysis shows that the amplitude of the motor beta signal during the decision interval also strongly predicts drift. In fixed duration tasks like ours, the amplitude of a neural decision signal integrated across the decision interval is positively related to the slope of this neural signal, explaining the observed effect.
... Figure 2c) with trial-to-trial variations in the stimulus category (stronger or weaker visual motion). To isolate the stimulus-unrelated, intrinsic trial-to-trial fluctuations in IPS activity (including choice history signals), we removed the stimulus response in IPS2/3 gamma through linear regression, and then repeated the above single-trial DDM fitting. The results were robust to this correction procedure.
Figure S10. As in Figure 5b. (a) Based on motor signals from the delay interval, before test stimulus viewing. (b) Based on motor signals during test stimulus viewing.
Note that the strongly negative effect on drift rate reflects preparation for the upcoming choice; during stimulus viewing, suppression of beta-band lateralization ramps up until the button press and tracks the unfolding motor response (Figure 3a). After correcting for the between-subject stimulus-response counterbalancing, this motor preparation points in the same direction as the current choice, therefore strongly predicting signed drift.
Figure S11. Mediation results after correct and error trials. As Figure 6, but splitting trials by the feedback of the previous trial.
Figure S12. History strategies. Observers' repetition behavior could either be governed by the previous trial's true stimulus category (which could be inferred from the combination of choice and feedback) or by previous choices, a simple proxy of internal decision confidence (Braun et al., 2018). We fit a logistic regression model with these two history predictors, and classified people as systematic Repeaters (if either choice or stimulus history predictors were significantly positive at p < 0.05), systematic Alternators (if either predictor was significantly negative at p < 0.05), unbiased (if neither was significant) or unclassified (if both were significant at p < 0.05, but in opposite directions). 18 participants repeated their correct choices (or the true stimulus category) from trial to trial significantly more than expected by chance, 4 significantly alternated, and 34 exhibited no significant first-order sequential dependency in their choice patterns. Overall, this analysis indicates that systematic choice repetition was more prevalent than systematic choice alternation in our sample of subjects.
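The classification rule in the legend of Figure S12 can be expressed as a small decision function. The sketch assumes that the regression weights and p-values of the two history predictors are already available for one participant; the function and argument names are hypothetical.

```python
def classify_history_strategy(beta_choice, p_choice, beta_stim, p_stim,
                              alpha=0.05):
    """Classify one participant from two history predictors.

    beta_*: regression weights for previous choice and previous stimulus
    p_*   : corresponding p-values
    """
    # Collect the signs of all predictors that reach significance.
    signs = {
        1 if beta > 0 else -1
        for beta, p in ((beta_choice, p_choice), (beta_stim, p_stim))
        if p < alpha
    }
    if not signs:
        return "unbiased"       # neither predictor is significant
    if len(signs) == 2:
        return "unclassified"   # both significant, opposite directions
    return "repeater" if signs == {1} else "alternator"
```

For example, a participant with a significantly positive previous-choice weight and a non-significant previous-stimulus weight would be labeled a systematic Repeater.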