Adaptive circuit dynamics across human cortex during evidence accumulation in changing environments

Many decisions under uncertainty entail the temporal accumulation of evidence that informs about the state of the environment. When environments are subject to hidden changes in their state, maximizing accuracy and reward requires non-linear accumulation of evidence. How this adaptive, non-linear computation is realized in the brain is unknown. We analyzed human behavior and cortical population activity (measured with magnetoencephalography) recorded during visual evidence accumulation in a changing environment. Behavior and decision-related activity in cortical regions involved in action planning exhibited hallmarks of adaptive evidence accumulation, which could also be implemented by a recurrent cortical microcircuit. Decision dynamics in action-encoding parietal and frontal regions were mirrored in a frequency-specific modulation of the state of the visual cortex that depended on pupil-linked arousal and the expected probability of change. These findings link normative decision computations to recurrent cortical circuit dynamics and highlight the adaptive nature of decision-related feedback to the sensory cortex.

Optimal decision making in a changing world requires non-linear evidence accumulation. Murphy et al. report signatures of this adaptive computation in recurrent dynamics of human parietal and motor cortices, accompanied by feedback to sensory cortex.


Results
To interrogate the mechanisms of adaptive perceptual evidence accumulation, we developed a task with hidden changes in the environmental state (that is, evidence source; Fig. 1a). The evidence samples were small checkerboard patches presented in a rapid sequence at different locations along a semicircle in the lower visual hemifield. Sample positions were generated from one of two noisy sources, the probability distributions shown in the upper row of Fig. 1a. The 'active' source was chosen randomly at the beginning of each trial and could change after each sample, with a probability (hazard rate H) of 0.08 in the main experiment. The participant's task was to report which source was active at the end of each sequence via left- or right-handed button press.

Signatures of adaptive evidence accumulation.
The normative model maximizing accuracy on this task (Fig. 1b) entails the accumulation of evidence samples that carry information (in the form of log-likelihood ratios, LLRs) about the two possible environmental states 13 , a process also known as 'belief updating'. This model serves as a benchmark against which the behavior and neural activity of human participants can be compared. The key difference between the normative model and previous accumulator models 1 is that the weight given to the existing belief during each updating step depends on
the estimated rate of environmental state change (H) 13 . Specifically, the prior for the next sample (ψ n+1 ) is determined by passing the updated belief (L n ) through a non-linear function that, depending on H, can saturate (slope ≈ 0) for strong L n and entail more moderate information loss (0 ≪ slope < 1) for weak L n ( Fig. 1c and Extended Data Fig. 1). By this process, the normative model strikes an optimal balance between formation of strong beliefs in stable environments versus fast change detection in volatile environments.
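For concreteness, the non-linear prior computation can be sketched in a few lines of Python. This is an illustrative implementation of the standard normative update for a two-state environment with hazard rate H, not the authors' code; the variable names are ours.

```python
import numpy as np

def prior_for_next_sample(L, H):
    """Map the updated belief L_n (log-odds) onto the prior psi_{n+1}.
    Depending on H, the function saturates for strong L (slope ~ 0) and
    entails moderate information loss for weak L (0 < slope < 1)."""
    return L + np.log((1 - H) / H + np.exp(-L)) - np.log((1 - H) / H + np.exp(L))

def accumulate(llrs, H):
    """Normative accumulation: belief update L_n = psi_n + LLR_n,
    followed by the non-linear carry-over to the next prior."""
    psi, beliefs = 0.0, []
    for llr in llrs:
        L = psi + llr
        beliefs.append(L)
        psi = prior_for_next_sample(L, H)
    return np.array(beliefs)
```

A useful property of this transfer function is that the prior saturates at about ±log((1 − H)/H) (roughly ±2.44 for H = 0.08), which produces the balance between stable beliefs and fast change detection described above.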
In our setting, this non-linearity could be cast in terms of sensitivities of evidence accumulation to two quantities: (1) the uncertainty about the environmental state before encountering a new sample (denoted as −|ψ|; Fig. 1e) and (2) the change-point probability (CPP; Fig. 1d), defined as the posterior probability that a state change just occurred, given an existing belief and a new evidence sample. CPP scaled with the inconsistency between the new sample and the existing belief (Fig. 1d). Neither variable was explicitly used in the normative computation, but both helped pinpoint diagnostic signatures of adaptive evidence accumulation in human behavior and neural activity, as shown below. Across levels of H and environmental noise, CPP increased transiently when a state change occurred, followed by a period of heightened −|ψ| (Fig. 1f and Extended Data Fig. 1). A linear model in which both quantities modulated belief updating reliably predicted the extent to which an ideal observer integrated a new sample of evidence to form the prior for the next sample (model 1, Methods, Fig. 1g and Extended Data Fig. 1; median R 2 across generative settings = 99.7%, range = 89.2-99.98%).
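CPP can likewise be computed directly from Bayes' rule over the joint event (previous state, switch). The sketch below is our own reconstruction under those assumptions, with no claim to match the authors' implementation; it shows how CPP grows with the inconsistency between the new sample and the existing belief.

```python
import numpy as np

def change_point_probability(psi, llr, H):
    """Posterior probability that the source switched just before the
    current sample, given prior log-odds psi, the sample's LLR and
    hazard rate H."""
    p1 = 1.0 / (1.0 + np.exp(-psi))       # prior P(state 1)
    p2 = 1.0 - p1
    l1, l2 = np.exp(llr), 1.0             # sample likelihoods (ratio form)
    switched = H * (p1 * l2 + p2 * l1)    # state switched, then sample emitted
    stayed = (1.0 - H) * (p1 * l1 + p2 * l2)
    return switched / (switched + stayed)
```

With a flat prior and an uninformative sample, CPP reduces to the hazard rate itself; a sample that contradicts a strong prior drives CPP well above H.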
Critically, our participants exhibited the same modulation of evidence accumulation by CPP and −|ψ| as the normative process depicted in Fig. 1. We quantified this modulation through logistic regression (model 2, Methods). The first set of regression weights, which quantified the time course of the leverage of LLR on choice (also called the psychophysical kernel 10,17 ), increased over time (P < 0.0001, two-tailed permutation test on weights for the first six versus the last six samples; Fig. 2b, left). Thus, similar to the normative model for our task (Extended Data Fig. 1d), participants tended to discount early information in their final choices. The second and third sets of regression weights captured the modulation of evidence weighting by CPP and −|ψ|. Each revealed strong positive modulations (CPP, P < 0.0001, Fig. 2b, middle; −|ψ|, P = 0.0002, Fig. 2b, right; two-tailed cluster-based permutation tests). The modulatory effect of CPP was also larger than that of −|ψ| (P < 0.0001, two-tailed permutation test averaging over sample position). In other words, both CPP and −|ψ| 'upweighted' the impact of the associated evidence on choice. The same evidence-weighting signatures were produced by the normative accumulation scheme for a range of task statistics (Extended Data Fig. 1d). The signatures also replicated in a separate dataset at faster and slower sample presentation rates (Extended Data Fig. 3).

These behavioral signatures ruled out a range of alternative evidence accumulation schemes. Perfect linear accumulation (without bounds) produces flat psychophysical kernels (Fig. 2b, magenta), whereas perfect accumulation toward an absorbing bound produces stronger weights for early samples 18 . Leaky accumulation with exponential decay of accumulated evidence 19 produces recency but not the CPP or −|ψ| modulations of evidence weighting (Extended Data Fig. 4). In sum, the signatures in Fig. 2b suggested that participants' behavior was produced by a computation that approximated the normative evidence accumulation process shown in Fig. 1b.

This conclusion was supported by fitting different models to participants' choices (Fig. 2 and Extended Data Figs. 4 and 5). We fitted several variants of the normative model, with between 3 and 11 free parameters (see Methods for details). Here, we use the term 'normative model' as shorthand for the adaptive accumulation scheme from Fig. 1b, without implying a noise-free or unbiased computation. All variants included a free subjective hazard rate parameter (H), a decision noise parameter and at least one parameter for assigning subjective weight (LLR) to the range of possible sample locations 13 . Some variants also included up to seven additional parameters for a non-linear scaling of subjective LLR and/or a gain parameter controlling the evidence weighting depending on its consistency with current belief 13,[20][21][22][23] . Inclusion of these additional parameters was supported by model comparison (Extended Data Fig. 5 and Methods). The choices of the best-fitting model variant ('normative fit' in Fig. 2, cyan) were highly consistent with participants' choices (88.8% ± 0.8%). The normative fit outperformed all considered versions of perfect and leaky accumulator models (mean ΔBayes information criterion (BIC) = 71.1 ± 7.6; higher BIC for all 17 participants; Extended Data Fig. 5).
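The logistic-regression approach used for the evidence-weighting analyses (model 2) can be illustrated on synthetic data. The sketch below is our own minimal reconstruction, not the authors' analysis code: it simulates choices in which CPP upweights evidence, then recovers positive kernel and interaction weights with a plain gradient-ascent fit.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, iters=1500, lr=0.05):
    """Gradient-ascent logistic regression, P(choice = 1) = sigmoid(X @ w)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

n_trials, n_samples = 4000, 10
llr = rng.normal(0.0, 1.0, (n_trials, n_samples))     # per-sample evidence
cpp = rng.uniform(0.0, 1.0, (n_trials, n_samples))    # stand-in CPP values
dv = (llr * (1.0 + 1.5 * cpp)).sum(axis=1)            # CPP upweights evidence
choice = (dv + rng.normal(0.0, 1.0, n_trials) > 0).astype(float)

# Regressors: per-sample LLR (psychophysical kernel) and LLR x CPP interaction
X = np.hstack([llr, llr * cpp])
w = fit_logistic(X, choice)
kernel, interaction = w[:n_samples], w[n_samples:]
```

In this synthetic setting the fitted kernel weights are all positive and the CPP interaction weights are positive on average, mirroring the qualitative pattern reported for participants.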
One alternative to the normative model, perfect accumulation toward non-absorbing bounds, did produce the above evidence-weighting signatures (Extended Data Fig. 4) and only marginally worse fits to the data than the best-fitting version of the normative model (ΔBIC = 11.1 ± 2.9; higher BIC for 16 of 17 participants; Fig. 2c). Indeed, the non-linearity imposed by the non-absorbing bounds can approximate the normative accumulation scheme for our task setting (low H and high signal-to-noise ratio (SNR)) 13 . This indicates that saturation for strong belief states (and not leak for weak belief states) was critical in accounting for participants' behavior.
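Accumulation toward non-absorbing bounds is easy to state explicitly. The sketch below is our illustration (the bound value is arbitrary): the decision variable clips at a fixed level while remaining responsive to subsequent evidence, which is what lets it mimic the saturation of the normative non-linearity.

```python
import numpy as np

def bounded_accumulation(llrs, bound):
    """Perfect accumulation of LLRs toward non-absorbing (reflecting)
    bounds: the decision variable saturates at +/- bound but keeps
    responding to new evidence, unlike an absorbing bound."""
    L, trace = 0.0, []
    for llr in llrs:
        L = float(np.clip(L + llr, -bound, bound))
        trace.append(L)
    return np.array(trace)
```

After a change in the evidence source, the clipped variable reverses sign within a few samples, whereas an unbounded perfect accumulator would first have to 'work off' its entire accumulated total.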
In sum, our analyses indicate that, while limited in their performance by internal noise and biased internal representation of task variables, participants approximated the normative evidence accumulation process for the current setting: adaptive, non-linear accumulation, characterized by sensitivity during periods of uncertainty and plateaus in the evolving decision variable during periods of relative stability.
Implementation via recurrent cortical circuit dynamics. A prominent cortical circuit model for decision making 16 is made up of recurrently connected spiking neurons (Fig. 2c). The circuit model has previously been used to simulate single-neuron activity in regions implicated in decision making during standard perceptual choice tasks 16 , but it was not designed for environments with hidden change points. Instead, it exhibits sensitivity to input early in a trial and stable, saturating firing rates (so-called attractor states) during periods of stability (Fig. 2d). We therefore reasoned that this model might also exhibit the above-described features of the normative accumulation scheme, its approximation in our generative setting (accumulation toward non-absorbing bounds) and human behavior. Indeed, when simulated for our task, the circuit model consistently 'changed its mind' (that is, reversed the sign of activity dominance between choice-selective populations) in response to changes in the evidence source (Fig. 2d). Even without quantitative fitting, the circuit model's behavior approximated the performance of the normative model and human participants (Fig. 2a), and its choices were highly consistent with both those of the normative model fits (91.8% ± 0.5%) and the human participants (86.2% ± 0.7%). Likewise, the single-trial trajectories of the decision variables from normative and circuit models were strongly correlated (median r = 0.90 ± 0.009, P < 0.0001, two-tailed permutation test).
Most importantly, the model naturally generated all behavioral signatures of the normative computation, specifically, the modulation of evidence weighting by CPP and −|ψ| (Fig. 2b, orange; see Extended Data Fig. 6 for an assessment of the boundary conditions for this behavior). These findings suggest that the non-linear dynamics of the recurrent circuit are, in fact, adaptive for decision making in changing environments.
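A reduced rate-model caricature conveys the intuition. The two-variable sketch below is our own construction, far simpler than the spiking circuit simulated in the paper: recurrent excitation and mutual inhibition sustain winner-take-all attractor states, yet a sufficiently strong change in the input reverses the dominant population.

```python
import numpy as np

def simulate(inputs, dt=0.01, tau=0.1):
    """Two mutually inhibiting rate populations with a sigmoidal gain.
    Returns the activity difference r1 - r2 over time."""
    f = lambda x: 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))
    r1 = r2 = 0.1
    diff = []
    for I1, I2 in inputs:
        d1 = (-r1 + f(1.2 * r1 - 1.0 * r2 + I1)) * dt / tau
        d2 = (-r2 + f(1.2 * r2 - 1.0 * r1 + I2)) * dt / tau
        r1, r2 = r1 + d1, r2 + d2
        diff.append(r1 - r2)
    return np.array(diff)

# Evidence first favors population 1, then strongly favors population 2
inputs = [(0.8, 0.1)] * 300 + [(0.1, 1.8)] * 300
diff = simulate(inputs)
```

Note that in this toy version weak contradictory input does not dislodge the attractor; only strong input does, echoing the saturating, change-sensitive behavior described above.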
Motor preparatory activity tracks adaptive accumulation. The above behavioral analyses indicate that participants (and the circuit model) implemented an adaptive evidence accumulation scheme in line with normative principles. We next sought to identify signatures of the resulting decision variable in cortical population activity, measured via magnetoencephalography (MEG). Previous work on choice tasks without change points has identified signatures of evidence accumulation in neural signals reflecting the preparation of an action plan 2,[4][5][6][7][8]10,11,24 . To isolate this motor preparatory activity, we applied a linear (spectral and spatial) filter to the data from the main task (Methods and Fig. 3a, right). The filter was constructed using data from a separate delayed motor-response task in which participants prepared left- or right-handed button presses and which yielded sustained lateralization of alpha- (8-14 Hz) and beta-band (16-30 Hz) power over motor cortices contralateral to the prepared response (Fig. 3a, left).
The selective motor preparatory activity built up gradually throughout the prolonged period of evidence accumulation (Fig. 3b). All analyses were pooled across correct and error trials because, unlike in tasks without change points, errors do not necessarily imply a failure of inference: even the ideal observer performing our task made errors on ~18% of trials, despite the absence of noise and biases and the inference process being exactly the same as for correct trials (Fig. 2a, navy). The build-up of motor preparatory activity closely mirrored the trajectories of the decision variables from the normative and circuit models but not those from the perfect accumulator, which failed to rapidly respond to change points (Fig. 3c,d). Single-trial activity was significantly correlated with decision variable trajectories of both the normative fit and the circuit model across the group ( Fig. 3d; normative, mean r = −0.085, P < 0.0001, two-tailed permutation test; circuit, mean r = −0.081, P < 0.0001, two-tailed permutation test) and individually within 16 of 17 participants (Fig. 3d, filled circles). Also, motor preparatory activity after each updating step (that is, after computation of ψ n+1 , the prior for the next sample) showed a sigmoidal dependence on L n , as predicted by both the normative and circuit models (Fig. 3e).
To quantify the adaptive nature of the dynamics of motor preparatory activity following each sample, we regressed it on a linear model composed of prior belief (ψ n ), new evidence (LLR n ) and the interactions of new evidence with CPP n and −|ψ| n (Methods, model 3). While the two interaction terms would be zero for linear (perfect or leaky) accumulation schemes (Fig. 3g, pink or gray bars), a signal reflecting the normative decision variable should have non-zero regression coefficients for all terms, with a stronger interaction term for CPP n than that for −|ψ| n (Fig. 3g, cyan).
This was precisely the pattern we observed for the motor preparatory activity (Fig. 3g). The first three terms were encoded in motor preparatory activity in both beta and alpha bands (ψ, P = 0.0006; LLR, P = 0.0002; LLR n ⋅CPP n , P = 0.0009; two-tailed cluster-based permutation tests), with sustained encoding of the prior and delayed encoding of the evidence sample that was in turn modulated by CPP. The −|ψ| n interaction was weaker than the CPP n interaction, as expected (cyan bars for normative fits), showing only a trend in the expected direction (P = 0.08, cluster corrected; Fig. 3g). Formal comparison between versions of the above regression model containing only evidence encoding or only belief encoding (Methods, models 4 and 5) showed that the motor preparatory alpha- and beta-band activity was better explained by encoding of the non-linearly updated decision variable (ψ n+1 ) than by encoding of momentary evidence (LLR n ; Fig. 3f, group-level BIC scores; BIC ψ < BIC LLR for 15 of 17 participants averaged over the highlighted clusters). We also note that LLR was weakly positively encoded in transient low-frequency delta- and/or theta-band activity (1-7 Hz; Fig. 3g, middle left). However, this effect did not survive correction for multiple comparisons (P = 0.11), possibly because the cluster was small in spatio-temporal extent. In sum, the dynamics of population activity in the human motor cortical system approximated the dynamics of the normative decision variable for our change-point task. These adaptive cortical dynamics (Fig. 3) and the associated behavioral signatures (Fig. 2) naturally emerged from the cortical circuit model. This alignment between the circuit model, normative model and motor preparatory activity was remarkable because we did not quantitatively fit the circuit model to any feature of the data.
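The structure of this regression (model 3) is simple to lay out. The sketch below is our own synthetic illustration of the design matrix, not the authors' pipeline: a fake 'lateralization' signal is built with a CPP-modulated evidence term, and ordinary least squares recovers the pattern of coefficients described above (prior and evidence encoding, with a stronger CPP than −|ψ| interaction).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
psi = rng.normal(0.0, 1.5, n)        # prior belief before each sample
llr = rng.normal(0.0, 1.0, n)        # new evidence
cpp = rng.uniform(0.0, 1.0, n)       # change-point probability (stand-in)
unc = -np.abs(psi)                   # uncertainty, -|psi|

# Synthetic signal: prior + evidence encoding, CPP interaction > -|psi| one
signal = (0.6 * psi + 0.4 * llr + 0.5 * llr * cpp + 0.1 * llr * unc
          + rng.normal(0.0, 0.5, n))

X = np.column_stack([np.ones(n), psi, llr, llr * cpp, llr * unc])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
```

Here the generative coefficients are chosen by us purely to mirror the qualitative pattern in Fig. 3g; the point is the design-matrix logic, not the specific values.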
Yet, the transformation of sensory input into behavioral choice unfolds in a pathway made up of many recurrently connected cortical regions spanning sensory to motor cortices, each of which has different properties (for example, strength of self-recurrence) 25,26 . In perceptual choice tasks without change points, signatures of the decision variable (action plan) have been observed in all these areas, including even the early sensory cortex 17,27 . Furthermore, brainstem arousal systems exert powerful effects on cortical circuit dynamics during such tasks 28 . These considerations motivated the further analyses described in the following sections, which aimed to (1) elucidate the evolution of the decision variable across the sensory-motor pathway for our change-point task, from the primary visual cortex (V1) to the primary motor cortex (M1), and (2) quantify the arousal modulation of the decision computation across this pathway.
Visual cortical encoding of evidence and decision variable. We reconstructed the dynamics of selective activity lateralization in a set of regions of interest (ROIs; Supplementary Table 1) along the sensory-motor pathway (Methods): multiple visual field maps in occipital, parietal and temporal cortices as well as parietal and frontal regions encoding left- versus right-hand movements 27 . In all 'action-related' cortical regions except the anterior intraparietal sulcus (aIPS), the lateralization of alpha- and beta-band activity exhibited sustained encoding of LLR (Fig. 4a) and prior belief (ψ), as well as modulation of LLR encoding by CPP (Fig. 4b). This is consistent with the sensor-level results from Fig. 3 and further supports the representation of the decision variable in the format of the associated action plan.
In the visual field maps, the encoding of computational variables exhibited different features. First, the lateralization of short-latency gamma-band (40-65 Hz) responses transiently encoded LLR (Fig. 4a) but none of the other variables (Fig. 4b), a profile consistent with pure evidence encoding. By contrast, alpha-band lateralization in the visual cortex mirrored the profile of alpha and beta lateralization in downstream action-related regions, in particular, the sustained encoding of prior belief (ψ) and an interaction between LLR and CPP (Fig. 4). Thus, alpha-band activity even in the early visual cortex (V1-V4) was also consistent with encoding of the normative decision variable.
Overall, alpha-band activity in the visual cortex was reminiscent of feedback of the evolving decision variable identified in earlier work on standard perceptual choice tasks (that is, without change points) 15,17,27 , as well as the idea that visual cortical alpha-band activity reflects cortical feedback signaling 27,29,30 . We further explored this idea in a series of analyses focusing on a hierarchically ordered set of dorsal visual field maps 31 (V1, V2-V4, V3A/B, intraparietal sulcus (IPS)0/1, IPS2/3) as described in the next section. The areas of these and other clusters are defined in Supplementary Table 1.

Signatures of adaptive decision feedback in the visual cortex.
We first delineated the feedforward processing of sensory evidence by means of multivariate decoding of the spatial patterns of evoked responses 23,32 separately for each ROI (Fig. 5a-d). The LLR decoding latency of action-related regions (termed 'motor' in Fig. 4) lagged that of V1 (P = 0.005, two-tailed weighted permutation tests, Methods). The timescale of LLR decoding increased from V1 to extrastriate (P = 0.048) and motor areas (P = 0.007; extrastriate versus motor, P = 0.021, two-tailed weighted permutation tests, Fig. 5d). Within the visual cortical hierarchy, timescales increased monotonically (P = 0.017, two-tailed weighted permutation test on the slope of the regression line across areas; Fig. 5c). The increase of evidence decoding timescales across the visual cortical hierarchy was mirrored by progressively slower intrinsic activity fluctuations during the pretrial interval (Extended Data Fig. 7), in line with studies in monkeys 25 . In sum, these results indicate feedforward processing of evidence from V1 to higher-tier visual regions to motor regions, in line with previous work 23,32 .
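Timescales of this kind are commonly estimated by fitting an exponential decay to an autocorrelation function. The sketch below is a generic illustration of that idea, not the authors' method: it recovers the known timescale of a simulated AR(1) process via a log-linear fit to its autocorrelation.

```python
import numpy as np

def intrinsic_timescale(x, max_lag=40):
    """Estimate a decay timescale (in samples) from the autocorrelation
    function, assuming roughly exponential decay ac(k) ~ exp(-k / tau)."""
    x = x - x.mean()
    lags = np.arange(1, max_lag)
    ac = np.array([np.corrcoef(x[:-k], x[k:])[0, 1] for k in lags])
    keep = ac > 0.05                     # log-fit only the reliable part
    slope, _ = np.polyfit(lags[keep], np.log(ac[keep]), 1)
    return -1.0 / slope

# AR(1) process with known timescale tau = -1 / log(phi)
rng = np.random.default_rng(3)
phi, n = 0.9, 100_000
eps = rng.normal(0.0, 1.0, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]
tau_hat = intrinsic_timescale(x)
tau_true = -1.0 / np.log(phi)            # about 9.5 samples
```

In this framing, 'slower intrinsic fluctuations' in higher-tier areas correspond to a shallower autocorrelation decay and hence a larger fitted tau.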
The feedforward scheme suggested by Fig. 5a-d and Extended Data Fig. 7 stood in sharp contrast to the signatures of LLR encoding in alpha-band lateralization ( Fig. 5e-h). Alpha-band LLR encoding latency of V1 lagged that of extrastriate regions (P < 0.0001, two-tailed weighted permutation test) but not that of motor regions (P = 0.11; Fig. 5e), and the LLR encoding timescale was slower in V1 than in extrastriate regions (P = 0.002) but not motor regions (P = 0.28; Fig. 5g). Alpha-band LLR encoding timescales monotonically decreased across the visual cortical hierarchy (P = 0.0004, two-tailed weighted permutation test; Fig. 5h, compare with Fig. 5c). In sum, the dynamics (latencies and timescales) of LLR encoding in alpha-band lateralization were also consistent with a feedback signal.
Previous monkey 17 and human 27 physiological studies of perceptual choice tasks without change points inferred decision-related feedback from the covariation between intrinsic (stimulus-independent) fluctuations of neural activity and behavioral choice. Here, we computed these fluctuations as the residuals, over and above the activity explained by the components of the normative decision variable depicted in Fig. 3h (model 6, Methods). Fluctuations in the alpha- and beta-band lateralization of IPS/PCeS, PMd/v and M1 were robustly predictive of choice toward the end of the trial (Fig. 6a, right) but not at the beginning of the trial (Fig. 6b, right). Choice-predictive fluctuations are expected for these downstream regions, because they encode the action plan that ultimately controls the behavioral choice, even when that choice deviates from the choice prescribed by the normative decision variable. Further, this choice-predictive, action-related activity is expected specifically for such late trial intervals because, due to possible changes of mind, decision states earlier in the trial could differ substantially from the choice-determining state toward the end of the trial (Fig. 3c).
Remarkably, the early visual cortex (V1 and weakly in V2-V4), but not higher-tier visual field maps, exhibited similarly late, choice-predictive fluctuations, specifically expressed in the lateralization of alpha-band activity (Fig. 6a). Again, there was no effect earlier in the trial (Fig. 6b), ruling out the possibility that this visual cortical alpha-band signal might reflect a biased baseline state before the start of evidence accumulation, due, for example, to slow attentional drift or choice history effects across trials 17,33 .
In sum, we found similar encoding of the evolving decision state in the alpha band from the motor cortex and V1 and more prominent decision signatures when progressing backward along the visual cortical hierarchy, from the extrastriate cortex to V1. Because MEG source reconstruction is limited by signal leakage, a possible concern is that the alpha-band effects observed in the visual cortex reflect signal leakage from downstream motor regions. However, the absence, or even sign reversal (Fig. 4b and Fig. 6a, top middle), of beta-band effects in the visual cortex renders this possibility unlikely.

Other work has speculated that decision-related feedback to the sensory cortex may stabilize the evolving decision state 15,17 , akin to confirmation bias 34 . Such an active consolidation of the evolving decision through feedback may be beneficial in relatively stationary environments with protracted periods of stability, as was the case in our task (H ≪ 0.5). But it would be disadvantageous in highly volatile environments that are likely to change. In another experiment, we assessed whether the feedback signatures in the early visual cortex adapt to changing levels of environmental volatility. We used the same task as in Fig. 1 but manipulated the volatility to be either similarly low, as before (H = 0.1), or high (H = 0.9) in different blocks (Fig. 7a and Methods).
All but one participant adapted their behavior to these different settings (Fig. 7c) and again performed in line with the normative evidence accumulation scheme (Fig. 7d and Extended Data Fig. 8b; the remaining participant based choices on only the last evidence sample). Note that the contribution of −|ψ| to normative accumulation was negligible for the generative settings of this experiment (Extended Data Fig. 8a). For the low-volatility condition, we again replicated the consistently positive evidence weighting on choice, stronger weighting of late evidence and upweighting of evidence associated with high CPP (Fig. 7d, gray; compare with Fig. 2b and Extended Data Fig. 3). By contrast, in the high-volatility condition, evidence-weighting time courses switched sign from sample to sample, again as predicted by the normative evidence accumulation scheme (Fig. 7d, black).
For low volatility, we also replicated the pattern of LLR encoding (positive for gamma (P = 0.0028 for V1, P < 0.0001 for extrastriate regions V2-IPS3), negative for alpha (P < 0.0001 for V1 and V2-IPS3); Fig. 7e) and of choice-predictive fluctuations in the lateralization of early visual cortical activity (P = 0.036, all two-tailed permutation tests; Fig. 7f). Critically, both signatures adapted to the statistics of the environment (Fig. 7e,f). The strength of the (negative) alpha-band LLR encoding was significantly reduced under high volatility, both in V1 (P = 0.003, two-tailed permutation test) and the extrastriate visual cortex (V2-IPS3 pooled, P = 0.0002; Fig. 7e). The opposite was true for the (positive) LLR encoding in the gamma band, which was enhanced under high volatility (V1, P = 0.022; V2-IPS3, P = 0.040). Likewise, the pattern of choice-predictive fluctuations in V1 alpha-band activity also changed from low to high volatility (difference, P = 0.004), with a switch from negative to positive sign under high volatility (P = 0.011, two-tailed permutation test; Fig. 7f), possibly indicating an active destabilization of the evolving decision state in this environmental setting. In sum, high environmental volatility seemed to boost the bottom-up sensory evidence encoding (gamma band) and reduce the feedback signaling in early visual cortical alpha-band activity, suggesting flexible adaptation of these signal features to the stability of the environment.

Phasic arousal shapes accumulation and visual cortex state.
Cortical circuit dynamics are not only shaped by intracortical feedback signals but also by input from brainstem arousal systems 28,35 . Transient (phasic) arousal signals, specifically from the noradrenaline system, might dynamically adjust the impact of new evidence on belief updating, particularly when the evidence is surprising 36-38 . In a final set of analyses of data from the main experiment, we used rapid, sample-evoked dilations of the pupil to assess the involvement of brainstem arousal 39,40 . We quantified rapid pupil responses as the first derivative (that is, the rate of change) of the raw response to increase temporal precision (Fig. 8a) and specificity for noradrenaline release 41 .
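Computing the response derivative is straightforward. The sketch below (our illustration, with a made-up response shape and an assumed sampling rate) shows why the derivative sharpens timing: for a slow, unimodal pupil response, the derivative peaks during the rising flank, before the raw trace does.

```python
import numpy as np

def pupil_derivative(pupil, fs):
    """First temporal derivative of a pupil trace, in units per second."""
    return np.gradient(pupil, 1.0 / fs)

fs = 50.0                                # assumed sampling rate (Hz)
t = np.arange(0, 3, 1 / fs)
response = t ** 2 * np.exp(-t / 0.4)     # toy slow, unimodal pupil response
rate = pupil_derivative(response, fs)
```

Because the derivative emphasizes the onset of dilation rather than its slow peak, sample-evoked responses become easier to attribute to individual samples in a rapid sequence.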
Samples strongly indicative of a change in environmental state evoked strong pupil responses (Fig. 8b,c). We quantified this effect by fitting a linear model consisting of model-derived variables including the CPP and −|ψ| values associated with each sample to the corresponding pupil response (model 7, Methods). We found a robust positive contribution of CPP (P < 0.0001, two-tailed cluster-based permutation test) in addition to a weaker positive contribution of −|ψ| (P = 0.023). Thus, in our task, phasic arousal was recruited by the two computational quantities, CPP and −|ψ|, that modulate evidence weighting in the normative model.
Sample-to-sample fluctuations in the evoked arousal response predicted an 'upweighting' of the impact of the associated evidence sample on choice (Fig. 8d-f). Variations in pupil responses beyond those explained by the above computational variables exhibited a positive, multiplicative interaction with LLR in its impact on choice (Fig. 8d; P = 0.003, two-tailed cluster-based permutation test). The magnitude of this effect also correlated with a participant-specific gain parameter, estimated in our normative model fits, which quantified the weighting of belief-inconsistent evidence beyond the upweighting entailed in the normative model (Fig. 8e; peak r = 0.66, uncorrected P = 0.004, two-tailed).
The pupil-predicted upweighting of the behavioral impact of evidence (Fig. 8d) was accompanied by a corresponding modulation of LLR encoding across the visual cortical hierarchy (Fig. 8f). Here, we expanded our decomposition of cortical dynamics (Fig. 4) by a term that reflected the modulation of evidence encoding by the associated pupil response (model 8, Methods). This showed a transient modulatory effect of pupil responses on selective evidence encoding in the power lateralization of visual cortical areas (Fig. 8f). The selective pupil-linked effect was superimposed onto a more sustained, global (across hemispheres) effect on low-frequency power across the visuo-motor pathway (Extended Data Fig. 9). The selective effect was specifically expressed in alpha-band power lateralization (Fig. 8f) and not evident in downstream, action-related regions (all P > 0.3; Fig. 8f). Thus, phasic arousal modulated evidence encoding in the visual cortical system in the specific frequency band that our previous analyses of the same neural data had linked to decision feedback.

Discussion
Computational work has developed abstract, normative solutions for decision-making problems 13,14 or simulated the underlying synaptic interactions in cortical microcircuits 15,16 . We reasoned that the possibility of change, a feature of natural environments 42 , is critical for developing an understanding that bridges between these different levels of analysis 43 . Previous studies of perceptual decisions in changing environments focused on behavior 13,14,19 . Here, we identified the cortical representation of an approximately normative decision variable for changing environments. We found that (1) this decision variable is encoded in the build-up activity of cortical regions involved in action planning and (2) its adaptive computation naturally emerges from recurrent interactions in cortical microcircuits. This forges a tight link between existing circuit models and normative evidence accumulation.
In addition, we found that the state of the visual cortex, expressed in activity fluctuations in the alpha frequency band, is shaped by an adaptive, stabilizing feedback of the evolving decision variable as well as pupil-linked arousal. Here, we conceptualize 'cortical state' as cortical activity distinct from feedforward sensory responses. Our data are in line with the notion that the cortical state entails an interplay of selective cortical feedback and neuromodulatory signals 44 . Thus, our findings from the visual cortex call for an extension of models for evidence accumulation by multiple processing stages incorporating feedback and neuromodulatory input. We propose that these features enable the brain to adapt evidence accumulation to a range of environmental statistics.
Our account starts from the realization that adaptive evidence accumulation in volatile environments typically entails the non-linear modulation of evidence weighting by CPP and uncertainty. The impact of these modulations varies across environments. It is negligible in contexts that preclude the formation of strong belief states: high noise and/or a rate of change that renders the environment unpredictable (H ≈ 0.5). Correspondingly, a linear, leaky accumulator model fits the behavior of rats (with strong internal noise) well in an auditory task similar to ours 19 . Yet, these modulations make a substantial contribution to normative accumulation in a wide range of contexts (Extended Data Fig. 1).
We reiterate that CPP and uncertainty are not explicit components of the normative model but served as vehicles for identifying diagnostic behavioral and neural signatures of normative evidence accumulation (see Extended Data Fig. 10 for the relationship to commonly used measures of surprise). In line with other work 37,38 , we found that both variables were encoded in pupil-linked arousal responses (and CPP more strongly so than other surprise metrics; Extended Data Fig. 10e). These observations further validate our dissection of the adaptive accumulation process.
Previous work on statistical learning showed that humans tune their learning rate as a function of surprise and uncertainty 45,46 . The learning tasks from this work operated on timescales at least an order of magnitude slower than that for our task. Yet, our results highlight an analogy between the adjustment of learning rate by surprise, uncertainty or pupil-linked arousal and the modulation of perceptual evidence weighting. It will be interesting to assess whether this analogy originates from shared circuit mechanisms. The previous work on dynamic adjustments of learning focused on identifying neural correlates of the variables that control the regulation (that is, surprise or uncertainty) 45,46 . By contrast, our study illuminates how these variables modulate the selective cortical signals that encode the result of the computation, the belief state.
The computation of the decision variable entails recurrent network interactions that produce timing jitter 16,47 . Consequently, neural signatures of the decision variable may not be precisely phase locked to the onset of evidence samples but rather manifest in slower variations of the amplitude of local field potential oscillations picked up by MEG 47 . Further, cognitive variables are often encoded in specific frequency bands, sometimes with opposite sign between bands 47 . We therefore reasoned that our frequency-resolved encoding analysis should be well suited for detecting correlates of the decision variable. Our results corroborate this reasoning.
Our results also support the notion that attractor dynamics in frontal and parietal cortical circuits implement the formation and maintenance of decision states 16,48,49 : a circuit model with attractor dynamics accurately reproduced the detailed characteristics of both behavior and build-up activity in the parietal and frontal regions involved in action planning. One signature of attractor dynamics during decision making is the sigmoidal relationship between cortical activity and model decision variables found in Fig. 3e and elsewhere 8,32 . Here, we identified an adaptive function of these dynamics: balancing stable accumulation of evidence against flexible responses to change. Yet, these specific circuit dynamics will only be beneficial in a limited range of environments. Changes in environmental statistics may require a tuning (for example, through neuromodulatory input) or reconfiguration (for example, through cortical feedback) of the circuit properties.
Feedback signals to the sensory cortex from downstream cortical regions during perceptual evidence accumulation could convey predictions that are compared with incoming sensory data in an iterative fashion 50 . Feedback may also mediate the impact of expectations inherited from previous trials 17,33 . When originating from downstream accumulator circuits, feedback may also consolidate the evolving network state 15,27,34 . We find that decision feedback emerges during, not before, evidence accumulation and depends on environmental volatility. These findings support the notion of an adaptive feedback mechanism stabilizing evolving belief states when needed. The particularly pronounced signatures of decision feedback in V1 (stronger than in any other visual cortical area) may point to a special role for the early visual cortex in perceptual evidence accumulation.
An influential account holds that surprise-related phasic responses of the brainstem norepinephrine system cause a shift toward more 'bottom-up' signaling relative to 'top-down' signaling and thus a greater impact of new evidence on the evolving belief 36,38 . This is broadly consistent with our observation that pupil responses to evidence indicative of environmental state change predicted an upweighting of the impact of that sample on choice. However, if phasic arousal boosted bottom-up signaling, the associated modulations of cortical evidence encoding would be expected in visual cortical gamma-band responses 29,30 and should also affect action-related regions (that is, stronger evidence encoding in motor alpha- and beta-band activity). Instead, we found pupil-dependent modulations of evidence encoding only in the visual cortex, not in downstream regions, and only in the alpha band, not in the gamma band. It is tempting to speculate that, in the current settings, an enhancement of feedback signals to the visual cortex by phasic arousal stabilizes the updated belief state induced by evidence indicative of change.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41593-021-00839-z.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Methods
We report data from three experiments using different variants of the behavioral task from Fig. 1 (see also Fig. 3). Experiment 3 was a MEG experiment following up on experiment 1 (Fig. 7 and Extended Data Fig. 8).
Participants. The study was approved by the ethics committee of the Hamburg Medical Association and included a total of 51 human participants including the first author. All participants provided written informed consent, had normal or corrected-to-normal vision and no history of psychiatric or neurological diagnosis. Participants received remuneration in the form of an hourly rate, a performance-dependent bonus and a bonus for completing all planned sessions. For experiment 1, 17 participants (mean ± s.d. age of 28.22 ± 3.89 years, range 23-36; 11 females) completed one training session (120 min), three (16 participants) or four (the first author) sessions of the main experiment (150 min each) and one session to obtain a structural magnetic resonance image (MRI) (30 min).
For experiment 2, four participants (ages of 25, 26, 27 and 28 years; two males) completed one training session (120 min) and three (one participant) or four (three participants) experimental sessions (120 min each).
For experiment 3, thirty participants (age, 26.07 ± 3.49 years; range, 20-35 years; 18 females) completed one training session (120 min), three sessions of the main experiment (480 min each) and one session to obtain a structural MRI (30 min). This experiment also included a randomized, double-blind pharmacological intervention (cross-over design; 40 mg atomoxetine, 5 mg donepezil or placebo). Drug-related participant exclusion criteria were smoking, consumption of >15 units of alcohol per week, illegal drug use, pregnancy, medication intake, glaucoma, pheochromocytoma, cardiovascular disease or known hypersensitivity to atomoxetine or donepezil. Here, we report results pooling data over drug conditions; drug effects will be reported in a separate paper.
Behavioral tasks. Stimuli were generated using Psychtoolbox-3 for MATLAB 51 . Visual stimuli were back projected on a transparent screen using a Sanyo PLC-XP51 projector at 60 Hz during MEG and on a VIEWPixx monitor in a psychophysics room. Participants were seated 61 cm from the screen during MEG or with their head in a chinrest 60 cm from the monitor in the psychophysics room (training; all of experiment 2). For experiments 1 and 2, the experimental sessions took place on consecutive days. They comprised between seven and nine blocks (76 trials each) with 2-10-min breaks between blocks. For experiment 3, sessions comprised 16 blocks (86 trials each), including 2-10-min breaks between blocks and a 45-60-min break between blocks 8 and 9. Each block was followed by feedback about the mean choice accuracy for that block and a running mean for that session. Participants completed 22-33 blocks (1,628-2,508 trials) in experiment 1, 33-47 blocks (2,100-3,174 trials) in experiment 2 and 42-51 blocks (3,612-4,386 trials) in experiment 3.
Main task. The main task across experiments was a two-alternative forced choice task in which the generative task state S = (left, right) could change unpredictably. Participants were asked to maintain fixation at a centrally presented mark throughout the trial, monitor a sequence of evidence samples and report their inference about S at the end of the sequence.
Stimuli were presented against a gray background. Three placeholders were present throughout each trial: a light-gray vertical line extending downward from fixation to 7.4° eccentricity; a colored half-ring in the lower visual hemifield (polar angle, from −90 to +90°; eccentricity, 8.8°), which depicted the LLR associated with each possible sample location; and a fixation mark as a black disc 0.18° in diameter superimposed onto a disk 0.36° in diameter with varying color informing participants about trial intervals. The colors comprising this half-ring and the fixation point were selected from the Teufel colors 52 . Evidence samples consisted of achromatic, flickering checkerboards (temporal frequency, 10 Hz; spatial frequency, 2°) within a circular aperture (diameter = 0.8°) and varied in polar angle (constant eccentricity of 8.1°).
In experiment 1, samples were presented for 300 ms (sample-onset asynchrony (SOA), 400 ms). Samples were centered on polar angles x_1 ,…, x_n drawn from one of two truncated Gaussian distributions p(x|S) with σ_left = σ_right = 29° and means symmetric about the vertical meridian (μ_left = −17°, μ_right = +17°). If a draw x_i was <−90° (>+90°), it was replaced with −90° (+90°). S was chosen at random at the start of each trial and could change with a hazard rate H = p(S_n = right|S_n−1 = left) = p(S_n = left|S_n−1 = right) = 0.08 after each sample. Seventy-five percent of sequences contained 12 samples. The remaining 25% (randomly distributed in each block) contained (with equal probability) 2-11 samples and were introduced to encourage participants to attend to all samples. Each trial began with a variable fixation baseline period (uniform between 0.5 and 2.0 s) during which a stationary checkerboard patch was presented at a polar angle of 0°. This checkerboard then began to flicker, followed after 400 ms by the first evidence sample. The final sample of the sequence was then replaced by the stationary 0° patch, and a 'Go' cue instructed participants to report their choice 1.0-1.5 s (uniform) after the sequence end. They pressed a button with their left or right thumb to indicate left or right, respectively. Auditory feedback 0.1 s after the response informed participants whether their choice corresponded to the true S at the sequence end: an ascending 350 → 950 Hz tone (0.25 s) for a correct choice and a descending 950 → 350 Hz tone for an incorrect choice. This was followed by an intertrial interval of 2 s during which participants could blink. During the preparatory interval, sample sequence and subsequent delay, the color of the second fixation disk was light red; the 'Go' cue was this disk becoming light green; and the intertrial period was indicated by the disk becoming light blue. Experiments 2 and 3 used the same task, with the following exceptions.
In experiment 2, samples were presented either for 100 ms with 200 ms SOA or for 500 ms with 600 ms SOA. Presentation duration and SOA alternated between task blocks. In experiment 3, H was pseudorandomly assigned to be 0.1 or 0.9 for each session, under the constraint that the starting H be chosen equally often across the group for each session, and every participant would start with each H at least once. It then remained fixed for eight task blocks and switched for the remaining eight blocks. The generative distributions in experiment 3 were μ left = −28°, μ right = +28°, σ left = σ right = 28°. In experiment 3, 65% of sample sequences contained ten samples, the remaining 35% ranged between 2-9 samples (uniform), the preparatory interval duration was 0.5-1.5 s, the time between sequence completion and the 'Go' cue was 0.7-1.2 s, and the intertrial interval was 1.2 s.
In all experiments, participants practiced the task in a psychophysics room a few days before the first session, with gradual increases in complexity and detailed information from the experimenter about task statistics.
Localizer task. In each MEG session, participants also completed one block of a localizer task for measuring motor preparatory activity for the hand movement used to report choice. Participants fixated the central fixation mark while a sequence of lexical cues and subsequent 'Go' cues for responding were presented. A trial began with presentation of one of two lexical cues ('LEFT' or 'RIGHT'; 15-point white Trebuchet font; 0.3 s) 1.25° above fixation. The task was to execute a button press with the corresponding thumb after a 1-s delay-fixation interval, marked by a color change at fixation. The block comprised 60 trials (30 left and 30 right responses, randomly distributed) and was administered after the sixth block of the decision-making task in each session.
Statistics and reproducibility. The sample size was chosen based on recent work using comparable methods (for example, ref. 27 ). Before data collection, we verified through simulations that our combinations of sample size and trial counts were sufficient to produce key behavioral effects even for high noise.
No participant was excluded from analysis in experiments 1 and 2. In experiment 3, one participant had subjective H values approaching 0.5, consistent with basing decisions only on the final evidence sample. Regression model fits for this participant did not converge. Consequently, they had to be excluded from all related analyses. We used standard criteria based on previous studies to exclude single trials in which MEG or eye-tracking data were contaminated by artifacts.
All experimental manipulations analyzed here were repeated within individual participants and administered without blinding of experimenters. All participants were allocated to the same group for analysis. Trial orders were fully randomized within and across participants using a random seed for each session.
Unless otherwise stated, permutation testing was used for statistical analysis (two-tailed; 10,000 permutations). When applied to time series or time-frequency maps, we implemented cluster-based correction for multiple comparisons 53 (cluster-forming threshold, P < 0.05). In cases in which the reliability of an estimator depended on the strength of an underlying effect (latencies and timescales of LLR encoding and decoding traces; see below), we used weighted permutation tests. Each participant p was assigned a weight, w_p, determined by effect strength (described in a case-specific manner below) and constructed such that Σ_p w_p = 1. w_p determined the contribution of each participant to the weighted mean over participants and the probability that they would be drawn, with replacement, in iterations of the permutation test.
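A minimal Python sketch of such a weighted permutation test (the resampling scheme below, drawing participants with probability w_p and flipping signs under the null, is our assumption of one reasonable implementation, not the paper's exact code):

```python
import numpy as np

def weighted_perm_test(values, weights, n_perm=10000, seed=0):
    """Two-tailed test of a weighted mean against zero. Under the null,
    participants are resampled with replacement with probability given by
    their weight, and signs are flipped at random."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # enforce sum_p w_p = 1
    observed = np.sum(w * values)            # weighted mean over participants
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.choice(len(values), size=len(values), p=w)
        signs = rng.choice([-1.0, 1.0], size=len(values))
        null[i] = np.mean(signs * values[idx])
    p = np.mean(np.abs(null) >= abs(observed))
    return observed, p
```

With a consistently positive effect across participants, the returned P value is small, as expected.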
Normative model. The normative model for the main task prescribes the following computation 13 :

L_n = ψ_n + LLR_n (1)

ψ_{n+1} = L_n + log[(1 − H)/H + exp(−L_n)] − log[(1 − H)/H + exp(L_n)] (2)

Here, L_n was the observer's belief after encountering the evidence sample x_n, expressed in log-posterior odds of the alternative task states; LLR_n was the log-likelihood ratio reflecting the relative evidence for each alternative carried by x_n (LLR_n = log(p(x_n|right)/p(x_n|left))); and ψ_n was the prior expectation of the observer before encountering x_n. We used this model to derive two computational quantities: CPP and −|ψ|. CPP was the posterior probability that a change in generative task state had just occurred, given the expected H, the evidence carried by x_n and the observer's belief before encountering that sample, L_{n−1} (see Supplementary Note 1 for derivation). Uncertainty was defined as the negative absolute value of the prior (−|ψ_n|), reflecting uncertainty about the generative state before observing x_n (Fig. 1e).
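The accumulation prescribed by equations (1) and (2) can be sketched in a few lines of Python (function name and interface are ours; note that for H = 0.5 the prior collapses to zero after every sample, while for H → 0 the mapping approaches lossless accumulation):

```python
import numpy as np

def normative_update(llr_seq, H):
    """Accumulate a sequence of LLRs with the non-linear prior update of
    equations (1)-(2). Returns the belief L after each sample and the
    prior psi held before each sample."""
    psi = 0.0
    L_hist, psi_hist = [], []
    for llr in llr_seq:
        psi_hist.append(psi)
        L = psi + llr                        # equation (1): belief update
        # equation (2): non-linear mapping of belief onto the next prior
        psi = L + np.log((1 - H) / H + np.exp(-L)) \
                - np.log((1 - H) / H + np.exp(L))
        L_hist.append(L)
    return np.array(L_hist), np.array(psi_hist)
```

For intermediate H, the prior is attenuated toward zero relative to the belief (|ψ_{n+1}| < |L_n|), which is what makes the accumulation sensitive to change points.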
We evaluated the impact of CPP and −|ψ| on evidence accumulation as a function of H and the SNR of the generative distributions (difference between the distribution means divided by their s.d.). For each point on a 5 × 5 grid (H = (0.01, 0.03, 0.08, 0.20, 0.40), SNR = (0.4, 0.7, 1.2, 2.0, 5.0)), we simulated 10,000,000 observations, passed these through equations (1) and (2), calculated CPP and −|ψ| and then fit the following model (model 1):

ψ_{n+1} = β_0 + β_1·LLR_n + β_2·LLR_n·log(CPP_n) + β_3·LLR_n·(−|ψ_n|) + β_4·ψ_n + ε (3)

where CPP_n was log transformed to reduce skew, log(CPP_n) and −|ψ_n| were z scored, and all β are fitted regression coefficients. We assessed the contribution of each term to the updated prior ψ_{n+1} by calculating its coefficient of partial determination:

CPD = (SSR_reduced − SSR_full) / SSR_reduced

where SSR_full was the sum of squared residuals of equation (3), and SSR_reduced was the sum of squared residuals of a model excluding the term of interest. We repeated the analysis for the H = (0.1, 0.9) used in experiment 3, here submitting CPP to the logit transform, and for two other quantities defined in Extended Data Fig. 10: 'unconditional' and 'conditional' Shannon surprise.
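The coefficient-of-partial-determination computation can be sketched generically (a plain least-squares version; variable names are ours):

```python
import numpy as np

def cpd(X, y, term):
    """Coefficient of partial determination for column `term` of the
    design matrix X: (SSR_reduced - SSR_full) / SSR_reduced, where the
    reduced model drops that column."""
    def ssr(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return resid @ resid
    ssr_full = ssr(X)
    ssr_reduced = ssr(np.delete(X, term, axis=1))
    return (ssr_reduced - ssr_full) / ssr_reduced
```

A regressor that drives the dependent variable yields a CPD near 1; an irrelevant regressor yields a CPD near 0 (it cannot be negative for nested least-squares models).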
Modeling of behavior. Normative model. We first computed the accuracy of participants' choices with respect to the true final generative state and compared this to the accuracy yielded by three idealized decision processes presented with identical stimulus sequences: normative accumulation (equations (1) and (2)), perfect accumulation and use of only the final evidence sample. For each trial, choice r (left = −1, right = +1) was determined by the sign of the log-posterior odds after observing all samples: r_trl = sign(L_{n,trl}) for normative accumulation, r_trl = sign(Σ_i LLR_{i,trl}) for perfect accumulation and r_trl = sign(LLR_{n,trl}) for the last sample, where n indicated the number of samples presented on trial trl. Second, for each strategy and participant, we computed choice accuracy as a function of the duration of the final state on each trial (from 1 on trials in which a change point occurred immediately before the final sample to 12 on full-length trials without change points). Third, we assessed the consistency between the choices of participants and each idealized strategy by computing the slope of a psychometric function relating log-posterior odds to human choice. For each strategy and participant, we normalized log-posterior odds across trials and described the probability of making a rightward choice as

r = γ + (1 − γ − λ) × [1 + exp(−α(DV_trl + δ))]^−1

where γ and λ were lapse parameters, δ was a bias, DV_trl was the z-scored log-posterior odds, and α was the slope, reflecting consistency between the choices produced by a strategy and those of a participant. We estimated γ, δ and α by minimizing the negative log-likelihood of the data using the Nelder-Mead simplex search routine and assumed γ = λ. We then fit variants of the normative model to participants' behavior, assuming that choices were based on the log-posterior odds L_{n,trl} for the observed stimulus sequence on each trial. L_{n,trl} was corrupted by a noise term ν, such that choice probability r was computed as r = Φ(L_{n,trl}/ν), with Φ the cumulative standard normal distribution.
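A sketch of such a lapse-rate psychometric function and its Nelder-Mead fit (assuming the standard form with symmetric lapses γ = λ, as in the text; function names and starting values are ours):

```python
import numpy as np
from scipy.optimize import minimize

def psychometric(dv, alpha, delta, gamma):
    """P(rightward choice) with slope alpha, bias delta and symmetric
    lapse rate gamma (gamma = lambda)."""
    return gamma + (1 - 2 * gamma) / (1 + np.exp(-alpha * (dv + delta)))

def fit_psychometric(dv, choices):
    """Maximum-likelihood fit via Nelder-Mead simplex search
    (choices coded 0 = left, 1 = right)."""
    def nll(params):
        alpha, delta, gamma = params
        gamma = np.clip(gamma, 1e-4, 0.49)
        p = np.clip(psychometric(dv, alpha, delta, gamma), 1e-9, 1 - 1e-9)
        return -np.sum(choices * np.log(p) + (1 - choices) * np.log(1 - p))
    return minimize(nll, x0=[1.0, 0.0, 0.05], method='Nelder-Mead').x
```

Simulating choices from a known slope and refitting recovers the generating parameters to within sampling error.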
We also allowed for misestimation of H, a bias in the mapping of stimulus location to LLR and a bias in the weighting of evidence samples that were (in)consistent with the existing belief. We fitted the parameters by minimizing the cross-entropy between participant and model choices:

e = −Σ_trl [r_trl × log(r̂_trl) + (1 − r_trl) × log(1 − r̂_trl)]

where r_trl was the participant choice (coded 0 = left, 1 = right) and r̂_trl was the model choice probability. The sum of e with any regularization penalty terms was minimized via particle swarm optimization (PSO toolbox 54 ), setting wide bounds on all parameters and running 300 pseudorandomly initialized particles for 1,500 search iterations. The relative goodness of fit of different model variants was assessed by the BIC:

BIC = 2e + k × log(n)

where k was the number of free parameters, and n was the number of trials. The motivation for the inclusion of terms in our model fits that allowed for deviations from the ideal strategy was as follows.
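The objective and model-comparison step can be written compactly (a sketch of the quantities only, not of the particle swarm pipeline itself):

```python
import numpy as np

def cross_entropy(p_right, r):
    """e: summed cross-entropy between model choice probabilities
    P(right) and participant choices r (0 = left, 1 = right)."""
    p = np.clip(p_right, 1e-9, 1 - 1e-9)
    return -np.sum(r * np.log(p) + (1 - r) * np.log(1 - p))

def bic(e, k, n):
    """BIC from the minimized cross-entropy e (the negative
    log-likelihood), k free parameters and n trials."""
    return 2 * e + k * np.log(n)
```

Lower BIC indicates a better trade-off between fit quality and number of free parameters.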
For selection noise, a noise term was only applied to the final log posterior, corrupting the translation of final belief into choice rather than the preceding accumulation process. The latter is also likely corrupted by noise 55 , as evident in our analyses of residual MEG and pupil fluctuations. We used selection noise for consistency with previous implementations 13 and because this yielded decent fits. The exact nature of internal noise is beyond the scope of the present study.
For subjective H, in line with previous work 13, 56 , we allowed for subjectivity in H. The model fits indicated that participants in experiment 1 tended to underestimate H (subjective H = 0.039 ± 0.005 (s.e.m.); Extended Data Fig. 5b).
For non-linear stimulus-to-LLR mapping, we allowed for a non-linearity in the mapping of polar angle onto LLR. This was motivated both by previous work 22 and by an analysis using logistic regression to estimate the subjective weight associated with samples falling into evenly spaced bins (spacing, 0.6 in true LLR space):

logit p(r_trl = right) = β_0 + Σ_k β_k × N_{k,trl}

where N_{k,trl} was the number of samples on trial trl with true LLR falling in bin k. Fitting this regression model showed that participants tended to give stronger weight to extreme samples (Extended Data Fig. 5g). We accounted for this in our model fits by estimating the mapping of polar angle to subjective LLR (LLR̂) as a non-parametric function fit to participants' choices, with a Tikhonov regularization penalty γ Σ_i (LLR̂_i − LLR̂_{i−1})² applied to its first derivative, where i indexed the interval of x at which LLR̂ was estimated, and γ = 1/20 was determined ad hoc. For (in)consistency bias, based on previous work 24,25 , we also allowed for larger weights for samples (in)consistent with the existing belief. A multiplicative gain factor g was selectively applied to such samples, such that LLR̂_n = LLR_n × g for any sample n where sign(LLR_n) ≠ sign(ψ_n). Indeed, participants assigned higher-than-optimal weight to samples that were inconsistent with their existing belief (fitted g = 1.40 ± 0.04 (s.e.m.); Extended Data Fig. 5d).
The full model fits to the data from experiment 1 consisted of 11 free parameters (noise, subjective H, eight stimulus-to-LLR mapping parameters and inconsistency gain). We compared these to fits of more constrained models lacking combinations of the above parameters (Extended Data Fig. 4a). Here, we reparameterized the stimulus-to-LLR mapping function for each sample n as follows:

LLR̂_n = β × sign(LLR_n) × |LLR_n|^κ

which was more constrained than the interpolated function but could produce convex (κ > 1), concave (κ < 1) or linear (κ = 1) mapping functions using only two free parameters, κ and β. We used the same general approach to fit model variants to the data from experiment 3, with the following exceptions: (1) subjective H could vary across objective H conditions; (2) we only allowed for linear scaling to be applied to true LLRs (LLR̂_n = β × LLR_n) and fixed β across H conditions; (3) we assessed whether g was best defined relative to the sign of ψ_n or L_{n−1} by comparing models assuming either definition (the normative model prescribes a sign flip in the transformation of L_{n−1} into ψ_n for H = 0.9 but not for H = 0.1); and (4) we fit model variants to assess whether noise or g might vary across H conditions.
We also fit a variant that estimated the L_{n−1}-to-ψ_n mapping directly from participants' choices. It was estimated as an interpolated non-parametric function assuming symmetry around L_{n−1} = 0 and ψ(L_{n−1} = 0) = 0 and replaced subjective H. ψ_n was estimated for values of L_{n−1} spread evenly between 1 and 10 in steps of 1. We applied Tikhonov regularization to the first derivative with γ = 1/2.
Alternative models. We fit three other accumulator models to the data that lacked key characteristics of the normative process. The first was a perfect accumulator that linearly integrated all evidence samples without loss. The second was a leaky accumulator that substituted equation (2) with the following L-to-ψ mapping:

ψ_{n+1} = (1 − λ) × L_n

where λ was the leak. The third model employed perfect accumulation toward non-absorbing bounds, substituting equation (2) for

ψ_{n+1} = min(max(L_n, −A), +A)

where A was the bound height. Fits of these models variously incorporated noise, non-linear LLR mapping and an inconsistency bias. The latter two models had the same number of free parameters as our fits of the normative model, while the former had one parameter fewer.
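The three L-to-ψ mappings can be sketched around a single generic accumulator loop (the linear-leak form (1 − λ)·L_n and hard clipping at ±A are our assumed parameterizations of "leak" and "non-absorbing bounds"):

```python
import numpy as np

def accumulate(llr_seq, prior_map):
    """Generic accumulator: L_n = psi_n + LLR_n, with the next prior
    psi_{n+1} given by the supplied L-to-psi mapping. Returns the
    final belief L."""
    psi = 0.0
    L = 0.0
    for llr in llr_seq:
        L = psi + llr
        psi = prior_map(L)
    return L

def perfect(L):
    return L                               # lossless accumulation

def leaky(L, lam=0.2):
    return (1 - lam) * L                   # leak toward zero

def bounded(L, A=3.0):
    return float(np.clip(L, -A, A))        # non-absorbing bounds
```

With `prior_map=perfect`, the final belief is simply the summed LLR; the leaky and bounded variants discount or saturate the carried-over belief, respectively.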
Psychophysical kernels. We quantified the time course of the impact of evidence on choice using logistic regression (model 2):

logit p(r_trl = right) = β_0 + Σ_i β_{1,i} × LLR_i + Σ_j β_{2,j} × LLR_j × log(CPP_j) + Σ_j β_{3,j} × LLR_j × (−|ψ_j|)

where i and j indexed sample position within sequences of n samples (12 for experiments 1 and 2; ten for experiment 3), and LLR was the true LLR. For experiment 3, we excluded the −|ψ| terms because they did not contribute to normative belief updating in this setting (Extended Data Fig. 8a). The dependent variable was participants' choice (left = 0, right = 1) or model choice probability. The set β_1 quantified the impact of evidence at each position on choice, and the sets β_2 and β_3 quantified modulations of evidence weighting by CPP and −|ψ|. CPP was log transformed (experiments 1 and 2) or logit transformed (experiment 3; see above), and both variables were then z scored. Additionally, all regressors were z scored across the trial dimension.
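A self-contained sketch of such a kernel regression (a plain Newton/IRLS logistic fit; the design-matrix layout mirrors the description of model 2, but the implementation details are ours):

```python
import numpy as np

def logistic_fit(X, y, n_iter=25):
    """Newton (IRLS) logistic regression; y coded 0 = left, 1 = right."""
    X = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        hess = X.T @ (X * W[:, None]) + 1e-6 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta

def kernel_design(LLR, logCPP, uncert):
    """Columns of the model-2 design matrix: per-sample LLRs plus their
    interactions with z-scored log-CPP and uncertainty (-|psi|)."""
    z = lambda a: (a - a.mean(axis=0)) / a.std(axis=0)
    return np.column_stack([LLR, LLR * z(logCPP), LLR * z(uncert)])
```

Fitting simulated choices generated from an unmodulated evidence sum recovers flat, positive LLR kernels and near-zero interaction kernels.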
Circuit model. Parameterization of the cortical circuit model was performed as described in ref. 16 , except for the following. First, the stimulus input function from the original model produced excessive primacy in evidence weighting, inconsistent with our data. We used a threshold-linear function symmetric around 40 Hz and imposed no upper bound on the input:

Input_{D1} = max(0, 40 − c × LLR), Input_{D2} = max(0, 40 + c × LLR)

where Input_x was the input (in Hz) to the choice-selective excitatory populations D_1 and D_2, and c multiplicatively scaled the LLR. We set c = 19, yielding an input of ~110 Hz for the strongest evidence strength, higher than that in ref. 16 . Second, the choice-selective recurrent connectivity from the original model (w_+, set to 1.7) produced strong and stable attractor states that prohibited changes of mind for all but the strongest evidence strengths. We set w_+ = 1.68, which allowed the model to better track change points in our task. Third, we set the integration time step dt = 0.2 ms (0.02 ms in the original model), verifying through simulations that this did not significantly change model behavior or dynamics. The model was simulated for all 12-sample trials presented to participants in experiment 1, from 0.4 s before sequence onset to 0.4 s after sequence offset. Stimulus input to D_1 and D_2 was present from 0 to 0.4 s after sample onset and was zero elsewhere. The model decision variable X was defined as X = r_{D2} − r_{D1}, with r_{D1} and r_{D2} the population firing rates (50-ms window). Model choice was determined by sign(X) at the end of each trial. The updated decision variable given a new sample was X at 0.4 s after sample onset. We estimated the model psychophysical kernels using CPP and −|ψ| from the ideal observer. Noise in the model was low relative to evidence SNR (we did not manipulate the model noise because we aimed to test whether the model could reproduce the qualitative features of the data); thus, kernel magnitudes were large compared to the participant kernels.
We applied a single multiplicative scaling factor (0.375, chosen manually) to the model kernels, which approximately matched kernel magnitudes with the participants while preserving relative differences between kernel types. We assessed different regimes of a reduction of the circuit model described by diffusion of a decision variable in a double-well potential 57 in Extended Data Fig. 6.
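The modified stimulus input function can be sketched as follows (the assignment of LLR sign to D_1 versus D_2 is our assumption; the 40-Hz offset, threshold-linearity and c = 19 follow the text):

```python
def stimulus_input(llr, c=19.0):
    """Threshold-linear stimulus input to the two choice-selective
    populations, symmetric around 40 Hz with no upper bound. c scales
    the LLR multiplicatively."""
    input_d1 = max(0.0, 40.0 - c * llr)    # population favoring 'left'
    input_d2 = max(0.0, 40.0 + c * llr)    # population favoring 'right'
    return input_d1, input_d2
```

For a zero-LLR sample, both populations receive the 40-Hz baseline; for a strong sample (|LLR| ≈ 3.7), the favored population receives ~110 Hz while the other is driven to the zero floor.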
Data acquisition. MEG data were acquired with 275 axial gradiometers (equipment and acquisition software, CTF Systems) in a magnetically shielded room (sampling rate, 1,200 Hz). Participants were asked to minimize movement. Head location was recorded in real time using fiducial coils at the nasal bridge and each ear canal. A template head position was registered at the beginning of the first session, and the participant was guided back into that position before each task block. Ag-AgCl electrodes measured the ECG, EOG and EEG (from 10-20 system locations Fz, Cz and Pz) with a nasion reference. Eye movements and pupil diameter were recorded at 1,000 Hz with an EyeLink 1000 Long Range Mount system (equipment and software, SR Research). T1-weighted MRIs were acquired to generate individual head models for source reconstruction.

MEG data analysis.
Preprocessing. MEG data were analyzed in MATLAB (MathWorks) and Python using a combination of FieldTrip 58 and MNE 59 toolboxes and custom-made scripts. Continuous data were segmented into task blocks, high-pass filtered (zero phase, forward-pass FIR) at 0.5 Hz and bandstop filtered (two-pass Butterworth) around 50, 100 and 150 Hz to remove line noise. For experiment 1 (main and localizer tasks), data were resampled to 400 Hz and epoched from 1 s before trial onset to the 'Go' cue (main task) or from 1 s before to 2 s after the lexical cue (localizer). Trials containing the following artifacts were discarded: translation of any fiducial coil >6 mm from the first trial, blinks (detected by the EyeLink algorithm), saccades >1.5° amplitude (velocity threshold, 30° s −1 ; acceleration threshold, 2,000° s −2 ), SQUID jumps (detected by Grubbs' outlier test for intercepts of lines fitted to single-trial log-power spectra), sensors with data range >7.5 pT and muscle signals (after applying a 110-140-Hz Butterworth filter and z scoring) of z > 20. For experiment 3 (main task), motion, SQUID jump, range and muscle artifacts were identified as described above but in continuous time series that typically consisted of four task blocks (excluding breaks). The remaining data for that recording were then subjected to temporal independent component analysis (infomax algorithm), and components containing blink or heartbeat artifacts were identified manually and removed. The resulting data were segmented into single trials as described above.
Spectral analysis. For each sensor, we subtracted the trial-averaged response from the single-trial time courses. We used a sliding-window Fourier transform to compute time-frequency representations (TFRs) of the single-trial activity: one Hanning taper (window length, 0.4 s; time steps, 0.05 s; frequency steps, 1 Hz; frequency smoothing, ±2.5 Hz) for the range 1-35 Hz and a sequence of discrete prolate spheroidal (Slepian) tapers (window length, 0.25 s; time steps, 0.05 s; frequency steps, 4 Hz; frequency smoothing, ±6 Hz) for the range 36-100 Hz. Fourier coefficients were converted into power by taking their absolute values and squaring. For sensor-level analyses, axial gradiometer data were decomposed into horizontal and vertical planar gradients before spectral analysis, and power values were recombined afterward. We converted the power values for each time point t, frequency f and sensor c into units of modulation via the decibel (dB) transform: dB_{t,f,c} = 10 × log10(power_{t,f,c} / baseline_{f,c}), where baseline_{f,c} was the trial-averaged power from 0.2-0.4 s before trial onset.
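The dB transform itself is a one-liner; NumPy broadcasting applies the per-frequency baseline across all time points (the time × frequency array layout below is our assumption):

```python
import numpy as np

def db_transform(power, baseline):
    """Convert power (time x frequency) into dB modulation relative to a
    per-frequency pre-trial baseline: 10 * log10(power / baseline)."""
    return 10.0 * np.log10(power / baseline)
```

Power equal to baseline maps to 0 dB; a doubling of power maps to ~3 dB.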
Sensor-level analyses. We used the localizer data to construct linear filters for isolating hand movement-selective motor preparatory activity ('motor filters'). Any trial on which a response was executed before the 'Go' cue was excluded from analysis. This was the case on the majority of trials for five participants, and their data were not used. Data from the remaining 12 participants were split into sensors covering the left ('ML*' in the CTF system) and right ('MR*') cerebral hemispheres. For each of 131 matching left-right pairs, we calculated a lateralization index LI for each time point, frequency and trial as the difference in dB t,f,c between the left and right sensor. We then fit the following linear regression model: LI t,f,c′,trl = β 0 + β 1 × r trl + Σ i=1…n β 1+i × session i,trl + ε, where t indicated the time point relative to trial onset, f indicated frequency, c′ indicated the sensor pair, r trl indicated the behavioral response, and session i were binary nuisance regressors absorbing any main effect of the session (n denoting the total session number per participant). t-scores associated with β 1 measured the strength of the motor response encoding in LI. We averaged these 0.7-1.1 s after the cue to generate a per-participant sensor × frequency map for the motor planning interval. Cluster-based permutation testing at the group level (here with cluster-forming threshold of P < 0.01) yielded a single cluster (P < 0.001) with an associated sensor × frequency matrix M of group-level t-scores, with all bins outside the cluster set to zero. From M, we generated three types of normalized motor filters: integrating M over space yielded a spectral filter; integrating M over frequency yielded a spatial filter; and not integrating at all yielded a spatio-spectral filter.
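The construction of the three filter types from M can be sketched as follows (the normalization to unit absolute sum is an illustrative assumption, not necessarily the paper's exact scheme):

```python
import numpy as np

def motor_filters(M):
    """Derive motor filters from a sensor x frequency matrix M of
    cluster-masked t-scores. Normalization to unit absolute sum is
    an illustrative choice."""
    spectral = M.sum(axis=0)        # integrate over space -> per-frequency weights
    spatial = M.sum(axis=1)         # integrate over frequency -> per-sensor weights
    spatio_spectral = M.copy()      # no integration: full sensor x frequency weights
    norm = lambda w: w / np.abs(w).sum()
    return norm(spectral), norm(spatial), norm(spatio_spectral)

# toy 4-sensor x 3-frequency t-score matrix with a small "cluster"
M = np.zeros((4, 3))
M[1, 2] = 2.0
M[2, 2] = 1.0
spec_f, spat_f, ss_f = motor_filters(M)
```

Applying a filter to data is then a dot product along the matching dimension(s), which is how the filters are used on the main-task LI data in the next section.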
Modeling motor preparatory activity. The motor filters were applied to LI data from the main task by computing the dot product between data and filter along the desired dimension(s). This yielded a motor preparatory signal ('motor' below). Application of the spatial filter yielded TFRs of motor preparatory activity, which were segmented from 0 to 1 s relative to the onset of each of the 12 samples per full-length trial. The resulting time × frequency × sample × trial matrix was used to assess the sensitivity of motor preparatory activity to computational variables. For simplicity, we derived computational variables from model fits in which the stimulus-to-LLR mapping was constrained to be linear (such that the subjective evidence LLR′ n = β × LLR n , where β set the slope of the mapping function and was a free parameter). Although this model variant yielded marginally worse goodness of fit than the full model, the computational variables generated by each were highly correlated (average r > 0.95 for ψ, L, CPP and −|ψ|).
We then fit the following linear model (model 3): motor t,f,s,trl = β 0 + β 1 × ψ s,trl + β 2 × LLR s,trl + β 3 × LLR s,trl × CPP s,trl + β 4 × LLR s,trl × (−|ψ| s,trl ) + ε, where motor t,f,s,trl was the motor preparatory activity at time t from the onset of the sample at position s. We averaged β 1-4 across the sample dimension, obtaining single time-frequency maps reflecting the strength with which each computational quantity was encoded in each participant's motor preparation signal.
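Estimating the coefficients of a model of this form at a single time-frequency bin reduces to ordinary least squares over trials; a minimal sketch with synthetic variables (all names and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500

# synthetic computational variables for one sample position
psi = rng.normal(size=n_trials)          # prior belief
llr = rng.normal(size=n_trials)          # evidence strength
cpp = rng.uniform(0, 1, size=n_trials)   # change-point probability
unc = -np.abs(psi)                       # uncertainty term -|psi|

# design matrix: intercept, psi, LLR, LLR x CPP, LLR x (-|psi|)
X = np.column_stack([np.ones(n_trials), psi, llr, llr * cpp, llr * unc])
true_beta = np.array([0.1, 0.5, 1.0, 0.8, 0.3])
y = X @ true_beta + 0.1 * rng.normal(size=n_trials)  # synthetic "motor" signal

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In the actual analysis this fit is repeated independently for every time-frequency bin and sample position, and the fitted coefficients are then averaged over samples.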
We also constructed models containing complementary parts of the 'full' model from equation (17) (models 4 and 5).
motor t,f,s,trl = β 0 + β 1 × L s,trl + ε (the 'belief model', equation (18)) and motor t,f,s,trl = β 0 + β 1 × LLR s,trl + ε (the 'evidence model', equation (19)). For each model, we computed a 'super-BIC' score quantifying the model's group-level goodness of fit as a function of t, f and s. This metric was calculated with the negative log-likelihood, number of free parameters and number of observations each given by the sum of the corresponding values across participants. We averaged across the sample dimension and subtracted BIC scores for the 'belief model' (equation (18)) from those for the 'evidence model' (equation (19)) to generate a map of relative goodness of fit.
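The pooling step can be sketched as follows, assuming the standard BIC formula BIC = 2·NLL + k·ln(n) applied to the summed quantities (the numbers below are purely illustrative):

```python
import math

def super_bic(neg_log_liks, n_params_each, n_obs_each):
    """Group-level 'super-BIC': sum negative log-likelihoods, free-parameter
    counts and observation counts across participants, then apply
    BIC = 2 * NLL + k * ln(n)."""
    total_nll = sum(neg_log_liks)
    total_k = sum(n_params_each)
    total_n = sum(n_obs_each)
    return 2.0 * total_nll + total_k * math.log(total_n)

# toy comparison of two models over three participants
bic_belief = super_bic([120.0, 95.0, 110.0], [2, 2, 2], [400, 400, 400])
bic_evidence = super_bic([150.0, 130.0, 140.0], [2, 2, 2], [400, 400, 400])
delta = bic_evidence - bic_belief   # > 0 favors the belief model here
```

Subtracting the two scores, as in the text, yields a map whose sign indicates which model provides the better group-level account at each bin.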
We characterized the relationship between motor t,f,s,trl and model-derived belief state by applying the spatio-spectral filter to the main task data, yielding a scalar measure of motor preparation per time point. We averaged from 0.4-0.6 s and z scored across trials separately for each sample position s and session. We normalized the posterior log odds L s in an analogous fashion to remove any differences in the range of L across participant-specific model fits. For each sample position and participant, we sorted the signal by normalized L s into 11 equal-sized bins and calculated the mean of both metrics per bin. Averaging across sample positions and participants yielded a group average belief encoding function, which we rescaled to the range +1 to −1. We repeated the procedure, replacing the motor preparation signal with ψ s+1 from the normative model (generating the L-to-ψ mapping for the normative model, with the same per-participant normalization, binning and averaging). We further repeated the procedure using the updated decision variable of the circuit model.
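The binning procedure can be sketched as follows (synthetic data with a saturating, tanh-shaped encoding function; the encoding shape and trial counts are illustrative):

```python
import numpy as np

def bin_by_belief(signal, L, n_bins=11):
    """Sort trials by belief L, split into n_bins equal-sized bins, and
    return the per-bin means of both L and the signal."""
    order = np.argsort(L)
    bins_L = np.array_split(L[order], n_bins)
    bins_s = np.array_split(signal[order], n_bins)
    return (np.array([b.mean() for b in bins_L]),
            np.array([b.mean() for b in bins_s]))

rng = np.random.default_rng(1)
L = rng.normal(size=660)                            # normalized posterior log odds
signal = np.tanh(L) + 0.2 * rng.normal(size=660)    # saturating "motor" encoding
L_means, s_means = bin_by_belief(signal, L)
```

Plotting s_means against L_means produces the belief encoding function; a saturating shape in this curve is the signature compared against the normative L-to-ψ mapping in the text.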
Single-trial correlations between motor preparatory activity and decision variables were quantified for each full-length trial by extracting scalar values of the relevant variables for each sample position s (motor, averaged from 0.4-0.6 s after sample onset) and by computing the Pearson correlation over sample positions between motor and model variables.
Source analyses. We used linearly constrained minimum variance (LCMV) beamforming to estimate activity time courses at the level of cortical sources 60 . We constructed individual three-layer head models from structural MRI scans using FieldTrip 58 . Four participants lacked individual MRIs, in which case we used the FreeSurfer average participant. Head models were aligned to the MEG data by a session-specific transformation matrix generated with MNE 59 . We reconstructed cortical surfaces from MRIs using FreeSurfer 61,62 and aligned anatomical atlases (see below) to each surface. We then used MNE to compute LCMV filters confined to the cortical surface (4,096 vertices per hemisphere, recursively subdivided octahedron) from a covariance matrix computed on the time points from the cleaned and segmented data. The covariance matrix was computed across trials, from trial onset to 6.2 s (experiment 1) or 5.4 s (experiment 3) after onset. We chose the source orientation with maximum output source power at each cortical location. The LCMV filters were used to project single-trial time series as well as their complex-valued spectrograms into source space. We computed TFRs of power at each vertex by first aligning the polarity of time series at neighboring vertices (the beamformer output potentially included arbitrary sign flips for different vertices) and then converting the complex Fourier coefficients into power. We then averaged estimated power values across vertices within each ROI and baseline corrected using the dB transform (0.2-0.4 s before trial onset). We computed source-level lateralization indices LI t,f,trl,ROI by subtracting the power estimate for each left hemisphere ROI from that for the corresponding right hemisphere ROI.
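For a single source with leadfield l and sensor covariance C, the unit-gain LCMV spatial filter takes the standard form w = C⁻¹l / (lᵀC⁻¹l); a minimal numpy sketch with synthetic data (MNE's implementation additionally handles regularization and the maximum-power orientation selection described in the text):

```python
import numpy as np

def lcmv_filter(leadfield, cov):
    """Unit-gain LCMV beamformer weights: w = C^-1 l / (l^T C^-1 l)."""
    ci_l = np.linalg.solve(cov, leadfield)       # C^-1 l without explicit inverse
    return ci_l / (leadfield @ ci_l)

rng = np.random.default_rng(2)
n_sensors = 6
A = rng.normal(size=(n_sensors, n_sensors))
cov = A @ A.T + n_sensors * np.eye(n_sensors)    # well-conditioned sensor covariance
l = rng.normal(size=n_sensors)                   # leadfield of one cortical source
w = lcmv_filter(l, cov)
```

The unit-gain constraint (w·l = 1) means the filter passes activity from the target source undistorted while minimizing the variance contributed by all other sources.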
We analyzed the following fMRI-defined ROIs: (1) visual cortical field maps 63 , (2) regions exhibiting hand movement-specific lateralization (aIPS, IPS/PCeS and the M1 hand area 39 ) and (3) dorsal and ventral premotor cortex (PMd/v) 64 . Visual cortical field maps with shared foveal representation were grouped together 65 (Supplementary Table 1), thus increasing the spatial distance between ROI centers and minimizing the impact of signal leakage 60 (due to limited filter resolution or volume conduction).
We repeated model 3 fitting for each ROI, replacing motor t,f,s,trl with source-level lateralization indices LI t,f,s,trl . We extracted fitted β 2 coefficients and averaged across sample positions and 8-14 Hz for the following set: V1, V2-V4, V3A/B, IPS0/1, IPS2/3, IPS/PCeS, PMd/v and M1; reversed the sign of the resulting LLR encoding trajectory such that the peak encoding was positive; cubic-spline interpolated to millisecond temporal resolution; and normalized as follows: x t = x t / max(x). We calculated the latency to half maximum and the decay timescale τ of the normalized trajectories, the latter via an exponential decay function (equation (20)), x t = e −t/τ , fit to the right portion of the peak-aligned, normalized trajectory (via simplex, minimizing the sum of squared residuals).
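Both trajectory summary statistics can be sketched on a toy trace as follows (the exponential decay form e^(−t/τ) and the toy trajectory are illustrative assumptions; scipy's Nelder-Mead routine stands in for the simplex optimizer mentioned in the text):

```python
import numpy as np
from scipy.optimize import minimize

t = np.arange(0, 1.0, 0.001)      # millisecond-resolution time axis (s)
tau_true = 0.15
# toy normalized trajectory: linear rise to a peak at 0.2 s, then exponential decay
trace = np.where(t < 0.2, t / 0.2, np.exp(-(t - 0.2) / tau_true))

# latency to half maximum: first time point at which the trace reaches 0.5
latency_half = t[np.argmax(trace >= 0.5)]

# decay timescale: exponential fit to the portion right of the peak
peak = np.argmax(trace)
t_dec, y_dec = t[peak:] - t[peak], trace[peak:]
sse = lambda p: np.sum((y_dec - np.exp(-t_dec / p[0])) ** 2)
tau_hat = minimize(sse, x0=[0.1], method='Nelder-Mead').x[0]
```

On real, noisy encoding trajectories the same two numbers summarize how quickly each ROI starts encoding the evidence and how long that encoding persists.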
We averaged latency and timescale estimates over ROIs within three sets (V1; V2-IPS3; and IPS/PCeS, PMd/v and M1) and tested for parameter differences between set pairs using weighted permutation tests, with participant weights w p determined by the minimum strength of time-averaged LLR encoding across the pair. Participants with negative average LLR encoding for either ROI set were assigned w p = 0 for that test.
Gradients across the visual cortical hierarchy (V1, V2-V4, V3A/B, IPS0/1, IPS2/3) were assessed by fitting a line to the hierarchically ordered parameters and testing whether the slope of the fitted line differed from zero (weighted permutation test w p , determined as above). We also assessed a hierarchical gradient in the contributions of fast versus slow frequencies to 'intrinsic' activity fluctuations, computing power-law scaling exponents of power spectra from the pretrial period (FOOOF toolbox 66 ; Extended Data Fig. 7).
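Setting aside FOOOF's periodic (band-limited peak) components, the aperiodic power-law exponent can be approximated by the slope of a straight-line fit in log-log coordinates; a simplified numpy stand-in for the FOOOF fit (not the FOOOF algorithm itself):

```python
import numpy as np

def aperiodic_exponent(freqs, power):
    """Approximate the power-law exponent chi of power ~ 1/f^chi via a
    linear fit in log-log space. FOOOF additionally models band-limited
    peaks riding on this aperiodic component."""
    slope, _ = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope

freqs = np.arange(1.0, 121.0)
power = freqs ** -1.5            # synthetic, peak-free 1/f^1.5 spectrum
chi = aperiodic_exponent(freqs, power)
```

Smaller exponents indicate a relatively larger contribution of fast fluctuations, which is the quantity compared across the visual cortical hierarchy in Extended Data Fig. 7.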
Decoding of LLR from ROI responses. We implemented ridge regression in scikit-learn 67 to predict LLR using the evoked responses from all vertices across both cerebral hemispheres, separately for V1, V2-V4, V3A/B, IPS0/1, IPS2/3, IPS/PCeS, PMd/v and M1. This analysis was performed without subtracting out the trial-averaged evoked response. We low-pass filtered the single-trial source-reconstructed signal to 40 Hz, subsampled the data to 200 Hz, segmented from −0.1 to 1.4 s around sample onsets and baseline corrected the per-vertex signal by subtracting the mean across 0.075 s preceding sample onset. For each participant, ROI, sample position and peri-sample time-point, per-vertex signals were z scored across trials based on the training set (the same transformation was applied to the test set before evaluating prediction performance). We reduced dimensionality through principal component analysis of the training set, keeping components cumulatively explaining 95% of training set variance and projecting training and test data into the subspace defined by these components. We fit the decoder using tenfold cross-validation and an L2 regularization term of α = 10. Prediction performance was quantified as the Pearson correlation between predicted and actual LLR. The latency to half maximum and decay timescale of sample-averaged LLR decoding were estimated as described for the alpha-band encoding analysis, with the exception that timescales were estimated using decoding traces starting 0.05 s after the latency of peak decoding precision, so that the transient peak in decoding precision did not contribute to timescale estimates. Parameter differences between ROI were assessed via weighted permutation test, with w p determined by the minimum time-averaged decoding precision across the ROI being compared.
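The core of the decoder is a ridge fit, shown here in closed form with plain numpy rather than scikit-learn (the z-scoring, PCA and cross-validation steps described above are omitted; data dimensions and values are illustrative, with α = 10 as in the text):

```python
import numpy as np

def ridge_fit(X, y, alpha=10.0):
    """Closed-form ridge regression: w = (X'X + alpha * I)^-1 X'y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 20))              # trials x features (e.g., PCA scores)
w_true = rng.normal(size=20)
llr = X @ w_true + 0.5 * rng.normal(size=600)  # synthetic target variable

w = ridge_fit(X, llr, alpha=10.0)
pred = X @ w
precision = np.corrcoef(pred, llr)[0, 1]    # decoding precision as Pearson r
```

In the actual analysis the fit is cross-validated, so precision is computed on held-out trials rather than on the training set as in this toy example.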
Analysis of residual fluctuations. For each ROI, we extracted residuals resid t,f,s from the fit of equation (17), capturing fluctuations in neural activity not explained by the model. For experiment 1, we used logistic regression to relate these fluctuations to choice (model 6), in which coefficients β 4,j on the per-sample residuals captured the effect of interest. We averaged β 4,j across sample positions (j) 2-9 and 10-12 for statistical analysis. We performed a more selective version of this analysis for experiment 3, first fitting a version of model 6 (excluding the −|ψ| term) using residual LI t,f,s,trl estimates from M1. Averaging across H conditions revealed choice-predictive fluctuations restricted to a range of 12-17 Hz. We then fit model 6 using residual LI t,f,s,trl from V1 and V2-IPS3 (pooled) within this narrow frequency band, averaged across 0.2-1.0 s after sample onset.
Pupil data analysis. Preprocessing. Blinks and noise transients were removed from pupillometric time series using a linear interpolation algorithm in which artifactual epochs were identified via both the EyeLink detection algorithm and thresholding of the first derivative of the z-scored time series (threshold = ±3.5 z s −1 ). The time series for each block was then band-pass filtered (0.06-6 Hz, Butterworth), resampled to 50 Hz and z scored. We computed the first derivative of the result, referred to as 'pupil' below. To visualize sensitivity to change points, we segmented 'pupil' from −0.5 to 5.8 s around trial onset and grouped trials with exactly one change point by change-point position into the bins 2-4, 5-6, 7-8, 9-10 and 11-12. From each of the average traces, we subtracted the average signal from trials without change points.
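The derivative-threshold interpolation step might look as follows (the padding of flagged epochs is an illustrative choice not specified in the text; the sinusoidal trace and blink-like transient are synthetic):

```python
import numpy as np

def interpolate_transients(pupil, fs, threshold=3.5, pad=0.06):
    """Z-score the trace, flag samples where |dz/dt| exceeds the threshold
    (in z per second), pad the flagged epochs, and linearly interpolate
    across them. The padding width is an illustrative assumption."""
    z = (pupil - pupil.mean()) / pupil.std()
    dz = np.abs(np.gradient(z)) * fs                 # derivative in z/s
    bad = dz > threshold
    k = int(pad * fs)
    bad = np.convolve(bad, np.ones(2 * k + 1), mode='same') > 0  # pad flagged epochs
    clean = pupil.copy()
    clean[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(~bad), pupil[~bad])
    return clean

fs = 50.0
t = np.arange(0, 10, 1 / fs)
pupil = np.sin(0.5 * t)                  # slow synthetic pupil trace
pupil[200:205] -= 3.0                    # artificial blink-like transient
clean = interpolate_transients(pupil, fs)
```

The padding matters because a step-like transient only produces large derivatives at its edges; without dilation of the flagged samples, the interior of the transient would survive interpolation.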
Modeling of pupil responses. We assessed the sensitivity of 'pupil' to computational variables by segmenting the signal 0-1 s after sample onset (full-length trials) and fitting a regression model (model 7), where t indicates the time point, x gaze and y gaze are horizontal and vertical gaze positions, and 'base' is the 'baseline' pupil diameter (−0.05 to 0.05 s around sample onset). |LLR| captured a possible relationship with 'unconditional' Shannon surprise (Extended Data Fig. 10). Previous-sample CPP, −|ψ| and |LLR| were included because the pupil response is slow, meaning correlations with variables from the previous sample may have caused spurious effects. We assessed the relevance of variability in the pupil response for choice by fitting versions of equation (21) in which the final term (β 4 ) was LLR s,trl × resid t,s,trl , where 'resid' referred to the residuals from equation (22). The relationship between 'pupil' and neural evidence encoding was assessed by extracting residuals from equation (22) at t = 0.57 s and fitting the following equation to the ROI-specific LI signal (model 8).
LI t,f,s,trl = β 0 + β 1 × ψ s,trl + β 2 × LLR s,trl + β 3 × LLR s,trl × CPP s,trl + β 4 × LLR s,trl × (−|ψ| s,trl ) + β 5 × LLR s,trl × resid s,trl + ε. This model was an extension of equation (17), where β 5 captured modulation of LLR encoding in LI by the associated pupil response. Fig. 1 | Sensitivity of normative evidence accumulation to change-point probability and uncertainty across a range of generative task statistics. a, Non-linearity in the normative model for different hazard rates (H): posterior belief after accumulating the most recent sample (L n ) is converted into the prior belief for the next sample (ψ n+1 ; both variables expressed as log-odds for each alternative) through a non-linear transform, which saturates (slope ≈ 0) for strong L n and entails more moderate information loss (0 < slope < 1) for weak L n . In a static environment (H = 0), the model combines new evidence (log-likelihood ratio for sample n, LLR n ) with the prior (ψ n ) into an updated belief without information loss (that is, L n = ψ n + LLR n ; ψ n+1 = L n ). In an unpredictable environment (H = 0.5), the new belief is determined solely by the most recent evidence sample (that is, L n = LLR n ; ψ n+1 = 0), so that no evidence accumulation takes place. b, Example trajectories of the model decision variable (L) as a function of the same evidence stream, but for different levels of H. c, Upper panels of each grid segment: change point-triggered dynamics of change-point probability (CPP) and uncertainty (−|ψ|) derived from the normative model as a function of H (grid rows) and evidence signal-to-noise ratio (SNR, difference in generative distribution means over their standard deviation; grid columns). Lower panels of each grid segment: contribution of computational variables (including CPP and −|ψ|) to normative belief updating, expressed as coefficients of partial determination for terms of a linear model predicting the updated prior belief for a forthcoming sample.
Yellow background (center of grid): generative statistics of the task used in experiment 1; reproduced from main Fig. 1e for comparison. Data for each parameter combination were derived from a simulated sequence of 10 million observations. d, Psychophysical kernels of the normative model for the task used in experiment 1 (12 samples per trial, generative SNR ≈ 1.2) but for each level of H from panel c. Kernels reflect the time-resolved weight of evidence on final choice (left) and the modulation of this evidence weighting by sample-wise change-point probability (CPP, middle) and uncertainty (−|ψ|, right). Kernels were produced by adding a moderate amount of decision noise (v = 0.7, see Methods) to the ideal observer (that is, the normative model with perfect knowledge of generative statistics); without noise, coefficients at high H (where choices are based almost entirely on the final evidence sample) are not identifiable. Fig. 2 | Consistency of human choices with those of idealized choice strategies, and dependence of choice accuracy on final state duration. a, Choices for n = 17 independent participants as a function of log-posterior odds (z-scored) derived from each of three alternative strategies: basing choices only on the final evidence sample (gray); perfect evidence accumulation (magenta); and an ideal observer with perfect knowledge of the generative task statistics, employing the normative accumulation process for the task (blue). Points and error bars show observed data ± s.e.m.; lines and shaded areas show mean ± s.e.m. of fits of sigmoid functions to the data. Slopes of fitted sigmoids were steepest for the ideal observer (ideal vs. perfect accumulation: t 16 = 5.9, p < 0.0001; ideal vs. last-sample: t 16 = 9.6, p < 10 −7 ; two-tailed paired t-tests), indicating that human choices were most consistent with those of the ideal observer.
b, Choice accuracy for the same n = 17 participants as a function of duration of the final environmental state for the human participants (black), idealized strategies from panel a, fits of the normative model (cyan), and the circuit model (orange). Participants' choice accuracy increased with the number of samples presented after the final state change on each trial, consistent with temporal accumulation. Error bars and shaded regions indicate s.e.m. Fig. 3 | Effect of sample onset asynchrony (SOA) on diagnostic signatures of adaptive evidence accumulation. Psychophysical kernels reflecting the time-resolved weight of evidence on final choice (left), and the modulation of this evidence weighting by sample-wise change-point probability (CPP, middle) and uncertainty (−|ψ|, right), separately for the two conditions of experiment 2 (n = 4 independent participants) in which participants performed the decision-making task at fast (0.2 s) and slow (0.6 s) sample onset asynchronies. Thin unsaturated lines are individual participants; thick saturated lines are means across participants. Cluster-based permutation tests (10,000 permutations) of SOA effects on each kernel type revealed no significant effects. Fig. 4 | Fit to human behavior of alternative evidence accumulation schemes. Alternative accumulation schemes are leaky accumulation (gray), where the evolving decision variable is linearly discounted by a fixed leak term after every updating step; and perfect accumulation toward non-absorbing bounds (green), which imposes upper and lower limits on the evolving decision variable without terminating the decision process, thus enabling changes of mind even after the bound has been reached. Both schemes combined can approximate the normative accumulation process across generative settings (ref. 13 ). All panels based on data from n = 17 independent participants. a, Choice accuracies for human participants (gray bars) and both model fits.
b, Choice accuracy as a function of duration of the final environmental state, for the participants and the models. c, Regression coefficients reflecting the subjective weight of evidence associated with binned stimulus locations, for the participants and the models. d, Psychophysical kernels for the data and the models reflecting the time-resolved weight of evidence on final choice (left), and the modulation of this evidence weighting by sample-wise change-point probability (CPP, middle) and uncertainty (−|ψ|, right). In panels b-d, error bars and shaded regions indicate s.e.m. Fits of the model versions used to generate behavior in all panels included free parameters for both a non-linear stimulus-to-LLR mapping function and a gain factor on inconsistent samples (see Extended Data Fig. 5). Even with these additional degrees of freedom, the leaky accumulator model failed to reproduce the CPP and −|ψ| modulations characteristic of human behavior. By contrast, like the normative model, perfect accumulation to non-absorbing bounds captured all qualitative features identified in the behavioral data. This is in line with the insight that this form of accumulation closely approximates the normative model in settings like ours, where strong belief states are formed often (that is, low noise and/or low H; ref. 13 ). This form of accumulation also captures a key feature of the dynamics of the circuit model that we interrogate in the main text; that is, a saturating decision variable in response to consecutive consistent samples of evidence. Fig. 5 | Model comparison and qualitative signatures indicate approximation of normative belief updating by measured human behavior. All panels based on data from n = 17 independent participants. a, Bayesian information criterion (BIC) scores for all models fit to the human choice data, relative to the model with lowest group-mean BIC. Lines, group mean; dots, individual participants. Model constraints are specified by the tick labels.
Colors denote the unbounded perfect accumulator (magenta), leaky accumulator (gray), perfect accumulator with non-absorbing bounds (green), and the normative accumulation process with a subjective hazard rate (cyan). Labels refer to the following: 'Linear' ('Non-linear') = linear (non-linear) scaling of the stimulus-to-LLR mapping function; 'Gain on inconsist.' = multiplicative gain term applied to inconsistent samples; 'Leak' = accumulation leak; 'Bound' = height of non-absorbing bounds; 'H' = subjective hazard rate. 'Perfect' is a special case of the leaky accumulator where leak = 0. All models included a noise term applied to the final log-posterior odds per trial. See Methods for additional model details. Labels in black text highlight models plotted in main Fig. 2 and Extended Data Fig. 4. The model with lowest group-mean BIC ('full normative fit' in remaining panels) employs normative accumulation with subjectivity in hazard rate, a gain term on inconsistent samples, and non-linear stimulus-to-LLR mapping. b, Subjective hazard rates from full normative fits (cyan), with true hazard rate in blue. Participants underestimated the volatility in the environment (t 16 = −7.7, p < 10 −6 , two-tailed one-sample t-test of subjective H against true H). c, Non-linearity in evidence accumulation estimated directly from choice data (black) and from full normative model fits (cyan). Non-linearity of the ideal observer shown in blue. Shaded areas, s.e.m. d, Multiplicative gain factors applied to evidence samples that were inconsistent with the existing belief state, estimated from full normative fits. Gain factor > 1 reflects relative up-weighting of inconsistent samples beyond that prescribed by the normative model; gain factor < 1 reflects relative down-weighting of inconsistent samples. The ideal observer employing the normative accumulation process uses gain factor = 1 (dashed blue line).
Participants assigned higher weight to inconsistent samples than the ideal observer (t 16 = 8.2, p < 10 −6 , two-tailed one-sample t-test of fitted weights against 1). e, Regression coefficients reflecting modulation of evidence weighting by change-point probability (CPP). Shown are coefficients for the human participants (black line), full normative fits (cyan), normative model fits without gain factor applied to inconsistent evidence (gray), and the ideal observer with matched noise (blue). Error bars and shaded regions, s.e.m. f, Mapping of stimulus location (polar angle; x-axis) onto evidence strength (LLR; y-axis) across the full range of stimulus locations. Blue line reflects the true mapping given the task generative statistics, used by the ideal observer. Cyan line and shaded areas show mean ± s.e.m. of subjective mappings used by the human participants, estimated as an interpolated non-parametric function (Methods). g, Regression coefficients reflecting subjective weight of evidence associated with binned stimulus locations, for the human participants (black line), full normative model fits including the non-linear LLR mapping shown in panel f (cyan), normative model fits allowing only a linear LLR mapping (gray), and the ideal observer with matched noise (blue). Error bars and shaded regions, s.e.m. Fig. 6 | Assessment of boundary conditions at which circuit model approximates normative evidence accumulation. A reduction of the spiking-neuron circuit model (ref. 57 ) was used to explore the impact of different dynamical regimes on behavioral signatures of evidence weighting. Left two columns: shape of the model's energy landscape ('potential' φ), described by the equations dX/dt = −dφ/dX + σξ t and φ(X) = −kμ t X − aX 2 /2 + bX 4 /4, in response to ambiguous (LLR = 0, left) or strong evidence (LLR = mean + 1 s.d. of LLRs across the experiment = 1.35, second from left).
µ t was the differential stimulus input to the choice-selective populations at time point t relative to trial onset (in our case the per-sample LLR, which changed every 0.4 s) that was linearly scaled by parameter k (fixed at 2.2); ξ t was a zero-mean, unit-variance Gaussian noise term that was linearly scaled by parameter σ (fixed at 0.8); and a and b shape the potential. Middle-to-right columns: psychophysical kernels and modulations of evidence weighting by CPP and −|ψ|. a, Double-well potential with small barrier between wells (a = 2, b = 1), corresponding to weak bi-stable attractor dynamics. This model variant featured a saturating decision variable in the face of multiple consistent evidence samples (akin to non-absorbing bounds), wells that maintained commitment states but were sufficiently shallow for changes-of-mind to occur in response to strongly inconsistent evidence, and sensitivity to new input when at the 'saddle point' between wells (that is during periods of uncertainty). As a consequence, it produced all key qualitative features of normative evidence accumulation (recency, strong modulation by CPP; weak modulation by uncertainty). b, Single-well potential with no barrier (a = 0, b = 1). This model variant also featured a saturating decision variable, but no stable states of commitment. Thus, it exhibited stronger recency in evidence weighting than the one in panel a. c, Flat potential with no wells (a = 0, b = 0), corresponding to perfect evidence accumulation without bounds. This model lacked both a saturating decision variable and stable states. As a consequence, it weighed all evidence samples equally (flat psychophysical kernel) and did not produce clear modulations by CPP or uncertainty. d, Double-well potential with high barrier between wells (a = 5, b = 1), corresponding to strong bi-stable attractor dynamics. In this model variant, the wells were too deep for changes-of-mind to occur for all but the most extreme inconsistent evidence. 
Thus, it produced extreme primacy in evidence weighting. The evidence weighting signatures of the regimes in panels c, d were qualitatively inconsistent with the signatures of participants' behavior (compare Fig. 2b). All three signatures were qualitatively reproduced by the regimes in both a and b, with the closest approximation to human behavior provided by the weak bi-stable attractor regime in panel a (highlighted in yellow). Fig. 7 | Intrinsic fluctuation dynamics across the visual cortical hierarchy. a, Power spectra and associated model fits of intrinsic fluctuations across cortex (n = 17 independent participants). For each hemisphere, ROI, and trial, we computed the power spectrum (1-120 Hz) from a 1-second 'baseline' period preceding trial onset and averaged these spectra across trials and hemispheres. Shaded areas show mean ± s.e.m. of observed power spectra. We modeled the power spectra as a linear superposition of an aperiodic component (power law scaling) and a variable number of periodic components (band-limited peaks) using the FOOOF toolbox (ref. 66 ; default constraints, without so-called 'knees', which were absent in the measured spectra, presumably due to the short intervals; 49-51 Hz and 99-101 Hz excluded from fits due to contamination by line noise). The fitted aperiodic components only are overlaid as dashed lines. b, Power law scaling exponents of the aperiodic components estimated from fits in a, providing a measure of the relative contributions of fast vs. slow fluctuations to the measured power spectrum. Lines, group mean; dots, individual participants. 'Forward' indicates the direction of hierarchy inferred from fitted exponents across the dorsal set of visual field map ROIs (V1, V2-V4, V3A/B, IPS0/1, IPS2/3); P-value derived from two-tailed permutation test on the slope of a line fitted to power law exponents ordered by position along the visual cortical hierarchy. Fig. 10 | Change-point probability modulates normative belief updating and phasic arousal responses more strongly than alternative metrics of surprise. a, Alternative surprise metrics as a function of posterior belief after the previous sample (L n−1 ) and evidence provided by the current sample (LLR n ). (a1) CPP (see main text). (a2) 'Unconditional' surprise, U, defined as Shannon surprise (negative log probability) associated with a new sample location ('loc n '), given only knowledge of the generative distributions associated with each environmental state S = {l,r}. In our task, U varied monotonically with absolute evidence strength (|LLR|) and was uncorrelated with log(CPP) (Spearman's ρ = 0.00). (a4) 'Conditional' surprise Z: Shannon surprise associated with a new sample, conditioned on both knowledge of the generative distributions and one's current belief about the environmental state. Z was moderately correlated with log(CPP) in our design (Spearman's ρ = 0.36). Unlike CPP, which we use solely to decompose normative evidence accumulation and relate it to neurophysiological signals, in some models this form of surprise serves as the objective function to be minimized by the inference algorithm (for example, ref. 50 ). (a3) In our task setting, Z was closely approximated (Spearman's ρ = 0.996) by a linear combination of log(CPP) and U (weights: w and 1−w, respectively; determined by Nelder-Mead simplex optimization as w = 0.274). b, Characterization of another form of surprise derived from a model of phasic locus coeruleus activity by ref. 36 , hence denoted 'Dayan & Yu surprise' and abbreviated to DY. For the oddball target detection task modelled in ref. 36 , DY was defined as the ratio between the fixed prior probability of an unlikely environmental state and its posterior probability given a new stimulus. In our 2AFC task, where the prior varies during decision formation, we defined DY as indicated below the heatmap.
We found that this DY measure was highly similar to CPP (ρ = 0.97; compare panels b and a1). c, Modulations of normative belief updating by CPP, U and Z derived from application of the normative model to a single sequence of 10 million simulated samples, generated using the same statistics as the task in Fig. 1a (main text). For each surprise metric, we fit a linear model to updated prior belief (same form as model 1 in Methods; inset: X, surprise metric). Left: coefficients of partial determination; right: contributions expressed as regression coefficients. CPP exerted the strongest and only positive modulatory effect on evidence weighting, consistent with the observed effects on behavior, cortical dynamics, and pupil (see main text). d, As panel c, but for an expanded linear model that included modulations of evidence weighting by CPP and U, which combine linearly into Z. Only CPP yielded a robust, positive effect on evidence weighting. e, Encoding of surprise metrics in pupil response (temporal derivative) to evidence samples (n = 17 independent participants). Encoding time course for Z was computed by a regression model that included x- and y-gaze positions as nuisance variables. Encoding time courses for CPP and U were derived from an expanded regression model that included both terms. CPP had the strongest effect of the three considered surprise metrics, whereas the univariate effect of Z was almost completely accounted for by a weaker effect of U. Thus, while the magnitude of the pupil response was sensitive to both Z components, CPP was clearly the stronger contributor. Shaded area, s.e.m.; significance bars, p < 0.05 (two-tailed cluster-based permutation test).

Nature Research | Reporting Summary
Corresponding author(s): Peter Murphy, Tobias Donner
Last updated by author(s): Mar 10, 2021
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a | Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code
Data collection: MEG, eye-tracking and pupillometry data were collected using acquisition software developed by the system manufacturers (MEG: CTF Systems Inc., version 5.4.2; eye-tracking/pupillometry: SR Research, version 4.594). Behavioral data were collected using Matlab version 2016a, with stimulus presentation functions from Psychtoolbox 3.

Data analysis
Matlab version 2015b was used for the majority of analyses, including custom code (behavioral analysis and modelling, MEG analysis, pupillometry analysis), the FieldTrip toolbox version 20160221 (MEG preprocessing, including the Infomax algorithm for independent component analysis), the Particle Swarm Optimization toolbox version 1.0.0.0 (model fitting) and the FOOOF toolbox version 1.0.0 (analysis of MEG power spectra). Eye blinks and saccades in the eye-tracking data were detected online by the measurement device (EyeLink 1000) using software from the system manufacturer (SR Research, version 4.594). Python 3.6, in combination with FreeSurfer version dev5-20161028 and the MNE toolbox version 0.16.2, was used for MRI-informed source localization of the MEG data. Decoding analyses were carried out using scikit-learn version 0.23.1. All custom code is publicly available at https://github.com/DonnerLab/2021_Murphy_Adaptive-Circuit-Dynamics-Across-Human-Cortex.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Recruitment
Experiment 1: 15 of 17 participants were recruited from a large database maintained by the Department of Neurophysiology and Pathophysiology at the University Medical Center Hamburg-Eppendorf. The remaining two participants were members of our laboratory: one was naive to the study aims, and the other was the first author of the manuscript.
Experiment 2: 3 of 4 participants were recruited from the departmental database; one was a member of our laboratory who was naive to the study aims.
Experiment 3: 27 of 30 participants were recruited from the departmental database; three were members of our laboratory. Of these three, two were already familiar with the nature of the experimental task, but none had performed it at the generative settings of Experiment 3 before participating.
It is conceivable that familiarity with the aims of the research project may have predisposed the laboratory members who participated, including the first author, to intentionally try to perform the task more like the normative model, a form of self-selection bias. However, we verified that the behavior of these participants was not qualitatively different from that of the remaining sample; indeed, almost all participants clearly displayed the key behavioral signatures suggestive of an approximation of the normative strategy (recency in evidence weighting, as well as positive modulations of evidence weighting by change-point probability and, where appropriate, uncertainty). Thus, we do not think that the inclusion of laboratory members as participants created any significant bias in the reported results. A separate source of bias that we acknowledge here is that the participant database used for the majority of recruitment contains mostly medicine students at the University Medical Center Hamburg-Eppendorf, and therefore does not form a representative sample of the general population. However, our study investigates low-level decision processes, which we believe to be largely unaffected by this possible selection bias. All participants, with the exception of the first author and laboratory members, received remuneration for their participation in the form of an hourly rate, a study completion bonus and an additional performance-dependent bonus.

Ethics oversight
The experiments were approved by the ethics committee of the Hamburg Medical Association and conducted in accordance with the Declaration of Helsinki. All participants provided written informed consent.
Note that full information on the approval of the study protocol must also be provided in the manuscript.