High stakes slow responding, but do not help overcome Pavlovian biases in humans

“Pavlovian” or “motivational” biases are the phenomenon that the valence of prospective outcomes modulates action invigoration: Reward prospect invigorates action, while punishment prospect suppresses it. While effects of the valence of prospective outcomes are well established, it is unclear how the magnitude of outcomes modulates these biases. In this pre-registered study (N = 55), we manipulated stake magnitude (high vs. low) in an orthogonalized Motivational Go/NoGo Task. We tested whether higher stakes (a) strengthen biases or (b) elicit cognitive control recruitment, enhancing the suppression of biases in motivationally incongruent conditions. Confirmatory tests showed that high stakes slowed responses independently of the Pavlovian biases, especially in motivationally incongruent conditions, without affecting response selection. Reinforcement-learning drift-diffusion models (RL-DDMs) fit to the data suggested that this effect was best captured by stakes prolonging the non-decision time, rather than raising the response threshold as in typical speed-accuracy tradeoffs. In sum, these results suggest that high stakes slow the decision process without affecting the expression of Pavlovian biases in behavior. We speculate that this slowing under high stakes might reflect heightened cognitive control that is nonetheless used ineffectively, or positive conditioned suppression, i.e., the suppression of goal-directed behavior by high-value imminent rewards, a phenomenon previously observed in rodents that might also exist in humans. Pavlovian biases and slowing under high stakes seem to arise in parallel to each other.


The behavior of humans and other animals reflects the interplay of multiple, partly independent decision-making systems (Collins and Cockburn 2020; Daw et al. 2005; Dickinson and Balleine 1994; Metcalfe and Mischel 1999; Shiffrin and Schneider 1977; Strack and Deutsch 2004). One such system is the Pavlovian system, which rigidly triggers response invigoration to the prospect of reward and response inhibition to the threat of punishment (Boureau and Dayan 2011; Dayan et al. 2006; O'Doherty et al. 2017). Its actions are visible in the form of “Pavlovian” or “motivational” biases, which have been proposed to underlie many seemingly maladaptive behaviors in humans and other animals (Dayan et al. 2006).
Pavlovian mechanisms might explain seemingly “irrational” behaviors in animals, including the facilitation of instrumental approach behavior by unrelated, but reward-predictive cues (Estes 1943, 1948; LoLordo et al. 1974; Lovibond 1983; Rescorla and Solomon 1967; Schwartz 1976), or the development of “sign-tracking” behavior, i.e., reward-predictive cues distracting an animal from a focal task (Hearst and Jenkins 1974; Jenkins and Moore 1973). Recently, sign-tracking has been suggested to constitute a phenomenon shared across species, including humans (Colaizzi et al. 2020; Garofalo and di Pellegrino 2015), which might contribute to the etiology and maintenance of drug abuse (Flagel and Robinson 2017; Flagel et al. 2007). A better understanding of when Pavlovian biases occur and how they interact with other systems regulating behavior promises insights into the development and maintenance of psychiatric conditions such as alcohol or drug abuse (Chen et al. 2022b; Schad et al. 2020). So far, it has been unclear how behavior is guided in the presence of multiple rewards and/or threats of different magnitudes. Several arguments suggest that Pavlovian biases should be sensitive to the magnitude of these prospective outcomes (or “stakes”). Agents frequently face situations in which they have to select amongst multiple rewards of varying magnitude. It could be beneficial if Pavlovian biases automatically directed the agent towards the largest reward. Particularly, on its way to attaining the largest reward, an agent might have to ignore smaller, more proximal rewards. Hence, Pavlovian biases should not be triggered by any reward, but distinguish between smaller rewards on the one hand, which might be arbitrated against other goals an agent pursues using deliberative processes, and sufficiently large rewards on the other hand, which escape such arbitration and instead elicit unconditional approach behavior. Similarly, the danger level of potential threats (or “threat magnitude”) needs to be considered: A human hunter who freezes upon the sight of a lion might have a competitive advantage over someone who continues to forage. However, a hunter who freezes upon the sight of a small spider might have a disadvantage compared to other foragers, demonstrating again that Pavlovian biases can only be adaptive if they take the magnitude of rewards and threats into account and ignore smaller outcomes in service of pursuing larger outcomes.
Evidence that the strength of Pavlovian biases varies with stake magnitude has been mixed so far. A few studies using Pavlovian-to-Instrumental Transfer (PIT) tasks, in which task-irrelevant cues associated with rewards/punishments are presented in the background, have observed slight increases in response rates and somewhat faster reaction times for higher rewards (Algermissen and den Ouden 2023; Schad et al. 2020) as well as decreased response rates and slower reaction times for larger punishments (Geurts et al. 2013a, 2013b). However, many other studies have not observed such modulations (Chen et al. 2022a, 2023; Garbusow et al. 2016, 2019; Sommer et al. 2017, 2020). Other tasks varying the reward on offer, specifically versions of the monetary incentive delay task (Knutson et al. 2001; Luo et al. 2009), have observed faster reaction times to larger rewards. A study using a virtual predation game found slower reaction times under larger threats (Bach 2015). However, in the latter studies, it remained unclear whether reward-induced invigoration/punishment-induced slowing followed from automatic, Pavlovian effects or rather from participants' deliberate strategies, reflecting their beliefs about which behavior was conducive to reward attainment/punishment avoidance (Mahlberg et al. 2021; Westbrook et al. 2021). To disentangle automatic from strategic effects, there must be task conditions that incentivize the suppression of Pavlovian biases, a unique feature of the Motivational Go/NoGo Task.
Pavlovian biases can most unequivocally be measured with the (orthogonalized) Motivational Go/NoGo Task. In this task, individuals learn through trial-and-error to perform either a “Go” or “NoGo” response to a number of different cues. For some cues (“Win” cues), they gain points for correct performance (with no change in score for incorrect performance), while for other cues (“Avoid” cues), they can lose points for incorrect performance (with no change in score for correct performance; Fig. 1A-C). In this task, humans show more (and faster) Go responses to Win than Avoid cues. Hence, they are better at performing active Go actions to Win cues than passive NoGo actions to Win cues, while the reverse is true for Avoid cues, reflecting the influence of Pavlovian biases (Guitart-Masip et al. 2012, 2014; Swart et al. 2017). Unlike in PIT tasks, every cue signaling whether to perform a Go or NoGo response has a fixed valence, either providing the chance to win or to lose points, typically eliciting stronger biases than tasks in which task-irrelevant cues are presented in the background.
While Pavlovian biases might lead to adaptive behavior in a number of situations, their influence becomes most apparent in situations in which they conflict with optimal behavior: Sometimes, agents have to wait to secure a reward, e.g., in situations akin to the Marshmallow Test (Mischel and Ebbesen 1970), or they have to take active steps to prevent or fight a threat, e.g., in exposure therapy to treat arachnophobia. In such circumstances, agents have to suppress Pavlovian biases, a requirement animals usually struggle with (Breland and Breland 1961; Hershberger 1986) and even humans only imperfectly master (Cavanagh et al. 2013; Swart et al. 2018). The ability to suppress automatic, unwanted action tendencies is usually regarded as requiring cognitive control (Cohen 2017). For several decades, cognitive control has been seen as a limited resource or ability that can fail, leading to action slips and undesired behavior (Hofmann et al. 2009). In contrast, more recent perspectives, most notably the expected value of control (EVC) theory (Lieder et al. 2018; Shenhav et al. 2013), have suggested that cognitive control is not inherently limited, but follows from a cost-benefit trade-off that weighs the benefits of exerting additional control against the costs of doing so. In line with this idea, a number of studies using conflict tasks, such as the Stroop, Simon, or Flanker task, have shown that compatibility effects, taken to reflect cognitive control limitations, become smaller when participants are offered financial incentives for recruiting control (Boehler et al. 2012; Chiew and Braver 2014; Dixon and Christoff 2012; Fröber and Dreisbach 2016; Krebs et al. 2010). From this perspective, higher stakes should motivate an agent to exert additional cognitive control in order to suppress biases in situations in which those are maladaptive. The EVC theory thus makes predictions directly opposite to the above-described case of high stakes strengthening biases: while ecological considerations suggest that higher stakes should lead to stronger biases, EVC predicts more control and thus weaker biases. To suppress biases, additional time might be required to recruit control processes, leading to higher accuracy at the cost of longer RTs, i.e., a speed-accuracy tradeoff. In contrast, in situations in which biases lead to adaptive behavior, EVC predicts no effect of stakes on behavior.
In this study, we directly tested these two opposing predictions against each other. We collected data from 55 participants performing the Motivational Go/NoGo Task in which the magnitude of stakes (high or low) was manipulated on a trial-by-trial basis. Following the first hypothesis that higher stakes drive stronger Pavlovian biases, we predicted an interaction between congruency and stake magnitude, with a stronger congruency effect (indicative of the Pavlovian bias) and higher performance on congruent, but lower performance on incongruent trials under high compared to low stakes (Fig. 1D). In contrast, following the EVC hypothesis, we predicted an interaction effect in the opposite direction, with a weaker congruency effect (reflecting cognitive control recruitment) and selectively higher performance on incongruent trials (but slower RTs) under high compared to low stakes (Fig. 1E).

Figure 1. Task and behavioral predictions.
A. Time course of each trial. Participants see one of four cues (“gems”) and have to decide whether to respond to it with a button press (“Go”) or not (“NoGo”). On half of the trials, the cue is surrounded by a red circle, indicating that stakes are five times as high and points gained/lost in this trial will be multiplied by 5. After a variable interval, participants receive an outcome (increase in points, no change, or decrease in points). B. Task conditions. Half of the cues are “Win” cues for which points can be gained (or no change in the point score occurs), while the other half are “Avoid” cues for which points can be lost (or no change in the point score occurs).
Orthogonal to cue valence is the correct action required for each cue, which is either Go or NoGo. C. Feedback given cue valence and response accuracy. For Win cues, correct responses mostly lead to an increase in points (+10 or +50, depending on whether the trial was low or high stakes), but occasionally lead to no change in score (0). For Avoid cues, correct responses mostly lead to no change in score (0), but occasionally lead to a loss of points (-10 or -50, depending on whether the trial was low or high stakes). For incorrect responses, probabilities are reversed. D. Prediction from a “bias strengthening” hypothesis. High stakes strengthen biases, leading to higher accuracy for bias-congruent cues (for which required action and valence match), but lower accuracy for bias-incongruent cues. E. Prediction from the “motivation for control” hypothesis. High stakes motivate cognitive control, which inhibits biases when they are incongruent with the required action, leading to higher accuracy selectively for bias-incongruent cues (for which the bias-triggered response has to be inhibited).

Approach
Fifty-five participants (54 included in analyses) played an adapted version of the Motivational Go/NoGo Learning Task (Swart et al. 2017). This task required them to learn from trial-and-error whether to perform a Go response (button press) or a NoGo response (no response) to various cues (Fig. 1A). Half of the cues required a Go response (Go cues), the other half a NoGo response (NoGo cues; Fig. 1B). Orthogonal to the required action, half of the cues offered the chance to win points for correct responses (Win cues; no change in points for incorrect responses), while the other half bore the chance to lose points for incorrect responses (Avoid cues; no change in points for correct responses).
Participants typically show a Pavlovian bias in this task, with more Go responses and faster RTs for Win than Avoid cues. Feedback was probabilistic, with correct responses leading to desired outcomes on 80% of trials (win for Win cues, no change for Avoid cues), but undesired outcomes on the remaining 20% of trials (no change for Win cues, loss for Avoid cues; probabilities were reversed for incorrect responses; Fig. 1C). Orthogonal to both the required action and the valence (Win/Avoid) of cues, we varied the stake magnitude: On half of the trials, the cue was surrounded by a red circle, signaling the chance to win/lose 50 points (instead of 10 points) for correct/incorrect responses.
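For concreteness, the feedback rule described above can be sketched as follows (a minimal illustration; function and variable names are ours, not taken from the authors' task code):

```python
import random

def feedback(cue_valence, correct, high_stakes, p_desired=0.8):
    """Return the points delivered on one trial of the task.

    Correct responses yield the desired outcome on 80% of trials
    (a win for Win cues, no change for Avoid cues) and the undesired
    outcome on the remaining 20%; probabilities are reversed for
    incorrect responses. High-stakes trials (red circle) are worth
    50 points instead of 10.
    """
    magnitude = 50 if high_stakes else 10
    p = p_desired if correct else 1 - p_desired
    desired = random.random() < p
    if cue_valence == "win":
        return magnitude if desired else 0   # gain points, or no change
    else:  # "avoid"
        return 0 if desired else -magnitude  # no change, or lose points
```

For example, a correct Go response to a high-stakes Win cue returns +50 on 80% of trials and 0 otherwise, while an incorrect response to a low-stakes Avoid cue returns -10 on 80% of trials.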
Next, we performed a similar mixed-effects linear regression with reaction times (RTs) as dependent variable. Note that RTs were naturally only available for (correct and incorrect) Go responses.
[Figure 3, caption fragment: Accuracy per valence-action congruency and stakes condition. Accuracy is higher for congruent than incongruent conditions, but this congruency effect is not modulated by stakes. F. Group-level and individual-participant regression coefficients from a mixed-effects logistic regression of responses on congruency, stakes, and their interaction.]
Exploratory post-hoc tests for each cue condition separately yielded a significant effect of stakes on RTs for three out of four cue conditions, including in particular the two incongruent conditions Go-to-Avoid and NoGo-to-Win (Go-to-Win: z = 2.973, p = .003; Go-to-Avoid: z = 4.528, p < .001; NoGo-to-Win: z = 4.975, p < .001; NoGo-to-Avoid: z = 1.414, p = .158; Fig. 3D). In further exploratory analyses, we tested whether the effect of stakes on responses got stronger or weaker with time, either within the learning trajectory of a cue (cue repetition) or across the entire task (trial number). Neither the interaction between stakes and cue repetition, b = -0.012, 95%-CI [-0.030, 0.006], χ²(1) = 1.599, p = .206, nor the interaction between stakes and trial number, b = 0.025, 95%-CI [-0.021, 0.018], χ²(1) = 0.480, p = .489, was significant, providing no evidence for a change in the effect of stakes on RTs over time. See Supplementary Material S06 for tests of non-linear changes with time, again finding no evidence for changes in the effect of stakes over time. In sum, these results suggest that high stakes affected participant responses in that they overall slowed down responses. This slowing was slightly stronger for incongruent than congruent cues and appeared to be constant over time. However, stakes affected neither response accuracy nor the degree of Pavlovian bias as indexed by the decisions to make a Go or NoGo response.
[Figure 3, caption fragment: … slower under high compared to low stakes. This effect is significantly stronger for incongruent than congruent cue conditions. F. Group-level and individual-participant regression coefficients from a mixed-effects linear regression of RTs on congruency, stakes, and their interaction.]
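The structure of such a mixed-effects analysis can be sketched as follows (an illustration on simulated data, assuming random intercepts per participant; the paper's exact model specification and software may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subj, n_trials = 20, 40
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trials),
    "congruent": rng.integers(0, 2, n_subj * n_trials),
    "high_stakes": rng.integers(0, 2, n_subj * n_trials),
})
# Simulate the reported pattern: slowing under high stakes,
# slightly stronger on incongruent trials, plus subject-level offsets
subj_intercept = rng.normal(0, 0.03, n_subj)
df["rt"] = (0.6
            + 0.05 * df["high_stakes"]
            + 0.03 * df["high_stakes"] * (1 - df["congruent"])
            + subj_intercept[df["subject"]]
            + rng.normal(0, 0.05, len(df)))

# RTs regressed on congruency, stakes, and their interaction,
# with a random intercept per participant
model = smf.mixedlm("rt ~ congruent * high_stakes", df, groups=df["subject"])
result = model.fit()
print(result.params[["high_stakes", "congruent:high_stakes"]])
```

With this simulated pattern, the `high_stakes` coefficient comes out positive (overall slowing) and the interaction negative (less slowing on congruent trials), mirroring the direction of the reported effects.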

Computational Modeling of Responses and RTs (RL-DDMs)
To better understand the mechanisms by which cue valence and stakes influenced responses and RTs, we fit a series of increasingly complex reinforcement-learning drift-diffusion models (RL-DDMs). A past study using a similar paradigm found evidence for cue valence modulating the starting point bias in an evidence-accumulation framework rather than the drift rate (Millner et al. 2017), although evidence in that study remained mixed. Furthermore, past studies suggested that response slowing might reflect a speed-accuracy trade-off, with stakes leading to response caution and higher decision thresholds, resulting in higher accuracy at the cost of slower responses (Bogacz et al. 2006; Shevlin et al. 2022; Wiecki and Frank 2013). We implemented different mechanisms of how cue valence and stakes might influence the various parameters (decision threshold, non-decision time, starting point bias, drift rate intercept) in an evidence accumulation framework and compared the fit of these increasingly complex models.
Behavior was better described by an RL-DDM (M2) in which participants learned cue-specific Q-values than by a standard DDM (M1) with a fixed propensity to emit Go/NoGo responses (Fig. 4A), reflecting that participants learned the task and that learned values affected responses and RTs. Model fit was further improved when incorporating a Pavlovian bias (M3-M4), specifically when fitting separate drift rate intercepts for Win and Avoid cues (M4; with higher drift rate intercepts for Win than Avoid cues, see Fig. 4B). Next, we assessed different mechanisms through which stake magnitude could affect responding, which further improved model fit (M5-M8). Here, the best model was one in which stakes modulate the non-decision time (M6). Note that, although M6 showed a superior fit to M4, group-level non-decision times for high and low stakes were not significantly different from each other (Mdiff = 0.012, 95%-CI [-0.017, 0.041]), suggestive of the presence of individual differences with an overall mean close to zero. Allowing stakes to modulate two parameters instead of one did not yield any substantial improvement in fit (M9-M11). Specifically, a model implementing a “classical” speed-accuracy tradeoff by allowing stakes to influence both the threshold and the drift rate (M10) performed worse than the model allowing stakes to influence the non-decision time (M6). Lastly, model fit was further improved by splitting the effect of stakes into separate parameters for congruent and incongruent cues (M12), which was overall the best fitting model in the model comparison. Note that M12 has the same number of parameters as models M9-M11, suggesting that the increase in fit is not due to a mere increase in the number of parameters, but due to the specific mechanism implemented. Also note that, although M12 with separate non-decision times under high stakes for congruent and incongruent cues outperformed M6 with a single non-decision time under high stakes, there was no group-level difference between the parameters for congruent vs. incongruent cues (Mdiff = -0.003, 95%-CI [-0.033, 0.027], Fig. 4B), suggestive of individual differences with a group-level mean close to 0.
We performed several model validation checks to verify that the winning model M12 was able to capture key qualitative features of the empirical data (posterior predictive checks), could identify data-generating parameters reliably (parameter recovery), and could be distinguished from other models (model recovery). Data simulated from M12 reproduced a Pavlovian bias in responses and RTs and an overall slowing under high stakes, but somewhat underestimated the difference in RT slowing between congruent and incongruent cues (Fig. 4C; see also Supplementary Material S07 for further plots). Furthermore, generative and fitted parameters were overall highly correlated, indicative of a successful parameter recovery (Mr = 0.83, SDr = 0.14, range 0.62-0.98; 95th percentile of permutation null distribution: r = 0.08; Fig. 4D; see Supplementary Material S07 for scatter plots of on-diagonal correlations). Besides correlations between generative parameters and their corresponding fitted parameters, there were two notable cases of off-diagonal correlations: First, the different non-decision times (under low stakes, under high stakes for congruent cues, and under high stakes for incongruent cues) were correlated (r = 0.71 and r = 0.77; Fig. 4D), reflecting an overall tendency towards faster/slower responses that is naturally shared across all three parameters. Second, learning rates and drift rate slopes were negatively correlated across parameter settings (r = -0.56; Fig. 4D), which mimics the frequently observed trade-off between learning rate and inverse temperature parameters in more classic reinforcement learning models of choices (Ballard and McClure 2019). In RL-DDMs, the drift rate slope is multiplied with the Q-value difference, so that steeper slopes lead to more deterministic choices and shallower slopes lead to more stochastic choices, similar to an inverse temperature parameter. Finally, model recovery was successful, particularly for the winning model M12, which was the best fitting model for 98% of data sets for which it was the generative model (forward confusion matrix; Fig. 4E). Recovery for the other models was not quite as high, though still significantly above chance for all models (Mp = 0.31, SDp = 0.32, range 0.13-0.98; 95th percentile of permutation null distribution: p = 0.10). See Supplementary Material S07 for matrices involving only the five nested sub-versions of M12 (i.e., M1, M2, M4, M6, M12). In this restricted subset, recovery was much higher (Mp = 0.74, SDp = 0.24, range 0.44-0.99; 95th percentile of permutation null distribution: p = 0.22). Also, see Supplementary Material S07 for the inverse confusion matrix.
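To make the mechanism of the winning model concrete, the per-trial computation can be sketched roughly as follows (our illustrative parameterization, with hypothetical parameter names; the authors' hierarchical fitting procedure is not shown):

```python
def drift_and_ndt(q_go, q_nogo, valence, high_stakes, congruent, p):
    """Drift rate and non-decision time for one trial, in the spirit of M12.

    The valence-specific intercept implements the Pavlovian bias (as in M4);
    the slope scales the learned Q-value difference, acting like an inverse
    temperature; high stakes prolong the non-decision time, separately for
    congruent and incongruent cues (as in M12).
    """
    v0 = p["v0_win"] if valence == "win" else p["v0_avoid"]
    drift = v0 + p["v_slope"] * (q_go - q_nogo)
    ndt = p["ndt"]
    if high_stakes:
        ndt += p["ndt_hi_congruent"] if congruent else p["ndt_hi_incongruent"]
    return drift, ndt

def update_q(q, outcome, alpha):
    """Delta-rule update of the cue-specific Q-value after feedback."""
    return q + alpha * (outcome - q)
```

A steeper `v_slope` makes the drift rate more sensitive to the learned value difference and hence choices more deterministic, which is why this parameter trades off against the learning rate in the same way an inverse temperature does in softmax-based models.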
In sum, model comparison results were in line with the regression results, yielding a selective effect of stakes in prolonging the non-decision time, and separately so for incongruent and congruent cues. Stakes did not affect the threshold and/or the drift rate as typically observed in a speed-accuracy trade-off. Hence, we conclude that stakes do not shift the speed-accuracy trade-off, but rather lead to a response slowing independent of response selection.

Discussion
In this pre-registered experiment, we found evidence that increasing stake magnitude slowed down responses in a Motivational Go/NoGo Learning Task, especially for incongruent cue conditions, without affecting whether participants responded or not. In line with previous literature, participants exhibited a Pavlovian bias in both responses and RTs (Algermissen et al. 2022; Swart et al. 2017), with more and faster Go responses to Win than Avoid cues. On trials with high stakes (i.e., larger rewards or punishments at stake), they slowed down, particularly for the two incongruent conditions Go-to-Avoid and NoGo-to-Win. This response slowing was best described by high stakes prolonging the non-decision time in a drift-diffusion model framework, particularly so for incongruent trials. This finding is inconsistent with both hypotheses put forward in the introduction, i.e., high stakes strengthening Pavlovian biases or high stakes motivating cognitive control to suppress them on incongruent trials. In sum, higher stakes slow down response selection, but neither strengthen nor weaken Pavlovian biases in responses. We propose two possible explanations for this (somewhat surprising) result: response slowing under high stakes might reflect (flexibly recruited) cognitive control, which is however ineffectively used, or it might reflect (automatic/reflexive) positive conditioned suppression, i.e., the suppression of goal-directed behavior by large imminent rewards as previously observed in animal studies.
On trials with high stakes, participants took longer to make a Go response, but did not exhibit any altered tendency for Go/NoGo responses, i.e., no reduction or enhancement of Pavlovian biases.
Apart from the null effect on responses, RTs slowed down under high stakes, an effect that was highly consistent across participants (Fig. 3E, F). These two findings are incompatible with the first hypothesis posited, i.e., high stakes strengthening Pavlovian biases. Slowing (instead of speeding) of responses under high rewards might appear quite surprising given a large body of literature showing that higher incentives speed up responses (Fontanesi et al. 2019; Knutson et al. 2001; Luo et al. 2009; Pirrone et al. 2018; Smith and Krajbich 2018) and some evidence for larger PIT effects for high compared to low value cues (Algermissen and den Ouden 2023; Schad et al. 2020). Notably, response slowing occurred for both appetitive and aversive cues, suggesting that the effect is independent of cue valence and orthogonal to the Pavlovian biases. Note that 50% of trials were high-stakes trials, arguing against the possibility of surprise (i.e., oddball effects) driving the response slowing. High- and low-stakes trials were visually very distinct, arguing against differences in processing demands between both trial types. In sum, the size of Pavlovian biases in the Motivational Go/NoGo Task appears to be unaffected by stake magnitude, which instead induced a response slowing orthogonal to the biases.
Response slowing under high stakes might be partly compatible with the second hypothesis (EVC), i.e., high stakes increasing cognitive control in order to suppress biases, given that heightened cognitive control recruitment is often inferred from/accompanied by prolonged reaction times (Frank 2006; Shenhav et al. 2013; Wessel and Aron 2017). Specifically, in line with our preregistered hypothesis that high stakes increase cognitive control recruitment, response slowing was stronger on motivationally incongruent trials on which Pavlovian biases had to be suppressed in order to execute the correct response. This effect suggests that participants did distinguish the different cue conditions with respect to whether they could benefit from increased cognitive control recruitment and prolonged deliberation times (i.e., situations in which control could in theory change the emitted response) or not.
However, the increased deliberation time putatively afforded by cognitive control recruitment was inconsequential for response selection, and the size of Pavlovian biases (in terms of the proportion of Go responses for Win vs. Avoid cues) was unaltered under high stakes. One might thus conclude that participants recruited additional cognitive control, but did not effectively use it to suppress their Pavlovian biases when they were unhelpful. An alternative explanation for response slowing under high stakes might be the phenomenon of “choking under pressure”, i.e., the fear of failure in high-stakes situations inducing rumination and thus decreasing performance (Beilock and Carr 2001, 2005), an option we had considered in our preregistration. Choking under pressure predicts a pattern opposite to the second hypothesis (EVC), with high stakes undermining cognitive control recruitment and leading to lower performance in incongruent conditions. While the observed slowing of RTs could be interpreted as a kind of “choking under pressure”, we did not observe corresponding performance decrements. Hence, this finding does not fall under the phenomenon of “choking under pressure” as investigated in previous literature. In sum, these results are most compatible with the idea of high stakes leading to increased cognitive control recruitment, though without any consequences for response selection and accuracy.
Past computational models have proposed mechanisms of how decision accuracy, which is particularly warranted in high stakes situations, can be prioritized over speed by increasing decision thresholds in an evidence accumulation framework (Bogacz et al. 2006). Such increased decision bounds have typically been investigated in situations in which choice options are close in value and thus elicit cognitive conflict. Neuro-computational models suggest that such conflict is detected by the anterior cingulate cortex and pre-supplementary motor area, which, via the hyperdirect pathway involving the subthalamic nucleus, project to the globus pallidus and increase decision thresholds in the basal ganglia action selection circuits, leading to a higher requirement for positive evidence to elicit a response (Cavanagh et al. 2011; Forstmann et al. 2008; Frank 2006; Frank et al. 2015; Wiecki and Frank 2013). This decision threshold adjustment leads to a higher proportion of correct, but overall slower responses. It is plausible that the same mechanism could lead to response caution in the context of high-value cues. In fact, a series of recent studies found that cues indicating an upcoming choice between high-value options (but not the presence of high-value options per se) slowed down RTs, which was best captured by a heightened decision threshold (Shevlin et al. 2022). However, in contrast, the data of the present study were best explained by a model embodying prolonged non-decision times rather than heightened response thresholds. It is thus unclear whether the same computational and neural mechanisms proposed for implementing speed-accuracy tradeoffs are also responsible for the response slowing observed in this data. Future studies using neuroimaging of cortical and subcortical activity (Algermissen et al. 2022) and instructions to prioritize speed or accuracy during the task (Forstmann et al. 2008) while simultaneously manipulating stakes could shed light on shared vs. separate neural mechanisms.
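The distinction between the two candidate mechanisms can be illustrated with a toy diffusion simulation (our simplified sketch, not the fitted RL-DDM): raising the decision threshold slows responses and raises accuracy (a speed-accuracy tradeoff), whereas lengthening the non-decision time slows responses while leaving accuracy untouched.

```python
import numpy as np

def simulate_ddm(v, a, ndt, n=2000, dt=0.001, noise=1.0, seed=0):
    """Simulate n trials of a DDM with symmetric bounds at +/- a/2.

    Returns (mean RT, accuracy); RT = decision time + non-decision time.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)            # accumulated evidence per trial
    t = np.zeros(n)            # elapsed decision time per trial
    done = np.zeros(n, bool)
    rt = np.zeros(n)
    correct = np.zeros(n, bool)
    while not done.all():
        active = ~done
        x[active] += v * dt + noise * np.sqrt(dt) * rng.standard_normal(active.sum())
        t[active] += dt
        hit = active & (np.abs(x) >= a / 2)
        rt[hit] = t[hit] + ndt
        correct[hit] = x[hit] > 0  # upper bound = correct response
        done |= hit
    return rt.mean(), correct.mean()

rt_base, acc_base = simulate_ddm(v=1.0, a=1.0, ndt=0.3)
rt_thresh, acc_thresh = simulate_ddm(v=1.0, a=2.0, ndt=0.3)  # higher threshold
rt_ndt, acc_ndt = simulate_ddm(v=1.0, a=1.0, ndt=0.4)        # longer non-decision time
```

Under these settings, the higher threshold yields both slower and more accurate responses, while the longer non-decision time shifts RTs without changing accuracy, matching the pattern the winning model attributes to high stakes.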
Another possible interpretation of our findings is that the response slowing under large stake magnitudes is an instance of positive conditioned suppression as previously reported in rodents (Azrin and Hake 1969; Marshall et al. 2023; Van Dyne 1971). In positive conditioned suppression, cues signaling the imminent receipt of a reward suppress responding. Specifically, a cue announcing an imminent reward suppresses exploratory behavior that would move the animal away from a food site, and instead invigorates and prolongs engagement with the site of reward delivery until the reward is obtained (Marshall et al. 2020). However, this suppression can extend backwards in time such that it even affects the instrumental response required to obtain the reward (i.e., a lever press). A recent study found small rewards to invigorate responding in line with classical PIT findings (Marshall et al. 2023). However, large rewards suppressed instrumental lever pressing and diminished PIT effects, suggestive of positive conditioned suppression interfering with PIT in a way similar to our findings.
One speculation on the adaptive nature of this phenomenon is that it may prevent agents from becoming distracted by other reward opportunities and forgetting to collect the reward they previously worked for (Timberlake et al. 1982). Notably, the prolongation of RTs in the present data was particularly strong for motivationally incongruent cues, which perhaps argues against a purely automatic, “reflexive” nature of the observed effect of stake magnitude on RTs (such as positive conditioned suppression), and instead in favor of an adaptive effect that is (at least partially) sensitive to task requirements. It is thus possible that both (automatic) positive conditioned suppression and (voluntary) heightened cognitive control recruitment triggered by motivational conflict are present, or that positive conditioned suppression is (partially) a consequence of cognitive control recruitment. Future studies could test whether the slowing induced by high stakes is sensitive to the temporal delay between response execution and outcome delivery, which would argue for interference between reward collection and response selection as the cause of slowing (Delamater and Holland 2008; Marshall et al. 2023; Marshall and Ostlund 2018; Meltzer and Hamm 1978; Miczek and Grossman 1971).
Furthermore, conditioned suppression has not yet been studied in the context of avoiding aversive outcomes. Slowing induced by conditioned suppression will look highly similar to slowing induced by the Pavlovian bias itself. In our data, the finding that the effects of action-valence congruency (i.e., Pavlovian bias) and stake magnitude on RTs were additive suggests independent mechanisms. Future research might try to disentangle these two effects further by using an "escape" context in which participants must select actions to terminate an ongoing punishment (e.g., a loud noise), which typically inverts the Pavlovian bias and leads to an increased tendency towards action (Millner et al. 2017).
Varying the punishment magnitude in such a context could potentially elucidate joint or independent contributions of Pavlovian biases and conditioned suppression to RTs.
The presented results suggest that high stakes do not strengthen or weaken Pavlovian biases per se; rather, they globally slow or pause behavior. This slowing can be adaptive in high-threat situations in which response postponement mimics non-responding, similar to freezing itself (Bach 2015), although in our data, the slowing did not affect participants' eventual propensity to execute a Go response. This slowing might also be adaptive from the perspective of positive conditioned suppression in focusing an agent on reward collection and consumption rather than exploring other options in the meantime (Marshall et al. 2023). The ability to inhibit behavior and wait for rewards has been proposed to be serotonergic in nature, as serotonin is likely implicated in mediating aversive inhibition (Crockett et al. 2009, 2012; Geurts et al. 2013b). Indeed, serotonin depletion has been shown to abolish the slowing observed under high reward stakes (Bari and Robbins 2013; den Ouden et al. 2015; Soubrié 1986), while the activation of serotonergic neurons facilitates waiting for rewards (Miyazaki et al. 2011, 2014, 2020) and persistence in foraging (Lottem et al. 2018). Future research should explicitly test the putatively serotonergic nature of high stakes-induced response slowing in the Motivational Go/NoGo Task in particular and of positive conditioned suppression more generally.
A limitation of the current study is that high stakes were explicitly signaled via a red circle around the task cue. In this way, the task mimicked situations in which high stakes can be inferred directly from simple visual features, e.g., when telling apart a lion from a spider. However, it does not mimic situations in which high value must be inferred indirectly from past experiences or by combining a set of features, e.g., when detecting a good bargain on a house or car. In the context of the Motivational Go/NoGo Task, stakes were irrelevant for selecting the optimal action, and evidence from a similar task (Algermissen and den Ouden 2023) suggests that participants ignore differences in outcome value when learning about the optimal action. Hence, stakes might only play a role when explicitly signaled or easily perceivable from the environment, but not when they have to be inferred from past experiences. This is an important consideration for task designs that might explain the mixed literature on stakes effects in PIT tasks. Finally, the presented finding mimics cases where "high stakes" describes the entire situation rather than a single option (Shevlin et al. 2022), but is unlike cases where only a single option is more valuable and dominates all other options. Another limitation might be that stakes were not varied in a continuous fashion, but categorically as two discrete levels. Again, it might be plausible that agents represent situations (e.g., trials) as overall "high stakes" or not, irrespective of the particular value of single options (Shevlin et al. 2022). Varying the stakes magnitude in a continuous fashion would also increase processing demands and thus already slow down responses due to perceptual difficulty (irrespective of additional decision difficulty). Furthermore, participants might subjectively recode stakes levels relative to the mean stake level, representing low rewards as disappointing and thus akin to punishments, while perceiving low punishments as a relief and thus akin to rewards (Klein et al. 2017; Palminteri et al. 2015). These considerations support the ecological validity of dichotomizing stakes into high and low levels.
However, it remains to be empirically tested whether continuous stakes levels lead to similar or different effects.
In sum, while the possibility to gain rewards or avoid punishments induces Pavlovian biases, increasing the stakes of these prospects does not alter the strength of the bias. However, high stakes motivate humans to slow down their responses. One interpretation is that this slowing is adaptive in allowing time for conflict detection and cognitive control recruitment in case motivational biases have to be suppressed. However, the slowing is not associated with changes in response selection, i.e., also not with the degree to which participants suppress their Pavlovian biases when these are unhelpful, suggesting that humans do not use this additional time effectively. An alternative interpretation is that prolonged reaction times reflect positive conditioned suppression, i.e., attraction by the reward value that interferes with action selection itself, as previously observed in rodents. Taken together, this study suggests that high stakes might have a similar effect in both humans and rodents in the context of Pavlovian-instrumental interactions in action selection.

Participants and Exclusion Criteria
Fifty-five human participants (Mage = 22.31, SDage = 2.21; 42 women, 13 men) participated in an experiment of about 45 minutes. The study design, hypotheses, and analysis plan were pre-registered on OSF under https://osf.io/ue397. Individuals who were 18-30 years old, spoke and understood English, and did not suffer from colorblindness were recruited via the SONA Radboud Research Participation System of Radboud University. Data could be excluded from all analyses for two (pre-registered) reasons, one of which was (a) guessing the hypotheses of the study. In a debriefing, participants were asked whether they had used specific strategies to perform the task, whether they found the task more or less difficult to perform on high-stakes trials, and, if so, whether they had an explanation of why this was the case. At the end, they received course credit for participation as well as a small extra candy reward if they scored more than 960 points (equivalent to 67% accuracy across trials; this equivalence was unknown to participants), which was announced in the instructions.

Task
Participants completed 320 trials (80 per condition; 40 each with high and low stakes, respectively) of the Motivational Go/NoGo learning task. Each trial started with one of four abstract geometric cues presented for 1,300 ms (Fig. 1A). The assignment of cues to task conditions was counterbalanced across participants. Participants needed to learn from trial-and-error about the cue valence, i.e., whether the cue was a Win cue (point gain for correct responses; no change in point score for incorrect responses) or an Avoid cue (no change in point score for correct responses; point loss for incorrect responses), and the required action, i.e., whether the correct response was Go (a key press of the space bar) or NoGo (no action; Fig. 1B). Participants could perform Go responses while the cue was on the screen. In 50% of trials, the cue was surrounded by a dark red circle (RGB [255, 0, 0]), signaling the chance to win or avoid losing 50 points (high stakes condition). On all other trials, 10 points could be won or lost (low stakes condition). After a variable inter-stimulus interval of 500-900 ms (uniform distribution in steps of 100 ms), numerical feedback was presented for 700 ms (+10/+50 in green font for point wins, -10/-50 in red font for point losses, 000 in grey font for no change in point score).
Feedback was probabilistic such that correct responses were followed by favorable outcomes (point win for Win cues, no change for Avoid cues) on only 80% of trials, while on the other 20% of trials, participants received unfavorable outcomes (no change for Win cues, point loss for Avoid cues; Fig. 1C). These probabilities were reversed for incorrect responses. Probabilistic feedback was used to make learning more difficult and induce a slower learning curve. Trials ended with a variable inter-trial interval of 1,300-1,700 ms (uniform distribution in steps of 100 ms).
The task was administered in four blocks of 80 trials each. Each block featured a distinct set of four cues for which participants had to learn the correct response. Probabilistic feedback and renewal of the cue set were used to slow down learning, given previous findings that biases disappear when accuracy approaches 100% (Swart et al. 2017).
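For illustration, the probabilistic feedback scheme described above can be sketched in a few lines. This is a minimal sketch in Python (the actual task was not implemented this way); the 80% favorable-outcome probability and the outcome categories come from the task description, while the function name and argument names are hypothetical:

```python
import random

def sample_outcome(cue_valence, response_correct, p_favorable=0.8, rng=None):
    """Sample one trial outcome: correct responses yield the favorable
    outcome (point win for Win cues, no change for Avoid cues) with
    probability p_favorable; probabilities are reversed for incorrect
    responses. Returns +1 (point win), 0 (no change), or -1 (point loss)."""
    rng = rng or random.Random()
    favorable = rng.random() < (p_favorable if response_correct else 1 - p_favorable)
    if cue_valence == "win":
        return +1 if favorable else 0   # point win vs. no change
    return 0 if favorable else -1       # no change vs. point loss
```

Across many correct responses to a Win cue, roughly 80% of sampled outcomes would be point wins, matching the schedule in Fig. 1C.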

Data Preprocessing
Trials with RTs faster than 300 ms were excluded from all analyses, as those were assumed to be too fast to reflect processing of the cue. This was the case for 103 out of 17,600 trials (per participant: M = 1.91, SD = 5.89, range 0-41). See Supplementary Material S02 for results using all reaction times from all trials.
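The cutoff rule amounts to a simple filter over trials. A minimal Python sketch (the actual preprocessing was done in R; the trial representation and key names here are hypothetical):

```python
CUTOFF_MS = 300  # RTs below this are assumed not to reflect processing of the cue

def exclude_fast_rts(trials, cutoff_ms=CUTOFF_MS):
    """Keep NoGo trials (rt is None) and Go trials at or above the cutoff."""
    return [t for t in trials if t["rt"] is None or t["rt"] >= cutoff_ms]

trials = [{"rt": 250}, {"rt": 310}, {"rt": None}, {"rt": 299}]
kept = exclude_fast_rts(trials)  # drops the 250-ms and 299-ms trials
```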

Mixed-effects Regression Models
We tested hypotheses using mixed-effects linear regression (function lmer) and logistic regression (function glmer) as implemented in the package lme4 in R (Bates et al. 2015). We used generalized linear models with a binomial link function (i.e., logistic regression) for binary dependent variables such as accuracy (correct vs. incorrect) and response (Go vs. NoGo), and linear models for continuous variables such as RTs. We used zero-sum coding for categorical independent variables. All continuous dependent and independent variables were standardized such that regression weights can be interpreted as standardized regression coefficients. All regression models contained a fixed intercept.
We added all possible random intercepts, slopes, and correlations to achieve a maximal random-effects structure (Barr et al. 2013). P-values were computed using likelihood-ratio tests with the package afex (Singmann et al. 2018). We considered p-values smaller than α = 0.05 as statistically significant.
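Zero-sum coding and standardization, as described above, can be illustrated as follows. This is a Python sketch of the coding scheme only (the regressions themselves were fit with lme4 in R); the helper names and example values are hypothetical:

```python
from statistics import mean, pstdev

def zero_sum_code(level, levels):
    """Two-level zero-sum (effect) coding: +0.5 for the first level and
    -0.5 for the second, so the intercept reflects the grand mean."""
    return 0.5 if level == levels[0] else -0.5

def standardize(xs):
    """Z-score a variable so fitted weights read as standardized coefficients."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

valence = [zero_sum_code(v, ("Win", "Avoid")) for v in ("Win", "Avoid", "Win", "Avoid")]
rts_z = standardize([412.0, 455.0, 430.0, 501.0])  # mean 0, SD 1 after z-scoring
```

With this coding, a fitted valence weight is directly interpretable as the Win-minus-Avoid contrast around the grand mean.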

Evidence for absence of an effect
We plot the condition means for each participant and provide confidence intervals for every effect. Every possible point estimate of an effect that falls outside the estimated confidence interval can be rejected at a level of α = 0.05.

Computational modeling of responses and reaction times
Combining reinforcement learning with a drift-diffusion choice rule. A class of computational models that allows one to jointly model both responses and reaction times are so-called "evidence accumulation" or "sequential sampling" models, such as the drift-diffusion model (DDM) (Ratcliff 1978). These models formalize a decision process in which evidence for two (or more) response options is accumulated until a fixed threshold, and a response is elicited upon reaching this threshold. The process is captured through four parameters (Wabersich and Vandekerckhove 2014): the drift rate δ, reflecting the speed with which evidence is accumulated; the decision threshold α, describing the distance of the threshold from the starting point; the starting point bias β, reflecting whether the accumulation process starts in the middle between both bounds (β = 0.5) or closer to one of the boundaries, reflecting an overall response bias; and the non-decision time τ, capturing the duration of all perceptual or motor processes that contribute to the RT but are not part of the decision process itself.
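For intuition, a single DDM trial can be simulated by Euler-Maruyama integration of the diffusion process. This is an illustrative Python sketch with arbitrary parameter values, not the model actually fitted to the data:

```python
import math
import random

def simulate_ddm(drift, threshold, start_bias=0.5, ndt=0.3,
                 dt=0.001, noise_sd=1.0, rng=None, max_t=5.0):
    """Simulate one drift-diffusion trial.

    Evidence starts at start_bias * threshold and diffuses between 0
    (lower, 'NoGo' bound) and threshold (upper, 'Go' bound). Returns
    (choice, rt), where rt = decision time + non-decision time ndt."""
    rng = rng or random.Random()
    x = start_bias * threshold
    t = 0.0
    while 0.0 < x < threshold and t < max_t:
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise
        x += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    choice = "Go" if x >= threshold else "NoGo"
    return choice, ndt + t
```

With a strongly positive drift, most simulated trials terminate at the upper ("Go") bound, and RTs shrink as the drift grows, mirroring the verbal description above.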
Typically, DDMs aim to explain choices when the response requirements given a certain visual input are clear to the participant. However, in the current study, participants learn the correct response for each cue over time, leading to progressively faster and more accurate responses. Recent advances in computational modeling propose that it is possible to combine drift-diffusion models with a reinforcement learning (RL) process, yielding a reinforcement-learning drift-diffusion model (RL-DDM) (Fontanesi et al. 2019; Pedersen et al. 2017; Miletić et al. 2020). We employed a simple Rescorla-Wagner model which uses outcomes r (+1 for rewards, 0 for neutral outcomes, -1 for punishments) to compute prediction errors r - Q, which we then used to update the action value Q for the chosen action a towards cue s:

Q(s, a) ← Q(s, a) + ε · (r - Q(s, a))

Here, the difference in Q-values between choice options (QGo - QNoGo) serves as the input to the drift rate. This difference is initially zero, but grows with learning (positive difference if "Go" leads to more rewards, and negative difference if "NoGo" leads to more rewards). This Q-value difference is then multiplied with a constant drift rate parameter. At the beginning of the learning process, the resulting low drift rates lead to more stochastic choices and slow RTs, but, as the Q-value difference grows, higher drift rates result in more deterministic choices and faster RTs. The learning process requires an additional free parameter, the learning rate ε, which determines the impact of the prediction error on belief updating. The drift rate parameter acts akin to the inverse temperature parameter used in the softmax choice rule, with higher drift rates leading to more deterministic choices.
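The Rescorla-Wagner update and its coupling to the drift rate can be written out in a few lines. This Python sketch uses illustrative parameter values (learning rate, drift-rate intercept δINT, and slope δSLOPE are arbitrary choices, not fitted estimates):

```python
def rw_update(q, outcome, lr):
    """Rescorla-Wagner update: move Q toward the outcome by the
    learning rate times the prediction error (outcome - q)."""
    return q + lr * (outcome - q)

def net_drift(q_go, q_nogo, delta_int, delta_slope):
    """RL-DDM drift rate: intercept plus scaled Q-value difference."""
    return delta_int + delta_slope * (q_go - q_nogo)

# Example: a cue for which Go is rewarded (r = +1) on every trial.
q_go, q_nogo, lr = 0.0, 0.0, 0.3
for _ in range(10):
    q_go = rw_update(q_go, 1.0, lr)  # chosen Go, rewarded

drift = net_drift(q_go, q_nogo, delta_int=0.2, delta_slope=2.0)
```

As the Q-value difference grows over trials, the net drift rate grows with it, producing the faster and more deterministic choices described above.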
One peculiarity of the Motivational Go/NoGo Task is the NoGo response option, which by definition does not yield RTs. Variants of the DDM allow for such responses by integrating over the latent RT distribution of the implicit NoGo decision boundary (Gomez et al. 2007; Ratcliff et al. 2018), for which an approximation exists (Blurton et al. 2012). This implementation has previously been used to model another variant of the motivational Go/NoGo task (Millner et al. 2017) and is implemented in the HDDM toolbox (Wiecki et al. 2013).
Note that RL-DDMs were not mentioned in the pre-registration, which only mentioned reinforcement learning models to be fitted to participants' choices. In light of the results from the regression analyses, incorporating RTs into the model and testing alternative mechanisms by which stakes could influence the choice process seemed warranted.
Model space. We fit a series of increasingly complex models. We first tested whether an RL-DDM fit the data better than a standard DDM; then tested the computational implementation of the Pavlovian bias; and lastly tested the effect of stakes on model parameters. Model M1 (parameters α, τ, β, δINT) featured just the DDM with a constant drift rate parameter, but no learning, assuming that participants have a constant propensity to make a Go response on any trial, irrespective of the presented cue. M2 (parameters α, τ, β, δINT, δSLOPE, ε) added a reinforcement learning process, updating Q-values for Go and NoGo for each cue with the observed feedback, multiplying the Q-value difference (QGo - QNoGo) with the drift rate parameter δSLOPE, and finally adding it to the drift-rate intercept δINT to obtain the net drift rate. Including a drift-rate intercept δINT, i.e., an overall tendency towards making a Go/NoGo response even when the Q-value difference was zero, which is similar to an overall Go bias parameter, yielded considerably better fit than models without such an intercept (Millner et al. 2017). Next, models M5-M8 (parameters α, τ, β, δWIN, δAVOID, δSLOPE, ε, plus one additional parameter π for high stakes) extended M4 and tested possible effects of the stakes on a single parameter, implementing an effect of the stakes on the threshold (M5), the non-decision time (M6), the bias (M7), or the drift rate intercept (M8). As a control, models M9-M11 (parameters α, τ, β, δWIN, δAVOID, δSLOPE, ε, plus two additional parameters π and θ for high stakes) tested effects of stakes on two parameters (only combinations that could potentially give rise to response slowing), namely on both the threshold and the non-decision time (M9), the threshold and the drift rate (M10; i.e., the two parameters typically modulated by speed-accuracy trade-offs), and the non-decision time and drift rate (M11).
Finally, given the results from model comparison of these earlier models, M12 tested whether the effect of stakes on the non-decision time was different for congruent and incongruent cues.
Model fitting and convergence checks.For each model, we used four chains with 10,000 iterations each (5,000 as warm-up), yielding a total of 20,000 samples contributing to the posteriors.
We checked that R-hat values for all parameters were below 1.01, that effective sample sizes for all parameters were at least 400, that chains were stationary and well-mixing (using trace plots), that the Bayesian fraction of missing information (BFMI) for each chain was above 0.2, and that (if possible) no divergent transitions occurred (Baribault and Collins 2023). To minimize the occurrence of divergent transitions, we increased the target average proposal acceptance probability (adapt_delta) to 0.99. We visually inspected that posterior densities were unimodal and that no strong trade-offs between parameters occurred across samples.
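The R-hat diagnostic compares between-chain and within-chain variance. A basic Gelman-Rubin version can be sketched as follows (a simplified illustration; modern samplers additionally use refinements such as chain-splitting and rank-normalization):

```python
from statistics import mean, variance

def rhat(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equally long lists of posterior draws. Values near 1
    indicate that chains have mixed; values above ~1.01 are suspect."""
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-chain
    w = mean(variance(c) for c in chains)                                # within-chain
    var_plus = (n - 1) / n * w + b / n  # pooled posterior variance estimate
    return (var_plus / w) ** 0.5
```

Two well-mixed chains give R-hat close to 1, while chains stuck in different regions of the posterior push R-hat well above the 1.01 criterion used here.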

Model comparison.
For model comparison, we used the LOO-IC (efficient approximate leave-one-out cross-validation information criterion) based on Pareto-smoothed importance sampling (PSIS) (Vehtari et al. 2017). For completeness, we also report the WAIC (widely applicable information criterion) in Supplementary Material S07, but give priority to the LOO-IC, which is more robust to weak priors or influential observations (Vehtari et al. 2017). Both WAIC and LOO-IC behave like the negative log-likelihood, with lower numbers indicating better model fit.
Posterior predictive checks. For the winning model M12, we randomly drew 1,000 samples from the posteriors of each participant's subject-level parameters, simulated a data set for each participant for each of these 1,000 parameter settings, and computed the mean simulated p(Go), p(Correct), and RT for each participant for each trial across parameter settings. We then plotted the mean simulated p(Go), p(Correct), and RT as a function of the relevant task conditions to verify that the model could reproduce key qualitative patterns in the empirical data (Palminteri et al. 2017).
Parameter recovery. For the winning model M12, we fitted a multivariate normal distribution to the mean subject-level parameters across participants and sampled 1,000 new parameter settings from this distribution. We simulated a data set for each parameter setting and fitted model M12 to the simulated data. We then correlated the "ground-truth" generative parameters used to simulate each data set with the fitted parameters obtained when fitting M12 to it. To evaluate whether correlations were significantly higher than expected by chance, we computed a permutation null distribution of the on-diagonal correlations. For this purpose, over 1,000 iterations, we randomly permuted the assignment of fitted parameter values to data sets, correlated generative and fitted parameter values, and saved the on-diagonal correlations. We tested empirical correlations against the 95th percentile of this permutation null distribution.
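The permutation test for recovery correlations can be sketched as follows (an illustrative Python version with a plain Pearson correlation helper; the actual analysis operated on the fitted model parameters):

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def permutation_null_95(generative, fitted, n_perm=1000, rng=None):
    """95th percentile of correlations after shuffling the fitted values,
    i.e., the chance level against which recovery correlations are tested."""
    rng = rng or random.Random(0)
    null = []
    for _ in range(n_perm):
        shuffled = fitted[:]
        rng.shuffle(shuffled)
        null.append(pearson(generative, shuffled))
    return sorted(null)[int(0.95 * n_perm) - 1]
```

A recovery correlation counts as above chance if it exceeds the cutoff returned by `permutation_null_95`.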

Model recovery.
For each of the 12 models, we fitted a multivariate normal distribution to the mean subject-level parameters across participants and sampled 1,000 new parameter settings from it (with the constraints that learning rates were required to be > 0.05 and parameter differences were sampled from the upper 50% of the parameter distribution, to keep models distinguishable). We simulated a new data set for each parameter setting, resulting in a total of 12,000 data sets. We fitted each of the 12 models to each data set, resulting in 144,000 model fits. To evaluate whether these probabilities were significantly higher than expected by chance, we computed a permutation null distribution of the on-diagonal probabilities. For this purpose, over 1,000 iterations, we randomly permuted the LOO-IC values of all fitted models for a given data set, counted how often each fitted model emerged as the winning model for the data sets of each generative model, and extracted the on-diagonal probabilities. We tested empirical probabilities against the 95th percentile of this null distribution.
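The forward confusion matrix reduces to counting, per generative model, which fitted model achieved the lowest LOO-IC. A Python sketch with toy numbers (the actual values come from the 144,000 model fits):

```python
def forward_confusion(looic_by_generative):
    """Compute p(best-fitting model = Y | generative model = X).

    looic_by_generative[x] is a list of data sets simulated from model x;
    each data set is a list of LOO-IC values, one per fitted model
    (lower = better fit). Rows of the returned matrix sum to 1."""
    n_models = len(looic_by_generative)
    matrix = []
    for x in range(n_models):
        counts = [0] * n_models
        for dataset in looic_by_generative[x]:
            winner = min(range(n_models), key=lambda m: dataset[m])
            counts[winner] += 1
        total = len(looic_by_generative[x])
        matrix.append([c / total for c in counts])
    return matrix
```

The inverse confusion matrix p(generative = X | best-fitting = Y) follows by normalizing the win counts over columns instead of rows.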
Supplemental Material S01: Overview of results from mixed-effects regression models
Here, we report an overview of all major statistical results reported in the main text and the supplementary material. For details on how the mixed-effects regressions were performed, see the Methods section of the main text.
S03. Overview of all regression models when including data from all N = 55 participants (including the one participant excluded from the analyses reported in the main text for not performing above chance level).

Table S09. Means and standard deviations of reaction times across participants per required action x valence x stakes condition.
Supplemental Material S05: Correlations with questionnaires
In line with the exploratory analysis plans mentioned in our pre-registration, we extracted the per-participant coefficients (fixed plus random effects) for (a) the effect of cue valence on responses (Pavlovian bias), (b) the effect of stakes on accuracy, (c) the effect of valence on RTs (Pavlovian bias), and (d) the effect of stakes on RTs. We then computed correlations of these coefficients with forward memory span (Fitzpatrick et al., 2015), backwards memory span, the non-planning subscale of the Barratt Impulsiveness Scale (Patton, Stanford, & Barratt, 1995), and the neuroticism subscale of the Big Five Aspects Scales (DeYoung, Quilty, & Peterson, 2007). One might plausibly hypothesize that impulsivity is related to the Pavlovian bias, since many impulsive behaviors can be conceptualized as automatic, cue-triggered behaviors; hence, individuals high on impulsivity might show stronger Pavlovian biases in responses and reaction times. Furthermore, one might hypothesize that the phenomenon of choking under pressure arises from rumination and worrying, which is typically increased in individuals scoring high on neuroticism (DeCaro, Thomas, Albert, & Beilock, 2011). Also, the effects of rumination on performance might be stronger in individuals with a low working memory score (Beilock & Carr, 2005; Bijleveld & Veling, 2014; DeCaro et al., 2011).
Hence, individuals high on neuroticism and/or low on working memory span might show stronger effects of stakes on behavior.
See Figures S01 and S02 for scatterplots of all bivariate associations. None of the correlations were significant, providing no evidence for the strength of the Pavlovian bias or the effect of stakes on responses and RTs being related to either working memory span, impulsivity, or neuroticism.
S11. Results from generalized additive mixed models (GAMMs) with difference smooths between two conditions. The parametric term reflects a linear difference between conditions, while the smooth term reflects any non-linear difference. Both add up to the total term. The time window of significant condition differences is automatically returned by the model.

Figure 2 .
Figure 2. Effect on the propensity of Go responses. A. Learning curves per cue condition. B. Proportion of Go responses per cue condition (individual dots are individual participant means). Participants show more Go responses to Go than NoGo cues (indicative of learning the task) and more Go responses to Win cues than Avoid cues (indicative of Pavlovian biases). C. Group-level (colored dot, 95%-CI) and individual-participant (grey dots) regression coefficients from a mixed-effects logistic regression of responses on required action, cue valence, and their interaction. D. Accuracy per cue condition and stakes condition. There is no effect of stakes on responses for any cue condition. E.

Figure 3 .
Figure 3. Effect on reaction times (RTs). A. Distribution of RTs for high and low stakes. RTs are slower under high stakes. B. RTs per cue condition. Participants show faster RTs for (correct) Go responses to Go cues than (incorrect) Go responses to NoGo cues, and faster RTs for Go responses to Win cues than Avoid cues (indicative of Pavlovian biases). C. Group-level (colored dot, 95%-CI) and individual-participant (grey dots) regression coefficients from a mixed-effects linear regression of RTs on required action, cue valence, and their interaction. D. RTs per cue condition and stakes condition. RTs are significantly slower under high stakes in the Go-to-Win (G2W), Go-to-Avoid (G2A), and NoGo-to-Win (NG2W) conditions. E. RTs per valence-action congruency and stakes condition. RTs after significantly

Figure 4 .
Figure 4. Reinforcement-learning drift-diffusion models. A. Model comparison. LOO-IC favors model M12, implementing separate drift rate intercepts for Win and Avoid cues and separate non-decision times for low stakes, congruent cues under high stakes, and incongruent cues under high stakes. B. Densities of best-fitting parameters for model M12 per participant. Drift rate intercepts for Win cues are consistently higher than drift rate intercepts for Avoid cues. Note that, although the winning model implements separate non-decision times for high/low stakes and congruent/incongruent cues, the parameter values for these different conditions are not significantly different from each other. C. Posterior predictive checks for the winning model M12. Left panel: Simulated proportion of Go responses per required action and cue valence, averaged over simulations and participants. The winning model M12 reproduces Pavlovian biases in responses and RTs (see Supplementary Material S07). Right panel: Simulated RTs per cue congruency per stakes level, averaged over simulations and participants. The winning model M12 reproduces the overall slowing under high stakes as well as differences in slowing between congruent and incongruent cues, but underestimates this difference compared to the empirical data. For further plots, see Supplementary Material S07. D. Parameter recovery for the winning model M12. Correlations between generative parameters used for simulating 1,000 data sets based on M12 and parameters obtained when fitting M12 to the simulated data. All correlations between generative and fitted parameters (on-diagonal correlations) are significantly above chance. E. Model recovery for models M1-M12. The forward confusion matrix displays the conditional probabilities that model Y is the best-fitting model (columns) if model X (rows) is the underlying generative model used to simulate a given data set. On-diagonal probabilities indicate the probability of re-identifying the generative model. All on-diagonal probabilities are significantly above chance. Recovery for M12 in particular is exceptionally high. For the inverse confusion matrix and matrices on subsets of models, see Supplementary Material S07.
For each data set, we identified the model with the lowest LOO-IC. We counted how often each fitted model Y emerged as the winning model for the data sets of each generative model X, computing the forward confusion matrix containing the conditional probabilities p(best-fitting model = Y | generative model = X) for each combination of generative model X and fitted model Y (Wilson and Collins 2019). We also computed the inverse confusion matrix containing p(generative model = X | best-fitting model = Y); see Supplementary Material S07.

Figure S01 .
Figure S01. Association of memory performance, impulsivity, and neuroticism with the valence and stakes effects on responses. Correlations between the effect of valence on responses (A-D), reflecting Pavlovian biases, and the effect of stakes on accuracy (E-H) with (A/E) forward working memory span, (B/F) backwards working memory span, (C/G) impulsivity (Barratt Impulsiveness Scale, non-planning subscale), and (D/H) neuroticism. Black dots represent per-participant scores, the red line the best-fitting regression line, and the grey shade the 95%-confidence interval. None of the displayed correlations is significant at α = .05.

Figure S03 .
Figure S03. Association of memory performance, impulsivity, and neuroticism with the valence and stakes effects on RTs. Correlations between the effect of valence on RTs (A-D), reflecting Pavlovian biases, and the effect of stakes on RTs (E-H) with (A/E) forward working memory span, (B/F) backwards working memory span, (C/G) impulsivity (Barratt Impulsiveness Scale, non-planning subscale), and (D/H) neuroticism. Black dots represent per-participant scores, the red line the best-fitting regression line, and the grey shade the 95%-confidence interval. None of the displayed correlations is significant at α = .05.

Figure S05 .
Figure S05. Posterior predictive checks for data simulated from the winning model M12. A. Both in the empirical data (left panel) and in data simulated from the winning model M12 (right panel), (simulated) participants performed more Go responses to Go than NoGo cues (learning) and more Go responses to Win than Avoid cues (Pavlovian bias). Simulated data matched the empirical data pattern. B. Both in empirical and simulated data, (simulated) participants showed faster responses to Go than NoGo cues and to Win than Avoid cues. Simulated data matched the empirical data pattern. C. Both in empirical and simulated data, (simulated) participants performed more accurately for congruent than incongruent cues, with no difference between high and low stakes. D. Both in empirical and simulated data, (simulated) participants performed faster for congruent than incongruent cues and under low compared to high stakes. In empirical participants, the stakes effect was stronger for incongruent than congruent cues, but this difference was somewhat underestimated by the winning model M12.

Figure S06 .
Figure S06. Parameter recovery results for the winning model M12. The correlation between generative and fitted parameters, i.e., recovery, is overall very high. It is least optimal (but still strongly significant) for δSLOPE and ε, which trade off against each other (see Fig. 4D in the main text). α = decision threshold, τ = non-decision time, β = starting point bias, δ = drift rate, ε = learning rate.

Figure S07 .
Figure S07. Forward and inverse confusion matrices from model recovery for all models and for nested sub-versions of the winning model M12. A. The forward confusion matrix displays the conditional probabilities that model Y is the best-fitting model (columns) if model X (rows) is the underlying generative model used to simulate a given data set (identical to Fig. 4E in the main text). Rows sum to 100%. On-diagonal probabilities indicate the probability of re-identifying the generative model. All on-diagonal probabilities are significantly above chance (range 0.13-0.98; 95th percentile of permutation null distribution: p = 0.10). Recovery for M12 in particular is exceptionally high (98%). B. The inverse confusion matrix displays the conditional probabilities that model X is the generative model (rows) if model Y (columns) is the best-fitting model for a given data set. Columns sum to 100%. On-diagonal probabilities indicate the probability of re-identifying the generative model. All on-diagonal probabilities are significantly above chance (range 0.30-1.00; 95th percentile of permutation null distribution: p = 0.10). C. Forward confusion matrix only for the five models that are nested sub-versions of M12 (i.e., M1, M2, M4, M6, M12). Recovery is overall much higher (range 0.44-0.99; 95th percentile of permutation null distribution: p = 0.22). D. Inverse confusion matrix only for the five models that are nested sub-versions of M12. Recovery is overall much higher (range 0.58-0.99; 95th percentile of permutation null distribution: p = 0.22).

Table S10. Results from generalized additive mixed models (GAMMs) with a separate smooth per condition. The parametric term reflects a linear change in time, while the smooth term reflects any non-linear changes. Both add up to the total term.

Condition | Parametric coefficient (intercept difference) | Smooth (non-linear differences)
Incongruent/low | t(3, 0.104) = 6.496, p < .001 | F(1.947, 2.381) = 15.505, p < .001