Goal-directed recruitment of Pavlovian biases through selective visual attention

Prospective outcomes bias behavior in a “Pavlovian” manner: Reward prospect invigorates action, while punishment prospect suppresses it. Theories have posited Pavlovian biases as global action “priors” in unfamiliar or uncontrollable environments. However, this account fails to explain why these biases are so strong, causing frequent action slips, even in well-known environments. We propose that Pavlovian control is additionally useful if flexibly recruited by instrumental control. Specifically, instrumental action plans might shape selective attention to reward/ punishment information and thus the input to Pavlovian control. In two eye-tracking samples (N = 35/ 64), we observed that Go/ NoGo action plans influenced when and for how long participants attended to reward/ punishment information, which in turn biased their responses in a Pavlovian manner. Participants with stronger attentional effects showed higher performance. Thus, humans appear to align Pavlovian control with their instrumental action plans, extending its role beyond action defaults to a powerful tool ensuring robust action execution.

[…] trigger Pavlovian biases that facilitate the implementation of these plans. Stronger "outsourcing" of action implementation to such an attentional recruitment of Pavlovian biases leads to higher performance. These findings highlight how Pavlovian biases are more flexible than previously thought and how strong biases can be of advantage.

[…] these biases are so seemingly strong and hard to suppress.

A relevant question in many fields of psychology is whether distinct strategies operate in isolation, conflict with each other, or even work in synergy. Specifically, more sophisticated strategies might "outsource" certain sub-routines to simpler strategies, yielding a "division of labor". Such a synergy is frequently assumed to evolve over time, with initial acquisition through more "explicit", rule-driven strategies, which are later outsourced to more "implicit", incremental, habit-like strategies, a division prominent in goal-directed vs. habitual decision-making (Balleine & Dickinson, 1998 […]). […] Crucially, such a simultaneous collaboration requires both systems to be permanently active. In this paper, we propose that instrumental and Pavlovian control can also work in such a synergy.

In contrast to previous literature that has assumed a parallel, strictly segregated arrangement of instrumental and Pavlovian control, we suggest that the instrumental system can adaptively recruit and steer the Pavlovian system by selecting its input via visual attention. Humans are not just passively exposed to the reward and punishment cues that drive these biases. Instead, they can actively seek out or ignore these cues and thereby modulate their influence via selective visual attention […] to cues that activate them and then automatically trigger the intended action.
In this scenario, it might be warranted to keep the Pavlovian system permanently "online", accepting a few infrequent errors for the benefit of overall more robust action implementation. This view contrasts with earlier assumptions that Pavlovian biases are merely "defaults" to fall back on in novel or uncontrollable environments. Instead, keeping Pavlovian control constantly online during instrumental goal pursuit might be advantageous. However, previous task designs measuring Pavlovian biases do not match such scenarios in which agents actively seek out information that helps them achieve their goals. We developed a new paradigm that temporally separates action selection, attention to reward and punishment information, and action execution. We then tested whether humans seek out reward and punishment information (and allow Pavlovian biases to shape responding) in a way that is aligned with their action goals. Note that, in the following, we use the term "goal-directed" to denote such a synchronization between action goals and information search, while remaining agnostic about whether negative information is required for making the correct choice. […] Theoretical perspectives have speculated that longer attention to an option facilitates memory retrieval of its features, which could […] perceptual sensitivity to be selectively sharpened for features relevant for an ongoing action, e.g., object location for reaching movements or object size and orientation for grasping movements (Bekkering & Neggers, 2002; Craighero et al., 1999; Fagioli et al., 2007). However, in the domain of value-based decision-making, similar evidence for task goals shaping attention is scarce.
One relevant finding might be that humans tend to seek out a choice option one final time before selecting it ("last fixation" or "late onset" bias), even if they already know this option to be superior to other options (Hunt et al., 2016; Kaanders et al., 2021). In this case, attention appears to be guided by choice rather than vice versa, extending the premotor theory of attention to value-based decision-making.

Taken together, there appear to be mechanisms synchronizing agents' attention with their action plans, and there is tentative evidence for attention to reward and punishment information triggering automatic responses in the fashion of Pavlovian biases. Hence, it seems indeed possible that an instrumental system could "recruit" the Pavlovian system to "aid" the execution of action plans by strategically steering attention toward relevant information. We tested this idea in two samples (the second one a direct, pre-registered replication) using eye-tracking. For this purpose, we designed a novel Go/ NoGo learning task in which action planning and execution were separated by a phase in which participants could preview the positive or negative outcomes at stake. Notably, information about these outcomes was not informative for the selection of the correct action. We predicted that action plans would shape attention to reward and punishment stakes, i.e., that participants' first fixation (not confounded by bottom-up saliency effects, owing to a gaze-contingent design) would land more often on the reward information when participants planned a Go (compared to a NoGo) action. Conversely, we predicted an effect of attention duration to rewards vs. punishments on the final response, i.e., that longer attention to reward compared to punishment information would lead to more Go responses and faster reaction times (Fig. 1A, B).
Such a goal-directed recruitment of Pavlovian biases would extend their role beyond mere "default" strategies in novel environments towards a powerful tool aiding robust action execution.

Figure 1. A. […] The first fixation anchors attention and (partly) determines which stakes will receive more attention, which is additionally modulated by bottom-up signals such as the magnitude of the stakes. The relative attention on reward versus punishment stakes (dwell time) biases the final Go/ NoGo action in a Pavlovian manner. B. Cartoon illustration of the proposed interaction of action planning and attention. C. Task design. Participants learned Go/ NoGo responses to various cues (cover story: feed/ not feed various oyster types to maximize pearls and minimize toxic tumors). Cue presentation (instructing the correct action) and action execution are separated by a phase in which the rewards (pearls, here orange) and punishments (toxic tumors, here blue) at stake for correct/ incorrect responses are presented in a gaze-contingent manner. Afterwards, the oyster (black oval) can be fed; for Go responses, participants have to press the button on the side where it is "still open". Outcomes are delivered probabilistically (75% feedback validity). On catch trials, participants have to indicate whether the oyster featured more pearls or tumors (cover story: the oyster is stolen by thieves and has to be retrieved from the police, which requires identification).
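The probabilistic feedback rule summarized in panel C (and detailed under Task) can be made concrete with a small simulation. This is a minimal sketch, not the authors' task code: the function and field names are hypothetical, and the assumption that reward and punishment stakes are drawn independently and uniformly from 1-5 is ours, since the stake distribution is not stated here. The core logic is that a response is rewarded exactly when its correctness matches the trial's validity.

```python
import random
from dataclasses import dataclass


@dataclass
class TrialOutcome:
    reward_stake: int      # pearls at stake (1-5)
    punishment_stake: int  # tumors at stake (1-5)
    valid: bool            # True on "valid" trials (probability = feedback validity)
    outcome: int           # signed outcome: +pearls if rewarded, -tumors if punished


def simulate_trial(correct: bool, rng: random.Random, validity: float = 0.75) -> TrialOutcome:
    # Assumption (not from the source): stakes are independent and uniform on 1-5.
    reward_stake = rng.randint(1, 5)
    punishment_stake = rng.randint(1, 5)
    valid = rng.random() < validity
    # Valid trials: correct -> reward, incorrect -> punishment; invalid trials reverse this.
    rewarded = correct == valid
    outcome = reward_stake if rewarded else -punishment_stake
    return TrialOutcome(reward_stake, punishment_stake, valid, outcome)


if __name__ == "__main__":
    rng = random.Random(1)
    trials = [simulate_trial(correct=True, rng=rng) for _ in range(10_000)]
    reward_rate = sum(t.outcome > 0 for t in trials) / len(trials)
    print(f"Reward rate for always-correct responding: {reward_rate:.3f}")  # close to 0.75
```

An always-correct agent is rewarded on roughly 75% of trials and an always-incorrect agent on roughly 25%, which is why single outcomes are only partially informative about the optimal action.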

Participants and Exclusion Criteria
In Sample 1, we recorded eye-tracking data from 35 participants (M age = 23.7, SD age = 4.1, range 18-35, one outlier at age 58; 27 women, 8 men; 30 right-handed; 21 with the right eye as dominant eye). In Sample 2 (replication sample), we recorded data from 64 participants (M age = 21.5, SD age = 3.0, range 18-34; 50 women, 13 men, 1 other; 62 right-handed; 41 with the right eye as dominant eye). For this replication sample, the study design, hypotheses, and analysis plan were pre-registered (https://osf.io/nsy5x). The sample size for this sample was based on the effect size of the primary effect of interest in Sample 1, i.e., action requirements affecting first fixations (z = 2.89, Cohen's d = 0.49), which yielded a required sample of N = 57 to detect such an effect with 95% power (two-sided one-sample t-test; Murayama et al., 2022). We initially collected data from 57 participants, but given that seven participants did not perform significantly above chance level, we collected data from seven additional participants. Performance above 56% in 240 trials was significantly above chance (one-sided binomial test […]).

[…] was placed 20 cm in front of the screen, and participants' chin rest 90 cm in front of the screen. Before the task, participants performed a 9-point calibration and validation procedure (software provided by SR Research). Calibration was repeated until an error < 1° was achieved for all points. The screen background grey tone (RGB 180, 180, 180) was constant across calibration and the experimental task.

Task

Participants performed a Go/ NoGo learning task with delayed response execution, called the Oyster Farming Task (Fig. 1C). On each trial, participants cultivated an oyster that could grow either 1-5 pearls or 1-5 hazardous tumors. Pearls gained money, while tumors cost money for disposal.
To maximize the probability that oysters grew pearls, participants needed to learn which oysters to feed ("Go") and which ones not to feed ("NoGo"). Crucially, participants could choose to reveal the reward (number of pearls) and punishment (number of tumors) at stake prior to action execution in a gaze-contingent design. Participants' score of accumulated money was turned into a bonus of 0-2€ at the end of the task. Participants performed 264 trials split into three blocks of 88 trials (80 trials of the Go/ NoGo task, 8 catch trials), each with a new set of four oyster types. For detailed information on the instructions, see the original materials used in this study, available in the data sharing collection under [All data and code will be made available upon manuscript acceptance].

Each trial started with one (of four) abstract action cues (letters from the Agathodaimon alphabet; size 5.2° x 5.2°) presented for 700 ms in the center of the screen, representing an oyster type. For each oyster type, there was an optimal action (feed or not feed) that participants needed to learn by trial-and-error. Feeding was only possible when the oysters "opened" later in the trial. The optimal action led to rewards (pearls) on 75% of trials ("valid" trials), otherwise to punishments (tumors; on "invalid" trials). Conversely, suboptimal actions led to punishments on valid trials, but to rewards on invalid trials. During action cue presentation, participants were informed about the sides (left vs. right) on which the upcoming stakes information (rewards vs. punishments) would appear via faintly colored semi-circles in the respective colors (blue and orange, counter-balanced across participants).

Directly after action cue offset, participants were cued with the exact locations of the stakes and given 1,500 ms to unveil the tumors and pearls at stake on the respective trial.
Stakes were revealed in a gaze-contingent fashion: fuzzy circular color patches appeared on the semi-circles, which changed into the number of pearls/ tumors at stake when participants fixated them. This eliminated any bottom-up saliency effects (e.g., of stake magnitude) in peripheral vision that could affect participants' first fixations. To prevent exact pre-programming of saccades, the exact locations of stakes varied across trials. Stakes were located on an invisible circle with a radius of 5.2° visual angle around the screen center (i.e., the distance of stakes from the center was kept constant), with a vertical displacement between -45 and +45 degrees from the horizontal midline. Vertical displacement was always identical for pearls and tumors. Stakes were represented by circular areas of interest (AOI) of 150 pixels (2.7°), with a minimal distance between stakes (at maximal vertical displacement) of 514 pixels (9.4°) and a maximal distance (positioned on the horizontal midline) of 852 pixels (15.6°). Stakes were presented in orange (RGB 200, 100, 7) and blue (RGB 104, 104, 255) of equal luma. Stakes varied in magnitude (1-5 items; total display size 2.6° x 2.6°) and magnitude was […]

[…] as being on the reward/ punishment stakes when gaze position was less than 150 pixels away from the center of the respective stakes image, which was also the criterion in our gaze-contingent design for rendering stakes visible. For each trial, we computed the first fixation on any stakes object (reward or punishment) as well as the total duration (in ms) for which rewards and punishments were fixated over the entire trial ("dwell time"). Absolute dwell times were converted into a dwell time difference (reward time minus punishment time) and dwell time ratios […] Material 1 and led to identical conclusions.
Analyses using only the trials on which participants fixated both stakes led to largely identical conclusions.
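The two numeric criteria stated under Participants and Exclusion Criteria (the replication sample size and the chance-level cutoff) can be checked with a few lines of standard-library Python. This is a sketch under stated assumptions, not the procedure of Murayama et al. (2022) itself: it converts the Sample 1 statistic into an effect size via d = z / √N and replaces an exact noncentral-t power calculation with a normal-approximation formula plus a small-sample correction term. The chance cutoff uses an exact one-sided binomial test over the 240 Go/ NoGo trials (three blocks of 80, catch trials excluded). All function names are illustrative.

```python
import math
from statistics import NormalDist


def cohens_d_from_z(z: float, n: int) -> float:
    """One-sample effect size from a group-level test statistic: d = z / sqrt(N)."""
    return z / math.sqrt(n)


def required_n_one_sample(d: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate N for a two-sided one-sample t-test.

    Normal-approximation formula plus the small-sample correction z_{1-alpha/2}^2 / 2,
    an approximation to the exact noncentral-t calculation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


def binomial_upper_tail(n_trials: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n_trials, p)."""
    return sum(math.comb(n_trials, i) * p**i * (1 - p) ** (n_trials - i)
               for i in range(k, n_trials + 1))


def chance_threshold(n_trials: int, alpha: float = 0.05) -> int:
    """Smallest number of correct trials that beats chance (one-sided binomial test)."""
    for k in range(n_trials + 1):
        if binomial_upper_tail(n_trials, k) < alpha:
            return k
    raise ValueError("no threshold reaches the requested alpha")


if __name__ == "__main__":
    d = cohens_d_from_z(2.89, 35)
    print(f"d = {d:.2f}, required N = {required_n_one_sample(d)}")
    k = chance_threshold(240)  # 3 blocks x 80 Go/NoGo trials
    print(f"above chance from {k}/240 correct ({k / 240:.1%})")
```

With these inputs the sketch reproduces d ≈ 0.49 and N = 57, and it places the chance cutoff close to the 56% criterion used for exclusion.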

Data Analysis
General strategy. We tested hypotheses using mixed-effects linear regression (function lmer) and logistic regression (function glmer) as implemented in the package lme4 in R. We used generalized linear models with a binomial response distribution and a logit link function (i.e., logistic regression) for binary dependent variables such as responses (Go vs. NoGo) and first fixation, and linear models for continuous variables such as RTs or dwell time. We used zero-sum coding for categorical independent variables. All continuous dependent and independent variables were standardized such that regression weights can be interpreted as standardized regression coefficients. All regression models contained a fixed intercept. We added all possible random intercepts, slopes, and correlations to achieve a maximal random-effects structure. P-values were computed using likelihood ratio tests with the package afex (Singmann et al., 2018). We considered p-values smaller than α = 0.05 as statistically significant.

The main analyses were pre-registered for Sample 2 (replication sample; pre-registration available under https://osf.io/nsy5x). We deviated from our pre-registration by reporting results based […] and ii) whether responses were influenced by the magnitude of the reward and punishment stakes, reflecting the presence of a Pavlovian bias. For this purpose, we fitted mixed-effects regressions with responses (Go/ NoGo) and (as a secondary variable) reaction times as dependent variables and a) the required action (Go/ NoGo) as well as b) the difference in reward and punishment stake magnitude (ranging from -4 to +4) as independent variables. A significant effect of stake difference was followed up with post hoc analyses separating the effects of reward and punishment stake magnitudes, reported in Supplemental Material 3.

Analysis of gaze patterns.
Our first key prediction was that action plans, elicited by the oyster cues, directed attention towards action-congruent stakes (reward stake for Go requirement, punishment stake for NoGo requirement). The crucial test of this prediction was whether the action requirement elicited by the cue affected the location of the first fixation (on the reward versus the punishment stake). This first fixation was not confounded by any bottom-up saliency effects since, in our gaze-contingent design, the magnitudes of the stakes were not yet visible. We used both the required action (Go or NoGo) and the difference in the modeled Q-values for Go relative to NoGo responses as independent variables to predict the first fixation. These analyses also included catch trials since, during the stakes phase, participants were unaware of whether the trial would be a Go/ NoGo or a catch trial. All eye-tracking analyses contained a regressor capturing participant-specific side biases (overall preference to fixate the left or right).
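The trial-level gaze measures used here (first fixation and dwell time, with gaze counted as "on" a stake within 150 pixels of its center) and the predictor coding from the general strategy can be sketched as follows. Assumptions not stated in the source: gaze arrives as screen-centered (x, y) pixel samples at a fixed interval (1 ms here, a hypothetical default), the first sample inside either AOI stands in for the first fixation (a real pipeline would use the eye tracker's parsed fixation events), and all helper names are illustrative.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
AOI_RADIUS_PX = 150.0  # gaze < 150 px from a stake center counts as "on" that stake


@dataclass
class TrialGaze:
    first_on_reward: Optional[int]  # 1 = reward first, 0 = punishment first, None = no stake looked at
    dwell_reward_ms: float
    dwell_punishment_ms: float

    @property
    def dwell_difference_ms(self) -> float:
        """Reward minus punishment dwell time."""
        return self.dwell_reward_ms - self.dwell_punishment_ms


def in_aoi(gaze: Point, center: Point, radius: float = AOI_RADIUS_PX) -> bool:
    return math.dist(gaze, center) < radius


def summarise_trial(samples: Sequence[Point], reward_xy: Point, punish_xy: Point,
                    sample_ms: float = 1.0) -> TrialGaze:
    """First stake looked at and total dwell time per stake for one trial's gaze samples."""
    first: Optional[int] = None
    dwell_r = dwell_p = 0.0
    for xy in samples:
        on_r, on_p = in_aoi(xy, reward_xy), in_aoi(xy, punish_xy)
        if first is None and (on_r or on_p):
            first = 1 if on_r else 0
        if on_r:
            dwell_r += sample_ms
        if on_p:
            dwell_p += sample_ms
    return TrialGaze(first, dwell_r, dwell_p)


def sum_code(required_action: str) -> int:
    """Zero-sum (effect) coding of the required action: Go = +1, NoGo = -1."""
    return 1 if required_action == "Go" else -1


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and SD 1 (sample SD, as R's scale() uses)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Hypothetical trial: 100 ms at fixation, 300 ms on the reward stake, 200 ms on the punishment stake
    samples = [(0.0, 0.0)] * 100 + [(400.0, 10.0)] * 300 + [(-390.0, 0.0)] * 200
    g = summarise_trial(samples, reward_xy=(400.0, 0.0), punish_xy=(-400.0, 0.0))
    print(g.first_on_reward, g.dwell_difference_ms)  # 1 100.0
```

Because the closest the two stakes ever come is 514 pixels, two 150-pixel AOIs cannot overlap, so a sample is never counted for both stakes.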

Computational modelling of action values. We tested the impact of participants' action intentions on their attention towards the reward and punishment stakes using two operationalizations. Firstly, we approximated participants' intentions by the action required by the presented cue (oyster type). However, this operationalization assumes that participants (have learned and) know the required action. This assumption is violated i) at the beginning of blocks, when participants cannot know the required action yet and still have to acquire it through trial-and-error, and ii) even more so in participants who fail to learn the correct response for (some of) the cues. Thus, secondly, as a more proximate measure of participants' beliefs about what they should do, we fitted a simple Rescorla-Wagner Q-learning model to the Go/ NoGo response data of each participant. This model uses outcomes r (+1 for rewards, -1 for punishments, given that the exact outcome magnitude is irrelevant for learning) to update the action value Q for the chosen action a in response to cue s:

$$Q_{t+1}(a_t, s_t) = Q_t(a_t, s_t) + \alpha \left[ r_t - Q_t(a_t, s_t) \right] \tag{1}$$

Action values were then translated into action probabilities using a softmax choice rule:

$$p_t(a \mid s_t) = \frac{\exp\left(\beta \, Q_t(a, s_t)\right)}{\sum_{a' \in \{\mathrm{Go}, \mathrm{NoGo}\}} \exp\left(\beta \, Q_t(a', s_t)\right)} \tag{2}$$

The model featured the free parameters α […] participants were instructed that such "directional" errors were always counted as incorrect. Feedback was thus not informative as to whether a Go or NoGo response would have been correct for this cue.
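The Rescorla-Wagner model can be sketched as a delta-rule update Q ← Q + α(r − Q) applied to the chosen action only, combined with a softmax choice rule, and it yields the trial-wise regressor Q_Go − Q_NoGo used in the gaze analyses. This is a minimal sketch, not the authors' fitting code: the inverse temperature β, the brute-force grid-search fit, and all names are our assumptions. Q-values are stored per cue and start at zero, so with a new set of oyster types in every block the regressor starts at zero at each block onset, as described in the text.

```python
import math
from typing import Dict, List, Sequence, Tuple

ACTIONS = ("Go", "NoGo")
Trial = Tuple[str, str, int]  # (cue, chosen action, outcome: +1 reward / -1 punishment)


def softmax(q: Dict[str, float], beta: float) -> Dict[str, float]:
    """Softmax choice rule; beta is an inverse temperature (assumed parameter)."""
    m = max(q.values())  # subtract the maximum for numerical stability
    exps = {a: math.exp(beta * (v - m)) for a, v in q.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def run_model(trials: Sequence[Trial], alpha: float, beta: float) -> Tuple[List[float], float]:
    """Return Q_Go - Q_NoGo per trial (before that trial's update) and the negative log-likelihood."""
    q: Dict[str, Dict[str, float]] = {}
    q_diff: List[float] = []
    nll = 0.0
    for cue, action, r in trials:
        q_cue = q.setdefault(cue, {a: 0.0 for a in ACTIONS})  # new cues start at Q = 0
        q_diff.append(q_cue["Go"] - q_cue["NoGo"])
        nll -= math.log(softmax(q_cue, beta)[action])
        q_cue[action] += alpha * (r - q_cue[action])  # delta-rule update of the chosen action only
    return q_diff, nll


def fit_grid(trials: Sequence[Trial]) -> Tuple[float, float]:
    """Maximum-likelihood fit by brute-force grid search (illustrative only)."""
    alphas = [i / 20 for i in range(1, 20)]
    betas = [0.5 * i for i in range(1, 21)]
    return min(((a, b) for a in alphas for b in betas),
               key=lambda ab: run_model(trials, *ab)[1])


if __name__ == "__main__":
    # Hypothetical session: cue "A" requires Go; the learner mostly goes and is usually rewarded.
    trials = [("A", "Go", 1), ("A", "Go", 1), ("A", "NoGo", -1), ("A", "Go", -1), ("A", "Go", 1)]
    alpha, beta = fit_grid(trials)
    q_diff, _ = run_model(trials, alpha, beta)
    print(alpha, beta, [round(v, 2) for v in q_diff])
```

In practice, a continuous optimizer with bounded parameters would replace the grid search; the grid only keeps the sketch dependency-free.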

Analysis of effects of attention on responses and reaction times. Our second key prediction was that attention to the reward and punishment stakes would shape action execution. To test this prediction, we tested whether the dwell time difference (milliseconds spent attending to reward stakes minus milliseconds spent attending to punishment stakes) predicted responses (Go vs. NoGo) and response speed (RT, for Go responses only). These analyses excluded catch trials (where responses did not relate to learning but to comparing stake magnitudes). All analyses involving responses or reaction times as dependent variables controlled for the required response as well as participant-specific side biases (overall preference to first fixate the left or right). Results did not change when controlling for the Q-value difference instead of the required response.

Note that, in our pre-registration for Sample 2, we mentioned the plan to fit reinforcement-learning drift diffusion models (RL-DDMs) to the combined choice and RT data. See Supplemental Material 3 for a discussion of why these models were unable to reproduce important qualitative patterns present in the empirical data, which was likely due to the tight response deadline and the NoGo response option.

Between-subjects correlations of accuracy. If humans synchronized their attention with their action plans such that Pavlovian biases would align with instrumental action requirements, one would expect this process to facilitate task performance and lead to higher accuracy. To test whether participants with stronger effects of attention on the final response indeed showed higher accuracy, we performed exploratory analyses by computing between-subjects correlations between overall task accuracy and i) the degree to which stake differences (reward minus punishment stake magnitude) affected responses as well as ii) the degree to which relative dwell time (reward minus punishment dwell time) affected responses.
For this purpose, we refit the respective models on all participants, collapsing across both samples (total N = 99), and computed between-subjects correlations between participants' percent correct responses and their respective regression coefficients (extracted as the sum of the fixed effect and the participant-specific random effect).
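The between-subjects analysis can be sketched as follows: each participant's coefficient is the fixed-effect slope plus that participant's random-slope deviation (in lme4, coef() on a fitted model returns these per-participant sums), which is then correlated with percent correct. We assume a Pearson correlation, which the text does not specify; the function names are illustrative and the numbers in the usage example are made up.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def per_participant_coefficients(fixed: float, random_dev: Dict[str, float]) -> Dict[str, float]:
    """Coefficient per participant = fixed effect + participant-specific random deviation."""
    return {pid: fixed + dev for pid, dev in random_dev.items()}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_accuracy_correlation(fixed: float, random_dev: Dict[str, float],
                                     accuracy: Dict[str, float]) -> float:
    coefs = per_participant_coefficients(fixed, random_dev)
    ids = sorted(coefs)  # align both vectors by participant ID
    return pearson_r([coefs[i] for i in ids], [accuracy[i] for i in ids])


if __name__ == "__main__":
    # Hypothetical values for illustration only
    random_dev = {"p01": -0.20, "p02": 0.05, "p03": 0.15, "p04": -0.05}
    accuracy = {"p01": 0.62, "p02": 0.74, "p03": 0.81, "p04": 0.69}
    r = coefficient_accuracy_correlation(0.10, random_dev, accuracy)
    print(f"r = {r:.2f}")
```

Adding the same fixed effect to every participant shifts all coefficients equally and therefore leaves the correlation unchanged; the fixed part matters only when the coefficients themselves are interpreted (e.g., whether a participant's attentional effect is positive).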

Transparency and openness
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. All data, analysis code, and research materials are available under [All data and code will be made available upon manuscript acceptance]. The study design, hypotheses, and analysis plan for Sample 2 were pre-registered under https://osf.io/nsy5x. Data were analyzed using R, version […]

[…] χ²(1) = 6.31, p = .012). The effect of stake differences did not become weaker over trials or blocks (see Supplemental Material 3). Separating these effects for the reward and punishment stakes showed that effects were driven by both valences: Higher (relative to lower) reward stake magnitude increased responding and speeded up responses, while higher (relative to lower) punishment stake magnitude decreased responding and slowed responses (see Supplemental Material 3).

In sum, we found evidence that participants learned the task and that the reward and punishment stake magnitudes biased responding in opposite directions, reflecting Pavlovian biases. For reaction times, we found larger reward stake magnitudes to speed up responding and larger punishment stake magnitudes to slow down responding, again in line with Pavlovian biases as […]

[…] B. Participants performed more Go responses on trials where the reward stake was higher than the punishment stake (green bars) than vice versa (red bars). Individual data points reflect response proportions per participant. C. Stake magnitudes biased responding in a continuous fashion. A higher stake difference (i.e., reward stake minus punishment stake) resulted in a higher proportion of Go responses. Faint grey lines represent regression lines per participant as predicted by the mixed-effects regression model; the bronze line represents the group-level regression line; bronze shading represents the mean and 95% confidence interval.
Note the two strong outliers in Sample 2; excluding these outliers did not change conclusions. D. Performance in the catch trials. Individual data points reflect accuracy per participant.

Action plans direct eye gaze
Next, we tested whether participants' attention was synchronized with their action plans. Such a link would allow Pavlovian biases to be elicited specifically by reward/ punishment cues that trigger an action in line with participants' intentions. As a measure of goal-directed attention, we used the first fixation on each trial (Konovalov & Krajbich, 2016), which was unaffected by any bottom-up saliency effects of the (yet to be uncovered) stakes in our gaze-contingent design. On trials that required a Go response, participants were significantly more likely to first fixate rewards than on trials that required a NoGo response (Sample 1: b = 0.11, 95% CI [0.04, 0.19], χ²(1) = 13.92, p < .001; Sample 2: […] χ²(1) = 7.88, p = .005; Fig. 3A).

This analysis used the required response as a predictor on every trial, which is globally appropriate given that participants learned the task. However, at the beginning of blocks, participants could not know the required response yet. Furthermore, some participants failed to learn the correct response for (some of) the cues. Thus, as a more proximate measure of participants' beliefs about what they should do, we fitted a simple Rescorla-Wagner model (Rescorla & Wagner, 1972) to the Go/ NoGo response data of each participant, simulated the action (Q) values for Go and NoGo responses on each trial, and used the difference Q_Go - Q_NoGo as a regressor to quantify the trial-by-trial relative value of making a Go relative to a NoGo response. At the beginning of a block, this regressor is zero, and it stays (close to) zero if participants fail to learn the correct response.
We found that the more Q-values favored a Go compared to a NoGo response, the more likely were participants […]

We furthermore performed exploratory analyses to test whether action plans affect attention beyond the first fixation, i.e., also the overall difference in dwell time to the stakes (dwell time on the reward stake minus dwell time on the punishment stake). This difference was higher when the reward stake was fixated first ([…] Fig. 3B). This latter effect shows that total dwell time was not completely determined by the first fixation, which was shaped by "top-down" action values, but was additionally sensitive to bottom-up saliency effects of the stake magnitudes.

In sum, we find evidence that participants' attention to valenced stakes information, in terms of both initial fixation and total dwell time, was synchronized with their initial action plans.

Eye gaze predicts responses
We next assessed whether and how attention shaped the ultimate response. We used the difference in dwell times (reward minus punishment stakes) as an integral measure of total attention (Konovalov & Krajbich, 2016). We controlled for the required action to show that attention predicted the eventual response even beyond participants' likely intentions. The longer participants attended to rewards compared to punishments, the more likely they […] responding, but did not significantly affect reaction times, while higher dwell time on punishment decreased responding and slowed responses (see Supplemental Material 5). We did not observe any interaction effects between stakes and dwell time effects (see Supplemental Material 5).

As action plans affected both attention and the ultimate response, one might wonder whether the link between attention and the ultimate response was induced by action plans as a "common cause". To address this possibility, all analyses using dwell times to predict responses included the required action as a regressor. Furthermore, we obtained causal evidence for an effect of attention on the ultimate response in a separate online study in which we manipulated attention. In this study, action cues were presented simultaneously with stakes, but located in close spatial proximity to either the reward or the punishment stakes. We reasoned that the stakes closer to the action cue would receive more attention. Indeed, we observed that action cues located closer to reward (instead of punishment) stakes resulted in more and faster Go responses. This additional dataset corroborates a causal effect of attention on the ultimate choice. For details, see Supplemental Material 6.
In sum, we found evidence in both samples that dwell time on rewards/ punishments drove responses towards Go/ NoGo and speeded/ slowed responses, respectively, such that attention determined the eventual strength of Pavlovian biases. Tentative evidence suggested that effects of stake magnitudes and dwell times were highly similar.

Stake magnitude and attentional effects relate differently to performance

Lastly, given that both stake magnitudes and dwell times affected responses and RTs in a highly similar way, we asked whether these effects also had similar consequences for participants' overall performance. Crucially, stakes were controlled by the experimental protocol and were therefore unrelated to the required response on each trial. In contrast, attention was under the control of the participant. If participants fixated reward or punishment cues in line with their action goals and then let attention guide their eventual response, strong attention effects could putatively improve their performance. We performed exploratory analyses testing whether effects of stake magnitudes and dwell times on responding were related to accuracy across participants.

The effect of stake difference on responses correlated significantly negatively with accuracy, […] .001 (Fig. 4B). Effects were not exclusively driven by reward or punishment stakes/ dwell times, but by both (in opposite directions, respectively; see Supplemental Material 4). We excluded two simpler explanations of the association between the attentional effect and task accuracy: First, this association was not driven by more accurate participants providing higher-quality eye-tracking data (see Supplemental Material 7).
Second, accuracy was not linked to a stronger focus on reward information (i.e., more first fixations to rewards or longer attention to rewards); if anything, more accurate participants showed a more variable gaze pattern, which supports the idea that these participants could base their responses on context-appropriate gaze patterns (see Supplemental Material 4).

In sum, although correlational, these results suggest that strong attentional effects might facilitate performance, while strong stake magnitude effects impair it. Based on these analyses, stake magnitude and attentional effects appear to be dissociable.

Discussion
We report evidence from two independent samples showing that instrumental action plans steer attention towards rewards and punishments and in this way shape the input to the Pavlovian control system, triggering responses in line with those action plans. These results shed new light on the possible function of Pavlovian control. In contrast to current theories, we suggest that these biases have an important role beyond providing reasonable response defaults in novel or seemingly uncontrollable environments: Crucially, Pavlovian control can additionally support instrumental control for efficient and robust action execution. In a novel task, participants successfully learned to perform Go and NoGo actions to various cues. Their responses and reaction times were biased by task-irrelevant information about potential reward/ punishment outcomes (stakes), similarly to previously reported Pavlovian biases. Most crucially, we found that participants aligned their attention to these stakes with their action plans: they paid more attention to reward stakes when they had to perform a Go action, and relatively more attention to punishment stakes when they had to perform a NoGo action. In turn, attention to these stakes biased ultimate responses, such that more attention to rewards increased the frequency and speed of Go responding. Exploratory between-subjects analyses showed that stronger attentional effects on choice were associated with higher performance, hinting at the adaptive nature of using attention to elicit an automatic response. In sum, these results support the notion that humans can adaptively direct attention to reward and punishment information to selectively elicit Pavlovian biases in line with their action plans.
[…] Here, we suggest that a strong Pavlovian system can be adaptive, even in well-known environments, when it is actively brought into alignment with the goals of other (instrumental) systems. Pavlovian and instrumental control need not operate in a strictly parallel fashion and merely interact at the output stage. Instead, we show that instrumental control can determine the input to Pavlovian control by selectively steering attention to (potentially unrelated) reward or punishment information. In this way, it sets the Pavlovian system on a "ballistic" track that will eventually lead to the intended response. Having such an auxiliary mechanism that triggers the intended response might be particularly adaptive in real-life contexts in which the implementation of actions unfolds over time and is prone to interruption by distractors. By "aligning" Pavlovian with instrumental control, action selection becomes more robust against interference. Such a facilitatory effect of Pavlovian control is in line with our finding of better performance in participants with stronger attentional shaping of responses.

Our findings shed new light on the potential use of simple, "fast-and-frugal" systems in decision-making, motor control, and attention. These fields distinguish slow, more computationally demanding, but at the same time more flexible and "accurate" strategies from faster, less demanding, but inflexible and frequently incorrect strategies (Balleine & Dickinson, 1998 […]).

[…] There are a few important considerations when generalizing our findings to real-world situations. First, the possible outcomes of a choice are often not explicitly presented to an agent. Rather, agents must select, among many potentially relevant pieces of information, what they deem important.
Our task tried to mimic such situations by allowing agents to freely choose how much to attend to information about the rewards and punishments at stake. Still, attention allocation differed from "naturalistic" free-viewing settings in two important ways. Participants were not completely free to attend to the stakes, but were incentivized to do so by the secondary catch task. Furthermore, only two pieces of potential information (exemplary of positive and negative aspects of the situation) were presented, which is a drastic simplification of our information-dense environment. Future extensions of this research should provide participants with a larger set of information to select from, allowing them complete freedom to seek out any information during action preparation.
Second, in real-life situations as well as in this task, people might initiate an action plan but then change their mind. We only had access to participants' ultimate responses, which does not allow us to disentangle situations in which they maintained a determined action plan throughout the trial from situations in which action plans were changed based on reward/ punishment information. Neuroimaging techniques with high temporal resolution, such as EEG and MEG, could shed light on the dynamic interactions between motor processes and how these change as a function of attentional focus.
Third and finally, exploratory analyses suggested that participants whose ultimate responses relied more strongly on attentional inputs showed higher performance. This result corroborates the postulated adaptive nature of a strong Pavlovian system that can be harnessed by instrumental systems. In contrast, the degree to which responses were shaped by stake magnitudes (i.e., larger magnitudes resulting in stronger Pavlovian biases) was associated with lower performance.
This (at first perhaps surprising) dissociation likely arose from our task design, in which stake magnitudes were orthogonal to action requirements. When participants performed substantially above chance, stake magnitudes had a greater potential to disturb action selection on "incongruent" trials (where the required action and the action triggered by the net stake difference mismatched) than to facilitate it on "congruent" trials. Because participants who perform well already respond correctly on most congruent trials, the bias has little room to improve performance there, whereas it can turn otherwise correct responses into errors on incongruent trials. In contrast, in many real-world contexts, it is adaptive to take into account the size of available rewards or punishments when choosing whether and how vigorously to respond.
Still, even if stake magnitudes and attention to stakes both contribute meaningfully to choices in real-world settings, it is noteworthy that they had different consequences for performance in our task, suggestive of dissociable behavioral phenotypes. While relying on stake magnitudes might be linked to "sign-tracking" behavior previously observed in animals and humans (Flagel et al. […]
In sum, our results suggest a broadening of the current view of Pavlovian control: In addition to providing sensible "default" actions in novel or uncontrollable environments, a strong Pavlovian system can be adaptive even in well-known environments when its robust, almost "ballistic" nature is recruited to ensure that an action plan is implemented even in the face of distraction. […] humans use attention to reward/ punishment cues to create a "Win"/ "Avoid" situation that helps them pursue their action goals. This perspective highlights that instrumental and Pavlovian control might more often work in concert rather than oppose each other.
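The distinction between "congruent" and "incongruent" trials discussed above can be made concrete with a short sketch. It assumes, for illustration only, that a positive net stake difference (reward minus punishment stake) favors Go, a negative difference favors NoGo, and a difference of zero favors neither; the function names and the "neutral" label are hypothetical and not taken from the authors' analysis code.

```python
from typing import Optional


def pavlovian_action(reward_stake: float, punishment_stake: float) -> Optional[str]:
    """Action favored by the net stake difference (reward minus punishment).

    Assumption: a positive difference favors "Go", a negative one "NoGo",
    and a difference of zero favors neither (returns None).
    """
    difference = reward_stake - punishment_stake
    if difference > 0:
        return "Go"
    if difference < 0:
        return "NoGo"
    return None


def congruency(required_action: str, reward_stake: float, punishment_stake: float) -> str:
    """Label a trial "congruent", "incongruent", or "neutral" (hypothetical labels)."""
    triggered = pavlovian_action(reward_stake, punishment_stake)
    if triggered is None:
        return "neutral"
    return "congruent" if triggered == required_action else "incongruent"


if __name__ == "__main__":
    # Hypothetical trials: (required action, reward stake, punishment stake)
    trials = [
        ("Go", 80, 20),    # reward stake dominates, Go required
        ("NoGo", 80, 20),  # reward stake dominates, NoGo required
        ("NoGo", 15, 70),  # punishment stake dominates, NoGo required
        ("Go", 50, 50),    # equal stakes
    ]
    for required, reward, punishment in trials:
        print(required, reward, punishment, congruency(required, reward, punishment))
    # Prints: congruent, incongruent, congruent, neutral (one per line)
```

Assuming balanced Go/ NoGo requirements and a stake-difference distribution symmetric around zero, an orthogonal design makes roughly half of the non-neutral trials incongruent, which is one way to see why reliance on stake magnitudes offers little benefit to a participant who is already accurate.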

Constraints on generality
Pavlovian biases might be a universal phenomenon shared by humans and many animal species. They have been described across the animal realm, suggesting a genetic basis shared among humans and other animals and/or a "mandatory" acquisition very early in life across diverse environments. While there are considerable individual differences in the strength of these biases (as described in this manuscript as well as in previous work), the direction of their effects both within and across species is highly consistent, with reward prospect invigorating responding and threat of punishment inhibiting it. Systematically inverted biases have never been observed. In contrast, the strategic attentional recruitment of these biases might have no similar "hard-wired" basis, and such a strategy might be acquired by different individuals to different degrees. We speculate that, similar to the biases themselves, the direction of this strategic effect is consistent across individuals; the existence of an "inverted" strategy is highly implausible. The authors would like to highlight that the studies reported in this manuscript were conducted in English and that a significant portion of the participants were not Dutch natives (although this was not systematically assessed), which tentatively suggests that the strategic recruitment of biases occurs independently of the local culture in which this research took place.
[…] 7.31, p = .006; Fig. S03D). Note that RTs are only available for Go responses; hence, the amount of data (and resulting statistical power) is lower than for the Go/ NoGo response data.
In conclusion, the effects of stake magnitude on Pavlovian biases reported in the main manuscript were driven by variations in both the reward and the punishment stake.
These effects resemble previously reported Pavlovian biases, but in this study they emerged in a graded fashion, i.e., more and faster Go responding the larger the reward stake, and less and slower Go responding the larger the punishment stake.
In addition, we tested whether the effect of stake difference on responses (i.e., the Pavlovian bias) became weaker over time. For this purpose, we used mixed-effects logistic regression models including stake difference, time, and their interaction. As time, we either used a) trial number across […] .455). Numerically (but not significantly), the bias got weaker with time, which is to be expected given that people make fewer errors over time, while errors are necessary to detect the presence of a Pavlovian bias. In sum, we found no evidence that the Pavlovian bias vanishes over time.
Of note, in our pre-registration, we mentioned under "exploratory analyses" that we would fit reinforcement-learning drift diffusion models (RL-DDMs) to jointly analyze the effects of stakes/ dwell times on choices and RTs. We decided not to report the results from these models because data simulated from them were markedly different from the empirical data. We suspect that DDMs cannot capture data from this task due to i) the tight response deadline (600 ms), which led to overall fast (but frequently incorrect) responses while preventing late responses, and ii) the absence of RTs for the […] the parameters (especially the starting point bias term). Lastly, enforcing a strict response threshold is not possible in the DDM framework. Evidence accumulation frameworks in which the response thresholds decrease and eventually reach zero at the response deadline (i.e., collapsing bounds) might be able to accommodate such data, but likelihood functions for such models are not readily available.
We encourage other researchers to reanalyze these data with more suitable modeling frameworks that might become available in the future.
Figure SI03. Effect of stake magnitudes on responses and reaction times. A higher reward stake magnitude led to a higher proportion of Go responses (A; significant in both studies), while a higher punishment stake magnitude led to a lower proportion of Go responses (B; only significant in Study 2). Similarly, a higher reward stake magnitude tended to speed up reaction times (C; significant only in Study 1), while a higher punishment stake magnitude tended to slow down reaction times (D; significant in both studies).
[…] and dwell times exerted such highly similar effects, one might expect them to operate through the same underlying mechanism. One consequence of such a shared architecture is that the effects might influence each other, predicting an interaction effect. We hence performed exploratory analyses testing for such an interaction effect, i.e., whether effects of longer vs. shorter attention to the reward (punishment) stake were amplified when participants saw many vs. few potential rewards (punishments), or vice versa. The interaction between the stake difference and the dwell time […]
In conclusion, longer dwell time on rewards led to more and faster responding, while longer dwell time on punishments led to less and slower responding. However, effects on reaction times were only significant in the punishment domain. We did not find evidence for an interaction between stake magnitudes and dwell times, leaving it open whether both effects rely on the same underlying mechanism.
In the results from our eye-tracking studies reported in the main text, we observed an effect of (manipulated) action requirements on eye-gaze (first fixation and dwell time) and an effect of (measured) eye-gaze on the ultimate response.
Given that both action requirements and eye-gaze predicted the ultimate response, one might wonder whether the link between eye-gaze and the ultimate response was spurious, induced by action plans as a "common cause" (an instance of the "third-variable problem"). Note that all analyses regressing responses onto dwell time reported in the main text controlled for the action plans. In addition, we tested for a causal effect of attention to reward/ punishment information on responses in a separate online study in which we manipulated attention. This study was performed as a thesis project for Bachelor's students at the beginning of the COVID-19 pandemic.
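Schematically, the mixed-effects logistic regressions described in this supplement (the stake difference × time model and the dwell-time analyses that control for action plans) can be written as below. This is a sketch under stated assumptions: we assume a logit link and by-participant random intercepts only (random slopes omitted for brevity), and the symbols are illustrative, with trial $i$, participant $j$, reward and punishment stakes $S^{\mathrm{rew}}$ and $S^{\mathrm{pun}}$, time $t$ (e.g., trial number), required action $A$, and reward-minus-punishment dwell-time difference $\Delta D$. The exact predictor coding and random-effects structure of the reported analyses may differ.

```latex
\begin{align*}
\Delta S_{ij} &= S^{\mathrm{rew}}_{ij} - S^{\mathrm{pun}}_{ij}\\
\operatorname{logit} P(\mathrm{Go}_{ij}) &= \beta_0 + u_{0j} + \beta_1\,\Delta S_{ij} + \beta_2\,t_{ij} + \beta_3\,\Delta S_{ij}\,t_{ij}\\
\operatorname{logit} P(\mathrm{Go}_{ij}) &= \gamma_0 + v_{0j} + \gamma_1\,A_{ij} + \gamma_2\,\Delta D_{ij}
\end{align*}
```

In the first model, a Pavlovian bias that weakens over time would appear as an interaction coefficient $\beta_3$ opposite in sign to $\beta_1$. In the second model, $\gamma_2$ estimates the dwell-time effect with the required action held constant, which is what controlling for the action plans amounts to.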
[…] participant logistic regression with response as dependent and required action as independent variable, with p < .05 as cut-off; three participants). Both ways led to identical conclusions.
We recruited participants via the SONA Radboud Research Participation System of Radboud University. Participants were required to be at least 18 years old, understand English at a sufficient level (self-reported), not be color-blind, perform the experiment on a PC with a keyboard (no phones or tablets), and complete the study within a maximum of 90 minutes (i.e., 1.5 times the expected completion time). The experiment was administered via the Gorilla platform (Anwyl-Irvine, Dalmaijer, Hodges, & Evershed, 2020). After providing informed consent and demographic information on age, gender, and handedness, participants completed the "reversed-dot-probe" version of the Motivational Go/ NoGo Task for 30-40 minutes (see below). Afterwards, they filled out the brief (13-item) version of the Self-Control Scale (SCS; Tangney, Baumeister, & Boone, 2004) and the Behavioral Activation/ Behavioral Inhibition System Scales (BIS/BAS; Carver & White, 1994). Additionally, participants completed two vignettes (measuring omission bias) in which they rated the experienced regret and responsibility of two football coaches who won/ lost a match, afterwards changed/ kept their match plan, and then lost the next game (adapted from Zeelenberg, van den Bos, van Dijk, & Pieters, 2002). Finally, participants completed a debriefing questionnaire asking them to a) guess the hypotheses of the experiment, b) report any (non-instructed) strategies they used, and c) guess whether the additional instructions helped them perform the task better. Participants were then debriefed about the purposes of the study. As compensation, participants received 1 hour of course credit.
Furthermore, participants with at least 60% accuracy in the Go/ NoGo task received tickets (proportional to performance) for a lottery featuring two 20€ gift card vouchers.
Participants performed an adapted version of the Motivational Go/ NoGo learning task termed the "reverse-dot-probe version" (Fig. S06A). On each trial, they first saw how many points they could win for a correct response (printed in green font with a "+") or lose for an incorrect response (printed in red font with a "-"), termed "stakes". Stakes varied between 10 and 90 points drawn from a uniform […]
After the second block, participants received additional instructions that explicitly encouraged them to look at the reward stake in case they planned to perform a Go response, and look at the […] regression model for reaction times) featuring the regressors required response (Go/ NoGo), cue position (on the reward/ punishment side), and instructions (before/ after), as well as all possible interactions. As mentioned in the pre-registration, we also report the interaction between required action and instructions as well as the three-way interaction between required action, cue position, and instructions.
Furthermore, we specified two exploratory analyses in our pre-registration. First, we tested whether the difference in stakes (reward minus punishment stake) affected participants' responses and reaction times, expecting more positive differences to lead to more and faster Go responses. For this purpose, we fitted a model with stake difference as the sole regressor. Second, we calculated participants' mean scores on the Self-Control Scale (SCS), the BIS and BAS scales, and the regret judgements, and tested whether these scores modulated participants' cue position effect. For this purpose, we fitted a separate model for each score featuring cue position, the respective score, and their interaction.
Figure SI06.
Task design and results from the online study manipulating attention to reward and punishment information. A. Task design. On each trial, participants saw how many points they could win for correct responses or lose for incorrect responses ("stakes"). After 500 ms, a Go/ NoGo action cue was displayed next to either the reward or the punishment stake, nudging participants to direct more attention to the respective stake.
Participants learned by trial and error whether a cue required a Go or NoGo response. Outcomes were delivered probabilistically (86% feedback validity). On catch trials, participants indicated which other stake (i.e., the one they did not receive as an outcome) they had seen before. B. Proportion of Go responses as a function of action requirement and cue position. Participants performed significantly more Go responses to Go cues than to NoGo cues and when cues were presented next to the reward stake compared to the punishment stake.
C. Reaction times as a function of action requirement and cue position. Participants showed significantly faster responses to Go cues than NoGo cues and when cues were presented next to the reward stake compared to the punishment stake. D. Proportion of Go responses as a function of stake difference (reward minus punishment stake). As net stakes became more positive, participants performed significantly more Go responses. E.
Reaction times as a function of stake difference (reward minus punishment stake). As net stakes became more positive, participants became faster, but this effect was not significant.
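The probabilistic feedback described in panel A of Figure SI06 (86% feedback validity) can be sketched as follows. We assume, for illustration, that valid feedback delivers the reward stake after a correct response and the punishment stake after an incorrect one, while invalid feedback (the remaining 14% of trials) inverts this mapping; the function name and this exact outcome mapping are our assumptions, not the task code.

```python
import random


def deliver_outcome(correct: bool, reward_stake: int, punishment_stake: int,
                    rng: random.Random, validity: float = 0.86) -> int:
    """Return the signed point outcome of one trial (hypothetical helper).

    Assumption: with probability `validity`, feedback is valid (reward stake
    for a correct response, punishment stake for an incorrect one); otherwise
    the mapping is inverted.
    """
    valid = rng.random() < validity
    rewarded = correct if valid else not correct
    return reward_stake if rewarded else -punishment_stake


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed for reproducibility
    n_trials = 10_000
    outcomes = [deliver_outcome(True, 50, 30, rng) for _ in range(n_trials)]
    share_rewarded = sum(o > 0 for o in outcomes) / n_trials
    # Should land close to the nominal validity of 86%
    print(f"Correct responses rewarded on {share_rewarded:.1%} of trials")
```

Passing an explicit `random.Random` instance keeps simulations reproducible and lets each simulated participant use an independent stream.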

Overall, participants learned the Go/ NoGo task (% correct: M = 79.0, SD = 12.0, range 52.7-94.2), performing significantly more Go responses to Go cues than to NoGo cues (main effect of required action: b = 1.60, 95% CI [1.33, 1.88], χ²(1) = 54.53, p < .001). Three participants did not perform significantly above chance (per-participant logistic regression with response as dependent and required response as independent variable, which is significant for accuracy levels of at least 56%). In line with our pre-registration, we report results with and without these participants. Performance in the catch task was above chance (3 response options imply a chance level of 33.3%; a one-sided binomial test based on 12 trials is significant for 63% accuracy and higher) in only 25 out of 34 participants. Also, group-level performance was hardly above chance (% correct: M = 66.4, SD = 18.6, range 25.0-81.7), likely reflecting that this task was very demanding.
[…] and dwell times on responses. Participants' mean accuracy correlated significantly negatively with their respective effect of stake differences on responses (A), also when two outliers were removed (B), which was driven both by a negative correlation with the effect of the reward stake (C; note that these effects tend to be positive) and by a positive correlation with the effect of the punishment stake (D; note that these effects tend to be negative, i.e., participants with stronger negative effects showed worse performance). These correlations suggest that participants with strong stake difference effects showed poor performance. The opposite pattern occurred for the effect of dwell time on responses: This effect correlated significantly positively with accuracy, both for the difference between reward and punishment dwell times (E) and for the relative dwell time (ratio) on rewards (F).
Again, this effect was driven by reward dwell times (G) rather than punishment dwell times (H).
These correlations suggest that participants with strong attention effects showed high performance.
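The chance-level cut-offs used for the online study above can be computed with the Python standard library alone. The sketch makes two assumptions that the text does not state: it uses exact one-sided binomial tail probabilities for the catch task, and it implements the per-participant logistic regression of response on required action as a likelihood-ratio test, which for a single binary predictor is equivalent to a G-test on the 2 × 2 table of required action by response. The reported analyses may have used a different test (e.g., a Wald test), and the resulting percentage thresholds depend on how many trials actually enter each test; all function names are hypothetical.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """Exact one-sided tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_count(n: int, p: float, alpha: float = 0.05) -> int:
    """Smallest number of correct trials k (out of n) with P(X >= k) < alpha."""
    for k in range(n + 1):
        if binomial_sf(k, n, p) < alpha:
            return k
    return n + 1  # no achievable count is significant


def lrt_response_on_action(counts):
    """Likelihood-ratio (G) test of response ~ required action for one participant.

    counts = [[Go responses to Go cues,   NoGo responses to Go cues],
              [Go responses to NoGo cues, NoGo responses to NoGo cues]]
    With a single binary predictor, the deviance difference between the logistic
    model with and without that predictor equals the G statistic of this table.
    Returns (G, p) for a chi-square distribution with 1 degree of freedom.
    """
    total = sum(sum(row) for row in counts)
    col_totals = [counts[0][c] + counts[1][c] for c in range(2)]
    g = 0.0
    for row in counts:
        row_total = sum(row)
        for c, observed in enumerate(row):
            if observed > 0:  # empty cells contribute zero
                expected = row_total * col_totals[c] / total
                g += 2 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Chi-square(1) survival function: P(X > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value


if __name__ == "__main__":
    k = min_significant_count(12, 1 / 3)
    print(f"Catch task, 12 trials: {k} of 12 correct ({k / 12:.1%}) needed")
    # -> Catch task, 12 trials: 8 of 12 correct (66.7%) needed
    g, p = lrt_response_on_action([[40, 20], [20, 40]])  # hypothetical counts
    print(f"G = {g:.2f}, p = {p:.4f}")
```

Because accuracy can only take values k/n, the smallest significant percentage jumps with the number of trials; with exactly 12 catch trials and a chance level of 1/3, the first significant count under these assumptions is 8 of 12.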