Forward planning driven by context-dependent conflict processing in anterior cingulate cortex

Forward planning is often essential to achieve goals over extended time periods. However, forward planning is typically computationally costly for the brain and should only be employed when necessary. The explicit calculation of how necessary forward planning will be is in itself computationally costly. We therefore assumed that the brain generates a mapping from a particular situation to a proxy of planning value, to make fast decisions about whether or not to use forward planning. Moreover, since the state space of real-world decision problems can be large, we hypothesized that such a mapping relies on mechanisms that generalize sets of situations based on shared demand for planning. We tested this hypothesis in an fMRI study using a novel complex sequential task. Our results indicate that participants abstracted from the set of task features to more generalized control contexts that govern the balancing between forward planning and a simple response strategy. Strikingly, we found that correlations of conflict with response time and with activity in the dorsal anterior cingulate cortex were dependent on context. This context-dependency might reflect that the cognitive control system draws on category-based cognition, harnessing regularities in control demand across task space to generate control contexts that help reduce the complexity of control allocation decisions.


Many decisions have far-reaching consequences for the future, as they affect both internal bodily and external environmental states, in turn often conditioning potential future actions. Therefore, to achieve any long-term goal, people have to consider the future in some way. This can be achieved by planning multiple steps into the future to estimate the effects of potential action sequences (K. J. Miller, …).

In multi-step tasks, however, an intricate computational problem becomes apparent. How can the brain control the use of forward planning in a way that maximizes long-term benefits, without having to compute these benefits by forward planning beforehand? One solution to this paradox might be for people to generate a mapping from a particular situation to a proxy of the value of planning that allows them to quickly access planning values later (D. G. Lee et al., 2021; Lieder et al., 2018). Moreover, because the state space of real-world decision problems can be very large, it is unlikely that people learn a value for every possible combination of states. Rather, they might use certain task features to generalize clusters of states into particular contexts for which values are learned (Lieder et al., 2018).

Here, we tested this principle in an fMRI study using a novel sequential decision-making task. In the task, participants had to plan ahead to earn points by accepting offers while managing a limited energy budget. Importantly, we designed the task such that situations with different levels of demand for planning occurred. With 448 possible combinations of task features and four different offers participants could choose from, our task was quite complex. We therefore assumed that participants used a simplified representation of planning value during control allocation decisions.
An initial analysis of choice frequencies showed that participants used a repetitive choice pattern for two of the offers, while responses were more balanced for the two other offers. From these choice patterns, we hypothesized that participants generated two different groups of offer-dependent representations of planning value. We refer to these two groups as control contexts (or contexts for short), with one context coding for a high a priori need for planning and the other coding for a low a priori need for planning. To further test the control context hypothesis, we analysed response times and fMRI data using a specific conflict measure as a proxy for the value of forward planning. We found that correlations of conflict with response time and with BOLD activity in the dACC were dependent on context. Our results provide initial evidence for a mechanism by which the brain harnesses regularities in the value of planning across task space to construct control contexts that facilitate efficient allocation of control in complex tasks. Future research should further develop and confirm these initial findings by testing formal models of arbitration which incorporate structured representations of planning value.

Participants

Forty participants took part in the experiment (22 women, mean age = 24.4, SD = 4.6). Reimbursement was a fixed amount of 14€ or class credit, plus a performance-dependent bonus (mean bonus = 6.62€, SD = 0.39). The bonus was calculated as a linear function of the accumulated points in the experiment. The study was approved by the Institutional Review Board of the Technische Universität Dresden and conducted in accordance with the ethical standards of the Declaration of Helsinki. All participants were informed about the purpose and the procedure of the study and gave written informed consent prior to the experiment. All participants had normal or corrected-to-normal vision.

Data availability

Data and analysis code used in this article are publicly available at https://doi.org/10.5281/zenodo.5112965.

The task gave participants on every trial an opportunity to plan ahead multiple steps into the future. Importantly, the task featured both situations in which planning was crucial to decide between the accept and reject option, as well as situations that could be sufficiently solved by a simple heuristic.

In detail, the temporal structure of the task comprised three levels: the single trial, the current segment, and the segment pair of the current and next segment. One segment consisted of four trials. In a single trial, participants could either accept or reject an offer (selected by either a left or right button press), where accepting the offer increased points by an indicated amount but decreased energy by one or two units, depending on condition (see below). Rejecting always increased energy by one. There were four equally probable offers, displayed as one, two, three or four trophies in the middle of the screen. Accepting an offer increased points by the respective number of trophies, thereby advancing the yellow point bar at the top of the screen (Fig 1A).
The energy cost of accepting varied between one, in low-cost segments (LC), and two, in high-cost segments (HC). The energy budget, with a minimum of zero and a maximum of six, was displayed as a blue bar at the bottom of the screen. Initial energy in the first trial of the experiment was three. If participants had maximum energy and chose to reject, no further energy was added, and the next trial started. If participants accepted an offer with too little energy, no points were awarded, a warning was displayed, and the next trial started. Participants were informed about the energy cost of the current and the future segment by two symbols in the bottom right corner. The left symbol informed about both the energy cost of the current segment and the current trial number in the segment. The right symbol informed about the energy cost of the future segment. One flash indicated a low-cost segment and two flashes a high-cost segment (Fig 1A).
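The trial dynamics described above can be summarized in a short sketch (our own illustration; the function name and signature are assumptions, not taken from the original task code):

```python
def step(energy, offer, action, cost, max_energy=6):
    """One trial of the task, following the rules described above.

    energy: current energy (0..6), offer: trophies (1..4),
    action: 'accept' or 'reject', cost: 1 (LC segment) or 2 (HC segment).
    Returns (points_gained, new_energy).
    """
    if action == 'accept':
        if energy >= cost:
            return offer, energy - cost       # successful accept
        return 0, energy                      # too little energy: warning, no change
    return 0, min(energy + 1, max_energy)     # reject replenishes one unit
```

For example, accepting offer 4 with three energy units in an HC segment yields four points and leaves one energy unit.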

The experiment included a training session outside the MRI scanner (for task instructions see S1 Text) and three sessions inside. The training session comprised 144 trials across 36 segments, with nine repetitions for each of the four possible transitions (Fig 1C). The fMRI experiment comprised 240 trials across 60 segments, with 15 repetitions per segment transition. On average, participants took 40 minutes to complete the fMRI experiment. The fMRI experiment was split into three sessions, between which participants rested for two to three minutes without leaving the scanner. The sequence of segments and offers was pseudorandomized and identical for all participants. Segment sequences were generated such that each of the four segment transitions (Fig 1C) was sampled equally often. Similarly, offer sequences were generated such that the frequency of offers was balanced within segment transitions (raw behavioural data with all details about the offer sequences can be found at https://doi.org/10.5281/zenodo.5112965).

The timing of stimulus events in the fMRI experiment was as follows (see also Fig 1A): each trial started with a fixation cross (0.5 seconds) in the middle of the screen to prepare participants for the upcoming decision. In the response phase, the offer appeared, and the choice options were surrounded by a frame to indicate that a decision was required. If participants did not respond within 5 seconds, they were timed out with a warning message, and the next trial began. In the selection phase (1-5 seconds, uniformly sampled and rounded to the first decimal), the frame surrounding the unchosen option disappeared. In the feedback phase (1 second, fixed), energy or point changes were displayed, and the frame surrounding the chosen option turned green (or red if the energy budget was too low for accepting).
In the intertrial interval (2-5 seconds, uniformly sampled and rounded to the first decimal), choice options were unframed and the offer disappeared (Fig 1A).

Thirdly, we also considered a hybrid of these two extremes, where participants may use a mixture of forward planning and the simple strategy ('hybrid strategy'). Note that our modelling approach relies on logistic regression and does not per se describe a process of how the brain may balance between forward planning and a simple strategy. Rather, the computational approach enables us to test for evidence that participants rely on (i) forward planning, (ii) a simple strategy, or (iii) a mixture of these two extremes.

Planning strategy model (PM). Clearly, if participants used a greedy strategy of accepting all offers in the first few trials of the task, they would quickly run out of energy and might not be able to accept better offers in future trials. Therefore, to maximize the accumulated points, one has to plan ahead, anticipating future actions, energy costs and reward opportunities. To implement such a planning strategy, we assumed a finite horizon until the end of the next future segment, since participants were only explicitly informed about the energy costs of the current and the next future segment. As each segment had four trials, this resulted in a horizon of at most 8 trials and at least 5 trials, i.e. when a participant has to select the decision for the fourth trial of the current segment. To derive a policy that maximizes expected reward over this horizon, we formalised our task as a Markov Decision Process (MDP).

Since participants successfully completed a training session and received detailed task instructions (S1 Text) prior to the main experiment, we assume that participants understood the rules of the task.
In the model, this knowledge is represented by the transition probability P(s' = (o', e') | s = (o, e), a), which is the probability to transition to a new state s' given the current state s (consisting of the offer value o and the current energy e) and the selected action a.
where U(o') is the discrete uniform distribution over the possible offer values, c_c is the energy cost in the current segment (1 or 2) and c_f is the energy cost in the future segment (1 or 2). We model the different segment transitions LC→LC, LC→HC, HC→LC and HC→HC as separate MDPs, substituting the respective values for c_c and c_f.

Immediate rewards, corresponding to the offer value, are generated upon successful acceptance. Formally, the reward function equals the offer value for a successful accept and zero otherwise.

To determine the optimal policy that maximizes the expected reward over the current and future segment, the PM uses backward induction. For a given state, the decision variable DV is the difference between the optimal state-action values for accepting and rejecting: positive values of DV indicate a greater long-term expected reward for accepting and negative values of DV indicate a greater long-term expected reward for rejecting.

Using a logistic regression approach, we define the probability to accept as a logistic function of the linear predictor x = β·DV + α_basic·I_basic + α_max·I_max + α_low_LC·I_low_LC + α_low_HC·I_low_HC. The planning weight β captures the influence of DV on choice behaviour. To allow for systematic deviations from behaviour prescribed by DV, we also included preference parameters α. These preference parameters simply model a participant's tendency to generally choose the accept (or the reject) option. The parameter α_basic captures the preference in trials where participants had enough energy to accept and did not reach the maximum energy level (termed basic trials). We implemented this with a binary indicator variable I_basic that equals one if the current trial was basic and zero if not. To model behaviour in trials with maximum or insufficient energy, we also included three bias parameters α_max, α_low_LC and α_low_HC. The first of these bias parameters models the special case when participants had to choose on a trial with full energy. We expect this bias parameter to be generally positive, because a further reject choice would not increase the energy further. The other two bias parameters, α_low_LC and α_low_HC, we expect to be generally negative, i.e.
participants will reject an offer if they have insufficient energy. Subsets of these low- and max-energy trials are again selected by appropriate binary indicator variables.

Simple strategy model (SM). Since forward planning or other elaborate anticipatory schemes might incur considerable computational costs, participants may use a simple strategy, where action selection is based only on offer value. We define the decision variable for the SM as the offer value centred across the four offer values 1 to 4. The probability to accept is defined in the same way as for the PM, where now the linear predictor contains only this centred offer value. Here, the parameter β_o captures the influence of offer value on choice behaviour.

Hybrid strategy model (HM). To cover the case that participants may choose based on both expected long-term values and offer-specific preferences, we use a hybrid strategy as a mixture of both planning and simple strategy. Such a hybrid strategy enables the decision maker to still use forward planning but mix this decision tendency with a simple strategy for each of the four offers. Note that we do not explicitly model arbitration and cannot identify which strategy dominates at any given time. However, the model enables us to test whether there is a mix of a simple and a planning strategy across trials. As in the PM, DV is defined as the difference between the optimal state-action values. The probability to accept is defined as a logistic function of x, where now

x = β·DV + α1·I1 + α2·I2 + α3·I3 + α4·I4 + α_max·I_max + α_low_LC·I_low_LC + α_low_HC·I_low_HC

In addition to the planning weight β and the three bias parameters for extreme energy cases, the HM adds, as compared to the PM, four offer-specific preference parameters (α1, α2, α3, α4). The indicator variables (I1, I2, I3, I4) equal one if a specific offer was presented in a basic trial (i.e. energy was neither at maximum nor too low to accept).
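To illustrate these computations, here is a minimal sketch in plain Python (our own illustration, not the authors' code; the state representation, names, and the collapsing of the extreme-energy bias terms are assumptions): backward induction yields the state-action values, their difference gives DV, and a logistic function of the weighted predictor gives the HM acceptance probability.

```python
import math

OFFERS = [1, 2, 3, 4]
MAX_E = 6

def backward_induction(costs):
    """Finite-horizon backward induction over (energy, offer) states.

    `costs` lists the energy cost of accepting for each remaining trial,
    e.g. [2]*4 + [1]*4 for an HC->LC segment pair (horizon of 8 trials).
    Returns Q[t][(energy, offer)] = (Q_reject, Q_accept).
    """
    H = len(costs)
    # V[t][e]: expected future reward at trial t, before the offer is shown
    V = [[0.0] * (MAX_E + 1) for _ in range(H + 1)]
    Q = [dict() for _ in range(H)]
    for t in range(H - 1, -1, -1):
        c = costs[t]
        for e in range(MAX_E + 1):
            for o in OFFERS:
                # accept: gain the offer if affordable, else nothing changes
                q_acc = o + V[t + 1][e - c] if e >= c else V[t + 1][e]
                # reject: replenish one energy unit (capped at MAX_E)
                q_rej = V[t + 1][min(e + 1, MAX_E)]
                Q[t][(e, o)] = (q_rej, q_acc)
            # offers are equiprobable: average the optimal action values
            V[t][e] = sum(max(Q[t][(e, o)]) for o in OFFERS) / len(OFFERS)
    return Q

def p_accept_hm(dv, offer, beta, alpha):
    """HM acceptance probability for a basic trial: sigma(beta*DV + alpha_o),
    with offer-specific preference alpha_o. (The max- and low-energy bias
    terms are omitted here for brevity.)"""
    return 1.0 / (1.0 + math.exp(-(beta * dv + alpha[offer])))

# DV = Q_accept - Q_reject at the first trial of an HC->LC pair, energy = 3
Q = backward_induction([2] * 4 + [1] * 4)
dv = {o: Q[0][(3, o)][1] - Q[0][(3, o)][0] for o in OFFERS}
```

A DV of zero with a neutral offer preference gives an acceptance probability of 0.5; offer-specific preferences shift this probability up or down, mimicking the simple strategy.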
In other words, in contrast to the PM, the four offer-specific preference parameters indicate a relative dependence on the simple strategy. For example, a negative offer-specific parameter will indicate a participant's preference to reject that specific offer.

Model fitting and evaluation

Model fitting. Using a hierarchical Bayesian approach, we jointly estimated both participant- and group-level parameters. For the PM and the SM, the strategy weight and the basic preference parameter were allowed to vary by participant. For the HM, β, α1, α2, α3 and α4 were allowed to vary by participant. The parameters α_max, α_low_LC and α_low_HC were modelled as constant over participants. The participant parameters were drawn from a normal distribution with respective group parameters μ and σ. These group parameters were themselves modelled as draws from weakly informative hyperprior distributions: μ ~ Normal(0, 2) and σ ~ HalfNormal(0, 2). A complete description of the models as Stan code can be found online (https://doi.org/10.5281/zenodo.5112965). We fitted models using Hamiltonian Markov Chain Monte Carlo. For model comparison, we computed the expected log pointwise predictive density (elpd) and its standard error on the deviance scale (−2·elpd) and refer to this quantity as the leave-one-out cross-validation information criterion (LOOIC). Lower values of LOOIC indicate better model fit.

Posterior predictions. To further assess whether the fitted models capture the observed behavioural pattern, we conducted posterior predictive checks using mixed predictive replication for hierarchical models (Gelman et al., 1996). To compute predictive replications, we first sampled the group parameters (μ and σ) from the posterior and then sampled forty normally distributed participant-level parameters from these group parameters.
Replicated accept-reject responses were generated for replicated participants and all trials by sampling from a Bernoulli distribution (Figure 2C).

Conflict and response time analysis

A key quantity for our analysis of response times (RT) and fMRI data was conflict. This corresponds to the similarity between long-term values for accepting and rejecting (see equation 7). If, for a given trial and task state, the action-value difference is small, conflict is large. Conversely, if the action-value difference is large, conflict is small. We consider conflict as a signal of choice difficulty, reflecting the need for elaborate information processing such as planning. We assume that participants do not calculate the conflict directly (which would require planning by itself), but that they have quick and frugal access to a proxy for the conflict (D. G. Lee et al., 2021).

We analysed response times using hierarchical Bayesian linear regression, estimating group- and participant-level parameters simultaneously. We modelled log RT as the linear function

log RT = β0 + βC·C + βI·I + βCI·C·I

where C is conflict (Eq. 16) and I is a binary indicator variable that equals one if the current offer was 2 or 3 (which we call intermediate in the following) and zero if the current offer was 1 or 4 (which we call extreme in the following). This classification into intermediate and extreme offers was based on participants' choice behaviour (Fig 2A). The term C·I models the interaction between offer type and conflict. The participant-level intercept β0 and parameters βC, βI and βCI were normally distributed with group parameters μ and σ. We gave these group parameters weakly informative hyperpriors: μ ~ Normal(0, 10) and σ ~ HalfNormal(0, 10). Models were fit in Stan via PyStan using Hamiltonian Markov Chain Monte Carlo. We obtained 2,000 posterior samples from four chains of length 1,000 (500 warmup).
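The structure of this regression can be sketched non-hierarchically (ordinary least squares standing in for the hierarchical Stan model; the data and coefficient values below are simulated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated predictors: conflict C and offers (1..4)
offers = rng.integers(1, 5, size=500)
C = rng.uniform(0.0, 1.0, size=500)
I = np.isin(offers, [2, 3]).astype(float)   # intermediate-offer indicator

# assumed ground-truth effects: conflict slows responses more on intermediate trials
b0, bC, bI, bCI = -0.5, 0.2, 0.1, 0.3
log_rt = b0 + bC * C + bI * I + bCI * C * I + rng.normal(0, 0.01, size=500)

# design matrix [1, C, I, C*I] and least-squares fit
X = np.column_stack([np.ones_like(C), C, I, C * I])
beta_hat, *_ = np.linalg.lstsq(X, log_rt, rcond=None)

# predictions can be exponentiated back to the RT scale for interpretability
rt_pred = np.exp(X @ beta_hat)
```

A positive interaction coefficient (βCI) corresponds to the hypothesized pattern: the conflict slope on log RT is steeper for intermediate than for extreme offers.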
The potential scale reduction factor on split chains, R̂, was calculated, indicating convergence for all parameters (R̂ ≈ 1). We generated linear predictions of log RT using 2,000 posterior samples of the group-level means of β0, βC, βI and βCI, and exponentiated them back to the original RT scale for better interpretability. The regression lines in Figure 3C correspond to the median across samples and the shaded areas to the 95% interval. All trials that were not timed out (i.e. RT ≤ 5 s) were included in the analysis.

fMRI acquisition and preprocessing

Behavioural results

Choice behaviour. We first identified situations in the task that could be classified as generally difficult or easy based on participants' choice frequencies. An often-repeated choice pattern indicates that a specific situation can be handled by simple response mechanisms, while a mixed response pattern of accept and reject indicates that more elaborate information processing may be required. Analysis of choice frequencies revealed an obvious pattern, showing that participants accepted offer 1 in only a few trials (mean = 1%, SD = 2%) and conversely accepted offer 4 in the majority of trials (mean = 98%, SD = 3%) (Fig 2A). For offers 2 (mean = 14%, SD = 12%) and 3 (mean = 77%, SD = 13%), the choice behaviour was more balanced between accepting and rejecting. To further quantify the balance between accepting and rejecting, we computed the distance between choice frequencies and the 50% chance level and compared these distances across offer values. Distances from chance level were larger for offers 1 and 4 compared to offers 2 and 3 (pairwise Wilcoxon signed-rank tests, p < 0.001). There was no significant difference between offer 1 and offer 4 (Wilcoxon signed-rank test, p = 0.094). We also found that the distance from chance level was greater for offer 2 than for offer 3 (Wilcoxon signed-rank test, p < 0.001).
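The distance-from-chance comparison can be sketched as follows (with simulated per-participant acceptance frequencies, not the actual data):

```python
import numpy as np
from scipy.stats import wilcoxon

def distance_from_chance(accept_freq):
    """Absolute distance of per-participant acceptance frequencies from 50%."""
    return np.abs(np.asarray(accept_freq) - 0.5)

# simulated acceptance frequencies for 40 participants
rng = np.random.default_rng(0)
freq_offer1 = rng.uniform(0.00, 0.05, size=40)   # extreme offer: almost never accepted
freq_offer2 = rng.uniform(0.05, 0.30, size=40)   # intermediate offer: more balanced

d1 = distance_from_chance(freq_offer1)
d2 = distance_from_chance(freq_offer2)

# paired test: are distances from chance larger for offer 1 than for offer 2?
stat, p = wilcoxon(d1, d2)
```

Large distances from the 50% level indicate a stereotyped response pattern, small distances a balanced accept/reject pattern.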

From this pattern, we hypothesised that participants might have treated the choice given an extreme offer 1 or 4 as generally easy and the choice given an intermediate offer 2 or 3 as generally hard. We hypothesized that this categorisation into what we call control contexts predetermines the actual planning investment in a given trial. In the following, we test this hypothesis and provide further insights using model-based analysis of choice behaviour, analysis of response times and analysis of fMRI BOLD signals.

In addition to the offer value, we found, using logistic regression, that participants' choice behaviour was also influenced by other task features (S1 Table).

[…] units, where the current and the future energy cost is 2 (segment pair HC/HC) and offer 3 is presented. A planning agent would reject the offer and replenish its energy reserves in order to be able to accept potentially better offers in the future. In contrast, an agent following a simple strategy, e.g. one who always accepts offers 3 and 4 and rejects offers 1 and 2, would accept the offer 3. We also considered a third alternative, that participants use a mixture of planning and a simple strategy (HM) to achieve a good trade-off between the benefits and costs of the respective strategies depending on the current task situation.

We compared how well the three cognitive computational models fitted participants' behaviour using leave-one-out cross-validation. We found that the HM fitted participants' behaviour best.

We also simulated posterior predictions for the three models and plotted acceptance frequencies across offer values (Fig 2A). Both the HM and the SM closely captured the behavioural pattern of participants, but acceptance frequencies of the PM deviated for offers 1 and 2 and for offers 3 and 4.
These simulations are consistent with participants mixing forward planning with a simple reject-preference for offers 1 and 2 and an accept-preference for offers 3 and 4. As a further informal illustration of why the HM was superior to the SM, we computed the proportion of matches between participant choices and the simulated choices from the fitted models (Fig 2C). While the SM shows high matching rates for offers 1, 2 and 4, the matching rate for offer 3 is decisively lower compared to the HM. Conversely, the PM has considerably lower matching rates than the HM for offers 1, 2 and 4, but achieves a relatively high matching rate for offer 3. These results show that the SM particularly fails to account for participants' choices for offer 3, presumably because participants engage in an increased amount of planning for offer 3 (see Fig 2A, where the mean accept rate for offer 3 is closest to the 50% line among all four offers, i.e. offer 3 does not support a simple action selection strategy).

Parameter estimates of the HM demonstrate both evidence for forward planning and usage of a simple strategy, as quantified by the four offer-specific preferences (Fig 2D); see also Methods. As expected, participants showed a bias to accept in maximum-energy trials and a bias to reject in low-energy trials (see Fig 2D).

We further found evidence that participants who account for the long-term consequences of their actions, e.g. by planning, earn more points. Post-hoc correlation analysis revealed that participants with a larger fitted planning parameter of the winning hybrid model accumulated more points throughout the experiment (r = 0.521, p = 0.001, Fig 3A) and had slower average response times (r = 0.374, p = 0.017, Fig 3B).

Previous research suggested that the brain regulates the use of cognitive control based on the estimated value of control (Shenhav et al., 2013).
Analogously, we assume that the brain uses similar value estimates when deciding about the degree of forward planning during sequential decision-making. Most importantly, we hypothesized that, due to the cost incurred by the computation of control values themselves, the brain uses a context-specific prior assumption about the general need for planning to minimize the "metacosts" of control decisions. To further test this hypothesis, we analysed the relationship between response times, as an indicator of the degree of planning, and planning value (operationalised by a specific conflict measure, see Methods). We expected not only that larger conflicts would generally lead to increased response times, but critically that this increase would be more pronounced for the intermediate offers 2 and 3, possibly reflecting context-specific planning activity driven by a context-specific evaluation of conflict.

Bayesian linear regression indeed showed that conflict was significantly more predictive of log RT […] in frontal areas related to planning and cognitive control. We indeed found significantly greater activity in dACC and right dlPFC (Fig 4A, Table 1, and S1 Table).

We also tested where brain activity was greater during extreme versus intermediate trials (Extreme > Intermediate) and found increased activity in bilateral posterior parietal cortex (PPC), where the cluster in the left hemisphere was significant at the whole-brain corrected level (Fig 4B, Table 1 and S3 Table). Besides its role in sensory attenuation, posterior parietal cortex is also involved in sensorimotor transformations during decision making (Andersen et al., 2009). This suggests that participants' decisions for extreme offers might be related to low-level sensorimotor processes, coupling simple stimulus cues to actions. A network including left ventral striatum (VS), posterior cingulate cortex (PCC) and bilateral amygdala emerged at a lower threshold (see S3 Table). These regions have been shown to encode value information during reward-based choice (Bartra et al., 2013). Higher activation in these areas during extreme trials might indicate an increased salience of offer value information instigating a simple response strategy based on offer-specific preferences. However, this idea requires further research.

Context-dependent conflict processing in dACC.

As a confirmatory analysis of previous findings implicating the dACC in the monitoring of various signals to evaluate the need for additional control (e.g. conflict; Shenhav et al., 2013), we also tested for the effect of conflict averaged across conditions. In accord with this previous research, we found a significant positive correlation with BOLD activity in the dACC (Fig 5A, Table 1 and S4 Table). We also found a significant positive effect of conflict in bilateral anterior insula (Fig 5A, Table 1 and S4 Table).

Activations displayed at p < 0.001, uncorrected. See Table 1 for peak MNI coordinates and statistics, significant at p < 0.05 FWE corrected.

Next, we tested our main hypothesis that the extreme and intermediate conditions are treated as different control contexts by the brain. Note that one obvious reason for the effects in the categorical contrast Intermediate > Extreme could be that average conflict was higher in intermediate trials than in extreme trials (S3 Fig). Another possibility would be that the brain areas involved in processing conflict are modulated by the context. In other words, is conflict processed differently in the brain when the subject is in an intermediate compared to an extreme trial? Behavioural results in Fig 3C already indicate such a context-dependent mechanism, showing that reaction times increased more with conflict during intermediate than extreme trials.

To test this context dependency using brain activity, we included conflict as a parametric modulator in our GLM, separately for intermediate and extreme offers. We then computed a contrast between the parametric modulator of conflict for intermediate offers minus the parametric modulator of conflict for extreme offers (Conflict Intermediate > Conflict Extreme). Our expectation was that the dACC would track conflict (as a proxy for the value of planning), but to a lesser extent in a context with a low prior need for planning (i.e. in the extreme context), due to the metacosts associated with obtaining conflict values. We indeed found that BOLD activity in dACC and right posterior middle temporal gyrus (pMTG) was more strongly correlated with conflict during intermediate offers compared to extreme offers (Fig 5B, Table 1 and S5 Table). An effect in dlPFC emerged at a lower threshold (see S5 Table). This finding aligns well with the results of the reaction time analysis and is consistent with the idea that the situation-appropriate investment into planning is driven by a context-dependent evaluation of conflict involving the dACC.

We used a novel sequential task with a complex task space to investigate how people decide when to plan ahead. We found evidence that participants use readily available features of the task space, such as offer values, to construct contexts that condition the balancing between forward planning and a simpler response strategy. We further provided evidence that the context-dependency of planning might be mediated by context-dependent conflict processing involving dACC. Our study provides initial evidence that the human ability to efficiently allocate cognitive control in complex tasks is supported by category-based cognition that harnesses regularities in control demand to generate control contexts.

Normatively, a decision about engagement in elaborate planning should find the optimal trade-off between the benefits and costs of such planning in a given situation (Shenhav et al., 2013).

[…] We found that the correlation of activity in the dACC with conflict (which we take to be a proxy for the value of planning ahead) depended on context. One possible explanation for this pattern could be that the dACC has access to a hierarchical representation of learned conflicts, whereby conflicts encoded at a finer level of task space are subsumed under conflicts encoded at the level of context. In other words, states of similar difficulty could be grouped into a more general category that, e.g., simply indicates whether the decision is easy or difficult. In contexts with a high prior expectation of conflict, i.e. in an intermediate context, the dACC could access conflict at a more fine-grained level to enable the appropriate level of planning. Conversely, in a context with low prior expectation of conflict, i.e. in an extreme context, the dACC would not access information beyond that at the coarse context level, as the overall need for planning was low anyway.
Speculating on the algorithmic implementation of such a process, the context-dependent prior assumption about conflict could set the threshold for the meta-decision problem of inferring the need for planning. In an intermediate context, a high meta-threshold would grant enough time for a state-level readout of conflict, whereas in an extreme context the need for planning would have been determined before state-level conflicts were accessed. We also found evidence that right posterior middle temporal gyrus (pMTG) was more correlated with conflict in an intermediate than in an extreme context. Previous research implicated the pMTG in category-based cognition (Martin, 2007). It is therefore an intriguing possibility that the pMTG is also capable of forming abstract categories of choice difficulty that support the context-dependent evaluation of planning demands. Although we can only speculate about the role of pMTG, the question of how brain mechanisms for structured knowledge acquisition and cognitive control interact is an important direction for future research. Overall, our findings are generally consistent with the view that people exploit the structure of a task for efficient storage and access of the value of control (Lieder et al., 2018).

S3 Table. The table contains all clusters with more than 10 voxels that survived uncorrected statistical thresholding with p < 0.001.

S4 Table. The table contains all clusters with more than 10 voxels that survived uncorrected statistical thresholding with p < 0.001.

S5 Table. The table contains all clusters with more than 10 voxels that survived uncorrected statistical thresholding with p < 0.001.

- […] random, having the same occurrence probability of 25%.
- However, accepting an offer is associated with energy costs. Your current energy level is represented by the lower blue bar. If you accept an offer and do not have enough energy, no points will be credited to you and the next trial will begin.
- You can replenish your energy account by selecting the "reject" option. This will increase your energy level by 1 and the next trial will begin.
- The energy level can have a maximum value of 6.
- The experiment is divided into segments, each consisting of 4 trials. Two numbers are displayed on the screen to indicate how far you are in the current segment.
- There are 2 different segment types, in which the energy costs for accepting an offer differ. In segments with 1 flash, 1 energy unit is subtracted when you accept an offer. In segments with 2 flashes, 2 energy units are subtracted when you accept an offer. The left blue-orange box at the bottom right of the screen informs you about the type of the current segment.
- In addition to the type of the current segment, information about the energy costs in the next segment is available. This can be seen in the right blue-orange box at the bottom right of the screen.
- Breaks: During the main experiment in the scanner, you have the possibility to pause twice. The pause screen is displayed automatically. You decide when you are ready to continue the experiment. Note: After a pause your score will be reset to 0. This has no effect on your final bonus. Your score is counted continuously.
- Deadline: You have a maximum of 5 seconds for each decision. If you exceed this time limit, the next trial will begin without points being awarded.
- Training: Before the main experiment in the scanner starts, you will be given a few training trials on the PC to familiarize yourself with the experiment. There is no deadline in the training, and the points gained here have no effect on the bonus paid out. Please try to get as many points as possible anyway. The training phase will end automatically.