Different brain systems support the aversive and appetitive sides of human pain-avoidance learning

Both unexpected pain and unexpected pain absence can drive avoidance learning, but whether they do so via shared or separate neural and neurochemical systems is largely unknown. To address this issue, we combined an instrumental pain-avoidance learning task with computational modeling, functional magnetic resonance imaging (fMRI) and pharmacological manipulations of the dopaminergic (100 mg levodopa) and opioidergic (50 mg naltrexone) systems (N=83). Computational modeling provided evidence that untreated participants learned more from received than avoided pain. Our dopamine and opioid manipulations negated this learning asymmetry by selectively increasing learning rates for avoided pain. Furthermore, our fMRI analyses revealed that pain prediction errors were encoded in subcortical and limbic brain regions, whereas no-pain prediction errors were encoded in frontal and parietal cortical regions. However, we found no effects of our pharmacological manipulations on the neural encoding of prediction errors. Together, our results suggest that human pain-avoidance learning is supported by separate threat- and safety-learning systems, and that dopamine and endogenous opioids specifically regulate learning from successfully avoided pain.

Learning to avoid actions that cause damage to our body is critical for health and survival. The experience of pain is an important teaching signal in this learning process, such as when a child learns to avoid touching a hot stove, or when a patient who underwent knee surgery learns to avoid bending his or her knee. However, the absence of otherwise expected pain can be an equally important teaching signal. When, for example, some weeks after surgery a patient realizes that bending his or her knee is no longer painful, this suggests that particular movements are safe again and no longer need to be avoided. Adaptive behavior in situations associated with pain thus requires an optimal balance between threat and safety learning when confronted with, respectively, the unexpected presence and absence of pain.
Previous studies have made considerable progress in our understanding of the neural basis of passive cue-pain-association learning (Ploghaus et al. 2000, Seymour et al. 2004, Seymour et al. 2005) and, more recently, active pain-avoidance and pain-relief learning (Roy et al. 2014, Eldar et al. 2016, Zhang et al. 2018) in humans. A key aspect of these studies was the application of reinforcement-learning models to the analysis of neuroimaging data. According to reinforcement-learning theory, learning is driven by prediction errors, which signal the difference between the actual and expected outcomes of an action (Sutton and Barto 1998). Thus, actions that result in the unexpected presence versus absence of pain yield oppositely signed prediction errors, which respectively increase and decrease the aversive value associated with that action. However, whether these opponent teaching signals drive learning via one underlying brain system, or via separate ones, is still largely unknown.
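In the standard delta-rule (Rescorla-Wagner) form, this idea can be written as below. This is a generic sketch that assumes outcomes are coded as o_t = 1 for pain and o_t = 0 for no pain, with V_t denoting the expected pain probability (aversive value) of the chosen action; it is not necessarily the exact parameterization used in this study.

```latex
\delta_t = o_t - V_t, \qquad V_{t+1} = V_t + \alpha\,\delta_t, \qquad o_t \in \{0, 1\},\ 0 < \alpha \le 1
```

For example, with V_t = 0.3 and α = 0.5, received pain gives δ_t = +0.7 and V_{t+1} = 0.65, whereas avoided pain gives δ_t = −0.3 and V_{t+1} = 0.15: the two outcomes produce oppositely signed prediction errors that push the aversive value in opposite directions.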
One possibility is that prediction errors elicited by the unexpected presence and absence of pain are encoded as opposite activity patterns in the same brain regions (i.e., one learning system). If this is the case, we may also expect that, at the neurochemical level, learning from these two outcomes is supported by the same neuromodulator(s). Furthermore, these two outcomes may then be equally effective in driving learning, such that they are associated with the same learning rate. Most previous studies, including our own, have assumed that this is the case. For example, in a previous functional magnetic resonance imaging (fMRI) study, we identified brain activity encoding general aversive prediction errors, signaling the degree to which both received- and avoided-pain outcomes are relatively worse (or less good) than expected (Roy et al. 2014). Another possibility, however, is that learning from received and successfully avoided pain are subserved by two separate brain […] mg levodopa (a dopamine precursor), 50 mg naltrexone (an opioid antagonist, with highest affinity for the µ-opioid receptor), or placebo. PET studies in humans suggest that levodopa increases phasic dopamine bursts (Floel et al. 2008), but not tonic dopamine activity (Black et al. 2015). Thus, if dopaminergic prediction-error responses support learning from successfully avoided pain, we expect levodopa to enhance learning rates and neural prediction-error signaling when pain is avoided. Naltrexone blocks the majority of µ-opioid receptors in the brain (Lee et al. 1988, Preston and Bigelow 1993, Schuh et al. 1999, Weerts et al. 2013). Thus, if µ-opioid receptor activity supports learning from received and/or avoided pain outcomes, we expect naltrexone to reduce learning rates and neural prediction-error signaling for the corresponding outcome(s).

Eighty-three healthy human participants completed a pain-avoidance learning task during fMRI, under one of three treatment conditions (levodopa, naltrexone, or placebo). On each of 144 trials of the pain-avoidance learning task, participants chose between two options (Figure 1A), each probabilistically associated with the delivery of painful heat (49 or 50°C, 1.9 s duration) to their left lower leg. Pain probabilities for each option were governed by two independently varying random walks (Figure 1B […]
bioRxiv preprint doi: https://doi.org/10.1101/2021.10.18.464769; this version posted October 18, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
There were no treatment effects on subjective state (alertness, calmness or contentment; Figure 1-figure supplement 1) or on self-reported heat pain during a separate pain-rating task immediately preceding the avoidance-learning task (Figure 1-figure supplement 2).
Nine participants were excluded from the fMRI (but not the behavioral) analyses because of excessive head movement. Thus, the behavioral analyses included 83 participants (26-29 per treatment group), and the fMRI analyses included 74 participants (24-26 per treatment group).

No drug effects on model-independent measures of task performance
On average, participants received pain on 57.1 of the 144 trials (SD = 8.6). As expected, participants switched to the other choice option more frequently after receiving pain (46.6% of those trials, SD = 21.2) than after avoiding pain (5.4% of those trials, SD = 6.9; t(82) = 18.6, p < .001).
The effect of previous pain outcomes on switching also decayed exponentially over time, in all treatment groups (Figure 1C; p < 0.001 for 1 trial back, p < .002 for 2 trials back, and p > .047 for 3-6 trials back, in all groups).
The three treatment groups did not differ in the number of received pain stimuli (F(2,80) = 0.56, p = 0.57), frequency of switching following pain outcomes (F(2,80) = 0.03, p = 0.97), or frequency of switching following no-pain outcomes (F(2,80) = 1.18, p = 0.31). Thus, our pharmacological manipulations did not affect basic measures of task performance.
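The two switching measures above can be computed from a single participant's trial sequence as in the following minimal sketch. The helper name `switch_rates` is hypothetical, and the sketch assumes choices coded 0/1 and outcomes coded 1 for pain and 0 for no pain; a switch after trial t is counted whenever the choice on trial t + 1 differs from the choice on trial t.

```python
import numpy as np


def switch_rates(choices, pain):
    """Return (switch rate after pain, switch rate after no pain) for one participant.

    choices: chosen option on each trial (0 or 1)
    pain:    outcome on each trial (1 = pain received, 0 = pain avoided)
    """
    choices = np.asarray(choices)
    pain = np.asarray(pain, dtype=bool)
    # A switch on trial t + 1: the choice differs from the choice on trial t
    switched = choices[1:] != choices[:-1]
    # Outcome of the trial that preceded each potential switch
    prev_pain = pain[:-1]
    after_pain = switched[prev_pain].mean() if prev_pain.any() else np.nan
    after_no_pain = switched[~prev_pain].mean() if (~prev_pain).any() else np.nan
    return float(after_pain), float(after_no_pain)


# Example: 7 trials -> 6 transitions (3 after pain, 3 after no pain)
rates = switch_rates([0, 1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 1, 0, 1])
```

Per-participant rates obtained this way could then be compared within participants with a paired t-test (for example, `scipy.stats.ttest_rel`), which is consistent with the t(82) statistic reported for 83 participants.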

Computational modeling
To formalize and quantify the latent learning and decision processes thought to underlie participants' choice behavior, we applied two candidate reinforcement-learning models to the choice data, using a hierarchical Bayesian approach. Group-level parameters were estimated separately for each treatment group. Both models update the expected pain probability for the chosen option on each trial in proportion to the prediction error (Rescorla and Wagner 1972). The two models differ in that Model 1 uses a single learning rate, α, for all outcomes, whereas Model 2 uses separate learning rates for received and avoided pain: α_pain and α_no-pain, respectively (see Methods for model equations and parameter-estimation details). If Model 2 explains the choice data better than Model 1, this could be taken as initial support for the idea that learning from received and avoided pain is subserved by different learning systems.
Both learning models were combined with a softmax decision function that translates expected pain probabilities into choice probabilities. The inverse-temperature parameter β controls the degree of choice randomness: the likelihood that the model chooses the option with the lowest expected pain probability increases as β increases.

Parameter estimates: levodopa and naltrexone increase learning rates for avoided pain
We next examined the parameter estimates of the best-fitting model. We focus on the hyperparameters governing the means of the group-level distributions, which we denote with overbars (e.g., ᾱ_pain refers to the group-level mean of α_pain). Figure 2A […]
In the placebo group, the posterior distribution of ᾱ_pain (median = 0.72) was considerably higher than the posterior distribution of ᾱ_no-pain (median = 0.32; ᾱ_pain > ᾱ_no-pain for 99.6% of the MCMC samples), indicating stronger expectation updating when pain was received than when it was avoided.
In contrast, in both drug groups, the posterior distributions of ᾱ_pain and ᾱ_no-pain were highly similar, due to a specific increase in ᾱ_no-pain relative to the placebo group. In the levodopa group, the posterior medians of ᾱ_pain and ᾱ_no-pain were, respectively, 0.66 and 0.66 (ᾱ_pain > ᾱ_no-pain for 50% of the MCMC samples). In the naltrexone group, the posterior medians of ᾱ_pain and ᾱ_no-pain were, respectively, 0.72 and 0.76 (ᾱ_pain > ᾱ_no-pain for 38% of the MCMC samples). Note that the best-fitting model for both drug groups contained separate learning rates for received and avoided pain. Combined with the finding that the group-level mean learning-rate parameters for these two outcomes were highly similar, this suggests that some participants in each drug group learned more from received than avoided pain while others showed the opposite bias, but that there was no systematic learning asymmetry (at the individual level, α_pain was higher than α_no-pain for 50% of the levodopa and 41% of the naltrexone participants).

Figure 2. […] for each group (left and right panels). Parameters α_no-pain and α_pain are learning rates for avoided and received pain outcomes, respectively; parameter β is the inverse-temperature parameter. The middle panels are joint density plots of ᾱ_pain and ᾱ_no-pain (dots are samples from the MCMC), showing that ᾱ_pain is reliably greater than ᾱ_no-pain in the placebo group only. B. The difference between the posterior distributions for each drug group vs. the placebo group, showing that ᾱ_no-pain is greater and β̄ is smaller in both drug groups compared to the placebo group. Red lines indicate 95% HDIs.

Thus, at the group level, both levodopa and naltrexone, as compared to placebo, increased learning rates for avoided pain, while not affecting learning from received pain (Figure 2A, left and middle panels). To test the significance of these group differences, we computed the difference between the posterior distributions of the group-level mean parameters for each drug group vs. the placebo group (Figure 2B). For ᾱ_no-pain, 99.7% of the difference distribution for levodopa vs. placebo, and 99.9% of the difference distribution for naltrexone vs. placebo, lay above 0. In contrast, ᾱ_pain did not differ between the drug and placebo groups: 34% and 49% of the difference distributions for levodopa vs. placebo and naltrexone vs. placebo, respectively, lay above 0. Thus, both drugs selectively increased learning rates for avoided pain.
The posterior distribution of the inverse-temperature parameter β̄ was also higher for the placebo group (median = 8.8) than for the levodopa and naltrexone groups (medians = 5.3 and 5.7, respectively). Specifically, 98.8% of the β̄ difference distribution for levodopa vs. placebo, and 98.1% for naltrexone vs. placebo, lay below 0, indicating that β̄ was reliably lower in both drug groups than in the placebo group. This suggests that participants in the two drug groups, as compared to the placebo group, were less prone to choose the option with the lowest expected pain probability (i.e., showed more stochastic choice behavior).
Together, the parameter estimates suggest that (i) untreated (placebo group) participants updated their expectations more rapidly following received than avoided pain, (ii) levodopa and naltrexone negated this learning asymmetry by selectively increasing learning rates for avoided pain, and (iii) levodopa and naltrexone also increased choice stochasticity, possibly reflecting a more exploratory or risky choice strategy.
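The posterior comparisons above (the share of a difference distribution lying above 0, and the 95% HDIs shown in Figure 2B) can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes equally long arrays of MCMC samples for the drug-group and placebo-group means, which are estimated independently per group, so that element-wise differences are draws from the difference distribution. The names `hdi` and `compare_groups` are hypothetical.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Highest density interval: the narrowest interval containing `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # number of samples the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]   # width of every candidate interval of k samples
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


def compare_groups(drug_samples, placebo_samples, mass=0.95):
    """Summarize the posterior difference (drug minus placebo) for one group-level parameter.

    Assumes independently estimated groups and equally many MCMC samples per group.
    """
    diff = np.asarray(drug_samples, dtype=float) - np.asarray(placebo_samples, dtype=float)
    share_above_zero = float(np.mean(diff > 0))
    return share_above_zero, hdi(diff, mass)
```

A share above 0 close to 1 (or close to 0, as for β̄) together with an HDI that excludes 0 corresponds to the "reliable" group differences described in the text.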

Replication of learning rate asymmetry in an independent group of untreated participants
To examine the robustness of our finding of asymmetric learning rates in the placebo group, we applied our hierarchical Bayesian modeling approach to the choice data of 23 untreated participants from our previous pain-avoidance learning fMRI study (Roy et al. 2014), and tested whether the higher learning rate for received than avoided pain found in our placebo group was replicated in this previous dataset (Figure 2-figure supplement 3). In this previous dataset, ᾱ_pain (median = 0.62) was indeed higher than ᾱ_no-pain (median = 0.44), resembling the placebo-group results from our current study, although the learning-rate asymmetry was somewhat smaller in the previous dataset (ᾱ_pain > ᾱ_no-pain for 90% of the MCMC samples). The posterior median of β̄ in the previous dataset was 7.1 (95% HDI = 4.7-9.9). The finding of a higher learning rate for pain than no-pain outcomes in this independent dataset corroborates the idea that people normally (in the absence of a pharmacological manipulation) update their expectations more rapidly following received than avoided pain.
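The two candidate models and the softmax choice rule described under Computational modeling can be made concrete with a small generative simulation, the kind of simulation a parameter-recovery analysis starts from. This is a sketch under stated assumptions, not the authors' implementation: the random-walk step size, the bounds on the pain probabilities, and the initial expected pain probability of 0.5 are hypothetical choices, and `simulate` is an illustrative name. Model 1 is the special case `alpha_pain == alpha_no_pain`.

```python
import numpy as np


def simulate(alpha_pain, alpha_no_pain, beta, n_trials=144, step_sd=0.05, seed=0):
    """Simulate one participant on a two-option pain-avoidance task.

    Hypothetical task settings (not the study's exact schedule): each option's pain
    probability follows an independent Gaussian random walk, clipped to [0.05, 0.95].
    Returns (choices, pain), each an integer array of length n_trials.
    """
    rng = np.random.default_rng(seed)
    p_true = np.array([0.5, 0.5])  # true pain probabilities (assumed starting point)
    v = np.array([0.5, 0.5])       # expected pain probabilities (assumed initial value)
    choices = np.empty(n_trials, dtype=int)
    pain = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # Softmax over negative expected pain: lower expected pain -> more likely chosen
        logits = -beta * v
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        c = rng.choice(2, p=probs)
        o = int(rng.random() < p_true[c])
        # Delta-rule update of the chosen option only, with an outcome-specific learning rate
        alpha = alpha_pain if o == 1 else alpha_no_pain
        v[c] += alpha * (o - v[c])
        choices[t], pain[t] = c, o
        # Both options' pain probabilities drift independently
        p_true = np.clip(p_true + rng.normal(0.0, step_sd, size=2), 0.05, 0.95)
    return choices, pain
```

For example, `simulate(0.72, 0.32, 8.8)` and `simulate(0.66, 0.66, 5.3)` use the placebo- and levodopa-group posterior medians reported above as illustrative parameter values. Summaries such as the number of pain trials (the sum of the second returned array) can then be compared across parameter sets, and fitting both models back to many simulated datasets is the basis of a parameter-recovery check.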

Parameter recovery
To validate the conclusion that our pharmacological manipulations affected both ᾱ_no-pain and β̄, we simulated two sets of choice data, using the parameter values found in the placebo and drug groups, and performed parameter-recovery analyses (Appendix 1). In sum, we found that our modeling procedure can distinguish the two patterns of parameter values found in the placebo and drug groups, even though they produce similar model-independent performance measures (number of received pain stimuli, and frequency of switching following pain and no-pain outcomes). Thus, the observed drug effects on both ᾱ_no-pain and β̄ are unlikely to merely reflect a tradeoff between these parameters. Instead, the parameter-recovery results suggest that levodopa and naltrexone had two computational effects, an increased learning rate for avoided pain and an increased degree of choice stochasticity, whose combination yielded no significant effects on basic, model-independent performance measures.

fMRI analyses
Next, we report two sets of fMRI analyses. First, we performed an axiomatic analysis to identify brain activation encoding general aversive prediction errors (i.e., activation encoding the degree to which both pain and no-pain outcomes are relatively worse, or less good, than expected). In our previous study, this analysis revealed a general aversive prediction error signal in the PAG (Roy et al. 2014). Here, we examined whether we could replicate this finding, and whether our pharmacological manipulations affected the brain activation associated with the prediction-error axioms. Second, to address the question of whether learning from received and avoided pain is supported by two separate brain systems, we sought to identify brain activation encoding outcome-specific prediction error signals. In both analyses, we focused on the first second of the outcome period, as this is when prediction errors are triggered.
We modeled drug effects using two second-level regressors (levodopa vs. placebo and naltrexone vs. placebo) in both analyses. All fMRI results are thresholded at q < 0.05, false discovery rate (FDR)-corrected for multiple comparisons across the whole brain (gray-matter masked), unless otherwise stated (e.g., for visualization purposes). Unthresholded t maps can be found on https://neurovault.org/collections/RIVRRMAK/.
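The FDR threshold can be illustrated with the Benjamini-Hochberg step-up procedure. This section does not state which FDR variant was used, so treat the following as a generic sketch (the function name `fdr_bh_mask` is hypothetical), applied to voxelwise p-values within the gray-matter mask: the largest rank k with p_(k) <= (k/m)·q is found, and every voxel whose p-value is at most p_(k) is declared significant.

```python
import numpy as np


def fdr_bh_mask(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask (same order as p_values) of tests declared significant at FDR q.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Compare each sorted p-value with its rank-specific threshold (k / m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # zero-based index of the largest passing rank
        mask[order[: k + 1]] = True     # step-up: all smaller p-values pass as well
    return mask
```

The step-up rule means a p-value that misses its own rank threshold can still be declared significant when a larger p-value passes its threshold; for example, with p-values 0.001, 0.03, 0.035 and 0.2 at q = 0.05, the first three are significant even though 0.03 exceeds its threshold of 0.025.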

General (outcome-nonspecific) aversive and appetitive prediction error signals
Prediction-error related activation is often examined by regressing fMRI activity at outcome onset on model-derived prediction errors (see Figure 3-figure supplement 1 for the corresponding activation in our study). However, as prediction errors are defined as the outcome minus the expected outcome, a problem with this approach is that the resulting brain activity may predominantly track the outcome (in our task: pain vs. no pain) or the expected outcome (in our task: the expected pain probability), both of which are intrinsically correlated with the prediction error.
To address this issue, and to identify brain activity that truly integrates actual and expected outcomes into a prediction error signal, a set of conditions has recently been specified (Rutledge et al. 2010, Roy et al. 2014). These conditions, or axioms, for general aversive prediction error signals in our task are: (i) activation at outcome onset should be higher for received than avoided pain, unless pain is fully expected; (ii) when pain is received, activation should be higher when pain was less expected (i.e., a negative correlation with expected pain probability); and (iii) when pain is avoided, activation should also be higher when pain was less expected, that is, when avoidance was more expected (Figure 3A, left panels). To identify regions encoding a general aversive prediction-error signal, we tested for activation that fulfilled each of these three axioms, using a whole-brain conjunction analysis. In addition, to search for regions encoding the opposite (i.e., appetitive-like) prediction error signal, we also tested for activation that fulfilled each of the reverse axioms. Note that we did not detect activation encoding appetitive-like prediction errors in our previous study (Roy et al. 2014).
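A quick numerical check illustrates why the three axioms are tested jointly. This sketch assumes outcomes coded 1 (pain) and 0 (no pain) and expected pain probabilities kept strictly between 0 and 1; `check_axioms` and the three candidate signals are hypothetical illustrations. An aversive prediction error (outcome minus expected pain probability) satisfies all three axioms, whereas a signal that only tracks the outcome satisfies axiom (i) alone, and a signal that only tracks low expected pain (expected safety) satisfies axioms (ii) and (iii) but not (i).

```python
import numpy as np


def check_axioms(signal, p=None):
    """Test a candidate signal f(outcome, expected_pain) against the three aversive-PE axioms.

    outcome: 1.0 = pain received, 0.0 = pain avoided.
    p: expected pain probabilities, kept strictly between 0 and 1 (never fully predicted).
    """
    if p is None:
        p = np.linspace(0.05, 0.95, 19)
    on_pain, on_no_pain = signal(1.0, p), signal(0.0, p)
    axiom1 = bool(np.all(on_pain > on_no_pain))     # (i) received > avoided pain
    axiom2 = bool(np.all(np.diff(on_pain) < 0))     # (ii) pain trials: higher when pain less expected
    axiom3 = bool(np.all(np.diff(on_no_pain) < 0))  # (iii) no-pain trials: higher when pain less expected
    return axiom1, axiom2, axiom3


def prediction_error(outcome, p):
    return outcome - p                    # aversive prediction error


def outcome_only(outcome, p):
    return np.full_like(p, outcome)       # tracks pain vs. no pain, ignores expectation


def expected_safety_only(outcome, p):
    return -p                             # tracks (low) expected pain, ignores the outcome
```

Here `check_axioms(prediction_error)` returns `(True, True, True)`, `check_axioms(outcome_only)` returns `(True, False, False)`, and `check_axioms(expected_safety_only)` returns `(False, True, True)`. Negating any signal flips every comparison, which is why the reverse axioms identify appetitive-like prediction errors.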
A fourth axiom, which applies to both aversive and appetitive prediction errors, is that activation for received and avoided pain should be equivalent if the outcome is fully predicted (i.e., when the prediction error is zero). As outcomes could never be fully predicted in our task, we could not test this axiom.

Figure 3. Axiomatic tests of brain activation encoding general aversive and appetitive prediction errors (N = 74). A. Activation associated with the three axioms for aversive prediction errors in our task. Yellow regions showed the effects illustrated in the left panels, and blue regions showed the reverse effects (i.e., the axioms for appetitive prediction errors). Expected P(pain) is expected pain probability. All maps were thresholded at q < 0.05, FDR-corrected for multiple comparisons across the whole brain, with higher voxel thresholds superimposed for display. B. Conjunction results. Regions activated for each of the above three contrasts, all thresholded at q < 0.05, FDR-corrected. Yellow and blue regions showed positive and negative responses for each contrast, respectively, and thus encoded general aversive and appetitive prediction errors.

Axiom 1. A large part of the brain fulfilled the first axiom for aversive prediction errors (stronger response to received than avoided pain), including typical pain-processing regions such as the dorsal anterior cingulate cortex (ACC), (pre)motor cortex, insula and thalamus, as well as occipital (visual) cortex (Figure 3A, upper panel).
We found the opposite effect (stronger response to avoided than received pain) in regions of the ventromedial prefrontal cortex (vmPFC), dorsolateral prefrontal cortex (dlPFC), somatosensory cortex, posterior ACC, and lateral occipital cortex (LOC).

Axiom 2. A test of the second axiom for aversive prediction errors (stronger responses to more unexpected pain) revealed several activation clusters, including regions in the vmPFC, dorsal ACC, insula, amygdala, and a midbrain area covering part of the PAG (Figure 3A, middle panel). In addition, several other regions, including the right dlPFC and bilateral somatosensory cortex, showed the opposite effect (stronger responses to more expected pain).
Axiom 3. The third axiom for aversive prediction errors (stronger responses to more expected pain avoidance) was fulfilled by a few regions in the vmPFC and rostral ACC (rACC), as well as part of the PAG. We also found the opposite effect (stronger responses to more unexpected pain avoidance) in several regions, including the dorsal ACC, sensorimotor cortex, thalamus, putamen and insula.

Conjunction. A conjunction analysis of the three contrasts reported above (Figure 3B) revealed two brain regions that satisfied all three axioms for aversive prediction errors: a midbrain region including part of the PAG (16 voxels) and an area in the rostral ACC (24 voxels). Importantly, we also identified activation that showed a negative effect for all three axioms, thus encoding appetitive-like prediction errors, in bilateral somatosensory cortex (433 and 65 voxels in the left and right hemisphere, respectively), left frontopolar cortex (47 voxels), right dlPFC (middle frontal gyrus; 27 voxels), and right LOC (253 voxels).
Together, these results replicate our previous finding that the PAG encodes general aversive prediction errors (Roy et al. 2014), and suggest a role for the rostral ACC in encoding aversive prediction errors as well. Furthermore, the identification of an additional neural circuit encoding appetitive-like prediction errors provides an important extension to our previous results, possibly owing to the larger number of participants and hence higher power in the present study.
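A voxelwise conjunction of the kind reported above can be sketched as requiring significance, with a consistent sign, in every contrast. This sketch assumes each axiom contrast has been coded so that positive values follow the aversive axiom, and that significance masks (e.g., after FDR correction) are already available; `conjunction` is a hypothetical helper name, not a function from a specific neuroimaging package.

```python
import numpy as np


def conjunction(t_maps, sig_masks):
    """Voxelwise conjunction across contrasts.

    t_maps:    array (n_contrasts, n_voxels) of t statistics, one row per axiom contrast,
               coded so that positive values follow the aversive axiom
    sig_masks: boolean array of the same shape, True where the voxel survives correction
    Returns (aversive, appetitive) boolean masks over voxels.
    """
    t_maps = np.asarray(t_maps, dtype=float)
    sig = np.asarray(sig_masks, dtype=bool)
    # Aversive-like: significant and positive in every contrast
    aversive = np.all(sig & (t_maps > 0), axis=0)
    # Appetitive-like: significant and negative in every contrast (the reverse axioms)
    appetitive = np.all(sig & (t_maps < 0), axis=0)
    return aversive, appetitive
```

A voxel that is significant in all three contrasts but with mixed signs (for example, positive for axioms 1 and 3 and negative for axiom 2) ends up in neither mask, which is what makes the conjunction stricter than any single contrast.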

No drug effects on prediction-error related brain activation
We found no differences between the levodopa and placebo groups, nor between the naltrexone and placebo groups, for any of the three prediction-error contrasts reported above (whole-brain FDR-corrected). Drug effects were virtually absent at lower, uncorrected thresholds as well (see unthresholded t maps on https://neurovault.org/collections/RIVRRMAK/). More specific tests for drug effects in the clusters identified by our conjunction analysis provided no evidence for effects of our pharmacological manipulations on prediction-error related brain activation either ([…]).
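The two second-level drug regressors can be illustrated with dummy coding. This is a minimal sketch under the assumption of a plain ordinary-least-squares model per voxel without covariates (the exact second-level specification is not given in this section); `second_level_design` and `fit_ols` are hypothetical names. With an intercept plus two dummies, the intercept estimates the placebo-group mean and each drug coefficient estimates that drug group's difference from placebo.

```python
import numpy as np


def second_level_design(groups):
    """Design matrix: intercept plus levodopa-vs-placebo and naltrexone-vs-placebo dummies.

    groups: one label per participant: "placebo", "levodopa" or "naltrexone".
    """
    g = np.asarray(groups)
    return np.column_stack([
        np.ones(len(g)),                    # intercept = placebo-group mean
        (g == "levodopa").astype(float),    # levodopa minus placebo
        (g == "naltrexone").astype(float),  # naltrexone minus placebo
    ])


def fit_ols(X, y):
    """Ordinary least-squares coefficients for one voxel's first-level contrast estimates."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

If the drug columns are mean-centered instead, the drug coefficients are unchanged and the intercept estimates the across-participant mean, which is the quantity tested when asking whether an effect is present irrespective of treatment.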

Outcome-specific prediction error signals
The previous analysis identified regions that encoded the degree to which outcomes were relatively worse (or better) than expected in the same direction when pain was received and avoided, but did not dissociate learning from received and avoided pain. To address this issue, we next examined whether different brain regions encode the unexpectedness, or surprise (which drives learning in reinforcement-learning models), evoked by received and avoided pain.
Regions encoding the surprise evoked by received pain should respond more strongly to pain outcomes when pain was less expected (i.e., a negative correlation with expected pain probability on pain trials). In contrast, regions encoding the surprise evoked by avoided pain should respond more strongly to no-pain outcomes when pain was more expected (i.e., a positive correlation with expected pain probability on no-pain trials; Figure 4A […]

Figure 4. Outcome-specific prediction-error signals (N = 74). A. Activation tracking surprise more for received than avoided pain (yellow) and vice versa (blue). Note that this includes activation that tracks expected pain probability across both outcomes. B. Activation tracking surprise for both received and avoided pain (i.e., absolute prediction error). Activation maps in A and B are thresholded at q < 0.05, FDR-corrected for multiple comparisons across the whole brain, with adjacent areas thresholded at p < 0.01 and p < 0.05 (uncorrected) for display. C. Regions encoding surprise more for received than avoided pain, which cannot be explained by a general sensitivity to expected pain probability.
These regions showed positive activation for both the first (A) and second (B) contrasts, each thresholded at q < 0.05, FDR-corrected. D. Regions encoding surprise more for avoided than received pain, which cannot be explained by a general sensitivity to expected pain probability. […] expected pain probability, illustrating the encoding of outcome-specific prediction errors in these regions.

We also sought to identify activation encoding the surprise elicited by both received and avoided pain, that is, activation encoding absolute prediction errors. To this end, we specified a second contrast that tested for a negative correlation with expected pain probability on pain trials and a positive correlation with expected pain probability on no-pain trials (Figure 4B). This contrast revealed extensive activation clusters in the dorsal ACC extending into the supplementary motor cortex, insula, sensorimotor cortex, thalamus, part of the brainstem, and cerebellum, suggesting that these regions encoded absolute prediction errors (Figure 4B, yellow regions). In addition, a few smaller clusters in left sensorimotor cortex, right dlPFC, and left frontopolar cortex showed the opposite effect, suggesting that these regions encoded the overall expectedness of outcomes (Figure 4B, blue regions).
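The two contrasts can be written in terms of each region's slope on expected pain probability, estimated separately on pain trials (b_pain) and no-pain trials (b_no_pain). In this sketch, surprise for received pain is taken as −b_pain and surprise for avoided pain as +b_no_pain, so the first contrast (Figure 4A) is their difference and the second (Figure 4B) their sum. The functions and the idealized, noise-free response profiles are hypothetical illustrations; the actual analysis operates on thresholded statistical maps. The profiles show which pattern each contrast picks up.

```python
import numpy as np


def outcome_specific_contrasts(b_pain, b_no_pain):
    """Return (A, B): A = surprise(pain) - surprise(no pain), B = surprise(pain) + surprise(no pain).

    b_pain, b_no_pain: slopes of the response on expected pain probability on pain and
    no-pain trials, respectively (hypothetical parametric-modulator estimates).
    """
    surprise_pain = -np.asarray(b_pain, dtype=float)       # less expected pain -> more surprise
    surprise_no_pain = np.asarray(b_no_pain, dtype=float)  # more expected pain -> more surprise when avoided
    return surprise_pain - surprise_no_pain, surprise_pain + surprise_no_pain


def classify(b_pain, b_no_pain, eps=1e-9):
    """Label an idealized response profile: A > 0 and B > 0 is pain-specific; A < 0 and B > 0 is no-pain-specific."""
    a, b = outcome_specific_contrasts(b_pain, b_no_pain)
    if b > eps and a > eps:
        return "pain-specific"
    if b > eps and a < -eps:
        return "no-pain-specific"
    return "not outcome-specific"
```

A region that tracks only the surprise of received pain (`classify(-1.0, 0.0)`) is labeled pain-specific, and one that tracks only the surprise of avoided pain (`classify(0.0, 1.0)`) is labeled no-pain-specific. A region whose response simply rises as expected pain falls on both trial types (`classify(-1.0, -1.0)`) is positive on A but zero on B, so it is not labeled outcome-specific.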
Finally, note that a caveat of the first contrast reported above (Figure 4A) is that it also identifies activation that is stronger when the expected pain probability is lower (i.e., […] Figure 4B), on the other hand, will not detect activation encoding expected safety regardless of the outcome, as the correlation with expected pain probability is specified in opposite directions for pain and no-pain outcomes. Thus, we reasoned that regions identified by both of the contrasts reported in Figure 4A and 4B encode outcome-specific prediction errors (Figure 4A) unconfounded by outcome-nonspecific expected pain probability (Figure 4B). Therefore, we next examined the conjunction of these two contrasts. Specifically, we masked the activation identified by the first contrast (separately for the positive and negative activation) by the positive activation identified by the second contrast.
Positive activation for both contrasts was found in a set of mostly subcortical and limbic regions, including a large cluster in the brainstem (not covering the PAG), bilateral insula extending into the amygdala, rostral ACC, posterior cingulate cortex, bilateral supramarginal gyrus, and cerebellum (Figure 4C). These regions thus encoded surprise more for received than avoided pain (pain-specific prediction errors), which could not be explained by a general sensitivity to expected pain probability. Negative activation for the first contrast and positive activation for the second contrast, on the other hand, was found in several cortical regions, including the supplementary motor cortex, left parietal cortex (supramarginal gyrus), bilateral somatosensory cortex (postcentral gyrus), and left dlPFC (middle and superior frontal gyrus) (Figure 4D). These regions thus encoded surprise more for avoided than received pain (no-pain-specific prediction errors), which could not be explained by a general sensitivity to expected pain probability. Together, these results provide evidence that prediction errors evoked by pain and no-pain outcomes are encoded in largely distinct brain regions.

No drug effects on surprise-related brain activation
Because levodopa and naltrexone specifically increased learning rates for no-pain outcomes, we expected these drugs to increase prediction-error related brain activation for no-pain outcomes as well. However, we found no differences between the levodopa and placebo groups, or between the naltrexone and placebo groups, for any of the contrasts reported above (whole-brain FDR-corrected). Drug effects were virtually absent at lower, uncorrected thresholds as well (see unthresholded t maps on https://neurovault.org/collections/RIVRRMAK/).

Discussion
Our results provide novel evidence that unexpectedly received and avoided pain, signaling threat and safety, respectively, drive human pain-avoidance learning via different learning systems. First, computational modeling suggested that participants' choices were best explained by a model with separate learning rates for received and avoided pain, and that untreated participants learned more from received than avoided pain. Second, levodopa and naltrexone selectively increased learning rates for avoided pain, suggesting a role for the dopamine and endogenous-opioid systems in safety, but not threat, learning.
Third, our fMRI analyses revealed that different brain circuits encode prediction errors elicited by received and avoided pain, providing evidence for two dissociable learning systems at the neural level as well. Somewhat surprisingly, however, we found no drug effects on fMRI activity for any of the contrasts we examined. We discuss each of these findings below.

Learning rates for received vs. successfully avoided pain
The higher learning rate for pain than no-pain outcomes in the placebo group suggests that people normally (in the absence of a pharmacological manipulation) update their expectations more following received than avoided pain. A similar learning-rate asymmetry was found in a recent aversive reversal-learning study (Wise et al. 2019). Interestingly, however, reward-learning studies using secondary outcomes (e.g., monetary gains and losses) have provided evidence for the opposite asymmetry: higher learning rates for favorable than unfavorable outcomes. This has been attributed to an optimistic learning bias (Sharot and Garrett 2016, Lefebvre et al. 2017) and a tendency to learn preferentially from information that confirms one's current action (…). The opposite learning bias in pain-avoidance learning tasks may be due to the intrinsically aversive nature of pain, arguably rendering unexpected pain a more salient teaching signal than unexpected pain absence. Relatedly, the experience of pain may trigger a reflexive tendency to change one's course of action (Huys et al. 2012), expressed in elevated learning rates for pain outcomes.
Thus, higher learning rates for received than avoided pain may reflect a Pavlovian influence on choice that operates in parallel to the instrumental learning system. Alternatively, the seemingly opposite learning asymmetries in reward-learning and pain-avoidance learning tasks may reflect a cognitive process related to the framing of the task. That is, participants who are instructed to maximize reward vs. minimize pain may pay most attention to (and hence learn most from) reward vs. pain outcomes, respectively.

The presence and direction of learning asymmetries may also depend on the specific task demands. For example, a previous fMRI study that used a more complex pain-avoidance learning task (pain probabilities of three different options were learned in parallel, in an indirect manner), and included a risk-taking component, found no systematic difference in learning rates for received and avoided pain (Eldar et al. 2016). Interestingly, in contrast to our present and previous (Roy et al. 2014) fMRI findings, the PAG in that study positively encoded expected pain probability on no-pain trials (one of the axioms for appetitive prediction errors), and did not encode expected pain probability on pain trials. These findings suggest that the task used in that previous study (Eldar et al. 2016) evoked different learning processes than our simpler task. Indeed, learning rates in that previous study were also much lower than those found in our task. Further examination of the degree to which avoidance-learning processes and their neural implementation generalize across different learning tasks is an important objective for future work (Yarkoni 2020).
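To make the consequence of this learning-rate asymmetry concrete, here is a minimal sketch that assumes a delta-rule (Rescorla-Wagner-style) update of the expected pain probability $V$, using $\alpha_{\text{pain}}$ after pain and $\alpha_{\text{no-pain}}$ after no pain. This is our reading of Model 2; its update equation is not reproduced in this section. For a fixed true pain probability $p$, the expected update is zero when $p\,\alpha_{\text{pain}}(1-V) = (1-p)\,\alpha_{\text{no-pain}}V$, so the long-run expectation settles at

$$V^{*} = \frac{p\,\alpha_{\text{pain}}}{p\,\alpha_{\text{pain}} + (1-p)\,\alpha_{\text{no-pain}}}$$

With $\alpha_{\text{pain}} > \alpha_{\text{no-pain}}$ we get $V^{*} > p$, meaning pain probability is overestimated; with equal learning rates, $V^{*} = p$. The code checks this against a simulated long-run average. The learning-rate values are hypothetical, and the sketch ignores choices and drifting probabilities.

```python
import random


def update(v, pain, alpha_pain, alpha_nopain):
    """One delta-rule step on the expected pain probability v (assumed form of Model 2)."""
    outcome = 1.0 if pain else 0.0
    alpha = alpha_pain if pain else alpha_nopain
    return v + alpha * (outcome - v)


def fixed_point(p, alpha_pain, alpha_nopain):
    """Value of v at which the expected update is zero for a stationary pain probability p."""
    return p * alpha_pain / (p * alpha_pain + (1 - p) * alpha_nopain)


def simulate_mean(p, alpha_pain, alpha_nopain, n_trials=100_000, burn_in=1_000, seed=0):
    """Long-run average of v when pain occurs with probability p on every trial."""
    rng = random.Random(seed)
    v, total = 0.5, 0.0
    for t in range(n_trials):
        v = update(v, rng.random() < p, alpha_pain, alpha_nopain)
        if t >= burn_in:
            total += v
    return total / (n_trials - burn_in)


p_true = 0.3
for a_pain, a_nopain in [(0.6, 0.3), (0.45, 0.45)]:  # hypothetical asymmetric vs. symmetric rates
    print(a_pain, a_nopain,
          round(fixed_point(p_true, a_pain, a_nopain), 3),
          round(simulate_mean(p_true, a_pain, a_nopain), 3))
```

Because the expected update is linear in $V$, the simulated mean converges to $V^{*}$ rather than to $p$ whenever the two learning rates differ.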
Combined with these previous findings, our levodopa results suggest that phasic dopamine activity may signal the degree to which outcomes are 'better than expected' across both reward and punishment domains. We are aware of one previous study that provided correlational evidence for a role of dopamine in human safety learning in a Pavlovian fear-conditioning task (Raczka et al. 2011). That study found that individual differences in fear-extinction learning rates were associated with genetic variation in the dopamine transporter gene, which presumably affects phasic striatal dopamine release. Our levodopa results are consistent with this finding, and provide the first causal evidence for a role of dopamine in human safety learning in an instrumental-learning task.

It has been proposed that the dopamine system signals aversive prediction errors as well, via a subset of midbrain dopaminergic neurons that is responsive to aversive outcomes … errors support pain-avoidance learning, we would expect levodopa to impair learning from received pain. Our finding that levodopa did not affect learning rates for received pain does not support this hypothesis either. Instead, our results suggest a selective role for the human dopamine system in learning from successfully avoided pain.

Regarding the endogenous opioid system, we expected that if pain-avoidance learning relies on µ-opioid activity, naltrexone (which blocks this activity) would suppress learning from received and/or avoided pain. However, we found the opposite effect for avoided-pain outcomes: like levodopa, naltrexone increased learning rates for avoided pain. This finding
counterintuitively suggests that µ-opioid activity normally suppresses learning from avoided pain and that naltrexone countered this effect, which seems to contradict findings that µ-opioid receptor antagonists impair fear-extinction learning in rats (McNally and Westbrook … more rapidly from received than avoided pain (replicated in our previous study) and that both levodopa and naltrexone negate this learning asymmetry, but the neurobiological mechanisms underlying the naltrexone effect remain to be elucidated. One informative approach for future studies would be to directly compare effects of opioid-receptor agonists and antagonists on pain-avoidance learning parameters.

The levodopa and naltrexone groups also showed a higher degree of choice stochasticity than the placebo group, suggesting that participants in both drug groups were less prone to choose the option with the lowest expected pain probability. Importantly, our … or motivation, which disrupted the decision-making process. However, we believe this is unlikely because (i) the drugs did not affect subjective state (alertness, calmness or contentment) and (ii) general side effects cannot easily explain the specific increase in learning rates for avoided pain.

Interestingly, the drug effects on learning rate for avoided pain and choice stochasticity were not accompanied by drug effects on basic performance measures (number of received pain outcomes, pain-switch or avoid-switch behavior). In a similar vein, previous
… our study cancelled each other out. Specifically, symmetric learning rates for received and avoided pain (found in the drug groups) result in more accurate pain-probability estimates than asymmetric learning rates (found in the placebo group). This beneficial effect of a symmetric learning process in the drug groups was, however, counteracted by the detrimental effect of a more stochastic choice process, resulting in no net performance difference between the placebo and drug groups.

Separate brain circuits support learning from received and avoided pain
Our fMRI results provided evidence for two dissociable learning systems at the brain level as well. Pain-specific prediction errors were predominantly encoded in subcortical (brainstem, cerebellum) and limbic (insula, amygdala, rostral ACC) regions that are typically associated with emotional and affective processes, including fear conditioning (Phillips and LeDoux 1992) and affective responses to errors (Bush et al. 2000). In contrast, no-pain-specific prediction errors were encoded in frontal and parietal cortical areas typically associated with higher-order cognitive processing (Ptak et al. 2017).

Prediction errors for no-pain outcomes were not represented in the ventral striatum, … activity. This activity is unlikely to reflect a reward signal and may instead reflect increased attention on trials in which pain was expected but not received. That is, the unexpected absence of pain may have prompted participants to carefully monitor the thermode's temperature in order to verify whether pain was really avoided or still to come, as reflected in increased frontoparietal activity.
The absence of a striatal 'reward-like' prediction-error response for no-pain outcomes suggests that, in terms of neural processing, avoiding pain is not comparable to gaining a reward. However, the lack of a detectable reward-like prediction-error response may also be related to our task design. Specifically, pain outcomes in our task involved a change in sensory input (a rise in temperature) whereas no-pain outcomes did not (maintenance of the baseline temperature), which may have caused a more prominent neural prediction-error response for the pain outcomes. One way to examine this issue would be to use a task in which choices result in either an increase or a decrease in painful stimulus intensity from a tonic pain level, such that aversive and appetitive-like outcomes are associated with similar changes in sensory input (Seymour et al. 2005). Such a task would examine pain-relief rather than pain-avoidance learning. The current task design, however, more closely resembles the situation of a patient recovering from injury or surgery, who is pain-free as long as he or she is resting but expects that physical activity may result in pain. It is an interesting speculation that, in such situations, the stronger subcortical and limbic ('emotional') responses for pain than for no-pain prediction errors, as found in our study, may foster a behavioral state that favors inactivity and rest, which could promote tissue healing and recovery.

No effects of levodopa and naltrexone on fMRI activation
Unexpectedly, we found no effects of levodopa or naltrexone on any of our fMRI prediction-error contrasts. This may indicate that the dopamine and opioid systems are not involved in pain-avoidance learning, although this is inconsistent with the drug effects on learning rates for avoided pain. It is also possible that our pharmacological manipulations did affect prediction-error related dopamine and/or opioid activity, but that we did not have enough power to detect these effects due to our moderate sample size and between-subject …

As mentioned above, a lack of statistical power due to our moderate sample size and between-subject design may have prevented the detection of drug effects in our fMRI analyses. In addition, our fMRI data suffered from signal dropout in inferior parts of the prefrontal cortex (including the orbitofrontal cortex), which is a common problem in fMRI studies using echo-planar imaging (Ojemann et al. 1997, Deichmann et al. 2003). Therefore, our results are agnostic with respect to the contribution of ventral prefrontal areas to pain-avoidance learning, and their potential modulation by levodopa or naltrexone. Finally, regarding the role of neuromodulators, we focused on dopamine and endogenous opioids, but other neuromodulators are almost certainly involved in pain-avoidance learning as well.
In particular, future work may focus on the serotonergic system, which has traditionally been associated with aversive processing, behavioral inhibition, and "fight or flight" responses in rodents (Deakin 1983, Soubrie 1986 …

In sum, our results suggest that received and avoided pain drive human pain-avoidance learning via two different learning systems, in terms of both learning rates and the neural encoding of prediction errors. In addition, our computational-modeling results provide evidence for a causal role of the dopamine and endogenous opioid systems in learning from avoided, but not received, pain. Future studies are needed to elucidate the neural mechanisms via which our dopamine and opioid manipulations affected learning rates for avoided pain, and to reveal the potential role of other neuromodulators in pain-avoidance learning.

… in the study. Participants reported no history of psychiatric, neurological, or pain disorders, and no current pain. Participants were instructed to abstain from using alcohol or recreational drugs 24 hours prior to testing, and not to eat or drink (except for water) 2 hours prior to testing. The study was approved by the medical ethics committee of the Leiden University Medical Center, and all participants provided written informed consent. Participants received a fixed amount of €60 plus a variable bonus of maximally €5 related to their performance on an additional task.
Six participants were excluded from all analyses because of thermode failure, and two additional participants because of poor task performance (see 'Pain-avoidance learning task' section below). In addition, nine participants were excluded from the fMRI, but not the …

Two to fourteen days prior to the fMRI session, we assessed participants' eligibility using a general health questionnaire and an fMRI safety screening form. During this screening session, participants also practiced the pain-avoidance learning task.

Each eligible participant took part in one fMRI session. On the day of the fMRI session, participants received a single oral dose of either 100 mg levodopa, 50 mg naltrexone, or placebo, according to a double-blind, randomized, between-subject design. Levodopa was combined with 25 mg carbidopa (a decarboxylase inhibitor that does not cross the blood-brain barrier) to inhibit the conversion of levodopa to dopamine in the periphery. Approximately 30 minutes after drug administration, participants were positioned in the MRI scanner, after which we acquired a high-resolution structural scan. Approximately 53 minutes after drug administration, participants completed a 5-minute pain-rating task during which they received a series of (unavoidable) heat stimuli of varying temperatures and rated their experienced pain following each stimulus (Figure 1-figure supplement 2). This task was included to test for drug effects on subjective pain responses, and to select a painful yet tolerable temperature for each participant in the pain-avoidance learning task.
Sixty minutes after drug administration, roughly corresponding with peak plasma concentrations of levodopa and naltrexone, participants started the pain-avoidance learning task (described below), which lasted approximately 45 minutes. Following the pain-avoidance learning task, participants performed an 8-minute probabilistic reward-learning task (not reported here).

We measured participants' subjective state at the beginning (before drug administration) and end (two hours after drug administration) of the test session, by means of visual analogue scales measuring alertness, calmness and contentment (Bond 1974; Figure 1-figure supplement 1). Both subjective-state measurements were collected outside the scanner.

Pain-avoidance learning task
This instrumental pain-avoidance learning task contained 144 trials, divided into 4 runs of 36 trials. On each trial, participants made a choice between two options (a diamond and a circle). Choosing each option was associated with a specific probability of receiving a painful heat stimulation. The probabilities of receiving pain when choosing each option drifted across trials according to two independent random walks (Figure 1A). We used three different pairs of random walks (each pair crossed at least once); each pair was administered to approximately one third of the participants in each treatment group.

Each trial started with the presentation of the two choice options, randomly displayed at the left and right side of the screen for 1800 ms (Figure 1B). During this period, participants had to select one option by pressing the left or right button of the response unit, using their right index or middle finger, respectively. If participants did not respond in time (1.2% of trials), the computer randomly selected an option for them. The chosen option was highlighted for 200 ms, followed by an anticipation period of 3, 5, or 7 seconds during which a white asterisk (*) was presented in the center of the screen. Then the outcome was presented: either a painful heat stimulus applied to the participant's leg for 1.9 s (see 'Thermal stimulation' section below) or no stimulus. Outcome onset was accompanied by a change of the central asterisk to a colored plus sign (+).
The plus sign was red or green during the first 200 ms of each pain and no-pain outcome, respectively, after which it turned white for the remainder of the outcome period. The color change was meant to prevent outcome uncertainty during the initial phase of the outcome period. Each trial ended with an inter-trial interval of 6, 8, or 10 seconds during which an asterisk was presented. Except for the outcome probabilities, participants were fully informed about the task structure and procedure.

During the fMRI session, one participant switched choices more frequently following the absence of pain than following pain, and another participant deliberately did not make a choice on 17% of the trials, due to the use of irrelevant strategies. We excluded these participants from further analysis.
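As a rough consistency check, the trial timings above imply an approximate task duration. The sketch below rests on assumptions that the text does not state explicitly: the 1800-ms choice window is fixed regardless of response time, the 200-ms highlight follows that window, the outcome period lasts as long as the 1.9-s (50°C) heat stimulus, and each anticipation and inter-trial interval value occurs equally often. The constant names are illustrative only.

```python
from statistics import mean

# Durations in seconds, taken from the task description; how they combine is assumed.
CHOICE_WINDOW = 1.8        # options on screen (assumed fixed length)
HIGHLIGHT = 0.2            # chosen option highlighted (assumed to follow the choice window)
ANTICIPATION = (3, 5, 7)   # jittered anticipation periods (assumed equally frequent)
OUTCOME = 1.9              # outcome period (assumed equal to the 50 °C stimulus duration)
ITI = (6, 8, 10)           # jittered inter-trial intervals (assumed equally frequent)
TRIALS_PER_RUN, N_RUNS = 36, 4


def mean_trial_duration():
    """Average trial length in seconds under the assumptions above."""
    return CHOICE_WINDOW + HIGHLIGHT + mean(ANTICIPATION) + OUTCOME + mean(ITI)


trial_s = mean_trial_duration()
run_min = trial_s * TRIALS_PER_RUN / 60
task_min = run_min * N_RUNS
print(f"mean trial: {trial_s:.1f} s, run: {run_min:.1f} min, task: {task_min:.1f} min")
```

The resulting in-scanner trial time (about 40 minutes) sits below the reported ~45 minutes. That gap is plausible given the pauses between runs, during which the thermode was moved and an initial heat stimulus was given.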

Thermal stimulation
Heat stimuli (ramp rate = 40°C/s; 1 second at target temperature) were applied to the inner side of participants' left lower leg using a Contact Heat-Evoked Potential Stimulator (CHEPS; 27-mm diameter Peltier thermode; Medoc Ltd., Israel). After the initial pain-rating task, 16% of the participants (4 in the placebo, 4 in the levodopa, and 7 in the naltrexone group) indicated that they would not tolerate repeated stimulation at the highest temperature they had received so far (50°C). For those participants, we used a temperature of 49°C in the pain-avoidance learning task. For the remaining participants, we used a temperature of 50°C. Between stimulations the thermode maintained a baseline temperature of 32°C. The total duration of each stimulation was 1850 ms (425 ms ramp-up and ramp-down periods, 1 second at target temperature) for 49°C stimuli, and 1900 ms (450 ms ramp-up and ramp-down periods, 1 second at target temperature) for 50°C stimuli. After each scan run, we moved the thermode to a new site on the participant's leg. To reduce the impact of potential site-specific habituation (…), we administered one initial heat stimulus before starting the first trial on a new skin site.
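The reported stimulus durations follow from the ramp rate and the 32°C baseline. Each ramp takes (target − baseline) / rate, and the total is two ramps plus 1 s at target. The quick check below assumes that ramp-up and ramp-down both run at the same 40°C/s rate, which reproduces the reported values.

```python
BASELINE_C = 32          # thermode baseline temperature (°C)
RAMP_RATE_C_PER_S = 40   # ramp rate (°C/s), assumed identical for ramp-up and ramp-down
PLATEAU_MS = 1000        # time at target temperature (ms)


def ramp_ms(target_c):
    """Duration of one ramp (up or down) between baseline and target, in ms."""
    return (target_c - BASELINE_C) * 1000 / RAMP_RATE_C_PER_S


def total_ms(target_c):
    """Ramp-up + plateau + ramp-down, in ms."""
    return 2 * ramp_ms(target_c) + PLATEAU_MS


for target in (49, 50):
    print(target, ramp_ms(target), total_ms(target))
# 49 425.0 1850.0
# 50 450.0 1900.0
```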

Behavioral analyses
We tested whether the total number of received pain stimuli, the proportion of pain trials followed by a switch to the other choice option, and the proportion of no-pain trials followed by a switch to the other choice option differed between the three treatment groups, using one-way ANOVAs. In addition, for each treatment group, we used logistic regression to analyze the probability of switching choices as a function of outcome (pain vs. no-pain) over the six previous trials.

… learning rate for all outcomes, whereas Model 2 uses separate learning rates for pain and no-pain outcomes: $\alpha_{\text{pain}}$ and $\alpha_{\text{no-pain}}$, respectively.

Both models were combined with a softmax decision function, which computes the probability of choosing stimulus $s$ on trial $t$ ($P_{s,t}$) as:

$$P_{s,t} = \frac{\exp(-\beta\, Q_{s,t})}{\sum_{s'} \exp(-\beta\, Q_{s',t})}$$

The inverse-temperature parameter $\beta$ controls the sensitivity of choice probabilities to differences in Q values. If $\beta$ is 0, both stimuli are equally likely to be chosen, irrespective of their expected pain probabilities. As $\beta$ increases, the probability that the model chooses the stimulus with the lower expected pain probability increases. Thus, Model 1 has two free parameters ($\alpha$ and $\beta$) and Model 2 has three free parameters ($\alpha_{\text{pain}}$, $\alpha_{\text{no-pain}}$ and $\beta$).

Parameter estimation
We estimated the model parameters with a hierarchical Bayesian approach, using the hBayesDM package (Ahn et al. 2017). The hierarchical Bayesian approach assumes that
every participant has a different set of model parameters, which are drawn from group-level prior distributions (Gelman 2014). The parameters governing the group-level prior distributions (hyperparameters) are also assigned prior distributions (hyperpriors). We estimated separate group-level parameters for the placebo, levodopa, and naltrexone groups. To test for treatment effects, we compared the posterior distributions of the group-level means (i.e., the hyperparameters governing the means of the group-level distributions) for each drug group vs. the placebo group. … 6 is considered positive evidence, a difference of 6 to 10 strong evidence, and a difference >10 very strong evidence for one model over another (Kass and Raftery 1995).

Computation of expected pain probability for use in fMRI analyses
We computed trial-specific expected pain probabilities (to be used as parametric modulator regressors in the fMRI analyses) by applying our winning model to each participant's sequence of choices and outcomes. In line with previous studies (Pine et al. …

… and any values with a significant χ² value (corrected for multiple comparisons based on the more stringent of either false discovery rate or Bonferroni methods) were considered outliers. On average 3.7% of images were outliers (SD = 1.9). The output of this procedure was later used as a covariate of noninterest in the first-level models.

Functional images were slice-acquisition-timing and motion corrected using SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK). Structural T1-weighted images were coregistered to the first functional image for each subject using an iterative procedure of automated registration using mutual information coregistration in SPM8 and manual adjustment of the automated algorithm's starting point until the automated procedure provided satisfactory alignment. Structural images were normalized to MNI space using SPM8, interpolated to 2×2×2 mm voxels, and smoothed using a 6-mm full-width at half maximum Gaussian kernel.
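The computation of trial-specific expected pain probabilities described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes a delta-rule update of the chosen option only (the update equation is not reproduced in this section), separate learning rates for pain and no-pain outcomes as in Model 2, a hypothetical starting value of 0.5 for both options, and the softmax rule given above with Q equal to expected pain probability. The function names, the toy trial sequence, and the parameter values are all hypothetical. The chosen option's value before each outcome is what would enter the first-level GLM as the expected-pain-probability modulator.

```python
import math


def softmax_choice_probs(q, beta):
    """P(choose s) given expected pain probabilities q; larger beta favours lower q."""
    q_min = min(q)  # shift for numerical stability; leaves the probabilities unchanged
    weights = [math.exp(-beta * (qs - q_min)) for qs in q]
    z = sum(weights)
    return [w / z for w in weights]


def expected_pain_trajectory(choices, pains, alpha_pain, alpha_nopain, beta, q0=0.5):
    """Per trial: chosen option's pre-outcome expected pain probability and the
    model's probability of that choice; also returns the total log-likelihood."""
    q = [q0, q0]
    epp, p_choice, loglik = [], [], 0.0
    for c, pain in zip(choices, pains):
        probs = softmax_choice_probs(q, beta)
        epp.append(q[c])
        p_choice.append(probs[c])
        loglik += math.log(probs[c])
        alpha = alpha_pain if pain else alpha_nopain
        q[c] += alpha * ((1.0 if pain else 0.0) - q[c])  # update the chosen option only
    return epp, p_choice, loglik


choices = [0, 0, 1, 1, 1, 0]  # hypothetical choices (0 = diamond, 1 = circle)
pains = [1, 0, 0, 1, 0, 0]    # hypothetical outcomes (1 = pain)
epp, p_choice, ll = expected_pain_trajectory(
    choices, pains, alpha_pain=0.6, alpha_nopain=0.3, beta=5.0)
print([round(v, 3) for v in epp])
```

In a fitting context, the log-likelihood returned here is the quantity that an estimation procedure (hierarchical or otherwise) would evaluate for each parameter set.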

Axiomatic analysis of general aversive and appetitive prediction error signals
For the first-level analysis, we created a general linear model (GLM) for each participant, concatenated over the four pain-avoidance learning runs, in SPM8. We modeled periods of decision time (cue onset until response, mean response time = 758 ms), outcome anticipation (3-7 s), onsets of pain outcomes (1 s), and onsets of no-pain outcomes (1 s), using boxcar regressors convolved with the canonical hemodynamic response function. As in our previous study (Roy et al. 2014), we only modeled the first second of the outcome periods, as this is when prediction errors are triggered. We added the model-derived expected pain probability as a parametric modulator on the outcome-anticipation and outcome-onset regressors. To control for potential effects of outcome-anticipation duration, we also included anticipation duration as a first parametric modulator on the outcome-onset regressors (using serial orthogonalization, such that any shared variance between expected pain probability and anticipation duration is assigned to the anticipation-duration effect). Other regressors of non-interest (nuisance variables) were: (i) "dummy" regressors coding for each run (intercept for each but the last run); (ii) linear drift across time within each run; (iii) the 6 estimated head movement parameters (x, y, z, roll, pitch, and yaw), their mean-zeroed squares, their derivatives, and squared derivatives for each run (total 24 columns per run); (iv) indicator vectors for outlier time points identified based on their multivariate distance from the other images in the sample (see above); and (v) indicator vectors for the first two images in each run. Low-frequency noise was removed by employing a high-pass filter of 180 seconds.
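Serial orthogonalization of parametric modulators, as used above, can be illustrated with a small sketch in the spirit of SPM's behaviour. This is our simplified re-implementation, not SPM code. Each modulator is mean-centred, and each later modulator is replaced by its residual after regressing out the earlier ones (Gram-Schmidt in the given order). As a result, variance shared with anticipation duration is credited to the anticipation-duration modulator, which is entered first. The trial values are hypothetical.

```python
def center(x):
    """Subtract the mean from a sequence."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def serial_orthogonalize(modulators):
    """Mean-centre each modulator, then remove its projection onto every earlier
    (already orthogonalized) modulator, in the order given."""
    out = []
    for m in modulators:
        r = center(m)
        for prev in out:
            denom = dot(prev, prev)
            if denom > 0:
                coef = dot(r, prev) / denom
                r = [ri - coef * pi for ri, pi in zip(r, prev)]
        out.append(r)
    return out


# Hypothetical outcome-onset trials: anticipation duration (s) first, then expected pain probability.
duration = [3, 5, 7, 3, 7, 5]
epp = [0.2, 0.5, 0.7, 0.3, 0.6, 0.4]
dur_c, epp_orth = serial_orthogonalize([duration, epp])
print(dot(dur_c, epp_orth))  # should be (numerically) zero: no shared variance remains
```

Reversing the order of the two modulators would instead credit the shared variance to expected pain probability, which is why the order in which parametric modulators are entered matters.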
For each participant, we created the following three contrast maps, corresponding to the axioms for general aversive prediction errors in our task: (i) pain onset > no-pain onset; (ii) negative correlation with expected pain probability at pain onset; and (iii) negative correlation with expected pain probability at no-pain onset. We performed a second-level (group) analysis on each of these contrasts using robust regression (Wager et al. 2005). We tested for drug effects by including two second-level regressors coding for levodopa vs. placebo (weights [-1 1 0] for the treatment groups [P L N]) and naltrexone vs. placebo (weights [-1 0 1] for the treatment groups [P L N]). All maps were thresholded at FDR q < 0.05 corrected for multiple comparisons across the whole brain (gray matter masked), with higher (more conservative) voxel thresholds superimposed for display in Figure 3A.

To identify brain regions that fulfilled all three axioms for general aversive prediction errors, we examined the conjunction between the three second-level contrast maps described above (pain > no pain, negative correlation with expected pain probability on pain trials, and negative correlation with expected pain probability on no-pain trials), each of them thresholded at FDR q < 0.05.
To identify brain regions that fulfilled all three axioms for general appetitive prediction errors, we examined the conjunction between the three opposite contrast maps (no pain > pain, positive correlation with expected pain probability on pain trials, and positive correlation with expected pain probability on no-pain trials).
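The FDR thresholding and conjunction steps can be sketched on a toy set of voxel-wise p values. We assume the Benjamini-Hochberg procedure (the text specifies FDR q < 0.05 but not the exact procedure) and one-sided p values per axiom contrast. A conjunction is treated as a voxel-wise logical AND of the separately thresholded maps. All p values are hypothetical.

```python
def bh_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up: largest p(k) with p(k) <= (k / m) * q, or None."""
    m = len(pvals)
    threshold = None
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k / m * q:
            threshold = p
    return threshold


def fdr_mask(pvals, q=0.05):
    """Boolean map of voxels surviving FDR correction at level q."""
    t = bh_threshold(pvals, q)
    return [t is not None and p <= t for p in pvals]


def conjunction(*masks):
    """Voxel-wise logical AND across thresholded maps."""
    return [all(voxel) for voxel in zip(*masks)]


# Hypothetical one-sided p values for 6 voxels, one list per aversive-PE axiom.
p_pain_gt_nopain = [0.001, 0.002, 0.300, 0.004, 0.010, 0.800]
p_neg_epp_pain = [0.003, 0.001, 0.002, 0.500, 0.008, 0.700]
p_neg_epp_nopain = [0.002, 0.400, 0.001, 0.003, 0.006, 0.900]

masks = [fdr_mask(p) for p in (p_pain_gt_nopain, p_neg_epp_pain, p_neg_epp_nopain)]
print(conjunction(*masks))
```

Only voxels that survive all three maps enter the conjunction; a voxel that passes two axioms but fails the third (such as the second and fourth voxels here) is excluded.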

Analysis of outcome-specific prediction-error signals
To examine outcome-specific prediction-error signals, we used the same first-level GLM as in the previous analysis, but created different contrast maps. First, to identify activation that tracks surprise more for received than avoided pain (Figure 4A), we used the following contrast: 'negative correlation with expected pain probability at pain onset' > 'positive correlation with expected pain probability at no-pain onset'. Second, to identify activation tracking absolute prediction error (Figure 4B), we used the following contrast: 'negative correlation with expected pain probability at pain onset' > 'negative correlation with expected pain probability at no-pain onset'. Note that this contrast is identical to: 'positive correlation with expected pain probability at no-pain onset' > 'positive correlation with expected pain probability at pain onset'. It is also identical to a contrast with weights [1 1] for the 'negative correlation with expected pain probability at pain onset' and 'positive correlation with expected pain probability at no-pain onset' regressors.

We performed a second-level analysis on each of these two contrasts using robust regression, again including two second-level regressors coding for levodopa vs. placebo and naltrexone vs. placebo. Maps were again thresholded at FDR q < 0.05 corrected for multiple comparisons across the whole brain. Adjacent areas thresholded at p < 0.01 and p < 0.05 (uncorrected) were added for display in Figure 4A and 4B.

Finally, to identify regions encoding outcome-specific prediction errors that cannot be explained by a general sensitivity to expected pain probability (also see the Results section), we examined the conjunction between the two second-level contrast maps described above, each thresholded at FDR q < 0.05 (Figure 4C and 4D).
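The logic of these two contrasts and their conjunction can be checked with a small sketch. Following the definitions above, let b_pain and b_nopain denote a voxel's slope on expected pain probability at pain and no-pain onsets. The first contrast is then (-b_pain) - b_nopain, and the second is (-b_pain) + b_nopain. The voxel profiles are hypothetical and noise-free. A sign test with a small tolerance stands in for the FDR-thresholded maps used in the actual analysis.

```python
def contrast_a(b_pain, b_nopain):
    """'negative corr. with EPP at pain onset' > 'positive corr. with EPP at no-pain onset'."""
    return -b_pain - b_nopain


def contrast_b(b_pain, b_nopain):
    """'negative corr. at pain onset' > 'negative corr. at no-pain onset'
    (weights [1 1] on -b_pain and +b_nopain): tracks absolute prediction error."""
    return -b_pain + b_nopain


def classify(b_pain, b_nopain, tol=1e-12):
    """Conjunction: contrast A (positive or negative) masked by positive contrast B."""
    a, b = contrast_a(b_pain, b_nopain), contrast_b(b_pain, b_nopain)
    if b > tol and a > tol:
        return "pain-specific PE"
    if b > tol and a < -tol:
        return "no-pain-specific PE"
    return "not in conjunction"


# Hypothetical voxel profiles: (slope at pain onset, slope at no-pain onset).
profiles = {
    "pain-specific": (-1.0, 0.0),      # tracks 1 - EPP only when pain is received
    "no-pain-specific": (0.0, 1.0),    # tracks EPP only when pain is avoided
    "expected safety": (-1.0, -1.0),   # tracks 1 - EPP regardless of outcome
    "expected pain": (1.0, 1.0),       # tracks EPP regardless of outcome
}
for name, (bp, bn) in profiles.items():
    print(f"{name:17s} A={contrast_a(bp, bn):+.1f} B={contrast_b(bp, bn):+.1f} -> {classify(bp, bn)}")
```

The outcome-nonspecific 'expected safety' profile drives contrast A on its own but cancels in contrast B. The conjunction therefore excludes it, which is the confound described in the Results.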

Conventional prediction error analysis
We also conducted a conventional (not axiomatic) analysis to identify brain activation related to aversive prediction errors. In this analysis, we created a single regressor modeling all outcome onsets (pain and no pain) and added the aversive prediction error (computed as the outcome [pain = 1, no pain = 0] minus the expected pain probability) as a parametric modulator on this outcome-onset regressor. The other regressors were identical to those in the previous analyses. We created first-level contrast images for the prediction-error effect and conducted a second-level analysis on these contrast images, again thresholded at FDR q < 0.05 (Figure 3-figure supplement 1A). We also repeated this analysis while adding the outcome (pain = 1, no pain = -1) as an additional parametric modulator on the outcome-onset regressor (without serial orthogonalization, such that only the unique variance explained by the prediction-error and outcome variables is assigned to their respective effects; Figure 3).
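To illustrate why entering both modulators without serial orthogonalization matters, the following hedged sketch fits simulated trial-wise responses on the prediction error alone and on the prediction error and outcome jointly. It assumes simulated data, ignores hemodynamic convolution, session effects and all other regressors, and uses plain least squares; `prediction_error` and `ols` are hypothetical helper names. The simulated region codes the outcome only, so the prediction-error-only model attributes outcome-related variance to the prediction error, whereas the joint fit assigns each modulator only the variance it explains uniquely.

```python
import numpy as np


def prediction_error(outcome, p_pain):
    """Aversive prediction error: outcome (pain = 1, no pain = 0) minus expected pain probability."""
    return np.asarray(outcome, dtype=float) - np.asarray(p_pain, dtype=float)


def ols(X, y):
    """Least-squares coefficients with all columns fitted jointly (no orthogonalization)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n_trials = 200
p_pain = rng.uniform(0.1, 0.9, n_trials)                        # simulated expectations
outcome = (rng.uniform(size=n_trials) < p_pain).astype(float)   # 1 = pain, 0 = no pain
pe = prediction_error(outcome, p_pain)
outcome_pm = 2.0 * outcome - 1.0                                # pain = 1, no pain = -1

# Simulated trial-wise response of a region that codes the outcome only.
y = 2.0 * outcome_pm

intercept = np.ones(n_trials)
pe_only = ols(np.column_stack([intercept, pe]), y)
joint = ols(np.column_stack([intercept, pe, outcome_pm]), y)
```

Here the joint fit returns a prediction-error coefficient that is numerically zero and an outcome coefficient of 2, while the prediction-error-only fit returns a positive prediction-error coefficient.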
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version was posted October 18, 2021; doi: https://doi.org/10.1101/2021.10.18.464769.

To assess potential treatment effects on subjective state, we conducted analyses of covariance (ANCOVAs) on the post-treatment ratings of alertness, calmness and contentment (made two hours after drug intake), with treatment as a between-subject factor and the pre-treatment ratings as a covariate. There was no significant effect of treatment on any of these ratings (all p's > 0.08), suggesting that the drugs did not affect subjective state.
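An ANCOVA of this form can be written as a comparison between two least-squares models: a reduced model with an intercept and the pre-treatment rating, and a full model that also includes dummy-coded treatment groups. The sketch below is a minimal NumPy version under stated assumptions (ordinary least squares, treatment coding, a single covariate, no test of homogeneity of regression slopes). The function name `ancova_treatment_F` is hypothetical, and the text does not state which software was used.

```python
import numpy as np


def ancova_treatment_F(post, pre, group):
    """F-test for a between-subject group effect on `post`, adjusting for `pre`.

    Compares a full model (intercept + pre + group dummies) with a reduced
    model (intercept + pre). Returns (F, df_effect, df_error).
    """
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    group = np.asarray(group)
    levels = np.unique(group)
    n = post.size
    # Treatment coding: one dummy per group except the first (reference) level.
    dummies = np.column_stack([(group == g).astype(float) for g in levels[1:]])
    X_reduced = np.column_stack([np.ones(n), pre])
    X_full = np.column_stack([X_reduced, dummies])

    def rss(X):
        # Residual sum of squares of the least-squares fit of `post` on X.
        coef, *_ = np.linalg.lstsq(X, post, rcond=None)
        resid = post - X @ coef
        return float(resid @ resid)

    df_effect = len(levels) - 1
    df_error = n - X_full.shape[1]
    F = ((rss(X_reduced) - rss(X_full)) / df_effect) / (rss(X_full) / df_error)
    return F, df_effect, df_error
```

A p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_effect, df_error)`.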
Figure 1-figure supplement 2. Pain ratings during a pain-rating task that preceded the pain-avoidance learning task, as a function of stimulus temperature and treatment group.
Error bars indicate standard errors. Participants received five 47°C, five 49°C, and five 50°C heat stimuli, in random order, to their left lower leg (ramp rate = 40°C/second; 1 second at peak temperature; stimulus onset asynchrony = 17-25 seconds). Following each heat stimulus, participants rated their experienced pain on a 100-unit visual analog scale with anchors of "no pain" and "worst-imaginable pain", respectively. We conducted a mixed ANOVA on participants' pain ratings with stimulus temperature (47°C, 49°C and 50°C coded as -1, 0 and 1, respectively) as a within-subject factor and treatment as a between-subject factor. Pain ratings increased as a function of stimulus temperature (F(1,84) = 424, p < 0.001). However, there was no main effect of treatment (F(2,84) = .21, p = .81) and no temperature x treatment interaction (F(2,84) = .76, p = .47). Thus, the drugs did not affect the subjective pain experience evoked by heat-pain stimuli. Note that four participants who were included in this analysis were excluded from the fMRI analyses of the pain-avoidance learning task because of excessive head movement.
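Because every participant rated all three temperatures, the between-subject parts of such a mixed ANOVA can be reproduced with one-way ANOVAs on per-participant summary scores. The treatment main effect is a one-way ANOVA on each participant's mean rating, and the temperature x treatment interaction for the linear temperature code (-1, 0, 1) is a one-way ANOVA on each participant's linear contrast score. With three groups, the reported F(2,84) tests imply 87 participants in this analysis. This is a sketch under stated assumptions (ratings averaged within temperature per participant, hypothetical function names, simulated data), not the authors' analysis code; the within-subject temperature main effect is omitted.

```python
import numpy as np

# Temperature coding from the caption: 47, 49, 50 °C -> -1, 0, 1.
LINEAR_CODES = np.array([-1.0, 0.0, 1.0])


def one_way_anova(scores, groups):
    """One-way between-subject ANOVA. Returns (F, df_between, df_within)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand_mean = scores.mean()
    ss_between = 0.0
    ss_within = 0.0
    for g in levels:
        s = scores[groups == g]
        ss_between += s.size * (s.mean() - grand_mean) ** 2
        ss_within += ((s - s.mean()) ** 2).sum()
    df_between = len(levels) - 1
    df_within = scores.size - len(levels)
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within


def between_subject_effects(ratings, groups):
    """ratings: one row per participant, columns = mean rating at 47, 49, 50 °C."""
    ratings = np.asarray(ratings, dtype=float)
    subject_means = ratings.mean(axis=1)      # for the treatment main effect
    linear_scores = ratings @ LINEAR_CODES    # for the temperature x treatment interaction
    return {
        "treatment": one_way_anova(subject_means, groups),
        "temperature_x_treatment": one_way_anova(linear_scores, groups),
    }
```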

[...] respectively. There was only one nominally significant treatment effect: the negative correlation with expected pain probability on no-pain trials (axiom 3) in the PAG cluster was stronger in the placebo group than in the drug groups (p = .04). As this result did not survive correction for multiple tests, and there were no significant treatment effects for the other prediction-error contrasts in any of the clusters, we conclude that our pharmacological manipulations did not affect prediction-error-related brain activation. Error bars indicate standard errors. * p < .05 (uncorrected for multiple tests).