The role of dopaminergic nuclei in predicting and experiencing gains and losses: A 7T human fMRI study

The ability to predict the outcomes of actions based on experience is crucial for making successful decisions in new or dynamic environments. In animal studies using electrophysiology, it was found that dopamine neurons, located in the substantia nigra (SN) and the ventral tegmental area (VTA), have a crucial role in feedback-based learning. However, human neuroimaging studies have provided inconclusive results. The present work used ultrahigh field (7 Tesla) structural and functional MRI and optimized protocols to extract SN and VTA signals in human participants. In a number-guessing task, we found significant correlations with reward prediction error and risk in both the SN and the VTA and no correlation with expected value. We also found a surprise signal in the SN. These results are in line with a recent framework that proposed a differential role for the VTA and the SN in, respectively, learning of values and surprise.


Introduction
In order to adapt to an ever-changing environment, it is crucial for individuals to

Behavior
To check whether participants were engaged in the task, we introduced test trials in which, instead of seeing their reward, participants had to say whether they had won or lost in that specific trial. This was only possible if they still remembered the first number and their initial bet. Three blocks (from three different participants) were discarded based on behavior: one block because three of the five test trials were answered incorrectly, and the other two because twelve of the sixty bets were missed.
In the remaining blocks, and over the two blocks (i.e., 120 total trials), participants made on average 1.0 mistakes (SD=1.05, min=0, max=4), missed on average 4.48 trials (SD=3.65, min=0, max=12), and chose on average the right option on 57.81 trials (SD=13.75, min=21, max=88).

Anatomical masks
To measure the inter-rater reliability of the individual SN and VTA segmentation, we calculated Dice scores (see Table 1). In general, higher scores were obtained for the SN than for the VTA. This is not surprising, because Dice scores are sensitive to overall size (the SN is approximately 3.7 times bigger than the VTA), and because the VTA lacks clear anatomical borders. By keeping only those voxels that both raters agreed on (i.e., the conjunction masks), we ensured that the voxels included in the analyses lie exclusively in the investigated ROIs.
In addition to the Dice scores, we also calculated the percentage of overlap between our individual conjunction masks and previously proposed group-level subdivisions of the SN and the VTA (Pauli, Nili, & Tyszka, 2018; Zhang et al., 2017), transformed to the individual space (see Figure 3). This measure gives an idea of how much signal from the neighbouring nuclei is mixed with the signal of the targeted structure when using population-based instead of individual masks. This measure does not include further mixing of the signal due to techniques such as spatial smoothing (which may further increase it). We found significant overlap between the medial parts of the SN of the group-level subdivisions and our individual VTA masks. Specifically, there was a mean overlap of 7.23 percent (SD=10.14, min=0.00, max=34.58, t(53)=5.19, p<0.001) with the medial part of the SNc (mSNc), and a mean overlap of 1.3 percent (SD=2.14, min=0.00, max=8.36, t(53)=4.41, p<0.001) with the lateral part of the SNc (lSNc) as defined by Zhang et al. (2017); and a mean overlap of 1.56 percent (SD=2.21, min=0.00, max=11.93, t(53)=5.13, p<0.001) with the SNc as defined by Pauli et al. (2018).
We also found a significant overlap be- group-level masks appear to be accurate to some extent, they often include neighbouring areas (such as the red nucleus, see the top left quadrant in Figure S3) or exclude parts of the targeted areas (such as in the lower right quadrant in Figure S3). Therefore, only by drawing individual masks and avoiding spatial smoothing can we be sure not to mix signals from different midbrain nuclei.
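The mask measures used in this section can be illustrated with a minimal sketch. It assumes that each mask is represented as a set of voxel coordinates, and that the overlap percentage is taken relative to the size of the individual mask; the text does not state the denominator, so that choice, the function names, and the toy coordinates are all hypothetical.

```python
def dice_score(mask_a, mask_b):
    """Dice coefficient between two binary masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def conjunction_mask(mask_a, mask_b):
    """Keep only the voxels that both raters marked."""
    return mask_a & mask_b


def percent_overlap(individual_mask, group_mask):
    """Percentage of individual-mask voxels also covered by a group-level mask.

    Assumption: the denominator is the individual mask size.
    """
    if not individual_mask:
        return 0.0
    return 100 * len(individual_mask & group_mask) / len(individual_mask)


# Toy example: two raters agree on three of their four voxels each.
rater_a = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
rater_b = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
group_level = {(0, 0, 0), (2, 2, 2)}  # hypothetical group-level subdivision

conjunction = conjunction_mask(rater_a, rater_b)
dice = dice_score(rater_a, rater_b)
overlap = percent_overlap(conjunction, group_level)
print(dice, len(conjunction), overlap)
```

Because the Dice score divides the shared voxels by the summed mask sizes, a disagreement of a few border voxels lowers the score more for a small structure than for a large one, which is consistent with the lower scores reported for the VTA.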

Finally, we calculated the temporal signal-to-noise ratio (tSNR) across the ROIs (see Figure S4). The tSNR was lower, yet comparable to the one reported by de Hollander et al.

For the functional analyses, two blocks of trials (from two different participants) were discarded because of excessive head movement, defined as a mean framewise displacement (FD; Power et al., 2014) above .3 mm. Because one of these blocks had already been discarded based on behavior, a total of four blocks were excluded from the final analyses. In the remaining blocks, and over the two blocks, participants had an average mean FD of .14 mm (SD=.06, min=.04, max=.27).
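To make the exclusion criterion concrete, the following is a minimal sketch of a framewise displacement computation in the commonly used formulation attributed to Power and colleagues: the sum of the absolute volume-to-volume changes in the six rigid-body parameters, with rotations converted to millimetres on a sphere. The 50 mm radius, the radian units for rotations, the exclusion of the first volume from the mean, and all function names are assumptions of this sketch, not details taken from the study's pipeline.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Return FD (in mm) for each volume after the first.

    motion: sequence of (tx, ty, tz, rx, ry, rz) per volume, with
    translations in mm and rotations in radians (assumed units).
    """
    fd = []
    for prev, curr in zip(motion, motion[1:]):
        # Absolute change in the three translations.
        translation = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        # Rotations become arc lengths on a sphere of the given radius.
        rotation = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], curr[3:]))
        fd.append(translation + rotation)
    return fd


def mean_fd(motion, radius_mm=50.0):
    """Mean FD over all volume-to-volume transitions."""
    fd = framewise_displacement(motion, radius_mm)
    return sum(fd) / len(fd)


def exceeds_threshold(motion, threshold_mm=0.3):
    """True if a block would be discarded under a mean-FD threshold."""
    return mean_fd(motion) > threshold_mm


# Toy motion parameters for three volumes (made-up values).
toy_motion = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.2, 0.0, 0.002, 0.0, 0.0),
]
print(framewise_displacement(toy_motion), exceeds_threshold(toy_motion))
```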

Results of the ROI-wise GLM are shown in Table 2 and Figure 4. First, we investigated the signal related to expectations (i.e., EV and risk) in both the SN and the VTA, corresponding to the presentation of the first number. We found no parametric correlations between the signal in any of the ROIs and the EV, with the Bayes factor (BF) pointing to substantial (Jeffreys, 1961) evidence for the null hypothesis. However, there were significant correlations with risk in both the left-VTA (t(26)=-2.34, p<0.05) and the left-SN (t(26)=-2.44, p<0.05). Next, we investigated the signal related to feedback processing (i.e., RPE and surprise), corresponding to the presentation of the second number. There were significant correlations with RPE in the left- and right-VTA (t(26)=3.12, p<0.05, and t(26)=2.76, p<0.05) and in the right-SN (t(26)=2.54, p<0.05). Finally, we found a correlation with surprise in the right-SN (t(26)=2.32, p<0.05), and no effect in the VTA, with the BF providing substantial support for the null hypothesis. In sum, both the VTA and the SN were linked to risk before the outcome was revealed as well as to RPE after the outcome was revealed. These results confirm previous findings from Fiorillo et al. (2003) regarding the role of dopamine neurons in risk processing and previous findings from, e.g., Schultz (1998) regarding the role of dopamine neurons in RPE processing, but not regarding a possible role of these nuclei in EV processing. Only the SN was additionally associated with outcome surprise, similarly to Matsumoto and Hikosaka (2009). As a control analysis (see Table S2), we also fit a GLM using the design of Preuschoff et al. (2006). In particular, we fit separate regressors for the first and second epoch after presenting the first number (where the first epoch lasted 1 second and the second epoch lasted 3 seconds). In these analyses, we found significant correlations with risk (in both epochs) and RPE across both the SN and the VTA. However, contrary to the results of our primary analysis, we also found a significant correlation with EV in the second epoch in the right-SN and left-VTA, and no significant correlation with surprise. Note, however, that the high correlation between regressors in the first and second epochs (see Figure S5) might limit the sensitivity of our analysis given our particular task.
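The group-level statistics reported above are t-tests on the mean of a GLM coefficient across participants, with 27 participants giving 26 degrees of freedom. A minimal sketch of that statistic, using only the Python standard library, could look as follows. The coefficient values are invented for illustration, and the p-value is left out because the standard library has no t-distribution; a statistics package would normally supply it.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == popmean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample SD (n - 1 in the denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - popmean) / standard_error, n - 1


# Hypothetical per-participant coefficients for one predictor in one ROI.
toy_betas = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, dof = one_sample_t(toy_betas)
print(round(t_stat, 3), dof)
```

With 27 values in `toy_betas`, `dof` would be 26, matching the t(26) notation used in the text.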

To explore other subcortical and cortical correlates of expectation- and feedback-related processes, we fit the same GLM on the whole-brain level. The results are shown in Table 3 and Figure 5 (see also Table S1 for automatic labeling based on cluster peak coordinates). After cluster correction, we found positive correlations with EV in the ventromedial prefrontal cortex, frontal pole, ventral striatum, and precuneous cortex, and negative cor- we could not test for temporal differentiation in the anticipatory period (due to identifiability issues, see above), we could observe a spatial differentiation between EV and risk, confirming parts of the results from Preuschoff et al. (2006). We also observed a spatial differentiation between RPE and surprise.

Understanding the dopamine circuit is of great importance for both clinical and cognitive neuroscience. First of all, the loss of dopaminergic neurons is associated with Parkinson's disease symptoms (Fearnley & Lees, 1991; Frank, 2006a), and dysregulations in the human dopamine circuit are known to play a role in drug addiction (Everitt & Robbins, 2005) and pathological gambling (Bergh, Eklund, Södersten, & Nordin, 1997). Moreover, the dopamine signal reflects different aspects of rewards, including the anticipation of risk and the mismatch between predictions and outcomes (Schultz, 2015). While dopamine neurons are situated mostly in the midbrain, they are part of a much larger and more complex circuit, involving different cortical and subcortical areas (Frank, 2006b; Haber & Knutson, 2010; Watabe-Uchida, Eshel, & Uchida, 2017). By transmitting information about changes in reward expectations and risk in the environment to areas important for action execution and learning, dopamine likely plays a crucial role in adaptive behavior, that is, for survival in a dynamic environment, with limited resources and obstacles to avoid.
fMRI provided incomplete and partially contradictory results. In this paper, we presented the results of a 7T fMRI study involving human participants performing a number-guessing task. To the best of our knowledge, this was the first study to investigate the functional role of both the VTA and the SN using UHF-MRI to acquire high-quality, high-resolution functional and structural images. While previous studies in these areas focused on expected gains or losses and on the RPE signals, we extended the analysis to expected risk and to surprise. This was based on previous electrophysiological and fMRI studies that found these signals either in the VTA/SN or in their target areas (e.g., Fiorillo et al., 2003; Hayden, Heilbronner, Pearson, & Platt, 2011; Preuschoff et al., 2006). While we found no evidence for a linear correlation between reward anticipation (involving both gains and losses) and VTA or SN activation, we did find evidence for an RPE signal in both regions, as well as for an expected risk signal. Similarly to Matsumoto and Hikosaka (2009), who found a functional dissociation of VTA and SN, we also found a surprise signal in the SN but not in the VTA.

Given previous findings (Fiorillo et al., 2003) and theoretical considerations (as a found evidence for a positive RPE in VTA and not in SN. We also found an RPE signal in ventral striatum, orbital frontal cortex, and anterior insula, confirming previous fMRI results that looked at dopamine target areas (Bartra et al., 2013).

Here, we showed the presence of a risk signal in both the VTA and the SN, in line with electrophysiological studies in non-human animals (Fiorillo et al., 2003). We also found a risk signal in insula and orbital frontal cortex, confirming previous fMRI studies linking these areas to the coding of risk (Brown & Braver, 2018; Preuschoff et al., 2006).

The presence of a surprise signal in the SN and not in the VTA fits remarkably well with results from the animal literature (Matsumoto & Hikosaka, 2009) and with the framework proposed by Bromberg-Martin, Matsumoto, and Hikosaka (2010). In this framework, To be able to more reliably extract and separate the signals from the VTA and the SN, we therefore drew individual masks, based on 0.7 mm isotropic, multimodal, anatomical images that were acquired for each participant in a separate session. By restricting the analyses to the individual space, we also prevented misalignment issues that usually occur when transforming individual images to a group or standard space. To define the final masks, we adopted a rather conservative approach, keeping only the intersection of the masks drawn by two independent and trained raters. To illustrate the importance of these precautions, we compared our masks to previously proposed VTA and SN prob-

Future studies could attempt to distinguish between the pars compacta and reticulata of the SN, as dopamine neurons are mainly situated in the pars compacta (Roeper, 2013). However, these two parts are virtually indistinguishable based on MRI contrast alone (see Figure 2). Therefore, to avoid making an arbitrary decision on where to set a border between the two, we considered the SN as one structure. By combining different methodologies (i.e., diffusion MRI), future studies might be able to shed light on SN functional subdivisions.

Another limitation of the present study lies in the nature of the BOLD signal. Since the BOLD response measured in fMRI is an indirect measure of neuronal activity and is mainly thought to reflect the input to and local processing of neurons rather than their output (Logothetis & Wandell, 2004), it is important to integrate results from different methodologies and species in order to understand the complexity of the dopaminergic circuit as a whole.

In sum, in this study we used novel methodologies to investigate how the brain processes gains and losses and updates expectations based on experience. We were able to show a risk signal in the dopamine nuclei and provided evidence for a full RPE signal in the presence of both gains and losses, thus clarifying previous results of human fMRI studies. This study opens the way to a better understanding of the dopamine circuit in the human brain, especially regarding the functional specificity of the SN and the VTA (or of their subregions) in reward-based decision making and adaptive behavior.

Participants and procedure
Twenty-seven participants [8 male (mean age=24.7, SD=5.0, min=19, max=35), 19 female (mean age=24.4, SD=4.7, min=19, max=35)] took part in the experiment. The study was approved by the ethics committee of the University of Amsterdam. All participants completed two separate sessions, one to obtain multimodal, 0.7 mm isotropic structural data, and one to obtain 1.5 mm isotropic functional data while participants engaged in a number-guessing task. All participants were recruited from the University of 128. To acquire images with such TE, TR, and voxel size, the protocol did not employ fat suppression, and, to increase SNR, the protocol did not employ partial Fourier. After the first run, an EPI image with the opposite phase-encoding direction to that of the functional scan was acquired to help correct for geometric distortions due to inhomogeneities in the B0 field using the TOPUP technique during preprocessing (see below).

The number-guessing task used in the present study is an adaptation of the task by Preuschoff et al. (2006). In each trial (Figure 1A), two numbers were sampled one after the other from the set {1, 2, 3, 4, 5} without replacement. At the beginning of each trial, before seeing either number, participants were asked to bet which of the two numbers would be higher: They could win 5 euro if their bet (i.e., their prediction) was correct, and lose 5 euro and in this case is thus 5 · 0.75 − 5 · 0.25 = 2.5 euros. The risk, often defined as the variance of the possible outcomes (Markowitz, 1952), is thus 4.3. Note that, when the first number is 3, the probability to win remains 50%, the EV remains 0, and the risk is highest, equal to 5. On the contrary, when the first number is either 1 or 5, participants already know whether they will lose or win (depending on what the bet was), therefore the EV is either −5 or 5 euros and the risk is always 0. Since we were interested in neural correlates of both EV and risk, it is a crucial aspect of this design that EV and risk are not correlated (Figure 1B).
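The arithmetic behind these values can be checked with a short sketch. It enumerates the possible second numbers for a given first number and bet, then computes the win probability, the EV, and the spread of the outcome. One observation to flag: the quoted risk values (4.3 and 5) match the square root of the outcome variance, so the sketch returns the standard deviation; this reading, the function names, and the equal weighting of all first-number and bet combinations in the correlation check are assumptions.

```python
import math

NUMBERS = (1, 2, 3, 4, 5)
STAKE = 5  # euros won or lost on each trial


def p_win(first, bet_second_higher):
    """Probability of winning once the first number is known."""
    remaining = [n for n in NUMBERS if n != first]  # sampling without replacement
    p_higher = sum(n > first for n in remaining) / len(remaining)
    return p_higher if bet_second_higher else 1 - p_higher


def expected_value(first, bet_second_higher):
    p = p_win(first, bet_second_higher)
    return STAKE * p - STAKE * (1 - p)


def risk(first, bet_second_higher):
    """Standard deviation of the outcome (square root of its variance)."""
    p = p_win(first, bet_second_higher)
    ev = expected_value(first, bet_second_higher)
    variance = p * (STAKE - ev) ** 2 + (1 - p) * (-STAKE - ev) ** 2
    return math.sqrt(variance)


# Covariance between EV and risk over all first-number and bet combinations.
conditions = [(f, b) for b in (True, False) for f in NUMBERS]
evs = [expected_value(f, b) for f, b in conditions]
risks = [risk(f, b) for f, b in conditions]
mean_ev = sum(evs) / len(evs)
mean_risk = sum(risks) / len(risks)
cov_ev_risk = sum((e - mean_ev) * (r - mean_risk) for e, r in zip(evs, risks)) / len(evs)

print(expected_value(2, True), round(risk(2, True), 1), cov_ev_risk)
```

The covariance vanishes because the EV is antisymmetric around the middle number while the risk is symmetric around it, which is the property the design relies on.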

Finally, the second number is shown for 2 seconds, together with the corresponding gain or loss. At this point, the reward prediction error (RPE) is calculated as the difference between the reward and the EV (i.e., the reward expectation after the first number): In the example above (i.e., bet on the second number being higher; first number is 2), if the second number is 3, the reward is 5 euros and the reward prediction error is 5 − 2.5 = 2.5 euros. The surprise, defined as the absolute value of the reward prediction error as in Schultz (2015) and in Hayden et al. (2011), is thus |5 − 2.5| = 2.5. Since we were also interested in neural correlates of both RPE and surprise, it was also crucial that they were uncorrelated. This was the case, since RPE ranged between −7.5 and 7.5 and its distribution over trials was symmetrically centered around 0, and surprise was simply its absolute value.
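The claim that RPE and surprise are uncorrelated can be verified by enumerating the task. The sketch below lists every combination of bet, first number, and second number, computes the RPE as reward minus EV and the surprise as its absolute value, and then checks their covariance. Treating all combinations as equally likely (including an even split between the two bets) is an assumption of this sketch, as are the function names.

```python
NUMBERS = (1, 2, 3, 4, 5)
STAKE = 5  # euros won or lost on each trial


def enumerate_trials():
    """Yield (rpe, surprise) for every bet, first number, and second number."""
    for bet_second_higher in (True, False):
        for first in NUMBERS:
            seconds = [n for n in NUMBERS if n != first]
            # Reward is +STAKE when the bet matches the outcome, else -STAKE.
            rewards = [STAKE if (s > first) == bet_second_higher else -STAKE
                       for s in seconds]
            ev = sum(rewards) / len(rewards)  # expectation after the first number
            for reward in rewards:
                rpe = reward - ev
                yield rpe, abs(rpe)


def covariance(pairs):
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / len(pairs)


trial_pairs = list(enumerate_trials())
rpes = [rpe for rpe, _ in trial_pairs]
cov_rpe_surprise = covariance(trial_pairs)
print(min(rpes), max(rpes), cov_rpe_surprise)
```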

The experiment consisted of 120 trials, divided into two blocks. In each block, 5 test trials were included to encourage participants to remain attentive throughout the experiment.

To ease and improve the segmentation process, we therefore combined the T2*-weighted and T1-weighted images, by first normalizing them within the midbrain area (i.e., a pre-selected T1-weighted images (second and third row) highlight, respectively, iron-rich areas and the CSF; their sum (fourth row) thus allows the VTA to be segmented, as it is mainly defined by the border it shares with these regions (which are hard to visualize within the same contrast).
Manual segmentation was performed using FSLView version 3.0.2, by two independent and trained researchers (one of whom is the first author of this study). Only the voxels that were marked by both researchers were kept in the final masks, that is, the conjunction masks. To assess inter-rater reliability (i.e., the agreement between the two researchers), we computed the Dice score (Dice, 1945) can be thus transformed in the individual space to extract the signal from these regions.

The disadvantage of this less resource-intensive approach, however, is a potential loss of and block using statsmodels (Seabold & Perktold, 2010). Specifically, we used the GLSAR AR(1) model to account for autocorrelation. The design matrices were constructed using Nistats (https://nistats.github.io/index.html). In the design matrices, the following

Note. Dice scores and size of the individual conjunction masks of the regions of interest (ROI): left and right substantia nigra (SN) and left and right ventral tegmental area (VTA). Conjunction masks are the intersection of the two independent raters' masks. Dice scores closer to 1 indicate higher agreement between the two raters, while Dice scores close to 0 indicate lower agreement between the two raters.

Note. Results of the independent two-sided t-tests for the mean of the predictors of main interest of the GLM being equal to zero: expected value (EV) and expected risk (estimated when the trial's first number is presented), and reward prediction error (RPE) and surprise (estimated when the trial's reward or punishment is presented). These tests were run separately by region of interest: left and right substantia nigra (SN), and left and right ventral tegmental area (VTA). Bayes factors (BF) higher than 1 provide evidence for an effect, while BF lower than 1 provide evidence for the absence of an effect.

Note. Clusters surviving thresholding. We report the number of voxels, cluster probability, log probability, activation and MNI coordinate of the activation peak voxel in a cluster.

Figure 1. Experimental design. A. Example of a single trial. Between each event and at the beginning of each trial, a fixation cross is presented for a period of time between 4 and 10 seconds. A bet has to be placed within 1 second, and a rectangle is drawn around the corresponding choice for 1 more second. The first number is then shown for 2 seconds: In this example, the expected reward is 2.5 euros, and the risk is 4.3. Finally, the second number is shown for 2 seconds: In this case, both the reward prediction error and the surprise are 2.5. In test trials (approximately 8%) participants have to specify whether they won or lost. B. Relationship between risk and expected reward when the first number is shown, depending on the choice.

Figure 2. Detail of the midbrain area of one participant in the sagittal (first column), coronal (second column), and axial (third column) planes. The first row is the QSM image, used for SN segmentation. The second and third rows are, respectively, the average of the third and fourth echoes of the T2*-weighted images, and the T1-weighted images. To obtain the image in the fourth row, the images in the second and third rows were normalized within the midbrain area (the non-homogeneous grey area in the last row) and then summed. This image was used for VTA segmentation, as it shows a contrast of both iron-rich nuclei and of the CSF.

Different plots represent the predictors of main interest: expected value (EV) and expected risk (estimated when the trial's first number is presented), and reward prediction error (RPE) and surprise (estimated when the trial's reward or punishment is presented). Error bars represent 95% confidence intervals.

Figure 5. Results of the voxel-wise GLM after cluster correction, overlaid onto the mean functional image across participants and volumes. Each row corresponds to one of the predictors of main interest: expected value (EV) and expected risk (estimated when the trial's first number is presented), and reward prediction error (RPE) and surprise (estimated when the trial's reward or punishment is presented).