Individual differences in successful self-regulation of the dopaminergic midbrain

The dopaminergic midbrain is associated with brain functions, such as reinforcement learning, motivation and decision-making, that are often disturbed in neuropsychiatric disease. Previous research has shown that activity in the dopaminergic midbrain can be endogenously modulated via neurofeedback, suggesting potential for non-pharmacological interventions. However, the robustness of endogenous modulation, a requirement for clinical translation, is unclear. Here, we examined how self-modulation capability relates to regulation transfer. Moreover, to elucidate potential mechanisms underlying successful self-regulation, we studied individual prediction error coding and, during an independent monetary incentive delay (MID) task, individual reward sensitivity. Fifty-nine participants underwent neurofeedback training in either a veridical or an inverted feedback group. Successful self-regulation was associated with post-training activity within the cognitive control network and accompanied by decreasing prefrontal prediction error signals and increased prefrontal reward sensitivity in the MID task. The correlative link of dopaminergic self-regulation with individual differences in prefrontal prediction error and reward sensitivity suggests that reinforcement learning contributes to successful self-regulation. Our findings therefore provide new insights into the control of dopaminergic midbrain activity and pave the way to improve neurofeedback training in neuropsychiatric patients.


Dysfunctions of the reward system have far-reaching consequences and are associated with the development of several severe psychiatric diseases such as addiction 8 and schizophrenia 9,10 . Despite decades of extensive neuroscience and imaging studies, which have contributed to an impressive body of knowledge of normal and abnormal reward system function, the neural mechanisms controlling midbrain activity are still not fully understood 11 . One key issue that has received increasing attention is whether humans are able to cognitively control brain activity within the reward system. It has already been shown that both healthy controls 12,13 and patients with cocaine addiction 14 can learn to regulate SN/VTA activity during real-time functional magnetic resonance imaging (rt-fMRI) neurofeedback training. However, the outcome of primary interest in neurofeedback training is a transfer beyond training itself, i.e., the ability to regulate activity also after training and without feedback. Such transfer is critical for clinical applications, including those involving disorders of the reward system 15 . While MacInnes and colleagues 13 observed significant neural transfer effects in the form of increased neural activity and connectivity of the VTA during transfer on the group level, the other two studies revealed high between-subject variance in this self-regulation success. The purpose of this work is to determine how this variance arises, how individuals with successful transfer effects differ from individuals without transfer effects, and whether activity in brain regions other than the VTA characterizes individuals with successful transfer. We addressed these issues by combining data from two previous rt-fMRI neurofeedback studies 12,14 and pursuing three aims.

(1) Our first goal was to characterize individual differences in the degree of successful transfer of SN/VTA self-regulation and thereby differentiate regulators from non-regulators. Individual differences in regulation success and high variability of transfer effects also arise in other neurofeedback modalities such as electroencephalography (EEG) and are often neglected 16 . For rt-fMRI neurofeedback control, neural activity in the cognitive (or executive) control network may play an important role, especially when performing a demanding task such as imagery 17 . Therefore, and based on the known direct and indirect connections between prefrontal cortex and SN/VTA 18-21 , we hypothesize that successful transfer of SN/VTA regulation is associated with activation in brain regions that are part of the cognitive (executive) control network, especially prefrontal areas.

(2) Our second goal was to determine whether the framework of (operant) associative learning can be used to explain neurofeedback training. In applications of the associative learning framework to neurofeedback 17,22 , the feedback provides a higher order reward and the chosen mental strategy is reinforced in proportion to the sign and magnitude of the feedback. At the beginning of the training, participants cannot predict which strategy will consistently lead to an up- or downregulation in brain activity within the target region. Therefore, if they use an adequate strategy, participants receive more reward than predicted, corresponding to a positive prediction error. As a consequence, they would be more likely to repeat the strategy, expect higher feedback next time and gradually learn how to keep the feedback signal high. Accordingly, in regulators the size of the prediction error should gradually decrease as the expected feedback increasingly converges with the actual feedback. In contrast, for

Here, we directly investigate the prediction error mechanism in regions that control the SN/VTA, which itself has been traditionally associated with the coding of reward prediction errors in both animal 2,25,26 and human research 27,28 . Furthermore, the causal sufficiency of dopaminergic prediction error signals for learning has been reinforced by optogenetics 29,30 . Taken together, we hypothesize that decreasing prediction error signals during neurofeedback learning are associated with successful self-regulation and transfer effects.
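To make the predicted learning dynamics concrete, the following minimal sketch simulates a simple Rescorla-Wagner learner. It is an illustration under stated assumptions, not the analysis model used in this study: the function name, the learning rate of 0.3 and the constant feedback sequence are hypothetical and chosen only to show how the prediction error shrinks as expected feedback converges with actual feedback.

```python
def simulate_prediction_errors(feedback, learning_rate=0.3, initial_expectation=0.0):
    """Return trial-wise prediction errors of a Rescorla-Wagner learner.

    All parameter values are illustrative assumptions, not values
    estimated from the data described in the text.
    """
    expectation = initial_expectation
    errors = []
    for outcome in feedback:
        delta = outcome - expectation          # prediction error: outcome minus expectation
        errors.append(delta)
        expectation += learning_rate * delta   # expectation moves toward the outcome
    return errors


# Hypothetical "regulator": an adequate strategy yields consistently high feedback.
regulator_feedback = [1.0] * 10
errors = simulate_prediction_errors(regulator_feedback)
print([round(e, 3) for e in errors])
```

In this toy case the prediction error is largest on the first trial and decays geometrically, which is the qualitative pattern hypothesized for regulators.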

(3) Our third aim was to relate individual differences in the ability to regulate the midbrain to characteristics of reward processing, in order to further distinguish regulators from non-regulators. Thus, we asked whether successful neurofeedback training, as measured by transfer effects, taps into general properties of the reward system. Given that adaptive reward processing characterizes the SN/VTA 1,31 , we used a variant of the monetary incentive delay (MID) task that captures differences in adaptive reward sensitivity between clinical and non-clinical populations 32 . Using this task, we tested the hypothesis that reward processing in regions that may control the dopaminergic midbrain is related to successful SN/VTA self-regulation.

In sum, to study individual differences in capability to gain control of the SN/VTA we used rt-

Participants
Fifty-nine right-handed participants (45 males, average age 28.25±5.25 years) underwent SN/VTA neurofeedback training. We analysed data from two independent projects, which used highly similar rt-fMRI paradigms, rt-fMRI software and scanner hardware. The first dataset 12 comprised male participants, randomly assigned to one of two groups. The experimental group received veridical neurofeedback (N = 15), the control group received inverted neurofeedback (N = 16) as training signal.

The second dataset 14

Neurofeedback paradigm
The participants were instructed that their goal was to control a reward-related region-of-interest in their brains by imagining rewarding stimuli, actions, or events. We have previously shown that reward imagination activates SN/VTA with conventional fMRI 33

Region-of-interest SN/VTA
In both studies, the target region for neurofeedback, i.e. the substantia nigra (SN) and ventral tegmental area (VTA), was structurally identified using individual anatomical scans. Since the individual mask definition differed slightly between Study 1 and 2 (T1-weighted scans in Study 1 and T2-weighted scans in Study 2), we used an independent mask for our post-hoc analysis. This allowed us to control for individual differences between experimenter ROI selection strategies, to avoid interpolation confounds due to warping by normalization, and to use a reliable seed region for functional connectivity analysis. Specifically, we used the probabilistic mask of the SN and VTA as defined in ref. 35 , which is based on a large sample (148 datasets) and available at https://www.adcocklab.org/neuroimaging-tools (downloaded August 2018). Figure 1B illustrates this mask within the brain. From this mask image, we extracted and averaged SN/VTA activity for each participant using custom-made scripts in Matlab R2016b.
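The extraction step can be sketched as follows. This is a simplified illustration in Python, not the custom Matlab scripts used in the study. It assumes that each functional volume and the probabilistic mask have already been flattened to lists of equal length, and that the mask is binarized at a hypothetical probability threshold of 0.5; the text does not state whether the mask was thresholded or used as a weighting.

```python
def roi_mean_timecourse(volumes, mask_probabilities, threshold=0.5):
    """Average the signal over all voxels whose mask probability passes the threshold.

    volumes: list of volumes, each a flat list of voxel values.
    mask_probabilities: flat list of mask probabilities, same length as a volume.
    threshold: hypothetical cut-off (an assumption, not taken from the text).
    """
    selected = [i for i, p in enumerate(mask_probabilities) if p >= threshold]
    if not selected:
        raise ValueError("mask selects no voxels at this threshold")
    return [sum(volume[i] for i in selected) / len(selected) for volume in volumes]


# Toy data: four voxels, two time points; voxels 1 and 2 fall inside the mask.
mask = [0.0, 0.9, 0.6, 0.1]
volumes = [[5.0, 1.0, 3.0, 9.0],
           [5.0, 2.0, 4.0, 9.0]]
timecourse = roi_mean_timecourse(volumes, mask)
print(timecourse)
```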

Degree of regulation transfer (DRT)
We assessed the effects of individual differences in performance to characterise participants on a

DRT in fMRI analysis: The DRT measure served to investigate the individual differences in successful transfer at the whole brain level. In particular, we were interested in identifying regions that were positively associated with DRT and thus potentially contribute to regulation of the SN/VTA. For this analysis, we entered mean-centered individual DRT levels in all fMRI second level statistical models (see 2.8). We excluded SN/VTA from all analyses to avoid any circularity.
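Mean-centering a covariate simply subtracts the sample mean from every value, so that the intercept of a second-level model can be read as the effect at the average DRT. A minimal sketch, using hypothetical DRT values that are not data from this study:

```python
from statistics import fmean


def mean_center(values):
    """Subtract the sample mean from each value."""
    mean = fmean(values)
    return [v - mean for v in values]


# Hypothetical DRT values for four participants (illustration only).
drt = [0.2, -0.1, 0.5, 0.0]
drt_centered = mean_center(drt)
print(drt_centered)
```

Differences between participants are preserved; only the origin of the scale changes.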

Spatial specificity control analysis: To investigate the spatial specificity of our analysis of dopaminergic midbrain regulation, we performed the same whole brain analysis as described above for SN/VTA with a different ROI. Specifically, we used the neighboring brain region of the parahippocampus (Supplemental Material). In keeping with specificity, this control analysis revealed little commonality (limited to the cerebellum and temporal gyrus) with the SN/VTA analysis (Figure S4 and Table S8).

MID Task
In addition to the neurofeedback training, the participants in Study 2 (N = 25) performed a MID task that captures differences in adaptive reward sensitivity. In every trial of the MID task 32,36,37 , first one of three cues appeared (Fig. S1). One cue was associated with large reward (ranging from 0 to 2.00 CHF), one cue with small reward (0 to 0.40 CHF) and one cue with no reward. After a delay of 2.5 to 3 s, participants had to identify an outlier from three circles by pressing one of three buttons as quickly as

MR Data pre-processing
We despiked the functional data using AFNI toolbox (National Institute of Mental Health;

The spatial specificity control analyses (Figure S4 and Table S8) suggest that the findings reported here are not due to common physiological noise. To more directly account for noise, we additionally acquired physiological data in a subsample of participants. In the available subsample,

MR Data analysis
For all of the following analyses, we used the toolbox SPM 12 (v6906) within Matlab R2016b. All figures were created using bspmview v.20161108 42

We ran these analyses in all voxels other than the SN/VTA and separately for both the veridical and inverted feedback groups. To test for common and separate activity between the groups, we performed conjunction and disjunction analyses over the two group maps. Additionally, we performed a two-sample t-test group comparison analysis to identify significant group differences. To identify activity within the cognitive control network, we used a cognitive control template based on the coordinates from a meta-analysis 43 . We created this template with fslmaths and spheres of 15 mm around all coordinates from the meta-analysis. In Table S1 we identify regions of the cognitive control
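The construction of a sphere-based template can be illustrated with a short sketch. It is written in plain Python rather than fslmaths and is not the pipeline used in the study. Several details are assumptions made for the example: the 15 mm is treated as a radius (the text does not say whether it is a radius or a diameter), centres are given in voxel indices rather than MNI coordinates, and the grid size, voxel size and centre are hypothetical.

```python
from itertools import product


def sphere_template(shape, voxel_size_mm, centres_vox, radius_mm):
    """Return the set of voxel indices lying within radius_mm of any centre.

    shape: grid dimensions in voxels, e.g. (21, 21, 21).
    voxel_size_mm: isotropic voxel edge length in mm (assumed isotropic).
    centres_vox: list of sphere centres as voxel indices.
    radius_mm: sphere radius in mm (assumption: 15 mm read as a radius).
    """
    inside = set()
    for voxel in product(*(range(n) for n in shape)):
        for centre in centres_vox:
            dist_sq = sum(((v - c) * voxel_size_mm) ** 2 for v, c in zip(voxel, centre))
            if dist_sq <= radius_mm ** 2:
                inside.add(voxel)
                break  # one enclosing sphere is enough
    return inside


# Hypothetical example: one centre in the middle of a 21 x 21 x 21 grid of 3 mm voxels.
template = sphere_template(shape=(21, 21, 21), voxel_size_mm=3.0,
                           centres_vox=[(10, 10, 10)], radius_mm=15.0)
print(len(template))
```

The union over all centres gives a single binary template that can then be intersected with a statistical map.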

The second question of the study asked whether successful neurofeedback performance was associated with a reduction in prediction error during the training runs as captured by a classic reinforcement learning framework. To address this issue, we determined the temporal difference of the feedback signal (i.e., the change in height of the smiley) as proxy for the prediction error signal.
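The temporal-difference proxy described here amounts to subtracting each feedback value from the one that follows it. The sketch below shows only this differencing step, not the GLM built on top of it. The feedback values are hypothetical, and dropping the first sample (which has no predecessor) is an assumption of the example.

```python
def temporal_difference(feedback):
    """Return feedback[t] - feedback[t - 1] for every t >= 1.

    The first sample has no preceding value and is dropped (an assumption).
    """
    return [current - previous for previous, current in zip(feedback, feedback[1:])]


# Hypothetical feedback heights over five successive updates.
feedback = [0.0, 0.5, 0.8, 0.9, 0.9]
pe_proxy = temporal_difference(feedback)
print(pe_proxy)
```

In this toy series the proxy shrinks toward zero as the feedback signal stabilizes at a high level.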

Specifically, for the neurofeedback training runs we constructed a GLM that replaced the block-level

To address the third aim of the study, we investigated the relationship between reward processing in the MID task and the capacity to successfully regulate the SN/VTA in the neurofeedback experiment. In particular, we considered two contrasts in the MID task (1) general reward sensitivity, defined as

Additional behavioral measurements
Strategies: All participants were introduced to five example strategies (see 2.3) that they could use to upregulate brain activity, but they were also free to use their own strategies. At the end of the experiment, participants filled in a custom-made questionnaire on the strategies they used. To compare strategies between the groups, we used a χ2-test to assess differences in the distribution of strategy usage. We did not observe any significant group differences in strategy use (p = .9), and therefore did not consider this measurement in any further analysis.
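For readers unfamiliar with the test, the χ2 statistic for a group-by-strategy contingency table can be computed as sketched below. The counts are hypothetical and do not reproduce the questionnaire data, and the sketch stops at the statistic and degrees of freedom; obtaining a p-value would additionally require the χ2 distribution function from a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows (groups), each a list of counts (strategies).
    Assumes every expected count is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: two groups (rows) by five strategies (columns).
strategy_counts = [[6, 4, 3, 5, 2],
                   [5, 5, 3, 4, 3]]
statistic, dof = chi_square_statistic(strategy_counts)
print(statistic, dof)
```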

No difference in degree of regulation transfer (DRT) across groups
We first evaluated the DRT measure and compared it between the three datasets. There were no

Correlation of slopes between transfer and training only for intervention group
Next, we tested for differences between groups in the relationship of SN/VTA transfer as measured by

areas consistently reported by neurofeedback studies (see Fig. 2 in the meta-analysis of Sitaram et al. 17 ), including dorsolateral prefrontal cortex (dlPFC), anterior cingulate cortex (ACC), lateral occipital cortex (LOC), and thalamus (Figure 3A and Table 1). To formally test for a more general association with the cognitive control network, we applied a cognitive control network template from a meta-analysis 43 , which in addition revealed neural activity in precuneus and striatum (Fig. 3B for exemplary illustrations of dlPFC, ACC, temporal gyrus, and thalamus activity; Table S1 for full overview). Thus, regions of the cognitive control network showed transfer to the extent that neurofeedback training of the dopaminergic midbrain was successful.

Figure 3: Correlation of DRT with transfer success after training in the veridical feedback group. To investigate whole-brain neural activity correlating with successful SN/VTA self-regulation, we used DRT as measure of successful regulation of the SN/VTA and correlated it with the contrast (IMAGINE_REWARD_transfer - REST_transfer) - (IMAGINE_REWARD_baseline - REST_baseline) as measure of learning-related change in neural activity in the rest of the brain. (A) The analysis revealed task-specific correlations primarily within the cognitive control network (whole brain overview FWE-corrected with p < 0.05 on cluster level, projected to lateral and medial sagittal sections). (B) Exemplary correlations within the cognitive control network are depicted, here in MFG/dlPFC, ACC, thalamus, and bilateral temporal gyrus, to illustrate the association of neural activity with DRT. The correlations are for illustration purposes only without further significance testing to avoid double dipping. The grey shaded area identifies the 95 % confidence interval.

For the inverted feedback group, the same analysis resulted in partly distinct activations. In contrast to the veridical feedback group, left amygdala activity correlated significantly with DRT (Fig. 4 and Table S2). Importantly, activity in cognitive control areas reported above, such as dlPFC and ACC, was significantly weaker in the inverted than in the veridical feedback group (Table S3 for

We also tested for common activity in the two feedback groups using conjunction analysis. Similar to the veridical group, the inverted feedback group showed correlations between DRT and activity in the precuneus, middle temporal gyrus, insula, thalamus, and parahippocampal gyrus (Table S4). These common areas appear to reflect non-specific regulation activity and may be associated with memory and introspection processes.
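The logic of testing for common and separate activity across two group maps can be sketched as follows. This is an illustration only: it assumes a minimum-statistic conjunction and an exclusive-or disjunction on thresholded maps, which the text does not specify, and the statistic values and threshold are hypothetical.

```python
def conjunction(map_a, map_b, threshold):
    """Voxels whose statistic exceeds the threshold in both maps (minimum statistic)."""
    return [min(a, b) > threshold for a, b in zip(map_a, map_b)]


def disjunction(map_a, map_b, threshold):
    """Voxels whose statistic exceeds the threshold in exactly one of the two maps."""
    return [(a > threshold) != (b > threshold) for a, b in zip(map_a, map_b)]


# Hypothetical voxel-wise statistics for the two feedback groups.
veridical_map = [4.0, 1.0, 3.5, 0.2]
inverted_map = [3.6, 3.9, 0.5, 0.1]
common = conjunction(veridical_map, inverted_map, threshold=3.1)
separate = disjunction(veridical_map, inverted_map, threshold=3.1)
print(common, separate)
```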

Reinforcement learning: dlPFC prediction error coding during neurofeedback training correlates with DRT
To investigate whether reinforcement learning mechanisms contribute to successful neurofeedback transfer, we tested for the temporal differences in the feedback signal as proxy for the prediction error signal during the training runs. We reasoned that prediction error activity should decrease from early

difference learning models, prediction errors are calculated at each moment in time 48 . Therefore, we operationalized prediction error by subtracting the immediately preceding SN/VTA activity (prediction) from the present SN/VTA activity (outcome). Specifically, we tested for a negative correlation of DRT with the difference in prediction error coding signals between late and early training. In other words, only for participants with high DRT we expected to observe a decrease of prediction error signal over the course of the neurofeedback training. We found such gradually decreasing prediction error signals in dlPFC (Fig. 5 and Table S5). To interrogate the finding in detail, we also analysed the two

The neural prediction error signal, corresponding to the temporal difference between the current and immediately preceding feedback activity from the SN/VTA, decreased with ongoing feedback training (i.e., the difference between the last and first run) within dlPFC more strongly in individuals with higher DRT (p < 0.001). This finding is consistent with reinforcement learning theories, according to which prediction errors decrease as learning progresses. By extension, a reinforcement learning framework can explain successful neurofeedback training. (B) The plot depicts the differences in prediction error signals in dlPFC between the last and first training for every participant. This shows that the individual degree of regulation success statistically relates to the decrease in prediction error coding over training. The plot is for illustration purposes only without further significance testing to avoid double dipping. The grey shaded area identifies the 95 % confidence interval.

The correlation plot depicts connectivity between dlPFC and SN/VTA with DRT. The plot is for illustration purposes only without further significance testing to avoid double dipping. The grey shaded area identifies the 95 % confidence interval.

Individual differences in dlPFC reward sensitivity during MID task correlate with regulation success
In Study 2 we used the MID task to independently measure reward sensitivity and the capability to adapt to different reward contexts 32

Table S6). Thus, the more successful individuals were at self-regulating SN/VTA as a result of neurofeedback training, the more sensitive they were to reward and the more strongly they adapted to different reward contexts in the MID task.

Figure 7: Reward sensitivity in dlPFC correlates with successful SN/VTA self-regulation. (A) Degree of successful SN/VTA transfer (DRT) in the neurofeedback task correlated with prefrontal reward sensitivity and adaptive coding in the MID task. A conjunction analysis around the peak coordinate in dlPFC showing DRT-related decreases in prediction error coding during neurofeedback training (MNI x = 40, y = 10, z = 38, left) revealed common neural activity reflecting transfer, (IMAGINE_REWARD_transfer - REST_transfer) - (IMAGINE_REWARD_baseline - REST_baseline), and reward sensitivity (small + large reward magnitude parametric modulators in MID, all contrasts with p < 0.001). Moreover, individuals with more successful self-regulation of the SN/VTA showed stronger adaptive reward coding (which reflects higher sensitivity to small relative to large rewards) in the same region that also showed DRT-related decreases in prediction error coding during neurofeedback training (right). (B) The correlation plot depicts adaptive reward coding activity in dlPFC with DRT. The plot is for illustration purposes only without further significance testing to avoid double dipping. The grey shaded area identifies the 95 % confidence interval.

In the present work, we used data acquired from two previous rt-fMRI neurofeedback studies to

One insight of the present study is that transfer success is associated with neural activity in cognitive control network areas 43,52 , such as dlPFC and ACC. The lack of cognitive control engagement within the control group, together with the correlation of DRT with the slope of SN/VTA increase during training in the intervention group only, indicates that this finding is specific to the successful transfer of the learned self-regulation procedure. This network overlaps with regions that have been associated with feedback-related information processing during training 53,54 . Together, these findings suggest that the

At the functional level, a recent study on creative problem solving in humans highlights that dlPFC is involved in experiencing a moment of insight 77 . According to this effective connectivity study,

Our independent reward task revealed that individual differences in prefrontal reward sensitivity and efficient adaptive reward coding were associated with successful SN/VTA self-regulation. Adaptive coding of rewards captures the notion that neural activity (output) should match the most likely inputs to maximize efficiency and representational precision 79 . Accordingly, we previously showed that reward regions encode a small range of rewards more sensitively than a large range of rewards 37,80 . Interestingly, in the present study, participants who were more sensitive to small rewards were also more successful in self-regulation of the dopaminergic midbrain. When

A potential limitation of our study is that we used a combined mask for SN and VTA even though differences in functionality and anatomy have been reported for the two regions (reviewed e.g. by Trutti et al. 81 ), with the SN more related to motor functions and the VTA to reward functions. However, it should be kept in mind that when viewed through the lens of recording and imaging rather than lesion techniques, the differences are more gradual than categorical 82 . Still, future studies may want to use more specific feedback from one or the other region to more specifically target potential