Intrinsically regulated learning is modulated by synaptic dopamine availability

We recently provided evidence that an intrinsic reward-related signal (triggered by successful learning in the absence of any external feedback) modulated the entrance of new information into long-term memory via the activation of the dopaminergic midbrain, hippocampus, and ventral striatum (the SN/VTA-Hippocampal loop; Ripollés et al., 2016). Here, we used a double-blind, within-subject randomized pharmacological intervention to test whether this learning process is indeed dopamine-dependent. A group of healthy individuals completed three behavioural sessions of our learning task, each after the intake of a different pharmacological treatment: a dopaminergic precursor, a dopamine receptor antagonist or a placebo. Results show that the pharmacological intervention bidirectionally modulated behavioral measures of both learning and pleasantness, inducing memory benefits after 24 hours only for those participants with a high sensitivity to reward. These results provide causal evidence for a dopamine-dependent mechanism instrumental in intrinsically regulated learning, and further suggest that subject-specific dopamine sensitivity drastically alters learning success.


INTRODUCTION
Growing evidence from both animal and human studies supports the notion that midbrain dopaminergic neurons of the substantia nigra/ventral tegmental area complex (SN/VTA), along with the ventral striatum (VS) and the hippocampus (HP), form a functional loop (the SN/VTA-HP loop) in the service of learning and memory (Lisman and Grace, 2005; Goto and Grace, 2005; Lisman et al., 2011; Shohamy and Adcock, 2010; Kaminski et al., 2018). In the downward arm of the circuit, signals are sent from the HP to the SN/VTA through the VS, which is thought to integrate affective, motivational, and goal-directed information into the loop (Lisman and Grace, 2005; Goto and Grace, 2005). In the upward arm of the loop, dopamine is released from the SN/VTA back into the HP, which in turn enhances memory formation and learning through long-term potentiation (LTP) processes (Lisman et al., 2011; Lisman and Grace, 2005; Shohamy and Adcock, 2010).
In this vein, fMRI research in humans has consistently shown that both explicit (Adcock et al., 2006; Wittmann et al., 2005; Wolosin et al., 2012; Callan et al., 2008) and implicit reward (Ripollés et al., 2016) can promote the storage of new information into long-term memory through the activation of the SN/VTA-HP loop (see Fig. 8 in Ripollés et al., 2016). However, although fMRI activity within the SN/VTA is usually associated with the release of dopamine (Duzel et al., 2009; Ferenczi et al., 2016; Knutson and Gibbs, 2007; Salimpoor et al., 2011; Schott et al., 2008), neuroimaging studies can only provide indirect evidence of the actual involvement of the dopaminergic mesolimbic system in learning and memory processes. To prove that a dopamine-dependent mechanism plays a critical role in this process, one avenue is to directly manipulate the levels of dopamine in the human brain through pharmacological interventions. Several studies have shown that the intake of dexamphetamine and methylphenidate (which block dopaminergic and adrenergic re-uptake; Breitenstein et al., 2004; Whiting et al., 2007; Whiting et al., 2008; Linssen et al., 2014) and, especially, levodopa (a dopamine precursor) can enhance memory and learning in both healthy (Shellshear et al., 2015; Bunzeck et al., 2014; Chowdhury et al., 2012; Knecht et al., 2004) and clinical populations (Berthier et al., 2011).
We recently provided behavioural, functional and physiological evidence, by means of fMRI and skin conductance response, showing that an intrinsic reward-related signal (triggered by successful learning in the absence of any external feedback or explicit reward) modulated the entrance of new information into long-term memory via the activation of the SN/VTA-HP loop (Ripollés et al., 2016). Here, we used a double-blind, within-subject randomized pharmacological intervention to directly assess the hypothesis that synaptic dopamine availability plays a causal role in this learning process. A group of 29 individuals was asked to perform a learning task (that mimics our capacity to learn the meaning of new-words presented in verbal contexts; Ripollés et al., 2014, 2016, 2017; Mestres-Missé et al., 2007) after the intake of three different pharmacological treatments: a dopaminergic precursor (levodopa, 100 mg + carbidopa, 25 mg), a dopamine receptor antagonist (risperidone, 2 mg), or a placebo (lactose). We predicted that behavioral measures of both learning and reward would increase under levodopa and decrease under risperidone, thus modulating the memory benefits for the learned words after the consolidation period (24 hours).

RESULTS
Twenty-nine healthy participants completed a behavioural version of our word-learning task (see Materials and Methods), in which the meaning of a new-word could be learned from the context provided by two sentences built with an increasing degree of contextual constraint (Mestres-Missé et al., 2010). Only half of the pairs of sentences disambiguated multiple meanings, allowing the encoding of a congruent meaning of the new-word during its second presentation (M+ condition). For the other pairs, the new-word was not associated with a congruent meaning across the sentences, and could not be learned (M- condition). This condition, as in our previous study (Ripollés et al., 2016), was included to control for possible confounds related to novelty, attention and task difficulty (Guitart-Masip et al., 2010; Bunzeck and Duzel, 2006; Boehler et al., 2011). At the end of each learning trial (i.e., after the second sentence for a particular new-word appeared) participants first provided a confidence rating (a subjective evaluation of their performance) and then rated their emotions with respect to arousal and pleasantness. After approximately 24 hours (no drug intake occurred during the second day of testing), participants completed a recognition test to assess their learning (chance level was 25%; see Materials and Methods). Three participants were excluded from the analyses (see Materials and Methods) and thus the final sample was reduced to 26 individuals (17 women, mean age=22.27 ± 3.69).
We first assessed whether our participants' performance under the placebo condition replicated our previous results. Participants ascribed the correct meaning to 60 ± 10% of new-words from the M+ condition during the encoding phase, and correctly identified the new-word as incongruent in 61 ± 15% of the M- trials. They remembered words from the M+ condition and correctly identified M- words (i.e., no meaning ascribed) at the same rate as in our previous experiment.
We then focused our analyses on learned (on Day 1) and still remembered (on Day 2) M+ new-words. In our previous work (Ripollés et al., 2016) this was the condition associated with the largest fMRI activity within the SN/VTA-HP loop, the largest physiological response and the highest subjective pleasantness ratings, even when compared with learned words that were forgotten after 24 hours (as a control, we used M- new-words correctly identified during the encoding phase and after 24 hours). Accordingly, in the present study subjective pleasantness and confidence ratings on Day 1 were higher for remembered than for forgotten M+ new-words in the 24-hour recognition test [pleasantness, t(25) Figure 1B). There was no significant effect of drug, however, on the recognition rate [i.e., the percentage of remembered words in the recognition test of Day 2 compared to those that were learned on Day 1; t(25)=-0.013, p=0.989, d=0.003, BF+0=0.20].

Figure 1. Effects of the pharmacological intervention (mean ± SEM) on a) learning scores and b) subjective ratings. Note that subjective ratings were only measured during the learning phase of Day 1. Effects are calculated as % of change with respect to the placebo session. *p<0.05, **p<0.01
As expected, for the control M- condition no significant differences between the risperidone and levodopa sessions as compared to placebo were found for the learning scores [see Figure 1A], or the subjective ratings of arousal [t(23)=1.72, p=0.097, d=0.45, BF+0=1.46] and confidence [t(23)=0.36, p=0.720, d=0.10, BF+0=0.28; see Figure 1B; two participants were excluded from the rating analyses after not correctly rejecting any M- word at 24 hours from those correctly rejected during encoding in the levodopa session]. For the pleasantness ratings, however, the difference was close to significance [t(23)=2.02, p=0.055, d=0.40, BF+0=2.32]. It is important to note, though, that the pleasantness ratings for M- trials remembered at 24 hours were not different from 0 at any session [risperidone mean rating = -0.23, t(23)=1.20, p=0.240, d=0.23, BF10=0.40], implying that participants did not find this learning condition particularly rewarding even if the pharmacological intervention slightly modified their subjective ratings.
Given that our learning task modulates activity within the reward network (Ripollés et al., 2014), we further tested whether individual differences in sensitivity to reward interacted with the drug intervention to modulate memory benefits (Ferreri et al., 2017; de Vries et al., 2010; Apitz and Bunzeck, 2013). Twenty-four of the twenty-six participants completed the Physical Anhedonia Scale (PAS, Chapman et al., 1976; mean score = 11.62 ± 5.47) and we correlated (using Spearman's rho) their individual scores with the drug effect for each learning condition (the drug effect was calculated as the percentage of change from placebo in the levodopa session minus the percentage of change from placebo in the risperidone session, see Materials and Methods).
As a control and in order to take into account previous results (Chowdhury et al., 2012), we also assessed the relationship of the learning scores with the weight-dependent measure of drug dose (calculated in mg of levodopa/risperidone administered per kilogram, mean value = 1.66 ± 0.23). As expected, no significant correlations were found between the M- learning scores and the PAS [Learning Day 1, rs=-0.19, p=0.372; number of correctly rejected words during Day 2, rs=-0.34, p=0.097; recognition rate, rs=-0.19, p=0.372]. In addition, no significant linear correlation or inverted U-shape relationship (Chowdhury et al., 2012) was found between any learning score (M+ or M-) and the weight-dependent drug dosage (all ps > 0.13). However, for M+ trials, the drug effect on the number of learned words during encoding (rs=-0.45, p=0.025), on the total number of remembered words during Day 2 (rs=-0.67, p<0.001) and, strikingly, on the recognition rate (rs=-0.49, p=0.017) showed a significant correlation with the PAS (all correlations were FDR-corrected at a p<0.05 threshold, see Figure 2A; one participant was excluded from the correlations with the learning scores on Day 2 after being identified as a bivariate outlier; note that if included, the correlations become stronger: number of remembered words during Day 2, rs=-0.71, p<0.001; recognition rate, rs=-0.55, p=0.005). Because higher PAS scores indicate greater anhedonia, these negative correlations suggest that the dopaminergic pharmacological intervention induced greater memory and accuracy benefits/deficits in those participants with high sensitivity to reward.
Note that the drug effect for the forgetting rate, which showed no differences in memory performance when pooling all participants together, becomes significant if we divide our participants into high and low sensitivity to reward (i.e., hedonic) groups.
All in all, these results show that the dopaminergic pharmacological intervention did have an effect in terms of both learning and subjective pleasantness in our learning task, inducing greater memory benefits in those participants more sensitive to reward.

DISCUSSION
By using a double-blind, within-subject randomized pharmacological intervention during a learning task that is guided by an intrinsically regulated reward process and known to activate the SN/VTA-HP loop (Ripollés et al., 2016), we showed that dopamine can modulate the entrance of new information into long-term memory. In particular, the administration of a dopaminergic precursor (levodopa) and a dopaminergic antagonist (risperidone) respectively increased and decreased both the learning rate and the level of pleasantness experienced by the participants during encoding, as well as the number of words remembered after a consolidation period (24 hours; see Figure 1B). Strikingly, the memory effects induced by the dopaminergic pharmacological intervention were stronger in participants with a higher sensitivity to reward (i.e., more hedonic; see Figure 2).
In a previous study using the same task (Ripollés et al., 2016) we showed that successful learning itself (in the absence of external feedback) was associated with increased reward processing and heightened activity within the SN/VTA and the VS. We suggested that this intrinsic reward-related signal induced a higher release of dopamine at the HP, which ultimately resulted in enhanced memory formation due to the well-known role of dopamine in mediating LTP processes. Current results provide further support for our hypothesis by showing that dopamine had an additional role during learning: participants not only learned more words (i.e., they performed better) under levodopa than under risperidone (as compared to placebo), but also found the learning experience more rewarding when stimulating, rather than blocking, the dopaminergic system. This result is in accord with previous work demonstrating that dopamine improves feedback-based learning in humans (de Vries et al., 2010) and also with research showing that internally generated signals of self-performance (driven by mesolimbic areas and in the absence of external feedback) can guide and improve perceptual learning in humans (Daniel and Pollmann, 2012; Daniel and Pollmann, 2014; Guggenmos et al., 2016) and song learning (i.e., motor performance) in songbirds (Mandelblat-Cerf et al., 2014). An interesting interpretation of our results is therefore that the level of dopamine directly affected the reward value or the salience (Knecht et al., 2004) of the learning outcome in our task (i.e., learning was more enjoyable), prompting participants to be more motivated (Murty et al., 2014) and to perform better.
The VS, through its connections to the prefrontal cortex (PFC; Lehericy et al., 2004; Cummings et al., 1993; Alexander et al., 1986), is in an ideal anatomical position to add information about the relevance, salience and motivational value (Berridge and Kringelbach, 2008) of the stimuli to be learned into the SN/VTA-HP loop. As both the VS and the PFC are known to contain dopaminergic receptors and to receive dopaminergic projections (Haber and Knutson, 2010), dopamine might be able to alter this input, thus modulating the perceived reward underpinning the learning processes. An alternative explanation, which cannot be fully ruled out, is that the benefit in performance was driven by the suggested role of dopamine in working memory and attention (Surmeier, 2007; Brozoski et al., 1979; Linssen et al., 2014; Drijgers et al., 2012; Mehta et al., 2006). However, the fact that no significant learning benefits were induced in the control M- condition and, especially, the relationship between the learning improvements during the encoding phase and the participants' sensitivity to reward (for a similar effect, see Ferreri et al., 2017), suggest that the learning benefit was partially driven by reward-related and dopamine-dependent processes (Diehl et al., 1992; Nieoullon et al., 2003; de Vries et al., 2010).
Previous studies in healthy humans using classical associative learning tasks (i.e., that do not usually trigger reward-related signals) have shown that levodopa intake can lead to long-term memory benefits, possibly due to the increase of the levels of dopamine in the HP (Shellshear et al., 2015; Knecht et al., 2004). However, the lack of a clear and significant memory enhancement for the control M- condition and the fact that more hedonic participants benefitted the most from the dopaminergic intervention only in the learning condition related to reward (M+) draw a more complex and perhaps more informative picture: when using a reward-based learning task (Apitz and Bunzeck, 2013; Patil et al., 2016; Oyarzun et al., 2016; Kizilirmak et al., 2015; de Vries et al., 2010), the level of memory enhancement depends on dopamine synaptic availability, but also on individual differences in sensitivity to reward (Ferreri et al., 2017; Mas-Herrero et al., 2014; Camara et al., 2010; Marco-Pallares et al., 2009). This finding may be crucial for dopamine-related pharmacological interventions in, for example, clinical populations with language deficits (Berthier et al., 2011). Indeed, studies with levodopa in aphasia recovery have resulted in both positive (Seniow et al., 2009) and negative (Breitenstein et al., 2015; Leemann et al., 2011) effects. In this type of therapy, in which patients try to learn or re-learn words that are no longer accessible (Brady et al., 2012), the intensity of the language training is usually related to recovery (Bhogal et al., 2003) and it has been suggested that high training intensity may cause a ceiling effect that prevents levodopa from providing additional memory benefits (Breitenstein et al., 2015; Leemann et al., 2011).
A reward-based learning task such as the one used here, along with a better understanding of the interaction between the dopaminergic precursor and the patient's hedonic state, could help achieve more personalized and efficient rehabilitation without the need for high-intensity training.
In conclusion, here we show that a dopaminergic pharmacological intervention is able to modulate behavioral measures of pleasantness, task performance and long-term memory according to inter-individual differences in reward sensitivity. These findings further advance the idea that learning, even when achieved using a task guided by intrinsic reward, is a dopamine-dependent process, and shed new light on possible reward-based interventions for learning stimulation and/or rehabilitation.

MATERIALS AND METHODS
Participants
Around 150 individuals responded to advertisements and were contacted for a first phone pre-screening. Of those, 45 confirmed their availability and, after giving informed consent, were admitted to the hospital for further screening, medical examination and laboratory exams (blood and urinalysis). The study was approved by the Ethics Committee. Subjects were judged healthy at screening 3 weeks before the first dose based on medical history, physical examination, vital signs, electrocardiogram, laboratory assessments, negative urine drug screens, and negative hepatitis B and C, and HIV serologies.
Volunteers were excluded if they had used any prescription or over-the-counter medications in the 14 days before screening, if they had a medical history of alcohol and/or drug abuse, if they consumed more than 24 or 40 grams of alcohol per day (for women and men, respectively), or if they smoked more than 10 cigarettes/day. Women with a positive pregnancy test or not using efficient contraception methods, subjects with musical training, and those unable to understand the nature and consequences of the trial or the testing procedures involved were also excluded. Additionally, volunteers were requested to abstain from alcohol, tobacco and caffeinated drinks for at least the 24 h prior to each experimental period.
Twenty-nine volunteers were randomized and completed the study (19 females, mean age=22.83 ± 4.39) in exchange for monetary compensation according to Spanish legislation. The original sample size was chosen to be 30 participants, but one participant dropped out early in the study and only 29 completed it. This sample size was selected based on several criteria, including the recommendation that, in order to achieve 80% power, at least 30 participants should be included in an experiment in which the expected effect size is medium to large (Cohen, 1988). In addition, we took into account the sample sizes of previous studies using levodopa to modulate memory (range: between 10 and 30 participants; Apitz and Bunzeck, 2013; Copland et al., 2009; de Vries et al., 2010; Knecht et al., 2004; Chowdhury et al., 2012; Shellshear et al., 2015) and our previous behavioural studies using the same learning task (24 participants; Ripollés et al., 2016). We also computed a sample size analysis using the G*Power program, which showed that a sample size of 28 was required to ensure 80% power to detect an effect (effect size = 0.25) in a repeated-measures ANOVA with three sessions at the 5% significance level. We excluded 3 participants from the analyses after they showed very poor memory performance on the word learning task during the placebo session (on Day 2, they remembered fewer than four of the M+ words learned during the encoding session). The final sample analysed for this learning paradigm consisted of 26 participants (17 women, mean age=22.27 ± 3.69).

Study design and procedure
This double-blind, crossover, treatment sequence-randomized study was performed at the Neuropsychopharmacology Unit and Center for Drug Research (CIM) of the Santa Creu i Sant Pau Hospital of Barcelona (Spain). Experimental testing took place over three sessions. For each session, participants arrived at the hospital under fasting conditions and were given a light breakfast. Subsequently, they received in a double-blind masked fashion a capsule containing the treatment: a dopaminergic precursor with an inhibitor of peripheral dopamine metabolism (levodopa, 100 mg + carbidopa, 25 mg), a dopamine receptor antagonist (risperidone, 2 mg), or placebo (lactose). After one hour of completing several behavioral tasks not described in the current manuscript, the participants completed our word learning task, which lasted approximately 45 minutes. Next, participants spent their time in a resting room and were allowed to leave the hospital 6 hours after the treatment administration. For each session, each participant came back 24 hours later for a behavioral retest (without any pharmacological intervention), which lasted about 15 minutes. At least one week passed between sessions.

Experimental word learning task
The task was virtually identical to that of our previous work (Ripollés et al., 2014, 2016, 2017). Stimuli were presented using the Psychophysics Toolbox 3.09 (Brainard, 1997) and Matlab version R2012b. Stimuli consisted of 168 pairs of 8-word-long Spanish sentences ending in a new-word, built with an increasing degree of contextual constraint (Mestres-Missé et al., 2009; Mestres-Missé et al., 2014). Mean cloze probability (the proportion of people who complete a particular sentence fragment with a particular word) was 29.16 ± 18.95% for the first sentence (low constraint), and 81.67 ± 11.80% for the second (high constraint). The new-words respected the phonotactic rules of Spanish, were built by changing one or two letters of an existing word (mean number of letters = 6.02 ± 0.99) and always stood for a noun (mean frequency 43.26 ± 78.94 per million).
For each of the three different sessions, only half of the pairs of sentences disambiguated multiple meanings, thus enabling the extraction of a correct meaning for the new-word (M+ condition; e.g., 1. "Every Sunday the grandmother went to the jedin" 2. "The man was buried in the jedin"; jedin means graveyard and is congruent with both the first and second sentence). For the other pairs, second sentences were scrambled so that they no longer matched their original first sentence. In this case, the new-word was not associated with a congruent meaning across the sentences (M- condition; e.g., 1. "Every night the astronomer watched the heutil". Moon is one possible meaning of heutil. 2. "In the morning break co-workers drink heutil." Coffee is now one of the possible meanings of heutil, which is not congruent with the first sentence). These constituted the M- condition, in which congruent meaning extraction was not possible. To ensure that both stimulus types were comparable, participants were told that it was just as crucial to learn the words of the M+ condition as it was to correctly reject the new-words from the M- condition.
Given that the pharmacological intervention included three sessions, we created three versions of our task that only differed in the stimuli being presented. Thus, the 168 pairs of sentences were divided into six lists of 28 pairs (as aforementioned, two conditions, M+ and M-, were presented in each of the three sessions). The six lists were created so that there were no differences (one-way ANOVA) in the cloze probability of the sentences [first sentences: first sentence presentation was not related in any systematic way to the order of presentation of the same new-words for their second sentence. Participants were instructed to produce a verbal answer 8 seconds after the new-word of a second sentence appeared. If participants thought that the new-word had a congruent meaning, they had to provide its meaning in Spanish (e.g., graveyard). If the new-word had no consistent meaning, they had to say the word "incongruent". If they did not know whether the new-word had a consistent meaning or not, they had to remain silent. Vocal answers were recorded and later scored (for the M+ condition, incorrect answers included misses, providing the wrong meaning or saying "incongruent"; for the M- condition, incorrect answers included misses or providing any meaning at all). After giving a verbal answer, participants first provided a confidence rating that allowed for the assessment of the subjective evaluation of their performance.
Specifically, subjects were requested to enter, using the keyboard, a value between -4 and 4 (9 point scale with 0 as the neutral value). Then, participants had to rate their emotions with respect to arousal and pleasantness using the 9-point (as with confidence ratings, from -4 to 4) visual Self-Assessment Manikin scale (SAM). For valence/pleasantness, the SAM ranges from a sad, frowning figure (i.e., very negative) to a happy, smiling figure (i.e., very positive).
For arousal, the SAM ranges from a relaxed figure (i.e., very calm) to an excited figure (i.e., very aroused). All participants completed a training block to familiarize them with the task.
Each trial started with a fixation cross lasting 1000 ms, continued with the first 7 Spanish words of the sentence presented for 2 seconds, and was followed by a dark screen lasting 1 second. The new-word was presented for 1000 ms and was followed by 7 seconds of a small fixation point presented in the middle of the screen. For first sentences, a new trial was presented after 3 seconds of dark screen. For second sentences, after this period, a screen with the word "Answer" appeared and subjects had 3 seconds to produce a verbal answer. Then, the confidence and SAM scales for pleasantness and arousal were sequentially presented (the experiment did not continue until participants provided a rating). Finally, a new second sentence trial started after 3 seconds of dark screen. All words were presented in white, in the middle of a black screen, with a font size of 22.
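The fixed part of this timeline can be tallied with a short sketch. It assumes that, for second sentences, the answer prompt directly follows the 7-second fixation period, which matches the 8-second delay between new-word onset and the verbal answer described above. The event labels and variable names are illustrative and are not taken from the original task code. Self-paced ratings are left out because they have no fixed duration.

```python
# Fixed-duration events (in ms) shared by first- and second-sentence trials.
COMMON_EVENTS = [
    ("fixation cross", 1000),
    ("first 7 words of the sentence", 2000),
    ("dark screen", 1000),
    ("new-word", 1000),
    ("fixation point", 7000),
]

# First sentences: the next trial follows a 3-second dark screen.
FIRST_SENTENCE_TAIL = [("dark screen before next trial", 3000)]

# Second sentences (assumed order): answer prompt, self-paced ratings
# (untimed, so omitted here), then a 3-second dark screen.
SECOND_SENTENCE_TAIL = [
    ("answer prompt", 3000),
    ("dark screen before next trial", 3000),
]


def fixed_duration_ms(events):
    """Sum the durations of a list of (label, milliseconds) events."""
    return sum(duration for _, duration in events)


first_trial_ms = fixed_duration_ms(COMMON_EVENTS + FIRST_SENTENCE_TAIL)
second_trial_ms = fixed_duration_ms(COMMON_EVENTS + SECOND_SENTENCE_TAIL)

# Delay from new-word onset to the answer prompt: new-word + fixation point.
answer_delay_ms = sum(
    ms for label, ms in COMMON_EVENTS if label in ("new-word", "fixation point")
)

print(first_trial_ms, second_trial_ms, answer_delay_ms)  # 15000 18000 8000
```

Under these assumptions a first-sentence trial takes 15 seconds and a second-sentence trial takes 18 seconds plus the time spent on the three ratings.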
To avoid biasing our results, participants were not told at any point prior to the start of the experiment that the goal of the study was to assess whether the learning of a new-word and its meaning was intrinsically rewarding. Instead, they were told that the objective of the study was to assess how reading load affects mood and that, in order to ensure that there was a real reading load, they had to learn the words of the M+ condition and to detect the incongruence of the new-words from the M- condition. Finally, participants were told that they had to give pleasantness and arousal ratings when the second sentences appeared because that moment signaled that reading load had already occurred (i.e., half of the encoding block had already elapsed). After the experiment, participants were first questioned about the objective of the study. None of them answered that it was to assess whether word-learning was rewarding.
Approximately 24 hours after the learning session ended, participants returned to the lab to complete a recognition test (note that no drugs were administered to subjects on Day 2). In this test, participants were presented, in a pseudo-randomized order, with all the 28 M+ and 28 M- new-words used during the encoding session. This test was devised in order to assess which of the words learned during encoding were still remembered and which of them had been forgotten after a 24-hour retention period. Before the encoding session, participants were told that they would complete this test. It was made explicit that they would assess both M+ and M- new-words during the test phase. In the test, participants were presented with a new-word at the centre of the screen with two possible meanings below: one on the left and one on the right. If the new-word tested did not have a congruent meaning associated between the first and the second sentence, and thus correct meaning extraction was not possible (M- condition), participants had to press a button located in their left hand.
In this case, the two possible meanings presented served as foils: one was the meaning evoked by the second sentence of the M- new-word being tested; the other word shown was the meaning evoked by another second sentence presented in the same run as the new-word being tested. Instead, if the new-word tested had a consistent meaning through the first and second sentence, and thus correct meaning extraction was possible (M+ condition), participants had to select the correct meaning among the two presented. In this case, one of the two possible meanings was correct and the other, which served as a foil, was the meaning of another new-word presented in the same run. Participants could also press a fourth button if they did not know the answer. Thus chance level was at 25% (no consistent meaning, consistent meaning on the left, consistent meaning on the right, not remembered).

Statistical Analyses for confidence, pleasantness and arousal subjective scales and learning scores for encoding and retrieval
We first assessed whether the results of the placebo session replicated our previous behavioral data (Experiment 3 in Ripollés et al., 2016). Besides the three subjective ratings, for these first comparisons, we used two learning scores: the percentage of words learned on Day 1 (total number of words learned divided by the total number of words presented) and the recognition rate (total number of words learned during encoding and remembered on Day 2 divided by the number of words learned during Day 1). For the analyses regarding the subjective scales, we divided our M+ trials into those in which subjects learned the new-word during the learning session and still remembered it in the 24-hour recognition test (remembered condition) and those in which the new-word was not correctly identified in the 24-hour test (forgotten condition). We used the same approach to divide the M- trials into those in which a word was correctly marked as incongruent during encoding and still correctly rejected after 24 hours and those in which the new-word was not correctly rejected in the follow-up test. To replicate our previous results, we first used paired t-tests to compare whether ratings for confidence, arousal and pleasantness were greater for remembered than for forgotten M+ and M- new-words. We then submitted both the ratings and the learning scores to a mixed repeated-measures ANOVA with Condition (M+, M-) as a within-subjects variable and Group (Pharmacological Group, Exp. 3 in Ripollés et al., 2016) as a between-subjects variable.
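As an illustration of the two learning scores defined above, the following minimal sketch computes them for a single participant and condition. The function name, argument names and the example counts are hypothetical and are not taken from the original analysis scripts.

```python
def learning_scores(n_presented, n_learned_day1, n_remembered_day2):
    """Return (percentage learned on Day 1, recognition rate), both in %.

    n_remembered_day2 counts only words that were learned on Day 1
    and were still remembered in the Day 2 recognition test.
    """
    if n_presented <= 0:
        raise ValueError("n_presented must be positive")
    if not 0 <= n_remembered_day2 <= n_learned_day1 <= n_presented:
        raise ValueError("counts must satisfy remembered <= learned <= presented")
    pct_learned = 100.0 * n_learned_day1 / n_presented
    # The recognition rate is undefined if nothing was learned on Day 1.
    if n_learned_day1 == 0:
        recognition_rate = float("nan")
    else:
        recognition_rate = 100.0 * n_remembered_day2 / n_learned_day1
    return pct_learned, recognition_rate


# Hypothetical participant: 28 M+ new-words presented, 17 learned, 12 remembered.
pct_learned, recognition_rate = learning_scores(28, 17, 12)
print(round(pct_learned, 1), round(recognition_rate, 1))  # 60.7 70.6
```

Because the recognition rate is normalised by the number of words learned on Day 1, it measures retention independently of how many words were learned during encoding.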
Given that the current behavioral results replicate our previous work (see Results) and that in our previous study (Ripollés et al., 2016) remembered M+ words were the trials showing the highest fMRI activity within the SN/VTA-HP loop, the largest physiological response and the highest subjective pleasantness ratings, we focused all analyses of the effect of the pharmacological intervention on the trials in which a word was learned during encoding and still remembered during the recognition test at 24 hours (M+ condition). For the control condition, we used those M- trials in which a word was correctly rejected during both encoding and the follow-up test. As measures of memory effects, we used the total number of words learned during encoding and remembered on the follow-up test, and the percentage of words remembered in the recognition test relative to the number of words learned during the learning phase (i.e., the recognition rate). To control for individual differences, we used the placebo session as a baseline. Thus, for each learning score and subjective scale we calculated the percentage of change from the placebo session [e.g., (levodopa score - placebo score)/(placebo score)]. Therefore, for each participant, learning score and subjective scale, we obtained the percentage of change from placebo for the risperidone and levodopa sessions. We used paired t-tests to assess whether the difference between the changes induced by the risperidone and levodopa sessions was significant.
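The baseline normalization and the paired comparison described above can be sketched as follows. This is a minimal illustration with hypothetical scores, not the original analysis pipeline. Multiplying the ratio by 100 to obtain a percentage is an assumption, and only the t statistic and its degrees of freedom are computed here (a p-value would come from a statistics package).

```python
from math import sqrt
from statistics import mean, stdev


def percent_change_from_placebo(drug_score, placebo_score):
    """(drug - placebo) / placebo, expressed in percent (x100 is an assumption)."""
    return 100.0 * (drug_score - placebo_score) / placebo_score


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical learning scores for three participants.
placebo = [20, 20, 20]
levodopa = [22, 24, 26]
risperidone = [20, 21, 22]

levodopa_change = [percent_change_from_placebo(d, p)
                   for d, p in zip(levodopa, placebo)]
risperidone_change = [percent_change_from_placebo(d, p)
                      for d, p in zip(risperidone, placebo)]
t_value, dof = paired_t_statistic(levodopa_change, risperidone_change)
```

Note that the ratio is undefined when a placebo score is zero, so such cases would need to be handled explicitly.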
For the correlations between the learning scores and the PAS we used Spearman's rho with a p<0.05 FDR correction to account for the 3 different correlations calculated per condition. The PAS was used as a proxy for the degree of pleasure individuals take when engaging in rewarding behavior (Der-Avakian et al., 2012). Note that two participants were excluded from this analysis as they did not complete the PAS. We also correlated the learning scores with a weight-dependent measure of drug dose, calculated in mg of levodopa/risperidone per kilogram. Finally, we used the median PAS value to split our final sample of 24 participants into high and low hedonic groups. For the learning scores, we first calculated the drug effect by subtracting the percentage of change from placebo induced by the risperidone session from that induced by the levodopa session. We then assessed whether the total drug effect for the learning scores differed between the high and low hedonic groups using a non-parametric Mann-Whitney test (to better account for the reduced number of participants in each group).
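The median split and the group comparison of drug effects can be sketched as below, using hypothetical data. Two points are assumptions of this sketch: participants whose PAS score equals the median are assigned to the lower half, and the groups are labelled only by their position relative to the median, since which half corresponds to the "high hedonic" group depends on the scoring direction of the scale. Only the Mann-Whitney U statistic is computed; significance testing is left to a statistics package.

```python
from statistics import median


def drug_effect(levodopa_change, risperidone_change):
    """Levodopa change from placebo minus risperidone change from placebo."""
    return levodopa_change - risperidone_change


def median_split(pas_scores):
    """Label each score relative to the sample median.

    Assumption: scores equal to the median fall in the lower half.
    """
    cut = median(pas_scores)
    return ["above_median" if s > cut else "at_or_below_median"
            for s in pas_scores]


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical data for six participants.
pas = [10, 25, 14, 30, 8, 22]
levodopa_change = [12.0, 30.0, 5.0, 40.0, -3.0, 18.0]
risperidone_change = [2.0, -10.0, 4.0, -5.0, 1.0, -6.0]

effects = [drug_effect(l, r)
           for l, r in zip(levodopa_change, risperidone_change)]
labels = median_split(pas)
upper = [e for e, g in zip(effects, labels) if g == "above_median"]
lower = [e for e, g in zip(effects, labels) if g == "at_or_below_median"]
u_upper = mann_whitney_u(upper, lower)
```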
For significant interactions of mixed between-within ANOVA models, partial eta squared (η²) is provided as a measure of effect size. For significant differences in between-group one-way ANOVAs, eta squared (η²) is provided (calculated by dividing the between-groups sum of squares by the total sum of squares). For significant differences measured with t-tests, Cohen's d is provided after applying Hedges' correction (the average of the standard deviations of the variables being compared was used as a standardizer; Cumming, 2012). For significant differences measured with the Mann-Whitney test, eta squared (η²) is provided (calculated as Z²/N). In addition, confirmatory Bayesian statistical analyses were computed with the software JASP using default priors (JASP Team, 2018; Morey and Rouder, 2015; Rouder and Morey, 2012; Wagenmakers et al., 2018b; Wagenmakers et al., 2018a). We reported Bayes factors (BF10), which reflect how likely the data are to arise from one model compared, in our case, to the null model (i.e., the probability of the data given H1 relative to H0). For comparisons with a strong a priori hypothesis, the alternative hypothesis was specified so that one group/condition was greater than the other (BF+0). We did this, specifically, for the drug-effect comparisons, in which we expected levodopa and risperidone to facilitate and disrupt learning/ratings, respectively, and for the group comparisons, in which we expected more hedonic participants to remember more words than less hedonic participants. For mixed within-between models we used the Bayes inclusion factor based on matched models, representing the evidence for all models containing a particular effect relative to equivalent models stripped of that effect (BFInclusion, also called Baws factor).
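Two of the effect sizes described above can be illustrated with a short sketch. Several details are assumptions rather than statements of the original procedure: the standardizer is taken literally as the arithmetic mean of the two standard deviations, Hedges' correction uses the common approximation 1 - 3/(4·df - 1) with df = n - 1 for a paired design, and Z for the Mann-Whitney test comes from the normal approximation without tie or continuity correction. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_corrected_d(x, y):
    """Cohen's d for paired samples with an approximate Hedges' correction.

    Standardizer: average of the two sample standard deviations.
    Assumption: df = n - 1, where n is the number of pairs.
    """
    n = len(x)
    d = (mean(x) - mean(y)) / ((stdev(x) + stdev(y)) / 2.0)
    df = n - 1
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction


def mann_whitney_z(u, n1, n2):
    """Z from the normal approximation to U (no tie or continuity correction)."""
    mu = n1 * n2 / 2.0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mu) / sigma


def eta_squared_from_z(z, n_total):
    """Effect size for the Mann-Whitney test: Z squared divided by N."""
    return z * z / n_total


# Hypothetical values.
g = hedges_corrected_d([2, 4, 6], [1, 2, 3])
z = mann_whitney_z(9.0, 3, 3)
eta_sq = eta_squared_from_z(z, 6)
```

With samples as small as in this toy example the normal approximation is not reliable; it is used here only to show how the quantities relate.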

ACKNOWLEDGMENTS
We thank the staff of the Centre d'Investigació del Medicament de l'Institut de Recerca HSCSP for their help. The present project has been funded by the Spanish Government

Funding
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.