Elucidating a locus coeruleus-hippocampal dopamine pathway for operant reinforcement

Animals can learn to repeat behaviors to earn desired rewards, a process commonly known as reinforcement learning. While previous work has implicated the ascending dopaminergic projections to the basal ganglia in reinforcement learning, little is known about the role of the hippocampus. Here we report that a specific population of hippocampal neurons and their dopaminergic innervation contribute to operant self-stimulation. These neurons are located in the dentate gyrus, receive dopaminergic projections from the locus coeruleus, and express D1 dopamine receptors. Activation of D1+ dentate neurons is sufficient for self-stimulation: mice will press a lever to earn optogenetic activation of these neurons. A similar effect is also observed with selective activation of the locus coeruleus projections to the dentate gyrus, and blocked by D1 receptor antagonism. Calcium imaging of D1+ dentate neurons revealed significant activity at the time of action selection, but not during passive reward delivery. These results reveal the role of dopaminergic innervation of the hippocampus in supporting operant reinforcement.


In operant learning, animals modify their action repertoires to earn desired rewards. Previous work on the neural substrates of such learning has focused on the striatum and the [...] (Kempadoo et al., 2016), rather than the major dopamine cell groups in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc), which supply dopamine to the basal ganglia (Björklund and Dunnett, 2007; Ikemoto, 2007).

In this study we examined the contribution of dopaminergic signaling in the DG to operant learning and behavior. We found that mice could learn to perform a new action (pressing a lever) for optogenetic activation of D1+ neurons in the DG. In addition, using both optogenetics and in vivo pharmacological manipulations, we found that activation of LC dopaminergic neurons that project to the DG can also produce self-stimulation, and this effect depends on the activation of D1-like receptors. Finally, using in vivo calcium imaging, we found that the activity of D1+ DG neurons was more closely related to goal-directed actions than to passive reward, and that these neurons play a prominent role in operant behavior.

To understand the role of D1+ neurons in the hippocampus, we tested whether selective stimulation of these neurons can reinforce operant behavior using a self-stimulation paradigm. We injected either a Cre-dependent channelrhodopsin (AAV5-DIO-ChR2) or a fluorescent control (DIO-eYFP) into D1-Cre mice (D1::ChR2 DG or D1::eYFP DG), producing selective expression of the excitatory opsin in D1+ neurons in the dentate gyrus (Figure 1A-B). Mice received photostimulation (500 ms, 20 Hz, 15 ms pulse width) following lever pressing on a fixed ratio schedule of reinforcement (Figure 1C). All D1::ChR2 DG mice learned to press a lever for stimulation, whereas control mice did not (Figure 1D). These results suggest that D1::ChR2 DG stimulation is sufficient to reinforce lever pressing. Interestingly, this form of self-stimulation is remarkably resistant to extinction, persisting after 8 days without any photostimulation.

Next, using retrograde tracing methods, we mapped projections to the DG (Figures 2A-D & Supplementary Figure 1). We confirmed significant LC projections to the DG, but we did not find significant VTA or SNc projections (Figure 2E, H [...] there was no labeling in the VTA (Figure 2H & Supplementary Figure 1). This finding suggests that the DG receives TH+ projections from the LC rather than the VTA.

We then tested whether the LC-DG projection is responsible for the self-stimulation effect observed. To manipulate the LC-DG pathway selectively, we injected AAV-Retro2-Cre into the DG and a Cre-dependent ChR2 (AAV5-DIO-ChR2) into the LC (Figure 3A-B). We found that ChR2 DG-LC (N = 6) mice also showed self-stimulation comparable to that produced by stimulation of D1::ChR2 DG neurons (Figure 3C).

To verify that our self-stimulation effects were not due to activation of LC collaterals in other regions, we performed local infusions of antagonists (Figure 4A-C, N = 5). Infusions of a D1 antagonist into the DG significantly impaired self-stimulation, whereas propranolol produced no significant group differences in self-stimulation (Figure 4D-E). These results suggest that the reinforcing effects of LC-DG stimulation are due to the activation of D1 receptors by dopamine, rather than by norepinephrine.

Based on our self-stimulation results, we hypothesized that DG D1+ neurons may be preferentially activated during operant conditioning in general, including during actions that result in natural rewards rather than simply optogenetic stimulation of the LC-DG pathway. To test this, we performed in vivo calcium imaging of DG D1+ neurons while mice performed an operant lever pressing task with food reward. We implanted a gradient index lens above the DG in D1-Cre mice (N = 5) and injected them with a Cre-dependent calcium indicator (AAV9-syn-FLEX-jGCaMP7f) (Figure 5A-B).

We then recorded calcium transients from DG D1+ neurons during operant lever pressing for food rewards on fixed-ratio (FR) schedules (FR1, FR3 and FR5) (Figure 5A-C). We found distinct populations of DG D1+ neurons that were modulated by lever pressing. To see if the neural activity is action-contingent, we also used a control task in which pressing is not required. The reward was delivered non-contingently every 20 seconds, preceded by one second of white noise. On this task, there were far fewer significantly modulated DG D1+ neurons (N = 6, 3% of total population) compared to the operant task (Figure 5F, Table 2). To verify that the virus targets D1+ neurons in the DG, we quantified the percentage of virally targeted neurons that express D1 receptors. Using RNAscope, we found that jGCaMP7f was colocalized with D1 receptors (Figure 6).

To determine if the activity of these neurons reflected the spatial location of the lever pressing or the action of lever pressing itself, we used a discrete trial design with two levers (Figure 7). On each trial, one of the two levers was randomly selected to extend into the operant box. Once pressed, the lever would retract. The reward would then be delivered one second later.

This task allowed us to compare the neural activity modulated by lever pressing and reward, as well as determine the spatial tuning of the same neurons. We found several populations of dentate D1+ neurons (n = 40, 16.5% of total population) that were significantly modulated at the time of lever pressing (Figure 7C). One small population was modulated by reward delivery (n = 14, 4.9% of total population). Importantly, another, larger population was responsive to lever pressing at either lever location (Figures 7C-D; n = 22, 9.79% of total population). These neurons were not spatially selective, as they were responsive when the lever was presented at different locations. However, we did find a small population that responded to only a single lever (Figure 7C; left lever: n = 15, right lever: n = 12; 4.2% of total population).

It is difficult to assess spatial activity of neurons in operant tasks, as animals do not cover the arena equally but instead preferentially occupy specific task-relevant locations (Supplementary Figure 2). To examine the stability of spatially related activity during operant conditioning, we used an FR5 task with two levers (Supplementary Figure 3). We then used the methods described in Skaggs et al. (1993) to identify spatially modulated neurons, and split the sessions into periods when the left or right lever was active. This allowed us to recalculate the center of mass of our identified spatial firing fields with two different lever locations. As mice mostly stayed close to the wall where the two levers and food port were located, we limited our analysis to the x-dimension, which explained most of the variance in the neural activity. We found that the place field centers were significantly different when the lever is available. While we did find that occupancy varied in this switch task, the occupancy across other tasks was consistent, suggesting that spatial modulation does not depend on the type of task (e.g. operant vs Pavlovian) (Supplementary Figure 2). In contrast, task-related neural activity in the DG depends on whether the task is action-contingent.

In order to characterize projections to the dentate gyrus we injected 50 nL of AAV(retro2).hSyn.EF1α.Cre.WPRE into each hemisphere of the DG of Ai14 mice (four mice x two hemispheres, N = 8; 2 females and 2 males). We then processed the slices and acquired images as described above. We opened the raw images taken from the Axio Imager V16 upright microscope (Zeiss) in Fiji to quantify the number of cells from a single coronal brain slice using 8-bit confocal images. A threshold was set to identify the neuronal cell bodies. The function "fill holes" was then used to remove possible empty space within the selected cells. After converting the image to a mask, we ran the "Analyze Particles" plug-in in Fiji to count the cells in each image. The masks counted with the Analyze Particles function were then used to determine the number of co-localizing cells using the "Colocalization Threshold" plug-in in Fiji.

[...] (4 male, 1 female) were used to isolate the neurotransmitters released from the LC into the DG.

LC and DG surgeries were the same as for the pathway-specific manipulations, with the addition of cannulas (P1 Technologies; AP: -2.00 mm relative to bregma, ML: ±1.8 mm relative to bregma, DV: -1.4 mm, at a 10 degree angle). All optic fibers were secured in place with dental acrylic adhered to skull screws. Mice were group housed and allowed to recover for one week before [...]

For self-stimulation experiments a single lever was inserted at the start of the session. For each lever press animals received 500 ms of stimulation at 20 Hz (15 ms pulse width, 5 mW power). Animals were trained with 30 minute sessions. Animals were tested for 32 consecutive days, and received 8 sessions each of FR1, FR3, and FR5, and then 8 extinction sessions. To test for extinction, the lever was inserted but no stimulation was delivered for lever presses.
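The stimulation and training parameters above imply some simple arithmetic that is easy to check in code. The following is a minimal sketch, not the authors' control software: the class and function names are hypothetical, and it assumes the first pulse of each train starts at t = 0 and that pulses are evenly spaced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseTrain:
    """One photostimulation train; default values are taken from the text."""
    duration_ms: int = 500     # train length per reinforcement
    frequency_hz: int = 20     # pulse frequency
    pulse_width_ms: int = 15   # width of each light pulse

    def onsets_ms(self) -> list:
        """Pulse onset times, assuming the first pulse starts at t = 0."""
        period_ms = 1000 // self.frequency_hz
        return list(range(0, self.duration_ms, period_ms))

    def light_on_ms(self) -> int:
        """Total time the light is on during one train."""
        return len(self.onsets_ms()) * self.pulse_width_ms


def session_schedule(sessions_per_phase: int = 8) -> list:
    """Daily schedule: 8 sessions each of FR1, FR3, FR5, then extinction."""
    phases = ["FR1", "FR3", "FR5", "extinction"]
    return [phase for phase in phases for _ in range(sessions_per_phase)]


train = PulseTrain()
schedule = session_schedule()
print(len(train.onsets_ms()), train.light_on_ms(), len(schedule))
```

Under these assumptions each train contains 10 pulses with the light on for 150 ms in total, and the four phases of 8 sessions account for the 32 consecutive testing days.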

Drug injections

Even low doses of DA antagonists are known to reduce self-stimulation, but low doses of NE antagonists have no effect on self-stimulation behavior (Rolls et al., 1974). In the present study, we selected doses that minimized effects on movement or arousal. The same mice used above were retrained after extinction (3 FR1 sessions) to press a lever for stimulation of LC neurons that projected to the dentate gyrus. They were then tested with DA or NE antagonists [...]

Fixed-ratio training with food reward (calcium imaging)

The behavioral tests used for calcium imaging were performed while the mice were food deprived to 85% of their free-feeding weight. There were 6 tasks used, with three days of testing for each task, for a total of 18 testing days. Each imaging session lasted approximately 10 minutes. Five mice were trained on a fixed ratio (FR) 1 task where they received a pellet for pressing a lever. Subsequently, animals were moved to an FR3, and then FR5.

To examine the interaction of spatial location and lever pressing we used an FR5 switch task, in which a single lever (Supplementary Figure 3) was inserted into an operant chamber, and 5 responses resulted in a pellet. Following 5 pellets (25 presses), the first lever retracted, and a second lever was inserted. The mouse had to move to the other side of the food cup and press the other lever to earn a food reward.

In order to dissociate reward delivery and action production we used an FR1 schedule of reinforcement where the reward was delayed one second after the press (Two-lever FR1 delay task). We used two levers in this task. One lever was inserted on a given trial. If pressed, the lever would retract, and a food pellet was delivered 1 s later. After a 2 s inter-trial interval, one of the two levers was randomly inserted.
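The trial structure of the two-lever FR1 delay task can be written out as an event timeline, which makes the separation between press and reward explicit. This is an illustrative sketch only: the function name and the press latencies are hypothetical, and it assumes that the inter-trial interval begins at pellet delivery, which the text does not specify.

```python
import random

REWARD_DELAY_S = 1.0  # pellet delivered 1 s after the press (from the text)
ITI_S = 2.0           # inter-trial interval (from the text)


def simulate_trials(press_latencies_s, seed=0):
    """Return (time_s, event) tuples for a two-lever FR1 delay session.

    press_latencies_s: hypothetical time from lever insertion to press,
    one value per trial. Assumes the ITI starts at pellet delivery.
    """
    rng = random.Random(seed)
    t = 0.0
    events = []
    for latency in press_latencies_s:
        lever = rng.choice(["left", "right"])  # one lever chosen at random
        events.append((t, f"{lever}_lever_inserted"))
        t += latency
        events.append((t, f"{lever}_lever_pressed_and_retracted"))
        t += REWARD_DELAY_S
        events.append((t, "pellet_delivered"))
        t += ITI_S
    return events


events = simulate_trials([1.5, 0.5])
for time_s, event in events:
    print(f"{time_s:5.1f} s  {event}")
```

Because the pellet always follows the press by a fixed 1 s, activity aligned to the press and activity aligned to reward delivery can be examined in separate time windows.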

Following the fixed ratio testing, mice received a reward at fixed intervals (20 s), preceded by 1 second of white noise (Non-contingent reward). The animals were not required to press a lever to receive the reward. [...]

The baseline was defined as the window from -3000 to -1000 ms relative to the event. If the calcium activity was 99% above baseline for 3 consecutive 100 ms bins, this was considered a significant increase in calcium activity. For tasks that involved two levers we calculated the percent of neurons that were significantly modulated by either lever, and then excluded these neurons from our analysis of modulation by a specific lever. We identified spatially tuned cells by computing the spatial information contained in the calcium transients, compared with shuffled [...]

[...] Coeruleus; scp, superior cerebellar peduncle; DAPI, 4′,6-diamidino-2-phenylindole. **** p < [...]
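The event-locked significance criterion described above (a baseline window from -3000 to -1000 ms and 3 consecutive 100 ms bins above threshold) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, "99% above baseline" is read here as exceeding a one-sided 99% normal bound (baseline mean + 2.326 SD), and every bin after the baseline window is tested. The original analysis may have defined the threshold or the test window differently.

```python
from statistics import mean, stdev

BIN_MS = 100
Z_99_ONE_SIDED = 2.326  # assumption: one-sided 99% bound of a normal distribution


def is_significantly_increased(binned, first_bin_start_ms, n_consecutive=3):
    """Test one neuron's event-aligned activity for a significant increase.

    binned: mean calcium activity in consecutive 100 ms bins.
    first_bin_start_ms: start of binned[0] relative to the event (e.g. -3000).
    """
    def index_of(t_ms):
        return (t_ms - first_bin_start_ms) // BIN_MS

    # Baseline window from the text: -3000 to -1000 ms relative to the event.
    baseline = binned[index_of(-3000):index_of(-1000)]
    threshold = mean(baseline) + Z_99_ONE_SIDED * stdev(baseline)

    run = 0
    # Assumption: all bins after the baseline window are tested.
    for value in binned[index_of(-1000):]:
        run = run + 1 if value > threshold else 0
        if run >= n_consecutive:
            return True
    return False


# Synthetic traces starting at -3000 ms: 20 baseline bins, 10 pre-event bins,
# then 10 post-event bins.
responsive = [0.0, 0.2] * 10 + [0.1] * 10 + [1.0] * 3 + [0.1] * 7
isolated = [0.0, 0.2] * 10 + [0.1] * 10 + [1.0, 1.0, 0.1, 1.0] + [0.1] * 6
print(is_significantly_increased(responsive, -3000))
print(is_significantly_increased(isolated, -3000))
```

Requiring consecutive bins above threshold guards against single-bin fluctuations being counted as modulation, which is why the second synthetic trace, with at most two adjacent high bins, does not pass.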