Cell-Type Specific Responses to Associative Learning in the Primary Motor Cortex

The primary motor cortex (M1) is known to be a critical site for movement initiation and motor learning. Surprisingly, it has also been shown to possess reward-related activity, presumably to facilitate reward-based learning of new movements. However, whether reward-related signals are represented among different cell types in M1, and whether their response properties change after cue-reward conditioning remains unclear. Here, we performed longitudinal in vivo two-photon Ca2+ imaging to monitor the activity of different neuronal cell types in M1 while mice engaged in a classical conditioning task. Our results demonstrate that most of the major neuronal cell types in M1 showed robust but differential responses to both cue and reward stimuli, and their response properties undergo cell-type specific modifications after associative learning. PV-INs’ responses became more reliable to the cue stimulus, while VIP-INs’ responses became more reliable to the reward stimulus. PNs only showed robust response to the novel reward stimulus, and they habituated to it after associative learning. Lastly, SOM-IN responses emerged and became more reliable to both conditioned cue and reward stimuli after conditioning. These observations suggest that cue- and reward-related signals are represented among different neuronal cell types in M1, and the distinct modifications they undergo during associative learning could be essential in triggering different aspects of local circuit reorganization in M1 during reward-based motor skill learning.


[...] 2016). In human subjects, reward has also been shown to modulate M1 activity, likely through an inhibitory circuit dependent mechanism (Thabit et al., 2011). However, it remains unclear how reward-related responses are represented in M1, and if the representation changes with associative learning.
It was recently shown that in well-trained mice performing a skilled reaching task, a subset of layer 2/3 (L2/3) pyramidal neurons (PNs) in M1 specifically report successful, but not failed, reach-and-grasp movements. In contrast, a different subset of PNs report only failed reach-and-grasp movements (Levy et al., 2020). Since the ability to use past experience to learn action-outcome associations is critical to survival, encoding the outcome in M1 may be an important part of motor skill learning. It is widely accepted that associative learning using reinforcement can accelerate and enhance learning (Abe et al., 2011; Nikooyan & Ahmed, 2015). In the case of motor learning, studies have demonstrated that positive feedback (reward) facilitates motor memory retention and negative feedback (punishment) speeds up the learning process (Galea et al., 2015). One hypothesis is that during learning, reward signals in the brain, together with neuromodulators and synaptic plasticity, are involved in potentiating and optimizing the neural circuitry in M1 that underlies the rewarded movement. Implementing such a learning process would necessitate the interplay between different cell types within the local microcircuitry (Richards et [...]

[...] over local network activity and provide a potential mechanism for how the brain processes reward signals and ultimately uses this information to optimize neural activity related to learned motor skills.

[...] control over PN frequency tuning (Seybold et al., 2015). Moreover, in the auditory cortex, prefrontal cortex, and basolateral amygdala, reinforcement signals such as reward [...]

[...] cue- and reward-related signals are represented among major neuronal cell types in M1, and they undergo cell-type specific modifications during associative learning, indicating they may have distinct roles in integrating reinforcement signals to promote circuit reorganization in M1 during motor skill learning.

RESULTS
To understand how reward-associated signals are represented within the local microcircuitry in M1 before and after associative learning, we established a head-fixed auditory cued reward conditioning task, which allowed us to combine the task with in vivo two-photon Ca2+ imaging to examine the response properties of different neuronal cell type populations in awake and behaving mice (Figure 1A). In this task, water-restricted mice were exposed to a conditioned stimulus (CS; auditory tone, 1 s duration), followed by a 1.5 s delay and then the delivery of the unconditioned stimulus (US; water reward, ~10 µL). Mice were trained for ~30-35 trials/session (1 session/day for 7 days) with a randomly varied inter-trial interval (ITI) between 60 and 120 s (Figure 1B). Since M1 is known to be involved in movement initiation and motor skill learning, we chose to use a simple classical conditioning task with just an auditory tone paired with reward and omitted any additional training where mice would be required to learn a new movement. [...] reliably attribute changes in neuronal activity over the course of the task to associative learning, rather than to motor learning.

Mice learned to associate the CS with the reward after 7 days, shown by an increase in anticipatory lick rate, a conditioned response, following the tone onset on day 7 compared to day 1 (Figure 1C). On a trial-by-trial basis, anticipatory lick rate did not change significantly within a single session on either day 1 or day 7, implying limited within-session improvements (Figure 1D). [...] demonstrating the mice effectively learned the CS-reward association by day 7 (Figure 1E, F).

To investigate the activity of different neuronal cell types during this task, we used in vivo two-photon Ca2+ imaging of different cell type populations.
To target PNs in M1, we injected an adeno-associated virus (AAV) carrying a Ca2+ indicator (GCaMP6f) driven by the CaMKII promoter (AAV1.CaMKIIa.GCaMP6f) into M1 of wildtype B6129S mice (Figure 2A). After 3-5 weeks, we recorded the activity of hundreds of L2/3 PNs using two-photon microscopy in awake mice while they underwent the head-fixed conditioning task, and we tracked the same population of neurons on day 1 and day 7. We identified all the active neurons within a session, irrespective of the behavioural task (see Methods), and sorted neurons by the timing of their peak activity relative to the CS onset. It was apparent that there were subpopulations of neurons more responsive to the CS, the reward, or both (Figure 2B, C). We also repeated the experiments to examine whether the major IN subtypes in M1 respond to the CS and reward during the conditioning task. To do this, we injected AAV-Syn-Flex-GCaMP6f in PV-Cre, SOM-Cre or VIP-Cre transgenic mice to selectively express GCaMP6f in PV-INs, SOM-INs, or VIP-INs, respectively, and then performed in vivo two-photon Ca2+ imaging to monitor the response properties of the same population of INs on day 1 and on day 7 after associative learning. We also compared the mean percentage of active cells within the entire session and confirmed that all cell types had a similar proportion of active cells (irrespective of the behavioural task) on day 1 and day 7 (Figure 2D).

To examine task-related activity in each cell type, we first compared the mean percentage of active cells during the CS and reward to a null distribution made by randomly sampling the session irrespective of the behavioural task, and then calculating the mean percentage of active neurons during the sampled period. By repeating this 1,000 times for each cell type on day 1 and day 7, we created a distribution of the percentage of active neurons that were present at baseline levels or by chance.
[...] Figure 2E); in contrast, PN responses to the reward were significantly higher than the null distribution on day 1 but significantly lower on day 7 (Day 1: 23.95 ± 2.49%, Day 7: 12.12 ± 0.95%; Figure 2E). Lastly, SOM-INs showed significant responses to the CS and reward only on day 7 following associative learning, while on day 1, they demonstrated no response to the CS and a modest response to the reward (CS: Day 1: 5.53 ± 2.7%, Day 7: 13.56 ± 3.17%; Reward: Day 1: 9 ± 3.66%, Day 7: 12.15 ± 3.93%; Figure 2H). Based on these findings, we decided to only examine the sessions where the percentage of active cells was significantly greater than the null distribution in our subsequent analysis, as a non-significant percentage of active cells during the stimulus period cannot be readily distinguished from non-task-related baseline noise.

We began our analysis with PV-INs and VIP-INs because they both showed significant responses to both CS and reward on day 1 and day 7. To understand how their representations of reward and reward-associated cues changed over the course of learning, we first analyzed the tuning of individual cells to identify their response properties in an unbiased manner. By quantifying the tuning of each cell's average response during the CS and reward response periods (2.5 s window) using the non-parametric Spearman correlation ρ (see Methods), we observed a wide range of tuning coefficients to the CS and reward, with a small proportion that was strongly positively or negatively tuned to the CS or reward stimulus (tuning coefficient near -1 or 1; Figure 3A-D), consistent with our earlier analyses demonstrating that neurons in M1 show activity associated with the CS or reward during the conditioning task. We next examined whether the tuning coefficient changes within each cell type after associative learning by calculating the changes in tuning coefficients for each cell between day 1 and day 7.
Again, to validate our findings, we compared these values to a null distribution of Δρ values obtained by randomly sampling the two sessions (see Method Details). The PV-IN population did not show any significant changes in either CS or reward tuning between day 1 and 7 (Δρ_tone = −0.049 ± 0.046, Δρ_reward = 0.014 ± 0.054; Figure 3E, F), indicating that neither CS- nor reward-related tuning became stronger after associative learning. In contrast, VIP-INs' CS tuning did not change significantly between day 1 and 7 (Δρ_tone = −0.065 ± 0.048), but VIP-INs' reward tuning significantly increased on day 7 (Δρ_reward = 0.161 ± 0.086; Figure 3G, H), suggesting a strengthening of VIP-IN responsivity to reward following associative learning.

Although the tuning properties can reveal changes in task-related responsivity, they are limited in identifying changes at the trial-by-trial level. When we assessed population activity following the CS onset (Figure 4A), it was apparent that a group of PV-INs and VIP-INs were CS responsive on both day 1 and day 7 (Figure 4B, 5B). Hence, by identifying and tracking the same neurons from day 1 to day 7, we were able to ask if there was (1) an increase in the number of neurons being recruited as CS- or reward-responsive during associative learning or (2) a change in the trial-by-trial reliability of CS and reward responses. When we compared the mean percent of CS-responsive neurons on day 1 and day 7, we found that the average percent of CS-responsive PV-INs during a trial increased significantly by day 7 (Day 1: 15.26 ± 2.11%, Day 7: 24.09 ± 2.98%; Figure 4C), while the percent of CS-responsive VIP-INs did not change (Day 1: 11.29 ± 3.23%, Day 7: 16.59 ± 2.01%; Figure 4D), demonstrating that more PV-INs became responsive to the CS after associative learning.
We then assessed the reliability of the responses, defined as the percent of trials within a session where a neuron was responsive to the CS. This measure quantifies how consistently a neuron responded to the CS within a session. We first plotted the cumulative distribution function of reliabilities among all PV-INs and VIP-INs. We observed that PV-INs, as a population, were significantly more reliable in their CS responses than VIP-INs on day 1 (Figure 4E). Next, we grouped neurons into 'High Reliability' if they were among the top 50th percentile, while neurons in the bottom 50th percentile were deemed 'Low Reliability'. By dividing the neurons into High and Low Reliability groups and following them from day 1 to day 7, we could examine whether a neuron's initial response in the naïve state was subsequently changed by associative learning. We found that PV-INs that began as highly reliable maintained their reliability to the CS (Day 1: 29.8 ± 1.51%, Day 7: 33.87 ± 4.72%), while PV-INs that began as low reliability became significantly more reliable (Day 1: 8.47 ± 0.46%, Day 7: 18.99 ± 3.76%; Figure 4F). In contrast, the reliability of both High and Low Reliability VIP-INs did not change (High Reliability: Day 1: 26.55 ± 2.62%, Day 7: 25.93 ± 3.81%; Low Reliability: Day 1: 6.32 ± 0.76%, Day 7: 14.24 ± 2.6%; Figure 4G). Together, these results show that as a population, more PV-INs became responsive to the CS, and their responses also became more reliable following associative learning.

[...] Reliability: Day 1: 10.58 ± 0.74%, Day 7: 19.59 ± 3.08%; Figure 5F). [...] Figure 5G). Altogether, while the proportion of reward-responsive VIP-INs during a given trial did not change, a subset of VIP-INs that were largely unresponsive to reward on day 1 became more reliably responsive on day 7.
Although PV-INs and VIP-INs were the only cell types that were significantly responsive to both CS and reward on both day 1 and day 7, PNs and SOM-INs also had significant responses to specific stimuli on certain days. While PNs did not show significant CS responses when compared to baseline, their reward responses on day 1 were significantly above the null distribution, and they became significantly lower than the null distribution on day 7 (Figure 2E). This result is in line with the change in tuning coefficient (Δρ_reward), which showed a significant decrease in reward tuning between day 1 and 7 (Δρ_reward = −0.141 ± 0.067; Figure 6A). Moreover, the cumulative distribution function of PN reliability also shifted significantly to lower reliabilities on day 7 compared to day 1 (Figure 6B). These results indicate that PNs initially respond to novel reward; however, they habituate to the reward following associative learning.

SOM-INs initially had no response to the CS on day 1, but their responses became significant on day 7 (Figure 2H). The change in CS tuning coefficient (Δρ_tone) was not significant (Δρ_tone = 0.059 ± 0.031; Figure 6C), suggesting their responsivity did not change with learning. Interestingly, when we assessed the change in reliability to the CS between day 1 and day 7, the cumulative distribution function shifted significantly to higher reliability values on day 7 (Figure 6C, D). Notably, by day 7, there was a visible reduction in the number of SOM-INs with 0% reliability to the CS compared to day 1, indicating that many cells that were completely unresponsive to the tone on day 1 became responsive by day 7. Finally, SOM-INs showed modest but significant responses to reward on day 1 and 7 (Figure 2H). When we assessed the reward tuning among the SOM-IN population, Δρ_reward did not show a significant change between day 1 and 7 (Δρ_reward = 0.017 ± 0.040; Figure 6E).
However, SOM-IN reliability also shifted to higher values on day 7 (Figure 6F). Altogether, these results suggest that in naïve mice, SOM-INs are unresponsive to the neutral CS and modestly responsive to the novel reward stimulus; however, following associative learning, SOM-INs become more reliably responsive to both the CS and reward.

Lastly, reward consumption requires innate tongue movements during licking, and since microstimulation of mouse M1 has been shown to evoke tongue and jaw movements (Komiyama et al., 2010), it is crucial to distinguish whether the observed CS and reward responses resulted from the task-related stimuli or if the activity is simply associated with licking movements. We demonstrated earlier that head-fixed mice learned the CS-US association by displaying the conditioned response (anticipatory licking) following the CS on day 7 (Figure 1). To address this potential confound, we identified all the self-initiated licking bouts during ITIs, when no reward was present (Figure 7A-C). We assessed all the significantly active cell types on day 1 and day 7 (identified in Figure 2) and calculated the response reliability index of all the active neurons during ITI lick bouts and compared them to the response reliability index for the CS and reward. On both day 1 and day 7, all cell types exhibited lower reliability index values for the ITI lick bouts compared to the CS and reward, indicating that the increase in task-related responses following water rewards was specific to the reward stimulus, and not licking movements (Figure 7D-J). These results suggest that the cell-type specific modifications observed between day 1 and 7 were not simply caused by licking movements.

[...] M1 is still unclear.
Using chronic two-photon Ca2+ imaging, combined with transgenic mouse lines and viral strategies to target different neuronal cell types, we demonstrated that during a conditioning task, all major cell types in M1 responded to either the CS, the reward stimulus, or both. Most notably, each cell type underwent cell-type specific modifications after associative learning. By tracking the same population of neurons before and after associative learning, we revealed that the CS-responding population increased among PV-INs, and individual cell responses to the CS also became more reliable following associative learning. In contrast, VIP-INs became more reliable to reward. Additionally, PNs had a drastically reduced response to reward, while SOM-INs became more reliable to both the CS and the reward. Our findings suggest that each cell type has a distinct role in processing information related to the cue-reward association in M1, and they may work together to provide the reinforcement signals in M1 that are important for motor skill learning.

Previous studies in trained rhesus monkeys performing a joystick center-out task have shown a widespread representation of reward anticipation and reward-related activity among cortical neurons in M1 (Ramakrishnan et al., 2017). Consistent with earlier work, we also observed reward-related activity in all four major cell types in M1, even in naïve mice on day 1 when they were first exposed to the CS and reward. It has been reported that in sensory cortices, repeated passive exposure to a sensory stimulus leads to a long-lasting reduction in PN responsivity, but when animals are engaged in learning, PNs maintain their responsivity to the repeated stimulus ( [...]

[...] processing CS- and reward-related information in M1.

[...] bouts with no water reward present, compared to the mean reliability index following reward timing. All cell types were more reliably responsive during the reward period than during licking movement alone. Only sessions/cell-types with significant reward-related responses were analyzed.

ACKNOWLEDGEMENTS:
We thank the members of the Chen lab for discussions and providing feedback on the manuscript. This work was supported by grants for S. [...]

[...] injected. All injections were performed at a rate of 10 nl/min and the pipette was left in place for 4 minutes following the injection to avoid backflow. A glass imaging window was then implanted over the craniotomy and sealed with dental cement. Following surgery, a subcutaneous injection of dexamethasone (2 mg/kg) and buprenorphine (0.1 mg/kg) was given. Mice were given a minimum of 1 week to recover prior to beginning water restriction.

Auditory Cued Reward Conditioning Behaviour
Mice were gradually water restricted down to ~1 ml per day (~80% of original body weight) over two [...] simultaneous two-photon imaging and exposed to the conditioned stimulus (a constant auditory tone, 1 s in duration) followed by a 1.5 s delay period and a water reward (~10 µl). All lick times were measured by an infrared beam lick-o-meter and logged using the data acquisition software WaveSurfer (https://wavesurfer.janelia.org/). The inter-trial interval between the previous water reward and subsequent tone onset was randomly varied between 60 and 120 s. Each session was one hour in duration with 30-35 trials in total. Mice underwent one session per day for seven consecutive days. Two-photon calcium imaging was performed simultaneously on day 1 and day 7 of the behavioural task.

To assess licking behaviour, lick rate (number of licks per second, measured as infrared beam breaks) was calculated within 500 ms bins, then averaged across all trials within a session for each mouse. Lick rate was then averaged across mice. Mean anticipatory lick rate was calculated as the mean lick rate from the time of tone onset to the end of the delay period (2.5 s in duration), not including the reward delivery. Mean ITI lick rate was calculated from the lick rate during the first 2.5 s of self-initiated spontaneous lick bouts. ITI lick bouts were defined as licking events that followed the previous trial by at least 20 s and preceded the subsequent trial by more than 2.5 s. Mean reward lick rate was calculated from the lick rate from the time of reward delivery to 2.5 s after.

All trials within a session were included in the lick rate analysis in Figure 1. To ensure behavioural consistency across trials, only trials with at least 3 lick responses within 2.5 s of the reward delivery time were included in all analyses of neural responses.
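The lick-rate calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names, the representation of licks as a list of timestamps in seconds, and the handling of bin edges are all assumptions made for the example.

```python
from statistics import mean

BIN_S = 0.5    # 500 ms bins, as stated in the text
TONE_S = 1.0   # tone duration (s)
DELAY_S = 1.5  # delay between tone offset and reward (s)


def binned_lick_rate(lick_times, start, stop, bin_s=BIN_S):
    """Lick rate (licks/s) in consecutive bins covering [start, stop)."""
    n_bins = int(round((stop - start) / bin_s))
    counts = [0] * n_bins
    for t in lick_times:
        idx = int((t - start) // bin_s)
        if 0 <= idx < n_bins:  # ignore licks outside the window
            counts[idx] += 1
    return [c / bin_s for c in counts]


def anticipatory_lick_rate(lick_times, tone_onsets):
    """Mean lick rate from tone onset to the end of the delay (2.5 s),
    excluding reward delivery, averaged across trials (hypothetical helper)."""
    window = TONE_S + DELAY_S
    per_trial = [
        mean(binned_lick_rate(lick_times, t0, t0 + window))
        for t0 in tone_onsets
    ]
    return mean(per_trial)
```

The same `binned_lick_rate` helper could be pointed at the 2.5 s after reward delivery, or at the first 2.5 s of an ITI lick bout, to obtain the other two lick-rate measures.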

Calcium Imaging and Analysis
In vivo imaging was performed using a commercial two-photon microscope (B-scope, Thorlabs, Newton, NJ, USA) and a 16x water immersion objective (Nikon) with excitation at 925 nm (InSight X3, Spectra-Physics, Milpitas, CA, USA) at a frame rate of 30 Hz. Images were taken at 512 x 512 pixels covering 755 by 650 µm.

Images were corrected for movement in the x and y plane using full-frame cross-correlation image alignment (TurboReg plug-in for ImageJ; Thévenaz, Ruttimann, & Unser, 1998). The entire session was visually inspected and regions of interest (ROIs) were manually drawn on neurons using a custom MATLAB program, described in Peters et al. (2014). The ROI template from day 1 was loaded onto day 7 and aligned along the x and y plane. Only neurons that could be tracked from day 1 to day 7 were included in the dataset.

[...] identify significant activity events for each neuron and then excluded ROIs with no significant activity events within the session, irrespective of the behaviour. For each neuron, the ΔF trace was circularly shifted by a random integer 1,000 times and compared to the original trace. If the original ΔF trace was greater than the shifted data for at least 5 consecutive frames in at least 950 iterations, this was considered an active event. If a neuron did not have at least one active event in the entire session, irrespective of the behaviour, it was removed from the dataset. This only accounted for a small proportion of ROIs, as most of them were active on both day 1 and day 7, as shown in Figure 2D.

For all subsequent analyses, a modified z-score, adapted from Kato et al. (2015), was applied to ΔF. The z-score was calculated as Z = (f(t) − µ)/σ, where f(t) is the ΔF trace for a neuron, and µ and σ are the mean and standard deviation, respectively, of the neuron's ΔF during the baseline period. The baseline period was a concatenation of the 2.5 s preceding the tone onset (start of a trial) for all trials within a session.
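As an illustration of the two steps above, the sketch below shows one possible reading of the circular-shift test and of the baseline z-score. It is not the authors' MATLAB code. The function names are hypothetical, the per-frame interpretation of "greater than the shifted data in at least 950 iterations" is an assumption, and the population standard deviation is used because the text does not say which estimator was applied.

```python
import random
from statistics import mean, pstdev


def has_active_event(trace, n_shifts=1000, min_wins=950, min_frames=5, rng=None):
    """True if the trace beats its circularly shifted copies in at least
    `min_wins` of `n_shifts` shuffles for `min_frames` consecutive frames."""
    rng = rng or random.Random(0)
    n = len(trace)
    wins = [0] * n
    for _ in range(n_shifts):
        k = rng.randrange(1, n)  # random non-zero circular shift
        for i in range(n):
            if trace[i] > trace[(i - k) % n]:
                wins[i] += 1
    run = 0
    for w in wins:
        run = run + 1 if w >= min_wins else 0
        if run >= min_frames:
            return True
    return False


def baseline_zscore(trace, tone_onset_frames, fps=30, baseline_s=2.5):
    """Z = (f(t) - mu) / sigma, with mu and sigma taken from the concatenated
    pre-tone baseline windows of all trials in the session."""
    n_base = int(round(baseline_s * fps))
    baseline = []
    for onset in tone_onset_frames:
        baseline.extend(trace[max(0, onset - n_base):onset])
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        raise ValueError("flat baseline: z-score undefined")
    return [(f - mu) / sigma for f in trace]
```

A neuron would be kept only if `has_active_event` returns True for its session, and its ΔF trace would then be passed through `baseline_zscore` before the trial-by-trial analyses.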

Calculation of Tuning Coefficients
We quantified the tuning of individual neurons to the tone and reward stimuli delivered in our classical conditioning task using the non-parametric Spearman correlation (scipy.stats.spearmanr) between the trial-averaged fluorescence and the timing of stimulus delivery [...] the number of trials, and ρ is the Spearman correlation coefficient. Analysis was carried out with a baseline of 2 s and a post period of 6 s. We considered the "tone" period to range from the start of the tone to the start of reward delivery (that is, tone onset plus the tone and delay durations), and the "reward" period to be the first 2.5 seconds of reward delivery (see schematic in Figure 4A, 5A). We used the change in ρ from day 1 to day 7 as a cell-resolved measure of changes in tuning over the course of learning.

To summarize learning-associated changes in tuning, we calculated the mean change in the Spearman correlation (Δρ) for each cell type and trial component (tone or reward) from day 1 to day 7 [...], where ℳ is the set of mice used in the experiment and the remaining terms are the number of neurons in each mouse and the per-neuron Spearman correlation as defined above.

We used a non-parametric approach for statistical tests involving the mean change in Spearman correlation by scrambling trial times and bootstrapping mice to construct a null distribution for Δρ. Specifically, we first drew a random sample of |ℳ| mice from ℳ with replacement, then drew a random sample of trial start times (as many as there were trials in that session) uniformly distributed between 0 and the session duration minus the total trial duration (baseline + tone + delay + reward + post) for each day and randomly-selected mouse, and finally used these randomly-selected mice and scrambled trial start times to compute the change in tuning Δρ. This process was repeated 1000 times to approximate the distribution of Δρ under the null hypothesis that changes in tuning are unrelated to tone and reward delivery. We considered the observed changes in tuning to be statistically significant at the * or ** level if they fell into the 5% or 1% tails of this distribution, respectively.
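The tuning analysis above might be sketched as below. Several pieces are assumptions, not the authors' method. The stimulus timing is encoded as a 0/1 indicator over the trial-averaged trace. The mean change is taken as the average of per-mouse means. The tail test is two-sided. A small pure-Python Spearman function stands in for scipy.stats.spearmanr so that the example has no dependencies. All function names are hypothetical.

```python
import random
from statistics import mean

# baseline + tone + delay + reward + post, in seconds (values from the text)
TRIAL_S = 2.0 + 1.0 + 1.5 + 2.5 + 6.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho as the Pearson correlation of ranks (inputs must vary)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def tuning_coefficient(trial_avg, fps, period_start_s, period_len_s):
    """rho between a trial-averaged trace and a 0/1 stimulus-period indicator."""
    start = int(round(period_start_s * fps))
    stop = int(round((period_start_s + period_len_s) * fps))
    indicator = [1 if start <= i < stop else 0 for i in range(len(trial_avg))]
    return spearman(trial_avg, indicator)


def mean_delta_rho(rho_day1, rho_day7):
    """Average over mice of the mean per-neuron change (day 7 minus day 1).
    Inputs map mouse id -> list of rho values in matching neuron order."""
    return mean([
        mean([r7 - r1 for r1, r7 in zip(rho_day1[m], rho_day7[m])])
        for m in rho_day1
    ])


def scrambled_trial_starts(n_trials, session_s, rng, trial_s=TRIAL_S):
    """Uniform random trial start times that leave room for a full trial."""
    return [rng.uniform(0, session_s - trial_s) for _ in range(n_trials)]


def tail_fraction(observed, null_values):
    """Fraction of the null distribution at least as extreme as `observed`."""
    upper = sum(v >= observed for v in null_values) / len(null_values)
    lower = sum(v <= observed for v in null_values) / len(null_values)
    return min(upper, lower)
```

In use, one would recompute `mean_delta_rho` 1000 times from bootstrapped mice and scrambled start times, collect the values as the null distribution, and compare the observed Δρ against it with `tail_fraction`.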

Activity Analysis
To identify neuron responses to tone and reward, we applied a set threshold to each neuron on a trial-by-trial basis. Neurons were defined as tone-responsive or reward-responsive within a trial if they exceeded a z-score of 1 (the excitation threshold used in Kato et al., 2015) for at least 5 consecutive frames within 2.5 s of the tone onset or 2.5 s of the reward delivery time, respectively. This was assessed for each trial with at least 3 lick responses within 2.5 s of the reward delivery time. We then took the median of the percent of responsive neurons across all trials in a session from one mouse, and the mean across mice.

We used a Monte-Carlo approach to validate the percent of tone- and reward-responsive neurons. The observed mean percentages of tone-responsive and reward-responsive neurons were compared to a null distribution made for each cell type on each day. We randomly sampled mice with replacement, then sampled the entire session, and then calculated the percentage of active cells (exceeding a z-score of 1 for at least 5 consecutive frames) during a randomly chosen 2.5 s window. For each mouse, the number of samples was equal to the number of included trials (i.e. the number of trials with at least 3 lick responses within 2.5 s of reward delivery). We then took the median across the random samples and then took the mean across mice to obtain a mean percentage of responsive neurons during a randomly chosen time window. This was repeated 1000 times to generate a null distribution of the mean percentage of active neurons. We then assessed whether the observed percentage of tone- and reward-responsive neurons was significantly different from the null distribution by comparing the observed value to the tails of the normally distributed null distribution. This was done for each cell type on both day 1 and day 7. We considered the tone or reward responses to be statistically significant at the * or ** level if they fell into the 5% or 1% tails of this distribution, respectively, and at the *** level if there was no overlap with the distribution. Since this approach tests the null hypothesis that the observed neuronal responses are due to chance (in this case, baseline activity/noise), only cell types with a significantly higher percentage of responsive neurons for a given session were analyzed.

The tone/reward reliability index was defined as the percent of trials within a session where the neuron was tone/reward responsive. The reliability cumulative distribution was made by pooling the day 1 index values of all the neurons from a neuronal cell type (across mice). If a neuron's day 1 index value was lower than or equal to the index value at the 50th percentile of the cumulative distribution for that cell type, it was categorized into the Low Reliability group. If a neuron's day 1 index value exceeded the 50th percentile value, it was categorized into the High Reliability group. We then took the mean reliability within each group on day 1 and day 7.
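The responsiveness criterion, reliability index, and High/Low split described above can be illustrated with the following sketch. The function names and data layout (a z-scored trace per neuron, stimulus onsets given as frame indices) are assumptions for the example, and the 50th percentile is approximated here by the median of the pooled day 1 values.

```python
from statistics import median


def is_responsive(z_trace, onset_frame, fps=30, window_s=2.5,
                  z_thresh=1.0, min_frames=5):
    """True if z exceeds `z_thresh` for `min_frames` consecutive frames
    within `window_s` seconds of the stimulus onset."""
    stop = onset_frame + int(round(window_s * fps))
    run = 0
    for z in z_trace[onset_frame:stop]:
        run = run + 1 if z > z_thresh else 0
        if run >= min_frames:
            return True
    return False


def reliability_index(z_trace, onset_frames, **kwargs):
    """Percent of trials in which the neuron was responsive."""
    hits = sum(is_responsive(z_trace, f, **kwargs) for f in onset_frames)
    return 100.0 * hits / len(onset_frames)


def split_by_day1_reliability(day1_index):
    """Split neurons at the median of pooled day 1 reliability values.
    Values at or below the cut go to Low, values above it go to High."""
    cut = median(day1_index.values())
    low = [n for n, v in day1_index.items() if v <= cut]
    high = [n for n, v in day1_index.items() if v > cut]
    return low, high
```

Because group membership is fixed from day 1, the same `low` and `high` lists would be reused to average the day 7 reliability values of the tracked neurons.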

Inter-Trial Interval Lick Bout Analysis
Inter-trial interval (ITI) lick bouts were defined as self-initiated licking events that occurred at least 20 s after the preceding reward delivery time (trial end) and more than 2.5 s prior to the subsequent tone onset (trial start). If licks were separated by 3 s or more, they were considered a new lick bout. To remain consistent with tone and reward analyses, only the first 2.5 s of a lick bout was analyzed for neural responses. ITI lick bout reliability indices were calculated as described above.
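The bout definition above can be written as a short sketch. The function name and the list-of-timestamps inputs are assumptions. The explicit exclusion of bouts that begin between a tone onset and its reward is also an assumption, added so that anticipatory licking is not counted as an ITI bout.

```python
def find_iti_bout_starts(lick_times, tone_onsets, reward_times,
                         new_bout_gap_s=3.0, min_after_reward_s=20.0,
                         min_before_tone_s=2.5):
    """Return start times (s) of self-initiated ITI lick bouts."""
    licks = sorted(lick_times)
    # A lick starts a new bout if it follows the previous lick by >= 3 s.
    starts = [t for i, t in enumerate(licks)
              if i == 0 or t - licks[i - 1] >= new_bout_gap_s]
    iti_starts = []
    for s in starts:
        last_tone = max((t for t in tone_onsets if t <= s), default=None)
        last_reward = max((r for r in reward_times if r <= s), default=None)
        next_tone = min((t for t in tone_onsets if t > s), default=None)
        # Inside a trial: a tone has started but its reward has not arrived.
        in_trial = last_tone is not None and (
            last_reward is None or last_tone > last_reward)
        too_soon = (last_reward is not None
                    and s - last_reward < min_after_reward_s)
        too_late = (next_tone is not None
                    and next_tone - s <= min_before_tone_s)
        if not (in_trial or too_soon or too_late):
            iti_starts.append(s)
    return iti_starts
```

Each returned start time defines a 2.5 s window (start to start + 2.5 s) that can be analyzed with the same responsiveness criterion used for the tone and reward periods.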

Statistical Analysis
Statistical analyses for tuning coefficients were performed in Python and in R. Statistical analyses for anticipatory licking, tone- and reward-responsivity, and reliability index were performed in MATLAB using the Statistics and Machine Learning Toolbox. Two-way ANOVA was used to test for differences in anticipatory lick rate on day 1 and day 7. One-way ANOVA was used to test for differences in lick rate during ITI, tone and reward. One-way ANOVA was used to compare the percent of active cells across cell types on a single day. The Monte-Carlo approach (as described above) was used to test for a significant percent of tone- and reward-responsive neurons, and for changes in tuning properties. Paired t-tests were used to test for differences in the percentage of responsive cells and reliability index on day 1 and 7, and for differences in neuron reliability between ITI lick bouts, tone and reward.