Dopamine D1 receptor activation drives plasticity in the songbird auditory pallium

Vocal learning species must form and extensively hone associations between sounds and social contingencies. In songbirds, dopamine signaling guides song motor-production, variability, and motivation, but it is unclear how dopamine regulates fundamental auditory associations for learning new sounds. We hypothesized that dopamine regulates learning in the auditory pallium, in part by interacting with local neuroestradiol signaling. Here, we show that zebra finch auditory neurons frequently coexpress D1 receptor (D1R) protein, neuroestradiol-synthase, GABA, and parvalbumin. Auditory classical conditioning increased neuroplasticity gene induction in D1R-positive neurons. In vitro, D1R pharmacological activation reduced the amplitude of GABAergic and glutamatergic currents, and increased the latter’s frequency. In vivo, D1R activation reduced the firing of putative interneurons, increased the firing of putative excitatory neurons, and made both neuronal types unable to adapt to novel stimuli. Together, these data support the hypothesis that dopamine acting via D1Rs modulates learning and memory in the songbird sensory cortex.


Introduction
Vocal and auditory learning have evolved in a select few organisms, including humans and songbirds. Studying vocal learning in songbirds continues to provide insight into the suite of mechanisms that support spoken language (Jarvis, 2019). The first step in the spoken language learning process is to make socially-contingent associations about complex sounds. This likely engages cortical brain structures, where sounds and meaning are bound.

Time: χ²(7)=21.802, p=0.003; Treatment: χ²(1)=2.480, p=0.115; Time*Treatment: χ²(7)=4.000, p=0.780; Dunnett's post-hoc test vs before-drug: p<0.05 on minutes 2 and 5 of SKF).

In summary, D1R activation in vitro reduced the amplitude of both GABAergic and glutamatergic spontaneous currents, while also increasing the frequency of the latter. These findings establish a role for dopamine modulation of network excitation and inhibition in NCM, and predict that D1R activation in vivo causes differential effects depending on cell type (i.e., downregulating GABAergic and upregulating glutamatergic neuron firing).

Cell type separation based on waveform measurements for in vivo recordings
We isolated 107 single units from 9 adult birds in awake head-fixed recordings using acute 32-channel recording microdrives coupled to retrodialysis probes (RetroDrives; Supplementary Fig. 7; see methods). We measured the peak-to-peak duration and ratio of each unit and analyzed the data using an unsupervised hierarchical clustering algorithm (see methods; Fig. 5A). The gap-statistic results show that the variance in clustering was best explained by 4 clusters. Cell types differed significantly in waveform peak-to-peak duration (GLM/ANOVA: χ²(3)=299.970, p<0.001; Tukey's post-hoc test: all p<0.01) and ratio (GLM/ANOVA: χ²(3)=294.880, p<0.001; Tukey's post-hoc test: all p<0.001 except BS1-BS2 where p=0.972). In summary, waveform peak-to-peak duration followed the pattern NS1<NS2<BS1<BS2 and waveform peak-to-peak ratio followed BS2<BS1=NS1<NS2.

The classification of narrow- and broad-spiking neurons commonly used in the songbird pallium literature uses only peak-to-peak duration and a division boundary of ~0. Sugiyama, 2016). Using both peak-to-peak duration and ratio, we provide evidence of 2 further subdivisions; therefore, we named our clusters to extend the previous classification: NS1 and NS2 (narrow-spiking), and BS1 and BS2 (broad-spiking).

Following clustering, non-auditory-responsive cells were excluded from the analyses (see NS1-BS2). Therefore, the 4 cell types clustered by waveform shape in our recordings also differed in physiological profile.

Broadly speaking, NS1 cells had highly symmetrical and narrow action potentials, high firing rates and z-scores, as well as fast response latencies and lower selectivity, which all parallel features of mammalian cortical high-firing PV+ inhibitory interneurons.
Compared to NS1, NS2 cells had less narrow and symmetrical waveforms with lower firing rates, which resemble properties of mammalian cortical low-firing somatostatin+ or VIP+ inhibitory

To further explore the change in spontaneous and stimulus firing due to SKF, we plotted a correlation between the %-change in spontaneous versus stimulus firing induced by SKF (Fig. 7D). Values above 0 in either axis indicate an increase in firing due to SKF. Note that on average BS1 and BS2 data points were situated above 0 in both axes, whereas NS1 and NS2 were below 0. Changes in spontaneous vs stimulus firing were also highly correlated (Pearson's

In this study, we show that dopamine D1 receptors (D1R) modulate learning and synaptic plasticity in the secondary association pallium (NCM) of a songbird. Specifically, we show that (1) D1R protein is prevalent in NCM neurons, especially in aromatase-, GABA-, and

signaling are acting in tandem to modulate learning and memory in the songbird auditory pallium. Thus, we hypothesize that this co-modulatory signaling could apply to other brain circuits that house both aromatase and dopamine receptors, such as human auditory cortex (Yague et al., 2006).

In songbirds, conspecific song-driven EGR1 expression is a hallmark of NCM in contrast to surrounding regions, and can even be used to define NCM's anatomical boundaries (

show that EGR1 is enhanced specifically in D1R+ neurons after an audio-visual classical conditioning paradigm. Therefore, our findings support the view that EGR1 induction reflects neuronal plasticity, rather than simply cellular activation per se (Duclot and Kabbaj, 2017).

We acknowledge the caveat that, in our task, auditory conditioning (video always preceded by sound) prompted the Paired group to attend more to the video (i.e. more actively looking at the screen or 'screen-time'), which was detected via our full GLM model.
Still, the same analyses revealed that this effect was dependent on the area and the hemisphere analyzed.

We propose three models (Fig. 9B) for the effects we observed in vivo. If the cell types we recorded are part of the same microcircuit, it is plausible they are affecting each other's firing properties. Therefore, in our "connected model 1", we suggest that the D1R activation might be increasing the tonic firing of a GABAergic neuron upstream to NS1 cells, thus inhibiting them and disinhibiting BS1 cells. This model resembles a disinhibitory circuitry discovered in mammalian cortex for auditory associative learning, in which learning activates layer 1 inhibitory interneurons, which inhibit layer 2/3 PV+ interneurons, thus disinhibiting pyramidal

signaling in auditory cortex is involved in learning, it is plausible to hypothesize that dopamine could be affecting SSA. In songbird NCM, SSA has been shown to parallel familiarity with sounds, such that novel sounds will produce more negative slopes (i.e. higher SSA) than familiar, previously adapted sounds (Chew et al., 1996). In fact, after successful behavioral association learning, learned sounds produce less SSA than novel sounds (Bell et al., 2015). Here, we provide evidence that D1Rs are involved in this process, such that pharmacological D1R activation disrupts SSA in NCM neurons. Importantly, our data show that changes in SSA do not correlate with changes in spontaneous or stimulus firing rate, suggesting that lower SSA is not the product of a firing rate "floor-effect".

Our experiments were designed to test the prediction that blunt D1R activation would produce cellular plasticity in NCM. However, it is important to note that naturalistic dopamine signaling regulation is much more spatially and temporally targeted (

dopamine release are expected to be more nuanced spatially and temporally.
With these aspects in mind, we hypothesize that indiscriminate D1R activation forces the NCM circuit into a "preadapted" state, making it unable to adapt to subsequent presentation of novel sounds. Perhaps dopaminergic activation more precisely paired with sound stimuli would produce more specific changes. Therefore, future work should examine whether D1R activation in NCM paired with sounds would promote changes in SSA and association learning, including juvenile song learning.

Furthermore, future studies should clarify through neuronal tract tracing which specific nuclei provide dopaminergic inputs to NCM and whether the effects observed in this study can be mimicked by dopamine release from such nuclei.

In conclusion, we show that D1R signaling shifts the excitatory-inhibitory balance in songbird pallium to modulate mechanisms involved in auditory learning and key components of auditory response, circuitry, and plasticity. We propose that D1Rs are important mediators of learning and memory in the avian sensory pallium and this mechanism could be a common feature among vertebrates.

(Table 1) were prepared in 10% NGS in 0.3% PBT. To confirm antibody specificity, in a subset of sections the D1R antibody was preincubated for 1 h with blocking peptide (Fig. 1B). Sections were incubated with primary antibodies for 1 h at room temperature, followed by 2 days at 4 °C. Then, sections were washed 3x15 min in 0. Diamond with DAPI (Thermo Fisher).

Images were taken with a confocal microscope (Nikon A1si). First, NCM was localized and a 4x4 large image was taken at 10x magnification. Then, using only the DAPI channel, the microscope stage was digitally controlled and moved to selected locations on the 10x images, at the ventral and dorsal posterior edges of NCM (Fig. 1C).
Then, 15 µm (1 µm step size) z-stack images were taken at 60x magnification, starting from the top-most surface of the section. Two sections per hemisphere per animal were imaged. All laser intensities were kept uniform across all images within experiments.

D1R antibody penetration noticeably decayed at ~5 µm deep into the tissue; therefore, only the top 5 µm of each z-stack was quantified. Cell counts were performed by a blinded experimenter using Fiji (ImageJ; NIH). Briefly, color histograms were set individually for each image so that background was predominantly dark and only strong signals were counted. Only antibody localization around the nuclei (DAPI) was included. Only cells with large, ovoid nuclei (presumably neurons) were counted. Antibody quantification was done using the z-stack, while DAPI quantification was done using the z-max-projection image.

, 1992). Therefore, we exposed birds to Reference conditions to provide expression level references to our Experimental groups. In the Song condition, animals were exposed to three different conspecific bouts of songs (each 18-21 s long; ~65 dB) repeated 10 times in pseudorandom order over 30 min. In the Silence condition, animals were not presented with any stimulus for 30 min.

1.4.3.2 Immunofluorescence
EGR1 protein expression in NCM peaks between 1 and 2 h after induction (Mello and Ribeiro, 1998). Therefore, in all conditions, after the presentation of the last stimulus (or after 30 minutes in the Silence group), chamber lights were turned off (to minimize further stimulation) and animals remained in the chamber for an additional 50 minutes before they were retrieved for perfusion (~10 min until PFA). Total time from beginning of exposure to fixative exposure was ~1.5 h.

After perfusion, brains were processed, sectioned, and stained as described above, but with the following differences. A triple immunofluorescence protocol was performed using antibodies against NeuN, D1R, EGR1 and aromatase (Table 1)

were used. Imaging and cell counts were also performed as described above, except NeuN was used as a background stain.

Behavioral scoring and analyses
Video recordings were snipped into 15-s clips around stimulus presentations using custom Python code. Only the Experimental conditions (Paired and Unpaired) were analyzed.

Clips were scored twice, once for state and once for event behaviors. State behaviors included beak direction (left, right, pointed at screen, or pointed at camera), sleeping, eating, drinking or continuously moving/flying. Event behaviors were not mutually exclusive with states, and included vocalization, singing, feather ruffling, head tilts, hopping, and gaping. Videos were scored by an experimenter blinded to the subject ID and trial order. Experimental groups could be inferred by the observer because in the Paired group, the video playback shortly followed the tone presentation. Behaviors were scored using JWatcher (Blumstein and Daniel, 2007).
Behavioral data were extracted and processed using custom Python scripts. Timestamps and continuous behavior durations were aligned to stimulus timestamps and quantified as rates (Hz; event behaviors) or percent of time spent (fraction of stimulus duration; state behaviors). Behaviors outside of the stimulus presentation windows were not analyzed. Beak direction behaviors that enabled subjects to see the screen (beak pointed left, right and towards screen) were summed to comprise a new category termed "screen-time". Conversely, beak direction opposite to the screen (back of the head facing video), eating, drinking and sleeping were averaged to comprise a category termed "distractibility".
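The quantification described above can be illustrated with a minimal Python sketch. This is not the authors' script: the state labels (`beak_left`, `beak_away`, etc.), the function names, and the representation of state behaviors as (start, end) bouts in seconds are all assumptions made for illustration.

```python
# Hypothetical state labels; the actual JWatcher codes are not given in the text.
SCREEN_STATES = ("beak_left", "beak_right", "beak_screen")
DISTRACT_STATES = ("beak_away", "eating", "drinking", "sleeping")


def event_rate_hz(event_times, stim_on, stim_off):
    """Events per second inside one stimulus window [stim_on, stim_off)."""
    n = sum(1 for t in event_times if stim_on <= t < stim_off)
    return n / (stim_off - stim_on)


def state_fraction(bouts, stim_on, stim_off):
    """Fraction of the stimulus window covered by (start, end) state bouts."""
    overlap = 0.0
    for start, end in bouts:
        # Only the part of each bout that falls inside the window counts.
        overlap += max(0.0, min(end, stim_off) - max(start, stim_on))
    return overlap / (stim_off - stim_on)


def composite_scores(states, stim_on, stim_off):
    """Return (screen_time, distractibility) for one stimulus window.

    Screen-time is the sum of the screen-facing fractions; distractibility
    is the mean of the away/eating/drinking/sleeping fractions.
    """
    frac = {name: state_fraction(b, stim_on, stim_off) for name, b in states.items()}
    screen_time = sum(frac.get(s, 0.0) for s in SCREEN_STATES)
    distractibility = sum(frac.get(s, 0.0) for s in DISTRACT_STATES) / len(DISTRACT_STATES)
    return screen_time, distractibility
```

Behaviors outside the stimulus window drop out automatically because only the overlap between each bout and the window is counted.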

Recordings
Fifteen males were used for slice recordings across two experiments. We focused these experiments on males to further explore mechanisms proposed in a previous behavioral study done in males (Macedo-Lima and Remage-Healey, 2020). We note that we did not observe systematic sex differences in the immunofluorescence and in vivo electrophysiology findings, but we cannot rule out the possibility of sex differences.

After swift decapitation, the top of the skull was resected and the head was immediately immersed in a Petri dish filled with ice-cold carbogen-aerated cutting solution (0-Mg²⁺ cutting; in mM: 222 glycerol, 25 NaHCO3, 2.5 KCl, 1.25 NaH2PO4, 0.5 CaCl2, 34 glucose, 0.4 ascorbic acid, 2 Na2-pyruvate, 3 myoinositol. Standard: idem except 25 glucose and 3 MgCl2; ~320 mOsm/kg, pH 7.4). In the Petri dish, the cerebellum was resected and the brain was removed from the skull. Then, the brain was removed from the cutting solution and placed on an ice-cold

For most recordings, after bicuculline was delivered and allowed to take effect, a 1-min baseline recording was made, but for a few cells (9 out of 25), a 7-min rundown recording followed.

After recordings were completed, the recording pipette was slowly retrieved, and slices were drop-fixed overnight in 4% paraformaldehyde in PB. Then, they were transferred to cryoprotectant solution and kept at -20 °C until processed.

Analyses
Recordings were analyzed in IgorPro 6 (WaveMetrics, Lake Oswego, OR). All traces were downsampled (5x) and lowpass filtered at 500 Hz.

In sEPSC experiments, for amplitude measurements, cells were only included (n=9) in the analysis if series resistance did not change by more than 20% from baseline values. For frequency recordings, all cells (n = 25) were analyzed, as recording quality fluctuations are not expected to interfere with their detection, due to their high amplitude (>50 pA; noise band ~5 pA). Currents were thresholded and manually curated with NeuroMatic (Rothman and Silver, 2018). After curation, currents were automatically measured by custom IgorPro code.
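The series-resistance inclusion criterion used for amplitude measurements can be sketched as follows. The analysis itself was run in IgorPro; this Python fragment only illustrates the rule, and the function names, the units (MΩ), and the choice to test every series-resistance reading against baseline are assumptions.

```python
def max_rs_change(rs_baseline, rs_series):
    """Largest absolute fractional deviation of series resistance from baseline."""
    return max(abs(rs - rs_baseline) / rs_baseline for rs in rs_series)


def include_for_amplitude(rs_baseline, rs_series, limit=0.20):
    """True if series resistance never moved more than `limit` from baseline."""
    return max_rs_change(rs_baseline, rs_series) <= limit


# Illustrative values only (MOhm): the first cell stays within 20% of baseline.
stable_cell = include_for_amplitude(20.0, [21.0, 23.5])
drifting_cell = include_for_amplitude(20.0, [22.0, 25.0])
```

With these illustrative values, the first cell (17.5% maximum change) would be kept and the second (25%) would be excluded from amplitude analyses.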
For sIPSC recordings, cells whose series resistance changed more than 20% from baseline were excluded from all analyses. For each cell, one template current was manually selected and spontaneous PSCs were automatically detected using a spontaneous current detection algorithm (Clements and Bekkers, 1997) implemented by Dr. Geng-Lin Li for IgorPro. After detection, all IPSCs were measured automatically by custom IgorPro code.

were inserted, the tips of the probe and wires were offset by ~0.5 mm. Importantly, the horizontal distance between probe wires was ~0.2 mm. Finally, tetrodes were gold-plated to 200-250 kΩ impedance and all wires and pins were covered with liquid electrical tape (Gardner Bender, New Berlin, WI) and allowed to dry. RetroDrives were confirmed to successfully operate in NCM using baclofen/muscimol delivery to locally silence neurons within minutes in an earlier study (Macedo-Lima and Remage-Healey, 2020).

1.4.5.3 Recording protocol
On the day of the recording, a microdialysis probe was perfused with artificial cerebrospinal fluid (aCSF; described below) using a microinjection pump (PHD2000, Harvard Apparatus). RetroDrive wires were dipped in 6.25% DiI (Thermo Fisher) in 200-proof ethanol for visualization of electrode tracks. Then, the animal was comfortably restrained and head-fixed. The Kwik-Cast was removed from the craniotomy over one of the hemispheres. Animal and RetroDrive grounds were connected using alligator clips. The microdialysis probe was inserted through the cannula and the RetroDrive was lowered to NCM (~1.5-2 mm from brain surface; mediocaudally to skull markings). Importantly, tetrodes were positioned medially to the probe such that wires were ~0.5 mm lateral and ~1.2 mm anterior from the stereotaxic zero (midsagittal sinus).

Recordings were made while animals listened to auditory stimuli. aCSF (PRE), followed by SKF-38393 (SKF), followed by aCSF (POST), was infused during song playback to assess within-subject effects of SKF on responses to auditory stimuli (described in detail below). Recordings were completed within 4 hours of restraint.

Recordings were made from both hemispheres on different days. When recording in the first hemisphere was completed, the craniotomy was resealed with Kwik-Cast and the animal was returned to the home cage. Within 2 days, the second hemisphere recording was made, after which the animal was overdosed with isoflurane and decapitated. The brain was drop-fixed and cryoprotected in 30% sucrose in 10% formalin, and frozen until cutting. Cryostat sections were obtained at 40 µm and imaged to confirm location of wires and probe.

Recordings were amplified and digitized by a 32-channel amplifier and evaluation board (RHD2000 series; Intan Technologies, Los Angeles, CA) and sampled at 30 kHz using Intan software.
An Arduino Uno (Arduino, Somerville, MA) was connected to the recording computer to deliver TTL pulses to the evaluation board's DAC channel bracketing the beginning and end of the audio stimuli (described below) to optimize detection during analysis. Audio playback and TTL pulses were controlled by a custom-made MATLAB (MathWorks, Natick, MA) script which also controlled the Arduino and sent a copy of the audio analog signal to the evaluation board ADC channel.
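As an illustration of how TTL pulses that bracket each stimulus can be turned into stimulus windows during analysis, here is a minimal Python sketch. It assumes the TTL trace is high for the duration of each stimulus and is sampled at the 30 kHz stated above; the function name, the threshold value, and this pulse convention are assumptions, not details taken from the recording software.

```python
FS_HZ = 30_000  # sampling rate stated in the text


def stimulus_windows(ttl, threshold=0.5, fs_hz=FS_HZ):
    """Return (onset_s, offset_s) pairs from rising and falling threshold crossings.

    Assumes the TTL trace sits above `threshold` while a stimulus plays.
    """
    windows = []
    onset = None
    prev_high = False
    for i, v in enumerate(ttl):
        high = v > threshold
        if high and not prev_high:
            onset = i  # rising edge: stimulus start
        elif prev_high and not high and onset is not None:
            windows.append((onset / fs_hz, i / fs_hz))  # falling edge: stimulus end
            onset = None
        prev_high = high
    return windows
```

A pulse that is still high at the end of the trace is ignored, since its offset is unknown.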

1.4.5.4 Stimuli
Zebra finch songs were obtained from multiple databases (http://ofer.sci.ccny.cuny.edu/song_database) and were therefore unlikely to have been familiar to our subjects. Twenty-four song files from unrelated birds were bandpass filtered at 0.5-15 kHz and trimmed to include two consecutive motifs without introductory notes in Adobe Audition (Adobe) and mean amplitude-normalized to 70 dB in Praat (Boersma and van Heuven, 2001). Songs were randomly and equally split into two sets, then into 3 subsets containing 4 songs each. For each animal, 1 set was used per hemisphere and, within a hemisphere recording, 1 subset was used per treatment. This was done to ensure that for each treatment birds listened to novel stimuli, because NCM neurons exhibit stimulus-specific adaptation (Chew et al., 1996). Importantly, there was no difference in neuronal firing rates to different subsets, controlled by each stimulus and within each subject and each neuron (GLM/ANOVA: Subset: χ²(5)=5.173, p=0.395). Therefore, responses across treatments are comparable as they were presumed to reflect responses to novel stimuli.

Each playback session consisted of 4 conspecific songs, repeated 30 times each in pseudorandomized order. Interstimulus interval was pseudorandom within the interval 5±2 s. Audio pressure was amplified to ~65 dB as measured by a sound level meter (RadioShack, Fort Worth, TX). Each playback session lasted ~20 min.

Recordings were made from each hemisphere on different days. For each animal's first recording, the starting hemisphere was randomized in the first subjects, then counterbalanced between sexes. The stimulus set was also initially randomized, then counterbalanced across sexes and hemispheres, but the subset selected for each treatment was always randomized (www.random.org).

, where S and B are the stimulus and spontaneous firing rates across stimulus trials, respectively.
After computing z-scores by stimulus, those were averaged to yield a single z-score per unit per treatment.

Adaptation rates were calculated using trials 6-25, which is the approximately linear phase of the adaptation profile in NCM (Phan et al., 2006). For each stimulus, the stimulus firing rate across trials was normalized by the firing on trial 6 (set to 100%). Then, a linear regression was calculated between trials 6 and 25. For each treatment, the minimum (steepest) adaptation slope across stimuli was used for each unit.

Latencies to respond to stimuli were calculated as in Ono et al. (2016). Briefly, for each stimulus, 5-ms PSTHs were generated and convolved with a 5-point box-filter. The latency to respond to a stimulus was the time after stimulus onset in which the filtered PSTH rose above 3 standard deviations of the average preceding spontaneous firing period (100 ms). If threshold was not crossed within 400 ms, that stimulus was excluded from analyses.
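The adaptation-slope and latency calculations described above can be sketched in plain Python. This is an illustrative reading of the text, not the authors' code. Several details are assumptions: function names, interpreting the latency threshold as baseline mean plus 3 SD, computing baseline statistics from the smoothed PSTH, reporting latency as the start time of the first supra-threshold bin, and averaging over the available bins at the PSTH edges.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def adaptation_slope(rates_by_trial, first=6, last=25):
    """Slope (% of trial-`first` firing per trial) over trials first..last (1-indexed)."""
    ref = rates_by_trial[first - 1]
    trials = list(range(first, last + 1))
    norm = [100.0 * rates_by_trial[t - 1] / ref for t in trials]
    return linear_slope(trials, norm)


def unit_adaptation(rates_by_stimulus):
    """Steepest (most negative) slope across the stimuli of one treatment."""
    return min(adaptation_slope(r) for r in rates_by_stimulus)


def box_filter(values, width=5):
    """Centered moving average; edge bins average over the bins available."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def response_latency_ms(psth, n_baseline_bins=20, bin_ms=5, n_sd=3.0, max_ms=400):
    """Latency of the first smoothed bin above baseline mean + n_sd * SD.

    `psth` holds 5-ms bins: 20 baseline bins (100 ms) followed by
    post-onset bins. Returns None if no crossing occurs within max_ms.
    """
    smooth = box_filter(psth)
    base = smooth[:n_baseline_bins]
    mean = sum(base) / len(base)
    sd = (sum((b - mean) ** 2 for b in base) / (len(base) - 1)) ** 0.5
    threshold = mean + n_sd * sd
    for i, v in enumerate(smooth[n_baseline_bins:]):
        t = i * bin_ms
        if t >= max_ms:
            break
        if v > threshold:
            return t
    return None  # threshold not crossed: stimulus would be excluded
```

Because the box filter is centered, smoothing can pull the detected crossing slightly ahead of the raw onset; a causal filter would be an equally defensible choice under these assumptions.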

Statistical approach
All statistical analyses and plotting were performed using libraries for R and Python, respectively.

Our general statistical approach was to perform generalized linear modeling (GLM; using 'lme4', 'glmmTMB', 'lmerTest', 'DHARMa' and 'car' R packages) followed by ANOVA. Data were initially fitted with Gaussian distributions. Normality of residuals was assessed using DHARMa and Q-Q plot inspection following the GLM fits. If residuals violated normality, data were refit with other distributions (Poisson or negative binomial) and residuals were reassessed based on new distributions. If normality was still violated, data were log-transformed when possible (non-negative, non-zero data) or rankit-transformed (Bliss et al., 1956) and the fitting process above was repeated. If residuals were distributed according to GLM distributions, the model was analyzed by ANOVA using Wald chi-square tests. If data still violated residual diagnostics, non-parametric one-way analyses were performed (e.g. Kruskal-Wallis followed by Dunn's post-hoc tests, Friedman test or Wilcoxon signed rank tests).

For immunofluorescence data, quantifications of the two sections belonging to the same hemisphere and animal were averaged. Response variables were always percentages of the total
1.6