Abstract
Attending to mistakes while practicing alone provides opportunities for learning1, 2, but self-evaluation during audience-directed performance could distract from ongoing execution3. It remains unknown how animals switch between practice and performance modes, and how evaluation systems process errors across distinct performance contexts. We recorded from striatal-projecting dopamine (DA) neurons as male songbirds transitioned from singing alone to singing female-directed courtship song. In the presence of the female, singing-related performance error signals were reduced or gated off and DA neurons were instead phasically activated by female vocalizations. Mesostriatal DA neurons can thus dynamically change their tuning with changes in social context.
When a male zebra finch sings its courtship song to a female of interest, song is highly stereotyped and tonic levels of dopamine (DA) are increased in Area X, a vocal motor basal ganglia nucleus capable of regulating song variability4–8. Yet when males practice alone, song is highly variable and tonic DA levels are decreased in Area X4, 9. Blockade or disruption of striatal DA signaling eliminates the social context-dependent transition between ‘practice’ and female-directed ‘performance’ modes10, 11, suggesting that tonic DA levels actively regulate ongoing vocal variability5, 7, 8, 12.
DA has an additional learning function during singing distinct from, and difficult to reconcile with, its role in modulating vocal variability during courtship7, 13. Specifically, when males sing alone, Area X projecting DA neurons in the ventral tegmental area (VTAx) encode phasic error signals, necessary and sufficient for learning14–16, characterized by brief suppressions following worse-than-predicted song syllable outcomes and activations following better-than-predicted ones (Fig. 1a)17. Phasic DA signals thus encode errors in predicted song quality, i.e. the difference between how good a syllable sounded and how good it was predicted to sound based on recent practice.
(a) When singing alone, VTAx DA neurons exhibit a low baseline ‘tonic’ firing rate as well as phasic activations and suppressions following undistorted (blue) and distorted (red) syllable renditions, respectively17 (black dotted line indicates baseline firing rate when singing alone). (b) Schematic of possible outcomes for a VTAx neuron recorded during female-directed song. Tonic rate could either increase (right column) or not (left column). From top to bottom: phasic error signals could be unchanged, bigger, smaller or be gated off altogether. Other possible outcomes (e.g. tonic rate could decrease, phasic activations and suppressions could be independently altered) are not shown. Black dotted lines denote baseline firing rate when male sings alone.
Do the same songbird DA neurons that modulate ongoing vocal variability also evaluate recent vocal performance for learning, and if so, how? In mammals, it has been proposed that the state-dependent vigor of ongoing behavior is regulated by the tonic discharge of DA neurons, while the evaluation of reward outcomes for learning is regulated by brief, phasic error signals in the same neurons18–21. To test this hypothesis, it is necessary to observe how tonic firing rates and phasic error signals change (or don’t change) in single neurons across clear-cut, DA-dependent changes in behavioral state. This experiment is uniquely possible in songbirds singing alone or singing female-directed courtship song (Fig. 1).
To test how DA neurons may implement these dual functions, we recorded antidromically-identified VTAx neurons as we controlled both perceived error (with syllable-targeted distorted auditory feedback (DAF)17, 22, 23) and behavioral state (with female present or absent)5, 7, 8. Surprisingly, tonic discharge patterns of VTAx neurons, including mean firing rate, median interspike interval (ISI), burstiness and firing regularity, did not significantly differ between undirected and directed song (Fig. 2, mean rates: 13.86±3.22 Hz undirected vs. 14.66±3.48 Hz female-directed; median ISI: 0.046±0.018 s undirected vs 0.045±0.016 s female-directed; coefficient of variation of the ISI distribution (CVisi): 0.88±0.18 undirected vs. 0.89±0.19 female-directed; peak of the normalized spike train autocorrelation: 1.15±0.11 undirected vs. 1.13±0.12 female-directed, n=8 neurons; p>0.05 for all measures, paired two-sided Wilcoxon signed rank tests). Tonic DA discharge patterns during non-singing periods were also not substantially affected (Extended Data Fig. 1). Thus previously reported increases in striatal DA levels and associated reduction in courtship song variability4–7, 12 are unlikely to be caused by changes in DA spiking activity, suggesting a role for spiking-independent regulation of DA release or re-uptake at striatal synapses24–26.
(a) Top to bottom: spectrograms, spiking activity during female-directed and undirected songs, corresponding spike raster plots and rate histograms, and z-scored difference in firing between undirected and directed motif-aligned rate histograms (all plots aligned to motif onset). (b-c) ISI distribution (b) and normalized spike train autocorrelogram (STA) (c) during singing alone (black) and female-directed (green) songs for the neuron shown in a. Insets: ISI distributions (b) and STAs (c) for 8 VTAx neurons (mean ± SEM shading). (d-g) Mean firing rate (d), median ISI (e), coefficient of variation of the ISI distribution (CVisi) (f), and peak of the STA (g) for 8 VTAx neurons recorded when males sang alone and when they sang female-directed song (n.s. denotes p>0.05, paired two-sided Wilcoxon signed rank test).
To test how the transition to female-directed song affects phasic error signals, we recorded neuronal responses to syllable-targeted DAF17, 22, 23 as males sang alone and to a female. DAF, though not generally aversive27, induces a perceived vocal error on distorted renditions such that undistorted renditions are reinforced22, 23 by phasic DA signals14–16. Consistent with past work17, VTAx neurons recorded during undirected singing exhibited phasic error signals characterized by suppressions following distortions and phasic activations at the precise moment of the song when a predicted distortion did not occur (significant error response in 7/8 VTAx neurons, Methods). Significant suppressions followed DAF onset with a latency of 63±14 ms, lasted 67±21 ms, and resulted on average in a 55±16% reduction in firing rate (significant suppressions observed in 6/7 VTAerror neurons, Methods). Significant phasic activations mirrored phasic suppressions: they followed undistorted target onsets with a latency of 46±25 ms, lasted 64±14 ms, and resulted on average in a 37±10% increase in firing rate (significant activations observed in 6/7 VTAerror neurons, Methods) (Fig. 3).
Error responses during singing alone (a) and female-directed singing (b) for the same antidromically identified VTAx neuron. Top to bottom: spectrograms, spiking activity during undistorted and distorted trials, corresponding spike raster plots and rate histograms, and z-scored difference between undistorted and distorted rate histograms (all plots aligned to target onset). Horizontal bars in histograms indicate significant deviations from baseline (p < 0.05, one-sided z test). (c) Response to female calls for the same antidromically identified VTAx neuron. Top to bottom: spectrograms of female calls and spiking activity, corresponding spike raster plot, and rate histograms (all plots aligned to female call onset). Horizontal bars in histograms indicate significant deviations from baseline (p < 0.05, one-sided z test). (d) Top, normalized responses to distorted targets (mean ± SEM). Bottom, scatter plot of normalized rate in the 50 to 125 ms window following target time (solid fills indicate p < 0.05, bootstrap, Methods) for undirected and female-directed singing (* denotes p<0.05, paired two-sided Wilcoxon signed rank test). (e) Same as (d) but for undistorted targets. (f) Top, normalized responses to female calls (mean ± SEM). Bottom, scatter plot of normalized rate in the 50 to 125 ms window following female call onset (solid fills indicate p < 0.05, bootstrap, Methods).
Phasic error responses that were robust during undirected singing were usually gated off during courtship song (z-scored error responses, undirected: 2.6±0.5; directed: 1.1±1.3; p<0.05, paired two-sided Wilcoxon signed rank test, loss of significant error response in 6/7 VTAerror neurons, Methods) (Fig. 3).
We wondered if reduced performance error signaling during female-directed song could occur if the male attended less to evaluating his own song and more to real-time interaction with the female. Although female zebra finches do not sing, they can respond to male courtship efforts with vocal calls of their own28. Consistent with the idea that phasic DA signals can depend on female behavior, female calls induced phasic activations in every VTAx DA neuron recorded in sessions where female calls were produced. The timing and magnitude of female call-induced activations resembled the phasic activations observed following undistorted targets during undirected singing (latency from call onset: 39±24 ms, duration: 93±28 ms, 42±17% increase in firing rate, p<0.05 in 7/7 neurons, bootstrap).
Together these findings show, for the first time to our knowledge, that tonic DA spiking is not strongly activated during courtship behavior, that the tuning of DA neurons can dynamically change with social context, that DA neurons can be phasically activated by vocal signals of a potential mate, and, more generally, that mistakes are processed differently during ‘practice’ and audience-directed ‘performance’ modes.
Author Contributions
VG and JHG designed the research, analyzed data, and wrote the paper. VG and PAP performed experiments.
Competing interests
The authors declare no competing financial interests.
Additional information
Data can be accessed at http://www.nbb.cornell.edu/goldberg/
Supplemental Text
During non-singing periods in between female-directed song bouts, male birds exhibit motivated pursuit-like behaviors, including orienting and producing vocal calls towards the female28. VTAx neurons exhibited a small but significant increase in mean firing rate during non-singing periods with the female present (mean rates: 11.69±2.88 Hz undirected vs. 12.83±3.05 Hz female-directed, p<0.01; median ISI: 0.068±0.020 s undirected vs. 0.061±0.017 s female-directed, p<0.05), but discharge patterns measured by CVisi and STA did not differ (CVisi: 0.77±0.21 undirected vs. 0.79±0.19 female-directed, p>0.5; peak of the normalized STA: 1.11±0.11 undirected vs. 1.09±0.08 female-directed, p>0.5, paired two-sided Wilcoxon signed rank tests) (Extended Data Fig. 1).
(a-b) ISI distribution (a) and normalized spike train auto-correlogram (STA) (b) during non-singing periods with female present (green) and absent (black). Data from the neuron shown in Fig. 2. Insets: ISI distributions (a) and STAs (b) for 8 VTAx neurons (mean ± SEM shading). (c-f) Mean firing rate (c), median ISI (d), coefficient of variation of the ISI distribution (CVisi) (e), and peak of the STA (f) for 8 VTAx neurons recorded during non-singing periods with female present and absent (* denotes p<0.05 and n.s. denotes p>0.05, paired two-sided Wilcoxon signed rank test).
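The tonic discharge measures and paired comparisons reported above can be illustrated with a short Python sketch. This is not the custom Matlab code used in this study: the function names are hypothetical, the CV is computed with the population standard deviation as an assumption, and the exact (enumeration-based) form of the signed rank test is one possible choice that the text does not specify. All numbers in the example are arbitrary toy values, not recorded data.

```python
from itertools import product
from statistics import mean, median, pstdev


def isi_stats(spike_times):
    """Return (median ISI, CV of the ISI distribution) for sorted spike times."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return median(isis), pstdev(isis) / mean(isis)


def signed_ranks(x, y):
    """Paired differences (zeros dropped) and ranks of their absolute values.

    Tied absolute differences receive the average of the ranks they span.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return diffs, ranks


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed rank test by enumerating all sign patterns."""
    diffs, ranks = signed_ranks(x, y)
    total = sum(ranks)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_obs = min(w_plus, total - w_plus)
    n_extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if min(w, total - w) <= t_obs:
            n_extreme += 1
    return t_obs, n_extreme / 2 ** len(ranks)


# Toy spike train (times in ms, perfectly regular): CV of the ISI is zero.
med_isi, cv = isi_stats([0, 100, 200, 300, 400])

# Toy paired rates for 8 hypothetical neurons, all higher in the second condition.
alone = [10, 12, 9, 14, 11, 13, 15, 8]
with_female = [11, 14, 12, 18, 16, 19, 22, 16]
t_stat, p_value = wilcoxon_exact(alone, with_female)
print(med_isi, cv, p_value)
```

With 8 pairs and no ties or zero differences, the smallest two-sided p value an exact signed rank test can return is 2/2^8 ≈ 0.0078, which is reached only when all 8 differences share the same sign, as in the toy data above.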
Methods
Animals and surgery
Subjects were 4 adult male (91-240 days old) and 3 adult female (100-200 days old) zebra finches. All experiments were carried out in accordance with NIH guidelines and were approved by the Cornell Institutional Animal Care and Use Committee. During implant surgeries, birds were anesthetized with isoflurane and a bipolar stimulation electrode was implanted into Area X at established coordinates (+5.6A, +1.5L relative to lambda and 2.65 ventral relative to pial surface; head angle 20 degrees)17. Custom microdrives carrying an accelerometer, linear actuator, and homemade electrode arrays (5 electrodes, 3-5 MOhms, microprobes.com) were implanted into a region where antidromically identified VTAx neurons were intraoperatively identified. After each experiment, small electrolytic lesions (30 μA for 60 s) were made with the recording electrodes for histological verification of electrode position. Brains were then fixed, cut into 100 μm thick sagittal sections and immuno-stained with antibodies to tyrosine hydroxylase for histological confirmation of reference lesions among dopamine neurons as described previously17.
Syllable-targeted distorted auditory feedback
Postoperative birds were placed in a sound isolation chamber equipped with a microphone and two speakers which provided distorted auditory feedback (DAF). To implement targeted DAF, the microphone signal was analyzed every 2.5 ms using custom Labview software. Specific syllables were targeted by detecting a unique inter-onset interval (onset time of previous syllable to onset time of target syllable) using the sound amplitude as previously described17. The targeted syllable was programmed to be distorted with DAF 50% of the time (actual distortion probability: 48±3%). DAF was a broadband sound bandpassed at 1.5-8 kHz, the same spectral range as zebra finch song. DAF amplitude was measured with a decibel meter (CEM DT-2 85A) and maintained at less than 90 dB.
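The targeting logic described above can be sketched in Python as follows. This is an offline illustration under stated assumptions, not the custom Labview software: the amplitude threshold, the interval tolerance, the toy amplitude trace and all function names are hypothetical, and only the 2.5 ms analysis period and the 50% distortion probability are taken from the text.

```python
import random

FRAME_MS = 2.5  # analysis period of the microphone signal, as stated in the text


def detect_onsets(amplitude, threshold):
    """Frame indices where the amplitude crosses the threshold upward."""
    return [
        i
        for i in range(1, len(amplitude))
        if amplitude[i - 1] < threshold <= amplitude[i]
    ]


def find_targets(onsets, target_interval_ms, tolerance_ms):
    """Onsets whose interval from the previous onset matches the target interval."""
    targets = []
    for prev, cur in zip(onsets, onsets[1:]):
        interval_ms = (cur - prev) * FRAME_MS
        if abs(interval_ms - target_interval_ms) <= tolerance_ms:
            targets.append(cur)
    return targets


def assign_daf(targets, p_distort=0.5, rng=None):
    """Pair each target with a random decision to distort it (True) or not."""
    rng = rng or random.Random()
    return [(t, rng.random() < p_distort) for t in targets]


# Toy amplitude trace: four "syllables" starting at frames 10, 50, 110 and 150.
amplitude = [0.0] * 300
for start, stop in [(10, 30), (50, 80), (110, 130), (150, 170)]:
    for i in range(start, stop):
        amplitude[i] = 1.0

onsets = detect_onsets(amplitude, threshold=0.5)
# Only the third syllable follows its predecessor by 150 ms (60 frames).
targets = find_targets(onsets, target_interval_ms=150.0, tolerance_ms=5.0)
trials = assign_daf(targets, rng=random.Random(0))
print(onsets, targets, trials)
```

Because the distortion decision is drawn independently on each target rendition, the realized distortion probability in a finite session will scatter around the programmed 50%.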
Electrophysiology
Neural signals were band-pass filtered (0.25-15 kHz) in homemade analog circuits and acquired at 40 kHz using custom Matlab software. Single units were identified as Area X projecting (VTAx) by antidromic identification (stimulation intensities 50-400 μA, 200 μs on the bipolar stimulation electrode in Area X). All neurons identified as VTAx were further validated by antidromic collision testing17.
Data analysis
For each neuron, spiking data were first collected during undirected song when the male was singing alone in the sound isolation chamber. The male was then presented with either an adult female in a separate cage (9/11 neurons) or a video of an adult female displayed on a screen (2/11 neurons) within the sound isolation chamber. Neurons included in singing analysis (8/11) were recorded for at least 30 motifs of undirected and 30 motifs of female-directed song. 3/11 VTAx neurons recorded exclusively during female calls were included in the analysis. Neurons included in female call analysis were recorded during at least 60 renditions of natural, spontaneous female calls. Spike sorting was performed offline using custom Matlab software. Instantaneous firing rates (IFR) were defined at each time point as the inverse of the enclosed interspike interval (ISI). Firing rate histograms were constructed with 25 ms bins and smoothed with a 3-bin moving average. To calculate the mean rate and median ISI during singing (Fig. 2d-e), the firing rate and median ISI were averaged over all song motifs, with a time window extending from 50 ms before motif onset to 50 ms after motif offset. The coefficient of variation (CV) of the ISI and the peak of the spike-train autocorrelation (STA) in Fig. 2f-g were computed over entire singing bouts. To test for error responses, we compared the firing activity between randomly interleaved undistorted and distorted song renditions. We computed the z-scored difference between the target time-aligned distorted and undistorted firing rate histograms (Fig. 3a-b). The target time was defined as the median DAF onset-time relative to the distorted syllable onset-time. The error response was defined as the mean z-scored difference in a 50-125 ms window following target time17. Monte Carlo methods were used to quantify the significance of rate changes following target times of the song and following female calls (Fig. 3d-f) as previously described17.
Briefly, the mean number of spikes within a 50-125 ms window after DAF, undistorted target onset, or female call onset was compared to 10,000 surrogate means generated by calculating the mean number of spikes in an identical number of randomly placed windows during singing (for undistorted and distorted targets), and during non-singing periods (for female calls). P values for the suppression (or activation) were calculated by analyzing the frequency with which the surrogate means were less than (or greater than) or equal to the observed mean (Fig. 3d-f). To quantify the magnitude of significant activations and suppressions (Fig. 3d-f), we calculated the normalized firing rate as follows: the mean number of spikes in a 50-125 ms window after DAF target time (or female call onset) was normalized by the mean number of spikes in 10,000 randomly placed identical windows during singing (for undistorted, distorted, and female call targets)17. To calculate the significance bars shown in Fig. 3a-c, spiking activity within ±1 second relative to target onset was binned in a moving window of 30 ms with a step size of 2 ms. Each bin after the target time was tested against all the bins in the previous 1 second (the prior) using a one-sided z-test17. To calculate the latencies and durations of significant activations (suppressions), a threshold of half the firing rate histogram maximum (minimum) was applied to the firing rate histogram. The onset-time was defined as the first increasing (decreasing) threshold-crossing after target time, while the offset was defined as the first decreasing (increasing) threshold-crossing after onset-time.
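The Monte Carlo comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the custom Matlab code used in this study: surrogate windows are drawn uniformly from a single continuous epoch (hypothetical bounds t_min and t_max) rather than from concatenated singing or non-singing periods, each surrogate mean is taken over as many random windows as there are events, the function names are hypothetical, and the spike train is a synthetic toy example.

```python
import bisect
import random


def count_in_window(spike_times, start, stop):
    """Number of spikes with start <= t < stop (spike_times must be sorted)."""
    return bisect.bisect_left(spike_times, stop) - bisect.bisect_left(spike_times, start)


def mean_count(spike_times, event_times, lo=0.050, hi=0.125):
    """Mean spike count in the window [event + lo, event + hi) across events."""
    counts = [count_in_window(spike_times, e + lo, e + hi) for e in event_times]
    return sum(counts) / len(counts)


def surrogate_test(spike_times, event_times, t_min, t_max,
                   n_surrogates=10000, lo=0.050, hi=0.125, rng=None):
    """Compare the observed mean count to means from randomly placed windows."""
    rng = rng or random.Random()
    observed = mean_count(spike_times, event_times, lo, hi)
    n_events = len(event_times)
    n_below = n_above = 0
    surrogate_sum = 0.0
    for _ in range(n_surrogates):
        # Place as many random windows as there are real events, all inside the epoch.
        fake_events = [rng.uniform(t_min - lo, t_max - hi) for _ in range(n_events)]
        m = mean_count(spike_times, fake_events, lo, hi)
        surrogate_sum += m
        if m <= observed:
            n_below += 1
        if m >= observed:
            n_above += 1
    return {
        "observed": observed,
        "p_suppression": n_below / n_surrogates,
        "p_activation": n_above / n_surrogates,
        "normalized_rate": observed / (surrogate_sum / n_surrogates),
    }


# Toy data: regular 10 Hz baseline for 100 s plus 5 extra spikes after each event.
baseline = [0.02 + 0.1 * k for k in range(1000)]
events = [5.0 + 10.0 * k for k in range(10)]
bursts = [e + 0.06 + 0.01 * j for e in events for j in range(5)]
spikes = sorted(baseline + bursts)

result = surrogate_test(spikes, events, t_min=0.0, t_max=100.0, rng=random.Random(0))
print(result)
```

In this toy example the event-locked window contains far more spikes than randomly placed windows, so the activation p value is small and the normalized rate exceeds 1; a suppression would instead give a small p_suppression and a normalized rate below 1.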
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The custom Matlab code used in this study is available from the corresponding author upon reasonable request.
Acknowledgments
We thank members of the Goldberg Lab for comments on the manuscript and Samantha Carouso-Peck and Michael H. Goldstein for help with social context manipulation. VG was supported by a Simons Foundation Postdoctoral Fellowship and a NIH/NINDS Pathway to Independence Award (grant # K99NS102520), PAP by NIH/NINDS (grant # F32NS098634), and JHG by NIH/NINDS (grant # R01NS094667), Pew Charitable Trusts, and Klingenstein Neuroscience Foundation.