Abstract
Neurons exhibit diverse intrinsic dynamics, which govern how they integrate synaptic inputs to produce spikes. Intrinsic dynamics are often plastic during development and learning, but the effects of these changes on stimulus encoding properties are not well known. To examine this relationship, we simulated auditory responses to zebra finch song using a linear-dynamical cascade model, which combines a linear spectrotemporal receptive field with a dynamical, conductance-based neuron model, then used generalized linear models to estimate encoding properties from the resulting spike trains. We focused on the effects of a low-threshold potassium current (KLT) that is present in a subset of cells in the zebra finch caudal mesopallium and is affected by early auditory experience. We found that KLT affects both spike adaptation and the temporal filtering properties of the receptive field. Interestingly, the direction of the effects depended on the temporal modulation tuning of the linear (input) stage of the cascade model, indicating a strongly nonlinear relationship. These results suggest that small changes in intrinsic dynamics in tandem with differences in synaptic connectivity can have dramatic effects on the tuning of auditory neurons.
Introduction
Neurons have diverse, nonlinear dynamics. Many brain regions contain multiple kinds of neurons with different spike waveforms and spiking patterns (Bal and Oertel, 2001; Sivaramakrishnan and Oliver, 2001; Ascoli et al., 2008), and there is substantial variation even within well-defined cell types (Toledo-Rodriguez et al., 2004; Schulz et al., 2006; Bomkamp et al., 2019). Intrinsic dynamics can be modified by activity and experience (Ross et al., 2017; Daou and Margoliash, 2020; Chen and Meliza, 2020), which may be an important mechanism for learning (Titley et al., 2017). This physiological diversity has been known for many decades (Llinás, 1988) and can be modeled on a detailed, biophysically realistic level (Padmanabhan and Urban, 2010; Tripathy et al., 2015), but our understanding of how intrinsic dynamics affect neural computations in many systems has remained surprisingly qualitative.
One reason why biophysical models are difficult to apply to processes at the algorithmic and computational levels is their complexity. Even a simple, single-compartment model that can produce common physiological behaviors like bursting, adaptation, or rebound spiking is a system of ten or more nonlinear differential equations with fifty or more parameters (e.g., Rothman and Manis, 2003; Meliza et al., 2014). These parameters correspond to specific aspects of the cell biology (such as membrane capacitance or sodium channel density), which makes them easy to interpret and, in some cases, possible to measure directly. However, the relationships between the parameters and the observable behaviors of the neuron are highly nonlinear, making it difficult to constrain them statistically. It is difficult and time-consuming to fit dynamical models to biological data (Druckmann et al., 2008; Van Geit et al., 2008; Toth et al., 2011; Vavoulis et al., 2012), and there is little consensus on the appropriate methods or even whether there are globally optimal solutions (Prinz et al., 2004). Moreover, access to the intracellular voltage is needed, through a low-resistance electrode or an optical sensor (Huys and Paninski, 2009), which greatly limits the number of neurons that can be modeled within the context of a circuit, and almost always requires the use of ex vivo preparations that cannot be presented with realistic stimuli.
As a consequence, many studies of function in neural systems have emphasized phenomenological models that omit most of the biophysical and dynamical features of spike generation in exchange for computational tractability (Keat et al., 2001; Jolivet et al., 2004; Kobayashi et al., 2009; Izhikevich, 2003). One of the simplest examples is the generalized linear model (GLM), which represents spiking as an inhomogeneous Poisson process with a conditional intensity that depends only on a linear function of the stimulus and spiking response in the recent past (Pillow et al., 2005). In contrast to more realistic models, the GLM is a staple of statistics, with a well-defined likelihood function that is concave everywhere, guaranteeing that a global optimum can be found (Paninski et al., 2004). The GLM also has established techniques for regularization, which is necessary when stimuli have naturalistic (i.e., highly correlated) distributions (Theunissen et al., 2001; Schwartz et al., 2006).
Because of its simplicity and probabilistic formulation, a GLM can be thought of as a representation of a neuron’s encoding properties; that is, an abstract view of how the cell transforms sensory stimuli into spike trains. Surprisingly, although GLMs have been successfully used to model encoding in a number of different sensory systems (Pillow et al., 2005; Calabrese et al., 2011), and there have been several studies using GLMs to predict and characterize more complex spiking models (Ostojic and Brunel, 2011; Pozzorini et al., 2015; Weber and Pillow, 2017), to our knowledge there has not been any attempt to relate the GLM to more detailed, dynamical models with realistic sensory inputs. As a result, it is difficult to predict how natural, pathological, or experience-dependent variations in voltage-gated channels are likely to affect sensory processing.
In this study, we examined the relationship between intrinsic dynamics and encoding properties in the context of auditory processing in songbirds. Encoding models, including GLMs, have been employed extensively to study this system (Theunissen et al., 2000; Sen et al., 2001; Nagel and Doupe, 2008; Woolley et al., 2009; Calabrese et al., 2011), but until recently, there have been no data on the intracellular physiology of the constituent neurons. Using whole-cell patch recordings from slices, we have found that the caudal mesopallium (CM), a cortical-level auditory area (Wang et al., 2010; Jarvis et al., 2013), has diverse, experience-dependent intrinsic dynamics (Chen and Meliza, 2018, 2020). Most of the putatively excitatory neurons fire repetitively when depolarized, but a substantial fraction only fire at stimulus onsets. This phasic firing behavior is correlated with strong outward rectification that activates at low voltages, and it can be pharmacologically converted to tonic firing by blocking low-threshold potassium currents (KLT). The proportion of phasic neurons changes over development, along with expression of Kv1.1, a low-threshold potassium channel, and both depend on exposure to a complex acoustic environment early in life.
The dependence of phasic firing on auditory experience suggests that intrinsic plasticity (i.e., a change in the expression or properties of voltage-gated currents, rather than synaptic currents) plays a critical role in development for songbirds, but for all the reasons noted above, the functional significance remains unclear. Here, we took a simulation-based approach to ask how changing the magnitude of low-threshold potassium currents in a dynamical model would affect encoding properties, as estimated with a GLM.
We simulated auditory responses using a linear-dynamical cascade model (Bjoring and Meliza, 2019), which combines a linear spectrotemporal receptive field (RF) with a single-compartment biophysical model (Fig. 1A). The dynamical model we used can produce phasic or tonic responses depending on a single parameter that governs the maximal conductance of a low-threshold potassium current (gKLT) (Fig. 1B–C). As shown previously, increasing this parameter causes neurons to attenuate their responses to low-frequency components of a broadband driving current (Chen and Meliza, 2018) and to be less responsive to random synaptic-like noise (Bjoring and Meliza, 2019). Here, to quantify the effects of these changes on encoding, we simulated responses of the linear-dynamical model to zebra finch songs, convolving the stimuli with representative RFs (Woolley et al., 2009) and using the output as an external driving current to the dynamical system. We used the spike trains produced by these simulations to fit GLMs (Fig. 2) and then compared estimates for the RF and spike-history parameters to determine how KLT influenced the model's representation of the acoustic structure of the stimulus.
(A) The linear stage of the model consists of the convolution of a stimulus with a receptive field. The output of the convolution is input as a driving current (Istim(t)) into a biophysical model (B) that represents membrane voltage dynamics as a system of ordinary differential equations. A stimulus-independent current with a pink-noise spectral distribution (Inoise(t)) is added to produce intertrial variability. (C) The model is numerically integrated to produce a simulated voltage trace. Multiple trials are simulated by keeping Istim(t) the same from trial to trial, while drawing new values for Inoise(t).
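The input stage described in this caption can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the study: the spectral-shaping method for the pink noise, the function names, the noise scale, and the exponential kernel are all assumptions made for the example, and the biophysical stage that would receive the current is omitted.

```python
import numpy as np

def pink_noise(n, rng):
    """Gaussian noise shaped to a 1/f power spectrum, scaled to unit variance."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    # Power proportional to 1/f means amplitude proportional to 1/sqrt(f).
    shaped = np.fft.irfft(spectrum / np.sqrt(freqs), n)
    return shaped / shaped.std()

def driving_current(stimulus, rf, noise_scale, rng):
    """Return (I_stim, I_stim + I_noise) for one simulated trial."""
    # Causal convolution: the current at time t depends on stimulus[t], stimulus[t-1], ...
    i_stim = np.convolve(stimulus, rf)[: len(stimulus)]
    i_noise = noise_scale * pink_noise(len(stimulus), rng)
    return i_stim, i_stim + i_noise

rng = np.random.default_rng(0)
stimulus = rng.standard_normal(1000)   # univariate white-noise stimulus
rf = np.exp(-np.arange(50) / 10.0)     # hypothetical kernel, for illustration only
i_stim, trial_a = driving_current(stimulus, rf, 0.5, rng)
_, trial_b = driving_current(stimulus, rf, 0.5, rng)
```

Calling `driving_current` twice with the same stimulus gives identical `i_stim` but different total currents, which is how intertrial variability arises in this kind of simulation.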
The data to be fit comprise a stimulus, which can be a univariate time series or a multivariate spectrogram (as shown here), and a spiking response. The model represents the response as an inhomogeneous Poisson process with a conditional intensity that depends on the convolution of the stimulus with a receptive field (K) and the convolution of the response with a spike-history filter, which was parameterized as the sum of two exponential decays representing short-term (α1) and long-term (α2) adaptation or facilitation. Not shown is a constant offset ω, which governs the baseline probability of firing. These model parameters are estimated by regularized maximum likelihood.
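As a rough guide to how the pieces in this caption fit together, the sketch below computes a conditional intensity and the corresponding Poisson log-likelihood. It assumes an exponential link, a history term that is subtracted (so that positive α values reduce firing after a spike, matching the adaptation convention used in this paper), and hypothetical time constants `tau1` and `tau2`; none of these specifics are taken from the Methods. The input `drive` stands for the stimulus already convolved with K.

```python
import numpy as np

def spike_history_filter(alpha1, alpha2, tau1, tau2, n_lags, dt):
    """Sum of two exponential decays evaluated at lags dt, 2*dt, ..., n_lags*dt."""
    lags = np.arange(1, n_lags + 1) * dt
    return alpha1 * np.exp(-lags / tau1) + alpha2 * np.exp(-lags / tau2)

def conditional_intensity(drive, spikes, omega, h):
    """drive: stimulus convolved with K; spikes: spike counts per time bin."""
    n_t = len(drive)
    # Shift by one bin so the intensity at t depends only on spikes before t.
    past = np.concatenate(([0.0], spikes[:-1]))
    history = np.convolve(past, h)[:n_t]
    return np.exp(omega + drive - history)

def poisson_log_likelihood(spikes, rate, dt):
    """Log-likelihood of binned counts, up to a constant that does not depend on rate."""
    expected = rate * dt
    return np.sum(spikes * np.log(expected) - expected)

dt = 0.001
rng = np.random.default_rng(1)
drive = 0.5 * rng.standard_normal(200)
h = spike_history_filter(2.0, 0.5, tau1=0.01, tau2=0.2, n_lags=50, dt=dt)
quiet = np.zeros(200)
one_spike = quiet.copy()
one_spike[10] = 1.0
rate_quiet = conditional_intensity(drive, quiet, omega=1.0, h=h)
rate_spike = conditional_intensity(drive, one_spike, omega=1.0, h=h)
```

With positive `alpha1` and `alpha2`, the intensity just after the spike at bin 10 is lower in `rate_spike` than in `rate_quiet`, and the two traces agree again once the filter has run out.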
Results
Univariate white-noise stimulus
As a proof of principle, we began with an example using a white-noise stimulus drawn from a univariate Gaussian distribution. The absence of temporal correlations in this stimulus is ideal for obtaining unbiased estimates of the GLM parameters, allowing us to determine how intrinsic dynamics affect encoding in a best-case scenario.
We generated data for fitting the GLM by providing 100 s of white noise as input to two linear-dynamical cascade (LDC) models that had the same RF but different dynamics. The dynamical stage of the model was based on our previous work in the zebra finch caudal mesopallium (Chen and Meliza, 2018; Bjoring and Meliza, 2019). The tonic model lacks KLT and has a higher capacitance, whereas the phasic model includes KLT and has a lower capacitance (see Methods for parameter values). These models reproduce the responses to step currents (Fig. 3A) and broadband currents seen in slices. Both LDC models produced similar responses to the white noise stimulus, but the phasic model tended to have narrower peaks of activity (Fig. 3B–C).
(A) Voltage responses of tonic and phasic models to high- and low-amplitude injected current steps (shown in bottom row). The tonic model exhibits depolarization block to strong currents but fires repetitively to weak currents, whereas the phasic model only fires a single spike to all suprathreshold current levels. (B) Top, response of the tonic dynamical model to a white-noise stimulus. The input RF is shown in D. Middle, raster plots of spike times from 10 trials with the same stimulus but varying Inoise(t). Black ticks correspond to the output of the dynamical model and colored ticks are the predictions of a GLM fit to a different set of data from this model. Bottom, spike rate histograms (bin size = 10 ms) for 50 trials from the dynamical model (black) and the GLM (yellow). Only a subset of the full test data is shown. (C) Like B, but for the model with phasic dynamics. The stimulus, RF, and noise level were the same. (D) Estimated RFs from the GLMs compared to the input RF of the dynamical model. To indicate posterior uncertainty in the estimates, individual samples from the MCMC sampler are shown in light gray, and the median is overlaid in color. (E) Posterior distributions of baseline firing rate (ω) and spike-history filter parameters (α1 and α2). The top panels in each column show marginal distributions for individual parameters, and the panels in the lower left corner show joint distributions for each pair of parameters. Note that more positive values of α1 and α2 correspond to stronger adaptation (i.e., a negative correlation with past spiking).
In general, parameter estimates are only interpretable to the extent that the model is a good fit to the data. We checked the goodness of fit by comparing the responses of the LDC model and the fitted GLM to a new white-noise stimulus. The output of the GLM was an excellent prediction of the dynamical model’s response (Fig. 3B–C). Indeed, the correlations between the average firing rates for LDC data and GLM prediction (tonic: r = 0.96; phasic: r = 0.84) were comparable to the correlations between average rates of even and odd trials in the data (tonic: 0.94; phasic: 0.90)—as good as could be expected given the intrinsic variability of the data. Thus, at least for white-noise stimuli, the linear spike-history filter and static nonlinearity of the GLM can closely approximate the dynamical nonlinearity of a single-compartment biophysical model. This allows us to interpret the GLM parameters as meaningful descriptions of the encoding properties of the more complex model.
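The goodness-of-fit comparison described above can be illustrated as follows. This is a sketch under assumed conventions (binary spike arrays of shape trials by time bins, histograms built by averaging trials and summing within bins); the function names and the synthetic data are hypothetical and stand in for the LDC and GLM outputs.

```python
import numpy as np

def psth(trials, bin_size):
    """Trial-averaged spike count per bin; trials has shape (n_trials, n_t)."""
    n_bins = trials.shape[1] // bin_size
    mean_rate = trials.mean(axis=0)[: n_bins * bin_size]
    return mean_rate.reshape(n_bins, bin_size).sum(axis=1)

def prediction_correlation(data_trials, model_trials, bin_size):
    """Correlation between the histograms of the data and of the model prediction."""
    return np.corrcoef(psth(data_trials, bin_size), psth(model_trials, bin_size))[0, 1]

def even_odd_correlation(trials, bin_size):
    """Correlation between histograms of even and odd trials, a reliability benchmark."""
    return np.corrcoef(psth(trials[0::2], bin_size), psth(trials[1::2], bin_size))[0, 1]

# Synthetic example: two sets of trials drawn from the same slowly varying rate.
rng = np.random.default_rng(2)
p_spike = 0.05 * (1.0 + np.sin(2 * np.pi * np.arange(2000) / 500.0))
data = (rng.random((50, 2000)) < p_spike).astype(float)
model = (rng.random((50, 2000)) < p_spike).astype(float)
r_pred = prediction_correlation(data, model, bin_size=10)
r_even_odd = even_odd_correlation(data, bin_size=10)
```

Comparing `r_pred` against `r_even_odd` is the logic used in the text: a prediction cannot be expected to correlate with the data much better than the data correlate with themselves.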
The LDC and GLM both have receptive fields that are convolved with the stimulus to produce a signal that modulates the probability of spiking. When a GLM is fit using data from an LDC model, we expect the estimated RF to resemble, but not exactly match, the RF used to generate the data. Indeed, differences between the input and estimated RFs will reflect the effects of the intrinsic dynamics. One expected effect is from the filtering properties of the membrane. In the GLM, firing probability depends on a static, exponential function of the convolved stimulus (Fig. 2). In the LDC model, the output of the convolution enters as a current that contributes linearly to the derivative of the membrane voltage. The capacitance and conductance of the membrane act as an additional, lowpass filter, so we would expect the estimated RF to be a lowpass-filtered version of the input RF.
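The expectation for a purely passive membrane can be made concrete with a first-order RC filter, whose impulse response is an exponential decay with time constant equal to capacitance divided by leak conductance. The sketch below is illustrative only: the 10 ms time constant, the Gaussian kernel, and the spread measure are assumptions chosen for the example, not values from the model.

```python
import numpy as np

def membrane_kernel(tau, dt, n):
    """Impulse response of a passive RC membrane, normalized to unit area."""
    t = np.arange(n) * dt
    kernel = np.exp(-t / tau)
    return kernel / kernel.sum()

def temporal_spread(kernel, dt):
    """Standard deviation of |kernel| treated as a distribution over lag (seconds)."""
    t = np.arange(len(kernel)) * dt
    weights = np.abs(kernel) / np.abs(kernel).sum()
    mean_lag = np.sum(weights * t)
    return np.sqrt(np.sum(weights * (t - mean_lag) ** 2))

dt = 0.001                                       # 1 ms time step
t = np.arange(100) * dt
input_rf = np.exp(-((t - 0.02) / 0.005) ** 2)    # hypothetical narrow temporal kernel
filtered_rf = np.convolve(input_rf, membrane_kernel(tau=0.01, dt=dt, n=100))

spread_in = temporal_spread(input_rf, dt)
spread_out = temporal_spread(filtered_rf, dt)
```

Here `spread_out` exceeds `spread_in`: passive filtering can only stretch a kernel in time, which is why a temporally compressed estimate points to active currents.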
In the time domain, the effect of the membrane would be to stretch the RF out in time. In fact, what we observed was that the estimated RFs were either very close to the input RF (Fig. 3D, top) or compressed in time (Fig. 3D, bottom), corresponding to a relative boosting of higher frequencies. This would not be possible for a model with a purely passive membrane; therefore, it must be the active, voltage-gated currents that are shifting the model’s temporal encoding properties. This temporal distortion, which is consistent with the bandpass characteristics of KLT (Meng et al., 2012; Chen and Meliza, 2018), will be explored further in subsequent analyses.
Intrinsic dynamics also affected the spike-history filter. Unlike the RF, the parameters for the spike-history filter do not correspond to specific parameters in the LDC model; however, we expect them to reflect the effects of currents that are activated by spiking. As seen in Fig. 3E, the spike-history filter was stronger on both short (α1) and long (α2) timescales for data from the tonic model compared to the phasic one. The posterior uncertainty in these parameter estimates was low compared to the difference between dynamical models. This means that the spiking patterns produced by phasic and tonic cells are sufficiently different, at least for this kind of stimulus and amount of data, to observe changes in a single biophysical parameter.
Multivariate birdsong stimulus
Having demonstrated that the GLM can be used to analyze the encoding properties of a dynamical model, we turned to a more realistic scenario using natural birdsong as the stimulus. The dynamics remained the same as in the univariate case, but the linear stage was replaced with a spectrotemporal RF. The stimulus, which consisted of 40 s of song from multiple zebra finches, was converted to a spectrogram and convolved with the RF, summing across spectral channels. This produced a univariate time series that entered into the dynamics as an external current.
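The reduction from a spectrogram to a single driving current can be sketched as below, assuming the spectrogram and RF are arrays of shape (frequency channels, time) with lag 0 in the first RF column. The function name `strf_drive` and the array layout are hypothetical conventions for this example.

```python
import numpy as np

def strf_drive(spectrogram, rf):
    """Convolve each spectral channel with its RF row in time, then sum across channels."""
    n_freq, n_t = spectrogram.shape
    drive = np.zeros(n_t)
    for f in range(n_freq):
        # Causal convolution in time; truncate to the stimulus length.
        drive += np.convolve(spectrogram[f], rf[f])[:n_t]
    return drive

# A single impulse in channel 2 at time bin 5 should reproduce that channel's RF row.
rng = np.random.default_rng(3)
rf = rng.standard_normal((4, 10))
spectrogram = np.zeros((4, 40))
spectrogram[2, 5] = 1.0
drive = strf_drive(spectrogram, rf)
```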
We used RFs that were representative of the diversity found in cortical-level auditory neurons. RF structure can be analyzed in terms of the modulation transfer function (MTF), a 2-D Fourier transform of the RF that shows its joint spectral and temporal tuning (Woolley et al., 2005). Most of the neurons in the zebra finch primary auditory pallium have MTFs with power along either the spectral or temporal axis, indicating that they can be tuned to narrow spectral bands or to rapid modulations of the temporal envelope, but only rarely to both (Woolley et al., 2009). This distribution is similar to the modulation spectrum of zebra finch song (Woolley et al., 2005) and at least partly reflects the statistics of early auditory experience (Moore and Woolley, 2019). Here, we simulated responses using 58 synthetic RFs drawn from this distribution (Woolley et al., 2009). Each RF was combined with the tonic and phasic dynamical models, so that we could quantify the effects of KLT across RF types and determine if there was any interaction with RF structure. As before, the simulated responses were used to estimate GLM parameters, but with the addition of elastic-net regularization to account for the high sparsity of the RFs and the autocorrelations present in the birdsong stimulus (see Methods).
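For readers unfamiliar with elastic-net regularization, the penalty combines an L1 term, which favors sparse RFs, with an L2 term, which stabilizes estimates when stimulus features are correlated. The sketch below uses one common parameterization (a strength `lam` and a mixing weight `l1_ratio`); this form and both names are assumptions, and the exact penalty used in this study is given in the Methods.

```python
import numpy as np

def elastic_net_penalty(k, lam, l1_ratio):
    """lam * (l1_ratio * ||k||_1 + (1 - l1_ratio) * 0.5 * ||k||_2^2)."""
    l1 = np.sum(np.abs(k))
    l2 = 0.5 * np.sum(k ** 2)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

def penalized_objective(neg_log_likelihood, k, lam, l1_ratio):
    """Quantity minimized in regularized maximum-likelihood estimation."""
    return neg_log_likelihood + elastic_net_penalty(k, lam, l1_ratio)

k = np.array([1.0, -2.0, 0.0])   # toy RF coefficients
pure_l1 = elastic_net_penalty(k, lam=2.0, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(k, lam=2.0, l1_ratio=0.0)
mixed = elastic_net_penalty(k, lam=2.0, l1_ratio=0.5)
```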
We begin by examining three examples representative of the distribution. As will be seen, the temporal characteristics of the input RF have a consistent effect on encoding properties, so we have denoted these three examples in terms of their temporal modulation transfer functions (tMTFs): wideband (WB), bandpass-low (BP-L), and bandpass-high (BP-H). As seen in Fig. 4A–F, the fitted GLMs had good predictive performance for both the phasic and tonic models and across all three input RFs, with high correlations between the spike rate histograms produced by the LDC and GL models to a novel birdsong stimulus. Thus, even with many more parameters and an autocorrelated stimulus, the GLM is still a good tool for analyzing the encoding properties of the dynamical models.
(A) Receptive field parameters and responses for a model with tonic dynamics and a spectrally narrowband, temporally wideband RF. Top left, input RF in the LDC model. Top right, estimated RF from GLM. Note the temporal smearing and the broad suppression at longer lags in the estimated RF. Middle, examples of spiking responses to zebra finch song from the LDC model (top, black ticks) and the fitted GLM (bottom, red). Bottom, corresponding spike rate histograms (50 trials) for the LDC and GLM (product-moment correlation: rWB = 0.86). (B-C) RFs and responses for models with tonic dynamics and BP-L (B) or BP-H RFs (C), same format as in (A). The GLM accurately predicted the firing rate of the LDC for these parameter values (rBP-L = 0.93, rBP-H = 0.86). (D-F) RFs and responses for models with the same RFs as in (A-C), but with phasic dynamics (rWB = 0.92, rBP-L = 0.95, rBP-H = 0.83). All prediction correlations were high considering the underlying spiking variability in the even and odd trials of the LDC (product-moment correlations: tonicWB = 0.92, tonicBP-L = 0.89, tonicBP-H = 0.83; phasicWB = 0.93, phasicBP-L = 0.90, phasicBP-H = 0.91). (G) Posterior distributions of α1 and α2 comparing dynamical models for each RF. (H) Temporal modulation transfer functions of input RFs, tonic model estimates, and phasic model estimates for each of the three input RFs. Power is normalized relative to the peak for each spectrum.
As with the univariate case, the estimated RFs were qualitatively similar to the input RFs, but with distortions in the temporal profile. Most of the estimated RFs appeared to be smeared in time and with stronger and longer suppressive periods. Some of the distortions were consistent across tonic and phasic models, but there were also differences between the two dynamical models that reflect the effects of KLT. We analyzed these effects by looking at the tMTFs, which are calculated by summing the 2D Fourier transform of the RFs across the spectral dimension (Fig. 4H). These plots show how well the model neuron is able to encode temporal modulations in the stimulus as a function of frequency. All of the estimated RFs were tuned to frequencies below 100 Hz, which is about the fastest temporal modulation rate found in zebra finch song (Singh and Theunissen, 2003). Although some of the input RFs had the potential to represent faster modulations, these frequencies were attenuated in the estimated RFs, probably because of the passive filtering properties of the membrane and the statistics of the stimulus. The main differences between the dynamical models were in the attenuation of low frequencies. Strikingly, the effects of the dynamics on lowpass attenuation varied across RFs. For the WB input, the estimated tMTF was more bandpass in the phasic model compared to the tonic model, while the opposite was true for the BP-L and BP-H inputs. Thus, not only does KLT change the temporal encoding properties of the neuron, but this effect is different depending on the filtering properties of the inputs (i.e., the input tMTF).
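A tMTF of the kind described here can be computed along the following lines. This is a sketch rather than the analysis code: it assumes the quantity summed across the spectral dimension is the squared magnitude (power) of the 2D Fourier transform, that only non-negative modulation frequencies are kept, and that the result is normalized to its peak. The example kernels are hypothetical.

```python
import numpy as np

def temporal_mtf(rf, dt):
    """rf: (n_freq, n_lag). Returns (modulation frequencies in Hz, peak-normalized power)."""
    power = np.abs(np.fft.fft2(rf)) ** 2
    tmtf = power.sum(axis=0)                 # collapse the spectral-modulation axis
    freqs = np.fft.fftfreq(rf.shape[1], d=dt)
    keep = freqs >= 0                        # discard the mirrored negative frequencies
    return freqs[keep], tmtf[keep] / tmtf[keep].max()

dt = 0.001
t = np.arange(200) * dt
spectral_profile = np.exp(-((np.arange(20) - 10) / 2.0) ** 2)

# Sustained kernel (no negative lobe) versus a kernel oscillating at 50 Hz.
sustained = np.outer(spectral_profile, np.exp(-t / 0.02))
oscillating = np.outer(spectral_profile, np.exp(-t / 0.02) * np.sin(2 * np.pi * 50 * t))

f_sus, tmtf_sus = temporal_mtf(sustained, dt)
f_osc, tmtf_osc = temporal_mtf(oscillating, dt)
```

The sustained kernel yields a spectrum that peaks at 0 Hz, whereas the oscillating kernel peaks near its modulation rate and carries relatively little power at 0 Hz, which is the bandpass signature discussed in the text.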
The posterior distributions for the spike-history parameters were broader than for the univariate examples (Fig. 4G), indicating that the estimates are more poorly constrained by the data. This was expected, given that the stimulus was shorter and more correlated. Although this reduced our ability to resolve effects of dynamics on these parameters for single neurons, as the next section will show, the trends in these examples were consistent across the larger sample of RFs.
As with the RF temporal structure, the spike-history filter parameters were affected by the interaction of RF type and dynamics. In general, phasic models had stronger adaptation than tonic models, as indicated by larger values of α1 or α2 (Fig. 4G). This effect was in the opposite direction from what we saw in the univariate case (Fig. 3E), where tonic neurons had larger values of α1 and α2. This discrepancy presumably reflects differences in the stimulus statistics, because the univariate example RF was qualitatively similar to the temporal profile of the example RFs. As has been reported previously, neuron models fit to white-noise stimuli produce poor predictions to natural stimuli (Theunissen et al., 2001). The univariate GLMs produced good predictions because they were fit and tested with white-noise stimuli, but the parameter estimates do not generalize to other kinds of stimuli. Like many other natural stimuli, a key feature of birdsong is that the temporal envelope is dominated by low frequencies. These slow oscillations produce sustained periods of excitation or inhibition that drive the dynamical model into regimes where adaptive processes come more strongly into play. This nonlinear interaction between stimulus statistics and dynamics likely also explains why the effect of KLT varied across the example RFs. For the WB and BP-L models, phasic dynamics (i.e., increased KLT) caused α1 to increase while α2 remained constant, but for the BP-H models, phasic dynamics did not have a large effect on either parameter.
Interaction of intrinsic dynamics and RF temporal filtering
Based on these examples, we hypothesized that the key contributor to these interactions was the temporal profile of the input RF, in particular whether there was a negative lobe at longer lags. In the modulation frequency domain, this lobe corresponds to bandpass filtering. The parametric, Gabor-based model we used to generate the RFs (Woolley et al., 2009) represents this feature by a single parameter, the temporal phase (Pt), which is 0 for the WB example and nonzero for the BP-L and BP-H examples. Approximately half (26/58) of the RFs in our larger sample, those with modulation power primarily along the spectral axis, had Pt of 0, whereas the RFs with power along the temporal modulation axis (32/58) had a nonzero Pt.
The performance of GLMs fit to data from the larger set of RFs was consistently good, with high correlations between the spike-rate histograms of the LDC and GL models for the tonicWB (r = 0.74 ± 0.11), tonicBP (r = 0.92 ± 0.02), phasicWB (r = 0.84 ± 0.04), and phasicBP (r = 0.93 ± 0.05) groups. These values were comparable to the correlations between the even and odd trials of the LDC data for the tonicWB (r = 0.90 ± 0.05), tonicBP (r = 0.87 ± 0.03), phasicWB (r = 0.92 ± 0.01), and phasicBP (r = 0.90 ± 0.02) models. Performance was slightly lower for the tonicWB data, but the reason for this was not clear.
The results from the larger sample of RFs were consistent with our hypothesis. We looked first at the effects of dynamics on RF temporal structure, specifically the extent to which the estimated tMTF (which represents how the full LDC model encodes stimuli) was attenuated at low frequencies compared to the input tMTF (Δl). In Fig. 4H, Δl corresponds to the difference between the black line and blue or yellow line at f = 0 with maximum power set to 1. Positive values of Δl indicate that the estimated RF is more bandpass (i.e., responds less to low-frequency modulations) compared to the input RF. Negative values indicate that encoding of lower frequencies is boosted. As shown in Fig. 5, for models with WB temporal tuning, phasic dynamics attenuated low frequencies, in comparison to the matching tonic models (LMM: b1 = −0.38, n = 52). For neurons with BP temporal tuning, the effect was the opposite: phasic dynamics caused low frequencies to be less attenuated compared to the matching tonic models (b1 = 0.22, n = 64). In other words, across a broad range of RFs, KLT consistently causes neurons with broadly tuned inputs to become more selective for higher-frequency features, but causes neurons that already have narrowly tuned inputs to become more responsive to lower frequencies.
Lowpass attenuation was defined as the difference in the ratios between the power at f = 0 and the peak power of the temporal modulation spectrum (as in Fig. 4H) of the input RF and GLM estimated RF. The y-axis shows the difference between this value for the input RF and the estimated RF. Positive values indicate that the estimated RF is more bandpass in its temporal filtering properties compared to the input RF, while negative values indicate the estimated RFs were more lowpass. For each RF, lowpass attenuation estimates for the phasic and tonic models are connected by a black dotted line. The bold dotted line shows the differences in the mean lowpass attenuation estimates (enlarged black dot) between RF types for a given model. The linear mixed effects model (LMM) with the interaction between RF type and dynamics fits significantly better than the LMM with main effects only (LMM: χ2(1) = 81.8, p < 0.001).
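The lowpass attenuation measure defined in this caption can be written compactly. The sketch below assumes each tMTF is an array whose first element is the power at f = 0, and it takes the input ratio minus the estimated ratio so that positive values mean the estimate is more bandpass, as stated above. The function name and toy spectra are hypothetical.

```python
import numpy as np

def lowpass_attenuation(input_tmtf, estimated_tmtf):
    """Difference in (power at f = 0) / (peak power) between input and estimated RFs."""
    input_ratio = input_tmtf[0] / input_tmtf.max()
    estimated_ratio = estimated_tmtf[0] / estimated_tmtf.max()
    return input_ratio - estimated_ratio

lowpass = np.array([1.0, 0.5, 0.1])    # toy spectrum peaking at f = 0
bandpass = np.array([0.2, 1.0, 0.3])   # toy spectrum peaking away from f = 0

more_bandpass = lowpass_attenuation(lowpass, bandpass)   # estimate lost low-frequency power
more_lowpass = lowpass_attenuation(bandpass, lowpass)    # estimate gained low-frequency power
```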
Similarly, just as we saw with the example models, the adaptation parameters also depended on RF temporal structure and dynamics. As shown in Fig. 6A, the general trend was for phasic models to have lower spontaneous firing rates and stronger adaptation, but the effects differed by RF type. Spontaneous firing rates (ω; Fig. 6B) were strongly suppressed by KLT in WB models (LMM: b1 = −1.10, n = 52) but not in BP models (b1 = −0.17, n = 64). Short-timescale adaptation (α1; Fig. 6C) was stronger for models with phasic dynamics (b1 = −0.59, n = 116) and BP RFs (b2 = −0.29, n = 116), but the effect of KLT was consistent across both kinds of RFs (i.e., no interaction). There was an interaction for longer-timescale adaptation (α2; Fig. 6D), which was greater for BP models with phasic dynamics (b1 = 0.03, n = 64) compared to the corresponding tonic models, but showed the opposite effect for WB models (b1 = −0.11, n = 52). Note that in contrast to the univariate example, α2 estimates were consistently negative, which corresponds to a baseline facilitation (i.e., past spikes are associated with an increased probability of firing).
(A) Point estimates of ω, α1, and α2 GLM parameters for phasic (blues) and tonic (yellows) models by RF type. Along the diagonal are the marginal distributions for each of the parameters, with the joint distributions on the off-diagonal. (B) Scatter plot of parameter estimates showing paired phasic and tonic models (as in Fig. 5). For each RF, the phasic and tonic model parameter estimates are connected by a black dotted line. The bold dotted lines show the differences in the mean parameter estimates between RF types for a given model. The LMM with the interaction is a significantly better fit than the LMM with only main effects for ω (LMM: χ2(1) = 45.9, p < 0.001) and α2 (χ2(1) = 42.4, p < 0.001) but not for α1 (χ2(1) = 0.16, p = 0.69).
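The model comparisons reported in this caption are likelihood-ratio tests with one degree of freedom, since the nested models differ only by the interaction term. As a worked check, the tail probability of a chi-squared variable with one degree of freedom can be computed with the complementary error function. The helper names below are hypothetical, and the sketch takes the reported statistics as given rather than refitting the LMMs.

```python
import math

def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)

def lrt_p_value_df1(chi2_stat):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    # If Z is standard normal, P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2_stat / 2.0))

p_alpha1 = lrt_p_value_df1(0.16)   # interaction term for alpha1
p_omega = lrt_p_value_df1(45.9)    # interaction term for omega
p_alpha2 = lrt_p_value_df1(42.4)   # interaction term for alpha2
```

The value of `p_alpha1` rounds to the reported 0.69, and the other two fall well below 0.001.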
Nonlinear, nonmonotonic effects of KLT on encoding properties
Up to this point, intrinsic dynamics have been dichotomized into tonic and phasic firing. For step currents, this dichotomy reflects a bifurcation in the dynamics: below a critical value of gKLT, spiking is repetitive, but above this value, it occurs only at the stimulus onset (Rothman and Manis, 2003; Meng et al., 2012). For broadband current stimuli, however, the effects of gKLT are more graded (Chen and Meliza, 2018). To test whether KLT affects encoding properties in a continuous or binary manner, we simulated responses using LDC models with values of gKLT that varied in steps of 1 nS over a range of 0 to 50 nS (with capacitance kept constant at 60 pF), which encompasses the bifurcation in this model from tonic to phasic firing. For simplicity, we used only the three example receptive fields shown in Fig. 4 (WB, BP-L, and BP-H). Using the same birdsong stimulus, we fit GLMs to data from these simulations and examined how lowpass attenuation and adaptation were affected.
The performance of the GLM was good across all levels of gKLT (Fig. 7A), though for the WB RF, there was a drop in performance for larger gKLT values. Because this was not associated with a decrease in intertrial reliability (as indicated by correlations between even and odd trials in the data; Fig. 7B), it suggests that the LDC model is more difficult to approximate with a GLM for some combinations of parameters. Overall, however, the predicted spike trains remained highly accurate, allowing resulting parameter estimates to be meaningfully interpreted.
(A) Correlation coefficients between the spike-rate histograms of the LDC and GL models as a function of gKLT. The RFBP-L (purple) and RFBP-H (orange) models were largely unaffected in their fit by gKLT, while the RFWB (red) models showed a dependence on gKLT. (B) Correlation between the even and odd trials of the LDC model as a function of gKLT. (C–F) Lowpass attenuation, ω, α1, and α2 estimates as a function of gKLT for the three exemplar RFs.
Consistent with what we observed with dichotomized dynamics, the effects of KLT on RF temporal structure, spontaneous firing rate, and adaptation depended on RF type (Fig. 7C–F). With the exception of spontaneous firing rate (Fig. 7D), the trajectories of the parameters as gKLT increased were nonlinear and, in the case of α1, sometimes not even monotonic. However, there was little evidence of bifurcation, which would have appeared as a sharp discontinuity between two stable regimes. These results confirm that the effects of intrinsic dynamics on encoding properties are highly nonlinear, with a strong dependence on the statistics of the stimulus and the tuning of the inputs.
Discussion
These data demonstrate how intrinsic dynamics can affect the temporal encoding properties of cortical-level auditory neurons. Although this effect is not unexpected, to our knowledge it has not yet been quantitatively characterized. Our approach was to simulate zebra finch auditory responses with a biophysically realistic linear-dynamical cascade model and then estimate encoding properties using GLMs, which are statistically robust and easy to interpret. This allowed us to modulate intrinsic dynamics by changing the parameter values that correspond to specific cellular mechanisms and explore the effects on receptive fields and spike-history adaptation.
We focused on a low-threshold potassium current (KLT), which is expressed in a subset of neurons in zebra finch CM. In a previous study, we used broadband current injections to show that KLT affects temporal integration, causing neurons to reject low-frequency fluctuations and respond selectively to frequencies around the maximum temporal modulation rate of zebra finch song (Chen and Meliza, 2018). This filtering effect is reproduced by the dynamical model used here. However, the current stimuli used to build the model were artificial and unrepresentative of the stimulus-driven synaptic activity CM neurons would receive in vivo. Thus, to predict how variation in KLT might affect auditory responses to vocal communications in this species, we drove the dynamical model with an injected current that was the result of convolving natural zebra finch song with a spectrotemporal RF, which we term the “input RF”. Input RFs, which represent a linear approximation of the preprocessing performed by the neuron’s presynaptic partners, were randomly drawn from a published distribution of RFs found in zebra finch Field L (Woolley et al., 2009), the major source of ascending auditory input to CM (Vates et al., 1996; Wang et al., 2010). This allowed us to predict which effects of the dynamics would be consistent across the population and which would depend on tuning of the inputs.
The estimated RFs, which we interpret as the features of the stimulus that neurons encode in their spiking outputs, reflected the statistics of the stimulus, the filtering properties of the input RFs, and the dynamics of spiking. Estimated RFs qualitatively resembled input RFs but were distorted in time. Analyzing these distortions using temporal modulation transfer functions (Fig. 4), we found that most (90/116) of the model neurons were less responsive to high frequencies than their inputs; we expected this effect from the lowpass filtering associated with passive leak currents. KLT, in contrast, primarily affected low frequencies in the tMTF. To our surprise, the sign of the effect depended on the input tMTF, specifically how broadly tuned it was. Wideband tMTFs became more bandpass, with less power at low frequencies, just as in our previous study (Chen and Meliza, 2018). Bandpass tMTFs, however, became more lowpass, indicating that KLT was effectively boosting responses to low frequencies in the stimulus. It is unclear how this happens.
KLT also affected the spike-history filter component of the GLM. Here the effects were more consistent across RF types, though there were some weak but significant interactions (Fig. 6B). Perhaps more surprisingly, the shape of the input RF itself influenced estimates of these parameters: α1 estimates were higher for BP neurons whereas α2 was higher for WB neurons. As a result, there was significant overlap in the population distributions of tonic and phasic neurons, such that it would be unlikely one could infer whether a cell was tonic or phasic from the spike-history parameters alone. Unfortunately, this means that intracellular recordings are still needed to characterize intrinsic dynamics.
When dynamical neuron models are stimulated with step currents, gKLT is a bifurcation parameter with a critical value that determines whether the cell can spike repetitively (tonic firing) or not (phasic firing). We found that with more realistic currents, there was little evidence of bifurcation in encoding properties, which changed smoothly as we varied gKLT (Fig. 7). These relationships nonetheless tended to be quite nonlinear, indicating that neurons can in principle achieve dramatic changes in functional response properties with only small changes in the expression or localization of a single type of channel. This sensitivity is what makes intrinsic dynamics a useful target for plasticity. We recently showed that CM neurons express more Kv1.1 and become more phasic during the peak of the critical period for song memorization, but only in finches raised in the complex acoustic environment of a colony (Chen and Meliza, 2020). In rats, exposure to dynamically modulated noise causes neurons in primary auditory cortex to shift their tuning away from the spectrotemporal modulation frequencies of the noise (Homma et al., 2020). Taken together with our study, these findings suggest that KLT may be responsible for changes in sensory filtering during auditory development. This mechanism could, in concert with synaptic plasticity, allow neurons to modulate their responses to temporally complex acoustic environments during critical periods of auditory development.
This study was limited to the effects of manipulating a single biophysical parameter (gKLT) on encoding of a single kind of auditory stimulus (zebra finch song), but the same approach could easily be adapted to other auditory areas and sensory systems that exhibit diverse or plastic intrinsic dynamics. We have shown that GLMs can accurately predict the spiking responses of more complex, more biophysically realistic models across different kinds of stimuli, receptive fields, and dynamical regimes. However, GLMs remain a linear approximation, and care is needed in interpreting the parameter estimates, which are not linear or independent functions of the underlying dynamics. Given the nonlinear kinetics of most voltage-gated currents, we expect that the relationships between intrinsic dynamics and encoding properties will be complex and often counterintuitive in most systems, but that there will be much to learn in each system about how intrinsic dynamics reflect the computational tasks and constraints that need to be solved.
Methods
Stimulus Design
For univariate models, the stimulus consisted of 100 s of Gaussian white noise sampled at 1 kHz. For multivariate models, the stimulus consisted of zebra finch song motifs recorded from 30 adult males in our colony. Each motif was normalized to the same RMS amplitude and repeated twice, padding with at least 50 ms of microphone noise at the beginning to avoid transients in the convolution. The total duration of the stimulus was 63.7 s, of which 12.7 s was reserved for testing performance. Spectrograms of the stimuli were calculated using a gammatone filter bank (Slaney, 1998) with a window size of 2.5 ms, 20 spectral channels between 1.0 and 8.0 kHz, and a step size of 1.0 ms.
Receptive Field Construction
The univariate receptive field was generated from the difference of two gamma functions with time constants of 16 and 32 ms and an amplitude ratio of 1.5. Spectro-temporal receptive fields (RFs) were parameterized as the outer product of two Gabor functions multiplied by a scalar amplitude:
where H is the temporal dimension of the RF, G is the spectral dimension, t0 is the latency, f0 is the peak frequency, σt and σf are the temporal and spectral bandwidths, Ωt and Ωf are the temporal and spectral modulation frequencies, Pt is the temporal phase (either 0 or 2π), Pf is the frequency phase (set to 0 for all RFs), and A is the amplitude. The temporal dimension H had a duration of 50 ms with a 1 ms resolution, while the frequency dimension G had 20 channels between 1 and 8 kHz. We generated 59 RFs by sampling randomly from the distributions given by Woolley et al. (2009) as representative of empirically recorded RFs in primary areas of the zebra finch auditory pallium. The amplitude parameter A was initially set to 1 for all of the RFs, but was adjusted to between 1.5 and 2 for 7/59 models so that they would fire at a rate of at least 1 Hz on average. It was not possible to meet this criterion for one RF out of the 59, so it was discarded, leaving a total of 58 RFs used for analyses.
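As an illustration of the parameterization described above, the following minimal sketch builds one RF as the outer product of a temporal and a spectral Gabor function. The Gabor form used here (a Gaussian envelope multiplied by a cosine carrier), the linear spacing of the frequency channels, and the example parameter values are assumptions made for illustration; the function names `gabor` and `make_rf` are hypothetical.

```python
import numpy as np

def gabor(x, center, bandwidth, mod_freq, phase):
    """Gaussian envelope times a cosine carrier (assumed Gabor form)."""
    envelope = np.exp(-0.5 * ((x - center) / bandwidth) ** 2)
    carrier = np.cos(2 * np.pi * mod_freq * (x - center) + phase)
    return envelope * carrier

def make_rf(t0, sigma_t, omega_t, p_t, f0, sigma_f, omega_f, p_f=0.0, amplitude=1.0):
    """Return a 20 x 50 RF (spectral channels x time bins)."""
    t = np.arange(50) * 1e-3        # 50 time bins at 1 ms resolution (seconds)
    f = np.linspace(1.0, 8.0, 20)   # 20 channels from 1 to 8 kHz (linear spacing assumed)
    H = gabor(t, t0, sigma_t, omega_t, p_t)   # temporal dimension
    G = gabor(f, f0, sigma_f, omega_f, p_f)   # spectral dimension
    return amplitude * np.outer(G, H)

# Example values are arbitrary, not drawn from the published distributions.
rf = make_rf(t0=0.010, sigma_t=0.004, omega_t=30.0, p_t=0.0,
             f0=3.0, sigma_f=1.0, omega_f=0.0)
```

Because the RF is an outer product, it is separable (rank 1) by construction, and the amplitude A simply rescales the whole matrix.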
Linear-Dynamical Cascade Model
Auditory responses were simulated with a model consisting of a linear, time-invariant stage that drives a conductance-based, single-compartment dynamical stage (Bjoring and Meliza, 2019). The voltage dynamics were governed by the sum of the external driving currents Istim(t) and Inoise(t) and the intrinsic currents, which included a leak current and various voltage-gated currents:
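In generic form, and assuming the usual convention in which intrinsic currents are subtracted from the driving currents, this current balance can be written as shown below. The symbols C_m (membrane capacitance), g_leak, and E_leak are notation introduced here for illustration, and the sum runs over the voltage-gated currents without enumerating them.

```latex
C_m \frac{dV}{dt} = I_{\mathrm{stim}}(t) + I_{\mathrm{noise}}(t)
  - g_{\mathrm{leak}}\,(V - E_{\mathrm{leak}}) - \sum_X I_X(V, t)
```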
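Each voltage-gated current depended on a maximal conductance gX, the reversal potential EX for the ion species conducted by the channel, and one or more gating variables (e.g., m, h, n). For all currents, the dynamics of the voltage-gated conductances were defined by first-order kinetics; for example,
$$\frac{dm}{dt} = \frac{m_\infty(V) - m}{\tau_m(V)}$$
where m∞(V) is the voltage-dependent steady-state value of m and τm(V) is its voltage-dependent time constant.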
The driving current Istim(t) was produced by convolving the stimulus with the RF. For the univariate white noise simulation, the white noise stimulus was convolved with the 1-dimensional filter with respect to time. For the multivariate simulations, each spectral channel was convolved with the corresponding channel of the RF and the results were summed to produce a univariate time series. Multiple trials (n = 50) were simulated by adding pink noise to Istim(t), with a signal-to-noise ratio of 4.
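The multivariate convolution step can be sketched as follows. This is a minimal illustration rather than the simulation code used in the study: the function name `driving_current` is hypothetical, the RF is assumed to be stored with lag 0 in its first column, and the per-trial pink noise is omitted.

```python
import numpy as np

def driving_current(spectrogram, rf):
    """Convolve each spectral channel with the matching RF row, then sum.

    spectrogram : array (n_channels, n_time)
    rf          : array (n_channels, n_lags), lag 0 in the first column (assumed)
    Returns a causal time series of length n_time.
    """
    n_time = spectrogram.shape[1]
    current = np.zeros(n_time)
    for stim_chan, rf_chan in zip(spectrogram, rf):
        # "full" convolution truncated to n_time keeps only past and present input
        current += np.convolve(stim_chan, rf_chan, mode="full")[:n_time]
    return current
```

Truncating the full convolution to the stimulus length keeps the filter causal, so the current at time t depends only on stimulus bins at or before t.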
The dynamical stage of the model was initially adapted for the zebra finch caudal mesopallium (Chen and Meliza, 2018) from the ventral cochlear nucleus model of Rothman and Manis (Rothman and Manis, 2003). This model can produce phasic or tonic responses to step currents depending on the maximal conductance of a low-threshold potassium current (gKLT). When gKLT is low, the model neuron produces sustained responses to weak and moderate depolarizations; when gKLT is high, the model only fires at the onset of the current step. The model parameter values used here are shown in Table 1. Each RF was paired with a tonic and a phasic model. To examine how encoding properties change over the full range of gKLT values, we started with the tonic model parameters and increased gKLT from 0 nS to 50 nS in steps of 1 nS.
Parameter values for biophysical models.
Results of Δl LMM comparison
Results of ω LMM comparison
Results of α1 LMM comparison
Results of α2 LMM comparison
The dynamical model simulation code was generated using spyks (https://github.com/melizalab/spyks; version 0.6.10), and the dynamics were integrated using a 5th-order Runge-Kutta algorithm with an adaptive error tolerance of 1 × 10−5 and an interpolated step size of 0.025 ms. The output of the integration was converted to spike times by thresholding the voltage at -20 mV.
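The thresholding step can be sketched as below. The source states only that the voltage was thresholded at -20 mV; defining a spike time as the first sample of each upward crossing is an assumption, and the function name `spike_times` is hypothetical.

```python
import numpy as np

def spike_times(voltage, dt=0.025, threshold=-20.0):
    """Return times (ms) at which the voltage crosses the threshold from below.

    voltage   : 1-D array of membrane potential (mV), sampled every dt ms
    threshold : detection threshold in mV
    """
    above = np.asarray(voltage) >= threshold
    # a spike is the first sample above threshold after a sample below it
    crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return crossings * dt
```

Counting only upward crossings means that a spike spanning several samples above threshold is reported once.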
Generalized Linear Models
A generalized linear model (GLM) (Pillow et al., 2005; Calabrese et al., 2011) was fit to the spike trains produced by the linear dynamical cascade models (Fig. 2). The conditional intensity of the model was given by:
where λ(t) is the conditional intensity at time t, exp(−ω) corresponds to the baseline firing rate of the GLM, K is the RF, which is convolved with the song spectrogram x, and h is the spike adaptation filter, which is convolved with the spike train history yhist(t). Note that we use f1 ∗ f2(t) to denote the convolution of two functions with respect to time. The full RF was a 20 × 50 matrix (20 spectral channels by 50 time bins of 1 ms). To reduce the number of parameters and avoid overfitting, K was parameterized with a rank-2 approximation; that is, the product of a 20 × 2 spectral filter and a 2 × 50 temporal filter (Thorson et al., 2015). The parameter count in the temporal dimension was further reduced by projecting into a basis set consisting of 12 raised cosine functions (Pillow et al., 2005). This basis set achieves good temporal resolution in the time immediately following a spike, with the resolution smoothly decreasing at long time intervals. The spike-history filter h was parameterized in a basis set of two exponential functions:
$$h(t) = \alpha_1 e^{-t/\tau_1} + \alpha_2 e^{-t/\tau_2}$$
where τ1 and τ2 are time constants corresponding to short (10 ms) and long (200 ms) timescales, and α1 and α2 are the coefficients. This parameterization was chosen based on the multi-timescale adaptive threshold (MAT) model, which is closely related to the GLM and has been shown to be capable of reproducing a broad range of intrinsic dynamics (Kobayashi et al., 2009; Yamauchi et al., 2011). The GLMs were fit to data from the first 80% of the stimuli. The log-likelihood function of the GLM is given by
$$\log L(\theta \mid t_0, \ldots, t_n) = \sum_{i} \log \lambda(t_i) - \int_0^T \lambda(t)\,dt \tag{6}$$
where ti is the time of the ith spike, n is the number of spikes in the experiment, T is the final time point of the experiment, and θ represents the free parameters (Rasmussen, 2018). Because the stimulus is highly correlated and the RF is expected to be sparse, we used elastic-net regularization to constrain the RF parameter estimates. Elastic-net regularization is a combination of ridge regression and the least absolute shrinkage and selection operator (LASSO). Ridge regression introduces an L2 penalization parameter (ν2) to account for multicollinearity, which is inherently present in the highly correlated structure of the song spectrogram. The LASSO introduces an L1 penalization parameter (ν1) to shrink small correlations to zero and acts as a feature selection algorithm, enforcing RF sparseness. A cost function was given by:
$$C(\theta) = -\log L(\theta \mid t_0, \ldots, t_n) + \nu_1 \lVert k \rVert_1 + \nu_2 \lVert k \rVert_2 \tag{7}$$
where ‖k‖1 and ‖k‖2 are the L1-norm and L2-norm of K (reshaped into a 1-D vector), respectively. Since the log-likelihood function is concave and is guaranteed to be free of local maxima (Paninski et al., 2004), we simultaneously estimated the parameters (ω, K, α1, α2) by minimizing the cost function using the Newton conjugate-gradient method as implemented in the scipy function ‘fmin_ncg’ (version 1.3.0) (Virtanen et al., 2020). Theano (version 1.0.4) (Al-Rfou et al., 2016) was used to symbolically derive the gradient and Hessian of the cost function and dynamically generate C code to evaluate them. The regularization coefficients (ν1, ν2) and the factorization rank D were chosen using 4-fold cross-validation on the estimation data.
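The pieces of the GLM objective described in this section can be sketched in discrete time as follows. This is an illustrative approximation, not the Theano implementation used in the study. Several choices are assumptions: the spike-history term is subtracted inside the exponential (so that positive α values suppress firing), the integral of λ is replaced by a sum over 1 ms bins, and the L2 term uses the unsquared norm as worded in the text. All function names are hypothetical.

```python
import numpy as np

def history_filter(alpha1, alpha2, tau1=10.0, tau2=200.0, n_lags=1000, dt=1.0):
    """Two-exponential spike-history filter evaluated at lags dt, 2*dt, ... (ms)."""
    lags = np.arange(1, n_lags + 1) * dt
    return alpha1 * np.exp(-lags / tau1) + alpha2 * np.exp(-lags / tau2)

def conditional_intensity(omega, k_drive, h, spikes):
    """lambda(t) = exp(-omega + K*x(t) - h*y_hist(t)); the sign of h is assumed.

    k_drive : stimulus drive K*x(t), already convolved, shape (n_time,)
    spikes  : binned spike counts, shape (n_time,)
    """
    n = len(spikes)
    hist = np.convolve(spikes, h, mode="full")[:n]
    hist = np.concatenate(([0.0], hist[:-1]))  # a spike only affects later bins
    return np.exp(-omega + k_drive - hist)

def log_likelihood(lam, spikes, dt=1.0):
    """Sum of log(lambda) at spike bins minus the (discretized) integral of lambda."""
    return np.sum(spikes * np.log(lam)) - np.sum(lam) * dt

def elastic_net_cost(lam, spikes, k, nu1, nu2, dt=1.0):
    """Negative log-likelihood plus L1 and (unsquared) L2 penalties on the RF."""
    k = np.ravel(k)
    penalty = nu1 * np.sum(np.abs(k)) + nu2 * np.sqrt(np.sum(k ** 2))
    return -log_likelihood(lam, spikes, dt) + penalty
```

In this sketch λ is expressed per time bin, so `dt` must use the same time unit as the intensity; many elastic-net implementations use the squared L2 norm instead, which would change only the penalty line.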
We quantified the uncertainty in the maximum-likelihood estimates (ω, K, α1, α2) by sampling from the joint posterior distribution, p(θ|t0, …, tn) ∝ p(θ)L(θ|t0, …, tn), using emcee (version 2.2.1), a Python implementation of an affine-invariant ensemble Markov chain Monte Carlo sampler (Foreman-Mackey et al., 2013). The log of the prior probability p(θ) was set to the negative of the elastic-net penalty (Eq. 7) using the values of ν1 and ν2 obtained through cross-validation, and the log-likelihood was as in Eq. (6). An ensemble of 1000 chains was initialized with random values centered around the maximum-likelihood estimate and given a burn-in of 2500–6000 steps. After this period, each chain was sampled one more time to give a set of 1000 independent samples from p(θ|t0, …, tn). For population-level analyses, the final values of the GLM parameters (ω, K, α1, α2) were taken as the medians of their respective posterior distributions, which were symmetric and bell-shaped. These values were very close to the initial ML point estimates, so we did not sample from the posterior for the analyses shown in Fig. 7.
To quantify performance, we generated posterior predictive distributions of spike trains from the fitted GLMs, with time discretized to Δ = 0.5 ms. At such short time scales, λ(t) · Δ could be treated as the success probability of a Bernoulli trial in each time bin, which we used to generate spike train responses from the GLMs. In each trial, we drew a sample from the posterior distribution, so the intertrial variability reflects not only the intrinsic variance of the Bernoulli distribution but also the uncertainty in the parameter estimates. Performance was quantified as the product-moment correlation between the spike-rate histograms (50 trials, 10 ms bins) for the data and the prediction on the 20% of the stimulus reserved for testing. As a baseline measure of intrinsic variability, we calculated the product-moment correlation between even and odd trials in the data (i.e., from the linear-dynamical cascade model); however, we did not explicitly correct performance scores for this variability.
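The Bernoulli approximation and the histogram-based performance score can be sketched as below. This is a simplified illustration: the rate is treated as a fixed function of time, so the spike-history feedback and the per-trial draw from the posterior are both omitted, and the function names are hypothetical.

```python
import numpy as np

def simulate_trials(rate_hz, n_trials=50, dt=0.0005, seed=0):
    """Draw spikes as independent Bernoulli trials with probability rate * dt."""
    rng = np.random.default_rng(seed)
    p = np.clip(np.asarray(rate_hz) * dt, 0.0, 1.0)
    return rng.random((n_trials, p.size)) < p

def psth(spikes, bins_per_window=20):
    """Spike counts summed over trials in non-overlapping windows.

    With dt = 0.5 ms, 20 bins per window gives the 10 ms histogram bins.
    """
    n_trials, n_bins = spikes.shape
    n_windows = n_bins // bins_per_window
    trimmed = spikes[:, : n_windows * bins_per_window]
    return trimmed.reshape(n_trials, n_windows, bins_per_window).sum(axis=(0, 2))

def psth_correlation(a, b, bins_per_window=20):
    """Product-moment correlation between two sets of trials."""
    return np.corrcoef(psth(a, bins_per_window), psth(b, bins_per_window))[0, 1]
```

The same `psth_correlation` function gives the even/odd baseline when called as `psth_correlation(data[0::2], data[1::2])` on a single set of trials.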
Lowpass Attenuation
The estimated RF parameters were projected back into a linear time basis and reshaped into a 20 × 50 matrix. To obtain the temporal modulation transfer function (tMTF), a 2-dimensional Fourier transform was performed on the RF, summing across the spectral dimension (including positive and negative frequencies). The Fourier transform was calculated using the numpy package in Python, with zero-padding and the application of a Hanning window in the temporal profile to avoid edge effects. RF lowpass attenuation was quantified as:
where P0 is the power at the zero frequency of the input tMTF, Pmax is the maximum power of the input tMTF, $\hat{P}_0$ is the power at the zero frequency of the estimated tMTF, and $\hat{P}_{\max}$ is the maximum power of the estimated tMTF. Positive values of Δl indicate that the estimated RF responds more weakly to low modulation frequencies compared to the input RF, whereas negative values indicate that the estimated RF is more responsive to low frequencies.
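A tMTF calculation along the lines described above can be sketched with numpy as follows. The FFT length and the function names are hypothetical. The attenuation measure in `lowpass_attenuation` is an assumed stand-in, defined as the difference in log10 peak-to-zero-frequency power ratios between the estimated and input tMTFs; it has the sign behaviour described in the text but is not necessarily the published formula.

```python
import numpy as np

def tmtf(rf, dt=0.001, n_fft=256):
    """Temporal modulation transfer function of a (channels x time) RF.

    Applies a Hanning window along time, zero-pads the time axis to n_fft,
    takes a 2-D Fourier transform, and sums power over the spectral axis.
    Returns (frequencies in Hz, power) for non-negative temporal frequencies.
    """
    n_chan, n_time = rf.shape
    windowed = rf * np.hanning(n_time)
    spectrum = np.fft.fft2(windowed, s=(n_chan, n_fft), axes=(0, 1))
    power = (np.abs(spectrum) ** 2).sum(axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=dt)
    return freqs, power[: freqs.size]

def lowpass_attenuation(rf_input, rf_estimated):
    """Assumed measure: log10(Pmax/P0) of the estimate minus that of the input."""
    _, p_in = tmtf(rf_input)
    _, p_est = tmtf(rf_estimated)
    return np.log10(p_est.max() / p_est[0]) - np.log10(p_in.max() / p_in[0])
```

Because the RF is real-valued, power summed over all spectral modulation frequencies is symmetric in temporal frequency, so keeping only the non-negative half discards no information. An RF with exactly zero power at zero frequency would need special handling in this sketch.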
Linear Mixed Effects Models
Given the nested, repeated-measures nature of the experimental design (each input RF was used with tonic and phasic dynamical models), we used a random-intercepts LMM with input RF as a random effect. All LMMs were estimated using the lme4 (version 1.1.21) R package, which does not return p-values for parameter estimates due to unreliability issues (Bates et al., 2015). To determine statistical significance, we therefore took a model-comparison approach where nested LMMs of increasing complexity were compared against each other. Three candidate models were fit: random effects (variance components) only, random effects and main fixed effects, and random effects with main effects and interactions. Restricted maximum likelihood (REML) parameter estimation gives unbiased LMM estimates; however, LMMs fit by REML cannot be compared as nested models (Pinheiro and Bates, 2006), so we used maximum likelihood estimation (MLE) to fit the candidate LMMs. Candidate LMMs were compared across three fit statistics: AIC, BIC, and a chi-squared (likelihood-ratio) test. Lower values of AIC and BIC indicate better relative fit. The null hypothesis of the chi-squared test is that the more complicated model is not a better fit to the data than the less complicated model.
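The three fit statistics can be computed from the maximized log-likelihoods of two nested models, as in the sketch below. This illustrates the arithmetic only and is not the lme4 workflow used in the study; the function name `compare_nested` and the example log-likelihood values are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def compare_nested(loglik_simple, k_simple, loglik_complex, k_complex, n_obs):
    """AIC, BIC, and likelihood-ratio test for two nested ML-fitted models.

    k_simple, k_complex : number of estimated parameters in each model
    n_obs               : number of observations
    """
    def aic(loglik, k):
        return 2 * k - 2 * loglik

    def bic(loglik, k):
        return k * np.log(n_obs) - 2 * loglik

    stat = 2 * (loglik_complex - loglik_simple)   # likelihood-ratio statistic
    df = k_complex - k_simple                     # number of added parameters
    return {
        "aic": (aic(loglik_simple, k_simple), aic(loglik_complex, k_complex)),
        "bic": (bic(loglik_simple, k_simple), bic(loglik_complex, k_complex)),
        "chisq": stat,
        "df": df,
        "p_value": chi2.sf(stat, df),
    }

# Made-up log-likelihoods: variance-components model (3 parameters) versus
# main-effects model (5 parameters), with 116 observations (58 RFs x 2 models).
result = compare_nested(-100.0, 3, -95.0, 5, n_obs=116)
```

A small p-value rejects the null hypothesis that the more complicated model fits no better than the simpler one.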
The variance-components model was given by the equation:
$$y_{ij} = b_0 + u_j + e_{ij}$$
where yij is the observed value of the dependent variable for the ith type of neuron model (tonic or phasic) and jth input RF type (WB or BP), b0 is a fixed intercept, uj is the value of the random intercept of the jth RF type, and eij is the error term for the LMM. Both uj and eij are assumed to be normally distributed with a mean of zero and a constant variance of σu and σe, respectively. This LMM essentially tests if the differences we see in the dependent variable are solely due to the random effects of each input RF rather than neuron model or RF type. For all LMM analyses, tonic neuron models and BP RFs were coded as 1, and phasic neuron models and WB RFs were coded as 0.
The main-effects model was given by the equation:
$$y_{ij} = b_0 + b_1 M_i + b_2 R_j + u_j + e_{ij}$$
where yij, b0, uj, and eij are defined identically as above, b1 is the fixed effect of Mi, the ith neuron model type, and b2 is the fixed effect for Rj, the jth input RF type.
The interactions model is identical to the main-effects model, with the addition of a fixed effect b3 of the multiplicative interaction between neuron model and RF type, with the equation given by:
$$y_{ij} = b_0 + b_1 M_i + b_2 R_j + b_3 M_i R_j + u_j + e_{ij}$$
If the interactions model was found to be the best fit to the data, simple-effects models were estimated using REML since these LMMs were not compared to any other candidate models. Simple-effects models were calculated by subsetting the data by RF type and estimating an LMM with RF as a random intercept and neuron model type as a fixed effect. For each RF type, the LMM equation is given by:
$$y_{ij} = b_0 + b_1 M_i + u_j + e_{ij}$$
Acknowledgments
We thank Margot Bjoring for assistance in model development and critical feedback, Laura Jamison for suggestions in statistical analysis design, and Jacy Zanussi for thoughtful discussion and critical feedback. This work was supported in part by funding from the National Science Foundation (IOS-1942480), the Thomas F. and Kate Miller Jeffress Memorial Trust, and The Hartwell Foundation.