Chance, long tails, and inference: a non-Gaussian, Bayesian theory of vocal learning in songbirds

Baohua Zhou; David Hofmann; Itai Pinkoviezky; Samuel J. Sober; Ilya Nemenman

doi:10.1101/167460

Abstract

Traditional theories of sensorimotor learning posit that animals use sensory error signals to find the optimal motor command in the face of Gaussian sensory and motor noise. However, most such theories cannot explain common behavioral observations, for example that smaller sensory errors are more readily corrected than larger errors and that large abrupt (but not gradually introduced) errors lead to weak learning. Here we propose a new theory of sensorimotor learning that explains these observations. The theory posits that the animal learns an entire probability distribution of motor commands rather than trying to arrive at a single optimal command, and that learning arises via Bayesian inference when new sensory information becomes available. We test this theory using data from a songbird, the Bengalese finch, that is adapting the pitch (fundamental frequency) of its song following perturbations of auditory feedback using miniature headphones. We observe the distribution of the sung pitches to have long, non-Gaussian tails, which, within our theory, explains the observed dynamics of learning. Further, the theory makes surprising predictions about the dynamics of the shape of the pitch distribution, which we confirm experimentally.

Introduction

Learned behaviors — reaching for an object, walking, talking, and hundreds of others — allow the organism to interact with the ever-changing surrounding world. To learn and execute skilled behaviors, it is vital for such behaviors to fluctuate from iteration to iteration. Such variability is not limited to inevitable biological noise (1, 2), but rather a significant part of it is controlled by animals themselves and is used for exploration during learning (3, 4). Furthermore, learned behaviors rely heavily on sensory feedback. The feedback is needed, first, to guide the initial acquisition of the behaviors, and then to maintain the needed motor output in the face of changes in the motor periphery and fluctuations in the environment. Within such sensorimotor feedback loops, the brain computes how to use the inherently noisy sensory signals to change patterns of activation of inherently noisy muscles to produce the desired behavior. This transformation from sensory feedback to motor output is both robust and flexible, as demonstrated in many species in which systematic perturbations of the feedback dramatically reshape behaviors (1, 5–8).

Since many complex behaviors are characterized by both tightly controlled motor variability and by robust sensorimotor learning, we propose that, during learning, the brain controls the distribution of behaviors. In contrast, most prior theories of animal learning have assumed that there is a single optimal motor command that the animal tries to produce, and that, after learning, deviations from the optimal behavior result from the unavoidable (Gaussian) downstream motor noise. Such prior models include the classic Rescorla-Wagner (RW) model (9), as well as more modern approaches belonging to the family of reinforcement learning (10–12), Kalman filters (13, 14), or dynamical Bayesian filter models (15, 16). Such theories have addressed many important experimental questions, such as evaluating the optimality of the learning process (13, 17–20), accounting for multiple temporal scales in learning (7, 13, 21, 22), identifying the complexity of behaviors that can be learned (23), and pointing out how the necessary computations could be performed using networks of spiking neurons (12, 24–28).

However, despite these successes, most prior models that assume that the brain aims to achieve a single optimal output have been unable to explain some commonly observed experimental results. For example, since such theories assume that errors between the target and the realized behavior drive changes in future motor commands, they typically predict large behavioral changes in response to large errors. In contrast, experiments in multiple species report a decrease in both the speed and the magnitude of learning with an increase in the experienced sensory error (6, 19, 29, 30). One can rescue traditional theories by allowing the animal to reject large errors as “irrelevant”—unlikely to have come from its own actions (19, 31). However, such rejection models have not yet explained why the same animals that cannot compensate for large errors can correct for even larger ones, as long as their magnitude grows gradually with time (6, 30).

Here we present a theory (Fig. 1) of a classic model system for sensorimotor learning, vocal adaptation in a songbird, in which the brain controls a probability distribution of motor commands and updates this distribution by a recursive Bayesian inference procedure. Since the distribution of the song pitch is empirically heavy tailed (Fig. 2C), our model does not make the customary Gaussian assumptions. The focus on learning and controlling (non-Gaussian) distributions of behavior allows us to capture successfully all of the above-described nonlinearities in learning dynamics and, furthermore, to account for the previously unnoticed learning-dependent changes in the shape of the distribution of the behavior.

Fig. 1.

The dynamical Bayesian model (Bayesian filter). A) A Bayesian filter consists of the recursive application of two general steps (32): i) an observation update, which corresponds to novel sensory input and updates the underlying probability distribution of plausible motor commands using Bayes’ formula; ii) a time evolution update, which denotes the temporal propagation and corresponds to uncertainty increasing with time (see main text); here the probability distribution is updated by convolution with a temporal kernel. These two steps are repeated for each new piece of sensory data in a recursive loop. B) Example distributions for the entire procedure in two scenarios: Gaussian (top) and heavy-tailed (bottom) distributions. The x-axis, ϕ_t, represents the motor command which results in a specific pitch sung by the bird. The outcome of this motor command is then measured by two different sensory modalities, represented by s_t^(1) and s_t^(2), with corresponding likelihood functions L₁(ϕ_t; Δ) and L₂(ϕ_t; 0), respectively. The Δ shift for modality 1 is induced by the experimentalist. Dashed brown lines represent the individual likelihood functions from the individual modalities, and the solid lines represent their product, which signals how likely it is that the correct motor command corresponds to ϕ_t. Heavy-tailed distributions can produce a bimodal likelihood, which, multiplied by the prior, suppresses large-error signals. In contrast, Gaussian likelihoods are unimodal and result in greater compensatory changes in behavior.

Fig. 2.

Experimental data and model fitting. The same six parameters of the model are used to simultaneously fit all data. (A) The dots with error bars are four groups of experimental data, with different colors and symbols indicating different shift sizes (red circle, 0.5 semitone shift; blue square, 1 semitone shift; green diamond, 1.5 semitone shift; cyan upper triangle, 3 semitone shift). The error bars indicate the standard error of the group mean, accounting for variances across individual birds and within one bird (see Materials and Methods). For each group, the data are combined from three to eight different birds, and the sign of the experimental perturbation (lowering or raising pitch) is always defined so that adaptive (i.e. error-correcting) vocal changes are positive. Data points without error bars come from a single bird; they are not used for the fitting and are denoted by hollow symbols. The mean pitch sung on day 0 by each bird is defined as the zero semitone compensation (ϕ = 0). The solid lines with one standard deviation bands (see Materials and Methods) are results of the model fits, with the same color convention as in experimental data. (B) The dots with error bars show the data from the staircase-shift experiment, with the same plotting conventions as in (A). The data are combined from three birds. During the experiment, every six days, the shift size is increased by 0.35 semitone, as shown by the dotted horizontal line segments. On the last day of the experiment, the experienced pitch shift is 2.8 semitones. The magenta solid line with one standard deviation band is the model fit. (C) Dots represent the distribution of pitch on day 0, before the pitch shift perturbation (the baseline distribution), where the data are from 23 different experiments (all pitch shifts combined). The gray parabola is a Gaussian fit to the data within the ±1 semitone range. The empirical distribution has long, non-exponential tails. The brown solid line with one standard deviation band is the model fit.

Results

Biological model system

Vocal control in songbirds is a powerful model system for examining sensorimotor learning of complex tasks (33). The phenomenology we are trying to explain arises from experimental approaches to inducing song plasticity (30). Songbirds sing spontaneously and prolifically and use auditory feedback to shape their songs towards a “template” learned from an adult bird tutor during development. When sensory feedback is perturbed (see below) using headphones to shift the pitch (fundamental frequency) of auditory feedback (30), birds compensate by changing the pitch of their songs so that the pitch they hear is closer to the unperturbed one. As shown in Fig. 2A, the speed of the compensation and its maximum value, measured as a fraction of the pitch shift, decrease with the increasing shift magnitude, so that a shift of 3 semitones results in near-zero fractional compensation. Crucially, the small compensation for large perturbation does not reflect the limited plasticity of the adult brain since imposing the perturbation gradually, rather than instantaneously, results in a large compensation (Fig. 2B).

Data

We use experimental data collected in our previous work (8, 30) to develop our mathematical model of learning. As detailed in Ref. (34), we used a virtual auditory feedback system (8, 35) to evoke sensorimotor learning in adult songbirds. For this, miniature headphones were custom-fitted to each bird and used to provide online auditory feedback in which the pitch (fundamental frequency) of the bird’s vocalizations could be manipulated in real time, with a loop delay of less than 10 ms. In addition to providing pitch-shifted feedback, the headphones blocked the airborne transmission of the bird’s song from reaching the ear canals, thereby effectively replacing the bird’s natural airborne auditory feedback with the manipulated version. Pitch shifts were introduced after a baseline period of at least 3 days in which birds sang while wearing headphones but without pitch shifts. All pitch shifts were implemented relative to the bird’s current vocal pitch and were therefore “correctable” in the sense that if the bird changed its vocal pitch to fully compensate for the imposed pitch shift, the pitch of auditory feedback heard through the headphones would be equal to its baseline value. All data were collected during undirected singing (i.e. no female bird was present).

Mathematical model

To describe the data, we introduce a dynamical Bayesian filter model, Fig. 1A. We focus on just one variable learned by the animal during repeated singing — the pitch of the song syllables. Even though the animal learns the motor command and not the pitch directly, we do not distinguish between the produced pitch ϕ and the motor command leading to it because the latter is not known in behavioral experiments. We set the mean “baseline” pitch sung by the animal as ϕ = 0, representing the “template” of song memorized during development, and nonzero values of ϕ denote deviations of the sung pitch from the target.

In our model, the state of the motor learning system at each time step is a probability distribution over motor behaviors, rather than a single motor command or mean behavior that is corrupted by downstream noise as in many other models. Thus at time t, the animal has access to the prior distribution over motor commands, p_prior(ϕ_t). We assume that the bird then randomly selects and produces the pitch from this distribution. In other words, in a departure from standard accounts, we suggest that the experimentally observed variability of sung pitches is dominated by the deliberate exploration of plausible motor commands, rather than by noise in the motor system. This is supported by the experimental finding that the variance of pitch during singing directed at a female (performance) is significantly smaller than the variance during undirected singing (practice) (4, 36).

After producing a vocalization, the bird then senses the pitch of the produced song syllable through various sensory pathways. Besides the normal airborne auditory feedback reaching the ears, which we can pitch-shift, information about the sung pitch may be available through other, unmanipulated pathways. For example, efference copy may form an internal short term memory of the produced specific motor command (37). Additionally, proprioceptive sensing presumably also provides unshifted information (38). Finally, unshifted acoustic vibrations might be transmitted through body tissue in addition to the air, as is thought to be the case in studies that use pitch shifts to perturb human vocal production (39, 40).

We denote all feedback signals as {s_t^(i)}, where the index i denotes different sensory modalities. Because sensing is noisy, feedback is not absolutely accurate. Thus it is interpreted using Bayes’ formula, so that the posterior probability of which motor commands lead to the target given the observed signals is p_post(ϕ_t) ∝ p_likelihood({s_t^(i)}|ϕ_t) p_prior(ϕ_t), where p_likelihood represents the distribution of motor commands that would fully “correct” the errors given the observed sensory evidence. Finally, the animal expects that the target motor command may change with time because of slow random changes in the motor plant. In other words, the animal must increase its uncertainty about the target with time in the absence of new sensory information. Such increase in uncertainty is given by p_prop(ϕ_t+δt|ϕ_t), the propagator of statistical field theories (41). Overall, this results in the distribution of motor outputs after one cycle of the model

p_prior(ϕ_t+δt) = (1/Z) ∫ p_prop(ϕ_t+δt|ϕ_t) p_likelihood({s_t^(i)}|ϕ_t) p_prior(ϕ_t) dϕ_t,   (1)

where Z is the normalization constant.

We choose δt to be one day in our implementation of the model and lump all vocalizations (which we record) and all sensory feedback (which are unknown) in one time period together. That is, we look at timescales of changes across days, rather than faster fluctuations on timescales of minutes or hours. This matches the temporal dynamics of the learning curves (Fig. 2A, B). Since the bird sings hundreds of song bouts daily, we now use the law of large numbers and replace the unknown sensory feedback for individual vocalizations by its expectation value ⟨s_t^(i)⟩. For simplicity, we focus on just two sensory modalities, the first affected by the headphones, and the second one not, and we remain agnostic about the exact nature of this second modality among the possibilities noted above. Thus the expectation values of the feedbacks are the shifted and the unshifted versions of the expected value of the sung pitch, ⟨s_t^(1)⟩ = ⟨ϕ_t⟩ − Δ and ⟨s_t^(2)⟩ = ⟨ϕ_t⟩. We refer to the conditional probability distribution for each sensory modality as the likelihood functions L_i(ϕ_t) for a certain motor command being the target given the observed sensory feedback. Thus assuming that both sensory inputs are independent measurements of the motor output, we rewrite Eq. (1) as

p_prior(ϕ_t+δt) = (1/Z) ∫ p_prop(ϕ_t+δt|ϕ_t) L₁(ϕ_t; Δ) L₂(ϕ_t; 0) p_prior(ϕ_t) dϕ_t,   (2)

where 0 and Δ represent the centers of the likelihoods.
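The daily update cycle described above can be sketched numerically on a discretized pitch axis. The Python sketch below is an illustration under stated assumptions, not the fitting code behind the figures: Cauchy densities (the α = 1 stable law) stand in for both likelihoods and for the propagator, and the scale values, the grid resolution, the burn-in length, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

# Pitch grid in semitones (resolution is an illustrative choice).
PHI = np.linspace(-8.0, 8.0, 801)
DPHI = PHI[1] - PHI[0]

def cauchy(x, center, scale):
    """Cauchy density, used here as a heavy-tailed stand-in (assumption)."""
    return scale / (np.pi * (scale**2 + (x - center) ** 2))

def filter_step(prior, shift, kernel, scale_l1=0.3, scale_l2=0.3):
    """One cycle: observation update (Bayes), then time propagation."""
    # Observation update: prior times likelihoods centered at `shift` and at 0.
    post = prior * cauchy(PHI, shift, scale_l1) * cauchy(PHI, 0.0, scale_l2)
    post /= post.sum() * DPHI
    # Time evolution update: integrate the propagator against the posterior.
    new_prior = kernel @ post * DPHI
    # Renormalize, since the grid truncates the propagator tails.
    return new_prior / (new_prior.sum() * DPHI)

def mean_compensation(shift, days=14, scale_prop=0.1, burn_in=200):
    """Daily mean pitch after a constant shift is switched on at day 0."""
    # kernel[j, i] approximates p_prop(phi_j | phi_i).
    kernel = cauchy(PHI[:, None], PHI[None, :], scale_prop)
    # Baseline prior: iterate with no perturbation until it settles.
    prior = np.full(PHI.size, 1.0 / (PHI[-1] - PHI[0]))
    for _ in range(burn_in):
        prior = filter_step(prior, 0.0, kernel)
    means = []
    for _ in range(days):
        prior = filter_step(prior, shift, kernel)
        means.append(float(np.sum(PHI * prior) * DPHI))
    return np.array(means), prior

small, p_small = mean_compensation(0.5)
large, p_large = mean_compensation(3.0)
print("fractional compensation, 0.5 semitone shift:", small[-1] / 0.5)
print("fractional compensation, 3.0 semitone shift:", large[-1] / 3.0)
```

With these illustrative scales, the fractional compensation in the sketch comes out smaller for the 3 semitone shift than for the 0.5 semitone shift, which is the qualitative effect the model is meant to capture. The numbers themselves carry no quantitative meaning.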

As illustrated in Fig. 1B, such Bayesian filtering behaves differently for Gaussian and heavy-tailed likelihoods and propagators. Indeed, if the two likelihoods are Gaussians, their product is also a Gaussian centered between them. In this case, the learning speed of an animal is linear in the error Δ, no matter how large this error is, which conflicts with the experimental results in songbirds and other species (5, 8, 19, 33). Similarly, if the two likelihoods have long tails, then when the error is small, their product is also a single-peaked distribution as in the Gaussian case. However, when the error size Δ is large, the product of such long-tailed likelihoods is bimodal, with evidence peaks at the shifted and the unshifted values, with a valley in the middle. Since the prior expectations of the animal are developed before the sensory perturbation is turned on, they peak near the unshifted value. Multiplying the prior by the likelihood then leads to suppression of the shifted peak and hence of large error signals in animal learning.
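The contrast between Gaussian and heavy-tailed likelihood products can be checked directly by counting local maxima of the product on a grid. The sketch below is a minimal illustration: the Cauchy shape is only one example of a long-tailed likelihood, and the common width of 0.3 semitones and the helper names are assumptions introduced for this example.

```python
import numpy as np

def gaussian(x, center, width):
    """Unnormalized Gaussian likelihood."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def cauchy(x, center, width):
    """Unnormalized Cauchy likelihood (heavy tails)."""
    return 1.0 / (1.0 + ((x - center) / width) ** 2)

def count_peaks(y):
    """Count interior local maxima of a sampled curve."""
    inner = y[1:-1]
    return int(np.sum((inner > y[:-2]) & (inner >= y[2:])))

X = np.linspace(-5.0, 8.0, 1301)  # semitones, step 0.01
WIDTH = 0.3                       # hypothetical likelihood width

def peaks_of_product(shape, shift):
    """Number of peaks of L2(x; 0) * L1(x; shift) for a given shape."""
    return count_peaks(shape(X, 0.0, WIDTH) * shape(X, shift, WIDTH))

for name, shape in (("gaussian", gaussian), ("cauchy", cauchy)):
    for shift in (0.5, 3.0):
        print(name, shift, peaks_of_product(shape, shift))
```

For two Cauchy factors of equal width γ the product becomes bimodal once Δ > 2γ, so with the width used here the 3 semitone case has two peaks while the 0.5 semitone case has one. The Gaussian product stays single-peaked, with its maximum midway between the two centers, for any Δ.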

In Eq. (2), there are three distributions to be defined: L₁(ϕ_t; Δ), L₂(ϕ_t;0), and p_prop(ϕ_t+δt|ϕ_t), corresponding to the evidence term from the shifted channel, the evidence term from the unshifted channel, and the time propagation kernel, respectively. The prior at the start of the experiment t = 0, p_prior(ϕ₀), is not an independent degree of freedom: it is the steady state of the recurrent application of Eq. (2) with no perturbation, Δ = 0. We have verified numerically that a wide variety of shapes of L₁, L₂ and p_prop result in learning dynamics that can approximate the experimental data (see Materials and Methods). To constrain the selection of specific functional forms of the distributions, we point out that the error in sensory feedback obtained by the animal is a combination of many noisy processes, including both sensing itself and the neural computation that extracts the pitch from the auditory input and then compares it to the target pitch. By the well-known generalized central limit theorem, the sum of these processes is expected to converge to what are known as Lévy alpha-stable distributions, often simply called stable distributions (42) (see Materials and Methods). If the individual noise sources have finite variances, the stable distribution will be a Gaussian. However, if the individual sources have heavy tails and infinite variances, then their stable distribution will be heavy-tailed as well (the Cauchy distribution is one example). Most stable distributions cannot be expressed in a closed form, but they can be evaluated numerically (see Materials and Methods). Symmetric stable distributions, which we assume here, are characterized by three parameters: the stability parameter α (measuring the proportion of the probability in the tails), the scale or width parameter γ, and the location or the center parameter μ (the latter can be predetermined to be 0, Δ, or the previous time step value in our case). For the three distributions L₁(ϕ_t; Δ), L₂(ϕ_t;0), and p_prop, this results in a total of six unknown parameters.

Fits to data

We fit the set of six parameters of our model simultaneously to all the data shown in Fig. 2. Our dataset consists of twenty-three individual experiments across five experimental conditions: four constant pitch shift learning curves and one gradual, staircase shift learning curve (see Materials and Methods for details). As mentioned previously, birds learn best (larger and faster compensation) for smaller perturbations, here 0.5 semitone, panel (A). In contrast, for a large 3 semitone perturbation, the birds do not compensate at all within the 14 days of the experiment. However, the birds are able to learn and compensate for large perturbations when the perturbation increases gradually, as in the staircase experiment in panel (B). Importantly, the baseline distribution (panel C) has a robust non-Gaussian tail, supporting our model. We note that our six-parameter model fits are able to simultaneously describe all of these data with a surprising precision, including their most salient features: dependence of the speed and the magnitude of the compensation on the perturbation size for the constant and the staircase experiments, as well as the heavy tails in the baseline distribution.

Predictions

Mathematical models are useful to the extent that they can predict experimental results not used to fit them. Quantitative predictions of qualitatively new results are particularly important for arguing that the model captures the system’s behavior. To test the predictive power of our model, we used it to predict the dynamics of higher-order statistics of pitches during learning, rather than using it to simply predict the mean behavior. We first use the model to predict time-dependent measures of the variability (standard deviation in this case) of the pitch. As shown in Figure 3A-E, our model correctly predicted time-dependent increases in the standard deviation in both single-shift (Figure 3A-D) and staircase-shift experiments (Figure 3E) with surprising accuracy. We stress again that no new parameter fits were done for these curves. Potentially even more interestingly, Fig. 3F shows that our model is capable of predicting unexpected features of the probability distribution of pitches, such as the asymmetric and bimodal structure of the pitch distribution at the end of the staircase-shift experiment. This bimodal structure is predicted by our theory, since the theory posits that the (bimodal) likelihood distribution (Fig. 1B, bottom) will iteratively propagate into the observable pitch distribution (the prior). The existence of the bimodal pitch distribution in the data therefore provides strong evidence in support of our theory. Importantly, this phenomenon can never be reproduced by models based on animals learning a single motor command with Gaussian noise around it, rather than a heavy-tailed distribution of motor commands.

Fig. 3.

Predictions of our model using the parameter values obtained from fitting the data shown in Fig. 2. The dots with error bars (A-E) and the histogram (F) represent experimental data with color, symbol, error bar, and other plotting conventions as in Fig. 2. The dotted lines with one standard deviation bands represent model fits. (A-E) Our model predicts that the standard deviation of the pitch distributions increases gradually in pitch-shift experiments (corresponding to 0.5 semitone, 1 semitone, 1.5 semitones, 3 semitones and staircase shift, respectively). Experiments confirm the prediction. (F) Our model predicts that, at the end of the staircase experiment (mean and standard deviation shown in Figs. 2B and 3E, respectively), the pitch distribution should be bimodal, while it is unimodal initially (cf. Fig. 2C), which is also supported by data (note that the data here is from day 47 from the single bird who was exposed to the staircase shift for the longest time).

Discussion

We introduced a novel mathematical framework within the class of observation-evolution models (32) for understanding sensorimotor learning: a dynamical Bayesian filter with non-Gaussian (heavy-tailed) distributions. Our model describes the dynamics of the whole probability distribution of the motor commands, rather than just its mean value. We posit that this distribution controls the animal’s deliberate exploration of plausible motor commands. The model reproduces the learning curves observed in a range of songbird vocal adaptation experiments, which classical behavioral theories have not been able to do to date. Further, also unlike the previous models, our approach predicts learning-dependent changes in the width and shape of the distribution of the produced behaviors.

To further increase the confidence in our model, we show analytically (see Materials and Methods) that traditional linear models with Gaussian statistics (13) cannot explain the different levels of compensation for different perturbation sizes. While we cannot exclude that birds would continue adapting if exposed to perturbations for longer time periods and would ultimately saturate at the same level of adaptation magnitude, the Gaussian models are also argued against by the shape of the pitch distribution, which shows heavy tails (Fig. 2C and 3F) and by our ability to predict not just the mean pitch, but the whole pitch distribution dynamics during learning.

An important aspect of our dynamical model is its ability to reproduce multiple different time scales of adaptation (Fig. 2A, B) using a nonlinear dynamical equation with just a single time scale given by the width of the propagation kernel. As with other key aspects of the model, this phenomenon results from the non-Gaussianity of the distributions employed, and is in contrast to other multiscale models that require explicit incorporation of many time scales (7, 13). While multiple time scales could be needed to account for other features of the adaptation, our model clearly avoids this for the present data. In the future, we hope that an extension of our model to include multiple explicit time scales will account for individual differences across animals, for the dynamics of acquisition of the song during development, and for the slight shift of the peak of the empirical distribution in Fig. 3F from ϕ = 0.

Previous analyses of the speed and magnitude of learning in the Bengalese finch have noted that both depend on the overlap of the distribution of the natural variability at the baseline and at the shifted means (22, 34): small overlaps result in slower and smaller learning, so that different overlaps lead to different time scales. However, these prior studies have not provided a mechanistic, learning-theoretic explanation of why or how such overlap might determine the dynamics of learning. Our dynamical inference model provides such a mechanism.

We have chosen the family of so-called Lévy alpha-stable distributions to provide the central ingredient of our model: the heavy tails of the involved probability distributions. In general, a symmetric alpha-stable distribution has a relatively narrow peak in the center and two long fat tails, and this might provide some valuable qualitative insights into how the nervous system processes sensory inputs. For example, a narrow peak in the middle of the likelihood function suggests that the brain puts a high belief in the sensory feedback. However, the heavy tails say that it also puts a certain (nearly constant) weight on the probability of very large errors outside of the narrow central region. We have verified that the actual choice of the stable distributions is not crucial for our modeling. For example, one could instead take each likelihood as a power law distribution, or as a sum of two Gaussians with equal means, but different variances. The latter might correspond to a mixture of high (narrow Gaussian) and low (wide Gaussian) levels of certainty about sensory feedback, potentially arising from variations in environmental or sensory noise or from variations in attention. As shown in Materials and Methods, different choices of the underlying distributions result in essentially the same fits and predictions. This suggests that the heavy tails themselves, rather than their detailed shape, are crucial for the model.

While we used Bengalese finches as the subject of this study, nothing in the model relies on the specifics of the songbird system. Sensorimotor learning in many animals should avail itself of modeling by our approach, and we predict that any animal with a heavy-tailed distribution of motor outputs should exhibit similar phenomenology in its sensorimotor learning. Exploring whether the model allows for such cross-species generalizations is an important topic for future research, as are questions of how networks of neurons might implement such computations (43–46).

Materials and Methods

Experiments

The data used are taken from the experiments in Ref. (30) and are described in detail there. Briefly, subjects were nine male adult Bengalese finches (females do not produce song) aged over 190 days. Lightweight headphones and microphones were used to shift the perceived pitches of birds’ own songs by different amounts, and the pitch of the produced song was recorded. For each day, only data from 10 am to 12 pm are used. The same birds were used in multiple (but not all) pitch shift experiments separated by at least 32 days. Changes in vocal pitch were measured in semitones, which is a relative unit of the fundamental frequency (pitch) of each song syllable: ϕ = 12 log₂(f/f_b), where f is the fundamental frequency of the syllable and f_b is its mean baseline value.
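The semitone unit follows the standard musical definition, in which a doubling of frequency corresponds to 12 semitones. A minimal conversion sketch follows. The function names and the example frequencies are hypothetical, and the baseline frequency is assumed to be the bird's mean pre-shift value for the syllable.

```python
import math

def to_semitones(freq_hz, baseline_hz):
    """Pitch change in semitones relative to a baseline frequency."""
    return 12.0 * math.log2(freq_hz / baseline_hz)

def from_semitones(shift_semitones, baseline_hz):
    """Frequency that lies `shift_semitones` away from the baseline."""
    return baseline_hz * 2.0 ** (shift_semitones / 12.0)

# Example with an assumed 2400 Hz baseline: a 1 semitone upward change.
print(from_semitones(1.0, 2400.0))
print(to_semitones(2542.7, 2400.0))
```

Because the unit is logarithmic, a shift of a given size in semitones corresponds to the same frequency ratio for every syllable, whatever its absolute pitch.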

Stable distributions

A probability distribution is said to be stable if a linear combination of two variables distributed according to the distribution has the same distribution up to location and scale (42). By the generalized central limit theorem, the probability distribution of sums of a large number of i.i.d. random variables with infinite variances tends to a stable distribution (42). A general stable distribution does not have a closed form expression, except for three special cases: Lévy, Cauchy and Gaussian. A symmetric stable variable x can be written in the form x = γy + μ, where y is called the standardized symmetric stable variable and follows the distribution (42)

p(y; α) = (1/π) ∫₀^∞ exp(−t^α) cos(yt) dt.

Thus any symmetric stable distribution is characterized by three parameters: the type, or the tail weight, parameter α; the scale parameter γ; and the center μ. α takes the range (0,2] (42). If α = 2, the corresponding distribution is the Gaussian, and if α = 1, it is the Cauchy distribution. γ can be any positive real number, and μ can be any real number. The above integral is difficult to compute numerically. However, due to the common occurrence of stable distributions in various fields, such as finance (47), communication systems (48), and brain imaging (49), there are many algorithms to compute it approximately. We used the method of Ref. (50). In this method, the central and tail parts of the distribution are calculated using different algorithms: the central part is approximated by 96-point Laguerre quadrature and the tail part is approximated by Bergstrom expansion (51).
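For intuition, the two closed-form special cases can be recovered by brute-force quadrature of the standard integral representation of the symmetric stable density, p(y; α) = (1/π) ∫₀^∞ exp(−t^α) cos(yt) dt. The sketch below is not the quadrature-plus-expansion method of Ref. (50): it is a plain trapezoid rule with an assumed cutoff `t_max` and grid size `n`, adequate only for α near 1 or 2 and moderate |y|.

```python
import numpy as np

def symmetric_stable_pdf(y, alpha, t_max=60.0, n=120001):
    """Trapezoid-rule estimate of the standardized symmetric stable density."""
    t = np.linspace(0.0, t_max, n)
    dt = t[1] - t[0]
    y = np.atleast_1d(np.asarray(y, dtype=float))
    # Integrand exp(-t^alpha) * cos(y t), one row per requested y value.
    integrand = np.exp(-t**alpha) * np.cos(np.outer(y, t))
    integral = (integrand[:, :-1] + integrand[:, 1:]).sum(axis=1) * 0.5 * dt
    return integral / np.pi

ys = np.array([-2.0, 0.0, 0.5, 3.0])
# alpha = 1 should match the Cauchy density 1 / (pi * (1 + y^2)).
cauchy_exact = 1.0 / (np.pi * (1.0 + ys**2))
# alpha = 2 should match a Gaussian with variance 2 in this parametrization.
gauss_exact = np.exp(-(ys**2) / 4.0) / (2.0 * np.sqrt(np.pi))
print(np.abs(symmetric_stable_pdf(ys, 1.0) - cauchy_exact).max())
print(np.abs(symmetric_stable_pdf(ys, 2.0) - gauss_exact).max())
```

For small α the integrand decays slowly and oscillates rapidly at large |y|, so a fixed-cutoff rule like this one becomes unreliable. This is one reason dedicated algorithms are used in practice.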

Note that even though we take the propagator and the likelihood distributions as stable distributions in our model, their iterative application (effectively, a product of many likelihood distributions) gives finite variance predictions, allowing us to compare predicted variances of the behavior with experimentally measured ones.

Fitting

Our model consists of three truncated stable distributions, one for each of the two likelihood functions representing the feedback modalities and a third for the propagation kernel. We use truncation to ensure biological plausibility: neither extremely large errors nor extremely large pitch changes are physiologically possible. We truncate the distributions to the range [−8,8] semitones, which is much larger than the imposed pitch shifts and slightly larger than the largest observed pitch fluctuations in our data, 7 semitones. This leaves us with 9 parameters of which we need to fit 6 from data, namely the type parameters α and the scale parameters γ, while the center parameters μ are predetermined: the two likelihoods are centered at 0 and Δ respectively, while the propagation kernel is centered around the previous time step value (see Eq. (1)). We construct an objective function that is a sum of terms representing the quality of fit for the three data sets we consider: the χ² for four mean adaptations to the constant shifts (Fig. 2A), the χ² for the mean adaptation for the staircase shift (Fig. 2B), and the log-likelihood of the observed baseline pitch probability distribution (Fig. 2C). To make sure that all three terms contribute on about the same scale to the objective function, we multiply the baseline fit term by 10.

The objective function landscape is not trivial in this case, and there is not a single best set of parameters. Figure 4 illustrates this by showing the quality of fit as a function of each pair of (α, γ), while keeping the other four parameters fixed. There is a large subspace (a plateau or a long nonlinear valley, depending on the projection used) that provides nearly the same fit values. In other words, the effective number of important parameters is less than six. Thus choosing the optimum of the objective function and characterizing the error ellipsoid to get the best-fit parameter values and their uncertainties is not appropriate. Instead, we focus on values and uncertainties of the fits themselves. For this, we sweep through the entire parameter space and, for each set of parameters θ, we calculate the value of the objective function and the corresponding fitted or predicted curve f(t; θ). The mean fits/predictions (lines in our figures) are then averages of f(t; θ) over the sampled parameter sets, with each set weighted according to its objective-function value, and the standard deviations, denoted by shaded regions in our figures, are the correspondingly weighted spreads of f(t; θ) around these means. There are many ways of doing the sweep over the parameters. Here we choose first to find a local minimum (however shallow it is). Then for each parameter, we choose six grid points on each side of the minimum, distributed uniformly in the log space between the local minimum and the extremal parameter values ((0.2,1.9] for each α and [0.01,8] for each γ). The extremal values avoid α = 0, α = 2, and γ = 0, which are singular and dramatically slow down computations. Thus there are a total of 13 grid points for each parameter, and 13⁶ ≈ 4.8 · 10⁶ parameter samples in total.
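The parameter sweep can be sketched as follows. The grid construction follows the description above (six log-spaced points on each side of a local minimum, plus the minimum itself). The location of the minimum is a hypothetical placeholder, and the weighting exp(−objective) used in `weighted_mean_and_std` is an assumption made purely for illustration, since the exact weighting formula is not reproduced here.

```python
import numpy as np

def parameter_grid(center, low, high, n_side=6):
    """Log-spaced grid: n_side points on each side of `center`, plus `center`."""
    below = np.geomspace(center, low, n_side + 1)[1:][::-1]
    above = np.geomspace(center, high, n_side + 1)[1:]
    return np.concatenate([below, [center], above])

def weighted_mean_and_std(curves, objective):
    """Average curves over parameter sets.

    ASSUMPTION: weights proportional to exp(-objective); the weighting
    actually used for the figures may differ.
    """
    weights = np.exp(-(objective - objective.min()))
    weights /= weights.sum()
    mean = weights @ curves
    std = np.sqrt(weights @ (curves - mean) ** 2)
    return mean, std

# Hypothetical local minimum at alpha = 1.0 and gamma = 0.3.
alpha_grid = parameter_grid(1.0, 0.2, 1.9)
gamma_grid = parameter_grid(0.3, 0.01, 8.0)
n_samples = len(alpha_grid) ** 6  # three (alpha, gamma) pairs
print(len(alpha_grid), len(gamma_grid), n_samples)

# Toy usage: two parameter sets with equal objective values.
mean, std = weighted_mean_and_std(
    np.array([[0.0, 0.0], [2.0, 2.0]]), np.array([1.0, 1.0])
)
```

Subtracting the smallest objective value before exponentiating leaves the normalized weights unchanged and avoids numerical underflow when the objective values are large.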

Fig. 4.

Objective function as a function of two parameters (stability and scale) for (A) the first (shifted) likelihood, (B) the second (unshifted) likelihood, and (C) the propagation kernel, while the respective other four parameters are held fixed. The gray shades represent values in logarithmic scale (log₁₀) of the objective function; lighter color represents a better fit.

Choice of the shape of distributions

For figures in the main text, we have chosen stable distributions for L₁, L₂ and p_prop. To investigate effects of this choice, we repeated the fitting and the predictions for different distribution choices. We consider a family of power law distributions, ∝ 1/(1 + (ϕ/γ)^(2α)), and a family of mixtures of Gaussians of different widths, ρ N(0, γ²) + (1 − ρ) N(0, δ²). Distributions in either family produce very similar fits to the stable distribution model. For example, Fig. 5 shows the fits and predictions for the power law distribution model. The detailed shape of the distributions seems less important than the existence of the heavy tails.
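The two alternative families can be written down directly. The sketch below is illustrative only: it assumes the densities are normalized numerically on the same [−8, 8] semitone range used for the stable distributions, and all parameter values are hypothetical. It compares the probability mass beyond 4 semitones with that of a single Gaussian of the same core width.

```python
import numpy as np

def normalize_on_grid(values, grid):
    """Rescale values so that their trapezoidal integral over the grid is 1."""
    dx = grid[1] - grid[0]
    area = dx * (values.sum() - 0.5 * (values[0] + values[-1]))
    return values / area

def power_law_pdf(grid, alpha, gamma):
    """Density proportional to 1 / (1 + |phi/gamma|**(2*alpha))."""
    return normalize_on_grid(
        1.0 / (1.0 + np.abs(grid / gamma) ** (2.0 * alpha)), grid)

def gaussian_mixture_pdf(grid, rho, gamma, delta):
    """Density rho*N(0, gamma^2) + (1 - rho)*N(0, delta^2)."""
    def normal(x, s):
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    mix = rho * normal(grid, gamma) + (1.0 - rho) * normal(grid, delta)
    return normalize_on_grid(mix, grid)

phi = np.linspace(-8.0, 8.0, 1601)
dx = phi[1] - phi[0]

p_power = power_law_pdf(phi, alpha=1.0, gamma=0.5)
p_mix = gaussian_mixture_pdf(phi, rho=0.9, gamma=0.5, delta=3.0)
p_gauss = gaussian_mixture_pdf(phi, rho=1.0, gamma=0.5, delta=3.0)  # one Gaussian

# Approximate probability mass in the tails, |phi| > 4 semitones
tail = np.abs(phi) > 4.0
tail_mass = {
    "power law": float(p_power[tail].sum() * dx),
    "mixture": float(p_mix[tail].sum() * dx),
    "gaussian": float(p_gauss[tail].sum() * dx),
}
```

With these illustrative values, both heavy-tailed families place far more mass in the tails than the single Gaussian does, which is the property the text identifies as the important one.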

Fig. 5.

Fits and predictions, same as equivalent panels in Figs. 2, 3, but with the power law family of heavy tailed distributions instead of the stable distributions family. The shaded areas around the theoretical curves represent confidence intervals for one standard deviation.

Linear dependence on pitch shift in a Kalman filter with multiple time scales

We emphasized that traditional learning models cannot account for the nonlinear dependence of the speed and the magnitude of learning on the error signal. Here we show this for one such common model, originally proposed by Körding et al. (13). This Kalman filter model belongs to the family of Bayes filters, which are dynamical models describing the temporal evolution of the probability distribution of a hidden state variable (which can be a vector or a scalar) and its update using the Bayes formula for integrating information provided by observations, which are conditionally dependent on the current state of the hidden variable. The specific attributes of a Kalman filter within the general class of Bayes filters (32) are the linearity of the temporal evolution of the hidden state (the pitch ϕ for the birds, but referred to as disturbances d in Ref. (13) and from here on), the linear relation between the measurements (observations) and the hidden variable, and the Gaussian form of the measurement noise and the distribution of disturbances.

One can argue that Kalman filter models with multiple time scales may be able to account for the diversity of learning speeds in our pitch shift experiments. We explore this in the context of an experimentally induced constant shift Δ to one disturbance d in the Kalman filter model with multiple time scales from Ref. (13). If there is a constant shift Δ, Eq. (3) in Ref. (13) takes the form

[equation missing in source]

The first step in the Kalman filter dynamics is the prediction:

[Eq. (7): equation missing in source]

where 〈d〉_{s|t} is the mean disturbance vector at time s given measurements up to time t and [expression missing in source], with τ_i being the relaxation timescale of d_i. We assume that the shift occurs when the disturbances have relaxed to the steady state: 〈d〉 = 0. Therefore, we approximate the standard Kalman filter equation describing the observation update of the expectation value of the disturbance after a measurement at time t + 1 as (see (32) for a detailed formal description)

[Eq. (8): equation missing in source]

where R is the covariance matrix of the measurement noise, and Σ is the covariance matrix of the hidden variables. Σ does not depend on the measurement and is thus not affected by the shift Δ. Thus the steady state prediction variance Σ_s is given by a solution to the equation

[Eq. (9): equation missing in source]

where A is the matrix determining the temporal evolution of the mean disturbances, Eq. (7), and Q is the covariance matrix of the intrinsic (temporal evolution) noise.

From Eq. (9) we see that Σ_s is constant if the perturbation occurs when the system was at the steady state. We now wish to find the new steady state given the constant perturbation Δ. Consider, for simplicity, n = 2 disturbances, each with its own temporal scale. The components of the steady state covariance are

[equation missing in source]

and we define

[equation missing in source]

Substituting Eq. (7) in Eq. (8) we get

[equation missing in source]

In the steady state, 〈d〉_{t+1|t+1} = 〈d〉_{t|t} = d_s, we get

[equation missing in source]

Thus we find that the sum of the disturbances is proportional to Δ, with a proportionality constant that does not depend on the size of Δ, even for systems with multiple time scales.

Generalizing the result to n disturbances with different time scales, we get the following equations at steady state:

[equation missing in source]

These equations are solved by

[equation missing in source]

which generalizes the linear dependence of learning on Δ to arbitrary n. Thus this and similar Kalman filter based models cannot explain the experimental results studied here.
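The linearity argument can also be checked numerically. The sketch below is a generic multiple-time-scale Kalman filter written under stated assumptions, not a reproduction of the equations of Ref. (13): each disturbance relaxes as d_i(t+1) = (1 − 1/τ_i) d_i(t) + noise, the observation is the sum of the disturbances plus Gaussian noise, and the constant shift Δ enters as a constant observation. The time scales, the noise covariances Q and R, and all function names are hypothetical.

```python
import numpy as np

def steady_state_gain(A, H, Q, R, n_iter=5000):
    """Iterate the covariance recursion until the Kalman gain settles.
    The recursion never uses the observations, so the gain cannot depend
    on the shift."""
    n = A.shape[0]
    P = Q.copy()
    K = np.zeros((n, H.shape[0]))
    for _ in range(n_iter):
        P_pred = A @ P @ A.T + Q                 # prediction covariance
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        P = (np.eye(n) - K @ H) @ P_pred         # posterior covariance
    return K

def steady_state_estimate(A, H, K, delta, n_steps=10000):
    """Run the mean update with a constant observation equal to delta."""
    m = np.zeros(A.shape[0])
    y = np.array([delta])
    for _ in range(n_steps):
        m_pred = A @ m                           # prediction step
        m = m_pred + K @ (y - H @ m_pred)        # observation update
    return m

taus = np.array([5.0, 50.0, 500.0])              # hypothetical time scales
A = np.diag(1.0 - 1.0 / taus)                    # relaxation dynamics
H = np.ones((1, taus.size))                      # observe the summed disturbances
Q = np.diag(1.0 / taus)                          # hypothetical intrinsic noise
R = np.array([[1.0]])                            # hypothetical measurement noise

K = steady_state_gain(A, H, Q, R)
shifts = [0.5, 1.0, 3.0]
# Steady-state summed disturbance estimate divided by the shift size
ratios = [float((H @ steady_state_estimate(A, H, K, d))[0] / d) for d in shifts]
```

Because both the gain and the mean update are linear, the entries of `ratios` coincide for all shift sizes: in this sketch the fraction of the shift that is compensated does not depend on Δ, in contrast to the shift-size dependence observed in the birds.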

ACKNOWLEDGEMENTS

This work was partially supported by NIH BRAIN Initiative Theory Grant 1R01-EB022872, James S. McDonnell Foundation Grant 220020321, NIH Grant NS084844, and NSF Grant 1456912. We are grateful to the NVIDIA corporation for supporting our research with donated Tesla K40 GPUs.

References

1. R Shadmehr, M A Smith, and J W Krakauer. Error Correction, Sensory Prediction, and Adaptation in Motor Control. Annual Rev Neurosci, 33(1):89–108, 2010. doi: 10.1146/annurev-neuro-060909-153135.
2. M D McDonnell and L M Ward. The benefits of noise in neural systems: Bridging theory and experiment. Nature Rev Neurosci, 12(7):415–426, 2011. ISSN 1471-003X, 1471-0048. doi: 10.1038/nrn3061.
3. Allen Neuringer. Operant variability: Evidence, functions, and theory. Psychonomic Bulletin & Review, 9(4):672–705, December 2002. ISSN 1069-9384, 1531-5320. doi: 10.3758/BF03196324.
4. M H Kao, A J Doupe, and M S Brainard. Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature, 433(7026):638–643, 2005. ISSN 0028-0836. doi: 10.1038/nature03127.
5. B A Linkenhoker and E I Knudsen. Incremental training increases the plasticity of the auditory space map in adult barn owls. Nature, 419(6904):293–296, 2002.
6. E I Knudsen. Instructed learning in the auditory localization pathway of the barn owl. Nature, 417(6886):322–328, 2002. ISSN 0028-0836. doi: 10.1038/417322a.
7. M A Smith, A Ghazizadeh, and R Shadmehr. Interacting Adaptive Processes with Different Timescales Underlie Short-Term Motor Learning. PLOS Biol, 4(6):e179, 2006. ISSN 1545-7885. doi: 10.1371/journal.pbio.0040179.
8. S J Sober and M S Brainard. Adult birdsong is actively maintained by error correction. Nature Neurosci, 12(7):927–31, 2009. ISSN 1546-1726. doi: 10.1038/nn.2336.
9. R Rescorla and A Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II, pages 64–99. Appleton-Century-Crofts, 1972.
10. D Joel, Y Niv, and E Ruppin. Actor–critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15(4):535–547, 2002. ISSN 0893-6080. doi: 10.1016/S0893-6080(02)00047-3.
11. R S Sutton and A G Barto. Reinforcement Learning: An Introduction. MIT Press Cambridge, USA, 2nd edition, 2012.
12. A Lak, W Stauffer, and W Schultz. Dopamine neurons learn relative chosen value from probabilistic rewards. eLife, 5:e18044, 2016. ISSN 2050-084X. doi: 10.7554/eLife.18044.
13. K P Kording, J B Tenenbaum, and R Shadmehr. The dynamics of memory as a consequence of optimal adaptation to a changing body. Nature Neurosci, 10(6):779–786, 2007. ISSN 1097-6256. doi: 10.1038/nn1901.
14. D M Wolpert. Probabilistic models in human sensorimotor control. Human Movement Sci, 26(4):511–524, 2007. ISSN 01679457. doi: 10.1016/j.humov.2007.05.005.
15. C R Gallistel, T A Mark, A P King, and P E Latham. The rat approximates an ideal detector of changes in rates of reward: Implications for the law of effect. J Exper Psych. Animal Behav Proc, 27(4):354–372, 2001. ISSN 0097-7403.
16. S J Gershman. A Unifying Probabilistic View of Associative Learning. PLOS Comput Biol, 11(11):e1004567, 2015. doi: 10.1371/journal.pcbi.1004567.
17. O Donchin and R Shadmehr. Linking motor learning to function approximation: Learning in an unlearnable force field. In Adv Neural Inf Proc Syst 14, volume 1, page 7. MIT Press, 2001.
18. R J van Beers. Motor Learning Is Optimally Tuned to the Properties of Motor Noise. Neuron, 63(3):406–417, 2009. ISSN 0896-6273. doi: 10.1016/j.neuron.2009.06.025.
19. K Wei and K Körding. Relevance of Error: What Drives Motor Adaptation? J Neurophysiol, 101(2):655–664, 2009.
20. J M Beck, W J Ma, X Pitkow, P Latham, and A Pouget. Not Noisy, Just Wrong: The Role of Suboptimal Inference in Behavioral Variability. Neuron, 74(1):30–39, 2012. ISSN 0896-6273. doi: 10.1016/j.neuron.2012.03.016.
21. K Wei and K Körding. Uncertainty of feedback and state estimation determines the speed of motor adaptation. Frontiers Comput Neurosci, 4, 2010. ISSN 1662-5188. doi: 10.3389/fncom.2010.00011.
22. C W Kelly and S J Sober. A simple computational principle predicts vocal adaptation dynamics across age and error size. Frontiers Integrative Neurosci, 8:9, 2014. doi: 10.3389/fnint.2014.00075.
23. T Genewein, E Hez, Z Razzaghpanah, and D A Braun. Structure Learning in Bayesian Sensorimotor Integration. PLOS Comput Biol, 11(8):27, 2015. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1004369.
24. R Shadmehr, O Donchin, E-J Hwang, S E Hemminger, and A Rao. Learning dynamics of reaching. Motor Cortex and Voluntary Movements, pages 297–328, 2005.
25. P Dayan and Y Niv. Reinforcement learning: The Good, The Bad and The Ugly. Curr Opin Neurobiol, 18(2):185–196, 2008. ISSN 0959-4388. doi: 10.1016/j.conb.2008.08.003.
26. B J Fischer and J L Peña. Owl’s behavior and neural representation predicted by Bayesian inference. Nature Neurosci, 14(8):1061–1066, 2011. ISSN 1546-1726. doi: 10.1038/nn.2872.
27. S A Neymotin, G L Chadderdon, C C Kerr, J T Francis, and W W Lytton. Reinforcement Learning of Two-Joint Virtual Arm Reaching in a Computer Model of Sensorimotor Cortex. Neural Comput, 25(12):3263–3293, 2013. ISSN 0899-7667. doi: 10.1162/NECO_a_00521.
28. W Schultz. Neuronal Reward and Decision Signals: From Theories to Data. Physiol Rev, 95(3):853–951, 2015.
29. F R Robinson, C T Noto, and S E Bevans. Effect of Visual Error Size on Saccade Adaptation in Monkey. J Neurophysiol, 90(2):1235–1244, 2003. ISSN 0022-3077, 1522-1598. doi: 10.1152/jn.00656.2002.
30. S J Sober and M S Brainard. Vocal learning is constrained by the statistics of sensorimotor experience. Proc Natl Acad Sci (USA), 109(51):21099–21103, 2012. ISSN 1091-6490. doi: 10.1073/pnas.1213622109.
31. R Hahnloser and G Narula. A Bayesian account of vocal adaptation to pitch-shifted auditory feedback. PLoS ONE, 12:e0169795, 2017.
32. J Kaipio and E Somersalo. Statistical and Computational Inverse Problems. Springer, 2004. ISBN 0-387-22073-9.
33. M S Brainard and A Doupe. What songbirds teach us about learning. Nature, 417:351–358, 2002.
34. B D Kuebrich and S J Sober. Variations on a theme: Songbirds, variability, and sensorimotor error correction. Neuroscience, 296:48–54, 2015.
35. Lukas A. Hoffmann, Conor W. Kelly, David A. Nicholson, and Samuel J. Sober. A Lightweight, Headphones-based System for Manipulating Auditory Feedback in Songbirds. Journal of Visualized Experiments: JoVE, (69), 2012. ISSN 1940-087X. doi: 10.3791/50027.
36. B P Ölveczky, A S Andalman, and M S Fee. Vocal Experimentation in the Juvenile Songbird Requires a Basal Ganglia Circuit. PLoS Biology, 3(5):8, 2005. ISSN 1545-7885. doi: 10.1371/journal.pbio.0030153.
37. C Niziolek, S Nagarajan, and J Houde. What does motor efference copy represent? Evidence from speech production. J Neurosci, 33(41):16110–16116, 2013.
38. R A Suthers, F Goller, and J M Wild. Somatosensory feedback modulates the respiratory motor program of crystallized birdsong. Proc Natl Acad Sci (USA), 99(8):5680–5685, 2002. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.042103199.
39. Y Liu, H Hu, J A Jones, Z Guo, W Li, X Chen, P Liu, and H Liu. Selective and divided attention modulates auditory–vocal integration in the processing of pitch feedback errors. European J Neurosci, 42(3):1895–1904, 2015. ISSN 1460-9568. doi: 10.1111/ejn.12949.
40. N E Scheerer and J A Jones. The predictability of frequency-altered auditory feedback changes the weighting of feedback and feedforward input for speech motor control. European J Neurosci, 40(12):3793–3806, 2014. ISSN 0953816X. doi: 10.1111/ejn.12734.
41. J Zinn-Justin. Quantum Field Theory and Critical Phenomena. Clarendon Press, 4th edition, 2002.
42. J P Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, Boston, 2015.
43. József Fiser, Pietro Berkes, Gergő Orbán, and Máté Lengyel. Statistically optimal perception and learning: From behavior to neural representations. Trends Cogn Sci, 14(3):119–130, 2010. ISSN 1364-6613. doi: 10.1016/j.tics.2010.01.003.
44. Lars Buesing, Johannes Bill, Bernhard Nessler, and Wolfgang Maass. Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons. PLOS Comput Biol, 7(11):e1002211, 2011. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1002211.
45. David Kappel, Stefan Habenschuss, Robert Legenstein, and Wolfgang Maass. Network Plasticity as Bayesian Inference. PLOS Comput Biol, 11(11):e1004485, 2015. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1004485.
46. Mihai A. Petrovici, Johannes Bill, Ilja Bytschok, Johannes Schemmel, and Karlheinz Meier. Stochastic inference with spiking neurons in the high-conductance state. Phys Rev E, 94(4), 2016. ISSN 2470-0045, 2470-0053. doi: 10.1103/PhysRevE.94.042312.
47. S Mittnik, M S Paolella, and S T Rachev. Diagnosing and treating the fat tails in financial returns data. J Empir Finance, 7(3–4):389–416, 2000. ISSN 0927-5398. doi: 10.1016/S0927-5398(00)00019-0.
48. C L Nikias and M Shao. Signal Processing with Alpha-Stable Distributions and Applications. Adaptive and learning systems for signal processing, communications, and control. Wiley, New York, 1995. ISBN 978-0-471-10647-0.
49. D Salas-Gonzalez, J M Górriz, J Ramírez, I A Illán, and E W Lang. Linear intensity normalization of FP-CIT SPECT brain images using the α-stable distribution. NeuroImage, 65:449–455, 2013. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2012.10.005.
50. I A Belov. On the computation of the probability density function of α-stable distributions. Mathem Model Anal, 2:333–341, 2005.
51. H Bergström. On some expansions of stable distribution functions. Ark Mat, 2(4):375–378, 1952.

Posted July 23, 2017.
