## Abstract

Sensory deprivation has long been known to cause hallucinations or “phantom” sensations, the most common of which is tinnitus induced by hearing loss, affecting 10–20% of the population. An observable hearing loss, causing auditory sensory deprivation over a band of frequencies, is present in over 90% of people with tinnitus. Existing plasticity-based computational models for tinnitus are usually driven by homeostasis mechanisms, modeled to fit phenomenological findings. Here, we use an objective-driven learning algorithm to model an early auditory processing neuronal network, e.g., in the dorsal cochlear nucleus. The learning algorithm maximizes the network’s output entropy by learning the feed-forward and recurrent interactions in the model. We show that the connectivity patterns and responses learned by the model display several hallmarks of early auditory neuronal networks. We further demonstrate that attenuation of peripheral inputs drives the recurrent network towards its critical point and transition into a tinnitus-like state. In this state, the network activity resembles responses to genuine inputs even in the absence of external stimulation, namely, it “hallucinates” auditory responses. These findings demonstrate how objective-driven plasticity mechanisms that normally act to optimize the network’s input representation can also elicit pathologies such as tinnitus as a result of sensory deprivation.

**Author summary** Tinnitus or “ringing in the ears” is a common pathology. It may result from mechanical damage in the inner ear, as well as from certain drugs such as salicylate (aspirin). A common approach toward a computational model for tinnitus is to use a neural network model with inherent plasticity applied to early auditory processing, where the input layer models the auditory nerve and the output layer models a nucleus in the brain stem. However, most of the existing computational models are phenomenological in nature, driven by a homeostatic principle. Here, we use an objective-driven learning algorithm based on information theory to learn the feed-forward interactions between the layers, as well as the recurrent interactions within the output layer. Through numerical simulations of the learning process, we show that attenuation of peripheral inputs drives the network into a tinnitus-like state, where the network activity resembles responses to genuine inputs even in the absence of external stimulation; namely, it “hallucinates” auditory responses. These findings demonstrate how plasticity mechanisms that normally act to optimize network performance can also lead to undesired outcomes, such as tinnitus, as a result of reduced peripheral hearing.

## Introduction

Tinnitus is a common form of auditory hallucinations, affecting the quality of life of many people (≈10–20% of the population, [1–6]). It can manifest as “ringing” in a narrow frequency band, but also as noise over a wide frequency range. An observable hearing loss, causing sensory deprivation over a band of frequencies, is present in >90% of people with tinnitus [1–4], and the remaining people with tinnitus are believed to suffer some damage in higher auditory processing pathways [5, 7] or have some cochlear damage that does not affect the audiogram [8].

From a neural processing point of view, hallucinations correspond to brain activity in sensory networks, which occurs in the absence of an objective external input. Hallucinations can occur in all sensory modalities, and can be induced by drugs, certain brain disorders, and sensory deprivation. For example, it is well known that visual deprivation (e.g., being in darkness for an extended period) elicits visual hallucinations, and, similarly, auditory deprivation elicits auditory hallucinations [9–11].

Although the causes of tinnitus can sometimes be mechanical (“objective tinnitus” [2, 12]), this is not the case in >95% of patients [6, 12]. This so-called “subjective tinnitus” is commonly associated with plasticity of feedback and recurrent neuronal circuits [2, 5, 8, 13–16].

The dorsal cochlear nucleus (DCN) is known to display tinnitus-related plastic reorganization following cochlear damage [17–20], and is thought to be a key player in the generation of tinnitus [21–24]. It is stimulated directly by the auditory nerve with a tonotopic mapping. Each output unit, composed of a group of different cells, receives inputs from a small number of input fibers and inhibits units of similar tuning [25, 26]. This connectivity pattern results in a sharp detection of specific notches [26]. As the DCN is the earliest candidate along the auditory path displaying tinnitus-related activity [17, 18], it is the most common candidate for the generation of tinnitus [21–24]. This choice is also supported by DCN hyperactivity following artificial induction of tinnitus [19, 20]. Interestingly, this induced hyperactivity persists even if the DCN is later isolated from inputs other than the auditory nerve [27]. This suggests that tinnitus-related hyperactivity in the DCN is intrinsic and not caused by feedback from higher order auditory networks.

While existing computational models successfully account for some of the characteristics of tinnitus [28], many of them are based on lateral inhibition [29–31] or gain adaptation [32], and do not take into account long-term neural plasticity. Plasticity-based models for tinnitus are usually phenomenological models, where plasticity is described as a homeostatic process [33–39] or an amplification of central noise [40], rather than as a process which serves a computational goal. Another computational model for tinnitus is based on stochastic resonance and suggests that tinnitus arises from an adaptive optimal noise level, but it is focused on a single auditory frequency and has yet to be further explored [41].

In this work, we try to gain new insights into tinnitus by using information theoretic-driven plasticity. We implemented the entropy maximization (EM) approach in a recurrent neural network [42] to model the connection between the raw sensory input and its downstream representation. This approach was previously applied to model the feed-forward connectivity in the primary visual cortex, giving rise to orientation-selective Gabor-like receptive fields [43]. A later generalization of the algorithm to learning recurrent connectivity [42] was used to show that EM drives recurrent visual neural networks toward critical behavior [44]. Furthermore, the evolved recurrent connectivity profile has a Mexican-hat shape; namely, neurons with similar preferred orientations tend to excite one another, while neurons with distant preferred orientations tend to inhibit one another. While the aforementioned studies focused on the normal function of the visual system, EM-based neural networks have rarely been used to model any abnormalities or to study the effect of changes in input statistics [45]. The relationship between EM-based adaptation and the emergence of tinnitus from sensory deprivation was previously discussed in the context of single neurons [46], yet it has never been explored in a large-scale recurrent network.

Here, we trained a recurrent EM neural network to represent auditory stimuli, so it can stand as a simplified model for early auditory processing. Subsequently, to test the effect of sensory deprivation on the network’s output representation, we modified the input statistics by attenuating a certain frequency band. Our findings show that tinnitus-like hallucinations naturally arise in this model following sensory deprivation. These findings suggest that hallucinations following sensory deprivation can stem from general long-term plasticity mechanisms that act to optimize the representation of sensory information. Furthermore, our analysis indicates that the trained network tends to operate near a critical point on the verge of hallucinations, similar to previous findings [44]. The increased gain of the recurrent interactions, which acts to compensate for the attenuated input, may lead the network to cross the critical point into a regime of hallucinations.

## Results

To model the early stages of auditory processing (e.g., DCN), we used an entropy maximization (EM) approach to train a recurrent neural network (see Methods). The neurons obey first-order rate dynamics, and it is assumed that the network reaches a steady state following the presentation of each stimulus. The learning algorithm for the feed-forward and recurrent connectivity was based on the gradient-descent algorithm described in [42], with the addition of regularization. The network was trained in an unsupervised manner to represent simulated auditory stimuli (see Methods for more details). Figure 1 depicts the network’s architecture and a typical stimulus.

In all simulations described here, we used a network of 40 input neurons and 400 output neurons (an overcomplete representation). Regularization was achieved using a cost on the norm of the weights and was applied to both the feed-forward (using the *ℓ*_{1} norm) and recurrent (using the *ℓ*_{2} norm) sets of connections (see Methods). The coefficients of the regularization terms were set to *λ*_{W} = 0.001 for the feed-forward connections and *λ*_{K} = 0.183 for the recurrent connections (for details regarding these choices, see the Regularization effect subsection below).

### Training using typical stimuli

First, we trained the network using typical auditory inputs, simulated as a combination of multiple narrow Gaussians in the frequency domain with additional noise (see Methods and Fig. 1B). After the convergence of the learning process, each output neuron had a specific and unique preferred frequency (Fig. 2A). Furthermore, the connectivity profiles converged to a “Mexican-hat” shape for both feed-forward and recurrent connections (Fig. 2B,D). This profile of connectivity causes neurons with adjacent frequencies to excite one another, while neurons with slightly more distant frequencies inhibit each other. The significance of this profile lies in its ability to reduce the width of the output response profile for a Gaussian input, thus, effectively reducing the noise. Similarly shaped spectral receptive fields were observed in various primary auditory networks [25, 26, 47, 48] including the DCN, suggesting similar connectivity patterns.

The network’s response to typical stimuli shows tonotopic responses, and the response in the absence of external stimuli is near spontaneous activity (Fig. 3A–F). We note that the initial feed-forward connectivity was manually tuned to produce a tonotopic mapping (using weak Gaussian profiles with ordered centers). Although the feed-forward connections do change throughout the learning process, the tonotopic organization remains stable. The tonotopic mapping is a well-known property of all auditory processing stages between the cochlea and the auditory cortex in various species, including humans [49–53]. The preservation of the tonotopic organization throughout the learning process is in agreement with biological observations, suggesting that it is created in the embryonic stages of development and is preserved through plasticity processes [54].

We noticed that the spatial connectivity profiles barely change throughout the learning, while their scale changes dramatically. In light of this observation, we quantified several global parameters of the network as a function of the scale of the recurrent connectivity matrix (Fig. 4). We also used these measurements to gain insights into the effect of regularization on our results. First, note that the regularization caused the network learning process to converge to slightly down-scaled recurrent interactions compared to the optimal scale in terms of the non-regularized objective function (Fig. 4A). This specific scale seems to play a role in determining the proximity of the network dynamics to the critical point. Specifically, the convergence time rises dramatically at this point, reflecting the well-known phenomenon of “critical slowing down” [55–58]. In addition, at this scale, the population vector’s magnitude rises sharply, reflecting the emergence of non-uniform activity profiles in the absence of a structured input (see Methods and Fig. 4B,C). All these results point to the same conclusion – without the regularization, the recurrent connectivity would have been scaled up by ≈1.3, such that the spectral radius of the recurrent connectivity matrix would be ≈4. We note that the maximal derivative of the chosen activation function 1/(1 + exp(−x)) is 1/4. Thus, having the spectral radius of the recurrent connectivity matrix near 4 indicates proximity to the critical point (see Methods). This means that the regularization keeps the recurrent connectivity below its optimal scale (in terms of the entropy term alone), and the network remains subcritical. We note that for different regularization coefficients, the scale of the interactions could obtain different values.
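Critical slowing down can be illustrated with a minimal scalar analogue of the network dynamics. Iterating s ← g(c(s − 1/2)) has a fixed point at s = 1/2 with slope c/4, so c = 4 is the critical gain, and the number of iterations needed for convergence diverges as c approaches it. This toy map is our own illustration, not part of the paper's model:

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def iterations_to_converge(c, tol=1e-8, max_iter=100000):
    """Iterate the scalar map s <- g(c * (s - 1/2)) from s = 0.4.

    s* = 1/2 is a fixed point with slope c/4, so c = 4 is the critical
    point; convergence slows down as c approaches it.
    """
    s = 0.4
    for n in range(max_iter):
        s_new = g(c * (s - 0.5))
        if abs(s_new - s) < tol:
            return n
        s = s_new
    return max_iter

# iteration count grows sharply as the gain c approaches 4
counts = [iterations_to_converge(c) for c in (1.0, 3.0, 3.9)]
print(counts)
```

For c < 4 the map is a global contraction (its derivative never exceeds c/4), so the iteration always converges, but ever more slowly near the critical gain.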

### Sensory deprivation

After the learning was stabilized for normal stimuli, we attenuated the inputs in the higher half of the frequency range (Fig. 1C), and let the network’s recurrent connections adapt to the new input statistics. Consequently, the recurrent connectivity between the deprived neurons was strengthened (Fig. 2E,F). The stronger recurrent connectivity in the deprived region led to a phase transition, resulting in an inhomogeneous stationary activity pattern independent of the given input (Fig. 3G–I). We interpret those results as “hallucinations”, elicited by the sensory deprivation. Interestingly, the “hallucinations” in our model develop only in the deprived region of the output layer, consistent with certain types of tinnitus [3, 46, 59, 60].

Following the induction of sensory deprivation, we evaluated the criticality measures once again (Fig. 4D–F). The results remained qualitatively similar, but the optimal scale moved much closer to 1 (≈1.06). Thus, the network converged to a point much closer to its critical point, compared to its state before the induction of sensory deprivation. We note that following sensory deprivation, the effect of learning on the recurrent connections is not limited to scaling. Hence, the different measures exhibit different patterns in the supercritical domain (above the scale of ≈1.06).

### Regularization effect

As discussed above, to keep the dynamics from crossing into the supercritical domain, we added regularization to the network’s weights. For each type of connectivity matrix (feed-forward and recurrent), we tested regularization by both the *ℓ*_{1} and *ℓ*_{2} norms of the connections. Applying *ℓ*_{1} regularization is known to lead to sparse connectivity [61]; however, applying it to the recurrent connectivity matrix nullified all connections but two, which were still strong enough to push the dynamics into the supercritical domain. This outcome is highly non-biological (recurrent connectivity is present in most biological neural networks); thus, we focus only on simulations where the recurrent connections were regularized by their *ℓ*_{2} norm. Using either the *ℓ*_{1} or *ℓ*_{2} norm to regularize the feed-forward connectivity did not have a dramatic effect on the results. Since the *ℓ*_{1} norm leads to more biologically realistic sparse connectivity, as found experimentally in the DCN [26], we chose to focus on this option.

The stability of the network’s fixed point is determined by the signs of the real parts of the eigenvalues of the matrix that controls the linearized dynamics. In this case, the corresponding matrix is (*I* − *GK*), where *K* is the recurrent connectivity matrix and *G* is a diagonal matrix containing the derivatives of the activation function for each output neuron (see Methods). Since the maximal derivative of the chosen activation function 1/(1 + exp(−x)) is 1/4, the critical point is characterized by the spectral radius of the recurrent connectivity matrix, *K*, being near 4. We used this result as an efficient surrogate for the actual critical point.
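This surrogate can be checked numerically: for a weak random recurrent matrix, the spectral radius of *K* stays well below 4 and, correspondingly, all eigenvalues of (I − GK) have positive real parts. The matrix size and weight scale below are illustrative, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)  # peaks at 1/4, when x = 0

N = 50
K = rng.normal(scale=0.1, size=(N, N))  # weak recurrent connectivity
h = rng.normal(size=N)                  # net input W x + K s - T at a fixed point
G = np.diag(g_prime(h))                 # activation slopes at the fixed point

# Surrogate criterion: spectral radius of K below 4. For this subcritical
# K, the full stability criterion (positive real parts of eig(I - GK))
# agrees with the surrogate.
rho_K = np.max(np.abs(np.linalg.eigvals(K)))
eigs = np.linalg.eigvals(np.eye(N) - G @ K)

print(rho_K < 4.0)                   # True: surrogate says subcritical
print(bool(np.all(eigs.real > 0)))   # True: the fixed point is stable
```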

In our simulations, the spectral radius of the recurrent connectivity matrix *K* decreased with the respective regularization coefficient *λ*_{K}, with a characteristic sharp drop (Fig. 5). Generally, the value of *λ*_{K} where this drop occurs depends mainly on the number of output neurons; however, in our simulations, sensory deprivation caused this value to rise. This phenomenon created an interval of *λ*_{K} values, where sensory deprivation drives the dynamics much closer to the critical point, thus, eliciting the hallucination-like responses described before. To emphasize this effect, we used the lower bound of this interval (*λ*_{K}=0.183) in all simulations previously displayed; for larger values of *λ*_{K}, the system will be further away from the critical point.

## Discussion

In this work, we used an EM approach to train a recurrent neural network to represent simulated auditory stimuli, and examined the effect of input statistics on the evolved representation. For typical inputs, the network developed connectivity patterns and exhibited output responses similar to biological findings regarding the auditory system in general [62–65] and, more specifically, the DCN [25, 26]. Interestingly, sensory deprivation elicited tinnitus-like “hallucinations” in the network, resembling the characteristics of certain types of tinnitus [3, 11, 46, 59, 60]. Although we focused here on tinnitus, this qualitative phenomenon is independent of the input modality; it can explain how other kinds of “phantom” sensations arise from neural plasticity and why they involve the specific region of the sensory input space that was deprived of input [66, 67].

Previous computational models relied on phenomenological homeostasis-driven plasticity to demonstrate tinnitus elicited by sensory deprivation [33–38]. Here, we used objective-driven plasticity; namely, the main mechanism underlying the network’s plasticity is optimizing an explicit computational goal. Specifically, the network maximizes the entropy of its output, which corresponds to increasing input sensitivity [44]. The general resemblance of our model to biological findings supports the hypothesis that EM serves as a computational objective for primary sensory processing networks in the brain (e.g., [43, 44]). However, as described in the Methods section, the vanilla EM learning rules drive the network into a phase transition. This process leads the network away from a stable fixed point and into dynamical states with poor information representation. Thus, some regularization should be used to keep the network subcritical. To this end, we used a penalty on the *ℓ*_{2} norm of the recurrent connections as a regularization method, which can be thought of as a kind of homeostatic mechanism [68–72]. In this model, the emergence of tinnitus depends on the interplay between the computational objective and the homeostatic regularization, in contrast to models driven by a single phenomenological homeostatic mechanism. Future studies might employ different types of regularization methods (e.g., firing-rate-based rather than weight-based) and examine their effect on the dynamics of the network.

While most of the hyper-parameters of the model can be chosen arbitrarily without any qualitative effect on the results, the regularization coefficient for the recurrent connectivity, *λ*_{K}, is an exception: if it is too small, numerical instabilities might accidentally drive the network into the supercritical domain, but if it is too large, the network will always remain subcritical. In the first case, the output may no longer depend on the input, while in the second case, the input may have little effect on the output – in both cases, moving away from the critical point leads to poor sensitivity. In practice, there is a specific range of values which yields the qualitative results demonstrated in this paper (see Fig. 5) and, according to our observations, it depends mainly on the number of output neurons. Here, we used a grid search to find the corresponding range, and the results were obtained using the minimal value within it. This choice minimized the cost of regularization relative to the EM objective, while keeping the evolved dynamics in the subcritical regime. Furthermore, this choice of *λ*_{K} drives the network close to the critical point and emphasizes the effect of sensory deprivation on the transition into the tinnitus-like domain and on the resulting “hallucinations”.

These results are in line with a plethora of studies from recent years, suggesting near-critical dynamics in biological neural networks across various scales, from neuronal cultures to large-scale human brain activity [73–81]. In particular, it is proposed that healthy neural dynamics are poised near a critical point, yet within the subcritical domain [82]. Under these circumstances, changes in the input statistics can trigger the network to transition into supercritical dynamics, which may manifest as hallucinations. Our study portrays a concrete, albeit simplified, network model that leads to near-critical dynamics and experiences a transition from healthy to pathological neural dynamics as a consequence of inherent plasticity and sensory deprivation. We note that the network dynamics here are too simplified to enable a direct comparison with the rich dynamics observed in cortical networks and with common hallmarks of criticality (e.g., [73]).

An illuminating perspective on the emergence of hallucinations, such as tinnitus, as a consequence of sensory deprivation comes from the framework of Bayesian inference [83–85]. According to this framework, sensory systems generate perception by combining the incoming stimuli with prior expectations in a way that takes into account the relative uncertainty of each. Under sensory deprivation, the uncertainty about the input is very large; hence, the weight of the prior expectations becomes more dominant. This process may eventually lead to a state in which prior expectations dominate perception, which can be interpreted as a hallucination [86]. If this perception is maintained long enough, it will turn into a strong prior by itself, thus giving rise to a chronic hallucination – namely, tinnitus [84]. Although our model does not use the Bayesian framework explicitly, it shares a few characteristics with the Bayesian approach. For example, according to the Bayesian framework, the profile of the “hallucination” is an amplified prior, so it should resemble typical inputs, much like our results (see Fig. 3G–I). The advantage of the model described here lies in its mechanistic nature, namely, that it is cast in the language of neuronal networks with long-term plasticity. Thus, it can be more straightforward to interpret and compare to experimental data.

To summarize, we have demonstrated how the EM approach can be used as a model for early auditory processing and the phenomenon of tinnitus. Previous works have demonstrated that EM-based neural networks can serve as models for early visual processing [43, 44] and the phenomenon of synaesthesia [45]. We believe that this framework can be used for modeling other modalities and phenomena as well. It is also important to extend this framework to more biologically plausible network models, which could account for more detailed aspects of the underlying neural dynamics.

## Methods

### The model

We modeled an early auditory processing neural network (e.g., the DCN) using the overcomplete recurrent EM neural network described in [42], with the addition of regularization on strong connectivity.

### Network architecture and dynamics

Our system is composed of *M* input neurons, **x**, and *N* output neurons, **s**. Each output neuron’s activity through time is given by the dynamic equation:

$$\tau\,\frac{d\mathbf{s}}{dt} = -\mathbf{s} + g\left(W\mathbf{x} + K\mathbf{s} - T\right), \tag{1}$$

where *W* is the feed-forward connectivity matrix, *K* is the recurrent connectivity matrix, *T* is the vector of output neurons’ thresholds, and *g*(*x*) = 1/(1 + exp(−*x*)) is the activation function of the neurons. For overcomplete transformations, we assume *M* < *N* (Fig. 1A).

The fixed points of Eq. 1 are given implicitly by:

$$\mathbf{s} = g\left(W\mathbf{x} + K\mathbf{s} - T\right). \tag{2}$$

These fixed points are stable iff all of the eigenvalues of the linearized dynamics matrix (*I* − *GK*) have positive real parts [44], where *G* is a diagonal matrix defined by *G*_{ii} = *g*′(*W***x** + *K***s** − *T*)_{i}. Since the values of *G* are upper-bounded by max_{x} *g*′(*x*) = 1/4, for a matrix *K* with spectral radius below 4, the fixed points are indeed stable. In practice, when fixed points exist at all, there is usually only one such stable fixed point.

Numerically, the steady state can be found by integrating Eq. 1 using Euler’s method over a long time period until the activities stabilize; however, this method is highly inefficient. In this work, we found the steady state by solving Eq. 2 directly using the Newton-Raphson method.
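A minimal sketch of this procedure, assuming the dynamics and fixed-point equation given above; the network sizes and weight scales are illustrative:

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def steady_state(W, K, T, x, tol=1e-10, max_iter=100):
    """Solve s = g(W x + K s - T) with the Newton-Raphson method.

    The root of F(s) = s - g(W x + K s - T) is found by iterating
    s <- s - J^{-1} F(s), where J = I - G K and G = diag(g'(h)).
    """
    N = K.shape[0]
    s = g(W @ x - T)                      # feed-forward initial guess (K = 0)
    for _ in range(max_iter):
        h = W @ x + K @ s - T
        gs = g(h)
        F = s - gs
        if np.max(np.abs(F)) < tol:
            break
        G = np.diag(gs * (1.0 - gs))      # g'(h) = g(h)(1 - g(h))
        s = s - np.linalg.solve(np.eye(N) - G @ K, F)
    return s

rng = np.random.default_rng(1)
M, N = 40, 400
W = rng.normal(scale=0.3, size=(N, M))
K = rng.normal(scale=0.01, size=(N, N))   # weak (subcritical) recurrence
T = np.zeros(N)
x = rng.uniform(size=M)

s = steady_state(W, K, T, x)
residual = np.max(np.abs(s - g(W @ x + K @ s - T)))
print(residual)   # below tol: s satisfies the fixed-point equation
```

For subcritical recurrent weights, Newton-Raphson typically converges in a handful of iterations, versus thousands of Euler steps near the critical point.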

When the eigenvalues of *K* are near 4, the eigenvalues of (*I* − *GK*) might get close to zero. Crossing this point will result in instability of the fixed point and a phase transition. Near this phase transition, the decrease in the eigenvalues of (*I* − *GK*) will cause the effective time constant to rise – a phenomenon termed “critical slowing down”. Furthermore, such a phase transition is expected to be characterized by a spontaneous symmetry breaking [87], which can be measured by several metrics. Here, we used the magnitude of the population vector for that purpose, calculated as

$$P = \left|\frac{1}{N}\sum_{k=1}^{N} s_{k}\, e^{i\phi_{k}}\right|,$$

where *ϕ*_{k} ≡ 2*πk/N*. Although in our case the boundary conditions are not periodic, we assume their effect to be negligible since *N* ≫ 1.
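The population vector magnitude can be computed as follows; the normalization by N is one common convention and may differ from the paper's exact definition:

```python
import numpy as np

def population_vector(s):
    """Magnitude of the population vector of an activity profile s.

    Each of the N output neurons is assigned an angle phi_k = 2*pi*k/N.
    A flat (symmetric) profile yields magnitude ~0, while activity
    concentrated around one preferred frequency yields a large value,
    signaling spontaneous symmetry breaking.
    """
    N = len(s)
    phi = 2.0 * np.pi * np.arange(N) / N
    return np.abs(np.sum(s * np.exp(1j * phi))) / N

N = 400
uniform = np.ones(N)                                      # symmetric state
bump = np.exp(-0.5 * ((np.arange(N) - 200) / 10.0) ** 2)  # localized profile

print(population_vector(uniform))   # ~0: no symmetry breaking
print(population_vector(bump) > population_vector(uniform))  # True
```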

### Learning rules

The goal of the network is to find the set {*W*^{∗}, *K*^{∗}, *T*^{∗}} which maximizes the entropy *H*(**s**) of the steady-state outputs. To do so, we used the objective function described in [42], with additional regularization terms on the *ℓ*_{1} and *ℓ*_{2} norms of *W* and *K*, respectively:

$$E = \frac{1}{2}\log\det\left(\chi^{T}\chi\right) - \lambda_{W}\left\lVert W\right\rVert_{1} - \lambda_{K}\left\lVert K\right\rVert_{2}^{2}, \tag{3}$$

where *χ* is the Jacobian of the transformation, given by *χ* = *ϕW* and *ϕ* ≡ (*I* − *GK*)^{−1}*G* [42].
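A sketch of evaluating this objective at a given fixed point, assuming the Jacobian χ = ϕW with ϕ = (I − GK)^{−1}G (sign conventions may differ from [42]), an ℓ1 penalty on W, and a squared ℓ2 penalty on K; the network sizes and coefficients are illustrative:

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(W, K, T, x, s, lam_W, lam_K):
    """Regularized EM objective evaluated at a fixed point s.

    Entropy term: (1/2) log det(chi^T chi), with chi = phi W and
    phi = (I - G K)^{-1} G; penalties on the l1 norm of W and the
    squared l2 norm of K.
    """
    N = K.shape[0]
    gs = g(W @ x + K @ s - T)
    G = np.diag(gs * (1.0 - gs))                 # slopes g'(h)
    phi = np.linalg.solve(np.eye(N) - G @ K, G)
    chi = phi @ W                                # N x M Jacobian ds/dx
    _, logdet = np.linalg.slogdet(chi.T @ chi)   # stable log-determinant
    penalty = lam_W * np.sum(np.abs(W)) + lam_K * np.sum(K ** 2)
    return 0.5 * logdet - penalty

rng = np.random.default_rng(3)
M, N = 5, 20
W = rng.normal(scale=0.3, size=(N, M))
K = np.zeros((N, N))                 # with K = 0, s = g(W x - T) is exact
T = np.zeros(N)
x = rng.uniform(size=M)
s = g(W @ x - T)

value = objective(W, K, T, x, s, lam_W=0.001, lam_K=0.183)
print(np.isfinite(value))            # True
```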

This objective function, without the regularization terms, would lead to an increase in the singular values of *χ*. One way to achieve that goal is to decrease the eigenvalues of (*I* − *GK*) to zero, which may lead one of them to turn slightly negative due to numerical errors. This will result in instability of the fixed point and a phase transition, as discussed above. The goal of the regularization terms is to prevent this phenomenon, which is a general property of unregularized entropy maximization systems of continuous variables [88].

The learning rules were derived using the gradient descent method, as in [42]:

$$\Delta W \propto \frac{\partial}{\partial W}\,\frac{1}{2}\log\det\left(\chi^{T}\chi\right) - \lambda_{W}\,S(W), \qquad \Delta K \propto \frac{\partial}{\partial K}\,\frac{1}{2}\log\det\left(\chi^{T}\chi\right) - 2\lambda_{K}K,$$

with an analogous rule for the thresholds *T*. The gradient of the entropy term with respect to *χ* is (*χ*^{+})^{T}, where *S*(*A*) is defined by (*S*(*A*))_{ij} ≡ sign(*A*_{ij}) and *χ*^{+} stands for the pseudo-inverse of *χ* (in the overcomplete case used here, *χ*^{+} = (*χ*^{T}*χ*)^{−1}*χ*^{T}).

### Auditory inputs

The input stimuli were chosen according to certain heuristics to emulate the system’s response to tones of varying frequencies and amplitudes. Each input sample represents the reaction of the auditory hair cells to a combination of tones of certain frequencies. The input profile for a pure tone is centered on the neuron that best matches that frequency, and drops off at neighboring neurons to form a narrow Gaussian response curve. The amplitude and the frequency were selected at random from a uniform distribution over the permitted ranges. In addition to the input response, all neurons exhibit some spontaneous activity, independent of the inputs, to model the neurons’ response to background noise and non-stimulated motion of the hair cells (Fig. 1B).

The amplitudes of natural sounds are not uniformly distributed, with loud sounds being exponentially less common; however, the response of the inner hair cells is determined not only by the absolute amplitude of the sound, but also by the reactivity of the basilar membrane, as controlled by the outer hair cells. This serves as an automatic gain control mechanism, allowing the inner hair cells to use their full motion capacity for normal inputs. Therefore, we hold the uniform distribution to be a good approximation to the output of the inner hair cells when presented with natural sounds [89, 90]. To model sensory deprivation, we attenuated the higher half of the frequency domain by applying a (monotonically decreasing) sigmoid envelope to all stimuli (Fig. 1C). The choice of attenuating the higher frequencies was based on the most common type of hearing loss [91, 92], but attenuation could be applied to other frequency bands.
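A sketch of such a stimulus generator, with Gaussian tone profiles, spontaneous background activity, and an optional sigmoid attenuation of the upper half of the frequency range; all parameter values here are illustrative choices, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_stimulus(M=40, n_tones=3, width=1.5, noise=0.02, deprived=False):
    """Simulated hair-cell activity over M input neurons.

    Each stimulus is a sum of narrow Gaussian bumps (tones with random
    frequency and amplitude, drawn uniformly) on top of spontaneous
    background activity.
    """
    freqs = np.arange(M)
    x = np.full(M, 0.05)                       # spontaneous activity
    for _ in range(n_tones):
        center = rng.uniform(0, M)             # tone frequency
        amp = rng.uniform(0.2, 1.0)            # tone amplitude
        x += amp * np.exp(-0.5 * ((freqs - center) / width) ** 2)
    x += noise * rng.random(M)                 # background noise
    if deprived:
        # monotonically decreasing sigmoid envelope attenuating the
        # upper half of the frequency range
        x *= 1.0 / (1.0 + np.exp((freqs - M / 2) / 2.0))
    return x

normal = make_stimulus()
deprived = make_stimulus(deprived=True)
print(normal.shape)   # (40,)
```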

## Acknowledgments

The authors wish to thank Avishalom Shalit and Jennifer Resnik for helpful discussions and valuable comments on the manuscript.

## Footnotes

\* Corresponding author: shrikio{at}bgu.ac.il
