## Summary

The brain has an impressive ability to withstand neural damage. Diseases that kill neurons can go unnoticed for years, and incomplete brain lesions or silencing of neurons often fail to produce any effect. How does the brain compensate for such damage, and what are the limits of this compensation? We propose that neural circuits optimally compensate for neuron death, thereby preserving their function as much as possible. We show that this compensation can explain changes in tuning curves induced by neuron silencing across a variety of systems, including the primary visual cortex. We find that optimal compensation can be implemented through the dynamics of networks with a tight balance of excitation and inhibition, without requiring synaptic plasticity. The limits of this compensatory mechanism are reached when excitation and inhibition become unbalanced, thereby demarcating a recovery boundary, where signal representation fails and where diseases may become symptomatic.

## Introduction

The impact of neuron loss on information processing is poorly understood (Montague et al., 2012; Palop et al., 2006). Every day, approximately 85,000 neurons die in the healthy adult brain (Morrison and Hof, 1997; Pakkenberg and Gundersen, 1997). Chronic diseases such as Alzheimer’s disease and brain tumor growth increase this death rate considerably, and acute events such as stroke and traumatic brain injury can kill huge numbers of cells rapidly. Yet, this damage can often be asymptomatic and go unnoticed for years (Leary and Saver, 2003). Similarly, the incomplete lesion or silencing of a targeted brain area may ‘fail’ to produce any measurable behavioral effect. This resilience of neural systems to damage is especially impressive when compared to man-made computer systems that typically lose all function following only minor destruction of their circuits. A thorough understanding of the interplay between neural damage and information processing is therefore crucial for our understanding of the nervous system, and may also help in the interpretation of various experimental manipulations such as pharmacological silencing (Aksay et al., 2007; Crook and Eysel, 1992), lesion studies (Keck et al., 2008), and optogenetic perturbations (Fenno et al., 2011).

In contrast to the study of information representation in damaged brains, there has been substantial progress in our understanding of information representation in healthy brains (Abbott, 2008). In particular, the theory of efficient coding, which states that neural circuits represent sensory signals optimally given various constraints (Barlow, 1961; Olshausen and Field, 1996; Olshausen and Simoncelli, 2001; Rieke et al., 1997; Salinas, 2006), has successfully accounted for a broad range of observations in a variety of sensory systems, in both vertebrates (Greene et al., 2009; Olshausen and Field, 1996; Olshausen and Simoncelli, 2001; Smith and Lewicki, 2006) and invertebrates (Fairhall et al., 2001; Machens et al., 2005; Rieke et al., 1997). However, an efficient representation of information is of little use if it cannot withstand some damage, such as normal cell death. Plausible mechanistic models of neural computation should be able to withstand the type of damage that the brain can withstand.

In this work, we propose that neural systems maintain stable signal representations by actively compensating for the destruction of neurons. We show that this compensation does not require synaptic plasticity, but can be implemented instantaneously by a balanced network whose dynamics and connectivity are tuned to implement efficient coding (Boerlin et al., 2013; Bourdoukan et al., 2012). When too many cells die, this balance is disrupted, and the signal representation is lost. We predict how much cell death can be tolerated by a neural system and how tuning curves change shape following optimal compensation. We illustrate these predictions using three specific neural systems for which experimental data before and after silencing are available - the oculomotor integrator in the hindbrain (Aksay et al., 2000, 2007), the cricket cercal system (Mizrahi and Libersat, 1997; Theunissen and Miller, 1991), and the primary visual cortex (Crook and Eysel, 1992; Crook et al., 1996, 1997, 1998; Hubel and Wiesel, 1962). In addition, we show that many input/output non-linearities in the tuning of neurons can be re-interpreted to be the result of compensation mechanisms within neural circuits. Therefore, beyond dealing with neuronal death, the proposed optimal compensation principle expands the theory of efficient coding and provides important insights and constraints for any neural code.

## Results

**The principle of optimal compensation.** We begin our illustration of optimal compensation using a simple two-neuron system that represents a signal, *x*(*t*). This signal may be a time-dependent sensory signal such as luminosity for example, or more generally, may be the result of a computation from within a neural circuit. We make two assumptions about how this signal is represented. First, we assume that an estimate of the signal can be extracted from the neurons by summing their instantaneous firing rates:

$$\hat{x}(t) = r_1(t) + r_2(t), \qquad (1)$$

where *r*_{1}(*t*) is the firing rate of neuron 1, *r*_{2}(*t*) is the firing rate of neuron 2, and $\hat{x}(t)$ is the signal estimate. Second, we assume that the representation performance can be quantified using a simple cost-benefit trade-off. This trade-off is called our loss function and is given by:

$$L(t) = \big(x(t) - \hat{x}(t)\big)^2 + \beta C(t). \qquad (2)$$

Here, the first term quantifies the signal representation error - the smaller the difference between the readout $\hat{x}$ and the actual signal *x*, the smaller the representation error. The second term quantifies the cost of the representation and is given by $C(t) = r_1(t)^2 + r_2(t)^2$. This term acts as a regularizer, ensuring that signal representations are shared amongst all neurons using low firing rates. The parameter *β* determines the trade-off between this cost and the representation error. Quadratic loss functions such as Equation 2 have been a mainstay of efficient coding theories, for instance in the context of stimulus reconstruction (Rieke et al., 1997) or sparse coding (Olshausen and Field, 1996).

We can now study how firing rates must change in this system to compensate for the death of a neuron. In the healthy, undamaged state, there are many possible firing rate combinations that can represent a given signal (Figure 1A, dashed line). Among these combinations, the solution with equal firing rates is optimal, because it has the smallest cost (Figure 1A, black circle). Now, we kill neuron 1 in our simple model, and ask: how should the firing rate of neuron 2 change, in order to maintain the representation performance? We see that its firing rate will double (for very small *β*) so as to compensate for the failure of neuron 1 (Figure 1A, red circle). We obtain this result by minimizing the loss function (Equation 2) under the constraint that *r*_{2} = 0. This constraint is the mathematical equivalent of neural death. The induced change in the firing rate of neuron 2 preserves the signal representation, and thereby compensates for the loss of neuron 1.
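As a concrete sketch, this two-neuron minimization can be reproduced numerically. The signal value and the cost parameter below are illustrative choices, not values taken from the text; killing a neuron is expressed as the extra constraint that its rate equals zero:

```python
import numpy as np
from scipy.optimize import minimize

x, beta = 1.0, 0.01    # illustrative signal value and cost parameter

def loss(r):
    # representation error plus quadratic firing rate cost (Equation 2)
    r1, r2 = r
    return (x - r1 - r2) ** 2 + beta * (r1 ** 2 + r2 ** 2)

# healthy network: both rates free, constrained to be non-negative
healthy = minimize(loss, x0=[0.1, 0.1], bounds=[(0, None), (0, None)]).x

# "killing" neuron 1 is the additional constraint r1 = 0
damaged = minimize(loss, x0=[0.0, 0.1], bounds=[(0, 0), (0, None)]).x
```

In closed form the healthy optimum is *r*_{1} = *r*_{2} = *x*/(2 + *β*), while the surviving neuron alone settles at *x*/(1 + *β*), i.e. its rate roughly doubles for small *β*.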

The system can compensate for the loss of a neuron because it is redundant (two neurons for one signal). While such redundancy is a necessary prerequisite for robustness, our toy example shows that it is by no means sufficient: to keep functionality, and to properly represent the signal, the firing rate of the remaining neuron needs to change.

**Optimal compensation through instantaneous restoration of balance.** Next, we investigate how this compensation can be implemented in a neural circuit. One possibility is that a circuit rewires through internal plasticity mechanisms in order to correct for the effect of the lost neurons. Surprisingly, we find that plasticity is not necessary. Rather, neural networks can be wired up such that their internal dynamics move the instantaneous firing rates rapidly into the minimum of the loss function (Equation 2) (Boerlin et al., 2013; Charles et al., 2012; Hu et al., 2012; Rozell et al., 2008). As we will show, such dynamics are sufficient to compensate for the loss of neurons.

First, we assume that instantaneous firing rates, *r*_{i}(*t*), are equivalent to filtered spike trains, similar to the postsynaptic filtering of actual spike trains (see Methods for details). Specifically, a spike fired by neuron *i* contributes a discrete unit to its instantaneous firing rate, *r*_{i} → *r*_{i} + 1, followed by an exponential decay. Second, we assume that each neuron will only fire a spike if the resulting change in the instantaneous firing rate reduces the loss (Equation 2). From these two assumptions, the dynamics and connectivity of the network can be derived (Boerlin et al., 2013; Bourdoukan et al., 2012). Specifically, we obtain a network containing two integrate-and-fire neurons driven by an excitatory input signal and coupled by mutual inhibition (Figure 1B). Both neurons work together, and take turns at producing spikes (Figure 1C). The neuron that reaches threshold first produces a spike, contributes to the signal readout and inhibits the other neuron. Now, when a neuron dies (Figure 1C, red arrow), the remaining neuron no longer receives inhibition from its partner neuron and it becomes much more strongly excited, spiking twice as often. In other words, the remaining neuron now implements the full optimization alone. This compensation happens because each neuron seeks to minimize the loss function, with or without the help of other neurons. When one neuron dies, the remaining neuron automatically and rapidly assumes the full burden of signal representation. In this way, the simple spiking network naturally implements optimal compensation following cell death (see Methods for additional details).
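A minimal discrete-time caricature of this spike rule illustrates the compensation. The actual networks are derived integrate-and-fire models; here we simply let a neuron fire whenever adding a spike would reduce the loss, with illustrative values for the signal, cost parameter and decay time constant:

```python
import numpy as np

x, beta = 1.0, 0.01
dt, tau, T = 0.001, 0.1, 20.0          # illustrative time step and decay (s)
steps = int(T / dt)

def loss(r):
    return (x - r.sum()) ** 2 + beta * np.dot(r, r)

r = np.zeros(2)                        # instantaneous (filtered) firing rates
alive = [True, True]
counts = [np.zeros(2), np.zeros(2)]    # spike counts per half of the run

for t in range(steps):
    if t == steps // 2:                # neuron 1 dies halfway through
        alive[0] = False
        r[0] = 0.0
    r *= np.exp(-dt / tau)             # exponential decay between spikes
    # each neuron fires only if a spike reduces the loss; the neuron
    # with the largest reduction fires, so they take turns
    best, gain = None, 1e-12
    for i in range(2):
        if alive[i]:
            r_new = r.copy()
            r_new[i] += 1.0
            if loss(r) - loss(r_new) > gain:
                best, gain = i, loss(r) - loss(r_new)
    if best is not None:
        r[best] += 1.0
        counts[0 if t < steps // 2 else 1][best] += 1
```

In this sketch the two neurons alternate spikes while both are alive; after the death of neuron 1, the survivor fires roughly twice as often as it did before, keeping the summed readout near the signal.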

These ideas can be extended to realistic neural circuits by scaling up to larger networks. In a network of *N* neurons, the signal is represented by the weighted summation of *N* firing rates:

$$\hat{x}(t) = \sum_{i=1}^{N} w_i r_i(t), \qquad (3)$$

where *w*_{i} is the readout weight of the *i*^{th} neuron and *r*_{i}(*t*) is the instantaneous firing rate of the *i*^{th} neuron (Figure 2A–C). Such linear readouts are used in many contexts and broadly capture the integrative nature of dendritic summation. Just as for the two-neuron network, we can then derive both network connectivity and dynamics for this larger network, by assuming that its spiking dynamics minimize the loss function (Equation 2) (Boerlin et al., 2013; Bourdoukan et al., 2012).

The resulting network consists of leaky integrate-and-fire neurons receiving a tight balance of excitatory and inhibitory inputs (Figure 2D–G). In this balanced state, the signal *x*(*t*) is represented accurately (Figure 2D). Now, when some portion of the neural population dies, the remaining neurons receive less inhibition (or excitation, depending on which neurons die). In turn, their firing rates change, and the overall levels of inhibition (or excitation) readjust (Figure 2G), until balance is restored, and with it, accurate signal representation is restored. In this way, tightly balanced network dynamics compensate automatically and rapidly for the destruction of their elements (see Methods for additional details).

**The recovery boundary.** As expected, this recovery from neural death is not unlimited. When too many cells die, some portions of the signal can no longer be represented, no matter how the remaining neurons change their firing rates (Figure 2D–G). The resultant *recovery boundary* coincides with a breakdown in the balance of excitation and inhibition - where there are no longer enough inhibitory cells to balance excitation (or enough excitatory cells to balance inhibition) (Figure 2G). In this example, the recovery boundary occurs when all the neurons with positive valued readout weights have died, so that the network can no longer represent positive valued signals or remain balanced (Figure 2G, black arrow).

When all but one neuron with positive readout weights die, the remaining neuron will shoulder the full representational load. This is an unwieldy situation because the firing rate of this neuron must become unrealistically large to compensate for all the dead neurons. In a more realistic scenario, neural activity can saturate, in which case the recovery boundary occurs much earlier (see Methods and Figures S1A, B). For instance, we find that a system with an equal number of positive and negative readout weights tolerates up to 90% neuron loss (when neurons are killed in random order) if the maximum firing rate of a neuron is 1000 Hz, whereas it will only tolerate 50% neuron loss if the maximum firing rate is 150 Hz (Figure S1B).
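The effect of a firing rate ceiling on the recovery boundary can be sketched as a bounded least-squares version of the loss minimization. The network size, weights, rate ceilings and error tolerance below are illustrative choices of ours, and we kill the positively weighted neurons in a fixed order rather than at random, so the numbers differ from those quoted above:

```python
import numpy as np
from scipy.optimize import lsq_linear

N, x, beta = 20, 1.0, 1e-6
w = np.array([0.025] * 10 + [-0.025] * 10)   # 10 positive, 10 negative weights

def error(n_dead, r_max):
    # readout error after the first n_dead positive-weight neurons die,
    # with every surviving rate bounded by 0 <= r <= r_max
    alive = np.arange(n_dead, N)
    A = np.vstack([w[alive], np.sqrt(beta) * np.eye(len(alive))])
    b = np.concatenate([[x], np.zeros(len(alive))])
    r = lsq_linear(A, b, bounds=(0.0, r_max)).x
    return abs(x - w[alive] @ r)

def tolerated(r_max, tol=0.05):
    # number of positive-weight neurons that can die before error > tol
    k = 0
    while k < 10 and error(k + 1, r_max) < tol:
        k += 1
    return k
```

With a high ceiling, a single surviving positive-weight neuron can still carry the representation; with a low ceiling, the remaining neurons saturate and the boundary is reached after far fewer deaths.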

**Analyzing tuning curves using quadratic programming.** A crucial signature of the optimal compensation principle introduced above is that neuron firing rates must change to keep the linear readout of the signal *x* stable. In order to test how these changes compare with experimental recordings, we need to quantify them more precisely, which we will do by deriving a direct mathematical relationship between the firing rates in our spiking network and the input signal.

While the relation between the dynamics of a spiking network and the observed firing rates of individual neurons has traditionally been a hard problem, addressed for instance in the framework of mean-field theory (Renart et al., 2003), we can here circumvent many of these difficulties by noting that our spiking network minimizes the loss function (Equation 2). Although this minimization has been performed for instantaneous firing rates, *r*_{i}(*t*), and time-varying signals *x*(*t*), we conjecture that the *average* firing rates, $\bar{r}_i$, for a *constant* input signal, *x*, must be given by the minimum of the same loss function as in Equation 2, so that

$$\bar{\mathbf{r}} = \underset{\bar{r}_i \geq 0}{\arg\min} \; \big(x - \hat{x}\big)^2 + \beta C, \qquad (4)$$

where the signal estimate, $\hat{x} = \sum_i w_i \bar{r}_i$, is formed using the average firing rates. This minimization is performed under the constraint that firing rates must be positive, since the firing rates of spiking neurons are positive-valued quantities, by definition. Traditionally, studies of population coding or efficient coding assume both positive and negative firing rates (Dayan and Abbott, 2001; Olshausen and Field, 1996; Olshausen and Simoncelli, 2001), and the restriction to positive firing rates is generally considered an implementational problem (Rozell et al., 2008). Surprisingly, however, we find that the constraint changes the nature of the solutions and provides fundamental insights into the shapes of tuning curves in our networks, as clarified below.

Mathematically, the minimization of a quadratic function under linear constraints is called *quadratic programming.* In our case, the loss function is quadratic in the average firing rates, $\bar{r}_i$, and the positivity constraint is linear in the firing rates. Using quadratic programming, we can therefore calculate the average firing rates by solving Equation 4 for a range of values of *x*. This procedure produces neural tuning curves (Figure 3, second column) that closely match those measured in simulations of our tightly balanced network (Figure 3, third and fourth columns), for a variety of networks of increasing complexity (Figure 3; see supplementary materials for mathematical details).

Our first observation is that the positivity constraint produces non-linearities in neural tuning curves. We illustrate this using a two-neuron system with two opposite-valued readout weights, *w*_{1} = 0.025, *w*_{2} = −0.025 (Figure 3A, first column). At signal value *x* = 0, both neurons fire at equal rates so that the readout, Equation 3, correctly becomes $\hat{x} = 0$. When we move to higher values of *x*, the firing rate of the first neuron, $\bar{r}_1$, increases linearly (Figure 3A, second column, orange line), and the firing rate of the second neuron, $\bar{r}_2$, decreases linearly (Figure 3A, second column, blue line), so that in each case, $\hat{x} \approx x$. Eventually, around *x* = 0.4, the firing rate of neuron 2 hits zero (Figure 3A, black arrow). Since it cannot decrease below zero, and because the estimated value must keep growing with *x*, the firing rate of neuron 1 must grow at a faster rate. This causes a kink in its tuning curve (Figure 3A, black arrow). This kink in the tuning curve slope is an indirect form of optimal compensation, where the network is compensating for the temporary silencing of a neuron. We provide further geometric insights into the shape of the tuning curves obtained from quadratic programming, in Figure S2 and in the Methods.
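These tuning curves can be computed by posing the minimization as a non-negative least-squares problem. The readout weights match the example above; the cost parameter, bias weights *c*_{k} and background rate *r*_{B} are illustrative choices of ours, so the kink lands near, but not necessarily exactly at, *x* = 0.4:

```python
import numpy as np
from scipy.optimize import nnls

w = np.array([0.025, -0.025])          # readout weights from the example
beta, r_B = 1e-4, 20.0                 # illustrative cost parameters
c = np.array([1.0, 1.0])               # illustrative bias weights c_k
sb = np.sqrt(beta)

def rates(x):
    # minimize (x - w.r)^2 + beta*(r1^2 + r2^2 + (c.r - r_B)^2), r >= 0,
    # written as non-negative least squares on an augmented system
    A = np.vstack([w, sb * np.eye(2), sb * c])
    b = np.concatenate([[x], [0.0, 0.0], [sb * r_B]])
    return nnls(A, b)[0]

xs = np.linspace(-0.6, 0.6, 121)
tuning = np.array([rates(xv) for xv in xs])   # the two tuning curves
```

Past the point where neuron 2 hits zero, neuron 1 carries the readout alone and its slope visibly steepens, reproducing the kink.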

Our second observation is that tuning curves depend on the decoder weight values: a neuron with a negative weight has a negative firing rate slope and a neuron with a positive weight has a positive firing rate slope (Figure 3A–C, second column). If the network consists of several neurons, then those that have small readout weight magnitude have shallow slopes and thresholds far from *x* = 0, and those with large readout weight magnitude have steep slopes and thresholds close to *x* = 0 (Figure 3B, second column). By increasing the range of readout weight values, we increase the range of tuning curve slopes (Figure S3A). Every time one of the neurons hits the zero firing rate lower bound, the tuning curve slopes of all the other neurons change.

Our third observation is that tuning curves depend on the form of the cost term, *C*. If there is no cost term, there are many equally favorable firing rate combinations that produce identical readouts (Figure 1A). The precise choice of a cost term largely determines which of these solutions is found by the network. In Figure 3, we choose a *biased* quadratic cost, $C = \sum_k \bar{r}_k^2 + \big(\sum_k c_k \bar{r}_k - r_B\big)^2$. Here, the second term biases the population towards a solution where a background task variable *r*_{B} is represented, and where the contribution of neuron *k* to this background task is determined by *c*_{k}. This background task could be any signal-independent variable that is not being probed or varied during an experiment. It has the effect of forcing tuning curves to intercept, thereby revealing much of the complexity of the quadratic programming solution, as we have just observed. By increasing the range of neuron cost values {*c*_{k}}, we increase the range of tuning curve intercepts (Figure S3B), and by decreasing the magnitude of cost values, we decrease the tuning curve intercept values (Figure S3C). As the heterogeneity of the cost parameters and readout weights increases, the heterogeneity of tuning curve shapes increases (Figure 3C, second column). If we use a quadratic cost without a bias term, $C = \sum_k \bar{r}_k^2$, tuning curves do not intercept (Figure S4A). If we use a linear cost term, $C = \sum_k \bar{r}_k$, then a small number of neurons will dominate the representation (Figure S4B and Methods).

We note that although the firing rate predictions obtained from Equation 4 are very accurate (Figure 3, fourth column), we expect these predictions to break down in certain network regimes, for example, if membrane potential noise is large (on the same order of magnitude as the synaptic input) (Figure S5A), or if the readout weights become too large (Figure S5B and Methods). Accordingly, the dynamics of our network “implement” only an approximation to the quadratic programming algorithm, albeit a close approximation for the cases considered here. To simplify our nomenclature, we take this to be implied for our spiking network implementation, but we note that the quadratic programming solutions we obtain are the true optimal solutions.

**Tuning curves before and after neuronal silencing.** We can now calculate how tuning curves change shape in our spiking network following neuronal death. When a set of neurons is killed or silenced within a network, their firing rates are effectively set to zero. We can include this silencing of neurons in our quadratic programming algorithm by simply clamping the respective neurons’ firing rates to zero:

$$\bar{\mathbf{r}} = \underset{\{\bar{r}_i \geq 0,\; i \in X\},\; \{\bar{r}_k = 0,\; k \in s\}}{\arg\min} \; \big(x - \hat{x}\big)^2 + \beta C, \qquad (5)$$

where *X* denotes the set of healthy neurons and *s* the set of dead (or silenced) neurons. This additional clamping constraint is the mathematical equivalent of killing neurons. The difference between the solutions of Equation 4 and Equation 5 provides our definition of ‘optimal’ compensation, which is understood to be the minimization of this loss function - either numerically or through the dynamics of the spiking network. In turn, we can study the tuning curves of neurons in networks with knocked-out neurons.
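A sketch of this clamped minimization, with illustrative random readout weights: dead neurons are simply removed from the optimization (their rates pinned at zero), and the compensated solution is compared to a 'frozen' solution whose surviving rates keep their healthy values:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
N, x, beta = 12, 0.5, 1e-6
w = np.concatenate([rng.uniform(0.01, 0.04, 6),     # positive weights
                    -rng.uniform(0.01, 0.04, 6)])   # negative weights

def solve(alive):
    # minimize (x - w.r)^2 + beta*|r|^2 over the alive neurons, r >= 0
    A = np.vstack([w[alive], np.sqrt(beta) * np.eye(len(alive))])
    b = np.concatenate([[x], np.zeros(len(alive))])
    r = np.zeros(N)
    r[alive] = nnls(A, b)[0]
    return r

healthy = solve(np.arange(N))
dead = [0, 1]                                # kill two positive-weight neurons
alive = np.setdiff1d(np.arange(N), dead)
compensated = solve(alive)                   # clamped solution (Equation 5)
frozen = healthy.copy()
frozen[dead] = 0.0                           # death without any compensation

err_comp = abs(x - w @ compensated)
err_frozen = abs(x - w @ frozen)
```

Surviving neurons with similar tuning to the dead ones increase their rates, and the compensated readout error stays far below the uncompensated one.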

As a first example, we compare the tuning curves of neurons before and after neuron death. Using a one-dimensional input signal, *x*, and similar network connectivity as before (Figure 3C), we solve Equation 4 and obtain a complex mixture of tuning curves with positive and negative slopes, and diverse threshold crossings (Figure 4A). We then calculate how these tuning curves change shape following neuron death (Figure 4B), using Equation 5. We find that neurons that have similar tuning to the knocked out neurons increase their firing rates and dissimilarly-tuned neurons decrease their firing rates. In this way, signal representation is preserved as much as possible (Figure 4B, inset). In comparison, we find that a network that does not change its tuning curves after cell death has drastically worse representation error (Figure 4B, inset). In addition, if all the neurons with positive tuning curve slopes are killed, we cross the recovery boundary, so even optimal compensation can no longer preserve signal representation (Figure 4C).

We note that these observations are the key signatures of optimal compensation. In particular, they are largely independent of the specific details of our network model, such as the choice of readout weights or the choice of a cost-term. For example, if instead of using a biased quadratic cost term, we use a quadratic cost term or a linear cost term, we still find that neurons with similar tuning to the knocked out neurons increase their firing rates following the knockouts (see Figure S4B, E), and we find that the recovery boundary occurs at the same place (see Figure S4C, F).

We can now make our first comparison to measurements. We consider the oculomotor system, which is responsible for horizontal eye fixations in the vertebrate hindbrain. This system is naturally comparable to the one-dimensional signal representation example that we have just considered: the signal *x* corresponds to the eye position, with zero representing the central eye position, positive values representing right-side eye positions and negative values representing left-side eye positions. The biased cost term could reflect the system’s desire to retain a constant muscle tone. We find that the tuning curves of neurons measured in the oculomotor system (Aksay et al., 2000) (Figure 5A) are similar to the tuning curves in our one-dimensional network calculated using Equation 4 (Figure 5B). In both cases, neurons that encode right-side eye positions have positive slopes, and these follow a recruitment order, where neurons with shallow tuning curve slopes become active before neurons with steep tuning curve slopes (Aksay et al., 2000; Fuchs et al., 1988; Pastor and Gonzalez-Forero, 2003) (Figure 5A, B inset). Now, when neurons with negatively sloped tuning curves are inactivated using lidocaine and muscimol injections (Aksay et al., 2000), the system can compensate so that it can still represent right-side eye positions (Figure 5C). This is consistent with our optimal compensation prediction (Figure 4C inset and Figure 5D). However, the system is unable to represent left-side eye positions (Figure 5C) because all the negatively sloped neurons are knocked out. This is consistent with our recovery boundary prediction (Figure 4C inset and Figure 5D).

**Optimal compensation in networks with bell-shaped tuning curves.** Next, we investigate these compensatory mechanisms in more complicated systems that have bell-shaped tuning curves. We consider two examples: a small sensory system called the cricket (or cockroach) cercal system, which represents wind velocity (Theunissen and Miller, 1991), and the primary visual cortex, which represents oriented edge-like stimuli (Hubel and Wiesel, 1962). Our framework can be generalized to these cases using circular signals embedded in a two-dimensional space, **x** = (cos(*θ*), sin(*θ*)), where *θ* is the orientation of our signal. In the case of the cercal system, *θ* is wind-direction, and in the visual cortex, *θ* represents the orientation of an edge-like stimulus. We must also generalize our signal readout mechanism, so that our neural population can represent the two dimensions of the signal simultaneously:

$$\hat{\mathbf{x}}(t) = \sum_{i=1}^{N} \mathbf{w}_i r_i(t),$$

where **w**_{i} is the two-dimensional linear decoding weight of the *i*^{th} neuron.

As before, we calculate tuning curves using quadratic programming (Equation 4), now for the multivariate case (Equation 28). If our system consists of four neurons with equally spaced readout weights (Figure 6A), we obtain tuning curves that are similar to the cricket (or cockroach) cercal system, with each neuron having a different, preferred wind direction (Theunissen and Miller, 1991) (Figure 6B, E). The similarity between predicted tuning curves (Figure 6B) and measured tuning curves (Figure 6E) suggests that we can interpret cercal system neurons to be optimal for representing wind direction using equally spaced readout weights.
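The four-neuron, cercal-like case can be reproduced with the same non-negative least-squares approach; the weight magnitude and cost parameter are illustrative choices. With equally spaced two-dimensional readout weights, each tuning curve comes out as a truncated cosine peaked at the corresponding weight direction:

```python
import numpy as np
from scipy.optimize import nnls

beta = 1e-3
angles = np.deg2rad([45.0, 135.0, 225.0, 315.0])         # preferred directions
W = 0.1 * np.stack([np.cos(angles), np.sin(angles)])     # 2 x 4 readout weights

def rates(theta):
    # minimize |x(theta) - W r|^2 + beta*|r|^2 subject to r >= 0
    xvec = np.array([np.cos(theta), np.sin(theta)])
    A = np.vstack([W, np.sqrt(beta) * np.eye(4)])
    b = np.concatenate([xvec, np.zeros(4)])
    return nnls(A, b)[0]

thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
tuning = np.array([rates(t) for t in thetas])            # 360 x 4 tuning curves
pref = thetas[tuning.argmax(axis=0)]                     # measured preferred angles
```

Each neuron is silent for directions opposite its readout weight, so the four tuning curves tile the circle with no redundancy, as in the cercal system.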

We can now study how tuning curves in this system should change following cell death (Equation 5). This is analogous to experiments in the cockroach cercal system where a single neuron is killed (Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997). We find that our model system crosses the recovery boundary following the death of a single neuron (Figure 6D), and so, optimal compensation does not produce any changes in the remaining neurons (Figure 6C). This result is identical to measured responses (Figure 6F) (Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997). It occurs because four neurons are required, at a minimum, to represent all four quadrants of the signal space, and so, there are no changes that the remaining neurons can make to improve signal representation. Indeed, our model of the cercal system has no redundancy (see Methods for more details). As such, the cercal system is an extreme system that exists on the edge of the recovery boundary (Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997).

The situation becomes more complicated in the large network example which has sufficient redundancy to compensate for the loss of neurons. When neurons have irregularly spaced readout weights (Figure 6G), we obtain an irregular combination of bell-shaped tuning curves (Figure 6H), similar to those found in the primary visual cortex, with each neuron having a different preferred orientation and maximum firing rate as determined by the value of the decoding weights. If we now kill neurons within a specific range of preferred orientations, then the remaining neurons increase their firing rates and shift their tuning curves towards the preferred orientations of the missing neurons (Figure 6I). In this way, the portion of space that is under-represented following cell death becomes populated by neighboring neurons, which thereby counteract the loss. The signal representation performance of the population is dramatically improved following optimal compensation compared to a system without compensatory mechanisms (Figure 6J).
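The same construction with a larger, redundant ring of neurons illustrates the compensatory change: after silencing the neurons preferring one direction, the surviving neighbors increase their firing at the silenced direction and the population readout remains accurate. All parameters below are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

beta, N = 1e-3, 16
angles = 2.0 * np.pi * np.arange(N) / N                  # preferred directions
W = 0.1 * np.stack([np.cos(angles), np.sin(angles)])     # 2 x N readout weights

def rates(theta, alive):
    # clamped minimization: dead neurons are excluded, survivors r >= 0
    xvec = np.array([np.cos(theta), np.sin(theta)])
    A = np.vstack([W[:, alive], np.sqrt(beta) * np.eye(len(alive))])
    b = np.concatenate([xvec, np.zeros(len(alive))])
    r = np.zeros(N)
    r[alive] = nnls(A, b)[0]
    return r

dead = [15, 0, 1]                     # silence the neurons preferring ~0 rad
alive = np.setdiff1d(np.arange(N), dead)

before = rates(0.0, np.arange(N))     # healthy response at that direction
after = rates(0.0, alive)             # optimally compensated response
```

The neighbors of the silenced group (here, the neurons preferring ±45°) take over the under-represented portion of the circle, mirroring the tuning curve shifts described above.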

**A high-dimensional example: optimal compensation in V1.** The visual cortex represents high-dimensional signals, namely images, that consist of many pixel values. In the previous section, we investigated optimal compensation in a simple model of the visual cortex, in which neurons represent a one-dimensional orientation signal embedded in a two-dimensional space. Such a model is likely to be too simplistic to account for tuning curve changes in experiments. To investigate compensatory mechanisms in a more realistic model of the primary visual cortex, we make a final generalization to systems that can represent high-dimensional image signals (Figure 7A).

The tuning of V1 simple cells can largely be accounted for by assuming that neural firing rates provide a sparse code of natural images (Olshausen and Field, 1996; Olshausen and Simoncelli, 2001). In accordance with this theory, we use decoding weights that are optimized for natural images, and we use a sparsity cost instead of a quadratic cost (see Methods). As before, we obtain the firing rates of our neural population by minimizing our loss function (Equation 4), under the constraint that firing rates must be positive. We find that neurons are tuned to both the orientation and polarity of edge-like images (Figure 7D), where the polarity is either a bright edge with dark flanks, or the opposite polarity - a dark edge with bright flanks. Orientation tuning emerges because natural images typically contain edges at many different orientations, and a sparse code captures these natural statistics (Olshausen and Field, 1996; Olshausen and Simoncelli, 2001). Polarity tuning emerges as a natural consequence of the positivity constraint, because a neuron with a positive firing rate cannot represent edges at two opposing polarities. Similar polarity tuning has been obtained before, but with an additional constraint that decoding weights be strictly positive (Hoyer, 2004, 2003).
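Polarity tuning under the positivity constraint can be illustrated with a toy three-pixel 'edge' decoded by two neurons carrying opposite-sign copies of the same weight pattern; for simplicity this sketch uses a small quadratic cost rather than the sparsity cost of the full model:

```python
import numpy as np
from scipy.optimize import nnls

edge = np.array([-1.0, 2.0, -1.0])        # bright bar with dark flanks
D = np.column_stack([edge, -edge])        # decoding weights, two polarities
beta = 0.01

def rates(image):
    # minimize |image - D r|^2 + beta*|r|^2 subject to r >= 0
    A = np.vstack([D, np.sqrt(beta) * np.eye(2)])
    b = np.concatenate([image, np.zeros(2)])
    return nnls(A, b)[0]

bright = rates(edge)      # bright-edge stimulus
dark = rates(-edge)       # opposite-polarity stimulus
```

Because rates cannot go negative, each stimulus polarity recruits only the neuron whose weights match it, and the other neuron stays exactly silent.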

Finally, we calculate optimal compensation of tuning curves following the silencing of an orientation column (Figure 7E), again using Equation 5. Without optimal compensation, (i.e., without any changes in firing rates), we find that the silencing of an orientation column damages image representation, especially for image components that contain edges parallel to the preferred orientations of the dead neurons (Figure 7B, orange arrow). When the population implements optimal compensation, the firing rates of many neurons change, and the image representation is recovered (Figure 7C). The firing rates of the remaining neurons increase substantially at the preferred orientation of the silenced cells (Figure 8A, C) and the preferred orientations of many cells shift toward the preferred orientation of the silenced cells (Figure 8A, D), similar to the simplified two-dimensional example. Note that firing rates at the anti-preferred orientation also increase (Figure 8A, C). This occurs because neurons represent more than orientation (they represent a full image), and so, neurons that have anti-preferred orientations relative to each other may have similar tuning along other signal dimensions. These compensatory firing rate changes are consistent with experiments in the cat visual cortex, in which orientation columns are silenced using GABA (Crook et al., 1996, 1997, 1998) (Figure 8B and Figure S6). Neurons that escape silencing in the cat visual cortex also shift their tuning curves towards the preferred orientations of the silenced neurons (Figure 8B). In addition, the changes observed in the cat visual cortex are rapid, occurring at the same speed as the GABA-ergic silencing, which is consistent with the rapid compensation that we predict.

These results are largely independent of the precise parameterization of the system, such as the number of neurons chosen and the number of neurons knocked out. A high degree of over-completeness, for instance, will shift the recovery boundary to a larger fraction of knocked-out neurons, but will not change the nature of the compensatory response (Figure S7A, B). At high degrees of over-completeness, quantitative fluctuations in tuning curve responses are averaged out, indicating that optimal compensation becomes invariant to over-completeness (Figure S7C, D). The precise proportion of neurons knocked out in the experimental studies (Crook and Eysel, 1992) is not known. However, for a range of reasonable parameter choices (50%-75% knock out within a single orientation column), our predicted tuning curve changes are consistent with the recorded population response to neuron silencing (Crook and Eysel, 1992) (Figure S6B). Although the existence of a quantitative match is interesting, it is the broad qualitative agreement between our theory and the data from a variety of systems that is most compelling.

## Discussion

To our knowledge, optimal compensation for neuron death has not been proposed before. Usually, cell death or neuron silencing is assumed to be a wholly destructive action, and the immediate neural response is assumed to be pathological, rather than corrective. Synaptic plasticity is typically given credit for recovery of neural function. For example, synaptic compensation has been proposed as a mechanism for memory recovery following synaptic deletion (Horn et al., 1996), and optimal adaptation following perturbations of sensory stimuli (Wainwright, 1999) and motor targets (Braun et al., 2009; Kording et al., 2007; Shadmehr et al., 2010) has also been observed, but on a slow timescale consistent with synaptic plasticity. In this work, we have explored the properties of optimal compensation, and have shown that it may be implemented without synaptic plasticity, on a much faster timescale, through balanced network dynamics.

The principle of optimal compensation that we have proposed here is obviously an idealization, and any putative compensatory mechanism of an actual neural system may be more limited. That said, we have catalogued a series of experiments in which pharmacologically induced tuning curve changes can be explained in terms of optimal compensation. These experiments were not originally conducted to test any specific compensatory mechanisms, and so the results of each individual experiment were explained by separate, alternative mechanisms (Aksay et al., 2007; Crook et al., 1996, 1997, 1998; Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997). The advantage of our optimal compensation theory is that it provides a simple, unifying explanation for all of these experiments.

Our work is built upon a connection between two separate theories: the theory of balanced networks, which is widely regarded as the standard model of cortical dynamics (Renart et al., 2010; van Vreeswijk and Sompolinsky, 1996, 1998), and the theory of efficient coding, which is arguably our most influential theory of neural computation (Barlow, 1961; Bell and Sejnowski, 1997; Greene et al., 2009; Olshausen and Field, 1996; Salinas, 2006). We were able to connect these two theories for two reasons. (1) We show how to derive a tightly balanced spiking network from a quadratic loss function, simplifying the recent work of Boerlin and Denève (2011) and Boerlin et al. (2013) by focusing on the part of their networks that generates a spike-based representation of the information. (2) We show that the firing rates in these spiking networks also obey a quadratic loss function, albeit with a positivity constraint on the firing rates. While the importance of positivity constraints has been noted in other contexts (Hoyer, 2004, 2003; Lee and Seung, 1999; Salinas, 2006), we show here that they have dramatic, previously unappreciated consequences for the shape of tuning curves. Using this connection, we can compute the tuning curves in our network by directly minimizing the loss function under a positivity constraint. This constrained minimization problem, known as quadratic programming, provides a novel link between balanced networks and traditional firing rate calculations. It allows us to think of spiking dynamics as a quadratic programming algorithm, and of tuning curves as its solution. In turn, we obtain a single normative explanation for the polarity tuning of simple cells (Figure 7D), tuning curves in the oculomotor system (Figure 5A, B), and tuning curves in the cricket cercal system (Figure 6A–F), as well as the mechanisms underlying the generation of these tuning curves and the response of these systems to neuronal loss.

Several alternative network models have been proposed that minimize loss functions similar to ours (Charles et al., 2012; Dayan and Abbott, 2001; Hu et al., 2012; Rozell et al., 2008). However, in all of these models, neurons produce positive and negative firing rates (Charles et al., 2012; Dayan and Abbott, 2001; Rozell et al., 2008), or positive and negative valued spikes (Hu et al., 2012). The compensatory response of these systems will be radically different, because oppositely tuned neurons can compensate for each other by increasing their firing rates and changing sign, whereas we predict that only similarly tuned neurons will compensate for each other with increased firing rates. Similar reasoning holds for any efficient coding theory that assumes positive and negative firing rates. In more recent work, it has been found that a spiking network that produces only positive spikes can learn to efficiently represent natural images (King et al., 2013; Zylberberg et al., 2011). Whether these models will support optimal compensation will depend on the specifics of the learned connectivity (see Methods). Since optimal compensation is directly related to balance, we expect any balanced network model that tracks signals (Renart et al., 2010; van Vreeswijk and Sompolinsky, 1996, 1998) to also implement optimal compensation following neuron death. These observations hold for generalizations of our own model in which the signal input **x** is replaced by slow recurrent synaptic inputs that perform more complex computations, such as arbitrary linear dynamical systems (Boerlin et al., 2013). Indeed, recent work on balanced networks that act as integrators has shown robustness to various perturbations, including neuron death (Boerlin et al., 2013; Lim and Goldman, 2013), and can therefore be considered a special case of the optimal compensation principle that we propose here.

Aside from the specific examples studied here, we can use our theory to make a number of important, testable predictions about the impact of neural damage on neural circuits in general. First, we can predict how tuning curves change shape to compensate for neuron death. Specifically, neurons with similar tuning to the dead neurons increase their firing rates, and neurons with dissimilar tuning decrease their firing rates (unless they are already silent). This happens because the remaining neurons automatically seek to carry the informational load of the knocked-out neurons (which is equivalent to maintaining a balance of excitation and inhibition). This is a strong prediction of our theory, and as such, an observation inconsistent with this prediction would invalidate our theory. There have been very few experiments that measure neural tuning before and after neuron silencing, but in the visual cortex, where this has been done, our predictions are consistent with experimental observations (Crook et al., 1996, 1997, 1998) (Figure 8A, B).

Our second prediction is that optimal compensation is extremely fast—faster than the timescale of neural spiking. This speed is possible because optimal compensation is supported by the balance of excitation and inhibition, which responds rapidly to neuron death—just as balanced networks respond rapidly to changes in input (Renart et al., 2010; van Vreeswijk and Sompolinsky, 1996, 1998). We are not aware of any experiments that have explicitly tested the speed of compensation following neuron silencing. In the visual cortex, the pharmacological silencing of neurons is too slow to rule out the possibility that there is some synaptic plasticity (Crook et al., 1996, 1997, 1998). Nonetheless, these experiments are consistent with our prediction, because the observed changes in tuning curve shape are at least as fast as, if not faster than, the pharmacological silencing itself. Ideally, these predictions could be tested using neuronal ablations or optogenetic silencing.

Finally, we predict that all neural systems have a cell death recovery boundary, beyond which neural function disintegrates. Existing measurements from the oculomotor system (Aksay et al., 2007) seem to be consistent with this prediction (Figure 5 and Figure 6A–F). We predict that this recovery boundary coincides with a disruption in the balance of excitation and inhibition. This has not been explored experimentally, although the disruption of balance has recently been implicated in a range of neural disorders, such as epilepsy (Bromfield, 2006) and schizophrenia (Yizhar et al., 2011). Anecdotally, there have been many unreported experiments in which neural ablation has failed to cause a measurable behavioral effect. Our theory suggests that such "failed" lesion experiments may be far more interesting than previously thought, and that the boundary between a measurable and an unnoticeable behavioral effect deserves specific attention. Indeed, the properties of a recovery boundary may shed light on the progression of neurodegenerative diseases—especially those that are characterized by a period of asymptomatic cell death, followed by a dramatic transition to a disabled symptomatic state, as in Alzheimer's disease and stroke (Leary and Saver, 2003). We predict that these transitions occur at the recovery boundary of the diseased system. We also predict that an accumulation of asymptomatic damage, through aging for example, or through acute conditions such as silent stroke, will increase the susceptibility of the brain to symptomatic damage, by moving the system closer to the recovery boundary. If a system were to cross our predicted recovery boundary without any functional deficit, our theory would be invalidated.

These predictions, and more broadly, the principle of optimal compensation that we have developed here, promise to be useful across a number of areas. First, as a neuroscience tool, our work provides a framework for the interpretation of experimental manipulations such as pharmacological silencing (Aksay et al., 2007), lesion studies (Keck et al., 2008) and optogenetic perturbations (Fenno et al., 2011). Second, in the study of neural computation, optimal compensation may be a useful guiding principle, because plausible models of neural computation should be designed specifically to withstand the type of damage that the brain can withstand. Finally, our work provides a new framework for describing how neurodegenerative disease impacts behavior through neural networks, by generalizing the theory of efficient coding from the intact brain state to the damaged brain state (Bredesen et al., 2006; Morrison and Hof, 1997).

## Methods

We have described the properties of optimal compensation, and given a variety of examples of optimal compensation across a range of systems. Here, we present further technical details. First, we describe how we tune a spiking network to represent signals optimally, and we specify our choice of parameters for each figure. For the high-dimensional sparse coding example, we describe how we calculate sparse coding receptive fields and orientation tuning curves, both before and after neuron death. Next, we explain quadratic programming with an analytically tractable example, and we provide further details on our knock-out calculations. Finally, we prove that our spiking model performs optimal compensation. The Matlab code that we used to generate all the figures in this paper, along with all the parameters used for these figures, will be published online.

**Derivation of network model.** In this section, we derive the connectivity and dynamics of a network that can optimally compensate for neuron death. We consider a network of $N$ leaky integrate-and-fire neurons receiving an input $\mathbf{x} = (x_1, \ldots, x_j, \ldots, x_M)$, where $M$ is the dimension of the input and $x_j$ is the $j$-th input. In response to this input, the network produces spike trains $\mathbf{s} = (s_1, \ldots, s_i, \ldots, s_N)$, where $s_i(t) = \sum_m \delta(t - t_i^m)$ is the spike train of the $i$-th neuron and $\{t_i^m\}$ are the spike times of that neuron. Here, we describe the general formulation of our framework for arbitrary networks with $N \geq M$.

A neuron fires a spike whenever its membrane potential exceeds a spiking threshold. We can write this as $V_i > T_i$, where $V_i$ is the membrane potential of neuron $i$ and $T_i$ is the spiking threshold. The dynamics of the membrane potentials are given by:

$$\frac{dV_i}{dt} = -\lambda V_i + \sum_{j=1}^{M} F_{ij}\, g(x_j) + \sum_{k=1}^{N} \Omega_{ik}\, s_k + \sigma_V \eta_i, \qquad (7)$$

where $\Omega_{ik}$ is the connection strength from neuron $k$ to neuron $i$, $F_{ij}$ is the connection strength from input $j$ to neuron $i$, $g(x_j)$ is an operator applied to the input $x_j$, $\lambda$ is the neuron leak, and $\sigma_V$ is the standard deviation of intrinsic neural noise, represented by a Wiener process $\eta_i$ (Dayan and Abbott, 2001; Knight, 1972). The input "signal" $g(x_j)$ is a placeholder for any feedforward input into the network, but also for other recurrent synaptic inputs that are not explicitly modeled here. When a neuron spikes, its membrane potential is reset to $R_i \equiv T_i + \Omega_{ii}$. Note that for ease of presentation, this reset is included in the recurrent connectivity summation of Equation 7. Also, we have written many variables without indicating their time-dependency. For example, the input signals $x_j(t)$, the voltages $V_i(t)$, and the spike trains $s_k(t)$ are all time-dependent quantities, whereas the thresholds $T_i$, the leak $\lambda$, and the connection strengths $\Omega_{ik}$ and $F_{ij}$ are all constants.

We assume that this network provides a representation of the input signal $\mathbf{x}$ using a simple linear decoder:

$$\hat{x}_j = \sum_{k=1}^{N} w_{jk}\, r_k, \qquad (8)$$

where $\mathbf{w}_k$ is the fixed contribution of neuron $k$ to the signal. We call this a vector of readout weights or an "output kernel". This equation is the same as Equation 6 from the main text, and is a more general version of Equation 1 and Equation 3. The instantaneous firing rate of the $k$-th neuron, $r_k$, is a time-dependent quantity that we obtain by filtering the spike train with an exponential filter:

$$r_k(t) = \int_{-\infty}^{t} e^{-(t-t')/\tau}\, s_k(t')\, dt', \qquad (9)$$

where $\tau$ is the time-scale of the filtering. This firing rate definition is particularly informative because it has the form of a simple model of a postsynaptic potential, which is a biologically important quantity. Note that the units of this firing rate are given by $\tau$, so we must multiply by $\tau^{-1}$ to obtain units of Hz. For example, in our two-neuron example (Figure 1A), we want to plot $r_1$ and $r_2$ in units of Hz, and we have used $\tau = 10$ ms, so we must multiply by $\tau^{-1} = 100$ Hz to obtain values in standard units.

Our goal is to tune all the parameters of this network so that it produces appropriate spike trains at appropriate times to provide an accurate representation of the input $\mathbf{x}$, both before and after cell death. We can do this by requiring our network to obey the following rule: at a given time point $t$, each neuron only fires a spike whenever that spike reduces the loss function (Equation 2), where the loss function is now assumed to be generalized to multivariate input signals $\mathbf{x}$ (Boerlin et al., 2013; Bourdoukan et al., 2012),

$$L = (\mathbf{x} - \hat{\mathbf{x}})^2 + C(\mathbf{r}), \qquad (11)$$

with $(\cdot)^2$ denoting an inner product, $\hat{x}_j = \sum_k w_{jk} r_k$, and $C(\mathbf{r}) = \beta \sum_k r_k^2$ a quadratic cost term. Then, since a spike of neuron $i$ changes the signal estimate by $\hat{\mathbf{x}} \rightarrow \hat{\mathbf{x}} + \mathbf{w}_i$ and the firing rates by $r_k \rightarrow r_k + \delta_{ik}$, we can restate this spiking rule for the $i$-th neuron as:

$$(\mathbf{x} - \hat{\mathbf{x}} - \mathbf{w}_i)^2 + C(\mathbf{r} + \boldsymbol{\delta}_i) < (\mathbf{x} - \hat{\mathbf{x}})^2 + C(\mathbf{r}), \qquad (12)$$

where $\delta_{ik}$ is the Kronecker delta and $\boldsymbol{\delta}_i$ the corresponding unit vector. By expanding both sides and canceling equal terms, this can be rewritten as

$$\mathbf{w}_i^T(\mathbf{x} - \hat{\mathbf{x}}) - \beta r_i > \frac{\|\mathbf{w}_i\|^2 + \beta}{2}. \qquad (13)$$

This equation describes a rule under which neurons fire to produce spikes that reduce the loss function. Since a neuron $i$ spikes whenever its voltage $V_i$ exceeds its threshold $T_i$, we can interpret the left-hand side of this spiking condition (Equation 13) as the membrane potential of the $i$-th neuron:

$$V_i = \mathbf{w}_i^T(\mathbf{x} - \hat{\mathbf{x}}) - \beta r_i, \qquad (14)$$

and the right-hand side as the spiking threshold for that neuron:

$$T_i = \frac{\|\mathbf{w}_i\|^2 + \beta}{2}. \qquad (15)$$

We can identify the connectivity and the parameters that produce optimal coding spike trains by calculating the derivative of the membrane potential (as interpreted in Equation 14) and matching the result to the dynamical equations of our integrate-and-fire network (Equation 7):

$$\frac{dV_i}{dt} = \mathbf{w}_i^T\left(\frac{d\mathbf{x}}{dt} - \frac{d\hat{\mathbf{x}}}{dt}\right) - \beta\frac{dr_i}{dt}. \qquad (16)$$

From Equation 8, we obtain $d\hat{x}_j/dt = \sum_k w_{jk}\, dr_k/dt$, and from Equation 9 we obtain $dr_k/dt = -r_k/\tau + s_k$, which yields a simple differential equation for the readout:

$$\frac{d\hat{\mathbf{x}}}{dt} = -\frac{\hat{\mathbf{x}}}{\tau} + \sum_k \mathbf{w}_k s_k. \qquad (17)$$

By inserting these expressions into Equation 16, we obtain:

$$\frac{dV_i}{dt} = \mathbf{w}_i^T\frac{d\mathbf{x}}{dt} + \frac{\mathbf{w}_i^T\hat{\mathbf{x}}}{\tau} - \sum_k\left(\mathbf{w}_i^T\mathbf{w}_k + \beta\delta_{ik}\right)s_k + \frac{\beta r_i}{\tau}. \qquad (18)$$

Finally, using the voltage definition from Equation 14 to write $\mathbf{w}_i^T\hat{\mathbf{x}} = \mathbf{w}_i^T\mathbf{x} - \beta r_i - V_i$, we can replace the second term on the right-hand side and obtain:

$$\frac{dV_i}{dt} = -\frac{V_i}{\tau} + \mathbf{w}_i^T\left(\frac{d\mathbf{x}}{dt} + \frac{\mathbf{x}}{\tau}\right) - \sum_k\left(\mathbf{w}_i^T\mathbf{w}_k + \beta\delta_{ik}\right)s_k. \qquad (19)$$

This equation describes the voltage dynamics of a neuron that produces spikes to represent the signal $\mathbf{x}$. If we now compare this equation with our original integrate-and-fire network, Equation 7, we see that the two are equivalent (up to the noise term) if

$$\Omega_{ik} = -\mathbf{w}_i^T\mathbf{w}_k - \beta\delta_{ik}, \qquad (20)$$

$$F_{ij} = w_{ji}, \qquad (21)$$

$$g(x_j) = \frac{dx_j}{dt} + \lambda x_j, \qquad (22)$$

$$\lambda = \frac{1}{\tau}, \qquad (23)$$

$$T_i = \frac{\|\mathbf{w}_i\|^2 + \beta}{2}. \qquad (24)$$

A network of integrate-and-fire neurons with these parameters and connection strengths can produce spike trains that represent the signal $\mathbf{x}$ to a high degree of accuracy. Elements of this calculation have appeared before (Boerlin et al., 2013), but are reproduced here for the sake of completeness. Also, it has been shown that this connectivity can be learned using a simple spike-timing-dependent plasticity rule (Bourdoukan et al., 2012), so extensive fine-tuning is not required to obtain these spiking networks. We note that the input into the network consists of a combination of the original signal, $x_j$, and its derivative, $dx_j/dt$. In the simulations, we feed in the exact derivative. To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition).
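The derived network can be simulated directly with forward-Euler integration. The following Python sketch (the paper's own code is in Matlab) implements Equation 7 with the derived connectivity; all numerical values are illustrative choices rather than the paper's parameters, and we allow at most one spike per time step, a common convenience when simulating these networks.

```python
import numpy as np

# Minimal simulation of the derived integrate-and-fire network (Equation 7),
# with optimal parameters: F = W^T, Omega = -(W^T W + beta*I), and
# thresholds T_i = (|w_i|^2 + beta)/2. Illustrative values, not the paper's.

rng = np.random.default_rng(0)
M, N = 2, 20                       # signal dimensions, neurons
dt, lam = 1e-4, 10.0               # time step (s), membrane leak
tau = 1.0 / lam                    # readout time constant (lambda = 1/tau)
beta, sigma_V = 1e-4, 1e-3         # quadratic cost, voltage noise

W = rng.normal(size=(M, N)) / N    # output kernels w_k as columns, O(1/N)
Omega = -(W.T @ W + beta * np.eye(N))    # recurrent weights (incl. self-reset)
T = (np.sum(W**2, axis=0) + beta) / 2    # spiking thresholds

steps = 5000
t = np.arange(steps) * dt
x = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # input signal
xdot = np.gradient(x, dt, axis=1)

V, r = np.zeros(N), np.zeros(N)
xhat = np.zeros((M, steps))
for k in range(steps):
    g = xdot[:, k] + lam * x[:, k]       # input term g(x) = dx/dt + lam*x
    V += dt * (-lam * V + W.T @ g) + sigma_V * np.sqrt(dt) * rng.normal(size=N)
    i = int(np.argmax(V - T))            # at most one spike per time step
    if V[i] > T[i]:
        V += Omega[:, i]                 # fast recurrent update and reset
        r[i] += 1.0                      # spike enters the filtered rate
    r -= dt * r / tau                    # exponential filtering (Equation 9)
    xhat[:, k] = W @ r                   # linear readout (Equation 8)

err = np.mean((x[:, steps // 2:] - xhat[:, steps // 2:])**2)
print("mean squared readout error:", err)
```

Once the filtered rates have built up, the readout tracks the two-dimensional signal; each spike is the greedy error-reducing event described in the derivation.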

Our derivation of network dynamics directly from our loss function allows us to interpret the properties of this network in terms of neural function. Each spike can be interpreted as a greedy error reduction mechanism – it moves the signal estimate $\hat{\mathbf{x}}$ closer to the signal $\mathbf{x}$. This error reduction is communicated back to the network through recurrent connectivity, thereby reducing the membrane potentials of the other neurons. The membrane potential, in turn, can be interpreted as a representation error – a linear projection of the error $\mathbf{x} - \hat{\mathbf{x}}$ onto the neuron's output kernel $\mathbf{w}_i$ (Equation 14). Following this interpretation, whenever the error becomes too large, the voltage crosses threshold. This produces a spike, which reduces the error, and so on.

We can also understand these network dynamics in terms of attractor dynamics. This network implements a point attractor – firing rates evolve towards a stable fixed point in $N$-dimensional firing rate space. The location of this point attractor depends on the neural input and the network connectivity. When a neuron dies, the point attractor is projected into the subspace given by $r_k = 0$, where $k$ is the index of the neuron that has died.

Note that in this derivation, we used a quadratic cost. This cost increases the value of the spiking threshold (Equation 24) and the spiking reset (Equation 20). We can also derive network parameters for alternative choices of cost term. For example, if we use a linear cost, we simply need to drop the second term ($\beta\delta_{ik}$) in Equation 20, while keeping all other parameters the same. In other words, we can implement a quadratic cost by increasing the spiking threshold and the spiking reset, and we can implement a linear cost by increasing the spiking threshold without increasing the spiking reset. In this way, the spiking threshold and the reset determine the cost function. It is conceivable that these variables may be learned, just as network connectivity may be learned. Alternatively, these values may be predetermined for various brain areas, depending on the computational target of each brain area.

**Tuning curves and quadratic programming.** For constant input signals, the instantaneous firing rates of the neurons will fluctuate around some mean value. Our goal is to determine this mean value. We will start with the most general case, and rewrite the instantaneous firing rates as $r_k(t) = \bar{r}_k + \xi_k(t)$, where $\xi_k(t)$ is a zero-mean 'noise' term that captures the fluctuations of the instantaneous firing rate around its mean value. Note that these fluctuations may depend on the neuron's mean rate. In turn, neglecting the costs for a moment, we can average the objective function, Equation 11, over time to obtain

$$\langle L \rangle = \Big(\mathbf{x} - \sum_k \mathbf{w}_k\bar{r}_k\Big)^2 + \sum_{k,l}\mathbf{w}_k^T\mathbf{w}_l\,\langle\xi_k\xi_l\rangle.$$

For larger networks, we can assume that the spike trains of different neurons are only weakly correlated, so that $\langle\xi_k\xi_l\rangle \approx \delta_{kl}\sigma_k^2$, where $\sigma_k^2$ is the variance of the noise term. We obtain

$$\langle L \rangle = \Big(\mathbf{x} - \sum_k \mathbf{w}_k\bar{r}_k\Big)^2 + \sum_k \|\mathbf{w}_k\|^2\sigma_k^2.$$

We furthermore notice that the spike train statistics are often Poisson (Boerlin et al., 2013), in which case we can make the replacement $\sigma_k^2 \propto \bar{r}_k$. The loss function then becomes a quadratic function of the mean firing rates, which needs to be minimized under a positivity constraint on the mean firing rates. This type of problem is known as 'quadratic programming' in the literature. In this study, we generally focused on networks for which the contributions of the second term can be neglected, which is generally the case for sufficiently small readout weights and membrane voltage noise (see Figure S5). In this case, we obtain the multivariate version of the loss function used in Equation 4, with the costs added back in. In general, quadratic programming has no closed-form solution, so the objective function must be minimized numerically. However, in networks with a small number of neurons, we can solve the problem analytically and gain some insight into the nature of quadratic programming. Here, we do this for the two-neuron example (Figure 3A).
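The resulting quadratic program can be solved with off-the-shelf tools. One standard trick, sketched below in Python, absorbs a quadratic cost $\beta\|\mathbf{r}\|^2$ into an augmented non-negative least-squares problem; the weights and signal are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def optimal_rates(W, x, beta):
    """Minimize ||x - W r||^2 + beta ||r||^2 subject to r >= 0.

    The cost is absorbed by augmenting the system, since
    ||x - W r||^2 + beta ||r||^2 = ||[x; 0] - [W; sqrt(beta) I] r||^2,
    which non-negative least squares (NNLS) then solves exactly.
    """
    N = W.shape[1]
    A = np.vstack([W, np.sqrt(beta) * np.eye(N)])
    b = np.concatenate([x, np.zeros(N)])
    r, _ = nnls(A, b)
    return r

# Toy example with illustrative numbers:
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 20))       # 20 neurons, 2 signal dimensions
x = np.array([1.0, -0.5])
r = optimal_rates(W, x, beta=1e-3)
print("active neurons:", np.sum(r > 0), "of", r.size)
print("reconstruction:", np.round(W @ r, 3))
```

Only a subset of neurons is active at the optimum, which is exactly the effect of the positivity constraint on tuning curves discussed in the main text.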

In this example, the mean firing rates $\bar{r}_1$ and $\bar{r}_2$ are given by the solution to the following equation:

$$(\bar{r}_1, \bar{r}_2) = \underset{\bar{r}_1 \geq 0,\ \bar{r}_2 \geq 0}{\operatorname{argmin}}\ \big(x - w_1\bar{r}_1 + w_2\bar{r}_2\big)^2 + C(\bar{r}_1, \bar{r}_2). \qquad (29)$$

The positivity constraint partitions the solution of this equation into three regions, determined by the value of $x$: region 1, where $\bar{r}_1 = 0$ and $\bar{r}_2 > 0$; region 2, where $\bar{r}_1 > 0$ and $\bar{r}_2 > 0$; and region 3, where $\bar{r}_1 > 0$ and $\bar{r}_2 = 0$ (Figure S2A). In region 2, we can easily solve Equation 29 by setting the derivative of our loss function to zero. This gives a solution that is a linear function of $x$ along the readout direction $\mathbf{w} = (w_1, -w_2)$, plus a bias along the direction $\mathbf{c} = (1,1)/2$. Looking at this solution, we see that $\bar{r}_2$ reaches zero when $x$ rises above an upper boundary value, and $\bar{r}_1$ reaches zero when $x$ falls below a lower boundary value, so the region 2 solution is only valid between these boundaries. Above the upper boundary, the positivity constraint in Equation 29 enforces $\bar{r}_2 = 0$; this is region 3, and we can calculate $\bar{r}_1$ by setting $\bar{r}_2 = 0$ and then minimizing the loss function over $\bar{r}_1$ alone. Similarly, below the lower boundary we have $\bar{r}_1 = 0$; this is region 1, and we obtain $\bar{r}_2$ in the same way. The firing rates within each region are given by a simple linear projection of $x$, although the size and direction of this projection is different in each region. As such, the solution to this quadratic programming problem is a piece-wise linear function of $x$.

In networks with larger numbers of neurons, the solution will still be a piece-wise linear function of $x$, although there will be more regions, and the firing rate solutions will be more complicated because more neurons are simultaneously active. In contrast, the transformation from firing rates to $\hat{x}$ is very simple (Equation 8): it is a single linear transformation, and it is region-independent (Figure S2B).

**Readout weights and cost terms: 1-d and 2-d example.** There are a number of free parameters in our model: the cost terms and the readout weights $\{w_{jk}\}$. The choice of these values determines the precise shape of the tuning curves. In general, however, the precise values of these terms have little influence on the coding capability of our system, once certain properties have been satisfied, which we outline here.

The cost term that we use for our first example (Figure 1), and for our 2-d bump-shaped tuning curves (Figure 6 and Figure S7C, D), is a quadratic cost, $C(\mathbf{r}) = \beta\sum_k r_k^2$. (Here, and in the following, we will no longer distinguish $r_k$, the instantaneous firing rate used in the spiking network, and $\bar{r}_k$, the mean firing rate used in quadratic programming. Both notations apply depending on the context.) This cost term encourages the system to find a solution in which all neurons share in the signal representation. For our homogeneous monotonic tuning curve examples (Figure 3A, B and Figures S2 and S3A), we use a homogeneous quadratic cost, $C(\mathbf{r}) = \beta\sum_k r_k^2 + \beta_2\big(\tfrac{1}{N}\sum_k r_k - r_B\big)^2$, which biases the system towards a background firing rate value of $r_B$, where $\beta_2/\beta$ represents the extent of this bias. For our inhomogeneous monotonic tuning curve examples (Figures 2, 3C, 4, 5B, D, 6G–J and Figures S1, S3B, C and S5), we use a slightly more general *biased* quadratic cost, $C(\mathbf{r}) = \beta\sum_k r_k^2 + \beta_2\big(\sum_k c_k r_k - r_B\big)^2$. This is similar to the homogeneous cost, but with a sum weighted by $\{c_k\}$ rather than $1/N$. This allows us to introduce some inhomogeneity into our system. The additional bias term in this heterogeneous cost is equivalent to an additional readout dimension. To see this, consider a 1-dimensional system with loss $L = (x - \sum_k w_k r_k)^2 + \beta\sum_k r_k^2 + \beta_2(\sum_k c_k r_k - r_B)^2$. We can rewrite this as a 2-dimensional system with loss $L = (\tilde{\mathbf{x}} - \sum_k \tilde{\mathbf{w}}_k r_k)^2 + \beta\sum_k r_k^2$, where $\tilde{\mathbf{x}} = (x, \sqrt{\beta_2}\,r_B)$ and $\tilde{\mathbf{w}}_k = (w_k, \sqrt{\beta_2}\,c_k)$. Throughout this work, our choice of cost function is motivated by our goal of making direct comparisons to data from various systems. For example, in the oculomotor system, the population firing rate is approximately constant (Aksay et al., 2000) for all values of $x$, a property that may facilitate constant muscle tone in the eye. We accommodate this property by choosing a biased quadratic cost term. Nonetheless, our quadratic programming predictions about optimal compensation are qualitatively similar regardless of our cost choice (Figure 4 and Figure S4).
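The equivalence between the bias term and an extra readout dimension can be checked numerically. The sketch below assumes the biased cost takes the form $\beta\sum_k r_k^2 + \beta_2(\sum_k c_k r_k - r_B)^2$, which is our reading of the cost described above; all numbers are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def solve(W, x, beta):
    # min_{r >= 0} ||x - W r||^2 + beta ||r||^2, via augmented NNLS
    N = W.shape[1]
    A = np.vstack([W, np.sqrt(beta) * np.eye(N)])
    return nnls(A, np.concatenate([x, np.zeros(N)]))[0]

rng = np.random.default_rng(2)
N, beta, beta2, r_B = 8, 1e-3, 0.5, 2.0
w = rng.uniform(0.1, 1.0, N) * np.sign(rng.standard_normal(N))  # 1-d readout
c = np.full(N, 1.0 / N)                                         # cost weights
x = 0.7

# Fold the bias into a second signal dimension: x~ = (x, sqrt(beta2)*r_B) and
# w~_k = (w_k, sqrt(beta2)*c_k), leaving only the plain quadratic cost.
W2 = np.vstack([w, np.sqrt(beta2) * c])
x2 = np.array([x, np.sqrt(beta2) * r_B])
r = solve(W2, x2, beta)

loss_biased = (x - w @ r)**2 + beta * r @ r + beta2 * (c @ r - r_B)**2
loss_augmented = np.sum((x2 - W2 @ r)**2) + beta * r @ r
print(loss_biased, loss_augmented)   # the two losses agree
```

The two loss values are identical by construction, so any solver for the plain quadratic cost also handles the biased cost.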

The other important parameters that we must choose are the readout weights. For the sake of consistency and ease of comparison, we use similarly valued readout weights across several figures (Figures 2, 3C, 4, 5B, D and Figures S1, S4 and S5A, C). We have chosen values that are regularly spaced, with the addition of some random noise. Specifically, we set the readout to $w_i = [w_{\min} + (w_{\max} - w_{\min})(i-1)/(N/2 - 1)]\,\xi_i$ for $i \in [1, N/2]$, and $w_j = -[w_{\min} + (w_{\max} - w_{\min})(j - N/2 - 1)/(N/2 - 1)]\,\xi_j$ for $j \in [N/2 + 1, N]$, where $\xi_i$ is a random number drawn from the interval $(0.5, 1.5)$. The parameter values that we use are chosen so that we can make direct comparisons between our framework and the oculomotor system (Figure 5A, B). These values are $w_{\min} = 0.01$ and $w_{\max} = 0.05$. It is interesting to observe that the tuning curves of the oculomotor system are consistent with a random readout. Indeed, we find that there are many possible parameter values that produce systems with similar performance levels (Figure S3). This suggests that the performance of the oculomotor system is adequate, even with a random readout and the resulting random (if symmetric) connectivity. For our bump-shaped tuning curve example, the parameter values that we use are plotted in Figure 6A, G.
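The weight construction described above can be written compactly. The sketch below assumes the multiplicative noise $\xi_i$ is drawn uniformly from the interval (0.5, 1.5), which is our reading of the noise term.

```python
import numpy as np

def readout_weights(N, w_min=0.01, w_max=0.05, rng=None):
    """Regularly spaced readout magnitudes between w_min and w_max,
    positive for the first half of the population and negative for the
    second half, jittered by multiplicative noise xi in (0.5, 1.5)."""
    if rng is None:
        rng = np.random.default_rng(0)
    half = N // 2
    spacing = w_min + (w_max - w_min) * np.arange(half) / (half - 1)
    xi = rng.uniform(0.5, 1.5, size=N)
    return np.concatenate([spacing, -spacing]) * xi

w = readout_weights(20)
print(w)   # first half positive, second half negative
```

The two oppositely signed halves give the two oppositely tuned populations used in the oculomotor comparison.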

All of our quadratic programming predictions for tuning curve shape, and our optimal compensation predictions for tuning curve shape change, still hold for alternative choices of readout weights (Figure 3 and Figure S3), once the following properties are satisfied. First, it is important that the readout vectors span the space of the signal that we want our network to represent. Otherwise, the system will not be able to represent signals along certain directions, or compensate for neuron death. Second, it is important to set the scale of the readout so that the cost does not dominate. There is a natural scaling that we can use to avoid such problems. We require the size of the firing rates to be independent of network size, $r_k \sim \mathcal{O}(1)$, so we must have $w_{jk} \sim \mathcal{O}(1/N)$. As a consequence, the off-diagonal elements of the recurrent connectivity are small, $\Omega_{ik} \sim \mathcal{O}(1/N^2)$ for $i \neq k$ (Equation 20), and if we assume that the diagonal elements are on the same order of magnitude, then $\beta \sim \mathcal{O}(1/N^2)$. This scaling provides a principled basis for parameter choices in our model.

We may also want to scale our decoder weights without changing the shape of our tuning curve prediction (Figure S5B). To do this, the spiking cost parameter $\beta$ and the membrane potential leak $\lambda$ must be scaled together. Specifically, if the readout weights are given by $\{\alpha \times w_{jk}\}$, where $\alpha$ is a scaling parameter that characterizes the size of the decoder weights and $\{w_{jk}\}$ are fixed decoder weights, then the spiking cost parameter must be set to $\alpha^2 \times \beta$ and the membrane potential leak must be set to $\alpha \times \lambda$. We can see that this preserves the shape of tuning curves by looking at the resulting structure of our loss function (Equation 2):

$$L = \Big(\mathbf{x} - \alpha\sum_k \mathbf{w}_k r_k\Big)^2 + \alpha^2\beta\sum_k r_k^2.$$

As before, the minimum of this loss function gives firing rates in units of the membrane potential leak ($\alpha\lambda$). Therefore, we must divide $\mathbf{r}$ by $\alpha\lambda$ to obtain firing rates in units of Hz. Our loss function then becomes:

$$L = \Big(\mathbf{x} - \frac{1}{\lambda}\sum_k \mathbf{w}_k \rho_k\Big)^2 + \frac{\beta}{\lambda^2}\sum_k \rho_k^2,$$

where $\rho_k = \alpha\lambda r_k$ denotes the firing rate in Hz. This loss function is independent of $\alpha$, and so, using this scaling, the optimal tuning curves will have the same shape for all values of $\alpha$.

**Readout weights and cost terms: V1 example.** There are many possible choices of decoder weights $\{\mathbf{w}_i\}$ that provide a faithful representation of a signal. In positive sparse coding, we choose the decoder weights that provide the most efficient signal representation, for a sparse cost term, $C(\mathbf{r}) = \beta\sum_k r_k$, under the constraint that firing rates must be positive. Here, we describe how we calculate these positive sparse coding weights, which we use in several of our figures (Figures 7, 8 and Figures S6, S7).

We use the signal vector $\mathbf{x} = (x_1, \ldots, x_j, \ldots, x_M)$ to denote an image patch, where each element $x_j$ represents a pixel from the image patch (Olshausen and Field, 1996). We quantify the efficiency of a sparse representation using the following loss function:

$$L = \Big\langle \Big(\mathbf{x} - \sum_k \mathbf{w}_k r_k\Big)^2 + \beta\sum_k r_k \Big\rangle, \qquad (33)$$

where $\langle\cdot\rangle$ denotes an average across image patches. This is simply Equation 2 with a sparse cost term, averaged over image patches. The first term in this loss function is the image representation error. The second term quantifies the sparsity of the representation. The decoding filters that minimize this loss function will be the positive sparse coding filters for natural images.

We assume that the decoding filters are optimized to represent natural images, such as forest scenes, flowers, sky, water, and other images from nature. Natural images are chosen because they are representative of the images that have surrounded animals throughout evolution. We randomly select 2000 12 × 12 image patches from eight natural images taken from Van Hateren's Natural Image Database (van Hateren and van der Schaaf, 1998). These images are preprocessed by removing low-order statistics, so that our sparse coding algorithm can more easily learn the higher-order statistical structure of natural images. First of all, images are centered so as to remove first-order statistics:

$$\mathbf{x} \rightarrow \mathbf{x} - \langle\mathbf{x}\rangle. \qquad (34)$$

Next, images are whitened, so as to remove second-order statistics:

$$\mathbf{x} \rightarrow \mathbf{M}^{-1/2}\,\mathbf{x}, \qquad (35)$$

where $\mathbf{M} = \langle\mathbf{x}\mathbf{x}^T\rangle$, so that $\mathbf{M}^{-1/2}$ is a decorrelating matrix.
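The two preprocessing steps can be sketched as follows. The whitening uses the eigendecomposition of the patch covariance; removing the per-pixel mean across patches is our reading of 'centering', and toy data stands in for the image patches.

```python
import numpy as np

def center_and_whiten(X, eps=1e-8):
    """Center and whiten patches. X has shape (M, P): M pixels, P patches."""
    X = X - X.mean(axis=1, keepdims=True)     # remove first-order statistics
    C = X @ X.T / X.shape[1]                  # patch covariance <x x^T>
    evals, evecs = np.linalg.eigh(C)
    Wm = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T   # M^{-1/2}
    return Wm @ X                             # remove second-order statistics

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 5)) @ rng.normal(size=(5, 2000))  # correlated toy data
Xw = center_and_whiten(X)
print(np.round(Xw @ Xw.T / Xw.shape[1], 2))   # approximately the identity
```

After whitening, the patch covariance is the identity, so only higher-order structure remains for the sparse coding algorithm to learn.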

We calculate sparse coding filters by minimizing the loss function (Equation 33) using a two-step procedure. First, for a subset of 50 image patches, we calculate the firing rates that minimize the loss function under the constraint that firing rates must be positive:

$$\mathbf{r} = \underset{\mathbf{r} \geq 0}{\operatorname{argmin}}\ \Big(\mathbf{x} - \sum_k \mathbf{w}_k r_k\Big)^2 + \beta\sum_k r_k. \qquad (36)$$

The positivity constraint reduces the representational power of the neural population, so, to counteract this, we use a large population containing twice as many neurons as signal dimensions, $N = 2M$, and we initialize our decoder matrix to $w_{jk} = \delta_{j,k} - \delta_{j,k-M}$, to ensure that our neural population can easily span the signal space throughout the sparse coding calculation.

Next, we update our kernels by stochastic gradient descent on the loss function, using the optimal firing rates calculated in the previous step:

$$\Delta w_{jk} \propto \big\langle (x_j - \hat{x}_j)\, r_k \big\rangle. \qquad (37)$$

Kernel weights are normalized for each neuron, so that they do not become arbitrarily large. This calculation is similar to sparse coding calculations performed before (Olshausen and Field, 1996), except that we enforce the additional requirement that firing rates must be positive. A similar constraint was used before in sparse non-negative matrix factorization (Hoyer, 2004, 2003). However, that method requires the additional constraint that decoding weights be strictly positive, so that non-negative matrix factorization can be applied. This additional constraint will reduce the efficiency of the representation, because it precludes solutions that may be more efficient but that violate the constraint.
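The two-step procedure can be sketched as an alternating loop. Projected gradient descent stands in for the rate inference of Equation 36; it is our choice of solver, not necessarily the paper's, and the data and parameters are toy stand-ins for the whitened image patches.

```python
import numpy as np

def infer_rates(W, x, beta, n_iter=200):
    """min_{r >= 0} ||x - W r||^2 + beta * sum(r), by projected gradient."""
    r = np.zeros(W.shape[1])
    lr = 0.5 / np.linalg.norm(W.T @ W, 2)     # safe step size
    for _ in range(n_iter):
        grad = -2 * W.T @ (x - W @ r) + beta
        r = np.maximum(r - lr * grad, 0.0)    # positivity constraint
    return r

def sparse_coding(X, N, beta=0.1, eta=0.05, epochs=5, rng=None):
    if rng is None:
        rng = np.random.default_rng(4)
    M, P = X.shape
    W = np.hstack([np.eye(M), -np.eye(M)])[:, :N]   # [I, -I] initialization
    for _ in range(epochs):
        for p in rng.permutation(P):
            x = X[:, p]
            r = infer_rates(W, x, beta)             # step 1: positive rates
            W += eta * np.outer(x - W @ r, r)       # step 2: gradient step
            W /= np.maximum(np.linalg.norm(W, axis=0), 1e-8)  # normalize
    return W

X = np.random.default_rng(5).normal(size=(4, 50))   # toy 'patches'
W = sparse_coding(X, N=8)
print(np.round(np.linalg.norm(W, axis=0), 3))       # unit-norm kernels
```

On whitened natural image patches, this loop is the kind of procedure that yields oriented, Gabor-like positive sparse coding filters.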

To compare the properties of optimal compensation in this positive sparse coding model to experiment, we calculate the orientation tuning of neurons in our model using a protocol similar to that used in the experimental work (Crook and Eysel, 1992). Specifically, we drive our model using an oriented, edge-like stimulus. These stimuli are Gabor filters, with a profile similar to the sparse coding filters. We calculate the firing rate response of each neuron using Equation 36 for 16 different stimulus orientations. At each orientation, the Gabor filter is positioned at regularly spaced locations along a line perpendicular to the Gabor filter orientation, and the maximum firing rate of each neuron along this line is taken as the firing rate response of that neuron at that orientation. These firing rate responses form the orientation tuning curves that we compare to experimental recordings.
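The tuning-curve protocol can be sketched as below. A rectified linear response stands in for the paper's Equation 36, and the Gabor parameters (envelope width, spatial frequency, probe positions) are illustrative choices.

```python
import numpy as np

# Probe a model neuron with oriented Gabor stimuli at 16 orientations,
# sliding each Gabor along the line perpendicular to its orientation and
# keeping the maximum response at each orientation.
def gabor(size, theta, center, sigma=2.0, freq=0.25):
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    xr = (xs - center[0]) * np.cos(theta) + (ys - center[1]) * np.sin(theta)
    patch = np.exp(-((xs - center[0])**2 + (ys - center[1])**2)
                   / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)
    return patch.ravel()

def tuning_curve(filt, size=12, n_orient=16, n_pos=9):
    rates = np.zeros(n_orient)
    for i, theta in enumerate(np.linspace(0, np.pi, n_orient, endpoint=False)):
        perp = np.array([-np.sin(theta), np.cos(theta)])   # perpendicular line
        for t in np.linspace(-4, 4, n_pos):
            center = size / 2 + t * perp
            r = max(filt @ gabor(size, theta, center), 0.0)  # rectified response
            rates[i] = max(rates[i], r)                      # max over positions
    return rates

filt = gabor(12, np.pi / 4, (6, 6))   # a neuron with a 45-degree Gabor filter
curve = tuning_curve(filt)
```

For a neuron whose filter is itself a Gabor, the resulting curve peaks at the filter's preferred orientation and falls off toward the orthogonal orientation, which is the shape compared against the recordings.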

**The impact of neuron death.** We calculate the impact of neuron death on tuning curve shape with and without optimal compensation (Figure 4). To calculate tuning curve shape without compensation we solve Equation 4 in the intact state. We then constrain the firing rates of neurons selected for death to zero and calculate the impact of this on our signal representation. To calculate the impact of neuron death with optimal compensation, we solve Equation 5.

We also calculate the impact of neuron death and optimal compensation in an over-complete positive sparse coding model (Figure S7A). To do this, we define an over-completeness factor, *M*, given by *M* = *N/d*, where *N* is the number of neurons in the representation and *d* is the signal dimension. In the sparse coding calculation above, we had *M* = 2. We can use this to define *N*_{2} = 2*d*, the size of that population, with sparse coding readout vectors **w**^{*}_{a}, *a* ∈ [1, *N*_{2}]. The over-complete readout vectors that we use are similar to the sparse coding readout vectors, but with some multiplicative noise (to avoid parallel readout weights). Specifically, we obtain new over-complete readout vectors using **w**_{a+mN_{2}} = **w**^{*}_{a}(1 + *η*), where *η* is multiplicative noise, *a* ∈ [1, *N*_{2}], and *m* ∈ [0, *M*/2 − 1]. We use this method to calculate readout weights for over-complete representations because the sparse coding calculation becomes computationally prohibitive for large values of *M*.
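The tiling-with-noise recipe can be sketched in a few lines; the sizes and the noise scale here are illustrative, not the paper's values.

```python
import numpy as np

# Build an over-complete readout by tiling the base readout vectors M/2
# times, perturbing each copy with multiplicative noise so that no two
# readout vectors are exactly parallel.
rng = np.random.default_rng(6)
d, M = 4, 8                        # signal dimension, over-completeness factor
N2 = 2 * d                         # size of the base (M = 2) population
w_base = rng.normal(size=(d, N2))  # base sparse-coding readout vectors
copies = [w_base * (1 + 0.05 * rng.normal(size=(d, N2)))
          for _ in range(M // 2)]  # one noisy copy per value of m
w_over = np.concatenate(copies, axis=1)   # shape (d, M*d), i.e. N = M*d neurons
```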

**The impact of neuron death in a spiking network.** We investigate the impact of neuron death in our spiking model by simulating the network both before and after neuron death (Figure 2). Specifically, we use the Euler method to iterate the dynamics described by Equation 19, and we measure the readout **x̂**, the spike trains **s**, and the membrane potentials **V**. We simulate the death of a neuron by setting all the connections to and from that neuron to zero. This is equivalent to silencing a neuron. We continue to measure the network activity and the readout after neuron death.
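A minimal Euler simulation of this kind can be sketched as follows. The parameters assume the efficient-coding construction used throughout (feed-forward weights **w**^{T}, recurrent weights −**w**^{T}**w**, thresholds ‖**w**_{i}‖²/2, cf. Equation 24) in place of Equation 19's exact form; the network size, decoder scale, signal, and rate constants are illustrative.

```python
import numpy as np

# Euler-method sketch: simulate the network, silence a third of the
# neurons mid-run by cutting their connections, and keep tracking the
# readout xhat.
rng = np.random.default_rng(2)
d, N, dt, lam = 2, 30, 1e-3, 10.0
w = rng.normal(size=(d, N))
w = w / np.linalg.norm(w, axis=0) * (10.0 / N)   # decoder weights ~ O(1/N)
Omega = -w.T @ w                                 # recurrent connectivity
Thresh = 0.5 * np.sum(w**2, axis=0)              # spiking thresholds

x = np.array([1.0, 0.5])                         # constant target signal
c = lam * x                                      # input command producing x
V, xhat = np.zeros(N), np.zeros(d)
alive = np.ones(N)

n_steps = 2000
for step in range(n_steps):
    if step == n_steps // 2:                     # neuron death: cut all
        alive[: N // 3] = 0.0                    # connections of a third
    s = np.zeros(N)
    gap = np.where(alive > 0, V - Thresh, -np.inf)
    i = int(np.argmax(gap))                      # at most one spike per step
    if gap[i] > 0:
        s[i] = 1.0
    V = alive * (V + dt * (-lam * V + w.T @ c) + Omega @ s)
    xhat = xhat + dt * (-lam * xhat) + w @ s
```

After the kill step, the surviving neurons' recurrent dynamics keep the readout near the target without any change to the weights, which is the compensation effect described in the text.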

The integrate-and-fire model that we have described can produce spikes at arbitrarily large firing rates. This can be a problem, especially when neurons die and the remaining neurons compensate with extremely high, unrealistic firing rates. To avoid this, we include a form of adaptation in our spiking model. Specifically, we extend our spiking rule so that a neuron *i* will only spike if its firing rate *f*_{i} is lower than a maximum *f*_{max}. Here, the firing rate *f*_{i} is given by the following differential equation:

*τ*_{A} *df*_{i}/*dt* = −*f*_{i} + *s*_{i},

where *s*_{i} is the spike train of neuron *i* and *τ*_{A} is the timescale of adaptation. This timescale is much slower than the timescale of spiking dynamics *τ*.

We use this extended neural model to calculate the recovery boundary in a system that is optimized to represent a 1-d variable (Figure S1). We kill neurons in random order until the representation error averaged across signal values is more than 10% larger than the representation error in an identical system without adaptation constraints. We then average across trials to obtain a value for the recovery boundary. This working definition captures the serious degradation in representation performance that characterizes the notion of a recovery boundary.

We can also calculate the firing rates and recovery boundary of our extended integrate-and-fire model using quadratic programming (data not shown). Specifically, the firing rates of this network are given by the solution of the following optimization problem:

**r** = argmin_{**r** ∈ *R*} *E*(**r**),

where *E* is the loss function and *R* denotes the set of constraints that the firing rates must satisfy:

*R* = {**r** : 0 ≤ *r*_{i} ≤ *f*_{max} ∀ *i*, and *r*_{i} = 0 ∀ *i* ∈ *X*},

and where *X* is the set of all dead (or silenced) neurons.
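This constrained problem can be sketched with a simple projected-gradient solver. The loss below (readout error plus a small quadratic cost) and the solver are stand-ins for the paper's loss function and QP method; only the constraint structure (box constraints 0 ≤ r_i ≤ f_max, with r_i = 0 on the dead set X) follows the text.

```python
import numpy as np

# Solve for firing rates under positivity, a maximum rate, and silenced
# neurons, by projected gradient descent on a quadratic loss.
rng = np.random.default_rng(3)
d, N, f_max = 2, 20, 50.0
w = rng.normal(size=(d, N))                      # readout weights
x = np.array([3.0, -1.0])                        # target signal
dead = np.zeros(N, dtype=bool)
dead[:2] = True                                  # the set X of dead neurons

r = np.zeros(N)
for _ in range(5000):                            # projected gradient descent
    grad = -2.0 * w.T @ (x - w @ r) + 2e-3 * r   # loss gradient
    r = np.clip(r - 0.01 * grad, 0.0, f_max)     # project onto 0 <= r <= f_max
    r[dead] = 0.0                                # project onto r_i = 0, i in X

residual = np.sum((x - w @ r) ** 2)
```

Killing more neurons shrinks the feasible set *R*; the recovery boundary is reached once no feasible rate vector achieves a small residual.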

**Tight balance of excitation and inhibition.** In simulations of our spiking model, we see that the balance of excitation and inhibition coincides with optimal coding (Figure 2D, G). We define the excitation *E*_{i} received by a neuron as the total input current received through excitatory synapses, and the inhibition *I*_{i} as the total input current received through inhibitory synapses:

*E*_{i} = Σ_{j} [*F*_{ij}*x*_{j}]^{+} + Σ_{k} [Ω_{ik}]^{+} *r*_{k},

*I*_{i} = Σ_{j} [*F*_{ij}*x*_{j}]^{−} + Σ_{k} [Ω_{ik}]^{−} *r*_{k},

where [Ω_{ik}]^{−} = 0 unless Ω_{ik} < 0, [*F*_{ij}*x*_{j}]^{−} = 0 unless *F*_{ij}*x*_{j} < 0, [Ω_{ik}]^{+} = 0 unless Ω_{ik} > 0, and [*F*_{ij}*x*_{j}]^{+} = 0 unless *F*_{ij}*x*_{j} > 0. We can understand why excitation and inhibition are balanced during optimal coding by first observing that the value of the loss function is small during optimal coding. Consequently, the membrane potential is small, because the membrane potential is a linear transformation of the coding error (Equation 14). Finally, for membrane potentials to be small, there must be a balance of excitation and inhibition (Equation 14, Equation 41, Equation 42). As such, the balance of excitation and inhibition is a signature of optimal coding.
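The bookkeeping behind these definitions can be sketched directly; the connectivity and rates below are random illustrations, with `Omega` the recurrent and `F` the feed-forward weights.

```python
import numpy as np

# Split each neuron's input current into the part arriving through
# excitatory synapses (E) and through inhibitory synapses (I).
rng = np.random.default_rng(4)
N, M = 8, 3
Omega = rng.normal(size=(N, N))   # recurrent weights
F = rng.normal(size=(N, M))       # feed-forward weights
r = rng.uniform(size=N)           # firing rates (non-negative)
x = rng.normal(size=M)            # input signal

ff = F * x                        # per-synapse feed-forward currents F_ij x_j
E = np.clip(Omega, 0, None) @ r + np.clip(ff, 0, None).sum(axis=1)
I = np.clip(Omega, None, 0) @ r + np.clip(ff, None, 0).sum(axis=1)

total = E + I                     # equals the full synaptic input current
```

By construction, E ≥ 0, I ≤ 0, and their sum recovers the total input current, so a small membrane potential requires E and I to cancel almost exactly.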

We can also understand this balance using scaling arguments. As we have discussed, the recurrent connection strengths are small, of order O(1/*N*^{2}). However, the spiking thresholds are also small, of order O(1/*N*^{2}), because the decoder weights are of order O(1/*N*) and the thresholds are quadratic in the decoder weights (Equation 24). Now, because recurrent input is summed across the entire population of *N* neurons, we find that the total recurrent input is of order O(1/*N*). This is *N* times larger than the spiking threshold. In order for the membrane potential to be on the same order of magnitude as the spiking threshold, there must be a precise cancellation of excitatory and inhibitory inputs. This balance of excitation and inhibition is much tighter than in balanced networks with random connectivity (van Vreeswijk and Sompolinsky, 1996, 1998). In randomly connected networks the balance between excitation and inhibition is of order O(1/√*N*) (van Vreeswijk and Sompolinsky, 1998), whereas in the tightly balanced networks that we consider, these fluctuations are of order O(1/*N*).
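The scaling argument can be checked numerically; the decoder scale and the O(1) firing rates below are illustrative assumptions.

```python
import numpy as np

# With decoder weights of order 1/N, recurrent weights and thresholds
# scale as 1/N^2, while the summed magnitude of recurrent input scales
# as 1/N: N times the threshold. The ratio should grow linearly with N.
ratios = {}
for N in (100, 1000):
    rng = np.random.default_rng(N)
    w = rng.normal(size=(3, N)) / N          # decoder weights ~ O(1/N)
    Omega = -w.T @ w                         # recurrent weights ~ O(1/N^2)
    Thresh = 0.5 * np.sum(w**2, axis=0)      # thresholds ~ O(1/N^2)
    drive = np.abs(Omega) @ np.ones(N)       # summed input magnitude ~ O(1/N)
    ratios[N] = drive.mean() / Thresh.mean()
```

Increasing *N* tenfold increases the ratio of summed recurrent input to threshold roughly tenfold, so excitation and inhibition must cancel ever more precisely for the membrane potential to stay near threshold.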

**Optimal compensation proof.** The spiking network that we use is capable of rapidly implementing optimal compensation, without requiring any synaptic plasticity mechanisms. We can prove this by showing that an optimal network of *N* − 1 neurons is equivalent to an optimal network of *N* neurons after the death of one neuron.

For the sake of argument, we suppose that the *N*^{th} neuron dies. At a mechanistic level, the death of this neuron is equivalent to cutting all the connections to and from the dead neuron and to our readout **x̂**. Therefore, a network where the *N*^{th} neuron has died is equivalent to a network with *N* − 1 neurons, with readout matrix *w*^{*}_{ik} = *w*_{ik} ∀ *i* ∈ [1, *M*] and ∀ *k* ∈ [1, *N* − 1], with feed-forward connectivity *F*^{*}_{kj} = *F*_{kj} ∀ *j* ∈ [1, *M*] and ∀ *k* ∈ [1, *N* − 1], with recurrent connectivity Ω^{*}_{kl} = Ω_{kl} ∀ *k*, *l* ∈ [1, *N* − 1], and with spiking thresholds *T*^{*}_{k} = *T*_{k} ∀ *k* ∈ [1, *N* − 1].

Now, we compare this damaged network to an optimal network consisting of *N* − 1 neurons. To make a fair comparison, we assume that this network has the same readout matrix **w**′ = **w**^{*} as the reduced damaged network. Then, the recurrent connectivity for this network is given by Ω′ = Ω^{*}, the feed forward connectivity is given by **F**′ = **F**^{*} and the spiking thresholds are given by **T**′ = **T**^{*}. This configuration is equivalent to the reduced damaged network. Therefore, a spiking neural network whose neurons are individually tuned to represent a signal optimally before cell death will perform optimal compensation and provide an optimal signal representation after cell death.
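The equivalence argument can be verified numerically, assuming the efficient-coding parameter forms used throughout (recurrent weights −**w**^{T}**w**, feed-forward weights **w**^{T}, thresholds ‖**w**_{i}‖²/2, cf. Equation 24):

```python
import numpy as np

# Check: cutting neuron N out of an optimal network yields the same
# parameters as building an optimal network directly from the reduced
# decoder w' = w[:, :-1].
rng = np.random.default_rng(5)
d, N = 3, 7
w = rng.normal(size=(d, N))

def network_params(w):
    Omega = -w.T @ w                    # recurrent connectivity
    F = w.T                             # feed-forward connectivity
    T = 0.5 * np.sum(w**2, axis=0)      # spiking thresholds
    return Omega, F, T

Omega, F, T = network_params(w)
# Damaged network: cut all connections to and from the last neuron.
Omega_cut, F_cut, T_cut = Omega[:-1, :-1], F[:-1], T[:-1]
# Optimal network built from the reduced decoder.
Omega_opt, F_opt, T_opt = network_params(w[:, :-1])
```

Because each surviving neuron's parameters depend only on its own and its surviving neighbours' decoder weights, the damaged network is already the optimal *N* − 1 network, with no plasticity required.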

## Author Contributions

D.G.T.B., S.D. and C.K.M. designed the study. D.G.T.B. and C.K.M. performed the analysis. D.G.T.B. and C.K.M. wrote the manuscript.

## Competing Interests

The authors declare that they have no competing financial interests.

## Acknowledgements

We thank Tony Movshon, Pedro Gonçalves, and Wieland Brendel for stimulating discussions and Alfonso Renart, Claudia Feierstein, Joe Paton and Michael Orger for helpful comments on the manuscript. S.D. acknowledges the James McDonnell Foundation Award and EU grants BACS FP6-IST-027140, BIND MECT-CT-20095-024831, and ERC FP7-PREDSPIKE. C.K.M. acknowledges an Emmy-Noether grant of the Deutsche Forschungsgemeinschaft and a Chaire d’excellence of the Agence National de la Recherche.