Abstract
How do neural populations adapt to the time-varying statistics of sensory input? To investigate, we measured the activity of neurons in primary visual cortex adapted to different environments, each associated with a distinct probability distribution over a stimulus set. Within each environment, a stimulus sequence was generated by independently sampling from its distribution. We find that two properties of adaptation capture how the population responses to a given stimulus, viewed as vectors, are linked across environments. First, the ratio between the response magnitudes is a power law of the ratio between the stimulus probabilities. Second, the response directions are largely invariant. These rules can be used to predict how cortical populations adapt to novel sensory environments. Finally, we show how the power law enables the cortex to preferentially signal unexpected stimuli and to adjust the metabolic cost of its sensory representation to the entropy of the environment.
Main
Sensory systems adapt their representations to the changing statistics of the environment1-6, integrating stimulus history over multiple timescales7-10. In the visual system, adaptation is distributed across multiple brain regions comprising a hierarchical network, from the retina to primary and high-level cortical areas3,11,12. In primary visual cortex, adaptation has been studied extensively at the single-cell level, providing a wealth of information about how tuning curves, along different dimensions, are affected by an adaptor13-36. Here, we adopt a complementary approach, measuring and analyzing adaptation at the level of neural populations31 from a geometric perspective37,38, using both simple and naturalistic stimuli34. We show how such a strategy allows us to derive properties of adaptation that capture the transformation of sensory representations between environments. Moreover, these properties lead to information theoretic relationships that show how adaptation maintains an efficient cortical representation39,40.
Results
Describing adaptation at the population level
We developed a simple method to study how neural populations adapt in different environments. Consider a finite stimulus set, s = {si}. We define a visual environment, A, as one where the probability of observing si is given by pA(si). To examine how neurons adapt in this environment, we present a rapid stimulus sequence by independently drawing stimuli from pA(si) while recording their activity. We define the vector rA(si) as the mean response of the population over repeated presentations of si in the environment A. The mean population vector is computed at the optimal time delay between stimulus and response (see Methods). Similarly, the responses of the population to the same stimulus set can be measured in a different environment, B, where the probability of observing a stimulus is dictated by pB(si). This measurement yields another mean population vector rB(si). Given two environments, A and B, can we describe how rA(si) relates to rB(si)? If so, can such a model predict how the population will behave when it adapts to a new environment C? The main contribution of our study is to offer affirmative answers to these questions.
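In computational terms, the quantities above reduce to simple averaging. The following sketch (a minimal illustration, not the authors' analysis code; array names are hypothetical) computes the mean population vector rX(si) for each stimulus from a trials × neurons response matrix recorded in one environment:

```python
import numpy as np

def mean_population_vectors(responses, stim_ids, n_stimuli):
    """responses: (n_trials, n_neurons) activity sampled at the optimal
    stimulus-response delay; stim_ids: (n_trials,) index of the stimulus
    shown on each trial. Returns an (n_stimuli, n_neurons) array whose
    row i is the mean population vector r_X(s_i)."""
    r = np.zeros((n_stimuli, responses.shape[1]))
    for i in range(n_stimuli):
        r[i] = responses[stim_ids == i].mean(axis=0)  # average over repeats of s_i
    return r
```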
We applied this approach to study the responses of cortical populations to a set of oriented sinusoidal gratings (Fig 1, Methods). Neural activity was recorded from layer 2/3 of primary visual cortex using in vivo two-photon imaging. In our initial set of experiments, we used three different environments A, B and C defined by simple distributions. Each row of panels in Fig 2 illustrates the outcome of one such experiment. All rows are formatted identically; thus, it suffices to describe the results from the session depicted by the panels in Fig 2a. Here, the prior probability in environment A was a von Mises distribution centered at the vertical orientation (with concentration κ = 1.2, indicating the sharpness of the distribution); in B, the same distribution was centered at the horizontal orientation; while in C, the orientations were uniformly distributed (Fig 2a(i)). The orientation tuning curve for each neuron, averaged across all environments, was computed. One can visualize the tuning curves of neurons in the population as a pseudo-color image, where each column represents a tuning curve, and cells are arranged according to their preferred orientation along the x-axis (Fig 2a(ii)). Each row, therefore, represents the average population response to one orientation of the stimulus. Our analyses were based on a sub-population of well-tuned neurons (see Methods) – however, the phenomena we describe are robust to this selection (Supplementary Fig 1).
Experimental protocol. a, Sessions included the presentation of three environments, A, B, and C, each associated with a different distribution over the stimulus set. b, A session consisted of six blocks, each containing a unique permutation of all three environments. Each environment was presented for 5 min. Within an environment, stimuli were drawn from the corresponding distribution and flashed at a rate of 3 per second. The presentation protocol was meant to mimic the changes of the retinal image during saccadic eye movements18,35,49,56. A blank screen was shown for 1 min between environments. From one session to the next, the order of the permutations was randomized. c, Each session began with a coarse retinotopic mapping, where we determined the average locations of receptive fields within different sectors of the field of view, numbered 1 to 9. The bottom panel shows the centers of the receptive fields for each sector mapped on the computer monitor. The background image represents the aggregate receptive field of the entire field of view. The red circle denotes its center. The dashed circle represents the circular window used during visual stimulation.
Characterizing adaptation in neural populations. Each row shows results from a different experiment. Axes are labeled in the top row and, unless otherwise noted, they have the same scale in all other rows. (i) The distribution of orientations associated with each environment. (ii) Mean responses of cells to an orientation. Each column represents a tuning curve. Cells have been arranged according to their preferred orientation. (iii) Logarithmic plot of the ratios between probabilities versus the ratios between magnitudes across the 3 possible pairs of environments. Colors indicate the corresponding pairs. The solid line represents the best fitting line (without intercept). Fit statistics appear in the inset. (iv) Distribution of cosine distance scatter. The mean value appears in the inset. (v) Calculation of the equivalent angular distance. The resulting value is noted in the inset. (vi) Using the power law to predict magnitudes of population responses in a new environment. The best fitting line (without intercept) is shown as a solid line. Fit statistics appear in the inset. (vii) Testing for population homeostasis. (viii) Correlation between the l2 and l1 norms across stimuli and environments. The solid line represents the best linear fit (without intercept). Fit statistics appear in the inset.
Response magnitudes follow a power law
We discovered that the magnitudes of responses between environments are linked via a power law. Denote by rX(si) the l2 norm (or magnitude) of the response vector rX(si), where X is one of the three possible environments {A, B, C}. Given data from two different environments, X and Y, we observed that when plotting rX(si)/rY(si) against pX(si)/pY(si), in double logarithmic coordinates, the points fall on a straight line passing through the origin (Fig 2a(iii)). In other words, the ratio between the response magnitudes and the ratio between the stimulus probabilities are related via a power law: rX(si)/rY(si) = [pX(si)/pY(si)]^β. The best fit for the slope was β = −0.38 (Fig 2a(iii), inset). As a goodness-of-fit measure we used the adjusted R2, which equaled 0.97, indicating that the power law provides an excellent description of the data. As the slope (also the exponent in the power law) β is negative, the environment where a stimulus was presented more frequently generated the response with the lower magnitude – a classic signature of adaptation.
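For concreteness, the fit can be reproduced with a no-intercept regression in log-log coordinates. This is a sketch under the assumptions stated above (variable names are illustrative, not from the authors' code); adjusted R2 conventions for no-intercept models vary, and we use one common choice:

```python
import numpy as np

def fit_power_law(rX, rY, pX, pY):
    """rX, rY: response magnitudes; pX, pY: stimulus probabilities,
    all arrays over the same stimulus set."""
    x = np.log(pX / pY)                    # log probability ratios
    y = np.log(rX / rY)                    # log magnitude ratios
    beta = np.sum(x * y) / np.sum(x * x)   # least-squares slope through the origin
    ss_res = np.sum((y - beta * x) ** 2)
    r2 = 1 - ss_res / np.sum(y ** 2)       # R^2 for a no-intercept model
    n = len(y)
    adj_r2 = 1 - (1 - r2) * n / (n - 1)    # one fitted parameter
    return beta, adj_r2
```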
Response directions are approximately invariant
Next, we investigated the variability in the direction of population responses to a stimulus between environments. First, we defined the unit vectors ûX(si) = rX(si)/‖rX(si)‖ and computed the resultant, ū(si), the normalized average of ûX(si) across environments. Then, we calculated the cosine distance between ûX(si) and ū(si) for all stimuli and environments. The distribution of these values provides an estimate of direction scatter; its mean in this experiment, d̄, appears in the inset of Fig 2a(iv). How can we tell if this value is small or large? To assess the magnitude of the scatter we developed a “yardstick” that returns the change in the orientation of a stimulus required to cause a shift in the direction of the population response equal to the scatter magnitude. We proceeded as follows. First, we computed d(Δθ), defined as the average cosine distance between population responses evoked by stimuli that differ by Δθ degrees (averaged across environments). Then, we calculated the equivalent angular difference, Δθeq, as the value of Δθ for which d(Δθ) equals the mean scatter, d̄. This value is obtained from the point of intersection between the horizontal line at d̄ and the d(Δθ) function (Fig 2a(v)). In this case, we obtain Δθeq = 1.4 deg. This equivalent angle is indeed small (across all experiments Δθeq = 1.26 ± 0.41 deg, mean ± 1 SD), indicating that the directions of population responses are approximately invariant across environments under our experimental conditions.
The power law predicts response magnitudes in novel environments
The measurements of responses in two environments are sufficient to obtain an estimate of the power law exponent. For example, we can use the data obtained in environments A and B to find the exponent β that best fits the relation rA(si)/rB(si) = [pA(si)/pB(si)]^β. Suppose we now want to predict the magnitudes rC(si) in a new environment C, where the distribution of stimuli is given by pC(si). As the power law holds across any two environments, we must have rC(si)/rB(si) = [pC(si)/pB(si)]^β, and we can predict rC(si) = rB(si) [pC(si)/pB(si)]^β. Similarly, we can generate a prediction based on the responses in A, which yields rC(si) = rA(si) [pC(si)/pA(si)]^β. Finally, we can combine both estimates by taking their geometric mean:

rA,B→C(si) = { rA(si) [pC(si)/pA(si)]^β · rB(si) [pC(si)/pB(si)]^β }^(1/2)

The notation rA,B→C indicates we are using data from environments A and B to predict the magnitudes of responses in C. As our dataset contains three environments, we can also compute rA,C→B and rB,C→A in a similar fashion. The measured versus predicted response magnitudes show that the power law accurately predicts the magnitudes of the responses in a novel environment, with an adjusted R2 value of 0.95 (Fig 2a(vi)). These results were remarkably consistent across experimental sessions (Fig 2b-e). Thus, adaptation at the population level is satisfactorily captured by two simple rules: a power law for the magnitudes and the approximate invariance of the direction across environments.
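The combined prediction is then a one-liner. A sketch (hypothetical names), using the exponent β estimated from environments A and B:

```python
import numpy as np

def predict_in_C(rA, rB, pA, pB, pC, beta):
    from_A = rA * (pC / pA) ** beta    # prediction anchored on environment A
    from_B = rB * (pC / pB) ** beta    # prediction anchored on environment B
    return np.sqrt(from_A * from_B)    # geometric mean r_{A,B -> C}(s_i)
```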
Violations of population homeostasis are common
It has been suggested that one of the goals of adaptation is to maintain population homeostasis, meaning that the system adjusts itself to keep the average firing rate of neurons constant between environments31. We sought to test population homeostasis in our data. It can be shown that neurons maintain constant average rates across environments if and only if the function ΦX(si) = pX(si)rX(si) is the same for all environments X ∈ {A, B, C} (see Methods). Comparing the shape of these functions between environments offers a simple test of population homeostasis. Plotting these functions immediately reveals they are far from constant across environments (Fig 2a(vii)). Instead, within each environment, the shape of ΦX(si) resembles the shape of the probability distribution associated with X.
Another way to state this result is to notice that homeostasis is nothing more than a power law with β = −1 (derivation in Methods). The failure of homeostasis is then evident from the fact that the experimental values of β fall in the [−0.4, −0.15] range (Supplementary Fig 2). A similar failure of homeostasis is obtained if rX(si) represents the l1 norm instead of the l2 norm of the responses (Fig 2a(vii), solid versus dashed lines). The reason is that, in our experiments, the l1 norm is proportional to the l2 norm (Fig 2a(viii)). We will return to discuss the cause underlying this relationship below.
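The homeostasis test therefore amounts to comparing the functions ΦX across environments. A sketch, assuming magnitude and probability arrays as before:

```python
import numpy as np

def phi(p, r_mag):
    """Phi_X(s_i) = p_X(s_i) * r_X(s_i); under exact homeostasis this function
    is identical in every environment (a power law with beta = -1)."""
    return p * r_mag

# e.g., np.allclose(phi(pA, rA), phi(pB, rB)) would hold under homeostasis;
# in the data, phi(pX, rX) instead tracks the shape of pX itself.
```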
Probability distributions are represented with limited resolution
One difference between the data discussed up to this point and an earlier study of population homeostasis31 is that the latter used “peaked” distributions. In this condition, one orientation (the adaptor) has a higher probability than the remaining orientations, which are all equally probable. Could this difference explain the lack of population homeostasis in our data? To find out, we conducted measurements in environments with peaked distributions (Fig 3). We found that many of the results remained unchanged, including the lack of population homeostasis (Fig 3a(vii)), the relative invariance of vector directions (Fig 3a(iv,v)), and the strong correlation between l2 and l1 norms (Fig 3a(viii)). However, the power law no longer offered an adequate description of the data (Fig 3a(iii,vi)).
Results using peaked distributions. Each panel shows the result of one experiment. The top rows are organized as in Fig 2. Bottom rows: (ix) Distributions obtained after smoothing the actual probabilities in (i) with the optimal von Mises kernel with concentration κopt. (x) The adjusted R-squared as a function of the smoothing parameter κ. The curve has an inverted U-shape with the maximum goodness of fit attained at an intermediate value. (xi) Restoration of the power law under the assumption the cortex is using a smoothed estimate of the actual probabilities. (xii) Predictions using the power law relationship derived from (xi).
Seeking an explanation for this shortcoming, we reasoned that a single adaptor is expected to influence not only neurons with an orientation preference matching its orientation, but also cells with nearby orientation preferences, as their tuning bandwidths are finite. In other words, the cortex may be unable to represent a “peaked” distribution that has a sharp transition between neighboring orientations. Instead, the responses may be consistent with the population behaving according to a smoothed version of the actual distribution, p̃X(θ) = (hκ ⊛ pX)(θ), where hκ(θ) is a von Mises kernel with concentration κ and the operator ⊛ represents circular convolution. Indeed, when we repeated the analyses using p̃X(θ) instead of pX(θ), we found there is an intermediate value of κ that produces the best linear fit between the ratios of responses and the ratios of stimulus probabilities, as assessed by the adjusted R2 measure (Fig 3a(ix,x)). Using the optimal κ largely restores the power law relationship (Fig 3a(xi)). In this example, the adjusted R2 for the power law fit is vastly improved from its original value of −0.25 to +0.79 after the smoothing correction. Similarly, the adjusted R2 for the prediction of magnitudes in a new environment improved from −0.21 to +0.72 (Fig 3a(xii)). As one may expect, smoothing did not improve the predictions when the probability distributions were smooth to begin with, like those in Fig 2 (see Supplementary Table 1).
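A sketch of the smoothing step under our assumptions: orientation (period 180 deg) is mapped onto the full circle, the distribution is circularly convolved with a von Mises kernel, and κ is chosen to maximize the adjusted R2 of the power-law fit (fit_power_law as sketched earlier; names are illustrative):

```python
import numpy as np

def von_mises_kernel(kappa, n_bins):
    theta = np.linspace(0, 2 * np.pi, n_bins, endpoint=False)
    k = np.exp(kappa * np.cos(theta))
    return k / k.sum()

def smooth_circular(p, kappa):
    """Circular convolution p~ = h_kappa (*) p, computed with FFTs."""
    h = von_mises_kernel(kappa, len(p))
    p_s = np.real(np.fft.ifft(np.fft.fft(p) * np.fft.fft(h)))
    return p_s / p_s.sum()

# kappa_opt = max(kappas, key=lambda k: fit_power_law(
#     rX, rY, smooth_circular(pX, k), smooth_circular(pY, k))[1])
```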
The rules of adaptation extend to natural distributions
Next, we wondered if the power law relationship would capture the data when environments are defined by a richer set of orientation distributions, such as those found in natural images (Fig 4a). Such distributions can be skewed and have multiple peaks of varying widths and relative amplitudes. To answer this question, we collected data in environments defined by empirical orientation distributions of natural image patches (see Methods). We observed all the same phenomena (Fig 4b-d). The goodness-of-fit could be improved to a degree by smoothing, especially in cases where one or more environments contained narrow peaks. Yet, the baseline performance of the power law and its predictions were good from the outset (adjusted R2 for the power law was 0.77 ± 0.04, mean ± 1SD). Direction scatter remained small (1.10 ± 0.22 deg, mean ± 1SD). A summary of the fits across all our experiments is provided in Supplementary Fig 2 and Supplementary Table 1.
Testing the power law using naturalistic orientation distributions. a, Orientation distributions in natural image patches photographed by the authors on the UCLA campus. b-d, Results using naturalistic environments. The panels are formatted exactly as in Fig 3.
The rules of adaptation extend to complex stimulus sets
We conducted a series of experiments where the stimulus set consisted of 18 movie sequences selected from nature documentaries. Movies were randomly assigned an identification number from 1 to 18. The stimuli were not matched for luminance or contrast. We used the same environmental distributions as in our experiments with natural orientation distributions. The same phenomena could be observed under these conditions (Fig 5b-c). The scatter in the direction of population vectors remained small, the power law still applied, and it predicted the responses in novel environments accurately. In these data, too, we observed a violation of population homeostasis. The exponents in the power law were somewhat smaller (−0.19 ± 0.03, mean ± 1SD) than for gratings (−0.25 ± 0.06, mean ± 1SD), and this difference was significant (Wilcoxon rank sum test, p = 4.4 × 10−12). Thus, it appears that the exponent in the power law depends, in part, on the stimulus set. Notice that the two sets differed strongly in the distribution of cosine distances between the responses. Movies evoked responses with pairwise cosine distances larger than 0.4, while the grating set contains many pairs of stimuli differing by less than 20 deg, which evoke similar population responses with distances smaller than 0.4 (Figs 2-4(iv)). These differences, we suspect, may be related to the disparate exponents obtained. Nevertheless, the power law still captured the behavior of cortical populations under adaptation to complex movie sequences.
Testing the power law using movie clips. a, Stills from movie clips obtained from National Geographic documentaries available online. Each clip was presented for 333 ms. b-d, Each row shows the results obtained in separate experiments. Each panel has the same layout. i, Movie clips were assigned IDs from 1 to 18 in a random order. Environments were defined using the same type of distributions used in the experiments described in Fig 4. ii, Direction scatter, expressed in terms of the cosine distance, is shown by the orange bars. This is the same calculation shown in Figs 2-4(iv) for sinusoidal grating data. The blue bars show a histogram of cosine distances between the mean population responses evoked by pairs of movie clips. The formatting of the remaining panels is the same as in prior figures.
Adaptation is relatively fast and sensitive to spatial phase
Finally, we investigated the dynamics of adaptation and its dependence on spatial phase using the sinusoidal grating dataset (Fig 6a-c). For each environment, we computed the magnitude of the responses to an orientation given that the stimulus preceding it by T seconds differed in orientation by Δθ, which we denote by r(Δθ, T). We define the modulation function as m(Δθ, T) = r(Δθ, T)/r(Δθ, T∞), where T∞ is a sufficiently large time. A plot of m(Δθ, T) shows that the responses have relatively fast dynamics: stimuli presented more than 2 sec into the past no longer influence the magnitude of population responses41. We performed a similar analysis to examine the modulatory influence of the immediately preceding stimulus as a function of both relative orientation and relative spatial phase. Clearly, the population response is sensitive to spatial phase, as the maximum suppression results when the previous stimulus matches both the orientation and phase of the present one (Fig 6c; note that spatial phases are incongruent for different orientations).
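A sketch of the history analysis (illustrative, not the authors' code; lags are expressed in number of presentations, and with stimuli flashed at 3 per second a delay of T seconds corresponds to roughly 3T presentations):

```python
import numpy as np

def modulation(mags, thetas, lag, lag_inf):
    """mags: (n_trials,) response magnitudes in presentation order;
    thetas: (n_trials,) stimulus orientations in deg.
    Returns m(dtheta, T) = r(dtheta, lag) / r(dtheta, lag_inf) as a dict."""
    def mean_by_dtheta(k):
        d = np.abs(thetas[k:] - thetas[:-k]) % 180
        d = np.minimum(d, 180 - d)           # circular orientation difference
        return {dt: mags[k:][d == dt].mean() for dt in np.unique(d)}
    r_lag, r_inf = mean_by_dtheta(lag), mean_by_dtheta(lag_inf)
    return {dt: r_lag[dt] / r_inf[dt] for dt in r_lag if dt in r_inf}
```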
Dynamics of adaptation and relative entropy. a, Modulation of response magnitude by a stimulus with orientation Δθ away, shown T seconds earlier in the sequence. Adaptation is fast – stimuli presented beyond 2 sec into the past have no influence on the population response. b, Same data as in a, for Δθ = 0 and Δθ = 90, with solid lines showing exponential fits to the data. We refer to τd as the depletion time constant and τr as the recovery time constant. The terms are used for convenience and are not meant to imply we know the mechanism behind adaptation is synaptic depression. c, Modulatory effect of an immediately preceding stimulus jointly as a function of relative shifts in orientation and spatial phase. The data for Δθ = 0 show that adaptation is sensitive to spatial phase. d, As expected from the theory, the relative entropy of pX from pY is correlated with the expected value (in environment X) of log(rX(si)/rY(si)). These data are pooled from 16 different experiments.
Computational implications of the power law
What insights about the computational role of cortical adaptation can we gain from the power law relationship? To simplify our discussion, let us assume that in a uniform environment, U, where all stimuli have the same probability, the response magnitudes are also the same. Then, according to the power law, we can write rX(si)/rU(si) = [pX(si)/pU(si)]^β. We are assuming rU(si) = r and pU(si) = 1/|s|, where |s| is the number of stimuli in our set. Hence, log rX(si) = A + β log pX(si), where A is a constant. From an information theoretic point of view42, IX(si) = − log pX(si) represents the information content or “surprise” of observing si in the environment X. We can then write log rX(si) = A − β IX(si). Thus, the logarithm of the response magnitude is linearly related to the “surprise” of observing si. Note that β < 0, so the larger the surprise, the larger the response. This relationship allows us to appreciate how adaptation enables the cortex to signal unexpected, deviant, novel, or odd-ball stimuli43-46 (all terms referring to stimuli with a low probability of appearance within an environment).
Now, if we take expectations on both sides of the “surprise” equation, we obtain E{log rX(si)} = A + β E{log pX(si)} = A − βHX, where HX is the entropy of the environment X. Here, one can interpret E{log rX(si)} as an overall measure of population activity in X. This means that adaptation enables the cortex to adjust the metabolic cost of its representation to the predictability of the environment. A predictable environment (with low entropy) will be coded at a lower metabolic cost than an unpredictable one (with high entropy). These properties are consistent with the principles of efficient cortical encoding39,40.
We can go further and derive a more general information-theoretic expression between two arbitrary environments, without invoking the assumption of a uniform environment with constant response magnitudes. Namely, we can show that the relative entropy (or Kullback–Leibler divergence) of pX(si) from pY(si) is related to the magnitudes of the responses by DKL(pX ∥ pY) = (1/β) EX{log [rX(si)/rY(si)]} (see Methods). We can empirically test this relationship by pooling the data across our experiments while using the exponent obtained from the power law fit in each case. As expected, we observe that DKL(pX ∥ pY) is significantly correlated with (1/β) EX{log [rX(si)/rY(si)]} (Fig 6d).
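The identity can be verified on synthetic data: if the magnitudes follow the power law exactly, it holds to machine precision. A self-contained check (all values synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
pX = rng.random(18); pX /= pX.sum()     # synthetic environment X
pY = rng.random(18); pY /= pY.sum()     # synthetic environment Y
beta = -0.25
log_ratio = beta * np.log(pX / pY)      # log(rX/rY) implied by the power law
dkl = np.sum(pX * np.log(pX / pY))      # D_KL(pX || pY)
assert np.isclose(dkl, np.sum(pX * log_ratio) / beta)
```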
These findings still hold if rX(si) represents the l1 norm rather than the l2 norm, as our data show the two are proportional to each other. What is the root of this relationship? If we assume the population contains a set of homogeneous tuning curves, the distribution of responses for any given orientation (horizontal slices through the individual panels of Fig 2(ii)) will be the same for each orientation. Our data show that adaptation simply changes the amplitude of these vectors. Thus, for any stimulus and environment, the distribution of activity over the population will be one among a family of scaled distributions f(r/s)/s. One can easily verify this leads to norms that are proportional to each other, with a constant of proportionality that depends only on the shape of the distribution. A similar argument can be made for natural images, assuming the tuning of the population tiles the Fourier domain and noting that natural images have 1/f amplitude spectra.
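The proportionality argument is easy to verify numerically: the l1/l2 ratio is invariant to scaling and to permutations (e.g., circular shifts) of a fixed activity profile. A toy check under these assumptions:

```python
import numpy as np

profile = np.exp(np.cos(np.linspace(0, 2 * np.pi, 180)))   # common response shape
for gain in (0.5, 1.0, 2.0):                                # environment-dependent gain
    for shift in (0, 45, 90):                               # stimulus-dependent shift
        v = gain * np.roll(profile, shift)
        print(np.linalg.norm(v, 1) / np.linalg.norm(v, 2))  # constant ratio
```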
Discussion
We studied adaptation of cortical populations in different statistical environments and found that their behavior can be summarized by two properties. First, the ratio of response magnitudes to a stimulus is linked to the ratio of its prior probabilities via a power law. Second, the directions of the responses are largely invariant between environments. These relationships could be used to predict the behavior of neural populations exposed to novel environments. The same set of phenomena was obtained with environments defined by natural orientation distributions and with stimulus sets composed of natural image sequences. The power law seemingly failed in environments defined by peaked distributions (Fig 3). However, the relationship could be restored using a smoothed estimate of the empirical distributions in the environment. Finally, the power law offered some new insights into the role of adaptation, revealing a neural population’s ability to signal low-probability stimuli and to adjust the metabolic cost of the representation to the predictability of the environment.
When our findings are compared to the seminal work of Benucci and collaborators31, we find some discrepancies and some points of agreement. First, we observed a robust violation of population homeostasis in all our conditions, including environments with peaked distributions that matched their conditions (Fig 3), where the probability of the adaptor was ∼0.36. Our results, instead, more closely resemble the deviations from homeostasis these authors reported when the probability of the adaptor was 0.5 (see their Supplementary Fig 4). It is possible that homeostasis in mice holds only when the “dynamic range” of the environments, defined as maxi p(si)/mini p(si), is smaller than the ones we have tried so far. However, clear violations of homeostasis are also observed in environments with natural orientation distributions (Fig 4) and with natural movie sequences (Fig 5). For these reasons, population homeostasis does not appear to be the driving force of adaptation under natural conditions.
Benucci and collaborators proposed a model where the activity is modulated by two gain factors: one applied to the tuning curves of neurons and another applied to the population response magnitudes. They noted that the modulation of response magnitudes was the dominant component (see their Supplementary Fig 6). Notably, in 4 out of 5 cases where the adaptor had a probability of 0.5, a change in the gain of the population alone was sufficient to explain adaptation – there was no measurable change in the gain of the tuning curves. The dominance of the population gain is consistent with the property of direction invariance and our own supplementary analyses (Supplementary Fig 3). Moreover, the gain functions that explained adaptation in their peaked environments were broad Gaussians, which is analogous to our finding that adaptation in the orientation domain behaves according to a smoothed distribution. The dynamics of adaptation we report (Fig 6) are very similar to their estimates, both leading to integration windows of about 2 sec (see their Supplementary Fig 5). It is possible that our discrepancies are due to the use of different species and methodologies. They recorded multiunit activity in cat areas 17 and 18 using Utah arrays covering 4 × 4 mm2 of cortex, with electrodes arranged in a grid with 400μm spacing. In contrast, we used single-cell, two-photon imaging in mouse primary visual cortex, with neurons located within a field of view of 730μm × 445μm.
We note that the well-documented shifts in the preferred orientation of neurons away from an adapting stimulus (e.g. ref15) are consistent with the modulation of population responses and direction invariance. To appreciate this point, consider a homogeneous population of neural responses in a uniform environment (Fig 7a). Here, the rows represent the population responses and the columns the tuning curves of the population. Let us assume the presence of an adaptor at 90 deg modulates the population responses (Fig 7b). The cortical responses under adaptation leave the direction of population vectors invariant (Fig 7c). If we plot the preferred orientation of the tuning curves (the columns of Fig 7c), we notice that neurons with preferred orientations near the adaptor will have their flanks closer to the adaptor decay more rapidly than those facing away (Fig 7d), causing a shift in their preferred orientation relative to the uniform environment (Fig 7e). Thus, shifts in the preferred orientation of individual neurons between environments and the invariance of population directions coexist in our model of adaptation.
Consistency of direction invariance and tuning curve shifts. a, Responses of a homogeneous population in a uniform environment. b, Modulation function evoked by an adaptor at 90 deg. Each row in a is multiplied by its corresponding gain to yield the responses of the population under adaptation in c. d, Examples of a few tuning curves (columns of c) under adaptation. Solid curves show two tuning curves near the adaptor. The flanks of the tuning curves closer to the adaptor fall more rapidly than those facing away, shifting their preferred orientations. e, Shifts in the preferred orientation of tuning curves under adaptation relative to the uniform environment.
The approximate invariance of response directions allows a downstream decoder to perform well despite being “unaware”47 of the state of adaptation of primary visual cortex. A decoder could learn a single, average map between stimulus orientation and response direction, θ ↦ ū(θ). Given the direction û of a new response in an unknown environment, the decoder could estimate the stimulus orientation as θ̂ = argminθ d(û, ū(θ)), where d(·,·) is the cosine distance between the arguments. Approximate direction invariance means such a decoder is likely to perform well across different environments. We emphasize this does not mean the decoder will be unbiased. While the scatter in response direction across environments is small, the deviations are not random; they have a structure associated with the distribution in each environment that contributes to biases and changes in discriminability in unaware decoders, which we will describe elsewhere (Dipoppa et al, manuscript in preparation). Direction invariance under our experimental conditions is likely the result of short presentation times aimed at mimicking the changes in the retinal image due to saccadic eye movements. One may expect stronger departures from direction invariance in conditions involving longer adaptation times48.
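In code, the unaware decoder is a nearest-direction lookup against the single learned map (a sketch with hypothetical names):

```python
import numpy as np

def decode(response, templates):
    """response: (n_neurons,) population vector; templates: dict mapping
    orientation -> average unit direction, learned across environments."""
    u = response / np.linalg.norm(response)
    return min(templates, key=lambda th: 1.0 - np.dot(u, templates[th]))
```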
The circuit implementing the power law is still under investigation. The power law behavior and its relatively fast dynamics (consistent with what has been reported in previous studies11,35,49) suggest a possible involvement of synaptic depression11,16,17,35, although intrinsic cell properties could also play a role50,51. A normalization network may also be able to explain our findings52,53, although the fact that the phenomenon is phase sensitive (Fig 6c) does not align well with the notion that the normalization signal is pooled over many cortical neurons54, as this would render it phase invariant. Lastly, it has not been established if adaptation can be explained exclusively as a feed-forward circuit or requires the use of top-down signals55.
There are other important questions raised by our study. Can the model be extended to capture how the covariance of the responses and discriminability of stimuli are affected by adaptation? How does contrast sensitivity change between different environments? How do contrast and stimulus probability interact to yield a response magnitude? Can the power law describe adaptation in neural populations in other brain regions and sensory modalities? While much remains to be explored, the present analyses show that studying cortical responses at the population level can yield important insights into the signal processing goals of adaptation and, potentially, other visual computations.
Methods
Experimental model and subject details
The procedures in the experiments described here were approved by UCLA’s Office of Animal Research Oversight (the Institutional Animal Care and Use Committee) and were in accord with guidelines set by the U.S. National Institutes of Health. A total of 4 mice (1 male, 3 female), aged P35-56, were used. Animals were a cross between TRE-GCaMP6s line G6s2 (Jackson Lab, https://www.jax.org/strain/024742) and CaMKII-tTA (https://www.jax.org/strain/007004).
Surgery
Imaging is performed by visualizing activity through chronically implanted cranial windows over primary visual cortex. Carprofen is administered pre-operatively (5mg/kg, 0.2mL after 1:100 dilution). Mice are anesthetized with isoflurane (4%–5% induction; 1.5%–2% surgery). Core body temperature is maintained at 37.5C. Eyes are coated with a thin layer of ophthalmic ointment during surgery. Anesthetized mice are mounted in a stereotaxic apparatus using blunt ear bars placed in the external auditory meatus. A portion of the scalp overlying the two hemispheres of the cortex is then removed to expose the skull. The skull is dried and covered by a thin layer of Vetbond. After the Vetbond dries (15 min), we affix an aluminum bracket with dental acrylic. The margins are sealed with Vetbond and dental acrylic to prevent infection. A craniotomy is performed over monocular V1 on the left hemisphere using a high-speed dental drill. Special care is taken to ensure that the dura is not damaged during the process. Once the skull is removed, a sterile 3 mm diameter cover glass is placed directly on the exposed dura and sealed to the surrounding skull with Vetbond. The remainder of the exposed skull and the margins of the cover glass are sealed with dental acrylic. Mice are allowed to recover on a heating pad. When alert, they are transferred back to their home cage. Carprofen is administered post-operatively for 72 hours. Mice are allowed to recover for at least 6 days before the first imaging session.
Two-photon imaging
We conducted imaging sessions in awake animals starting 6-8 days after surgery. Mice are positioned on a running wheel and head-restrained under a resonant, two-photon microscope (Neurolabware, Los Angeles, CA) controlled by Scanbox acquisition software and electronics (Scanbox, Los Angeles, CA). The light source was a Coherent Chameleon Ultra II laser (Coherent Inc, Santa Clara, CA). The excitation wavelength was set to 920nm. The objective was a 16× water-immersion lens (Nikon, 0.8NA, 3mm working distance). The microscope frame rate was 15.6Hz (512 lines with a resonant mirror at 8kHz). The field of view was 730μm × 445μm. The objective was tilted to be approximately normal to the cortical surface. Images were processed using a standard pipeline consisting of image stabilization, cell segmentation, and signal extraction using Suite2p (https://suite2p.readthedocs.io/)57. A custom deconvolution algorithm was used58. A summary of the experiments, including summaries of the analyses, is presented in Supplementary Table 1.
Visual stimulation
We used a Samsung CHG90 monitor positioned 30 cm in front of the animal for visual stimulation. The screen was calibrated using a Spectrascan PR-655 spectro-radiometer (Jadak, Syracuse, NY), generating gamma corrections for the red, green, and blue components via a GeForce RTX 2080 Ti graphics card. Visual stimuli were generated by a custom-written Processing 4 sketch using OpenGL shaders (see http://processing.org). At the beginning of each experiment, we obtained a coarse retinotopy map as follows. The field of view was split into a 3 × 3 grid (Fig 1c, top) and the average fluorescence in each sector computed in real time. The screen was divided into a 5 × 18 grid. We randomly selected a location on the grid and presented a contrast-reversing 4 × 4 checkerboard for 1 sec. Within a block, each grid location was stimulated once. A total of 5 blocks were used. The data were analyzed to produce an aggregate receptive field map for each sector. The centers of each of these receptive fields are shown in the bottom panel of Fig 1c for one experiment. The grand average of the receptive fields appears as the background in the same figure. The center of the population receptive field was used to center the location of our stimuli in these experiments. We endeavored to center the population receptive field at an elevation of around 0 deg, which allowed us to maximize the circular window through which we presented the stimulus (dashed circle in Fig 1c). The experiment consisted of the presentation of 100% contrast sinusoidal gratings of 0.04 cpd using the protocol depicted in Fig 1. The spatial phases at each orientation were uniformly randomized from 0 to 360 deg in steps of 45 deg. When computing the mean population vector for a given stimulus, we averaged across spatial phases, thus minimizing the effect of eye movements on our analyses. The appearance of a new stimulus on the screen was signaled by a TTL line sampled by the microscope. As a failsafe, we also signaled the onset of the stimulus by flickering a small square at the corner of the screen. The signal of a photodiode was sampled by the microscope as well.
Orientation distributions
We collected natural images from the UCLA campus (see samples in Fig 4a). Images were converted to grayscale and the gradient of the image ∇I(x, y) computed at each location. We computed the distribution of the magnitude of the gradient across the entire image and set a threshold at the 90th percentile of the distribution. The angles of the gradient for pixels with magnitudes exceeding the threshold were collected. A smooth distribution in the orientation domain was then obtained via kernel density estimation59, where the kernel was a von Mises distribution with κ = 5. The result was discretized to yield a probability distribution over angles ranging from 0 to 170 deg in steps of 10 deg. A library of 150 distributions was generated by this procedure. In each experiment we randomly selected 3 distributions and accepted them if (a) the minimum probability of a stimulus across all environments exceeded 0.03 (to ensure we would have a reasonable amount of data for all orientations) and (b) the cosine distance between the distributions was larger than 0.25 for each of the 3 pairs (to ensure that the distributions were sufficiently different from each other). If the test failed, we repeated the procedure until a random pick satisfied the criteria.
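A condensed sketch of this pipeline (illustrative; the original analysis may differ in details such as the gradient operator and binning):

```python
import numpy as np

def orientation_distribution(img, kappa=5.0, n_bins=18):
    """img: grayscale image array. Returns probabilities over orientations
    0..170 deg in 10 deg steps."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)[mag > np.percentile(mag, 90)]  # top 10% of |gradient|
    # von Mises KDE on the doubled-angle circle (orientation has period 180 deg)
    theta = np.linspace(0, 2 * np.pi, n_bins, endpoint=False)
    dens = np.exp(kappa * np.cos(theta[:, None] - 2 * ang[None, :])).sum(axis=1)
    return dens / dens.sum()
```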
Optimal stimulus-response delay
For each environment, we calculated the magnitude of the population response T microscope frames after the onset of the stimulus, where T ranged from -2 to 15. The frame rate of the microscope was 15.53 frames/sec. The time to peak of these curves agreed for all environments. In other words, the dynamics were not dependent on the statistics of the environment. We therefore averaged the magnitudes across all the 3 environments and defined the optimal stimulus-response delay as the time (in microscope frames) between the onset of the stimulus and the peak response magnitude of the population.
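A sketch of this computation (assuming the per-environment magnitude curves have already been computed; names are illustrative):

```python
import numpy as np

def optimal_delay(mag_curves, lags=range(-2, 16)):
    """mag_curves: (n_envs, n_lags) mean response magnitude at each frame lag.
    Returns the lag (in microscope frames) of the peak of the average curve."""
    mean_curve = np.asarray(mag_curves).mean(axis=0)   # average over environments
    return list(lags)[int(np.argmax(mean_curve))]
```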
Analysis of population homeostasis
Note that the average firing rate of the population in an environment X is given by Σi pX(si) rX(si), which can be written as Σi pX(si) rX(si) û(si), where rX(si) is the response magnitude and û(si) the unit vector in the direction of the response. Here, we have dropped the environment subscript from the unit vector, as directions are invariant across environments. Population homeostasis holds if Σi pX(si) rX(si) û(si) = k for all environments X, where k is a constant vector. Assuming the vectors û(si) are linearly independent, there is a unique way to write k as a linear combination of the û(si), which we write as k = Σi ki û(si) (we can safely assume k is in the span of {û(si)}; otherwise, the mean rate cannot equal k). Hence, Σi pX(si) rX(si) û(si) = Σi ki û(si). By coefficient matching, we conclude that homeostasis holds if and only if ΦX(si) = pX(si) rX(si) = ki for all environments X. In other words, when homeostasis holds, we have pX(si) rX(si) = ki = pY(si) rY(si), which implies rX(si)/rY(si) = [pX(si)/pY(si)]^−1. This means that homeostasis is a particular case of the power law with β = −1.
Derivation of relative entropy relationship
By the definition of KL divergence, we have DKL(pX ∥ pY) = Σi pX(si) log(pX(si)/pY(si)). The power law states that log(rX(si)/rY(si)) = β log(pX(si)/pY(si)). From here, we have log(pX(si)/pY(si)) = (1/β) log(rX(si)/rY(si)). Substituting back into the first expression gives DKL(pX ∥ pY) = (1/β) Σi pX(si) log(rX(si)/rY(si)) = (1/β) EX{log(rX(si)/rY(si))}. This is the relationship tested experimentally in Fig 6d.
Statistics & Reproducibility
We conducted experiments by independently measuring the adaptation of neural populations in visual cortex in 23 different instances (see Supplementary Table 1). Results were statistically significant (p-values less than 10−4) and consistent across individual experiments. As the study did not involve different groups undergoing different treatments, there was no need for randomization or blind assessment of outcomes. We restricted our analyses to neurons that were well tuned, defined by a circular variance60 of less than 0.5. However, this choice has little effect on the outcome of our analyses; the same results are obtained if we, instead, work with the entire population (see Supplementary Fig 1).
Data availability
Data including the mean responses of the population for each experiment will accompany the publication of this manuscript.
Code availability
Sample code describing the structure of the database and the replication of some of our analyses will be released in the same repository as the data with the publication of this manuscript.
Acknowledgements
This work was supported by NIH grant NS116471 (D.L.R.). We thank Dean Buonomano, Andrea Benucci and Matteo Carandini for comments on an earlier version of this manuscript.