Abstract
The primary visual cortex (V1) processes complex mixtures of orientations to build neural representations of our visual environment. It remains unclear how V1 adapts to the highly volatile distributions of orientations found in natural images. We used naturalistic stimuli and measured the response of V1 neurons to orientation distributions of varying bandwidth. Although broad distributions decreased single neuron tuning, a neurally plausible decoder could robustly retrieve the orientations of stimuli from the population activity at all bandwidths. This decoder demonstrates that V1 population co-encodes orientation and its precision, which enhances population decoding performances. This internal representation is mediated by temporally distinct neural dynamics and supports a precision-weighted description of neuronal message passing in the visual cortex.
Introduction
Selectivity to the orientation of stimuli is an archetypal feature of neurons in the primary visual cortex (V1) (1). In more than 60 years of investigations, low-complexity stimuli such as gratings have uncovered many orientation-tuned filter-like properties in V1 (2). These stimuli are, however, intrinsically limited in their relevance to study the rich cortical dynamics (3) involved in the vision of natural images (4). Indeed, the complex content of our everyday visual scenery (5) enforces sparse and efficient population codes in V1 (6–8). Thus, any understanding of how the early visual system works has to also account for the behavior of V1 in its ecological visual environment (9). While it is possible to directly probe V1 with natural images (10), synthetic “natural-like” stimuli are often preferred (11). This approach offers a compromise between the desire for keeping a fine control over the experiments and the need for complex, naturalistic stimuli to study ecological brain mechanisms. Moreover, these synthetic stimuli allow the isolation of relevant visual parameters by removing correlations between the multiple perceptual dimensions composing a natural image (12–15).
The underlying structure of natural images and natural-like stimuli can be represented as a distribution of orientations (Supplementary Figure 1). The amplitude of such distributions corresponds to the contrast of the image, whose impact on orientation selectivity has been thoroughly investigated (16). By contrast, the cortical processing of the broadness of this distribution (i.e. the orientation bandwidth) is unclear. Broader orientation bandwidths reduce the salience and the precision of orientation inputs. This is a central challenge which V1 must constantly address, as the distribution of orientations in natural images is highly heterogeneous. Current models of the influence of orientation bandwidth on vision come from the field of texture perception (17). Intuitively, psychophysics experiments have shown that increased broadness of orientation distributions worsens orientation discrimination performance (18–20), which can be explained by computational models of intracortical dynamics (21–23). Despite these converging observations, there have been very few investigations into the neural correlates of the processing of orientation bandwidth. Recent work by Goris et al. (14) has shown that orientation bandwidth causes tuning modulations of single neurons in V1 and V2 that are coherent with previous psychophysical and computational studies. These modulations are heterogeneous at the population level and provide a robust theoretical basis for encoding natural images. However, the detailed processes involved in this population code remain unknown.
Here, we used naturalistic stimuli to study the encoding of the bandwidth of orientations in V1. Using a biologically plausible neuronal decoder (24), we found that stimulus orientation could be robustly inferred from the population activity over a wide range of input bandwidths. While bandwidth is not encoded explicitly in the activity of individual V1 neurons, we showed that it is actually co-encoded with orientation, which improves the orientation encoding performances of a neural population. Moreover, the bandwidth of an orientation is linked to its processing time, which stems from temporally separated neural representations of orientation bandwidths within the same neural network. Orientations of narrow bandwidths are encoded in the population activity right after the onset of the stimuli. This representation then fades away to make way for a broader bandwidth neural code, which can peak up to several hundred milliseconds later. These results suggest that the precision of the orientation, in the form of its bandwidth, is actively processed in V1. This carries crucial implications for theories requiring the encoding of the precision of sensory variables, such as predictive coding (25).
Results
Single neuron modulation by orientation bandwidth
Single-unit activity of 254 V1 neurons was recorded in three anesthetized cats. Orientation selectivity was measured using band-pass filtered white noise images called Motion Clouds (26), whose underlying generative framework allows precise control of the orientation content of each visual stimulus. Specifically, we varied two parameters of Motion Clouds: their median orientation θ and the bandwidth of the orientation distribution Bθ (Figure 1a), which, as the standard deviation of the distribution, is related to the variance, i.e. the inverse precision, of the orientation input to V1. Like natural images, Motion Clouds are complex mixtures of orientations, but their stationarity in the spatial domain removes any potential second-order correlation, allowing the effect of Bθ on orientation selectivity to be isolated (Supplementary Figure 1). Moreover, they conform to the statistics of natural images, namely the 1/f² power spectrum distribution (6). We measured the orientation selectivity of V1 neurons to Motion Clouds for 12 evenly spaced values of θ (steps of 15°) and 8 values of Bθ (steps of 5.1°, starting at 0°). As Bθ increased, the vast majority of tuning curves remained centered on the same preferred orientation (98% of units, p < 0.05, Wilcoxon signed-rank test) but diminished in peak amplitude (94% of units, p < 0.05, Wilcoxon signed-rank test, 73.1% mean amplitude decrease). Only 26% of neurons were still tuned when orientation bandwidth reached Bθ = 36.0°. As such, the most common selectivity modulation caused by an increasing orientation bandwidth Bθ was a broadening of tuning, which unfolded heterogeneously between neurons, as illustrated by three example tuning curves (Figure 1b). Specifically, neurons Ma006 and Ti002 were no longer tuned to stimuli of Bθ = 36.0° (p = 0.24 and 0.07, respectively, Wilcoxon signed-rank test, firing rate of preferred vs orthogonal orientation), contrary to neuron Tv001, which remained orientation selective (p = 10⁻⁶).
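The per-neuron tuning criterion above (preferred versus orthogonal firing rates, Wilcoxon signed-rank test) can be sketched as follows; the trial-by-trial firing rates here are simulated placeholders, not recorded data:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.RandomState(0)
# hypothetical trial-by-trial firing rates (Hz), 15 trials as in the protocol
pref = rng.poisson(20, size=15).astype(float)  # responses at the preferred orientation
orth = rng.poisson(8, size=15).astype(float)   # responses at the orthogonal orientation

# paired non-parametric test: is the neuron significantly orientation tuned?
stat, p = wilcoxon(pref, orth)
is_tuned = p < 0.05
```

The same paired test, applied per neuron and per Bθ condition, yields the fraction of units still tuned at each bandwidth.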
The variations of tuning broadness were assessed more precisely by measuring both the global tuning of the raw data (circular variance, CV) and the local strength of tuning around the peak of a fitted von Mises function (half-width at half-height, HWHH). The step-like changes observed (Figure 1c) were fitted with a Naka-Rushton function (27) (Supplementary Figure 3), allowing the estimation of two descriptive parameters of the bandwidth-response function. First, the bandwidth value B50 corresponds to the Bθ at which a neuron transitioned towards a more broadly tuned state. Second, the slope n indicates the steepness of the variations of tuning as Bθ increases. This variable was correlated with the tuning width measured at Bθ = 0.0° (pHWHH = 0.048, Spearman's RHWHH = 0.12, pCV = 10⁻⁶, Spearman's RCV = 0.33); that is, broadly tuned neurons underwent tuning modulations at lower Bθ compared to narrower neurons, similar to previous experimental observations (14). No other significant relationship was found, except for the n–B50 correlation produced by the Naka-Rushton fitting procedure (the optimal fit of a neuron whose B50 lies past the maximum tested Bθ is a linear function). Overall, the population's parameters were heterogeneous (Figure 1c), suggesting that a diversity of tuning modulations between neurons provides better orientation selectivity across multiple orientation bandwidths (14).
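As a sketch of how these two descriptors might be computed, the snippet below implements the circular variance and fits a Naka-Rushton-style bandwidth-response function to synthetic CV values; the exact parameterization used in the analysis may differ:

```python
import numpy as np
from scipy.optimize import curve_fit

def circular_variance(rates, thetas_deg):
    """Global tuning: CV = 1 - |sum_k r_k e^(2i*theta_k)| / sum_k r_k.
    The factor 2 maps the 180-degree orientation domain onto the circle;
    CV -> 0 for sharp tuning, CV -> 1 for flat tuning."""
    th = np.deg2rad(np.asarray(thetas_deg, float))
    r = np.asarray(rates, float)
    return 1.0 - np.abs(np.sum(r * np.exp(2j * th))) / np.sum(r)

def naka_rushton(b, v_min, v_max, b50, n):
    """Step-like bandwidth-response function: transitions from v_min
    to v_max around B50 with steepness n."""
    return v_min + (v_max - v_min) * b**n / (b**n + b50**n)

# hypothetical CV values across the 8 tested bandwidths (5.1 degree steps)
b_theta = np.arange(8) * 5.1
cv_obs = naka_rushton(b_theta, 0.3, 0.9, 15.0, 4.0)
cv_obs = cv_obs + 0.01 * np.random.RandomState(1).randn(8)

popt, _ = curve_fit(naka_rushton, b_theta, cv_obs,
                    p0=[0.2, 0.8, 10.0, 2.0],
                    bounds=([0, 0, 0.1, 0.1], [1, 1, 60, 20]),
                    maxfev=10000)
b50_hat, n_hat = popt[2], popt[3]
```

A flat tuning curve yields CV ≈ 1, a sharply peaked one CV ≈ 0, so the Naka-Rushton fit of CV versus Bθ captures the transition point B50 and its steepness n.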
Decoding orientation from population activity
Since the variety of tuning modulations observed in single neurons pleads in favor of a multi-neuronal collaboration, we used a neuronal decoding method that probes for a population code, asking whether the parameters of the Motion Clouds could be read out from the population activity. Briefly, we trained a multinomial logistic regression classifier, a probabilistic model that classifies data with multiple outcomes (28) (see Materials and Methods). Here, this classifier was fed the firing rate of all recorded neurons in a given time window and learned, for each recorded neuron, a coefficient that best predicts the identity of the stimulus eliciting the population activity (Figure 2a). To decode orientation θ, the dataset of trials was split by Bθ such that 8 independent orientation-bandwidth-specific decoders were learned. These were then able to predict the correct Motion Clouds' θ well above the chance level of 1 out of 12 values. The temporal evolution of decoding performance for these decoders (Figure 2b) shows that the maximum is reached faster for stimuli with narrower bandwidths (270 and 370 ms post-stimulation for Bθ = 0.0° and Bθ = 36.0°, respectively). The decoding performance is sustained up to 150 ms after the end of stimulation, indicating that late neural activity can serve to decode the orientation of stimuli for all Bθ (Figure 2b). A more detailed description of these decoders' performances is given by confusion matrices (Figure 2c), which represent the accuracy of the decoder for all possible combinations of true and predicted stimuli (see Materials and Methods). The maximal diagonalization (number of significant ypred = ytrue instances) of the Bθ = 0° decoder, observed as early as 100 ms after the stimulation onset, is close to the diagonalization of the Bθ = 36.0° decoder observed hundreds of ms later (12/12 versus 11/12 classes significantly decoded above chance level, respectively, p < 0.05, 1000 permutations test).
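A minimal version of such a population decoder can be sketched with scikit-learn's logistic regression (multinomial by default with the lbfgs solver); the population responses below are simulated from hypothetical von Mises tuning curves rather than taken from the recordings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)
n_neurons, n_trials, n_classes = 50, 15, 12   # 12 orientations, 15 trials each

# hypothetical von Mises tuned neurons with random preferred orientations
pref = rng.uniform(0, np.pi, n_neurons)
thetas = np.repeat(np.arange(n_classes) * np.pi / n_classes, n_trials)
rates = 20.0 * np.exp(2.0 * (np.cos(2 * (thetas[:, None] - pref[None, :])) - 1))
X = rng.poisson(rates).astype(float)          # trials x neurons spike counts
y = np.repeat(np.arange(n_classes), n_trials)

# multinomial (softmax) readout, cross-validated; chance level is 1/12
clf = LogisticRegression(max_iter=5000)
scores = cross_val_score(clf, X, y, cv=5)
mean_acc = scores.mean()
```

In the actual analysis one such decoder would be fit per Bθ condition and per time window, on the recorded firing rates instead of these simulated counts.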
Thus, orientation could be retrieved above chance level from the population activity, for all Bθ, albeit with different dynamics. The short delay required to perform decoding above chance level is congruent with the feedforward-like processing latency of V1 (24), while the increased time required to reach maximum accuracy, especially for broad bandwidth orientations, would hint at the presence of a slower mechanism at these precision levels.
Decoding orientation bandwidth
As the decoding of orientation was impacted by orientation bandwidth, we sought to understand whether Bθ was also encoded in the population activity. A decoder trained to retrieve the Bθ of stimuli (Figure 3a) reached a maximum accuracy of twice the chance level in 280 ms. However, the confusion matrix indicates that the predictions were essentially binary, discriminating between narrow and broad Bθ (Bθ = [0.0°; 5.1°; 36.0°] significantly decoded above chance level, p < 0.05, 1000 permutations test). This dyadic decoding is reminiscent of the step-like changes observed in tuning curves (Figure 1c), but also of the bimodal-like distributions of the tuning steepness in the population (Figure 1c). Altogether, a binary estimate of the orientation bandwidth is available independently of the orientation itself, but a fine representation of bandwidth, especially for intermediate Bθ, cannot be retrieved accurately as a standalone variable using the present decoding method.
Nevertheless, orientation bandwidth could also be encoded in V1 as a two-dimensional variable, consisting of the Cartesian product of orientation and bandwidth. Such a θ × Bθ decoder was trained and could retrieve the identity of Motion Clouds with a maximum accuracy of about 16 times the chance level in 350 ms (Figure 3b). This delay to maximum accuracy is similar to that of the broad-Bθ orientation decoders (Figure 2b). In line with the previous θ decoders (Figure 2b), the confusion matrix of this θ × Bθ decoder showed greater accuracy for patterns of narrow bandwidth (Figure 3b inset, 44% average θ decoding for Bθ = 0.0° and 12% for Bθ = 36.0°). Only 26% of stimuli were accurately predicted to a significant degree, due to secondary, parallel diagonals in the confusion matrices (Figure 3b, right, arrows), which are instances where the decoder correctly predicted θ without exact certainty on the associated Bθ (only elements within the main diagonal are correct decoding occurrences, where ypred = ytrue). Since the addition of Bθ worsens the decoding of θ, even at Bθ = 0.0°, what advantage is there to co-encoding bandwidth? To answer this question, we compared the performances of two orientation decoders (Figure 3c). The first learned only θ, without any prior knowledge of Bθ, while the other was the marginalization of the θ × Bθ decoder over Bθ (Figure 3b). This is a simple restructuring operation which "folded" the 96-class confusion matrix over its columns and rows to obtain an orientation-only output. When marginalized, the θ × Bθ decoder performed better than the Bθ-agnostic decoder (12/12 classes significant, p < 0.05, 1000 permutations test). Prior knowledge of the orientation bandwidth associated with an oriented input thus yielded better ypred = ytrue orientation decoding performances.
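The marginalization step can be illustrated as a simple reshaping of the decoder's 96-class probability output, assuming, hypothetically, that classes are ordered as class index = iθ · 8 + iB (the actual ordering in the analysis may differ):

```python
import numpy as np

def marginalize_over_bandwidth(proba, n_theta=12, n_b=8):
    """Fold (trials x 96) theta-x-B_theta class probabilities into
    (trials x 12) orientation probabilities by summing over bandwidth."""
    return proba.reshape(-1, n_theta, n_b).sum(axis=2)

# toy check: one trial with all probability mass on (theta index 3, B index 5)
p = np.zeros((1, 96))
p[0, 3 * 8 + 5] = 1.0
p_theta = marginalize_over_bandwidth(p)
theta_hat = p_theta.argmax(axis=1)   # decoded orientation index
```

Summing over the bandwidth axis pools the secondary diagonals of the confusion matrix back onto the correct orientation, which is why the marginalized decoder outperforms the Bθ-agnostic one.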
Temporal dynamics of bandwidth decoding
The visualization of the decoder's weights can give some indication as to how this dual θ × Bθ code is implemented in single neurons. At each time step, the decoder learned for each neuron a vector βk of K elements. Representing all the independently learned vectors through time creates a coefficient map that allows visualizing the contribution of individual neurons to the population code in the θ × Bθ decoding paradigm. Repeated values through Bθ in each vector (Figure 4a, red areas in each dashed section) correspond to a "vote" for the preferred orientation of the neuron, along with ambiguity over different Bθ values. Similar to the behavior of tuning curves (Figure 1b), the coefficient maps showed for some neurons a decrease of the orientation-selective decoding coefficients as Bθ increased (coefficient maps of neurons Tc001 and Th002), while others remained tuned, yet with increasing width and time delay (neuron Th004). While the tuning curves were linked to the neuron's receptive field, the coefficient maps here are comparable to an "emitting field", i.e. to the content of the message passed by a neuron to the population. The presence of repeated peaks raised one important question: if the Bθ identity of orientation θ is ambiguous in single neurons (repeated values, Figure 4a) and in the population (secondary diagonals, Figure 3b), how can V1 retrieve the θ × Bθ identity of a stimulus? A possible disambiguation of the stimulus identity comes from the dynamics of Bθ decoding. The specific time course of bandwidth decoding creates a lagged onset for broad-bandwidth orientations, particularly visible in the coefficient map of neuron Th004. This is, for instance, reflected in the decoder's total pool of coefficients, in which the narrower half of measured bandwidths (Bθ < 18°) is best encoded during the first 200 ms post-stimulus, transitioning to a better representation of the broader half (Bθ > 18°) afterwards (Figure 4b).
Since the decoding of orientation and bandwidth evolved through trial time, we investigated the detailed dynamics of the neural code by using temporal generalization (29). Briefly, a decoder was trained and tested at two different time points, ttrain and ttest, with the underlying hypothesis that a successful temporal generalization relies on the existence of a similar neural code at both training and testing times. More specifically, this amounts to using the coefficients βk1 learned at ttrain in the example given in Figure 2a to decode the activity at time ttest. As the decoder's accuracy represents the level of discriminability of the population's activity within feature space, measuring the performance of the decoder when ttrain ≠ ttest allowed us to assess the unraveling of the neural code in time. Note that generalization points where ttrain = ttest (on the white dashed line, Figure 5a) are the exact same measurement as the time course of accuracy shown previously (Figure 2b). The accuracy decreased slowly when moving away from this time identity line, showing a good degree of decoding generalization through time. The upper halves of the temporal generalization matrices, above the time identity line, correspond to generalizations where ttrain ≥ ttest, i.e. generalizing backward in trial time, while generalizations below the time identity line, where ttrain ≤ ttest, are generalizations forward in trial time. Hence, subtracting the upper half from the lower half of each matrix measures the asymmetry around the time identity line (contoured clusters, Figure 5, Supplementary Figure 6) and gives an indication of the direction of the neural code's generalization in the temporal dimension. A positive asymmetry (red clusters) indicates a better generalization forward in time, while a negative asymmetry (blue clusters) indicates a better backward generalization.
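Temporal generalization can be sketched as follows: a decoder is fit at each training bin and scored at every testing bin, and the asymmetry is the difference between a matrix entry and its mirror across the time identity line. The data here are synthetic, with a single stable code shared across time bins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def temporal_generalization(X_time, y):
    """M[t_train, t_test]: accuracy of a decoder fit on time bin t_train
    and tested on time bin t_test (King & Dehaene, 2014)."""
    n_trials, n_time, _ = X_time.shape
    tr, te = train_test_split(np.arange(n_trials), test_size=0.5,
                              random_state=0, stratify=y)
    M = np.zeros((n_time, n_time))
    for t_train in range(n_time):
        clf = LogisticRegression(max_iter=1000).fit(X_time[tr, t_train], y[tr])
        for t_test in range(n_time):
            M[t_train, t_test] = clf.score(X_time[te, t_test], y[te])
    return M

def asymmetry(M):
    """A[t_train, t_test] = M[t_train, t_test] - M[t_test, t_train]:
    positive where transposing forward in time beats transposing backward."""
    return M - M.T

# synthetic data: 2 stimulus classes, 4 time bins, one persistent coding neuron
rng = np.random.RandomState(1)
y = np.repeat([0, 1], 30)
X_time = rng.randn(60, 4, 10)
X_time[:, :, 0] += 2.0 * (2 * y[:, None] - 1)   # stable code in neuron 0

M = temporal_generalization(X_time, y)
A = asymmetry(M)
```

With a code that is identical at every bin, as simulated here, M is high everywhere and A stays near zero; a transient-then-fading code would instead produce the asymmetric patterns described in the text.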
For instance, the temporal generalization matrices of narrow Bθ-specific orientation decoders showed a significant asymmetry (Figure 5b; 44% significantly asymmetric transpositions, Bθ = 0.0°, p < 0.05, 1000 permutations test). In other terms, training a narrow Bθ-specific orientation decoder and transposing it further in time within the trial (lower half of each generalization matrix in Figure 5a, positive values in Figure 5b) is not identical to training and transposing it backwards in trial time (upper half of Figure 5a, negative values in Figure 5b). Notably, this asymmetry is positive when ttrain ≈ 100 ms and ttest ≥ 100 ms, implying that narrow Bθ-specific orientation decoders can use early trial activity to infer the identity of the stimulus during the rest of the trial. In other words, narrow Bθ information is encoded in the early transient population activity and this neural code is kept the same for the entire trial. Conversely, the asymmetry became negative for ttest ≥ 350 ms, independently of the trial time, implying that the structure of the late offset activity is intrinsically different and cannot be decoded based on the earlier activity. For the generalization matrices of broad Bθ orientation decoders, the proportion of significantly asymmetrical transpositions is almost null: the transposition matrix is symmetric, i.e. the late neural code can be transposed backward or forward in time (1% significantly asymmetric transpositions, Bθ = 36.0°, p < 0.05, 1000 permutations test). Taken together, the asymmetry of the Bθ = 0.0° generalization matrix shows that the early population activity is more suited to decoding narrow Bθ, whereas the symmetry of the Bθ = 36.0° matrix implies that the late neural code for broad stimuli is always present but rises after the narrow representation, culminating late in trial time.
To understand the extent to which each Bθ-specific code extends temporally, matrices of cross-conditional temporal generalization were produced, where training and testing bins came from different Bθ (Figure 5c, full matrices shown in Supplementary Figure 6b). The asymmetry previously observed for narrow Bθ generalization remained marked when transposing the Bθ = 0.0° orientation decoder to broader Bθ stimuli (55% significantly asymmetric transpositions from Bθ = 0.0° to Bθ = 36.0°, p < 0.05, 1000 permutations test). The fact that the positive asymmetry remains confined to early ttrain while negative asymmetry is still present at late ttrain shows that a narrow Bθ code is always present at the transient of the stimulation and stays similar across time points and Bθ. The transposition of the decoder trained on Bθ = 36.0° displayed a progressively increasing asymmetry when transposed to patterns of narrower bandwidth (20% significantly asymmetric transpositions from Bθ = 36.0° to Bθ = 0.0°, p < 0.05, 1000 permutations test). Namely, a pattern of negative asymmetry was present for ttrain ≈ 150 ms and ttest ≥ 250 ms, which corresponds to the region of non-significant asymmetry in the Bθ = 0.0° iso-conditional transposition (Figure 5b). This region, in which the broad-bandwidth decoder cannot decipher narrower-bandwidth population activity, corresponds to the transition period in the decoder's coefficients where the narrow Bθ code starts receding but the population activity encoding broad Bθ has yet to arise (Figure 4b). As such, the neural codes for narrow and broad bandwidths seem to follow two different temporal dynamics. The transient activity evoked by a narrow stimulus bandwidth generates a code that can be generalized to later times (and larger bandwidths), while the broad bandwidth code can be generalized independently of time and bandwidth, but only dominates the global population code at later times.
While the generalization of orientation decoders allowed us to separate the relative timing of the different bandwidth decoders, we have stated that co-dependency on Bθ and θ improves the population code in V1 (Figure 3c). Hence, to understand how the two dynamics of narrow and broad bandwidth codes are combined into a single population representation, we carried out the temporal generalization of the θ × Bθ decoder (Figure 6a). The characteristic asymmetry of the previous narrow Bθ matrices was observed (78% significantly asymmetric transpositions, p < 0.05, 1000 permutations test). Coherently, the asymmetry showed a first positive region, corresponding to the previously observed preference for time-forward generalization of narrow Bθ (black-contoured red region, Figure 6b), before transitioning to a negative asymmetry for the broad Bθ code. Hence, the overall θ × Bθ population code retains the specificity of the bandwidth-dependent dynamics.
What is the overall dynamic of the population code? The probabilistic output of the decoder (Figure 7) hints at a separate processing of each of the different Bθ codes. Indeed, for narrow bandwidths, the most probable pattern is correctly identified right at the onset of the stimulus presentation and stays stable through time. The largest orientation decoding errors Δθ are on the order of one step of our stimulation parameters (15°). For broader Bθ values, there is no clear consensus on the θ of the stimulation at the onset, but the correct identity becomes apparent much later, as a gradient centered on the correct orientation and bandwidth emerges. This progressive convergence onto the correct stimulus identity can be interpreted in terms of a segregation of oriented inputs within cortical populations (Figure 8). In that sense, the observed bandwidth code is a neural trace of the ongoing separation of orientations. Neurons would contribute to a population "vote" on orientation and bandwidth, which explains the stability of the early, narrow-bandwidth activity as a global consensus towards the most likely θ × Bθ. The additional processing time required for stimuli of broader bandwidths is explained by the potential decomposition of broad Motion Clouds into segregated distributions whose orientations can be retrieved.
Discussion
The bandwidth of orientation distributions impacts orientation selectivity, and we have sought to understand how V1 processes this critical visual feature. Using naturalistic stimuli, we showed that non-linearities of single neurons support a population activity that co-encodes orientation and bandwidth through differential dynamical processes.
Few studies have investigated the neural correlates of orientation bandwidth. Recently, Goris et al. (14) used stimuli similar to Motion Clouds to investigate the function of tuning diversity in V1 and V2, reporting that heterogeneously tuned populations are better suited to encode the mixtures of orientations found in natural images. Consistent with their findings, we reported a variety of single-neuron modulations by the orientation bandwidth, as well as a correlation between the steepness of these Bθ modulations and the tuning of the neuron, evocative of a relationship between tuning width and stimulus bandwidth (30). This heterogeneously tuned population, both in θ and Bθ, is suited to decode orientation mixtures of varying bandwidths and is known to be pivotal for encoding natural images in V1 (31). We performed a biologically plausible readout (24, 32) on this population activity and found that, for narrow Bθ stimuli, decoding above chance level was achieved with a latency coherent with feedforward-like processing. More intriguingly, the delay required to infer the identity of stimuli with maximum accuracy, especially for broader Bθ, would suggest the involvement of a slower neuronal pathway.
Finding a suitable origin for this late response is easier when examining orientation bandwidth within a predictive coding framework. In predictive coding, predicted and actual sensory states generate prediction errors that update the internal state of the generative model of the world. These prediction errors are weighted in proportion to their precision, such that highly precise prediction errors on the sensory input cause a greater update of the internal model. Now, considering that the bandwidth of our stimuli sets the variance, i.e. the inverse precision, of their orientation distributions, Motion Clouds offer an ideal tool to investigate the role of precision-weighting of sensory inputs. In predictive coding, top-down inference on sensory states is classically assigned to feedback connectivity (33), which is the first putative candidate for the neural substrate of precision modulation. This would imply that the low- and high-precision codes co-exist temporally but not spatially, corresponding respectively to the feedforward sensory input and the feedback precision-weighted prediction. In anesthetized cats, feedback from extrastriate areas modulates the gain of V1 responses without affecting orientation preference (34) but provides contextual modulations of V1 (35), a suitable synonym for precision weighting (36).
Local cortical connectivity in V1 is also likely involved. Recurrent connectivity implements complex computations (37) and maintains tuning to features that cannot be encoded by single neurons (38), making it relevant to the processing of stimuli such as Motion Clouds. In line with this idea, the diversity of single-neuron modulations suggests that heterogeneity in the recurrent synaptic connectivity can sustain resilient orientation tuning in V1 (39). If the implementation of precision weighting comes from this neural pathway, then the increased delay to reach maximum accuracy for broad-bandwidth stimuli would stem from multiple iterations of computations implemented by recurrent interactions, which is coherent with the speed of slowly-conducted horizontal waves in the orientation map (40). Likewise, stimuli of broad orientation bandwidth should recruit more neighboring columns of different preferred orientations, whose inhibitory interactions in the orientation domain would explain the decrease in single-neuron responses, as observed with plaid stimuli (41). This inhibitory process would be well suited to the interpretation of our results in terms of separation of the multiple orientations (Figure 8), which would be performed by local cortical competition.
Whether feedback or recurrent, both interpretations of the results fit the view that the precision of feedforward prediction errors can be assigned to supragranular neurons (42). Consideration should also be given to transthalamic pathways, namely those involving the pulvinar nucleus, whose reciprocal connectivity with the visual hierarchy modulates cortical response functions (43) and is theorized to finesse the precision weighting of visual information (36).
We found no evidence for an explicit neural code for precision that would be independent of orientation, at least using our current decoding paradigm. However, the multi-feature tuning observed between θ and Bθ is a common form of encoding found in the visual cortex (44). This synergistic coding scheme has been shown to improve the representation of a stimulus, notably between motion direction and speed (45). Given the advantage conferred over independent tuning, it is not unexpected that multidimensional tuning would be exploited to form the substrate of precision weighting on sensory variables, which could emerge independently throughout the visual hierarchy. Within V1, the non-simultaneity of our electrophysiological recordings prevents further interpretation of the mechanisms involved in precision processing. Previous publications have reported population decoding by merging neural activity across electrodes or experiments (46, 47), an approach which we validated on our data (Supplementary Figure 5).
Naturalistic stimuli have been used here to shed light on the processing of orientation precision. This problem was framed in the form of an instantaneous snapshot of visual inputs and can now be transposed to natural images and their exploration through saccadic behaviors, which adds a temporal dimension to the orientation distributions (48). Regardless of the involvement of saccades, the functional organization of precision processing in V1 remains an open question. The substrate for cortical dynamics on the scale of hundreds of milliseconds is still ill-defined but, as previously mentioned, such slow temporal events are likely related to large-scale neural activity. Naturally, one can then wonder how precision is processed at the mesoscale level. Could the observed nonlinearities be organized in maps in the primary visual cortex and, if so, how would they relate to the maps of orientation?
DATA AVAILABILITY
The data generated in the present study are available from the corresponding author, H.J.L., upon reasonable request. No publicly available data was used in this study.
CODE AVAILABILITY
Data were analyzed using custom Python code, available at https://github.com/hugoladret, using the following libraries: SciPy (49), scikit-learn (50), numpy (51), PyTorch (52), lmfit (53) and Matplotlib (54).
AUTHOR CONTRIBUTIONS
L.U.P., C.C., F.C. and H.J.L. designed the study. H.J.L., N.C. and L.K. collected the data. H.J.L. and L.U.P. analyzed the data. H.J.L. and L.U.P. wrote the original draft of the manuscript. All authors reviewed and edited the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Materials and Methods
Animal procedures
Experiments were performed on 3 adult cats (3.6–6.0 kg, 2 males). All surgical and experimental procedures were carried out in compliance with the guidelines of the Canadian Council on Animal Care and were approved by the Ethics Committee of the University of Montreal (CDEA #20-006). Animals were first administered atropine (0.1 mg/kg) and acepromazine (Atravet, 1 mg/kg) subcutaneously to reduce the parasympathetic effects of anesthesia and provoke sedation, respectively. Anesthesia was induced with 3.5% isoflurane in a 50:50 (v/v) mixture of O2 and N2O. Isoflurane concentration was maintained at 1.5% during surgical procedures. A tracheotomy was performed and animals were immobilized using an intravenous injection of 2% gallamine triethiodide. Animals were then artificially ventilated and a 1:1 (v/v) solution of 2% gallamine triethiodide (10 mg/kg/h) in 5% dextrose in lactated Ringer's solution was continuously administered to maintain muscle relaxation. Throughout the experiment, the expired level of CO2 was maintained between 35 and 40 mmHg by adjustment of the tidal volume and respiratory rate. Heart rate was monitored and body temperature was maintained at 37°C using a feedback-controlled heated blanket. Dexamethasone (4 mg) was administered intramuscularly every 12 h to reduce cortical swelling. Pupils were dilated using atropine (Mydriacyl) and nictitating membranes were retracted using phenylephrine (Mydfrin). Rigid contact lenses of appropriate power were used to correct the eyes' refraction and eye lubricant was used to avoid corneal dehydration. Lidocaine hydrochloride (2%) was used in all incisions and pressure points. A craniotomy was performed between Horsley-Clarke coordinates 4–8 P; 0.5–2 L to access area 17 (V1) contralateral to the stimulation side. Small durectomies were performed for each electrode penetration. A 2% agar solution in saline was applied over the exposed regions to stabilize recordings and prevent drying of the cortical surface.
Electrophysiological recordings
During recording sessions, anesthesia was changed to 0.5–1% halothane, as isoflurane has been shown to depress visual responses (55). Extracellular activity was recorded using 32-channel linear probes (≈1 MΩ, 1×32-6 mm-100-177, Neuronexus) and acquired at 30 kHz using an Open Ephys acquisition board (56). Single units were isolated using the Kilosort spike sorting software (57) and manually curated using the phy software (58). Clusters with low-amplitude templates or ill-defined margins were excluded from analysis. Furthermore, clusters were excluded if their firing rate dropped below 5 spikes/s for more than 30 seconds or if their tuning curve was poorly fitted by a Von Mises distribution (r2 < .75), leaving 254 putative neurons for further analysis.
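As an illustration, the two exclusion criteria above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the input format (sorted spike times in seconds) and the use of 1-second rate bins are our own choices, not taken from the original analysis code.

```python
import numpy as np

def keep_cluster(spike_times, r_squared, t_end,
                 min_rate=5.0, max_low_dur=30.0, min_r2=0.75):
    """Return True if a cluster passes both exclusion criteria (sketch).

    spike_times : sorted spike times (s); r_squared : Von Mises fit quality;
    t_end : duration of the recording (s).
    """
    # Criterion 1: tuning curve must be well fitted by a Von Mises function.
    if r_squared < min_r2:
        return False
    # Criterion 2: firing rate must not drop below 5 spikes/s for more
    # than 30 consecutive seconds (rate estimated here in 1-s bins).
    edges = np.arange(0.0, t_end + 1.0, 1.0)
    rate, _ = np.histogram(spike_times, bins=edges)  # spikes per 1-s bin
    low = rate < min_rate
    longest, run = 0, 0
    for is_low in low:                 # longest run of low-rate bins
        run = run + 1 if is_low else 0
        longest = max(longest, run)
    return bool(longest <= max_low_dur)
```

For example, a unit firing steadily at 10 spikes/s passes, while a unit that goes silent for most of the recording is rejected.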
Visual Stimulation
Visual stimuli were generated using PsychoPy (59) and were projected monocularly with a PROPixx projector (VPixx Technologies Inc., St-Bruno, QC, Canada) onto an isoluminant screen (Da-Lite screen) located 57 cm from the animal's eye, covering 104° × 79° of visual angle with a mean luminance of 25 cd/m2. The stimuli used here are Motion Clouds (26), a class of band-pass filtered white-noise textures (60) that provide parametric control over the content of the stimuli while retaining the statistics of natural images (6). The envelope of the filters in Fourier space is a Gaussian in the coordinates of the relevant axis, in which it is described by its mean and bandwidth. As such, a Motion Cloud (MC) is defined as:

F(MC)(fx, fy) = G(f) × O(θf)

where F is the Fourier transform, G the spatial frequency envelope and O the orientation envelope. The spatial frequency envelope follows a log-normal distribution:

G(f) = (1/f) × exp(−ln(f/f0)² / (2 ln(1 + Bf/f0)²))

where f0 is the mean spatial frequency and Bf is the bandwidth of the spatial frequency distribution, both in cycles per degree. The orientation envelope is a Von Mises distribution:

O(θf) = exp(cos(2(θf − θ)) / (4Bθ²))

where θf is the angle of (fx, fy) in the Fourier plane, θ is the mean orientation and Bθ is the bandwidth of the orientation distribution. For narrow distributions, this distribution is close to a Gaussian and Bθ measures its standard deviation. The spatial frequency parameters were set at f0 = Bf = 0.9 cpd and the orthogonal drift speed was set to 10°/s, within the response range of area 17 neurons (61). For the orientation envelope, θ was varied in 12 even steps from 0 to π rad and Bθ in 8 even steps down to ≈ 0. All stimuli were displayed at 100% contrast. Each Motion Cloud was displayed for 300 ms, interleaved with the presentation of a mean luminance screen for 150 ms. Trials were fully randomized and each stimulus was presented 15 times.
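A single static frame of such a texture can be sketched by multiplying the two envelopes with a random-phase spectrum and taking an inverse FFT. The envelope forms, the function name and the use of cycles/pixel units are our own assumptions for illustration, not the authors' implementation (the experiments used the Motion Clouds library with parameters in cycles per degree).

```python
import numpy as np

def motion_cloud_frame(n=256, theta=0.0, B_theta=np.pi / 16,
                       sf_0=0.05, B_sf=0.05, seed=0):
    """Sketch of one static Motion Cloud frame (assumed envelope forms)."""
    fx, fy = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    f_r = np.sqrt(fx**2 + fy**2)
    f_r[0, 0] = np.inf                       # suppress the DC component
    theta_f = np.arctan2(fy, fx)
    # Log-normal spatial frequency envelope G(f)
    G = (1 / f_r) * np.exp(-np.log(f_r / sf_0)**2
                           / (2 * np.log(1 + B_sf / sf_0)**2))
    # Von Mises orientation envelope O(theta_f), bandwidth B_theta
    O = np.exp(np.cos(2 * (theta_f - theta)) / (4 * B_theta**2))
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random((n, n)))   # random phases
    img = np.fft.ifft2(G * O * phase).real
    return img / np.abs(img).max()           # normalize to [-1, 1]
```

Increasing `B_theta` broadens the distribution of orientations present in the texture, which is the manipulation used throughout the experiments.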
Single Neuron Analysis
Tuning curves were generated by selecting the 300 ms window maximizing spike-count variance (62), in which the firing rate was averaged and baseline-subtracted. Tuning curves were averaged across drift directions. A Poisson loss function was minimized to fit a Von Mises distribution to the data:

R(θk) = R0 + Rmax × exp(κ(cos(2(θk − θ)) − 1))

where Rmax is the response at the preferred orientation θ, R0 the response at the orthogonal orientation, κ is a measure of concentration and θk the orientation of the stimulus. A local measure of orientation tuning around the peak of the function, the half-width at half height (HWHH) (63), was measured as:

HWHH = (1/2) × arccos(1 − ln(2)/κ)
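A minimal sketch of this fitting procedure is given below, assuming a Nelder-Mead minimization of the Poisson negative log-likelihood; the optimizer choice, initialization and function names are ours, as the authors' exact implementation is not specified.

```python
import numpy as np
from scipy.optimize import minimize

def von_mises(theta_k, R0, Rmax, kappa, theta_pref):
    """Von Mises tuning curve on the orientation domain (period pi)."""
    return R0 + Rmax * np.exp(kappa * (np.cos(2 * (theta_k - theta_pref)) - 1))

def fit_tuning(theta_k, rates):
    """Fit the tuning curve by minimizing a Poisson loss (sketch)."""
    def nll(p):
        R0, Rmax, kappa, mu = p
        lam = np.clip(von_mises(theta_k, R0, Rmax, kappa, mu), 1e-9, None)
        # Poisson negative log-likelihood (up to a constant in the data)
        return np.sum(lam - rates * np.log(lam))
    p0 = [rates.min(), rates.max() - rates.min(), 2.0,
          theta_k[np.argmax(rates)]]
    return minimize(nll, p0, method="Nelder-Mead").x

def hwhh(kappa):
    """Half-width at half height: (1/2) arccos(1 - ln(2)/kappa)."""
    return 0.5 * np.arccos(1 + np.log(0.5) / kappa)
```

On noiseless synthetic data generated from the same model, the fit recovers the tuning curve, and `hwhh` maps the concentration κ onto a half-width in radians.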
A global measure of orientation tuning was also assessed by computing the Circular Variance (CV) of the raw tuning curve:

CV = 1 − |Σk R(θk) × exp(2iθk)| / Σk R(θk)

where R(θk) is the response to a stimulus of angle θk. CV varies from 0 for exceptionally orientation-selective neurons to 1 for untuned neurons (64).
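The computation is direct; a minimal sketch:

```python
import numpy as np

def circular_variance(theta_k, rates):
    """CV = 1 - |sum_k R(theta_k) e^{2i theta_k}| / sum_k R(theta_k)."""
    resultant = np.sum(rates * np.exp(2j * theta_k))
    return 1 - np.abs(resultant) / np.sum(rates)
```

A perfectly flat tuning curve yields CV = 1 (untuned), while a response concentrated at a single orientation yields CV = 0 (maximally selective).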
The evolution of HWHH and CV as functions of Bθ was assessed with a Naka-Rushton function:

f(Bθ) = f0 + fmax × Bθ^n / (Bθ^n + Bθ50^n)

where f0 is the base value, fmax the maximal value, Bθ50 the orientation bandwidth at half fmax and n a strictly positive exponent of the function (27). A Wilcoxon signed-rank test, corrected for continuity, was used to assess the significance of orientation selectivity, by comparing the firing rates at the preferred and orthogonal orientations across all trials. Changes of preferred orientation were assessed as the difference between the preferred orientations measured at Bθ = 0° and at Bθ untuned, where Bθ untuned is the broadest orientation bandwidth at which a neuron remains significantly orientation tuned according to the previous test. Variations of peak amplitude were measured by comparing the firing rate at the preferred orientation for Bθ = 0.0° and for Bθ untuned.
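Fitting this function can be sketched with a standard least-squares routine; the function name, the synthetic data and the initial guesses below are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def naka_rushton(B_theta, f0, fmax, B50, n):
    """f(B) = f0 + fmax * B^n / (B^n + B50^n)  (assumed parameterization)."""
    return f0 + fmax * B_theta**n / (B_theta**n + B50**n)

# Synthetic HWHH-vs-bandwidth values for illustration (degrees)
B = np.linspace(0.0, 36.0, 8)
y = naka_rushton(B, 20.0, 40.0, 15.0, 2.0)
popt, _ = curve_fit(naka_rushton, B, y,
                    p0=[15.0, 30.0, 10.0, 1.5], maxfev=10000)
```

At Bθ = 0 the function reduces to the base value f0, and it saturates toward f0 + fmax for large bandwidths, with Bθ50 marking the half-rise point.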
Population Decoding
The stimulus identity was decoded using a multinomial logistic regression classifier (28), which can be interpreted as an instantaneous, neurally plausible readout of the population activity (24). For a given stimulus, the population activity was a vector X(t) = [X1(t) X2(t) … X254(t)], where Xi(t) is the spike count of neuron i in the time window [t; t + ΔT], with t the time bin and ΔT the size of the integration window, usually 100 ms. Time t was varied from −200 ms to 400 ms in steps of 10 ms, relative to stimulation onset. Each time bin was labeled by the end of its time window and was hence reported as t + ΔT.
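Building one such population vector can be sketched as follows (the input format, a list of spike-time arrays in seconds, is an assumption):

```python
import numpy as np

def population_vector(spike_trains, t, dT=0.1):
    """Spike-count vector X(t) = [X_1(t), ..., X_N(t)] in window [t, t+dT).

    spike_trains: list of arrays of spike times (s), one per neuron.
    """
    return np.array([np.sum((st >= t) & (st < t + dT)) for st in spike_trains])

# Sliding the window from -200 ms to 400 ms in 10-ms steps:
bins = np.arange(-0.2, 0.4 + 1e-9, 0.01)
# e.g. X_all = np.stack([population_vector(trains, t) for t in bins])
```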
The multinomial logistic regression is an extension of the binary logistic regression (28), trained to classify the neural activity vector among K possible classes. The probability that such a vector belongs to a given class is:

P(y = k | X) = exp(⟨βk, X⟩) / Σj exp(⟨βj, X⟩)

where ⟨·,·⟩ is the scalar product over the different neurons, k = 1, …, K is the class out of K possible values and βk are the vectors of learned coefficients of the classifier. We trained several such classifiers to decode orientation θ (K = 12), orientation bandwidth Bθ (K = 8) or both (K = 12 × 8 = 96). All meta-parameters (the time window size, penalty type, regularization strength and train/test split size) were controlled to show that the decoder is optimally parameterized and that its results depend on the experimental data, not on the decoder's parameterization (Supplementary Figure 4).
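A sketch of such a decoder using scikit-learn's `LogisticRegression`, which implements the multinomial (softmax) model above for multiclass targets; the synthetic Poisson counts below merely stand in for the recorded data and are not the authors' dataset or pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 254 neurons, K = 12 orientations, 15 trials each
rng = np.random.default_rng(0)
K, n_neurons, n_trials = 12, 254, 15
labels = np.repeat(np.arange(K), n_trials)
tuning = rng.random((K, n_neurons)) * 10      # class-dependent mean rates
X = rng.poisson(tuning[labels])               # Poisson spike counts

X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels)
clf = LogisticRegression(max_iter=2000)       # softmax over the K classes
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)               # P(y = k | X), rows sum to 1
```

`predict_proba` returns exactly the class probabilities of the softmax equation above, one row per test trial.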
Decoding accuracy was reported as the average accuracy across classes, also known as the balanced accuracy score (65), which accounts for possible imbalances in the learning or testing dataset. Decoding performance was also reported with confusion matrices, in which the value at row i and column j represents the normalized number of times a stimulus of class k = i was predicted to belong to class k = j. Hence, a perfect decoder would produce a purely diagonal confusion matrix (identity matrix). When reporting a confusion matrix, color map values were clipped between chance level and maximum accuracy. The temporal evolution of the balanced accuracy score corresponds to the mean of the diagonal of the confusion matrix at each time bin. We assessed the significance of the decoders by splitting the population activity into separate training and testing datasets, performing 6 such splits and comparing the resulting confusion matrices to chance-level decoding matrices.
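The relation between the balanced accuracy score and the diagonal of the row-normalized confusion matrix can be checked directly with scikit-learn; the toy labels below are for illustration only.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0])

# Row-normalized confusion matrix: entry (i, j) is the fraction of
# class-i stimuli predicted as class j.
cm = confusion_matrix(y_true, y_pred, normalize="true")
# Balanced accuracy = mean of per-class recalls = mean of the diagonal.
bal_acc = balanced_accuracy_score(y_true, y_pred)
```

Here the per-class recalls are 2/3, 1 and 3/4, so the balanced accuracy is their mean, ≈ 0.806.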
Temporal Generalization
Temporal generalization assessed the dynamics of the neural code by training and testing the decoder at different time bins (29). The resulting matrix displays the capacity of a decoder to generalize, across two time points, a structure it learned from the data. The diagonal, where ttrain = ttest, is the normal time course of the decoder's accuracy. The upper half of the matrix corresponds to transposition points where the code is tested at an earlier time than it was trained, and conversely for the lower half. Hence, an asymmetric matrix indicates a neural code that is either transient, transitioning between representations (lower half > upper half, large contoured regions), or stable through the trial (upper half > lower half, sparse contoured regions). Where significance is reported, 6 different train/test splits were performed for each transposition, and significance was computed on the difference between symmetrical points around the identity line using a permutation test (1,000 permutations).
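The train-at-one-bin, test-at-every-bin procedure can be sketched as below; the array layout, split ratio and function name are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def temporal_generalization(X_time, y, train_frac=0.7, seed=0):
    """Train a decoder at each time bin and test it at every bin (sketch).

    X_time: array (n_bins, n_trials, n_neurons); y: labels (n_trials,).
    Returns the (t_train, t_test) accuracy matrix; its diagonal is the
    usual time course of decoding accuracy.
    """
    n_bins, n_trials, _ = X_time.shape
    idx = np.random.default_rng(seed).permutation(n_trials)
    n_tr = int(train_frac * n_trials)
    tr, te = idx[:n_tr], idx[n_tr:]
    acc = np.zeros((n_bins, n_bins))
    for t_train in range(n_bins):
        clf = LogisticRegression(max_iter=2000)
        clf.fit(X_time[t_train][tr], y[tr])
        for t_test in range(n_bins):
            acc[t_train, t_test] = clf.score(X_time[t_test][te], y[te])
    return acc
```

With synthetic data in which one time bin is pure noise and later bins carry the same class information, the matrix is high wherever informative bins are paired and near chance elsewhere.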
ACKNOWLEDGEMENTS
This article is dedicated to the memory of Umit Keysan (1992-2019), a dear colleague and wonderful friend. The authors would like to thank Genevieve Cyr for her technical assistance, Bruno Oliveira Ferreira de Souza and Visou Ady for advice regarding experimental procedures, as well as Louis Eparvier and Jean-Nicolas Jérémie for their comments on the manuscript, and Jonathan Vacher for fruitful exchanges on the formalization of the generation of synthetic images and for his contributions to the analysis of neurophysiological recordings. This work was supported by ANR project “Horizontal-V1” ANR-17-CE37-0006 and by a CIHR grant to C.C (PJT-148959). H.J.L. was funded by an Ecole Doctorale 62 PhD grant.
Footnotes
Extensive manuscript rewrite: Figure 2 revised; added Figure 7 and Figure 8; Discussion revised.