Abstract
Tuning curves are the functions that relate the responses of sensory neurons to various values within one continuous stimulus dimension (such as the orientation of a bar in the visual domain or the frequency of a tone in the auditory domain). They are commonly determined by fitting a model e.g. a Gaussian or other bell-shaped curves to the measured responses to a small subset of discrete stimuli in the relevant dimension. However, as neuronal responses are irregular and experimental measurements noisy, it is often difficult to determine reliably the appropriate model from the data. We illustrate this general problem by fitting diverse models to representative recordings from area MT in rhesus monkey visual cortex during multiple attentional tasks involving complex composite stimuli. We find that all models can be well-fitted, that the best model generally varies between neurons and that statistical comparisons between neuronal responses across different experimental conditions are affected quantitatively and qualitatively by specific model choices. As a robust alternative to an often arbitrary model selection, we introduce a model-free approach, in which features of interest are extracted directly from the measured response data without the need of fitting any model. In our attentional datasets, we demonstrate that data-driven methods provide descriptions of tuning curve features such as preferred stimulus direction or attentional gain modulations which are in agreement with fit-based approaches when a good fit exists. Furthermore, these methods naturally extend to the frequent cases of uncertain model selection. We show that model-free approaches can identify attentional modulation patterns, such as general alterations of the irregular shape of tuning curves, which cannot be captured by fitting stereotyped conventional models. Finally, by comparing datasets across different conditions, we demonstrate effects of attention that are cell-and even stimulus-specific. Based on these proofs-of-concept, we conclude that our data-driven methods can reliably extract relevant tuning information from neuronal recordings, including cells whose seemingly haphazard response curves defy conventional fitting approaches.
Introduction
Tuning curves represent a sensory neuron’s response profile to a continuous stimuli parameter (such as orientation, direction of motion, or spatial frequency in the visual domain) and are an ubiquitous tool in neuroscience. In order to describe such selectivities in simple terms, tuning curves are commonly modeled by fitting suitable shape functions to the data, such as, for example, Gaussian distributions [1–4], arbitrary polynomials [5] or generic Fourier series [6] for orientation and direction tuning curves, or other smooth functions like splines [7] for non bell-shaped tuning profiles.
However, the measured tuning curve shapes are tremendously variable and they often deviate from the assumed reference shape [8–11]. More than thirty years ago, De Valois, Yund and Hepler already observed that no single function could adequately describe the tuned response of all cells [8] they had recorded. A few years later, Swindale found that several models fit his data equally well and further questioned the existence of a single all-encompassing model, observing, moreover, that the preferred stimulus deduced from a fitted tuning curve depended on the chosen model [9]. These and other studies thus manifest that the choice of a model to fit can affect the conclusions reached about tuning properties and their contextual modulations.
Here, we will first systematically investigate the problem of ambiguous model selection, highlighting the possible consequences of the choice of a “wrong” model. We will then introduce alternative methods which allow the extraction of features of interest directly from the measured neuronal responses, without the need of fitting any model to the empirical neuronal responses to a small subset of possible values from the continuous stimulus dimension. To illustrate the applicability and the heuristic power of our data-driven methods, we will carry out analyses of attentional modulation effects in single unit recordings in the middle temporal visual area (MT) of four rhesus monkeys, where neurons exhibit characteristic direction selective responses to moving visual stimuli [12, 13]. We will consider responses to stimuli consisting of either one or two random dot patterns (RDPs) in the receptive field where the two RDPs could be either spatially separated or transparently superimposed. In these experimental paradigms, tuning curves are expected to display either one or two peaks. The expected effects of attention include gain modulations leading to changes in the amplitude of these response peaks. However, due to the high number of experimental conditions and the difficulty of the animal’s behavioral task, only relatively few trials could be recorded for each stimulus. This limited sampling, combined with the heterogeneity of response profiles, make the measured tuning curves very “noisy”. The dataset, thus, besides its intrinsic interest, provides a perfect test-bed to reveal the drawbacks of fitting techniques and to benchmark alternative methods. We will discuss two complementary strategies in detail.
First, we will parse the trial-averaged responses of single neurons to obtain, through a set of algorithmic rules, a list of features characterizing the neuron’s response profile. For instance, a set of rules will be used to estimate the direction of a cell’s preferred stimulus, solely based on the data points themselves. Analogously, other rules will be used to capture into suitable index quantities generic variations of the average response profile of a cell across different experimental conditions (e.g. different types of stimuli, or targets of attention). As a shared prerequisite, all these rules must be able to operate by just receiving as input the set of average responses to each of the different stimuli from the small set used in the experiment. Although such an approach might seem too coarse compared to continuous interpolations, we will show an excellent agreement between the conclusions reached by feature extraction and fitting methods, whenever reliable model-based estimates can be derived. In addition, the same feature extraction rules will straightforwardly generalize even to the most irregular tuning curves, for which model fitting would be questionable. By the same token, the unbounded flexibility in rule design will allow the extraction of ad hoc features, revealing aspects of shape and shape change which elude a parameterization in terms of conventional fitted models.
Second, we will make use of the full information conveyed by the stochasticity of individual trials. For each neuron and for each different stimulus direction we will quantify the distribution of responses across trials, and compare them across different experimental conditions. This approach will show that attention frequently significantly modulates the response of a cell only for specific stimulus directions rather than a general modulation across the whole tuning profile.
Thus, through proof-of-concept analyses, we demonstrate the potential of data-driven methods for harnessing the heterogeneity of tuned responses. We prove that our approach robustly captures complex effects of attention, avoiding excessive reliance on illustrative well-behaved cases and circumventing the narrow constraints exerted by a fitting framework.
Results
The experiment: attentional influences on single-cell responses to composite stimuli
To illustrate drawbacks with fitting approaches and test novel methodology on a concrete representative example, we focused on extra-cellular recordings of single neurons from area MT in the visual cortex of rhesus monkeys [14, 15]. The aim of these recordings was to investigate how attention affects the tuning curves obtained by the simultaneous presentation of two directions of motion within the receptive field of a given neuron.
Neuronal responses were recorded under different conditions (Fig. 1A–E; see Methods for details). In the first experimental paradigm (Fig. 1A–B), random dot patterns (RDPs) moving within two spatially separated stationary apertures were used. They were sized and positioned for each recorded neuron to fit within its classical receptive field (RF). In a second experimental paradigm (Fig. 1C–D), both RDPs were fully overlapping. This single aperture contained two sets of random dots, moving transparently in two directions of motion and covered most of the RF. In all conditions a fixed angular separation of 120° between the two RDPs was used.
In each of the two paradigms, attention could be directed either to a fixation spot outside the recorded RF (condition “afix”), or to one of the RDPs inside the RF (condition “ain”). In addition, the study included conditions in which only one of the two RDPs was shown while attention was directed to the fixation spot (condition “uni”; see Fig. 1E for an explanatory cartoon of a spatially separated “uni” trial). Overall, data from six different experimental configurations (spatially separated afix, ain and uni; transparent afix, ain and uni) were analyzed (see also Materials and Methods).
Based on previous studies the “uni” experiments should generate unimodal tuning curves, while 120° separation between the spatially separated and transparent “afix” and “ain” stimuli should result in bimodal tuning curves [16], with the peaks occurring whenever one of the two stimulus components moved in the cell’s preferred direction. Fig. 1F shows an example tuning curve from the spatially separated afix condition. Responses were sampled using 12 different directions across the full 360° range with an angular resolution of 30°. This sampling resolution is typical, as direction-tuning curves are most frequently assessed using 8 or 12 evenly spaced directions to account for the constraints of a behavioral paradigm where the number of trials that can be run is limited. Note that for plotting, stimulus directions were aligned for each cell such that the attended directino in the 240° stimulus was moving in the cell’s preferred direction.
“Noisy” tuning curves: not one model to rule them all
To model bell-shaped tuning curves, Gaussian curves are commonly used [1–4] and they are usually wrapped, due to the circular nature of the fitted data [17]. This is exemplified in Fig. 1F where we fitted a sum of two Gaussians (brown curve) to a bimodal dataset. In this case, the fit looks adapted to the data, at least according to visual inspection. However, not all the cases are equally “well-behaved”, as evinced, e.g., by comparing the tuning curve of Fig. 1F with a second example in Fig. 2A.
Two aspects need to be emphasized. First, the coarse sampling of the response profile because of the 30◦ separation between measurement points and second, the large trial-by-trial response variability to a given stimulus, visualized by the error bars.
Indeed, the estimated coefficient of variation (i.e. the standard deviation in units of the mean) has a median of 38 % for the spatially separated paradigm and of 72 % for the transparent paradigm, as shown by Fig. 2B. Given this large uncertainty in the data it is thus not surprising that a “well fitted” linear combination of Gaussians lies within the error bars. However, within these large ranges of uncertainty, fitted curves obtained from model functions other than Gaussians could be accommodated as well, and there is no general a priori argument that a Gaussian model is the best suited model for such tuning curves.
To corroborate our intuition, we fitted eight different model functions to our data, testing for their compatibility with Gaussian and non-Gaussian shapes (Fig. 3). The used functions were a wrapped Gaussian (brown color) [9], a wrapped Cauchy function (yellow) [17], a symmetric Beta function (violet) [18], a wrapped generalized bell-shaped membership function (pink) [19], a von Mises function [9, 17] (orange), and Fourier series of order 2 (red), 3 (blue) and 4 (green) [6]. Details on the used models are provided in the Materials and Methods section. In Fig. 3A, we show an example neuron for which all tested models provided reasonably looking fits at visual inspection (in the transparent attend fix condition).
We rigorously quantified goodness-of-fit by evaluating a score Q, giving the null hypothesis probability to observe by mere chance a sum of squared errors larger than in our fit. Small values of Q thus indicate poor fits (see Materials and Methods). We termed a fit “good” whenever Q > 0.1 but even lower values have been considered acceptable elsewhere [20]. We computed goodness-of-fit for every model type and for every cell, and we assessed the fraction of cells for which each given model type provided a fit deemed to be good. Fig. 3B shows that—at least according to the Q measure—nearly all cells could be well fitted by every model function, in every experimental condition. This statement still holds even when adopting more conservative thresholds for goodness-of-fit testing. As detailed in Supporting S1 Fig, even for threshold criteria as stringent as Q > 0.7, nearly all models provided good fits for more than 80% of the cells in most conditions.
Thus, for a majority of cells, goodness-of-fit alone was not enough to select the best model. We therefore adopted a model comparison approach and calculated the Akaike information criterion AIC [21] for different models (see Materials and Methods).
Smaller AIC values indicate better reproduction of the data by the model. Differences between AIC values for different models quantify how much information is lost describing the data with the model with a higher AIC value compared to the model with a smaller AIC value. Let, for a given cell, AICmin be the minimum AIC value across all tested models. We then compute for each model m the relative information loss ΔAIC(m) = AIC(m) AICmin. Hence, for the best model ΔAIC = 0.
However—as a conventional rule of thumb—models with a small increase (ΔAIC < 1) should not be ruled out by model comparison but rather considered as equally good contenders for the “best fit” [22].
We show in Fig. 3C, the fraction of cells for which every tested model was evaluated as the “relative best”, i.e. obtained a value of ΔAIC = 0, as well as the fraction of cells in which it scored as an “equally good contender” with ΔAIC < 1. The outcome was qualitatively similar across all conditions. There was not a single model which scored systematically at the top for all cells, but each of the eight tested model types scored ΔAIC = 0 at least for a fraction of neurons. Even considering the softer criterion of ΔAIC < 1, none of the models appeared to be good enough to be used to fit all cells. Interestingly, for all experimental paradigms, Gaussian fits only quite rarely scored as the relative best (6–15 %, depending on paradigm and condition). On the contrary, Fourier series fits were the more frequent winners (72–100 %, taken together Fourier series of all used orders).
Some studies suggest that when the number of available samples is small, the corrected Akaike information criterion AICc [22] (see Materials and Methods) should be preferred to the AIC. This AICc penalizes models with a larger number of parameters more then the AIC already does (i.e. it implements a sharper “Occam’s razor”). We thus repeated the same analysis of Fig. 3C replacing the AIC with the AICc. Results are presented by the Supporting S2 FigA, which also presents the full statistical distributions of the observed AIC and AICc values (S2 FigB-C). It turned out that the best model according to AICc was always one with few parameters: either four or five in the uni condition and five in the afix and ain conditions. This indicates that models with only few parameters are enough to describe our highly irregular data. While no clear winner emerged in the unidirectional paradigm, the second order Fourier series fit model clearly outperformed the other models in the bidirectional paradigms, due to its reasonable fidelity in rendering the shapes of the measured tuning curves, combined with a smaller number of parameters.
In summary, model comparisons show that no single model can fit all cells equally well and that more than one model should be used when looking for the continuous interpolation of discretely sampled noisy tuning curves. Among parametric models tested (using both the AIC and the AICc criterion), Fourier series (rather than the commonly used Gaussian curves) tended to be the relative best in a larger number of cases. This reflects the substantial diversity of tuning curve shapes present in our representative dataset, since Fourier series do not have a single shape, but can faithfully render very dissimilar circularly wrapped profiles.
Intermezzo: how to compare the shapes of different parametric fits
When fitting a model to data-points (X, Y) (such as a Gaussian profile Y(X, a, b, c, d) = a exp(.5 (X–c)2/b2) + d), the set of the parameters of the model (a, b, c, d in this case) form a vector of descriptors of the shape of the relation linking X and Y. In the case of the Gaussian, indeed, the parameter a represents the peak amplitude, b the peak width, c the x-position of the peak amplitude and d the baseline level. For this reason, the shape of tuning curves and its alterations have often been analyzed in terms of the values of the parameters of a fitted model [3, 23]. However comparisons between fitted parameters are feasible only if a same underlying parametric model is used to fit tuning curves for all cells and in all experimental conditions. We have seen, on the contrary, that selecting a unique all-encompassing model to describe tuning might not be the best choice. Furthermore, fitted parameters do not have a direct geometric interpretation for every model. For instance, for Fourier series models, small changes of the internal parameters can lead to large changes in the resulting shape.
In order to compare between tuning curve shapes generated by different models in an intuitive way, we introduced generalized descriptive features that do not correspond to model parameters, but are extracted directly from the fitted curves through appropriate algorithmic rules (Fig. 4A).
The first step for the extraction of descriptive features is to select a certain number of points sampled along a fitted tuning curve profile and to note their coordinates (X, Y). Since the fitted profile is continuous, the number of selected points can be made arbitrarily large (unlike the number of actual measured data-points), but it is important to stress that the feature extraction approach that we introduce here always operates on a discrete set of points (a fact that will later allow us to apply it directly to the data-points themselves).
The second step is to apply the desired feature extraction rule to the sampled points. Consider, for example, a unimodal tuning curve, for which we could extract a feature GlobalMaximum, by finding the maximum Y coordinate among the points sampled along the profile. Analogously, we could introduce a feature MaximumAngle, corresponding to the preferred direction, by finding the X coordinate of the point whose Y coordinate corresponds to the feature GlobalMaximum. These and other features can easily be generalized to the case of bimodal tuning curves (cf. Fig. 4B). In this case the evaluation rules would be modified to limit the search just to the right (left) half of the sampled points to identify peak positions and amplitude of the right (left) peak respectively. Note that these features are based on points sampled along the fitted profiles, rather than on the parameters of these fitted profiles, which they reflect therefore only in a highly indirect manner.
Describing the extraction of a peak amplitude and position as a feature extraction procedure could be seen as a (generally non-linear) projection method [24] through which a continuous—or, at least, finely discretized—tuning curve shape is converted into a vector with a much smaller finite number of entries. This might seem unnecessarily complex, however there are several good reasons for doing so.
First, computed features may but do not need to mirror classic model parameters. For example, while the feature MaximumAngle would be equivalent to the parameter c in the case of a Gaussian fit, Fourier series don’t have any parameter directly reflecting peak position. More generally, all kind of convenient features can be designed, independent of the underlying model and tailored to describe specific shape aspects, such as, e.g., in a bimodal tuning curve, the position InnerMinimumAngle of the location of the lowest response between the two maxima —not necessarily centered between the two peaks— or the shape of the peak themselves whose OuterWidth and InnerWidth could differ, indicative of skewness or asymmetries. Table 1 gives a list of selected features, Supporting Tables S1-S4 provide the full list of features that we used as well as the detailed algorithm by which they were computed.
Second, as features are detached from the model itself, they allow a straightforward comparison of aspects of the tuning curves between parametrically incompatible models, which was a central aim in our study.
Third, as already anticipated, feature extraction rules can be applied directly—or with only minor modifications—to the observed data points themselves (Fig. 4C), without need of previously interpolating any continuous model curve. We will apply this direct approach, after completing our discussion of drawbacks inherent to model selection, by using feature extraction as a tool for comparing different fits.
Different models can lead to different quantitative and qualitative results
As indicated above, Fig. 3A shows an example for which all eight model functions could be reasonably well fitted to the measured neuronal data. We parsed these eight fitted curves based on a common set of feature extraction rules (see Tables 1 and S1-S4). We then compared the extracted shape features between the models. In Fig. 3A the position of the right and left peak and the inter-peak minimum are highlighted, respectively, by circle, diamond and square symbols. Across the model fits the positions of the left peak differed by up to 35° (10 % of 360°) and the corresponding firing rates by up to 2 Hz (15 % of the maximum trial-averaged firing rate for the afix condition in this cell); the position of the inter-peak minimum varied by 90° (25 %), their firing rates by 5 Hz (31 %); the positions of the right peak by 64◦ (18 %), their firing rates by 5 Hz (35 %). The example neuron of Fig. 3A thus shows substantial differences between the shape features inferred when applying different models.
These differences at the level of a single cell generalized to a large fraction of cells in the dataset, creating significant differences between models at the population level. For each of the eight models we computed the distributions of the values of different features across cells. In addition we extracted features from a ninth model, denoted the best model (bM). In this bM case, we selected the best among the eight tested fits, as indicated by the ΔAIC = 0 criterion on a cell-by-cell basis, and for each cell we extracted features out of its specific best fit. Comparisons among the eight models and the ninth bM model are shown in Fig. 5A for nine selected features. In these nine matrices, a red entry indicates that the statistical comparison between the median values of the extracted features are significantly different between the two corresponding models (Kruskal-Wallis test, p < 0.05), while a blue entry denotes an agreement between the two models. Entries below and above the diagonal refer to feature comparisons for the spatially separated and the transparent paradigms, respectively.
For some features, the median value did not change significantly for most model comparisons (for example the global minimum—GlobalMinimum—in all conditions of the spatially separated paradigm). For other features, there were marked differences between Fourier series and bM on the one hand, and the remaining models on the other hand (for example the circular variance, CircularVariance in the uni condition; but also the feature GlobalMinimum for the transparent paradigm). For yet other features, almost every model yielded different results (for example: the differences between the skewnesses, ΔSkewness, of the left and right peak in afix and ain; the 75 %-bandwidth of the left peak, , in afix; and the 75 %-bandwidth in the uni condition Bandwidth75 %). Importantly, in many cases there was a difference between bM and some of the other models.
Altogether, Fig. 5A demonstrates that the choice of model instead of another makes a difference, since it may lead to quantitative changes in the evaluated features.
However, it would be even more severe if these quantitative changes in the evaluation of specific features led to divergent qualitative conclusions on the comparison between experimental conditions. For instance, when studying attentional modulation effects, it is important to compare tuning curve peak amplitudes among, e.g. an afix and an ain condition. Say, for illustration, that we estimate in the afix condition, a median Maximum feature value of 30 Hz based on fits with the i-th model function, and a different median value of 40 Hz based on fits with the j-th model function. Let’s suppose that the median values for the i-th and j-th model in the ain condition read 42 and 45 Hz, respectively. Besides quantitative differences, it might happen that based on the i-th model we conclude that attention has led to a significant increase of the Maximum feature, but that this same comparison between the afix and ain conditions is not significant based on the j-th model. Thus, we would reach different conclusions on the effects of attention, depending on the chosen model. To check systematically for qualitative deviations in inter-condition comparisons, we performed comparisons between a large number of relevant feature pairs estimated from the nine different models (the eight tested model fit functions, supplemented by the bM model). We analyzed inter-model consistency for three different categories of comparisons: a feature from the spatially separate paradigm and the same feature from the transparent paradigm (this is possible for all defined features); a same feature taken from two conditions, e.g. uni vs afix, or afix vs ain (viable whenever the feature is defined for both conditions); or, two comparable features from a same condition (a list is given in supplementary table S5), e. g. Maximumleft vs Maximumright for the peak firing rates of the two peaks of a bidirectional condition. For each tested feature pair we counted the number of fitted models for which the comparison was significant.
The histogram of these counts is reported, for different categories of feature pairs, in Fig. 5B. All histograms display a marked bimodal structure with two modes at the zero and nine counts values. These modes correspond, respectively, to the cases of complete agreement between models, i.e of a comparison which is never or always significant.
Since both these two cases were the most frequent, there was a robust tendency toward a qualitative agreement between the conclusions of different models. Crucially though, the gap between the two modes of these histograms was not empty, but there were frequent cases in which the significance of comparisons between two features in a pair depended on the adopted model. Thus, for all these feature pairs, the choice of a specific model for fitting tuning curves would have led to qualitatively divergent conclusions about the effects of attention. In particular, the reached conclusion might differ from the one drawn from the bM model, the one which was constructed as optimal on a cell-by-cell basis.
This makes it advisable to always fit tuning curves based on a bM mixture of models. However, the bM approach is particularly cumbersome to calculate. Furthermore, it is ill-defined. Indeed, given the high heterogeneity in the data, it is plausible that adding even more models to the list of candidates among which to perform the bM choice would lead to further qualitative differences. It seems therefore necessary to devise alternative strategies which completely avoid the questionable step of model selection itself.
Feature extraction revisited: the direct method
Rather than relying on the extraction of tuning parameters from fitted data as illustrated above, rules for feature extraction can be generalized to operate on the experimental data points themselves. The main difference between a fitted profile and the empirical data points is a coarse angular resolution of experimental measurements that might potentially lead to a loss of precision of the extracted feature values.
However, this is a quantitative, not a qualitative difference, that does not prevent the application of the rule, as illustrated by Fig. 4C. We therefore extracted shape-describing features directly from the data for all cells in all experimental conditions and compared them with matching features estimated from different model-based fits.
Fig. 6 depicts the results of this comparison and Table 2 reports selected feature values obtained from the direct method and the corresponding values from the bM model (see Tables S6-S7 for a comprehensive list). We first focus on qualitative differences between the direct and the model-based approaches (Fig. 6A), before delving into quantitative differences (Fig. 6B). Fig. 6A follows an approach similar to Fig. 5B, however we now built distinct histograms for feature pairs which are significantly different based on the direct method (blue histogram) and feature pairs which are not significantly different based on the direct method (green histogram). As in Fig. 5B, we counted the number of fitted models with significant with feature differences. The blue (green) histograms—peaking at the maximum (minimum) model value count—indicate that when the direct method found a feature comparison to be significant (not significant), the most frequent case was that all nine (none of the) tested models also reached the same conclusion. Concomitantly, the left (right) tails of these histograms fell off quickly.
Beyond the qualitative agreement, we also checked for quantitative agreements between features extracted by the direct and the model-based methods. We computed for each model i and for each feature F a z-score variable zi(F) = (mean(F)i mean(F)direct) / std(F)direct. The distributions of these z-scores over all the different features, for each different model function are plotted in Fig. 6B. For different experimental conditions and for all the model functions, the z-score distributions are centered on zero, indicating quantitative agreement between the direct and model-based methods. For the spatially separated paradigm all models gave results particularly close to the “direct” method (the difference was smaller than one standard deviation in almost all cases, i. e. z lay between -1 and 1, for at least 95 % of the extracted features). A relatively weaker quantitative agreement was observed for the transparent paradigm. For this paradigm, the Fourier series and the von Mises models gave the best agreement to the direct method. However, even in the case of the wrapped Cauchy model, which gave the worst agreement with the direct method, 88 % of the features had z-scores between -1 and 1.
In conclusion, we observed a qualitative and quantitative agreement between the direct method and the tested models. But note that the direct method makes less assumptions on the expected shape of tuning curves and does not conceal their heterogeneity.
The direct method in action: Tuning curve modulations
After focusing on methodological aspects, we will now turn to the effects of different experimental paradigms, attentional conditions, and number of stimuli on the tuning.
We will concentrate on a narrow selection of significant feature variations (Kruskal-Wallist test with p < 0.05) revealed by the direct method, performing comparisons between conditions both within the same experimental paradigm and between different paradigms (Table 2 and Fig. 7). Supporting Tables S8-S9 provide then a complete list of significantly different feature pairs evaluated with the direct method and the best model.
First, we evaluated tuning curves when one or two unattended stimuli were present in the receptive field, i. e. in the afix and uni conditions where attention was directed outside the receptive field (RF).
For the spatially separated paradigm, the peaks in the afix condition were smaller than in the uni condition. We monitored peak elevation over the baseline using the ad hoc engineered features PEAKTOPEAKright and NORMALIZEDPEAKTOPEAKright (analogous results hold for the left peak). These features quantify for each cell the variation between the right peak’s maximum and the response’s global minimum, which is normalized for the latter feature by the maximum firing rate in the uni condition (which is aligned, by convention, such that its peak overlaps the right peak in the afix condition). This NORMALIZEDPEAKTOPEAKright feature decreased from 0.86 in the uni condition to 0.68 in the bidirectional stimulus afix condition (we report, here and in the following, sample median values). The same trend also held in the transparent condition, where NORMALIZEDPEAKTOPEAKright decreased from 0.84 to 0.55, when superposing a second stimulus within the RF.
As detailed in the Methods section, in the spatially separate paradigm the uni condition was measured with attention directed to the fixation spot, whereas in the transparent case it was taken to be the cue-period of the ain condition, that is, attention was directed to the stimulus during the measurement. Accordingly, features regarding uni conditions cannot be compared between the two paradigms. Please note that this is due to the experimental design and does not limit the applicability of the method.
No significant differences were found between the amplitudes of the two peaks present in the afix condition, within both the spatially separated and the transparent paradigms, as monitored by the features MAXIMUMleft and MAXIMUMright.
We then evaluated the effects on tuning curves when deploying attention into the RF, i.e. in the ain condition.
We first monitored the emergence of amplitude differences between the two peaks of the tuning curve, computing, for instance, the feature ΔPEAKTOPEAK, i.e. the difference between PEAKTOPEAKright and PEAKTOPEAKleft. In the spatially separated paradigm, there was a significant increase of the amplitude difference between the attended and unattended peaks, with ΔPeakToPeak rising from 4 Hz in the afix up to 9 Hz in the ain condition, as a combined effect of a decrease of the left peak (NORMALIZEDPEAKTOPEAK changes from 0.6 to 0.5) and an increase of the right peak (from 0.68 to 0.76). In contrast, in the transparent paradigm, both peaks increased as an effect of attention (median NORMALIZEDPeakToPeak changed from 0.51 to 0.64 and from 0.55 to 0.70, for the left and right peaks, respectively). The different effect of attention on peak amplitudes in the spatially separate and transparent paradigms is also illustrated by scatter plots of the NORMALIZEDPeakToPeak in the ain vs the afix condition (Figs. 7A-B), where the cloud of points lies slightly below the diagonal for the unattended (left) peak in the spatially separated paradigm and slightly above it for the attended (right) peak in the spatially separated paradigm and for both peaks in the transparent paradigm.
Besides analyses of the amplitude and width of tuning curve peaks, the feature extraction approach allows the investigation of more general alterations in the response profile. The general shape of the attended peak differed between the spatially separate and the transparent paradigms. In particular the right peak was flatter (more platykurtic) in the spatially separate than in the transparent paradigm, as revealed by the median values of the feature (see Table 1 for its definition), respectively, of 90◦ versus 60◦. A usually unreported effect of attention is illustrated in Fig. 7C, where we analyze variations of the InnerWidth features, i.e. the (absolute values of the) angular distance between each peak and the minimum between the peaks. Usually similar for both peaks in the afix condition, differences between the INNERWIDTH feature for the attended and the unattended peaks may signal interaction phenomena, such as, e.g., a tendency for the attended peak to re-absorb the unattended peak. For the spatially separated paradigm, the difference between the left and right INNERWIDTH, given by the compound feature ΔINNERWIDTH, increased significantly from 0° to 30°. On the contrary, no significant change was observed for the transparent paradigm. Note that the values of ΔINNERWIDTH are discretely quantized due to the coarse angular resolution of our measurements and the lack of interpolation in the direct method.
Conversely, the firing rate at the minimum between the peaks, monitored by the ad hoc feature normalizedInnerMinimum, increased significantly from 0.32 to 0.41, only for the transparent paradigm. For the spatially separated paradigm a trend in the same direction was also present, but was not significant.
Together these effects denote different shape alteration typologies for the two paradigms, which represent an asymmetric expansion of the attended at the expense of the unattended peak in the spatially separated paradigm and a symmetric, growth of both peaks for the transparent paradigm, increasing responses in the inter-peak dip.
In conclusion, the composition of multiple stimuli and the attentional state affected general global aspects of tuning curves, inducing characteristic and significant patterns of changes. The direct method allowed to isolate known effects of attention without need to resorting to any fit, and identified different patterns of attentional modulation for the two tested experimental paradigms. It also cast light on usually neglected aspects of tuning curve shapes such as peak asymmetries, which experimental condition and attention can also modulate, besides the most commonly studied effects on peak amplitude and width.
Cell-and stimulus-specific aspects of attentional modulation
So far, the analyses have been based on responses averaged across all trials available for a given stimulus. While such an approach is very common, it is a simplification. Indeed, neuronal responses fluctuate strongly from trial to trial, which may be functionally relevant [25–28]. We therefore compared the distributions of responses across different attentional conditions, in a cell-by-cell and stimulus angle-by-stimulus angle fashion.
The cartoon in Fig. 8A illustrates this approach for a single cell. In the plot, each dots corresponds to the response of the cell in a given trial. The two experimental conditions (e.g., afix vs ain in the spatially separated paradigm) are represented in different colors. Comparing responses between any two conditions for matching stimuli, the large trial-to-trial variability stands out, as evident by scanning vertically the clouds of colored dots for any given fixed position on the horizontal axis (stimulus configurations). Accordingly, for some stimuli, the trial ensembles of responses may be significantly different between the two conditions (for example, in the cartoon of Fig. 8A, at the 60° stimulus), while for other stimulus configurations, the trial ensembles will not (for example at the 240° stimulus, in the cartoon).
For each given cell we identified subsets of stimulus directions for which attention caused a significant response modulation (p < 0.05, two-sample Kolmogorov-Smirnov test). We found that these stimulus subsets were highly cell-specific, with different cells exhibiting robustly significant attentional modulations at different angles, not necessarily concentrated in proximity of a specific attended direction, but scattered for each cell over the entire range of possible stimuli (see Supporting Fig. S4 FigA). Correspondingly, we will refer to these stimulus-resolved significant differences, assessed at the single cell level as specific effects. While the statistical power of this analysis at the single cell level was limited by the small number of trials (see Supporting Fig. S4 FigB), the population level showed narrow stimulus ranges for which the fraction of significant specific effects were larger.
Despite the irregularity and large inter-cell variability of the significance patterns of specific effects, some weak overlap at the population level could still be identified, identifying narrow stimulus ranges for which the fraction of cells manifesting a significant specific effect were larger. Fig. 8B,C and D show how the response profile of a cell was altered when adding a second stimulus component within its RF, i.e. when going from a uni to an afix condition. The green histograms in Fig. 8B show the frequency distribution of the number of stimuli with significant changes, for the spatially separated and the transparent paradigm. In the spatially separated paradigm, 29% of cells showed significant changes for four or more stimulus directions and 16 % of the cells did not show significant specific effects at any angle. The corresponding numbers in the transparent paradigm are 25 % and 23 %, respectively. This means that, for around a fifth of all cells, the entire tuning curves for the uni and afix conditions where statistically indistinguishable.
The cells for which the addition of the second stimulus caused no significant response modulation tended to be poorly tuned already in the uni condition (see Supporting Figs. S3 FigA,B). On the other hand, some cells with equally poor tuning in the uni condition nevertheless displayed significant modulations of their firing rate when adding the second stimulus component.
Fig. 8B,E and F show cell-and stimulus-specific effects of attention, by comparing the afix and the ain conditions. The pink histograms in Fig 8B show the results of such a comparison for the spatially separated and the transparent paradigms. 7 % of the cells in the spatially separated paradigm and 10 % in the transparent paradigm showed significant specific effects of attentional modulation in four or more stimulus directions. The majority of cells in the spatially separated paradigm (60 %) and 50 % of the cells in the transparent paradigm showed a significant specific effect of attention for at least one stimulus. Note that most cells for both paradigms showed a clear tuning profile. Only 23 % (36 %) of the cells lacking any significant attentional modulation for the spatially-separated (transparent) paradigm were among the cells irresponsive to a second stimulus (Supporting S3 FigC,D).
Figs. 8C-F also show the stimulus directions for which significant specific effects of the addition of a stimulus component or of the allocation of attention were more frequently observed. The blue bars indicate the fraction of cells without a significant response modulation for a given direction, while the green and pink bars indicate significant increases and decreases of responses, respectively. Not surprisingly, the most frequent significant response enhancement occurred for the direction bins around 90° in the spatially separate uni vs afix comparison, corresponding to the cases where the preferred or a similar direction was added to a single stimulus moving about 120° away from the preferred direction.
For significant attentional modulations between the afix and ain conditions response increases were more frequent than response decreases at all stimulus directions, except 90° and 120° in the spatially separate paradigm (where response decreases were more frequent) and 60° in the transparent paradigm (where response increases and decreases were equally rare). In addition, significant attentional modulations, occurred mostly for stimulus angles between 180° and 330° (in the spatially separate paradigm) and 90° and 270° (in the transparent paradigm). These observations on stimulus-specific effects are compatible with the previous observation based on the direct feature extraction method, in that, for the transparent paradigm, both peaks are positively enhanced, while, for the spatially separate paradigm, only the attended peak is boosted, but the unattended one tends to be depressed. In this way the analysis of specific effects can shed light on the cell-level genesis of global shape changes of the average tuning curves. The trial ensemble comparison analyses presented in this section manifest how significant gain modulations at the level of a cell population may arise from the contribution of specific effects which are only rarely significant at the single cell level.
Discussion
We showed that the commonly used approach to analyze tuning curve by fitting an idealized model function to the trial-averaged data may be more problematic than usually thought. Indeed, when adopting a model-based approach, there is a clear danger to reach model-specific conclusions (cf. [9]), which would not be confirmed by selecting different, equally viable models and which may be spurious. Here, going beyond model fitting and remaining within a purely data-driven framework, we extracted information about tuning and its modulations directly from the measured data points, through the application of rules for the extraction of suitable features. The high flexibility in feature design provided an antidote against over-constrained angles of view, which may be inherited by the adoption of narrow models.
Previous works already explored possible improvements on conventional least-squares fitting when dealing with noisy tuning curve data [4, 5] and a wide alternative of possible functional models to fit, not only Gaussians [1–4], but also typical circular statistics distributions [9, 10], as well as Fourier series [6, 9]. Even the most sophisticated techniques, however, are not immune to the drawbacks inherent to any procedure assuming a common underlying statistical model. On the contrary, as already pointed out long ago [8] and further confirmed by our analyses, the “best model” may vary from cell to cell, making the problem of its selection conceptually ill-posed.
Yet, fitting still remains a practical tool to inspect tuning behavior in data, abstracting, at least as a first step, from the variety of tuning curve shapes present in any dataset. Although the tested models all give rise to bell-shaped tuning profiles, they differ in the geometry of the bells’ flanks and these differences might be relevant for fine stimulus discrimination [16, 29]. Therefore, whenever fitting is used, one should carefully explore the entire set of candidate models, rather than of a single model, as a defense against excessive model bias. Results from our direct method itself could be included as well in the tested mix of analyses. A common set of features could then be extracted through a set of shared operational rules to systematically identify patterns of (or lack of) consistency between the diverse considered approaches. Comparisons between experimental conditions which are found to be significant only for a narrow subset of methods should then be looked at with suspicion, and confirmed by additional independent verifications.
The novelty of our data-driven approach is, however, more substantial than just providing yet another “model-less model”. First, if the correct model function cannot be certified with certainty, much of the seemingly high precision achieved by model-based interpolation may be just an illusion. Looking at the data in an agnostic and democratic manner, our data-driven methods could assess the statistical significance of attentional effects, strongly localized in both stimulus-and neuronal spaces. In particular they highlighted that only about 40 %-50 % of cells—similar to some previous reports of object-based attention in V1 [30, 31]—were significantly modulated attention, among them some virtually unresponsive cells which would often be discarded in conventional model-based studies. It remains an open question whether these specific effects are an artifact due to the limited availability of information (too sparse sampling of stimuli, limited number of available trials, etc.) or if they can be related to the fine-scale synaptic structure of top-down inputs. Indeed, at the local circuit level there is evidence for an extreme functional specificity of wiring [32, 33] and the frontal eye field, one of the assumed source areas of attention [34], might provide not more than a couple of synapses to excitatory (but not inhibitory) neurons in V4 [35]. In addition, models have shown that random and sparse recurrent network architectures are compatible with highly heterogeneous tuning curves [36, 37]. In the context of the present study, it is enough to stress that such fine-grained specific attentional effects would remain hidden to any approach based on the fitting of a stereotyped smooth model to cell responses. Adopting a model-free characterization of neuronal responses may thus well be necessary to relate advances in connectomics with cell-level modulations of functional activation.
Another potential application in which data-driven approaches could prove to be qualitatively superior to model-based approaches is the study of how attention affects complex population codes [38] of tuned responses. Indeed, by comparing trial ensembles of dozens of simultaneously recorded neurons previous studies already suggested that noise correlations were essential for the attentional performance enhancements [39] and that feature attention is coordinated across hemispheres whereas spatial attention correlates only local groups of neurons [40]. We, on the other hand, had only single cell recordings available, but they revealed a high degree of heterogeneity in tuning which may be functional, not merely reflecting noise, but carrying relevant information [26–29, 41–43]. In particular, such single-cell “weird” modulations may build up in a coordinated manner to give rise to population-level representations of the attended stimulus with a higher quality of encoding or with better and faster decodability properties [44, 45]. Until now only very few studies have addressed the recording of the tuned response of many cells simultaneously [39, 40, 46, 47] but the fast pace of growth of the number of simultaneously recorded neurons [48] will certainly call for more detailed characterizations of tuned responses, such as the ones that our methods begin to provide.
Conventional model fitting methodology is restricted to the analysis of a model’s parameters thereby potentially overlooking some features of the tuning with high discriminatory power. We have circumvented this problem in that we analyzed a set of features describing a wider range of aspects of the data. In the extreme case one could set up an all-encompassing feature library and programmatically mine for the most relevant ones. A possible drawback of massive feature libraries may be the feature selection analogue of over-fitting, i.e. the inevitability that some statistical comparison will appear to be spuriously significant just in virtue of multiple comparison issues. However, even this “data dredging” [49] is legitimate when used as an explorative technique for the generation of hypotheses to be verified by further studies on independently acquired data-sets. As a matter of fact, with twenty-first-century neuroscience entering an age of “big data” and large-scale cooperation [50], feature selection [24], in which features with optimized classification relevance are engineered in a (semi-)unsupervised manner, will increasingly become a method of choice for machine-augmented data-set parsing and knowledge discovery.
To conclude, although we are still far from understanding the intricate circuit mechanism through which attention influences information representation, routing and processing in the brain, we hope that our general methodology will assist the interpretation and inspire the design of future experiments necessary to advance this research endeavor.
Methods
Experimental procedures
All animal procedures of this study have been approved by the responsible regional government office (Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit (LAVES)) under the permit numbers 33.42502/08-07.02 and 33.14.42502-04-064/07.
The animals were group-housed with other macaque monkeys in facilities of the German Primate Center in Goettingen, Germany in accordance with all applicable German and European regulations. The facility provides the animals with an enriched environment (incl. a multitude of toys and wooden structures), natural as well as artificial light, exceeding the size requirements of the European regulations, including access to outdoor space.
All invasive procedures were done under appropriate anesthesia and with appropriate analgesics. The German Primate Center has several veterinarians on staff that regularly monitor and examine the animals and consult on any procedures.
During the study the animals had unrestricted access to food and fluid, except on the days where data were collected or the animal was trained on the behavioral paradigm. On these days the animals were allowed unlimited access to fluid through their performance in the behavioral paradigm. Here the animals received fluid rewards for every correctly performed trial. Throughout the study the animals’ psychological and medical welfare was monitored by the veterinarians, the animal facility staff and the lab’s scientists, all specialized on working with non-human primates.
Three out of four animals were used in follow-up studies. One animal was euthanized at the end of the study. The decision was made in consultation with the attending veterinarian. Euthanasia was performed using an anaesthetic overdose (Sodium-Pentobarbital (i.v.)).
Single-unit action potentials were recorded extracellularly from extrastriate cortical area MT of four male rhesus monkeys (Macaca mulatta), using two sets of covert attention tasks. Two of the animals were performing the “spatially separated” paradigm, the other two the “transparent” paradigm. For the duration of every trial the monkeys were required to maintain their gaze on a fixation point in the middle of a computer monitor, placed at a viewing distance of 57 cm. While the animal maintained fixation, either one or two moving RDPs appeared in apertures in the receptive field (RF) of a given cell, as well as in the opposite hemifield outside the RF. In case of two RDPs the direction of the RDP in aperture 2 was always shifted clockwise from the direction of the other RDP by 120°. The direction of motion of the RDPs were varied in steps of 30° to obtain a tuning curve. The angular difference of 120° was selected as bi-directional tuning curves are expected to have two peaks in this case [16]. In the transparent paradigm the two RDPs were fully overlapping, crating just one aperture, covering most of the RF whereas in the spatially separate paradigm the two apertures were smaller and non-overlapping, but both still fully contained in the RF.
In the transparent condition there always existed just one aperture resulting in a single “uni” response profile. In the spatially separate paradigm, on the other hand, presenting the stimulus in either one of the two apertures gave rise to different responses, “uni1” and “uni2”, where the latter refers to the condition in which the stimulus appeared in the to-be-attended aperture (in the ain condition). If not noted otherwise, we always analyzed the “uni2” condition in the data from the spatially separate paradigm, and for simplicity also refer to it as just “uni”.
For the spatially separate unidirectional attend-fix condition (Fig. 1E) the monkeys were instructed to direct attention to the fixation spot, after a delay one RDP appeared in one of the two non-overlapping apertures and the monkey needed to detect a change of color of the fixation spot in order to receive a liquid reward. The spatially separate attend-fix condition (Fig. 1A) was similar, but RDPs were presented in both of the apertures. In the spatially separated attend-in condition (Fig. 1B) a RDP in one of the apertures was presented as a cue (of 500 or 600 msec duration), indicating to the monkey the location and the motion direction of a stimulus to be attended in the course of the trial. After a delay (800 ms) RDPs appeared in both apertures, and the monkey had to detect a transient change of motion velocity in the cued aperture at a random time point till maximally 2.5 s after the stimulus onset while ignoring possible changes in the other (distracting) RDPs.
The transparent attend-fix (Fig. 1C) and transparent attend-in (Fig. 1D) differed from the corresponding spatially separate conditions only in that the two apertures in which the RDPs were presented overlapped.
We generally analyzed data from the response period, which was defined as the time window 200-700 msec after onset of the stimulus in the RF. However, as no distinct unidirectional condition was recorded for the transparent paradigm, we used the cue period of the attend-in condition (50-500 msec after the cue onset) as a proxy for the uni condition in this case. That means that in the transparent uni condition attention was directed to the stimulus, whereas in the spatially separate uni condition attention was directed to the fixation spot. Accordingly, the uni conditions cannot be directly compared between the two paradigms.
We had 109 and 146 cells in the spatially separate and transparent paradigm, respectively. Uni conditions were recorded for 85 out of the 109 cells in the spatially separate paradigm. In 3 cells of the transparent paradigm the afix rates were not recorded and, therefore, our analysis disregarded the afix condition of those cells.
Tuning data pre-processing
Data analysis was performed using custom-written software in Python (available on request). We did not perform spike-density estimation, but all analyses of tuning responses were based on raw firing rates, either averaged over trials (for model fitting and data-driven feature extraction) or estimated within each trial independently (for trial ensemble comparisons). Cells were included in the analysis only if at least two trials were available for every recorded condition. For some cells of the spatially separated paradigm no uni conditions were recorded. These cells were generally included in the analysis and exempted only in calculations concerning the uni condition.
All tuning curves were conventionally aligned, such that the maximum firing rate of the uni condition corresponded to the angular coordinate 240°. Whenever uni conditions had not been recorded (this was the case for some cells of the spatially separated paradigm) the angular position of the maximum firing rate of the right peak in the afix condition was defined to be 240°.
Fitted models
To analyse tuning curves we fitted several model functions to the trial-averaged firing response data. Each fit to each specific cell was fully determined by the chosen parametric model and by a vector of model parameters. The wrapped Gaussian (wG) was given by: where We always assumed N = 4 for wrapping.
The wrapped Cauchy (wC) function was given by: where and Ω = 2π/360.
The (modified) von Mises function (vM) was given by: where
The symmetric Beta function (sβ) was given by: where x = (2p/360(θc) + π)/(2π) mod 1 and
The wrapped generalized bell-shaped membership function (wB) was given by where , We always assumed N = 4 for wrapping.
All these functions are illustrated in Fig. 3A. In their basic form, they give rise to unimodal tuning profiles, as in the uni condition. To fit bimodal responses to composite stimuli, in the afix and ain conditions, we used a sum of two (identical) model functions, i.e.: where g is either one of wG, wC, sβ, wB and the total parameter vector is There was some redundancy between the parameter sets for the two peaks. For the first four functions p1 = (a1, b1, c1, d/2) and p2 = (a2, bc, c2, d/2), while, for the wB model, both and contained an additional component, s1 and s2, respectively. Hence, f had overall seven (wG, wC, vM, sβ) or nine (wB) free parameters.
In addition we also fitted Fourier series of order n = 2, 3, 4, given by: where Ω = 2π/360 and These Fourier series were fully determined by five, seven or nine parameters respectively, depending on their order n = 2, 3, 4. Fourier fits of unimodal or bimodal tuning curves shared a common functional form.
Fitting methods
We used standard weighted non-linear least square fitting, relying on routines within Python’s SciPy package (http://www.scipy.org, sequential quadratic programming) for the minimization of the χ2 statistics. The applied initial conditions and boundaries therefore are listed in Supporting Table S10. An exception was given by Fourier series.
Let pi be the ith component of the Fourier series’ parameter vector
and Xi(θ) be the ith compoment of (1, cos(Ωθ), sin(Ωθ),…, cos(nΩθ), sin(nΩθ)). Then, an exact analytical solution to the least squares problem exists, which can be straightforwardly derived to be: where has components bi = yi/σi denote the mth column of U and V, respectively. The matrices U, V and wm form the singular value composition of matrix A, s.t. A = U diag(w1,…, w2n+1)V T. The matrix A, in turn, has components Aim = Xm(θi)/σi.
To quantify goodness-of-fit we also used a standard framework, as laid out in [20]. We assume that measurement errors in yi are normally distributed. For model functions that are linear in their parameters —note, that in our library of models, this assumption holds only for the Fourier series F n—, the null hypothesis probability that the sum of squared errors is equal or larger than the observed χ2 is given by
Q(K − (2n + 1)/2, χ2/2=2) where function and K is the number of independent samples (here, K = 12 tested stimulus ≤ directions). We use this quantity Q as measure for goodness of fit. If the probability Q is ≤ 10% we term the quality of the fit “bad”, otherwise we cannot rule out the hypothesis that the fit is an appropriate statistical models for our observations. For general non-linear models—for which the sum of squared errors cannot be expected to follow a conventional χ2 distribution—we evaluated approximately the goodness-of-fit Q statistics through a Montecarlo resampling approach (10000 replicas, cf. [20] for details).
Model selection
We performed model selection based on the Akaike information criterion (AIC) [22, 51]. The information-theoretic quantity AIC gives the expected increase in uncertainty when using a certain model to describe the data rather than the “true” model. It can be computed from the sum of squared errors in the least-squares procedure according to where K = 12 is once again the number of independent samples and M is the number of free parameters of the model function. Importantly, this formula is only valid in the limit of large K. Some studies [22, 51] therefore recommend to use a correction factor in the case in which . This corrected Akaike information criterion reads:
Such AICc converges to AIC for large K and mainly differs for it by applying a stronger penalty to models with larger number of free parameters.
Features
Given a tuning curve tc(θ)—either as a discretized version of a fitted tuning profile, or directly by the vector of empirically observed average responses to stimuli with different directions—we characterized its shape in a non-parametric manner by calculating general features. A feature F is any map of the graph of the tuning curve tc(θ) onto some scalar number F : tc↦ ℝ. Examples of features are the maximum of the tuning curve or the preferred direction (see Fig. 4 for an illustration). The complete list of features that we used is given in Supporting Tables S1-S4. Although we did not perform any feature clustering or redundancy elimination through e.g. factor analysis, willing in reality to maintain a library of features as wide as possible, we verified that most of the features bear complementary information, as hinted to by a sample distribution of pairwise correlations between feature values strongly peaked around zero (not shown).
Angular features were measured in degrees, ranging from 0◦ to 360◦ (with the exception of the feature GlobalMinimumAngle, for which we used the range −120◦ ≤ GlobalMinimumAngle ≤ 240◦) and coarsely quantized at just K = 12 equally spaced angular values. For each feature measured in units of Hz we also computed a normalized counterpart, denoted with the prefix normalized, by dividing the feature value by the global maximum firing rate found in the uni condition of the cell (if available).
We run statistical tests between feature pairs to search for effects of changes of experimental condition (transparent vs spatially separated, afix vs ain, etc.). For each of the two compared conditions we evaluated the values of the tested feature for each cell. We then performed two-way Kruskal-Wallis testing and dubbed a comparison as significant, whenever the p-value of this Kruskal-Wallis test was smaller than 0.05. All found significantly different feature pairs are listed in Supporting Tables S8 and S9, an excerpt in Table 2. The feature pairs reported herein are also the ones used in the systematic counting of significant comparisons reported in Figs. 5 and 6.
Violin plots
Violin plots were calculated using Gaussian kernel density estimations with Scott’s rule (as implemented by www.scipy.org; [52]) for bandwidth estimation. Highlighted horizontal lines within the violin-shaped plot elements denote 2.5 %, 50 % and 97.5 % quantiles.
Trial ensemble comparison
Beyond feature extraction we also compared directly vectors of firing rates measured across different trials for a same common cell and a same common stimulus. We then compared firing rate ensembles over trials for matching stimulus directions and cells across different experimental conditions, by means of a between-sample two-way Kolmogorov-Smirnov test. As for pairwise feature comparisons, we deemed a comparison between firing rate trial ensembles to be significant, whenever the p-value of this Kolmogorov-Smirnov test was smaller than 0.05. We call specific effects such stimulus-and cell-dependent effects of a significant change in condition revealed by trial ensemble comparison.
Acknowledgments
Footnotes
↵* markus{at}nld.ds.mpg.de