The Spatial Frequency Representation Predicts Category Coding in the Inferior Temporal Cortex

Understanding the neural representation of spatial frequency (SF) in the primate cortex is vital for unraveling visual processing mechanisms in object recognition. While numerous studies concentrate on the representation of SF in the primary visual cortex, the characteristics of SF representation and its interaction with category representation remain inadequately understood. To explore SF representation in the inferior temporal (IT) cortex of macaque monkeys, we conducted extracellular recordings with complex stimuli systematically filtered by SF. Our findings disclose an explicit SF coding at single-neuron and population levels in the IT


Introduction
Spatial frequency (SF) constitutes a pivotal component of visual stimuli encoding in the primate visual system, defined as the number of grating cycles within a given visual angle. Higher SF (HSF) corresponds to intricate details, while lower SF (LSF) captures broader information. Previous psychophysical studies have demonstrated the profound influence of SF manipulation on object recognition and categorization processes (Joubert et al., 2007; Schyns and Oliva, 1994; Craddock et al., 2013; Caplette et al., 2014; Cheung and Bar, 2014; Ashtiani et al., 2017). Saneyoshi and Michimata (2015) and Jahfari (2013) have highlighted the significance of HSF and LSF for categorical/coordinate processing and in object recognition and decision making, respectively. The sequence in which SF content is presented also affects categorization performance, with coarse-to-fine presentation leading to faster categorizations (Kauffmann et al., 2015). Considering the face as a particular object, several studies showed that middle and higher SFs are more critical for face recognition (Costen et al., 1996; Hayes et al., 1986; Fiorentini et al., 1983; Cheung et al., 2008). Another influential theory suggested by psychophysical studies is the coarse-to-fine perception of visual stimuli, which states that LSF or global contents are processed faster than HSF or local contents (Schyns and Oliva, 1994; Rotshtein et al., 2010; Gao, 2011; Yardley et al., 2012; Kauffmann et al., 2015; Rokszin, 2016). Despite the extensive reliance on psychophysical studies to examine the influence of SF on categorization tasks, our understanding of SF representation within primate visual systems, particularly in higher visual areas like the inferior temporal (IT) cortex, remains constrained due to the limited research in this specific domain.
One of the seminal studies investigating the neural correlates of SF processing and its significance in object recognition was conducted by Bar (2003). This research proposes a top-down mechanism driven by the rapid processing of LSF content, facilitating object recognition (Bar, 2003; Fenske et al., 2006). The exploration of SF representation has revealed the engagement of distinct brain regions in processing various SF contents (Fintzi and Mahon, 2014; Chaumon et al., 2014; Bermudez et al., 2009; Iidaka et al., 2004; Peyrin et al., 2010; Gaska et al., 1988; Bastin et al., 2013; Oram and Perrett, 1994). More specifically, the orbitofrontal cortex (OFC) has been identified as accessing global (LSF) and local (identity; HSF) information in the right and left hemispheres, respectively (Fintzi and Mahon, 2014). The V3A area exhibits low-pass tuning curves (Gaska et al., 1988), while HSF processing activates the left fusiform gyrus (Iidaka et al., 2004). Neural responses in the IT cortex, which plays a pivotal role in object recognition and face perception, demonstrate correlations with the SF components of complex stimuli (Bermudez et al., 2009). Despite the acknowledged importance of SF as a critical characteristic influencing object recognition, a more comprehensive understanding of its representation is warranted. By unraveling the neural mechanisms underlying SF representation in the IT cortex, we can enrich our comprehension of the processing and categorization of visual information.
To address this issue, we investigate the SF representation in the IT cortex of two passive-viewing macaque monkeys. We studied the neural responses of the IT cortex to intact, SF-filtered (five ranges), and phase-scrambled stimuli. SF decoding is observed in both population- and single-neuron-level representations. Investigating the decoding pattern of individual SF bands reveals a coarse-to-fine manner in recall performance where LSF is decoded more accurately than HSF. Temporal dynamics analysis shows that SF coding exhibits a coarse-to-fine pattern, emphasizing faster processing of lower frequencies. Moreover, SF representation forms an average LSF-preferred tuning across neuron responses at 70ms to 170ms after stimulus onset. Then, the average preferred SF shifts monotonically to HSF in time after the early phase of the response, with its peak at 220ms after the stimulus onset. The LSF-preferred tuning turns into an HSF-preferred one in the late neuron response phase.
Next, we examined the relationship between SF and category coding. We found a strong positive correlation between SF and category coding performances in sub-populations of neurons. The SF coding capability of individual neurons is highly correlated with the category coding capacity of the sub-population. Moreover, clustering neurons based on their SF responses indicates a relationship between SF representation and category coding. Using the neuron responses to the five SF ranges for the scrambled stimuli only, we identified an SF profile for each neuron that predicts the categorization performance of that neuron within the population of neurons sharing the same profile. Neurons whose response increases with increasing SF encode faces better than neuron populations with other profiles.
Given the co-existence of SF and category coding within the IT cortex and the capability of SF to predict category selectivity, we examined the neural mechanisms underlying SF and category representation. At the single-neuron level, we found no correlation between the SF and category coding capabilities of single neurons. At the population level, we found that the contribution of neurons to SF coding did not correlate with their contribution to category coding. Delving into the characteristics of SF coding, we found that individual neurons carry more independent SF-related information compared to the encoding of categories (face vs. non-face). Analyzing the temporal dynamics of each neuron's contribution to population-level SF coding reveals a shift in sparsity during different phases of the response. In the early phase (70ms-170ms), the contribution is more sparse than in category coding. However, this behavior is reversed in the late phase (170ms-270ms), with SF coding showing a less sparse contribution.
Finally, we compared the representation of SF in the IT cortex with several popular convolutional neural networks (CNNs). We found that CNNs exhibited robust SF coding capabilities with significantly higher accuracies than the IT cortex. Like the IT cortex, LSF content showed higher decoding performance than the HSF content. However, while there were similarities in SF representation, CNNs did not replicate the SF-based profiles predicting neuron category selectivity observed in the IT cortex. We posit that our findings establish neural correlates pertinent to behavioral investigations into SF's role in object recognition. Additionally, our results shed light on how the IT cortex represents and utilizes SF during the object recognition process.

SF coding in the IT cortex
To study the SF representation in the IT cortex, we designed a passive stimulus presentation task (Figure 1a, see Materials and methods). The task comprises two phases: the selectivity phase and the main phase. During the selectivity phase, 155 stimuli, organized into two super-ordinate and four ordinate categories, were presented (with a 50ms stimulus presentation followed by a 500ms blank period, see Materials and methods). Next, the six most responsive stimuli are selected along with nine fixed stimuli (six faces and three non-face objects, Figure 1b) to be presented during the main phase (33ms stimulus presentation followed by a 465ms blank, see Materials and methods). Each stimulus is phase scrambled, and then the intact and scrambled versions are filtered in five SF ranges (R1 to R5, with R5 representing the highest frequency band, Figure 1b), resulting in a total of 180 unique stimuli presented in each session (see Materials and methods). Each session consists of 15 blocks, with each stimulus presented once per block in a random order. Neurons were recorded from the IT cortex of passive-viewing monkeys, with recording sites covering the IT area uniformly (Figure 1a). We only considered the responsive neurons (see Materials and methods), totaling 266 (157 M1 and 109 M2). The peristimulus time histogram (PSTH) of a sample neuron (neuron #155, M1) is illustrated in Figure 1c in response to the scrambled stimuli for R1, R3, and R5. R1 exhibits the most pronounced firing rate, indicating the highest neural activity level. In contrast, R5 displays the lowest firing rate, suggesting an LSF-preferred trend in the neuron's response. To explore the

Temporal dynamics of SF representation
The sample neuron and recall values in Figure 1 indicate an LSF-preferred neuron response. To explore this behavior over time, we analyzed the temporal dynamics of SF representation. Figure 2a illustrates the onset of SF recalls, revealing a coarse-to-fine trend where R1 is decoded faster than R5 (onset times in milliseconds after stimulus onset, R1=84.5±3.02, R2=86.0±4.4, R3=88.9±4.9, R4=86.5±4.1, R5=97.15±4.9, R1 < R5, p-value<0.001). Figure 2b illustrates the time course of the average preferred SF across the neurons. To calculate the preferred SF for each neuron, we multiplied the firing rate by the SF range and normalized the values (see Materials and methods).
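The preferred-SF computation described above can be sketched as a firing-rate-weighted average of the SF band index. The exact normalization is given in Materials and methods and is not reproduced here, so the rescaling to the interval [-1, 1] and the function name `preferred_sf` are assumptions made purely for illustration.

```python
import numpy as np

def preferred_sf(rates, lo=-1.0, hi=1.0):
    """Firing-rate-weighted SF band index, rescaled to [lo, hi].

    rates: firing rates for bands R1..Rk (last axis indexes the band).
    The [lo, hi] rescaling is an illustrative assumption, not the
    normalization used in the paper.
    """
    rates = np.asarray(rates, dtype=float)
    bands = np.arange(1, rates.shape[-1] + 1)  # band indices 1..k
    # Weighted centroid: sum(rate * band) / sum(rate)
    centroid = (rates * bands).sum(axis=-1) / rates.sum(axis=-1)
    # Map the centroid from [1, k] onto [lo, hi]
    return lo + (hi - lo) * (centroid - bands[0]) / (bands[-1] - bands[0])

# A neuron firing only to R1 sits at the low end, one firing only to R5
# at the high end, and a flat response lands in the middle.
low_pref = preferred_sf([1, 0, 0, 0, 0])
flat_pref = preferred_sf([1, 1, 1, 1, 1])
high_pref = preferred_sf([0, 0, 0, 0, 1])
```

Under this reading, a shift of the population average toward positive values over time would correspond to the reported move from LSF-preferred to HSF-preferred tuning.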
Figure 2b demonstrates that following the early phase of the response (70ms to 170ms), the average preferred SF shifts towards HSF, reaching its peak at 215ms after stimulus onset (preferred SF, 0.54±0.15). Furthermore, a second peak emerges at 320ms after stimulus onset (preferred SF, 0.22±0.16), indicating a shift in the average preferred SF in the IT cortex towards higher frequencies. To analyze this shift, we divided the time course into two intervals: 70ms to 170ms, where the response peak of the neurons happens, and 170ms to 270ms, where the first peak of SF preference occurs. We calculated the percentage of the neurons that responded significantly more strongly to a specific SF range than to the others (one-way ANOVA with a significance level of 0.05, see Materials and methods) for the two time intervals. Figure 2c and d show the percentage of the neurons in each SF range for the two time intervals. In the early phase of the response (T1, 70ms to 170ms), the highest percentage of the neurons belongs to R1, 40.19%, and a decreasing trend is observed as we move towards higher frequencies (R1=40.19%, R2=19.60%, R3=13.72%, R4=10.78%, R5=15.68%). Moving to T2, the percentage of neurons responding to R1 more strongly than to the others remains roughly stable, at 38.46%. The percentage of neurons in R2 drops to under 5% from the 19.60% observed in T1. On the other hand, the percentage of the neurons in R5 reaches 46.66% in T2 compared to 15.68% in T1 (higher than R1 in T1). This observation indicates that the increase in preferred SF is due to a substantial increase in the neurons selective to HSF, while the response of the neurons to R1 is roughly unchanged. To further understand the population response to various SF ranges, the average response across neurons for R1 to R5 is depicted in Figure 2c and d (bottom panels). In the first interval, T1, an average LSF-preferred tuning is observed where the average neuron response decreases as the SF increases (normalized firing rate for R1=1.09±0.01,
Figure 2. b The average preferred SF of IT neurons moves towards higher frequencies from 170ms after stimulus onset, reaching its highest value at 220ms. A second peak emerges at 320ms following the stimulus onset. The SF preference shows a monotonic increase followed by a decrease in time. c,d Shift in neural response towards HSF. The average response of all neurons within the two time intervals (T1 and T2 in panel b) is shown, with error bars indicating the SEM. c In T1, from 70ms to 170ms after stimulus onset, a decreasing response of the neurons is observed as the SF content shifts towards higher frequencies. The relative percentage of neurons showing stronger responses to SF ranges (R1 to R5) in T1 is depicted in the inner top panel. R1 is the most responsive SF for roughly 40% of the neurons. d In the following interval (T2, 170ms to 270ms), an increasing tuning is observed from R2 to R5, where R5 elicits the highest firing rates. Furthermore, in T2, there is a roughly threefold increase in the percentage of neurons exhibiting stronger responses to R5 compared to T1, indicating a shift in the neurons' responses towards HSF (top panel).

SF profile predicts category coding
Our findings indicate explicit SF coding in the IT cortex. Given the co-existence of SF and category coding in this region, we examine the relationship between SF and category coding. As depicted in Figure 2, while the average preferred SF across the neurons shifts to HSF, the most responsive SF range varies across individual neurons. To investigate the relation between SF representation and category coding, we identified an SF profile by fitting a quadratic curve to the neuron responses across SF ranges (R1 to R5, phase-scrambled stimuli only). Then, according to the fitted curve, an SF profile is determined for each neuron (see Materials and methods). Five distinct profiles were identified based on the tuning curves (Figure 3a): i) flat, where the neuron has no preferred SF (not included in the results), ii) LSF preferred (LP), where the neuron response decreases as SF increases, iii) HSF preferred (HP), where the neuron response increases as the SF shifts towards higher SFs, iv) U-shaped (U), where the neuron response to middle SF is lower than that of HSF or LSF, and v) inverse U-shaped (IU), where the neuron response to middle SF is higher than that of LSF and HSF. The U-shaped and HSF-preferred profiles represent the largest and smallest populations, respectively. To check the robustness of the profiles, considering the trial-to-trial variability, the
This observation underscores the importance of middle and higher frequencies for face representation. The LSF-preferred profile also exhibits significantly higher SI for faces than for non-face objects (p-value<0.001). On the other hand, in the IU profile, non-face SI surpasses face SI (p-value<0.001), indicating the importance of middle frequencies for the non-face objects. Finally, in the U profile, there is no significant difference between the face and non-face objects (face vs. non-face p-value=0.36).
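The profile assignment described above (a quadratic fit across R1 to R5, followed by a label derived from the fitted curve) can be sketched as follows. The decision rules, the flatness tolerance `flat_tol`, and the function name `sf_profile` are hypothetical choices for illustration; the criteria actually used are those in Materials and methods.

```python
import numpy as np

def sf_profile(responses, flat_tol=0.05):
    """Label a neuron's SF tuning from a quadratic fit over bands R1..R5.

    Returns one of "flat", "LP", "HP", "U", "IU". Thresholds and rules
    are illustrative assumptions, not the paper's criteria.
    """
    y = np.asarray(responses, dtype=float)
    x = np.arange(1, y.size + 1, dtype=float)
    a, b, c = np.polyfit(x, y, 2)          # y ~ a*x^2 + b*x + c
    fit = np.polyval([a, b, c], x)

    # Flat: the fitted curve barely varies relative to the mean response.
    if fit.max() - fit.min() < flat_tol * max(abs(y.mean()), 1e-12):
        return "flat"

    # If the parabola turns inside the interior bands, the tuning is
    # U-shaped (opens upward) or inverse U-shaped (opens downward).
    if abs(a) > 1e-12:
        vertex = -b / (2.0 * a)
        if x[1] <= vertex <= x[-2]:
            return "U" if a > 0 else "IU"

    # Otherwise the fit is monotonic over the sampled bands.
    return "HP" if fit[-1] > fit[0] else "LP"

profiles = [sf_profile(r) for r in (
    [5, 4, 3, 2, 1],   # decreasing with SF
    [1, 2, 3, 4, 5],   # increasing with SF
    [5, 2, 1, 2, 5],   # dip at middle SF
    [1, 4, 5, 4, 1],   # peak at middle SF
)]
```

Using the vertex position to separate monotonic from U-shaped tuning keeps the rule tied to the fitted curve rather than to noisy single-band responses, which is one reason a curve fit is a natural basis for such profiles.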
To assess whether the SF profiles distinguish category selectivity or merely evaluate the neu-
Figure 3. The "flat" category, where the response to no SF was higher than others, was excluded from this analysis. b SI of face/non-face vs. scrambled stimuli is illustrated (see Materials and methods). The SI value and SF profile are determined within the time window of 70ms to 170ms after stimulus onset. The HSF-preferred population exhibited significantly higher face SI compared to the other groups. The LSF-preferred population displayed a significant difference in face and non-face SI. On the other hand, the IU profile indicates a significantly higher SI value for the non-face compared to the face. The U-shaped profile did not show any significant differences between the face and the non-face. These results suggest that the neuron's response to various SF bands can predict its decoding capability. c,d The relation between SF and category coding in sub-populations. Initially, the LDA method was employed to calculate the individual neuron's performance in single-neuron-level category and SF coding. Next, a sorting procedure based on SF (panel c) and category (panel d) coding performances was conducted to create sub-populations of neurons exhibiting similar capabilities (see Materials and methods). The scatter plot of the category and SF coding accuracy of these sub-populations demonstrated a notably high degree of positive correlation between SF and category accuracies in the IT cortex.
provide support for the association between profiles and categories rather than mere responsiveness.
Next, to examine the relation between the SF (category) coding capacity of single neurons and the category (SF) coding capability at the population level, we calculated the correlation between coding performance at the population level and the coding performance of single neurons within that population (Figure 3c and d). In other words, we investigated the relation between the single-neuron and population levels of coding capability across SF and category.
Figure 5. The SNC value for SF is significantly higher than for the category. b Furthermore, the CMI of each neuron pair, conditioned on the label (category or SF), is illustrated. CMI reflects the information redundancy between neuron pairs during SF or category decoding. A lower CMI value for SF indicates that individual neurons carry more independent SF-related information compared to category information. c Sparse neuron contribution in SF coding at the early phase of the response. To investigate the contribution of the neurons in population decoding, the sparseness of the LDA weights assigned to each neuron is calculated. Higher sparseness indicates a greater contribution of a smaller group of neurons to the decoding process. The time course of weight sparseness is depicted for SF and category (face vs. non-face) decoding, with shadows representing the STD. During the early phase of the response, the sparseness of SF-related weights is higher than that of the category, while this relationship is reversed during the late phase of the response.

Uncorrelated mechanisms for SF and category coding
As both SF and category coding exist in the IT cortex at both the single neuron and population
vs. HSF (R4 and R5)) or category, to assess the information redundancy across the neurons. CMI quantifies the information shared between a pair of neurons regarding SF or category coding. Figure 5b indicates a significantly lower CMI for SF (average CMI for SF=0.66±0.0009 and category=0.69±0.0007, SF<category with p-value≈0), indicating that neurons carry more independent SF-related information than category-related information.
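As an illustration of the quantity discussed above, conditional mutual information between two neurons given the stimulus label can be estimated with a plug-in (counting) estimator, I(X;Y|Z) = Σ p(x,y,z) log2[p(z) p(x,y,z) / (p(x,z) p(y,z))]. The sketch below assumes the responses have already been discretized (for example, into binned spike counts); that discretization step, the estimator choice, and the function name are assumptions, since the paper's estimation procedure is not shown in this section.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from discrete samples.

    x, y: discretized responses of two neurons, one entry per trial.
    z:    label per trial (e.g. LSF vs. HSF, or face vs. non-face).
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), count in n_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)) written with raw counts
        ratio = count * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        cmi += (count / n) * log2(ratio)
    return cmi

# Two identical neurons are fully redundant given the label (1 bit here),
# while two neurons that are independent within each label share nothing.
labels = [0, 0, 1, 1]
redundant = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], labels)
independent = conditional_mutual_information(
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
)
```

In this framing, a lower average value across neuron pairs means less redundancy, which matches the interpretation given for SF relative to category.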
To investigate each neuron's contribution to the decoding procedure (LDA decision), we computed the sparseness of the LDA weights corresponding to each neuron (see Materials and methods). For SF, we trained the LDA on R1, R2, R4, and R5 with two labels (one for R1 and R2 and the alternative for R4 and R5). A second LDA was trained to discriminate between faces and non-faces. Subsequently, we calculated the sparseness of the weights associated with each neuron in SF and category decoding. Figure 5c illustrates the time course of the weight sparseness for SF and category. The category reflects a bimodal curve with the first peak at 110ms and the second at 210ms after stimulus onset. The second peak is significantly larger than the first one (category first peak, 0.016±0.007, second peak, 0.051±0.013, and p-value<0.001). In SF decoding, neurons' weights exhibit a trimodal curve with peaks at 100ms, 215ms, and 320ms after the stimulus onset.
The first peak is significantly higher than the other two (SF first peak, 0.038±0.005, second peak, 0.018±0.003, third peak, 0.028±0.003, first peak > second peak with p-value<0.001, and first peak > third peak with p-value=0.014). Comparing SF and category, during the early phase of the response (70ms to 170ms), SF sparseness is higher, while in 170ms to 270ms, the sparseness value of the category is higher (p-value < 0.001 for both time intervals). This suggests that, initially, most neurons contribute to category representation, but later, the majority of neurons are involved in SF coding. These findings support distinct mechanisms governing SF and category coding in the IT cortex.
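To make the notion of weight sparseness concrete, the sketch below uses one widely used sparseness index (the Treves-Rolls activity ratio in the normalized form popularized by Vinje and Gallant), applied to the absolute decoder weights. This is an assumed stand-in: the sparseness measure actually used is defined in Materials and methods, and the function name `weight_sparseness` is hypothetical.

```python
import numpy as np

def weight_sparseness(weights):
    """Sparseness of decoder weights in [0, 1].

    0 means all neurons carry equal weight; 1 means a single neuron
    carries all the weight. Assumes at least two neurons and at least
    one non-zero weight.
    """
    w = np.abs(np.asarray(weights, dtype=float))
    n = w.size
    # Activity ratio: (mean |w|)^2 / mean(w^2); equals 1 for uniform weights
    activity_ratio = (w.mean() ** 2) / np.mean(w ** 2)
    return (1.0 - activity_ratio) / (1.0 - 1.0 / n)

dense = weight_sparseness([1.0, 1.0, 1.0, 1.0])    # every neuron contributes equally
sparse = weight_sparseness([0.0, 0.0, 0.0, 2.0])   # one neuron does all the work
```

Computed in sliding time windows for the SF decoder and the category decoder separately, an index of this kind yields the two time courses that are compared in the early and late response phases.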

SF representation in the artificial neural networks
We conducted a thorough analysis to compare our findings with CNNs. To assess the SF coding capabilities of CNNs, we utilized popular architectures, including ResNet18, ResNet34, VGG11, VGG16,

Discussion
Utilizing neural responses from the IT cortex of passive-viewing monkeys, we conducted a study on SF representation within this purely visual high-level area. Numerous psychophysical studies have underscored the significant impact of SF on object recognition, highlighting the importance of its representation. To the best of our knowledge, this study presents the first attempt to systematically examine the SF representation in a high-level area, i.e., the IT cortex, using extracellular recording.
Understanding SF representation is crucial, as it can elucidate the object recognition procedure in the IT cortex.
Our findings demonstrate explicit SF coding at both the single-neuron and population levels, with LSF being decoded faster and more accurately than HSF. During the early phase of the response, we observe a preference for LSF, which shifts toward a preference for HSF during the late phase. Next, we made profiles based on SF-only (phase-scrambled stimuli) responses for each neuron to predict its category selectivity. Our results show a direct relationship between the population's category coding capability and the SF coding capability of individual neurons. While we observed a relation between SF and category coding, we found that their representations are uncorrelated.
Unlike category coding, SF relies more on sparse, individual neuron representations. Finally, when comparing the responses of IT with those of CNNs, it is evident that while SF coding exists in CNNs, the SF profile observed in the IT cortex is notably absent. Our results are based on grouping the neurons of the two monkeys; however, the results remain consistent when looking at the data from individual monkeys, as illustrated in Appendix 2.
The influence of SF on object recognition has been extensively investigated through psychophysical studies (Joubert et al., 2007; Schyns and Oliva, 1994; Craddock et al., 2013; Caplette et al., 2014; Cheung and Bar, 2014; Ashtiani et al., 2017). One frequently explored theory is the coarse-to-fine nature of SF processing in object recognition (Schyns and Oliva, 1994; Rotshtein et al., 2010; Gao, 2011; Yardley et al., 2012; Kauffmann et al., 2015; Rokszin, 2016). This aligns with our observation that the decoding onset for LSF is significantly earlier than for HSF. Different SF bands carry distinct information, progressively conveying coarse-to-fine shape details as we transition from LSF to HSF. Psychophysical studies have indicated the utilization of various SF bands for distinct categorization tasks (Rotshtein et al., 2010).
Considering the face as a behaviorally demanded object, psychophysical studies have observed the influence of various SF bands on face recognition. These studies consistently show that enhanced face recognition performance is achieved in the middle and higher SF bands compared to LSF (Costen et al., 1996; Hayes et al., 1986; Fiorentini et al., 1983; Cheung et al., 2008; Awasthi, 2012; Jeantet, 2019). These observations resonate with the identified SF profiles in our study. Neurons that exhibit heightened responses as SF shifts towards HSF demonstrate superior coding of faces compared to other neuronal groups.
Unlike psychophysical studies, imaging studies in this area have been relatively limited. To rule out the degraded contrast sensitivity of the visual system to medium and high SF information because of the brief exposure time, we repeated the analysis with a 200ms exposure time, as illustrated in Appendix 1, Figure 4, which indicates the same LSF-preferred results. Furthermore, according to Figure 2, the average firing rate of IT neurons for HSF could be higher than for LSF in the late response phase. This indicates that the amount of HSF input received by the IT neurons is comparable to that of LSF; however, its impact on the IT response is observable in the later phase of the response. Thus, the LSF preference is due to the temporal advantage of LSF processing rather than contrast sensitivity. Next, according to Figure 3a, 6% of the neurons are HSF-preferred, and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Additionally, the highest SI belongs to the HSF-preferred profile in the early phase of the
and V4). Therefore, our results show that the LSF-preferred nature of the IT responses, in terms of firing rate and recall, is not due to the weakness or lack of an input source (or information) for HSF but rather to the processing nature of SF in the IT cortex.
Hong et al. (2016) suggested that the neural mechanisms responsible for developing tolerance to identity-preserving transforms also contribute to explicitly representing these category-orthogonal transforms, such as rotation. Extending this perspective to SF, our results similarly suggest an explicit representation of SF within the IT population. However, unlike transforms such as rotation, the neural mechanisms in IT leverage various SF bands for various categorization tasks.
Furthermore, our analysis introduces, for the first time, an SF-only profile that predicts category selectivity.
These findings prompt the question of why the IT cortex explicitly represents and codes the SF content of the input stimuli. In our perspective, the explicit representation and coding of SF contents in the IT cortex facilitates object recognition. The population of the neurons in the IT cortex becomes selective for complex object features, combining SFs to transform simple visual features into more complex object representations. However, the specific mechanism underlying this combination is not yet known. The diverse SF contents present in each image carry valuable information that may contribute to generating expectations in predictive coding during the early phase, thereby facilitating information processing in subsequent phases. This top-down mechanism is suggested by the works of Bar (2003) and Fenske et al. (2006).
Moreover, each object has a unique "characteristic SF signature," representing its specific arrangement of SFs. "Characteristic SF signatures" refer to the unique patterns or profiles of SFs associated with different objects or categories of objects. When we look at visual stimuli, such as objects or scenes, they contain specific arrangements of different SFs. Imagine a scenario where we have two objects, such as a cat and a car. These objects will have different textures and shapes, which correspond to different distributions of SFs. The cat, for instance, might have a higher concentration of mid-range SFs related to its fur texture, while the car might have more pronounced LSFs that represent its overall shape and structure. The IT cortex encodes these signatures, facilitating robust discrimination and recognition of objects based on their distinctive SF patterns.
The concept of "characteristic SF signatures" is also related to the "SF tuning" observed in our
Finally, we compared the representation of SF within the IT cortex with that of current state-of-the-art deep neural networks. CNNs stand as one of the most promising models for comprehending visual processing within the primate ventral visual processing stream (Kubilius et al., 2018, 2019). Examining the higher layers of CNN models (most similar to IT), we found that randomly initialized and pre-trained CNNs can code for SF. This is consistent with our previous work on the CIFAR dataset (Toosi et al., 2022). Nevertheless, they do not exhibit the SF profile we observed in the IT cortex. This emphasizes the uniqueness of SF coding in the IT cortex and suggests that artificial neural networks might not fully capture the complexity of biological visual processing mechanisms, even when they encompass certain aspects of SF representation. Our results suggest that the IT cortex uses a different mechanism for SF coding compared to contemporary deep neural networks, highlighting the potential for new approaches that consider the role of SF in ventral stream models.
Our results are not affected by several potential confounding factors. First, each stimulus in the set also has a corresponding phase-scrambled variant. These phase-scrambled stimuli maintain the same SF characteristics as their respective face or non-face counterparts but lack shape information. This approach allows us to investigate SF representation in the IT cortex without the confounding influence of shape information. Second, our results, obtained through a passive viewing task, remain unaffected by attention mechanisms. Third, all stimuli (intact, SF filtered, and phase scrambled) are corrected for illumination and contrast to remove the contribution of the category-orthogonal basic characteristics of the stimuli to the results (see Materials and methods). Fourth, while our dataset is not balanced in samples per category, this imbalance does not affect our observed outcomes, because we equalized the number of samples per category when training our classification models by random sampling from the stimulus set (see Materials and methods). One limitation of our study is the relatively low number of objects in the stimulus set. However, the decoding performance of category classification (face vs. non-face) in intact stimuli is 94.2%. The recall value for objects vs. scrambled is 90.45%, and for faces vs. scrambled is 92.45% (p-value=0.44), which indicates the high level of generalizability and validity characterizing our results. Finally, since our experiment maintains a fixed SF content in terms of both cycles per degree and cycles per image, further experiments are needed to discern whether our observations reflect sensitivity to cycles per degree or cycles per image.
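The first point above, that phase scrambling preserves SF content while destroying shape, can be illustrated with a generic Fourier-domain sketch: the amplitude spectrum is kept and the phase is randomized. This is a common way to phase-scramble an image and is offered only as an assumption about the general technique; the scrambling procedure actually used is described in Materials and methods, and the function names here are hypothetical.

```python
import numpy as np

def phase_scramble(image, rng):
    """Randomize the Fourier phase of a 2D image, keeping its amplitude spectrum."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    # Phase of a real noise image is conjugate-antisymmetric, so adding it
    # keeps the scrambled spectrum Hermitian and the output image real.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    noise_phase[0, 0] = 0.0  # leave the DC term alone to keep mean luminance
    scrambled = np.abs(spectrum) * np.exp(1j * (np.angle(spectrum) + noise_phase))
    return np.real(np.fft.ifft2(scrambled))

def amplitude_spectrum(image):
    """Magnitude of the 2D Fourier transform (the SF content of the image)."""
    return np.abs(np.fft.fft2(image))

rng = np.random.default_rng(0)
original = rng.random((16, 16))          # stand-in for a stimulus image
scrambled_image = phase_scramble(original, rng)
```

Because the amplitude at every spatial frequency is unchanged, any response difference across SF bands of scrambled stimuli cannot be attributed to object shape, which is the logic behind building the SF profiles from scrambled stimuli only.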
In summary, we studied the SF representation within the IT cortex. Our findings reveal the existence of a sparse mechanism responsible for encoding SF in the IT cortex. Moreover, we studied the relationship between SF representation and object recognition by identifying an SF profile that predicts object recognition performance. These findings establish neural correlates of the psychophysical studies on the role of SF in object recognition and shed light on how IT represents and utilizes SF for the purpose of object recognition.

Animals and recording
The activity of neurons in the IT cortex of two male macaque monkeys weighing 10 and 11 kg, respectively, was analyzed following the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the Society for Neuroscience Guidelines and Policies. The experimental procedures were approved by the Institute of Fundamental Science committee. Before implanting a recording chamber in a subsequent surgery, magnetic resonance imaging and computed tomography (CT) scans were performed to locate the prelunate gyrus and arcuate sulcus. The surgical procedures were carried out under sterile conditions and isoflurane anaesthesia. Each monkey was fitted with a custom-made stainless-steel chamber, secured to the skull using titanium screws and dental acrylics. A craniotomy was performed within the 30x70 mm chamber for both monkeys, with dimensions ranging from 5 mm to 30 mm A/P and 0 mm to 23 mm M/L.
During the experiment, the monkeys were seated in custom-made primate chairs, and their heads were restrained while a tube delivered juice rewards to their mouths. The eye-tracking system was mounted in front of the monkey, and eye movements were captured at 2 kHz using the EyeLink PM-910 Illuminator Module and EyeLink 1000 Plus Camera (SR Research Ltd, Ottawa, CA). Stimulus presentation and juice delivery were controlled using custom software written in MATLAB with the MonkeyLogic toolbox. Visual stimuli were presented on a 24-inch LED-lit monitor (Asus VG248QE, 1920 x 1080, 144 Hz) positioned 65.5 cm away from the monkeys' eyes. The actual time the stimulus appeared on the monitor was recorded using a photodiode (OSRAM Opto Semiconductors, Sunnyvale, CA).
One electrode was affixed to a recording chamber and positioned within the craniotomy area using the Narishige two-axis platform, allowing for continuous electrode positioning adjustment.
To make contact with or slightly penetrate the dura, a 28-gauge guide tube was inserted using a manual oil hydraulic micromanipulator (Narishige, Tokyo, Japan). For extracellular recording of neural activity in both monkeys, varnish-coated tungsten microelectrodes (FHC, Bowdoinham, ME) with a shank diameter of 200-250 µm and an impedance of 0.2-1 MΩ (measured at 1 kHz) were inserted into the brain. A pre-amplifier and amplifier (Resana, Tehran, Iran) were employed for single-electrode recordings, with filtering set between 300 Hz and 5 kHz for spikes and between 0.1 Hz and 9 kHz for local field potentials. Spike waveforms and continuous data were digitized and stored at a sampling rate of 40 kHz for offline spike sorting and subsequent data analysis. Area IT was identified based on its stereotaxic location, position relative to nearby sulci, patterns of gray and white matter, and response properties of encountered units.
and the scatter between categories. SI is defined as the ratio of between-category scatter to within-category scatter. The computation of SI involves three sequential steps. First, the center of mass of each category, referred to as $\mu_i$, and the overall mean across all categories, termed the total mean, $m$, were calculated. Second, we calculated the between- and within-category scatters.

$$S_i = \sum_{r \in C_i} (r - \mu_i)(r - \mu_i)^T$$

$$S_W = \sum_i S_i$$

$$S_B = \sum_i n_i (\mu_i - m)(\mu_i - m)^T$$

where $S_i$ is the scatter matrix of the i'th category, $r$ is the stimulus response, $S_W$ is the within-category scatter, $S_B$ is the between-category scatter, and $n_i$ is the number of samples in the i'th category. Finally, SI was computed as

$$SI = \frac{\|S_B\|}{\|S_W\|}$$

where $\|S\|$ indicates the norm of $S$. For additional information, please refer to Dehaqani et al. (2016).
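As an illustration of the three steps, the following minimal Python sketch computes a separability index as the norm of the between-category scatter divided by the norm of the within-category scatter. Several choices here are assumptions made for the example and are not confirmed by the text: the Frobenius norm is used for the matrix norm, the total mean is taken over all samples, and the function names (`separability_index` and its helpers) and the toy data are hypothetical.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length response vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scaled_outer(d, scale=1.0):
    """Return scale * d d^T as a nested list."""
    return [[scale * a * b for b in d] for a in d]

def add_matrices(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius_norm(M):
    """Frobenius norm (an assumed choice; the text only says 'the norm')."""
    return math.sqrt(sum(x * x for row in M for x in row))

def separability_index(responses_by_category):
    """responses_by_category: dict mapping label -> list of response vectors."""
    all_rows = [r for rows in responses_by_category.values() for r in rows]
    dim = len(all_rows[0])
    total_mean = mean_vector(all_rows)  # step 1: total mean (over all samples)
    S_W = [[0.0] * dim for _ in range(dim)]
    S_B = [[0.0] * dim for _ in range(dim)]
    for rows in responses_by_category.values():
        mu = mean_vector(rows)  # step 1: category center of mass
        for r in rows:  # step 2: within-category scatter
            d = [x - m for x, m in zip(r, mu)]
            S_W = add_matrices(S_W, scaled_outer(d))
        d = [m - t for m, t in zip(mu, total_mean)]  # step 2: between-category scatter
        S_B = add_matrices(S_B, scaled_outer(d, scale=len(rows)))
    return frobenius_norm(S_B) / frobenius_norm(S_W)  # step 3: ratio of norms

# Toy data: two well-separated categories in a two-neuron response space.
toy = {"face": [[0.0, 0.0], [2.0, 0.0]], "scrambled": [[10.0, 0.0], [12.0, 0.0]]}
print(separability_index(toy))  # 25.0
```

Unlike a classification accuracy, this ratio keeps growing as the categories move further apart, so it can still distinguish populations that a classifier would score as equally perfect.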

SNC and CMI
To examine the influence of individual neurons on population-level decoding, we introduced the concept of the SNC. It measures the reduction in decoding performance when a single neuron is removed from the population. We systematically removed each neuron from the population, one at a time, and measured the corresponding drop in accuracy compared to the case where all neurons were present.
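The leave-one-neuron-out procedure can be sketched as follows. This is a minimal illustration, not the study's implementation: the decoder is passed in as a callable, and the example uses a simple nearest-centroid decoder scored on its own training data purely as a stand-in for the LDA decoder used in the study. The function names and toy data are hypothetical.

```python
def single_neuron_contributions(responses, labels, decode_accuracy):
    """Drop in decoding accuracy when each neuron is removed in turn.

    responses: list of trials, each a list with one response per neuron.
    decode_accuracy: callable (responses, labels) -> accuracy in [0, 1].
    """
    n_neurons = len(responses[0])
    full_accuracy = decode_accuracy(responses, labels)
    contributions = []
    for k in range(n_neurons):
        # Remove neuron k from every trial and decode again.
        reduced = [trial[:k] + trial[k + 1:] for trial in responses]
        contributions.append(full_accuracy - decode_accuracy(reduced, labels))
    return contributions

def nearest_centroid_accuracy(responses, labels):
    """Stand-in decoder (NOT LDA): assign each trial to the closest class mean."""
    groups = {}
    for trial, label in zip(responses, labels):
        groups.setdefault(label, []).append(trial)
    centroids = {lab: [sum(c) / len(rows) for c in zip(*rows)]
                 for lab, rows in groups.items()}

    def predict(trial):
        return min(centroids, key=lambda lab: sum(
            (x - m) ** 2 for x, m in zip(trial, centroids[lab])))

    hits = sum(predict(t) == lab for t, lab in zip(responses, labels))
    return hits / len(labels)

# Toy population: neuron 0 separates the two SF labels, neuron 1 does not.
responses = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.2, 5.1]]
labels = ["LSF", "LSF", "HSF", "HSF"]
print(single_neuron_contributions(responses, labels, nearest_centroid_accuracy))
# [0.5, 0.0]
```

In the toy example, removing the informative neuron drops accuracy to chance (a contribution of 0.5), while removing the uninformative neuron changes nothing. A real analysis would use cross-validated accuracy rather than training accuracy.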
To quantify the CMI between pairs of neurons, we discretized their response patterns using ten levels of uniformly spaced bins. The CMI is calculated using the following formula.

$$I(X; Y \mid Z) = \sum_{z} p(z) \sum_{x} \sum_{y} p(x, y \mid z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)}$$

where $X$ and $Y$ represent the discretized responses of the two neurons, and $Z$ represents the conditioned variable, which can be the category (face/non-face) or the SF range (LSF (R1 and R2) and HSF (R4 and R5)). We normalized the CMI by subtracting the CMI obtained from randomly shuffled responses and added the average CMI of SF and category. The CMI calculation enables us to assess the degree of information shared or exchanged between pairs of neurons, conditioned on the category or SF, while accounting for the underlying probability distributions.
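A plug-in estimate of the conditional mutual information from discretized responses could look like the sketch below. It uses the standard definition of CMI with empirical frequencies. The bin edges spanning each neuron's observed response range, the base-2 logarithm, and the function names are assumptions of this example, and the shuffle-based normalization described above is deliberately left out.

```python
import math
from collections import Counter

def discretize(values, n_bins=10):
    """Map responses to integer bins of uniform width over the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clipped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from three aligned label lists."""
    n = len(z)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y|z) / (p(x|z) p(y|z)) ], written with raw counts.
        ratio = c * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        cmi += (c / n) * math.log2(ratio)
    return cmi

# Two neurons that co-vary within a single condition share one bit.
print(conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]))  # 1.0
# If the co-variation is fully explained by the condition, the CMI is zero.
print(conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0
```

The second call shows why conditioning matters: the two neurons are perfectly correlated overall, but once the condition is known they carry no additional information about each other.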

Sparseness analysis
The sparseness analysis was conducted on the LDA weights, regarded as a measure of task relevance. To calculate the sparseness of the LDA weights, the neuron responses were first normalized using the z-score method. Then, the sparseness of the weights associated with the neurons in the LDA classifier was computed. The sparseness is computed using the following formula.
where $w$ is the neuron weight in LDA and $E(w^2)$ represents the mean of the squared weights of the neurons. The maximum sparseness occurs when only one neuron is active, whereas the minimum sparseness occurs when all neurons are equally active.
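One measure that matches this description (maximal when a single neuron carries all the weight, minimal when all weights have equal magnitude) is the sparseness index in the style of Vinje and Gallant. The sketch below assumes that form. The use of absolute weights, the 1 - 1/N normalization, and the function name are assumptions of this example and may differ from the expression used in the study.

```python
def sparseness(weights):
    """Vinje-Gallant style sparseness of a weight vector, in [0, 1].

    Assumed form: (1 - E(|w|)^2 / E(w^2)) / (1 - 1/N).
    Requires at least two weights, not all zero.
    """
    n = len(weights)
    mean_abs = sum(abs(w) for w in weights) / n
    mean_sq = sum(w * w for w in weights) / n
    return (1 - mean_abs ** 2 / mean_sq) / (1 - 1 / n)

print(sparseness([1.0, 0.0, 0.0, 0.0]))    # 1.0: one neuron carries all the weight
print(sparseness([1.0, -1.0, 1.0, -1.0]))  # 0.0: all neurons contribute equally
```

Taking absolute values makes the index depend only on how much each neuron contributes, not on the sign of its LDA weight.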

Deep neural network analysis
To compare our findings with those derived from deep neural networks, we selected a diverse set of CNN architectures: ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNetb0, CORNet-S, CORNet-RT, and CORNet-z, chosen to give a broad overview of SF processing capabilities in deep learning models. We used both randomly initialized weights and weights pre-trained on the ImageNet dataset, which allowed us to assess the influence of the prior knowledge embedded in pre-trained weights on SF decoding. To extract feature maps, we fed our stimulus set to the models and captured the feature maps from the last four layers, excluding the classifier layer. Our results are based primarily on the final layer (preceding classification), but they were consistent across all examined layers. For classification and SF profiling, we followed the same procedures used in our neural response analysis.

SF response distribution
To check the SF response strength, the histogram of IT neuron responses to scrambled, face, and non-face stimuli is illustrated in this figure. A Gamma distribution is also fitted to each histogram. To calculate the histogram, the response of each neuron to each unique stimulus is calculated in spikes/second (Hz). In the early phase, T1, the average firing rate to scrambled stimuli is 26.3 Hz, which is significantly higher than the 23.4 Hz response in the -50 to 50 ms window. In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. In the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5, 19.4, and 22.4 Hz, respectively.

Robustness of SF profiles
To investigate the robustness of the SF profiles to trial-to-trial variability, we calculated each neuron's profile using half of the trials. Then, the neuron's response to R1, R2, ..., R5 was calculated with the remaining trials. Appendix 1-figure 3 illustrates the average response to each SF band for each profile.
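The split-half logic can be sketched for a single neuron as follows. This is a simplified illustration under stated assumptions: trials are split into interleaved halves (even and odd trial indices) rather than randomly, and the "profile" is reduced to the preferred SF band instead of the quadratic-fit profile used in the study. The function name and toy firing rates are hypothetical.

```python
def split_half_means(trials_by_band):
    """Mean response per SF band, computed separately on two halves of the trials.

    trials_by_band: dict mapping band name -> list of single-trial responses
    (at least two trials per band). Interleaved halves are an assumption.
    """
    first, second = {}, {}
    for band, trials in trials_by_band.items():
        half_a = trials[0::2]  # used to define the profile
        half_b = trials[1::2]  # held out to evaluate it
        first[band] = sum(half_a) / len(half_a)
        second[band] = sum(half_b) / len(half_b)
    return first, second

# Toy neuron (firing rates in Hz) that responds more strongly to R1 than to R5.
trials = {"R1": [10.0, 12.0, 14.0, 16.0], "R5": [2.0, 4.0, 6.0, 8.0]}
profile_half, held_out_half = split_half_means(trials)
preferred = max(profile_half, key=profile_half.get)
print(preferred, held_out_half[preferred])  # R1 14.0
```

If the preference identified on the first half is still visible in the held-out half, the profile is unlikely to be an artifact of trial-to-trial variability.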

Figure 1. Experimental design and SF coding. a Experimental design. The experiment collected responses from IT neurons to 15 stimuli (including six faces, three non-faces, and six selective stimuli, see Materials and methods) in six SF bands (intact and R1 to R5, see Materials and methods) and two versions (scrambled and unscrambled) using a passive presentation task. Presentation of a block starts if the monkey maintains fixation for 200 ms. Each block consisted of a 33 ms stimulus presentation followed by a 465 ms blank screen with a fixation point, and each stimulus was presented 15 times. The recorded signals were sorted, and visually responsive neurons were selected (N = 266, see Materials and methods). b A sample of the fixed stimulus set. This panel shows three (out of six) faces, three non-faces, and one scrambled sample stimulus. Each row corresponds to an SF range, starting with intact, followed by R1 to R5 (low to high SF). c A sample neuron. The PSTH of a sample neuron (N = 151, M1) for scrambled stimuli is depicted. To generate a response vector for a given stimulus or trial, the responses of each neuron were averaged in a 50 ms time window centered around the relevant time point. The PSTH was smoothed using a Gaussian kernel with a standard deviation of 20 ms. The responses to three SF bands (R1, R3, and R5) are shown for better illustration. d SF coding exists in the IT cortex. The decoding performance of SF ranges using scrambled stimuli is shown over time. Single-level and population-level representations were fed into an LDA algorithm to predict the SF range of the scrambled stimuli. Shaded areas show the SEM and STD for single and population levels, respectively. This figure highlights the presence of SF coding in both individual and population neural activity. e LSF-preferred nature of SF coding. The population recall of each SF band in response to scrambled stimuli, determined using the LDA method, is presented. The error bars indicate the STD. The results demonstrate a decreasing trend as SF moves towards higher frequencies, suggesting a coarse-to-fine decoding preference.

Figure 2. The temporal dynamics of SF representation. a Coarse-to-fine nature of SF coding. The onset time of the recall of each SF range in scrambled stimuli is illustrated, with error bars indicating the STD. The results suggest that the onset time of decoding increases as SF increases. b SF preference shifts toward higher frequencies over time. The time course of the average preferred SF (see Materials and methods) across neurons is illustrated. The average preferred SF of IT neurons moves towards higher frequencies from 170 ms after stimulus onset, reaching its highest value at 220 ms. A second peak emerges at 320 ms after stimulus onset. The SF preference shows a monotonic increase followed by a decrease in time. c,d Shift in neural response towards HSF. The average response of all neurons within the two time intervals (T1 and T2 in panel b) is shown, with error bars indicating the SEM. c In T1, from 70 ms to 170 ms after stimulus onset, the response of the neurons decreases as the SF content shifts towards higher frequencies. The relative percentage of neurons showing stronger responses to each SF range (R1 to R5) in T1 is depicted in the inner top panel. R1 is the most responsive SF for roughly 40% of the neurons. d In the following interval (T2, 170 ms to 270 ms), an increasing tuning is observed from R2 to R5, where R5 elicits the highest firing rates. Furthermore, in T2, there is a roughly threefold increase in the percentage of neurons exhibiting stronger responses to R5 compared to T1, indicating a shift in the neurons' responses towards HSF (top panel).
strength of SF selectivity in each profile is provided in Appendix 1-figure 3, by forming the profile of each neuron based on half of the trials and then plotting the average SF responses with the other half. Following profile identification, the object coding capability of each profile population is assessed. Here, instead of LDA, we employ the separability index (SI) introduced by Dehaqani et al. (2016), because LDA cannot fully capture the information differences between groups, as it only categorizes samples as correctly classified or misclassified. To examine the face and non-face information separately, SI is calculated for face vs. scrambled and non-face vs. scrambled. Figure 3a displays the identified profiles, and Figure 3b indicates the average SI value from 70 ms to 170 ms after stimulus onset. The HSF-preferred profile shows significantly higher face information than the other profiles (face SI for LP = 0.58 ± 0.03, HP = 0.89 ± 0.05, U = 0.07 ± 0.01, IU = 0.07 ± 0.01; HP > LP, U, IU with p-value < 0.001) and higher face information than the non-face information in all profiles (non-face SI for LP = 0.04 ± 0.01, HP = 0.02 ± 0.01, U = 0.19 ± 0.03, IU = 0.08 ± 0.02; face SI in HP is greater than non-face SI in all profiles with p-value < 0.001).

Figure 3. SF profile predicts category coding. a,b SF profile predicts category selectivity. a The responses of each neuron were standardized by subtracting the mean and dividing by the standard deviation of the baseline time. Neurons were then categorized into four groups based on the fitting of a quadratic function to their responses (see Materials and methods). Each panel presents the average neuron responses within each category for SF ranges R1 to R5, with error bars indicating the SEM of the response values. The percentage of the neurons in each category is displayed at the top of each panel. The "flat" category, in which the response to no SF band was higher than to the others, was excluded from this analysis. b SI of face/non-face vs. scrambled stimuli is illustrated (see Materials and methods). The SI value and SF profile are determined within the time window of 70 ms to 170 ms after stimulus onset. The HSF-preferred population exhibited significantly higher face SI than the other groups. The LSF-preferred population displayed a significant difference between face and non-face SI. In contrast, the IU profile shows a significantly higher SI value for non-face than for face. The U-shaped profile did not show any significant difference between face and non-face. These results suggest that a neuron's response to various SF bands can predict its decoding capability. c,d The relation between SF and category coding in sub-populations. Initially, the LDA method was employed to calculate each individual neuron's performance in single-level category and SF coding. Next, a sorting procedure based on SF (panel c) and category (panel d) coding performance was conducted to create sub-populations of neurons exhibiting similar capabilities (see Materials and methods). The scatter plot of the category and SF coding accuracy of these sub-populations demonstrated a notably high degree of positive correlation between SF and category accuracies in the IT cortex.

Figure 4. Uncorrelated mechanisms for SF and category coding. a Uncorrelated SF and category coding at the single-neuron level. The scatter plot of category and SF accuracies does not reveal a significant correlation between SF and category coding capabilities within the IT cortex at the single-neuron level. The error bars show the STD for SF and category decoding accuracies. b Uncorrelated neuron contributions to SF and category coding in the population. The LDA weight of each neuron is considered as the neuron's contribution to the population coding of SF or category (see Materials and methods). The scatter plot shows a near-zero correlation between the neuron weights in SF coding and the neuron weights in category coding.

Figure 6. SF representation in CNNs. a SF coding capabilities. We assessed the SF coding capabilities of popular CNN architectures (ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNetb0, CORNet-S, CORNet-RT, and CORNet-z) using both randomly initialized (R) weights and weights pre-trained (P) on ImageNet. An LDA model was trained using feature maps from the last four layers of each CNN to classify the SF content of input images. The SF decoding accuracy for each CNN on our dataset is presented, with error bars indicating the STD. b LSF-preferred recall performance. The recall performance of two sample networks (CORNet-z and ResNet18) is presented. STD values are illustrated with error bars. The recall values for LSF content were higher than for HSF content in most CNNs, resembling the trends observed in the IT cortex. c The profiles (left) and face/non-face SI values (right) of a sample network (ResNet18). Profiles are calculated in the same way as for the IT cortex. CNNs did not replicate the SF-based profiles observed in the IT cortex.
Gaska et al. (1988) observed low-pass tuning curves in the V3A area, and Chen et al. (2018) reported an average low-pass tuning curve in the superior colliculus (SC). Purushothaman et al. (2014) identified two distinct types of neurons in V1 based on their response to SF. The majority of neurons in the first group exhibited a monotonically shifting preference toward HSF over time. In contrast, the second group showed an initial increase in preferred SF followed by a decrease. Our findings align with these observations, showing a rise in preferred SF starting at 170 ms after stimulus onset, followed by a decline at 220 ms after stimulus onset. Additionally, Zhang et al. (2023) found that LSF is the preferred band for over 40% of V4 neurons. This finding is also consistent with our observations, where approximately 40% of neurons consistently exhibited the highest firing rates in response to LSF throughout all response phases. Collectively, these results suggest that the average LSF-preferred tuning curve observed in the IT cortex could be a characteristic inherited from the lower areas in the visual hierarchy. Moreover, regarding the coarse-to-fine theory of SF processing, Chen et al. (2018) and Purushothaman et al. (2014) observed a faster response to LSF compared to HSF in SC and V1, which resonates with our coarse-to-fine observation in SF decoding. When analyzing the relationship between the SF content of complex stimuli and IT responses, Bermudez et al. (2009) observed a correlation between neural responses in the IT cortex and the SF content of the stimuli. This finding is in line with our observations, since successful decoding arises directly from the distinct patterns that the SF bands evoke in neural responses.
response which supports the impact of the HSF part of the input. Similar LSF-preferred responses are also reported by Chen et al. (2018) (50 ms for SC) and Zhang et al. (2023) (3.5-4 secs for V2 results. Neurons in the visual cortex, including the IT cortex, have specific tuning preferences for different SFs. Some neurons are more sensitive to HSF, while others respond better to LSF. This distribution of sensitivity allows the visual system to analyze and interpret information carried by different SF components of visual stimuli concurrently. Moreover, the IT cortex's coding of SF can contribute to object invariance and generalization. By representing objects in terms of their SF content, the IT cortex becomes less sensitive to variations in size, position, or orientation, ensuring consistent recognition across different conditions. SF information also aids the IT cortex in categorizing objects into meaningful groups at various levels of abstraction. Neurons can selectively respond to shared SF characteristics among different object categories (assuming that objects in the same category share a level of SF characteristics), facilitating decision-making about visual stimuli. Overall, we posit that SF's explicit representation and coding in the IT cortex enhance its proficiency in object recognition. By capturing essential details and characteristics of objects, the IT cortex creates a rich representation of the visual world, enabling us to perceive, recognize, and interact with objects in our environment.
levels, we investigated their underlying coding mechanisms (for single level and population level separately). Figure 4a displays the scatter plot of SF and category coding capacity for individual neurons. The correlation between SF and category accuracy across individual neurons shows no −13). Therefore, SF representation relies more on individual neuron representations, suggesting a sparse mechanism of SF coding where single-level neuron information is less redundant. In contrast, single-level representations of category appear to be more redundant and robust against information loss or noise at the level of individual neurons. We utilized conditional mutual information (CMI) between pairs of neurons conditioned on the label, SF (LSF (R1 and R2)

InceptionV3, EfficientNetb0, CORNet-S, CORNet-RT, and CORNet-z, with both pre-trained on ImageNet and randomly initialized weights (see Materials and methods). Employing feature maps from the last four layers of each CNN, we trained an LDA model to classify the SF content of input images. The results indicated that CNNs exhibit SF coding capabilities with much higher accuracies than the IT cortex. Figure 6a shows the SF decoding accuracy of the CNNs on our dataset (SF decoding accu-

However, while the CNNs exhibited some similarities in SF representation with the IT cortex, they did not replicate the SF-based profiles that predict neuron category selectivity. As depicted in Figure 6c, although neurons formed similar profiles, these profiles were not associated with the category decoding performances of the neurons sharing the same profile.

Saneyoshi A, Michimata C. Categorical and coordinate processing in object recognition depends on different spatial frequencies. Cognitive Processing. 2015; 16:27-33.
Schyns PG, Oliva A. From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science. 1994; 5(4):195-200.
Akhaee MA, Dehaqani MRA. An automatic spike sorting algorithm based on adaptive spike detection and a mixture of skew-t distributions. Scientific Reports. 2021; 11(1):1-18.
Akhaee MA, Dehaqani MRA. Brain-inspired feedback for spatial frequency aware artificial networks. In: 2022 56th Asilomar Conference on Signals, Systems, and Computers. IEEE; 2022. p. 806-810.
Perlovsky L, Bar M. Predictions and incongruency in object recognition: A cognitive neuroscience perspective. Detection and identification of rare audiovisual cues. 2012; p. 139-153.
Schriver KE, Hu JM, Roe AW. Spatial frequency representation in V2 and V4 of macaque monkey. eLife.