Face-selective units in human ventral temporal cortex reactivate during free recall

Properties of face-responsive individual neurons in the human ventral temporal cortex (VTC) have yet to be studied, and their role in conscious perception remains unknown. To explore this, we implanted microelectrodes into the VTCs of eight human subjects undergoing invasive epilepsy monitoring. Most (26 of 33) category-selective units showed specificity for face stimuli, with a range of response profiles. Different face exemplars evoked consistent and discriminable responses in the population of units sampled. During a free recall task, face-selective units selectively reactivated in the absence of visual stimulation during the 2-second window prior to face recall events. Furthermore, the identity of the recalled face could be predicted by comparing activity preceding recall events to activity evoked by visual stimulation.

One Sentence Summary: Single neurons in the human ventral temporal cortex code for individual face exemplars, both during sensory stimulation and during imagery in the absence of sensory stimulation.


Main Text:
Facial recognition is an essential adaptive social function in primates, facilitated by the extensive development of specialized visual areas in the brain's ventral temporal cortex (VTC). Information processing therein must meet social demands to recognize, classify and uniquely identify a multitude of faces. Studies of single-neuron firing in response to visual features of faces, first conducted in monkeys, have uncovered key "bottom-up" mechanisms of the feature space that drives neuronal firing (1-3). VTC neurons exhibit precise facial feature sensitivity, supporting their role in the discrimination of individual face exemplars. However, their role in non-sensory, extraretinal processing, as would occur during imagery and recall, is difficult to extrapolate from monkeys, which cannot qualitatively express their experience. In humans, neuroimaging has defined a critical node in face processing within the VTC: the fusiform face area (FFA) (4). Functional MRI (fMRI) studies support both a sensory ("bottom-up") and a cognitive ("top-down") role for the FFA, which is activated not only when subjects view faces, but also when they expect to see a face (5,6), perform visual imagery tasks involving faces (7,8), and hold face representations in working memory (9). Activation of category-selective regions of VTC can predict recall of items in that category (10,11). In addition to fMRI studies, magnetoencephalography (12) and direct electrocortical recordings (13) in humans show fusiform responses emerging early (~100 ms) after face presentation, which have been interpreted as evidence for a sensory role for this region. While these studies have provided important insights, they lack the spatiotemporal resolution needed to uncover response properties of individual human neurons. Thus, the exquisite neuronal selectivity for facial features revealed by monkey neurophysiology, the relatively early sensitivity revealed by human field potential recordings and the fMRI evidence for top-down control of the FFA have yet to be integrated.
Clinical macroelectrodes modified to include microwires provide an opportunity to study neuronal spiking in patients undergoing chronic recordings, and such in vivo human data have provided insights into a number of physiological and pathological processes, most notably in the medial temporal lobe (14). However, single and population spiking activity in the VTC has yet to be explored. To bridge these knowledge gaps, we recorded from microwires in the VTCs of human subjects as they viewed face stimuli and, later, recalled them in an episodic free recall task, allowing us to examine human sensory as well as higher-order cognitive processes at the single-unit level. Recorded units exhibited a wide diversity of response patterns while maintaining strong category specificity. We show that population responses to the presentation of different face exemplars can be robustly discriminated, and that these responses are reinstated when subjects recall and visualize previously presented face images in the absence of sensory input. Our results support models of episodic memory in which the single-neuronal substrate of sensory processing is reactivated in a top-down fashion during recollection (15).
Eight subjects (four female; see Table S1) with partial epilepsy undergoing diagnostic intracranial EEG (iEEG) monitoring of the VTC were implanted with micro-macro depth electrodes (16) (Ad-Tech Medical, Oak Creek, WI), using either anatomical landmarks (17) or fMRI guidance. All subjects gave informed consent under IRB guidelines, and electrode targeting was guided primarily by clinical needs. Electrode positions were localized post-operatively by superimposing MRI and computed tomography scans (18), and an example is shown in Fig. 1A (full data, Fig. S1). None of these areas were involved in seizure onset, as determined by epileptologist evaluation. Subjects performed a 1-back task with faces, body parts, houses, tools and patterns (Fig. S2) while microelectrode data were recorded at 30 kHz. Sixty-three visually-responsive single and multi-units were recorded collectively from all eight subjects (see Table S1 for single-subject data). Of these, 33 were category-selective, with the vast majority (26, recorded from five subjects; 41% of visually-responsive units) selective for faces (see Fig. S3). The average time-course of all visually-responsive units across subjects showed a marked preference for faces (Fig. 1B). Visual responsiveness was strongly correlated with face selectivity (p < 0.001, Spearman's ρ, Fig. S4).

Fig. 1C shows a typical face-selective unit, with a vigorous response to face image presentation and return to baseline firing rate after the image disappeared. However, we also observed a surprising diversity of face-selective response patterns, including units with response persistence (Fig. 1D, Fig. S5) and units whose activity showed sharp transient peaks after face presentation (Fig. 1E, Fig. S6). One unit showed a strong face-selective offset response (Fig. 1F, Fig. S7). Several units showed selective suppression to face presentation (Fig. 1G, Fig. S8), a finding consistent with single-unit recordings from inferotemporal cortex in non-human primates (19,20).
We also recorded units with selectivity for non-face categories: tools and houses. With the exception of two house-selective units recorded from one subject, from whom only weakly face-selective units were recorded, all house- and tool-selective units came from subjects whose recording sites yielded no face- or body-selective units. This is consistent with the reported segregation of domains dedicated to processing animate and inanimate objects (17,21). After face-selective units, house-selective units were the most common, with four (6% of visually-responsive units). Fig. 1H shows one such unit (also see Fig. S9). The sole tool-selective unit is shown in Fig. 1I (also see Fig. S10).

Next, we sought to corroborate previous reports suggesting that individual faces (face exemplars) had unique representations in the FFA (22-24), in human VTC more broadly (25), and, at the single-unit level, in macaque face patches (1-3). Responses to single presentations (trials) of exemplars from all visually-responsive units across all subjects were concatenated into a single log-transformed pseudo-population vector. Multi-dimensional scaling (MDS), a linear technique for dimensionality reduction, was then applied to visualize the relationships among trials of different exemplars in a common space. For example, responses from all trials of three face and three house exemplars (Face 1-3 and House 1-3 in the stimulus set, chosen ex ante for illustration) are presented in Fig. 2B. As expected, we observed a clear segregation of faces and houses in the representational space. We then applied MDS separately to the three face exemplars (Fig. 2C) and the three house exemplars (Fig. 2D), and found that trials of individual face exemplars appeared to be linearly separable, while trials of house exemplars were not.
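The MDS visualization described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes classical (Torgerson) MDS on Euclidean distances, a log(1 + x) transform, and simulated Poisson spike counts. The function name `classical_mds` and all simulated values are hypothetical.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS of the rows of X, using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distance between every pair of trials
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D2, 0.0, out=D2)  # clip tiny negative values from rounding
    # Double-centre the squared distances to obtain a Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # The leading eigenvectors, scaled by sqrt(eigenvalue), give the embedding
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

# Hypothetical pseudo-population: 20 units, 10 trials for each of two exemplars
rng = np.random.default_rng(0)
n_units, n_trials = 20, 10
rates_a = rng.uniform(2, 20, n_units)  # assumed mean spike counts, exemplar A
rates_b = rng.uniform(2, 20, n_units)  # assumed mean spike counts, exemplar B
counts = np.vstack([rng.poisson(rates_a, (n_trials, n_units)),
                    rng.poisson(rates_b, (n_trials, n_units))])
X = np.log1p(counts)             # log transform; log(1 + x) is an assumption
embedding = classical_mds(X, 2)  # one 2-D point per trial, ready for plotting
```

Each row of `embedding` corresponds to one trial, so plotting the rows colored by exemplar gives a picture analogous to Fig. 2B-D.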
To quantify this, we plotted transformed pseudo-population response vectors from each pair of exemplars presented and used a simple linear discriminant to separate them. We then performed leave-one-out cross-validation as a measure of the discriminability of each of these exemplar pairs (Fig. 2A). Classifier accuracy was very high for exemplar pairs from different stimulus categories, especially faces vs. non-face objects, showing that responses to these categories are distinct (p < 0.01, bootstrap test, Bonferroni correction). While it is unsurprising that faces can be distinguished from non-faces, given the vastly different magnitudes of the responses of many of the units, the classifier was also able to distinguish among non-face categories. Confirming previous studies, we show robust exemplar selectivity, evidenced by strong classifier performance in discriminating face exemplar pairs (>80%, p < 0.01, bootstrap test, Bonferroni correction). We also found some weaker tool exemplar decoding (p < 0.05, bootstrap test, Bonferroni correction), but no within-category exemplar decoding for other categories.
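A pairwise discriminability measure of this kind can be sketched as a two-class Fisher linear discriminant evaluated with leave-one-out cross-validation. The sketch below is illustrative only. The ridge term added to the pooled covariance, the equal-prior decision threshold, the function names and the simulated data are all assumptions, not details taken from the Methods.

```python
import numpy as np

def fit_lda(X, y, ridge=1e-3):
    """Two-class Fisher discriminant with a ridge-regularized pooled covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - m0, X1 - m1])           # within-class centred data
    cov = Xc.T @ Xc / max(len(Xc) - 2, 1)
    cov += ridge * np.eye(X.shape[1])            # assumed regularization
    w = np.linalg.solve(cov, m1 - m0)            # discriminant direction
    b = -0.5 * w @ (m0 + m1)                     # threshold midway between means
    return w, b

def loo_accuracy(X, y, ridge=1e-3):
    """Fraction of trials classified correctly when each is held out in turn."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i                # mask that drops trial i
        w, b = fit_lda(X[train], y[train], ridge)
        pred = int(X[i] @ w + b > 0)
        correct += int(pred == y[i])
    return correct / n

# Hypothetical log-transformed responses: 5 units, 10 trials per exemplar
rng = np.random.default_rng(1)
n_trials, n_units = 10, 5
exemplar_a = rng.normal(1.0, 0.1, (n_trials, n_units))
exemplar_b = rng.normal(2.0, 0.1, (n_trials, n_units))
X = np.vstack([exemplar_a, exemplar_b])
y = np.array([0] * n_trials + [1] * n_trials)
accuracy = loo_accuracy(X, y)
```

Repeating `loo_accuracy` over every pair of exemplars would fill a pairwise accuracy matrix of the kind shown in Fig. 2A.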
Next, we tested whether face representations could be activated spontaneously by the brain, during imagery in the absence of external visual input. To that end, half of the subjects also performed an episodic free recall task, described previously (11), in which they were shown and asked to remember full-color photographs of famous faces and scenes. After performing a short interference task and putting on a blindfold, the subjects were asked to freely recall as many pictures as possible, focusing on one category (faces/places) at a time. Importantly, the subjects were instructed to visualize and describe each picture that they recalled in as much visual detail as possible, emphasizing unique colors, facial expression, lighting, perspective, etc. Face-selective units were recorded from 3/4 subjects, and small numbers of place-selective units were recorded from 2/4 (Fig. S11A).
We observed that units' firing rates during face presentation (over pre-stimulus baseline), and in the 4-second interval centered at onset of the face recall utterance (over whole-experiment baseline), were positively correlated (p < 0.05, Spearman's ρ = 0.33, Fig. 3C), as was their preference for face stimuli during presentation and recall (p < 0.05, Spearman's ρ = 0.37, Fig. 3D). These correlations persisted when only strongly visually-responsive units (see Methods) were included. To further examine this content-selective relationship, and to investigate the precise temporal dynamics of this recall-triggered activity, we computed the average baseline-corrected activity for each presentation trial (Fig. 3E) and each recall event (Fig. 3F) for all face-selective units in each implant, and compared face with place stimuli. Activity in face-selective units was significantly greater around face recall events than around place recall events (2-sample t-test, p < 0.05). Mean activity in face-selective units began increasing around 2 seconds before onset of a face recall utterance, peaked, and returned to near baseline as the subject began to speak. This result is consistent with previous fMRI (10) and iEEG (11) studies showing an increase in blood-oxygenation-level-dependent (BOLD) activity and high-frequency broadband signal in category-selective VTC in the seconds leading up to recall of an item in that category.
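For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation of the ranks of two variables. The sketch below computes it in plain Python with average ranks for ties. The per-unit response values are made up for illustration and are not data from this study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical baseline-corrected responses, one value per unit
presentation_response = [1.0, 2.0, 3.0, 4.0, 5.0]
recall_response = [0.2, 0.1, 0.4, 0.3, 0.5]
rho = spearman_rho(presentation_response, recall_response)
```

Because only ranks enter the calculation, the measure is insensitive to the different baselines used for presentation and recall, provided the ordering of units is preserved.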
Single-trial raster plots (Fig. 3A, Fig. S12) for face-selective units displayed remarkably similar patterns of activity before recall and during the initial face image presentation, suggesting it might be possible to predict recall of individual face exemplars using firing patterns of these units during presentation. To do this, we trained a classifier similar to the one used to test for exemplar decoding in the 1-back task (Fig. 2) on data from stimulus presentation and attempted to classify individual recall events based on the activity of the units in the 2 seconds prior to utterance onset. As mean firing rates can vary between presentation and recall (Fig. S13), we normalized across the population before the classifier was applied.
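The train-on-presentation, test-on-recall scheme can be illustrated with a small sketch. Two simplifications are assumed here and are not taken from the paper: normalization is implemented as z-scoring each population vector across units, and a nearest-centroid rule stands in for the linear discriminant. All names and numbers are hypothetical.

```python
import numpy as np

def zscore_across_units(X):
    """Z-score each trial's population vector across units (one row per trial)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                       # guard against flat vectors
    return (X - mu) / sd

def fit_centroids(X, y):
    """Mean normalized pattern for each exemplar label."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each row of X to the exemplar with the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Hypothetical firing patterns of 4 units for two face exemplars
face_a = np.array([1.0, 5.0, 2.0, 8.0])
face_b = np.array([8.0, 2.0, 5.0, 1.0])
present_X = np.vstack([face_a, face_a, face_b, face_b])
present_y = np.array([0, 0, 1, 1])

# Recall activity modelled as a weaker, offset copy of the presentation pattern
recall_X = np.vstack([0.3 * face_a + 0.5, 0.3 * face_b + 0.5])

classes, centroids = fit_centroids(zscore_across_units(present_X), present_y)
recall_pred = predict(zscore_across_units(recall_X), classes, centroids)
```

Z-scoring removes any positive scaling and offset shared across units, so a recall pattern that is a weaker copy of the presentation pattern maps onto the same normalized vector.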
Leave-one-out cross-validation was performed on a set of binary classifiers trained to discriminate each pair of exemplars. Classification accuracy was defined as correctly identifying the exemplar from which the input trial was drawn (for within-category classification), and correctly identifying its category, regardless of the specific exemplar (in cross-category classification). Three out of the four subjects showed above-chance cross-category classification accuracy (bootstrap test, p < 0.05, Bonferroni-Holm correction, Fig. S11B), and two also showed above-chance face classification accuracy. This corresponds to the set of subjects from whom face-selective units were recorded, reinforcing that exemplar-selective face information can be decoded at the level of a few neurons spatially confined to the sampling volume of a single microwire bundle, no more than 8 mm across.
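The Bonferroni-Holm correction mentioned above is a step-down procedure: p-values are sorted in ascending order, and the k-th smallest (counting from zero) is compared against alpha / (m - k), stopping at the first failure. A minimal sketch follows; the p-values are invented for illustration and the function name `holm_reject` is hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected under Holm's procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as more hypotheses are rejected
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also retained
    return reject

# Hypothetical per-subject p-values from four decoding tests
p_values = [0.01, 0.04, 0.03, 0.005]
decisions = holm_reject(p_values)
```

In this example the two smallest p-values pass their thresholds (0.0125 and about 0.0167), the third (0.03) exceeds 0.025, and the procedure stops there.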
We proceeded to test the performance of our classifier sets in decoding exemplar or category based on significant activations prior to recall utterances (Fig. 3B, Fig. S11). Subject 3 showed above-chance cross-category decoding (p < 0.05, bootstrap test, Bonferroni-Holm correction), while Subject 8 showed a similar trend. Taken together, data from Subjects 3 and 8 showed face exemplar decoding above chance (p < 0.05, bootstrap test, Bonferroni-Holm correction; complete data in Fig. S11C-D).
In summary, we demonstrate the first single unit recordings from the human VTC. Extending prior observations from nonhuman primates, we report a diverse range of highly face-selective units within human VTC, show that those units form a population code by which individual face exemplars can be discriminated, and show that reactivation of the patterns forming that code occurs not only during face perception, but also during face imagination and recall. In line with prior neuroimaging work supporting a role of the VTC in conscious perception, we demonstrate selective activation during recall at the single neuron level. These findings support the role of the VTC, and the FFA specifically, as the substrate of conscious face representation in the human brain, and suggest that this structure is used not only to identify and discriminate faces observed in the environment, but also to host internally generated representations. Our research adds to a large and growing body of literature supporting a multi-faceted role for higher-order sensory areas in subserving working memory, imagery and other processes whereby higher-order cognitive events engage the same neuronal substrates as bottom-up sensory processes.

Medtronic. Data and materials availability: Data will be provided upon request. Code is available at: https://github.com/IEEG/SUFreeRecall.

Materials and Methods
Supplementary Text
Figures S1-S13
Table S1
References (26-32)

Responses to faces in red and places in blue. Colored areas: mean ± standard error of the mean at each timepoint. Dashed lines represent image onset at time 0 and offset. (F) Mean peri-recall time histogram. Significant difference between face and place activity (gray box, -2 to 2 seconds, 2-sample t-test, p < 0.05). Black bar: 2-sample t-tests, p < 0.05, uncorrected.