Abstract
Faces are a rich source of social information. How does the infant brain develop the ability to recognize faces and identify potential social partners? We collected functional magnetic resonance imaging (fMRI) data from 49 awake human infants (aged 2.5-9.7 months) while they watched movies of faces, bodies, objects, and scenes. Face-selective responses were observed not only in ventral temporal cortex (VTC) but also in the superior temporal sulcus (STS) and medial prefrontal cortex (MPFC). Face responses were also observed, though not fully selective, in the amygdala and thalamus. We find no evidence that face-selective responses develop in visual perception regions (VTC) prior to higher-order social perception (STS) or social evaluation (MPFC) regions. We suggest that face-selective responses may develop in parallel across multiple cortical regions. Infants’ brains could thus simultaneously process faces both as a privileged category of visual images and as potential social partners.
Introduction
From birth, infants see faces often [1–3], prefer to look at faces more than non-face visual stimuli [4–9], and readily distinguish a familiar face from an unfamiliar one [10, 11]. Infants’ brains visually process faces as a distinct object category [12], but it is not known when in development infants’ brains begin to process the social attributes of faces. In adult brains, perception of faces evokes activity not only in face-selective visual regions [13–15] but also in higher-order brain regions that support multi-modal social perception and evaluation [16–18]. This paper addresses the open and hotly debated question of how this suite of face-selective regions develops. We test two contrasting hypotheses: (1) face-selective responses develop sequentially, with visual perception regions responding selectively to faces earlier in development than higher-order social regions, or (2) face-selective responses emerge in parallel, such that perceptual and social regions come to represent the visual category and social meaning of faces simultaneously. Here we use functional magnetic resonance imaging (fMRI) data from a large group of awake human infants to test whether face-selective responses in social regions emerge later in development than in visual perceptual regions.
Serial Hypothesis of Cortical Development
One prominent hypothesis, which we call the “Serial Hypothesis” of development, holds that distinct brain regions become face-selective in a feed-forward developmental sequence (Figure 1a; for recent reviews see: [19–21]). On this view, face-selective responses first emerge in subcortical regions as a response to the salient visual properties of faces, then in higher-level visual cortex as infants gain experience looking at faces, and finally in higher-order social cognition regions as infants learn the connection between faces and social value. By analogy to the conspecific detection mechanism in newly hatched chicks [22], the first brain region to respond distinctively to faces is likely to be subcortical. This subcortical region would contain a rough template of face-like images (e.g., high contrast shapes with an upper field bias), which would then drive infants to look at faces. Because subcortical regions are small and deep in the brain, they are very hard to measure in human or non-human primate infants, but plausible sites for an innate subcortical face template in primates include the superior colliculus, pulvinar, and/or amygdala [6].
a) The Serial Hypothesis predicts that the earliest face-selective responses arise in an innate subcortical face-template, which drives infants to attend to face-like images. Visual experience with faces then causes self-organization of face-selective responses in ventral-temporal cortical regions, like FFA. Subsequently, higher-order regions like STS and MPFC can associate face shapes with emotional and social meaning. b) Alternatively, the Parallel Hypothesis proposes that from early in infancy, infants’ attention to faces is driven by both perceptual and socio-emotional processes; and that face-selective responses in STS and MPFC arise simultaneously with, and potentially provide input to, perceptual representations of faces in ventral-temporal cortex. c) For each MRI visit, swaddled infants were placed in a custom 32-channel infant head coil. Movies were projected onto a mirror positioned over the infants’ eyes. Example face stimuli pictured (a-b). Blue areas were added to brain images created by Milan Vukelić and available under a public license at: https://www.behance.net/gallery/5399443/BRAINS-%28cognitive-neurosciences%29. Photo credit for images in c to Caitlin Cunningham Photography.
Next, the frequent appearance of face-like images in infants’ visual experience would drive bottom-up, self-organizing specialization of regions in the cortical visual system [20, 23]. The Serial Hypothesis supposes that high-level visual regions initially self-organize along low-level features (e.g., retinotopy, spatial frequency, curvilinearity), called a proto-organization [20]. For example, the area that becomes the fusiform face area (FFA) would acquire a face-selective response profile gradually, as infants gain visual exposure to the curvy, low-spatial-frequency input characteristic of faces [24]. In line with this hypothesis, adult face-selective regions respond more to images with curvilinear features than to images with rectilinear features, even in the absence of faces [25, 26], and macaques raised without exposure to faces do not have face-selective responses that can be measured with fMRI [27].
Finally, once high-level visual regions have formed a visual category of faces, even higher-order brain regions would then be able to associate faces with multi-modal information about social meaning and emotional value. In adults, the social and emotional meaning of faces is encoded in (among other regions) the superior temporal sulcus (STS) and medial prefrontal cortex (MPFC). In adults, the response profiles of these regions are distinct from the perceptual face responses in FFA. The STS has been implicated in a variety of social perception tasks [17] including facial movement [28], social interaction perception [29], voice perception [30, 31], and multimodal face and voice perception [32, 33]. The MPFC is specifically involved in abstract social and emotional processes, such as social evaluation [34–36], self-referential processing [35,37–41], and emotion attribution [16,35,42–47].
The Serial Hypothesis of development predicts that responses to faces in STS and MPFC would develop later, potentially much later, than category-selective regions in VTC [20–22]. One source of this prediction is the evidence that anatomical development proceeds from primary sensory areas to association cortices [48–53] (for review see: [54]). Signatures of cortical maturation, including expansion [55], increased sulcal depth [56], and myelination [57–59], occur in primary sensory areas earlier in development than in association areas. Not only does anatomical development of the brain occur in series, but signatures of neural function such as glucose metabolism [60] and synaptogenesis [61] also reach their peak in primary sensory areas prior to association areas. Perhaps most surprisingly, new neurons are still migrating and being integrated into PFC well into the second year of life [62] – a process that completes in primary sensory areas around the time of birth [63]. This evidence has led to the hypothesis that, due to the structural immaturity of association cortex, functions associated with those regions in adults are not yet present in infants. Yet it is equally plausible that, rather than anatomical maturation preceding neural function, the two are intrinsically reciprocal in brain development.
Parallel Hypothesis of Cortical Development
By contrast to the Serial Hypothesis of development, we propose a Parallel Hypothesis, in which responses to the social meaning and emotional value of faces (by reverse inference, in STS and MPFC, respectively) emerge just as early in development as perceptual responses to faces in FFA (Figure 1b). Substantial behavioral and neural evidence supports this hypothesis. Behaviorally, infants’ active attention to faces cannot be fully explained by a face-template or preference for face-like images. Neonates and young infants do not merely prefer to look at faces over other stimuli, but choose to look at some faces more than others, based on both visual (e.g., skin tone, sex, facial expression) and non-visual (e.g., speaker language, prosody, social behavior) properties [11,64–72]. These behavioral data suggest that infants’ brains represent not just the perceptual form, but also the social meaning and emotional value of faces. Consistent with the predictions from behavioral data, studies using functional near-infrared spectroscopy (fNIRS), electroencephalography (EEG), and positron emission tomography (PET) in human infants show responses to faces in the superior temporal sulcus (STS) as early as a few days after birth [73–80] and in the medial prefrontal cortex (MPFC) by 5 months (the earliest age that has been reported) [74,77,80–82].
However, no previous study has directly tested whether face-selective responses in human infants emerge first in subcortical regions, followed by higher-level visual cortex, followed by social-emotional regions, as predicted by the Serial Hypothesis, or simultaneously across these regions, as predicted by the Parallel Hypothesis. The main limitation is that responses in subcortical and ventral visual regions cannot be measured using surface-based neuroimaging techniques like fNIRS and EEG.
Current study
For the current study, we analyzed fMRI data from 49 infants (Figure 1c) while they watched videos of faces, bodies, objects, scenes, and an abstract baseline. Infants ranged in age from 2 to 9 months, meaning that the oldest infants had three times as much visual experience as the youngest ones. Thus, these data allow us to test distinctive predictions of the two hypotheses. The Serial Hypothesis makes three predictions. First, infants as a group may not yet have any face-selective responses in the purported latest developing regions (STS and MPFC). Second, face-selective responses should be correlated with age, particularly in cortical regions. In visual perceptual regions (e.g., FFA), face-selective responses should increase with infants’ age and visual experience; and if there is any face-selective response in STS or MPFC, it should be more selective in the older infants. Third, a subset of infants (plausibly, the youngest ones) should have face-selective responses in FFA but not yet in STS and MPFC.
Conversely, the Parallel Hypothesis makes a different set of predictions. First, if face-selective responses appear early and simultaneously across multiple brain areas (Figure 1b), then infants as a group should show face-selective responses in subcortical, visual perceptual, and social-emotional regions. Second, face-selective responses should either be present in all infants from the earliest ages or increase simultaneously in all regions. Third, any subset of infants who have face-selective responses in FFA should also have face-selective responses in STS and MPFC.
Results
Face-selective responses observed in infant STS and MPFC
One group of infants was scanned on an older infant coil [83] with a sinusoidal acquisition sequence [84] (Coil 2011; n=26) and a second group of infants was scanned on a newer infant coil [85] with a standard acquisition sequence (Coil 2021; n=23), resulting in higher SNR (see Methods). For the first set of analyses, we treat the two datasets as independent tests of the hypothesis.
As the Serial and Parallel Hypotheses make contrasting predictions for whether infants as a group will have face-selective responses in the STS and MPFC, we first look in these regions. In a group random effects analysis, we observed responses to faces that were greater than the response to the average of non-face conditions in bilateral STS and MPFC, in both the Coil 2011 data and the Coil 2021 data (Figure 2).
Group whole-brain random effects analysis of Coil 2011 data (a; n=26) and Coil 2021 data (b; n=23) revealed face responses (faces>(bodies+objects+scenes)) in both STS (top) and MPFC (bottom). Note that the precise anatomical locations of STS and MPFC activations in the Coil 2011 and Coil 2021 datasets cannot be directly compared due to differences in distortion and resolution (Figure S2). Group maps are displayed on representative functional images from each coil; threshold P<0.05, uncorrected, for visualization purposes. The cortical area outlined in blue represents the anatomical search space. An fROI analysis of Coil 2011 (n=20, 18 individuals) revealed face responses (purple) that were significantly greater in independent data than the responses to bodies (pink), objects (teal), and scenes (green) in STS and MPFC. These results were replicated in the Coil 2021 dataset (b). Symbols: *P<0.05; **P<0.01; ***P<0.001. Error bars indicate within-subject SE [86].
To test whether face responses in STS and MPFC are significantly larger than responses to each control condition (i.e., are “face selective” [15]), we used a functional region of interest (fROI) analysis. We identified the top 5% of voxels that responded more to faces than the average of the other categories within each broad region in individual infants and then extracted responses in those voxels to all four conditions in independent, left-out data from the same infant (see Methods). In the Coil 2011 fROI analysis (Figure 2a, n=18), STS and MPFC face responses were significantly and substantially greater than the response to each of the three other stimulus categories (all Ps<0.02; Table S1). These results were replicated in the Coil 2021 fROI analysis (n=20, Figure 2b; all Ps<0.003; Table S1) and when data were combined across both coils (n=38, 35 individuals; Figure 3a-b; all Ps<0.00008; Table S1).
In fROI analyses collapsed across the Coil 2011 and Coil 2021 datasets (n=38, 35 individuals), we used four large anatomical search spaces (first column, blue, projected onto an infant anatomical image): STS (a), MPFC (b), ventral-temporal cortex (c), and lateral-occipital cortex (d). Bar charts show the average response across participants in each fROI (top 5% of voxels) to each stimulus category (compared to baseline), in data independent of those used to define the fROI. Error bars indicate within-subject SE [86]. Symbols used to report one-tailed statistics from linear mixed effects models: †P<0.1; *P<0.05; **P<0.01; ***P<0.001. Line graphs show the selectivity analysis for different proportions of voxels selected, plotting the average response in independent data to faces (purple), bodies (pink), objects (teal), and scenes (green). The vertical dashed line marks the top 5% and corresponds to the bar charts. Error bars are standard error of the mean. The scatter plots show that the contrast value (the difference between the face response and the average response to all non-face conditions) in each fROI was not correlated with age (all Ps>0.19).
With the Coil 2011 and Coil 2021 datasets combined, we next conducted the same fROI analyses while varying the number of voxels selected, from the top 1% to 100% of voxels in each parcel, and tested responses in held-out data. In both STS and MPFC, no matter how large the subset of voxels selected, the observed face response was significantly greater than the response to each non-face condition (Figure 3a-b), indicating that face-selective responses in infant STS and MPFC are robust and do not depend on the threshold used for fROI definition.
Thus, infants as a group have face-selective responses in STS and MPFC, regions that represent social meaning and emotional value in adults. The presence of face-selective responses in infant STS and MPFC importantly constrains the Serial Hypothesis: if face selectivity develops serially, the process must occur within the first few months of life.
Infants’ cortical responses to faces were not correlated with age
The two hypotheses make contrasting predictions for whether face-selective responses in cortical regions should increase with age and visual experience. We tested these predictions in two visual regions (FFA and OFA) and in two social-emotional regions (STS and MPFC). To maximize the potential that we would be able to detect changes with age and experience, we combined the Coil 2011 and Coil 2021 datasets. Importantly, the amount of low-motion data included from each infant was not correlated with age (Fig. S1).
First, we confirmed that there is a selective response to faces in the visual regions of these infants (see [12], for a complementary analysis of Coil 2021 dataset alone in FFA- and OFA-specific parcels). To account for differences in distortion between Coil 2011 and Coil 2021 datasets (See Fig. S2 for example images), we used an expanded parcel of the ventral temporal cortex which included the location of the FFA (ventral parcel) and an expanded parcel of lateral occipital cortex that included the location of the OFA (lateral parcel; see Methods). In an fROI analysis (n=38 datasets from 35 infants), the face response in the ventral parcel (Figure 3c; Table S1) was substantially and significantly larger than the response to each of the non-face conditions (all Ps<0.005). In the lateral parcel (Figure 3d; Table S1), we found weak evidence for a face-selective response (all Ps<0.06).
Next, we checked whether the difference between responses to face and non-face stimuli increased with age in any cortical region. In a linear mixed effects (LME) analysis with age as a predictor and contrast value (i.e., the response to faces minus the average response to non-faces conditions) as the dependent variable (Figure 3c-d, last column), infants’ age did not predict the contrast value in either of the visual regions (LME, all Ps>0.2; Table S2). Similarly, there was no effect of age on the contrast value in either STS or MPFC parcels (all Ps>0.5; Table S2).
Thus, we did not find evidence that face-selective responses in cortical regions increase with age. Instead, face-selective responses in both visual and social-emotional regions are relatively constant between ages 2 and 9 months in all regions tested.
No subset of infants showed face-selective responses in subcortical and visual regions but not social-emotional regions
The Serial Hypothesis distinctively predicts that a subset of infants (plausibly, the youngest ones) should have face-selective responses in subcortical regions and FFA, but not yet in STS and MPFC. By contrast, the Parallel Hypothesis predicts that any subset of infants who have face-selective responses in FFA should also have face-selective responses in STS and MPFC. We tested these predictions in four complementary ways.
First, we considered the full sample collected with Coil 2021, because these data had the fewest distortions and thus provided the best measurements in ventral visual and subcortical regions. In these infants, we tested whether subcortical face responses emerge earlier than responses in ventral visual regions, and whether ventral visual face responses emerge earlier than responses in STS or MPFC. A group random effects analysis of the Coil 2021 dataset did not reveal subcortical responses to faces compared to the average of the other conditions (see Fig. S3 for whole-brain analyses of Coil 2011 data); however, fROI analyses are more sensitive.
We defined parcels to search for subcortical responses in the superior colliculus, amygdala, and thalamus. The superior colliculus, a structure in the brainstem, is thought to increase the salience of face-like images and thus drive infants’ face-looking behavior [20, 22]. To search for the (very small) superior colliculus, we used our fROI method within an anatomically defined parcel covering the brainstem (Figure 4a; Table S1). While the brainstem contained a region that responded numerically more to faces than to objects (P=0.06), the face response was not statistically different from the response to either bodies (P=0.2) or scenes (P=0.6). The amygdala and thalamus are two other plausible sites for the putative face template. An fROI analysis (Figure 4a; Table S1) revealed weak evidence for face-selective responses in the amygdala (all Ps≤0.06) and the thalamus (Ps≤0.11). In the same group of infants (Coil 2021 data), we observed face-selective responses in the ventral (all Ps<0.001) and lateral parcels (all Ps<0.03) as well as in the STS and MPFC parcels (reported above; Figure 2). These differences between regions were not due to differences in temporal signal-to-noise ratio (tSNR): subcortical, STS, and ventral parcels had higher tSNR than MPFC and lateral parcels (Figure S4). In the Coil 2021 dataset alone, age did not predict contrast magnitude in any of the cortical parcels (MPFC ß=0.03, P=0.9; STS ß=-0.01, P=0.9; Ventral ß=0.12, P=0.3; Lateral ß=0.21, P=0.3) or the thalamus (ß=0.04, P=0.8), but was a marginal predictor of contrast magnitude in the brainstem (ß=0.07, P=0.09) and amygdala (ß=0.12, P=0.1). Thus, we find no evidence that face-selective responses emerge earlier in subcortical than in cortical regions, or earlier in visual perception regions than in social-emotional regions, in the Coil 2021 dataset.
(a) An fROI analysis of subcortical areas in infants measured in the Coil 2021 dataset revealed that the face response (purple) in the amygdala (a, top) and thalamus (a, middle) was significantly greater than the response to bodies (pink) and objects (teal) but not statistically different from the response to scenes (green). In the brainstem of these infants (a, bottom), no face-selective response was detected. (b) In an fROI analysis of all infants younger than 5.0 months, both STS and MPFC had face-selective responses, the ventral parcel had a marginally face-selective response, and the lateral parcel had no face-selective response. (c) In the 10 infants with the greatest contrast value (response to faces minus the average response to non-face conditions) in the FFA, both STS and MPFC had face-selective responses, while the lateral parcel had a region that responded significantly more to faces than to objects or scenes, but not bodies. Error bars indicate within-subject SE [86]. Symbols used to report one-tailed statistics from linear mixed effects models: n.s.P>0.1; †P<0.1; *P<0.05; **P<0.01; ***P<0.001.
Second, we considered just the youngest infants in our combined sample, those less than 5.0 months old. In an fROI analysis of these young infants (n=15, 2.5-4.6 months, mean=3.6 months; Figure 4b; Table S3), we found weak evidence for a face-selective response in the ventral parcel (all Ps<0.09) but not the lateral parcel (all Ps>0.1). Yet both the STS and MPFC also exhibited face responses that were substantially and significantly greater than the response to each of the non-face conditions (all Ps≤0.03). Taken together, there was no evidence that visual perception regions exhibit more robust face-selective responses than social-emotional regions in the youngest infants.
Third, we examined the infants with the most face-selective responses in FFA. We selected the 10 infants with the largest contrast value (the response to faces minus the average response to non-face conditions) in the ventral fROI. In these infants (Figure 4c; Table S3), we also found face-selective responses in both the STS (all Ps<0.02) and the MPFC (all Ps<0.009).
Fourth, in a final effort to determine whether the FFA develops before the STS and/or MPFC, we randomly sampled subsets of 15 infants and tested whether FFA, STS, and/or MPFC showed a face-selective response. Of 2500 random sub-samples (Figure S5), all three fROIs had face-selective responses in 36.8% (919 instances) of subsamples, while 0.9% (23 instances) of subsets showed no face-selective response in any region. In 1.2% of subsets (31 instances), infants had face-selective responses in VTC but in neither STS nor MPFC, whereas face-selective responses in STS and/or MPFC but not VTC occurred in 48.5% (1212 instances) of subsets. Thus, subsets of infants who have face-selective responses in VTC but not STS or MPFC are very rare and non-representative. Taken together, these results indicate that face-selective responses in visual-perceptual regions do not emerge prior to face-selective responses in higher-order social-emotional regions.
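To make the logic of this subsampling analysis concrete, the following minimal MATLAB sketch illustrates the procedure; the variable names and the helper is_selective are hypothetical stand-ins for the fROI selectivity test described in the Methods, not the analysis code used here.

```matlab
% Hedged sketch of the random subsampling analysis. is_selective is a hypothetical
% helper that applies the fROI selectivity test (Methods) to a subset of infants and
% returns true only if the face response exceeds every control condition.
nIter = 2500; nPick = 15; nSubj = 35;
combos = false(nIter, 3);                        % columns: VTC (FFA parcel), STS, MPFC
for i = 1:nIter
    idx = randsample(nSubj, nPick);              % random subset of 15 infants
    combos(i, :) = [is_selective('VTC', idx), is_selective('STS', idx), is_selective('MPFC', idx)];
end
fprintf('All three regions selective:  %.1f%%\n', 100 * mean(all(combos, 2)));
fprintf('VTC only (not STS or MPFC):   %.1f%%\n', 100 * mean(combos(:,1) & ~any(combos(:, 2:3), 2)));
fprintf('STS and/or MPFC but not VTC:  %.1f%%\n', 100 * mean(~combos(:,1) & any(combos(:, 2:3), 2)));
```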
Discussion
Here we tested two alternative hypotheses about the development of face-selective responses in human cortex. Contrary to the predictions of the Serial Hypothesis, we found no evidence that face selectivity emerges sequentially in subcortical regions, followed by ventral visual regions, followed by “higher order” social and emotional regions. Specifically, we found evidence for face-selective responses in STS and MPFC across our dataset, and these responses were not correlated with age. Most tellingly, we did not observe any evidence for a subset of infants with face-selective responses in FFA but not in STS and MPFC. These results are instead consistent with the Parallel Hypothesis, which proposes that face-selective responses emerge simultaneously across perceptual and social-emotional regions.
We note several limitations of the current study. Although we failed to find a correlation between infants’ age and the magnitude of face-selective responses in any region, it is possible that a true correlation was masked by the variability of data quality across infants. Awake infants are a very challenging population for fMRI scanning. Using dynamic stimuli and custom-built infant-sized head coils [83, 85], we acquired fMRI data from a large group of young infants [12], but inevitably, each infant yielded a different amount of usable data. Because the amount of data per infant was low (though not correlated with age), the estimate of face selectivity in each individual infant is noisy. Also, the age range we tested constituted a three-fold increase in visual experience but was still small relative to the human lifespan. The youngest infants who successfully participated were already 2 months old, and the oldest infants were only 9 months old. So, face-selective responses in cortical regions may increase with age prior to age 2 months, or after 9 months, or with a very gradual slope that requires more data to detect. For these reasons, our failure to find a correlation between age and face selectivity is only weak evidence against the Serial Hypothesis.
The stronger evidence against the Serial Hypothesis was the finding that no subset of infants in this population had face-selective responses in FFA but not STS or MPFC. One possible limitation of this analysis is that the temporal signal-to-noise ratio (tSNR) and spatial distortions of the images were not homogeneous across the brain. For example, the subcortical regions, which are anatomically smaller, were much more distorted in the Coil 2011 dataset than the Coil 2021 dataset. The hint of face-selective responses in the amygdala and thalamus in the Coil 2021 dataset is intriguing but should be confirmed in future studies. We did not have enough data to test whether subcortical regions had face-selective responses in the youngest infants, as predicted by the Serial Hypothesis. On the other hand, consistent with the Parallel Hypothesis, nearly every subset of infants we checked, including just the youngest infants (2.5-4.6 months), had face-selective responses in STS and MPFC.
Potential Functions of STS and MPFC in Infancy
What functions might STS and MPFC be serving in young infants? For simplicity of argument, we have up to this point treated STS and MPFC as part of a single class of “higher order” regions that respond to social-emotional face features. However, these regions are anatomically distant from each other and associated with different functions in adults. The STS is implicated in social perception, specifically in representing the meaning of facial movements. The MPFC is implicated in social evaluation, representing the value of a social stimulus or event with respect to oneself or others. From behavioral evidence, it is plausible that infants engage in both kinds of social cognition; and from (limited) fNIRS evidence, it is plausible that the STS and MPFC serve analogous functions in infants to those they serve in adults.
For example, in adults a region of STS responds more to dynamic faces and voices than to any other stimulus, including still faces [17, 87]. Infants prefer to look at dynamic faces more than still faces [88, 89], and in studies using fNIRS, channels near the STS in infants respond more to dynamic faces than to dynamic objects or scenes [74,76,90,91]. Another region of STS in adults responds to cues of animacy and social interaction in amorphous shapes or point-light displays [29]. Infants use the same cues to identify agents and interpret actions in behavioral experiments [92–96], and an fNIRS channel near STS responded more to point-light displays of face movement than to scrambled motion [75, 97]. Thus, it is plausible that in infants and adults, regions of STS are responsible for encoding other people’s facial and bodily movements as meaningful actions.
By contrast, in adults MPFC plays a distinct role from the STS. Observing social relationships, inferring others’ emotions, and engaging in self-relevant tasks activate areas of MPFC [33–46]. Substantial behavioral evidence indicates that infants evaluate social stimuli in terms of their valence and self-relevance. Infants prefer to look at smiling faces and prosocial agents more than neutral, negative, or anti-social faces and agents [68,94,98,99], and self-relevant faces, such as those that match the race or gender of an infant’s primary caregiver, garner longer looking times [67,69,100–104]. Infants also have a preference for people who sing the same song [105] or speak the same language [71] as their primary caregiver. These behavioral capacities may be at least in part supported by MPFC. fNIRS channels over infant MPFC respond more to smiling faces than to neutral faces [106, 107], and to people who appear to engage directly with the infant by using direct gaze [81, 82], playing hand games [79], or speaking in an infant-directed manner [108]. Taken together, these results provide evidence that infant MPFC, like adult MPFC, preferentially responds to valued, emotional, and self-relevant social stimuli.
Process of Parallel Cortical Development
In summary, we argue that face-selective responses may emerge in parallel during infancy in both perceptual and higher-order association cortices. In adults, these brain regions have functions that range from detection and recognition of facial identity (e.g., FFA) to perception of facial movement and expression (STS) to extracting the emotional meaning and value of the social interaction (MPFC). How could face selectivity emerge in parallel across all of these regions? One possibility is that even prior to maturation of sensory areas, social interactions elicit a reward response which drives young infants to attend to pro-social, self-relevant aspects of their environment. Under this framework, seeing engaging human faces, hearing self-relevant voices, and pro-social touch would activate infants’ reward network, resulting in increased attention to socially relevant stimuli. In support of this hypothesis, infant STS responds to social perception in the auditory [79,91,109,110], visual [74,76,90,91], and tactile [111] domains, while MPFC is part of the reward network [112–115] and responds to socially self-relevant environmental cues in infants [79,81,82,106–108]. Affective touch enhances infants’ attention to faces [116] and increases functional coupling between STS and MPFC [117], while auditory language modulates activity in STS and MPFC [110, 118]. Social perception and interaction through touch and audition begin earlier and are more sustained in neonates’ early experiences than visual input, in part because of neonates’ limited visual acuity. Thus, social experience in non-visual modalities could prime and potentiate the receptivity of STS and MPFC to faces, particularly those associated with infant-directed speech or touch. In this way, social reward could drive infants’ preferences to look at familiar, friendly, or socially valued people, shaping the input received by visual face-perception regions. On this view, during infancy, face perception in FFA, social perception in STS, and social evaluation in MPFC may have reciprocal and mutually reinforcing connections. The consequence of such connections would be simultaneous, parallel development of face selectivity in all of these cortical regions.
The finding that MPFC has functionally specific responses in infancy also has implications for broader theories of cortical development. By anatomical metrics like myelination and cortical thinning, prefrontal cortices are the slowest cortical areas to mature. Anatomical development of PFC is implicated in late-developing cognitive functions such as executive function and language [48–54,119–121]. Thus, many researchers have assumed that the prefrontal cortex is not a major player in cognitive function during the earliest stages of development. In contrast, our results are consistent with mounting evidence that infant PFC has remarkably sophisticated function despite structural immaturity [122]. Prior studies have reported activity in lateral PFC that corresponds to adult functional organization. For example, a region in lateral PFC in infants responds more to sequences with statistical regularity than to unstructured input [123–125]. Another area in infant PFC responds more to native language than to foreign language or other non-speech sounds [126–130], and putative language areas have highly specific connections with orthography-specific perceptual regions in VTC [131]. Thus, Werchan and Amso [132] have argued that PFC function is not immature in infants, but functions adaptively for infants’ age-specific ecological niche; Dehaene-Lambertz and Spelke [133] argued for PFC as a key player in infant cognitive development; and Saygin and colleagues [131, 134] have argued that long-range structural connectivity with PFC constrains functional development. Our current results provide further evidence in favor of such views, incorporating the social functions of MPFC as well.
Summary
In sum, using fMRI in awake human infants, we find that in addition to visual perception regions, the STS and MPFC also have face-selective responses early in infancy. These results suggest that human infant brains not only perceive faces as a special class of objects, but also process faces as self-relevant potential social partners. Our findings about the origins of face selectivity accord with a broader theoretical view about the origins of the mind: infants are not passive statisticians analyzing data in a purely data-driven fashion [135], but actively participate in the construction of their own minds, choosing which input to process and hypotheses to test in pursuit of their own agenda. The social value of faces may thus be reflected in the order and timing of development of face-selective responses across multiple regions of the human infant brain.
Methods
Subjects
Infants were recruited from the Boston metro area through word-of-mouth, fliers, and social media. Parents of participants were provided parking or, when travel was a constraint for participation, reimbursed travel expenses. Participants received a small compensation for each visit and, whenever possible, printed images of their brain. We recruited 86 infants (2.1-11.9 months; mean age = 5.2 months; 41 female; 42 Coil 2021) and recovered usable data (see data selection) from 49 infants (Coil 2011, n=26, 2.5-8.7 months, 13 female; Coil 2021, n=23, 2.1-9.7 months, 11 female). These data were previously used to test if infants have category-selective responses to faces, bodies and scenes in ventral temporal cortex. Results from that investigation are reported in [12].
Stimuli
Paradigm 1
Infants watched videos of faces, bodies, objects, and scenes [87], and a colorful, curvy, abstract baseline was used to maintain infants’ attention. Videos were selected to be categorically homogeneous within blocks and heterogeneous between blocks. Each block was 18s and was composed of six 3s videos from the same category. Face videos showed one child’s face on a black background. Object videos showed toys moving. Body videos showed children’s hands or feet on a black background. Scene videos showed natural environments. Baseline blocks were also 18s and consisted of six 3s videos that featured abstract color scenes such as liquid bubbles or tie-dyed patterns. Block order was pseudo-random such that all blocks played once prior to playing again. Videos played continuously for as long as the infant was content, paying attention, and awake.
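As an illustration of this block ordering, a minimal MATLAB sketch (not the actual presentation code) that plays every block type once before any type repeats could look like the following; the condition labels and number of cycles are illustrative.

```matlab
% Illustrative pseudo-random block order: each of the five block types (four categories
% plus the abstract baseline) appears once per cycle, in a new random order each cycle.
conds   = {'faces','bodies','objects','scenes','baseline'};
nCycles = 6;                                    % in practice, blocks play until the infant disengages
order   = {};
for c = 1:nCycles
    order = [order, conds(randperm(numel(conds)))];  %#ok<AGROW>
end
disp(order')                                    % one 18s block per entry
```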
Paradigm 2
Infants watched videos from the same five categories (faces, objects, bodies, scenes, curvy baseline) as in Paradigm 1. However, the videos were shortened to 2.7s, interleaved with still images from the same category (but not drawn from the videos) presented for 300 ms. All blocks were 18s and included 6 videos and 6 images. Video and image order were randomized within blocks and block order was pseudorandom by category. Paradigm 2 contained one additional block depicting hand-object interactions (not included in the present analysis).
Data Collection
Infants were swaddled if possible. A parent or researcher went into the scanner with the infant while a second adult stood outside the bore of the scanner. Infants heard lullabies (https://store.jammyjams.net/products/pop-goes-lullaby-10) for the duration of the scan. For data collected with Coil 2011, lullabies were played over a loudspeaker into the scanning room. For data collected with Coil 2021, lullabies were played through custom infant headphones (Figure 1c).
Coil 2011
For data collected with Coil 2011 we used similar methods as previously reported [77]. We used a custom 32-channel infant coil designed for a Siemens Trio 3T scanner [83]. We used a quiet EPI with sinusoidal trajectory [84] with 22 near-axial slices (repetition time, TR=3s, echo time, TE=43 ms, flip angle=90°, field of view, FOV=192 mm, matrix=64x64, slice thickness=3 mm, slice gap=0.6 mm). The sinusoidal acquisition sequence caused substantial distortions in the functional images (Figure S2a).
Coil 2021
Infants wore custom infant MR-safe headphones (Figure 1c). Infant headphones attenuated scanner noise and allowed infants to hear the lullabies. An adjustable coil design increased infant comfort and accommodated the headphones as well as a variety of head sizes (Figure 1c). The new infant coil and infant headphones, designed for a Siemens Prisma 3T scanner, enabled the use of an EPI with a standard trajectory with 44 near-axial slices (repetition time, TR=3s, echo time, TE=30ms, flip angle=90°, field of view, FOV=160 mm, matrix=80x80, slice thickness=2 mm, slice gap=0 mm). Functional data collected with Coil 2021 were less distorted (Figure S2b) than data collected with Coil 2011 (Figure S2a).
Data Selection (subrun creation)
To be included in the analysis, data had to meet criteria for low head motion. Motion criteria are the same as in our two previous studies [12, 77]. Data were cleaved between consecutive timepoints having more than 2 degrees or 2 mm of motion, creating subruns, which contained at least 24 consecutive low-motion volumes. All volumes included in a subrun were extracted from the original run data and combined to create a new NIfTI file for each subrun. Paradigm files were similarly updated for each subrun. Volumes with greater than 0.5 degrees or 0.5 mm of motion between volumes were removed from all analyses. Total data collected and total data included in each analysis from each subject are reported in [12].
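A minimal MATLAB sketch of the cleaving step is shown below; it assumes per-volume framewise motion traces (translation in mm, rotation in degrees) and uses illustrative variable names. It implements only the cleaving and scrubbing thresholds described here, not the additional consecutive-volume rules described under Preprocessing.

```matlab
function subruns = cleave_subruns(motion_mm, motion_deg, min_len)
% Split a run into subruns at timepoints with large motion, keeping only segments
% with at least min_len consecutive volumes (24 in the criteria above).
% motion_mm / motion_deg: framewise translation (mm) and rotation (deg) per volume.
    if nargin < 3, min_len = 24; end
    n = numel(motion_mm);
    cleave  = find(motion_mm > 2 | motion_deg > 2);       % timepoints that break a subrun
    bounds  = [0, cleave(:)', n];                          % segment boundaries
    subruns = {};
    for b = 1:numel(bounds) - 1
        vols = (bounds(b) + 1):bounds(b + 1);
        vols = setdiff(vols, cleave);                      % drop the high-motion volumes themselves
        if numel(vols) >= min_len
            % volumes with >0.5 mm or >0.5 deg framewise motion are scrubbed from the GLM
            scrub = vols(motion_mm(vols) > 0.5 | motion_deg(vols) > 0.5);
            subruns{end+1} = struct('vols', vols, 'scrub', scrub);  %#ok<AGROW>
        end
    end
end
```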
Data collected within a 30-day window from a single subject were analyzed as one data point. Subjects had to have at least 5 minutes of low-motion data to be included in the whole brain analysis. For Coil 2011, this procedure resulted in 566.0 minutes of usable data (mean=5.9 minutes, s.d.=11.3) from 27 subjects (mean age=4.7 months, s.d.=1.4; n=10 Paradigm 2). For Coil 2021, this procedure resulted in 514.3 minutes of usable data (mean=6.5 minutes, s.d.=12.6) from 23 subjects (mean age=5.7 months, s.d.=2.1; all Paradigm 2). Overall, 49 unique individual infants’ data were included in whole brain analyses (one subject contributed data to both Coil 2011 and Coil 2021 datasets).
To be included in the fROI analysis, subjects had to have at least two subruns with at least 96 volumes in each subrun (one to choose voxels and the other to independently extract response magnitudes from the selected voxels). We included subjects in the fROI analysis for whom we had two timepoints and accounted for individual differences by using a subject random effects regressor and an age regressor in our linear mixed effects analyses (see fROI methods). This resulted in a final fROI dataset of 39 total data points from 36 unique individuals (2.5-9.7 months).
Preprocessing
Each subrun was processed individually. First, an individual functional image was extracted from the middle of the subrun to be used for registering the subruns to one another for further analysis. Then, each subrun was motion corrected using FSL MCFLIRT. If more than 3 consecutive images had more than 0.5 mm or 0.5 degrees of motion, there had to be at least 7 consecutive low-motion volumes following the last high-motion volume for those volumes to be included in the analysis. Additionally, each subrun had to have at least 24 volumes after accounting for motion and sleep TRs. Functional data were skull-stripped (FSL BET2), intensity normalized, and spatially smoothed with a 3mm FWHM Gaussian kernel (FSL SUSAN).
Data registration
All subruns were registered within subjects and then each subject was registered to a standard template. First, the middle image of each subrun was extracted and used as an example image for registration. If the middle image was corrupted by motion or distortion, a better image was selected to be the example image. The example image from the middle subrun of the first visit was used as the target image. All other subruns from each subject were registered to that subject’s target image using FSL FLIRT. The target image for each subject was then registered to a template image using FSL FLIRT. For data collected with Coil 2011, the template image was the same one used previously [77]. For data collected with Coil 2021, a template image was selected from a single subject from whom we had a high-resolution anatomical image on which to display functional data. Given the distortion of the images and the lack of an anatomical image for each subject, traditional registration tools do not effectively register infant data. As such, we attempted to register each image using a rigid, an affine, and a partial affine registration with FSL FLIRT. The best image registration was selected by eye from the three options and manually tuned using the Freesurfer GUI for the best possible data alignment. Each image took between 2 and 8 hours of human labor to register. Images collected with Coil 2021 were transformed to the anatomical space of the template image for visualization in Figure S2.
Subject-level Beta and Contrast Maps
Functional data were analyzed with a whole-brain voxel-wise general linear model (GLM) using custom MATLAB scripts. The GLM included 4 condition regressors (faces, scenes, bodies, and objects), 6 motion regressors, a linear trend regressor, and 5 PCA noise regressors analogous to GLMDenoise [136]. Condition regressors were defined as a boxcar function for the duration of each condition block (18s). Infant inattention or sleep was accounted for using a single impulse nuisance (‘sleep’) regressor. The sleep regressor was defined as a boxcar function with a 1 for each TR the infant was not looking at the stimuli, and the corresponding TR was set to 0 for all condition regressors. Boxcar condition and sleep regressors were convolved with an infant hemodynamic response function (HRF) that is characterized by a longer time to peak and deeper undershoot compared to the standard adult HRF [137]. Next, data and all regressors except PCA noise regressors were concatenated across subruns. PCA noise regressors were computed across concatenated data, and beta values were computed for each condition in a whole-brain voxel-wise GLM. Subject-level contrast maps were computed as the difference between the face beta and the average of all non-face (bodies, objects, and scenes) betas for each voxel using in-house MATLAB code.
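The following simplified MATLAB sketch shows the structure of such a subject-level GLM; the input names (Y, onsets, hrf, motion, sleep) are illustrative assumptions rather than the authors' code, and the PCA noise regressors are omitted for brevity.

```matlab
% Simplified subject-level GLM: boxcar condition regressors convolved with an infant HRF,
% plus sleep, motion, linear trend, and intercept regressors; solved by least squares.
% Y: [T x V] concatenated BOLD data; onsets: struct of block-onset TRs per condition;
% hrf: infant HRF sampled at the TR (ref. [137]); motion: [T x 6]; sleep: [T x 1] logical.
TR = 3; blockTRs = 18 / TR;                           % 18s blocks = 6 TRs
conds = {'faces','bodies','objects','scenes'};
T = size(Y, 1);
X = zeros(T, numel(conds));
for c = 1:numel(conds)
    box = zeros(T, 1);
    for o = onsets.(conds{c})(:)'
        box(o:min(o + blockTRs - 1, T)) = 1;          % boxcar over each block
    end
    box(sleep) = 0;                                   % zero condition regressors during inattention
    reg = conv(box, hrf);                             % convolve with the infant HRF
    X(:, c) = reg(1:T);
end
sleep_reg = conv(double(sleep), hrf);
X = [X, sleep_reg(1:T), motion, (1:T)', ones(T, 1)];  % nuisance: sleep, motion, linear trend, intercept
B = X \ Y;                                            % least-squares betas, [nRegressors x V]
face_contrast = B(1, :) - mean(B(2:4, :), 1);         % faces minus the mean of non-face betas
```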
Group Random Effects Analysis
To test whether there was systematic overlap between areas of activation across subjects, we conducted one group random effects analysis for eligible data collected with each coil. First, subject-level contrast difference (faces – non-face) maps (see data inclusion and subject contrasts above) were transformed to coil-specific template space (see data registration method above). Group random effects analyses were performed using Freesurfer mri_concat and Freesurfer mri_glmfit. Face responses were only considered if they were significantly greater than non-faces and fell within group-constrained parcels (see next section for parcel definition).
Parcels / Search Spaces
Due to the distorted nature of the Coil 2011 dataset, we opted to use larger parcels than the standard FFA parcel we used previously [12] in order to include all available data. We created lateral, ventral, STS, MPFC, and subcortical parcels using the Glasser atlas [138]. The lateral parcel included Glasser areas LO1, LO2, LO3, V4, V4t, and PIT, while the ventral parcel included Glasser areas VMV1, VMV2, VMV3, VVC, PHA1, PHA2, PHA3, and FFC. For the STS parcel, we used Glasser areas STSvp, STSva, STSdp, STSda, and STV, and for the MPFC parcel we used Glasser areas p24, d32, 9m, and p32. We also used the anatomically defined thalamus, brainstem, and amygdala in MNI space.
Functional Region of Interest (fROI)
To determine whether cortical responses are face-specific, we utilized a functional region of interest (fROI) approach. Due to the variable amount of data in each subrun for each subject, and the impact this could have on reliable parameter estimates from the GLM, we first combined or split subruns to approximately equate the amount of data across subruns within subjects. For example, if a subject had three subruns and the first was 30 volumes, the second was 75 volumes, and the third was 325 volumes, then we concatenated the first two subruns to create one subrun and split the third subrun into three, resulting in a total of four subruns with approximately 100 volumes per subrun.
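As a concrete illustration, the following MATLAB sketch implements one simple heuristic that reproduces the worked example above (merge short subruns, split long ones); it is not the exact procedure used in our analyses.

```matlab
function pieces = equate_subruns(lens, target)
% Roughly equate subrun lengths around a target number of volumes by merging adjacent
% short subruns and splitting long subruns into approximately target-length pieces.
    if nargin < 2, target = 100; end
    pieces = [];
    acc = 0;                                           % accumulator for merging short subruns
    for L = lens(:)'
        if L < target
            acc = acc + L;                             % merge with the following subrun(s)
            if acc >= target, pieces(end+1) = acc; acc = 0; end            %#ok<AGROW>
        else
            if acc > 0, pieces(end+1) = acc; acc = 0; end                  %#ok<AGROW>
            k = round(L / target);                     % split long subruns into ~target pieces
            pieces = [pieces, repmat(floor(L / k), 1, k)];                 %#ok<AGROW>
        end
    end
    if acc > 0, pieces(end+1) = acc; end               % keep any leftover volumes as a final piece
end
% Example: equate_subruns([30 75 325]) returns [105 108 108 108],
% i.e., four subruns of roughly 100 volumes each.
```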
To constrain search areas for voxel selection, we used anatomically defined parcels (see Parcels / Search Spaces) transformed to subject native space. We used an iterative leave-one-subrun-out procedure such that data were concatenated across all subruns except one prior to the whole-brain voxel-wise GLM, and contrasts were computed (described above). The top 5% most significant voxels for the contrast faces>(bodies+objects+scenes) within an anatomical constraint parcel were selected as the fROI for that subject, and the parameter estimates were extracted from a GLM on the left-out subrun. For all bar plots, beta values were averaged across participants and experiments.
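A hedged MATLAB sketch of this leave-one-subrun-out procedure is given below; fit_glm is a hypothetical helper that returns per-voxel condition betas and a faces>(bodies+objects+scenes) statistic, as in the GLM section above, and the variable names are illustrative.

```matlab
% Leave-one-subrun-out fROI analysis: define the fROI on all-but-one subruns,
% then extract condition responses from the left-out subrun.
prop = 0.05;                                       % select the top 5% of parcel voxels
nSub = numel(subruns);
resp = zeros(nSub, 4);                             % left-out responses: faces, bodies, objects, scenes
vox  = find(parcel_mask);                          % voxels inside the anatomical parcel
for s = 1:nSub
    train = subruns(setdiff(1:nSub, s));           % all subruns except the left-out one
    [~, contrast_stat] = fit_glm(train);           % faces > (bodies+objects+scenes) statistic per voxel
    [~, order] = sort(contrast_stat(vox), 'descend');
    froi  = vox(order(1:ceil(prop * numel(vox)))); % fROI = most face-responsive parcel voxels
    betas = fit_glm(subruns(s));                   % [nConditions x nVoxels] betas from left-out data
    resp(s, :) = mean(betas(1:4, froi), 2)';       % average over fROI voxels
end
froi_response = mean(resp, 1);                     % average across folds for this infant
```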
To determine whether a region’s response was category-selective, we fit the beta values using a linear mixed effects model. In each region, we had an a priori hypothesis that faces would elicit the largest response. So, in each model, we dummy-coded the other three control conditions, to test the hypothesis that the response to each control condition was lower than the response to faces.
Specifically, we fit a model in MATLAB with the expression:

beta ~ f1 + f2 + f3 + age + motion + sex + (1 | subject) + (1 | coil)

where the three dummy-coded condition regressors are f1 (bodies), f2 (objects), and f3 (scenes). Fixed effects parameters of no interest were age, motion, and sex. Motion was the fraction of scrubbed volumes. Subject and coil were each coded as random intercepts for all models. In the Coil 2011 analyses, we also included a fixed effect paradigm parameter (“para”) because two slightly different versions of the experimental paradigm were used during data collection.
The response in a parcel was deemed selective if the fixed effect coefficient for each of the three control conditions was significantly negative, using a t-test. Because predictions are unidirectional, reported p-values are one-tailed. For example, a face parcel was only deemed face-selective if the parameter estimates for the body, object, and scene regressors were significantly negative.
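The model and the one-tailed tests can be computed with MATLAB's fitlme as sketched below; the table variable names are illustrative and this is not the released analysis code.

```matlab
% Selectivity test: fit the mixed model, then test (one-tailed) whether each control
% condition responds less than faces (the reference level).
lme   = fitlme(tbl, 'beta ~ f1 + f2 + f3 + age + motion + sex + (1|subject) + (1|coil)');
coefs = lme.Coefficients;                          % Name, Estimate, tStat, DF, pValue, ...
for name = {'f1','f2','f3'}                        % f1=bodies, f2=objects, f3=scenes
    row = strcmp(coefs.Name, name{1});
    p_one_tailed = tcdf(coefs.tStat(row), coefs.DF(row));   % P(t <= observed), i.e., coefficient < 0
    fprintf('%s: estimate=%.3f, one-tailed p=%.4f\n', name{1}, coefs.Estimate(row), p_one_tailed);
end
```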
Effects of infant age
Effect of age on contrast magnitude
One metric of face selectivity is the difference between the response to faces and the average response to all other categories (i.e., the contrast value). To determine whether the contrast difference between faces and all non-face conditions changed as a function of infant age, we used a linear mixed effects model for each parcel. The contrast value (i.e., face beta minus the average beta for all non-face conditions) was the dependent variable. The predictor of interest was age. Motion and sex were fixed effects of no interest. Coil and subject were coded as random effects.
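A minimal fitlme sketch of this age model (with illustrative table variable names, not the released code) is:

```matlab
% Does face selectivity (contrast value) change with age? One model per parcel.
lme_age = fitlme(tbl_contrast, 'contrast ~ age + motion + sex + (1|coil) + (1|subject)');
disp(lme_age.Coefficients)                         % inspect the age slope and its p-value
```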
Author contributions
HLK and RS designed the study. HLK, LH, and IN collected the data. HLK analyzed the data with input from and supervision by MAC, NK, and RS. HLK, MAC, NK, and RS wrote the manuscript. All authors provided feedback on the final version.
Supplemental Figures and Tables
The amount of data included for an individual data point was not correlated with the age of that subject (r=0.19, p=0.12). Each participant who contributed at least some low-motion data is indicated by a blue circle. Participants who contributed no data are indicated by gray points.
Representative (not ideal) image of fMRI data collected on Coil 2011 with EPI using a sinusoidal trajectory (a) and an EPI with a standard trajectory collected on Coil 2021 (b).
A group random effects analysis of the Coil 2011 dataset (n=26) revealed a subcortical activation to faces, outlined in blue. These face responses could not be localized to a specific subcortical area due to the distorted nature of the Coil 2011 functional data.
(a) tSNR across Coil 2021 subjects in cortical (left) and subcortical (right) parcels. (b) tSNR for the youngest subjects across cortical parcels. (c) tSNR in cortical parcels for the 10 subjects with the most face selective responses. Error bars indicate within-subject SE.
Proportional Venn diagram showing the number of subsamples, out of 2500, in which MPFC (blue), STS (yellow), and FFA (red) exhibited face-selective responses, with overlapping observations represented by merged colors.
Acknowledgments
This research was carried out at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. The authors thank Nayanika Das and Somaia Saba for help with registrations; Steven Shannon, Atsushi Takahashi, and Boris Keil for technical support; members of Saxe Lab, and members of the Kanwisher Lab for help during recruitment and data collection; the Cambridge Writing Group, and members of the Saxe Lab and Kanwisher Lab for helpful comments on the manuscript; Michelle Hung and Kirsten Lydic for code review; Hannah LeBlanc for all the things; and all the infants and their families. Funding: We gratefully acknowledge support of this project by a National Science Foundation (graduate fellowship to HLK; Collaborative Research Award #1829470 to MAC), NIH (#1F99NS124175 to HLK; #R21-HD090346-02 to RS; #DP1HD091947 to NK; shared instrumentation grant S10OD021569 for the MRI scanner), the McGovern Institute for Brain Research at MIT, and the Center for Brains, Minds and Machines (CBMM), funded by an NSF STC award (CCF-1231216).
References