Abstract
Humans rapidly detect and interpret sensory signals that have emotional meaning. The posterior superior temporal sulcus (pSTS) and amygdala are known to be critical for this ability, but their precise contributions (whether specialized for facial features or for sensory information more generally) remain contentious. Here we investigate how these structures process visual emotional cues, using artificial neural networks (ANNs) to model fMRI signal acquired as participants viewed complex, naturalistic stimuli. Analyzing data from two archival studies (Ns = 20, 45), we evaluated whether representations from ANNs optimized to recognize emotion from either facial expressions alone or the broader visual context differ in their ability to predict responses in human pSTS and amygdala. Across studies, we found that representations of facial expressions were more robustly encoded in the pSTS than in the amygdala, whereas representations related to visual context were encoded in both regions. These findings demonstrate that the pSTS operates on abstract representations of facial expressions such as ‘fear’ and ‘joy’ to a greater extent than the amygdala, which more strongly encodes the emotional significance of visual information in general, depending on the context.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
A conceptual replication has been added (Study 2); characterization of encoding models in other face processing regions is now included (see Figure 4); tests characterizing ANN representational specificity are described in the Supplementary section.