The distributed representation of random and meaningful object pairs in human occipitotemporal cortex: The weighted average as a general rule

doi:10.1016/j.neuroimage.2012.12.023

NeuroImage

Volume 70, 15 April 2013, Pages 37-47

https://doi.org/10.1016/j.neuroimage.2012.12.023

Abstract

Natural scenes typically contain multiple visual objects, often in interaction, such as when a bottle is used to fill a glass. Previous studies disagree about how multiple objects are represented and about the role of object position in that representation, and they have not pinpointed the effect of potential interactions between the objects. In an fMRI study, we presented four single objects in two different positions and object pairs consisting of all possible combinations of the single objects. Object pairs could form either a meaningful action configuration, in which the objects interact with each other, or a non-meaningful configuration. We found that for single objects and object pairs both identity and position were represented in multi-voxel activity patterns in LOC. The response patterns of object pairs were best predicted by a weighted average of the response patterns of the constituent objects, with the strongest single-object response (the max response) weighted more than the min response. The difference in weight between the max and the min object was larger for familiar action pairs than for other pairs when participants attended to the configuration. A weighted average thus relates the response patterns of object pairs to the response patterns of single objects, even when the objects interact.

Highlights

► Responses to pairs are predicted by a weighted average of responses to single objects.
► The strongest single-object response is weighted more than the min response.
► Max and min response are weighted differently for different types of object pairs.

Introduction

Object identity is considered to be extracted through hierarchical processing along the ventral object vision pathway. Ultimately, it is represented in distributed patterns of neural activity in the highest stages of that pathway, namely the inferior temporal cortex in monkeys and object-selective regions in human occipitotemporal cortex (e.g. DiCarlo and Cox, 2007, Haxby et al., 2001). Many studies have investigated neural responses to isolated objects (e.g., Carlson et al., 2003, Cichy et al., 2011, Kravitz et al., 2010, Logothetis and Sheinberg, 1996, Spiridon and Kanwisher, 2002, Tanaka, 1996), a situation atypical of the real world, where we usually see multiple objects at the same time. However, an increasing number of studies have revealed that the representation of a particular object is altered by the presence of other objects (e.g., Chelazzi et al., 1998, Miller et al., 1993, Reddy and Kanwisher, 2007, Rolls and Tovee, 1995, Zoccolan et al., 2005, Zoccolan et al., 2007). The exact nature of the coding of displays containing multiple objects is not clear because of discrepancies between existing studies and a lack of investigation of potentially relevant factors. Here we will focus upon the simplest situation, namely displays containing two objects. We investigated one discrepancy, namely the exact relationship between the responses to object pairs and the responses to their constituent objects, and two potentially relevant factors: the existence of an action relationship between the two objects of a pair and the task context.

Theoretically, the relationship between the representation of single objects and of object pairs could be captured by many functions, including the following four possibilities that describe how the response to the object pair relates to the response to the constituent objects when presented in isolation: (i) a simple averaging of the responses to the two objects, (ii) a weighted average with more weight to the object that elicits the strongest response in a neuron (the ‘max’ object as opposed to the ‘min’ object), (iii) a nonlinear max operator, which is an extreme version of a weighted average with the weight for the min object being 0, and (iv) any of many possible models in which the response to the object pair cannot be easily predicted from the responses to the two single objects.
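The first three possibilities can be written as a single family of predictions that differ only in the weight given to the max response. The following Python sketch is an illustration, not the analysis code of the study: the function name `predict_pair`, the toy response values, and the assumption that the two weights sum to 1 are all hypothetical choices made here for clarity.

```python
import numpy as np

def predict_pair(resp_a, resp_b, w_max=0.5):
    """Predict the pair response of each unit from two single-object responses.

    w_max is the weight on the stronger ("max") response of each unit.
    Assumption of this sketch: the weaker ("min") response gets 1 - w_max.
    w_max = 0.5 is simple averaging; w_max = 1.0 is the max operator.
    """
    resp_a = np.asarray(resp_a, dtype=float)
    resp_b = np.asarray(resp_b, dtype=float)
    r_max = np.maximum(resp_a, resp_b)  # per-unit stronger response
    r_min = np.minimum(resp_a, resp_b)  # per-unit weaker response
    return w_max * r_max + (1.0 - w_max) * r_min

# Toy responses of three units (or voxels) to two isolated objects.
a = np.array([2.0, 0.0, 1.0])
b = np.array([0.0, 4.0, 1.0])

simple = predict_pair(a, b, w_max=0.5)     # (i) simple average
weighted = predict_pair(a, b, w_max=0.75)  # (ii) weighted average
max_op = predict_pair(a, b, w_max=1.0)     # (iii) max operator
```

Possibility (iv) has no counterpart in this sketch, because it covers every case in which the pair response is not a simple function of the two single-object responses.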

At the moment we can only draw limited conclusions from the studies available in the literature for the following reasons. First, studies are not consistent. There is evidence for non-linear relationships (e.g. Heuer and Britten, 2002), with the max operator as the most prominent proposal. A couple of studies comparing responses to single objects and object pairs argued in favor of a simple averaging (MacEvoy and Epstein, 2009, Zoccolan et al., 2005), while other studies found evidence for a weighted average with more weight given to the max object (Agam et al., 2010, Reddy et al., 2009). Zoccolan et al. (2007) provided one explanation for the discrepancy between simple averaging and weighted averaging by showing a negative relationship between clutter tolerance and stimulus selectivity so that minimally selective neurons will tend to implement a max operator while the very highly selective neurons (which were mostly studied by Zoccolan et al., 2005) implement a simple average.

Second, all these previous studies investigating the relationship between single-object and paired-object responses have focused upon ‘random’ object pairs, in which there is no meaningful relationship between the two constituent objects. Stimuli used were geometric forms (Zoccolan et al., 2005, Zoccolan et al., 2007), for which no particular configuration is preferred, or randomly chosen complex objects (Agam et al., 2010, MacEvoy and Epstein, 2009, Reddy et al., 2009). In the second case, these objects are combined so that neither the configuration nor the relative size corresponds to real-world experience. However, behavioral, fMRI, and TMS studies (e.g., Green and Hummel, 2006, Kim and Biederman, 2010, Kim et al., 2011, Riddoch et al., 2003, Riddoch et al., 2006) have noted that two objects can often interact meaningfully, for example by forming a so-called action pair (e.g. a corkscrew on top of a bottle of wine), and that this interaction is relevant for the representation of these object pairs. For example, Riddoch et al. (2006) presented neglect patients with objects that were or were not co-located for action. They found a reduction of extinction for object pairs that performed a familiar action. An effect of this type of interacting objects on the overall strength of response in object-selective regions was also found with fMRI in normal subjects (Roberts and Humphreys, 2010). In sum, these studies suggested that such action pairs are represented as a whole (Humphreys and Riddoch, 2007) and that the overall activity elicited by object pairs in the object vision pathway is modulated by such action relationships. As these findings suggest that action pairs are coded differently, we wondered whether the relationship between single-object representations and pair representations in random object pairs would still apply to action pairs. Stated differently, we do not know whether the findings from previous studies, namely that the whole is equal to the average (MacEvoy and Epstein, 2009) or the weighted average (Agam et al., 2010, Reddy et al., 2009) of the parts in the case of random object pairs, can be extrapolated to action pairs.

In the present study, we compared the multi-voxel patterns of responses in the object vision pathway between single objects and object pairs. Our methods are similar to the ones used in a previous experiment (MacEvoy and Epstein, 2009), so that the results can be compared and the effect of our additional manipulations can be better evaluated. We compared the three most frequently proposed models in the literature, namely simple averaging, the max model, and weighted averaging. We also manipulated the position of the objects within the pairs, to test whether the inclusion of meaningful, action-related configurations would alter the relationship between the response patterns of pairs and their constituent objects. We found that this relationship between the response patterns of single objects and pairs could be most reliably described by a weighted average of the patterns of the single objects, with the maximum response weighted more than the minimum response. The data suggested that the maximum and the minimum response were weighted differently for the different types of object pairs when participants attended to the configuration.
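One way to make the comparison between the three models concrete is to estimate the weights of the max and the min response directly from the data and then inspect them. The Python sketch below does this with ordinary least squares on synthetic, noiseless data. It is a hypothetical illustration under stated assumptions (no intercept term, one weight pair shared by all voxels, the invented helper name `fit_max_min_weights`) and is not the fitting procedure reported in the paper.

```python
import numpy as np

def fit_max_min_weights(single_a, single_b, pair):
    """Least-squares fit of pair ~ w_max * max(a, b) + w_min * min(a, b).

    All inputs are 1-D arrays with one value per voxel. No intercept is
    fitted; that is a simplifying assumption of this sketch.
    """
    single_a = np.asarray(single_a, dtype=float)
    single_b = np.asarray(single_b, dtype=float)
    pair = np.asarray(pair, dtype=float)
    r_max = np.maximum(single_a, single_b)
    r_min = np.minimum(single_a, single_b)
    design = np.column_stack([r_max, r_min])  # voxels x 2 design matrix
    weights, *_ = np.linalg.lstsq(design, pair, rcond=None)
    return float(weights[0]), float(weights[1])

# Synthetic single-object patterns over 200 voxels (not real data).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)

# Synthetic pair pattern built with known weights 0.6 (max) and 0.4 (min).
pair = 0.6 * np.maximum(a, b) + 0.4 * np.minimum(a, b)

w_max, w_min = fit_max_min_weights(a, b, pair)
```

Under this parameterization, weights near 0.5 and 0.5 would correspond to simple averaging, a min weight near 0 to the max model, and a max weight reliably above the min weight to the weighted average described in the text.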

Section snippets

Participants

Ten naive students of the University of Leuven (KU Leuven) with normal or corrected-to-normal vision participated in this study as paid volunteers (ages between 20 and 26 years, two male, all reported being right-handed). Data from one participant were excluded due to excessive head movement. The experiments were approved by the ethical committee of the Faculty of Psychology and Educational Sciences and the Medical Ethical Committee of the KU Leuven. Participants signed an informed consent at

Representation of single objects

Before turning to the representation of the object pairs, we first investigated the properties of the representation of single object images. In particular, we checked what information about the single objects is represented in different parts of LOC: position, identity, or both. Results are summarized in Fig. 2. Both pLOC and aLOC discriminated between different objects better than chance when stimulus position was held constant (pLOC: F(1,8) = 602.997, p < .001; aLOC: F(1,8) = 16.941, p = .003)

Discussion

In the present study we presented single objects and object pairs while participants were performing an exemplar-level 1-back task or a task in which they had to judge the action quality of the configuration. We found that for both individual objects and object pairs information about the identity and the position of the objects is represented in the response patterns in LOC. The relationship between response patterns of pairs and single objects was best described by a weighted average of the

Funding

This work was supported by the Fund for Scientific Research - Flanders through a fellowship to A.B. and grant G.0562.10, and by the Methusalem program (METH/08/02).

Acknowledgments

We thank S.P. MacEvoy and R.A. Epstein for helpful comments on the manuscript.

References (31)

R.M. Cichy et al. (2011). Encoding the identity and location of objects in human LOC. NeuroImage.
J.J. DiCarlo et al. (2007). Untangling invariant object recognition. Trends Cogn. Sci.
E.K. Miller et al. (1993). Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus. Brain Res.
H.P. Op de Beeck et al. (2010). Distributed subordinate specificity for bodies, faces, and buildings in human ventral visual cortex. NeuroImage.
L. Reddy et al. (2007). Category selectivity in the ventral visual pathway confers robustness to clutter and diverted attention. Curr. Biol.
M. Spiridon et al. (2002). How distributed is visual category information in human occipito-temporal cortex? An fMRI study. Neuron.
Y. Agam et al. (2010). Robust selectivity to two-object images in human visual cortex. Curr. Biol.
D. Brainard (1997). The psychophysics toolbox. Spat. Vis.
T.A. Carlson et al. (2003). Patterns of activity in the categorical representations of objects. J. Cogn. Neurosci.
L. Chelazzi et al. (1998). Responses of neurons in inferior temporal cortex during memory-guided visual search. J. Neurophys.
T.J. Gawne et al. (2002). Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophys.
C. Green et al. (2006). Familiar interacting object pairs are perceptually grouped. J. Exp. Psychol. Hum.
J.V. Haxby et al. (2001). Distributed and overlapping representation of faces and objects in ventral temporal cortex. Science.
H.W. Heuer et al. (2002). Contrast dependence of response normalization in area MT of the rhesus monkey. J. Neurophys.
G.W. Humphreys et al. (2007). How to define an object: evidence from the effects of action on perception and attention. Mind Lang.

