The distributed representation of random and meaningful object pairs in human occipitotemporal cortex: The weighted average as a general rule
Highlights
► Responses to object pairs are predicted by a weighted average of the responses to the single objects.
► The strongest (max) single-object response is weighted more heavily than the weakest (min) response.
► Max and min responses are weighted differently for different types of object pairs.
Introduction
Object identity is considered to be extracted through hierarchical processing along the ventral object vision pathway. Ultimately, it is represented in distributed patterns of neural activity in the highest stages of that pathway, namely the inferior temporal cortex in monkeys and object-selective regions in human occipitotemporal cortex (e.g. DiCarlo and Cox, 2007, Haxby et al., 2001). Many studies have investigated neural responses to isolated objects (e.g., Carlson et al., 2003, Cichy et al., 2011, Kravitz et al., 2010, Logothetis and Sheinberg, 1996, Spiridon and Kanwisher, 2002, Tanaka, 1996), a situation atypical of the real world, where we usually see multiple objects at the same time. However, an increasing number of studies have revealed that the representation of a particular object is altered by the presence of other objects (e.g., Chelazzi et al., 1998, Miller et al., 1993, Reddy and Kanwisher, 2007, Rolls and Tovee, 1995, Zoccolan et al., 2005, Zoccolan et al., 2007). The exact nature of the coding of displays containing multiple objects is not clear because of discrepancies between existing studies and a lack of investigation of potentially relevant factors. Here we will focus upon the simplest situation, namely displays containing two objects. We investigated one discrepancy, namely the exact relationship between the responses to object pairs and the responses to their constituent objects, and two potentially relevant factors: the existence of an action relationship between the two objects of a pair and the task context.
Theoretically, the relationship between the representation of single objects and of object pairs could be captured by many functions, including the following four possibilities that describe how the response to the object pair relates to the response to the constituent objects when presented in isolation: (i) a simple averaging of the responses to the two objects, (ii) a weighted average with more weight to the object that elicits the strongest response in a neuron (the ‘max’ object as opposed to the ‘min’ object), (iii) a nonlinear max operator, which is an extreme version of a weighted average with the weight for the min object being 0, and (iv) any of many possible models in which the response to the object pair cannot be easily predicted from the responses to the two single objects.
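The first three candidate models can be written down compactly. The sketch below is illustrative only and is not taken from the study: the function names, the example response values, and the constraint that the two weights sum to 1 are all assumptions made for this example.

```python
# Illustrative sketch of three candidate models for the response to an object pair.
# Each list holds hypothetical responses of several units (neurons or voxels)
# to one object presented in isolation.

def simple_average(a, b):
    """Model (i): unweighted mean of the two single-object responses."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]

def weighted_average(a, b, w_max):
    """Model (ii): per unit, weight the stronger ('max') response by w_max
    and the weaker ('min') response by 1 - w_max.
    Assumption: the weights sum to 1, which the text does not require."""
    return [w_max * max(x, y) + (1.0 - w_max) * min(x, y) for x, y in zip(a, b)]

def max_operator(a, b):
    """Model (iii): the pair response equals the stronger single-object response."""
    return [max(x, y) for x, y in zip(a, b)]

# Hypothetical single-object responses of three units
obj_a = [1.0, 4.0, 2.0]
obj_b = [3.0, 2.0, 2.0]

print(simple_average(obj_a, obj_b))
print(weighted_average(obj_a, obj_b, 0.75))
print(max_operator(obj_a, obj_b))
```

Under this parameterization, `w_max = 0.5` reduces the weighted average to the simple average and `w_max = 1.0` reduces it to the max operator, which is why the max operator can be read as an extreme case of the weighted average.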
At the moment we can only draw limited conclusions from the studies available in the literature for the following reasons. First, studies are not consistent. There is evidence for non-linear relationships (e.g. Heuer and Britten, 2002), with the max operator as the most prominent proposal. A couple of studies comparing responses to single objects and object pairs argued in favor of a simple averaging (MacEvoy and Epstein, 2009, Zoccolan et al., 2005), while other studies found evidence for a weighted average with more weight given to the max object (Agam et al., 2010, Reddy et al., 2009). Zoccolan et al. (2007) provided one explanation for the discrepancy between simple averaging and weighted averaging by showing a negative relationship between clutter tolerance and stimulus selectivity so that minimally selective neurons will tend to implement a max operator while the very highly selective neurons (which were mostly studied by Zoccolan et al., 2005) implement a simple average.
Second, all these previous studies investigating the relationship between single-object and paired-object responses have focused upon ‘random’ object pairs, in which there is no meaningful relationship between the two constituent objects. Stimuli used were geometric forms (Zoccolan et al., 2005, Zoccolan et al., 2007), for which no particular configuration is preferred, or randomly chosen complex objects (Agam et al., 2010, MacEvoy and Epstein, 2009, Reddy et al., 2009). In the second case, these objects are combined so that neither the configuration nor the relative size corresponds to real-world experience. However, behavioral, fMRI, and TMS studies (e.g., Green and Hummel, 2006, Kim and Biederman, 2010, Kim et al., 2011, Riddoch et al., 2003, Riddoch et al., 2006) have noted that two objects can often interact meaningfully, for example by forming a so-called action pair (e.g. a corkscrew on top of a bottle of wine), and that this interaction is relevant for the representation of these object pairs. For example, Riddoch et al. (2006) presented neglect patients with objects that were or were not co-located for action. They found a reduction of extinction for object pairs that were positioned for a familiar action. An effect of such interacting objects on the overall strength of response in object-selective regions was also found with fMRI in normal subjects (Roberts and Humphreys, 2010). In sum, these studies suggested that such action pairs are represented as a whole (Humphreys and Riddoch, 2007) and that the overall activity elicited by object pairs in the object vision pathway is modulated by such action relationships. As these findings suggest that action pairs are coded differently, we wondered whether the relationship between single-object representations and pair representations in random object pairs would still apply to action pairs.
Stated differently, we do not know whether the findings from previous studies, namely that the whole is equal to the average (MacEvoy and Epstein, 2009) or the weighted average (Agam et al., 2010, Reddy et al., 2009) of the parts in the case of random object pairs, can be extrapolated to action pairs.
In the present study, we compared the multi-voxel patterns of responses in the object vision pathway between single objects and object pairs. Our methods are similar to those used in a previous experiment (MacEvoy and Epstein, 2009), so that the results can be compared and the effect of our additional manipulations can be better evaluated. We compared the three most frequently proposed models in the literature, namely simple averaging, the max model, and weighted averaging. In addition, we manipulated the position of the objects within the pairs to test whether the inclusion of meaningful, action-related configurations would alter the relationship between the response patterns of pairs and their constituent objects. We found that the relationship between the response patterns of single objects and pairs could be most reliably described by a weighted average of the patterns of the single objects, with the maximum response weighted more than the minimum response. The data suggested that the maximum and the minimum response were weighted differently for the different types of object pairs when participants attended to the configuration.
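One way to picture how max and min weights might be estimated from response patterns is an ordinary least-squares fit across voxels. The sketch below is an assumption-laden illustration and is not the analysis reported in the study: the function name `fit_max_min_weights`, the absence of an intercept, and the synthetic data are all hypothetical.

```python
# Hypothetical sketch: estimate separate weights for the max and min
# single-object responses by least squares, pair ~ w_max * hi + w_min * lo.
# This is not the authors' pipeline; it only illustrates the idea.

def fit_max_min_weights(pair, a, b):
    """Return (w_max, w_min) minimizing the squared error across voxels."""
    hi = [max(x, y) for x, y in zip(a, b)]  # stronger response per voxel
    lo = [min(x, y) for x, y in zip(a, b)]  # weaker response per voxel

    # Sums needed for the 2x2 normal equations
    s_hh = sum(h * h for h in hi)
    s_ll = sum(l * l for l in lo)
    s_hl = sum(h * l for h, l in zip(hi, lo))
    s_hp = sum(h * p for h, p in zip(hi, pair))
    s_lp = sum(l * p for l, p in zip(lo, pair))

    det = s_hh * s_ll - s_hl * s_hl
    if det == 0:
        raise ValueError("max and min patterns are collinear; weights are not identifiable")

    # Cramer's rule for the two unknown weights
    w_max = (s_hp * s_ll - s_hl * s_lp) / det
    w_min = (s_hh * s_lp - s_hl * s_hp) / det
    return w_max, w_min

# Synthetic example: a pair pattern built with known weights 0.6 and 0.3
obj_a = [1.0, 4.0, 2.0, 5.0]
obj_b = [3.0, 2.0, 2.0, 1.0]
pair = [0.6 * max(x, y) + 0.3 * min(x, y) for x, y in zip(obj_a, obj_b)]

w_max, w_min = fit_max_min_weights(pair, obj_a, obj_b)
print(round(w_max, 3), round(w_min, 3))
```

Fitting the two weights independently, rather than forcing them to sum to 1, lets the same fit distinguish a simple average (equal weights), a max-like rule (min weight near 0), and intermediate weighted averages.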
Section snippets
Participants
Ten naive students of the University of Leuven (KU Leuven) with normal or corrected-to-normal vision participated in this study as paid volunteers (ages between 20 and 26 years, two male, all reported being right-handed). Data from one participant were excluded due to excessive head movement. The experiments were approved by the ethical committee of the Faculty of Psychology and Educational Sciences and the Medical Ethical Committee of the KU Leuven. Participants signed an informed consent at
Representation of single objects
Before turning to the representation of the object pairs, we first investigated the properties of the representation of single object images. In particular, we checked what information about the single objects is represented in different parts of LOC: position and/or identity. Results are summarized in Fig. 2. Both pLOC and aLOC discriminated better than chance between different objects when stimulus position was held constant (pLOC: F(1,8) = 602.997, p < .001; aLOC: F(1,8) = 16.941, p = .003)
Discussion
In the present study we presented single objects and object pairs while participants were performing an exemplar-level 1-back task or a task in which they had to judge the action quality of the configuration. We found that for both individual objects and object pairs information about the identity and the position of the objects is represented in the response patterns in LOC. The relationship between response patterns of pairs and single objects was best described by a weighted average of the
Funding
This work was supported by the Fund for Scientific Research - Flanders through a fellowship to A.B. and grant G.0562.10, and by the Methusalem program (METH/08/02).
Acknowledgments
We thank S.P. MacEvoy and R.A. Epstein for helpful comments on the manuscript.
References (31)
- et al. Encoding the identity and location of objects in human LOC. NeuroImage (2011)
- et al. Untangling invariant object recognition. Trends Cogn. Sci. (2007)
- et al. Suppression of visual responses of neurons in inferior temporal cortex of the awake macaque by addition of a second stimulus. Brain Res. (1993)
- et al. Distributed subordinate specificity for bodies, faces, and buildings in human ventral visual cortex. NeuroImage (2010)
- et al. Category selectivity in the ventral visual pathway confers robustness to clutter and diverted attention. Curr. Biol. (2007)
- et al. How distributed is visual category information in human occipito-temporal cortex? An fMRI study. Neuron (2002)
- et al. Robust selectivity to two-object images in human visual cortex. Curr. Biol. (2010)
- The psychophysics toolbox. Spat. Vis. (1997)
- et al. Patterns of activity in the categorical representations of objects. J. Cogn. Neurosci. (2003)
- et al. Responses of neurons in inferior temporal cortex during memory-guided visual search. J. Neurophys. (1998)
- Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophys.
- Familiar interacting object pairs are perceptually grouped. J. Exp. Psychol. Hum.
- Distributed and overlapping representation of faces and objects in ventral temporal cortex. Science
- Contrast dependence of response normalization in area MT of the rhesus monkey. J. Neurophys.
- How to define an object: evidence from the effects of action on perception and attention. Mind Lang.