Abstract
Over the last decade, seemingly conflicting results have been obtained regarding the question of whether features of an object are stored separately, or bound together, in working memory (WM). Many of these studies are based on an implicit assumption about a default, or fixed, mode of working memory storage. However, based on recent findings about the functional properties of WM, we propose that the anticipated memory probes used in a given experiment might actually determine the format in which information is maintained in WM. In order to test this flexible maintenance hypothesis, we recorded EEG while subjects performed a delayed-match-to-sample task with and without the requirement of maintaining bound features. In two experiments, we found significant differences in EEG signals recorded in central-parietal channels between the two conditions, providing reliable evidence for such flexible maintenance.
Introduction
The question of how feature binding is maintained in working memory has received a lot of attention in the psychological and neuroscientific literature (Treisman & Gelade, 1980; Treisman, 1998; Schneegans & Bays, 2019a; Treccani, 2018). These studies focused on two questions: 1) whether objects are stored in visual working memory as bound objects or as separate features, and 2) whether the maintenance of binding in working memory requires additional processing resources, for instance, attention. These two questions are interrelated yet independent.
Assessing the limitations of working memory performance is a typical, but indirect, way to investigate the first question concerning the format of objects maintained in working memory, either as bound objects or as separate features. If features are separately stored in working memory, then performance should be limited by the number of features rather than the number of objects to be maintained. Otherwise, if bound objects rather than separate features are stored, then working memory performance should be limited by the number of objects regardless of the number of features. Following such reasoning, Luck and Vogel (1997) proposed a ‘strong object’ view since they observed that increasing the number of features within objects did not impair performance in change-detection tasks. Moreover, they found that remembering additional features did not decrease performance compared to remembering fewer features with the same number of objects. Invariance to the number of features was observed even when one object consisted of multiple values from the same feature dimension. This strong object hypothesis was supported by a study showing no difference in delay EEG activity between memorization of multiple-color and single-color objects (Luria & Vogel, 2011).
However, other studies came to different conclusions. For example, Wheeler and Treisman (2002) observed that while features from different dimensions could be stored in parallel without a cost for working memory performance, adding features from the same feature dimension limited memory performance. They proposed the multiple-resources view, stating that working memory maintains features from different feature dimensions in parallel, while features from the same feature dimension compete for storage space. The multiple-resources view was later supported by other studies using either similar (Delvenne & Bruyer, 2004) or different paradigms (Wang et al., 2017; Xu, 2002). For example, Wang et al. (2017) found that for objects with two dimensions, memory performance decreased as more feature values had to be remembered in one dimension, but the other (fixed) dimension of that object was not affected.
Yet another variant for the architecture of working memory suggests that working memory performance is limited by both the number of objects and the number of features. For example, Olson and Jiang (2002) found that when the number of objects was held constant, performance was better in the single-feature condition than in the multiple-feature condition, suggesting that it was more difficult to store two features than one feature of an object. When the number of features was held constant, performance was better when features were conjoined to form objects than when they were presented as isolated features. The importance of both the number of features and the number of items inspired the ‘weak-objects view’ of working memory limitation (Alvarez & Cavanagh, 2004; Hardman & Cowan, 2015).
The second question, whether binding features together requires additional resources, has been widely studied through dual-task paradigms in which a secondary, attention-demanding task is performed during the retention interval of the memory task. If maintaining bound objects requires processing resources beyond those required for maintaining unbound features, then tests requiring retention of bound features should suffer from the secondary task to a greater extent than those requiring only retention of features. Using this approach, Allen, Baddeley and Hitch (2006) found that a concurrent backward counting task impaired overall change-detection performance, but did not impair memory of bound objects more than memory of separate features. This pattern of results was replicated when memory of bound objects was made more demanding by separating the shape and color features spatially, temporally, or across visual and auditory modalities (Baddeley, Allen, & Hitch, 2011), and when the secondary task was auditory (Morey & Bieler, 2013; Vergauwe, Langerock, & Barrouillet, 2014). That is, the impairment caused by the secondary task was not selective to bound objects.
However, some secondary tasks involving object-based attention, such as multi-object tracking (Fougnie & Marois, 2009), mental rotation, and random dot kinematograms (He et al., 2020; Shen, Huang, & Gao, 2015), showed greater interference with the maintenance of bound objects. A specific susceptibility of feature bindings to interference from subsequent distractor stimuli has also been observed (Makovski & Pertzov, 2015; Ueno, Allen, Baddeley, Hitch, & Saito, 2011). An fMRI study showed that retention of binding in addition to features involved more cortical regions than retention of features alone (Parra, Della Sala, Logie, & Morcom, 2014). Behavioral dissociations between maintaining features and bound objects have been observed in different patient groups. For example, memory for bound objects could be impaired, without impairment in memory for individual features or locations, in patients with a variant of encephalitis (Pertzov et al., 2013) and Alzheimer’s disease (Liang et al., 2016; Parra et al., 2009; Pertzov et al., 2013). This suggests that maintenance of binding involves extra processing resources in addition to maintenance of features.
Both debates, about whether features are stored separately or bound, and about whether keeping bound objects in WM requires extra attention, are based on an implicit assumption that there is a default or fixed mode of working memory storage. Under this assumption, this mode of representation could be impaired by a shortage of resources and could be estimated from performance in response to different probe questions. However, such a universal default mode cannot be taken for granted. Indeed, remarkable flexibility in prioritizing information in WM according to the task goal has been reported. For example, when non-human primates viewed the same to-be-remembered stimuli but were trained to expect different kinds of memory probes, delay activity in the prefrontal cortex showed different patterns (Rao, Rainer, & Miller, 1997). In an fMRI study in humans, participants were required to remember a face and a scene (Nobre, 2007). During the delay period, a cue indicated which of the two would be tested later. Increased activity was observed in areas involved in face (fusiform gyrus) or scene (parahippocampal gyrus) processing, according to the cue. Thus, working memory is not simply a passive representational state of visual input during a delay period, but is better conceived as a functional state bridging previous contexts and sensations to anticipated actions and outcomes (Myers, Stokes, & Nobre, 2017). Consequently, the anticipated memory probes (questions) used in a given experiment might actually determine the format in which objects are maintained in working memory and the cognitive resources involved.
Imaging studies have also supported both maintenance of separate features and of bound objects. For instance, both primate (Baizer, Ungerleider, & Desimone, 1991; Mishkin, Ungerleider, & Macko, 1983) and human studies (Courtney, Ungerleider, Keil, & Haxby, 1996; Smith et al., 1995) found that working memory for spatial location and item identities activated different regions of the brain. Other studies however found no evidence for distinct representations of spatial and non-spatial features (D’Esposito et al., 1998; Kravitz, Kriegeskorte, & Baker, 2010; Kravitz, Saleem, Baker, & Mishkin, 2011). These conflicting findings might be due to different methodologies used, but they may also be due to the flexibility of storing visual input both as bound objects and as separated features.
In order to test this flexible maintenance hypothesis, which claims that the brain stores separate features or bound objects according to the task goal, we recorded EEG while subjects performed a delayed-match-to-sample task with and without the requirement of maintaining binding between features. If there is a universal default mode for the system to store visual stimuli regardless of task requirements, we expected no systematic differences between these two conditions in the Event-Related Potential (ERP) during the delay period, as the visual input to be maintained was identical. If information maintained in visual working memory is adaptive to task goals, we expected to find differences in the ERP during the delay period between these two conditions. The direction of such differences, if any, could inform us of the amount of processing resources consumed by each representation format. We report two experiments, where Experiment 1 is exploratory, and Experiment 2 is a confirmatory replication.
Experiment 1
Methods
Participants
Fifteen healthy volunteers from the Hebrew University of Jerusalem participated in the study. They were paid (40 NIS/hr) or given course credits for participation. All subjects reported normal or corrected-to-normal vision and no psychiatric or neurological history. One subject did not finish the experiment and three were excluded from analysis due to noisy recordings. The remaining eleven subjects were 6 males and 5 females, aged 19–31 years. The experiment was approved by the ethics committee of the Hebrew University of Jerusalem, and informed consent was obtained after the experimental procedures were explained to the subjects.
Stimuli and apparatus
Subjects sat in a dimly lit room. The stimuli were presented with Psychtoolbox-3 (http://psychtoolbox.org/) implemented in Matlab 2018 on a ViewSonic G75f CRT monitor (1024×768) with a 100-Hz refresh rate. They appeared on a grey background at the center of the computer screen, located 100 cm away from the subjects’ eyes.
Subjects performed a delayed match-to-sample test in 3 blocked conditions (Figure 1): two-item-feature (F2), two-item-binding (B2), and four-item-binding (B4). In the B2 and F2 conditions the memory array consisted of 2 colored items randomly selected out of six items of identical irregular shape in clearly different colors (Figure S1). Each item subtended a visual angle of 2.1°×2.1°. The color of each item was randomly selected out of six highly distinguishable colors, without repetition (i.e., each item had a unique color within the array). The locations of the two selected items were randomly selected out of eight potential locations evenly distributed on an invisible circle with a diameter of 7.3° centered on the fixation cross. In the B4 condition, the memory array consisted of four items of different colors at four locations, randomly selected without repetition from the six colors and eight potential locations, respectively.
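The stimulus geometry described above can be sketched as follows. This is an illustration only, not the presentation code used in the study: the function names and the starting angle of the first location are assumptions, and only the item size (2.1°), circle diameter (7.3°), number of locations (8), and viewing distance (100 cm) come from the text.

```python
import math

def deg_to_cm(angle_deg, distance_cm):
    """On-screen extent (cm) that subtends angle_deg at the given viewing distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def circle_locations(n_locations, diameter_deg, start_deg=0.0):
    """Evenly spaced (x, y) positions, in degrees of visual angle, on a circle
    centered on fixation. start_deg is a hypothetical parameter; the text does
    not state where the first location lies."""
    radius = diameter_deg / 2
    step = 360.0 / n_locations
    return [
        (radius * math.cos(math.radians(start_deg + k * step)),
         radius * math.sin(math.radians(start_deg + k * step)))
        for k in range(n_locations)
    ]

item_size_cm = deg_to_cm(2.1, 100)       # physical size of one item on the screen
locations = circle_locations(8, 7.3)     # eight candidate locations
```

At a 100 cm viewing distance, a 2.1° item corresponds to roughly 3.7 cm on the screen.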
Following a variable delay period of 1, 2 or 3 seconds, a single probe was shown on each trial. In the B2 and B4 conditions, a probe could be one of three types: Matched, New-Feature, or Mis-Conjunction. A Matched probe had the same color and location as one of the items in the memory array. A New-Feature probe either had an old color but was located at a novel location, or was at an old location but had a novel color. Mis-Conjunction probes had an old color and an old location, both present in the memory array, but the conjunction between them was novel. In the F2 condition, there were only two types of probes, namely Matched probes and New-Feature probes. A Matched probe in this condition was either an item presented at the center of the screen in an old color from the memory array, or a black item shown at an old location. Finally, a New-Feature probe in this condition was either an item presented at the center of the screen in a color that did not appear in the preceding memory array, or a black item presented at a new location.
Experimental procedure
Each trial started with a fixation cross presented at the center of the screen for 700 ms. Next, the memory array appeared for 100 ms. The memory array was followed by a blank-screen delay period of 1, 2, or 3 seconds, in equal proportions. Trials with the 3 delay durations were randomly mixed within a block. Following the delay period, a probe item was presented and subjects had to press a key to indicate whether the probe item was an “old” or a “novel” item. The probe disappeared when the response was made (with a limit of 2 seconds), followed by the initial fixation of the next trial.
There were 3 conditions in the experiment, each consisting of 288 trials. In the B2 and B4 conditions, the delay period was followed by match probes in 144 trials, new feature probes in 48 trials, and Mis-conjunction probes in 96 trials, and a probe was regarded as an old item only when the probe had both the color and the location of one of the items shown in the memory array. In the F2 condition, the delay period was followed by match-feature probes in 144 trials and the new-feature probes in the remaining 144 trials, and a probe was regarded old when either the color or the location of the probe had been shown in the memory array. Each condition began with 64 practice trials and each condition was divided into 2 consecutive blocks, resulting in 6 blocks in total. Each block took about 15 mins, and subjects were instructed to take a break between blocks.
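The response rule differs between the binding and feature conditions, which the following minimal sketch makes explicit. The data representation (a memory array as a list of `(color, location)` pairs, with `None` standing for the feature a feature-only probe does not carry) and the function name are assumptions made for illustration.

```python
def is_old(probe_color, probe_location, memory_array, condition):
    """Return True if the probe counts as an 'old' item under the rule of the
    given condition. memory_array is a list of (color, location) pairs."""
    if condition in ("B2", "B4"):
        # Binding conditions: both features must match the SAME memory item.
        return (probe_color, probe_location) in memory_array
    if condition == "F2":
        # Feature condition: either feature having been shown is sufficient.
        colors = {color for color, _ in memory_array}
        locations = {location for _, location in memory_array}
        return probe_color in colors or probe_location in locations
    raise ValueError(f"unknown condition: {condition}")

memory = [("red", 0), ("blue", 3)]
matched = is_old("red", 0, memory, "B2")            # old color at its own location
mis_conjunction = is_old("red", 3, memory, "B2")    # old color at the other item's location
feature_probe = is_old("red", None, memory, "F2")   # central probe in an old color

# Probe counts per binding condition as given in the text.
trials_per_condition = 144 + 48 + 96
```

Under this rule a Mis-Conjunction probe is "novel" in B2 and B4 even though both of its features were present in the memory array, which is what forces subjects to maintain the conjunction.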
As the memory arrays in B2 and F2 both included 2 items with 2 relevant features each, the contrast between B2 and F2 was intended to reveal the effect of retaining feature conjunctions in addition to individual feature values. The contrast between B4 and B2 was conducted to reveal a load effect (maintaining 4 vs. 2 items). The order of these three conditions was counterbalanced across participants.
All analyses and figures for the behavior results were made using JASP 0.8.5.1 (https://jasp-stats.org/). Response accuracy of 11 subjects was entered into a 3 (Delay duration: 1 vs 2 vs 3 sec delay) × 3 (Condition: F2 vs B2 vs B4) repeated-measures two-way ANOVA. Degrees of freedom were adjusted for violations of the assumption of sphericity with the Greenhouse–Geisser correction when necessary.
EEG recording and data analysis
EEG data were recorded using an Active 2 system (BioSemi, the Netherlands) from 64 active electrodes spread out across the scalp according to the extended 10–20 system with the addition of two mastoid electrodes and a nose electrode (https://www.biosemi.com/pics/cap_64_layout_medium.jpg). Horizontal electrooculogram (EOG) was recorded from electrodes placed at the outer canthi of both eyes. Vertical EOG was recorded from electrodes placed above and below the left eye. The EEG was continuously sampled at 1024 Hz with an anti-aliasing low pass filter with a cutoff of 1/5 the sampling rate, and stored for off-line analysis. The data was referenced online to the Common Mode Sensor (CMS) which was placed in the space between POz, PO3, Pz, and P1.
Data preprocessing and analysis were done with the FieldTrip toolbox (version 20191213, http://www.fieldtriptoolbox.org/) implemented in Matlab 2018 (Mathworks, Natick, MA, USA). Preprocessing was applied to continuous data. During preprocessing, EEG and EOG signals were first filtered with a zero-phase (forward and reverse) Butterworth bandpass filter of 0.1–180 Hz and then referenced to the nose channel. Extremely noisy or silent channels, which contributed more than 20% of all artifacts (criteria: more than 100 μV absolute difference between samples within segments of 100 ms; absolute amplitude > 100 μV), were deleted. No more than 2 channels were deleted in any single subject (see details below). Next, data were re-referenced to the average of all remaining EEG electrodes. Ocular and muscular artifacts were removed from the EEG signal using ICA, with manual selection of artifact components based on correlation with the EOG channels, a power spectrum typical of muscle activity, and typical component scalp topographies. After ocular and muscle artifacts were removed, automatic artifact rejection was applied (http://www.fieldtriptoolbox.org/tutorial/automatic_artifact_rejection/). Time points deviating by more than 12 standard deviations from the mean of the corresponding channel were marked, together with 200 msec before and after, ensuring that the (subthreshold) beginning and end of an artifactual event would be accounted for. A visual inspection of the data followed in order to detect rare artifacts that were missed by the automatic procedure. Finally, previously deleted channels were recreated by mean interpolation of the neighboring electrodes (FC5 and F6 were interpolated in subject 01, AFz in subjects 02, 04, and 06, AF8 and Iz in subjects 05, 07, and 09, and PO3 and CP1 in subject 08).
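The thresholding-with-padding step of the automatic rejection can be illustrated with a short NumPy sketch. It is a simplified stand-in that follows the description in the text (a per-channel 12 SD criterion plus 200 msec of padding on either side) and is not FieldTrip's implementation; the function name and array layout are assumptions.

```python
import numpy as np

def mark_artifacts(data, sfreq, n_sd=12.0, pad_sec=0.2):
    """Mark samples to reject.

    data  : array (n_channels, n_samples) of continuous EEG
    sfreq : sampling rate in Hz
    Returns a boolean array (n_samples,), True where a sample is marked.
    """
    # z-score each channel against its own mean and standard deviation
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    # a time point is bad if any channel exceeds the threshold there
    bad = (np.abs(z) > n_sd).any(axis=0)
    # extend every bad sample by pad_sec on both sides
    pad = int(round(pad_sec * sfreq))
    kernel = np.ones(2 * pad + 1)
    return np.convolve(bad.astype(float), kernel, mode="same") > 0.5

# Toy example: Gaussian noise with one large spike in the first channel.
rng = np.random.default_rng(0)
toy = rng.normal(size=(2, 10000))
toy[0, 5000] = 1000.0
mask = mark_artifacts(toy, sfreq=1000)
```

In the toy example the single spike marks a 401-sample window (the spike plus 200 samples on each side at 1000 Hz), so a slow build-up before a threshold crossing is discarded along with the peak.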
Data were then down-sampled to 512 Hz, filtered with a zero-phase Butterworth low-pass filter with a cutoff of 20 Hz, parsed into 1800 msec segments starting 500 msec before the memory array onset, and averaged within each subject and condition. For each trial, the average of the 100 msec before stimulus onset was defined as the baseline and subtracted from all data points of the segment.
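As an illustration of the segmentation and baseline correction just described, the sketch below cuts 1800 msec epochs starting 500 msec before each onset and subtracts the mean of the 100 msec pre-stimulus interval. The function name, argument names, and array layout are assumptions; the original analysis was run in FieldTrip.

```python
import numpy as np

def epoch_and_baseline(data, onsets, sfreq, tmin=-0.5, length=1.8, base=0.1):
    """Cut epochs around stimulus onsets and baseline-correct each one.

    data   : array (n_channels, n_samples), continuous recording
    onsets : sample indices of memory array onsets
    Returns an array (n_trials, n_channels, n_epoch_samples).
    """
    start = int(round(tmin * sfreq))       # negative offset: samples before onset
    n_samp = int(round(length * sfreq))    # epoch length in samples
    n_base = int(round(base * sfreq))      # baseline length in samples
    epochs = np.stack([data[:, o + start:o + start + n_samp] for o in onsets])
    zero = -start                          # index of stimulus onset within an epoch
    baseline = epochs[:, :, zero - n_base:zero].mean(axis=2, keepdims=True)
    return epochs - baseline

# Toy check: a signal that steps from 2 to 5 exactly at the onset sample.
toy = np.full((2, 3000), 2.0)
toy[:, 1000:] = 5.0
ep = epoch_and_baseline(toy, [1000], sfreq=512)
erp = ep.mean(axis=0)   # averaging over trials gives the ERP for one condition
```

After correction the pre-stimulus part of the toy epoch sits at 0 and the post-stimulus part at 3, which is the step size relative to baseline.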
There were two planned contrasts in our experiments: (a) the difference between F2 and B2, potentially revealing the effect of task (retaining features or bound objects) on the ERPs, given the same visual input, and (b) the difference between B2 and B4, revealing the working memory load effect on the ERP, given the same task. To compare ERP amplitudes between conditions across all electrodes and time samples, cluster-based permutation tests (Maris & Oostenveld, 2007) were performed in Experiment 1, the exploratory phase. This approach allowed for a sensitive comparison between conditions at the level of spatiotemporal clusters without a predefined region of interest (ROI) and provided correction for multiple comparisons through Monte Carlo based cluster correction. The cluster-based permutation test included the following steps: a) A paired-sample t-test was applied to each time point and channel, resulting in one t-value for each data point. b) A threshold of p < .05 was applied to each time point on each channel, and a cluster was defined as a collection of above-threshold data points adjacent to each other either spatially or temporally. c) t-values were summed within each cluster, resulting in sum-t values. d) Steps a-c were repeated 1000 times while switching the condition labels within a randomly selected set of subjects on each iteration. Within each iteration, the largest positive and the smallest negative sum-t entered the “positive” and the “negative” null distribution, respectively. e) The sum-t value of each cluster calculated in step c was compared with the two null distributions. Clusters with a sum-t value larger than 97.5% of the values in the positive null distribution were defined as significant positive clusters, while clusters with a sum-t value smaller than 97.5% of the values in the negative null distribution were defined as significant negative clusters.
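Steps a-e can be sketched in a few lines of Python. To keep the example short it is restricted to a single channel (or a channel average), so clusters are defined by temporal adjacency only; the spatiotemporal version used here additionally requires an electrode neighborhood structure. The function names are assumptions, and this is a sketch of the general technique rather than the FieldTrip implementation used in the study.

```python
import numpy as np
from scipy import stats

def cluster_sums(t_vals, thresh):
    """Sum t-values within runs of contiguous supra-threshold time points.
    Returns (positive cluster sums, negative cluster sums)."""
    pos, neg = [], []
    for sign, out in ((1, pos), (-1, neg)):
        above = sign * t_vals > thresh
        total, inside = 0.0, False
        for flag, t in zip(above, t_vals):
            if flag:
                total, inside = total + t, True
            elif inside:
                out.append(total)
                total, inside = 0.0, False
        if inside:
            out.append(total)
    return pos, neg

def cluster_permutation_test(a, b, n_perm=1000, alpha=0.05, seed=0):
    """a, b: arrays (n_subjects, n_times) for the two conditions."""
    rng = np.random.default_rng(seed)
    diff = a - b
    n = diff.shape[0]
    thresh = stats.t.ppf(1 - alpha / 2, n - 1)   # step b: pointwise p < .05

    def t_values(d):                              # step a: paired t per time point
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))

    pos, neg = cluster_sums(t_values(diff), thresh)   # step c
    null_pos, null_neg = np.zeros(n_perm), np.zeros(n_perm)
    for i in range(n_perm):                       # step d
        # swapping condition labels within a subject flips the sign of that
        # subject's difference wave
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        p, q = cluster_sums(t_values(diff * signs), thresh)
        null_pos[i] = max(p, default=0.0)
        null_neg[i] = min(q, default=0.0)
    # step e: proportion of null values at least as extreme as each cluster
    p_pos = [float((null_pos >= c).mean()) for c in pos]
    p_neg = [float((null_neg <= c).mean()) for c in neg]
    return pos, p_pos, neg, p_neg
```

With the two-sided criterion described above, a positive cluster would be called significant when its entry in `p_pos` is below 0.025, and likewise for negative clusters and `p_neg`.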
Results
Behavioral results
A 3 (Delay duration: 1 sec vs 2 sec vs 3 sec delay) × 3 (Condition: F2 vs B2 vs B4) repeated-measures two-way ANOVA of response accuracy revealed a main effect of Condition, F(2,20) = 13.70, p < 0.01, η2 = .58 (Figure 2). Follow-up pairwise comparisons (with Bonferroni correction) showed significantly lower response accuracy in condition B4 than in both condition B2, Mdiff = −.095, SE = .03, t = −3.30, p = .011, Cohen’s d = −.996, and condition F2, Mdiff = −.15, SE = .03, t = −5.18, p < 0.01, Cohen’s d = −1.56, whereas no significant difference in response accuracy was found between the B2 and F2 conditions, Mdiff = −.054, SE = .03, t = −.56, p = .23. The main effect of Delay Duration was also significant, F(2,20) = 4.89, p = .002, η2 = .328. Follow-up pairwise comparisons showed a marginally significant decrease in response accuracy from the 1 sec to the 3 sec delay, Mdiff = .03, SE = .01, t = 2.49, p = .065, Cohen’s d = .75, and a decrease from the 2 sec to the 3 sec delay, Mdiff = −.026, t = 3.96, SE = .007, p = .006, Cohen’s d = .75, but not from the 1 sec to the 2 sec delay, Mdiff = −.003, SE = .01, t = −.26, p = 1, reflecting some memory decrement following extended delays. The interaction between Delay Duration and Condition was not significant, F(4,40) = .69, p = .60, η2 = .07. These results indicated that the B4 condition, as expected, was more difficult than the other conditions, and that the F2 and B2 conditions did not differ significantly in overall difficulty. Therefore, any results gained from comparing the F2 and B2 conditions in EEG could not be explained by differences in task difficulty.
ERP results
Comparing B4 and B2 condition
We examined the effect of enhanced memory load by comparing the EEG signal amplitude in the B4 and B2 conditions up to the first second after the onset of the memory array (as this period was common to all 3 delays). Figure 3 shows that maintaining four bound objects (B4 condition) evoked a larger ERP amplitude than maintaining two bound objects (B2 condition) at right-frontal and middle-central electrodes.
A spatiotemporal cluster-based permutation test showed that this significant cluster of difference (p < 0.05, cluster-corrected) extended from 394 msec to 847 msec after the onset of the memory array. Figure 4 shows the grand average waveforms across subjects for each of the 25 channels included in this significant positive cluster.
Comparing F2 and B2 condition
We compared ERP amplitude when subjects were required to remember two items with (B2) and without (F2) the need to maintain the conjunction (binding) between color and location. As the memory array was identical in both conditions, a difference between the two conditions could not be due to differences in visual information. This contrast thus indicated whether the visual information had been maintained in a task-specific manner and further revealed whether maintaining bound objects consumes more or fewer resources than maintaining separate features. Considering the results above from the contrast between B4 and B2, as well as previous studies (Mecklinger & Pfeifer, 1996; Ruchkin & Johnson, 1990), stronger central positivity could be taken as indicating higher memory load. Under the premise that binding entails extra load in comparison to feature maintenance, we would expect B2 to elicit more positive ERPs than the F2 condition. A difference in the other direction would suggest the less intuitive alternative that maintaining isolated, unbound features is the more taxing situation.
The cluster-based permutation test showed that F2 elicited stronger positivity than B2, with a significant cluster in centro-parietal electrodes from 220 msec to 589 msec after the onset of the memory array (Figure 5). Figure 6 shows the amplitude averaged across subjects for each of the 20 channels included in this significant positive cluster. Tentatively, this result indicates that fewer resources are involved in maintaining bound objects than in maintaining separate features. Before discussing this result, we note that this surprising result was found through an exploratory method rather than a hypothesis-driven approach. Therefore, in Experiment 2, we recruited a new group of subjects in an attempt to replicate this finding with the same paradigm.
Experiment 2
Method
We replicated experiment 1 using the same experimental material and procedure.
Participants
Eighteen healthy volunteers from the Hebrew University of Jerusalem participated for either course credits or payment (40 NIS/hr). All subjects had normal or corrected-to-normal vision and reported no psychiatric or neurological history. Five subjects were excluded from analysis due to noisy recordings (subjects in whom more than two neighboring channels were noisy were excluded). The final sample was composed of thirteen subjects (6 males, 7 females, aged 21–30). Informed consent was obtained after the experimental procedures were explained to the subjects. The experiment was approved by the ethics committee of the Hebrew University of Jerusalem.
EEG recording and data analysis
Data preprocessing and analysis were the same as in Experiment 1. Briefly, preprocessing started with deleting extremely noisy channels. A Butterworth bandpass filter of 0.1–180 Hz was applied. Ocular and muscular artifacts were extracted with ICA and removed. Automatic and manual artifact rejection were then applied. Finally, previously deleted channels were recreated by mean interpolation of the neighboring electrodes (CP3 and P1 in subject 01; AF7, F7, and P9 in subject 03; FC5 in subject 04; POz and Fp2 in subject 05; T7, TP8, and P9 in subject 06; F8 in subject 08; CP3 and T7 in subject 09; T7 and CP3 in subject 12). After preprocessing, data were down-sampled to 512 Hz, low-pass filtered with a Butterworth filter with a cutoff of 20 Hz, segmented, and baseline-corrected.
In Experiment 2, the confirmatory phase, with a new group of subjects, we used pre-defined ROIs based on the significant spatiotemporal clusters found in Experiment 1. The “Rectangle ROI” for the contrast between two conditions was defined as all channels that were included in the significant cluster, spanning the interval from the earliest time to the latest time in that cluster across all channels. The “Cloud ROI” included all channels within the significant cluster but, for each channel, only the time points inside the significant cluster of that channel (Figure S2). One-tailed paired-sample t-tests were then applied to compare amplitudes averaged over the pre-defined ROIs (Rectangle ROI and Cloud ROI) between the B2 and B4 conditions and between the F2 and B2 conditions, respectively. The direction of the one-tailed test followed the expected effect as observed in Experiment 1.
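The difference between the two ROI definitions is easy to state on a boolean channel × time mask of the significant cluster. The sketch below assumes such a mask is available; the function names and the toy arrays are illustrative assumptions.

```python
import numpy as np

def rectangle_roi(cluster_mask):
    """All cluster channels, spanning the earliest to the latest cluster time
    point found in any channel."""
    cluster_mask = np.asarray(cluster_mask, dtype=bool)
    chans = cluster_mask.any(axis=1)
    times = np.flatnonzero(cluster_mask.any(axis=0))
    rect = np.zeros_like(cluster_mask)
    rect[chans, times.min():times.max() + 1] = True
    return rect

def roi_mean(erp, roi):
    """Mean amplitude of one ERP (n_channels, n_times) over a boolean ROI."""
    return erp[roi].mean()

# Toy cluster: channel 1 is significant at samples 2-4, channel 2 at samples 4-7.
cloud = np.zeros((4, 10), dtype=bool)   # the Cloud ROI is the cluster mask itself
cloud[1, 2:5] = True
cloud[2, 4:8] = True
rect = rectangle_roi(cloud)

erp = np.arange(40, dtype=float).reshape(4, 10)   # placeholder ERP values
mean_rect = roi_mean(erp, rect)
mean_cloud = roi_mean(erp, cloud)
```

The Rectangle ROI always contains the Cloud ROI; in the toy case it covers 12 channel-time points against 7 for the cloud. Per-subject ROI means of two conditions could then be compared with a one-tailed paired test, for example `scipy.stats.ttest_rel(x, y, alternative="greater")`.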
Results
Behavior results
A 3 (Delay duration: 1 sec vs 2 sec vs 3 sec delay) × 3 (Condition: F2 vs B2 vs B4) repeated-measures ANOVA revealed a main effect of Condition on response accuracy, F(2,24) = 27.38, p < 0.01, η2 = .70. Follow-up pairwise comparisons (all with Bonferroni correction) showed significantly lower response accuracy in condition B4 than in both condition B2, Mdiff = −.15, SE = .023, t = −6.65, p < 0.01, Cohen’s d = −1.85, and condition F2, Mdiff = −.14, SE = .023, t = −6.13, Cohen’s d = −1.70, p < 0.01, while no significant difference was found between B2 and F2, Mdiff = 001, SE = .023, t = 0.52, Cohen’s d = 0.14, p = 1. These results are similar to the pattern of behavioral results in Experiment 1 and confirmed that the B2 and F2 conditions were not significantly different in terms of task difficulty, while the B4 condition was more taxing than the other two. As in Experiment 1, a main effect of Delay Duration was found, F(2,24) = 7.70, p = .003, η2 = .39 (Figure 7). Follow-up pairwise comparisons showed a significant decrease in response accuracy following the 3 sec delay compared with the 1 sec delay, Mdiff = −.032, SE = .01, t = −3.91, p = .009, Cohen’s d = −1.09, while no significant difference in response accuracy was found between the 1 sec and 2 sec delays, Mdiff = −.014, SE = .008, t = −1.74, p = .28, Cohen’s d = −0.48, nor between the 2 sec and 3 sec delays, Mdiff = −.02, SE = .008, t = −2.17, p = .12, Cohen’s d = −.62. These results reflect forgetting of visual information over time. The interaction between Delay Duration and Condition was marginally significant, F(4,48) = 2.56, p = .05, η2 = .18. Altogether, the behavioral results confirmed what was found in Experiment 1, suggesting that the B4 condition is more demanding than the other two conditions, while any result gained from comparing the F2 and B2 conditions in EEG could not be explained by differences in task difficulty.
ERP results
Comparison between B4 and B2
Experiment 2 served as a confirmation stage following the exploratory stage of Experiment 1. Thus, the ROIs obtained from Experiment 1 were used to test for the Condition effect in Experiment 2, performed on a new group of subjects. We averaged the amplitudes of the ERPs across the channels of the “Rectangle” ROI (Figure S1), which included the channels constituting the significant cluster of Experiment 1 and all the time points within the cluster that were significant in any of the channels. Figure 8a shows the average ERP over subjects and channels included in the ROI. The normality of the data was confirmed with the Shapiro-Wilk test. A one-tailed paired-sample t-test confirmed that, as in Experiment 1, the B4 condition (M = 1.13, SE = 0.24) evoked significantly higher positivity than the B2 condition (M = 0.68, SE = 0.20), t(12) = 3.14, p = .004, Cohen’s d = 0.87. This direction of difference was found in eleven out of thirteen subjects (Figure 8b). A similar result (Figure S3) was found using the “Cloud ROI”, which was the exact cluster found significant in Experiment 1, by channels and time points. Taken together, we successfully replicated the result showing that maintaining additional bound items in memory led to a larger positive response in frontal-right and middle-central channels within the time window ranging from ~390 msec to ~850 msec after the onset of the memory array.
Comparing F2 and B2 condition
Next, the amplitudes for the F2 and B2 conditions were each averaged within the spatiotemporal Rectangle ROI (Figure 9a). A one-tailed paired-sample t-test showed that the F2 condition evoked larger positivity (M = 0.73, SE = 0.18) than the B2 condition (M = 0.40, SE = 0.22) within the Rectangle ROI. However, this difference was not significant, t(12) = 1.54, p = .08, Cohen’s d = 0.43. Seven out of thirteen subjects exhibited larger averaged positivity in the F2 than in the B2 condition (Figure 9b). A similar result was found when the Cloud ROI was applied (Figure S4).
To complement the frequentist inferential statistics, we used JASP 0.8.5.1 (Marsman & Wagenmakers, 2017; van Doorn et al., 2020) to conduct Bayesian paired-sample t-tests and examine the null hypothesis that B2 and F2 evoked ERPs of the same amplitude within the ROI. The alternative hypothesis was that F2 evoked a less positive ERP than the B2 condition. That is, we compared H0 : δ = 0 and H- : δ < 0, where δ is the standardized effect size of F2-B2 (Figure S5a). The effect size was assigned a Cauchy prior distribution (see Jeffreys, 1961; Liang et al., 2008) with r = 1/√2, truncated to allow only negative effect size values. For parameter estimation, we used the two-sided t-test model and plotted the posterior distribution of δ (Figure S5b). We found relatively strong evidence against H- (BF-0 = .13; Figure S5a). The posterior distribution of δ had a median of −0.436 with a 95% credible interval ranging from −1.19 to 0.24 (Figure S5b). This result was robust across a wide range of prior widths r, with BF-0 ranging from about 0.3 to 0.05 (i.e., the data are about 3 to 20 times more likely under H0 than under H-; see Figure S5c), confirming that the B2 condition did not evoke larger positivity than F2. Similar results were obtained using the Cloud ROI (Figure S6). Next, we tested the same null hypothesis H0 : δ = 0 against the alternative H+ : δ > 0, that is, that F2 evoked a more positive ERP than B2. The Cauchy prior distribution was now truncated to allow only positive values. Here, we did not find strong evidence in favor of either hypothesis (BF+0 = 1.31; see Figure S5d and Figure S6d). In summary, we confirmed that remembering bound objects does not consume more cognitive resources (putatively reflected by greater positivity within the ROI) than remembering separate features, and found a trend towards the result of Experiment 1, which showed higher positivity in the F2 than in the B2 condition.
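For readers who wish to see how a one-sided Bayes factor of this kind can be obtained from the t statistic alone, the following sketch integrates the noncentral t likelihood over a truncated Cauchy prior on δ. It is a numerical illustration under stated assumptions (midpoint-rule integration, a finite integration limit, the function name and its parameters) and is not JASP's implementation.

```python
import numpy as np
from scipy import stats

def one_sided_bf(t_obs, n, r=np.sqrt(2) / 2, side="+", n_grid=20000, limit=6.0):
    """Bayes factor of H+ (delta > 0) or H- (delta < 0) against H0 (delta = 0)
    for a paired-sample t statistic from n subjects.

    The prior on delta is a Cauchy with scale r truncated to one half-line.
    The integral is truncated at |delta| = limit, where the likelihood is
    negligible for moderate t values.
    """
    df = n - 1
    step = limit / n_grid
    delta = (np.arange(n_grid) + 0.5) * step        # midpoints on (0, limit)
    if side == "-":
        delta = -delta
    # truncated (half) Cauchy density: the Cauchy density doubled on one side
    prior = 2.0 * stats.cauchy.pdf(delta, scale=r)
    # likelihood of the observed t given delta: noncentral t with ncp = delta * sqrt(n)
    likelihood = stats.nct.pdf(t_obs, df, delta * np.sqrt(n))
    marginal_h1 = np.sum(likelihood * prior) * step
    marginal_h0 = stats.t.pdf(t_obs, df)
    return marginal_h1 / marginal_h0

# Using the reported Rectangle ROI statistic, t(12) = 1.54 with 13 subjects.
bf_plus = one_sided_bf(1.54, 13, side="+")
bf_minus = one_sided_bf(1.54, 13, side="-")
```

Because the reported t value is rounded, the values returned by this sketch should only be expected to approximate the Bayes factors reported above.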
Post-hoc analysis
The ERP over channels within the pre-defined ROI during the maintenance period reveals a clear trend for the F2 condition to evoke a larger positive ERP than the B2 condition, with a somewhat longer delay than in the first experiment (see Figure 9). A possible reason that the difference between F2 and B2 in Experiment 1 did not fully replicate is that the timing of the effect differs across individuals. That is, a large difference between the F2 and B2 conditions might appear soon after the offset of the memory array for some subjects, whereas for other subjects the difference between these two conditions takes longer to develop. Such individual differences can be seen in Figure 11, where the differences in amplitude between the F2 and B2 conditions for electrodes included in the pre-defined ROI are drawn for each subject, for every 200 msec bin from 200 msec to 1000 msec after memory array onset. It can be seen that the difference (F2-B2) was positive in most of the subjects and most of the time bins, although subjects varied as to the latency of the maximum difference. Therefore, we extended the temporal ROI to the whole maintenance period from 200 msec to 1000 msec after the array onset and divided it into four 200 msec segments. We then averaged the amplitudes across the ROI channels for each segment. For each subject, we then selected the temporal bin that elicited the largest difference between F2 and B2, and compared the amplitudes of the F2 and B2 conditions across subjects with a paired-sample t-test. We compared the result to a distribution under the null hypothesis generated through a permutation procedure. In each iteration of this procedure, we randomly permuted the labels of the B2 and F2 conditions and then applied the same procedure of segmentation and peak difference selection to compare the surrogate conditions with a paired-sample t-test.
Thus, each permutation contributed one value to the null distribution, and this procedure was repeated 5000 times. The t-value of the empirical data (red vertical line in Figure 12a) was larger than 99.48% of the t-values obtained from the permuted data (histogram in Figure 12a), a result that is unlikely to be obtained by chance. Thus, allowing for temporal variance in the ROI of each individual leads to replication of the effect found in Experiment 1, showing that the F2 condition elicits a larger positivity than the B2 condition during the maintenance period. Critically, we repeated the procedure, now looking for the maximum negative (F2<B2) difference between F2 and B2, but the result was far from significant in this case (Figure 12b).
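The logic of this post-hoc procedure can be sketched as follows. The key point is that selecting each subject's most favorable time bin biases the t-value upward, so the same selection step must be applied inside every permutation. This is a minimal illustration and not the analysis code used in the study: the function names, the simulated data, and in particular the choice to swap the F2 and B2 labels per subject (rather than, for example, shuffling single-trial labels) are assumptions made for the example.

```python
import numpy as np


def peak_bin_t(f2, b2):
    """Paired t-value after selecting, per subject, the bin with the largest F2-B2.

    f2, b2: arrays of shape (n_subjects, n_bins), ROI-averaged amplitude per bin.
    """
    diff = f2 - b2
    peak = diff.max(axis=1)  # F2-B2 at each subject's selected bin
    n = peak.size
    return peak.mean() / (peak.std(ddof=1) / np.sqrt(n))


def permutation_test(f2, b2, n_perm=5000, seed=0):
    """Null distribution of the peak-bin t-value under random label swaps."""
    rng = np.random.default_rng(seed)
    observed = peak_bin_t(f2, b2)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Assumption: swap condition labels for a random subset of subjects.
        swap = rng.random(f2.shape[0]) < 0.5
        f2_s = np.where(swap[:, None], b2, f2)
        b2_s = np.where(swap[:, None], f2, b2)
        # The bin selection is repeated on the surrogate data.
        null[i] = peak_bin_t(f2_s, b2_s)
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p


# Simulated example (NOT the study data): 13 subjects, four 200 msec bins,
# with an F2 > B2 effect placed in a different bin for each subject.
rng = np.random.default_rng(1)
n_subjects, n_bins = 13, 4
b2 = rng.normal(0.0, 0.3, size=(n_subjects, n_bins))
f2 = rng.normal(0.0, 0.3, size=(n_subjects, n_bins))
f2[np.arange(n_subjects), rng.integers(0, n_bins, n_subjects)] += 2.0
observed, null, p = permutation_test(f2, b2)
print(f"observed t = {observed:.2f}, permutation p = {p:.4f}")
```

Because the null distribution is built with the identical selection step, its center lies well above zero, and the observed t-value is judged against that shifted distribution rather than against the standard t distribution.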
Discussion
In two experiments, we found significant differences in EEG signals between tasks requiring short-term maintenance of visual features and tasks requiring maintenance of bound objects, even though the visual input was identical across tasks. This provides evidence for task-dependent maintenance processes and against the notion, implicitly embedded in previous literature, that a fixed default mode of item representation is maintained in visual working memory. Furthermore, our results suggest that maintaining information in the bound-object condition may require fewer processing resources than in the separate-feature condition.
As expected, the ERP results reflected stronger central positivity in the more demanding B4 condition, compared to the B2 condition. Indeed, the behavioral results indicated that the B4 condition was more challenging than the other conditions. Considering the topographical distribution of this difference, as well as its long latency, it is reasonable to take this increased positivity as a marker of memory load, but we cannot rule out a purely visual effect, as the number of items on the display differed between B4 and the other conditions. Conversely, the B2 and F2 conditions consisted of identical arrays of two items, and led to similar report accuracy. Still, a significant difference between the ERPs in these conditions was observed. Thus, the difference between the F2 and the B2 conditions was task-related rather than stimulus-related. The F2 condition elicited a higher central positivity and was more akin to the B4 (high load) condition than to the B2 condition in that respect. This pattern could reflect either (a) a larger number of units (i.e., 4 feature values vs. 2 bound objects) maintained during the delay period of F2 vs. B2 or (b) extra activity in the F2 condition when bound objects had to be decomposed into separate features to efficiently fulfill the task requirement. We note a more frontal topography for the B2 and B4 difference relative to the B2 and F2 difference, which might suggest different processes, favoring option (b) above. These two explanations are not mutually exclusive and cannot be completely differentiated in the current study.
A previous visual search study (Berggren & Eimer, 2018) provided some intuition on how the brain represents information in a format that is compatible with the test demands. In this experiment, subjects reported whether the search display contained one of two target items held in working memory. When the search array included only one object with a target-matching feature, targets and incorrect conjunction objects elicited identical N2pc and sustained posterior contralateral negativity (SPCN) components. The N2pc is assumed to index a shift of spatial attention to the location of the potential target (Eimer, 1996; Kiss, Van, & Eimer, 2008), and the SPCN is assumed to index attentional activation of visual working memory representations of the potential target (Jolicoeur, Brisson, & Robitaille, 2008). These results suggest that in this condition only features were used as search templates. In another condition, it was insufficient to detect a certain feature, as all objects had target-matching features, and both a target and an incorrect conjunction object could be present in the same display. In this case, the target evoked a larger N2pc than the incorrect conjunction objects, and only targets elicited SPCN components. Therefore, in this condition a bound object, rather than an isolated feature template, was used. Taken together, the results of this study are compatible with the possibility that working memory templates guiding attention in visual search were encoded or used flexibly, based on the task demand, to effectively distinguish match and no-match probes.
In our study, a color probe in the feature condition (F2) always appeared in the center (a location never used in the memory arrays) and a location probe was always colored black (a color never used in the memory arrays). Thus, the matched probes in the F2 (feature) condition matched only one feature dimension of the stimuli in the memory array, whereas the other feature of the probe was fixed across trials, with a value that never appeared in the memory array. In this condition, it is more efficient to maintain feature values that are not yoked to other feature dimensions. Since the memory array contained two colors and two locations, 4 different feature values had to be maintained. The matched probes in the B2 (binding) condition, on the other hand, matched an item from the memory array in both feature dimensions. In this case, retaining bound objects allows a direct comparison between the representation in working memory and the upcoming probe, and may lead to more efficient performance.
Task-specific encoding and retention of visual stimuli have been observed by other studies addressing working memory more directly. For example, one study found that memory for different features can be differentially affected by retro-cues indicating which feature dimension would be tested (Park, Sy, Hong, & Tong, 2017). Woodman and Vogel (2008) suggested that object colors and orientations could be represented in working memory either separately or together, according to the task requirement. A lower contralateral delay activity (CDA), a marker for memory load, was observed when only color was task-relevant compared to the case in which both color and orientation were relevant. Using the CDA, a recent study (McCants, Katus, & Eimer, 2020) showed that objects were represented separately by their parts when parts appeared as probes, whereas a single compound object was maintained when objects were used as probes. McCants et al. concluded that top-down control can determine not only which of several feature dimensions (e.g., color, orientation) to maintain, but also whether parts of objects or their gestalt would be maintained. Here, we showed that the system is even more flexible: it allows selection between the maintenance of separate feature dimensions vs. a bound object according to the task goal and the nature of the probe.
The finding that objects are retained either as separate features or as bound items based on task requirements may explain some of the seemingly conflicting results in the literature, although not all of them. This is based on the idea that even when two tasks are similar in their explicit instructions, subtle differences in the relationship between the memory array and the probe might affect the way that the memory array is maintained in memory. For example, Luck and Vogel (1997; extended in Vogel, Woodman, & Luck, 2001) observed that the ability to detect a change in a stimulus array of a given set size is the same regardless of the number of features. In contrast, Wheeler and Treisman (2002) found that memory capacity is determined solely by the total number of features when two features from the same dimension were combined into one object. In Luck and Vogel (1997), each item presented in the memory array contained features randomly selected from all possible feature values with replacement (e.g., more than one item could be red) and the non-match probe could include an erroneous conjunction of features that appeared in the memory array. In contrast, in the study by Wheeler and Treisman (2002, Experiments 1 and 2), items in the memory array were generated by selecting from possible feature values without replacement (e.g., only one item could be red). Moreover, a non-match probe was generated by including a feature that had not been used by any item in the memory array. This design, in which no replacement was allowed when assigning feature values within a single memory array, can also be found in other studies claiming that increasing the number of features to be remembered for each object impaired change detection (Oberauer & Eichenberger, 2013).
In the study by Luck and Vogel (1997), bound objects had to be maintained to detect a non-match probe, while in studies showing contradictory observations, remembering a list of all features in the memory array was sufficient to distinguish between match and non-match probes. The different results regarding representation format might therefore reflect the fact that such format is flexibly shaped by the task goal. Our findings, therefore, draw attention to subtle differences between experimental designs which could have an impact on the pattern of information that is maintained in memory.
Whereas the electrophysiological difference between the B2 and B4 conditions was accompanied by a difference in task accuracy, the corresponding difference between the B2 and F2 conditions was not accompanied by an observable change in accuracy. This highlights a dissociation between the activity measured continuously during the maintenance period and the decision recorded after the presentation of the probe. Taking the results together, one can speculate that more units held in working memory are reflected by enhanced positivity (and possibly an enhancement of the CDA when stimuli are lateralized), whereas diminished accuracy reflects an increase in the difficulty of comparing the probe with what was retained in working memory. For example, Alvarez and Cavanagh (2007) showed that working memory capacity is inversely correlated with item complexity. However, Awh et al. (2007) showed that the sample-test similarity is highly correlated with item complexity, and that the item complexity effect on memory performance was not observed when the sample-test similarity was low. This suggests that the retrieval process, rather than the maintenance stage, limits memory performance. Further studies could test this hypothesis more directly.
It is worth noting that we used location-to-color binding in our study, and this kind of binding has some unique aspects that might not generalize to other types of binding. There is some evidence suggesting that objects are automatically encoded together with their location. For example, some studies showed that the task-irrelevant location of a stimulus can be decoded directly from EEG data during the delay period of a working memory task (Elsley & Parmentier, 2015; Foster, Bsales, Jaffe, & Awh, 2017; Olson & Marshuetz, 2005). EEG studies show that spatial attention can be drawn to items in WM even when the cue is non-spatial and the location is entirely irrelevant for the task (Eimer & Kiss, 2010; Kuo, Rao, Lepsien, & Nobre, 2009). There are studies supporting an even stronger claim that binding of non-spatial features is mediated by binding to locations (Schneegans & Bays, 2017; Treisman & Gelade, 1980; Pertzov & Husain, 2014), perhaps due to the fact that most neurons selective to visual features also exhibit some spatial selectivity (Schneegans & Bays, 2019). Further studies are needed to test whether the results of the current study generalize to cases in which objects are defined by two non-spatial features.
To conclude, our study suggests that working memory should not be construed as a passive store of environmental input, but as a dynamic system actively engaging in the task and preparing to carry out an effective response. Specifically, it provides a novel piece of evidence that the format of objects represented in working memory, either as bound objects that integrate different feature dimensions or as separate features, is task-dependent. The finding of flexible maintenance of features and bound objects calls for attention to how experimental manipulations, either implicitly or explicitly, influence the format of objects maintained in working memory. Additionally, although the underlying mechanism demands further investigation, across two experiments we found evidence against the intuitive suggestion that maintaining bound objects is more resource-demanding than maintaining the same number of features included in those objects.
Declaration of interests
Ruoyi Cao, Yoni Pertzov, Mowei Shen and Zaifeng Gao declare no competing interests. Leon Y. Deouell is a co-founder, advisor and equity holder of Innereye Ltd., which has no direct interest in the current study.
Supplementary material
Acknowledgements
The study was supported by a China-Israel cooperative scientific research grant provided by the Israel Ministry of Science, Technology and Space to Leon Y. Deouell and Yoni Pertzov, and by the Ministry of Science and Technology of the People’s Republic of China (2016YFE0130400) to Mowei Shen. We thank Yongdi Zhou for helping with the design and discussion of the present study. We thank Naomi Revel and Geffen Markusfeld for helping with running the experiments.
Footnotes
1 Note that the cluster analysis does not ensure that the effect at any single point in the cluster is independently significant (Sassenhagen & Draschkow, 2019).
2 Note that a negative cluster over the left-temporal region was occasionally significant (p < .05), but it was not stable across multiple runs of the permutation procedure.