Abstract
Human observers have the remarkable ability to efficiently prioritize task-relevant over task-irrelevant visual information. Yet, a fundamental question remains whether this ability is limited to a single task relevant item, or whether multiple items can be prioritized simultaneously. The answer to this question depends on 1) whether observers can concurrently prepare and maintain multiple top-down templates for more than one target object, and 2) whether those templates can then, in parallel, bias selection towards more than one target in the visual input. Here we disentangle these two processes for the first time. We measured electroencephalographic (EEG) responses while observers searched for two color-defined targets among distractors. Crucially, we not only varied the number of target colors that observers anticipated (thus determining the number of target templates), but also the number of colors used to distinguish the two target objects present in the search display (thus determining the number of templates required to engage in actual selection). Multivariate classification of the EEG pattern allowed us to track the attentional enhancement of each target separately across time. Both behavioral and electrophysiological results revealed only a small cost associated with preparing two versus one color template. In contrast, substantial costs arose when two templates had to be engaged in the actual selection of search targets. Furthermore, the results indicate that this cost is based on limitations of parallel processing, rather than a serial bottleneck. These findings bridge currently diverging theoretical perspectives on capacity limitations of feature-based attention.
Significance Statement
Attention to the visual environment is by definition limited. Scientific debate has centered on how many objects human observers can look for at once. Using a novel electroencephalography approach we concurrently tracked attention to multiple visual target objects. We are the first ones to show that the process of attending to multiple objects involves two components: One where observers prepare for the selection of multiple objects, and one where observers actually engage in selecting those objects from the sensory input. The selection bottleneck is clearly in the latter stage, while there is relatively little cost in preparing for more than one target. The findings unite different theoretical stances in the field.
Introduction
Adaptive, goal-driven behavior demands the selection of relevant objects from the visual environment while irrelevant information is being ignored. This requires the activation of task-relevant representations in memory – often referred to as attentional templates – which then bias selection towards matching sensory input through top-down recurrent feedback (1–5). A fundamental yet unresolved question is whether multiple attentional templates can be deployed concurrently – a question that has recently generated considerable controversy, with arguments both for (6–15) and against (16–21) a strong bottleneck.
We provide electrophysiological evidence showing that the real bottleneck is not so much in the number of different templates that can be concurrently active in anticipation of a visual task, but in the number of matching sensory representations in the incoming signal that can subsequently be prioritized by those templates. Crucially, for the selection of multiple targets to be truly simultaneous, two requirements have to be met. First, attentional templates need to be set up for each anticipated target feature, presumably in visual working memory (VWM). Although it is uncontroversial that VWM can hold multiple representations (22), in order to be able to bias selection, each of these representations also needs to be in a state in which it can eventually engage, through recurrent feedback, with matching sensory signals (which is not the same as merely remembering; see refs. 23–27). Second, to simultaneously select multiple targets, the visual system must also be able to concurrently use those templates to prioritize multiple matching representations in the incoming sensory signal. In other words, multiple feedback loops must be able to engage concurrently. It is important to point out that template activation and template-guided prioritization are distinct (cf., ref. 28): It may be that at any moment multiple templates are ready to potentially engage in the prioritization of visual input, but that only one can actually do so following visual stimulation. So far, studies of multiple-target selection have only focused on the limits in the readiness to engage in selection, and ignored potential limits in the selection itself.
To resolve this, we recorded electroencephalograms (EEG) from the scalp of healthy human individuals while they were presented with heterogeneous visual search displays, from which they always had to select two target objects (see Fig. 1A). Crucially, we varied the number of unique target features (one or two colors) that the observer had to prepare for, and the number of unique features that they would need to select from the search display. This allowed us to disentangle the contribution of multiple template preparation on the one hand, and multiple template engagement on the other. A bottleneck could either emerge when going from one to two unique templates (reflecting a limit in the readiness to engage), from one to two unique targets (reflecting a limit in the engagement itself), or both.
Traditionally, visual target selection is assessed using the N2pc, an event-related potential (ERP) that is characterized by increased negativity over posterior electrodes contralateral to the hemifield in which the target is located (29, 30). However, because the N2pc can only distinguish between the left versus right hemifield, it is not able to simultaneously track the selection of multiple targets at different locations in more complex visual search displays. To overcome this limitation, we used multivariate decoding, which has been proven to successfully track the spatiotemporal dynamics of feature-based attentional selection at any location in a search display (31). This allowed us to independently track attentional selection over time for multiple concurrent targets at once, and also to investigate the parallel versus serial nature of these selection processes.
Results
Twenty-four participants performed a visual search task for which they were always required to find two color-defined target characters presented among an array of distractor characters, and determine whether these two targets belonged to the same alphanumeric category (see Fig. 1A). The task-relevant colors were cued prior to a block of trials. To assess if prioritization of multiple targets is limited in terms of the number of attentional templates that can be simultaneously set up, limited in the number of templates that can be simultaneously engaged in the selection of target features in the display, or both, we independently manipulated 1) how many colors were task-relevant and 2) how many of these target colors actually appeared in the search display. Specifically, in 1TMP–1TGT (one template, one target feature) blocks, only one color was task-relevant, so that both targets had the same color and thus participants knew beforehand which color template to prepare. In 2TMP–1TGT (two templates, one target feature) blocks, two unique colors were cued as task-relevant, but per display only one of these was used to distinguish the two targets present (i.e., both targets had the same color). Because participants could not predict which of the two target colors would be present, they had to keep both templates active, even though only one of them was then required for selecting the actual targets. Finally, in 2TMP-2TGT (two templates, two target features) blocks, again two unique colors were cued as task-relevant, but now both these colors also had to be used to select the two target objects from the search display, since one of the targets carried one color, and the other target carried the other color. Note again that in all conditions, subjects had to select two targets; only the number of target-defining features varied across conditions. This controlled for other task-related factors such as the number of characters that had to be identified and the alphanumeric comparison that had to be performed on them.
Behavioral results
Fig. 1B and 1C show mean accuracy scores and mean response times (RTs) as a function of experimental condition (1TMP-1TGT, 2TMP-1TGT, and 2TMP-2TGT). Performance differences were assessed using pairwise, Bonferroni-corrected (to α = 0.025) classical t-tests and Bayesian t-tests on both measures. Any performance costs for the 2TMP-1TGT relative to the 1TMP-1TGT condition reflect the cost of preparing for multiple templates compared to a single template (preparation cost). Any performance cost in the 2TMP-2TGT relative to the 2TMP-1TGT condition represents the cost of having to engage multiple templates to select targets (engagement cost). We found evidence for both, with engagement costs being most prominent. Specifically, there was an effect of the number of templates on both accuracy and response times, with performance being reliably slower and slightly more error-prone in the 2TMP-1TGT condition than in the 1TMP-1TGT condition (RT: 731 ms vs. 679 ms, t(23) = 5.03, p < .001, Cohen’s d = 0.64, BF = 572; accuracy: 95.4% vs. 96.5%, t(23) = 2.76, p = .01, Cohen’s d = 0.61, BF = 4.4). Even stronger costs were observed when the number of uniquely colored targets in the display was increased from one to two, with performance being substantially slower and more error-prone in the 2TMP-2TGT condition than in the 2TMP-1TGT condition (RT: 916 ms vs. 731 ms, t(23) = 9.05, p < .001, Cohen’s d = 1.63, BF = 2.5 × 10^6; accuracy: 91.4% vs. 95.4%, t(23) = 5.90, p < .001, Cohen’s d = 1.48, BF = 3.9). Indeed, when we directly compared these two sources of multiple-target cost to each other, the engagement cost was greater than the preparation cost on both measures (accuracy: 4.0% vs. 1.2%, t(23) = 3.36, p = .03, Cohen’s d = 1.03, BF = 14.8; RT: 185 ms vs. 52 ms, t(23) = 5.00, p < .001, Cohen’s d = 1.67, BF = 540).
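The two cost definitions above reduce to differences between condition means. The following is a minimal sketch in Python that recomputes them from the rounded means reported in the text; the function names and the `higher_is_worse` flag are illustrative assumptions, not part of the original analysis. The paired t statistic is the textbook formula and does not reproduce the Bayesian tests.

```python
import math
import statistics

# Condition means as reported in the text (RT in ms, accuracy in %).
RT = {"1TMP-1TGT": 679.0, "2TMP-1TGT": 731.0, "2TMP-2TGT": 916.0}
ACC = {"1TMP-1TGT": 96.5, "2TMP-1TGT": 95.4, "2TMP-2TGT": 91.4}


def preparation_cost(means, higher_is_worse=True):
    """Cost of preparing two templates instead of one (2TMP-1TGT vs. 1TMP-1TGT)."""
    diff = means["2TMP-1TGT"] - means["1TMP-1TGT"]
    return diff if higher_is_worse else -diff


def engagement_cost(means, higher_is_worse=True):
    """Cost of engaging two templates instead of one (2TMP-2TGT vs. 2TMP-1TGT)."""
    diff = means["2TMP-2TGT"] - means["2TMP-1TGT"]
    return diff if higher_is_worse else -diff


def paired_t(x, y):
    """Paired t statistic for x minus y; returns (t, degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


print(preparation_cost(RT), engagement_cost(RT))
print(preparation_cost(ACC, higher_is_worse=False),
      engagement_cost(ACC, higher_is_worse=False))
```

For RTs this gives the 52 ms and 185 ms costs reported above. For accuracy, the rounded means give a preparation cost of about 1.1 percentage points rather than the reported 1.2, presumably because the reported value was computed from unrounded data.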
Note further that in the 2TMP-1TGT condition, the actual target color in the display could repeat or switch from trial to trial. Previous work has shown switch costs, in which selection is slower after the target color changes from one trial to the next trial, compared to when the target color stays the same (7, 8, 32–34). A closer analysis of the current data also revealed that search suffered from switches, in terms of RTs (repeat trials: M = 704 ms, switch trials: M = 754 ms; t(23) = 8.1, p < .001, Cohen’s d = 0.56, BF_switchcosts = 4.2 × 10^5), and accuracy (repeat trials: M = 95.8%, switch trials: M = 94.9%; t(23) = 2.7, p = .01, Cohen’s d = 0.40, BF_switchcosts = 4.0).
The behavioral data thus reveal that multiple target search comes with costs, and that these costs come in two forms. First, keeping two templates in mind results in relatively small but reliable costs compared to keeping only one template. This effect is strongest when the actual target color in the display has switched, suggesting a shift in weights on specific templates from trial to trial. Second, considerably larger costs emerge when the observer not only maintains two different templates, but also has to engage both of them in biasing selection towards the two corresponding targets. Note that this is not the result of the number of target objects per se, as participants had to select and compare two targets in all conditions, but it is caused by the number of unique features defining these targets. Selecting two objects by a single feature is thus more efficient and more accurate than selecting two objects using two different features.
Decoding of target positions based on the raw EEG
Next, to determine whether the behavioral costs indeed reflected deficits in the selection of the different targets, we used EEG to track the strength and dynamics of attentional enhancement of the different target positions. To this end, one target was always placed on the vertical meridian, and the other target always on the horizontal meridian, so that we could train separate linear discriminant classifiers (with electrodes as features) for each of the spatial target dimensions to distinguish left from right targets and top from bottom targets, separately for each condition and time sample (see SI Methods for details). We reasoned that any inefficiencies associated with setting up multiple unique templates (i.e., 1TMP vs. 2TMP conditions) and/or with actually using those templates to select multiple unique targets (i.e., 1TGT vs. 2TGT conditions) should cause decoding to suffer in terms of relative delays, strength, or both. Fig. 2A shows decoding performance for each of the conditions (1TMP-1TGT, 2TMP-1TGT, and 2TMP-2TGT), separately for the horizontal (left versus right) and vertical meridian (top versus bottom). Fig. 2B shows the topographical patterns associated with the forward-transformed classifier weights over time, which are interpretable as neural sources (see ref. 38 and SI Methods). As a general finding, we were able to track attentional selection on both the horizontal and vertical meridian, with comparable decoding performance. Decoding performance was tested against chance for every sample, corrected for multiple comparisons using cluster-based permutation testing (43, also see SI Methods). After cluster-based permutation, we observed clear significant clusters in each of the three conditions, with significant decoding emerging at different moments in time.
For the left-right distinction, the topographical pattern during the early time window (200–350 ms) resembles that of the N2pc, while for later time windows (350–700 ms) it resembles SPCN or CDA-like patterns (39–41). As shown in Figure S1 (and the SI Results), more traditional event-related analyses indeed revealed N2pc and SPCN components, which likely contributed to the classifiers’ performance. For vertically positioned targets a gradient from frontal to posterior channels spread along the midline, similar to recent results from our labs (31, 42). The fact that the decoding approach picks up on information related to attentional selection also on the vertical midline is testament to its power over conventional ERP methods, and allowed us to simultaneously track attentional selection of both targets over time. However, as there were no main or interaction effects involving the meridian in any of the comparisons, we averaged decoding performance across the spatial dimensions.
If there is a limit on how many templates can be prepared for, we should find reduced and/or delayed classification for the 2TMP-1TGT condition compared to the 1TMP-1TGT condition (Fig. 2C). If the limitation is on how many templates can be engaged in selection, the cost should emerge in the comparison of the 2TMP-2TGT and 2TMP-1TGT conditions (Fig. 2D). Indeed, we observed reliable differences for both comparisons that directly resembled the behavioral pattern. First, we compared the latencies at which target positions became decodable, thus providing a window on any delays in attentional selection. Because differences in onset of significant clusters cannot be reliably interpreted as reflecting differences in onsets of the underlying neurophysiological processes (44), we instead used a jackknife-based approach to quantify the latency of the 50% maximum amplitude in the decoding window (35–37, see SI Methods). This revealed a reliable onset difference between the 1TMP-1TGT (M = 216 ms) and 2TMP-1TGT (M = 235 ms) conditions (M = 20 ms, tc(23) = 2.97, p = .007; Fig. 2C), indicating that attentional selection is delayed as a result of having to prepare for two different target colors compared to having to prepare for only a single target color. Comparing the onsets between the 2TMP-1TGT (M = 235 ms) and 2TMP-2TGT (M = 263 ms) conditions yielded a further delay of 25 ms associated with having to engage in selecting two target colors compared to selecting a single target color (tc(23) = 2.35, p = .02; Fig. 2D). Finally, and also similar to the behavioral responses, the onset of the neurophysiological response in the 2TMP-1TGT condition was delayed by 23 ms when the target color switched from one trial to the next, compared to when it repeated (tc(23) = 4.34, p < .001; see Fig. S2).
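The jackknife latency procedure referred to above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes each participant contributes one baseline-corrected time course with a positive peak (e.g., AUC minus chance), estimates the 50% fractional peak latency on leave-one-out averages, and divides the paired t statistic by n − 1, which is the standard correction for jackknifed estimates and is assumed here to be what tc denotes. All function names are hypothetical.

```python
import math
import statistics


def fractional_peak_latency(times, wave, fraction=0.5):
    """First time at which `wave` reaches `fraction` of its maximum,
    linearly interpolated between samples. Assumes a positive peak."""
    threshold = fraction * max(wave)
    for i, value in enumerate(wave):
        if value >= threshold:
            if i == 0:
                return times[0]
            previous = wave[i - 1]
            step = (threshold - previous) / (value - previous)
            return times[i - 1] + step * (times[i] - times[i - 1])
    raise ValueError("wave never reaches the threshold (non-positive peak?)")


def jackknife_latencies(times, waves, fraction=0.5):
    """One latency per participant, each computed on the average of all
    OTHER participants' time courses (leave-one-out)."""
    n = len(waves)
    totals = [sum(column) for column in zip(*waves)]
    latencies = []
    for wave in waves:
        leave_one_out = [(total - v) / (n - 1) for total, v in zip(totals, wave)]
        latencies.append(fractional_peak_latency(times, leave_one_out, fraction))
    return latencies


def jackknife_corrected_t(latencies_a, latencies_b):
    """Paired t on jackknifed latencies (b minus a), divided by n - 1 to undo
    the artificially small variance of leave-one-out estimates."""
    diffs = [b - a for a, b in zip(latencies_a, latencies_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t / (n - 1)
```

Because each leave-one-out average is much smoother than a single participant's time course, the latency estimates are more stable, at the price of the variance correction in the last step.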
Next, we assessed the strength of classification over time by testing AUC values of the relevant conditions against each other using paired t-tests and cluster-based permutation testing to correct for multiple comparisons (see SI Methods). This procedure revealed an early and short-lasting effect of the number of templates (i.e., between 1TMP-1TGT and 2TMP-1TGT conditions; see Fig. 2C), with stronger classification for the single template condition that reflects the onset latency difference reported above. Again in line with the behavioral results, a more substantial cost in decoding performance emerged when the number of target features in the displays increased from one to two (i.e., between the 2TMP-2TGT and 2TMP-1TGT conditions; see Fig. 2D).
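For readers unfamiliar with AUC as a decoding measure, the sketch below computes it from classifier scores for two classes (e.g., left versus right targets) at one time sample. It uses the rank-based definition: the probability that a randomly drawn score from one class exceeds a randomly drawn score from the other, with ties counted as one half. How the AUC was computed in the actual analysis is described in the SI Methods; the function name and inputs here are assumptions for illustration.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve from two lists of classifier scores.

    Equals the fraction of (positive, negative) pairs in which the positive
    score is larger, counting ties as half. 0.5 is chance, 1.0 is perfect.
    """
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical scores for illustration only.
print(auc([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))
```

Computing this value separately at every time sample yields a decoding time course of the kind compared between conditions above.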
Thus, both the onset latency and strength of decoding performance show clear deficits in attentional selection when observers need to select two different targets from a display (i.e., engage two templates in selection) compared to when they have to select two targets based on the same target color (i.e., engage one template in selection). In contrast, having to set up two templates instead of one came with only minor onset latency differences and no overall differences in decoding strength. This clearly points to a deficit when multiple templates need to be engaged simultaneously rather than when multiple templates need to be prepared simultaneously.
Sample-wise correlation of classifier confidence across trials as a measure of intertarget dependency
While the previous section showed a clear impairment when two templates need to be engaged in selection, it leaves unanswered whether selection is hindered by limitations in parallel processing or by a serial bottleneck. That is, engaging two templates during search may prioritize both unique targets in parallel but in a mutually competitive manner (45), or the two templates may only be engaged (and thus the corresponding targets prioritized) sequentially, possibly in continuously alternating fashion (e.g., ref. 7).
To investigate these competing hypotheses, we assessed performance for each target dimension separately (horizontal and vertical). A serial model predicts that attention to a target on one dimension should go at the expense of attention to the target on the other dimension, and thus that decoding performance for the vertical and horizontal axes should correlate negatively. In case of parallel, independent selection, there should be no systematic relationship between classification confidence for one dimension and classification confidence for the other dimension, as selection of one target is impervious to the selection of the other target. A positive correlation would arise from a common mechanism driving selection of two different targets. Note that these possibilities are difficult to assess at the group level as individuals may have different serial strategies. For example, one observer may prefer to first select targets from the horizontal axis, while another may prefer the vertical axis first, such that any existing correlation might cancel out. Hence, we first plotted average performance over time separately for each individual and separately for the horizontal and vertical axis. Then, to reveal whether consistent temporal dependencies existed for any given participant, we correlated classification performance over time in the 150 ms to 700 ms post stimulus window. Although this revealed incidental positive and negative correlations for individual participants, there was no systematically positive or negative relationship (average correlation ρ = 0.11; min-max range: −0.37 to 0.63; see Fig. S3).
However, even individual participants themselves may not behave consistently across trials, and, while selection is still serial, which axis is preferred may also vary from trial to trial. Selection therefore needs to be assessed at the trial level. To this end, we extracted the classifier confidence scores for each dimension (horizontal and vertical), per individual participant, trial and time point (see SI Methods for details, and also ref. 46), and correlated the two dimensions across trials using Spearman’s ρ. Classifier confidence, expressed as the distance from the decision boundary, can be interpreted as the representativeness of a certain instance of a particular class, in this case how strongly a particular pattern resembles the typical EEG pattern across electrodes for a target being present at a specific location. We reasoned that if prioritization is limited to a single target at a time, a classifier cannot simultaneously have high confidence about both targets, and thus confidence should correlate negatively. The correlations between the confidence scores on the two spatial dimensions are plotted in Fig. 3A. As can be seen, there was again no systematic relationship between decoding the locations of the two targets, in any of the conditions. Apart from a short-lasting positive correlation around the 500 ms time point in the 2TMP-2TGT condition, which is likely to be spurious, correlations for all time points were close to zero.
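The sample-wise correlation described above can be illustrated with a short Python sketch. It assumes confidence scores are stored per trial and time point as nested lists (`conf[trial][time]`), which is a hypothetical layout, and implements Spearman's ρ as the Pearson correlation of average ranks. It is an illustration of the logic only; the actual extraction of confidence scores is described in the SI Methods.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def samplewise_correlation(conf_horizontal, conf_vertical):
    """Correlate the two dimensions across trials, separately per time point.

    Both inputs are indexed as conf[trial][time]."""
    n_times = len(conf_horizontal[0])
    return [
        spearman([trial[t] for trial in conf_horizontal],
                 [trial[t] for trial in conf_vertical])
        for t in range(n_times)
    ]
```

Under a strictly serial account, the returned values should dip below zero at the time points where the two targets compete; values near zero across the epoch correspond to the pattern reported here.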
However, given that this is a null result, we sought to make sure that our approach is in principle sensitive to existing correlations. To this end, we simulated a data set with the same overall characteristics as the recorded data, but with either positive, negative, or no correlations injected, under various signal to noise ratios (see SI Methods for details). The results of this simulation are summarized in Fig. S4 of the SI Results and demonstrate that with sufficiently high decoding AUC values (> approx. 0.55-0.60), correlations (whether positive or negative) between the horizontal and vertical position classifiers can, in principle, be reliably detected. However, because group classification performance in our dataset did not exceed 0.59 (in the 1TMP-1TGT condition), we instead assessed for each individual observer the correlation between target dimensions for those time points at which classification performance reached its maximum. As Fig. 3B shows, even for individuals with relatively high classification scores, there was no evidence for a correlation between the classification confidence between the two target dimensions. The absence of such a correlation in our data is thus most consistent with a limited parallel independent selection model, rather than a serial model or a parallel model operating under a common mechanism.
Discussion
Selection of task-relevant information from complex visual environments is limited, and a central question in attention research has been whether observers can simultaneously prepare for and select multiple different target objects. The current results provide evidence that these limitations do not so much reside at the level of template preparation (i.e., the number of target representations set up prior to the task), but at the extent to which templates can then be concurrently engaged in selecting matching information from the sensory input. By systematically varying not only the number of different target features observers had to prepare for, but also the number of different target features they would encounter in the displays, we were able to, for the first time, dissociate limitations in template preparation from limitations in template engagement. Specifically, we observed relatively small but reliable costs on both behavioral and EEG classification performance when two templates needed to be activated instead of one, suggesting a reliable but relatively minor bottleneck at this stage of processing. In contrast, substantial costs emerged on both behavioral and EEG performance measures when two templates had to be prepared, and both of these templates (rather than just one) had to be engaged in driving the selection of two different targets.
We propose a model which extends existing frameworks that assume a crucial role for top-down biased competition (2–4, 46). According to these frameworks, the activation of target templates in working memory involves the pre-activation or biasing of associated sensory features. The presence of such features in the input will then trigger a long-range recurrent feedback loop, leading to the enhancement of the target representation in VWM (including its location), and thus making it available for other cognitive processes such as response selection (processes which are themselves limited, cf. refs. 47–49). Our data indicate that while multiple top-down feedback connections may be prepared at once, there is a limitation in how these feedback loops are engaged by matching input. Figure 4 illustrates how we believe the existing framework should be extended. Specifically, we propose that multiple templates may hold each other in a mutually competitive relationship within VWM, most likely through laterally suppressive connections (50). Figure 4A depicts the situation when just one of the target features is then encountered in the sensory input. The corresponding feedback loop is triggered, leading to an enhanced target representation within VWM. If only one target feature is present, the corresponding template will automatically win the competition. Although two templates can be maintained in parallel, the mutual competition between them is slightly disadvantageous. This will lead to the initial delay in target selection that we observed in the data when two templates instead of just one were activated. Moreover, the selective enhancement of one representation over another may carry over to the next trial, thus resulting in the target switch costs that we also observed both in behavior and EEG performance measures.
The crucial situation occurs when the visual input contains multiple target features and thus multiple feedback loops are being triggered, as is shown in Figure 4B. Because of the mutually suppressive relationship, strengthening one feedback loop will automatically go at the expense of the other. Although both loops are triggered in parallel, the mutually aversive relationship results in slower and weaker accumulation of evidence for either of the targets, consistent with what we observed in the data. In theory, the system may resolve such competition in two ways. The first is to keep selection of both targets running in parallel, and accept the slower evidence accumulation. The second option is to impose a serial strategy in which selection is first biased in favor of one target, and then switched to the other (or alternate between the two). Our data provide no evidence for the serial model. First, neither the average group data nor the average individual subject data showed any systematic pattern of switching between the two target positions (i.e., differences in classification performance for left-right versus top-bottom). Second, a trial-based correlation analysis of classifier confidence scores likewise showed no negative correlation between the target positions. Our findings are therefore most consistent with a limited-capacity parallel model, in which observers maintain two templates active during search, but with mutually aversive consequences. However, we point out that our data do not exclude the possibility of seriality. First, while there may have been little seriality in selecting the targets from the displays on the basis of color, there may have been a serial component in accessing their alphanumeric identity – a component to which our classifier was not sensitive.
Moreover, there is still a distinct possibility that imposing seriality is a valid strategy that observers may deploy to resolve competition between different target features, but that such choices depend on tasks, context, or instructions (51, 52). For example, we previously observed evidence for serial switching in a different paradigm when observers had to select only one of two targets present, and were instructed to switch at least a few times during a block (15, 53). The current results indicate that the process can occur in parallel, not that it must.
We believe the distinction between template preparation and template engagement in selection has great potential for resolving the current debate on whether observers can look for more than a single target at the same time (7, 8, 13, 16, 17, 19, 21). Studies central to this debate have largely focused on how many templates can be prepared in anticipation for a search, rather than how many of these templates can then be concurrently engaged in selection without costs. From our data, the answer to the question then appears to be yes, observers may look for multiple targets simultaneously at little cost, but it is selecting those targets that runs into real limitations.
Although we found the costs of going from one to two templates to be relatively small, this leaves open the question whether costs will increase more strongly with more templates being added. Given the capacity limits of VWM, this is to be expected, merely because memory itself will start to fail. Interestingly though, work by Wolfe (54) has shown that observers can successfully search for tens of different target objects if given the opportunity to first commit these objects to long term memory. In fact, given that in our experiment the target template remained the same for a block of trials, observers may have at least partly relied on long term memory here, too (but see ref. 18 for evidence that measures of attentional selection, i.e., the N2pc, are not affected by whether targets are stored in long-term memory or working memory). The question of capacity is also important when considering that current limitations were found when both target features were drawn from the same dimension (color). There is evidence that different dimensions may to some extent independently store (e.g., ref. 55), or guide attention towards (56, 57), target features. Our methods may therefore prove useful in assessing the exact limitations of selecting targets defined along different dimensions.
To sum up, we propose that models of visual selection need to consider the difference between preparing for selection and engaging in selection of multiple visual targets. We demonstrate that whereas the first process comes at little cost, the true bottleneck of multiple-target selection is in engaging multiple template representations.
Supporting Information
N2pc Analysis
Even though the backward decoding approach would already show whether and when location-specific information would be present in the raw EEG, for the sake of comparison to the existing N2pc literature, we also conducted a more common event-related potential (ERP) analysis to examine latency and amplitude of the N2pc component. First, to identify N2pc components, we computed ERPs locked to stimulus onset at electrodes PO7 and PO8. ERPs at the ipsilateral electrode relative to the horizontal target position (i.e., PO7 for targets on the left, PO8 for targets on the right), were subtracted from ERPs at the contralateral electrode, collapsed over the vertical target position, but separately for each participant and condition. The resulting difference wave forms were then statistically tested against zero with two-sided one-sample t-tests at each time point. A cluster-based permutation test (5000 permutations, α = .05) was performed on contiguous time points to correct for multiple comparisons (6). To quantify amplitudes and onset latency the same approach as for the classification scores was used, with the exception that we did not use the entire epoch when looking for the peak, but only the window of 200 – 350 ms post stimulus, as this is the time window in which the N2pc is typically observed (e.g., refs. 18, 19).
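The contralateral-minus-ipsilateral subtraction described above can be sketched in a few lines of Python. The sketch assumes single-condition ERPs at PO7 and PO8 are available as equal-length lists of voltages, one per time sample, for left-target and right-target trials separately; the function names and the equal-weight averaging across sides are illustrative assumptions.

```python
def contra_minus_ipsi(po7, po8, target_side):
    """Difference wave for one target side.

    PO7 lies over the left hemisphere and PO8 over the right, so PO7 is
    ipsilateral to left targets and PO8 is ipsilateral to right targets."""
    if target_side == "left":
        contra, ipsi = po8, po7
    elif target_side == "right":
        contra, ipsi = po7, po8
    else:
        raise ValueError("target_side must be 'left' or 'right'")
    return [c - i for c, i in zip(contra, ipsi)]


def n2pc_difference_wave(left_po7, left_po8, right_po7, right_po8):
    """Average the difference waves for left and right targets
    (equal weighting is an assumption of this sketch)."""
    left = contra_minus_ipsi(left_po7, left_po8, "left")
    right = contra_minus_ipsi(right_po7, right_po8, "right")
    return [(a + b) / 2.0 for a, b in zip(left, right)]
```

A negative deflection of the resulting wave in the 200 to 350 ms window corresponds to the N2pc; a sustained negativity afterwards corresponds to the SPCN.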
Reliable N2pc components were identified in all three conditions (see Fig. S2). In addition, the sustained posterior contralateral negativity (SPCN) was observed. This component is thought to reflect visual working memory processes (20–23). In the 1TMP–1TGT and 2TMP–1TGT conditions the N2pc merged directly into the SPCN, whereas in the 2TMP–2TGT condition the N2pc first disappeared before the SPCN emerged later in the trial.
Onset Latency
To examine potential differences regarding the onset of N2pc waves, we ran pairwise t-tests between the 1TMP–1TGT and 2TMP–1TGT conditions and between the 2TMP–1TGT and 2TMP–2TGT conditions on the jackknife-estimated onset latencies (50% fractional peak latency). For the comparison 1TMP–1TGT vs. 2TMP–1TGT, we found N2pc latencies in the 2TMP–1TGT condition (M = 237 ms) to be significantly delayed relative to the 1TMP–1TGT condition (M = 214 ms; difference M = 23 ms, tc(23) = 3.5, p = .002). Unexpectedly, N2pcs in the 2TMP–1TGT condition were also significantly delayed relative to the 2TMP–2TGT condition (M = 225 ms; difference M = 12 ms, tc(23) = 2.2, p = .04). Finally, to assess the strength of the N2pc over time, we took the difference between the N2pcs in the 1TMP–1TGT and 2TMP–1TGT conditions and between the 2TMP–2TGT and 2TMP–1TGT conditions and compared them against zero by running one-sample t-tests with a cluster-based permutation test (5000 permutations). This analysis suggested that the N2pcs in the 1TMP–1TGT and the 2TMP–1TGT conditions rose to a similar level, whereas the N2pc in the 2TMP–2TGT condition was significantly smaller in large parts of the N2pc window, as well as the SPCN window. Overall, the N2pc results are similar to the findings of the multivariate decoding approach, though the latter is likely to pick up on additional information as it is not limited by the choice of electrode and time window.
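The jackknife procedure for onset latencies can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the reported values: it assumes that tc denotes the usual jackknife correction (the paired t value divided by n − 1), it takes the first sample at or below the criterion without interpolating between samples, and it omits the restriction of the peak search to the 200–350 ms window. All function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def fractional_peak_latency(wave, times, fraction=0.5):
    """First time at which a negative-going wave reaches `fraction` of its peak."""
    peak = min(wave)  # the N2pc is a negativity, so its peak is the minimum
    threshold = fraction * peak
    for t, v in zip(times, wave):
        if v <= threshold:
            return t
    return None


def jackknife_latencies(subject_waves, times, fraction=0.5):
    """Onset latency of each leave-one-participant-out grand average."""
    n = len(subject_waves)
    totals = [sum(samples) for samples in zip(*subject_waves)]
    latencies = []
    for wave in subject_waves:
        # Grand average over all participants except the current one.
        leave_one_out = [(tot - v) / (n - 1) for tot, v in zip(totals, wave)]
        latencies.append(fractional_peak_latency(leave_one_out, times, fraction))
    return latencies


def corrected_paired_t(latencies_a, latencies_b):
    """Paired t on jackknifed latencies, divided by (n - 1).

    Assumption: this is the correction meant by tc; jackknifing shrinks the
    variance between subsamples, so the raw t value would be inflated.
    """
    diffs = [a - b for a, b in zip(latencies_a, latencies_b)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t / (n - 1)


times = [0, 10, 20]
waves = [[0, -2, -4], [0, -4, -4], [0, 0, -4]]
latencies = jackknife_latencies(waves, times)
```

With 24 participants, the corrected statistic is evaluated against a t distribution with 23 degrees of freedom, matching the degrees of freedom reported above.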
Correlation of confidence scores on simulated data with known underlying correlational structure
To investigate whether the impairment of participants when they needed to engage two templates in selection was due to limitations in parallel processing or a serial bottleneck, we computed Spearman’s ρ between the classifier confidence scores for targets on the horizontal and vertical dimensions across trials, separately for each time point and condition. A negative correlation would provide evidence in favor of a serial bottleneck, while uncorrelated confidence scores would suggest parallel processing. However, an uncorrelated signal would also be expected if decoding strength was simply too weak due to an insufficient signal-to-noise ratio (SNR) in the data. As we did observe a null correlation, we wanted to make sure that we had enough statistical power to detect a correlation if it was actually present. To that end, we ran a simulation in which we embedded a signal in noise and systematically manipulated the noise levels, to determine at which decoding strength (i.e., SNR) a known correlation could be extracted. This simulation was set up so that the exact same analysis pipeline could be applied as for the actual data. Specifically, we replaced the data of eight channels with simulated data in which we injected either a positive, negative or null correlation between horizontal and vertical targets and varied the overall noise level. The data were created by generating half a cycle of a sine wave with an amplitude of 1 µV, extending over 400 ms (200 – 600 ms post stimulus), which was assigned to a subset of channels to reflect attentional selection. To create location-specific effects (i.e., contra vs. ipsilateral), we injected the same ERP with a negative amplitude on an orthogonal subset of the channels. Therefore, attentional selection was simulated with a positive ERP on half the channels and a negative ERP on the other half.
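The simulated ERP can be sketched as follows. The 1 µV amplitude, the 200–600 ms window, and the eight channels come from the description above; the sampling grid, the specific channel indices, and the function names are assumptions made for illustration only.

```python
import math


def half_sine_erp(times_ms, onset_ms=200.0, offset_ms=600.0, amplitude_uv=1.0):
    """Half a sine cycle between onset and offset, zero elsewhere."""
    duration = offset_ms - onset_ms
    return [
        amplitude_uv * math.sin(math.pi * (t - onset_ms) / duration)
        if onset_ms <= t <= offset_ms
        else 0.0
        for t in times_ms
    ]


def inject_erp(times_ms, positive_channels, negative_channels, n_channels=8):
    """Noise-free channels x time data for one simulated 'selected' trial."""
    erp = half_sine_erp(times_ms)
    data = [[0.0] * len(times_ms) for _ in range(n_channels)]
    for ch in positive_channels:
        data[ch] = list(erp)  # positive ERP on one half of the channels
    for ch in negative_channels:
        data[ch] = [-v for v in erp]  # mirrored ERP on the other half
    return data


# Assumed sampling grid (every 100 ms) and assumed channel split.
times = list(range(0, 801, 100))
trial = inject_erp(times, positive_channels=[0, 1, 2, 3],
                   negative_channels=[4, 5, 6, 7])
```

The template peaks at 400 ms, the midpoint of the window, and swapping the two channel lists would simulate a target on the opposite side.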
Importantly, attentional selection of vertical and horizontal targets was simulated independently, by using an orthogonal split of the channels into contra- and ipsilateral. For every correlation pattern we simulated 512 trials, the same number as in the real experiment. For the positive correlation, we injected the ERP for both vertical and horizontal targets on half of the simulated trials, and no ERP on the other half, reflecting trials on which either both targets were selected simultaneously or neither was (parallel selection). For the negative correlation, the ERP was injected either for vertical targets or for horizontal targets (each on half of the trials), but never for both, reflecting the selection of either one or the other target (serial selection). For the null correlation, we randomly chose on each trial whether an ERP was present for one of the targets, both, or none. Next, we added random noise to all trials. Critically, the SNR was parametrically manipulated, relative to the (constant) amplitude of the ERP. For example, an SNR of 4 means that the peak ERP amplitude was four times as high as the maximum noise amplitude. In total, we used SNRs of 4, 2, 1.33, 1, 0.67, 0.5, 0.33, 0.25, 0.2, 0.17, 0.14, 0.13, 0.11, 0.1, 0.07, 0.05, and 0.04. Once the simulated dataset was created, the same backward decoding model (see SI Methods) was used to decode the target location, separately for the vertical and horizontal targets, each injected correlation, and each SNR. Similarly, the classifiers’ confidence scores were correlated between vertical and horizontal targets, as was done for the actual data. Fig. S4A demonstrates that location-specific information could be decoded above chance for all SNRs, but that the classification performance declined with decreasing SNRs. Fig. S4B then shows that whether or not the injected correlation could be retrieved from the data strongly depended on the SNR.
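The three injected correlation patterns and the SNR manipulation can be sketched as follows. Several details are assumptions, because they are not specified above: the null pattern is implemented as two independent coin flips per trial, the noise is drawn uniformly within ± peak amplitude / SNR, and the function names and the fixed random seed are hypothetical.

```python
import math
import random


def presence_design(pattern, n_trials=512, rng=None):
    """Per-trial indicators (1 = ERP injected) for horizontal and vertical targets."""
    if rng is None:
        rng = random.Random(0)
    half = n_trials // 2
    if pattern == "positive":  # both targets selected, or neither
        horizontal = [1] * half + [0] * (n_trials - half)
        vertical = list(horizontal)
    elif pattern == "negative":  # one target or the other, never both
        horizontal = [1] * half + [0] * (n_trials - half)
        vertical = [1 - h for h in horizontal]
    elif pattern == "null":  # assumed: independent coin flip per target
        horizontal = [rng.randint(0, 1) for _ in range(n_trials)]
        vertical = [rng.randint(0, 1) for _ in range(n_trials)]
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return horizontal, vertical


def add_noise(signal, snr, peak_amplitude=1.0, rng=None):
    """Add noise whose maximum amplitude equals peak_amplitude / snr."""
    if rng is None:
        rng = random.Random(0)
    max_noise = peak_amplitude / snr
    return [v + rng.uniform(-max_noise, max_noise) for v in signal]


def pearson(x, y):
    """Pearson correlation; for binary indicators it equals Spearman's rho."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_positive = pearson(*presence_design("positive"))
r_negative = pearson(*presence_design("negative"))
r_null = pearson(*presence_design("null"))
noisy = add_noise([0.0, 0.5, 1.0], snr=4)
```

In this sketch the correlations are computed on the noise-free indicators only, to confirm the design; in the simulation described above, the correlation is instead computed on classifier confidence scores after decoding the noisy data.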
For example, at an SNR of 0.1 the injected correlations between horizontal and vertical targets could not be retrieved, despite classification performance being reliably above chance. Notably, peak classification at that SNR is comparable to the average group classification scores we observed in our data, suggesting that if there were a correlation present in the actual data, the SNR may have been too low for our analyses to detect it. To overcome this problem, we leveraged the individual subject data, which showed considerable variability. Specifically, we extracted the peak AUC score per classifier (top-bottom and left-right) for each individual within the 150–700 ms time window, and then assessed the correlation between these dimensions at these time points. Figure 3B in the main text shows these individual maximum AUC scores plus the associated correlations. Maximum classification performance of several individuals was well above the threshold at which correlations should become decodable (AUC > 0.6). Importantly, no correlation in decoding strength between target locations was observed for any of those participants, whether positive or negative. This is consistent with the idea that the absence of correlations in the actual dataset reflects limited parallel processing.
Acknowledgments
This work was supported by Open Research Area Grant 464-13-003 from the Netherlands Organization for Scientific Research and by European Research Council Consolidator Grant ERC-2013-CoG-615423 to C. N. L. Olivers.
Footnotes
Author Note: E. Ort, Department of Experimental and Applied Psychology, Institute for Brain and Behaviour, Vrije Universiteit Amsterdam, van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands. E-mail: eduardxort{at}gmail.com
* For the first two participants presentation time was two display frames (∼16.67 ms) shorter than for the rest of the sample. To facilitate good behavioral performance, we increased presentation time from the third participant onwards. However, as these two participants performed well, even with 16.7 ms presentation rates (and thus met our inclusion criteria), we decided to keep them in the sample.