Sensitivity enhancement and selection are shared mechanisms for spatial and feature-based attention

Human observers use cues to guide visual attention to the most behaviorally relevant parts of the visual world. Cues are often separated into two forms: those that rely on spatial location and those that use features, such as motion or color. These forms of cueing are known to rely on different populations of neurons. Despite these differences in neural implementation, attention may rely on shared computational principles, enhancing and selecting sensory representations in a similar manner for all types of cues. Here we examine whether evidence for shared computational mechanisms can be obtained from how attentional cues enhance performance in estimation tasks. In our tasks, observers were cued either by spatial location or by feature to two of four dot patches. They then estimated the color or motion direction of one of the cued patches, or averaged them. In all cases we found that cueing improved performance. We decomposed the effects of the cues on behavior into model parameters that separated sensitivity enhancement from sensory selection and found that both were important for explaining improved performance. Our analysis favored a model that shared parameters across forms of cueing, suggesting that observers have equal sensitivity and likelihood of making selection errors whether cued by location or feature. Our perceptual data support theories in which a shared computational mechanism is re-used by all forms of attention.

Significance Statement

Cues about important features or locations in visual space are similar from the perspective of visual cortex: both allow relevant sensory representations to be enhanced while irrelevant ones are ignored. Here we studied these attentional cues in an estimation task designed to separate different computational mechanisms of attention. Despite cueing observers in three different ways, to spatial locations, colors, or motion directions, we found that all cues led to similar perceptual improvements.
Our results provide behavioral evidence supporting the idea that all forms of attention can be reconciled as a single repeated computational motif, re-implemented by the brain in different neural architectures for many different visual features.

The visual world presents human observers with an overload of sensory information, only part of which is relevant to current behavior.

Our results suggest that spatial and feature-based attention are more similar than different and support theories of a shared computational mechanism.

To measure the effect of cueing by feature or location on perceptual sensitivity we asked observers to perform a cued motion direction averaging task (Fig. 1). Briefly, observers were asked to report the average motion direction of two out of four random dot patches. The two selected patches were cued either by their common location or color, randomly interleaved across trials. This task thus engages either spatial or feature-based attention according to the current cue. In addition, the task does not require working memory, which avoids potential confounds introduced by storing perceptual information during a delay.

By examining known stimulus manipulations we first showed that our task provided a good measure of perceptual sensitivity. As expected, we found that estimates of average direction were more precise on easier trials. This was true both for trials with a smaller angle difference between the two cued patches (Fig. 2a) and for trials with longer duration (Fig. 2b). To quantify this, we fit a model of perceptual sensitivity in estimation tasks (the "target confusability competition" model; Schurgin et al., 2020).

Figure 1: Observers were asked to select two out of four random dot patches and average their directions of motion. Observers initiated trials by fixating a central cross (Fixation). During this initial period and until stimulus presentation the dots in the four patches moved incoherently. A cue was shown at the fixation cross indicating which two dot patches should be averaged (Cue): a line to the left or right of fixation indicated selection by side, or a mini-patch of dots colored yellow or blue indicated selection by feature. After a brief delay (Inter-stimulus interval) the four dot patches moved coherently in random directions for a variable duration (Stimulus). After another brief delay (Delay) observers used a rotating wheel to report the average direction of motion for the two dot patches they were asked to select. Feedback was given by indicating the true average motion direction.

Figure 2: (a) Data are split by the angular distance between the motion directions of the two cued dot patches. Note that the x-axis in all panels has been re-scaled from degrees to psychophysical distance (see Methods). (b) Conventions as in (a); data are split by the duration of the stimulus. (c) Conventions as in previous panels. Selection by spatial location (i.e. averaging the two patches on the right or left) is shown in yellow, and selection by color (i.e. averaging the two yellow or blue patches) is shown in blue. The two inset plots show the same histogram in a circular space, with a red dashed line indicating the true average. In all panels lines indicate the average normalized histogram of response counts across observers and shaded regions the 95% confidence interval.

Figure 3: (a) Observers initiated trials by fixating a central cross (Fixation). A pre-cue (Cue) was then shown at fixation to indicate to observers which of the four dot patches they should select. A brief delay (Inter-stimulus interval) followed. Up to this point all four dot patches were colored white and moving incoherently. The dots then became colored and coherent for a variable duration (Stimulus). After another brief delay (Delay), observers were shown a second cue which was used to disambiguate the target stimulus (Post-cue). For example, if the observer was cued to remember the two stimuli on the left, the post-cue could be yellow to indicate that of the two patches that were cued (blue and yellow, left side) only the motion direction of the target (the yellow patch on the left) should be reported. Observers were given unlimited time to respond (Response) and received feedback before the next trial (Feedback). (b) A second variant of the same task was also run in which the cues were side (left or right) or motion direction (up or down) and observers reported about the color of the dot patches.

We next designed a set of estimation tasks in which observers reported about particular cued features.
In one variant observers were cued by location or color and had to report the motion direction of a dot patch (Fig. 3a). In the other, observers were cued by location or motion direction while reporting color (Fig. 3b).

We first evaluated the cued estimation task on two reference conditions: trials in which no cue was given and trials where no distractors were shown. These two references provide a lower bound (Uncued condition, Fig. 3) and an upper bound (No-distractor, Fig. 3) on performance. The no-distractor condition is an upper bound because the stimulus to report appears in the absence of distracting information. This should be equivalent to the optimal performance of an observer cued to select a single dot patch.

The reference conditions showed that observers could perform this task and that, indeed, the absence of distractors improved performance.

To decompose the responses in the cued estimation task we extended an existing observer model (Schurgin et al., 2020) to fit separate parameters to capture sensitivity (d′) and selection (β) (Fig. 4). On each trial, the observer model encodes the four stimulus patches (Fig. 4a,b) with a set of noisy channels (Fig. 4c). A parameter, d′, controls the maximum value of the channels and the sensitivity of the observer (Fig. 4d). In this model, the channel with the maximum response "wins" and becomes the observer's estimated angle for that dot patch (red dashed line, Fig. 4e). The reported angle is then sampled from the estimated angles for the four dot patches in proportion to the β parameters (Fig. 4f,g). By computing the probability of each channel winning we can generate the full response likelihood for each dot patch (colored distributions, Fig. 4e), i.e. how likely the observer was to make a particular response having seen that patch. When weighted by the β parameters, which fit the probability of choosing to report each patch, we get the distribution of response likelihoods for that trial, given all stimuli (Fig. 4h). Thus, in our model the β parameters control how selection occurs while the d′ parameter controls the observer's precision of report.

We first confirmed that the model captured the perceptual sensitivity of the observers in the two reference conditions. The model accounted for the qualitative aspects of the data well (curves track the markers, Fig. 5a,b). For the report-color task the average pseudo-R² over observers was 0.

Figure 4: Estimation task model. (a-c) On a trial, stimuli of varying angles are encoded by many independent channels (each channel is represented by a single colored tuning curve). Each channel's tuning profile is defined by the psychophysical distance function (see Methods) relative to that channel's preferred angle. (d) The channel responses for a trial are noisy, so for a particular presented stimulus each channel will have a response sampled (black markers) from a normal distribution with a mean set by the height of the tuning curve at the stimulus angle (colored markers indicate the mean and error bars ±1 standard deviation). The model's predicted response to a dot patch is found by taking the channel with the maximum sampled response and reporting its preferred angle. These winning angles are shown as a red vertical dashed line in (e), along with the probability distribution of each channel having the maximum response. A free parameter d′ sets the spread of this distribution by multiplicative scaling of the peak responses of the channels, in a manner analogous to signal detection (see Methods). (f) The selection parameters (β) control the probability that each dot patch will influence an observer's report. (g) From a discrete sampling perspective, the β values set the proportion of trials where the observer will report about the estimated angles of particular dot patches, e.g. here the estimated target angle is reported. (h) To fit the model we computed the full likelihood distribution for each trial. We then optimized the model parameters to maximize the likelihood of each observer's actual reports across all trials.

Looking at the sensitivity parameter, we found that without distractors observers consistently made more precise reports (Fig. 5c) for about one in every three trials.
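The winner-take-all observer model described above can be simulated in a few lines. The sketch below is illustrative rather than the authors' code: a circular-Gaussian tuning profile stands in for the psychophysical-distance function of Schurgin et al. (2020), and the function name and parameters are hypothetical.

```python
import numpy as np

def simulate_trial(stim_angles, d_prime, betas, n_channels=100, rng=None):
    """Simulate one trial of the (sketched) observer model.

    stim_angles: angles (deg) of the four dot patches.
    d_prime: sensitivity; scales the peak channel response.
    betas: probability of reporting each patch (must sum to 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    prefs = np.linspace(0.0, 360.0, n_channels, endpoint=False)

    # Placeholder tuning profile: the paper uses a psychophysical-distance
    # function (Schurgin et al., 2020); a circular Gaussian stands in here.
    def tuning(stim, pref, width=30.0):
        delta = np.angle(np.exp(1j * np.deg2rad(stim - pref)))
        return np.exp(-0.5 * (np.rad2deg(delta) / width) ** 2)

    estimates = []
    for theta in stim_angles:
        mu = d_prime * tuning(theta, prefs)       # mean channel responses
        resp = rng.normal(mu, 1.0)                # unit-variance channel noise
        estimates.append(prefs[np.argmax(resp)])  # winning channel's angle

    # Report one patch's estimate, sampled in proportion to the betas.
    patch = rng.choice(len(stim_angles), p=betas)
    return estimates[patch]
```

With a high d′ and β_target = 1, the simulated reports cluster tightly around the target angle, reproducing the qualitative behavior the text describes.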

Returning to the main hypothesis, we next looked at whether the cued trials caused a change in sensitivity or selection when compared to baseline performance. In the cued trials, observers selected dot patches by their common spatial location (left or right) or common feature (yellow/blue color, or up/down motion for the two task variants, respectively) (Fig. 3). Although cued to two dot patches, observers still reported only the properties of a single dot patch at the end of each trial, uniquely identified by the post-cue. As expected, we found performance on these trials to be intermediate between the reference conditions (Fig. 6a).

Although both the uncued and cued trials had distractors present, we found that cueing reduced the impact of the distractors, both improving sensitivity and reducing selection errors. To quantify this, we combined trials from the two cued conditions and looked at the values of d′ (Fig. 6c) and β (Fig. 6d). Cueing, by spatial location or feature, increased d′ in both task variants (Fig. 6c).

Figure 5: Baseline estimation performance. (a) A histogram of observer responses, relative to the true target motion direction, is shown averaged across observers for the uncued and no-distractor conditions. Markers indicate the mean and error bars the 95% confidence intervals. Lines are the average fit of the model. (b) As in (a), for the report-color variant. (c) The d′ parameter is shown for individual observers (gray) and the average (black) for each condition and task variant. (d) The β parameters are shown averaged across observers for each condition and task variant. β_target refers to the dot patch which was post-cued (here left side, moving up), β_side is the patch on the same side as the post-cued patch (left side, moving down), β_feature is the patch on the other side with the matched feature (right side, moving up), and β_distractor is the patch on the other side with a mismatched feature (right side, moving down).

.20] for report-color. These data show that cueing has a substantial impact on performance in this task, improving sensitivity and reducing the probability that observers will misreport about the irrelevant dot patches.

Because the changes in d′ and β that we observed were small, we performed a model recovery simulation to confirm that our dataset was sufficient to detect these effects with high power (Fig. 6e,f). Briefly, we simulated data sets with various d′ and β parameters using the range of values observed in the cued and uncued conditions (Fig. 6c,d). We generated 200 such data sets for every combination of parameters and fit these with our analysis pipeline. We then bootstrapped the resulting values to compute the model's true positive rate (sensitivity), i.e.

the probability that the model would recover a difference in d′ or β when a real difference was simulated.

While the averaging task data showed that human observers could use spatial and featural cues with similar efficacy, this could be done by similar or different computational mechanisms. Testing this hypothesis by fitting our model with either shared or separate parameters for the Uncued, Cue spatial, and Cue feature conditions (Fig. 7a) revealed that spatial and featural cues employ a shared computational mechanism. We first fit a Null model which used the same parameters for all three conditions, testing the null hypothesis that cueing had no effect. We then compared this to a Cued model which separated the spatial and feature cueing conditions from the uncued condition (the parameters from this model are reported in Fig. 6c,d). Finally, we fit a model in which all three conditions had separate parameters. We found that separately fitting cued trials improved model fit (Fig. 7b), but there was no additional improvement from fitting the spatial and feature-based conditions separately.

We found that spatial and feature-based cues changed perception in similar ways across a set of cued estimation tasks. In one task observers averaged motion directions cued either by location or color, a perceptual judgment with low working memory load. We found that observers were able to use both cues with similar efficacy, suggesting that they had similar effects on sensitivity. Although this measurement put each cue on the same scale, it did not demonstrate computational similarity: observers might, for example, use sensory enhancement more when cued to a spatial location. We designed a set of tasks to separate sensitivity enhancement from errors of selection.

The mean angle in our task was uninformative across trials, even though on a small number of trials with clustered

A common mechanism combined with normalization predicts that the shift in spatial attention in our task, between selecting two overlapped dot patches or two separated patches, will lead to a change in the pool of activity that

Researchers studying visual search have long held that visual features are extracted and processed in a parallel step where spatial information is prioritized (Treisman & Gelade, 1980; Wolfe, 1994). Physiology experiments, in turn, have gone on to separate the neural effects of spatial and feature cues using different tasks. Because of these operational differences, many studies have found that spatial and featural cues have unique behavioral effects.

In total 16 observers were subjects for the experiments (9 female, 7 male, mean age 25 y, range 19-37). All observers except one (who was an author) were naïve to the intent of the experiments. Three observers were excluded during the initial training sessions and one after data collection due to an inability to maintain appropriate fixation (see eye-tracking below). Potential observers were not considered for inclusion in the study if they self-reported any anomaly of color vision (e.g. color-blindness). At the start of the experiment observers completed the Ishihara test for color vision (Ishihara, 1987) and one observer was excluded due to anomalous responses.

refresh-rate of 120 Hz) at a 60 cm viewing distance. Output luminance and spectral luminance distributions were

Averaging task
On each trial in the averaging task observers were asked to report the average motion direction of two dot patches (Fig. 1). Before stimulus presentation a cue indicated to observers the features they would use to select the two dot patches. There were two ways that observers were instructed to select these dot patches out of the four on the screen: they could either be the two on the same side (left/right) or the two with the same color (yellow/blue).

Fig. 1). Each trial was followed by a brief inter-trial interval (0-2 s, uniformly distributed).

Estimation task

On each trial in the estimation task observers were asked to report either the color or motion direction of a single dot patch (Fig. 3). Before each block of 40 trials, observers were told which feature would be reported, with either the phrase "report color" or "report direction" appearing on the screen. During report-direction blocks, on each side one dot patch was colored blue and the other yellow, and all four dot patches moved in random directions (0 to 359 deg, uniformly distributed). During report-color blocks the stimulus properties were inverted: on each side one dot patch moved upwards and the other downwards, and the four dot patches were colored using angles in L*a*b* space (L* = 60, a* = cos θ, b* = sin θ, θ sampled from 0 to 359 deg, uniformly distributed).
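The color sampling described above can be sketched as follows. This is not the experiment code; the `chroma` radius is a hypothetical parameter exposed for illustration (the text gives unit coefficients for a* and b*), and the function name is an assumption.

```python
import numpy as np

def sample_patch_colors(n_patches=4, chroma=1.0, rng=None):
    """Sample hue angles and map them to L*a*b* as described in the text.

    L* = 60, a* = chroma * cos(theta), b* = chroma * sin(theta).
    The chroma radius defaults to the unit coefficient given in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.deg2rad(rng.uniform(0.0, 360.0, size=n_patches))
    lab = np.stack([np.full(n_patches, 60.0),   # fixed lightness L*
                    chroma * np.cos(theta),     # a* component
                    chroma * np.sin(theta)],    # b* component
                   axis=1)
    return theta, lab
```

Every sampled color then lies on a circle of constant lightness and chroma in L*a*b*, so hue angle is the only property that varies across patches.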

Before stimulus presentation a pre-cue indicated to observers the features of the target patch or gave no information, depending on the type of trial. There were two ways that observers were instructed to select dot patches out of the four on the screen: they could either be the two on the same side (left or right) or the two with the same feature (yellow or blue in cue-color blocks, up or down in cue-direction blocks).

These pre-cues were the same as in the averaging task, either lines (cueing left or right) or patches of dots (cueing either color or motion direction). The pre-cues were blocked, so that the same cue type was repeated for twenty trials (i.e. cue side trials repeated, sampled randomly between cue left and cue right). A post-cue always indicated the specific patch that needed to be reported. Each trial consisted of the following sequence (Fig. 3), with stimulus parameters chosen based on the averaging task to make the estimation task difficult for participants.

We also included uncued and no-distractor conditions as references. Comparing cued to uncued trials let us test for improved performance due to cueing, while trials without distractors gave us a measurement of the performance ceiling. In the uncued condition (Uncued, Fig. 3) observers were shown an uninformative pre-cue, then shown a post-cue which resolved which dot patch should be reported. In the no-distractor condition only the target patch was shown and observers reproduced the color or motion direction without interference. These control conditions were otherwise identical in timing to the regular trials and were also blocked in twenty-trial sets. In total, 30% of trials were cue side, 30% cue feature, 20% uncued, 10% no-distractor, and an unused condition accounted for the last 10%.

Psychophysical distance

We designed our analyses to avoid conflating poor sensitivity with high lapse rates by converting angular to psychophysical distance (Schurgin et al., 2020). When observers estimate motion direction or color in angular space, including in our data, they often make a large number of responses far from the target angle. At first glance, these appear to be guesses on lapse trials (Zhang & Luck, 2008). Previous work has demonstrated that

Averaging task analysis

To quantify observer accuracy in the averaging task we fit a simple model of perceptual sensitivity for angular estimation tasks, the "target confusability competition" model (Schurgin et al., 2020). In the following sections we will build up this model of observer behavior.

The model takes into account two aspects of sensory representations to predict observer behavior. First, it accounts for the "confusability" of stimuli by transforming angular distances into psychophysical distance (Eq. 1). In a second step, noisy internal channels tuned according to the psychophysical distance independently "compete" to represent a stimulus, in a manner analogous to signal detection (Fig. 4a). On each trial the model proceeds according to the following steps.

First, the stimulus angles are encoded by the channels. The tuning profile of each channel takes the form of the normalized psychophysical distance function (Eq. 1). An example from the estimation task is shown in Figure 4.

For a single trial with four dot patches (Fig. 4a) of varying color (θ_target, θ_side, θ_feature, and θ_distractor; Fig. 4b), a small set of channels (Fig. 4c) would be activated as in Figure 4d. The mean activation and range of noise are shown (colored markers and error bars) as well as example samples from those distributions (black markers). We use 100 channels in the full model, but the exact number of channels is an arbitrary hyperparameter. More channels provide better resolution to account for the data, up to some ceiling. In simulations we found that more than 100 channels provided only marginal benefit because correlations quickly accumulate in nearby channels.

Each channel's response (Fig. 4d) is normally distributed around a mean (µ) determined from the tuning profile, with standard deviation (σ) set to one:

a(θ_pref) ∼ N(µ = d′ · p(θ_stimulus, θ_pref), σ = 1)    (2)

where θ_pref is the preferred orientation for that channel, p is the function described in Eq. 1, and d′ controls the maximum amplitude of the response.

Next, we take the channel with the maximum response. The preferred orientation of this channel is the angle reported by the modeled observer (Fig. 4d). Because each channel has independent normally-distributed noise, the probability of any channel being reported can be computed as the conditional probability of that channel exceeding all of the other channels (Fig. 4e). We approximate this distribution by numerically integrating over channel responses a:

P(C_θ | θ_stimulus) = ∫ φ(a − µ_θ) ∏_{θ′≠θ} Φ(a − µ_θ′) da    (3)

where φ and Φ are the standard normal density and cumulative distribution, and µ_θ is the mean response of channel C_θ according to Eq. 2. This equation computes the probability that channel C_θ's response will exceed those of all the other channels and be chosen as the observer's response, given that they observed a dot patch with angle θ_stimulus. To compute the likelihood distribution across all angles we numerically evaluate Eq. 3 for each channel. We evaluate a in the range ±5d′, based on simulations which showed that this range was more than sufficient to capture the range of channel response values while remaining computationally tractable.
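The winning-channel probability can be approximated numerically along these lines. This is a sketch of the integration described above (density φ times the product of the other channels' cumulatives), not the authors' implementation; the function name and grid settings are assumptions.

```python
import numpy as np
from scipy.stats import norm

def win_probabilities(mu, n_grid=2001, half_width=5.0):
    """P(each channel has the maximum response), by numerical integration.

    mu: array of mean channel responses (d' times the tuning profile).
    The grid spans the means +/- half_width, mirroring the +/- 5 d'
    range described in the text.
    """
    mu = np.asarray(mu, dtype=float)
    a = np.linspace(mu.min() - half_width, mu.max() + half_width, n_grid)
    # Density of each channel's response at every grid point ...
    pdf = norm.pdf(a[None, :] - mu[:, None])
    # ... and probability that each channel's response falls below a.
    cdf = norm.cdf(a[None, :] - mu[:, None])
    log_cdf = np.log(cdf + 1e-300)  # epsilon guards log(0)
    # Product over all *other* channels: divide the full product out.
    prod_others = np.exp(log_cdf.sum(axis=0, keepdims=True) - log_cdf)
    p = np.sum(pdf * prod_others, axis=1) * (a[1] - a[0])  # Riemann sum
    return p / p.sum()  # normalize to a probability mass over channels
```

With equal means each channel wins with equal probability, and a channel whose mean is several units above the rest wins almost surely, matching the signal-detection intuition.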

The likelihood distributions are normalized as probability density functions, such that they integrate to one over all response angles. A free parameter d′ controls the maximum amplitude of the channel responses (Fig. 4d) and therefore the width of the likelihood distribution.

For the averaging task we fit d′ to the responses of individual observers by maximizing the likelihood of the observed data using Bayesian adaptive direct search (Acerbi & Ma, 2017). Note that the β parameters are used to fit the estimation task; for the averaging task we simply set β_target = 1 and the others to 0. We cross-validated the models by separating the data into ten folds, using nine to fit the model and evaluating the likelihood on the left-out fold. We repeated this leave-one-fold-out procedure to obtain the likelihood of the full dataset.
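The leave-one-fold-out procedure can be sketched as below. The authors fit with Bayesian adaptive direct search (BADS); here any fitting and likelihood function can be passed in, and all names are hypothetical.

```python
import numpy as np

def cross_validated_loglik(trials, fit_fn, loglik_fn, n_folds=10, seed=0):
    """Leave-one-fold-out cross-validation of a fitted observer model.

    fit_fn(train_trials) -> params
    loglik_fn(params, test_trials) -> summed log-likelihood of held-out trials
    Returns the total log-likelihood accumulated over all held-out folds.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(trials))        # shuffle before splitting
    folds = np.array_split(order, n_folds)
    total = 0.0
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        params = fit_fn([trials[i] for i in train_idx])
        total += loglik_fn(params, [trials[i] for i in test_idx])
    return total
```

Because every trial is held out exactly once, the summed log-likelihood covers the full dataset while never scoring a trial with parameters fit to it, which is why extra parameters are automatically penalized in the model comparisons below.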

To account for motor error we convolved the likelihood functions (Eq. 3) with an additional normal distribution with a 2° full width at half maximum (Fig. 4f). We also tested models with 1° and 3° distributions to ensure the results were robust to this parameter.
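The motor-error step can be sketched as a circular convolution. This is an illustrative implementation, not the authors' code: the FWHM-to-σ conversion is the standard one for a normal distribution, and the function name and binning are assumptions.

```python
import numpy as np

def add_motor_noise(likelihood, fwhm_deg=2.0):
    """Convolve a circular response likelihood with a normal motor-error kernel.

    likelihood: probability mass over evenly spaced angles covering 360 deg.
    fwhm_deg: full width at half maximum of the kernel (the text uses 2 deg,
    with 1 and 3 deg as robustness checks).
    """
    n = len(likelihood)
    step = 360.0 / n
    sigma = fwhm_deg / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    # Signed circular distance of each bin from zero, in degrees.
    offsets = (np.arange(n) * step + 180.0) % 360.0 - 180.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # FFT-based convolution wraps at 0/360, keeping the result on the circle.
    out = np.real(np.fft.ifft(np.fft.fft(likelihood) * np.fft.fft(kernel)))
    return out / out.sum()
```

Convolving on the circle (rather than on a line) matters here: probability mass near 0° correctly spills over to 359° instead of being truncated at the edge of the response range.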

Estimation task analysis

To understand how observers encoded the stimulus during the estimation task, we expanded the model to separate sensitivity (how precise an observer's reports were) from errors of selection (how likely observers were to report about the target or an erroneous patch). The estimation task model generalizes the averaging task model to account for the presence of four stimuli, allowing all four to modify an observer's reports.

To model the observer's trial-by-trial response, we assumed that four likelihood distributions (one for each of the four dot patches) contribute to the response. These correspond to the target, same-side, opposite-side with matched-feature, and opposite-side with mismatched-feature (green) patches, respectively (inset panel, top right, Fig. 4a,b).

The actual bias (β) values were calculated from three intermediate values: β_s, β_f, and β_d. These are computed in a simple hierarchy: first, β_s controls whether the correct side is sampled. Second, the parameters β_f and β_d determine whether the patch with the feature matching the target or the distractor is sampled. We constrained β_s, β_f, and β_d to the range [0, 1], which then also constrains β_target + β_side + β_feature + β_distractor = 1. In this way, the fit value of β_target will correspond to the proportion of trials in which an observer's response angle could be best attributed as having come from the target dot patch. β_side will correspond to the proportion of trials attributed to the dot patch on the same side as the target, and similarly for β_feature and β_distractor.
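One parameterization consistent with the constraints described above (three values in [0, 1] whose products yield four weights summing to 1) is the multiplicative hierarchy below. The exact factorization is an assumption for illustration, not taken from the text.

```python
def beta_weights(beta_s, beta_f, beta_d):
    """Map the three intermediate selection parameters to four patch weights.

    beta_s: probability of sampling the correct side.
    beta_f / beta_d: probability of sampling the feature-matched patch on
    the correct / incorrect side. This factorization is an assumption;
    it guarantees the four weights sum to 1 whenever all inputs lie in [0, 1].
    """
    assert all(0.0 <= b <= 1.0 for b in (beta_s, beta_f, beta_d))
    return {
        "target":     beta_s * beta_f,           # correct side, target patch
        "side":       beta_s * (1.0 - beta_f),   # correct side, other patch
        "feature":    (1.0 - beta_s) * beta_d,   # wrong side, matched feature
        "distractor": (1.0 - beta_s) * (1.0 - beta_d),
    }
```

A perfect selector (β_s = β_f = 1) puts all weight on the target, while lower values redistribute weight to the same-side, matched-feature, and distractor patches, mirroring how the fitted β values are interpreted as proportions of trials.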

The output of this model is then a full likelihood distribution (Fig. 4h), i.e. the probability that any given angle will be chosen as a response given the condition and stimulus (Eqn. 5).

In sum, we fit one sensitivity parameter (d′) and three intermediate bias parameters (β_s, β_f, β_d) for the data set in which each observer selected by location or color (and reported motion direction), and separately for the data set in which they selected by location or motion direction (and reported color). Each model thus fit four free parameters using approximately 700 trials of data.

Model statistics

To compare any two variants of the models we computed their cross-validated log-likelihood ratio (i.e., the difference in total log-likelihood). We use this statistic rather than other information criteria (e.g. the Akaike information criterion; Akaike, 1987) because the cross-validation procedure already penalizes models with additional parameters for over-fitting.

To evaluate the quality of model fits for the estimation task we computed a measure of variance explained. We binned the proportion of responses at equal degree intervals (32 bins, 11.25° each), generating a distribution of response angles, and compared these to the model's predicted distribution using the formula:

R²_pseudo = 1 − SS_res / SS_total

where SS_res and SS_total are the unexplained variance and total variance, computed from the proportion of responses y and the model predictions ŷ:

SS_res = Σᵢ (yᵢ − ŷᵢ)²,  SS_total = Σᵢ (yᵢ − ȳ)²

To obtain a measure of statistical significance we randomly permuted the responses made by each observer within their data and refit the models, then repeated this permutation procedure 100 times. After subtracting the mean, the resulting distribution of log-likelihoods had a 95% CI of [-2.22, 2.54]. This matches the common suggestion that when an information criterion statistic differs by more than two it should be considered statistically significant, while a difference larger than 10 indicates a substantial improvement in model fit. For all parameter comparisons we used permutation tests to compute confidence intervals on their differences.
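The binned variance-explained measure can be computed as follows, assuming the standard R² = 1 − SS_res/SS_total form named in the text; the function and its interface are hypothetical.

```python
import numpy as np

def pseudo_r_squared(response_angles, model_pmf, n_bins=32):
    """Binned variance-explained for estimation responses.

    response_angles: observed responses in degrees (wrapped to 0-360).
    model_pmf: callable returning the model's predicted probability for a
    bin center. Uses 32 bins of 11.25 deg each, as in the text.
    """
    edges = np.linspace(0.0, 360.0, n_bins + 1)
    counts, _ = np.histogram(np.asarray(response_angles) % 360.0, bins=edges)
    y = counts / counts.sum()                       # observed proportions
    centers = 0.5 * (edges[:-1] + edges[1:])
    y_hat = np.array([model_pmf(c) for c in centers])
    ss_res = np.sum((y - y_hat) ** 2)               # unexplained variance
    ss_total = np.sum((y - y.mean()) ** 2)          # total variance
    return 1.0 - ss_res / ss_total
```

A model that reproduces the binned proportions exactly scores 1, and poorer models score lower (the statistic can go negative for models worse than the mean), which is why it is reported as a pseudo-R².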

Model recovery

To estimate the statistical power of our data set and analysis we performed a model recovery simulation. Our focus was on estimating our statistical sensitivity (i.e. true positive rate) for various effect sizes of the d′ and β parameters. To estimate the sensitivity of the d′ parameter, we set up a series of simulated data sets, each consisting of 700 trials (i.e. equivalent to the data of one observer, for one task variant). These datasets were constructed by sampling response angles according to the same model used to fit the data, including the addition of motor noise (Fig. 4). We simulated d′ values of 1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.10, and 1.20, consistent with the range of d′ values observed in the uncued and cued data. We set β_target = 1 for these data sets. For each d′ value we generated 200 simulated data sets and fit these with our analysis pipeline. We then compared the fit d′ values against the distribution of values for the dataset with d′ = 1.00. A hit was counted if the fit value for a simulated data set with d′ > 1.00 was larger than the fit value for the data with d′ = 1.00. We bootstrapped the comparisons 10,000 times to estimate our sensitivity and report this (markers, Fig. 6e) relative to the observed effects. The fit of a saturating exponential function (black line), which captures the simulations well, is also shown.

We next set up a similar test to recover the β parameters. We simulated data starting from the uncued β parameters and shifted them toward the cued values in 10% increments. We generated simulated data sets for each set of parameters, fit the model to each, and compared the parameters of each 10% increment against the fit to the 0% data, bootstrapping these 10,000 times. We calculated the proportion of simulations in which the cued β went in the right direction relative to the uncued β and report the results for the β_target parameter in Figure 6f.

The other β parameters all shared this same sensitivity curve.
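The bootstrap comparison of fitted parameters can be sketched as below. This mirrors the procedure described above at a high level (resample the fits with and without a true difference, count hits), but the function name and interface are assumptions, not the authors' code.

```python
import numpy as np

def bootstrap_sensitivity(fit_effect, fit_null, n_boot=10_000, seed=0):
    """Estimate true-positive rate from two sets of fitted parameter values.

    fit_effect / fit_null: fitted values from the simulated data sets with
    and without a true difference (e.g. 200 fits each). A 'hit' is a
    resampled effect fit exceeding a resampled null fit.
    """
    rng = np.random.default_rng(seed)
    eff = rng.choice(fit_effect, size=n_boot, replace=True)
    nul = rng.choice(fit_null, size=n_boot, replace=True)
    return float(np.mean(eff > nul))
```

When the two sets of fits are drawn from the same distribution the hit rate sits near chance (0.5), and it approaches 1 as the simulated effect grows relative to the fitting noise, tracing out the sensitivity curves in Fig. 6e,f.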

Code and data accessibility

Code and data to reproduce the analysis and figures described can be accessed with the DOI 10.17605/OSF.IO/KMBTZ.