The fact that visual attention is rapidly engaged by sudden onsets is important for the survival of the organism, as through this mechanism one can give immediate attentional priority to potentially dangerous events and eventually react accordingly (Jonides & Yantis, 1988). Regardless of whether the attentional capture is fully automatic or under top-down control (Folk & Remington, 2015; Folk, Remington, & Johnston, 1992), should the organism be compulsively attracted by innocuous repetitive onsets, the resulting orienting behavior would become a waste of time and resources, and in the end disadvantageous for survival. Hence, evolution must have also selected a mechanism for suppressing the repetitive orienting of attention toward irrelevant onset stimuli when enough evidence has accumulated that they pose no threat to the organism.

In recent years, there has been increasing interest in the cognitive and neural mechanisms that allow the potentially distracting effect of irrelevant stimuli to be ignored or attenuated. A prominent view emerging from the “attention” research field postulates that distractor filtering is accomplished via top-down control by actively suppressing irrelevant information (e.g., Geng, 2014). In fact, the idea that the brain can learn to disregard irrelevant information dates back to Pavlov’s original study on the habituation of the orienting reflex (OR) (Pavlov, 1927). Sokolov (1963) later studied the OR and its habituation in more detail, and proposed the influential stimulus-model comparator theory. By interacting with the world, the brain would build a corresponding neural model, which is constantly updated as a function of the stimuli encountered. When a new stimulus does not match the current model, attention is oriented toward the novel sensory input to evaluate its significance for the organism. As recently discussed by Folk and Remington (2015), the model would be cognitively encapsulated and impenetrable to top-down factors, thus ensuring the automaticity of the attentional orienting toward salient and unexpected stimuli. However, another crucial prediction of the model, specifically addressed in our study, is that if the upcoming stimulus matches the neural model, then habituation of the corresponding OR takes place.

Stemming from Sokolov’s (1963) idea, other studies have proposed that habituation of the orienting of attention allows task-irrelevant stimuli that occur in a repetitive fashion to be filtered (e.g., Cowan, 1988; Elliott & Cowan, 2001). Accordingly, we have documented that capture of focused attention habituates over time as exposure to the irrelevant onsets continues (Pascucci & Turatto, 2015; Turatto & Pascucci, 2016). Although top-down suppression strategies could be used to exclude the distractors, habituation, which is a ubiquitous phenomenon in the animal kingdom (Thompson, 2009), indicates that other neural mechanisms can play a role in filtering unwanted stimulation (Ramaswami, 2014).

Because habituation of the OR requires the storage of a neural model of the to-be-ignored irrelevant stimulus, two germane questions arise with respect to the characteristics of this form of memory. The first is prompted by the observation that learned contextual information can help the allocation of attention toward the relevant target location (Chun & Jiang, 1998). Analogously, one may ask whether the neural model underlying habituation of capture is based only on the distractor features or, alternatively, whether it also contains contextual information (here provided by the display layout). The second question concerns the duration of the model of unwanted stimulation once it is no longer used. Perceptual learning studies have shown that the effects of training for the attended target can last for weeks or months (Ball & Sekuler, 1982), which implies that the learned information persists long after training. Likewise, it becomes interesting to ascertain whether the brain also forms long-term memories of the irrelevant distracting information.

Experiment 1: Habituation of capture is context dependent

Habituation is usually considered a form of non-associative learning (Groves & Thompson, 1970). Conversely, Wagner (1979) proposed an associative theory of habituation, postulating that with training an association is formed between the repetitive stimulus and the surrounding context. Later exposure to the same context generates the retrieval of the habituated stimulus representation in short-term memory. This representation reduces the attentional response usually triggered by the stimulus when it is initially presented and unexpected, thus showing response habituation. However, repeated presentation of the context without the stimulus would weaken the associative strength between the two representations (extinction), so that when the stimulus is reintroduced in the same context, a spontaneous recovery of the habituated response is observed. Crucially, to claim that habituation is context specific, spontaneous recovery should not be observed if, after training, both the stimulus and the context are omitted before the test phase.

To establish whether habituation of capture is context specific, on Day 1 we first exposed participants to a visual distractor while they performed a discriminative task with focused attention, which should lead to habituation of capture. Then, we evaluated spontaneous recovery of capture after the distractor was removed for approximately 48 hours, but in two different conditions (see Fig. 1). In the extinction condition the distractor was removed on Days 2 and 3 but reintroduced in the last block of trials of Day 3 to test capture. By contrast, in the control condition, participants did not perform the task on Day 2, and were tested with the distractor in a single block of trials on Day 3. Therefore, the extinction and control conditions were identical in terms of exposure to the distractor (Day 1, and one block on Day 3) and of the interval of time between training and test (two days), but differed in terms of context exposure between the training and the test phase.

Fig. 1

Each square represents a block of 100 trials. Dark squares are blocks in which the distractor was present on 50% of the trials. Light squares are blocks in which the distractor was omitted. On Day 2 and on the first four blocks of Day 3, the extinction group performed the task without distractors

Method

Participants

Fifty-two undergraduate students (41 female; mean age = 22.1 years) of the University of Trento were recruited from the Department of Psychology for course credits. They had normal or corrected-to-normal vision and were all naïve as to the purpose of the experiment. Informed consent was obtained from all participants. All the experiments were carried out in accordance with the Declaration of Helsinki, and with the approval of the local institutional ethics committee (Comitato Etico per la Sperimentazione con l’Essere Umano, Università degli Studi di Trento, Italy).

Apparatus

Stimuli were presented on a 23.6-inch VIEWPixx/EEG color monitor (1920 × 1080, 100 Hz) and generated with a custom-made program written in MATLAB and the Psychophysics Toolbox (Pelli, 1997) running on a Dell Precision T1600 machine (Windows 7 Enterprise). Eye fixation was monitored with an Eyelink 1000 Desktop Mount system (SR Research, Ontario, Canada).

Stimuli and procedure

Each trial started with the presentation for 1,200 ms of the fixation point surrounded by four circles (inner diameter of 4°; outer diameter of 4.15°) positioned at the corners of an imaginary square (diagonal of 22.62°) centered on the fixation point. Three circles were light gray (7 cd/m²) and one was red (17 cd/m²), and were shown on a dark-gray background (0.07 cd/m²). The red circle served as cue to indicate the position of the upcoming target. The position of the cue was randomly assigned on each trial. On distractor-present trials, 200 ms before the target occurrence a high-luminance white annulus frame (inner diameter of 3.75°, outer diameter of 4.25°, 52.5 cd/m²) was superimposed for 100 ms on one of the three light-gray circles, thus creating a sudden visual onset distractor (see Fig. 2). The position of the distractor relative to the target position was balanced across trials.
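The display geometry described above can be checked with a short sketch. The square diagonal (22.62°), the monitor diagonal (23.6 inches), and the resolution (1920 × 1080) come from the text; the viewing distance is not reported here, so the 60 cm used below is purely a hypothetical value, as are the function names. The sketch also assumes square pixels and a flat screen.

```python
import math

DIAGONAL_DEG = 22.62           # from the text: diagonal of the imaginary square
SCREEN_DIAGONAL_INCH = 23.6    # from the text
SCREEN_RES_PX = (1920, 1080)   # from the text
VIEWING_DISTANCE_CM = 60.0     # HYPOTHETICAL: not reported in the text


def square_layout(diagonal_deg):
    """Return (side, eccentricity, corner offsets), all in degrees of visual angle."""
    side = diagonal_deg / math.sqrt(2)   # side of the imaginary square
    eccentricity = diagonal_deg / 2      # distance of each circle from fixation
    half = side / 2
    corners = [(-half, half), (half, half), (-half, -half), (half, -half)]
    return side, eccentricity, corners


def pixels_per_cm(diagonal_inch, res_px):
    """Pixel density along the horizontal axis, assuming square pixels."""
    w_px, h_px = res_px
    width_cm = diagonal_inch * 2.54 * w_px / math.hypot(w_px, h_px)
    return w_px / width_cm


def deg_to_px(deg, viewing_distance_cm, px_per_cm):
    """Convert an extent centered on fixation from degrees to pixels."""
    size_cm = 2 * viewing_distance_cm * math.tan(math.radians(deg) / 2)
    return size_cm * px_per_cm


side, ecc, corners = square_layout(DIAGONAL_DEG)
density = pixels_per_cm(SCREEN_DIAGONAL_INCH, SCREEN_RES_PX)
side_px = deg_to_px(side, VIEWING_DISTANCE_CM, density)
print(f"side = {side:.2f} deg, eccentricity = {ecc:.2f} deg")
```

Under these assumptions the square has a side of about 16° and each placeholder lies about 11.31° from fixation; the pixel value depends entirely on the assumed viewing distance.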

Fig. 2

Schematic representation of the experimental paradigm trial events. The circle with the dashed line was red and served as cue for the target location. See Experiment 1 method for further details

Participants were instructed to maintain fixation on the central point while focusing their attention exclusively on the cue. The task was to report as quickly as possible the orientation (left vs. right) of the target line by pressing the corresponding arrow on the computer keyboard. Response times (RTs) were recorded from the target appearance, and the maximum time allowed for responding was 1,500 ms. Trials in which participants did not respond within this time window were excluded from the analysis (<1% in total). Error feedback was provided by a message presented on the screen for 500 ms at the end of the trial.

When an eye movement or blink was detected in the first 500 ms of the trial, the trial was aborted and restarted. If an eye movement was detected during the presentation of either the distractor or the target, an error message appeared on the screen, and the trial was discarded from the analysis.

On Day 1, all participants (N = 52) performed the task with the distractor in five blocks of 100 trials each. After Day 1, one group of participants (N = 26) was assigned to the extinction condition. On Day 2, and in the first four blocks of Day 3, they performed the same task as in Day 1 but without the distractor. They were then tested with the distractor on Block 5 of Day 3. The other group of participants (N = 26) was assigned to the control condition. These participants did not perform the task on Day 2 and were tested with the distractor in a single block of trials on Day 3.
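The claim that the two groups were matched for distractor exposure but differed in context exposure can be made explicit with a small sketch of the block schedule. Block size (100 trials) and distractor rate (50%) are taken from the Fig. 1 caption. The number of blocks on Day 2 is not stated in the text and is assumed here to be five, like Day 1; the function names are illustrative only.

```python
TRIALS_PER_BLOCK = 100     # from the Fig. 1 caption
DISTRACTOR_RATE = 0.5      # distractor present on 50% of trials
DAY2_BLOCKS_ASSUMED = 5    # ASSUMPTION: Day 2 block count is not stated in the text


def build_schedule(group):
    """Return a list of (day, block, distractor_blocks_flag) tuples for one group."""
    # Day 1 training is identical for both groups: five blocks with the distractor.
    schedule = [(1, block, True) for block in range(1, 6)]
    if group == "extinction":
        # Context alone on Day 2 and in the first four blocks of Day 3.
        schedule += [(2, block, False) for block in range(1, DAY2_BLOCKS_ASSUMED + 1)]
        schedule += [(3, block, False) for block in range(1, 5)]
        schedule.append((3, 5, True))   # test block with the distractor
    elif group == "control":
        schedule.append((3, 1, True))   # single test block, no task on Day 2
    else:
        raise ValueError(f"unknown group: {group!r}")
    return schedule


def distractor_trials(schedule):
    """Total number of distractor-present trials in a schedule."""
    per_block = int(TRIALS_PER_BLOCK * DISTRACTOR_RATE)
    return sum(per_block for _, _, present in schedule if present)


def context_only_blocks(schedule):
    """Number of blocks performed without any distractor."""
    return sum(1 for _, _, present in schedule if not present)


extinction = build_schedule("extinction")
control = build_schedule("control")
print(distractor_trials(extinction), distractor_trials(control))
print(context_only_blocks(extinction), context_only_blocks(control))
```

Both schedules contain the same number of distractor-present trials; only the extinction schedule contains context-only blocks, which is the manipulation of interest.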

Each block of trials was preceded by the gaze calibration procedure. On Day 1, before the beginning of the experiment, participants performed 20 practice trials, in which the distractor was never presented, to familiarize themselves with the task.

Results and discussion

Eye movements (<2% of the trials) were discarded prior to the analyses on RTs. Errors were <2% and were not further analyzed. RTs shorter than 150 ms or longer than 1,000 ms were treated as outliers and removed from the analyses (<1%). To begin with, RTs on correct trials of Day 1 (for all participants, N = 52) were entered into an ANOVA for repeated measures with Onset (present vs. absent) and Block as factors. The factor Onset, F(1, 51) = 26.500, p < .001, η² = 0.342, the factor Block, F(4, 204) = 24.712, p < .001, η² = 0.326, and the Onset × Block interaction, F(4, 204) = 9.325, p < .001, η² = 0.155, were significant. Figure 3a depicts the amount of capture, defined as the RT difference between onset-present trials and onset-absent trials, as a function of block, and shows that, in agreement with the habituation hypothesis, the attentional capture response triggered by the onset decreased with practice. This was confirmed by pairwise comparisons (two-tailed t tests) showing that the amount of capture decreased significantly between Block 1 (M = 21 ms, SD = 3) and Block 5 (M = 4 ms, SD = 2; p < .001). Actually, habituation of the attentional response triggered by the onset was robust enough to make participants fully immune to distraction in the last two blocks of trials, as attested by the fact that the amount of capture did not differ from zero in Blocks 4 and 5 (all ps > .1).
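The capture measure used above (mean RT on onset-present trials minus mean RT on onset-absent trials, per block, on correct trials within the 150 to 1,000 ms window) can be sketched for a single participant as follows. The trial records and field names are invented for illustration and are not data from the study; this is a minimal sketch, not the analysis code actually used.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 150, 1000   # trimming bounds reported for Experiment 1


def capture_by_block(trials):
    """Return {block: mean RT(onset present) - mean RT(onset absent)} in ms.

    Only correct trials with RTs inside the trimming window are used.
    """
    cells = {}
    for trial in trials:
        if not trial["correct"]:
            continue
        if not (RT_MIN_MS <= trial["rt_ms"] <= RT_MAX_MS):
            continue
        cell = cells.setdefault(trial["block"], {True: [], False: []})
        cell[trial["onset"]].append(trial["rt_ms"])
    return {
        block: mean(cell[True]) - mean(cell[False])
        for block, cell in sorted(cells.items())
        if cell[True] and cell[False]
    }


# Invented example trials (NOT data from the study).
trials = [
    {"block": 1, "onset": True, "rt_ms": 520, "correct": True},
    {"block": 1, "onset": True, "rt_ms": 540, "correct": True},
    {"block": 1, "onset": False, "rt_ms": 500, "correct": True},
    {"block": 1, "onset": False, "rt_ms": 520, "correct": True},
    {"block": 1, "onset": True, "rt_ms": 1200, "correct": True},   # outlier, dropped
    {"block": 1, "onset": False, "rt_ms": 300, "correct": False},  # error, dropped
    {"block": 5, "onset": True, "rt_ms": 480, "correct": True},
    {"block": 5, "onset": False, "rt_ms": 476, "correct": True},
]

capture = capture_by_block(trials)
print(capture)
```

A positive value indicates slower responses when the onset was present; habituation corresponds to this difference shrinking toward zero across blocks.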

Fig. 3

Results of Experiment 1. a Habituation of attentional capture (Day 1) across blocks of training for the group of 52 participants. b Habituation (Day 1) and spontaneous recovery of capture at test (Day 3) as a function of group. The control group (N = 26) did not perform the task on Day 2, and was directly tested on Day 3. The extinction group (N = 26) performed the task without the distractor on Day 2 and in the first four blocks of trials of Day 3, and was tested with the distractor in the last block of Day 3. Bars represent ±1 SEM

The next crucial question was whether the attentional capture response recovered at test on Day 3, as a function of whether, before the test, participants were exposed to the context without the distractor (extinction condition) or not (control condition). To this aim, we first analyzed the amount of capture for the group of participants assigned to the control condition. The results at test (see Fig. 3b) clearly showed that there was no sign of spontaneous recovery of capture. RTs on Day 3 (M = −2 ms, SD = 4) were significantly different from those in Block 1 of Day 1 (p = .001) and did not differ from those in Block 5 of Day 1 (p = .546). Conversely, for participants assigned to the extinction condition, the results at test showed a spontaneous recovery of capture (see Fig. 3b). RTs on Day 3 (M = 18 ms, SD = 5) were significantly different from RTs in Block 5 of Day 1 (p < .001) but did not differ from RTs in Block 1 of Day 1 (p = .634).

By showing recovery of capture only in the extinction condition, the results suggest that habituation of capture was context specific and that the neural model representing the distractor onset also contained contextual information (Wagner, 1979).

Experiment 2: Long-term distractor memory

Having established that the neural model on which habituation of attentional capture relies is context specific, the next step was to investigate the duration of this memory representation. Therefore, we tested whether habituation of capture was still present one or two weeks after training.

Method

Participants

Thirty-two undergraduate students (25 female; mean age = 20 years) of the University of Trento were recruited from the Department of Psychology and Cognitive Sciences for course credits. They had normal or corrected-to-normal vision, and were all naïve as to the purpose of the experiment.

Apparatus

As in Experiment 1.

Stimuli and procedure

The training phase was identical to Day 1 of Experiment 1. Then, participants (N = 32) were divided into two groups, both tested in a single block of trials, either on Day 7 (N = 16) or on Day 14 (N = 16).

Results and discussion

Eye movements (<1%) were discarded prior to the analyses on RTs. Errors were <2% and were not further analyzed. RTs shorter than 150 ms or longer than 1,500 ms were treated as outliers and removed from the analyses (<1%). RTs on correct trials of Day 1 were entered into an ANOVA for repeated measures with Onset (present vs. absent) and Block as factors. The factor Block, F(4, 124) = 19.172, p < .001, η² = 0.382, and the Onset × Block interaction, F(4, 124) = 11.123, p < .001, η² = 0.264, were significant. Figure 4a depicts the amount of capture as a function of block, and confirms that attentional capture habituated with practice. Accordingly, pairwise comparisons showed that the amount of capture decreased significantly between Block 1 (M = 19 ms, SD = 4) and Block 5 (M = −4 ms, SD = 4, p < .001). Actually, the onset failed to capture attention in Blocks 2, 3, 4, and 5, in which the RT difference between onset-present and onset-absent trials was not significant (all ps > .05).

Fig. 4

Results of Experiment 2. a Habituation of attentional capture (Day 1) across blocks of training for the group of 32 participants. b Habituation (Day 1) and spontaneous recovery of capture as a function of the day of test. One group (N = 16) was tested on Day 7, while the other group (N = 16) was tested on Day 14. Bars represent ±1 SEM

We then tested whether the attentional capture response was still habituated after one or two weeks. First, we analyzed the amount of capture for the group of participants tested on Day 7. The results (see Fig. 4b) clearly showed that on Day 7 habituation of attentional capture was still complete. RTs on Day 7 (M = 6 ms, SD = 5) were significantly different from those in Block 1 of Day 1 (p = .029), but did not differ from those in Block 5 of Day 1 (p = .151). Even more remarkably, when participants were exposed to the distractor after two weeks, it failed to capture attention, showing that the degree of habituation was comparable to that found at the end of training on Day 1 (see Fig. 4b). RTs on Day 14 (M = −2 ms, SD = 5) were significantly different from RTs in Block 1 of Day 1 (p = .013), and did not differ from RTs in Block 5 of Day 1 (p = .571).

These results show that approximately 45 minutes of training with a distractor appearing on 50% of the trials were sufficient to produce a complete habituation of attentional capture, and that habituation was based on an enduring memory of the distractor representation, which lasted for at least 2 weeks after training.

General discussion

Our results are in agreement with the associative model of long-term habituation proposed by Wagner (1979), which assumes that a novel and salient stimulus (here, the distractor) attracts focused attention, thus entering a primary attentional state called A1. This representation decays rapidly, but, meanwhile, a short-term representation of the stimulus is formed, called the secondary state (A2). A2 is also short-lived, but, crucially, while active it prevents another presentation of the stimulus from fully capturing attention again (i.e., from entering the A1 state), thus forming the basis for short-term habituation. Furthermore, when a stimulus is repeatedly presented in a given context, a long-term association is formed between the two corresponding representations. Long-term context-specific habituation takes place because, when encountered, the context itself triggers retrieval of the stimulus in the A2 state, thus decreasing the attentional capture response elicited by the stimulus. These cognitive mechanisms would explain why in Experiment 1 capture recovered in the extinction condition, given that on Days 2 and 3 the distractor-context association decayed. Conversely, the association remained unchanged in the control condition.
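The logic of this account can be illustrated with a deliberately simplified toy simulation. It is not Wagner's (1979) full model and is not fitted to our data: the context-distractor association is reduced to a single strength between 0 and 1, and the learning rate, extinction rate, and baseline capture value are all hypothetical numbers chosen only to show the qualitative pattern. The nine context-only blocks assume five blocks on Day 2, which the text does not state.

```python
def update_association(strength, stimulus_present,
                       learn_rate=0.5, extinction_rate=0.3):
    """One block in the context (toy rule, HYPOTHETICAL rates).

    With the distractor present the context-distractor association grows
    toward 1; with the context alone it decays toward 0 (extinction).
    """
    if stimulus_present:
        return strength + learn_rate * (1.0 - strength)
    return strength - extinction_rate * strength


def predicted_capture(strength, baseline_ms=20.0):
    """Capture shrinks as the context pre-activates the distractor representation.

    baseline_ms is a HYPOTHETICAL value for an unexpected distractor.
    """
    return baseline_ms * (1.0 - strength)


def run_blocks(strength, blocks):
    """Apply the update rule over a sequence of blocks (True = distractor present)."""
    for stimulus_present in blocks:
        strength = update_association(strength, stimulus_present)
    return strength


# Day 1: five training blocks with the distractor.
v_trained = run_blocks(0.0, [True] * 5)

# Control: no exposure to the context before test, so the association is unchanged.
v_control = v_trained

# Extinction: nine context-only blocks before test (five on Day 2 is an ASSUMPTION).
v_extinction = run_blocks(v_trained, [False] * 9)

print(f"control test capture:    {predicted_capture(v_control):.1f} ms")
print(f"extinction test capture: {predicted_capture(v_extinction):.1f} ms")
```

With these arbitrary parameters, predicted capture at test stays near zero when the context is withheld and returns close to its initial level after context-only exposure, which is the qualitative pattern the associative account predicts.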

Although our results find a straightforward explanation in Wagner’s (1979) model, an alternative account could be provided by the singleton-detection versus feature-search mode distinction introduced by Bacon and Egeth (1994). According to this view, on Day 1 with practice participants adopted a feature-search mode for the red cue, which progressively weakened the onset capture. When on Days 2 and 3 the onsets were removed, participants reset their attention strategy to a less cognitively demanding singleton-detection mode, as the only singleton present in the display was the red cue. Hence, when in the test block of Day 3 the onsets were reintroduced, the singleton-detection strategy made the attention system vulnerable to distraction again. There are, however, reasons that make us less sympathetic to this explanation. To begin with, this account must rest on the assumption that in Block 1 participants, with practice, shifted from a singleton-detection strategy, responsible for the initial high degree of capture, to a feature-search strategy, thus reducing distraction. Yet this assumption seems questionable, since in Bacon and Egeth’s (1994) original study an ad hoc manipulation of the search display was necessary to force participants to abandon the singleton-detection strategy in favor of the feature-search strategy. In other words, the adoption of a more cognitively demanding attention strategy did not occur spontaneously because of exposure to the distractor. For the same reason, it seems to us unlikely that such a shift in attention strategy occurred in our paradigm.

The context-specific habituation that emerged in Experiment 1 can also be easily accommodated in the stimulus-model comparator theory proposed by Sokolov (1963), which postulates that the more the sensory input matched the model, the more the attentional response to the distractor was inhibited, thus leading to habituation. When the distractor was then removed in the extinction group, the model was updated and consisted only of the irrelevant static stimuli forming the context. The reappearance of the onsets in the test block created a mismatch with the model, and a recovery of capture was observed. The relevance of the Sokolovian model for the automaticity of attentional capture has recently been discussed by Folk and Remington (2015), who showed that infrequent onsets do capture attention in a purely automatic fashion, thus bypassing any top-down control. However, our work presents several novel aspects that were not considered in this previous study. First, we confirmed that capture of fully focused attention habituates as a function of exposure to the onsets (Turatto & Pascucci, 2016); second, such habituation was context specific, with context defined by the spatial layout of the display; third, the neural model of the distractor (and of the corresponding context) was stored in a long-term memory that endured unchanged for at least 2 weeks.

As for the role of context, it is well established that in visual search contextual information guides attention toward the target (Chun & Jiang, 1998). Furthermore, contextual information can promote the use of the specific attentional set (singleton-detection vs. feature-search) used during the visual-search task, thus making the attentional system more or less vulnerable to distraction (Cosman & Vecera, 2013). By contrast, in the habituation of capture that we documented, contextual information did not regulate the attentional set based on the target information, but rather it affected the neural model used to filter the unwanted stimulation. More generally, while the guidance of attention toward the relevant stimuli can be controlled by different memory systems (Hutchinson & Turk-Browne, 2012), our study shows that attentional selection based on distractor filtering capitalizes on a long-term memory of the irrelevant information.

Recent studies have proposed that the brain can prevent unwanted attentional capture by means of top-down filtering mechanisms that lead to distractor suppression (e.g., Cunningham & Egeth, 2016; Geng, 2014; Marini, Chelazzi, & Maravita, 2012). However, habituation and the underlying neural mechanisms have long been suggested to operate precisely by filtering the irrelevant and unwanted sensory information (Groves & Thompson, 1970; Sokolov, 1963) and consequently would be intimately connected with attentional selection (Cowan, 1988; Turatto & Pascucci, 2016). This idea has been beautifully captured in a recent theoretical model of adaptive filtering linking together habituation, attention, and predictive coding (Ramaswami, 2014).

To conclude, our results show that habituation of capture is context dependent and relies on long-lasting memories of irrelevant information. Thus, attentional selection seems regulated by two neural models or cognitive sets (see also Noonan et al., 2016): the well-known top-down set based on the relevant information (the target), and a second model, likely more bottom-up in nature, dedicated to the filtering of unwanted stimulation. As for the latter, the neural and cognitive mechanisms underlying habituation seem to provide, at least in our paradigm, a parsimonious and straightforward explanation for the progressive reduction of the distracting power of recurring task-irrelevant visual onsets.