Hierarchical modulation of auditory prediction error signaling is independent of attention

The auditory system is tuned to detect rhythmic regularities and irregularities in the environment. These can occur on different timescales, short (local) and long (global), and the regularities at the two timescales can conflict or converge. While the MMN and the P3b are thought to index local and global deviance, respectively, it is not clear how these hierarchical levels interact and to what extent attention modulates this interaction. We used a hierarchical oddball paradigm with local (sequence-level) and global (block-level) violations of regularities in 5-tone sequences, in attended and unattended conditions. The amplitude of the negativity in the N2 timeframe and of the positivity in the P3b timeframe elicited by the final tone in the sequence were analyzed in a 2×2×2 factorial model (local status, global status, attention condition). We found a significant interaction between the local and global status of the final tone on the N2 amplitude (p < .001, ηp² = .55), while there was no significant three-way interaction with attention (p > .05), together demonstrating that lower-level prediction error is modulated by detection of higher-order regularity but expressed independently of attention. Regarding P3b amplitude, we found a significant main effect of global status (p < .001, ηp² = .42) and an interaction between global status and attention (p < .001, ηp² = .70). Thus, higher-level prediction error, indexed by the P3b, is sensitive to global regularity violations if the auditory stream is attended. The results demonstrate the capacity of auditory perception to rapidly resolve conflicts between different levels of the predictive hierarchy, as indexed by MMN modulation, while the P3b represents a different, attention-dependent system.

Introduction

Our perception relies on prediction to facilitate the decoding of sensory information. The predictive coding theories of perception suggest that the brain tries to minimize the surprise or prediction error, and continuously uses the unpredicted portion of the sensory input to adjust the predictive models (1). A crucial component of predictive coding is the hierarchical organization of perceptual systems, with higher levels, which represent slower-changing regularities, modulating the processing of lower-level predictive units, which integrate over shorter time (2). Such nested hierarchical systems are crucial in human speech processing, where the probability of a sound depends on its immediate local environment such as the syllable structure, whereas word and sentence rules in the given language give wider context which would need to be taken into account when predicting the subsequent sound (3). How such hierarchically nested rules are extracted from the auditory stream, and how they interact with each other and with other systems such as attention and long-term memory, are central to our understanding of auditory perception.

Unpredictable deviations from the predicted pattern in auditory input are detected and given special processing in the auditory analysis of the incoming sounds. This can be detected using event-related potentials, where mismatch negativity (MMN), a negativity best visible in the difference wave between a predicted and an unpredicted stimulus, arises 150-250 ms after the onset of the deviant (4). While neuronal adaptation contributes to MMN generation, the MMN cannot be fully explained by adaptation, as it can be elicited by a deviant which is physically identical to the standard but violates an otherwise established rule (5, 6). MMN has been suggested to reflect the prediction error which results when a perceptual predictive model meets sensory input (7). The portion of the input which is compatible with the model will be attenuated, whereas the unpredicted portion will be propagated further and used to update the predictive model. This would explain why physically more different stimuli, as well as less probable stimuli, lead to larger MMN: the capacity of the model to explain the incoming information is lower, and consequently larger error variance will be propagated (7, 8).

Most MMN studies have examined the predictability of a stimulus with regard to its immediate environment: the immediately preceding sounds, presented with a relatively short interstimulus interval (<1000 ms), where the predictive model would predict continuation of the pattern established by the immediately preceding stimuli (9). However, in addition to such "local" rules, which govern whether the stimulus is predictable from a short train of preceding stimuli, there could be more general rules about the predictability of the current stimulus in a wider context, which could agree with or diverge from the local rules. For example, a stimulus which is locally unpredictable, deviating from several preceding identical stimuli, could be following a rule establishing its regular occurrence over a longer time period. Such a "global" rule would take into account the probability of the stimulus in a more complex situation. Paradigms which have examined such "local"/"global" rules have used a set-up where two stimuli, A and B, are presented in regularly structured sequences (e.g., AAAAB), so that the model could build up an expectancy about a physical deviant, which could be rarely violated by a physical standard (e.g., by a sequence AAAAA) (10). Such pattern extraction-MMN (11) has elicited considerable interest as it addresses the question how the

It has been suggested that simple tone vs. complex pattern deviations, corresponding to violating the local and the global rule, respectively, are processed by different neural systems, with simple-feature deviance-detection occurring at earlier levels of auditory processing and increasingly complex rule deviations detected on higher levels (12). An important question, however, is whether the higher levels can modify the processing in the earlier levels, as suggested by the hierarchical coding model (1): when the global rule has been learned, can it modify the detection of the deviance due to the local rule, and suppress the MMN for a stimulus which differs from its immediate environment but is predictable according to the global rule?

The MMN generation to an unpredictable stimulus also appears to be connected to the distribution of attentional resources: an unexpected deviance which the subjects can detect typically elicits not only the MMN but also ERPs associated with attention, such as the P3a component, which has been linked to general novelty-processing and attention-switching mechanisms (13). However, the link between MMN and attentional processes does not appear to be directly causal: the P3a is not an obligatory sequel to MMN, but instead appears to depend on the level of MMN or be linked to N1 instead (14-16). Another aspect of the MMN-attention relationship is the question whether attentional resources are required for deviance detection to unfold in the first place. While MMN is often referred to as attention-independent, relying on the observation that a robust MMN is elicited when the subjects are attending visual stimuli (5), it has been proposed that attention can reorganize the auditory stream before it is passed on to the processes which lead to MMN generation (17). One interesting situation where such attention-dependent reorganization could have a substantial effect is the detection of regularities which unfold over a longer time period: seconds or even minutes.

Crucially, however, global-rule formation has been suggested to be attention-dependent. An

The participants were 20 healthy young adults (10 female) recruited from the community of the University of Bergen. Three subjects' data were discarded due to excess movement artefacts during the EEG session; the analyses are based on 17 subjects (7 female

Each block began with 25 repetitions of the sequence which was the global standard for that block, serving as a model-building phase, following which 150 sequences were presented (in an 80/20 ratio). The block length was 4 minutes 22 seconds. The inter-block interval was 3 seconds.

In total, 8 blocks were presented: four with XXXXX as global standard and four with XXXXY as global standard.
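The block structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the order of standards and deviants was randomized, so the sketch uses an unconstrained shuffle with exactly 120 standards and 30 deviants, and it assumes that the global deviant in each block is the other of the two sequence types. The function names `make_block` and `classify` are hypothetical.

```python
import random

# Assumption: the global deviant of a block is the other sequence type.
OTHER_SEQUENCE = {"XXXXX": "XXXXY", "XXXXY": "XXXXX"}


def make_block(global_standard, n_habituation=25, n_test=150,
               p_standard=0.8, seed=0):
    """Return one block as a list of 5-tone sequence labels.

    The block opens with n_habituation global standards (model-building
    phase), followed by n_test sequences in an 80/20 standard/deviant ratio.
    Assumption: the test phase is an unconstrained random shuffle.
    """
    global_deviant = OTHER_SEQUENCE[global_standard]
    n_standard = round(n_test * p_standard)
    test_phase = ([global_standard] * n_standard
                  + [global_deviant] * (n_test - n_standard))
    random.Random(seed).shuffle(test_phase)
    return [global_standard] * n_habituation + test_phase


def classify(sequence, global_standard):
    """Return the (local, global) status of the final tone of a sequence."""
    local = "deviant" if sequence[-1] != sequence[0] else "standard"
    global_status = "standard" if sequence == global_standard else "deviant"
    return local, global_status


if __name__ == "__main__":
    # Eight blocks: four with each sequence type as the global standard.
    for i, standard in enumerate(["XXXXX", "XXXXY"] * 4):
        block = make_block(standard, seed=i)
        n_deviants = sum(seq != standard for seq in block)
        print(standard, len(block), n_deviants)
```

Under these assumptions, `classify` makes the factorial design explicit: in a block with XXXXY as the global standard, the sequence XXXXY is a local deviant but a global standard, whereas XXXXX is a local standard but a global deviant.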

Two attentional conditions were used: attended and unattended stimulation. In the attended condition the subjects were asked to monitor the tones; compliance was checked by asking the subjects to report on the sound characteristics after the recording using a 5-item questionnaire.

All subjects reported detecting some of the regularities present in the sound streams, with an average score of 4.1 out of 5 possible (only three subjects scoring 2 or 3, the others 4 or 5). In the unattended condition the subjects were asked to perform a visual working memory task. In the visual n-back task, abstract visual objects (Fribbles, TarrLab, http://www.tarrlab.org) were presented asynchronously with the auditory stimuli, and the subjects were asked to press a button if the object was the same as the one presented 2 objects previously. Compliance was checked by examining the response profile of the visual n-back task. All subjects gave responses to the visual task. The mean accuracy (proportion of hits and correct rejections) was 0.87 (range 0.77-0.95, sd 0.06); the mean sensitivity index d' was 1.65 (range 0.82-2.74, sd 0.62).

After recording, the data were filtered offline using a zero-phase Butterworth IIR filter with a high-pass threshold of 0.01 Hz (slope 12 dB/oct) and a low-pass threshold of 30 Hz (slope 12 dB/oct). The data were downsampled to 250 Hz. Eye movements were removed using the Gratton-Coles algorithm implemented in the BrainVision Analyzer. The data were epoched into segments relative to the onset of the first tone in the 5-tone sequence. Epochs spanned from -100 to 1348 ms relative to the onset of the first tone, covering the entire 5-tone sequence.
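The accuracy and sensitivity index d' reported for the visual n-back task can be illustrated with a short Python sketch. The text does not state how d' was computed, so this is an assumption: the standard definition d' = z(hit rate) - z(false-alarm rate), with a log-linear correction added here only to avoid infinite z-scores at rates of 0 or 1. The function names and the response counts in the usage example are hypothetical, not data from the study.

```python
from statistics import NormalDist


def accuracy(hits, misses, false_alarms, correct_rejections):
    """Proportion of hits and correct rejections among all trials."""
    total = hits + misses + false_alarms + correct_rejections
    return (hits + correct_rejections) / total


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count and 1 to
    each denominator) keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts for one subject, for illustration only.
    counts = dict(hits=30, misses=10, false_alarms=5, correct_rejections=55)
    print(f"accuracy = {accuracy(**counts):.2f}")
    print(f"d' = {d_prime(**counts):.2f}")
```

A subject who responds at chance, with equal hit and false-alarm rates, obtains d' = 0 under this definition, and larger positive values indicate better discrimination of 2-back targets from non-targets.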

The waveforms for N2 as well as difference waveforms isolating the MMN component are depicted in Figure 1, rows A and B. As can be seen in the difference waveforms (Fig. 1, B), a clear MMN was present in all conditions; testing the peak amplitude at the FCz electrode against zero indicated that the peak value was significantly below zero in all six conditions (all p<.005). We then examined the effect of the different experimental conditions on the N2 peak (Fig. 1, A) using the factorial general linear model as described above. In FCz, there was a for the Local factor depended on the level of the Global factor: for items that were also global standards, the local standard-deviant difference was smaller (sta: -.93, dev: -1.45; t(16)=-1.6, p=.12, d=.36) than for items that were also global deviants (sta: -1.9, dev: -4.5; t(16)=-5.5, p<.001, d=1.5).

In the left mastoid, for items that were also global standards, the local standard-deviant difference was smaller (sta: 1.1, dev: 1.3, t=1.11, p=.28, Cohen's d: 0.28) than for the items that were also global deviants (sta: 1.8, dev: 3.34, t=5.5, p<.001, Cohen's d: 1.3). In the right mastoid, a similar pattern was seen: for items that were also global standards, the local standard-deviant difference was smaller (sta: .66, dev: 1.03, t=2.09, p=.05, Cohen's d: .53) than for the items that were also global deviants (sta: 1.3, dev: 2.28, t=3.4, p=.004, Cohen's d: .91).

In this experiment we exposed subjects to a hierarchical auditory structure where the frequency of a tone was predicted by two independent rules in a factorial design: a local rule

We additionally characterized the MMN wave generated by the deviants relative to the standard sequence. The difference waves relative to the sequence which was both a local and a global standard (Fig. 1, B) show the expected significant negativity in all three deviant conditions.
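The Local × Global interaction on the N2 reported above can be read as a difference of differences. The Python sketch below makes that arithmetic explicit using the FCz cell means given in the text; it is only an illustration of the contrast, not the factorial general linear model used for the statistical tests. The names `FCZ_MEANS`, `local_effect`, and `interaction_contrast` are hypothetical.

```python
# FCz cell means from the text, keyed by (global status, local status).
FCZ_MEANS = {
    ("standard", "standard"): -0.93,
    ("standard", "deviant"): -1.45,
    ("deviant", "standard"): -1.90,
    ("deviant", "deviant"): -4.50,
}


def local_effect(means, global_status):
    """Local deviant minus local standard at one level of the Global factor."""
    return means[(global_status, "deviant")] - means[(global_status, "standard")]


def interaction_contrast(means):
    """Difference of the local effects across the two Global levels."""
    return local_effect(means, "deviant") - local_effect(means, "standard")


if __name__ == "__main__":
    for status in ("standard", "deviant"):
        effect = local_effect(FCZ_MEANS, status)
        print(f"local effect for global {status}s: {effect:.2f}")
    print(f"interaction contrast: {interaction_contrast(FCZ_MEANS):.2f}")
```

With these means, the local standard-deviant difference is about -0.52 for global standards and -2.60 for global deviants, so the contrast is about -2.08: the local deviance response is larger when the sequence also violates the global rule.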

The parallel representation of the local and global rule is consistent with the studies on multi-

In contrast to the N2, the results in the P3a and P3b showed that their representation of violation of predictions was modified by attention. There was a slight divergence in their response. The P3b was not sensitive to the local status of the sound. In earlier literature this has led to the suggestion that the P3b uniquely represents the higher, block-level status of incoming information, which is dependent on attention and is unable to operate when the attentional resources are removed (10, 18, 20). The current results demonstrate that this is not the full picture. The P3b is indeed sensitive to the interaction between global status and attention: the global deviants and standards differed only in the auditory attention condition, and did not differ in the unattended condition when the subjects were performing a visual n-back task as a distractor. However, as discussed above, the modulation of the N2 by the global rule was clearly present in the attended as well as the unattended condition. This indicates that the processes leading to negativity in the MMN timeframe could track the global status of the sound even under conditions where the P3b was not capable of representing it. Therefore, the data do not support the view that global rule representation is uniquely performed by a system where violation generates the P3b. Instead, the P3b appears to index conscious, attentional processing of the attended sequences. This is consistent with the theories interpreting the P3b as an index of detecting events which are salient or important relative to the currently maintained goal state (26, 27).

The P3a, unlike the P3b, was sensitive to the local status of the tone, similarly to the N2. This is in agreement with findings of the P3a following the violations which elicit the MMN. The P3a, linked to the frontal lobe as well as the medial temporal lobe, is frequently associated with novelty processing. In the context of auditory oddball studies it has been associated with orienting attention toward the deviant sound, or evaluating the contextual novelty of sounds (13).

However, the effect was not modulated by the global rule, corroborating the suggestions that