Stimulus expectations do not modulate visual event-related potentials in probabilistic cueing designs

Humans and other animals can learn and exploit repeating patterns that occur within their environments. These learned patterns can be used to form expectations about future sensory events. Several influential predictive coding models have been proposed to explain how learned expectations influence the activity of stimulus-selective neurons in the visual system. These models specify reductions in neural response measures when expectations are fulfilled (termed expectation suppression) and increases following surprising sensory events. However, there is currently scant evidence for expectation suppression in the visual system when confounding factors are taken into account. Effects of surprise have been observed in blood oxygen level dependent (BOLD) signals, but not when using electrophysiological measures. To provide a strong test for expectation suppression and surprise effects we performed a predictive cueing experiment while recording electroencephalographic (EEG) data. Participants (n=48) learned cue-face associations during a training session and were then exposed to these cue-face pairs in a subsequent experiment. Using univariate analyses of face-evoked event-related potentials (ERPs) we did not observe any differences across expected (90% probability), neutral (50%) and surprising (10%) face conditions. Across these comparisons, Bayes factors consistently favoured the null hypothesis throughout the time-course of the stimulus-evoked response. When using multivariate pattern analysis we did not observe above-chance classification of expected and surprising face-evoked ERPs. By contrast, we found robust within- and across-trial stimulus repetition effects. Our findings do not support predictive coding-based accounts that specify reduced prediction error signalling when perceptual expectations are fulfilled. They instead highlight the utility of other types of predictive processing models that describe expectation-related phenomena in the visual system without recourse to prediction error signalling.

Highlights
– We performed a probabilistic cueing experiment while recording EEG.
– We tested for effects of fulfilled expectations, surprise, and image repetition.
– No expectation-related effects were observed.
– Robust within- and across-trial repetition effects were found.
– We did not find support for predictive coding models of expectation effects.


Introduction
Humans and other animals can learn from recurring patterns of sensory input and form predictions about upcoming events. For example, after hearing a loud bark one might expect to see a dog and would be surprised upon seeing a cat instead. These so-called predictive processes can help us to detect unusual or novel events (Ulanovsky et [...] be expected to appear with 50% probability (a neutral stimulus). These effects are thought to be analogous to effects of expectation violation whereby surprising stimuli (e.g., low appearance probability) evoke larger prediction error responses than expected or neutral conditions (Egner et al., 2010).

In our recent review (Feuerriegel et al., 2021a) we reported that, for fMRI BOLD signals in the visual system, there is actually limited and inconsistent evidence for ES defined as an expected-neutral difference. BOLD signal differences between expected and surprising stimuli were apparently due to surprise-related increases rather than any suppression associated with expectation fulfilment (e.g., Summerfield & Koechlin, 2008).

In addition, multiple confounding factors were identified in Feuerriegel et al. (2021a) that may mimic expectation effects in commonly used experiment designs. One prominent confound is repetition suppression and adaptation, whereby stimuli that are identical (or similar) to those that were recently encountered evoke reduced neural response magnitudes (Grill-Spector et al., 2006; Solomon & Kohn, [...] al., 2016) have not replicated when using electrophysiological measures, such as firing rates and local field potential amplitudes in macaques (e.g., Kaliukhovich & Vogels, 2011; Vinken et al., 2018; Solomon et al., 2021) and event-related potentials/fields (ERPs/ERFs) in humans (Kok et al., 2017; Rungratsameetaweemana et al., 2018; Solomon et al., 2021; reviewed in Feuerriegel et al., 2021a).
Studies that did report expected-surprising differences found effects at frontal channels or topographies consistent with effects on the parietal P3 ERP component rather than sources in visual cortex (Summerfield et al., 2011; Hall et al., 2018; Feuerriegel et al., 2018a). Solomon et al. (2021) only reported differential ERPs to surprising stimuli when they were designated as targets in a go/no-go task. These effects resembled ERP components associated with target detection in perceptual decision tasks (e.g., Loughnane et al., 2016). ES and surprise effects have been observed in macaques during predictive cueing and statistical learning experiments that involved weeks of sequence learning prior to recording (Meyer & Olson, 2011; Meyer et al., 2014; Ramachandran et al., 2017; [...]

Here, we tested for ES and surprise effects in humans using a larger (n=48) sample than previous electrophysiological studies while accounting for relevant confounding factors identified in Feuerriegel et al. (2021a). We used a predictive cueing design whereby a cue signalled the appearance probability of a subsequent face image, which was 10% (surprising), 50% (neutral) or 90% (expected). We compared ERPs evoked by expected, neutral and surprising faces using mass-univariate analyses to characterise expectation effects across the entire time-course of the stimulus-evoked response. We also tested for expected-surprising differences that are broadly distributed across the scalp using multivariate pattern analysis (MVPA). In each cueing condition only two different face images could potentially appear, meaning that participants could form expectations for specific low- and high-level stimulus features. In our design, expected and surprising faces were task-relevant but did not require an immediate decision or motor response.
This allowed us to avoid ERP effects related to target detection or perceptual decision processes (e.g., [...]

[...] experiment due to experimenter error. One additional participant was excluded because of a response keypad malfunction during the first two blocks of the training session. This left 48 participants for both behavioural and EEG data analyses (32 women, 16 men, 44 right-handed) aged between 18 and 35 (M = 22.0, SD = 2.9). This was a substantially larger sample than existing studies testing for electrophysiological expectation effects using predictive cueing designs (e.g., n=23 in Kok et al., 2017; n=17 in Rungratsameetaweemana et al., 2018; n=22 in Solomon et al., 2021). As these studies did not report statistically significant expectation effects, we did not have an a priori effect size target for statistical power analyses.

Prior to the experiment the EEG cap was set up and participants completed a training session to learn the probabilistic cue-face associations. Seven different cues were presented, which each cued the appearance probabilities of subsequently [...]

After the presentation of the first two cues, participants then completed a face identification task. In each trial of this task, one of the two cues was presented for 400ms followed by a face image for 400ms after a 500ms interval (Figure 1C). Participants were required to press keys 1 or 3 on a TESORO Tizona numpad (1000Hz polling rate) using their left or right index finger depending on the face identity shown (A or B). In half of trials the first cue was presented, and in the other half the second cue was presented (randomly interleaved). During the task the overall probabilities (averaged across cues) of Faces A and B appearing were 50%. This task was repeated [...] across participants. Participants were required to make a response within 1200ms after S2 face onset.
If a response was not made during this window, the feedback 'too slow' was presented for 1,000ms. If responses were made prior to S2 face onset then 'too fast' feedback was presented. This task ensured that the S1 faces predicted by each cue were task-relevant, but did not require an immediate perceptual decision and keypress response (see also Kok et al., 2017).

Prior to the experiment participants completed practice trials until they were confident in their ability to perform the task. Practice trials were identical to those in the experiment except that participants were provided with feedback ('correct', 'error', 'too fast', 'too slow') for 1,000ms after the response window following S2 face onset in each trial. A fixation cross was displayed for 800ms between the offset of task feedback and the presentation of the cue in the subsequent trial.

There were 70 trials per block, consisting of 36 expected faces, four surprising faces, twenty 50% neutral faces, and ten 25% neutral faces. Each face image was presented an equal number of times within each block. Participants completed 12 blocks including a total of 840 trials. At the end of each block participants were notified of their accuracy and mean RT for correct responses in that block.

[...] predicted the subsequent appearance of one of two face images with 90% probability (expected faces), 10% probability (surprising faces) or 50% probability (50% neutral faces). An additional cue predicted one of four faces to each appear with 25% probability (25% neutral faces). B) An example trial in the training session whereby participants were passively exposed to cue-face pairs. C) An example trial from the training session face identification task. Participants were required to press one of two keys depending on the face identity presented in each trial.
For the 25% neutral cue condition, participants were instructed to press keys in response to two of the face identities, and not to respond if one of the other two faces was presented. D) An example trial from the experiment. Participants were required to press one of two keys based on whether the second (S2) face presented in the trial was the same or a different identity to the first (S1) face. Neither the animal cues nor the S1 face identity was predictive of the S2 face being a repetition or alternation. All face images shown here are subject to a Pixabay license (https://pixabay.com/hu/service/license/). These images were not part of the actual stimulus set and we do not have permission to publish the original images. All images in this figure have been processed in the same way as the original stimuli.

Task Performance Analyses
To assess whether participants had learned the cue-face associations in the training session we derived measures of accuracy (proportion correct) and mean RTs for trials with correct responses. Analyses were performed in JASP v0.16.4 (JASP Core Team). We compared performance across expected (90%), 50% neutral and surprising (10%) faces. Accuracy was near ceiling for some conditions and was not normally distributed, and so Wilcoxon signed-rank tests were used to test for differences in accuracy across conditions. Paired-samples t tests were used to test for differences in mean RTs. Bonferroni corrections were used to adjust the alpha level for multiple comparisons. Bayesian versions of each test were also run to derive Bayes factors in favour of the alternative hypothesis (Cauchy prior distribution, width 0.707, 1,000 samples drawn for signed-rank tests). Because the training task for 25% probability faces included 50% of trials where no response was required (resulting in a hybrid face identification and go/no-go task) we did not compare performance between this and other conditions.

Performance on the face matching task during the experiment was measured by calculating accuracy and mean RTs for trials with correct responses. We compared accuracy across conditions using Wilcoxon signed-rank tests with Bonferroni corrections as described above. We compared mean RTs across S1 face appearance probability conditions using a one-way repeated measures ANOVA. A Greenhouse-Geisser correction was used to account for sphericity violations. RTs were also compared across repeated and alternating face conditions using a paired-samples t test. Bayesian versions of each test were also run in JASP.
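As a rough illustration of the Bayesian paired-samples t tests and the Bonferroni adjustment described above, the following sketch computes a Bayes factor in favour of the alternative hypothesis from a t statistic, using a Cauchy prior of width 0.707 on the standardised effect size. It assumes the standard JZS formulation of the default Bayesian t test; whether JASP evaluates the integral in exactly this way is an assumption, and the function names, integration settings and the example of three pairwise comparisons are hypothetical choices made for illustration only.

```python
import math


def jzs_bf10(t, n, r=0.707, steps=20000, z_max=10.0):
    """Approximate the JZS Bayes factor (BF10) for a paired-samples t test.

    t: observed t statistic; n: number of participants (pairs);
    r: width of the Cauchy prior on the standardised effect size.
    steps and z_max are illustrative numerical integration settings.
    """
    df = n - 1

    def likelihood_term(z):
        # Writing g = 1 / z**2 with z standard normal gives g the
        # inverse chi-square(1) mixing prior that yields a Cauchy prior.
        if z == 0.0:
            return 0.0  # limit of the integrand as g tends to infinity
        k = 1.0 + n * r * r / (z * z)
        return k ** -0.5 * (1.0 + t * t / (k * df)) ** (-(df + 1) / 2.0)

    def half_normal_pdf(z):
        return 2.0 * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    # Composite trapezoid rule on [0, z_max]; the tail beyond is negligible.
    h = z_max / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * half_normal_pdf(z) * likelihood_term(z)
    marginal_h1 = total * h
    marginal_h0 = (1.0 + t * t / df) ** (-(df + 1) / 2.0)
    return marginal_h1 / marginal_h0


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical usage: three pairwise comparisons among expected,
# 50% neutral and surprising conditions, with n = 48 participants.
adjusted_alpha = bonferroni_alpha(0.05, 3)
bf10_example = jzs_bf10(t=-4.22, n=48)
```

Values below 1/3 from such a function would conventionally be read as evidence for the null hypothesis, which is the criterion applied to the ERP analyses in this paper.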

EEG Data Acquisition and Processing
We recorded EEG using a 64-channel Biosemi Active II system (Biosemi, The Netherlands) with a sampling rate of 512 Hz. Recordings were grounded using common mode sense and driven right leg electrodes (http://www.biosemi.com/faq/cms&drl.htm). We added six additional electrodes: two electrodes placed 1 cm from the outer canthi of each eye, and electrodes placed above [...]

We processed EEG data using EEGLab v2022.0 (Delorme & Makeig, 2004) in MATLAB (Mathworks). The EEG dataset and data processing and analysis code will be available at osf.io/ahuc4/ at the time of publication. First, we identified excessively noisy channels by visual inspection (mean number of bad channels = 0.5, range 0-3) and excluded these from average reference calculations and Independent Components Analysis (ICA). Sections with large amplitude artefacts were also manually identified and removed. We then low-pass filtered the data at 40 Hz (EEGLab Basic Finite Impulse Response Filter New, default settings), re-referenced the data to the average of all channels and removed one extra channel (AFz) to compensate for the rank deficiency caused by average referencing. We duplicated the dataset and additionally applied a 0.1 Hz high-pass filter (EEGLab Basic FIR Filter New, default settings) to improve stationarity for the ICA. The ICA was performed on the high-pass filtered dataset using the RunICA extended algorithm (Jung et al., 2000). We then copied the independent component information to the non high-pass filtered dataset (e.g., as done by Feuerriegel et al., 2018a). Independent components associated with blinks and saccades were identified and removed according to guidelines in Chaumon et al. (2015). After ICA, we interpolated previously removed noisy channels and AFz using the cleaned dataset (spherical spline interpolation). EEG data were then high-pass filtered at 0.1 Hz.
The resulting data were segmented from -100ms to 800ms relative to cue onset, S1 face onset and S2 face onset. Epochs were baseline-corrected using the prestimulus interval. Epochs containing amplitudes exceeding ±100µV from baseline at any of the 64 scalp channels, as well as epochs from trials with 'too fast' or 'too slow' responses to S2 faces, were rejected (cues: mean epochs retained = 786 out of 840, range 629-837, S1 faces: mean = 792, range 636-839, S2 faces: mean = 793, range 629-838). Statistics relating to numbers of epochs retained per condition are displayed in Supplementary [...]

[...] across-trial image repetition using mass-univariate analyses as described above. To test for within-trial face repetition effects we compared ERPs evoked by S2 faces when preceded by the same face as compared to a different S1 face identity. To test for across-trial repetition effects we compared ERPs evoked by cue images based on whether the cue in the previous trial was the same or a different image. We also compared ERPs evoked by S1 faces depending on whether the S2 face in the previous trial was the same or a different face identity.
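The epoch-level steps described in this section (baseline correction using the prestimulus interval, followed by rejection of epochs exceeding ±100µV at any channel) could be sketched as follows. This is a minimal illustration in plain Python rather than the EEGLab/MATLAB pipeline that was actually used. The function names and the nested-list data layout are hypothetical, and taking 51 samples as the 100ms baseline at 512 Hz is an assumption about rounding. Rejection of trials with 'too fast' or 'too slow' responses is not modelled here.

```python
SAMPLING_RATE_HZ = 512
# Assumed number of samples in the 100 ms prestimulus interval (rounding is a guess).
N_BASELINE_SAMPLES = round(0.1 * SAMPLING_RATE_HZ)


def baseline_correct(epoch, n_baseline):
    """Subtract each channel's mean prestimulus amplitude from that channel.

    epoch: list of channels, each a list of amplitudes in microvolts.
    n_baseline: number of samples belonging to the prestimulus interval.
    """
    corrected = []
    for channel in epoch:
        baseline = sum(channel[:n_baseline]) / n_baseline
        corrected.append([sample - baseline for sample in channel])
    return corrected


def exceeds_threshold(epoch, threshold_uv=100.0):
    """Return True if any sample at any channel exceeds the absolute threshold."""
    return any(abs(sample) > threshold_uv for channel in epoch for sample in channel)


def clean_epochs(epochs, n_baseline, threshold_uv=100.0):
    """Baseline-correct all epochs and keep only those within the threshold."""
    kept = []
    for epoch in epochs:
        corrected = baseline_correct(epoch, n_baseline)
        if not exceeds_threshold(corrected, threshold_uv):
            kept.append(corrected)
    return kept


# Toy example with two channels, four samples and a two-sample baseline.
toy_epochs = [
    [[10.0, 10.0, 12.0, 8.0], [0.0, 0.0, 1.0, -1.0]],   # retained
    [[0.0, 0.0, 150.0, 0.0], [0.0, 0.0, 0.0, 0.0]],     # rejected (150 µV)
]
toy_kept = clean_epochs(toy_epochs, n_baseline=2)
```

Applying the threshold after baseline correction matches the description of amplitudes being measured "from baseline".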

Multivariate Pattern Classification Analyses
The ROI-based mass-univariate analyses described above are sensitive to between-condition amplitude differences that are consistent across participants but are less sensitive to ERP pattern differences that vary across individuals or effects that occur outside of the predefined ROIs. To better test for such effects we used MVPA using support vector machine (SVM) classification as implemented in DDTBOX v1.0.5 (Bode et al., 2019) interfacing LIBSVM (Chang & Lin, 2011). To test for the combined influence of ES and surprise effects we trained classifiers to discriminate between expected and surprising S1 faces. These were preceded by the same cue images, meaning that late ERP differences across cue images could not contribute to above-chance classification performance (e.g., via effects on pre S1 face ERP baselines). Please note that here we did not classify face identity, but rather the expected or surprising status of the S1 faces. To compare the classification accuracy time-courses against within-trial repetition effects we also trained classifiers to discriminate between S2 faces based on whether they were preceded by the same face (repetition trials) or a [...]

We used a sliding window approach whereby epochs were split into non-overlapping analysis time windows of 10ms in duration. Within each time window EEG amplitudes were averaged separately for each of the 64 scalp channels to create a spatial vector of brain activity (64 features) corresponding to each trial. In cases where one condition included more epochs than the other, a random subset of epochs was drawn from the former condition to balance epoch numbers. Data from each condition were then split into five subsets of epochs.
SVM classifiers (cost parameter C = 1) were trained on the first four subsets (80% of the data) from each condition and subsequently tested using the remaining subset to derive a classification accuracy measure. This was repeated until each subset had been used once for testing. This 5-fold cross-validation procedure was then repeated another five times with different epochs randomly allocated to each subset each time to minimise drawing biases. The average classification accuracy across all cross-validation steps and analysis repetitions was taken as the estimate of classification performance for each participant. This [...]

[...] that participants were better practiced at the task when identifying neutral faces. RTs were faster for expected compared to both 50% neutral, t(47) = -4.22, p < .001, BF10 = 209.92, and surprising faces, t(47) = -6.25, p < .001, BF10 > 128,900 (Figure 2B). However, we did not observe faster RTs for 50% neutral compared to surprising faces, t(47) = -1.56, p = .125, BF10 = 0.49. Higher accuracy and faster RTs for expected faces indicate that participants successfully learned the cue-face associations within the training session.

Please note that accuracy and mean RTs are plotted in Figure 2A-B also for 25% neutral faces requiring a keypress response. However, due to the difference in task for this condition (with 50% of trials requiring no response) these were not compared with the other conditions.

Experiment Task Performance
During the experiment accuracy was generally high across all S1 face expectancy conditions for the S1-S2 face matching task (Figure 2C). Accuracy was slightly lower in trials with surprising as compared to expected S1 faces, Z = 2.06, p = .040, BF10 = 2.67, congruent with previous reports of impaired decision-making for stimuli immediately following surprising events (reviewed in Wessel, 2018). However, this effect was not statistically significant when adjusting for multiple tests. Differences in accuracy were not observed when comparing other S1 face expectancy conditions, expected-50% neutral: Z = 0.55, p = .586, BF10 = 0.16, 50% neutral-surprising: Z = 1.89, p = .059, BF10 = [...]

[...] For the 25% probability condition accuracy refers to the proportion of target faces that were correctly identified. B) Mean RTs for correct trials. C) Accuracy in the face matching task by S1 face appearance probability condition. D) Mean RTs for correct trials by S1 face probability. E) Mean RTs for correct trials by S1-S2 repetition (Rep) or alternation (Alt) status.

Expectation and Predictability Effects on S1 Face-Evoked ERPs
We did not observe statistically-significant ERP amplitude differences across any of the compared expectancy conditions. Group-averaged ERPs for each set of conditions, difference waves, standardised Cohen's d effect size estimates (Cohen, 1988) and Bayes factors in favour of the alternative hypothesis are displayed in Figure 3. We did not observe differences between ERPs evoked by expected and surprising faces (Figure 3A), expected and 50% neutral faces (Figure 3B), expected and 25% neutral faces (Figure 3C), or 50% neutral and surprising faces (Figure 3D). Amplitude differences and standardised effect size point estimates were very small. Bayes factors generally provided evidence in favour of the null (values smaller than 1/3) across the time-course of the S1 face-evoked response.

We also compared ERPs across 50% and 25% neutral conditions to test for predictability effects (reported in Pajani et al., 2017; Feuerriegel et al., 2018a; Rostalski et al., 2020). Statistically-significant differences were not observed after correction for multiple comparisons. Bayes factors were in favour of the null hypothesis across most of the peristimulus time window (Figure 3E).

Results of analyses of ERPs at the occipital ROI (displayed in the [...]

Figure 3. Group-averaged ERPs evoked by expected (90% appearance probability), surprising (10% probability), and neutral (50% and 25% probability) S1 faces. A) Expected - surprising ERP differences. B) Expected - 50% neutral differences. C) Expected - 25% neutral differences. D) 50% neutral - surprising differences. E) 50% neutral - 25% neutral differences. ERPs are averaged across channels within the parieto-occipital ROI including channels PO7/8, P7/8 and P9/10. ERPs for each pair of compared conditions are displayed along with difference waves (with shading denoting standard errors), Cohen's d estimates and Bayes factors in favour of the alternative hypothesis.

Effects of Across-Trial Repetition on Cue- and S1 Face-Evoked ERPs
A number of across-trial cue image repetition effects were observed spanning 125-547ms from cue onset. Group-averaged ERPs evoked by cues that were either the same or different to the cue in the preceding trial are presented in Figure 4C. Cue repetition effects were observed between 125-172ms (cluster mass = 121.74, critical cluster mass = 47.50, p = .014), between 195-234ms (cluster mass = 87.22, p = .032) and between 254-547ms from cue stimulus onset (cluster mass = 769.71, p < .001). Topographies of effects (displayed in Figure 4D) resembled those observed for within-trial face repetition effects.

We also compared S1 face-evoked ERPs across conditions whereby the S2 face in the preceding trial was the same or a different face identity. Group-averaged ERPs are displayed in Figure 4E and scalp maps of effects are shown in Figure 4F. Across-trial repetition effects were observed between 178-229ms (cluster mass = 102.57, critical cluster mass = 39.88, p = .019) and between 664-725ms from S1 face onset (cluster [...]

Multivariate Pattern Classification Results
We used MVPA to test for patterns of single-trial ERPs that discriminate between expected (90% probability) and surprising (10%) S1 faces. S1 face expectancy could not be classified at above-chance levels at any time during the epoch (Figure 5A). We did not attempt to classify between other expectancy conditions as the preceding cue images systematically differed across these conditions within each participant dataset. In these cases, late cue image-specific ERPs can influence S1 face pre-stimulus baselines to produce spurious above-chance classification performance.

The within-trial repetition or alternation status of S2 faces could be classified at above-chance levels from 160 ms after S2 face onset until the end of the epoch (cluster mass = 948.72, critical cluster mass = 4.73, cluster p < .001).
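For reference, the data preparation steps behind these classification accuracy time-courses (averaging amplitudes within non-overlapping time windows, balancing epoch numbers across conditions, and repeated 5-fold partitioning) could be sketched as below. This is not the DDTBOX implementation that was used: the function names and data layout are hypothetical, the SVM training step itself is omitted, and the mapping of 10ms windows onto samples at 512 Hz is left as a parameter because it is not specified in the text. Treating "repeated another five times" as six runs of 5-fold cross-validation in total is also an interpretation.

```python
import random


def window_means(epoch, samples_per_window):
    """Average each channel within consecutive non-overlapping windows.

    epoch: list of channels, each a list of samples.
    Returns one spatial feature vector (one value per channel) per window.
    """
    n_windows = len(epoch[0]) // samples_per_window
    vectors = []
    for w in range(n_windows):
        start = w * samples_per_window
        stop = start + samples_per_window
        vectors.append([sum(ch[start:stop]) / samples_per_window for ch in epoch])
    return vectors


def balance(indices_a, indices_b, rng):
    """Randomly subsample the larger condition to match the smaller one."""
    n = min(len(indices_a), len(indices_b))
    return rng.sample(indices_a, n), rng.sample(indices_b, n)


def repeated_kfold(indices, n_folds=5, n_repeats=6, rng=None):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Epochs are reshuffled before every repetition so that different epochs
    are allocated to each subset each time.
    """
    rng = rng or random.Random()
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


# Toy usage: 10 epochs in one condition, 4 in the other.
rng = random.Random(0)
balanced_a, balanced_b = balance(list(range(10)), list(range(4)), rng)
splits = list(repeated_kfold(list(range(20)), rng=rng))
features = window_means([[1, 3, 5, 7], [2, 2, 4, 4]], samples_per_window=2)
```

In an actual analysis the partitioning would be applied separately to each condition, and a classifier would be trained and tested once per split and time window, with accuracies averaged across all splits.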
Figure 5. Multivariate pattern classification results. A) Classification accuracy for discriminating between expected (90% appearance probability) and surprising (10% probability) S1 faces. Blue lines display classification performance for the original data and orange lines show performance for the data with permuted condition labels (empirical chance distribution). Shaded regions denote standard errors. B) Classification accuracy for discriminating between repeated and alternating S2 faces. The magenta shaded area denotes the time window of statistically significant above-chance classification performance.

Discussion
A set of influential predictive coding models specify that the response magnitudes of visual stimulus-selective neurons should inversely scale with the subjective appearance probability of a stimulus (Friston, 2005, 2010; Summerfield & de Lange, 2014; Walsh et al., 2020). To test this hypothesis, we entrained participants' expectations to see particular face identities after certain cue images. Task performance results indicated that participants had learned these cue-face associations during a training session. We then presented the same cue-face associations in our experiment. Importantly, the cued faces were task-relevant but did not require an immediate decision or motor action. Despite analysing a large (n=48) [...]

Our results, taken together with multiple null findings reported in earlier studies, indicate that expectations in predictive cueing designs are not sufficient to produce differences in stimulus-evoked electrophysiological responses within the human visual system. This is incongruent with models that specify reduced prediction error signalling when an observer's expectations are fulfilled (e.g., Friston, 2005, 2010; Clark, 2013; Summerfield & de Lange, 2014) but is consistent with a growing body of electrophysiological work using predictive cueing designs that did not report effects of ES or surprise on responses of visual stimulus-selective neurons (Kaliukhovich & Vogels, 2011; Kok et al., 2017; Vinken et al., 2018; Rungratsameetaweemana et al., 2018; Solomon et al., 2021).

We additionally report that both within- and across-trial repetition effects can be readily observed in the absence of expectation effects. This is in line with most contemporary models of adaptation (e.g., Solomon & Kohn, 2014; Vogels, 2016) but is incongruent with predictive coding-based models that describe repetition suppression as a product of biased expectations toward previously-seen stimuli (e.g., Summerfield [...]

Consequences of Cued Expectations on ERPs
We did not observe evidence for either ES or surprise effects on face-evoked ERPs. Bayes factors instead indicated evidence in favour of the null hypothesis throughout the peristimulus time window. Even when using MVPA, which can exploit distributed patterns of EEG signals across the scalp and ERP differences that are idiosyncratic to participants, we did not see above-chance classification accuracy when comparing expected and surprising faces.

Our findings are consistent with previous work using predictive cueing designs and electrophysiological recordings that did not report expectation effects (Kok et al., 2017; Rungratsameetaweemana et al., 2018; Kaliukhovich & Vogels, 2011; Vinken et al., 2018; Solomon et al., 2021; reviewed in Feuerriegel et al., 2021a). Although ERP differences between expected and surprising stimuli have been reported (Summerfield et al., 2011; Hall et al., 2018; Feuerriegel et al., 2018a), the topographies of these effects more closely indicate frontal sources or effects on the centro-parietal positivity ERP component (O'Connell et al., 2012; Twomey et al., 2015) rather than sources in the visual system. Tang et al. (2018) reported smaller visual N1 component peak amplitudes for surprising gratings; however, this was likely a result of peak amplitude measurement biases due to a smaller number of trials per participant in the surprising condition (Thomas et al., 2004). Additionally, Solomon et al. (2021) reported expected-surprising differences only when surprising stimuli were targets, but the timing and topography of these effects closely resembled target detection-related ERP components observed in similar perceptual decision-making tasks (Loughnane et al., 2016). By contrast, in our design a perceptual decision was not required immediately following the expected and surprising S1 faces.
Other effects of stimulus appearance probability outside of the visual system, such as those on P3 component amplitudes in oddball designs (Duncan-Johnson & Donchin, 1977; Polich & Margala, 1997), also appear to be absent when the critical stimuli do not prompt an immediate decision (Kimura et al., 2009; Feuerriegel et al., 2018b, 2021b; File & [...]

[...] between cues and subsequently presented stimuli. We consider this unlikely because participants were explicitly told the cue-face contingencies in our experiment and demonstrated faster RTs for expected stimuli in the training session task (see also Kok et al., 2017; Rungratsameetaweemana et al., 2018; Vinken et al., 2018). During our experiment participants were further exposed to the expected cue-face pairs at least 100 times per cue. This would be expected to provide at least an equivalent opportunity to learn cue-stimulus associations as compared to existing fMRI predictive cueing designs (e.g., Egner et al., 2010).

The second possibility is that effects of fulfilled expectations and surprise are not expressed in neural activity that contributes to EEG and other electrophysiological measures, but are instead captured by changes in BOLD signals. Effects of surprise (but not ES) in predictive cueing designs have been identified and replicated across fMRI studies (e.g., Summerfield & Koechlin, 2008; reviewed in Feuerriegel et al., 2021a). However, this would run counter to the idea that prediction errors are signalled by cortical pyramidal neurons that typically contribute to invasive recordings and EEG/MEG (e.g., Friston, 2005; Bastos et al., 2012).

Notably, the inferences drawn from fMRI studies are complicated by additional effects of surprise (separate from prediction error signalling) that contribute to the BOLD response.
For example, Richter & de Lange (2019) reported expected-surprising BOLD signal differences for object images only when participants completed an object categorisation task, and not during a task in which the objects were task-irrelevant (see also Larsson & Smith, 2012). In addition, they found that RT differences between expected and surprising conditions in the categorisation task correlated with BOLD signal differences across the same conditions in V1 and temporal occipital fusiform cortex. RTs are typically faster for expected stimuli (e.g., den Ouden et al., 2010; Mulder et al., 2012; Richter et al., 2018) and similar associations between RTs and BOLD signals in visual areas have been documented across decision-making tasks (Yarkoni et al., 2009; Mumford et al., 2023). Such associations are thought to reflect variation in the duration of decision-making processes that involve attention and visual working memory, termed a time-on-task effect (Yarkoni et al., 2009). Importantly, the probabilistic cueing studies that reported surprise effects on BOLD [...] to designs whereby participants detect a rare target that is distinct from the critical stimuli (e.g., inverted faces, Larsson & Smith, 2012; Feuerriegel et al., 2018a). This is because each stimulus requires a decision to determine that it is not a target, even if no motor response is required in target-absent trials (akin to a go/no-go task, Murphy et al., 2015). Time-on-task effects could be interpreted within predictive coding frameworks as reflecting the duration of prediction error signalling, where model convergence is slower for surprising stimuli. However, this would be incongruent with the finding that surprise effects were task-dependent in Richter and de Lange (2019).

Richter and de Lange (2019) also identified increased pupil dilation for surprising stimuli from ~600 ms post stimulus onset, which correlated with the degree of expected-surprising BOLD signal differences in V1. Notably, this pupil dilation effect was not present when object stimuli were task-irrelevant. This can be understood as a secondary consequence of surprise, whereby increased pupil dilation leads to changes in retinal input and corresponding effects on low temporal resolution [...] expectations (Corbetta et al., 1990; Reynolds & Heeger, 2009) and pre-activation of expected stimulus representations (Kok et al., 2017; Blom et al., 2020; Feuerriegel et al., 2021c). Beyond the visual system, biases in motor action preparation (de Lange et al., 2013; Gold & Stocker, 2017; Kelly et al., 2021) can enable faster responses when expectations to see a particular stimulus are paired with expectations to make a certain motor action. There are also well-documented effects associated with reward prediction errors and information gain in areas such as the insula and striatum (e.g., Loued-Khenissi [...] Schultz, 2016). Rather, our findings suggest that appeals to ES and cortical prediction error minimisation via top-down inhibition are not necessary to adequately model expectation-related phenomena in the visual system. This highlights the utility of alternative models that describe a range of cued expectation effects without recourse to ES (e.g., Spratling, 2017; Heeger, 2017; Press et al., 2020, 2022; Feuerriegel et al., 2021c; Hogendoorn, 2022). Notably, many of these models describe top-down excitatory rather than inhibitory modulations of stimulus-selective visual neurons. This is consistent with evidence that cortical feedback projections within the visual system predominantly originate from, and target, [...]
(2018a) we observed predictability effects at electrodes over visual cortex. 760 However, the relatively unpredictable faces in Feuerriegel et al. (2018a) were also more 761 novel than predictable faces. When controlling for this novelty confound in our 762 experiment, we did not replicate these effects. 763 Notably, we have observed predictability effects on BOLD signals in the 764 Fusiform Face Area (Rostalski et al., 2020) while controlling for a similar novelty 765 confound in Pajani et al. (2017). This appears to be another example of BOLD signal 766 effects that are not replicated when using electrophysiological measures. 767 earlier CPP onsets) for repeated as compared to unrepeated stimuli (discussed in 791 Feuerriegel et al., 2022). 792 793

Limitations
Our findings should be interpreted with the following caveats in mind. Firstly, the models and empirical evidence discussed here pertain to the visual system and may not be representative of other sensory systems. For example, expectation effects have been reported in auditory predictive cueing designs (e.g., Todorovic & de Lange, 2012). BOLD signals in other brain areas such as the insula have also been found to scale with quantitative measures of uncertainty and surprise (Preuschoff et al., 2008; Mohr et al., 2010; [...]), although these effects may depend on the task-relevance of expected and surprising stimuli (Richter & de Lange, 2019). The EEG signals measured in our experiment may not be sensitive to neuronal activity within [...] hypothesised timescales of internal model updating in predictive coding accounts (e.g., Manahova et al., 2018; Zhou et al., 2020). The qualitatively different patterns of effects across paradigms highlight the importance of building a coherent body of evidence within each paradigm and warrant caution when comparing our findings to those from other designs.

In addition, we manipulated expectations relating to face images, meaning that we cannot rule out ES or surprise effects on neurons selective for other stimulus features (but see Kok et al., 2017; Solomon et al., 2021). Future work could adapt our design to manipulate expectations relating to a broader set of stimulus features.

We also note that our analyses of ERP amplitudes focus on phase-locked EEG responses, rather than non-phase-locked effects that may be uncovered using time-frequency analyses (e.g., Zhou et al., 2020; but see Vinken & Vogels, 2018; Solomon et al., 2021). Our experiment was not designed to cleanly isolate time-frequency responses evoked by single stimuli within each trial.