Pupil dilation and P3 reflect subjective surprise about decision outcome

Central to human and animal cognition is the ability to learn from feedback in order to optimize future rewards. Such a learning signal might be encoded and broadcast by the brain’s arousal systems, including the noradrenergic locus coeruleus. Pupil responses and the P3 component of event-related potentials reflect rapid changes in the arousal level of the brain. Here we ask whether and how these variables may reflect “subjective surprise”: the mismatch between one’s expectation about being correct and the outcome of a decision, when expectations fluctuate due to internal factors (e.g., engagement). We show that during an elementary decision task in the face of uncertainty, both physiological markers of phasic arousal reflect subjective surprise. We further show that pupil responses and P3 are unrelated to each other, and that subjective prediction error computations depend on feedback awareness. These results further advance our understanding of the role of central arousal systems in decision-making under uncertainty.

In this literature, uncertainty and surprise depended lawfully on external factors, such as the volatility of the environment or the difficulty of the stimulus to be discriminated. Strikingly, even when we are given the same information to act on and all external factors are held constant, we will often choose differently each time when asked to make a decision (Glimcher, 2005;Gold & Shadlen, 2007;Sugrue et al., 2005;Wyart & Koechlin, 2016). Such repeated decisions tend to be associated with varying levels of uncertainty (or the inverse: confidence in being correct) (Fleming & Dolan, 2012;Fleming & Lau, 2014;Meyniel et al., 2015). Choice and confidence variability of this kind must be driven by internal variables.
It is unknown if pupil-linked arousal tracks subjective surprise, that is, surprise about decision outcome in the context of varying levels of decision confidence due to internal factors. This is important because deviations between objective task performance and subjective decision confidence are commonly observed, both in healthy humans and in several pathologies. Furthermore, it is currently unclear how peripheral markers relate to neural markers of subjective surprise. The phasic release of neuromodulators may be captured in the size of the P3 event-related potential (ERP) component, as measured with electroencephalography (EEG) over centro-parietal electrode sites (Boldt & Yeung, 2015;Brown et al., 2015;Friedman et al., 1973;Jepma et al., 2016;Kamp & Donchin, 2015;Murphy et al., 2011;Pineda et al., 1989;Polich, 2007;Steinhauer & Zubin, 1982). The P3 has been shown to scale with novelty, surprise and perceptual confidence in previous studies (Boldt & Yeung, 2015;Yeung & Summerfield, 2012). Finally, it remains an open question if and how subjective surprise also depends on the subjective awareness of the feedback stimulus. Although unconscious stimuli are known to affect a plethora of cognitive processes, it is unknown how important feedback awareness is for prediction error computation (van Gaal & Lamme, 2012).
To tackle these questions, we combined an elementary perceptual decision paradigm, including explicit confidence ratings and high or low visibility feedback, with simultaneous pupil size and EEG recordings. We found (i) that both feedback-related pupil responses and P3 amplitudes reflected surprise about decision outcome, (ii) that the same pupil responses and P3 amplitudes were unrelated to each other, and (iii) that surprise about decision outcome, as reflected by the pupil and/or P3, depended on conscious access to the feedback stimulus.

Subjects
Thirty-two students from the University of Amsterdam (23 women; aged 18-24) participated in the study for course credits or financial compensation. All subjects gave their written informed consent prior to participation, were naive to the purpose of the experiments, and had normal or corrected-to-normal vision. All procedures were executed in compliance with relevant laws and institutional guidelines and were approved by the local ethical committee of the University of Amsterdam.

Tasks
Subjects participated in three experimental sessions, separated by less than one week. We will first explain the main task, performed in sessions two and three, and thereafter the tasks performed in the first session. In each session, subjects were seated in a silent and dark room (dimmed light), with their head positioned on a chin rest, 60 cm in front of the computer screen. The main task was performed while measuring pupil and EEG responses.

Main task: orientation discrimination task (sessions 2 and 3)
Stimuli were presented on a screen with a spatial resolution of 1280×720 pixels, run at a vertical refresh rate of 100 Hz. Each trial consisted of the following consecutive intervals (Fig. 1A): (i) the baseline interval (1.6-2.1 s); (ii) the stimulus interval (0.5 s; interrogation protocol), the start of which was signaled by a tone (0.2 s duration); (iii) the response period (terminated by the participant's response); (iv) a delay (uniformly distributed between 1.5 and 2 s); (v) the feedback interval (0.5 s), the start of which was signaled by the occurrence of a tone (0.2 s duration); (vi) a delay (uniformly distributed between 1.5 and 2 s); (vii) the feedback identity response period (terminated by the participant's response).
During Gabor presentation the luminance across all pixels was kept constant. A sinusoidal grating (1.47 cycles per degree) was presented for the entire stimulus interval. The grating was tilted either 45° (clockwise, CW) or 135° (counter-clockwise, CCW). Grating orientation was randomly selected on each trial, under the constraint that each orientation would occur on 50% of the trials within each block of 60 trials. The grating was presented in a Gaussian annulus of 11.4 cm, subtending 10.85 degrees of visual angle. Feedback was signaled by the Dutch word "goed" (correct feedback) or the word "fout" (incorrect feedback), from now on referred to as "correct" and "error" feedback. The words were presented for three frames just below fixation. Feedback was either masked, by presenting both forward and backward masks (masks1-masks2-feedback-masks3-masks4), or unmasked, by presenting only forward masks (masks1-masks2-feedback). Each mask consisted of 6 randomly scrambled letters (excluding the letters making up the words "goed" or "fout"). Each mask type was presented for two frames. Feedback type (masked vs. unmasked) was randomly selected on each trial, under the constraint that each type would occur on 50% of the trials within each block of 60 trials (Fig. 1A).
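The balanced randomization described above (each level on exactly 50% of the 60 trials in a block) can be illustrated with a minimal sketch. The function name `balanced_block` is hypothetical, and the sketch assumes that orientation and feedback type were shuffled independently of each other, which the text does not specify.

```python
import random


def balanced_block(levels, n_trials=60, rng=None):
    """Return a shuffled trial list in which every level occurs equally often."""
    rng = rng or random.Random()
    if n_trials % len(levels) != 0:
        raise ValueError("n_trials must be divisible by the number of levels")
    # Build a perfectly balanced list first, then randomize its order.
    block = list(levels) * (n_trials // len(levels))
    rng.shuffle(block)
    return block


rng = random.Random(0)  # fixed seed only to make the illustration reproducible
orientations = balanced_block(["CW", "CCW"], rng=rng)
feedback_types = balanced_block(["masked", "unmasked"], rng=rng)  # assumed independent
```

Shuffling a pre-balanced list guarantees the 50% constraint within every block, whereas drawing each trial independently would only satisfy it on average.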
Throughout the main experiment, the contrast of the Gabor was fixed at the individual threshold level that yielded about 70% correct choices. Each subject's individual threshold contrast was determined before the main experiment using an adaptive staircase procedure (Quest). The corresponding threshold contrasts yielded a mean accuracy of 70.9% correct (±0.44 % s.e.m.) in the main experiment.
Subjects performed between 12 and 17 blocks (distributed over two measurement sessions), yielding a total of 720-1020 trials per participant. Subjects were instructed to report the orientation of the Gabor, and simultaneously their decision confidence in this decision, by pressing one of four response buttons with their left or right index or middle finger: left middle finger: CCW, sure; left index finger: CCW, unsure; right index finger: CW, unsure; right middle finger: CW, sure. Subjects were also instructed to report the identity and visibility of the feedback by pressing one of four response buttons with their left or right index or middle finger: left middle finger -"error", seen; left index finger -"error", unseen; right index finger -"correct", unseen; right middle finger -"correct", seen. For analyses, we defined high visibility feedback trials as trials on which the feedback was unmasked and subjects reported it as "seen". We defined low visibility feedback trials as trials on which feedback was masked and subjects reported it as "unseen".
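The trial-sorting rule for feedback visibility can be written out as a short sketch. The function and field names (`visibility_condition`, `masked`, `reported_seen`) are hypothetical; trials that match neither definition are returned as `None`, on the assumption that they are left out of the visibility analyses.

```python
def visibility_condition(masked, reported_seen):
    """Classify a trial as 'high' or 'low' visibility, or None if it fits neither."""
    if not masked and reported_seen:
        return "high"  # unmasked feedback reported as seen
    if masked and not reported_seen:
        return "low"   # masked feedback reported as unseen
    return None        # mismatch between masking and report


trials = [
    {"masked": False, "reported_seen": True},
    {"masked": True, "reported_seen": False},
    {"masked": True, "reported_seen": True},
    {"masked": False, "reported_seen": False},
]
labels = [visibility_condition(t["masked"], t["reported_seen"]) for t in trials]
```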
The masking procedure indeed produced two clearly separated categories: on unmasked trials subjects were 99.82% correct in discriminating feedback identity (error/correct) (s.e.m.=0.05%), whereas on masked trials they were 71.21% correct (s.e.m.=1.69%). Note that chance level in the feedback discrimination task is not 50%, because overall Gabor discrimination performance was 70.9% and feedback presentation was veridical. Therefore, subjects may have been able to anticipate the likelihood of being correct/wrong.
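Why chance level differs from 50% can be made concrete with a small worked example. It considers two hypothetical observers who never see the feedback word: one who always reports the more likely outcome, and one who reports "correct" with the same probability as the base rate. Both strategies are illustrative assumptions, not models fitted to the data.

```python
def prior_only_accuracy(p_correct):
    """Accuracy when always reporting the more likely feedback identity."""
    return max(p_correct, 1.0 - p_correct)


def probability_matching_accuracy(p_correct):
    """Accuracy when reporting 'correct' with probability p_correct, blind to feedback."""
    return p_correct ** 2 + (1.0 - p_correct) ** 2


base_rate = 0.709  # overall Gabor discrimination accuracy reported in the text
always_correct = prior_only_accuracy(base_rate)          # 0.709
matching = probability_matching_accuracy(base_rate)      # about 0.587
```

Under the first strategy, an observer with no access to the masked word would already score 70.9%, which is close to the 71.21% observed on masked trials.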

Passive viewing task (session 1)
In this control task, subjects fixated their gaze at the center of the screen and passively viewed the words "goed" (correct) and "fout" (error), randomly presented 100 times. Words were presented for three frames (100 Hz refresh rate) and were not masked.

Forced-choice visibility task (session 1)
In this control task, the words "goed" (correct) or "fout" (error) were presented in the same way as in the main experiment (see above), that is, in a masked or unmasked manner (same timings and presence or absence of masks as described above). Subjects were instructed to report the identity of the presented words, by pressing one of two response buttons with their left or right index finger: left -"error"; right -"correct" (the stimulus-response mapping was counter-balanced across trials, and was indicated on the screen after each trial). Subjects performed two blocks, yielding a total of 200 trials per participant.
In total we tested 49 subjects in the first behavioral and eye-tracking session (namely, the passive viewing and forced-choice visibility tasks). Six subjects did not enter the main experiment due to various reasons (e.g. drop-out, extensive blinking). Of the remaining 43 subjects, the 32 subjects with the lowest discrimination performance score were invited for the second and third session. Discrimination performance for the 32 included subjects varied between 49% and 73% correct. Included subjects were on average 98.87% (SEM=0.02) correct in the unmasked condition and 61.9% (SEM=0.02) correct in the masked condition. The average percentage of correct responses for masked words exceeded chance-level performance (t31=11.26, p<0.001).

Priming task (session 1)
In this control task, subjects were instructed to respond as fast and accurately as possible to eight Dutch words, randomly selected from a set of five words of positive valence (laugh, happiness, peace, love, fun) and five words of negative valence (death, murder, angry, hate, war), by pressing one of two response buttons with their left or right index finger: left -negative; right -positive. Unknown to our subjects, these target words were preceded by the masked words "goed" ("correct") or "fout" ("incorrect"), presented for three frames each before the positive or negative target words (12 frames each) at a 100 Hz refresh rate. This yielded congruent and incongruent trials. Subjects performed six blocks, yielding a total of 480 trials per participant.

Data acquisition
The diameter of the left eye's pupil was tracked at 1000 Hz with an average spatial resolution of 15-30 min arc, using an EyeLink 1000 system (SR Research, Osgoode, Ontario, Canada). EEG data were recorded and sampled at 512 Hz using a BioSemi Active Two system. Sixty-four scalp electrodes were distributed across the scalp according to the 10-20 International system and applied using an elastic electrode cap (Electro-cap International Inc.). Additional electrodes comprised two electrodes to control for eye movements (left eye, aligned with the pupil, vertically positioned, each referenced to their counterpart), two reference electrodes at the ear lobes, and two electrodes for heartbeat (positioned at the left of the sternum and in the right last intercostal space).

Data analysis
Eye data preprocessing
Periods of blinks and saccades were detected using the manufacturer's standard algorithms with default settings. The subsequent data analyses were performed using custom-made Python software. The following steps were applied to each pupil recording: (i) linear interpolation of values measured just before and after each identified blink (interpolation time window, from 150 ms before until 150 ms after blink), (ii) temporal filtering (third-order Butterworth, low-pass: 10 Hz), (iii) removal of pupil responses to blinks and to saccades, by first estimating these responses by means of deconvolution, and then removing them from the pupil time series by means of multiple linear regression (Knapen et al., 2016), and (iv) conversion to units of modulation (percent signal change) around the mean of the pupil time series from each block.
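Steps (i) and (iv) of the pupil preprocessing can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the custom software used in the study: the function names are hypothetical, blinks are assumed to be given as (start, end) sample indices with an exclusive end, and the filtering and deconvolution steps (ii) and (iii) are left out.

```python
import numpy as np


def interpolate_blinks(pupil, blinks, fs=1000, pad_s=0.15):
    """Linearly interpolate over each blink, padded by pad_s seconds on both sides."""
    pupil = np.asarray(pupil, dtype=float).copy()
    pad = int(round(pad_s * fs))
    n = pupil.size
    for start, end in blinks:  # assumed: sample indices, end exclusive
        lo = max(start - pad, 0)
        hi = min(end + pad, n - 1)
        # Draw a straight line between the last clean sample before the
        # padded blink and the first clean sample after it.
        pupil[lo:hi + 1] = np.linspace(pupil[lo], pupil[hi], hi - lo + 1)
    return pupil


def percent_signal_change(pupil):
    """Express a block's pupil trace as percent change around its own mean."""
    pupil = np.asarray(pupil, dtype=float)
    mean = pupil.mean()
    return (pupil - mean) / mean * 100.0


# Toy example: a slow ramp with one artificial blink (signal drops to zero).
ramp = np.linspace(4.0, 6.0, 2000)
recorded = ramp.copy()
recorded[800:900] = 0.0
clean = interpolate_blinks(recorded, [(800, 900)])
psc = percent_signal_change(clean)
```

Because the toy signal is itself linear, the interpolation recovers it exactly; on real data the interpolated stretch is only an approximation of the occluded pupil size.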

Quantification of feedback-evoked pupillary responses
We computed feedback-evoked pupillary response amplitude measures for each trial as the mean of the pupil size in the window 0.5 s to 1.5 s from feedback, minus the mean pupil size during the 0.5 s before the feedback. This time window was chosen to be centered around the peak of the pupil response to a transient event (like the feedback in our task; Fig. 2A) (de Gee et al., 2014;Hoeks & Levelt, 1993).
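The scalar response measure defined above (mean pupil size 0.5 to 1.5 s after feedback, minus the mean of the 0.5 s before feedback) could be computed per trial as in the following sketch. The function name and the sample-index representation of feedback onset are assumptions made for illustration.

```python
import numpy as np


def feedback_pupil_response(pupil, feedback_idx, fs=1000,
                            window=(0.5, 1.5), baseline_s=0.5):
    """Mean pupil size in the post-feedback window minus the pre-feedback baseline."""
    pupil = np.asarray(pupil, dtype=float)
    base_start = feedback_idx - int(round(baseline_s * fs))
    win_start = feedback_idx + int(round(window[0] * fs))
    win_end = feedback_idx + int(round(window[1] * fs))
    if base_start < 0 or win_end > pupil.size:
        raise ValueError("trial does not cover the baseline and response windows")
    baseline = pupil[base_start:feedback_idx].mean()
    return pupil[win_start:win_end].mean() - baseline


# Toy trace: flat at 0, then a dilation of 2 units from 0.5 to 1.5 s after feedback.
trace = np.zeros(4000)
trace[2500:3500] = 2.0
response = feedback_pupil_response(trace, feedback_idx=2000)
```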

EEG data preprocessing
Standard pre-processing steps were performed in EEGLAB toolbox in Matlab. Data were bandpass filtered from 0.1 to 40 Hz off-line for ERP analyses. Epochs ranging from -1 to 2 seconds around feedback presentation were extracted. Linear baseline correction was applied to these epochs using a -200 to 0 ms window. The resulting trials were visually inspected and those containing artifacts were removed manually. Moreover, electrodes that consistently contained artifacts were interpolated, entirely or per bad epoch. Finally, using independent component analysis, artifacts caused by blinks and other events not related to brain activity were manually removed from the EEG data.

Quantification of feedback-related ERP components
We focused on ERP components related to feedback processing with different latencies and topographical distributions. To zoom in on these specific components, a central region of interest (ROI) was defined (including the averaged signal of electrodes F1, Fz, F2, FCz, FC1, FC2, Cz, C1, C2, CPz, CP1, CP2, Pz, P1, P2). The P3 window was defined from 0.5 to 0.8 seconds after feedback presentation.
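A single-trial P3 measure based on this ROI and time window might look like the sketch below. It assumes that the scalar amplitude is the mean voltage across the window and that baseline correction amounts to subtracting the mean of the -200 to 0 ms interval; neither detail is spelled out in the text, and the function name `p3_amplitudes` is hypothetical.

```python
import numpy as np

ROI = ["F1", "Fz", "F2", "FCz", "FC1", "FC2", "Cz", "C1", "C2",
       "CPz", "CP1", "CP2", "Pz", "P1", "P2"]


def p3_amplitudes(epochs, times, ch_names, roi=ROI,
                  window=(0.5, 0.8), baseline=(-0.2, 0.0)):
    """Per-trial mean ROI voltage in the P3 window after baseline subtraction.

    epochs: array of shape (n_trials, n_channels, n_times)
    times:  array of shape (n_times,), in seconds relative to feedback onset
    """
    epochs = np.asarray(epochs, dtype=float)
    times = np.asarray(times, dtype=float)
    picks = [ch_names.index(ch) for ch in roi]
    roi_signal = epochs[:, picks, :].mean(axis=1)  # average over ROI electrodes
    base = (times >= baseline[0]) & (times < baseline[1])
    win = (times >= window[0]) & (times <= window[1])
    # Assumed baseline correction: subtract the mean pre-feedback voltage.
    roi_signal = roi_signal - roi_signal[:, base].mean(axis=1, keepdims=True)
    return roi_signal[:, win].mean(axis=1)


# Toy data: trial 0 is a constant offset, trial 1 steps up by 3 units at 0.5 s.
times = np.linspace(-1.0, 2.0, 301)
epochs = np.zeros((2, len(ROI), times.size))
epochs[0] += 1.0
epochs[1][:, times >= 0.5] = 3.0
amps = p3_amplitudes(epochs, times, list(ROI))
```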

Behavioral analyses and statistical comparisons
Behavioral and statistical analyses were performed in Python. We only considered unmasked trials reported as seen as high visibility feedback (99% of unmasked trials) and masked trials reported as unseen as low visibility feedback (89.2% of masked trials). RT was defined as the time from stimulus offset until the button press. We used a 2x2 repeated measures ANOVA to test for the main effect of being correct, and for the interaction effect between correctness and confidence. With a 2x2x2 repeated measures ANOVA we tested whether these main and interaction effects were different between the high and low visibility conditions. We used the paired-samples t-test to test for differences in RT, accuracy or choices between high and low confidence trials, and between congruent and incongruent priming conditions. We used Pearson correlation to quantify the relationship between pupil responses and P3 amplitudes.
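The logic of the 2x2 interaction test and the across-subject correlation can be illustrated with a short sketch. In a fully within-subject 2x2 design, the interaction F statistic with (1, n-1) degrees of freedom equals the squared one-sample t statistic of each subject's difference of differences. The function names and the toy numbers are hypothetical and do not reproduce the reported analyses.

```python
import numpy as np


def interaction_F(cells):
    """F(1, n-1) for the 2x2 within-subject interaction, plus per-subject contrasts.

    cells: array of shape (n_subjects, 2, 2) holding per-subject condition means,
           indexed as [subject, correctness, confidence].
    """
    cells = np.asarray(cells, dtype=float)
    # Difference of differences: (error - correct) effect at one confidence
    # level minus the same effect at the other level.
    contrast = (cells[:, 0, 0] - cells[:, 0, 1]) - (cells[:, 1, 0] - cells[:, 1, 1])
    n = contrast.size
    t = contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
    return t ** 2, contrast


def pearson_r(x, y):
    """Pearson correlation between two per-subject effect vectors."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy data for four subjects: only one cell differs, giving contrasts 1, 2, 3, 4.
toy = np.zeros((4, 2, 2))
toy[:, 0, 0] = [1.0, 2.0, 3.0, 4.0]
F, contrast = interaction_F(toy)
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The per-subject contrasts returned here are the kind of quantity that can then be correlated between pupil and P3 measures across subjects.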

Data and code sharing
The data are publicly available on [to be filled in upon publication]. Analysis scripts are publicly available on [to be filled in upon publication].

Figure 1. Task and behavior. (A) Sequence of events during a single trial. Subjects reported the direction and level of confidence in the decision about a Gabor patch by pressing one of four buttons (CCW=counter-clock-wise, CW=clock-wise; CCW sure; CCW unsure, CW unsure, CW sure). After the decision interval, veridical feedback was presented indicating the correctness of the response. Subjects reported the identity and visibility of the feedback stimulus (the word "error" or "correct" in Dutch) by pressing one of four buttons (seen error; unseen error; unseen correct; seen correct; see Methods for details).

Feedback-related pupil response and P3 amplitudes report a subjective prediction error
During simultaneous pupillometry and EEG recordings, thirty-two human subjects performed a challenging contrast orientation discrimination task (three experimental sessions per subject, on different days). On each trial this involved discriminating the orientation (clockwise [CW] vs. counter-clockwise [CCW]) of a low-contrast Gabor, explicit confidence ratings and feedback (Fig. 1A). The Gabor's contrast was adjusted individually such that each subject performed at about 70% correct (Fig. 1C; Materials and Methods). Subjects simultaneously indicated their CW/CCW-choice and the accompanying confidence in that decision (sure/unsure; type 1 confidence (Galvin et al., 2003), see Fig. 1A). These explicit ratings provided a window into the trial-to-trial fluctuations of decision confidence, which may shape prediction error signals after decision outcome (feedback), and physiological correlates thereof. The Dutch words for "error" or "correct" provided feedback about the correctness of the preceding CW/CCW-choice. Feedback was preceded and followed by masks (random letters) on 50% of trials. This was done to test whether uncertainty about whether the decision was correct or incorrect, because of varying levels of feedback awareness, may affect phasic measures of central arousal state.
At the end of the trial, subjects had to indicate the subjective visibility and identity (the word "error" or "correct") of the feedback stimulus (Fig. 1A). This allowed us to post-hoc sort trials based on the combination of masking strength and subjective visibility (Materials and Methods).
Subjects' choice behavior indicated that they successfully introspected perceptual performance: subjects were faster and more accurate when they were confident in their decision (Fig. 1B,C), a typical signature of confidence (Kamp & Donchin, 2015b;Meyniel et al., 2013). There was no relationship between confidence and decision bias (Fig. 1D). In line with earlier work (Sanders et al., 2016;Urai et al., 2017), reaction times (RTs) predicted accuracy and confidence, with more accurate and confident choices for faster RTs (Fig. 1E). Taken together, these results suggest that subjects in our task were able to introspect perceptual performance well.
Negative feedback ('error') was more surprising than positive feedback ('correct'), because subjects performed well above chance (~71% correct). Negative feedback should be especially surprising when subjects were relatively sure about the correctness of the preceding choice. In contrast, positive feedback should be least surprising when they were relatively sure about the correctness of the preceding choice. In line with this intuition, trial counts followed the expected ordering (from least to most often / from most to least surprising): sure-error, unsure-error, unsure-correct, sure-correct (Fig. 1F). For trial counts, there was thus a significant main effect for correctness (F1,31=2663.43, p<0.001) and an interaction effect between correctness and confidence (F1,31=129.23, p<0.001). Any physiological variable that encodes a subjective prediction error (surprise about decision outcome) should follow a similar pattern. More specifically, the main effect of correctness (feedback: error vs correct) may partly reflect typically observed error monitoring processes (Cohen et al., 2011;Ullsperger et al., 2014), which are not necessarily related to prediction errors, because they can be triggered purely by the type of feedback received (error or correct). It is therefore especially the interaction between confidence and correctness that we consider a signal of subjective prediction error computation.
Using the same 2x2 ANOVA logic, we tested whether the amplitude of the feedback-related pupil response and/or P3 component reflect a subjective prediction error. Indeed, feedback-related pupil responses (to high visibility feedback) were larger for negative versus positive feedback (main effect of correctness: F1,31=43.33, p<0.001; Fig. 2A,B). This error vs. correct difference was larger when subjects' decision confidence was high vs. low (interaction correctness x confidence: F1,31=11.30, p=0.002; Fig. 2A,B). We verified that the correctness main effect was not driven by any low-level stimulus characteristics, such as luminance, or the intrinsic valence of the words used as feedback (e.g. being of positive/negative valence; Materials and Methods; Fig. 2I). The feedback-related P3 exhibited similar functional properties as the feedback-related pupil responses (Fig. 2D,E): the P3 component was larger for feedback indicating an error vs. correct decision (F1,31=66.72, p<0.001) and this effect interacted with decision confidence (F1,31=6.04, p=0.020, see head maps in Fig. 2D for topographical distributions of these effects).
If the pupil and P3 responses are driven by the same central (e.g., neuromodulatory) process, as postulated in one influential account (Nieuwenhuis et al., 2005), then both measures should not only exhibit the same dependence on experimental conditions on average (Fig. 2A-F), but also their main and interaction effects should correlate across subjects. We did not find evidence for this (Fig. 2G,H). This suggests that pupil size and the P3 are driven by distinct neural processes, both of which are sensitive to decision confidence and prediction errors. We used the reaction times, a sensitive measure of confidence (Fig. 1E), to visualize and quantify the pupil- and P3-reported prediction errors in a more fine-grained fashion. To that end, we used the same 2x2 ANOVA logic. The feedback-related pupil responses were larger for negative compared to positive feedback (F1,31=48.77, p<0.001) and this effect interacted with RT (F1,31=12.48, p=0.001, Fig. 2C). Likewise, the feedback-related P3 amplitudes were larger for negative compared to positive feedback (F1,31=71.91, p<0.001) and this effect interacted with RT (F1,31=6.08, p=0.019, Fig. 2F).
Taken together, we conclude that both physiological variables, feedback-related pupil responses and P3 amplitudes, report a subjective prediction error, when feedback is presented fully consciously.

(G) High visibility feedback-related P3 correctness main effect plotted against pupil response correctness main effect. Stats, Pearson correlation; datapoints, individual subjects (N=32); error bars, 60% confidence intervals (bootstrap). (H) As G, but for correctness x confidence interaction effects. (I) High visibility event related pupil time course sorted by the words "correct" and "error" during a passive viewing experiment (Materials and Methods). All panels except G,H: group average (N=32); shading or error bars, s.e.m.

Physiological correlates of subjective prediction errors depend on feedback awareness
Feedback-related pupil responses and P3 amplitudes did not report a subjective prediction error after low visibility feedback (Materials and Methods). For the feedback-related pupil responses, there was no significant main effect of correctness (F1,31=4.04, p=0.053) nor an interaction effect thereof with confidence (F1,31=0.04, p=0.840, Fig. 3A,B). Likewise, for the feedback-related P3 amplitudes, there was no significant main effect of correctness (F1,31=3.67, p=0.064) nor an interaction effect thereof with confidence (F1,31=0.60, p=0.442, Fig. 3D,E). As for the high visibility feedback, the pupil and P3 main and interaction effects were not correlated across participants (Fig. 3G,H).
We ruled out the possibility that the low visibility feedback was too weak (because of masking) to drive a potential prediction error. A behavioral priming experiment (Materials and Methods) with the same stimuli and stimulus timings showed typical priming effects, both in regard to RT and accuracy. We observed faster responses and higher accuracy for congruent prime-target pairs versus incongruent pairs (Fig. 3I).
Feedback-related pupil responses and P3 amplitudes reported subjective prediction errors significantly better after high versus low visibility feedback. For the feedback-related pupil responses, there was a significant visibility x correctness interaction effect (F1,31=14.90, p<0.001) and a visibility x correctness x confidence interaction effect (F1,31=6.99, p=0.013). Likewise, for the feedback-related P3 amplitudes, there was a significant visibility x correctness interaction effect (F1,31=48.65, p<0.001) but no significant visibility x correctness x confidence interaction effect (F1,31=3.01, p=0.093).
In sum, feedback-related pupil responses and P3 amplitudes only reflected a subjective prediction error after high visibility feedback (interaction correctness x confidence). Our results indicate that full visibility of decision feedback is critical to drive a subjective prediction error response.

High visibility feedback-related pupil responses sorted by correctness (error, correct) and confidence (high, low). (C) High visibility feedback-related pupil responses sorted by correctness (error, correct) and RT. (D) As A, but for the high visibility feedback event-related potential (ERP) time courses. Head map, interaction effect between correctness and confidence (map limits [-1 1]). (E,F) As B,C, but for high visibility feedback-related P3 scalar amplitude measures. (G) Low visibility feedback-related P3 correctness main effect plotted against pupil response correctness main effect. Stats, Pearson correlation; datapoints, individual subjects (N=32); error bars, 60% confidence intervals (bootstrap). (H) As G, but for correctness x confidence interaction effects. (I) Reaction times (left) and accuracy (right) sorted by congruency (congruent, incongruent) showing typical behavioral priming effects (Materials and Methods). All panels except G,H: group average (N=32); error bars, s.e.m.

Discussion
Here we show that two markers of phasic arousal, pupil size and the P3 ERP component, reflect subjective prediction error responses. This subjective prediction error response was only observed when the feedback stimulus, indicating the correctness of the previous decision, was presented at full visibility. These results were not driven by any low-level stimulus characteristics, such as luminance, or the intrinsic valence of the words used as feedback (e.g. being of positive/negative valence, Fig. 2I). The reported findings advance the current knowledge about the role of decision confidence and conscious awareness in prediction error computations in several important ways.
For the first time (to our knowledge), we reveal that prediction error responses are truly defined by subjective decision confidence, despite equal task difficulty. Previous studies have revealed that pupil dilation reflects decision uncertainty and prediction error computation during perceptual choices (Colizoli et al., 2018;Joshi & Gold, 2020;Urai et al., 2017) when task difficulty was manipulated. In these studies, humans performed a random dot motion task incorporating easy and difficult trials depending on the strength of motion coherence. Pupil responses were larger for performance feedback informing that the decision was erroneous vs. correct and this effect was modulated by trial difficulty, in such a way that the pupil dilated most for erroneous decisions based on strong evidence (strong prediction error) and least for correct decisions based on strong evidence (no prediction error). However, several studies have shown that subjective reports of decision confidence do not necessarily track experimental manipulations of task difficulty, for example because confidence estimations are biased due to individual differences in sensitivity to evidence strength or affective value (Fleming & Lau, 2014;Lebreton et al., 2018;Zylberberg et al., 2014). Because we interrogated subjective decision confidence on every single trial, this allowed us to perform post-hoc trial sorting based on trial-by-trial fluctuations in confidence under equal task settings. Thereby we were able to link feedback processing directly to subjective confidence estimations, establishing direct evidence for subjective prediction error computation in the pupil and P3 ERP component.
Further, we show that full visibility of performance feedback is crucial for subjective prediction error computation. Although there is consensus that some perceptual and cognitive processes may unfold in the absence of awareness, it is highly debated which functions (if any) may need consciousness to emerge (Dehaene & Naccache, 2001;Hommel, 2007;Kunde et al., 2012). Many perceptual and cognitive processes may partly unfold unconsciously, typically demonstrated in masked priming studies, in which a task-irrelevant unconscious stimulus facilitates responding to a subsequent task-relevant conscious stimulus (Kiefer et al., 2011;Kiesel et al., 2007;Lamme, 2010). These "simple" priming effects are typically explained by assuming that the fast feedforward sweep of neural processing is relatively unaffected by masking and is able to unconsciously affect ongoing behavioral responses (Dehaene & Changeux, 2011). We also observed here that the same masked stimuli used as feedback in the main experiment (the words error/correct) could induce behavioral priming when presented in the context of a masked priming task (Fig. 3). However, when the same stimuli were used as feedback stimuli in a perceptual decision task, no subjective prediction error responses were observed (correctness x confidence interaction). Speculatively, error detection mechanisms (main effect of correctness) could still be observed when feedback was masked in the current task design (Fig. 3). Although the evidence was statistically weak, the main effects of correctness approached significance, especially in pupil size (pupil: p=0.053; P3: p=0.064). This may not be overly surprising, because previous studies have shown that error detection mechanisms may unfold (at least partially) in the absence of error awareness (Charles et al., 2013;Cohen et al., 2009;Nieuwenhuis et al., 2001;Overbeek et al., 2005;Shalgi, 2012).
More importantly, however, our results revealed an absence of confidence x correctness interactions for low visibility feedback. This may suggest that to incorporate subjective confidence in feedback-driven prediction error computations, awareness of the decision outcome (feedback) is crucial. It may further suggest that subjective prediction error computation cannot rely on feedforward responses alone, in contrast to e.g. masked priming, and requires (bidirectional) interactions (i.e. recurrent processing) between higher-order and lower-order regions, a phenomenon mainly observed when stimuli are presented above the threshold of conscious perception (Dehaene & Changeux, 2011;van Gaal & Lamme, 2012). Further unraveling the underlying neural processes dissociating "objective error processing" from "subjective prediction error" computation is important for further understanding the potential scope and limits of unconscious information processing.
Previous studies have shown that pupil size is sensitive to implicit surprise and effort invested in a cognitive task. For example, it has recently been demonstrated that pupil size increases when the level of cognitive effort invested in a (conflict) task is high, even when subjects are not aware of systematic differences in difficulty between conditions (Diede & Bugg, 2017). Relatedly, it has been shown recently that when agents are not aware of specific transitional rules in an implicit learning task, both the pupil and central ERP potentials (reminiscent of the mismatch negativity) may still signal surprise when statistical regularities in stimulus transitions are violated (Alamia et al., 2019; see also Meijs et al., 2018). Although intriguing, both tasks can be considered "implicit" (learning/conflict) tasks, because stimuli were always presented fully consciously and subjects were just not aware of differences in the probabilities of occurrence of specific stimuli. Therefore these effects cannot be directly compared to situations in which stimulus visibility is reduced.
Although we show that two different measures of the subjects' arousal state reflect subjective prediction error responses, they may do so in potentially different ways, because correlations between pupil size and P3 amplitude were absent, in line with previous studies (Hong et al., 2014;Kamp & Donchin, 2015;Mückschel et al., 2017;Murphy et al., 2011). Recent animal studies have revealed a tight coupling between pupil diameter and neural responses in the noradrenergic locus coeruleus (Breton-Provencher & Sur, 2019;Joshi et al., 2016;Reimer et al., 2016;Liu et al., 2017;Varazzani et al., 2015), which is supported by recent human fMRI studies (de Gee et al., 2017;Murphy, O'Connell, et al., 2014). However, some (of these) studies also found unique contributions to pupil size in other subcortical regions, such as the cholinergic basal forebrain, dopaminergic midbrain, and the superior and inferior colliculi (de Gee et al., 2017;Joshi et al., 2016;Mridha et al., 2019;Reimer et al., 2016). Several lines of evidence reinforce a putative link between pupil diameter and the dopamine system, for example in patients with Parkinson's disease (Kringelbach et al., 2007;Manohar & Husain, 2015;Mathôt, 2018;Varazzani et al., 2015;Weinshenker & Schroeder, 2007). Similarly, the P3 has also been used as an electrophysiological correlate of feedback-evoked phasic catecholamine release in the cortex (Polich, 2007;Rangel-Gomez et al., 2013). Therefore, although these physiological markers of phasic arousal tend to co-occur, they may reflect (partly) different processes (Eckstein et al., 2017;Kamp & Donchin, 2015). Understanding the relationship between changes in pupil dilation and the amplitude of P3 responses is an important avenue for future research.
Further, although the main goal of this study was testing the association between putative measures of central arousal state (pupil size/P3 responses) and subjective prediction error computation, exploratory analyses did not reveal any systematic relationship between other ERP components associated with feedback processing, such as the feedback related negativity (FRN) (Cohen et al., 2011) and our variables of interest. Previously, using a probabilistic reversal learning task, we observed that the amplitude of the FRN was strongly linked to the signed prediction error variable ("objective prediction error") derived from reinforcement learning modeling (Correa et al., 2018). This relationship was strongly attenuated when feedback awareness was reduced. Future work is needed to explore in more detail the relationship between different ERP components (e.g. FRN, P3) and specific aspects of prediction error computation and how these may be differentially affected by levels of (feedback) awareness.