Neural correlates of confidence during decision formation in a perceptual judgment task

When we make a decision, we also estimate the probability that our choice is correct or accurate. This probability estimate is termed our degree of decision confidence. Recent work has reported event-related potential (ERP) correlates of confidence both during decision formation (the centro-parietal positivity component; CPP) and after a decision has been made (the error positivity component; Pe). However, there are several measurement confounds that complicate the interpretation of these findings. More recent studies that overcome these issues have so far produced conflicting results. To better characterise the ERP correlates of confidence we presented participants with a comparative brightness judgment task while recording electroencephalography. Participants judged which of two flickering squares (varying in luminance over time) was brighter on average. Participants then gave confidence ratings ranging from “surely incorrect” to “surely correct”. To elicit a range of confidence ratings we manipulated both the mean luminance difference between the brighter and darker squares (relative evidence) and the overall luminance of both squares (absolute evidence). We found larger CPP amplitudes in trials with higher confidence ratings. This association was not simply a by-product of differences in relative evidence (which covaries with confidence) across trials. We did not identify postdecisional ERP correlates of confidence, except when they were artificially produced by pre-response ERP baselines. These results provide further evidence for neural correlates of processes that inform confidence judgments during decision formation.


1. Introduction
When we make decisions, we also estimate the probability that our choice is accurate or will lead to desired outcomes. These (often implicit) confidence judgments have been conceptualised as 'second-order' decisions across a continuous dimension (ranging from being certain that the decision was incorrect, to being certain that the decision was correct), signalling the prospect that a corresponding 'first-order' decision is accurate (Yeung & Summerfield, 2012; Pouget et al., 2016; Fleming & Daw, 2017). We can use our sense of confidence as a proxy for objective choice accuracy to rapidly correct errors (Rabbitt & Vyas, 1981; Yeung & Summerfield, 2012) and determine whether we should adjust our decision-making strategies to improve performance (Vickers, 1979; van den Berg et al., 2016a; Desender et al., 2019a).
Two broad classes of models have been proposed to account for confidence judgments in perceptual decision tasks. The first class of 'decisional-locus' models (as labelled by Yeung & Summerfield, 2012) specifies that confidence judgments are (primarily or exclusively) based on information related to features of the first-order decision (e.g., Vickers, 1979; Kiani & Shadlen, 2009; Kiani et al., 2014). This includes a subset of racing accumulator models that predict confidence as a function of the relative extent of accumulated evidence in favour of each choice alternative (e.g., Vickers, 1979; Vickers & Packer, 1982; Ratcliff & Starns, 2009). By contrast, postdecisional-locus models describe processes that occur after the time of the first-order decision, such as continued evidence accumulation (Rabbitt & Vyas, 1981; Pleskac & Busemeyer, 2010; Moran et al., 2015; van den Berg et al., 2016b; Desender et al., 2021a; Maniscalco et al., 2021) or computations based on other sources of information (e.g., exerted motoric effort, Fleming & Daw, 2017; Gajdos et al., 2019; Turner et al., 2021a; Overhoff et al., 2022).
Each class of models specifies critical processes that occur over different time windows relative to the first-order decision. Accordingly, researchers have tested for neural correlates of confidence during decision formation and postdecisional time windows using high temporal resolution neural recordings such as electroencephalography (EEG) and analyses of event-related potential (ERP) components. This has produced two distinct sets of findings: one related to the centro-parietal positivity (CPP) component during decision formation (O'Connell et al., 2012; Twomey et al., 2015) and another linked to the postdecisional error positivity (Pe) component (Falkenstein et al., 1991; for reviews see Rausch et al., 2020; Feuerriegel et al., 2022).

1.1. ERP Correlates of Confidence During Decision Formation
The CPP component (analogous to the P3b, Twomey et al., 2015) has a morphology that closely resembles accumulation-to-boundary trajectories of decision variables in evidence accumulation models (Ratcliff, 1978; Ratcliff et al., 2016; O'Connell et al., 2012; Kelly & O'Connell, 2013; Twomey et al., 2015). The CPP has therefore been conceptualised as a neural correlate of evidence accumulation trajectories in decision-making tasks. Accordingly, the amplitude of this component at the time of the decision (e.g., immediately preceding a keypress response) has been interpreted as indexing the extent of evidence accumulated in favour of the chosen option (e.g., Philiastides et al., 2014; Gherman & Philiastides, 2018; Steinemann et al., 2018; von Lautz et al., 2019; Feuerriegel et al., 2021; Kelly et al., 2021). Larger stimulus-locked CPP amplitudes have been reported to co-occur with higher confidence ratings (Squires et al., 1973; Gherman & Philiastides, 2015, 2018; Herding et al., 2019; Zakrzewski et al., 2019; Rausch et al., 2020) and also higher stimulus visibility ratings as measured using a perceptual awareness scale (Tagliabue et al., 2019). Larger pre-response amplitudes have also been reported for higher model-estimated (Philiastides et al., 2014) and participant-reported confidence ratings (Grogan et al., 2023; but see Feuerriegel et al., 2022). This has been taken as support for the 'balance of evidence' hypothesis proposed in racing accumulator models (e.g., Vickers, 1979; Vickers & Packer, 1982; Smith & Vickers, 1988; Ratcliff & Starns, 2009; discussed in Smith et al., 2022). In these models, a first-order decision is made when one of multiple racing accumulators reaches a decision boundary. Differences in the relative extent of evidence accumulated for chosen and unchosen options at the time of the first-order decision (assumed to be indexed by CPP amplitudes) determine one's degree of confidence. Here, we note that there are several other decisional-locus models that do not propose the balance of evidence hypothesis (reviewed in Pleskac & Busemeyer, 2010; Yeung & Summerfield, 2012; Fleming & Daw, 2017). These models could also be formulated in ways that specify links to CPP amplitudes during decision formation.
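The balance-of-evidence idea can be illustrated with a minimal simulation sketch. The code below is not an implementation of any of the cited models: the function name, the drift rates, the noise level, the boundary and the step size are all hypothetical values chosen for illustration. Two accumulators race to a common boundary, and confidence is read out as the difference between the winning and losing accumulator at the time of the decision.

```python
import random

def race_trial(drift_a, drift_b, boundary=1.0, noise_sd=0.1, dt=0.01,
               max_steps=100_000, rng=None):
    """Simulate one trial of a two-accumulator race (all parameters hypothetical).

    Returns (choice, decision_time, balance_of_evidence), where the balance of
    evidence is the chosen accumulator minus the unchosen accumulator at the
    moment the race ends.
    """
    rng = rng or random.Random()
    acc_a = acc_b = 0.0
    steps = 0
    while acc_a < boundary and acc_b < boundary and steps < max_steps:
        # Each accumulator integrates its own drift plus independent noise,
        # and is not allowed to fall below zero.
        acc_a = max(0.0, acc_a + drift_a * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5)
        acc_b = max(0.0, acc_b + drift_b * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5)
        steps += 1
    choice = "A" if acc_a >= acc_b else "B"
    chosen, unchosen = (acc_a, acc_b) if choice == "A" else (acc_b, acc_a)
    return choice, steps * dt, chosen - unchosen

rng = random.Random(1)
choice, decision_time, balance = race_trial(drift_a=1.5, drift_b=1.0, rng=rng)
```

Under this readout, a larger gap between the two accumulators at the time of the choice corresponds to higher confidence, which is the quantity that pre-response CPP amplitudes are assumed to index in the account described above.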

There are, however, several conceptual and measurement issues that complicate the interpretation of these findings. The first issue is that racing accumulator models specify differences in evidence accumulation across choice options at the time of the decision (approximated by the time of the motor response). However, the majority of studies reporting positive associations between CPP amplitudes and decision confidence have analysed stimulus-locked ERPs (e.g., Squires et al., 1973; Gherman & Philiastides, 2015, 2018; Herding et al., 2019; Zakrzewski et al., 2019; Rausch et al., 2020). Because response times (RTs) vary widely across trials, it is unclear whether these stimulus-locked measures capture amplitude variations that are most pronounced at the time of decision formation. In addition, higher confidence ratings typically co-occur with faster RTs in most perceptual decision tasks (e.g., Johnson, 1939; Vickers & Packer, 1982; Kiani et al., 2014; Rahnev et al., 2020). As the CPP peaks around the time of the response, there are likely differences in the latency and/or amplitude distributions across confidence ratings, which cannot be cleanly dissociated in stimulus-locked ERPs (for further discussion see Ouyang et al., 2015; Feuerriegel et al., 2022).
Studies measuring pre-response CPP amplitudes (which avoid the issues described above) have yielded mixed results. Grogan et al. (2023) reported larger CPP amplitudes in trials with higher confidence ratings, both when choice and confidence were reported simultaneously and for delayed confidence judgments. However, Feuerriegel et al. (2022) reported that confidence was associated with amplitudes of a fronto-central component rather than the CPP (resembling components identified in Kelly & O'Connell, 2013; Burwell et al., 2019). This fronto-central effect also appeared to bias measures at parietal channels via volume conduction across the scalp (although this issue was accounted for in Grogan et al., 2023). It remains to be seen whether associations between response-locked CPP amplitudes and confidence reliably replicate across different decision-making tasks, or if there is a subset of conditions under which these correlations are observed (discussed in Grogan et al., 2023).

Researchers have also reported associations between confidence ratings and amplitudes of the postdecisional Pe component. The Pe is typically measured around 200-400 ms after the time of the first-order decision at the same centro-parietal electrodes used to measure the CPP (e.g., Boldt & Yeung, 2015). Larger (i.e., more positive-going) Pe amplitudes are observed when participants are aware of making an error (Ridderinkhof et al., 2009; Steinhauser & Yeung, 2010; Wessel et al., 2011), and the morphology of this component closely resembles that of the CPP in the lead-up to error detection reports following a first-order decision (Murphy et al., 2015). Larger Pe amplitudes have also been observed in trials with lower confidence ratings (e.g., Boldt & Yeung, 2015; Desender et al., 2019a; Rausch et al., 2020). Based on these findings, Desender et al. (2021b) proposed a model in which the Pe component tracks the degree of postdecisional evidence accumulated against the first-order decision (i.e., the extent of evidence in favour of having made an error) that determines one's degree of confidence. Importantly, their model specifies a monotonic, negative-going association between Pe amplitudes and confidence ratings spanning the range of "certainly wrong" to "unsure" to "certainly correct".
Interpretation of these findings, however, is complicated by measurement issues relating to ERP baselines. Landmark findings of confidence-Pe component associations have used pre-response baselines for their primary analyses (e.g., Boldt & Yeung, 2015; Desender et al., 2019b). These pre-response baseline windows overlap in time with observed differences in pre-response ERPs across confidence ratings, visible at the same centro-parietal electrodes (e.g., Feuerriegel et al., 2022; Grogan et al., 2023). Consequently, ERP differences already present during the pre-response baseline window would be artefactually propagated with opposite polarity to the post-response time window due to the baseline subtraction procedure (for examples see Feuerriegel & Bode, 2022). Although Boldt and Yeung (2015) reported similar results across pre-stimulus and pre-response baselines, others have reported that postdecisional ERP amplitude associations with confidence disappear when instead applying pre-stimulus baselines (Desender et al., 2019b; Feuerriegel et al., 2022; Grogan et al., 2023). Notably, Feuerriegel et al. (2022) reported that, when using a pre-stimulus baseline, Pe amplitudes were specifically associated with confidence ratings indicating certainty in having committed an error (across the range of "surely incorrect" to "unsure" ratings), but not certainty in having made a correct response (across the range of "unsure" to "surely correct"). Grogan et al. (2023) did observe postdecisional ERP waveforms that covaried with confidence ratings when using pre-stimulus baselines; however, these were only observed in conditions where additional, decision-relevant stimuli were presented immediately after the time of the perceptual judgment in each trial. Rather than an evidence accumulation process based on residual representations of sensory evidence or iconic memory (as proposed in Murphy et al., 2015; Desender et al., 2021b), this may have instead reflected the additional accumulation of the sensory evidence provided by the postdecisional stimuli. In conditions where no stimuli were presented after the initial choice, associations between confidence and Pe amplitudes were not observed.
As the model of Desender and colleagues (2021b) provides an elegant, unified account of the Pe linking error detection, confidence, and changes of mind, further systematic investigation is necessary to determine the contexts in which the Pe does (and does not) vary with confidence in decision-making tasks.

1.3. Decorrelating Stimulus Discriminability and Confidence
There is another issue common to both the CPP- and Pe-related findings described above. In previous studies, higher average confidence ratings have been reported in conditions of higher stimulus discriminability (e.g., Rausch et al., 2020; Feuerriegel et al., 2022; Grogan et al., 2023), or in subsets of trials with higher accuracy (e.g., Boldt & Yeung, 2015). It has been proposed that, in such cases, neural correlates of confidence may (at least partly) reflect co-occurring differences in stimulus discriminability or task difficulty rather than one's degree of confidence per se (discussed in Lau & Passingham, 2006; Odegaard et al., 2018; Dou et al., 2023).
However, confidence and accuracy can be (at least partly) dissociated. Recent studies using two-choice discrimination tasks have manipulated task difficulty (e.g., the difference in brightness between two squares in a comparative brightness judgment task, here termed relative evidence) as well as the overall intensity of a relevant sensory attribute (the overall brightness of the two squares, here termed absolute evidence). According to Weber's law, increasing the level of absolute evidence (while keeping relative evidence constant) reduces stimulus discriminability (Geisler, 1989). However, increases in absolute evidence also lead to faster responses and higher levels of reported confidence despite worse accuracy in these conditions (e.g., Zylberberg et al., 2012; Koizumi et al., 2015; Peters et al., 2017; Odegaard et al., 2018; Samaha & Denison, 2022; Ko et al., 2022). Manipulations of stimulus luminance have also produced similar effects on confidence and accuracy in recognition memory tasks (Busey et al., 2000). This implies that absolute evidence influences both decision-making and metacognition, which presents an opportunity to probe neural correlates of confidence that are not strongly correlated with accuracy.
Recently, Dou et al. (2023) manipulated both relative and absolute evidence and reported consistent associations between CPP build-up rates (i.e., rising slopes) and confidence across three experiments. This was found even when statistically controlling for levels of relative evidence, suggesting that pre-decisional neural correlates of confidence may be distinct from ERP components that covary with task difficulty. However, they did not analyse CPP pre-response amplitudes, which are thought to index the extent of evidence accumulated, as opposed to the CPP build-up rate which is linked to the rate of evidence accumulation (O'Connell et al., 2012; Kelly & O'Connell, 2013). The former, which is directly relevant to racing accumulator models, remains to be tested.
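A short worked example may help to show why a fixed luminance difference becomes harder to discriminate as overall luminance rises. The sketch below expresses the difference between two squares relative to their mean luminance, which is the ratio that matters under Weber's law. The luminance values and the function name are hypothetical and are not the stimulus values used in this study.

```python
def difference_to_mean_ratio(luminance_a, luminance_b):
    """Luminance difference expressed relative to the mean luminance of the pair."""
    mean_luminance = (luminance_a + luminance_b) / 2
    return abs(luminance_a - luminance_b) / mean_luminance

# Same relative evidence (a difference of 10 units) at two levels of absolute evidence.
low_absolute = difference_to_mean_ratio(45, 55)    # mean luminance of 50
high_absolute = difference_to_mean_ratio(95, 105)  # mean luminance of 100
```

With these illustrative numbers the ratio halves when the mean luminance doubles, so the same physical difference is a proportionally smaller signal in the high absolute evidence condition.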

1.4. Testing for Pre- and Postdecisional Correlates of Confidence
Despite extensive efforts to characterise the ERP correlates of confidence, a large body of findings is complicated by ERP measurement issues, and recent findings have not been consistently replicated across studies. To further test the generalisability of pre- and postdecisional ERP correlates of confidence across decision contexts, we adapted the brightness judgment design of Ko et al. (2022) and recorded EEG. In this task, participants judged which of two luminance-varying squares was brighter on average and provided confidence ratings after a brief interval. We manipulated the average luminance difference (termed relative evidence in Ko et al., 2022) and the overall luminance of both squares (absolute evidence). As observed in previous work (e.g., Ko et al., 2022; Dou et al., 2023), higher levels of absolute evidence led to worse accuracy but higher confidence ratings. This allowed us to assess neural correlates of confidence ratings that were partially decorrelated from relative evidence and task performance (as done in Dou et al., 2023). Here, we tested for associations between confidence ratings and pre-decisional CPP amplitudes (linked to the extent of evidence accumulation at the time of a decision) as well as postdecisional Pe amplitudes (linked to the extent of evidence accumulated in favour of detecting an error).

2. Method

2.1. Participants
We recruited 36 university students with normal or corrected-to-normal vision.
We report how we determined our sample size, all data exclusions (if any), all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study. Our target sample size (prior to exclusions) was comparable to Feuerriegel et al. (2022, n=35) who reported ERP correlates of confidence. We did not have a specific target effect size; however, we note that our sample size is larger than those of many previous studies (e.g., n=16 in Boldt & Yeung, 2015; n=25 in Rausch et al., 2020). We excluded two participants who failed to report confidence ratings in more than 20% of all trials, three for overall accuracy lower than 55%, and one for reporting the same confidence level in more than 90% of trials in which confidence was reported (same exclusion criteria as in Ko et al., 2022). Two further participants were excluded due to frequent, long-duration blinks and eye movements during the experimental trials as identified by visual inspection of the data. This criterion was not established prior to data analysis but was judged to be a strong indicator that those participants were not performing the task in the intended manner. The final sample included 28 participants (aged 18-39, M=26, SD=6, 16 female). This sample size was comparable to recent studies that reported associations between CPP and Pe amplitudes and confidence (e.g., Feuerriegel et al., 2022; Grogan et al., 2023). Participants were reimbursed 30 AUD for their time. This study was approved by the Human Research Ethics Committee of the Melbourne School of Psychological Sciences (ID 1954641.2).

2.2. Stimuli and Task
Participants completed a comparative brightness judgment task in which two flickering greyscale squares were concurrently presented in each trial. Both squares were 70 × 70 pixels in size and were positioned at equal distances from the centre of the screen, separated from each other by 180 pixels. Stimuli were displayed against a black background on a Sony Trinitron Multiscan G420 CRT Monitor (1280 × 1024 pixels; 75 Hz refresh rate) gamma-corrected using a ColorCAL MKII Colorimeter (Cambridge Research Systems). Stimuli were presented using Psychtoolbox-3 (Brainard, 1997; Kleiner et al., 2007) running in MATLAB 2018b (The Mathworks).
Code used for stimulus presentation will be available at osf.io/cxvMm at the time of publication.
Within each trial (depicted in Figure 1A), the luminance of each flickering square changed with each monitor refresh (every 13.3 ms). Luminance values of each square were randomly sampled from a pair of truncated normal distributions. Luminance values differed across conditions with respect to the difference between squares (low/medium/high relative evidence) and the overall luminance across both squares (low/medium/high absolute evidence).
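The frame-by-frame sampling described above can be sketched as follows. This is an illustration only and not the authors' Psychtoolbox code: the means, standard deviation, truncation bounds and function names are hypothetical, and luminance is expressed in arbitrary normalised units between 0 and 1. The sketch uses simple rejection sampling to draw from a truncated normal distribution.

```python
import random

def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw one value from a normal distribution truncated to [lower, upper]
    by redrawing until the sample falls inside the bounds."""
    while True:
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value

def luminance_sequence(mean, sd, n_frames, rng, lower=0.0, upper=1.0):
    """One luminance value per monitor refresh for a single square."""
    return [sample_truncated_normal(mean, sd, lower, upper, rng)
            for _ in range(n_frames)]

rng = random.Random(0)
n_frames = int(1.5 * 75)  # full refreshes within a 1,500 ms maximum at 75 Hz

# Hypothetical condition: the two means differ by 0.10 (relative evidence)
# and are centred on 0.50 (absolute evidence).
brighter = luminance_sequence(mean=0.55, sd=0.10, n_frames=n_frames, rng=rng)
darker = luminance_sequence(mean=0.45, sd=0.10, n_frames=n_frames, rng=rng)
```

Raising both means by the same amount would increase absolute evidence while leaving relative evidence unchanged, and widening the gap between the means would increase relative evidence.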

2.3. Procedure
After giving written consent and receiving task instructions, participants were seated in a dark testing booth 70 cm from the computer monitor. They completed a training session while the experimenter stayed in the testing booth to ensure task comprehension. Participants then completed the main experiment alone. No part of the study procedures was pre-registered prior to the research being conducted.
In each trial (depicted in Figure 1A) a white fixation dot was presented for 1,500 ms. The dot changed to red for 500 ms to signal the imminent appearance of the flickering squares. Following this, the squares appeared. Participants indicated which square appeared brighter on average by pressing one of two buttons on a seven-button Cedrus response pad (RB-740, Cedrus Corporation) using their left and right index fingers. Stimuli were presented for a maximum of 1,500 ms and disappeared immediately after a response was made. Participants were instructed to respond as quickly as possible.
Participants were then presented with a blank screen for 1,000 ms and subsequently rated their degree of confidence using the same seven-button response pad. During this period, the word 'Confidence' was presented in the centre of the screen. Participants indicated their confidence on a seven-point scale with options "surely incorrect" (1), "probably incorrect" (2), "maybe incorrect" (3), "unsure" (4), "maybe correct" (5), "probably correct" (6) and "surely correct" (7). The midpoint rating of "unsure" signified that they were unsure whether the brightness judgment was correct or incorrect (i.e., they indicated that they were guessing). They were also instructed to make their confidence rating as quickly as possible, within 1,500 ms of the onset of the confidence rating prompt. No confidence rating was required if the brightness judgment was "too slow" (>1,500 ms RT) or "too quick" (<250 ms RT). In this case, only the respective "too early/slow" feedback was presented for 1,500 ms, and then the next trial began.
The experiment included 1,008 experimental trials equally allocated across 14 blocks of 72 trials each. An equal number of trials from all conditions were randomly interleaved within each block. Each block was followed by a self-terminated rest period.
Before the experiment, participants completed a training block of 36 trials without making confidence ratings. Instead, they received feedback on their accuracy after each trial to familiarize themselves with the brightness judgment task.
Participants then completed another training block of 36 trials in which both brightness and confidence judgments were required. No performance feedback was given in this training, and a confidence rating scale displaying each option was presented on the screen for 1,500 ms. Participants were instructed that during the main experiment, the visual presentation of the scale would be removed, and the confidence judgment would only be prompted by the word "Confidence".
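The trial allocation can be checked with a brief sketch. It assumes that crossing the three relative evidence levels with the three absolute evidence levels gives nine conditions, which is an inference from the design description rather than something stated explicitly here. The function name is hypothetical, and any further counterbalancing (for example, which side the brighter square appeared on) is not modelled.

```python
import itertools
import random

LEVELS = ("low", "medium", "high")
# Each condition is a (relative evidence, absolute evidence) pair.
CONDITIONS = list(itertools.product(LEVELS, LEVELS))

def make_block(trials_per_block, rng):
    """Return a shuffled block containing each condition equally often."""
    repeats, remainder = divmod(trials_per_block, len(CONDITIONS))
    if remainder:
        raise ValueError("trials_per_block must be a multiple of the number of conditions")
    block = CONDITIONS * repeats
    rng.shuffle(block)
    return block

rng = random.Random(0)
blocks = [make_block(72, rng) for _ in range(14)]
total_trials = sum(len(block) for block in blocks)
trials_per_condition = total_trials // len(CONDITIONS)
```

Under this assumption each block holds 8 trials per condition, giving 112 trials per condition across the 1,008 experimental trials.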

2.4. Task Performance Analyses
To characterise patterns of task performance we first examined effects of relative and absolute evidence manipulations on brightness judgment accuracy, RTs and confidence ratings. Analyses of RTs and confidence ratings were done separately for trials with correct and erroneous brightness judgments (as done by Fleming et al., 2018; Ko et al., 2022). Code used for analyses will be available at osf.io/cxvMm at the time of publication. No part of the study analyses was pre-registered prior to the research being conducted.
We fit generalised linear mixed effects models using the R package lme4 (Bates et al., 2015). We included relative evidence, absolute evidence, and their interaction as fixed effects and participant as a random intercept in all models. We fit models specifying binomial distributions with a logit link function to model accuracy, gamma distributions with an identity link function for RTs (as recommended by Lo & Andrews, 2015), and Gaussian distributions with an identity link function for confidence (as done by Fleming et al., 2018). Model equations and outputs are presented in the Supplementary Material.
We additionally performed frequentist and Bayesian post-hoc paired-samples t tests using JASP v0.17.1 (JASP Core Team, Cauchy prior distribution, width 0.707, default settings) to determine whether average confidence ratings were higher for trials with correct as compared to erroneous responses. We performed this analysis (using pooled data across relative and absolute evidence conditions) for all trials, and for the subset of trials that contributed to averaged ERPs for correct responses and errors.
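As a schematic of the accuracy model described above, the specification can be written as follows for trial i of participant j. This is a sketch under assumptions, not the equation reported in the Supplementary Material: the symbols are introduced here for illustration, and the coding of relative evidence (R) and absolute evidence (A) as single numeric predictors is assumed, since factor coding would add further coefficients.

```latex
\operatorname{logit}\, P(\text{correct}_{ij}) = \beta_0 + \beta_R R_{ij} + \beta_A A_{ij} + \beta_{RA} R_{ij} A_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

The RT and confidence models share the same linear predictor on the right-hand side but replace the logit link with an identity link and use gamma and Gaussian response distributions, respectively.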

2.5. EEG Data Acquisition and Processing
EEG data were recorded using a Biosemi Active Two system with 64 active electrodes at a sampling rate of 512 Hz. Recordings were grounded using common mode sense and driven right leg electrodes (http://www.biosemi.com/faq/cms&drl.htm). Six external electrodes were additionally included: two behind the left and right mastoids, two placed 1 cm from the outer canthi, and one above and one below the right eye.
EEG data were processed using EEGLab v.2022.0 (Delorme & Makeig, 2004) in MATLAB (Mathworks). First, we identified excessively noisy channels by visual inspection (median number of bad channels = 0, range 0-9) and excluded these from average reference calculations and Independent Components Analysis (ICA). Sections with large artefacts were also manually identified and removed.
We then re-referenced the data to the average of all channels, low-pass filtered the data at 30 Hz (EEGLab Basic Finite Impulse Response Filter New, default settings), and removed one extra channel (AFz) to correct for the rank deficiency caused by the average reference. We processed a copy of this dataset in the same way and additionally applied a 0.1 Hz high-pass filter (EEGLab Basic FIR Filter New, default settings) to improve stationarity for the ICA. We then ran ICA on the duplicate dataset using the RunICA extended algorithm (Jung et al., 2000). Independent component information was copied to the unfiltered dataset (e.g., as done by Feuerriegel et al., 2018). Components associated with muscular and ocular activity were identified and removed based on the guidelines in Chaumon et al. (2015). Previously removed channels, including AFz, were then interpolated from the cleaned dataset using spherical spline interpolation.
The resulting EEG data were then segmented from -100 to 2,200 ms relative to flickering square onset. Segments were baseline corrected using the pre-stimulus interval. Epochs with amplitudes exceeding ±150 μV at any scalp electrode were excluded from further analyses. Across participants, the number of epochs included in EEG analyses ranged from 335 to 982 (median = 744). From these data, we derived response-locked segments ranging from -500 to 700 ms relative to the brightness judgment button press.
Two versions of the response-locked data were created. The first used the same pre-stimulus baseline as the stimulus-locked segments. The second used a pre-response baseline spanning -100 to 0 ms relative to the button press response. This was done to investigate how the use of pre-response baselines (commonly used in this area of research) influences measures of post-decisional ERP components such as the Pe (see Feuerriegel et al., 2022; Feuerriegel & Bode, 2022).
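The concern about pre-response baselines can be made concrete with a toy example. The sketch below uses two invented waveforms that are identical after the response but differ by 2 μV before it. The waveform values, the 10 ms sampling grid and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def baseline_correct(waveform, times, window):
    """Subtract the mean amplitude within [start, end) ms from every sample."""
    start, end = window
    baseline = [v for v, t in zip(waveform, times) if start <= t < end]
    baseline_mean = sum(baseline) / len(baseline)
    return [v - baseline_mean for v in waveform]

def mean_amplitude(waveform, times, window):
    """Mean amplitude within [start, end) ms."""
    start, end = window
    values = [v for v, t in zip(waveform, times) if start <= t < end]
    return sum(values) / len(values)

times = list(range(-500, 700, 10))  # ms relative to the response

# Hypothetical ERPs (in microvolts): the conditions differ only before the response.
low_confidence = [0.0 for _ in times]
high_confidence = [2.0 if t < 0 else 0.0 for t in times]

pre_response_baseline = (-100, 0)
post_response_window = (200, 350)

raw_difference = (mean_amplitude(high_confidence, times, post_response_window)
                  - mean_amplitude(low_confidence, times, post_response_window))

corrected_high = baseline_correct(high_confidence, times, pre_response_baseline)
corrected_low = baseline_correct(low_confidence, times, pre_response_baseline)
artefactual_difference = (mean_amplitude(corrected_high, times, post_response_window)
                          - mean_amplitude(corrected_low, times, post_response_window))
```

Before baseline correction the post-response difference is zero. After subtracting the pre-response baseline, the 2 μV pre-response difference reappears in the post-response window with the opposite sign, even though the underlying post-response activity is identical across conditions.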

2.6.1. CPP amplitude analyses.
We measured response-locked CPP amplitudes as the average amplitude between -130 and -70 ms relative to the response, averaged across parietal electrodes Pz, P1, P2, CPz, and POz (same time window as Steinemann et al., 2018; Feuerriegel et al., 2021, 2022). For these analyses, we used a pre-stimulus baseline. To link our results to previous work using stimulus-locked ERPs (e.g., Gherman & Philiastides, 2018; Rausch et al., 2020), we also measured stimulus-locked CPP amplitudes as the average amplitude within the time window of 350-500 ms from flickering square onset.
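A mean-amplitude measure of this kind can be sketched as below for a single response-locked epoch sampled at 512 Hz and spanning -500 to 700 ms. This is an illustration rather than the authors' MATLAB code: the data structure (a dictionary of per-channel sample lists), the function names, and the choice to include both window endpoints are assumptions.

```python
SAMPLING_RATE_HZ = 512
EPOCH_START_MS = -500
CPP_CHANNELS = ("Pz", "P1", "P2", "CPz", "POz")
CPP_WINDOW_MS = (-130, -70)

def ms_to_sample(time_ms):
    """Convert a latency in ms (relative to the response) to a sample index."""
    return round((time_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)

def mean_window_amplitude(epoch, channels, window_ms):
    """Average amplitude across the given channels and time window.

    `epoch` maps channel names to lists of amplitudes (one value per sample).
    Both window endpoints are included, which is an assumption.
    """
    first = ms_to_sample(window_ms[0])
    last = ms_to_sample(window_ms[1])
    values = [epoch[channel][i] for channel in channels
              for i in range(first, last + 1)]
    return sum(values) / len(values)

# Hypothetical epoch: each channel holds a constant amplitude so the result is easy to check.
n_samples = ms_to_sample(700) + 1
example_epoch = {channel: [float(k + 1)] * n_samples
                 for k, channel in enumerate(CPP_CHANNELS)}
cpp_amplitude = mean_window_amplitude(example_epoch, CPP_CHANNELS, CPP_WINDOW_MS)
```

The same function could be applied with a different window to obtain stimulus-locked or post-response measures, provided the epoch start time is adjusted to match the segment in question.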
For CPP amplitude analyses we compared correct and erroneous responses using paired-samples frequentist and Bayesian t-tests as implemented in JASP v0.17.1 (JASP Core Team, Cauchy prior distribution, width 0.707, default settings). We additionally fitted linear regression models using MATLAB to predict CPP mean amplitudes based on confidence ratings within the range spanning "maybe correct" (5) to "surely correct" (7) for each participant separately, as done by Feuerriegel et al. (2022). As the "guessing" (4) rating may also be considered as the lowest certainty condition, we also included complementary analyses that included this rating. For these analyses, we included both trials with correct responses and errors. The resulting Beta coefficients (regression model slopes) were tested at the group level using one-sample frequentist and Bayesian t-tests (as done by Feuerriegel et al., 2022).
We did not perform analyses using confidence ratings between "surely incorrect" and "maybe incorrect" (i.e., ratings indicating an error had occurred) because there were insufficient numbers of trials for such restricted analyses (median number of trials per participant = 33.5, and 9 participants with < 20 trials). Please note that, although there were substantial numbers of trials with errors, in those trials, participants often provided confidence ratings indicating that they had made a correct response (as also observed in Ko et al., 2022).
To test for associations between CPP amplitudes and confidence while controlling for relative evidence, we performed confidence-CPP amplitude regression analyses using the methods described above for trials within each relative evidence condition separately (similar to analyses in Tagliabue et al., 2019). We used restricted ranges of confidence ratings spanning "maybe correct" to "surely correct". In a complementary analysis we averaged each of these beta (regression slope) estimates across participants to derive overall measures of associations with confidence.
We also assessed effects of relative and absolute evidence independently of effects of confidence, to determine whether CPP amplitudes also covaried with the quality of information provided by the stimulus (similar to Odegaard et al., 2018; Tagliabue et al., 2019). We used the same regression analysis methods to test for associations with relative evidence level (low, medium, high) when holding confidence constant, for "maybe correct", "probably correct" and "surely correct" ratings separately. Beta values were also averaged across confidence rating conditions to derive more general estimates of relative evidence effects. We repeated these analyses using absolute evidence level (low, medium, high) as a predictor of CPP amplitudes while holding confidence constant.
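The two-stage approach described above (a regression slope per participant, then a group-level test of the slopes against zero) can be sketched as follows. This is a simplified illustration rather than the authors' MATLAB and JASP pipeline: the data are invented, the function names are hypothetical, and only the frequentist t statistic is computed (the Bayesian test is omitted).

```python
import math
import statistics

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def one_sample_t(values, mu=0.0):
    """One-sample t statistic and degrees of freedom for testing the mean against mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu) / standard_error, n - 1

# Invented single-trial data: confidence ratings (5 to 7) and CPP amplitudes (microvolts).
ratings = [5, 6, 7, 5, 6, 7]
amplitudes_by_participant = [
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    [2.0, 2.5, 3.0, 2.0, 2.5, 3.0],
    [0.0, 1.5, 3.0, 0.0, 1.5, 3.0],
]

# Stage 1: one regression slope per participant.
slopes = [ols_slope(ratings, amplitudes) for amplitudes in amplitudes_by_participant]

# Stage 2: test whether the slopes differ from zero across participants.
t_value, degrees_of_freedom = one_sample_t(slopes)
```

A positive mean slope that differs reliably from zero would indicate that amplitudes increase with confidence within participants, which is the pattern tested for in the CPP analyses.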

For Pe amplitude measures we baseline-corrected single-trial ERPs using pre-stimulus and pre-response baselines in separate analyses. We measured Pe amplitudes as the mean amplitude within 200-350 ms following the response, using the same set of parietal electrodes as for the CPP (same time window as Nieuwenhuis et al., 2001; Di Gregorio et al., 2018; Feuerriegel et al., 2022). Paired-samples t-tests were conducted to compare Pe amplitudes across trials with correct and erroneous responses. Within-subject regressions were performed using the predictor of confidence as described above. We did not find associations between Pe amplitudes and confidence in the data using pre-stimulus baselines, and so we did not conduct additional analyses controlling for relative or absolute evidence.

3.1. Task Performance
Both relative and absolute evidence manipulations produced intended effects on accuracy, RTs, and confidence ratings, consistent with Ko et al. (2022, see also Ratcliff et al., 2018; Turner et al., 2021b). Task performance for each relative and absolute evidence level combination is shown in Figure 2. Accuracy was higher in conditions with higher relative evidence (i.e., larger mean luminance differences between the squares, p < .001) and lower in conditions of higher absolute evidence (i.e., higher overall luminance across the two squares, p < .001, see Figure 2A). RTs for correct and error trials were faster in conditions of higher absolute evidence and higher relative evidence (p's < .01 for all main effects, Figure 2B-C) except for RTs in error trials where relative evidence did not have a significant effect. Higher confidence ratings were made in conditions of higher relative evidence and higher absolute evidence for trials with correct responses (p's < .001, Figure 2D). This was despite lower accuracy in higher absolute evidence conditions and consistent with findings in Ko et al. (2022). In trials with errors, confidence ratings were higher in conditions of higher absolute evidence and lower relative evidence (p's < .001, Figure 2E). Model outputs and the full set of statistical results (including interaction effects) are reported in the Supplementary Material. Associations between confidence, accuracy, and RTs are plotted in Supplementary Figure S1.

We also performed post-hoc paired-samples t tests to assess whether confidence ratings differed across trials with correct and erroneous responses. When including all trials, confidence was higher on average in trials with correct responses, t(27) = 12.09, p < .001, BF10 = 3.88 × 10^…. However, when only including the subset of trials that contributed to averaged ERPs for correct and error responses, we did not observe differences in average confidence ratings, t(27) = 0.90, p = .374, BF10 = 0.29.

3.2. CPP Amplitudes During Decision Formation
First, we analysed CPP amplitudes time-locked to stimulus onset as done in some previous studies (e.g., Rausch et al., 2020). We did not find significant differences in stimulus-locked CPP amplitudes across trials with correct responses and errors, t(27) = 0.13, p = .898, BF10 = 0.20 (Figure 3A). We also did not find evidence for associations with confidence across the range of "maybe correct" to "surely correct" ratings, t(27) = -0.43, p = .673, BF10 = 0.22 (Figure 3C). Results of complementary analyses that included the "guessing" trials as the lowest confidence rating did not yield associations with CPP amplitudes, t(27) = -0.27, p = .787, BF10 = 0.21.

We then repeated these analyses for CPP amplitudes time-locked to the response. We did not find differences in pre-response CPP amplitudes across correct responses and errors, t(27) = 0.27, p = .790, BF10 = 0.21 (Figure 3B). Here, please note that average confidence ratings did not significantly differ between correct responses and errors when analysing trials that contributed to these averaged ERPs, as reported above. To assess associations between pre-response CPP amplitudes and certainty in having made a correct response as reported in Feuerriegel et al. (2022) and Grogan et al. (2023), we performed regression analyses using restricted ranges of confidence ratings, spanning "maybe correct" (5) to "surely correct" (7). Here, omitting the "guessing" (4) trials further removed any ambiguity about whether this option could have been chosen in the case of a lapse of attention. CPP amplitudes were positively associated with confidence, t(27) = 4.20, p < .001, BF10 = 111.81 (Figure 3D). Scalp maps of group-averaged beta values (i.e., regression model slopes) indicated that associations between CPP amplitudes and confidence were most prominent over parietal channels (Supplementary Figure S2A). We also ran complementary analyses that included the "guessing" trials as the lowest certainty rating. CPP amplitudes remained associated with confidence in these analyses, t(27) = 2.25, p = .032, BF10 = 1.74.

Figure 3 (caption, continued). ... errors. C, D) ERPs for confidence ratings ranging from "maybe correct" to "surely correct". Grey shaded regions denote the mean amplitude measurement time windows for the CPP. Asterisks denote statistically significant associations between confidence ratings and ERP component amplitudes (*** denotes p < .001 for the regression analysis using confidence ratings ranging from "maybe correct" to "surely correct").

Associations with confidence when controlling for relative evidence.
We also tested for associations between pre-response CPP amplitudes and certainty in having made a correct response while controlling for effects of relative evidence. To do this, we fit regression models separately for trials within each relative evidence level. We again restricted our analyses to confidence judgments ranging between "maybe correct" (5) and "surely correct" (7), given the low trial numbers for each analysis in the lower confidence range after splitting by relative evidence level.

Effects of relative and absolute evidence when controlling for confidence.
We also tested for associations between the level of relative evidence and pre-response CPP amplitudes while keeping confidence ratings constant. This was done for trials with "maybe correct", "probably correct" and "surely correct" ratings in separate analyses. We did not find associations between CPP amplitudes and relative evidence level for "maybe correct", t(27) = -0.73, p = .473, BF10 = 0.26, "probably correct", t(27) = 0.03, p = .975, BF10 = 0.20, or "surely correct" ratings, t(27) = 0.31, p = .760, BF10 = 0.21, nor when averaging beta values across each of these analyses within each participant, t(27) = -0.36, p = .723, BF10 = 0.21.
Taken together, these results indicate that CPP amplitudes covaried with confidence rather than levels of relative or absolute evidence.
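The two-stage logic behind the statistics above can be sketched as follows: a regression slope (beta) of single-trial amplitude on the predictor is estimated within each participant, and the slopes are then tested against zero across participants. This is a minimal illustration in plain Python with made-up numbers; the helper names and the toy data are hypothetical, and treating the group-level statistic as a one-sample t-test on per-participant slopes is an assumption about the analysis rather than a restatement of the authors' code.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - popmean) / math.sqrt(variance / n), n - 1


# Hypothetical single-trial data: confidence ratings (5-7) and amplitudes (microvolts).
participants = {
    "p01": ([5, 6, 7, 5, 6, 7], [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]),
    "p02": ([5, 6, 7], [2.0, 2.5, 3.0]),
    "p03": ([5, 6, 7], [4.0, 4.0, 7.0]),
}

# Stage 1: one slope per participant. Stage 2: test the slopes against zero.
betas = [ols_slope(conf, amp) for conf, amp in participants.values()]
t_stat, df = one_sample_t(betas)
print(betas, round(t_stat, 3), df)
```

To control for a second factor in the way described in the text, the same slope would be estimated separately within each level of that factor and the resulting betas averaged within each participant before the group-level test.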

3.3. Post-Decisional Pe Component Amplitudes
We analysed Pe component amplitudes using both pre-stimulus and pre-response ERP baselines in separate analyses, as done in Feuerriegel et al. (2022) and Grogan et al. (2023). This was done to systematically assess whether the use of pre-response baselines can artificially produce associations between Pe amplitudes and confidence, in cases where there are already ERP differences during the pre-response baseline window (e.g., Feuerriegel & Bode, 2022).

3.3.1. Analyses using pre-stimulus baselines.
When using pre-stimulus baselines (which circumvent issues with propagating potential pre-response ERP differences into the Pe window), we did not observe Pe amplitude differences between errors and correct responses, t(27) = -0.24, p = .81, BF10 = 0.21 (Figure 4A). Pe amplitudes were not associated with decision confidence, t(27) = 0.417, p = .680, BF10 = 0.22 (Figure 4C). We also ran complementary analyses that included the "guessing" trials as the lowest confidence rating. Pe amplitudes were not associated with confidence in these analyses, t(27) = -1.02, p = .316, BF10 = 0.32.
Because there was no indication of Pe amplitudes covarying with confidence, we do not report additional analyses controlling for relative or absolute evidence here.

3.3.2. Analyses using pre-response baselines.
When using pre-response baselines (which, however, might propagate potential pre-existing ERP differences into the Pe window), we did not identify differences in Pe amplitudes following errors compared to correct responses, t(27) = -1.13, p = .268, BF10 = 0.36 (Figure 4B). However, there were clear associations between Pe amplitudes and confidence ratings, t(27) = -2.87, p = .008, BF10 = 5.61 (Figure 4D), such that more positive-going amplitudes were observed in trials with lower confidence ratings. We also ran complementary analyses that included the "guessing" trials as the lowest confidence rating. Pe amplitudes remained associated with confidence in these analyses, t(27) = -3.71, p < .001, BF10 = 35.72. However, as explained above, these statistically significant associations are most likely due to differences already present before the response (during the CPP measurement window, Figure 3D) that largely overlap with the pre-response baseline window.
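The baseline-propagation argument can be illustrated with a toy example. In the sketch below, two hypothetical condition-average ERPs are identical after the response and differ only before it; subtracting a pre-response baseline then creates an apparent post-response difference of the opposite sign. All values, the -100 to 0 ms baseline window, and the function names are illustrative assumptions, not the authors' data or code.

```python
def window_mean(times_ms, erp_uv, start_ms, end_ms):
    """Mean of the samples whose time stamps fall inside [start_ms, end_ms]."""
    samples = [v for t, v in zip(times_ms, erp_uv) if start_ms <= t <= end_ms]
    return sum(samples) / len(samples)


def measure(times_ms, erp_uv, measure_window, baseline_window=None):
    """Mean amplitude in measure_window, optionally minus the baseline-window mean."""
    amplitude = window_mean(times_ms, erp_uv, *measure_window)
    if baseline_window is not None:
        amplitude -= window_mean(times_ms, erp_uv, *baseline_window)
    return amplitude


# Response-locked time axis in ms (response at 0); amplitudes in microvolts.
times = list(range(-200, 401, 100))
# Hypothetical ERPs: larger pre-response amplitude for high confidence,
# identical amplitudes for both conditions after the response.
low_conf = [2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
high_conf = [4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0]

PE_WINDOW = (200, 350)
PRE_RESPONSE_BASELINE = (-100, 0)  # assumed window, for illustration only

raw_low = measure(times, low_conf, PE_WINDOW)
raw_high = measure(times, high_conf, PE_WINDOW)
corrected_low = measure(times, low_conf, PE_WINDOW, PRE_RESPONSE_BASELINE)
corrected_high = measure(times, high_conf, PE_WINDOW, PRE_RESPONSE_BASELINE)

print(raw_low, raw_high)              # no difference without the pre-response baseline
print(corrected_low, corrected_high)  # low confidence now looks more positive
```

In this toy case the baseline-corrected amplitudes are more positive for the low-confidence condition even though the post-response signal is the same in both, which is the direction of the association reported with pre-response baselines.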
In summary, these results show that in our experiment, in which error detection was rare, the Pe did not reliably reflect differences between errors and correct responses. When avoiding confounds relating to pre-response ERP differences, the Pe also did not reflect differences in confidence judgments.

Figure 4 (caption, continued). ERPs for confidence ratings ranging from "maybe correct" to "surely correct". In all plots the grey shaded area denotes the 200-350 ms time window used to measure the Pe component. The shaded magenta area denotes the pre-response baseline time window. Asterisks denote statistically significant associations between confidence ratings and ERP component amplitudes (** denotes p < .01 for the regression analysis using confidence ratings from "maybe correct" to "surely correct").

4. Discussion
To test for electrophysiological correlates of confidence we presented participants with a comparative brightness judgment task. We elicited a wide range of confidence ratings by varying the difference in average luminance across the two flickering squares (relative evidence), and the overall brightness of both squares (absolute evidence). CPP amplitudes during decision formation were positively associated with confidence, and specifically certainty in having made a correct response. Importantly, this was not simply a consequence of covariation between relative evidence (task difficulty) and confidence ratings. Our findings complement reports of parietal (Grogan et al., 2023) and fronto-central (Feuerriegel et al., 2022) correlates of confidence during decision formation, suggesting that critical computations that inform confidence judgments occur during this time window. However, we did not find associations between confidence and Pe component amplitudes following each decision, except when these were likely to be artificially produced using pre-response baselines (as in Feuerriegel et al., 2022; Grogan et al., 2023). We show that, in some decision contexts, postdecisional ERP trajectories do not necessarily covary with confidence. Instead, we propose that postdecisional ERP dynamics depend on whether postdecisional evidence accumulation is relied upon to inform confidence judgments, and also specific conditions such as the reporting of decision errors (e.g., Murphy et al., 2015; Feuerriegel et al., 2022) or continued integration of additional sensory evidence after a decision is made (Grogan et al., 2023).

4.1. ERP Correlates of Confidence During Decision Formation
We found larger (more positive) CPP amplitudes during decision formation when participants made higher confidence ratings (similar to Feuerriegel et al., 2022; Grogan et al., 2023). This association persisted even when controlling for relative evidence, which often covaries with confidence across a variety of decision-making tasks (e.g., Philiastides et al., 2014; Fleming et al., 2018; Feuerriegel et al., 2022; Grogan et al., 2023). The absolute evidence manipulation in our paradigm allowed us to elicit a range of confidence judgments at each level of relative evidence. We found that CPP amplitudes were specifically associated with confidence rather than relative evidence in our data (similar to electrocorticography results in Peters et al., 2017) as also found in relation to perceptual awareness ratings (Tagliabue et al., 2019).

O'Connell, 2013), situations have been identified in which pre-response amplitudes systematically vary across conditions (e.g., Steinemann et al., 2018; Kelly et al., 2021; Feuerriegel et al., 2021). Assuming that the CPP does trace the degree of (unsigned) evidence in favour of the chosen option in each trial, modulations of pre-response amplitudes would be consistent with racing accumulator models of confidence that specify more accumulated evidence in favour of the chosen option in trials with higher confidence ratings (e.g., Vickers, 1979; Vickers & Packer, 1982; see also Smith et al., 2022). Although processes occurring during decision formation are unlikely to be the sole source of information used to guide confidence judgments (e.g., Fleming & Daw, 2017; Turner et al., 2021a; Desender et al., 2021a), this interpretation would suggest that evidence accumulation dynamics, which can be traced using neural measures, are important to consider when modelling confidence judgments.
This interpretation also has implications for post-decisional evidence accumulation models of confidence (e.g., Pleskac and Busemeyer, 2010; Moran et al., 2015; Desender et al., 2021b), most of which specify a fixed decision boundary for the initial choice to preserve model identifiability and prevent over-fitting (for a more flexible model see van den Berg et al., 2016b). If the extent of evidence accumulation at the time of the first-order choice varies across confidence judgments (as specified in racing accumulator models), then the fixed decision boundary assumption of many postdecisional locus models is false. If this is the case, it is likely that (at least part of) the variance captured by postdecisional processes in these models is due to processes that occur during the first-order judgment (e.g., Turner et al., 2022). To account for this, hybrid decisional/postdecisional process models could be developed to better delineate the consequences of computations occurring over each time window. However, model identifiability should be ascertained (e.g., using parameter recovery methods, Miletić et al., 2017; Evans et al., 2020) before such models are fitted to choice and confidence data.
Here, we note that specific features of our trial design may have enabled detection of pre-decisional correlates of confidence. Our findings are consistent with those in Grogan et al. (2023) when a blank screen was presented between choice and delayed confidence ratings, but not when postdecisional stimuli were presented that provided additional information about the accuracy of the first-order decision. As noted by Grogan and colleagues, presentation of additional, decision-relevant stimuli (or substantial postdecisional evidence accumulation, e.g., Resulaj et al., 2009) may weaken the relationship between processes during decision formation and subsequent confidence ratings that are provided after some delay.
We also note that the topographic distributions of beta values in Supplementary

4.2. Postdecisional ERP Correlates of Confidence
In our experiment, Pe amplitudes did not covary with confidence ratings.
However, the application of a pre-response baseline produced spurious associations with confidence (discussed in Feuerriegel et al., 2022; Feuerriegel & Bode, 2022).
These findings are congruent with previous work that included analyses using both pre-stimulus and pre-response baselines (Desender et al., 2019b; Feuerriegel et al., 2022; Grogan et al., 2023; but see Boldt & Yeung, 2015). Although Feuerriegel et al. (2022) reported more positive-going Pe amplitudes for error trials with lower confidence ratings, this was specifically linked to certainty in having made an error. In other words, an association was found specifically when analysing the range of confidence ratings indicating that an error had occurred, similar to the "surely incorrect" to "maybe incorrect" range here. We did not obtain sufficient numbers of trials with error-indicating confidence ratings to evaluate the same effect in our dataset.
Our findings are relevant to the recently-proposed model of Desender et al. (2021b) that posits that postdecisional parietal ERP components (specifically the Pe) reflect a continued evidence accumulation process that determines confidence judgments. According to this model, two-alternative discrete choice decisions are initially made according to a double-bounded evidence accumulation process. After the first-order judgment is made, the ensuing metacognitive confidence judgment is proposed to reflect the degree of evidence accumulated in favour of having made an error. Importantly, the model of Desender et al. (2021b) predicts that the extent of accumulated postdecisional evidence will be reflected in the amplitude of the Pe component, and that the amplitude of the Pe component will show a monotonic, inverse relationship with decision confidence ratings spanning the range of "surely incorrect" to "surely correct" (as depicted in their Fig. 1C).
Contrary to these predictions, we did not observe such differences in Pe amplitudes across confidence ratings. Our findings indicate that, in some circumstances, confidence ratings are more closely associated with neural activity at the time of the decision rather than postdecisional ERPs. Notably, in our task any postdecisional evidence accumulation may not have strongly influenced confidence judgments because the stimuli were difficult to discriminate and fluctuated over time.
This contrasts with Flanker or Stroop tasks in which the stimuli are highly visible, and participants can more easily detect their errors (e.g., Murphy et al., 2015).
Postdecisional correlates of evidence accumulation might also be more readily observed when participants perform an error detection task (Murphy et al., 2015) or more frequently provide confidence ratings indicating an error had occurred (Feuerriegel et al., 2022), or when additional, decision-relevant stimuli are presented after the first-order decision (Grogan et al., 2023). This supports the notion of the Pe as a well-established correlate of error awareness (Falkenstein et al., 1991; O'Connell et al., 2007; Charles et al., 2013; Murphy et al., 2015), as well as the idea that a postdecisional ERP component can track the influence of additional stimuli after a decision has been made (Grogan et al., 2023). Taken together, these findings suggest that the link between Pe amplitudes and confidence depends on the extent to which postdecisional evidence accumulation is used to determine confidence judgments or error detection decisions, which may vary across decision-making tasks.
Our results also do not challenge the idea that, in some cases, decision reversals and confidence judgments can be substantively influenced by continued evidence accumulation, even without stimuli being presented between the choice and confidence rating. For example, changes of mind can be driven by stimuli that appear during (and immediately before) motor action execution, where these stimuli do not influence the first-order decision (e.g., Resulaj et al., 2009; Turner et al., 2022).
However, we note that, in our dataset and most existing work involving difficult perceptual discrimination tasks, there is no clear evidence of covariation between Pe component amplitudes and confidence as would be expected from a substantial influence of postdecisional evidence accumulation.

4.3. Limitations
Our findings should be interpreted with the following caveats in mind. First, participants did not provide sufficient numbers of confidence ratings indicating that an error had occurred (i.e., maybe/probably/surely incorrect ratings). This is despite them making objectively erroneous decisions in a substantial proportion of trials (indicated by the accuracy plots in Figure 2A). This is consistent with distributions of confidence ratings in Ko et al. (2022) using an almost identical task, which was designed to elicit substantial variance in confidence ratings within each relative evidence condition. However, this meant that there were not enough trials to test for ERP correlates of certainty in having made an error. This may also be why we did not observe differences in Pe amplitudes across trials with correct responses and errors, as the Pe is more closely linked to error awareness rather than error commission (e.g., Charles et al., 2013; Murphy et al., 2015).
In addition, ranges of confidence ratings varied across individuals, which is why we used our linear regression approach to test for associations between relatively higher and lower confidence reports. This variability is likely to arise from different criteria being used for different confidence rating categories across participants (e.g., DeCarlo, 2010; Peters & Lau, 2015). In other words, one participant's "probably correct" rating may not necessarily map onto the same internal degree of confidence as another's "probably correct" rating, making comparison of individual ratings difficult at the group level. For this reason, we did not judge it meaningful to test for differences between specific pairs of confidence ratings (e.g., "probably correct" compared to "surely correct"). This prohibited us from assessing non-linearities in the mapping of ERP amplitudes to confidence ratings. Small-N studies (Smith & Little, 2018) may be appropriate for better characterising the functional form of ERP component-confidence associations within individuals in contexts where confidence rating criteria are more stable over time.
We also note that our findings do not pertain to the validity of evidence accumulation models that are fit to patterns of task performance data but do not make predictions about neuroimaging measures (e.g., Ratcliff & Starns, 2009; Pleskac & Busemeyer, 2010). Our findings specifically relate to how such processes specified in these models might be implemented in the brain and reflected in measures of electrophysiological activity. It is possible that processes specified in these models are simply not indexed by the ERP components analysed in the lines of EEG studies mentioned here.
Lastly, even in analyses controlling for effects of relative evidence, the factor of absolute evidence was still varied across trials in order to produce variation in confidence ratings. It is possible that increasing overall brightness led to both increased confidence ratings (despite a drop in accuracy) and larger response-locked CPP amplitudes due to larger visual evoked potentials. As we replicated effects reported in Grogan et al. (2023), we do not believe that our results are simply a byproduct of this confound. In addition, we did not observe associations between levels of absolute evidence and CPP amplitudes when holding confidence ratings constant.
As absolute evidence manipulations necessarily entail changes in the overall intensity of stimulation, it may not be possible to completely control for this potential confound in future work.

4.4. Conclusion
We report evidence of a parietal ERP correlate of confidence during decision formation, which was not simply a by-product of changes in relative evidence or accuracy. However, we did not find a similar parietal correlate of confidence during the postdecisional time window that occurred between choice and confidence reports.
Our findings reinforce the notion that processes occurring during decision formation are likely to substantively inform confidence judgments and should be considered in models of how we compute (and communicate) our degree of confidence in our decisions.

Figure 1. Trial diagram and stimuli. (A) In each trial, two flickering squares of differing average luminance were presented. Each square changed in luminance with each monitor refresh (every 13.3 ms). Participants indicated the square that appeared brighter on average. Following this judgment, participants reported their decision confidence using a 7-point scale while the word "confidence" was presented on the screen. (B) Average luminance values (in RGB) for each square in each condition.

Figure 2. Accuracy, mean RTs, and mean confidence ratings for different combinations of relative and absolute evidence levels. (A) Decision accuracy (proportion correct). (B) Mean RTs for correct trials. (C) Mean RTs for error trials. (D) Mean confidence ratings for correct trials. (E) Mean confidence ratings for error trials. Confidence ratings were measured on a scale ranging from 1 (surely incorrect) to 7 (surely correct) with a mid-point of 4 indicating guessing. The dotted line indicates the mid-point of the scale. Error bars represent standard errors.

Figure 3. Stimulus- and response-locked ERPs averaged over parietal channels Pz, P1, P2, CPz, and POz. Stimulus-locked ERPs are plotted in the left column. Response-locked ERPs are plotted in the right column. A, B) ERPs for correct responses and

Figure 4. Group-averaged ERPs following keypress responses at electrodes Pz, P1, P2, CPz, and POz, corrected using either a pre-stimulus baseline (left column) or a pre-response baseline (right column). A, B) Trials with correct responses and errors. C, D)

Confidence-ERP amplitude correlations were strongest over parietal channels rather than the more fronto-central distribution identified in Feuerriegel et al. (2022), suggesting that this effect does reflect variation in the CPP component. Based on this interpretation, it is plausible that these effects reflect modulations of the CPP component, and by extension differences in the degree of (unsigned) evidence accumulated at the time of the decision (consistent with Grogan et al., 2023). Although a defining feature of CPP component morphology is that it typically reaches a fixed amplitude prior to the response, as consistent with standard implementations of the Diffusion Decision Model (O'Connell et al., 2012; Kelly &

Figure S2A (displaying associations between ERP amplitudes and confidence) are most prominent across a broad range of parietal channels, which does not quite match the midline parietal distribution of the CPP (visible in Supplementary Figure S2C). This suggests that, contrary to the clear midline parietal effect loci in Grogan et al. (2023), additional, overlapping ERP components may have contributed to the observed effects. Notably, Feuerriegel et al. (2022) observed a distinct, fronto-central component that influenced amplitudes at parietal channels via volume conduction (see also Kelly & O'Connell, 2013; Dmochowski & Norcia, 2015; Burwell et al., 2019 for a similar component); however, an effect with such a frontal locus was not observed in our data. The close proximity of any additional effects to midline parietal channels makes it difficult to cleanly disentangle those from CPP component changes using standard current source density (CSD) estimation methods (e.g., Kayser and Tenke, 2006), and so we have not used CSD-transformed ERPs here. Future work should pay close attention to the topographies of effects and their correspondence to well-established spatial distributions of the CPP component (e.g., Kelly & O'Connell, 2013).