## Abstract

Human decision-making and self-reflection often depend on context and internal biases. For instance, decisions are often influenced by preceding choices, regardless of their relevance. It remains unclear how choice history influences different levels of the decision-making hierarchy. We employed analyses grounded in information and detection theories to estimate the relative strength of perceptual and metacognitive history biases, and to investigate whether they emerge from common/unique mechanisms. Though both perception and metacognition tended to be biased towards previous responses, we observed novel dissociations which challenge normative theories of confidence. Different evidence levels often informed perceptual and metacognitive decisions within observers, and response history distinctly influenced 1^{st} (perceptual) and 2^{nd} (metacognitive) order decision-parameters, with the metacognitive bias likely to be strongest and most prevalent in the general population. We propose that recent choices and subjective confidence represent heuristics which inform 1^{st} and 2^{nd} order decisions in the absence of more relevant evidence.

## Introduction

Human knowledge of the external world and of internal cognitive processes is often biased and incomplete^{1–3}. When decisions are made about sensory input (e.g. ‘Is a target present?’), we can distinguish between objective accuracy (perceptual sensitivity) and how accurately one judges one’s own performance (metacognitive insight)^{4,5}. Metacognitive insight can be quantified by comparing subjective confidence to objective accuracy^{4–6}. Though accuracy and confidence usually correlate, metacognitive performance differs widely across individuals^{3,7,8}, with important consequences in everyday life. For instance, insight modulates learning, adaptive decision-making, error monitoring and exploration^{9–13}. In fact, impaired metacognition is associated with many neuropsychiatric disorders^{14} and sub-clinical symptom dimensions^{15}.

Even in healthy individuals, perceptual and metacognitive decisions not only depend on the immediately available evidence, but also on recent experiences and choices. For instance, when similar stimuli are serially presented, perceptual decisions are often biased towards responses and/or stimuli on preceding trials, a phenomenon known as choice history bias^{16–21} or serial dependence^{22–27}. Whilst this mechanism may generally be adaptive, because recent experience usually predicts upcoming input, it can also lead to non-veridical decisions^{23,28–30}. Interestingly, serial dependence has also been reported for subjective confidence reports^{31}, and the level of confidence on the preceding trial has been suggested to modulate perceptual history bias, with repetition more likely when preceding confidence was high^{16,17,32}. These reports suggest the existence of an intimate link between perception and metacognition in the formation of history biases. However, the exact nature of this relationship, and the relative strength and source of each bias, remain unclear.

Employing both model-based and non-parametric analyses, we observed history biases in both perceptual responses and ratings of confidence, but we show that the metacognitive history bias is stronger and likely to be most prevalent in the general population. Computational modelling revealed intriguing dissociations between perceptual and metacognitive decision-making parameters. For instance, perceptual choice alternation (disengagement from hysteresis) was associated with increased perceptual sensitivity but reduced metacognitive insight. Overall performance closely matched predictions from recently proposed computational models of decision-making and confidence^{33–35}. However, we crucially demonstrate that both perceptual and metacognitive decision criteria are not fixed: they fluctuate from moment-to-moment and are biased by recent choices. Accurate models of subjective confidence must go beyond a normative account to capture sub-optimal metacognitive performance driven by irrelevant factors such as preceding confidence reports.

## Results

### Overall performance exhibited signatures predicted by computational models of decision-making and metacognition

Thirty-seven human observers performed a two-alternative forced choice (2-AFC) orientation discrimination task (Fig. 1A and Methods). Participants judged whether a briefly presented Gabor patch was tilted leftward or rightward of the vertical plane, and reported the level of confidence they felt in their decision (on a scale of 1 – ‘Not confident at all’ to 4 – ‘Highly confident’). The true orientation (and hence task difficulty) was manipulated from trial-to-trial. This design allowed us to test predictions arising from a recently proposed computational model of perceptual decision-making and metacognition based on Bayesian statistical confidence and signal detection theory (SDT), as defined in Figure 1B (and Methods). Briefly, human decisions have been modelled as the comparison of an internal decision variable (DV), representing the evidence in favour of one or other choice in 2-AFC tasks, against a decision criterion (C). Under this model, confidence in the decision is given by the distance of the DV from C^{4,5,16,33–35}. When a discrete confidence rating scale is employed, the level of confidence is defined by where the DV falls with respect to the so-called type-2 criteria (*c*_{1}, *c*_{2}, … *c*_{N−1}), where N indexes the number of possible ratings. A confidence rating of k will follow if the DV falls in the interval (*c*_{k−1}, *c*_{k}).
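The decision rule described above can be illustrated with a minimal sketch. It assumes unit-variance Gaussian noise on the DV and, for simplicity, type-2 criteria expressed as symmetric distances from C; the function name `sdt_trial` and the criterion values are hypothetical and are not taken from the study.

```python
import random

def sdt_trial(evidence, criterion=0.0, type2_criteria=(0.5, 1.0, 1.5),
              noise_sd=1.0, rng=random):
    """Simulate one 2-AFC trial under the SDT account sketched in the text.

    evidence: signed stimulus strength (negative = left, positive = right).
    criterion: type-1 decision criterion C.
    type2_criteria: hypothetical distances from C separating ratings 1..4.
    """
    dv = rng.gauss(evidence, noise_sd)        # noisy internal decision variable
    response = "right" if dv > criterion else "left"
    distance = abs(dv - criterion)            # confidence grows with |DV - C|
    # Rating = 1 + number of type-2 criteria exceeded by the distance
    rating = 1 + sum(distance > c for c in type2_criteria)
    return response, rating

if __name__ == "__main__":
    random.seed(1)
    print([sdt_trial(0.8) for _ in range(5)])
```

With three type-2 criteria the sketch yields the four rating levels used in the task; in the fitted model the criteria are estimated separately for each response rather than fixed in advance.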

This model gives rise to several predictions regarding the relationships between stimulus evidence, accuracy and decision confidence^{16,34,36–38}: (1) Accuracy should scale with evidence strength (Fig. 1C); (2) Conditioning type-1 performance on high or low confidence ratings should change the slope of the relationship between stimulus evidence and accuracy, with a steeper slope for high relative to low confidence trials (Fig. 1C); (3) Confidence should increase with evidence strength for correct trials, but decrease with evidence strength for incorrect trials (Fig. 1D); (4) Even when there is no veridical evidence in favour of one response or other, confidence should be above 0 (Fig. 1D). These predictions were all confirmed in our data. Accuracy increased as a function of evidence strength but the slope of the stimulus evidence-accuracy relationship was steeper for high relative to low confidence trials (Fig. 1F). Confidence increased with evidence strength for correct trials but decreased with evidence strength for incorrect trials (Fig. 1G). Accordingly, response time decreased as a function of evidence strength for correct trials but increased for incorrect trials (Fig. 1H). Finally, participants reliably reported some level of confidence in decisions even when the Gabor patch was vertically aligned and hence there was no informative evidence (Fig. 1G).
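Predictions (1), (3) and (4) can be checked in closed form under a simplified version of the model. The sketch below assumes DV ~ N(μ, 1) with μ ≥ 0, a criterion at 0, and confidence read out as the continuous distance |DV| rather than a discrete rating or a posterior probability; these are illustrative assumptions, not the fitted model.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def accuracy(mu):
    """P(correct) when DV ~ N(mu, 1), mu >= 0 and the criterion sits at 0."""
    return Phi(mu)

def mean_confidence(mu, correct):
    """Expected |DV| on correct (DV > 0) or incorrect (DV < 0) trials."""
    if correct:
        return mu + phi(mu) / Phi(mu)      # mean of the upper-truncated normal
    return -mu + phi(mu) / Phi(-mu)        # mean distance on the lower side

if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0, 1.5):
        print(mu, round(accuracy(mu), 3),
              round(mean_confidence(mu, True), 3),
              round(mean_confidence(mu, False), 3))
```

Under these assumptions accuracy rises with μ, expected confidence rises with μ on correct trials and falls on incorrect trials, and confidence remains above zero at μ = 0. Prediction (2) is not covered by this sketch.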

### Choice history bias occurs in both perceptual and metacognitive decisions, but is stronger in metacognition

Next, we investigated the degree to which choice history biases both perceptual and metacognitive responses. Across all trials, no systematic group-level bias in favour of either choice was apparent (t-test of psychometric function (PF) thresholds versus 0°: t(36) = .1497, p = .8818, Bayes Factor (BF_{10}) = 0.179) (Fig. 2A). However, group-averaged PFs conditioned on the previous response were shifted towards the previous response (‘left’/‘right’ responses were more likely following ‘left’/‘right’ responses respectively) despite randomly ordered presentations (Fig. 2A). Post-left PF thresholds were significantly biased away from veridical 0° (t(36) = 3.1295, p = .0035, BF_{10} = 10.462), as were post-right PF thresholds, but in the opposite direction (t(36) = −2.5466, p = .0153, BF_{10} = 2.9235). Accordingly, post-left thresholds differed significantly from post-right thresholds (t(36) = 4.2498, p < .001, BF_{10} = 177.4). The effect remained significant for trial lags of two (t(36) = 5.9966, p < .001, BF_{10} = 2.3930e+04) and three (t(36) = 5.91, p < .001, BF_{10} = 1.8667e+04) (Fig. 2B). It has recently been suggested that confidence on a given trial modulates the likelihood of the perceptual choice being subsequently repeated^{16,32}. However, we did not find any influence of preceding confidence on perceptual history bias: the bias occurred both when confidence on the previous trial was low and when it was high (Supplementary Fig. 1).
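The conditioning step behind the post-left and post-right PFs can be sketched as follows. The helper `p_right_by_previous` is hypothetical; it only tabulates the proportion of ‘right’ responses per orientation after each previous response and does not fit a psychometric function or estimate thresholds as the reported analysis does.

```python
from collections import defaultdict

def p_right_by_previous(orientations, responses):
    """Proportion of 'right' responses per orientation, split by previous response.

    orientations: signed tilt per trial (degrees; negative = left).
    responses: 'left' or 'right' per trial.
    Returns {previous_response: {orientation: proportion_right}}.
    """
    counts = {"left": defaultdict(lambda: [0, 0]),
              "right": defaultdict(lambda: [0, 0])}
    for t in range(1, len(responses)):           # first trial has no predecessor
        cell = counts[responses[t - 1]][orientations[t]]
        cell[0] += responses[t] == "right"       # number of 'right' responses
        cell[1] += 1                             # number of trials in this cell
    return {prev: {o: r / n for o, (r, n) in cells.items()}
            for prev, cells in counts.items()}
```

A history bias towards repetition would appear as higher proportions of ‘right’ responses in the post-right table than in the post-left table at matched orientations.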

Next, we investigated the degree of metacognitive history bias^{31}. Confidence increased as a function of absolute orientation (i.e. sensory evidence) but was also shifted towards previous trial ratings (i.e. ‘high’/‘low’ ratings were more likely following ‘high’/‘low’ ratings respectively) (Fig. 2C). A regression analysis (see Methods) confirmed that confidence was positively predicted by ratings on the previous trial across participants (t-test of slopes versus 0: t(36) = 11.7028, p < .001, BF_{10} = 9.3215e+10). The effect remained significant for trial lags of two (t(36) = 11.9084, p < .001, BF_{10} = 1.5082e+11) and three (t(36) = 10.1775, p < .001, BF_{10} = 2.2515e+09) (Fig. 2D).
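A per-participant slope of the kind tested above could be computed as in the sketch below. This is a simplification: it regresses the current rating on the rating `lag` trials back by ordinary least squares and omits any covariates (such as orientation) that the regression specified in the Methods may include. The name `lagged_slope` is hypothetical.

```python
def lagged_slope(ratings, lag=1):
    """Least-squares slope of rating[t] regressed on rating[t - lag]."""
    x = ratings[:-lag]           # ratings `lag` trials back
    y = ratings[lag:]            # current ratings
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

if __name__ == "__main__":
    example = [1, 1, 2, 2, 3, 3, 4, 4]
    print([round(lagged_slope(example, k), 3) for k in (1, 2, 3)])
```

One slope per participant and lag would then enter a group-level t-test against zero.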

In order to calculate within-participant significance and estimate population prevalence of the observed biases, we performed additional analyses using mutual information (MI)^{39}. MI provides an assumption-free measure of dependence with effect sizes on a common meaningful scale (bits) across variables with different characteristics (i.e. different dimensionality and/or number of samples). Hence, we could also quantify and compare how strongly both perceptual and metacognitive responses of each participant were related to the objective evidence at hand versus recent choices. First, we quantified the strength of dependence between stimulus evidence (orientation of the Gabor (Orient)) and both perceptual responses and confidence ratings (Fig. 2E). We then quantified, on the same scale, the choice history biases in both confidence ratings and perceptual responses (see Methods for details). Supplementary Figure 2 highlights how the MI measures relate to the model-based bias measures displayed in Fig. 1B and C.
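For discrete variables, MI in bits can be estimated directly from the joint histogram. The sketch below is a naive plug-in estimator, offered only as an illustration: the estimator used in the study (ref. 39) may differ, for example by applying bias correction, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information_bits(x, y):
    """Plug-in estimate of MI (in bits) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi

def history_bias_bits(responses):
    """MI between each response and the one preceding it, e.g. MI(Resp-1;Resp)."""
    return mutual_information_bits(responses[:-1], responses[1:])
```

Because MI is non-negative and symmetric, this quantity captures the strength of the dependence on the previous choice but not whether an observer tends to repeat or alternate.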

As expected, the highest dependence was found between objective evidence (Orient) and perceptual responses (Resp) (Fig. 2E). Interestingly, this was stronger than the dependence between objective evidence and confidence ratings (t(36) = 11.6448, p < .001, BF_{10} = 8.1307e+10), suggesting sub-optimal metacognitive performance. The confidence history bias was stronger than the perceptual history bias (t(36) = 6.25, p < .001, BF_{10} = 4.9486e+04), and in fact had roughly the same influence on confidence as current trial evidence (t(36) = 0.384, p = .7032, BF_{10} = 0.1894).

Statistical inference was performed non-parametrically within individual participants based on 1000 permutations of the data. In our sample, 13/37 participants showed significant perceptual history bias (at p = 0.05). Therefore, the population prevalence^{40,41} of perceptual history bias detectable in our experiment is 31.7% [14.6 48.8] (maximum likelihood estimate with 95% bootstrap confidence interval). The majority of those showing significant perceptual history bias tended to repeat their previous responses (N = 10), with only 3 tending to alternate^{16,19,20}. Across participants, perceptual history bias was inversely related to the effect of evidence on perceptual responses within trials (Fig. 2F: Pearson’s r = −0.55, p < .001, BF_{10} = 64.297). However, the influence of evidence was stronger in most participants (MI(Orient;Resp) > MI(Resp-1;Resp), 35/37 participants) than the influence of choice history (MI(Resp-1;Resp) > MI(Orient;Resp), 2/37).

For metacognition, 34/37 participants showed significant history bias (at p = 0.05), which implies a population prevalence of 91.4% [80.1 100]. All participants showing significant metacognitive history bias tended to repeat previous confidence ratings. Across participants, metacognitive history bias was inversely related to the effect of evidence on confidence within trials (Fig. 2G: Pearson’s r = −0.53, p < .001, BF_{10} = 29.059), with relatively even sub-groups of participants for whom current evidence dominated confidence judgements (MI(Orient;Conf) > MI(Conf-1;Conf), 16/37), vs those for whom rating history dominated judgements (MI(Conf-1;Conf) > MI(Orient;Conf), 21/37).
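The step from a count of significant participants to a population prevalence can be sketched with a simple estimator that corrects the observed proportion for the false-positive rate. This assumes that the estimator of refs. 40,41 reduces to this form when test sensitivity is taken as 1; the bootstrap confidence interval is not reproduced, and `prevalence_mle` is a hypothetical name.

```python
def prevalence_mle(k, n, alpha=0.05):
    """Population prevalence estimate from k significant tests out of n.

    Corrects the observed proportion for the false-positive rate alpha,
    assuming every true effect would be detected (sensitivity of 1).
    """
    estimate = (k / n - alpha) / (1.0 - alpha)
    return min(1.0, max(0.0, estimate))    # keep the estimate within [0, 1]

if __name__ == "__main__":
    print(round(100 * prevalence_mle(13, 37), 1))   # perceptual history bias
    print(round(100 * prevalence_mle(34, 37), 1))   # metacognitive history bias
```

Under these assumptions the sketch recovers the reported 31.7% for 13/37 participants and lands within rounding distance of the reported 91.4% for 34/37.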

### Uncovering the influence of choice history on perceptual and metacognitive decisions with computational behavioural modelling

In order to explore the relationship between perceptual and metacognitive choice history biases, we returned to the decision-making model (defined in Methods and Fig. 1B) to formally test which aspects of both perceptual (type-1) and metacognitive (type-2) performance were affected by previous choices. Type-1 performance encompasses traditional measures of perceptual sensitivity (*d’*) and bias (*c*), whereas type-2 performance encompasses measures of metacognitive sensitivity (*meta-d’*) and bias (*meta-c*)^{5,42}. *Meta-d’* represents the type-1 *d’* value expected to give rise to the observed confidence data under the assumption that the observer has perfect metacognitive insight (i.e. *meta-d’* = *d’* when confidence is always high when correct and low when incorrect). To quantify metacognitive insight/efficiency (or in other words how much of the information present in the type-1 performance participants make use of in their type-2 decisions), we can subtract *d’* from *meta-d’*. If *meta-d’* – *d’* ≠ 0, then confidence ratings are either more (positive) or less (negative) sensitive to the task-related evidence than the perceptual responses. The metacognitive criteria (*meta-c*) index the tendency to give high/low confidence ratings regardless of evidence (metacognitive response bias). Their absolute distance from type-1 *c* (|*meta-c* – *c*|) represents the level of evidence needed to increase confidence ratings from low to high^{43}. Unlike type-1 *c*, *meta-c* values are calculated separately for each possible perceptual response (‘left’/‘right’ orientation judgements). Additionally, there are N−1 *meta-c* values for each response, where N indexes the number of possible ratings (4 in the current experiment). In order to simplify the analysis, we averaged over the 3 |*meta-c* – *c*| values for each response (‘left’/‘right’) separately (see Methods).

First, we assessed whether overall metacognitive sensitivity (*meta-d’*) systematically deviated from perceptual sensitivity (*d’*). Across orientations, confidence judgements were less reflective of the evidence than perceptual judgements, with mean *meta-d’* being lower than mean *d’* (compare Fig. 3A and 3B). A repeated-measures analysis of variance (ANOVA: 2 (sensitivity measure: *d’*, *meta-d’*) × 6 (absolute orientation (evidence): 3°, 6°, 9°, 12°, 15°, 18°)) revealed that *meta-d’* was significantly lower than *d’* (main effect: F(1,36) = 58.818, p < .001) and the difference increased as a function of orientation (Fig. 3C) (interaction: F(5,180) = 13.614, p < .001). Hence, participants were generally unable to make use of all information available for perceptual judgements when estimating their confidence (suboptimal metacognition), in line with the MI results. To investigate the influence of previous perceptual choices, we calculated the type-1 and type-2 model parameters^{42} separately for ‘post-left’ and ‘post-right’ decision trials across each level of evidence strength. We then performed repeated-measures ANOVAs (2 (previous choice: ‘left’/‘right’) × 6 (absolute orientation: 3°, 6°, 9°, 12°, 15°, 18°)) for each parameter.
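The type-1 measures and the two derived type-2 quantities defined above can be written down compactly. The sketch below uses the standard equal-variance SDT formulas for *d’* and *c* and assumes, for illustration, that a ‘hit’ is a ‘right’ response to a rightward tilt and a ‘false alarm’ is a ‘right’ response to a leftward tilt. Estimating *meta-d’* and *meta-c* themselves requires model fitting^{42} and is not shown; the function names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf    # inverse of the standard normal CDF

def type1_sdt(hit_rate, false_alarm_rate):
    """Equal-variance SDT sensitivity (d') and criterion (c).

    Rates of exactly 0 or 1 are outside the domain of inv_cdf and would
    need a correction before calling this function.
    """
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return zh - zf, -0.5 * (zh + zf)

def metacognitive_efficiency(meta_d, d_prime):
    """Difference measure used in the text: meta-d' - d' (0 = full insight)."""
    return meta_d - d_prime

def mean_criterion_distance(meta_c, c):
    """Average |meta-c - c| over the N-1 type-2 criteria of one response."""
    return sum(abs(m - c) for m in meta_c) / len(meta_c)
```

A negative efficiency value indicates that confidence ratings carry less task-related information than the perceptual responses, which is the pattern reported here.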

Previous perceptual choice did not influence either perceptual or metacognitive sensitivity (Fig. 3A-C): neither *d’*, *meta-d’* nor *meta-d’* – *d’* was affected (F-values ≤ 1.086, p-values ≥ .37). However, type-1 *c* was biased towards the previous perceptual choice across all orientations (Fig. 3D: main effect: F(1,36) = 20.344, p < .001; interaction: F(5,180) = 1.619, p = .157), in line with the PF analysis. Metacognitive criteria (|*meta-c* – *c*|) were biased in a response-dependent manner (Fig. 3E-F). When participants responded ‘left’, they displayed higher meta-criteria when they had also responded ‘left’ on the previous trial (repetition) compared to when they had responded ‘right’ (alternation) (main effect: F(1,36) = 12.983, p < .001; interaction: F(5,180) = 1.603, p = .162). Accordingly, when participants responded ‘right’, they displayed higher meta-criteria when they had responded ‘right’ on the previous trial (repetition) compared to when they had responded ‘left’ (alternation) (main effect: F(1,36) = 14.52, p < .001; interaction: F(5,180) = 2.427, p = .037). The interaction term in the ‘right’ response analysis was driven by the effect not being significant for the two largest orientations. The effect was significant for 3°, 6°, 9°, 12° (t-values ≥ 3.164, p-values ≤ .003, BF_{10} ≥ 11.352) but not 15° (t(36) = 1.775, p = .084, BF_{10} = 0.731) nor 18° (t(36) = 1.972, p = .056, BF_{10} = 1). In sum, perceptual choices influenced decision criteria for both perceptual and metacognitive subsequent choices.

To investigate the influence of the previous metacognitive choice, we performed the same analysis but this time for ‘post-high’ and ‘post-low’ confidence trials (two bins split as evenly as possible within each participant: see Methods). Previous confidence did not influence perceptual sensitivity (Fig. 4A: *d’* main effect: F(1,36) = .076, p = .784; interaction: F(5,180) = .162, p = .976), but it did influence subsequent metacognitive sensitivity (Fig. 4B: *meta-d’* main effect: F(1,36) = 48.972, p < .001; interaction: F(5,180) = 4.617, p = .001) and metacognitive efficiency (Fig. 4C: *meta-d’* – *d’* main effect: F(1,36) = 33.194, p < .001; interaction: F(5,180) = 2.375, p = .041). The interaction terms in both the metacognitive sensitivity (*meta-d’*) and efficiency (*meta-d’* – *d’*) analyses were driven by the effect increasing as a function of orientation (Fig. 4B-C). For metacognitive sensitivity, follow-up t-tests showed that the effect was significant for orientations of 3°, 9°, 12°, 15°, 18° (t-values ≥ 2.413, p-values ≤ .021, BF_{10} ≥ 2.239) but not for 6° (t(36) = .457, p = .65, BF_{10} = 0.195). For metacognitive efficiency, the effect was significant for 9°, 12°, 15°, 18° (t-values ≥ 3.368, p-values ≤ .002, BF_{10} ≥ 18.449) but not for 3° (t(36) = 1.737, p = .091, BF_{10} = 0.689) nor 6° (t(36) = .18, p = .858, BF_{10} = 0.179).
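One way to form ‘post-high’ and ‘post-low’ bins that are as even as possible is to pick, per participant, the rating cut point that minimises the difference in bin sizes. The sketch below is a hypothetical reading of that step; the exact binning rule is given in the Methods and may differ.

```python
def split_high_low(ratings, levels=(1, 2, 3, 4)):
    """Pick the rating cut that splits trials into the most balanced bins.

    Ratings <= cut are labelled 'low', ratings > cut are labelled 'high'.
    Ties are resolved in favour of the lowest cut.
    """
    best_cut, best_gap = None, None
    for cut in levels[:-1]:
        n_low = sum(r <= cut for r in ratings)
        gap = abs(2 * n_low - len(ratings))       # equals |n_low - n_high|
        if best_gap is None or gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut, ["low" if r <= best_cut else "high" for r in ratings]
```

The resulting labels, shifted by one trial, would define which trials count as following a ‘high’ or a ‘low’ confidence rating.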

In contrast to the perceptual history bias, type-1 *c* was not influenced by confidence on the previous trial (Fig. 4D: main effect: F(1,36) = 1.419, p = .241; interaction: F(5,180) = .645, p = .666). However, |*meta-c* – *c*| values were significantly reduced following ‘high’ relative to ‘low’ confidence responses, both for ‘left’ (Fig. 4E) (main effect: F(1,36) = 43.086, p < .001; interaction: F(5,180) = 1.481, p = .198) and ‘right’ responses (Fig. 4F) (main effect: F(1,36) = 31.366, p < .001; interaction: F(5,180) = 3.025, p = .012), indicating that ‘high’/‘low’ confidence ratings were more likely following ‘high’/‘low’ ratings respectively. The interaction term in the ‘right’ response analysis was driven by the previous rating effect decreasing as a function of orientation (Fig. 4F: linear contrast F(1,36) = 6.771, p = .013). Follow-up t-tests showed that the effect was significant for orientations of 3°, 6°, 9°, 12°, 15° (t-values ≥ 2.739, p-values ≤ .01, BF_{10} ≥ 4.37) but not for 18° (t(36) = 1.066, p = .203, BF_{10} = 0.299). Hence, metacognitive choice history influenced all aspects of metacognitive performance (sensitivity, efficiency and bias), but did not influence perceptual sensitivity nor bias.

### Choice alternation is associated with increased perceptual sensitivity but reduced metacognitive insight

Next, we investigated directly whether repeating (versus alternating) the previous choice was associated with changes in either perceptual and/or metacognitive performance. Figure 5 plots metacognitive history bias effects separately for ‘repetition’ (Fig. 5A) and ‘alternation’ trials (Fig. 5B). For both, confidence increased as a function of orientation but also tended to be shifted towards previous ratings. Confidence was positively predicted by previous ratings for both repetition (t(36) = 11.88, p < .001, BF_{10} = 1.4145e+11) and alternation trials (t(36) = 6.6953, p < .001, BF_{10} = 1.7697e+05). However, the effect was significantly stronger for repetition trials (t(36) = 5.0343, p < .001, BF_{10} = 1.5439e+03). Intriguingly, computational modelling revealed a novel dissociation of perceptual and metacognitive sensitivity induced by disengagement from choice hysteresis. When participants alternated from their previous choice, they were more likely to be correct than when they repeated (Fig. 5C: *d’* main effect: F(1,36) = 68.841, p < .001; interaction: F(5,180) = 2.763, p = .02). The effect was significant at all orientations (t-values ≥ 2.709, p-values ≤ .01, BF_{10} ≥ 4.1) but increased as a function of orientation (linear contrast: F(1,36) = 13.284, p = .001). However, this improvement in perceptual sensitivity for alternation trials was not reflected in metacognitive sensitivity (Fig. 5D: *meta-d’* main effect: F(1,36) = 1.311, p = .26; interaction: F(5,180) = .932, p = .462). Hence, objective accuracy increased for alternation relative to repetition trials whereas metacognitive efficiency decreased (Fig. 5E: *meta-d’* – *d’* main effect: F(1,36) = 27.262, p < .001; interaction: F(5,180) = 2.321, p = .045). The *meta-d’* – *d’* effect was significant at orientations of 6°, 9°, 12°, 15°, 18° (t-values ≤ −2.2, p-values ≤ .033, BF_{10} ≥ 1.552), but not 3° (t(36) = −1.303, p = .201, BF_{10} = 0.385), and increased as a function of orientation (F(1,36) = 9.899, p = .003).

Choice hysteresis did not influence either perceptual or metacognitive decision criteria (Fig. 5F-H: F-values ≤ 1.685, p-values ≥ .14). Overall, participants lacked full metacognitive insight into the increased likelihood of being correct when they alternated from their previous perceptual response.
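The trial labelling that underlies the repetition versus alternation comparison can be sketched as below. For brevity the sketch summarises performance as proportion correct, whereas the reported analysis uses the model-based measures (*d’*, *meta-d’*) computed within each label; the function names are hypothetical.

```python
def label_hysteresis(responses):
    """Tag each trial (from the second onward) as 'repetition' or 'alternation'."""
    return ["repetition" if cur == prev else "alternation"
            for prev, cur in zip(responses, responses[1:])]

def accuracy_by_label(responses, correct):
    """Proportion correct separately for repetition and alternation trials."""
    labels = label_hysteresis(responses)
    out = {}
    for name in ("repetition", "alternation"):
        # correct[1:] aligns outcomes with the labels, which start at trial 2
        hits = [c for lab, c in zip(labels, correct[1:]) if lab == name]
        out[name] = sum(hits) / len(hits) if hits else float("nan")
    return out
```

Comparing confidence between the two labels alongside accuracy would show whether observers track the accuracy difference, which is the question addressed by the efficiency analysis.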

### Choice history biases are associated with reduced perceptual and metacognitive sensitivity, but not reduced metacognitive insight, across participants

Finally, we investigated the correlation between perceptual and metacognitive history biases (Fig. 6A), and whether they contribute to suboptimal perceptual and metacognitive sensitivity, across participants. The strength of perceptual bias did not predict the strength of metacognitive bias (Pearson’s r = 0.1072, p = .5278, BF_{10} = 0.156). Metacognitive history bias was stronger in most participants (MI(Conf-1;Conf)>MI(Resp-1;Resp), 36/37 participants) than perceptual history bias (MI(Resp-1;Resp)>MI(Conf-1;Conf), 1/37 participants).

History biases have previously been linked to reduced perceptual^{19} and metacognitive sensitivity^{31}, and we replicated these findings here. The perceptual bias was inversely related to perceptual sensitivity (*d’*) (Fig. 6B: r = −0.5877, p < .001, BF_{10} = 179.836) and the metacognitive bias was inversely related to metacognitive sensitivity (*meta-d’*) (Fig. 6C: r = −0.4315, p = .0077, BF_{10} = 4.353). Perceptual history bias was not significantly associated with metacognitive sensitivity (*meta-d’*) and metacognitive history bias was not significantly associated with perceptual sensitivity (*d’*) (Supplementary Fig. 3).

Previously, a negative correlation was found between metacognitive history bias and metacognitive sensitivity (as quantified by the area under the Type-2 receiver operating characteristic (ROC) curve (Type-2 AUC))^{31}. However, neither type-2 AUC nor *meta-d’* account for type-1 performance, and hence do not represent pure measures of metacognitive insight/efficiency^{5,42}. Therefore, to establish the relationships between perceptual and metacognitive history biases and metacognitive insight, we correlated both with metacognitive efficiency (*meta-d’* – *d’*). A weak positive correlation was found between perceptual history bias and metacognitive efficiency (Fig. 6D: r = 0.3437, p = .0373, BF_{10} = 1.101). The BF_{10} did not provide strong evidence for the alternative hypothesis, therefore we do not interpret this further. However, a one-tailed analysis to test for a negative relationship revealed strong evidence for the null hypothesis (BF_{10} = 0.07). A non-significant negative relationship was observed between metacognitive history bias and metacognitive efficiency (Fig. 6E: r = −0.1852, p = .2726, BF_{10} = 0.233). Hence, when the contribution of type-1 performance to absolute metacognitive sensitivity was factored out, history biases were not significantly associated with reduced metacognitive insight across participants. Note that similar results were found using a ratio measure of metacognitive efficiency/insight (Supplementary Fig. 4).

Using MI to quantify history biases eliminates information about the bias direction (i.e. ‘Repeater’ versus ‘Alternator’). For the sake of completeness, the same correlation analyses using metrics which retain the bias direction are reported in Supplementary Figure 5.

## Discussion

Human decisions are often influenced by sources other than the relevant information^{1,2,44,45}. Understanding sub-optimal decision-making represents a fundamental enterprise in modern psychology and neuroscience^{46}. In line with previous studies, we show that choice history represents a source of task-irrelevant choice variability, both for perceptual decisions^{16–21,27} and confidence reports^{31}. Most participants displayed positive history biases: they were more likely to repeat perceptual decisions and confidence ratings even though stimuli were presented in a random order and hence previous choices were of no relevance. Crucially, we quantified both perceptual and metacognitive history biases on a common effect size scale (using MI) and estimated single-subject significance and population prevalence of the respective effects. Additionally, by employing computational modelling of perceptual decisions and confidence ratings, we were able to uncover latent parameters which are influenced by choice history at different levels of the decision-making hierarchy. Across participants, perceptual and metacognitive history biases did not correlate with each other, but were independently associated with reduced perceptual and metacognitive sensitivity, whereas neither bias predicted metacognitive insight. We show for the first time that the perceptual and metacognitive biases influence distinct type-1 (perceptual) and type-2 (metacognitive) aspects of decision-making, and the metacognitive bias is stronger and likely to be more prevalent in the general population. These observations are of fundamental relevance for contemporary models of decision-making and confidence, suggesting that recent confidence represents a mental shortcut (heuristic) which informs self-reflection when more relevant information is unavailable.

A normative model posits that confidence computations reflect the probability of being correct in a statistically optimal manner^{33,34,36,47}, in line with suggestions that the computation of confidence arises from the same neural processes as the decision itself^{48–51}. The normative model relates confidence to the available evidence through a conditional Bayesian posterior probability^{17,36,44} and several of the model predictions were met in the current dataset using explicit confidence ratings of visual discrimination performance (see Fig. 1B-H). This suggests that subjective confidence is to some extent consistent with normative statistical principles, though it should be noted that a first-order normative model is not the only model which gives rise to such predictions^{38}. However, the influence of choice history on confidence ratings (see Fig. 2C-G) shows that the normative model alone cannot fully account for subjective confidence. Rather, the normative computation may be one of several determinants of confidence^{34}, and differential weighting of these determinants may explain individual differences in the degree of metacognitive history bias and overall metacognitive sensitivity. Other factors which have been suggested to influence confidence judgements include context^{52}, social pressure^{12,53}, attention^{54,55} and fatigue^{56}. Our approach allowed us to quantify and compare the degree to which confidence judgements were driven by objective evidence versus preceding confidence ratings. Surprisingly, we found a relatively even split of participants for whom the objective evidence most strongly influenced confidence versus participants for whom previous ratings were a stronger influence (Fig. 2G). In contrast, all but two participants showed a stronger influence of objective evidence on perceptual choices than the influence of previous choices (Fig. 2F).
Metacognitive judgements are thus more susceptible to bias from extraneous factors than perceptual decisions, an observation which may be of practical relevance in terms of learning, error monitoring and psychological well-being^{9–13}. Further research may investigate whether primarily ‘history’ versus ‘evidence’ based metacognitive styles meaningfully predict differences in influential traits such as cognitive flexibility, personality and/or psychiatric symptomology.

The current results align with models positing that confidence judgements arise from processes which are to some degree dissociable from the decision process itself^{5,38,57}, with distinct neural implementations and independent influences. Evidence supporting such a dissociation has come from neuroimaging^{7,8,35,58–64}, psychophysics^{32,65,66}, brain stimulation^{67,68} and clinical^{14,69,70} studies. Several aspects of our findings accord with a ‘second-order’ computation of confidence. First, participants were generally unable to make use of all of the information available for their perceptual decisions when rating confidence, which indicates ‘noise’ in the metacognitive system and sub-optimal insight^{71,72}. Second, perceptual and metacognitive history biases were uncorrelated across participants (Fig. 6A) and impacted on distinct latent decision-making parameters (Figs. 3 and 4). For instance, type-2 (metacognitive) decision criteria were modulated as a function of prior confidence ratings independently of the type-1 (perceptual) criteria (Fig. 4D-F), and alternating from choice hysteresis was associated with increased perceptual sensitivity but reduced metacognitive insight (Fig. 5C-E). This dissociation when disengaging from choice hysteresis, reported here for the first time, adds to previous reports suggesting that accuracy and confidence can be uncoupled even in healthy participants^{55,68,71}. Thus, confidence computations must operate, at least partly, on an axis which is dissociable from type-1 decisions^{43}. We did find evidence for some level of interaction between perceptual and metacognitive history biases. The metacognitive bias was strongest for trials in which the perceptual choice was repeated, though it also remained significant for alternation trials (Fig. 5A-B). This suggests that some level of ‘shared’ hysteresis occurs across both systems.
However, in contrast to previous findings^{16,17,32}, preceding confidence had no influence on the likelihood of the perceptual choice being repeated. Subtle but important differences in experimental designs may explain this discrepancy (see Supplementary Figure 1).

Why might perceptual and metacognitive decision processes be dissociable? One possibility is that the nature of everyday decision-making renders the utilisation of all type-1 information for metacognitive reflection either impossible or unnecessary^{71}. As is known for decision-making, metacognitive judgements might rely partly on heuristics and simplifications which result in systematic biases under particular conditions, including lab-based tasks with high levels of uncertainty^{8,71,73–75}. In natural settings, it may generally be advantageous to assume statistical regularity of environmental stimuli and to default to this model/heuristic under conditions of high uncertainty^{27,28}. If the metacognitive system has less access to objective evidence than the perceptual system, then stronger history biases of confidence ratings are likely to occur. Indeed, here confidence ratings were less sensitive to the objective evidence than perceptual choices and were also more strongly biased by previous ratings.

The mechanisms underlying history biases remain unclear, though neural signatures encoding previous perceptual choices have been identified across various sensory, associative and motor brain regions^{26,76–78}. Recent studies have investigated perceptual history bias within the context of computational models of decision-making. The drift-diffusion model (DDM)^{79} represents an extension of classic SDT incorporating single-trial dynamics of evidence accumulation. Under this model, biasing of the type-1 criterion by previous choices (Fig. 3D) could occur due to asymmetry in either the starting point or drift rate of the evidence accumulation process. Urai et al.^{20} show compelling evidence across six tasks that drift rate bias provides the best account, in line with persistence of decisional weights over time/trials^{18,27}, an interpretation which is consistent with our results. An intriguing avenue for further research would be to model the temporal dynamics of both type-1 and type-2 decisions^{57} in order to ascertain the mechanism(s) of history-induced type-2 criterion shifts (Fig. 4E-F). Additionally, by combining such an approach with functional neuroimaging^{35,49}, neural correlates of model parameters may reveal the neural implementations underlying both perceptual and metacognitive choices themselves, along with history biases.

To our knowledge, this study is the first to report estimates of the population prevalence of both perceptual and metacognitive choice history biases. We employed information theoretic statistics to quantify aspects of decision-making within individual participants on a common effect size scale. These measures also enable computationally efficient non-parametric within-participant inference. This novel approach could be widely applied to different questions in studies of decision-making. We found that metacognitive history bias was significant in almost all of our sample (34/37), allowing us to infer an estimate of the population prevalence of 91.5% [80.1 100] (maximum likelihood with 95% bootstrap confidence interval). That is, we can expect that at least 80% of the general population would have an effect detectable with our experimental design (i.e. statistically significant at p=0.05 from 416 trials). The perceptual history bias was significant at the group-level, but was only significant in 13/37 of our sample, yielding a population prevalence estimate of 31.7% [14.6 48.8]. Statistical inference in psychology traditionally focusses on population mean effects, but we argue that it is crucial to determine the degree to which the effects can be reliably observed within individuals, and the prevalence of these effects in the population.
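The point estimates quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes that each within-participant test has a false positive rate of 0.05 and, as a simplifying assumption not stated in the text, a sensitivity of 1, in which case the maximum likelihood prevalence is (k/n − α)/(1 − α). The function name is hypothetical, and the bootstrap confidence intervals are not reproduced here.

```python
def prevalence_ml(k, n, alpha=0.05):
    """Maximum likelihood population prevalence from k significant
    participants out of n, assuming a false positive rate `alpha` and
    (as a simplifying assumption) a test sensitivity of 1."""
    return max(0.0, (k / n - alpha) / (1 - alpha))

metacognitive = prevalence_ml(34, 37)  # metacognitive history bias
perceptual = prevalence_ml(13, 37)     # perceptual history bias
print(f"{100 * metacognitive:.1f}%  {100 * perceptual:.1f}%")
```

Under these assumptions the two values round to the 91.5% and 31.7% reported in the text.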

The extent to which these biases negatively influence everyday decisions remains unclear, though repeating previous choices in situations of uncertainty may serve to preserve neural resources associated with choice alternation and to maintain self-consistency^{8}. Indeed, activation of a specific cortical network involving inferior frontal cortex and the subthalamic nucleus during the decision process is associated with disengagement from choice hysteresis^{80}. This suggests that switching choices under conditions of uncertainty comes at a computational cost. It is interesting to speculate that engagement of this network might improve performance but not subjective confidence in the choice, thereby explaining the lack of metacognitive insight our participants displayed, despite increased perceptual sensitivity, when alternating from their previous choice (Fig. 5A-C). Furthermore, the drive for hysteresis/self-consistency may induce uncertainty when choices are switched and hence distort metacognitive judgements.

It is possible that such biases could have negative implications in circumstances where significant decisions must be made under conditions of high uncertainty (e.g. security scanning, medical imaging^{81}). Furthermore, miscalibrated metacognitive judgement (systematic under- or over-confidence) is likely to impact on learning, adaptive decision-making and mental health^{9–15}, and may be compounded by history and confirmation biases. The development of behavioural and/or pharmacological techniques to reduce such biases could help to optimise decision-making and self-reflection.

## Method

### Participants

Forty-three healthy human observers participated in the study. All reported normal or corrected-to-normal vision. Due to poor psychophysical performance (explained in the *Data exclusion* section), 6 participants were excluded from the analysis, leaving a total of 37 participants (26 female, 11 male; aged 18 to 38 years, *M* = 25.23, *SD* = 3.95). The study was approved by the Ethics Committee of the College of Science and Engineering at the University of Glasgow and all participants gave their informed consent. No monetary reward was given to participants for taking part, though undergraduate students could receive course credits for their participation.

### Stimuli and task

The stimuli were Gabor patches (windowed sine wave gratings: 96 × 96 pixels (2.54 × 2.45 cm)) presented at the centre of the screen. The Gabor patches had a peak contrast of 100% Michelson, a spatial frequency of 3.7 cycles per degree and a 0.3° standard deviation Gaussian contrast envelope. At a viewing distance of 57 cm (fixed using a chinrest), Gabor patches subtended 2.55° of visual angle. On each trial, the stimulus would appear at a random angle that ranged from −18° to +18° relative to vertical at intervals of 3° (including 0°). The monitor used to present the stimuli had a display refresh rate of 60Hz and screen resolution of 1920 × 1080 pixels. The software used to implement the task was E-prime 2.0 and participants made responses using a QWERTY keyboard. Each trial began with a fixation point displayed at the centre of the screen for 1000ms (see Fig. 1A). Following this, a Gabor patch appeared at a random orientation in the centre of the screen for a duration of 16ms. After the stimulus disappeared, the participant viewed the fixation point for 400ms, before being instructed to indicate whether they perceived the top of the Gabor patch to be tilted in a “leftward” or “rightward” direction relative to vertical (2-alternative forced choice), by responding with the left and right arrow keys respectively. Participants were not informed as to the accuracy of their choice and no time limit was enforced. Immediately after responding, participants were presented with a second decision regarding their confidence about the perceptual choice they had just made. Participants were asked to rate their confidence on a scale of 1 to 4, where 1 represented “not confident at all” and 4 represented “highly confident”, using the corresponding digit keys on the keyboard. Immediately after making this response the central fixation point reappeared indicating the beginning of the next trial. 
A short practice block (12 trials), including only the most extreme angles (−18°, +18°) and with accuracy feedback on each trial, was performed in order to familiarise participants with the task. In the full experiment, each of the 13 orientations was presented 32 times in a randomised order, amounting to 416 trials in total. The experimental session lasted approximately 30 minutes.
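As a quick check on the design arithmetic, the trial list can be sketched as follows. The experiment itself was run in E-prime, so this Python snippet is purely illustrative, and the seed and variable names are hypothetical.

```python
import random

# 13 orientations from -18 to +18 degrees in 3 degree steps (including 0)
orientations = list(range(-18, 19, 3))

# Each orientation is presented 32 times, in randomised order
trials = [angle for angle in orientations for _ in range(32)]
random.Random(0).shuffle(trials)  # arbitrary seed, for reproducibility only

print(len(orientations), len(trials))
```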

### Quantifying the psychometric function

To model Gabor orientation discrimination performance, cumulative logistic psychometric functions (PFs) were fit to the data using a Maximum Likelihood criterion^{82}. The dependent measure was the proportion of trials on which the participant indicated that the Gabor appeared to be oriented ‘rightward’ and the independent measure was the true orientation of the Gabor. The logistic function is described by the following:

$$F(x; \delta, \alpha, \lambda, \gamma) = \gamma + (1 - \gamma - \lambda)\,\frac{1}{1 + e^{-\alpha (x - \delta)}}$$

where *x* are the tested Gabor orientations, *δ* is the subjective threshold (location on the x-axis corresponding to 50% ‘left’/50% ‘right’ responses) and *α* is the slope of the rising curve (indexing visual sensitivity). Both *λ* and *γ* represent the probability of stimulus independent lapses and were fixed at 0.02.

### Data exclusion

The PF threshold and slope parameters were used to formally detect outliers in the dataset. Any participant who met any one of the following two criteria for the overall PF fit to their entire dataset was excluded from further analysis: 1) a threshold value over 3 median absolute deviations from the overall group median and/or 2) a slope value over 3 median absolute deviations from the overall group median. This led to a total of 6 participants being excluded and hence 37 participants were entered into the final inferential analyses.
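The exclusion rule can be sketched as below. This is a minimal illustration rather than the authors' code: it assumes the raw (unscaled) median absolute deviation, since the text does not say whether a consistency constant was applied, and the function names are hypothetical.

```python
from statistics import median

def mad_outliers(values, n_mads=3.0):
    """Flag values lying more than `n_mads` median absolute deviations
    from the median (raw MAD, no scaling constant: an assumption)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [abs(v - med) > n_mads * mad for v in values]

def excluded_participants(thresholds, slopes):
    """A participant is excluded if either PF parameter is an outlier."""
    bad_threshold = mad_outliers(thresholds)
    bad_slope = mad_outliers(slopes)
    return [t or s for t, s in zip(bad_threshold, bad_slope)]
```

For example, `mad_outliers([1, 2, 3, 4, 100])` flags only the last value, because the median is 3 and the MAD is 1.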

### Quantifying perceptual choice history bias

In order to measure perceptual choice history bias, the data within each participant were split into two bins: one containing trials that followed a leftward orientation response on the previous trial (‘post left response’) and the other containing trials that followed a rightward orientation response on the previous trial (‘post right response’). The PF was fit separately to data from these subsets of trials (Fig. 2A). From the resulting fits, the threshold and slope were retrieved. This was done separately for trial lags of +1, +2 and +3. The difference in PF threshold between ‘post left’ and ‘post right’ responses indexes the strength and direction of perceptual choice history bias (Fig. 2B). If positive choice history bias (i.e. tendency to repeat previous choices^{16–20}) heavily influences the orientation judgements then the group-averaged psychometric curves conditioned separately on ‘post left response’ and ‘post right response’ trials will be shifted horizontally on the *x*-axis in relation to one another. To formally test this, a repeated-measures *t*-test was performed to compare the PF thresholds between ‘post left response’ trials and ‘post right response’ trials. This analysis was also performed separately for ‘post high confidence’ and ‘post low confidence’ trials respectively (Supplementary Fig. 1: see *SDT parameter* analyses section below for division of confidence bins).
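The conditioning and fitting steps can be illustrated with a small self-contained sketch. Several details are assumptions made for illustration: the fit uses an exhaustive grid search instead of the optimiser used in the study, the search grid and the simulated observer are hypothetical, and the sign convention (‘post left’ minus ‘post right’ threshold, so that positive values indicate repetition) is our own choice.

```python
import math
import random

LAPSE = 0.02  # guess and lapse rates, both fixed at 0.02 as in the text

def pf(x, threshold, slope):
    """Cumulative logistic PF: probability of a 'rightward' response."""
    return LAPSE + (1 - 2 * LAPSE) / (1 + math.exp(-slope * (x - threshold)))

def neg_log_lik(threshold, slope, data):
    """data: list of (orientation, n_right, n_total) tuples."""
    nll = 0.0
    for x, n_right, n_total in data:
        p = pf(x, threshold, slope)
        nll -= n_right * math.log(p) + (n_total - n_right) * math.log(1 - p)
    return nll

# Hypothetical search grid; a proper optimiser would normally be used.
THRESHOLDS = [i / 10 for i in range(-60, 61)]
SLOPES = [i / 20 for i in range(1, 21)]

def fit_pf(data):
    """Maximum likelihood fit by grid search; returns (threshold, slope)."""
    grid = ((t, s) for t in THRESHOLDS for s in SLOPES)
    return min(grid, key=lambda p: neg_log_lik(p[0], p[1], data))

def split_by_previous(orient, resp, lag=1):
    """Tabulate responses (0 = left, 1 = right) by the response `lag` trials back."""
    cells = {0: {}, 1: {}}
    for t in range(lag, len(resp)):
        cell = cells[resp[t - lag]].setdefault(orient[t], [0, 0])
        cell[0] += resp[t]
        cell[1] += 1
    return {prev: [(x, r, n) for x, (r, n) in sorted(c.items())]
            for prev, c in cells.items()}

# Simulated observer whose threshold shifts towards repeating the last choice.
rng = random.Random(1)
levels = list(range(-18, 19, 3))
orient, resp, prev = [], [], 0
for _ in range(4160):
    x = rng.choice(levels)
    shift = -2.0 if prev == 1 else 2.0  # after 'right', 'right' is more likely
    prev = int(rng.random() < pf(x, shift, 0.4))
    orient.append(x)
    resp.append(prev)

binned = split_by_previous(orient, resp)
post_left, post_right = fit_pf(binned[0]), fit_pf(binned[1])
history_bias = post_left[0] - post_right[0]  # > 0: tendency to repeat
print(post_left, post_right, history_bias)
```

Passing `lag=2` or `lag=3` to `split_by_previous` gives the corresponding analysis for the longer trial lags.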

### Quantifying metacognitive choice history bias

Measuring history bias of metacognitive decisions required a different analytical approach. If positive metacognitive choice history bias occurs^{31} then confidence ratings will be more likely to be high following a high confidence rating and low following a low confidence rating, regardless of the level of external evidence (i.e. absolute Gabor orientation) (see Fig. 2C). To statistically test this, linear regression was performed between absolute Gabor orientation and mean confidence ratings separately for post 1, 2, 3 and 4 rating trials in each participant. Subsequently, linear regression was performed between the previous confidence rating (1, 2, 3, 4) and the intercepts of the orientation-confidence regressions, and the resulting within-participant regression slope represented our measure of history bias. At the group level, a one sample *t*-test (versus 0) was performed on the resulting regression slopes to examine whether they were statistically different from zero (i.e. whether they showed a systematic directionality across participants). This was done separately for trial lags of +1, +2 and +3 (Fig. 2D). This analysis was also performed separately for trials in which the previous perceptual choice (i.e. ‘left’ or ‘right’) was ‘repeated’ versus trials in which the perceptual choice was ‘alternated’ (Fig. 5A-B). A paired-samples *t*-test was used to compare the regression slopes between ‘repetition’ and ‘alternation’ trials.
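The two-stage regression can be sketched for a single participant as follows. This is an illustrative reading of the procedure, not the authors' code: the function names and the toy data are hypothetical, and the sketch assumes at least two previous-rating bins contain enough trials to fit a line.

```python
def ols(x, y):
    """Least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

def metacognitive_history_bias(abs_orient, conf, lag=1):
    """Slope of the stage-1 intercepts regressed on the previous rating."""
    intercepts = {}
    for prev in (1, 2, 3, 4):
        # Stage 1: mean confidence per absolute orientation, post `prev` trials
        by_level = {}
        for t in range(lag, len(conf)):
            if conf[t - lag] == prev:
                by_level.setdefault(abs_orient[t], []).append(conf[t])
        if len(by_level) < 2:
            continue  # rating never (or almost never) used: no line to fit
        levels = sorted(by_level)
        means = [sum(by_level[l]) / len(by_level[l]) for l in levels]
        intercepts[prev] = ols(levels, means)[1]
    # Stage 2: regress the intercepts on the previous confidence rating
    prevs = sorted(intercepts)
    return ols(prevs, [intercepts[p] for p in prevs])[0]

# Toy data: confidence drifts slowly, so it mostly repeats the previous rating
abs_orient = [3 * (t % 7) for t in range(240)]
conf = [1 + t // 60 for t in range(240)]
bias = metacognitive_history_bias(abs_orient, conf)
print(bias)
```

In this toy sequence confidence is almost fully determined by the previous rating, so the resulting slope is close to 1; a slope near 0 would indicate no history bias.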

### Quantifying choice history biases and estimating population prevalence using mutual information (MI)

Mutual Information (MI)^{39} is a measure of statistical dependence between two random variables that places no assumption on the form of the dependence. For two discrete variables *X* and *Y* that are distributed according to a joint probability distribution *P(X,Y)* the MI is defined as:

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

When the probability distributions are estimated from observed data the resulting MI estimate suffers from a limited sampling bias which causes the expectation of the estimate to be systematically larger than the true value. We correct this by subtracting the Miller-Madow bias estimate^{83}, which is given by $\frac{(|X|-1)(|Y|-1)}{2 N_{trl} \ln 2}$, where |*X*|, |*Y*| are respectively the number of discrete values taken by the variables *X* and *Y*, and *N*_{trl} is the number of trials used for the calculation.

Statistical inference was performed via permutation testing. The relationship between *X* and *Y* was shuffled and the resulting MI values stored. This was repeated 1000 times (separately for each participant). The 95^{th} percentile of the resulting permutation distribution was used as the threshold for inference on the MI value obtained from the unshuffled data.
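A minimal sketch of the estimator and the permutation threshold is given below. It assumes MI measured in bits with the Miller-Madow bias term (|X| − 1)(|Y| − 1) / (2 N ln 2) and a simple empirical 95th percentile; the function names and the seed are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(x, y, correct_bias=True):
    """Plug-in MI (bits) between two discrete sequences, optionally with
    the Miller-Madow bias estimate subtracted."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = sum(c / n * math.log2(c * n / (px[a] * py[b]))
             for (a, b), c in pxy.items())
    if correct_bias:
        mi -= (len(px) - 1) * (len(py) - 1) / (2 * n * math.log(2))
    return mi

def permutation_threshold(x, y, n_perm=1000, seed=0):
    """95th percentile of MI after shuffling y relative to x."""
    rng = random.Random(seed)
    shuffled = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(mutual_information(x, shuffled))
    null.sort()
    return null[int(0.95 * n_perm)]

# Toy example: a perfectly dependent binary pair carries 1 bit before correction
x = [0, 1] * 50
observed = mutual_information(x, x)
significant = observed > permutation_threshold(x, x, n_perm=200)
print(observed, significant)
```

Shuffling one variable preserves both marginal distributions while destroying any dependence, which is why the shuffled values approximate the null distribution.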

We calculate the following MI values (Figure 2E): I(Orient; Resp), I(Resp-1; Resp), I(Orient; Conf), I(Conf-1;Conf). In these calculations, the number of bins for the orientation is reduced by considering neighbouring levels of evidence together (e.g. 7 discrete bins corresponding to the following presented angles: [−18 −15] [−12 −9] [−6 −3] [0] [3 6] [9 12] [15 18]). Perceptual response is always represented with two discrete values (left or right). Confidence was represented with 3 or 4 discrete values (some participants never used one of the four confidence response values). For the choice history calculations, the variable Resp-1/Conf-1 is given by all trials excluding the last, the variable Resp/Conf is formed from all trials excluding the first.
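The variable construction described above can be sketched as follows. The bin indices (0 to 6) and the helper names are hypothetical choices made for illustration.

```python
def bin_orientation(angle):
    """Map a presented angle (-18..18 in steps of 3) onto one of 7 bins:
    [-18 -15] [-12 -9] [-6 -3] [0] [3 6] [9 12] [15 18] -> 0..6."""
    if angle == 0:
        return 3
    step = (abs(angle) + 3) // 6  # 3,6 -> 1; 9,12 -> 2; 15,18 -> 3
    return 3 + step if angle > 0 else 3 - step

def lagged_pair(values):
    """Previous-trial and current-trial variables for the history analyses:
    all trials excluding the last, and all trials excluding the first."""
    return values[:-1], values[1:]

bins = [bin_orientation(a) for a in range(-18, 19, 3)]
prev_resp, curr_resp = lagged_pair([0, 1, 1, 0])
print(bins, prev_resp, curr_resp)
```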

### Modelling perceptual and metacognitive sensitivity and bias

Computational models of perceptual decision-making and confidence judgements, grounded largely in statistical decision theory and SDT, have successfully accounted for a range of confidence related empirical data^{33,34,47,57}. Here, we modelled perceptual decisions and confidence ratings within an extended signal detection theory (SDT) framework^{5}. This model assumes that, during yes/no detection or 2-alternative forced choice (2-AFC) discrimination tasks, binary decisions are made by the comparison of internal evidence (indexed by a noisy decision variable (*dv*)) with a decision criterion (*c*). Across trials, evidence generated by each stimulus class (i.e. noise/signal, choice A/choice B) is sampled from a stimulus-specific normal distribution. The relative separation between the distributions (in standard deviation units) indexes the overall level of evidence available for the decisions (*d’*), and hence how well the observer can discriminate between noise and signal, or between choice A and choice B. On a given trial, the probability that the choice is correct is indexed by the absolute distance between *dv* and *c* (in an unbiased observer), and hence statistically optimal confidence judgements should reflect this computation^{34,47}. When a discrete confidence rating scale is employed, the rating on a given trial is defined by where the *dv* falls with respect to the so-called ‘type-2’ criteria (*c2*). The *c2* are response conditional, with separate criteria for the 2 possible choices (i.e. noise/signal, choice A/choice B). Overall, there are (k−1) × 2 *c2*, where k equals the number of confidence ratings available. Figure 1B presents the model schematically for 3 differing levels of decision evidence: no evidence (left panel), weak evidence (middle panel) and strong evidence (right panel). The distributions and predicted effects in Figures 1B-E were produced using code developed by Urai et al.^{16} (https://github.com/anne-urai/pupilUncertainty). The x-axis ranges from [−15:15] in these examples, *d’* was set to 0.1 (no evidence), 1.58 (weak evidence) and 3.17 (strong evidence) whereas *c* was always set to 0 (unbiased observer). The flanking *c2* were set at ±3 (conservative) and ±6 (liberal) for each. To formalise the predicted relationships between evidence strength, accuracy and confidence (Fig. 1-E), we simulated a normal distribution of *dv* for one response (i.e. *μ* > 0) at each level of evidence strength. All samples from the simulated distribution were split into correct and error ‘choices’ based on their position relative to *c*. For each combination of evidence strength and choice, the level of confidence is $f(|dv - c|)$, where *f* is the cumulative distribution function of the normal distribution which transforms the distance between *dv* and *c* into the probability of a correct response^{16,84}. Ten million trials were simulated and for each iteration a binary choice was computed along with its accuracy and corresponding level of confidence. Because response times are often taken as a proxy of decision confidence (with response times increasing as a function of decreasing confidence)^{16,34}, the response time prediction (Fig. 1E) represents an inversion of the confidence prediction (Fig. 1D).

In order to quantify both type-1 and type-2 performance parameters (i.e. sensitivity and bias) across different levels of evidence strength (absolute Gabor orientations) in the real data, we adopted the *meta-d’* approach (see^{5,42,43} for extended description and discussion) as implemented using single-subject Bayesian model fits within the ‘Hmeta-d’ toolbox^{42} (https://github.com/metacoglab/HMeta-d). *Meta-d’* characterises type-2 sensitivity as the value of *d’* that a metacognitively optimal observer, with the same type-1 response bias (*c*), would have required to produce the observed type-2 (confidence) data^{5}. If an observer has perfect metacognitive insight (i.e. they are always high in confidence when correct and low in confidence when incorrect) then *d’* will be equal to *meta-d’*. Importantly, because *meta-d’* is expressed in the same units as *d’*, the two can be compared directly to quantify the level of metacognitive efficiency/insight. If the metacognitive efficiency score (*meta-d’* − *d’*) ≠ 0, then the type-2 responses (confidence ratings) are either more (positive value) or less (negative value) sensitive to the task-related evidence than the type-1 perceptual responses. We note that (*meta-d’*/*d’*) is often used to quantify metacognitive efficiency as a ratio of type-1 performance^{58} and so we replicated our correlation analyses involving (*meta-d’* − *d’*) using (*meta-d’*/*d’*) (see Supplementary Figure 4). The same pattern of results was found.

The metacognitive criteria (meta-c) represent type-2 bias (*c2*) calculated within the *meta-d’* framework: the tendency to give high or low confidence ratings regardless of evidence strength. We calculated the absolute distance between meta-c and type-1 *c* (|*meta-c* – *c*|) in order to isolate the metacognitive response bias from the perceptual response bias^{43}. Lower values of |*meta-c* – *c*| indicate an overall response bias in favour of higher confidence ratings. As mentioned, *meta-c* (*c2*) values are calculated separately for each of the possible perceptual responses (i.e. ‘left’ or ‘right’ orientation judgements in the current study) and for each of N-1 confidence ratings available to choose from (4 in the current experiment). In order to streamline the analysis, we averaged over the 3 |*meta-c* – *c*| values for each response (‘left’ or ‘right’) separately to gain a single estimate of overall metacognitive response bias.
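The simulation behind the predicted accuracy and confidence patterns can be sketched as below. This is not the code of Urai et al.; it is a simplified illustration that assumes unit-variance evidence, takes confidence to be the normal cumulative distribution function applied to the distance between *dv* and *c* (consistent with the description in the text), uses hypothetical evidence levels, and draws far fewer trials than the ten million used in the study.

```python
import math
import random

def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate(mu, n_trials=50_000, c=0.0, sigma=1.0, seed=0):
    """Simulate dv ~ N(mu, sigma) for a stimulus with mu > 0.
    Returns (accuracy, mean confidence | correct, mean confidence | error)."""
    rng = random.Random(seed)
    n_correct, conf_correct, conf_error = 0, 0.0, 0.0
    for _ in range(n_trials):
        dv = rng.gauss(mu, sigma)
        confidence = norm_cdf(abs(dv - c) / sigma)
        if dv > c:  # dv on the side of the criterion matching the stimulus
            n_correct += 1
            conf_correct += confidence
        else:
            conf_error += confidence
    n_error = n_trials - n_correct
    return n_correct / n_trials, conf_correct / n_correct, conf_error / n_error

# Hypothetical evidence levels (weak, intermediate, strong)
results = {mu: simulate(mu) for mu in (0.5, 1.0, 2.0)}
for mu, (acc, conf_c, conf_e) in results.items():
    print(mu, round(acc, 3), round(conf_c, 3), round(conf_e, 3))
```

With increasing evidence strength, accuracy and confidence on correct trials rise, whereas confidence on error trials falls, which is the qualitative signature of statistically optimal confidence under this model.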

### Statistical analyses on SDT parameters

We compared overall perceptual sensitivity (*d’*) to metacognitive sensitivity (*meta-d’*) across all levels of evidence strength using a 2 (sensitivity measure: *d’*, *meta-d’*) × 6 (absolute Gabor orientation: 3°, 6°, 9°, 12°, 15°, 18°) repeated measures ANOVA. To assess the extent to which the type-1 and type-2 SDT performance parameters were influenced by both perceptual and metacognitive choice history, trials were binned in three different ways (1. ‘post left’/‘post right’ choice trials (Fig. 3); 2. ‘post high’/‘post low’ confidence trials (Fig. 4); 3. ‘repetition’/‘alternation’ trials (Fig. 5)) and the parameters (*d’*, *meta-d’*, *meta-d’* – *d’*, *c*, |*meta-c* – *c*|: ‘left’ responses, |*meta-c* – *c*|: ‘right’ responses) were calculated for both bins separately at each of the 6 levels of evidence strength. Repeated measures ANOVAs (2 (choice history bin) × 6 (absolute Gabor orientation: 3°, 6°, 9°, 12°, 15°, 18°)) were then performed separately for each parameter. Significant interaction terms were followed up using paired samples t-tests of the difference between the choice history bins separately at each level of evidence strength. To split the trials into relatively equal ‘post high’ and ‘post low’ confidence bins within each participant, the number of trials immediately following each of the 4 confidence ratings (i.e. post ‘1’, ‘2’, ‘3’, ‘4’ ratings) was calculated and bins were assigned that minimized the difference in trial number between the high and low bins (median difference between bins = 69 trials (min = 7, max = 251)). This led to 10 participants having ‘low’ bin = ‘1’, ‘2’ and ‘3’ ratings, ‘high’ bin = ‘4’ ratings, 14 participants (‘low’ bin = ‘1’ and ‘2’ ratings, ‘high’ bin = ‘3’ and ‘4’ ratings) and 13 participants (‘low’ bin = ‘1’ ratings, ‘high’ bin = ‘2’, ‘3’ and ‘4’ ratings). Note that 4 participants were excluded from the analysis of the influence of previous confidence level on perceptual choice history bias (Supplementary Fig. 1) because they had PF slope values over 3 median absolute deviations from the overall group median in at least one of the conditions here. This was due to biased perceptual and/or confidence decisions leading to a small number of trials being available for PF fitting after binning for these participants.

For all t-tests and correlations (see below), we calculated the Bayes Factor (BF_{10}) obtained from paired-samples Bayesian t-tests^{85} or correlation hypothesis tests^{86}, with a prior following a Cauchy distribution and a scale factor of 0.707. BF_{10} quantifies the evidence in favour of the null or alternative hypotheses, where BF_{10} below 1/3 indicates evidence for the null hypothesis, above 3 indicates evidence for the alternative hypothesis and between 1/3 and 3 indicates that the evidence is inconclusive (potentially due to a lack of statistical power)^{85}.
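The assignment of ratings to ‘post high’ and ‘post low’ confidence bins described earlier in this section can be sketched as follows. The function name and the toy rating counts are hypothetical, and ties between candidate splits are resolved here in favour of the lowest cut-off, which is an assumption.

```python
def split_confidence_bins(prev_conf):
    """Return the cut-off k (1, 2 or 3) such that ratings <= k form the
    'low' bin and ratings > k form the 'high' bin, chosen to minimise the
    difference in trial numbers between the two bins."""
    counts = {r: prev_conf.count(r) for r in (1, 2, 3, 4)}

    def imbalance(k):
        n_low = sum(counts[r] for r in range(1, k + 1))
        n_high = sum(counts[r] for r in range(k + 1, 5))
        return abs(n_low - n_high)

    return min((1, 2, 3), key=imbalance)

# Toy participant who mostly gives high ratings: 'high' bin = '4' ratings only
cutoff = split_confidence_bins([1] * 10 + [2] * 20 + [3] * 30 + [4] * 40)
print(cutoff)
```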

### Between-subject correlations

Both Pearson and Spearman correlation coefficients were calculated for each of the between-subject correlations of interest. Only Pearson’s *r* values are shown in the corresponding figures.

## Author Contributions

Conceptualization, C.S.Y.B.; Data collection, R.B. and F.W.; Formal Analysis, C.S.Y.B. and R.A.A.I.; Writing, C.S.Y.B., R.B. and R.A.A.I.; Supervision, C.S.Y.B.

## Competing interests

The authors declare no competing financial interests.