Short Communication
A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings

https://doi.org/10.1016/j.concog.2011.09.021

Abstract

How should we measure metacognitive (“type 2”) sensitivity, i.e. the efficacy with which observers’ confidence ratings discriminate between their own correct and incorrect stimulus classifications? We argue that currently available methods are inadequate because they are influenced by factors such as response bias and type 1 sensitivity (i.e. ability to distinguish stimuli). Extending the signal detection theory (SDT) approach of Galvin, Podd, Drga, and Whitmore (2003), we propose a method of measuring type 2 sensitivity that is free from these confounds. We call our measure meta-d′, which reflects how much information, in signal-to-noise units, is available for metacognition. Applying this novel method in a 2-interval forced choice visual task, we found that subjects’ metacognitive sensitivity was close to, but significantly below, optimality. We discuss the theoretical implications of these findings, as well as related computational issues of the method. We also provide free Matlab code for implementing the analysis.

Highlights

► Signal detection theory (SDT) predicts that task performance affects metacognition.
► Current measures of metacognition do not account for these confounds.
► Our new SDT method measures metacognitive performance without these confounds.
► Applying the method to data, we find observers are below SDT-optimal metacognition.
► We provide free Matlab code for performing the analysis.

Introduction

In psychological tasks, one measure of interest may be how well an observer’s confidence ratings predict stimulus judgment accuracy. For instance, suppose that observers perform a discrimination task and on every trial provide a judgment about how confident they are that their discrimination is correct. We may ask: to what extent are the observers’ confidence judgments predictive of response accuracy? In the literature, the task of discriminating between one’s own correct and incorrect responses with confidence judgments has been called the “type 2 task” (Clarke et al., 1959, Galvin et al., 2003), as opposed to the “type 1 task” of discriminating between stimulus alternatives.

There are several widely used measures of type 2 sensitivity. Assuming confidence judgments are characterized in a binary way (high or low), a straightforward way to measure type 2 performance is to compute how often confidence judgments are congruent with accuracy, i.e. the probability that correct and incorrect judgments are “correctly” endorsed with high and low confidence, respectively. (See e.g. the “advantageous wagering” measure in Persaud, McLeod, & Cowey, 2007). A related approach is to compute a correlation coefficient between accuracy and confidence (e.g. phi in Kornell, Son, & Terrace, 2007 and gamma in Nelson, 1984). However, while these approaches are conceptually and computationally simple, they do not model type 2 sensitivity and type 2 response bias as separate processes and thus risk confounding them. For instance, a difference in two observers’ confidence–accuracy correlation coefficients may be due merely to a difference in their overall likelihood of endorsing responses with high confidence, rather than a true difference in type 2 sensitivity. For this reason, signal detection theory (SDT) approaches (Macmillan & Creelman, 2005) should be preferred, because they allow one to separate the independent contributions of sensitivity and response bias in type 2 task performance.
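As a concrete illustration of these simpler measures, the following minimal sketch (our own, with hypothetical data and variable names) computes the confidence–accuracy congruence rate and the phi coefficient for binary confidence ratings in base Matlab:

    % Hypothetical data: accuracy (1 = correct) and binary confidence (1 = high)
    accuracy   = [1 1 0 1 0 1 1 0 1 1];
    confidence = [1 1 0 1 1 1 0 0 1 1];

    % Proportion of trials on which confidence is congruent with accuracy
    congruence = mean(accuracy == confidence);

    % Phi coefficient: Pearson correlation between the two binary variables
    r   = corrcoef(accuracy, confidence);
    phi = r(1, 2);

As noted above, neither quantity distinguishes type 2 sensitivity from type 2 response bias.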

Several SDT approaches to characterizing type 2 performance have been put forth. Kunimoto, Miller, and Pashler (2001) proposed modeling type 2 performance in the same way SDT models type 1 performance. In the simplest type 1 SDT model, we assume that two stimulus alternatives generate normal distributions of evidence along some internal decision axis (see e.g. Fig. 1A), with the normalized distance between the distributions, d′, providing a measure of stimulus discrimination sensitivity. In Kunimoto et al.’s approach, we similarly assume that correct and incorrect judgments generate normal distributions of evidence along some decision axis, with the normalized distance between them, a′, providing a measure of type 2 sensitivity. However, specifying the parameters of the standard SDT model1 already places strong constraints on the distributions of evidence for correct and incorrect judgments (Fig. 1B; Galvin et al., 2003), and these distributions in general do not conform well to the model proposed by Kunimoto et al.; in a sense, their type 2 SDT model is thus inconsistent with the type 2 implications of the standard type 1 SDT model. Thus, a′ does not satisfactorily separate type 2 sensitivity from type 2 response bias theoretically (Galvin et al., 2003) or empirically (Evans & Azzopardi, 2007).
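For concreteness, the following sketch (our illustration, with made-up counts) computes type 1 d′ and the type 1 criterion c1 from hit and false alarm rates, using erfinv so that only base Matlab is required:

    % Hypothetical counts for an S1 vs. S2 discrimination task
    nS1  = 200; nS2 = 200;                    % trials per stimulus class
    nHit = 140; nFA = 60;                     % "S2" responses to S2 and to S1 trials

    z = @(p) sqrt(2) * erfinv(2*p - 1);       % inverse of the standard normal CDF

    hitRate = nHit / nS2;
    faRate  = nFA  / nS1;

    d_prime = z(hitRate) - z(faRate);              % type 1 sensitivity
    c1      = -0.5 * (z(hitRate) + z(faRate));     % type 1 criterion (response bias)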

Clarke et al. (1959) and more recently Galvin et al. (2003) discussed how distributions of evidence for correct and incorrect stimulus judgments could be derived from the type 1 SDT model. An important lesson from this work is that type 1 sensitivity (d′) and response bias (c1) influence the area under the type 2 ROC curve (Fig. 1B). This entails that two metacognitively optimal observers could differ on type 2 performance due only to differences in type 1 performance.2

This observation invites a distinction between what might be called “absolute” type 2 sensitivity and “relative” type 2 sensitivity.3 Suppose observer A has d′ = 1, c1 = 0 and observer B has d′ = 2, c1 = 0, but that both observers make optimal use of the type 1 information available to them when performing the type 2 task. B will have greater area under her type 2 ROC curve than A, and in general her confidence ratings will be more predictive of accuracy. In this sense, B has greater “absolute” type 2 sensitivity than A. But by hypothesis, the difference in their metacognitive performance derives entirely from informational differences at the type 1 level, and so in a sense it is misleading to conclude that the metacognitive mechanisms of B are operating at a higher level of efficiency or sensitivity than those of A. The difference in their absolute type 2 sensitivity reflects the difference in the quality of type 1 information they are metacognitively evaluating, rather than in the quality of the evaluation itself. Once we take type 1 performance into account, we see that A and B are in fact equally effective at metacognitively evaluating the type 1 information available to them. In this sense, A and B have equivalent “relative” type 2 sensitivity, i.e. type 2 sensitivity relative to type 1 performance.
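This point can be illustrated numerically. The sketch below (our illustration, assuming the equal-variance Gaussian SDT model, an unbiased type 1 criterion, and a metacognitively optimal observer) computes the area under the “S2”-response type 2 ROC curve for observers A and B; only d′ differs between them:

    Phi_bar = @(x) 0.5 * erfc(x / sqrt(2));       % P(X > x) for a standard normal

    c1 = 0;                                       % unbiased type 1 criterion
    for d = [1 2]                                 % observers A and B
        c2 = linspace(c1, c1 + 6, 1000);          % sweep the confidence criterion
        % Type 2 hit / false alarm rates among "S2" responses (x > c1)
        t2_hit = Phi_bar(c2 - d/2) ./ Phi_bar(c1 - d/2);
        t2_fa  = Phi_bar(c2 + d/2) ./ Phi_bar(c1 + d/2);
        auroc2 = trapz(fliplr(t2_fa), fliplr(t2_hit));   % area under the type 2 ROC
        fprintf('d'' = %g: type 2 ROC area = %.3f\n', d, auroc2);
    end

Although the metacognitive computation is identical for both observers, the observer with the larger d′ obtains the larger type 2 ROC area.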

Note that absolute and relative type 2 sensitivity assess different aspects of metacognitive performance. Absolute type 2 sensitivity measures how much information confidence ratings carry about task performance. Relative type 2 sensitivity factors out the contribution of type 1 performance to absolute type 2 sensitivity, thus revealing the efficacy of metacognitive processing in and of itself. In other words, absolute type 2 sensitivity tells us how much we should trust an observer’s confidence ratings, which depends on the quality of information being metacognitively evaluated as well as the quality of the metacognitive evaluation itself. Relative type 2 sensitivity separates these factors, providing a measure of the quality of the metacognitive evaluation itself.

For many research applications we are interested specifically in assessing the efficacy of metacognitive mechanisms in and of themselves. In such instances, measures of absolute type 2 sensitivity such as area under the type 2 ROC curve (e.g. Kolb and Braun, 1995, Wilimzig et al., 2008) may not be appropriate, because such measures are likely to be influenced by both the efficacy of metacognitive function and the quality of information those mechanisms are evaluating.

How should we measure relative type 2 sensitivity? We endorse the proposal of Galvin et al. (2003) to evaluate observed type 2 sensitivity with reference to the type 2 sensitivity that would be expected to occur, given an SDT analysis of the observed type 1 performance (henceforth, “SDT-expected type 2 sensitivity”). Galvin et al. envisioned doing this comparison at the level of type 2 distributions of evidence, conditional on response accuracy. But this approach meets with several difficulties. It is difficult to compute SDT-expected type 2 sensitivity since it is difficult to derive general mathematical forms of the type 2 distributions from the type 1 model. And it is unclear how to compute observed type 2 sensitivity from observed type 2 ROC data in terms of parametric type 2 distributions, given their complexity and dependence on type 1 model parameters.

We observe that the spirit of Galvin et al.’s analysis can be retained while bypassing the difficulties of working directly with type 2 distributions (Fig. 2). Due to the theoretical link between type 1 and type 2 SDT models (Fig. 1B), type 2 sensitivity can be expressed at the level of type 1 distributions (Fig. 2A). That is, we can characterize observed type 2 sensitivity as the value of d′ that a metacognitively optimal observer would have required to produce the empirically observed type 2 data. We call this measure “meta-d′” to reflect that it is a measure of type 2 sensitivity (meta-) expressed at the level of type 1 SDT (d′). One can think of meta-d′ as a measure of the signal that is available for the subject to perform the type 2 task. While meta-d′ measures observed type 2 sensitivity, its counterpart for SDT-expected type 2 sensitivity is simply the empirically observed value of d′. Importantly, since meta-d′ is expressed in the same scale as the conventionally estimated d′ value, the two can be compared directly. The comparison of meta-d′ with d′ achieves the comparison of observed type 2 sensitivity to SDT-expected type 2 sensitivity. In turn, this comparison gives us a measure of relative type 2 sensitivity.
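For reference, the observed type 2 data that enter this comparison are simply the response-conditional type 2 hit and false alarm rates computed from the trial-by-trial ratings. A minimal sketch with hypothetical data (binary confidence, “S2” responses only):

    % Hypothetical trials: stimulus/response coded 1 = S1, 2 = S2; conf 1 = high
    stimulus = [2 2 1 2 1 1 2 1 2 2];
    response = [2 2 2 2 1 1 2 1 1 2];
    conf     = [1 1 0 1 0 1 0 0 0 1];

    respS2  = (response == 2);
    correct = (stimulus == response);

    % Among "S2" responses: P(high confidence | correct) and P(high confidence | incorrect)
    t2_hit = mean(conf(respS2 & correct));
    t2_fa  = mean(conf(respS2 & ~correct));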

Meta-d′ has high interpretational value. If meta-d′ = d′, then the observer exhibits type 2 sensitivity in agreement with what the standard SDT model would expect it to be, given the observed type 1 performance. In other words, on an SDT analysis we could say that the observer is metacognitively “ideal,” making use of all the information available for the type 1 task when performing the type 2 task. If meta-d′ ≠ d′, then the observer’s type 2 sensitivity either outperforms or underperforms expectation (Fig. 1C). Typically, one would expect meta-d′ ≤ d′, on the assumption that the information available for the type 1 task is exhaustive of the information available for the type 2 task. In this case, the degree to which meta-d′ is smaller than d′ reflects the degree to which the observer is metacognitively inefficient.

Because d′ has ratio scaling properties (Macmillan & Creelman, 2005), differences and ratios of d′ values are meaningful; for instance, if observer A has d′ = 2 and observer B has d′ = 1, it is meaningful to say that A has twice the sensitivity of B. Since meta-d′ is expressed on the same scale as d′, numerical comparisons between d′ and meta-d′ also yield meaningful quantities. Thus, we are not limited to testing the null hypothesis that d′ = meta-d′, but can make graded assessments of relative type 2 sensitivity based on differences or ratios. For instance, one can meaningfully state that a certain psychophysical manipulation changed an observer’s metacognitive capacity from 100% to 70%, or that a certain drug reduced the observer’s metacognitive capacity by 0.3 signal-to-noise ratio units (since d′ and meta-d′ are expressed in signal-to-noise ratio units).
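For example (with made-up numbers):

    d_prime = 2.0;                     % observed type 1 sensitivity
    meta_d  = 1.4;                     % estimated type 2 sensitivity

    efficiency = meta_d / d_prime;     % 0.7, i.e. 70% of SDT-expected type 2 sensitivity
    deficit    = d_prime - meta_d;     % 0.6 signal-to-noise units below expectation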

The detailed method for estimating meta-d′ is described in Supplementary Online Materials. In brief, the central idea of the estimation is that the type 1 SDT model entails what the type 2 ROC curves for each type 1 response should be (Fig. 1B). Thus, we can directly fit the parameters of a type 1 SDT model so as to optimize the fit of its predicted type 2 ROC curves to the observed type 2 ROC data (Fig. 1C). Meta-d′ is the d′ of the type 1 SDT model that maximizes the likelihood of the observed type 2 ROC data (given a response bias similar to that observed in the empirical data; see Supplementary Online Materials). So for instance, if an observer has d′ = 2 and meta-d′ = 1, we could say that although their actual d′ is 2, their response-conditional type 2 ROC curves behave as if their d′ were only 1.
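The sketch below illustrates the idea in a deliberately simplified form; it is not the authors’ implementation (which is available at the URL below and handles multiple rating levels and both response types). Assuming the equal-variance Gaussian model, a fixed type 1 criterion, and a single confidence criterion for “S2” responses, it finds the meta-d′ that maximizes the likelihood of hypothetical type 2 counts:

    Phi_bar = @(x) 0.5 * erfc(x / sqrt(2));        % P(X > x) for a standard normal

    % Hypothetical counts among "S2" responses
    nHighCorrect = 60;  nLowCorrect = 40;          % correct responses, high/low confidence
    nHighError   = 10;  nLowError   = 30;          % incorrect responses, high/low confidence

    c1 = 0;  c2 = 0.8;                             % type 1 and type 2 criteria (fixed here for simplicity)

    % Model-predicted type 2 hit and false alarm rates for a candidate meta-d'
    pHit = @(m) Phi_bar(c2 - m/2) / Phi_bar(c1 - m/2);
    pFA  = @(m) Phi_bar(c2 + m/2) / Phi_bar(c1 + m/2);

    % Negative log-likelihood of the observed type 2 counts under the candidate meta-d'
    negLogLik = @(m) -( nHighCorrect*log(pHit(m)) + nLowCorrect*log(1 - pHit(m)) + ...
                        nHighError  *log(pFA(m))  + nLowError  *log(1 - pFA(m)) );

    meta_d_hat = fminsearch(negLogLik, 1);         % maximum-likelihood estimate of meta-d'

In the full procedure, the type 2 ROC curves for both response types and all rating levels are fit jointly, with the type 1 response bias constrained to resemble the empirically observed bias (see Supplementary Online Materials).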

For implementation, we provide free Matlab code for easy estimation of meta-d′ (http://www.columbia.edu/~bsm2105/type2sdt/). On that website we also provide further documentation giving a full technical treatment of several theoretical and computational issues.

In the present study, we applied this new analysis approach to estimate subjects’ metacognitive sensitivity in a spatial 2IFC (2-interval forced-choice) visual task. Specifically, we tested how far they deviated from optimal metacognitive sensitivity given their type 1 performance.


Methods

Thirty participants performed a spatial 2IFC task. In each trial, participants distinguished between 2 spatial arrangements of visual stimuli and then rated their confidence in the accuracy of their responses on a four-point scale. Details are reported in Supplementary Online Materials.

Results

Although we tried to titrate the stimulus contrast to control for type 1 performance level, there was substantial between-subject variation in (type 1) d′. Nonetheless, we can use this to our advantage by observing the results of the meta-d′ estimation across a range of d′ values. In Fig. 3, we plotted meta-d′ vs d′ for every subject. Note that there was a substantial positive relationship between these variables, with most data points clustering near the line meta-d′ = d′, in line …

Discussion

In this study we have employed a novel method for isolating and measuring the sensitivity with which metacognitive mechanisms differentiate between correct and incorrect decisions. One of the primary strengths of SDT is that we can use it to calculate d′, a measure of stimulus classification sensitivity independent of the influence of response bias. In a similar spirit, we have demonstrated a method for extending the standard SDT model in order to estimate meta-d′, a measure of type 2 …

Acknowledgment

This work is supported by internal funding from Columbia University (to HL). We thank Dobromir Rahnev and Steve Fleming for their helpful comments on the manuscript.
