Abstract
Humans typically make near-optimal sensorimotor judgments but show systematic biases when making more cognitive judgments. Here we test the hypothesis that, while humans are sensitive to the noise present during early sensory processing, the “optimality gap” arises because they are blind to noise introduced by later cognitive integration of variable or discordant pieces of information. In six psychophysical experiments, human observers judged the average orientation of an array of contrast gratings. We varied the stimulus contrast (encoding noise) and orientation variability (integration noise) of the array. Participants adapted near-optimally to changes in encoding noise, but, under increased integration noise, displayed a range of suboptimal behaviours: they ignored stimulus base rates, reported excessive confidence in their choices, and refrained from opting out of objectively difficult trials. These overconfident behaviours were captured by a Bayesian model which is blind to integration noise. Our study provides a computationally grounded explanation of suboptimal cognitive inferences.
The question of whether humans make optimal choices has received considerable attention in the neural, cognitive and behavioural sciences. On one hand, the general consensus in sensory psychophysics and sensorimotor neuroscience is that choices are near-optimal. For example, humans have been shown to combine different sources of stimulus information in a statistically near-optimal manner, weighting each source by its reliability (Ernst & Banks, 2002; Knill, Kersten, & Yuille, 1996; Körding & Wolpert, 2006; Ma, Beck, Latham, & Pouget, 2006; Mamassian, Landy, & Maloney, 2002; Trommershäuser, Maloney, & Landy, 2008). Humans have also been shown to near-optimally utilise knowledge about stimulus base rates to resolve stimulus ambiguity (Kersten, Mamassian, & Yuille, 2004; Körding & Wolpert, 2004; O’Reilly, Jbabdi, Rushworth, & Behrens, 2013; Sun & Perona, 1998; Vilares, Howard, Fernandes, Gottfried, & Kording, 2012).
On the other hand, psychologists and behavioural economists, studying more cognitive judgments, have argued that human choices are suboptimal (Tversky & Kahneman, 1974). For example, when required to guess a person’s occupation, humans neglect the base rate of different professions and solely rely on the person’s description provided by the experimenter. Such suboptimality has been attributed to insufficient past experience (Hertwig & Erev, 2009), limited stakes in laboratory settings (Levitt & List, 2007), the format in which problems are posed (Jarvstad, Hahn, Rushton, & Warren, 2013), distortions in representations of values and probabilities (Ackermann & Landy, 2014), and/or a reluctance to employ costly cognitive resources (Gershman, Horvitz, & Tenenbaum, 2015; Kahneman, 2011). However, an account of human decision-making that can explain both perceptual optimality and cognitive suboptimality has yet to emerge (Summerfield & Tsetsos, 2015).
Here we propose that resolving this apparent paradox requires recognizing that perceptual and cognitive choices often are corrupted by different sources of noise. More specifically, choices in perceptual and cognitive tasks tend to be corrupted by noise which arises at different stages of the information processing leading up to a choice (Faisal & Wolpert, 2009; Hunt, 2014; Juslin & Olsson, 1997; Ma & Jazayeri, 2014). In perceptual tasks, experimenters typically manipulate noise arising before or during sensory encoding. For example, they may vary the contrast of a grating, or the net motion energy in a random dot kinematogram, which affects the signal-to-noise ratio of the encoded stimulus and in turn the sensory percept. Conversely, in cognitive tasks, which often involve written materials or clearly perceptible stimuli, experimenters typically seek to manipulate noise arising after stimulus encoding. For example, they may vary the discrepancy between different pieces of information bearing on a choice, such as the relative costs and benefits of a consumer product (Kahneman, 2011). These types of judgment are difficult because they require integration of multiple, sometimes highly discordant, pieces of information within a limited-capacity system (Botvinick, Braver, Barch, Carter, & Cohen, 2001; Eriksen & Eriksen, 1974; MacLeod, 1991).
Here we test the hypothesis that, while humans are sensitive to noise arising during early sensory encoding, they are blind to the additional noise introduced by their own cognitive system when integrating variable or discordant pieces of information. We tested this hypothesis using a novel psychophysical paradigm which separates, within a single task, these two types of noise. In particular, observers were asked to categorise the average tilt of an array of gratings. First, we manipulated encoding noise (i.e. the perceptual difficulty of encoding an individual piece of information) by changing the contrast of the array of gratings, with decisions being harder for low-contrast arrays. Second, we manipulated integration noise (i.e. the cognitive difficulty of integrating multiple pieces of information) by changing the variability of the orientations of individual gratings, with decisions being harder for high-variability arrays. Manipulating these different sources of noise within a single task allows us to rule out previous explanations of the optimality gap which hinge on task differences. To pre-empt our results, we show that, while observers adapt near-optimally to increases in encoding noise, they fail to adapt to increases in integration noise. We argue that such “noise blindness” is a major driver of suboptimal inference and may explain the gap in optimality between perceptual and cognitive judgments.
Results
Experimental dissociation of encoding noise and integration noise
All six experiments were based on the same psychophysical task (see Methods). On each trial, participants were presented with eight tilted gratings organized in a circular array. Participants were required to categorise the average orientation of the array as oriented clockwise (CW) or counter-clockwise (CCW) from the horizontal axis (Fig. 1A-B). After having made a response, participants received categorical feedback about choice accuracy, before continuing to the next trial. We manipulated two features of the stimulus array to dissociate encoding noise and integration noise: the contrast of the gratings (root mean square contrast, rmc: {0.15, 0.6}), which affects encoding noise, and the variability of the gratings’ orientations (standard deviation of orientations, std: {0°, 4°, 10°}), which affects integration noise. The distribution of average orientations was identical for all experimental conditions.
In Experiments 1 (n = 20) and 2 (n = 20), we assessed the effects of contrast and variability on choice accuracy and evaluated participants’ awareness of these effects. In both experiments, at the beginning of a trial, we provided a “prior” cue which, on half of the trials, signalled the correct stimulus category with 75% probability (henceforth “biased” trials), and, on the other half of trials, provided no information about the stimulus category (henceforth “neutral” trials) (Fig. 1B). The neutral trials provided us with a baseline measure of participants’ choice accuracy in the different conditions of our factorial design, and the biased trials allowed us to assess the degree to which – if at all – participants compensated for reduced choice accuracy in a given experimental condition by relying more on the prior cue. In Experiment 2, to provide additional insight into participants’ awareness of their own performance, we also asked participants to report their confidence in the choice (i.e. the probability that a choice is correct; Fig. 1C).
Matched performance for different levels of encoding and integration noise
We first used the neutral trials to benchmark the effects of contrast and variability on choice accuracy. As intended, choice accuracy decreased with lower contrast (Exp1: F(1,19) = 15.54, p < .001; Exp2: F(1,19) = 41.08, p < .001; collapsed: F(1,39) = 49.3, p < .001) and with higher variability (Exp1: F(1.3,24.7) = 8.51, p < .001; Exp2: F(1.6,32.2) = 26.0, p < .001; collapsed: F(1.4,57.3) = 30.61, p < .001). Our factorial design contained three critical conditions which allowed us to compare participants’ behaviour under distinct sources of noise: (i) “baseline”, (ii) “low-c” and (iii) “high-v”. In the baseline condition, the total amount of noise is lowest (high contrast, .6; zero variability, 0°). In the low-c condition (low contrast, .15; zero variability, 0°), encoding noise is high but integration noise is low. Conversely, in the high-v condition, integration noise is high but encoding noise is low (high contrast, 0.6; high variability, 10°). As expected, choice accuracy was reduced both in the low-c and in the high-v conditions (about 12%) compared to the baseline condition (baseline>low-c: t(39) = 9.24, p < .001; baseline>high-v: t(39) = 9.70, p < .001; Fig. 2A). Critically, choice accuracy was at statistically similar levels in the low-c and the high-v conditions (Exp1, high-v>low-c: t(19) = 0.36, p > 0.7; Exp2, high-v>low-c: t(19) = 0.11, p > 0.9; collapsed, high-v>low-c: t(39) = 0.34, p > 0.7; Fig. 2A). Overall, the results show that we successfully manipulated noise at different stages of information processing.
Do people utilise the prior cue to compensate for increased errors?
We next leveraged the biased trials to assess the degree to which participants adapted to the changes in choice accuracy induced by our factorial design. Given the above results, we would expect participants to rely more on the prior cue in the low-c and the high-v condition than in the baseline condition. To test this prediction, we applied Signal Detection Theory (Macmillan & Creelman, 2004; Stanislaw & Todorov, 1999) to quantify the degree to which participants shifted their decision criterion in accordance with the prior cue (see Methods). Briefly, we constructed a “bias index” computed as the difference in the decision criteria between the condition in which the prior cue was “clockwise” and the condition in which the prior cue was “counter-clockwise”. The higher the bias index, the higher the influence of the prior cue on choice. As expected under an ideal observer framework, participants used the prior cue more in the low-c than in the baseline condition (t(39) = 4.89, p < .001; Fig. 2C). However, contrary to an ideal observer framework, participants used the prior cue less in the high-v than in the baseline condition (t(39) = 2.85, p < .01; Fig. 2C). This pattern is clear from the psychometric curves constructed separately for each condition shown in Fig. 2B (compare inflection points).
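As a rough illustration of the bias index, the sketch below computes the standard equal-variance SDT criterion separately for each cue direction and takes the difference. The function names, the convention that a "hit" is a CW response to a CW stimulus, the sign convention (CCW-cue criterion minus CW-cue criterion, so that larger values mean a stronger cue influence) and the example response rates are all assumptions made for illustration; they are not taken from the Methods.

```python
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def criterion(hit_rate, false_alarm_rate):
    """Equal-variance SDT criterion: c = -0.5 * (z(H) + z(FA)).

    Assumed convention: a hit is a CW response to a CW stimulus and a
    false alarm is a CW response to a CCW stimulus. Rates must lie
    strictly between 0 and 1.
    """
    return -0.5 * (_Z(hit_rate) + _Z(false_alarm_rate))


def bias_index(hit_cw_cue, fa_cw_cue, hit_ccw_cue, fa_ccw_cue):
    """Criterion shift induced by the prior cue (assumed sign convention)."""
    return criterion(hit_ccw_cue, fa_ccw_cue) - criterion(hit_cw_cue, fa_cw_cue)


# Hypothetical response rates for one participant in one condition.
example = bias_index(hit_cw_cue=0.90, fa_cw_cue=0.40,
                     hit_ccw_cue=0.60, fa_ccw_cue=0.10)
```

With these hypothetical rates the criterion is liberal under a CW cue and conservative under a CCW cue, so the index is positive; identical rates under both cues would give an index of zero.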
In line with these results, a full factorial analysis of the bias index identified a positive main effect of contrast (F(1,39) = 24.02, p < .001) and a negative main effect of variability (F(1.9,37.1) = 9.9, p < .001; Fig. 2D). Finally, including both neutral and biased trials, we used trial-by-trial logistic regression to investigate how contrast (c) and variability (v) affected the influence of the prior cue and sensory evidence (μθ) on choices (μθ, cue, μθ*c, μθ*v, cue*c, cue*v; Fig. 2E). The prior cue had a larger influence on choices on low-contrast compared to high-contrast trials (t(39) = 4.05, p < .001) and on low-variability compared to high-variability trials (t(39) = 5.21, p < .001). Taken together, these results show that participants did not adapt to the additional noise arising during integration of discordant pieces of information.
Are people blind to integration noise?
To test whether participants failed to adapt because they were “blind” to integration noise, we analysed the confidence reports elicited in Experiment 2 (Fig. 1C). We implemented a strictly-proper scoring rule such that it was in participants’ best interest (i) to make as many accurate choices as possible and (ii) to estimate the probability that a choice is correct as accurately as possible (Sonnemans & Offerman, 2001). In support of our hypothesis, analysis of the full factorial design showed that, while confidence varied with contrast (F(1,19) = 32.97, p < .001), it did not vary with variability (F(1.2,22.5) = 0.73, p > 0.4). In addition, direct comparison between the low-c and high-v conditions showed that participants were more confident in the high-v condition (t(19) = 3.98, p < .001; Fig. 3A), with participants overestimating their performance (difference between mean confidence and mean accuracy; t(19) = 2.66, p < .05; Fig. 3A). Although participants reported lower confidence in the high-v condition compared to baseline (Fig. 3A), this decrease was due to participants utilising response times as a cue to confidence (Zakay & Tuvia, 1998): a trial-by-trial regression analysis showed that confidence decreased with longer response times (RTs) and was unaffected by variability once RTs had been accounted for (v: t(19) = 0.38, p > 0.7; all other t-values > 4, all p < .001; see Fig. 3B and Response times in the Supplementary Information). Overall, these results show that participants were overconfident under integration noise, as if they were “blind” to the impact of integration noise on their performance.
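To illustrate why a strictly proper scoring rule rewards honest confidence reports, the sketch below uses the quadratic (Brier-type) rule. This is an assumption: the specific rule used in Experiment 2 is not stated in this section, and the function names and the example accuracy are hypothetical.

```python
def quadratic_score(report, correct):
    """Quadratic (Brier-type) score; higher is better.

    report  : stated probability that the choice is correct (0..1)
    correct : True if the choice was in fact correct
    """
    outcome = 1.0 if correct else 0.0
    return 1.0 - (outcome - report) ** 2


def expected_score(report, p_correct):
    """Expected score when the choice is truly correct with p_correct."""
    return (p_correct * quadratic_score(report, True)
            + (1.0 - p_correct) * quadratic_score(report, False))


# Search a grid of possible reports for the one with the highest expected score.
grid = [i / 100 for i in range(50, 101)]
true_p = 0.70  # hypothetical true accuracy
best_report = max(grid, key=lambda r: expected_score(r, true_p))
```

Under this rule the expected score peaks when the report equals the true probability of being correct, so an overconfident report lowers the expected payoff.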
In Experiment 3 (n = 18), because explicit confidence reports can be highly idiosyncratic (Aitchison, Bang, Bahrami, & Latham, 2015; Bang et al., 2017), we obtained an implicit, but perhaps more direct, measure of confidence (Hampton, 2001; Kepecs & Mainen, 2012; Kiani & Shadlen, 2009). Specifically, on half of the trials (“optional trials”), we introduced an additional choice option, an opt-out option, which yielded “correct” feedback with a 75% probability. On the other half of trials (“forced trials”), participants had to make an orientation judgment. Under this design, to maximise reward, participants should choose the opt-out option whenever they thought they were less than 75% likely to make a correct choice. Despite matched levels of choice accuracy in the low-c and the high-v conditions (forced trials, t(17) = 0.24, p > 0.8), participants decided to make an orientation judgment more often on high-v than on low-c trials (optional trials, t(17) = 2.32, p < .05; Fig. 3D), again indicating overconfidence in the face of integration noise. A full factorial analysis verified that the proportion of such opt-in trials varied with contrast (F(1,17) = 21.2, p < .001) but not with variability (F(1.4,23.9) = 3.6, p > 0.05). Similarly, a trial-by-trial logistic regression showed that the probability of opting in varied with contrast (t(17) = 6.93, p < .001) but not with variability (t(17) = 1.6, p > 0.1), after controlling for other task-relevant factors (e.g., average orientation and RTs). In sum, participants opted out more often when encoding noise was high, but did not do so when integration noise was high, despite making a comparable proportion of errors in the two conditions.
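The reward-maximising opt-out rule described above can be sketched as follows. The function names and the example probabilities are hypothetical; only the 75% opt-out payoff comes from the text.

```python
OPT_OUT_P = 0.75  # probability of "correct" feedback for the opt-out option


def opts_out(believed_p_correct):
    """Reward-maximising rule: opt out when believed accuracy is below 75%."""
    return believed_p_correct < OPT_OUT_P


def expected_correct(true_p_correct, believed_p_correct):
    """Expected probability of "correct" feedback on an optional trial."""
    return OPT_OUT_P if opts_out(believed_p_correct) else true_p_correct


# Hypothetical trial on which the true probability of a correct judgment is 0.65.
calibrated = expected_correct(0.65, believed_p_correct=0.65)     # opts out
overconfident = expected_correct(0.65, believed_p_correct=0.85)  # opts in
```

In this toy case the calibrated agent opts out and secures the 75% option, whereas the overconfident agent opts in and obtains only its true accuracy, which is how overconfidence translates into lost reward on optional trials.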
Computational model of noise blindness
We next compared a set of computational models based on the ideal observer framework to provide a mechanistic explanation for the observed data (see Methods). There are broadly three components to our modelling approach. First, a generative (true) model which describes the task structure and the generation of noisy sensory data. Second, an agent’s internal model of the task structure and how sensory data is generated; the internal model may differ from the generative model. Finally, a Bayesian inference process which involves inverting the internal model in order to estimate the probability of a stimulus category given sensory data and generate a response. This inference process involves marginalising over contrast and variability levels according to a belief distribution over experimental conditions. Optimal behaviour can be said to occur when there is a direct correspondence between the generative model and the agent’s internal model. We evaluated the models both qualitatively (i.e. model predictions for critical experimental conditions) and quantitatively (i.e. BIC scores).
We focus on an “omniscient” model, which has perfect knowledge of the task structure and how sensory data is generated, and two suboptimal models which propose different mechanistic explanations of participants’ lack of sensitivity to the performance cost associated with stimulus variability. The suboptimal models relax the omniscient assumptions about an agent’s beliefs about (i) the task structure and/or (ii) the sources of noise in play. See Supplementary Information for details about all models considered.
In our task the average orientation of a stimulus array was sampled from a common distribution of orientations across experimental conditions (Fig. 4A). We modelled an agent’s sensory data as a random (noisy) sample from a Gaussian distribution centred on the average orientation of the stimulus array (Fig. 4B), with the variance of this distribution determined by both encoding noise and integration noise. We used each participant’s data from the neutral trials to parameterise their levels of encoding noise and integration noise in each experimental condition (see Methods). The fitted noise levels, which are part of the generative model, were the same for all models; thus no additional free parameters were fitted to the data and the models only differed with respect to their assumptions about the internal model used for Bayesian inference.
The omniscient model has, for each experimental condition, a pair of functions that specify the probability density over sensory data given a CW and a CCW stimulus, taking into account both encoding and integration noise. As the model can identify the current condition (e.g., knows with certainty that a trial is drawn from the high-contrast, high-variability condition), it only uses the relevant pair of density functions to compute the probability of the observed sensory data given a CCW and a CW category (Fig. 4C). On neutral trials, each category is equally likely, and the agent computes the probability that a stimulus is CW and CCW directly from the density functions. On biased trials, the categories have different prior probabilities, and the agent scales the density functions by the prior probability of each category as indicated by the prior cue (Fig. 4D). After having calculated the probability that a stimulus is CW and CCW, the agent can compute a choice (i.e. choose the category with the higher posterior probability) and confidence in this choice (i.e. the probability that the choice is correct).
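A minimal sketch of this computation is given below. It makes several simplifying assumptions that are not part of the model specification: the two Gaussian components of the stimulus distribution are treated as the two categories, each category's density over sensory data is Gaussian, encoding and integration noise are independent so that their variances add, and positive angles denote CW. The noise values and function names are hypothetical.

```python
from statistics import NormalDist

CATEGORY_MEAN = 3.0   # deg; centres of the two stimulus distributions
CATEGORY_SD = 8.0     # deg; spread of average orientations


def posterior_cw(x, sd_encoding, sd_integration, prior_cw=0.5):
    """P(CW | sensory sample x) for an observer that knows both noise sources.

    Simplifying assumption: each category's density over x is Gaussian,
    centred at +/-3 deg, with independent variances adding up.
    """
    total_sd = (CATEGORY_SD**2 + sd_encoding**2 + sd_integration**2) ** 0.5
    like_cw = NormalDist(+CATEGORY_MEAN, total_sd).pdf(x)
    like_ccw = NormalDist(-CATEGORY_MEAN, total_sd).pdf(x)
    # Scale each density by the prior probability of its category.
    weighted_cw = prior_cw * like_cw
    weighted_ccw = (1.0 - prior_cw) * like_ccw
    return weighted_cw / (weighted_cw + weighted_ccw)


def choose(x, sd_encoding, sd_integration, prior_cw=0.5):
    """Return (choice, confidence): the more probable category and its posterior."""
    p_cw = posterior_cw(x, sd_encoding, sd_integration, prior_cw)
    return ("CW", p_cw) if p_cw >= 0.5 else ("CCW", 1.0 - p_cw)


# Hypothetical sensory sample of -1 deg, without and with a 75%-valid CW cue.
neutral = choose(-1.0, sd_encoding=2.0, sd_integration=4.0)
biased = choose(-1.0, sd_encoding=2.0, sd_integration=4.0, prior_cw=0.75)
```

In this example the weakly CCW sample leads to a CCW choice on the neutral trial, but the CW cue outweighs it on the biased trial. The wider the density functions, the more easily the cue overrides the sensory sample.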
We now consider two competing explanations of the participants’ lack of sensitivity to the performance cost associated with stimulus variability. First, a variability-mixer model which relaxes the assumption that an agent can identify the current variability condition. The model therefore uses a single pair of density functions for all variability conditions (which are a mixture of density functions across variability levels). As a result, compared to the omniscient model, the density functions are wider on low-variability trials but narrower on high-variability trials. Second, a noise-blind model which relaxes the assumption that the agent is aware of integration noise. As for the variability-mixer model, the noise-blind model uses a single pair of density functions for all variability conditions, but, critically, these density functions do not take into account the additional noise induced by stimulus variability. Because of these differences in the internal model used for Bayesian inference, the models differ in the degree of confidence in a choice for given sensory data (Fig. 4F) and, by extension, the influence of the prior cue on choice on biased trials.
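The difference between the three internal models can be made concrete with a small sketch that returns the confidence each model assigns to the same sensory sample on a neutral trial. As a simplification, category densities are taken to be Gaussian and centred at ±3°, with independent noise variances that add; the noise magnitudes, the equal weighting of variability levels in the mixture, and all names are hypothetical.

```python
from statistics import NormalDist

CATEGORY_MEAN, CATEGORY_SD = 3.0, 8.0        # deg
SD_ENCODING = 2.0                            # hypothetical encoding noise (deg)
SD_INTEGRATION = {0: 0.0, 4: 3.0, 10: 6.0}   # hypothetical, per variability level


def likelihoods(x, sd_integration):
    """Densities of x under CW and CCW for one assumed integration-noise level."""
    sd = (CATEGORY_SD**2 + SD_ENCODING**2 + sd_integration**2) ** 0.5
    return NormalDist(CATEGORY_MEAN, sd).pdf(x), NormalDist(-CATEGORY_MEAN, sd).pdf(x)


def confidence(x, model, variability):
    """Confidence on a neutral trial under one of three internal models."""
    if model == "omniscient":            # knows the condition and its noise
        levels = [SD_INTEGRATION[variability]]
    elif model == "variability-mixer":   # averages densities over variability levels
        levels = list(SD_INTEGRATION.values())
    elif model == "noise-blind":         # ignores integration noise altogether
        levels = [0.0]
    else:
        raise ValueError(model)
    cw = sum(likelihoods(x, s)[0] for s in levels) / len(levels)
    ccw = sum(likelihoods(x, s)[1] for s in levels) / len(levels)
    p_cw = cw / (cw + ccw)
    return max(p_cw, 1.0 - p_cw)


# Same sensory sample on a high-variability trial, three different confidences.
x = 6.0
conf = {m: confidence(x, m, variability=10)
        for m in ("omniscient", "variability-mixer", "noise-blind")}
```

Under these assumptions the noise-blind model is the most confident on high-variability trials and the omniscient model the least, with the variability-mixer model in between. On zero-variability trials the omniscient and noise-blind models coincide.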
In support of our hypothesis, the noise-blind model provided the best fit to our data. First, the noise-blind model, and not the omniscient model, predicted three key features of participants’ behaviour: (i) overconfidence on high-variability trials within participants (Fig. S2), (ii) no correlation between mean accuracy and mean confidence across participants (Fig. 5A) and (iii) a diminished influence of the prior cue on high-variability trials, as seen by both the analysis of the bias index (Fig. 5C) and the trial-by-trial regression predicting confidence (Fig. 5D), where the prior cue has a positive effect on confidence but its effect decreases with high contrast and high variability (in line with noise blindness). In addition, quantitative comparison yielded “very strong evidence” (Kass & Raftery, 1995) for the noise-blind model over the omniscient model, with an average ΔBIC across participants of -32.9 (Fig. 5B). Similarly, analyses of the patterns of overconfidence in the critical conditions of our factorial design favoured the noise-blind over the variability-mixer model (Fig. S2), and quantitative comparison yielded “very strong evidence” for the noise-blind over the variability-mixer model (ΔBIC = -20.4, Fig. 5B). In sum, the modelling indicates that participants neglected integration noise altogether.
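For readers unfamiliar with BIC-based comparison, the sketch below shows the arithmetic. The log-likelihood values are hypothetical, and the sign convention (first model minus second, so that negative values favour the first) is inferred from the negative ΔBIC values reported above rather than stated explicitly.

```python
import math


def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


def delta_bic(loglik_a, loglik_b, n_params_a, n_params_b, n_trials):
    """BIC of model A minus BIC of model B; negative values favour A."""
    return bic(loglik_a, n_params_a, n_trials) - bic(loglik_b, n_params_b, n_trials)


# Hypothetical log-likelihoods for one participant; equal parameter counts.
d = delta_bic(loglik_a=-400.0, loglik_b=-416.0,
              n_params_a=0, n_params_b=0, n_trials=1296)
very_strong = abs(d) > 10  # Kass & Raftery (1995) threshold
```

Because the models compared here share the same fitted noise levels and add no further free parameters, the complexity penalty cancels and ΔBIC reduces to minus twice the difference in log-likelihood.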
Participants are noise blind and not variability blind
To further rule out the hypothesis that participants were simply unable to discriminate the variability conditions as proposed by the variability-mixer model, we ran Experiment 4 (n = 24). After having made a choice, participants were asked to categorise either the contrast of the stimulus array (rmc, high: .6 vs. low: .15) or the variability of the stimulus array (std, high: 10° vs. low: 0°) (Fig. 1E). Again, choice accuracy on neutral trials in the low-c and the high-v conditions was statistically indistinguishable (t(23) = 1.16, p > 0.2). We reasoned that, if participants had difficulty identifying the variability condition but were otherwise aware of integration noise, then they should behave closer to optimal when they correctly identified the variability condition. To test this prediction, we used the biased trials to compare cue usage when the variability condition was correctly and incorrectly categorised (75.71% ± 2.26% of the variability-condition judgments were correct). In contrast to the prediction, but in line with our hypothesis, participants showed blindness to integration noise even when they correctly identified the variability condition: participants were more biased on low-c than high-v trials regardless of whether the variability categorisation was correct (t(23) = 3.21, p < .01) or incorrect (t(23) = 4.05, p < .001; Fig. 6A-B).
In Experiments 1-4, the experimental conditions were interleaved across trials, which may have made it too difficult for participants to separate the different sources of noise in play. To test the generality of our results, we ran Experiment 5 (n = 24) in which either the contrast or the variability level was kept constant across a block of trials (Fig. 6C-D). Even then, and despite receiving trial-by-trial feedback, participants were not, compared to the baseline condition, more influenced by the prior cue when variability was high (biased trials, t(23) = 0.32, p > 0.7), but they were when contrast was low (biased trials, t(23) = 3.31, p < .01). In other words, even under blocked conditions participants failed to learn about the performance cost associated with stimulus variability.
Sequential sampling account of noise blindness
A recent study investigated how stimulus volatility (i.e. changes in evidence intensity across a trial) affected choice and confidence (Zylberberg, Fetsch, & Shadlen, 2016). Participants were found to make faster responses and report higher confidence when stimulus volatility was high. These results were explained by a sequential sampling model which assumes that observers are unaware of stimulus volatility and therefore, unlike an “omniscient” model, adopt a common choice threshold across trial types. In the Supplementary Information, we show, using empirical and computational analyses, that this model cannot explain our results (Fig. S3). For example, the model predicts faster RTs on high-variability than low-variability trials, a prediction which is at odds with our observation of slower RTs on high-variability trials.
Noise blindness cannot be explained by subsampling
We have proposed that stimulus variability impairs performance because of noise inherent to cognitive integration of variable or discordant pieces of information. An alternative explanation of the performance cost for high stimulus variability is that participants based their responses on a subset of gratings rather than the full array. Under this subsampling account, choice accuracy for high-variability stimuli is lower because of a larger mismatch between the average orientation of the full array and the average orientation of the sampled subset. Here we provide several lines of evidence against the subsampling account (see details in Supplementary Information).
We first examined performance under different set-sizes in Experiment 6 (n = 20) where the stimulus array was made up of either four or eight gratings (average orientations and orientation variability were equated across set-sizes). We reasoned that, if participants did indeed engage in subsampling, then performance should be higher for four than eight gratings. Because of the matched average tilt in the array, sampling four items would impair performance in the high-v condition for an eight-item array but not for a four-item array. However, we found no effect of set-size on choice accuracy (F(1,20) = 0.006, p > 0.9; Fig. S4A); the effects of contrast (F(1,20) = 40.9, p < .001) and variability (F(1,20) = 30.50, p < .001) were comparable to those observed in our previous experiments.
We next simulated performance for eight-grating arrays under a subsampling agent which did not have integration noise but instead sampled a subset of the items (1-8 items, Fig. S4B). The observed difference in participants’ performance between the baseline and the high-v conditions could be explained by assuming an agent that sampled about four items out of eight. However, this account – because there is no integration noise – predicts that participants should have similar levels of performance for the baseline and the high-v conditions for four-item arrays, a prediction which is at odds with our data (Fig. S4A). If integration noise is introduced, then most, if not all, items would have to be sampled to account for the data.
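A simulation of this kind can be sketched as follows. This is not the simulation reported in Fig. S4B: the way arrays are constructed (standardised Gaussian draws rescaled to an exact mean and SD), the single encoding-noise term added to the averaged estimate, the noise magnitude, and the scoring of correctness by the sign of the true array mean are all assumptions made for illustration.

```python
import random


def make_array(mean, sd, n_items, rng):
    """Orientations with exactly the requested mean and (population) SD."""
    if sd == 0:
        return [mean] * n_items
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    m = sum(raw) / n_items
    s = (sum((r - m) ** 2 for r in raw) / n_items) ** 0.5
    return [mean + sd * (r - m) / s for r in raw]


def subsampling_accuracy(k, variability, sd_encoding=2.0,
                         n_items=8, n_trials=10000, seed=0):
    """Accuracy of an agent that averages k of n_items, with no integration noise."""
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(n_trials):
        centre = rng.choice((-3.0, 3.0))   # one Gaussian of the mixture
        mean = rng.gauss(centre, 8.0)      # average orientation of the array
        items = make_array(mean, variability, n_items, rng)
        sample = rng.sample(items, k)      # the subset the agent attends to
        estimate = sum(sample) / k + rng.gauss(0.0, sd_encoding)
        n_correct += (estimate > 0) == (mean > 0)
    return n_correct / n_trials


acc_full = subsampling_accuracy(k=8, variability=10.0)
acc_half = subsampling_accuracy(k=4, variability=10.0)
```

In this sketch, sampling fewer items lowers accuracy only when the array is variable, because the subset mean then deviates from the array mean; with zero variability every subset has the same mean and the number of items sampled has no systematic effect.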
Finally, we fitted a computational model to participants’ choices in Experiments 1 to 3 (eight-item arrays) in order to directly estimate the number of items sampled by each participant. This modelling approach revealed that the majority of participants (42 out of 60) sampled all eight items (Table S2). We note that subsampling, even if an auxiliary cause of integration noise, cannot without further assumptions (e.g. blindness to the performance cost) explain participants’ lack of sensitivity to the performance cost associated with high-variability stimuli.
Discussion
Here we propose a new explanation for the previously reported gap in optimality between perceptual and cognitive decisions. Using a novel paradigm, we show, within a single task, that humans are sensitive to noise present during sensory encoding, in keeping with previous perceptual studies (Ernst & Banks, 2002; Körding & Wolpert, 2004), but blind to noise arising when having to integrate variable or discordant pieces of information, a typical requirement in cognitive tasks. This noise blindness gave rise to two common signatures of suboptimality often found in cognitive studies: base-rate neglect and overconfidence.
We provided several lines of evidence for our hypothesis. When stimulus variability was high, participants were overconfident, as indicated by cue usage, subjective confidence reports as well as opt-in responses, even though they received trial-by-trial feedback, and even when stimulus variability was salient (Exp1-3), accurately categorised (Exp4) or constant across a block of trials (Exp5). These findings indicate that, while participants were able to track stimulus variability, they simply neglected the performance cost associated with high-variability stimuli. We also ruled out that such noise blindness was due to participants only sampling a subset of a stimulus array (Exp6). The best model of our data assumed that participants sampled all items and were blind to the additional noise inherent to cognitive integration of variable or discordant pieces of information.
An extensive literature has considered the different types of noise which affect human choices (Beck, Ma, Pitkow, Latham, & Pouget, 2012; Hunt, 2014; Juslin & Olsson, 1997). Our classification is partially related to a previous distinction between noise which originates inside the brain, such as intrinsic stochasticity in sensory transduction (Thurstone, 1927), and noise which arises outside the brain, such as a probabilistic relationship between a cue and a reward (Brunswik, 1956). Specifically, our account classifies noise according to when it arises during the information processing that precedes a choice. Encoding noise refers to noise accumulated up to the point at which a stimulus is encoded. As such, encoding noise includes both “external” noise (e.g., a weak correspondence between a retinal image in dim lighting and the object that caused the image) and “internal” noise (e.g., intrinsic stochasticity in sensory transduction). In comparison, integration noise strictly refers to internal noise which arises at later stages of information processing, such as when integrating variable or discordant pieces of information within a limited-capacity system. Under our account, any task that requires the combination of multiple pieces of evidence will be subject to integration noise, and the amount of integration noise will scale with the variability of the different pieces of information that must be combined. Choices may of course be affected by other types of noise than those considered here. For example, cognitive decisions may involve memories, sometimes distant in the past, and risk and ambiguity (Bach & Dolan, 2012; Payzan-LeNestour & Bossaerts, 2011).
Many psychophysical tasks confound encoding and integration noise. For instance, in a random dot-motion task, decreasing motion coherence simultaneously increases encoding noise (as instantaneous evidence is less indicative of the correct category of motion) and integration noise (as the variability of evidence across time is higher and thus harder to integrate). Recent work has shown that noisy cognitive inference, related to our notion of integration noise, is a major driver of variability in choices (Drugowitsch, Wyart, Devauchelle, & Koechlin, 2016). Similarly, it has been shown that for complex inference problems, a mismatch between an agent’s internal model of a task and the true structure of a task provokes departures from optimality (Beck et al., 2012). Here we extend these findings by introducing noise blindness as an additional driver of suboptimal cognitive inference. Specifically, the variability in choices caused by integration noise, or by imperfect inference, may not systematically bias choices away from the true choice. Blindness to these sources of choice variability, however, predicts systematic overconfidence, which may manifest itself as a lack of sensitivity to base-rate information, for example. In short, suboptimality can arise not only from having the “wrong” model of the task but also from having the “wrong” model of oneself.
We do not know why humans are blind to integration noise. One possibility is that basing decision strategies on all sources of noise would prolong deliberation and thus reduce reward rates, or that recognising one’s own cognitive deficiencies requires a much longer timeframe. However, a well-known cognitive illusion may help explain why blindness to one’s own cognitive deficiencies may not be catastrophic: even though failures to detect salient visual change suggest that cognitive processing is highly limited (Simons & Levin, 1997), humans enjoy rich, vivid visual experiences of cluttered natural scenes. Human information processing is sharply limited by capacity, but as agents we may not be fully aware of the extent of these limitations.
Author contributions
S.H.C., D.B., T.E. and C.S. conceived the study. S.H.C., D.B., J.D. and C.S. designed the experiments. S.H.C. programmed the experiments. S.H.C., D.B. and J.D. performed the experiments. S.H.C., D.B. and R.M. developed the models. S.H.C. and D.B. analysed the data and performed the simulations. S.H.C., D.B., J.D., T.E. and C.S. interpreted the results. S.H.C. drafted the manuscript. S.H.C., D.B. and C.S. wrote the manuscript.
Competing interests
The authors declare no financial or non-financial competing interests.
Methods
Participants
One hundred and five healthy human participants with normal or corrected-to-normal vision were recruited to participate in six experiments (72 females, 8 left-handed, mean age ± SD: 25.02 ± 4.25; Exp1: n = 20; Exp2: n = 20; Exp3: n = 20; Exp4: n = 24; Exp5: n = 24; Exp6: n = 20). Participants were reimbursed for their time and could earn an additional performance-based bonus (see below). All participants provided written informed consent. The experiments were conducted in accordance with local ethical guidelines.
Experimental paradigm
All six experiments were based on the same psychophysical task. On each trial, participants had to judge whether the average orientation of a circular array of gratings (Gabor patches; see Fig. 1) was tilted clockwise (CW) or counter-clockwise (CCW) relative to horizontal. The average orientation of the gratings in each trial was randomly selected from a mixture of two Gaussian distributions (centred at 3° either side of the horizontal axis, respectively, and with a standard deviation of 8°). We manipulated encoding noise and integration noise by varying two features of the array in a factorial manner: the root mean square contrast (rmc) of the individual gratings, which affects the difficulty of encoding the stimulus array, and the variability of the orientations of the individual gratings (std), which affects the difficulty of integrating orientations across the stimulus array. The number of contrast and variability conditions varied between experiments: in Experiments 1-3, two contrast levels (rmc = {.15, .6}) and three variability levels (std = {0°, 4°, 10°}); in Experiments 4-6, two contrast levels (rmc = {.15, .6}) and two variability levels (std = {0°, 10°}). The stimulus array was presented for 150 ms and was followed by a 3000 ms choice period. Participants indicated their choice by pressing the right (CW) or the left (CCW) arrow-key on a QWERTY keyboard. They received feedback about choice accuracy before continuing to the next trial. If no response was registered within the choice period, the word “LATE” appeared at the centre of the screen, and the next trial was started. Experiments 1, 2 and 3 consisted of 1296 trials, divided into 36 blocks of 36 trials each. Experiments 4, 5 and 6 consisted of 1200 trials, divided into 32 blocks of 40 trials each.
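The sampling scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_trial` is hypothetical, positive angles are taken to mean CW, the default of eight gratings is borrowed from the description of Experiment 6, and the individual orientations are assumed to scatter around the average with Gaussian offsets that are re-centred so that the array mean equals the drawn average. The text does not specify how individual orientations were generated.

```python
import random

def sample_trial(std, n_gratings=8, rng=random):
    """Draw one trial: (category, average orientation, grating orientations), in degrees."""
    # Average orientation: mixture of two Gaussians centred 3 deg either side
    # of horizontal, each with a standard deviation of 8 deg (from the text).
    component_centre = rng.choice([-3.0, 3.0])
    mean_orientation = rng.gauss(component_centre, 8.0)

    # ASSUMPTION: individual gratings scatter around the average with Gaussian
    # offsets of standard deviation `std`, re-centred so the array mean is exact.
    offsets = [rng.gauss(0.0, std) for _ in range(n_gratings)]
    mean_offset = sum(offsets) / n_gratings
    orientations = [mean_orientation + o - mean_offset for o in offsets]

    # ASSUMPTION: positive angles denote clockwise tilt.
    category = "CW" if mean_orientation > 0 else "CCW"
    return category, mean_orientation, orientations
```

Calling `sample_trial(10.0, rng=random.Random(1))` would give a reproducible high-variability trial; passing `std=0` yields an array in which every grating shares the average orientation.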
In Experiments 1 and 2, participants were presented with a cue to the prior probability of each stimulus category. The cue was presented 700 ms before the onset of the stimulus array and remained on the screen until a response was registered. An “N” indicated that the two stimulus categories were equally likely, an “R” indicated a 75% probability of a CW stimulus and an “L” indicated a 75% probability of a CCW stimulus. Half of the blocks contained neutral trials (“N”) and the other half contained biased trials (“R” or “L”). The order of blocks was randomised within each experiment. In Experiment 2, after having made a choice, participants were required to indicate the probability that their choice was correct by moving a sliding marker along a scale (50% to 100% in increments of 1%). In Experiment 3, on half of the blocks, participants could opt out of making a choice and receive the same reward as for a correct choice with a 75% probability. There was no prior cue. In Experiment 4, after having made a choice, participants had to categorise (high vs. low) either the contrast or the variability of the stimulus array. Participants received trial-by-trial feedback about the categorisation judgment. The judgment types were counterbalanced across trials. In Experiment 5, for each block of trials, we fixed the contrast or the variability level while varying the other feature. In Experiment 6, on half of the blocks, the stimulus array consisted of eight gratings and, on the other half of the blocks, the stimulus array consisted of four gratings. Further experimental details are provided in the Supplementary Information.
Statistical analyses
All statistics are reported at the group level. We performed analyses of variance (ANOVAs) with participants as a random variable to test the effects of contrast and variability on choice accuracy, response times, cue usage, confidence (Exp2) and opt-in behaviour (Exp3). We performed most analyses of choice accuracy and confidence using neutral trials; analyses of cue usage were naturally based on biased trials. We used multiple linear regression and multiple logistic regression to isolate the effect of variability on confidence and opt-in responses, respectively. For the analyses in Fig. 5A, seven participants were excluded because of excessive opt-out responses, but results were almost identical when including them. All p-values lower than .001 are reported as “p < .001”, p-values higher than or equal to .001 but lower than .01 are reported as “p < .01”, and p-values higher than or equal to .01 but lower than .05 are reported as “p < .05”. All p-values greater than or equal to .05 are reported as higher than the closest lower decimal (e.g., a p-value of .175 would be reported as “p > .1”), with the exception of p-values between .05 and .1, which are reported as “p > .05”. The degrees of freedom for the ANOVAs are specified using non-integer numbers when a Greenhouse-Geisser correction has been used to correct for violations of the sphericity assumption.
Computational modelling
We first describe the omniscient model, which takes into account encoding and integration noise and can identify which condition a trial is drawn from (i.e. assigns a probability of 1 to the current condition on a given trial). We then describe the variability-mixer model, which takes into account integration noise but cannot distinguish the variability conditions (i.e. assigns equal probability to all variability conditions on a given trial), and the noise-blind model, which entirely neglects integration noise. For completeness, we ran six additional models which varied an agent’s awareness of encoding noise and/or ability to discriminate contrast conditions. We only discuss these models in the Supplementary Information as they had no support in the empirical data.
Regardless of the model, we modelled an agent’s noisy estimate, x, of the true average orientation, μ, as a random sample from a Gaussian distribution with mean μ and variance σ²:
$$x \sim \mathcal{N}(\mu, \sigma^2) \tag{1}$$
where σ is the agent’s total level of noise (encoding plus integration noise) in an experimental condition (see below for noise estimation).
We assumed that an omniscient agent’s internal model has, for each condition, a unique pair of category-conditioned probability density functions (PDFs) over sensory data, which reflect the total level of noise and the true probability distribution over average orientations (see Fig. 4C for an example). As such, an omniscient agent would have six pairs of PDFs in Experiments 1-3 and four pairs of PDFs in Experiments 4-6. An omniscient agent uses the relevant pair of PDFs to compute the probability of the sensory data given a CW and a CCW category:
$$P(x \mid \mathrm{cat}, \mathrm{cond}) = \mathrm{PDF}_{\mathrm{cat},\mathrm{cond}}(x) \tag{2}$$
where cat is the category and cond is the condition. We constructed the PDFs by convolving the true probability distribution over average orientations with a zero-centred Gaussian distribution with variance σ², which depends on a participant’s total noise in a condition. Note that the construction of these PDFs is specific to the model in question (see construction of “non-omniscient” PDFs below) and is the only source of variation in model predictions about choice and confidence.
We assumed that an agent, regardless of the model, would compute the probability of each category using Bayes’ theorem:
$$P(\mathrm{cat} \mid x, \mathrm{cond}) = \frac{P(x \mid \mathrm{cat}, \mathrm{cond})\, p(\mathrm{cat})}{P(x \mid \mathrm{cat}, \mathrm{cond})\, p(\mathrm{cat}) + P(x \mid \mathrm{cat}_{alt}, \mathrm{cond})\, p(\mathrm{cat}_{alt})} \tag{3}$$
where P(x|cat, cond) is computed using the relevant PDFs and p(cat) is the prior probability of the category in question as indicated by the prior cue. If the category in question is CW, then the alternative category, cat_alt, is CCW, and vice versa. On neutral trials, the prior probability of each category is 50%. On biased trials, the prior probability of one category is 75% and the prior probability of the other category is 25%. The computation detailed in eq. 3 can be thought of as scaling the relevant PDFs by the prior probability of the respective category (see Fig. 4D for an example).
Finally, we assumed that an agent, regardless of the model, makes a decision, d, by selecting the category with higher posterior support and computes confidence in this decision as:
$$\mathrm{confidence} = P(\mathrm{cat} = d \mid x, \mathrm{cond}) \tag{4}$$
which in our task is directly given by the posterior probability of the chosen category.
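The chain from sensory sample to decision and confidence can be illustrated with a short sketch. It rests on a simplifying assumption that is not stated in the text: each category is treated as one of the two Gaussian components (centred at ±3° with a standard deviation of 8°), so that convolving it with zero-centred Gaussian noise of standard deviation σ gives another Gaussian with variance 8² + σ². All function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def likelihood(x, category, sigma, centre=3.0, prior_sd=8.0):
    """P(x | category) for an agent whose assumed total noise is `sigma` (deg)."""
    # ASSUMPTION: category = one Gaussian component; Gaussian convolved with
    # Gaussian noise stays Gaussian, with the two variances added.
    mean = centre if category == "CW" else -centre
    return gaussian_pdf(x, mean, math.sqrt(prior_sd ** 2 + sigma ** 2))

def posterior_cw(x, sigma, prior_cw=0.5):
    """Posterior probability of CW via Bayes' theorem, given the cued prior."""
    weighted_cw = likelihood(x, "CW", sigma) * prior_cw
    weighted_ccw = likelihood(x, "CCW", sigma) * (1.0 - prior_cw)
    return weighted_cw / (weighted_cw + weighted_ccw)

def decide(x, sigma, prior_cw=0.5):
    """Choose the category with higher posterior; confidence is its posterior."""
    p_cw = posterior_cw(x, sigma, prior_cw)
    decision = "CW" if p_cw >= 0.5 else "CCW"
    return decision, max(p_cw, 1.0 - p_cw)
```

For example, `decide(x, sigma, prior_cw=0.75)` would correspond to a trial cued with “R”. Because the two likelihoods are equal at x = 0, the posterior there simply equals the cued prior.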
Because an omniscient agent takes into account encoding and integration noise and knows which experimental condition a trial is drawn from, she will (i) be appropriately influenced by the prior cue, (ii) accurately estimate the probability of having made a correct choice, and (iii) opt out of trials when she believes that she is less than 75% likely to be correct. We now describe two models which relax the “omniscient” assumptions.
We first consider a variability-mixer agent who is sensitive to integration noise but cannot distinguish the different variability conditions. Therefore, when estimating the probability of the sensory data given a CW and a CCW category, the variability-mixer marginalizes its estimate over all possible variability conditions (equivalent to an omniscient agent whose PDFs have been mixed across variability conditions). As a result, when orientation variability is low, the PDFs are more overlapping than for the omniscient model. Conversely, when orientation variability is high, the PDFs are less overlapping than for the omniscient model. For these reasons, a variability-mixer model would display a mixture of under- and overconfidence.
Finally, we consider a noise-blind agent who is entirely unaware of integration noise. As in the case of the variability-mixer model, a noise-blind agent only has a pair of PDFs for each contrast level but, unlike in the case of a variability-mixer model, these PDFs only take into account encoding noise. As a result, when orientation variability is non-zero, the PDFs are less overlapping than under either of the two other models (Fig. 4E) and a noise-blind agent would therefore tend to hold stronger posterior beliefs (i.e. steeper curves in Fig. 4F). Such stronger posterior beliefs will lead a noise-blind agent to (i) be less influenced by the prior cue than needed, (ii) overestimate the probability of having made a correct choice, and (iii) not opt out of trials when being less than 75% likely to be correct.
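The difference between the three agents can be made concrete with a small sketch that only changes how the category-conditioned PDFs are built. Several things here are assumptions for illustration rather than the authors’ implementation: each category is treated as a single Gaussian component (±3°, SD 8°), encoding and integration noise are assumed to be independent and to add in quadrature, the noise values are the group-mean estimates reported in the text for high contrast, and all names are hypothetical.

```python
import math

ENCODING_SD = 3.31                 # deg, high contrast (group mean from the text)
INTEGRATION_SD = [0.0, 3.0, 6.8]   # deg, zero / medium / high variability

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def category_pdf(x, sign, sigma):
    # ASSUMPTION: category = Gaussian component N(sign * 3, 8^2), blurred by noise.
    return gaussian_pdf(x, sign * 3.0, math.sqrt(8.0 ** 2 + sigma ** 2))

def assumed_sigmas(model, level):
    """Noise levels the agent believes may apply on a trial of variability `level`."""
    if model == "omniscient":      # knows the condition and both noise sources
        return [math.hypot(ENCODING_SD, INTEGRATION_SD[level])]
    if model == "mixer":           # knows integration noise, not the condition
        return [math.hypot(ENCODING_SD, s) for s in INTEGRATION_SD]
    if model == "blind":           # ignores integration noise altogether
        return [ENCODING_SD]
    raise ValueError(f"unknown model: {model}")

def posterior_cw(x, model, level, prior_cw=0.5):
    sigmas = assumed_sigmas(model, level)
    # Equal-weight mixture over the noise levels the agent considers possible.
    lik_cw = sum(category_pdf(x, +1, s) for s in sigmas) / len(sigmas)
    lik_ccw = sum(category_pdf(x, -1, s) for s in sigmas) / len(sigmas)
    return lik_cw * prior_cw / (lik_cw * prior_cw + lik_ccw * (1.0 - prior_cw))
```

Under these assumptions, for a sample of x = 5° in the high-variability condition (`level=2`), the noise-blind posterior is the most extreme, the omniscient posterior the least, and the variability-mixer lies in between. At zero variability the omniscient and noise-blind agents coincide, while the mixer is less confident than both.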
We note that the models make the same predictions about choice on neutral trials but are distinguishable when focusing on (i) biased trials and (ii) confidence and opt-in behaviour on both neutral and biased trials. Our modelling approach allowed us to calculate a choice probability for each trial under a given model. For model analyses requiring a categorical choice (e.g., logistic regression), we sampled choices according to these choice probabilities.
Noise estimation
We assumed that each experimental condition was affected by Gaussian noise with a specific standard deviation, σcond. We assumed that encoding noise depends upon the contrast of the array and that integration noise is proportional to the variability of orientations in the array. We estimated the total level of noise for each condition using four free parameters (three for Experiments 4-6). Two parameters characterised the level of encoding noise for each contrast level: one for low contrast (nClow) and one for high contrast (nChigh). The other two parameters (one for Experiments 4-6) characterised the level of integration noise for each variability level: one for medium variability (nVmed, only for Experiments 1-3) and one for high variability (nVhigh). For a given condition, the total level of noise (the standard deviation of the Gaussian noise distribution), σcond, is thus given by:
$$\sigma_{cond} = \sqrt{(\varepsilon\sigma_{cond})^2 + (\iota\sigma_{cond})^2} \tag{5}$$
where εσcond and ισcond specify the contribution of encoding noise and integration noise, respectively. For instance, σcond for the low-contrast, high-variability condition would be given by substituting nClow for εσcond and nVhigh for ισcond.
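As a rough illustration of how the parameters map onto conditions, the sketch below looks up the encoding and integration parameters for a condition and combines them. It assumes that the two noise sources are independent and therefore add in quadrature, and that integration noise is zero when orientation variability is zero. The dictionary keys and function names are hypothetical.

```python
import math

def total_noise(encoding_sd, integration_sd):
    # ASSUMPTION: independent Gaussian noise sources, so variances add.
    return math.hypot(encoding_sd, integration_sd)

def condition_sigma(contrast, variability, params):
    """Total noise SD (deg) for a condition such as ("low", "high")."""
    encoding = {"low": params["nClow"], "high": params["nChigh"]}[contrast]
    integration = {
        "zero": 0.0,                       # ASSUMPTION: no integration noise at std = 0
        "medium": params.get("nVmed", 0.0),
        "high": params["nVhigh"],
    }[variability]
    return total_noise(encoding, integration)
```

With a parameter dictionary holding `nClow`, `nChigh`, `nVmed` and `nVhigh`, a call such as `condition_sigma("low", "high", params)` would combine `nClow` with `nVhigh`.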
We fitted the four noise parameters for each participant by maximising the likelihood of the participant’s choices using neutral trials only (we used a genetic algorithm with a population size of 100 individuals and a maximum of 1000 generations). We note that, because of our factorial design, we could separate the two sources of noise. We used the fitted parameters for each participant to construct the model PDFs described above. We stress that the noise estimation uses choices on neutral trials only and that the model predictions pertain to independent features of the data: (i) confidence on neutral trial choices, (ii) choices (and choice probabilities) on biased trials, and (iii) probability of opting out.
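The idea of estimating noise from neutral-trial choices can be sketched as follows. This is a deliberately simplified stand-in, not the procedure used in the study: it fits a single σ for one condition by grid search instead of fitting all parameters jointly with a genetic algorithm. It relies on the observation that, on neutral trials with symmetric PDFs, the agent chooses CW whenever its sample x is positive, so that P(CW) = Φ(μ/σ). Function names and the data layout are hypothetical.

```python
import math
from statistics import NormalDist

def p_choose_cw(mu, sigma):
    """P(x > 0) when x ~ N(mu, sigma^2): probability of a CW choice on a neutral trial."""
    return NormalDist().cdf(mu / sigma)

def neg_log_likelihood(sigma, data):
    """data: iterable of (average orientation, number of CW choices, number of trials)."""
    nll = 0.0
    for mu, n_cw, n_total in data:
        # Clip to avoid log(0) for extreme orientations.
        p = min(max(p_choose_cw(mu, sigma), 1e-9), 1.0 - 1e-9)
        nll -= n_cw * math.log(p) + (n_total - n_cw) * math.log(1.0 - p)
    return nll

def fit_sigma(data, grid):
    """Grid-search maximum-likelihood estimate of sigma (swapped in for the genetic algorithm)."""
    return min(grid, key=lambda s: neg_log_likelihood(s, data))
```

A grid such as `[0.5 * k for k in range(2, 31)]` (1° to 15° in half-degree steps) would be one hedged choice; a finer grid or a continuous optimiser could be substituted without changing the likelihood.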
The mean ± SEM of the best-fitting values for the four noise parameters (nClow, nChigh, nVmed and nVhigh) in units of degrees were: 10.10 ± 1.51, 3.31 ± 0.39, 3.0 ± 0.78 and 6.8 ± 1.0, respectively. Following equation 5, the estimated total amounts of noise fitted for the three key conditions (baseline, low-c and high-v) were therefore: 3.31 ± 0.39, 10.1 ± 1.51 and 8.0 ± 1.0, respectively. There was a significant difference between the values for the baseline condition and those for the other two conditions (both p-values < 0.001), but no significant difference between the low-c and high-v conditions (p-value > 0.16).
Psychometric fits
We fitted psychometric curves to the average proportion of clockwise choices using a four-parameter logistic function:
$$P = A_2 + \frac{A_1 - A_2}{1 + e^{-(x - x_0)/dx}} \tag{6}$$
where P is the proportion of CW choices, A1 is the right asymptote, A2 is the left asymptote, x0 is the inflection point, 1/dx is the steepness, and x is the average stimulus orientation at which the proportion of CW choices is evaluated. We computed the proportion of clockwise choices within average-orientation bins (i.e. six quantiles over the average orientation relative to horizontal). The psychometric curves shown in Fig. 2B are only used for illustration.
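A minimal sketch of the two ingredients described above, the four-parameter logistic and the quantile binning, is given below. The exact functional form is an assumption chosen to match the stated roles of the parameters (A1 as right asymptote, A2 as left asymptote, x0 as inflection point, 1/dx as steepness, with dx > 0). The function names and the equal-count binning rule are hypothetical.

```python
import math

def psychometric(x, a1, a2, x0, dx):
    """Four-parameter logistic: tends to a2 for very negative x and to a1 for very positive x."""
    return a2 + (a1 - a2) / (1.0 + math.exp(-(x - x0) / dx))

def binned_proportions(orientations, chose_cw, n_bins=6):
    """Split trials into equal-count orientation bins; return (mean orientation, P(CW)) per bin."""
    order = sorted(range(len(orientations)), key=lambda i: orientations[i])
    bins = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        mean_x = sum(orientations[i] for i in idx) / len(idx)
        p_cw = sum(chose_cw[i] for i in idx) / len(idx)
        bins.append((mean_x, p_cw))
    return bins
```

The binned points from `binned_proportions` could then be passed to any least-squares routine to estimate `a1`, `a2`, `x0` and `dx`; the fitting method itself is not specified in the text.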
Bias index
We used Signal Detection Theory (Macmillan & Creelman, 2004; Stanislaw & Todorov, 1999) to calculate the decision criterion, c, separately for trials on which the prior cue favoured CW and trials on which the prior cue favoured CCW. The decision criterion provides a signed estimate of the degree to which the prior cue biases a participant’s choices independently of their sensitivity to average orientation. We computed the criterion as c = −0.5[Φ⁻¹(HR) + Φ⁻¹(FAR)], where Φ⁻¹ represents the inverse of the normal cumulative distribution function, and HR and FAR represent the hit rate (i.e. the proportion of CW trials where participants responded CW) and false alarm rate (i.e. the proportion of CCW trials where participants responded CW), respectively. We then used the difference between c when cued CW (cCW) and c when cued CCW (cCCW) as our measure of cue usage: bias index = cCW − cCCW. Higher values indicate greater cue usage. We computed a bias index for each participant and each experimental condition.
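The criterion and bias index defined above translate directly into code. The sketch below uses the standard-library `statistics.NormalDist().inv_cdf` for the inverse normal cumulative distribution function; the function names are hypothetical, and the handling of hit or false-alarm rates of exactly 0 or 1 is left open because the text does not specify a correction.

```python
from statistics import NormalDist

def criterion(hit_rate, false_alarm_rate):
    """Decision criterion c = -0.5 * [z(HR) + z(FAR)], with z the inverse normal CDF."""
    # Note: inv_cdf is undefined for rates of exactly 0 or 1; such rates would
    # need a correction before calling this function (not specified in the text).
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))

def bias_index(hr_cued_cw, far_cued_cw, hr_cued_ccw, far_cued_ccw):
    """Bias index as defined in the text: c when cued CW minus c when cued CCW."""
    return criterion(hr_cued_cw, far_cued_cw) - criterion(hr_cued_ccw, far_cued_ccw)
```

Here HR is the proportion of CW trials answered CW and FAR the proportion of CCW trials answered CW, each computed separately for the two cue types within a participant and condition. An observer whose responses do not change with the cue has identical criteria under both cues and hence a bias index of zero.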
Acknowledgments
This work was supported by a Wellcome 4-year-PhD grant to S.H.C. (0099741/Z/12/Z) and an ERC starter grant to C.S. (281628). The Wellcome Centre for Human Neuroimaging is supported by core funding from Wellcome (203147/Z/16/Z).