Abstract
To make optimal decisions, intelligent agents must learn latent environmental states from discrete observations. Bayesian frameworks argue that integration of evidence over time allows us to refine our state belief by reducing uncertainty about alternate possibilities. How is this increasing belief precision during learning reflected in the brain? We propose that moment-to-moment neural variability provides a signature that scales with the degree of reduction of uncertainty during learning. In a sample of 47 healthy adults, we found that BOLD signal variability (SDBOLD, as measured with functional MRI) indeed compressed with successive exposure to decision-related evidence. Crucially, more accurate participants expressed greater SDBOLD compression primarily in Default Mode Network regions, possibly reflecting the increasing precision of their latent state belief during more efficient learning. Further, computational modeling of behavior suggested that more accurate subjects held a more unbiased (flatter) prior belief over possible states that allowed for larger uncertainty reduction during learning, which was directly reflected in SDBOLD changes. Our results provide the first evidence that moment-to-moment neural variability compresses with increasing belief precision during effective learning, suggesting a flexible mechanism by which we come to learn the probabilistic nature of the world around us.
To make optimal decisions, animals often need to learn about “states” (i.e., decision-relevant properties) of the environment, such as the availability of fish in a nearby pond based on previous catch successes. These states are often not directly observable, so optimal agents need to combine the uncertainty about their beliefs with the evidence available to them. Research suggests that humans and other animals often take uncertainty into account during learning and decision-making 2–4, while suboptimal use of uncertainty has been linked to maladaptive behavior observed in clinical and ageing populations 3. Bayesian decision theory prescribes how agents should combine evidence with their internal beliefs under uncertainty to arrive at optimal estimates of external states and provides a benchmark to assess rational behavior in the context of learning 2,3,5. In this framework, an agent must represent a probability distribution over possible states. The variance of this distribution reflects the uncertainty of the agent’s state belief. As evidence accumulates and the agent refines its representation of the “correct” world state, the variance of the belief distribution is thought to decrease. How this increasing belief precision during learning is reflected in the brain remains unclear.
A potential candidate for tracking the degree of state uncertainty is the moment-to-moment variability of neural responses. At the level of sensory cortex, computational modeling and non-human animal work suggest that neuronal variability signals uncertainty about peripheral inputs, such that neural population activity reflects sampling from a probability distribution over potential stimulus features; the higher the perceptual uncertainty, the higher the neural variability 6–10. This idea is in line with theoretical work suggesting that neural systems should maintain an appropriate degree of instability in the face of uncertainty to permit the exploration of alternative causes of incoming sensory input 11. While plausible, no study to date has investigated whether stimulus-evoked neural variability tracks changes in perceptual uncertainty in a learning task and how it relates to behavior. With regard to human data, a recent study by Kosciessa, et al. 12 showed that increasing uncertainty about the task-relevance of different perceptual stimulus features is accompanied by an increase in cortical “excitability” (heightened desynchronization of alpha rhythms and increased entropy in the EEG signal). However, it is not clear whether these perceptual accounts of uncertainty-based shifts in neural variability also translate to higher-order decision variables that arise while learning about latent environmental states.
In the context of learning, larger temporal variability under high uncertainty may not just represent exploration of potential world states but may allow more flexible updating of one’s belief once new information becomes available. Indeed, previous human work suggests that higher brain signal variability affords greater cognitive flexibility 13,14. For example, several studies have found that brain signal variability increases with increasing task demands (at least until processing limits are reached) and that the ability to upregulate variability predicts task performance 15–22. These studies commonly argue that neural variability supports performance under increasing task demand by allowing the brain to maintain flexible responding to stimulus information. In line with this idea, Armbruster-Genç, et al. 16 observed better performance on a task-switching paradigm under higher BOLD signal variability, whereas cognitive stability during distractor inhibition was related to lower brain signal variability. During learning, task demands are highest early on, when the brain needs to maintain high flexibility to incorporate incoming evidence for belief updating. This flexibility should allow the brain to converge on the “correct” state representation as learning proceeds. We therefore hypothesize that moment-to-moment brain signal variability compresses with increasing belief precision (i.e., decreasing uncertainty) during learning (Figure 1).
To test this hypothesis, we acquired functional MRI (fMRI) while participants performed a “marble task”. In this task, participants had to learn the probability of drawing a blue marble from an unseen jar (i.e., urn) based on five samples (i.e., draws from the urn with replacement). In a Bayesian inference framework, the jar marble ratio can be considered a latent state that participants must infer. We hypothesized that (1) variability in the BOLD response (SDBOLD) would compress over the sampling period, thus mirroring the reduction in state uncertainty, and that (2) subjects with greater SDBOLD compression would show smaller estimation errors of the jars’ marble ratios as an index of more efficient belief updating. A secondary aim of the current study was to directly compare the effect of uncertainty on SDBOLD with a more standard General Linear Modelling (GLM) approach, which looks for correlations between average BOLD activity and uncertainty. This links our findings directly to previous investigations of neural uncertainty correlates, which disregarded the magnitude of BOLD variability 2,23–37. We hypothesized (3) that SDBOLD would uniquely predict inference accuracy compared to these standard neural uncertainty correlates.
Our results showed that SDBOLD closely tracked individual differences in the reduction of state uncertainty during learning, and that this compression related to task accuracy in a unique manner that was not captured by a more standard GLM approach. As such, we identified a novel neural signal reflecting uncertainty during learning.
METHODS
Participants and procedure
51 healthy young adults (18 – 35 years) participated in the study. They were recruited from the participant database of the MPI for Human Development (Berlin, Germany) and gave written informed consent according to the guidelines of the German Psychological Society (DGPS) ethics board, which approved the study. Participants underwent fMRI while performing a perceptual gambling task. They received 10 Euros per hour in addition to a variable performance bonus of up to 2 Euros.
The marble task
The task was divided into four blocks of 18 trials each. Participants performed three blocks during fMRI acquisition, while the final block was performed outside the scanner. We will only consider the three MR blocks for our analyses. Participants received instructions and completed nine practice trials prior to the first scanning block. Each trial of the task consisted of three phases: a sampling, an estimation, and a gambling phase. To investigate brain signal variability changes during learning, we focused on the sampling and estimation phases in this study (referred to as the “marble task”). A description of the gambling phase, which was unrelated to the learning process, is provided in the Supplementary Methods. The task design also included a between-trial reward manipulation. Here, we collapsed trials across the two levels of the reward manipulation, as we assumed reward would affect only the choice and not the learning process. The task was programmed using Presentation software (Version 14.9, Neurobehavioral Systems Inc., Albany, CA, USA).
In the marble task, participants were asked to estimate the proportion of blue marbles in an unseen “jar” (i.e., urn), containing a total of 100 blue and red marbles, based on five samples (i.e., draws from the urn with replacement) presented successively during the sampling phase (Figure 2A). In total, 18 different jars with different proportions of blue marbles (ranging from 0.1 to 0.91) were presented across all trials (Supplementary table S1). Each jar was presented four times across the four experimental blocks in random order. Each sample from a given jar could contain either one, five or nine marbles, which manipulated the informativeness of the samples (i.e., one marble was least informative for inferring the proportion of blue marbles whereas nine marbles were most informative). Samples were presented in 3×3 grids with grey marbles serving as placeholders to ensure similar visual inputs across samples. Additionally, within each sample the order of the marbles (i.e., the location in the 3×3 grid) was permuted. The samples for each jar were randomly drawn from a binomial distribution with the corresponding probability of drawing a blue marble. This draw was performed once, so that each jar was associated with a consistent set of five samples with varying sizes (note that the total number of marbles across samples thus varied between jars). This ensured that all participants received identical sample information. For each trial with the same unseen jar, the order of the associated samples was varied randomly. Each sample was presented for 1s followed by a fixation cross presented for 2 to 6s. Following the sampling phase, participants were asked to indicate their estimate of the proportion of blue marbles in the jar by adjusting the ratio of blue to red marbles in a 100-marble grid using an MR-compatible five-button box. The starting value of the grid was always set to 50 red and 50 blue marbles. With the upper and lower buttons, participants adjusted the grid in steps of five marbles; with the left and right buttons, in steps of one marble. With the middle button, participants confirmed their estimate. The maximum reaction time for the estimation phase was set to 7s. Participants received a bonus payment based on the average accuracy of their marble ratio estimates (performance-dependent bonus).
Behavioral modelling
We modeled participants’ responses using variants of a Bayesian observer model. The Bayesian observer represents the probability of drawing a blue marble from the unseen jar as a beta distribution with parameters α and β, which is updated after each draw according to Bayes’ rule (Figure 2B). Because the beta distribution is a conjugate prior for the binomial distribution (the likelihood function in this task), the posterior belief distribution is also a beta distribution with α_{s+1} = α_s + B_s and β_{s+1} = β_s + R_s, where B_s and R_s are the number of blue and red marbles in sample s, respectively. The prior for the first draw of each trial is given by α = β = 1, representing a flat prior. The best estimate for the probability of drawing a blue marble is given by the expectation of the beta distribution.
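As a concrete illustration, the following sketch (ours, in Python rather than the study’s MATLAB; function and variable names are illustrative, not from the study’s code) implements this conjugate update for a single trial:

```python
# Minimal sketch of the unbiased Bayesian observer for one trial.
def update_belief(alpha, beta, n_blue, n_red):
    """Conjugate update of a Beta(alpha, beta) belief after one sample."""
    return alpha + n_blue, beta + n_red

alpha, beta = 1.0, 1.0  # flat prior (alpha = beta = 1)
for n_blue, n_red in [(1, 0), (3, 2), (6, 3)]:  # hypothetical samples
    alpha, beta = update_belief(alpha, beta, n_blue, n_red)

best_estimate = alpha / (alpha + beta)  # expectation of the beta distribution
# posterior variance = the observer's remaining state uncertainty
uncertainty = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
```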
To model empirical response patterns affecting subjects’ task accuracy, we considered two variants of this Bayesian observer model. In the first variant, we fit a parameter π ≥ 1 for each subject, which described the initial setting of α and β on each trial prior to observing samples from a given jar (where π = α = β; Figure 2B). A higher value of π describes a narrower prior around the default belief of 0.5. In the second variant, we fit a parameter ρ ≥ 0, which exponentially weighs the evidence of each sample: α_{s+1} = α_s + (B_s)^ρ and β_{s+1} = β_s + (R_s)^ρ. If ρ < 1, the evidence of larger draws is underweighted, while for ρ > 1 the evidence of larger draws is overweighted. Comparing these models allowed us to test whether deviations from the unbiased Bayesian observer model were related to prior representation or evidence weighting, respectively.
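Both variants can be summarized in a single hedged sketch; π and ρ are the per-subject parameters described above, and fixing the other parameter at its unbiased value in each variant is our reading of the text, which treats them as separate models:

```python
def observer_estimate(samples, pi=1.0, rho=1.0):
    """Final marble-ratio estimate under the fitted-prior variant (pi >= 1,
    rho fixed at 1) or the evidence-weighting variant (rho >= 0, pi fixed
    at 1); pi = rho = 1 recovers the unbiased observer."""
    alpha = beta = pi  # larger pi = narrower prior around 0.5
    for n_blue, n_red in samples:
        alpha += n_blue ** rho  # rho < 1 underweights larger draws
        beta += n_red ** rho    # rho > 1 overweights them
    return alpha / (alpha + beta)
```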
To accommodate probabilistic responding of subjects, we paired these models with two alternative choice rules. One model family assumes that subjects’ reported estimates represent draws from the final beta distribution on each trial. Another model family assumes that subjects’ choices are draws from a truncated normal distribution (0 ≤ x ≤ 1) that is centered on the expectation of the final beta distribution: N(E[beta(α, β)], σ), where σ is a free model parameter that captures response noise around the model prediction. This parameter may reflect imprecision in entering one’s marble ratio estimate into the response grid, but it can also capture any unmodeled sources of (biased and random) errors in jar estimates beyond those explained by the model parameters of interest. This parameter may thus also capture model misfit.
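A minimal sketch of the two choice rules written as response likelihoods (our illustration; it assumes scipy’s standardized parameterization of the truncated normal):

```python
from scipy.stats import beta as beta_dist, truncnorm

def response_loglik(report, alpha_final, beta_final, sigma=None):
    """Log-likelihood of one reported estimate under either choice rule:
    sigma=None -> report is a draw from the final beta distribution;
    otherwise  -> truncated normal (0 <= x <= 1) around the posterior mean."""
    if sigma is None:
        return beta_dist.logpdf(report, alpha_final, beta_final)
    mu = alpha_final / (alpha_final + beta_final)
    # scipy expects the truncation bounds in standardized (z-score) units
    a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma
    return truncnorm.logpdf(report, a, b, loc=mu, scale=sigma)
```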
Bayesian observer models assume that participants track the full belief distribution over potential marble ratios and consider uncertainty when updating their estimate. To test whether subjects indeed behave in a Bayesian manner, we also fitted a simpler Rescorla-Wagner model that only updates a point estimate of the marble ratio belief: R̂_{s+1} = R̂_s + α (B_s / (B_s + R_s) − R̂_s), where R̂_s is the marble ratio point estimate after observing sample s, B_s and R_s are the number of blue and red marbles respectively, and 0 ≤ α ≤ 1 is the free learning rate parameter. To model subjects’ responses, we included a free noise parameter σ modelling Gaussian response noise around the final point estimate, similar to the noisy choice rule described previously for the Bayesian observer models.
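For comparison, a sketch of this delta-rule update; the 0.5 starting value (matching the response grid’s default) is our assumption, not specified in the text:

```python
def rescorla_wagner(samples, learning_rate, start=0.5):
    """Updates a point estimate only; no belief distribution (and hence no
    uncertainty) is tracked."""
    r_hat = start
    for n_blue, n_red in samples:
        observed = n_blue / (n_blue + n_red)          # proportion blue in sample
        r_hat += learning_rate * (observed - r_hat)   # prediction-error update
    return r_hat
```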
Model fitting and comparison
We fitted behavioral models to participants’ responses on all trials of the three MR blocks by minimizing the joint negative log likelihoods under each model using the fmincon routine implemented in MATLAB (version 2017b). We ensured model convergence by fitting each model 10 times and used the fitting iteration with the minimal negative log likelihood across subjects. We compared model fits by computing the Bayesian Information Criterion (BIC): BIC = k ln(n) − 2 ln (L), where k is the number of parameters, n is the number of datapoints, and L is the maximized model likelihood. We report BICs for each subject and across all subjects.
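In outline, the fitting loop amounts to the following (a scipy-based sketch standing in for MATLAB’s fmincon; function names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def fit_subject(neg_loglik, bounds, n_restarts=10, seed=0):
    """Minimize a model's negative log-likelihood from several random
    starting points and keep the best solution."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        res = minimize(neg_loglik, x0, bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

def bic(min_neg_loglik, k, n):
    """BIC = k*ln(n) - 2*ln(L); min_neg_loglik is the minimized -ln(L)."""
    return k * np.log(n) + 2 * min_neg_loglik
```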
Model simulations
We simulated behavior using the empirical parameter estimates and the trial sequences of each subject. This resulted in one synthetic dataset for each model with 51 simulated subjects each. We used these simulated datasets to perform model and parameter recovery checks. See the Supplementary Information for the results of these checks.
Image acquisition and pre-processing
Participants underwent functional MRI scanning at the Max Planck Institute for Human Development (Berlin, Germany) in a 3 Tesla Siemens TrioTim MRI system (Erlangen, Germany) using a multi-band EPI sequence (factor 4; TR = 645 ms; TE = 30 ms; flip angle 60°; FoV = 222 mm; voxel size 3 × 3 × 3 mm; 40 transverse slices). This amounted to 1010 T2*-weighted functional images per scanner run per subject (except for one subject, who had 1020 acquired images per run). A T1-weighted structural scan was also acquired (MPRAGE: TR = 2500 ms; TE = 4.77 ms; flip angle 7°; FoV = 256 mm; voxel size 1 × 1 × 1 mm; 192 sagittal slices).
T1-weighted images were brain extracted with ANTs software (version 2.3.5, http://stnava.github.io/ANTs/) using population-level templates from the OASIS dataset (https://figshare.com/articles/dataset/ANTs_ANTsR_Brain_Templates/915436). The functional T2*-weighted scans were pre-processed in FSL (version 5.0.11) FEAT separately for each run. The pipeline included motion correction, brain extraction of the functional images, and spatial smoothing using a Gaussian 5mm kernel. Following FEAT, the functional images were first detrended using SPM’s detrend function (with a 3rd order polynomial) and then high-pass filtered using a standard 8th order Butterworth filter implemented in MATLAB (version 2017b) with a cut-off of 0.01 Hz. We then performed ICA on the resulting data using FSL MELODIC to identify residual artifacts. We manually labeled artifactual components for 19 subjects (∼37% of the total data; for details on the component rejection criteria see 38) and then used these labels to train FSL’s ICA classifier FIX to automatically label artifactual components for the remaining subjects. We tested different classification thresholds and validated classification accuracy for several randomly selected subjects from the test set. We accepted a threshold of 20 after this manual inspection. Rejected components were then regressed out of the data using FSL’s fsl_regfilt function. Lastly, functional images were first registered to the brain-masked T1-weighted images and then to MNI space (using the MNI152 template provided in FSL) via linear rigid-body transformation as implemented in FSL’s flirt function.
Statistical analysis
Behavioral analysis
All reported behavioral analyses were run in SPSS (Version 25). To limit the effect of univariate outliers on the reported results, we winsorized the estimation error, extreme jar bias, and behavioral model parameter variables. Outliers were defined as datapoints lying more than 1.5 times the interquartile range below the 25th or above the 75th percentile. We used the highest/lowest score of the non-outlying datapoints to impute outlying values. Between zero and three values were imputed for each variable.
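The winsorization rule can be written compactly (a sketch of the rule as described in the text; names are ours):

```python
import numpy as np

def winsorize_iqr(x, factor=1.5):
    """Impute values lying more than factor*IQR beyond the 25th/75th
    percentile with the most extreme non-outlying value."""
    x = np.asarray(x, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    iqr = q75 - q25
    lo, hi = q25 - factor * iqr, q75 + factor * iqr
    inliers = x[(x >= lo) & (x <= hi)]
    return np.clip(x, inliers.min(), inliers.max())
```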
Functional MRI analysis
Subject-level fMRI data were modelled using a mass-univariate GLM approach as implemented in SPM12 (Wellcome Department of Imaging Neuroscience, London, UK). To quantify BOLD response variability, we adapted a “least squares – single” (LS-S) approach as described by Mumford, et al. 39. This approach is implemented in an in-house version of the variability toolbox for SPM (http://www.douglasdgarrett.com/#software) developed by our research group. This toolbox takes as input a standard GLM design matrix. We included the following regressors in each subject’s design matrix: five regressors for the onsets of each successive sample presentation (with a duration of 1s), one regressor for the estimation phase onsets (with a duration of subjects’ RTs), and one regressor for the gambling phase onsets (with a duration of subjects’ RTs). All regressors were modelled as stick functions that were convolved with the canonical hemodynamic response function (HRF) and its first and second derivatives, resulting in a total of 21 regressors per scanner run plus one constant regressor for each run. The toolbox then proceeds to iteratively fit GLMs which include one regressor modelling a single event and a second regressor modelling all other events of the same and all other conditions (this is done separately for each task run). To avoid issues with beta estimation close to the end of a timeseries (caused by truncation of the HRF), the toolbox discards any onsets that occur within the final 20s of the timeseries. Finally, the toolbox computes the standard deviation over the resulting (across-trial) beta estimates for each condition to yield the measure of SDBOLD used in this study. Compared to other approaches to SDBOLD estimation, this SDBOLD quantification allows one to parse dynamics amongst neighboring events/time points, while accounting for hemodynamic delays.
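To make the LS-S logic concrete, here is a deliberately simplified sketch (our illustration, not the toolbox code; HRF convolution, derivative regressors, run separation, and end-of-run trimming are omitted):

```python
import numpy as np

def lss_sd_bold(Y, event_regressors, condition_labels):
    """Toy LS-S scheme: for each event, fit a GLM with one regressor for that
    event and one pooling all other events, then take the SD of the
    single-event betas per condition. Y: time x voxels."""
    betas = []
    for i, target in enumerate(event_regressors):
        others = np.sum([r for j, r in enumerate(event_regressors) if j != i],
                        axis=0)
        X = np.column_stack([target, others, np.ones_like(target)])
        betas.append(np.linalg.lstsq(X, Y, rcond=None)[0][0])  # single-event beta
    betas = np.array(betas)  # events x voxels
    labels = np.asarray(condition_labels)
    return {c: betas[labels == c].std(axis=0, ddof=1) for c in np.unique(labels)}
```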
To compare our SDBOLD results to standard analysis approaches, we also obtained restricted maximum likelihood (ReML) GLM beta estimates for each task condition. The design matrix of this GLM included the same regressors as the one passed to the variability toolbox for SDBOLD estimation. We defined contrasts to summarize condition effects across task runs. However, this approach ignores the variance in uncertainty trajectories across trials that results from our sample size manipulation (see Figure S1B). To account for this variance, we ran another GLM modelling the modulation of the BOLD signal by the posterior variance (i.e., uncertainty) of the unbiased Bayesian observer model during the sampling process. The design matrix of this GLM included the following regressors for each scanner run: one regressor for the sample onsets (with a duration of 1s), one regressor for the parametric modulation of the sample onsets regressor by model-derived posterior variances, one regressor for the estimation phase onsets (with a duration of subjects’ RTs), and one regressor for the gambling phase onsets (with a duration of subjects’ RTs). We mean-centered the parametric modulation regressor prior to model estimation. All regressors (except for the parametric modulation regressor) were modelled as stick functions that were convolved with the canonical hemodynamic response function (HRF) and its first and second derivatives, resulting in a total of 12 regressors per scanner run. We also included three additional constant regressors modelling the mean of each scanner run. To assess the effect of the parametric modulation regressor, we defined a contrast that picked out the beta estimates of this regressor for each scanner run (more precisely, the parametric modulation regressors convolved with the canonical HRF).
To relate within-person SDBOLD or standard GLM beta estimates to the task design or behavior, we used a partial least-squares (PLS) approach 40. PLS finds latent factors that express maximal covariance between brain and behavior/design matrices using singular value decomposition. Brain scores are defined as the product of the brain data matrix with their respective latent factor weights (saliences). We will refer to brain scores in our SDBOLD analyses as “latent SDBOLD” and to the brain scores of our parametric modulation analyses as “latent uncertainty modulation”. To obtain a summary measure of the spatial expression of a latent variable, we can sum brain scores over all voxels. We performed 1000 permutation tests to assess the significance of the brain-design/behavior relationship against a null model. We then divided each voxel’s salience by its bootstrapped standard error to obtain a pseudo-normalized measure of voxel robustness; a bootstrap ratio of +/-3 approximates a 99.9% confidence interval. We determined clusters of voxels with robust saliences by applying a cluster mask with a minimum threshold of 25 voxels. Prior to all PLS analyses, brain measure images were grey-matter masked using the tissue prior provided in FSL at a probability threshold of > 0.37, and restricted to voxels with non-zero values for the respective brain measure across subjects.
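The core of the PLS computation reduces to a singular value decomposition of the brain-behavior cross-product matrix; a bare-bones sketch (ours, omitting the permutation and bootstrap steps):

```python
import numpy as np

def behavioral_pls(brain, behavior):
    """Bare-bones behavioral PLS. brain: subjects x voxels,
    behavior: subjects x measures (both assumed z-scored)."""
    R = behavior.T @ brain                      # measures x voxels cross-products
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    brain_saliences = Vt                        # one voxel pattern per latent factor
    brain_scores = brain @ Vt.T                 # per-subject scores, e.g. "latent SDBOLD"
    return brain_saliences, s, brain_scores
```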
To account for univariate outliers and non-linear relationships, we ran all reported brain-behavior analyses on ranked brain and behavioral variables, in line with previous research 41. The results should thus be interpreted as monotonic rather than strictly linear relationships. Furthermore, we used a criterion of Cook’s distance > 4/N to identify multivariate outliers in all PLS models, removing them where present. We nevertheless report the results of all analyses for the full dataset (N = 51), without outliers removed, in the Supplementary Results.
Data availability
The code to perform all behavioral and neuroimaging analyses is available at: https://github.com/LNDG/Skowron_etal_2023.
RESULTS
Subjects’ estimation errors are explained by individual differences in prior belief width
Participants performed the “marble task” during functional MRI scanning (Figure 2A). On each trial, they observed five sequential sample draws from a hypothetical “jar” containing a certain proportion of red and blue marbles (sampling phase). The marble ratio of the unseen jar varied from trial to trial and sample draws could contain one, five or nine marbles. Afterwards, participants indicated their estimate of the proportion of blue marbles in the jar by adjusting the number of blue marbles in a 10 by 10 response grid representing the unseen jar (estimation phase; see Methods for further details). Our primary measure of interest was estimation error, defined as ℰ_t = |θ_t − θ̂_t|, where θ_t is the experienced proportion of blue marbles on trial t (i.e., the mean across samples; note that this quantity differs from the true proportion of the unseen jar but constitutes the best (unbiased) estimate given the available samples) and θ̂_t is the subject’s reported estimate of the blue marble proportion of the jar on trial t (Figure 3A). We computed the median estimation error across trials as an individual difference measure of estimation accuracy. Subjects’ median estimation error ranged from 0.01 to 0.20 (Median = 0.07, SD = 0.035). A Wilcoxon signed-ranks test revealed that estimation errors significantly differed from zero (Z = 6.215, one-sided p = 2.57 · 10^−10). This indicates differences between experienced and estimated marble ratios on the group level. Previous research suggests that people often underestimate probabilities of frequent events and overestimate probabilities of rare events in similar decision-from-experience tasks 42,43. To see whether this effect contributed to estimation accuracy in our task, we examined whether subjects made more errors for jars with more extreme marble ratios. We ran a linear mixed model predicting trial-wise estimation error from the marble ratio distance from 50:50 for a given jar. To account for individual differences in intercepts, we entered subject ID as a covariate in the model. There was a significant positive relationship between trial-wise estimation error and jar marble ratio distance from 50:50 (F(1,2702) = 154.608, p = 1.48 · 10^−34, semi-partial ρ² = 0.048). The more extreme the jar marble ratio, the larger subjects’ estimation error on a given trial. In the following sections, we will refer to this effect as the “extreme jar bias”. Next, we examined whether individual differences in this extreme jar bias explained individual differences in median estimation error. We computed individuals’ extreme jar bias by running regression models predicting trial-wise estimation error from marble ratio distance from 50:50 for each subject separately. Individuals’ extreme jar bias was defined as the fitted (unstandardized) slopes of these regression models (Mean = 0.198, SD = 0.230). We found that individual differences in extreme jar bias correlated strongly and positively with subjects’ median estimation error (Pearson’s r(49) = 0.63, p = 8.28 · 10^−7; Figure 3B). Biased estimation of extreme jar proportions thus contributed to individual differences in task accuracy.
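For clarity, the two per-subject behavioral measures can be computed as follows (a sketch under our naming; the mixed model described above additionally includes subject ID, which is omitted here):

```python
import numpy as np

def behavioral_measures(experienced, reported, jar_ratios):
    """Median trial-wise estimation error |theta_t - theta_hat_t| and the
    'extreme jar bias' slope (error regressed on the jar marble ratio's
    distance from 50:50). Inputs are per-trial arrays for one subject."""
    error = np.abs(np.asarray(experienced) - np.asarray(reported))
    distance = np.abs(np.asarray(jar_ratios) - 0.5)
    bias_slope = np.polyfit(distance, error, 1)[0]  # unstandardized slope
    return np.median(error), bias_slope
```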
We next fitted a set of Bayesian observer models that could account for this estimation bias in terms of inadequate uncertainty representations. The model with the overall best Bayesian Information Criterion (BIC) value was a Bayesian observer model which included a fitted prior parameter π, describing individual differences in the width of the belief distribution over marble ratios before observing any samples, and a response noise parameter σ, which captures variability of subjects’ reported marble ratio estimates around model predictions (Table 1). The second-best overall model included an exponential evidence weight ρ and a noisy choice rule parameter σ, followed by the exponential evidence weight and fitted prior models that assume responses are sampled from the final trial posterior of the belief distribution. A Rescorla-Wagner model had the worst fit of all models, which suggests that people indeed keep track of and leverage uncertainty estimates to update their marble ratio belief in a Bayesian manner. Parameters of the winning model were recoverable from simulations and model recovery was acceptable at the group level (see Supplementary Methods for details).
We next tested whether the prior width parameter π of the winning Bayesian observer model captured individual differences in extreme jar bias that contributed to subjects’ estimation error, while controlling for other unmodelled sources of error captured by the response noise parameter σ. The fitted prior parameter π of the winning Bayesian observer model had a median of 2.45 (SD = 4.23), indicating that subjects on average had a narrower prior than the unbiased Bayesian observer (π = 1). Moreover, the fitted noise parameter σ had a median of 0.10 (SD = 0.05), highlighting that subjects’ reported marble ratios deviated from model predictions by 10 marbles on average. We ran multiple regression analyses predicting median estimation error and extreme jar bias from the fitted prior parameter π and noise parameter σ respectively. In the first regression model, a narrower prior (larger π; t(48) = 13.152, p = 1.57 · 10^−17, semi-partial ρ² = 0.771; Figure 3C) and a higher noise parameter σ (t(48) = 4.501, p = 4.30 · 10^−5, semi-partial ρ² = 0.090; Figure 3D) uniquely predicted larger estimation errors (full model: F(2,48) = 88.135, p = 8.54 · 10^−17, R² = 0.786). Importantly, the prior width parameter π accounted for most of the explained variance in estimation errors, suggesting that our model captured individual differences in subjects’ response behavior well. A larger π reflects a stronger prior belief in a 50:50 marble ratio, which should bias estimates for more extreme jars towards this (incorrect) default belief. In line with this expectation, our second regression model revealed that subjects’ extreme jar bias was positively related to prior width π (t(48) = 11.780, p = 9.11 · 10^−16, semi-partial ρ² = 0.499) but also negatively to response noise σ (t(48) = −2.983, p = 4.48 · 10^−3, semi-partial ρ² = −0.032; full model: F(2,48) = 114.968, p = 4.956 · 10^−19, R² = 0.827). The significant (although smaller) effect of the noise parameter σ on extreme jar bias was driven by a small number of subjects with negative extreme jar bias slopes (i.e., subjects who made larger errors for jars with even rather than extreme marble ratios), which the Bayesian observer model could not account for (see Supplementary Results for further details). Taken together, these analyses indicate that the width of the prior belief substantially impacts individual task performance by capturing subjects’ tendency to overestimate jars with blue marble proportions close to 0 and underestimate jars with blue marble proportions close to 1 (Figure 3A).
BOLD signal variability tracks individual differences in prior uncertainty and compresses during learning
To investigate our hypothesis that BOLD signal variability (SDBOLD) collapses with successive sample presentations, we ran a task PLS analysis 40 relating SDBOLD to the five sample periods. This analysis returned a significant latent effect (permuted p = 0) showing that SDBOLD reduced with successive exposure to marble samples across a distributed set of cortical brain regions spanning the parietal, prefrontal and temporal lobes (Figure 4A; see Figure S2A for full axial brain plots and Table S3 for peak voxel coordinates in robust clusters). We further investigated this trend by fitting orthogonal polynomial contrasts in a linear mixed model entering subject ID as a covariate. The pattern of latent SDBOLD change was best described by a linear contrast (F(1,201) = 40.444, p = 1.34 · 10^−9, semi-partial ρ² = 1.12 · 10^−2) over a quadratic (F(1,201) = 3.066, p = 8.15 · 10^−2, semi-partial ρ² = 8.41 · 10^−4) or cubic (F(1,201) = 3.416, p = 6.61 · 10^−2, semi-partial ρ² = 9.61 · 10^−4) one.
Next, we asked whether the ability to collapse SDBOLD over the sampling period related to individual differences in task accuracy. To this end, we quantified SDBOLD change (ΔSDBOLD) by fitting linear regression slopes to the SDBOLD estimates for the five sample presentations in each voxel. We then ran a behavioral PLS analysis relating these SDBOLD slopes to subjects’ median estimation error. Four Cook’s d (i.e., multivariate) outliers were removed from this and all subsequent reported analyses, resulting in N = 47 (we refer to the Supplementary Information for PLS results run on the full dataset). We found a significant latent relationship (permuted p = 3.00 · 10^−3; Figure 4B, cf. Figure S3A for N = 51) revealing that subjects who decreased SDBOLD more, especially in parietal, prefrontal (PFC) and temporal cortex, also produced smaller estimation errors (Spearman’s r = 0.612). Notably, this set of brain regions largely overlapped with the canonical Default Mode Network (DMN) that is typically observed in resting-state fMRI (BSR > 3; Figure 4C, see Figure S2B for full axial brain plots and Table S4 for peak voxel coordinates in robust clusters) 1.
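The slope quantification described above amounts to a per-voxel linear fit (a sketch; the array layout is our assumption):

```python
import numpy as np

def delta_sd_bold(sd_bold):
    """Linear slope of SD_BOLD across the five sample periods in every voxel.
    sd_bold: 5 x voxels array; negative slopes indicate compression."""
    periods = np.arange(sd_bold.shape[0])
    return np.polyfit(periods, sd_bold, 1)[0]  # first row = linear coefficient
```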
We further investigated our hypothesis that individual differences in state uncertainty representations could explain the relationship between ΔSDBOLD and estimation error. Our winning behavioral model suggested that subjects with a narrower prior belief distribution over marble ratios (i.e., large π parameter) showed little decrease in state uncertainty during sampling, because they started out with less uncertainty to begin with, and made more errors because they remained inflexible about their (incorrect) prior belief of a 50:50 marble ratio. In contrast, more accurate subjects (i.e., π close to 1) represented maximal state uncertainty at the start of a trial and reduced uncertainty more with each presented sample (Figure 2B). We expected ΔSDBOLD to mirror these individual differences in state uncertainty change during the sampling phase. To test this hypothesis, we ran a multiple regression analysis predicting latent ΔSDBOLD (i.e., the whole-brain pattern of ΔSDBOLD that relates to median estimation error; see Figure 4B) from the prior width parameter π of the winning behavioral model. We also entered the response noise parameter σ as a control variable, which captures all unmodeled sources of estimation error and thus serves as a measure of model misfit (see Methods). This model explained a significant amount of variance (F(2,44) = 4.489, p = 1.68 · 10^−2, R² = 0.169). There was a significant main effect only for prior width π (t(44) = 2.703, p = 9.72 · 10^−3, semi-partial ρ² = 0.138) but not for the noise parameter σ (t(44) = 0.425, p = 0.673). Therefore, the wider a participant’s prior belief (i.e., smaller π), the more they collapsed SDBOLD during the sampling phase (Figure 4D). To ensure that these effects were not due to the specific parameterization of the winning behavioral model, we also replicated this analysis for the parameters of the (second-best) evidence weight model (see Supplementary Results).
It is possible that our within-person effects of ΔSDBOLD are not independent of between-subject differences in SDBOLD levels, which have also been linked to task performance in previous studies 13. To account for this, we computed a latent SDBOLD control variable by matrix-multiplying brain saliences from the behavioral PLS analysis relating ΔSDBOLD to estimation error (which reflect the spatial expression of the observed relationship) with subjects’ SDBOLD level at the first sample presentation period. We found no significant correlation between latent ΔSDBOLD and latent SDBOLD at first sample exposure (Spearman’s r(45) = −0.073, p = 6.28 · 10^−1). Next, estimation error was significantly related to ΔSDBOLD (t(44) = 5.430, p = 2 · 10^−6, semi-partial ρ² = 0.625) when controlling for the effect of latent SDBOLD at first sample exposure (t(44) = 1.806, p = 7.78 · 10^−2, semi-partial ρ² = 0.208; full model: F(2,44) = 15.747, p = 7 · 10^−6, R² = 0.417). Likewise, prior width π (residualized by noise parameter σ) of the winning Bayesian observer model was significantly related to ΔSDBOLD (t(44) = 3.151, p = 2.92 · 10^−3, semi-partial ρ² = 0.409) when controlling for the effect of latent SDBOLD at first sample exposure (t(44) = 2.526, p = 1.52 · 10^−2, semi-partial ρ² = 0.328; full model: F(2,44) = 7.618, p = 1.44 · 10^−3, R² = 0.257). In both models, the inclusion of the control variable also did not diminish the effect size of ΔSDBOLD on these behavioral measures. Thus, the effect of within-person SDBOLD compression on task performance was independent of between-subject differences in initial SDBOLD levels.
Overall, these results indicate that the degree to which subjects compress SDBOLD with learning predicts individual differences in inference accuracy, which can be explained in terms of idiosyncratic uncertainty representations.
Standard GLM neural uncertainty correlates are distinct from BOLD variability effects
Finally, we compared our SDBOLD findings to a more standard GLM analysis approach, looking for brain areas whose (average) BOLD response magnitude is correlated with uncertainty. To this end, we obtained voxel-wise GLM beta estimates for each of the sample periods and entered them into a behavioral PLS analysis relating them to estimation error. This multivariate approach makes the results directly comparable to our main finding that SDBOLD compression during learning relates to individual differences in task performance. This analysis did not yield a significant latent relationship (permuted p = 0.161 for N = 47 and permuted p = 0.268 for N = 51), suggesting that SDBOLD but not mean BOLD change predicted individual differences in inference accuracy. However, this change in average BOLD activity over the sampling period does not account for more nuanced modulation of the BOLD signal by uncertainty resulting from the varying sample sizes in our task design (see Figure S1B). In other words, an unbiased Bayesian observer model would predict more uncertainty reduction after observing a larger compared to a smaller sample, in addition to the overall reduction in uncertainty with the total accumulated evidence (i.e., trial time). Our SDBOLD metric required discrete uncertainty conditions (with sufficient trial counts), and we thus only considered the main effect of trial time on uncertainty. However, in a standard GLM approach we could model the parametric modulation of the BOLD signal by the posterior variance of the belief distribution (i.e., uncertainty) for the unbiased Bayesian observer model, which accounts for both trial time and sample size effects on uncertainty. Again, we used a multivariate PLS approach to relate individuals’ voxel-wise GLM beta estimates (for a parametric uncertainty regressor) to median estimation error. This revealed a significant latent association (permuted p = 3.40 · 10^−2, Spearman’s r = 0.675, Figure 5A, cf. Figure S3B for N = 51). Robust positive clusters were mainly located in the lateral parietal cortex, PFC (dorsomedial, ventrolateral, and rostrolateral parts), and in the insula. Robust negative clusters were present in ventromedial PFC (vmPFC) and in the left hippocampus (see Figure S2C for full axial brain plots and Table S5 for peak voxel coordinates in robust clusters). This latent relationship was related to the prior width parameter π of the winning Bayesian observer model, similar to the effect of ΔSDBOLD (Figure 5B). For further details on these results, please refer to the Supplementary Information.
Qualitatively, these effect regions show little overlap with those showing an effect of ΔSDBOLD. To test the spatial specificity of ΔSDBOLD effects, we computed a new latent variable by extracting the brain saliences from the behavioral PLS analysis relating latent ΔSDBOLD to estimation error and matrix-multiplying them with the uncertainty beta estimates in each voxel from our standard parametric GLM analysis. We then ran a multiple regression analysis predicting median estimation error from latent ΔSDBOLD and this spatially-matched latent parametric uncertainty modulation variable. The regression model was significant (F(2,44) = 13.901, p = 2.10 · 10^−5) with an R² of 0.387. The main effect of latent ΔSDBOLD was significant (t(44) = 5.265, p = 4 · 10^−6, semi-partial ρ² = 0.386), but the effect of the new latent control variable was not (t(44) = 0.975, p = 0.335). This indicates that the effects of ΔSDBOLD and of the parametric uncertainty modulation of the BOLD signal on performance are spatially distinct in the brain.
Furthermore, we found a significant positive correlation between the latent ΔSDBOLD (Spearman’s r(45) = 0.407, p = 4.55 · 10^−3) and latent uncertainty modulation effects on estimation accuracy, indicating that more accurate performers show both types of uncertainty representation in the brain. We ran a multiple regression analysis predicting estimation errors from both latent ΔSDBOLD and latent uncertainty modulation to investigate whether they explained unique variance in estimation errors. The full model was significant (F(2,44) = 31.883, p = 2.76 · 10^−9, R² = 0.592; Figure 5C). We found a significant main effect of both latent ΔSDBOLD (t(44) = 4.844, p = 1.60 · 10^−5, semi-partial ρ² = 0.218) and latent uncertainty modulation (t(44) = 3.829, p = 4.04 · 10^−4, semi-partial ρ² = 0.136). This suggests that both types of neural uncertainty correlates uniquely relate to task accuracy. Overall, these analyses reveal that SDBOLD effects are spatially distinct from, and uniquely relate to task accuracy compared with, the neural correlates revealed by a more standard GLM approach.
DISCUSSION
In this study we show that reductions in state uncertainty over the course of learning co-occur with a compression in BOLD signal variability. Moreover, individuals who decreased state uncertainty to a greater extent showed more pronounced SDBOLD compression and made smaller estimation errors on average. Behavioral modeling suggested that better performers reduced state uncertainty to a greater extent because they began with a wider (more uncertain and flexible) prior belief distribution before observing any samples, resulting in more unbiased marble ratio estimates.
Our findings establish SDBOLD as a novel within-person neural correlate of uncertainty, which reduces over the course of Bayesian inference/learning, and relates to inference accuracy. The current findings thus add to a growing literature demonstrating that within-person SDBOLD modulation in the face of varying task demands facilitates adaptive behavior across different cognitive domains (see 13). Our study is the first to directly show that SDBOLD tracks reductions in uncertainty. The bulk of past empirical work arguing for such a link showed modulation of brain signal variability across disparate task conditions that varied along several dimensions beyond uncertainty (e.g., cognitive demand, bottom-up sensory input or processing requirements 17,18,44,45). By employing a learning paradigm, we were able to operationalize uncertainty in a highly precise manner; specifically, one’s uncertainty regarding the latent state of an environmental variable, a property that can be reduced through learning from observations. To our knowledge, the only other study to date that has directly investigated brain signal variability changes with uncertainty is a recent study by Kosciessa, et al. 12, who systematically manipulated uncertainty in a perceptual decision-making task by varying the number of visual features that could be probed in a subsequent decision phase. This paradigm allowed the researchers to equate bottom-up visual input across uncertainty conditions, similar to our own study. However, their study focused on irreducible uncertainty (sometimes referred to as risk or outcome uncertainty in the literature 2,3) and only considered temporal variability in the EEG signal. In contrast, we show that reductions in state uncertainty (i.e., uncertainty about the probability of an outcome) due to learning are mirrored in the variability of the fMRI BOLD signal. This supports the longstanding idea that brain signal variability enables cognitive flexibility 13,14, which is required early on in the learning process to update one’s internal state belief.
Another innovation of our study over prior BOLD variability work more generally 13,14 is the use of an event-related method to quantify trial-to-trial SDBOLD. This allowed us to track the evolution of SDBOLD with changes in uncertainty on a short time scale (several seconds) 39. The method is inspired by the machine-learning fMRI literature and allows the estimation of GLM beta parameters for single events, over which we can compute the variance for each uncertainty condition. These features of our study resolve various shortcomings of previous work and provide strong support for a link between brain signal variability and uncertainty.
Individual differences in uncertainty representations track inference accuracy
We found that low-performing individuals made the largest estimation errors for extreme marble ratios, such that jars with blue marble proportions close to one were underestimated and those with proportions close to zero were overestimated. This effect has been reported previously for tasks where participants are asked to judge outcome probabilities from experience, sometimes referred to as under-extremity 42,43,46. Notably, some studies found that this effect is abolished when participants are asked to give verbal judgements instead 47. This may suggest that individual differences in extreme jar bias result from people’s (in)ability to accurately enter their marble ratio estimates into the grid in the estimation phase of our task. However, we found that brain activity during sampling (prior to the estimation phase) predicted individual differences in people’s (average) estimation error. It is thus likely that the observed bias reflects people’s internal representations rather than being a mere consequence of our chosen response modality.
Our winning Bayesian observer model suggested that such estimation errors could best be explained by individual differences in the representation of the initial prior belief over blue marble proportions rather than in belief updating; low performers held a narrower prior belief distribution over the default belief of a 50:50 marble ratio. It is possible that this prior was induced by our task design given the default position of the response grid. The fitted prior could thus reflect individual differences in anchoring to this task feature 48. Alternatively, this bias could reflect bona fide trait-like individual differences in expectations about extreme outcomes. In line with this explanation, Glaze, et al. 49 found that performance differences on a change-point inference task were best explained by participants’ prior width over potential change rates, which cannot simply be explained by anchoring. Note that the distinction between prior beliefs and belief updating is also pertinent to other approximately-Bayesian and non-Bayesian cognitive models that have been used to model the inference process in similar tasks, which we did not consider here 43,50. These models account for cognitive limitations in human information processing and may provide an even better fit to behavioral data. While not the main aim of the current study, future modelling and experimental work is required to pin down the source of individual differences in under-extremity observed in the current study.
We showed that less SDBOLD compression (independent of initial SDBOLD levels) in poorer performers could best be explained by more limited uncertainty reduction, based on parameter estimates for the winning behavioral model. Still, our results rely on the assumption that participants represent and utilize uncertainty akin to an (approximate) Bayesian observer. Future work should reduce reliance on model assumptions and obtain subjective uncertainty estimates. For example, one could regularly ask people to report their state belief together with their belief confidence throughout the learning process to directly track updates in their subjective belief distribution.
Performance-related SDBOLD collapse in the Default Mode Network
We observed performance-relevant collapse in SDBOLD in a network of brain regions that largely overlapped with regions commonly ascribed to the default mode network (DMN) in resting-state fMRI connectivity analyses 1,51–54, including parietal (precuneus, inferior parietal lobe including the angular gyrus), prefrontal (superior frontal gyrus, frontopolar cortex, orbitofrontal cortex, paracingulate gyrus) and temporal (middle temporal gyrus) cortices. Previous work has consistently reported deactivation of this brain network during externally-cued, demanding cognitive tasks compared to rest 51,52,54–57. In contrast, internally-oriented tasks that call for self-referential processing and memory recollection demonstrate increased activity in key regions of the DMN 54,58–60. With respect to brain signal variability, one study by Grady and Garrett 44 found increased SDBOLD in DMN regions in externally- as compared to internally-guided tasks. Given previous reports that higher local SDBOLD is associated with lower whole-brain functional network dimensionality 61, this finding may reflect more crosstalk between the DMN and other brain networks during externally-guided tasks. Therefore, SDBOLD compression during learning may reflect the gradual build-up of an internal world model; larger SDBOLD under high state uncertainty early on may afford a more flexible incorporation of incoming information to update one’s internal state belief. As state uncertainty reduces and SDBOLD compresses, the brain arrives at a stable internal belief representation, which informs one’s jar marble ratio estimate in the subsequent task phase.
Previous research has shown that internally-guided tasks evoke a coupling between the DMN and the frontoparietal-control network to support task performance 62. In our task, we would expect such a coupling to emerge towards the end of the sampling period reflecting the utilization of one’s internal state belief to prepare the upcoming response. Indeed, we observe a concomitant increase in BOLD activity in the fronto-parietal network and compression of SDBOLD in the DMN in high performing subjects. How functional connectivity between different brain networks changes over the course of learning goes beyond the scope of the current study, but constitutes an interesting target for future research.
Standard analytic approaches reveal different neural uncertainty correlates compared to SDBOLD
Our standard GLM analysis revealed that BOLD modulation by uncertainty also related to estimation accuracy in our task. On one hand, higher BOLD (in lateral parietal cortex, dorsomedial PFC, ventro- and rostrolateral PFC, and anterior insula) with lower uncertainty predicted higher estimation accuracy. In other words, high performers showed an increase in BOLD activity in these regions as they observed more samples and became more certain about the jar marble ratio on a given trial. These areas correspond to fronto-parietal control and dorsal attention brain networks, which support goal-directed behavior in externally-driven tasks 1,63–66. Conversely, positive coupling between the BOLD signal and state uncertainty, mainly in the vmPFC (a region previously linked to confidence representations 67) and in a cluster in the left hippocampus, also predicted better performance in our task (see Supplementary Information for a detailed discussion of these findings).
Interestingly, we found that uncertainty representations in SDBOLD (in DMN regions) and parametric BOLD signal modulation (in fronto-parietal control/attention regions) were spatially different and uniquely predicted estimation accuracy. Why would the brain track uncertainty in these different neural signals and across different networks? One possibility could be that SDBOLD tracks uncertainty about the latent environmental state with or without immediate relevance for decision-making, while parametric BOLD modulation by uncertainty may be more relevant in the context of goal-directed decision formation and action selection. In support of this view, previous research has shown modulation of brain signal variability by stimulus features even when no action was required. For example, in the human neuroimaging literature, changes in neural variability have been observed in response to varying complexity of visual stimuli during passive viewing, which were linked to offline cognitive performance 41,45. In a similar vein, it has recently been argued that early perceptual uncertainty, which has been the main focus of sampling accounts of neural variability, is tracked irrespective of task demands but can be flexibly utilized in higher-order decision-making 6–8,68. Thus, changes in brain signal variability may track environmental variables without the need for immediate action. In contrast, parametric BOLD modulation by uncertainty has been reported for learning and decision-making tasks that require participants to make choices on every trial 2,23–37. As such, these neural uncertainty correlates may be more tightly linked to decision formation and may in some cases even relate to metacognitive awareness of this decision-relevant variable 67,69,70. Although speculative, this framework makes testable predictions for situations in which we would expect to see a dissociation of these two neural signals, which should be addressed in future work.
Limitations and next steps
One potential limitation of our task design was that our stimulus sample size manipulation introduced trial-to-trial variance in how much uncertainty could be reduced with each presented sample (Figure S1B). Because our SDBOLD measure was computed for each sample period across trials, uncertainty levels were by definition not equivalent within each condition bin (although uncertainty always decreased across sample periods). However, our results showed that SDBOLD effects could not be explained by parametric BOLD signal modulation by trial-to-trial differences in uncertainty trajectories in the same brain regions, which could have accounted for this variance. Ideally, future studies should keep sample size consistent between trials so that SDBOLD can be computed over comparable uncertainty levels.
Another potential limitation is that uncertainty was collinear with within-trial time in our task; the more samples were presented, the more uncertainty could be reduced in general. However, this is an inevitable consequence of decision-making and learning in stable environments, in which one can simply accumulate evidence over time. Although within-trial time could have introduced arousal- or attention-related effects (which have previously been linked to human brain signal variability 13), we observed performance-related interindividual variability in SDBOLD compression over the sampling period. Such an effect cannot, by definition, be accounted for by the fixed factor of trial time. Nevertheless, future work could investigate the coupling between brain signal variability and uncertainty in non-stationary learning environments (see e.g. 3), within which uncertainty can increase or decrease over time.
Conclusion
We provide the first evidence that moment-to-moment brain signal variability compresses with increasing belief precision during learning. Whether the compression of brain signal variability is directly proportional to the informativeness of the available evidence is an important prediction that should be investigated in future work.
SUPPLEMENTARY INFORMATION
SUPPLEMENTARY METHODS
Task design
During the final experimental block, performed outside the MR scanner, confidence ratings of the estimation of the blue-red marble ratio were collected. For this block of trials, a rating scale was included directly after completion of the estimation and prior to the gambling phase. Participants were instructed to indicate their confidence, ranging from “very unsure” to “very sure”, in their estimate of the blue-red marble ratio on a continuous confidence judgment scale 1. Additionally, the amount of obtainable reward on each trial was manipulated. Each marble was marked with one ring (low reward) or two rings (high reward). In the high reward condition, the magnitude of the reward in the gamble was doubled compared to the low reward condition. Each participant encountered each jar an equal number of times in the low and high reward conditions.
The gambling phase
At the end of each trial, participants made a risky decision between two options. One option represented a draw from the current urn: if the outcome was a blue marble, the participant received a payoff of a specific magnitude (which differed across trials; see details below). The other option was a reference option with a certain payoff (i.e., a reward probability of 100 percent). After the experiment, one of these gambles was selected and participants received a bonus payment based on the result of a draw from their chosen option. The options were presented in the left or right hemifield of the screen and their order was pseudo-randomized across the experiment. Participants chose one option by pressing the left or right button of the button box. Each trial ended after the decision, and the next trial started with a different urn and reward probability.
The 18 different marble jars in the experiment differed in expected value in the following way: in six jars, the expected value (the probability of drawing a blue marble times the varying payoff) was larger than that of the reference option (differences ranging from 10 to 30 percent); in another six jars, the expected values of both options were approximately equal; and in the final six jars, the expected value was lower than that of the reference option (differences ranging from −10 to −30 percent; Supplementary table S1). The probability of drawing a blue marble varied between 0.1 and 0.91, whereas the payoff varied between 8 and 130 points. Theoretically, a decision-maker who always chooses the gambling option could expect the same summed expected value as a decision-maker who always chooses the certain reference option.
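To make the payoff arithmetic concrete, consider a minimal numerical sketch in Python; the specific probability, payoff, and reference values below are hypothetical and not taken from Supplementary table S1:

# Illustrative expected-value comparison for a single trial (hypothetical values).
p_blue = 0.7      # probability of drawing a blue marble from the current jar
payoff = 40       # points awarded if the drawn marble is blue
reference = 25    # certain payoff of the reference option

ev_gamble = p_blue * payoff          # expected value of the risky draw: 28 points
advantage = ev_gamble - reference    # +3 points: gambling is advantageous here
print(ev_gamble, advantage)

In this hypothetical trial the gamble would be worth choosing; with the sign of the difference flipped, the certain reference option would dominate, mirroring the three jar categories described above.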
SUPPLEMENTARY RESULTS
Model and parameter recovery
The ground truth model was recovered well from each simulated dataset: it consistently had the lowest BIC overall (Supplementary table S2). Across the simulated subjects, the ground truth model was always recovered most frequently (Supplementary Figure S1E). However, there was some model confusion, particularly for our winning model (fitted prior with noisy choice rule), which was recovered in only about 49% of simulated subjects. This may also explain the variance in empirical model fit on the subject level, although the largest share of subjects (35%) was still best fit by the overall winning Bayesian observer model (Supplementary Figure S1A). Some degree of model confusion is to be expected, since all variants of our Bayesian observer model aim to account for the same behavioral pattern (biased estimation of extreme marble proportions). Certain parameterizations of the prior model family may thus yield simulated behavior that is also well accounted for by models from the exponential family. Furthermore, some confusion between nested models is not necessarily surprising. For example, certain ranges of the response noise parameter σ may result in behavioral predictions similar to those of the same model when it samples responses from the final beta distribution. It is noteworthy, however, that models including a noisy response rule were generally well differentiated from models with a sampling response rule. This supports the inclusion of the noise parameter σ in our winning model. There was also good differentiation between the Rescorla-Wagner model and all Bayesian observer models on the subject level, supporting our inference that subjects use uncertainty to inform their marble ratio estimates.
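The recovery logic can be illustrated with a deliberately simplified sketch in Python: two toy models (a parameter-free null and a one-parameter alternative, standing in for our actual observer models, which are not shown here) are simulated and refit, and the per-subject BIC winner is tabulated into a confusion matrix:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

def bic(log_lik, n_params, n_obs):
    # BIC = -2 log L + k log n; the model with the lowest BIC wins.
    return -2 * log_lik + n_params * np.log(n_obs)

n_trials, n_sims = 100, 200
confusion = np.zeros((2, 2))  # rows: ground-truth model, columns: BIC winner

for true_idx, true_p in enumerate([0.5, 0.7]):  # M0: fixed rate, M1: free rate
    for _ in range(n_sims):
        k = rng.binomial(n_trials, true_p)            # one simulated "subject"
        ll0 = binom.logpmf(k, n_trials, 0.5)          # M0: no free parameters
        ll1 = binom.logpmf(k, n_trials, k / n_trials) # M1: MLE of the rate
        bics = [bic(ll0, 0, n_trials), bic(ll1, 1, n_trials)]
        confusion[true_idx, int(np.argmin(bics))] += 1

print(confusion / n_sims)  # off-diagonal mass quantifies model confusion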
We also checked the recoverability of model parameters. For our winning model, both prior (Pearson’s r = 0.98) and noise (Pearson’s r = 0.97) parameters estimated for each simulated subject were highly correlated with their ground truth values (Supplementary Figure S1C). Parameters of the second-best model were also recoverable (Supplementary Figure S1D). This reinforces the inferences made from individuals’ parameter estimates in our results.
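Parameter recovery follows the same simulate-and-refit logic; a minimal sketch in Python, with a toy one-parameter model standing in for our fitted prior and noise parameters:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_subjects, n_trials = 50, 100

# Each simulated "subject" has a true generative rate; we refit it by maximum
# likelihood and correlate fitted with true values across subjects.
true_p = rng.uniform(0.2, 0.8, n_subjects)
fitted_p = rng.binomial(n_trials, true_p) / n_trials  # binomial MLE per subject

r, _ = pearsonr(true_p, fitted_p)
print(f"recovery correlation: r = {r:.2f}")  # high r: parameter identifiable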
Explaining the effect of response noise parameter σ on extreme jar bias
The significant main effect of response noise σ in predicting subjects’ extreme jar bias suggests that the behavior of people with an opposite extreme jar bias (i.e., more misestimation for jars with marble ratios close to 50:50) is not captured by the winning behavioral model. Indeed, when we removed subjects with an opposite extreme jar bias (i.e., negative effect slopes; N = 9) from this regression analysis, the main effect of noise parameter σ on extreme jar bias was no longer significant (t(39) = −0.113, p = 0.922). Thus, our behavioral model accounts well for the empirical response pattern we sought to capture (i.e., positive effect slopes).
Analysis of BOLD signal modulation by uncertainty
Our PLS model relating BOLD signal modulation by uncertainty (i.e., standard GLM beta estimates for the parametric uncertainty regressor) to subjects’ median estimation error revealed a significant latent association (permuted p = 3.40 · 10⁻³, Spearman’s r = 0.675, Figure 5B; cf. Figure S3B for N = 51). Because brain saliences were both positive and negative, individual differences in parametric modulation effects related differently to estimation errors between regions (see Figure S2C for full axial brain plots and Table S5 for peak voxel coordinates in robust clusters). In robust clusters with positive saliences, subjects who down-modulated BOLD activity more with increasing state uncertainty made fewer estimation errors (lower latent uncertainty modulation rank scores reflecting more negative modulation effects) than subjects who did not track state uncertainty in these regions (higher latent uncertainty modulation rank scores reflecting modulation effects close to zero). Robust clusters were mainly located in the lateral parietal cortex, PFC (dorsomedial, ventrolateral, and rostrolateral parts), and in the insula. In contrast, in robust clusters with negative brain saliences, subjects who showed more positive coupling between BOLD signal and state uncertainty made fewer estimation errors than subjects with a negative coupling. This relationship was mainly expressed in a cluster located in the ventromedial PFC (vmPFC) and a cluster in the left hippocampus.
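The permutation logic behind the reported latent p-value can be sketched as follows in Python; this is a generic subject-level permutation test of a rank correlation, not the full PLS pipeline (which additionally involves bootstrapped saliences):

import numpy as np
from scipy.stats import spearmanr

def permutation_p(brain_scores, behavior, n_perm=10000, seed=0):
    # Shuffle the behavioral vector across subjects to build a null
    # distribution for the brain-behavior rank correlation.
    rng = np.random.default_rng(seed)
    observed = spearmanr(brain_scores, behavior)[0]
    null = np.array([
        spearmanr(brain_scores, rng.permutation(behavior))[0]
        for _ in range(n_perm)
    ])
    # Two-sided p: fraction of null correlations at least as extreme.
    return observed, np.mean(np.abs(null) >= np.abs(observed))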
We also investigated whether these performance-related individual differences in BOLD signal modulation by uncertainty could be explained by idiosyncratic uncertainty representations, as captured by our winning behavioral model. We ran a multiple regression predicting the latent uncertainty modulation of the BOLD response, which relates to task accuracy, from the prior width parameter π, controlling for response noise σ. This model explained a significant amount of variance (F(2,44) = 6.244, p = 4.10 · 10⁻³) with an R² of 0.221 (Figure 5C). There was a significant main effect for prior width π (t(44) = 3.508, p = 1.05 · 10⁻³, semi-partial ρ² = 0.218) but not for parameter σ (t(44) = 1.590, p = 0.119). This finding suggests that people with suboptimal uncertainty representations, due to a narrow prior belief distribution (i.e., higher π), show less BOLD signal modulation by state uncertainty trajectories derived from the unbiased Bayesian observer model. We again ran this analysis for the parameters of the (second-best) evidence weight model, which afforded similar inferences (see the next section).
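A schematic of this regression in Python/statsmodels, run on synthetic data; the variable names and generative values are stand-ins, and in the actual analysis the variables were rank-transformed and the latent scores came from the PLS model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 47  # number of subjects

# Synthetic stand-ins for the fitted prior width (pi), response noise (sigma),
# and the latent uncertainty-modulation score; only the model structure is real.
pi = rng.uniform(0.5, 5.0, n)
sigma = rng.uniform(0.01, 0.2, n)
latent = 0.5 * pi + rng.normal(0.0, 1.0, n)

X = sm.add_constant(np.column_stack([pi, sigma]))
fit = sm.OLS(latent, X).fit()
print(fit.summary())  # overall F-test and R^2, plus per-predictor t-tests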
Brain-behavior results for the evidence weight model
Our alternative model, which explains suboptimal responding by an exponential weighting of the incoming evidence, would similarly predict individual differences in state uncertainty reduction during the sampling phase: people who reduce uncertainty less for large samples (i.e., ρ < 1) make more estimation errors than people who weight the information provided by large samples more optimally (i.e., ρ = 1). Subjects’ fitted prior parameter π of the winning model and fitted evidence weight ρ of the second-best model (both models including a response noise parameter σ) were highly correlated (Pearson’s r(49) = −0.863, p = 3.73 · 10⁻¹⁶), supporting the notion that they capture similar behavioral response patterns. For this model, the fitted evidence weight parameter ρ had a median of 0.85 (SD = 0.84) and the fitted noise parameter σ had a median of 0.10 (SD = 0.05).
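One common formalization of such exponential evidence weighting raises the likelihood to the power ρ, which for a beta-binomial observer amounts to scaling the observed counts; a sketch in Python, where the function name and the uniform Beta(1,1) prior are illustrative assumptions rather than our exact implementation:

from scipy.stats import beta

def weighted_posterior(n_blue, n_red, rho=1.0, a0=1.0, b0=1.0):
    # Raising a binomial likelihood to the power rho is equivalent to scaling
    # the observed counts, so rho < 1 underweights large samples.
    return beta(a0 + rho * n_blue, b0 + rho * n_red)

# With rho < 1 the posterior over the blue-marble proportion stays wider,
# i.e., the same sample reduces less uncertainty:
print(weighted_posterior(8, 2, rho=1.0).std())  # optimal weighting
print(weighted_posterior(8, 2, rho=0.5).std())  # underweighted evidence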
Predicting ranked latent ΔSDBOLD from the ranked evidence weight ρ and response noise σ parameters in a multiple regression model explained a significant amount of variance (F(2,44) = 6.938, p = 2.41 · 10⁻³) with an R² of 0.240. There was a significant main effect only for evidence weight ρ (t(44) = −3.725, p = 5.53 · 10⁻⁴, semi-partial ρ² = −0.240): people who underweight the evidence of large samples more (i.e., ρ < 1) show less SDBOLD collapse during the sampling phase. Furthermore, the regression model predicting latent uncertainty modulation of the BOLD response from evidence weight parameter ρ and response noise σ explained a significant amount of variance (F(2,44) = 5.544, p = 7.12 · 10⁻³) with an R² of 0.201. There was a significant main effect for evidence weight ρ (t(44) = −3.257, p = 2.17 · 10⁻³, semi-partial ρ² = −0.193) and also for parameter σ (t(44) = 2.147, p = 3.74 · 10⁻²). This suggests that people who underweight the evidence of large samples (ρ < 1) show less BOLD signal modulation by uncertainty trajectories of the unbiased Bayesian observer.
Overall, our interpretation that individual differences in uncertainty representations explain the observed brain-behavior relationships would not have changed had we chosen this alternative model. We frame our conclusions in terms of the fitted prior model because it provided a better account of behavior.
SUPPLEMENTARY DISCUSSION
Our standard analysis approach revealed that BOLD signal modulation by state uncertainty predicted task accuracy. On the one hand, higher BOLD signal (in lateral parietal cortex, dorsomedial PFC, ventro- and rostrolateral PFC, and anterior insula) with lower uncertainty predicted higher estimation accuracy. In other words, higher performers showed an increase in BOLD activity in these regions as they observed more samples and became more certain about the jar’s marble ratio on a given trial. These areas correspond to frontoparietal control and dorsal attention brain networks, which support goal-directed behavior in externally-driven tasks 2–6. These networks of brain areas have also been found to correlate with state uncertainty in other inference tasks 7,8. However, these prior studies commonly report an opposite effect direction, in which high state uncertainty was reflected in more activity in these brain regions. For example, an fMRI study by McGuire, et al. 7 investigated how uncertainty drives learning in a task that required participants to infer the position of an unseen helicopter based on bag drops that followed a normal distribution around the helicopter’s true position. In their task, the helicopter location could change unannounced, leading to learning rate adjustments due to change-point uncertainty and uncertainty about the helicopter location in a given environment (which the authors term relative uncertainty). The authors found BOLD signal modulation by relative uncertainty in lateral parietal cortex, dorsomedial PFC, ventrolateral PFC, and anterior insula, matching the regions we found in the current study. However, the direction of the effect is reversed, showing a positive coupling between relative uncertainty and the BOLD signal. Two key differences of our study are that we investigate state inference in a stable rather than a dynamic environment and that decisions are only required after observing several evidence samples. In these respects, our study shares similarities with perceptual decision-making studies, which assume that noisy perceptual evidence is integrated over time to arrive at a decision 9,10. Previous fMRI studies of perceptual decision-making also report the involvement of a similar set of higher-order brain regions, which have been linked to evidence accumulation, decision formation and response preparation 11–18. Notably, some studies have reported an increase in average BOLD activity in these brain areas over the sampling period (i.e., with decreasing state uncertainty), which aligns with our findings 11,19.
Conversely, positive coupling between the BOLD signal and state uncertainty in vmPFC and left hippocampus also predicted better performance in our task. The previously mentioned study by McGuire, et al. 7 also reported an effect in the vmPFC, but again with an effect direction opposite to the one we see in high performers. At first glance, our findings appear at odds with previous work reporting vmPFC tracking of subjective confidence. However, recent work by Trudel, et al. 20 suggests that representations of uncertainty in this brain region depend on the behavioral goal. In their task, participants had to learn how predictive each of two choice options was of a target location that was revealed later in the trial. Early during learning, the vmPFC BOLD signal positively tracked the uncertainty difference between the two choice options and correlated with uncertainty-guided exploration. Later during learning, the vmPFC BOLD signal negatively tracked the uncertainty difference and correlated with uncertainty-avoidant exploitation. The positive effect we observe in our study is thus in line with the general idea of an “exploratory brain mode” that supports uncertainty-guided learning. Overall, the GLM results directly connect to previous studies across various domains of decision-making and reveal surprising discrepancies that deserve more attention in future work.
Acknowledgements and Funding
During the work on his dissertation, A.S. was a pre-doctoral fellow of the International Max Planck Research School on the Life Course (LIFE, www.imprs-life.mpg.de; participating institutions: Max Planck Institute for Human Development, Freie Universität Berlin, Humboldt-Universität zu Berlin, University of Michigan, University of Virginia, University of Zurich) and the International Max Planck Research School on Computational Methods in Psychiatry and Ageing Research (COMP2PSYCH, https://www.mps-ucl-centre.mpg.de/comp2psych). D.D.G. received an Emmy Noether Programme grant from the German Research Foundation. D.D.G. is affiliated with the Max Planck UCL Centre for Computational Psychiatry and Aging Research.