Abstract
When judging the average value of sample stimuli (e.g., numbers), people tend to either over- or underweight extreme sample values, depending on task context. In a context of overweighting, recent work has shown that extreme sample values were also overrepresented in neural signals, in terms of an anti-compressed geometry of number samples in multivariate electroencephalography (EEG) patterns. Here, we asked whether neural representational geometries may also reflect underweighting of extreme values (i.e., compression), which has been observed behaviorally in a great variety of tasks. We used a simple experimental manipulation (instructions to average a single stream or to compare dual streams of samples) to induce compression or anti-compression in behavior when participants judged rapid number sequences. Model-based representational similarity analysis (RSA) replicated the previous finding of neural anti-compression in the dual-stream task, but failed to provide evidence for neural compression in the single-stream task, despite the evidence for compression in behavior. Instead, the results suggested enhanced neural processing of extreme values in either task, regardless of whether extremes were over- or underweighted in subsequent behavioral choice. We further observed more general differences in the neural representation of the sample information between the two tasks. The results suggest enhanced processing of extreme values as the brain’s default. Such a default raises new questions about the origin of common psychometric distortions, such as diminishing sensitivity for larger values.
Introduction
When making decisions about magnitudes such as numbers, people tend to distort sample information away from its true value. A commonly observed distortion is a compression of magnitude, where extreme (or outlying) samples receive relatively less weight than prescribed by a linear and, according to common interpretation, normative transformation of objective into psychological or subjective values (Bernoulli, 1954; Fechner, 1860; Juechems et al., 2021; Li et al., 2017; Tversky & Kahneman, 1992; Vandormael et al., 2017). However, in some task contexts, the opposite type of distortion has been observed, that is, an anti-compression, where extreme or outlying samples are overweighted (Clarmann von Clarenau et al., 2022; Kunar et al., 2017; Luyckx et al., 2019; Spitzer et al., 2017; Tsetsos et al., 2012; Vanunu et al., 2020).
Using electroencephalographic (EEG) recordings and multivariate representational similarity analysis (RSA), recent work has identified a potential neural signature of such psychometric distortions. During processing of symbolic number samples, multivariate EEG patterns have been characterized by a “numerical distance effect”, where the representational similarity of, for instance, “4” and “5” is larger than that between “4” and “6”, which in turn is larger than that between “3” and “7”, and so forth (Appelhoff et al., 2022; Luyckx et al., 2019; Sheahan et al., 2021; Spitzer et al., 2017; Teichmann et al., 2018). Intriguingly, in a multi-sample decision task that promoted anti-compression of number samples in behavior, the “neural numberline” underlying the numerical distance effect was found to be anti-compressed as well (Appelhoff et al., 2022; Luyckx et al., 2019; Spitzer et al., 2017). The findings suggested that behaviorally relevant distortions in multi-sample decisions may occur already when the individual samples are being processed.
However, such a “neurometric” signature of psychometric distortion has thus far only been reported in tasks that promoted anti-compression, that is, a selective overweighting of extreme (outlying) sample values in behavior. A much more common observation in other task contexts is a compression of magnitude, where extreme values are underweighted, for instance, in psychophysical tasks (Fechner, 1860; Stevens, 1957; Wyart et al., 2012), in studies of numerical cognition (Dehaene, 2003; Longo & Lourenco, 2007; Nieder & Dehaene, 2009; Nieder & Miller, 2003), and in behavioral economics experiments (Kellen et al., 2016; McAllister & Tarbert, 1999; Tversky & Kahneman, 1992). To what extent large-scale neural patterns, as recorded with human EEG, may also reflect psychometric compression is still unknown.
In the present study, we capitalized on recent progress in understanding the experimental factors that may mediate whether people compress or anti-compress magnitudes in decision making (see also Summerfield & Parpart, 2022). Specifically, in a recent behavioral study, we found compression when judging the average of a single stream, but anti-compression when comparing dual streams of number samples (Clarmann von Clarenau et al., 2022). Here, we adopted this experimental manipulation to examine the neural signatures of compressive (as compared to anti-compressive) number processing in multivariate EEG patterns.
Behaviorally, the results confirmed a compression of numerical values in the single-stream task, and an anti-compression of the same values in the dual-stream task. In neural signals, we replicated the finding of an anti-compressed number representation in the dual-stream task. Surprisingly, however, we found no evidence for neural compression in the single-stream task. Instead, we observed more general differences in the neural representation of the sample information. Whereas in the dual-stream task, the samples’ neural geometry predominantly reflected their abstract magnitude, the single-stream task was associated with a more direct, non-quantitative representation of the concrete sample stimuli. The results relativize the diagnosticity of sample-level EEG-metrics for psychometric distortions. They also suggest a default mode of processing, namely, enhanced neural processing of extreme sample values, regardless of whether they are over- or underweighted in subsequent behavior.
Results
Participants (n=30) performed two different variants of a sequential integration task where they observed sequences of ten digits (ranging between 1 and 9, colored in red or blue; Fig. 1a). In the single-stream variant (“averaging” task), participants were asked to report whether the average of all ten number samples (regardless of color) was larger or smaller than 5. In the dual-stream variant (“comparison” task), they were asked to indicate whether the red or the blue samples had the higher average value.
Behavioral results
As expected, mean choice accuracy (Fig. 1b) was higher in the single-stream task (80.5% ± 0.8% SE) than in the dual-stream task [76.3% ± 0.7% SE; t(29)=5.51, p<0.001, d=1.02; paired t-test]. This suggests that comparing two streams was more difficult than averaging a single stream (of otherwise physically identical inputs; Fig. 1a).
To characterize how the numerical value (1-9) of a sample influenced subsequent choice, we calculated model-free decision weights (see Materials and Methods). Descriptively, the weighting curve showed a concave shape (indicating compression) in the single-stream task (Fig. 1c, left), whereas a convex shape (indicating anti-compression) was evident in the dual-stream task (Fig. 1c, right).
For quantitative analysis, we fitted a psychometric model (see Materials and Methods), which characterizes the transformation of sample values (1-9) as a sign-preserving power function with exponent kappa (k; where k<1 indicates compression and k>1 anti-compression). The model further includes parameters for overall bias (b) towards larger or smaller numbers (b > or < 0) and decision noise (s, see Materials and Methods).
The best-fitting parameter estimates are shown in Fig. 1d. Of main interest was parameter kappa (k), which indicates the extent to which a weighting policy is compressed or anti-compressed relative to a linear weighting (k = 1). Indeed, k was significantly smaller than 1 in the single-stream task [M=0.82, SE=0.07, t(29)=2.43, p=0.02, d=0.44; one-sample t-test against 1] and significantly larger than 1 in the dual-stream task [M=1.82, SE=0.16, t(29)=5.21, p<0.001, d=0.95], which confirms robust compression in single-stream averaging, and robust anti-compression in dual-stream comparison. The essential difference in transformation (Eq. 1) implied by these k-values is illustrated in Fig. 1c (insets). While the transformation is concave (i.e., shallower towards the extremes) in the single-stream task, it is convex (i.e., steeper towards the extremes) in the dual-stream task.
Examining bias (b), we found significantly positive values both in the single-stream task [M=0.02, SE=0.01, t(29)=3.52, p=0.001, d=0.64] and in the dual-stream task [M=0.21, SE=0.06, t(29)=3.34, p=0.002, d=0.61]. Thus, judgments in both tasks were overall biased towards larger numbers, which is consistent with previous work (Clarmann von Clarenau et al., 2022; Luyckx et al., 2019; Spitzer et al., 2017). Finally, noise (s) was significantly higher in the dual-stream (M=1.23, SE=0.07) than in the single-stream task [M=0.98, SE=0.06; t(29)=4.40, p<0.001, d=0.70; paired t-test]. This is consistent with the lower level of accuracy in the dual-stream task (see Fig. 1b).
Together, our experimental manipulation was successful in inducing opposite types of psychometric distortions, with identical stimulus inputs in the two task conditions. Whereas decision weighting was compressed (suggesting relative underweighting of extreme values) in the single-stream task, it was anti-compressed (suggesting relative overweighting of extreme values) in the dual-stream task.
EEG results
Multivariate (RSA) results
Turning to the EEG data, we first examined the encoding of sample information in multivariate ERP patterns using RSA (see Materials and Methods). Specifically, we examined in each of the two tasks (single-stream and dual-stream) the extent to which RSA patterns encoded (i) the concrete digit that was shown as sample stimulus (e.g., “4” or “8”), (ii) its color (i.e., red or blue), and (iii) the numerical magnitude information in a sample (i.e., 1-9, in terms of a numerical distance effect; see Materials and Methods and Fig. 2a).
The visual attributes of a sample (i.e., its color and digit shape) were encoded early on, from approximately 100 to 700 ms after sample onset, in both tasks (Fig. 2b; all pcluster<0.001). From approximately 200 ms on, the RSA patterns also encoded the samples’ numerical magnitude, in terms of a significant numerical distance effect (single-stream: pcluster<0.001; dual-stream: pcluster<0.001), thus replicating and extending previous work (Appelhoff et al., 2022; Luyckx et al., 2019; Sheahan et al., 2021; Spitzer et al., 2017; Teichmann et al., 2018). Descriptively, the numerical distance effect observed in the single-stream task, while robustly significant, appeared weaker than that in the dual-stream task.
Comparing the RSA time courses between the two task conditions (Fig. 2c) confirmed that the numerical distance effect was significantly stronger in the dual-stream task (pcluster<0.002). Surprisingly, we found no difference in color encoding between the two tasks, even though color was task-relevant only in the dual-stream task, but not in the single-stream task. Instead, in the single-stream task, we observed a relatively stronger representation of the concrete digit (i.e., the unique number symbol) that had been displayed (pcluster<0.005). This effect was evident in a relatively late time window (approximately 400-600 ms, that is, only after the early visual encoding of digits and color).
Together, multivariate ERP patterns in both task conditions robustly encoded information about the sample’s color, the number symbol it showed, and its numerical magnitude. The representation of samples in the single-stream task, however, showed qualitative differences, in terms of a relatively weaker encoding of numerical magnitude, and a relatively stronger encoding of the concrete sample stimuli.
Neurometric RSA results
Next, we examined potential distortions of the “neural numberline” underlying the neural magnitude representation disclosed in the above RSA results. To this end, we parameterized the numerical distance model (Fig. 2d, right) to reflect distortions by k (compression/anti-compression) and b (bias towards/against larger numbers), analogous to our psychometric model (see Materials and Methods, Eq. 1). We then used an exhaustive gridsearch to determine, for each participant, the parameter combination with which the model fitted the data best. Fig. 3b illustrates the improvement in fit (in terms of Δ r relative to the standard model with k = 1 and b = 0; cf. Fig. 2d, right) in a representative time window (0.2-0.6 s; cf. Fig. 2c and 3a). Note that neurometric mapping was performed using a log scale of k (where log(k) = 0 corresponds to k = 1, see Fig. 3b) to avoid fitting bias (see Materials and Methods).
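The gridsearch described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this section: that Eq. 1 has the form dv = sign(X + b)·|X + b|^k, that the model matrix contains pairwise absolute differences between distorted values, that fit is quantified as a Pearson correlation over the lower triangle, and that the log scale of k is a natural log. The function names (`model_rdm`, `rdm_fit`, `grid_search`) and grid ranges are hypothetical.

```python
import numpy as np

NUMBERS = np.arange(1, 10)   # sample values 1-9
X = (NUMBERS - 5) / 4        # normalized to [-1, 1]


def model_rdm(k, b):
    """Pairwise distances between distorted values (assumed form of Eq. 1)."""
    shifted = X + b
    dv = np.sign(shifted) * np.abs(shifted) ** k
    return np.abs(dv[:, None] - dv[None, :])


def rdm_fit(model, data):
    """Pearson r between the lower triangles of two 9 x 9 matrices."""
    idx = np.tril_indices(len(NUMBERS), k=-1)
    return np.corrcoef(model[idx], data[idx])[0, 1]


def grid_search(data_rdm, log_k_grid, b_grid):
    """Return best k, best b, and the gain in r over the linear model."""
    fits = np.array([[rdm_fit(model_rdm(np.exp(log_k), b), data_rdm)
                      for b in b_grid] for log_k in log_k_grid])
    i, j = np.unravel_index(np.nanargmax(fits), fits.shape)
    baseline = rdm_fit(model_rdm(1.0, 0.0), data_rdm)  # k = 1, b = 0
    return np.exp(log_k_grid[i]), b_grid[j], fits[i, j] - baseline
```

Under these assumptions, a call such as `grid_search(data_rdm, np.linspace(-2, 2, 41), np.linspace(-0.5, 0.5, 21))` would return one parameter pair per participant and time window. Sampling k on a log grid treats compression (k < 1) and anti-compression (k > 1) symmetrically around the linear model.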
The results (Fig. 3b) replicated our previous finding of a significant distortion of the neural numberline in the dual-stream task (Fig. 3b right). Specifically, like in our previous work (Appelhoff et al., 2022; Spitzer et al., 2017), we observed neurometric estimates of k > 1 (i.e., anti-compression) and of b > 0 (i.e., a bias towards larger numbers; p=0.013, FDR-corrected), which mirrors the pattern observed in the behavioral data (cf. Fig. 1c-d, orange). However, contrary to our expectations, we found no evidence for a neurometric compression in the single-stream task (Fig. 3b, left), where the psychometric weighting in behavior was clearly compressed (cf. Fig. 1c-d, blue). Descriptively, the neurometric map in the single-stream task indicated a pattern similar to that in the dual-stream task (i.e., anti-compression k > 1, and positive bias, b > 0). However, the improvement in fit over the linear/unbiased model was weak and statistically non-significant (Fig. 3b, left), potentially reflecting that the numerical distance effect in the single-stream task was overall weaker (cf. Fig. 2b).
Statistical tests on the mean neurometric parameter estimates in the two tasks showed significant anti-compression (k > 1) both in the dual-stream [M=3.05, SE=0.40; t(29)=5.16, p<0.001, d=0.94] and in the single-stream task [M=3.81, SE=0.57; t(29)=4.90, p<0.001, d=0.89, t-tests against 1]. Direct comparison of neurometric k between the two tasks showed no significant difference [t(29)=1.25, p=0.22, d=0.28, paired t-test]. A positive offset bias (b) was evident in the dual-stream task [M=0.18, SE=0.04; t(29)=3.97, p<0.001, d=0.73] but not in the single-stream task [M=0.04, SE=0.05; t(29)=0.75, p=0.46, d=0.14, t-tests against 0; difference between tasks t(29)=2.27, p=0.03, d=0.54, paired t-test]. Together, the neurometric RSA results yielded no evidence for a compression of numerical magnitude akin to that observed in behavior in the single-stream task. If anything, the results were suggestive of anti-compression (k > 1) in both tasks, although it should be noted that the improvement in model fit (relative to a linear model) in the single-stream task was small and not statistically significant (Fig. 3b, left).
Univariate ERP results (CPP/P3)
We complemented our analysis by examining neurometric distortions also in univariate ERP signals, specifically in the sample-evoked CPP/P3 response. Previous research has implicated the CPP/P3 in decision formation, with its amplitude reflecting the perceived strength of evidence (Herding et al., 2019; O’Connell et al., 2012; Pisauro et al., 2017; Spitzer et al., 2016; Twomey et al., 2015; Wyart et al., 2015). CPP/P3 amplitudes were previously also found to be modulated by numerical sample values in the context of a dual-stream comparison task (Spitzer et al., 2017).
We observed modulations of CPP/P3 amplitude by numerical value both in the single-stream (Fig. 4a, left; pcluster<0.001) and in the dual-stream task (Fig. 4a, right; pcluster<0.001, repeated measures analyses of variance), with the modulation in the dual-stream task appearing descriptively stronger. The mean amplitudes showed a U-shaped pattern over numbers 1-9 (Fig. 4b), consistent with previous findings that CPP/P3 reflects the strength of decisional evidence in an unsigned fashion, that is, a theoretical quantity similar to the absolute |dv| in our psychometric model, which would reflect the strength of evidence for either choice, “<” or “>” (see also Herding et al., 2019; O’Connell et al., 2012; Pisauro et al., 2017; Spitzer et al., 2016; Twomey et al., 2015; Wyart et al., 2015). Thus, for model-based analysis, we fitted Eq. 1 to the CPP/P3 amplitude data (using pairwise distance matrices and gridsearch analogous to our neurometric RSA above), but using |dv| to generate the model-predicted pattern.
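To make the unsigned model concrete, the following sketch shows how a model-predicted pattern based on |dv| could be turned into a pairwise distance matrix. It assumes that Eq. 1 has the form dv = sign(X + b)·|X + b|^k, so that |dv| = |X + b|^k; the helper names `unsigned_strength` and `distance_matrix` are hypothetical.

```python
import numpy as np

X = (np.arange(1, 10) - 5) / 4   # numbers 1-9 normalized to [-1, 1]


def unsigned_strength(k, b):
    """|dv| under the assumed form of Eq. 1: |X + b| ** k."""
    return np.abs(X + b) ** k


def distance_matrix(values):
    """Pairwise absolute differences between nine values."""
    values = np.asarray(values, dtype=float)
    return np.abs(values[:, None] - values[None, :])
```

With k = 1 and b = 0, the predicted strength is a symmetric V over numbers 1-9, so that "1" and "9" have a model distance of zero. Values of k > 1 deepen the trough around 5, and b > 0 raises the predicted strength for larger numbers relative to smaller ones. The same `distance_matrix` helper could be applied to the nine mean CPP/P3 amplitudes to obtain the data matrix.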
The best-fitting parameter estimates are shown in Fig. 4c (inset bar graph). Mirroring the RSA results, the parameter estimates based on CPP/P3 amplitude showed significant anti-compression (k > 1) in both tasks [single-stream: M=2.58, SE=0.55; t(29)=2.88, p=0.007, d=0.53, dual-stream: M=3.12, SE=0.51; t(29)=4.15, p<0.001, d=0.76, t-tests against 1], with no significant difference between tasks [t(29)=0.69, p=0.49, d=0.18, paired t-test]. A positive offset bias was again observed in the dual-stream task [M=0.28, SE=0.06; t(29)=5.00, p<0.001, d=0.912], but not in the single-stream task [M=0.09, SE=0.05; t(29)=1.78, p=0.09, d=0.33, t-tests against 0; difference: t(29)=2.97, p=0.006, d=0.63, paired t-test]. Together, the univariate ERP analysis thus corroborates our RSA finding that the neural processing of number samples in single-stream averaging was not characterized by compression but—if anything—by anti-compression, despite the evidence for compression in subsequent behavioral choice (cf. Fig. 1c-d).
Discussion
We observed opposite types of psychometric distortions (compression or anti-compression) in behavior when participants were instructed to process an identical stream of numbers in two distinct ways, namely, comparing the complete stream against a fixed target value or comparing two sub-streams against each other. However, contrary to expectations based on past research, the neural signals associated with the processing of the individual number samples showed evidence for anti-compression (i.e., enhanced processing of extreme values) under both instructions, regardless of whether extreme values were over- or underweighted in subsequent behavioral choice. We further observed qualitative differences between the sample representations in the two tasks, with a relatively weaker encoding of the samples’ numerical magnitude in the single-stream task.
In psychophysical research concerned with how attributes of a physical stimulus (e.g., size, weight, color) relate to their subjective experience or perception, human observers are commonly found to underweight extreme values. Such subjective “compression” of magnitude can be observed in a great variety of settings, from basic sensory-perceptual judgments (Fechner, 1860; Stevens, 1957) to economic decisions (Kellen et al., 2016; McAllister & Tarbert, 1999; Tversky & Kahneman, 1992). There exist various theoretical accounts for the origin and potential benefits of subjective compression in perception and decision making (Bhui & Gershman, 2018; Ciranka et al., 2022; de Gardelle & Summerfield, 2011; Li et al., 2017; Pardo-Vazquez et al., 2019; Stewart et al., 2006; Summerfield & Li, 2018; Vandormael et al., 2017). Our present results seem to contrast with these vast literatures, in that human brain signals tended to reflect the magnitude of numerical values in an anti-compressed fashion, even when they were compressed in later choice.
However, while the typical finding in many tasks is compression, there are task contexts where observers overweight extreme samples in choice behavior, in line with an anti-compression of sample values (Clarmann von Clarenau et al., 2022; Kunar et al., 2017; Ludvig et al., 2014; Luyckx et al., 2019; Shevlin et al., 2022; Spitzer et al., 2017; Tsetsos et al., 2012; Vanunu et al., 2020). We recently showed that such anti-compression can be beneficial in tasks that are computationally challenging (like our dual-stream comparison task), and where capacity-limited observers may be forced to selectively focus on a subset of the samples at the expense of others (Clarmann von Clarenau et al., 2022; see also Tsetsos et al., 2016). Our present findings may suggest that in task contexts in which higher-level processing capacities are exceeded, participant behavior might more directly reflect the brain’s default response to extreme values (i.e., privileged processing).
Despite the lack of evidence for different neural geometries of numerical magnitude in the two tasks, the samples’ overall representation in neural signals nevertheless differed. In the dual-stream task, neural signals encoded the numerical magnitude of a sample more strongly than in the single-stream task. This result was unexpected because nominally, the numerical magnitude of a sample was of equal relevance in both tasks. In the single-stream task, in turn, we found relatively stronger encoding of which unique number symbol was presented. Further, although the color of a sample (red/blue) was task-irrelevant in the single-stream task, the neural encoding of color was as strong and as sustained as in the color-based dual-stream task. While these results were unexpected, they may suggest more general differences in the role of “abstract” magnitude processing (Dehaene, 2003; Nieder, 2005; Nieder & Dehaene, 2009; Piazza & Izard, 2009; Walsh, 2003) in the two tasks. Potentially, in the more challenging dual-stream task, participants relied more directly on an intuitive “sense of magnitude” (Leibovich et al., 2017; Piazza et al., 2006; Spitzer et al., 2014) to gauge a sample’s decision value. The computationally simpler single-stream task, in contrast, may have allowed them to engage in more symbolic-analytic processing (e.g., approximate arithmetic and/or verbalization), processes that might be less amenable to EEG-decoding than the numerical distance pattern that prevailed in the dual-stream task.
The neural anti-compression of sample values in both tasks was also evident in univariate CPP/P3 signals, which had previously been implicated in the decisional evaluation of stimulus information (Herding et al., 2019; O’Connell et al., 2012; Pisauro et al., 2017; Spitzer et al., 2016; Twomey et al., 2015; Wyart et al., 2015). Interpreting the amplitude of the CPP/P3 signal as an index of the perceived strength of evidence, our findings in the single-stream task show a mismatch between the pattern observed in sample-by-sample processing (anti-compression, as revealed by neural data) and that in eventual judgment of the aggregate stream (compression, as evident in behavior). Future work will be required to identify the neural mechanisms underlying the eventual downweighting of extreme values in such task contexts, despite enhanced encoding in sample-level decision signals. It should be noted that the amplitude of P3 signals is known to be modulated by various factors, including how rare or surprising an event is (Donchin, 1981; Duncan-Johnson & Donchin, 1977). However, the sample values in our experiment were uniformly distributed, that is, each value occurred equally often on average, ruling out an explanation in terms of stimulus frequency.
Whether extreme values are over- or underweighted has major implications for the behavioral choices people make. One and the same numerical evidence, as in the present study, may lead to opposite choices, depending on how people respond to and process extreme values. For instance, anti-compression can explain systematic violations of “rational” axioms, such as transitivity, in multi-attribute choice (Summerfield & Tsetsos, 2015; Tsetsos et al., 2016). More generally, non-linear distortions of objective data (such as numbers) have often been interpreted as paradigmatic manifestations of seemingly “irrational” human behaviors in decision making (e.g., non-linear probability weighting in risky choice as assumed in cumulative prospect theory; Tversky & Kahneman, 1992). However, over the past years a new literature has evolved that recasts these behaviors as well-adapted policies of capacity-limited observers, fostering rather than hampering their measurable performance under these constraints (Bhui et al., 2021; Gigerenzer et al., 2011; Gigerenzer & Brighton, 2009; Juechems et al., 2021; Lieder & Griffiths, 2020; Sims, 2003, 2010; Tsetsos et al., 2016). Here, we shed new light on the open question of how such adaptive distortions may arise mechanistically, in terms of the neural signal patterns evoked by samples of evidence while observers are in the process of reaching a decision. Our finding of a “default” anti-compression of values in neural responses—regardless of subsequent behavior—raises the question at which exact processing stage value compression emerges, and how it leads to, for instance, the well-known “diminishing sensitivity” to larger values in economic choices (e.g., Tversky & Kahneman, 1992).
Materials and Methods
Participants
Thirty-two healthy volunteers took part in the experiment. We excluded two participants who reported having misunderstood the task instructions and who performed near chance level (50% correct choices) in one of the tasks (52% and 53%, respectively; both p>0.4, binomial tests against 0.5). Results are reported for the n=30 remaining participants (15 male, 15 female; mean age 27.4 ± 4.9 years; one left-handed). All participants provided written informed consent and received €10 per hour as compensation, in addition to a €10 flat fee for participation, as well as a performance-dependent bonus (€7.03 ± 1.08 on average). The study was approved by the ethics committee of the Max Planck Institute for Human Development.
Experimental design
Each participant performed two variants of a sequential number integration task (Clarmann von Clarenau et al., 2022; Spitzer et al., 2017). The stimulus protocols in the two task variants were identical (Fig. 1a). On each trial, participants viewed a sequence of 10 Arabic digits (randomly drawn from a uniform distribution of numbers 1 to 9) displayed in either red or blue font color (randomly assigned to each sample, with the restriction that each sequence contained 5 red and 5 blue samples). In the “averaging” (single-stream) task, participants were asked to judge whether the average of all 10 number samples in the sequence (regardless of their color) was larger or smaller than 5. In the “comparison” (dual-stream) task, participants were asked to indicate whether the red or the blue samples had the higher average value. Past behavioral work has shown a psychometric compression of number values in the single-stream task, whereas anti-compression was evident in the latter dual-stream task (Clarmann von Clarenau et al., 2022; Spitzer et al., 2017).
The experiment was programmed in Python using the PsychoPy package (Peirce et al., 2019) and run on a Windows 10 PC. The experiment code is available on GitHub (https://github.com/sappelhoff/ecomp_experiment). Throughout the experiment, we additionally recorded eye-movements using an EyeLink 1000 Plus (SR Research Ltd., Canada), which were not analyzed in the present study. Participants were informed about the eye-movement recording and were instructed to keep their gaze at the center of the screen throughout the experiment.
Each trial started with a white central fixation stimulus (a combination of bullseye and crosshair; Thaler et al., 2013) on an otherwise black screen. After 500 ms, the fixation stimulus disappeared and the number sequence was presented at a rate of 350 ms per sample (font Liberation Mono; height 3° visual angle; see Fig. 1a). Each sample was smoothly faded to black after 270 ms to improve the visual experience of the stimulus transitions. After the last sample, participants were prompted to enter a response by pressing the left or right button on a USB response pad (The Black Box ToolKit Ltd., UK). To avoid left/right motor response preparation during sequence presentation, in each of the two tasks, we randomized the mapping of responses (“smaller” or “larger” in the averaging task; “red” or “blue” in the comparison task) onto left/right button presses trial by trial, using a response screen (Fig. 1a, right).
If participants failed to respond within 3 s, the trial was discarded and after a delay of 100 ms, a message (“too slow!”) was displayed in red color for 1 s. On average, participants responded within 0.67±0.27 s and timeouts occurred only on 0.03% of trials. On the remaining trials, performance feedback was displayed (“correct” or “wrong”, in green or orange color, respectively) for 350 ms. All feedback was displayed centrally in Liberation Mono font with a height of 1° visual angle. On 4.85% of trials, in which the objective sequence average was precisely 5 (in the single-stream task) or identical for red and blue samples (in the dual-stream task), a random feedback message was displayed. These trials were excluded from the analysis of accuracy levels, but were included in the modeling and EEG analyses (see also Spitzer et al., 2017). After feedback, the central fixation stimulus re-appeared and after 500 to 1500 ms (randomly varied), the next trial started.
Each participant first performed 300 trials in one of the tasks (single-stream averaging or dual-stream comparison), followed by 300 trials in the other task (in counterbalanced serial order across subjects). Thus, 3000 number samples were presented in each task condition and participant. Trials were performed in blocks of 50, with summary performance feedback (percentage correct choices) being provided after each block. After completing all blocks of the first task, participants received the instructions for the second task. To avoid differences in stimulus input, the second task was performed on the exact same number sequences as the first task. Upon completing the second task, participants received a monetary bonus depending on their mean accuracy in both tasks.
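The stimulus protocol and the objectively correct response in each task can be summarized in a short sketch. This is an illustration only and is not taken from the published experiment code; the function names `make_sequence` and `correct_response` and the string labels are hypothetical.

```python
import random


def make_sequence(rng, n_samples=10):
    """Draw digits uniformly from 1-9 and assign half red, half blue."""
    numbers = [rng.randint(1, 9) for _ in range(n_samples)]
    colors = ["red"] * (n_samples // 2) + ["blue"] * (n_samples // 2)
    rng.shuffle(colors)
    return numbers, colors


def correct_response(numbers, colors, task):
    """Objectively correct answer, or None when there is a tie."""
    if task == "single":
        diff = sum(numbers) - 5 * len(numbers)   # average vs. 5
        labels = ("larger", "smaller")
    else:
        red = sum(n for n, c in zip(numbers, colors) if c == "red")
        blue = sum(n for n, c in zip(numbers, colors) if c == "blue")
        diff = blue - red   # equal stream lengths, so sums rank like means
        labels = ("blue", "red")
    if diff == 0:
        return None
    return labels[0] if diff > 0 else labels[1]
```

For example, `make_sequence(random.Random(0))` yields one ten-sample trial. A return value of `None` from `correct_response` marks the tie trials on which random feedback was shown.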
EEG recording
The experiment was performed in an electrically shielded and soundproof cabin. Scalp EEG was recorded with 64 active electrodes (actiCap, Brain Products GmbH Munich, Germany) positioned according to the international 10% system. Electrode FCz was used as the recording reference. We additionally recorded the horizontal and vertical electrooculogram (EOG) and electrocardiogram (ECG) using passive electrode pairs with bipolar referencing. All electrodes were prepared to have an impedance of less than 10 kΩ. The data were recorded using a BrainAmp DC amplifier (Brain Products GmbH Munich, Germany) at a sampling rate of 1000 Hz, with an RC high-pass filter with a half-amplitude cutoff at 0.016 Hz (roll-off: 6 dB/octave) and low-pass filtered with an anti-aliasing filter of half-amplitude cutoff 450 Hz (roll-off: 24 dB/octave). The dataset is available on GIN in source format and formatted according to the Brain Imaging Data Structure (BIDS) using MNE-BIDS (Appelhoff et al., 2019; Gorgolewski et al., 2016; Pernet et al., 2019): https://gin.g-node.org/sappelhoff/mpib_ecomp_sourcedata/.
Behavioral data analysis
We calculated model-free decision weights to examine how strongly each numerical sample value (1, 2, …, 9) contributed to participants’ choices in the two tasks. In the single-stream task, these weights were computed as the proportion of times a sample value was associated with the subsequent choice “larger”. Analogously, in the dual-stream task, the weights were computed as the proportion of times the sample’s color (i.e., red or blue) was subsequently chosen (see also Spitzer et al., 2017). For comparison with model predictions (see below), we computed decision weights also from the model-predicted choice probabilities (CP, see Eq. 3) obtained from using the best fitting parameter estimates in each participant.
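As an illustration of this computation, the sketch below derives model-free decision weights from a list of trials. The data layout (tuples of numbers, colors, and choice) and the function name `decision_weights` are assumptions made for the example.

```python
from collections import defaultdict


def decision_weights(trials, task):
    """Proportion of samples of each value that went with the choice.

    trials: iterable of (numbers, colors, choice) tuples, where choice is
    "larger"/"smaller" (single-stream) or "red"/"blue" (dual-stream).
    """
    counts = defaultdict(lambda: [0, 0])   # value -> [hits, total]
    for numbers, colors, choice in trials:
        for number, color in zip(numbers, colors):
            if task == "single":
                hit = choice == "larger"    # sample preceded a "larger" choice
            else:
                hit = choice == color       # sample's color was chosen
            counts[number][0] += int(hit)
            counts[number][1] += 1
    return {value: hits / total
            for value, (hits, total) in sorted(counts.items())}
```

A weight of 0.5 indicates that a sample value had no systematic influence on choice. The shape of the weights over values 1-9 (shallower or steeper towards the extremes) gives the descriptive weighting curve.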
Psychometric model
To quantify psychometric distortions (i.e., compression or anti-compression) in behavior, we used a simple psychometric model that has been used extensively in previous work (Appelhoff et al., 2022; Clarmann von Clarenau et al., 2022; Li et al., 2017; Luyckx et al., 2019; Spitzer et al., 2017). The model formalizes the transformation of objective sample values X (here: numbers 1-9, normalized to the range [-1, 1]) into a subjective decision value dv as a sign-preserving power function:
$dv = \mathrm{sign}(X + b)\,|X + b|^{k}$ (Eq. 1)
where exponent k (kappa) determines the overall shape of the transformation (k < 1: compression; k = 1: linear; k > 1: anti-compression). Parameter b (bias) implements an overall weighting bias towards smaller (b < 0) or larger (b > 0) numbers. Sample-level decision values (dv) are integrated into a trial-level decision value (DV) by summation over samples:
$DV = \sum dv \cdot c$ (Eq. 2)
where c is an indicator variable denoting a sample’s color (red: -1, blue: +1) in the dual-stream task. In the single-stream task, c was fixed at 1. This way, Eq. 2 effectively implements a comparison between streams in the dual-stream task, and simple averaging in the single-stream task. Finally, the trial-level decision value (DV) is transformed into a choice probability according to a logistic function:
$CP = \frac{1}{1 + e^{-DV/s}}$ (Eq. 3)
where CP denotes the probability of choosing “>5” (in the single-stream task) or “blue>red” (in the dual-stream task), and parameter s quantifies the level of decision noise, with larger values of s implying more random choices.
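For illustration, the three model stages can be sketched in a few lines of NumPy. The sketch assumes that the sign-preserving power function takes the form dv = sign(X + b)|X + b|^k, that DV is the sum of dv multiplied by c, and that CP = 1 / (1 + exp(-DV / s)); the function names and the linear mapping (number - 5) / 4 onto [-1, 1] are likewise assumptions.

```python
import numpy as np


def normalize(numbers):
    """Map the digits 1-9 linearly onto the range [-1, 1]."""
    return (np.asarray(numbers, dtype=float) - 5.0) / 4.0


def sample_dv(x, k, b):
    """Sign-preserving power function (assumed form of Eq. 1)."""
    return np.sign(x + b) * np.abs(x + b) ** k


def choice_probability(numbers, colors, k, b, s):
    """Probability of choosing '>5' (single-stream) or 'blue>red' (dual-stream).

    numbers, colors: (n_trials, n_samples); colors are -1/+1 in the
    dual-stream task and all +1 in the single-stream task.
    """
    dv = sample_dv(normalize(numbers), k, b)
    trial_dv = np.sum(dv * np.asarray(colors), axis=-1)  # assumed Eq. 2
    return 1.0 / (1.0 + np.exp(-trial_dv / s))           # assumed Eq. 3
```

With k = 0.5, a mid-range input of 0.5 maps to about 0.71, so that intermediate values gain weight relative to the extremes (compression); with k = 2 it maps to 0.25 (anti-compression).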
The model was fitted to each participant’s individual choice data using the Nelder-Mead method as implemented in SciPy (Virtanen et al., 2020), with parameter values restricted to the ranges (k: [0, 5]; b: [-0.5, 0.5], s: [0.01, 3]). Fitting was performed iteratively using 900 combinations of different starting values for each task condition, and the solution with the lowest Bayesian Information Criterion (BIC) was used in the analysis. Statistical analysis of the fitted parameters proceeded with conventional inferential tests on the group level.
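A hedged sketch of such a multi-start fit is given below. It assumes a Bernoulli likelihood of the observed choices, the model form dv = sign(X + b)|X + b|^k with a logistic choice rule, and a caller-supplied list of starting values (the grid of 900 combinations is not reproduced here). Passing `bounds` to the Nelder-Mead method requires SciPy 1.7 or later. All function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Parameter ranges as stated in the text: k, b, s
BOUNDS = [(0.0, 5.0), (-0.5, 0.5), (0.01, 3.0)]


def neg_log_likelihood(params, numbers, colors, choices):
    """Negative Bernoulli log-likelihood of the choices (assumed loss)."""
    k, b, s = params
    x = (np.asarray(numbers, dtype=float) - 5.0) / 4.0
    dv = np.sign(x + b) * np.abs(x + b) ** k
    trial_dv = np.sum(dv * colors, axis=1)
    cp = np.clip(expit(trial_dv / s), 1e-10, 1 - 1e-10)  # avoid log(0)
    return -np.sum(choices * np.log(cp) + (1 - choices) * np.log(1 - cp))


def fit_participant(numbers, colors, choices, starts):
    """Run Nelder-Mead from each starting point and keep the best solution."""
    best = None
    for x0 in starts:
        result = minimize(
            neg_log_likelihood, x0, args=(numbers, colors, choices),
            method="Nelder-Mead", bounds=BOUNDS,
        )
        if best is None or result.fun < best.fun:
            best = result
    # With a fixed number of parameters and trials, the lowest BIC
    # corresponds to the lowest negative log-likelihood.
    bic = len(BOUNDS) * np.log(len(choices)) + 2.0 * best.fun
    return best.x, bic
```

Running the search from many starting points reduces the risk that the simplex search settles in a local minimum.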
EEG preprocessing
We used functions from MNE-Python (Gramfort et al., 2013) and PyPrep (Appelhoff et al., 2018; based on Bigdely-Shamlo et al., 2015) to automatically mark noisy segments and bad channels in the EEG recordings. We additionally screened all recordings visually to reject noisy segments or bad channels that the automatic procedures had missed. This way, on average, 2.5 ± 1.6 channels were discarded per participant. Next, we corrected ocular and cardiac artifacts using independent component analysis (ICA). To this end, we high-pass filtered a copy of the raw data at 1 Hz and downsampled it to 100 Hz. We then ran an extended infomax ICA on all EEG channels and time points that were not marked as bad in the prior inspection. Using the EOG and ECG recordings, we identified stereotypical eye blink, eye movement, and heartbeat artifact components through correlation with the independent component time courses. We visually inspected and rejected the artifact components before applying the ICA solution to the raw data (Winkler et al., 2015). We then filtered the ICA-cleaned data between 0.1 and 40 Hz, interpolated bad channels, and re-referenced each channel to the average of all channels.
Event-related potentials (ERPs)
We epoched the preprocessed data from −0.1 to 0.9 s relative to each sample stimulus onset. Remaining bad epochs were rejected using a thresholding approach from the FASTER pipeline (Step 2; Nolan et al., 2010). On average, n = 5764 clean epochs (96.1%) per participant were retained for analysis. The epochs were then downsampled to 250 Hz and baseline corrected relative to the period from −0.1 to 0 s before stimulus onset. Since our analyses focused on stimulus-specific effects, we subtracted the overall mean waveform from the individual epochs, in each of the two task conditions. The mean-subtracted epochs were then averaged into stimulus-specific ERPs for each sample value (1, 2, …, 9) in each color (red/blue). Note that the individual samples in a stream were statistically independent by design, allowing us to examine stimulus-specific ERP responses in a time window that overlapped with the onset of the next sample stimulus.
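The baseline correction, mean subtraction, and averaging steps can be sketched as follows, assuming the epochs of one task condition are stored as a NumPy array of shape (epochs, channels, time points) and the 18 stimuli (9 numbers in 2 colors) are coded as integers 0 to 17. The function name and data layout are illustrative assumptions.

```python
import numpy as np


def stimulus_specific_erps(epochs, times, labels, n_stimuli=18):
    """Average mean-subtracted epochs per stimulus within one task condition.

    epochs: (n_epochs, n_channels, n_times); times: (n_times,) in seconds;
    labels: (n_epochs,) integer stimulus codes in 0..n_stimuli-1.
    """
    # Baseline correction relative to the pre-stimulus period
    baseline = epochs[:, :, times < 0].mean(axis=2, keepdims=True)
    epochs = epochs - baseline
    # Remove the overall mean waveform to isolate stimulus-specific effects
    epochs = epochs - epochs.mean(axis=0, keepdims=True)
    # One ERP per stimulus: (n_stimuli, n_channels, n_times)
    return np.stack([epochs[labels == s].mean(axis=0) for s in range(n_stimuli)])
```

The function would be called separately for the single-stream and the dual-stream data, so that the mean waveform is removed within each task condition.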
Representational similarity analysis (RSA)
We used representational similarity analysis (RSA; Kriegeskorte & Kievit, 2013) to examine the encoding of sample information in multivariate ERP patterns. Specifically, we examined the representational geometry of our stimulus space (numbers 1 to 9, colored red or blue) in terms of the multivariate (dis-)similarity between the ERP topographies (64 channels) associated with the 18 different stimuli. Representational dissimilarity was computed as the Mahalanobis distance between each pair of stimuli, yielding an 18×18 representational dissimilarity matrix (RDM) at each time point of the peri-sample epoch. To compute the Mahalanobis distance, we fitted a general linear model to the z-scored trial data, with each stimulus type specified as a condition, and used the residual trial-by-trial variance for pairwise distance calculation. This procedure ensured multivariate noise normalization for the RDMs (Guggenmos et al., 2018). Below, we refer to the RDMs thus obtained as ERP-RDMs.
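One way to compute a noise-normalized RDM at a single time point is sketched below. The sketch assumes a one-hot design matrix (so that the GLM betas equal the condition means), an unregularized residual covariance, and a pseudo-inverse to guard against rank deficiency after average referencing; these details, and the function name, are assumptions rather than a description of the published code.

```python
import numpy as np


def mahalanobis_rdm(data, labels, n_conditions):
    """Pairwise Mahalanobis distances between condition patterns.

    data: (n_trials, n_channels) amplitudes at one time point;
    labels: (n_trials,) integer condition codes in 0..n_conditions-1.
    """
    # z-score each channel across trials
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    # GLM with one regressor per condition
    design = np.eye(n_conditions)[labels]
    betas, *_ = np.linalg.lstsq(design, data, rcond=None)
    # Noise covariance estimated from the trial-by-trial residuals
    residuals = data - design @ betas
    dof = data.shape[0] - n_conditions
    precision = np.linalg.pinv(residuals.T @ residuals / dof)
    # Distance between every pair of condition patterns
    diffs = betas[:, None, :] - betas[None, :, :]
    squared = np.einsum("ijk,kl,ijl->ij", diffs, precision, diffs)
    return np.sqrt(np.maximum(squared, 0.0))
```

Applying the function to each time point of the epoch would yield a time course of RDMs.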
To examine the information encoded in the ERP-RDM time courses, we used three different model RDMs (see Fig. 2a) reflecting (i) the unique digit symbols, with minimum dissimilarity between identical digits, and maximum dissimilarity between distinct digits (“digit” model), (ii) the samples’ color, with minimum (maximum) dissimilarity between same (different) colors (“color” model), and (iii) the numerical distance between samples, that is, the arithmetic difference between their objective number values (“numerical distance” model). To render the three models fully independent, we recursively orthogonalized each model RDM with respect to all others using the Gram-Schmidt process (Appelhoff et al., 2022; Spitzer et al., 2017). Finally, we assessed the match between each model RDM and the empirically observed ERP-RDMs via Pearson correlation at each time point, using only the lower triangle of the RDMs and omitting the diagonal, to exclude redundant matrix cells.
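The construction of the three model RDMs, their orthogonalization, and the correlation with an ERP-RDM can be sketched as follows. The sketch reads "orthogonalized with respect to all others" as removing from each vectorized, mean-centered model the component spanned by the other two models; this reading, the stimulus ordering (1-9 red followed by 1-9 blue), and all names are assumptions.

```python
import numpy as np

numbers = np.tile(np.arange(1, 10), 2)  # 18 stimuli: 1-9 red, then 1-9 blue
colors = np.repeat([0, 1], 9)

digit_rdm = (numbers[:, None] != numbers[None, :]).astype(float)
color_rdm = (colors[:, None] != colors[None, :]).astype(float)
distance_rdm = np.abs(numbers[:, None] - numbers[None, :]).astype(float)

# Lower triangle without the diagonal: 153 unique stimulus pairs
lower = np.tril_indices(18, k=-1)


def orthogonalize(target, others):
    """Gram-Schmidt: remove from `target` the span of the `others` vectors."""
    target = target - target.mean()
    basis = []
    for vec in others:
        vec = vec - vec.mean()
        for q in basis:
            vec = vec - (vec @ q) * q
        norm = np.linalg.norm(vec)
        if norm > 1e-12:
            basis.append(vec / norm)
    for q in basis:
        target = target - (target @ q) * q
    return target


def rsa_correlation(model_vector, erp_rdm):
    """Pearson correlation between a model vector and an ERP-RDM."""
    return np.corrcoef(model_vector, erp_rdm[lower])[0, 1]


distance_model = orthogonalize(
    distance_rdm[lower], [digit_rdm[lower], color_rdm[lower]]
)
```

After orthogonalization, the numerical distance model is uncorrelated with the digit and color models, so that its correlation with the ERP-RDM cannot be attributed to either of them.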
For statistical analysis of the RSA time courses, we used t-tests against zero with cluster-based permutation testing to control for multiple comparisons over time points (Maris & Oostenveld, 2007). To test whether RSA results differed between the single- and dual-stream tasks, we first computed their difference, followed by cluster-based permutation tests against zero. All permutation tests were performed over 1000 iterations with a cluster-defining threshold of p = 0.01 and cluster length as the critical statistic (thresholded at p = 0.01).
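A minimal sketch of a one-sample cluster-based permutation test over time points is given below. It assumes that the null distribution is built by randomly flipping the sign of each participant's time course, that clusters are formed separately for positive and negative t-values, and that the maximum cluster length per permutation serves as the null statistic. The function names and return format are illustrative.

```python
import numpy as np
from scipy import stats


def find_clusters(mask):
    """Return (start, stop) index pairs for runs of consecutive True values."""
    padded = np.concatenate([[False], mask, [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))


def _clusters_from_data(data, t_crit):
    """Clusters of supra-threshold t-values, separately for each sign."""
    t_values = stats.ttest_1samp(data, 0.0, axis=0).statistic
    return find_clusters(t_values > t_crit) + find_clusters(t_values < -t_crit)


def cluster_permutation_test(data, n_perm=1000, alpha_cluster=0.01, seed=0):
    """data: (n_subjects, n_times). Returns a list of (start, stop, p)."""
    rng = np.random.default_rng(seed)
    n_subjects = data.shape[0]
    # Two-sided cluster-defining threshold on the t-distribution
    t_crit = stats.t.ppf(1 - alpha_cluster / 2, df=n_subjects - 1)
    observed = _clusters_from_data(data, t_crit)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
        lengths = [b - a for a, b in _clusters_from_data(data * signs, t_crit)]
        null_max[i] = max(lengths, default=0)
    return [
        (start, stop, (np.sum(null_max >= stop - start) + 1) / (n_perm + 1))
        for start, stop in observed
    ]
```

For the comparison between tasks, the same function would be applied to the per-participant difference time courses.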
Analysis of neurometric distortions
The theoretical model underlying conventional RSA of numerical distance effects (see above) is a straight number line, where the numbers (1, 2, …, 9) are equidistant. The standard numerical distance model (Fig. 2a, right) is equivalent to a model of dv according to Eq. 1 (see Psychometric model) where k = 1 and b = 0. To examine potential nonlinear distortions of the number representations in neural signals (“neurometric” distortions), we constructed numerical distance models based on dv while varying the values of k (from log(k) = -2 to +2, see Fig. 3b) and b (from -0.5 to 0.5). Varying k on a log scale centered around log(k) = 0 (i.e., k = 1) ensured that parameter estimates were not biased to show anti-compression (or compression) by chance in the subsequent grid search. For each parameter combination, we correlated the resulting model RDMs with the ERP-RDM, yielding a grid (“neurometric map”) of the parameter space (see Fig. 3b). The parameter combination with the maximum correlation was used as the estimate of the participant’s neurometric distortion parameters (see also Spitzer et al., 2017; Appelhoff et al., 2022). Statistical analysis of the neurometric parameter estimates proceeded with conventional statistical tests on the group level.
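The grid search over neurometric parameters can be sketched as follows. The sketch assumes the model form dv = sign(X + b)|X + b|^k, the natural logarithm for log(k), an 18×18 stimulus ordering of 1-9 red followed by 1-9 blue, and grid resolutions chosen by the caller; the orthogonalization against the digit and color models is omitted for brevity. All names are illustrative.

```python
import numpy as np

# Normalized number values for 18 stimuli (1-9 red, then 1-9 blue)
x = np.tile((np.arange(1, 10) - 5.0) / 4.0, 2)
lower = np.tril_indices(18, k=-1)


def dv_model_rdm(k, b):
    """Numerical distance RDM based on distorted decision values dv."""
    dv = np.sign(x + b) * np.abs(x + b) ** k
    return np.abs(dv[:, None] - dv[None, :])


def neurometric_map(erp_rdm, log_ks, bs):
    """Correlate the ERP-RDM with a model RDM for every (b, log k) pair."""
    grid = np.empty((len(bs), len(log_ks)))
    for i, b in enumerate(bs):
        for j, log_k in enumerate(log_ks):
            model = dv_model_rdm(np.exp(log_k), b)
            grid[i, j] = np.corrcoef(model[lower], erp_rdm[lower])[0, 1]
    return grid


def best_parameters(grid, log_ks, bs):
    """Return (k, b) at the maximum of the neurometric map."""
    i, j = np.unravel_index(np.nanargmax(grid), grid.shape)
    return np.exp(log_ks[j]), bs[i]
```

Because the grid is symmetric around log(k) = 0, values of k below and above 1 are sampled equally often, which is the property the text relies on to avoid a bias towards compression or anti-compression.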
Univariate ERP analysis
For complementary inspection of univariate ERP responses evoked by the number samples, we examined the stimulus-specific ERP (see above) for each sample value (1-9; collapsed across red/blue colors). To focus on CPP/P3 responses (see Results), the ERPs were pooled over centro-parietal channels (CP1, P1, POz, Pz, CPz, CP2, P2) and amplitudes were examined in a time window from 300 ms to 700 ms based on previous work (Appelhoff et al., 2022; Polich, 2007; Spitzer et al., 2017; Wyart et al., 2015). The ERP time courses were analyzed statistically using cluster-based permutation testing (see above). For model-based analysis, we used the same approach as in our analyses of neurometric distortions in RSA (see above), except that the model RDMs were constructed from |dv| (i.e., the absolute, unsigned magnitude of dv, see Results) and correlated with the pairwise differences in univariate ERP amplitude between samples 1-9.
Data availability
All data is available on GIN: https://gin.g-node.org/sappelhoff/mpib_ecomp_sourcedata/.
Code availability
All analysis code is available on GitHub: https://github.com/sappelhoff/ecomp_analysis. The experiment code is available on GitHub: https://github.com/sappelhoff/ecomp_experiment.
Ethics information
The study was approved by the ethics committee of the Max Planck Institute for Human Development.
Author contributions
SA, BS: Formal analysis, Visualization, Methodology, Conceptualization, Project Administration, Writing - original draft
SA: Investigation, Validation, Data curation, Software
SA, BS, RH: Writing - review & editing
BS: Supervision
RH, BS: Resources
Funding
This work was supported by a European Research Council Consolidator Grant ERC-2020-COG-101000972 (BS).
Competing interests
The authors declare no competing interests.
Acknowledgements
We thank Anna Faschinger, Gabriele Inciuraite, Aleksandra Zinoveva, Larissa Samaan, Simon Ciranka, and Jann Wäscher for help with data collection, and Verena Clarmann von Clarenau and Thorsten Pachur for helpful discussions.