## ABSTRACT

Theoretical and empirical work posits the existence of a common magnitude system in the brain. Such a proposal implies that manipulating stimuli in one magnitude dimension (e.g. time) should interfere with the subjective estimation of another magnitude dimension (e.g. space). Here, we asked whether a generalized Bayesian magnitude estimation system would sample sensory evidence using a common, amodal prior. In two psychophysical experiments, participants estimated duration, surface, or numerosity while we manipulated the non-target magnitude dimensions and the rate of sensory evidence accumulation. First, duration estimation was resilient to changes in surface or numerosity, whereas lengthening (shortening) duration yielded under- (over-) estimation of surface and numerosity. Second, the perception of numerosity and surface was affected by changes in the rate of sensory evidence accumulation, whereas the perception of duration was not. Our results suggest that a generalized magnitude system based on Bayesian computations would, at minimum, require multiple priors.

## INTRODUCTION

The representation of space, time, and number is foundational to the computational brain [1-3], yet whether magnitudes share a common (conceptual or symbolic) format in the brain is unclear. Walsh’s A Theory Of Magnitude (ATOM) [2] proposes that analog quantities are mapped onto a generalized magnitude system, which implies that space, time, and number may share a common neural code. A further implication of a common representational system for magnitudes is that the estimation of a target magnitude dimension should be affected by manipulating another, non-target magnitude dimension: the larger the non-target magnitude, the larger the target magnitude should be perceived (Figure 1A). Such predictions can be formalized in Bayesian terms [4]: the magnitude of each dimension yields a likelihood that is subsequently combined with an amodal prior common to all magnitude dimensions (Figure 1B). In line with ATOM and the common magnitude system hypothesis, a growing body of behavioral evidence ([5-26], for review see [27-29]) suggests the existence of interference across magnitude dimensions. Several neuroimaging studies also suggest a possible common neural code for quantity estimation, mostly implicating parietal cortices (Bueti and Walsh, 2009; Dormal et al., 2012; Pinel et al., 2004; Hayashi et al., 2015; Hubbard et al., 2005; Riemer et al., 2016; Nieder, 2013; but see [37-38]). However, although a variety of interactions between time, space and number has been reported, the directionality of these interactions is not always consistent across the literature (e.g. [13-14]): for instance, manipulating the duration of events has seldom been reported to affect numerical and spatial magnitudes [12, 13, 26], whereas size [5, 7] typically influences duration.
Yet, if time, number and space share a common representational system in the brain and a common amodal prior, all magnitude dimensions should interact with each other in a bi-directional manner (Figure 1A).

Recent discussions in the field suggest that a common magnitude system would evaluate and combine quantities in the external world on the basis of Bayesian computations [13, 39]. Consistent with this proposal, recent examinations of Bayesian processing in magnitude estimation have demonstrated a number of distinct effects [4]. A primary example is the so-called central tendency effect, wherein magnitude estimates regress toward the mean of the stimulus set, such that large (small) magnitudes are under- (over-) estimated. Crucially, central tendency effects have been demonstrated across a number of different magnitude judgments, including time (historically known as Vierordt’s law [39, 40]), number [41], distance and angle [42]. Further, the degree of central tendency has been found to correlate across magnitude dimensions [42], suggesting the existence of “global” priors for magnitude [4]. The notion of global priors is compatible with Walsh’s ATOM: although the initial processing of different magnitude dimensions may differ, the representation of these magnitudes would be stored amodally. The existence of global priors would also explain congruency effects between magnitude dimensions. However, a single global prior for magnitude estimation would not explain why congruency effects may be inconsistent between dimensions, or why the directionality of interference may differ across magnitude dimensions.
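The central tendency effect follows directly from Bayesian combination. Below is a minimal sketch, assuming a Gaussian prior and a Gaussian likelihood, with all parameter values hypothetical and chosen only for illustration. The posterior mean is a weighted average of the sensory measurement and the prior mean, so the average estimate is a linear function of the true magnitude with a slope below 1. Magnitudes are expressed on the same 0 to 100 percentage scale used for the continuous-estimate analysis in Materials & Methods.

```python
import numpy as np

def posterior_mean(measurement, prior_mu, prior_sd, like_sd):
    """Posterior mean for a Gaussian prior combined with a Gaussian likelihood."""
    w = prior_sd**2 / (prior_sd**2 + like_sd**2)  # weight given to the measurement
    return w * measurement + (1.0 - w) * prior_mu

# Six stimulus levels (75 to 125 % of the mean) rescaled to 0-100.
levels = (np.array([75, 90, 95, 105, 110, 125]) - 75) / 50 * 100

# Hypothetical observer: prior centred on the middle of the stimulus range.
PRIOR_MU, PRIOR_SD, LIKE_SD = 50.0, 20.0, 15.0

# Noise-free case: the expected estimate is exactly linear in the stimulus.
expected = posterior_mean(levels, PRIOR_MU, PRIOR_SD, LIKE_SD)
slope, intercept = np.polyfit(levels, expected, 1)

# Noisy case: zero-mean sensory noise added on each simulated trial.
rng = np.random.default_rng(0)
measurements = levels + rng.normal(0.0, LIKE_SD, size=(1000, levels.size))
estimates = posterior_mean(measurements, PRIOR_MU, PRIOR_SD, LIKE_SD)
noisy_slope, _ = np.polyfit(np.tile(levels, 1000), estimates.ravel(), 1)

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")  # slope = 0.64, intercept = 18.0
```

A slope of 1 would indicate veridical estimates and a slope of 0 complete regression to the prior mean. Under a single global prior, `PRIOR_MU` and `PRIOR_SD` would be shared by D, N and S on this normalized scale; that sharing is an assumption of the global-prior account, not something this snippet tests.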

To test this first working hypothesis, we used a paradigm in which stimuli consisted of clouds of dynamic dots characterized by the total *duration* of the trial (D: time magnitude), the total *number* of dots (N: number magnitude) and the overall *surface* filled by the dots (S: space magnitude). Two experiments were conducted (Figure 2A-B). In the first experiment (Experiment 1), while participants estimated a target magnitude dimension (e.g. D), we independently manipulated the non-target magnitude dimensions (e.g. S and N). This design allowed us to test all possible combinations and to investigate possible interactions between magnitudes (Table 1). If magnitude dimensions interact, increasing (decreasing) N or S should lead to an over- (under-) estimation of D (Figure 2D). These effects should be bidirectional: when participants estimate N or S, increasing (decreasing) the non-target dimension D should lead to an over- (under-) estimation of the target magnitude dimension N or S. On the other hand, if dimensions are independent, manipulating the number of events in a given trial should not affect duration or surface estimates. Similarly, decreasing or increasing the duration should not affect numerical or spatial judgments if magnitudes are independent.

To test a second working hypothesis, we manipulated the accumulation regime of sensory evidence for N and S estimation (Figure 2B). The accumulation of sensory evidence over time for space and number has seldom been controlled for or manipulated during magnitude estimation. In a prior experiment [13] in which the duration of sensory evidence accumulation was constrained in the S and N dimensions, the estimation of duration remained resilient to changes in the other dimensions, whereas D affected the estimation of S and N: curiously, longer (shorter) durations decreased (increased) the estimation of S and N. These results were discussed in the context of a possible Bayesian integration of magnitude dimensions. Similarly, the present dynamic paradigm, in which N and S accumulate over time, raises the question of how varying the speed or rate of sensory evidence delivery affects estimation: for a given N or S, if D increases, the speed of presentation decreases, and vice versa. Hence, whereas the number of dots and the cumulative surface accumulated linearly over time in Experiment 1 (Figure 1B), in the second experiment (Experiment 2) we investigated whether changes in the rate of presentation of visual information affected the estimation of D, N, and S. Two evidence accumulation regimes were tested: a fast-slow (FastSlow) and a slow-fast (SlowFast) distribution (see Stimuli in Materials & Methods).

Third, we investigated the extent to which Bayesian models could explain the behavioral results obtained in magnitude estimation, independently of the means by which participants provided their estimates. Thus far, studies demonstrating central tendency effects [39, 42, 43] have all relied on continuous estimation procedures, wherein participants estimated a particular magnitude value with a motor response. In particular, these studies used reproduction tasks, which require participants to demarcate where (when) a particular magnitude matched a previously presented standard. In contrast, most studies demonstrating congruency effects in magnitude estimation have employed two-alternative forced choice (2AFC) designs. This difference may be particularly relevant, as recent studies have demonstrated that the size-time congruency effect, one of the most heavily studied and replicated, depends on the type of decision being made ([44], but see [25, 45] for congruency effects with temporal reproduction). As such, for both Experiments 1 and 2, we provide systematic quantifications of magnitude estimates as categorical estimations together with analyses of continuous reports.

## MATERIALS & METHODS

### Participants

A total of 45 participants were tested. Three participants did not attend the second session and 10 were excluded for poor performance. Hence, 17 healthy volunteers (7 males, mean age 24.9 ± 5.8 years) took part in Experiment 1, and 15 participants (8 males, mean age 26.5 ± 7 years) took part in Experiment 2. All had normal or corrected-to-normal vision. Both experiments took place in two sessions one week apart. Prior to the experiment, participants gave written informed consent. The study was conducted in agreement with the Declaration of Helsinki (2008) and was approved by the Ethics Committee on Human Research at Neurospin (Gif-sur-Yvette, France). Participants were compensated for their participation.

### Stimuli

The experiments were coded in Matlab 8.4 with Psychtoolbox (v 3.0.12) and built on a published experimental design (see [13]). Visual stimuli were clouds of grey dots that appeared dynamically on a black computer screen (1024 x 768 pixels, 85 Hz refresh rate). Dots appeared within a virtual disk with a diameter of 12.3 to 15.2 degrees of visual angle; no dots could appear around the central fixation, which was protected by an invisible inner disk of 3.3 degrees. Dots could not overlap; the duration of each dot varied between 35 and 294 ms, and its diameter between 0.35 and 1.14 degrees. A cloud of dots was characterized by its duration (D: total duration of the trial during which dots were presented), its numerosity (N: cumulative number of dots presented on the screen in a given trial) and its surface (S: cumulative surface covered by the dots during the entire trial). On any given trial, D, N and S could each take 6 possible values corresponding to 75, 90, 95, 105, 110 and 125 % of the mean value. We fixed D_{mean} = 800 ms and initially set N_{mean} = 30 dots and S_{mean} = 432 mm²; N_{mean} and S_{mean} were then individually calibrated in the first session of the experiment (see Procedure). To ensure that luminance could not be used as a cue to perform the task, the relative luminance of the dots varied randomly across all durations among six RGB values (57, 64, 73, 85, 102 and 128). In Experiment 1, the total number of dots accumulated linearly over time, 2 to 7 dots at a time in steps of 9 to 13 iterations (Figure 2A). In Experiment 2, the total number of dots accumulated in a fast-to-slow or in a slow-to-fast progression (FastSlow or SlowFast, respectively): in FastSlow, 75% ± 10% of the total number of dots in the trial were presented in the first 25% of the trial duration, whereas in SlowFast, 25% ± 10% of the total number of dots were presented in the first 75% of the trial duration.
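The six magnitude levels and the two non-linear accumulation regimes can be sketched as follows. This is an illustrative Python reconstruction, not the authors’ Matlab/Psychtoolbox code, and it rests on several assumptions: dot onsets are drawn uniformly within each phase; dot lifetimes, sizes and the non-overlap constraint are ignored; the linear regime is simplified to uniform onsets rather than batches of 2 to 7 dots; and “± 10%” is read as ± 10 percentage points drawn uniformly per trial. The function names `magnitude_values` and `dot_onsets` are hypothetical.

```python
import numpy as np

LEVELS = np.array([0.75, 0.90, 0.95, 1.05, 1.10, 1.25])  # fractions of the mean value

def magnitude_values(mean):
    """The six values a dimension can take on a trial (75 to 125 % of its mean)."""
    return LEVELS * mean

def dot_onsets(n_dots, duration_ms, regime, rng, jitter=0.10):
    """Hypothetical onset times (ms) for n_dots under a given accumulation regime.

    'Linear'  : onsets uniform over the whole trial (simplification).
    'FastSlow': ~75 % of dots in the first 25 % of the trial.
    'SlowFast': ~25 % of dots in the first 75 % of the trial.
    Dot lifetimes, sizes and non-overlap constraints are ignored.
    """
    if regime == "Linear":
        return np.sort(rng.uniform(0.0, duration_ms, n_dots))
    if regime == "FastSlow":
        dot_frac, time_frac = 0.75, 0.25
    elif regime == "SlowFast":
        dot_frac, time_frac = 0.25, 0.75
    else:
        raise ValueError(f"unknown regime: {regime!r}")
    # Assumption: '± 10%' means ± 10 percentage points, drawn uniformly per trial.
    frac = dot_frac + rng.uniform(-jitter, jitter)
    n_early = int(round(frac * n_dots))
    split = time_frac * duration_ms
    early = rng.uniform(0.0, split, n_early)                     # first phase
    late = rng.uniform(split, duration_ms, n_dots - n_early)     # second phase
    return np.sort(np.concatenate([early, late]))

rng = np.random.default_rng(0)
durations = magnitude_values(800)  # candidate trial durations in ms
onsets = dot_onsets(32, 800, "FastSlow", rng)
```

Because the split between phases is drawn per trial, the same `rng` object should be reused across trials so that successive trials get independent jitter.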

### Procedure

Participants were seated in a quiet room ~60 cm from the computer screen with their head on a chinrest. The main task consisted of estimating the magnitude of the trial along one of its three possible dimensions (D, N, or S). Each experiment consisted of two sessions: in the first session, stimuli were calibrated on a per-individual basis to elicit an identical discrimination threshold in all three dimensions (see [13]); the second session consisted of the experiment proper.

In the first session of Experiments 1 and 2, task difficulty across magnitudes was individually calibrated by computing the participant’s Point of Subjective Equality (PSE: 50% discrimination threshold) and Weber Ratio (WR) for each dimension D, N, and S. Specifically, participants were passively presented with exemplars of the minimum and maximum values of each dimension and were then required to classify 10 of these extremes as minimum ‘-’ or maximum ‘+’ by pressing ‘h’ or ‘j’ on an AZERTY keyboard. Participants then received feedback indicating the number of correct answers they had provided. Subsequently, the PSE and WR were assessed independently for each magnitude by varying one dimension while keeping the other two at their mean values (e.g. if D varied among its 6 possible values, S was S_{mean} and N was N_{mean}). Five trials per value of the varying dimension were collected, yielding a total of 30 trials per dimension from which the individual’s PSE and WR were computed and compared. This process (~15 min) was iterated until the PSE and WR were similar and stable across magnitudes. The final mean values (mean ± SD) were N_{mean} = 32 ± 3 dots and S_{mean} = 476 ± 58 mm² for Experiment 1, and N_{mean} = 32 ± 2 dots and S_{mean} = 490 ± 41 mm² for Experiment 2.

In the second session of both Experiments 1 and 2, participants first performed 30 trials per magnitude dimension, as in the first session, to ensure that their PSE and WR had remained unchanged. Only two participants in Experiment 1 required recalibration.

Participants then performed the magnitude estimation task *proper*, in which they provided a continuous estimate of a given magnitude by moving a cursor along a vertical axis whose extremes were the minimal and maximal values of the dimension. On each trial, participants were shown the written word ‘Durée’ (Duration), ‘Nombre’ (Number) or ‘Surface’ (Surface), which indicated the dimension they had to estimate (Figure 2A). At the end of a trial, the vertical axis appeared on the screen with the positions of ‘+’ and ‘−’ pseudo-randomly assigned to its two ends. The cursor always started at the middle of the axis. Participants used the mouse to move the slider vertically along the axis and clicked to validate their response. They were instructed to favor accuracy over speed. Trials were pseudo-randomized across dimensions and conditions.

In Experiment 1, five experimental conditions were tested per target dimension: in the control condition, both non-target dimensions were kept at their mean values; in each of the four remaining conditions, one non-target dimension was set to its minimal or maximal value while the other was kept at its mean (Table 1). A total of 1080 trials were tested (3 dimensions x 5 conditions x 6 values x 12 repetitions).
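The factorial structure of Experiment 1 can be checked by enumerating the trial list. A minimal sketch: the condition labels and the helper `exp1_conditions` are hypothetical, and a seeded shuffle stands in for the pseudo-randomization constraints, which are not specified in the text.

```python
import random

DIMS = ("D", "N", "S")
N_VALUES = 6   # six magnitude levels of the target dimension
N_REPS = 12

def exp1_conditions(target):
    """Control (both non-targets at mean) plus one non-target at min or max."""
    others = [d for d in DIMS if d != target]
    conditions = [{d: "mean" for d in others}]
    for varied in others:
        for level in ("min", "max"):
            cond = {d: "mean" for d in others}
            cond[varied] = level
            conditions.append(cond)
    return conditions

trials = [
    {"target": target, "nontarget": cond, "value": value, "rep": rep}
    for target in DIMS
    for cond in exp1_conditions(target)
    for value in range(N_VALUES)
    for rep in range(N_REPS)
]
random.Random(0).shuffle(trials)  # stand-in for the unspecified pseudo-randomization

print(len(trials))  # 1080
```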

In Experiment 2, there were two sensory accumulation regimes (FastSlow and SlowFast), and the emphasis was on the effect of duration on surface and number. Hence, the main control condition assessed the estimation of duration with S_{mean} and N_{mean}, and whether the rate of evidence delivery affected duration estimates. Similarly, for N and S, two control conditions investigated the effect of the rate of stimulus presentation without varying non-target dimensions. Given the results of Experiment 1, Experiment 2 investigated neither the effects of N or S on D, nor interactions between N and S. Ten experimental blocks alternated between FastSlow and SlowFast presentations, counterbalanced across participants. Each possible combination was repeated 12 times, yielding a total of 144 trials for D (2 distributions x 6 durations x 12 repetitions), 432 trials for N (2 distributions x 3 conditions x 6 numerosities x 12 repetitions) and 432 trials for S (2 distributions x 3 conditions x 6 surfaces x 12 repetitions), for a grand total of 1008 trials (Table 2).
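The trial counts for Experiment 2 can be checked by enumerating the design. A minimal sketch: the condition labels are hypothetical, the three conditions for N and S are our reading of Table 2 and of the D_{min}/D_{mean}/D_{max} analyses reported in the Results, and block structure and counterbalancing are omitted.

```python
from collections import Counter
from itertools import product

REGIMES = ("FastSlow", "SlowFast")
N_VALUES, N_REPS = 6, 12
# Assumed reading of Table 2: N and S are tested at D_min, D_mean and D_max,
# whereas D is tested only with N and S at their means.
DURATION_CONDS = ("Dmin", "Dmean", "Dmax")

def exp2_trials():
    trials = []
    for regime, value, rep in product(REGIMES, range(N_VALUES), range(N_REPS)):
        trials.append(("D", regime, "Nmean_Smean", value, rep))
    for target, regime, cond, value, rep in product(
        ("N", "S"), REGIMES, DURATION_CONDS, range(N_VALUES), range(N_REPS)
    ):
        trials.append((target, regime, cond, value, rep))
    return trials

counts = Counter(t[0] for t in exp2_trials())
print(counts["D"], counts["N"], counts["S"], sum(counts.values()))  # 144 432 432 1008
```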

### Analyses

To analyze the point of subjective equality (PSE) and the Weber Ratio (WR), participants’ continuous estimates were first transformed into categorical values: a click between the middle of the axis and the + (-) end was considered a + (-) response. Proportions of + responses were computed on a per-individual basis, separately for each dimension and each experimental condition, and were fitted with a logistic (logit) function in Matlab 8.4 on a per-individual basis. Goodness of fit was assessed individually, and participants for whom the associated p-values in the control conditions were >.05 were excluded from the analysis. Within each condition, PSE and WR values more than 2 standard deviations from the mean were discarded and replaced by the group mean (at most 2 values per condition across all individuals). Statistics were run using R (Version 3.2.2). PSE and WR were defined as:
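A minimal sketch of this categorical pipeline, under stated assumptions. Slider positions are taken to be already recoded so that 1 is the ‘+’ end (undoing the pseudo-random axis orientation), and a click exactly at the midpoint is counted as ‘-’. The logistic curve is fitted by maximum likelihood with Newton’s method (equivalently, iteratively reweighted least squares) rather than with the Matlab routine used in the study. The PSE is the 50% point; the WR follows one common convention, half the distance between the 25% and 75% points divided by the PSE, which may differ from the study’s exact definition. Function names are hypothetical.

```python
import numpy as np

def to_choices(slider_pos):
    """Binarize slider positions in [0, 1]: 1 = '+' response, 0 = '-' response."""
    return (np.asarray(slider_pos, dtype=float) > 0.5).astype(int)

def fit_logistic(x, n_plus, n_total, n_iter=100, tol=1e-10):
    """ML fit of P(+) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method (IRLS)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(n_plus, dtype=float)
    n = np.asarray(n_total, dtype=float)
    x_mean = x.mean()
    X = np.column_stack([np.ones_like(x), x - x_mean])  # centre x for stability
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (k - n * p)                          # binomial score
        hess = X.T @ (X * (n * p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    b1 = beta[1]
    b0 = beta[0] - b1 * x_mean                            # undo the centring
    return b0, b1

def pse_and_wr(b0, b1):
    pse = -b0 / b1                   # stimulus value where P(+) = 0.5
    half_spread = np.log(3.0) / b1   # (x75 - x25) / 2 for a logistic curve
    return pse, half_spread / pse    # assumed WR convention

# Usage with noise-free counts generated from a known curve (PSE = 800 ms).
levels = 800 * np.array([0.75, 0.90, 0.95, 1.05, 1.10, 1.25])
n_total = np.full(levels.size, 12.0)
p_true = 1.0 / (1.0 + np.exp(-0.02 * (levels - 800.0)))
b0, b1 = fit_logistic(levels, n_total * p_true, n_total)
pse, wr = pse_and_wr(b0, b1)
```

With real data, `n_plus` would be the per-level sum of `to_choices(...)` over the 12 repetitions of each value.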

Additionally, continuous estimates were analyzed to interrogate central tendency effects. For each magnitude dimension, continuous estimates were expressed as the relative position on the slider that participants selected on each trial, with higher percentages indicating closer proximity to ‘+’. To measure central tendency effects, continuous estimates were plotted against the corresponding magnitude for each condition, also expressed as a percentage (where 0 indicated the smallest magnitude and 100 the largest), and fitted with a linear regression; the slope and y-intercept of the best-fitting line were then extracted [46, 47]. Slope values closer to 1 indicated veridical responding (participants responded with perfect accuracy), whereas values closer to 0 indicated complete regression to the mean (participants provided the same estimate for every magnitude). Intercept values, in turn, could indicate an overall bias toward over- or under-estimation. To compare central tendency effects between magnitude dimensions, correlation matrices of slope values for D, N, and S were constructed. Bonferroni corrections were applied to control for multiple comparisons.
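The slope and intercept extraction and the slope correlation matrix can be sketched as follows, assuming one participant’s estimates and magnitudes are already on the 0 to 100 scale described above. `np.polyfit` with degree 1 stands in for whichever regression routine was actually used, and the Bonferroni threshold here simply divides alpha by the number of unique pairs in the matrix. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def central_tendency_fit(magnitude_pct, estimate_pct):
    """Slope and intercept of estimates regressed on magnitudes (both in %)."""
    slope, intercept = np.polyfit(magnitude_pct, estimate_pct, 1)
    return slope, intercept

def slope_correlations(slopes, alpha=0.05):
    """Pearson correlation matrix of per-participant slopes (one column per
    dimension) and the Bonferroni-corrected alpha for its unique pairs."""
    r = np.corrcoef(slopes, rowvar=False)
    k = r.shape[0]
    n_pairs = k * (k - 1) // 2
    return r, alpha / n_pairs

# Synthetic participant: estimates regress toward 50 with a slope of 0.6.
magnitudes = np.array([0, 30, 40, 60, 70, 100], dtype=float)
estimates = 20.0 + 0.6 * magnitudes
slope, intercept = central_tendency_fit(magnitudes, estimates)

# Synthetic slopes for 17 participants x 3 dimensions (D, N, S):
# D and N share a common component, S is generated independently.
rng = np.random.default_rng(0)
shared = rng.normal(0.6, 0.1, size=17)
slopes = np.column_stack([shared + rng.normal(0, 0.02, 17),
                          shared + rng.normal(0, 0.02, 17),
                          rng.normal(0.4, 0.1, 17)])
r, alpha_corrected = slope_correlations(slopes)
```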

## RESULTS

To examine Bayesian effects in the magnitude system, we evaluated both choice and continuous judgments in two magnitude estimation experiments using variations of the same paradigm (Figure 2A). To evaluate choice responses, continuous estimates were binned according to which end of the scale they were closer to. Previous work has demonstrated that bisection tasks and continuous estimations are compatible and provide similar estimates of duration [43, 48]. Our intention was thus to first replicate the effects of Lambrechts and colleagues [13] with a modified design, and second, to measure central tendency effects in our sample to examine whether these effects correlated between magnitude dimensions, which would suggest the existence of global priors ([4], Figure 2D).

### Control conditions: matching task difficulty across magnitude dimensions

Two independent repeated-measures ANOVAs with the PSE or WR as dependent variables using magnitude dimensions (3: D, N, S) and control conditions (3: Linear (Experiment 1), SlowFast and FastSlow (Experiment 2) distributions) as within-subject factors did not reveal any significant differences (all *p* >.05). This suggested that participants’ ability to discriminate the different values presented in the tested magnitudes was well matched across magnitude dimensions (Figure 2C).

### Experiment 1: Duration affects Number and Surface estimates

We first analyzed the data of Experiment 1 as categorical choices. Figure 3A illustrates the grand average estimations of duration, numerosity, and surface for all experimental manipulations (colored traces) along with changes of PSE (insets).

Separate 2x2 repeated-measures ANOVAs were run on the PSE for each target magnitude dimension, with non-target magnitude dimension (2) and magnitude value (2: min, max) as within-subject factors. No main effects of non-target magnitude dimension (F[1,16] = 0.078, *p* = 0.780), magnitude value (F[1,16] = 0.025, *p* = 0.875) or their interaction (F[1,16] = 0.003, *p* = 0.957) were found on duration (D), indicating that manipulating N or S did not change participants’ estimation of duration (Fig. 3A, left panel). In the estimation of N, main effects of non-target dimension (F[1,16] = 7.931, *p* = 0.0124) and magnitude value (F[1,16] = 25.53, *p* = 0.000118), as well as their interaction (F[1,16] = 23.38, *p* = 0.000183), were found. Specifically, when the non-target dimensions were at their minimal value, the PSE obtained in the estimation of N was lower than when the non-target dimensions were at their maximal value. Additionally, in N estimation, D_{min} lowered the PSE more than S_{min}, and D_{max} raised the PSE more than S_{max}. Paired t-tests contrasting the PSE obtained in the estimation of N during the control (D_{mean} S_{mean}) and other experimental conditions showed that D_{min} significantly increased [PSE(D_{min}) < PSE(D_{mean}): *p* = 0.000041] whereas D_{max} significantly decreased [PSE(D_{max}) > PSE(D_{mean}): *p* = 0.0032] the perceived number of dots (Fig. 3A, middle panel, inset). There were no significant effects of S on the estimation of N. Altogether, these results suggest that the main effect of non-target dimension on numerosity estimation was driven by the duration of the stimuli.

In the estimation of S, we found no main effect of non-target magnitude dimension (F[1,16] = 1.571, *p* = 0.228) but a significant main effect of magnitude value (F[1,16] = 22.63, *p* = 0.000215). The interaction was marginally significant (F[1,16] = 3.773, *p* = 0.0699), suggesting that, as for N, a single non-target magnitude dimension may drive the effect of magnitude value. Paired t-tests contrasting the PSE obtained in the estimation of S during the control (D_{mean} N_{mean}) and other conditions showed that D_{min} significantly increased (PSE(D_{min}) < PSE(D_{mean}): *p* = 0.00087), whereas D_{max} significantly decreased (PSE(D_{max}) > PSE(D_{mean}): *p* = 0.035) the perceived surface (Fig. 3A, right panel). No significant effects of N on S were found. As for the estimation of N, these results suggest that the effect of non-target magnitude on the estimation of S was entirely driven by duration.

Overall, the PSE analysis indicated that participants significantly overestimated N and S when dots were presented over the shortest duration, and underestimated N and S when dots accumulated over the longest duration. In contrast, manipulating N or S did not significantly alter the estimation of duration, and no significant interactions between N and S were found. To ensure that these results could not be accounted for by changes in participants’ perceptual discriminability over the course of the experiment, repeated-measures ANOVAs were conducted separately for each target dimension (D, N, S), with the Weber Ratio (WR) as the dependent variable and experimental condition (5 levels) as within-subject factor. No significant differences were found (all *p* > .05), suggesting that WRs were stable over time and that task difficulty remained well matched across dimensions over the course of the experiment.

For the analysis of continuous estimates (Figure 3B), we first examined the central tendency effect for each target magnitude dimension, collapsing across the non-target magnitudes (Figure 4A). A repeated measures ANOVA of slope values with magnitude as a within-subjects factor revealed a main effect of target magnitude [F(2,32) = 13.284, *p* = 0.000063]. Post-hoc paired t-tests showed that this effect was driven by a lower slope for S, indicating greater regression to the mean, compared to D [*t*(16) = 3.495, *p* = 0.003] and N [*t*(16) = 5.773, *p* = 0.000029], with no difference in slope values between D and N [*t*(16) = 0.133, *p* = 0.896] (Figure 4B).
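For reference, the one-way repeated-measures F statistic used to compare slopes across D, N and S can be computed by partitioning the sums of squares; with 17 participants and 3 dimensions the degrees of freedom are (2, 32), matching those reported. A minimal sketch that applies no sphericity correction (the text does not mention one, so this is an assumption); the toy data are hypothetical.

```python
import numpy as np

def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA on y with shape (n_subjects, k_conditions).
    Returns F and its degrees of freedom; no sphericity correction is applied."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_cond = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between conditions
    ss_subj = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between subjects
    ss_total = np.sum((y - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj               # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return float(f), df_cond, df_error

toy = [[1, 2, 6],
       [2, 4, 6],
       [3, 3, 9]]
print(rm_anova_oneway(toy))  # (21.0, 2, 4)
```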

Further analyses revealed findings comparable to those of the categorical analysis: separate 2x3 repeated measures ANOVAs were run for each magnitude dimension, with non-target magnitude dimension and its magnitude value as within-subject factors. Analysis of slope values revealed no significant main effects or interactions for any of the tested magnitudes (all *p* >.05), indicating no change in central tendency as a function of the non-target magnitudes. However, an analysis of intercept values demonstrated a significant main effect of the non-target dimension for S [*F*(2,32) = 24.571, *p* < 0.00001] and N [*F*(2,32) = 39.901, *p* < 0.00001], but not for D [*F*(2,32) = 0.010, *p* = 0.99]. Specifically, intercept values were shifted lower (higher) when D_{max} (D_{min}) was the non-target dimension in both the S and N tasks (Figure 3B, red hues), but not when the non-target dimension was N for S, S for N, or either S or N when D was the target magnitude dimension.

To examine central tendency effects across magnitude dimensions, we correlated slope values between target magnitude dimensions (Figure 4C). Collapsing across the non-target dimensions, we found that all three slope values correlated significantly with one another [D to S: Pearson *r* = 0.594; D to N: *r* = 0.896; S to N: *r* = 0.662]. Given that S exhibited a greater central tendency than D or N, we compared the Pearson correlation coefficients with Fisher’s z-test for differences between correlations; this analysis revealed that the D to N correlation was significantly higher than the D to S correlation [Z = 2.03, *p* = 0.04], and marginally higher than the S to N correlation [Z = 1.73, *p* = 0.083], suggesting that D and N, which had similar slope values, were also more strongly correlated with each other than with S. To further explore this possibility, we computed partial Pearson correlations of slope values: the only correlation to remain significant was D to N when controlling for S [*r* = 0.8352], whereas D to S when controlling for N, and N to S when controlling for D, were no longer significant [*r* = 0.0018 and 0.3627, respectively].
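The Fisher comparison and the partial correlations can be recomputed from the three reported coefficients. A minimal sketch, assuming n = 17 for each coefficient and the independent-samples form of Fisher’s test; because the two correlations share participants, this form is an approximation, but it reproduces the reported Z values. The inputs are rounded to three decimals, so the partial correlations come out close to, but not exactly at, the reported values.

```python
import math

def fisher_z_diff(r1, r2, n1, n2):
    """Z statistic and two-tailed p for the difference between two correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-tailed normal p-value
    return z, p

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_DS, r_DN, r_SN = 0.594, 0.896, 0.662
n = 17

z_dn_ds, p_dn_ds = fisher_z_diff(r_DN, r_DS, n, n)  # Z ~ 2.03, p ~ 0.04
z_dn_sn, p_dn_sn = fisher_z_diff(r_DN, r_SN, n, n)  # Z ~ 1.73, p ~ 0.08

dn_given_s = partial_r(r_DN, r_DS, r_SN)  # ~ 0.83
ds_given_n = partial_r(r_DS, r_DN, r_SN)  # ~ 0.00
sn_given_d = partial_r(r_SN, r_DS, r_DN)  # ~ 0.36
```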

The correlation analysis thus revealed that D and N slopes were highly correlated, indicating that individual participants exhibited a similar degree of central tendency for these two magnitude dimensions (Figure 4A). To explore this at a more granular level, we expanded our correlation analysis to include all non-target dimensions (Figure 4C). This analysis, with a conservative Bonferroni correction for multiple comparisons (*r* > 0.8), confirmed the above results: D and N slopes were correlated across most non-target dimensions, whereas both were weakly and non-significantly correlated with S. This finding suggests that D and N estimation may rely on a shared prior that is separate from S; however, a shared (D, N) prior would explain neither why D estimates were unaffected by changes in N, nor why S estimates were affected by changes in D.

Lastly, we sought to compare the quantifications based on continuous data with those from the categorical analysis. Previous work has demonstrated that the WR on a temporal bisection task correlates with the central tendency effect in temporal reproduction [43]. To test this here, we measured the correlations between the slope values of continuous magnitude estimates and the WR from the categorical analysis. As predicted, we found significant negative correlations between slope and WR for D (*r* = -0.69) and N (*r* = -0.57), indicating that greater central tendency (lower slope values) was associated with increased variability (larger WR); the correlation for S, however, failed to reach significance (*r* = -0.41, *p* = 0.1). This finding is notable, as the analysis of WR values did not reveal any difference between magnitude dimensions, whereas the analysis of slope values did. This suggests that the slope of continuous estimates may be a more sensitive measure of perceptual uncertainty than the WR derived from categorical responses.
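The significance of a Pearson correlation follows from t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below is minimal and dependency-free, assuming n = 17 participants as in Experiment 1. It integrates the Student t density numerically with Simpson’s rule and inverts it by bisection to obtain the critical |r| for a Bonferroni-corrected alpha. The number of comparisons behind the r > 0.8 threshold is not stated, so `n_comparisons` is left as a parameter; all function names are hypothetical.

```python
import math

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of Student's t, by Simpson's rule on the density over [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(abs(t))
    s += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)

def r_to_p(r, n):
    """t statistic and two-tailed p-value for a Pearson correlation r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, t_two_tailed_p(t, df)

def critical_r(n, alpha=0.05, n_comparisons=1):
    """Smallest |r| significant at the Bonferroni-corrected alpha (bisection on r)."""
    target = alpha / n_comparisons
    lo, hi = 0.0, 0.999999
    for _ in range(60):
        mid = (lo + hi) / 2
        if r_to_p(mid, n)[1] > target:
            lo = mid   # not yet significant: critical r is larger
        else:
            hi = mid
    return hi

for label, r in (("D", -0.69), ("N", -0.57), ("S", -0.41)):
    t, p = r_to_p(r, 17)
    print(f"{label}: t(15) = {t:.2f}, p = {p:.3f}")
```

Bisection is used because the p-value decreases monotonically with |r|, so no closed-form inverse of the t distribution is needed.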

### Experiment 2: Duration, but not N or S, is robust to accumulation rate

In Experiment 2, participants estimated D, N or S while the accumulation regime was manipulated as either FastSlow or SlowFast (Fig. 2B, Table 2). As before, we systematically analyzed both categorical and continuous reports. First, we tested the effect of accumulation regime on the estimation of each magnitude dimension using a 2x3 repeated-measures ANOVA with the PSE measured in control conditions (Table 2, first row) as dependent variable and distribution (2: FastSlow, SlowFast) and magnitude dimension (3: N, D, S) as within-subject factors. Marginal main effects of accumulation regime (F[1,28] = 2.872, *p* = 0.0734) and magnitude dimension (F[1,14] = 4.574, *p* = 0.0506) were observed, and their interaction was significant (F[2,28] = 10.54, *p* = 0.0004). Post-hoc t-tests revealed no significant effect of accumulation regime on the estimation of D (*p* = 0.23), but significant effects of accumulation regime on the estimation of N (*p* = 0.016) and S (*p* = 0.0045) (Fig. 5A).

Second, we tested the effects of D and accumulation regime on the estimation of N and S (Figure 5A, top insets). We conducted a 2x2x2 repeated-measures ANOVA with PSE as dependent variable and magnitude dimension (2: N, S), accumulation regime (2: FastSlow, SlowFast), and duration (2: D_{min}, D_{max}) as within-subject factors. Main effects of accumulation regime (F[1,14] = 22.12, *p* = 0.000339) and duration (F[1,14] = 27.65, *p* = 0.000121) were found, suggesting that both N and S were affected by the distribution of sensory evidence over time and by the duration of sensory evidence accumulation. No other main effects or interactions were significant, although two interactions trended towards significance: the two-way interaction between accumulation regime and duration (F[1,14] = 3.482, *p* = 0.0831) and the three-way interaction between dimension, accumulation regime, and duration (F[1,14] = 3.66, *p* = 0.0764). These trends were likely driven by the SlowFast condition, as can be seen in Figure 5A.

For the analysis of continuous data (Figure 5B), we first examined overall differences in slope values between accumulation regimes (FastSlow *vs.* SlowFast) across all three target magnitude dimensions (D, N, S). A 3x2 repeated measures ANOVA with these within-subjects factors revealed a main effect of magnitude [*F*(2,32) = 7.878, *p* = 0.002], with S once again showing the lowest slope value, but no effect of accumulation regime and no interaction (both *p* >.05), suggesting that the rate of accumulation did not influence the central tendency effect overall. However, on the basis of our *a priori* hypothesis, post-hoc tests revealed a significantly lower slope value for N in SlowFast compared to FastSlow [*t*(17) = 3.067, *p* = 0.007], suggesting that participants exhibited more central tendency for numerosity when the accumulation rate was slow early in the trial (Figure 5B). The analysis of intercept values did not reveal any effects of accumulation regime or magnitude (all *p* >.05). However, on the basis of our *a priori* hypothesis, post-hoc tests demonstrated that S exhibited a significantly lower intercept for SlowFast compared to FastSlow [*t*(17) = 3.609, *p* = 0.002], with no changes for either D or N (both *p* >.05), indicating that participants underestimated surface when evidence accumulated slowly early in the trial.

For S and N, we further examined slope values across the three possible durations using a 2x2x3 repeated measures ANOVA with magnitude (S, N), accumulation regime (FastSlow, SlowFast), and duration (D_{min}, D_{mean}, D_{max}) as within-subjects factors. This revealed a significant main effect of magnitude [*F*(2,32) = 7.717, *p* = 0.013] and of accumulation regime [*F*(2,32) = 11.345, *p* = 0.004], but not of duration [*F*(2,32) = 1.403, *p* = 0.261]. The same analysis on intercept values revealed no main effect of magnitude [*F*(2,32) = 1.296, *p* = 0.272], but significant effects of accumulation regime [*F*(2,32) = 5.540, *p* = 0.032] and of duration [*F*(2,32) = 21.103, *p* = 0.000001]. More specifically, intercept values were lower for longer durations, indicating greater underestimation when the interval tested was longer. No other effects reached significance (all *p* >.05).

Overall, these findings indicate that, as in Experiment 1, duration estimates were immune to changes in the rate of accumulation of non-target magnitudes. Similarly, estimates of S and N were again affected by duration as a non-target magnitude, with longer durations associated with greater underestimation of S and N (Figure 6). In addition, our results demonstrate a difference between accumulation regimes for S and N, with SlowFast regimes associated with greater underestimation than FastSlow regimes, regardless of duration. Lastly, SlowFast accumulation regimes increased the central tendency effect, suggesting that slower rates of accumulation may increase reliance on the magnitude priors.

## Discussion

In this study, we report that when sensory evidence accumulates equally over time and task difficulty is equated across magnitude dimensions (space, time, number), duration estimates are resilient to manipulations of number and surface, whereas number and surface estimates are biased by the temporal properties of sensory evidence accumulation. Specifically, we replicated the findings of Lambrechts and colleagues [13] by demonstrating that number and surface are underestimated when presented for long durations and overestimated when presented for short durations. As we did not find robust bidirectional interactions between dimensions, these findings do not support the idea of a common magnitude system in the brain. However, we do not claim that time, number, and space cannot interact under certain constraints. Under a Bayesian model relying on multiple priors (one for each dimension), magnitudes may interact when they provide conflicting sensory cues. Recent hypotheses suggest that a Bayesian framework can provide a general explanation for the variety of behavioral features observed in magnitude estimation, whether applied to distance, loudness, numerical, or temporal judgments [4]. This framework combines an estimate of the likelihood (sensory input) with a prior representation (memory). One major goal of our study was thus to determine the degree to which different magnitude dimensions might rely on an amodal, global prior representation of magnitude, as would be expected in a generalized magnitude system such as ATOM [2]. To this end, participants took part in two experiments that independently manipulated the congruence across magnitude dimensions (Experiment 1) and the rate of sensory evidence provided to participants (Experiment 2).

A first prediction was that if different magnitude dimensions rely on a single amodal prior, then magnitude estimates should exhibit similar levels of central tendency across dimensions (duration, surface, number; Figure 2D). Instead, in Experiment 1, surface estimates exhibited greater central tendency than either duration or number estimates, and the degree of central tendency for surface was not correlated with that of either other dimension. However, duration and number did exhibit correlated central tendency effects. This suggests that estimates of surface are distinct from estimates of duration and number, whereas duration and number may be more similar to one another. Indeed, neural recording studies in the prefrontal and parietal cortex of non-human primates have revealed overlapping, yet largely separate, representations of duration and size [49, 50], and of number and size [51, 52], respectively. Further, while number, size, and time commonly activate the right parietal cortex, each engages a larger network of regions beyond this area [37, 53, 54]. For size estimates, recent work suggests that comparisons of object size draw on expectations from prior experience in other brain regions [55]. Yet, as no strong bidirectional effects were observed between duration and number, it is unlikely that duration and number share neuronal populations with similar tuning properties.

Another possible interpretation of the results of Experiment 1 involves multiple priors in magnitude estimation. When participants make temporal judgments, combining prior knowledge P(π) with noisy sensory input P(D|π) (the duration of the given trial) yields a posterior estimate P(π|D) ∝ P(D|π) · P(π) (see [4], Box 3 for details), which accounts for regression to the mean. Neither numerosity nor surface priors appear in this equation, which could explain why duration estimates are robust to numerosity or surface manipulations. Because numerosity and surface accumulate over time, one possible strategy for participants is to estimate numerosity and surface from the speed of stimulus presentation and the duration of the trial (i.e., a high (low) speed and a long (short) presentation correspond to a large (small) numerosity or surface). Under this hypothesis, uncertainty in the temporal dimension may add noise to the decision or accumulation process, so that the perceived duration of the trial can bias numerosity and surface estimates (Figure 6). When numerosity and surface accumulate over a given duration, the length of that duration will therefore affect the accuracy of participants' estimates. Because short (long) durations were overestimated (underestimated), this may explain why participants overestimated (underestimated) numerosity and surface in the D_{min} (D_{max}) condition. This explanation is compatible with the hypothesized effect of duration as a source of noise in sensory accumulation.
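The regression to the mean implied by P(π|D) ∝ P(D|π) · P(π) can be illustrated with the common textbook case in which both prior and likelihood are Gaussian. This Gaussian assumption, the function name `gaussian_posterior`, and the numerical values below are illustrative choices, not parameters estimated in this study. Under a multiple-priors account, each dimension (duration, numerosity, surface) would carry its own prior mean and width.

```python
def gaussian_posterior(measurement, sensory_sd, prior_mean, prior_sd):
    """Posterior mean and SD for a Gaussian prior combined with a Gaussian likelihood.

    The posterior mean is a precision-weighted average of the measurement and
    the prior mean, so estimates are pulled toward the prior mean (central
    tendency); the noisier the measurement, the stronger the pull.
    """
    if sensory_sd <= 0 or prior_sd <= 0:
        raise ValueError("standard deviations must be positive")
    sensory_var, prior_var = sensory_sd ** 2, prior_sd ** 2
    # Weight given to the prior grows with sensory (likelihood) variance.
    w_prior = sensory_var / (sensory_var + prior_var)
    post_mean = w_prior * prior_mean + (1.0 - w_prior) * measurement
    post_var = sensory_var * prior_var / (sensory_var + prior_var)
    return post_mean, post_var ** 0.5


# Hypothetical duration prior (mean, sd) in seconds; under a multiple-priors
# account, numerosity and surface would each have their own (mean, sd) pair.
DURATION_PRIOR = (1.0, 0.3)

for measured in (0.6, 1.0, 1.4):
    estimate, _ = gaussian_posterior(measured, 0.3, *DURATION_PRIOR)
    print(f"measured {measured:.1f} s -> estimated {estimate:.2f} s")
```

With a prior centred on 1.0 s and equal prior and sensory SDs of 0.3 s, a measured 0.6 s is pulled up to 0.8 s and a measured 1.4 s down to 1.2 s; doubling the sensory SD strengthens the pull toward the prior.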

Indeed, one noteworthy aspect unique to the time dimension is that its objective rate of presentation is fixed [56]: objective time, as conventionally measured, proceeds at a single mean rate. In contrast, the rate at which information is presented for number and surface can be manipulated experimentally. In Experiment 1, to keep the values of surface and number fixed while duration was manipulated, we necessarily had to change the rate of accumulation of these values. For example, for the same number presented over a short and a long duration, the rate of accumulation had to differ so that the same total value was reached at the end of each duration. This may explain the incongruent effects of duration on surface and number: shorter (longer) durations may engender larger (smaller) estimates of surface and number because the rate of accumulation is faster (slower). In this view, surface and number would not be influenced by the magnitude of time as a dimension *per se*; rather, time would interfere with the rate of accumulation, making the effect of duration an epiphenomenon of the experimental design. To test this hypothesis, we modulated the accumulation rate of numerosity and surface presentation in Experiment 2.
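The confound described above reduces to simple arithmetic: if a fixed total is delivered at a constant rate, the rate equals the total divided by the duration, so a shorter trial delivers more by any given moment. The sketch assumes linear (constant-rate) accumulation as in this simplified description; the function names and the totals and durations in the example are hypothetical, not the stimulus values of Experiment 1.

```python
def required_rate(total, duration):
    """Constant accumulation rate (units per second) that reaches `total` at `duration`."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return total / duration


def constant_rate_cumulative(t, total, duration):
    """Amount delivered by time t when `total` is spread evenly over `duration`."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    t = min(max(t, 0.0), duration)  # clamp to the trial window
    return total * t / duration
```

For a hypothetical total of 20 dots, a 1 s presentation requires 20 dots/s and a 2 s presentation 10 dots/s; after 0.5 s, the shorter trial has already delivered 10 dots versus 5, even though both end at 20.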

In Experiment 2, the rate of accumulation for number and surface changed either from fast to slow, slightly before the halfway point of the presentation (FastSlow), or from slow to fast, slightly after it (SlowFast). Under these conditions, we replicated and extended our findings of Experiment 1. Specifically, we again found that shorter (longer) durations led to larger (smaller) estimates of surface and number, regardless of the accumulation regime. However, we also found a difference between accumulation regimes: when the rate of accumulation was slower at the beginning of the trial, estimates of surface and number were smaller than when it was faster. Importantly, the final value of the presented surface and number was the same regardless of the accumulation regime. Participants' estimates were thus biased by the rate of evidence accumulation in the first half of the trial, regardless of how long the trial lasted. This suggests that human observers are biased by the rate of accumulation at the start of a trial and are resistant to subsequent changes in rate.
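The two regimes can be sketched as piecewise-linear accumulation with a single change point, with rates scaled so that both reach the same final value. The change times (0.9 s and 1.1 s within a 2 s trial), the rate ratios, the 30-unit total, and the function name `piecewise_cumulative` are hypothetical and are not taken from Experiment 2.

```python
def piecewise_cumulative(t, total, duration, change_time, rate_ratio):
    """Cumulative amount presented by time t under a two-rate regime.

    rate_ratio is (rate before the change point) / (rate after it):
    values above 1 give a FastSlow regime, values below 1 a SlowFast regime.
    Rates are scaled so that the amount at t == duration always equals total.
    """
    if not 0.0 < change_time < duration:
        raise ValueError("change_time must fall strictly inside the trial")
    if rate_ratio <= 0 or total < 0:
        raise ValueError("rate_ratio must be positive and total non-negative")
    # Solve total = early_rate * change_time + late_rate * (duration - change_time)
    # subject to early_rate = rate_ratio * late_rate.
    late_rate = total / (rate_ratio * change_time + (duration - change_time))
    early_rate = rate_ratio * late_rate
    t = min(max(t, 0.0), duration)  # clamp to the trial window
    if t <= change_time:
        return early_rate * t
    return early_rate * change_time + late_rate * (t - change_time)


# Hypothetical regimes for a 2 s trial delivering 30 units in total.
REGIMES = {
    "FastSlow": dict(change_time=0.9, rate_ratio=2.0),  # change slightly before halfway
    "SlowFast": dict(change_time=1.1, rate_ratio=0.5),  # change slightly after halfway
}
```

In this toy example, both regimes deliver 30 units by the end of the 2 s trial, but at 0.9 s the FastSlow regime has delivered about 18.6 units versus about 9.3 for SlowFast, which is the kind of early-rate difference that estimates appeared to track.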

This observation is important in the context of ongoing discussions of drift-diffusion processes, in which the accumulation of evidence following the first end-point depends on terminated processes and guess probability ([57] case study 1 and Figure 1; [58] Figure 2, Figure 4A), and of the importance of change points during the accumulation process [59, 60]. Additionally, this finding strengthens the hypothesis that the effect of duration on surface and numerosity may arise from its impact on implicit timing or the accumulation rate, rather than from the explicitly perceived duration. This would be consistent with recent findings suggesting that memory noise, known to scale with duration, was not the primary source of errors in decision-making, whereas noise in sensory evidence was a major contributor [61]. Our results suggest that speeding up the rate of evidence accumulation and lengthening the duration of a trial may both be equivalent to increasing noise in the sensory accumulation of other magnitude dimensions (Figure 6).

In Experiment 2, we also investigated the effect of accumulation regime on central tendency, as participants again provided continuous magnitude estimates on a vertical sliding scale. Previous magnitude studies using continuous estimates have demonstrated a central tendency effect, with over- (under-) estimation of small (large) magnitudes [4], the degree of which depends on the uncertainty inherent in judging the magnitude in question [43, 62]. Our analysis revealed that when the rate of accumulation was slow (fast) at the beginning of the trial, the degree of central tendency was greater (smaller); further, the objective duration of the trial did *not* affect central tendency. This suggests that slower accumulation regimes engender greater uncertainty in magnitude estimates, and that this uncertainty may arise before the final decisional value is reached. Previous work on decision-making with evidence accumulation has suggested that the objective duration of a trial leads to greater reliance on prior estimates, as longer presentation times are associated with greater uncertainty [63]. Specifically, Hanks and colleagues [63] found that a drift-diffusion model incorporating a bias signal toward the prior, which grows throughout the trial, could explain reaction time differences in a dots-motion discrimination task. Notably, the bias signal is incorporated into the drift-diffusion process, such that longer trials push the accumulation process toward a particular value, depending on the prior. A critical manipulation in that study was the emphasis placed on speed or accuracy: increased emphasis on accuracy led to longer decision times and greater reliance on the prior, as explained by the model. Our results suggest otherwise: the duration of the trial alone cannot determine reliance on the prior.
If duration alone drove reliance on the prior, central tendency effects should have increased with longer durations, which did not occur in either Experiment 1 or Experiment 2. Instead, the rate of evidence accumulation determined reliance on the prior(s), regardless of duration, with slower rates leading to greater reliance.
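For comparison, the kind of model discussed above can be sketched as a drift-diffusion process in which a bias toward the prior grows with elapsed time. This is a simplified toy version, not the model fitted by Hanks and colleagues [63]: the linear bias term `bias_growth * t`, the symmetric bounds, the function names, and all parameter values are assumptions chosen for illustration. In such a model, the prior's influence depends on how long the trial lasts, which is the prediction our central tendency results did not support.

```python
import random


def simulate_trial(drift, bias_growth, bound=1.0, noise_sd=1.0,
                   dt=0.01, max_time=5.0, rng=None):
    """Simulate one drift-diffusion trial with a time-growing bias (toy model).

    The decision variable is the accumulated noisy evidence plus a bias term,
    bias_growth * t, that increasingly favours the upper (prior-consistent)
    bound as time elapses. Returns (choice, decision_time), where choice is
    +1 (upper bound), -1 (lower bound) or 0 (no bound reached by max_time).
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        evidence += drift * dt + noise_sd * dt ** 0.5 * rng.gauss(0.0, 1.0)
        t += dt
        decision_variable = evidence + bias_growth * t
        if decision_variable >= bound:
            return 1, t
        if decision_variable <= -bound:
            return -1, t
    return 0, t


def prior_consistent_fraction(bias_growth, n_trials=1000, seed=0):
    """Fraction of bounded trials ending at the prior-consistent (upper) bound.

    Evidence is uninformative (drift = 0), so any asymmetry comes from the bias.
    """
    rng = random.Random(seed)
    choices = [simulate_trial(0.0, bias_growth, rng=rng)[0] for _ in range(n_trials)]
    decided = [c for c in choices if c != 0]
    if not decided:
        raise RuntimeError("no trial reached a bound; increase max_time")
    return sum(1 for c in decided if c == 1) / len(decided)
```

With uninformative evidence, roughly half of the trials end at each bound when `bias_growth` is 0, whereas a positive `bias_growth` shifts most choices toward the prior-consistent bound, with the bias contribution growing the longer a trial runs.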

Additional studies using neuroimaging techniques such as M/EEG are needed to investigate the neural correlates underpinning accumulation processes when estimating magnitudes (for example, the Centro-Parietal Positivity; see [64, 65]) and to fully explain the behavioral results obtained in these two experiments. Further, fMRI studies are needed to elucidate the neural circuits supporting memory representations of different magnitudes [47]. Bayesian approaches may provide interesting perspectives on magnitude estimation, and additional studies are needed to understand to what extent these models can explain the variety of results observed in the literature. One intriguing observation is that duration estimates were resilient not only to changes in numerosity or surface but also to changes in the rate of sensory evidence. This finding is unexpected and runs counter to various findings in time perception. In this task, these robust findings suggest that, unlike surface and numerical estimates, duration may not rely on the accumulation of discretized sensory evidence.

## DATA ACCESSIBILITY

Data will be made available through the Open Science Framework.

## AUTHORS’ CONTRIBUTIONS

B.M., M.W. and V.v.W. designed the study. B.M. collected the data. B.M., M.W. and V.v.W. analyzed the data. All authors co-wrote the manuscript and gave final approval for publication.

## COMPETING INTERESTS

The authors declare no competing interests.

## FUNDING

This work was supported by an ERC grant (ERC-YStG-263584) to V.v.W.

## ACKNOWLEDGMENTS

We thank members of UNIACT at NeuroSpin for their help in recruiting and scheduling participants, and Baptiste Gauthier for discussions.