## Summary

A precise estimation of event timing is essential for survival.^{1} Yet, temporal distortions are ubiquitous in our daily sensory experience.^{2} A specific type of temporal distortion is the time order error (TOE), which occurs when estimating the duration of events organized in a series.^{3} TOEs shrink or dilate objective event duration. Understanding the mechanics of subjective time distortions is fundamental since we perceive events in a series, not in isolation. In previous work,^{4} we showed that TOEs appear when discriminating small duration differences (20 or 60 ms) between two short events (Standard, S and Comparison, C), but only if the interval between events is shorter than 1 second. TOEs have been variously attributed to sensory desensitization,^{5,6} reduced temporal attention,^{7,4} poor sensory weighting of C relative to S,^{8} or idiosyncratic response bias.^{6}

Surprisingly, the serial dynamics of *relative* event duration were never considered as a factor generating TOEs. In two experiments we tested them by swapping the order of presentation of S and C. Bayesian hierarchical modelling showed that TOEs emerge when the first event in a series is shorter than the second event, independently of event type (S or C), sensory precision or individual response bias. Participants disproportionately expanded first-position shorter events. Significantly fewer errors were made when the first event was objectively longer, confirming the inference of a strong bias in perceiving ordered event durations. Our finding identifies a hitherto unknown duration-dependent encoding inefficiency in human serial perception.

## Results

In two separate experiments, we asked human participants to discriminate the duration of a Standard (S) event against that of a Comparison (C) event (or vice versa), and decide which one was longer (two-interval forced choice design, 2IFC). The S event was displayed in first position in Experiment 1, but was shifted to second position in Experiment 2 (**Fig 1a**). To signal the onset and offset of each event, we used a blue disk as a visual stimulus (see STAR Methods). The interval between cue and first stimulus onset was randomly chosen between 400 and 800 ms. The duration of the S event (120 ms) was kept constant, while the duration of the C event varied, providing participants with three degrees of sensory evidence (weak ±𝛥20; medium ±𝛥60; strong ±𝛥100; **Fig 1b**). Temporal attention was manipulated by parametrically increasing the Inter-Stimulus Interval (ISI) between S and C (or C and S): 400, 800, 1600, and 2000 ms.

We hypothesized that with an ever-changing C in first position (second experiment) the magnitude of time order error (TOE) would increase, as participants would not benefit from orienting attention in time to C in the second position (first experiment). Hence, we expected an interaction between the factors Stimulus presentation order and the orienting of attention in time.^{9}

### Sensory precision and temporal distortion profiles

Responses were modeled using individual sigmoid psychometric functions for each ISI level, from which we obtained the midpoint *μ* and the slope *β* of each curve (**Fig 1c**). After fitting the responses, we determined the point of subjective equality (PSE) between S and C, which corresponds to the magnitude *μ*. We analyzed precision, indexed by the just noticeable difference (JND), and the cumulative effect of time order errors (TOEs), indexed by the constant error (CE). CE is a global index of temporal distortion. We computed the JND as *β* multiplied by log(3),^{10,5} and the CE as the difference between the physical magnitude *ϕ*_{s} of the Standard event and *μ*.

To test changes in the dependent variables as a function of factors ISI and Stimulus presentation order (S first or C first), we quantified the effect of both factors by implementing Bayesian Model Comparison. To do that, we applied a Bayes factor approach to a mixed ANOVA using Bayes’ rule.^{11,12,13} We built four alternative models: the *M*_{ISI} model (using the ISI as predictor), the *M*_{E} model (using Experiment as predictor, that is Standard position), the *M*_{ISI+E} (using both factors as predictors), and the *M*_{Int} model (including ISI and Experiment predictors, but also their interaction). To evaluate the predictive performance of each model, we compared them to the null model *M*_{0} (no effect of ISI and Stimulus presentation order) by computing the Bayes factor.^{9}

#### Just Noticeable Difference (JND)

In experiment 1, the mean JND values were: ISI_{400} = 29.90 (*SD* = 15.91), ISI_{800} = 22.54 (*SD* = 9.59), ISI_{1600} = 22.11 (*SD* = 10.54), and ISI_{2000} = 20.98 (*SD* = 11.54). In experiment 2, they were 32.67 (*SD* = 17.05), 23.97 (*SD* = 12.01), 25.53 (*SD* = 14.52), and 26.96 (*SD* = 14.62), for the ISI_{400}, ISI_{800}, ISI_{1600} and ISI_{2000} conditions, respectively (**Fig 2a**).

Modeling of the JND data revealed that the data were best explained by the *M*_{ISI} model (BF_{10} = 107822; see **Fig 2b**). Post hoc comparisons revealed decisive evidence in favor of statistical differences between the ISI_{400} and the remaining conditions: ISI_{800}, ISI_{1600}, and ISI_{2000} (posterior odds = 53596, 284, and 109; respectively). We conclude that Standard position (Experiment factor) does not have modulatory effects on sensory precision, which is driven by temporal attention.

#### Constant error (CE)

In experiment 1, mean CE values decreased with increasing ISI: ISI_{400} = 9.18 (*SD* = 19.32), ISI_{800} = 7.18 (*SD* = 14.50), ISI_{1600} = 1.94 (*SD* = 12.34), and ISI_{2000} = -2.60 (*SD* = 11.30). We found a similar pattern in experiment 2: ISI_{400} (*M* = 27.97, *SD* = 22.82), ISI_{800} (*M* = 24.02, *SD* = 20.12), ISI_{1600} (*M* = 20.62, *SD* = 20.79), and ISI_{2000} (*M* = 18.26, *SD* = 20.78; **Fig 2c**).

These results were best explained by the *M*_{ISI+E} model (BF_{10} = 185.6 * 10^{10}). Post hoc comparisons revealed very strong evidence in favor of differences between the ISI_{400} and the conditions ISI_{1600} and ISI_{2000} (posterior odds = 56, 56 and 3189; respectively), but also between the ISI_{800} and the conditions ISI_{1600} and ISI_{2000} (posterior odds = 6 and 118; respectively). The Experiment factor revealed decisive evidence in favor of a statistical difference between experiments 1 and 2 (posterior odds = 4.1 * 10^{15}). That is, increasing ISI helped to minimize CE in both experiments. Participants in experiment 2 made more errors than participants in experiment 1 (**Fig 2d**).

### Relative duration effects

We then analyzed individual discrimination performance at each sensory evidence level (±𝛥100, ±𝛥60, ±𝛥20). We tested how the percentage of accuracy changes as a function of relative duration of the stimuli within a series: “First stimulus longer than the second” and “Second stimulus longer than the first” (**Fig 3a**,**c**,**e**). For each sensory evidence level and duration ordering, we again applied the four Bayesian inferential models (*M*_{ISI}, *M*_{E}, *M*_{ISI+E}, and *M*_{Int}) and compared them to the null model *M*_{0}. These models were carried out after applying an arcsine transformation to the data. This is calculated as two times the arcsine of the square root of the proportion of correct responses.
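
The arcsine transformation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function name `arcsine_transform` is hypothetical, and the example inputs are mean accuracies taken from the results, converted from percentages to proportions.

```python
import math


def arcsine_transform(proportion_correct: float) -> float:
    """Return 2 * arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    if not 0.0 <= proportion_correct <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return 2.0 * math.asin(math.sqrt(proportion_correct))


# Accuracies reported as percentages are converted to proportions first.
for pct in (47.5, 75.83, 92.5):
    print(f"{pct:5.2f}% -> {arcsine_transform(pct / 100):.3f} rad")
```

The transform maps proportions onto the range 0 to π and stretches the scale near 0 and 1, where the variance of raw proportions shrinks.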

#### Strong sensory evidence (±𝛥100)

When the first stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 92.50, ISI_{800} = 96.42, ISI_{1600} = 97.38, and ISI_{2000} = 96.54. In experiment 2, they were 93.75, 96.59, 96.70, and 97.50, respectively. When the second stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 94.40, ISI_{800} = 97.02, ISI_{1600} = 97.26, and ISI_{2000} = 95.83. Whereas in experiment 2, they were 90.79, 96.13, 96.02, and 96.02, respectively.

The best model was the *M*_{ISI} model independently of the serial position of the longer stimulus (BF_{10} = 11895.5; BF_{10} = 13.5; respectively; **Fig 3b**). That is, when the sensory evidence for a difference between S and C is strong, behavioral performance is explained solely by the ISI factor, and the relative duration of events in a series is discounted. When the first stimulus was longer, post hoc comparisons revealed very strong to decisive evidence in favor of a difference between the ISI_{400} and the remaining conditions (ISI_{800}, ISI_{1600}, and ISI_{2000}, posterior odds = 155, 474, and 88; respectively). When the second stimulus was longer, we found moderate evidence in favor of a difference between the ISI_{400} and the long ISI conditions (ISI_{1600} and ISI_{2000}, posterior odds = 4 and 3; respectively). We found no evidence for a difference between ISI_{400} and ISI_{800} (posterior odds = 1).

#### Medium sensory evidence (±𝛥60)

When the first stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 89.28, ISI_{800} = 94.16, ISI_{1600} = 95.47, and ISI_{2000} = 93.69. Whereas in experiment 2, they were 68.75, 78.63, 81.36, and 80.00, respectively. When the second stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 85.23, ISI_{800} = 91.54, ISI_{1600} = 93.09, and ISI_{2000} = 92.73. In experiment 2, they were 92.27, 93.29, 91.47, and 92.04, respectively.

When the first stimulus was longer, the accuracy data were best explained by the null model (*M*_{E}, BF_{10} = 0.31; *M*_{ISI}, BF_{10} = 0.15; *M*_{Int}, BF_{10} = 0.12; *M*_{ISI+E}, BF_{10} = 0.04). When the second stimulus was longer, *M*_{ISI+E} was the best model (BF_{10} = 1.66 * 10^{11}; **Fig 3d**). Post hoc comparisons of the ISI factor revealed decisive evidence for differences between the ISI_{400} and the rest of the ISI levels (ISI_{800}, ISI_{1600}, ISI_{2000}, posterior odds = 35985, 2.7 * 10^{6}, and 36531; respectively). The Experiment factor also revealed decisive evidence for a difference between experiments 1 and 2 (posterior odds = 2.3 * 10^{7}).

#### Weak sensory evidence (±𝛥20)

When the first stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 75.83, ISI_{800} = 78.45, ISI_{1600} = 72.38, and ISI_{2000} = 71.90. Whereas in experiment 2, they were 47.5, 49.77, 52.38, and 55.45, respectively. When the second stimulus was longer, mean accuracy values in experiment 1 were: ISI_{400} = 59.52, ISI_{800} = 63.92, ISI_{1600} = 69.04, and ISI_{2000} = 78.33. In experiment 2, they were 83.06, 85.68, 82.50, and 80.90, respectively.

When the first stimulus was longer, the data were best explained by the *M*_{ISI+E} model (BF_{10} = 261.9). Post hoc comparisons of the ISI factor revealed moderate evidence for differences between the ISI_{800} and the ISI levels 1600 and 2000 (posterior odds = 7 and 10, respectively). We found no differences between the rest of the conditions (all posterior odds ≦ 0.3). The Experiment factor revealed decisive evidence for a difference between experiments 1 and 2 (posterior odds = 35586).

When the longer stimulus was in the second position, the data were best explained by the *M*_{Int} model (BF_{10} = 9.5 * 10^{9}; **Fig 3f**). Post hoc analyses of the ISI factor revealed moderate and decisive evidence for differences between the ISI_{400} and the conditions ISI_{1600} and ISI_{2000} (posterior odds = 3 and 8824, respectively), but also decisive and moderate evidence for differences between the ISI_{2000} condition and the conditions ISI_{800} and ISI_{1600} (posterior odds = 2137 and 10, respectively). The Experiment factor revealed decisive evidence in favor of a difference between experiments 1 and 2 (posterior odds = 4 * 10^{9}).

In experiment 1, post hoc comparisons of the interaction revealed differences between the ISI_{400} and the conditions ISI_{1600} and ISI_{2000} (posterior odds = 8 and 50889, respectively), but also between the ISI_{2000} condition and the conditions ISI_{800} and ISI_{1600} (posterior odds = 1240 and 29, respectively). In experiment 2, post hoc comparisons revealed that the data were best explained under the null model (all posterior odds ≦ 0.43).

#### A bias in serial duration perception

We then used a Bayesian hierarchical model to compare differences in the accuracy rate between “First stimulus longer than the second” (*ϕ*^{a}) and “Second stimulus longer than the first” (*ϕ*^{b}) conditions for the uncertain sensory evidence levels ±𝛥20 and ±𝛥60, which showed an effect of the experimental manipulation (for the Strong sensory evidence, see Supplemental Information). To do that, we applied the Bayesian rate comparison model^{14} and quantified the difference *α* between both rates (*ϕ*^{a} and *ϕ*^{b}) as a normally distributed random effect (see STAR Methods; **Fig. 4a**).

We applied this model in both experiments for within-subject comparisons at each ISI level. Thus, at each ISI level of each experiment we compared the null hypothesis *H*_{0} (no difference in accuracy between *ϕ*^{a} and *ϕ*^{b} rates) versus the alternative hypothesis *H*_{1} (**Fig. 4b**).

#### Medium sensory evidence (±𝛥60)

Results of experiment 1 yielded strong and moderate evidence in favor of *H*_{1} in the conditions ISI_{400} and ISI_{800} (BF_{10} = 12; BF_{10} = 9; respectively), but no evidence in the longer conditions ISI_{1600} and ISI_{2000} (BF_{10} = 3; BF_{10} = 0.86; anecdotal evidence in favor of *H*_{1} and *H*_{0}, respectively; **Fig 4c**). These results confirm that, for short ISIs, participants make more mistakes when the longer stimulus is placed in the second position. However, this effect can be minimized by the allocation of attention in time.^{7,15}

Contrary to experiment 1, results of experiment 2 showed decisive evidence for *H*_{1} at all ISI levels: ISI_{400}, ISI_{800}, ISI_{1600} and ISI_{2000} (BF_{10} > 10000; BF_{10} > 10000; BF_{10} = 310; BF_{10} = 1512; respectively; **Fig 4d**). That is, when sensory evidence is of medium strength, temporal attention does not help to minimize temporal errors. Regardless of the ISI, participants make more mistakes when the longer stimulus is placed in the second position.

#### Weak sensory evidence (±𝛥20)

We found similar results for weak sensory evidence. In experiment 1, results yielded extreme evidence in favor of *H*_{1} in the conditions ISI_{400} and ISI_{800} (BF_{10} = 2945; BF_{10} = 3013; respectively), but no differences at the longer ISIs, 1600 and 2000 (BF_{10} = 0.59; BF_{10} = 2; respectively; **Fig. 4e**). In experiment 2, results showed extreme evidence for *H*_{1} at each ISI level (BF_{10} > 10000; BF_{10} > 10000; BF_{10} > 10000; BF_{10} > 10000; respectively; **Fig. 4f**).

### Response bias

Finally, we tested whether participants were biased in hitting the keys for “S” or “C”. We abstracted away from the confounding factor of sensory uncertainty by analyzing key responses across ISI levels at the Strong sensory evidence level only (±𝛥100). We applied individual Bayesian binomial analyses to the set of responses, and tested the null hypothesis *H*_{0} of a 50% probability to choose either S or C. In both experiments, results yielded Bayes factors (BF_{01}) with moderate evidence in favor of the null hypothesis (**Supplemental Information Fig 2**). For experiment 1, the lowest and highest BF values were 4.5 and 10.1, respectively; for experiment 2, they were 3.6 and 10.1. Overall, button press data were between 3.6 and 10.1 times more likely under the null hypothesis (no bias) than under the alternative hypothesis, arguing against an idiosyncratic response bias in any of the participants.
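
To make the BF_{01} values concrete, the sketch below computes a Bayesian binomial test for a single participant. It assumes a uniform Beta(1, 1) prior on the response probability under the alternative hypothesis; the source does not state the prior that was used, so this is an assumption. The function name `binomial_bf01` and the trial counts are hypothetical.

```python
import math


def binomial_bf01(k: int, n: int, theta0: float = 0.5) -> float:
    """Bayes factor for H0: theta = theta0 against H1: theta ~ Beta(1, 1)."""
    # Likelihood of k "S" responses in n trials under the point null.
    like_h0 = math.comb(n, k) * theta0**k * (1.0 - theta0) ** (n - k)
    # Under a uniform prior every count 0..n is equally likely a priori,
    # so the marginal likelihood of the data is 1 / (n + 1).
    marginal_h1 = 1.0 / (n + 1)
    return like_h0 / marginal_h1


# Hypothetical participant: 82 "S" key presses out of 160 trials.
print(round(binomial_bf01(82, 160), 2))
```

Values above 1 favor the absence of a key bias; a near-even split of responses yields larger BF_{01} values as the number of trials grows.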

## Discussion

Time order error (TOE) is a subjective distortion of an event’s duration that occurs when the event is inserted in a series. It constitutes one of the oldest and least understood phenomena of subjective time perception.^{16,17,18} Several models have been proposed to explain how TOEs are generated. In the sensation-weighting model,^{8} temporal distortions arise because the sensory effects produced by S and C are differentially weighted before they are discriminated. In the difference model,^{6} TOEs depend on two components: 1) desensitization caused by short inter-stimulus intervals (ISIs), akin to a form of attentional blink,^{19} which does not allow the sensory system to reset to its initial state;^{5} and 2) an idiosyncratic response bias in picking a specific stimulus (always picking the first, or the second one). Both models predict that improving stimulus encoding, for instance by increasing the ISI and thus temporal attention to the second stimulus in a series, should decrease TOEs. We verified this prediction in our previous work^{4} using a visual two-interval forced-choice (2IFC) task with empty events in a fixed order, with the comparison (C) event always following the standard (S) event. However, all accounts missed the crucial point of explaining how TOEs are generated under uncertain sensory evidence and limited attentional resources.

Although a meta-analysis^{21} suggested that temporal sensitivity decreases when the Standard event is displayed in the 2^{nd} position, our results demonstrate no effect of S position on temporal sensitivity, which was modulated only by the ISI factor. We conclude that TOEs are not elicited by differences in sensory precision. Our analysis of key presses shows that participants were not biased in hitting the key for “S” or “C”. That is, TOEs are not generated by an absolute positional bias, as predicted by the difference model.^{6}

Since TOEs occur during serial discrimination tasks, we tested whether event duration dynamics is the primary source of distortions by flipping the positions of standard (S) and comparison (C) events in two separate behavioral experiments. First, we replicated the finding that increasing the temporal interval between first and second event minimizes TOEs for S in both 1^{st} and 2^{nd} position, although significantly less so for S in 2^{nd} position. These results can be explained by the beneficial effects of allocating attention in time, oriented mainly to the encoding of unpredictable events (C).^{7,20,15}

Second, Bayesian modelling of accuracy at each sensory level (𝛥100, 𝛥60, 𝛥20) showed that, while most TOEs occur for short ISIs and small to medium sensory evidence, they tend to cluster in a specific serial condition: when the first stimulus in a series is shorter than the second stimulus, regardless of whether it is S or C. In such a case, participants were biased to say that the first event was longer, consistently making mistakes.

When the first stimulus was truly longer than the second, performance was significantly above chance, suggesting optimal processing even under uncertain sensory circumstances (small and medium sensory evidence, 20/60 ms). This prefigures a novel order bias in serial perception based on duration-dependent relative positions of stimuli, which can only be partially counteracted by increasing temporal attention. Attention helps when S is in the first position, as it enhances the encoding of the C stimulus whose duration is unpredictable.

Our new model finally explains that time order errors arise under sensory uncertainty because of serial perceptual encoding inefficiency. Future research is needed to uncover the physiological basis of such a strong expectation about the temporal statistics of incoming stimuli.

## STAR Methods

### Participants

#### Experiment 1

Part of the results for experiment 1 were previously published.^{4} This dataset has a sample of 52 participants (34 female; ages: 18-33; mean age: 24.42). We removed participants with an accuracy ≤ 55%, as this indicates that they performed the task at chance level. One participant was removed due to low accuracy. For each participant we computed the goodness-of-fit of the psychometric function (*R*^{2}). Participants with an *R*^{2} value more than two standard deviations below the mean were removed. Seven participants were removed from analysis following this procedure. For the dependent variables WF and CE, we also discarded extreme outliers, identified using a criterion of two standard deviations. Two participants were discarded as extreme outliers in experiment 1. Therefore, the final sample included the data from 42 participants (26 female; ages: 18-33; mean: 24.14).

#### Experiment 2

We had an initial sample of 58 participants in experiment 2 (45 female; ages: 18-37; mean age: 25.41). We applied the same procedure as in experiment 1 for discarding outliers. Four participants were removed from analysis due to their low rate of correct responses (< 55%). Nine participants were removed from analysis due to their low *R*^{2} values. One participant was discarded as an extreme outlier in the WF. Therefore, the final analysis included the data from 44 participants (34 female; ages: 18-32; mean: 25.22). In total, we report on the behavior of 86 participants.

Individuals were recruited through online advertisements. Participants self-reported normal or corrected vision and had no history of neurological disorders. Up to three participants were tested simultaneously at computer workstations with identical configurations. They received 10 euros per hour for their participation. The studies were approved by the Ethics Committee of the Max Planck Society. Written informed consent was obtained from all participants prior to the experiment.

### Design

We used a classical interval discrimination task with a 2IFC design, where participants were presented with two visual durations: S and C.^{22,23} S had a magnitude of 120 ms and was always displayed in the first position in experiment 1, but in the second position in experiment 2. In both experiments, we used three magnitudes for the step comparisons 𝛥 between S and C: 20, 60, and 100 ms. We derived the magnitudes for the C stimuli as *S* ± 𝛥, which resulted in the following C durations: 20, 60, 100, 140, 180, and 220 ms. For the ISI, we used four durations: 400, 800, 1600, and 2000 ms. For each trial, the inter-trial interval (ITI) was randomly chosen from a uniform distribution between 1 and 3 seconds. Participants judged whether the S or C interval was the longer duration, and responded by pressing one of two buttons on an RB-740 Cedrus Response Pad (*cedrus*.*com*, response time jitter < 1 ms, measured with an oscilloscope).
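
The derivation of the comparison durations and of the design cells can be written out as a short sketch. The constant and variable names are illustrative only; the values are those given in the text.

```python
from itertools import product

STANDARD_MS = 120
DELTAS_MS = (20, 60, 100)
ISIS_MS = (400, 800, 1600, 2000)

# Comparison durations are S - delta and S + delta for every step size.
comparison_ms = sorted(
    STANDARD_MS + sign * delta for delta in DELTAS_MS for sign in (-1, 1)
)

# Each (ISI, C duration) pair is one cell of the design.
conditions = list(product(ISIS_MS, comparison_ms))

print(comparison_ms)    # [20, 60, 100, 140, 180, 220]
print(len(conditions))  # 24
```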

### Stimuli and Apparatus

We used empty visual stimuli, defined as a succession of two blue disks with a diameter of 1.5° presented on a gray screen.^{24} We used empty stimuli to ensure that participants were focused on the temporal properties of the stimuli.^{25} All stimuli were created in MATLAB R2018b (mathworks.com), using the Psychophysics Toolbox extensions.^{26,27,28} Stimuli were displayed on an ASUS monitor (model: VG248QE; resolution: 1,920 × 1,080; refresh rate: 144 Hz; size: 24 in) at a viewing distance of 60 cm.

### Protocol (Task)

The experiment was run in a single session of 70 minutes. Participants completed a practice set of four blocks (18 trials in each block). All sessions consisted of the presentation of one block for each ISI condition. Each block was composed of 120 trials and presented in random order. In order to avoid fatigue, participants always had a break after 60 trials. Each trial began with a black fixation cross (diameter: 0.1°) displayed in the center of a gray screen. Its duration was randomly selected from a distribution between 400 and 800 ms. After a blank interval of 500 ms, S was displayed and followed by C, after one of the ISI durations.

Participants were instructed to compare the durations of the two stimuli by pressing the key “left”, if S was perceived to have lasted longer, and the key “right” if C was perceived to have lasted longer. After responding, they were provided with immediate feedback: the fixation cross color changed to green when the response was correct, and to red when the response was incorrect.

### Data analysis

The data analysis was implemented with Python 3.7 (*python*.*org*) using the ecosystem SciPy (*scipy*.*org*) and the libraries Pandas,^{29} Seaborn,^{30} and Pingouin.^{31} Frequentist statistical analyses were executed in Pingouin. Bayesian statistical analyses were implemented using the BayesFactor package for R.^{9} All data and statistical analyses were performed in Jupyter Lab (*jupyter*.*org*).

To endorse open science practices and transparency on statistical analyses, we used JASP^{32} (jasp-stats.org) for providing statistical results (data, plots, distributions, tables and post hoc analyses) of both Frequentist and Bayesian analyses in a graphical user-friendly interface. These results can be consulted at Open Science Framework as annotated .jasp files (osf.io/jkzq4/). As JASP uses the BayesFactor package as a backend engine, the default prior distributions of this package were the same for JASP. Modelling of the Bayesian rate-comparison model^{14} was done in MATLAB R2018b (mathworks.com) using JAGS 4.3.0^{33} (www.mcmc-jags.sourceforge.net/).

### Psychometric curves

Responses were modeled with a 6-point psychometric function using a nonlinear least-squares fit in Python. We plotted the six C durations on the *x*-axis and the probability of responding “C longer than S” on the *y*-axis. We parameterized psychometric functions by using the distribution of a logistic function *f*, which is given by

$$f(x) = \frac{L}{1 + e^{-(x - \mu)/\beta}} \qquad (1)$$

where *L* is the maximum value of *f*, *x* is the magnitude of the C stimuli, *μ* is the *x*-value at the mid-point of the psychometric function, and *β* parametrizes the slope of the logistic function. Two indices of temporal performance are extracted from this fitting: a marker for the perceived event duration, and a marker for the sensory temporal precision.^{34} The perceived duration was measured via the PSE (*μ* in equation 1), i.e., the value on the *x*-axis that corresponds to the 50% value on the *y*-axis.^{34} We derived the CE from the PSE, which is defined as the difference between the PSE and the magnitude of the physical duration *ϕ*_{s} of S:

$$\mathrm{CE} = \mathrm{PSE} - \phi_{s} \qquad (2)$$

Positive CEs indicate that the C stimuli were perceived as shorter than the S stimulus. The temporal sensitivity was measured by using the JND, which is defined as being half the interquartile range of the fitted function, (*x*_{.75} − *x*_{.25})/2, where *x*_{.25} and *x*_{.75} denote the values on the *x*-axis that yield 25% and 75% “longer” responses. The smaller the JND, the higher the discrimination sensitivity of the sensory system. The JND is obtained from the slope (*β* in equation 1) of the fitted function:

$$\mathrm{JND} = \beta \cdot \log(3) \qquad (3)$$
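
As a worked check of these definitions, the sketch below derives the PSE, CE and JND from a midpoint and slope parameter. It assumes that the logistic function takes the scale form *L* / (1 + exp(−(*x* − *μ*)/*β*)), which is the form under which the JND equals *β* times the natural logarithm of 3; the parameter values are hypothetical, and in practice *μ* and *β* would come from a nonlinear least-squares fit (for example SciPy's `curve_fit`). All function names are illustrative.

```python
import math


def logistic(x: float, mu: float, beta: float, upper: float = 1.0) -> float:
    """Psychometric function: probability of responding 'C longer than S'."""
    return upper / (1.0 + math.exp(-(x - mu) / beta))


def quantile(p: float, mu: float, beta: float) -> float:
    """Comparison duration at which the curve (with upper = 1) reaches p."""
    return mu + beta * math.log(p / (1.0 - p))


def indices(mu: float, beta: float, standard_ms: float = 120.0):
    """Return (PSE, CE, JND) from the fitted midpoint mu and scale beta."""
    pse = mu
    ce = pse - standard_ms  # positive: C perceived as shorter than S
    jnd = (quantile(0.75, mu, beta) - quantile(0.25, mu, beta)) / 2.0
    return pse, ce, jnd


# Hypothetical fitted parameters for one participant at one ISI level.
pse, ce, jnd = indices(mu=129.0, beta=20.0)
print(pse, ce, round(jnd, 2))
```

Half the interquartile range computed from the two quantiles coincides with *β* · ln(3), so the JND can be read directly from the fitted slope parameter.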

### Statistical analyses

#### Frequentist analyses

Frequentist statistical analyses are available at Open Science Framework (osf.io/jkzq4/). For the Frequentist analyses the level of statistical significance to reject the null hypothesis *H*_{0} was set to *α* = 0.05. To test for significant changes in the dependent variables, we implemented a repeated measures ANOVA across ISI levels in both experiments. We carried out post hoc comparisons by using the Bonferroni correction.

#### Bayesian Model Comparison

To contrast results from both experiments, we applied a Bayes factor approach to ANOVA by using Bayesian Model Comparison (BMC).^{13,35} To do that, we implemented Bayes’ rule for obtaining the posterior distribution *p*(*θ* ∣ *D*, *M*_{1}), where *D* expresses the observed data, under the model specification *M*_{1},^{11,14} which is given by

$$p(\theta \mid D, M_{1}) = \frac{p(D \mid \theta, M_{1})\, p(\theta \mid M_{1})}{p(D \mid M_{1})}$$

where *p*(*D* ∣ *θ, M*_{1}) denotes the likelihood, *p*(*θ* ∣ *M*_{1}) expresses the prior distribution, and the marginal likelihood is expressed by *p*(*D* ∣ *M*_{1}). Bayes factor ANOVA compares the predictive performance of competing models. Thus, for evaluating the relative probability of the data *D* under competing models we computed the Bayes factor (BF): Let BF_{10} express the Bayes factor between a null model *M*_{0} versus an alternative model *M*_{1}. The predictive performance of these models is given by the probability ratio obtained by dividing the marginal likelihoods of *M*_{1} and *M*_{0}:

$$\mathrm{BF}_{10} = \frac{p(D \mid M_{1})}{p(D \mid M_{0})}$$

In this case, BF_{10} expresses to which extent the data support the model *M*_{1} over *M*_{0}, whereas BF_{01} indicates the Bayes factor in favor of *M*_{0} over *M*_{1}. BF_{10} values < 1 give support to *M*_{0}, whereas BF_{10} values > 1 support the *M*_{1} model. A BF of 1 reveals that both models predicted the data equally well.^{36} For our analysis of the dependent variables we built four alternative models (*M*_{E}, *M*_{ISI}, *M*_{ISI+E}, and *M*_{Int}) for quantifying the effect of two factors: ISI and Standard position. The *M*_{E} model uses the Standard position as predictor, whereas the *M*_{ISI} model uses the ISI. The *M*_{ISI+E} model uses both factors as predictors (the ISI and the Standard position), and the *M*_{Int} model uses both factors and also includes their interaction (ISI * Experiment). The prior model probability *p*(M) of each model was set to be equal, *i*.*e*., a prior model probability of 0.2 for each of the five models. We compared each model to the null model *M*_{0} and provided the BF. The model with the highest BF was selected as the best model, that is, the inferential model that explains the data most accurately.
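
With equal prior model probabilities, Bayes factors against the null model translate directly into posterior model probabilities. The sketch below illustrates this with the four BF_{10} values reported for medium sensory evidence when the first stimulus was longer; the function name `posterior_model_probs` is hypothetical, and the computation is a generic application of Bayes' rule rather than output from the BayesFactor package.

```python
def posterior_model_probs(bf10: dict, prior: float = 0.2) -> dict:
    """Convert Bayes factors against M0 into posterior model probabilities."""
    weights = {"M0": 1.0, **bf10}  # the BF of M0 against itself is 1
    total = sum(prior * bf for bf in weights.values())
    return {name: prior * bf / total for name, bf in weights.items()}


# BF10 values reported for medium evidence, first stimulus longer.
bf10 = {"M_E": 0.31, "M_ISI": 0.15, "M_Int": 0.12, "M_ISI+E": 0.04}
probs = posterior_model_probs(bf10)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because every alternative model has BF_{10} < 1 in this case, the null model receives the largest posterior probability, which matches the model selection rule described above.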

Bayesian post hoc testing is based on pairwise comparisons using Bayesian t-tests with a Cauchy prior. For post hoc testing, the posterior odds are corrected for multiple testing by fixing to 0.5 the prior probability that the null hypothesis holds across comparisons.^{36} For all our analyses we used the default prior values for Bayes factor ANOVA,^{37,38} which are also the default values in the BayesFactor package and JASP.
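
The relation between a pairwise Bayes factor and the corrected posterior odds can be sketched as follows. The source states only that the prior probability of the null hypothesis holding across all comparisons is fixed to 0.5; the specific rule used here, in which each pairwise null receives probability 0.5^(2/*k*) for a factor with *k* levels, is an assumption based on the Westfall-style correction and should be checked against the JASP documentation. The function names and the Bayes factor value are hypothetical.

```python
def corrected_prior_odds(n_levels: int, p_null_all: float = 0.5) -> float:
    """Prior odds for one pairwise comparison (assumed Westfall-style rule)."""
    # Assumption: each pairwise null gets probability p_null_all ** (2 / k).
    p_null_each = p_null_all ** (2.0 / n_levels)
    return (1.0 - p_null_each) / p_null_each


def posterior_odds(bf10: float, n_levels: int) -> float:
    """Posterior odds = corrected prior odds * uncorrected Bayes factor."""
    return corrected_prior_odds(n_levels) * bf10


# Four ISI levels; the Bayes factor of 25 is a made-up value.
print(round(corrected_prior_odds(4), 3), round(posterior_odds(25.0, 4), 2))
```

Under this assumption the prior odds fall below 1 as the number of levels grows, so a given Bayes factor yields smaller posterior odds when more pairwise comparisons are made.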

#### Bayesian rate-comparison model

To test individual differences in the accuracy rate *θ* between the conditions a (“First stimulus longer”) and b (“Second stimulus longer”), we used the Bayesian rate-comparison model (see **Fig. 4a**) for computing differences between *θ*^{a} and *θ*^{b}.^{14} We deployed the model for within-subject comparisons at each ISI level and modeled the transformed rates *ϕ* as having Gaussian distributions. That is, with mean *μ* and standard deviation *σ*:

$$\phi_{i}^{a} \sim \mathrm{Normal}(\mu, \sigma)$$

This model receives two inputs for each subject *i*: the number of trials *n* and the total number of correct responses *s*. We assumed that the number of correct responses *s* follows a binomial distribution with rate parameter *θ*. In order to model *θ* on a normally distributed scale, we applied a probit transformation, which relates the rate *θ* (a probability) to a real-valued *ϕ* through the standard normal cumulative distribution function Φ:^{39}

$$\theta_{i} = \Phi(\phi_{i})$$

We added a variable *δ* to quantify the strength of the effect size:

$$\delta = \frac{\mu_{\alpha}}{\sigma_{\alpha}}$$

Then, we modeled the effect *α* as a random effect. That is, with a Gaussian distribution of mean *μ*_{α} and standard deviation *σ*_{α}:

$$\alpha_{i} \sim \mathrm{Normal}(\mu_{\alpha}, \sigma_{\alpha})$$

Thus, differences between conditions are given by adding the effect *α* to one of the transformed rates *ϕ*:

$$\phi_{i}^{b} = \phi_{i}^{a} + \alpha_{i}$$

To run this model, we generated 50 chains with 100000 samples each, for a total of 5000000 samples. We discarded the first 1000 burn-in samples. Posterior distributions of this model show visually the strength of the effect size *δ*. We computed the Bayes factors for this model using the Savage-Dickey method.^{40,14}
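
The Savage-Dickey computation can be illustrated with a small, self-contained sketch. It assumes a standard normal prior on *δ*, which the text does not specify, and it replaces the JAGS output with synthetic samples; the kernel density estimate and all function names are illustrative choices, not the authors' implementation.

```python
import math
import random


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kde_at(point: float, samples: list) -> float:
    """Gaussian kernel density estimate at `point` (rule-of-thumb bandwidth)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    bandwidth = 1.06 * sd * n ** (-0.2)
    return sum(normal_pdf(point, s, bandwidth) for s in samples) / n


def savage_dickey_bf10(delta_samples: list) -> float:
    """BF10 = prior density at delta = 0 over posterior density at delta = 0."""
    # Assumption: the prior on delta is a standard normal distribution.
    return normal_pdf(0.0) / kde_at(0.0, delta_samples)


# Stand-in for MCMC output: a posterior concentrated away from zero.
rng = random.Random(1)
fake_posterior = [rng.gauss(0.8, 0.2) for _ in range(5000)]
print(savage_dickey_bf10(fake_posterior) > 1.0)
```

When the posterior mass at *δ* = 0 is lower than the prior mass, the ratio exceeds 1 and the data favor *H*_{1}; posterior samples that pile up around zero push the ratio below 1.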

## Author contributions

Research question: FS and AT. Study design: All authors. Testing and data collection: FS. Data analysis and writing original draft: FS and AT. Writing, review & editing: all authors.

## Declaration of interests

The authors declare no competing interests.

## Supplemental information

**Strong sensory evidence (±𝛥100).** Results of experiment 1 yielded anecdotal evidence in favor of *H*_{0} in all conditions: ISI_{400}, ISI_{800}, ISI_{1600} and ISI_{2000} (BF_{10} = 0.29; BF_{10} = 0.51; BF_{10} = 0.81; BF_{10} = 0.94; respectively; **Fig SI 1a**). We found similar results in experiment 2 for the conditions ISI_{400}, ISI_{800}, ISI_{1600} and ISI_{2000} (BF_{10} = 0.38; BF_{10} = 0.58; BF_{10} = 0.52; BF_{10} = 0.34; respectively; **Fig SI 1b**). That is, when sensory evidence for C is strong, no duration-dependent bias can be detected.

## Acknowledgements

We thank Lauren Fink and Luigi Acerbi for critical suggestions.