A fundamental issue in human performance concerns the mechanisms that allow people to discriminate between two stimuli (e.g., Falmagne, 1985; Gescheider, 1997; Green & Swets, 1966; Macmillan & Creelman, 2005; Wickens, 2002). It is usually assumed that when participants are asked to compare two stimuli differing in magnitude (e.g., pitch, duration, luminance), they base their judgments on the difference between the internal representations of these stimuli. Models that incorporate this idea are often based on Thurstone’s (1927a, 1927b) original difference model, which also underlies Signal Detection Theory and other accounts in psychophysics (Falmagne, 1985; García-Pérez & Alcalá-Quintana, 2010; Luce & Galanter, 1963; Yeshurun, Carrasco, & Maloney, 2008). Difference models state that the internal representation \( {{\mathbf{X}}_1} \) of one stimulus is compared with the internal representation \( {{\mathbf{X}}_2} \) of the other stimulus. More specifically, these models assume that an internal comparison mechanism operates on the difference \( \mathbf{D} = {{\mathbf{X}}_1} - {{\mathbf{X}}_2} \). When this difference is greater than a fixed constant (which is zero if the participant has no bias to prefer one of the response alternatives), the participant judges the stimulus associated with \( {{\mathbf{X}}_1} \) to be the larger one. Likewise, if the difference is smaller than the fixed constant, the participant judges the stimulus associated with \( {{\mathbf{X}}_2} \) to be the larger one.

Although difference models provide a simple discrimination mechanism, they cannot account for certain order effects on discrimination performance (Lapid, Ulrich, & Rammsayer, 2008; Rammsayer & Ulrich, 2012; Ulrich, 2010; Ulrich & Vorberg, 2009). More specifically, if a constant standard s and a variable comparison c are presented successively, difference models predict that discrimination performance as indexed by the difference limen (DL)Footnote 1 should not depend on the order of s and c, even if the observer has a bias favoring one stimulus position. In particular, these models predict equal slopes of the psychometric functions and, thus, equal DLs for stimulus orders 〈sc〉 (i.e., c follows s) and 〈cs〉 (i.e., c precedes s; see Appendix 1 for a proof). Figure 1 provides an illustration. As can be seen, the predicted psychometric functions for the two stimulus orders can be shifted—as one would expect if the participant exhibits a response bias—but their slopes and, thus, the DLs are identical.

Fig. 1

Difference model: Predicted psychometric functions for stimulus orders 〈sc〉 (dotted line) and 〈cs〉 (dashed line; see Appendix 1 for details). In addition, the average of those functions is shown (solid line). Without loss of generality, the following additional assumptions were made to calculate these functions: (1) The difference \( \Delta = {E_s} - {E_c} \) is normally distributed with E[Δ] = 0 and Var[Δ] = 30²; (2) the magnitude of the standard stimulus s is 500 (arbitrary unit; e.g., milliseconds), and the magnitude of the comparison c varies from 300 to 700; and (3) bias γ is 30. The psychometric functions (dotted and dashed line) are shifted by 2 · |γ| against each other, but their slopes and, thus, the DLs are identical

In several studies including discrimination of temporal intervals (e.g., Lapid et al., 2008; Stott, 1935; Ulrich, 2010; Woodrow, 1935), weight discrimination (e.g., Ross & Gregory, 1964), and contrast discrimination in visual perception (Nachmias, 2006), DL estimates have been found to be larger for the 〈cs〉 than for the 〈sc〉 stimulus order. Table 1 gives a brief overview of studies in which discrimination performance on 〈sc〉 trials and 〈cs〉 trials was analyzed separately. As one can see, in all of these studies, discrimination performance is superior for 〈sc〉 trials, as compared with 〈cs〉 trials (but see Hellström, 2003, Table 7; Hellström & Rammsayer, 2004). Thus, in sharp contrast to the predictions of difference models, presentation order of stimuli does indeed affect discrimination performance.

Table 1 Overview of studies in which discrimination performance for 〈sc〉 trials (i.e., constant standard s precedes variable comparison c) and 〈cs〉 trials was analyzed separately. As one can see, across different tasks and modalities, discrimination performance is generally better for 〈sc〉 than for 〈cs〉 trials by about 10 %–110 %

For example, Yeshurun et al. (2008) pointed out that performance on detection tasks often differs depending on whether the to-be-detected signal is presented in the first or the second of two intervals. These authors proposed that this asymmetry might be due either to decisional biases or to differences in sensitivity when comparing the two intervals. In a similar vein, Ulrich and Vorberg (2009) introduced the term Type B effect to distinguish the effect of stimulus order on discrimination performance as indexed by DL from the well-studied effect of stimulus order on the point of subjective equality (PSE).Footnote 2 The latter effect, commonly known as time-order error (TOE),Footnote 3 has been known since the advent of psychophysics and has been the subject of extensive psychophysical research (Fechner, 1860; Hellström, 1979, 1985, 2003; Hellström & Rammsayer, 2004; Helson, 1964; Michels & Helson, 1954; Woodworth & Schlosberg, 1954; see Guilford, 1954, pp. 305–311, for a detailed overview of the classical literature on the TOE). In contrast to the TOE, the Type B effect reflects a genuine difference in sensitivity between the two presentation orders, and it has been studied less often (see, however, Hellström, 2003, Table 7; Hellström & Rammsayer, 2004; Rammsayer & Wittkowski, 1990).Footnote 4 Understanding why the Type B effect occurs is therefore important, and a major purpose of the present work is to contribute to a better understanding of its origin.

Trial-by-trial updating of an internal reference

One possibility for the origin of the Type B effect is that participants build up a virtual standard or internal reference across the experiment (e.g., Durlach & Braida, 1969; Helson, 1947, 1964; Michels & Helson, 1954; Morgan, Watamaniuk, & McKee, 2000; Nachmias, 2006). Elaborating on this idea, Lapid et al. (2008) proposed a quantitative account of how such an internal reference I is established and updated on every trial. According to this Internal Reference Model (IRM), the internal reference I n on the current trial n is a weighted sum of the internal reference I n−1 from the previous trial and the internal representation of the first stimulus X 1,n on the current trial. Specifically, the updated internal reference I n on trial n is assumed to follow a geometric moving average (Roberts, 1959),

$$ {{\mathbf{I}}_n} = g \cdot {{\mathbf{I}}_{{n - 1}}} + \left( {1 - g} \right) \cdot {{\mathbf{X}}_{{1,n}}} $$
(1)

with the constant weight g, 0 ≤ g < 1. Participants compare the internal representation of the second stimulus X 2,n on the current trial with this internal reference. If \( {{\mathbf{D}}_n} = {{\mathbf{I}}_n} - {{\mathbf{X}}_{{2,n}}} > 0 \), they judge the first stimulus to be the larger one, and, otherwise, they judge the second stimulus to be the larger one. The IRM is an extension of the standard difference model because, if weight g is set to zero, only the internal representation X 1,n of the first stimulus is compared with X 2,n on trial n (see Appendix 2 for further details).
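The update and decision rules just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `run_irm` is hypothetical, and initializing the reference with the first stimulus representation of the session is our assumption, because the text does not specify the starting value of the reference.

```python
def run_irm(x1_seq, x2_seq, g, initial_ref=None):
    """Apply the IRM trial by trial (hypothetical helper).

    x1_seq, x2_seq: internal representations of the first and second stimulus
    on each trial. Returns the internal references I_n and the responses
    (True = "first stimulus larger").
    Assumption: if initial_ref is None, the reference starts at the first X_1,
    so that I_1 = X_{1,1}.
    """
    if not 0.0 <= g < 1.0:
        raise ValueError("g must satisfy 0 <= g < 1")
    ref = x1_seq[0] if initial_ref is None else initial_ref
    refs, responses = [], []
    for x1, x2 in zip(x1_seq, x2_seq):
        ref = g * ref + (1.0 - g) * x1      # Equation 1: geometric moving average
        refs.append(ref)
        responses.append(ref - x2 > 0)      # D_n = I_n - X_{2,n} > 0 -> "first larger"
    return refs, responses


# g = 0 reduces the IRM to the plain difference model (X_1 compared with X_2).
print(run_irm([600.0, 400.0], [500.0, 450.0], g=0.0))
print(run_irm([600.0, 400.0], [500.0, 450.0], g=0.5))
```

In the second call, part of the first trial survives in the reference, so the same representations on trial 2 (400 vs. 450) produce the opposite response to the one produced by the plain difference model.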

IRM can be regarded as a special case of Hellström’s (1979, 1985, 2003) prominent Sensation Weighting model, which is related to Michels and Helson’s (1954; Helson, 1964) quantitative TOE theory (Hellström, 1985) and to Helson’s (1947, 1964) adaptation-level theory. The complete form of Hellström’s Sensation Weighting model is given by the following equation (see, e.g., Hellström, 1985, p. 45; Hellström & Rammsayer, 2004, p. 3; Patching, Englund, & Hellström, 2012):

$$ d = \left[ {{s_1}{\Psi_1} + \left( {1 - {s_1}} \right){\Psi_{{r1}}}} \right] - \left[ {{s_2}{\Psi_2} + \left( {1 - {s_2}} \right){\Psi_{{r2}}}} \right] + b, $$
(2)

where d is the subjective difference between two successively presented stimuli, Ψ1 and Ψ2 are the sensation magnitudes of the stimuli, \( {s_1} \) and \( {s_2} \) are weighting coefficients, and the term b accounts for a possible judgment bias. Furthermore, \( {\Psi_{{r1}}} \) and \( {\Psi_{{r2}}} \) are reference levels that reflect the current average subjective level of stimulation. According to Hellström and Rammsayer (2004), the ratio \( {s_1}/{s_2} \) determines the size and the direction of the time-order effect. For \( {s_2} = 1 \) and b = 0, the discrimination process in the Sensation Weighting model is analogous to the discrimination process proposed by IRM.
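Setting \( {s_2} = 1 \) and b = 0 in Equation 2, and substituting Equation 1 into the IRM decision variable, makes the correspondence explicit. The term-by-term identification below is our reading of the two equations rather than a statement from the cited sources:

$$ d = {s_1}{\Psi_1} + \left( {1 - {s_1}} \right){\Psi_{{r1}}} - {\Psi_2} $$

$$ {{\mathbf{D}}_n} = {{\mathbf{I}}_n} - {{\mathbf{X}}_{{2,n}}} = \left( {1 - g} \right) \cdot {{\mathbf{X}}_{{1,n}}} + g \cdot {{\mathbf{I}}_{{n - 1}}} - {{\mathbf{X}}_{{2,n}}} $$

Matching terms gives \( {s_1} = 1 - g \), \( {\Psi_1} = {{\mathbf{X}}_{{1,n}}} \), \( {\Psi_{{r1}}} = {{\mathbf{I}}_{{n - 1}}} \), and \( {\Psi_2} = {{\mathbf{X}}_{{2,n}}} \). On this reading, the IRM weight g plays the role of the weight \( 1 - {s_1} \) that the Sensation Weighting model assigns to the reference level of the first stimulus.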

Taken together, IRM can be regarded as intermediate between classical difference models and the Sensation Weighting model. Most importantly, however, IRM explicitly specifies how the internal reference builds up during an experiment. Specifically, a core feature of IRM is a dynamic trial-by-trial updating process for the internal reference. This assumption is in line with the idea that psychometric functions may be based on a nonstationary process (Fründ, Haenel, & Wichmann, 2011). IRM’s dynamic updating process allows us to derive a set of specific predictions with regard to stimulus order and trial sequence, which are outlined below.

Predicted effects of stimulus order on DL

IRM predicts worse discrimination performance for 〈cs〉 than for 〈sc〉 trials when these stimulus orders are presented in separate experimental blocks (see Appendix 2). A rough approximation that ignores the variance of the distribution from which the comparisons are drawn shows that, for blocked stimulus order, \( D{L_{{ < sc > }}} \approx \left( {1 - g} \right) \cdot D{L_{{ < cs > }}},\;0 < 1 - g \leqslant 1 \) (for an exact analytical derivation, see Appendix 2). The worse performance on blocked 〈cs〉 trials than on blocked 〈sc〉 trials is a consequence of integrating the variable comparison c, instead of the constant standard s, into the internal reference: because only the fraction 1 − g of the comparison enters the reference, a given physical difference c − s shifts the decision variable by only about (1 − g)(c − s), which flattens the psychometric function. However, it was unclear whether the same prediction applies when stimulus order is random, as in the two-alternative forced choice task. We therefore conducted Monte Carlo simulations to examine the predictions of the model under blocked and random stimulus order.Footnote 5 The predicted DLs of these simulations are depicted in Fig. 2 for blocked and random stimulus order and for various values of g. As one can see, the model predicts a clear order effect on DL, and this effect increases with increasing g. Surprisingly, the model predicts about the same size of order effect for blocked and random stimulus order.
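A minimal Monte Carlo sketch of the blocked conditions, assuming the simulation parameters listed in the Fig. 2 caption (s = 500 ms, comparisons from 400 to 600 ms in 20-ms steps, SD(X) = 50). Several choices here are our own assumptions and not the authors' procedure: the reference starts at the first stimulus of the session, each level is repeated 4,000 times to keep simulation noise small, and DL is estimated with a least-squares probit fit to the response proportions (a simpler stand-in for the logistic maximum likelihood fits used in the article). All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_blocked_irm(order, g, s=500.0, reps=4000, sd=50.0, rng=None):
    """Simulate one blocked IRM session; order is "sc" or "cs".

    Returns the comparison levels and P("first larger") at each level.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = np.arange(400.0, 601.0, 20.0)               # 11 comparison durations (ms)
    comps = rng.permutation(np.repeat(levels, reps))     # random trial sequence
    std = np.full(comps.size, s)
    mean1, mean2 = (std, comps) if order == "sc" else (comps, std)
    x1 = rng.normal(mean1, sd).tolist()                  # X_1 on every trial
    x2 = rng.normal(mean2, sd)                           # X_2 on every trial
    refs = []
    ref = x1[0]                                          # assumption: reference starts at first X_1
    for value in x1:
        ref = g * ref + (1.0 - g) * value                # Equation 1
        refs.append(ref)
    first_larger = np.asarray(refs) - x2 > 0             # D_n > 0 -> "first larger"
    p = np.array([first_larger[comps == c].mean() for c in levels])
    return levels, p


def probit_dl(levels, p):
    """DL as the distance from the PSE to the 75 % point of a least-squares probit fit."""
    unit = NormalDist()
    z = np.array([unit.inv_cdf(q) for q in np.clip(p, 1e-3, 1 - 1e-3)])
    slope, _intercept = np.polyfit(levels, z, 1)
    return unit.inv_cdf(0.75) / abs(slope)


def dl_ratio(g, seed=0):
    """Simulated DL_<sc> / DL_<cs> for blocked stimulus order."""
    rng = np.random.default_rng(seed)
    dl_sc = probit_dl(*simulate_blocked_irm("sc", g, rng=rng))
    dl_cs = probit_dl(*simulate_blocked_irm("cs", g, rng=rng))
    return dl_sc / dl_cs


for g in (0.0, 0.3, 0.5, 0.7):
    print(f"g = {g:.1f}: DL_sc / DL_cs = {dl_ratio(g):.2f}  (rough approximation: {1 - g:.2f})")
```

Because the spread of the comparisons also enters the reference on 〈cs〉 trials, the simulated ratios may come out slightly below the rough approximation 1 − g, which ignores that spread.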

Fig. 2

Predictions of the Internal Reference Model for the difference limen (DL) as a function of stimulus order and block type. According to this model, participants establish an internal reference I that is updated on every trial, such that on trial n, \( {{\mathbf{I}}_n} = g \cdot {{\mathbf{I}}_{{n - 1}}} + \left( {1 - g} \right) \cdot {{\mathbf{X}}_{{1,n}}} \) is a weighted sum of the internal reference on the previous trial and the first stimulus on the present trial with the constant weight g, 0 ≤ g < 1. Participants compare \( {{\mathbf{I}}_n} \) with the second stimulus \( {{\mathbf{X}}_{{2,n}}} \) on this trial and respond with “second interval larger” if \( {{\mathbf{X}}_{{2,n}}} > {{\mathbf{I}}_n} \) and with “first interval larger” otherwise. Estimates were obtained from Monte Carlo simulations. The simulations assumed that X is normally distributed with mean E(X) = s for the standard s and mean E(X) = c for the comparison c. The standard deviation of X was SD(X) = 50 in either case. The standard was set to 500 ms, and the comparison could take values from 400 to 600 ms in constant steps of 20 ms. The simulations mimicked the responses of a participant, and the obtained data were used to calculate psychometric functions, from which DLs were obtained. Results for weights g = 0.3 (left panel), g = 0.5 (middle panel), and g = 0.7 (right panel) are shown

Psychophysical data pertinent to these predictions are scarce and mixed. First, Nachmias (2006, Experiment 2) instructed five observers to discriminate between visual stimuli (e.g., contrast discrimination of Gabor patches). Nachmias’ data revealed an effect of stimulus order on discrimination performance. Specifically, the standard deviation of the fitted psychometric functions was 41 % larger for 〈cs〉 than for 〈sc〉 trials when these stimulus orders were blocked. When they were randomly intermixed, the effect of stimulus order was somewhat smaller (35 %; Table 1, p. 2461). Although the main effect of stimulus order was significant, neither the main effect of block type (blocked vs. random) nor the interaction was statistically reliable. Thus, these results are consistent with the predictions of IRM, although the absence of a reliable block-type effect might also reflect low statistical power, given the small sample of five observers. Second, Nahum, Daikhin, Lubin, Cohen, and Ahissar (2010) examined auditory two-tone frequency discrimination for blocked orders of s and c (i.e., 〈sc〉 and 〈cs〉), as well as for an alternating order of the two stimuli. For alternating stimulus order, they observed worse discrimination performance on 〈cs〉 trials than on 〈sc〉 trials, that is, a Type B effect. For blocked stimulus order, this effect was also present numerically, but it was much reduced and not statistically reliable. Thus, in contrast to Nachmias (2006) and to the predictions of IRM, these data suggest that mixing stimulus order across trials affects discrimination performance. This finding is in accordance with a variety of studies indicating that blocking versus mixing experimental conditions can have a substantial effect on performance, which is often attributed to the adoption of different strategies or decision criteria (e.g., Grice & Hunter, 1964; Mattes & Ulrich, 1997; Niemi & Näätänen, 1981; Rogers & Monsell, 1995; Sanders, 1998).

In addition to the Type B effect, IRM accounts for other findings in the literature. For example, it is a well-established finding that discrimination performance increases when the standard is presented repeatedly before the comparison process takes place (e.g., Drake & Botte, 1993; Ivry & Hazeltine, 1995; Schulze, 1989). According to IRM, this is because the signal-to-noise ratio of the internal reference tends to increase with each successive presentation of the standard (see Appendix 2 for a demonstration of IRM’s noise reduction property).
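The noise-reduction property can be made concrete for a block in which the standard is always the first stimulus. Assume that the representations \( {{\mathbf{X}}_{{1,n}}} \) are independent with variance \( {\sigma^2} \) and that the reference starts at the first representation (our assumption about initialization). Equation 1 then gives \( Var({{\mathbf{I}}_n}) = {g^2} \cdot Var({{\mathbf{I}}_{{n - 1}}}) + {(1 - g)^2}{\sigma^2} \), which converges to \( (1 - g){\sigma^2}/(1 + g) \), a value below \( {\sigma^2} \) whenever g > 0. A minimal sketch (function names hypothetical):

```python
def reference_variance(n_trials, g, sigma2=1.0):
    """Var(I_n) for n = 1..n_trials when the X_{1,n} are i.i.d. with variance sigma2.

    Assumes I_1 = X_{1,1}, so Var(I_1) = sigma2; afterwards Equation 1 gives
    Var(I_n) = g**2 * Var(I_{n-1}) + (1 - g)**2 * sigma2.
    """
    variances = [sigma2]
    for _ in range(1, n_trials):
        variances.append(g * g * variances[-1] + (1.0 - g) ** 2 * sigma2)
    return variances


def asymptotic_variance(g, sigma2=1.0):
    """Fixed point of the recursion above: (1 - g) * sigma2 / (1 + g)."""
    return (1.0 - g) * sigma2 / (1.0 + g)


for g in (0.0, 0.3, 0.5, 0.7):
    v = reference_variance(10, g)
    print(f"g = {g}: Var(I_10) = {v[-1]:.4f}, asymptote = {asymptotic_variance(g):.4f}")
```

With g = 0 the variance stays at \( {\sigma^2} \), which is the difference-model case without any benefit from earlier presentations of the standard.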

Predicted effects of the previous trial on PSE

IRM implies a less stable internal reference on blocked 〈cs〉 than on blocked 〈sc〉 trials because, in the former case, the variable comparison c is integrated into the internal reference (see Equation 1). This unstable reference not only worsens discrimination performance (i.e., larger DL), but also sequentially modulates participants’ judgments.

Specifically, for stimulus order 〈cs〉, the size of c on the previous trial must affect the size of the internal reference because, for this order, c is integrated into the internal reference. First, if c and, thus, X 1 were small on the preceding trial (say, c = 400 ms) relative to the standard (say, s = 500 ms), the internal reference I n on the present trial will also tend to be small. Hence, the standard in the second position of the current trial appears relatively large when compared with I n . Therefore, the magnitude of c on the present trial will be underestimated. As a consequence of such an underestimation, a larger value of c is necessary in order to yield a sensation equivalent to the one associated with s. Accordingly, PSE will be larger than s for these trials. Second, if c was large on the preceding trial (say, c = 600 ms), the internal reference I n on the present trial also tends to be large, leading to an overestimation of c on the present trial relative to s. Consequently, a smaller value of c suffices in order to yield a sensation equivalent to the one associated with s. This, in turn, leads to a PSE smaller than s for these trials.

In short, conditioning on the size of c on the preceding trial, the estimated PSE for stimulus order 〈cs〉 is greater than s if c on the preceding trial was small. Likewise, the estimated PSE for stimulus order 〈cs〉 is smaller than s if c on the preceding trial was large. For stimulus order 〈sc〉, c is not integrated into the internal reference, and thus, the magnitude of c on the preceding trial cannot affect the value of I n . Therefore, PSE should not depend on the magnitude of c on the preceding trial. To sum up, PSE should depend on the magnitude of c on the previous trial for stimulus order 〈cs〉, but not for stimulus order 〈sc〉.
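A back-of-the-envelope version of this prediction follows directly from Equation 1. On a blocked 〈cs〉 trial, \( E[{{\mathbf{I}}_n}] = {g^2}s + g(1 - g){c_{n - 1}} + (1 - g){c_n} \) if the reference two trials back averages to s (our simplifying assumption, reasonable when comparisons are symmetric around the standard). The PSE is the \( {c_n} \) for which \( E[{{\mathbf{I}}_n}] = s \), which gives \( PSE \approx s + g(s - {c_{n - 1}}) \). On 〈sc〉 trials the reference integrates only the standard, so PSE = s. The helper below is hypothetical and ignores all earlier trials:

```python
def approx_pse(order, s, g, c_prev):
    """Rough IRM prediction of the PSE on a blocked trial, given c_{n-1}.

    <cs>: PSE = s + g * (s - c_prev), assuming E[I_{n-2}] = s.
    <sc>: the reference integrates only the standard, so PSE = s.
    """
    if order == "cs":
        return s + g * (s - c_prev)
    if order == "sc":
        return s
    raise ValueError("order must be 'sc' or 'cs'")


print(approx_pse("cs", 500.0, 0.5, 400.0))   # small previous comparison
print(approx_pse("cs", 500.0, 0.5, 600.0))   # large previous comparison
print(approx_pse("sc", 500.0, 0.5, 400.0))   # <sc>: no sequence effect
```

Averaged over the small previous comparisons of 400 to 480 ms (mean 440 ms), the same approximation gives PSE ≈ s + 60 · g, so the predicted sequence effect grows with g.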

In order to illustrate this prediction, we analyzed the data for the blocked conditions from our Monte Carlo simulations as a function of stimulus order and the magnitude of c on the preceding trial. As can be seen in Fig. 3, IRM predicts a clear sequence effect on PSE for 〈cs〉 trials. PSE is greater if the comparison on the previous trial was small (i.e., c n − 1 < s) than if it was large (i.e., c n − 1 > s). This effect increases with increasing g. However, if the order of stimuli is reversed and the first stimulus is the constant standard s, as in the blocked 〈sc〉 condition, the internal reference and, hence, PSE do not depend on the magnitude of the comparison c on the previous trial, which enables an especially strong prediction.

Fig. 3

Predictions of the Internal Reference Model for the point of subjective equality (PSE) as a function of blocked stimulus order and previous comparison magnitude. PSE estimates were obtained from Monte Carlo simulations. Results for weights g = 0.3 (left panel), g = 0.5 (middle panel), and g = 0.7 (right panel) are shown. For 〈cs〉 trials, on which the variable comparison c precedes the constant standard s, the model predicts an effect of the comparison magnitude on the previous trial on PSE, and this effect increases with increasing g. For 〈sc〉 trials, though, the model predicts no effect of comparison sequence

Aim of the present study

The present experiments were designed to test the two major predictions of IRM outlined above. First, IRM predicts smaller DLs, that is, better discrimination for stimulus order 〈sc〉 than for 〈cs〉. This effect should be about the same for blocked and random stimulus order. Second, IRM predicts sequential effects of the previous comparison magnitude on PSE for blocked stimulus order 〈cs〉, but not for 〈sc〉. To evaluate these predictions, participants performed an auditory (Experiment 1) or visual (Experiment 2) two-interval duration discrimination task (Grondin, 2010) under three conditions: (1) s always preceded c within a single experimental session (i.e., 〈sc〉 blocked), (2) s always followed c (i.e., 〈cs〉 blocked), and (3) stimulus order was random (i.e., 〈sc〉 and 〈cs〉 intermixed). The three conditions differed only in stimulus order; the stimuli themselves were physically identical across all conditions. The present experiments therefore enable a systematic and comprehensive analysis of the Type B effect and of sequential effects on PSE.

Experiment 1

Method

Participants

Seventeen female and 7 male volunteers with normal hearing (mean age: 24.5 ± 6.8 years) participated in three experimental sessions on separate days. Each session lasted approximately 60 min. All participants were naïve about the purpose of the study and were reimbursed for participating in the experiment. One participant was replaced because of too many trials without a response.

Apparatus and stimuli

The experiment was run in a sound-attenuated booth. A Mac Pro 3.1 (Apple, Inc.) controlled both stimulus presentation and response recording. Instructions and feedback appeared on a computer screen (Samsung SyncMaster 1100 MB, 1,024 × 768 pixels, 150 Hz), placed approximately 60 cm from the participant. The “y” (i.e., “z” on a QWERTY keyboard) and “m” keys of a standard Apple QWERTZ USB keyboard served as the left and right response keys, respectively.Footnote 6 The experiment was programmed in MATLAB (The MathWorks, Inc., Version R2009a) using the Psychophysics Toolbox 3.0.8 (Brainard, 1997; Pelli, 1997). The auditory stimuli were filled intervals of white noise, sine ramped and damped with rise and fall times of 10 ms, and were presented binaurally through headphones (Sennheiser HD 212Pro) at a peak level of 65 dB SPL. A new interval of white noise was generated for each stimulus on each trial.

Procedure

Participants performed a duration discrimination task. On each trial, two auditory intervals were presented in succession. One of these intervals had a constant duration of 500 ms (standard s), and the other interval had a variable duration ranging from 400 to 600 ms in constant steps of 20 ms (comparison c). A trial started with the presentation of the first stimulus. After an interstimulus interval of 1,000 ms, the second stimulus was presented. Following stimulus presentation, participants pressed the left response key with their left index finger to indicate that they judged the first stimulus as being longer than the second one and the right response key with their right index finger to indicate that they judged the second stimulus as being longer than the first one. Immediately after the response, participants received feedback, displayed for 400 ms at the center of the screen, about which stimulus was physically longer. A “1” or “2” indicated that the first or second stimulus was longer, respectively, and an “=” sign indicated that the two stimuli were physically identical in duration. If participants did not respond within 5,000 ms after the offset of the second stimulus, the trial was terminated, and “zu langsam” (too slow) was displayed for 800 ms on the screen. After an intertrial interval of 1,600 ms, the next trial began.

There were three conditions, tested in separate sessions. The order of sessions was counterbalanced across participants. In the 〈sc〉 blocked condition, the first stimulus was s, and the second stimulus was c, so the temporal order of stimuli was 〈sc〉 on each trial. In the 〈cs〉 blocked condition, the temporal order of stimuli was reversed; that is, the first stimulus was c, and the second stimulus was s on each trial. In the random condition (〈sc〉 and 〈cs〉 intermixed), stimulus order was determined randomly on each trial; that is, the two possible orders 〈sc〉 and 〈cs〉 occurred randomly intermixed within a single session. Note that the stimuli were physically identical in all conditions; only the order of stimuli and block type (i.e., blocked vs. random) differed. In all conditions, participants received the same written instruction displayed on the computer screen: to indicate by keypress which of the two stimuli (the first or the second) was longer. Participants were not informed about the procedural details of the experiment. For example, they were not told that there was a constant and a variable stimulus on each trial. In addition, they were not told whether stimulus order was blocked or random. A postexperimental interview revealed that participants were not aware of these conditions.

In the blocked conditions, each of the 11 levels of c was presented 60 times, resulting in a total of 660 trials. To equalize the total number of trials per condition in the random condition, each of the 11 levels of c was presented 30 times for the stimulus order 〈sc〉 and 30 times for 〈cs〉. Participants could take a short rest after every 110 trials. At the beginning of each session, there was a practice block of 22 trials, and participants were informed when this block was finished (each of the 11 levels of c was presented twice in the blocked conditions and once for each stimulus order in the random condition). Practice trials did not enter into the data analysis.
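The trial counts above can be reproduced with a small session generator. This is a hypothetical sketch: the text states only the counts, so the randomization scheme (one random permutation per session) and all names are our assumptions.

```python
import random

STANDARD_MS = 500
LEVELS_MS = list(range(400, 601, 20))   # 11 comparison durations: 400, 420, ..., 600


def make_session(condition, reps=60, seed=None):
    """Return a shuffled list of (first_ms, second_ms) trials for one session.

    condition: "sc_blocked", "cs_blocked", or "random".
    reps: presentations of each comparison level in a blocked session; the
    random session uses reps // 2 per stimulus order so the totals match.
    """
    rng = random.Random(seed)
    sc = [(STANDARD_MS, c) for c in LEVELS_MS]
    cs = [(c, STANDARD_MS) for c in LEVELS_MS]
    if condition == "sc_blocked":
        trials = sc * reps
    elif condition == "cs_blocked":
        trials = cs * reps
    elif condition == "random":
        trials = (sc + cs) * (reps // 2)
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    rng.shuffle(trials)
    return trials


def rest_blocks(trials, block_size=110):
    """Split a session into the segments separated by short rests."""
    return [trials[i:i + block_size] for i in range(0, len(trials), block_size)]


main = make_session("random", seed=1)
practice = make_session("random", reps=2, seed=2)   # 22 practice trials
print(len(main), len(rest_blocks(main)), len(practice))
```

Passing `reps=2` yields the 22 practice trials in every condition: 11 levels twice in a blocked session, or once per order in the random session.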

Design and dependent variables

The data from the random condition were analyzed separately for the two stimulus orders. Thus, there was a stimulus order (〈sc〉 vs. 〈cs〉) × block type (blocked vs. random) within-subjects design. The dependent variables were the DL and the PSE. In order to assess whether participants traded speed against accuracy, we also measured response time (RT) from the offset of the second interval to the onset of response.

Psychometric functions and estimation of DL and PSE

For the blocked conditions, a separate logistic psychometric function for each stimulus order (i.e., 〈sc〉 and 〈cs〉) was fitted to individual data sets using a maximum likelihood procedure (for an implementation, see Bausenhart, Dyjas, Vorberg, & Ulrich, 2012). In this logistic psychometric function

$$ F(c) = \frac{1}{{1 + \exp\left[ { - \left( {c - a} \right)/b} \right]}}, $$

c denotes the duration of the comparison, a is the PSE, and b > 0 is a spread parameter that is inversely related to the slope and, thus, indexes discrimination performance: DL = ln(3)·b, that is, half the distance between the points at which F(c) = .25 and F(c) = .75 (e.g., Bush, 1967, p. 448). For the random conditions, DL and PSE were estimated under a constraint that forces the psychometric functions averaged over stimulus orders to pass through the point (s, 0.5) (Ulrich, 2010; Ulrich & Vorberg, 2009).
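For readers who want to reproduce such a fit, the sketch below estimates a and b by maximum likelihood using Newton-Raphson iterations on the binomial log-likelihood (iteratively reweighted least squares). It is an illustration under our own choices (starting values, convergence criterion, hypothetical function names), not the implementation of Bausenhart et al. (2012), and it does not impose the (s, 0.5) constraint used for the random condition.

```python
import math

import numpy as np


def logistic_pf(c, a, b):
    """F(c) = 1 / (1 + exp(-(c - a) / b)) with PSE a and spread b > 0."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(c, dtype=float) - a) / b))


def fit_logistic_pf(c, n_yes, n_total, max_iter=100, tol=1e-10):
    """Maximum-likelihood estimates (a, b) via Newton-Raphson / IRLS.

    Internally fits logit F = beta0 + beta1 * (c - c0) and converts back:
    b = 1 / beta1 and a = c0 - beta0 * b.
    """
    c = np.asarray(c, dtype=float)
    y = np.asarray(n_yes, dtype=float)
    n = np.asarray(n_total, dtype=float)
    c0 = c.mean()                                        # centering improves conditioning
    X = np.column_stack([np.ones_like(c), c - c0])
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - n * p)                         # score of the binomial log-likelihood
        hess = X.T @ (X * (n * p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    b = 1.0 / beta[1]
    return c0 - beta[0] * b, b


levels = np.arange(400.0, 601.0, 20.0)
n_total = np.full(levels.size, 60)
n_yes = n_total * logistic_pf(levels, 510.0, 40.0)       # idealized, noise-free "data"
a_hat, b_hat = fit_logistic_pf(levels, n_yes, n_total)
print(f"PSE = {a_hat:.1f} ms, DL = ln(3) * b = {math.log(3) * b_hat:.1f} ms")
```

With noise-free proportions the fit recovers the generating values, and F evaluated at a + DL equals .75, as the definition of DL requires.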

Results and discussion

Figure 4 shows the data for each participant and the fitted psychometric functions for stimulus orders 〈sc〉 and 〈cs〉 in the blocked conditions. For almost all participants, the estimated psychometric function for stimulus order 〈cs〉 is shallower than the one for stimulus order 〈sc〉, indicating worse discrimination performance for 〈cs〉 trials, that is, a pronounced Type B effect. As one may expect, the pattern of results varies across participants. First, the discrimination performance differs greatly between participants. Second, for some participants, a strong Type B effect can be observed (e.g., participants 6, 13, 17, and 24), whereas for others, this effect is only weak or even absent (e.g., participants 1, 3, 9, and 20). As was noted by a reviewer, it might be possible that participants fall into two groups, such that one small group does not show the Type B effect, whereas the other group does. Within the framework of IRM, this would mean that one group of participants relies only on the stimulus information present on the current trial (i.e., g = 0), whereas the other group relies on an internal reference in order to additionally use more remote stimulus information (i.e., g > 0). It remains to be tested whether such interpersonal differences exist and whether they are stable or can be changed strategically. For some participants, a marked shift of the psychometric function is also visible (e.g., participants 4, 11, and 16).

Fig. 4

Data sets from the blocked conditions of Experiment 1 (auditory duration discrimination task). For each of the 24 participants, data and fitted psychometric functions for stimulus orders 〈sc〉 and 〈cs〉 are shown. The digit in the lower right corner gives the participant number. The vertical and horizontal dotted lines indicate c = s = 500 ms and p = 0.5, respectively

Figure 5 depicts the data for the random condition. As in the blocked conditions, the psychometric functions for stimulus order 〈cs〉 are shallower than those for 〈sc〉, again indicating inferior discriminability on 〈cs〉, as compared with 〈sc〉, trials. As before, the pattern of results varies across participants. Remarkably, each participant’s data pattern is quite similar for blocked and random stimulus order, which indicates that the individual effects are stable across sessions and conditions.

Fig. 5

Data sets from the random condition of Experiment 1. For each of the 24 participants, data and fitted psychometric functions for stimulus orders 〈sc〉 and 〈cs〉 are shown. Note that in contrast to the data shown in Fig. 4, for these data the average of the two psychometric functions must pass through the point (500, 0.5)

Analysis for effects of stimulus order and block type

From the psychometric functions depicted in Figs. 4 and 5, DLs and PSEs were estimated for each participant (see Bausenhart et al., 2012). Separate analyses of variance (ANOVAs) were performed for DL, PSE, and RT. Figure 6a shows mean DL as a function of stimulus order and block type. As was evident from the data for individual participants and in line with IRM’s predictions and Nachmias’ (2006, Table 1) results, discrimination performance was better for the stimulus order 〈sc〉 (DL = 53 ms) than for 〈cs〉 (DL = 104 ms), F(1,23) = 13.57, p = .001, η 2p = .37. Neither the main effect of block type nor the interaction of the two factors was significant (both Fs < 1). The Weber fraction, DL/s, was 0.11 and 0.21 for stimulus order 〈sc〉 and 〈cs〉, respectively. This amounts to a meaningful increase of almost 100 %.
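The effect sizes and Weber fractions reported here can be checked from the test statistics and means alone. For an effect with \( d{f_1} \) numerator and \( d{f_2} \) error degrees of freedom, partial eta squared can be recovered from F as \( \eta_p^2 = F \cdot d{f_1}/(F \cdot d{f_1} + d{f_2}) \); this is a standard identity rather than a formula given in the article. The function names below are hypothetical:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def weber_fraction(dl_ms, standard_ms=500.0):
    """Relative DL (DL / s)."""
    return dl_ms / standard_ms


print(round(partial_eta_squared(13.57, 1, 23), 2))   # stimulus order on DL
print(round(weber_fraction(53), 2), round(weber_fraction(104), 2))
print(f"relative increase in DL: {104 / 53 - 1:.0%}")
```

The same identity applies to the effect sizes reported in the sequence analysis below, since all effects there also have 1 and 23 degrees of freedom.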

Fig. 6

Mean difference limen (DL; top panel, a), mean point of subjective equality (PSE; middle panel, b), and mean response time (RT; bottom panel, c) ± 1 · SE as a function of stimulus order and block type for Experiment 1. The standard errors (SEs) of the means were computed according to Cousineau (2007)
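Within-subject standard errors of this kind remove between-participant variability before the SE is computed. The sketch below follows the participant-mean normalization commonly associated with Cousineau's method: each participant's scores are centered on that participant's mean and shifted to the grand mean. The exact variant used for the figures may differ, and the data in the example are hypothetical.

```python
import numpy as np


def within_subject_se(data):
    """Within-subject SE per condition via participant-mean normalization.

    data: array of shape (participants, conditions).
    """
    data = np.asarray(data, dtype=float)
    normalized = data - data.mean(axis=1, keepdims=True) + data.mean()
    return normalized.std(axis=0, ddof=1) / np.sqrt(data.shape[0])


# Hypothetical DLs (ms) for 3 participants x 2 conditions (<sc>, <cs>).
dls = [[40.0, 90.0], [60.0, 105.0], [55.0, 120.0]]
print(within_subject_se(dls))
```

If participants differ only by a constant offset, the normalized scores coincide and the within-subject SE is zero, although the ordinary between-subject SE would not be.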

In contrast to Nachmias’ (2006, Table 2) results,Footnote 7 PSE did not differ significantly for the two stimulus orders, F(1,23) = 2.15, p = .16 (Fig. 6b). Again, neither the effect of block type nor the interaction of stimulus order and block type was significant (both Fs < 1). The average PSE across participants and conditions was equal to 502 ms. Analysis of RT suggests that the DL difference between 〈sc〉 and 〈cs〉 is not due to a speed–accuracy trade-off (Fig. 6c). Neither stimulus order, F(1,23) = 2.29, p = .14, nor block type, F(1,23) = 1.30, p = .27, nor the interaction of these factors, F < 1, significantly influenced RT.

Table 2 Numerical example to illustrate the decrement of the variance of the internal reference \( Var\left( {\left. {{{\mathbf{I}}_n}} \right|\left\langle {sc} \right\rangle } \right) \) with increasing number of trials n for various values of g. For this example, σ 2 = 1. As is evident, \( Var\left( {\left. {{{\mathbf{I}}_n}} \right|\left\langle {sc} \right\rangle } \right) \) converges rapidly and typically approaches the asymptote after fewer than 10 trials

Analysis for trial sequence effects

As was discussed in the introduction, IRM predicts a specific pattern of sequence effects, that is, effects of the comparison magnitude on the previous trial on discrimination on the current trial (cf. Fig. 3). We therefore analyzed data from trials for which the magnitude of the comparison on the previous trial c n − 1 was small (i.e., c n − 1 < s) and from trials for which it was large (i.e., c n − 1 > s) separately. Only data from the blocked conditions could be used for this analysis, because in the random condition, not only the magnitude of c n − 1, but also the order of s and c varies. Trials for which the magnitude of c n − 1 was physically identical to the standard were excluded. Thus, separate logistic psychometric functions were estimated for the two previous comparison magnitudes (i.e., small vs. large), and this was done for each blocked condition (〈sc〉 and 〈cs〉). From these psychometric functions, DL and PSE were computed. In addition, RT was calculated. Each dependent measure was submitted to a separate ANOVA with the two factors previous comparison magnitude (small vs. large) and stimulus order (〈sc〉 vs. 〈cs〉).
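The split by previous comparison magnitude can be sketched as a pair of boolean masks over the trial sequence of one blocked session; separate psychometric functions are then fitted to the trials selected by each mask. This is a hypothetical helper: the text does not state whether the preceding trial was tracked across rest breaks, so the sketch simply uses the immediately preceding trial of the session.

```python
import numpy as np


def previous_comparison_masks(comps, s=500.0):
    """Masks for trials whose previous comparison was smaller / larger than s.

    comps: comparison durations in presentation order (one blocked session).
    The first trial (no predecessor) and trials with c_{n-1} == s fall into neither set.
    """
    comps = np.asarray(comps, dtype=float)
    small = np.zeros(comps.size, dtype=bool)
    large = np.zeros(comps.size, dtype=bool)
    small[1:] = comps[:-1] < s
    large[1:] = comps[:-1] > s
    return small, large


small, large = previous_comparison_masks([400, 600, 500, 420, 580])
print(small.tolist())
print(large.tolist())
```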

Figure 7b depicts PSE as a function of stimulus order and previous comparison magnitude. The ANOVA for PSE revealed a significant main effect of previous comparison magnitude, F(1,23) = 6.96, p = .015, η 2p = .23, and PSE was greater (513 ms) if the previous comparison was small than if it was large (491 ms). This main effect, however, should not be interpreted meaningfully, because it resulted from an interaction of stimulus order and previous comparison magnitude, F(1,23) = 13.72, p = .001, η 2p = .37. Consistent with the prediction of IRM, PSE for 〈cs〉 trials was greater if the previous comparison was small than if it was large. For 〈sc〉 trials, the magnitude of the previous comparison exerted virtually no influence on PSE. The main effect of stimulus order did not reach significance, F(1,23) = 1.29, p = .268.

Fig. 7

Mean difference limen (DL; top panel, a), mean point of subjective equality (PSE; middle panel, b), and mean response time (RT; bottom panel, c) ± 1 · SE as a function of blocked stimulus order (〈sc〉 vs. 〈cs〉) and previous comparison magnitude (small [i.e., \( c_{n-1} < s \)] vs. large [i.e., \( c_{n-1} > s \)]) for Experiment 1. The standard errors (SEs) of the means were computed according to Cousineau (2007)

Consistent with the previous overall analysis of DL, DL was smaller for 〈sc〉 trials (45 ms) than for 〈cs〉 trials (103 ms), F(1,23) = 11.61, p = .002, ηp² = .34, again indicating a Type B effect (cf. Fig. 7a). There was a trend toward better discrimination performance when the previous comparison was large (DL = 69 ms) than when it was small (DL = 79 ms), F(1,23) = 4.03, p = .057, ηp² = .15. The interaction of stimulus order and previous comparison magnitude was not significant, F < 1.

The RT analysis suggested that these results are not due to a speed–accuracy trade-off (see Fig. 7c): neither the main effects of stimulus order and previous comparison magnitude (both Fs < 1) nor their interaction, F(1,23) = 1.60, p = .219, was significant.

Association between Type B effect and trial sequence effect

It is also of interest whether the Type B effect (i.e., an effect of stimulus order on DL) and the trial sequence effect (i.e., an effect of the comparison on the previous trial on PSE) are associated. If both effects arise from a common underlying mechanism, their magnitudes should covary across participants. We conducted a correlational analysis to test for such an association. Because sequential effects can be examined only for blocked stimulus order, this analysis was restricted to the blocked conditions. To quantify the Type B effect for each participant, we subtracted DL for 〈sc〉 trials from DL for 〈cs〉 trials, such that \( \text{Type B effect} = DL_{cs} - DL_{sc} \). Similarly, to quantify the trial sequence effect, we subtracted PSE for trials with a large c on the preceding trial from PSE for trials with a small c on the preceding trial, such that \( \text{sequence effect} = PSE_{c_{n-1} < s} - PSE_{c_{n-1} > s} \). The correlation between these two measures was r = .89, t(22) = 8.09, p < .001, indicating that the Type B effect and the trial sequence effect are indeed associated, which supports the notion of a common underlying mechanism.
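The two difference scores and the correlation test can be sketched as follows. This is a minimal illustration, assuming per-participant DLs and PSEs from the blocked conditions are already available as equally long lists; the function names are hypothetical, and the t statistic is the standard test of a Pearson correlation against zero with n − 2 degrees of freedom.

```python
import math


def type_b_effect(dl_cs, dl_sc):
    """Per-participant Type B effect: DL for <cs> minus DL for <sc> (blocked conditions)."""
    return dl_cs - dl_sc


def sequence_effect(pse_after_small, pse_after_large):
    """Per-participant sequence effect: PSE after c_{n-1} < s minus PSE after c_{n-1} > s."""
    return pse_after_small - pse_after_large


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def association(dl_cs, dl_sc, pse_small, pse_large):
    """Returns (r, t, df) for the Type B effect vs. the sequence effect across participants."""
    tb = [type_b_effect(a, b) for a, b in zip(dl_cs, dl_sc)]
    seq = [sequence_effect(a, b) for a, b in zip(pse_small, pse_large)]
    r = pearson_r(tb, seq)
    return r, r_to_t(r, len(tb)), len(tb) - 2
```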

In summary, Experiment 1 revealed a strong Type B effect that was of the same size for blocked and random stimulus orders. This result is consistent with the predictions of IRM and suggests that g does not differ between blocked and random stimulus presentation. Furthermore, and also in line with IRM, sequential effects on PSE were observed for stimulus order 〈cs〉 but not for stimulus order 〈sc〉.

Experiment 2

To assess whether the effects obtained in Experiment 1 are specific to the auditory modality or generalize to other modalities, Experiment 2 employed a visual duration discrimination task.

Method

Participants

A new sample of 24 female participants (mean age: 20.1 ± 2.4 years) took part in three sessions on separate days.

Apparatus, stimuli, and procedure

The apparatus was identical to that of Experiment 1, except that no headphones were used. The visual stimuli were light gray discs (diameter: 50 pixels; 49.2 cd/m²) presented on the computer screen against a dark gray background (5.4 cd/m²). Procedure and time course were identical to those of Experiment 1, except that the range of c was widened to compensate for the poorer discrimination performance in the visual modality (e.g., Grondin, 2001). Specifically, the duration of c ranged from 300 to 700 ms in steps of 40 ms, whereas the duration of s was again 500 ms.

Results and discussion

Figures 8 and 9 depict the individual psychometric functions in the blocked and random conditions, respectively.

Fig. 8

Data sets from the blocked conditions of Experiment 2 (visual duration discrimination task). For each of the 24 participants, data and fitted psychometric functions for stimulus orders 〈sc〉 and 〈cs〉 are shown. The number in the lower right corner identifies the participant. The vertical and horizontal dotted lines indicate c = s = 500 ms and p = 0.5, respectively

Fig. 9

Data sets from the random condition of Experiment 2. For each of the 24 participants, data and fitted psychometric functions for stimulus orders 〈sc〉 and 〈cs〉 are shown. Note that, in contrast to the data shown in Fig. 8, the average of the two psychometric functions for these data must pass through the point (500, 0.5)

Analysis for effects of stimulus order and block type

As in Experiment 1, a strong Type B effect was obtained: DL was about twice as large for stimulus order 〈cs〉 (207 ms) as for 〈sc〉 (98 ms), F(1,23) = 10.28, p = .004, ηp² = .31 (cf. Fig. 10a). In contrast to Experiment 1, DL was reliably larger in the random (178 ms) than in the blocked (127 ms) condition, F(1,23) = 5.62, p = .027, ηp² = .20. Within IRM, this increase could reflect a larger g for random stimulus order, which would indicate that participants rely more heavily on the internal reference in this condition. The interaction was not significant, F(1,23) = 2.61, p = .120. The analogous ANOVA on PSE revealed no significant effects, Fs < 1 (Fig. 10b).

RT was longer on 〈sc〉 than on 〈cs〉 trials (497 vs. 458 ms, respectively; Fig. 10c), F(1,23) = 5.61, p = .027, ηp² = .20. RTs were also longer in the random than in the blocked condition (499 vs. 456 ms, respectively), although this effect was only marginally significant, F(1,23) = 3.16, p = .089, ηp² = .12. The two factors did not interact, F < 1. The RT results suggest that discrimination in the random condition is more demanding than in the blocked condition. Surprisingly, RTs were longer for 〈sc〉 trials than for 〈cs〉 trials. A speculative yet plausible explanation for this finding is that, for stimulus order 〈cs〉, participants might sometimes decide which stimulus is longer as soon as the first stimulus, that is, the variable c, has been presented. Accordingly, they might respond faster for this stimulus order than for order 〈sc〉, in which the first stimulus never conveys sufficient information about the required response.

Fig. 10

Mean difference limen (DL; top panel, a), mean point of subjective equality (PSE; middle panel, b), and mean response time (RT; bottom panel, c) ± 1 · SE as a function of stimulus order and block type for Experiment 2. The standard errors (SEs) of the means were computed according to Cousineau (2007)

Analysis for trial sequence effects

As in Experiment 1, we separately analyzed the data of the blocked conditions from trials on which c on the previous trial was small and from trials on which it was large. Consistent with the predictions of IRM and with the results of Experiment 1, the ANOVA for PSE revealed a significant interaction of stimulus order and previous comparison magnitude, F(1,23) = 10.33, p = .004, ηp² = .31 (Fig. 11b). Separate paired t tests were performed for the two stimulus orders. For 〈sc〉 trials, PSE did not differ significantly between trials with a small and trials with a large c on the preceding trial, t(23) = −1.70, p = .103. For 〈cs〉 trials, however, PSE was significantly greater after a small than after a large c on the preceding trial, t(23) = 2.13, p = .044. The main effects of stimulus order and previous comparison magnitude were not significant, Fs < 1.

Consistent with the analysis above, discrimination performance as indexed by DL was better on 〈sc〉 than on 〈cs〉 trials (90 vs. 172 ms, respectively; Fig. 11a), F(1,23) = 9.14, p = .006, ηp² = .28. There were no other significant effects on DL, Fs < 1. The ANOVA on RT revealed no significant effect of stimulus order, F(1,23) = 1.61, p = .217, or of previous comparison magnitude, F(1,23) = 2.63, p = .119, and no interaction of these factors, F < 1 (Fig. 11c).
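The per-order comparisons of PSE after a small versus a large preceding c are within-participant, which is consistent with the reported df of 23 for 24 participants. A minimal sketch of the paired-samples t statistic, with a hypothetical function name and per-participant PSEs passed as two equally long lists:

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for per-participant values x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two participants")
    d = [a - b for a, b in zip(x, y)]        # within-participant differences
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

For example, `paired_t(pse_after_small_cs, pse_after_large_cs)` would give the 〈cs〉 test; the p value would then be read from a t distribution with the returned df.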

Fig. 11

Mean difference limen (DL; top panel, a), mean point of subjective equality (PSE; middle panel, b), and mean response time (RT; bottom panel, c) ± 1 · SE as a function of blocked stimulus order (〈sc〉 vs. 〈cs〉) and previous comparison magnitude (small [i.e., \( c_{n-1} < s \)] vs. large [i.e., \( c_{n-1} > s \)]) for Experiment 2. The standard errors (SEs) of the means were computed according to Cousineau (2007)

Association between Type B effect and trial sequence effect

The correlation between the Type B effect and the trial sequence effect was smaller than in Experiment 1, r = .30, and only marginally significant, t(22) = 1.36, p = .093, indicating a weaker association between the two effects. This might be due to noisier data in the visual duration discrimination task of Experiment 2, as compared with the auditory task of Experiment 1.

In sum, Experiment 2 again revealed a strong Type B effect and, for stimulus order 〈cs〉, a sequential dependency of PSE on previous comparison magnitude. These results are thus consistent with those of Experiment 1 and with the predictions of IRM. In contrast to Experiment 1, however, DLs were larger in the random than in the blocked condition.

General discussion

The idea that an internal reference or internal standard builds up across trials in a discrimination task has been put forward by several authors (e.g., Durlach & Braida, 1969; Helson, 1947, 1964; Michels & Helson, 1954; Morgan et al., 2000; Nachmias, 2006). The IRM proposed here provides an especially simple and plausible mechanism for how such an internal reference emerges: a geometric moving average with a single free parameter, the weight g. The present experiments were designed to evaluate two major predictions of IRM that hold regardless of the specific value of g.
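To make the term "geometric moving average" concrete, assume (Equation 1 is not reproduced in this section) that the simple IRM updates the reference with the first stimulus of each trial, \( \mathbf{I}_n = g \cdot \mathbf{I}_{n-1} + (1-g) \cdot \mathbf{X}_{1,n} \), starting from some initial value \( \mathbf{I}_0 \). Unrolling this recursion gives

$$ \mathbf{I}_n = g^{n} \cdot \mathbf{I}_0 + (1-g) \sum_{k=0}^{n-1} g^{k} \cdot \mathbf{X}_{1,n-k}, \qquad g^{n} + (1-g) \sum_{k=0}^{n-1} g^{k} = g^{n} + \left( 1 - g^{n} \right) = 1 $$

so the first stimulus presented k trials back receives the weight \( (1-g)\,g^{k} \), which decays geometrically with k, and all weights sum to one. A small g makes the reference track the most recent first stimulus closely; a large g spreads the weight over a longer history.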

Evaluation of IRM’s major predictions

According to IRM’s first major prediction, and in contrast to standard difference models (e.g., Luce & Galanter, 1963), discrimination performance as indexed by DL is better when the constant standard s precedes, rather than follows, the variable comparison c. Consistent with this predicted Type B effect, DL was considerably larger for stimulus order 〈cs〉 than for stimulus order 〈sc〉. This effect was observed both in an auditory (Experiment 1) and in a visual (Experiment 2) duration discrimination task. In addition, the Type B effect had about the same magnitude for random as for blocked stimulus orders, a result that is also consistent with IRM.

Although a large body of literature on order effects exists, previous studies have focused primarily on the effects of stimulus order on PSE, that is, on the classical time-order error (for a review, see Eisler, Eisler, & Hellström, 2008). Nonetheless, there is also some evidence for the Type B effect, not only in temporal discrimination but across a variety of tasks and modalities (cf. Table 1). Most of the studies reviewed in the introduction reported better discrimination for stimulus order 〈sc〉 than for 〈cs〉. To our knowledge, the only exceptions (i.e., reversed Type B effects) were reported for very brief stimuli and interstimulus intervals (Hellström, 2003, Table 7; Hellström & Rammsayer, 2004). These findings are inconsistent with the predictions of the current version of IRM. The mechanism underlying discrimination of brief stimuli presented with short interstimulus intervals may be special; for example, memory processes and interference between stimuli might play a crucial role. Future extensions of IRM may incorporate such processes (e.g., by allowing for negative weighting of prior experience, that is, −1 < g < 1) in order to explain an even broader range of phenomena.Footnote 8

Regardless of whether discrimination performance is better for 〈sc〉 or for 〈cs〉 trials, the theoretical significance of such Type B effects has often been neglected in previous research, presumably because standard difference models cannot account for any influence of stimulus order on discrimination performance. Nevertheless, these difference models form the basis of several prominent psychophysical theories, including Signal Detection Theory. Because of its dynamic updating mechanism, IRM directly implies effects of stimulus order on discrimination performance and is therefore a promising extension of standard difference models.

A second major prediction of IRM concerns sequential dependencies of PSE in the blocked conditions. As outlined in the introduction (see also Fig. 3), IRM predicts that the magnitude of c on the preceding trial modulates the judged stimulus magnitude on the current trial. Specifically, for stimulus order 〈cs〉, a large c on the previous trial increases the internal reference. Because the present c then enters an internal reference that is already elevated, its magnitude is overestimated relative to s, and PSE becomes smaller than s. Likewise, a small c on the previous trial decreases the internal reference, so that c on the present trial is underestimated, leading to a PSE larger than s. On 〈sc〉 trials, in contrast, c does not enter into the internal reference, and therefore the magnitude of c on the preceding trial should not modulate PSE on the present trial. This second major prediction of IRM was also confirmed empirically in Experiments 1 and 2.
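Both predictions can be checked with a small Monte Carlo sketch in Python. It is a hypothetical illustration, not a reproduction of the authors’ simulations, and it rests on several assumptions: the simple IRM updates the reference with the first stimulus of each trial, I_n = g·I_{n−1} + (1 − g)·X_{1,n}; the second stimulus is compared with I_n; an internal representation equals the stimulus duration plus Gaussian noise with SD `sigma`; and the comparison set of Experiment 2 (300 to 700 ms in 40-ms steps), g = 0.5, and sigma = 50 ms are borrowed purely for illustration. PSE and DL come from logistic fits, with DL assumed to be half the distance between the 25% and 75% points.

```python
import math
import random


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(levels, k_longer, n_total, iters=100):
    """Binomial ML fit of p(c) = 1 / (1 + exp(-(a + b*c))) by Newton-Raphson."""
    cm = sum(levels) / len(levels)
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for c, k, n in zip(levels, k_longer, n_total):
            z = c - cm
            p = _sigmoid(a + b * z)
            r, w = k - n * p, n * p * (1.0 - p)
            ga, gb = ga + r, gb + r * z
            haa, hab, hbb = haa + w, hab + w * z, hbb + w * z * z
        det = haa * hbb - hab * hab
        da, db = (hbb * ga - hab * gb) / det, (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return a - b * cm, b


def simulate_blocked(order, g, sigma, levels, s=500.0, n_trials=60000, seed=1):
    """Hypothetical IRM Monte Carlo for one blocked order ('sc' or 'cs').

    Returns {'small'|'large': {c: [times c judged longer, trials]}}, grouped by c_{n-1}.
    """
    rng = random.Random(seed)
    counts = {grp: {c: [0, 0] for c in levels} for grp in ("small", "large")}
    ref = s                                   # assumed starting value of the internal reference
    c_prev = None
    for _ in range(n_trials):
        c = rng.choice(levels)
        first, second = (s, c) if order == "sc" else (c, s)
        x1 = first + rng.gauss(0.0, sigma)
        x2 = second + rng.gauss(0.0, sigma)
        ref = g * ref + (1.0 - g) * x1        # only the first stimulus updates I_n
        second_judged_longer = x2 > ref
        c_longer = second_judged_longer if order == "sc" else not second_judged_longer
        if c_prev is not None and c_prev != s:  # skip first trial and c_{n-1} == s
            grp = "small" if c_prev < s else "large"
            counts[grp][c][0] += int(c_longer)
            counts[grp][c][1] += 1
        c_prev = c
    return counts


def pse_and_dl(table):
    levels = sorted(table)
    a, b = fit_logistic(levels, [table[c][0] for c in levels], [table[c][1] for c in levels])
    return -a / b, math.log(3.0) / b          # 50% point; half the 25%-75% distance


levels = list(range(300, 701, 40))
for order in ("sc", "cs"):
    res = {grp: pse_and_dl(t) for grp, t in simulate_blocked(order, 0.5, 50.0, levels).items()}
    print(order, {grp: (round(p), round(d)) for grp, (p, d) in res.items()})
```

Under these assumptions, the 〈cs〉 PSEs should differ markedly between the two groups (larger after a small preceding c), the 〈sc〉 PSEs should not, and DL should be larger for 〈cs〉 than for 〈sc〉, mirroring the sequence effect and the Type B effect described above.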

It should be noted, however, that a more complex version of IRM can predict a similar pattern of results. This more complex version assumes that not only the first but also the second stimulus of each trial is integrated into the internal reference. For example, the internal reference \( \mathbf{I}_n \) on trial n could alternatively be established as

$$ \mathbf{I}_n = g \cdot \left[ g \cdot \mathbf{I}_{n-1} + (1-g) \cdot \mathbf{X}_{2,n-1} \right] + (1-g) \cdot \mathbf{X}_{1,n}, \tag{3} $$

with weight g, 0 ≤ g < 1. In this version, \( \mathbf{X}_{1,n} \) denotes the internal representation of the first stimulus on the present trial, and \( \mathbf{X}_{2,n-1} \) denotes the internal representation of the second stimulus on the preceding trial. Thus, the second stimulus on the present trial is compared with a weighted average of all previously presented stimuli. Monte Carlo simulations employing Equation 3 yielded results virtually identical to those of the simple IRM embodied by Equation 1.Footnote 9 In accordance with Occam’s razor, we prefer the simple version. After all, the task itself may suggest the strategy of integrating only the first stimulus, given that the participant has to memorize the first stimulus until the second is presented and then judge which one was longer.
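Multiplying out Equation 3 makes the relative weights explicit: the reference carried over from the previous trial is down-weighted by \( g^2 \), the second stimulus of the previous trial enters with weight \( g(1-g) \), and the first stimulus of the present trial with weight \( 1-g \). The three weights sum to one, so the reference remains a weighted average:

$$ \mathbf{I}_n = g^2 \cdot \mathbf{I}_{n-1} + g(1-g) \cdot \mathbf{X}_{2,n-1} + (1-g) \cdot \mathbf{X}_{1,n}, \qquad g^2 + g(1-g) + (1-g) = 1 $$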

IRM and the single-stimulus paradigm

In the present study, we investigated the effect of the preceding comparison on the discrimination process for blocked stimulus order and demonstrated that the magnitude of the preceding comparison can strongly influence the judgment on the current trial. Similar sequential effects have been reported with the method of single stimuli (Lages & Treisman, 1998; Treisman & Williams, 1984). In this method, the standard s is presented only at the beginning of the experiment; on subsequent trials, participants receive only the variable comparison c and judge whether c is smaller or greater than s.

In Lages and Treisman’s (1998) first experiment, participants indicated whether a comparison sine-wave grating was higher or lower in spatial frequency than the standard. In their analysis of sequential stimulus dependencies, the authors found that PSE was greater when the spatial frequency of the preceding stimulus was high than when it was low. Thus, in this single-stimulus experiment, the effect of the preceding c on PSE was opposite in direction to that in our experiments. Lages and Treisman explained this negative sequential effect within the framework of Criterion Setting Theory, an extension of Signal Detection Theory (Green & Swets, 1966). Like Signal Detection Theory, Criterion Setting Theory assumes that discrimination is based on a decision process that compares sensory input, represented on a decision axis, with a response criterion. In contrast to Signal Detection Theory, however, Criterion Setting Theory assumes that the response criterion changes from trial to trial in order to optimize performance (see also Lages & Treisman, 2010; Treisman & Lages, 2010, for criterion setting in different tasks and contexts). Specifically, the criterion is raised when the internal representation of the preceding c was above the criterion and lowered when this representation was below the criterion.

Although Criterion Setting Theory accounts for this negative sequential effect, IRM can also account for the finding in the single-stimulus method. In this case, the participant compares the internal representation \( \mathbf{X}_n \) of the comparison on the present trial with the internal reference \( \mathbf{I}_n \). If \( \mathbf{X}_n \) is larger than \( \mathbf{I}_n \), the participant responds “c > s”; otherwise, “c < s.” Importantly, the internal reference is updated on every trial according to the process described above and, in this case, is

$$ \mathbf{I}_n = g \cdot \mathbf{I}_{n-1} + (1-g) \cdot \mathbf{X}_{n-1}, \tag{4} $$

with weight g, 0 < g < 1.

To assess the behavior of IRM in the single-stimulus paradigm, we ran a Monte Carlo simulation based on this updating process, using the same set of stimuli and the same parameter settings (cf. Fig. 2) as for the other simulations. Here, s was presented only once at the beginning, and on all subsequent trials only c was presented. Trials were grouped according to the magnitude of the preceding c, that is, whether \( c_{n-1} < s \) or \( c_{n-1} > s \). Figure 12 depicts the two psychometric functions for the two magnitudes of the preceding c, for weights g = 0.3, 0.5, and 0.7. As can be seen, the psychometric function is shifted to the left for \( c_{n-1} < s \) and to the right for \( c_{n-1} > s \), demonstrating a negative sequential effect, just as Lages and Treisman (1998) observed empirically. This sequential effect decreases with increasing g, because a larger g gives the most recent comparison a smaller weight, 1 − g, in the internal reference.
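A minimal Python sketch of such a single-stimulus simulation follows. It is illustrative only and does not reproduce Fig. 12: the noise model (stimulus plus Gaussian noise with SD `sigma`), the comparison set (borrowed from Experiment 2), the initialization of the reference with the representation of s, and the PSE estimate by linear interpolation of the 50% point (a simpler stand-in for a fitted psychometric function) are all assumptions, and the function names are hypothetical.

```python
import random


def simulate_single_stimulus(g, sigma, levels, s=500.0, n_trials=60000, seed=2):
    """Hypothetical IRM Monte Carlo for the method of single stimuli (Equation 4).

    Returns {'small'|'large': {c: [times judged c > s, trials]}}, grouped by c_{n-1}.
    """
    rng = random.Random(seed)
    counts = {grp: {c: [0, 0] for c in levels} for grp in ("small", "large")}
    ref = s + rng.gauss(0.0, sigma)           # assumed: reference starts at the representation of s
    x_prev = c_prev = None
    for _ in range(n_trials):
        if x_prev is not None:
            ref = g * ref + (1.0 - g) * x_prev  # I_n = g*I_{n-1} + (1-g)*X_{n-1}
        c = rng.choice(levels)
        x = c + rng.gauss(0.0, sigma)
        judged_greater = x > ref                # respond "c > s" if X_n > I_n
        if c_prev is not None and c_prev != s:
            grp = "small" if c_prev < s else "large"
            counts[grp][c][0] += int(judged_greater)
            counts[grp][c][1] += 1
        x_prev, c_prev = x, c
    return counts


def pse_by_interpolation(table):
    """50% point by linear interpolation between adjacent comparison levels."""
    pts = [(c, k / n) for c, (k, n) in sorted(table.items()) if n > 0]
    for (c0, p0), (c1, p1) in zip(pts, pts[1:]):
        if p0 < 0.5 <= p1:
            return c0 + (0.5 - p0) * (c1 - c0) / (p1 - p0)
    raise ValueError("proportion of 'c > s' responses never crosses 0.5")


levels = list(range(300, 701, 40))
for g in (0.3, 0.5, 0.7):
    counts = simulate_single_stimulus(g, 50.0, levels)
    small, large = (pse_by_interpolation(counts[grp]) for grp in ("small", "large"))
    print(f"g = {g}: PSE after c_(n-1) < s: {small:.0f} ms, after c_(n-1) > s: {large:.0f} ms")
```

Under these assumptions, the PSE should be lower after a small preceding c than after a large one (a leftward versus rightward shift of the psychometric function), and the gap between the two PSEs should shrink as g increases.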

Fig. 12

Predictions of the Internal Reference Model (IRM) for the psychometric functions in the method of single stimuli. In this method, the constant standard s is presented only at the beginning of the experiment, and on all subsequent trials, participants judge whether a variable comparison c is smaller or greater than s. Predictions were derived from a Monte Carlo simulation that mimicked a participant’s responses. Data were grouped according to the magnitude of c on the preceding trial, that is, whether \( c_{n-1} < s \) or \( c_{n-1} > s \)

In this simulation, the internal reference is driven entirely by the history of the stimulus sequence and is thus established by bottom-up processing. Nevertheless, the internal reference might also be shifted by top-down processes, for example, to adjust for payoffs. In this sense, the internal reference might serve as a criterion that participants can shift according to task requirements.

Time-order error and Type B effect

A theoretically important issue is whether the classical TOE and the Type B effect are independent phenomena or two sides of the same coin. In the present experiments, the analyses revealed no effects of stimulus order or block type on PSE; that is, there was no classical TOE. Our results are thus consistent with the notion that the TOE (i.e., an effect on PSE) and the Type B effect (i.e., an effect on DL) are dissociable. This, however, does not necessarily exclude the possibility that the two effects emerge from the same underlying mechanism. For instance, relations between the TOE and the Type B effect might be more readily observable in experiments employing roving standards or no fixed standard at all (cf. Hellström, 2000, 2003). Because the present version of IRM involves no specific assumptions that would allow it to predict classical TOEs, this version suggests that the Type B effect and the classical TOE might stem from different mechanisms. Future extensions of the model should address how TOEs can emerge within this framework. For example, merging IRM with Hellström’s Sensation Weighting model might provide a more general account of these phenomena. Such an extended model might also account for the reversed Type B effect observed for brief stimuli and short interstimulus intervals (Hellström, 2003, Table 7; Hellström & Rammsayer, 2004).

Conclusion

A strong Type B effect was observed in both an auditory and a visual two-stimulus discrimination task. This effect was independent of whether the two stimulus orders 〈sc〉 and 〈cs〉 were presented in separate blocks or randomly within a single block. Moreover, PSE was modulated by the magnitude of the comparison on the preceding trial. These results are consistent with the predictions of IRM, which can be seen as a hybrid of the Sensation Weighting model (e.g., Hellström, 1985) and classical difference models (Thurstone, 1927a, 1927b). Given that a strong Type B effect has previously been observed in various studies across a wide range of tasks and modalities (cf. Table 1), it seems unlikely that the present results are specific to the temporal discrimination tasks employed here. Classical difference models of discrimination performance, such as Signal Detection Theory, cannot explain the Type B effect. IRM is a straightforward extension of these standard models; it provides a simple and plausible mechanism that accounts for the Type B effect as well as for sequential dependencies across trials. An open question is how IRM relates to sensory processing, decision making, and memory.