INTRODUCTION

The focus of this article is the classical problem of level encoding and its relation to the physiological response properties of auditory-nerve (AN) fibers. The pioneering work in this area is a series of papers from Siebert (1965, 1968). Siebert took a mathematical modeling approach to derive expressions for the sensitivity index for performance in intensity discrimination. Siebert assumed that the action potentials on a single AN fiber could be represented mathematically as a stochastic point process, specifically a Poisson process. He further assumed that the variability of the firing times on each neuron was statistically independent from fiber to fiber, consistent with the results of Johnson and Kiang (1976). With these assumptions, the only additional information needed to specify the model completely was the rate of firing for each fiber and, most important, the dependence of this firing rate on stimulus level. Siebert assumed a convenient form for this dependence that allowed an analytic solution for the performance of an ideal observer (basically the best performance that could be achieved given the statistical nature of the firing patterns) based on the complete set of neural firings. The current study extends Siebert’s work with analytic expressions that allow explicit description of the level dependence of the temporal response. A simple description of the rate function for each nerve fiber is specified as a function of time and level, and a nonstationary Poisson process is assumed. Analytical performance measures are derived that allow comparisons among the different information sources regarding the level of the stimulus, including the average rate of responses, and the temporal synchronization and relative phases of responses at low frequencies.

Although many people have extended Siebert’s work with computations of performance based on more detailed assumptions about peripheral coding (e.g., Goldstein 1980; Delgutte 1987; Viemeister 1988; Winslow and Sachs 1988; Winter and Palmer 1991; Huettel and Collins 1999; Heinz et al. 2001a,b; reviewed by Delgutte 1996), most of these studies were essentially computational in nature. The computational approach does not take advantage of mathematical expressions that give insight into the relationship among the various sources of information and parameters of dependence. In addition, these studies have shown that the robust level-discrimination performance demonstrated by human listeners is not accounted for by the optimal use of average-rate information in the AN. Thus, it is of interest to examine whether the optimal use of temporal information in AN responses provides a better account of robust performance.

In the following section, general results are derived that are used throughout the article. Then, analytical results based on average rate and on temporal information are presented in separate sections, followed by general discussion.

THEORETICAL CALCULATIONS

General methods for characterizing performance

A convenient parameter for summarizing empirical performance and theoretical predictions is the sensitivity per decibel δ′ (Heinz et al. 2001a; based on the sensitivity-per-Bel measure of Durlach and Braida 1969; Braida and Durlach 1972). This parameter is used here because it has been shown to be generally appropriate for intensity discrimination experiments and allows convenient combination of information from independent sources. Specifically, the sensitivity per decibel δ′ is defined in terms of the usual sensitivity index d′ between two levels L and L + ΔL (Rabinowitz et al. 1976; Buus and Florentine 1991):

$$\delta' = \frac{d'(L, L + \Delta L)}{\Delta L} \tag{1}$$

where L and L + ΔL are measured in decibels (SPL). If the just-noticeable difference (JND) in level is defined by the value of ΔL giving unity d′, then the JND is equal to 1/δ′. It follows directly that Weber’s Law, which refers to a constant JND as a function of level, corresponds to a δ′ that is independent of the reference level L. Similarly, the “near miss” to Weber’s Law, which refers to the slight improvement in level discrimination of tones that has been experimentally observed as level increases (McGill and Goldberg 1968b; Florentine et al. 1987), corresponds to a δ′(L) that increases with L.

The theoretical significance of the parameter δ′ can be appreciated from the combination of Eq. (1) with the definition of d′(L, L + ΔL) from signal detection theory. Specifically, if it is postulated that decisions are made by comparing the value of an underlying random variable X (the decision variable) with a threshold that is chosen for each experiment in a manner that accounts for bias and judgment factors, then achievable performance can be characterized by a single parameter. This parameter is d′(L, L + ΔL) and is defined by the relation

$$d'(L, L + \Delta L) = \frac{E[X; L + \Delta L] - E[X; L]}{\sqrt{\mathrm{var}[X; L]}} \tag{2}$$

where E[X;L] and var[X;L] are the expected value and variance of X given the level L. This form of the equation for d′ assumes that the variances of X for L and L + ΔL are approximately the same, which is true for the level increments of interest, i.e., near threshold. The δ′ measure is convenient because (δ′)² [and (d′)²] is an additive parameter for an optimum linear combination of uncorrelated decision variables. That is, if X is given by the relation

$$X = \sum_m c_m Y_m \tag{3}$$

where the c_m are weighting parameters and the Y_m are uncorrelated random variables, then the best performance (maximum δ′) that can be obtained (allowing any choice of the parameters c_m) is given by

$$(\delta')^2 = \sum_m (\delta'_m)^2 \tag{4}$$

where δ′_m is the sensitivity per decibel for the variable Y_m and is defined by relations parallel to Eqs. (1) and (2). This additivity theorem for (δ′_m)² is especially significant for constant-variance Gaussian (normal) or Poisson random variables where the differences in the distributions of the decision variables are determined by changes in the means. In these cases, the optimum linear combination of the random variables results in performance as good as or better than any nonlinear combination.
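The additivity property can be checked numerically. The following is a minimal sketch, assuming three uncorrelated channels whose mean shifts and variances are arbitrary, hypothetical values chosen only for illustration; the weighting c_m proportional to (mean shift)/(variance) is the standard optimum for this equal-variance case. Dividing each d′ by ΔL gives the corresponding δ′, so the same identity holds for (δ′)².

```python
import math

def d_prime(delta_mean, variance):
    """Sensitivity index for a mean shift under (approximately) equal variances."""
    return delta_mean / math.sqrt(variance)

def combined_d_prime(weights, delta_means, variances):
    """d' of X = sum_m c_m * Y_m when the Y_m are uncorrelated."""
    shift = sum(c * dm for c, dm in zip(weights, delta_means))
    var = sum(c * c * v for c, v in zip(weights, variances))
    return shift / math.sqrt(var)

# Hypothetical per-channel mean shifts and variances (illustrative only).
delta_means = [1.0, 0.5, 2.0]
variances = [4.0, 1.0, 9.0]

# Weights proportional to (mean shift / variance) maximize the combined d'.
optimal_weights = [dm / v for dm, v in zip(delta_means, variances)]
d_opt = combined_d_prime(optimal_weights, delta_means, variances)
d_equal = combined_d_prime([1.0, 1.0, 1.0], delta_means, variances)
sum_of_squares = sum(d_prime(dm, v) ** 2 for dm, v in zip(delta_means, variances))

print("optimal (d')^2      :", d_opt ** 2)
print("sum of channel (d')^2:", sum_of_squares)
print("equal-weight d' <= optimal d':", d_equal <= d_opt)
```

Any other choice of weights, such as the equal weights used for comparison here, can only match or fall below the optimally weighted value.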

The general calculation of the best achievable performance as limited by the statistical properties of the data can be done with the likelihood ratio test. In this test, a decision is based on the relative probabilities (or probability densities) of the observations under the two hypotheses under consideration. In other words, to discriminate between the levels L and L + ΔL, one calculates the ratio of the conditional probabilities of the available observations (conditional on each of these levels) and then compares the ratio to a threshold. This threshold is determined by the chosen performance criterion and is based on a priori probabilities and the relative costs and benefits of the possible outcomes. Since comparison to a threshold is not affected by monotonic transformations of both sides of the inequality, the log-likelihood ratio (formed by taking the logarithm of the likelihood ratio and the threshold) is commonly used for specific computations. In the case of statistically independent observations, the log operation results in a summation of the log-likelihood ratios for the individual observations.

Performance based on Poisson observations

Performance can be characterized for the specific case of a nonstationary (time-varying) Poisson process, which is specified by r(t,L), the instantaneous rate as a function of time t and level L (e.g., Siebert 1970; Rieke et al. 1997; Heinz et al. 2001a). The likelihood ratio test can be shown to be equivalent to the following test:

$$\sum_{i=1}^{N} \ln\left[\frac{r(t_i, L + \Delta L)}{r(t_i, L)}\right] \gtrless C \tag{5}$$

where the set of t_i are the times at which discharges occur for a given stimulus presentation, N is the total number of discharges (the count) during the presentation of the stimulus, and C is the threshold. This inequality [Ineq. (5)] describes a processor for AN responses that could perform the discrimination task when the rate function r(t,L) is known to the central processor. The processor calculates the ratio of the two rate functions at each observed discharge time t_i, sums the log of the ratio across all discharges, and compares the value of the sum to a threshold. This processor is evaluated below with different assumptions about the rate function r(t,L).
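The processor described above can be sketched in a few lines. This is an illustrative sketch only: the rate function `toy_rate`, the spike times, and the function names are hypothetical and are not taken from any AN model in the literature.

```python
import math

def log_likelihood_statistic(spike_times, rate, level, delta_level):
    """Sum over discharges of ln[r(t_i, L + dL) / r(t_i, L)]."""
    return sum(
        math.log(rate(t, level + delta_level) / rate(t, level))
        for t in spike_times
    )

def decide_higher_level(spike_times, rate, level, delta_level, criterion):
    """Report the higher level when the statistic exceeds the criterion C."""
    statistic = log_likelihood_statistic(spike_times, rate, level, delta_level)
    return statistic > criterion

def toy_rate(t, level):
    """Hypothetical rate (spikes/s): level scales the mean, time shape is fixed."""
    mean_rate = 20.0 + 2.0 * level
    return mean_rate * (1.0 + 0.5 * math.cos(2.0 * math.pi * 100.0 * t))

spikes = [0.0012, 0.0101, 0.0198, 0.0305]  # hypothetical discharge times (s)
stat = log_likelihood_statistic(spikes, toy_rate, 30.0, 1.0)
print(stat, decide_higher_level(spikes, toy_rate, 30.0, 1.0, criterion=0.05))
```

In this toy case the temporal shape does not change with level, so every discharge contributes the same amount and the statistic depends only on the number of discharges. Discharge times carry level information only when the shape of r(t,L) itself changes with level.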

In the case that r(t,L) is independent of time t (during the response to the stimulus), the summand in Ineq. (5) is independent of t, and the optimum test reduces to

$$N \gtrless C \tag{6}$$

where the count N contains all relevant information from the Poisson process and C is again the threshold. Since performance depends only on N, the general results for a decision variable X given above can be applied here. It follows directly from the statistics of the (Poisson) variable N that the parameter δ′ satisfies

$$(\delta')^2 = \frac{1}{(\Delta L)^2}\,\frac{[T\,r(L + \Delta L) - T\,r(L)]^2}{T\,r(L)} \cong \frac{T}{r(L)}\left[\frac{dr(L)}{dL}\right]^2 \tag{7}$$

where r(L) is the value of r(t,L) during the stimulus of duration T. For the final approximation in Eq. (7), it is assumed that r(L) varies continuously with L over the increment ΔL so that the derivative exists, and r(L + ΔL) ≅ r(L) + (ΔL)[dr(L)/dL]. Since L and ΔL are measured in decibels, this approximation is only meaningful if the derivative of the function r(L) is taken with respect to L in decibel units. The expression in Eq. (7), which is used extensively in subsequent sections, is equivalent to the result provided by Siebert (1965, 1968).
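As a worked illustration of the count-based result, the sketch below evaluates δ′ from the first-order approximation described above, taking the Poisson count variance equal to its mean T·r(L), and converts it to a JND as 1/δ′. The rate-level function `toy_rate` and its parameter values are hypothetical.

```python
import math

def delta_prime_count(rate, level, duration, step=1e-3):
    """Sensitivity per dB from a Poisson count: sqrt(T / r(L)) * |dr/dL|."""
    slope = (rate(level + step) - rate(level - step)) / (2.0 * step)
    return math.sqrt(duration / rate(level)) * abs(slope)

def toy_rate(level):
    """Hypothetical rate-level function (spikes/s), level in dB re threshold."""
    return 10.0 + 5.0 * level

dp = delta_prime_count(toy_rate, level=10.0, duration=0.1)
jnd_db = 1.0 / dp  # level increment giving d' = 1
print("delta' per dB:", dp)
print("JND (dB):", jnd_db)
```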

In order to include the information provided by the specific times of the neural firings (i.e., temporal information), an expression is required for δ′ when the decision variable is equal to the left side of Ineq. (5) and the rate r(t,L) depends on t. Using approximations similar to those used for Eq. (7), the resulting expression for δ′ is given by

$$(\delta')^2 = \int_0^T \frac{1}{r(t,L)}\left[\frac{\partial r(t,L)}{\partial L}\right]^2 dt \tag{8}$$

A detailed derivation of this expression can be found in Heinz et al. (2001a).

Performance based on a simple description of the rate function for AN responses

Attention is restricted to a simple analytical expression for r(t,L) that is a good description of the discharge patterns in response to tones for many AN fibers (see Colburn 1973). Specifically, it is assumed that r(t,L) for a tonal stimulus at frequency f is given by

$$r(t,L) = r(L)\,\frac{\exp\{g(L)\cos[2\pi f t + \Theta(L)]\}}{I_0[g(L)]} \tag{9}$$

This expression for r(t,L) can be understood by noting that the shape of the time dependence is described by an exponentially rectified cosine: exp{g(L) cos(2πft + Θ(L))}. The instantaneous discharge rate is monotonically related to the sinusoid in a way compatible with most AN data. When the value of the sinusoidal function is large and positive, the instantaneous discharge rate is large and positive; when the value is large and negative, the rate approaches a value near zero. The size of the level-dependent parameter g determines the degree of synchrony. This synchrony parameter g is related to the familiar synchrony index or vector strength VS by the relation VS = I_1[g]/I_0[g].

To be compatible with AN data, g(L) must increase as a function of level to a maximum value that depends on the stimulus frequency (Johnson 1980). Note that the modified Bessel function I_0[g] in Eq. (9) is equal to the time average of the exponentially rectified cosine term and thus, the average discharge rate is given by r(L). The average rate r(L) must be specified independent of the synchrony g(L) to separate the contributions of rate and synchrony cues in the computations below. Finally, note that the phase parameter Θ(L) depends on level, and the phase of the response could provide information about stimulus level, as discussed below.
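The link between the synchrony parameter g and vector strength can be checked by direct numerical integration over one stimulus period. The sketch below uses only the exponentially rectified cosine shape described above; the helper names are hypothetical, and the Bessel functions are obtained from their standard integral representation rather than from a library.

```python
import math

def period_average(fn, steps=1000):
    """Average of fn over one period [0, 2*pi) using the midpoint rule."""
    h = 2.0 * math.pi / steps
    return sum(fn((k + 0.5) * h) for k in range(steps)) / steps

def bessel_i(n, g):
    """Modified Bessel function I_n(g), integer n, via its integral form."""
    return period_average(lambda x: math.exp(g * math.cos(x)) * math.cos(n * x))

def vector_strength(g):
    """Length of the mean resultant vector for a rate shape exp(g*cos(x))."""
    shape = lambda x: math.exp(g * math.cos(x))
    mean = period_average(shape)
    c = period_average(lambda x: shape(x) * math.cos(x))
    s = period_average(lambda x: shape(x) * math.sin(x))
    return math.hypot(c, s) / mean

for g in (0.5, 1.0, 2.0, 4.0):
    print(g, vector_strength(g), bessel_i(1, g) / bessel_i(0, g))
```

The two printed columns agree, and the vector strength stays below 1 while increasing with g, as expected for a synchrony index.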

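The r(t,L) given in Eq. (9) can now be used to compute an explicit expression for the sensitivity index as represented in Eq. (8). Ignoring edge effects due to integrating over fractions of periods, one can show that the sensitivity index is given by the sum of three terms:

$$(\delta')^2 = \frac{T}{r(L)}\left[\frac{dr(L)}{dL}\right]^2 + T\,r(L)\left[1 - \frac{I_1^2[g]}{I_0^2[g]} - \frac{I_1[g]}{g\,I_0[g]}\right]\left[\frac{dg(L)}{dL}\right]^2 + T\,r(L)\,g\,\frac{I_1[g]}{I_0[g]}\left[\frac{d\Theta(L)}{dL}\right]^2, \quad g = g(L) \tag{10}$$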

These terms arise in the following manner: The integrand in Eq. (8) can be rewritten in terms of the rate times the square of the partial derivative of the log rate. When the log of r(t,L),

$$\ln r(t,L) = \ln r(L) + g(L)\cos[2\pi f t + \Theta(L)] - \ln I_0[g(L)] \tag{11}$$

is inserted into the equation for (δ′)² [Eq. (8)], the partial derivative with respect to L results in a sum of derivatives of the terms on the right side of Eq. (11). The square of the derivative thus produces several cross terms within the integral, which can be solved and simplified to arrive at Eq. (10). The terms in the solution can be factored into those depending upon the derivative of rate with respect to level, dr(L)/dL; those depending upon the derivative of synchrony with respect to level, dg(L)/dL; and those depending on the derivative of the phase with respect to level, dΘ(L)/dL. Thus, the three terms in Eq. (10) represent the separate contributions of changes in count (or mean rate), synchrony, and phase to optimum level-discrimination performance. Note that the first term is consistent with the results from the count-based Poisson model described above [Eq. (7)].
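The separation into rate, synchrony, and phase contributions can be verified numerically. The sketch below assumes the rate function has the form r(L)·exp{g(L)cos(2πft + Θ(L))}/I_0[g(L)] and integrates (∂r/∂L)²/r over a whole number of stimulus periods, then compares the result with the three closed-form terms obtained by carrying out that integral analytically. The level dependences `mean_rate`, `sync`, and `phase` are hypothetical linear functions chosen only to exercise the calculation.

```python
import math

FREQ, T = 500.0, 0.1  # tone frequency (Hz) and duration (s): 50 whole periods

def bessel_i(n, g, steps=400):
    """Modified Bessel function I_n(g), integer n, via its integral form."""
    h = math.pi / steps
    return sum(math.exp(g * math.cos((k + 0.5) * h)) * math.cos(n * (k + 0.5) * h)
               for k in range(steps)) * h / math.pi

# Hypothetical level dependences (level in dB).
def mean_rate(L): return 50.0 + 2.0 * L   # spikes/s
def sync(L): return 0.5 + 0.05 * L        # synchrony parameter g
def phase(L): return 0.01 * L             # response phase (rad)

def temporal_delta_prime_sq(L, dL=1e-3, steps=5000):
    """Midpoint-rule integral over [0, T] of (dr/dL)^2 / r."""
    def params(level):
        g = sync(level)
        return mean_rate(level) / bessel_i(0, g), g, phase(level)
    def r(t, p):
        scale, g, theta = p
        return scale * math.exp(g * math.cos(2.0 * math.pi * FREQ * t + theta))
    lo, mid, hi = params(L - dL), params(L), params(L + dL)
    h, total = T / steps, 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        slope = (r(t, hi) - r(t, lo)) / (2.0 * dL)
        total += slope * slope / r(t, mid) * h
    return total

def three_terms(L, dL=1e-3):
    """Closed-form rate, synchrony, and phase contributions to (delta')^2."""
    r, g = mean_rate(L), sync(L)
    dr = (mean_rate(L + dL) - mean_rate(L - dL)) / (2.0 * dL)
    dg = (sync(L + dL) - sync(L - dL)) / (2.0 * dL)
    dth = (phase(L + dL) - phase(L - dL)) / (2.0 * dL)
    ratio = bessel_i(1, g) / bessel_i(0, g)
    return (T * dr ** 2 / r,
            T * r * (1.0 - ratio ** 2 - ratio / g) * dg ** 2,
            T * r * g * ratio * dth ** 2)

numeric = temporal_delta_prime_sq(20.0)
terms = three_terms(20.0)
print("numerical integral:", numeric)
print("rate, synchrony, phase terms:", terms, "sum:", sum(terms))
```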

The analysis resulting in Eq. (10) implicitly assumes that the processor makes full use of knowledge of the statistics of the process, which includes complete knowledge of the function r(t,L). Accordingly, knowledge of the time origin of the stimulus, and thus a phase reference, is assumed in order to use all of the synchrony and nonlinear-phase information. The third term in Eq. (10) expresses the maximum information contained within the phase dependence on level. When no absolute phase reference is available (as is generally believed), non-linear phase information is available from differences in timing of discharges across fibers tuned to different frequencies. This issue is addressed below.

Performance of multiple-channel models: several combination rules

The peripheral auditory system is clearly a multiple-channel structure, and any serious attempt to relate level discrimination to peripheral physiology must allow cross-channel combination of information. A fundamental consideration for understanding multiple-channel models is how activities on the individual channels are combined. There are, of course, an unlimited number of possible combination rules, and predicted behavior depends on this choice. The most important rule for this analysis is probably the optimum rule: If the activities on the individual channels are specified probabilistically (including the interchannel statistical dependencies), then signal detection theory allows calculation of the best performance achievable by any processing scheme. Models exist, possibly ad hoc and complex, that can achieve any level of performance between this “best” or optimum performance and chance performance. As in most black-box modeling tasks, the merit of a given model is usually based on its simplicity and economy of assumptions relative to the amount and complexity of the data it is able to predict or describe. In this section, several combination rules are outlined that have been suggested for level discrimination.

Three specific rules illustrate some of the important issues. The first rule is the optimum combination rule, suggested and analyzed by Siebert (1965, 1968) for AN fibers and used by Florentine and Buus (1981) for combining channels in an excitation-pattern model. As noted above, this rule allows computation of the limitations imposed on performance by peripheral encoding when individual channels correspond to individual nerve fibers. The second rule, suggested by Zwicker (1956) and Maiwald (1967b,c), is based on the use of a single channel at a time. This channel is the one that results in the best performance in a given situation. The third rule, analyzed by Goldstein (1974) for loudness judgments and by Teich and Lachs (1979) for discrimination, postulates that the sum of the counts from all fibers, i.e., the total count, is the decision variable.

For each of these rules, the relation between the total sensitivity per decibel (δ′op, δ′sc, or δ′tc, for optimum, single-channel, or total-count rules, respectively) and the statistics of the individual channels can be found. The relation is particularly straightforward for the optimum combination rule when it is assumed that the activities on the channels are Gaussian or Poisson and uncorrelated. In this case the optimal combination is a weighted sum of the individual-channel decision variables (when the distributions of the decision variable differ only in the means), and the square of δ′op is the sum of the squares of the individual δ′_m; that is,

$$(\delta'_{op})^2 = \sum_m (\delta'_m)^2 \tag{12}$$

where m indexes the individual channels. This relation also holds more generally (i.e., for distributions other than Gaussian or Poisson) whenever the final decision variable is specified to be an optimally weighted linear sum of the statistically independent decision variables for the individual channels. For the single-channel rule, δ′sc is simply the maximum value of the δ′_m (maximum over all m); that is,

$$\delta'_{sc} = \max_m \delta'_m \tag{13}$$

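The third rule, the total-count-decision-variable case, results in the relation

$$(\delta'_{tc})^2 = \frac{1}{(\Delta L)^2}\,\frac{\left[\sum_m \Delta E_m\right]^2}{\sum_m V_m} \tag{14}$$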

where ΔE_m is the change in the mean of the decision variable on the mth channel and V_m is the corresponding variance. The total variance is the sum of the individual variances due to the assumption of statistically uncorrelated channels.

These relations can be compared and understood by considering a few special cases. If there is a single channel, performance is the same for all cases. If there are many channels but only one channel has a significant change in the mean for the two levels being discriminated, the optimum decision rule looks only at this channel (the same is true for the single-channel rule); however, the total count rule adds all channels. The variance therefore increases dramatically but the change in the mean remains equal to the single-channel case, thereby degrading performance relative to the other two rules. Another useful example is the case of N statistically identical channels. In this case, the optimum rule and the summation rule show an improvement in δ′ by a factor of √N relative to the single-channel rule, i.e., δ′op = δ′tc = √Nδ′sc.
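The special cases above can be reproduced with a short calculation. In this sketch each channel is summarized by the change in its mean and by its variance; the numerical values are hypothetical, and the three functions implement the optimum, single-channel, and total-count rules as they are described in the text.

```python
import math

def optimum_rule(delta_means, variances, delta_level):
    """Root of the summed squared per-channel sensitivities."""
    return math.sqrt(sum(dm * dm / v for dm, v in zip(delta_means, variances))) / delta_level

def single_channel_rule(delta_means, variances, delta_level):
    """Sensitivity of the single best channel."""
    return max(dm / math.sqrt(v) for dm, v in zip(delta_means, variances)) / delta_level

def total_count_rule(delta_means, variances, delta_level):
    """Sensitivity when the summed count is the decision variable."""
    return sum(delta_means) / math.sqrt(sum(variances)) / delta_level

n = 16
# Case 1: n statistically identical channels.
same = ([1.0] * n, [4.0] * n)
op1, sc1, tc1 = (rule(*same, 1.0) for rule in (optimum_rule, single_channel_rule, total_count_rule))

# Case 2: only one of the n channels changes its mean.
one = ([1.0] + [0.0] * (n - 1), [4.0] * n)
op2, sc2, tc2 = (rule(*one, 1.0) for rule in (optimum_rule, single_channel_rule, total_count_rule))

print("identical channels :", op1, sc1, tc1)
print("one useful channel :", op2, sc2, tc2)
```

With identical channels the optimum and total-count rules both exceed the single-channel rule by √n; with one useful channel the total-count rule is degraded by the variance contributed by the uninformative channels.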

As a last general step in preparation for the specific models addressed below, consider a set of channels with identical rate-level functions except for their thresholds, which are distributed according to a density function n(L). In these conditions, the sum over m above becomes a convolution of the threshold distribution, n(L), with [δ′(L)]², the squared sensitivity per decibel of a fiber with L_thr = 0. The optimum and total-count combination rules then result in the following equations:

$$(\delta'_{op})^2 = n * (\delta')^2 \tag{15}$$

$$(\delta'_{tc})^2 = \frac{1}{(\Delta L)^2}\,\frac{[n * \Delta E]^2}{n * V} \tag{16}$$

where * represents the convolution operation and all quantities except ΔL are functions of the level L.
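For the optimum rule, the convolution can be sketched as a discrete sum over fiber thresholds. The single-fiber function below is a hypothetical linear, saturating Poisson fiber (slope 5 sp/s/dB, 40-dB dynamic range, SR = 10 sp/s, T = 0.1 s), and the threshold distribution is assumed to be uniform in 1-dB steps; none of these values is taken from the article.

```python
def fiber_delta_prime_sq(x, sr=10.0, slope=5.0, l_sat=40.0, duration=0.1):
    """(delta')^2 of one Poisson fiber at x dB re its own threshold."""
    if x < 0.0 or x >= l_sat:
        return 0.0  # flat rate below threshold and above saturation
    return duration * slope ** 2 / (sr + slope * x)

def pooled_delta_prime_sq(level, thresholds, fibers_per_threshold):
    """Discrete convolution: sum of n(L_thr) * (delta')^2(L - L_thr)."""
    return sum(n * fiber_delta_prime_sq(level - thr)
               for thr, n in zip(thresholds, fibers_per_threshold))

thresholds = [float(k) for k in range(0, 101)]   # hypothetical: 0 to 100 dB
fibers_per_threshold = [1.0] * len(thresholds)   # uniform threshold density

pooled = {L: pooled_delta_prime_sq(L, thresholds, fibers_per_threshold)
          for L in (20.0, 50.0, 60.0, 80.0)}
for L, value in pooled.items():
    print(L, value)
```

Once the level is high enough that the full dynamic range of some fibers lies below it, the pooled (δ′)² no longer changes with level: a uniform threshold density turns individually level-dependent fibers into a population with constant sensitivity per decibel.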

RESULTS: LEVEL DISCRIMINATION BASED ON AVERAGE-RATE INFORMATION

The ability of single-fiber counting models to explain Weber’s Law

The degree to which the average discharge rate of individual AN fibers can account for robust level discrimination over their limited dynamic range depends on the shape of their rate-level function and on the level dependence of the variance in their response. First, several simple rate-level functions are considered for the Poisson case to evaluate the relationship between the requirements for single AN fibers to produce Weber’s Law and known physiological response properties. The sensitivity per decibel δ′ (and thus the JND) can be calculated if the rate-level function r(L) is specified. Second, the effects of non-Poisson variance on the ability to explain Weber’s Law are explored by evaluating an existing model that includes dead-time refractoriness.

The effect of rate-level shape

The first rate-level function considered is given by the following equations in which all levels are in decibels re: threshold:

where SR is equal to the spontaneous rate of discharge, and L_sat is the level above which the rate saturates. This function is plotted in Figure 1 for a value of L_sat equal to 40 dB and for several values of SR. It can be verified that, for this choice of r(L), (δ′)² in Eq. (7) is given by

Figure 1. The rate-level function used to describe the rates of auditory-nerve fibers in the model. Levels are plotted in reference to threshold. The rate saturates at 40 dB above threshold. SR: spontaneous rate. SR = 0.5, 10, and 50 sp/s are shown.

In Figure 2, (δ′)² vs. L is plotted for T = 0.1 s and three values of SR, corresponding to the three classes of fibers suggested by Liberman (1978): those with low spontaneous rates (Fig. 2A, with SR = 0.5 sp/s), medium spontaneous rates (Fig. 2B, with SR = 10 sp/s), and high spontaneous rates (Fig. 2C, with SR = 50 sp/s). Saturation at L_sat makes the function zero above L_sat. The dashed curves show the JND in decibels as a function of L for a single channel with the corresponding (δ′)². Note that all functions are plotted relative to a threshold (which would vary among AN fibers).

Figure 2. The square of the sensitivity index (δ′)² for rate cues as a function of level L in decibels (solid curves) for model fibers with three different spontaneous rates of discharge: (A) SR = 0.5, (B) SR = 10, (C) SR = 50 sp/s. Note that these values of (δ′)² have been multiplied by a factor of 100 (in this figure and Fig. 5 only) in order to plot them on the same axes with the just-noticeable difference (in decibels) as a function of L (dashed curves). Stimulus duration was 100 ms.

Several observations are relevant here. First, note that a single low-to-medium spontaneous-rate fiber provides sufficient rate information for a JND of 3 or 4 dB at levels just above threshold. When a longer duration, say T = 0.3 s, and a higher slope, say 10 sp/s/dB (achieving a discharge rate of 200 sp/s at 20 dB above threshold), are used in the calculations, a single fiber provides sufficient rate information for a JND of approximately 1 dB.

Second, note that high-SR fibers provide significantly less information in terms of average discharge rate than do low-SR fibers. In Figure 2, (δ′)² for a low-SR unit is approximately three times larger than that for a high-SR unit. This effect comes from the larger variance associated with higher means in Poisson random variables. The details of the rate-level functions vary among the different spontaneous rate groups (Sachs and Abbas 1974; Winter et al. 1990; Schoonhoven et al. 1997); however, a more precise description of the rate-level functions would be expected to have only a small quantitative, but not qualitative, effect on the calculations and conclusions of this study.

Third, the shape of the dependence of (δ′)² for a single Poisson channel with the basic rate-level function of Eq. (17) differs grossly from psychoacoustic observations. It is suggested by the results plotted in Figure 2 [and is easily verified analytically, see Eq. (7)] that whenever r(L) is increasing linearly (on a dB scale), (δ′)² is decreasing with level (due to increased variance with increases in rate). There is no physiological evidence of fibers with rate-level functions that increase faster than linearly over a range greater than 10 or 20 dB. Thus, it can be concluded that a single Poisson channel with a rate-level function compatible with available physiology cannot provide sufficient information even for Weber’s Law, let alone improvement of performance with level, or “the near miss to Weber’s Law.”
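The decline of (δ′)² with level for a linearly growing rate can be seen in a short calculation. The sketch assumes a rate that rises linearly from SR between threshold and saturation, and a Poisson count with variance equal to its mean; the slope of 5 sp/s/dB is a hypothetical value used only for illustration.

```python
def delta_prime_sq(level, sr, slope=5.0, l_sat=40.0, duration=0.1):
    """(delta')^2 = T * slope^2 / r(L) for a linear, saturating Poisson fiber."""
    if level < 0.0 or level >= l_sat:
        return 0.0  # the rate does not change with level outside this range
    return duration * slope ** 2 / (sr + slope * level)

levels = [0.0, 10.0, 20.0, 30.0, 45.0]
table = {sr: [delta_prime_sq(L, sr) for L in levels] for sr in (0.5, 10.0, 50.0)}
for sr, values in table.items():
    print("SR =", sr, [round(v, 4) for v in values])
```

For every spontaneous rate the values fall as level rises within the dynamic range, and at any given level the lower-SR fiber has the larger (δ′)².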

The second rate-level function considered is given by

where L is in dB relative to a threshold reference value. It is easily verified that Weber’s Law predictions are obtained from a Poisson channel with this rate-level function. In this case δ′ is equal to a constant value [10√(4cT)] that is independent of L for levels above threshold. It follows that a near-miss prediction on a single Poisson channel requires a rate-level function that grows faster than quadratically on a decibel level scale.

The last rate-level function considered has an exponential shape (on a dB level scale). This type of function was used in the counting models of McGill and Goldberg (1968a,b) and Luce and Green (1974). In both models, count is a Poisson (or nearly Poisson) random variable with a rate-level function that can be written as

$$r(L) = a\,e^{bL} \tag{20}$$

where a and b are constants and L is the level in dB re: some reference level L_ref. In this case, the resulting expression for (δ′)² is $T a b^2 e^{bL}$. The result is a JND that decreases with increasing level, and thus these models can predict the observed near-miss behavior. However, rate-level functions with this shape have not been observed in AN fibers and at best could represent combinations of many fibers.
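Both growth laws can be checked with the same numerical recipe, (δ′)² = T·(dr/dL)²/r, applied to a Poisson count. The constants `C`, `A`, and `B` below are hypothetical, and the quadratic function is written as C·L² purely as an illustrative form of a rate that grows quadratically in decibels.

```python
import math

def delta_prime_sq(rate, level, duration, step=1e-4):
    """(delta')^2 = T * (dr/dL)^2 / r(L), with a central-difference slope."""
    slope = (rate(level + step) - rate(level - step)) / (2.0 * step)
    return duration * slope ** 2 / rate(level)

C, A, B, T = 0.05, 2.0, 0.05, 0.1  # hypothetical constants; T in seconds

def quadratic(L):
    return C * L * L

def exponential(L):
    return A * math.exp(B * L)

levels = [10.0, 30.0, 50.0]
quad_vals = [delta_prime_sq(quadratic, L, T) for L in levels]
exp_vals = [delta_prime_sq(exponential, L, T) for L in levels]
print("quadratic  :", quad_vals)   # constant across level (Weber's Law)
print("exponential:", exp_vals)    # grows with level (near miss)
```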

Since single-channel Poisson counting models of level discrimination require rate-level functions that do not represent physiological data directly, we next consider whether deviations from Poisson variability can account for Weber’s Law in single AN fibers.

The effect of deviations from Poisson variability

The importance of the assumptions for the statistical properties of the model discharge patterns is illustrated by single-channel predictions using the formulae of Teich and Lachs (1979). They give expressions for the mean and variance of the count for a dead-time-modified Poisson process, assuming that the rate of the original Poisson process grows proportionally to stimulus energy. The mean and variance of the count in the modified process are given by

$$\mathrm{mean}[N] = \frac{T}{\tau}\,\frac{E/E_{\mathrm{ref}}}{1 + E/E_{\mathrm{ref}}} \tag{21}$$

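and

$$\mathrm{var}[N] = \frac{T}{\tau}\,\frac{E/E_{\mathrm{ref}}}{(1 + E/E_{\mathrm{ref}})^3} \tag{22}$$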

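where E is the stimulus energy, E_ref is a threshold constant, T is the duration, and τ is the dead time. If (δ′)² is computed for a single channel with these statistics, one obtains

$$(\delta')^2 = \left(\frac{\ln 10}{10}\right)^2 \frac{T}{\tau}\,\frac{E/E_{\mathrm{ref}}}{1 + E/E_{\mathrm{ref}}} \tag{23}$$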

Thus, (δ′)² for a single channel would saturate and become independent of level.

This example shows the importance of variance assumptions, since the mean rate-level function [mean count in Eq. (21) divided by T] has a shape very similar to Eq. (17) with L_sat = 20 dB (if thresholds are adjusted), and yet the predicted (δ′)² in Eq. (23) is dramatically different from that given in Eq. (7) and shown in Figure 2. Also, a saturating rate-level function can provide information sufficient for Weber’s Law and even at a δ′ level consistent with a JND of 0.3 dB (δ′ = 3.2) when τ/T = 0.005.
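The saturating sensitivity quoted above can be reproduced with a short calculation. The sketch assumes the standard large-count statistics of a dead-time-modified Poisson counter, mean λT/(1 + λτ) and variance λT/(1 + λτ)³, and further assumes that the driving rate satisfies λτ = E/E_ref; this parameterization is an assumption made for the sketch.

```python
import math

def dead_time_count_stats(x, duration, dead_time):
    """Mean and variance of the count, with x = E/E_ref = (driving rate)*(dead time)."""
    n_max = duration / dead_time  # largest possible count
    return n_max * x / (1.0 + x), n_max * x / (1.0 + x) ** 3

def delta_prime(level_db, duration, dead_time, step=1e-3):
    """Sensitivity per dB; level_db = 10*log10(E/E_ref)."""
    def mean_at(level):
        return dead_time_count_stats(10.0 ** (level / 10.0), duration, dead_time)[0]
    _, var = dead_time_count_stats(10.0 ** (level_db / 10.0), duration, dead_time)
    slope = (mean_at(level_db + step) - mean_at(level_db - step)) / (2.0 * step)
    return slope / math.sqrt(var)

T, TAU = 1.0, 0.005  # tau / T = 0.005
dp_low, dp_40, dp_60 = (delta_prime(L, T, TAU) for L in (0.0, 40.0, 60.0))
print("delta' at 0, 40, 60 dB:", dp_low, dp_40, dp_60)
print("JND at 60 dB (dB):", 1.0 / dp_60)
```

Under these assumptions δ′ levels off near (ln 10/10)·√(T/τ), which is about 3.2 for τ/T = 0.005 and corresponds to a JND of about 0.3 dB.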

However, a question for this article is how well non-Poisson models of this type describe AN behavior. The mean function in the model of Teich and Lachs (1979) is similar to observed rate-level functions; the variance, however, is clearly inconsistent with available data near saturation. For example, with a saturation rate of 100 sp/s, the variance of the count over 1 s at a rate of 90 sp/s is less than unity, and the coefficient of variation (the ratio of the standard deviation to the mean count) is approximately 0.01. Furthermore, this relative variability continues to decrease inversely proportionally to the stimulus energy because the model fibers are stimulated to discharge almost immediately upon the conclusion of the fixed dead time after each firing. AN data are closer to the Poisson assumption. For example, the count data from Young and Barta (1986, their Fig. 6a) show that a count of 20 discharges per 200 ms (100 sp/s) has a standard deviation of approximately 3 discharges per 200 ms. (For a count of 100 discharges over a full second, this would correspond to a standard deviation of 3√5 = 6.7.) Thus, the coefficient of variation at this mean count would be 0.067, which is roughly a factor of 1.5 less than expected for a Poisson process (0.1), but much greater than the model used by Teich and Lachs (1979). It follows that this non-Poisson model does not appropriately describe AN patterns and thus overestimates the amount of AN information at high levels.
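The arithmetic behind this comparison is summarized in the sketch below. The dead-time variance uses the same assumed count statistics as a standard dead-time-modified Poisson counter (variance equal to the mean count times the cube of the fraction of time the fiber is not dead), and the scaling of the measured standard deviation from 200 ms to 1 s assumes independent successive 200-ms counts.

```python
import math

# Dead-time model: saturation rate 100 sp/s, observed rate 90 sp/s, T = 1 s.
sat_rate, rate, duration = 100.0, 90.0, 1.0
mean_count = rate * duration
live_fraction = 1.0 - rate / sat_rate          # equals 1 / (1 + x)
var_dead_time = mean_count * live_fraction ** 2
cv_dead_time = math.sqrt(var_dead_time) / mean_count

# AN data quoted in the text: SD of about 3 for a count of 20 in 200 ms.
sd_200ms, count_200ms = 3.0, 20.0
windows = 5                                    # five 200-ms windows per second
sd_1s = sd_200ms * math.sqrt(windows)
cv_data = sd_1s / (count_200ms * windows)

# Poisson reference at a mean count of 100.
cv_poisson = 1.0 / math.sqrt(100.0)

print("dead-time model: variance", var_dead_time, "CV", cv_dead_time)
print("AN data: SD", sd_1s, "CV", cv_data)
print("Poisson CV", cv_poisson, "ratio Poisson/data", cv_poisson / cv_data)
```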

This example illustrates the extent to which variance must be reduced from Poisson statistics to produce Weber’s Law and that this reduction is much greater than has been reported for AN fibers (e.g., Young and Barta 1986; Delgutte 1987; Winter and Palmer 1991). Thus, the deviation from Poisson discharge-count variance observed in AN fibers cannot account for the inability of Poisson counting models to predict robust level encoding.

The ability of multiple-CF counting models to explain the “near miss” to Weber’s Law

Since single-fiber models cannot simultaneously be consistent with physiological observations and psychophysical observations, multiple-channel models are considered. When models for level discrimination of narrowband stimuli are considered, the spread of excitation to fibers with CFs that differ from stimulus frequency becomes a central issue. This section begins with a description of a simple AN model (Siebert 1965, 1968) to demonstrate how a population of AN fibers with limited dynamic range can produce Weber’s Law. Several modifications to Siebert’s model are then discussed in terms of their ability to produce the “near miss” to Weber’s Law.

Siebert’s model of Weber’s Law based on spread of excitation

Unlike many other modeling studies that also explicitly included a spread of excitation over CF (e.g., Zwicker 1956; Maiwald 1967a,b,c; Florentine and Buus 1981), Siebert (1965, 1968) included the AN discharge patterns explicitly in his multiple-CF model. Siebert (1965, 1968) assumed optimum processing of a population of Poisson counts, which were based on a saturating rate-level function that was the same for all fibers except that tone threshold varied with CF based on AN frequency tuning. With these assumptions, level discrimination using fibers within a narrow CF band is poor except for a narrow range of levels near threshold (as discussed above), so that the robustness of performance across level is almost completely determined by the spread of excitation over CF bands. Siebert (1965, 1968) showed that Weber’s Law is predicted for tonal stimuli by this model if one assumes a uniform-in-log-frequency distribution of CFs, and two-piece linear tuning curves with constant slopes (in decibels versus log-frequency axes). Although AN fibers with CFs near the tone frequency saturate, the edges of the activity pattern provide a constant amount of information as level increases.

Possible modifications of Siebert’s model to explain the “near miss”

If the distribution of CFs is changed from uniform-in-log-frequency to uniform-in-linear-frequency, a “near miss” deviation from Weber’s Law is predicted with the amount of deviation dependent upon assumptions about the slopes of the tuning curves. This deviation is a direct consequence of having more fibers in the nonsaturated region of CFs as level increases. Specifically, the increase in the number of fibers in the nonsaturated region with CFs above the stimulus frequency is much greater than the decrease in the number with CFs below the stimulus frequency. However, the original uniform-in-log-frequency assumption is much more descriptive of available physiological data than the uniform-in-linear-frequency alternative, so this possibility is rejected for the purposes of the present study. [Note that the uniform-in-linear-frequency assumption with this model results in the incorrect prediction that the masking of high-CF fibers results in a decrease in performance as level increases when the masking forces the system to use information on low-frequency fibers, since the number of fibers in the useful range (nonsaturated) decreases as level increases.]

If the shape of the tuning curves changes as a function of CF such that higher-CF fibers have lower slopes (decreasing Q), then the spread of excitation would proceed more quickly and place more high-CF fibers in the useful range at higher levels. A model with this assumption would also result in a “near miss” prediction. Although the narrowly tuned “tip” portion of tuning curves shows an increasing Q with increasing CF, the tails of the tuning curves at high CFs (Kiang and Moxon 1974) provide a clear physiological basis for this assumption. Other examples of the dependence of the tails on CF can also be seen in Kiang (1980) and Evans (1972). Instead of describing available tuning curves and the distribution of CFs and calculating the spread of excitation, one can measure the spread directly by measuring the distribution of thresholds for a fixed stimulus waveform for all AN fibers. A sample of measured thresholds for a 1-kHz tone from three cats can be seen in Figure 4 in Kiang and Moxon (1974). The slope of the mean threshold as a function of CF decreases with increasing CF when plotted on the log-frequency axis, consistent with the increasing number of useful fibers as the level increases (if the distribution of CFs is approximately uniform on a log-frequency scale and if the distribution of thresholds at a fixed CF is independent of CF). There are not sufficient data to characterize this factor with quantitative precision; it is clear, however, that this effect would contribute to a deviation from Weber’s Law in the observed direction, i.e., an improvement in performance with increasing level.

The third factor is the shape of the rate-level functions for fibers with CFs above and below the stimulus frequency (Sachs and Abbas 1974; Cooper and Yates 1994). The slope of the rate-level function for a given fiber decreases as the stimulus frequency increases above CF. As frequency decreases below CF, the slope either increases or remains roughly constant. This result indicates that many high-CF fibers will have steeper rate-level functions than fibers with CFs near the stimulus frequency. This would also predict an improvement in discrimination performance at higher levels (other things being equal) relative to Siebert’s prediction of Weber’s Law. If the slope increases by a factor of 3, the predicted δ′ for a single fiber increases by a factor between √3 and 3, depending on the spontaneous rate. Note that such a slope change is consistent with the nonlinear growth of the output of the high-frequency channels in Zwicker’s (1956) model that leads to a predicted improvement in performance at high levels. Furthermore, the fibers with CF below the stimulus frequency are less useful than the fibers with higher CFs. If it were possible to eliminate the higher-CF fibers, performance (i.e., sensitivity per decibel) would be expected to decrease as level increased as a consequence of this effect.
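The range "between √3 and 3" quoted above can be illustrated with a minimal sketch. It assumes Poisson counting statistics (variance equal to the mean count) and assumes that the driven rate scales with the slope while the spontaneous rate is unchanged; the function names and numerical values are hypothetical and chosen only for illustration.

```python
import math

def poisson_dprime_per_db(rate, slope, duration=0.1):
    """Sensitivity per dB for a Poisson count: change in mean / sqrt(variance)."""
    return duration * slope / math.sqrt(duration * rate)

def gain_from_slope_factor(sr, driven, slope, factor=3.0):
    """Ratio of sensitivity after/before multiplying the rate-level slope by `factor`.

    Assumption: the driven rate (rate above spontaneous) scales with the slope,
    while the spontaneous rate `sr` stays fixed.
    """
    before = poisson_dprime_per_db(sr + driven, slope)
    after = poisson_dprime_per_db(sr + factor * driven, factor * slope)
    return after / before

# Hypothetical values: 20 sp/s driven rate, 5 sp/s per dB slope.
low_sr_gain = gain_from_slope_factor(sr=0.0, driven=20.0, slope=5.0)    # limit of no spontaneous activity
mid_sr_gain = gain_from_slope_factor(sr=50.0, driven=20.0, slope=5.0)   # intermediate case
high_sr_gain = gain_from_slope_factor(sr=1e6, driven=20.0, slope=5.0)   # spontaneous rate dominates the variance
```

With no spontaneous activity the variance grows along with the mean and the gain is √3; when the spontaneous rate dominates the variance the gain approaches 3.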

To summarize the conclusions from Siebert’s model (optimum processing of stationary Poisson patterns), deviations from Weber’s Law that are comparable to psychophysical data (a near miss) could be predicted for tones by modifying the model to incorporate the tails of tuning curves for high-CF fibers and/or changes in slope of the rate-level function with tone frequency relative to CF. It is important to consider how well the data being predicted constrain the models being investigated. For example, as discussed above, many modifications of Siebert’s basic model can produce a “near miss” to Weber’s Law based on spread of excitation (also see Lachs et al. 1984; Delgutte 1996; Heinz et al. 2001a,b). Thus, the ability to predict the “near miss” rather than Weber’s Law for tones in quiet is not a critical issue for evaluating level encoding in the AN. A much stronger constraint is the ability to explain the observation that level-discrimination performance is still robust in the presence of off-frequency masking noise (e.g., Moore and Raab 1974, 1975; Viemeister 1983). The simplest (and most common) interpretation of this result is that spread of excitation is not necessary for robust level encoding. This interpretation is based on the assumption that the only influence of the off-frequency masker is prevention of any spread of excitation to CFs away from the tone frequency. If this is true, it becomes critical to account for Weber’s Law only on the basis of information in AN fibers with CFs near the tone frequency. In fact, models that assume Weber’s Law within single-CF channels produce a near miss to Weber’s Law for tones in quiet based on spread of excitation (e.g., Florentine and Buus 1981). The influence of off-frequency maskers may be more complicated than typically assumed because of nonlinear interactions between the signal and masker (e.g., Rhode et al. 1978); however, a quantitative evaluation of these effects requires a more complex AN model than is considered in the present study (see Heinz 2000; Heinz et al. 2002). Nonetheless, it is informative to evaluate level encoding in single-CF channels, and thus the next section continues with the analytical approach to examine the ability of rate information to account for Weber’s Law based on pooling across AN fibers with similar CFs.

The ability of single-CF counting models to explain Weber’s Law

In this section, level discrimination performance (as characterized by δ′ vs. L) is obtained from a population of AN fibers with a common CF (equal to the stimulus frequency). The results depend upon the postulated combination rule as well as the set of assumptions about the discharge patterns. This section focuses on encoding in terms of discharge rate for illustrative purposes, while contributions of temporal information are evaluated below.

Optimum processing

First consider optimum processing of time-invariant Poisson processes (i.e., optimally weighted Poisson counting variables) with rate-level functions given by Eq. (17) as plotted in Figure 1. Since δ′ for L_thr = 0 has been calculated for this case (as plotted in Fig. 2), overall performance can be calculated by combining across individual AN fibers according to Eq. (15). The distribution of threshold values, n(L), must be specified along with the values for spontaneous discharge rate. To specify the thresholds, the observation that the rate thresholds of fibers at their CFs are (negatively) correlated with the spontaneous rates of discharge (SRs) is incorporated (Liberman 1978). Three distributions of thresholds are chosen, one for each of the SR categories (low SR = 0.5 sp/s, medium SR = 10 sp/s, and high SR = 50 sp/s). The threshold distributions shown in Figure 3 are based on the data of Liberman (1978). With these assumptions, the optimum sensitivity per decibel is given by

(δ′op)² = n_L(L) * (δ′L)² + n_M(L) * (δ′M)² + n_H(L) * (δ′H)²

where δ′L, δ′M, and δ′H are the sensitivities per decibel for L_thr = 0 described by the functions in Figure 2 for the low, medium, and high SR cases, respectively; n_L(L), n_M(L), and n_H(L) represent the threshold distributions shown in Figure 3; and * represents convolution. The result of this calculation for (δ′op)² is shown in Figure 4 for a band that is assumed to contain 2200 fibers, corresponding roughly to the number of fibers in a single 1/3-octave band of CFs when frequencies are uniformly distributed on a logarithmic scale (1350 high-SR, 500 medium-SR, and 350 low-SR fibers).
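The pooling step described above, convolving each threshold distribution with the corresponding single-fiber sensitivity function and summing across SR groups, can be sketched as follows. This is an illustrative sketch only: the per-fiber curves and threshold histograms below are hypothetical placeholders on a 1-dB grid, not the functions of Figures 2 and 3; only the fiber counts (1350, 500, 350) are taken from the text.

```python
def convolve(density, kernel):
    """Full discrete convolution of two sequences sampled on the same 1-dB grid."""
    out = [0.0] * (len(density) + len(kernel) - 1)
    for i, d in enumerate(density):
        for j, k in enumerate(kernel):
            out[i + j] += d * k
    return out

def population_dprime_sq(groups):
    """Sum over SR groups of (fibers per dB of threshold) * (per-fiber sensitivity squared)."""
    length = max(len(n) + len(d) - 1 for n, d in groups)
    total = [0.0] * length
    for density, per_fiber in groups:
        for i, value in enumerate(convolve(density, per_fiber)):
            total[i] += value
    return total

# Hypothetical per-fiber sensitivity squared versus level re fiber threshold (1-dB steps).
per_fiber_high = [0.000, 0.002, 0.004, 0.003, 0.002, 0.001]
per_fiber_med = [0.000, 0.003, 0.006, 0.005, 0.003, 0.001]
per_fiber_low = [0.000, 0.004, 0.009, 0.007, 0.004, 0.002]

# Hypothetical threshold histograms (fibers per 1-dB bin, index 0 = minimum threshold).
n_high = [1350 / 10.0] * 10            # 1350 high-SR fibers spread over 10 dB
n_med = [0.0] * 5 + [500 / 10.0] * 10  # 500 medium-SR fibers, thresholds shifted up by 5 dB
n_low = [0.0] * 10 + [350 / 20.0] * 20 # 350 low-SR fibers, shifted up by 10 dB

pooled = population_dprime_sq([(n_high, per_fiber_high),
                               (n_med, per_fiber_med),
                               (n_low, per_fiber_low)])
```

Index i of `pooled` corresponds to a stimulus level of i dB above the minimum threshold in the band, so plotting the list against its index gives a curve analogous in form to Figure 4.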

Figure 3

The postulated distribution of fiber thresholds for a fixed characteristic frequency CF. Thresholds are plotted in decibels relative to the minimum threshold for the CF. Distributions are separated into categories for each of three classes of spontaneous activity.

Figure 4

The sensitivity index squared (δ′)² as a function of level for a band of 2200 fibers with a common characteristic frequency CF. The contributions from the subgroups of low-, medium-, and high-spontaneous fibers are also plotted.

It is apparent in Figure 4 that optimum use of the counts on all fibers in a common CF band does not predict a level dependence corresponding to Weber’s Law or the near miss to Weber’s Law. Rather it predicts a significant decrease in performance as level increases above about 15 dB. However, predictions for reference levels near 15 dB using 2200 fibers are better than observed performance (e.g., δ′op ≅ 4.5, whereas δ′observed ≅ 1 since the JND ≅ 1 dB). The inability of single-CF Poisson rate information to account for Weber’s Law is consistent with similar studies that have used more accurate rate-level shapes (i.e., that vary with spontaneous rate and threshold) and discharge-count variance based on AN data from cat (e.g., Delgutte 1987; Viemeister 1988; Winslow and Sachs 1988). In contrast, Winter and Palmer (1991) predicted robust level-discrimination performance over at least 110 dB based on single-CF AN rate-level responses in guinea pig. Robust level encoding at high levels in their model resulted from the contribution of high-threshold, low-SR fibers with nonsaturating (“straight”) rate-level functions. However, “straight” rate-level functions were not observed in the guinea pig data for CFs below 1.5 kHz (Winter and Palmer 1991) and have not been observed in data from cat at any CF (e.g., Sachs and Abbas 1974; Delgutte 1987; Winslow and Sachs 1988). Thus, optimal processing of rate information within a single-CF band does not generally predict Weber’s Law. This conclusion implies that this type of rate-based single-CF model alone cannot describe the action of a single (critical-band) channel in models of the type suggested by Zwicker (1956) and Maiwald (1967a,b,c) since performance [e.g., δ′(L)] is postulated to be independent of L for a single channel stimulated at its CF. However, the wide dynamic range over which enough single-CF rate information is available to account for human performance suggests that combination rules other than the optimal rule should be examined.
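The comparison between predicted and observed performance near 15 dB can be made concrete with a small worked calculation. It assumes that the sensitivity index grows linearly with the level difference, so that the JND (the level difference giving a sensitivity index of 1) is the reciprocal of the sensitivity per decibel; the variable names are illustrative.

```python
def jnd_db(dprime_per_db, criterion=1.0):
    """Level difference (dB) at which the sensitivity index reaches the criterion."""
    return criterion / dprime_per_db

n_fibers = 2200          # fibers assumed in the single-CF band
dprime_op = 4.5          # predicted optimum sensitivity per dB near 15 dB
dprime_observed = 1.0    # corresponds to an observed JND of about 1 dB

jnd_optimal = jnd_db(dprime_op)                         # about 0.22 dB
mean_sq_per_fiber = dprime_op ** 2 / n_fibers           # about 0.0092 per fiber
fraction_needed = (dprime_observed / dprime_op) ** 2    # about 0.05
```

Under these assumptions only about 5% of the available squared sensitivity would be needed to match observed performance near 15 dB, which is why nonoptimal combination rules are worth examining.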

Other (nonoptimal) combination rules

Throughout the level range for which predicted performance is superior to observed performance (from less than 0 dB to greater than 70 dB in Fig. 4), there is generally sufficient information available in this single band of fibers to allow performance equal to observed performance if appropriate nonoptimum processing is assumed. This means in essence that many nonoptimum models could describe the observed results in this range. Most of these nonoptimum models may be contrived and ad hoc, but some may be simple and appealing.

In the discussion of combination rules above, total-count and single-fiber-at-a-time rules were considered in addition to the optimum rule. The total count statistic can give performance only equal to or poorer than optimum. Since saturated fibers contribute maximum variance and a negligible change in the mean to the total count, total-count performance will be significantly worse than optimum at high levels. Since this degradation will be relatively less important at lower levels, the total-count statistic will give a description of level discrimination that is even worse (more rapid decrease in performance with level) than the optimum use of Poisson counts. Further, as seen in Figure 2, a single-fiber-at-a-time rule does not provide adequate sensitivity; however, a similar rule applied to groups of fibers (i.e., using a different set of fibers at each level, e.g., Winslow et al. 1987) could be constructed to give Weber’s Law performance over a range of at least 80 dB. Similarly, Delgutte (1987) has shown that a combination rule in which low-SR, high-threshold fibers were processed more efficiently than high-SR, low-threshold fibers could extend the dynamic range over which Weber’s Law was predicted; however, performance still degraded significantly above 80 dB SPL.
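The penalty that saturated fibers impose on the total-count rule can be illustrated with a short sketch. It assumes independent Poisson counts, takes the optimum rule to be the sum of single-fiber squared sensitivities, and takes the total-count rule to be the squared change in the summed mean divided by the summed variance; the rates and slopes are hypothetical.

```python
def optimal_dprime_sq(rates, slopes, duration=0.1):
    """Optimally weighted independent Poisson counts: sum of per-fiber squared sensitivities."""
    return sum(duration * s * s / r for r, s in zip(rates, slopes))

def total_count_dprime_sq(rates, slopes, duration=0.1):
    """Unweighted total count: (change in summed mean)^2 / (summed Poisson variance)."""
    mean_change = duration * sum(slopes)
    variance = duration * sum(rates)
    return mean_change ** 2 / variance

# Hypothetical band: one unsaturated fiber plus nine saturated fibers (zero slope).
rates = [60.0] + [200.0] * 9     # sp/s
slopes = [5.0] + [0.0] * 9       # sp/s per dB

optimum = optimal_dprime_sq(rates, slopes)          # saturated fibers are simply ignored
total_count = total_count_dprime_sq(rates, slopes)  # saturated fibers add variance only
penalty = optimum / total_count                     # equals 1860 / 60 = 31 for these values
```

When all fibers are identical the two rules coincide; the gap opens only when fibers differ in how much they contribute to the change in mean relative to their variance.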

The considerations for cases in which only fibers within a single CF band are available can be summarized as follows: Performance based on rate information would ultimately degrade at high levels, and therefore the full range of CFs must be included to understand level discrimination of tones at the highest levels. When all fibers within a given CF band are included, and when all uncertainties are considered, it is not possible to exclude the possibility of performance consistent with Weber’s Law over a wide range of levels using only the rate information in a single-CF band. However, a parsimonious and general model for predicting robust level encoding based on the processing of average-rate information does not exist at this time. Thus, it is of interest to extend the analytical approach used in the present study to the quantification of other sources of level information contained in single-CF AN responses, specifically temporal information.

RESULTS: LEVEL DISCRIMINATION BASED ON TEMPORAL INFORMATION

The ability of synchrony information to explain Weber’s Law

The time-varying Poisson single-channel case [Eq. (10)] is considered here, assuming that the time-varying rate-level function is given by Eq. (9) with Θ independent of level (i.e., level-dependent synchrony is included, but not level-dependent phase). Since the characteristics of the first term in Eq. (10) (i.e., rate information) have been described above, attention is focused on the second, synchrony term.

To evaluate the effect of the second term in Eq. (10), specific assumptions about the function g(L) are made. The maximum value of g(L) depends on frequency; in cat, the largest values are about 5 and occur for low frequencies (as do the largest slopes of g vs. L) (Johnson 1980). The maximum value of g(L) decreases steadily above about 1–3 kHz (Johnson 1980; Weiss and Rose 1988; Koppl 1997). As a convenient approximation to available data (Evans 1980; Johnson 1980), it is assumed in the following that g(L) increases linearly over a range of 20 dB as shown in Figure 5A for a low-frequency fiber. Also, since the discharge patterns on AN fibers often show phase-locking to the stimulus at levels below the level at which the average rate of discharges starts to increase (Johnson 1980), a hypothetical fiber is considered for which g(L) increases to its maximum value before the rate increases above the spontaneous rate. [In actuality, the dynamic range for synchronization partly overlaps that of average rate, but the conclusions drawn here are not affected by this simplification.] For easy comparison to the average-rate-alone results in Figure 2, the duration is again taken to be T = 0.1 s. For dg(L)/dL = 1/4, (δ′)² reduces to (5/8) SR d²[ln I₀(g)]/dg², where SR is the spontaneous rate. This function is plotted in Figure 5B for two values of SR (SR = 50 sp/s and SR = 10 sp/s). Note that in contrast to rate information, which decreases as SR increases, synchrony information increases with SR because of the increased number of discharges that encode temporal information.
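The expression quoted above can be evaluated numerically with a short sketch. The second derivative of ln I₀(g) is computed from the identity d²[ln I₀(g)]/dg² = 1 − I₁(g)/(g I₀(g)) − [I₁(g)/I₀(g)]², with the Bessel functions obtained from their power series. The placement of the 20-dB synchrony ramp (ending at the rate threshold, 0 dB) is an assumption made here for illustration, and all function names are hypothetical.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind (integer order) by power series."""
    return sum((x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def d2_ln_i0(g):
    """Second derivative of ln I0(g) with respect to g."""
    if g == 0.0:
        return 0.5  # limiting value as g approaches 0
    ratio = bessel_i(1, g) / bessel_i(0, g)
    return 1.0 - ratio / g - ratio ** 2

RAMP_START_DB = -20.0  # assumed: ramp occupies the 20 dB below rate threshold
G_SLOPE = 0.25         # dg/dL from the text
G_MAX = 5.0            # maximum synchrony parameter from the text

def synchrony_term(level_db, sr):
    """Quoted expression (5/8) * SR * d2[ln I0(g)]/dg2 on the ramp; zero where g(L) is flat."""
    g = G_SLOPE * (level_db - RAMP_START_DB)
    if g < 0.0 or g > G_MAX:
        return 0.0
    return (5.0 / 8.0) * sr * d2_ln_i0(g)

curve_high_sr = [synchrony_term(level, 50.0) for level in range(-25, 6)]
curve_medium_sr = [synchrony_term(level, 10.0) for level in range(-25, 6)]
```

The term is largest where g is small and falls steeply as g approaches its maximum, and it scales in direct proportion to SR, consistent with the remark that synchrony information increases with spontaneous rate.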

Figure 5

A. The synchrony parameter g(L) (dimensionless) as a function of level (in decibels relative to the rate-defined threshold). B. The square of the sensitivity index (δ′)² as a function of level; this function was calculated assuming optimum use of the synchrony information alone [i.e., second term in Eq. (10)]. Note that 100(δ′)² is plotted to allow for direct comparison with Figure 2. Two spontaneous rates are illustrated: high SR, solid line; medium SR, dotted line.

This example shows that synchrony can provide much information for level discrimination at low frequencies. Since the synchrony threshold is clearly below the rate threshold, this source of information could extend the range of levels over which a single fiber could provide robust performance. If synchrony information is included, the (δ′)² for synchrony in Figure 5B is essentially added to each fiber’s (δ′m)² from the rate-alone analysis in accordance with Eq. (10) above. For the single-CF population model considered above (see Fig. 4), this information could add 10–15 dB to the range of levels over which (δ′op)² exceeds observed performance but does not change the fact that predicted performance deteriorates rapidly at high levels.

The ability of nonlinear-phase information to explain Weber’s Law

An additional source of information in the phase-locked discharges of low-frequency AN fibers is the nonlinear phase (Anderson et al. 1971), which introduces the third term on the right side of Eq. (10). As mentioned above, the usefulness of this cue is dependent upon either the availability of an absolute phase reference, which is unlikely, or the use of relative times of the discharges of fibers with different CFs (Carney 1994). The Poisson model with nonlinear phase cues can be studied using the expression for the time-varying rate given in Eq. (9), which includes level-dependent rate and synchrony in addition to level-dependent phase. The average-rate-level function used in this section is described in Eq. (17) (Fig. 1) and the level-dependent synchrony was described in the last section (Fig. 5A).

The level-dependent phase is described by a simple function that captures the key features described by Anderson et al. (1971) for AN responses, Ruggero et al. (1997) for basilar membrane responses, and Cheatham and Dallos (1998) for inner hair cells. The phase of a fiber’s response to tones has increasing lag as a function of level in response to stimulus frequencies below CF, has no change with level at CF, and has decreasing lag with level in response to frequencies above CF. Figure 6 shows the dependence of phase on frequency for a single model fiber’s responses at several levels; the plotted phases are referenced to phase at 90 dB SPL [using Anderson et al.’s (1971) convention]. The model phase varies linearly between 30 and 90 dB SPL. This is a conservative range of levels over which the nonlinear-phase cue might convey information for level discrimination; Ruggero et al. (1997) showed that in the most sensitive experimental preparations, the compressive nonlinearity has a threshold of about 20 dB SPL and extends to levels of 100 dB or higher. The maximum difference in phase between the nonlinear-phase threshold (30 dB SPL) and 90 dB SPL is specified as π/2, and that maximum is reached at frequencies 1/2-octave above and below CF.

Figure 6

These plots show the simplified nonlinear-phase dependence that was introduced into the analytical AN model, based on the AN recording of Anderson et al. (1971). Phase was referenced to the phase in response to 90 dB SPL at each frequency and plotted versus frequency. The phase is shown for 10 dB increments in level, from 30 dB SPL (largest phase differences with respect to the phase at 90 dB SPL) to 90 dB SPL (flat line, since this is the reference). The maximum phase change with SPL was limited to π/2; the maximum phase changes occurred at a half-octave above and below CF. At each frequency, phase was varied linearly with level between 30 and 90 dB SPL.
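The simplified level-dependent phase described above can be written as a small function. The sketch below follows the stated properties (zero shift at CF, a maximum of π/2 reached a half-octave from CF, linear variation with level between 30 and 90 dB SPL, reference at 90 dB SPL). Three details are assumptions made here rather than statements from the text: the shift is taken to grow linearly with distance from CF in octaves, it is held at its maximum beyond a half-octave, and positive values denote lag.

```python
import math

MAX_SHIFT = math.pi / 2        # maximum phase change between 30 and 90 dB SPL
LOW_DB, HIGH_DB = 30.0, 90.0   # level range over which phase varies linearly
HALF_OCTAVE = 0.5              # distance from CF at which the maximum is reached

def phase_re_90db(freq_hz, cf_hz, level_db):
    """Phase lag (radians) relative to the lag at 90 dB SPL for the same frequency."""
    octaves = math.log2(freq_hz / cf_hz)
    # -1 at or beyond a half-octave below CF, 0 at CF, +1 at or beyond a half-octave above CF
    depth = max(-1.0, min(1.0, octaves / HALF_OCTAVE))
    level = max(LOW_DB, min(HIGH_DB, level_db))
    remaining = (HIGH_DB - level) / (HIGH_DB - LOW_DB)  # 1 at 30 dB SPL, 0 at 90 dB SPL
    # Below CF the value is negative at low levels (lag increases with level);
    # above CF it is positive at low levels (lag decreases with level).
    return depth * remaining * MAX_SHIFT

def phase_slope_per_db(freq_hz, cf_hz):
    """Magnitude of the phase change per dB within the 30 to 90 dB SPL range."""
    return abs(phase_re_90db(freq_hz, cf_hz, LOW_DB)) / (HIGH_DB - LOW_DB)

# Example: 1000-Hz tone, 1200-Hz CF fiber (about a quarter-octave below CF).
slope_1000_re_1200 = phase_slope_per_db(1000.0, 1200.0)
```

For the 1000-Hz tone and 1200-Hz CF fiber, the depth is roughly one half, so the phase changes by about π/4 over the 60-dB range.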

This AN model has a highly simplified representation of the nonlinear phase, which facilitates the calculations here. A more accurate representation would vary the amount and frequency range of the level-dependent phase as a function of CF to incorporate the change in the strength of the active process as a function of CF (see Heinz 2000). Nevertheless, the form chosen here yields phase-level curves that are comparable to those of Anderson et al. (1971) for low CFs. As in the treatments of level-dependent rate and synchrony, the details of the level-dependent phase are not important to the goal of illustrating a method for quantifying the information in this neural cue.

When quantifying the information for level discrimination that is available in responses that contain all three level-dependent response properties, the three terms in Eq. (10) can be plotted separately to illustrate the relative contributions of each cue. The upper panels of Figure 7 show rate r(L), synchrony g(L), and phase Θ(L) versus level for a high-SR, 1200-Hz CF model fiber in response to a 1000-Hz tone. Siebert’s (1968) tuning curve function,

was used to compute the threshold for this off-CF tone. For illustration, the frequency of the tone was chosen to be approximately a quarter-octave below CF, resulting in a half-maximal phase cue (see Fig. 6). Recall that the nonlinear-phase cue for tones exists only for fibers responding to frequencies above or below CF. The lower panels of Figure 7 show (δ′)² for each of the three terms in Eq. (10). The rate-level and sync-level functions are shifted approximately 15 dB to the right compared with Figures 1 and 5 because the fiber is responding to a tone at a frequency away from CF. As before, the changes in rate with SPL contribute information over a limited level range between rate threshold and L_sat. The synchrony contributes a relatively large amount of information, but only at very low levels. In contrast, the nonlinear phase contributes values of (δ′)² comparable to those of the rate term, which are maintained at mid-to-high SPLs. The nonlinear-phase cue increases from 30 to 55 dB SPL because the rate-level function has still not saturated at these levels. Above 55 dB SPL, where rate is saturated, the phase cue remains constant until 90 dB SPL, where the phase becomes level-independent in the model and no information about level change is provided.

Figure 7

Top row shows rate-level function, sync-level function, and phase-level function, respectively, for a high-SR, 1200-Hz CF fiber in response to a 1000-Hz tone (which is near the point of half-maximal phase change; see Fig. 6). Bottom row shows (δ′)² vs. level based on each of the three AN response properties.

Relative amounts and CF distributions of rate, synchrony, and nonlinear-phase information

The definition of the time-varying rate function in Eq. (9) resulted in the ability to “parse” the level information into the three terms in Eq. (10). The overall information for level discrimination contributed by the three cues can be examined by simply summing the three terms of (δ′)² (Fig. 8), which illustrates the differing importance of the rate and temporal forms of information over different ranges of sound levels for a single fiber. Of course, the distribution of information provided by some of these cues also varies with CF. The CFs that convey information in the form of rate and synchrony vary with level because of spread of excitation, saturation, and the change in amount of compression as a function of CF.

Figure 8

Sum of the three (δ′)² terms in Figure 7 as a function of level. Note that robust information is contained in a single high-SR AN fiber over a wide dynamic range when the nonlinear-phase cues are considered.

Figure 9 illustrates (δ′)² vs. CF for the three terms in Eq. (10) and their sum at three sound levels of a 1000-Hz tone. The level that excites each model fiber is determined by the simple triangular tuning-curve filter described in Eq. (25). The effects of saturation for fibers with CF near the stimulus frequency and the spread of excitation with increasing level are clear in the “rate” and “synchrony” terms. The “phase” term illustrates that, at moderate-to-high levels, the fibers tuned near the tone frequency have information for level discrimination. The sum of the three terms illustrates that the CF range near the tone frequency provides information at all three SPLs, due to synchrony and rate at low sound levels and to phase at moderate-to-high sound levels. Thus, at low CFs, where the average-rate dynamic ranges of both low- and high-SR fibers are limited, the nonlinear-phase cues may be especially important for conveying information related to changes in level.

Figure 9

Sensitivity index squared (δ′)² vs. CF in response to a 1000-Hz tone at three levels (dB re threshold at CF) for high-SR fibers. Stimulus levels (10, 40, 70 dB SPL) are indicated at the right of each row. The (δ′)² based on Rate only, Sync only, and Phase only are shown (first three columns), as well as the sum of the three terms (right column). It is clear that each cue potentially contributes information for discrimination at different stimulus levels and over different CF ranges. For example, the phase cues provide information in the region near CF at mid to high levels where rate and synchrony provide little or no information due to saturation.

GENERAL DISCUSSION

This study explored several issues related to the encoding of level in AN discharge patterns. Analytical models of AN tone responses and signal detection theory were used to quantify optimal performance limits based on the stochastic responses of the AN. Simple analytical AN models provided insight into the relative importance of different sources (rate and temporal) of neural information for level encoding. Specifically, simple equations were derived for the relative contributions of average-rate, synchrony, and phase cues. The inclusion of temporal information in analytical AN models extends previous modeling studies of level encoding, which have been primarily limited to average-rate information (e.g., Siebert 1965, 1968; Delgutte 1987; Winslow and Sachs 1988; Viemeister 1988; Winter and Palmer 1991).

The ability of individual AN fibers to robustly encode level changes based on average rate depends on the shape of the rate-level function and on the nature of the discharge randomness. It was shown that the rate information provided by individual AN fibers is maximal at stimulus levels within 5–10 dB above fiber threshold and that information begins to degrade at levels well below those for which rate saturation limits performance. This degradation is primarily due to the variance of AN discharge counts increasing significantly with increases in rate, while AN rate-level curves do not increase faster than linearly (versus decibels) over wide level ranges. Thus, individual AN fibers are even more limited in their ability to robustly encode changes in stimulus level based on rate than saturation would suggest.

Since there is considerably more than enough information in the AN population response to allow observed performance in level discrimination in quiet over a wide range of levels, the interesting question becomes how to understand the parametric dependencies and the effects of off-frequency maskers. It is typically assumed that good performance in the presence of off-frequency maskers implies that Weber’s Law must be produced by AN fibers within a narrow CF band. However, consistent with previous studies, it was shown here that optimal processing of average-rate information does not account for Weber’s Law based on fibers with a limited CF range because performance degrades significantly above about 40 dB SPL.

While the predicted trends in optimal performance were inconsistent with behavioral performance, it is not possible to rule out rate-based models because there is enough total rate information to account for robust level-discrimination performance over a wide range of levels (for general discussions of the use of optimal performance limits to evaluate neural encoding, see Siebert 1968, 1970; Colburn 1973; Delgutte 1996; Heinz et al. 2001a). Optimal performance limits superior to behavioral performance suggest the need for a suboptimal combination scheme (as discussed below). However, the strong degradation in rate information as level increases above medium levels suggests that parsimonious suboptimal combination schemes based on rate information may not exist and that other sources of neural information may be needed to account for robust level encoding in the AN.

The analytical AN model used in the present study allowed for the quantitative comparison of the relative contributions of rate and of temporal information. The level dependence of synchrony provides information that extends the dynamic range for robust level encoding at low frequencies, but only at low levels. Thus, synchrony information per se does not help account for robust level encoding at high levels based on fibers within a narrow range of CFs. In contrast, it was shown that nonlinear-phase cues provide robust level information within a narrow CF range over a wide range of levels, including high levels.

The third term of Eq. (10) illustrates the dependence of nonlinear-phase information on basic AN response properties. It was shown that phase information depends not only on the rate of change in phase with level, but also on average discharge rate and strength of synchrony. This makes sense intuitively, as changes in phase are easier to decode when many spikes are observed and when these spikes are strongly phase locked to the stimulus. This dependence implies that nonlinear-phase information at low frequencies is robust up to high levels in all fibers because average rate and synchrony are essentially constant at levels more than ~30 dB above fiber threshold, and the rate of change of the phase is essentially constant with level (Anderson et al. 1971; Ruggero et al. 1997).
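The dependence of phase information on rate, synchrony, and phase slope can be illustrated with a hedged sketch. Eq. (10) is not reproduced here, so the code assumes that the time-varying rate has the exponential-cosine form r exp[g cos(2πft − Θ)]/I₀(g), which is suggested by the ln I₀(g) term in the synchrony expression. Under that assumption, the Poisson sensitivity to a change in phase is T r g [I₁(g)/I₀(g)] (dΘ/dL)². The saturated rate of 200 sp/s is a hypothetical value, and the function names are illustrative.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind (integer order) by power series."""
    return sum((x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def phase_term(rate, g, dtheta_dl, duration=0.1):
    """Squared sensitivity per dB from level-dependent phase (assumed exponential-cosine rate)."""
    return duration * rate * g * (bessel_i(1, g) / bessel_i(0, g)) * dtheta_dl ** 2

# Half-maximal phase cue: pi/4 of phase change over the 60-dB range from 30 to 90 dB SPL.
dtheta_dl = (math.pi / 4) / 60.0

# Hypothetical saturated fiber: 200 sp/s with strong synchrony (g = 5).
saturated = phase_term(rate=200.0, g=5.0, dtheta_dl=dtheta_dl)

# The same phase slope carries less information with fewer or less synchronized spikes.
fewer_spikes = phase_term(rate=100.0, g=5.0, dtheta_dl=dtheta_dl)
weaker_sync = phase_term(rate=200.0, g=1.0, dtheta_dl=dtheta_dl)
```

Because rate and synchrony are both roughly constant once a fiber is saturated, the term stays flat with level for as long as the phase slope is constant, which matches the qualitative behavior described for the nonlinear-phase cue.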

The relation between nonlinear-phase responses and nonlinear tuning implies that nonlinear-phase cues exist over the entire range of levels for which the cochlear amplifier produces compressive BM responses (i.e., at least up to ~90 dB SPL; Ruggero et al. 1997). The predicted optimal performance limits do not depend on (or suggest) a specific mechanism for decoding the nonlinear-phase cues. However, these phase cues can be decoded by any mechanism that compares the relative phase response across fibers with different CFs (discussed further below) because the level dependence of phase differs across frequency relative to CF (Anderson et al. 1971; also see Fig. 6). Thus, nonlinear-phase responses appear to provide a realistic source of robust level information near CF and may provide an alternative explanation at low frequencies to the level-dependent processing schemes that are necessary to account for Weber’s Law based on average rate.

The present study provides constraints for two possible explanations, one based on average rate and one based on nonlinear phase, for robust level encoding at high stimulus levels based on AN fibers within a narrow range of CFs. As discussed in the following paragraphs, a specific neural mechanism has been proposed for each explanation of how AN information could be decoded in the cochlear nucleus to produce robust level encoding. Further support for or against each explanation can be garnered by considering whether there are cell types in the cochlear nucleus that could perform the proposed neural processing.

Winslow et al. (1987) have proposed a “selective listening” mechanism in which average-rate information from high-SR, low-threshold fibers is used at low levels, while that from low-SR, high-threshold fibers is used at high levels. Lai et al. (1994) have demonstrated that such a selective-listening strategy can be performed by a simple model of a cochlear nucleus stellate neuron based on shunting inhibition. However, the required anatomical innervation patterns of the different SR fibers to stellate neurons and quantitative psychophysical predictions have not been demonstrated for this mechanism. Furthermore, it is not clear that a model that relies solely on low-SR fibers at high levels would produce Weber’s Law because the information provided by low-SR fibers also begins to degrade within 10 dB above their threshold (see Fig. 2).

Carney (1994) suggested that a monaural, across-frequency coincidence detection mechanism could be used to decode the level information provided by nonlinear-phase cues. There is physiological evidence that some cell types in the cochlear nucleus (e.g., globular bushy cells) have responses that are consistent with a coincidence-detection mechanism (e.g., Carney 1990; Joris et al. 1994a,b). Heinz et al. (2001b) have quantitatively evaluated the ability of a simple across-frequency coincidence-counting mechanism to account for robust level encoding based on the information in AN responses. They showed that a near-CF population of coincidence counters could reliably decode the robust nonlinear-phase cues provided at low frequencies. In addition, the coincidence-detector population also produced Weber’s Law at high frequencies based on the more robust average-rate cues associated with stronger compression at high frequencies. Carney et al. (2002) have also demonstrated the ability of a monaural, across-frequency, coincidence-detection mechanism to account for detection of tones in noise. Future physiological studies are needed to test specific single-unit-response predictions for the coincidence-detection mechanism, as well as the selective-listening mechanism, in order to provide further support for the types of AN information that are important for robust level encoding.

As noted above, available data indicate that AN phase locking to the cycles of a tone decreases at frequencies higher than approximately 1–3 kHz (Johnson 1980; Weiss and Rose 1988; Koppl 1997); however, this rolloff in synchrony was not included in the simple analytical model. If it is assumed that synchrony information in the human AN is similarly reduced at high frequencies, then the information conveyed by synchrony and level-dependent phase cues for encoding the level of a tone is significantly reduced at high frequencies. At high frequencies, the contributions of rate cues from high-threshold, low-SR fibers with wider dynamic ranges are potentially more important (Winter and Palmer 1991; Heinz 2000; Heinz et al. 2001b). The low-SR fibers depend upon large amounts of compression for their wide dynamic ranges and the amount of compression increases as a function of CF (e.g., Cooper and Yates 1994). These facts are consistent with the observation that fibers with non-saturating (“straight”) rate-level functions are not observed at CFs below about 1500 Hz (Winter and Palmer 1991) in guinea pig.

If robust level encoding were dependent on phase cues at low frequencies and on rate cues at high frequencies, then a variation in level-discrimination performance across frequency could be expected. However, Heinz et al. (2001b) have shown that linear spread of excitation plays a strong role for level discrimination in quiet, which suggests that this frequency effect would be subtle. In fact, a subtle frequency dependency has been observed in level-discrimination performance (Jesteadt et al. 1977; Florentine et al. 1987). While the near miss to Weber’s Law occurs for low frequencies, a small but significant nonmonotonicity in performance as a function of level occurs at high frequencies. This “midlevel bump,” which begins to appear between 1 and 4 kHz, can be accounted for by the strong BM compression at high frequencies that starts around 30 dB SPL (Heinz et al. 2001b). Finally, it should also be mentioned that the present analysis does not address the time variation in the rate that occurs after the onset of a stimulus (Smith and Brachman 1979); the level dependence of this adaptation (i.e., a wider dynamic range at onset) could also provide level information (cf. Evans 1980) and is not limited to low frequencies.

The significance of potential variations across species is another issue that requires future work. For example, “straight” rate-level curves have been observed in guinea pig AN responses for high CFs (Winter et al. 1990) but not for low CFs (Winter and Palmer 1991), whereas “straight” rate-level functions have not been observed for any CFs in cat (e.g., Sachs and Abbas 1974; Delgutte 1987; Winslow and Sachs 1988). This result suggests that the strength and frequency dependence of compression may differ for cats and guinea pigs. Heinz et al. (2001b) have demonstrated that the strength of compression has a large effect on the ability of near-CF rate information to account for Weber’s Law. Thus, an important remaining issue is the strength of compression in humans relative to species for which physiological BM and AN data are available. Psychophysical methods have recently been developed that estimate BM compression based on forward-masking studies (e.g., Oxenham and Plack 1997; Nelson et al. 2001). These methods have been shown to produce estimates of human compression that are consistent with the amount of BM compression that has been measured at high frequencies. However, these methods rely on assumptions for which the physiological evidence at low frequencies is not definitive, e.g., that below-CF responses are linear. These methods show promise for estimating the strength of cochlear nonlinearity in humans, but their ability to accurately estimate compression strength as a function of frequency remains to be shown.

In summary, it is likely that level discrimination is mediated by a multiplicity of attributes of the physiological data and that the relative usefulness of these attributes is dependent upon the stimulus circumstances, such as masked or unmasked, wideband or narrowband, short or long duration, and fast or slow stimulus onsets and offsets. The present study provides a quantitative framework to analyze and compare different types of information available in AN responses for encoding level. Future studies with more complex AN models can extend the results in the present study by using this quantitative approach.