Abstract
Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' that were characterized by particular statistical properties, as normally result from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
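The convergence argument in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' stimuli or model: unit-variance Gaussian noise stands in for a single "texture", the summary statistics are simply the mean and variance of instantaneous power, and the names `summary_stats` and `mean_stat_gap` are hypothetical. It draws pairs of independent excerpts from the same process and measures how far apart their time-averaged statistics are at several durations.

```python
import random
import statistics


def summary_stats(excerpt):
    """Time-averaged statistics of one excerpt: mean and variance of power.

    Assumption: these two statistics are illustrative placeholders, not the
    statistics used in the study.
    """
    power = [x * x for x in excerpt]
    mean_p = statistics.fmean(power)
    var_p = statistics.fmean([(p - mean_p) ** 2 for p in power])
    return mean_p, var_p


def mean_stat_gap(n_samples, n_trials=100, seed=0):
    """Average gap between the statistics of two independent excerpts
    drawn from the same toy 'texture' (unit-variance Gaussian noise)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        (mean_a, var_a), (mean_b, var_b) = summary_stats(a), summary_stats(b)
        gaps.append(abs(mean_a - mean_b) + abs(var_a - var_b))
    return statistics.fmean(gaps)


# Excerpt durations in samples (arbitrary values chosen for illustration).
gap_by_duration = {n: mean_stat_gap(n) for n in (50, 500, 5000)}
for n, gap in gap_by_duration.items():
    print(f"{n:>5} samples: mean statistic gap = {gap:.3f}")
```

In this sketch the gap shrinks as the excerpts get longer, even though the two waveforms remain entirely different sample by sample. An observer with access only to such statistics would therefore find two examples of the same texture harder to tell apart at longer durations, which is the pattern the abstract describes.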
Acknowledgements
The authors thank B. Anderson, S. Keshvari and J. Traer for comments on earlier versions of the manuscript. Research was funded by the Howard Hughes Medical Institute.
Author information
Contributions
J.H.M., M.S. and E.P.S. designed the experiments. M.S. conducted the experiments. J.H.M. analyzed the data. J.H.M. and E.P.S. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1 and 2 and Supplementary Table 1 (PDF 1961 kb)
About this article
Cite this article
McDermott, J., Schemitsch, M. & Simoncelli, E. Summary statistics in auditory perception. Nat Neurosci 16, 493–498 (2013). https://doi.org/10.1038/nn.3347
DOI: https://doi.org/10.1038/nn.3347