Summary statistics in auditory perception

McDermott, Josh H; Schemitsch, Michael; Simoncelli, Eero P

doi:10.1038/nn.3347

Article
Published: 24 February 2013

Nature Neuroscience volume 16, pages 493–498 (2013)

Abstract

Sensory signals are transduced at high resolution, but their structure must be stored in a more compact format. Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics. We measured discrimination of 'sound textures' that were characterized by particular statistical properties, as normally result from the superposition of many acoustic features in auditory scenes. When listeners discriminated examples of different textures, performance improved with excerpt duration. In contrast, when listeners discriminated different examples of the same texture, performance declined with duration, a paradoxical result given that the information available for discrimination grows with duration. These results indicate that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics, which, for different examples of the same texture, converge to the same values with increasing duration. Such statistical representations produce good categorical discrimination, but limit the ability to discern temporal detail.
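The convergence argument in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' stimuli or auditory model: it assumes a toy "texture" made of white Gaussian noise smoothed by a moving average, and it uses only two time-averaged statistics (standard deviation and lag-1 autocorrelation). The names `synth_texture`, `summary_stats`, and `mean_stat_distance`, the kernel lengths, and the excerpt durations are all hypothetical choices made for illustration.

```python
import numpy as np


def synth_texture(n_samples, kernel_len, rng):
    """Toy stationary 'texture' (assumption): white noise smoothed by a moving average."""
    noise = rng.standard_normal(n_samples + kernel_len - 1)
    kernel = np.ones(kernel_len) / np.sqrt(kernel_len)  # keeps unit variance
    return np.convolve(noise, kernel, mode="valid")  # length == n_samples


def summary_stats(x):
    """Time-averaged statistics of one excerpt: [std, lag-1 autocorrelation]."""
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    lag1 = np.mean(centered[:-1] * centered[1:]) / var
    return np.array([np.sqrt(var), lag1])


def mean_stat_distance(kernel_a, kernel_b, n_samples, rng, n_trials=50):
    """Average Euclidean distance between the statistics of two fresh excerpts."""
    dists = []
    for _ in range(n_trials):
        a = summary_stats(synth_texture(n_samples, kernel_a, rng))
        b = summary_stats(synth_texture(n_samples, kernel_b, rng))
        dists.append(np.linalg.norm(a - b))
    return float(np.mean(dists))


rng = np.random.default_rng(0)
durations = [100, 1000, 10000]  # excerpt lengths in samples (illustrative)

# Two excerpts of the same texture (both kernel length 2)
same = [mean_stat_distance(2, 2, n, rng) for n in durations]
# Excerpts of two different textures (kernel length 2 vs 10)
diff = [mean_stat_distance(2, 10, n, rng) for n in durations]

for n, s, d in zip(durations, same, diff):
    print(f"{n:>6} samples  same-texture: {s:.3f}  different-texture: {d:.3f}")
```

In this toy setting the distance between the statistics of two excerpts of the same texture shrinks as the excerpts get longer, while the distance between different textures stays well above zero. An observer restricted to such statistics would therefore find same-texture exemplars harder to tell apart at longer durations, which is the pattern the abstract describes.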

**Figure 1: Textures and time-averaged statistics.**

**Figure 2: Texture and exemplar discrimination results.**

**Figure 3: Exemplar discrimination with mixtures of sources.**

Acknowledgements

The authors thank B. Anderson, S. Keshvari and J. Traer for comments on earlier versions of the manuscript. Research was funded by the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Josh H McDermott
Howard Hughes Medical Institute, Center for Neural Science, and Courant Institute of Mathematical Sciences, New York University, New York, New York, USA
Michael Schemitsch & Eero P Simoncelli

Contributions

J.H.M., M.S. and E.P.S. designed the experiments. M.S. conducted the experiments. J.H.M. analyzed the data. J.H.M. and E.P.S. wrote the manuscript.

Corresponding author

Correspondence to Josh H McDermott.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2 and Supplementary Table 1 (PDF 1961 kb)

About this article

Cite this article

McDermott, J., Schemitsch, M. & Simoncelli, E. Summary statistics in auditory perception. Nat Neurosci 16, 493–498 (2013). https://doi.org/10.1038/nn.3347

Received: 24 September 2012
Accepted: 31 January 2013
Published: 24 February 2013
Issue Date: April 2013
DOI: https://doi.org/10.1038/nn.3347
