Review
Efficient Neural Coding in Auditory and Speech Perception

https://doi.org/10.1016/j.tins.2018.09.004

Highlights

Efficient neural coding may support the selectivity for speech in the auditory pathway.

The auditory neuronal code matches the statistics of natural and behaviorally relevant sounds.

Speech perception may rely on the same auditory coding mechanisms that facilitate efficient coding of other natural sound statistics.

Speech has long been recognized as ‘special’. Here, we suggest that one reason speech is special is that our auditory system has evolved to encode it efficiently. The theory of efficient neural coding argues that our perceptual systems have evolved to encode environmental stimuli in the most efficient way. Mathematically, a code is optimally efficient when it matches the statistics of the signals it represents. Experimental evidence suggests that the auditory code is optimal in this mathematical sense: statistical properties of speech closely match response properties of the cochlea, the auditory nerve, and the auditory cortex. These results may also be linked to phenomena in auditory and speech perception.

Section snippets

The Relevance of Efficient Neural Coding for Speech Perception

Speech has long been recognized as ‘special’ [1–6]. We prefer it over other sounds from birth onwards [6], and we are able to make fine-grained discriminations that allow us to convey an infinite number of messages. The special status of speech has been studied from a variety of perspectives. Researchers of social cognition approach it as our species-specific communicative signal, and as the basis of learning and cultural transmission [7,8]. Others have claimed that speech is special

The Statistical Structure of Sounds

To test whether the mammalian auditory system codes sound in a mathematically optimal way, it is first necessary to describe the statistical structure of sounds. The space (in the mathematical sense) of all potential sounds is vast (Box 1). Within this space, natural sounds, including speech, comprise a compact yet multi-dimensional subspace. Analyses of statistical regularities in natural sounds have identified several prominent features. The temporal structure of many natural environmental

Nonredundant, Optimal Mathematical Models of Sounds

According to the efficient coding hypothesis, the brain has evolved to efficiently process and respond to stimuli that occur in nature, reducing redundancy in their neural representations [9]. This principle posits that the statistical properties of neuronal responses should match the statistical structure of natural stimuli, and should maximize the efficiency of representation [10,33]. This is best achieved if neuronal responses constitute a sparse, nonredundant code, meaning that the code
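The notion of redundancy invoked here can be made concrete with a toy calculation. The sketch below is illustrative only and is not taken from the review: it uses Shannon's standard definition of redundancy, 1 - H/H_max, applied to a sequence of discrete symbols. The function names and the two example sequences are hypothetical, and real neuronal codes would require joint statistics across neurons and time rather than single-symbol frequencies.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of an observed symbol sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(symbols, alphabet_size):
    """Shannon redundancy: 1 - H / H_max, where H_max = log2(alphabet_size)."""
    h_max = math.log2(alphabet_size)
    return 1.0 - entropy_bits(symbols) / h_max


# Hypothetical toy codes over a 4-symbol alphabet (assumption, for illustration).
uniform_code = list("ABCD") * 25                              # all symbols equally likely
skewed_code = list("A" * 85 + "B" * 5 + "C" * 5 + "D" * 5)    # one symbol dominates

print(redundancy(uniform_code, 4))  # no redundancy: every symbol is maximally informative
print(redundancy(skewed_code, 4))   # high redundancy: most symbols are predictable
```

In this toy setting, a code that uses its symbols equally often wastes no capacity, whereas a code dominated by one symbol carries far less information per symbol than its alphabet allows.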

Sounds with Naturalistic Statistics are Special for the Mammalian Auditory System

According to the efficient coding hypothesis, identifying the statistical dependencies in the structure of sounds yields insight into the structure of the neuronal code. This was tested by constructing artificial codes that were optimized, under some set of constraints, to best represent natural sounds, and then comparing them to experimental measurements of neuronal responses in the auditory pathway. Such advanced mathematical models were, for instance, used to better understand the structure

Can Efficient Coding Explain Perception?

Few studies to date have directly addressed whether efficient coding principles can account for auditory percepts. Among these, one series of studies [25,26,61] tested how human adults, infants, and newborns perceive water sounds generated by a mathematical model (Figure 4) that consisted of a population of randomly spaced gammatone chirps from a wide range of frequencies [25]. This model generated scale-invariant sounds when the temporal structure of the chirps scaled relative to their center
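A minimal sketch of such a chirp-population sound is given below. It is not the model from [25], whose parameters are not given in this excerpt. It assumes that each chirp is a gammatone (a power-law onset multiplied by an exponential decay and a cosine carrier), that the chirp bandwidth is proportional to the center frequency so that every chirp lasts the same number of cycles, and that center frequencies are drawn log-uniformly with unit amplitude. The function names, the `cycles` parameter, and all numeric values are hypothetical.

```python
import math
import random


def gammatone_chirp(freq_hz, cycles, sample_rate, order=4):
    """One peak-normalized gammatone chirp.

    Assumption: bandwidth = freq_hz / cycles, so the envelope duration
    scales as 1 / freq_hz (same number of cycles at every frequency).
    """
    bandwidth = freq_hz / cycles
    n = round(3.0 / bandwidth * sample_rate)  # envelope has decayed to ~0 by then
    samples = []
    for i in range(n):
        t = i / sample_rate
        envelope = t ** (order - 1) * math.exp(-2 * math.pi * bandwidth * t)
        samples.append(envelope * math.cos(2 * math.pi * freq_hz * t))
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]


def chirp_mixture(n_chirps, total_s, sample_rate, f_lo, f_hi, cycles, seed=0):
    """Sum of chirps with random onsets and log-uniform center frequencies."""
    rng = random.Random(seed)
    out = [0.0] * round(total_s * sample_rate)
    for _ in range(n_chirps):
        freq = math.exp(rng.uniform(math.log(f_lo), math.log(f_hi)))
        chirp = gammatone_chirp(freq, cycles, sample_rate)
        start = rng.randrange(len(out))
        for i, s in enumerate(chirp):
            if start + i < len(out):  # drop samples past the end of the buffer
                out[start + i] += s
    return out


# Hypothetical parameters: 0.5 s of sound at 16 kHz, 50 chirps, 500-4000 Hz.
sound = chirp_mixture(50, 0.5, 16000, 500.0, 4000.0, cycles=2.0, seed=1)
```

Under these assumptions, doubling a chirp's center frequency halves its duration, so speeding up or slowing down the whole mixture changes its pitch range but not its temporal structure measured in cycles.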

Concluding Remarks and Future Perspectives

The research findings discussed in this review suggest that auditory perception may obey the principles of efficient neural coding, relying on the information-theoretic notion of optimality. The existing studies demonstrate that the approaches for understanding the mathematical structure of sounds can yield predictions about neuronal encoding throughout the auditory pathway. The correspondence between neuronal responses and model predictions, conversely, is consistent with the notion that

Acknowledgements

This work was supported by a Human Frontier in Science Foundation Young Investigator Award to M.N.G. and J.G.; National Institutes of Health grants (NIH R01DC014700, NIH R01DC015527) and the Pennsylvania Lions Club Hearing Research Fellowship to M.N.G.; and an ERC Consolidator Grant 773202 ERC-2017-COG ‘BabyRhythm’, the LABEX EFL (ANR-10-LABX-0083), and the ANR grant ANR-15-CE37-0009-01 awarded to J.G. M.N.G. is the recipient of the Burroughs Wellcome Award at the Scientific Interface. We

References (72)

  • D. Poeppel

    The analysis of speech in different temporal integration windows: cerebral lateralization as asymmetric sampling in time

    Speech Commun.

    (2003)
  • A.M. Liberman

    Perception of the speech code

    Psychol. Rev.

    (1967)
  • A.M. Liberman

    On finding that speech is special

  • P. Marler et al.

    Birdsong and speech: evidence for special processing

  • A. Vatakis

    Facilitation of multisensory integration by the “unity effect” reveals that speech is special

    J. Vis.

    (2008)
  • A. Vouloumanos et al.

    Tuned to the signal: the privileged status of speech for young infants

    Dev. Sci.

    (2004)
  • M. Tomasello et al.

    Joint attention and early language

    Child Dev.

    (1986)
  • F. Attneave

    Some informational aspects of visual perception

    Psychol. Rev.

    (1954)
  • H.B. Barlow

    Possible principles underlying the transformation of sensory messages

  • C.E. Shannon

    A mathematical theory of communication

    Bell Syst. Tech. J.

    (1948)
  • E.P. Simoncelli et al.

    Natural image statistics and neural representation

    Annu. Rev. Neurosci.

    (2001)
  • M.S. Lewicki

    Efficient coding of natural sounds

    Nat. Neurosci.

    (2002)
  • V.L. Ming et al.

    Efficient coding in human auditory perception

    J. Acoust. Soc. Am.

    (2009)
  • E.C. Smith et al.

    Efficient auditory coding

    Nature

    (2006)
  • H. Attias et al.

    Temporal low-order statistics of natural sounds

    Adv. Neural Inf. Process. Syst.

    (1997)
  • F. Rieke

    Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents

    Proc. Biol. Sci.

    (1995)
  • C.E. Stilp

    Speech perception in simulated electric hearing exploits information-bearing acoustic change

    J. Acoust. Soc. Am.

    (2013)
  • R. Guevara Erra et al.

    The efficient coding of speech: cross-linguistic differences

    PLoS One

    (2016)
  • R.F. Voss et al.

    ‘1/f noise’ in music and speech

    Nature

    (1975)
  • N. Singh et al.

    Modulation spectra of natural sounds and ethological theories of auditory processing

    J. Acoust. Soc. Am.

    (2003)
  • T. Chi

    Multiresolution spectrotemporal analysis of complex sounds

    J. Acoust. Soc. Am.

    (2005)
  • J. Fritz

    Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex

    Nat. Neurosci.

    (2003)
  • M.N. Geffen

    Auditory perception of self-similarity in water sounds

    Front. Integr. Neurosci.

    (2011)
  • J. Gervain

    Category-specific processing of scale-invariant sounds in infancy

    PLoS One

    (2014)
  • R. Plomp

    The ear as a frequency analyzer

    J. Acoust. Soc. Am.

    (1964)
  • T. Houtgast et al.

    A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria

    J. Acoust. Soc. Am.

    (1985)
Cited by (23)

    • Innate frequency-discrimination hyperacuity in Williams-Beuren syndrome mice

      2022, Cell
      Citation Excerpt :

      The ability to distinguish acoustic frequencies from each other or from the surrounding auditory scene has been essential for survival throughout evolution, and in humans remains fundamental to everyday hearing, linguistics, and musicality (Feng and Ratnam, 2000; Gervain and Geffen, 2019; Peretz, 2016; Stewart, 2008).

    • Frontotemporal activation differs between perception of simulated cochlear implant speech and speech in background noise: An image-based fNIRS study

      2021, NeuroImage
      Citation Excerpt :

      Despite myriad sources of distraction in daily life, listeners' perception of speech demonstrates surprising resilience. The robustness of speech perception owes to the neural redundancy within the auditory system, whereby subcortical neural firing strongly correlates with stimulus patterns and becomes increasingly discerning to specific feature combinations of speech at the level of the cortex (Gervain and Geffen, 2019; Schnupp, 2006). Likewise, comprehension of speech generally follows a hierarchy of processing such that acoustic sensory analyses begin at the temporal lobe, and higher level, attentional mechanisms of the frontal cortex are recruited to resolve more complicated speech information (Davis and Johnsrude, 2003; Friederici, 2011).

    • Do infants represent human actions cross-modally? An ERP visual-auditory priming study

      2021, Biological Psychology
      Citation Excerpt :

      Already at birth, infants’ auditory system is sufficiently developed to support the segregation of concurrent streams of sounds, and hence prepared for perceiving and representing distinct social sounds from their surrounding environment (Draganova et al., 2018; Graven & Browne, 2008; Hepper & Shahidullah, 1994; Winkler et al., 2003). They also have well developed abilities to process acoustic properties such as intensity and frequency, temporal relations, and melody (Baruch, Panissal-Vieu, & Drake, 2004; Berg & Boswell, 1998; Nazzi, Floccia, & Bertoncini, 1998; Plantinga and Trainor, 2005; Trainor & Trehub, 1992), which contribute to the extraction of the complex acoustic features and their integration into coherent percepts (Geangu et al., 2015; Gervain & Geffen, 2019; Gervain, Werker, Black, & Geffen, 2016; Gervain, Werker, & Geffen, 2014). Importantly, already at birth, the infant brain appears to process those acoustic properties that are relevant for the efficient discrimination and perceptual categorization of natural sounds, such as the similarity in the acoustic patterns at different levels of observation, or scale-invariance (Gervain & Geffen, 2019; Gervain et al., 2014; Gervain et al., 2016).
