Chapter 9 - Role of the auditory system in speech production

https://doi.org/10.1016/B978-0-444-62630-1.00009-3

Abstract

This chapter reviews evidence regarding the role of auditory perception in shaping speech output, including evidence that speech movements are planned to follow auditory trajectories. This is followed by a description of the Directions Into Velocities of Articulators (DIVA) model, which provides a detailed account of the role of auditory feedback in speech motor development and control. A brief description of the higher-order brain areas involved in speech sequencing (including the pre-supplementary motor area and inferior frontal sulcus) is then provided, followed by a description of the Hierarchical State Feedback Control (HSFC) model, which posits internal processes that detect and correct speech production errors prior to articulation. The chapter closes with a treatment of promising future directions for research into auditory–motor interactions in speech, including the use of intracranial recording techniques such as electrocorticography in humans, the investigation of the potential roles of various large-scale brain rhythms in speech perception and production, and the development of brain–computer interfaces that use auditory feedback to allow profoundly paralyzed users to learn to produce speech using a speech synthesizer.

Introduction

Speech production is a highly complex motor act, involving the finely coordinated activation of approximately 100 muscles in the respiratory, laryngeal, and oral motor systems. To achieve this task, speakers utilize a large network of brain regions. This network includes regions involved in other motor tasks, such as the motor and somatosensory cortical areas, cerebellum, basal ganglia, and thalamus, as well as regions that are more specialized for speech and language, including inferior and middle prefrontal cortex and superior and middle temporal cortex. Our goal in this chapter is to describe the critical role of the auditory system in speech production. We will first discuss the role of sensory systems in motor control broadly and summarize the long history of ideas and research on the interaction between auditory and motor systems for speech. We then describe current research on speech planning, which strongly implicates the auditory system in this process. Two large-scale neurocomputational models of speech production are then discussed. Finally, we will highlight some future directions for research on speech production.

Movement is absolutely dependent on sensory information. We know where and how to reach for an object because we see its location and shape; we know how much force to exert while holding the object because we feel its pressure on our hand and its weight on our limb; and we know how to initiate any of these movements because our sensory systems tell us where our limb is in relation to our body and the object. The British neurologist and physiologist Henry Charlton Bastian (1837–1915) wrote on the topic of movement control in 1887, stating, “It may be regarded as a physiological axiom, that all purposive movements of animals are guided by sensations or by afferent impressions of some kind” (Bastian, 1887, p. 1). Experimental work in the decades since supports this claim. For example, blocking somatosensory feedback from a monkey's limb (while leaving motor fibers intact) causes the animal to stop using that limb. With training the monkey can learn to use it again, albeit clumsily, but only with visual feedback; if the animal is blindfolded, motor control degrades dramatically (Sanes et al., 1984). Similar symptomatology is found in humans with large-fiber sensory neuropathy, which deafferents the body sense while leaving motor fibers intact (Sanes et al., 1984).

Speech is no different. Without the auditory system, as in prelingual-onset peripheral deafness, normal speech development cannot occur. Importantly, auditory information is critical not only during development. Experimental or naturally occurring manipulations of acoustic input can have dramatic effects on speech production. For example, delayed auditory feedback induces non-fluency (Yates, 1963); altering the pitch or formant frequency structure of the feedback results in automatic and largely unconscious compensation in speech articulation (Burnett et al., 1998, Houde and Jordan, 1998, Larson et al., 2001); and exposure to a different linguistic environment can induce changes in the listener–speaker's articulation (picking up accents; Sancier and Fowler, 1997). Furthermore, although individuals who become deaf as adults can remain intelligible for years after they lose hearing, they show some speech output impairments immediately (including an impaired ability to adjust pitch and loudness to different listening conditions), and over time their phonetic contrasts become reduced (e.g., Perkell et al., 2000) and they exhibit articulatory decline (Waldstein, 1989).
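To make the logic of these feedback-perturbation findings concrete, here is a minimal sketch (in Python; the target value, perturbation size, and gain are illustrative assumptions, not values from the chapter) that treats the speaker as a simple negative-feedback controller acting on the first formant (F1) of the auditory feedback.

    # Minimal sketch: a speaker modeled as a negative-feedback controller that
    # opposes an experimentally imposed upward shift in the F1 of its auditory
    # feedback. All numeric values are illustrative assumptions.

    TARGET_F1 = 700.0      # intended F1 for the vowel, in Hz (assumed)
    PERTURBATION = 100.0   # upward F1 shift applied to the feedback, in Hz (assumed)
    FEEDBACK_GAIN = 0.3    # fraction of the perceived error corrected per trial (assumed)

    def simulate_compensation(n_trials=20):
        """Return the produced F1 on each trial under perturbed auditory feedback."""
        produced_f1 = TARGET_F1                      # what the speaker actually produces
        history = []
        for _ in range(n_trials):
            heard_f1 = produced_f1 + PERTURBATION    # altered auditory feedback
            error = heard_f1 - TARGET_F1             # mismatch with the auditory target
            produced_f1 -= FEEDBACK_GAIN * error     # corrective adjustment opposing the shift
            history.append(produced_f1)
        return history

    if __name__ == "__main__":
        for trial, f1 in enumerate(simulate_compensation(), start=1):
            print(f"trial {trial:2d}: produced F1 = {f1:6.1f} Hz")

Empirically, compensation for such perturbations is typically partial rather than complete; this simple controller is meant only to illustrate that the corrective response opposes the direction of the imposed shift.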

The speech research literature contains numerous theoretical proposals that strongly link speech perception and speech production. Notable examples include the motor theory of speech perception (Liberman et al., 1967, Liberman and Mattingly, 1985), which posits that speech perception involves translating acoustic signals into the motor gestures that produce them, as well as acoustic theories of speech production (e.g., Fant, 1960, Stevens, 1998), which highlight the importance of acoustic or auditory targets in the speech production process. In the following sections we elaborate on the roles of auditory information in recent neural models of speech planning and execution.

Section snippets

The planning of speech movements

Although it may seem obvious that auditory information is in some way involved in speech production – the lack of normal speech development in the absence of auditory feedback provides incontrovertible evidence of this – there has been much debate in the speech motor control literature over exactly what role(s) auditory feedback plays. Much of this debate revolves around the following central question: what exactly are the goals, or targets, of the speech production planning process?
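One influential answer, elaborated in the DIVA model described later in the chapter, is that the targets are regions in auditory space, such as ranges of acceptable formant frequencies, rather than fixed articulator configurations. The sketch below (in Python) illustrates the idea; the formant ranges are illustrative assumptions, not values given in the chapter.

    # Minimal sketch of auditory target regions: each vowel target is a range of
    # acceptable first and second formant (F1, F2) values, and a produced token
    # "hits" the target if its formants fall inside that region. The ranges below
    # are illustrative assumptions only.

    VOWEL_TARGETS_HZ = {
        # vowel: ((F1 low, F1 high), (F2 low, F2 high))
        "i": ((250.0, 400.0), (2000.0, 2600.0)),
        "a": ((650.0, 900.0), (1000.0, 1400.0)),
    }

    def hits_target(vowel, f1, f2):
        """Return True if the produced formants fall inside the vowel's target region."""
        (f1_lo, f1_hi), (f2_lo, f2_hi) = VOWEL_TARGETS_HZ[vowel]
        return f1_lo <= f1 <= f1_hi and f2_lo <= f2 <= f2_hi

    if __name__ == "__main__":
        print(hits_target("i", 300.0, 2300.0))   # True: inside the /i/ region
        print(hits_target("a", 300.0, 2300.0))   # False: outside the /a/ region

Defining targets as regions rather than points leaves room for articulatory variability: different articulator configurations (for example, with and without a bite block) can satisfy the same auditory goal.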

To produce

Brain regions involved in speech articulation

Figure 9.2 illustrates neural activity in the cerebral cortex and cerebellum during a simple speech task (monosyllabic word production) contrasted with a baseline task of silently viewing letters. As in most speech production neuroimaging studies, activity is seen in the ventral precentral gyrus (motor and premotor cortex), ventral postcentral gyrus (somatosensory cortex), superior temporal gyrus (auditory cortex), supplementary motor area (SMA), and superior paravermal cerebellum (primarily

Neurocomputational models of speech production

The extensive library of results from neuroimaging studies provides important insight into the roles of a large number of cortical and subcortical areas in speech production. In isolation, however, these results do not provide an integrated, mechanistic view of how the neural circuits engaged by speech tasks interact to produce fluent speech. To this end, computational models that both suggest the neural computations performed within specific modules and across pathways linking modules and
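As a rough illustration of the style of mechanism such models propose, the total motor command can be viewed as a learned feedforward command plus a corrective command driven by auditory error, with each correction gradually absorbed into the feedforward command across productions. The sketch below uses assumed gains and a trivial one-dimensional "plant"; it is not the published DIVA equations, in which auditory errors are mapped into motor coordinates through learned transformations.

    # Minimal sketch of a feedforward-plus-feedback control scheme of the kind
    # used by models such as DIVA. Gains, units, and the toy plant are assumptions.

    FEEDBACK_GAIN = 0.5   # weight on the auditory-error-driven correction (assumed)
    LEARN_RATE = 0.3      # fraction of each correction absorbed into the feedforward command (assumed)

    def produce_once(feedforward, auditory_target):
        """One production: returns (corrected output, updated feedforward command)."""
        output = feedforward                                      # toy plant: acoustics mirror the command
        error = auditory_target - output                          # auditory error heard by the speaker
        corrective = FEEDBACK_GAIN * error                        # feedback-based corrective command
        corrected_output = output + corrective                    # production after online correction
        new_feedforward = feedforward + LEARN_RATE * corrective   # feedforward learning for the next attempt
        return corrected_output, new_feedforward

    if __name__ == "__main__":
        feedforward, target = 0.2, 1.0
        for trial in range(1, 11):
            out, feedforward = produce_once(feedforward, target)
            print(f"trial {trial:2d}: output = {out:.3f}, feedforward = {feedforward:.3f}")

As the feedforward command improves with practice, the corrective term shrinks, mirroring the shift from feedback-dominated to feedforward-dominated control that such models describe for speech development.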

Future directions

The advent of non-invasive functional neuroimaging techniques such as PET and fMRI in the late 20th century has greatly accelerated progress toward understanding the neural mechanisms underlying speech production and perception. In more recent years, investigators have used multiple neuroimaging techniques in the same subject pool to overcome the limitations of any single technique. For example, fMRI data, which have high spatial resolution but low temporal resolution, can be combined with

Acknowledgments

This study was supported by National Institute on Deafness and Other Communication Disorders grants R01 DC007683 (FHG), R01 DC002852 (FHG), R01 DC03681 (GH), and R01 DC009659 (GH). We thank Barbara Holland and Jason Tourville for assistance with manuscript preparation.

References (128)

  • M. D’Esposito et al. Functional MRI studies of spatial and nonspatial working memory. Cogn Brain Res (1998)
  • M. Desmurget et al. Forward modeling allows feedback control for fast reaching movements. Trends Cogn Sci (2000)
  • A.L. Giraud et al. Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron (2007)
  • E. Golfinopoulos et al. fMRI investigation of unexpected somatosensory feedback perturbation during speech. Neuroimage (2011)
  • F.H. Guenther et al. A neural theory of speech acquisition and production. J Neuroling (2012)
  • F.H. Guenther et al. Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang (2006)
  • G. Hickok et al. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition (2004)
  • A.K. Ho et al. Sequence heterogeneity in Parkinsonian speech. Brain Lang (1998)
  • U. Jurgens. The efferent and afferent connections of the supplementary motor area. Brain Res (1984)
  • M. Kawato. Internal models for motor control and trajectory planning. Curr Opin Neurobiol (1999)
  • J.G. Kerns et al. Prefrontal cortex guides context-appropriate responding during language production. Neuron (2004)
  • S.E. Kohn. The nature of the phonological disorder in conduction aphasia. Brain Lang (1984)
  • W.J. Levelt et al. Do speakers have access to a mental syllabary? Cognition (1994)
  • A.M. Liberman et al. The motor theory of speech perception revised. Cognition (1985)
  • B. Lindblom et al. Formant frequencies of some fixed-mandible vowels and a model of speech motor programming by predictive simulation. J Phon (1979)
  • H. Luo et al. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron (2007)
  • R.C. Miall et al. Forward models for physiological motor control. Neural Netw (1996)
  • F.A. Middleton et al. Cerebellar output channels. Int Rev Neurobiol (1997)
  • F.A. Middleton et al. Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res Rev (2000)
  • N. Nozari et al. Is comprehension necessary for error detection? A conflict-based account of monitoring in speech production. Cognit Psychol (2011)
  • J. Numminen et al. Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neurosci Lett (1999)
  • J. Numminen et al. Subject's own speech reduces reactivity of the human auditory cortex. Neurosci Lett (1999)
  • G.M. Oppenheim et al. Inner speech slips exhibit lexical bias, but not the phonemic similarity effect. Cognition (2008)
  • M.G. Peeva et al. Distinct representations of phonemes, syllables, and supra-syllabic sequences in the speech production network. Neuroimage (2010)
  • H. Ackermann et al. Speech deficits in ischaemic cerebellar lesions. J Neurol (1992)
  • H. Ackermann et al. The contribution of the cerebellum to speech production and speech perception: clinical and functional imaging data. Cerebellum (2007)
  • G.E. Alexander et al. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci (1986)
  • H. Ames. Neural dynamics of speech perception and production: from speaker normalization to apraxia of speech (2009)
  • R.A. Andersen. Multimodal integration for the representation of space in the posterior parietal cortex. Philos Trans R Soc Lond B Biol Sci (1997)
  • B.E. Averbeck et al. Parallel processing of serial movements in prefrontal cortex. Proc Natl Acad Sci (2002)
  • B.B. Averbeck et al. Neural activity in prefrontal cortex during copying geometrical shapes: I. Single cells encode shape, sequence, and metric parameters. Exp Brain Res (2003)
  • H.C. Bastian. The “muscular sense”: its nature and cortical localisation. Brain (1887)
  • D. Bendor et al. The neuronal representation of pitch in primate auditory cortex. Nature (2005)
  • J.W. Bohland et al. Neural representations and mechanisms for the performance of simple speech sequences. J Cogn Neurosci (2010)
  • K.E. Bouchard et al. Functional organization of human sensorimotor cortex for speech articulation. Nature (2013)
  • S. Boyce et al. Coarticulatory stability in American English /r/. J Acoust Soc Am (1997)
  • C.P. Browman et al. Articulatory gestures as phonological units. Phonology (1989)
  • D. Bullock et al. Neural dynamics of planned arm movements: emergent invariants and speed-accuracy properties during trajectory formation. Psychol Rev (1988)
  • D. Bullock et al. A self-organizing neural network model for redundant sensory-motor control, motor equivalence, and tool use. J Cogn Neurosci (1993)
  • T.A. Burnett et al. Voice F0 responses to manipulations in pitch feedback. J Acoust Soc Am (1998)