NeuroImage

Volume 172, 15 May 2018, Pages 162-174
Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension

https://doi.org/10.1016/j.neuroimage.2018.01.042

Abstract

Human experience often involves continuous sensory information that unfolds over time. This is true in particular for speech comprehension, where continuous acoustic signals are processed over seconds or even minutes. We show that brain responses to such continuous stimuli can be investigated in detail, for magnetoencephalography (MEG) data, by combining linear kernel estimation with minimum norm source localization. Previous research has shown that the requirement to average data over many trials can be overcome by modeling the brain response as a linear convolution of the stimulus and a kernel, or response function, and estimating a kernel that predicts the response from the stimulus. However, such analyses have typically been restricted to sensor space. Here we demonstrate that this analysis can also be performed in neural source space. We first computed distributed minimum norm current source estimates for continuous MEG recordings, and then computed response functions for the current estimate at each source element, using the boosting algorithm with cross-validation. Permutation tests can then assess the significance of individual predictor variables, as well as features of the corresponding spatio-temporal response functions. We demonstrate the viability of this technique by computing spatio-temporal response functions for speech stimuli, using predictor variables reflecting acoustic, lexical and semantic processing.
Results indicate that processes related to comprehension of continuous speech can be differentiated anatomically as well as temporally: acoustic information engaged auditory cortex at short latencies, followed by responses over the central sulcus and inferior frontal gyrus, possibly related to somatosensory/motor cortex involvement in speech perception; lexical frequency was associated with a left-lateralized response in auditory cortex and subsequent bilateral frontal activity; and semantic composition was associated with bilateral temporal and frontal brain activity. We conclude that this technique can be used to study the neural processing of continuous stimuli in time and anatomical space with the millisecond temporal resolution of MEG. This suggests new avenues for analyzing neural processing of naturalistic stimuli, without the necessity of averaging over artificially short or truncated stimuli.

Introduction

In a natural environment, the brain frequently processes information in a continuous fashion. For example, when listening to continuous speech, information is extracted incrementally from an uninterrupted acoustic signal at multiple levels: phonetically relevant sound patterns are recognized and grouped into words, which in turn are integrated into phrases that are meaningful in the context of a larger discourse (e.g. Gaskell and Mirkovic, 2016). Contrary to this continuous mode of functioning, neuroimaging experiments typically isolate phenomena of interest with short, repetitive trials (for many examples, see e.g. Gazzaniga et al., 2009). While such research unquestionably leads to valuable results, the lack of naturalness of the stimuli is associated with uncertainty about how generalizable such results are to real-world settings (see e.g. Brennan, 2016). Consequently, there is a need for complementary research with more naturalistic stimuli.

Brain responses to continuous speech have been studied with functional magnetic resonance imaging (fMRI) (Brennan et al., 2012, Brennan et al., 2016, Chow et al., 2014, Willems et al., 2016). Hemodynamic changes have been shown to track inherent properties of words, such as word frequency, as well as properties of words in context, such as the contextual probability of encountering a given word. However, the low temporal resolution of fMRI, typically sampled at or below 1 Hz, imposes several limitations on the phenomena that can be modeled. While the studies cited above suggest that the resolution is adequate to model responses with a timescale of individual words, this is not the case for processes at faster timescales such as phonetic perception, where relevant events last only tens of milliseconds. In addition, fMRI responses can be modeled in terms of brain regions which are or are not sensitive to a given variable, but the relative and absolute timing of different components of the response remain obscure. Thus, even when word-based variables are analyzed, hemodynamic responses are modeled as instantaneous effects of the relevant variable, convolved with the hemodynamic response function, but without taking into account the temporal relationship between the stimulus and different components of the brain response (e.g. Brennan et al., 2016, Willems et al., 2016).

In contrast to fMRI, electroencephalography (EEG) and magnetoencephalography (MEG) have the temporal resolution to track continuous processing with millisecond accuracy. Previous research has established that the dependency of the MEG or EEG response on a continuous stimulus variable can be modeled as a linear time-invariant system (Lalor et al., 2006). This technique was originally developed for relating neurons' spiking behavior to continuous sensory stimuli (see Ringach and Shapley, 2004), but can be extended to MEG/EEG signals by modeling the response as a linear convolution of a stimulus variable with an impulse response function (see Fig. 1). Given a known stimulus and a measured response, one can then estimate the optimal response function to predict the measured response from the stimulus. This technique has been used to model EEG responses to continuously changing visual stimuli, by modeling continuous EEG signals as the convolution of moment-by-moment stimulus luminance with an appropriate response function (Lalor et al., 2006). An analogous procedure has been used to estimate responses to amplitude modulated tones and noise (Lalor et al., 2009). As an extension of this procedure, the response to continuous speech has been modeled as a response to the level of momentary acoustic power, the acoustic envelope (Lalor and Foxe, 2010).
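The linear-systems idea can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the authors' code: the response is simulated as the convolution of a white-noise stimulus with a known kernel, and the kernel is recovered by ordinary least squares on a lagged design matrix (the studies cited above use related estimators such as reverse correlation or regularized regression).

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_lags = 2000, 40                 # samples; kernel length in samples

# Continuous stimulus variable (e.g. a normalized acoustic envelope)
s = rng.standard_normal(n)

# A known "true" response function, for simulation only
tau = np.arange(n_lags)
h_true = np.exp(-tau / 8.0) * np.sin(tau / 3.0)

# Measured response = stimulus convolved with the kernel, plus noise
y = np.convolve(s, h_true)[:n] + 0.1 * rng.standard_normal(n)

# Lagged design matrix: X[t, k] = s[t - k] (zero-padded at the start)
X = np.column_stack([np.concatenate([np.zeros(k), s[:n - k]])
                     for k in range(n_lags)])

# Least-squares estimate of the kernel from stimulus and response
h_est, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With a few minutes of data at typical MEG sampling rates, `h_est` closely tracks `h_true`; the boosting algorithm used in the paper plays the same role as `lstsq` here but enforces sparsity and uses cross-validation.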

While the original formulation focused on purely sensory neurons, i.e. neurons whose response is a linear function of sensory input (Ringach and Shapley, 2004), the same method has also been applied successfully to determine cognitive influences on sensory processing. This can be achieved by modeling the signal as a response to a continuous predictor variable that represents a specific property of interest of the input stimulus. Thus, besides the acoustic envelope, the EEG response to continuous speech has been shown to reflect categorical representations of phonemes (Di Liberto et al., 2015). Furthermore, using stimuli in which speech from multiple talkers is mixed, it has been shown that the response function to the acoustic envelope can be divided into an earlier component around 50 ms that responds to the acoustic power in the overall stimulus, and a later component around 100 ms that responds to the acoustic envelope of the attended speech stream but not the unattended one (Ding and Simon, 2012b, Ding and Simon, 2012a).

While this research shows that response functions for continuous stimuli can be estimated, and that they can track not just sensory but also cognitive processes, all the above studies estimated response functions using only sensor space data. Topographic distributions of response functions have been assessed using equivalent current dipole localization (Lalor et al., 2009, Ding and Simon, 2012a) but this does not use the full localizing power of MEG. For investigating cognitive processing of sensory signals in particular, better source localization has the potential to separate response functions to different stimulus properties through anatomical separation of the brain response. In this paper, we propose to use distributed minimum norm source estimates to localize MEG data before estimating response functions. We developed a procedure in which source estimates are computed for continuous raw data, response functions are estimated independently at each virtual current dipole of the source model, and these individual response functions are then recombined to create a picture of the brain's responses to different functional aspects of the continuous stimulus, in both time and anatomical space. In other words, source localization is used to decompose the raw signal based on likely anatomical origin, and this decomposition is then used to estimate each potential source location's response to a particular stimulus variable.
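The per-source decomposition described above can be illustrated with a toy sketch (hypothetical code; the names are illustrative, plain least squares stands in for boosting, and real source estimates come from a minimum norm inverse operator rather than simulation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_lags, n_sources = 1500, 20, 5

# Shared stimulus predictor and its lagged design matrix
s = rng.standard_normal(n)
X = np.column_stack([np.concatenate([np.zeros(k), s[:n - k]])
                     for k in range(n_lags)])

# Simulated source-space data: each virtual current dipole has its
# own response function to the same stimulus
H_true = rng.standard_normal((n_sources, n_lags))
Y = H_true @ X.T + 0.1 * rng.standard_normal((n_sources, n))

# Estimate a response function independently at each source element,
# then recombine into a (sources x lags) spatio-temporal response array
H_est = np.vstack([np.linalg.lstsq(X, y, rcond=None)[0] for y in Y])
```

The recombined `H_est` array is the kind of object the paper visualizes: one axis is anatomical space (source elements), the other is response latency.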

To test and demonstrate this procedure, we analyzed data from participants listening to segments of a narrated story. We show that 6 min of data per participant is enough to estimate response functions that are reliable across subjects. In order to demonstrate the ability to localize responses in different brain regions, we focused on predictor variables with clearly different predictions for their anatomical localization and temporal response characteristics (see Fig. 2): the response to the acoustic envelope of the speech signal should be associated with at least two strong components around 50 and 100 ms latency, in auditory cortex; previous studies suggest that the latter component is posterior to the former (Ding and Simon, 2012a). Responses associated with word recognition were assessed via lexical frequency, which is known to be one of the strongest predictors of lexical processing in general (see e.g. Baayen et al., 2016). Higher lexical frequency is associated with faster recognition of spoken words (e.g. Connine et al., 1990, Meunier and Segui, 1999, Dahan et al., 2001), and lower amplitudes in event related potentials to single spoken words (Dufour et al., 2013). fMRI investigations indicate a corresponding reduction in left-hemispheric temporal and frontal activity when processing more frequent compared to less frequent words in a narrated story (Brennan et al., 2016). Responses associated with higher levels of language processing beyond word recognition were assessed with an estimate of the amount of semantic combinatory processing over the course of the speech stimulus. This estimate was based on the presence of constructions associated with semantic composition operations, which previous MEG studies localized to the anterior temporal lobe (Bemis and Pylkkänen, 2011, Bemis and Pylkkänen, 2012, Westerlund et al., 2015).
This variable is relatively coarse and likely to be correlated with other variables reflecting structural integration, such as constituent size, associated with left temporal and inferior frontal activity (e.g., Pallier et al., 2011, Brennan et al., 2012). Consequently, this variable was treated as a rough estimate of multi-word integration processes during story comprehension, likely to be associated with anterior temporal and frontal responses.

Section snippets

Testing dataset

We analyzed a subset of the data described in detail by Presacco and colleagues (Presacco et al., 2016). In brief, 17 adults (aged 18–27 years) recruited from the Maryland and Washington, D.C. areas listened to 1-min long segments of an audiobook recording of The Legend of Sleepy Hollow by Washington Irving (https://librivox.org/the-legend-of-sleepy-hollow-by-washington-irving/), narrated by a male speaker, and sampled at 44,100 Hz. Audio segments were modified to remove pauses longer than

Results

The model fit was evaluated using the Pearson correlation between the actual and the predicted responses. Each of the three predictor variables was evaluated as to whether it significantly improved the model, by comparing the fit of the full model with the fit of a model in which this variable was deliberately misaligned, using the first half of the stimulus to predict the second half of the response and vice versa. Whole brain maps of the difference were tested for significance with one-tailed
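The logic of this misalignment comparison can be sketched as toy code (hypothetical, with a trivial one-parameter model standing in for the actual response functions, and without the cross-validation used in the real analysis): an aligned predictor should yield a higher predictive correlation than the same predictor with its halves swapped relative to the response.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
pred = rng.standard_normal(n)                      # hypothetical predictor
response = 0.8 * pred + 0.6 * rng.standard_normal(n)

def fit_and_score(x, y):
    """Fit y ~ beta * x by least squares; score with Pearson correlation."""
    beta = (x @ y) / (x @ x)
    return np.corrcoef(y, beta * x)[0, 1]

r_full = fit_and_score(pred, response)

# Misalign the predictor: use its first half against the second half of
# the response and vice versa, i.e. swap the two halves
misaligned = np.concatenate([pred[n // 2:], pred[:n // 2]])
r_misaligned = fit_and_score(misaligned, response)

print(r_full > r_misaligned)  # True: the aligned predictor explains more
```

In the paper, this full-versus-misaligned difference is computed per source element, and whole-brain maps of the difference are then submitted to permutation tests.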

Discussion

We described a procedure for combining linear kernel estimation with distributed MEG source localization to estimate the brain response to continuous stimuli as a function of delay time and anatomical space. We demonstrated the utility of this procedure by analyzing responses to continuous speech. Using just 6 min of MEG data per subject, we found reliable responses reflecting variables related to different cognitive levels of speech comprehension, including acoustic, lexical and semantic

Conclusion

We demonstrated that linear kernel estimation can be combined with distributed minimum norm source estimates to map the brain response to continuous speech in time and anatomical space. While we developed and tested this technique for studying speech processing, it is applicable to other continuous stimuli. Kernels can be estimated with multiple predictor variables competing for explanatory power, which makes it possible to model responses to suspected covariates and test whether variables of

Acknowledgements

This work was supported by the National Institutes of Health [grant number R01-DC-014085].

References (71)

  • E.C. Lalor et al.

    The VESPA: a method for the rapid estimation of a visual evoked potential

    NeuroImage

    (2006)
  • F.-H. Lin et al.

    Assessing and improving the spatial accuracy in MEG source localization by depth-weighted minimum-norm estimates

    NeuroImage

    (2006)
  • E. Maris et al.

    Nonparametric statistical testing of EEG- and MEG-data

    J. Neurosci. Meth.

    (2007)
  • F. Meunier et al.

    Frequency effects in auditory word recognition: the case of suffixed words

    J. Mem. Lang.

    (1999)
  • A. Nakamura et al.

    Somatosensory homunculus as drawn by MEG

    NeuroImage

    (1998)
  • K.V. Nourski et al.

    Functional organization of human auditory cortex: investigation of response latencies through direct recordings

    NeuroImage

    (2014)
  • R.C. Oldfield

    The assessment and analysis of handedness: the Edinburgh inventory

    Neuropsychologia

    (1971)
  • D. Ringach et al.

    Reverse correlation in neurophysiology

    Cognit. Sci.

    (2004)
  • S.M. Smith et al.

    Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference

    NeuroImage

    (2009)
  • M. Westerlund et al.

    The LATL as locus of composition: MEG evidence from English and Arabic

    Brain Lang.

    (2015)
  • E. Ahissar et al.

    Speech comprehension is correlated with temporal response patterns recorded from auditory cortex

    Proc. Natl. Acad. Sci. U. S. A.

    (2001)
  • S. Akram et al.

    Dynamic estimation of the auditory temporal response function from MEG in competing-speaker environments

    IEEE Trans. Biomed. Eng.

    (2016)
  • R.H. Baayen et al.

    Frequency in lexical processing

    Aphasiology

    (2016)
  • D.K. Bemis et al.

    Simple composition: a magnetoencephalography investigation into the comprehension of minimal linguistic phrases

    J. Neurosci.

    (2011)
  • D.K. Bemis et al.

    Basic linguistic composition recruits the left anterior temporal lobe and left angular gyrus during both listening and reading

    Cerebr. Cortex

    (2012)
  • P. Boersma et al.

    Praat: Doing Phonetics by Computer [Computer Program] (Version 6.0.19)

    (2017)
  • J.R. Brennan

    Naturalistic sentence comprehension in the brain

    Lang. Ling. Compass

    (2016)
  • C. Brodbeck

    Eelbrain: 0.25

    (2017)
  • M. Brysbaert et al.

    Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English

    Behav. Res. Meth.

    (2009)
  • C. Cheung et al.

    The auditory representation of speech sounds in human motor cortex

    ELife

    (2016)
  • H.M. Chow et al.

    Embodied comprehension of stories: interactions between language regions and modality-specific neural systems

    J. Cognit. Neurosci.

    (2014)
  • G.B. Cogan et al.

    Sensory–motor transformations for speech occur bilaterally

    Nature

    (2014)
  • C.M. Connine et al.

    Word familiarity and frequency in visual and auditory word recognition

    J. Exp. Psychol. Learn. Mem. Cognit.

    (1990)
  • A.M. Dale et al.

Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach

    J. Cognit. Neurosci.

    (1993)
  • S.V. David et al.

    Estimating sparse spectro-temporal receptive fields with natural stimuli

    Netw. Comput. Neural Syst.

    (2007)