ABSTRACT
Natural conversation is multisensory: when we can see the speaker’s face, visual speech cues influence our perception of what is being said. The neuronal basis of this phenomenon remains unclear, although there are indications that neuronal oscillations—ongoing excitability fluctuations of neuronal populations in the brain—represent a potential mechanism. Investigating this question with intracranial recordings in humans, we show that some sites in auditory cortex track the temporal dynamics of unisensory visual speech using the phase of their slow oscillations and phase-related modulations in neuronal activity. This effect is asymmetric, as we find far less detectable tracking of auditory speech by visual cortex. Auditory cortex thus builds a representation of the speech stream’s envelope based on visual speech alone, at least in part by resetting the phase of its ongoing oscillations. Phase reset amplifies the representation of the speech stream and organizes the information contained in neuronal activity patterns.
INTRODUCTION
While viewing one’s interlocutor is not always necessary for speech perception, it significantly improves intelligibility under noisy conditions (Sumby and Pollack, 1954). Moreover, mismatched auditory and visual speech stimuli can induce striking perceptual illusions (McGurk and Macdonald, 1976). Despite the ubiquity and power of visual influences on speech perception, the underlying neuronal mechanisms remain an open question. The cerebral processing of auditory and visual speech converges in multisensory cortical areas, especially the superior temporal lobe (Miller and D’Esposito, 2005; Beauchamp, Nath and Pasalar, 2010). Crossmodal influences are also found in cortex traditionally considered to be unisensory; in particular, visual speech modulates the activity of auditory cortex (Calvert et al., 1997; Besle et al., 2008).
The articulatory movements that constitute visual speech strongly correlate with the corresponding speech sounds (Chandrasekaran et al., 2009; Schwartz and Savariaux, 2014) and predict them to some extent (Arnal et al., 2009; Zion Golumbic et al., 2013), suggesting that visual speech might serve as an alerting cue to auditory cortex, preparing the neural circuits to process the incoming speech sounds more efficiently. Our hypothesis is that this preparation occurs through a resetting of the phase of neuronal oscillations: by this mechanism, visual speech cues influence neuronal excitability in auditory cortex (Schroeder et al., 2008).
This hypothesis rests on four lines of evidence. First, auditory speech is rhythmic, with syllables arriving at a relatively rapid rate (4-7 Hz) nested within the slower (1-3 Hz) rates of phrase and word production. These rhythmic features of speech are critical for it to be intelligible (Shannon et al., 1995; Greenberg et al., 2003). Second, auditory cortex synchronizes its oscillations to the rhythm of heard speech, and the magnitude of this synchronization correlates with the intelligibility of speech (Ahissar et al., 2001; Luo and Poeppel, 2007; Ghinst et al., 2016; Di Liberto et al., 2018; Keitel, Gross and Kayser, 2018). Third, neuronal oscillations correspond to momentary changes in neuronal excitability, so that the response of sensory cortex depends on the phase of its oscillations upon stimulus arrival (Lakatos et al., 2005; Whittingstall and Logothetis, 2009). Fourth, even at the level of primary sensory cortex, oscillations can be phase-reset by stimuli from other modalities, and this crossmodal reset influences the processing of incoming stimuli from the preferred modality (Lakatos et al., 2007; Kayser, Petkov and Logothetis, 2008; Mercier et al., 2013, 2015).
There is strong support for the phase-reset hypothesis in non-human primates (Perrodin et al., 2015). In humans, noninvasive neurophysiology has provided solid evidence that visual speech entrains oscillatory activity in widespread regions of the cerebral cortex, including areas involved in speech perception and production (Crosse, Butler and Lalor, 2015; Park et al., 2016, 2018). However, limitations inherent to noninvasive methods leave two crucial sets of questions unanswered. First, because the reconstruction of cerebral sources from neurophysiological signals recorded at the scalp is necessarily imprecise (Mégevand et al., 2014), the exact identity of the cortical areas involved has not yet been ascertained. More specifically, whether human auditory cortex aligns the phase of its oscillations to unisensory visual speech remains to be demonstrated. Second, the mechanistic basis for phase alignment is unclear: it could represent either a resetting of the phase of ongoing neuronal oscillations or a succession of sensory-evoked responses (Shah et al., 2004; Schroeder et al., 2008). Noninvasive neurophysiology is ill-equipped to address this point, because it cannot reliably measure high-frequency cortical activity (Millman et al., 2013), a signal that directly correlates with local neuronal activity (Ray et al., 2008).
Here, we used intracranial EEG (Parvizi and Kastner, 2018) to settle these questions. We demonstrate that portions of human auditory cortex are able to align the phase of their oscillations to unisensory visual speech stimuli, and that this alignment happens through phase reset. Our findings are the strongest confirmation to date of the phase-reset hypothesis of audiovisual speech integration (Schroeder et al., 2008).
RESULTS
Phase reset of low-frequency oscillations in auditory cortex in response to visual speech
We recorded intracranial EEG (iEEG) signals from electrodes implanted in the brain of six human participants undergoing invasive electrophysiological monitoring for epilepsy. The participants attended to clips of a speaker telling a short story (7-11 seconds long), presented in the auditory (soundtrack with black screen) and visual (silent movie) modalities. iEEG electrodes were considered to be in auditory cortex (25 electrodes over 5 participants) if they fulfilled both an anatomical criterion (location in the superior temporal lobe) and a physiological criterion: an increase in local neuronal activity (as indexed by the amplitude of broadband high-frequency activity, BHA; Ray et al., 2008) in response to auditory speech. To determine how visual speech influences activity in auditory cortex, we computed BHA as well as power and intertrial coherence (ITC, a measure of phase alignment) in the delta (1-3 Hz), theta (4-7 Hz) and alpha (8-12 Hz) oscillatory frequency bands.
Figures 1A to 1D show data for a representative electrode in auditory cortex that displayed a sustained alignment in the phase of its delta-band oscillations in response to unisensory visual speech (Figure 1C). If this phase alignment were caused by sensory-evoked responses, increases in delta power and local neuronal activity would be expected (Makeig et al., 2002; Shah et al., 2004; Lakatos et al., 2007). In fact, delta power decreased (see Figure 1C), and local neuronal activity did not increase (Figure 1D). Thus, this combination of observations points towards phase reset of ongoing neuronal oscillations as the more likely mechanism.
This was verified across electrodes and participants: several auditory cortex electrodes displayed robust phase alignment of their slow oscillations in response to visual speech, as demonstrated by a significant ITC increase in the delta and, to a lesser extent, theta bands (Figure 1E). By contrast, local neuronal activity (Figure 1F) and low-frequency power (Figure 1G) tended to decrease in a majority of auditory electrodes in response to visual speech. Importantly, there was no correlation between the intensity of delta phase alignment and neuronal activity, and delta phase alignment correlated inversely with delta power (Figure 1H). Taken together, these results support the view that the low-frequency phase alignment to visual speech observed in some portions of auditory cortex is mediated by rapid, repetitive crossmodal phase resetting of ongoing neuronal oscillations rather than by a succession of sensory-evoked responses (Schroeder et al., 2008).
Phase-amplitude coupling links slow oscillations to local neuronal activity
Neuronal oscillations reflect momentary fluctuations in neuronal excitability through phase-amplitude coupling (PAC; Buzsáki and Draguhn, 2004; Lakatos et al., 2005; Whittingstall and Logothetis, 2009; Canolty and Knight, 2010). We looked for evidence of PAC in auditory cortex during the perception of visual speech (see Figure 2A and 2B for an example) by computing the modulation index (MI; Tort et al., 2010) between slow oscillations and BHA. Across participants, most auditory electrodes displayed significant PAC in response to visual speech (Figure 2C). The magnitude of PAC in auditory cortex correlated with the magnitude of phase alignment, but not with the intensity of local neuronal activity (Figure 2D). The combination of phase alignment of auditory cortex to visual speech in the delta and theta bands with evidence of phase-amplitude coupling at these frequencies suggests that, even though visual speech does not increase the overall rate of neuronal activity in auditory cortex, it shapes the temporal dynamics of auditory cortical activity at frequencies that are relevant for the processing of auditory speech.
Auditory cortex represents the temporal dynamics of speech sounds from visual cues
Our observation of phase-locking of auditory cortex to visual speech, combined with the established correlation between parameters of visual speech such as the area of mouth opening and the envelope of speech sounds (Chandrasekaran et al., 2009), suggests that auditory cortex might be able to build a relatively detailed representation of the temporal dynamics of speech from unisensory visual inputs. To probe this representation, we applied a stimulus-reconstruction technique (Mesgarani et al., 2009) in an attempt to reconstruct the speech envelope from the responses of auditory cortex to visual speech alone (see Figure 3A and 3B for an example). Across participants, we found that reconstruction performed significantly above chance in a subset of auditory electrodes (Figure 3C), even in the complete absence of any auditory input. Importantly, these electrodes were among those that exhibited delta phase-locking to visual speech (compare Figures 3C and 1E, and see also Figure S1). As a control, we failed to reconstruct the speech envelope from visual cortex responses to auditory speech (not shown), despite the fact that auditory stimuli were actually presented in that case. These results indicate that some portions of auditory cortex build a faithful representation of perceived speech based on visual speech cues alone. As has been shown before (Zion Golumbic et al., 2013), this representation complements and enriches that built from auditory speech, thus facilitating the attentional selection of the speech stream, as well as its parsing into phonetically and linguistically relevant building blocks (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Arnal and Giraud, 2012; Giraud and Poeppel, 2012).
Since each stimulus was presented multiple times, it could be that stimulus representation in auditory cortex became more faithful over repetitions, as participants associated visual gestures and speech sounds in the stimulus set. To probe this, we reconstructed the speech envelope from auditory cortex responses to visual speech separately for each repetition of the stimuli.
We did not find any tendency for reconstruction accuracy to improve over stimulus repetitions (Figure 3D). Nevertheless, it is very likely that repeated exposure will strengthen the associations between visual and auditory speech tokens, as suggested by the literature on speechreading training (Massaro, Cohen and Gesi, 1993).
Little evidence of phase alignment to auditory speech in visual cortex
The phase-reset hypothesis makes a site- and direction-specific prediction regarding phase reset of auditory cortex oscillations by visual speech gestures. To test this prediction, we examined the responses of visual cortex to auditory speech (28 electrodes over 4 participants, selected according to anatomical and physiological criteria: location in the occipital lobe and increased BHA to visual speech). There was little detectable phase alignment of slow oscillations in visual cortex to auditory speech (Figure 4). This observation fits with the notion that the phase-resetting effect of visual speech on auditory cortex is specific, and is not merely due to an indiscriminate phase-reset of oscillations in sensory cortex by crossmodal stimuli.
DISCUSSION
We found that auditory cortex tracks the temporal dynamics of unisensory visual speech through the phase entrainment of intrinsic low-frequency oscillations. This phase alignment in turn determines systematic, stimulus-locked variations in neuronal activity, as indexed by fluctuations in broadband high-frequency activity. Previous work had shown that visual speech gestures enhance intelligibility by facilitating auditory cortical entrainment to the speech stream (Crosse, Butler and Lalor, 2015; Perrodin et al., 2015; Park et al., 2016, 2018; Di Liberto et al., 2018; Micheli et al., 2018). Here, we used iEEG recordings for a more direct examination of the neurophysiological mechanisms underlying this visual enhancement of auditory cortical speech processing. Our findings significantly elaborate the mechanistic description of crossmodal stimulus processing as a critical contribution to speech perception under complex and noisy natural conditions. Three aspects of the findings are novel and fundamentally important.
First, the low-frequency tracking reflects a pattern of phase resetting linked to the succession of visual cues, rather than simply a succession of evoked responses. Indeed, phase concentration is accompanied by an amplitude decrease, rather than the amplitude increase that accompanies evoked responses (Makeig et al., 2002; Shah et al., 2004). Interestingly, this may help to explain the paradoxical observation that, despite the general perceptual amplification that attends audiovisual speech, neurophysiological responses to multisensory audiovisual stimuli in both auditory and visual cortex are generally smaller than those to the preferred-modality stimulus alone (Besle et al., 2008; Schepers, Yoshor and Beauchamp, 2014; Mercier et al., 2015). While the physiological mechanisms of the low-frequency power decrease are not yet clear, our findings represent an unequivocal demonstration of crossmodal phase reset in speech perception, and they strongly support the hypothesis that oscillatory phase reset is a mechanism by which visual speech cues influence the processing of speech sounds by auditory cortex (Schroeder et al., 2008).
Second, auditory cortical responses to visual speech in isolation reflect stimulus-specific features of the visual speech cues. As such, they suggest a key role for oscillatory phase as a neuronal coding mechanism—alongside the intensity and spatial pattern of neuronal responses (Kayser et al., 2009)—underlying specific aspects of audiovisual speech integration such as the categorical perception of syllables (ten Oever and Sack, 2015). This hypothesis could be tested in future studies by investigating how conflicting auditory and visual speech cues hijack spike-phase coding to cause perceptual illusions (McGurk and Macdonald, 1976).
Finally, the pattern of rapid quasi-rhythmic phase resetting we observe has strong implications for the mechanistic understanding of speech processing in general. Indeed, this phase resetting aligns the ambient excitability fluctuations in auditory cortex with the incoming sensory stimuli, potentially helping to parse the continuous speech stream into linguistically relevant processing units such as syllables (Schroeder et al., 2008; Giraud and Poeppel, 2012; Zion Golumbic, Poeppel and Schroeder, 2012). As attention strongly reinforces the tracking of a specific speech stream (Mesgarani and Chang, 2012; Zion Golumbic et al., 2013; O’Sullivan et al., 2015), phase resetting will tend to amplify an attended speech stream above background noise, increasing its perceptual salience.
It is clear that visual enhancement of speech takes place within the context of strong top-down influences from frontal and parietal regions that support the processing of distinct linguistic features (Di Liberto et al., 2018; Keitel, Gross and Kayser, 2018). It is also clear that low-frequency oscillations relevant to speech perception can themselves be modulated by transcranial electrical stimulation (Zoefel, Archer-Boyd and Davis, 2018). Our findings highlight the need to consider oscillatory phase in targeting potential neuromodulation therapy to enhance communication.
MATERIALS AND METHODS
Experimental design
Participants
Six patients (3 women; age range 21-52 years) suffering from drug-resistant focal epilepsy and undergoing video-intracranial EEG (iEEG) monitoring at North Shore University Hospital (Manhasset, NY 11030, USA) participated in the experiments. All participants were native speakers of English. The participants provided written informed consent under the guidelines of the Declaration of Helsinki, as monitored by the Feinstein Institute for Medical Research’s institutional review board.
Stimuli and task
Stimuli (Zion Golumbic et al., 2013) were presented at the bedside using a laptop computer and Presentation software (version 17.2, Neurobehavioral Systems, Inc., Berkeley, CA, http://www.neurobs.com). Trials started with a 1-s fixation cross on a black screen. The participants then viewed or heard video clips (7-11 seconds) of a speaker telling a short story. The clips were cut off to leave out the last word. A written word was then presented on the screen, and the participants had to indicate whether that word ended the story appropriately or not. There was no time limit for participants to indicate their answer; reaction time was not monitored. There were 2 speakers (one woman) telling 4 stories each (8 distinct stories); each story was presented once with each of 8 different ending words (4 appropriate), for a total of 64 trials. These 64 trials were each presented once in each of 3 sensory modalities: audiovisual (movie with audio track), auditory (soundtrack with a fixation cross on a black screen), and visual (silent movie). Trial order was randomized, with the constraint that the same story could not be presented twice in a row, regardless of modality.
Intracranial EEG recordings
iEEG electrode localization
The placement of iEEG electrodes (subdural and depth electrodes, Ad-Tech Medical, Racine, WI, and Integra LifeSciences, Plainsboro, NJ) was determined on clinical grounds, without reference to this study. The localization and display of iEEG electrodes was performed using iELVis (http://ielvis.pbworks.com) (Groppe et al., 2017). For each participant, a post-implantation high-resolution CT scan was coregistered with a post-implantation 3D T1 1.5-tesla MRI scan and then with a pre-implantation 3D T1 3-tesla MRI scan via rigid-body transforms (6 degrees of freedom) using the FMRIB Linear Image Registration Tool included in the FMRIB Software Library (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) (Jenkinson et al., 2012) or the bbregister tool included in FreeSurfer (https://surfer.nmr.mgh.harvard.edu/fswiki/FreeSurferWiki) (Fischl, 2012). Electrodes were localized manually on the CT scan using BioImage Suite (http://bioimagesuite.yale.edu/) (Papademetris et al., 2006). The pre-implantation 3D T1 MRI scan was processed using FreeSurfer to segment the white matter, deep grey matter structures, and cortex, reconstruct the pial surface, approximate the leptomeningeal surface (Schaer et al., 2008), and parcellate the neocortex according to gyral anatomy (Desikan et al., 2006). In order to compensate for the brain shift that accompanies the insertion of subdural electrodes through a large craniotomy, subdural electrodes were projected back to the pre-implantation leptomeningeal surface (Dykstra et al., 2012) using iELVis.
iEEG recording and preprocessing
Intracranial EEG signals were referenced to a vertex subdermal electrode, filtered and digitized (0.1 Hz high-pass filter, 200 Hz low-pass filter, 500-512 samples per second, XLTEK EMU128FS or Natus Neurolink IP 256 systems, Natus Medical, Inc., Pleasanton, CA). Analysis was performed offline using the FieldTrip toolbox (http://www.fieldtriptoolbox.org/) (Oostenveld et al., 2011) and custom-made programs for MATLAB (The MathWorks Inc., Natick, MA, https://www.mathworks.com/products/matlab.html). 60-Hz line noise and its harmonics were filtered out using a discrete Fourier transform filter, iEEG electrodes contaminated with noise or abundant epileptiform activity were identified visually and rejected, and the remaining iEEG signals were re-referenced to the common average.
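For illustration, these preprocessing steps can be sketched using FieldTrip's documented interface; the dataset file name is hypothetical, and the actual pipeline also included the visual rejection of noisy electrodes described above:

```matlab
% Minimal FieldTrip preprocessing sketch (hypothetical dataset name).
cfg           = [];
cfg.dataset   = 'patient01_ieeg.edf';  % hypothetical recording file
cfg.dftfilter = 'yes';                 % discrete Fourier transform filter
cfg.dftfreq   = [60 120 180];          % 60-Hz line noise and harmonics
data          = ft_preprocessing(cfg);

% After visual rejection of bad channels, re-reference to the common average.
cfg            = [];
cfg.reref      = 'yes';
cfg.refchannel = 'all';                % average of all retained channels
dataReref      = ft_preprocessing(cfg, data);
```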
Time-frequency analysis of iEEG signals
Time-frequency analysis was performed using a Morlet wavelet transform. Wavelets (3 cycles) were centered every 10 ms from −2 to +10 s relative to stimulus onset, and at frequencies spaced every 1 Hz from 1 to 8 Hz, every 2 Hz from 10 to 12 Hz, and every 10 Hz from 70 to 150 Hz. The complex number resulting from the wavelet transform was used to compute the power and phase of oscillations.
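As an illustration, the wavelet transform of a single channel can be sketched in plain MATLAB as follows; x (one channel's signal, as a row vector) is assumed given, and the normalization convention may differ from that of the toolbox actually used:

```matlab
% Sketch of a 3-cycle Morlet wavelet transform. x: row vector containing
% one channel's signal (assumed given); fs: sampling rate in Hz.
fs  = 500;
foi = [1:8, 10, 12, 70:10:150];           % frequencies of interest (Hz)
nCycles = 3;
spec = zeros(numel(foi), numel(x));       % becomes complex on assignment
for k = 1:numel(foi)
    f  = foi(k);
    sd = nCycles / (2*pi*f);              % temporal SD of the Gaussian taper
    t  = -4*sd : 1/fs : 4*sd;             % wavelet support (+/- 4 SD)
    w  = exp(2i*pi*f*t) .* exp(-t.^2 / (2*sd^2));
    w  = w / sum(abs(w));                 % amplitude normalization
    spec(k,:) = conv(x, w, 'same');       % complex wavelet coefficients
end
pow = abs(spec).^2;                       % oscillatory power
phi = angle(spec);                        % oscillatory phase
```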
Power
Single-trial power was baseline-corrected by dividing it by the mean power over trials of the same modality during the −0.5 to −0.25-s baseline period preceding stimulus onset (Grandchamp and Delorme, 2011). Note that moving the baseline to earlier or later periods (−1.0 to −0.75 s, −0.75 to −0.5 s, or −0.25 to 0 s) did not significantly alter our observations. Power was averaged over canonical frequency bands (delta: 1-3 Hz, theta: 4-7 Hz, alpha: 8-12 Hz). Broadband high-frequency activity (BHA), which reflects local neuronal activity (Ray et al., 2008), was computed by dividing single-trial power within each 10-Hz frequency band between 70 and 150 Hz by its own mean over the trial, baseline-correcting as described above, and then averaging over frequency bands. Single-trial power was then averaged over trials in each modality and over time from +1 to +7 seconds relative to stimulus onset; the first and last seconds were ignored to leave out onset and offset responses.
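A minimal sketch of these normalization steps, assuming a trials x frequencies x time array of wavelet power (powTrials, with time axis tAxis in seconds and hfIdx indexing the 10-Hz bands between 70 and 150 Hz; all variable names are illustrative, and implicit expansion requires MATLAB R2016b or later):

```matlab
% Divisive baseline correction (per modality) and BHA computation.
blWin = tAxis >= -0.5 & tAxis <= -0.25;             % baseline window
blPow = mean(mean(powTrials(:,:,blWin), 3), 1);     % mean over time, then trials
powBC = powTrials ./ blPow;                         % divisive baseline correction

% BHA: divide each 10-Hz band by its own mean over the trial,
% baseline-correct, then average across bands.
hfNorm = powTrials(:,hfIdx,:) ./ mean(powTrials(:,hfIdx,:), 3);
blHF   = mean(mean(hfNorm(:,:,blWin), 3), 1);       % baseline of normalized power
bha    = squeeze(mean(hfNorm ./ blHF, 2));          % trials x time
```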
In order to assess whether auditory electrodes displayed changes in power in response to visual speech, a paired t-test comparing power during stimulus presentation to baseline was computed in each electrode, modality and frequency band. The corresponding p-values (two-tailed, because power could either increase or decrease from baseline) were corrected for multiple comparisons across electrodes using an FDR procedure with a family-wise error rate set at 0.05, implemented in the Mass Univariate ERP toolbox (Groppe, Urbach and Kutas, 2011). Note that the Benjamini-Hochberg FDR procedure maintains adequate control of the family-wise error rate even in the case of positive dependencies between the observed variables (Benjamini and Hochberg, 1995; Groppe, Urbach and Kutas, 2011). The same approach was used in visual electrodes to test for power changes in response to auditory speech.
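The Benjamini-Hochberg procedure itself is straightforward; the following is a from-scratch sketch (the actual analysis used the Mass Univariate ERP toolbox implementation) for a vector pvals of p-values and a false-discovery rate q = 0.05:

```matlab
% Benjamini-Hochberg FDR sketch: declare significant all p-values up to
% the largest rank k such that p(k) <= k/m * q.
q = 0.05;
[pSorted, order] = sort(pvals(:)');
m    = numel(pSorted);
crit = (1:m) / m * q;                        % BH critical values
kMax = find(pSorted <= crit, 1, 'last');
sig  = false(1, m);
if ~isempty(kMax)
    sig(order(1:kMax)) = true;               % significant tests, original order
end
```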
Intertrial coherence
The intertrial coherence (ITC) quantifies the phase alignment of iEEG oscillations over trials and ranges from 0 to 1, with 1 indicating perfect phase alignment (Tallon-Baudry et al., 1996). ITC was computed as the mean resultant length of the phase angles of slow oscillations over the 8 trials where the same stimulus was presented. Single-stimulus ITC was then averaged over time from +1 to +7 seconds relative to stimulus onset and over stimuli within each modality. In order to assess the hypothesis that auditory electrodes displayed increased ITC to visual speech, a permutation test was used to generate a surrogate distribution of ITC under the null hypothesis. For that purpose, 8 trials were selected at random from the 64 trials in each modality, and the ITC over these 8 trials was computed in the same fashion as the observed ITC. The procedure was repeated 1000 times. Observed ITC values were converted to z-scores relative to the surrogate distribution. The corresponding p-values (one-tailed, because the null hypothesis was that observed ITC values are no higher than expected by chance) were corrected for multiple comparisons over electrodes and frequency bands (delta, theta and alpha) using an FDR procedure with a family-wise error rate set at 0.05. The same approach was used in visual electrodes to test for increased ITC to auditory speech.
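For illustration, the ITC of one stimulus and its permutation-based null distribution can be sketched as follows; phaseTrials (a trials x time matrix of single-trial phase for one electrode and frequency band) and stimIdx (the indices of the 8 repetitions of the stimulus) are assumed given:

```matlab
% ITC: mean resultant length of the phase angles over the 8 repetitions.
itc = abs(mean(exp(1i * phaseTrials(stimIdx,:)), 1));   % 1 x time

% Surrogate ITC: draw 8 trials at random from the 64 trials of the modality.
nPerm   = 1000;
itcSurr = zeros(nPerm, size(phaseTrials,2));
for p = 1:nPerm
    pick = randperm(size(phaseTrials,1), 8);            % random 8-trial draw
    itcSurr(p,:) = abs(mean(exp(1i * phaseTrials(pick,:)), 1));
end

% Convert the time-averaged observed ITC to a z-score under the null.
itcObs  = mean(itc);
surrAvg = mean(itcSurr, 2);
zITC    = (itcObs - mean(surrAvg)) / std(surrAvg);
```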
Phase-amplitude coupling
Phase-amplitude coupling refers to the systematic relationship between the phase of a slow oscillation and the intensity of local neuronal activity, approximated by BHA (Canolty et al., 2006). Phase-amplitude coupling was quantified by computing the modulation index (MI; Tort et al., 2010) using custom-made MATLAB code. The MI is based on the Kullback-Leibler distance between the observed distribution of BHA values, binned as a function of slow oscillatory phase, and a uniform distribution, normalized by the logarithm of the number of phase bins. It ranges from 0 to 1, with 0 indicating no phase-amplitude coupling. MI was computed for each trial from +1 to +7 seconds relative to stimulus onset, and was then averaged over stimuli within each modality.
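A sketch of the MI computation for one trial, with phi (slow-oscillation phase) and amp (BHA) as vectors over the same time samples; the choice of 18 phase bins is illustrative, following Tort et al. (2010):

```matlab
% Modulation index: KL distance between the phase-binned BHA distribution
% and a uniform distribution, normalized by log(number of bins).
nBins = 18;                                        % 20-degree phase bins
edges = linspace(-pi, pi, nBins+1);
[~, ~, binIdx] = histcounts(phi, edges);           % phase bin of each sample
meanAmp = accumarray(binIdx(:), amp(:), [nBins 1], @mean);
P = meanAmp / sum(meanAmp);                        % normalized distribution
P(P == 0) = eps;                                   % guard against empty bins
MI = (log(nBins) + sum(P .* log(P))) / log(nBins); % 0: no PAC; 1: maximal PAC
```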
In order to assess the hypothesis that auditory electrodes displayed significant phase-amplitude coupling during visual speech, a permutation test was used to generate a surrogate distribution of MI under the null hypothesis. For that purpose, the BHA and slow oscillatory phase of trials within each modality were paired at random and a surrogate value of MI was computed in the same fashion as the observed MI. The procedure was repeated 1000 times. Observed MI values were converted to z-scores relative to the surrogate MI distribution. The corresponding p-values (one-tailed, because the null hypothesis was that observed MI values are not higher than expected by chance) were corrected for multiple comparisons over electrodes and frequency bands (delta, theta and alpha) using an FDR procedure with a family-wise error rate set at 0.05.
Stimulus reconstruction
Because neuronal activity in auditory cortex reflects the dynamics of auditory stimuli, the spectro-temporal features of speech sounds can be reconstructed from the neural responses of auditory cortex (Mesgarani et al., 2009; Pasley et al., 2012). In order to determine whether auditory cortex encodes features of speech stimuli that are detailed enough to allow their identification, the speech envelope was reconstructed from the iEEG responses using NAPLIB (Khalighinejad et al., 2017). The rationale for reconstructing the speech envelope, a feature of auditory stimuli, from neural responses to unisensory visual speech is that the speech envelope correlates with the area of mouth opening (Chandrasekaran et al., 2009; Schwartz and Savariaux, 2014). The broadband speech envelope was extracted by filtering the audio track of the video clips through a gammatone filter bank with 128 center frequencies equally spaced on the equivalent rectangular bandwidth (ERB) rate scale and ranging from 80 to 5000 Hz, approximating cochlear filtering (Carney and Yin, 1988); computing the power in each frequency band using a Hilbert transform; and averaging power over frequencies (University of Surrey’s Institute of Sound Recording MATLAB Toolbox). The speech envelope was then reconstructed from the broadband iEEG signals (downsampled to 100 samples per second, then averaged over trials for each video clip in each modality) using optimal prior reconstruction, a linear mapping between the neural responses and the original stimulus (Mesgarani et al., 2009). Lags of −200 to +200 ms between the speech envelope and the neural responses were allowed. The reconstruction algorithm was first trained on 7 of the 8 stimuli in each modality, and then tested by reconstructing the speech envelope of the 8th stimulus from the corresponding neural responses. The procedure was repeated for all 8 video clips. In order to account for the varying lengths of the video clips, the speech envelopes were padded with zeros between −1 and +13 s relative to stimulus onset. The zero-lag cross-correlation between the actual and reconstructed speech envelopes (averaged over stimuli within each modality) was used as a metric for the accuracy of stimulus reconstruction.
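The principle of the decoder can be illustrated with a simplified ridge-regression sketch (the study itself used NAPLIB's optimal prior reconstruction, so this is an analogous technique rather than the actual implementation). R (a time x electrodes matrix of trial-averaged neural responses at 100 samples per second) and env (the speech envelope at the same rate, as a column vector) are assumptions of this sketch:

```matlab
% Simplified linear stimulus reconstruction with time lags.
fs   = 100;
lags = round(-0.2*fs) : round(0.2*fs);         % -200 to +200 ms
[nT, nE] = size(R);
X = zeros(nT, nE * numel(lags));               % lagged design matrix
for k = 1:numel(lags)
    % Circular shift as an edge-handling shortcut for this sketch.
    X(:, (k-1)*nE + (1:nE)) = circshift(R, lags(k), 1);
end
lambda = 1e2;                                  % ridge parameter (cross-validate)
w      = (X'*X + lambda*eye(size(X,2))) \ (X'*env);  % decoder weights
envHat = X * w;                                % reconstructed envelope
c = corrcoef(envHat, env);                     % zero-lag cross-correlation
r = c(1,2);                                    % reconstruction accuracy
```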
In order to assess the hypothesis that the speech envelope can be reconstructed from the activity of auditory electrodes in response to visual speech, a permutation test was used to generate a surrogate distribution of cross-correlations under the null hypothesis. For that purpose, stimulus reconstruction was performed after the labels of the speech envelopes of each stimulus in each modality were shuffled, and surrogate values of the cross-correlation were computed in the same fashion as the observed cross-correlations. The procedure was repeated 1000 times. Observed cross-correlations were then converted to z-scores relative to the surrogate distribution. The corresponding p-values (one-tailed, because the null hypothesis was that observed cross-correlation values are no higher than expected by chance) were corrected for multiple comparisons over electrodes using an FDR procedure with a family-wise error rate set at 0.05.
Electrode selection
The selection of electrodes for further analysis was based on a combination of anatomical and neurophysiological criteria. The anatomical criterion was based on the Desikan-Killiany parcellation of each participant’s MRI (Desikan et al., 2006). Within each participant, the selection of auditory electrodes started by identifying electrodes that lay in the superior temporal lobe (superior temporal gyrus, transverse temporal cortex, or banks of the superior temporal sulcus of the Desikan-Killiany parcellation). Then, the BHA response of these electrodes to auditory speech was examined for a sustained increase (physiological criterion). BHA was averaged between +1 and +7 seconds relative to stimulus onset and compared to baseline using a two-tailed one-sample t-test. P-values were corrected for multiple comparisons over electrodes (2-8 per participant) using a false-discovery rate (FDR) procedure (Benjamini and Hochberg, 1995) with a family-wise error rate set at 0.05. Twenty-five electrodes in 5 participants matched these criteria. Similarly, visual electrodes were defined as those electrodes that lay in the occipital lobe (lingual gyrus, pericalcarine cortex, cuneus, or lateral occipital cortex of the Desikan-Killiany parcellation) and that displayed a sustained BHA increase in response to visual speech. Twenty-eight electrodes in 4 participants matched these criteria.
Assessing the statistical significance of observed effects across all electrodes of interest
We computed the probability of observing at least a given number of significant electrodes under the null hypothesis by simulating one billion null experiments and subjecting the simulated z-scores to the same FDR procedure as the observed data. The corresponding probabilities appear in the legends to the figures.
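A reduced sketch of this simulation (10^6 rather than 10^9 iterations for brevity, and 25 electrodes as for the auditory set; kObserved, the observed number of significant electrodes, is assumed given):

```matlab
% Simulate null experiments: draw null z-scores, convert to one-tailed
% p-values, apply the same BH FDR procedure, and count significant tests.
nExp  = 1e6;                                   % 1e9 in the actual analysis
nElec = 25; q = 0.05;
countSig = zeros(nExp, 1);
for e = 1:nExp
    z = randn(nElec, 1);                       % z-scores under the null
    p = 0.5 * erfc(z / sqrt(2));               % one-tailed (upper) p-values
    pSorted = sort(p)';
    kMax = find(pSorted <= (1:nElec)/nElec*q, 1, 'last');
    if ~isempty(kMax), countSig(e) = kMax; end
end
pObs = mean(countSig >= kObserved);            % chance of >= observed count
```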
Data and software availability
Please see Table S1 for a list of materials and software used in this study. Data and custom-made software are available upon request from Pierre Mégevand (pierre.megevand@unige.ch).
AUTHOR CONTRIBUTIONS
Conceptualization: PM, EZG, CES and ADM; Software: PM, MRM, DMG and NM; Formal Analysis: PM, MRM, DMG and NM; Investigation: PM and EZG; Writing – Original draft: PM; Writing – Review & Editing: PM, MSB, CES and ADM; Visualization: PM, MRM and DMG; Project Administration: PM, DMG and ADM; Funding Acquisition: PM, CES and ADM; Supervision: CES and ADM.
DECLARATION OF INTERESTS
The authors declare no competing interests.
ACKNOWLEDGEMENTS
We thank the patients for their participation; Erin Yeagle, Willie Walker Jr., the physicians, and other professionals of the Neurosurgery and Neurology departments of North Shore University Hospital for their assistance; Itzik Norman for help with brain surface reconstruction; Bahar Khalighinejad for help with stimulus reconstruction. Part of the computations for this work were performed at the University of Geneva on the Baobab cluster. This work was supported by the Swiss National Science Foundation (grants 139829, 148388 and 167836 to PM), the NINDS (NS098976 to CES, MSB and ADM) and the Page and Otto Marx Jr. Foundation to ADM.