Abstract
Online speech processing imposes significant computational demands on the listening brain. Predictive coding provides an elegant account of the way this challenge is met through the exploitation of prior knowledge. While such accounts have accrued considerable evidence at the sublexical and word levels, relatively little is known about the predictive mechanisms that support sentence-level processing. Here, we exploit the ‘pop-out’ phenomenon (i.e., the dramatic improvement in the intelligibility of degraded speech following prior information) to investigate the psychophysiological correlates of sentence comprehension. We recorded electroencephalography and pupillometry from 21 humans (10 females) while they rated the clarity of full sentences that had been degraded via noise-vocoding or sine-wave synthesis. Sentence pop-out was reliably elicited following visual presentation of the corresponding written sentence, even though participants never heard the undistorted speech. No such effect was observed when the written information was incongruent or absent. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that pop-out is mediated via top-down signals that enhance the precision of cortical speech representations. Spectral analysis revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta- and alpha-band power, as well as pupil diameter, were increased following the provision of any written information. We interpret these findings as evidence of a transition to a state of active listening, whereby participants selectively engage attentional and working memory processes to evaluate the congruence between expected and actual sensory input.
Significance Statement
Continuous speech processing depends on the integration of top-down expectations and bottom-up sensory inputs, the neurophysiological substrates of which remain poorly understood. Here, we investigate the neural correlates of auditory filling-in using full sentences and two complementary forms of speech degradation (noise-vocoded and sine-wave speech). The effect of prior expectation was assessed through the reconstruction of noisy stimuli from the electroencephalogram (EEG) using a multivariate model trained on a separate dataset in which participants listened to clear speech. Prior expectations were delivered in a different modality (visual) to focus our investigation on top-down processes. Our findings demonstrate how prior expectations from one modality can be flexibly transferred to another to recover the meaning of continuous speech from degraded stimuli.
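The stimulus-reconstruction approach mentioned above can be illustrated with a minimal sketch. The text does not specify the form of the multivariate model, so the code below assumes a common choice for this kind of analysis: a linear backward model that maps time-lagged EEG onto the speech envelope via ridge regression, trained on one dataset and evaluated on another. All function names, the lag range, the regularization value, and the synthetic data are hypothetical and serve only to show the train-on-one-set, test-on-another logic; they are not the authors' pipeline.

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-shifted copies of the EEG: (samples, channels * len(lags)).

    Assumes non-negative lags (in samples)."""
    n_samples = eeg.shape[0]
    columns = []
    for lag in lags:
        shifted = np.zeros_like(eeg)
        # Row t holds the EEG at time t + lag (the response follows the stimulus).
        shifted[:n_samples - lag] = eeg[lag:]
        columns.append(shifted)
    return np.hstack(columns)

def train_decoder(eeg, envelope, lags, ridge=1.0):
    """Fit ridge-regularized weights mapping lagged EEG to the envelope."""
    X = lag_matrix(eeg, lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)

def reconstruct(eeg, weights, lags):
    """Estimate the stimulus envelope from EEG with a trained decoder."""
    return lag_matrix(eeg, lags) @ weights

def simulate(rng, mixing, n_samples=2000, delay=3, noise_sd=0.1):
    """Toy data (assumption): EEG is a delayed, noisy mixture of the envelope."""
    smooth = np.convolve(rng.standard_normal(n_samples), np.ones(10) / 10, mode="same")
    envelope = np.abs(smooth)
    envelope -= envelope.mean()
    eeg = np.roll(envelope, delay)[:, None] * mixing[None, :]
    eeg += noise_sd * rng.standard_normal(eeg.shape)
    return eeg, envelope

rng = np.random.default_rng(0)
mixing = rng.standard_normal(8)      # hypothetical 8-channel forward mixing
lags = range(6)                      # hypothetical lag window of 0..5 samples

train_eeg, train_env = simulate(rng, mixing)   # stands in for the training dataset
test_eeg, test_env = simulate(rng, mixing)     # stands in for the held-out dataset
_, other_env = simulate(rng, mixing)           # unrelated envelope as a control

weights = train_decoder(train_eeg, train_env, lags)
estimate = reconstruct(test_eeg, weights, lags)

# Reconstruction accuracy: Pearson correlation between estimate and envelope.
r_match = np.corrcoef(estimate, test_env)[0, 1]
r_mismatch = np.corrcoef(estimate, other_env)[0, 1]
print(f"matched r = {r_match:.2f}, mismatched r = {r_mismatch:.2f}")
```

In a design like the one described, a reconstruction score of this kind would be computed per trial and compared across conditions; the mismatched envelope here is only a sanity check that the decoder tracks the presented stimulus rather than generic envelope statistics.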
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Conflict of interest statement: The authors declare no competing financial interests.