Abstract
In real-life multi-talker listening environments, the auditory system needs to isolate attended from distracting sound sources and to compensate for non-stationary acoustical conditions. How and at which stages of the central auditory pathway this is achieved is unclear. Here we used electroencephalography (EEG) to investigate the effect of a continuously varying signal-to-noise ratio (SNR) on the neural response to speech while listeners (N=18) attended to one of two simultaneously presented, spatially non-segregated talkers. We show that the differential impact of attentional set (i.e., which talker to attend to) and SNR (i.e., which talker is louder) on successive components of neural phase-locking reflects the unfolding of an SNR-invariant representation of the target talker in time and cortical topography. Using a forward encoding-model approach, we estimated neural responses to the temporal envelopes of the individual talkers and their respective modulation by both attentional set and SNR. The model response yielded a clear succession of P1–N1–P2-like components and attention detection accuracies of ~80% in sensor and source space. The earlier components were driven almost exclusively by SNR, whereas the latest, P2-like component reflected only attentional set. Under the most adverse SNR, the modeled response yielded an additional late component and enhanced low-frequency phase coherence to the ignored talker, indicating contributions of a fronto-parietal attention network to the suppression of irrelevant acoustic input. Modeling the neurocortical response can thus provide a comprehensive spatio-temporal view of how attentional filters for the successful suppression of distracting sensory information are implemented neurally.
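As a minimal illustration of such a forward encoding model (a sketch under assumed parameters, not the authors' exact pipeline), the following Python snippet fits a ridge-regularised temporal response function (TRF) that maps time-lagged copies of a talker's broad-band envelope onto each EEG channel; the lag window, regularisation strength, and helper names are illustrative assumptions.

```python
import numpy as np

def lagged_matrix(envelope, lags):
    """Design matrix of time-lagged copies of the speech envelope (one column per lag)."""
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:lag, j] = envelope[-lag:]
    return X

def fit_trf(envelope, eeg, fs, tmin=0.0, tmax=0.4, alpha=1e3):
    """Ridge-regularised forward model from envelope lags to each EEG channel.

    envelope : (n_times,) broad-band speech envelope
    eeg      : (n_times, n_channels) EEG recording
    fs       : sampling rate in Hz
    Returns the TRF weights, shape (n_lags, n_channels).
    """
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged_matrix(envelope, lags)
    # Closed-form ridge solution: (X'X + alpha*I)^-1 X'Y
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)
```

The rows of the returned weight matrix correspond to latencies within the chosen lag window, which is how component-like deflections (e.g., P1-, N1-, and P2-like responses) can be read out of the model response.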
Significance statement
Listening requires neural means of tracking an attended sound source (e.g., an attended talker) and of identifying and inhibiting the processing of concurrent, distracting sound sources. Here, we investigate the neural response in a highly distracting listening scenario by training forward encoding models, which are linear mappings from the broad-band envelope of the speech signal to the neural response recorded in the electroencephalogram. Over the initial 400 ms of the response to concurrent speech, the neural representation becomes gradually more biased towards the attended source and increasingly invariant to adverse acoustic conditions. These results fill a gap in our understanding of how auditory attentional filters are implemented neurally, that is, when and where attentional control succeeds at suppressing distracting sensory information.
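To make the logic of envelope-based attention decoding concrete, one plausible (assumed, simplified) scheme predicts the EEG from each talker's envelope using a fitted TRF and labels as attended the talker whose prediction correlates best with the recording; this sketch reuses lagged_matrix and a TRF fitted as in the snippet above, and all names and parameters are again assumptions.

```python
import numpy as np

def predict_eeg(envelope, trf, fs, tmin=0.0, tmax=0.4):
    """Predict multi-channel EEG from an envelope, given a fitted TRF."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    return lagged_matrix(envelope, lags) @ trf  # lagged_matrix as defined in the sketch above

def decode_attended_talker(eeg, candidate_envelopes, trf, fs):
    """Return the index of the candidate envelope whose TRF-based prediction
    correlates most strongly with the recorded EEG (mean over channels)."""
    scores = []
    for env in candidate_envelopes:
        pred = predict_eeg(env, trf, fs)
        r = [np.corrcoef(pred[:, ch], eeg[:, ch])[0, 1] for ch in range(eeg.shape[1])]
        scores.append(np.mean(r))
    return int(np.argmax(scores))
```

In practice such a mapping would be fitted and evaluated with cross-validation across trials; the snippet is only meant to illustrate how a forward model can be turned into a single-trial attention classifier.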