Abstract
Tracking the envelope of speech in the brain is important for speech comprehension. Recent research suggests that acoustic background noise can enhance neural speech tracking, enabling the auditory system to robustly encode speech even under unfavorable conditions. Aging and hearing loss are associated with internal, neural noise in the auditory system, which raises the question of whether additional acoustic background noise can enhance neural speech tracking in older adults. In the current electroencephalography study, younger (∼25.5 years) and older adults (∼68.5 years) listened to spoken stories either in quiet (clear) or in the presence of background noise at a wide range of different signal-to-noise ratios. In younger adults, neural speech tracking was enhanced by minimal background noise, indicating the presence of stochastic resonance, that is, the response facilitation through noise. In contrast, older adults, compared to younger adults, showed enhanced neural speech tracking for clear speech and speech masked by minimal background noise, but the acoustic noise led to little enhancement in neural tracking in older people. The data demonstrate different sensitivity of the auditory cortex to speech masked by noise between younger and older adults. The results are consistent with the idea that the auditory cortex of older people exhibits more internal, neural noise that enhances neural speech tracking – through stochastic resonance – but that additional acoustic noise does not further support speech encoding. The work points to a highly non-linear auditory system that differs between younger and older adults.
Significance statement
Acoustic background noise can enhance neural speech tracking in younger adults to facilitate robust speech encoding in unfavorable situations. Aging and hearing loss increase neural noise, potentially making the auditory system less sensitive to acoustic noise. Here, younger and older adults listened to spoken stories in quiet or background noise while electroencephalography was recorded. Neural speech tracking was larger for older than younger adults for speech in quiet and under minimal background noise. However, noise enhanced neural tracking only for younger, but not older adults. The results support the idea that the auditory cortex of older people exhibits more neural noise that enhances neural speech tracking through stochastic resonance, but that additional acoustic noise does not further amplify speech encoding.
Introduction
Many older adults live with some form of hearing loss (Feder et al., 2015; Goman and Lin, 2016) that leads to difficulties comprehending speech in the presence of background noise, such as in crowded places (Pichora-Fuller et al., 2016; Herrmann and Johnsrude, 2020). Understanding how speech in noisy situations is encoded in the brains of older people is critical for developing effective treatments for speech comprehension challenges.
Much research has focused on how the auditory cortex tracks the envelope of speech (Lalor and Foxe, 2010; Ding et al., 2014; Ding and Simon, 2014; Brodbeck and Simon, 2020), because accurate envelope encoding is thought to facilitate speech understanding (Rosen, 1992; Shannon et al., 1995; Ding et al., 2014; Vanthornhout et al., 2018; Lesenfants et al., 2019). However, recent work suggests non-linearities in how neural speech tracking is affected by different levels of background noise (Yasmin et al., 2023; Herrmann, 2024; Panela et al., 2024). Neural speech tracking exhibits an inverted u-shaped profile, where tracking is highest for moderate signal-to-noise ratios (SNRs) that are associated with intelligible, but challenging speech, whereas neural tracking decreases for more unfavorable SNRs (poor intelligibility) and more favorable SNRs (high intelligibility; Yasmin et al., 2023; Herrmann, 2024). Attentional effort required to understand speech at moderate SNRs has been suggested to lead to the neural-tracking enhancement (Hauswald et al., 2022; Yasmin et al., 2023; Panela et al., 2024), but recent work demonstrates little impact of attention on the inverted u-shape (Herrmann, 2024). Instead, it was suggested that noise per se increases neural speech tracking at moderate SNRs, and that tracking only decreases when speech intelligibility significantly declines for unfavorable SNRs (Herrmann, 2024). Stochastic resonance – the response facilitation through noise (McDonnell and Abbott, 2009; McDonnell and Ward, 2011; Krauss et al., 2016) – was proposed as the critical mechanism that leads to the neural tracking enhancement (Herrmann, 2024; Figure 1).
A: Schematic of an auditory microcircuit with normal and reduced inhibition (INH), regulating the spiking output of an excitatory neuron (EXC). Reduced inhibition in auditory cortex is associated with aging and hearing loss (Ouellet and de Villers-Sidani, 2014; Salvi et al., 2017; Herrmann and Butler, 2021), leading to less regulated – more noisy – spiking output. B: Visualization of stochastic resonance (McDonnell and Abbott, 2009; McDonnell and Ward, 2011; Krauss et al., 2016) through the simulation of the membrane potential of a single neuron driven by a periodic (5 Hz) stimulus input. An integrate-and-fire neuron model was used (Izhikevich, 2003, 2004). Periodic stimulation in the absence of noise (top left) does not elicit spikes (i.e., action potentials; bottom left). Periodic stimulation in the presence of background noise (top right) leads to spiking activity at the periodicity of the stimulus input (bottom right). This response facilitation through noise is referred to as stochastic resonance (McDonnell and Abbott, 2009; McDonnell and Ward, 2011; Krauss et al., 2016).
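The principle illustrated in panel B can be sketched with a few lines of Python. This is only an illustration under stated assumptions: it uses a plain leaky integrate-and-fire neuron (a simplification, not the Izhikevich model cited in the caption), noise is drawn independently at each time step, and all parameter values (`stim_amp`, `tau`, `threshold`, `noise_sd`) are hypothetical choices that keep the 5-Hz input alone below the firing threshold.

```python
import math
import random


def simulate_lif(noise_sd, seed=0, duration_s=2.0, dt=0.001,
                 stim_hz=5.0, stim_amp=0.8, tau=0.02, threshold=1.0):
    """Leaky integrate-and-fire neuron driven by a sinusoid plus Gaussian noise.

    All parameter values are illustrative assumptions, chosen so that the
    sinusoid alone never reaches the firing threshold.
    Returns the list of spike times in seconds.
    """
    rng = random.Random(seed)
    v = 0.0  # membrane potential (arbitrary units)
    spikes = []
    n_steps = int(round(duration_s / dt))
    for i in range(n_steps):
        t = i * dt
        drive = stim_amp * math.sin(2.0 * math.pi * stim_hz * t)
        noise = rng.gauss(0.0, noise_sd)
        # Euler step of dv/dt = (-v + drive + noise) / tau
        v += dt * (-v + drive + noise) / tau
        if v >= threshold:
            spikes.append(t)
            v = 0.0  # reset after a spike
    return spikes


quiet_spikes = simulate_lif(noise_sd=0.0)  # subthreshold input, no noise
noisy_spikes = simulate_lif(noise_sd=2.0)  # same input plus noise
print(len(quiet_spikes), len(noisy_spikes))
```

In this sketch, the periodic input alone produces no spikes, whereas adding noise pushes the membrane potential across threshold preferentially near the peaks of the input, so that spiking follows the stimulus periodicity.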
Aging and hearing loss are associated with a loss of neural inhibition and an increase in neural excitation in auditory cortex, resulting from reduced inputs to the neural pathway caused by peripheral damage (Caspary et al., 2008; Ouellet and de Villers-Sidani, 2014; Zhao et al., 2016; Resnik and Polley, 2017; Salvi et al., 2017; Herrmann and Butler, 2021; McClaskey, 2024). A loss of inhibition and increased excitation can manifest as hyperresponsivity to sound (Auerbach et al., 2014; Chambers et al., 2016; Salvi et al., 2017). Consistently, the neural tracking of the speech envelope is enhanced in older compared to younger adults (Presacco et al., 2016a, b; Brodbeck et al., 2018; Decruy et al., 2019; Broderick et al., 2021; Panela et al., 2024), highlighting the impact on the encoding of relevant features of speech. Reduced inhibition and increased excitation also increase spontaneous activity – and thus neural noise – in the absence of sound (Kaltenbach and Afman, 2000; Eggermont and Roberts, 2004; Eggermont, 2015; Parthasarathy et al., 2019). Neural noise – although difficult to observe directly in humans using non-invasive recording techniques, such as electroencephalography (EEG) – could drive the age-related enhancement of neural speech tracking through stochastic resonance (for discussions of the role of stochastic resonance in hearing loss see Krauss et al., 2016; Schilling et al., 2023). Some neurons may receive insufficient input to elicit a response when an individual listens to clear speech but may be pushed beyond their firing threshold by acoustically elicited neural noise (e.g., younger) or intrinsic neural noise (e.g., older) in the auditory system (Figure 1B). 
Critically, a neural system with increased internal, neural noise (e.g., older) may show lower sensitivity to external, acoustic noise than a system with lower neural noise (e.g., younger), because the neural distortions introduced through acoustic noise may not necessarily add further output amplification to a system that is already amplified through stochastic resonance. However, whether the auditory system of older adults enhances neural speech tracking through acoustic noise is unknown.
The current study uses EEG to investigate in younger and older adults how neural speech tracking is affected by background noise ranging from very high (i.e., intelligible) to low SNRs (i.e., less intelligible). Enhanced speech tracking for speech in quiet and a reduced sensitivity of neural speech tracking to background noise for older compared to younger adults would indicate that changes in the aged auditory system, such as increased neural noise, reduce stochastic resonance driven by external, acoustic noise.
Methods and materials
Participants
Twenty-six younger adults (median: 25.5 years; range: 18–34 years; 8 male or man, 16 female or woman, 1 transgender man, 1 non-binary) and 26 older adults (median: 68.5 years; range: 57–78 years; 9 male or man, 17 female or woman) participated in the current study. Participants were native English speakers or grew up in English-speaking countries (mostly Canada) and have been speaking English since early childhood (<5 years of age). Participants reported having normal hearing abilities and no neurological disease (one person reported having ADHD, but this did not affect their participation). Participants gave written informed consent prior to the experiment and were compensated for their participation. The study was conducted in accordance with the Declaration of Helsinki, the Canadian Tri-Council Policy Statement on Ethical Conduct for Research Involving Humans (TCPS2-2014), and was approved by the Research Ethics Board of the Rotman Research Institute at Baycrest Academy for Research and Education.
Acoustic environment and stimulus delivery
Data were gathered in a sound-attenuating booth to reduce external sound interference. Sounds were delivered using Sennheiser HD 25-SP II headphones connected via an RME Fireface 400 audio interface. The experiment was implemented using Psychtoolbox (version 3.0.14) running in MATLAB (MathWorks Inc.) on a Lenovo T480 laptop with Windows 7. Visual stimuli were projected into the booth via a mirrored display. Auditory stimuli were played at approximately 70 dB SPL.
Hearing assessment
Pure-tone audiometry was administered for each participant at frequencies of 0.25, 0.5, 1, 2, 3, 4, 6, and 8 kHz. Pure-tone average thresholds (PTA: average across 0.5, 1, 2, and 4 kHz; Stevens et al., 2013; Humes, 2019) were higher for older compared to younger adults (t50 = 6.893, p = 8.8 · 10-9, d = 1.912; Figure 2A). Elevated thresholds are consistent with the presence of mild-to-moderate hearing loss in the current sample of older adults, as would be expected (Moore, 2007; Plack, 2014; Presacco et al., 2016a; Herrmann et al., 2018, 2022). A few older adults of the current sample also appeared to have ‘clinical’ hearing loss as indicated by thresholds above 20 dB HL (Stevens et al., 2013; Humes, 2019), but none of them had been prescribed hearing aids. Although the main analyses focus on the originally intended comparisons of younger and older adults, in explorative analyses, data were analyzed separately for older adults with clinically normal hearing (PTA < 20 dB HL) and those with hearing impairment (PTA > 20 dB HL).
A: Left: Pure-tone audiometric thresholds for younger and older adults. Circles and thick lines reflect the mean across participants. Thin lines are the thresholds for each participant. Right: Pure-tone average threshold (PTA; across 0.5, 1, 2, and 4 kHz). Bars reflect the mean across participants. Error bars reflect the standard error of the mean. Dots reflect the PTAs for individual participants. B: Sound levels of the speech and background babble for each signal-to-noise ratio (SNR) used in the current study. Sound levels are provided in dB based on MATLAB calculations. More negative values reflect softer sound intensities. Values can be interpreted relative to each other, whereas the absolute magnitude is related to hardware and software conditions, such as sound card, transducers, and MATLAB internal settings. The colored dashed lines show the mean sensation level for a babble noise stimulus for both age groups. C: Sample spectrograms for the first 6 seconds of one story under different speech-clarity conditions (clear, 44.4, 27.6, and 10.8 dB SNR). Note that the magnitudes in panels B and C are not comparable.
To obtain a reference threshold in MATLAB software for speech and babble presentation during the main experimental procedures, the sensation level for a 12-talker babble noise was estimated using a method-of-limits procedure (Leek, 2011; Herrmann and Johnsrude, 2018; Herrmann et al., 2022). Participants listened to a 14-s babble noise that changed continuously in intensity at a rate of 5.4 dB/s (either decreased [i.e., starting at suprathreshold levels] or increased [i.e., starting at subthreshold levels]). Participants pressed a button when they could no longer hear the babble (intensity decrease) or when they started to hear the babble (intensity increase). The sound stopped after the button press. The sound intensity at the time of the button press was noted for 6 decreasing sounds and 6 increasing sounds (decreasing and increasing sounds alternated), and these were averaged to determine the sensation level. Due to technical issues, this threshold was only available for 21 younger and 25 older adults. As expected, given the audiometric pure-tone average thresholds (Figure 2A), sensation levels for the babble noise were elevated for older compared to younger adults (t44 = 2.573, p = 0.014, d = 0.762, mean difference: 5.2 dB; Figure 2B).
Story materials
Participants listened to 20 unique audio stories, each with a duration between 1.5 and 2.5 minutes. These stories were crafted using OpenAI’s GPT-3.5 (OpenAI et al., 2023), which also generated four comprehension questions per story, alongside four answer options (one correct, three distractors). The themes varied widely across stories, encompassing scenarios such as making an unexpected friendship on a plane, a boy finding a knitting talent, and a linguist deciphering ancient text. To ensure high quality of both the content and questions, the AI-generated materials underwent manual verification. Google’s AI-based text-to-speech synthesizer was employed to produce the auditory version of the stories, using the male English-speaking voice “en-US-Neural2-J” with default settings (https://cloud.google.com/text-to-speech/docs/voices).
Participants listened to the 20 stories in 5 blocks, each comprising 4 stories. Four of the 20 stories were played under clear conditions, that is, in quiet. Twelve-talker babble was added to the other 16 stories (Bilger, 1984; Bilger et al., 1984; Wilson et al., 2012). Twelve-talker babble simulates a crowded restaurant, but does not permit identifying individual words in the masker (Mattys et al., 2012). The babble masker was added at SNRs ranging from +57 to –6 dB in 16 steps of 4.2 dB SNR (Figure 2B, C). Speech in background babble above +15 dB SNR is highly intelligible (Holder et al., 2018; Rowland et al., 2018; Spyridakou et al., 2020; Irsik et al., 2022), and listeners had no trouble understanding speech at the highest SNRs used in the current study. All stimuli (clear speech; the mixed speech and babble signals) were normalized to the same root-mean-square amplitude and presented at about 70 dB SPL. Figure 2B shows that for high SNRs, the root-mean-square amplitude of the speech signal in the sound mixture remained relatively constant, because the babble at high SNRs had little impact on the root-mean-square amplitude of the sound mixture. Assignment of speech-clarity levels (clear speech and SNRs) to specific stories was randomized for each participant.
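The SNR spacing and the mixing step described above can be sketched as follows. This is a minimal illustration, not the stimulus-generation code used in the study: the function names and the value of `target_rms` are hypothetical, and signals are represented as plain lists of samples.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_levels(highest_db=57.0, lowest_db=-6.0, n_levels=16):
    """Equally spaced SNRs from highest to lowest (both ends included)."""
    step = (highest_db - lowest_db) / (n_levels - 1)  # 4.2 dB for the defaults
    return [round(highest_db - i * step, 1) for i in range(n_levels)]


def mix_at_snr(speech, babble, snr_db, target_rms=0.05):
    """Scale the babble to the requested SNR, add it to the speech, and
    normalize the mixture to a common RMS (target_rms is an assumed value)."""
    gain = rms(speech) / (rms(babble) * 10.0 ** (snr_db / 20.0))
    mixture = [s + gain * b for s, b in zip(speech, babble)]
    scale = target_rms / rms(mixture)
    return [scale * m for m in mixture]


print(snr_levels())
```

Because the mixture, rather than the speech alone, is normalized, the speech level within the mixture drops at low SNRs, whereas at high SNRs the babble contributes so little energy that the speech level is essentially unchanged.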
After each story, participants rated two statements regarding their speech comprehension using a 9-point scale (1 = strongly disagree, 9 = strongly agree): ‘I understood the gist of the story’ and ‘I was able to comprehend the speech well’. They were instructed to rate the statements independently from other stories they had heard. Ratings were linearly normalized to a 0 to 1 scale for statistical purposes, making them comparable to proportion-correct measures (Herrmann, 2024; Mathiesen et al., 2024; Panela et al., 2024). Such gist ratings have previously been shown to strongly correlate with word-report speech intelligibility measures (Davis and Johnsrude, 2003; Ritz et al., 2022). The ratings of the two statements were averaged to obtain one comprehension rating per story and participant. After rating the two statements, participants answered four multiple-choice questions about the content of the story. The comprehension questions offered four answer choices (chance level of 25%). The proportion of correct answers was calculated.
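For concreteness, the linear normalization of the 9-point ratings and the averaging of the two statements can be written as below. The function names are hypothetical and serve only to illustrate the arithmetic.

```python
def normalize_rating(rating, lowest=1, highest=9):
    """Map a rating on the lowest..highest scale linearly onto 0..1."""
    return (rating - lowest) / (highest - lowest)


def comprehension_rating(gist_rating, speech_rating):
    """Average the two normalized statement ratings for one story."""
    return (normalize_rating(gist_rating) + normalize_rating(speech_rating)) / 2.0


print(comprehension_rating(9, 5))  # 0.75
```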
Electroencephalography (EEG) acquisition and preprocessing
A BioSemi system (BioSemi, Netherlands) was used to record electroencephalographic data from 16 Ag/AgCl electrodes (10-20 system) and two additional electrodes, one positioned on the left and one on the right mastoid. Data were recorded at a 1024 Hz sampling rate and with a 208 Hz online low-pass filter. Reference electrodes were part of the BioSemi CMS-DRL (common mode sense-driven right leg) system for optimal referencing and noise reduction.
Offline processing was performed in MATLAB. A 60-Hz elliptic notch filter was used to reduce power-line noise. EEG signals were re-referenced to the average of the left and right mastoid electrodes, which enhances auditory responses at fronto-central electrodes (Ruhnau et al., 2012; Herrmann, 2024). EEG data were high-pass filtered at 0.7 Hz (length: 2449 samples, Hann window) and low-pass filtered at 22 Hz (length: 211 samples, Kaiser window, window parameter 4). The data were time-locked to the onset of each story, downsampled to 512 Hz, and subjected to an Independent Component Analysis (ICA) to remove blink and eye movement artifacts (Bell and Sejnowski, 1995; Makeig et al., 1995; Oostenveld et al., 2011). Signal segments showing fluctuations greater than 80 µV within a 0.2-second window in any EEG channel were set to 0 µV to remove artifacts not removed by the ICA (cf. Dmochowski et al., 2012; Dmochowski et al., 2014; Cohen and Parra, 2016; Irsik et al., 2022; Yasmin et al., 2023; Panela et al., 2024). Finally, EEG data were further low-pass filtered at 10 Hz (251 points, Kaiser window) because neural signals in the low-frequency range are most sensitive to the speech envelope (Luo and Poeppel, 2007; Di Liberto et al., 2015; Zuk et al., 2021; Karunathilake et al., 2023; Synigal et al., 2023; Yasmin et al., 2023).
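The amplitude-based artifact step can be sketched as follows. The text does not specify how windows were stepped or whether all channels were zeroed, so this sketch makes two assumptions: windows slide sample by sample, and a window flagged in any channel is set to zero in all channels. The function name and the list-of-lists data layout are hypothetical.

```python
def zero_artifacts(eeg, fs=512, window_s=0.2, threshold_uv=80.0):
    """eeg: list of channels, each a list of samples in microvolts.

    Assumption: a window is flagged when the peak-to-peak range of any
    channel exceeds the threshold, and flagged samples are zeroed in all
    channels. Returns a cleaned copy; the input is left unchanged.
    """
    n_samples = len(eeg[0])
    win = max(1, int(round(window_s * fs)))
    bad = [False] * n_samples
    for start in range(0, max(1, n_samples - win + 1)):
        stop = min(start + win, n_samples)
        for channel in eeg:
            segment = channel[start:stop]
            if max(segment) - min(segment) > threshold_uv:
                for i in range(start, stop):
                    bad[i] = True
                break  # one offending channel is enough to flag the window
    return [[0.0 if bad[i] else channel[i] for i in range(n_samples)]
            for channel in eeg]
```

Setting samples to zero rather than discarding them keeps the EEG aligned in time with the stimulus envelope, which matters for the lag-based analyses that follow.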
Calculation of amplitude-onset envelopes
Each story’s audio signal (devoid of background noise) was processed through a basic auditory model, which included 30 cochlear-like auditory filters and cochlear compression by a factor of 0.6 (McDermott and Simoncelli, 2011). The resulting 30 envelopes were averaged and smoothed with a 40-Hz low-pass filter (Butterworth, 4th order). Such a computationally simple peripheral model has been shown to be sufficient, as compared to complex, more realistic models, for envelope-tracking approaches (Biesmans et al., 2017). The amplitude-onset envelope was computed since it elicits strong neural speech tracking (Hertrich et al., 2012) and was used in the previous studies in younger adults that showed noise-related enhancements in neural speech tracking (Yasmin et al., 2023; Herrmann, 2024; Panela et al., 2024). The amplitude-onset envelope was obtained by calculating the first derivative of the averaged amplitude envelope and subsequently setting negative values to zero (Hertrich et al., 2012; Fiedler et al., 2017; Daube et al., 2019; Fiedler et al., 2019; Yasmin et al., 2023; Panela et al., 2024). It was then downsampled to match the EEG data’s temporal resolution and transformed to z-scores (subtraction by the mean and division by the standard deviation).
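The final steps of this computation (first derivative, half-wave rectification, z-scoring) can be sketched as below. The sketch starts from an already averaged amplitude envelope and omits the auditory filterbank, compression, smoothing, and downsampling; the zero padding of the first sample and the use of the sample standard deviation are assumptions.

```python
import statistics


def amplitude_onset_envelope(envelope):
    """First derivative of an amplitude envelope, with negative values set
    to zero, transformed to z-scores."""
    diff = [b - a for a, b in zip(envelope, envelope[1:])]
    diff.insert(0, 0.0)  # assumed padding so the output keeps the input length
    onset = [max(d, 0.0) for d in diff]  # keep only increases in amplitude
    mean = statistics.fmean(onset)
    sd = statistics.stdev(onset)  # sample standard deviation (assumption)
    return [(o - mean) / sd for o in onset]


print(amplitude_onset_envelope([0.0, 1.0, 3.0, 2.0, 2.0, 4.0]))
```

Samples at which the envelope falls or stays flat all map onto the same minimum value, so the resulting signal emphasizes acoustic onsets only.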
EEG analysis: Temporal response function and prediction accuracy
The relationship between EEG signals and auditory stimuli was assessed using a linear temporal response function (TRF) model (Crosse et al., 2016; Crosse et al., 2021). Ridge regression with a regularization parameter of λ = 10 was applied based on previous work (Fiedler et al., 2017; Fiedler et al., 2019; Yasmin et al., 2023; Panela et al., 2024). Pre-selection of λ based on previous work avoids extremely low and high λ on some cross-validation iterations and avoids substantially longer computational time. Pre-selection of λ also avoids issues if limited data per condition are available, as in the current study (Crosse et al., 2021).
For each story, 50 random 25-second segments of the EEG data were extracted and paired with corresponding segments of the amplitude-onset envelope. A leave-one-out cross-validation approach was employed, with one segment reserved for testing and the remaining segments that did not overlap with the test segment used to train the TRF model for lags ranging from 0 to 0.4 s. The model’s performance was evaluated by correlating the predicted EEG signals with the actual EEG in the test segment, and this procedure was repeated across all 50 segments to derive the mean prediction accuracy. Overlapping segments were used to increase the amount of data for training given the short duration of the stories (Herrmann, 2024). Critically, speech-clarity levels were randomized across stories and analyses were the same for all conditions. Hence, no impact of overlapping training data on the results is expected (consistent with noise-related enhancements observed previously when longer stories and non-overlapping data were used; Yasmin et al., 2023).
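The logic of the forward model and the cross-validation can be sketched in plain Python. This is a simplified illustration, not the toolbox implementation used in the study: it handles a single EEG channel, uses a handful of sample lags, assumes that the supplied segments do not overlap, and applies a textbook ridge solution whose λ is not necessarily on the same scale as the λ reported above. All function names are hypothetical.

```python
import math


def lagged_design(stimulus, n_lags):
    """Rows hold the stimulus at lags 0..n_lags-1 samples (zero-padded at the start)."""
    return [[stimulus[t - k] if t - k >= 0 else 0.0 for k in range(n_lags)]
            for t in range(len(stimulus))]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trf(segments, n_lags, lam):
    """Ridge solution w = (X'X + lam*I)^-1 X'y pooled over (stimulus, eeg) segments."""
    xtx = [[0.0] * n_lags for _ in range(n_lags)]
    xty = [0.0] * n_lags
    for stim, eeg in segments:
        for row, y in zip(lagged_design(stim, n_lags), eeg):
            for i in range(n_lags):
                xty[i] += row[i] * y
                for j in range(n_lags):
                    xtx[i][j] += row[i] * row[j]
    for i in range(n_lags):
        xtx[i][i] += lam
    return solve(xtx, xty)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def loo_prediction_accuracy(segments, n_lags, lam):
    """Hold out each segment once, train on the rest, and average the
    correlations between predicted and actual EEG."""
    scores = []
    for i, (stim, eeg) in enumerate(segments):
        train = segments[:i] + segments[i + 1:]
        w = fit_trf(train, n_lags, lam)
        pred = [sum(wk * xk for wk, xk in zip(w, row))
                for row in lagged_design(stim, n_lags)]
        scores.append(pearson(pred, eeg))
    return sum(scores) / len(scores)
```

With randomly drawn, overlapping segments as in the current study, the training set for each fold would additionally need to exclude segments that overlap in time with the held-out segment.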
To investigate the neural-tracking response amplitude, TRFs for each training dataset were calculated for lags ranging from -0.15 to 0.5 s. Baseline correction was performed by subtracting the mean signal from -0.15 to 0 seconds from the TRF data at each time point. Analysis concentrated on the fronto-central electrodes (F3, Fz, F4, C3, Cz, C4), which are known to reflect auditory cortical activity (Näätänen and Picton, 1987; Picton et al., 2003; Herrmann et al., 2018; Irsik et al., 2021). Key metrics were the P1-N1 and P2-N1 amplitude differences. To this end, the P1, N1, and P2 latencies were estimated for each SNR from the averaged time courses across participants (separately for each group). P1, N1, and P2 amplitudes were calculated for each participant and condition as the mean amplitude in the 0.02 s time window centered on the peak latency. The P1-minus-N1 and P2-minus-N1 amplitude differences were calculated. The amplitude of individual TRF components (P1, N1, P2) was not analyzed because the TRF time courses for the clear condition had an overall positive shift (see also Herrmann, 2024; Panela et al., 2024) that could bias analyses more favorably towards response differences which may, however, be harder to interpret. The P1-N1 amplitude is of particular interest in the current study, because this early auditory response has previously shown the response amplification through background noise (Yasmin et al., 2023; Herrmann, 2024; Panela et al., 2024).
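The amplitude measures can be sketched as follows. The function names are hypothetical, and the peak latencies would in practice be taken from the group-averaged time courses as described above. The sketch also illustrates why difference measures are attractive here: a constant offset in the TRF cancels out in the P1-minus-N1 difference.

```python
def mean_in_window(trf, times, center_s, width_s=0.02):
    """Mean TRF amplitude in a window of width_s centered on center_s."""
    half = width_s / 2.0
    values = [v for v, t in zip(trf, times)
              if center_s - half <= t <= center_s + half]
    return sum(values) / len(values)


def baseline_correct(trf, times, start_s=-0.15, stop_s=0.0):
    """Subtract the mean of the pre-zero-lag interval from every time point."""
    base = [v for v, t in zip(trf, times) if start_s <= t < stop_s]
    offset = sum(base) / len(base)
    return [v - offset for v in trf]


def peak_to_peak(trf, times, pos_latency_s, neg_latency_s):
    """Positive-minus-negative component difference, e.g. P1-minus-N1."""
    return (mean_in_window(trf, times, pos_latency_s)
            - mean_in_window(trf, times, neg_latency_s))
```

For example, `peak_to_peak(trf, times, p1_latency, n1_latency)` returns the same value whether or not a constant is added to `trf`, whereas the individual P1 and N1 amplitudes would shift with it.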
Statistical analyses
Behavioral data (comprehension accuracy, comprehension ratings), TRFs, and EEG prediction accuracy for the four clear stories were averaged. For the stories in babble, a sliding average across SNR levels was calculated for behavioral data, TRFs, and EEG prediction accuracy, such that data for three neighboring SNR levels were averaged to reduce noise in the data.
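A three-point sliding average of this kind can be sketched as below. How the lowest and highest SNR levels were handled is not specified in the text; the sketch assumes that only complete triplets are averaged, so that 16 SNR levels yield 14 averaged values.

```python
def sliding_average(values, width=3):
    """Average each run of `width` neighboring values (no edge padding, an assumption)."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


print(sliding_average([1, 2, 3, 4, 5]))  # [2.0, 3.0, 4.0]
```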
For the statistical analyses of behavioral data (comprehension accuracy, comprehension ratings), P1-N1 amplitude, P2-N1 amplitude, and EEG prediction accuracy, the clear condition was compared to each SNR level (resulting from the sliding average) using a paired-samples t-test. False discovery rate (FDR) was used to account for multiple comparisons (Benjamini and Hochberg, 1995; Genovese et al., 2002). Age groups were compared at each SNR level individually using an independent-samples t-test and FDR thresholding. To investigate the overall effect of background babble and interaction with group, a repeated-measures analysis of variance (rmANOVA) was calculated with the within-participants factor Speech Clarity (clear, babble [averaged across SNR levels]) and the between-participants factor Group (younger, older). In addition, analyses also explored neural responses for the older adult group split into those with clinically normal hearing (N = 15; PTA < 20 dB HL) and those with hearing impairment (N = 11; PTA > 20 dB HL).
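The FDR thresholding across SNR levels follows the Benjamini-Hochberg step-up procedure, which can be sketched as below. The function name and the level `q = 0.05` are assumptions made for illustration; the p-values in the usage example are made up.

```python
def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg procedure: returns a list of booleans (True = significant)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= q * k / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    # All tests up to and including that rank are declared significant
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


print(fdr_bh([0.02, 0.03, 0.035, 0.2]))  # [True, True, True, False]
```

The example shows the step-up property: the two smallest p-values would not pass their own rank-specific thresholds, but they are declared significant because the third-ranked p-value passes its threshold.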
All statistical analyses were carried out using MATLAB (MathWorks) and JASP software (JASP, 2024; version 0.19.1). Note that for post hoc tests of an rmANOVA, JASP uses the rmANOVA degrees of freedom. The reported degrees of freedom may thus be higher than for direct contrasts had they been calculated independently from the rmANOVA.
Results
Older adults show reduced noise-related enhancements of neural speech tracking
For both younger and older adults, story comprehension accuracy decreased for the most difficult SNRs relative to clear speech, but there were no differences between age groups (FDR-thresholded; Figure 3A). Ratings of speech comprehension/gist understanding decreased for SNRs below +10 dB compared to clear speech, for both age groups. Older adults also rated speech comprehension/gist understanding higher than younger adults for SNRs between +6.6 and +40.2 dB, which may be related to the known higher subjective ratings of hearing abilities relative to objective hearing abilities in older compared to younger adults (Helfer et al., 2017; Helfer and Jesse, 2021).
A: Story comprehension accuracy. B: Ratings of speech comprehension and gist understanding. The colored, horizontal lines close to the x-axis reflect a significant difference between clear speech and the different SNRs (FDR-thresholded). The two-colored (dashed) line reflects a significant difference between age groups (FDR-thresholded).
Figures 4A and 4B show the temporal response functions and topographical distributions. Figure 4C displays P1-N1 amplitudes as they relate to speech-clarity conditions. For younger adults, P1-N1 amplitudes increased with decreasing SNR relative to clear speech, up to about +10 dB SNR, whereas amplitudes decreased for yet lower SNRs (Figure 4C, left). For older adults, the increase in P1-N1 amplitudes associated with background babble was only significant around +10 dB SNR, with amplitudes decreasing for lower SNRs (Figure 4C, left). In fact, P1-N1 amplitudes were greater for older compared to younger adults only for clear speech and for speech at high SNRs (i.e., >28 dB), because the auditory cortex of older adults showed a reduced sensitivity to background babble (Figure 4C, left). This reduced noise-sensitivity is also evidenced by the rmANOVA for the P1-N1 amplitude (clear speech vs speech in babble [collapsed across SNRs]). The Speech Clarity × Group interaction (F1,50 = 15.714, p = 2.3 · 10-4, ω2 = 0.025) showed that the P1-N1 amplitude for younger adults was greater for speech in babble than for clear speech (t25 = 6.893, pHolm = 2.1 · 10-5, d = 0.611), whereas this was not the case for older adults (t25 = 0.392, pHolm = 0.697, d = 0.046; Figure 4C, right; effect of Speech Clarity: F1,50 = 11.628, p = 0.001, ω2 = 0.018; effect of Group: F1,50 = 16.207, p = 1.9 · 10-4, ω2 = 0.13).
A: Temporal response functions for different speech-clarity conditions and for younger and older adults. B: Topographies for P1-N1 and P2-N1 TRF amplitudes. C: Left: P1-N1 amplitudes for different speech-clarity conditions and both age groups. The colored, horizontal lines close to the x-axis reflect a significant difference between clear speech and the SNR conditions (FDR-thresholded). The two-colored (dashed) line reflects a significant difference between age groups (FDR-thresholded). Right: P1-N1 TRF amplitude for clear speech and the mean across SNR conditions (babble). D: Same as in panel C for the P2-N1 TRF amplitudes.
Figure 4D displays the relation between speech-clarity conditions and P2-N1 amplitudes. For both younger and older adults, the P2-N1 amplitudes were smaller for SNRs around 0 dB and below compared to clear speech, but older adults showed overall larger P2-N1 amplitudes for all speech-clarity conditions. This is also shown by the rmANOVA for the P2-N1 amplitude, revealing smaller amplitudes for speech in babble than clear speech (F1,50 = 13.780, p = 5.2 · 10-4, ω2 = 0.030) and larger amplitudes for older compared to younger adults (F1,50 = 26.698, p = 4.2 · 10-6, ω2 = 0.201). The interaction was not significant (F1,50 = 1.540, p = 0.220, ω2 = 0.001; Figure 4D).
Figure 5 shows the relation between speech-clarity conditions and EEG prediction accuracy. Prediction accuracy decreased with decreasing SNR relative to clear speech (Figure 5, left). This was also reflected in the rmANOVA, revealing smaller EEG prediction accuracies for speech in babble than clear speech (F1,50 = 16.881, p = 1.5 · 10-4, ω2 = 0.046) and for younger compared to older adults (F1,50 = 4.832, p = 0.033, ω2 = 0.036). The interaction was not significant (F1,50 = 0.064, p = 0.801, ω2 < 0.001; Figure 5, right).
Left: EEG prediction accuracy for each speech-clarity condition and age group. The colored, horizontal lines close to the x-axis reflect a significant difference between clear speech and the SNR conditions (FDR-thresholded). There was no significant difference between age groups for individual speech-clarity conditions (FDR-thresholded). Right: EEG prediction accuracy for clear speech and the mean across SNR conditions (babble).
Comparing older adults with clinically ‘normal’ hearing to those with hearing impairment
Audiograms for younger adults, older adults with clinically ‘normal’ hearing, and older adults with hearing impairment are shown in Figure 6A. Despite the group separation, the older adult group with clinically ‘normal’ hearing still had greater pure-tone average thresholds compared to younger adults (t39 = 5.536, p = 2.3 · 10-6, d = 1.795), revealing subclinical hearing impairments that are common among older individuals (Dubno et al., 2013; Plack, 2014; Helfer and Jesse, 2021).
A: Audiograms (left) and pure-tone average thresholds for younger adults, older adults with ‘normal’ hearing (NH), and older adults with clinical hearing impairment (HI). B: Left: P1-N1 TRF amplitudes for different speech-clarity conditions and groups. The colored, horizontal lines close to the x-axis reflect a significant difference between clear speech and the SNR conditions (FDR-thresholded). The two-colored (dashed) lines reflect a significant difference between groups (FDR-thresholded). Right: P1-N1 TRF amplitude for clear speech and the mean across SNR conditions (babble). C: Same as in panel B for the P2-N1 TRF amplitudes.
For neither of the two groups of older adults did the P1-N1 amplitudes show much sensitivity to background babble relative to clear speech, with the exception of SNRs around +10 dB for older adults with hearing impairment (Figure 6B, left). Critically, for clear speech and speech at high SNRs, the P1-N1 amplitude was larger in both older adult groups compared to younger adults, although the difference was significant for more SNRs for older adults with hearing impairment (FDR-thresholded; Figure 6B). This is also reflected in the rmANOVA for the P1-N1 amplitude, showing a larger amplitude for both groups of older adults compared to younger adults (normal hearing: t49 = 2.844; pHolm = 0.013; d = 0.877; hearing impairment: t49 = 3.95; pHolm = 7.5 · 10-4; d = 1.352), whereas no difference between the two older adult groups was found (t49 = 1.256; pHolm = 0.215; d = 0.474; main effect of Group: F2,49 = 9.134, p = 4.3 · 10-4, ω2 = 0.098). Group interacted with Speech Clarity (F2,49 = 7.963, p = 0.001, ω2 = 0.024; Figure 6B, right): Younger adults showed a larger P1-N1 amplitude when speech was masked by babble compared to clear speech (t49 = 5.083; pHolm = 8.7 · 10-5; d = 0.614), whereas this was not significant in older adults with normal hearing (t49 = 0.009; pHolm = 1; d = 0.001) nor in older adults with hearing impairments (t49 = 0.776; pHolm = 1; d = 0.144).
A decrease in P2-N1 amplitudes with SNR, particularly for very low SNRs, relative to clear speech was observed for younger adults and older adults with hearing impairment (FDR-thresholded; Figure 6C left; this was significant for older adults without hearing impairment for uncorrected p-values). Moreover, P2-N1 amplitudes were larger for both older adult groups compared to younger adults for all SNRs (FDR-thresholded; Figure 6C left). The rmANOVA further corroborated this, revealing larger P2-N1 amplitudes for older adults with hearing impairment relative to those with normal hearing (t49 = 2.62; pHolm = 0.012; d = 0.967) and younger adults (t49 = 5.921; pHolm = 9.3 · 10⁻⁷; d = 1.98), and larger amplitudes for older adults with normal hearing relative to younger adults (t49 = 3.36; pHolm = 0.003; d = 1.013; main effect of Group: F2,49 = 18.640, p = 9.5 · 10⁻⁷, ω² = 0.190). P2-N1 amplitudes were lower for speech in babble than clear speech (F1,49 = 17.527, p = 1.2 · 10⁻⁴, ω² = 0.043), but there was no interaction (F2,49 = 1.643, p = 0.204, ω² = 0.003; Figure 5C, right).
For EEG prediction accuracy, an SNR-related decrease relative to clear speech was observed for younger adults and older adults with hearing impairment (FDR-thresholded; Figure 7 left). Prediction accuracy was greater for older adults with hearing impairment relative to younger adults for most speech-clarity conditions (FDR-thresholded; Figure 7 left). The rmANOVA for EEG prediction accuracy showed smaller accuracies for speech in babble than clear speech (F1,49 = 15.474, p = 2.6 · 10⁻⁴, ω² = 0.046). Prediction accuracy was greater for older adults with hearing impairment than younger adults (t49 = 3.307; pHolm = 0.005; d = 1.085) and older adults with ‘normal’ hearing (t49 = 2.408; pHolm = 0.040; d = 0.872), whereas there was no difference between the two latter groups (t49 = 0.721; pHolm = 0.474; d = 0.213; effect of Group: F2,49 = 5.546, p = 0.007, ω² = 0.057; Figure 7 right).
Left: EEG prediction accuracy for each speech-clarity condition and age group. The colored, horizontal lines close to the x-axis reflect a significant difference between clear speech and the SNR conditions (FDR-thresholded). The two-colored (dashed) line reflects a significant difference between groups (FDR-thresholded). Right: EEG prediction accuracy for clear speech and the mean across SNR conditions (babble). NH – normal hearing; HI – hearing impairment.
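The pairwise group comparisons above are reported with Holm-corrected p-values (pHolm). As a minimal sketch of how a standard Holm step-down adjustment works, the following Python function adjusts a set of raw p-values. The function name and the example p-values are hypothetical and chosen only for illustration; they are not the raw p-values of the current study, and the statistics software used for the reported analyses may differ in implementation details.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never falls below
        # the adjusted value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise group comparisons.
example_raw = [0.01, 0.04, 0.03]
print(holm_adjust(example_raw))
```

With three comparisons, the smallest raw p-value is tripled, the second smallest is doubled, and the largest is left unchanged unless monotonicity requires raising it, which is why a corrected value can reach 1.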
Discussion
The current study investigated the extent to which the auditory cortex of older adults shows noise-related enhancements in neural speech tracking. Younger and older adults listened to spoken stories either in quiet (clear) or in the presence of background noise. In younger adults, neural speech tracking, as evidenced by the P1-N1 amplitude of the temporal response functions, was enhanced when speech was presented in minimal background noise. Although neural tracking of speech in quiet and speech masked by minimal background noise was enhanced for older adults compared to younger adults, older adults showed little evidence of noise-related enhancements. The data demonstrate different sensitivity of the auditory cortex between younger and older people to speech masked by acoustic background noise. The data are consistent with the idea that internal, neural noise enhances neural speech tracking in older adults – possibly through stochastic resonance – that reduces the impact of additional external, acoustic noise.
Noise-related enhancement of neural speech tracking
The current study shows that, for younger adults, minimal background noise increases the neural tracking of the amplitude-onset envelope of speech compared to speech presented in quiet (P1-N1 amplitude; Figure 4). This is remarkable, given that the background babble overlaps spectrally with the speech, but a noise-related enhancement has been shown recently in a few other works for speech and simpler sounds (Alain et al., 2009; Ward et al., 2010; Parbery-Clark et al., 2011; Alain et al., 2012; Alain et al., 2014; Shukla and Bidelman, 2021; Herrmann, 2024; Panela et al., 2024).
Minor background noise appears to be sufficient to enhance neural tracking, because the enhancement is present for very high SNRs (>40 dB; Figure 4) for which speech is as intelligible as speech in quiet (Holder et al., 2018; Rowland et al., 2018; Spyridakou et al., 2020; Irsik et al., 2022; Figures 3 and 4). Neural speech tracking decreases only for low SNRs for which speech comprehension declines (Figures 3 and 4). Critically, a recent study shows that the noise-related enhancement in speech tracking is present even when participants attend to a demanding visual task rather than to the speech (Herrmann, 2024), making it unlikely that attentional effort explains the enhancement, especially at high SNRs (Rowland et al., 2018). Moreover, the enhancement was present only for early sensory responses (P1-N1), but not for later responses (P2-N1) and EEG prediction accuracy (which integrates responses over time), consistent with a sensory-driven nature of the enhancement.
A few neural speech tracking studies used noise that matched the spectrum of speech as a background masker but did not find a noise-related enhancement (Ding and Simon, 2013; Zou et al., 2019; Synigal et al., 2023). However, these studies used low SNRs (<10 dB), for which speech is less intelligible and neural tracking is reduced (Figure 4). Moreover, babble noise appears to enhance the neural tracking more than speech-matched noise (Herrmann, 2024), potentially because the babble facilitates neural activity in the same speech-relevant auditory regions that are recruited by the speech. This points to some specificity of the spectral noise properties in facilitating the amplification (cf. Krauss and Tziridis, 2021).
Stochastic resonance – that is, the response facilitation of a non-linear system through noise (Stocks, 2000; Ward et al., 2002; Moss et al., 2004; Stein et al., 2005; McDonnell and Abbott, 2009; McDonnell and Ward, 2011; Krauss et al., 2016; Schilling et al., 2023; Figure 1) – has been suggested to underlie the enhancement of neural speech tracking in the presence of background noise (Herrmann, 2024). Observing neural speech tracking with scalp EEG requires the synchronized activity of more than 10,000 neurons (Niedermeyer and da Silva, 2005; da Silva, 2010). Some neurons may not have been driven enough to elicit a response for speech in quiet, whereas the presence of noise – through stochastic resonance – may have pushed them beyond their firing threshold (Figure 1). Stochastic resonance may help individuals to hear robustly even when background noise is present.
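The threshold account of stochastic resonance described above can be illustrated with a toy simulation. The sketch below is not the analysis used in the current study: it assumes a single hypothetical threshold unit that receives a subthreshold sinusoidal input plus Gaussian noise and responds only when the sum exceeds a firing threshold. All names and parameter values (amplitude, threshold, noise levels) are arbitrary assumptions. "Tracking" is quantified as the Pearson correlation between the input and the binary output.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def tracking_strength(noise_sd, n_samples=5000, amplitude=0.8,
                      threshold=1.0, seed=0):
    """Correlation between a subthreshold input and a thresholded output.

    Hypothetical toy model: the input never reaches the threshold on its
    own (amplitude < threshold), so any response requires added noise.
    """
    rng = random.Random(seed)
    # Slow sinusoid (200 samples per cycle) standing in for an envelope.
    signal = [amplitude * math.sin(2 * math.pi * t / 200)
              for t in range(n_samples)]
    # The unit "fires" (1.0) only when input plus noise crosses threshold.
    output = [1.0 if s + rng.gauss(0.0, noise_sd) > threshold else 0.0
              for s in signal]
    return pearson(signal, output)


# Tracking is absent without noise, rises for moderate noise, and
# declines again when the noise dominates the input.
for sd in [0.0, 0.1, 0.3, 0.6, 1.5, 5.0]:
    print(f"noise sd = {sd:>4}: tracking = {tracking_strength(sd):.3f}")
```

In this toy model, the output carries no information about the input without noise, because the threshold is never crossed. Moderate noise lets the input peaks cross the threshold more often than the troughs, whereas very strong noise triggers responses almost regardless of the input.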
Age-related enhancement in neural speech tracking
Neural speech tracking was enhanced in older compared to younger adults for all metrics (P1-N1, P2-N1, EEG prediction accuracy), especially for speech in quiet (clear) and speech masked by minimal background noise. An age-related enhancement in neural speech tracking is consistent with previous work (Presacco et al., 2016a, b; Brodbeck et al., 2018; Broderick et al., 2021; Panela et al., 2024) and with work showing larger neural responses to tones and noises for older adults (Sörös et al., 2009; Alain et al., 2014; Bidelman et al., 2014; Stothart and Kazanina, 2016; Irsik et al., 2021; Herrmann et al., 2023). Such hyperactivity is thought to result from a loss of inhibition and an increase in excitation in the auditory pathway due to reduced peripheral inputs associated with aging and hearing loss (Caspary et al., 2008; Caspary et al., 2013; Ouellet and de Villers-Sidani, 2014; Zhao et al., 2016; Resnik and Polley, 2017; Salvi et al., 2017; Herrmann and Butler, 2021; McClaskey, 2024).
The age-related enhancement of the P1-N1 amplitude appears mostly to be due to aging (Figure 6B), whereas the P2-N1 amplitude and the EEG prediction accuracy seem to be also or exclusively driven by hearing loss (Figures 6C and 7). Previous studies have shown mixed results regarding the effects of aging versus hearing loss on neural speech tracking. Some studies have found that neural speech tracking is greater for older adults with hearing loss compared to those without (Decruy et al., 2020; Fuglsang et al., 2020; Gillis et al., 2022; Schmitt et al., 2022), whereas other studies point to age-related enhancements per se (Presacco et al., 2019). Counterintuitively, the current data suggest that earlier, sensory responses (P1-N1) are less affected by hearing loss than later responses (P2-N1). However, distinguishing between the impacts of hearing loss versus aging per se may not be possible (Humes et al., 2012). Even minor hearing loss or peripheral damage that is less-well detectable with pure-tone audiometry, such as damage to synapses connecting to auditory nerve fibers (Kujawa and Liberman, 2009; Bharadwaj et al., 2014; Liberman and Kujawa, 2017), can lead to hyperactivity in the auditory system (Qiu et al., 2000; Munguia et al., 2013; Resnik and Polley, 2017; Salvi et al., 2017). The current enhancements for older adults with clinically ‘normal’ hearing compared to younger adults may thus still be related to differences in hearing abilities (i.e., audiometric thresholds were elevated even in normal-hearing older adults; Figure 6A). Speculatively, minor hearing loss is sufficient to enhance early sensory responses, which then do not amplify further with worsening hearing abilities.
Age-related differences in sensitivity of neural speech tracking to background noise
The main purpose of the current study was to investigate whether auditory cortex of older adults shows enhancements in speech tracking due to acoustic background noise. However, there was little evidence that neural speech tracking in older adults with or without hearing loss is enhanced by acoustic background noise (Figures 4C and 6B). There was only a minor increase in the P1-N1 amplitude around +10 dB SNR, but speech comprehension for this and lower SNRs is more difficult for listeners (Irsik et al., 2022; Herrmann, 2023) and the increase could thus be due to attentional effort (Pichora-Fuller et al., 2016; Hauswald et al., 2022; Yasmin et al., 2023). Neural speech tracking decreased for both younger and older adults for lower SNRs, for which speech comprehension decreased as well; this is consistent with previous work (Ding and Simon, 2013; Zou et al., 2019; Yasmin et al., 2023).
A loss of neural inhibition and an increase in neural excitation due to aging and hearing loss can lead to a gain or amplification of a neuron’s output, because under such circumstances a smaller input to the neuron is already sufficient for it to fire compared to a less excitable and inhibited neuron. This gain – or central gain, because the amplification increases along the auditory pathway – is often referred to as the cause for hyperresponsivity to sound and speech (Auerbach et al., 2014; Zhao et al., 2016; Salvi et al., 2017; Herrmann and Butler, 2021; McClaskey, 2024). The reduced sensitivity to acoustic noise for older adults observed in the current study could potentially be the result of a maxed-out central gain, such that any noisy neural activity driven by the acoustic noise that would normally facilitate enhancements due to stochastic resonance may not be as effective under conditions of maxed-out central gain. However, this may not be a sufficient explanation, since there was a minor effect of background noise around +10 dB SNR (Figures 4C and 6B) and hearing loss appeared to increase the P1-N1 amplitude slightly (but non-significantly; Figure 6B), suggesting that central gain may not be entirely at its maximum.
Alternatively or in addition, spontaneous activity – that is, neural noise – in the auditory system is known to increase with age and hearing loss (Kaltenbach and Afman, 2000; Eggermont and Roberts, 2004; Munguia et al., 2013; Eggermont, 2015; Parthasarathy et al., 2019). Internal, neural noise in older adults may have a comparable effect on the neural tracking of speech in quiet as the external, acoustic noise has in younger adults. Indeed, increased neural noise in hearing loss has been suggested to lead to stochastic resonance phenomena where the neural noise can increase sensitivity to sound (Krauss et al., 2016; Krauss and Tziridis, 2021; Schilling et al., 2023). Increased internal, neural noise in older adults could reduce the sensitivity to external, acoustic noise, because the noisy neural activity elicited by acoustic noise may not necessarily add further output amplification to the aged auditory system if it is already amplified through stochastic resonance (resulting from neural noise). Although central gain and increased neural noise are not independent phenomena and distinguishing one from the other with non-invasive recording techniques may be challenging (Schilling et al., 2023), the current results are the first to point to the possibility that internal neural noise in the auditory cortex of older people could underlie enhanced neural speech tracking.
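The reasoning that internal, neural noise may already place the aged auditory system near its optimal noise level can be made concrete with a toy threshold model. The sketch below is an illustration under strong assumptions, not a model fitted to the current data. A hypothetical threshold unit receives a subthreshold input, and internal and external noise are treated as independent Gaussian sources whose variances add. The noise levels assigned to the "younger" and "older" cases are arbitrary values chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def tracking(internal_sd, external_sd, n_samples=20000, seed=1):
    """Input-output correlation of a threshold unit with two noise sources."""
    rng = random.Random(seed)
    # Assumption: independent noise sources, so their variances add.
    total_sd = math.sqrt(internal_sd ** 2 + external_sd ** 2)
    # Subthreshold input: amplitude 0.8 against a threshold of 1.0.
    signal = [0.8 * math.sin(2 * math.pi * t / 200)
              for t in range(n_samples)]
    output = [1.0 if s + rng.gauss(0.0, total_sd) > 1.0 else 0.0
              for s in signal]
    return pearson(signal, output)


# Hypothetical noise levels (arbitrary units), not estimated from data.
YOUNG_INTERNAL_SD = 0.2   # little internal noise
OLD_INTERNAL_SD = 0.6     # more internal noise
ACOUSTIC_SD = 0.4         # added external (acoustic) noise

young_clear = tracking(YOUNG_INTERNAL_SD, 0.0)
young_noise = tracking(YOUNG_INTERNAL_SD, ACOUSTIC_SD)
old_clear = tracking(OLD_INTERNAL_SD, 0.0)
old_noise = tracking(OLD_INTERNAL_SD, ACOUSTIC_SD)

print(f"younger: clear = {young_clear:.3f}, noise = {young_noise:.3f}")
print(f"older:   clear = {old_clear:.3f}, noise = {old_noise:.3f}")
```

Under these assumptions, the unit with more internal noise tracks the input better in the clear condition, and external noise increases tracking mainly for the unit with little internal noise. This mirrors the qualitative pattern discussed above, but it does not distinguish increased neural noise from central gain.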
Acknowledgments
I thank Priya Pandey and Saba Junaid for their help with story generation and data collection. The research was supported by the Canada Research Chair program (CRC-2019-00156) and the Natural Sciences and Engineering Research Council of Canada (Discovery Grant: RGPIN-2021-02602).
Author contributions
Björn Herrmann: Conceptualization, methodology, formal analysis, investigation, data curation, writing - original draft, writing - review and editing, visualization, supervision, project administration, funding acquisition.
Statements and Declarations
The author has no conflicts or competing interests.
Data availability
Data are available at https://osf.io/ (upon publication). One younger and one older participant declined to share their data publicly (we employ separate consents for study participation and data sharing in line with Canadian Tri-Council Policies for Ethical Conduct for Research Involving Humans – TCPS 2 from 2022). Their data are thus not made available.