Pupillometry as an objective measure of sustained attention in young and older listeners

Sijia Zhao; Gabriela Bury; Alice Milne; Maria Chait

doi:10.1101/579540

Abstract

The ability to sustain attention on a task-relevant sound-source whilst avoiding distraction from other concurrent sounds is fundamental to listening in crowded environments. To isolate this aspect of hearing we designed a paradigm that continuously measured behavioural and pupillometry responses during 25-second-long trials in young (18-35 yo) and older (63-79 yo) participants. The auditory stimuli consisted of a number (1, 2 or 3) of concurrent, spectrally distinct tone streams. On each trial, participants detected brief silent gaps in one of the streams whilst resisting distraction from the others. Behavioural performance demonstrated increasing difficulty with time-on-task and with number/proximity of distractor streams. In young listeners (N=20), pupillometry revealed that pupil diameter (on the group and individual level) was dynamically modulated by instantaneous task difficulty, such that periods where behavioural performance revealed a strain on sustained attention were also accompanied by increased pupil diameter. Only trials on which participants performed successfully were included in the pupillometry analysis. Therefore, the observed effects reflect consequences of task demands as opposed to failure to attend.

In line with existing reports, we observed global changes to pupil dynamics in the older group, including decreased pupil diameter, a limited dilation range, and reduced temporal variability. However, despite these changes, the older group showed similar effects of attentive tracking to those observed in the younger listeners. Overall, our results demonstrate that pupillometry can be a reliable and time-sensitive measure of the effort associated with attentive tracking over long durations in both young and (with some caveats) older listeners.

Introduction

The ability to sustain attention on a task-relevant stimulus whilst avoiding distraction from competing information is a fundamental perceptual challenge across sensory modalities. Arguably this is especially the case in hearing because of the dynamic nature of sound-objects. Listening in many natural environments (e.g. a busy train station, a loud restaurant, a noisy classroom) depends not only on hearing acuity but also on the brain’s ability to focus and maintain attention on a specific sound (e.g. the announcement at the train station, a conversation in the restaurant, the teacher’s voice in the classroom) whilst resisting distraction from other concurrent sounds. Understanding ‘attentive tracking’ is central to understanding the challenges faced by the brain during everyday listening and for addressing impairments in this ability. Indeed, diminished sustained attention capacity is hypothesized to underlie various disorders commonly associated with impaired listening, including Auditory Processing Disorder (APD; e.g. Moore et al., 2010, 2013), ADHD (Tucha et al., 2017), autism spectrum disorder (ASD; Corbett and Constantine, 2006) and dementia (Berardi et al., 2005; Calderon et al., 2001). Failure to maintain attention is also observed in hearing-impaired individuals (Pichora-Fuller et al., 2016) and as a consequence of healthy ageing (Mishra et al., 2014; Petersen et al., 2017; Schoof and Rosen, 2014).

To successfully track a given source within a noisy scene, a listener must overcome challenges associated with energetic masking (i.e. extracting the information related to the target from the sound mixture) as well as challenges associated with selecting and continuously following the relevant source from within the background (Shinn-Cunningham and Best, 2008; Woods and McDermott, 2015). Most previous work has investigated listening in noisy environments using speech embedded in noise or in a mixture of other speakers, effectively confounding both aspects of tracking. However, it is likely that individual capacity to sustain attention is in itself a factor that will affect listening success. Here we sought to isolate and continuously monitor this aspect of auditory processing.

A large body of work demonstrates that sustained attention is not static but fluctuates over time and that these behavioural effects are associated with changes in connectivity along a distributed network of brain regions (Fortenbaugh et al., 2017; Langner and Eickhoff, 2013; Thomson et al., 2015). Emerging models postulate that lapses in sustained attention may arise from the weakening of executive processes over time, resulting in failure to effectively control resource allocation between the main task, distractor suppression, and mind wandering (Kurzban et al., 2013). To monitor sustained attention and determine how it is affected by increasing demands on distractor suppression, we designed a paradigm that isolates this facet of listening and measured behavioural and pupillometry responses during 25-second-long trials.

Pupil dilation has long been used as a measure of effort (Beatty, 1982; Bradshaw, 1968; Cabestrero et al., 2009; Granholm and Steinhauer, 2004; Hjortkjær et al., 2018; Wel and Steenbergen, 2018) and is presently attracting considerable interest in the auditory modality because of evidence that pupil dilation can be used as an objective means with which to evaluate challenges to listening (McGarrigle et al., 2014; Peelle, 2018; Pichora-Fuller et al., 2016). The bulk of existing work has used pupillometry to evaluate listening effort associated with degraded speech (Koelewijn et al., 2012, 2014, 2015; Kuchinsky et al., 2014; Naylor et al., 2018; Ohlenforst et al., 2017; Wang et al., 2017; Wendt et al., 2016, 2017; Winn et al., 2015, 2018; Zekveld et al., 2010, 2011, 2018). As a result, these tasks inherently challenged both the ability to cope with energetic masking and the ability to sustain attention over time. Here we seek to specifically relate pupil dilation to the challenges of attentive tracking.

There is evidence to suggest that pupil dilation may be particularly correlated with the demands on sustained attention (Hopstaken et al., 2015; Sarter et al., 2001). Non-luminance mediated pupil dilation is at least partially driven by the release of NE (Norepinephrine, also Noradrenaline; Loewenfeld and Lowenstein, 1993) and ACh (Acetylcholine; see recent review Larsen and Waters, 2018). NE release has been consistently linked to arousal and sustained attention through its effects on modulating the response gain of cortical and thalamic neurons (Berridge and Waterhouse, 2003; Sara, 2009). ACh has been associated with activation in the anterior attention system and is hypothesized to play a role in controlling distraction (Berry et al., 2014; Demeter and Sarter, 2013; Kim et al., 2017; Sarter et al., 2006). We, therefore, expect that increased demands on sustained attention, including time-on-task and number of distractors, should be revealed in a time-specific manner in the pupil dilation pattern.

The auditory stimuli used in the present experiments are simple artificial ‘sound-scapes’ (Figure 1) that minimize the demands of segregation and isolate processes associated with object selection. ‘Scenes’ consist of a number (1, 2 or 3) of concurrent tone streams that model auditory sources. Each source is modulated at a unique rate; this contributes to the perceptual distinctiveness of each stream. The sources are widely set apart in frequency (always at least 6 ERB in Experiment 1; 2 ERB in Experiment 2) such that any effects are interpretable in the context of competition for processing resources rather than the increasing physical overlap between sources. On each trial, participants are instructed via a brief cue sound to attend to one of the streams. Attention is verified and quantified as performance on a gap detection task. Gaps occur in all streams but listeners are instructed to only respond to those in the target (‘Attended’) stream. The scenes are long (~25 seconds) and the task, therefore, requires listeners to maintain sustained attention over long durations and actively resist distraction from the other concurrent streams within the scene. As the scene size grows, participants systematically struggle to resist the distractions, detecting fewer targets and making more false alarms (Figure 2). This demonstrates that the task suitably models the competition for processing resources in crowded acoustic scenes.

Figure 1.

A schematic representation (not to scale) of the stimuli in Experiment 1. ‘Scenes’ consist of 1 (‘Easy’), 2 (‘Medium’) or 3 (‘Hard’) concurrent tone streams. Each source is amplitude modulated at a unique rate to increase distinctiveness. The sources are widely set apart in frequency (6 ERB). On each trial, participants are instructed (via a 2-second long cue sound) to attend to one of the streams (‘target’). Attention is verified and quantified as performance on a gap detection task. Gaps occur in all streams, but listeners are instructed to only respond to those in the target stream. The scenes are long (25 seconds) and as such the task requires listeners to maintain sustained attention over long durations and actively resist distraction from the other concurrent streams within the scene.

Figure 2.

Behavioural performance of the Young group (Experiment 1). Performance measures were: hit rate, number of false alarms, number of bad trials. [A] Data from all participants (N=33). [B] Data from the participants retained for the pupillometry analysis (N=20). See ‘methods’ for retention criteria. Grey circles indicate individual data. The task conditions are labelled by difficulty: ‘Easy’ condition = 1 stream; ‘Medium’ condition = 2 streams; ‘Hard’ condition = 3 streams. All performance measures were significantly modulated by task difficulty. Error bar is ±1 SEM.

We address three questions: Firstly, we ask whether pupillometry can be a reliable and time-sensitive measure of the effort associated with attentive tracking over long durations similar to those over which listeners must maintain attention in ecologically relevant situations. Indeed, most previous work has used coarse pupil measures (peak dilation) and over relatively short intervals (most investigations have focused on the first 5 seconds; but see Hjortkjær et al., 2018). In contrast, we aim to measure instantaneous pupil diameter changes over a period of ~25 seconds.

A second challenge is related to isolating the effect of effort from other factors linked to task difficulty (McGarrigle et al., 2014). Manipulation of effort through varying task difficulty is intrinsically associated with reduced performance, i.e. an increasing number of trials on which participants fail to accomplish the task. There is, therefore, a risk that pupil activity measured during those trials may reflect processes linked with the failure of attention and/or disengagement (for example, if the task is difficult participants might decide to ‘give up’ on a certain proportion of the trials). This is problematic because it is known that daydreaming is associated with pupil dilation (Franklin et al., 2013; Pelagatti et al., 2018). Indeed, Winn et al. (2015) compared pupil responses to correctly and incorrectly identified sentences and reported increased pupil dilation associated with failed trials, especially later in the trial (see their figures 1 and 5). This may be interpreted as indicating that the trials on which listeners failed were experienced as more difficult than the successful trials (see also Zekveld et al., 2010), but another interpretation may be that the increased dilation is related to disengagement rather than effort.

To address these concerns, we adopt a strict policy of analysing only successful trials. These are defined as trials on which all target gaps have been correctly identified and where the participant had at most one false positive. This allows us to focus on trials where resources were appropriately allocated, and distractors successfully ignored. Any differences observed between conditions will, therefore, reveal pure effects of task demands, not contaminated by failure to attend. Furthermore, we ensure that each condition and participant contribute an equal number of trials to the analysis (determined by the poorest performer on the hardest condition).

Finally, we ask whether pupillometry as a measure of effort to sustain attention is also applicable to older listeners. Attentive capacity is known to decline with age (e.g., Brosnan et al., 2018; Dørum et al., 2016; van der Leeuw et al., 2017; Lufi et al., 2015; Tu et al., 2018). An objective measure of sustained auditory attention would, therefore, be useful to quantify such difficulties and assess intervention outcomes. However, there are known changes to ocular physiology associated with healthy ageing (Bitsios et al., 1996; Guillon et al., 2016; Tekin et al., 2018; Winn et al., 1994) that might limit the efficacy of pupillometry in this population (Piquado et al., 2010; Van Gerven et al., 2004).

Below, we report on two experiments run on young (18-31 year-old) and older (63-79 year-old) listeners. Overall, our results reveal that pupillometry can be a robust measure for attentive tracking in young listeners, supporting its potential usefulness as a screening measure and for evaluating various failures in sustained attention ability. There is also evidence that pupil measures may be used in older people but we identify a few cautionary issues.

Experiment 1: Young listeners

Methods

Participants

Thirty-three paid participants (21 females; mean age 22.9, range 18-31) took part in this study. All reported normal hearing and no history of neurological disorders. Experimental procedures were approved by the research ethics committee of University College London and written informed consent was obtained from each participant. Thirteen participants were excluded from the pupillometry analysis due to poor behavioural performance, leaving a subset of 20 participants (14 females, mean age 22.7, range 18-30).

Stimuli

Stimuli were 25-second-long artificial acoustic “scenes” that contained 1, 2 or 3 concurrent tone-pip sequences (“streams”). Each stream had a unique carrier frequency and modulation rate (AM). Carrier frequencies were selected from a pool of 18 ERB-spaced (Moore and Glasberg, 1983) values between 500 and 4000 Hz with the constraint that the separation between streams (in the 2- and 3-stream condition) was exactly 6 ERBs. AM rates were selected from a pool of 4 values: 3, 7, 13 or 23 Hz. Tone pip duration was fixed at 30 ms (10 ms rise and fall). Together the unique combination of frequency and AM rate associated with each stream supported the perception of the scene as consisting of several concurrent, segregable “auditory objects”. In order to control for perceived loudness, the overall scene intensity was kept constant across scene-size conditions. As a consequence, individual stream intensity decreased with scene size.
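As an illustration of how a pool of ERB-spaced carrier frequencies might be constructed, the following Python sketch places 18 values evenly on an ERB-number scale between 500 and 4000 Hz. It is not the authors' code. It assumes that "ERB-spaced" means equal steps on the ERB-number scale, and it uses the later Glasberg and Moore (1990) approximation rather than the Moore and Glasberg (1983) scale cited above; all function names are hypothetical.

```python
import numpy as np

def hz_to_erb_number(f_hz):
    """ERB-number for a frequency in Hz (Glasberg & Moore, 1990 approximation)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def erb_number_to_hz(erb):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37

def carrier_pool(f_lo=500.0, f_hi=4000.0, n=18):
    """n carrier frequencies in equal ERB-number steps (assumed layout)."""
    erbs = np.linspace(hz_to_erb_number(f_lo), hz_to_erb_number(f_hi), n)
    return erb_number_to_hz(erbs)

pool = carrier_pool()
```

Carriers for a multi-stream scene could then be drawn from `pool` subject to the required ERB separation between streams.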

Each stream contained between 2 and 3 silent gaps. These were created by removing the appropriate number of tones to generate a silent gap of around 333 ms (the minimum length of a gap in the 3 Hz AM rate stream). Silent gaps could not occur within the first or last 2 seconds of a stream sequence or within 2 seconds of one another (including across streams). Participants were instructed to monitor one of the streams (“target”) for gaps whilst ignoring gaps in the distractor streams. The target stream was indicated by means of a 2000 ms cueing tone-pip sequence which preceded each trial. The scene was then presented following a 2000 ms silent gap (see Figure 1). In the 3-stream condition, the target stream was always the middle-frequency stream. To facilitate comparison across conditions, stimuli were created in triplets containing the same target stream across all 3 conditions. These were then presented in random order during the experimental session.
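The gap-timing constraints can be illustrated with a short Python sketch. This is one possible way to satisfy them, not the procedure actually used: it assumes that the 2-second separation is measured between gap onsets, that the last gap must end before the final 2 seconds, and that gap times are otherwise uniformly random. The function name and its parameters are hypothetical.

```python
import numpy as np

def place_gaps(n_streams, rng, trial_dur=25.0, edge=2.0, min_sep=2.0, gap_dur=0.333):
    """Return one sorted array of gap onset times (s) per stream."""
    counts = rng.integers(2, 4, size=n_streams)        # 2 or 3 gaps per stream
    k = int(counts.sum())
    lo, hi = edge, trial_dur - edge - gap_dur          # allowed range for onsets
    slack = (hi - lo) - (k - 1) * min_sep
    if slack < 0:
        raise ValueError("too many gaps for the requested separation")
    # Sorted uniform draws plus a fixed offset per rank guarantee that
    # consecutive onsets are at least min_sep apart.
    onsets = np.sort(rng.uniform(0.0, slack, size=k)) + lo + min_sep * np.arange(k)
    # Randomly assign onsets to streams, respecting each stream's gap count.
    owner = rng.permutation(np.repeat(np.arange(n_streams), counts))
    return [onsets[owner == s] for s in range(n_streams)]

rng = np.random.default_rng(0)
gaps = place_gaps(3, rng)
```

Because the onsets are generated jointly for all streams, the separation constraint holds across streams as well as within each stream.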

Procedure

Participants sat with their head fixed on a chinrest in front of a monitor (24-inch BENQ XL2420T with a resolution of 1920×1080 pixels and a refresh rate of 60 Hz) in a dimly lit and acoustically shielded room (IAC triple walled sound-attenuating booth). They were instructed to continuously fixate on a black cross presented at the centre of the screen against a grey background whilst monitoring the cued target stream for gaps. They were to respond (button press) as quickly as possible when a gap was detected whilst ignoring gaps in the distractor streams. Visual feedback (number of misses and false alarms) was presented for 1500 ms at the end of each trial.

Stimuli were presented in random order, such that on each trial the specific condition was unpredictable until scene onset. Sounds were delivered diotically to the participants’ ears with Sennheiser HD558 headphones (Sennheiser, Germany) via a Creative Sound Blaster X-Fi sound card (Creative Technology, Ltd.) at a comfortable listening level self-adjusted by each participant. Stimulus presentation and response recording were controlled with the Psychtoolbox package (Psychophysics Toolbox Version 3; Brainard, 1997) on MATLAB (The MathWorks, Inc.).

The entire experimental session lasted approximately 2 hours. Participants first completed a short practice block followed by 6 experimental blocks comprised of 12 trials each (~6.5min, 4 trials per condition). In total, 72 trials (24 trials per condition) were presented in a random order for each participant.

Analysis of behavioural data

Key presses occurring within 0.3 s of a previous keypress were considered to be accidental and removed from the analysis. A keypress was classified as a hit if it occurred 0.3 to 1.5 seconds following a target gap. Hit rate (HR) was computed for each subject, in each condition, as the ratio between detected vs. presented gaps in the target stream. All key presses that were not classified as a hit were classified as false alarms (FA). These were summed and averaged across trials as a measure of distractibility. As mentioned above, only trials on which participants performed well were included in the pupillometry analysis. “Successful trials” were those where all the target gaps were correctly detected (100% hits) and which included at most one FA. All other trials were classified as “bad trials” and removed from the analysis. These 3 measures: HR, #FA and #bad-trials are plotted as measures of performance in Figures 2, 3, 4 and 5. Note that FA is quantified as a count (and not as a rate). This is because false responses can happen at any time during the trial.
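The scoring rules above can be summarised in a minimal Python sketch. It is an illustration under stated assumptions rather than the original analysis code: response windows are taken relative to gap onset, each keypress can count towards at most one gap, and the 0.3 s rule is applied relative to the immediately preceding keypress. The function name `score_trial` is hypothetical.

```python
def score_trial(presses, target_gaps, refractory=0.3, win=(0.3, 1.5), max_fa=1):
    """Score one trial; all times are in seconds from scene onset.

    Returns (hit_rate, n_false_alarms, is_successful_trial).
    """
    # Remove accidental presses that follow the previous press too closely.
    kept, prev = [], None
    for t in sorted(presses):
        if prev is None or t - prev >= refractory:
            kept.append(t)
        prev = t

    # A hit is the first unused press inside the response window of a target gap.
    used = set()
    for g in target_gaps:
        for i, t in enumerate(kept):
            if i not in used and win[0] <= t - g <= win[1]:
                used.add(i)
                break

    n_hits = len(used)
    n_fa = len(kept) - n_hits                  # every other press is a false alarm
    hit_rate = n_hits / len(target_gaps)
    successful = n_hits == len(target_gaps) and n_fa <= max_fa
    return hit_rate, n_fa, successful
```

For example, with target gaps at 5 s and 15 s and presses at 5.8, 5.9, 12.0 and 15.9 s, the press at 5.9 s is discarded as accidental, the press at 12.0 s counts as a single false alarm, and the trial still qualifies as successful.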

Figure 3.

The pupil dilation response reflects effort to sustain attention. [A] Pupil dilation results from the young group (N=20). The solid lines represent the average pupil diameter as a function of time relative to the baseline (500 ms pre-onset). The shaded area shows ±1 SEM. Colour-coded horizontal lines at graph bottom indicate time intervals where bootstrap statistics confirmed significant differences between each pair of conditions. [B] Time-binned behavioural performance. Error bars are ±1 SEM. [C] Time-binned Hit Rate difference between the ‘Hard’ and ‘Medium’ conditions. Error bars are ±1 standard deviation. Grey dots represent individual data. [D] Correlation between PDR and HR for each time bin. Within each time-bin average PDR difference between the ‘Hard’ and ‘Medium’ conditions is correlated with the corresponding HR difference (as in [C]). Black bars indicate Spearman correlation coefficients at each time bin. Red shaded areas indicate time interval where a significant correlation (Bonferroni corrected) was observed. Plotted on the right-hand side is the correlation in the 15-20 sec time-bin. Each dot represents data from a single subject. [E] Correlation between PDR (‘Hard’ – ‘Medium’ condition) and behavioural performance (hit rate difference between the ‘Hard’ and ‘Medium’ conditions) on an individual subject level. Black bars indicate Spearman correlation coefficients at each time point. Red shaded areas indicate time intervals where a significant correlation (p<0.05; FWE uncorrected) was observed. This analysis was conducted over the entire trial duration with all significant time-points indicated.

Figure 4.

A schematic representation of the stimuli in Experiment 2. Stimuli were similar to those in Experiment 1, with the exception that difficulty was varied by changing the distance between streams. The ‘Easy’ condition consisted of a single stream (identical to that in Experiment 1). The ‘Medium’ condition consisted of 2 concurrent streams separated by 10 ERB. The ‘Hard’ condition consisted of 2 concurrent streams separated by 2 ERB. Other parameters are identical to those in Experiment 1.

Figure 5.

Behavioural performance of the Older group (Experiment 2). Performance measures were: average hit rate, number of false alarms and number of bad trials for retained participants (N=19). Grey circles indicate individual data. All performance measures were significantly modulated by task difficulty. Error bars are ±1 SEM.

To quantify changes to behaviour during the unfolding trial, behavioural data were also analysed over five time bins of 5 s ([0-5]s, [5-10]s, [10-15]s, [15-20]s, [20-25]s). Mean HR and mean #FA were computed for each condition in each time bin. The data were analysed with a repeated measures ANOVA to investigate main effects of condition and time. The significance threshold was set a priori at p<0.05. The Greenhouse-Geisser correction was applied where appropriate.

Pupil diameter measurement

An infrared eye-tracking camera (Eyelink 1000 Desktop Mount, SR Research Ltd.) was positioned at a horizontal distance of 65 cm away from the participant. The standard five-point calibration procedure for the Eyelink system was conducted prior to each experimental block and participants were instructed to avoid any head movement after calibration. During the experiment, the eye-tracker continuously tracked gaze position and recorded pupil diameter, focusing binocularly at a sampling rate of 1000 Hz. Participants were instructed to blink naturally during the experiment and encouraged to rest their eyes briefly during inter-trial intervals. Prior to each trial, the eye-tracker automatically checked that the participants’ eyes were open and fixated appropriately; trials would not start unless this was confirmed.

Analysis: Pupillometry

As described above, only “successful trials” were included in the pupillometry analysis. To equate the number of trials analysed per condition, the number of trials per condition was set to 12 per participant (this number was determined based on the performance of the worst retained participant on the most difficult condition).

Preprocessing

Only the left eye was analysed. To measure the pupil dilation response (PDR) associated with tracking the acoustic stream, the pupil data from each trial were epoched from 0.5 s prior to stream onset to stream offset (25 s post-onset). For each trial, baseline correction was applied by subtracting the mean pupil diameter over the pre-onset interval (0.5-sec pre-onset). The data were smoothed with a 150 ms Hanning window and down-sampled to 20 Hz. Intervals where full or partial eye closure was detected (e.g. during blinks) were automatically treated as missing data and recovered using shape-preserving piecewise cubic interpolation. The blink rate was low overall. In both young and older (see below) participant groups and for all conditions, the average blink rate (defined as the proportion of excluded samples due to eye closure) was approximately 5% (SD = 5%). Blinks were distributed evenly over the trial duration.
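The preprocessing steps described above can be sketched in Python as follows. This is a hedged illustration, not the original pipeline: the order of operations (blink interpolation first, then baseline correction, smoothing and down-sampling), the edge handling, and the use of plain decimation are assumptions, and `preprocess_epoch` with its `valid` mask input is a hypothetical interface. SciPy's `PchipInterpolator` is used for the shape-preserving piecewise cubic interpolation.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def preprocess_epoch(pupil, valid, fs=1000, pre_s=0.5, win_s=0.150, fs_out=20):
    """Return (times, PDR) for one epoch.

    pupil : raw samples from pre_s before scene onset to scene offset
    valid : boolean mask, False where eye closure was detected
    """
    pupil = np.asarray(pupil, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    t = np.arange(pupil.size) / fs - pre_s

    # 1. Recover blink samples with shape-preserving piecewise cubic interpolation.
    filled = pupil.copy()
    pchip = PchipInterpolator(t[valid], pupil[valid], extrapolate=False)
    filled[~valid] = pchip(t[~valid])
    # Missing samples at the epoch edges cannot be interpolated; hold the
    # nearest valid value instead (assumption).
    edge = np.isnan(filled)
    filled[edge] = np.interp(t[edge], t[valid], pupil[valid])

    # 2. Baseline-correct against the mean of the pre-onset interval.
    filled -= filled[t < 0].mean()

    # 3. Smooth with a normalised 150 ms Hanning window (edge-padded).
    n = int(round(win_s * fs))
    w = np.hanning(n)
    w /= w.sum()
    padded = np.pad(filled, (n // 2, n - 1 - n // 2), mode="edge")
    smoothed = np.convolve(padded, w, mode="valid")

    # 4. Down-sample to fs_out by plain decimation (assumption).
    step = fs // fs_out
    return t[::step], smoothed[::step]
```

Averaging the returned series across the epochs of one condition would then give the single time series per condition and participant used in the subsequent analyses.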

For each participant, the pupil diameter was time-domain-averaged across all epochs of each condition to produce a single time series per condition.

Time-series statistical analysis

To identify time intervals in which a given pair of conditions exhibited PDR differences, a nonparametric bootstrap-based statistical analysis was used (Efron and Tibshirani, 1994). The difference time series between the conditions was computed for each participant and these time series were subjected to bootstrap re-sampling (1000 iterations). At each time point, differences were deemed significant if the proportion of bootstrap iterations that fell above or below zero was more than 95% (i.e. p<0.05). Any significant differences in the pre-onset interval would be attributable to noise and the largest number of consecutive significant samples pre-onset was used as the threshold for the statistical analysis for the entire epoch.
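A minimal Python sketch of this kind of bootstrap procedure is given below. It is an interpretation under stated assumptions rather than the authors' implementation: participants are resampled with replacement, the statistic is the across-participant mean difference, and only runs of significant samples strictly longer than the longest pre-onset run are retained. The function names and the `n_pre` argument (number of pre-onset samples) are hypothetical.

```python
import numpy as np

def runs(mask):
    """(start, stop) index pairs of consecutive True runs in a boolean array."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))

def bootstrap_diff(diff, n_pre, n_boot=1000, level=0.95, seed=0):
    """Boolean mask of time points with a reliable condition difference.

    diff  : (n_participants, n_times) difference time series
    n_pre : number of leading samples belonging to the pre-onset interval
    """
    diff = np.asarray(diff, dtype=float)
    rng = np.random.default_rng(seed)
    n, n_times = diff.shape
    above = np.zeros(n_times)
    below = np.zeros(n_times)
    for _ in range(n_boot):
        # Resample participants with replacement and average their differences.
        m = diff[rng.integers(0, n, size=n)].mean(axis=0)
        above += m > 0
        below += m < 0
    sig = np.maximum(above, below) / n_boot > level

    # Longest significant run before onset sets the minimum run length.
    pre_runs = runs(sig[:n_pre])
    thresh = max((stop - start for start, stop in pre_runs), default=0)
    keep = np.zeros_like(sig)
    for start, stop in runs(sig):
        if stop - start > thresh:
            keep[start:stop] = True
    return keep
```

Using the pre-onset interval to set the run-length threshold treats any apparent effect before scene onset as an estimate of the noise floor, so that only longer-lasting differences are reported.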

Participant exclusion criteria

Participants with more than 50% of bad trials on the hardest condition (3 streams) were excluded from the main analysis.
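For concreteness, the exclusion rule and the equalisation of trial numbers can be expressed as the following Python sketch. With 24 trials per condition, a participant with at most 50% bad trials has at least 12 successful trials, which matches the 12 trials per condition entered into the pupillometry analysis. How the 12 trials were chosen when more were available is not stated; the random subsampling below is an assumption, and both function names are hypothetical.

```python
import numpy as np

def retained(n_bad_hard, n_trials=24, max_bad_frac=0.5):
    """Indices of participants kept, given bad-trial counts in the hardest condition."""
    n_bad_hard = np.asarray(n_bad_hard, dtype=float)
    return np.flatnonzero(n_bad_hard / n_trials <= max_bad_frac)

def equalise(successful_trials, n_keep=12, rng=None):
    """Pick n_keep successful trials for one participant and condition.

    Random selection without replacement is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    trials = np.asarray(successful_trials)
    if trials.size < n_keep:
        raise ValueError("not enough successful trials")
    return np.sort(rng.choice(trials, size=n_keep, replace=False))
```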

Results

Behavioural performance

Figure 2A shows behavioural performance across the full group of participants (N=33). The pattern of performance demonstrates that the task becomes increasingly harder with the addition of distractor streams to the scene (manifested by reduced HR and increased #FA and # of bad trials). This suggests that the paradigm successfully manipulates demands on attentive tracking. Thirteen participants performed poorly on the hardest condition, resulting in an insufficient number of “successful trials”. These participants were excluded from further analysis. The fact that almost 40% of participants (13 of 33) were excluded suggests that the task loads resources to the extent that it may deplete them in a large proportion of participants.

Figure 2B plots the performance of the 20 retained participants (those who had at least 12 successful trials in the hardest condition). A repeated measures ANOVA with condition (1 stream - ‘Easy’, 2 streams - ‘Medium’, 3 streams - ‘Hard’) as a factor revealed a main effect of condition on all performance measures (HR, #FA, #bad trials). For HR F(1.464, 27.812) = 36.915, p < .001, for #FA F(1.557, 29.589) = 29.910, p < .001, for #bad trials F(2,38) = 39.181, p < .001. Post-hoc tests (Bonferroni corrected) for HR revealed no significant difference between the ‘Easy’ and ‘Medium’ conditions (p = .065) but did show a significantly reduced HR for ‘Hard’ compared to ‘Easy’, and ‘Hard’ compared to ‘Medium’ trials (p-values < .001). There were significant differences between all conditions for the #FA (p-values ≤ .002) and #bad trials (p-values ≤ .026), showing an increase in false alarms and bad trials for conditions with higher numbers of streams.

In addition to quantifying the overall effects, we examined how performance evolved over the duration of the trial by separating the trial into 5s time bins (Figure 3B). A repeated measures ANOVA with condition (‘Easy’, ‘Medium’, ‘Hard’) and time bin (five 5s intervals) revealed a main effect of condition for both measures (for HR F(1.504,28.584) = 33.876, p < .001; for #FA F(1.557, 29.589) = 29.910, p < .001). There was also an interaction between condition and time bin for HR, F(8,152) = 3.667, p = .001, and for #FA, F(3.996,75.933) = 2.623, p = .041. This was because the difference between the hardest condition and the other two conditions was not fixed but increased partway through the trial. A post hoc repeated measures ANOVA on hit rates in each condition as a function of time bin revealed no effect for the ‘Easy’ and ‘Medium’ conditions, but a significant effect of time bin on the ‘Hard’ condition (F(4,76)=8.79, p<.001). The same pattern was obtained for the analysis of #FA (‘Hard’ condition F(4,76)=4.08, p=0.05; other two conditions n.s.).

The Pupil Dilation Response (PDR) as a measure of effort to sustain attention

Figure 3A plots the average pupil diameter data across the 20 listeners as a function of time relative to the pre-onset baseline. Note that the baseline was not taken at a complete resting state but during a brief silent interval (2 seconds) that occurred between the presentation of the cue and the onset of the scene. At this point all conditions are equiprobable.

All three conditions share a similar PDR pattern: Immediately after scene onset (t=0), the pupil diameter rapidly increased and reached a peak within 2 seconds. A significant difference between the PDR to the ‘Easy’ versus ‘Medium’ and ‘Hard’ tracking conditions emerged roughly 1 second after onset. The difference between the ‘Medium’ and ‘Hard’ conditions emerged 2.15 seconds after onset. After the initial peak in the ‘Hard’ condition (at 2 seconds), the pupil diameter continuously climbed to a peak at 4.1 seconds.

Following the initial dilation, the pupil diameter gradually decreased throughout the epoch but in a manner that preserved the differences between the different conditions. The difference between the ‘Medium’ and ‘Easy’ conditions was no longer significant after 14.25 s. However, the PDR to the ‘Hard’ condition remained considerably above the other two conditions throughout the epoch.

Note that the negative pupil diameter values later in the trial reflect the fact that pupil diameter reduced beyond its size during the pre-trial (baseline) period. This likely happens due to the presence of pupil dilation in the pre-trial period, reflecting the anticipation of the onset of the scene (e.g. Bradshaw, 1968; Wierda et al. 2012).

Correlation between PDR and behaviour at an individual level

To investigate the relationship between pupil dynamics and behavioural performance on an individual subject level, we correlated within each time bin the HR difference between the ‘Hard’ and ‘Medium’ conditions (Figure 3C) with the corresponding mean PDR difference. The ‘Easy’ condition was excluded from this analysis because it was associated with little behavioural variability across participants, consistent with ceiling performance. Correlation coefficients (Spearman) are plotted in Figure 3D. A significant (Bonferroni corrected), moderate correlation between PDR and HR was observed between 15-20 s after trial onset. This timing corresponded to the time window where the HR and #FA of the ‘Hard’ condition demonstrated increased divergence relative to the ‘Medium’ condition (Figure 3B).
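The time-binned correlation analysis can be sketched in Python as follows. This is an illustration rather than the original code: it assumes that the Bonferroni correction is applied across the five time bins, and the function name and array layout are hypothetical. `scipy.stats.spearmanr` is used for the rank correlation.

```python
import numpy as np
from scipy.stats import spearmanr

def binned_correlations(pdr_diff, hr_diff, alpha=0.05):
    """Spearman correlation between PDR and HR differences in each time bin.

    pdr_diff, hr_diff : (n_participants, n_bins) 'Hard' minus 'Medium' values
    Returns (rho, p, significant), each of length n_bins.
    """
    pdr_diff = np.asarray(pdr_diff, dtype=float)
    hr_diff = np.asarray(hr_diff, dtype=float)
    n_bins = pdr_diff.shape[1]
    rho = np.empty(n_bins)
    p = np.empty(n_bins)
    for b in range(n_bins):
        rho[b], p[b] = spearmanr(pdr_diff[:, b], hr_diff[:, b])
    # Bonferroni: compare each p-value against alpha divided by the number of bins.
    return rho, p, p < alpha / n_bins
```

The same function could be applied sample by sample to the 20 Hz difference series, in which case the correction would need to be reconsidered given the much larger number of comparisons.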

For a more time-sensitive analysis, we also correlated the instantaneous PDR difference between the ‘Hard’ and ‘Medium’ conditions at every time sample (20 Hz) with the mean overall HR difference between these conditions measured for each participant (Figure 3E). Correlation coefficients (Spearman) are plotted as black bars in Figure 3E. Significant time samples (FWE uncorrected) are marked in red. In line with the time-binned analysis, a significant correlation between instantaneous PDR and HR was found between ~12 and ~19 s post-stream onset.

Experiment 2: Older listeners

Overall, the results from Experiment 1 indicate that pupil dilation is a stable and sensitive measure of effort to sustain attention at the group level and that it is modulated by individual subject performance. This finding makes PDR a potentially useful objective tool for evaluating attentive tracking ability. Specifically, PDR may be instrumental for quantifying deficits in attentive tracking often exhibited by older populations. However, a potential drawback lies in the known physiological changes to the pupil that occur during healthy ageing: increased demands on accommodation, reduced pupil diameter and slower responses are commonly observed (Bitsios et al., 1996; Guillon et al., 2016; Tekin et al., 2018). Whilst the physiological underpinnings of these effects are not fully clear (Bitsios et al., 1996), they manifest as relative pupil rigidity and may reduce the sensitivity of the PDR as a measure of effort.

In Experiment 2 we used a paradigm similar to that in Experiment 1 to measure attentive tracking capacity in a group of older listeners.