Phasic arousal optimizes decision computations in mice and humans

Neuromodulatory brainstem systems controlling the global arousal state of the brain are phasically recruited during cognitive tasks. The function of such task-evoked neuromodulatory signals is debated. Here, we uncovered a general principle of their function, across species and behavioral tasks: counteracting maladaptive biases in the accumulation of evidence. We exploited the fact that neuromodulatory brainstem responses are mirrored in rapid dilations of the pupil. Task-evoked pupil dilations predicted smaller biases in an auditory detection task performed by humans and mice. In humans, this effect generalized to a more complex and well-known form of bias: risk-seeking in a value-based (“stock-market”) decision. Across tasks, pupil-linked bias suppressions were specifically due to changes in the accumulation of evidence, indicating that phasic changes in arousal state shape the formation of the decision. Thereby, phasic arousal accounts for a significant component of the trial-to-trial variability of overt choice behavior.


INTRODUCTION
Even when awake, the central arousal state of the brain changes on a moment-to-moment basis

It is not known if the discrepancy between the intra- and post-decisional models of phasic arousal is due to differences in species (e.g. between monkeys and humans), physiological signals, or behavioral

Here, we identified a general principle of the function of task-evoked neuromodulatory responses. We combined pupillometry and behavioral modeling in humans and mice performing the same simple perceptual decision (auditory go/no-go detection). In addition, humans performed a forced-choice decision task based on identical stimuli as well as a value-based choice task often used to study stock

To track phasic arousal, we measured the rising slope of the pupil immediately after sound onset. We chose this measure for three reasons: (i) to most specifically track noradrenergic activity (Reimer

In both mice and humans, we found a consistent relationship between the early, task-evoked pupil response and decision outcome, in line with the intra-decisional account of phasic arousal function. We quantified detection performance in terms of the signal detection-theoretic measures (Green and Swets, 1966) sensitivity (d') and bias (criterion), as well as reaction times (RT) (Materials and Methods). We found that both mice and humans had an overall conservative decision criterion (Fig. 2). Optimality analysis of the go/no-go choice task showed that this conservative bias was maladaptive, reducing the fraction of correct/rewarded choices below what could be achieved at a given perceptual sensitivity

In both species, we found that the bias suppression associated with a large pupil response was due to a shift in drift criterion (Fig. 3). First, formal model comparison favored a change in drift criterion

3D,E,H,I). The drift criterion model almost fully accounted for the pupil-dependent changes in overt decision bias. The fitted parameters accurately predicted the overall signal detection criterion and its pupil response-predicted shift (Fig. 3F,J; in humans, for all but the highest signal strength, which exhibited a ceiling effect). Specifically, in both species, the starting point was strongly biased towards no-go irrespective of pupil response (Fig. 3C,G). Thus, overcoming this conservative choice bias required increasing the drift criterion. This increase in drift criterion occurred on trials with large pupil responses (Fig. 3D,E,H,I). We found either no, or a less consistent, relationship between pupil responses and drift rate, boundary separation, or non-decision time (Fig. S4).

Two observations verified that the drift criterion model used here accounted well for the overt behavior in the go/no-go task. First, as expected, drift rate increased with signal strength, reflecting the subjects' ability to accumulate strong sensory evidence more efficiently (Fig. S4A,G). Second, the fitted parameters accurately predicted overall RT and sensitivity and, in mice, its pupil response-predicted shift (Fig. S4).

[Figure legend fragment: criterion and task-evoked pupil response in mice, separately for the different signal strengths. Linear fits are plotted wherever the first-order fit was superior to the constant fit (Materials and Methods). Quadratic fits are plotted (dashed lines) wherever the second-order fit was superior to the first-order fit. Stats, mixed linear modeling.]

The previous results suggest that, without taking trial-by-trial arousal responses into account, the variations in systematic accumulation bias due to arousal would appear as random decision noise (i.e., drift rate variability). Indeed, this is what we found. We simulated RT distributions from two conditions that differed according to the fitted drift criterion (accumulation bias) estimates in the lowest and highest pupil-defined bin of each individual. As predicted, when fitting the model, drift rate variability was accurately recovered when drift criterion could vary with condition, but was significantly overestimated when drift criterion was fixed (Fig. 4).
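The simulation step of this analysis can be illustrated with a minimal sketch. It is not the HDDM-based procedure used in the study: it is a plain Euler simulation of a two-bound diffusion process, in which the drift criterion is assumed to act as a constant added to the drift rate. The function name `simulate_ddm` and all parameter values (bound, step size, noise scale) are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_ddm(n_trials, drift, drift_criterion, bound=1.0, start=0.5,
                 dt=0.005, noise=1.0, max_t=3.0, seed=0):
    """Euler simulation of a two-bound drift diffusion process.

    Assumption: the drift criterion is an additive offset on the drift rate.
    Returns (choices, rts); choice is 1 (upper bound), 0 (lower bound),
    or -1 if no bound was reached within max_t seconds.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(max_t / dt)
    x = np.full(n_trials, start * bound)       # decision variable per trial
    choices = np.full(n_trials, -1)
    rts = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)     # trials still accumulating
    for step in range(1, n_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        increment = (drift + drift_criterion) * dt
        increment = increment + noise * np.sqrt(dt) * rng.standard_normal(n_active)
        x[active] += increment
        hit_upper = active & (x >= bound)
        hit_lower = active & (x <= 0.0)
        choices[hit_upper] = 1
        choices[hit_lower] = 0
        rts[hit_upper | hit_lower] = step * dt
        active &= ~(hit_upper | hit_lower)
    return choices, rts

# Two hypothetical "pupil bins" that differ only in drift criterion.
low_bin, _ = simulate_ddm(5000, drift=0.0, drift_criterion=-0.5, seed=1)
high_bin, _ = simulate_ddm(5000, drift=0.0, drift_criterion=0.5, seed=1)
print((low_bin == 1).mean(), (high_bin == 1).mean())
```

Pooling the two simulated conditions and fitting a model with a single drift criterion would then be the step at which the unexplained between-condition difference could be absorbed by drift rate variability; that fitting step is not reproduced here.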

We thus asked humans (N = 24; 18 of them had participated in the go/no-go experiment) to perform an auditory yes/no (forced-choice) detection task based on identical sensory stimuli to the go/no-go task. In this task, motor responses (and associated preparatory activity) were balanced across yes- and no-choices (Fig. 5A). Consistent with our go/no-go results in mice and humans, we observed that the pupil response predicted suppression of maladaptive perceptual choice bias (Fig. 5B). Furthermore, this bias suppression was, again, best explained by a change in drift criterion instead of starting point (Fig. 5C-E) and pushed behavior to a more optimal regime (Fig. S5A). Importantly, pupil response amplitudes in the go/no-go and yes/no tasks were correlated across the eighteen human subjects who participated in both experiments (Fig. 5F). This was true for yes- as well as no-choices. Therefore, the suppression of choice bias in our results does not reflect motor preparation.

(numbers), it lends itself naturally to the sequential sampling modeling approach we applied to

deviation of the more profitable sequence was lower than that of the losing sequence; on "narrow-error" trials, this was reversed (Fig. 6B and see Materials and Methods). Attitude towards risk was operationalized as "pro-variance bias": the fraction of high-variance choices pooled across both trial types (Materials and Methods). As in previous work (Tsetsos et al., 2016; 2012), subjects exhibited a systematic "pro-variance" bias indicating risk seeking (fraction of high-variance choices larger than 0.5; Fig. 6C).

Indeed, the pro-variance bias was suppressed as a function of pupil response (Fig. 6C). This bias was most reduced on intervals characterized by relatively strong pupil-linked phasic arousal responses.

Pupil responses did not predict changes in RT or accuracy (Fig. S6C). We again used sequential sampling modeling to pinpoint the source of the pupil-linked pro-variance bias suppression. We fitted a previously introduced model that accounts for several key features of behavior in the current task

The phasic arousal-related bias suppression is distinct from ongoing arousal fluctuations

A concern might be that the bias suppression effects under large pupil responses reported here were due to associations between the preceding baseline pupil diameter and behavior. Such baseline effects might be "inherited" by the phasic pupil response through its commonly negative correlation with

Owing to its widespread projections, the LC might profoundly impact cortical computation and behavior. The functional role of the task-evoked LC responses, however, has remained debated. We here established a new principle of this function, generalizing across species (humans and mice) and behavioral tasks (from perceptual to high-level decisions): task-evoked phasic arousal suppresses biases in evidence accumulation. We identified this principle by combining pupillometry and computational modeling of human and mouse choice behavior in the same auditory decision task.

(Donner and Siegel, 2011)). We propose that this challenge can be met by using physiological "reference signals" that serve as vehicles for linking neural data across different scales and species.

The identical behavioral correlates of task-evoked pupil dilation we report here for mice and humans imply that pupil dilation can be used as a reference signal, at least in the decision-making tasks studied

Five mice (all males; age range, 2-4 months) and twenty human subjects (15 females; age range, 19-28 y) performed the go/no-go task. Twenty-four human subjects (of whom 18 had already participated in the go/no-go task; 20 females; age range, 19-28 y) performed an additional yes/no task.

Thirty-seven human subjects (18 females; age range, 20-36 y) performed a value-based choice task, of whom five were excluded from the analyses due to poor eye data quality and/or excessive blinking.

s; inter-stimulus-interval, 0.5 s). A weak signal tone (pure sine wave) was superimposed onto the last noise stimulus (Fig. 1A). The number of noise stimuli, and thus the signal position in the sequence, was randomly drawn beforehand. The probability of a target signal decreased linearly with sound position in the sequence (Fig. 1B), so as to keep the hazard rate of signal onset approximately flat across the trial.

Each trial was terminated by the subject's go-response (hit or false alarm) or after a no-go error (miss).

stimulus and which was terminated by the subject's response (or after a maximum duration of 2.5 s).

The decision interval consisted of only an auditory noise stimulus (McGinley et al., 2015a), or, on 50% of trials, a pure sine wave (2 kHz) superimposed onto the noise. Auditory stimuli were presented at the same intensity of 65 dB using the same over-ear headphones as in the go/no-go task.

Participants were instructed to report the presence or absence of the signal by pressing one of two response buttons with their left or right index finger, once they felt sufficiently certain (free response paradigm). The mapping between perceptual choice and button press (e.g., "yes" -> right key; "no" -> left key) was counterbalanced across participants. After every 40 trials, subjects were informed about their performance.

Throughout the experiment, the target signal volume was fixed at a level that yielded about 75%

In all trials there was a correct answer, with the average difference between the higher and the lower sequence being sampled from d ~ U(1,12) with a mean of 6.5. This experiment contained three conditions, which were intermixed within a block of trials: a neutral condition, a condition designed to induce a "pro-variance" effect, and a condition designed to induce a "frequent winner" effect. In this report, we present analyses of the pro-variance condition; results of the neutral and frequent winner conditions will be the focus of another report. The pro-variance condition involved two types of trials, "narrow-correct" trials and "narrow-error" trials. In both types of trials the sequences were generated from Gaussian distributions, with the mean of the higher sequence (μH) sampled from μH ~ U(45,65).

The mean of the lower sequence was μL = μH − d. In the narrow-correct trials, the standard deviation of the higher sequence was σH = 10 while the standard deviation of the lower sequence was σL = 20; in the narrow-error trials this was reversed (σH = 20 and σL = 10).
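The generative scheme for the pro-variance condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the original stimulus code: the function name `make_pro_variance_trial` is hypothetical, samples are left as unrounded floats, and no constraint is imposed on the realized sample means (the source does not describe rounding, clipping, or resampling rules).

```python
import numpy as np

def make_pro_variance_trial(n_pairs, narrow_correct, rng):
    """Draw one trial of paired number sequences (illustrative sketch).

    d ~ U(1, 12), mu_H ~ U(45, 65), mu_L = mu_H - d.
    narrow-correct: sigma_H = 10, sigma_L = 20; narrow-error: reversed.
    """
    d = rng.uniform(1, 12)
    mu_h = rng.uniform(45, 65)
    mu_l = mu_h - d
    sd_h, sd_l = (10.0, 20.0) if narrow_correct else (20.0, 10.0)
    high = rng.normal(mu_h, sd_h, n_pairs)   # sequence with the higher mean
    low = rng.normal(mu_l, sd_l, n_pairs)    # sequence with the lower mean
    params = {"mu_h": mu_h, "mu_l": mu_l, "sd_h": sd_h, "sd_l": sd_l}
    return high, low, params

rng = np.random.default_rng(0)
high, low, params = make_pro_variance_trial(n_pairs=8, narrow_correct=True, rng=rng)
```

In a narrow-correct trial the high-variance sequence is the losing one, so choosing it counts as both a pro-variance choice and an error; in a narrow-error trial the high-variance sequence is the correct one.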
This experiment was part of a larger study that also included MEG measurements of cortical activity combined with pharmacological intervention. Subjects performed the number integration task in three measurement sessions (nocebo, placebo, drug [lorazepam]); they received an additional fixed €25 in the nocebo session, and an additional €70 in the placebo and drug sessions.

Thus, locking pupil responses to the motor response balanced those motor components in the pupil responses across trials, eliminating them as a confounding factor for estimates of phasic arousal amplitudes. The resulting pupil bins were associated with strongly different overall pupil response amplitudes across the whole trial (Fig. S5D).

The go/no-go and value-based choice tasks entailed several deviations from the above task structure that posed different challenges for the quantification of task-evoked pupil responses. We met those by tailoring the analysis to the specifics of these task protocols, as described next. The go/no-go task entailed, by design, an imbalance of motor responses between trials ending with different decisions, with no motor response for (implicit) no-choices. Thus, the above-described transient motor component in the pupil response would yield larger pupil responses for yes- than for no-choices, even without any link between phasic arousal and decision bias. We took two measures to minimize contamination by this motor imbalance. First, we quantified the pupil responses as the maximum of the pupil derivative in an early window that ranged from the start of the pupil derivative time course being significantly different from zero up to the first peak (grey windows in Fig. 1D,E). Second, we excluded decision intervals with a motor response before the end of this window plus a 50 ms buffer (Fig. S1E,F). In both species, the resulting pupil derivative-defined bins were associated with strongly different overall pupil response amplitudes across the whole trial (Fig. S1B,D).

In the value-based choice task, the trials were substantially longer than in the go/no-go and yes/no tasks (4.0-6.4 s vs. ~1 s): the length of the value sequences varied systematically between trials (5-8 pairs), and the high-contrast numbers elicited an initial constriction of the pupil (Fig. S6A, initial dip below pre-stimulus baseline level during the first 1.5 s). In order to quantify the amplitude of phasic arousal across the full interval of evidence accumulation, we computed pupil responses as the mean pupil size from 1.5 s to 4.5 s after the onset of the first pair of samples (grey window in Fig. S6A), with the pre-trial baseline pupil size (mean pupil size in the 500 ms before the first pair of samples) subtracted out. As pupil diameter increased with each sample after the first (Fig. S6A), larger pupil responses were to be expected for 8-sample compared to 5-sample trials. Therefore, we computed pupil responses aligned to stimulus onset, while excluding (i) the initial dip after the first pair of samples (likely due to the pupil light reflex) and (ii) motor and/or feedback-related components occurring later than 4.5 s for the shortest trials (5 samples) (Fig. S6A, left). The resulting pupil response-defined bins were associated with strongly different overall pupil response amplitudes across the whole trial (Fig. S5D).
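The windowed, baseline-corrected measure described for the value-based choice task can be sketched as below. The function name `pupil_response` and the representation of a trial as a pupil trace plus a time axis (in seconds, relative to the onset of the first pair of samples) are assumptions for illustration; preprocessing steps such as blink interpolation and filtering are not shown.

```python
import numpy as np

def pupil_response(trace, times, window=(1.5, 4.5), baseline=(-0.5, 0.0)):
    """Mean pupil size in `window` minus mean pupil size in `baseline`.

    `times` is in seconds relative to onset of the first pair of samples.
    """
    trace = np.asarray(trace, dtype=float)
    times = np.asarray(times, dtype=float)
    in_window = (times >= window[0]) & (times <= window[1])
    in_baseline = (times >= baseline[0]) & (times < baseline[1])
    return trace[in_window].mean() - trace[in_baseline].mean()

# Toy trace: pupil size steps from 2.0 to 3.0 (arbitrary units) at time zero.
times = np.arange(-0.5, 6.0, 0.01)
trace = np.where(times < 0, 2.0, 3.0)
print(pupil_response(trace, times))
```

Using the same fixed window for every trial, regardless of sequence length, is what keeps the measure comparable between 5-pair and 8-pair trials.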

For analyses of the go/no-go and yes/no tasks, we used five equally populated bins of task-evoked pupil response amplitudes; we used three bins for the value-based choice task, in which subjects completed fewer trials.
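Equal-population binning of this kind can be sketched with a rank-based assignment. The helper name `equal_population_bins` is hypothetical, and the sketch assumes that binning is done separately for each subject; how ties and remainders were handled in the original analysis is not stated in the text.

```python
import numpy as np

def equal_population_bins(values, n_bins):
    """Assign each value to one of n_bins bins with (near-)equal counts.

    Bin 0 holds the smallest values, bin n_bins - 1 the largest.
    """
    values = np.asarray(values, dtype=float)
    # Double argsort converts values to ranks 0..n-1 (ties broken by order).
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)

responses = [0.3, 0.1, 0.9, 0.5, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
print(equal_population_bins(responses, n_bins=5))
```

Behavioral metrics such as d' and criterion would then be computed within each bin label.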

In the go/no-go task, each stimulus within a trial as defined in the experiment (i.e., each discrete signal+noise or noise sound in the sequence) was analyzed as a separate decision. The first stimulus of each trial (see Behavioral tasks) was excluded from the analyses, because this interval served as a reference and never included the target signal (pure sine wave). In the go/no-go and yes/no tasks, reaction time (RT) was defined as the time from stimulus onset until the lick or button press. In the value-based choice task, RT was defined as the time from the last sample offset until the button press. In the mouse go/no-go data set, intervals with RTs shorter than 240 ms were excluded from the analyses (see Quantification of task-evoked pupillary responses and Fig. S1E); in the human go/no-go data set, intervals with RTs shorter than 510 ms were excluded from the analyses (Fig. S1F).

Signal detection metrics d' and criterion (Green and Swets, 1966) were computed separately for five pupil response-defined bins. We estimated d' as the difference between z-scores of hit and false-alarm rates. We estimated criterion by averaging the z-scores of hit and false-alarm rates and multiplying the result by -1. In the go/no-go task, the same false-alarm rate was used for each signal strength (difficulty level).
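The two definitions above translate directly into code. The sketch below uses the inverse normal CDF from the Python standard library; the function name `dprime_criterion` is hypothetical, and no correction for hit or false-alarm rates of exactly 0 or 1 is applied, since the text does not describe one (such rates would raise an error here).

```python
from statistics import NormalDist

def dprime_criterion(hit_rate, fa_rate):
    """Signal detection sensitivity (d') and bias (criterion).

    d'        = z(hit rate) - z(false-alarm rate)
    criterion = -1 * mean(z(hit rate), z(false-alarm rate))
    """
    z = NormalDist().inv_cdf          # z-score of a proportion
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion

# Example with hypothetical rates: few hits and few false alarms.
d_prime, criterion = dprime_criterion(hit_rate=0.5, fa_rate=0.1)
print(d_prime, criterion)
```

With this sign convention, a positive criterion indicates a conservative bias (a tendency towards "no" or no-go), which is the direction of bias reported for both species.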

Determining optimal choice bias in the go/no-go task

We simulated 50000 trials for each combination of a range of perceptual sensitivities and choice biases (signal detection d' and criterion). Sensitivity ranged from 0.5 to 3.0 in steps of 0.5 and criterion ranged from -3.5 to 3.5 in steps of 0.05. On each trial, target signal position (#2-7 in the sequence) was determined as in the actual task (see above). On every sound interval, the agent's internal decision variable (DV) was randomly drawn from a noise or signal+noise distribution, which were d' apart. The noise distribution was three times larger than the signal+noise distribution because subjects encounter more noise sounds (follows from the probabilities in Fig. 1B). Every encountered noise sound added 1.5 s (1 s sound + 0.5 s ISI; see Fig. 1A) to total time. A correct reject (DV drawn from noise distribution < criterion) was followed by the next sound in the same sequence. A hit (DV drawn from signal+noise distribution > criterion) resulted in a reward and the completion of the trial. A false alarm (DV drawn from noise distribution > criterion) resulted in a timeout (additional 8 s added to total time) and the abortion of the trial without obtaining a reward. A miss (DV drawn from signal+noise distribution < criterion) resulted in the abortion of the trial without obtaining a reward. For the human version of the go/no-go task, an additional 8 s was added to total time after misses. Optimality was defined as the criterion value that maximized reward rate (# rewards / total time).

Drift diffusion modeling (go/no-go and yes/no tasks)

We used the HDDM 0.

We used BIC to select the model which provided the best fit to the data (Schwarz, 1978). BIC

To verify that the drift criterion model indeed accounted for the pupil response-dependent changes in overt choice fractions (i.e., signal detection criterion, Fig. 3F,J), we simulated a new dataset using the fitted drift diffusion model parameters. Separately per subject, we simulated 5000 trials for each signal strength and pupil bin, while ensuring that the fraction of signal+noise vs. noise trials matched that of the empirical data; we then computed signal detection criterion for every bin (as described above).

We used a similar approach to test if, without monitoring task-evoked pupil responses, systematic variations in accumulation bias (drift criterion) would appear as random trial-to-trial variability in the accumulation process (drift rate variability) (Fig. 4). For simplicity, we now pooled across signal strengths and simulated 50000 trials from two conditions that differed according to the fitted drift criterion (accumulation bias) estimates in the lowest and highest pupil-defined bin of each individual; drift rate, boundary separation and non-decision time were fixed to the mean across pupil bins of each individual; drift rate variability was fixed to 0.5. We then fitted the drift criterion model as described

Smith, 1998), but now performed within a mixed linear modeling framework. In the first step, we fitted three mixed models to test whether pupil responses predominantly exhibited no effect (zero-order polynomial), a monotonic effect (first-order), or a non-monotonic effect (second-order) on the behavioral metric of interest (y). The fixed effects were specified as:

with β as regression coefficients, S as the signal strength (for the go/no-go task), and TPR as the bin-wise task-evoked pupil response amplitudes. We included the maximal random effects structure justified by the design (Barr et al., 2013). For data from the go/no-go task, the random effects were specified to allow the signal strength coefficient to vary with participant, and the intercept and