ABSTRACT
Studying the temporal dynamics of visual perceptual decision-making can offer key insights into cognitive processes contributing to it. Here, we investigated the time course as well as fundamental psychophysical constants governing visual perceptual decision-making in freely behaving mice performing 2-AFC orientation discrimination tasks. We did so by analyzing response accuracy against reaction time (i.e., conditional accuracy), and using drift diffusion modeling, in a series of experiments in which we varied target size, contrast, duration, and presence of a foil. Our results revealed two distinct stages in the time course of mouse visual decision-making - a ‘sensory encoding’ stage, in which conditional accuracy exhibits a classic tradeoff with response speed before asymptoting at a peak level, and a subsequent ‘short term memory-dependent’ stage exhibiting a classic asymptotic decay of performance following stimulus offset. We estimated the duration of visual sensory encoding as ~300 ms across tasks, the lower bound of the duration of short-term memory as ~1700 ms, the briefest duration of visual stimulus input that is informative as ~40 ms, and the longest duration that benefits overall decision accuracy as 1000 ms. Separately, by varying stimulus onset delay, we demonstrated that the conditional accuracy function and RT distribution, the two components of overall accuracy, can be independently modulated, and also estimated impulsivity of mice via an ‘impulsivity index’. Our results reveal shared stages in mouse and human visual decision dynamics and establish a quantitative foundation for investigating the underlying neural circuit bases in mice.
SIGNIFICANCE STATEMENT
This study presents a quantitative breakdown of the time course of visual decision-making in mice during naturalistic behavior. It demonstrates parallel stages in mouse visual perceptual decision dynamics to those in humans, estimates their durations, and shows that mice are able to discriminate well under challenging visual conditions, with stimuli that are brief, low contrast, and small. This work sets the stage for investigating the neural bases of visual perceptual decision dynamics and their dysfunction in mice.
INTRODUCTION
Exploring the temporal dynamics of perceptual decisions from onset of the sensory input through the initiation of behavioral responses affords a key window into the underlying cognitive processes [1–3]. Investigations of such dynamics have led to rich insights into visual decision-making in humans [4, 5], and into decision-making in other species and sensory modalities [6–8]. They have revealed distinct stages in perceptual processing, their timing, and their interactions, and have highlighted the importance of conditional accuracy analysis (over just conventional reaction time analyses) for investigating dynamics of perception and decision-making [9–11]. Conducting such investigations in a genetically tractable animal model can additionally facilitate the subsequent unpacking of the mechanistic basis of different stages in perceptual dynamics. However, despite the recent rise in the use of the laboratory mouse for the study of the visual system [12–14] and of visually guided decision-making [15–25], the temporal dynamics of visual perceptual decisions remain a significant gap in mouse visual psychophysics [26, 27].
In this study, we adapted approaches from human psychophysical studies to investigate the dynamics of visual decision-making in freely behaving mice. In a series of experiments involving touchscreen-based [24, 28] 2-alternative forced choice (2-AFC) orientation discrimination tasks, we investigated the effect of stimulus size, contrast, duration, delay, and the presence of a competing foil on mouse decision performance (accuracy, sensitivity and RT), and importantly, on the conditional accuracy function. We identified two distinct stages in the time-course of mouse visual decision-making within a trial, as has been reported in humans [29–36]. In the first ‘sensory encoding’ stage [29–32], response accuracy exhibited a classic tradeoff with response speed, and asymptoted to a peak level. In the next stage, response accuracy did not exhibit such a tradeoff, but instead, declined following stimulus offset, exhibiting a classic asymptotic decay to chance performance consistent with a short-term memory (STM)-dependent process [33–36]. Combining these results with those from drift diffusion modeling [37] allowed us to estimate fundamental psychophysical constants in mouse perceptual decision-making: the time needed by mice to complete visual sensory encoding, the duration for which their short term memory can intrinsically support discrimination behavior after stimulus input is removed, the shortest visual stimulus duration that is informative, and the longest stimulus duration beyond which no additional benefit in overall decision accuracy is seen. Additionally, by varying stimulus onset delay, we demonstrated that the two components of accuracy, namely, the conditional accuracy function and the RT distribution, can be independently modulated by task parameters. This last experiment also allowed a quantitative estimation of impulsivity of mice via a novel ‘impulsivity index’. 
Together, this study reveals parallels between mouse and human visual decision dynamics, despite differences in their sensory apparatuses, and enables investigations into the neural circuit underpinnings of the time course of perceptual decision-making in mice.
METHODS
Animals
Thirty-three C57Bl6/J male mice (Jackson Labs) were housed in a temperature (~75F) and humidity (~55%) controlled facility on a 12:12h light:dark cycle; ZT0=7 am. All procedures followed the NIH guidelines and were approved by the [Author University] Animal Care and Use Committee (ACUC). Animals were allowed to acclimate for at least one week, with ad libitum access to food and water before water regulation was initiated per previously published procedures [38]. Briefly, mice were individually housed (for monitoring and control of daily water intake of each identified animal), and administered 1mL water per day to taper their body weight down, over the course of 5-7 days, to 80-85% of each animal’s free-feeding baseline weight. During behavioral training/testing, the primary source of water for mice was as a reinforcer for correct performance: 10 μL of water was provided for every correct response. Experiments were all carried out in the light phase.
Apparatus
Behavioral training and testing were performed in soundproof operant chambers equipped with a touchscreen (Med Associates Inc.), a custom-built reward port (fluid well), infrared video cameras, a house light and a magazine light above the reward port. The reward port was located on the wall of the chamber opposite the touchscreen (Fig. 1A, 1-1A). Mice were placed within a clear plexiglass tube (5 cm diameter) that connected the touchscreen and the reward port. A thin plexiglass mask (3 mm thickness) with three apertures (1 cm diameter), through which the mouse could interact with the screen via nose-touch, was placed 3 mm in front of the touchscreen. The apertures were arranged in a triangle: the ‘left’ and ‘right’ apertures were placed 3 cm apart (center-to-center) along the base of the triangle, and the ‘central’ aperture, at the apex, was 1.5 cm below the midpoint of the base. All experimental procedures were executed using control software (K-limbic, Med-Associates).
(A) Left: Schematic of touchscreen-based experimental setup showing key components. Right: Snapshot of freely behaving mouse facing a visual stimulus on the touchscreen. (B) Schematic of 2-AFC task design. Black discs: Screenshots of touchscreen with visual stimuli; dashed ovals: locations of holes through which mice can interact with touchscreen; white ‘+’: zeroing cross presented within central response hole at start of each trial; red arrowhead: nose-touch by mouse. Shown also are vertical or horizontal grating stimuli, and reinforcement (water)/punishment (timeout) schedule. Bottom: Trial timeline. 0 ms corresponds to the instant at which the mouse touches the zeroing cross (trial initiation). Immediately following this, the target grating was presented and stayed on for 3 s, or until the mouse responded, whichever came first. Vertical and horizontal targets were interleaved randomly. (C) Psychometric plots of discrimination accuracy against stimulus contrast (Michelson contrast: (luminance_bright − luminance_dark) / (luminance_bright + luminance_dark) × 100; Methods). Different colors correspond to different target sizes. Data: mean ± s.e.m; n= 8 mice. 2-way ANOVA, p<0.001 (contrast), p<0.001 (size), p=0.498 (interaction). (D) Plot of discrimination sensitivity (d’, filled data) and criterion (|c|, hollow data) against stimulus contrast. 2-way ANOVA, d’: p<0.001 (contrast), p<0.001 (size), p=0.899 (interaction); |c|: p=0.374 (contrast), p=0.056 (size), p=0.998 (interaction). The d’ and |c| data were plotted with a slight offset at contrast = 20% to avoid overlap. (E) Plot of median reaction time (RT) against stimulus contrast. 2-way ANOVA, p=0.99 (contrast), p=0.004 (size), p=1 (interaction).
See also Fig. 1-1.
Visual stimuli
Visual stimuli were bright objects on a dark background (luminance = 1.32 cd/m²). A small cross (60×60 pixels; luminance = 130 cd/m²) was presented in the central aperture and had to be touched to initiate each trial. Oriented gratings (horizontal or vertical) were generated using a square wave, with a fixed spatial frequency (24 pixels/cycle) known to be effective for mice to discriminate [17]. The dark phase of the grating matched the background (luminance_dark = 1.32 cd/m²), and the luminance of the bright phase (luminance_bright) was varied between 1.73 cd/m² and 130 cd/m² depending on the task (see below). The contrast of the grating stimulus was calculated as the Michelson contrast = (luminance_bright − luminance_dark) / (luminance_bright + luminance_dark) × 100. The size of the stimulus was also varied depending on the task, ranging from 60×60 pixels to 156×156 pixels, which subtended 25-65 degrees of visual angle at a viewing distance of 2 cm from the screen (Fig. 1-1A).
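As a quick check on the stimulus parameters above, the following minimal Python sketch computes the Michelson contrast from the stated luminances and converts stimulus size from pixels to degrees. It is an illustration, not the authors' code. The function names are hypothetical, and the linear pixel-to-degree scaling is an assumption inferred from the stated correspondence of 60 pixels to 25 degrees.

```python
def michelson_contrast(l_bright, l_dark):
    """Michelson contrast in percent, following the definition in the text."""
    return (l_bright - l_dark) / (l_bright + l_dark) * 100.0


def pixels_to_degrees(n_pixels):
    """Convert stimulus size in pixels to degrees of visual angle.

    Assumption: linear scaling anchored on the stated 60 pixels = 25 degrees.
    """
    return n_pixels * 25.0 / 60.0


L_DARK = 1.32  # cd/m^2, luminance of the dark phase and the background

lowest = michelson_contrast(1.73, L_DARK)    # dimmest bright phase in the text
highest = michelson_contrast(130.0, L_DARK)  # brightest bright phase in the text
print(f"contrast range: {lowest:.1f}% to {highest:.1f}%")
print(f"size range: {pixels_to_degrees(60):.0f} to {pixels_to_degrees(156):.0f} degrees")
```

Under this scaling, the two luminance extremes reproduce the lowest (13%) and highest (98%) contrast levels listed for the tasks.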
Experimental procedure and behavioral training
Each mouse was run for one 30 min behavioral session per day, with each session yielding 80-180 trials. Each trial in a session was initiated by the mouse touching the zeroing cross. Upon trial initiation, the cross vanished, and the visual stimulus (or stimuli) was presented immediately (except in the delay task), for a duration of 0.1-3 s depending on the task (see below). Mice were trained to report the orientation of the target grating by nose-touching the correct response aperture (vertical → left; horizontal → right). A correct response triggered a tone (600 Hz, 1 s), the magazine light turning on, and the delivery of 10 μL of water. When mice turned to consume the reward, their head entry into the reward port was detected by an infrared sensor, which caused the zeroing cross (for the next trial) to be presented again. An incorrect response triggered a 5-s timeout, during which the house light and the magazine light were both on and the zeroing cross was unavailable, so that the next trial could not be initiated. A failure to respond within 3 s of stimulus onset resulted in a trial reset: the stimulus vanished and the zeroing cross was presented immediately (without a timeout penalty), to allow initiation of the next trial. Well-trained animals failed to respond on fewer than 5% of the total number of trials, and there were no systematic differences in the proportion of such missed trials between different conditions. Within each daily 30-minute behavioral session, mice consumed approximately 1 mL of water. If a mouse failed to collect enough water during the behavioral session, it was provided with a water supplement in a small plastic dish in its home cage.
Single-stimulus discrimination task
Upon trial initiation, a single grating stimulus (i.e., the ‘target’) was presented above the central aperture, at the same horizontal level as the left and right apertures, and mice were required to report its orientation with the appropriate nose-touch (Fig. 1B). When stimulus size and contrast were manipulated (Figs. 1 and 2), three different sizes were tested: 60×60, 84×84, 108×108 (pixels × pixels). At each size, seven different levels of contrast were tested: 20%, 32%, 54%, 70%, 85%, 93%, 98%. Trials with different stimulus contrasts at a particular size were interleaved randomly throughout a session, while trials with different stimulus sizes were examined on different days. Data were recorded from a total of 18 sessions (days). When stimulus size was manipulated independently (Fig. S1J), full-contrast (98%) gratings of different sizes were tested: 60×60, 84×84, 108×108, 132×132, 156×156 (pixels × pixels). Trials with different stimulus sizes were interleaved randomly throughout a session, and data were recorded from a total of five sessions (days). When the stimulus duration was manipulated (Fig. 3), the contrast (98%) and size (60×60) of the grating were fixed, and eleven different stimulus durations were tested: 100, 200, 300, 400, 500, 600, 800, 1000, 1500, 2000, 3000 ms. The stimulus duration was fixed for a given day, and across days, was varied in a descending sequence from 3000 ms to 100 ms. Data were recorded from a total of 21 sessions. When the stimulus onset delay was manipulated (Fig. 5), the contrast (98%), size (60×60), and duration (600 ms) of the grating were fixed. Three different delays were tested: 0, 100, and 200 ms. The delay duration was fixed for a given day, and varied in an ascending sequence from 0 ms to 200 ms. Data were recorded from a total of 7 sessions (days).
(A) Plot of accuracy as a function of RT bins (conditional accuracy) using same dataset as Fig. 1. Orange dots: Data pooled across all stimulus sizes and contrast ratios, n=8 mice; RT bin size = 100 ms. Orange curve: Conditional accuracy function, CAF (best-fit rising asymptotic function; Methods); light orange shading: 95% CI of the fit (Methods). Indicated are three key parameters (apeak, t50 and tpeak) describing the sensory encoding stage of the CAF – the initial period during which accuracy improves for longer RT values, exhibiting a tradeoff between speed and accuracy (see text; Methods). Peak accuracy (apeak): mean ± s.d.= 87.5 ± 0.5%; time to reach peak accuracy (tpeak): 462 ± 13 ms; time at which accuracy just exceeds 50% (t50): 236 ± 10 ms. Gold histogram: RT distribution (y-axis on the right). The overall response accuracy for a particular stimulus condition is the dot product of the CAF and the RT distribution. (B) CAFs for targets of various sizes (black: 45°; blue: 35°; red: 25°); conventions as in A. (C) Plots of key CAF parameters for different target sizes. Left panel: apeak; middle panel: tpeak; right panel: t50. Data show mean ± s.t.d of distribution of bootstrapped estimates (Methods). ‘*’ (‘n.s.’): p<0.05 (p>0.05), paired permutation tests followed by HBMC correction (Methods). apeak: p<0.001 (25 ° vs. 35°), p<0.001 (35 ° vs. 45°), p<0.001 (25 ° vs. 45°); tpeak: p=0.398 (25 ° vs. 35°), p=0.827 (35 ° vs. 45°), p=0.576 (25 ° vs. 45°); t50: p=0.226 (25 ° vs. 35°), p=0.127 (35 ° vs. 45°), p=0.918 (25 ° vs. 45°). (D) CAFs for targets of different contrast conditions (magenta: ‘low’ contrast – first three contrast ratio levels from Fig. 1C; green: ‘high’ contrast – last four contrast ratio levels; Methods); conventions as in A. (E) Plots of key CAF parameters for different contrast conditions; conventions and statistical methods as in C. apeak: p<0.001 (low vs. high contrast conditions); tpeak: p<0.001; t50: p=0.747.
(A) Psychometric plot of discrimination accuracy against stimulus duration. Data: mean ± s.e.m; n= 6 mice. 1-way ANOVA; p=0.047. Stimulus size = 25°; contrast = 98%. (B) Plot of discrimination sensitivity (d’, filled data, p=0.001, 1-way ANOVA) and criteria (|c|, hollow data, p=0.802, 1-way ANOVA) against stimulus duration. (C) Plot of median reaction time (RT) against stimulus duration. 1-way ANOVA; p=0.133. (D) Plot of the conditional accuracy (solid data) as a function of RT bins relative to stimulus offset. Only trials in which the stimulus was longer than 332 ms (to ensure full sensory encoding; see text), and in which the response was initiated after the stimulus disappeared (i.e., with RT ≥ stimulus duration + 200 ms; see text) were included (Methods). Curve and shading: best-fit sigmoid function and 95% C.I. Hollow data: bootstrapped estimates of the peak accuracy (apeak, mean ± s.t.d. = 85.6 ± 3.1%), tdecay (762 ± 271 ms) and tchance (1916 ± 385 ms). Histogram: RT distribution (y axis on the right).
See also Fig. 3-1.
(A) Schematic of the flanker task; target grating is always presented at the lower location; a second ‘flanker’ grating (orthogonal orientation – incongruent flanker, or same orientation – congruent flanker) is presented simultaneously, and always at the upper location; contrast of flanker is systematically varied (adapted from [24]). All other conventions as in Fig. 1. Plots represent results from new analyses applied to published data [24] after collapsing across all flanker contrasts (Methods).
(B) Left panel: Comparison of performance between trials with incongruent vs. congruent flanker. p<0.001, signed rank test. Right panel: Comparison of median RT between trials with incongruent vs. congruent flanker. p=0.019, signed rank test. Data re-analyzed from You et al [24]; each line represents data from one mouse. (C) CAFs of the sensory encoding stage; data correspond to trials with RT < stimulus offset, i.e., 1000 ms. Blue: trials with congruent flanker; red: trials with incongruent flanker; histograms: RT distributions.
(D) Plots of key parameters of CAFs (sensory encoding stage) for trials with congruent vs. incongruent flanker; apeak (left), t50 (middle), and tpeak (right). Data show mean ± s.t.d of distribution of bootstrapped estimates. ‘*’ (‘n.s.’): p<0.05 (p>0.05), permutation tests followed by HBMC correction, congruent vs. incongruent flanker conditions (Methods). apeak: p<0.001; t50: p=0.022; tpeak: p=0.01.
(E) CAFs of the STM-dependent stage; data correspond to trials with RT > stimulus offset (1000 ms) + tdelay (200 ms; see text); aligned to stimulus offset. Blue: trials with congruent flanker; red: trials with incongruent flanker.
(F) Plots of key parameters of CAFs (VSTM-dependent stage) for trials with congruent vs. incongruent flanker; tchance (left) and tdecay (right). Conventions and statistical methods as in D. tchance: p=0.426; tdecay: p=0.313.
(A) Left: RT distributions at different stimulus onset delays (shown as shaded zones); plotted with respect to time of trial initiation (for clarity). Right: Plot of median RT (w.r.to trial initiation) against stimulus onset delay; p=0.138, 1-way ANOVA; Pearson’s ρ = 0.99, p=0.035. (B) Plot of response accuracy against stimulus onset delay; p=0.337, 1-way ANOVA; ρ = −0.988, p=0.098. (C) Conditional accuracy functions of the sensory encoding stage; Blue: trials with no delay; red: trials with delay = 100 and 200 ms; shaded bands: bootstrap confidence intervals (95%); confidence intervals overlap for the two datasets. Histograms: RT distributions. (D) Conditional accuracy functions of the STM-dependent stage. Other conventions as in C.
See also Fig. 5-1.
Flanker task
Upon trial initiation, either one stimulus (‘target’, 60×60 pixels, contrast = 88%) was presented at the lower location, or two stimuli were presented simultaneously, with the target at the lower location and a second ‘flanker’ at the upper location (Fig. 4A). Flankers were of the same size (60×60) and spatial frequency (24 pixels/cycle) as the target, but their contrast took one of 8 levels: 13%, 27%, 41%, 58%, 74%, 88%, 94%, 98%. The orientation of the flanker was either identical to that of the target (‘congruent trial’) or orthogonal to that of the target (‘incongruent trial’). The stimulus (or stimuli) was presented for a duration of 1 s, and mice were required to report the orientation of the target grating with the appropriate nose-touch (within 3 s). All types of trials (no flanker, congruent, incongruent) and flanker contrasts were interleaved randomly within each daily session. Data from this experiment have been reported previously [24], and were re-analyzed here using different analyses, after collapsing trials across all flanker contrasts.
Subject inclusion/exclusion
Twenty-five of the 33 mice passed the inclusion threshold of response accuracy >70% in the single-stimulus discrimination task. Of these, different subsets of mice were used in different tasks. Mice involved in more than one experiment were rested for 3-8 weeks, with food and water ad libitum, between experiments. Before the start of each experiment, all mice were given a few days of practice sessions to ensure that they remembered or re-learned the association between the orientation of a single target and the appropriate nose-touch.
Trial inclusion/exclusion
Mice were observed to become less engaged in the task towards the end of a behavioral session, when they had received a sizeable proportion of their daily water intake. This was reflected in their behavioral metrics: they tended to wait longer to initiate the next trial, and their performance deteriorated. We identified and excluded such trials following a published procedure [24], in order to minimize confounds arising from loss of motivation towards the end of sessions. Briefly, we pooled data across all mice and all sessions, treating them as coming from one session of a single ‘mouse’. We then binned the data by trial number within the session, computed the discrimination performance in each bin (% correct), and plotted it as a function of trial number within session (Fig. 1-1C, 3-1, 5-1A). Using a bootstrapping approach, we computed the 95% confidence interval for this value. We used the following exclusion criterion: Trials q and above were dropped if the qth trial was the first trial at which at least one of the following two conditions was satisfied: (a) the performance was statistically indistinguishable from chance on the qth trial and for the majority (3/5) of the next 5 trials (including the qth), (b) the number of observations in qth trial was below 25% of the maximum possible number of observations for each trial (mice*sessions), thereby signaling substantially reduced statistical power available to reliably compare performance to chance. The plots of performance as a function of trial number, and number of observations as a function of trial number for the different tasks in this study are shown in Figs. 1-1C, 3-1, 5-1A, along with the identified cut-off trial numbers (q).
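The exclusion rule above can be sketched in Python as follows. This is an illustrative reading of the criterion, not the published procedure [24]. The function name and inputs are hypothetical, and "statistically indistinguishable from chance" is assumed here to mean that the bootstrapped 95% confidence interval for a given trial number contains 50%.

```python
def find_cutoff_trial(ci_low, ci_high, n_obs, n_max, chance=50.0):
    """Return the 1-based trial number q from which trials are dropped.

    ci_low, ci_high: bootstrapped 95% CI bounds of % correct, per trial number.
    n_obs: number of observations available at each trial number.
    n_max: maximum possible observations per trial number (mice * sessions).
    Returns None if neither condition is ever met.
    """
    # Assumption: "at chance" means the confidence interval includes 50%.
    at_chance = [lo <= chance <= hi for lo, hi in zip(ci_low, ci_high)]
    for i in range(len(at_chance)):
        window = at_chance[i:i + 5]  # the q-th trial plus the following four
        cond_a = at_chance[i] and sum(window) >= 3   # condition (a)
        cond_b = n_obs[i] < 0.25 * n_max             # condition (b)
        if cond_a or cond_b:
            return i + 1
    return None
```

For example, with pooled data in which trial 3 is at chance and two of the following four trials are also at chance, the function returns 3 and trials 3 and above would be excluded.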
Behavioral measurements
Response accuracy (% correct) was calculated as the number of correct trials divided by the total number of trials responded (correct plus incorrect). Reaction time (RT) was defined as the time between the start of stimulus presentation and response nose-touch, both detected by the touchscreen. In the experiment involving stimulus onset delays (Fig. 5A), RT was computed with respect to trial initiation (for clarity; as opposed to from stimulus onset), and denoted ‘RT w.r.to trial initiation’.
Signal detection analysis (sensitivity and criterion)
In the framework of signal detection theory, we assigned the correct vertical trials as ‘hits’, incorrect vertical trials as ‘misses’, correct horizontal trials as ‘correct rejections’ and incorrect horizontal trials as ‘false alarms’, and calculated the perceptual sensitivity (d’) and criterion (c) accordingly [39] (Fig. S1B). Because of the inherent symmetry in 2-AFC tasks, this calculation was independent of which grating orientation – vertical or horizontal – was assigned as ‘signal’ and which as ‘noise’. Consequently, a positive value of c caused poor performance just as much as the corresponding negative value, and therefore, we quantified the absolute value of c (|c|) as the relevant metric of decision criterion.
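The trial assignment above maps onto the standard signal detection formulas. The Python sketch below assumes the textbook definitions d’ = z(hit rate) − z(false alarm rate) and c = −[z(hit rate) + z(false alarm rate)]/2; the source cites [39] without stating the formulas, so this choice is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def dprime_and_abs_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', |c|) from trial counts.

    Assumes the textbook definitions; rates of exactly 0 or 1 are not
    handled here and would raise an error in inv_cdf.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, abs(criterion)


# Vertical treated as 'signal' (hypothetical counts)
d1, c1 = dprime_and_abs_criterion(85, 15, 25, 75)
# Horizontal treated as 'signal': the four counts swap roles
d2, c2 = dprime_and_abs_criterion(75, 25, 15, 85)
print(d1, c1)
print(d2, c2)
```

Swapping which orientation is labeled 'signal' leaves d’ unchanged and only flips the sign of c, which is why |c| is the quantity reported.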
Drift diffusion modeling of RT distributions
To shed light on potential mechanisms underlying the observed RT distributions, we applied the drift-diffusion model to our RT data [40, 41]. This model hypothesizes that a subject (‘decision maker’) collects information from the sensory stimulus via sequential sampling, causing sensory evidence to accrue for or against a particular option (usually binary) while viewing the stimulus. A decision is made when the accumulating evidence reaches an internal threshold of the subject. This process of evidence accumulation and threshold crossing, together with the processes of sensory encoding and motor execution, determines the RT observed on each trial (Fig. 1-1D).
We used a standard version of the model that consists of four independent parameters [37, 42]: (1) the drift rate, (2) the boundary separation, (3) the starting point, and (4) a non-decisional constant (tdelay), which accounts for the time spent in sensory encoding and motor execution. In the case of our tasks, there was no reason for the drift rate to differ between vertical and horizontal gratings, and therefore, we merged both types of trials (trials with a horizontal target grating and trials with a vertical target grating). We treated ‘correct’ and ‘incorrect’ responses as the two binary options, and fit the diffusion model to the RT distributions of correct versus incorrect trials using the fast-dm-30 toolbox with the maximum likelihood option to obtain estimates of the four parameters for each individual mouse [40].
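To make the roles of the four parameters concrete, the sketch below simulates single trials of a basic drift-diffusion process in Python. It is a forward simulation for illustration only and does not reproduce the fast-dm fitting procedure. The drift rate, boundary separation, and starting point are hypothetical values; the non-decisional constant of 0.212 s is the average tdelay reported in this study.

```python
import random


def simulate_ddm_trial(drift, boundary, start_frac, t_delay, rng,
                       dt=0.001, noise=1.0):
    """Simulate one trial; return (reached_upper_boundary, rt_in_seconds).

    The upper boundary stands for a 'correct' response and the lower
    boundary (0) for an 'incorrect' one.
    """
    x = start_frac * boundary          # starting point between 0 and boundary
    t = 0.0
    step_sd = noise * dt ** 0.5        # s.d. of the diffusion step
    while 0.0 < x < boundary:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return x >= boundary, t + t_delay  # add the non-decisional time


rng = random.Random(0)
# Hypothetical drift, boundary and start values; t_delay from the Results.
trials = [simulate_ddm_trial(2.0, 1.0, 0.5, 0.212, rng) for _ in range(500)]
frac_upper = sum(hit for hit, _ in trials) / len(trials)
fastest_rt = min(rt for _, rt in trials)
print(frac_upper, fastest_rt)
```

In this formulation no simulated RT can be shorter than the non-decisional constant, which is the property that lets the fitted tdelay be read as a combined sensory and motor overhead.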
Conditional accuracy analysis
In order to get the full distribution of RT, trials from all mice were pooled together and treated as if they were from one single mouse. Pooled trials were then sorted by their RT, and then binned by RT into 50ms, 100 ms or 200 ms bins, depending on the total number of trials available in each experiment. Conditional accuracy was calculated as the number of correct trials divided by the total number of trials for each RT bin.
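A minimal Python sketch of this binning step is given below. The function name and the dictionary output are hypothetical choices for illustration; the computation itself (correct trials divided by total trials within each RT bin) follows the description above.

```python
def conditional_accuracy(rts_ms, correct, bin_ms=100):
    """Map each RT bin start (ms) to (% correct, number of trials in the bin)."""
    counts = {}
    for rt, ok in zip(rts_ms, correct):
        start = int(rt // bin_ms) * bin_ms           # left edge of the RT bin
        n_correct, n_total = counts.get(start, (0, 0))
        counts[start] = (n_correct + int(ok), n_total + 1)
    return {start: (100.0 * c / n, n)
            for start, (c, n) in sorted(counts.items())}


# Hypothetical pooled trials: RTs in ms and whether each response was correct
rts = [120, 180, 250, 260, 270, 410]
outcomes = [False, True, True, True, False, True]
for start, (acc, n) in conditional_accuracy(rts, outcomes).items():
    print(start, round(acc, 1), n)
```

Returning the trial count alongside the accuracy makes it easy to see which bins are sparsely populated, which is the reason the bin width was chosen according to the number of trials available.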
Conditional accuracy function (CAF)
To quantitatively describe the relationship between the conditional accuracy and RT, we fitted the plot of discrimination accuracy against (binned) RT with different functions (the CAF, see below) using a nonlinear least square method.
For RT bins aligned to stimulus onset (Fig. 2, 4C, 5C), we fit the data using an asymptotic function: $\mathrm{accuracy} = \lambda\,\bigl(1 - e^{-\gamma_{enc}(RT - \delta)}\bigr)$. Three key metrics were defined for the sensory encoding phase for use in subsequent comparisons between trial conditions: (1) the peak conditional accuracy (apeak), the maximal level of accuracy that the CAF reaches within the range of RT; (2) the time point at which the conditional accuracy reaches its maximum (tpeak), defined as the time point at which the ascending CAF reaches apeak × 0.95; and (3) the time point at which the conditional accuracy just exceeds the chance level of performance (t50), defined as the time point at which the ascending CAF crosses 52.5% (i.e., 50% × 1.05). Note that tpeak and t50 are influenced by the slope parameter, $\gamma_{enc}$, and the temporal offset at chance performance, $\delta$.
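Once the asymptotic function has been fit, the three metrics follow in closed form by inverting it. The Python sketch below illustrates this with hypothetical parameter values (accuracy in percent, RT in ms); it is not the authors' MATLAB implementation, and the function names are assumptions.

```python
import math


def caf_rising(rt, lam, gamma, delta):
    """Rising asymptotic CAF: lam * (1 - exp(-gamma * (rt - delta)))."""
    return lam * (1.0 - math.exp(-gamma * (rt - delta)))


def time_to_reach(level, lam, gamma, delta):
    """RT at which the rising CAF equals `level` (requires level < lam)."""
    return delta - math.log(1.0 - level / lam) / gamma


# Hypothetical fit values, chosen only for illustration
lam, gamma, delta = 88.0, 0.01, 150.0
rt_max = 1500.0                                   # upper end of the RT range

a_peak = caf_rising(rt_max, lam, gamma, delta)    # maximal accuracy in range
t_peak = time_to_reach(0.95 * a_peak, lam, gamma, delta)
t_50 = time_to_reach(52.5, lam, gamma, delta)
print(a_peak, t_peak, t_50)
```

Because both time points are obtained by inverting the same fitted curve, any change in the slope or offset parameter shifts tpeak and t50 together, as noted in the text.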
For RT bins aligned to stimulus offset (Fig. 3D, 4E, 5D), we fit the data using a sigmoid function, $\mathrm{accuracy} = \lambda \left[\frac{1}{1 + e^{-\beta_{dec}(RT - \tau)}}\right] + 50$, to quantify the time course of performance decay. Two key metrics were defined for this VSTM phase for use in subsequent comparisons between trial conditions: (1) the first time point at which the conditional accuracy drops from its maximum (tdecay), defined as the time point at which the descending CAF crosses apeak × 0.95; and (2) the first time point at which the conditional accuracy drops to a level indistinguishable from chance (tchance), defined as the time point at which the descending CAF crosses 52.5%. In (rare) cases when the CAF never went below 52.5%, tchance was set to 3000 ms. Note that tdecay and tchance are influenced by the slope parameter, $\beta_{dec}$, and by $\tau$.
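The decay metrics can likewise be obtained by inverting the fitted sigmoid. The Python sketch below uses hypothetical parameter values, with a negative slope so that the curve descends from its plateau toward 50%. Two points are assumptions made for illustration: apeak is taken as the value of the curve at stimulus offset, and the 3000 ms fallback is applied whenever the crossing would fall beyond 3000 ms.

```python
import math


def caf_decay(rt, lam, beta, tau):
    """Sigmoid CAF aligned to stimulus offset; descends when beta < 0."""
    return lam / (1.0 + math.exp(-beta * (rt - tau))) + 50.0


def time_to_cross(level, lam, beta, tau, cap_ms=3000.0):
    """Offset-aligned RT at which the sigmoid equals `level`.

    Returns cap_ms when the level is never reached before cap_ms.
    """
    frac = (level - 50.0) / lam
    if not 0.0 < frac < 1.0:
        return cap_ms
    t = tau - math.log(1.0 / frac - 1.0) / beta
    return min(t, cap_ms)


# Hypothetical fit values, chosen only for illustration
lam, beta, tau = 36.0, -0.005, 1200.0
a_peak = caf_decay(0.0, lam, beta, tau)      # assumed: accuracy at offset
t_decay = time_to_cross(0.95 * a_peak, lam, beta, tau)
t_chance = time_to_cross(52.5, lam, beta, tau)
print(a_peak, t_decay, t_chance)
```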
The confidence interval of the CAF and each metric were estimated by bootstrapping: the same number of trials were resampled from the raw data randomly with replacement, and were then processed following the same steps as described above to get repeated estimates of the CAF and corresponding metrics. Such resampling was repeated 1000 times to estimate the dispersion of each metric. Plots of the estimated value of each metric show the mean ± std of the bootstrapped distribution of estimates (Fig. 2CE, 3D, 4DF, 5-1BC).
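The resampling scheme described above can be sketched in Python as follows. For brevity, the metric in this example is overall percent correct rather than a CAF parameter, which is a simplification; in the actual analysis, the metric would be a value derived from refitting the CAF on each resample. The function names are hypothetical.

```python
import random
import statistics


def bootstrap_metric(trials, metric, n_boot=1000, seed=0):
    """Return (mean, s.d.) of `metric` over bootstrap resamples of `trials`."""
    rng = random.Random(seed)
    n = len(trials)
    estimates = []
    for _ in range(n_boot):
        # Resample the same number of trials, with replacement
        resample = [trials[rng.randrange(n)] for _ in range(n)]
        estimates.append(metric(resample))
    return statistics.mean(estimates), statistics.stdev(estimates)


def percent_correct(trials):
    """Stand-in metric: overall accuracy of a list of True/False outcomes."""
    return 100.0 * sum(trials) / len(trials)


outcomes = [True] * 80 + [False] * 20        # hypothetical trial outcomes
boot_mean, boot_sd = bootstrap_metric(outcomes, percent_correct)
print(boot_mean, boot_sd)
```

Passing the metric as a function keeps the resampling loop unchanged when the stand-in is replaced by a CAF fit.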
Statistical tests
All analyses and statistical tests were performed in MATLAB. For single-stimulus experiments in which only one stimulus parameter was systematically varied, one-way ANOVA was applied to examine the effect of manipulating that single factor (duration and delay, Fig. 3ABC, 5AB, 1-1J). For experiments that involved changing both stimulus size and contrast (Fig. 1CDE, 1-1E-H), two-way ANOVA was applied to examine the effect of each factor, as well as their interaction.
For the flanker task, the Wilcoxon signed-rank test was used to examine if the group performance was different between trial types (Fig. 4B).
The Pearson correlation coefficient and associated p-value were calculated for paired data (Fig. 3ABC, 5AB, 1-1I) using the corrcoef function in MATLAB; correlation was also used to evaluate whether there was a trend in data that did not show significance with ANOVA.
For the metrics associated with CAF, permutation tests were used to determine if the estimated values of each metric were different between experimental conditions (Fig. 2CE, 4DF, 5-1BC). Specifically, trials from both conditions were pooled together (unlabeled), and randomly re-assigned into two groups. The best-fit CAF and associated metrics were then calculated for each group, and so was the difference of metrics between groups. Following 1000 repetitions, the resulting distribution of the difference of metrics between groups was obtained under the null hypothesis that the data from the two conditions were indistinguishable (i.e., from the same distribution). The real, observed difference of metrics obtained from the two experimental conditions (for instance, Δapeak from low-contrast vs. high contrast conditions) was compared against the null distribution to compute the corresponding p-value.
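The permutation procedure can be sketched in Python as below. Two simplifications are assumed: the metric is overall percent correct rather than a parameter of a refitted CAF, and the p-value is computed as a two-sided proportion of permuted differences at least as large as the observed one (the text does not state the sidedness). The function names are hypothetical.

```python
import random


def percent_correct(trials):
    """Stand-in metric: overall accuracy of a list of True/False outcomes."""
    return 100.0 * sum(trials) / len(trials)


def permutation_p_value(group_a, group_b, metric, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference in `metric`."""
    rng = random.Random(seed)
    observed = metric(group_a) - metric(group_b)
    pooled = list(group_a) + list(group_b)   # pool trials, discarding labels
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # random re-assignment to groups
        diff = metric(pooled[:n_a]) - metric(pooled[n_a:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    return n_extreme / n_perm


# Hypothetical outcomes for two conditions
low_contrast = [True] * 60 + [False] * 40
high_contrast = [True] * 90 + [False] * 10
p = permutation_p_value(low_contrast, high_contrast, percent_correct)
print(p)
```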
Correction for multiple comparisons was performed where necessary using the Holm-Bonferroni test (HB test) for multiple comparisons.
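For reference, the Holm-Bonferroni step-down procedure can be written in a few lines of Python. This is a generic sketch of the standard procedure with hypothetical p-values, not the authors' MATLAB code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k + 1)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject


# Hypothetical p-values from three pairwise comparisons
print(holm_bonferroni([0.01, 0.04, 0.03]))
```

In this example only the first comparison survives correction: 0.01 passes the threshold of 0.05/3, but 0.03 exceeds 0.05/2, so the procedure stops there.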
RESULTS
In this study, freely behaving mice were trained to perform 2-AFC orientation discrimination in a touchscreen-based setup [24, 28] (Methods). Briefly, mice were placed in a plexiglass tube within a soundproof operant chamber equipped with a touch-sensitive screen at one wall and a reward well at the opposite wall (Fig. 1A, S1A). A plexiglass sheet with three holes was placed in front of the touchscreen; the holes corresponded to the locations at which mice were allowed to interact with the screen by a nose-touch (Fig. 1A). All trials began with a nose-touch on a bright zeroing-cross presented within the lower central hole (Fig. 1B). Immediately following nose-touch, an oriented grating (target; bright stimulus on a dark background) was presented at the center of the screen. Mice were rewarded if they responded to the orientation of the target with an appropriate nose-touch: vertical (horizontal) grating → touch within upper left (upper right) hole. Behavioral data were collected from daily sessions that lasted 30 minutes for each mouse.
Stimulus size and contrast modulate mouse discrimination performance
We first examined the effect of target size and target contrast on the decision performance of mice in the orientation discrimination task. Here, the target grating was presented for up to 3 seconds after trial initiation (Fig. 1B; Methods), and its size and contrast were systematically varied; the spatial frequency was fixed at 0.1 cycles/degree (24 pixels/cycle) [16, 17] (Methods). Mice were allowed to respond at any time during stimulus presentation, and the stimulus was terminated automatically upon response.
We found that both the target contrast and size significantly modulated discrimination accuracy (Fig. 1C, 2-way ANOVA, main effect of contrast, p<0.001; main effect of size, p<0.001; interaction, p=0.498). The improvements in response accuracy observed with increasing target contrast as well as target size were accompanied by improvements in perceptual sensitivity (Fig. 1D, filled symbols; 2-way ANOVA, main effect of contrast, p<0.001; main effect of size, p<0.001; interaction, p=0.899), but no detectable change in decision criterion (Fig. 1D, open symbols). These results revealed that mice discriminated target orientation better than chance even at the lowest contrast (20%) and size (25°) tested (Fig. 1C; red dot at the left lower corner, p=0.039, Wilcoxon signed rank test). Additionally, at this smallest target size (25°), mice could discriminate with >80% accuracy and d’>2 s.d. for most of the stimulus contrasts (≥54%; Fig. 1CD, red data).
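To make the distinction between sensitivity and criterion concrete, the following sketch computes d’ and c from two response rates using the textbook signal detection formulas. It assumes that a ‘vertical’ response to a vertical target is counted as a hit and a ‘vertical’ response to a horizontal target as a false alarm; the exact formulas used in the study are given in its Methods and may differ. The function name is hypothetical.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Textbook SDT estimates; rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa              # separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)   # 0 corresponds to an unbiased observer
    return d_prime, criterion

# Hypothetical symmetric observer: 85% hits, 15% false alarms.
d_prime, c = dprime_and_criterion(0.85, 0.15)
abs_c = abs(c)  # magnitude of bias, irrespective of its direction
```

With symmetric rates such as these, the criterion is zero and all of the accuracy is attributable to sensitivity.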
The effect of these parameters on median reaction times (RTs) was less pronounced. Target size, but not contrast, modulated RTs (Fig. 1E, two-way ANOVA; main effect of size, p=0.004; main effect of contrast, p=0.998; interaction, p=1).
We also analyzed the full RT distributions using the drift diffusion model (DDM) [37, 42], a four-parameter model fit to the RT distributions corresponding to the two choices (left vs. right; Methods; Fig. 1-1D). We found that higher contrasts caused faster rates of evidence accumulation (drift rate, Fig. 1-1E), and also caused a systematic change in the starting point (Fig. 1-1G). These changes were complementary (Fig. 1-1I), thereby accounting for the lack of effect of target contrast on median RTs despite the increase in drift rate (consistent with published reports [24]).
The DDM analysis also yielded a quantitative estimate of tdelay, a parameter that represents the combined ‘delay overheads’ underlying a perceptual decision process. tdelay accounts for: (a) the time taken for the sensory (visual) periphery to transduce and relay information to visual brain areas (i.e., neural response latency), as well as (b) the time taken for executing the motor response (i.e., motor execution delay). In our tasks, the latter corresponds to the time for the mouse to move its head (and body) to achieve the appropriate nose-touch. Notably, we found that stimulus contrast as well as size had no discernible effect on tdelay (Fig. 1-1H; 2-way ANOVA, size: p=0.308, contrast: p=0.523; interaction: p=0.931), and the average value of tdelay was 212 ms.
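The role of the four parameters, and of tdelay in particular, can be illustrated with a simple forward simulation of a two-boundary drift diffusion process. This is only a sketch of the generative model under assumed settings (Euler steps of 1 ms, unit noise, a 5 s cutoff); it is not the fitting procedure used in the study, and all names and parameter values are hypothetical.

```python
import random

def simulate_ddm_trial(drift, boundary_sep, start_frac, t_delay, rng,
                       dt=0.001, noise_sd=1.0, max_t=5.0):
    """Simulate one trial; return (choice, rt_in_seconds)."""
    x = start_frac * boundary_sep      # starting point (0.5 = unbiased)
    t = 0.0
    step_sd = noise_sd * dt ** 0.5     # noise scales with sqrt(dt)
    while 0.0 < x < boundary_sep and t < max_t:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    if x >= boundary_sep:
        choice = "upper"
    elif x <= 0.0:
        choice = "lower"
    else:
        choice = None                  # no boundary reached before the cutoff
    # The non-decision time (sensory latency + motor execution) adds to the RT.
    return choice, t + t_delay

rng = random.Random(1)
trials = [simulate_ddm_trial(2.0, 1.0, 0.5, 0.2, rng) for _ in range(200)]
n_upper = sum(choice == "upper" for choice, _ in trials)
n_lower = sum(choice == "lower" for choice, _ in trials)
```

Because tdelay is added after the accumulation process ends, it shifts the whole RT distribution without changing choice probabilities, which is why it can be read as a fixed overhead.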
Together, these results revealed a systematic effect of target contrast as well as size on discrimination accuracy, driven by an effect on perceptual sensitivity rather than response criterion. Additionally, they revealed that the combined delay overhead in the visual decision process was nearly constant across conditions, at ~200 ms (tdelay).
Effect of stimulus size and contrast on dynamics of visual decision-making: the sensory encoding stage
To investigate the dynamics of visual perceptual decision-making, we adapted approaches from human studies and examined the dependence of response accuracy on RT, i.e., the so-called ‘conditional accuracy’ function [9–11]. To this end, we first pooled trials across the different trial conditions (3 sizes x 7 contrasts) from all mice (n=8), sorted them based on RT, and plotted conditional accuracy for each RT bin (100 ms; Fig. 2A-orange dots; Methods). We found that for responses with RT less than 500 ms, conditional accuracy improved for longer RT (Pearson’s ρ=0.99, p=0.02), consistent with the classic ‘speed-accuracy tradeoff’ [34]. For responses with RT greater than 500 ms and up to 3 s, the allowed duration for responses, conditional accuracy plateaued, and was independent of RT (Pearson’s ρ=0.33, p=0.11). Drawing upon arguments from human behavioral studies, we reasoned that the initial transient stage of the conditional accuracy function reflects the process of sensory encoding: during it, slower responses allow more sensory evidence to be acquired, thereby improving conditional accuracy up to a peak value reflecting the completion of sensory encoding [29–32].
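The binning step that produces the conditional accuracy data can be sketched as follows: trials are grouped into 100 ms RT bins and the fraction of correct responses is computed per bin. The function name and the example trials are hypothetical, and details such as the handling of sparsely populated bins are described in the Methods rather than here.

```python
def conditional_accuracy(rts_ms, correct, bin_width_ms=100):
    """Return {bin_start_ms: (accuracy, n_trials)} for each occupied RT bin."""
    counts = {}
    for rt, ok in zip(rts_ms, correct):
        start = int(rt // bin_width_ms) * bin_width_ms  # left edge of the bin
        n, n_correct = counts.get(start, (0, 0))
        counts[start] = (n + 1, n_correct + int(ok))
    return {start: (n_correct / n, n)
            for start, (n, n_correct) in sorted(counts.items())}

# Hypothetical trials: RTs in ms and whether each response was correct.
rts = [150, 180, 250, 260, 270, 420]
outcomes = [0, 1, 1, 1, 0, 1]
caf = conditional_accuracy(rts, outcomes)
```

Returning the trial count alongside each accuracy makes it easy to exclude or down-weight bins with few observations before fitting.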
We next quantified the dynamics of sensory encoding by fitting the conditional accuracy data with an asymptotic function (Fig. 2A – orange curve) [9–11], and estimating three key metrics: (1) the peak conditional accuracy (apeak), (2) the timepoint at which conditional accuracy just exceeded 50% (chance) performance (t50; Methods), and (3) the timepoint at which conditional accuracy reached its peak (tpeak; Methods).
To examine the effect of stimulus size on sensory encoding dynamics, we fit trials of different stimulus sizes separately (Fig. 2B), and estimated the key metrics in each case (Methods; all contrasts included, all mice). We found that the peak conditional accuracy was significantly modulated by stimulus size (Fig. 2C – left; apeak, size 25°, mean ± s.d. =81.3 ± 1.2%; size 35° =88.0 ± 0.7%; size 45° =92.4 ± 0.9%; *, p<0.05, permutation tests with HBMC correction). The time to exceed chance performance and the time to reach peak accuracy, however, were not (t50, Fig. 2C-middle, size 25°=190 ± 31 ms, size 35°=221 ± 14 ms, size 45°=193 ± 20 ms; tpeak, Fig. 2C-right, size 25°=491 ± 56 ms, size 35°=461 ± 22 ms, size 45°=467 ± 26 ms).
To examine the effect of stimulus contrast on sensory encoding dynamics (Fig. 2D), we divided trials (all sizes included, all mice) into two subsets: (1) trials with target contrast ≤54% (‘low-contrast’), and (2) trials with target contrast >54% (‘high-contrast’; Methods). We fit each subset separately and found that the peak conditional accuracy was significantly modulated by stimulus contrast (Fig. 2E-left; apeak: low-contrast = 84.6 ± 0.9%; high-contrast = 89.5 ± 0.6%, p<0.001, permutation test). There was no significant effect of stimulus contrast on t50 (Fig. 2E-middle, low-contrast = 213 ± 20 ms; high-contrast = 207 ± 12 ms, p=0.747, permutation test), but the time to reach peak accuracy was significantly modulated (Fig. 2E-right; tpeak: low-contrast = 532 ± 30 ms; high-contrast = 412 ± 17 ms, p<0.001, permutation test). The shorter tpeak despite higher apeak at higher contrasts indicated that the rate of sensory encoding was faster for higher contrast stimuli.
The findings that contrast and size alter apeak (and tpeak) demonstrate the causal role of stimulus features in controlling fundamental properties of sensory encoding dynamics. Additionally, because tpeak represents the window for sensory encoding combined with the delay ‘overheads’, we estimated the duration of just the sensory encoding stage (temporal integration window) as tpeak – tdelay, varying between 212 ms (412 ms – 200 ms; high contrast) and 332 ms (532 ms – 200 ms; low contrast) across conditions.
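The subtraction above can be written out explicitly. The snippet uses the tpeak values reported for the two contrast subsets and the rounded tdelay of 200 ms; the function and variable names are hypothetical.

```python
T_DELAY_MS = 200  # rounded estimate of the combined delay overhead (tdelay)

def encoding_window_ms(t_peak_ms, t_delay_ms=T_DELAY_MS):
    """Duration of the sensory encoding stage: tpeak minus tdelay."""
    return t_peak_ms - t_delay_ms

windows = {
    "high contrast": encoding_window_ms(412),
    "low contrast": encoding_window_ms(532),
}
```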
Thus, conditional accuracy analysis revealed the presence of a sensory encoding stage in mouse visual perceptual dynamics, governed by the ‘speed-accuracy tradeoff’, and lasting up to 332 ms.
Stimulus duration and the dynamics of visual decision-making: the memory-dependent stage
Following the sensory encoding stage, the next stage in the time course of perceptual decisions identified in human studies is one that reflects completion of sensory encoding and the availability of a fully constructed internal representation of the target stimulus for guiding behavior [43]. Consequently, during this so-called internal or ‘short-term memory’ (STM)-dependent stage, longer reaction times and additional sampling do not produce improvements in accuracy [30]. Additionally, during this stage, a drop in accuracy is observed once stimulus input is removed, typically attributed to the decay of information in STM [33–36].
In the experiments so far, following the sensory encoding stage (ending at tpeak=532 ms; Fig. 2A), we observed no further increase in performance with longer RTs, consistent with human studies. Additionally, we also observed that the performance plateaued and did not exhibit a decay, consistent with the target stimulus being present throughout the response window of 3s (Fig. 2A, B, D).
To test for the existence of the second ‘decaying’ stage in mouse visual decision dynamics, we performed an experiment in which we shortened the stimulus duration systematically from 3 s. This allowed us to examine decision behavior following stimulus offset. We reasoned that for trials in which mice initiated responses after the stimulus disappeared and, additionally, after the sensory encoding stage was also complete, a decline of conditional accuracy with longer RTs would reflect the reliance of mice on STM information for the production of correct responses [44–48]. In this experiment, stimulus size and contrast were held fixed at 25° and 98%, respectively.
As a first step in the analysis, we examined overall behavioral performance at different stimulus durations and found that it was significantly modulated (Fig. 3A, one-way ANOVA, p=0.047), with accuracy decreasing with decrease in stimulus duration (Pearson’s ρ=0.74, p=0.01). This effect was driven by a commensurate effect on perceptual sensitivity (Fig. 3B, filled data; one-way ANOVA, p=0.001; Pearson’s ρ=0.74, p=0.01) but not decision criterion (Fig. 3B, hollow data; one-way ANOVA, p=0.802). There was also a trend of decreased RT as the stimulus duration decreased (Fig. 3C, one-way ANOVA, p=0.133; Pearson’s ρ=0.86, p<0.001). These results revealed, additionally, that the shortest stimulus duration needed for mice to be able to discriminate above chance was less than 100 ms – the smallest duration tested (Fig. 3C).
Next, to examine decision dynamics, we constructed the conditional accuracy function. As motivated above, we wished to include only those trials on which mice initiated responses after the stimulus disappeared, and also after the sensory encoding stage could have been completed. Therefore, we included only the trials on which the reaction time was longer than stimulus duration + 200 ms (estimate of tdelay from Fig. S1G), and on which the stimulus was presented for longer than 332 ms (estimate of the duration of the sensory encoding stage from Fig. 2A). We aligned these trials to stimulus offset, and computed the conditional accuracy. We observed the classic decay in conditional accuracy with longer RTs (Fig. 3D).
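The two inclusion criteria and the re-alignment to stimulus offset can be sketched as a simple filter. The trial representation (tuples of RT, stimulus duration, and correctness) and the function name are hypothetical; the thresholds are the 200 ms and 332 ms estimates quoted above.

```python
T_DELAY_MS = 200    # estimate of tdelay
ENCODING_MS = 332   # estimate of the sensory encoding duration

def select_stm_trials(trials, t_delay_ms=T_DELAY_MS, encoding_ms=ENCODING_MS):
    """Keep trials whose response was initiated after stimulus offset and
    after encoding could be complete; return (rt_from_offset_ms, correct)."""
    selected = []
    for rt_ms, stim_dur_ms, correct in trials:
        responded_after_offset = rt_ms > stim_dur_ms + t_delay_ms
        encoding_possible = stim_dur_ms > encoding_ms
        if responded_after_offset and encoding_possible:
            selected.append((rt_ms - stim_dur_ms, correct))  # align to offset
    return selected

# Hypothetical trials: (RT in ms, stimulus duration in ms, correct?).
example_trials = [
    (900, 500, True),     # kept: 900 > 700 and 500 > 332
    (600, 500, True),     # dropped: response initiated before offset
    (700, 100, False),    # dropped: stimulus shorter than encoding window
    (2500, 1000, False),  # kept
]
stm_trials = select_stm_trials(example_trials)
```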
To quantify the time course of the decay, we fit the conditional accuracy data with a sigmoidal function, and estimated three key metrics (Fig. 3D). First, the peak performance, apeak, was 85.6 ± 3.1%, comparable to the asymptotic level of Figure 2A, thereby supporting that sensory encoding is, indeed, complete on these trials. Second, the time point at which the conditional accuracy started to decline, tdecay, was ~750 ms (762 ± 271 ms) after stimulus offset. Third, the first time point at which the discrimination accuracy dropped to a level indistinguishable from chance, tchance, was ~1900 ms (1916 ± 385 ms) after stimulus offset (Methods).
Thus, similar to human studies, our results revealed a second stage in mouse visual perceptual dynamics in which, following completion of sensory encoding, performance after stimulus offset decreases gradually with reaction time, consistent with decay of information about the stimulus maintained in STM (see also Discussion). They revealed 1700 ms (1900 ms minus the 200 ms of tdelay) as an estimate of the duration after stimulus offset beyond which information for correct responding is no longer available to the animals.
Task-relevant foil (‘flanker’) modulates the sensory encoding stage of the conditional accuracy function
The sensory context in which the perceptual target is presented is well known to modulate animals’ performance [49–51]. For instance, as demonstrated in the classic flanker task in humans, the co-occurrence of a foil stimulus with conflicting information can interfere with perceptual performance [52, 53]. Recently, similar results were demonstrated in mice using a touchscreen version of the flanker task [24]. In this task (Fig. 4A), a target grating (always presented at the lower location) was accompanied by a flanker grating at the upper location with either orthogonal orientation (‘incongruent’ flanker) or same orientation (‘congruent’ flanker). Compared to the presence of a congruent flanker, the ‘incongruent’ flanker significantly impaired discrimination accuracy (Fig. 4B-left; p<0.001, signed-rank test; re-plotted based on data from [24]; Methods) and median RT (Fig. 4B-right, p=0.019, signed-rank test). Here, we analyzed that dataset with the conditional accuracy analysis to investigate whether an incongruent flanker affected the sensory encoding stage or the STM-dependent stage of perceptual dynamics. The stimuli in this task were presented for 1 s and the response window was 3 s as before.
We investigated the effect of an incongruent flanker on perceptual dynamics by first pooling trials from all mice into two groups based on their flanker congruency, and then sorting the trials based on their RT. To investigate the sensory encoding stage, we followed the approach used in Figure 2 and selected the trials on which mice responded before the stimulus ended (RT < 1000 ms), and aligned them to stimulus onset. Separately, to investigate the STM-dependent stage, we followed the approach in Figure 3 and selected the trials on which responses were initiated after the stimulus disappeared (i.e., with RT – tdelay > 1000 ms), and thereby, by default, also after the time needed for sensory encoding to be completed (i.e., RT – tdelay > 332 ms), and aligned them to stimulus offset. (tdelay was taken to be 200 ms from Fig. 2 and 3. Fitting a DDM to the RT distributions from this dataset revealed that tdelay was ~200 ms, not different from the estimates in the previous experiments: mean tdelay = 205 ms for congruent trials, and 209 ms for incongruent trials.)
We found that in the sensory encoding stage (Fig. 4CD), the peak conditional accuracy for incongruent trials was significantly lower than that of congruent trials (Fig. 4D-left; congruent = 88.6 ± 0.8%, incongruent = 81.9 ± 0.5%; p<0.001, permutation test), and the time at which performance just exceeded the 50% (chance) level was longer for incongruent trials (Fig. 4D-middle; t50: congruent = 223 ± 24 ms; incongruent = 243 ± 8 ms; p=0.022, permutation test); both results indicate that the presence of an incongruent flanker interfered with the sensory encoding of the target stimulus. The time to reach peak accuracy was, however, shorter for incongruent trials (Fig. 4D-right; tpeak: congruent = 433 ± 42 ms; incongruent = 371 ± 11 ms; p=0.01, permutation test), consistent with the lower apeak (Fig. 4D-left).
By contrast, there was no effect of flanker congruency on the time course of decay of conditional accuracy following stimulus offset. The time at which conditional accuracy dropped to chance was not different between congruent and incongruent flanker trials (Fig. 4EF; tchance: congruent = 1998 ± 274 ms; incongruent = 1744 ± 337 ms, p=0.426, permutation test), nor was the time at which conditional accuracy dropped just below apeak (tdecay: congruent = 302 ± 201 ms; incongruent = 207 ± 79 ms, p=0.313, permutation test).
In sum, we found that the interference in performance due to the incongruent flanker impacted the sensory encoding stage (apeak; as if weakening the target [54, 55]), but not the STM-dependent stage, of mouse visual decision dynamics.
Stimulus onset delay modulates RT distribution but not the conditional accuracy function
In investigating behavioral performance, overall decision accuracy can be decomposed into two components – the conditional accuracy function and RT distribution (Fig. 2A); the dot product of these two quantities yields overall accuracy. Our manipulations, thus far, produced changes in the conditional accuracy function predominantly. We wondered whether task parameters could alter RT distribution and possibly do so without affecting conditional accuracy function. To test this, we added a delay between trial initiation and target onset (i.e., stimulus onset delay) in the single stimulus discrimination task. We reasoned that the extent to which mice are able to adaptively withhold responding could impact the RT distribution.
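The decomposition of overall accuracy into these two components can be illustrated numerically. In the sketch below, the same hypothetical conditional accuracy function is combined with two hypothetical RT distributions; the function name and all values are illustrative and are not taken from the data.

```python
def overall_accuracy(caf, rt_fraction):
    """Dot product of per-bin conditional accuracy and per-bin trial fraction."""
    if len(caf) != len(rt_fraction):
        raise ValueError("caf and rt_fraction must have one entry per RT bin")
    total = sum(rt_fraction)  # normalise in case fractions do not sum to 1
    return sum(a * f for a, f in zip(caf, rt_fraction)) / total

# Hypothetical conditional accuracy in four consecutive RT bins.
caf = [0.5, 0.7, 0.9, 0.9]
# Two hypothetical RT distributions over the same bins.
rt_late = [0.1, 0.2, 0.4, 0.3]   # most responses after accuracy has peaked
rt_early = [0.3, 0.4, 0.2, 0.1]  # more responses in the early, low-accuracy bins

acc_late = overall_accuracy(caf, rt_late)
acc_early = overall_accuracy(caf, rt_early)
```

Shifting the RT distribution toward earlier bins lowers overall accuracy even though the conditional accuracy function itself is unchanged.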
We found that adding a stimulus onset delay does alter the RT distribution of mice (Fig. 5A-left; RT computed relative to trial initiation). The median RTs (relative to trial initiation) increased with delay (Fig. 5A-right; Pearson’s ρ = 0.999, p=0.035). This indicated that mice were able to sense the delayed onset of the stimulus and thereby withhold their responses. Nonetheless, mice were unable to do so for the full duration required: the average increase in median RT was smaller than the increase in delay (Fig. 5A-right; average Δ median RT = 36 ms for 100 ms delay, and 79 ms for 200 ms delay). This increase in median RT at longer delays was accompanied by a trend towards lower decision accuracy (Fig. 5B; Pearson’s ρ = 0.988, p=0.098).
By contrast, conditional accuracy analysis revealed no effect of stimulus onset delay either on the sensory encoding stage (Fig. 5C and S2D; apeak: no delay = 88.3 ± 2.0%; delay = 83.3 ± 3.6%; p=0.228, permutation test; t50: no delay = 209 ± 40 ms; delay = 176 ± 31 ms; p=0.652, permutation test; tpeak: no delay = 485 ± 36 ms; delay = 442 ± 73 ms; p=0.473, permutation test), or on the STM-dependent stage (Fig. 5D and S2E; tchance: no delay = 1581 ± 277 ms; delay = 2035 ± 444 ms; p=0.261, permutation test; tdecay: no delay = 919 ± 226 ms; delay = 543 ± 277 ms; p=0.165, permutation test).
Taken together, our results from varying the stimulus onset delay show that changes in RT distribution (and/or decision accuracy) are not necessarily accompanied by changes in the conditional accuracy function, which implies that the underlying sensory processes could remain unchanged. The observed trend of decreased accuracy was accounted for by the fact that with a delay, there were more responses initiated before the sensory encoding was complete, or even before the stimulus was presented (i.e., ‘impulsive’ responses) (Fig. 5C). To quantify such impulsivity, we propose an ‘impulsivity index’ (ImpI). Motivated by the observation that mice withheld responses at longer delays consistently for ~40 ms for every 100 ms of delay (median RTdelay=100 – median RTdelay=0 = 36 ms; median RTdelay=200 – median RTdelay=100 = 43 ms), we defined ImpI = 1 – average (duration for which mice withhold responses / duration of the delay). Higher positive values of this index indicate greater impulsivity, with ImpI=1 indicating a complete inability to withhold responding in the face of stimulus delays (insensitivity to delays, or ‘maximally’ impulsive). In the case of our mice, ImpI is ~0.6.
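The index can be computed directly from the reported changes in median RT. The sketch below assumes that the withholding duration at each delay is the increase in median RT relative to the no-delay condition (36 ms at 100 ms delay, 79 ms at 200 ms delay); computing the ratio per 100 ms increment instead (36/100 and 43/100) gives nearly the same value. The function name is hypothetical.

```python
def impulsivity_index(withheld_ms, delay_ms):
    """ImpI = 1 - mean(withholding duration / delay duration)."""
    ratios = [w / d for w, d in zip(withheld_ms, delay_ms)]
    return 1 - sum(ratios) / len(ratios)

# Increase in median RT relative to no delay, for 100 ms and 200 ms delays.
imp_i = impulsivity_index(withheld_ms=[36, 79], delay_ms=[100, 200])
# Alternative reading: per-increment withholding (36 ms and 43 ms per 100 ms).
imp_i_incremental = impulsivity_index(withheld_ms=[36, 43], delay_ms=[100, 100])
```

Both readings round to the value of ~0.6 reported above.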
DISCUSSION
Findings from this study have revealed two distinct stages in the temporal dynamics of visual perceptual decisions in mice, similar to those in humans. The first is a sensory encoding stage that is subject to the speed-accuracy tradeoff; the second is a short-term memory-dependent stage in which decision performance decays once the stimulus disappears. Our experiments also demonstrate that the two dissociable components of the dynamics of decision performance, namely the conditional accuracy function and the RT distribution, can be affected independently by experimental manipulations. Whereas experiments 1-3 revealed modulation of the conditional accuracy function (by stimulus size, contrast and presence of a foil) with minimal changes to the RT distribution, experiment 4 revealed modulation of the RT distribution (by stimulus onset delay) without changes to the conditional accuracy function. Notably, our results yield quantitative estimates of fundamental psychophysical constants of visual perceptual decision-making in freely behaving mice. Together, this study establishes a quantitative platform for future work dissecting neural circuit underpinnings of the dynamics of visually guided decision-making in mice. The similarities in perceptual dynamics between mice and humans despite differences in the visual system suggest the presence of conserved principles underlying visual perceptual decision-making.
Estimates of time constants of the dynamics of visual perceptual decision-making in mice
Conditional accuracy analysis together with drift diffusion modeling and standard psychometric analyses yielded quantitative estimates of key parameters underlying the dynamics of visual decision-making in (freely behaving) mice. The duration of the sensory encoding stage (i.e., the window of temporal integration) was ≤ 332 ms across stimulus parameters and experimental conditions. Similarly, in humans, the internal representation of a visual stimulus is thought to be constructed within the first 200-300 ms of stimulus presentation [29–32]; masking after that time does not impair performance. The duration of the fixed temporal overhead in the perceptual decision process, reflecting the combination of sensory (visual) latency and movement execution, was ~200 ms (tdelay), and was largely unaffected by the tested stimulus parameters and experimental conditions. The shortest informative stimulus, i.e., the shortest stimulus duration that produced better than chance decision accuracy, was ~40 ms (estimated as t50 – tdelay = 236 ms – 200 ms; Fig. 2A). Additionally, we developed a quantitative metric of impulsivity, one that depends on animals withholding responses until task-relevant information becomes available, rather than depending on withholding responses after all the information is available, or on the ability to stop a response that is underway [56]. For our freely behaving mice, this impulsivity index (ImpI, range 0 to 1) was 0.6. (This inherent impulsivity can potentially be countered by directed/targeted training, as in [23].)
We also obtained an estimate of the duration of STM, i.e., the longest duration following stimulus offset for which mice could continue to perform above chance, as 1700 ms. This constituted the period starting from stimulus offset to 200 ms before tchance (the last instant at which responses that are better than chance were initiated) (Fig. 3D; tchance – 200 ms = ~1700 ms). This duration of viability of the labile internal representation falls in the same range as has been reported from human studies [33, 34, 57]. This estimated duration of STM does not represent, necessarily, the duration for which visual stimulus information was maintained in STM. It could, also, represent maintenance of information about the motor response associated with the stimulus, or likely, a combination of the two.
Unlike STM, which refers to the retention of information even when it is not accessible from the environment, working memory (WM) is thought of as ‘STM+,’ referring additionally to the ability to manipulate this information and protect it from interference [58, 59]. From this perspective, our estimate of 1700 ms constitutes a lower bound for WM, which can potentially be lengthened with training. Indeed, tasks that require animals to hold information during enforced delay periods can cause the duration of WM to be longer. For instance, in mice performing olfactory WM tasks, delay periods of up to 5 s have been reported [60]. Here, by allowing the natural evolution of the dynamics of decision-making to occur without an imposed delay period, we were able to estimate the ‘intrinsic’ duration of STM.
A potential factor that could confound our interpretation of the performance decay as being due to loss of information in STM is the attentional state of the animal. It is possible that the reduction in performance observed at longer RTs is due to trials in which mice did not pay attention to the stimulus, thereby reflecting lower accuracy (and longer RTs), rather than reflecting loss of information in STM. Our data from the flanker task (Fig. 4E) argue against this interpretation. In that experiment, we examined the time course of conditional accuracy after stimulus offset in two conditions in which we explicitly manipulated the attention of the animal (congruent vs. incongruent trials). We found that there was no difference in the decay of conditional accuracy between the two attentional conditions (Fig. 4E; red vs. blue). Thus, difference in attention across trials is unlikely to be the dominant factor accounting for the decay of conditional accuracy. Nonetheless, even if attention were a weak confounding factor, that would render our estimate of the duration of STM of 1700 ms as being, again, a lower bound.
Estimates of other psychophysical constants, and operating range of stimulus features for visual perceptual decision-making in mice
In addition to time constants of visual perceptual dynamics, this study yielded estimates of other psychophysical constants. The smallest stimulus and lowest contrast at which mice were able to discriminate orientation above chance were 25° and 20%, with mice performing at >80% accuracy for most contrasts at that smallest size. Mice were also able to discriminate above chance at durations as short as 40 ms. All observed changes in discrimination accuracy were accompanied by changes in sensitivity rather than decision criterion, indicating that the manipulations all modulated aspects of the perceptual process. These findings that mice are able to discriminate visual stimuli in demanding sensory contexts suggest that the visual perceptual abilities of mice may be underrated.
The range of best discrimination performance of mice observed in our single target discrimination task (75-90%, Fig. 1C) was lower than that of primates (>90%) in similar tasks [61, 62]. Rather than reflecting fundamental differences in the decision processes, this is well accounted for by the lower visual acuity of mice, together with our use of ‘small’ stimuli (relative to those typically used in mouse vision studies [15, 16, 18, 20, 63]). Indeed, with larger grating stimuli, mice can perform very well with accuracy > 90% (Fig. S1J) [18, 64]. The performance plateau of 93% for a stimulus size of 45° (Fig. 1-1J), suggests that full-field stimuli may be effectively replaced by 45° stimuli without appreciable loss in performance. In addition, the stimulus duration that was maximally effective was 1000 ms, indicating that stimuli longer than 1000 ms may not be needed to test mouse behavior effectively in future single-stimulus discrimination tasks. (We note that although the conditional accuracy reaches a peak much earlier at ~300 ms, a longer stimulus is beneficial for overall accuracy due to the non-trivial fraction of trials that occupy the right half of the RT distribution, the consequence of overall accuracy being a dot product of the conditional accuracy function and RT distribution.)
At the highest contrasts, we observed a dip in performance from the plateau performance. Since the visual system adapts to the range of stimulus contrasts for best encoding [65], it is possible that the interleaved presentation of stimuli of various contrasts rendered the full-contrast stimulus unfavorable because of signal saturation [18]. Our data are directly consistent with this idea (Fig. 1C vs. Fig. 1-1J).
Optimal sensory sampling during visual perceptual decision-making in mice
An intriguing observation across tasks was that mice responded with nearly constant RT (Fig. 1, 4, and 5; RT change ≤100ms, in contrast to the much larger change, >500 ms, seen in monkey perceptual decision-making tasks [66–68]). The conditional accuracy analysis offers a plausible account for this observation. As indicated by the conditional accuracy function, mouse response accuracy increased as RT increased until it reached a plateau at tpeak. Therefore, an optimal strategy for mice would be to respond with RTs centered around tpeak: responding earlier than tpeak would sacrifice accuracy, while responding later than tpeak would needlessly delay response (reducing the reward rate). Consistent with this expectation, their RT distribution was centered around tpeak (500-600 ms, Fig. 2ABD, 4C), indicating that mice responded at near-optimal timing in terms of the speed-accuracy tradeoff.
Extended Data
(A) Lateral view of the schematic experimental setup showing the relative position of the touchscreen (leftmost vertical line), the plexiglass mask (grey-filled vertical bar), and the tube within which mice move (50 mm diameter); the plexiglass mask is positioned 3 mm in front of the touchscreen. Dashed lines indicate the central response hole (lower dashed lines), and left/right response holes (upper dashed lines; 10 mm diameter). For single-stimulus discrimination, the center of the stimulus is aligned with the center of left/right response holes in elevation, and with the central hole in azimuth (see Fig. 1A). For experiments involving two stimulus locations (i.e., flanker task), the upper (magenta) and lower (cyan) locations of the stimulus are indicated as colored bars (see also Fig. 4A). The 60 pixels x 60 pixels (12 mm x 12 mm) stimulus subtends a visual angle of 25° when viewed from 20 mm in front of the plexiglass mask.
(B) Schematic of the signal detection theory (SDT) analysis illustrating perceptual sensitivity (d’) and criterion (c) calculations for the 2-AFC task (Methods). Upper row; Left: SDT hypothesizes that the internal representation of vertical and horizontal stimuli can be reduced (projected) to a one-dimensional decision axis, on which they form two overlapping distributions (due to noise). A decision is made based on a criterion set by each individual animal: a stimulus whose representation falls above (or below) the criterion is judged as vertical (or horizontal), producing the appropriate behavioral response. A decision criterion (c) of 0, by definition, corresponds to optimal (unbiased) performance given the two distributions. For our 2-AFC task, we defined the decision criterion as the amount of deviation from an unbiased value for the following reason. Upper row; Right: Because of the inherent symmetry of the 2-AFC task design, a positive criterion would increase errors in classification of vertical targets, but also slightly decrease errors in classification of horizontal targets, producing a net reduction in overall accuracy. Similarly, a negative criterion would increase errors in classification of horizontal targets, but also slightly decrease errors in classification of vertical targets, again producing a net decrease in overall accuracy. Therefore, when the two distributions are similar, a negative as well as a positive criterion of the same magnitude will produce the same overall reduction in discrimination accuracy, but a criterion of smaller absolute value would signal an overall improvement in performance. For this reason, we used the absolute value of c (|c|) to examine the effect of criterion change on response accuracy. Lower row: Based on theory, improved response accuracy can result from (1) increased d’: when the two distributions become further separated; or (2) decreased |c|: when the decision criterion becomes less biased.
(C) Identification of trials towards the end of the 30 min behavioral sessions that corresponded to animals being poorly engaged in the task (Methods). Top panel: Time course of overall response accuracy across mice as a function of trial number within sessions. Accuracy was obtained from trials pooled across all mice and sessions, and computed as a function of trial number within session (blue; Methods). Grey shading: bootstrapped estimates of the 95% confidence interval of the accuracy (gray; Methods). Diamonds on top: trials whose accuracy was not significantly different from chance. Dashed vertical line: first trial at which the accuracy was not different from chance (50%), and stayed indistinguishable from chance for 3/5 of the next 5 trials (Methods). Data show increased variability and worse performance towards the end of sessions. Bottom panel: Number of actual observations across mice for each trial number, as a percentage of the maximal number of possible observations (Σ mice*sessions), plotted as a function of trial number within session (red). Solid vertical line: first trial at which the number of observations drops below 25%. Data show a drop in the number of observations available to reliably assess performance towards the end of sessions. Based on these data, all trials above 122 of each behavioral session of this experiment were dropped from analysis (Methods). Results in Fig. 1 are based on data from trials 1-121 from each behavioral session.
(D) Schematic diagram of the two-choice drift diffusion model. The model simulates a decision process from sensory stimulus presentation to the point of behavioral report, and attempts to account for the full distribution of observed RTs. It posits that upon stimulus presentation, sensory evidence flows in, causing a (hypothetical) decision variable to ‘drift’ either upwards (towards one choice boundary) or downwards (towards the other) depending on which choice the incoming evidence favors. Under uncertainty, the decision variable drifts in a stochastic (zig-zag) manner as sensory evidence accumulates, eventually crossing one of the decision boundaries and triggering the corresponding behavioral response. Here, we adopted a standard version of the model with four parameters: (i) drift rate, or the average rate of evidence accumulation, whose sign could be either positive (favoring response A) or negative (favoring response B); (ii) boundary separation, the distance by which the two decision boundaries are separated; (iii) starting point, which captures an initial bias towards one or the other choice (starting point = 0.5 indicates an unbiased decision maker), and (iv) a non-decisional constant (t0), which accounts for net delay due to sensory latency (before decisional process) and motor execution (after a decision has been made); not illustrated here. Also indicated here is ‘threshold-to-correct response’.
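The process described in (D) can be simulated directly. The sketch below is a generic Euler-Maruyama simulation of a four-parameter drift diffusion process, not the fitting procedure used in this study; the unit noise scale, the time step, the maximum trial time, and all names are assumptions made for illustration.

```python
import random


def simulate_ddm_trial(drift_rate, boundary_separation, starting_point, t0,
                       dt=0.001, noise_sd=1.0, max_time=5.0, rng=random):
    """Simulate one trial; returns (choice, reaction_time_in_seconds).

    starting_point is a fraction of boundary_separation (0.5 = unbiased).
    The lower boundary (response B) sits at 0, the upper (response A) at
    boundary_separation. choice is None if no boundary is reached in max_time.
    """
    x = starting_point * boundary_separation  # decision variable
    t = 0.0
    step_sd = noise_sd * dt ** 0.5  # diffusion noise scales with sqrt(dt)
    while 0.0 < x < boundary_separation and t < max_time:
        x += drift_rate * dt + rng.gauss(0.0, step_sd)
        t += dt
    if x >= boundary_separation:
        choice = "A"
    elif x <= 0.0:
        choice = "B"
    else:
        choice = None
    # The non-decisional constant is added to the decision time.
    return choice, t0 + t


if __name__ == "__main__":
    rng = random.Random(1)
    trials = [simulate_ddm_trial(2.0, 1.5, 0.5, 0.2, rng=rng) for _ in range(300)]
    frac_a = sum(choice == "A" for choice, _ in trials) / len(trials)
    mean_rt = sum(rt for _, rt in trials) / len(trials)
    print(f"P(A) = {frac_a:.2f}, mean RT = {mean_rt:.3f} s")
```

A positive drift rate produces mostly "A" responses and a negative one mostly "B" responses, while the non-decisional constant shifts the whole RT distribution without changing choice probabilities.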
(E-H) Estimates of the four parameters of the drift diffusion model as a function of stimulus contrast (x-axis) and stimulus size (colors). (E) Drift rate; 2-way ANOVA, p=0.028 (contrast), p<0.001 (size), p=0.767 (interaction). (F) Boundary separation; 2-way ANOVA, p=0.171 (contrast), p=0.026 (size), p=0.953 (interaction). (G) Starting point; 2-way ANOVA, p<0.001 (contrast), p=0.325 (size), p=0.098 (interaction). (H) tdelay; 2-way ANOVA, p=0.523 (contrast), p=0.308 (size), p=0.931 (interaction).
(I) Scatter plot of drift rate vs. threshold-to-correct response (indicated schematically in D) for different stimulus contrasts. Each dot is data from a mouse at a particular contrast and size of the target; colors correspond to different sizes. Threshold-to-correct response was significantly correlated with drift rate (across contrasts) for stimulus sizes of 35° (0.339; p=0.11) and 45° (0.39; p=0.004).
(J) Discrimination accuracy as a function of stimulus size; p=0.001, 1-way ANOVA against stimulus size.
Identification of trials towards the end of the 30 min behavioral sessions that corresponded to animals being poorly engaged in the task (Methods); conventions identical to those in Fig. S1C.
(A) Identification of trials towards the end of the 30 min behavioral sessions that corresponded to animals being poorly engaged in the task (Methods). All conventions are as in Fig. S1C. Based on these data, trials 103 and above of each behavioral session of this experiment were dropped from analysis. Results in Fig. 5 are based on data from trials 1-102 from each behavioral session.
(B) Plots of key parameters of the conditional accuracy function for the sensory encoding stage of decision dynamics. Left panel: apeak; middle panel: tpeak; right panel: t50. Data show mean ± s.d. of the distribution of bootstrapped estimates (Methods). ‘*’ (‘n.s.’): p<0.05 (p>0.05), paired permutation tests followed by HBMC correction (Methods).
(C) Plots of key parameters of the conditional accuracy function for the STM-dependent stage of decision dynamics. Left panel: tchance; right panel: tdecay. Data show mean ± s.d. of the distribution of bootstrapped estimates (Methods). ‘*’ (‘n.s.’): p<0.05 (p>0.05), paired permutation tests followed by HBMC correction (Methods).
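The statistical comparison named in (B) and (C) can be sketched generically as follows. This sketch assumes a sign-flip paired permutation test on the mean difference and interprets "HBMC" as the Holm-Bonferroni correction for multiple comparisons; both are assumptions, since the exact procedure is given in the Methods and not reproduced here. Function names are hypothetical.

```python
import random


def paired_permutation_p(x, y, n_perm=10000, rng=None):
    """Two-sided paired permutation test on the mean of paired differences.

    Under the null hypothesis the sign of each paired difference is
    exchangeable, so signs are flipped at random on each permutation.
    """
    rng = rng or random.Random(0)
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    count = 0
    for _ in range(n_perm):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (count + 1) / (n_perm + 1)


def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected
    under the Holm-Bonferroni step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject
```

In use, one p-value per pairwise comparison would be computed with `paired_permutation_p` and the full set passed to `holm_bonferroni` to decide which comparisons are marked ‘*’ and which ‘n.s.’.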