Abstract
Perceptual decisions often require the integration of noisy sensory evidence over time. This process is formalized with sequential sampling models, where evidence is accumulated up to a decision threshold before a choice is made. Although classical accounts grounded in cognitive psychology tend to consider the process of decision formation and the preparation of the motor response as occurring serially, neurophysiological studies have proposed that decision formation and response preparation occur in parallel and are inseparable (Cisek, 2007; Shadlen et al., 2008). To address this serial vs. parallel debate, we developed a behavioural, reverse correlation protocol, in which the stimuli that influence perceptual decisions can be distinguished from the stimuli that influence motor responses. We show that the temporal integration windows supporting these two processes are distinct and largely non-overlapping, suggesting that they proceed in a serial or cascaded fashion.
Does the brain use visual information to make perceptual decisions and plan the appropriate motor responses simultaneously? To address this question, we developed a novel experimental protocol that identifies the timing of visual influences on decision making and motor planning. In our protocol, human observers were asked to make a speeded discrimination and to report their choice by means of a saccadic eye movement (see Material and Methods for details). In Experiment 1 they were presented with two peripheral targets (Fig. 1A), whose positions were re-sampled at 15 Hz from two generative distributions, and asked to decide which of the two distributions was closer to the central fixation point (i.e. which had the statistical expectation closer to the centre). They were asked to respond by shifting their gaze as quickly as possible onto the chosen, closer target. Observers were simply asked to ‘look at the target’: we did not require them to move to the mean of the generative distribution, nor to intercept the target’s current location (we did not enforce an acceptance window: all saccades were included in the analysis as long as they left the fixation area and reduced the distance between gaze and one of the two distributions of target positions; see Material and Methods for details). Fig. 1B (left sub-panel) illustrates one trial schematically: the observer is looking at the centre of the screen (red trace), when the two targets appear and continue changing positions. In the analysis, we aligned noisy position samples with respect to the saccadic onset time, and correlated them with either the choice (binary, left vs. right target) or the endpoint of the saccadic eye movement. This allowed purely temporal characterisations of the target position’s influence, not only upon the choice (i.e. the saccade’s direction: right or left; black trace in Fig. 1B) but also upon its eventual endpoint (blue trace in Fig. 1B).
We estimated the evolution of these effects as a function of the temporal distance from saccade onset by using a Bayesian approach to reverse correlation (see Material and Methods for details). This analysis allowed us to reconstruct the temporal weighting functions underlying the decision and the oculomotor response. The results (Fig. 1E) revealed temporal weighting functions that were distinct and largely non-overlapping; whereas choices were correlated with relatively early samples, saccadic endpoints were correlated with later samples. In other words, the different time courses of visual influence on decision formation and eye movement preparation indicate a strongly serial or cascaded organization of the two processes: first a choice is formed, then a motor response is prepared.
Experiment 1 provided evidence for a serial/cascaded process. However, it is possible that the serial strategy was induced by the specific characteristics of the paradigm. While the perceptual decision required computing the difference in distance from fixation between the two targets, the oculomotor response required only the gaze-centred coordinates of the chosen target. It is possible that the requirement of computing the difference in position interfered with the processing of the gaze-centred coordinates, forcing observers to program an appropriate eye movement only after having selected the appropriate target. Moreover, using a dual-task manipulation, a previous study showed different time courses for peripheral and foveal processing of visual information before an eye movement: while peripheral processing stopped 60-80 ms before the saccade was launched, foveal processing continued right until saccade onset (Ludwig et al., 2014). The results of Experiment 1 thus leave open the possibility that integration of visual evidence could proceed in parallel under different conditions, where the perceptual decision does not involve judgments of the saccadic targets but rather of some other stimuli presented more centrally. To address this possibility we designed Experiment 2, in which two patches of varying luminance were presented at opposite edges of the fovea. The perceptual decision involved choosing which was brighter on average (see Fig. 1C). We estimated the temporal weighting functions using the same approach as in Experiment 1 and again found distinct, largely non-overlapping temporal weighting functions (Fig. 1F). The results of Experiment 2 thus indicate that visual input does not simultaneously inform decision formation and the preparation of the motor response, regardless of the particular visual feature that needs to be processed for the perceptual decision (position vs. brightness) or of the location of the visual signals (peripheral vs. parafoveal).
Our results provide evidence for a serial organization of perceptual decision-making and saccade planning. An important question is whether this account can be generalized to other conditions, such as natural viewing. In fact, during free viewing of stable scenes, fixation durations are on the order of just 300 ms, which is considerably less than the sum of integration times for perceptual decision-making and saccade planning in our paradigm. However, in normal conditions, the scene is stable and visual information can be retained and combined across multiple fixations, an idea supported by many lines of research. For example, it has been shown that the influence of visual information accumulated during a fixation is not limited to the first saccade following the fixation but extends to subsequent saccades (Caspi et al., 2004). Other studies have demonstrated that visual information can be integrated across saccades in a near-optimal fashion (Ganmor et al., 2015; Wolf and Schütz, 2015), and that attention can be allocated stably across eye movements in the presence of visual landmarks (Lisi et al., 2015). Thus, in normal conditions, the accumulation of perceptual evidence required to inform upcoming decisions and motor actions does not need to be completed within a single fixation; it may extend across multiple fixations. In contrast, in our experiments the accumulation of evidence had to start anew at each trial and the difficulty of perceptual decisions was set to elicit a substantial proportion of errors, resulting in relatively long integration windows. To assess whether the total duration of the pre-saccadic interval influenced the overlap of the temporal weighting functions, we split trials (using data from both Experiments 1 and 2) into 4 bins according to individual quartiles of saccadic latency, and estimated weighting functions separately for each latency bin (Fig. 2).
We found that the same pattern of little/no overlap between weighting functions was replicated in each latency bin, including those with faster responses, in agreement with the serial account.
We note that our method estimated a temporal weighting function for saccadic eye movements that replicates critical features already reported in the literature, in particular the presence of a saccadic dead time: a ‘point of no return’ after which afferent information is too late to influence the upcoming movement (Findlay and Harris, 1984; Ludwig et al., 2007). Moreover, the direction of the influence of target-position samples on the saccadic landing was always positive (i.e., saccadic landing positions were attracted toward each sample, not repelled away), consistent with the integration of position information and inconsistent with repulsion by distractors, which usually occurs for saccade latencies longer than 200 ms (McSorley et al., 2006). Our results also reveal that, despite the relatively long presentation of the stimuli, the saccadic system integrates information over only a relatively narrow temporal window (≈100 ms), consistent with what has been suggested by studies of saccades to moving targets (Etchells et al., 2010; Lisi and Cavanagh, 2015).
If decision formation and specification of motor response occurred simultaneously (Cisek, 2007), then the same samples of information should have influenced both processes and the weighting functions should have been largely overlapping. In other words, the visual signals that correlated with the choice also should have correlated with the parameters of the saccadic response. Some neurophysiological studies have reported correlates of evidence accumulation in areas of the brain that are related to the preparation of eye movements, such as the lateral intraparietal sulcus, LIP (Shushruth et al., 2018; Yates et al., 2017). Although these results suggest a tight connection between perceptual decision-making and preparation of motor responses (Shadlen et al., 2008), other interpretations are possible. Indeed, a recent inactivation study found that LIP neural activity is not required for the accumulation of perceptual evidence (Katz et al., 2016), thus putting in question the causal role of LIP in the formation of perceptual decisions. Our results contribute to this debate by showing that visual input does not simultaneously inform the formation of the decision and the preparation of the saccadic response. Instead, our results suggest that decision-related signals in LIP and oculomotor areas should be interpreted either as sensory evidence accumulation or as motor preparation, but not as simultaneous correlates of both processes. Interestingly, this perspective is in line with a recent study (Chen and Stuphorn, 2015) of economic (value-based) decision-making, which found clear evidence for sequential encoding of choice and action preparation in the macaque brain. Specifically, neurons in the supplementary eye fields (SEF) were found to encode first the value of the chosen option and -- about 100 ms later -- the parameter of the saccadic response that would obtain it. 
Taken together with the present study, these findings indicate that motor systems in the brain are not necessarily involved in evidence accumulation during decision-making, and may instead be engaged only at a later stage.
Although our results challenge the idea that oculomotor responses are prepared in parallel with the accumulation of perceptual evidence, they do not address the question of whether other types of responses (e.g. manual) can be prepared concurrently with the formation of a decision. Indeed, unlike saccades, hand movements can be modified online in response to new sensory inputs and often respond differently to stimuli or tasks that require the integration of information over time (Issen and Knill, 2012; Lisi and Cavanagh, 2017). For example, one previous study using motor perturbations (Selen et al., 2012) suggested a continuous flow of information from the ongoing decision process to the control system for hand movements in the brain. In that study, motor activity gradually built up at a rate that (when averaged over trials) depended on the evidence discriminability. However, Selen et al. did not use a reverse-correlation approach, and therefore could not uniquely relate the instantaneous perceptual evidence to the formation of the decision. In the context of eye movement responses, a recent study (Yates et al., 2017) that combined neural recording in the monkey with a reverse correlation approach found that the activity in LIP was driven mainly by a premotor buildup independent of choice-related evidence accumulation and that, in agreement with our behavioral results, the choice was mainly informed by early samples of evidence.
In summary, our results are inconsistent with decision formation and response preparation occurring in parallel. Instead, they are in line with theoretical models developed to account for psychological effects such as the refractory period and attentional blink (Zylberberg et al., 2011, 2012), which support a temporal separation of sensory evidence accumulation and motor preparation. Such theories postulate the existence of central bottlenecks to explain why, despite its massively parallel architecture, the brain can be surprisingly slow and serial at performing certain tasks. One potential advantage of serially organizing the stages of evidence accumulation and motor preparation is that it may facilitate the adaptation of behavioural responses to changes in the environment (e.g. changes in the appropriate association between stimuli and motor responses). Keeping these individual operations segregated and organized according to a serial algorithm may allow, when needed, their faster reorganization, ultimately promoting behavioural flexibility. Although this strategy might carry costs, such as slower response times due to not being able to plan multiple response options simultaneously, the benefits coming from the increased flexibility may largely outweigh the costs.
Material and Methods
Stimuli
Stimuli were Gaussian blobs presented on a background made of squares (side ≈0.08 deg), with random luminance drawn from a Gaussian distribution (RMS contrast ≈10%). The space constant of each blob was set to 0.3 deg and its peak luminance was ≈147 cd/m². The position of each blob changed at 15 Hz (every 4 monitor refresh cycles, corresponding to 66-67 ms); each new position was drawn randomly from a circular area with uniform probability. The size of the circle was adjusted so that the standard deviation of position samples was 1.5 deg. In addition to the peripheral Gaussian blobs, Experiment 2 also included two small squares presented near fixation (side ≈0.8 deg, centered at ≈0.8 deg to the left and right side of the fixation point). Each square was divided into 4 vertical bars, and the luminance of each bar was resampled at 15 Hz from a Gaussian distribution with standard deviation of 10 cd/m² and mean equal either to the mean background luminance (≈46 cd/m²) or to a higher value set according to a staircase procedure (details in the Procedure section).
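As a worked illustration of the position sampling described above, the following sketch draws points uniformly from a disc and checks the resulting spread. It assumes that the stated standard deviation (1.5 deg) refers to each axis separately; under that assumption the per-axis standard deviation of a uniform disc is half its radius, so the radius would be 3 deg. The function name `sample_disc` and the seed are illustrative only, not taken from the original code.

```python
import numpy as np

def sample_disc(n, radius, rng):
    """Draw n points uniformly (by area) from a disc of the given radius (deg)."""
    # The square root makes the density uniform over area rather than over radius.
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return r * np.cos(theta), r * np.sin(theta)

target_sd = 1.5             # deg, from the text
radius = 2.0 * target_sd    # ASSUMPTION: SD is per axis, so SD = radius / 2

rng = np.random.default_rng(1)
x, y = sample_disc(200_000, radius, rng)
sd_x, sd_y = x.std(), y.std()   # both should be close to 1.5 deg
```

If the standard deviation was instead defined on the radial distance, the required radius would differ; the text does not say which definition was used.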
Apparatus
The experiment was run in a quiet, dark room. Right eye gaze position was recorded with an Eyelink 1000 (SR Research Ltd., Mississauga, Ontario, Canada). The participant’s head was placed on a chinrest with adjustable forehead rest. Visual stimuli were presented on a gamma-linearized LCD monitor, 51.5 cm wide, placed at a viewing distance of 77 cm. The monitor resolution was 1920×1200. An Apple computer controlled stimulus presentations and response collection; the experimental protocol was implemented using MATLAB (The MathWorks Inc., Natick, Massachusetts, USA) and the Psychophysics (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) and Eyelink (Cornelissen et al., 2002) toolboxes.
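For readers who want to convert the stimulus sizes given in degrees into pixels, the monitor geometry above implies roughly 50 pixels per degree. The short sketch below shows the arithmetic; it assumes a flat screen viewed head-on with the stimulus centred on the line of sight, and the helper name `deg_to_px` is hypothetical.

```python
import math

screen_width_cm = 51.5       # from the text
screen_width_px = 1920       # from the text
viewing_distance_cm = 77.0   # from the text

def deg_to_px(deg):
    """Size in pixels of a stimulus subtending `deg` degrees, centred on the line of sight."""
    size_cm = 2.0 * viewing_distance_cm * math.tan(math.radians(deg) / 2.0)
    return size_cm * screen_width_px / screen_width_cm

px_per_deg = deg_to_px(1.0)    # about 50 pixels per degree
blob_sigma_px = deg_to_px(0.3) # space constant of the Gaussian blobs, in pixels
```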
Participants
Four participants (2 authors and 2 naive participants) took part in Experiment 1, and six participants (2 authors and 4 naive participants) took part in Experiment 2. All had normal or corrected-to-normal vision. Participants gave their informed consent in written form; the protocol of the study received full approval from the Research Ethics Committee of the School of Health Sciences of City University of London.
Procedure
Experiment 1
Each trial started when gaze position was maintained within 2 deg from the central fixation point for at least 200 ms. If the trial did not start within 2 seconds, the program paused, allowing participants to take a break and re-calibrate the eye-tracker. To prevent the use of monitor edges as stable landmarks for the localization of the peripheral targets, the position of the fixation point was jittered across trials: on each trial a new position was drawn from a 2-D Gaussian distribution centered on the screen center, with a standard deviation of 0.2 deg on both horizontal and vertical dimensions, and zero covariance. The position of the distributions from which the positions of the peripheral targets (the Gaussian blobs) were drawn was always clamped with respect to the trial-by-trial position of the fixation point. In any trial, the average distance of the centres of the two generative distributions was always 10 deg, but it differed across left and right targets, so that for one of the targets (the near target) it was always <10 deg, and for the other >10 deg (see video S1 for an example). Participants were asked to identify the nearest target as quickly as possible, and to communicate the decision by looking directly at the chosen target. A 50-ms beep (F5, 698.46 Hz) was delivered as feedback after correct choices. The distance difference between the two distributions was initialized at 2 deg and adaptively adjusted according to a two-down-one-up staircase procedure (step size 0.25 deg). Each participant ran a minimum of 20 blocks of 50 trials each, distributed over the course of several testing sessions on separate days. See Table S1 for information about the performances of individual observers.
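The staircase rule can be sketched as follows. This is a minimal illustration under the conventional reading of "two-down-one-up" (the distance difference shrinks after two consecutive correct choices and grows after every error); the class name `Staircase` and the lower bound `floor` are hypothetical and are not stated in the text.

```python
class Staircase:
    """Two-down-one-up staircase on the distance difference (deg)."""

    def __init__(self, start=2.0, step=0.25, floor=0.25):
        self.level = start    # initial difference, from the text
        self.step = step      # step size, from the text
        self.floor = floor    # ASSUMPTION: hypothetical lower bound
        self._streak = 0      # consecutive correct responses so far

    def update(self, correct):
        """Update the level after one trial and return the new level."""
        if correct:
            self._streak += 1
            if self._streak == 2:
                # Two correct in a row: make the task harder.
                self.level = max(self.floor, self.level - self.step)
                self._streak = 0
        else:
            # Any error: make the task easier.
            self.level += self.step
            self._streak = 0
        return self.level

stair = Staircase()
levels = [stair.update(c) for c in [True, True, True, False, True, True]]
# levels is [2.0, 1.75, 1.75, 2.0, 2.0, 1.75]
```

A rule of this kind is generally expected to converge near 71% correct, which fits the stated aim of eliciting a substantial proportion of errors.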
Experiment 2
Experiment 2 followed a similar procedure to Experiment 1, but with the following differences. The generative distributions of target positions were always both placed at the same distance, 10 deg of eccentricity. The perceptual decision was based on a different stimulus, presented parafoveally, and consisting of 2 squares, each containing 4 bars of varying luminance, resampled in sync with the peripheral target positions (see video S2 for an example). The luminance values of the bars were drawn from a Gaussian distribution (see Stimuli section), and the mean luminance of the brightest square was initialized at 8 cd/m² above the background luminance, and then adjusted according to a two-up one-down staircase procedure (step size 2 cd/m²). Each participant ran a minimum of 13 blocks of 50 trials each, distributed over the course of several testing sessions on separate days. Information about performances of individual observers is reported in Table S2.
Analysis
Pre-processing of gaze recordings
Saccadic onsets and offsets were detected offline using MATLAB and an algorithm based on 2-D eye velocity (Engbert and Mergenthaler, 2006). More specifically, eye movements were identified as saccades if their velocities exceeded the median velocity by 5 standard deviations for at least 8 ms. Once saccadic parameters were measured, further statistical analyses were made using the open source software R (R Core Team, 2015). For each trial, we selected as the primary saccade the first saccade that started after the onset of the target from within a circular area of 2.5 deg around the initial fixation point and ended outside of that circular area. We excluded trials where the primary saccade had a latency shorter than 100 ms (about 0.5% of total trials) and trials where the amplitude of the primary saccade was less than 2 deg (about 4% of total trials).
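The velocity-threshold detection described above can be sketched along the following lines. This is an illustrative Python approximation of the general approach (smoothed 2-D velocity, median-based standard deviation, elliptic threshold, minimum duration), not the authors' MATLAB code; the sampling rate of 1000 Hz and the function name `detect_saccades` are assumptions.

```python
import numpy as np

def detect_saccades(x, y, fs=1000.0, lam=5.0, min_dur_ms=8.0):
    """Return (onset, offset) sample indices of candidate saccades."""
    dt = 1.0 / fs  # ASSUMPTION: gaze sampled at 1000 Hz

    def velocity(p):
        v = np.zeros_like(p, dtype=float)
        # Derivative smoothed over a 5-sample window.
        v[2:-2] = (p[4:] + p[3:-1] - p[1:-3] - p[:-4]) / (6.0 * dt)
        return v

    def median_sd(v):
        # Median-based estimate of the velocity spread, robust to saccades.
        return np.sqrt(max(np.median(v ** 2) - np.median(v) ** 2, 1e-12))

    vx, vy = velocity(np.asarray(x, float)), velocity(np.asarray(y, float))
    eta_x, eta_y = lam * median_sd(vx), lam * median_sd(vy)
    above = (vx / eta_x) ** 2 + (vy / eta_y) ** 2 > 1.0  # elliptic threshold

    min_samples = int(round(min_dur_ms * fs / 1000.0))
    saccades, start = [], None
    for i, flag in enumerate(np.append(above, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:   # keep only runs of sufficient duration
                saccades.append((start, i - 1))
            start = None
    return saccades

# Synthetic demo: fixation noise plus a 10-deg rightward ramp lasting 40 ms.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(0.0, 0.01, n)
y = rng.normal(0.0, 0.01, n)
x += np.clip((np.arange(n) - 500) / 40.0, 0.0, 1.0) * 10.0
saccades = detect_saccades(x, y)
```

In the synthetic trace the single detected event should start near sample 500 and end near sample 540, give or take the few samples over which the derivative is smoothed.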
Estimation of weighting functions
In order to estimate the weighting functions for saccade planning, we regressed the centres of gaze at saccadic offset (with horizontal and vertical positions denoted sx and sy) against the spatio-temporal coordinates of the Gaussian blobs, aligned with respect to the saccadic onset. We restricted our analysis to the 900 ms preceding the onset of the eye movement. Since the granularity of the saccade onset detection was on the order of milliseconds, this yields 900 time points and thus, in principle, 900 parameters to estimate simultaneously. To make the estimation more tractable, we binned the temporal interval into 100 smaller intervals of 9 ms each. Whenever changes in the position of the Gaussian blob occurred within a bin, we took the average of the two positions, weighted by the relative fraction of time in which the blob occupied each position within the bin. This procedure yields for each trial i vectors of target positions xi and yi, each of length 100. The trial-by-trial coordinates of the saccadic endpoint were modelled as a linear function of these position vectors, where β = (β1,β2,…,β100) is the vector of linear coefficients determining which of the position samples are correlated with the saccadic landing position (assumed to be the same across vertical and horizontal saccadic components, up to a scaling factor m) and ‘·’ is the dot product. Note that the linear coefficients are not independent from one another: due to the temporal structure of the stimulus, contiguous coefficients often represent the influence of the same stimulus sample. This introduces autocorrelation in the coefficient vector, such that the difference between neighbouring coefficients is likely to be smaller than that of more distant coefficients. To account for this, we fit our model within a Bayesian framework and adopted a random-walk prior (Chiogna and Gaetan, 2002) to enforce smoothness.
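The binning step described above can be sketched as follows. This is a minimal illustration, not the authors' R code: it resolves time at 1 ms (so the time weighting is exact only to the nearest millisecond), and it marks bins that precede stimulus onset as NaN, which is an assumption because the text does not say how such bins were handled. The function name `bin_positions` is hypothetical.

```python
import numpy as np

def bin_positions(change_times_ms, positions, saccade_onset_ms,
                  window_ms=900, n_bins=100):
    """Time-weighted mean target position in each bin before saccade onset.

    change_times_ms: sorted times at which the target took each new position.
    positions: target position (one coordinate) from each of those times onward.
    """
    change_times_ms = np.asarray(change_times_ms, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Centre of each 1-ms step in the window preceding saccade onset.
    t = saccade_onset_ms - window_ms + np.arange(window_ms) + 0.5
    # Index of the stimulus sample on screen at each millisecond (-1 if none yet).
    idx = np.searchsorted(change_times_ms, t, side="right") - 1
    trace = np.where(idx >= 0, positions[np.clip(idx, 0, None)], np.nan)
    # Averaging the millisecond trace within a bin weights each position
    # by the fraction of the bin during which it was displayed.
    return trace.reshape(n_bins, window_ms // n_bins).mean(axis=1)

# Example: the position changes from 2 to 4 deg at 453 ms, saccade onset at 900 ms.
w = bin_positions([0.0, 453.0], [2.0, 4.0], saccade_onset_ms=900.0)
# Bin 50 spans 450-459 ms: 3 ms at 2 deg and 6 ms at 4 deg, so w[50] = 30 / 9.
```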
Note that the random-walk proceeds in reverse – starting by assigning a regularizing (zero-centered) Gaussian prior to the last coefficient. This is because the last coefficient lies within 9 ms from the saccade onset, and thus is unlikely to have a large influence on the saccadic vector. The remaining parameters were assigned the following priors
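The display equations for this model are not reproduced in the text above. As a hedged sketch only, one form that is consistent with the stated definitions (a linear regression on the binned positions, a common coefficient vector scaled by m for the second component, and a random walk running backwards from a zero-centred last coefficient) is given below. The intercepts α, the Gaussian noise terms with standard deviations σ, the prior scales τ0 and τ, and the choice of which component carries m are assumptions introduced for illustration; the priors on the remaining parameters cannot be recovered from the text and are left unspecified.

```latex
% Hedged sketch: \alpha, \sigma, \tau_0 and \tau are assumed symbols, not from the source.
\begin{aligned}
s_{x,i} &\sim \mathcal{N}\!\left(\alpha_x + \boldsymbol{\beta}\cdot\mathbf{x}_i,\ \sigma_x^2\right),\\
s_{y,i} &\sim \mathcal{N}\!\left(\alpha_y + m\,\boldsymbol{\beta}\cdot\mathbf{y}_i,\ \sigma_y^2\right),\\
\beta_{100} &\sim \mathcal{N}\!\left(0,\ \tau_0^2\right),\\
\beta_{k} &\sim \mathcal{N}\!\left(\beta_{k+1},\ \tau^2\right), \qquad k = 99, 98, \dots, 1.
\end{aligned}
```

Under a prior of this kind, neighbouring coefficients are pulled toward each other, which is what produces a smooth estimated weighting function.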
This modeling approach was used in both Experiments 1 and 2. For each participant the model was estimated using MCMC sampling in Stan and its R interface (Carpenter et al., 2017; R Core Team, 2015). We ran 4 chains of 4000 samples each, and verified convergence by checking that there were no divergent transitions and that the variance between and within chains did not differ significantly (R̂ ≈ 1 for all parameters; Gelman and Rubin, 1992).
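To make the convergence criterion concrete, the sketch below computes the classic potential scale reduction factor of Gelman and Rubin (1992) from a set of chains. It is an illustration of the statistic only: Stan reports a refined (split-chain) variant, and the simulated chains and the function name `gelman_rubin` are assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor for one parameter.

    chains: array of shape (m chains, n draws per chain).
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled estimate of posterior variance
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(4, 4000))           # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])   # one chain stuck elsewhere
rhat_good = gelman_rubin(good)   # close to 1
rhat_bad = gelman_rubin(bad)     # clearly above 1
```

Values close to 1 indicate that the chains agree with one another; values well above 1, as in the second example, indicate that at least one chain is exploring a different region.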
A similar approach was used to estimate the weighting function for the decision, with the difference that we used a generalized linear model instead of a simple linear regression, to account for the dichotomous nature of the dependent variable. Formally, in this model the probability of choosing the stimulus on the right is expressed as a function of a linear predictor which, for Experiment 1, is built from the vector of differences between the two targets’ distances from the central fixation point. This vector contains 100 values for each trial (for clarity we omitted the trial subscript i). The notation ‘◦’ in the exponents indicates that the power operations are applied elementwise (also known as Hadamard power). The same approach was used in the analysis of Experiment 2; however, in this case the perceptual decision was based on the difference in luminance between the right and left patch, where Lv indicates the vector of luminances of either the left or right parafoveal patch (each value represents the average of the 4 vertical bars within the patch). To introduce smoothness we used, for both experiments, the same random-walk prior used in the analysis of saccadic weighting functions. The remaining parameters were given the following priors
This model was also estimated using MCMC sampling as implemented in Stan.
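The construction of the decision predictor for Experiment 1 can be illustrated as follows. The elementwise (Hadamard) powers give each target's distance from fixation in every time bin, and the difference of the two distance vectors enters the linear predictor. The logistic link, the intercept, the right-minus-left sign convention and the function names are assumptions made for this sketch; the text specifies only that a generalized linear model for a binary outcome was used.

```python
import numpy as np

def distance_difference(x_right, y_right, x_left, y_left):
    """Per-bin distance from fixation of the right target minus the left target.

    Inputs are length-100 vectors of binned coordinates relative to fixation;
    ** acts elementwise on arrays (the Hadamard power).
    """
    d_right = (x_right ** 2 + y_right ** 2) ** 0.5
    d_left = (x_left ** 2 + y_left ** 2) ** 0.5
    return d_right - d_left

def p_choose_right(beta, predictor, intercept=0.0):
    """Probability of a rightward choice under an ASSUMED logistic link."""
    eta = intercept + np.dot(beta, predictor)
    return 1.0 / (1.0 + np.exp(-eta))

# Example: right target 9 deg from fixation, left target 11 deg, in every bin.
n_bins = 100
d = distance_difference(np.full(n_bins, 9.0), np.zeros(n_bins),
                        np.full(n_bins, -11.0), np.zeros(n_bins))
beta = np.full(n_bins, -0.01)   # negative weights: the nearer target is favoured
p_right = p_choose_right(beta, d)
```

With these illustrative weights the linear predictor equals 2, so the right (nearer) target would be chosen on about 88% of such trials.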
Statistical analysis
The ordered vector of 100 coefficients represents an estimate of the weighting function used by participants to make the decision and to plan the eye movement. In order to estimate the onsets and offsets of the temporal integration windows, for each participant we used samples drawn from the posterior distribution to estimate the Bayesian highest posterior density credible intervals (HPDI) around each of the 100 coefficients. This allowed us to determine the temporal integration windows as the temporal intervals in which the credible interval did not include zero. To further control for the possibility that these intervals were due to chance, we estimated their probability under the null hypothesis using the cluster test (Cao and Worsley, 2001; Friston et al., 1994). For this test, each coefficient was transformed into a t statistic by dividing it by the standard deviation of its posterior distribution. The number of resolution elements or resels (which determines the resolution of the random field assumed by the cluster test) was taken to be the number of distinct stimulus samples presented during the 900 ms interval before the saccade: 13.5 (900 ms at 15 Hz). For all the clusters included in the analysis the p-value resulting from this procedure was smaller than 0.01. To determine onsets and offsets of the integration windows at the group level, we averaged the onset and offset of the integration windows of individual participants (see Fig. 1).
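The step from posterior samples to an integration window can be sketched as below. This is an illustration under stated assumptions: the credible mass (0.95) is not given in the text, the rule of keeping the longest run of coefficients whose interval excludes zero is a simplification that omits the cluster test, and the function names are hypothetical. Coefficients are taken to be ordered from the earliest bin (900 ms before saccade onset) to the last bin (ending at saccade onset).

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Narrowest interval containing a fraction `prob` of the posterior draws."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.ceil(prob * n))            # number of draws inside the interval
    widths = s[k - 1:] - s[:n - k + 1]    # width of every candidate interval
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]

def integration_window(draws, bin_ms=9.0, window_ms=900.0, prob=0.95):
    """Onset and offset (ms relative to saccade onset) of the longest run of
    coefficients whose HPDI excludes zero. draws has shape (n_draws, n_coef)."""
    bounds = np.array([hpdi(draws[:, j], prob) for j in range(draws.shape[1])])
    nonzero = (bounds[:, 0] > 0) | (bounds[:, 1] < 0)
    best, start = None, None
    for j, flag in enumerate(np.append(nonzero, False)):
        if flag and start is None:
            start = j
        elif not flag and start is not None:
            if best is None or (j - start) > (best[1] - best[0] + 1):
                best = (start, j - 1)
            start = None
    if best is None:
        return None
    return (-window_ms + best[0] * bin_ms, -window_ms + (best[1] + 1) * bin_ms)

# Synthetic posterior: only coefficients 60-79 differ from zero.
rng = np.random.default_rng(0)
means = np.zeros(100)
means[60:80] = 1.0
draws = rng.normal(means, 0.1, size=(2000, 100))
window = integration_window(draws)   # (-360.0, -180.0)
```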
Data availability
Data and code supporting this article are available as an Open Science Framework repository (link: https://osf.io/embky/).
Acknowledgements
This work was supported by grant RPG-2016-124 from the Leverhulme Trust.