Abstract
Neurobiological investigations of perceptual decision-making have furnished the first glimpse of a flexible cognitive process at the level of single neurons1,2. Neurons in the parietal and prefrontal cortex3–6 are thought to represent the accumulation of noisy evidence, acquired over time, leading to a decision. Neural recordings averaged over many decisions have provided support for the deterministic rise in activity to a termination bound7. Critically, it is the unobserved stochastic component that is thought to confer variability in both choice and decision time8. Here, we elucidate this stochastic, diffusion-like signal on individual decisions by recording simultaneously from hundreds of neurons in the lateral intraparietal cortex (LIP). We show that a small subset of these neurons, previously studied singly, represent a combination of deterministic drift and stochastic diffusion—the integral of noisy evidence—during perceptual decision making, and we provide direct support for the hypothesis that this diffusion signal is the quantity responsible for the variability in choice and reaction times. Neuronal state space and decoding analyses, applied to the whole population, also identify the drift diffusion signal. However, we show that the signal relies on the subset of neurons with response fields that overlap the choice targets. This parsimonious observation would escape detection by these powerful methods, absent a clear hypothesis.
Introduction
Difficult decisions require our brains to acquire many samples of evidence before making a choice. Evidence from psychophysics9–11 and neurophysiology8 supports the idea that perceptual decisions are explained by a quantity, termed a decision variable (DV), which evolves as the sum of accumulated signal and noise, that is, deterministic drift plus stochastic diffusion. Critically, this formulation predicts that each decision is explained by a unique drift-diffusion trajectory that determines the outcome of the decision and the time of its completion. Recordings from single neurons in parietal and prefrontal association cortex in monkeys provide indirect support for this quantity, based on second-order statistical properties from many decisions12 or successful application of machine learning to predict decision outcome13. However, to date, drift-diffusion has not been observable on single trials, making it impossible to assess whether individual decisions are governed by this process.
Here we provide the first direct evidence for a drift-diffusion process underlying single decisions. We recorded simultaneously from up to 203 neurons in the lateral intraparietal area (LIP) while monkeys made perceptual decisions about moving random dots. Using a variety of dimensionality reduction techniques, we show that a drift-diffusion signal can be detected in such populations on individual trials, and that this signal satisfies the criteria for a decision variable. Notably, the signal of interest is dominated by a subpopulation of neurons whose response fields overlap one of the choice targets, consistent with earlier single neuron studies1,7,8,12.
Results
Two monkeys made perceptual decisions, reported by an eye movement, about the net direction of dynamic random dot motion (RDM; Fig. 1a). We measured the speed and accuracy of their decisions as a function of motion strength (Fig. 1b). The choice probabilities and the distribution of reaction times (RT) are well described by a bounded drift diffusion model (Fig. 1b,c). The fits of this model also specify the evolving probability distributions of the decision variable. They will guide our interrogation of the neural data.
a, Random dot motion discrimination task. The monkey fixates a central point. After a delay, two peripheral targets appear, followed by the random dot motion. When ready, the monkey reports the net direction of motion by making an eye movement to the corresponding target. Yellow shading indicates the response fields of a subset of neurons in area LIP that we refer to as Tin neurons. b, Mean reaction times (top) and proportion of leftward choices (bottom) plotted as a function of motion strength and direction, indicated by the sign of the coherence (positive is leftward, toward Tin). Data are from all sessions from monkey M (black, 9684 trials) and monkey J (brown, 8142 trials). Solid lines are fits of a bounded drift-diffusion model. c, Drift-diffusion model. The decision process is modelled as a race between two negatively-correlated (ρ = − 0.71) accumulators: one integrating momentary evidence towards Tin and another towards Tout choices. The decision is terminated when one accumulator hits its positive bound.
During the task, we used Neuropixels probes to record simultaneously from 54–203 neurons in area LIP, representing a random and unbiased sample of neurons within the targeted recording location (Fig. 2a). This location, deep in the intraparietal sulcus, was previously inaccessible using existing high-density recording devices. We conducted eight sessions in two monkeys (1696–2894 trials per session). In what follows, we highlight one representative session from monkey M.
a, Neural recordings were obtained from area LIP in the intraparietal sulcus of the right hemisphere, using a prototype 45 mm Neuropixels probe capable of recording simultaneously from 384 of 4416 contacts (see Methods). The large green rectangle contains voltage traces from a single neuron, which typically appear on several contacts. In the example session, 17 neurons had response fields that overlapped the left (contralateral) choice target (Tin). b, Trial-averaged mean firing rates of 17 Tin neurons grouped by motion strength (color) and direction (thickness). Left, responses aligned to stimulus onset; right, responses aligned to saccade (error trials are excluded from the means). The traces reflect the direction and strength of motion from ∼200 ms after motion onset. c, Single-trial firing rates, averaged across the Tin neurons, approximate drift-diffusion. Left, 100 samples of the Tin average firing rate, baseline corrected, for 0% coherence trials. Thick lines are intended to aid tracing of single trajectories. Middle, same as left for 25.6% coherence. Right, examples of two trials of the Tin average firing rate for each coherence and direction of motion after subtracting the within-coherence mean activity. d, Correlation of neural activity (the Tin average firing rate) with RT (solid). Only trials with RT ≥ 670 ms resulting in Tin choices are included in the analysis. This correlation was partially mediated (dashed; partial correlation) when accounting for activity at a later time point (t = 0.55 s). e, Leverage of neural activity (the Tin average firing rate) on choice (β) also shows mediation as in d. f, Percent mediation on all sessions using the values at t = 0.35 s (gray arrows in d and e). Boxes show median and interquartile range; whiskers end at minimum and maximum.
Our initial analyses focus on the subset of neurons with response fields that overlap the contralateral choice target (termed Tin) and show spatially selective persistent activity on an oculomotor delayed response task14—the type of neuron previously selected for study in single-cell recording experiments. Fig. 2b shows the activity of 17 such neurons, averaged across neurons and trials of the same motion coherence in this representative session. The average firing rates from this ensemble replicate earlier findings: ramp-like trajectories during decision formation with buildup rates that depend on the strength and direction of motion and stereotyped firing rates just preceding saccadic choices for Tin. The motion-dependent separation of responses is thought to mark the beginning of decision formation in LIP, ∼200 ms after onset of the RDM. While the build-up is thought to reflect the deterministic component of the accumulation of momentary evidence, the diffusion process is elusive. This is because noise is suppressed through averaging across trials, and for a single neuron, the spikes are too sparse to permit inferences of time-varying rate over the appropriate scale. With a pool of 17 neurons, however, it is now possible to measure the rate on single trials.
Each trace in Fig. 2c shows the average firing rate of the 17 Tin neurons on single trials. They are redolent of drift-diffusion. The traces contain many alternations in the sign of their derivative (even after smoothing), consistent with an accumulation of independent samples of positive and negative numbers. The traces spread apart such that their variance grows linearly with time, because the variance of a running sum of independent identically distributed random samples is the sum of the individual variances (σ² ∝ t). This relationship holds for ∼100 ms for the weakest motion strength. As shown below, the attenuation of the spreading at later times is a sign of a termination bound that limits the range of the accumulation. It thus appears that the ramp-like averages in Fig. 2b belie drift-diffusion processes on single trials.
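The linear growth of variance for a running sum can be checked numerically. The following is a minimal sketch using synthetic Gaussian samples, not neural data; the function name, the number of trials and steps, and the unit sample variance are all assumptions chosen for illustration.

```python
import random


def running_sum_variance(n_trials=2000, n_steps=100, sigma=1.0, seed=0):
    """Across-trial variance of a running sum of i.i.d. Gaussian samples.

    Returns a list whose k-th entry is the variance after k + 1 samples.
    """
    rng = random.Random(seed)
    sums = [0.0] * n_trials
    variances = []
    for _ in range(n_steps):
        # Add one independent sample to every simulated trial.
        for i in range(n_trials):
            sums[i] += rng.gauss(0.0, sigma)
        mean = sum(sums) / n_trials
        variances.append(sum((s - mean) ** 2 for s in sums) / (n_trials - 1))
    return variances


variances = running_sum_variance()
# Doubling the number of accumulated samples should roughly double the variance.
ratio = variances[99] / variances[49]
```

In an unbounded accumulation, `variances[k]` stays close to `(k + 1) * sigma**2`; a stopping bound would make the curve bend below this line at later times, which is the attenuation described in the text.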
We hypothesized that the single-trial average firing rate of the Tin neurons approximates the decision variable8. If so, the diffusion component should explain the variability of choice and reaction time for trials sharing the same direction and motion strength. Specifically, (i) a sample of this firing rate should be predictive of choice and correlate inversely with the RT on trials that result in Tin choices; (ii) later samples ought to predict choice better and correlate more strongly (negatively) with RT than earlier samples, and (iii) later samples should contain most, if not all, of the information present in the earlier samples. As we will show, these predictions are borne out.
As shown in Fig. 2d, a sample of the Tin average firing rate taken near the beginning of decision formation is negatively correlated with the RT on that trial (black line), even though we are only considering trials with long RTs (≥670 ms). The correlation is statistically significant for t ≥ 250 ms, which is only 50 ms after the responses begin to exhibit a dependency on motion strength and direction. The magnitude of the correlation is stronger with later samples, and critically, the later samples contain most of the information in the earlier sample, thereby rendering a reduced partial correlation, given the later sample (dashed line). For example, the correlation with RT is stronger for a sample taken 200 ms later than for the early sample, and the partial correlation of the early sample, given the later one, is weaker than either (Fig. 2d). The later sample mediates 66% of the earlier sample’s coefficient of determination (R²). The same pattern holds for the choice (Fig. 2e). Early samples of the Tin average firing rate exert positive leverage (β) on the probability of a Tin choice; later samples have more leverage and explain away the leverage of earlier samples from logistic regression (GLM). We compute a simple index of mediation using the change in β when the later time point is absent or included in the regression (see Methods). In the example in Fig. 2e, the mediation at t = 0.35 s is complete (across-session mean ± sem = 74 ± 9%; p < 0.001, Fig. 2f, Fig. S1). Thus the Tin average firing rate is an evolving representation that controls the choice and decision time on a single trial, that is, the decision variable (DV), consistent with the hallmarks enumerated above.
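The logic of mediation for RT can be illustrated with a partial correlation on synthetic data. This is a minimal sketch, not the analysis code used for the figures: the data-generating process, the function names, and the definition of percent mediation as the fractional reduction in R² are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


def percent_mediation(early, late, rt):
    """Assumed index: percent reduction in R^2 of early vs. RT, given late."""
    r = pearson(early, rt)
    r_partial = partial_corr(early, rt, late)
    return 100.0 * (1.0 - r_partial ** 2 / r ** 2)


# Synthetic example: the later sample contains the earlier one plus new noise,
# and RT depends only on the later sample.
rng = random.Random(1)
early = [rng.gauss(0.0, 1.0) for _ in range(5000)]
late = [e + rng.gauss(0.0, 1.0) for e in early]
rt = [-l + rng.gauss(0.0, 1.0) for l in late]

r_early = pearson(early, rt)
r_late = pearson(late, rt)
mediation = percent_mediation(early, late, rt)
```

In this toy construction the early sample influences RT only through the later sample, so the later sample correlates more strongly with RT and the mediation is nearly complete. Incomplete mediation, as reported for some signals in the text, would arise if RT also depended on something the later sample does not capture.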
The representation of the DV is concentrated
Up to now, our focus has been on the same neurons we would have selected in single-neuron recording experiments, in this case screened post hoc for spatially selective persistent activity associated with the contralateral choice-target. We pursued several strategies to assess whether the representation is more broadly distributed among the population of simultaneously recorded neurons. The first is a targeted approach to find the weighting of all 191 neurons that best approximates the bounded drift-diffusion process inferred from behavior, while remaining agnostic about the role of the 17 Tin neurons.
We fit the choice and RTs on the session using a bounded drift-diffusion model, structured as a race between two partially anticorrelated drift-diffusion processes (Fig. 1c). The architecture approximates the neural organization: populations of neurons that accumulate evidence for right vs. left and populations that accumulate evidence for left vs. right. Each has an upper stopping bound8. The fits furnish an estimate of the time-varying distribution of the DV associated with each motion strength and direction. The estimates incorporate the fact that trials terminate by reaching a positive bound in either of the competing processes. This gives rise to nonlinear functions of the expectation and variance as a function of time (Fig. 3a). We optimized the weights for each neuron such that the time-varying distribution of the weighted average firing rates best matched the model’s predicted distribution (see Methods). The vector of weights, d, defines the direction in the 191-dimensional neuronal state space that best captures the one-dimensional drift-diffusion decision variable. As shown in Fig. 3b, the projection of activity on d, Sd(t), exhibits time-dependent mean and variance similar to the model predictions.
a, Predicted mean and variance of single-trial diffusion paths derived from the fits to behavior (Fig. 1b). The conditionalization imposed by the stopping bounds is responsible for the departure from linear functions of time (both columns). For plotting purposes only, we incorporate a lower bound on the DV in the model, as spike rate cannot be negative (see Methods). b, Mean and variance of population single-trial responses, projected on the diffusion vector. c, Mean and variance of the single-trial responses of the Tin neurons. d, Example pairs of single-trial diffusion paths rendered as the Tin average firing rate (solid) and Sd(t) (dashed). e, Histogram of within-trial correlations between pairs of trials like those in panel d. Correlations between the Tin average firing rate and Sd (purple) and between the Tin average firing rate and Sd with the Tin neurons removed (Sd−; green). Arrows indicate median correlations. f, Heat map shows the population response projected on the diffusion vector, determined during the waiting period of memory saccades. Circles show the location of targets in the oculomotor delayed response task. Red circles show the location of the choice targets. This shows a response field overlapping the left choice target. g,h, Correlation of Sd(t) with RT and choice on trials with RT > 670 ms. Solid and dashed black curves show simple and partial correlations, as in Fig. 2d,e. The third curve (copper) shows mediation by the Tin average firing rate. i, Mediation by Sd and the Tin average firing rate across eight sessions. Boxes show median and interquartile range; whiskers end at minimum and maximum.
The theoretical means and variances are also approximated by the 17 Tin neurons (Fig. 3c). Indeed, the unweighted average firing rates of the Tin neurons are similar to Sd(t) on individual trials (Fig. 3d). The histograms in Fig. 3e show correlations for all trials from this session. They show a strong correlation between Sd and the Tin average firing rate (medians marked by arrows in Fig. 3e), which is reduced significantly when the Tin neurons are excluded from d to construct the 174-dimensional d− (KS-test; Table S1), indicating a large contribution of the Tin neurons to d. In fact, the population activity projected onto d renders spatially selective persistent activity to targets in the same region of the visual field as the Tin neural response fields (Fig. 3f; 7 of 8 sessions, Fig. S2). It is therefore unsurprising that early samples of Sd(t) exhibit correlations with choice and RT. Nor is it surprising that a sample of Sd(0.55) mediates the correlation of earlier samples of Sd(t) (dashed curve). Surprisingly, however, a sample of the Tin average firing rate also mediates earlier samples of Sd(t) (copper curve). We observe this pattern of mediation across measures in all but one session, for which a later sample of the Tin average firing rate does not mediate earlier activity of Sd (Fig. S3).
Clearly, the 17 Tin neurons play an important role in rendering the decision. We assume they are but a subset of the larger pool of neurons with response fields that overlap the same choice target—hence the partial mediation. Yet, they do not stand out on the basis of their weighting coefficients (Fig. S7)—not even when the weights are penalised so as to be sparse (L1 regularization). Thus without prior knowledge, we might not have recognized the simple logic of the decision process represented by neurons that inform the saccadic report. Taken together these analyses support the hypothesis that the DV is a one dimensional process15 that is largely explained by the 17 Tin neurons.
Data driven approaches
Multineuron recordings present an opportunity to discover unanticipated features of data, beyond the imagination of the scientist and existent hypotheses. Such efforts are aided by a growing set of tools, devised to facilitate discovery of neural computations realized in the neuronal state space. Here we ask whether three data driven methods would discover diffusion and the importance of the Tin neurons had we not known about the latter and looked for the former. The answers to these questions may be relevant to many types of experiments that seek to exploit large scale neural recordings.
We first trained a logistic decoder to predict the monkey’s choices using the full population of 191 neurons in the sample session. We used all coherences and allowed a unique set of decoding weights at each time step (similar to Ref. 13; see Methods). As shown in Fig. 4a, the cross-validated accuracy of this optimal decoder increased as a function of time from motion onset. To estimate the upper bound of decoding, we performed the same analysis on data simulated with the diffusion model fit to the behavior. The peak performance of this decoder is 72% (Fig. 4a). The population decoder rivals this performance, and remarkably, so does the application of a threshold to Sd(t) and to the Tin average firing rate to predict choice. Indeed, removal of the 17 Tin neurons from the population reduces decoding performance far in excess of what would be expected by removing a random 17 neurons (average reduction in decoding accuracy of 8% and 1 ± 1%, mean ± sd, respectively).
a, Logistic choice-decoder. The decoder is trained to predict the monkey’s choice from a weighted sum of firing rates from each neuron, using a random half of the trials. Decoding accuracy is established on the left-out trials (black trace). The best achievable performance (horizontal dashed line) is estimated from simulation of a drift-diffusion model. Colored traces are accuracy of predictions using the 17 Tin neurons and Sd− (re-derived diffusion direction without the Tin neurons). Shading indicates s.e. b, The decoder defines a direction in neuronal state space. Heat map shows the degree to which a decoder trained at one time (abscissa) performs at other times (ordinate). The weights at t = 0.35 s are used to generate single-trial responses analyzed in columns 2 & 3. 2nd column, Trial-averaged signals grouped by motion strength (color) and direction (line style) with same convention as Fig. 2b. 3rd column, Examples of detrended single-trial responses. c, Principal components analysis. The image shows trial-averaged activity along the first 3 PCs (200 ms after motion onset until 100 ms before median RT) for 4 motion strengths (both directions for non-zero coherence). Filled and empty circles indicate the beginning of the trials for thick and thin traces, respectively (same convention as Fig. 2b). This depiction is less interpretable than those in the next columns. Middle and right columns use single-trial signals rendered by PC-1, as in b. d, Correlation clustering. Column 1, Pairwise correlations between the neurons in the example session. Vertical and horizontal lines separate four spectral clusters based on r-values. Columns 2–3 use single-trial signals rendered by Cluster-1 (arrow in 1st column) as in b. e, Similarity of single-trial signals from the Tin average firing rate and the three methods in b–d. Arrows indicate median correlations. f, Summary of mediation by a later sample of the signal in the corresponding row (self) or the Tin average firing rate on all sessions using the same RT restriction and sample time points as Fig. 3i. The stray point in the box-whisker plot (bottom) is an outlier.
While the decoder assigns a new set of weights to each time point, Fig. 4b shows that weights derived as early as 300 ms after motion onset (100 ms from the beginning of decision-related activity) perform nearly as well when those same weights are applied at all other times16. This observation is consistent with the characterization of the process as one-dimensional, supported by the mediation analyses (e.g., Fig. 3g,h). We applied decoder weights derived at t = 0.35 s to define a direction in the neuronal state space to produce time-dependent signals on single trials. The across-trial means exhibit coherence-dependent drift and the single trials bear semblance to diffusion. The latter have leverage on choice, by design, and they correlate inversely with RT. Both effects (at t = 0.35 s) are mediated by a later sample of the same signal (self, RT: 67.4%; choice: 62.8%) and also by the Tin average firing rate (RT: 81.1%; choice: 66.9%). The top row of Fig. 4f shows the mean percent mediation for the eight sessions (see also Fig. S4).
We also applied Principal Components Analysis (PCA) to trial-averaged standardized firing rates, using the epoch from 200 ms to 500 ms from motion onset (as in Fig. 2b). The first three PCs (Fig. 4c) capture 57% of the total variance. We use just the first PC (39% of variance) to render single-trial signals. The trial-averaged means show clear dependence on motion strength and direction and single-trial traces are redolent of diffusion. They are positively correlated with the Tin average firing rate (Fig. 4e), and their correlation with behavior is mediated by the Tin average firing rate on most sessions (Fig. 4f, Fig. S5). We obtain similar results using demixed PCA17 (Fig. S5).
We conclude that PCA has the potential to reveal drift-diffusion on single trials in most of our experiments. This might seem surprising initially, as the PCs are derived from trial-averaged firing rates, which suppress the diffusion component. However, the PCs assign weight to the neurons that represent the drift, and those averages comprise drift-diffusion on single trials. Yet, like the choice decoder, which also mimics diffusion, PCA (and dPCA) does not expose the simplicity of the organization around the Tin neurons (Fig. S7). An inquisitive experimenter could discover this organization by noticing a visual response to the targets and a perisaccadic response associated with Tin choices.
Is there any feature of the data that might provide a clue to the data analyst about the importance of the Tin neurons? Although it is not obvious on single sessions, pooling the eight sessions reveals that the largest weights of d, the choice-decoder and PC1 are assigned to Tin neurons (Fig. S7). However, this does not reveal the underlying logic. Inspired by Kiani et al.18, we conducted a spectral analysis of the pairwise correlation matrix using spike counts in a short epoch before the onset of RDM (Fig. 4d). Clusters of positively correlated neurons are apparent along the main diagonal. We define Cluster-1 (arrow) as the cluster with the strongest single-trial correlations with the Tin average firing rate. Thus, it is unsurprising that the single-trial traces from Cluster-1 correlate strongly with the Tin average firing rate (Fig. 4e). In the example session, this cluster also stands out as the one with the largest average correlation, but this was not a reliable indicator across sessions. That said, 9 of the 16 neurons in this cluster are Tin neurons, and across all sessions, 43 ± 11% of Tin neurons are in Cluster-1 (Table S2). Thus, without prior knowledge of the Tin neurons, spectral clustering might bring them to our attention. Many are in the clusters with the stronger average correlations.
Taken together, we conclude that even without a hypothesis, it is possible to discover drift-diffusion and its status as a decision variable. Absent knowledge of Tin, one could recognize that it is represented by a low (one) dimensional subspace of the neuronal state space. While accurate, the characterization would obfuscate the simple biological organization.
Discussion
We have directly observed the neural representation of a stochastic decision variable: the accumulation of noisy evidence that determines choice and RT on single decisions. The signal is evident in the mean firing rate of small pools of neurons with response fields that overlap a choice target (Tin neurons) and also in the weighted sum of the full population of 54–203 sampled neurons, the neuronal state space (NSS), which includes these Tin neurons. It has the hallmarks of a decision variable: (i) it is correlated with both choice and RT, (ii) the magnitude of the correlations increases over time, as the decision is forming, (iii) later samples of the signal mediate correlations of earlier samples, (iv) its time-varying mean and variance (for each motion strength) are consistent with a drift-diffusion process, and (v) although not shown here, the same signals lead to termination of the decision process19. We focused on the early epoch of decision formation because this is where the drift-diffusion signal is least distorted by the stopping bound, trial dropout, and response features associated with saccade initiation19.
The finding validates the conclusions from single-neuron studies, which inferred one-dimensional diffusion dynamics using indirect methods12,15. It also complements the choice-decoding strategy applied in Peixoto et al. (Ref. 13) to population recordings from the prefrontal and motor cortex. Choice decoders, like ours and Peixoto’s, are machine learning tools designed to predict the choice, but they are not the same as the actual DV. Unlike a choice-decoder, a real DV does not represent the probability of a choice but the signal that determines the choice and RT. The choice and RT are stochastic because the drift-diffusion signal is the accumulation of noisy evidence. While the stochastic drift-diffusion cannot be detected in trial-averaged firing rates, we can now recognize that the coding of a stochastic quantity conforms to the same principles as the encoding of simpler, deterministic quantities, such as direction selectivity—that is, pooling of spikes from many weakly correlated neurons with similar selectivity20,21 (but see Ref. 22). This explains why the sample of Tin neurons mediates strongly and also why it does not mediate completely, as these are only a sample of the full population.
A decision variable is just one of many neural representations that underlies a single instance of a mental process. The capacity to record simultaneously from an ever increasing number of neurons opens the possibility of studying the mechanisms underlying such instances. In this spirit we are encouraged by the capacity of purely data driven methods to “discover” diffusion. However, none of the methods we tried would identify the important Tin neurons. That is, they do not reveal the spatial organization of the response fields. Based on the experience here, one could imagine a path to their discovery by testing neurons identified by correlation-based clustering or large weights in PC1, say. For example one might notice that the responses of such neurons are correlated with the single trial diffusion signal rendered by PC1, and in addition they respond to both the onset of the choice targets and just preceding contraversive saccades. It would be natural to map the neural response fields and recapitulate the single neuron studies that discovered spatially selective persistent activity14. Thus, while they do not guarantee discovery of the underlying neural organization, data driven approaches may point to observations that lead to discovery through hypothesis-driven inquiry and experiments.
These hypothesis-driven steps are essential, in our view, if we desire an explanation of brain function in the language of biology, in addition to mathematics. The present findings identify the DV with a particular functional cell type in LIP. The ability to observe the idiosyncratic, time-varying brain signal that gives rise to one decision at a moment in time opens a new lens to view neural computations that are only revealed on single instances. For example, the capacity to observe a decision variable on a single trial opens the possibility of elucidating interactions between LIP and other nodes of the network involved in decision making. As we show in the companion paper19, single-trial firing rates in functionally connected Tin neurons in LIP and the superior colliculus perform distinct computations, which form and terminate the decision, respectively. One wonders what biological parsimony might hide in other neuronal state spaces.
Methods
Ethical approval declarations
Two adult male rhesus monkeys (Macaca mulatta) were used in the experiments. All training, surgery, and experimental procedures complied with guidelines from the National Institutes of Health and were approved by the Institutional Animal Care and Use Committee at Columbia University. A head post and recording chamber were implanted using aseptic surgical procedures.
Behavioral tasks
The monkeys were trained to interact with visual stimuli presented on a CRT video monitor (Vision Master 1451, iiyama; viewing distance 57 cm; frame rate 75 Hz). They were trained to control their gaze and make saccadic eye movements to peripheral targets to receive a liquid reward (juice). The direction of gaze was monitored by an infrared camera (EyeLink 1000; SR Research, Ottawa, Canada; 1 kHz sampling rate). The tasks involve stages separated by random delays, distributed as truncated exponential distributions

$$f(t) = \begin{cases} \dfrac{\alpha}{\lambda}\, e^{-(t - t_{\min})/\lambda}, & t_{\min} \le t \le t_{\max} \\ 0, & \text{otherwise,} \end{cases}$$

where tmin and tmax define the range, λ is the time constant (in seconds), and α is chosen to ensure the total probability is unity. Below, we report the range and the exponential parameter λ. Note that because of truncation, the expectation 𝔼(t) < tmin + λ.
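Delays of this kind can be drawn by inverting the cumulative distribution. The sketch below assumes the density is proportional to exp(−(t − tmin)/λ) on [tmin, tmax] and zero elsewhere; the function names are hypothetical, and the example range and λ are those reported below for the wait before RDM onset.

```python
import math
import random


def sample_truncated_exponential(t_min, t_max, lam, rng):
    """Draw one delay from an exponential (time constant lam) truncated to [t_min, t_max]."""
    span = t_max - t_min
    u = rng.random()
    # Inverse CDF of the truncated exponential.
    return t_min - lam * math.log(1.0 - u * (1.0 - math.exp(-span / lam)))


def truncated_exponential_mean(t_min, t_max, lam):
    """Analytic expectation; it is below t_min + lam because of the truncation."""
    span = t_max - t_min
    tail = math.exp(-span / lam)
    return t_min + lam - span * tail / (1.0 - tail)


rng = random.Random(0)
delays = [sample_truncated_exponential(0.25, 0.7, 0.15, rng) for _ in range(20000)]
sample_mean = sum(delays) / len(delays)
expected_mean = truncated_exponential_mean(0.25, 0.7, 0.15)
```

For the range 0.25–0.7 s with λ = 0.15, the analytic mean is roughly 0.376 s, below tmin + λ = 0.4 s, which illustrates the inequality stated above.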
In the random dot motion (RDM) task, monkeys are trained to make decisions about the net direction of motion in a dynamic random dot motion display. The RDM is confined to a circular aperture (diameter 5 dva; degrees visual angle) centered on the fixation point (dot density 16.7 dots·dva⁻²·s⁻¹). The task flow is shown in Fig. 1a. The random wait from onset of the choice targets to onset of the RDM is drawn from a truncated exponential (0.25–0.7 s, λ = 0.15). The direction (bottom left or top right) and strength of motion are determined randomly from ±{0, 3.2, 6.4, 12.8, 25.6, 51.2}% coherence (coh). The sign of the coherence indicates direction (positive for leftward). The values control the probability that a dot plotted on frame n will be displaced by Δx on frame n + 3 (Δt = 40 ms), as opposed to randomly replaced. The displacement is consistent with velocity 5 dva·s⁻¹ (see Ref. 7 for additional details). The monkey is rewarded for choosing the correct target (trials with 0% coh were rewarded randomly). Errors are punished by extending the intertrial interval by up to 3 s (see Ref. 19 for additional details). On approximately half of the trials, a 100 ms pulse of weak motion (±4% coh) was added to the RDM stimulus at a random time (0.1–0.8 s, λ = 0.4) relative to RDM onset (similar to Ref. 23). Monkey M performed 9684 trials (5 sessions); monkey J performed 8142 trials (3 sessions). The data are also analyzed in a companion paper19.
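The dot-update rule and the implied displacement can be made concrete with a short sketch. This is an illustration under stated assumptions, not the stimulus code used in the experiments: motion is simplified to the horizontal axis with positive coherence moving dots leftward, dots that would leave the aperture are replotted at random, and all names are hypothetical.

```python
import math
import random

FRAME_RATE_HZ = 75.0        # monitor frame rate (from the text)
FRAME_LAG = 3               # a dot on frame n is replotted on frame n + 3
SPEED_DVA_PER_S = 5.0       # dot speed (from the text)
APERTURE_RADIUS_DVA = 2.5   # aperture diameter is 5 dva

DT_S = FRAME_LAG / FRAME_RATE_HZ    # 0.04 s between successive plots of a dot
DX_DVA = SPEED_DVA_PER_S * DT_S     # 0.2 dva displacement per update


def random_position(rng):
    """Uniform random location inside the circular aperture."""
    radius = APERTURE_RADIUS_DVA * math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    return radius * math.cos(angle), radius * math.sin(angle)


def next_position(pos, coh, rng):
    """Return the dot position three frames later and whether it moved coherently.

    coh is signed coherence as a fraction (e.g. 0.512 for 51.2%).
    Assumption: positive coherence displaces dots in the -x (leftward) direction.
    """
    if rng.random() < abs(coh):
        step = -DX_DVA if coh > 0 else DX_DVA
        x, y = pos[0] + step, pos[1]
        if math.hypot(x, y) <= APERTURE_RADIUS_DVA:
            return (x, y), True
        # Assumption: a dot that would leave the aperture is replotted at random.
    return random_position(rng), False


rng = random.Random(0)
dots = [random_position(rng) for _ in range(2000)]
updated = [next_position(p, 0.512, rng) for p in dots]
fraction_coherent = sum(moved for _, moved in updated) / len(updated)
```

The arithmetic confirms the values in the text: three frames at 75 Hz span 40 ms, and a speed of 5 dva·s⁻¹ over 40 ms gives Δx = 0.2 dva.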
In the oculomotor delayed response task (ODR;14,24), one target was flashed briefly (200 ms) at a pseudorandom location in the visual field. After a variable delay (0.4–1.1 s, λ = 0.3 for monkey M; 0.5–1.5 s, λ = 0.2 for monkey J), the fixation point was extinguished and the monkey made a saccade to the remembered location of the target (within ±2.5 dva) to receive a reward. This task was conducted to provide a rough characterization of the neural response fields during the visual, perisaccadic and delay epochs. It was also used to assess the stability of the recording over the session. Neurons were designated Tin if they exhibited spatially selective persistent activity at the location of the response target in the visual hemifield contralateral to the recorded hemisphere. The example session contained 17 Tin neurons, and we refer to their unweighted mean firing rate as the Tin average firing rate. These analyses were conducted post hoc, after spike sorting.
Behavioral analyses
We fit a variant of the drift-diffusion model (Fig. 1c) to the choice-RT data from each session. Details of the model and the fitting method are described in Ref. 25. The model constructs the decision process as a race between two accumulators: one accumulating evidence for the Tin choice and against the Tout choice (e.g., left minus right) and one accumulating evidence for a Tout choice and against a Tin choice (e.g., right minus left). The decision (Tin or Tout) is determined by the accumulator that first exceeds its positive decision bound, at which point the decision is terminated. The races are negatively correlated with one another, owing to the common source of noisy evidence. We assume they share half the variance, but the results are robust to a wide range of reasonable values. The decision bounds are allowed to collapse linearly as a function of time, such that

$$B(t) = B_0 - \alpha t,$$

where B0 is the initial bound height and α is the rate of collapse.
We used the method of images to compute the probability density of the accumulated evidence (x) for each accumulator as a function of time (t) using a time-step of 1 ms. We assumed that x = 0 at t = 0. The decision time distributions rendered by the model were convolved with a Gaussian non-decision time distribution, which summarizes sensory and motor delays, to generate the predicted RT distributions. The model has 6 parameters in total: κ, B0, α, µnd, σnd, and C0, where κ determines the scaling of motion strength to drift rate, C0 implements bias in units of signed coherence26, µnd is the mean non-decision time and σnd is its standard deviation.
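The race architecture described above can be illustrated with a small Monte Carlo simulation. This is only a sketch under stated assumptions and is not the method of images used for the fits: the parameter values are arbitrary, the correlation between the races is assumed to be −√0.5 (one reading of "share half the variance"), the bound is assumed to collapse as B0 − αt and to be floored at zero, and positive coherence is assumed to favor Tin. The function name `simulate_race` is hypothetical.

```python
import numpy as np

def simulate_race(coh, kappa=15.0, b0=1.0, alpha=0.3, c0=0.0,
                  rho=-np.sqrt(0.5), dt=0.001, t_max=3.0,
                  n_trials=1000, seed=0):
    """Simulate two anticorrelated accumulators racing to a collapsing bound.

    Returns (choice, decision_time): choice is 1.0 for Tin, 0.0 for Tout and
    NaN if neither accumulator reached the bound before t_max.
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    drift = kappa * (coh + c0) * np.array([1.0, -1.0])   # Tin and Tout accumulators
    # Noise with unit variance per second and correlation rho between the races.
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]) * dt)
    x = np.zeros((n_trials, 2))                           # x = 0 at t = 0
    choice = np.full(n_trials, np.nan)
    decision_time = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(round(t_max / dt)) + 1):
        t = step * dt
        bound = max(b0 - alpha * t, 0.0)                  # assumed linear collapse
        noise = rng.standard_normal((n_trials, 2)) @ chol.T
        x[active] += drift * dt + noise[active]
        crossed = active & (x.max(axis=1) >= bound)
        choice[crossed] = x[crossed, 0] >= x[crossed, 1]  # larger accumulator wins
        decision_time[crossed] = t
        active &= ~crossed
        if not active.any():
            break
    return choice, decision_time
```

To obtain simulated reaction times, one would add a Gaussian non-decision time (mean µnd, standard deviation σnd) to each decision time, mirroring the convolution described in the text.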
For analytic tractability using the method of images, the model has no lower bound. This leads to a distortion of the mean and variance for the strongly negative coherences (given that spike rates cannot go negative). Therefore, for plotting purposes only (Fig. 3a), we implemented a simple version of a lower bound (at −B0). That is, having obtained the distribution of the DV for all time points, any density below the lower bound was placed at the lower bound before we calculated the means and variances across time.
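The lower-bound adjustment amounts to moving probability mass on a discretized DV grid. A minimal sketch for a single time point follows, assuming the density is sampled on an ascending grid `x` and that the mass is placed on the first grid point at or above the bound; both function names are hypothetical.

```python
import numpy as np

def clamp_density_at_lower_bound(x, p, lower):
    """Move all probability mass below `lower` onto the grid point at the bound.

    x: ascending grid of DV values; p: density (or mass) on that grid.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float).copy()
    idx = np.searchsorted(x, lower, side="left")   # first grid point with x >= lower
    p[idx] += p[:idx].sum()                        # pile up the mass from below
    p[:idx] = 0.0
    return p

def mean_and_variance(x, p):
    """Mean and variance of a discretized density (p need not be normalized)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(p, dtype=float) / np.sum(p)
    mu = np.sum(w * x)
    return mu, np.sum(w * (x - mu) ** 2)
```

Applying `clamp_density_at_lower_bound` with `lower = -B0` at every time step, and then `mean_and_variance`, would yield the kind of mean and variance traces described for plotting.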
Neurophysiology
We used prototype “alpha” version Neuropixels1.0-NHP45 probes (IMEC/HHMI-Janelia) to record the activity of multiple isolated single units from the ventral subdivision of area LIP (LIPv;27). We used anatomical MRI to identify LIPv and confirmed its physiological hallmarks with single-neuron recordings (Thomas Recording GmbH) before proceeding to multi-neuron recordings. Neuropixels1.0-NHP45 probes enable recording from 384 of the 4416 total electrical contacts distributed along the 45 mm long shank. All data presented here were recorded using the 384 contacts closest to the tip of the probe (Bank 0), spanning 3.84 mm. Reference and ground signals were directly connected to each other, and connected to the monkey’s headpost. A total of 1084 neurons were recorded over eight sessions (54–203 neurons per session; Table 1).
*Example session; FR, mean firing rate (sp/s) 75–125 ms after motion onset, across trials and Tin neurons; excluded, refers to neurons with weights set to zero in some state space analyses (see Methods)
The Neuropixels1.0-NHP45 probe is connected to a standard commercially-available headstage for the Neuropixels1.0 probes, connected via the standard Neuropixels1.0 5m cable to the PCI eXtensions for Instrumentation (PXIe) hardware (PXIe-1071 chassis and PXI-6141 and PXIe-8381 I/O modules, National Instruments). Raw data were acquired using the SpikeGLX software (http://billkarsh.github.io/SpikeGLX/), and single-units were identified offline using the Kilosort 2.0 algorithm28,29, followed by manual curation using Phy (https://github.com/cortexlab/phy). The spike times were then synchronized with task events acquired by the experimental control system (Rex30) and OmniPlex (Plexon Inc.).
Neural data analysis
The spike times from each neuron are represented as delta functions of discrete time (1 kHz) and convolved with a 50 ms boxcar filter to achieve a coarse representation of firing rate as a function of time, sij(t), for each trial i and neuron j. We used an 80 ms boxcar filter to calculate the correlation between neuronal firing rate and behavior (Fig. 2d–f) because it led to more robust estimates of summary statistics (e.g., %-mediation for different choices of early and late time points).
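The smoothing step can be sketched as follows. The example assumes a centered (acausal) boxcar, which the text does not specify, and the function name `boxcar_rate` is hypothetical.

```python
import numpy as np

def boxcar_rate(spike_times_s, t_start, t_stop, width_ms=50, fs=1000):
    """Firing rate (sp/s) from spike times using a centered boxcar filter.

    Spikes are binned as delta functions at fs Hz (1 kHz by default) and
    convolved with a boxcar whose area is 1, so the output is in spikes/s.
    """
    n_bins = int(round((t_stop - t_start) * fs))
    train = np.zeros(n_bins)
    idx = np.floor((np.asarray(spike_times_s, dtype=float) - t_start) * fs).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]
    np.add.at(train, idx, 1.0)                          # one delta per spike
    width = int(round(width_ms * fs / 1000))            # filter width in samples
    kernel = np.ones(width) * (fs / width)              # each spike adds 1/width_s sp/s
    return np.convolve(train, kernel, mode="same")
```

With a 50 ms window, a single isolated spike contributes 20 sp/s to each of 50 consecutive 1 ms bins; passing `width_ms=80` gives the wider filter used for the correlation analyses.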
For analyses that use standardized signals (derivation of d, PCA, and choice decoding), we expressed the rates as z-scores based on the mean and standard deviation of the firing rate in a window centered 100 ms after motion onset (75–125 ms). In some sessions, this led to the exclusion of a small number of very low firing-rate neurons that did not show any activity on any trial in the normalization window (Table 1). Those neurons were assigned zero weight and do not contribute to the degrees of freedom in statistical tests. In the example session, this eliminated 5 neurons, thereby reducing the effective number of neurons from 191 to 186. The same uniformly weighted 50 ms filter was used for data visualization, with the exception of examples of single-trial activity (Fig. 2c, Fig. 3d, Fig. 4b–d), for which the raw data were convolved with a truncated Gaussian filter generated using the MATLAB function gausswin (width = 80 ms, width-factor = 1.5, σ ≈ 26 ms).
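The standardization and the Gaussian filter can be sketched in NumPy. Two assumptions are made here: the mean and standard deviation are taken across trials of each neuron's average rate in the 75–125 ms window, and `gausswin` below re-implements the window formula documented for the MATLAB function of the same name rather than calling it. With 80 samples at 1 kHz and a width factor of 1.5, that formula gives σ = (80 − 1) / (2 × 1.5) ≈ 26.3 ms, consistent with the value quoted in the text.

```python
import numpy as np

def zscore_rates(rates, t_ms, window=(75, 125)):
    """Z-score each neuron using its firing rate in a window after motion onset.

    rates: (trials, neurons, time) array; t_ms: (time,) array of times in ms.
    Returns the standardized rates and a boolean mask of retained neurons.
    """
    in_window = (t_ms >= window[0]) & (t_ms <= window[1])
    baseline = rates[:, :, in_window].mean(axis=2)        # (trials, neurons)
    mu, sd = baseline.mean(axis=0), baseline.std(axis=0)
    keep = sd > 0                                         # silent neurons have sd = 0
    z = np.zeros(rates.shape, dtype=float)                # excluded neurons stay at zero
    z[:, keep, :] = (rates[:, keep, :] - mu[keep][None, :, None]) / sd[keep][None, :, None]
    return z, keep

def gausswin(n_points, alpha):
    """Gaussian window w(n) = exp(-0.5 * (alpha * n / ((N - 1) / 2))**2)."""
    half = (n_points - 1) / 2.0
    n = np.arange(n_points) - half
    return np.exp(-0.5 * (alpha * n / half) ** 2)

kernel = gausswin(80, alpha=1.5)
kernel /= kernel.sum()                  # unit area, so filtering preserves the rate scale
sigma_ms = (80 - 1) / (2 * 1.5)         # about 26.3 ms at 1 sample per ms
```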
The expected mean and variance of the decision variable as a function of time and coherence (µc(t) and σc²(t), respectively) were furnished by the fits of the behavioral model (see above). The model parameters govern the probability density of the DV for each coherence and time, and µc(t) and σc²(t) were computed using the density unabsorbed by the decision bounds. We then found the direction in neuronal state space (NSS), defined by the weight vector d, for which the projections onto it, Sd(t), best fit µc(t) and σc²(t). Goodness of fit was quantified using the K-L divergence between the predicted and fitted distributions of the DV. For simplicity, we calculated K-L divergence under the assumption that these distributions are Gaussian:

$$D_{\mathrm{KL}}(p \parallel q) = \log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2},$$

where p is the DV distribution given by the behavioral model fit and q is the DV distribution given by Sd. The weights comprising d were optimized using a quasi-Newton search algorithm (fminunc, MATLAB). The weights were applied to the normalized activity of each neuron and the fits were evaluated within the epoch 200–400 ms after motion onset. Because it is always the case that µp(0) = 0 and σp²(0) = 0, we enforced that the same be true for µq(0) and σq²(0) by subtracting Sd(0) from Sd(t) on each trial. We initially encouraged sparsity in the weight vector by adding a λ-weighted penalty term to the cost function (similar to L1 regularization), where λ was optimized through cross-validation. As results were robust to changes in λ, final weight vectors were fit without a regularization term. To determine to what degree the representation of this diffusion process was driven by the activity of the Tin neurons, we removed all Tin neurons from the population and then refit the weight vector. We refer to the projection of the data onto this NSS direction as Sd−. We refer to the projection of the population response on d as Sd and use Sd(t) to render single trial firing rates (in arbitrary units, owing to standardization).
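The cost that such a fit minimizes can be sketched as below, using the standard closed form for the K-L divergence between two univariate Gaussians. The data layout, the use of the first time sample as t = 0, the summation over coherences and time points, and both function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p) and q = N(mu_q, var_q)."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def projection_cost(d, z, coh, mu_model, var_model):
    """Summed KL divergence between model DV moments and moments of S_d(t).

    d: (neurons,) weight vector; z: (trials, neurons, time) standardized rates;
    coh: (trials,) signed coherence; mu_model, var_model: dicts mapping each
    coherence to a (time,) array of model means and variances.
    """
    s = np.einsum("ijt,j->it", z, d)            # project population activity onto d
    s = s - s[:, :1]                            # enforce S_d(0) = 0 on every trial
    cost = 0.0
    for c in np.unique(coh):
        sc = s[coh == c, 1:]                    # skip t = 0, where both variances are zero
        cost += gaussian_kl(mu_model[c][1:], var_model[c][1:],
                            sc.mean(axis=0), sc.var(axis=0)).sum()
    return cost
```

A function of this form could be handed to a quasi-Newton optimizer, for example `scipy.optimize.minimize` with `method="BFGS"`, as a rough Python counterpart to the MATLAB fminunc search mentioned in the text.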
Spatial selectivity of d
To create the heat map in Fig. 3f we used the mean of Sd(t) during the memory-delay period of the ODR task (300–600 ms after target onset). Equivalent maps for all sessions capturing the visual response to the target (100–300 ms after target onset), the delay period activity and activity just before the saccade (250 to 50 ms before the saccade) are shown in Fig. S2.
Choice decoder
For each experimental session, we trained a linear choice decoder on 50% of trials in sliding 50 ms windows between 100 and 500 ms after motion onset. Decoder accuracy was cross-validated using the activity of held-out trials at the same time point. We additionally computed a logistic regression of and
activity against choice at each time point and compared their cross-validated choice prediction accuracy against the performance of the choice decoder and the theoretical peak decoding performance determined through simulations (Fig. 4a). To evaluate whether the optimal choice decoder represented a one-dimensional signal, we tested whether decoding accuracy was stable using the same weights over time16. Decoder weights generated for one time point were therefore applied to the held-out neural data at all other time points to recompute the decoding accuracy. With such time-independence established (Fig. 4b, left), a decoder dimension was generated by applying the decoder weights fit to the activity at t = 0.35 s after motion onset to all trials and time points.
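The cross-time test of the decoder can be sketched as a train-at-one-time, test-at-all-times matrix. The text specifies only a linear decoder, so the L2-regularized logistic regression fit by gradient descent used here is an assumption, as are the function names and hyperparameters.

```python
import numpy as np

def fit_logistic(x, y, l2=1e-2, n_iter=500, lr=0.1):
    """Logistic regression by gradient descent; x: (trials, neurons), y in {0, 1}."""
    xb = np.column_stack([np.ones(len(x)), x])            # prepend an intercept column
    w = np.zeros(xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        grad = xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]   # no penalty on intercept
        w -= lr * grad
    return w

def accuracy(w, x, y):
    """Fraction of trials whose choice is predicted correctly by weights w."""
    xb = np.column_stack([np.ones(len(x)), x])
    return np.mean((xb @ w > 0) == (y == 1))

def cross_time_accuracy(z, choice, train_idx, test_idx):
    """acc[i, j]: decoder trained at time bin i, tested on held-out trials at bin j.

    z: (trials, neurons, time bins) standardized activity; choice: (trials,) in {0, 1}.
    """
    n_bins = z.shape[2]
    acc = np.zeros((n_bins, n_bins))
    for i in range(n_bins):
        w = fit_logistic(z[train_idx, :, i], choice[train_idx])
        for j in range(n_bins):
            acc[i, j] = accuracy(w, z[test_idx, :, j], choice[test_idx])
    return acc
```

If the off-diagonal entries of the resulting matrix are close to the diagonal ones, a single set of weights decodes choice about equally well at all times, which is the property the text uses to justify a single decoder dimension.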
Principal Components Analysis
We performed PCA on the normalized within-coherence mean activity from 200 ms after motion onset to the earlier of two events: 500 ms after motion onset or median RT minus 100 ms. Visual inspection of the Principal Components (PC) indicated that this yielded typically one or a small number of PCs that represented an evidence-dependent rise in activity after motion-stimulus onset. To determine which PC was most likely to reflect decision-related activity on single trials, we computed the correlations between projections of the neural data onto the first ten PCs with . In seven out of the eight experimental sessions, the first PC yielded the greatest correlation coefficient, highlighting that not only the average but also trial-to-trial variations in
resembles the main signal embedded in LIP activity in this task. Neural data projected onto this PC was then evaluated against our criteria for a decision variable (see below).
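The PCA step can be sketched with a singular value decomposition of the coherence-averaged activity. The data layout and function names are assumptions, and the time axis is assumed to have been cropped to the analysis epoch beforehand.

```python
import numpy as np

def condition_mean_pca(z, coh, n_components=10):
    """PCA on within-coherence mean activity.

    z: (trials, neurons, time) standardized rates; coh: (trials,) signed coherence.
    Returns (components, explained_variance_ratio); components has shape
    (n_components, neurons), or fewer rows if there are fewer neurons.
    """
    means = np.stack([z[coh == c].mean(axis=0) for c in np.unique(coh)])   # (coh, neurons, time)
    x = means.transpose(0, 2, 1).reshape(-1, z.shape[1])   # rows: coherence x time samples
    x = x - x.mean(axis=0)                                 # center each neuron
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    var = s ** 2
    return vt[:n_components], (var / var.sum())[:n_components]

def project(z, component):
    """Single-trial projection of population activity onto one component."""
    return np.einsum("ijt,j->it", z, component)
```

The single-trial traces returned by `project` are what one would then correlate with a reference signal to choose among the leading components.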
Spectral clustering
We used spectral clustering to group neurons into four clusters, using the correlation matrix between neuron pairs to construct the similarity matrix. For each neuron and trial, we counted the number of spikes in a time window between −200 and 0 ms relative to motion onset. We then calculated the Pearson correlation coefficient (ρij), using these spike counts, for every pair of neurons i and j. The similarity (affinity) matrix was defined as ρij − min(ρ), where min(ρ) is the smallest element of the correlation matrix. k-means clustering with 4 components was used to cluster the eigenvalues of the (random-walk normalized) Laplacian matrix31. We again computed the correlations between the mean activity in each cluster with to determine which cluster was the most likely to yield diffusion dynamics. The mean activity of the most-highly correlated cluster was used to generate a spectral clustering dimension (Cluster-1).
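A minimal sketch of this clustering procedure is given below. It follows the common convention of running k-means on the rows of the leading eigenvectors of the random-walk normalized Laplacian, which is an assumption about the implementation. The k-means routine with farthest-point initialization and both function names are also illustrative choices, not the authors' code.

```python
import numpy as np

def kmeans(x, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clusters(counts, k=4):
    """Cluster neurons from spike counts of shape (trials, neurons).

    Similarity is rho_ij - min(rho); the embedding uses the k eigenvectors of
    the random-walk Laplacian with the smallest eigenvalues.
    """
    rho = np.corrcoef(counts, rowvar=False)
    affinity = rho - rho.min()
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # Solve the symmetric problem, then map eigenvectors to the random-walk form.
    l_sym = np.eye(len(rho)) - affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, u = np.linalg.eigh(l_sym)                 # eigenvalues in ascending order
    embedding = d_inv_sqrt[:, None] * u[:, :k]
    return kmeans(embedding, k)
```

Neurons with no spikes in the counting window would produce undefined correlations and would need to be excluded before calling `spectral_clusters`.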
Demixed principal component analysis (dPCA)
We use dPCA17 to reduce the dimensionality of the data while attempting to separate the motion-strength dependent components. For each neuron, we calculated the mean spike-counts across trials within coherence (in 10 ms bins) in the epoch 200–500 ms from motion onset. These signals were z-scored within neuron before we derived 5 dPCs using Matlab code provided by Ref. 17 (http://github.com/machenslab/dPCA). We identified the component (dPC1) that rendered coherence-dependent mean signals (Fig. S8a).
Correlation between single-trial activity and behavior
We used measures of correlation between the value of single-trial signals and both the RT and choice on that trial, restricting to trials with reaction times between 0.67 and 2 s. Single-trial signal refers to the projections of the neural population activity onto vectors of weights (e.g., Sd(t)). For Tin neurons and Cluster-1 (Fig. 4d) it is the average firing rate of the neural responses as a function of time. We computed the Pearson correlation, ρ(t), between reaction time and residual neural activity (mean-subtracted within motion coherence in a sliding 80-ms time window) on trials that resulted in a Tin choice. Negative correlations indicate that greater neural activity at a given time point was predictive of earlier Tin choices. We also compute the partial correlation conditional on the residual at t = 0.55 s. The reduction in R2 induced by the later sample is the %-mediation, which we limit to the range 0–100%.
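The RT mediation statistic can be sketched as follows. The example assumes that the reduction in R2 is expressed as 100 × (1 − ρ²partial / ρ²), which is one reading of the description above; the function names are hypothetical.

```python
import numpy as np

def residualize(values, coh):
    """Subtract the mean within each signed coherence."""
    out = np.asarray(values, dtype=float).copy()
    for c in np.unique(coh):
        out[coh == c] -= out[coh == c].mean()
    return out

def partial_corr(x, y, z):
    """Pearson correlation between x and y after removing the linear effect of z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def percent_mediation(s_early, s_late, rt):
    """Percent reduction in R^2 between s_early and RT when conditioning on s_late."""
    r = np.corrcoef(s_early, rt)[0, 1]
    r_partial = partial_corr(s_early, rt, s_late)
    return float(np.clip(100.0 * (1.0 - r_partial ** 2 / r ** 2), 0.0, 100.0))
```

In use, `s_early` and `s_late` would be the coherence-residualized signal at an early time point and at t = 0.55 s on Tin-choice trials; values near 100% indicate that the later sample accounts for nearly all of the leverage of the earlier sample on RT.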
Correlation with a Tin choice was established using the coefficient, β1, in logistic regression (same restriction on RT):
Analogous to partial correlation, we included the later time point and fit
and defined mediation as
As represented here, the mediation of the earlier S(t) is by a later sample of the same signal (self). We prefer the simplicity of this definition of mediation to others based on pseudo-R2. For both types of mediation, we also test whether the earlier S(t) is mediated by the Tin neurons by substituting
for S(0.55) in Eq. 6.
The box-whisker plots (Figs. 2 to 4) summarize the distribution of %-mediation statistics for t = 0.35 s across experimental sessions. To determine statistical significance, we compare the %-mediation of data pooled across sessions against a null distribution generated by permuting, within session and signed coherence, the trials used for the later time point (N = 1000).
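The permutation null can be sketched as a shuffle of the later-time samples within each combination of session and signed coherence. The generic `stat_fn` argument, the one-sided p-value with a +1 correction, and all function names are assumptions made for illustration.

```python
import numpy as np

def permute_within_groups(values, session, coh, rng):
    """Shuffle values among trials sharing the same session and signed coherence."""
    values = np.asarray(values)
    keys = np.stack([np.asarray(session, dtype=float),
                     np.asarray(coh, dtype=float)], axis=1)
    out = values.copy()
    for key in np.unique(keys, axis=0):
        idx = np.flatnonzero(np.all(keys == key, axis=1))
        out[idx] = values[rng.permutation(idx)]
    return out

def mediation_null(stat_fn, s_early, s_late, rt, session, coh, n_perm=1000, seed=0):
    """Null distribution of stat_fn(s_early, permuted s_late, rt)."""
    rng = np.random.default_rng(seed)
    return np.array([stat_fn(s_early, permute_within_groups(s_late, session, coh, rng), rt)
                     for _ in range(n_perm)])

def permutation_p_value(observed, null):
    """One-sided p-value: fraction of null statistics at least as large as observed."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))
```

Shuffling within session and coherence preserves the coherence-dependent structure of the later sample while breaking its trial-by-trial pairing with the earlier sample and the behavior.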
Similarity to 
To quantify the similarity of the stochastic (diffusion) component of to other single-trial signals, we compute the correlation coefficient between the pair of detrended signals produced on the same trial, from t = 0.2 s to t = RT − 0.1. Detrending removes the mean response of trials of the same signed motion coherence. The r-values from each trial are summarized by histograms in Figs. 3 and 4. We used a one-tailed Kolmogorov-Smirnov (KS) test to compare histograms.
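The within-trial similarity measure can be sketched as below. The sketch assumes that each signal is stored as a trials × time array covering t = 0.2 s onward, with samples after RT − 0.1 s set to NaN; the function names and the minimum sample count are hypothetical.

```python
import numpy as np

def detrend_by_coherence(s, coh):
    """Subtract, at each time point, the mean across trials of the same signed coherence.

    s: (trials, time) array; samples outside a trial's analysis window are NaN.
    """
    out = np.asarray(s, dtype=float).copy()
    for c in np.unique(coh):
        out[coh == c] -= np.nanmean(out[coh == c], axis=0)
    return out

def within_trial_correlations(a, b, min_samples=3):
    """Pearson r between two detrended signals, computed separately on each trial."""
    r = np.full(len(a), np.nan)
    for i in range(len(a)):
        ok = ~np.isnan(a[i]) & ~np.isnan(b[i])
        if ok.sum() >= min_samples:
            r[i] = np.corrcoef(a[i, ok], b[i, ok])[0, 1]
    return r
```

The resulting per-trial r-values are what would be collected into histograms; two such sets could then be compared with a one-tailed two-sample Kolmogorov-Smirnov test, for example `scipy.stats.ks_2samp` with its `alternative` argument.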
Author contributions
G.M.S. and M.N.S. designed the experiment. N.A.S., G.M.S. and E.M.T. implemented the experimental set-up. G.M.S. collected the data. N.A.S., G.M.S., A.Z., D.M.W. and M.N.S. analyzed the data. All authors wrote the manuscript.
Additional information
Supplementary Information is available for this paper. Correspondence and requests for materials should be addressed to shadlen{at}columbia.edu.
Extended Data
Activity in the diffusion direction d during three time windows of the memory saccade task is plotted as a heat map as a function of target location. From left to right, each row contains maps of the visual response to the target (100–300 ms after target onset), the delay period activity during which the monkeys are remembering the location of the target no longer shown on the screen (300–600 ms after target onset) and activity just before the saccade (250 to 50 ms before the saccade). Filled red circles indicate the locations of the response targets during the RDM task, and open white circles indicate all other target locations probed during the mapping task.
a, Same as in Fig. 4a. The weights at t = 0.35 s define a direction in neuronal state space which is used to construct a one-dimensional signal, S(t). b, Correlation and partial correlation S(t) with RT for RT>0.67 s, as in Fig. 3g. c, Leverage of S(t) on choice, as in Fig. 3h. Partial mediation by later sample of S(t) is apparent in seven sessions; partial mediation by is apparent in six (p < 0.001).
a, Distribution of weights for Tin (copper) and non-Tin (blue) neurons, combined across all sessions. Columns address the same analysis method: diffusion as in (Fig. 3), first Principal Component (Fig. 4c), first spectral clustering component (Fig. 4d), first demixed dPC (Fig. S8), and weights derived from the logistic choice-decoder (Fig. 4a,b). The weights usually follow a bell-shaped distribution, with the exception of the spectral clustering analysis for which the weights are binary. b, Probability that a neuron assigned a weight (abscissa) is a Tin neuron. The curve is logistic regression.
a, Mean activity projected onto dPC1 aligned to motion stimulus onset sorted by coherence and motion direction. b-c, Mediation analyses using the mean activity projected onto dPC1, as in Fig. S4.
Median within-trial correlation between and five measures of decision-related activity.
Intersection between the Tin population and Spectral Cluster-1. nN, number of neurons; nC1, number of cells in Cluster-1; % C1, percentage of cells in Cluster-1; % Tin in C1, percentage of the Tin cells that are in Cluster-1; % C1 in Tin, percentage of the Cluster-1 cells that are Tin neurons.
Acknowledgments
We thank Shushruth for comments on the manuscript, Cornel Duhaney and Brian Madeira for their assistance in the planning and execution of surgeries, animal training and general support, and Columbia University’s ICM for the quality of care they provide for our animals, especially during the pandemic and lockdown. We would further like to thank Tanya Tabachnik and her team at the Zuckerman Institute Advanced Instrumentation Core and Tim Harris, Wei-lung Sun, Jennifer Colonell, and Bill Karsh at HHMI Janelia for their continued support with the development and testing of the Neuropixels1.0-NHP45 probes. This research was supported by the Howard Hughes Medical Institute; an R01 grant from the NIH BRAIN Initiative (M.N.S., R01NS113113); T32 and F31 grants from the National Eye Institute (G.M.S., T32 EY013933, F31 EY032791); the Grossman Center; and the Brain and Behavior Research Foundation.