Neural population dynamics in dorsal premotor cortex underlying a reach decision

We investigated whether a dynamical systems approach could help explain the link between decision-related neural activity and decision-making behavior, a fundamentally unresolved problem. The dynamical systems approach posits that neural dynamics can be parameterized by a state equation that has different initial conditions and evolves in time by combining, at each time step, recurrent dynamics and inputs. For decisions, the two key predictions of the dynamical systems approach are that 1) initial conditions substantially predict subsequent dynamics and behavior and 2) inputs combine with initial conditions to produce different choice-related dynamics. We tested these predictions by investigating neural population dynamics in the dorsal premotor cortex (PMd) of monkeys performing a red-green reaction time (RT) checkerboard discrimination task in which we varied the sensory evidence (i.e., the inputs). Prestimulus neural state, a proxy for the initial condition, predicted poststimulus neural trajectories and showed organized covariation with RT. Furthermore, faster RTs were associated with faster pre- and poststimulus dynamics as compared to slower RTs, and these effects were observed within a stimulus difficulty. Poststimulus dynamics depended on both the sensory evidence and the initial condition, with easier stimuli and “fast” initial conditions leading to the fastest choice-related dynamics, whereas harder stimuli and “slow” initial conditions led to the slowest dynamics. Finally, changes in initial condition were related to the outcome of the previous trial, with slower pre- and poststimulus population dynamics and RTs on trials following an error as compared to trials following a correct response. Together, these results suggest that decision-related activity in PMd is well described by a dynamical system in which inputs combine with initial conditions that covary with eventual RT and previous outcome to induce decision-related dynamics.


Introduction
and depends on the outcome of the previous trial, respectively (Purcell and Kiani, 2016

Figure 1: Initial conditions and inputs predict subsequent neural dynamics and behavior. (A) The initial condition hypothesis from delayed reach experiments (Afshar et al., 2011) posits that the position and velocity of a neural state at the time of the 'go' cue ("initial condition") negatively correlate with RT. That is, for faster RT trials, the neural state at the time of the go cue 1) is further along ("position") relative to the mean neural trajectory and thus closer to the movement initiation state and 2) has a greater rate of change in the direction of the mean neural trajectory ("velocity"). (B) The neural population state at the end of a perceived time interval and a gain modifier actuates the initial conditions (Set, circles), determining the speed (arrows) of subsequent dynamics and therefore when an action is produced (Go, X's) (Remington et al., 2018b). (C & D) Prestimulus neural activity differs for speed and accuracy contingencies in speed-accuracy tradeoff tasks (Heitz and Schall, 2012) or after correct and error trials (Thura et al., 2017). (E) Biased initial conditions predict both RT and choice (X₀ ∼ RT, choice) and combine with sensory evidence to lead to decisions. Initial neural states vary trial-to-trial and are closer to the movement onset state for one choice (here left). Trials with neural states closer to a left movement onset at stimulus onset will have faster RTs, and RTs will be slower for right choices. Trial outcomes have no effect on initial conditions in this model, as initial conditions largely reflect a reach bias. (F) Initial conditions solely predict RT (X₀ ∼ RT). The position of the initial condition before checkerboard onset is closer to a movement initiation state and the velocity of the dynamics is faster for fast RTs compared to slow RTs. Previous outcomes shift these initial conditions such that the dynamics are either faster or slower, leading to faster or slower RTs, respectively. Overall dynamics depend on both the initial conditions and the sensory evidence. Current population state at stimulus onset/go cue (dots within an ellipse) evolves along trajectories of varying speed (color bars in A & E; apply to A, B, E and F) as set by the initial conditions (A) and also inputs after stimulus onset (E & F). In E and F, light/dark opacity of the arrowhead indicates weak/strong stimulus input.

Figure 2: Monkeys can discriminate red-green checkerboards and demonstrate rich variability in RTs between and within stimulus coherences. (A) An illustration of the setup for the behavioral task. We loosely restrained the arm the monkey was not using with a plastic tube and cloth sling. A reflective infrared bead was taped on the middle digit of the active hand to be tracked in 3D space. We used the measured hand position to mimic a touch screen and to provide an estimate of instantaneous arm position; eye position was tracked using an infrared reflective mirror placed in front of the monkey's nose. (B) Timeline of the discrimination task. (C) Examples of different stimuli used in the experiment, parameterized by the color coherence of the checkerboard cue. Positive values of signed coherence (SC) denote more red (R) than green (G) squares and vice versa. (D) Psychometric curves, percent responded red, and (E) RTs (correct and incorrect trials) as a function of the percent SC of the checkerboard cue, over sessions of the two monkeys (T: 75 sessions; O: 66 sessions). Dark orange markers show measured data points along with 2 × SEM estimated over sessions (error bars lie within the marker for many data points). The black line segments are drawn between these measured data points to guide the eye. Discrimination thresholds, measured as the color coherence level at which the monkey made 81.6% correct choices, are also indicated. Thresholds were estimated using a fit based on the cumulative Weibull distribution function. (F) Box-and-whisker plot of RT as a function of unsigned checkerboard coherence, with outliers plotted as blue circles. Note the large RT variability within and across coherences. (G) The recording location, caudal PMd (PMdc), indicated on a macaque brain, adapted from Ghazanfar and Santos (2004). Single and multi-units in PMdc were primarily recorded by a 16-electrode (150-µm interelectrode spacing) U-probe (Plexon, Inc., Dallas, TX, United States); example recording depicted.
(ranging from ∼1500 ms to ∼3500 ms). Using timeouts for errors encouraged animals to prioritize accuracy

Figure 3: Firing rates of a heterogeneous population of PMd neurons are modulated by the input (i.e., strength of the sensory evidence), and the initial condition (prestimulus firing rate) covaries with RT before stimulus onset. (A, B) Firing rate activity across (A) 7 levels of color coherence and (B) 11 RT bins and both action choices (right: dashed, left: solid) of 5 example units in PMd from monkeys T and O, aligned to stimulus onset (Cue/vertical dashed black line). Firing rates are plotted until the median RT of each color coherence and until the midpoint of each RT bin (notice slightly different lengths of lines). Color bars indicate the level of difficulty for the coherence (violet: mostly one color, orange: nearly even split of red and green squares) or RT speed. Gray shading is SEM. In A, the firing rates separate faster for easier choices compared to harder choices, and in B, the same neurons show prestimulus modulation as a function of RT (X₀ ∼ RT).

trials (including both correct and wrong trials) across all the different stimulus coherences and sorted by RT and choice before averaging.

To identify the number of relevant dimensions for describing these data, we used a principled approach developed in Machens et al. (2010) (see 4.10 for details). Firing rates on every trial in PMd during this task can be thought of as consisting of a combination of signal (i.e., various task-related variables) and noise contributions from sources outside the task, such as spiking noise. Trial averaging reduces this noise, but when PCA is performed it nevertheless returns a principal component (PC) space that captures variance in firing rates due to the signal and variance due to residual noise ("signal+noise PCA"). Ideally, we only want to assess the contributions of the signal to the PCA, but this is not possible for trial-averaged or non-simultaneously recorded data. To circumvent this issue and determine the number of signal-associated dimensions, the method developed in Machens et al. (2010) estimates the noise contributions by performing a PCA on the difference between single-trial estimates of firing rates, to obtain a "noise" PCA. Components from the signal+noise PCA and the noise PCA were compared such that only signal+noise dimensions that were significantly greater than the noise dimensions were included in further analyses (the cutoff was identified as the first point where the signal+noise variance was significantly lower than the noise variance by at least 3 × SEM). The assumption here is that the dimensions above the noise are largely dominated by the signal and the dimensions below the noise are largely noise dimensions. This analysis yielded six PCs that explained > 90% of the variance in firing rates (Fig. S1).

Note that such covariation between prestimulus neural state and RT was not an artifact of pooling across all the different stimulus difficulties and was observed even within a level of stimulus coherence (note similarities between Fig. 4B & Fig. 6C). We discuss this further in section 2.6, where we analyze the joint effects of inputs and initial conditions.
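The comparison between the signal+noise PCA and the noise PCA described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Machens et al. (2010) or of section 4.10: the function names, the `n_trials` rescaling of the single-trial difference, and the simple "first component at or below the noise spectrum" cutoff (in place of the 3 × SEM criterion) are all hypothetical simplifications.

```python
import numpy as np

def pca_variances(data):
    """Eigenvalues (descending) of the covariance of data (samples x neurons)."""
    centered = data - data.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def count_signal_dims(avg_rates, trial_a, trial_b, n_trials):
    """Count leading PCs whose variance exceeds the matching noise PC.

    avg_rates: trial-averaged rates (signal plus residual noise).
    trial_a, trial_b: two single-trial estimates for the same samples.
    """
    signal_noise = pca_variances(avg_rates)
    # Differencing two single trials of the same condition cancels the signal.
    # ASSUMPTION: dividing by sqrt(2 * n_trials) rescales that difference to the
    # noise expected to survive an average over n_trials trials.
    noise = pca_variances((trial_a - trial_b) / np.sqrt(2 * n_trials))
    # ASSUMPTION: simple cutoff at the first component not above the noise.
    below = np.nonzero(signal_noise <= noise)[0]
    return int(below[0]) if below.size else signal_noise.size
```

With real data, `avg_rates` would hold the condition-averaged firing rates stacked over time and conditions, and the returned count would play the role of the six retained dimensions reported above.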

Collectively, the visualization using PCA suggests that prestimulus state predicts poststimulus dynamics and covaries with RT but not the eventual choice. In subsequent sections, we used various analyses to further assess whether these data could be understood through the lens of a dynamical system that has varying initial conditions and inputs. We first examined how initial conditions control the dynamics of decision-making, and then how they combined with inputs to drive decisions.

The dynamical systems perspective predicts that poststimulus dynamics and behavior depend upon the position and velocity of prestimulus neural trajectories in state space (i.e., initial conditions) (Vyas et al., 2020a). Position is the instantaneous location in a high-dimensional state space of neural activity (i.e., firing rate of neurons) and velocity is a directional measure of how fast these positions are changing over time (i.e., directional rate of change from one neural state to the next). We used the Kinematic analysis of Neural Trajectories (KiNeT) approach recently developed by Remington et al. (2018b)

Figure 4: Prestimulus population firing rates covary with RT. (A) The first four PCs (PC1 to PC4) of trial-averaged firing rates organized across 11 RT bins (violet: fastest bin; orange: slowest bin) and both reach directions (right: dashed lines; left: solid lines), aligned to checkerboard onset. Percent variance explained by each PC is indicated at the top of each plot. (B) State space trajectories of the 1st, 2nd and 4th PCs (PC1, PC2, PC4) aligned to checkerboard onset (red dots). Prestimulus neural activity robustly separates as a function of RT bin. Diamonds and squares, color matched to their respective trajectories, indicate 250 ms post-checkerboard onset and 20 ms time steps, respectively. Note that faster RT trajectories appear to move faster in the prestimulus period than slower RTs ("fast/slow prestim", also see G). (C) "KiNeT distance" analysis showing that trajectories are consistently spatially organized before and after stimulus onset and correlated with RT. (D) Angle between the subspace vector at each timepoint and the subspace vector at the first timepoint (−400 ms). The angle between subspace vectors is largely consistent, but the space rotates as choice signals emerge (green highlight box). (E) Average relative angle between adjacent trajectories for each timepoint. The angles between adjacent trajectories were largely less than 90° for the prestimulus period but approach orthogonality as choice signals emerge poststimulus. (F) "KiNeT Time to reference" (t_ref, relative time at which a trajectory reaches the closest point in Euclidean space to the reference trajectory) analysis shows that trajectories for faster RTs reach similar points on the reference trajectory (cyan, middle trajectory) earlier than trajectories for slower RTs. This result suggests that the dynamics for faster RTs are closer to a movement initiation state than those for slower RTs. (G) Average scalar speed for the prestimulus period (−400 to 0 ms epoch) as a function of RT bin. Firing rates across the population change faster (both increases and decreases) for faster RTs and slower for slower RTs. Error bars are bootstrap SEM. (H) Choice selectivity signal measured as the Euclidean distance in the first six dimensions between the two reach directions for each RT bin, aligned to checkerboard onset. The rate at which choice selectivity (CS) emerges is faster for faster RTs compared to slower RTs (green highlight box). In C & F the x-axis is labelled "Time (ms)"; this should be understood as time on the reference trajectory. Abbreviations: Cue and vertical black dashed line, checkerboard onset; a.u., arbitrary units.
First, we used KiNeT to assess whether the position of the initial condition was related to RT. If the position of the initial condition covaries with RT, then we would expect a lawful ordering of neural trajectories organized by RT bin; otherwise, they would lie one on top of the other, indicating a lack of spatial organization. Thus, we examined the spatial ordering of six-dimensional neural trajectories grouped by RT bins for each reach direction. We estimated the signed minimum Euclidean distance at each point for the trajectory relative to a reference trajectory (the middle RT bin, cyan, for that reach direction, Fig. 4C). Trajectories were 1) organized by RT, with trajectories for faster and slower RT bins on opposite sides of the reference trajectory, and 2) the relative ordering of the Euclidean distance with respect to the reference trajectory was also lawfully related to RT (Fig. 4C), as measured by a correlation between RT and the signed Euclidean distance at 100 ms before checkerboard onset (r = −0.85, p = 9.45 × 10⁻⁴). These data are consistent with the prediction that the position of the initial condition correlates with RT.
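The signed minimum Euclidean distance to a reference trajectory can be sketched as below. This is an illustrative sketch rather than the KiNeT implementation of Remington et al. (2018b): the function name `kinet_distance` is hypothetical, and the sign convention (positive when the offset points from the first toward the last RT-ordered condition) is an assumption made only so the example is complete.

```python
import numpy as np

def kinet_distance(trajs, ref_idx):
    """Signed minimum distance of each trajectory to the reference trajectory.

    trajs: array (n_conditions, n_times, n_dims), conditions ordered by RT bin.
    Returns an array (n_conditions, n_times) indexed by time on the reference.
    """
    ref = trajs[ref_idx]
    # ASSUMED sign convention: positive when the offset from the reference points
    # from the first (fastest) condition toward the last (slowest) condition.
    sign_axis = trajs[-1] - trajs[0]
    signed = np.zeros(trajs.shape[:2])
    for c, traj in enumerate(trajs):
        # Distance from every reference point (rows) to every point on traj (columns).
        pairwise = np.linalg.norm(ref[:, None, :] - traj[None, :, :], axis=2)
        nearest = pairwise.argmin(axis=1)
        offset = traj[nearest] - ref
        sign = np.sign(np.sum(offset * sign_axis, axis=1))
        signed[c] = sign * pairwise.min(axis=1)
    return signed
```

Given such an array, the correlation reported above would correspond to something like `np.corrcoef(rt_bin_centers, signed[:, t_index])[0, 1]` at the time index of interest, where `rt_bin_centers` and `t_index` are placeholders for the RT bin values and the chosen prestimulus sample.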

Second, we examined if the relative ordering of trajectories by RT in the prestimulus period predicted the ordering

Third, we examined whether the velocity of the peristimulus dynamics was faster for faster RTs compared to slower RTs. For this purpose, we used KiNeT to find the timepoint at which the position of a trajectory is closest (minimum Euclidean distance) to the reference trajectory, which we call time to reference (t_ref, Fig. 4F). Trajectories slower than the reference trajectory will reach the minimum Euclidean distance relative to the reference trajectory later [...] to as speed in Remington et al. (2018b). Although a trajectory could reach the closest point to the reference trajectory later due to a slower speed, it could also be due to unrelated factors such as starting in a position in state space further from movement onset or taking a more meandering path through state space. All of these effects are consistent with a longer t_ref and a slower velocity, but not necessarily a slower speed.
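The distinction drawn above between time to reference and scalar speed can be illustrated with a short sketch. The function names are hypothetical and the code is a simplified stand-in for KiNeT, assuming trajectories sampled on a common time base: `time_to_reference` returns, for each point on the reference, the time at which another trajectory passes closest to it, while `mean_scalar_speed` measures path length per unit time regardless of direction.

```python
import numpy as np

def time_to_reference(traj, ref, times):
    """For each reference point, the time at which traj is closest to it.

    traj, ref: arrays (n_times, n_dims); times: array (n_times,).
    """
    pairwise = np.linalg.norm(ref[:, None, :] - traj[None, :, :], axis=2)
    return times[pairwise.argmin(axis=1)]

def mean_scalar_speed(traj, dt):
    """Average Euclidean displacement per unit time along the trajectory."""
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return steps.mean() / dt

# Toy example: three trajectories moving along the same line at different rates.
times = np.arange(10.0)
ref = np.column_stack([times, np.zeros(10)])
fast = np.column_stack([2.0 * times, np.zeros(10)])
slow = np.column_stack([0.5 * times, np.zeros(10)])
t_ref_fast = time_to_reference(fast, ref, times)
t_ref_slow = time_to_reference(slow, ref, times)
```

In this toy case the fast trajectory reaches each reference point earlier and the slow one later, and the two measures agree. A meandering trajectory could have a long t_ref while its scalar speed is high, which is the caveat noted above.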

KiNeT revealed that faster RTs involved faster pre- and poststimulus dynamics, whereas slower RTs involved slower dynamics, as compared to the reference trajectory (the trajectory associated with the middle RT bin, cyan; Fig. 4F). There was also a positive correlation between RT bin and t_ref as measured by KiNeT at 100 ms before checkerboard onset (r = 0.82, p = 1.96 × 10⁻³). Additionally, we found that the overall scalar speed of trajectories in the prestimulus state for the first six dimensions (measured as a change in Euclidean distance over time and averaged over the 400 ms prestimulus period) covaried lawfully with RT (Fig. 4G). Thus, the 'velocity' of the initial condition, relative to the reference trajectory, is faster for faster RTs compared to slower RTs, consistent with the prediction of the initial condition hypothesis (Afshar et al., 2011).

Collectively, these results firmly establish that the initial condition in PMd correlates with RT and that the geometry and dynamics of these decision-related trajectories strongly depend on the position and 'velocity' of the initial condition, consistent with the hypothesis shown in Fig. 1F (Afshar et al., 2011).

The previous analyses demonstrated that initial conditions strongly covaried with RT, consistent with the hypothesis shown in Fig. 1F. Does the initial condition also predict choice? To investigate this issue, we first examined the covariation of prestimulus and poststimulus state with choice by measuring a choice selectivity signal, defined as the Euclidean distance between the left and right choices in the first six dimensions at each timepoint. The choice selectivity signal was largely flat during the prestimulus period and increased only after stimulus onset (Fig. 4H). We also found that slower RT trials had delayed and slower increases in the choice selectivity signal compared to the faster RTs, a result consistent with the slower overall dynamics for slower compared to faster RTs (Fig. 4H). Consistent with this observation, we found a negative correlation between the average choice selectivity signal in the 125 to 375 ms period after checkerboard onset and RT (r = −0.87, p = 4.22 × 10⁻⁴).
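The choice selectivity signal and its window average can be written compactly. The sketch below assumes left- and right-choice trajectories already projected into the same low-dimensional space and sampled at the same times; the function names and the toy data are hypothetical.

```python
import numpy as np

def choice_selectivity(left, right):
    """Euclidean distance between left and right trajectories at each timepoint.

    left, right: arrays (n_times, n_dims) in the same PC space.
    """
    return np.linalg.norm(left - right, axis=1)

def window_mean(signal, times, start, stop):
    """Average of signal over start <= time <= stop (same units as times)."""
    mask = (times >= start) & (times <= stop)
    return signal[mask].mean()

# Toy example: trajectories coincide before onset and separate afterwards.
times = np.arange(-400, 401, 25)
left = np.zeros((times.size, 6))
right = np.zeros((times.size, 6))
right[times > 0, 0] = 3.0
right[times > 0, 1] = 4.0
cs = choice_selectivity(left, right)
post = window_mean(cs, times, 125, 375)
pre = window_mean(cs, times, -400, 0)
```

Computing `window_mean` once per RT bin and correlating the results with the bin RTs (for example with `np.corrcoef`) would give a statistic of the kind reported above.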

Note that the prediction of RT by spiking activity was not just an artifact of RT covarying with the coherence. A linear regression with binned spiking activity and coherence as predictors explained significantly more variance in RTs in all prestimulus bins than a linear regression of RTs with coherence as the sole predictor (only the last prestimulus bin is reported here; mean ± SD: 13.66 ± 8.9% vs. 6.32 ± 5.97%; Wilcoxon rank sum comparing median R², p = 2.97 × 10⁻⁹, Fig. 5E). Therefore, nearly equal amounts of RT variance are explained by prestimulus neural spiking activity (∼7%) and the coherence of the eventual stimulus (6.32%, Fig. 5E).
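The nested-model comparison above (coherence alone versus coherence plus binned spiking activity) can be sketched with ordinary least squares. This is a minimal illustration on synthetic data: the function name `r_squared` is hypothetical, R² is computed in-sample, and whether the original analysis used cross-validation or additional preprocessing is not stated here, so those details are assumptions.

```python
import numpy as np

def r_squared(predictors, rt):
    """In-sample R^2 of a linear regression (with intercept) of rt on predictors."""
    design = np.column_stack([np.ones(len(rt)), predictors])
    beta, *_ = np.linalg.lstsq(design, rt, rcond=None)
    residual = rt - design @ beta
    return 1.0 - np.sum(residual**2) / np.sum((rt - rt.mean()) ** 2)

# Synthetic example: RT depends on coherence and on three units' spike counts.
rng = np.random.default_rng(1)
coherence = rng.uniform(0.0, 1.0, 200)
spikes = rng.standard_normal((200, 3))
rt = 400.0 - 50.0 * coherence + spikes @ np.array([20.0, -10.0, 5.0])

r2_full = r_squared(np.column_stack([spikes, coherence]), rt)
r2_coherence = r_squared(coherence, rt)
extra_variance = r2_full - r2_coherence  # variance attributable to spiking activity
```

The difference between the two R² values is the quantity that corresponds to the roughly 7% of RT variance attributed to prestimulus spiking activity in the text.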

In contrast, a logistic regression using binned spiking activity failed to predict choice during the prestimulus period with accuracy above the 99th percentile of accuracy from a logistic regression using trial-shuffled spiking activity (Fig. 5D). Similar logistic regressions were built for each session, and accuracy was averaged across bins and sessions. The average prestimulus accuracy in predicting choice (Fig. 5F) was no better than the 99th percentile of averaged prestimulus accuracy from similar logistic regressions built on trial-shuffled spiking activity (mean ± SD: 50.08 ± 0.51% vs. 50.00 ± 0.03%; only one session was larger than the shuffled data out of 51 comparisons, one-tailed binomial test, p > 0.999, Fig. S2B). Prestimulus spiking activity was no better than chance at predicting eventual choice even when trials were grouped by RT bins (Fig. S2C)

Figure 5: Single-trial analysis, linear regression, and decoders reveal that initial conditions predict RT but not choice. (A/B) LFADS (Pandarinath et al., 2018) trajectories in the space of the first three orthogonalized factors (X₁, X₂, X₃), obtained via PCA on LFADS latents, plotted for (A) the fastest 30% of trials (blue) and the slowest 30% of trials (red) for one reach and (B) for left (purple) and right (green) reaches, all for the easiest coherence from a single session (23 units). Each trajectory is plotted from 200 ms before checkerboard onset (dots) to movement onset (diamonds). (C/D) Variance explained (R²)/decoding accuracy from linear/logistic regressions of binned spiking activity (20 ms bins) and coherence to predict trial-matched RTs/eventual choice from all 23 units in the LFADS session shown in A/B. The magenta and light green dotted lines are the 99th and 1st percentiles of R²/accuracy values calculated from averaged models of trial-shuffled (shuffled 500 times) spiking activity and RTs/choice. (E/F) R²/accuracy values, calculated as in C/D, averaged across 51 sessions. 6.32% is the average percentage of variance explained across the 51 sessions for both monkeys by regressions using stimulus coherence to predict RTs. Orange shaded area is SEM. 50% accuracy in D/F is denoted by the black dotted line.
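The shuffle control used for the choice decoder (comparing decoding accuracy against the 99th percentile of accuracies from trial-shuffled data) can be sketched as follows. Everything here is a simplified assumption: the logistic regression is a bare gradient-descent fit scored in-sample, labels rather than spiking activity are permuted (which equally breaks the trial pairing), and the function names are hypothetical.

```python
import numpy as np

def logistic_accuracy(features, labels, n_steps=500, learning_rate=0.1):
    """In-sample accuracy of a logistic regression fit by gradient descent.

    features: array (n_trials, n_features); labels: array of 0.0/1.0 choices.
    """
    design = np.column_stack([np.ones(len(labels)), features])
    weights = np.zeros(design.shape[1])
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-design @ weights))
        weights -= learning_rate * design.T @ (prob - labels) / len(labels)
    predicted = design @ weights > 0
    return np.mean(predicted == (labels == 1))

def shuffle_threshold(score_fn, features, labels, n_shuffles=500,
                      percentile=99, seed=0):
    """Percentile of scores obtained after breaking the trial pairing."""
    rng = np.random.default_rng(seed)
    null_scores = [score_fn(features, rng.permutation(labels))
                   for _ in range(n_shuffles)]
    return np.percentile(null_scores, percentile)
```

A decoder is then considered informative for a given time bin only if `logistic_accuracy(features, labels)` exceeds `shuffle_threshold(logistic_accuracy, features, labels)`; the prestimulus result above corresponds to the case where it does not.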
key line of evidence in support of the hypothesis outlined in Fig. 1F that initial conditions covary with RT but not choice.

2.6. Inputs and initial conditions both contribute to the speed of poststimulus decision-related dynamics

Thus far we have shown that the initial conditions predict RT but not choice. Our monkeys clearly demonstrate choice behavior that depends on the sensory evidence, and are also generally slower for harder compared to easier checkerboards. These behavioral results and the dynamical systems approach make two key predictions: 1) sensory evidence (i.e., the input) should modulate the rate at which choice selectivity emerges after stimulus onset and 2) the overall dynamics of the choice selectivity signal should depend on both sensory evidence and initial conditions.

To test the first prediction, we performed two analyses. First, we performed a PCA on firing rates of PMd neurons organized by stimulus coherence and choice. Fig. 6A shows the state space trajectories for the first three components. In this space, activity separates faster for easier compared to harder coherences. Consistent with this visualization, choice selectivity increases faster for easier compared to harder coherences (Fig. 6B). These results suggest that poststimulus dynamics are at least in part controlled by the sensory input, consistent with the predictions of the dynamical systems hypothesis.

To test the second prediction, namely how sensory evidence and initial conditions jointly impact the speed of poststimulus dynamics, we performed a PCA of PMd firing rates conditioned on RT and choice within a coherence. To obtain these trajectories, we first calculated trial-averaged firing rates for the various RT bins within each coherence. We then projected these firing rates into the first six dimensions of the PC space organized by choice and RTs (Fig. 4A & B). This projection preserved more than 90% of the variance captured by the first six dimensions of the data organized by RT bins and choice within a coherence, which ranged from 75 to 80% of the total variance of the data for a given coherence. Consistent with the results in Fig. 4B, the prestimulus state again correlates with RT even within a stimulus difficulty (Fig. 6C).

To assess how inputs and initial conditions jointly influenced decision-related dynamics, we again computed the [...] the choice selectivity signal is more delayed for the slower RTs compared to the faster RTs, while a similar slope effect is still observed (i.e., steeper slope for fast RTs as compared to slow RTs) (Fig. 6D, bottom panel). These plots suggest that inputs and initial conditions combine and alter the rate and latency of choice-related dynamics.

We quantified these patterns by first measuring the rate at which choice selectivity emerges. Our metric was the average choice selectivity signal in the 200 ms period from 125 to 325 ms after checkerboard onset as a function of the initial condition and for each of the 7 coherences. We obtained an estimate of the initial condition by using a PCA to project the average six-dimensional location in state space in the period from 300 ms to 100 ms before checkerboard onset for each of these conditions onto a one-dimensional axis (see 4.14). As Fig. 6E shows, the rate at which the choice selectivity signal emerges is greater for easier coherences across the board but is also weaker or stronger depending on the initial condition. Furthermore, when coherence is fixed, the average rate of the choice selectivity signal depends on the initial condition. A partial correlation analysis found that the rate at which choice selectivity emerges depends on both the initial condition (r = 0.85, p < 0.001) and the sensory evidence (r = −0.38, p < 0.001). These results are key evidence that choice-selective, decision-related dynamics are controlled both by the initial condition and the sensory evidence.

We also measured the latency at which choice selectivity emerged and how it depended on initial condition and sensory inputs. To estimate latency, we fit the choice selectivity signal (CS(t)) using a piecewise function as

selectivity signal (averaged over the time period from 125 to 375 ms after checkerboard onset) as a function of the initial condition within each coherence. As expected, easier coherences lead to higher choice selectivity signals regardless of RT, but the rates and the latencies of this signal depend on the initial condition as well as sensory evidence. (F) Latency of the choice selectivity signal as a function of the initial condition and for each stimulus coherence. As expected from D, the latency is largely flat for the easier coherences and faster RT bins (regardless of coherence), but slower for the harder coherences. For clarity, only four of the seven coherences are shown in E & F.

input and the initial condition. Latencies are slower when the initial condition is in the slow RT state and the sensory input is weak, but faster for strong inputs and when the initial condition is in a fast RT state. Again, a partial correlation analysis found that the latency of choice selectivity depends on both the initial condition (r = −0.60, p < 0.001) and coherence (r = 0.44, p < 0.001).
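The partial correlation analyses reported above ask how strongly one variable (for example, the initial condition) relates to the choice selectivity rate or latency once the other variable (coherence) is held fixed. A minimal sketch, assuming a Pearson-type partial correlation computed from regression residuals (the specific variant used in the original analysis is not stated here) and a hypothetical function name:

```python
import numpy as np

def partial_corr(x, y, covariate):
    """Correlation between x and y after linearly removing covariate from both."""
    design = np.column_stack([np.ones(len(covariate)), covariate])
    x_resid = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(x_resid, y_resid)[0, 1]

# Synthetic example: a "rate" that depends on both predictors with opposite signs.
rng = np.random.default_rng(2)
initial_condition = rng.standard_normal(50)
coherence = rng.standard_normal(50)
rate = 2.0 * initial_condition - 3.0 * coherence

r_ic = partial_corr(initial_condition, rate, coherence)
r_coh = partial_corr(coherence, rate, initial_condition)
```

In this noiseless toy case the two partial correlations are +1 and −1. With real data they would take intermediate values such as those reported above, with one value per predictor.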

Collectively, these results strongly support a dynamical system for decision-making where both initial conditions and inputs together shape the speed of decision-related dynamics and behavior, whereas poststimulus dynamics alone control choice.

So far we have demonstrated that the initial condition, as estimated by prestimulus population spiking activity, explains RT variability and poststimulus dynamics in a decision-making task. However, why initial conditions fluctuate remains unclear. One potential source of prestimulus neural variation could be post-outcome adjustment, where RTs for trials following an error are typically slower or occasionally faster than RTs in trials following a correct response (Danielmeier and Ullsperger, 2011; Purcell and Kiani, 2016; Dutilh et al., 2012).

We examined whether post-outcome adjustment was present in the behavior of our monkeys. We identified all error, correct (EC) sequences and compared them to an equivalent number of correct, correct (CC) sequences. The majority of the data are from sequences of the form "CCEC" (78%), while the remainder of "EC" sequences were compared to other "CC" sequences (22%). Associated RTs were aggregated across both monkeys and sessions. We found that correct trials following an error were significantly slower than correct trials following a correct trial (M ± SD: 447 ± 117 ms vs. 428 ± 103 ms; Wilcoxon rank sum comparing median RTs, p = 8.44 × 10⁻¹³⁶, SFig. 3A). Additionally, we found that correct trials following a correct trial were modestly faster than the correct trial that preceded them (M ± SD: 428 ± 103 ms vs. 431 ± 101 ms; Wilcoxon rank sum comparing median RTs, p = 8.48 × 10⁻⁴, SFig. 3A). Thus, trials where the previous outcome was a correct response led to a trial with a faster RT, whereas trials where the previous outcome was an error led to a trial with a slower RT.
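The behavioral comparison above amounts to grouping correct trials by the outcome of the preceding trial. The sketch below is a simplified, hypothetical version: it collects RTs for all error-correct (EC) and correct-correct (CC) pairs within one session and does not reproduce the matching of CC sequences to EC sequences or the rank-sum test described in the text.

```python
import numpy as np

def post_outcome_rts(correct, rt):
    """Split RTs of correct trials by the outcome of the previous trial.

    correct: sequence of booleans in session order; rt: matching RTs (ms).
    Returns (post_error_rts, post_correct_rts) as arrays.
    """
    post_error, post_correct = [], []
    for previous, current, current_rt in zip(correct[:-1], correct[1:], rt[1:]):
        if not current:
            continue  # only correct trials are compared
        (post_correct if previous else post_error).append(current_rt)
    return np.array(post_error), np.array(post_correct)

# Toy session: the third trial is an error, so the fourth is a post-error trial.
correct = [True, True, False, True, True]
rt = [400.0, 410.0, 500.0, 450.0, 420.0]
post_error, post_correct = post_outcome_rts(correct, rt)
slowing = np.median(post_error) - np.median(post_correct)
```

A positive `slowing` value indicates post-error slowing of the kind reported above.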

Such changes in RT after a previous trial were mirrored by corresponding shifts in initial conditions. A PCA of trial-averaged firing rates organized by previous trial outcome and choice revealed that prestimulus population firing rate covaried with the previous trial's outcome. Post-error correct trials, hereafter post-error trials, showed the largest prestimulus difference in firing rates as compared to other trial outcomes (Fig. 7A, B).

and trial outcome (green: correct trial; cyan: correct trial following a correct trial; red: error trial; magenta: correct trial following an error trial). Percentage variance explained by each PC is presented at the top of each plot. (B) State space of the 1st, 3rd and 4th PCs (PC1, PC3, PC4) aligned to checkerboard onset (red dots). Plotting of PCs extends from 400 ms before to 400 ms after checkerboard onset. Observe how neural activity separates as a function of outcome, but not by choice, up to 400 ms before stimulus onset. Different colored squares and diamonds indicate 20 ms time steps and 280 ms post-checkerboard onset, respectively. (C) "KiNeT distance" analysis demonstrating that trajectories are spatially organized, with post-error trials furthest from other trial types peri-stimulus as compared to a reference trajectory (green, middle trajectory). (D) Accuracy of logistic regression of spiking activity from the current trial used to predict the outcome of the previous trial. Orange outline is SEM. (E) "KiNeT Time to reference" (t_ref) analysis reveals that prestimulus 'velocity' is slower for post-error trials as compared to the reference trajectory (green, middle trajectory). In C & E the x-axis is labelled "Time (ms)"; this should be understood as time on the reference trajectory. Abbreviations: a.u., arbitrary units; *, p < 0.05; #, p = 0.05.

These results strongly suggest that some of the initial condition covariation with RT (Fig. 4) might be related to the previous trial's outcome. To test this hypothesis, we performed two analyses. First, we wanted to know how much of the variance of the firing rate data organized by RT and choice could be accounted for by the subspace spanned by the first six dimensions of the PCA organized by previous trial's outcome and choice (Elsayed et al., 2016; "outcome subspace", Fig. 7A, B). We chose the first six dimensions (which explain ∼90% of the variance) of this outcome subspace as these dimensions were significantly above or equal to noise components (SFig. 3C).

This analysis revealed that 77.19% of the total variance for the firing rates organized by RT and choice was explained by the first six dimensions of the outcome subspace, suggesting that the previous trial's outcome has a large impact in explaining prestimulus firing rate covariation with RT.
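The quantity reported above, the share of variance in one dataset that lies within a subspace defined from another, can be sketched as a projection onto the top principal components. This is a simplified illustration with hypothetical function names; it reports a raw fraction of total variance and does not reproduce any normalization used in Elsayed et al. (2016).

```python
import numpy as np

def top_pcs(data, k):
    """First k principal axes of data (samples x neurons), as columns."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def variance_captured(data, basis):
    """Fraction of the total variance of data lying in span(basis).

    basis: array (neurons x k) with orthonormal columns.
    """
    centered = data - data.mean(axis=0)
    projected = centered @ basis
    return np.sum(projected**2) / np.sum(centered**2)

# Toy example: the "outcome" data vary along neuron 0 only, whereas the
# "RT" data vary equally along neurons 0 and 1.
outcome_data = np.zeros((4, 4))
outcome_data[:, 0] = [2.0, -2.0, 2.0, -2.0]
rt_data = np.zeros((4, 4))
rt_data[:, 0] = [1.0, -1.0, 1.0, -1.0]
rt_data[:, 1] = [1.0, 1.0, -1.0, -1.0]
fraction = variance_captured(rt_data, top_pcs(outcome_data, k=1))
```

In the toy case half of the variance falls in the one-dimensional subspace. With the real data, `k=6` and firing rates organized by outcome and by RT would give a figure comparable to the 77.19% above.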

In a parallel analysis, we performed a dPCA (Kobak et al., 2016) on the population firing rates in the 600 ms before checkerboard onset organized by previous trial's outcome and choice, and another organized by RT and choice. The axis that maximally separated activity as a function of previous trial's outcome and the axis that maximally separated activity as a function of RT demonstrated significant overlap, with an angle of 47.8° between them. These results suggest that the previous trial's outcome leads to a shift in prestimulus dynamics consistent with determining the speed of the dynamics and therefore eventual RTs.
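The angle between two axes in neural state space follows from their normalized dot product. The sketch below is illustrative only: the function name is hypothetical, and taking the absolute value of the dot product (so that the angle lies between 0° and 90°) is an assumption reflecting that the sign of a dPCA axis is arbitrary.

```python
import numpy as np

def angle_between_axes(u, v):
    """Angle in degrees between two axes, ignoring their sign."""
    cosine = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosine, 0.0, 1.0)))

# Toy example in a two-neuron space.
outcome_axis = np.array([1.0, 0.0])
rt_axis = np.array([1.0, 1.0])
angle = angle_between_axes(outcome_axis, rt_axis)
```

Under this convention, 0° means the two axes are identical and 90° means they are orthogonal, so an angle near 48° indicates partial overlap.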

Lastly, we examined whether there were differences in pre- and poststimulus state with respect to choice between the different trial outcomes. Again, the high-dimensional Euclidean distance between left and right choice trajectories was largely flat during the prestimulus period and increased only after stimulus onset for all trial outcomes (SFig. 3D). We also found that the separation between choices increased more slowly for error trials as compared to all other trial outcomes (SFig. 3D).

These findings are consistent with the dynamical systems approach as they demonstrate that initial condition before stimulus onset is dependent upon trial history and that pre- and poststimulus dynamics slow down after errors as compared to after correct trials. Collectively, the past trial's outcome leads to different initial conditions,

Our goal in this study was to rigorously identify a dynamical system for the neural population activity underlying decision-making as recently demonstrated in studies of neural population dynamics related to motor planning and timing (Afshar et al., 2011; Remington et al., 2018b; Shenoy et al., 2013; Vyas et al., 2020a). To this end, we investigated the neural population dynamics in PMd of monkeys performing a red-green RT decision-making task (Chandrasekaran et al., 2017; Coallier et al., 2015). The prestimulus neural state in PMd, a proxy for the initial condition of the dynamical system, was strongly predictive of RT, but not choice. We observed these effects across and within stimulus difficulties and also on single trials. Furthermore, faster RT trials had faster neural dynamics and separate initial conditions from slower RT trials. Additionally, poststimulus, choice-related dynamics were altered by the inputs with easier checkerboards leading to faster dynamics than harder ones.

Finally, these initial conditions and the behavior for a trial depended on the previous trial's outcome, where RTs and prestimulus trajectories were slower for post-error compared to post-correct trials. Together, these results suggest that decision-related neural population dynamics in PMd can be well described by a dynamical system where the speed of the choice (the output of the system) is strongly set by its initial conditions. However, the eventual choice itself is determined by the input and the speed of these choice-related dynamics depends on the initial condition. Finally, the outcome of the trial affects the initial condition of the next trial.

Regardless of species or brain region, an increasingly common finding is that neurons associated with cognition and motor control are often heterogeneous and demonstrate complex time-varying patterns of firing rates and mixed selectivity (Chaisangmongkon et al., 2017; Rigotti et al., 2013; Mante et al., 2013; Machens et al., 2010; Hanks et al., 2015). Simple models or indices, although attractive to define, are often insufficient to summarize the activity of these neural populations (Chaisangmongkon et al., 2017; Mante et al., 2013), and even if one performs explicit model selection on single neurons using specialized models (Latimer et al., 2015), the results can be brittle because of the heterogeneity inherent in these brain regions (Chandrasekaran et al., 2018). The dynamical systems approach addresses this problem by using dimensionality reduction and optimization techniques to understand collective neuronal activity of different brain regions and tasks, generally summarizing large population datasets in orders of magnitude fewer dimensions than were recorded from (Okazawa et al., 2021; Mante et al., 2013; Machens et al., 2010). Here, we demonstrated that >90% of the variance from the firing rate activity of nearly 1,000 neurons in PMd during decisions could be explained in just a few (six) dimensions.

Besides providing a compact description of population activity, there are three other clear advances afforded by using a dynamical systems approach to study decisions. First, we find lawful relationships between the low-dimensional activity of neural populations and task variables such as choice, RT, stimulus difficulty and past outcomes (Mante et al., 2013; Okazawa et al., 2021). Second, this lawful relationship can be understood as emerging from a dynamical system that is parameterized by initial conditions and inputs that subsumes much of decision-making behavior (Vyas et al., 2020a). Finally, this dynamical system naturally bridges previously [...]

In the remainder of the discussion, we further discuss the implications of our identified dynamical system for decision-making models, and unpack the factors that may underlie initial conditions.

Our results are an important and significant advance over a previous study of dynamics in PMd during reach planning (Afshar et al., 2011). As described previously in this study, Afshar et al. (2011) showed that in a delayed reach task, the position and velocity of the initial conditions correlated with RT. However, it was unclear from that study what the role of inputs was and how changes in initial conditions emerge across trials. Our study answers both questions and provides a clear account of how both initial conditions and inputs jointly control the dynamics in PMd, a key brain region involved in mapping sensory cues to actions (Kurata and Hoffman, 1994).

The sensory evidence, which acts as the input, combines with initial conditions to determine the choice of the monkeys and also alters the speed of the choice. We also demonstrated that changes in initial conditions emerge due to the outcome of the previous trial, with errors leading to large shifts in the initial condition and significantly altering subsequent dynamics.

Our results, mainly that decision-related neural activity and behavior are well described by a dynamical system dependent upon both initial conditions and inputs, are inconsistent with simple drift diffusion models (DDMs) where decision-making behavior is solely driven to a bound by accumulation of sensory evidence (Ratcliff, 1978; Ratcliff et al., 2016; Hawkins et al., 2015). Including variable drift rates and starting points in a DDM would be insufficient towards recapitulating prestimulus decision-related signals that covary with RT. Variable non-decision times could potentially explain the RT behavior reported here. However, the neural effect of a change in non-decision time is thought to relate to changes in the initial latency of decision-related responses and does not predict changes in the prestimulus neural state. Thus, while simple DDMs with a variable non-decision time may explain the behavior observed herein, they would fail to recreate the observed variability in the initial condition.

We believe that cognitive process models with an additive or multiplicative stimulus-independent gain signal, previously described as "urgency" and successfully used to describe monkey behavior and neural activity (Cisek et al., 2009; Cowley et al., 2020; Murphy et al., 2016), could faithfully model the behavior and the neural dynamics. A variable additive gain signal, which adds inputs to accumulators for left and right choices in a race model for decisions, would lead to different initial conditions and thus faster dynamics for faster RTs and slower dynamics for slower RTs (Murphy et al., 2016). Similarly, a multiplicative gain signal would also lead to differences in the initial firing rates and control the speed of decision-making behavior (Murphy et al., 2016; Cisek et al., 2009). Both types of gain signals generate similar predictions about RT and choice behavior and are often difficult to distinguish using trial-averaged firing rates as done here. One way to resolve this impasse would be to employ single-trial analysis (Peixoto et al., 2021) of neural responses in multiple brain areas using a task paradigm that dispenses sensory evidence over the course of a trial such as in the tokens or pulses task (Hanks et al., 2015).

Typically, researchers have focused on the slowing down of responses after an error, a phenomenon termed post-error slowing (Dutilh et al., 2012; Purcell and Kiani, 2016). However, our findings suggest that both correct and error outcomes can influence the pre- and poststimulus decision-making neural dynamics on subsequent trials, suggesting that post-error slowing could be better understood under the umbrella of post-outcome adjustments (Danielmeier and Ullsperger, 2011). It is currently unclear how these post-outcome adjustments in PMd emerge.

One possibility is that these adjustments emerge from the internal dynamics of PMd itself. Errors vs. correct trials could lead to a shift in the initial condition due to recurrent dynamics that occur in PMd due to the presence or absence of reward. Such error-related signals have been observed in premotor and motor cortex and have even been used to augment brain computer interfaces (Even-Chen et al., 2017). Alternatively, the changes observed in PMd could emerge from inputs from other brain areas such as the anterior cingulate cortex (ACC), which is known to monitor trial outcome (Hyman et al., 2013), or the supplementary motor area (SMA), which has been implicated in timing of motor actions and evaluative signals related to outcome (Bogacz et al., 2010; Ullsperger et al., 2014). Simultaneous recordings in PMd and these brain areas are necessary to tease apart the contribution, if any, of these areas to the initial condition changes observed in PMd.

Barring Fig. 5 and Fig. S3, our description of decision-related dynamics largely focused on trial-averaged activity. Even with such a constraint we were able to identify that the position of initial conditions predicts RT and is modified by the outcome of the previous trial, and that the dynamics for faster RT trials are further along the movement initiation path compared to slower RT trials. We also demonstrated that sensory inputs combined with initial conditions to alter the speed of dynamics and drive choice-related behavior. We believe that even further insights will be available using single-trial analysis. In particular, here we were unable to fully characterize the relative contributions of the position of the initial condition and the velocity of the initial condition to decision-related dynamics and behavior. We anticipate that further analyses of the curvature, velocity relative to the mean trajectory, path length, and speed of the trajectories will lead to an even better description of the single-trial dynamics underlying decisions, as has been done for motor planning (Afshar et al., 2011). Note, we were unable to fully perform such analyses in the current study as we often had only a few neurons per session in Monkey O. The session shown in Fig. 5 and Fig. S3 was an exception as we had 23 well-modulated units in Monkey T.

We have shown that the outcome of the previous trial alters the initial conditions for subsequent trials. There are certainly other factors that lead to changes in the initial conditions. In particular, recent studies have shown that both neural activity and behavior, as indexed by RT, performance, and pupil size, drift over slow time scales and that these slowly drifting signals are likely a process independent of deliberation on sensory evidence (Cowley et al., 2020; Ferguson and Cardin, 2020). Such effects often emerge over several hours. We believe that such effects could also contribute to the changing initial conditions observed in our study.
However, we were unable to assess these effects as 1) we did not measure pupil size, 2) significant amounts of our data were collected with single electrode recordings over short time periods (often 10-15 minutes or so for a tranche of 300-500 trials), and 3) even in sessions where Plexon U-probes were used to simultaneously record from neural populations we often paused the task whenever the animal disengaged from the task or had a sudden decrease in performance. Furthermore, after such pauses we generally increased reward sizes to remotivate the animals. These interventions are often standard for electrophysiological recordings in behaving monkeys but preclude the assessment of the effects of slow fluctuations on decision-making. Nevertheless, we believe that such effects are likely to be an additional crucial source of variability for the initial condition, especially given that it was found to be a factor independent of sensory evidence (Cowley et al., 2020) as in our study, and likely alters decision-making dynamics and behavior. A rich area for future research is to assess whether the same effects observed in V4 and caudal prefrontal cortex in Cowley et al. (2020) also occur for perceptual decisions in PMd.

We found that prestimulus neural activity in PMd and in this task did not covary with or predict eventual choice.

However, prestimulus neural activity in lateral intraparietal cortex was found to be predictive of choice for low coherence or harder random dot stimuli (Shadlen and Newsome, 2001). Our lack of an observed covariation between the initial condition and choice may be due to the randomization of target configurations; thus the monkeys in our experiment were disincentivized from preplanning a reach direction. To be clear, our lack of a finding does not preclude prestimulus activity in other brain areas or even in PMd with different tasks from covarying with choice.

We believe that the effects we see where the initial conditions predict the RT of the animal in a cognitive task are likely to be observed in many brain areas. For example, previous results recorded in monkey dorsomedial prefrontal cortex during timing tasks (Remington et al., 2018b) and in motor cortex/PMd from motor planning tasks (Afshar et al., 2011) bear out the contention that our observation of prestimulus PMd neural population activity covarying with and predicting RTs in a decision-making task is likely not solely localized to PMd or constrained to occur only in this task. In fact, differences in baseline modulation of neural activity between speed and accuracy conditions of speed-accuracy tradeoff tasks are found in frontal eye field (Heitz and Schall, 2012) and pre-supplementary motor area (Bogacz et al., 2010). We also showed that prestimulus beta band activity in this same task was correlated with RT (Chandrasekaran et al., 2019). Additionally, in a study of post-error slowing the level of prestimulus phase synchrony in fronto-central electrodes was found to positively correlate with the speed of RTs (van den Brink et al., 2014).
These findings of neural activity changing as a result of different conditions of a speed-accuracy tradeoff task or being predictive of RTs strongly suggest that initial conditions in multiple brain regions, and potentially some putative fronto-central motor network, affect the speed of a response. In other words, changes in the initial conditions in various brain regions before stimulus onset are likely not a localized effect and suggest either broad signalling (Derosiere et al., 2022) from some source or even feed-forward/feedback mechanisms between brain regions.

Research employing dynamical systems approaches demonstrates that future population level activity and behavior is sensitive to initial conditions such that initial conditions were predictive of RTs in motor planning or timing tasks (Afshar et al., 2011; Remington et al., 2018b). However, it was unclear whether decision-related neural activity was similarly sensitive to initial conditions and if so, how such sensitivity might interact with sensory evidence accumulation, a well-studied aspect of decision-making (e.g., Roitman and Shadlen, 2002). Our first main contribution is that we observe prestimulus neural dynamics predictive of the RT of a decision, equivalent to the predictive power of the eventual stimulus itself, despite lacking an explicit manipulation of speed-accuracy tradeoff. Our second main contribution was to show that both initial conditions and sensory evidence influenced choice-related neural population dynamics and ultimately behavior. Finally, our third contribution was to show that initial conditions depended on previous outcomes, and, in turn, altered poststimulus dynamics and RTs.

We believe that this suite of findings through the lens of the dynamical systems approach is a starting point for understanding the dynamical system underlying decision-making behavior.
The insights from this study could be further expanded via single-trial analysis of simultaneous recordings in multiple decision-related regions, by examining how baseline neural activity predicts various aspects of behavior, and ultimately how behavior or global state then feeds back into initial conditions.

Monkeys sat in a customized chair (Synder Chair System, Crist Instrument Co., Inc.) with their head restrained.

The arm that was not used to respond in the task was gently restrained with a tube and cloth sling. Experiments were controlled and data collected using a custom computer control system (Mathworks' xPC target and

Experiments were made up of a sequence of trials that each lasted a few seconds. Successful trials resulted in a juice reward whereas failed trials led to a time-out of 2-4 s. A trial started when a monkey held its free hand on a central circular cue (radius = 12 mm) and fixated on a small white cross (diameter = 6 mm) for ∼300-485 ms. Then two isoluminant targets, one red and one green, appeared 100 mm to the left and right of the central hold cue. Targets were randomly placed such that the red target was either on the right or the left trial-to-trial, with the green target opposite the red one. In this way color was not tied to reach direction. Following an additional center hold period (400-1000 ms) a static checkerboard stimulus (15 x 15 grid of squares; 225 in total, each square: 2.5 mm x 2.5 mm) composed of isoluminant red and green squares appeared superimposed upon the fixation cross. The monkey's task was to move their hand from the center hold and touch the target that matched the dominant color of the checkerboard stimulus for a minimum of 200 ms (for full trial sequence see Fig. 2B). For example, if the checkerboard stimulus was composed of more red squares than green squares the monkey had to touch the red target in order to have a successful trial. Monkeys were free to respond to the stimulus as quickly or slowly, within an ample ∼2 s time frame, as they 'chose'. There was no delayed feedback; therefore, a juice reward was provided immediately following a successful trial (Roitman and Shadlen, 2002). An error trial or miss led to a timeout until the onset of the next trial.

The hold duration between the onset of the color targets and onset of the checkerboard stimulus was randomly chosen from a uniform distribution from 400-1000 ms for monkey T and from an exponential distribution from 400-900 ms for monkey O. Monkey O attempted to anticipate the checkerboard stimulus; therefore, an exponential distribution was chosen to minimize predictability.

[...] stimulus (c). The accuracy function was fit using a Weibull cumulative distribution function.

Weibull cumulative distribution function: [...]

The discrimination threshold α is the color coherence level at which the monkey would make 81.6% correct choices. The parameter γ describes the slope of the psychometric function. Threshold and slope parameters were fit per session and averaged across sessions. We report the mean and standard deviation of threshold and R² values from the fit in the text.
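For illustration, a commonly used two-alternative form of the Weibull psychometric function is p(c) = 1 − 0.5·exp(−(c/α)^γ). We assume this form here because it yields 81.6% correct at c = α, matching the threshold definition above; the exact equation used in the study did not survive in this text, and the function name and example values below are hypothetical.

```python
import math


def weibull_accuracy(coherence, alpha, gamma):
    """Predicted proportion correct at a given color coherence.

    Assumed form: p(c) = 1 - 0.5 * exp(-(c / alpha) ** gamma), where alpha is
    the discrimination threshold and gamma the slope. Chance (0.5) is reached
    at zero coherence for a two-choice task.
    """
    return 1.0 - 0.5 * math.exp(-((coherence / alpha) ** gamma))


# At threshold (coherence == alpha) the predicted accuracy is about 81.6%,
# regardless of the slope parameter.
accuracy_at_threshold = weibull_accuracy(10.0, alpha=10.0, gamma=1.5)
```

In practice α and γ would be estimated per session by fitting this function to the observed accuracy at each coherence level.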

Single electrode recording techniques were used for a subset of the electrophysiological recordings. Small burr holes in the skull were made using handheld drills (DePuy Synthes 2.7 to 3.2 mm diameter). A Narishige drive (Narishige International USA, Inc., Amityville, NY, United States) with a blunt guide tube was placed in contact with the dura. Sharp FHC electrodes (> 6 MΩ, UEWLGCSEEN1E, FHC, Inc., Bowdoin, ME, United States) [...]

[...] instantaneous firing rate (e.g., $r_i(t, RT, left)$) for a trial. 3) We then used these trials to estimate the mean and standard error of the firing rate for a condition (e.g., $\bar{r}(t, RT, left)$).

When firing rates were aligned to checkerboard onset, we removed all spikes 50 ms before movement onset until the end of the trial. We performed this operation to ensure movement-related spiking activity did not spuriously lead to ramping in the checkerboard period.

[...] units with high firing rates and ensures that each unit has roughly the same overall variability across conditions.
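The spike-removal step for checkerboard-aligned firing rates (discarding spikes from 50 ms before movement onset until the end of the trial) can be sketched as follows. This is an illustrative sketch only: the function name, the argument names, and the convention that all times are in ms relative to checkerboard onset are our assumptions.

```python
def drop_perimovement_spikes(spike_times_ms, movement_onset_ms, buffer_ms=50.0):
    """Keep only spikes occurring earlier than `buffer_ms` before movement onset.

    spike_times_ms    : spike times for one trial, relative to checkerboard onset.
    movement_onset_ms : movement onset for the same trial, in the same reference.
    """
    cutoff = movement_onset_ms - buffer_ms
    return [t for t in spike_times_ms if t < cutoff]


# Hypothetical trial with movement onset at 400 ms: spikes at or after 350 ms
# are removed before firing rates are estimated.
kept_spikes = drop_perimovement_spikes([-100.0, 20.0, 340.0, 351.0, 420.0], 400.0)
```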

Eigenvectors, eigenvalues, and the projected data were calculated using the pca function in MATLAB.

We used the approach developed by Machens et al. (2010) to estimate the number of dimensions that best described our data. This method assumes that the firing rate of the k-th neuron on the i-th trial given an RT bin and choice, $r_k^i(t, RT, choice)$, is composed of a mean "signal" rate, $q_k(t \mid RT, choice)$, and a "noise" rate that fluctuates across trials, $\eta_k^i(t, RT, choice)$:

$$r_k^i(t, RT, choice) = q_k(t \mid RT, choice) + \eta_k^i(t, RT, choice)$$

Noise here encompasses both contributions from the random nature of spike trains as well as systematic but unknown sources of variability. Averaging over trials:

$$\bar{r}_k(t, RT, choice) = q_k(t \mid RT, choice) + \bar{\eta}_k(t, RT, choice)$$

where $\bar{\eta}_k(t, RT, choice)$ is the average noise over N instantiations (i.e., trials) of the noise term $\eta_k^i(t, RT, choice)$.

The overall mean firing rate over time and conditions (r) is given as: [...]

Note, none of these assumptions are strictly true. Noise may not be additive, it may depend on RT bin, and it may increase or decrease during various phases of the trial. However, these assumptions illustrate the problem encountered in identifying the number of dimensions to best describe the data.

Under these assumptions, PCA attempts to identify a covariance matrix as [...], which can be simplified (see Machens et al. (2010) for more details) to [...], where $Q_{ij}$ is the signal covariance and $H_{ij}$ is the noise covariance.

Our goal is to perform PCA on $Q_{ij}$. However, because our data were not collected simultaneously, we cannot calculate $Q_{ij}$ as we do not have a good estimate of $H_{ij}$.

Nevertheless, even with trial-averaged data, one can provide an estimate of $H_{ij}$ by constructing putative noise matrices based on the simplifying assumption that the noise is largely independent in neurons with perhaps modest noise correlations. To generate representative noise traces for our firing rates, notice that if one subtracts: [...], which is just subtraction of two random instantiations of the same process, which can be written as: [...], where the final equality emerges from the equations for standard error of the mean. For example, var( [...] Thus, we can generate estimates of the "average" noise $\bar{\eta}_i(t, RT, choice)$: [...] Using this equation, we can estimate $H_{ij}$.

We denote $C_{ij}$ as the "signal+noise" covariance matrix and $H_{ij}$ as the "noise" covariance matrix. We estimate [...]

We used the recently developed KiNeT analysis (Remington et al., 2018b) to characterize how state space trajectories evolve over time in terms of relative speed and position as compared to a reference trajectory. We used the first six PCs (∼90% of variance) of the PCAs organized by choice and RT/outcome as these PCs were significantly different from noise in both PCAs (Machens et al., 2010).

As such we have a collection of six-dimensional trajectories ($\Omega_1, \Omega_2, \ldots, \Omega_n$) differing in RT bins and choice in one analysis (Fig. 4C-F) and trial outcome and choice in another (Fig. 7C, E). The trajectory associated with the middle RT bin (cyan, Fig. 4C, F) and the trajectory associated with the "Correct" trial outcome (Fig. 7C, [...]

If the non-reference trajectory reaches a similar position to the reference trajectory at an 'earlier' timepoint then [...] last RT bins) determines whether the current non-reference trajectory is closer to either the 1st or last condition.

As defined here, if a trajectory is closer (i.e., smaller angle) to the trajectory for the 1st condition then $D_i[j]$ is positive, otherwise it is negative (Fig. 4C).
Angle: KiNeT computes the vector between adjacent trajectories by subtracting the positions of two non-reference trajectories when they are respectively closest to the reference trajectory at timepoint j. These vectors are then normalized and the angles between all adjacent normalized vectors are found at all timepoints. Finally, the average angle is found at each timepoint between all adjacent trajectories (Fig. 4E).

[...] where $\Omega_i(t + \delta t)$ and $\Omega_i(t)$ are six-dimensional trajectories within condition i at time $t + \delta t$ and time t in the prestimulus period, respectively. $\ell^2_i(t)$ is the $\ell^2$ norm between six-dimensional trajectories within a condition at time t and t + 1. We then averaged speeds across choices and over the entire prestimulus period (-400 ms to 0).
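A minimal sketch of this speed measure, assuming it is the Euclidean (ℓ2) distance between the state at successive timepoints averaged over the prestimulus window, is given below. The function name and the toy trajectory are hypothetical, and whether the distance is further divided by the bin width is not recoverable from the text, so it is left undivided here.

```python
import math


def mean_step_distance(trajectory):
    """Average l2 distance between successive states of one trajectory.

    trajectory : list of state vectors (e.g., six PC scores per timepoint)
                 sampled at a fixed time step across the prestimulus period.
    """
    steps = [math.dist(a, b) for a, b in zip(trajectory[:-1], trajectory[1:])]
    return sum(steps) / len(steps)


# Toy 2-D trajectory: one step of length 5 followed by no movement.
toy_speed = mean_step_distance([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)])
```

Under this reading, the per-condition values would then be averaged across the two choices, as described above.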

The plotted 'prestimulus firing rate speed' was averaged across 50 bootstraps in which trials were sampled with replacement 50 times (Fig. 4G). Separate PCA and speed calculations were performed per bootstrap.

We estimated the 'choice selectivity signal' by calculating the Euclidean distance between left and right reaches at all timepoints for the first six PCs within each condition (i.e., RT bins (Fig. 4H & Fig. 6D) and coherence (Fig. 6B)): [...]
where $\Omega_L(t)$ and $\Omega_R(t)$ are the six-dimensional locations in state space for a left and right choice at time t.
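The choice selectivity signal described above (the Euclidean distance between left and right trajectories at each timepoint) can be sketched as follows; the function name and toy trajectories are hypothetical and stand in for the six-dimensional PC trajectories.

```python
import math


def choice_selectivity(left_trajectory, right_trajectory):
    """Euclidean distance between left and right states at each timepoint.

    Both inputs are lists of state vectors of equal length, one vector per
    timepoint (e.g., scores on the first six PCs).
    """
    return [math.dist(l, r) for l, r in zip(left_trajectory, right_trajectory)]


# Toy example: trajectories overlap at the first timepoint and then diverge.
toy_selectivity = choice_selectivity(
    [(0.0, 0.0), (0.0, 0.0)],
    [(0.0, 0.0), (3.0, 4.0)],
)
```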

To calculate the latency of this choice selectivity signal, we fit the time-varying choice selectivity signal with a piecewise function of the form [...]

4.14. Initial condition as a function of RT and coherence

To estimate the initial conditions shown in Fig. 6E, F, we performed the following procedure. For each coherence and RT bin, we concatenated the average location in the six-dimensional state space in the -300 ms to -100 ms epoch before checkerboard onset for both reach directions and obtained a 77x12 matrix (7 coherences, 11 RT bins, and 2 choices). We then performed a PCA on this 77x12 matrix and used the top PC as a measure of the initial condition that we used for plotting and subsequent partial correlation analysis.

LFADS is a generative model which assumes that neuronal spiking activity is generated from an underlying [...]

[...] there was a minimum of 5 neurons in a single session and a maximum of 18. Some sessions had distinct portions (e.g., the electrode was moved). In the later portion of three sessions, 2 neurons were recorded from and in another, 3 neurons were recorded from. Otherwise in all other sessions at least 5 neurons were recorded from.

Variance explained and decoding accuracy shown in Fig. 5 [...]

where $RT_i(t)$ is the RT on the i-th trial, $X_{ij}(t)$ is the spike count in a 20 ms bin for the i-th trial and the j-th unit, $c_i$ is the coherence for the i-th trial, and the $\beta_{j/c}$ are coefficients for the model. After regression, we calculated variance explained by spiking activity and coherence together for each bin by using the standard equation for variance explained: [...]
where $\overline{RT}$ is the mean RT, $RT_k$ is the RT for the k-th trial, and $\widehat{RT}_k$ is the RT predicted for the k-th trial.
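As an illustration of the variance-explained calculation, the sketch below assumes the conventional coefficient of determination, R² = 1 − Σ(RT_k − R̂T_k)² / Σ(RT_k − mean RT)². The text only refers to "the standard equation", so this form, the function name, and the example values are assumptions.

```python
def variance_explained(observed_rts, predicted_rts):
    """Coefficient of determination between observed and predicted RTs."""
    mean_rt = sum(observed_rts) / len(observed_rts)
    # Residual sum of squares: what the regression fails to capture.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed_rts, predicted_rts))
    # Total sum of squares around the mean RT.
    ss_total = sum((o - mean_rt) ** 2 for o in observed_rts)
    return 1.0 - ss_residual / ss_total


# Hypothetical RTs in ms: perfect predictions give 1, and predicting the
# mean RT on every trial gives 0.
r2_perfect = variance_explained([300.0, 400.0, 500.0], [300.0, 400.0, 500.0])
r2_mean_only = variance_explained([300.0, 400.0, 500.0], [400.0, 400.0, 400.0])
```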

For assessing if the R² values were significant, we computed a shuffled distribution (500 shuffles) where we shuffled the trials to remove the relationship between the RTs and spiking activity. We then assessed if the per bin R² values were significantly different from the 99th percentile of the shuffled distribution R² values.
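The shuffle procedure can be sketched as below. To keep the example self-contained, a squared Pearson correlation with a single feature stands in for the full regression R²; this substitution, the nearest-rank percentile, the fixed seed, and all names are our assumptions rather than details from the study.

```python
import math
import random


def squared_correlation(x, y):
    """Squared Pearson correlation, used here as a stand-in for regression R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def shuffle_threshold(statistic, rts, feature, n_shuffles=500, percentile=99, seed=0):
    """Percentile of `statistic` after shuffling RTs across trials."""
    rng = random.Random(seed)
    shuffled = list(rts)
    null_values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the trial-by-trial RT/activity pairing
        null_values.append(statistic(shuffled, feature))
    null_values.sort()
    rank = math.ceil(percentile * len(null_values) / 100)  # nearest-rank rule
    return null_values[rank - 1]


# Toy data in which the feature tracks RT perfectly.
toy_rts = [300.0 + 10.0 * i for i in range(20)]
toy_feature = [float(i) for i in range(20)]
observed = squared_correlation(toy_rts, toy_feature)
threshold = shuffle_threshold(squared_correlation, toy_rts, toy_feature)
```

An observed value is then treated as significant for a bin when it exceeds the shuffled threshold for that bin.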

Logistic Regression to decode choice: For decoding choice and previous outcome on a bin-by-bin basis, we used a regularized logistic regression approach. Decoders were trained with equal number of trials for the opposing outcomes (i.e., left vs. right reaches; previous correct vs. previous error trials). The logistic regression approach assumes that the log odds in favor of one event (e.g., left vs. right reach) is given by the following equations: [...]

where $\beta_0$ is the intercept of the model, $\beta_j$ is the model coefficient for the j-th neuron in the current bin, and $X_j$ is the spiking activity of the j-th neuron in the current bin. The following rule is used to produce the outputs of the system: if $p(Left \mid X) < 0.5$ then -1, and if $p(Left \mid X) > 0.5$ then 1.

We used the implementation provided in MATLAB via the fitclinear function and the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm to find the optimal fit for the parameters (Shanno, 1970). We typically attempted to predict choice or previous outcome using tens of units. To simplify the model, decrease collinearity of the coefficients and to avoid overfitting, we used L2 regularization (ridge regression): [...]

where J is the cost associated with coefficients, λ is the penalty term (1/number of in-fold observations), and β are the coefficients of the model. We used 5-fold cross validation and calculated loss for each model. Accuracy is reported as accuracy = 1 − mean(loss).

[...] choice. For this purpose we used a modified version of the alignment index developed by Elsayed et al. (2016): [...]

The alignment index, A, provides an estimate of the fraction of variance that is explained by projecting one subspace into another. tr() is the trace of a matrix, which can be proved to be the sum of its eigenvalues.

We used dPCA, a semi-supervised dimensionality reduction technique, to further understand if prestimulus activity which covaried with RTs shared variance with firing rate activity that covaried with the previous trial's outcome.
We performed two dPCAs. The first identified axes that maximally accounted for firing rate variability from trial outcome and the second identified axes that maximally accounted for firing rate variability that covaried with RTs. We then calculated the dot product between these axes and estimated the angle using the inverse cosine of the dot product. An angle of zero would indicate that these axes completely overlap and that their sources of variance are the same, whereas orthogonal angles would mean that the axes do not overlap and therefore share no variance.

Figure S1: Percent variance explained by each component from the PCA organized by RT and choice: "signal+noise" and "noise" variance explained by the first 10 components. The first six components capture over 90% of the variance. To derive the error bars for the signal+noise PCA, we used bootstrapping (50 repeats) over trials to estimate standard errors.

Figure S2: Prestimulus spiking activity is predictive of RT but not choice, even when decoding is performed within RT bins. (A/B) Scatterplot of true mean prestimulus R²/accuracy values compared to the R²/accuracy values for the 99th percentile of the shuffled data. Each dot represents the bin- and trial-averaged prestimulus mean R²/accuracy value within each of the 51 sessions. The dotted line is where scatter points would fall if shuffled R² and real R² values were equivalent. (A) Many of the points lie above this line, suggesting that real prestimulus neural activity explains more of the RT variance than shuffled neural data. (B) In contrast, many of the points lie on or below this line, suggesting that real prestimulus neural activity is not predictive of choice. (C) Plot of mean accuracy from logistic regressions of binned spiking activity (20 ms) used to predict trial-matched eventual choice within RT bins. Accuracy is averaged across 51 sessions. Gray shaded area is SEM.
The gray dotted line is 50% accuracy.

[...] Each trajectory is plotted from 200 ms before checkerboard onset (dots) to movement onset (diamonds). (C) Scree plot of the percentage of variance explained by the first ten components. The first six PCs capture ∼90% of the variance in firing rate activity. (D) Euclidean distance in the first six dimensions between the two reach directions aligned to checkerboard onset ('Cue' & black dashed line). We observed no prestimulus separation between reach directions. Choice selectivity is lower and slower for error trials compared to all other outcomes. Post-error choice selectivity may be larger than other trial outcomes.

• argmin - where function achieves its minimum at point j