ABSTRACT
Economic choice and stopping are not traditionally treated as related phenomena. However, we were motivated by foraging models of economic choice to hypothesize that they may reflect similar neural processes occurring in overlapping brain circuits. We recorded neuronal activity in orbitofrontal cortex (OFC), while macaques performed a stop signal task interleaved with a structurally matched economic choice task. Decoding analyses show that OFC ensembles predict successful versus failed stopping both before the trial and immediately after the stop signal, even after controlling for value predictions. These responses indicate that OFC contributes both proactively and reactively to stopping. Moreover, OFC neurons’ engagement in one task positively predicted their engagement in the other. Finally, firing patterns that distinguished low from high value offers in the economic task distinguished failed and successful trials in the stopping task. These results endorse the idea that economic choice and inhibition may be subject to theoretical unification.
Acknowledgements
This work was supported by an R01 (DA038615) to BYH. We thank Meghan C. Pesce for help with data collection.
INTRODUCTION
Stopping (sometimes referred to as inhibition) and economic choice are two major brain functions that have historically been studied independently. Nonetheless, there is some reason to think that they may spring from shared processes. For example, several psychiatric conditions, including depression and addiction, impair both processes, and greater impairment of both is associated with greater disease progression (Iacono et al., 2008; Nestler et al., 2002; Volkow et al., 2011). Second, both are closely associated with, and empirically linked to, the broader concept of self-control (Berkman et al., 2016; Hayden, 2018; Inzlicht et al., 2014; Shenhav, 2017). Third, both tend to activate a similar set of brain regions, including the pre-motor cortex, the ventrolateral prefrontal cortex, basal ganglia, and the thalamus (Aron, 2007; Aron and Poldrack, 2006; Cisek, 2012; Cisek and Kalaska, 2010; Hampshire and Sharp, 2015; Sakagami and Pan, 2007; Schall et al., 2002).
The idea that stopping and choice may have a deeper relationship is motivated by certain foraging-inspired approaches to economic choice (Cisek, 2012; Cisek and Pastor-Bernier, 2014; Hayden, 2018; Hayden and Moreno-Bote, 2018; Kacelnik et al., 2011; Krajbich et al., 2010; Stephens and Krebs, 1986). A core tenet of these approaches is that the brain’s decision-making systems are evolved to make accept-reject decisions (Kacelnik et al., 2011; Ojeda et al., 2018; Pirrone et al., 2017; Shapiro et al., 2008; Vasconcelos et al., 2010). Even ostensibly binary economic choices, in this view, reflect a pair of (potentially interacting) accept-reject choices. Each accept-reject decision, in turn, involves choosing whether to pursue an option or refrain from pursuit. Accepting involves selecting the attended or activated option, or, more abstractly, performing the afforded action (Cisek and Kalaska, 2010; Cisek and Pastor-Bernier, 2014; Hayden and Moreno-Bote, 2018). Rejecting involves countermanding the afforded action. A classic binary economic choice, then, may be seen as two related decisions about whether to go or stop choosing the attended option or the afforded action (Hayden, 2018).
This way of looking at choice is consistent with some recent studies that suggest that binary choice involves a serial, not parallel, consideration of options (Krajbich et al., 2010; Rich and Wallis, 2016; Strait et al., 2014; reviewed in Hayden and Moreno-Bote, 2018). These studies and others indicate that attention is largely limited to a single option, which is evaluated, often relative to the other one (Lim et al., 2011; Rich et al., 2017; Rudebeck and Murray, 2014; Strait et al., 2014 and 2015; Xie et al., 2018). Choice, then, presumably occurs relative to a single option that is evaluated relative to a background value, which includes the value of choosing the other option (Shapiro et al., 2008; Vasconcelos et al., 2010). However, this work does not directly tie economic choice and reward processing to stopping processes.
Here we sought to test the overlap hypothesis by comparing neuronal activity in an economic choice task with that observed in a stopping task. We focused on the orbitofrontal cortex (OFC). The centrality of OFC in economic choice is largely undisputed, although its specific role remains to be determined (Padoa-Schioppa, 2011; Rich et al., 2017; Rudebeck and Murray, 2014; Schoenbaum et al., 2009; Wallis, 2007; Wilson et al., 2014). It is clear, nonetheless, that activity of Area 13 of OFC correlates with the values of offers and of chosen options, and is likely to be critical for value comparison as well (Padoa-Schioppa, 2013; Padoa-Schioppa and Assad, 2006; Raghuraman and Padoa-Schioppa, 2014). In contrast to its clear role in choice, the contribution of the OFC to stopping remains disputed. On one hand, a good deal of work argues against a direct inhibitory role for the OFC (Chudasama et al., 2006; Ghods-Sharifi et al., 2008; Rudebeck and Murray, 2014; Schoenbaum et al., 2003; Stalnaker et al., 2015). On the other hand, multiple studies give the OFC at least some role in inhibition (Bryden and Roesch, 2015; Chikazoe et al., 2009; Dias et al., 1996; Eagle et al., 2007; Horn et al., 2003; Majid et al., 2013; Mishkin, 1964; Roberts and Wallis, 2000). One reason for the continued debate about the role of OFC in stopping is the lack of direct evidence from unit activity in this region during stopping tasks (but see Bryden and Roesch, 2015).
We hypothesized that OFC participates in both choice and stopping decisions in similar ways, that is, by computing executive signals that promote (or fail to promote) particular strategies. To test this hypothesis, we examined responses of OFC neural populations recorded in two interleaved tasks, a stop signal task and an economic choice task. The tasks were designed to have structures as similar as practically possible. We were particularly interested in the questions of (1) whether and how the function of this economic region includes stopping, and (2) whether neural response patterns related to stopping correspond with patterns related to value.
RESULTS
Behavior in the stop signal task and economic choice task
Subjects performed a standard stop signal task (similar but not identical to the one used by Hanes and Schall, 1995; Figures 1A and 1B and Methods). On each trial, following a central fixation, subjects saw an eccentric target (go signal) that, if fixated, provided a juice reward. On a subset of trials (33%, called stop trials), a second signal (stop signal) appeared at fixation and countermanded the previously instructed saccade. Successful stopping trials were rewarded. Failed trials (trials in which a saccade was made despite a stop signal) were not. Median reaction time to the target in go trials was 0.435 sec in subject J and 0.272 sec in subject T (Figures 1C and 1D). Average trial lengths, including feedback time, were 3.66 s for subject J and 2.62 s for subject T, with mean feedback start times of 1.78 s and 1.49 s, respectively. Both subjects showed typical behavior in this task; their performance in stop trials varied as a function of the time of presentation of the stop signal relative to that of the go signal (Figures 1E and 1F).
Figure 1. Task framework and behavior. (A) Stop signal task. (B) Economic choice task. Behavioral results for subject J are presented in panels (C, E, G, I, J) and for subject T in (D, F, H, K, L). (C, D) Reaction time distributions for various trial conditions of the stop signal task. (E, F) Inhibition function and accuracy of choices as a function of SSD. (G, H) Reaction time distributions for various trial conditions of the economic choice task. (I-L) Effects of the previous trial on reaction time in the (I, K) stop signal task and (J, L) economic choice task. Error bars represent SEM, and * denotes t-test significance with p < 0.05. (M) Recording site: Area 13 of OFC (scan from subject J shown).
The delay between the go signal and the stop signal is called the stop signal delay (SSD) and it varied randomly across trials. We estimated the SSD that leads to approximately 50% successful stopping (SSD-50) because it can help in computing the stop signal reaction time (SSRT, Logan, 1994; Logan and Cowan, 1984; Verbruggen and Logan, 2008). The SSD-50 was 0.277 sec for subject J and 0.131 sec for subject T. SSRT computed for SSD-50 was 0.158 sec for subject J, and 0.141 sec for subject T. These values are typical of rhesus macaques in these tasks (e.g. Hanes and Schall, 1995; Ito et al., 2003).
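The reported SSRT values are consistent with subtracting the SSD-50 from the median go reaction time. The sketch below reproduces that arithmetic. The function names and the linear-interpolation helper for SSD-50 are illustrative assumptions; the text does not specify how SSD-50 was estimated, and the inhibition function in the last lines is hypothetical.

```python
def ssrt_from_ssd50(median_go_rt, ssd50):
    """SSRT estimate: median go reaction time minus SSD-50 (both in seconds)."""
    return median_go_rt - ssd50


def interpolate_ssd50(ssds, p_stop_success):
    """Linearly interpolate the SSD at which stopping succeeds on 50% of trials.

    Assumption for illustration only: ssds is sorted in increasing order and
    p_stop_success holds the matching fraction of successful stop trials.
    """
    points = list(zip(ssds, p_stop_success))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses_half = (y0 - 0.5) * (y1 - 0.5) <= 0
        if crosses_half and y0 != y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("inhibition function never crosses 50%")


# Values reported in the text (seconds).
ssrt_j = ssrt_from_ssd50(0.435, 0.277)  # subject J
ssrt_t = ssrt_from_ssd50(0.272, 0.131)  # subject T
print(round(ssrt_j, 3), round(ssrt_t, 3))  # prints 0.158 0.141

# Hypothetical inhibition function, not taken from the data.
example_ssd50 = interpolate_ssd50([0.1, 0.2, 0.3, 0.4], [0.9, 0.7, 0.4, 0.1])
```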
We randomly interleaved stop signal trials with trials from an economic choice task. This task was designed to have a similar structure to the stop signal task. Specifically, forced choice trials were equivalent to go trials and choice trials were equivalent to stop trials (see Methods for details). In the economic choice task, the offers were associated with low (yellow), medium (blue), or high (magenta) reward value (Figure 1B). The subjects chose either offer 1 (which appeared at the periphery, similar to go signal) or, when it occurred, could choose the later-appearing offer 2 (which appeared at the center, similar to stop signal). The delay for offer 2 was fixed and defined by the measured stop signal delay computed from the stopping task. For simplicity, we will use the terms accept and reject trials to refer to those in which the subject chose offer 1 and offer 2, respectively.
As anticipated, the two tasks showed similar behavioral results. Median reaction time in forced choice trials was 0.41 sec in subject J and 0.27 sec in subject T (Figures 1G and 1H). Median reaction times for choice trials in the presence of offer 2 were lower than those in forced choice trials (Supplemental results-A). On average, the total trial lengths including feedback time were 3.86 s for subject J and 2.88 s for subject T, with mean feedback start times of 1.88 s and 1.70 s, respectively. Choice accuracy varied as a function of SSD in both subjects (Figures 1E and 1F; Supplemental results-A). Both subjects showed similar behavioral effects in the current trial of each task as a function of previous trial conditions (Figures 1I and 1J for subject J, Figures 1K and 1L for subject T; see Supplemental results-A for behavioral effects).
Overlapping sets of neurons participate in the two tasks
We recorded responses of 96 neurons (52 in subject J and 44 in subject T) in Area 13 of the OFC (Figure 1M). The number of neurons to be collected was determined a priori based on exploratory analyses of previously collected datasets and was not adjusted during recording based on analyses performed mid-experiment. Note that while this number is smaller than in some other studies, it is sufficient to detect the effects we are interested in here.
We focused our analyses on four key time periods of the trial. (1) The 300 ms epoch before the trial started (pre-go signal epoch), which corresponds to fixation time before the appearance of any stimulus targets; signal differences here presumably reflect proactive control (Stuphorn and Emeric, 2012). (2) The variable time after the stop signal and before the reaction time period (post-stop signal epoch). The post-stop epoch is important because it is when inhibition generated in response to countermanding commands would presumably occur, and it has therefore been the focus of many studies of stopping (Logan et al., 2015; Schall, 2001; Schall et al., 2002). It corresponds to the time during which reactive control occurs (Stuphorn and Emeric, 2012). (3) The variable time after the go signal and before the reaction time period (post-go signal epoch). The equivalents of (2) and (3) in the choice task are the variable times after offer 2 and offer 1, respectively, before the reaction time of a trial; these epochs capture task-related activity in general. (4) The variable time between the beginning and end of feedback reward (feedback epoch).
Neurons had diverse tuning profiles with a near balance of sign. In the stopping task, 44.79% of cells showed positive task tuning (response during task versus baseline), 53.12% showed negative tuning, and the rest were not modulated. In the economic choice task, 45.83% of cells showed positive task tuning, 51.04% showed negative tuning, and the rest were not modulated. We examined the relationship between simple response patterns in the two tasks on a cell-by-cell basis. Task-related tuning (task-related activity relative to baseline) was similar across tasks: regression weights (task-related tuning coefficients) in one task positively predicted the weights in the other task (Pearson correlation between tuning coefficients, r = 0.78, p < 0.001, Figure 2A). (Note that these data come from separate sets of trials, so there is no overlap in the data used to estimate the two sets of tuning coefficients.) Moreover, absolute response patterns (that is, unsigned regression weights) were also positively correlated across the two tasks (r = 0.74, p < 0.001), indicating that the same set of neurons, rather than distinct sets, was involved in the two tasks (for details on using this method to interpret relationships between regression coefficients, see Blanchard et al., 2015a).
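The logic of the signed versus unsigned comparison can be sketched as follows. This is a minimal illustration, not the analysis code used here: the helper names are assumptions, and the six coefficient values are hypothetical. A positive signed correlation indicates that neurons are tuned in the same direction in both tasks; a positive unsigned correlation indicates that neurons strongly modulated in one task also tend to be strongly modulated in the other, regardless of sign.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_and_unsigned_r(coefs_task_a, coefs_task_b):
    """Correlate per-neuron tuning coefficients with and without their sign."""
    signed = pearson(coefs_task_a, coefs_task_b)
    unsigned = pearson([abs(c) for c in coefs_task_a],
                       [abs(c) for c in coefs_task_b])
    return signed, unsigned


# Hypothetical tuning coefficients for six neurons (one value per neuron).
stop_coefs = [0.8, -0.6, 0.1, -0.9, 0.5, -0.05]
choice_coefs = [0.7, -0.5, 0.05, -1.0, 0.6, 0.02]
signed_r, unsigned_r = signed_and_unsigned_r(stop_coefs, choice_coefs)
```

If the two tasks instead recruited distinct sets of neurons, strongly tuned cells in one task would tend to be weakly tuned in the other, and the unsigned correlation would not be positive.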
Figure 2. Correlations between signed tuning coefficients in the stopping and economic choice tasks. (A) Task-related tuning coefficients. (B) Reward tuning coefficients in the post-go signal epoch. (C) Reward tuning coefficients in the feedback epoch.
Next, we compared the coding of rewards in both tasks. For the stop signal task, we looked at the differential coding of no reward during failed stopping versus reward during successful stopping; for the economic task, we looked at differential coding of the varied rewards associated with offers. Tuning coefficients for reward values were positively correlated between the two tasks during the post-go signal period (between signed coefficients, r = 0.3, p < 0.001, Figure 2B) and during the feedback epoch (between signed coefficients, r = 0.35, p < 0.001, Figure 2C). The positive correlation demonstrates that rewards are encoded in a similar way across tasks. Moreover, the unsigned tuning coefficients were also correlated in both epochs (r = 0.27, p = 0.01 and r = 0.28, p = 0.01, respectively). This correlation indicates that coding of the two types of value was handled by the same subset of neurons across the two tasks.
Selectivity for stopping in single neurons
The role of OFC in economic decisions is well established, but its role in stopping is not. Thus, a demonstration of functional overlap requires showing that OFC plays a role in stopping. Because of its relevance to the foraging hypothesis (see Introduction), we focused here on the discrimination of successful from failed stopping. The two time periods of particular interest to this hypothesis are (1) the post-stop signal period and (2) the pre-go signal period.
Responses of example neurons are illustrated in Figure 3. In neuron J19, firing rates following the go signal but before the SSRT were lower on successfully inhibited trials (1.8 spikes/sec) than on failed stopping trials (4.1 spikes/sec, Wilcoxon rank-sum test, ranksum = 1480, p < 0.05, n = 567 trials, Figure 3A). Note that there is a larger and more prominent modulation in firing rate later in the trial. Given its timing, this modulation likely relates to outcome monitoring and is too late to influence stopping. Another example neuron, T25, showed distinct patterns for successful and failed stopping trials even 500 msec before the beginning of the trial (ranksum = 2080, p < 0.05, n = 579 trials, Figure 3B).
Figure 3. Activity of example neurons during successful stopping, failed stopping, and go trials, aligned to go signal presentation time in panels (A, B, E, F) and to stop signal presentation time in panels (C, D). The time from the start of the go (stop) signal to SSRT is shaded in all panels. (A) Neuron showing a significant difference in firing rates between successful and failed stopping trials before SSRT. (B) Neuron showing a difference even before the beginning of the trial. (C) Same neuron as in panel A, showing a significant difference in firing rates between inhibition trials before the stopping response time. (D) Neuron showing a difference a few msec after SSRT. (E, F) Activity of example neurons in the economic choice task. The neuron in panel E is the same as in panels A and C; its reward-related activity after SSRT in the choice task parallels that in the stop signal task, and is positively correlated with value. The neuron in panel F shows the opposite trend, and is negatively correlated with reward value.
The responses shown in Figures 3C and 3D are aligned to the stop signal (time zero). Figure 3C illustrates the activity of the same neuron shown in Figure 3A; its response pattern showed significant differences between successful stopping trials (1.8 spikes/sec) and failed stopping trials (4.4 spikes/sec) that began after the presentation of the stop signal but before SSRT (ranksum = 1340, p < 0.05). Finally, neuron T10 (Figure 3D) fired more vigorously on successful than on failed stopping trials at around 100 msec after the SSRT (ranksum = 2229, p < 0.05). The results also show that in OFC, the codes for go and stop trials do not show simple and opposite activities for a saccade. This contrasts with the activities of movement and fixation neurons in FEF, where inhibition is driven by a rapid rise in firing rates of a specific subpopulation of neurons (fixation neurons) that gate the activity of another subpopulation (movement neurons) (Hanes and Schall, 1995; Logan et al., 2015; Schall, 1991). OFC responses therefore appear too complex for a race model (Logan et al., 2015) to account for their stopping-related patterns.
To test for the possibility that our putative inhibition signals were just reward correlates, we took advantage of trials collected from the economic choice task. The data from this task allowed us to assess each neuron’s tuning function for anticipated rewards. Responses to different reward amounts by two example neurons are shown in Figures 3E and 3F. We found tuning for anticipated reward values in the firing activity during the reward feedback time period. For example, we observed a significant positive correlation between reward amount and firing rate in neuron J19 (ϱ = 0.3138, p < 0.001, Figure 3E); this neuron shows similar reward-related activity across tasks (Figures 3A and 3C). We also show another neuron with a significant negative correlation with reward (neuron T10, ϱ = −0.143, p = 0.04, Figure 3F). These example neurons suggest the presence of diverse stopping-related and reward-related neuronal codes in OFC. Furthermore, simple population analyses are not informative about stopping patterns in OFC (see Supplementary results-B).
Ensemble patterns distinguish successful from failed stopping
To test whether similar ensemble patterns of activity predicted behavior in the two tasks, we examined neural network decoders (classifiers). (Note that while we used normalized data, see Methods, the normalization procedure did not alter our results, see Supplementary results-C.) We first trained decoders to analyze differences in population activation patterns between successful and failed stopping in the stop signal task. We trained classifiers using two key epochs: the post-stop signal epoch and the pre-go signal epoch. To test the decoders for significant patterns differentiating successful and failed stopping, we used 100 msec moving boxcars stepping in 10 msec intervals as input to the classifier.
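The moving-boxcar input described above can be sketched as follows. This is an illustrative assumption about the bookkeeping only: the function names, the half-open window convention, and the two-neuron example trial are hypothetical, and the normalization described in Methods is omitted.

```python
def boxcar_starts(first_start_ms, last_start_ms, step_ms=10):
    """Start times (ms) of successive boxcars, stepping by step_ms."""
    return list(range(first_start_ms, last_start_ms + 1, step_ms))


def boxcar_rate(spike_times_ms, start_ms, width_ms=100):
    """Firing rate (spikes/sec) in the window [start_ms, start_ms + width_ms)."""
    count = sum(start_ms <= t < start_ms + width_ms for t in spike_times_ms)
    return count * 1000.0 / width_ms


def population_vectors(trial_spikes, starts, width_ms=100):
    """For one trial, build one population vector per boxcar.

    trial_spikes holds one list of spike times (ms) per neuron, aligned to
    the event of interest (for example, the stop signal at 0 ms).
    """
    return [[boxcar_rate(neuron, start, width_ms) for neuron in trial_spikes]
            for start in starts]


# Hypothetical trial with two neurons; spike times are made up.
trial = [[5, 45, 95, 130], [60, 210]]
starts = boxcar_starts(0, 40)
vectors = population_vectors(trial, starts)
```

Each entry of `vectors` is the ensemble firing-rate pattern for one boxcar and would serve as one input sample to a classifier tested at that time point.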
Because of the possibility of false positives, we were especially interested in periods in which a decoder had several positive effects in adjacent bins (see Methods). The post-stop signal decoder was able to classify success versus failure of inhibition significantly in a series of 9 consecutive bins spanning 40 to 120 msec after the stop signal (these times indicate the starts of the 100 msec boxcars; see Methods on procedures to determine statistical significance of a boxcar using chi-square statistics). The corresponding numbers for individual subjects were 40 to 140 msec in subject J and 40 to 220 msec in subject T (supplementary results-D). The post-stop signal pattern series are unlikely to occur by chance (p < 0.001 in all cases, see Methods for specific use of chi-square tests to quantify significance of consecutive bins). Since each bin spans 100 msec, the central point of the first element in this series provides a rough estimate of the latency of the effect. That number was t = 90 msec for both subjects.
Notably, the central point of the first bin of the series to reach significance occurred before the stop signal reaction time in both subjects (the SSRTs were 140 msec for subject J and 120 msec for subject T; also see Supplementary results-D; Figure 4). We call this central point the cancellation time; it measures the center-point latency of the first statistically significant difference between successful and failed stopping trials for the ensemble of neurons. The cancellation time was 90 msec for both subjects. It preceded the average stopping response (SSRT) by 50 msec in subject J and by 30 msec in subject T, suggesting that OFC’s stopping-related patterns precede the stopping response (see Discussion and Supplementary results-D).
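The run-of-adjacent-bins criterion and the cancellation time can be sketched as below. This is a simplified stand-in rather than the procedure in Methods: it assumes a 1-df chi-square goodness-of-fit test of each boxcar's accuracy against 50% chance, and it does not implement the additional test on the probability of the whole series. Function names and the trial counts are hypothetical.

```python
import math


def chi2_p_vs_chance(n_correct, n_total):
    """P-value of a 1-df chi-square goodness-of-fit test against 50% accuracy."""
    expected = n_total / 2.0
    stat = ((n_correct - expected) ** 2
            + (n_total - n_correct - expected) ** 2) / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(stat / 2.0))


def longest_significant_run(starts_ms, n_correct, n_total, alpha=0.05):
    """Return (first_start, last_start) of the longest run of adjacent boxcars
    with accuracy above 50% and p < alpha, or None if no boxcar qualifies."""
    best, current = [], []
    for start, correct in zip(starts_ms, n_correct):
        above_chance = correct > n_total / 2.0
        if above_chance and chi2_p_vs_chance(correct, n_total) < alpha:
            current.append(start)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []
    return (best[0], best[-1]) if best else None


def cancellation_time(first_start_ms, width_ms=100):
    """Center point of the first significant boxcar in the run."""
    return first_start_ms + width_ms // 2


# Hypothetical counts: 200 test trials per boxcar, 125 correct inside the run.
starts = list(range(0, 160, 10))
correct = [125 if 40 <= s <= 120 else 104 for s in starts]
run = longest_significant_run(starts, correct, 200)
print(run, cancellation_time(run[0]))  # prints (40, 120) 90
```

With boxcar start times of 40 to 120 msec, the first boxcar is centered at 90 msec, which matches the cancellation time reported in the text.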
Figure 4. Performance of the (A) pre-go signal and (B) post-stop decoders in distinguishing successful from failed stopping patterns. Insets in both panels illustrate sample projections of the decoder’s output responses (Y- and Z-axes indicate the values of o1 and o2, for successful and failed stopping trials, respectively). Error bars indicate SEM, so non-overlap with the chance bar (horizontal dashed red line) is not sufficient to indicate statistical significance. Time points beneath the thick horizontal orange bars (and black tick marks) denote start times of 100 msec boxcars with classification accuracy significantly above chance (i.e. 50%) (chi-square test, p < 0.05).
We then examined the response differences of the pre-go signal decoder. We observed a significant pattern difference between successful and failed stopping trials (p < 0.001) in 36 boxcars, extending from 470 to 120 msec before the go signal (see Methods for details on chi-square statistics). For subject J, significant decoding was observed in 35 boxcars spanning 460 to 120 msec before the go signal; for subject T, it was observed in 23 boxcars spanning 420 to 200 msec. These results indicate that the upcoming success or failure of inhibition is decodable from OFC patterns even before the start of the trial (Figure 4; also see Supplementary results-D). The pre-go signal pattern series are unlikely to occur by chance (p < 0.001 in all cases; see Methods for specific use of chi-square tests to quantify significance of consecutive bins). Our results do not tell us why this correlation exists, although one may infer that it reflects some internal state that drives successful versus failed inhibition. Thus, it is a likely correlate of proactive control. Overall, these results implicate Area 13 of OFC in the process of regulating stopping decisions.
The post-stop and pre-go decoders are statistically orthogonal
We next examined how the pre-go and post-stop decoders related to each other. We did so by comparing their weight vectors. We found a very low similarity between them (Pearson correlation coefficient, r = 0.0008 ± 0.0086), suggesting the two epoch patterns may be nearly statistically orthogonal and hence independent. A different possibility is that this low correlation may be an artifact of noise. To test this second possibility, we next performed a cross-validation procedure to estimate the maximum range of measured cross-correlation values had the variables been fully correlated given our noise properties. The cross-correlation coefficient obtained between the converged weight values of the pre-go and post-stop signal decoders (theoretical maximum), rx-max, was 0.9058 ± 0.0878; the value fell outside the central 98% of the cross-validated data (significant at p ≤ 0.01; 100 randomizations, average of rx-max from the randomized sets = 25.84 ± 0.82), substantiating the statistical independence of the pre-go and post-stop signal decoder weight patterns.
Moreover, when the decoding performances for two decoders were compared, they were significantly different during the time periods after the stop signal (t-stat = 6.0491, p = 0.003). Similarly, they were significantly different even before the beginning of trial (t-test, t-stat = 8.8874, p < 0.001). The above results suggest that the two decoders that predict successful versus failed inhibition are statistically orthogonal and thus dissimilar. More broadly, these results suggest that the computations associated with ostensible proactive and reactive control of stopping by OFC are distinct.
OFC encoding of stopping is not a by-product of value coding
The reward-encoding role of OFC is a hallmark of its function (Padoa-Schioppa, 2011; Rushworth et al., 2011; Schoenbaum et al., 2009; Schoenbaum et al., 2011; Schultz, 2000; Wallis, 2007). We therefore wondered whether the stopping-related activity that we observed might be an artifact of its reward roles. For example, it may be that there is some undetectable natural variation in the relative subjective value of the reward offered for correct performance. On trials in which the reward happened to have a slightly lower value, the subject would be less motivated to perform correctly; this fluctuation would then introduce a correlation between firing rates and successful stopping (see, for example, Sugrue et al., 2005, for a similar argument about LIP neurons).
If the stopping-related signals were a direct consequence of reward encoding for every neuron, we would see a positive correlation between coding patterns for rewards and stopping. We computed a reward index for all neurons by regressing their responses during the feedback epoch against the outcomes themselves. We computed a stopping index for all neurons by taking the difference between their firing rates during successful and failed stopping before SSRT (see Methods). We found no correlation between these indices in the post-stop signal time period (Pearson correlation, ϱ = 0.09, p = 0.4, Figure 5). Nor did we find such a correlation in the pre-go signal time period (ϱ = −0.02, p = 0.82, Figure 5).
Figure 5. Correlations between stopping and reward indices show no significant effect during 100 msec in the (A) pre-go signal and (B) post-stop signal time periods.
This lack of correlation at the neural level may be a sign that the reward code and the stopping code are different. It may also, in theory, be due to a lack of sufficient data to detect a significant effect. To test this idea, we performed a cross-validation analysis (see Methods). Specifically, we reasoned that if insufficient data were the problem, then a within-sample correlation would also produce no significant correlation. A positive coefficient resulting from a within-sample correlation, using randomly sampled half-sized subsets, would then indicate that our data have sufficient power to detect a significant effect (Blanchard et al., 2015a). We thus tested whether the correlation coefficient for stopping and reward indices fell below the 5th percentile of the coefficients obtained for within-group correlations. Indeed, the coefficient fell below the 1st percentile of those obtained for 100 randomizations in the cross-validation analysis. Figure 5 shows no correlation between stopping and reward indices, with p ≤ 0.01.
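The reasoning behind the within-sample check can be sketched as follows. This is a simplified illustration, not the procedure in Methods: it assumes each neuron's index is the mean of per-trial values, all names are hypothetical, and the data are simulated. Only the value 0.09 is taken from the text (the post-stop cross-index correlation).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def half_means(values, rng):
    """Shuffle one neuron's per-trial values; return the means of two halves."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    first, second = shuffled[:mid], shuffled[mid:]
    return sum(first) / len(first), sum(second) / len(second)


def within_sample_correlations(per_neuron_values, n_randomizations=100, seed=0):
    """Across-neuron correlation of an index estimated from two random halves."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_randomizations):
        halves = [half_means(values, rng) for values in per_neuron_values]
        results.append(pearson([h[0] for h in halves], [h[1] for h in halves]))
    return results


def percentile_rank(value, distribution):
    """Percentage of the distribution lying strictly below value."""
    return 100.0 * sum(d < value for d in distribution) / len(distribution)


# Simulated data: 20 neurons x 40 trials, each neuron with its own true index.
data_rng = random.Random(1)
simulated = [[0.5 * i - 5 + data_rng.gauss(0, 1) for _ in range(40)]
             for i in range(20)]
within = within_sample_correlations(simulated)
rank_of_cross_r = percentile_rank(0.09, within)  # 0.09 is the reported value
```

If the within-sample coefficients are reliably high, a near-zero cross-index coefficient cannot be attributed to noisy index estimates alone.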
Overlapping functional ensemble codes for stopping and for value
We next examined the relationship between patterns that distinguished successful versus failed stopping in the stop signal task and patterns that distinguished different reward values in the economic choice task. One potential confound in such an analysis is that OFC may encode action, and similar effects may reflect shared actions (Feierstein et al., 2006; Grattan and Glimcher, 2014; Roesch et al., 2006; Yoo et al., 2018; Strait et al., 2016). To deal with this problem, for the analysis of economic choice task data, we used only trials in which the subject accepted the offer. Thus, the action was the same (a saccade) in all cases. Below, we present results of training on two key epochs of the trial: (1) the post-go signal epoch, which captures task-related activity, and (2) the feedback epoch.
We first trained a network with successful versus failed stopping trials, and tested it with high versus low value accept trials (see Methods). This decoder network therefore looks for stopping-task-related patterns in the economic choice trials. When trained on the post-go signal epoch, the decoder differentiating successful from failed stopping showed significant similarity to patterns differentiating low from high reward accept trials in the time periods −140 to −60 msec and 420 to 570 msec relative to the presentation of offer 1 (Figure 6). We also trained a reversed network with the training and testing sets swapped: it was trained with patterns distinguishing high versus low value accept trials, and tested with patterns distinguishing failed and successful stopping trials. The reversed network, in contrast with the first, looks for economic-choice-task-related patterns in the stopping task trials. As with the first network, the reversed network trained on the post-go signal epoch of low versus high reward accept trials showed significant similarity to patterns differentiating failed from successful stopping in the time periods around −230 to −70 msec and 420 to 720 msec relative to the presentation of the go signal (Figure 6). This shows that patterns overlap between the tasks: the neural patterns relating to the decision process of one task can be tracked in several time periods (before trial onset and during the later parts of the trial) of the other task.
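The train-on-one-task, test-on-the-other logic can be sketched as below. Note the substitution: a nearest-centroid classifier stands in for the neural network decoders used here, purely to keep the example short. The class mapping (failed stopping with low value, successful stopping with high value) follows the pairing described in the text; the three-neuron population vectors and all function names are hypothetical.

```python
def centroid(vectors):
    """Mean population vector of a set of trials."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_nearest_centroid(class0_trials, class1_trials):
    """A model is just the pair of class centroids."""
    return centroid(class0_trials), centroid(class1_trials)


def predict(model, vector):
    """Return 0 or 1, whichever class centroid is closer to vector."""
    def squared_distance(center):
        return sum((v - c) ** 2 for v, c in zip(vector, center))
    return 0 if squared_distance(model[0]) <= squared_distance(model[1]) else 1


def cross_task_accuracy(model, test_class0, test_class1):
    """Fraction of other-task trials assigned to the corresponding class."""
    correct = sum(predict(model, v) == 0 for v in test_class0)
    correct += sum(predict(model, v) == 1 for v in test_class1)
    return correct / (len(test_class0) + len(test_class1))


# Hypothetical firing-rate vectors (three neurons per trial).
failed_stop = [[4, 1, 2], [5, 1, 3], [4, 2, 2]]
success_stop = [[2, 3, 2], [1, 4, 3], [2, 3, 1]]
low_value = [[5, 2, 2], [4, 1, 1]]
high_value = [[1, 3, 2], [2, 4, 2]]

# Train on the stopping task, test on the choice task.
stop_model = train_nearest_centroid(failed_stop, success_stop)
stop_to_choice = cross_task_accuracy(stop_model, low_value, high_value)

# Reversed network: train on the choice task, test on the stopping task.
choice_model = train_nearest_centroid(low_value, high_value)
choice_to_stop = cross_task_accuracy(choice_model, failed_stop, success_stop)
```

Above-chance accuracy in either direction indicates that the pattern separating the two classes in the training task also separates the corresponding classes in the other task.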
Figure 6. Performance of decoders trained on the (A) stopping task and (B) choice task, on the post-stop signal epoch (blue, magenta) and feedback epoch (green, brown). Error bars represent SEM. Time points beneath the thick horizontal highlighted bars (blue, green, magenta, brown) of each result, as labelled in the legend, denote the start times of 100 msec boxcars with classification accuracy above the chance level of 50% (chi-square test, p < 0.05). The red line indicates chance (50%) performance for both decoders.
Then, we trained on the feedback epoch of successful versus failed stopping trials; the resulting patterns showed similarity to those of high versus low reward accept trials in the time period 570 to 690 msec after the presentation of offer 1 (Figure 6). In the reversed network, constructed as described above, training on high versus low reward accept trials of the choice task showed significant similarity to failed versus successful stopping patterns in the time periods around 190-220 msec and 400-440 msec (Figure 6). These results show the similarity of feedback-related ensembles in the two tasks.
Reward promotes choice of the option it is associated with. The results show that the patterns used to encode stopping in the stop signal task correspond to patterns associated with value encoding in the economic choice task (Hayden, 2018). In particular, we found that task-related and feedback-epoch-related patterns of one task can be tracked in the other, in time periods both before and during the trial, in a non-continuous and dispersed manner. Similarities between the neural patterns of the two tasks support the idea of shared strategies between stopping and choice decisions. We hypothesize that reward-related patterns could act as motivational forces when present in stopping decisions. Conversely, the presence of stopping-related patterns in choice decisions could facilitate internal action strategies guiding the choice.
DISCUSSION
We examined responses of neurons in Area 13 of the OFC in two tasks, one an implementation of a classic stopping task and the other a simple economic choice task with similar structure. Although the role of this region in economic choice is well established, its role in stopping is not. Our finding that OFC ensembles predict stopping both before the trial and immediately before the stopping response demonstrates that this region does participate in the regulation of stopping. The timing of the two stopping-related patterns is reminiscent of the times associated with proactive and reactive control, respectively (Braver, 2012; Braver et al., 2007; Chen et al., 2010; Chikazoe et al., 2009; Hanes et al., 1998; Ito et al., 2003; Majid et al., 2013; Stuphorn et al., 2000; Stuphorn et al., 2010; Stuphorn and Emeric, 2012). Moreover, by interleaving the tasks, we were able to show that it is largely the same neurons participating in both processes. Finally, we show that the patterns that differentiate value in the choice task can distinguish failed from successful stopping in the stop signal task. These results thus support the hypothesis that stopping and economic choice reflect common computations occurring in overlapping circuits.
These results provide evidence in favor of the hypothesis that the neural processes that regulate stopping relate to the ones that regulate economic choices. In foraging theory, decisions are generally framed as accept-reject (Blanchard and Hayden, 2015; Kacelnik et al., 2011; Stephens and Anderson, 2001; Stephens and Krebs, 1986). From this perspective, binary choices, the mainstay of behavioral economics and microeconomics, are better thought of as two somewhat independent accept-reject decisions (Hayden and Moreno-Bote, 2018; Kacelnik et al., 2011). Each accept-reject decision, in turn, functions like its classic foraging counterpart, that is, as a choice between pursuing and refraining from pursuit (Stephens and Krebs, 1986; Freidin et al., 2009; Kacelnik et al., 2011). In other words, what appears to be a binary choice may actually be a pair of countermanding stopping decisions (Hayden, 2018). Our results provide tentative neural evidence in support of this idea.
These results also invite a reconsideration of the role of the OFC. This region, especially Area 13, is sometimes cast as a specialist in economic functions (Padoa-Schioppa, 2011; Wallis, 2007). Our results challenge that narrow view and endorse a broader view that encompasses stopping as well. We doubt that the role of OFC is limited to these two functions. Its other functions likely include contingent (rule-based) decisions, working memory, switching, and conflict monitoring (Bryden and Roesch, 2015; Lara et al., 2009; Mansouri et al., 2014; Meyer and Bucci, 2016; Sleezer et al., 2016; Sleezer et al., 2017). All these functions, including stopping and economic choice, can arguably be placed within the larger category of executive functions. Like stopping, executive functions more broadly are generally associated with dorsal prefrontal structures (Miller, 2000; Miller and Cohen, 2001). Some of these functions may also be part of the repertoire of the OFC as well.
Stopping is often associated with dorsal prefrontal structures, such as FEF, and with subcortical structures, like SC (Hanes and Schall, 1995; Logan et al., 2015; Schall, 1991). Our work suggests that the role of OFC is qualitatively different than these regions. Specifically, we find evidence that single neurons were neither consistently associated with a higher or lower firing rate, nor were they associated with two discrete sets of neurons, as in frontal eye fields (FEF) and superior colliculus, SC (Hanes and Carpenter, 1999; Pouget et al., 2017; Stuphorn et al., 2000). Instead, our results show stopping correlates only when examining patterns found in ensembles of cells. This finding suggests that encoding of stop signals may be more abstract than the coding in more dorsal regions.
Overall, our results suggest that one core function of OFC may be to generate an abstract regulatory signal that feeds into a cascade of downstream structures that ultimately determine choice (Hunt and Hayden, 2017). In this way, it may be similar to other regions, especially cingulate cortex (Blanchard et al., 2015b; Hillman and Bilkey, 2010; Shenhav et al., 2013). In the context of economic choice, this signal will resemble a value signal; in other cases it will correlate with other relevant task variables. This view is consistent with the idea that choice and control processes both reflect a gradual transformation occurring in a distributed manner across brain regions, rather than a modular one (Balasubramani et al., 2018; Eisenreich et al., 2017; Hunt and Hayden, 2017). One benefit of this view is that it provides a basis for the observed role of OFC and adjacent structures in self-control (Kable and Glimcher, 2007; McClure et al., 2004).
The OFC is the major gateway by which sensory information enters the prefrontal cortex and is a major source of visceral information as well (Öngür and Price, 2000). It may therefore occupy an early position in PFC processing hierarchies (Carmichael and Price, 1994; Fuster, 1988; Fuster, 2001; Rushworth et al., 2012; Rushworth et al., 2011). These facts raise the possibility that OFC serves as a first (or at least a relatively early) stage for computing preliminary executive signals that can affect, but not determine, behavior (Cavada et al., 2000; Ebitz and Hayden, 2016; Öngür and Price, 2000; Wallis, 2007).
METHODS
Subjects
Two male rhesus macaques (Macaca mulatta, subject J, age 10, and subject T, age 5) served as subjects. All animal procedures were approved by the University Committee on Animal Resources at the University of Rochester and were designed and conducted in compliance with the Public Health Service’s Guide for the Care and Use of Animals.
Recording site
A Cilux recording chamber (Crist Instruments) was placed over area 13 of OFC (Figure 1). The targeted area extends along the coronal planes situated between 28.65 and 33.60 mm rostral to the interaural plane, with varying depth. Position was verified by magnetic resonance imaging with the aid of a Brainsight system (Rogue Research Inc). Neuroimaging was performed at the Rochester Center for Brain Imaging, on a Siemens 3T MAGNETOM Trio Tim using 0.5 mm voxels. We confirmed recording locations by listening for characteristic sounds of white and grey matter during recording, which in all cases matched the loci indicated by the Brainsight system.
Electrophysiological techniques
Single electrodes (Frederick Haer & Co., impedance range 0.8–4 MOhm) were lowered using a microdrive (NAN Instruments) until waveforms of between one and five neuron(s) were isolated. Individual action potentials were isolated on a Plexon system. Neurons were selected for study solely based on the quality of isolation; we never preselected based on task-related response properties.
Eye tracking and reward delivery
Eye position was sampled at 1,000 Hz by an infrared eye-monitoring camera system (SR Research). Stimuli were controlled by a computer running MATLAB (Mathworks) with Psychtoolbox (Brainard and Vision, 1997) and Eyelink Toolbox (Cornelissen et al., 2002). A standard solenoid valve controlled the duration of water delivery. The relationship between solenoid open time and water volume was established and confirmed before, during, and after recording.
Task
The stopping task is a measure of self-control that avoids some of the limitations of intertemporal choice tasks (Hayden, 2016). The task followed the standard stop signal paradigm (Hanes and Schall, 1995; Logan, 1994; Logan and Cowan, 1984). Subjects were placed in front of a computer monitor (1920x1080px) with a black background. Following a brief (300 msec) central fixation on a white circle (radius 25px, Figure 1), the fixation spot disappeared as an eccentric saccade target appeared (90px white square, 2.38 degrees, positioned at 288px on the left or 1632px on the right of the screen, 50% chance). A go trial (67% of trials, randomly selected) was indicated by the go signal alone, that is, the peripheral target, whereas a stop trial (33% of trials, randomly selected) was indicated by the additional appearance of a stop signal, a central gray square (90px square, 2.38 degrees), delayed relative to the go signal presentation. Stop signal delays (SSD) were set to stabilize at the delay producing approximately 50% successful stopping, averaged over all stop trials recorded up to that point in the day (SSD-50); SSDs were adjusted through a staircase procedure in steps of 16 msec. On go trials, subjects were rewarded for making a saccade to the go signal and fixating on it for 200 msec; on stop trials, subjects were rewarded for inhibiting their saccade to the go signal and fixating on the stop signal for 400 msec. Water rewards were provided as feedback and were contingent on the subject’s performance. Rewards were always 125 μl. The inter-trial interval was 800 msec.
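The staircase adjustment of the stop signal delay can be illustrated with a minimal sketch. The text states only the 16 msec step and the target of roughly 50% successful stopping; the one-up/one-down rule, the starting delay, and the simulated observer below are assumptions made for illustration.

```python
import random

STEP_MS = 16  # staircase step stated in the text


def update_ssd(ssd_ms, stopped_successfully, step_ms=STEP_MS, min_ms=0):
    """Assumed one-up/one-down rule: a successful stop makes the next
    stop trial harder (longer delay); a failed stop makes it easier."""
    if stopped_successfully:
        return ssd_ms + step_ms
    return max(min_ms, ssd_ms - step_ms)


def simulate_session(n_stop_trials=2000, start_ssd_ms=96,
                     true_ssd50_ms=200, spread_ms=40, seed=0):
    """Run the staircase against a hypothetical observer whose chance of
    stopping falls linearly from 1 to 0 around true_ssd50_ms."""
    rng = random.Random(seed)
    ssd = start_ssd_ms
    n_success = 0
    for _ in range(n_stop_trials):
        p_success = 0.5 - (ssd - true_ssd50_ms) / (2 * spread_ms)
        p_success = min(1.0, max(0.0, p_success))
        success = rng.random() < p_success
        n_success += success
        ssd = update_ssd(ssd, success)
    return n_success / n_stop_trials, ssd


success_rate, final_ssd = simulate_session()
print(f"proportion of successful stops: {success_rate:.3f}")
print(f"final SSD: {final_ssd} msec")
```

Under this rule the delay hovers around the point where stopping succeeds on about half of the stop trials, which is the property the SSD-50 estimate relies on.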
The economic choice task had a task framework similar to that of the stop signal task, and the two tasks were randomly interleaved at intervals of 1-3 trials. In go trials (random 67% of the total), a peripheral target called the go offer (90px white square, 2.38 degrees, positioned at 288px on the left or 1632px on the right of the screen, 50% chance) was presented, and it was randomly associated with a low (15μl), medium (125μl), or high (250μl) reward offer, as indicated by yellow, blue, and magenta colored squares, respectively. In this task, the go trials were named forced choice trials, and the go offer was called offer 1. In stop trials (random 33% of the total), called choice trials, a central stop offer (offer 2, 90px square, 2.38 degrees) was additionally presented, delayed with respect to the appearance of offer 1. Offer 2 was also randomly associated with yellow, blue, or magenta to indicate a low, medium, or high reward size. Offer 1 in stop trials was always blue, representing a medium-sized reward offer. This setup allowed the subject to make a choice through reward comparison on choice trials, and a forced choice when only offer 1 was presented. All other parameters were the same as in the stop signal task.
Behavioral analysis
The inhibition function related the proportion of failed stopping trials to the stop signal delay (SSD). The delay from the presentation of the go signal that produced 50% successful cancellation in the stop signal task (SSD-50) was used for computing the stop signal reaction time (SSRT). SSRT was computed using the median and integration methods (Hanes and Schall, 1995; Logan, 1994; Logan and Cowan, 1984; Verbruggen and Logan, 2008). The median method computed the median of the go trials’ reaction time (RT) distribution and subtracted SSD-50 from it to give SSRT. The integration method found the point in the go trials’ RT distribution below which half of the total area lay and subtracted SSD-50 from it to give SSRT. The two methods gave nearly equal results, and their estimates were averaged to obtain the final SSRT reported for each subject.
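The two SSRT estimates can be sketched as follows. The reaction times and the SSD-50 value are made-up numbers, and the quantile indexing used for the integration method is one common convention, assumed here because the text does not specify it.

```python
import math
import statistics


def ssrt_median(go_rts_ms, ssd50_ms):
    """Median method: median go RT minus SSD-50."""
    return statistics.median(go_rts_ms) - ssd50_ms


def ssrt_integration(go_rts_ms, ssd50_ms, p_respond=0.5):
    """Integration method: the go RT below which a fraction p_respond of
    the distribution lies, minus SSD-50. At SSD-50, p_respond is 0.5."""
    rts = sorted(go_rts_ms)
    idx = max(0, math.ceil(p_respond * len(rts)) - 1)
    return rts[idx] - ssd50_ms


# Hypothetical go-trial reaction times (msec) and SSD-50 (msec)
go_rts = [250, 260, 270, 280, 290, 300, 310, 320]
ssd50 = 180

median_estimate = ssrt_median(go_rts, ssd50)
integration_estimate = ssrt_integration(go_rts, ssd50)
final_ssrt = (median_estimate + integration_estimate) / 2
print(median_estimate, integration_estimate, final_ssrt)
```

Because both methods are evaluated at the 50% point of the go RT distribution, they differ only through how that point is read off a finite sample, which is consistent with the nearly equal results reported.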
Statistical methods
Separate PSTH matrices were constructed for every neuron by aligning spike rasters to the presentation of the go signal and of the stop signal. Firing rates were calculated in 1 msec bins but were generally analysed in longer epochs. Data were normalized by subtracting the mean firing rate during the inter-trial interval (ITI; baseline) and then z-scoring each neuron’s data; the normalized data were used for the decoder analyses presented in the manuscript. Decoding was also tested with data that were only z-scored, and those results are presented in the supplementary material. For display, PSTHs were smoothed using 200 msec running boxcars. Tests used in the study include the two-sample t-test for parametric analysis, the Wilcoxon rank test for non-parametric analysis, the chi-square test for comparing a decoder’s classification accuracy against baseline (50% classification accuracy), and Pearson correlation for correlation analysis. To compute population tuning, we selected neurons with significant (p < 0.05) differences between successful and failed stopping trials using the Wilcoxon rank test.
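One way to compare a decoder's classification accuracy against the 50% baseline with a chi-square test is a one-degree-of-freedom goodness-of-fit test on the counts of correct and incorrect classifications. The sketch below assumes that form; the text does not give the exact construction, and the counts are hypothetical.

```python
import math


def chi_square_vs_chance(n_correct, n_total, chance=0.5):
    """Goodness-of-fit chi-square (1 degree of freedom) comparing observed
    correct/incorrect counts with the counts expected at chance."""
    n_wrong = n_total - n_correct
    expected_correct = chance * n_total
    expected_wrong = (1 - chance) * n_total
    chi2 = ((n_correct - expected_correct) ** 2 / expected_correct
            + (n_wrong - expected_wrong) ** 2 / expected_wrong)
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: 18 of 25 test patterns classified correctly
chi2, p = chi_square_vs_chance(18, 25)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The test is two-sided, so a claim of above-chance accuracy additionally requires the observed accuracy to exceed 50%.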
Decoding analyses
We chose a neural network based decoding technique because it could efficiently analyse population responses from frontal cortex, which are highly multiplexed and nonlinear. To generate population activation states as input patterns for the decoding analysis, we first separated all trials of each neuron by trial condition (successful and failed stopping trials). Then, we averaged the activity from 10 trials randomly sampled, with replacement, from a condition to form the activation state of a neuron in a particular time period. The averaged responses of all 96 neurons were pooled to generate one population activation state for a particular trial condition and a specific time period. 100 unique activation state patterns were used for network training. 75% of the data was used for training and the rest was used for testing the network. The procedure is similar to that carried out in other studies (e.g., Mante et al., 2013; Pouget et al., 2000; Rigotti et al., 2013; Wang and Hayden, 2017).
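The construction of population activation states can be sketched as below. The simulated firing rates, the alternation of conditions across the 100 patterns, and the function names are assumptions for illustration; the text specifies only the 10-trial resampling with replacement, the 96 neurons, the 100 patterns, and the 75/25 split.

```python
import random


def make_activation_state(trials_by_neuron, n_sample=10, rng=random):
    """One pseudo-population pattern for one condition and epoch: for each
    neuron, average n_sample trials drawn with replacement."""
    return [sum(rng.choices(trials, k=n_sample)) / n_sample
            for trials in trials_by_neuron]


def make_dataset(cond_a, cond_b, n_patterns=100, train_frac=0.75, seed=0):
    """Build labelled patterns (label 0 = cond_a, 1 = cond_b) and split
    them into training and testing sets."""
    rng = random.Random(seed)
    patterns = []
    for i in range(n_patterns):
        label = i % 2  # assumed: conditions alternate, 50 patterns each
        source = cond_a if label == 0 else cond_b
        patterns.append((make_activation_state(source, rng=rng), label))
    rng.shuffle(patterns)
    n_train = int(train_frac * n_patterns)
    return patterns[:n_train], patterns[n_train:]


# Hypothetical data: 96 neurons, 30 trials per condition, one rate per trial
data_rng = random.Random(1)
successful = [[data_rng.gauss(1.0, 1.0) for _ in range(30)] for _ in range(96)]
failed = [[data_rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(96)]

train_set, test_set = make_dataset(successful, failed)
print(len(train_set), len(test_set), len(train_set[0][0]))
```

Averaging resampled trials within each neuron lets separately recorded neurons be combined into a single 96-element input vector per pattern.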
The network used to study the stopping patterns had a single hidden layer with 100 hidden nodes and 2 output nodes, each representing one target condition for classification. The number of input nodes was equal to the total number of neurons used for analysis (96, from two subjects). The network weights were initialized to small random numbers between −0.01 and 0.01.
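The architecture and initialization described here can be sketched as a forward pass. The absence of bias terms and the reading of the slope-5 hyperbolic tangent (described with the training equations) as tanh(5h) are assumptions; the input pattern is random placeholder data.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 96, 100, 2
SLOPE = 5.0  # assumed to mean g(h) = tanh(SLOPE * h)


def init_weights(n_rows, n_cols, rng, low=-0.01, high=0.01):
    """Weight matrix with small uniform random entries, as in the text."""
    return [[rng.uniform(low, high) for _ in range(n_cols)]
            for _ in range(n_rows)]


def g(h):
    """Node activation function."""
    return math.tanh(SLOPE * h)


def forward(x, w_first, w_second):
    """Input -> hidden -> output responses (no bias terms assumed)."""
    hidden = [g(sum(w * xk for w, xk in zip(row, x))) for row in w_first]
    output = [g(sum(w * vj for w, vj in zip(row, hidden)))
              for row in w_second]
    return hidden, output


rng = random.Random(0)
w_first = init_weights(N_HIDDEN, N_INPUT, rng)    # input -> hidden
w_second = init_weights(N_OUTPUT, N_HIDDEN, rng)  # hidden -> output

x = [rng.gauss(0.0, 1.0) for _ in range(N_INPUT)]  # placeholder pattern
hidden, output = forward(x, w_first, w_second)
print(len(hidden), output)
```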
The following back-propagation algorithm was used for training the decoders (Haykin and Network, 2004; Rumelhart et al., 1986; Rumelhart et al., 1988; Werbos, 1974). Below, input nodes are denoted by the subscript k, hidden nodes by the subscript j, and output nodes by the subscript i. The output error, e, associated with the network’s response to the p’th input pattern was
where y_i was the response of the i’th output node, and the desired output was 1 if the i’th output node was associated with the target trial condition for the corresponding input pattern (e.g., successful stopping, failed stopping) and 0 otherwise. The total output error over all input patterns was computed by,
The network’s objective was to minimize the squared output error (eqn. 1) for the p’th pattern, as denoted by eqn. (3).
The response of any node was a hyperbolic tangent function (g) of slope = 5 of the total input to it. The output node response, y_i, as a function of its net input, h_i^s, was calculated as,
where the net input (h_i^s) to the output layer was,
In the above, the weights w_ij with superscript s belong to the second level of the network, between the hidden and output layers. V_j denoted the output of the hidden layer, and it was represented as a function of the net input to the hidden node (h_j^f) as follows,
and
The superscript f in eqns. (6, 7) denotes the first level of the network, between the input and hidden layers; w_jk were its weights, and x_k was the input pattern to the neural network. Weight updates were proportional to the negative gradient of the error for the p’th pattern, E_p, with respect to the weights. All updates happened trial by trial in the training phase. The update used at the second level was given by eqn. (8), and that at the first level by eqn. (10).
where,
where,
η is the learning rate, set to 0.001 for the pre-go and post-stop signal decoders, and g’ denotes the first-order derivative of the hyperbolic tangent function.
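For reference, a standard back-propagation formulation consistent with the symbols defined above can be written as follows. This is a sketch under assumptions: the numbering follows the in-text references to eqns. (1) to (10), with the final "where" clause taken as eqn. (11); d_i^p is introduced here to denote the desired output; and the factor of 1/2 and the absence of bias terms are conventional choices, not details confirmed by the text.

```latex
\begin{align}
e_i^{p} &= d_i^{p} - y_i^{p} \tag{1}\\
E &= \sum_{p} E_p \tag{2}\\
E_p &= \tfrac{1}{2}\sum_{i}\left(e_i^{p}\right)^{2} \tag{3}\\
y_i &= g\!\left(h_i^{s}\right) \tag{4}\\
h_i^{s} &= \sum_{j} w_{ij}^{s}\, V_j \tag{5}\\
V_j &= g\!\left(h_j^{f}\right) \tag{6}\\
h_j^{f} &= \sum_{k} w_{jk}^{f}\, x_k \tag{7}\\
\Delta w_{ij}^{s} &= -\eta\,\frac{\partial E_p}{\partial w_{ij}^{s}} = \eta\,\delta_i^{s}\, V_j \tag{8}\\
\delta_i^{s} &= e_i^{p}\, g'\!\left(h_i^{s}\right) \tag{9}\\
\Delta w_{jk}^{f} &= -\eta\,\frac{\partial E_p}{\partial w_{jk}^{f}} = \eta\,\delta_j^{f}\, x_k \tag{10}\\
\delta_j^{f} &= g'\!\left(h_j^{f}\right)\sum_{i} \delta_i^{s}\, w_{ij}^{s} \tag{11}
\end{align}
```

Equations (8) and (10) follow from applying the chain rule to eqn. (3) through eqns. (4) to (7); if g(h) = tanh(5h), then g'(h) = 5(1 − g(h)^2).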
We used two different decoders, trained on data from (1) the pre-go signal and (2) the post-stop signal time periods, to show OFC’s active participation in stopping; the former worked on data aligned to the presentation of the go signal (time = 0), and the latter on data aligned to the stop signal. For the pre-go decoder, the training data were population activation states generated by averaging the signal over the fixation epoch spanning 300 msec before the presentation of the go signal. For the post-stop decoder, the training data were generated by averaging the firing in the post-stop signal epoch. The entire network was run for n = 100 instances with different random weight initializations to obtain the average output performance. The training procedure converged to a classification accuracy above 80% in all instances, and the converged weights at the end of training were used for testing the decoder. The testing data were population activation states generated by averaging within 100 msec boxcars that slid with a step size of 10 msec (a total of 91 boxcars).
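The sliding test windows can be sketched as below. A 1,000 msec analysis span is assumed because it is the span for which 100 msec boxcars stepped by 10 msec yield the stated 91 windows; the ramp-shaped input is placeholder data.

```python
def boxcar_means(rate_per_ms, width_ms=100, step_ms=10):
    """Mean rate in each sliding boxcar; returns (start_times, means).
    rate_per_ms holds one value per 1 msec bin."""
    n_bins = len(rate_per_ms)
    starts = list(range(0, n_bins - width_ms + 1, step_ms))
    means = [sum(rate_per_ms[s:s + width_ms]) / width_ms for s in starts]
    return starts, means


# Placeholder signal: a ramp over an assumed 1,000 msec span
signal = list(range(1000))
starts, means = boxcar_means(signal)
print(len(starts), starts[0], starts[-1])
print(means[0], means[-1])
```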
Similarities in the functioning and generalization of the pre-go and post-stop decoders were analysed by comparing their converged weights, as well as their classification accuracies. For cross validation, the similarity index (r-max) was computed by cross-correlating (with zero lag) the converged hidden layer weight vectors of the two decoders of interest. The index was averaged across n (=100) instances of networks with different weight initializations. The similarity index obtained from autocorrelating the weight vectors was used to statistically compare and cross-validate the results from cross-correlation, and the results were significant (t-test, tstat = 210, p < 0.001). Classification accuracies of the two decoders, in the pre-go signal or post-stop signal time period, were compared using a t-test on the average performances of the two decoders computed from n instances (with different random weight initializations).
Cancellation time was defined by the time window of the test boxcar positioned at the first instance of at least four consecutive test boxcars (100 msec windows moving in steps of 10 msec) whose performance was significantly higher than 50% using the chi-square test (p < 0.05). This criterion avoids the false positives that would otherwise appear with about 99% probability if any single significant instance among the 91 boxcars were accepted. Using simulations with Markov chains, we found that at least 4 consecutive significant windows were needed to claim significance at p < 0.001; the criterion of at least 4 consecutive significant bins was therefore used for the pre-go and post-stop decoder results as well as for cancellation time. The average latency of cancellation signals relative to SSRT was found by subtracting the SSRT of each subject from the mean cancellation time.
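The consecutive-window criterion can be sketched as follows. Two assumptions are made: the 99% figure is reproduced by treating the 91 tests as independent (overlapping boxcars are not, which is presumably why the Markov chain simulations were used), and the cancellation time is read as the midpoint of the first boxcar in the run, matching the reported example in which a 40 to 140 msec window gives 90 msec.

```python
def familywise_false_positive(alpha=0.05, n_tests=91):
    """Chance of at least one false positive among n_tests independent
    tests, each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def first_run_start(significant, min_run=4):
    """Index of the first window that begins a run of at least min_run
    consecutive significant windows, or None if there is no such run."""
    run = 0
    for i, is_sig in enumerate(significant):
        run = run + 1 if is_sig else 0
        if run == min_run:
            return i - min_run + 1
    return None


def cancellation_time(starts_ms, significant, width_ms=100, min_run=4):
    """Assumed definition: midpoint of the first boxcar in the first
    qualifying run."""
    i = first_run_start(significant, min_run)
    if i is None:
        return None
    return starts_ms[i] + width_ms / 2


print(f"{familywise_false_positive():.4f}")

# Hypothetical significance pattern: an isolated hit at 10 msec, then a
# run of significant boxcars starting at 40 msec
starts_ms = list(range(0, 200, 10))
significant = [s == 10 or 40 <= s <= 170 for s in starts_ms]
example_time = cancellation_time(starts_ms, significant)
print(example_time)
```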
The decoders used for finding similarities between stopping and economic choice were similar to those described above. Forced choice trials of the offer 1 (accept) kind were chosen for analysis. These trials were divided into low and high value types based on reward magnitude: low value when the reward was low or medium, and high value when the reward was high. For this analysis, we also considered a post-go signal decoder, similar to the post-stop signal decoder except that it was trained on the post-go signal epoch, from the presentation of the go signal (in the stop signal task) or of offer 1 (in the economic choice task) until the reaction time. The feedback decoder was trained on the neural signals from the start to the end of feedback, for both tasks. All decoders were tested using trials from the economic choice task, with boxcars of 100 msec length moving in steps of 10 msec.
Reward and stopping index
The reward index for every neuron was measured by linearly regressing the firing rate in the outcome epoch (between reaction time and feedback) on the received reward size in economic choice trials. The stopping index was measured as the difference in normalized firing rates (FR) between successful and failed stopping trials, divided by their norm.
Cross validation tests were performed to support the idea that we had sufficient data to detect an effect had one been present, and to suggest that the lack of a significant correlation between stopping and reward indices was statistically meaningful. For the cross validation analysis, all trials of each neuron were randomly separated into two groups, A and B. Stopping and reward indices were computed for each group of each neuron. We then correlated the stopping indices of groups A and B, and the reward indices of groups A and B. A total of n (=100) random permutations were performed to generate different A and B sets. The test should ideally show high correlations between the indices of A and B for any instance, and we indeed saw positive correlations between stopping-indexA and stopping-indexB, and similarly between reward-indexA and reward-indexB. We confirmed that the actual correlation coefficient between stopping and reward indices in OFC fell within the bottom 1% of the coefficients computed for the n instances of stopping-indexA and stopping-indexB. The same was confirmed for the n coefficients of reward-indexA and reward-indexB. The results thus indicated no significant correlation between stopping and reward indices, with p ≤ 0.01.
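The split-half procedure can be sketched as below. The per-trial values are simulated, and the index is a stand-in (the mean over trials) rather than the stopping or reward index itself; only the random A/B split, the per-neuron index computation, and the correlation across neurons over 100 permutations follow the text.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_correlation(trials_by_neuron, index_fn, rng):
    """Randomly split each neuron's trials into groups A and B, compute
    the index in each group, and correlate A with B across neurons."""
    index_a, index_b = [], []
    for trials in trials_by_neuron:
        shuffled = list(trials)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        index_a.append(index_fn(shuffled[:half]))
        index_b.append(index_fn(shuffled[half:]))
    return pearson(index_a, index_b)


# Hypothetical data: 96 neurons, 40 trials each, neuron-specific effect
rng = random.Random(0)
true_effect = [rng.gauss(0.0, 1.0) for _ in range(96)]
trials_by_neuron = [[rng.gauss(mu, 1.0) for _ in range(40)]
                    for mu in true_effect]

# Stand-in index: mean over trials (not the paper's index definition)
correlations = [split_half_correlation(trials_by_neuron,
                                       statistics.fmean, rng)
                for _ in range(100)]
print(min(correlations), max(correlations))
```

The distribution of these split-half coefficients serves as the reference against which the observed correlation between the two different indices is compared.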
Results-B
Population averages provide weak information about stopping
Analysis of single neurons did not provide strong evidence for a role for OFC in stopping. The percent of neurons that individually distinguish successful and failed stopping trials (regardless of sign) was 8.43% during the 100 msec post-stop signal time period, and was 10.50% during the 100 msec pre-go signal time period. (These epochs were selected before analysis in order to reduce the likelihood of p-hacking). These proportions were not significantly greater than chance in either of the two key epochs (chi-square stat = 1.22, p = 0.26 in the post-stop signal time period; chi-square stat = 1.8, p = 0.17 in the pre-go signal time period). This lack of a detectable effect does not imply that a correlation between stopping and unit activity in OFC does not exist; rather it suggests that if it does exist it is too weak to detect using conventional methods that focus on single neurons in a sample of the size we collected.
We next tested whether successful and failed stopping trials have a consistent sign of effect on firing rates. The percent of significantly positive cells (successful > failed) was 5.40%, which was not significantly different from the percent of significantly negative (successful < failed) cells (3.03%; chi-square test, chi-square stat = 0.52, p = 0.47) in the post-stop signal period. The difference in the sizes of the two cell classes was also not significant before the start of the trial, in the pre-go signal time period (significantly positive cells 7.55%, significantly negative cells 2.95%, chi-square stat = 2.40, p = 0.12).
Next we looked at grand averages of populations of neurons. We observed no difference between successful and failed stopping trials either after the stop signal or before the beginning of trial. Specifically, during the post-stop signal time period, responses were slightly less for successful than failed stopping in subject J (average of 0.3 spikes/sec, p = 0.6); the opposite pattern was observed in subject T (average of 0.52 spikes/sec, p = 0.53). Neither effect was statistically significant. Thus, these results suggest that conventional population averages don’t reveal information about the pattern of stopping. Together these analyses indicate that, if stopping correlates exist in OFC, they are of a different form than they take in regions like FEF and SC.
Population activity for successful stopping and failed stopping with respect to (A, C) go signal presentation and (B, D) stop signal presentation, for subjects J and T. Time from start of the go (stop) signal to SSRT is shaded in panels A and C (B and D). Data for all SSDs are averaged to present successful and failed stopping trials. Error bars denote SEM. They don’t reveal significant information about the pattern of stopping.
Results-C
Both the post-stop signal and pre-go signal decoders were able to classify the success of stopping significantly above chance before SSRT and before go signal presentation, respectively (chi-square test, p < 0.05; see Methods). These results suggest that stopping codes can be successfully differentiated irrespective of the normalization method used in the study. (In the manuscript, normalization was carried out by subtracting the mean firing during the inter-trial interval (ITI; baseline) and then z-scoring each neuron’s data, and the normalized data were used for the decoder analysis.)
Results-D
The post-stop signal decoder was able to classify the success of stopping significantly above chance in a period from 40 to 170 msec after the stop signal for subject J, and from 40 to 220 msec for subject T (these times indicate the start of 100 msec boxcars; chi-square test, p < 0.05; see Methods). The first significant boxcar therefore spanned 40 to 140 msec, giving an average cancellation time of 90 msec. This preceded the average stopping response by 50 msec in subject J and by 30 msec in subject T, suggesting that OFC responses may precede the stopping response. For the pre-go signal decoder, decoding accuracy was high from 460 to 120 msec before the appearance of the go signal in subject J, and from 420 to 200 msec before it in subject T.