Trading accuracy for speed over the course of a decision

Abstract
While making decisions in uncertain and dynamic environments, humans and other animals often need to balance the desire to gather sensory information (to make the best choice) with the urgency to act, facing a speed-accuracy tradeoff (SAT). Given the ubiquity of the SAT across species, extensive research has been devoted to understanding the computational mechanisms allowing its regulation at different timescales, including from one context to another, and from one decision to another.
However, in unstable environments, animals frequently need to change their SAT on even shorter timescales, i.e., over the course of an ongoing decision, and very little is known about the mechanisms that allow such rapid adaptations. The present study aimed at addressing this issue. Human subjects performed a decision task with changing evidence, dissociating in time the deliberation process from the moment of commitment. In this task, subjects received rewards for correct answers but incurred penalties for mistakes. An increase or a decrease in penalty occurring halfway through the trial promoted rapid SAT shifts, favoring speeded decisions either in the early or in the late stage of the trial. Importantly, these shifts were associated with stage-specific adjustments in the accuracy criterion exploited for committing to a choice and, relatedly, with dynamic, non-linear changes in urgency. The subjects who decreased their accuracy criterion the most at a given decision stage exhibited the highest gain in speed, but also the highest cost in terms of performance accuracy at that time. Altogether, the current findings offer a unique extension of former work, by suggesting that dynamic changes in urgency allow the regulation of the SAT within the timescale of a single decision.

INTRODUCTION
Humans and other animals are motivated to make choices that maximize their reward rate (Balci et al. 2011). Paradoxically, while decision accuracy increases the likelihood of getting rewards, the long deliberation time necessary to make accurate choices can ultimately reduce the reward rate (Gold and Shadlen 2002; Drugowitsch et al. 2012; Carland et al. 2019). Hence, animals always need to balance the desire to gather sensory information (to make the best choice) with the pressure to act quickly, facing a speed-accuracy tradeoff (SAT; Henmon 1911; Palmer et al. 2005; Rinberg et al. 2006; Trimmer et al. 2008; Chittka et al. 2009; Salinas et al. 2014). Given the central role of the SAT in decision-making, extensive research is being devoted to understanding the computational mechanisms at the basis of its regulation (Bogacz et al. 2010b; Schall 2019; for review see Heitz 2014).
For decades, models of decision-making have offered theoretical accounts of how the brain may regulate the SAT (Stone 1960; Ratcliff and Rouder 1998; Ratcliff et al. 2001, 2003; Thapar et al. 2003; Bogacz et al. 2010a; for review see Heitz 2014).
Traditional formalizations postulate that decision-making involves an accumulation of sensory evidence, which drives neural activity up to a fixed level; once this critical threshold is reached, an action is selected (e.g., Vickers 1970; Reddi and Carpenter 2000; Usher and McClelland 2001; Leon and Shadlen 2003; Mazurek et al. 2003; Gold and Shadlen 2007; Brown and Heathcote 2008; Heitz and Schall 2012; Hanks et al. 2014; Kelly and O'Connell 2015; Derosiere et al. 2018; Alamia et al. 2019; Schall 2019). In this view, to achieve a desired accuracy criterion, the brain controls the height of a neural threshold that determines how much evidence-related neural activity is needed to commit to a decision. Fast decisions involve low accuracy criteria, reducing the amount of evidence required for neural activity to reach the threshold, while longer and more accurate deliberations imply higher accuracy criteria. Such adaptations were shown to occur both from one SAT context to another (e.g., Forstmann et al. 2008; Herz et al. 2016, 2017) and from one decision to another within the same context (e.g., Purcell and Kiani 2016; Fischer et al. 2018; Desender et al. 2019), providing a key mechanism to trade speed with accuracy at different timescales. Yet, some recent work questioned the accuracy (e.g., Ditterich 2006) and the generalizability (e.g., Cisek et al. 2009; Thura 2016) of perfect accumulator models, as well as their suitability to describe the neural processes underlying the SAT (Rae et al. 2014; Servant et al. 2019).
About a decade ago, several studies demonstrated that the amount of evidence required to commit to a choice can sometimes decrease over the course of a decision, indicative of an accuracy criterion that wanes as time elapses (i.e., rather than being fixed over time; e.g., Cisek et al. 2009; Gluth et al. 2012). To explain these data, some authors incorporated a time-dependent "urgency" signal in decision-making models, which, combined with sensory evidence, pushes neural activity upwards over time, effectively implementing a dropping accuracy criterion (Ditterich 2006; Churchland et al. 2008; Cisek et al. 2009; Standage et al. 2011b; Drugowitsch et al. 2012; Thura et al. 2012; Kira et al. 2015). Such urgency-based models often appeared to explain behavioral data better than urgency-free models (Ditterich 2006; Churchland et al. 2008; Standage et al. 2011a; Carland et al. 2015; Murphy et al. 2016; Hauser et al. 2017; Malhotra et al. 2017; Palestro et al. 2018; Steinemann et al. 2018; although see: Hawkins et al. 2015; Voskuilen et al. 2016). If we assume the urgency signal is linear, it can be characterized by an initial state and a growing rate, which determine the initial height and the dropping rate of the accuracy criterion, respectively, and thus play a central role in the regulation of the SAT. Consistently, in situations where speed is of the essence, both the initial state (e.g., Steinemann et al. 2018; Thura 2020; Thura et al. 2014) and the growing rate (e.g., Hanks et al. 2014; Murphy et al. 2016) of urgency are higher, implying a lower initial criterion that quickly decays over time, compared to when the emphasis is on accuracy.
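For concreteness, a minimal formalization of such a linear urgency signal (a sketch consistent with the gating scheme used in our analyses below, not the exact equations of any single cited model) is \( u(t) = u_0 + \beta t \), with commitment occurring when \( E(t)\, u(t) \geq T \), where \( u_0 \) is the initial state, \( \beta \) the growing rate, \( E(t) \) the accumulated evidence and \( T \) a fixed neural threshold. The effective accuracy criterion is then \( E^{*}(t) = T / (u_0 + \beta t) \): increasing either \( u_0 \) or \( \beta \) lowers the criterion and yields faster, less conservative decisions.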
Making decisions in dynamic environments sometimes requires adjusting the SAT on very short timescales, i.e., not only from one context or one decision to another but also during an ongoing decision (Gluth et al. 2012). For example, imagine a monkey foraging for fruits in a tree, calmly evaluating which looks tastier when, all of a sudden, a more dominant monkey shows up. In such a scenario, the foraging animal will have to speed up its decision, which may lead it to commit to a choice that does not meet its initially high standards. This situation illustrates how animals sometimes need to quickly change their decision policy and expedite a decision as it unfolds.
However, very little is known about the computational mechanisms that allow such rapid adjustments.
Here, we address the hypothesis that human subjects can modify their SAT at specific stages of the deliberation process, by dynamically changing their accuracy criterion. As such, while the majority of former studies have conceptualized urgency as a signal that continuously and steadily grows over time within a trial, here we propose that individuals can in fact control the temporal dynamics of this growing signal and thus of the criterion for committing to a choice. We tested this idea by assessing the behavior of 15 healthy participants in a modified version of the tokens task (Cisek et al., 2009), in which penalty changes occurring halfway through the trial promoted rapid SAT shifts, either in the early or in the late stage of the decision process.

Participants
We tested 15 participants for this study (11 women; 24 ± 4.1 years old), recruited from the Research Participant Pool at the Institute of Neuroscience of UCLouvain. All subjects were right-handed according to the Edinburgh Questionnaire (Oldfield 1971) and had normal or corrected-to-normal vision. None of the participants had any neurological disorder or history of psychiatric illness or drug or alcohol abuse, or were on any drug treatments that could influence performance. Participants were financially compensated for their participation and earned additional money depending on their performance on the task (see below). The protocol was approved by the institutional review board of the Université catholique de Louvain, Brussels, Belgium, and required written informed consent.

Experimental setup
Experiments were conducted in a quiet and dimly lit room. Subjects were seated at a table in front of a 21-inch cathode ray tube computer screen. The display was gamma-corrected and its refresh rate was set at 100 Hz. The computer screen was positioned at a distance of 70 cm from the subject's eyes and was used to display stimuli during a decision-making task. Left and right forearms were placed on the surface of the table with both hands on a keyboard positioned upside-down. Left and right index fingers were located on top of the F12 and F5 keys, respectively (Figure 1.A).

Task
The task used in the current study is a variant of the "tokens task" (Cisek et al. 2009) and was implemented by means of LabView 8.2 (National Instruments, Austin, TX).
The sequence of stimuli is depicted in Figure 1.A. In between trials, subjects were always presented with a default screen consisting of three empty circles (4.5 cm diameter each), placed on a horizontal axis at a distance of 5.25 cm from each other.
The central and lateral circles were light blue and dark blue, respectively, and were displayed on a white background for 2500 ms. Each trial started with the appearance of fifteen randomly arranged tokens (0.3 cm diameter) in the central circle. After a delay of 800 ms, the tokens began to jump, one-by-one every 200 ms, from the center to one of the two lateral circles (i.e., 15 token jumps; Jump1 to Jump15). The subjects were instructed to indicate by a left or right index finger keypress which lateral circle they thought would ultimately receive the majority of the tokens (F12 or F5 key-presses for left or right circle, respectively). They could respond as soon as they felt sufficiently confident, as long as it was after Jump1 had occurred and before Jump15. Once a response was provided, the tokens kept jumping every 200 ms until the central circle was empty. At this time, the selected circle was highlighted either in green or in red depending on whether the response was correct or incorrect, respectively, providing the subjects with feedback on their performance; the feedback also included a numerical score displayed above the central circle (see below, Reward, penalty and block types section). In the absence of any response before Jump15, the central circle was highlighted in red and a "Time Out" (TO) message appeared on top of the screen, together with a "0" (score, see below) above the central circle. The feedback screen lasted for 500 ms and then disappeared at the same time as the tokens did (the circles always remained on the screen), denoting the end of the trial. Each trial lasted 6600 ms.
Although to subjects the token jumps appeared completely random, the direction of each token jump was determined a priori, producing different types of trials (e.g., "easy" or "ambiguous" trials; Cisek et al. 2009). The different trial types were randomized in the full sequence of trials. The impact of these trial types on decision behavior has been studied previously (e.g., Cisek et al., 2009;Thura et al., 2014) and this issue was not investigated here as it falls beyond the scope of the present study.
One key feature of the tokens task is that it allows one to compute, in each trial, the success probability p_i(t) associated with choosing each target i at any time t. For instance, for a total of 15 tokens, if at a particular moment in time the right target contains N_R tokens, the left contains N_L tokens, and N_C tokens remain in the center, then the probability that the target on the right will ultimately be the correct one (i.e., the success probability of guessing right) is as follows:

(1)   \( p(R \mid N_R, N_L, N_C) = \dfrac{N_C!}{2^{N_C}} \sum_{k=0}^{\min(N_C,\, 7 - N_L)} \dfrac{1}{k!\,(N_C - k)!} \)

Computation of the subject's accuracy criterion relies on the amount of sensory evidence that was available when the subject committed to her/his choice (i.e., at decision time [DT]). Because it is very unlikely that subjects can calculate the real success probability function (eq. 1), we computed a first-order estimation as the sum of log-likelihood ratios (SumLogLR) of individual token movements (Cisek et al., 2009).
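The success probability in eq. 1 can be computed with a short Python sketch (our illustration, not the original LabView implementation; the function name and defaults are ours):

```python
from math import comb

def p_right(n_right: int, n_left: int, n_center: int, total: int = 15) -> float:
    """Probability that the right circle ultimately receives the majority of the
    tokens, given n_right / n_left tokens already distributed and n_center
    tokens remaining in the central circle (eq. 1)."""
    majority = total // 2 + 1                      # 8 of 15 tokens wins
    max_to_left = min(n_center, majority - 1 - n_left)
    if max_to_left < 0:                            # the left circle has already won
        return 0.0
    # Each remaining token is equally likely to jump left or right from the
    # observer's point of view, hence the binomial sum divided by 2**n_center.
    return sum(comb(n_center, k) for k in range(max_to_left + 1)) / 2 ** n_center

# Example: 3 tokens already on the right, 1 on the left, 11 still in the center
print(round(p_right(3, 1, 11), 3))                 # ~0.726
```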
Based on the temporal profile of the accuracy criterion (i.e., of the SumLogLR at DT), it is then possible to extract an urgency function, characterized by an initial level and a changing rate (i.e., the intercept and the coefficients of the function, respectively). Hence, the tokens task provides us with the possibility to estimate how the accuracy criterion, as well as the initial level and the changing rate of urgency, vary from one experimental condition to another. Further details regarding the computation of the accuracy criterion and of the urgency function are provided later, in the Data analyses section.

Reward, penalty and block types
As mentioned above, subjects received a feedback score at the end of each trial, which depended on whether they had selected the correct or the incorrect response.
Correct responses led to positive scores (i.e., a reward) while incorrect responses led to negative scores (i.e., a penalty). Subjects knew that the sum of these scores would turn into a monetary reward at the end of the experiment.
In correct trials, the reward was equal to the number of tokens remaining in the central circle at the time of the response (in € cents). Hence, the potential reward for a correct response gradually decreased over time (Figure 1.B). For instance, a correct response provided between Jump5 and Jump6 led to a gain of 10 cents (10 tokens remaining in the central circle). However, it only led to a gain of 5 cents when the response was provided between Jump10 and Jump11 (5 tokens remaining in the central circle). The fact that the reward dropped over time produced an increasing urge to respond over the course of a trial, as evidenced by the urgency functions obtained in such a task.
Incorrect responses led to a negative score but here, the size of this penalty was not linearly proportional to the RT. Importantly, it differed across three block types (see Figure 1.B). The penalty for an incorrect response always equaled 7 cents in the first half of the trial (i.e., up to Jump8), regardless of the block type. However, in the second half of the trial (i.e., after Jump8), it could either increase to 13 cents (PenaltyIncrease blocks), remain constant at 7 cents (PenaltyConstant blocks) or decrease to 1 cent (PenaltyDecrease blocks). The passage from the first half of the trial (called early-stage) to the second half (late-stage) was indicated to the subjects by a change in the color of the central circle, which always turned black at Jump8. Each block type was performed in a separate experimental session (see Experimental procedure section below), and subjects were informed at the beginning of the session of the block type to be performed.
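This payoff scheme can be summarized in a short sketch (function and variable names are ours; the early/late boundary at Jump8 follows the description above):

```python
def trial_score(correct: bool, jump_index: int, block: str, total_tokens: int = 15) -> int:
    """Score in euro cents for a response given after the jump_index-th token jump."""
    if correct:
        return total_tokens - jump_index          # tokens still in the central circle
    if jump_index <= 8:                           # early stage: same penalty in every block
        return -7
    return {"PenaltyIncrease": -13,               # late stage: penalty depends on the block
            "PenaltyConstant": -7,
            "PenaltyDecrease": -1}[block]

# Examples: a correct response between Jump5 and Jump6 earns 10 cents; an error
# after Jump8 costs 13, 7 or 1 cent depending on the block type.
print(trial_score(True, 5, "PenaltyConstant"))     # 10
print(trial_score(False, 10, "PenaltyIncrease"))   # -13
```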
We expected that the penalty shift would induce stage-specific adjustments of the SAT in the PenaltyIncrease and PenaltyDecrease blocks, compared to the PenaltyConstant condition. Particularly, in the PenaltyIncrease blocks, we expected that the prospect of a higher penalty at a late stage would promote faster decisions at the early stage, at the cost of accuracy. Hence, we expected subjects to trade accuracy for speed specifically when making early-stage decisions in the PenaltyIncrease blocks. Inversely, in the PenaltyDecrease blocks, we predicted a tendency to make fast but less accurate decisions at a late-stage of the trial, after the drop in penalty.

Experimental procedure
Subjects performed three experimental sessions (one for each block type) conducted on separate days at a 24-h interval. Testing always occurred at the same time of the day for a given subject, to avoid variations that could be due to changes in chronobiological states (Schmidt et al. 2006;Derosière et al. 2015). The order of the sessions was counterbalanced across participants.
The three sessions always started with two short blocks of a simple RT task. In this task, subjects were presented with the same display as in the tokens task described above. However here, instead of jumping one by one, the 15 tokens jumped simultaneously into one of the two lateral circles (always the same one in a given block) and subjects were instructed to respond as fast as possible by pressing the appropriate key (i.e., F12 and F5 for left and right circles, respectively). Because the target circle was known in advance of the block, this task did not require any choice to be made and was exploited to determine the subject's mean "simple reaction time" (SRT) for left and right index finger responses. We obtained this SRT by computing the difference between the key-press and the time at which the 15 tokens left the central circle (Cisek et al., 2009).
Next, subjects performed training blocks to become acquainted with the tokens task. In the first one (20 trials, only run on the first session), we ran a version of the tokens task in which the feedback was simplified; the lateral circle turned either green or red, depending on whether subjects had provided a correct or incorrect response; no reward or penalty was provided here. Then, we ran two training blocks (20 trials each) in the condition subjects would be performing next during the whole session (PenaltyIncrease, PenaltyConstant or PenaltyDecrease).
The actual experiment involved 8 blocks of 80 trials (640 trials per session; 1920 trials per subject). Each block lasted about 8.5 minutes and a break of 5 minutes was provided between each of them. Each session lasted approximately 120 minutes.

Figure 1. A. In each trial, 15 tokens jumped one-by-one every 200 ms from the central circle to one of the lateral circles. The subjects had to indicate by a left or right index finger keypress (i.e., F12 and F5 keys, respectively) which lateral circle they thought would receive more tokens at the end of the trial. For a correct response, the subjects won, in € cents, the number of tokens remaining in the central circle at the time of the response. Hence, the reward earned for a correct response decreased over time, as depicted in B. The example presented in the upper inset to the right of panel A represents a correct response provided between Jump5 and Jump6, i.e., the score indicates that 7 tokens remained in the central circle at the moment the right circle was chosen. In contrast, as illustrated in the middle inset of A, subjects lost money if they chose the incorrect lateral circle: they received a negative score that depended on the block type, as indicated in B. In the absence of any response ("Time Out" trial, bottom inset), subjects were neither rewarded nor penalized (score = 0). For representative purposes, the "Time Out" message is depicted below the circles in this example, while it was presented on top of the screen in the actual experiment. B. Block types. Incorrect responses led to a negative score, which differed across three block types. The penalty for an incorrect response always equaled 7 cents in the first half of the trial (i.e., up to Jump8), regardless of the block type. However, in the second half of the trial (i.e., after Jump8), it could either increase to 13 cents (PenaltyIncrease blocks; magenta, left), remain constant at 7 cents (PenaltyConstant blocks; yellow, center) or decrease to 1 cent (PenaltyDecrease blocks; blue, right). The passage from the first half of the trial (called early-stage) to the second half (late-stage) was indicated to the subjects by a change in the color of the central circle, which always turned black at Jump8 (see A).

Data analyses
Data were collected by means of LabView 8.2 (National Instruments, Austin, TX). Decision times (DTs) were estimated by subtracting each subject's mean SRT, obtained in the simple RT task, from the reaction time measured in the tokens task, so as to remove the sensorimotor (non-decision) delay from the deliberation time. Although the non-decision delay might differ in simple versus choice tasks (e.g., due to different encoding demands), we always compare DTs, so these comparisons are not affected by any inaccuracy in our within-subject SRT estimate.
A potential inaccuracy could occur in the calculation of the SumLogLR at DT in some trials (see below), in which we may mistakenly calculate it from the wrong token state.
However, this should be quite rare given that the tokens jump only once every 200 ms, giving us a large time window.
For the analysis of DT and performance accuracy data, the dataset was split into two subsets according to whether decisions were made during the early stage (between Jump1 and Jump8; DTs ranging from 0 to 1400 ms) or during the late stage of the trial (between Jump8 and Jump15; DTs ranging from 1400 to 2800 ms). This allowed us to test for the effect of the block on the subjects' decision speed and accuracy, separately for responses provided during the early- or the late-stage of the trial. We predicted that, compared to the PenaltyConstant condition, subjects' performance accuracy would be particularly low for responses provided during the early-stage in PenaltyIncrease blocks and during the late-stage in PenaltyDecrease blocks, reflecting a propensity to trade decision accuracy for speed when the penalty is the lowest within a trial.
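As an illustration, a minimal sketch of this stage split (assumed column names and toy values, not the original analysis pipeline):

```python
import pandas as pd

# Toy trials; dt_ms is the decision time and correct the response outcome.
trials = pd.DataFrame({
    "dt_ms":   [820, 1250, 1900, 2300],
    "correct": [True, False, True, True],
})
# Assign each decision to the early (0-1400 ms) or late (1400-2800 ms) stage.
trials["stage"] = pd.cut(trials["dt_ms"], bins=[0, 1400, 2800],
                         labels=["early", "late"])
summary = trials.groupby("stage", observed=True).agg(
    mean_dt=("dt_ms", "mean"), accuracy=("correct", "mean"))
print(summary)
```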

Accuracy criterion
As mentioned above, the tokens task allows us to estimate the subject's accuracy criterion, based on the amount of evidence that was available for the chosen circle in each trial at DT (i.e., the SumLogLR at DT). As such, high (low) accuracy criteria imply the necessity to accumulate a large (small) amount of evidence before committing to a choice, and thus, high (low) SumLogLR at DT values. The SumLogLR of individual token movements (e_k) was calculated as follows:

(2)   \( \mathrm{SumLogLR} = \sum_{k=1}^{n} \log \dfrac{p(e_k \mid S)}{p(e_k \mid NS)} \)

where p(e_k|S) is the likelihood of a token event e_k (a token jumping into either the selected or non-selected lateral circle) during trials in which the selected target S is correct, and p(e_k|NS) is its likelihood during trials in which the non-selected circle NS is correct. Hence, the SumLogLR at DT is proportional to the difference in the number of tokens that have moved in each direction at the time of commitment. To characterize the changes in accuracy criterion from one block condition to another during the early- and the late-stage of the trial, we split the SumLogLR at DT dataset into two subsets according to whether decisions were made in the former or in the latter stage. In accordance with previous studies (e.g., Cisek et al., 2009; Gluth et al., 2012; Murphy et al., 2016), we expected that the accuracy criterion would drop as the deadline to respond approached, thus leading to globally lower values in the late- relative to the early-stage of the trial. However, we predicted that, compared to the PenaltyConstant condition, the criterion would be particularly low during the early-stage in PenaltyIncrease blocks and during the late-stage in PenaltyDecrease blocks, reflecting the subjects' ability to adjust their criterion to a desired level at specific stages of the decision process. A direct implication of such urgency-based models is that decisions made with low levels of sensory evidence (i.e., involving low accuracy criteria) should be associated with high levels of urgency and vice versa. That is, one core assumption is that a high urgency should push one to commit to a choice even if evidence for that choice is weak, effectively implementing a low accuracy criterion. Hence, the accuracy criterion values (SumLogLR at DT) can be exploited to estimate the level of urgency at DT (e.g., Thura et al., 2014; Thura, 2020).
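A minimal Python sketch of this computation (illustrative only; the per-token likelihoods, and hence the value of q below, are assumptions rather than the values estimated in the original study):

```python
import math

def sum_log_lr(n_toward_selected: int, n_toward_nonselected: int, q: float = 0.6) -> float:
    """Sum of per-token log-likelihood ratios in favour of the selected circle (eq. 2).
    q is the assumed probability that a single token jumps toward the circle that
    will ultimately be correct; with symmetric likelihoods the sum reduces to the
    token difference scaled by a constant, as noted above."""
    log_lr_per_token = math.log(q / (1.0 - q))
    return (n_toward_selected - n_toward_nonselected) * log_lr_per_token

# Example: at DT, 5 tokens have jumped toward the chosen circle and 2 toward the other
print(round(sum_log_lr(5, 2), 2))    # ~1.22
```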
Here, we estimated the level of urgency based on the accuracy criterion values obtained for different DTs. We first grouped the trials in bins as a function of the DT and calculated the average accuracy criterion for each bin. Nine bins were defined, with the first bin including decisions made between 600 and 800 ms, the second bin including decisions made between 800 and 1000 ms, and so on, until the last bin covering the period between 2200 and 2400 ms. The accuracy criterion values preceding 600 ms or following 2400 ms were not considered for this analysis because many subjects did not respond at these times (59.5 ± 0.04 % of the bins were missing values for these very early and very late times). Considering a model in which evidence is multiplied by an urgency signal, we estimated urgency values based on the accuracy criterion obtained at each bin, in each subject and each block condition, as follows:

\( U_{t,p,s} = \dfrac{T}{AC_{t,p,s}} \)

Above, t is the DT bin, p is the penalty condition, s is the subject number, AC is the accuracy criterion value (i.e., SumLogLR at DT), T is a constant representing a fixed neural threshold (which we fixed to 1), and U is the estimated urgency value. We then fitted regression models over the obtained urgency values. A linear and a second-order polynomial model were fitted and the Akaike Information Criterion (AIC) was obtained for each subject and each block condition, allowing us to compare the two models to each other. We predicted that, compared to the PenaltyConstant condition, urgency would be particularly high during the early-stage in PenaltyIncrease blocks and during the late-stage in PenaltyDecrease blocks, and that the polynomial model would thus better capture the dynamic changes in urgency in these two block conditions compared to the linear one (i.e., lower AIC values for polynomial fits).
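A minimal sketch of this estimation and model comparison (illustrative bin values, a standard Gaussian-likelihood form of the AIC, and variable names of our own):

```python
import numpy as np

def aic(y, y_hat, n_params):
    """AIC under Gaussian residuals, up to an additive constant."""
    resid = np.asarray(y) - np.asarray(y_hat)
    n = len(resid)
    return n * np.log(np.sum(resid ** 2) / n) + 2 * n_params

bin_centers = np.arange(700, 2400, 200)        # 600-800 ms bin, ..., 2200-2400 ms bin
ac = np.linspace(2.0, 0.6, bin_centers.size)   # illustrative mean SumLogLR at DT per bin

T = 1.0                                        # fixed neural threshold
urgency = T / ac                               # U = T / AC for each bin

fits = {}
for degree, label in [(1, "linear"), (2, "polynomial")]:
    coeffs = np.polyfit(bin_centers, urgency, degree)
    fits[label] = aic(urgency, np.polyval(coeffs, bin_centers), n_params=degree + 1)

print(fits)                                    # lower AIC indicates the better model
```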

Statistical analyses
Statistica software was used for all analyses (version 7.0, StatSoft, Oklahoma, United States). The DT, performance accuracy and SumLogLR at DT data were analyzed using two-way repeated-measures ANOVAs (ANOVARM) with BLOCK (PenaltyIncrease, PenaltyConstant, PenaltyDecrease) and STAGE (early-stage, late-stage) as within-subject factors. The %TO data were analyzed using a one-way ANOVARM with BLOCK (PenaltyIncrease, PenaltyConstant, PenaltyDecrease) as a within-subject factor.
Finally, the AIC values obtained from the urgency fits were analyzed using a two-way ANOVARM with BLOCK (PenaltyIncrease, PenaltyConstant, PenaltyDecrease) and MODEL (linear, polynomial) as within-subject factors. When appropriate, LSD post-hoc tests were used to detect paired differences. Results are presented as mean ± SE.
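As an illustration of this design, an equivalent two-way ANOVARM can be specified in Python with statsmodels (made-up DT values; the reported analyses were run in Statistica):

```python
from itertools import product

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
blocks = ["PenaltyIncrease", "PenaltyConstant", "PenaltyDecrease"]
stages = ["early", "late"]

# One cell mean per subject x BLOCK x STAGE combination (balanced design).
rows = [{"subject": s, "BLOCK": b, "STAGE": st, "DT": rng.normal(1400, 100)}
        for s, b, st in product(range(1, 16), blocks, stages)]
long_df = pd.DataFrame(rows)

res = AnovaRM(long_df, depvar="DT", subject="subject",
              within=["BLOCK", "STAGE"]).fit()
print(res.anova_table)
```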

Decision time, performance accuracy and %TO
The average DT was not significantly different in the PenaltyIncrease, the PenaltyConstant and the PenaltyDecrease blocks.

Finally, the ANOVARM revealed a significant effect of BLOCK on the timeout (%TO) data (F2,28 = 15.95, p < .0001; see Figure 2.H). The %TO was indeed higher in the PenaltyIncrease (7.2 ± 1.3 %) than in both the PenaltyConstant (4.1 ± 0.8 %; p = .004) and the PenaltyDecrease blocks (1.5 ± 0.2 %; p < .0001). In addition, it was lower in the PenaltyDecrease than in the PenaltyConstant blocks (p = .016). Hence, the lower the penalty was during the late-stage of the trial, the less the subjects were inclined to be cautious and to avoid responding, consistent with the reduced decision accuracy observed in the late-stage of PenaltyDecrease blocks.

That is, the subjects who decreased their criterion the most in one block type with respect to the PenaltyConstant condition were those who presented the highest gains in decision speed, but also the highest costs in terms of accuracy.

Urgency functions
A direct implication of urgency-based models is that decisions made with a low accuracy criterion are associated with a high level of urgency and vice versa. Hence, we used the temporal profile of the accuracy criterion, obtained for decisions made between 600 and 2400 ms (presented in Figure 3.D), to estimate urgency functions.

Figure 3 (partial caption). Correlation between the block-related shift in accuracy criterion (computed as [100 - (CriterionPenaltyIncrease / CriterionPenaltyConstant * 100)] for early-stage decisions and [100 - (CriterionPenaltyDecrease / CriterionPenaltyConstant * 100)] for late-stage decisions) and the block-related shift in DT and accuracy. Both early-stage and late-stage data are shown, leading to an n of 30 points. D. Temporal evolution of the accuracy criterion as a function of DT. E and F. Urgency functions, obtained using a linear and a polynomial model, respectively. G. Comparison of the Akaike Information Criterion (AIC) for linear and polynomial models. The graphs show that the AIC value was significantly lower for polynomial fits when applied to the PenaltyIncrease data (magenta bars), but not for PenaltyConstant and PenaltyDecrease data (yellow and blue, respectively). The cumulative distribution of subjects obtained for the ΔAIC (i.e., ΔAIC = AICPolynomial - AICLinear) is presented on the right, highlighting that the polynomial model outperformed the linear one for most of the single-subject data (magenta trace). H. Cumulative distribution of subjects and mean urgency estimated in the early stage (i.e., at 600 ms) using the polynomial fit. I. Same as H for the late stage (i.e., estimation made at 2400 ms). * = Between-block significant difference at p < .05. Error bars represent SE. J. Example of individual data. Accuracy criterion values are represented for the three block conditions and each stage.

DISCUSSION
In dynamic environments, humans and other animals often need to change their choice SAT while a decision is ongoing. However, very little is known about the computational mechanisms that allow these rapid changes of decision policy. In the present study, we addressed the hypothesis that human subjects can shift their SAT at specific stages of the deliberation process, by dynamically adjusting their accuracy criterion. Participants performed a modified version of the tokens task (Cisek et al., 2009), where an increase or a decrease in penalty occurring halfway through the trial promoted rapid SAT shifts, either in the early or in the late decision stage. Our results reveal that subjects traded accuracy for speed specifically at times when the penalty was the lowest within a trial. Interestingly, these changes were accompanied by stage-specific adjustments in the accuracy criterion; in fact, those who decreased their criterion the most presented the highest gains in decision speed, but also the highest costs in terms of accuracy.
Several studies have now revealed the flexibility with which humans can adapt their choice SAT at different timescales, including from one context to another (e.g., Palmer et al. 2005; Forstmann et al. 2008; Ratcliff and McKoon 2008; Herz et al. 2016, 2017) and from one decision to another (e.g., Purcell and Kiani 2016; Fischer et al. 2018; Desender et al. 2019). The current findings offer a unique extension of this work, by showing that the SAT can be modulated on an even shorter timescale, i.e., over the course of a single decision. In PenaltyIncrease blocks, decisions were faster but less accurate in the first half of the trial (i.e., compared to the PenaltyConstant condition), while in PenaltyDecrease blocks, such SAT shifts occurred in the second half of the trial. The occurrence of a shift in the first half of the trial in PenaltyIncrease blocks indicates the operation of a proactive, anticipatory process, through which the prospect of a future rise in penalty determined the decision policy to adopt for early-stage decisions. As such, in the current task, subjects likely chose a policy for modifying their SAT before the trial had even started (or before the block of trials). Given that each block (and even each session) always involved the same type of penalty change, subjects could determine what decision policy they should adopt in this specific setting and apply it during deliberation. Whether rapid shifts in SAT can occur reactively (e.g., following online, unpredictable cues) remains an open question, worthy of future investigation.
Moreover, the temporal dynamics of this drop in the accuracy criterion depended on whether the penalty increased or decreased halfway through the decision process. In the PenaltyIncrease blocks, subjects lowered their accuracy criterion specifically in the early decision stage (i.e., relative to PenaltyConstant blocks), while in PenaltyDecrease blocks, they did so in the late decision stage. In fact, the adjustment of the accuracy criterion was more pronounced in the early decision stage (i.e., in the PenaltyIncrease relative to the PenaltyConstant blocks) than in the late one (i.e., in the PenaltyDecrease relative to the PenaltyConstant blocks, where differences were marginally significant). This idea is substantiated by the finding that a polynomial model captured more variance of the changes in urgency than a linear model when considering the PenaltyIncrease data, but not when considering the PenaltyDecrease data. Hence, participants seemed more effective at adjusting their level of urgency (and, relatedly, their accuracy criterion) for early- compared to late-stage decisions. One possible explanation for this is that urgency was inherently lower for early decisions than for late ones, leaving more room for volitional regulation. Alternatively, it may be the case that the incentive to adjust urgency was stronger in the early stage of PenaltyIncrease blocks than in the late stage of PenaltyDecrease ones. As such, because of the natural aversion of humans to risk (Weber et al. 2004; Zhang et al. 2014), the prospect of a future rise in penalty might have been more salient than the sudden drop in penalty, thus leading to stronger changes in urgency in the former block condition. Importantly, the shape of urgency signals is likely dependent on multiple contextual factors (such as risk here), on individual traits (Carland et al., 2019) and on the type of task and behavior at play. While our findings suggest that urgency signals can take the form of non-linear ramps in decision-making tasks involving dynamic SAT adjustments, such signals might take even more intricate shapes, such as pulses or pauses, to construct complex motor behaviors in other task settings, such as when moving to a beat.
Overall, the present study builds on former work on the computational mechanisms underlying the SAT policy. Consistent with past research, we show that the accuracy criterion progressively drops over time during the decision process, in line with an increased urge to commit as the time left to respond diminishes. Most importantly, we provide evidence that rapid shifts in SAT can occur over the course of an ongoing decision and that these changes could be related to dynamic adjustments of the accuracy criterion and, relatedly, of urgency (Shinn et al., 2020). Future work is needed to extend the current observations to situations involving reactive SAT shifts, which may emerge in response to online sensory cues.