## Abstract

How choices are made within noisy environments is a central question in the neuroscience of decision making. Previous work has characterized temporal accumulation of evidence for decision-making in static environments. However, real-world decision-making involves environments with statistics that change over time. This requires discounting old evidence that may no longer inform the current state of the world. Here we designed a rat behavioral task with a dynamic environment, to probe whether rodents can optimally discount evidence by adapting the timescale over which they accumulate it. Extending existing results about optimal inference in a dynamic environment, we show that the optimal timescale for evidence discounting depends on both the stimulus statistics and noise in sensory processing. We found that when both of these components were taken into account, rats accumulated and temporally discounted evidence almost optimally. Furthermore, we found that by changing the dynamics of the environment, experimenters could control the rats’ accumulation timescale, switching them from accumulating over short timescales to accumulating over long timescales and back. The theoretical framework also makes quantitative predictions regarding the timing of changes of mind in the dynamic environment. This study establishes a quantitative behavioral framework to control and investigate neural mechanisms underlying the adaptive nature of evidence accumulation timescales and changes of mind.

## Introduction

Decision making refers to the cognitive and neural mechanisms underlying processes that generate choices. In our daily life, the processes of decision making are ubiquitous. Decision making has been a major focus in the neuroscience community because it bridges sensory, motor, and executive functions. A well characterized decision making paradigm is that of “evidence accumulation” or “evidence integration” referring to the process by which the subject gradually processes evidence for or against different choices until making a well defined choice. Evidence accumulation is thought to underlie many different types of decisions from perceptual decisions (Brunton et al., 2013), to social decisions (Krajbich et al., 2015), and to value based decisions (Basten et al., 2010).

Most behavioral studies to date have focused on evidence accumulation in stationary environments. In the case of stationary environments, the normative behavioral strategy used is perfect integration (Bogacz et al., 2006), which refers to equal weighting of all incoming evidence across time. However, real world environments are complex and change over time. In this case, a strategy based on perfect integration will be suboptimal due to the changing statistics of the environment. Crucially, in a dynamic environment older observations may no longer reflect the current state of the world, and an observer needs to modify their inference processes to discount older evidence. Previous studies have demonstrated that humans can modify the timescales of evidence integration, adopting “leaky” integration when beneficial (Ossmy et al., 2013; Glaze et al., 2015). This observation opens many questions related to why and how subjects might alter their integration timescales. To answer “why” or normative questions, one would ideally like to develop a model that can be directly compared to the standard evidence accumulation models used in the decision making literature. Two recent studies have developed this connection to drift-diffusion models, and examined evidence accumulation in dynamic environments either in humans (Glaze et al., 2015; Gold and Stocker, 2017) or in ideal observer models (Veliz-Cuba et al., 2016). Animal models of behavior facilitate investigation of “how” or mechanistic questions, by allowing measurement and perturbation of neural circuits. Here, we demonstrate that rats are capable of adopting the optimal integration timescale predicted by the recently developed modeling framework (Veliz-Cuba et al., 2016), and we furthermore show that they can dynamically modulate their integration timescale according to changing environmental statistics.

In the present study, we extend a previously published pulse-based accumulation of evidence task, the “Poisson clicks task” (Brunton et al., 2013; Erlich et al., 2015; Hanks et al., 2015), to a dynamic environment. We refer to our task as the “Dynamic clicks task”. We extend results from the literature (Veliz-Cuba et al., 2016) to develop the optimal inference process for our task. The ideal observer is closely related to the “drift-diffusion model” used widely in the decision making literature (Bogacz et al., 2006; Ratcliff and McKoon, 2008). The primary difference is that, in addition to integrating sensory evidence, the ideal observer discounts accumulated evidence at a rate that depends on the volatility of the environment and the reliability of each evidence pulse. The reliability of each pulse is determined by the stimulus statistics (e.g., the pulse rates), as well as noise in the subject’s sensory transduction process. While the exact origin of sensory noise is unclear, quantitative modeling can separate sensory noise from other types of noise (Brunton et al., 2013). Here, we use sensory noise to refer to noise that scales with the amount of evidence. The role of sensory noise in decision making processes is a relatively unexplored area. Studies in the literature are beginning to document under what circumstances subjects modify their behavior based on noise in the sensory evidence (Gureckis and Love, 2009; Zylberberg et al., 2016).

Using high-throughput behavioral training, we trained rats to perform this task. With a combination of quantitative methods, we find that rats’ adaptation to the dynamic environment is such that they adopt the optimal timescale for evidence accumulation. Our findings establish rats as an adequate animal model for evidence accumulation in a dynamic environment. Training rodents on state of the art cognitive tasks opens up the opportunity to understand the neuronal mechanisms underlying complex behavior. Rodents can be trained in a high throughput manner, are amenable to genetic manipulation, are accessible to electrophysiological and optogenetic manipulations, and a large number of experimental subjects can be used. Finally, the dynamic clicks task opens up the opportunity to study the neural underpinnings of evidence integration in a dynamic environment as this task gives the experimentalist a unique quantitative handle over the integration timescale of the animals.

## Results

### A dynamic decision making task

We developed a decision making task that requires accumulating noisy evidence in order to infer a state that is hidden and dynamic. Rats were trained to infer, at any moment during the course of a trial, which of two states the environment was in at that moment. These could be either a state in which randomly-timed auditory clicks were played from a left speaker at a high rate and right speaker clicks were played at a low rate, or its inverse (low rate on the left, high rate on the right). In more detail, in each trial of our task, we first illuminate a center light inside an automated operant chamber, to indicate that the rat may start the trial by nose-poking into the center port. Once the rat enters the center port, auditory clicks play from speakers positioned on the left and right sides of the rat. The auditory clicks are generated from independent Poisson processes. Importantly, the left and right side Poisson rate parameters are dependent on a hidden state that changes dynamically during the course of each trial. This is in sharp contrast to previous studies where the Poisson click rates are constant for the duration of each trial (Brunton et al., 2013; Erlich et al., 2015; Hanks et al., 2015). Within each trial, the dynamic environment is in one of two hidden states *S*^{1} and *S*^{2}, each of which has an associated left and right click generation rate (*r*_{L}^{1} and *r*_{R}^{1}, respectively, for *S*^{1}; *r*_{L}^{2} and *r*_{R}^{2} for *S*^{2}). In this study *S*^{1} and *S*^{2} were symmetric (*r*_{L}^{1} = *r*_{R}^{2} and *r*_{R}^{1} = *r*_{L}^{2}). Each trial starts with equal probability in one of the two states, and switches stochastically between them at a fixed “hazard rate” *h*. On each time step, the switch probability is given by *h*Δ*t* (with Δ*t* kept small enough that *h*Δ*t* ≪ 1). At the end of the stimulus period, the auditory clicks end, and the center light turns off, indicating the rat must make a left or right choice by entering one of the side reward ports.
The rat is rewarded with a water drop for correctly inferring the hidden state at the end of the stimulus period (if *S*^{1}, go right; if *S*^{2}, go left). The stimulus period duration is variable on each trial (0.5 – 2 seconds), so the rat must be prepared to infer the current hidden state at all times. Figure 1 shows a schematic of task events, as well as an example trial. Rats trained every day, performing 150-1000 self-paced trials per day.
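The generative process described above can be sketched in a few lines of code. This is a minimal illustration, not the software used to run the experiments: the function name `generate_trial`, the state labels `+1`/`-1`, and the example rates (30 Hz and 10 Hz) are assumptions chosen for the sketch.

```python
import random


def generate_trial(duration, hazard_rate, rate_high, rate_low, dt=1e-4, rng=None):
    """Simulate one trial of a two-state environment with Poisson clicks.

    State +1: right speaker clicks at rate_high, left speaker at rate_low.
    State -1: the mirror image.
    Returns (left_click_times, right_click_times, switch_times, final_state).
    """
    rng = rng or random.Random()
    state = rng.choice([1, -1])          # each state is equally likely at trial start
    left, right, switches = [], [], []
    for step in range(int(round(duration / dt))):
        t = step * dt
        # The hidden state switches with probability h * dt on each time step.
        if rng.random() < hazard_rate * dt:
            state = -state
            switches.append(t)
        rate_right = rate_high if state == 1 else rate_low
        rate_left = rate_low if state == 1 else rate_high
        # Independent Poisson processes: probability of a click is rate * dt.
        if rng.random() < rate_right * dt:
            right.append(t)
        if rng.random() < rate_left * dt:
            left.append(t)
    return left, right, switches, state


if __name__ == "__main__":
    left, right, switches, state = generate_trial(1.5, 1.0, 30.0, 10.0,
                                                  rng=random.Random(0))
    print(len(left), len(right), len(switches), state)
```

The time step `dt` must be small enough that both `hazard_rate * dt` and `rate_high * dt` are much smaller than 1, matching the condition stated in the text.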

### Optimal inference in a dynamic environment

Here we derive the optimal procedure for inferring the hidden state. Optimality, in this setting, refers to reward maximizing. Given that each trial’s duration is imposed by the experimenter and thus fixed to the rat, maximizing reward is equivalent to maximizing accuracy (Bogacz et al., 2006). We build on results from Veliz-Cuba et al. 2016, but a basic outline is repeated here for continuity. Mathematical details can be found in the supplementary materials.

Before diving into the derivation, it is worth building some intuition. Because the hidden state is dynamic, auditory clicks heard at the start of the trial are unlikely to be informative of the current state. However, because state transitions are hidden, an observer doesn’t know how far back in time observations are still informative of the current state. The derivation below yields the optimal weighting of older evidence. We first consider observations in discrete timesteps of short duration Δ*t*. Within each timestep, a momentary evidence sample *∊* is generated. This sample is either a click on the left, a click on the right, no clicks, or a click on both sides (we will consider Δ*t* small enough that *r*_{1}Δ*t* ≪ 1 and *r*_{2}Δ*t* ≪ 1, so that multiple clicks from the same speaker are not generated within one timestep).

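Following Veliz-Cuba et al. 2016, the probability of being in State 1 at time *t*, given all observed samples up to time *t*, is:

$$P(S^1 \mid \epsilon_{1 \ldots t}) \propto P(\epsilon_t \mid S^1)\left[(1 - h\Delta t)\,P(S^1 \mid \epsilon_{1 \ldots t-1}) + h\Delta t\,P(S^2 \mid \epsilon_{1 \ldots t-1})\right] \tag{1}$$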

We can interpret this equation as follows: the probability of being in State 1 given all observed evidence up to time *t* (*P*(*S*^{1}|*∊*_{1…t})) is proportional to the probability of observing the evidence sample at time *t* given State 1 (*P*(*∊*_{t}|*S*^{1})) times the independent probability that we were in State 1 given evidence from timesteps 1…*t* − 1. This second term is decomposed into two terms, which depend on the probability of remaining in the same state from the last time step ((1 − *h*Δ*t*) *P*(*S*^{1}|*∊*_{1…t−1})) and the probability of changing states after the last time step (*h*Δ*t* *P*(*S*^{2}|*∊*_{1…t−1})).

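Combining the probability of each state into a ratio, we can write the posterior probability ratio (*R*_{t}) of the current state given all previous evidence samples *∊*_{1…t}:

$$R_t = \frac{P(S^1 \mid \epsilon_{1 \ldots t})}{P(S^2 \mid \epsilon_{1 \ldots t})} = \frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} \cdot \frac{(1 - h\Delta t)\,R_{t-1} + h\Delta t}{h\Delta t\,R_{t-1} + (1 - h\Delta t)} \tag{2}$$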

Observe that in a static environment (*h* = 0), the term on the far right simplifies to *R*_{t−1} and (2) becomes the statistical test known as the Sequential Probability Ratio Test (SPRT) (Wald, 1945; Barnard, 1946; Bogacz et al., 2006). A recent study demonstrated that monkeys could accurately perform a literal instantiation of the SPRT (Kira et al., 2015). When *h* ≠ 0, the more complicated expression reflects the fact that previous evidence samples might no longer be informative of the current state, in a manner proportional to the environmental volatility *h*.
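As a concrete illustration of the recursion (2), the following sketch performs one update of the posterior ratio. The function name and argument names are hypothetical; the formula in the comment is the ratio form of the update described in the text.

```python
def update_posterior_ratio(r_prev, likelihood_ratio, hazard_rate, dt):
    """One step of the posterior-ratio recursion for a two-state environment.

    R_t = LR_t * [(1 - h*dt) * R_prev + h*dt] / [h*dt * R_prev + (1 - h*dt)]

    r_prev:           posterior ratio P(S1 | past) / P(S2 | past)
    likelihood_ratio: P(sample | S1) / P(sample | S2) for the current sample
    """
    p_switch = hazard_rate * dt
    # Mix the previous belief with the possibility that the state just switched.
    prior_ratio = ((1.0 - p_switch) * r_prev + p_switch) / (
        p_switch * r_prev + (1.0 - p_switch)
    )
    return likelihood_ratio * prior_ratio
```

With `hazard_rate = 0` the prior ratio equals `r_prev`, so the update reduces to multiplying successive likelihood ratios, as in the SPRT. With `hazard_rate > 0` and an uninformative sample (`likelihood_ratio = 1`), the ratio is pulled toward 1, which is the discounting of old evidence.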

In order to compare (2) to standard decision making models like the drift-diffusion model (DDM) we will transform the expression into a differential equation. We can accomplish this by taking the logarithm of (2), then substituting *a* = log(*R*), and finally taking the limit as Δ*t* goes to 0 (see Veliz-Cuba et al. 2016 and supplementary materials for details):

$$\frac{da}{dt} = \log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} - 2h\sinh(a) \tag{3}$$

This differential equation describes the evolution of the log-probability ratio of being in each of the two hidden states. *a* > 0 indicates more evidence for *S*^{1}, while *a* < 0 indicates more evidence for *S*^{2}. Momentary evidence samples *∊*_{t} are incorporated into the log-probability ratio through the evidence term log(*P*(*∊*_{t}|*S*^{1})/*P*(*∊*_{t}|*S*^{2})). The previously accumulated evidence is forgotten by a nonlinear discounting term (−2*h* sinh(*a*)) (see Fig 2C). The evidence discounting reduces the effect of older evidence, weighting recent evidence more. This discounting reflects the fact that older evidence may no longer be informative of the current state of the environment. In a static environment (*h* = 0), the discounting term is eliminated, and the ideal observer perfectly integrates the momentary evidence samples. In analysis of the static decision making models, the evidence term is commonly approximated by its expectation (drift) and variance (diffusion), transforming (3) into the Drift-Diffusion Model (DDM) for decision making (Bogacz et al., 2006).
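For readers who want the intermediate steps, the passage from the discrete recursion to the differential equation can be sketched as follows. This is only an outline, assuming *a*_{t} = log *R*_{t} and keeping terms up to first order in Δ*t*; the complete treatment is given in Veliz-Cuba et al. 2016 and the supplementary materials.

$$
\begin{aligned}
a_t &= \log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} + \log\frac{(1 - h\Delta t)\,e^{a_{t-1}} + h\Delta t}{h\Delta t\,e^{a_{t-1}} + (1 - h\Delta t)} \\
&= \log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} + a_{t-1} + \log\left[1 - h\Delta t\,(1 - e^{-a_{t-1}})\right] - \log\left[1 - h\Delta t\,(1 - e^{a_{t-1}})\right] \\
&\approx \log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} + a_{t-1} - h\Delta t\left(e^{a_{t-1}} - e^{-a_{t-1}}\right) \\
&= \log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} + a_{t-1} - 2h\sinh(a_{t-1})\,\Delta t
\end{aligned}
$$

The third line uses log(1 − *x*) ≈ −*x* for small *x*. The increment *a*_{t} − *a*_{t−1} therefore consists of the log-likelihood ratio of the current sample plus a discounting contribution −2*h* sinh(*a*)Δ*t*, which becomes the discounting term of the differential equation as Δ*t* goes to 0.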

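From this point on our derivation departs from existing results in the literature. In order to develop a deeper understanding of the optimal inference on our task, we will evaluate the evidence term. Because of the discrete nature of the Poisson evidence, this term can be precisely evaluated for each evidence sample in a way that is not possible in other decision making tasks. In a small sample window of duration Δ*t*, the probability of a Poisson event is *r*Δ*t*, where *r* is the parameter of the Poisson process (provided *r*Δ*t* ≪ 1). Throughout, *r*_{1} denotes the rate of right clicks in *S*^{1} (equivalently, by symmetry, of left clicks in *S*^{2}) and *r*_{2} the rate of right clicks in *S*^{2} (equivalently, of left clicks in *S*^{1}). In our task a momentary sample *∊*_{t} is the result of two independent Poisson processes and can take on four possible values: a click on both sides, a click on the right, a click on the left, or no clicks. Evaluating the evidence term for these four conditions:

A click on both sides:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{r_1\Delta t \cdot r_2\Delta t}{r_2\Delta t \cdot r_1\Delta t} = 0 \tag{4}$$

No clicks:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{(1 - r_1\Delta t)(1 - r_2\Delta t)}{(1 - r_2\Delta t)(1 - r_1\Delta t)} = 0 \tag{5}$$

A click on the right:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{r_1\Delta t\,(1 - r_2\Delta t)}{r_2\Delta t\,(1 - r_1\Delta t)} = \kappa(r_1, r_2) \tag{6}$$

A click on the left:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{r_2\Delta t\,(1 - r_1\Delta t)}{r_1\Delta t\,(1 - r_2\Delta t)} = -\kappa(r_1, r_2) \tag{7}$$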

We define the function *κ*(*r*_{1},*r*_{2}) to be the increase in the log-probability ratio from the arrival of a single click on the right, given click rates *r*_{1}, *r*_{2}. The function *κ* tells us how reliably each click indicates the hidden state. This is easily seen when letting Δ*t* → 0, so *κ* = log(*r*_{1}/*r*_{2}). If the click rates *r*_{1} and *r*_{2} are very similar (so *κ* is small) then we expect many distractor clicks (clicks from the smaller click rate that do not indicate the correct state), so an individual click tells us little about the underlying state. On the other hand, if the click rates are very different (so *κ* is large) then we expect very few distractor clicks, so an individual click very reliably informs the current state. In the limit of one of the click rates going to zero, *κ* → ∞, and a single click tells us the current state with absolute certainty. In our task, the two click rates *r*_{1} and *r*_{2} always sum to 40 Hz. Figure 2A shows *κ* as a function of the click rates.
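To give a sense of scale, here are three worked values of *κ* = log(*r*_{1}/*r*_{2}) (natural logarithm) for rate pairs that sum to 40 Hz. The specific pairs are chosen purely for illustration and are not taken from the experiments.

$$
\begin{aligned}
\kappa(38, 2) &= \log 19 \approx 2.94 \\
\kappa(30, 10) &= \log 3 \approx 1.10 \\
\kappa(21, 19) &= \log\tfrac{21}{19} \approx 0.10
\end{aligned}
$$

A single click is therefore worth roughly thirty times more log-evidence when the rates are 38 and 2 Hz than when they are 21 and 19 Hz.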

Re-writing the log-evidence term in (3) in terms of *κ* and using *δ*_{L/R,t} to represent the left/right click times, we can summarize across all four conditions:

$$\frac{da}{dt} = \kappa\left(\delta_{R,t} - \delta_{L,t}\right) - 2h\sinh(a) \tag{8}$$

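We can then rescale equation (8) by *κ*, substituting *a* → *a*/*κ*, to put our evidence accumulation equation in units of clicks:

$$\frac{da}{dt} = \delta_{R,t} - \delta_{L,t} - \frac{2h}{\kappa}\sinh(\kappa a) \tag{9}$$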

Here *δ*_{L/R,t} are trains of delta functions at the times of the left and right clicks. Equation (9) has a simple interpretation: sensory clicks are integrated (*δ*_{R,t} − *δ*_{L,t}), while accumulated evidence is discounted at a rate set by the volatility of the environment (*h*) and the reliability of each click (*κ*). This interpretation also allows for a simple assay of behavior: do rats adopt the optimal discounting timescale? We will present two quantitative methods for measuring the rats’ discounting timescales. However, before examining rat behavior, we need to examine the impact of sensory noise on optimal behavior.
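The accumulation process in units of clicks can be simulated directly from click times. The sketch below is one possible implementation, not the code used in this study. It assumes the dynamics d*a*/d*t* = *δ*_{R} − *δ*_{L} − (2*h*/*κ*) sinh(*κa*), and between clicks it uses a closed-form solution of the discounting term (tanh(*κa*/2) decays as exp(−2*h*·gap)) instead of small Euler steps, which can become unstable when sinh is large. The function names are hypothetical.

```python
import math

_LIMIT = 1.0 - 1e-15  # keeps atanh away from its singularities at +/-1


def _discount(a, gap, hazard_rate, kappa):
    """Evolve a under da/dt = -(2h/kappa) * sinh(kappa * a) for `gap` seconds.

    Closed form: tanh(kappa * a / 2) shrinks by a factor exp(-2 * h * gap).
    """
    if a == 0.0 or gap <= 0.0 or hazard_rate == 0.0:
        return a
    u = math.tanh(kappa * a / 2.0) * math.exp(-2.0 * hazard_rate * gap)
    u = max(-_LIMIT, min(_LIMIT, u))
    return 2.0 * math.atanh(u) / kappa


def optimal_accumulator(left_times, right_times, duration, hazard_rate, kappa):
    """Return [(time, a), ...] just after each click, plus (duration, a) at the end.

    Right clicks add +1 and left clicks add -1 (a is in units of clicks).
    Click times are assumed to lie within [0, duration].
    """
    clicks = sorted([(t, -1.0) for t in left_times] +
                    [(t, 1.0) for t in right_times])
    a, t_prev, trajectory = 0.0, 0.0, []
    for t, jump in clicks + [(duration, 0.0)]:
        a = _discount(a, t - t_prev, hazard_rate, kappa)  # forget between clicks
        a += jump                                         # integrate the click
        trajectory.append((t, a))
        t_prev = t
    return trajectory
```

Because the discounting term never pushes *a* across zero, the sign of *a* at the end of the trial (the predicted choice) can only change at click times.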

### Sensory noise decreases click reliability

The function *κ* (*r*_{1},*r*_{2}) tells us how reliably each click indicates the underlying state as a function of the click generation rates *r*_{1} and *r*_{2}. The computation above of *κ* assumes that each click is detected and correctly localized as either a left or right click with perfect accuracy. Previous studies using pulse-based evidence demonstrate that rats have significant sensory noise (Brunton et al., 2013; Scott et al., 2015). The term sensory noise in the context of these studies refers to sources of errors that scale with the number of pulses of evidence. Sensory noise was measured by fitting parametric models that included a parameter for how much uncertainty in the accumulation variable was increased due to each pulse of evidence. The exact biological origin of this noise remains unclear. It could arise from sensory processing errors, or from disruption of coding in the putative integration circuit at the moment of pulse arrival. Regardless of its origins, sensory noise is a significant component of rodent behavior.

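We will now show that sensory noise decreases how reliably each click indicates the underlying state. While sensory noise can be modeled in many ways, it changes the click reliability primarily through the mislocalization of clicks. We analyze the cases of Gaussian noise on the click amplitudes and missing clicks, and provide a general argument for mislocalization in the supplementary materials. Mislocalization refers to how often clicks are incorrectly localized to the other speaker (hearing a click from the left and assigning it to the right). For intuition, consider that if a rat could never tell whether a click was played from the right or left, then clicks would carry no information about the underlying state. We can again evaluate the log-evidence term, this time including the probability of click mislocalization (*n*):

A click on the right:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{r_1\Delta t\,(1 - r_2\Delta t)(1 - n) + r_2\Delta t\,(1 - r_1\Delta t)\,n}{r_2\Delta t\,(1 - r_1\Delta t)(1 - n) + r_1\Delta t\,(1 - r_2\Delta t)\,n} = \kappa$$

A click on the left:

$$\log\frac{P(\epsilon_t \mid S^1)}{P(\epsilon_t \mid S^2)} = \log\frac{r_2\Delta t\,(1 - r_1\Delta t)(1 - n) + r_1\Delta t\,(1 - r_2\Delta t)\,n}{r_1\Delta t\,(1 - r_2\Delta t)(1 - n) + r_2\Delta t\,(1 - r_1\Delta t)\,n} = -\kappa$$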

The terms for no clicks, or clicks on both sides, evaluate to 0. As in the case with no sensory noise, the log-evidence is either 0 or has magnitude *κ*. We can simplify the expression for *κ* by letting Δ*t* → 0:

$$\kappa(r_1, r_2, n) = \log\frac{r_1(1 - n) + r_2\,n}{r_2(1 - n) + r_1\,n}$$

Sensory noise decreases how reliably each click informs the underlying state in the trial: increasing *n* decreases *κ*. If *n* = 0, we recover the original *κ* derived without noise. If *n* = 0.5, then each click is essentially heard on a random side, and therefore contains no information, so *κ* = 0. If *n* = 1, then we simply flip the sign of all clicks.
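The three limiting cases above are easy to check numerically. The sketch below assumes that a click is assigned to the wrong speaker with probability *n*, independently of its true side, so that *κ* = log[(*r*_{1}(1 − *n*) + *r*_{2}*n*) / (*r*_{2}(1 − *n*) + *r*_{1}*n*)]. The function name and the example rates are illustrative.

```python
import math


def click_reliability(r1, r2, mislocalization=0.0):
    """Log-evidence carried by one perceived right click.

    r1, r2:          click rates of the two Poisson processes (Hz)
    mislocalization: probability n that a click is heard on the wrong side
    """
    n = mislocalization
    perceived_given_s1 = r1 * (1.0 - n) + r2 * n  # rate of perceived right clicks in S1
    perceived_given_s2 = r2 * (1.0 - n) + r1 * n  # rate of perceived right clicks in S2
    return math.log(perceived_given_s1 / perceived_given_s2)


if __name__ == "__main__":
    for n in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(n, round(click_reliability(30.0, 10.0, n), 3))
```

Evaluating the function over a grid of *n* values gives a curve of *κ* against *n* for the chosen rates.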

Previous studies using the same auditory clicks have shown that rats have significant sensory noise. Figure 2B shows *κ* against *n*, and highlights the average sensory noise, and corresponding *κ*, found in a previous study (Brunton et al., 2013).

### Lower click reliability requires longer integration timescales

The discounting term of equation (9) has *κ* in the denominator as well as the argument of the sinh term. As a result, it is not clear how decreasing the click reliability *κ* changes the behavior of the optimal inference agent. To gain insight, consider that if evidence is very reliable, accurate decisions can be made by only using a few clicks from a small time window. However, if evidence is unreliable, a longer time window must be used to average out unreliable clicks. This intuition is confirmed by plotting the discounting function for a variety of evidence reliability values (Figure 2C). Decreasing reliability weakens the evidence discounting term creating longer integration timescales. See the supplementary materials for more details.
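One way to see this without plotting is to expand the discounting term, assumed here to have the form −(2*h*/*κ*) sinh(*κa*), in a Taylor series around *a* = 0:

$$-\frac{2h}{\kappa}\sinh(\kappa a) = -2h\,a - \frac{h\kappa^{2}}{3}\,a^{3} - O\!\left(\kappa^{4}a^{5}\right)$$

The linear term −2*ha* does not depend on *κ*, so near the origin all reliability levels discount at the same rate. The higher-order terms grow with *κ*, so for any fixed nonzero *a* a smaller *κ* gives weaker discounting, which corresponds to a longer effective integration timescale.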

### Evidence discounting leads to changes of mind

The optimal inference equation attempts to predict the hidden state. As the hidden state dynamically transitions, we expect the inference process to track, albeit imperfectly, the dynamic transitions. From the perspective of a subject, this dynamic tracking leads to changes of mind in the upcoming choice. Through the optimal inference process we can predict the timing of changes of mind by looking for times when the sign of the inference process (sign(*a*)) changes. The presence of sensory noise slows the integration timescale, and thus slows the timing of changes of mind. Figure 2D shows the predicted timing of changes of mind with and without sensory noise.
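Detecting changes of mind from a simulated accumulator trace amounts to finding sign flips. The helper below is a hypothetical sketch: it assumes the trace is available as a list of (time, *a*) pairs, and it treats samples where *a* is exactly zero as undecided, which is a design choice for this illustration and not something specified in the text.

```python
def change_of_mind_times(trajectory):
    """Return the times at which sign(a) flips between +1 and -1.

    trajectory: list of (time, a) pairs in increasing time order.
    Samples with a == 0 are treated as undecided and do not count as a flip.
    """
    times = []
    last_sign = 0
    for t, a in trajectory:
        sign = (a > 0) - (a < 0)
        if sign == 0:
            continue
        if last_sign != 0 and sign != last_sign:
            times.append(t)
        last_sign = sign
    return times
```

Applying this helper to traces simulated with and without sensory noise, and aligning the resulting times to the true state switches, gives one way to compare the predicted timing of changes of mind.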

### Linear approximation to nonlinear discounting function is very accurate

The full nonlinear discounting function, −(2*h*/*κ*) sinh(*κa*), is complicated. In order to aid our analysis of rat behavior, we will consider a linear approximation to the discounting function (−λ*a*), where λ gives the discounting rate. There are many possible linear approximations with different slopes. A linear approximation using the slope of sinh at the origin will fail to capture the strong discounting farther from the origin. We found the best linear approximation numerically.

Figure 2E shows, for a particular noise level and click rates, the accuracy of a range of linear discounting agents against the full nonlinear agent. If λ is tuned correctly, the linear agent accuracy is very close to the full nonlinear function. We find this to be true across a wide range of noise values (Figure 2F). While the optimal linear strength at each noise level changes (Figure 2G), the accuracy is always very close to that of the full nonlinear theory. It is important to note that a linear approximation in general will not always be close in accuracy to the full nonlinear theory, but for our specific click rate parameters it is an accurate approximation. See Veliz-Cuba et al. 2016 for examples of evidence statistics for which the linear approximation does not fit as well.

Given that a linear discounting function matches the accuracy of the full nonlinear model, we will analyze rat evidence discounting behavior by looking for the appropriate discounting rate, or equivalently the appropriate integration timescale. Specifically, we will compare the rat behavior to this linear discounting equation: *da* = (*δ*_{R} − *δ*_{L} − λ*a*) *dt*, where λ is the discounting rate and 1/λ is the integration timescale. We did not examine whether rats demonstrate nonlinear evidence discounting because the linear approximation in our task is effectively indistinguishable from the full nonlinear theory.
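The numerical search for the best linear discounting rate can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the procedure used for Figure 2: it ignores sensory noise, draws trial durations uniformly between 0.5 and 2 seconds, scores ties as half correct, and uses illustrative click rates of 38 and 2 Hz. All function names are hypothetical.

```python
import math
import random


def make_trial(rng, duration, hazard_rate, rate_high, rate_low):
    """Return (clicks, final_state); clicks is a sorted list of (time, side).

    side is +1 for a right click and -1 for a left click. In state +1 the
    right speaker clicks at rate_high, in state -1 the left speaker does.
    """
    state = rng.choice([1, -1])
    t, clicks = 0.0, []
    while t < duration:
        dwell = rng.expovariate(hazard_rate) if hazard_rate > 0 else float("inf")
        segment_end = min(duration, t + dwell)
        for side, rate in ((state, rate_high), (-state, rate_low)):
            s = t + rng.expovariate(rate)
            while s < segment_end:
                clicks.append((s, side))
                s += rng.expovariate(rate)
        if segment_end < duration:
            state = -state
        t = segment_end
    clicks.sort()
    return clicks, state


def linear_agent_choice(clicks, lam):
    """Sign of a at the end of the trial for da = (dR - dL - lam * a) dt."""
    a, t_prev = 0.0, 0.0
    for t, side in clicks:
        a = a * math.exp(-lam * (t - t_prev)) + side  # exact decay, then click
        t_prev = t
    return (a > 0) - (a < 0)


def accuracy_by_discount_rate(trials, lams):
    """Map each discounting rate to the fraction of correctly inferred states."""
    result = {}
    for lam in lams:
        score = 0.0
        for clicks, final_state in trials:
            choice = linear_agent_choice(clicks, lam)
            score += 0.5 if choice == 0 else float(choice == final_state)
        result[lam] = score / len(trials)
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    trials = [make_trial(rng, rng.uniform(0.5, 2.0), 1.0, 38.0, 2.0)
              for _ in range(2000)]
    acc = accuracy_by_discount_rate(trials, [0.0, 1.0, 2.0, 4.0, 8.0, 16.0])
    print(max(acc, key=acc.get), acc)
```

A finer grid, or a one-dimensional optimizer, could replace the coarse list of rates. Adding sensory noise to the simulated agent would be needed to reproduce the dependence of the best rate on noise level.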

### Psychophysical reverse correlation reveals the integration timescale

Psychophysical reverse correlation is a commonly used statistical method to find what aspects of a behavioral stimulus influence a subject’s choice. Here we use reverse correlation to find the integration timescale used by the rats. We then normalized the reverse correlation curve to have an area under the curve equal to one. This step lets the curves be interpreted in units of effective weight at each time point. A flat reverse correlation curve indicates even weighting of evidence across all time points. Previous studies in a static environment find rats with flat reverse correlation curves (Brunton et al., 2013; Hanks et al., 2015; Erlich et al., 2015). Figure 3A shows the reverse correlation for an example rat in a dynamic environment. The stimulus earlier in the trial is weighted less than the stimulus at the end of the trial indicating evidence discounting. Figure 3B shows the mean reverse correlations for all rats in the study. Figure 3C shows the reverse correlation curves from a family of linear discounting agents (*da* = *δ*_{R} − *δ*_{L} − λ*adt*), with λ ranging from 0 to 30. The curves were generated from a synthetic dataset of 20,000 trials. The weaker the discounting rate, the flatter the reverse correlation curves. To quantify the discounting timescale from the reverse correlation curves, an exponential function *e*^{bt} was fit to each curve. The parameter *b* reliably recovers the discounting rate λ (Figure 3D).
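The last two steps of this analysis, normalizing a curve to unit area and fitting an exponential, can be sketched as follows. The source does not specify the fitting procedure, so the log-linear least-squares fit used here is an assumption, as is the convention that time is measured relative to the end of the stimulus (negative values are earlier in the trial). The function names are hypothetical.

```python
import math


def normalize_to_unit_area(weights, bin_width):
    """Scale a curve so that sum(weights) * bin_width equals 1."""
    area = sum(weights) * bin_width
    return [w / area for w in weights]


def fit_exponential_rate(times, weights):
    """Fit weights ~ c * exp(b * t) by least squares on log(weights); return b.

    Only strictly positive weights are used, since the logarithm is undefined
    otherwise. At least two distinct time points are required.
    """
    points = [(t, math.log(w)) for t, w in zip(times, weights) if w > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in points)
    variance = sum((t - mean_t) ** 2 for t, _ in points)
    return covariance / variance
```

For a noiseless curve proportional to exp(5*t*), the fit returns *b* = 5, and normalization does not change the result because it only rescales the curve. For noisy empirical curves, a nonlinear least-squares fit on the raw weights may be preferable to the log-linear fit, since taking logarithms amplifies noise in small weights.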

### Rats adapt to the optimal timescale

To compare each rat’s evidence discounting timescale to the optimal inference equation, we simulated the optimal inference agent on the trials each rat experienced. We then computed the reverse correlation curves for both the rats and the optimal agent (Figure 4A). To quantitatively compare timescales, we then fit an exponential function to each of the reverse correlation curves. Rat behavior was compared with two optimal agents. The first optimal agent assumes no sensory noise, while the second agent uses the optimal timescale given the average level of sensory noise across rats reported in Brunton et al. 2013 (Figure 4B). When the average level of sensory noise is taken into account, the rats match the optimal timescale. The reverse correlation analysis thus shows that rats are close to optimal given the average level of sensory noise measured in a separate cohort of rats.

### A quantitative behavioral model captures rat behavior

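In order to extend our analysis to examine individual variations in noise level and integration timescales, we fit a behavioral accumulation of evidence model from the literature to each rat (Brunton et al., 2013; Hanks et al., 2015; Erlich et al., 2015). This model generates a moment-by-moment estimate of a latent accumulation variable. The dynamical equations for the model are given by:

$$da = \left(\delta_{R,t}\,\eta_R\,C - \delta_{L,t}\,\eta_L\,C\right)dt - \lambda a\,dt + \sigma_a\,dW$$

$$\frac{dC}{dt} = \frac{1 - C}{\tau_\phi} + (\phi - 1)\,C\left(\delta_{R,t} + \delta_{L,t}\right)$$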

At each moment in a trial, the model generates a distribution of possible accumulation values *P*(*a*|*t*, *δ*_{R}, *δ*_{L}). In addition to the click integration and linear discounting that was present in our normative theory, this model also parameterizes many possible sources of noise. Each click has multiplicative Gaussian sensory noise, *η* ~ 𝒩(1, σ_{s}^{2}). In addition to the sensory noise, each click is also filtered through an adaptation process, *C*. The adaptation process is parameterized by the adaptation strength *ϕ* and an adaptation time constant τ_{ϕ}. If *ϕ* > 1 the model has facilitation of sequential clicks, and if *ϕ* < 1 the model has depression of sequential clicks. The accumulation variable *a* also undergoes constant additive Gaussian noise σ_{a}. Finally, the initial distribution of *a* has some initial variance given by σ_{i}. See Brunton et al. 2013 for details on the development and evaluation of this model. One major modification to the model from previous studies is the removal of the sticky bounds *B*, which are especially detrimental to subject performance given the dynamic nature of the task. This model is a powerful tool for the description of behavior on this task because of its flexibility at characterizing many different behavioral strategies (Brunton et al., 2013; Hanks et al., 2015; Erlich et al., 2015).
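To make the roles of the parameters concrete, the sketch below draws a single sample of the accumulation variable at the end of a trial. It is a Monte Carlo illustration only: the model fits in this study work with the full distribution of *a*, not with samples. Several details are assumptions made for the sketch: `sigma_a`, `sigma_s`, and `sigma_i` are treated as standard deviations, each click is weighted by the adaptation value just before that click, and the adaptation variable is multiplied by `phi` at every click and relaxes back to 1 with time constant `tau_phi`.

```python
import math
import random


def _drift_and_diffuse(a, gap, lam, sigma_a, rng):
    """Evolve a under da = -lam * a dt + sigma_a dW for `gap` seconds (exact)."""
    if gap <= 0.0:
        return a
    if lam == 0.0:
        return a + rng.gauss(0.0, sigma_a * math.sqrt(gap))
    decay = math.exp(-lam * gap)
    variance = sigma_a ** 2 * (1.0 - decay ** 2) / (2.0 * lam)
    return a * decay + rng.gauss(0.0, math.sqrt(variance))


def sample_accumulator(clicks, duration, lam, sigma_a, sigma_s, sigma_i,
                       phi, tau_phi, rng=None):
    """Draw one sample of a at time `duration`.

    clicks: sorted list of (time, side), side = +1 (right) or -1 (left).
    """
    rng = rng or random.Random()
    a = rng.gauss(0.0, sigma_i)            # noisy initial condition
    adaptation, t_prev = 1.0, 0.0
    for t, side in list(clicks) + [(duration, 0)]:
        gap = t - t_prev
        a = _drift_and_diffuse(a, gap, lam, sigma_a, rng)
        # Adaptation relaxes toward 1 between clicks.
        adaptation = 1.0 + (adaptation - 1.0) * math.exp(-gap / tau_phi)
        if side != 0:
            eta = rng.gauss(1.0, sigma_s)  # multiplicative per-click noise
            a += side * eta * adaptation
            adaptation *= phi              # depression (phi < 1) or facilitation
        t_prev = t
    return a
```

Repeating the call many times with the same clicks approximates the distribution of *a* for that trial; the fraction of samples with *a* > 0 then approximates the model's probability of a rightward choice.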

The model was fit to individual rats by maximizing the likelihood of observing the rat’s choice on each trial. To evaluate the model, we can compare the reverse correlation curves from the model and subject. Figure 5A shows the comparison for an example rat, showing that the model captures the timescale of evidence discounting seen by the reverse correlation analysis. See the supplemental materials for residual error plots for each rat.

In order to analyze the model fits we can examine the best fit parameters for each rat, and compare them to rats trained on the static version of the task (from Brunton et al. 2013). The evidence discounting strength parameter λ shows a striking difference between the two rat populations (Figure 5B). In the static task, the rats have small discounting rates indicating an integration timescale comparable to the longest trial the rats experienced (Brunton et al., 2013; Hanks et al., 2015; Erlich et al., 2015). In the dynamic task, the rats have strong evidence discounting, consistent with the reverse correlation analysis. See the supplemental materials for a comparison of other model parameters.

To assess whether rats individually calibrate their discounting timescales to their level of sensory noise, we estimate the sensory noise level from the model parameters. We estimated the click mislocalization probability by taking the average level of adaptation, and the Gaussian distributed sensory noise. Figure 5C shows each rat’s fit compared to the numerically obtained optimal discounting levels from Figure 2F. The rats appear to have slightly larger discounting rates than predicted by the normative theory. The deviation from the normative theory may be due to other parameters in the behavioral model, the fact that we considered only the average level of sensory adaptation, or other factors. In order to more directly examine whether the rats were adopting the optimal timescale, we asked whether the rat’s discounting rates were constrained by the other model parameters. For each rat, we took the best fitting model parameters, and froze all parameters except the discounting rate parameter λ. Then, we found the value of λ that maximized accuracy on the trials each rat performed. Note this optimization did not ask to maximize the similarity to the rat’s behavior. We found that given the other model parameters, the accuracy maximizing discounting level was very close to the rat’s discounting level (Figure 5D) meaning that different sources of noise parametrized in the model highly constrain the rats’ discounting rates. Further, while the discounting rates changed slightly, the improvement in total trial accuracy changed even less. For all rats, optimizing the discounting rate increased the total accuracy of the model by less than 1% (Figure 5E). Taken together these results suggest that rats discount evidence at the optimal level given several sources of noise.

### Individual rats in different environments

Previous studies have demonstrated that rats can optimally integrate evidence in a static environment (Brunton et al., 2013). Here we have demonstrated that rats can optimally integrate and discount evidence in a dynamic environment. In order to demonstrate the ability of individual rats to adapt their timescales in different environments, we moved three rats from a dynamic environment (*h* = 0.5 Hz) to a static environment (*h* = 0 Hz), and then back. The rats trained in each environment for many daily sessions (minimum 25 sessions). In each environment, we quantified their behavior using reverse correlation methods. Figure 6A-C show the reverse correlation curves for an example rat as the rat transitioned between environments with different statistics. Figure 6D shows the integration timescales for each rat in each environment. Rats rapidly adjusted their timescales when moving into a static environment; a session-by-session estimate is shown in the supplementary materials (Figure 23). Consistent with our normative theory, rats in the *h* = 0.5 Hz environment show discounting rates approximately half the strength of rats in the *h* = 1 Hz environment. We find that rats can dynamically adjust their integration behavior to match their environments.

## Discussion

We have developed a pulse-based auditory decision making task in a dynamic environment. Using high-throughput automated training, we trained rats to accumulate and discount evidence in a dynamic environment. Extending results from the literature (Veliz-Cuba et al., 2016), we formalized the optimal behavior on our task, which critically involves discounting evidence on a timescale set by the environmental volatility and the reliability of each click. The reliability of each click depends on the experimenter-imposed click statistics and each rat’s sensory noise. We find that once sensory noise is taken into account, the rats have timescales consistent with the optimal inference process. We used quantitative modeling to investigate rat-to-rat variability, and to predict a moment-by-moment estimate of the rats’ accumulated evidence. Finally, we demonstrated that rats can rapidly adjust their discounting behavior, and therefore their integration timescales, in response to changing environmental statistics. Our findings open new questions into complex rodent behavior and the underlying neural mechanisms of decision making.

Previously, accumulation of evidence has been studied in static environments. These studies have given behavioral and neural insights into the ability of rats, monkeys, and humans to optimally accumulate evidence over extended timescales (Brunton et al., 2013; Kira et al., 2015; Purcell et al., 2010; Philiastides et al., 2011; Lee and Cummins, 2004; Kelly and O’Connell, 2013; Gold and Shadlen, 2001). They have shown that rats and monkeys, like humans, can gradually accumulate evidence for decision-making, and that the timescale of their evidence accumulation process is optimal. Quantitative modeling revealed that errors originated from sensory noise, not from the evidence accumulation process. The optimal strategy in a stationary environment is perfect integration. A natural extension of the static version of the task is a setting in which the environment changes with some defined statistics, and this is what we aimed to achieve with our “dynamic clicks task”. In the dynamic clicks task, the optimal strategy involves discounting evidence at a rate proportional to the volatility of the environment and the reliability of each evidence pulse. The quantitative behavioral modeling builds on a study that derived ideal observer models for dynamic environments, including the two-state environments considered here and more complex environments (Veliz-Cuba et al., 2016). That study analyzed the behavior of ideal agents with Gaussian distributed evidence samples. Our work builds on their derivation of ideal behavior and extends their analysis to discrete evidence. Importantly, our analysis allowed us to separate evidence reliability into experimenter-imposed stimulus statistics and sensory noise. Moreover, our findings show that rats’ discounting rates are optimal only when sensory noise is factored in. We have also shown that rats can switch back and forth between environments with different volatilities, thus providing, for the first time, a knob with which the experimenter can control the subjects’ integration timescale.

On the other hand, a recent study examined human decision making in a dynamic environment (Glaze et al., 2015). That study found that humans show nonlinear evidence discounting, but that their discounting rates did not match those of optimal inference. Incorporating models of human sensory noise could explain deviations from optimality in their data. We did not examine whether rats demonstrate nonlinear evidence discounting because the linear approximation in our task is effectively indistinguishable from the full nonlinear theory (Figure 2). Other studies have also found that humans perform leaky integration in dynamic environments (Ossmy et al., 2013).

The behavior presented here is distinct from previous tasks that have investigated decision making over time. Cisek et al. 2009 developed an evidence accumulation task in which the amount of evidence changes over the course of the trial. However, in that study the evidence is generated from a stationary process and the optimal behavior is to perfectly integrate all evidence. This is in contrast to the present study that examines conditions under which the optimal behavior is to discount old evidence.

In a separate line of work using bandit tasks, the subject gets reward or feedback on a timescale faster than the dynamics of the environment (Iigaya et al., 2017; Miller et al., 2017). In bandit tasks, the environment changes slowly with respect to each choice, and subjects get many opportunities for reward and feedback before the environment changes. In the work presented here, the subjects must perform inference without feedback while the environment is changing within the course of one trial. Importantly, in our task the environmental state “resets” after each choice the rat makes.

The dynamic accumulation of evidence task that we present here should also not be confused with conventional change detection tasks, which have only a single change of mind. In our case, many changes of mind occur stochastically. See Figure 2 in Veliz-Cuba et al. 2016 for a detailed discussion of the relationship between these tasks.

It is important to note that the term “evidence discounting” is different from “temporal discounting”, which is prominently used in the reinforcement learning literature. Temporal discounting is the phenomenon in which the subjective value of some reward decreases in magnitude when the given reward is delayed (Dayan and Abbott, 2005, p. 352). In our case, evidence discounting is the phenomenon in which an agent discards evidence in order to infer state changes in the environment.

One benefit of rodent studies is the wide range of experimental tools available to investigate the neural mechanisms underlying behavior. Our task will facilitate the investigation of two neural mechanisms. First, due to the dynamic nature of each trial, subjects change their minds often, allowing experimental measurement of changes of mind within one trial. Further, these changes of mind are driven by internal estimates of accumulated evidence. Previous studies of rat decision making have identified a cortical structure, the Frontal Orienting Fields (FOF), as a potential substrate for upcoming choice memory (Erlich et al., 2011; Hanks et al., 2015; Erlich et al., 2015; Kopec et al., 2015; Piet et al., 2017). Future work could investigate if and how the FOF tracks upcoming choice in a dynamic environment during changes of mind. Such work would also complement existing neurophysiological studies of changes of mind (Kiani et al., 2014; Peixoto et al., 2016).

Second, normative behavior in a dynamic environment requires tuning the timescale of evidence integration to the environmental volatility. There is a large body of experimental and theoretical studies on neural integrators (Seung, 1996; Goldman, 2009; Aksay et al., 2007; Scott et al., 2017) that investigates how neural circuits potentially perform integration. Many possible neural circuit mechanisms have been proposed, including random unstructured networks (Maass et al., 2002; Ganguli et al., 2008), feed-forward syn-fire chains (Goldman, 2009), and recurrent structured networks of many forms (Seung, 1996; Druckmann and Chklovskii, 2012; Boerlin et al., 2013). The task developed here allows for experimental control of the putative neural integrator’s timescale within the same subject. Measurement of neural activity in different dynamic environments, and thus at different integration timescales, may shed light on which mechanisms are used in neural circuits for evidence integration. For instance, unstructured networks or feed-forward networks may re-tune themselves by adjusting read-out weights. Networks that integrate via recurrent dynamics, however, would re-tune themselves via changes in those recurrent dynamics. Alternatively, measurement of neural activity in different dynamic environments may reveal fundamentally new mechanisms of evidence integration. For instance, Erlich et al. 2015 proposed multiple integration networks with different timescales to account for behavioral changes in response to prefrontal cortex inactivations. Our task may allow further investigation into the structure and dynamics of neural integrators.

## Methods

### Subjects

Animal use procedures were approved by the Princeton University Institutional Animal Care and Use Committee and carried out in accordance with NIH standards. All subjects were adult male Long Evans rats (Vendor: Taconic and Harlan, USA) placed on a controlled water schedule to motivate them to work for a water reward.

### Behavioral Training

We trained 14 rats on the dynamic clicks task (Figure 1). Rats went through several stages of an automated training protocol. In the final stage, each trial began with an LED turning on in the center nose port, indicating to the rats to poke there to initiate a trial. Rats were required to keep their nose in the center port (nose fixation) until the light turned off as a “go” signal. During center fixation, auditory cues were played indicating the current hidden state. The duration of the fixation period (and stimulus period) ranged from 0.5 to 2 seconds. After the go signal, rats were rewarded for entering the side port corresponding to the hidden state at the end of the stimulus period. The hidden state did not change after the go signal. A correct choice was rewarded with 24 microliters of water, while an incorrect choice resulted in a punishment noise (spectral noise of 1 kHz for a duration of 0.7 seconds). The rats were put on a controlled water schedule in which they received at least 3% of their body weight every day. Rats trained each day in a session lasting 120 minutes on average. Training sessions were included for analysis if the overall accuracy rate exceeded 70%, the center-fixation violation rate was below 25%, and the rat performed more than 50 trials. To prevent the rats from developing biases towards particular side ports, an anti-biasing algorithm detected biases and probabilistically generated trials with the correct answer on the non-favored side.

### Linear discounting agents

To analyze the performance of linear discounting agents at varying levels of noise, we created noisy synthetic datasets. For each level of click noise, each click switched sides according to the noise level. On each of these datasets, we numerically optimized the discounting rate to maximize the accuracy of predicting the hidden state at the end of the trial.
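A minimal sketch of this procedure is given below. It is an illustration under stated assumptions, not the analysis code used in the study: click rates of 38 Hz and 2 Hz, a 1 s trial, a hazard rate of 1 Hz, click noise implemented as an independent side flip with probability `noise`, and a simple grid search over the discounting rate are all choices made here for the example. The function names (`simulate_trial`, `accuracy`, `best_discounting_rate`) are hypothetical.

```python
import numpy as np

def simulate_trial(rng, duration=1.0, hazard=1.0, r_high=38.0, r_low=2.0, noise=0.0):
    """Simulate one trial; returns (click_times, click_signs, final_state)."""
    state = rng.choice([-1, 1])          # hidden state: +1 = right, -1 = left
    t, times, signs = 0.0, [], []
    while t < duration:
        # Dwell time in the current state (infinite in a static environment)
        dwell = rng.exponential(1.0 / hazard) if hazard > 0 else np.inf
        t_end = min(t + dwell, duration)
        # Poisson clicks: high rate on the state's side, low rate on the other
        for rate, side in ((r_high, state), (r_low, -state)):
            n = int(rng.poisson(rate * (t_end - t)))
            times.extend(rng.uniform(t, t_end, n))
            signs.extend([side] * n)
        if t_end < duration:
            state = -state               # the environment switches sides
        t = t_end
    times, signs = np.array(times), np.array(signs, dtype=float)
    # Click noise (assumed form): each click lands on the wrong side w.p. `noise`
    flip = rng.random(signs.size) < noise
    signs[flip] *= -1
    return times, signs, state

def accuracy(trials, lam, duration=1.0):
    """Fraction of trials where the sign of the discounted click sum matches the final state."""
    correct = 0.0
    for times, signs, state in trials:
        a = np.sum(signs * np.exp(-lam * (duration - times)))
        correct += 0.5 if a == 0 else float(np.sign(a) == state)  # ties count as a coin flip
    return correct / len(trials)

def best_discounting_rate(trials, lams):
    """Grid search for the discounting rate with the highest accuracy."""
    accs = np.array([accuracy(trials, lam) for lam in lams])
    return lams[int(np.argmax(accs))], accs.max()

rng = np.random.default_rng(0)
lams = np.linspace(0.0, 40.0, 41)        # candidate discounting rates in Hz
for noise in (0.0, 0.2, 0.4):
    trials = [simulate_trial(rng, noise=noise) for _ in range(500)]
    lam_star, acc_star = best_discounting_rate(trials, lams)
    print(f"noise={noise:.1f}  best rate={lam_star:.0f} Hz  accuracy={acc_star:.3f}")
```

With only a few hundred trials per noise level the accuracy curve is fairly flat near its maximum, so the best rate found by the grid search is itself a noisy estimate; more trials or a smoothed accuracy curve would give a more stable optimum.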

### Psychophysical reverse correlation

The computation of the reverse correlation curves was very similar to methods previously reported (Brunton et al., 2013; Hanks et al., 2015; Erlich et al., 2015). However, one additional step was included to deal with the hidden state. The first step is to smooth the click trains on each trial with a causal Gaussian filter (*k*(*t*)), which creates one smooth click rate for each trial. The filter had a standard deviation of 5 ms.

Then, the smooth click rate on each trial was normalized by the expected click rate for that time step, given the current state of the environment. This gives us the deviation (the excess click rate) from the expected click rate for each trial.

Finally, we compute the choice triggered average of the excess click rate by averaging over trials based on the rat’s choice.

The excess rate curves were then normalized to integrate to one. This was done to remove distorting effects of a lapse rate, as well as to make the curves more interpretable by putting the units into the effective weight of each click on choice. To quantify the timescale of the reverse correlation curves, we fit an exponential of the form *ae*^{bt} to each curve. The parameter *b* is the discounting rate, while 1/*b* is the integration timescale.
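The steps above can be sketched as follows. This is an illustrative reading of the method, not the authors' code, and several details are assumptions: time is binned at 1 ms, the causal Gaussian filter is taken to be a half-Gaussian over non-negative lags, "normalizing" by the expected click rate is implemented as subtraction of the (equally smoothed) expected right-minus-left rate, the two choice-conditioned averages are combined as half their difference, and the exponential is fit by a grid search over *b*. The synthetic agent, the click rates (38 Hz and 2 Hz), and all function names are hypothetical.

```python
import numpy as np

DT = 0.001  # time bin in seconds (assumed)

def causal_gaussian_kernel(sd=0.005, dt=DT):
    """Half-Gaussian over non-negative lags only, normalized to unit area."""
    lags = np.arange(0.0, 4 * sd + dt, dt)
    k = np.exp(-0.5 * (lags / sd) ** 2)
    return k / (k.sum() * dt)

def smooth(rate, kernel, dt=DT):
    """Causally filter each row of `rate` (in Hz); truncation keeps the output aligned in time."""
    n_bins = rate.shape[1]
    return np.array([np.convolve(row, kernel)[:n_bins] for row in rate]) * dt

def excess_click_rate(click_diff, state, r_high, r_low, dt=DT):
    """Smoothed (right - left) click rate minus its smoothed expectation given the state."""
    k = causal_gaussian_kernel(dt=dt)
    return smooth(click_diff / dt, k, dt) - smooth(state * (r_high - r_low), k, dt)

def choice_triggered_average(excess, choice):
    """Half the difference between mean excess rate on right- and left-choice trials."""
    return 0.5 * (excess[choice > 0].mean(axis=0) - excess[choice < 0].mean(axis=0))

def fit_exponential(t, y, rates=np.linspace(0.0, 50.0, 501)):
    """Least-squares fit of y ~ a * exp(b * t): grid over b, closed-form a for each b."""
    best = (np.inf, 0.0, 0.0)
    for b in rates:
        basis = np.exp(b * t)
        a = (y @ basis) / (basis @ basis)
        sse = np.sum((y - a * basis) ** 2)
        if sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]

# Synthetic data from a linear discounting agent (all parameter values are illustrative)
rng = np.random.default_rng(0)
n_trials, n_bins, hazard, r_high, r_low, lam = 2000, 1000, 1.0, 38.0, 2.0, 5.0
t = (np.arange(n_bins) - n_bins + 1) * DT            # time relative to trial end (<= 0)
flips = rng.random((n_trials, n_bins)) < hazard * DT
state = rng.choice([-1, 1], size=(n_trials, 1)) * (-1) ** np.cumsum(flips, axis=1)
p_right = np.where(state > 0, r_high, r_low) * DT
right = rng.random((n_trials, n_bins)) < p_right
left = rng.random((n_trials, n_bins)) < (r_high + r_low) * DT - p_right
click_diff = right.astype(float) - left.astype(float)
choice = np.sign(click_diff @ np.exp(lam * t) + 1e-9)  # discounted sum read out at trial end

excess = excess_click_rate(click_diff, state, r_high, r_low)
cta = choice_triggered_average(excess, choice)
cta = cta / (cta.sum() * DT)                           # normalize the curve to unit area
a_hat, b_hat = fit_exponential(t, cta)
timescale = 1.0 / b_hat if b_hat > 0 else np.inf
print(f"recovered discounting rate b = {b_hat:.1f} Hz, integration timescale = {timescale:.3f} s")
```

Because time is measured backwards from the end of the trial (*t* ≤ 0), a positive *b* corresponds to older clicks receiving less weight. The recovered rate is only an estimate of the simulated agent's rate and will vary with the number of trials.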

### Behavioral Model

Previous studies using this behavioral accumulation of evidence model (Brunton et al., 2013) have included sticky bounds, which absorb probability mass when the accumulated evidence reaches a certain threshold. We found these sticky bounds to be detrimental to high performance on our task, so we removed them. The removal of the sticky bounds facilitates an analytical solution of the model. The model assumes an initial distribution of accumulation values . At each moment in the trial, the distribution of accumulation values *P*(*a*|*t*, *δ*_{R}, *δ*_{L}) is Gaussian distributed with mean (*μ*) and variance (σ^{2}) given by:

Here #*R* is the number of right clicks on this trial up to time *t*, and *R*(*i*) is the time of the *i*^{th} right click. *C*(*R*(*i*)) gives the effective adaptation for that click. For a detailed discussion of a similar model, see Feng et al. 2009.

Given a distribution of accumulation values , and the bias parameter *B*, we can compute the left and right choice probabilities by:

These choice probabilities are then distorted by the lapse rate, which parameterizes how often a rat makes a random choice:. The model parameters *θ* were fit to each rat individually by maximizing the likelihood function:
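As a rough illustration of this step, the sketch below computes a right-choice probability from a Gaussian distribution of accumulation values, a bias, and a lapse rate, and sums log probabilities over trials. The specific forms are assumptions made for the example rather than the model's stated equations: a right choice is taken to be the Gaussian probability mass above the bias *B*, and lapses are assumed to split evenly between the two sides. The function names and the toy trial values are hypothetical.

```python
import math

def p_right(mu, sigma, bias, lapse):
    """Probability of a right choice for a ~ N(mu, sigma^2), bias threshold, and lapse rate."""
    z = (mu - bias) / (sigma * math.sqrt(2.0))
    p = 0.5 * (1.0 + math.erf(z))            # Gaussian mass above the bias: P(a > bias)
    return (1.0 - lapse) * p + 0.5 * lapse   # assumed: lapses are random 50/50 choices

def log_likelihood(trials, bias, lapse):
    """Sum of log choice probabilities; each trial is (mu, sigma, went_right)."""
    total = 0.0
    for mu, sigma, went_right in trials:
        p = p_right(mu, sigma, bias, lapse)
        total += math.log(p if went_right else 1.0 - p)
    return total

# Toy trials: (mean of accumulator at trial end, its standard deviation, observed choice)
trials = [(2.0, 1.0, True), (-1.5, 1.2, False), (0.3, 0.8, True)]
print(log_likelihood(trials, bias=0.1, lapse=0.05))
```

A nonzero lapse rate keeps every choice probability strictly between 0 and 1, which also keeps the log-likelihood finite on trials where the accumulator strongly contradicts the observed choice.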

Additionally, a half-Gaussian prior was put on the initial noise (σ_{i}) and accumulation noise (σ_{a}) parameters. Due to the presence of large discounting rates, these parameters are difficult to recover in synthetic datasets. The priors were set to match the respective best fit values from Brunton et al. 2013. The numerical optimization was performed in MATLAB. To estimate the uncertainty on the parameter estimates, we used the inverse Hessian matrix as a parameter covariance matrix (Daw, 2011). To compute the Hessian of the model, we used automatic differentiation to exactly compute the local curvature (Revels et al., 2016).
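The idea of using the inverse Hessian as a parameter covariance matrix can be illustrated on a toy problem. The sketch below is not the study's model or code: it swaps in a central finite-difference Hessian in place of automatic differentiation, and it uses a two-parameter linear regression with known noise as a stand-in likelihood so that the result can be checked against the closed-form covariance. All names and values are hypothetical.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-3):
    """Central finite-difference Hessian of a scalar function f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps ** 2)
    return hess

# Toy data: y = 1 + 2x plus Gaussian noise of known standard deviation
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
sigma = 0.5
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, x.size)
X = np.column_stack([np.ones_like(x), x])

def neg_log_likelihood(theta):
    """Gaussian negative log-likelihood, up to an additive constant."""
    resid = y - X @ theta
    return 0.5 * np.sum(resid ** 2) / sigma ** 2

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]       # maximum-likelihood estimate
hessian = numerical_hessian(neg_log_likelihood, theta_hat)
cov = np.linalg.inv(hessian)                           # inverse Hessian as covariance
std_err = np.sqrt(np.diag(cov))                        # one standard error per parameter
print("estimates:", theta_hat, "standard errors:", std_err)
```

For this quadratic likelihood the inverse Hessian equals the textbook covariance σ²(XᵀX)⁻¹; for a nonlinear model it is only a local, approximate description of the uncertainty around the best-fit parameters.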

### Calculating noise level from model parameters

Given the model parameters (, *ϕ*, and τ_{ϕ}), we computed the average level of sensory adaptation on each click ⟨C⟩. Then, we computed what fraction of the probability mass would cross 0 to be registered as a click on the other side.
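A minimal sketch of the second step, under an assumption that is not spelled out in the text: the perceived magnitude of a click is taken to be Gaussian, centered on the average adaptation ⟨C⟩ with a standard deviation given by a sensory noise parameter (called `sigma_s` here, a hypothetical name). The fraction of that distribution falling below zero is then the probability that the click is registered on the other side.

```python
import math

def click_noise_level(mean_adaptation, sigma_s):
    """Fraction of a N(mean_adaptation, sigma_s^2) distribution that lies below zero."""
    # Phi(-c / sigma) written with the complementary error function
    return 0.5 * math.erfc(mean_adaptation / (sigma_s * math.sqrt(2.0)))

# Illustrative values only: a click of average magnitude 1 with unit sensory noise
print(click_noise_level(1.0, 1.0))
```

Under this assumption the noise level approaches 0.5 as the sensory noise grows relative to ⟨C⟩, and approaches 0 as the sensory noise vanishes.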

## Author Contributions

AP: Task design, rat training, theoretical analysis, quantitative methods development and application. AE: Task design, rat training, and advised during all aspects of the study. CB: Advised during all aspects of the study.

## Psychophysical Reverse Correlation details

Here we present two control analyses on our reverse correlation method. First, we show that our method is not biased by the presence of a lapse rate, unlike logistic regression. Second, we rule out degenerate strategies like deciding based on only the last click.

## Are the rats really integrating? Ruling out last click strategies

One possible concern is that the rats might be relying on degenerate strategies, such as choosing based on the last click they heard, or that the rats’ integration timescale is so short that their behavior should not really be considered integration. Figure 19A shows a quasi-fixed point analysis of the optimal accumulation equation given a noise level. Assuming the environment stays in one state for a long time, we replace the evidence term with the expected rate of clicks, and solve for the steady state accumulation value. We can see that for all noise levels, the fixed point lies above 1 click, so the optimal behavior necessarily involves integrating clicks. For the average rat noise level, we see integration of about 5 clicks.
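The quasi-fixed point calculation can be sketched as follows, with the caveat that every specific here is an assumption made for illustration. The accumulation equation is taken to be the continuous-time form for a symmetric two-state environment, da/dt = (evidence) − 2*h* sinh(*a*), following Veliz-Cuba et al. (2016). Click noise is assumed to mix the two click rates, each click is assumed to carry evidence κ equal to the log ratio of the mixed rates, and the rates 38 Hz and 2 Hz are illustrative values. Setting the expected evidence rate equal to the discounting term gives the steady state, which is then expressed in units of clicks by dividing by κ.

```python
import math

def click_reliability(r_high, r_low, noise):
    """Evidence per click (kappa) and net click rate after mixing rates by the noise level."""
    p_high = r_high * (1.0 - noise) + r_low * noise
    p_low = r_low * (1.0 - noise) + r_high * noise
    return math.log(p_high / p_low), p_high - p_low

def fixed_point_in_clicks(hazard, r_high=38.0, r_low=2.0, noise=0.0):
    """Steady state of da/dt = net_rate * kappa - 2 * hazard * sinh(a), in units of clicks."""
    kappa, net_rate = click_reliability(r_high, r_low, noise)
    a_star = math.asinh(net_rate * kappa / (2.0 * hazard))
    return a_star / kappa

for noise in (0.0, 0.1, 0.2, 0.3, 0.4):
    clicks = fixed_point_in_clicks(hazard=1.0, noise=noise)
    print(f"noise={noise:.1f}  fixed point = {clicks:.2f} clicks")
```

The numbers produced by this sketch depend on the assumed rates, hazard rate, and noise model, so they are not expected to reproduce the values in Figure 19A exactly.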

Figure 19B shows the discounting rate recovered by the reverse correlation method for simulated discounting agents, similar to Figure 3. Here, we include agents with much stronger discounting, and find that the recovered discounting rate asymptotes at just under 36 Hz, which is the expected net click rate (*r*_{1} − *r*_{2} ≈ 36 Hz). The last click strategy can be considered a discounting agent with an infinite discounting rate, and would be recovered in our analysis as a discounting rate of about 36 Hz. We find that our rats are well away from this limit. Thus, we confidently rule out a last click strategy.

## Model details

## Acknowledgements

We thank all members of the Brody lab for technical assistance and feedback throughout the project. We thank Ben Scott, Diksha Gupta, Tim Hanks, and Christine Constantinople for detailed comments on the manuscript. This work was supported in part by NIH grant 5-R01-MH108358.