Abstract
To gain insight into the process by which animals choose between actions, we trained mice in a two-armed bandit task with time-varying reward probabilities. Whereas past work has modeled the selection of the higher-rewarding port in such tasks, we also sought to model the trial-to-trial changes in port selection – i.e. the action switching behavior. We find that mouse behavior deviates from that of the theoretically optimal agent performing Bayesian inference in a hidden Markov model (HMM). Instead, the strategy of mice can be well-described by a set of models that we demonstrate are mathematically equivalent: a logistic regression, a drift diffusion model, and a 'sticky' Bayesian model. Here we show that the switching behavior of mice is characterized by several components that are conserved across models, namely a stochastic action policy, a representation of action value, and a tendency to repeat actions despite incoming evidence. When fit to mouse behavior, the expected reward under these models lies near a plateau of the value landscape even in changing reward probability contexts. These results indicate that mouse behavior reaches near-maximal performance with reduced action switching and can be described by models with a small number of relatively fixed parameters.
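The components named above (a stochastic action policy, a learned action value, and a tendency to repeat the previous action) can be illustrated with a minimal simulation. This is a hedged sketch, not the authors' fitted model: the environment is a two-armed bandit whose reward probabilities swap between the ports at random, and the agent combines a delta-rule value estimate with a softmax policy plus a stickiness bonus. All parameter values (`lr`, `beta`, `stickiness`, `switch_prob`) are illustrative assumptions.

```python
import math
import random

def simulate(n_trials=1000, p_high=0.8, p_low=0.2, switch_prob=0.02,
             lr=0.3, beta=3.0, stickiness=1.0, seed=0):
    """Illustrative two-armed bandit with block-wise switching reward
    probabilities, and an agent with a softmax (stochastic) policy over
    learned action values plus a 'sticky' bonus for repeating its last
    choice. Parameters are made up for illustration, not fitted to mice."""
    rng = random.Random(seed)
    probs = [p_high, p_low]   # reward probability of each port
    q = [0.0, 0.0]            # learned action values
    prev = 0                  # previously chosen port
    rewards = switches = 0
    for _ in range(n_trials):
        if rng.random() < switch_prob:  # hidden state occasionally flips
            probs.reverse()
        # Softmax over value, with a stickiness bonus for the previous action.
        logits = [beta * q[a] + (stickiness if a == prev else 0.0)
                  for a in (0, 1)]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        choice = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
        r = 1.0 if rng.random() < probs[choice] else 0.0
        q[choice] += lr * (r - q[choice])  # delta-rule value update
        switches += int(choice != prev)
        rewards += r
        prev = choice
    return rewards / n_trials, switches / n_trials

reward_rate, switch_rate = simulate()
```

Raising the `stickiness` term lowers the switch rate while leaving the reward rate nearly unchanged, consistent with the abstract's observation that behavior can sit on a plateau of the value landscape with reduced action switching.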
Competing Interest Statement
The authors have declared no competing interest.