Abstract
Decision making in non-stationary, stochastic environments can be interpreted as a variant of the non-stationary multi-armed bandit task in which the optimal decision requires identifying the current context. We formalize the problem using a Bayesian approach that takes biological constraints (limited memory) into account, which allows us to define a sub-optimal theoretical model. From this theoretical model, we derive a biological model of the striatum, based on its micro-anatomy, that is able to learn state and action representations. We show that this model matches the theoretical model when stochasticity in the environment is low, and that it could be considered a neural implementation of the theoretical model. Both models are tested on a non-stationary multi-armed bandit task and compared to animal performance.
Author Summary
Decision making in changing environments requires knowledge of the current context in order to adapt the response to the environment. Such context identification is based on the recent history of actions and their outcomes: when an action that used to be rewarded no longer is, this may signal a context change. An ideal observer with infinite memory could optimally estimate the current context and act accordingly. Taking biological constraints into account, we show that a model of the striatum, the largest nucleus of the basal ganglia, can solve the task in a sub-optimal way, as has been shown to be the case for rats in a T-maze task.
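
To make the idea of context identification from a limited history concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the model developed in this paper: we assume two hypothetical contexts with known reward probabilities per arm (`REWARD_PROBS`), a uniform prior over contexts, and a fixed memory window (`MEMORY_SIZE`) holding the most recent (action, reward) pairs. All names and values are chosen for illustration only.

```python
from collections import deque


def context_posterior(history, reward_probs):
    """Posterior over contexts given recent (action, reward) pairs.

    Assumes a uniform prior over contexts and independent Bernoulli rewards.
    reward_probs[c][a] is the assumed reward probability of arm a in context c.
    """
    likelihoods = []
    for probs in reward_probs:
        lik = 1.0
        for action, reward in history:
            p = probs[action]
            lik *= p if reward else (1.0 - p)
        likelihoods.append(lik)
    total = sum(likelihoods)
    if total == 0.0:
        # Every context assigns zero likelihood: fall back to the prior.
        return [1.0 / len(likelihoods)] * len(likelihoods)
    return [lik / total for lik in likelihoods]


# Hypothetical task: in context 0 arm 0 is mostly rewarded, in context 1 arm 1 is.
REWARD_PROBS = [[0.9, 0.1], [0.1, 0.9]]
MEMORY_SIZE = 5  # limited memory: only the last 5 trials are kept

history = deque(maxlen=MEMORY_SIZE)

# Arm 0 is rewarded five times in a row: the evidence favors context 0.
for _ in range(5):
    history.append((0, 1))
before = context_posterior(history, REWARD_PROBS)

# Arm 0 then goes unrewarded three times: the oldest trials drop out of the
# window and the evidence shifts toward context 1.
for _ in range(3):
    history.append((0, 0))
after = context_posterior(history, REWARD_PROBS)

print("posterior before change:", before)
print("posterior after change: ", after)
```

With a longer window the estimate is less sensitive to chance omissions of reward but reacts more slowly to a genuine context change; an ideal observer with infinite memory corresponds to dropping the `maxlen` bound, provided it also models the possibility of context switches.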