Abstract
A central difficulty for computational theories of planning is that the value of an action taken now depends on which actions are taken afterwards; optimal actions are therefore coupled across states. We argue that this coupling underlies a pattern of challenges that reinforcement learning models in neuroscience face in explaining both the brain’s flexibilities and its inflexibilities. Building on recent advances in control engineering, we propose a new model of goal-directed decision making in the brain which, unlike previous attempts, is efficient, flexible, and biologically realistic. This theory connects a wide range of seemingly disparate empirical and theoretical phenomena across different areas of neuroscience, such as flexible decision making, efficient replanning, cognitive control, and Pavlovian response biases. We also propose that the entorhinal grid code encodes a map of state expectancies under a default policy, which provides a computational mechanism for how these cells could contribute to flexible planning.