Abstract
A central difficulty for computational theories of planning is that the value of an action taken now depends on which actions are chosen afterward; optimal choices are thus coupled across states. We argue that this interdependence underlies a pattern of challenges that reinforcement learning models face in explaining both the brain’s flexibilities and inflexibilities. Building on advances in control engineering, we propose a model for decision-making in the brain that is more efficient, flexible, and biologically realistic than previous attempts. It replaces the classic iterative optimization with a linear approximation that addresses this interdependence by softly maximizing around a default policy. This solution exposes connections between seemingly disparate phenomena across neuroscience, notably flexible replanning with biases and cognitive control. It also gives new insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.
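To make the central claim concrete, the sketch below illustrates the kind of computation the abstract describes, in the style of linearly solvable MDPs (Todorov 2009), on which linear approaches of this sort build: iterative Bellman optimization is replaced by a single linear solve for a desirability function z = exp(-v), and the resulting policy softly maximizes around the default transition dynamics. The toy chain task, the state costs, and all variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Toy chain of 5 states: 0-3 are nonterminal, 4 is a terminal goal.
n_nonterm, n_term = 4, 1
cost = np.full(n_nonterm, 1.0)     # per-step cost q(s) at nonterminal states
cost_T = np.array([0.0])           # cost at the terminal (goal) state

# Default ("passive") dynamics P: an unbiased random walk along the chain.
P = np.zeros((n_nonterm, n_nonterm + n_term))
for s in range(n_nonterm):
    for s2 in (s - 1, s + 1):
        if 0 <= s2 < n_nonterm + n_term:
            P[s, s2] += 0.5
P[0, 0] += 0.5                     # reflect at the left boundary

P_NN = P[:, :n_nonterm]            # nonterminal -> nonterminal transitions
P_NT = P[:, n_nonterm:]            # nonterminal -> terminal transitions

# Key step: z = exp(-v) satisfies a LINEAR equation,
#   z_N = exp(-q) * (P_NN z_N + P_NT z_T),
# so one matrix solve replaces iterative optimization. The inverse of A
# plays the role of a reusable long-distance map of contingencies.
z_T = np.exp(-cost_T)
A = np.diag(np.exp(cost)) - P_NN
z_N = np.linalg.solve(A, P_NT @ z_T)
v = -np.log(z_N)                   # optimal values at nonterminal states

# Soft maximization around the default policy:
#   u*(s'|s) proportional to P(s'|s) * z(s').
z = np.concatenate([z_N, z_T])
policy = P * z
policy /= policy.sum(axis=1, keepdims=True)

print("values:", np.round(v, 3))
print("policy from state 0:", np.round(policy[0], 3))
```

Note the design property this sketch is meant to surface: the matrix A depends only on the default dynamics and state costs, so its inverse can be computed once and reused as goals (the terminal costs entering through z_T) change, which is the sense in which a stable map can guide choice even under changing goals.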