Abstract
Complex behaviors are often driven by an internal model, which integrates sensory information over time and facilitates long-term planning to reach subjective goals. We interpret behavioral data by assuming an agent behaves rationally; that is, it takes actions that optimize its subjective reward according to its understanding of the task and its relevant causal variables. We apply a new method, Inverse Rational Control (IRC), to learn an agent's internal model and reward function by maximizing the likelihood of its measured sensory observations and actions. The method thereby extracts rational, interpretable thoughts of the agent from its behavior. We also provide a framework for interpreting encoding, recoding, and decoding of neural data in light of this rational model for behavior. When applied to behavioral and neural data from simulated agents performing suboptimally on a naturalistic foraging task, the method successfully recovers their internal model and reward function, as well as the computational dynamics within the neural manifold that represents the task. This work lays a foundation for discovering how the brain represents and computes with dynamic beliefs.
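The core inverse step described above, recovering the reward parameters that best explain an agent's observed actions by maximizing their likelihood under a rational (softmax-optimal) policy, can be sketched in miniature. The toy task, parameter names, and softmax temperature below are hypothetical illustrations for intuition only, not the paper's actual foraging task or algorithm:

```python
import math
import random

def softmax_policy(rewards, beta=5.0):
    # A "rational up to noise" agent: action probabilities are
    # proportional to exp(beta * subjective reward).
    exps = [math.exp(beta * r) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]

def simulate_agent(theta_true, n_trials=2000, seed=0):
    # Hypothetical two-option task: the agent's subjective rewards
    # for the two actions are (theta, 1 - theta).
    rng = random.Random(seed)
    probs = softmax_policy([theta_true, 1.0 - theta_true])
    return [0 if rng.random() < probs[0] else 1 for _ in range(n_trials)]

def log_likelihood(theta, actions):
    # Log-probability of the observed action sequence under the
    # rational policy implied by candidate parameter theta.
    probs = softmax_policy([theta, 1.0 - theta])
    return sum(math.log(probs[a]) for a in actions)

def fit_theta(actions, grid_size=101):
    # Inverse step: pick the reward parameter whose rational policy
    # best explains the observed actions (grid-search MLE).
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return max(candidates, key=lambda th: log_likelihood(th, actions))

actions = simulate_agent(theta_true=0.7)
theta_hat = fit_theta(actions)
```

With enough trials, `theta_hat` lands close to the generating value 0.7, illustrating how subjective rewards can be inferred from behavior alone; the full IRC method extends this likelihood-maximization idea to latent beliefs and internal dynamics in a POMDP.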
Footnotes
Conceptual framework: XP, PS. Discrete control: ZW, XP, PS. Continuous control: SD, MK, XP, PS. Neural simulations: ZW. Neural analysis: ZW, MK, XP. Initial draft: XP, ZW, SD. Editing: XP, ZW, SD, PS. Funding acquisition: XP, PS.
† Unfortunately, the conventional notations of EM and reinforcement learning collide here, both using the letter Q: the EM auxiliary function is therefore written in a calligraphic font to distinguish it from the state-action value function Q of the MDP model.