ABSTRACT
When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Using a reinforcement learning task with a spatially structured reward function, we show that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.
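To make the two reinforcement signals contrasted above concrete, here is a minimal Python sketch. It is our own illustration, not the authors' SCEPTIC model: a delta-rule agent on a discretized one-dimensional option space with a spatially structured (Gaussian) reward, computing a local reward prediction error at each choice and tracking the location and prominence of the global value maximum. All names and parameter values (n_bins, alpha, the softmax temperature) are assumptions chosen for illustration.

```python
import numpy as np

n_bins = 50                      # discretization of the option space (assumed)
values = np.zeros(n_bins)        # learned value map over locations
alpha = 0.1                      # learning rate (assumed)
temperature = 0.2                # softmax temperature (assumed)
rng = np.random.default_rng(0)

def true_reward(bin_idx):
    """Spatially structured reward: nearby options yield similar payoffs."""
    center, width = 30, 8.0
    return np.exp(-((bin_idx - center) ** 2) / (2 * width ** 2)) \
        + 0.1 * rng.standard_normal()

for trial in range(200):
    # Softmax choice: mostly exploit high-value bins, sometimes explore.
    probs = np.exp(values / temperature)
    probs /= probs.sum()
    choice = rng.choice(n_bins, p=probs)

    reward = true_reward(choice)
    rpe = reward - values[choice]      # local reward prediction error
    values[choice] += alpha * rpe      # delta-rule update

    # Global value maximum: its location and prominence evolve as the
    # map is updated across trials.
    global_max_loc = int(np.argmax(values))
    global_max_val = values[global_max_loc]
```

In this toy setting, the prediction error is a phasic, trial-local quantity tied to the chosen location, whereas the value maximum is a global summary of the learned map that changes more slowly across episodes.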
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
In this revised version, we have: clarified the connections with prior studies of this paradigm; highlighted the advantages of our SCEPTIC model; acknowledged prior work on spatial generalization in reinforcement learning; added a discussion of random vs. directed exploration; clarified our definition of response time (RT) swings; performed additional whole-brain analyses with both model-derived regressors; tested the trial-level alternative model across the range of learning rates; provided additional information about the SCEPTIC model; performed new analyses ruling out a time-dependent shift account of the antero-posterior signal dissociation in the hippocampus; clarified the nature of exploratory choices on the task; and explicated what we meant by information compression.