## ABSTRACT

A cognitive map has long been the dominant metaphor for hippocampal function, embracing the idea that place cells encode a geometric representation of space. However, evidence for predictive coding, reward sensitivity, and policy dependence in place cells suggests that the representation is not purely spatial. We approach this puzzle from a reinforcement learning perspective: what kind of spatial representation is most useful for maximizing future reward? We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map. Furthermore, we argue that entorhinal grid cells encode a low-dimensional basis set for the predictive representation, useful for suppressing noise in predictions and extracting multiscale structure for hierarchical planning.

## Introduction

Learning to predict long-term reward is fundamental to the survival of many animals. Some species may go days, weeks or even months before attaining primary reward, during which time aversive states must be endured. Evidence suggests that the brain has evolved multiple solutions to this reinforcement learning (RL) problem^{1}. One solution is to learn a model or “cognitive map” of the environment^{2}, which can then be used to generate long-term reward predictions through simulation of future states^{1}. However, this solution is computationally intensive, especially in real-world environments where the space of future possibilities is virtually infinite. An alternative “model-free” solution is to learn, from trial and error, a value function mapping states to long-term reward predictions^{3}. However, dynamic environments can be problematic for this approach, because changes in the distribution of rewards necessitate complete relearning of the value function.

Here, we argue that the hippocampus supports a third solution: learning of a “predictive map” that represents each state in terms of its successor states^{4,5}. This representation is sufficient for long-term reward prediction, is learnable using a simple, biologically plausible algorithm, and explains a wealth of data from studies of the hippocampus.

Our primary focus is on understanding the computational function of hippocampal place cells, which respond selectively when an animal occupies a particular location in space^{6}. A classic and still influential view of place cells is that they collectively furnish an explicit map of space^{7,8}. This map can then be employed as the input to a model-based^{9–11} or model-free^{12,13} RL system for computing the value of the animal’s current state. In contrast, the predictive map theory views place cells as encoding predictions of future states, which can then be combined with reward predictions to compute values. This theory can account for why the firing of place cells is modulated by variables like obstacles, environment topology, and direction of travel. It also generalizes to hippocampal coding in non-spatial tasks. Beyond the hippocampus, we argue that entorhinal grid cells^{14}, which fire periodically over space, encode a low-dimensional decomposition of the predictive map, useful for stabilizing the map and discovering subgoals.

## Results

### The successor representation

We consider the problem of RL in a Markov decision process consisting of the following elements^{15}: a set of states (e.g., spatial locations), a set of actions, a transition distribution *P*(*s′*|*s,a*) specifying the probability of transitioning to state *s′* from state *s* after taking action *a*, a reward function *R*(*s*) specifying the expected immediate reward in state *s*, and a discount factor *γ* ∈ [0,1] that down-weights distal rewards. An agent chooses actions according to a policy *π*(*a|s*) and collects rewards as it moves through the state space. The value of a state is defined formally as the expected discounted cumulative future reward under policy *π*:
$$V(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\Big|\, s_0 = s\right],$$

where *s_t* is the state visited at time *t*. Our focus here is on policy evaluation (computing *V*). In our simulations we feed the agent the optimal policy; in the Supplemental Methods we discuss algorithms for policy improvement. To simplify notation, we assume implicit dependence on *π* and define the state transition matrix *T*, where

$$T(s, s') = \sum_{a} \pi(a|s)\, P(s'|s, a).$$
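The policy-averaged transition matrix can be assembled directly from this definition. The sketch below uses a hypothetical 3-state, 2-action MDP (all numbers are illustrative, not from our simulations):

```python
import numpy as np

# Hypothetical MDP: P[a, s, s2] = P(s2 | s, a)
P = np.array([
    [[1.0, 0.0, 0.0],   # action 0
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0],   # action 1
     [0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]],
])
# Hypothetical policy: pi[s, a] = pi(a | s)
pi = np.array([[0.5, 0.5],
               [0.2, 0.8],
               [1.0, 0.0]])

# T(s, s') = sum_a pi(a|s) P(s'|s, a)
T = np.einsum('sa,asp->sp', pi, P)
```

Each row of `T` is a probability distribution over successor states, so rows sum to one.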

The value function can be decomposed into the inner product of the reward function with a predictive state representation known as the successor representation (SR)^{4}, denoted by *M*:

$$V(s) = \sum_{s'} M(s, s')\, R(s').$$

The SR encodes the expected discounted future occupancy of state *s′* along a trajectory initiated in state *s*:

$$M(s, s') = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{I}(s_t = s') \,\Big|\, s_0 = s\right],$$

where 𝕀(·) = 1 if its argument is true, and 0 otherwise.
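Given *T*, both the SR and the value decomposition can be computed in a few lines. This is a toy sketch on a hypothetical 4-state chain under a random-walk policy:

```python
import numpy as np

# Random walk on a 4-state chain with reflecting ends (toy example).
T = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
gamma = 0.9

# SR as the convergent geometric series sum_t gamma^t T^t = (I - gamma*T)^(-1)
M = np.linalg.inv(np.eye(4) - gamma * T)

# Value decomposition: V(s) = sum_{s'} M(s, s') R(s')
R = np.array([0.0, 0.0, 0.0, 1.0])   # reward only in the rightmost state
V = M @ R
```

The resulting *M* satisfies the self-consistency relation *M* = *I* + *γTM*, and *V* increases monotonically toward the rewarded state.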

An estimate of the SR (denoted $\hat{M}$) can be incrementally updated using a form of the temporal difference learning algorithm^{4,16}. After observing a transition *s_t* → *s_{t+1}*, the estimate is updated according to:

$$\hat{M}(s_t, s') \leftarrow \hat{M}(s_t, s') + \eta \left[\, \mathbb{I}(s_t = s') + \gamma \hat{M}(s_{t+1}, s') - \hat{M}(s_t, s') \,\right],$$

where *η* is a learning rate (unless specified otherwise, *η* = 0.1 in our simulations). The form of this update is identical to the temporal difference learning rule for value functions^{15}, except that in this case the reward prediction error is replaced by a *successor prediction error* (the term in brackets). Note that these prediction errors are distinct from state prediction errors used to update an estimate of the transition function^{17}; the SR predicts not just the next state but a superposition of future states over a possibly infinite horizon. The transition and SR functions only coincide when *γ* = 0.
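The TD update can be sketched as follows, here for a random walk on a hypothetical 4-state ring. With a fixed learning rate the estimate fluctuates around the analytic SR, so the comparison at the end is deliberately loose:

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, eta = 4, 0.9, 0.1

# Random walk on a 4-state ring (hypothetical environment).
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

M_hat = np.zeros((n, n))
s = 0
for _ in range(20000):
    s_next = rng.choice(n, p=T[s])
    onehot = np.eye(n)[s]
    # Successor prediction error (the bracketed term in the text):
    delta = onehot + gamma * M_hat[s_next] - M_hat[s]
    M_hat[s] = M_hat[s] + eta * delta
    s = s_next

# Analytic SR for comparison.
M_true = np.linalg.inv(np.eye(n) - gamma * T)
```

After many transitions, each row of the estimate hovers near the corresponding row of the analytic SR (rows of both sum to 1/(1 − *γ*) = 10).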

The SR combines some of the advantages of model-free and model-based algorithms. Like model-free algorithms, policy evaluation with the SR is computationally efficient. However, factoring the value function into a state-dynamics (SR) term and a reward term confers some of the flexibility usually associated with model-based methods. Separate terms for state dynamics and reward permit rapid recomputation of new value functions when reward is changed independently of state dynamics, as demonstrated in Fig. 1. As the animal toggles between hunger and thirst, or when food is redistributed about the environment, the animal can immediately recompute a new value function based on its expected state transitions. A model-free agent would have to relearn value estimates for each location in order to make value predictions, while a model-based agent would need to aggregate the results of time-consuming searches through its model^{11,4}. Fig. S1 demonstrates this advantage: while changing the reward function completely disrupts model-free learning of a value function in a 2-step tree maze, SR learning can quickly adjust. Thus, the SR combines the efficiency of model-free control with some of the flexibility of model-based control.
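This revaluation flexibility can be sketched on a hypothetical 5-state track: once the SR is known, swapping the reward vector yields a new value function with a single matrix-vector product, with no relearning of state dynamics:

```python
import numpy as np

# Random walk on a 5-state track with reflecting ends (hypothetical).
n, gamma = 5, 0.95
T = np.zeros((n, n))
for s in range(n):
    T[s, max(s - 1, 0)] += 0.5
    T[s, min(s + 1, n - 1)] += 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)

# "Hunger": food at the right end of the track.
R_food = np.array([0., 0., 0., 0., 1.])
V_food = M @ R_food

# "Thirst": water at the left end. No relearning of M is needed;
# the new value function is one matrix-vector product away.
R_water = np.array([1., 0., 0., 0., 0.])
V_water = M @ R_water
```

Because the track (and the random-walk policy on it) is symmetric under reversal, the two value functions are mirror images of one another.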

For an agent trying to optimize expected discounted future reward, two states that predict similar successor states are necessarily similarly valuable, and can be safely grouped together^{18}. This makes the SR a good metric space for generalizing value. Since adjacent states will frequently lead to each other, the SR will naturally represent adjacent states similarly and therefore be smooth over time and space in spatial tasks. Since the SR is well defined for any Markov decision process, we can use the same architecture for many kinds of tasks, not just spatial ones.

### Hippocampal encoding of the successor representation

We now turn to our main theoretical claim: that the SR is encoded by the hippocampus. This hypothesis is based on the central role of the hippocampus in representing space and context^{19}, as well as its contribution to sequential decision making^{20,21}. Although the SR can be applied to arbitrary state spaces, we focus on spatial domains where states index locations.

Place cells in the hippocampus have traditionally been viewed as encoding an animal’s current location. In contrast, the predictive map theory views these cells as encoding an animal’s *future* locations. Crucially, an animal’s future locations depend on its policy, which is constrained by a variety of factors such as the environmental topology and the locations of rewards. We demonstrate that these factors shape place cell receptive field properties in a manner consistent with a predictive map.

To build some intuition for this idea, Fig. 2 illustrates the differences between our SR model (Fig. 2C) and two alternative models (Fig. 2A-B). As examples, we implement the three models for a 2D room containing an obstacle and for a 1D track with an established preferred direction of travel. The first alternative model is a Gaussian place field in which firing is related to the Euclidean distance from the field center (Fig. 2A), usually invoked for modelling place field activity in open spatial domains^{22,23}. The second alternative model is a topologically sensitive place field in which firing is related to the average path length from the field center, where paths cannot pass through obstacles^{13} (Fig. 2B). Like the topological place fields and unlike the Gaussian place fields, the SR place fields respect obstacles in the 2D environment. Since states on opposite sides of a barrier do not frequently occur nearby in time, SR place fields will tend to be active on only one side of a barrier.

On the 1D track, the SR place fields skew opposite the direction of travel. This backward skewing arises because upcoming states can be reliably predicted further in advance when the animal travels repeatedly in a particular direction. Neither control model provides a way for a directed behavioral policy to interact with the state representation, so neither can show this effect. Evidence for predictive skewing comes from experiments in which animals traveled repeatedly in a particular direction along a linear track^{24,25}. In Fig. 3, we explain how a future-oriented representation evokes a forward-skewed representation in the population at any given point in time, but implies that the receptive field of any individual cell should skew backwards: in order for a given cell to fire predictively, it must begin firing before its encoded state is visited, causing a backward-skewed receptive field. Figure 4 compares the predicted and experimentally observed backward skewing, demonstrating that the model captures the qualitative pattern of skewing observed when the animal has a directional bias.
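The backward skew can be reproduced in a few lines. The sketch below assumes a deterministic rightward policy on a hypothetical 10-state track with an absorbing final state; the receptive field of the SR “cell” encoding state *j* is the *j*-th column of *M*, read out as the animal’s position varies:

```python
import numpy as np

n, gamma = 10, 0.8

# Directed travel on a 1D track: the policy always steps right,
# and the final state is absorbing.
T = np.zeros((n, n))
for s in range(n - 1):
    T[s, s + 1] = 1.0
T[-1, -1] = 1.0

M = np.linalg.inv(np.eye(n) - gamma * T)

# Receptive field of the cell encoding state j, as a function of the
# animal's current position s: M[s, j] = gamma^(j - s) for s <= j.
j = 6
field = M[:, j]
```

The field peaks at *j* and decays over positions earlier on the track, i.e., it extends opposite the direction of travel, and vanishes once the animal has passed *j*.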

Consistent with the SR model, experiments have shown that place fields become distorted around barriers^{26–28}. In Figure 5, we explore the effect of placing obstacles in a Tolman detour maze on the SR place fields and compare to experimental results obtained by Alvernhe *et al.*^{28}. When a barrier is placed in a maze such that the animal is forced to take a detour, the place fields engage in “local remapping.” Place fields near the barrier change their firing fields significantly more than those further from the barrier (Fig. 5A-C). When barriers are inserted, SR place fields change their fields near the path blocked by the barrier and less so at more distal locations where the optimal policy is unaffected (Fig. 5D-F). This locality is imposed by the discount factor.

The SR model can be used to explain how hippocampal place field clustering depends on factors such as reward, punishment, and boundaries (Fig. 6). Under a random walk policy, the centers of mass of place fields near the walls of a rectangular room will retreat away from those walls (Fig. 6A). For any model of place cell firing, the centers of mass will be biased away from boundaries, since no firing can be recorded outside of the room^{26}. However, this bias should be more pronounced under the SR model, because successor states must also lie within the environment, shifting the fields asymmetrically away from walls. In fragmented, multi-compartment environments, this has the effect of causing place fields within the same compartment to cluster (Fig. 6B,C). We explore the implications of this clustering in more detail toward the end of this section, accompanied by the illustration in Fig. S2.

The SR model predicts that firing fields centered near rewarded locations will expand to include the surrounding locations and increase their firing rate under the optimal policy, as has been observed experimentally^{29,30}. The animal is likely to spend time in the vicinity of the reward, meaning that states with or near reward are likely to be common successors. SR place fields in and near the rewarded zone will cluster because it is likely that states near the reward were anticipated by other states near the reward (Fig. 6D,E). For place fields centered further from the reward, the model predicts that fields will skew opposite the direction of travel toward the reward, due to the effect illustrated in Fig. 3: a state will only be predicted when the animal is approaching the reward from some more distant state. Given a large potentially rewarded zone or a noisy policy, these somewhat opposing effects are sufficient to produce clustering of place fields near the rewarded zone, as has been observed experimentally in certain tasks^{31,32} (Fig. 6D,E). We predict that punished locations will induce the opposite effect, causing fields to spread away from the rarely visited punished locations (Fig. 6F).

In addition to the influence of experimental factors, changes in the parameters of the model will have systematic effects on the structure of SR place fields. Motivated by data showing a gradient of increasing field sizes along the hippocampal longitudinal axis^{33,34}, we explored the consequences of modifying the discount factor *γ* in Figure S2. Hosting a range of discount factors along the hippocampal longitudinal axis provides a multi-timescale representation of space. It also circumvents the problem of having to assume the same discount parameter for each problem or adaptively computing a new discount. Another consequence is that larger place fields reflect the community structure of the environment. A gradient of discount factors might therefore be useful for decision making at multiple levels of temporal abstraction^{18,35,36}.

An appealing property of the SR model is that it can be applied to non-spatial state spaces. Fig. 7A-D shows the SR embedding of an abstract state space used in a study by Schapiro and colleagues^{18,37}. Human subjects viewed sequences of fractals drawn from random walks on the graph while brain activity was measured using fMRI. We compared the similarity between SR vectors for pairs of states with pattern similarity in the hippocampus. The key experimental finding was that hippocampal pattern similarity mirrored the community structure of the graph: states with similar successors were represented similarly^{37}. The SR model recapitulates this finding, since states in the same community tend to be visited nearby in time, making them predictive of one another (Fig. 7E-G).
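The community-structure effect can be illustrated with a toy graph — two clusters joined by a single bridge edge, a simplified stand-in rather than the actual graph used in the study. Correlating rows of the SR shows higher pattern similarity within a community than between communities:

```python
import numpy as np

# Two fully connected 4-state "communities" joined by one edge (toy graph).
n = 8
A = np.zeros((n, n))
for grp in (range(0, 4), range(4, 8)):
    for i in grp:
        for j in grp:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1    # bridge between the communities

# Random-walk policy and the corresponding SR.
T = A / A.sum(axis=1, keepdims=True)
M = np.linalg.inv(np.eye(n) - 0.9 * T)

# "Pattern similarity" as the correlation between SR rows.
sim = np.corrcoef(M)
within = sim[0, 1]     # two states in the same community
between = sim[0, 5]    # states in different communities
```

States in the same community predict largely the same successors, so their SR rows are far more correlated than rows drawn from different communities.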

To demonstrate further how the SR model can integrate spatial and temporal coding in the hippocampus, we simulated results from a recent study^{38} in which subjects were asked to navigate among pairs of locations to retrieve associated objects in a virtual city (Fig. 8A). Since it was possible to “teleport” between certain location pairs, while others were joined only by long, winding paths, spatial Euclidean distance was decoupled from travel time. The authors found that objects associated with locations that were nearby in either space or time increased their hippocampal pattern similarity (Fig. 8B). Both factors (spatial and temporal distance) had a significant effect when the other was regressed out (Fig. 8C). The SR predicts this integrated representation of spatiotemporal distance: when a short path is introduced between distant states, such as by a teleportation hub, those states come to predict one another.

### Dimensionality reduction of the predictive map by entorhinal grid cells

Because the firing fields of entorhinal grid cells are spatially periodic, it was originally hypothesized that grid cells might represent a Euclidean spatial metric to enable dead reckoning^{8,14}. Other theories have suggested that these firing patterns might arise from a low-dimensional embedding of the hippocampal map^{5,23,39}. Combining this idea with the SR hypothesis, we argue that grid fields reflect a low-dimensional eigendecomposition of the SR. A key implication of this hypothesis is that grid cells will respond differently in environments with different boundary conditions.
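As a minimal illustration of why SR eigenvectors are spatially periodic, consider a random walk on a hypothetical 1D ring (a toy setting, not one of our simulated environments). The transition matrix is circulant, so the SR’s eigenvectors are sinusoids: fields that repeat at a fixed spatial period, the 1D analogue of a grid:

```python
import numpy as np

n, gamma = 40, 0.9

# Random walk on a ring of 40 states (hypothetical environment).
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

# The SR is symmetric here, so its eigenvectors are real and orthogonal.
M = np.linalg.inv(np.eye(n) - gamma * T)
evals, evecs = np.linalg.eigh(M)
order = np.argsort(evals)[::-1]          # sort by descending eigenvalue
evals, evecs = evals[order], evecs[:, order]

# evecs[:, 0] is constant; evecs[:, 1] and evecs[:, 2] form a sine/cosine
# pair with one full period around the ring -- a spatially periodic field.
```

The top eigenvalue equals 1/(1 − *γ*), and each subsequent degenerate pair of eigenvectors traces a sinusoid of constant amplitude around the ring.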

The boundary sensitivity of grid cells was recently highlighted by a study that manipulated boundary geometry^{40}. In square environments, different grid modules had the same alignment of the grid relative to the boundaries (modulo 60°, likely due to hexagonal symmetry in grid fields), whereas in a circular environment grid field alignment was more variable, with a qualitatively different pattern of alignment (Fig. 9A-C). Krupic *et al.* performed a “split-halves” analysis, in which they compared grid fields in square versus trapezoidal mazes, to examine the effect of breaking an axis of symmetry in the environment (Fig. 9D,E). They found that moving the animal to a trapezoidal environment, in which the left and right half of the environment had asymmetric boundaries, caused the grid parameters to be different on the two sides of the environment^{40}. In particular, the spatial autocorrelograms – which reveal the spatial displacements at which the grid field repeats itself – were relatively dissimilar over the two halves of the trapezoidal environment. The grid fields in the trapezoid could not be attributed to linearly warping the square grid field into a trapezoid, raising the question of how else boundaries could interact with grid fields.

According to the SR eigenvector model, these effects arise because the underlying statistics of the transition policy change with the geometry. We simulated grid fields in a variety of geometric environments used by Krupic and colleagues (Fig. 9F-H). In agreement with the empirical results, the orientations of eigenvectors in the circular environment tend to be highly variable, while those in square environments are almost always aligned to either the horizontal or vertical boundary of the square (Fig. 9G,J). The variability in the circular environment arises because the eigenvectors are subject to the rotational symmetry of the circular task space. SR eigenvectors also emulate the finding that grids on either side of a square maze are more similar than those on either side of a trapezoid, because the eigenvectors capture the effect of these irregular boundary conditions on transition dynamics.

Another main finding of Krupic et al.^{40} was that when a square environment is rotated, grids remain aligned to the boundaries as opposed to distal cues. SR eigenvectors inherently reproduce this effect, since a core assumption of the theory is that grid firing is anchored to the states of a transition structure, which is itself constrained by boundaries.

A different manifestation of boundary effects is the fragmentation of grid fields in a hairpin maze^{41}. Consistent with the empirical data, SR eigenvector fields tend to align with the arms of the maze, and frequently repeat across alternating arms (Figure 10)^{41}. While patterns at many timescales can be found in the eigenvector population, those at alternating intervals are most common and therefore replicate the checkerboard pattern observed in the experimental data (Fig. S8).

To further explore how compartmentalized environments could affect grid fields, we simulated a recent study^{42} that characterized how grid fields evolve over several days’ exposure to a multi-compartment environment (Fig. 11). While grid cells initially represented separate compartments with identical fields (repeated grids), several days of exploration caused fields to converge on a more globally coherent grid (Fig. 11D-F). With more experience, the grid regularity of the fields decreased, as did the similarity between the grid fields recorded in the two rooms (Fig. 11C). The authors conclude that grid cells tend toward a regular, globally coherent grid that serves as a Euclidean metric over the full expanse of the enclosure.

Our model suggests that the fields are tending not toward a globally *regular* grid, but toward a predictive map of the task structure, which is shaped in part by the global boundaries but also by the multi-compartment structure. We simulated this experiment by initializing grid fields to a local eigenvector model, in which the animal has not yet learned how the compartments fit together. After the SR eigenvectors have been learned, we relax the constraint that representations be the same in both rooms and let the eigenvectors and the SR be learned for the full environment. As the learned eigenvectors converge, they increasingly resemble a global grid and decreasingly match the predictions of the local fit (Fig. 11H-L; Fig. S10). As with the recorded grid cells, the similarity of the fields in the two rooms drops to an average value near zero (Fig. 11I). They also have less regular grids compared to a single-compartment rectangular enclosure, explaining the drop in grid regularity observed by Carpenter *et al.* as the grid fields became more “global”^{42}: the barriers separating compartments perturb the task topology away from that of an uninterrupted 2D enclosure.

A normative motivation for invoking low-dimensional projections as a principle for grid cells is that they can be used to smooth or “regularize” noisy updates of the SR. When the projection is based on an eigendecomposition, this constitutes a form of *spectral regularization*^{43}. For example, a smoothed version of the SR can be obtained by reconstructing the SR from its eigendecomposition using only low-frequency (high eigenvalue) components, thereby filtering out high-frequency noise (see Methods). Importantly, the regularization is topologically sensitive, meaning that smoothing respects boundaries of the environment. This property is not shared by regularization using a Fourier decomposition (Fig. S3). The regularization hypothesis is consistent with data suggesting that although grid cell input is not required for the emergence of place fields, place field stability and organization depends crucially on input from grid cells^{44–46}. These eigenvectors also provide a useful partitioning of the task space, as discussed in the following section.
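A sketch of spectral regularization on a hypothetical 1D track (the noise level and cutoff *k* are arbitrary choices for illustration): reconstructing a noisy SR estimate from only its top (high-eigenvalue, low-spatial-frequency) components filters the high-frequency noise and reduces reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma, k = 30, 0.9, 10

# Random walk on a 1D track with reflecting ends (toy stand-in for
# the 2D environments in the text).
T = np.zeros((n, n))
for s in range(n):
    T[s, max(s - 1, 0)] += 0.5
    T[s, min(s + 1, n - 1)] += 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)   # true SR (symmetric here)

# A noisy SR estimate, standing in for imperfect online updates.
noise = rng.standard_normal((n, n))
M_noisy = M + 0.5 * (noise + noise.T)

# Spectral regularization: reconstruct from the top-k eigencomponents only.
evals, evecs = np.linalg.eigh(M_noisy)
top = np.argsort(evals)[::-1][:k]
M_smooth = (evecs[:, top] * evals[top]) @ evecs[:, top].T

err_noisy = np.linalg.norm(M_noisy - M)
err_smooth = np.linalg.norm(M_smooth - M)
```

With this seed and cutoff, the truncated reconstruction is closer to the true SR than the raw noisy estimate; the improvement naturally depends on the noise level and the number of retained components.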

### Subgoal discovery using grid fields

In structured environments, planning can be made more efficient by decomposing the task into subgoals, but the discovery of good subgoals is an open problem. The SR eigenvectors can be used for subgoal discovery by identifying “bottleneck states” that bridge large, relatively isolated clusters of states, and by grouping together states that fall on opposite sides of the bottlenecks^{47,48}. Since these bottleneck states are likely to be traversed along many optimal trajectories, they are frequently convenient waypoints to visit. Navigational strategies that exploit bottleneck states as subgoals have been observed in human navigation^{49}. It is also worth noting that, accompanying the neural results displayed in Fig. 7, the authors found that when subjects were asked to parse sequences of stimuli into events, stimuli found at topological bottlenecks were frequent breakpoints^{18}.

The problem of identifying these bottlenecks is known formally as the *k*-way normalized min-cut problem. An approximate solution can be obtained using spectral graph theory^{50}. First, the top log *k* eigenvectors of a matrix known as the graph Laplacian are thresholded such that negative elements of each eigenvector go to zero and positive elements go to one. Edges that connect these two labeled groups of states are “cut” by the partition, and nodes adjacent to these edges act as bottleneck subgoals. The first subgoals that emerge will be the cut from the lowest-frequency eigenvector, and these subgoals will approximately lie between the two largest, most separable clusters in the partition (see Supplemental Methods for more detail). A prioritized sequence of subgoals is obtained by incorporating increasingly higher frequency eigenvectors that produce partition points nearer to the agent.
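The thresholding procedure can be sketched on a toy “two-room” graph — two clusters joined by a single bridge edge, a hypothetical example. The Fiedler vector of the graph Laplacian, thresholded at zero, recovers the two clusters, and the endpoints of the cut edge are the candidate bottleneck subgoals:

```python
import numpy as np

# Two fully connected 4-node clusters joined by one bridge edge.
n = 8
A = np.zeros((n, n))
for grp in (range(0, 4), range(4, 8)):
    for i in grp:
        for j in grp:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1   # the bottleneck edge

D = np.diag(A.sum(axis=1))
L = D - A               # graph Laplacian

evals, evecs = np.linalg.eigh(L)
fiedler = evecs[:, 1]   # eigenvector of the 2nd-smallest eigenvalue

# Threshold: positive entries -> one cluster, non-positive -> the other.
labels = (fiedler > 0).astype(int)

# "Cut" edges connect differently labeled nodes; their endpoints are
# candidate bottleneck subgoals.
cut = [(i, j) for i in range(n) for j in range(i + 1, n)
       if A[i, j] and labels[i] != labels[j]]
subgoals = sorted({v for e in cut for v in e})
```

Here the partition splits the two rooms exactly, and the subgoals are the two nodes flanking the bridge.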

The SR shares its eigenvectors with the graph Laplacian (see Supplemental Methods)^{5}, making SR eigenvectors equally suitable for this process of subgoal discovery. We show in Fig. S4 that the subgoals that emerge in a 2-step decision task and in a multicompartment environment tend to fall near doorways and decision points: natural subgoals for high-level planning. It is worth noting that SR matrices parameterized by larger discount factors *γ* will project predominantly onto the large-spatial-scale grid components. The relationship between more temporally diffuse, abstract SRs, in which states in the same room are all encoded similarly (Fig. S2), and the subgoals that join those clusters is therefore governed by which eigenvalues are large enough to consider.

The fact that large SR fields project predominantly onto eigenvectors with large spatial scales, whereas smaller SR fields project more strongly onto finer scale grid fields, is consistent with the smooth longitudinal gradient in connectivity between MEC and hippocampus^{34}. Hippocampal cells with larger place fields are more densely wired to the entorhinal cells with larger spatial scales in their grids, and vice versa. It has also been shown experimentally that entorhinal lesions impair performance on navigation tasks and disrupt the temporal ordering of sequential activations in hippocampus while leaving performance on location recognition tasks intact^{45,51}. This suggests a role of grid cells in spatial planning, and encourages us to speculate about a more general role for grid cells in hierarchical planning.

## Discussion

The hippocampus has long been thought to encode a cognitive map, but the precise nature of this map is elusive. The traditional view that the map is essentially spatial^{7,8} is not sufficient to explain some of the most striking aspects of hippocampal representation, such as the dependence of place fields on an animal’s behavioral policy and the environment’s topology. We argue instead that the map is essentially *predictive*, encoding expectations about an animal’s future state. This view resonates with earlier ideas about the predictive function of the hippocampus^{20,52–54}. Our main contribution is a formalization of this predictive function in a reinforcement learning framework, offering a new perspective on how the hippocampus supports adaptive behavior.

Our theory is connected to earlier work by Gustafson and Daw^{13} showing how topologically-sensitive spatial representations recapitulate many aspects of place cells and grid cells that are difficult to reconcile with a purely Euclidean representation of space. They also showed how encoding topological structure greatly aids reinforcement learning in complex spatial environments. Earlier work by Foster and colleagues^{12} also used place cells as features for RL, although the spatial representation did not explicitly encode topological structure. While these theoretical precedents highlight the importance of spatial representation, they leave open the deeper question of why particular representations are better than others. We showed that the SR naturally encodes topological structure in a format that enables efficient RL.

The work is also related to work by Dordek *et al.*^{23}, who demonstrated that gridlike activity patterns emerge from the principal components of the population activity of simulated Gaussian place cells. As we mentioned in the Results, one point of departure between empirically observed grid cell data and the SR eigenvector account is that in rectangular environments, SR eigenvector grid fields can have different spatial scales aligned to the horizontal and vertical axes (see Fig. S8)^{14}. In grid cells, the spatial scales tend to be approximately constant in all directions unless the environment changes^{55}. The principal components of Gaussian place field activity are mathematically related to the SR eigenvectors, and naturally also have grid fields that scale independently along the perpendicular boundaries of a rectangular room. However, Dordek *et al.* found that when the components were constrained to have non-negative values and the constraint that components be orthogonal was relaxed, the scaling became uniform in all directions and the lattices became more hexagonal^{23}. This suggests that the difference between SR eigenvectors and recorded grid cells is not fundamental to the idea that grid cells apply a spectral dimensionality reduction; rather, additional constraints such as non-negativity are required.

The SR can be viewed as occupying a middle ground between model-free and model-based learning. Model-free learning requires storing a look-up table of cached values estimated from the reward history^{1,56}. Should the reward structure of the environment change, the entire look-up table must be re-estimated. By decomposing the value function into a predictive representation and a reward representation, the SR allows an agent to flexibly recompute values when rewards change, without sacrificing the computational efficiency of model-free methods^{4}. Model-based learning is robust to changes in the reward structure, but requires inefficient algorithms like tree search to compute values^{1,15}.

Certain behaviors often attributed to a model-based system can be explained by a model in which predictions based on state dynamics and the reward function are learned separately. For instance, the *context preexposure facilitation effect* refers to the finding that contextual fear conditioning is acquired more rapidly if the animal has the chance to explore the environment for several minutes before the first shock^{57}. The facilitation effect is classically believed to arise from the development of a conjunctive representation of the context in the hippocampus, though areas outside the hippocampus may also develop a conjunctive representation in the absence of the hippocampus, albeit less efficiently^{58}. The SR provides a somewhat different interpretation: over the course of preexposure, the hippocampus develops a *predictive* representation of the context, such that subsequent learning is rapidly propagated across space. Figure S5 shows a simulation of this process and how it accounts for the facilitation effect.

Recent work has elucidated connections between models of episodic memory and the SR. Specifically, Gershman *et al.* demonstrated that the SR is closely related to the Temporal Context Model (TCM) of episodic memory^{16,19}. The core idea of TCM is that items are bound to their temporal context (a running average of recently experienced items), and the currently active temporal context is used to cue retrieval of other items, which in turn cause their temporal context to be retrieved. The SR can be seen as encoding a set of item-context associations. The connection to episodic memory is especially interesting given the crucial mnemonic role played by the hippocampus and entorhinal cortex in episodic memory. Howard and colleagues^{59} have laid out a detailed mapping between TCM and the medial temporal lobe (including entorhinal and hippocampal regions).

Spectral graph theory provides insight into the topological structure encoded by the SR. We showed specifically that eigenvectors of the SR can be used to discover a hierarchical decomposition of the environment for use in hierarchical RL. Mahadevan *et al.* demonstrated that the related Laplacian eigenvectors are useful as a representational basis for approximating value functions, dubbing these eigenvectors “proto-value functions”^{60}. Spectral analysis has frequently been invoked as a computational motivation for entorhinal grid cells (e.g., by Krupic and colleagues^{61}). The fact that any function can be reconstructed by sums of sinusoids suggests that the entorhinal cortex implements a kind of Fourier transform of space. However, Fourier analysis is not the right mathematical tool when dealing with spatial representations in a topologically structured environment, since we do not expect functions to be smooth over boundaries in the environment. This is precisely the purpose of spectral graph theory: instead of being maximally smooth over Euclidean space, the eigenvectors of the graph Laplacian embed the smoothest approximation of a function that respects the graph topology^{60}.

In conclusion, the SR provides a unifying framework for a wide range of observations about the hippocampus and entorhinal cortex. The multifaceted functions of these brain regions can be understood as serving a superordinate goal of prediction.

## Methods

### Task simulation

Environments were simulated by discretizing the plane into points, and connecting these points along a triangular lattice. The adjacency matrix *A* was constructed such that *A*(*i,j*) = 1 wherever it is possible to transition between states *i* and *j*, and 0 otherwise. The transition probability matrix *T* was defined so that *T*(*i,j*) is the probability of transitioning from state *i* to *j*. Under a random walk policy, the transition probability distribution is uniform over allowable transitions.
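A minimal sketch of this setup (using a 1D track in place of the triangular lattice, purely for brevity): the adjacency matrix *A* marks allowable transitions, and the random-walk transition matrix *T* is obtained by normalizing each row of *A*.

```python
import numpy as np

# States along a 1D track; neighboring points are connected.
n_states = 8
A = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    A[i, i + 1] = A[i + 1, i] = 1   # A(i,j) = 1 iff i -> j is possible

# Random-walk policy: uniform probability over allowable transitions,
# so each row of T is the corresponding row of A normalized to sum to 1.
T = A / A.sum(axis=1, keepdims=True)
```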

### SR computation

To solve for the successor representation, we used the convergence of the geometric sum of transition matrices *T*, discounted by *γ* ∈ [0,1]:

$$M = \sum_{t=0}^{\infty} \gamma^t T^t = (I - \gamma T)^{-1}$$
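In code, the closed form is a single matrix inversion; the truncated geometric sum converges to the same matrix. A sketch on a toy two-state chain (the chain and *γ* = 0.9 are illustrative):

```python
import numpy as np

gamma = 0.9
T = np.array([[0.5, 0.5],
              [0.5, 0.5]])

# Closed-form SR: M = (I - gamma*T)^(-1)
M = np.linalg.inv(np.eye(2) - gamma * T)

# The truncated geometric sum sum_t (gamma*T)^t approaches M.
M_sum = sum(np.linalg.matrix_power(gamma * T, t) for t in range(200))
```

Because *T* is row-stochastic, each row of *M* sums to 1/(1 − *γ*), the expected discounted future occupancy across all states.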

To simulate the long-term effect of rewards and punishments in the environment, the optimal policy was found using value iteration^{15}, and the SR was computed analytically under this policy. When demonstrating learning dynamics, the SR was estimated using TD learning (Fig. 11, Supp. Fig. 1, Supp. Fig. 3). Noise was injected into the location signal by adding zero-mean uniform random noise to the state indicator vector.
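The TD update for the SR treats each row of *M* like a vector of value functions, one per successor state: after a transition *s* → *s′*, the row *M*(*s*, ·) moves toward the target **1**_{s} + *γM*(*s′*, ·). A sketch on a two-state chain (learning rate, step count, and seed are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha, n_states = 0.9, 0.05, 2
T = np.array([[0.5, 0.5],
              [0.5, 0.5]])
M = np.eye(n_states)          # initialize the SR at the identity

s = 0
for _ in range(50000):
    s_next = rng.choice(n_states, p=T[s])   # sample a transition
    onehot = np.eye(n_states)[s]
    # TD error for each successor feature; update row s of M
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    s = s_next

# Analytic SR under the same policy, for comparison
M_true = np.linalg.inv(np.eye(n_states) - gamma * T)
```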

### Eigenvector computation and Spectral Regularization

For Figure 11, eigenvectors were computed incrementally using Candid Covariance-free Incremental PCA (CCIPCA), an algorithm that efficiently implements stochastic gradient descent to compute principal components^{62} (eigenvectors and principal components are equivalent in this and many domains). All eigenvectors were thresholded at zero because firing rates cannot be negative. Spectral regularization was implemented by reconstructing the SR from the truncated eigendecomposition (Fig. S3). Details can be found in the Supplemental Methods.
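A sketch of the spectral-regularization step: eigendecompose the SR, keep only the top-*k* components, and reconstruct. A ring of states is used here because its uniform degree makes *T* (and hence *M*) symmetric, keeping the linear algebra simple; the ring size, *γ*, and *k* are illustrative.

```python
import numpy as np

n, gamma, k = 20, 0.9, 5

# Random walk on a ring: every state has two neighbors, so T is symmetric.
T = np.zeros((n, n))
for i in range(n):
    T[i, (i - 1) % n] = T[i, (i + 1) % n] = 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)

# Truncated eigendecomposition: keep the k largest-eigenvalue components.
evals, evecs = np.linalg.eigh(M)          # ascending eigenvalue order
top = np.argsort(evals)[-k:]
M_reg = evecs[:, top] @ np.diag(evals[top]) @ evecs[:, top].T

# Eigenvector "grid fields" are thresholded at zero,
# since firing rates cannot be negative.
grid_fields = np.maximum(evecs, 0)
```

The low-rank reconstruction `M_reg` smooths out high-frequency components of the SR, which is what suppresses noise in the predictions.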

In generating the grid cells shown, we assume a random walk policy, which is the maximum-entropy prior over policies (see^{63} for why maximum-entropy priors can be good priors for regularization). However, since the learned eigenvectors are sensitive to the sampling statistics, our model predicts that more frequently visited regions of the task space would come to be over-represented in the grid space (see Figure S6 for examples).

### Quantifying place and grid fields

To quantify place field clustering, the center of mass (CoM) of each SR place field was computed by summing the locations of firing, weighted by the firing rate at that location (normalized so that the total firing summed to 1):

$$\mathrm{CoM}(s) = \sum_{s'} \hat{M}(s', s)\, \mathbf{p}(s'),$$

where $\hat{M}$ is the SR with each place field (column) normalized to sum to 1, and **p**(*s′*) is the (*X*, *Y*) coordinate of the place field centered at state *s′*.
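This computation can be sketched as follows, using a 1D track so that each state's coordinate is a scalar (the track and *γ* are illustrative choices):

```python
import numpy as np

n, gamma = 10, 0.9

# Random walk on a 1D track
T = np.zeros((n, n))
for i in range(n - 1):
    T[i, i + 1] = T[i + 1, i] = 1.0
T /= T.sum(axis=1, keepdims=True)
M = np.linalg.inv(np.eye(n) - gamma * T)

p = np.arange(n, dtype=float)              # coordinate of each state
M_hat = M / M.sum(axis=0, keepdims=True)   # normalize each place field
                                           # (column) to sum to 1
com = M_hat.T @ p                          # CoM of each state's place field
```

On this track, place fields near the boundaries have centers of mass pulled toward the interior, which is the clustering effect this measure is designed to detect.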

Grid field quantifications paralleled the analyses of Krupic *et al.*^{40}: an ellipse was fit to the six peaks closest to the central peak, and “orientation” refers to the orientation of the ellipse’s main axes (*a*, *b*). “Correlation” always refers to the Pearson correlation; “spatial correlation” refers to the Pearson correlation computed over points in space (as opposed to entries of a vector); and “spatial autocorrelation” refers to the 2D autocorrelation of a spatial firing map.

To measure similarity between the two halves of the environment in Figure 9, we (1) computed the spatial autocorrelation of each half, (2) selected a circular window at the center of each autocorrelation, and (3) computed the correlation between the windowed autocorrelations of the two halves. This parallels the analysis of Krupic *et al.*^{40} and provides a measure of grid similarity across the two halves of the environment. The circular window controls for the fact that the boundaries of the square and trapezoid in the two halves of the respective environments differ.
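A hedged sketch of this three-step measure (function names and the window radius are our own illustrative choices; the paper's exact preprocessing may differ):

```python
import numpy as np
from scipy.signal import correlate2d

def spatial_autocorrelation(rate_map):
    """2D spatial autocorrelation of a firing-rate map (mean-subtracted,
    normalized to a peak of 1 at zero lag)."""
    m = rate_map - rate_map.mean()
    ac = correlate2d(m, m, mode='full')
    return ac / ac.max()

def circular_window(shape, radius):
    """Boolean mask selecting a circular window at the map's center."""
    yy, xx = np.indices(shape)
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def grid_similarity(map_a, map_b, radius):
    """Pearson correlation between the central circular windows of the
    two autocorrelograms -- the across-halves similarity measure."""
    ac_a = spatial_autocorrelation(map_a)
    ac_b = spatial_autocorrelation(map_b)
    mask = circular_window(ac_a.shape, radius)
    return np.corrcoef(ac_a[mask], ac_b[mask])[0, 1]
```

By construction, identical rate maps yield a similarity of 1, and the circular mask discards the corners of the autocorrelogram where the differing arena boundaries would otherwise dominate.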

In evaluating our simulations of the grid fields reported by Carpenter *et al.*^{42} (Fig. 11), the local model consisted of the set of 2D Fourier components bounded by the size of the compartment, and the global model consisted of the set of 2D Fourier components bounded by the size of the environment. “Model fit” was measured for each eigenvector as the maximum, over all model components, of the correlation between the eigenvector and that component.
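A sketch of this model-fit measure (the component-generation scheme and frequency cutoff below are our own simplified assumptions; the paper's component sets are defined by compartment and environment size):

```python
import numpy as np

def fourier_components(shape, max_freq):
    """2D sine and cosine components up to max_freq cycles per axis,
    evaluated on a grid of the given shape (the DC term is excluded)."""
    h, w = shape
    yy, xx = np.indices(shape)
    comps = []
    for fy in range(max_freq + 1):
        for fx in range(max_freq + 1):
            if fy == fx == 0:
                continue
            phase = 2 * np.pi * (fy * yy / h + fx * xx / w)
            comps.append(np.cos(phase))
            comps.append(np.sin(phase))
    return comps

def model_fit(eigvec_map, comps):
    """Maximum absolute Pearson correlation between an eigenvector
    (reshaped as a 2D map) and any component in the model set."""
    return max(abs(np.corrcoef(eigvec_map.ravel(), c.ravel())[0, 1])
               for c in comps)
```

An eigenvector that is itself a single plane wave within the model's frequency range achieves a fit of 1.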

## Author contributions statement

All authors conceived the model and wrote the manuscript. Simulations were carried out by K.S.

## Additional information

The authors declare no competing financial interests.

## Acknowledgments

We are grateful to Tim Behrens, Ida Mommenejad, and Kevin Miller for helpful discussions, and to Alexander Mathis and Honi Sanders for comments on an earlier draft of the paper. This research was supported by the NSF Collaborative Research in Computational Neuroscience (CRCNS) Program Grant IIS-1207833 and The John Templeton Foundation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.