Abstract
Model-based decision making relies on the construction of an accurate representation of the underlying state-space, and localization of one’s current state within it. One way to localize is to recognize the state with which incoming sensory observations have been previously associated. Another is to update a previous state estimate given a known transition. In practice, both strategies are subject to uncertainty and must be balanced with respect to their relative confidences; robust learning requires aligning the predictions of both models over historic observations. Here, we propose a dual-systems account of the hippocampal-entorhinal system, where sensory prediction errors between these models during online exploration of state space initiate offline probabilistic inference. Offline inference computes a metric embedding on grid cells of an associative place graph encoded in the recurrent connections between place cells, achieved by message passing between cells representing non-local states. We provide testable explanations for coordinated place and grid cell ‘replay’ as efficient message passing, for partial rescaling and direction-dependent offsets in grid patterns as the confidence-weighted balancing of model priors, and for distortions of grid patterns as reflecting inhomogeneous sensory inputs across states.
Author Summary
Minimising prediction errors between transition and sensory input (observation) models predicts partial rescaling and direction-dependent offsets in grid cell firing patterns.
Inhomogeneous sensory inputs predict distortions of grid firing patterns during online localisation, and local changes of grid scale during offline inference.
Principled information propagation during offline inference predicts coordinated place and grid cell ‘replay’, where sequences propagate between structurally related features.
Introduction
Grid cells in the medial entorhinal cortex (mEC), whose firing fields form a periodic hexagonal lattice across the environment, are thought to support path integration1–3, whereas hippocampal place cells tend to have unimodal firing fields reflecting environmental cues such as boundaries4,5. Grid cell firing patterns are stable over time, suggesting corrective environmental inputs, possibly from place cells6, but rely more on self-motion than place cell firing patterns7 suggesting that environmental inputs are not fully corrective2,8,9. Given estimates of each input’s uncertainty, Bayesian inference tells us how they should be optimally combined.
Although online learning (i.e. using only currently available sensory information) can converge under low path integration (PI) and sensory noise, robust learning in the presence of noise requires minimizing the error between self-motion and environmental estimates of location across all state transitions10,11. Thus, historic observations must be stored and revisited offline (i.e. independently of current sensory inputs) to allow propagation of local environmental information to non-local but structurally connected regions of the cognitive map, e.g. as when adapting to a novel shortcut or barrier. This process can also be viewed as an embedding of sensory experience within a low-dimensional manifold (in this case, 2D space), as observed in place cells during sleep12.
Building on previous work11,13–15, we propose a dual-systems (online-offline) account of spatial inference in the hippocampal/entorhinal system, which we define as the process of identifying the configuration of both one’s own location (current state) and the location of environmental landmarks in space (c.f. ‘SLAM’10). In familiar environments, online localization (identification of one’s own position) is achieved by recursively combining self-motion and sensory inputs, which are mediated by learned transition and observation models, respectively. However, prediction errors between these models trigger offline inference over non-local states, facilitating fast learning of new or changed associative environmental structure, encoded online in place-place cell synaptic associations. We identify this offline inference with coordinated hippocampal/mEC ‘replay’16–20.
Our framework also provides algorithmic- and implementation-level explanations for observed features of grid cell firing in response to manipulations7,9,21–24 or inhomogeneity25–29 of environmental sensory input. Overall, these phenomena can be understood in a probabilistic framework, where minimization of prediction errors between the transition and observation models is traded against prior model beliefs.
Results
Probabilistic online localization with place and grid cells
Grid cells exist in ‘modules’ of cells, whose firing patterns have the same spatial scale and orientation relative to the environment, but differ in their spatial offsets30. The spatial scale increases in discrete steps along the dorso-ventral axis, suggesting that, across modules, grid cells (GCs) support a hierarchical representation of space21,24,31–33. Here, we consider a single module of GCs, whose activity represents a probability distribution over a periodic, discretized region of space (visualised as a topographically arranged sheet of cells; Fig. 1A).
The self-location distribution is maintained over time by recursively integrating sensory and self-motion inputs, accounting for their uncertainties (Fig. 1A). Firstly, the posterior distribution over agent location (grid module activity) from the previous time-step G is updated given noisy perceived movement û via the transition model T (see Methods):

G′(x) = Σx′ T(x | x′, αû) G(x′),   (Eq. 1)

where x is the 2D coordinate of the agent location in metric space (corresponding to a particular grid cell) and x′ the location at the previous time-step. Biophysically, T would be represented by a population of direction-dependent ‘shifter’ cells with asymmetric recurrent weights1 with a circulant structure34 (Fig. S1C, see Methods) learned a priori (but see Refs. 35–37). The rate of translation of activity on the grid sheet in response to movement û is controlled by the transition model gain α = [αx, 0; 0, αy], which might correspond to the strength of the associations to, or the speed dependence of, shifter or conjunctive cells31,38,39 (see Supplementary Methods).
The transition model estimate G′(x) is then refined by observations of environmental features, which map to metric space locations via observation model H:

G(x) ∝ G′(x) H(x | P),   (Eq. 2)

where P is a vector of place cell firing rates, the firing of place cell i representing the likelihood of the presence of a specific sensory feature, or combination thereof. In our simulations, these have unique locations in physical space μi and receptive field widths σi. Where the number of grid cells is large, the weights from place cell i to the grid module define a distribution for that feature’s estimated location in metric space (Fig. 1A). The weighted projection of place cell activity by these weights defines the observation model H. K provides inhibitory normalization (see Supplementary Methods).
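The predict–update cycle of Eqs. 1–2 can be sketched as a toy filter on a 1D periodic space (an illustrative construction of our own, not our simulation code; the circular convolution stands in for the circulant shifter-cell weights, and the Gaussian likelihood for the place-cell-driven observation model):

```python
import numpy as np

N = 100                      # discretised positions on the periodic grid sheet

def gaussian_on_ring(mu, sigma):
    x = np.arange(N)
    d = np.minimum(np.abs(x - mu), N - np.abs(x - mu))   # wrap-around distance
    g = np.exp(-d**2 / (2 * sigma**2))
    return g / g.sum()

def predict(G, u_hat, motion_sigma):
    """Transition model T: circularly shift G by u_hat, blurred by motion noise."""
    kernel = gaussian_on_ring(u_hat % N, motion_sigma)
    return np.real(np.fft.ifft(np.fft.fft(G) * np.fft.fft(kernel)))  # circular conv.

def update(G_pred, H_obs):
    """Observation model H: multiply by the sensory likelihood and renormalise."""
    post = G_pred * H_obs
    return post / post.sum()

rng = np.random.default_rng(0)
true_pos, G = 10, gaussian_on_ring(10, 2.0)
for _ in range(30):
    true_pos = (true_pos + 1) % N
    u_hat = round(1 + rng.normal(0, 0.5))    # noisy self-motion (PI noise)
    G = predict(G, u_hat, motion_sigma=2.0)
    H = gaussian_on_ring(true_pos, 3.0)      # place-cell-driven likelihood
    G = update(G, H)
print(int(np.argmax(G)), true_pos)
```

Despite per-step motion noise, the integrated posterior tracks the true position, mirroring the behaviour of the full model in Fig. 1B.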
Online learning modifies the observation model to reflect the current transition model location estimate (inducing synaptic changes in the place-grid cell connection weights via a BCM rule; Fig. 2A; Methods). Online learning produces stable grid patterns (due to the circulant structure of T) for a range of levels of PI and sensory noise, but convergence fails in higher noise regimes (Fig. S2B). After a short period of initial learning, stable grid patterns emerge in the integrated estimate, despite the pure PI estimate being too noisy and the sensory associations too immature to drive stable patterns, if operating independently (Fig. 1B).
Offline inference: The hippocampus as a probabilistic graph
Local, online learning is not robust in novel environments, because corrections to the estimated agent location (current grid cell activity), e.g. upon encountering familiar environmental features (place cell activity) associated to a different location on the grid module, also imply corrections to the encoding of feature observations along the preceding trajectory10. That is to say, local updates to the cognitive map also imply non-local, structurally associated changes. Formally, probabilistic spatial inference in this case requires finding the most likely configuration of metric space feature locations {bi}i=1:Np (each bi is the 2D coordinate of feature i in metric space, i.e. a place–grid cell association) and agent location x (the distribution over which is indicated by the grid cell firing rates) consistent with environmental sensory observations made along a given trajectory.
Theoretically, the configuration {bi}i=1:Np can be recovered purely from the distances between pairs of environmental features10 (Fig. 1F, “square”, “ring”). Importantly, despite both feature locations being susceptible to large absolute errors (due to noisy PI), the errors will be correlated such that pairwise distance measurements will decrease in variance with observations10. This method predicts characteristic failure modes when the pairwise distance information is ambiguous or incomplete (e.g., Fig. 1F; “broken ring”). New distance observations might also cause dramatic changes to the inferred configuration (e.g. the discovery of a shortcut). If the current absolute estimates of feature node (i.e. place cell) location are stored in the place-grid cell synaptic weights, we propose that the relative distances between pairs of features are stored in the recurrent weights between place cells in hippocampal region CA3.
Consider a spring network, where the edge between environmental feature nodes i and j represents a noisy pairwise observation with length reflecting pairwise distance and stiffness reflecting certainty13,40 (Fig. 1F, S6A). Minimizing the elastic energy in the spring mesh system corresponds to finding the maximum of the joint likelihood L(·), which is a function of the feature locations in metric space {bi}i=1:Np and the internal gain parameter α, given pairwise distance measurements with Gaussian noise (δij). This is equivalent to defining a probabilistic graphical model (see Methods) over the posterior:

L({bi}, α) ∝ p(α|α0) Πi Bi(bi) Π(i,j)∈E ψ(bi, bj, α),   (Eq. 3)

where the current PC-GC weights B (Bi is the probability distribution of the location of feature i and Bi(bi) is its value at metric/grid module location bi) act as priors on the feature node locations, and the pairwise potential terms ψ(·) penalize the difference between associative pairwise distance measurements δij, made directly in environmental stimulus space, and the distance between their candidate locations in metric space ‖bi − bj‖. E is the set of connected PCs (see Methods). Distance in metric space is also a function of the transition model gain (α), which has a Gaussian prior p(α|α0) (a larger |α| will decrease the metric space distance for all pairs; see Methods). Maximizing the likelihood (finding the minimum-energy state of the spring network) over all feature node pairs minimizes the total prediction error between associative and metric generative models of the world15 (Fig. S7).
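The spring-network reading can be made concrete by relaxing the elastic energy directly (a toy sketch of our own with a fully connected graph, unit stiffness and no gain parameter; our simulations use the full probabilistic machinery):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
true_b = rng.uniform(0, 10, size=(6, 2))        # ground-truth feature layout
edges = list(itertools.combinations(range(6), 2))
delta = {(i, j): np.linalg.norm(true_b[i] - true_b[j]) for i, j in edges}

# Noisy prior feature locations (the role played by the PC-GC weights).
b = true_b + rng.normal(0, 0.5, true_b.shape)

def energy(b):
    """Elastic energy: squared mismatch of metric and associative distances."""
    return sum((np.linalg.norm(b[i] - b[j]) - delta[i, j])**2 for i, j in edges)

e0 = energy(b)
for _ in range(4000):                           # gradient descent relaxes the mesh
    grad = np.zeros_like(b)
    for i, j in edges:
        v = b[i] - b[j]
        r = np.linalg.norm(v) + 1e-12
        g = 2 * (r - delta[i, j]) * v / r
        grad[i] += g
        grad[j] -= g
    b -= 0.01 * grad
print(e0, energy(b))
```

The relaxed configuration reproduces the pairwise structure up to a rigid motion, which in the full model is anchored by the spatial priors learned online.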
The associative distances can be straightforwardly learned during online exploration. Since Hebbian learning reflects coactivity, a trajectory exploring the environment uniformly results in synaptic strengths between place cells proportional to the spatial correlation between their receptive fields41. The Euclidean separation between their fields is then accessed via a simple transformation (see Methods; Fig. S1G). In this context, learning the PC-GC weights (modifying the observation model) during online localization corresponds to forming spatial priors over feature locations which anchor the structure, which would otherwise be translation or rotation invariant (since measurements are relative), learned during offline inference to constant locations on the grid-map. Taken together, our framework proposes a mapping onto anatomy of the joint agent-feature location distribution required for full probabilistic inference over environmental structure (Fig. S5; See overall algorithm in Table S1).
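The weight-to-distance transformation can be illustrated for Gaussian receptive fields (our own simplifying assumptions: uniform occupancy, equal field widths σ). The overlap of two fields separated by d decays as exp(−d²/4σ²), so d is recovered by a log transform of the normalised Hebbian weight:

```python
import numpy as np

sigma = 5.0
x = np.linspace(-100, 100, 4001)               # uniformly explored 1D track

def field(mu):
    return np.exp(-(x - mu)**2 / (2 * sigma**2))

def hebbian_weight(mu_i, mu_j):
    """Coactivity under uniform occupancy: overlap of the two fields."""
    return np.sum(field(mu_i) * field(mu_j))

w_max = hebbian_weight(0.0, 0.0)               # self-overlap, for normalisation

def weight_to_distance(w):
    """Invert w / w_max = exp(-d^2 / (4 sigma^2)) for the separation d."""
    return np.sqrt(-4 * sigma**2 * np.log(w / w_max))

for d_true in [3.0, 8.0, 15.0]:
    print(d_true, weight_to_distance(hebbian_weight(0.0, d_true)))
```

Note that the recovered distance depends on σ: broader fields yield stronger weights at a given separation, the property underlying the discriminability-dependent distortions discussed below.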
Partial grid pattern responses to environmental rescaling
Uniform rescaling of an environment will introduce a mismatch between the estimates of location from the transition and observation models (i.e. a ‘prediction error’). To minimize these prediction errors, the offline system can either modify the transition model gain to match the current environmental input (Fig. 2A, bottom), or modify the mapping from environmental inputs to metric space in the observation model (Fig. 2A, middle; see Methods). The degree to which either is modified should reflect their relative confidences, specified by a ‘transition confidence score’ (Tc, the ratio of confidence in the transition vs observation models; see Methods). Similarly, a ‘prior confidence score’ Pc specifies how much the system will tolerate persistent prediction errors; if Pc is large, optimization may favour preserving prior configurations, as opposed to alignment of the current transition and observation models (see Methods).
We modelled experiments in which the physical environment21,42 and perceived velocity through a visual virtual environment7 were re-scaled, such that self-motion and sensory inputs conflicted. In both experiments, the rescaling of the grid patterns was partial, i.e. less than the magnitude of the physical or virtual manipulation, and less than those of place fields.
Both manipulations can be simulated by introducing a visual gain parameter αVisual to the simulation of the environment (in both experiments it scales the amount of self-motion required to traverse the width of the perceived environment). Learned associative distances (δ) are also scaled by this parameter, reflecting its effect on the temporal overlap of place fields (see Methods Eq. 12). We simulated grid pattern rescaling responses over a range of transition confidence scores Tc. When confidence in the transition model is high (Tc → ∞), grid patterns in the real world are unchanged when plotted against physical movement (but are changed when plotted in visual VR coordinates; Fig. 2B, first column). The opposite is true when confidence in the observation model is high: grid patterns are unchanged relative to the apparent environment (Fig. 2B, last column).
However, for intermediate Tc values (i.e. balanced confidence in the transition and observation models), the model predicts partial rescaling of the grid pattern relative to the size of the manipulation (Fig. 2B, middle column), matching the observed grid patterns in both experiments7,21 (and a similar third experiment on a virtual linear track43; Fig. 2C).
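The qualitative Tc dependence can be captured by a one-parameter toy (our own construction, not the model's actual objective): fit a scalar gain α with prior α0 = 1 and prior precision Tc to associative distances rescaled by a visual gain s. The MAP gain, and hence the grid rescaling, always lies between no adaptation and full adaptation:

```python
import numpy as np

def map_gain(s, Tc, d=np.array([1.0, 2.0, 3.0, 4.0])):
    """MAP of the toy objective Tc*(alpha - 1)^2 + sum_k (alpha*d_k - s*d_k)^2.

    d holds illustrative pre-manipulation distances; setting the derivative to
    zero gives the closed form alpha* = (Tc + s*S) / (Tc + S) with S = sum d^2.
    """
    S = np.sum(d**2)
    return (Tc + s * S) / (Tc + S)

s = 0.7                                  # environment visually shrunk by 30%
for Tc in [0.0, 30.0, 1e9]:
    print(Tc, map_gain(s, Tc))           # full, partial, and no gain adaptation
```

At Tc = 0 the gain follows the manipulation completely, at Tc → ∞ not at all, and intermediate Tc produces the partial rescaling seen experimentally.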
Differential grid and place field responses to environmental reshaping
How does the offline system respond to more complex environmental deformations? When one wall of a familiar rectangular environment is rotated inwards by 45°22, place fields near the wall shifted almost fully while fields further away remained largely stationary, consistent with place fields preferentially reflecting local environmental inputs5. In contrast, grid fields shifted only partially near the manipulated wall. Using the observed place field shifts23, we simulated the response of grid cells to the same manipulation (Fig. 3; see Supplementary Methods). Shifted place fields induce a misalignment between the associative distances and the distance between their encodings in metric space. The place field shifts are local and non-uniform, and so misalignment cannot be corrected by a global change to the transition gain α. Indeed, α is not significantly modified during the optimization process, regardless of the Tc value. Instead, alignment between the transition and observation models is maximized by modifying the observation model, i.e. updating the locations of the place fields on the grid module.
If there were no confidence in the prior observation model (Pc → 0), it would be modified offline to match the transition model, leaving the grid pattern unperturbed by the environmental change (Fig. 3B,C, top row). At the other extreme, favouring prior beliefs over recent observations (i.e. pairwise distances encoded during the manipulation trial) would result in an unchanged observation model, and grid field shifts that exactly mirror corresponding place field shifts (Fig. 3B,C, bottom row). In this regime, there would be permanent misalignment of the transition and observation models during online localization, producing noisy grid patterns, as when simulating a related experiment where grid distortions were observed in trapezoid environments44 (Fig. S2D). Setting Pc to an intermediate value reproduces the experimentally observed partial shifting of grid fields (relative to the place fields23) when visualizing the structure encoded in the observation model (i.e. assuming low confidence in the transition model; Fig. 3B–D).
Direction dependent shifting of grid patterns during online localization
In addition to partial changes to grid scale in response to environmental rescaling, enduring misalignments between observation and transition models can result from strong model priors, which prevent complete adaptation of the transition model gain. These cause the transition estimate to consistently precede that of the observation model, in the current direction of travel (on a 1D track, Fig. 2E) during VR visual gain decrease trials or physical expansion of the environment.
In all three cases, the integrated estimate of location (Eq. 2) in the online model converges to a fixed distance ahead of the observation model estimate (in the direction of travel; Fig. 2E inset and 3D), causing the grid pattern in the real world to dynamically shift opposite the direction of travel, as observed experimentally7,9. Our model suggests that the offsets should be partial (smaller than implied by a hard-reset at the boundary) and not specifically require a recent boundary encounter (cf. Keinath et al.9). Dynamic shifting in the model will reduce with experience of the novel or manipulated environment, as model misalignment reduces, as observed experimentally45.
Online and offline perceptual warping in spatial representations
With increasing experience of an environment, grid firing patterns exhibit both local scale changes26 and global shear-like distortions29, the latter associated with 7.5-8° offsets of one of the grid axes29,44 to the walls of square environments. Both effects were present in our simulations and can be attributed distinctly to the offline map-learning and online localization components of our theoretical framework.
Firstly, we show that local changes to the grid scale26, which are positively correlated with behavioural occupancy (animals spend more time in the middle of the environment), arise from the offline process of map (PC-GC connections) learning. These mapping-induced distortions can be further subdivided into two mechanisms, both of which induce local scale changes by biasing the pairwise distances recovered from the Hebbian learned recurrent connections in CA3 (Fig. 1D,4; Fig. S7).
Firstly, relative behavioural under-sampling of the place fields near the boundaries of the environment (using occupancy statistics from26; Fig. 4A, bottom) leads to weaker PC-PC connections, and consequent overestimation of their pairwise distances, producing local scale changes (Fig. S7C).
Secondly, since Hebbian learned connection weights between place cells reflect the correlation in their firing, and therefore their statistical discriminability46, two place cells with broad receptive fields would develop a stronger connection than a pair with equal separation but narrower receptive fields (stronger connections correspond to shorter distances on the grid module, producing grid patterns with larger scales in the environment; Fig. 1D). Another recent study47 suggests that place fields are narrower near the edges of an environment, consistent with greater precision when driven by more proximal environmental features5 (Fig. 4A, top row). In our model, this produces weaker recurrent connections and a shrinking of the grid pattern at the edges of the environment following offline inference (Fig. 4D,E, top row).
Together, our results suggest that the cognitive ‘distance’ between two sensory features should be greater either when the absolute confidence in their spatial locations is greater (reflecting an increased statistical discriminability), or when those features are under-sampled relative to other features.
Although the two mechanisms act independently, their effect is the same: both i) relative under-sampling of the transition between two adjacent states and ii) reduced statistical discriminability between those states contribute to a weaker pairing of their representative place cells, resulting in greater separation between their encodings in metric space and a locally larger grid scale when ‘read out’ in the firing pattern (a locally larger perception of distance).
In contrast, global shear-like distortions29 and associated 7.5-8° offsets of one of the grid axes29,44 can be interpreted as localization induced distortions during online exploration. In Stensola et al.29, rats were introduced into the same corner of the box at the start of each trial; in Butler et al.25, shearing developed following the introduction of reward. In both experiments, shearing developed with increasing experience25,29. We hypothesized that these distortions reflect an increasing effect of non-uniform environmental inputs to the grid module, either reflecting their natural distribution25,29 or inhomogeneous behavioural sampling of environmental locations26.
In our simulations, given a learned map, biasing the strength of sensory inputs at specific locations (e.g. one/two corners) during online exploration reproduced several experimentally characterized global distortions by causing a bias in the decoding of location (i.e. salient locations contribute a larger ‘vote’; Fig. S3; see Supplementary Methods).
Probabilistic inference through HPC-mEC message passing
To this point we have discussed, from a functional perspective, how the brain might optimize its internal representations to reflect the uncertainty of sensory information. But how might the brain perform this optimization? In the above analyses of offline inference, we numerically computed the maximally likely feature locations on the grid module. However, the system must also track the uncertainty in these estimates, which would require updating the place-grid cell weights (including those with firing fields far from the agent location). An update of the full weight distributions is generally intractable when the state space is large.
Belief propagation48 is a technique for approximating this inference on graph structured data, and comprises two stages. First, a given feature node (i.e. a place cell) computes its location distribution (i.e. connections to the grid cells) Bi (bi) by multiplying its prior with messages received from its connected neighbours (Fig. 5C; see Methods for details). A message mi→j (bj) expresses neighbour node i’s belief of node j’s location, conditioned on its own distribution, and is dependent on the same pairwise potential terms ψ(·) in Eq. 3. The effect of a message is to favour distributions of nodes i and j which locate them at a radial distance equal to the associative distance δij; causing messages to be expressed as rings centred on the belief of the broadcasting node (Fig. 5C). Resolving a feature’s unique location then depends on aggregating messages from multiple neighbours (Fig. 5C). Computations are distributed, and importantly only require information that is local to each neuron.
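The ring-shaped messages and their aggregation can be sketched on a discretised 2D grid module (an illustrative toy of our own; the coordinates, distances and kernel width below are hypothetical choices):

```python
import numpy as np

L = 40
ys, xs = np.mgrid[0:L, 0:L].astype(float)      # discretised grid module

def message(center, delta, sigma=1.5):
    """m_{i->j}(b_j): the pairwise potential marginalised over a sharp belief
    that node i sits at `center` -- a ring of radius delta around it."""
    r = np.hypot(xs - center[0], ys - center[1])
    m = np.exp(-(r - delta)**2 / (2 * sigma**2))
    return m / m.sum()

# Node j truly sits at (x=25, y=20); three neighbours at (10,20), (25,5) and
# (25,35) each know their associative distance to j (15 apiece). A single
# ring leaves j's location ambiguous; the product of several resolves it.
belief_j = (message((10.0, 20.0), 15.0)
            * message((25.0, 5.0), 15.0)
            * message((25.0, 35.0), 15.0))
belief_j /= belief_j.sum()
row, col = np.unravel_index(int(np.argmax(belief_j)), belief_j.shape)
print(col, row)   # recovered (x, y) location of node j
```

With only the first two messages, two intersection points remain (at (25,20) and (10,5)); the third neighbour disambiguates, illustrating why resolving a feature's unique location requires aggregating messages from multiple neighbours.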
Each node in the graph iterates between updating its belief and broadcasting messages, converging when new messages cease to change the beliefs of their recipient nodes. As expected, the reduction in pairwise prediction error between associative distances and their corresponding distances in grid space (see Methods) over successive message iterations is accompanied by a sharpening of the distribution of each feature’s location on the grid module (see Supplementary Methods; Fig. S1H).
Offline inference triggered by prediction errors
How might the online and offline systems interact? If the online system is sufficient to localize within pre-learned, simple or slowly changing environments, non-local reactivations of place cells would be unnecessary. However, more complex offline inference is required under more demanding circumstances, or in novel or changing environments. We hypothesize that offline or ‘remote’ inference is triggered by prediction errors between location estimates from the transition and observation models, respectively (Fig. 1G), defined in our model as the Kullback-Leibler divergence between the location estimates of the two models (see Methods).
Prediction errors are large when the observation model prediction (weighted place cell input) is different and more sharply peaked than the transition model estimate (Fig. 1G; see Methods; prediction errors will not be generated in absence of incoming sensory information, as in darkness, when the observation model estimate is uncertain).
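This trigger can be illustrated numerically (a toy of our own; the precise divergence used in our model is defined in the Methods, and here we take the divergence from the transition estimate to the observation estimate, a direction which has the stated properties):

```python
import numpy as np

x = np.arange(100)

def gauss(mu, sigma):
    g = np.exp(-(x - mu)**2 / (2 * sigma**2))
    return g / g.sum()

def kl(p, q):
    """Discrete KL divergence D(p || q), with a small floor for stability."""
    eps = 1e-12
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

G_pred = gauss(50, 8.0)                        # broad transition (PI) estimate
sharp_conflict = kl(G_pred, gauss(65, 2.0))    # sharp, displaced observation
broad_uncertain = kl(G_pred, gauss(65, 20.0))  # uncertain observation (darkness)
print(sharp_conflict, broad_uncertain)
```

A sharply peaked, displaced observation estimate yields a large divergence, while a broad observation estimate (as in darkness) yields a small one, so no offline inference is triggered.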
To illustrate our dual-systems (online+offline) hypothesis (Fig. S7), we simulated an agent navigating around a novel circular track (the loop closure task; Fig. 5). Completion of the first lap produces positive prediction errors between the sharply peaked input from features learned at the beginning of the trial and the agent location estimate, which is uncertain given the accumulation of PI noise (Fig. 5B).
Decrease in structural error (the difference between the place field separations and their encoded separations on the grid cell sheet) following online+offline inference was markedly larger than following online learning alone (Fig. 5E). The inferential power of this ‘one-shot’ learning process derives from consideration of the full covariance structure of the feature locations (captured by the CA3 connection weights between place cells), compared to the purely local learning occurring online. The system was subsequently able to navigate with dramatically reduced error (Fig. 5Aiv), eliminating prediction errors on subsequent lap completions (Fig. 5B; Supplementary Video 1).
Coordinated grid-place cell replay as structured information propagation
The scheduling of updates in belief propagation is important because messages that do not change the beliefs of neighbours are redundant (Fig. 6A). We scheduled only the place cell whose belief had changed most to broadcast a new message on each cycle (Fig. 6A; see Methods). This max-update scheduling was more efficient than simple synchronous schemes, converging with fewer messages (Fig. 6C; see Ref. 49).
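The efficiency gain can be illustrated on a much simpler relaxation problem (our own toy, not belief propagation proper): nodes on a chain relax toward the average of their neighbours given fixed boundary values, either synchronously (every node, every round) or by always updating the node with the largest pending change, in the spirit of residual scheduling:

```python
import numpy as np

N, TOL = 12, 1e-4

def residuals(x):
    r = np.zeros(N)
    r[1:-1] = np.abs((x[:-2] + x[2:]) / 2 - x[1:-1])  # distance from local average
    return r

def synchronous():
    x, updates = np.zeros(N), 0
    x[-1] = 1.0                                  # fixed boundary 'observation'
    while residuals(x).max() > TOL:
        x[1:-1] = (x[:-2] + x[2:]) / 2           # every interior node updates
        updates += N - 2
    return x, updates

def max_update():
    x, updates = np.zeros(N), 0
    x[-1] = 1.0
    while residuals(x).max() > TOL:
        i = int(np.argmax(residuals(x)))         # schedule the most-changed node
        x[i] = (x[i - 1] + x[i + 1]) / 2
        updates += 1
    return x, updates

x_sync, n_sync = synchronous()
x_max, n_max = max_update()
print(n_sync, n_max)
```

Both schedules reach the same fixed point (a linear ramp between the boundary values), but the max-update schedule concentrates effort where information is still propagating and needs fewer individual updates.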
The sequences of place cells broadcasting messages during offline inference in the loop-closure simulation have significant structure (Fig. 6B). They tend to initially propagate backwards along the track from the animal’s current position, resembling the characteristic reverse hippocampal replay following reward17 (Fig. 6B), but also occasionally hop to new locations where remote sequences are initiated50 (Fig. 6B,F). These subsequent sequences showed an approximately equal distribution of forward/reverse sweeps (Fig. 6D; see Methods; Supplementary Video 1).
Thus, hippocampal ‘replay’ may reflect correction of local regions of the cognitive graph given new or ‘surprising’ information, as opposed to simple recapitulation of experience51. Sequences selectively affect place cells whose beliefs are structurally affected, and terminate when this is no longer the case, ‘hopping’ to remote regions. This leads to smooth sequences in un-converged graphs (novel environments) and more hoppy sequences with experience, where converged regions may be skipped (Fig. 6F). These ‘hops’ marked the separation of ‘replay’ events into distinct sub-sequences (see Methods). Multiple trajectories may also be played out in parallel (e.g. two trajectories alternating under max-scheduling; Fig. 6B, middle, grey shading).
A neural model of coordinated place cell – grid cell replay
How might belief propagation for offline inference be implemented in spikes fired by place and grid cells during replay? We propose a schematic model with a focus on function rather than biological detail (e.g. our ‘place cells’ combine the recurrent connections of CA3 with the connections to mEC of CA1). In the model, minimizing prediction errors between associative and metric generative models corresponds to synchronizing the propagation of activity through CA3 and mEC, respectively (Fig. 7; see Supplementary Methods; Supplementary Video 2). A ‘message’ is initiated by a place cell spike, which propagates in CA3 via the Hebbian recurrent connections that encode place field separations. In parallel, the same spike initiates activity at the corresponding location on the grid cell module, which then propagates on the grid sheet as a traveling wave, using the same circuitry as path integration in the online model and propagating at the same speed as spikes in CA3 (see Methods). Hebbian-like learning strengthens connections from place cells to grid cells which simultaneously receive input in CA3 and EC respectively (Fig. 7A), approximating the algorithmic message-passing implementation (Fig. 7B, C). Firing of the broadcasting place cell is triggered by changes in its synaptic weights to the grid cell population, reflecting correction of the observation model in response to prediction error with the transition model (see Supplementary Methods).
Discussion
Building on previous work11,13,15, we argue that the mEC-HPC system performs spatial inference in two distinct regimes. Given a known ‘cognitive map’ (mapping sensory information to metric space), probabilistic integration allows optimal estimation of current location by online combination of uncertainty-weighted self-motion and environmental observations provided by transition and observation models respectively (Fig. 1). Where these estimates deviate strongly, prediction errors (Fig. 1G) trigger offline inference events (Fig. 5B), which propagate local environmental input to remote but structurally associated states, producing coordinated (often sequential) reactivations in place and grid cells (Fig. 6, 7). The effect of offline inference is to produce a 2D embedding of the sensory information provided through the place cells, which may facilitate planning or generalization. Although not modelled here, back-projections from grid to place cells, reflecting the metric embedding of their place fields, might therefore also reduce uncertainty in place cells’ firing, producing increased spatial stability in their fields, as observed to occur during sleep12.
Partial rescaling of grid patterns7,21 and differential shifting of grid and place fields23 in response to manipulations of environment sensory input can be understood as joint optimization of transition and observation models, balancing model priors with new observations. Where prediction errors persist, direction dependent grid pattern shifts may emerge as a result of probabilistic integration of these conflicting cues7,9 (whereas boundary-dependent resetting9 produces larger shifts than experimentally observed and no rescaling; Fig. 2D, E).
We show that observed grid pattern distortions can be mechanistically linked to inhomogeneity in the sampling or neural representation of the environment25,26,29 (Figs. 4A, S3), which might be reflected in behaviour52. Thus variation in the confidence, sampling or discriminability of sensory states will produce local changes in grid scale, inducing non-Euclidean structure in the metric representation of space (Fig. 1D, 4B). Our model also shows that distortions appear gradually with experience29, as the learned mapping from sensory features to metric space (the observation model) becomes more confident relative to the estimate of location from path integration (the transition model). Given initial learning, online localization errors (Fig. S3) should occur immediately following subsequent manipulations to the environmental sensory input, whereas offline changes may occur over longer timescales and correlate with replay of the manipulated states (Figs. 4,6; consistent with grid, but not place fields reorganizing significantly during sleep53). However, although large prediction errors will cause more easily detectable offline inference events, offline learning may occur continuously and not necessarily reactivate distinct previously experienced spatial trajectories51. We note that strong associative connectivity may also contribute to pattern completion, making the place cell representation robust to cue removal54.
Theoretical studies have demonstrated how the connectivity of the mEC metric space might emerge from a low-dimensional embedding of sensory stimuli55, predictive states35 or from unsupervised learning during navigational tasks56. A crucial difference in our model is that perceptually similar but physically separated compartments will be represented distinctly57, reflecting the vectorial translation between them in the transition model (i.e., not simply reflecting the topological state transition structure35). Another recent model showed that grid-cell-like responses can emerge from learning the transition model that best predicts observed sensory stimuli36. We instead assume a fixed transition structure but with a variable linear gain, consistent with continuous attractor models31 in which translation of activity on the grid cell sheet is driven by cells with velocity-dependent firing rates38,39. Indeed, a recent study showed that velocity dependence in mEC firing is tied to environmental manipulations39.
We propose that offline structural inference events correspond to coordinated HPC/mEC replay16–19,58–60, which can be viewed as synchronizing predictions from associative (CA3) and metric (mEC) generative models (Fig. S7). In this way, structural changes to an environment can be propagated to non-local regions of the metric embedding, in contrast to models in which these states need to be physically revisited13, consistent with the observation that replays do not necessarily repeat experienced trajectories51. Prediction errors between the two models may trigger replay events and corresponding sharp-wave ripples61,62. To our knowledge, this is the first functional model of coordinated place cell-grid cell replay18 (although cf. Ref. 63), and provides an alternative to reward-based theories64,65 (we note that rewards may themselves represent salient sensory features, independent of their reward value).
Our model makes a number of experimentally testable predictions. Firstly, systematic manipulation of the discriminability of sensory cues distributed within an environment should produce predictable distortions to the grid pattern, observed with increasing experience of an environment. Secondly, replay should be more frequent after structural changes such as shortcuts, blockages or gain manipulations, as in the experimental setup of Fig. 5. Thirdly, replay events triggered by specific unexpected sensory observations should become less frequent (Fig. 5B) and smoother (Fig. 6E, F) with continued experience, if the observations remain stable. Fourthly, multiple local replay events may occur in interleaved fashion (Fig. 6C, middle, grey shading). Fifthly, we predict the existence of travelling waves in grid cells (as a function of their spatial phase; see also66,67, Fig. 7). Lastly, initial messages propagating from the animal's current location may not cause subsequent messages in remote regions of the graph which are already sufficiently converged (messages will not change the beliefs of their recipients), although activity in mEC will continue to propagate. Grid cell replay could thus be detectable in the absence of simultaneous place cell replay19 (but place cell replay requires the grid cell transition model and so depends on mEC68).
Our proposed structure learning framework can account for diverse phenomena observed in the HPC-mEC system, and makes several novel, experimentally testable predictions.
Methods
Online recursive Bayesian estimation
The transition matrix T defines the probability of transitioning from agent location x′ to location x, and is a function of the perceived current velocity û and the transition model gain α = [αx, 0; 0, αy]. Since our metric space is periodic, T accounts for cyclic transitions cmn, with Gaussian noise proportional to the perceived velocity, x ~ x′ + û + N(0, diag(û)ΣPI):

T(x | x′, û) = ∑m,n f(x | x′ + û + cmn, diag(û)ΣPI)

where f(x|μ, ∑) is a multivariate Gaussian PDF, cmn = 2α(mv1 + nv2), v1 and v2 define the unit vectors of a hexagonal lattice69 with grid pattern orientation ϕ, and diag(·) produces a diagonal matrix from a vector input. Since most of the mass is associated with shorter transitions, in practice we approximate the full distribution with a finite number of periodic summations (i.e. ignore the tails; 5 cycles in our simulations; Fig. S1B).
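The truncated periodic transition density can be sketched as a finite sum of Gaussians over cyclic lattice offsets. This is an illustrative sketch consistent with our reading of the model (variable names and the exact truncation are ours):

```python
import numpy as np

def transition_density(x, x_prev, u_hat, alpha, sigma_pi, phi=0.0, n_cycles=5):
    """Approximate periodic transition density T(x | x', u) as a finite sum of
    Gaussians over cyclic offsets c_mn of a hexagonal lattice (truncated at
    n_cycles, ignoring the tails). Illustrative sketch."""
    # Unit vectors of a hexagonal lattice with orientation phi
    v1 = np.array([np.cos(phi), np.sin(phi)])
    v2 = np.array([np.cos(phi + np.pi / 3), np.sin(phi + np.pi / 3)])
    # Velocity-proportional Gaussian noise: covariance diag(u) * Sigma_PI
    cov = np.diag(np.abs(u_hat)) * sigma_pi + 1e-12 * np.eye(2)
    inv_cov = np.linalg.inv(cov)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    density = 0.0
    for m in range(-n_cycles, n_cycles + 1):
        for n in range(-n_cycles, n_cycles + 1):
            c_mn = 2 * alpha @ (m * v1 + n * v2)   # cyclic lattice offset
            d = x - (x_prev + u_hat + c_mn)        # deviation from predicted mean
            density += norm * np.exp(-0.5 * d @ inv_cov @ d)
    return density
```

Shifting the query location by one full lattice cycle leaves the density (numerically) unchanged, reflecting the periodicity of the metric space.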
The observation model defines the likelihood of the current environmental sensory inputs (i.e. the population vector of place cell firing P, where Pi is the firing rate distribution of place cell i over physical space) given the predicted metric location x, via a thresholded weighted sum: H(P|x) = [∑i Bi(x)Pi]+. Here, Bi(x) is the location distribution of landmark i in metric space, which would be encoded biophysically in the learned [NP × NG] matrix B of synaptic weights from place to grid cells (i.e. the gth row and ith column of B is the distribution Bi(·) evaluated at the location of the gth grid cell). The normalization constant K = ∫ G(x)dx in Eq. 2 simply sums over the current grid cell activity and might biophysically be implemented by inhibitory interneurons.
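The thresholded weighted sum and its normalization can be sketched as follows (a minimal sketch; the matrix orientation and uniform fallback for an all-zero likelihood are our assumptions):

```python
import numpy as np

def observation_likelihood(P, B):
    """Thresholded weighted sum H(P|x) = [sum_i B_i(x) P_i]_+, normalised
    over grid locations. B is the learned [N_P x N_G] place-to-grid weight
    matrix, P the place cell population vector. Illustrative sketch."""
    H = np.maximum(B.T @ P, 0.0)   # [N_G] unnormalised likelihood over grid space
    K = H.sum()                    # normalisation, ~ inhibitory interneurons
    return H / K if K > 0 else np.full(H.shape, 1.0 / H.size)
```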
Online learning of structural priors
In the online model, the place-grid cell weight matrix B is learned using a BCM rule with learning rate τPG = 1e−4, where G′ and H are column vectors whose elements are the transition and observation models estimated at the locations of a finite set of grid cells, ⊙ is the element-wise (Hadamard) product between two vectors and ⊗ is their outer product. The sliding threshold θ ∈ R1×NG provides adaptive synaptic normalization, where τθ ≈ 10τPG. Learning takes place between the a priori distribution G′ and the current sensory observation H (i.e. before the observation correction to G′).
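A generic BCM step of the kind described can be sketched as follows. Note this is a sketch of the BCM rule family (presynaptic drive times a postsynaptic factor y(y − θ), with a sliding threshold tracking y²), not the authors' exact matrix formulation:

```python
import numpy as np

def bcm_step(w, theta, x_pre, y_post, tau_w=1e-4, tau_theta=1e-3):
    """One BCM plasticity step: the weight change is presynaptic activity
    times the postsynaptic factor y(y - theta); the sliding threshold theta
    drifts toward y^2, providing adaptive normalisation (tau_theta is ~10x
    the weight learning rate, as in the paper). Illustrative sketch."""
    dw = tau_w * np.outer(x_pre, y_post * (y_post - theta))
    theta = theta + tau_theta * (y_post**2 - theta)
    return w + dw, theta
```

When postsynaptic activity exceeds the threshold, active synapses potentiate; below it, they depress, keeping the weights bounded without explicit normalization.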
The offline probabilistic graphical model
The pairwise potentials penalize differences between the pairwise distances encoded by association δij, and those that would be computed by comparing their absolute encodings in metric space dmn(·); i.e. they encourage a metric embedding that reflects the associative distance. The metric distance function defines the pairwise distance between the encodings of locations i and j in mEC metric space, and depends on the gain factor α of the transition model. Pairwise measurements are assumed to have confidence wij (inverse variance) that increases with decreasing inferred distance (i.e. wij = 1/(σPC + σPIδij)). The transition model gain is assumed to have a Gaussian prior p(α) = exp(–wα(α — α0)T(α – α0)), the wα term representing the confidence in the prior gain value α0. The periodic offset term cmn is the same as defined for the transition model.
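A single pairwise potential of this form can be sketched as below. For clarity this sketch uses the plain gain-scaled Euclidean distance and omits the sum over periodic offsets cmn (function names are ours):

```python
import numpy as np

def pairwise_potential(xi, xj, delta_ij, alpha, sigma_pc, sigma_pi):
    """Pairwise potential penalising mismatch between the associatively
    encoded distance delta_ij and the metric distance between encodings
    x_i, x_j. Confidence w_ij = 1/(sigma_PC + sigma_PI * delta_ij) falls
    with inferred distance. Illustrative sketch (no periodic offsets)."""
    d_metric = np.linalg.norm(alpha @ (xi - xj))    # gain-dependent metric distance
    w_ij = 1.0 / (sigma_pc + sigma_pi * delta_ij)   # distance-dependent confidence
    return np.exp(-w_ij * (d_metric - delta_ij) ** 2)
```

The potential is maximal when the metric embedding exactly reproduces the associative distance, and decays with mismatch at a rate set by the confidence.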
Associative encoding in the hippocampus
The associative distances are recovered from the [NP × NP] synaptic weights A in CA3. Under a random-walk behavioural trajectory, a simple modified Hebbian learning rule (with learning rate τPP, where ⊙ is the element-wise Hadamard product) causes the synaptic weights to converge to the square root of the correlation between the firing rates of two PCs70. Where place fields have uniform receptive field widths (σPC) and peak firing rates, the Euclidean distance between place fields i and j can be inferred via the simple transformation46 δij = √(−8σPC² ln Aij).
The recovered distance is therefore scaled by the receptive fields' variance (the Bhattacharyya distance46), and so relates to 'discriminability' (Fig. 1D). CA3 synapses effectively average over multiple pairwise measurements. By assuming that noise in the pairwise distance measurements scales linearly with distance, both the mean and variance of the Gaussian describing this distribution are efficiently encoded in a single PC-PC synapse.
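The distance recovery can be sketched as a round trip: assuming Gaussian fields of equal width and weights converging to the square root of the firing-rate correlation, Aij = exp(−δ²/(8σ²)), the log transform recovers δ exactly (a sketch under those assumptions, with the formula as we read it):

```python
import numpy as np

def recover_distance(A_ij, sigma_pc):
    """Recover the Euclidean distance between place fields i and j from a
    CA3 synaptic weight, assuming Gaussian fields of equal width sigma_pc
    and weights A_ij = exp(-delta^2 / (8 sigma^2)) (the square root of the
    firing-rate correlation). Illustrative sketch."""
    return np.sqrt(-8.0 * sigma_pc**2 * np.log(A_ij))
```

Note the distance scale is set entirely by σPC, consistent with the recovered distance reflecting discriminability rather than absolute metric extent.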
Simplified analysis of the probabilistic graphical model likelihood
To characterize model predictions in the environmental rescaling71 and gain change7 experiments, we studied a reduced version of the full graphical model (Eq. 3; see Supplementary Methods for full derivation). In 1D, given a linear observation model x = H(x′) = K1 + Kx′ and a large number of evenly spaced place fields, Eq. 3 simplifies to a form that can be solved analytically (see Supplementary Methods). A similar reduction was applied in the 2D case when considering differential shifts in grid and place fields23.
Belief propagation for offline inference
Belief propagation48 is an iterative, two-stage local message-passing scheme in which, at each iteration n, a feature node (i.e. a place cell) first updates its own belief (its connections to the grid cells) by integrating messages from connected nodes j ∈ Γi with its own prior belief (Fig. 5C):

bi(x) ∝ φi(x) ∏j∈Γi mj→i(x)
The message from node j to node i (mj→i) communicates its belief over the distribution of the location of place cell i in grid cell space, conditioned on its own location distribution:

mj→i(xi) = ∫ ψij(xi, xj) bj(xj) / mi→j(xj) dxj

where the pairwise potentials ψ(·) are the same as those described in the full likelihood function (Eq. 3). The graph converges when new messages cease to change the beliefs of their recipient nodes.
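The two-stage scheme can be sketched as standard sum-product belief propagation over discretised locations on a small graph. This sketch uses a single shared pairwise potential and synchronous updates for brevity (names and simplifications are ours):

```python
import numpy as np

def propagate_beliefs(phi, psi, edges, n_iters=20):
    """Sum-product belief propagation over discretised locations (a sketch).
    phi: [N_nodes x N_states] local evidence; psi: [N_states x N_states]
    pairwise potential shared by all edges for simplicity; edges: list of
    undirected (i, j) pairs. Returns normalised beliefs per node."""
    msgs = {(i, j): np.ones(phi.shape[1])
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(n_iters):
        new = {}
        for (j, i) in msgs:
            # Product of evidence at j and messages into j from all neighbours but i
            prod = phi[j].copy()
            for (k, jj) in msgs:
                if jj == j and k != i:
                    prod = prod * msgs[(k, j)]
            m = psi.T @ prod            # marginalise over the sender's states
            new[(j, i)] = m / m.sum()
        msgs = new                      # synchronous broadcast
    beliefs = phi.copy()
    for (j, i) in msgs:
        beliefs[i] = beliefs[i] * msgs[(j, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a chain, strong evidence injected at one end propagates through intermediate nodes to bias remote beliefs, mirroring how local environmental input reaches structurally associated states.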
Scheduled message passing on the place cell graph
'Synchronous' belief propagation computes belief updates for every node at each step before broadcasting all new messages in the next step. In simulations, we demonstrated that scheduling message broadcasts based on internal 'message tension' (the divergence between previous and updated beliefs given new messages) produced faster and more accurate convergence (Fig. 6B; see also Elidan et al.49). Message tension between the node's previous and updated beliefs is defined as:

Ti = DJS(bi_new ‖ bi_old)

where DJS is the Jensen-Shannon (symmetric K-L) divergence:

DJS(P ‖ Q) = ½DKL(P ‖ M) + ½DKL(Q ‖ M), with M = ½(P + Q).
When the message tension is below a pre-defined threshold, a node has converged and ceases to broadcast new messages. This mechanism is similar to the prediction error between transition and observation models used to trigger offline inference, with the exception that it uses the symmetric Jensen-Shannon divergence rather than the asymmetric K-L divergence.
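The tension computation and the broadcast criterion can be sketched directly (the threshold value and function names are illustrative):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon (symmetric K-L) divergence between discrete beliefs."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_broadcast(belief_new, belief_old, threshold=1e-3):
    """Tension-based schedule (a sketch): a node broadcasts new messages only
    while the divergence between its updated and previous belief exceeds a
    pre-defined threshold; below it, the node is treated as converged."""
    return js_divergence(belief_new, belief_old) > threshold
```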
Traveling waves in neural media
In simulations, the traveling waves in mEC are simulated explicitly by calculating the true messages conditioned on the sending nodes' current beliefs at each time-step (Eq. 5B). In the 'neural model' (Fig. 7), messages were approximated as waves propagating radially from an initial stimulation on the mEC sheet using a modified mechanical wave model, as used to describe oscillations in water:

∂²G/∂t² = c²[∇′G]+

where c is the speed of wave propagation, [·]+ is a threshold-linear activation function, and the modified spatial Laplacian operator ∇′ is a symmetric 2D Gaussian filter with variance equal to the PI noise (see Supplementary Methods for extended discussion).
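A discretised step of such a wave model can be sketched as below. We read ∇′ as a Gaussian-filtered spatial operator, implemented here as the difference between the blurred field and the field itself; this reading, the explicit Euler scheme, and all parameter values are our assumptions:

```python
import numpy as np

def gaussian_blur(field, sigma):
    """Separable 2D Gaussian filter (stands in for the modified operator)."""
    r = int(3 * sigma) + 1
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, field)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, out)

def wave_step(G, G_prev, c=1.0, dt=0.1, sigma=1.0):
    """One explicit time-step of the modified wave equation
    d^2 G / dt^2 = c^2 [nabla' G]_+ , with nabla' approximated as
    blur(G) - G and [.]_+ threshold-linear. Illustrative sketch."""
    lap = gaussian_blur(G, sigma) - G
    accel = c**2 * np.maximum(lap, 0.0)
    return 2 * G - G_prev + dt**2 * accel, G
```

Starting from a localized bump, the threshold-linear forcing drives an expanding front of activity, qualitatively matching radial propagation from an initial stimulation on the sheet.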
Acknowledgements
We thank the groups of Caswell Barry and Francesca Cacucci for use of their data and Martin Pearson (UWE and Bristol Robotics Lab), Dan Bush, Andrej Bicanski and Tim Behrens for valuable theoretical discussions.
We acknowledge funding by the ERC Advanced grant NEUROMEM, the Wellcome Trust and the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 785907 Human Brain Project SGA2. The authors declare no competing financial interests.
Footnotes
http://www.talfanevans.co.uk/Papers/evans_burgess_2020_bioArxiv/Video_1_Loop_closure.mp4
http://www.talfanevans.co.uk/Papers/evans_burgess_2020_bioArxiv/Video_2_Neural_mechanism.mp4