## Abstract

When animals explore spatial environments, their representations often fragment into multiple maps. What determines these map fragmentations, and can we predict where they will occur with simple principles? We pose the problem of fragmentation of an environment as one of (online) spatial clustering. Taking inspiration from the notion of a *contiguous region* in robotics, we develop a theory in which fragmentation decisions are driven by surprisal. When this criterion is implemented with boundary, grid, and place cells in various environments, it produces map fragmentations from the first exploration of each space. Augmented with a long-term spatial memory and a rule similar to the distance-dependent Chinese Restaurant Process for selecting among relevant memories, the theory predicts the reuse of map fragments in environments with repeating substructures. Our model provides a simple rule for generating spatial state abstractions and predicts map fragmentations observed in electrophysiological recordings. It further predicts that there should be “fragmentation decision” or “fracture” cells, which in multicompartment environments could be called “doorway” cells. Finally, we show that the resulting abstractions can lead to large (orders of magnitude) improvements in the ability to plan and navigate through complex environments.

## Introduction

Contextual reorientation [1] and reanchoring, in which behavior, state estimates, or meaning are suddenly reevaluated based on new contextual information from the world, are universal phenomena in psychology. One famous set of examples is the parsing of garden-path sentences such as “Time flies like an arrow, fruit flies like a banana” or “The woman brought the sandwich from the kitchen tripped” [2]. In the latter there is a sudden reorientation upon hearing the word tripped, so that *the woman* becomes the person *who was* brought the sandwich rather than the person bringing the sandwich. Similarly, spatial reorientation and reanchoring can occur when entering a building lobby from the outside or entering a different-looking room from another one. Such reanchoring or reorientation events may constitute the basis on which the brain segments the continuous stream of experience into episodes or chunks that it uses to structure experience and memory [3–5].

In the brain, grid cells construct continuous 2-dimensional Euclidean maps of small environments [6] by the integration of self-movement cues as the animal explores the space, Fig. 1a. The advantage of such velocity integration-based Euclidean representations is that they provide a consistent encoding of seen and unseen locations that is independent of the paths taken to get there, making it possible to compute novel shortcut paths between locations [7–11].

However, between different environments, place and grid cells “remap”: Representations of these environments involve different (if overlapping) sets of place cells and the spatial relationships between place cells in one environment are not preserved in the other [12]. Grid remapping is more subtle: grid cells exhibit coherent module-wide shifts that are differential across modules in their firing phases [13]. Remapping can be driven by non-spatial changes in context (e.g. changes in olfactory or visual cues within the same space) or by large spatial changes where the subject cannot easily determine its spatial displacement (e.g. after a journey in a closed vehicle).

These jumps in spatial representation, typically studied by discontinuously transplanting subjects from one environment to another or by switching non-spatial cues, can also occur when subjects smoothly navigate themselves within a single unchanging environment, particularly if it has many compartments or subregions [14, 15], Fig. 1b-c. This phenomenon – referred to as map fragmentation – is also a form of remapping. However, it differs from remapping in the traditional sense [12] because the environment remains stationary as the animal moves continuously through it, whereas in typical remapping experiments the environment changes [16]. We hypothesize that map fragmentation is a solution to multiple problems: First, it solves the problem of the accumulation of path integration errors that prevent the formation of consistent maps over larger spaces, resulting in the formation of smaller but consistent Euclidean maps. Thus, map fragmentation enables spatial inference and shortcut behaviors within each submap. Second, each submap represents a state abstraction in which contiguous locations are clustered together, and combining these abstractions with links between them can permit efficient and hierarchical representation and planning. Third, submaps can combine more globally to form a “topometric” map, a representation with enough expressiveness for topologically non-trivial cognitive spaces beyond real space, that preserves the advantages of both local metric structure and global hierarchy and abstraction.

Here, we propose a simple online rule for map fragmentation that avoids the large memory, time complexity and data-inefficiency of offline algorithms, and show that the resulting rule is a good potential model of map segmentations observed in grid and place cells. Finally, we demonstrate by implementing efficient random tree search algorithms that map fragmentation can facilitate efficient planning relative to using global maps, leading to a massive speed-up in complex and large environments without repeated substructures.

## Results

### Map fragmentation as clustering: an offline baseline

We propose that remapping across environments and fragmentation within environments can be considered to be a clustering problem: At each sampled location, the question is whether it should be categorized as a part of the most recently used map, or be assigned to a different one. A sensible answer would be that sufficiently “similar” locations should be assigned to the same map (cluster), while sufficiently different ones should be assigned to different maps (clusters).

We view a map as a (local) world model that enables the prediction of sensory inputs at any location within the map. Thus, we consider that a key metric for map fragmentation may be predictability or surprisal, Fig. 2. A similar metric has been used in robotics methods for simultaneous localization and mapping (SLAM) [17]. Specifically, sets of poses (locations and orientations) where the predictability of external observations remains high while moving between them (“contiguous regions”) should be clustered together into one map, Fig. 2. This view complements the use of other metrics that have been implemented in offline settings to construct spatial maps, including the graph Laplacian [18] and successor representation [19] methods, both of which use temporal proximity as their metric (indeed, under a random exploration policy, the successor representation is closely related to the graph Laplacian). Our primary focus here is on how biological and artificial agents might generate sensible maps in an online fashion. Secondarily, we use the metric of prediction or surprisal to generate these online fragmentations. In Discussion, we will consider how additional metrics can be used within the same online framework.

Define a model ℙ (*z*′ | *x*′, *x, z*) that predicts the sensory input *z*′ at pose *x*′, based on the sensory input *z* at pose *x* (Fig. 2a,b; see Methods for details). The sensory observations and their predictions are given in terms of a range sensor centered on the agent, in the actual environment (Fig. 2a, right) or in a reconstructed map based on the observations *z* (Fig. 2a, middle), respectively. For each pose *x*, we delineate the surrounding region where predictability remains above threshold; this, by definition, is a contiguous region. We call the boundary of the region the prediction horizon for *x*. The radius of the prediction horizon varies depending on location within the environment, Fig. 2c. We can use the mutual surprise between poses (defined in Methods) as a measure of proximity that we illustrate with an Isomap embedding [20] of the environment (Fig. 2d). In this visualization, contiguous (high predictability) regions are compressed, while transition or bottleneck regions (low predictability) are stretched.

Finally, we define the average surprisal (see Methods) of a pose *x* by averaging over the mutual surprise of all nearby poses at a fixed Euclidean distance, and apply a clustering procedure similar to DBSCAN [21]^{1}. The procedure computes the connected components of all locations whose average surprise lies below a fixed threshold and decomposes the map into core fragments and transition regions, Fig. 2e. Additionally, in order to make an informed choice about the fragmentation threshold, we compute a contour tree (cf. [22]) of the surprisal values, which provides a visualization of how the connectivity of space evolves with increasing thresholds, Fig. 2f. As we see, there are regions of the contour tree that are relatively robust to the detailed threshold choice, providing similar connectivities over a range of threshold values.
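The thresholding step can be sketched on a toy grid. The snippet below is only an illustration under simplifying assumptions: the space is discretized into a regular grid, connectivity is 4-neighbor, and the function names `core_fragments` and `count_fragments` are hypothetical, not taken from the paper's code. Cells whose average surprise lies below the threshold are grouped into connected components (core fragments); all other cells are treated as transition regions.

```python
from collections import deque

def core_fragments(surprise, threshold):
    """Label 4-connected components of cells with surprise below threshold.

    Returns a grid of the same shape: 0 marks transition cells (surprise at
    or above threshold), positive integers mark core fragments.
    """
    rows, cols = len(surprise), len(surprise[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if surprise[r][c] >= threshold or labels[r][c]:
                continue
            # Start a new fragment and flood-fill it breadth-first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and labels[ni][nj] == 0
                            and surprise[ni][nj] < threshold):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels

def count_fragments(surprise, threshold):
    """Number of core fragments at a given threshold."""
    return max(max(row) for row in core_fragments(surprise, threshold))

# Toy map: two low-surprise "rooms" separated by a high-surprise "doorway" column.
toy = [
    [0.1, 0.1, 0.9, 0.2, 0.2],
    [0.1, 0.1, 0.8, 0.2, 0.2],
    [0.1, 0.1, 0.9, 0.2, 0.2],
]
labels = core_fragments(toy, threshold=0.5)
```

Sweeping the threshold with `count_fragments` gives a crude view of what the contour tree summarizes: in this toy map the two rooms remain separate fragments for thresholds up to 0.8 and merge into one once the threshold exceeds the surprise of the doorway cell.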

The surprisal-based segmentations align well with both intuitive fragmentations and with neural data (cf. Fig. 1), suggesting that predictability may be a key and principled objective for map segmentation decisions.

However, the algorithm is offline, requiring full exploration of the space before it can generate the fragmented map. This is unlike in experiments, where animals generate map fragmentations in real-time as they explore an environment [15]; in non-spatial contexts too, there is evidence that event boundaries are defined in real-time [5, 23]. The algorithm also has high complexity, requiring fine spatial discretization and a large memory and computational buffer for the storage of and computation on the full predictivity matrix over all pairs of positions in the space. The same is true for Laplacian and successor matrix-based methods. Further, there is an additional gap between observed map fragmentations in biology and the latter two algorithms because while they provide multi-scale representations of the space (in the form of eigenvectors of some similarity matrix), they are not actually fragmentations of the environment, Fig. S3,S4.

### Online fragmentation based on predictability: Our model

We next build a simple and biologically plausible online map fragmentation model based on surprisal, with the goal of generating fragmentations that are consistent with the principled offline clustering-based algorithm above. Our model is an agent that integrates its velocity as it explores an environment to update its pose estimate, and uses a short-term memory (STM) and a long-term memory (LTM) to make predictions about what it expects to see next.

The sensory observations for the online model consist of the activities of a population of cells that encode the presence of environmental boundaries at some distance, similar to boundary vector cells (BVCs) [24, 25] in entorhinal cortex or boundary-coding cells in the occipital place area [26]. These encode a binary, idiothetically-centered local view of the space^{2} (with observation field-of-view angle *ϕ*), Fig. 3a,c. The velocity-based position estimates are represented by a population of idealized grid cells from multiple modules. For simplicity and to match the experimental setups in [14, 15] we assume that the pose angle is specified by a global orienting cue – effectively, the agent has access to its true head direction. The STM consists of an exponentially decaying moving average of recent observations, each shifted according to the internal velocity estimate of the agent, Fig. 3d. The STM is used to generate the prediction for the next observation (motivated by [17]), and a normalized dot product between the prediction and the current observation (BVC activity) yields our predictability signal (Fig. 3b,e). Due to its implementation as a moving average, STM activity slightly lags BVC activity. While high predictability is maintained along a trajectory, no fragmentation occurs. Once the predictability signal dips below a threshold, then at the first subsequent stabilization of spatial information, signaled by predictability returning to threshold, a fragmentation event is triggered (Fig. 3b). At this point, the agent must make a decision about which map to use, for which it uses its LTM.
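The trigger logic described above can be sketched as follows. This is a deliberately reduced illustration, not the model's implementation: observations are short binary vectors standing in for BVC activity, the velocity-based shifting of the STM is omitted, and the names `cosine` and `run_agent` as well as the values of `decay` and `threshold` are assumptions chosen for the example.

```python
import math

def cosine(a, b):
    """Normalized dot product between two activity vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def run_agent(observations, decay=0.5, threshold=0.9):
    """Return the predictability signal and the fragmentation event times.

    The STM is an exponentially decaying moving average of past observations
    and serves as the prediction for the current observation. An event fires
    the first time predictability returns to threshold after a dip below it.
    """
    stm = list(observations[0])
    dipped = False
    signal, events = [], []
    for t, obs in enumerate(observations):
        p = cosine(stm, obs)          # compare prediction with current view
        signal.append(p)
        if p < threshold:
            dipped = True             # predictability has broken down
        elif dipped:
            events.append(t)          # stabilized again: fragment here
            dipped = False
        # Update the moving average (velocity shift omitted in this sketch).
        stm = [decay * s + (1.0 - decay) * o for s, o in zip(stm, obs)]
    return signal, events

# Toy trajectory: five steps with view A, then five steps with view B.
view_a, view_b = [1, 1, 0, 0], [0, 0, 1, 1]
signal, events = run_agent([view_a] * 5 + [view_b] * 5)
```

In this toy run the view changes at step 5, predictability drops, and the single fragmentation event fires two steps later, once the lagging STM has caught up with the new view.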

The LTM consists of associations between the grid cell-encoded position representations and the sensory observations (filtered through the STM) encountered by the agent in the past (Fig. 3f). At a fragmentation event, the agent stochastically retrieves a previously visited state and corresponding location estimate in proportion to its match with the current observation. Alternatively, with some small constant probability (Fig. 3g), the agent selects (initializes) a new map, which corresponds to selecting a new internal position representation by randomizing the set of phases across the grid modules.

The stochastic selection of an item from LTM based on overlap with the current observation serves two purposes simultaneously: first, an observation is likely to drive selection of a closely matching prior observation, and second, the retrieval of a previous observation is also proportional to the number of times that observation has been made before, because stochastic selection from the set of past observations is a form of Monte Carlo volume estimation. In short, the selection of a submap after a fragmentation decision enables the reuse of existing submaps to represent new spaces when relevant based on similarity and frequency of past observations, while simultaneously permitting the creation of new maps. The frequency-dependence of this process together with the possibility to create new maps is similar to the Bayesian nonparametric Chinese Restaurant Process (CRP) [16, 28, 29]; the observational similarity component makes it more akin to the distance dependent CRP (dd-CRP) [30, 31]. However, a key difference is that our observations are only implicitly clustered into submaps: each observation and location pair is stored independently of the rest in the LTM without a submap assignment, with submap boundaries only defined by the existence of a fragmentation decision and a jump in the grid-encoded spatial locations for the post-fragmentation observation relative to the immediate pre-fragmentation observation^{3}.
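A minimal sketch of this selection rule is given below. It assumes binary observation vectors, uses a plain dot product as the match score, and represents each LTM entry as a hypothetical `(position_code, stored_observation)` pair; the function names and the value of `p_new` are illustrative choices, not the paper's. Because every past observation is stored as its own entry, frequently made observations are retrieved more often without any explicit counting.

```python
import random

def overlap(a, b):
    """Match score between two binary observation vectors (dot product)."""
    return sum(x * y for x, y in zip(a, b))

def select_map(ltm, observation, p_new=0.05, rng=None):
    """Pick a stored position code, or None to signal 'initialize a new map'.

    With probability p_new (or when nothing in memory matches) a new map is
    requested; otherwise one memory is drawn in proportion to its overlap
    with the current observation.
    """
    rng = rng or random
    weights = [overlap(stored, observation) for _, stored in ltm]
    if rng.random() < p_new or sum(weights) == 0:
        return None  # caller would randomize the grid phases here
    position_code, _ = rng.choices(ltm, weights=weights, k=1)[0]
    return position_code

# Toy memory: view A was stored three times, view B once.
ltm = [("A", [1, 1, 0, 0])] * 3 + [("B", [0, 0, 1, 1])]
rng = random.Random(0)
draws = [select_map(ltm, [1, 1, 0, 0], rng=rng) for _ in range(1000)]
```

With the current observation equal to view A, the non-matching entry B has zero weight and is never retrieved, while a small fraction of draws return `None` and would start a new map.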

Further, maintaining a “temporal” LTM that memorizes spatial transition probabilities, and using this information to bias the selection of a map at fragmentation events, stabilizes how an environment is fragmented, though it is not critical (see SI, Fig. S6). Spatial transitions contain valuable information about the relationships between individual map fragments and are important for exploiting their hierarchical structure in route planning, as we illustrate later.

### Fragmented maps in multiple environments

We explore the map fragmentations generated by our online model across organically shaped and previously experimentally tested structured multi-compartment environments, Fig. 4. The online model generates fragmentations at locations that correspond to observation bottlenecks, including at doorways or narrow openings and around the corner of sharp turns, Fig. 4a-c (top). Starting from the first trajectory through the space and across multiple trajectories, the remapping or fragmentation points and the selected maps are consistent, evidence of the robustness and reliability of online fragmentation (Fig. 4a-c, bottom). In the two-room and hallway environment, the model generates a fragmentation in which the two rooms are each represented by the same local map (rather than a single global map), and these maps are distinct from the map for the hallway. Moreover, the fragmentations generated by the online model are consistent with the fragments from the principled baseline method, Fig. 4d. This model can be used to generate fragmentations and predicted grid cell tuning curves for arbitrary environmental geometries; we do so for model cells from different grid modules in two environments, Fig. 4e,h (fragmentations of more environments, including a square spiral maze and a simple linear track, shown in SI, Fig. S1a-c). If the angular field of view *ϕ* is restricted rather than omnidirectional, the maps also acquire direction tuning, Fig. 4h and Fig. S1c.

### Coherence of fragmentation across scales and maintenance of cell-cell relationships

Our model makes two key structural predictions. First, map fragmentations are consistent and coherent across scales (across grid modules), with all cells and modules remapping at the same spatial location in an environment. This is in contrast with eigenvector-based models [18, 19, 32], in which there is no specific or coherent remapping decision that is made across eigenvectors, Fig. S3,S4, Fig. 6.

Second, in our model all grid cells within each module maintain fixed cell-cell relationships across map fragments and environments. This too is in direct contrast with eigenvector-based models, Fig. 4g,S5, Fig. 6. Consistent with our model, grid cell data and analyses reveal that the pairwise relationships between co-modular grid cells remain stable across environments [33] and states [34, 35] even when place cells remap and their relationships change. These neural data are inconsistent more generally with models in which grid cell responses are derived from place cell responses [32, 36] because they would predict altered cell-cell relationships when place cells remap [34].

### Efficient planning with fragmentations

Next, we quantify the functional utility of map fragmentation in a navigational planning problem. The fragmented maps, which represent a form of state abstraction, decompose the planning problem hierarchically, into a family of smaller and simpler sub-problems. Thus, they are expected to make planning more efficient. We perform computational experiments to illustrate this point, comparing a bi-level navigation algorithm in the fragmented map with a simple baseline.

Consider a goal-directed problem in which agents, who have previously mapped the space, are tasked with finding a path to a cued goal location from a start location. For planning, we will assume that the LTM containing stored observation-location associations also stores submap transitions and an explicit submap identification. That is, all observations until a fragmentation event are assigned the same submap ID; at a fragmentation event, if the retrieved map has not yet been assigned a submap ID, a new submap ID is initiated, added to the LTM, and associated with all subsequent observations until the next fragmentation event, and so on. All the observations between fragmentations are fused using local displacement information to form a submap for the whole fragment. The environments are complex, but *without* repeating submap structure (Fig. 5a-b, d-e), because the fragmented representations generated by our simple agent do not distinguish between different spaces with the same appearance (no global odometry assumed across submaps).

The baseline (global) agent is furnished with a global map, which includes ground-truth position information for all observations (Fig. 5a,d) and uses the Rapidly-exploring Random Tree (RRT) algorithm [37] to find a path through the space (Methods). The agent using a fragmented approach constructs a graph in which the nodes correspond to the submaps, and the edges correspond to observed transitions between submaps during exploration. It performs a depth-first search through the transition tree to find the sequence of submaps that lead to the node containing the target location (determined by querying the LTM with the target inputs). Within each submap, the agent uses the RRT algorithm to plan a path between the locations corresponding to the entry and exit edges. This agent possesses no global positional information.
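The upper level of this bi-level scheme can be sketched as a depth-first search over the submap transition graph. The example below is a simplified illustration: the graph, the submap names, and the function `submap_route` are hypothetical, and the within-submap RRT planner is not implemented; the sketch only shows how the search reduces the global problem to a short list of local entry-to-exit subproblems.

```python
def submap_route(transitions, start, goal):
    """Iterative depth-first search over the submap transition graph.

    transitions maps each submap ID to the submaps reachable from it.
    Returns the sequence of submaps from start to goal, or None.
    """
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in transitions.get(node, ()):
            if nxt not in visited:
                stack.append((nxt, path + [nxt]))
    return None

# Toy transition graph observed during exploration (names are illustrative).
transitions = {
    "room1": ["hall"],
    "hall": ["room1", "room2", "closet"],
    "room2": ["hall"],
    "closet": ["hall"],
}
route = submap_route(transitions, "room1", "room2")

# Each consecutive pair is one local planning problem: from the entry point
# of a submap to the transition leading into the next submap.
legs = list(zip(route, route[1:]))
```

Each leg would then be handed to a local planner (RRT in the text) that only ever searches inside a single submap, which is where the savings relative to a single global search are expected to arise.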

In the environment of Fig. 5a-b, routes are found considerably faster with fragmented maps than without: we see a ∼5-fold speedup. The relative advantage of planning with fragmented maps grows superlinearly with the complexity and size of the environment and separation between start and end locations within these spaces, Fig. 5c (right; steps are a proxy for the problem complexity).

Next, we simulate agents moving through 3D photorealistic virtual apartments in which observations are rich pixel images with range data, Fig. 5d, g. We apply convolutional visual recognition networks to the dense inputs to extract sparse landmarks and use these to generate online map segmentations (Methods). As before, the agent performs bi-level planning on the tree of transitions with submaps as nodes and RRT planning within submaps. Here we find a several orders of magnitude speedup in planning with map fragmentation, Fig. 5f.

## Discussion

### Relationship to existing work

Existing models of neural map fragmentation fall into two categories, Fig. 6: The first assumes that fragmentation is driven by large path integration errors that cause a mismatch between estimated position and familiar observations [38]; in environments with little ambiguity in the external sensory cues, there would be no fragmentation. The second considers eigenvectors of different types of transition matrices, e.g. eigenvectors of the graph Laplacian of the adjacency matrix or of the successor representation [18, 19]. These models require a global buildup of the transition matrix between all pairs of locations in the environment, requiring in some sense a complete map of the environment before any fragmentations could happen. They also do not provide an explicit fragmentation of the environment, but rather a global map whose eigenvectors may be interpreted as the tuning curves of grid cells, and a subset of which appear visually fragmented (with different fragmentations by different grid cells and modules). By contrast, our model is fully online and provides explicit, robust fragmentations starting from the first trajectory through an environment, even in the absence of positional ambiguity; it also has a smaller memory demand than transition matrix models. Finally, the model involves only simple, biologically plausible computational elements, with grid cells and BVCs and a short-term and a long-term memory, to explain a number of experimental results.

Our work, initially motivated by the empirical observations of fragmented maps in neuroscience, is closely related to work on segmented maps in the field of simultaneous localization and mapping (SLAM) in robotics [17, 39–41]. The main difference is that predictions in our model are based on a temporally limited window into the past, provided through the STM, whereas in [17] *all* observations are accumulated into a map that the prediction is based on. Further, our predictions are based on idiothetically-centered local views of the environment (BVC) – which are not assembled into a global allocentric map – and use an adapted moving average as a STM. For (re-)localization we use local views stored in a spatially indexed LTM.

As we have shown, spatial abstraction and spatial hierarchy in the form of map fragments can be of high utility in efficient search for solving goal-directed problems. State abstractions and hierarchical representations are broadly recognized to be important for more efficient reinforcement learning as well, and implemented in different forms including the classic options framework and more recent attempts [42, 43]. A key challenge for such approaches is to find rules that generate appropriate state abstractions, especially those capable of doing so in an online or streaming way. Our work is a contribution in this direction; related work includes the generation of temporal abstractions based on novelty rather than surprisal [44].

Our use of a surprisal signal is closely related to curiosity-based algorithms for reinforcement learning [45]. These algorithms use prediction error as an internal reward, to drive agents to explore unknown parts of the space. By contrast, we use prediction error as a way to generate state abstractions.

### Model extensions: a broader set of metrics for fragmentation

The general principle of online state abstraction through online map fragmentation can use metrics in addition to surprisal for triggering a fragmentation event. Consider the case of two hallways with similar idiothetically-centered views, e.g. hallways 2 and 3 in Fig. 4c, that differ only in the permitted turn direction at the end. A natural extension of the model would be to incorporate a cell population encoding navigational affordances, to fragment and select maps based not only on sensory surprisal but also on the set of actions that can be or are commonly taken. Other extensions include using the physical distance between states [18, 19], the passage of time [46–48] with a dynamic (temporally decaying) threshold for fragmentation that makes fragmentation more likely as time elapses (also see [17]), the appearance of unique or novel visual features including landmarks [44, 49–51], and sufficient mismatch in the estimates of state made from different cues or sensory modalities [38, 52], in addition to the metric of perceptual predictability that we have used here and that the hippocampus has been shown to be sensitive to [53, 54]. The present model, which provides an online method for generating meaningful abstractions, may be applied with arbitrary combinations of these metrics to generate fragmentations influenced by multiple factors.

### Merging of maps

With prolonged experience in the two-compartment environment, map fragmentations tend to merge into a single, continuous representation that covers both compartments [15]. In our model some map fragments can, because of the stochastic nature of the fragmentation process, occasionally extend beyond an expected fragmentation boundary (see Fig. S1d). These events occur sparsely and are unlikely to be the source of the merging of maps observed in [15]. We expect the merging of maps to result from an improvement of the prediction signal with more experience, which can be modeled by allowing the prediction system to use not just recent observations from short-term memory, but also past observations from long-term memory. Exploring the dynamics of this process is an interesting potential extension of the present work.

### Role of map fragmentation for general cognitive representation

Our model of online fragmentation of a continuous stream of experience enables the representation of a very general class of maps – including in spatial and non-spatial cognitive domains – in a way that exceeds the capabilities of a “pure” grid code. Grid cells generate Euclidean representations of Euclidean spaces [11]. Fragmented maps can each be viewed as separate local Euclidean “charts”, mapped out by a multi-modular grid code, that are then associated to each other through transitions learned in the hippocampus according to the global layout of the charts. In other words, the combination of fragmented maps and the transitions between them can be viewed as a topological atlas [55] or topometric map [56], that can represent highly non-Euclidean structures while also permitting locally metric computations.

Thus, from a general perspective, map fragmentation and remapping (reanchoring) on cognitive representations can be viewed as facilitating the step from representing flat Euclidean space to representing richer manifolds. In combination with grid cells’ ability to represent high-dimensional variables [11], such a coding scheme becomes highly expressive.

In contrast to the approach taken in [57, 58] there is no need to generate entirely new neural codes and representations to fit the local statistics of the explored space. Instead, we propose that the neural codes seen within submaps retain their native structure across spaces, in the form of a pre-formed and stable recurrent scaffold for memory through grid cells. Even though grid cell representations in each module are 2-dimensional, theoretically the set of modules can represent even high-dimensional continuous spaces [11], while potential non-Euclidean aspects of cognitive variables can be captured by the between-submap transitions. This structuring of memory into continuous parts with preexisting scaffolds [59–63] together with occasional transitions between these continuous chunks simultaneously provides rapid learning and flexibility.

### Episodic memory

Episodic memory, one example of a general cognitive representation, deserves special discussion because of the privileged role of the hippocampal system in its creation, storage, and use [64, 65]. Like spatial map fragmentations, episodic memory involves partitioning or clustering the continuous stream of temporal experience into chunks that involve similar perceptual, temporal, and contextual elements [3–5, 66]. Our proposal for surprisal-based spatial segmentations could be applied to study memory chunking. Interestingly, the memory for non-spatial items has also been shown to segment based on changes in spatial context, specifically by passage through doorways [67], as would be predicted by the present model.

The utility of applying our model first in the spatial domain is that it yields concrete predictions that are quantifiably consistent with observed map fragmentations. Applying it across cognitive domains will contribute to a unified computational model for how the hippocampal formation generates structured memories of spatial and non-spatial cognitive experience [57, 58, 65, 66, 68, 69].

### Experimental Predictions

The decision to form a new map fragment in our model depends only on recent observations that are filtered through a STM, without requiring global information about the environment. Thus, map fragmentations are predicted to occur in real time and on the very first pass through relevant regions of new environments, consistent with experimental results in the spatial and non-spatial domains [3–5, 70]. Further, in our model, all grid cells and grid modules undergo map fragmentation simultaneously, at the same time and location along a given trajectory, unlike in other models (Fig. S3,S4) [18, 19]. Fragmentations tend to occur at spatial bottlenecks that limit the prediction horizon, which correspond to “doorways” in the environment. The current evidence for cells firing at doorways is mixed [71, 72]. However, the necessity for a neural correlate that communicates the fragmentation decision and facilitates across-module grid realignment at a fragmentation event predicts the existence of “fragmentation decision cells” or “doorway cells” whose tuning curves would resemble the heatmaps of Fig. 4d.

A common theme in MEC seems to be that cells with spatially structured tuning coexist with vector versions of themselves: i.e., cells that have similar tuning curves but are offset by a fixed vector (e.g. BVCs [24, 25] and landmark or object vector cells [73, 74]). In this light we might also expect “fragmentation vector cells” or “doorway vector cells” that fire if the rodent is at a fixed angle and distance from a fragmentation location. These cells, which could be interpreted as encoding future action affordances or future map transitions, would facilitate planning.

Next, the model predicts that the stochastic process of generating map fragmentations can result in more than one map for the same region even when there is not an explicit manipulation of context or task. There are at least two implications of this result. First, it suggests that variations in the firing of grid and place cells on different visits to a location might be due not only to variable paths taken within a single map [75] but to the retrieval of entirely different maps. Second, these multiple, stochastically generated maps might subsequently be easy to harness for contextual differentiation, for instance like “splitter” cells [58, 76–78].

Finally, the large efficiencies in planning and goal-directed navigation afforded by the use of fragmented maps suggest that neural planning should exhibit hallmarks of the fragmentation process: If theta phase precession or waking neural replay events [79–85] correspond to planning [86–88], we should expect them to exhibit punctate trajectories with hierarchical dynamics between versus within fragments in multicompartment environments.

## Methods

The source code will be made available online upon publication or request.

### Offline fragmentations from predictability and surprise

We approximate the probabilistic observation model *P* (*z*′ | *x*′ , *x, z*) by

$$P(z' \mid x', x, z) \approx P(z' \mid x', \hat{m}), \qquad \hat{m} = \arg\max_{m} P(m \mid x, z).$$

Here $\hat{m}$ is the maximum a posteriori estimation of an occupancy map given by an inverse sensor model as described in [27], and *P* (*z*′ | *x, m*) is the respective range sensor model. More precisely: Given a deterministic range sensor that takes measurements along a fixed number (*n* = 1000, 1500) of simulated beams, whose angles are equally spaced over the interval [− *π, π*], we take three depth measurements *z, z*′ , and *z*″ . The first two are taken in the actual environment at their respective poses *x* and *x*′ , whereas the third is taken on a map built from the initial measurement *z* made at *x*. The observation model *P* (*z*′ | *x, m*) is then defined as a multivariate diagonal Gaussian with constant diagonal entries *σ* = 1.0 and mean *z*″ , Fig. 2a,b.
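As a minimal sketch of the diagonal Gaussian described above, the log-likelihood of an observed scan *z*′ given the scan *z*″ predicted from the map can be computed as follows. The function name `log_obs_likelihood` and the array inputs are illustrative assumptions, and *σ* is treated here as a standard deviation (for *σ* = 1.0 this coincides with treating it as a variance).

```python
import numpy as np

def log_obs_likelihood(z_prime, z_pred, sigma=1.0):
    """Log-density of a diagonal Gaussian with mean z_pred, evaluated at z_prime.

    z_prime: depth scan measured in the real environment (stands in for z').
    z_pred:  depth scan simulated on the map (stands in for z'').
    sigma:   constant per-beam spread, assumed to be a standard deviation.
    """
    z_prime = np.asarray(z_prime, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    n = z_prime.size
    resid = (z_prime - z_pred) / sigma
    return float(-0.5 * np.sum(resid ** 2)
                 - n * np.log(sigma)
                 - 0.5 * n * np.log(2.0 * np.pi))
```

The negative of this quantity behaves like a surprise: it grows as the scan predicted from the map departs from the scan that is actually observed.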

The function underlying the distance matrix used for the Isomap embedding (cf. Fig. 2d) is given by the *mutual surprise s*(*x, x*′) between two poses *x, x*′ which we define as
We refer to the negative mutual surprise as *mutual predictability*. With this in hand we define the *average surprise s*(*x*) = *s*(*x*; *r, ε*) of a pose by averaging over the mutual surprise about all poses at a fixed distance. To be more precise we define

$$s(x) = \frac{1}{|\Delta_x|} \sum_{x' \in \Delta_x} s(x, x'),$$

where Δ_{x} = Δ_{x}(*r, ε*) is the set of all poses whose distance to *x* lies in the range [*r*−*ε, r*+*ε*], for some previously fixed *r, ε* > 0; in our experiments we use *r* ≈ 0.4 and *ε* ≈ 0.05 depending on the minimal distances between poses. We sometimes refer to the negative average mutual surprise as *contiguity*. Informally, a high contiguity implies fewer surprises in direct proximity of the current pose and thus a low urge to remap.
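The averaging over all poses at distance roughly *r* can be sketched as below. This is an illustration under stated assumptions: poses are reduced to 2-D positions, the mutual surprise is supplied as a precomputed matrix, and the names `average_surprise` and `mutual_surprise` are hypothetical.

```python
import numpy as np

def average_surprise(i, positions, mutual_surprise, r=0.4, eps=0.05):
    """Mean mutual surprise between pose i and all poses in its distance ring.

    positions:       (N, 2) array of sampled pose positions (assumption).
    mutual_surprise: (N, N) array, entry [i, j] standing in for s(x_i, x_j).
    Returns NaN when no sampled pose lies in the ring [r - eps, r + eps].
    """
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions - positions[i], axis=1)
    ring = (dist >= r - eps) & (dist <= r + eps)
    if not ring.any():
        return float("nan")
    return float(np.asarray(mutual_surprise)[i, ring].mean())
```

The contiguity of pose `i` is then simply the negative of the returned value.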

To extract map fragmentations we uniformly sample poses from the environment and compute their average surprise, Fig. 2e. We then consider only those poses whose surprise lies below a previously fixed threshold (chosen accordingly for each environment). To make an informed choice about the threshold we compute a discrete contour tree of the poses with respect to the average surprise, visualizing the evolution of the connectivity with respect to increasing thresholds, Fig. 2f. The connected components of the subthreshold region yield a fragmentation into sub-maps, one for each connected region, and a suprathreshold transition region. We consider two poses to be connected if their Euclidean distance is below another previously fixed threshold that depends on the coverage of the environment by all the pose samples.
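The thresholding and connected-component step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fragment_poses`, the label convention (−1 for the transition region), and both threshold arguments are assumptions.

```python
import numpy as np

def fragment_poses(positions, surprise, s_max, d_max):
    """Group subthreshold poses into connected components (sub-maps).

    positions: (N, 2) sampled pose positions.
    surprise:  length-N average surprise per pose.
    s_max:     surprise threshold; poses at or above it form the transition region.
    d_max:     two poses are connected if their Euclidean distance is below d_max.
    Returns an integer label per pose: -1 for transition poses, else a sub-map index.
    """
    positions = np.asarray(positions, dtype=float)
    labels = np.full(len(positions), -1)
    keep = np.flatnonzero(np.asarray(surprise) < s_max)
    next_label = 0
    for seed in keep:
        if labels[seed] != -1:
            continue                      # already assigned to a sub-map
        labels[seed] = next_label
        stack = [seed]
        while stack:                      # depth-first flood fill over the pose graph
            i = stack.pop()
            dist = np.linalg.norm(positions[keep] - positions[i], axis=1)
            for j in keep[dist < d_max]:
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Sweeping `s_max` upward and tracking when components merge gives the same kind of information that the contour tree visualizes.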

### Online fragmentations from predictability

#### Observations and internal mapping locations

As before, our observation model is given by a range sensor that takes measurements along a fixed number of simulated beams. The beams’ angles are equally spaced over the interval [*θ*_{t} −*ϕ*/2, *θ*_{t} + *ϕ*/2]. Here *θ*_{t} denotes the head direction at time *t* and *ϕ* = 360°, 270° defines the field of view of the agent; cf. Fig. 3a. We convert these range measurements into the activity *z*_{t} of a simulated population of boundary vector cells by a binning process; cf. Fig. 3a,c. In our model we assume there is an *n* × *n* array of BVCs covering an area of *w* × *w*, with *n* = 91, 111 and *w* ≈ 4m.
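One plausible reading of the binning process is sketched below: each beam endpoint is placed in an agent-centred *n* × *n* array and the corresponding cell is switched on. The details are assumptions made for illustration (binary activity, world-aligned array axes, endpoints outside the *w* × *w* window discarded), as is the function name `bvc_activity`.

```python
import numpy as np

def bvc_activity(ranges, head_dir, fov=2.0 * np.pi, n=91, w=4.0):
    """Bin range readings into an n x n boundary-vector-cell array.

    ranges:   distances measured along beams spread evenly across the field of view.
    head_dir: head direction in radians (centre of the field of view).
    The agent sits at the array centre; rows index y and columns index x.
    """
    ranges = np.asarray(ranges, dtype=float)
    angles = head_dir + np.linspace(-fov / 2.0, fov / 2.0, ranges.size)
    px = ranges * np.cos(angles)          # beam endpoints relative to the agent
    py = ranges * np.sin(angles)
    cols = np.floor((px + w / 2.0) / w * n).astype(int)
    rows = np.floor((py + w / 2.0) / w * n).astype(int)
    inside = (cols >= 0) & (cols < n) & (rows >= 0) & (rows < n)
    z = np.zeros((n, n))
    z[rows[inside], cols[inside]] = 1.0   # binary activity (assumption)
    return z
```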

We assume that internally locations are represented by a population of idealized grid cells of multiple scales. For ease of computation, we interpret this multi-module grid code as a high capacity code for an unfolded 2-dimensional space [11]; cf. Fig. 4a-c (bottom). The Poisson rate maps *f*_{c} for an idealized grid cell *c* are then generated from superposing three cosinusoidal waves, each offset by an angle of 60°, over the unfolded 2-dimensional grid space, i.e.
Here *λ*_{c} and *x*_{c} encode the lattice scale and its offset, and *R*_{c} is a rotation matrix defining the orientation of the lattice.
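A common way to realize such a rate map is sketched below. The wave number 4π/(√3·scale), the rescaling to the interval [0, peak], and the function name `grid_rate` are assumptions made for illustration and are not taken from the text; only the three cosine waves at 60° offsets, the scale, the offset, and the rotation correspond to the quantities named above.

```python
import numpy as np

def grid_rate(x, scale, offset, theta, peak=1.0):
    """Idealized grid-cell rate at positions x (shape (2,) or (N, 2)).

    scale:  lattice spacing (plays the role of lambda_c).
    offset: 2-D lattice offset (plays the role of x_c).
    theta:  lattice orientation in radians (defines the rotation R_c).
    """
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    k = 4.0 * np.pi / (np.sqrt(3.0) * scale)           # assumed wave number
    dirs = np.deg2rad([0.0, 60.0, 120.0])              # three directions, 60 degrees apart
    u = np.stack([np.cos(dirs), np.sin(dirs)], axis=1) # (3, 2) unit wave vectors
    y = (np.asarray(x, dtype=float) - offset) @ rot.T  # rotated, offset-centred coordinates
    waves = np.cos(k * (y @ u.T)).sum(axis=-1)         # lies in [-1.5, 3]
    return peak * (waves + 1.5) / 4.5                  # rescaled to [0, peak]
```

Spikes could then be drawn from a Poisson distribution with this rate; that step is omitted here.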

#### Short term memory

The short term memory (STM) is defined as an adapted exponential moving average of BVC activity:
where the prediction
is a shifted version of the 2d-array *m*_{t− 1} with respect to the scaled velocity of the agent. We found that a smoothing parameter *α* ≈ 0.9 works well. The scaling parameter *λ* maps from the environment into pixel space. The shift of the BVC array results in a diffused version of the array caused by shifts with non-integer values. The extent of diffusion depends on the resolution (or number) of the BVCs.
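The following sketch shows one way such a shifted moving average could be implemented. It rests on several assumptions that the text does not pin down: the update takes the form `alpha * prediction + (1 - alpha) * observation`, the array content moves opposite to the agent's displacement, and sub-pixel shifts use bilinear interpolation (which produces the diffusion mentioned above). All function names are hypothetical.

```python
import numpy as np

def _int_shift(a, oy, ox):
    """Shift a 2-D array by whole pixels, filling vacated cells with zeros."""
    h, w = a.shape
    out = np.zeros((h, w))
    if abs(oy) >= h or abs(ox) >= w:
        return out
    src = a[max(0, -oy):h - max(0, oy), max(0, -ox):w - max(0, ox)]
    out[max(0, oy):max(0, oy) + src.shape[0],
        max(0, ox):max(0, ox) + src.shape[1]] = src
    return out

def shift2d(a, dy, dx):
    """Shift by a possibly fractional (dy, dx) using bilinear weights."""
    iy, ix = int(np.floor(dy)), int(np.floor(dx))
    fy, fx = dy - iy, dx - ix
    out = np.zeros(a.shape)
    for oy, wy in ((iy, 1.0 - fy), (iy + 1, fy)):
        for ox, wx in ((ix, 1.0 - fx), (ix + 1, fx)):
            if wy * wx != 0.0:
                out += wy * wx * _int_shift(a, oy, ox)
    return out

def stm_update(m_prev, z, velocity, lam, alpha=0.9, dt=0.1):
    """Return (new STM state, prediction) for velocity = (vx, vy)."""
    # Assumed sign: the scene moves opposite to the agent in the agent-centred array.
    dx, dy = -lam * np.asarray(velocity, dtype=float) * dt
    m_pred = shift2d(m_prev, dy, dx)
    return alpha * m_pred + (1.0 - alpha) * z, m_pred
```

With a non-integer shift a single active pixel spreads over neighbouring pixels, so the amount of blur per step depends on the pixel size, in line with the remark on BVC resolution.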

#### Prediction model and fragmentation events

The prediction model is a normalized dot product of the current BVC observation *z*_{t} with the prediction computed from the STM as described above:

$$\mathbb{P}(z_t \mid m_{t-1}) = \frac{\langle \mathrm{vec}(z_t), \mathrm{vec}(\hat{m}_t) \rangle}{\lVert \mathrm{vec}(z_t) \rVert \, \lVert \mathrm{vec}(\hat{m}_t) \rVert},$$

where $\hat{m}_t$ denotes the prediction obtained by shifting *m*_{t − 1}, and vec(*z*) is the unfolded version of a 2d-array *z*; cf. Fig. 3b-d. A fragmentation event is triggered when the prediction signal ℙ (*z*_{t} | *m*_{t − 1}), after having fallen below a fixed threshold *θ* (≈ 0.9, 0.925), rises above it again; cf. Fig. 3b. The normalization and the fact that both *z*_{t} and the STM prediction are non-negative ensure that the prediction score lies within the interval [0, 1].
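A minimal sketch of the prediction score and of the dip-then-recover trigger is given below. The function names and the small constant `tiny` (which guards against division by zero for empty arrays) are assumptions added for illustration.

```python
import numpy as np

def prediction_score(z, m_pred, tiny=1e-12):
    """Normalized dot product between the observation and the STM prediction."""
    a, b = np.ravel(z), np.ravel(m_pred)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + tiny))

def fragmentation_times(scores, theta=0.9):
    """Time steps at which the score rises back above theta after a dip below it."""
    events, below = [], False
    for t, p in enumerate(scores):
        if p < theta:
            below = True          # inside a low-predictability stretch
        elif below:
            events.append(t)      # first step back above threshold: fragment here
            below = False
    return events
```

Triggering on the recovery rather than on the initial drop means the new fragment starts once observations are predictable again, for example just after passing through a doorway.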

#### Long term memory and relocalization

The LTM is implemented as a matrix *M* ∈ ℝ^{n × S} whose *s*’th column is given by the concatenation of the internal position estimate, its predecessor and the state of the STM at the time *t*_{s} the entry was written to memory, i.e. we have (cf. Fig. 3e)
We fill the memory as follows: At each time step *t* we choose a slot (column) *s*_{t} in the memory and replace the corresponding entry with the new one. Until we reach capacity, that is as long as *t* ≤ *S*, we set *s*_{t} = *t*; after that the slots *s*_{t} are chosen uniformly at random, similar to the associative memory in [89]. Thus, the LTM consists of two associative memories: one storing associations between locations and observations, and the other storing state transitions. Alternatively, one could store associations with the actual observations and not the filtered observations from the STM, but we found that the associations with the STM work better and result in more stable fragmentations. The same is true when we restrict the capacity of the memory; cf. Fig. S6. Note that the LTM also maintains a temporal memory storing transitions for each entry in the memory. We use a memory size *S* between 2000 and 6000.
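The write rule (sequential filling until capacity, then uniformly random replacement) can be sketched as follows. The class name `LongTermMemory`, the tuple layout of an entry, and the use of 0-based slot indices are illustrative assumptions.

```python
import random

class LongTermMemory:
    """Fixed-capacity store of (position, previous position, STM state) entries."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.slots = []
        self._rng = random.Random(seed)

    def write(self, position, prev_position, stm_state):
        """Store one entry and return the index of the slot that was written."""
        entry = (position, prev_position, stm_state)
        if len(self.slots) < self.capacity:
            self.slots.append(entry)      # sequential filling until capacity
            return len(self.slots) - 1
        s = self._rng.randrange(self.capacity)
        self.slots[s] = entry             # afterwards: overwrite a random slot
        return s
```

Because every slot is equally likely to be overwritten, older entries decay gradually rather than being dropped in strict first-in, first-out order.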

To determine the new location during a fragmentation event we query the LTM and compute two distinct weight vectors *w*_{1} and *w*_{2}. The first encodes how well a given observation *z* fits any of its entries and is given by
With slight abuse of notation we denote by *θ*_{M} the function that sets all values below a certain threshold *θ*_{M} to −∞. For ease of notation we set e^{−∞} := 0; this becomes relevant in the probability computation below. We usually set this threshold to be equal to the fragmentation threshold *θ* ≈ 0.93. In order to allow for more flexibility during the above lookup we query the LTM not only with the actual observation *z*, but also with observations shifted by small pixel offsets *δ*, i.e. with *z*_{δ} = shift(*z, δ*) instead of just *z*, where *δ* ∈ ℤ^{2} is chosen from a small region Δ around the origin. If a shifted observation fits a particular entry in the memory better, we replace the corresponding entry in the computed weight vector *w*_{1}. Then, if one of these adjusted entries, *s* say, is chosen during a remapping event we do not remap exactly to the associated position but adjust it in proportion to the respective offset *δ* and remap to the correspondingly shifted position (recall that λ translates from environment to pixel coordinates).

The second vector serves as a bias towards map transitions that have already been traversed and is given by
Note that we use the Euclidean norm between two 2d vectors out of computational convenience, but we could have used the dot product of their corresponding multi modular grid codes as well. Finally, when a fragmentation event is triggered we sample a new location from
where *P*_{0}(*x*) is a distribution over the space of possible locations, *w*_{0} = 1 the concentration parameter, and *τ* = 1, 10, 20 is the inverse temperature of the model; cf. Fig. 3g.
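The sampling step has the general shape of a choice between stored memory entries and a fresh draw from *P*_{0}. The sketch below illustrates that shape only. It assumes that each entry's combined weight enters through an exponential scaled by the inverse temperature, and that the fresh draw has unnormalized mass *w*_{0}. The function name, the argument `sample_new`, and the way the weights are combined are hypothetical and are not taken from the text.

```python
import math
import random

def sample_relocation(weights, sample_new, w0=1.0, tau=10.0, rng=random):
    """Pick a memory slot or a fresh location.

    weights:    one combined score per memory entry; -inf marks entries that
                were removed by the threshold (so that e^{-inf} counts as 0).
    sample_new: zero-argument callable drawing a location from P_0.
    Returns ("memory", slot_index) or ("new", location).
    """
    scores = [0.0 if w == -math.inf else math.exp(tau * w) for w in weights]
    u = rng.random() * (w0 + sum(scores))
    for s, score in enumerate(scores):
        if u < score:
            return ("memory", s)
        u -= score
    return ("new", sample_new())          # remaining mass w0: open a new map
```

A larger `tau` concentrates the choice on the best-matching entries, while a small `tau` spreads it across many similar entries, so that frequently stored observations are favoured through their sheer number.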

#### Trajectories

The trajectories are generated by choosing waypoints in the environment uniformly at random and navigating toward the next waypoint along a perturbed shortest path at a mean speed of 20 cm/sec. Time is discretized into steps of size Δ*t* = 0.1 sec.

#### Hierarchical Planning

We apply Rapidly-exploring Random Trees (RRT) [37] to find a path between randomly chosen pairs of start and target positions, Fig. 5a. Next, we run our online segmentation algorithm to obtain a fragmentation of the environment into submaps and form a topological graph whose vertices and edges correspond to submaps and transitions, respectively. For each map fragment we superpose all its associated memories (STM-filtered BVC activity) and threshold this newly formed representation to form an occupancy grid map (in the sense of [27]) to which we can apply the path planning algorithm. We exploit the hierarchical structure by first finding a path of transitions in the topological graph, using a breadth-first search, and then reducing the overall planning task to a family of sub-problems as follows: Each transition into and out of a node defines a pair of local entry and exit positions on the submap associated with the traversed node, defining a smaller planning problem that can be solved more effectively, Fig. 5b. In Fig. 5c we plot the distances between start and goal locations against the number of planning steps needed.

The algorithm underlying the results in Fig. 2d–g is given as follows. Because the 3D environments involve dense observations of pixel-rich data, we add image processing and observation sparsification steps in the form of landmark identification. The agent receives RGB-D images as input, removes the floor plane, and segments the resulting point cloud. It retains as landmarks the large segments that are not vertical walls, which are generally large furniture items that are both relatively static and easy to recognize robustly from new viewpoints. As it moves through the environment, fragments are defined as follows: Starting at the initial location, the current fragment is defined as a set of two visible landmarks, and the region of space from which both those landmarks remain in view constitutes the set of spatial locations assigned to that fragment. Whenever the agent moves into a part of the space where one or both of those landmarks are not visible, and if the current location does not correspond to any existing fragment, it starts a new fragment. Each fragment is connected topologically to the fragment it entered from.
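The two-level planning scheme described earlier in this section (a breadth-first search over fragments, followed by one local problem per traversed fragment) can be sketched as follows. The graph representation, the `doorways` dictionary of transition positions, and both function names are assumptions made for illustration. Each returned local problem would then be handed to a local planner such as RRT, which is not shown here.

```python
from collections import deque

def fragment_path(adjacency, start, goal):
    """Breadth-first search over the topological graph of map fragments."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk back to the start fragment
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None                           # goal fragment is unreachable

def local_subproblems(path, doorways, start_pos, goal_pos):
    """Split a global query into one (fragment, entry, exit) problem per fragment.

    doorways[(a, b)] is the position of the transition from fragment a to b
    (hypothetical structure, expressed in one shared coordinate frame).
    """
    problems, entry = [], start_pos
    for a, b in zip(path, path[1:]):
        exit_pos = doorways[(a, b)]
        problems.append((a, entry, exit_pos))
        entry = exit_pos
    problems.append((path[-1], entry, goal_pos))
    return problems
```

Each local problem is confined to a single occupancy grid, so the local planner searches a much smaller space than it would on the full environment.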

#### Arm-arm correlation

The correlation matrices in Fig. 4i are computed as follows. For each arm in Fig. 4h we produce a 1-dimensional signal by averaging over the x-axis of the respective tuning curves in each arm. Each entry *c*_{ij} in the matrix is then given by the Pearson correlation coefficient of the 1-dimensional signals in arm *i* and *j*.
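The computation above can be sketched in a few lines. The layout of the input is an assumption: each arm's tuning curve is taken to be a 2-D array whose second axis is the x-axis that gets averaged out, and all arms are assumed to share the same number of bins along the remaining axis. The function name `arm_correlation` is hypothetical.

```python
import numpy as np

def arm_correlation(tuning_curves):
    """Pearson correlation matrix between the 1-D profiles of all arms.

    tuning_curves: sequence of 2-D arrays, one per arm, with identical shapes.
    """
    signals = np.stack([np.asarray(tc, dtype=float).mean(axis=1)
                        for tc in tuning_curves])
    return np.corrcoef(signals)           # rows are treated as variables
```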

## Footnotes

* {mklukas{at}mit.edu, fiete{at}mit.edu}

^{1}The density notion in DBSCAN is based on a count of neighbors. We use the average mutual surprise instead.

^{2}This is also known as a grid occupancy map in robotics [27].

^{3}A temperature hyperparameter controls the degree of noise in the selection of a submap from LTM. This stochastic process allows us to not only use the degree of similarity but also the frequency of similar observations in selecting a submap: it performs a stochastic measurement of the volume of similar observations (submap occupancy), and then stochastically selects a map on that basis, without keeping an explicit count of how often each submap has been visited in the past. Thus, we may call this process a doubly-stochastic dd-CRP.
