## Abstract

Experimental studies of grid cells in the Medial Entorhinal Cortex (MEC) have shown that they are selective to an array of spatial locations in the environment that form a hexagonal grid. However, place cells in the hippocampus are selective to only a single location of the environment, while granule cells in the dentate gyrus of the hippocampus have multiple discrete firing locations but lack spatial periodicity. Given the anatomical connection from the MEC to the hippocampus, previous feedforward models of grid-to-place formation have been proposed. Here, we propose a unified learning model that can describe the spatial tuning properties of both hippocampal place cells and dentate gyrus granule cells based on non-negative sparse coding. Sparse coding plays an important role in many cortical areas and is proposed here to play a key role in the navigational system of the brain in the hippocampus. Our results show that the hexagonal patterns of grid cells with various orientations, grid spacings and phases are necessary for model cells to learn single-location spatial fields that efficiently tile the entire spatial environment. However, a lack of diversity in any of the grid parameters, or a lack of cells in the network, leads to the emergence of place cells with multiple firing locations. More surprisingly, the model shows that place cells can also emerge even when non-negative sparse coding is used with weakly-tuned MEC cells, instead of MEC grid cells, as the input to place cells. This work suggests that sparse coding may be one of the underlying organizing principles of the navigational system of the brain.

## 1 Introduction

The brain can perform extremely complex spatial navigation tasks, but how the brain does this remains unclear. Since the Nobel-prize-winning discovery of place cells in the hippocampus (Moser et al., 2008; O’Keefe, 1976; O’Keefe and Dostrovsky, 1971) and grid cells in the Medial Entorhinal Cortex (MEC) (Hafting et al., 2005; Rowland et al., 2016), brain regions involved in spatial awareness and navigation have attracted much attention from both experimental and computational neuroscientists.

Experimental rat studies show that hippocampal place cells have a single specific location in the environment at which they have an elevated firing rate (O’Keefe and Dostrovsky, 1971) and neighboring cells have firing fields at different locations of the environment, such that the local cell population in the hippocampus can represent the whole spatial environment (O’Keefe, 1976). In contrast, granule cells in the dentate gyrus of the hippocampal formation have multiple discrete firing locations without spatial periodicity (Jung and McNaughton, 1993; Leutgeb et al., 2007).

MEC grid cells are also spatially tuned to the locations of the environment. However, unlike hippocampal place cells, firing fields of grid cells form a hexagonal grid that evenly tile the entire environment (Hafting et al., 2005). The hexagonal grid of each grid cell is characterized by spacing (distance between fields on the grid), orientation (the degree of rotation relative to an external reference), and phase (offset relative to an external reference). The spacing of the grid increases step-wise monotonically along the dorsal-ventral axis (Hafting et al., 2005). Moreover, the progression in grid spacing along the dorsal-ventral axis is geometric, with ratio around 1.42, such that grid cells are organized into discrete modules according to their spacing (Stensola et al., 2012). Additionally, grid cells in each module also have similar orientation but random phases (Stensola et al., 2012).

Experimental evidence indicates that MEC grid cells are the main projecting neurons to the dentate gyrus and CA3 of the hippocampus (Leutgeb et al., 2007; Steward and Scoville, 1976; Tamamaki and Nojyo, 1993; Zhang et al., 2013). Consequently a variety of models have been proposed to explain the emergence of the firing fields of hippocampal place cells based on the feedforward connection from MEC grid cells, from mathematical models that have no learning (de Almeida et al., 2009; Solstad et al., 2006) to models with plasticity (Franzius et al., 2007a,b; Rolls et al., 2006; Savelli and Knierim, 2010).

For learning models of grid-to-place formation, Rolls et al. (2006) used a competitive learning procedure to learn place cells from grid cell input. However, only approximately 10% of model cells were found to have a single-location place field. Furthermore, the competition in the model was introduced by manually setting the population activation to a small specified value to produce a sparse network. Similarly, Franzius et al. (2007b) applied independent component analysis (ICA) (Hyvarinen, 1999) to maximise the sparseness of the model place cells. However, the examples of model place cells in these studies are mostly located at the border of the environment (Figure 1G in Franzius et al. (2007b) and Figure 3C in Franzius et al. (2007a)). Additionally, in their model the connection strength between grid and place cells can be positive or negative, and the place cell responses were manually shifted by the addition of a constant term to ensure that they were non-negative, which puts into question the biological realization of the model. Furthermore, these previous models do not investigate how well the learned place map represents the spatial environment.

Sparse coding, proposed by Olshausen and Field (1996), provides a compelling explanation of many experimental findings of brain network structures. One particular variant of sparse coding, non-negative sparse coding (Hoyer, 2003), has recently been shown to account for a wide range of neuronal responses in areas including the retina, primary visual cortex, inferotemporal cortex, auditory cortex, olfactory cortex and retrosplenial cortex (see Beyeler et al. (2019) for a review). However, whether sparse coding can account for the formation of hippocampal place cells has not previously been investigated in detail.

Here we applied sparse coding with non-negative constraints, where neuronal responses and connection weights are restricted to be non-negative, to building a learning model of place cells using grid cell responses as the input. Our results show that single-location place fields can be learnt that tile the entire environment, given a sufficient diversity in grid spacings, orientations and phases of the input grid cells. However, if there is a lack of diversity in any of these grid parameters, the learning of the place cells is impeded; instead, the learning results in more place cells with multiple firing locations. Furthermore, a lower number of grid cell inputs results in learning multiple place cell firing locations. The competition generated by the principle of sparse coding in the model naturally provides a global inhibition such that the place cells display discrete firing fields, suggesting that the proposed model can be implemented by biologically based neural mechanisms and circuits. Moreover, the model can still learn place cells even when the inputs to the place cells are replaced by the responses of weakly-tuned MEC cells. This suggests a plausible explanation of why place cells emerge earlier than grid cells during development (Langston et al., 2010; Wills et al., 2010).

## 2 Materials and Methods

### 2.1 Sparse coding with non-negative constraints

**Sparse coding** was originally proposed by Olshausen and Field (1996) to demonstrate that simple cells in the primary visual cortex represent their sensory input using an efficient neuronal representation, namely that their firing rates in response to natural images tend to be sparse (rarely attain large values) and statistically independent. In addition, sparse coding finds a reconstruction of the sensory input through a linear representation of features with minimal error, which can be understood as minimizing the following cost function

$$E = \left\| \mathbf{I} - \mathbf{A}\mathbf{s} \right\|^2 + \beta \sum_i Q(s_i), \tag{1}$$

where the matrix **I** is the input, columns of **A** are basis vectors (universal features) from which any input can be constructed as a weighted sum, the vector **s** represents the neural responses and each element, *s*_{i}, is the coefficient for the corresponding basis vector, *Q*(·) is a function that penalizes high activity of model units, and *β* is a sparsity constant that scales the penalty function (Olshausen and Field, 1996, 1997). The term **As** in Eq. (1) represents the model reconstruction of the input, so this cost function is the sum of the squared reconstruction error and the response penalty. Therefore, the model finds a sparse representation of the input by solving this minimization problem. Taking the partial derivatives of Eq. (1) with respect to the elements of **A** and **s**, and then applying gradient descent, gives the dynamic equations and the learning rule

$$\dot{\mathbf{s}} \propto \mathbf{A}^{\mathrm{T}}\left(\mathbf{I} - \mathbf{A}\mathbf{s}\right) - \beta\, Q'(\mathbf{s}), \qquad \Delta\mathbf{A} \propto \left\langle \left(\mathbf{I} - \mathbf{A}\mathbf{s}\right)\mathbf{s}^{\mathrm{T}} \right\rangle, \tag{2}$$

where 〈·〉 is the average operation, *Q*′(·) is the derivative of *Q*(·), and the dot notation represents differentiation with respect to time.

**Non-negative sparse coding** is simply sparse coding with non-negative constraints, i.e., the connection weights **A** and model responses **s** are restricted to non-negative values in the cost function Eq. (1). Note that, when *β* in Eq. (1) is set to zero, the cost function of non-negative sparse coding reduces to the cost function of non-negative matrix factorization (Lee and Seung, 1999).
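The cost function of Eq. (1) is easy to state in code. Below is a minimal numpy sketch; the published code for this paper is in MATLAB, so the function name is illustrative, and *Q* is taken to be the absolute value (a common concrete choice that the text leaves generic).

```python
import numpy as np

def sparse_coding_cost(I, A, s, beta):
    """Eq. (1): squared reconstruction error plus a sparsity penalty.
    Q is taken here to be the absolute value (a common choice)."""
    reconstruction_error = np.sum((I - A @ s) ** 2)
    sparsity_penalty = beta * np.sum(np.abs(s))
    return reconstruction_error + sparsity_penalty

# Non-negative sparse coding additionally restricts A and s to be >= 0;
# with beta = 0 the cost reduces to that of non-negative matrix factorization.
I = np.array([1.0, 2.0])
A = np.eye(2)
s = np.array([1.0, 2.0])
print(sparse_coding_cost(I, A, s, beta=0.0))  # perfect reconstruction -> 0.0
```

With a perfect reconstruction the cost is exactly the sparsity penalty, which illustrates why the minimization trades off reconstruction accuracy against response sparseness.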

### 2.2 The environment

The 2D spatial environment used in this study is a 1m×1m square box. A 32 × 32 grid with 1024 points is used to represent the entire environment. Therefore, a 1024 × 1 vector, denoted by **r**, with only one non-zero element can be used to represent the location of a virtual rat.
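The one-hot location encoding just described can be sketched as follows (an illustrative numpy sketch; the row-major flattening convention is our assumption, and the paper's own code is in MATLAB):

```python
import numpy as np

GRID = 32            # 32 x 32 discretization of the 1 m x 1 m box
N_LOC = GRID * GRID  # 1024 locations

def location_vector(ix, iy):
    """One-hot 1024-vector r for the rat's position at grid point (ix, iy)."""
    r = np.zeros(N_LOC)
    r[iy * GRID + ix] = 1.0   # row-major flattening (a convention we assume)
    return r

r = location_vector(5, 17)
print(r.sum(), int(np.argmax(r)))  # -> 1.0 549
```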

### 2.3 Grid cell model

The hexagonal firing fields of grid cells are represented in this study by the sum of three sinusoidal gratings (de Almeida et al., 2009; Kropff and Treves, 2008; Solstad et al., 2006), as described by

$$G(\mathbf{x}) = \sum_{i=1}^{3} \cos\!\left( \frac{4\pi}{\sqrt{3}\,\lambda}\, \mathbf{u}\!\left(\frac{2\pi i}{3} + \theta\right) \cdot \left(\mathbf{x} - \mathbf{x}_0\right) \right), \tag{3}$$

where *G*(**x**) is the grid cell response at the spatial location **x**, λ is the grid spacing, *θ* is the grid orientation, **x**_{0} = (*x*_{0}, *y*_{0}) represents the phase offset, and **u**(2*πi*/3 + *θ*) = (cos(2*πi*/3 + *θ*), sin(2*πi*/3 + *θ*)) is the unit vector with direction 2*πi*/3 + *θ*. *G*(·) described in Eq. (3) is normalized to have a maximum value of 1 and a minimum of 0. Because of the periodicity of the hexagonal pattern, the grid orientation, *θ*, lies in the interval [0, *π*/3), and the phase offsets along both the *x* and *y* axes are smaller than the grid spacing, i.e., 0 ≤ *x*_{0}, *y*_{0} < λ.

Since grid cells have different spacings, orientations and phases, Eq. (3) is used to generate diverse grid fields. The grid spacing, λ, starts at 28 cm (Hafting et al., 2005; Solstad et al., 2006) and increases by a geometric ratio of 1.42, consistent with experimental results (Stensola et al., 2012) and with the optimal grid scale derived in a mathematical study (Wei et al., 2015). For example, if there are *N*_{λ} different grid spacings, the spacings will be 28 cm, 28 × 1.42 = 39.76 cm, …, 28 × 1.42^{*N*_{λ}−1} cm. For each grid spacing, different values of the grid orientation, *θ*, are taken uniformly from the interval [0, 60°); for example, if there are 3 different grid orientations, the values will be 0°, 20° and 40°. The number of different orientations for each grid spacing is denoted *N*_{θ}. Furthermore, it is assumed here that there are *N*_{x} and *N*_{y} phase offsets along the x-axis and y-axis for each specific grid spacing and orientation. Similar to the grid orientation, phase values are taken uniformly from [0, λ); for example, if there are 2 phases along the x-axis, they will have the values *x*_{0} = 0 and λ/2. The resulting total number of grid cells, denoted *N*_{g}, is the product of the numbers of spacings, orientations and phases:

$$N_g = N_\lambda N_\theta N_x N_y. \tag{4}$$

Some examples of grid fields described by Eq. (3) are shown in Figure 1. These grid fields have diverse grid spacings, orientations and phases.

Since the environment is represented by a 32 × 32 grid, a 1024 × 1 vector, denoted by **g**, can be used to represent the firing field of the grid cell over the entire environment. For a given position **r** in the environment, the response of a grid cell is simply **g**^{T}**r**.
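The idealized grid field of Eq. (3), flattened into the 1024-vector **g**, can be sketched as follows (a minimal numpy sketch; the paper's code is in MATLAB, so names here are illustrative):

```python
import numpy as np

def grid_field(spacing, theta, phase, grid=32, size=1.0):
    """Hexagonal grid field of Eq. (3): sum of three cosine gratings at
    60-degree separations, normalized to [0, 1]. `phase` is (x0, y0),
    `spacing` and `size` are in meters."""
    xs = np.linspace(0, size, grid, endpoint=False)
    X, Y = np.meshgrid(xs, xs)
    k = 4 * np.pi / (np.sqrt(3) * spacing)   # wave number for spacing lambda
    G = np.zeros((grid, grid))
    for i in range(3):
        ang = 2 * np.pi * i / 3 + theta      # direction of grating i
        G += np.cos(k * (np.cos(ang) * (X - phase[0])
                         + np.sin(ang) * (Y - phase[1])))
    G = (G - G.min()) / (G.max() - G.min())  # normalize to [0, 1]
    return G.ravel()                          # the 1024-vector g

g = grid_field(spacing=0.28, theta=0.0, phase=(0.0, 0.0))
# the response of this grid cell at a one-hot location r is simply g @ r
```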

The numbers of different grid parameters (*N*_{λ}, *N*_{θ}, *N*_{x} and *N*_{y}) of the grid cells defined above are assigned different values to investigate the effect of the diversity of grid cells on the formation of place cells. Next we define grid cells with parameters that better capture the biologically observed variability, which will be used to investigate the robustness of the model.

As in the construction above, these grid cells are separated into discrete modules based on their grid spacings, which is supported by experimental evidence (Stensola et al., 2012). However, values of the grid spacing are now randomly sampled from normal distributions: there are four discrete modules for the grid spacing (λ) with means of 38.8 cm, 48.4 cm, 65 cm and 98.4 cm and the same standard deviation of 8 cm. For the grid orientation (*θ*), since grid cells in the same spacing module tend to have similar orientations (Stensola et al., 2012), grid cells in the four discrete modules have mean orientations 15°, 30°, 45° and 0°, with a common standard deviation of 3°. Because grid phases, (*x*_{0}, *y*_{0}), are random in each module (Stensola et al., 2012), the grid phase is randomly sampled from a uniform distribution. Stensola et al. (2012) showed that 87% of grid cells belong to the two modules with the smallest spacings. Therefore, when using realistic grid fields in this study, we place 43.5%, 43.5%, 6.5% and 6.5% of grid cells in the modules with mean spacings 38.8 cm, 48.4 cm, 65 cm and 98.4 cm, respectively (unless otherwise noted).

The firing field of these grid cells is taken to be the sum of spatial firing patterns centered at every vertex of the hexagonal pattern. The firing pattern with vertex (*x*_{v}, *y*_{v}) is described by a 2D Gaussian function of the following form (Neher et al., 2017):

$$G_v(x, y) = \gamma_v \exp\!\left( -\frac{(x - x_v)^2 + (y - y_v)^2}{2\sigma^2} \right), \tag{5}$$

where γ_{v} is the amplitude and *σ* determines the radius of the firing field. The amplitude at each vertex of the hexagonal pattern, γ_{v}, is chosen from a normal distribution with mean 1 and standard deviation 0.1, and σ is determined by the grid spacing, λ, with *σ* = 0.32λ (Neher et al., 2017). The grid field is then the sum of the firing fields (described by Eq. 5) at all vertices of the hexagonal pattern. The locations of the vertices are determined by the grid spacing, λ, the grid orientation, *θ*, and the grid phase, (*x*_{0}, *y*_{0}).
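A realistic grid field of this kind can be sketched by placing the Gaussian of Eq. (5) at each vertex of a hexagonal lattice. The numpy sketch below is illustrative (the paper's code is in MATLAB); the choice of lattice basis vectors and the margin of extra vertices are our assumptions.

```python
import numpy as np

def realistic_grid_field(spacing, theta, phase, grid=32, size=1.0, rng=None):
    """Grid field as a sum of 2D Gaussians (Eq. 5) at the vertices of a
    hexagonal lattice, with per-vertex amplitudes ~ N(1, 0.1) and
    sigma = 0.32 * spacing (Neher et al., 2017). Lengths are in meters."""
    rng = np.random.default_rng(rng)
    xs = np.linspace(0, size, grid, endpoint=False)
    X, Y = np.meshgrid(xs, xs)
    sigma = 0.32 * spacing
    # hexagonal lattice basis vectors, rotated by the grid orientation
    v1 = spacing * np.array([np.cos(theta), np.sin(theta)])
    v2 = spacing * np.array([np.cos(theta + np.pi / 3),
                             np.sin(theta + np.pi / 3)])
    F = np.zeros((grid, grid))
    n = int(np.ceil(size / spacing)) + 2   # enough vertices to cover the box
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            xv, yv = i * v1 + j * v2 + np.asarray(phase)
            gamma = rng.normal(1.0, 0.1)   # vertex amplitude gamma_v
            F += gamma * np.exp(-((X - xv) ** 2 + (Y - yv) ** 2)
                                / (2 * sigma ** 2))
    return F.ravel()
```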

### 2.4 Structure of the model

In this study a two-layer network is proposed to model the activities of grid cells (first layer) and place cells (second layer), respectively. Given a spatial location in the environment, grid cells respond according to their firing fields. Grid cell responses then feed into place cells and the grid-place network implements a sparse coding model with non-negative constraints. The model structure is shown in Figure 2.

Denote **G** as a 1024 × *N*_{g} matrix that represents the firing fields of the *N*_{g} grid cells in the network; i.e., each column of **G**, **g**_{i} (*i* = 1, 2, …, *N*_{g}), is a 1024 × 1 vector that represents the firing field of grid cell *i*. For a spatial location **r** in the environment, grid cell responses (firing rates), **s**_{g}, are given by **s**_{g} = **G**^{T}**r**. Place cell responses (firing rates), **s**_{p}, are computed by a sparse coding model for the grid-place network with non-negative connections **A**. Assume there are *N*_{p} place cells in the network. Then **A** is an *N*_{g} × *N*_{p} matrix and **s**_{p} is an *N*_{p} × 1 vector. Denote **u**_{p} as an *N*_{p} × 1 vector that represents the membrane potentials of the place cells. The model dynamics is given by

$$\tau \dot{\mathbf{u}}_p = \mathbf{A}^{\mathrm{T}} \mathbf{s}_g - \mathbf{u}_p - \mathbf{W} \mathbf{s}_p, \qquad \mathbf{s}_p = \max(\mathbf{u}_p - \beta, 0), \tag{6}$$

where *τ* is the time constant of the place cells, *β* is the threshold of the rectifying function of the firing rates, and **W** can be understood as the matrix of recurrent connections between place cells. In this paper, we take **W** = **A**^{T}**A** − **I**, where **I** is the *N*_{p} × *N*_{p} identity matrix. The dynamics of the place cells described in Eq. (6) is derived from the locally competitive algorithm (LCA) proposed by Rozell et al. (2008), which solves sparse coding efficiently. However, place cell responses, **s**_{p}, and the connection matrix, **A**, are taken to be non-negative in this study.
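The LCA dynamics of Eq. (6) can be integrated with the first-order Euler method, as in the training procedure below. This is an illustrative numpy sketch (the published code is in MATLAB); parameter values follow Section 2.5.

```python
import numpy as np

def place_cell_responses(A, s_g, beta=0.3, tau=0.01, dt=0.0008, steps=200):
    """Eq. (6) via Euler integration: membrane potentials u_p are driven by
    the feedforward input A^T s_g and inhibited through the recurrent
    matrix W = A^T A - I; rates are the rectified, thresholded potentials
    s_p = max(u_p - beta, 0), which keeps responses non-negative."""
    n_p = A.shape[1]
    W = A.T @ A - np.eye(n_p)
    u = np.zeros(n_p)
    s = np.zeros(n_p)
    for _ in range(steps):
        du = (A.T @ s_g - u - W @ s) / tau
        u = u + dt * du
        s = np.maximum(u - beta, 0.0)
    return s
```

With orthonormal columns in **A** the recurrent term vanishes and each potential relaxes toward its feedforward drive; competition appears only when place cells share grid inputs, which is what produces the sparse, localized responses.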

The code to run the model is available online (https://github.com/lianyunke/Learning-Place-Cells-from-Grid-Cells-Using-Nonnegative-Sparse-Coding).

#### 2.4.1 Learning rule

The learning rule for updating the connection strength matrix **A** is similar to that in previous studies of sparse coding (Olshausen and Field, 1997; Zhu and Rozell, 2013), as given by

$$\Delta \mathbf{A} = \eta \left( \mathbf{s}_g - \mathbf{A} \mathbf{s}_p \right) \mathbf{s}_p^{\mathrm{T}}, \tag{7}$$

where *η* is the learning rate. Elements of **A** are kept non-negative during training, i.e., an element is set to 0 if it becomes negative after applying the learning rule described in Eq. (7). Each column of **A** is then normalised to unit length, similar to previous studies (Lian et al., 2019; Olshausen and Field, 1997; Rolls et al., 2006; Zhu and Rozell, 2013).

The model dynamics and learning rule described in Eqs. (6) and (7) can be implemented in a biologically realistic network (Lian et al., 2019). Here we simply use the equations described above to demonstrate that the principle of non-negative sparse coding can learn both place cells in the hippocampus and granule cells in the dentate gyrus.
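The weight update of Eq. (7), together with the two constraints just described (clipping negative weights and normalising columns), can be sketched as follows (an illustrative numpy sketch; the paper's code is in MATLAB):

```python
import numpy as np

def update_weights(A, s_g, s_p, eta=0.03):
    """Eq. (7): Hebbian-like update driven by the reconstruction error
    (s_g - A s_p), followed by the paper's two constraints:
    clip negative weights to zero, then normalise each column of A
    to unit length."""
    A = A + eta * np.outer(s_g - A @ s_p, s_p)
    A = np.maximum(A, 0.0)                      # non-negativity
    norms = np.linalg.norm(A, axis=0)
    A = A / np.where(norms > 0, norms, 1.0)     # unit-length columns
    return A
```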

### 2.5 Training

Since the environment used in this study is a 1 m × 1 m box, the maximal grid spacing is taken to be smaller than 1 m, i.e., 1 ≤ *N*_{λ} ≤ 4, and all possible grid spacings are 28 cm, 39.76 cm, 56.46 cm and 80.17 cm. For the grid orientation, we have 1 ≤ *N*_{θ} ≤ 7. For the grid phase, we use the same number of phases in each direction, with a maximum of 5, i.e., 1 ≤ *N*_{x} = *N*_{y} ≤ 5.

There are 100 model cells in the second layer in our simulations, i.e., *N*_{p} = 100. The dynamical system described by Eq. (6) is implemented by the first-order Euler method, where the membrane time constant is *τ* = 10 ms, consistent with the physiological value (Dayan and Abbott, 2001), the threshold is *β* = 0.3, and there are 200 integration time steps with a time step of 0.8 ms, which we found to provide numerically stable solutions. We use 20,000 epochs in our training. In each epoch, a random location, **r**, is presented to the grid cells, the model responses are computed using Eq. (6), and the matrix of connection strengths, **A**, is updated by Eq. (7). The learning rate, *η*, is chosen to be 0.03. These parameters were chosen to ensure a stable solution in a reasonable time, but the results were found to be robust to moderate changes of these parameters.

### 2.6 Recovering the firing fields of model cells

After training, we use the method of reverse correlation to recover the firing fields, denoted **F**, of the model cells. We present *K* random locations, **r**_{1}, ···, **r**_{K}, to the model, compute according to Eq. (6) the responses of a model cell, *s*_{1}, ···, *s*_{K}, and then compute the firing field, **F**, of this model cell by

$$\mathbf{F} = \frac{1}{K} \sum_{k=1}^{K} s_k \mathbf{r}_k. \tag{8}$$
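Because the location vectors are one-hot, the reverse correlation of Eq. (8) simply accumulates each cell's response at the visited locations. A minimal numpy sketch (illustrative names; the paper's code is in MATLAB):

```python
import numpy as np

def recover_firing_field(respond, K=10000, grid=32, rng=0):
    """Eq. (8): average the one-hot location vectors weighted by the
    cell's response. `respond(r)` maps a 1024-vector location to a
    scalar firing rate (e.g. one entry of s_p from Eq. 6)."""
    rng = np.random.default_rng(rng)
    n_loc = grid * grid
    F = np.zeros(n_loc)
    for _ in range(K):
        r = np.zeros(n_loc)
        r[rng.integers(n_loc)] = 1.0   # random location, as in the text
        F += respond(r) * r
    return F / K
```

For a cell with constant unit response, **F** reduces to the empirical frequency of visited locations, so its entries sum to 1.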

#### 2.6.1 Fitting firing fields to 2D Gaussian functions

The recovered firing field, **F** (recovered by Eq. 8), is fitted by a 2D Gaussian function *Q*(*x*, *y*) of the form

$$Q(x, y) = \gamma \exp\!\left( -\frac{(x - x_c)^2 + (y - y_c)^2}{2\sigma^2} \right), \tag{9}$$

where γ is the amplitude, *σ* is the breadth of the firing field, and (*x*_{c}, *y*_{c}) is the center of the 2D Gaussian function. The built-in MATLAB (version R2020a) function *lsqcurvefit* is used to fit these parameters. The fitting error is defined as the square of the ratio between the fitting residual and the firing field.

#### 2.6.2 Selecting single-location firing field

Some firing fields of model cells have multiple firing locations and background noise. A firing field is categorised as a place field if the following two criteria are satisfied: (1) the fitting error is smaller than 15%; and (2) the breadth, σ, is larger than 5 cm. These two rules exclude any cells with no obvious firing field or with a multiple-location firing field. A model cell is called a *place cell* if its firing field meets these two criteria.

### 2.7 Measuring the uniformity of place cell representation

For place cells with a single-location firing field, the field center (*x*_{c}, *y*_{c}) fitted by Eq. (9) indicates the spatial location to which the place cell responds. We measured how well these place cells represent the entire environment using two measures.

The first measure is the *distance to place field*, *d*_{PF}, which is the distance between each spatial location (*p*_{x}, *p*_{y}) in the environment and the nearest place-field center:

$$d_{\mathrm{PF}}(p_x, p_y) = \min_{j} \sqrt{(p_x - x_j)^2 + (p_y - y_j)^2}, \tag{10}$$

where (*x*_{j}, *y*_{j}) are the place-field centers. If the distance to a place field is large for a location, there are no place fields near that location, so the distribution of this measure tells us how well the place fields tile the entire spatial environment. When all spatial locations have small values of *d*_{PF}, the entire environment is tiled by the place cells.

The second measure is the *K-nearest-distance*, *d*_{Knd}. For the centers of all place cells with a single-location firing field, the K-nearest-distance of each center (*x*_{j}, *y*_{j}) is defined as the maximal distance among its K nearest centers:

$$d_{\mathrm{Knd}}(x_j, y_j) = \max \left[ \min^{K}_{\,i \neq j} \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \right], \tag{11}$$

where min^{K} returns the set of *K* smallest values. When *K* = 2, the distribution of the K-nearest-distance, *d*_{Knd}, over all centers shows the uniformity of the place cells in the environment. However, this measure alone provides little information about the coverage of the place cells, because place cells with a small K-nearest-distance might lie only in a small sub-region of the environment, which would give a small value of this measure while not representing a good tiling of the entire environment.

The distance to place field, *d*_{PF}, together with the K-nearest-distance, *d*_{Knd}, provides quantitative measures of how well the place cells code for the spatial environment. Small values of both measures indicate that the place cells tile the entire environment fairly evenly. For example, if 100 place cells are organised on a 10 × 10 grid that evenly tiles the 1 m × 1 m environment, the K-nearest-distance will be 100/(10 − 1) ≈ 11.11 cm for each place cell, and the distance to place field for every location will be at most half the diagonal of a lattice cell, 11.11/√2 ≈ 7.86 cm.
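The two measures, and the 10 × 10 lattice example above, can be sketched in a few lines of numpy (illustrative names; the paper's code is in MATLAB):

```python
import numpy as np

def k_nearest_distance(centers, K=2):
    """d_Knd of Eq. (11): for each place-field center, the largest of the
    distances to its K nearest other centers."""
    centers = np.asarray(centers, dtype=float)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude the center itself
    return np.sort(d, axis=1)[:, :K].max(axis=1)

def distance_to_place_field(points, centers):
    """d_PF of Eq. (10): distance from each location to the nearest center."""
    d = np.linalg.norm(np.asarray(points, dtype=float)[:, None, :]
                       - np.asarray(centers, dtype=float)[None, :, :], axis=-1)
    return d.min(axis=1)

# Worked example from the text: 100 centers on a 10 x 10 lattice in a
# 1 m x 1 m box give d_Knd = 100/9 cm for every cell.
xs = np.linspace(0, 100, 10)             # centimeters
centers = np.array([(x, y) for x in xs for y in xs])
print(k_nearest_distance(centers).max())  # ≈ 11.11 cm
```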

## 3 Results

### 3.1 The model can learn single-field place-like cells if grid cells are diverse

Our simulations show that the non-negative sparse coding model proposed here can learn single-location place cells given diverse grid cells as input. Grid cells with 4 different spacings (*N*_{λ} = 4), 6 different orientations (*N*_{θ} = 6) and 5 different phases along both the *x* and *y* axes (*N*_{x} = *N*_{y} = 5) are used here, i.e., there are 600 grid cells in total (*N*_{g} = *N*_{λ}*N*_{θ}*N*_{x}*N*_{y} = 600).

#### 3.1.1 Learned place cells are localized and tile the whole environment

The place fields, **F** defined by Eq. (8), of the 100 model cells are shown in Figure 3A. All model cells shown here have a single-location firing field, and different cells have spatially distinct firing fields. After learning, the place field of each cell is fitted to a 2D Gaussian function. All cells have a small fitting error (< 3%) and meet the criteria of a single-location place field (as defined in Materials and Methods 2.6.2). The centers of all the place cells are displayed together in the 1 m × 1 m spatial environment represented by a 32 × 32 pixel-like image (Figure 3B), which shows that the centers of the 100 place cells tile the entire environment without any overlap. In addition, the box plot in Figure 3C shows that any location within the space is within a distance of no more than 8.2 cm from the nearest place field. The histogram of the K-nearest-distance of all 100 place cells is displayed in Figure 3D, which shows that the distribution is centered around a mean value of 10.70 cm with standard deviation 0.75 cm. Given that the learned place cells have a mean radius of 8.69 cm, Figure 3B, C and D illustrate that the learned place cells tile the whole environment rather evenly, i.e., the model learned by non-negative sparse coding can give an accurate neural representation of spatial locations in the environment.

#### 3.1.2 The competition introduced by sparse coding provides the inhibition for place cells

The connectivity profile between the 600 grid cells and 100 place cells is plotted in Figure 4A, which shows that each place cell selects particular grid cells with different weights. As a result, the overall feedforward connection from the spatial environment to the place cells, namely the matrix product **GA**, has the spatial structure plotted in Figure 4B, which shows that each place cell is selective to one spatial location, similar to the recovered firing fields (Figure 3A). However, **GA** has strong average offsets, which can be seen from the grey background in Figure 4B. The model of place cells proposed by Solstad et al. (2006) has an inhibition term to balance the excitation so that the place fields are responsive to a single location. In the model of place cells proposed by Franzius et al. (2007b), an offset constant is added and the signs of model units are adjusted in order to achieve single-location place fields. In contrast, comparing Figure 3A and Figure 4B, we can conclude that the network implemented by sparse coding naturally introduces the competition needed to inhibit place cells such that they have firing fields similar to those found in experiments. As stated in Materials and Methods, the sparse coding model used in this paper can be implemented by a biologically realistic network (Lian et al., 2019), suggesting that the principle used here is a potential mechanism of the navigational system of the brain. The results presented in this study are not sensitive to different parameter values as long as there is diversity in spacing, orientation and phase; even 81 grid cells are sufficient for the model to learn place cells that tile the whole environment (see S1 Fig).

In addition, the principle of sparse coding forces the model to learn an efficient representation of the grid cell input. The average percentage of model cells active in response to a spatial location is 5.59%. This sparse population activity is consistent with the experimental study showing sparse ensemble activity in the macaque hippocampus (Skaggs et al., 2007).

### 3.2 The model can learn cells with multiple firing locations similar to dentate gyrus cells

In this section, we show that a lack of diversity in any of the grid parameters prevents the model from learning single-location place cells; instead, cells with multiple firing locations start to emerge, i.e., the same model can learn cells similar to dentate gyrus cells, which have multiple firing locations.

#### 3.2.1 When grid cells are less diverse

A lack of diversity in grid spacing results in the emergence of multiple firing locations in the model cells, as illustrated in Figure 5A compared with Figure 3A. Similarly, a lack of diversity in grid orientation or grid phase also causes the model to learn more cells with multiple firing locations (Figure 5B and C). These model cells are similar to the dentate gyrus cells found to have multiple firing locations in experimental studies (Jung and McNaughton, 1993; Leutgeb et al., 2007).

Recall that the principle of sparse coding finds a linear representation of the input, namely the grid cell responses. Our results suggest that grid cells with less diversity in grid parameters are not sufficient to well represent the whole environment, so that the system gives an ambiguous representation of the spatial location. Therefore, the diverse grid cells found in the MEC are crucial to the emergence of hippocampal place cells. Similar to MEC-hippocampus connections, there are also feedforward connections from MEC to the dentate gyrus, so the lack of diversity in afferent grid cells may be one possible factor explaining how cells with multiple firing locations emerge in the dentate gyrus.

#### 3.2.2 When there are fewer model cells

Simulations also show that a smaller number of model cells, *N*_{p}, can cause the model to learn cells with multiple firing locations, even when the grid cells are diverse. This is illustrated in Figure 6, which shows the firing fields of the model for different numbers of model cells in the network. The values of the remaining parameters are exactly the same as those used in Figure 3, which shows well-learned place cells, except for the number of model cells, *N*_{p}. Figure 6 demonstrates that as *N*_{p} decreases, cells with more firing locations start to emerge: the fewer the cells, the larger the proportion of cells with multiple firing locations. When *N*_{p} = 10, all model cells have more than one firing location. When *N*_{p} = 20, five cells are categorised as place cells. When *N*_{p} is larger than 30, almost all cells are found to have single-location firing fields.

Consequently, although a network with diverse grid cells can represent the spatial environment well, having fewer model cells does not result in an accurate representation of spatial location by single-location place cells. This suggests, more generally, that cells with multiple firing locations may be generated by having a small number of cells in the population that implements the sparse coding.

### 3.3 The spatial resolution of the model increases as more place cells are utilised to represent the environment

As discussed above, when *N*_{p} is larger than 30, almost all cells have a single-location firing field. In addition, the learned place cells tile the whole environment rather well, with small values of the K-nearest-distance, Eq. (11). Furthermore, as *N*_{p} increases, the mean K-nearest-distance and field breadth decrease (Figure 7), indicating that the spatial resolution of the neural representation by place cells improves.

### 3.4 Model results are robust to realistic grid fields

When more realistic grid fields (Materials and Methods, 2.3) are used that incorporate the observed biological variability, the model can still learn a robust representation of the spatial location of the entire environment, as shown in Figure 8.

Figure 8A shows that when the grid fields are diverse in spacing, orientation and phase, each model cell learns a single-location firing field such that the centers of all place fields tile the entire spatial environment rather evenly. The box plot of the distance to place field shows fairly small values, indicating that the whole environment is covered well. The distribution of the K-nearest-distance has mean 10.71 cm and standard deviation 0.72 cm, qualitatively consistent with the results shown in Figure 3D (mean 10.70 cm and standard deviation 0.75 cm). Therefore, the learned place cells evenly tile the entire environment.

Figure 8B shows that realistic grid fields with less diversity lead the model to learn cells with multiple firing locations. The left plot displays the firing fields of 100 model cells with less diversity in grid spacing: the standard deviation of the spacing in the four modules is set to 0 cm instead of 8 cm, while the standard deviation of the orientation is still 3° and the phase is random. The middle plot shows firing fields when there is less diversity in grid orientation, where the standard deviation of the spacing is 4 cm, the standard deviation of the orientation is 0° and the phase is random. The right plot is for the case of less diversity in grid phase, where the standard deviation of the spacing is 8 cm, the standard deviation of the orientation is 3° and the phase is (0, 0) for all grid cells. Though the model learns cells with multiple firing locations when there is less diversity in the grid parameters, the remaining place cells still tile the entire environment (see S3 Fig).

Figure 8C shows that having fewer model cells makes the model learn cells with multiple firing locations. The six plots separated by dashed lines in Figure 8C represent the firing fields of model cells when the number of model cells, *N*_{p}, is 10, 20, 30, 40, 50, and 60, respectively. When *N*_{p} = 10 and 20, there is no place cell. When *N*_{p} ≥ 40, almost all model cells are place cells.

Similar to Figure 7, the neural representation of the spatial environment has better resolution (smaller radius and smaller K-nearest-distance) as *N*_{p} increases, as seen from Figure 8D.

### 3.5 The model can generate large place fields

As discussed in a previous study (Neher et al., 2017), most existing models of place cells cannot produce large place fields, such as CA3 place fields with sizes around 1225 cm^{2}.

The model proposed here can generate large place fields by simply having grid cells with large grid spacings as the input to model cells.

In this part of the study, only grid cells with grid spacings in the fourth module are used; i.e., the grid spacing is sampled from the normal distribution with mean 98.4 cm and standard deviation 8 cm, grid orientation is sampled from the distribution with mean 0^{°} and standard deviation 3^{°}, and grid phases are randomly chosen from a uniform distribution. Similarly, 600 grid cells are used. The number of model cells, *N*_{p}, is set to 20.
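The parameter sampling described above can be sketched as follows. The hexagonal field itself is built here from the common three-cosine idealization of a grid field, which is an assumption for illustration and not necessarily the exact construction of Materials and Methods 2.3; the arena size used for the phase range is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_grid_params(n, spacing_mean=98.4, spacing_std=8.0,
                       orient_mean=0.0, orient_std=3.0, box=140.0):
    """Sample grid spacing (cm), orientation (deg -> rad) and 2-D phase
    as described for the fourth module; `box` is an assumed arena size."""
    spacing = rng.normal(spacing_mean, spacing_std, n)
    orient = np.deg2rad(rng.normal(orient_mean, orient_std, n))
    phase = rng.uniform(0.0, box, (n, 2))
    return spacing, orient, phase

def grid_field(x, y, spacing, orient, phase):
    """Idealized hexagonal field: sum of three cosine gratings 60 degrees
    apart, rescaled from [-1.5, 3] to [0, 1]."""
    k = 4.0 * np.pi / (np.sqrt(3.0) * spacing)   # wave number for spacing
    g = 0.0
    for i in range(3):
        th = orient + i * np.pi / 3.0
        g = g + np.cos(k * ((x - phase[0]) * np.cos(th)
                            + (y - phase[1]) * np.sin(th)))
    return (g + 1.5) / 4.5
```

Sampling 600 cells from this module and evaluating `grid_field` on a spatial lattice yields the large-spacing input population used in this part of the study.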

After learning, 18 out of 20 model cells satisfy the definition of place cells. Figure 9A shows that large place fields can emerge after learning. Figure 9C shows that these place cells have radii ranging from 18.71 cm to 21.22 cm (mean 19.68 cm and standard deviation 0.75 cm), so the sizes of the place fields range from 1099.76 cm^{2} to 1414.62 cm^{2}. Figure 9B, D and E show that these 18 large place fields cover the entire environment rather evenly.

Altogether, the model can learn large place fields if the afferent grid cells have large grid spacings, consistent with experimental evidence that the sizes of grid fields and place fields increase along the dorsal-ventral axis (Fyhn et al., 2007; Kjelstrup et al., 2008) and with the topographic entorhinal-hippocampal projections along this axis (Dolorfo and Amaral, 1998).

### 3.6 Weakly-tuned MEC cells are sufficient for place cells to emerge

Recent experimental evidence shows that the emergence of hippocampal place cells happens earlier in development than grid cells (Langston et al., 2010; Wills et al., 2010). Here we show that even weakly-tuned MEC cells can provide sufficient spatial information for the emergence of place cells which have an accurate representation of spatial location. This suggests that place cells can emerge throughout the development of MEC grid cells, from the initial weakly-tuned spatial pattern to the fully developed hexagonal grid pattern.

Weakly-tuned cells are observed to be abundant in the MEC (Zhang et al., 2013). The weakly-tuned field is generated in the simulation by first assigning a random activation, sampled from a uniform distribution between 0 and 1, to each location, then smoothing the map with a Gaussian kernel with standard deviation 6 cm, and normalizing the map such that the values are between 0 and 1 (Neher et al., 2017). Similar to Figures 3 and 8A, 600 weakly-tuned MEC cells and 100 model cells are used. 16 examples of weakly-tuned fields are shown in Figure 10A.
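The three-step generation procedure above (random activation, 6 cm Gaussian smoothing, normalization to [0, 1]) can be sketched directly; the arena size and 1 cm spatial resolution used here are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def weakly_tuned_field(shape=(100, 100), sigma_cm=6.0, cm_per_bin=1.0,
                       rng=None):
    """Weakly-tuned MEC field (after Neher et al., 2017): uniform noise
    in [0, 1], Gaussian smoothing with 6 cm standard deviation, then
    rescaling to [0, 1]. Arena size/resolution are assumed values."""
    rng = rng if rng is not None else np.random.default_rng()
    field = rng.uniform(0.0, 1.0, shape)                # random activation
    field = gaussian_filter(field, sigma=sigma_cm / cm_per_bin,
                            mode='nearest')             # smooth the map
    field -= field.min()                                # normalize so that
    field /= field.max()                                # values lie in [0, 1]
    return field
```

Drawing 600 such fields (one per MEC cell) produces irregular, spatially correlated maps like the 16 examples in Figure 10A.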

Though the fields of weakly-tuned MEC cells are very different from the periodic patterns of grid cells, they can nevertheless provide sufficient spatial information such that the model based on sparse coding can decode the MEC cell responses into an accurate representation of spatial location. Figure 10B shows the firing fields of the learned place cells. Figure 10C, D and E show that the centers of the place fields evenly tile the entire spatial environment.

Compared with Figures 3 and 8A, using weakly-tuned MEC cells instead of grid cells results in learning a hippocampal place-map with lower resolution. The mean radius of place fields in Figure 10B (11.45 cm) is larger than in Figures 3 and 8A (8.92 cm and 8.69 cm, respectively). Furthermore, the K-nearest-distance in Figure 10 (mean 12.32 cm and standard deviation 7.10 cm) is also larger, compared with Figure 3 (mean 10.70 cm and standard deviation 0.75 cm) and Figure 8A (mean 10.71 cm and standard deviation 0.72 cm) when grid cells are used. The large standard deviation in Figure 10D suggests that the irregular fields of weakly-tuned MEC cells lead to a less even tiling by place cells. The average active rate of the model cells is 30.68%, much larger than when grid cells are used (5.59%).

Therefore, the model suggests that place cells can emerge earlier than grid cells during development, in part because the neural system can learn a hippocampal map even before the hexagonal spatial fields are well developed.

Sparse coding can learn place cells even though the input cells (MEC cells in this paper) are only weakly tuned to the spatial environment. Input cells with stronger spatial selectivity provide more spatial information, so that unique place fields can be decoded by sparse coding. Barry and Burgess (2007) used a learning model to learn place cells from the responses of boundary vector cells, which are selective to boundaries of the environment at particular angles and distances. Their result can be regarded as a special case of the results presented in this paper, where boundary vector cells are simply input cells with stronger spatial tuning.

Our results also suggest that these weakly-tuned MEC cells can arise from any form of sensory input, such as visual or auditory input, that encodes spatial information. For example, the visual input at different locations of the environment carries information about spatial location, and consequently the afferent visual information to MEC can lead to spatially tuned MEC cells. Moreover, the principle of sparse coding can cause the MEC cells to generate a place map. The conjecture proposed here can explain a recent experimental study showing that place cell firing mainly reflects visual inputs (Chen et al., 2019) and another experimental study suggesting that the homing ability of mice in darkness may not require accurate grid cell firing (Chen et al., 2016).

## 4 Discussion

In this paper, we applied sparse coding with non-negative constraints to a hierarchical model of grid-to-place cell formation. Our results show that sparse coding can learn an efficient place code that represents the entire environment when grid cells are diverse in grid spacing, orientation and phase. However, a lack of diversity in grid cells or fewer model cells leads to the emergence of cells with multiple firing locations, like those found in the dentate gyrus. In addition, weakly-tuned Medial Entorhinal Cortex (MEC) cells are sufficient for sparse coding to learn place cells, suggesting that place cells can emerge even when grid cells have not been fully developed.
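To make the summary above concrete, the following is a minimal sketch of non-negative sparse coding: projected gradient descent on the objective ||**x** − **A****s**||² + λ·Σs with both the responses **s** and the weights **A** constrained to be non-negative. The inference/learning rules and all hyperparameters here are illustrative placeholders, not the paper's implementation (given in Materials and Methods).

```python
import numpy as np

def nn_sparse_code(X, n_cells, lam=0.5, lr_s=0.01, lr_a=0.01,
                   n_inner=100, n_epochs=50, rng=None):
    """Minimal non-negative sparse coding sketch: rectified gradient
    inference of responses s, then a Hebbian-like weight update on A,
    with columns of A kept non-negative and unit-norm."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n_input, n_samples = X.shape
    A = rng.uniform(0.0, 1.0, (n_input, n_cells))     # input-to-place weights
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    for _ in range(n_epochs):
        x = X[:, rng.integers(n_samples)]             # one training input
        s = np.zeros(n_cells)
        for _ in range(n_inner):                      # infer sparse responses
            s = np.maximum(0.0, s + lr_s * (A.T @ (x - A @ s) - lam))
        A = A + lr_a * np.outer(x - A @ s, s)         # weight update
        A = np.maximum(0.0, A)                        # non-negativity
        A /= np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1e-12)
    return A
```

In the grid-to-place setting, each column of `X` would hold the grid (or weakly-tuned MEC) cell responses at one location, and each column of `A` would converge to the input weights of one model cell.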

### 4.1 Comparison with other learning models

Our work differs significantly from previous studies on learning place cells from grid cell input (Franzius et al., 2007b; Neher et al., 2017; Rolls et al., 2006). First, we systematically investigate the influence of the diversity of grid cells upon the formation of place cells. Second, we demonstrate that the learned place cells can represent the entire spatial environment well. Third, the same model can produce cells with one firing location, multiple firing locations, or a large receptive field, which can account for the emergence of a range of different observed hippocampal cell types. Fourth, we demonstrate that weakly-tuned MEC cells can also provide sufficient spatial information for the emergence of place cells after learning. Most importantly, all the results presented in this paper are generated by the same model, namely sparse coding with a non-negative constraint.

### 4.2 Properties of grid cells that are necessary for the emergence of place cells

Though the model based on sparse coding can learn place cells when only weakly-tuned MEC cells are used, this does not imply that grid cells are unnecessary for their formation. The active firing rate of model cells when weakly-tuned MEC cells are used as input is much larger than when grid cells are used, suggesting that grid cells support a more efficient code and thereby reduce the energy required by the neural system. Fiete et al. (2008) proposed that grid cells with different spacings and phases together form a residue number system that efficiently encodes spatial location. In addition, the triangular lattice of the grid pattern is known to be the solution to the optimal circle packing problem (Thue, 1892), and the geometric scaling of grid spacings can represent the spatial environment efficiently (Wei et al., 2015).

### 4.3 Underlying neural circuits

Our study examines the extent to which sparse coding is an underlying principle of the navigational system of the brain. However, the current model implies no specific neural circuit for the implementation of sparse coding; rather, sparse coding is one of the principles underlying the formation of the neural circuits. Neurophysiological and anatomical studies suggest that the entorhinal cortex and the hippocampus interact via a loop (Tamamaki, 1997; Tamamaki and Nojyo, 1995; Witter et al., 2014). Therefore, feedforward connections from the entorhinal cortex to the hippocampus, recurrent connections within the hippocampus, and feedback connections from the hippocampus to the entorhinal cortex all play an important role, though their specific contributions to the overall function of the network have not yet been fully uncovered. The model proposed in this study does not rule out any of these network structures, as sparse coding can be implemented either in a feedforward network with recurrent connections (Zylberberg et al., 2011) or in a network with feedforward-feedback loops (Lian et al., 2019).

### 4.4 Future work

The current study does not propose a specific biological neural circuit for implementing sparse coding in the entorhinal-hippocampal region; this is the subject of ongoing work. Such a model of these neural circuits would need to take into account the experimentally known networks in this area. In addition, the model here uses prescribed grid cells: we did not attempt to describe how grid cells emerge, but rather assumed that grid cells provide an efficient representation of the environment. It would be interesting to also investigate the role of sparse coding in how grid cells themselves emerge. It is hoped that such future work, incorporating the developmental processes of both grid cells and place cells, will provide further insights into how the navigational system of the brain works. Sparse coding represents just one of a number of possible mechanisms that shape network structures, and much remains to be explored in incorporating other mechanisms, such as those associated with the complexities of metabotropic receptor effects, as discussed in Hasselmo et al. (2020).

## 5 Conclusion

In this study we examined the role of non-negative sparse coding upon hippocampal place cells that receive input from MEC grid cells. The model showed that both place fields and cells with multiple locations can be learned, depending upon specific network parameters. In addition, the learned place cells give an accurate representation of spatial location. Furthermore, weakly-tuned MEC cells are sufficient to drive hippocampal cells to learn place fields. This study elucidates the role of sparse coding as an important mechanism in the navigational system of the brain.

## 6 Supporting information

## 7 Acknowledgments

This work received funding from the Australian Government, via grant AUSMURIB000001 associated with ONR MURI grant N00014-19-1-2571.

## Footnotes

↵* yanbo.lian{at}unimelb.edu.au, aburkitt{at}unimelb.edu.au