## Abstract

Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain.

## 1 Introduction

A salient and unexplained feature of many neurons is that their receptive fields are localized in the parameter space they represent. For example, a hippocampal place cell is active in a particular spatial location [1], the response of a V1 neuron is localized in visual space and orientation [2], and the response of an auditory neuron is localized in the sound frequency space [3]. In all these examples, receptive fields of neurons from the same brain area tile (with overlap) low-dimensional manifolds.

Localized receptive fields are shaped by neural activity as evidenced by experimental manipulations in developing and adult animals [4, 5, 6, 7]. Activity influences receptive fields via modification, or learning, of synaptic weights which gate the activity of upstream neurons channeling sensory inputs. To be biologically plausible, synaptic learning rules must be physically local, i.e., the weight of a synapse depends on the activity of only the two neurons it connects, pre- and post-synaptic.

In this paper, we demonstrate that biologically plausible neural networks can learn manifold-tiling localized receptive fields from the upstream activity in an unsupervised fashion. Because analyzing the outcome of learning in arbitrary neural networks is often difficult, we take a normative approach, Fig. 1. First, we formulate an optimization problem by postulating an objective function and constraints, Fig. 1. Second, for inputs lying on a manifold, we derive an optimal offline solution and demonstrate analytically and numerically that the receptive fields are localized and tile the manifold, Fig. 1. Third, from the same objective, we derive an online optimization algorithm which can be implemented by a biologically plausible neural network, Fig. 1. We expect this network to learn localized receptive fields, the conjecture we confirm by simulating the network numerically, Fig. 1.

Optimization functions considered here belong to the family of similarity-preserving objectives which dictate that similar inputs to the network elicit similar outputs [8, 9, 10, 11, 12]. In the absence of sign constraints, such objectives are provably optimized by projecting inputs onto the principal subspace [13, 14, 15], which can be done online by networks of linear neurons [8, 9, 10]. Constraining the sign of the output leads to networks of rectifying neurons [11] which have been simulated numerically in the context of clustering and feature learning [11, 12, 16, 17], and analyzed in the context of blind source extraction [18]. In the context of manifold learning, sign-constrained optimal solutions of similarity-preserving objectives have been missing because optimization of existing similarity-preserving objectives is challenging.

Our main contributions are:

Analytical sign-constrained optimization of similarity-preserving objectives for input originating from symmetric manifolds.

Derivation of biologically plausible similarity-preserving, manifold-learning neural networks.

Offline and online algorithms for manifold learning of arbitrary manifolds.

The paper is organized as follows. In Sec. 2, we derive a simplified version of a similarity-preserving objective with sign-constrained output. Much of our following analysis can be carried over to other similarity-preserving objectives but with additional technical considerations. In Sec. 3, we derive a necessary condition for the optimal solution. In Sec. 4, we consider solutions for the case of symmetric manifolds. In Sec. 5, we derive an online optimization algorithm and a neural network. In Sec. 6, we present the results of numerical experiments.

## 2 A Simplified Similarity-preserving Objective Function

To introduce similarity-preserving objectives, let us define our notation. The input to the network is a set of vectors, **x**_{t} ∈ ℝ^{n}, *t* = 1,…,*T*, with components represented by the activity of *n* upstream neurons at time *t*. In response, the network outputs an activity vector, **y**_{t} ∈ ℝ^{m}, *t* = 1,…,*T*, with *m* being the number of output neurons.

Similarity preservation postulates that similar input pairs, **x**_{t} and **x**_{t′}, evoke similar output pairs, **y**_{t} and **y**_{t′}. If similarity of a pair of vectors is quantified by their scalar product and the distance metric is Euclidean, we have

min_{**Y** ∈ ℝ^{m×T}} ║**X**^{⊤}**X** − **Y**^{⊤}**Y**║²_{F},   (1)

where we introduced a matrix notation **X** ≡ [**x**_{1},…, **x**_{T}] ∈ ℝ^{n×T} and **Y** ≡ [**y**_{1},…, **y**_{T}] ∈ ℝ^{m×T}, with *m* < *n*. Such an optimization problem is solved offline by projecting the input data onto the principal subspace [13, 14, 15]. The same problem can be solved online by a biologically plausible neural network performing global linear dimensionality reduction [8, 10].
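This offline solution can be illustrated with a short sketch (our own, assuming nothing beyond the objective of Eq. (1)): the optimal **Y** is built from the top-*m* eigenpairs of the input Gramian and, e.g., beats a random rank-*m* Gramian.

```python
import numpy as np

def nsm_linear_offline(X, m):
    """Optimal Y for min_Y ||X^T X - Y^T Y||_F^2 with Y in R^{m x T}:
    project onto the top-m principal subspace of the Gramian D = X^T X."""
    D = X.T @ X
    w, V = np.linalg.eigh(D)           # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:m]      # pick the top-m eigenpairs
    return np.sqrt(np.maximum(w[idx], 0.0))[:, None] * V[:, idx].T

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))      # n = 5 input channels, T = 100 samples
Y = nsm_linear_offline(X, m=2)
D = X.T @ X
err_opt = np.linalg.norm(D - Y.T @ Y)

# a random rank-2 Gramian gives a larger similarity-matching error
Yr = rng.standard_normal((2, 100))
err_rand = np.linalg.norm(D - Yr.T @ Yr)
assert err_opt < err_rand
```

The eigendecomposition route is exactly the principal-subspace projection referenced above; only the random-baseline comparison is our illustrative addition.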

We will see below that nonlinear manifolds can be learned by constraining the sign of the output and introducing a similarity threshold, *α*:

min_{**Y** ≥ 0} ║**X**^{⊤}**X** − α**E** − **Y**^{⊤}**Y**║²_{F},   (2)

where **E** is a matrix of all ones. In the special case *α* = 0, Eq. (2) reduces to the objective in [11, 19, 18].

Intuitively, Eq. (2) attempts to preserve similarity for similar pairs of input samples but orthogonalizes the outputs corresponding to dissimilar input pairs. Indeed, if the input similarity of a pair of samples *t, t*′ is above a specified threshold, **x**_{t} · **x**_{t′} > *α*, the output vectors **y**_{t} and **y**_{t′} would prefer to have **y**_{t} · **y**_{t′} ≈ **x**_{t} · **x**_{t′} − *α*, i.e., they would be similar. If, however, **x**_{t} · **x**_{t′} < *α*, the lowest value of **y**_{t} · **y**_{t′} for **y**_{t}, **y**_{t′} ≥ 0 is zero, meaning that they would tend to be orthogonal, **y**_{t} · **y**_{t′} = 0. As **y**_{t} and **y**_{t′} are nonnegative, to achieve orthogonality, the output activity patterns for dissimilar inputs would have non-overlapping sets of active neurons. In the context of manifold representation, Eq. (2) strives to preserve, in the **y**-representation, the local geometry of the input data cloud in **x**-space and lets the global geometry emerge out of the nonlinear optimization process.
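The clipping behavior described above can be checked with a scalar sketch (illustrative only): for a single pair, Eq. (2) penalizes (**x**_{t} · **x**_{t′} − α − q)², and minimizing over a nonnegative output similarity q yields a ReLU of the thresholded input similarity.

```python
def best_output_similarity(input_sim, alpha):
    """Minimizer of (input_sim - alpha - q)^2 over q >= 0: the ReLU."""
    return max(input_sim - alpha, 0.0)

# similar pair: output similarity tracks input similarity minus alpha
assert abs(best_output_similarity(0.9, 0.5) - 0.4) < 1e-12
# dissimilar pair: outputs are orthogonalized (zero similarity)
assert best_output_similarity(0.2, 0.5) == 0.0
```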

As the difficulty in analyzing Eq. (2) is due to the term quartic in **Y**, we go on to derive a simpler objective function, quadratic in **Y**, that produces very similar outcomes. To this end, we first introduce an additional power constraint, Tr **Y**^{⊤}**Y** ≤ *k*, as in [9, 11]. We will call the input-output mapping obtained by this procedure NSM-0, with NSM standing for **N**onnegative **S**imilarity-preserving **M**apping:

min_{**Y** ≥ 0, Tr **Y**^{⊤}**Y** ≤ k} −2 Tr((**X**^{⊤}**X** − α**E**)**Y**^{⊤}**Y**) + Tr((**Y**^{⊤}**Y**)²),   (NSM-0)

where we expanded the square and kept only the **Y**-dependent terms.

We can redefine the variables and drop the last term in a certain limit (see the Supplementary Material, Sec. A.1, for details), leading to the optimization problem we call NSM-1:

min_{**Y** ≥ 0, diag(**Y**^{⊤}**Y**) ≤ β**I**} −Tr((**X**^{⊤}**X** − α**E**)**Y**^{⊤}**Y**).   (NSM-1)

Intuitively, just like Eq. (2), NSM-1 preserves similarity of nearby input data samples while orthogonalizing output vectors of dissimilar input pairs. Conceptually, this type of objective has proven successful for manifold learning [20]. Indeed, a pair of samples *t, t*′ with **x**_{t} · **x**_{t′} > *α* would tend to have **y**_{t} · **y**_{t′} as large as possible, albeit with the norm of the vectors controlled by the constraint ║**y**_{t}║² ≤ *β*. Therefore, when the input similarity for the pair is above a specified threshold, the vectors **y**_{t} and **y**_{t′} would prefer to be aligned in the same direction. For dissimilar inputs with **x**_{t} · **x**_{t′} < *α*, the corresponding output vectors **y**_{t} and **y**_{t′} would tend to be orthogonal, meaning that responses to these dissimilar inputs would activate mostly nonoverlapping sets of neurons.

## 3 A Necessary Optimality Condition for NSM-1

In this section, we derive the necessary optimality condition for Problem (NSM-1). For notational convenience, we introduce the Gramian **D** ≡ **X**^{⊤}**X** and use [**z**]_{+}, where **z** ∈ ℝ^{T}, for the componentwise ReLU function, ([**z**]_{+})_{t} ≡ max(*z _{t}*, 0).

**Proposition 1.** *The optimal solution of Problem (NSM-1) satisfies*

**Λy**^{(a)} = [(**D** − α**E**)**y**^{(a)}]_{+}, for all *a*,   (3)

*where* **y**^{(a)} *designates a column vector which is the transpose of the a-th row of* **Y**, *and* **Λ** = diag(*λ*_{1},…, *λ*_{T}) *is a nonnegative diagonal matrix.*

The proof of Proposition 1 (Supplementary Material, Sec. A.2) proceeds by introducing Lagrange multipliers **Λ** = diag(*λ*_{1},…, *λ*_{T}) ≥ 0 for the constraint diag(**Y**^{⊤}**Y**) ≤ *β***I**, and writing down the KKT conditions. Then, by separately considering the cases *λ*_{t}*y*_{at} = 0 and *λ*_{t}*y*_{at} > 0, we get Eq. (3).

To gain some insight into the nature of the solutions of Eq. (3), let us assume *λ*_{t} > 0 for all *t* and rewrite it componentwise as

*y*_{at} = (1/*λ*_{t}) [Σ_{t′} (*D*_{tt′} − α) *y*_{at′}]_{+}.   (4)

Eq. (4) suggests that the interaction within each pair of **y**_{t} and **y**_{t′} has a different sign, i.e., it is excitatory or inhibitory, depending on their similarity. If the inputs are similar, *D*_{tt′} > *α*, then *y*_{at′} has an excitatory influence on *y*_{at}. Otherwise, if **x**_{t} and **x**_{t′} are farther apart, the influence is inhibitory. Such models often give rise to localized bump solutions [21]. Since, in our case, the variable *y*_{at} gives the activity of the *a*-th neuron as the *t*-th input vector is presented to the network, such a bump would define a localized receptive field of neuron *a* in the space of inputs. Below, we will derive such solutions with localized receptive fields for inputs arising from symmetric manifolds.

## 4 Solution for Symmetric Manifolds via a Convex Formulation

So far, we set the dimensionality of y, i.e., the number of output neurons, m, a priori. However, as this number depends on the dataset, we would like to allow for flexibility of choosing the output dimensionality adaptively. To this end, we introduce the Gramian, **Q** ≡ **Y**^{⊤} **Y**, and do not constrain its rank. Minimization of our objective functions requires that the output similarity expressed by Gramian, **Q**, captures some of the input similarity structure encoded in the input Gramian, **D**.

Redefining the variables makes the domain of the optimization problem convex. Matrices like **D** and **Q** which could be expressed as Gramians are symmetric and positive semidefinite. In addition, any matrix, **Q**, such that **Q** ≡ **Y**^{⊤}**Y** with **Y** ≥ 0 is called *completely positive*. The set of completely positive *T* × *T* matrices is denoted by 𝒞𝒫_{T} and forms a closed convex cone [22].
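As a quick numerical illustration of these definitions (ours, not from the paper): any Gramian of a nonnegative factor is symmetric, positive semidefinite, and entrywise nonnegative — the latter two properties being necessary (though, for *T* ≥ 5, not sufficient) for complete positivity.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((3, 5))                          # nonnegative factor, Y >= 0
Q = Y.T @ Y                                     # completely positive by construction
assert np.allclose(Q, Q.T)                      # symmetric
assert np.linalg.eigvalsh(Q).min() >= -1e-12    # positive semidefinite
assert (Q >= 0).all()                           # entrywise nonnegative
```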

Then, NSM-1, without the rank constraint, can be restated as a convex optimization problem with respect to **Q** belonging to the convex cone 𝒞𝒫_{T}:

min_{**Q** ∈ 𝒞𝒫_{T}, diag(**Q**) ≤ β**I**} −Tr((**D** − α**E**)**Q**).   (NSM-1a)

Despite the convexity, for arbitrary datasets, optimization problems in 𝒞𝒫_{T} are often intractable for large *T* [22]. Yet, for **D** with a high degree of symmetry, below, we will find the optimal **Q**.

Imagine now that there is a group *G* ⊆ *S*_{T}, *S*_{T} being the permutation group of the set {1,2,…,*T*}, such that *D*_{g(t)g(t′)} = *D*_{tt′} for all *g* ∈ *G*. The matrix with elements *M*_{g(t)g(t′)} is denoted as *g***M**, representing the group action on **M**. We represent the action of *g* on a vector **w** ∈ ℝ^{T} as *g***w**, with (*g***w**)_{t} = *w*_{g(t)}.

**Proposition 2.** *If the action of the group G is transitive, that is, for any pair t, t′* ∈ {1,2,…,*T*} *there is a g* ∈ *G such that t*′ = *g*(*t*), *then there is at least one optimal solution of Problem (NSM-1a) with* **Q** = **Y**^{⊤}**Y**, **Y** ∈ ℝ^{m×T}, *such that:*

(i) **Λ** = λ**I**;

(ii) for each *a*, the transpose of the *a*-th row of **Y**, termed **y**^{(a)}, satisfies

λ**y**^{(a)} = [(**D** − α**E**)**y**^{(a)}]_{+},   (5)

and **Y** could be written as **y**^{(a)} = *g*_{a}**y**^{(1)}, where *g*_{a} are members of distinct left cosets in *G*/*H*, *H* being the stabilizer subgroup of **y**^{(1)}, namely *H* = {*h* ∈ *G* | *h***y**^{(1)} = **y**^{(1)}}.

In other words, when the symmetry group action is transitive, all the Lagrange multipliers are the same, and the different rows of the **Y** matrix can be generated from a single row by the action of the group. A sketch of the proof is as follows (see Supplementary Material, Sec. A.3, for further details). For part (i), we argue that a convex minimization problem with a symmetry always has a solution which respects the symmetry. Thus our search could be limited to the *G*-invariant elements of the convex cone 𝒞𝒫_{T}, which happens to be a convex cone itself. We then introduce the Lagrange multipliers, define the Lagrangian for the problem on the invariant convex cone, and show that it is enough to search over **Λ** = λ**I**. Part (ii) follows from optimality of **Q** = **Y**^{⊤}**Y** implying optimality of *g***Q** for any *g* ∈ *G*.

Eq. (5) is a non-linear eigenvalue equation that can have more than one solution. In simple cases, those solutions are related to each other by symmetry. We will find such explicit solutions in the following subsections.

### 4.1 Solution for Inputs on the Ring with Cosine Similarity in the Continuum Limit

In this subsection, we consider the case where the inputs, **x**_{t}, lie on a one-dimensional manifold shaped as a ring centered on the origin:

**x**_{t} = [cos θ_{t}, sin θ_{t}]^{⊤}, θ_{t} ≡ 2π*t*/*T*,   (6)

where *t* ∈ {1,2,…, *T*}. Then, we have *D*_{tt′} = **x**_{t} · **x**_{t′} = cos(θ_{t} − θ_{t′}).

In the limit of large *T*, we can replace the discrete variable, *t*, by a continuous angle, θ, the activity, *y*_{at}, by *u*_{ϕ}(θ), and *λ* → *Tμ*, leading to

μ *u*_{ϕ}(θ) = *C* [∫ (cos(θ − θ′) − α) *u*_{ϕ}(θ′) dθ′/(2π)]_{+},   (7)

with *C* adjusted so that ∫ *u*_{ϕ}(θ)² *dm*(ϕ) = 1 for some measure *m* in the space of ϕ, which is a continuous variable labeling the output neurons. We will see that ϕ could naturally be chosen as an angle, and the constraint becomes ∫ *u*_{ϕ}(θ)² dϕ/(2π) = 1.

Eq. (7) has appeared previously in the context of the ring attractor [21]. While our variables have a completely different neuroscience interpretation, we can still use their solution:

*u*_{ϕ}(θ) = *A* [cos(θ − ϕ) − cos ψ]_{+},   (8)

with a normalization constant *A*, whose support is the interval [*ϕ* − *ψ, ϕ* + *ψ*].

Eq. (8) gives the receptive field of neuron ϕ in terms of the azimuthal coordinate, *θ*, shown in the bottom left panel of Fig. 1. The dependence of *μ* and *ψ* on *α* is given parametrically by two self-consistency conditions (Supplementary Material, Sec. A.4). So far, we have only shown that Eq. (8) satisfies the necessary optimality condition in the continuum limit, Eq. (7). In Sec. 6, we confirm numerically that the optimal solution for a finite number of neurons approximates Eq. (8), Fig. 2.
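The truncated-cosine profile of Eq. (8) is easy to inspect directly; in this sketch (ours; the amplitude *A* is set to 1 for illustration) we check that the response is positive strictly inside the support interval and zero outside it:

```python
import math

def rf(theta, phi, psi, A=1.0):
    """Truncated-cosine receptive field of Eq. (8): A [cos(theta - phi) - cos(psi)]_+."""
    return A * max(math.cos(theta - phi) - math.cos(psi), 0.0)

phi, psi = 1.0, 0.6
# active strictly inside the support interval [phi - psi, phi + psi] ...
assert rf(phi, phi, psi) > 0
assert rf(phi + 0.99 * psi, phi, psi) > 0
# ... and zero outside of it
assert rf(phi + 1.01 * psi, phi, psi) == 0.0
assert rf(phi - 1.5 * psi, phi, psi) == 0.0
```

Sliding ϕ around the circle produces the overlapping, manifold-tiling fields sketched in Fig. 1.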

While we do not have a closed-form solution for NSM-0 on a ring, we show that the optimal solution also has localized receptive fields (see Supplementary Material, Sec. A.5).

### 4.2 Solution for Inputs on Higher-dimensional Compact Homogeneous Manifolds

Here, we consider two special cases of higher-dimensional manifolds. The first example is the 2-sphere, *S*² = *SO*(3)/*SO*(2). The second example is the rotation group, *SO*(3), which is a three-dimensional manifold. It is possible to generalize this method to other compact homogeneous spaces for particular kernels.

We can think of a 2-sphere via its 3-dimensional embedding: *S*² ≡ {**x** ∈ ℝ³ : ║**x**║ = 1}. For two points Ω, Ω′ on the 2-sphere, let **D**(Ω, Ω′) = **x**(Ω) · **x**(Ω′), where **x**(Ω), **x**(Ω′) are the corresponding unit vectors in the 3-dimensional embedding.

Remarkably, we can show that solutions satisfying the optimality conditions are of the form

*u*_{Ω₀}(Ω) = *A* [**x**(Ω) · **x**(Ω₀) − cos ψ]_{+}.

This means that the center of a receptive field on the sphere is at Ω₀. The neuron is active while the angle between **x**(Ω) and **x**(Ω₀) is less than ψ. For the self-consistency conditions determining ψ, μ in terms of *α*, see Supplementary Material, Sec. A.6.

In the case of the rotation group, for *g, g*′ ∈ *SO*(3) we adopt the 3 × 3 matrix representations **R**(*g*), **R**(*g*′) and consider **D**(*g, g*′) = ½(Tr(**R**(*g*)^{⊤}**R**(*g*′)) − 1), i.e., the cosine of the relative rotation angle, to be the similarity kernel. Once more, we index a receptive field solution by the rotation group element, *g*₀, where the response is maximal:

*u*_{g₀}(*g*) = *A* [**D**(*g*₀, *g*) − cos ψ]_{+},

with *ψ, μ* being determined by *α* through self-consistency equations. This solution has support over *g* ∈ *SO*(3) such that the relative rotation *g*₀^{−1}*g* has a rotation angle less than *ψ*.

## 5 Online Optimization and Neural Networks

Here, we derive a biologically plausible neural network that optimizes NSM-1. To this end, we transform NSM-1 by, first, rewriting it in the Lagrangian form:

min_{**Y** ≥ 0} max_{**Z** ≥ 0} −Tr((**X**^{⊤}**X** − α**E**)**Y**^{⊤}**Y**) + Σ_{t} (**z**_{t} · **z**_{t})(**y**_{t} · **y**_{t} − β).

Here, unconventionally, the nonnegative Lagrange multipliers, *λ*_{t} = **z**_{t} · **z**_{t}, are factorized into inner products of two nonnegative vectors. Second, we introduce auxiliary variables, **W**, **b**, and **V**_{t}, which decouple the quadratic interactions and will play the role of synaptic weights and biases [10].
This form suggests a two-step online algorithm. For each input **x**_{t}, in the first step, one solves for **y**_{t}, **z**_{t}, and **V**_{t} by projected gradient descent-ascent-descent with step sizes *γ*_{y,z,V}. This iteration can be interpreted as the dynamics of a neural circuit (Fig. 1, top right panel), where **y**_{t} are activity variables of an excitatory neuron population, **b** is a bias term, **z**_{t} are activity variables of an inhibitory neuron population, **W** is the feedforward connectivity matrix, and **V**_{t} is the synaptic weight matrix from excitatory to inhibitory neurons, which goes through fast time-scale anti-Hebbian plasticity. In the second step, **W** and **b** are updated by a gradient descent-ascent step, where **W** goes through slow time-scale Hebbian plasticity and **b** through homeostatic plasticity; *η* is a learning rate. Application of this algorithm to symmetric datasets is shown in Fig. 2 and Fig. 3.
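The two-step scheme can be sketched in Python. The update rules below are schematic stand-ins written from the circuit description (leaky ReLU dynamics for **y** and **z**, a fast anti-Hebbian update for **V**, a slow Hebbian update for **W**, and a homeostatic update for **b**); they are not the paper's exact gradient expressions, all step sizes are illustrative, and a single fast step size replaces *γ*_{y,z,V}:

```python
import numpy as np

def online_nsm_step(x, W, b, V, gamma=0.05, eta=0.01, n_iter=100):
    """One two-step update (schematic). Step 1: fast neural dynamics for y
    (excitatory) and z (inhibitory) at fixed W and b, with V adapting on the
    fast time scale (anti-Hebbian). Step 2: slow Hebbian update of W and
    homeostatic update of the bias b."""
    m, k = W.shape[0], V.shape[0]
    y, z = np.zeros(m), np.zeros(k)
    for _ in range(n_iter):
        y = np.maximum(y + gamma * (W @ x - b - V.T @ z - y), 0.0)  # excitatory units
        z = np.maximum(z + gamma * (V @ y - z), 0.0)                # inhibitory units
        V = V + gamma * (np.outer(z, y) - V)                        # fast anti-Hebbian weights
    W = W + eta * (np.outer(y, x) - W)                              # slow Hebbian weights
    b = b + eta * (y - b)                                           # homeostatic bias
    return y, z, W, b, V

# run the network on inputs drawn from a ring, as in the simulations of Sec. 6
rng = np.random.default_rng(2)
m, n, k = 20, 2, 20
W, b, V = 0.1 * rng.random((m, n)), np.zeros(m), 0.1 * rng.random((k, m))
for t in range(50):
    ang = 2 * np.pi * rng.random()
    x = np.array([np.cos(ang), np.sin(ang)])
    y, z, W, b, V = online_nsm_step(x, W, b, V)
assert (y >= 0).all() and W.shape == (m, n)
```

All plasticity rules above are physically local: each weight update depends only on the activities of the two neurons it connects.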

## 6 Experimental Results

In this section, we solve both offline and online optimization problems numerically. Our results confirm the theoretical predictions in Sec. 4. Moreover, our algorithms yield manifold-tiling localized receptive fields on real-world data.

According to the theoretical results in Sec. 4, for the input data lying on a ring, optimization without a rank constraint yields truncated cosine solutions, see Eq. (8). Here, we show numerically that fixed-rank optimization yields the same solutions, Fig. 2: the computed matrix **Y**^{⊤}**Y** is indeed circulant, all receptive fields are equivalent to each other, are well approximated by truncated cosine and tile the manifold with overlap. Similarly, for the input lying on a 2-sphere, we find numerically that localized solutions tile the manifold, Fig. 3.

For the offline optimization we used a Burer-Monteiro augmented Lagrangian method [23, 24]. Traditionally, the number of rows *m* of **Y** is chosen to be *βT* (observe that diag(**Y**^{⊤}**Y**) ≤ *β***I** implies that Tr(**Y**^{⊤}**Y**) ≤ *βT*, making *βT* an upper bound of the rank). We use the non-standard setting *m* ≫ *βT*, as a small m might create degeneracies (i.e., hard-clustering solutions).
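The Burer-Monteiro idea — optimize over a thin factor **Y** with **Q** = **Y**^{⊤}**Y** instead of over **Q** itself — can be illustrated with a projected-gradient sketch of ours; the actual experiments use an augmented Lagrangian [23, 24], whereas here a simple quadratic penalty (weight `rho`, chosen ad hoc) stands in for the multipliers on the norm constraint:

```python
import numpy as np

def nsm1_burer_monteiro(D, alpha, beta, m, n_steps=500, lr=1e-3, rho=100.0):
    """Minimize -Tr((D - alpha E) Y^T Y) over Y >= 0 with diag(Y^T Y) <= beta,
    the inequality handled by a quadratic penalty instead of multipliers."""
    T = D.shape[0]
    A = D - alpha                                             # D - alpha * E
    rng = np.random.default_rng(0)
    Y = 0.1 * rng.random((m, T))
    for _ in range(n_steps):
        grad = -2.0 * Y @ A                                   # gradient of the objective
        viol = np.maximum(np.sum(Y * Y, axis=0) - beta, 0.0)  # per-sample norm violation
        grad = grad + 4.0 * rho * Y * viol[None, :]           # gradient of the penalty
        Y = np.maximum(Y - lr * grad, 0.0)                    # projected (nonnegative) step
    return Y

theta = 2 * np.pi * np.arange(60) / 60
D = np.cos(theta[:, None] - theta[None, :])    # ring Gramian from Sec. 4.1
Y = nsm1_burer_monteiro(D, alpha=0.5, beta=1.0, m=60)
assert (Y >= 0).all()
assert np.all(np.sum(Y * Y, axis=0) <= 1.25)   # near-feasible up to penalty slack
```

Here `m` equals `T`, in the spirit of the non-standard overparameterized setting *m* ≫ *βT* described above.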

### Dealing with real-world data

For normalized input data with every diagonal element above the threshold *α*, the term *α* Tr(**EQ**) = *α* Σ_{tt′} **y**_{t} · **y**_{t′} in NSM-1 behaves as described in Sec. 2. For unnormalized inputs, it is preferable to control the sum of each row of **Q**, i.e., Σ_{t′} **y**_{t} · **y**_{t′}, with an individual *α*_{t}, instead of the total sum.

Additionally, enforcing the norm constraints with equality is in many cases empirically equivalent to enforcing them as inequalities but makes the optimization easier. We thus obtain an objective with per-sample thresholds *α*_{t} which, for some choice of *α*_{t}, is equivalent to

min_{**Q** ∈ 𝒞𝒫_{T}} −Tr(**DQ**) subject to **Q1** = **1**,   (NSM-2)

where **1** ∈ ℝ^{T} is a column vector of ones.

For highly symmetric datasets without constraints on rank, NSM-2 has the same solutions as NSM-1 (see Supplementary Material, Sec. A.7). Relaxations of this optimization problem have been the subject of extensive research to solve clustering and manifold learning problems [25, 26, 27, 28]. A biologically plausible neural network solving this problem was proposed in [12]. For the optimization of NSM-2 we use an augmented Lagrangian method [23, 24, 28, 29].

We have extensively applied NSM-2 to datasets previously analyzed in the context of manifold learning [28, 30] (see Supplementary Material, Sec. B). Here, we include just two representative examples, Figs. 4 and 5, showing the emergence of localized receptive fields in a high-dimensional space. Despite the lack of symmetry and ensuing loss of regularity, we obtain neurons whose receptive fields, taken together, tile the entire data cloud. Such tiling solutions indicate robustness of the method to imperfections in the dataset and further corroborate the theoretical results derived in this paper.

## 7 Discussion

In this work, we show that objective functions approximately preserving similarity, along with nonnegativity constraint on the outputs, provide neurally plausible representation of manifolds. Neural networks implementing such algorithms rely only on local (Hebbian or anti-Hebbian) plasticity of synaptic strengths.

Our algorithms, starting from a linear kernel, **D**, generate an output kernel, **Q**, restricted to the sample space. While the association between kernels and neural networks was known [31], previously proposed networks used random synaptic weights with no learning. In our algorithms, the weights are learned from the input data to optimize the objective. Therefore, our algorithms learn data-dependent kernels adaptively.

In addition to modeling biological neural computation, our algorithms may also serve as general-purpose mechanisms for generating representations of manifolds adaptively. Unlike most existing manifold learning algorithms [32, 33, 34, 35, 36, 37], ours can operate naturally in the online setting. Also, unlike most existing algorithms, ours do not output low-dimensional vectors of embedding variables but rather high-dimensional vectors of assignment indices to centroids tiling the manifold, similar to radial basis function networks [38]. The advantage of such high-dimensional representation becomes obvious if the output representation is used not for visualization but for further computation, e.g., linear classification [39].

## Acknowledgments

We are grateful to Johannes Friedrich, Victor Minden, Eftychios Pnevmatikakis, and the other members of the Flatiron Neuroscience group for discussion and comments on an earlier version of this manuscript. We thank Sanjeev Arora, Moses Charikar, Jeff Cheeger, Surya Ganguli, Dustin Mixon, Afonso Bandeira, Marc’Aurelio Ranzato, and Soledad Villar for helpful discussions.