## Abstract

Within a single sniff, the mammalian olfactory system can decode the identity and concentration of odorants wafted on turbulent plumes of air. Yet, it must do so given access only to the noisy, dimensionally-reduced representation of the odor world provided by olfactory receptor neurons. As a result, the olfactory system must solve a compressed sensing problem, relying on the fact that only a handful of the millions of possible odorants are present in a given scene. Inspired by this principle, past works have proposed normative compressed sensing models for olfactory decoding. However, these models have not captured the unique anatomy and physiology of the olfactory bulb, nor have they shown that sensing can be achieved within the 100-millisecond timescale of a single sniff. Here, we propose a rate-based Poisson compressed sensing circuit model for the olfactory bulb. This model maps onto the neuron classes of the olfactory bulb, and recapitulates salient features of their connectivity and physiology. For circuit sizes comparable to the human olfactory bulb, we show that this model can accurately detect tens of odors within the timescale of a single sniff. We also show that this model can perform Bayesian posterior sampling for accurate uncertainty estimation. Fast inference is possible only if the geometry of the neural code is chosen to match receptor properties, yielding a distributed neural code that is not axis-aligned to individual odor identities. Our results illustrate how normative modeling can help us map function onto specific neural circuits to generate new hypotheses.

## 1 Introduction

Sensory systems allow organisms to detect physical signals in their environments, enabling them to maximize fitness by acting adaptively. This experience of the physical environment, also known as the *Umwelt*, depends on the sensors and sensory organs of each organism [1, 2]. Throughout evolution, organisms have developed specialized sensory mechanisms to extract specific information about the physical world. In vision and audition—the most studied sensory modalities in neuroscience—stimuli are characterized by intuitive metrics such as orientation or frequency, which have been shown to map onto neural representations from the earliest stages of the sensory systems [3–5]. One can continuously vary the orientation of an object or the pitch of a tone and quantify resulting changes in perception and neural representations. From a computational point of view, this structure in the representations can be viewed as optimizing the information transfer in the network [6–16].

In contrast, the geometric structure of the olfactory world is far less clear: How can one ‘rotate’ a smell? Despite significant effort, attempts to find such structure in olfactory stimuli and link that geometry to maps in olfactory areas have succeeded only in identifying coarse principles for high-level organization, far from the precision of orientation columns or tonotopy in visual and auditory cortices [17–19]. In the absence of geometric intuitions, the principles of compressed sensing (CS) have emerged as an alternative paradigm for understanding olfactory coding [20–27]. This framework provides a partial answer to the question of how an organism could identify which of millions of possible odorants are present given the activity of only a few hundred receptor types [28–33]. However, existing CS circuit models do not admit convincing biologically-plausible implementations that can perform fast inference at scale. Indeed, many proposals assume that the presence of each odorant is represented by a single, specialized neuron, which is inconsistent with the distributed odor coding observed *in vivo* [22, 34–36]. This axis-aligned coding does not leverage the geometric structure of the sensory space, which can be defined even in the absence of interpretable dimensions [37–41].

In this paper, we propose a Poisson CS model for the mammalian olfactory bulb. Our primary contributions are as follows:

We derive a normative CS circuit model which can be mapped onto the circuits of the bulb (§3). Importantly, this mapping goes beyond basic counting of cell types; it includes detailed biological features like symmetric coupling and state-dependent inhibition (§4).

We show that this model enables fast, accurate inference of odor identity in a biologically reasonable regime where tens of odorants are present in a given scene (§5). This fast inference is enabled by considering the geometry of the olfactory receptor code. This consideration leads to distributed odor coding, resolving a major tension between previous CS circuit models and neural data.

We extend our circuit model to allow Bayesian inference of uncertainty in odor concentrations (§6).

In total, our results demonstrate the importance of considering representational geometry when trying to understand neural coding in the olfactory bulb (OB). Importantly, we show that it is the geometry in the space defined by the receptor affinity (or OSN activation) that controls the speed of inference. This view is distinct from previous geometric theories of olfaction, which have focused on the space of odorants [42, 43]. We propose that thinking in terms of the geometry of OSN coding will allow for deeper understanding of early olfactory processing.

## 2 Related work and review of the olfactory sensing problem

We begin with a review of the principles of olfactory coding, and of previous CS models for olfactory circuits. The structural logic of early olfactory processing is broadly conserved across the animal kingdom (Fig. 1A) [44–47], and this distinctive circuit structure is thought to play a key role in the computational function of the olfactory bulb [48–50]. In mammals, volatile odorants are first detected by olfactory receptor (OR) proteins expressed on the surface of olfactory sensory neurons (OSNs). Each OSN expresses only a single OR type; in humans there are around 300 distinct ORs, in mice around 1000 [45]. Importantly, most ORs have broad affinity profiles, and the OSN code for odor identity is combinatorial [51]. In contrast to the immune system’s highly adaptable chemical recognition capabilities arising from somatic recombination [52], ORs are hard-coded into the genome as single genes [53], and therefore can only change over evolutionary timescales [54, 55]. Some adaptation of expression levels across the receptors is possible [56], but the chemical affinity of the receptor array is fixed. OSNs expressing the same OR then converge onto the same glomerulus, synapsing onto the principal projection cells of the olfactory bulb (OB), the mitral and tufted cells. These in turn send signals to olfactory cortical areas. The OB contains several types of inhibitory interneurons, whose computational role remains to be clarified [49]. Importantly, the excitatory mitral and tufted cells are not reciprocally connected across glomeruli. Instead, they are connected through a network of inhibitory granule cells, the most numerous cell type in the OB [57].

In our model, we will focus on a particular task: odor component identification within a single sniff [58, 59]. This computational problem differs from many experimental tasks focusing on discrimination between two odorants [60], which underlie most scientific work on rodent olfactory decision making [60–62]. Here, the goal is to identify the components of a complex olfactory scene [63–66]. Importantly, the limits of human performance in this setting remain to our knowledge unknown [67, 68]. To render the problem more tractable, we will make a number of simplifications of the anatomy and physiology of the OB. We will not distinguish between mitral and tufted cells, the two classes of projection neurons. Recent works have shed some light on the distinct computational roles of these cell types; these distinctions are likely to become important when considering richer environmental dynamics than we do here [69, 70]. We will also ignore the fact that odorant concentrations can vary over many orders of magnitude, and OSN responses to large changes in concentration are strongly nonlinear. Here, we will focus on concentration changes of one or two orders of magnitude. Within such ranges, the responses of OB neurons are well characterized by linear models [71].

On the theoretical side, it is widely recognized that the tremendous compression of dimensionality inherent in the transformation from odorant molecules to receptor activity means that the olfactory decoding problem is analogous to the one faced in compressed sensing (CS) [21–27, 72–74]. Classical CS theory shows that sparse high-dimensional signals can be recovered from a small number of random projections [28–33]. Inspired by these results, previous works have used the principles of CS to build circuit models for olfactory coding [21–27, 72–74]. However, these model circuits do not map cleanly onto the neural circuits of the OB and the olfactory cortical areas, particularly because they often assume each granule cell encodes exactly one odorant. Moreover, these works usually assume a Gaussian noise model for OSNs, which is biologically unrealistic (but see [21, 22]). Instead, OSN activity is better captured by a Poisson noise model [75, 76]. Some theoretical guarantees for Poisson CS are known, but the situation is less well-understood than in the Gaussian case [32, 33, 77, 78].

## 3 A neural circuit architecture for Poisson compressed sensing

We now derive a normative, rate-based neural circuit model that performs Poisson CS, which in subsequent sections we will map onto the circuitry of the OB. The design of this model will follow general biological principles, without initially drawing on specific knowledge of the OB. With the goal of sensing within a single sniff in mind, the circuit’s objective is to rapidly infer the odorant concentrations **c** underlying a single, static sample of OSN activity **s**. For simplicity, we model the mean activity of OSNs as a linear function of the concentration, with a receptor affinity matrix **A** and a baseline rate **r**_{0}. As motivated above, we use a Poisson noise model for OSN activity given the underlying concentration signal **c**, and correspondingly a Gamma prior over concentrations with shape ** α** and scale ** β**:

$$
\mathbf{s} \mid \mathbf{c} \sim \mathrm{Poisson}(\mathbf{r}_0 + \mathbf{A}\mathbf{c}), \qquad c_i \sim \mathrm{Gamma}(\alpha_i, \beta_i) \ \text{independently.} \tag{1}
$$
Given this likelihood and prior, we construct a neural circuit to compute the maximum *a posteriori* (MAP) estimate of the concentration **c** using gradient ascent on the log-posterior probability. Here, we sketch the derivation, deferring some details to Appendix B. Our starting point is the gradient ascent equation

$$
\tau \,\dot{\mathbf{c}}(t) = \mathbf{A}^{\top}\!\left[\mathbf{s} \oslash (\mathbf{r}_0 + \mathbf{A}\mathbf{c}) - \mathbf{1}\right] + (\boldsymbol{\alpha} - \mathbf{1}) \oslash \mathbf{c} - \mathbf{1} \oslash \boldsymbol{\beta}, \tag{2}
$$

where ⊘ denotes elementwise division and ċ(*t*) = *d***c***/dt*. Here, the estimate **c** is formally constrained to be non-negative; in numerical simulations we will sometimes ignore this constraint. Circuit algorithms of this form were studied in previous work by Grabska-Barwinska et al. [21].
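As a minimal sketch of these gradient ascent dynamics, the following simulation recovers a sparse concentration vector from a single Poisson-noisy OSN sample. All sizes, concentrations, and hyperparameters are hypothetical choices for illustration, not the settings used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (illustrative only)
n_rec, n_odor, k = 50, 100, 3        # receptors, candidate odorants, odorants present
A = rng.gamma(0.37, 0.36, size=(n_rec, n_odor))   # Gamma-distributed affinities
r0 = 1.0                                          # baseline OSN rate
alpha, beta = 1.0, 10.0                           # Gamma prior shape and scale

c_true = np.zeros(n_odor)
c_true[rng.choice(n_odor, k, replace=False)] = 5.0
s = rng.poisson(r0 + A @ c_true)                  # one noisy sniff of OSN activity

# Euler steps of gradient ascent on the Poisson/Gamma log-posterior,
# projecting onto the non-negativity constraint after each step
c, eta = np.full(n_odor, 0.5), 2e-3
for _ in range(50_000):
    grad = A.T @ (s / (r0 + A @ c) - 1.0) + (alpha - 1.0) / c - 1.0 / beta
    c = np.clip(c + eta * grad, 1e-9, None)

print(c[c_true > 0].mean(), c[c_true == 0].mean())  # present odorants should dominate
```

With shape α = 1, the prior term reduces to the constant −1/β, i.e., a sparsity-promoting exponential prior.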

However, in this most basic setup there is a one-to-one mapping between neurons and odorants, which is at variance with our knowledge of biological olfaction (§2). To distribute the code, we instead use a projected setup where the firing rates **g** of *n*_{g} neurons are mapped to the concentration estimate **c** through a matrix **Γ**, i.e., **c** = **Γg**. Even if **Γ** is non-square—in particular, if *n*_{g} *> n*_{odor}—so long as **ΓΓ**^{⊤} is positive-definite and the rates follow the dynamics

$$
\tau_g \,\dot{\mathbf{g}}(t) = (\mathbf{A}\boldsymbol{\Gamma})^{\top}\!\left[\mathbf{s} \oslash (\mathbf{r}_0 + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}) - \mathbf{1}\right] + \boldsymbol{\Gamma}^{\top}\!\left[(\boldsymbol{\alpha} - \mathbf{1}) \oslash (\boldsymbol{\Gamma}\mathbf{g}) - \mathbf{1} \oslash \boldsymbol{\beta}\right]
$$

for some time constant *τ*_{g}, the concentration estimates will still converge to the MAP. This corresponds to preconditioned gradient ascent [39, 79]. Again, we formally require the constraint that **c** be non-negative, which translates into a constraint on **g**. Most simply, we may take **g** to be non-negative and choose **Γ** to be positivity-preserving.

These dynamics include two divisive non-linearities, which can be challenging to implement in biophysical models of single neurons [80]. Using the approach proposed by Chalk et al. [81], we can linearize the inference by introducing two additional cell types that have as their fixed points the elementwise divisions **s** ⊘ (**r**_{0} + **AΓg**) and (** α** − **1**) ⊘ (**Γg**). Concretely, we introduce cell types with rates **p** and **z** such that their fixed-point rates for fixed **g** are **p**^{*} = **s** ⊘ (**r**_{0} + **AΓg**) and **z**^{*} = (** α** − **1**) ⊘ (**Γg**), respectively. This yields the coupled circuit dynamics

$$
\begin{aligned}
\tau_g \,\dot{\mathbf{g}}(t) &= (\mathbf{A}\boldsymbol{\Gamma})^{\top}(\mathbf{p} - \mathbf{1}) + \boldsymbol{\Gamma}^{\top}\!\left(\mathbf{z} - \mathbf{1} \oslash \boldsymbol{\beta}\right), \\
\tau_p \,\dot{\mathbf{p}}(t) &= \mathbf{s} - \mathbf{p} \odot (\mathbf{r}_0 + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}), \\
\tau_z \,\dot{\mathbf{z}}(t) &= \boldsymbol{\alpha} - \mathbf{1} - \mathbf{z} \odot (\boldsymbol{\Gamma}\mathbf{g})
\end{aligned}
\tag{3}
$$

for cell-type-specific time constants *τ*_{g}, *τ*_{p}, and *τ*_{z}, where ⊙ denotes elementwise multiplication. In the limit *τ*_{p}, *τ*_{z} ≪ *τ*_{g}, this circuit will recover the MAP gradient ascent. If *τ*_{p} and *τ*_{z} are not infinitely fast relative to *τ*_{g}, we expect these dynamics to approximate the desired dynamics [81] (see Appendix C for a preliminary analysis of the linear stability of the MAP fixed-point). We will test the accuracy of this approximation for biologically-reasonable time constants using numerical experiments. Moreover, **p** should formally be constrained to be non-negative, such that the non-negativity of the target ratio is respected. Finally, we note that in the special case ** α** = **1**, in which the Gamma prior reduces to an exponential prior, the introduction of the cell type **z** is no longer required.
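To make the coupled dynamics concrete, here is a minimal Euler simulation of the two-population special case with an exponential prior (shape α = 1, so the **z** cells drop out) and a one-to-one readout **Γ** = **I**. The sizes, time step, stimulus, and simulation length are hypothetical illustrative choices, not those of our actual experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; exponential prior (alpha = 1), so no z population is needed
n_rec, n_odor, k = 30, 50, 2
A = rng.gamma(0.37, 0.36, size=(n_rec, n_odor))
r0, beta = 1.0, 10.0
tau_p, tau_g = 20.0, 30.0            # ms, matching the values used in Section 4
dt, T = 0.05, 2000.0                 # ms

c_true = np.zeros(n_odor)
c_true[rng.choice(n_odor, k, replace=False)] = 5.0
s = rng.poisson(r0 + A @ c_true).astype(float)

# With Gamma = I, granule rates g are themselves the concentration estimates
p = np.zeros(n_rec)                  # mitral cells
g = np.zeros(n_odor)                 # granule cells
g_avg, n_avg = np.zeros(n_odor), 0
for step in range(int(T / dt)):
    dp = (s - p * (r0 + A @ g)) / tau_p          # inhibition gated by p itself
    dg = (A.T @ (p - 1.0) - 1.0 / beta) / tau_g  # excitation from p, constant prior leak
    p = np.clip(p + dt * dp, 0.0, None)
    g = np.clip(g + dt * dg, 0.0, None)
    if step >= int(0.9 * T / dt):                # average over the final 10% of the run
        g_avg += g
        n_avg += 1
g_avg /= n_avg
```

The granule rates corresponding to the odorants actually present should settle well above the rest, which is the behavior probed at scale in §5.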

## 4 Biological interpretation and predictions of the circuit model

We now argue that the normative model derived in the preceding section can be mapped onto the circuitry of the OB. In particular, though the model was derived based only on general biological principles, its specific features are biologically implementable based on the detailed anatomy and physiology of the OB. In terms of the levels of understanding of neural circuits proposed by David Marr, this is an example of how normative modeling can bridge the gap between algorithmic and mechanistic understanding [82, 83].

As foreshadowed by our notation, we interpret the cell type **g** as the granule cells of the OB, and the cell type **p** as the mitral cells, which are projection neurons. Provided that the elements of the matrix **AΓ** are non-negative, this interpretation is justified at the coarsest level by the signs with which the two cell types appear in the dynamics: the **p** neurons excite the **g** neurons, which in turn inhibit the **p** neurons. Finally, we interpret the cell type **z** as representing a form of cortical feedback. In the remainder of this section, we will justify this mapping in detail.

### 4.1 Circuit anatomy: weight transport, dendro-dendritic coupling, and cortical feedback

The first salient feature of the dynamics (3) is that the cell types **g** and **p** do not make direct lateral connections amongst themselves. Rather, they connect only indirectly through the neurons of the opposing cell type, matching the connectivity structure of mitral and granule cells (Fig. 1A). Moreover, the synaptic weights **AΓ** of the connections from **g** to **p** neurons mirror exactly the weights (**AΓ**)^{⊤} of the connections from **p** to **g** neurons. Naïvely, this creates a weight transport problem of the form that renders backpropagation biologically implausible: how should the exact transpose of a matrix be copied to another synapse [84–86]? However, in the unique case of the OB this does not pose a substantial obstacle, as the mitral and granule cells are coupled by dendro-dendritic synapses, meaning that bi-directional connectivity occurs at a single physical locus (Fig. 1) [49, 57, 87].

This interpretation accounts for the interactions between the cell types **g** and **p** in the dynamics (3), but what about the cell type **z**? These cells give direct excitatory input to the granule cells **g** with weights **Γ**^{⊤}, and represent the concentration-dependent contribution to the log-prior gradient. We can therefore interpret these cells as representing feedback from olfactory cortical areas to the OB, which arrives at the granule cells [88–90]. Though our model can thus flexibly incorporate cortical feedback, for our subsequent simulations we will focus on the simple case in which the prior is static and has ** α** = **1**, in which case it reduces to an exponential and feedback is not explicitly required.

Given this mapping of the cell types of our model to the cell types of the bulb, we will henceforth choose the membrane time constants to match experiment, taking *τ*_{p} = 20 ms [91] and *τ*_{g} = 30 ms [92]. This matching of timescales is required for our comparison of model inference timescales to the timescale of a single sniff to be meaningful.

### 4.2 Choosing the affinity matrix

In theories of Poisson CS, the optimal sensing matrix **A** is one that has columns that are in some precise sense as orthogonal as possible, so that it acts as an approximate isometry [27–33]. However, biologically, the olfactory system is not free to choose optimal sensors. Rather, the affinity profiles of each receptor are dictated by biophysics and by evolutionary history [53, 54, 93]. To build a more realistic model for OSN sensing, we therefore turn to biological data. Using two-photon calcium imaging, we recorded the responses of 228 mouse OSN glomeruli to 32 odorants (see Appendix F for details; these data were previously published in [94]). In Fig. 1B, we show that the distribution of responses is well-fit by a Gamma distribution with shape 0.37 and scale 0.36 (Appendix F). We then define our ensemble of sensing matrices **A** by drawing their elements as independently and identically distributed Gamma(0.37, 0.36) random variables. An example matrix drawn from this ensemble is shown in Fig. 1C.
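This ensemble is simple to instantiate. The sketch below draws a human-scale affinity matrix using the fitted shape and scale; the receptor count is approximately human (~300), while the size of the odorant panel is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shape 0.37, scale 0.36: the Gamma fit to glomerular responses (Fig. 1B)
shape, scale = 0.37, 0.36
n_rec, n_odor = 300, 1000            # ~human receptor count; panel size is illustrative
A = rng.gamma(shape, scale, size=(n_rec, n_odor))

# Shape < 1 gives a heavy-tailed, right-skewed profile: most affinities are
# near zero while a few are large, so each receptor responds broadly but unevenly
print(A.mean(), np.median(A))        # mean ~ shape * scale ~ 0.133; median far below it
```

The strong right skew of this ensemble is what makes the Gram matrix **A**^{⊤}**A** poorly conditioned, motivating the geometry-aware readout introduced in §5.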

### 4.3 Divisive predictive coding by mitral cells

In our model (3), odorant concentration estimates are decoded from granule cell activity. This feature is shared with previous Gaussian CS models of olfactory coding [20, 24, 25]. Our model, however, matches biology more closely than these previous works because it allows for a distributed code rather than assuming that each granule cell codes for a single odorant.

What, then, is the functional role of the mitral cells in our model? We can interpret their dynamics as implementing a form of predictive coding, in which they are trying to cancel their input by the current prediction. Because the mitral cell activity converges to the ratio of their input to the prediction, this is a divisive form of predictive coding [81]. In contrast, a Gaussian noise model gives a subtractive form of predictive coding in which the activity converges to the difference between input and prediction (Appendix D) [15]. In Fig. 2A, we show example timecourses of model mitral cell activity following the onset of a stimulus. Consistent with experimentally-measured responses [96, 97], a sharp transient response at the onset of stimulation is followed by decay to a low level of tonic activity (Fig. 2A).

### 4.4 State-dependent inhibition of mitral cells

A salient feature of our circuit model is that the inhibition from the granule cells onto a mitral cell is gated by the activity of the mitral cell itself (3). This state-dependent inhibition is reminiscent of *in vitro* experiments showing that granule-cell-mediated inhibition is activity-dependent [95]. In these experiments, Arevian et al. measured the activity of a primary mitral cell (*MC*_{A}) while increasing its level of stimulation under two conditions. In the first condition, no other cells in the circuit are stimulated. In the second condition, they also activate another mitral cell (*MC*_{B}). The activation of the second mitral cell leads to the activation of granule cells connected to both mitral cells, and a reduction in the firing evoked by stimulation of the primary cell alone. Strikingly, these authors showed that this inhibition is dependent on the activity of the primary cell [95].

Here, we show that our proposed circuit reproduces these observations of state-dependent inhibition, and that it does not arise in a similarly-constructed circuit for a Gaussian noise model. To model Arevian et al. [95]’s *in vitro* experiments, we simulated a reduced circuit with 2 mitral cells and 10 granule cells. We stimulated *MC*_{A} with an input *s*_{A}∈[1 : 400] while toggling on or off the stimulation *s*_{B} = 80 of the second mitral cell *MC*_{B}. As observed experimentally, when *MC*_{B} is activated, the inhibition of the primary mitral cell *MC*_{A} is state-dependent (Fig. 2B). To show that this effect arises from the Poisson structure of our circuit, we built a similar inference circuit with a Gaussian noise model (Appendix D). Under similar conditions, the inhibition in the Gaussian circuit is independent of the activity of the primary mitral cell *MC*_{A} (Fig. 2B). Furthermore, the dynamics of the relative inhibition qualitatively recapitulate those observed experimentally: inhibition is sustained throughout the stimulation period at the level of *MC*_{A} stimulation that produces maximal inhibition, while stronger stimulation of *MC*_{A} yields inhibition followed by relaxation (Fig. 2C; see [95], Fig. 3b).
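The Arevian-style protocol can be sketched with a toy version of this reduced circuit. The weights, stimulation levels, and integration settings below are hypothetical choices for illustration (including the positive weight floor), not those of our actual simulations; the helper name `mc_a_rate` is likewise ours.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy reduced circuit: 2 mitral cells, 10 granule cells, exponential prior term
n_mc, n_gc = 2, 10
W = rng.gamma(0.37, 0.36, size=(n_mc, n_gc)) + 0.1   # positive MC<->GC weights (assumed)
r0, lam = 1.0, 0.1                                    # baseline rate and prior leak
tau_p, tau_g, dt, T = 20.0, 30.0, 0.05, 2000.0        # ms

def mc_a_rate(s_a, s_b):
    """Steady-state rate of MC_A under direct stimulation of both mitral cells."""
    s = np.array([s_a, s_b], dtype=float)
    p, g = np.zeros(n_mc), np.zeros(n_gc)
    acc, n = 0.0, 0
    for step in range(int(T / dt)):
        dp = (s - p * (r0 + W @ g)) / tau_p           # divisive, state-gated inhibition
        dg = (W.T @ (p - 1.0) - lam) / tau_g
        p = np.clip(p + dt * dp, 0.0, None)
        g = np.clip(g + dt * dg, 0.0, None)
        if step >= int(0.8 * T / dt):                 # average over the final 20%
            acc += p[0]
            n += 1
    return acc / n

alone = [mc_a_rate(s_a, 0.0) for s_a in (50.0, 150.0, 300.0)]
paired = [mc_a_rate(s_a, 80.0) for s_a in (50.0, 150.0, 300.0)]
inhibition = [1.0 - b / a for a, b in zip(alone, paired)]   # relative inhibition of MC_A
```

Activating *MC*_{B} recruits shared granule cells and suppresses *MC*_{A}, and because the inhibitory term is multiplied by the mitral cell's own rate, the suppression depends on *MC*_{A}'s activity rather than being a fixed subtraction.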

## 5 Geometry, speed, and capacity

We have argued that the model introduced in §3 could be implemented biologically, but can it perform at scale? Concretely, can a circuit of this architecture with neuron counts comparable to the human OB correctly identify which among a large set of odorants are present in a given scene? This is precisely the question of the capacity of the CS algorithm [98, 99]. In Fig. 3A, we present our algorithm with scenes composed of varying numbers of randomly-selected odorants out of a panel of 1000, which for simplicity we take to be at the same concentration. We first ask how many odorants can be reliably detected within a single sniff, i.e., 200 ms. To convert MAP concentration estimates into presence estimates, we simply binarize the estimated concentrations based on whether they are larger than half of the true odorant concentration. The ability of the one-to-one code to successfully detect odorants falls off rapidly, with the detection fraction falling below one-half even if only a handful of odorants are present (Fig. 3A).^{4}

The limited capacity of the one-to-one code can be overcome by distributing the code in a way that takes into account the geometry of the sensing problem. Here, the information geometry of the problem is governed by the sensing matrix **A**, which introduces correlations in the input signals to mitral cells because the off-diagonal components of **A**^{⊤}**A** are non-negligible. To counteract these detrimental correlations, we can choose a readout matrix **Γ** such that **ΓΓ**^{⊤} ≈ (**A**^{⊤}**A**)^{+} up to constants of proportionality, where we must take a pseudoinverse because **A**^{⊤}**A** is highly rank-deficient [37–40] (see Appendix G for details). As a control, we also consider a naïvely distributed code with a random **Γ**. Distributing the code, even without accounting for the geometry of inference, markedly improves the single-sniff capacity to around 10-20, and taking into account the geometry produces a further improvement (Fig. 3A).
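One concrete way to build such a geometry-aware readout (a sketch, not the exact construction of Appendix G) is to take a matrix square root of the pseudoinverse Gram matrix and spread it over *n*_{g} > *n*_{odor} granule cells with a random semi-orthogonal map; the sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes: n_g > n_odor > n_rec, the distributed-code regime
n_rec, n_odor, n_g = 100, 300, 600
A = rng.gamma(0.37, 0.36, size=(n_rec, n_odor))

# Gram matrix of the sensing problem; its rank is at most n_rec
M = A.T @ A
w, V = np.linalg.eigh(M)
keep = w > w.max() * 1e-10                       # drop the numerical null space
B = V[:, keep] / np.sqrt(w[keep])                # B @ B.T = pinv(M)

# A random semi-orthogonal map Q (Q.T @ Q = I) distributes the code over n_g cells
Q, _ = np.linalg.qr(rng.standard_normal((n_g, B.shape[1])))
Gamma = B @ Q.T                                  # Gamma @ Gamma.T = pinv(A.T @ A)
```

Because the pseudoinverse of a positive matrix generally has negative entries, this **Γ** is not sign-constrained; the Discussion considers how additional cell types could absorb those negative weights.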

To gain a more granular view of how tuned geometry enables faster inference, in Fig. 3B we test the three models’ detection capabilities at sub-sniff resolution. We can see that at long times—around a second after odorant onset—the naïvely distributed and geometry-aware codes achieve similar capacities of around 50-60 odorants. However, the geometry-aware code reaches this detection capacity within a single sniff, whereas the naïvely distributed code requires the full second of processing time. The detection capacity of the one-to-one code reaches only around 20 odorants after 1 second, and even then does not appear to have reached its asymptote. These results illustrate two important conceptual points: First, when the strength of individual synapses is bounded, distributed coding can speed up dynamics by increasing the effective total input to a given neuron. Second, given a sensing matrix that induces strong correlations, geometry-aware distributed coding can accelerate inference by counteracting that detrimental coupling.

These tests show that our model can reliably detect tens of odorants from a panel of thousands of possible odorants, but do not probe how the model’s capacity scales to larger odor spaces. While the true dimensionality of odor space remains unknown [100, 101], there may be orders of magnitude more than thousands of possible odorants. As an upper bound, there are on the order of millions of known volatile compounds that are plausibly odorous [102]. Thus, it is important to determine how our model scales to more realistically-sized odor spaces. From the literature on compressed sensing performance bounds, we expect the threshold sparsity to decay slowly—roughly logarithmically—with increasing *n*_{odor} [28–33]. As a first step, in Supp. Figs. G.1 and G.2, we reproduce Fig. 3A-B for between 500 and 8000 possible odorants, showing that performance does indeed drop off slowly with increasing odor space dimension. To get a more precise estimate of how performance scales with repertoire size, in Fig. 3C we plot the threshold number of odorants for which half can be reliably detected as a function of the repertoire size, showing that for the geometry-aware code it decays only a bit faster than logarithmically. Our ability to simulate larger systems was limited by computational resources. This limitation is present in previous works, and the repertoires tested here are comparable to—or substantially larger than—those used in past studies [21, 22, 24, 26, 72].

## 6 Fast sampling for uncertainty estimation

Thus far, we have focused on a circuit that performs MAP estimation of odorant concentrations. However, to successfully navigate a dynamic, noisy world, animals must estimate sensory uncertainty at the timescale of perception [38, 103–107]. Fortunately, our circuit model can be easily extended to perform Langevin sampling of the full posterior distribution, allowing for uncertainty estimation while maintaining its attractive structural features. In Appendix B, we provide a detailed derivation of a model that implements Langevin sampling through the granule cells for a Poisson likelihood and Gamma prior. This yields a circuit that is identical to the MAP estimation circuit introduced in §3 up to the addition of Gaussian noise to the granule cell dynamics:
$$
\tau_g \,\dot{\mathbf{g}}(t) = (\mathbf{A}\boldsymbol{\Gamma})^{\top}(\mathbf{p} - \mathbf{1}) + \boldsymbol{\Gamma}^{\top}\!\left(\mathbf{z} - \mathbf{1} \oslash \boldsymbol{\beta}\right) + \boldsymbol{\xi}(t). \tag{4}
$$

Here, ** ξ**(*t*) is a vector of *n*_{g} independent zero-mean Gaussian noise processes with covariance $\mathbb{E}[\xi_j(t)\,\xi_{j'}(t')] = 2\tau_g\,\delta_{jj'}\,\delta(t - t')$, and once again the rates **g** and **p** should in principle be constrained to be non-negative. In this case, the readout matrix **Γ** both preconditions the effective gradient force and shapes the structure of the effective noise **Γ**** ξ**, allowing us to mold the geometry of the sampling manifold [37, 40]. By using a projected readout, we maintain the independence of the noise processes for different neurons. This both allows us to have many independent samplers—if *n*_{g} *> n*_{odor}—and is important for biological realism if we interpret the sampling noise as resulting from fluctuations in membrane potential due to synaptic noise [108].
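As a sketch of the sampling circuit, consider the collapsed limit in which the mitral cells track their fixed point instantaneously and the readout is one-to-one; the granule dynamics then become overdamped Langevin dynamics on the log-posterior. Sizes, stimuli, and run lengths below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sizes; alpha = 1 (exponential prior), one-to-one readout
n_rec, n_odor, k = 40, 20, 2
A = rng.gamma(0.37, 0.36, size=(n_rec, n_odor))
r0, beta, tau_g = 1.0, 10.0, 30.0    # ms
dt, n_steps, burn = 0.5, 20_000, 5_000

c_true = np.zeros(n_odor)
c_true[rng.choice(n_odor, k, replace=False)] = 5.0
s = rng.poisson(r0 + A @ c_true)

g = np.full(n_odor, 0.5)
samples = []
for step in range(n_steps):
    grad = A.T @ (s / (r0 + A @ g) - 1.0) - 1.0 / beta
    noise = np.sqrt(2.0 * dt / tau_g) * rng.standard_normal(n_odor)
    g = np.clip(g + (dt / tau_g) * grad + noise, 1e-6, None)  # crude projection, g >= 0
    if step >= burn:
        samples.append(g.copy())
samples = np.array(samples)

post_mean, post_sd = samples.mean(axis=0), samples.std(axis=0)
```

The empirical mean and standard deviation of the post-burn-in samples estimate the posterior mean and uncertainty of each concentration; matching the discretized noise variance 2*dt*/τ_{g} to the gradient step *dt*/τ_{g} is what makes the stationary distribution approximate the posterior.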

As a test of how this sampling circuit performs, we consider a simple setup in which one set of odorants appears at a low concentration, and then a second set of odorants appears at a higher concentration while the low-concentration odorants are still present. This setup tests both the circuit’s ability to converge rapidly enough to give accurate posterior samples within a single sniff, and its ability to correctly infer odorant concentrations even in the presence of distractors [63, 66]. In Fig. 4, we show that circuits with one-to-one or naïvely distributed codes do not give accurate estimates of the concentration mean within 200 ms, while tuning the geometry to match the receptor affinities enables fast convergence. All three circuits overestimate the posterior variance at short times, consistent with what one would expect for unadjusted Langevin samplers [109], but the geometry-aware model’s estimate decays most rapidly towards the target. Therefore, when the synaptic weights are tuned, our circuit model can enable fast, robust estimation of concentration statistics, even in the presence of distractors.

## 7 Discussion

In this paper, we have derived a novel, minimal rate model for fast Bayesian inference in early olfaction. Unlike previously-proposed algorithms for CS in olfaction, this model has a clear mapping onto the circuits of the mammalian OB. We showed that this model successfully performs odorant identification across a biologically-relevant range of scene sparsities and circuit sizes. This model therefore exemplifies how normative approaches can blur the lines between algorithmic and mechanistic understanding of neural circuits [82, 83]. We now conclude by discussing possible avenues for future inquiry, as well as some of the limitations of our work.

One limitation of our simulations is that we have chosen to distribute the neural code randomly, and have allowed for negative entries in the mitral-granule synaptic weight matrix **AΓ** (see methods in Appendix G). These features are not entirely biologically satisfactory. However, our model only captures the mitral cells and granule cells, overlooking a number of other inhibitory cell types that could contribute to solving this problem, e.g., through feedforward inhibition onto mitral cells or lateral inhibition across granule cells [48, 49]. Fundamentally, negative values in the geometry-aware decoding matrix **Γ** arise because, although the affinity matrix is elementwise positive, its pseudoinverse will contain negative elements. A biologically-plausible realization of the geometry-aware code through the introduction of additional cell types could be achieved by decomposing the inverse into several components, yielding sparse, consistently-signed connectivity [110]. As a first step, in Supp. Fig. G.4 we show that similar performance to Fig. 4 can be achieved using a sparse non-negative randomly distributed code. As a result, one objective for future work will be to develop better models for the decoding matrix **Γ** that result in more realistic connectivity.

Though our model captures two of the interesting features of the anatomy and physiology of the OB—symmetric dendrodendritic coupling and state-dependent inhibition of mitral cells—there are many biological details which we have not addressed. First, our linear model for OSN mean firing neglects receptor antagonism, gain control, and other nonlinear effects [94, 111, 112], which are known to affect the performance of Gaussian CS models for olfaction [26, 113]. Second, our models are rate-based, while neurons in the OB spike. In spiking implementations of sampling networks, the noise is not uncorrelated across neurons, complicating their biological interpretation [38, 114]. Constructing models that capture these richly nonlinear effects will be an important objective for future work. A first step towards such a nonlinear model would be to build a spiking network that approximates the rate-based models considered here, which could be accomplished using the efficient balanced network formalism for distributed spiking networks [38]. Another step would be to add a Hill function nonlinearity to the OSN model to approximate competitive binding, as studied for Gaussian compressed sensing by Qin et al. [26]. One challenge in constructing models that incorporate additional nonlinearity is that the simple linear strategy for distributing the code used here may no longer be directly applicable. In Appendix E, we illustrate this obstacle for the relatively simple case of a model with the same linear OSNs and Poisson likelihood but an *L*_{0} instead of Gamma prior, building on recent work on circuits for Gaussian CS with *L*_{0} priors [115].

A closely related point is that we model the weights of the synapses between mitral and granule cells as fixed, and do not consider synaptic plasticity. In particular, we assume that they are tuned to the statistics of the receptor affinities without specifying a mechanism by which this tuning could take place. In biology, receptor abundances and other OSN properties display activity-dependent adaptation over long timescales, meaning that the optimal tuning is unlikely to be static [56, 116]. Some past works have sought to incorporate plasticity of the mitral-granule cell synapses into decoding models [24, 117], tying into a larger body of research on how plasticity can enable flexible feature extraction in olfaction [118, 119]. This learning should lead to measurable changes in the population response to odorant panels, which our model predicts should be linked in a precise way to the receptor-induced correlations in the responses to the most frequently encountered odorants. Experimentally characterizing and carefully modeling these changes in responses across timescales will be an interesting avenue for future work [120]. Experimental techniques to probe these ideas at the neural and behavioral level have recently been proposed [121, 122], allowing more precise control of stimulus and subjective geometry.

Though the circuit model derived in §3 incorporates a general Gamma prior represented by cortical feedback, our simulations focus on the special case in which the prior reduces to an exponential, in which the feedback neurons are not needed. Future work will therefore be required to carefully probe the effect of incorporating a Gamma prior with non-unit shape and to dissect the structure of the resulting modeled cortical feedback. More generally, it will be interesting to extend our framework to incorporate data-adaptive priors. Importantly, the stimuli used in this paper constitute an extremely impoverished model for the richness of the true odor world; we do not account for its rich dynamical structure and co-occurrence statistics. Adaptive priors as encoded by cortical feedback would allow circuits to leverage this structure, enabling faster and more accurate inference [88–90, 123, 124].

We conclude by noting that our work provides an example of how distributed coding can lead to faster inference than axis-aligned disentangled coding. In recent years, the question of when axis-aligned coding is optimal has attracted significant attention in machine learning and neuroscience [125–135]. Much of this work focuses on the question of when axis-aligned codes are optimal for energy efficiency or for generalization, whereas here we focus on the question of which code yields the fastest inference dynamics. These ideas are one example of the broader question of how agents and algorithms should leverage the rich geometry of the natural world to enable fast, robust learning and inference [136]. We believe that investigating how task demands and biological constraints affect the optimal representational geometry for that task is a promising avenue for illuminating neural information processing in brains and machines [41, 136, 137].

## A Notational conventions

In this Appendix, we define the notational conventions used throughout the paper. We write ℝ_{≥0} for the non-negative reals, and ℕ = {0, 1, …} for the natural numbers. For a vector **λ** ∈ ℝ^{*n*}_{>0}, we write **x** ∼ Poisson(**λ**) if the probability mass function of **x** ∈ ℕ^{n} is
$$p(\mathbf{x}) = \prod_{j=1}^{n} \frac{\lambda_j^{x_j} e^{-\lambda_j}}{x_j!}.$$
For scalars *α* > 0 and *λ* > 0, we write that a vector **x** ∈ ℝ^{*n*}_{≥0} is distributed as **x** ∼ Gamma(*α*, *λ*) if its probability density function is
$$p(\mathbf{x}) = \prod_{j=1}^{n} \frac{\lambda^{\alpha} x_j^{\alpha - 1} e^{-\lambda x_j}}{\Gamma(\alpha)}.$$
Similarly, for vectors **α**, **λ** ∈ ℝ^{*n*}_{>0}, we write **x** ∼ Gamma(**α**, **λ**) if
$$p(\mathbf{x}) = \prod_{j=1}^{n} \frac{\lambda_j^{\alpha_j} x_j^{\alpha_j - 1} e^{-\lambda_j x_j}}{\Gamma(\alpha_j)}.$$
For vectors **x**, **y** ∈ ℝ^{*n*}, we write **x** ⊙ **y** for their Hadamard (elementwise) product and **x** ⊘ **y** for their elementwise ratio.

## B Detailed derivation of circuit model

In this Appendix, we give a detailed derivation of the circuit models introduced in Sections 3 and 6 of the main text. Here, we focus on the setting of the sampling circuit; the circuit algorithm to compute the MAP introduced in Section 3 of the main text can be recovered at each step by dropping the additive noise terms from the dynamics.

We recall that our goal is to sample the posterior *p*(**c**|**s**) over concentrations **c** given OSN activity **s**, for a Poisson likelihood
$$\mathbf{s} \mid \mathbf{c} \sim \mathrm{Poisson}(r_0 \mathbf{1} + \mathbf{A}\mathbf{c})$$
and a Gamma prior
$$\mathbf{c} \sim \mathrm{Gamma}(\boldsymbol{\alpha}, \boldsymbol{\lambda}).$$
Here, we use the conventions of Appendix A for Poisson and Gamma random vectors. Discarding normalization constants that do not depend on **c**, the density of the posterior is then
$$p(\mathbf{c} \mid \mathbf{s}) \propto \prod_{i=1}^{n_{\mathrm{OSN}}} \left[r_0 + (\mathbf{A}\mathbf{c})_i\right]^{s_i} e^{-(\mathbf{A}\mathbf{c})_i} \prod_{j=1}^{n_{\mathrm{odor}}} c_j^{\alpha_j - 1} e^{-\lambda_j c_j}.$$
We will sample from this distribution using Langevin dynamics, without explicitly constraining the concentration estimate to be non-negative [21, 22]. This corresponds to the stochastic dynamics
$$\frac{d\mathbf{c}}{dt} = \nabla_{\mathbf{c}} \log p(\mathbf{c} \mid \mathbf{s}) + \boldsymbol{\eta}(t),$$
where **η**(*t*) is an *n*_{odor}-dimensional zero-mean Gaussian noise process with **E**[*η*_{j}(*t*)*η*_{j′}(*t*^{′})] = 2*δ*_{jj′} *δ*(*t* − *t*^{′}). The gradient of the log-posterior is
$$\frac{\partial \log p(\mathbf{c} \mid \mathbf{s})}{\partial c_j} = \sum_{i} A_{ij} \frac{s_i}{r_0 + (\mathbf{A}\mathbf{c})_i} - \sum_{i} A_{ij} + \frac{\alpha_j - 1}{c_j} - \lambda_j,$$
or, in vector notation,
$$\nabla_{\mathbf{c}} \log p(\mathbf{c} \mid \mathbf{s}) = \mathbf{A}^{\top}\left[\mathbf{s} \oslash (r_0 \mathbf{1} + \mathbf{A}\mathbf{c})\right] - \mathbf{A}^{\top}\mathbf{1} + (\boldsymbol{\alpha} - \mathbf{1}) \oslash \mathbf{c} - \boldsymbol{\lambda}.$$
The Langevin dynamics then become
$$\frac{d\mathbf{c}}{dt} = \mathbf{A}^{\top}\left[\mathbf{s} \oslash (r_0 \mathbf{1} + \mathbf{A}\mathbf{c})\right] - \mathbf{A}^{\top}\mathbf{1} + (\boldsymbol{\alpha} - \mathbf{1}) \oslash \mathbf{c} - \boldsymbol{\lambda} + \boldsymbol{\eta}(t).$$
This is closely related to the Langevin sampling algorithm introduced in equation (3.9) of Grabska-Barwińska et al. [21], but here we are not separately inferring odor concentration and odor presence. We note that, for *α*_{j} > 1, the term (**α** − **1**) ⊘ **c** gives a force that diverges for small *c*_{j}.
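As a concrete sanity check, the log-posterior gradient can be sketched numerically. The following is a minimal illustration in Python/NumPy (not the paper's MATLAB code); the sizes, the Gamma-distributed affinity entries, and all parameter values are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_osn, n_odor = 5, 8
# illustrative non-negative affinity matrix with Gamma-distributed entries
A = rng.gamma(0.37, 0.36, size=(n_osn, n_odor))
r0, alpha, lam = 1.0, np.ones(n_odor), np.ones(n_odor)

def grad_log_posterior(c, s):
    """Gradient of log p(c|s) for s ~ Poisson(r0*1 + A c) and a Gamma(alpha, lam) prior."""
    rate = r0 + A @ c
    return A.T @ (s / rate) - A.T @ np.ones(n_osn) + (alpha - 1.0) / c - lam

# one naive Langevin step: dc = grad * dt + sqrt(2 dt) * standard normal noise
c = np.full(n_odor, 2.0)
s = rng.poisson(r0 + A @ c).astype(float)
dt = 1e-4
c_new = c + dt * grad_log_posterior(c, s) + np.sqrt(2 * dt) * rng.standard_normal(n_odor)
```

With α = 1 the prior term reduces to the constant −λ, matching the exponential-prior case used in the simulations.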

However, in this setup single neurons encode single odors. We instead want to allow for a distributed code, where the population of neurons responsible for sampling may not directly encode single odors. To distribute the code, we follow previous work by Masset et al. [38] in using the “complete recipe” for stochastic gradient MCMC [37]. From the “complete recipe”, we know that the dynamics
$$\tau_g \frac{d\mathbf{c}}{dt} = \boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top} \nabla_{\mathbf{c}} \log p(\mathbf{c} \mid \mathbf{s}) + \boldsymbol{\Gamma}\boldsymbol{\xi}(t),$$
for a (expectantly named) time constant *τ*_{g} > 0 and any (potentially non-square) matrix **Γ** ∈ ℝ^{*n*_{odor}×*n*_{g}} such that **ΓΓ**^{⊤} is positive-definite, will have as their stationary distribution the posterior *p*(**c**|**s**). Here, we have replaced the *n*_{odor}-dimensional noise process **η**(*t*) with an *n*_{g}-dimensional noise process **ξ**(*t*) with zero mean and covariance
$$\mathbb{E}[\xi_j(t)\xi_{j'}(t')] = 2\tau_g \delta_{jj'}\delta(t - t').$$
Then, we can write the concentration estimate as
$$\hat{\mathbf{c}}(t) = \boldsymbol{\Gamma}\mathbf{g}(t),$$
where the activity of the neurons **g** follows the dynamics
$$\tau_g \frac{d\mathbf{g}}{dt} = \boldsymbol{\Gamma}^{\top} \nabla_{\mathbf{c}} \log p(\mathbf{c} \mid \mathbf{s})\Big|_{\mathbf{c} = \boldsymbol{\Gamma}\mathbf{g}} + \boldsymbol{\xi}(t).$$
Using the gradient of the log-posterior computed above, we then have
$$\tau_g \frac{d\mathbf{g}}{dt} = \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\left[\mathbf{s} \oslash (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g})\right] - \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\Gamma}^{\top}\left[(\boldsymbol{\alpha} - \mathbf{1}) \oslash (\boldsymbol{\Gamma}\mathbf{g})\right] - \boldsymbol{\Gamma}^{\top}\boldsymbol{\lambda} + \boldsymbol{\xi}(t).$$
These dynamics include two divisive non-linearities, which can be complex to implement in biophysical models of single neurons [80]. Using the approach proposed in Chalk et al. [81], we can linearize the inference by introducing two additional cell types that have as their fixed points the elementwise divisions **s** ⊘ (*r*_{0}**1** + **AΓg**) and (**α** − **1**) ⊘ (**Γg**). We then introduce a population **p** of *n*_{OSN} neurons, with dynamics
$$\tau_p \frac{d\mathbf{p}}{dt} = \mathbf{s} - \mathbf{p} \odot (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}),$$
such that the fixed point is
$$\mathbf{p}^* = \mathbf{s} \oslash (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}).$$
We finally introduce a third population **z** of *n*_{odor} neurons, with dynamics
$$\tau_z \frac{d\mathbf{z}}{dt} = \boldsymbol{\alpha} - \mathbf{1} - \mathbf{z} \odot (\boldsymbol{\Gamma}\mathbf{g}),$$
such that the fixed point is
$$\mathbf{z}^* = (\boldsymbol{\alpha} - \mathbf{1}) \oslash (\boldsymbol{\Gamma}\mathbf{g}).$$
To be more precise, the cell types **p** and **z** compute the desired elementwise divisions in the pseudo-steady-state regime *τ*_{p}, *τ*_{z} ↓ 0.

Putting everything together, this gives the circuit dynamics
$$\begin{aligned}
\tau_p \frac{d\mathbf{p}}{dt} &= \mathbf{s} - \mathbf{p} \odot (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}), \\
\tau_z \frac{d\mathbf{z}}{dt} &= \boldsymbol{\alpha} - \mathbf{1} - \mathbf{z} \odot (\boldsymbol{\Gamma}\mathbf{g}), \\
\tau_g \frac{d\mathbf{g}}{dt} &= \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p} + \boldsymbol{\Gamma}^{\top}\mathbf{z} - \boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda}) + \boldsymbol{\xi}(t),
\end{aligned}$$
where we recall that the covariance of the zero-mean Gaussian noise process **ξ**(*t*) is 𝔼[*ξ*_{j}(*t*)*ξ*_{j′}(*t*′)] = 2*τ*_{g}*δ*_{jj′}*δ*(*t* − *t*′). We note that the two constant terms in the dynamics of **g** can be grouped into an overall leak term −**Γ**^{⊤}(**A**^{⊤}**1** + **λ**). As detailed in the main text, we interpret **p** as M/T cells, **g** as granule cells, and **z** as cortical feedback.
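The circuit dynamics described in this Appendix can be integrated with a forward Euler–Maruyama scheme. The following is a minimal Python/NumPy sketch, not the paper's MATLAB implementation; it assumes an exponential prior (α = **1**), for which the cortical feedback population **z** is not needed, and all sizes, matrices, and inputs are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n_osn, n_odor, n_g = 5, 8, 16
A = np.abs(rng.standard_normal((n_osn, n_odor)))           # non-negative affinity matrix (made up)
Q = np.linalg.qr(rng.standard_normal((n_g, n_odor)))[0].T  # n_odor x n_g with orthonormal rows
Gamma = Q                                                  # decoding matrix; Gamma @ Gamma.T = I
r0, lam = 1.0, np.ones(n_odor)                             # baseline rate, exponential prior
tau_p, tau_g, dt = 0.02, 0.03, 1e-4
leak = Gamma.T @ (A.T @ np.ones(n_osn) + lam)              # constant leak on granule cells

def step(p, g, s):
    """One Euler-Maruyama step of the sampling circuit (p: M/T cells, g: granule cells)."""
    drive = r0 + A @ (Gamma @ g)
    p_new = p + dt * (s - p * drive) / tau_p
    xi = np.sqrt(2 * dt / tau_g) * rng.standard_normal(n_g)  # complete-recipe noise
    g_new = g + dt * (Gamma.T @ (A.T @ p) - leak) / tau_g + xi
    return p_new, g_new

s = np.full(n_osn, 5.0)        # fixed OSN spike counts for illustration
p, g = s / r0, np.zeros(n_g)   # benign initial condition (p at its fixed point for g = 0)
for _ in range(500):
    p, g = step(p, g, s)
c_hat = Gamma @ g              # concentration estimate read out from granule activity
```

The noise scaling sqrt(2Δ*t*/*τ*_{g}) follows from the stated covariance 2*τ*_{g}*δ*_{jj′}*δ*(*t* − *t*′) after dividing the dynamics through by *τ*_{g}.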

## C Stability of the MAP fixed point

In this Appendix, we aim to get some understanding for how the introduction of the cell type **p** to represent the elementwise division affects the inference, particularly when *τ*_{p} is comparable to *τ*_{g}. For simplicity, we specialize to the case of an exponential prior (i.e., we set **α** = **1**), in which case the general circuit (3) simplifies to
$$\tau_p \frac{d\mathbf{p}}{dt} = \mathbf{s} - \mathbf{p} \odot (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}), \qquad \tau_g \frac{d\mathbf{g}}{dt} = \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p} - \boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda}).$$

### C.1 Analysis of a two-cell circuit

To build intuition, we first consider a circuit with *n*_{odor} = *n*_{OSN} = *n*_{g} = 1, in which the affinity and decoding matrices reduce to positive scalars *a* and *γ*:
$$\tau_p \dot{p} = s - p(r_0 + a\gamma g), \qquad \tau_g \dot{g} = \gamma a p - \gamma(a + \lambda).$$
In this case, it is useful to non-dimensionalize the system. Re-scale time as
$$\tilde{t} = t/\tau_g,$$
and define the dimensionless parameters and input
$$\tau = \tau_p/\tau_g, \qquad \sigma = \frac{a s}{a + \lambda}.$$
These dynamics have a single fixed point at
$$p^* = \frac{a + \lambda}{a}, \qquad \gamma g^* = \frac{\sigma - r_0}{a}.$$
Linearizing about this fixed point, we have
$$\frac{d}{d\tilde{t}}\begin{pmatrix} \delta p \\ \delta g \end{pmatrix} = -\mathbf{M}\begin{pmatrix} \delta p \\ \delta g \end{pmatrix}$$
for
$$\mathbf{M} = \begin{pmatrix} \sigma/\tau & \gamma(a + \lambda)/\tau \\ -\gamma a & 0 \end{pmatrix}.$$
The eigenvalues of −**M** are easily computed as
$$\mu_{\pm} = \frac{1}{2}\left[-\frac{\sigma}{\tau} \pm \sqrt{\frac{\sigma^2}{\tau^2} - \frac{4\gamma^2 a(a + \lambda)}{\tau}}\right].$$
For any positive *a*, *γ*, *λ* and non-negative *σ*, the term under the square root is either negative or smaller in magnitude than *σ*²/*τ*²; hence it is easy to see that the real parts of both of these eigenvalues are strictly negative so long as *σ* > 0 (equivalently, *s* > 0), meaning that the system is stable. If *s* = 0—which should be exceedingly rare if the OSNs have a baseline rate—then the eigenvalues are purely imaginary,
$$\mu_{\pm} = \pm i\gamma\sqrt{\frac{a(a + \lambda)}{\tau}},$$
and there can be oscillations. Therefore, a linear stability analysis suggests that the MAP fixed point of this two-cell circuit should be stable even for large *τ* = *τ*_{p}/*τ*_{g} in the presence of a non-zero input, though we expect the relaxation timescales to grow as *τ* becomes larger.
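The stability claim for the two-cell circuit is easy to check numerically. The sketch below (Python/NumPy, arbitrary made-up positive parameters) builds the Jacobian of the scalar (*p*, *g*) dynamics at the fixed point and confirms that its eigenvalues have negative real part with input and are purely imaginary without input:

```python
import numpy as np

def jacobian(s, a, gamma, lam, r0, tau_p, tau_g):
    """Jacobian of tau_p p' = s - p (r0 + a*gamma*g), tau_g g' = gamma*a*p - gamma*(a + lam),
    evaluated at the fixed point p* = (a + lam)/a, where r0 + a*gamma*g* = a s / (a + lam)."""
    p_star = (a + lam) / a
    drive = a * s / (a + lam)  # value of r0 + a*gamma*g at the fixed point
    return np.array([[-drive / tau_p, -a * gamma * p_star / tau_p],
                     [gamma * a / tau_g, 0.0]])

# with input: strictly negative real parts; without input: purely imaginary eigenvalues
eig_on = np.linalg.eigvals(jacobian(s=10.0, a=1.0, gamma=0.5, lam=2.0, r0=1.0, tau_p=0.02, tau_g=0.03))
eig_off = np.linalg.eigvals(jacobian(s=0.0, a=1.0, gamma=0.5, lam=2.0, r0=1.0, tau_p=0.02, tau_g=0.03))
```

The trace of the Jacobian is proportional to −*s*, so the damping vanishes exactly when the input does.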

### C.2 Analysis of the full circuit

We now consider the full MAP circuit
$$\tau_p \frac{d\mathbf{p}}{dt} = \mathbf{s} - \mathbf{p} \odot (r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}), \qquad \tau_g \frac{d\mathbf{g}}{dt} = \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p} - \boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda}).$$
We assume that *n*_{OSN} *< n*_{odor} ≤ *n*_{g} and that **AΓ** is of full row rank, i.e., it has rank *n*_{OSN}. We recall that the dynamics of **g** are confined to the subspace span(**Γ**^{⊤}), and, in particular, the **p**-dependent term affects only an *n*_{OSN}-dimensional subspace.

We start by observing that the dynamics of **p** depend on **g** only through
$$\mathbf{q} = r_0\mathbf{1} + \mathbf{A}\boldsymbol{\Gamma}\mathbf{g}.$$
Importantly, if **AΓ** is positivity-preserving, then **q** ≥ *r*_{0}**1** elementwise. Then, we have the closed dynamics
$$\tau_p \frac{d\mathbf{p}}{dt} = \mathbf{s} - \mathbf{p} \odot \mathbf{q}, \qquad \tau_g \frac{d\mathbf{q}}{dt} = \mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p} - \mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda})$$
for **q** ∈ ℝ^{*n*_{OSN}}. The fixed point of this system is of course determined by the conditions *d*(**q, p**)*/dt* = **0**, which gives
$$\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p}^* = \mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda}) \quad \text{and} \quad \mathbf{p}^* \odot \mathbf{q}^* = \mathbf{s},$$
subject to the non-negativity constraints. By our assumptions on the rank of **AΓ**, the symmetric matrix **AΓΓ**^{⊤}**A**^{⊤} is positive-definite and thus can be inverted to solve the first condition for **p**^{*}:
$$\mathbf{p}^* = (\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}(\mathbf{A}^{\top}\mathbf{1} + \boldsymbol{\lambda}).$$
For the second condition to be satisfied, we can see that the elements of **p**^{*} must be strictly positive at the fixed point, which gives a self-consistency condition on **A, Γ**, and **λ**. Assuming that this holds, we then have
$$\mathbf{q}^* = \mathbf{s} \oslash \mathbf{p}^*,$$
which again gives a self-consistency condition as non-negativity is required.
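The fixed-point conditions above are straightforward to verify numerically. A sketch (Python/NumPy, with random made-up matrices) that computes **p**^{*} by inverting **AΓΓ**^{⊤}**A**^{⊤} and checks that the residuals of the (**q**, **p**) dynamics vanish there:

```python
import numpy as np

rng = np.random.default_rng(2)
n_osn, n_odor, n_g = 4, 6, 12
A = np.abs(rng.standard_normal((n_osn, n_odor)))
Gamma = rng.standard_normal((n_odor, n_g))
lam = np.ones(n_odor)
s = np.full(n_osn, 5.0)

W = A @ Gamma @ Gamma.T @ A.T  # positive-definite when A @ Gamma has full row rank
p_star = np.linalg.solve(W, A @ Gamma @ Gamma.T @ (A.T @ np.ones(n_osn) + lam))
q_star = s / p_star            # valid only if the self-consistency condition p_star > 0 holds
consistent = np.all(p_star > 0)

# residuals of the closed (q, p) dynamics at the candidate fixed point
res_p = s - p_star * q_star
res_q = W @ p_star - A @ Gamma @ Gamma.T @ (A.T @ np.ones(n_osn) + lam)
```

Whether `consistent` is true depends on the particular draw of **A** and **Γ**, mirroring the self-consistency condition discussed in the text.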

Assuming these conditions hold, we can then linearize the dynamics about the fixed point. For convenience, we non-dimensionalize time through
$$\tilde{t} = t/\tau_g$$
and
$$\tau = \tau_p/\tau_g,$$
which yields the linearized dynamics
$$\frac{d}{d\tilde{t}}\begin{pmatrix} \delta\mathbf{p} \\ \delta\mathbf{q} \end{pmatrix} = -\mathbf{M}\begin{pmatrix} \delta\mathbf{p} \\ \delta\mathbf{q} \end{pmatrix}$$
for
$$\mathbf{M} = \begin{pmatrix} \operatorname{diag}(\mathbf{s} \oslash \mathbf{p}^*)/\tau & \operatorname{diag}(\mathbf{p}^*)/\tau \\ -\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top} & \mathbf{0} \end{pmatrix}.$$
Using the fact that the diagonal matrices diag(**p**^{*}) and diag(**s**⊘**p**^{*}) commute, the characteristic polynomial of **M** is
$$\det\left[\mu^2 \mathbf{I} - \frac{\mu}{\tau}\operatorname{diag}(\mathbf{s} \oslash \mathbf{p}^*) + \frac{1}{\tau}\operatorname{diag}(\mathbf{p}^*)\,\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\right] = 0.$$
One case that is particularly easy to solve is when the symmetric positive-definite matrix **AΓΓ**^{⊤}**A**^{⊤} is in fact diagonal, with positive diagonal entries *a*_{j}. Then, we have
$$\mu^2 - \frac{\mu s_j}{\tau p^*_j} + \frac{a_j p^*_j}{\tau} = 0,$$
meaning that the eigenvalues of **M** are
$$\mu_{j,\pm} = \frac{1}{2}\left[\frac{s_j}{\tau p^*_j} \pm \sqrt{\frac{s_j^2}{\tau^2 (p^*_j)^2} - \frac{4 a_j p^*_j}{\tau}}\right],$$
which under the given assumptions always have strictly positive real part for non-negative inputs.

Now, more generally, assume that **M** is diagonalizable, and let **m** be an un-normalized eigenvector of **M** with eigenvalue *μ*. Then, writing
$$\mathbf{m} = \begin{pmatrix} \mathbf{u} \\ \mathbf{v} \end{pmatrix}$$
for **u**, **v** ∈ ℂ^{*n*_{OSN}}, the eigenvector condition **Mm** = *μ***m** implies that **u** and **v** satisfy
$$\frac{1}{\tau}\operatorname{diag}(\mathbf{s} \oslash \mathbf{p}^*)\,\mathbf{u} + \frac{1}{\tau}\operatorname{diag}(\mathbf{p}^*)\,\mathbf{v} = \mu\mathbf{u} \quad \text{and} \quad -\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{u} = \mu\mathbf{v}.$$
Assuming that *μ* is non-zero, we can solve the second equation for **u** and then substitute the result into the first to obtain a quadratic eigenproblem for *μ* and **v**:
$$\mu^2 (\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{v} - \frac{\mu}{\tau}\operatorname{diag}(\mathbf{s} \oslash \mathbf{p}^*)(\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{v} + \frac{1}{\tau}\operatorname{diag}(\mathbf{p}^*)\,\mathbf{v} = \mathbf{0}.$$
We will not attempt to solve this eigenproblem, but will instead attempt to extract information about possible values of *μ* for a fixed **v**. Suppose (without loss of generality given the assumption that it is nonzero, as otherwise we may divide by its norm) that **v** is a unit vector. Then, acting with **v**^{†} from the left and dividing through by **v**^{†}(**AΓΓ**^{⊤}**A**^{⊤})^{−1}**v**, which is real and strictly positive, we obtain the scalar quadratic
$$\mu^2 - b\mu + a = 0,$$
where we define the coefficients
$$a = \frac{\mathbf{v}^{\dagger}\operatorname{diag}(\mathbf{p}^*)\,\mathbf{v}}{\tau\,\mathbf{v}^{\dagger}(\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{v}}$$
and
$$b = \frac{\mathbf{v}^{\dagger}\operatorname{diag}(\mathbf{s} \oslash \mathbf{p}^*)(\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{v}}{\tau\,\mathbf{v}^{\dagger}(\mathbf{A}\boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top})^{-1}\mathbf{v}}.$$
So long as **s** and **p**^{*} are positive, *a* is real and positive, while *b* is in general complex. Then, solving the quadratic equation for *μ*, we have
$$\mu_{\pm} = \frac{b \pm \sqrt{b^2 - 4a}}{2}.$$
If Im(*b*) = 0, then we are in the same situation as we were for the two-neuron circuit, and the fixed point is thus stable.

More generally, for the system to be stable we want to have Re(*μ*_{±}) > 0 for both roots. For either sign of the square root, one can show that this holds provided that Re(*b*) > 0 and that Im(*b*) is sufficiently small relative to *a* and Re(*b*). This amounts to a combined condition on the effective prior strength and the relative time constants. Therefore, for sufficiently small effective prior strength, we have stability for a broad range of *τ*_{p}.

How can we go from stability of the (**q, p**) circuit to stability of the full circuit? Heuristically, this follows by decomposing the dynamics of **g** in terms of the different subspaces. We will not attempt to do this rigorously, as our goal is only to get a rough sense of how the system should behave. By the fact that their dimensions are strictly ordered, we know that span(**Γ**^{⊤})⊃span[(**AΓ**)^{⊤}]. We can first exclude components outside span(**Γ**^{⊤}), as they will be unchanged by the dynamics and do not affect the concentration estimate. Recalling the non-negativity constraint, components in span(**Γ**^{⊤})\span[(**AΓ**)^{⊤}] will decay linearly to zero. Then, the components in span[(**AΓ**)^{⊤}] should be controllable using the argument for the (**q, p**) space. Giving a fully rigorous analysis of the conditions under which the MAP fixed point is stable will be an interesting avenue for future investigation.

## D Derivation for the Gaussian circuit model

To compare the state-dependent inhibition that arises when estimating the MAP for a Poisson noise model with the fixed inhibition that results from Gaussian noise, we build an alternative circuit model. This circuit still shares the separation into two cell types, but the granule cells **g** converge on the solution for a Gaussian noise model.

Our starting point is an isotropic Gaussian likelihood
$$\mathbf{s} \mid \mathbf{c} \sim \mathcal{N}(r_0\mathbf{1} + \mathbf{A}\mathbf{c},\, \sigma^2\mathbf{I})$$
with variance *σ*^{2}, and an exponential prior
$$\mathbf{c} \sim \mathrm{Exp}(\boldsymbol{\lambda}).$$
Gradient ascent on the resulting log-posterior over **c** leads to the dynamics
$$\frac{d\mathbf{c}}{dt} = \frac{1}{\sigma^2}\mathbf{A}^{\top}(\mathbf{s} - r_0\mathbf{1} - \mathbf{A}\mathbf{c}) - \boldsymbol{\lambda}.$$
Distributing the code such that **ĉ**(*t*) = **Γg** and splitting the inference as we did in the Poisson case gives the circuit dynamics
$$\tau_p \frac{d\mathbf{p}}{dt} = \mathbf{s} - r_0\mathbf{1} - \mathbf{A}\boldsymbol{\Gamma}\mathbf{g} - \sigma^2\mathbf{p}, \qquad \tau_g \frac{d\mathbf{g}}{dt} = \boldsymbol{\Gamma}^{\top}\mathbf{A}^{\top}\mathbf{p} - \boldsymbol{\Gamma}^{\top}\boldsymbol{\lambda}.$$
Unlike in the Poisson case, the inhibition onto the projection cells (mitral cells) is not gated by their activity, leading to the fixed offset shown in Fig. 2B.
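To make the contrast concrete, here is a sketch (Python/NumPy; sizes, parameters, and the exact update terms are illustrative assumptions, not the paper's code) of one Euler step of each circuit's mitral-cell update, showing that the Poisson circuit's inhibition is multiplicatively gated by the cell's own activity **p**, while the Gaussian circuit's is a fixed subtraction:

```python
import numpy as np

rng = np.random.default_rng(3)
n_osn, n_g = 4, 10
AG = np.abs(rng.standard_normal((n_osn, n_g)))  # effective mitral-to-granule weights A @ Gamma
s, g = np.full(n_osn, 8.0), np.abs(rng.standard_normal(n_g))
r0, sigma2, tau_p = 1.0, 1.0, 0.02
p = np.ones(n_osn)

# Poisson circuit: the inhibitory term r0 + AG g enters multiplied by the cell's own activity p
dp_poisson = (s - p * (r0 + AG @ g)) / tau_p
# Gaussian circuit: the inhibitory term AG g is subtracted with a fixed, activity-independent weight
dp_gaussian = (s - r0 - AG @ g - sigma2 * p) / tau_p
```

Doubling **p** doubles the Poisson circuit's inhibition but leaves the Gaussian circuit's subtraction of **AΓg** unchanged, which is the state dependence probed in Fig. 2B.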

## E Extending the sampling circuit to incorporate an *L*_{0} prior

In this Appendix, we consider the possibility of extending our circuit model to incorporate an *L*_{0} spike-and-slab prior
where *ϖ* is the probability that the odor is present and *λ* is the rate of the exponential prior on concentrations given that the odor is present. To sample from the resulting posterior using Langevin dynamics, we will follow the approach of Fang et al. [115], who developed a Langevin algorithm to perform sparse coding with this prior given a Gaussian likelihood. This approach only works in the sampling setting; it cannot be applied to MAP estimation because the Dirac mass at *c*_{i} = 0 means that the MAP estimate will always vanish.

In this approach, we define an auxiliary variable **u** that is mapped to concentration estimates **c** via element-wise soft thresholding, **c** = *f*(**u**), where
$$f(u) = \operatorname{sign}(u)\max(|u| - u_0, 0)$$
is the soft-thresholding function for threshold *u*_{0} > 0.
We then posit Langevin dynamics for **u**, given an observation **s**, with no constraint on the sign of **u**, where **η**(*t*) is a white Gaussian noise process. Here, all nonlinearities are applied element-wise, and **u**_{0} = *u*_{0}**1** is a vector with all elements equal to the threshold *u*_{0}. Fang et al. [115] argue that the stationary distribution induced on **c** = *f*(**u**) by these dynamics should be the desired posterior with *L*_{0} prior.

Using the Poisson likelihood gradient as computed before, the drift on **u** involves the composition **A**^{⊤}[**s** ⊘ (*r*_{0}**1** + **A***f*(**u**))]. At this stage, we can see that the soft thresholding has picked out a preferred basis. If we apply the complete recipe in a way analogous to what we did in Appendix B by writing **u** = **Γg**, we obtain dynamics for **g** driven by a zero-mean Gaussian noise process **ξ**(*t*). Here, we cannot simply regroup terms; if we introduce an additional cell type **p** as before to compute the division, we will have a sort of effective weight matrix for the **p**-to-**g** connections, and an entirely different coupling **A***f*(|**Γg**|) for the **g**-to-**p** connections. This issue illustrates the limitations of the simple, linear approach to distributing the neural code used in Appendix B. In particular, the nonlinearity in some sense picks out a preferred basis, meaning that we can no longer perform a simple linear change of coordinates to distribute the code.
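For concreteness, the element-wise soft-thresholding map can be sketched as follows (the two-sided variant is shown; whether Fang et al. [115] use the one- or two-sided form is an assumption here):

```python
import numpy as np

def soft_threshold(u, u0):
    """Two-sided soft threshold: shrink |u| by u0 and zero out anything that falls below it."""
    return np.sign(u) * np.maximum(np.abs(u) - u0, 0.0)

u = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
c = soft_threshold(u, 1.0)  # -> [-1., 0., 0., 0., 0.5]
```

The flat region around zero is what places a point mass at *c*_{i} = 0, and it is also the nonlinearity that breaks the linear change-of-coordinates argument used in Appendix B.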

## F Experimental details and affinity matrix fitting

Here we briefly describe the experimental procedures used to collect the data used to fit the parameters of the affinity matrix **A** in Fig. 1. The full experimental methods are described in the paper that first presented the data, Zak et al. [94]. All the experiments were performed in accordance with the guidelines set by the National Institutes of Health and approved by the Institutional Animal Care and Use Committee at Harvard University.

### F.1 *In vivo* recordings from mouse OB

Adult (> 8 weeks) OMP-GCaMP3 mice of both sexes were used in this study. A craniotomy was performed to provide optical access to olfactory sensory neuron axon terminals in both olfactory bulbs. A custom-built two-photon microscope was used for in vivo imaging. Images were acquired at 16-bit resolution and 4-8 frames/s. The pixel size was 0.6 *μm* and the fields of view were 720×720 *μm*. Monomolecular odorants (Allyl butyrate, Ethyl valerate, Methyl tiglate, and Isobutyl propionate) were used as stimuli and delivered by a 16-channel olfactometer controlled by custom-written software in LabVIEW. For the odorant concentration series, the initial odorant concentration was between 0.08% and 80% (*v/v*) in mineral oil and further diluted 16 times with air. The relative odorant concentrations were measured by a photoionization detector, then normalized to the largest detected signal for each odorant. For all experiments, the airflow to the animal was held constant at 100 *mL/min*, and odorants were injected into a carrier stream. Each odorant concentration was delivered 2–6 times in pseudorandom order.

Images were processed using both custom and publicly available MATLAB scripts. Motion artifact compensation and denoising were done using *NoRMCorre* [138]. The Δ*F/F* signal was calculated by finding the peak signal following odorant onset and averaging with the two adjacent points. To account for changes in respiration and anesthesia depth, correlated variability was corrected for [139]. Thresholds for classifying responding ROIs were determined from a noise distribution of blank (no odorant) trials; ROIs whose responses exceeded three standard deviations of this distribution were classified as responsive.

### F.2 Gamma distribution fitting

In order to fit the affinity matrix **A** to the experimentally recorded data, we normalized the response of each glomerulus to its maximum response across the panel of 32 odorants. We then vectorized the resulting matrix and fitted a Gamma distribution using the gamfit function in MATLAB. As mentioned in the main text, this resulted in a Gamma(0.37, 0.36) distribution.
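An analogous fit can be sketched in Python with a method-of-moments estimator (a sketch only: MATLAB's gamfit uses maximum likelihood, and the data below are synthetic placeholders, not the recorded glomerular responses):

```python
import numpy as np

rng = np.random.default_rng(4)
# synthetic stand-in for the vectorized, normalized glomerular responses
data = rng.gamma(shape=0.37, scale=0.36, size=50000)

# method-of-moments for Gamma(shape, scale): mean = k*theta, variance = k*theta^2
mean, var = data.mean(), data.var()
shape_hat = mean**2 / var
scale_hat = var / mean
```

On this synthetic sample, the estimates land close to the generating parameters (0.37, 0.36), mirroring the Gamma(0.37, 0.36) fit reported in the main text.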

## G Numerical methods and supplemental figures

All numerical simulations were performed using MATLAB 9.13 (R2022b, The MathWorks, Natick, MA, USA) either on desktop workstations (CPU: Intel i9-9900K or Xeon W-2145, 64GB RAM) or on the Harvard University FASRC Cannon HPC cluster (https://www.rc.fas.harvard.edu). Our simulations were not computationally intensive, and required around 6000 CPU-hours of total compute time.

### G.1 State-dependent inhibition simulations

For the simulations to highlight the signatures of state-dependent inhibition of the Poisson network and compare its behavior with the experimental observation by Arevian et al. [95], we use a reduced network with *n*_{OSN} = 2 and *n*_{g} = 10. We chose this reduced circuit both to reduce computational costs and to match the experimental parameters of the *in-vitro* experiment. The other parameters of the simulation were as follows: *τ*_{p} = 0.02, *τ*_{g} = 0.03, *r*_{0} = 10, *λ* = 2, Δ*t* = 10^{−4}. The simulation ran for 600 *ms*; stimulation was applied starting at *t*_{start} = 100 *ms* and ended at *t*_{end} = 500 *ms*. We stimulated the principal mitral cell *MC*_{A} with 80 equally spaced values in *s*_{A} ∈ [1, 400]. When the second mitral cell was active, its stimulation was set at *s*_{B} = 80. The entries of the **Γ** matrix were sampled from a normal distribution on which we applied a mask such that only 25% of its entries were non-zero. We sampled 32 ‘pairs’ of cells by resampling the **Γ** matrix for each ‘pair’ of cells. The simulation for the Gaussian circuit used the same parameters, except that the dynamics followed those from equation D.4. For plotting purposes, we normalized the range of firing in Figure 2B to the maximum firing rate for each circuit model.

### G.2 Capacity simulations

In our capacity simulations, we use *n*_{OSN} = 300 and *n*_{odor} = 500, 1000, or 2000. We take the odor stimulus to be a rectangular pulse, with varying numbers of randomly-selected odors appearing at concentration *c*_{j} = 40. As elsewhere, we take *τ*_{p} = 20 ms [91] and *τ*_{g} = 30 ms [92] to match experiment. We set the baseline rate to be *r*_{0} = 1 and the prior mean to be *λ* = 1; based on some experimentation our results appear relatively insensitive to small variations in these choices. We integrate the MAP circuit dynamics (3) using the forward Euler method with timestep Δ*t* = 10^{−4} s. To determine which odors the model estimates as being present at a given timepoint, we simply check which concentration estimates exceed 20 at that time.

As discussed in the main text, we consider three variants of the MAP circuit, defined by different choices of the matrix **Γ**:

- For the one-to-one code, we let *n*_{g} = *n*_{odor}, and simply set **Γ** = **I**.
- For the naïvely distributed code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with orthonormal rows by drawing a Gaussian matrix and orthogonalizing its rows. Then, we define **Γ** = **Q***/N*{|(**AQ**)_{ij}|}.
- For the geometry-aware code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with orthonormal rows by drawing a Gaussian matrix and orthogonalizing its rows. Then, we compute an approximate inverse square root of the low-rank matrix **A**^{⊤}**A** as **B** = (**A**^{⊤}**A** + *a***I**)^{−1/2} for a small positive regularizing constant *a*. In our simulations, we set *a* = 0.5; we find empirically that our results are not substantially sensitive to small variations in *a*. Finally, we let **Γ** = **BQ***/N*{|(**ABQ**)_{ij}|}.

where *N* is a normalization function defined as
$$N\{x_{ij}\} = \frac{1}{C}\max_{ij} x_{ij},$$
with *C* = 50 identified as a reasonable choice. Our normalization convention for **Γ** is motivated by the idea that, in biology, the strength of individual synapses should be bounded. Moreover, changing the overall scale of the synaptic weights in our model corresponds to changing the effective time constant of the dynamics, which can produce a trivial speedup.
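The three constructions can be sketched as follows (Python/NumPy rather than the original MATLAB; the max-based normalization and the eigendecomposition route to the inverse square root are the assumptions here):

```python
import numpy as np

rng = np.random.default_rng(5)
n_osn, n_odor, expand, a_reg, C = 30, 50, 5, 0.5, 50.0
A = np.abs(rng.standard_normal((n_osn, n_odor)))  # made-up non-negative affinity matrix

def row_orthonormal(n_rows, n_cols, rng):
    """Gaussian matrix with orthonormalized rows (requires n_rows <= n_cols)."""
    return np.linalg.qr(rng.standard_normal((n_cols, n_rows)))[0].T

def normalize(M, ref):
    """Scale M so that the largest |entry| of the reference matrix maps to C."""
    return C * M / np.max(np.abs(ref))

# one-to-one code
Gamma_1to1 = np.eye(n_odor)
# naively distributed code
Q = row_orthonormal(n_odor, expand * n_odor, rng)
Gamma_naive = normalize(Q, A @ Q)
# geometry-aware code: regularized inverse square root of A^T A via eigendecomposition
evals, evecs = np.linalg.eigh(A.T @ A + a_reg * np.eye(n_odor))
B = evecs @ np.diag(evals**-0.5) @ evecs.T
Gamma_geo = normalize(B @ Q, A @ B @ Q)
```

Because **Q** has orthonormal rows, **ΓΓ**^{⊤} is positive-definite in both distributed variants, as the complete-recipe construction requires.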

### G.3 Sampling simulations

In our sampling simulations, we use *n*_{OSN} = 300, *n*_{odor} = 500 or 1000, and *n*_{g} = 5*n*_{odor}. Here, the odor stimulus was composed of two rectangular pulses: At time *t*_{low} = 0 s, five “low” odors appear at concentration 10, and remain present at that concentration until time *t* = 2 s. Then, at time *t*_{high} = 1 s, five “high” odors appear at concentration 40, and remain present until time *t*_{off} = 2 s. As in our capacity simulations, we take *τ*_{p} = 20 ms [91] and *τ*_{g} = 30 ms [92] to match experiment, and we set the baseline rate to be *r*_{0} = 1 and the prior mean to be *λ* = 1. As for our capacity results, our results appear relatively insensitive to small variations in these choices. We integrate the sampling circuit dynamics (4) using the forward Euler-Maruyama method with timestep Δ*t* = 10^{−5} s.

As in our capacity simulations, in Fig. 4 we choose **Γ** as follows:

- For the one-to-one code, we let *n*_{g} = *n*_{odor}, and simply set **Γ** = **I**.
- For the naïvely distributed code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with orthonormal rows by drawing a Gaussian matrix and orthogonalizing its rows. Then, we define **Γ** = **Q***/*max{|(**AQ**)_{ij}|}.
- For the geometry-aware code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with orthonormal rows by drawing a Gaussian matrix and orthogonalizing its rows. Then, we compute an approximate inverse square root of the low-rank matrix **A**^{⊤}**A** as **B** = (**A**^{⊤}**A** + *a***I**)^{−1/2} for a small positive regularizing constant *a*. In our simulations, we set *a* = 0.5; we find empirically that our results are not substantially sensitive to small variations in *a*. Finally, we let **Γ** = **BQ***/*max{|(**ABQ**)_{ij}|}.

In Fig. 4A, we smooth the concentration estimate timeseries using a 100 ms moving average. In Fig. 4B-C, we show cumulative estimates of the mean and variance. Concretely, given a concentration timeseries *ĉ*_{j}(*t*) and a start time *τ*, we estimate the mean and variance as
$$\hat{\mu}_j(t) = \frac{\Delta t}{t - \tau} \sum_{t' = \tau + \Delta t}^{t} \hat{c}_j(t')$$
and
$$\hat{\sigma}^2_j(t) = \frac{\Delta t}{t - \tau} \sum_{t' = \tau + \Delta t}^{t} \left[\hat{c}_j(t') - \hat{\mu}_j(t)\right]^2,$$
respectively, where we assume *τ* and *t* are integer multiples of Δ*t* and the sums run over timesteps. In Fig. 4D-E, we do the same except for times after *t*_{high}. In both cases, baselines were obtained by running the naïve sampling algorithm for 10^{8} steps with a burn-in period of 10^{7} steps. In Supp. Fig. G.3, we reproduce Fig. 4 showing estimates for individual odorants as well as the mean across odorants.
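Cumulative mean and variance estimators of this kind can be computed in a single vectorized pass. A sketch (Python/NumPy; the timeseries is a made-up stand-in for a post-burn-in concentration estimate):

```python
import numpy as np

rng = np.random.default_rng(6)
c_hat = 10.0 + rng.standard_normal(10000)  # made-up concentration-estimate timeseries

# cumulative mean, and the (biased) cumulative variance via E[c^2] - (E[c])^2
counts = np.arange(1, c_hat.size + 1)
cum_mean = np.cumsum(c_hat) / counts
cum_var = np.cumsum(c_hat**2) / counts - cum_mean**2
```

The identity used for `cum_var` equals the mean squared deviation from the running mean, so the final entries agree with the full-sample mean and variance.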

In Supp. Fig. G.4, we show a preliminary experiment with an alternative, mostly non-negative choice for **Γ**. This gives the following three models:

- For the one-to-one code, we let *n*_{g} = *n*_{odor}, and simply set **Γ** = **I**.
- For the naïvely distributed code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a sparse, non-negative random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with entries that are non-zero with density 0.15, where the non-zero entries are drawn uniformly on [0, 1]. Then, we define **Γ** = **Q***/*max{|(**AQ**)_{ij}|}.
- For the geometry-aware code, we let *n*_{g} = 5*n*_{odor}, and choose **Γ** as follows: We sample a sparse, non-negative random matrix **Q** ∈ ℝ^{*n*_{odor}×*n*_{g}} with entries that are non-zero with density 0.15, where the non-zero entries are drawn uniformly on [0, 1]. Then, we compute an approximate inverse square root of the low-rank matrix **A**^{⊤}**A** as **B** = (**A**^{⊤}**A** + *a***I**)^{−1/2} for a small positive regularizing constant *a*. In our simulations, we set *a* = 0.5; we find empirically that our results are not substantially sensitive to small variations in *a*. Finally, we let **Γ** = **BQ***/*max{|(**ABQ**)_{ij}|}.

This yields a naïvely distributed synaptic weight matrix **AΓ** that is entirely non-negative, while the geometry-aware weight matrix has a small number of negative elements due to the fact that the inverse of a non-negative matrix need not be non-negative. We see that the behavior of this circuit is similar to that observed in Fig. 4. As mentioned in the Discussion, an important topic for future work will be to devise a method to choose **Γ** that yields a fully non-negative, sparse synaptic weight matrix **AΓ**.

## H Code and data availability

All code and data required to reproduce the figures presented is available under an MIT License at https://github.com/Pehlevan-Group/olfaction-geometry/.

## Acknowledgments and Disclosure of Funding

We thank Naoki Hiratani, Shanshan Qin, and Vikrant Kapoor for useful discussions. This work was supported by NSF grants DMS-2134157 and CAREER IIS-2239780 to CP, NIH grants R01DC017311 and R01DC016289 to VNM, and NTT Research award A47994 to VNM. CP received additional support from a Sloan Research Fellowship. PM was partially supported by a grant from the Harvard Mind Brain Behavior Interfaculty Initiative. JDZ was partially supported by NIH grant K99DC017754. This work has been made possible in part by a gift from the Chan Zuckerberg Initiative Foundation to establish the Kempner Institute for the Study of Natural and Artificial Intelligence. A subset of the computations in this paper were run on the FASRC cluster supported by the FAS Division of Science Research Computing Group at Harvard University.

## Footnotes

† VNM and CP jointly supervised this work.

jzavatoneveth{at}g.harvard.edu, paul_masset{at}fas.harvard.edu, vnmurthy{at}fas.harvard.edu, cpehlevan{at}seas.harvard.edu

NeurIPS 2023 Camera-Ready version; added plots showing scaling of detection capacity with odorant repertoire size.

^{3} See Appendix A for a detailed description of our notational conventions.

^{4} We remark that, with a one-to-one code, our model is identical to that proposed by Grabska-Barwińska et al. [21] except for the introduction of the granule cells.
