## Abstract

There are numerous different odorant molecules in nature but only a relatively small number of olfactory receptor neurons (ORNs) in the brain. This “compressed sensing” challenge is compounded by the constraint that ORNs are nonlinear sensors with a finite dynamic range. Here, we investigate possible optimal olfactory coding strategies by maximizing the mutual information between odor mixtures and ORN responses with respect to the bipartite odor-receptor interaction network (ORIN), characterized by the sensitivities of all odorant-ORN pairs. We find that the optimal ORIN is sparse – a finite fraction of the sensitivities are zero – and the nonzero sensitivities follow a broad distribution that depends on the odor statistics. We show that the optimal ORIN enhances the performance of downstream learning tasks (reconstruction and classification). For ORNs with a finite basal activity, we find that having a basal-activity-dependent fraction of inhibitory odor-receptor interactions increases the coding capacity. All our theoretical findings are consistent with existing experiments, and predictions are made to further test our theory. The optimal coding model provides a unifying framework for understanding the peripheral olfactory systems of different organisms.

## I. INTRODUCTION

Animals rely on their olfactory systems to detect, discriminate, and interpret external odor stimuli to guide their behavior. Natural odors are typically mixtures of different odorant molecules whose concentrations can vary over several orders of magnitude [1–3]. Remarkably, animals can distinguish a large number of odorants and their mixtures by using a relatively small number of odor receptors (ORs) [4, 5]. For example, humans have only ~ 300 ORs [6, 7], and the often-cited number of odors that can be distinguished is ~ 10,000 [8]; the real number may be even larger [9] (see also [10] and [11]). Humans can also distinguish odor mixtures with up to 30 different compounds [12]. In comparison, the highly olfactory lifestyle and exquisite olfactory learning ability of the fly are afforded by only ~ 50 ORs [4, 13]. The olfactory system achieves such remarkable ability through a combinatorial code in which each odorant is sensed by multiple receptors and each receptor can be activated by many odorants [14–16]. In both mammals and insects, odorants bind to receptors in the cilia or dendrites of olfactory receptor neurons (ORNs), each of which expresses only one type of receptor. ORNs that express the same receptor then converge onto the same glomerulus in the olfactory bulb (mammals) or antennal lobe (insects), whose activity patterns contain the information about external odor stimuli [13, 17–19]. A key question that we want to address here is how ORNs best represent external olfactory information so that it can be interpreted by the brain to guide an animal’s behavior [4, 13, 20].

It has long been hypothesized that the input-output response functions of sensory neurons are “selected” by the statistics of the stimuli in the organism’s natural environment to transmit the maximum amount of information about that environment, generally known as the efficient coding hypothesis [21, 22] or the related InfoMax principle [23, 24]. For instance, the contrast-response function of interneurons in the fly’s compound eye can be well approximated by the cumulative probability distribution of contrast in the natural environment [22]. The receptive fields of neurons in the early visual pathway are thought to exploit the statistics of natural scenes [25–29]. Similar results have also been observed in the auditory system [30]. In all these cases, to achieve maximum information transmission, an “ideal” neuron should transform the input distribution into a uniform output distribution [22, 31], and a population of neurons should decorrelate their responses [28, 32, 33].

However, unlike light or sound, which can be characterized by a single quantity such as wavelength or frequency, there are a huge number of odorants, each with its own unique molecular structure and distinct physicochemical properties [34, 35]. The high dimensionality of the odor space thus poses a severe challenge for the olfactory system to encode olfactory signals. Fortunately, typical olfactory stimuli are sparse, with only a few types of odorant molecules in an odor mixture [1, 2, 36]. The sparsity of the odor mixture immediately brings to mind the powerful compressed sensing (CS) theory developed in the computer science and signal processing communities [37, 38]. CS theory shows that sparse high-dimensional signals can be encoded by a small number of sensors (measurements) through random projections, and the highly compressed signal can be reconstructed (decoded) with high fidelity by using an *L*_{1}-minimization algorithm [37–39]. However, conventional CS theory assumes that the sensors have a linear response function with an essentially infinite dynamic range [40]. In contrast, the ORN response is highly nonlinear [41, 42], with a typical dynamic range of less than 2 orders of magnitude, which is far smaller than the typical concentration range of odorants [42].

The use of CS theory has recently been explored in olfactory systems. For example, Zhang and Sharpee proposed a fast reconstruction algorithm in a simplified setup with binary ORNs and binary odor mixtures without concentration information [43]. In another work, Krishnamurthy *et al*. studied how the overall “hour-glass” (compression followed by decompression) structure of the olfactory circuit can facilitate olfactory association and learning, again with the assumption that ORN responses to odor mixtures are linear [44]. Following ideas in CS theory, Singh *et al*. recently proposed a fast olfactory decoding algorithm that might be implemented in the downstream olfactory system [45]. However, all these studies primarily focus on the downstream decoding and learning of the compressed signals by assuming a linear neuron response function. The question of how ORNs with a nonlinear input-output response function can best compress the sparse, high-dimensional odor information remains unanswered.

Another related study is the recent work by Zwicker *et al*. [46], where the authors investigated the maximum entropy coding scheme for the olfactory system by using a simplified binary response function, in which an odor only induces a response when its concentration is above a threshold that is inversely proportional to the receptor’s sensitivity to the odor. They found two conditions for the binary ORNs to maximize information transmission. The first condition is that each ORN on average responds to half of the odors, i.e., half of the odors have a concentration that is higher than the corresponding threshold; the other condition is that the responses from different ORNs need to be uncorrelated. These results were obtained by studying the average activities of the ORNs and their correlations. However, due to the limitations of the binary input-output response function and the specific prior for the sensitivity distribution used in [46], the optimal coding strategy for neurons with realistic physiological properties remains unclear.

To address these important open questions, here we study the optimal coding scheme by using a realistic ORN input-output response function where the ORN output depends on the odor concentration continuously in a nonlinear (sigmoidal with odor concentration on a logarithmic scale) form characterized by its sensitivity, or equivalently the inverse of the half-maximum response concentration. By optimizing the input-output mutual information in the full sensitivity matrix space without any prior and following general insights from compressed sensing for sparse odor mixtures, we systematically study the statistical properties of the optimal sensitivity matrix and their dependence on odor statistics.

We found that the optimal ORN sensitivity matrix is sparse, i.e., each receptor only responds to a fraction of the odorants in its environment and its sensitivity to the other odorants is *zero*. The sparsity itself is a robust (universal) feature of the optimal sensitivity matrix, and the value of the optimal sparsity depends on statistics of the odor mixture and the number of ORNs. By using a simple mean-field theory, we show that the sensitivity matrix sparsity is caused by the competition (trade-off) between enhancing multiple odor detection and avoiding odor-odor interference. Next, we demonstrate the advantages of the maximum entropy (information) coding scheme in two downstream “decoding” (learning) tasks: reconstruction and classification of odor stimuli. Finally, we generalize our theory to neurons with a finite basal activity, where we found that the optimal coding strategy is to allow co-existence of both odor-evoked inhibition and activation with the fraction of inhibitory interactions depending on the basal activity. Comparisons with existing data in different organisms are consistent with our theory, which provides a unified framework to understand olfactory coding. Possible predictions based on our theory and future directions to incorporate more biological complexities such as odor-odor and receptor-receptor correlations in our model are also discussed.

## II. RESULTS

We first describe the mathematical setup of the problem before presenting the results. An odor mixture can be represented as a vector **c** = (*c*_{1}, …, *c*_{N}), where *c*_{j} is the concentration of odorant (ligand) *j* (= 1, 2, …, *N*) and *N* is the number of all possible odorants in the environment. A typical odor mixture is sparse, with only *n* (≪ *N*) odorant molecules that have non-zero concentrations. As illustrated in Fig. 1(a) (dotted box), the odor mixture signal **c** is sensed by *M* sensors. The encoding process, which maps **c** to the ORN response vector **r** = (*r*_{1}, *r*_{2}, …, *r*_{M}), is determined by the bipartite odorant-ORN interaction network characterized by the sensitivity matrix **W**, whose elements *W*_{ij} denote the sensitivity of the *i*-th sensor (ORN) to the *j*-th odorant for all odorant-ORN pairs, with *j* = 1, 2, …, *N* and *i* = 1, 2, …, *M*. For simplicity, we used a simple competitive binding model [47, 48] [and see Supplemental Material (SM)], in which the normalized response of ORN *i* (= 1, 2, …, *M*) to odor **c** can be described by a nonlinear function of the form

$$ r_i = F_i(\mathbf{c}; \mathbf{W}) + \eta_i, \qquad F_i(\mathbf{c}; \mathbf{W}) = \frac{\sum_{j=1}^{N} W_{ij} c_j}{1 + \sum_{j=1}^{N} W_{ij} c_j}, \qquad (1) $$

where *F*_{i} is the response function, taken to be the competitive-binding form in Eq. (1), and *η*_{i} represents the noise term. For convenience, we assume *η*_{i} is a Gaussian noise with zero mean and standard deviation *σ*_{0}. Other forms of the nonlinear response function and noise can be used without affecting the general conclusions of our paper.
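As a concrete illustration, the encoding step can be sketched numerically. This is a minimal sketch assuming the competitive-binding form above; the function name and parameter values are ours, not the paper's.

```python
import numpy as np

def orn_response(c, W, sigma0=0.0, rng=None):
    """Normalized ORN responses r_i = x_i/(1+x_i) + eta_i, with
    x_i = sum_j W_ij c_j (competitive-binding form of Eq. (1))."""
    x = W @ c
    r = x / (1.0 + x)
    if sigma0 > 0.0:  # additive Gaussian noise eta_i
        rng = rng if rng is not None else np.random.default_rng(0)
        r = r + rng.normal(0.0, sigma0, size=r.shape)
    return r

# For a single odorant, the half-maximal response occurs at c = 1/W_ij,
# i.e., the sensitivity is the inverse of the half-maximum concentration.
W = np.array([[2.0]])
r_half = orn_response(np.array([0.5]), W)   # c = 1/W = 0.5 -> r = 0.5
r_high = orn_response(np.array([50.0]), W)  # deep saturation, r -> 1
```

Note how the sigmoidal saturation limits each sensor's dynamic range, which is the nonlinearity at the heart of the coding problem studied here.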

As illustrated in Fig. 1(b), the input-output response curve is highly nonlinear (sigmoidal), resulting in a finite response range for each sensor that is smaller than the concentration range of a typical odorant molecule. Therefore, to encode the full concentration range of an odorant molecule, an odorant needs to interact with multiple sensors with different sensitivities. On the other hand, given that *M* < *N*, each sensor has to sense multiple odorant molecules. These two considerations lead to the many-to-many odor-receptor interaction network characterized by the sensitivity matrix **W** = {*W*_{ij} | *i* = 1, 2, …, *M*; *j* = 1, 2, …, *N*}.

Eq. (1) maps the external odor stimulus **c** to the internal neuronal activity **r** = (*r*_{1}, *r*_{2}, …, *r*_{M}). The downstream olfactory circuits then use this response pattern to evaluate (decode) odor information (both identities and concentrations) in order to guide the animal’s behaviors. The quality of encoding of odor information by the peripheral ORNs directly sets the upper limit of how well the brain can decode the odor information [49]. In this paper, we focus on discovering the key statistical properties of the sensitivity matrix **W** that allow biologically realistic ORNs to best represent the external odor information.

A given odor environment can be generally described by a probability distribution *P*_{env}(**c**). To convey maximum information about external odor stimuli in their response patterns to the brain, ORs/ORNs can adjust their sensitivity matrix **W** to “match” the odor statistics *P*_{env}(**c**). Without any assumption on what information the brain may need, the mutual information *I*(**c**, **r**) between stimuli and the response pattern of ORNs sets the limit on how much odor information is received by the peripheral ORNs and thus serves as a good “target function” to be maximized [23, 24, 46, 50–52]. *I* is defined as

$$ I(\mathbf{c}, \mathbf{r}) = H(\mathbf{r}) - H(\mathbf{r}|\mathbf{c}), \qquad (2) $$

where *H*(**r**) and *H*(**r**|**c**) are the entropies of the output distribution *P*_{r}(**r**) and the conditional distribution *P*(**r**|**c**), respectively. In the limit of small noise, the second term is independent of **W** and negligible; hence, we will use *H*(**r**) as our target function for optimization. *H*(**r**) depends on **W** and *P*_{env}(**c**) because *P*_{r}(**r**) depends on **W** and *P*_{env}(**c**):

$$ P_r(\mathbf{r}) = \int P(\mathbf{r}|\mathbf{c}; \mathbf{W})\, P_{\rm env}(\mathbf{c})\, d\mathbf{c}. \qquad (3) $$

In this paper, we study the optimal coding strategy by maximizing the mutual information *I*, or equivalently the differential entropy *H*, with respect to the sensitivity matrix **W** for different odor mixture statistics *P*_{env}(**c**) and different numbers of ORNs. The mutual information as given in Eq. (2) can only be computed analytically for very simple cases. For more general cases, we used the covariance matrix adaptation evolution strategy (CMA-ES) algorithm to find the optimal sensitivity matrix [53, 54] (see SM for technical details).

### A. The optimal sensitivity matrix is sparse

Odor concentration varies widely in natural environments [1, 55]. To capture this property, we studied the case where the odorant concentrations in an odor mixture follow a log-normal distribution with variance $\sigma_c^2$ of the logarithmic concentration. Other broad distributions such as power-law distributions were also studied without changing the general conclusions. For simplicity, we consider the case where odorants appear independently in the mixture; more realistic considerations such as correlations among odorants will be discussed later in the Discussion section.

For given odor statistics (characterized by *N*, *n*, and *σ*_{c}) and a given number of nonlinear sensors *M*, we can compute and optimize the input-output mutual information *I*(**W** | *N*, *n*, *σ*_{c}; *M*) with respect to all the *M* × *N* elements of the sensitivity matrix **W**. We found that the optimal sensitivity matrix **W** is “sparse”: only a fraction (*ρ*_{w}) of its elements have non-zero values [sensitive, shown as the colored elements in Fig. 2(a)], and the rest are insensitive [the black elements in Fig. 2(a)], with essentially zero values of *W*_{ij}. The sparsity in the optimal **W** was not found in the previous theoretical study [46], mainly due to the oversimplified binary ORN response function used there. From the histogram of ln(*W*_{ij}) shown in Fig. 2(b), it is clear that elements of the optimal sensitivity matrix fall into two distinct populations: an insensitive population with practically zero sensitivity [note the log scale used in Fig. 2(b)] and a sensitive population with finite sensitivity. For the cases when the odor concentration follows a log-normal distribution, the distribution of the sensitive (nonzero) elements *P*_{s}(*w*) can be fitted well by a log-normal distribution, as shown in Fig. 2(b).

Our main finding here, i.e., sparsity in the odor-receptor sensitivity matrix, is supported by existing experimental measurements. As shown in Fig. 2(c), the sparsity *ρ*_{w} is estimated to be ~ 0.4 for fly larva [42] and ~ 0.1 for mouse [56]. Additionally, the broad distribution of the non-zero sensitivities observed in our model also agrees qualitatively with those estimated from experiments [Fig. 2(c), two right panels], which are slightly skewed log-normal distributions.

Besides the distribution of the individual sensitivity matrix elements, we also calculated the row (sensor)-wise and column (odorant)-wise rank-order correlation coefficients (Kendall’s tau, *τ*) and compared them with those from the same matrix but with its elements shuffled randomly. As shown in the Supplemental Material (SM), we found that both the rows and columns (Fig. S1 in SM) in the optimal matrix have a higher level of orthogonality (and thus independence) than that from random matrices. This orthogonality in the optimal ** W** matrix leads to a higher input-output mutual information than those from the shuffled matrices [see Fig. S2(a) in SM] and a nearly uniform distribution of ORN activity for different odor mixtures [see Fig. S2(b)-(d) in SM].
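Row-wise rank correlations of the kind used here can be computed directly. A minimal sketch (Kendall's tau-a, ignoring ties; the comparison against a shuffled matrix mirrors the analysis in the text):

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation (tau-a variant; assumes no ties)."""
    s = 0.0
    for i, j in combinations(range(len(x)), 2):
        s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (len(x) * (len(x) - 1) / 2)

def mean_abs_row_tau(W):
    """Mean |tau| over all row pairs; smaller = more 'orthogonal' rows."""
    return float(np.mean([abs(kendall_tau(W[a], W[b]))
                          for a, b in combinations(range(W.shape[0]), 2)]))

# Compare a matrix against a randomly shuffled copy of its own elements.
rng = np.random.default_rng(2)
W = rng.lognormal(size=(5, 30))
W_shuffled = rng.permutation(W.ravel()).reshape(W.shape)
tau_orig, tau_shuf = mean_abs_row_tau(W), mean_abs_row_tau(W_shuffled)
```

For the optimized matrices in the text, the claim is that this row-pair statistic sits *below* the shuffled baseline; the toy random matrix above merely shows how the statistic is computed.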

### B. The optimal sparsity depends on odor statistics and the number of sensors

The statistics of the optimal sensitivity matrix elements are characterized by the sparsity *ρ*_{w}, defined as the fraction of *non-zero* elements in **W**, and the distribution of the sensitive (nonzero) elements, *P*_{s}(*w*), which is further characterized by its mean (*μ*_{w}) and standard deviation (*σ*_{w}). Note that the sparsity parameter *ρ*_{w} is defined in such a way that a smaller value of *ρ*_{w} corresponds to a sparser sensitivity matrix. We investigated systematically how *ρ*_{w}, *μ*_{w}, and *σ*_{w} depend on the statistical properties of the odor mixture, characterized by *N*, *n*, and *σ*_{c}, as well as on *M*, the total number of sensors (ORNs).

We found that as the odor concentration distribution becomes broader with increasing *σ*_{c}, *ρ*_{w} increases [Fig. 3(a)]. This is expected, as more receptors with different sensitivities are required to sense a broad range of input concentrations. When we increased the odor mixture sparsity *n* or the total number of possible odors *N*, the optimal sensitivity matrix sparsity *ρ*_{w} decreased [Fig. 3(b)&(c)]. In general, as the mapping from odor space to ORN space becomes more “compressed” with larger values of *n* and/or *N*, the optimal strategy is to have each receptor respond to a smaller fraction of odorants to avoid saturation. Finally, we gradually increased the number of receptors *M* with fixed values of *N*, *n*, and *σ*_{c}. We found that *ρ*_{w} decreases, i.e., the sensitivity matrix becomes sparser as the number of sensors *M* increases [Fig. 3(d)]. This somewhat counter-intuitive result can be understood as follows: when the system has more sensors to encode signals, each sensor can respond to a smaller number of odors to avoid interference. For all the cases we studied, when the odor concentrations follow a log-normal distribution, the distribution of the non-zero sensitivities in the optimal sensitivity matrix follows roughly a log-normal distribution, with its mean *μ*_{w} and standard deviation *σ*_{w} depending on the odor statistics (*σ*_{c}, *n*, *N*) and the number of ORNs *M* (see Fig. S3 in SM).

To verify whether sparsity is a general (robust) feature of the optimal sensitivity matrix, we studied cases where the odor concentration follows different distributions, such as a symmetrized power-law distribution, *P*_{env}(*c*) ∝ exp(−*β*| ln *c*|) [see Fig. S4(a) in the SM for the comparison with the log-normal distribution], for different values of the exponent *β*. For all values of *β* studied, there is always a finite sparsity *ρ*_{w} < 1 in the optimal sensitivity matrix. As shown in Fig. 4(a), *ρ*_{w} decreases slightly when *β* increases and the odor concentration distribution becomes narrower, which is consistent with the previous cases when the odor concentration distribution is log-normal [Fig. 3(a)]. However, as shown in Fig. 4(b), the distribution of the sensitive elements, *P*_{s}(*w*), does not follow exactly a log-normal distribution [see Fig. S4(b) in SM]. In fact, *P*_{s}(*w*) is asymmetric in the ln(*w*) space with a skewness that depends on *β*, as shown in the inset of Fig. 4(b).
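Sampling from this symmetrized power-law family is straightforward if, as we assume here, the density is taken with respect to ln *c* (consistent with the symmetry of the distribution on a logarithmic concentration axis): then ln *c* follows a Laplace distribution with scale 1/*β*. A minimal sketch:

```python
import numpy as np

def sample_symmetric_powerlaw(beta, size, seed=0):
    """Concentrations with P(ln c) ∝ exp(-beta*|ln c|):
    ln c ~ Laplace(0, 1/beta), so c = exp(Laplace sample).
    (Assumes the density is specified in ln c -- our reading.)"""
    rng = np.random.default_rng(seed)
    return np.exp(rng.laplace(loc=0.0, scale=1.0 / beta, size=size))

c = sample_symmetric_powerlaw(beta=2.0, size=200_000)
mean_abs_log = np.abs(np.log(c)).mean()   # should approach 1/beta = 0.5
```

Larger *β* concentrates ln *c* near zero, i.e., a narrower concentration distribution, matching the trend described above.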

Taken together, our results suggest that sparsity in the sensitivity matrix is a robust feature of nonlinear compressed sensing problems. This theoretical finding is supported by, and explains, existing experiments in olfactory systems [42, 56]. Our study also showed that the nonzero sensitivities follow a broad distribution whose exact shape, mean, and variance depend on the odor statistics and the total number of ORNs.

### C. The origin of sparsity in the optimal sensitivity matrix

Given the constraint that the number of sensors is much smaller than the possible number of odorants, i.e., *M* ≪ *N*, each sensor needs to respond to (sense) multiple types of odorant molecules so that every odorant can be sensed by at least one sensor. However, in an odor mixture with a few types of odorant molecules, two or more odorants in the mixture can bind to the same sensor and interfere with each other, e.g., by saturating the nonlinear sensor. The probability of interference increases with *ρ*_{w}, i.e., as the sensitivity matrix becomes denser. This trade-off between sensing multiple odorants and the possible interference determines the sparsity of the optimal sensitivity matrix. We demonstrate this trade-off and its effect more rigorously by developing a mean-field theory (MFT), as described below.

We begin with the simplest case, where many receptors sense only one odorant (*N* = 1, *M* ≫ 1) and obviously there is no interference. As first proposed by Laughlin [22], the optimal coding scheme is for the *M* receptors to distribute their sensitivities according to the input concentration distribution so that the output distribution is uniform. For the case when the distribution of the odorant concentration is log-normal with a standard deviation *σ*_{c}, the optimal sensitivity distribution *P*_{1}(*w*) that maximizes *H*(**r**) is also approximately a log-normal distribution:

$$ P_1(w) = \frac{1}{\sqrt{2\pi}\,\sigma_w w} \exp\!\left[-\frac{(\ln w - \mu_w)^2}{2\sigma_w^2}\right], \qquad (4) $$

where the mean *μ*_{w} = 0 and the variance $\sigma_w^2$ increases with the variance of the logarithmic concentration distribution. More importantly, we show analytically that in general the coding capacity *I*_{1} increases logarithmically with the number of receptors *M* when *M* ≫ 1 (see SM for details), which is verified by simulation results as shown in Fig. 5(a):

$$ I_1(M) \approx a \ln M + b, \qquad M \gg 1, \qquad (5) $$

with constants *a* and *b* that depend on the odor statistics and noise level.

This means that the sparsity *ρ*_{w} = 1, i.e., all sensitivities should be nonzero, because there is no interference when only one type of odorant molecule (*n* = 1) is present in the mixture. However, it is important to note that the maximum mutual information only increases weakly (logarithmically) for large *M*.
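Laughlin's matching condition for the single-odorant case can be checked numerically: placing each sensor's half-maximum concentration at a quantile of the input distribution (so that *w*_{k} = 1/*c*_{quantile}) makes the array-averaged response track the input CDF, giving an average response near 1/2. A sketch with illustrative parameter values (our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
M, sigma_c = 32, 2.0

# Empirical quantiles of the log-normal concentration distribution.
c_pool = rng.lognormal(0.0, sigma_c, size=100_000)
q = (np.arange(M) + 0.5) / M
w = 1.0 / np.quantile(c_pool, q)     # half-max placed at each input quantile

def array_mean_response(c):
    """Mean response of the M-sensor array to a single odorant at conc. c."""
    r = w * c / (1.0 + w * c)
    return r.mean()

# Averaged over inputs, the array-mean response should sit near 1/2,
# the signature of an approximately uniform (equalized) output distribution.
mean_r = np.mean([array_mean_response(c)
                  for c in rng.lognormal(0.0, sigma_c, size=5000)])
```

This is the *N* = 1 building block the mean-field argument below reuses when interference between odorants is switched on.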

We next consider the case where two odorants are sensed by multiple receptors (*N* = 2, *M* ≫ 1). Let us denote the number of receptors that respond to each odorant as *m* (*m* ≤ *M*), so the sparsity is *ρ*_{w} = *m*/*M*. If each odorant were sensed by a disjoint set of receptors, the total differential entropy would simply be double the amount for a single odorant: *I*_{2}(*m*) = 2*I*_{1}(*m*). However, there is a finite probability *p* = *m*/*M* = *ρ*_{w} that a given receptor in one set also responds to the other odorant. Therefore, on average there are *m* × *p* = *m*^{2}/*M* receptors whose output is “corrupted” due to interference between the two different odorants in a given mixture. We can write down the differential entropy as

$$ I_2(m) \approx 2 I_1(m) - \frac{m^2}{M}\,\Delta I, \qquad (6) $$

where *I*_{1}(*m*) is the maximum differential entropy for one odor [Eq. (5)] and Δ*I* is the marginal loss of information (entropy loss), which can be approximated by Δ*I* ≈ *α*[*I*_{1}(*m* + 1) − *I*_{1}(*m*)] ≈ *α∂I*_{1}(*m*)/*∂m*, where *α* ≤ 1 is the average fraction of information loss for a “corrupted” sensor. We can then obtain the optimal value of *m* by maximizing *I*_{2}(*m*) with respect to *m*. For *m* ≪ *M*, the interference effect is small, so *I*_{2}(*m*) ≈ 2*I*_{1}(*m*), which increases with *m* logarithmically according to Eq. (5). As *m* increases, the interference effect given by the second term on the RHS of Eq. (6) grows with *m* faster than the slow logarithmic growth of 2*I*_{1}(*m*). This leads to a peak of *I*_{2}(*m*) at an optimal value *m* = *m** < *M*, or a sparsity of the sensitivity matrix *ρ*_{w} = *m**/*M* < 1 [Fig. 5(b)].
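The trade-off in Eq. (6) can be visualized with a toy calculation. For illustration only, we take *I*_{1}(*m*) = *a* ln *m* and replace the marginal loss *α∂I*_{1}/*∂m* by a fixed loss λ per corrupted sensor, so that *I*_{2}(*m*) = 2*a* ln *m* − (*m*²/*M*)λ has an interior maximum at *m** = √(*aM*/λ); the constants are arbitrary, not fitted to the paper:

```python
import numpy as np

# Toy version of the Eq.-(6) trade-off: logarithmic gain from more sensors
# per odorant vs. a quadratic interference penalty. For illustration we use
# a *fixed* loss lam per corrupted sensor (the text uses the marginal loss).
a, M, lam = 1.0, 100, 0.05
m = np.arange(1, M + 1, dtype=float)
I2 = 2.0 * a * np.log(m) - (m**2 / M) * lam

m_star = int(m[np.argmax(I2)])
m_star_analytic = np.sqrt(a * M / lam)   # dI2/dm = 2a/m - 2m*lam/M = 0
rho_w = m_star / M                        # optimal sparsity < 1
```

The numerical argmax matches the analytic optimum, and the resulting ρ_w is strictly less than 1, illustrating why the competition between detection and interference produces a sparse optimum.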

In the MFT, we compute the olfactory coding capacity and interference by ignoring the weak rank-order correlation in the optimal sensitivity matrix and assuming that the optimal sensitivity matrix elements are *i.i.d*. In particular, we used the following approximation for the distribution of the sensitivity matrix **W**:

$$ P(W_{ij}) = (1 - \rho_w)\,\delta(W_{ij}) + \rho_w\, P_s(W_{ij}), \qquad (7) $$

where *ρ*_{w} is the matrix sparsity and *P*_{s}(*W*_{ij}) is a smooth distribution function, which is approximated here as a log-normal distribution with mean *μ*_{w} and standard deviation *σ*_{w} as given in Eq. (4). The mean differential entropy of the ORN response pattern, averaged over the distribution of the sensitivity matrix **W**, can be maximized with respect to the parameters *ρ*_{w}, *μ*_{w}, and *σ*_{w} (see SM for details). The resulting optimal parameters agree qualitatively with our direct numerical simulations, with a sparsity *ρ*_{w} < 1 that increases with the width of the input distribution *σ*_{c} (see Fig. S5 in SM).

### D. The optimal sparse sensitivity matrix enhances downstream decoding performance

The response patterns of ORNs form the internal representation of external odor stimuli that the higher (downstream) regions of the brain can use to infer odor information for controlling the organism’s behavior. In the previous sections, we focused on the statistical properties of the optimal sensitivity matrix **W** that maximizes the mutual information between odor input and ORN output. In this section, we test whether the optimal sensitivity matrix can also enhance downstream decoding performance by examining two specific learning tasks: classification and reconstruction.

#### Task I: classification

The goal of the classification task is to infer the category of an odor mixture, such as its valence, by training with similar odor stimuli. Classification is believed to be carried out by the *Drosophila* olfactory circuit, which we describe briefly here. After odor signals are sensed by ~ 50 ORNs, they are relayed by the projection neurons (PNs) in the antennal lobe to a much larger number of Kenyon cells (KCs) in the mushroom body (MB), as illustrated in Fig. 6(a). Each of the ~ 2000 KCs in the MB on average receives input from ~ 7 randomly selected PNs [57]. A single GABAergic neuron (APL) on each side of the brain can be activated by the KCs and inhibits the KCs globally [58]. Such random expansion and global inhibition enable a sparse and decorrelated representation of odor information in the MB [57, 59, 60]. The large number of KCs in the MB then converge onto a few (only 34) mushroom body output neurons (MBONs) [61], which project to other brain regions and drive attractive or repulsive behavior [62]. Olfactory learning is mainly mediated by the dopaminergic neurons (DANs), which control the synaptic weights between KCs and MBONs [63].

To mimic the properties of the MB, our model “classifier” network, as illustrated in Fig. 6(b), contains a high-dimensional mixed layer (KCs). For simplicity, we consider a single readout neuron. Each KC unit in the mixed layer pools the ORNs with a fixed random, sparse matrix. Only the synaptic weights from the KCs to the readout neuron are plastic. To account for the variability of natural odors, we assumed that odor stimuli fall into clusters whose centers represent the corresponding typical odor stimuli, with members of a given cluster being variations of the centroid [64]. The radius of a cluster, Δ*S*, characterizes the variability of a specific odor mixture. Centroids were drawn from *P*_{env}(**c**) with randomly assigned labels (attractive or aversive). Members inside each cluster were generated by adding noise of size Δ*S*, which results in clouds of points in the odor space [Fig. 6(c)], with each cloud having a randomly assigned label (see SM for details).

The synaptic weights from the KCs to the readout neuron are trained by using a simple linear discriminant analysis (LDA) method, although other linear classification algorithms such as the support vector machine (SVM) would also work. After training, the performance of the “classifier” is quantified by the accuracy of classification on the testing dataset.

To test the effects of different coding schemes on the classification performance, we varied the distribution of the sensitivity matrix elements by changing the sparsity *ρ*_{w} without changing the distribution of the non-zero sensitivity matrix elements (e.g., the log-normal distribution with fixed mean and variance). The output of the coding process **r**(**c**, **W**) serves as the input for the “classifier” network, and the classification error is computed for different values of *ρ*_{w}. As shown in Fig. 6(d), we find that the best performance is achieved near *ρ*_{w} = 0.6, which belongs to the range of *ρ*_{w} with large mutual information between the odor input and the ORN/PN response [shaded region in Fig. 6(d)]. Changing parameters such as *M*, *n*, and the number of categories gives similar results (see Fig. S6 in SM). In line with recent studies showing that sparse high-dimensional representations facilitate downstream classification [64, 65], our results suggest that maximum entropy coding at the ORN/PN level may enhance classification by retaining maximum odor mixture information in a form that can be decoded by the KCs through random expansion.
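The classification pipeline above (sparse encoder → random expansion with global inhibition → linear readout) can be sketched end to end. Everything below is an illustrative toy: the sizes, the cluster jitter model, the top-20%-active KC threshold standing in for APL-like global inhibition, and the regularized LDA readout are our choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, K = 50, 10, 400                    # odorants, ORNs, KC-like units

# --- clustered odor stimuli: sparse log-normal centroids plus jitter dS
n_clusters, per_cluster, dS = 6, 40, 0.1
centroids = rng.lognormal(0.0, 2.0, (n_clusters, N)) \
            * (rng.random((n_clusters, N)) < 0.1)
labels_c = np.array([0, 1, 0, 1, 0, 1])  # attractive / aversive
C = np.repeat(centroids, per_cluster, axis=0)
C = C * np.exp(dS * rng.normal(size=C.shape))     # multiplicative jitter
y = np.repeat(labels_c, per_cluster)

# --- encoding: sparse sensitivity matrix + competitive-binding nonlinearity
rho_w = 0.6
W = np.where(rng.random((M, N)) < rho_w, rng.lognormal(0.0, 1.0, (M, N)), 0.0)
X = C @ W.T
R = X / (1.0 + X)

# --- fixed random sparse expansion (each KC pools ~7 ORNs), then a global
#     threshold keeps only the top ~20% of KCs active per odor
J = (rng.random((K, M)) < 7.0 / M).astype(float)
A = J @ R.T                                        # (K, samples)
H = np.maximum(A - np.quantile(A, 0.8, axis=0), 0.0).T

# --- regularized LDA readout (the plastic KC-to-readout weights)
mu0, mu1 = H[y == 0].mean(axis=0), H[y == 1].mean(axis=0)
Sw = np.cov(H[y == 0].T) + np.cov(H[y == 1].T) + 1e-3 * np.eye(K)
wv = np.linalg.solve(Sw, mu1 - mu0)
thr = 0.5 * (wv @ mu0 + wv @ mu1)
acc = ((H @ wv > thr).astype(int) == y).mean()
```

Sweeping ρ_w in this sketch and re-measuring `acc` is the toy analogue of the experiment summarized in Fig. 6(d).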

#### Task II: reconstruction

The goal of the reconstruction task is to infer (decode) both the composition and the exact concentrations of all odorant components in an odor mixture from the sensor responses. This more stringent task is motivated directly by the original compressed sensing problem in computer science; its relevance to olfactory systems will be discussed later in the Discussion section.

As illustrated in Fig. 7(a), the output of the coding process **r**(**c**, **W**) serves as the input for the downstream reconstruction network. Here, we used a generic feedforward artificial neural network (ANN) with a few (1–5) hidden layers and an output layer that has the same dimension *N* as the odor space. We trained the ANN with a training set of sparse odor mixtures drawn from the odor distribution *P*_{env}(**c**), and tested its performance by using new odor mixtures randomly drawn from the same distribution. Denote the reconstructed odor vector as **ĉ** and let **ξ** be the binary identity vector associated with **c**, i.e., *ξ*_{i} = 1 if *c*_{i} ≠ 0 and *ξ*_{i} = 0 otherwise. Due to the sparse nature of the odor mixture and the wide concentration range, the reconstruction error 𝓛 is defined as the sum of an “identity error” 𝓛_{1} and an “intensity error” 𝓛_{2}, 𝓛 = 𝓛_{1} + 𝓛_{2}, where 𝓛_{1} penalizes errors in the inferred identity vector and 𝓛_{2} penalizes errors in the concentrations of the identified components, with the parameters of the error terms determined from training (supervised learning).

The reconstruction error depends on the coding matrix **W**, in particular its sparsity *ρ*_{w}, as shown in Fig. 7(b). Pair-wise comparisons of non-zero concentrations in the original and reconstructed odor mixtures for three different coding regimes are shown in Fig. 7(c) (see Fig. S7 in SM for a direct comparison of the whole reconstructed and original odor vectors). The best performance is achieved around *ρ*_{w} = 0.6, within the region where sparse **W** enables nearly maximum entropy coding (shaded region); this property is insensitive to the number of hidden layers in the reconstruction network (see Fig. S8 in SM). Our results show that the optimal entropy code provides an efficient representation of the high-dimensional sparse data so that downstream machine learning algorithms can achieve high reconstruction accuracy.
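The explicit forms of 𝓛₁ and 𝓛₂ are not recoverable from this text, so the sketch below is only one plausible instantiation of the two-part error: a count of identity mistakes plus a squared log-concentration error on the correctly identified components, with a hypothetical detection threshold `theta` standing in for the trained one.

```python
import numpy as np

def recon_error(c_true, c_hat, theta=1e-3):
    """Two-part reconstruction error (hypothetical instantiation):
    L1 counts mismatches between the true and inferred identity vectors;
    L2 sums squared log-concentration errors over components identified
    in both. `theta` is a stand-in for a trained detection threshold."""
    xi_true = c_true > 0
    xi_hat = c_hat > theta
    L1 = np.sum(xi_true != xi_hat)
    both = xi_true & xi_hat
    L2 = np.sum((np.log(c_true[both]) - np.log(c_hat[both])) ** 2)
    return L1 + L2, L1, L2

c_true = np.array([0.0, 2.0, 0.0, 0.5])
c_hat = np.array([0.0, 2.0, 0.1, 0.5])   # one spurious component detected
L, L1, L2 = recon_error(c_true, c_hat)
```

Working in log-concentration for the intensity term reflects the wide dynamic range of odor concentrations emphasized throughout the text.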

### E. Optimal coding strategy for ORNs with a finite basal activity

So far, we have only considered the case where the neuron activity is zero in the absence of stimulus and odorants only activate the ORs/ORNs. However, it has been widely observed that some ORNs show substantial spontaneous activity, and some odorants can act as inhibitors that suppress the activities of the neurons they bind to [41, 66, 67], as shown in Fig. 8(a). The presence of an inhibitory odorant can shift a receptor’s dose-response curve to an excitatory odorant, thereby diminishing the sensitivity of the receptor to excitatory odorants [68]. It is then natural to ask what the optimal design of the sensitivity matrix is for maximizing coding capacity when odorants can be either excitatory or inhibitory. To answer this question, we used a two-state model to characterize both odor-evoked excitation and inhibition [68]. Now the interaction between odorant *j* and ORN *i* has two possibilities: it can be either excitatory with a sensitivity *W*^{A}_{ij} or inhibitory with a sensitivity *W*^{I}_{ij}. The normalized response of the *i*-th ORN to odor mixture **c** is

$$ r_i = \left[1 + \gamma\,\frac{1 + \sum_{j=1}^{n_I} W^{I}_{ij} c_j}{1 + \sum_{j=1}^{n_A} W^{A}_{ij} c_j}\right]^{-1} + \eta_i, $$

where *γ* determines the basal activity by *r*_{0} = 1/(1 + *γ*), *n*_{A} and *n*_{I} are the numbers of excitatory and inhibitory odorants for the *i*-th receptor, and *η*_{i} is a small Gaussian white noise.
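A minimal numerical sketch of this two-state response, assuming an MWC-like ratio form consistent with the stated basal-activity relation *r*_{0} = 1/(1 + *γ*) (the exact functional form should be taken from [68]):

```python
import numpy as np

def orn_response(c, W_A, W_I, gamma=1.0, noise_std=0.0, rng=None):
    """Normalized responses of M ORNs to an odor mixture c (sketch).

    W_A, W_I: M x N excitatory / inhibitory sensitivity matrices.
    With no odor (c = 0) the response equals the basal activity
    r0 = 1 / (1 + gamma); excitatory binding pushes r above r0,
    inhibitory binding pushes it below.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    act = 1.0 + W_A @ c  # excitatory occupancy factor
    inh = 1.0 + W_I @ c  # inhibitory occupancy factor
    r = 1.0 / (1.0 + gamma * inh / act)
    if noise_std > 0.0:
        r = r + rng.normal(0.0, noise_std, size=r.shape)
    return r
```

Note the bi-directionality this form provides: a purely excitatory mixture drives *r* toward 1, a purely inhibitory one drives it toward 0, and both directions carry information once *r*_{0} > 0.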

Our simulations show that with a finite spontaneous activity, the receptor array achieves maximum entropy coding by assigning a certain number of inhibitory interactions in the sensitivity matrix [Fig. 8(b)]. The strengths (sensitivities) of both the excitatory and inhibitory elements follow (approximately) log-normal distributions [Fig. 8(c)]. The fraction of inhibitory interactions (*ρ*_{i}) in the optimal **W** is roughly proportional to the spontaneous activity of the ORN, *r*_{0}, with only a slight deviation when *r*_{0} → 0 and *r*_{0} → 1 [Fig. 8(d), upper panel]. Interestingly, as *r*_{0} → 0, *ρ*_{i} approaches a finite value that is related to the fraction of zero-sensitivity elements (1 − *ρ*_{w}) we studied in the previous sections for ORNs without spontaneous activity. As the basal activity increases, the coding capacity increases rapidly at first and quickly plateaus around *r*_{0} = 0.3 [Fig. 8(d), lower panel]. The increase of coding capacity can be understood intuitively by noting that the effective dynamic range of receptors increases in the presence of inhibition: odor-evoked inhibition enables receptors to work bi-directionally and avoid saturation when responding to many odorants simultaneously.

To verify our theoretical results, we analyzed the statistics of the sensitivities for the excitatory and inhibitory interactions obtained from the experimental data in fly by Hallem and Carlson [41] as well as in mosquito by Carey *et al.* [69]. As shown in Fig. 8(e) for the fly data, both the excitatory and inhibitory sensitivities follow log-normal distributions, consistent with our model results shown in Fig. 8(c). The mosquito data show very similar results (see Fig. S9 in SM). Our theory also showed that the fraction of inhibitory interactions *ρ*_{i} increases with the basal activity *r*_{0}, as shown in Fig. 8(d, upper panel). We tested this theoretical result against the experimental data. As shown in Fig. 8(f), the number of inhibitory odor-ORN interactions for an ORN shows a strong positive correlation with its basal activity for both fly and mosquito, in agreement with our theoretical prediction. Finally, we note that the relative basal activity 〈*r*_{0}〉 from the experimental data [41] is smaller than 0.16 (see SM for detailed analysis), which falls in the regime where the differential entropy rises sharply with *r*_{0}, as highlighted by the shaded region in Fig. 8(d, lower panel). Although an even higher spontaneous activity towards *r*_{0} = 0.5 can further increase the coding capacity, the gain is diminishing, while the metabolic cost of maintaining the spontaneous activity increases drastically [70]. Thus, an optimal basal activity would be expected in the shaded region of Fig. 8(d) due to the tradeoff between coding capacity and energy cost.
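The correlation analysis described above can be sketched as follows. The signed-matrix data format is an assumption for illustration (negative entries marking inhibitory odor-ORN interactions), not the format of the published datasets:

```python
import numpy as np

def inhibition_vs_basal(W_signed, r0):
    """Correlate each ORN's number of inhibitory interactions with its
    basal activity (sketch of the analysis behind Fig. 8(f)).

    W_signed: M x N matrix of signed interaction strengths; negative
        entries denote inhibitory interactions (illustrative convention).
    r0: length-M vector of basal activities.
    Returns the Pearson correlation coefficient.
    """
    n_inhibitory = (W_signed < 0).sum(axis=1).astype(float)
    return float(np.corrcoef(n_inhibitory, r0)[0, 1])
```

A strongly positive return value would correspond to the positive correlation reported for fly and mosquito data.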

## SUMMARY AND DISCUSSIONS

In this paper, we studied how a relatively small number of nonlinear sensors (ORNs) with a limited dynamic range can optimize the transmission of high-dimensional but sparse information in the environment. We found that the optimal sensitivity matrix elements follow a bi-modal distribution. For neurons without a basal activity, the sensitivity matrix is sparse: a neuron responds only to a fraction *ρ*_{w} (< 1) of odorants, with its sensitivities to these odorants following a broad distribution, and it is insensitive to the remaining (1 − *ρ*_{w}) fraction of the odorants. This sparsity in the odor-ORN sensitivity matrix is a direct consequence of the finite dynamic range of realistic nonlinear ORNs, which differ from the linear sensors in the conventional compressed sensing problem. For neurons with a finite basal activity *r*_{0}, the optimal sensitivity distribution is also bi-modal, with a fraction *ρ*_{i} of the odor-neuron interactions inhibitory and the remaining (1 − *ρ*_{i}) fraction excitatory, where *ρ*_{i} increases with *r*_{0}. Details of the odor-receptor sensitivity distribution depend on the odor mixture statistics and the sensor characteristics, but the bi-modal distribution is robust. By investigating the effects of different coding schemes on downstream decoding/learning tasks, we showed that the maximum entropy code (representation) of the external signal enhances the performance of downstream reconstruction and classification tasks.
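As a concrete illustration of such a bi-modal (zero plus broadly distributed) sensitivity matrix, here is a sampler sketch; the log-normal parameters below are illustrative, not fitted values from our optimization:

```python
import numpy as np

def sample_sensitivity_matrix(M, N, rho_w, mu=0.0, sigma=2.0, seed=0):
    """Sample an M x N sparse sensitivity matrix: each entry is zero
    with probability 1 - rho_w, and non-zero entries are drawn from a
    log-normal distribution (broad, spanning orders of magnitude).
    mu, sigma are illustrative parameters, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((M, N)) < rho_w  # which odor-ORN pairs interact
    W = np.zeros((M, N))
    W[mask] = rng.lognormal(mu, sigma, size=int(mask.sum()))
    return W
```

The point-mass at zero gives the sparsity (1 − *ρ*_{w}), while the log-normal branch reproduces the broad non-zero sensitivity distribution seen in the optimal **W**.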

### Connection to experiments and testable predictions

Our primary finding, the sparsity in the odor-receptor sensitivity matrix **W**, seems to be consistent with existing experimental measurements of receptor-odor sensitivity matrices in different organisms [Fig. 2(c)]. Although the natural odor environment varies for different organisms, it is interesting to see that the broad distribution of the non-zero sensitivities observed in our model is consistent with the sensitivity matrices estimated from experiments in fly larva, mouse, adult fly, and mosquito [Fig. 2(c) and Fig. 8(e)&(f)]. The optimal coding strategy, if it exists, would be the result of evolution. Thus, our theory may be tested by comparing olfactory systems in different species. In particular, our theory predicts that the sparsity parameter *ρ*_{w} decreases with the number of ORNs *M* [Fig. 3(d)], which can be tested by measuring the sparsity of the odor-receptor sensitivity matrices in different organisms.

The relatively high level of spontaneous activity in ORNs has long been thought to play a role only in the formation of the topographic map during development [71]. A recent study shows that odor-evoked inhibition can encode olfactory information that drives the behavior of the fly [68]. Our results provide a quantitative explanation for the advantage of having a certain level of spontaneous basal activity and odor-evoked inhibition in odor coding. For neurons with a finite basal activity, our theory predicts that the fraction of odorants that inhibit a neuron increases with the basal activity of the neuron. The data from adult fly and mosquito are consistent with this prediction [Fig. 8(f)]. However, powerful high-throughput techniques such as calcium imaging, which only indirectly measure the odor-ORN interaction, seem to be incapable of detecting inhibitory interactions [42]. Therefore, more large-scale direct measurements using electrophysiological methods, such as those done for *Drosophila* [41, 72] and mosquito [69], should be carried out to test our predictions in different organisms.

By considering how the coding capacity of ORNs changes with basal activity [Fig. 8(d)] and the associated extra energy cost [70], one can hypothesize the existence of an “optimal” *r*_{0}. Our result suggests that as the number of sensors increases, the benefit of having basal activity diminishes, hence, the “optimal” *r*_{0} should decrease as the number of sensory neurons increases. Indeed, this is consistent with the fact that *E. coli* has 5 chemoreceptors [73] which work bi-directionally with a high basal activity *r*_{0} ≈ 1/3 − 1/2 [74], and *r*_{0} in mouse is smaller than that in fly [67]. Of course, more experiments across different organisms with different numbers of sensory neurons are needed to test this hypothesis.

### Future Directions

In this study, we assumed that odor information is contained in the instantaneous spiking rate of ORNs, and did not consider adaptation dynamics. Although adaptation plays an important role in all sensory systems [75], it occurs on a slower time scale than that required for animals to detect and respond to odor stimuli [76, 77]. In general, sensory adaptation shifts the response function of a sensory neuron according to the background stimulus concentration, which leads to a larger but still finite effective dynamic range without changing the qualitative characteristics of the input-output response curve [75, 78]. Therefore, even though ORN-level adaptation can further increase coding capacity on a slightly longer time scale, as shown recently by Kadakia and Emonet [79], we do not expect it to qualitatively affect the optimal coding strategy found here. It remains an interesting question how neuronal dynamics such as adaptation can be used for coding time-dependent odor signals.

We have used reconstruction and classification as two learning tasks to demonstrate the advantage of having maximum entropy coding at the ORN level. While the classification task has clear biological relevance, it is unclear to what extent animals need to infer the concentrations of individual odorants in an odor mixture. Odor perception has been thought to be synthetic, i.e., an odorant mixture is perceived as a unitary odor [16]. Nevertheless, the performance of the reconstruction task indicates that most of the information about the odor mixture, including the identities and concentrations of individual odorants in a sparse mixture, can potentially be extracted from the activity pattern of ORNs, which is consistent with the experimental finding that trained mice can detect a target odorant in odor mixtures with up to 16 different odorants [80]. In this work, we focused only on the optimal coding strategy for the peripheral ORNs. In the fly olfactory system, odorants that elicit very similar ORN response patterns can be represented by very distinct patterns of KCs [41, 60]. It remains an interesting open question whether and how the architecture of the ORN/PN-to-KC network optimizes odor information transmission to enhance the precision of downstream learning and decision-making.

In conventional compressed sensing theory with linear sensors, a random measurement matrix enables accurate reconstruction of sparse high dimensional input signals [37, 40]. By using prior information about the input, a better sensory matrix can be designed [81, 82]. In many cases, the optimal matrix maximizes the entropy of compressed representation [83]. Unlike the linear CS problem where the measurement matrix is known and can be used directly for reconstructing the sparse input signal by using the *L*_{1}-minimization algorithm, reconstruction in the nonlinear CS problem studied here has to be done by learning without prior knowledge of the sensitivity matrix. Despite this difference, our results suggest that with nonlinear sensors, the sparse optimal sensory matrix that maximizes information transmission enables better learning and more accurate reconstruction. This general observation and the limit of reconstruction in nonlinear CS should be examined with more rigorous analysis and larger numerical simulations.
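For comparison, the linear-CS baseline can be sketched in a few lines: a random Gaussian measurement matrix plus L1-regularized regression, using scikit-learn's `Lasso` as a stand-in for exact *L*_{1}-minimization. The dimensions and regularization strength below are illustrative choices, not parameters from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Recover a sparse signal c (N = 100, k = 5 non-zero entries) from
# M = 40 noiseless linear measurements r = A @ c.
rng = np.random.default_rng(1)
N, M, k = 100, 40, 5
A = rng.normal(size=(M, N)) / np.sqrt(M)  # random measurement matrix
c = np.zeros(N)
c[rng.choice(N, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
r = A @ c

# L1-regularized regression exploits sparsity to invert the
# underdetermined (M < N) linear system.
c_hat = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100000).fit(A, r).coef_
```

The key difference from our setting is that here the decoder knows `A` exactly and the measurements are linear; with saturating nonlinear sensors, reconstruction instead has to be learned from examples.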

Finally, in our study, we considered the simplest case where odorants appear independently in odor mixtures. However, even in this simplest case, we found a weak but statistically significant “orthogonal” structure in the optimal sensitivity matrix [Fig. 2(c) and Fig. S1 in SM]. In naturally occurring odor mixtures, co-occurrence of odorants in different odor sources is common. For example, odorants that are products of the same biochemical pathway, e.g., fermentation, are likely to appear together [2, 84]. Although odorant-evoked ORN response patterns are not simply determined by molecular structure, some very similar odorants do trigger similar ORN response patterns [41]. In addition, ORNs and their responses to different odorants can be correlated due to structural similarities in their receptor proteins. It would be interesting to explore in a future study how such correlations among ORNs and odorant molecules, as well as co-occurrences among different odorants in odor mixtures, affect the optimal coding strategy at the olfactory periphery.

## ACKNOWLEDGMENTS

We thank Xiaojing Yang, Guangwei Si, Jingxiang Shen, Louis Tao, and Roger Traub for helpful discussions and comments. The work was supported by the Chinese Ministry of Science and Technology (Grant No. 2015CB910300) and the National Natural Science Foundation of China (Grant No. 91430217). The work by YT is partially supported by an NIH grant (R01-GM081747).

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].