## Abstract

The circuits of olfactory signaling are reminiscent of complex computational devices. The olfactory receptor code, which represents the responses of receptors elicited by olfactory stimuli, is effectively an input code for the neural computation of odor sensing. Here, analyzing a recent dataset of the responses of human olfactory receptors (ORs) to odorants, we show that the space of human olfactory receptor codes is partitioned into a modular structure where groups of receptors are “labeled” for key olfactory features. The existence of such independent and sizable receptor groups implies a significant dimensional reduction in the space of human odor perception. Our data-driven statistical analysis of receptor codes leads to a valuable discovery that human olfaction works by hybridizing both the combinatorial coding and labeled line strategies, even at the early stage of signal processing.

## INTRODUCTION

Olfaction is a sensory process that captures environmental signals by detecting molecular stimuli with a repertoire of olfactory receptors (ORs). Although the full processing of olfactory information is realized in complicated neural circuits [1], the first step of olfactory sensing involves a selective binding of odorants to their cognate ORs, which is biochemical in nature [2]. The binding elicits an array of downstream responses in the corresponding olfactory receptor neurons (ORNs) [3]. The response pattern encoded into ORs or ORNs, termed the olfactory *receptor code* [4], provides the first neural representation of an odor and is essential in the early stage of information processing in olfactory sensing.

Lying at a crucial juncture in the flow of olfactory information, the receptor code links the molecular and neural spaces. An olfactory stimulus in the *molecular space*, representing the physiochemical properties of the odorants, is translated into the *receptor code space* to form an “input code”. The information is then processed through the higher-order *neural spaces*, and eventually evokes the sense of smell that constitutes the *perceptual odor space* (see Fig. 1). Recent insights into the processing of olfactory information were gained mostly from the olfactory sensing of non-human species [5–7] or through theoretical studies [8, 9]. A separate branch of research attempts to relate the molecular space directly to the perceptual odor space [10, 11]. However, the goal of understanding the principles of olfactory computation in humans faces a fundamental difficulty due to our limited access to the neural circuitry in the human brain.

Here, we show that critical progress can still be made by a systematic analysis of the receptor code space. Whereas the OR repertoire is known to encode the input signal from odorants like a piano keyboard that combinatorially produces a variety of musical chords [4], it may be further “formatted” to facilitate the processing of relevant information, perhaps by appropriately reflecting the structure of the perceptual odor space. Specifically, we analyze a set of receptor codes for different odorants, extracted from the measurement of downstream biochemical responses of the human ORs [12–14]. Knowing that the receptor code is implemented by the many-to-many pairwise interaction between odorants and the ORs [4, 15], we treat the interaction of each odorant-receptor pair as either “on” (responding) or “off” (not responding), and focus on the binary pattern of pairwise interaction as a zeroth-order representation of the receptor code. We reveal the intrinsic structure of the human receptor code space through an analysis of the similarity patterns, which involves quantifying the extent of overlap between distinct receptor codes (receptor code redundancy).

## RESULTS

We present a quantitative analysis of the pattern of odorant-receptor interactions, employing the dataset that reports the responses of 303 receptors against 89 odorants [14]. The state of all *N* = 303 receptors can be represented in terms of an *N*-dimensional binary vector **y**, where *y*_{i} = 1 if the *i*-th receptor is “on”, and *y*_{i} = 0 if it is “off”. For a given odor **x**, the corresponding *receptor code* **y** is the representation of the odor **x** in the *N*-dimensional binary receptor code space.

We collected all 535 pairwise interactions reported in the dataset, including 60 de-activations [16], and visualized the receptor codes in the form of an interaction network (Fig. 2a). In this interaction network, each node is either an odorant or a receptor, and each edge connects an interacting odorant-receptor pair. For a richer display of the interaction network with varying parameters, see Fig. S1.

### Interaction network reveals odorant hubs with non-redundant receptor codes

In the odorant-OR interaction network (Fig. 2a), the number of edges attached to an odorant node indicates the number of receptors that recognize this odorant. We call this number the *degree* of the odorant node, and denote it by *k*_{λ}, where λ is the index for the odorant (also see Methods).

We observe two properties from the statistics of single-odorant degrees. First, the receptor-space representation of single odorants is sparse overall: on average, an odorant is recognized by only 6 (〈*k*〉 ≃ 6) out of *N* = 303 receptors, which amounts to about 2% of the receptor space. The sparsity observed in the network is consistent with previous reports from the ORN responses [17, 18]. Second, the degrees of the odorants in the dataset are non-uniformly distributed with a heavy tail, which can be reasonably fitted to a power law *P*(*k*) ~ *k*^{−α} with *α* ≈ 0.9 (Fig. 2b). Such a distribution allows us to identify a small number of high-degree odorant “hubs”. The six highest-degree hub odorants are labeled in Fig. 2a. It is worth noting that the receptor codes for the hub odorants are highly non-redundant with one another (Fig. 2c); among the six highest-degree odorants, 10 out of all 15 pairs have completely disjoint receptor codes. In particular, despite significant similarity in chemical structures, there is no co-activated OR between eugenol and eugenol acetate, which differ in a single functional group (hydroxy group versus acetate on the aromatic ring). The overlap of receptor codes between the hub odorants is significantly smaller than that between randomly sampled odorants from the dataset (Fig. S3).
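The pair count quoted above (15 pairs among six hubs) and the notion of "completely disjoint" receptor codes can be illustrated with a short sketch. The odorant names and receptor indices below are hypothetical toy values, not entries from the dataset, and the helper name `count_disjoint_pairs` is introduced here only for illustration.

```python
from itertools import combinations

def count_disjoint_pairs(codes):
    """Count odorant pairs whose receptor codes share no receptor.

    `codes` maps an odorant name to the set of receptor indices it activates.
    Returns (number of disjoint pairs, total number of pairs).
    """
    pairs = list(combinations(codes.values(), 2))
    disjoint = sum(1 for a, b in pairs if not (a & b))
    return disjoint, len(pairs)

# Hypothetical toy codes (NOT taken from the dataset).
toy_codes = {
    "odorant_A": {0, 1, 2},
    "odorant_B": {3, 4},
    "odorant_C": {2, 5},   # shares receptor 2 with odorant_A
    "odorant_D": {6},
}
n_disjoint, n_pairs = count_disjoint_pairs(toy_codes)

# Six hub odorants give 6 * 5 / 2 = 15 distinct pairs.
n_hub_pairs = len(list(combinations(range(6), 2)))
```

In the toy example only the pair (A, C) overlaps, so five of the six pairs are disjoint.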

We introduce the idea of *receptor code redundancy*, a quantitative measure of (dis-)similarity between the olfactory responses of two distinct odorants. If two odors have exactly the same receptor codes, they are perceived as the same olfactory signal. We hypothesize that two odors are poorly distinguishable when their receptor codes have a higher redundancy. For example, consider a discriminatory task where the goal is to detect a target odorant in the presence of a constant background odor [19, 20]. We define the receptor code redundancy as the fraction of the receptors responding to the target odorant that also respond to the background odorant (see Fig. 2d-e). If this fraction is small, as in Fig. 2d, the target odorant is deemed unambiguously detected even in the presence of the background odorant.

Taken together, the single-odorant representation in the receptor space is sparse and non-uniform, giving rise to high-degree “hub” odorants. The receptor code is almost non-redundant among these hub odorants, whereas the odorants that have a greater receptor code redundancy with a hub odorant are grouped around it. As presented below, analysis of the receptor code redundancy enables us to decipher the structure of the receptor code space more quantitatively.

### The receptor code space is naturally partitioned to show a modular structure

The grouped structure is better manifested when the interaction network is projected to the space of receptors. Here we consider the *co-activation graph* of receptors, which inherits all receptor nodes in the original interaction network; two receptor nodes are connected by an edge if they share a common odorant in the original network (Fig. 2d-e, last column). Of particular interest in the co-activation graph are the “receptor *cliques*”, or the groups of receptors that are co-activated by the same odorants. Because each pair of receptors interacting with a shared odorant λ is connected in the co-activation graph of receptors, the set of all receptors that interact with a given odorant always forms a clique. Given the particular structure of the interaction network, with hub odorants that have largely non-redundant receptor codes (Fig. 3a), the co-activation graph of receptors is bound to have large and mostly non-overlapping cliques that are associated with the hub odorants (Fig. 3b). We use the receptor cliques to *partition* the receptor space into non-overlapping groups, such that each receptor group is associated with a shared odorant; when a receptor is a part of more than one receptor clique, it is assigned to the larger clique (Fig. 3c).

The receptor groups can be used to group odorants based on their receptor codes. The idea is to construct odorant groups such that an odorant λ belongs to an odorant group Λ if its receptor code redundancy with respect to the group Λ, *χ*_{λ|Λ} = |**y**_{λ} ∩ **y**_{Λ}| / |**y**_{λ}|, is large (see Methods). We first used the receptor groups to determine the reference receptor codes for odorant groups (**y**_{Λ}’s) and then assigned each odorant to the group Λ where *χ*_{λ|Λ} is maximized. This results in a simultaneous partitioning of odorants and receptors (Fig. 3d, inner groups).

So far, we have demonstrated that groups emerge naturally from the particular statistics of the receptor codes. The resulting groups give us information as to which pairs of receptors (or odorants) are most strongly correlated. But in order to characterize the overall structure of the entire receptor code space, the “positive” correlations alone are not enough; for a perfect classification, receptors in the same group must respond to the same sets of odorants, and receptors in different groups to different odorants (see Methods for details). Thus, we step further to obtain an optimal classification of the co-activation patterns in the receptor code space. By appropriately *merging* the receptor groups obtained from the primary grouping, we carry out a secondary grouping of the receptors, where the threshold for pairwise merging is determined to maximize the goodness of classification (Fig. S5). Because we partition the odorants and the receptors simultaneously, a merging of receptor groups automatically leads to a merging of the corresponding odorant groups (Fig. 3d).

Application of the grouping procedure to the human olfactory receptor codes results in the sorted interaction matrix in Fig. 4, where each row represents the receptor code for a given odorant. The rows and columns of the interaction matrix are sorted according to the orders in the respective odorant/receptor groups (Fig. 4a-b). The interaction matrix has a roughly block-diagonal form (Fig. 4c), with much less significant contributions from the off-diagonal elements. Notice that we have already arranged the interaction network layout to represent a grouped structure, which is visualized more clearly in Fig. 5, where colored territories represent the best partitioning of the receptor code space. The feasibility of grouping reflects a special structure of the receptor code; such clear grouping is not obtained from a random network with the same degree statistics (Fig. S6).

### Odorants in the same group tend to carry similar olfactory features

Whereas our analysis is only based on the receptor codes and their redundancy, it results in a grouping of perceptually similar odorants together. For example, the hub odorants in the largest global group (Group 1 in Fig. 4) include “plant-related” odors, such as cis-3-hexen-1-ol (the characteristic green/grassy odor); geranyl acetate (floral); cinnamaldehyde, eugenol methyl ether (plant-derived spices). The smaller-degree odorant members in the group include phenyl acetaldehyde (green or honey-like); sandalwood (woody); coumarin (sweet, grassy, hay-like); linalool, lilial, lyral, anisaldehyde, terpineol (floral); n-amyl acetate (fruity); methyl salicylate, spearmint, cyclohexanone, carvones (minty).

In Group 2, we find more “animal-related” odors, such as androstenone, androstadienone (smell of body) and butyric acid (sweaty smell, also occurring in the human body). Other members of this group include odorants that are often associated with “bodily” or “sensual” feelings: ambrette (fragrance similar to animal musk) and jasmine. There are also several odorants that smell floral or fruity; in fact, many receptors are shared between the first (plant-related) and the second (body-related) group.

In Groups 3, 4 and 5, odorants are perceived as culinary spices: eugenol, eugenol acetate (the characteristic odor of clove); ethyl vanillin, cinnamon (common spices). In Group 3, which contains eugenol, we see food-related flavors: banana, nutmeg, butyl anthranilate (fruity odor); guaiacol (characteristic flavor for whiskey and roasted coffee); and 2-heptanone (characteristic flavor for gorgonzola cheese). These odorants with similar odor qualities, from which we draw the “characteristic olfactory feature” for each global group, are marked with asterisks in Fig. 4a (see also Tables S1–S2 for a full list of odor descriptors, for all odorants used in this study).

## DISCUSSION

We discovered a natural partitioning of the receptor code, by identifying groups of olfactory receptors (ORs) with correlated response patterns. Our grouping was performed without any label for the perceptual information; yet, this grouping procedure also partitioned the odorants into a few odor categories with similar odor descriptors. A correlated activity in a group of receptor codes for a certain olfactory feature is indicative of the existence of “labeled receptor groups”. Below we elaborate on this idea and discuss its implications from a broader perspective of human odor perception.

### Receptor groups as the bases of perceptual odor space

There is a clear association between the groups identified in the receptor code space and the major components in the perceptual odor space (the olfactory “features”). The presence of such an association is supported by the following two observations. First, odorants in the same group tend to have similar perceptual descriptors, e.g., the “plant-related” or “animal-related” odorant group. Second, we find that our result of odorant groups is consistent with the reported phenomenon of “olfactory white” [21], in which human subjects cannot distinguish odor mixtures if more than 30–40 odorants spanning the odor space are randomly mixed. In our analysis with odorant mixtures, the average receptor coverage by a mixture of 30–40 random odorants was equivalent to the coverage by the ~6 highest-degree odorants. Moreover, a random sampling of 30 odorants was just enough to sample each of the 6 largest odor groups at least once (Fig. S3).
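One simple reading of "receptor coverage by a mixture" in the binary regime is the number of receptors that respond to at least one component, that is, the size of the union of the component receptor codes. The sketch below takes this reading as an assumption (it ignores any interaction between mixture components); the function name `mixture_coverage` and the toy matrix are hypothetical.

```python
import numpy as np

def mixture_coverage(A, odorant_indices):
    """Number of receptors that respond to at least one odorant in a mixture.

    A is a binary (receptors x odorants) interaction matrix; the mixture code
    is taken to be the element-wise "or" of the component receptor codes
    (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=bool)
    mixture_code = A[:, list(odorant_indices)].any(axis=1)
    return int(mixture_code.sum())

# Hypothetical toy matrix: 4 receptors (rows) x 3 odorants (columns).
A_toy = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 0]])

coverage_single = mixture_coverage(A_toy, [0])      # odorant 0 alone
coverage_pair = mixture_coverage(A_toy, [0, 1])     # mixture of odorants 0 and 1
coverage_silent = mixture_coverage(A_toy, [2])      # odorant 2 activates nothing
```

Averaging this quantity over many random draws of component odorants would give a mean coverage as a function of mixture size.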

The presence of receptor groups implies a low-dimensional nature of the receptor code space; or eventually, of the perceptual odor space. Because the odor space is represented by the response pattern of *N* ≈ 400 functional ORs, there are in principle 2^{400} possible OR states, even in the binary regime. But when a group of receptors respond in a correlated way, not all 2^{400} states are equally likely. Therefore, a significant amount of dimensionality reduction is made in the receptor code space. This aligns with the previous ideas involving the effective sparsity of odor space that, despite the apparent high-dimensionality implicated by physiochemical properties of odorants, olfaction is working in a much lower dimension in effect [22, 23]; or that the odor space is intrinsically clustered rather than uniform [21, 24].

The effective sparsity of the receptor code space, which may once again propagate to the perceptual odor space, has an analogy to vision. When one says that the natural visual scene is sparse [25], it means that the scene consists of a small number of features, such as lines or edges [26]. Receptor groups are like lines or edges, in the sense that they represent certain correlated patterns in the receptor space that are projected to the coarse-grained *olfactory features* in the perceptual odor space. Whereas feature-extracting coding was thought to be the realm of higher-level neurons [27, 28], here we found concrete evidence that it already starts at the level of receptors.

### Combinatorial coding versus labeled lines: the hybrid strategy of human odor sensing

Our results point to a hybrid picture of the two coding hypotheses. At one extreme, one may assume that each receptor carries a specific signal, so that the glomerular array works as a set of *labeled lines*. Decoding is then unambiguous, as each OR is a unique discriminator for an odor unit (odorant), and a mixture is the sum of its components. Because there is only a finite number of ORs available, however, coding with labeled lines has a clear limitation on its capacity. Alternatively, one can assume that odors are identified through the combinations of responding receptors. This latter strategy of *combinatorial coding* can accommodate a much larger number of distinct odor units than the number of ORs. In this case, let alone the biochemical cost of activating many ORs [29], decoding would be a highly non-trivial problem that requires further computation at higher-level neurons.

For the most part, the receptor code looks like a set of labeled lines (or labeled *groups* as we now prefer to call them), which maintains a set of dedicated channels for each odorant (or odor group). However, the receptor code also has signatures of combinatorial coding, in the sense that some odorants activate receptors from more than one group.

The hybrid coding strategy is not an entirely new idea. It is known that the *sparse distributed coding* strategy, which mixes *local coding* (labeled lines) and *dense distributed coding* (combinatorial coding), is used in visual sensing [25, 26]. There is growing evidence that insects employ a mixture of combinatorial coding and labeled-line strategies [30, 31]. Olfactory coding strategies in most organisms, including humans, are clearly more complicated than the dichotomy between labeled lines and combinatorial coding [32]. However, this is not to say that any evidence supporting either coding strategy is false. There are cases where the labeled-line structure is clearly at work [33], and there are other important cases in which the code cannot be understood without the idea of combinatorial coding [4]. Presumably, biological evolution at both the OR and neuron levels results in the hybrid strategy as a real solution. Our statistical analysis of the odorant-OR interaction data clearly demonstrates that a hybridized coding strategy is already at work in the receptor code space, the very first layer of human olfaction [34, 35].

### Structures in the receptor code space, features in the perceptual odor space

Here, we further elaborate on our proposal that structures in the receptor code space represent the olfactory features in the perceptual odor space. Specifically, we put forward two hypotheses: (i) The large receptor groups found in the receptor code space through our global partitioning are “labeled” for the basic olfactory features that span the perceptual odor space. (ii) The hub odorants correspond to the salient stimuli in the natural olfactory environment.

First, we find that the olfactory features for the labeled receptor groups are in line with coarse-grained odor categories, e.g., “plant-related” and “animal-related” odors. This lends support to the idea that primary features of olfactory perception may already be “hardcoded” in the receptor code space [36], enabling innate discrimination of these features. Indeed, there is a clear advantage in keeping hard-coded labels for the basic olfactory cues, which would have been strongly conserved over the course of evolution [37] and would benefit from a faster response [38]. In fact, most examples of labeled-line receptors in other organisms are coarse-grained discriminators for certain groups of odorants [39, 40], or simply for good versus bad [33,41], which is arguably the most important axis of odor perception for humans as well [23, 42, 43].

Second, at the core of our analysis are the hub odorants that elicit responses in a large population of receptors, which implies that degree is a measure of the functional importance of a stimulus in the receptor space. The degree of an odorant might also reflect its statistical importance in the natural odor space, the statistical weight of which may have been adjusted through the process of evolution. We hypothesize that the receptor code is designed to assign a larger number of receptors to a more *salient* stimulus, whose presence is more strongly correlated with a particular olfactory feature. The existence of salient odorants in the natural odor space is in fact plausible from the biological viewpoint. For example, because the types of odorants produced by flowers are an outcome of evolutionary selection, most flowers have a few common odorants, with their physiochemical properties attracting the pollinating insects [44].

When a more salient stimulus is encoded by the response of more ORs, it has the advantage that important olfactory signals (carried by the salient odorants) are detected reliably, despite perturbations in the receptor code space, e.g., the functional loss of certain ORs due to genetic mutation [45, 46]. If the hypothesis is correct, one would be able to predict the saliency of odorants in the natural odor space based on the number of receptors that respond. It is of great interest to explore whether the statistical salience has any relationship with the statistics of co-occurrence between odors [47].

### Concluding remarks

A set of concrete messages can be drawn from our study: (i) The existence of odorant groups, or the correlated patterns between the receptor codes, greatly reduces the dimensionality of the receptor code space and allows us to identify coarse-grained olfactory features in the perceptual odor space. (ii) The odorant and receptor groups revealed from our study clarify the balance between the labeled lines and combinatorial coding strategies. The labeled receptor groups on top of non-uniformly distributed receptor codes imply a hybrid strategy, which can be used to leverage both the fidelity of the labeled-line design and discriminative capacity of the combinatorial coding. (iii) Each large group of odorants is described by the coarse-grained olfactory feature (e.g. plant-related, animal-related odor) and the apparent hierarchy within each group. Our findings suggest a corresponding hierarchy in the perceptual odor space, which opens up a series of questions as to the natural odor statistics and its relation to the statistics of their receptor codes. Despite several simplifying assumptions in this study, e.g., neglecting the dependence on odor concentration, competitive binding between two different types of odorants [48], the effect of inverse agonists [49], as well as any temporal effects [50], essential findings from our analysis on human receptor code space should be preserved (e.g., see Fig. S2 for the effect of odorant concentration on degree distribution).

In summary, by analyzing the dataset of OR responses, we identified a modular structure inherent in the receptor code space of human olfaction. Insights from these findings could be extended to the broader study of human odor perception and help understand exceptionally diverse data on olfaction in a more systematic manner.

## METHODS

### Modeling

#### Network representation of the receptor code space

We analyze the dose-response curves of 535 interacting odorant-receptor pairs, involving 89 odorants and 303 olfactory receptors (ORs), from the dataset of [14]. The interacting odorant-OR pairs were screened from a library of 511 human OR genes [13, 14]; the final number of 303 ORs includes only those that respond to at least one odorant in the dataset. The panel of odorants was selected to span the 20-dimensional physiochemical space that explains the variance in mammalian OR responses [12]. The receptors in the dataset, which cover a near-full (~ 3/4) space of known human ORs, can be viewed as the receptor-space representation of the odorant. On the other hand, the set of cognate odorants found for each receptor, which is only a small sampling of the olfactory environment (humans can detect ≳ 10^{4} odorants), is not guaranteed to be complete. We note that the dataset also includes several natural odors that are not monomolecular odorants (for example “banana”), which do not exactly fit in our formulation of pairwise odorant-OR interactions. Nevertheless, these are only a minor fraction of the dataset, and we included all reported odors in our analysis.

A graph can be formally written as *G* = (*V*, *E*), where *V* is the set of all nodes (“vertices”) in the graph, and *E* is the set of all edges. In the binary regime, each edge in *E* is an unordered pair of nodes in *V*; this gives a binary graph, which can also be represented in terms of an *adjacency matrix* *A*, whose (*i*, *j*)-th element *A*_{ij} represents whether there is an edge between the two nodes *i* and *j* in the graph. Specifically, the odorant-receptor network is a *bipartite graph*, where each node belongs to either of two node types (either odorants or receptors) and every edge connects one odorant node to one receptor node. If *V*_{O} and *V*_{R} are the set of all odorant nodes and the set of all receptor nodes, respectively, each edge in *E* connects exactly one node in *V*_{O} and the other node in *V*_{R}. This bipartite structure is cast into the *N* × *M* sub-matrix of the full adjacency matrix, *A*_{iλ}, where *N* = |*V*_{R}| is the number of receptor nodes in the network, and *M* = |*V*_{O}| the number of odorant nodes. Each element of the matrix is defined as *A*_{iλ} = 1 if and only if there is an edge between receptor *i* and odorant λ. We call *A*_{iλ} the *interaction matrix*.
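As a minimal sketch of this construction, the binary interaction matrix can be filled in from a list of (receptor, odorant) index pairs. The function name `interaction_matrix` and the toy edge list are hypothetical and only serve to illustrate the definition; they do not reproduce the dataset.

```python
import numpy as np

def interaction_matrix(edges, n_receptors, n_odorants):
    """Build the N x M binary interaction matrix from bipartite edges.

    Each edge is a (receptor index i, odorant index lam) pair; the entry
    A[i, lam] is 1 if and only if that edge is present.
    """
    A = np.zeros((n_receptors, n_odorants), dtype=np.uint8)
    for i, lam in edges:
        A[i, lam] = 1
    return A

# Hypothetical toy network: 4 receptors, 3 odorants, 5 interacting pairs.
toy_edges = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]
A = interaction_matrix(toy_edges, n_receptors=4, n_odorants=3)

# Column lam of A is the receptor code of odorant lam.
code_of_odorant_1 = A[:, 1]
```

Storing only the *N* × *M* block is sufficient because a bipartite graph has no receptor-receptor or odorant-odorant edges.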

#### Binary vector representation of a receptor code

The receptor code space representation of an odorant, written in a binary vector **y**, is equivalent to the corresponding column of the interaction matrix. That is, [**y**_{λ}]_{i} = *A*_{iλ}, where [**y**]_{i} is the *i*-th element of vector **y**.

The union of two binary vectors is defined element-wise as [**y**_{1} ∪ **y**_{2}]_{i} = [**y**_{1}]_{i} ∨ [**y**_{2}]_{i}, where ∨ is the logical “or” operation between two binary variables. Similarly, the intersection is defined as [**y**_{1} ∩ **y**_{2}]_{i} = [**y**_{1}]_{i} ∧ [**y**_{2}]_{i}, where ∧ is the logical “and” operation. We also define the difference of two binary vectors as [**y**_{1} \ **y**_{2}]_{i} = [**y**_{1}]_{i} ∧ ¬[**y**_{2}]_{i}, where ¬ is the logical “not” operation. For the union and the intersection, the definitions easily extend to more than two binary vectors.
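These element-wise operations map directly onto boolean array operators. The following sketch uses two hypothetical five-receptor codes to illustrate them, and also checks that the number of "on" elements of **y**_{1} splits into the part shared with **y**_{2} and the part not shared.

```python
import numpy as np

# Two hypothetical receptor codes over 5 receptors (toy values).
y1 = np.array([1, 1, 0, 0, 1], dtype=bool)
y2 = np.array([0, 1, 1, 0, 1], dtype=bool)

union = y1 | y2          # element-wise logical "or"
intersection = y1 & y2   # element-wise logical "and"
difference = y1 & ~y2    # "on" in y1 and "off" in y2

# Counting "on" elements: those of y1 are either shared with y2 or not.
n_y1 = int(y1.sum())
n_shared = int(intersection.sum())
n_only_y1 = int(difference.sum())
```

Here `n_y1` equals `n_shared + n_only_y1`, the counting identity that the redundancy measure relies on.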

#### Degree of an odorant

The *degree* of a node is the number of edges attached to it, or (when there is no loop) the number of other nodes that are connected to it. It is useful to define the *size* of a binary vector as the number of its non-zero elements: |**y**| = Σ_{i} [**y**]_{i}.

The degree of an odorant node is the number of receptor nodes connected to it:

*k*_{λ} = |**y**_{λ}| = Σ_{i} *A*_{iλ}. (1)

In the current study, we primarily consider the odorant degree, or how broadly (or narrowly) an odorant is “targeting” the receptors. This breadth of interaction for the odorant is well-defined, because there is a finite number of olfactory receptors. The dataset in [14] covers a majority of the known (functional) human olfactory receptor repertoire: 303 out of ~ 400 ORs. Our approach has advantage over other studies that considered whether a given receptor is “narrowly tuned” or “broadly tuned” across different odors, which is hard to quantify without a good metric of the odor space.
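In matrix terms, the degree of odorant λ is the sum of column λ of the interaction matrix. The sketch below computes it for a hypothetical toy matrix, and then checks the reported sparsity using only the counts quoted in the text (535 pairs, 89 odorants, 303 receptors), assuming that each reported pair contributes exactly one edge.

```python
import numpy as np

def odorant_degrees(A):
    """Degree of each odorant: number of receptors connected to it (column sums)."""
    return np.asarray(A).sum(axis=0)

# Hypothetical toy matrix: 4 receptors (rows) x 3 odorants (columns).
A_toy = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
k = odorant_degrees(A_toy)

# Consistency check with the counts quoted in the text.
# In a bipartite graph the mean odorant degree is (number of edges) / M.
n_edges, n_odorants, n_receptors = 535, 89, 303
mean_degree = n_edges / n_odorants              # about 6 receptors per odorant
fraction_of_space = mean_degree / n_receptors   # about 2% of the receptor space
```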

The degree distribution of odorants, *P*_{O}(*k*), is measured by counting the number of odorants at each degree *k*. *P*_{O}(*k*) appears approximately straight in the log-log scale, and is fitted to the curve, *P*(*k*) ~ *k*^{−α}. However, the fit does not necessarily indicate an underlying distribution that is power-law in a strict sense; we are simply using the power-law dependence to describe the heavy-tailed shape of the degree distribution.
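The text does not state which fitting procedure was used. One simple possibility, shown here purely as an assumption, is an ordinary least-squares line through the empirical distribution in log-log coordinates; the function name `fit_power_law_exponent` and the toy degrees are hypothetical.

```python
import numpy as np

def fit_power_law_exponent(degrees):
    """Estimate alpha in P(k) ~ k^(-alpha) by a straight-line fit in log-log scale.

    This is a descriptive fit only (an assumption of this sketch); it is not
    a statistical test of whether the distribution is a true power law.
    """
    degrees = np.asarray(degrees)
    ks, counts = np.unique(degrees[degrees > 0], return_counts=True)
    p = counts / counts.sum()                   # empirical P(k)
    slope, _intercept = np.polyfit(np.log(ks), np.log(p), 1)
    return -slope

# Hypothetical toy degrees with counts exactly proportional to 1/k:
# eight odorants of degree 1, four of degree 2, two of degree 4, one of degree 8.
toy_degrees = [1] * 8 + [2] * 4 + [4] * 2 + [8]
alpha = fit_power_law_exponent(toy_degrees)
```

For the toy counts the fitted exponent is 1, since the points fall on an exact straight line in log-log scale. With real data, the result of such a fit depends on binning and on the range of *k* included.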

#### Co-activation graph

The *co-activation graph* is a projection of the original bipartite graph to one of the two node types. For example, the co-activation graph of receptors, *G*_{R} = (*V*_{R}, *E*_{R;O}), inherits all receptor nodes *V*_{R} from the original bipartite graph. Two receptor nodes in the co-activation graph *G*_{R} are connected by an edge if there is a common odorant node in the original graph *G* that is connected simultaneously to both receptors (see Fig. S4a,f). To define the edges, *E*_{R;O}, it is sufficient to define the corresponding adjacency matrix *A*_{R}, where (*A*_{R})_{ij} = 1 indicates an edge between two receptors *i*, *j* in the co-activation graph. We call *A*_{R} the *co-activation matrix* of receptors. Each element of the co-activation matrix is given as (*A*_{R})_{ij} = ⋁_{λ} (*A*_{iλ} ∧ *A*_{jλ}), which indicates whether there is at least one odorant λ for which *A*_{iλ} and *A*_{jλ} are both 1, where *A*_{iλ} is the interaction matrix for the original bipartite graph.
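A compact way to compute this projection is through the matrix product of the interaction matrix with its transpose, whose (*i*, *j*) entry counts the odorants shared by receptors *i* and *j*. The sketch below thresholds that count at one shared odorant. Setting the diagonal to zero (no self-loops) is an assumption made here so that the degree in the co-activation graph counts only *other* receptors; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def coactivation_matrix(A):
    """Project the N x M interaction matrix onto receptors.

    Entry (i, j) is 1 if receptors i and j share at least one odorant.
    """
    A = np.asarray(A, dtype=int)
    shared = A @ A.T                      # shared[i, j] = number of common odorants
    A_R = (shared > 0).astype(np.uint8)
    np.fill_diagonal(A_R, 0)              # assumption of this sketch: no self-loops
    return A_R

# Hypothetical toy matrix: 4 receptors (rows) x 3 odorants (columns).
A_toy = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
A_R = coactivation_matrix(A_toy)
coactivation_degree = A_R.sum(axis=1)     # number of co-activated partner receptors
```

In the toy example receptors 0 and 1 share odorant 0, receptors 1 and 2 share odorant 1, and receptor 3 has no partner.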

#### Receptor code redundancy

We define the receptor code redundancy as *χ* = (*n*_{full} − *n*_{eff})/*n*_{full}, where *n*_{full} is the full impact of a signal in the absence of redundancy, and *n*_{eff} is the net effect of the signal under redundancy. For a discrimination task of a target odorant λ in the presence of a background odorant *μ*, the effect of redundancy reduces the net effect of adding λ from *n*_{full} = |**y**_{λ}| to *n*_{eff} = |**y**_{λ} \ **y**_{μ}|. In this case, the receptor code redundancy is *χ*_{λ|μ} = (|**y**_{λ}| − |**y**_{λ} \ **y**_{μ}|)/|**y**_{λ}| = |**y**_{λ} ∩ **y**_{μ}| / |**y**_{λ}|.
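The definition can be written out directly for binary receptor codes. The sketch below follows the *n*_{full} and *n*_{eff} form; the function name `redundancy` and the two toy codes are hypothetical. Note that the measure is asymmetric: exchanging target and background generally changes the value, because the normalization uses the size of the target code.

```python
import numpy as np

def redundancy(y_target, y_background):
    """Receptor code redundancy of a target odorant given a background odorant.

    chi = (n_full - n_eff) / n_full, with n_full = |y_target| and
    n_eff = |y_target \\ y_background|.
    """
    y_t = np.asarray(y_target, dtype=bool)
    y_b = np.asarray(y_background, dtype=bool)
    n_full = int(y_t.sum())
    if n_full == 0:
        raise ValueError("target odorant activates no receptor; redundancy is undefined")
    n_eff = int((y_t & ~y_b).sum())
    return (n_full - n_eff) / n_full

# Hypothetical toy codes over 6 receptors.
y_lam = [1, 1, 1, 1, 0, 0]   # target activates receptors 0-3
y_mu = [0, 0, 1, 1, 1, 0]    # background activates receptors 2-4

chi_lam_given_mu = redundancy(y_lam, y_mu)   # 2 of the 4 target receptors are shared
chi_mu_given_lam = redundancy(y_mu, y_lam)   # 2 of the 3 receptors are shared
```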

### Grouping algorithms

We will provide details of the grouping process in three steps: node ranking, primary grouping and secondary grouping. To start, we rank the odorants and receptors in the dataset.

#### Ranking odorants and receptors by the degrees

Odorants are ranked by their degree of interaction, *k* (Eq. 1). When there are multiple odorants with the same degree, they are ranked according to the order they are indexed in the original dataset [14]. We call this ordering *h*_{0}, such that each odorant λ (where λ may be an arbitrary index) is assigned a unique rank *h*_{0}(λ) ∈ {1, ⋯, *M*}.

Receptors are ranked by their degree in the co-activation graph. The degree in the co-activation graph represents the number of other receptors that share at least one odorant partner with the given receptor. When there are multiple receptors with the same degree of co-activation, they are ranked according to the order they are indexed in the original dataset [14]. Similarly as for the odorants, we call this ordering *g*_{0}, such that each receptor *i* is assigned a unique rank *g*_{0}(*i*) ∈ {1, ⋯, *N*}.
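Both rankings follow the same rule: descending degree, with ties broken by the original index order. A stable sort implements this tie-breaking directly. The sketch below is one way to do it; the function name `rank_by_degree` and the toy degrees are hypothetical.

```python
import numpy as np

def rank_by_degree(degrees):
    """Assign unique ranks 1..n by descending degree.

    Ties are broken by the original index order, which a stable sort preserves.
    """
    degrees = np.asarray(degrees, dtype=int)
    order = np.argsort(-degrees, kind="stable")   # indices from highest to lowest degree
    ranks = np.empty(len(degrees), dtype=int)
    ranks[order] = np.arange(1, len(degrees) + 1)
    return ranks

# Hypothetical degrees for five odorants, listed in dataset order.
toy_degrees = [3, 7, 3, 1, 7]
h0 = rank_by_degree(toy_degrees)
```

In the toy example the two degree-7 odorants receive ranks 1 and 2 in their dataset order, followed by the two degree-3 odorants. The same function applies to receptors when given their degrees in the co-activation graph.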

#### Primary grouping

We first partition the receptor space into a set of non-overlapping groups, and subsequently use these to classify the odorants based on the receptor code overlap.

In the primary receptor grouping (*g*_{1}), receptors are grouped such that each receptor “signs up” to the largest receptor clique it belongs to. Specifically, we assign to each receptor the rank *h*_{0} of the highest-degree odorant it interacts with; in case of ties, we choose odorants of higher ranks (smaller *h*_{0}). We re-rank the receptor group index while preserving the order of the associated rank *h*_{0}, by assigning the primary group index *I* = 1 to the largest receptor group, and the index *I* = 2 to the next largest group, and so on (*I* = 1, 2, ⋯, *I*_{max}). This results in a grouping *g*_{1}: *i* ↦ *I* that maps each receptor *i* to a group index *g*_{1}(*i*) = *I*.

In the primary odorant grouping (*h*_{1}), each odorant is assigned to the above-determined receptor group (*g*_{1}) with which its receptor code overlaps the most. The primary odorant grouping *h*_{1}: λ ↦ *I* (or *h*_{1}(λ) = *I*) is defined by choosing the receptor group of index *I* with which the odorant’s receptor code (**y**_{λ}) displays the maximum overlap, namely *I* = argmax_{J} *χ*_{λ|J}. Here, the receptor code overlap of an odorant λ with a receptor group of index *I* is quantified as *χ*_{λ|I} = |**y**_{λ} ∩ **y**_{I}| / |**y**_{λ}|, where the binary vector **y**_{I} represents the receptor code for the group *I*, with each element (**y**_{I})_{i} = 1 if *g*_{1}(*i*) = *I*, and (**y**_{I})_{i} = 0 otherwise. Note that the resulting odorant group indices span *I* = 1, 2, ⋯, *I*_{max}, and are shared with the receptor group indices.
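A minimal Python sketch of the primary grouping follows. It is one possible reading of the procedure rather than the authors' implementation: the re-ranking is assumed to number the groups in increasing order of the associated *h*_{0}, ties in the overlap *χ* are assumed to go to the smaller group index, and every receptor and odorant is assumed to have at least one partner. The function name and the toy data are hypothetical.

```python
def primary_grouping(A, h0):
    """Return (g1, h1) for a binary activation matrix A (receptors x odorants)."""
    n_rec, n_odo = len(A), len(A[0])

    # Label each receptor with the rank h0 of its highest-degree odorant partner,
    # i.e. the smallest h0 among the odorants it responds to.
    labels = []
    for i in range(n_rec):
        partner_ranks = [h0[lam] for lam in range(n_odo) if A[i][lam]]
        if not partner_ranks:
            raise ValueError(f"receptor {i} has no odorant partner")
        labels.append(min(partner_ranks))

    # Re-number the labels as I = 1, 2, ..., I_max, preserving the h0 order.
    index_of = {label: I for I, label in enumerate(sorted(set(labels)), start=1)}
    g1 = [index_of[label] for label in labels]
    i_max = len(index_of)

    # Assign each odorant to the receptor group with the largest overlap chi.
    h1 = []
    for lam in range(n_odo):
        code_size = sum(A[i][lam] for i in range(n_rec))  # |y_lambda|
        if code_size == 0:
            raise ValueError(f"odorant {lam} activates no receptor")
        chi = {}
        for J in range(1, i_max + 1):
            shared = sum(1 for i in range(n_rec) if A[i][lam] and g1[i] == J)
            chi[J] = shared / code_size
        # max() returns the first maximal key, so ties go to the smallest J.
        h1.append(max(chi, key=chi.get))
    return g1, h1


# Toy example: 5 receptors (rows), 3 odorants (columns), with h0 given by degree.
A = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
]
h0 = [1, 2, 3]
g1, h1 = primary_grouping(A, h0)
# g1 == [1, 1, 1, 2, 2]; h1 == [1, 2, 2]
```

Receptor 1 responds to both of the high-degree odorants but signs up only to the group of the higher-ranked one, which is what makes the receptor groups non-overlapping.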

#### Secondary grouping

We merge the primary groups based on the amount of co-activation in the receptor code space.

In the secondary receptor grouping (*g*_{2}), pairs of primary receptor groups are merged if there is a strong co-activation. For two primary receptor groups *I* and *J*, we calculate the amount of co-activation between the two groups, simply in terms of the density of the off-diagonal-block contribution in the co-activation matrix *A*_{R}:

where *i* ∈ *I* is a shorthand to indicate that the sum runs over all receptors *i* such that *g*_{1}(*i*) = *I*. We merge each pair of primary groups *I* and *J* whenever their co-activation is stronger than a threshold. Here we used the optimal threshold value, where the grouping was most informative (Fig. S5; also see Statistical validation). Note that this is a greedy and transitive grouping. For example, if there are three primary groups *I*_{1}, *I*_{2}, *I*_{3} such that the pairs (*I*_{1}, *I*_{2}) and (*I*_{2}, *I*_{3}) are each co-activated above the threshold but the pair (*I*_{1}, *I*_{3}) is not, then all three are merged together in the secondary grouping; the two groups *I*_{1}, *I*_{2} are first combined, then *I*_{3} is also pulled into the group that *I*_{2} belongs to. After performing all pairwise merges, we once again re-rank the resulting group indices while preserving the order. This defines a grouping *G*_{12}: *I* ↦ *I*’ that maps each primary group *I* to a (re-numbered) secondary group *I*’. Each receptor *i* is then assigned a secondary group index *g*_{2}(*i*) = *G*_{12}(*g*_{1}(*i*)).

In the secondary odorant grouping (*h*_{2}), pairs of primary odorant groups are merged if there is a strong co-activation. Because the primary grouping was a simultaneous partitioning of odorants and receptors, the above merging operation of receptor groups from *g*_{1}(*i*) to *g*_{2}(*i*) automatically propagates to the odorant groups. Specifically, each odorant λ is assigned a secondary group index *h*_{2}(λ) = *G*_{12}(*h*_{1}(λ)).
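The transitive merging can be illustrated with a small union-find sketch in Python. Several choices here are assumptions made for illustration only: the between-group co-activation is taken to be the sum of the *A*_{R} elements in the off-diagonal block divided by the number of receptor pairs in that block, the thresholds 0.4 and 0.6 are arbitrary toy values (not the value used in the paper), and the function name and toy matrix are hypothetical.

```python
def secondary_grouping(A_R, g1, theta):
    """Merge primary groups whose block density exceeds theta; return (G12, g2)."""
    groups = sorted(set(g1))
    members = {I: [i for i, g in enumerate(g1) if g == I] for I in groups}
    parent = {I: I for I in groups}

    def find(I):
        # Follow parent links to the representative of the merged set.
        while parent[I] != I:
            parent[I] = parent[parent[I]]
            I = parent[I]
        return I

    for a, I in enumerate(groups):
        for J in groups[a + 1:]:
            total = sum(A_R[i][j] for i in members[I] for j in members[J])
            density = total / (len(members[I]) * len(members[J]))  # assumed form
            if density > theta:
                root_i, root_j = find(I), find(J)
                if root_i != root_j:
                    # Keep the smaller primary index as the representative.
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    # Re-number the merged sets 1, 2, ... in order of their smallest primary index.
    roots = sorted({find(I) for I in groups})
    new_index = {root: n for n, root in enumerate(roots, start=1)}
    G12 = {I: new_index[find(I)] for I in groups}
    g2 = [G12[I] for I in g1]
    return G12, g2


# Toy co-activation matrix for 5 receptors in primary groups {0,1}, {2,3}, {4}.
A_R = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
g1 = [1, 1, 2, 2, 3]
G12_low, g2_low = secondary_grouping(A_R, g1, 0.4)    # all three groups merge
G12_high, g2_high = secondary_grouping(A_R, g1, 0.6)  # only groups 1 and 2 merge

# The odorant grouping follows by composition: h2 = G12(h1).
h1 = [1, 3]
h2_high = [G12_high[I] for I in h1]  # [1, 2]
```

At the lower threshold, groups 1 and 3 end up together even though their own block is empty, because both are linked to group 2; this is the transitive behavior described in the text.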

### Statistical validation

#### Optimal threshold for secondary receptor grouping

In the secondary receptor grouping, our goal was to obtain the best partitioning of the receptors so that the resulting structure captures the co-activation pattern. Because we start from the primary receptor groups as the “units”, and work by merging them based on the overlaps, the grouping depends on the threshold. In the limit where the threshold is too small, most units are merged together and a giant cluster is formed (Fig. S5a). On the other hand, when the threshold is too large, most units stay unmerged, failing to capture the global structure (Fig. S5c). The optimal threshold is where the grouping is most *informative*, in the sense that more strongly co-activating primary groups are merged together in the same secondary group, and non-co-activating groups stay apart (Fig. S5b).

Here we consider our grouping as a binary classifier, where each pair of receptors is either grouped together or not. Then the informativeness of the grouping can be quantified in terms of the diagnostic ability of the classifier. In particular, there are two success rates we want to maximize: the true positive rate (TPR) and the true negative rate (TNR). The TPR, also called the *sensitivity* or *recall*, is defined as the fraction of co-activated receptor pairs (non-zero elements in the co-activation matrix) that are correctly grouped together. The TNR, also called the *specificity*, is defined as the fraction of non-co-activated receptor pairs (zero elements in the co-activation matrix) that are correctly *not* grouped together. Because our “units” of grouping are the primary receptor groups, which are considered fixed, we only count the elements of the co-activation matrix that belong to different primary groups (off-block elements). It is also common to consider the false positive rate (FPR; also called the *fall-out*) instead of the TNR, with the simple relationship FPR = 1 − TNR.

We plot the sensitivity of the grouping against 1 − specificity at varying threshold, which gives the receiver operating characteristic (ROC) curve (Fig. S5d). A perfect classifier corresponds to the upper left corner of the ROC plane, where the sensitivity and the specificity are both 1; the results of a random guess would sit along the positive diagonal (also called the no-discrimination line). We define the optimal threshold as the point where the curve most closely approaches the top left corner of the ROC plane; more specifically, we minimize the combined loss function (1 − sensitivity)^{2} + (1 − specificity)^{2}, i.e., the sum of squared normalized errors. The best secondary grouping of the co-activation pattern of receptors is thus obtained at this optimal threshold (Fig. S5b).
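The selection criterion can be sketched as follows in Python. This is an illustration under assumptions, not the published analysis code: the secondary groupings for each candidate threshold are supplied by hand for a toy co-activation matrix, both co-activated and non-co-activated off-block pairs are assumed to exist, and all names and threshold values are hypothetical.

```python
from itertools import combinations


def roc_point(A_R, g1, g2):
    """Return (TPR, TNR) over receptor pairs in different primary groups."""
    tp = fn = tn = fp = 0
    for i, j in combinations(range(len(g1)), 2):
        if g1[i] == g1[j]:
            continue  # within-block pairs are fixed by the primary grouping
        coactivated = A_R[i][j] > 0
        together = g2[i] == g2[j]
        if coactivated and together:
            tp += 1
        elif coactivated:
            fn += 1
        elif together:
            fp += 1
        else:
            tn += 1
    # Assumes at least one positive and one negative off-block pair.
    return tp / (tp + fn), tn / (tn + fp)


def roc_loss(tpr, tnr):
    """Squared distance from the top left corner of the ROC plane."""
    return (1 - tpr) ** 2 + (1 - tnr) ** 2


def best_threshold(A_R, g1, g2_by_threshold):
    """Pick the threshold whose grouping minimizes the ROC loss."""
    return min(
        g2_by_threshold,
        key=lambda theta: roc_loss(*roc_point(A_R, g1, g2_by_threshold[theta])),
    )


# Toy co-activation matrix for 5 receptors in primary groups {0,1}, {2,3}, {4}.
A_R = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
g1 = [1, 1, 2, 2, 3]
# Hand-supplied secondary groupings at three illustrative thresholds.
g2_by_threshold = {
    0.4: [1, 1, 1, 1, 1],  # too small: one giant cluster
    0.6: [1, 1, 1, 1, 2],  # intermediate
    0.8: [1, 1, 2, 2, 3],  # too large: nothing merges
}
tpr, tnr = roc_point(A_R, g1, g2_by_threshold[0.6])  # (0.75, 0.75)
theta_opt = best_threshold(A_R, g1, g2_by_threshold)  # 0.6
```

The two extreme thresholds each reach a loss of 1 (perfect sensitivity with zero specificity, or the reverse), while the intermediate grouping balances both rates and is therefore selected.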

#### Comparison to null models

To confirm that the existence of receptor groups is a special property of the receptor code, we constructed null models with randomized interactions, constrained only by the marginal degree statistics of the data. First, we considered a random interaction network of 89 odorants and 303 receptors, where each odorant-receptor pair interacts with a uniform probability. The only constraint was that there are exactly 535 interacting pairs, as in the data; this is equivalent to fixing the average degree of an odorant. We observed that this random network groups poorly, reflecting the fact that it is essentially unstructured (Fig. S6a). Second, we considered a more constrained null model where the degree distribution is preserved. We obtained this by shuffling each column of the activation matrix *A*_{iλ} independently, such that each odorant re-connects to the receptors randomly, while the degree is kept fixed. The primary groups are observed as large blocks in the co-activation matrix, because the high-degree odorants are preserved; however, the secondary grouping is still poor (Fig. S6b). This shows that the global structure with well-partitioned groups is a special property of the receptor code, which involves more than just the degree distribution. Note that due to randomization, the degree correlations in these null models are also bound to be neutral; the structure is not easily captured by the simple standard statistics.
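Both null models are simple to generate; a minimal Python sketch is given below. It assumes the activation matrix is stored as a binary list of lists with receptors as rows and odorants as columns, and the function names and random seeds are hypothetical choices for illustration.

```python
import random


def random_network(n_rec, n_odo, n_edges, rng):
    """Null model 1: place exactly n_edges interactions uniformly at random."""
    chosen = rng.sample(range(n_rec * n_odo), n_edges)
    A = [[0] * n_odo for _ in range(n_rec)]
    for flat_index in chosen:
        A[flat_index // n_odo][flat_index % n_odo] = 1
    return A


def shuffle_columns(A, rng):
    """Null model 2: shuffle each odorant column independently.

    Each odorant keeps its degree but reconnects to random receptors.
    """
    n_rec, n_odo = len(A), len(A[0])
    B = [[0] * n_odo for _ in range(n_rec)]
    for lam in range(n_odo):
        column = [A[i][lam] for i in range(n_rec)]
        rng.shuffle(column)
        for i in range(n_rec):
            B[i][lam] = column[i]
    return B


def column_sums(A):
    """Odorant degrees: number of receptors activated by each odorant."""
    return [sum(row[lam] for row in A) for lam in range(len(A[0]))]


# Sizes quoted in the text: 303 receptors, 89 odorants, 535 interacting pairs.
A_random = random_network(303, 89, 535, random.Random(0))
A_shuffled = shuffle_columns(A_random, random.Random(1))
```

In practice the second model would be applied to the measured activation matrix rather than to a random one; the random matrix is used here only so that the example runs without the dataset.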

## Code and data availability

Lists of odorants with odor descriptors are provided in Tables S1 and S2. The perceptual odor descriptors were obtained from multiple sources, including public databases, academic works and industrial reports. Data sources are listed in Table S3.

Formatted data files to reproduce the network representation of odorant-receptor interaction with full attributes for each node and edge (as .cyjs and .cys files, to open in Cytoscape), and the Matlab code for the grouping analysis along with associated documentation, are available at `https://github.com/jihyunbak/ORnetwork`.

## AUTHOR CONTRIBUTIONS

J.H.B., S.J.J. and C.H. designed the projects. J.H.B. and C.H. performed research and analyzed the data. J.H.B., S.J.J. and C.H. wrote the paper.

## COMPETING INTERESTS

The authors declare that they have no competing interests.

## ACKNOWLEDGMENTS

We thank the KIAS Center for Advanced Computation for providing computing resources. This study was partly supported by grants from the National Research Foundation of Korea (NRF-2018R1A2B3001690) (C.H.), and the National Science Foundation (CHE-1362926) (S.J.J.).

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].