A Principal Odor Map Unifies Diverse Tasks in Human Olfactory Perception

Mapping molecular structure to odor perception is a key challenge in olfaction. Here, we use graph neural networks (GNN) to generate a Principal Odor Map (POM) that preserves perceptual relationships and enables odor quality prediction for novel odorants. The model is as reliable as a human in describing odor quality: on a prospective validation set of 400 novel odorants, the model-generated odor profile more closely matched the trained panel mean (n=15) than did the median panelist. Applying simple, interpretable, theoretically rooted transformations, the POM outperformed chemoinformatic models on several other odor prediction tasks, indicating that the POM successfully encoded a generalized map of structure-odor relationships. This approach broadly enables odor prediction and paves the way toward digitizing odors.


Introduction
A fundamental problem in neuroscience is mapping the physical properties of a stimulus to perceptual characteristics. In vision, wavelength maps to color; in audition, frequency maps to pitch. By contrast, the mapping from chemical structures to olfactory percepts is poorly understood. Detailed and modality-specific maps like the CIE color space (1) and Fourier space (2) led to a better understanding of visual and auditory coding. Similarly, to better understand olfactory coding, olfaction needs a better map.
Pitch increases monotonically with frequency; in contrast, the relationship between odor percept and odorant structure is riddled with discontinuities, exemplified by Sell's triplets (3), trios of molecules in which the structurally similar pair is not the perceptually similar pair (Fig. 1A). These discontinuities in the structure-odor relationship suggest that standard chemoinformatic representations of molecules (functional group counts, physical properties, molecular fingerprints, etc.) used in recent odor modeling work (4-6) are inadequate to map odor space.

Results
To generate odor-relevant representations of molecules, we constructed a Message Passing Neural Network (MPNN) (7), a specific type of graph neural network (GNN) (8), to map chemical structures to odor percepts. Each molecule is represented as a graph, with each atom described by its valence, degree, hydrogen count, hybridization, formal charge, and atomic number. Each bond is described by its degree, aromaticity, and whether it is in a ring. Unlike traditional fingerprinting techniques (9), which assign equal weight to all molecular fragments within a set bond radius, a GNN can optimize fragment weights for odor-specific applications. Neural networks have unlocked predictive modeling breakthroughs in diverse perceptual domains (e.g., natural images (10), faces (11), and sounds (12)) and naturally produce intermediate representations of their input data that are functionally high-dimensional, data-driven maps. We use the final layer of the GNN (henceforth, "our model") to directly predict odor qualities, and the penultimate layer of the model as a principal odor map (POM). The POM 1) faithfully represents known perceptual hierarchies and distances, 2) extends to novel odorants, 3) is robust to discontinuities in structure-odor distances, and 4) generalizes to other olfactory tasks.
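The graph representation and message-passing idea described above can be illustrated with a toy sketch. This is not the authors' architecture: the three-number atom features, the unweighted sum aggregation, and the function names `message_passing_step` and `readout` are all assumptions chosen for illustration. A trained MPNN would use learned weights and the full atom and bond descriptors listed in the text.

```python
# Toy message-passing step on a molecular graph (illustrative only).
# The features and update rule are hypothetical, not the paper's model.


def message_passing_step(node_feats, edges):
    """One round of sum-aggregation message passing.

    node_feats: list of feature vectors, one list of floats per atom.
    edges: list of (i, j) undirected bonds between atom indices.
    Returns new features: each atom's vector plus the sum of its neighbors'.
    """
    dim = len(node_feats[0])
    messages = [[0.0] * dim for _ in node_feats]
    for i, j in edges:
        for k in range(dim):
            messages[i][k] += node_feats[j][k]
            messages[j][k] += node_feats[i][k]
    return [
        [h + m for h, m in zip(feats, msg)]
        for feats, msg in zip(node_feats, messages)
    ]


def readout(node_feats):
    """Sum atom vectors into one fixed-length molecule embedding."""
    dim = len(node_feats[0])
    return [sum(feats[k] for feats in node_feats) for k in range(dim)]


# Heavy atoms of ethanol (C-C-O); features are
# [atomic number, degree, hydrogen count].
atoms = [[6.0, 1.0, 3.0], [6.0, 2.0, 2.0], [8.0, 1.0, 1.0]]
bonds = [(0, 1), (1, 2)]

updated = message_passing_step(atoms, bonds)
embedding = readout(updated)
```

After one step, each atom's vector already mixes in information from its bonded neighbors; stacking several such steps with learned weights is what lets a GNN weight molecular fragments differently, in contrast to fixed-radius fingerprints.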
To train the model, we curated a reference dataset of approximately 5000 molecules, each described by multiple odor labels (e.g. creamy, grassy), by combining the GoodScents (13) and Leffingwell (14) (GS/LF) flavor and fragrance databases (Fig. 1B). The model (Fig. 1C) achieved strong cross-validation predictive performance of AUROC=0.89 (15).
Fig. 1. (B) The GNN was trained on a curated dataset of 5000 semantically labeled molecules drawn from GoodScents (13) and Leffingwell (14) flavor and fragrance databases; one square represents 100 molecules; three example training set molecules and their odor descriptions are shown: 2-methyl-2-hexenoic acid (top), 2,5-dimethyl-3-thioisovalerylfuran (middle), 1-methyl-3-hexenyl acetate (bottom). (C) Schematic illustrating the process of training a GNN to generate the POM. (D-F) Odorants plotted by the first and second principal components (PC) of their (D) perceptual labels from GS/LF training dataset (138 labels), (E) cFP structural fingerprints (radius 4, 2048-bit), and (F) POM coordinates (256 dimensions). Areas dense with molecules having the broad category labels floral, meaty, or alcoholic are shaded; areas dense with narrow category labels are outlined. The POM recapitulates the true perceptual map, but the FP map does not; note that only relative (not absolute) coordinates matter.
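For readers unfamiliar with the AUROC figure quoted for cross-validation, the following minimal sketch shows one way such a multi-label score can be computed. It assumes an unweighted average of per-label AUROC values, which the text does not specify; the function names and the four-molecule example are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive example
    outscores a random negative example (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_label_auroc(label_matrix, score_matrix):
    """Unweighted mean of per-label AUROC (an assumed averaging scheme).

    Rows are molecules, columns are odor labels.
    """
    n_labels = len(label_matrix[0])
    values = []
    for j in range(n_labels):
        column_labels = [row[j] for row in label_matrix]
        column_scores = [row[j] for row in score_matrix]
        values.append(auroc(column_labels, column_scores))
    return sum(values) / n_labels


# Four hypothetical molecules, two hypothetical labels.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.6, 0.7], [0.7, 0.8], [0.1, 0.3]]

first_label = auroc([row[0] for row in y_true], [row[0] for row in y_score])
overall = mean_label_auroc(y_true, y_score)
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect ranking, so the score is insensitive to how rare a label is in the dataset.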
To test how well the POM represents known perceptual relationships, we compared both the POM and a map built with standard chemoinformatic features, Morgan fingerprints (FP), to empirical perceptual space (Fig. 1D-F). We measured the fidelity of the maps in representing true relative perceptual distances (e.g. two molecules that smell of jasmine should be nearer to each other than to a beefy molecule) and hierarchies (e.g. jasmine and lavender are subtypes of the floral odor family). The POM better represents relative distances: distances in the perceptual map (Fig. 1D) are more significantly correlated to distances in the POM (R=0.73, Fig. S1A) than to distances in the FP map (R=-0.12, p <0.001, Fig. S1B). The POM better represents perceptual hierarchies: molecules with a shared odor label have significantly tighter cluster density (CD) in the POM (CD = 0.51 ± 0.19) than in the FP map (CD = 0.68 ± 0.23, p <0.001, Fig. S2), where smaller CD values denote more dense clusters.
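The distance-correlation comparison above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a Pearson correlation over all molecule pairs; the text does not state which distance metric was used, and `map_fidelity` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(points):
    """Distances for every unordered pair of points, in a fixed order."""
    return [
        euclidean(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def map_fidelity(reference_map, candidate_map):
    """Correlate pairwise distances in a candidate map with the reference."""
    return pearson_r(
        pairwise_distances(reference_map), pairwise_distances(candidate_map)
    )


# Hypothetical coordinates for four molecules in a perceptual map.
perceptual = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
# A map that preserves relative distances (uniformly scaled by 2).
scaled = [[2.0 * a, 2.0 * b] for a, b in perceptual]
# A map that assigns the same positions to the wrong molecules.
scrambled = [[5.0, 5.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

fidelity_scaled = map_fidelity(perceptual, scaled)
fidelity_scrambled = map_fidelity(perceptual, scrambled)
```

Because the score depends only on relative distances, the uniformly scaled map scores perfectly even though its absolute coordinates differ from the reference.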
To test if the model extends to novel odorants, we designed a prospective validation challenge (16) in which we benchmarked model predictive performance against individual human raters. In olfaction, no reliable instrumental method of measuring odor perception exists, and trained human sensory panels are the gold standard for odor characterization (17). Like other sensory modalities, odor perception is variable across individuals (18,19), but group-averaged odor ratings have been shown to be stable across repeated measurements (20) and represent our best avenue to establish the ground-truth odor character for novel odorants. We trained a cohort of subjects to describe their perception of odorants using the Rate-All-That-Apply method (RATA) and a 55-word odor lexicon. During training sessions, each term in the lexicon was paired with visual and odor references (Table S1; Fig. S3). Only subjects that met performance standards on the pretest of 20 common odorants (Data S2; individual test-retest correlation R > 0.35; reasonable label selection for common odorants) were invited to join the panel.
To avoid trivial test cases, we applied the following selection criteria for the set of 400 novel odorants: 1) molecules must be structurally distinct from each other (Fig. S4), 2) molecules should cover the widest gamut of odor labels (Data S1), and 3) molecules must be structurally or perceptually distinct from any training example (e.g. Fig. 1A, Data S1). Our prospective validation set consists of 55-odor-label RATA data for 400 novel, intensity-balanced odorants generated by our cohort of ≥15 panelists (2 replicates). Summary statistics and correlation structure of the human perceptual data are presented in Fig. S5-7. Our panel's mean ratings were highly stable (panel test-retest: R = 0.80, n = 15; Fig. S8) and more consistent than the DREAM cohort's ratings (6) (Fig. S9-10). To measure the model's performance, we compared the concordance of its normalized predictions with the normalized panel mean rating (Fig. 2A and 2C). While there is considerable variation across molecules in the ability of both individual raters and the model to match the panel mean ratings, the model output comes closer to the panel mean than does the median panelist for 53% of molecules (Fig. 2E and 2F). The model's superiority at the task is even more impressive given that panelists are able to smell each odorant as they rate it, while the model's predictions are based solely on nominal molecular structure.
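The per-molecule comparison against the median panelist can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: it assumes Pearson correlation as the concordance measure, includes each panelist in the panel mean they are compared against, and skips the normalization step mentioned in the text. All names and ratings are hypothetical.

```python
import math
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def panel_mean(ratings):
    """Average the panelists' rating vectors label by label."""
    n = len(ratings)
    return [sum(column) / n for column in zip(*ratings)]


def model_beats_median_panelist(panelist_ratings, model_prediction):
    """True if the model matches the panel mean better than the
    median panelist does, for a single molecule."""
    mean = panel_mean(panelist_ratings)
    panelist_scores = [pearson_r(r, mean) for r in panelist_ratings]
    model_score = pearson_r(model_prediction, mean)
    return model_score > median(panelist_scores)


def fraction_model_wins(molecules):
    """Fraction of molecules for which the model beats the median panelist.

    molecules: list of (panelist_ratings, model_prediction) pairs.
    """
    wins = sum(model_beats_median_panelist(r, p) for r, p in molecules)
    return wins / len(molecules)


# One hypothetical molecule: 3 panelists rating 4 odor labels.
ratings = [[4, 1, 0, 2], [3, 0, 1, 2], [1, 3, 2, 0]]
good_prediction = [2.5, 1.5, 1.0, 1.5]
poor_prediction = [0.0, 3.0, 2.0, 0.0]

good_result = model_beats_median_panelist(ratings, good_prediction)
poor_result = model_beats_median_panelist(ratings, poor_prediction)
win_rate = fraction_model_wins(
    [(ratings, good_prediction), (ratings, poor_prediction)]
)
```

Applied to all molecules in a validation set, `fraction_model_wins` yields a single summary figure of the kind reported in the text.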
As a baseline comparison, we trained a cFP-based random forest (RF) model, the previous state-of-the-art (6), on the same dataset (Fig. 2B). This baseline model surpassed the median panelist for only 41% of molecules, showing that our GNN model's performance increase comes not only from the volume and quality of the data, but importantly from the model architecture.
The GNN model shows human-level performance in aggregate, but how does it perform across perceptual and chemical classes? When we disaggregate performance by odor label, the model is within the distribution of human raters for all labels except musk and surpasses the median panelist for 32/55 labels (58%, Fig. 3A). This per-label analysis supports the view that the GNN model is superior to the previous state-of-the-art model trained on the same data (paired 2-tailed t-test p=1.0e-7).
Predictive performance for a given label depends on the complexity of the structure-odor mapping for that label, so it is unsurprising that the model performs best for labels like garlic and fishy that have clear structural determinants (sulfur-containing for garlic; amines for fishy), and worst for the label musk, which includes at least 5 distinct structural classes (macrocyclic, polycyclic, nitro, steroid-type, and straight-chain) (21,22). In contrast, a panelist's performance for a given label depends on their familiarity with the label in the context of smell; consequently, we see strong panelist-panel agreement for labels describing common food smells like nutty, garlic, and cheesy and weak agreement for labels like musk and hay. Weak agreement for musk may also be due to genetic variability in perception, a well-documented phenomenon (23).
Model performance also depends on the number of training examples for a given label; with enough examples, models can learn even complex structure-percept relationships. In general, our model's performance is high for labels with many training examples (e.g., fruity, sweet, floral) (Fig. 3B), but performance for labels with few training examples can be either high (e.g., fishy, camphoreous, cooling) or low (e.g., ozone, sharp, fermented). In other words, collecting more training data raises the floor for model performance. Likewise, model performance is bounded above by panel test-retest correlation (Fig. S13). When we disaggregate by chemical classes (e.g. esters, phenols, amines), both panelist and model performance is relatively uniform (Fig. 3C), with sulfur-containing molecules showing strongest performance from panelists and the model (R = 0.52).
Chemical materials are impure, a fact too often unaccounted for in olfactory research (24). To measure the contribution of impurities to the odor percept of our stimuli, we applied a gas chromatography-mass spectrometry (GC-MS) and gas chromatography-olfactometry (GC-O) quality control (QC) procedure to 50 stimuli (Data S1). This QC procedure matches an odor percept to its causal molecule, allowing us to identify stimuli for which the primary odor character was not due to the nominal compound. Our QC led to diverse conclusions: the nominal compound caused the odor (12/50), the nominal compound and contaminants contribute to the odor (16/50), contaminants caused the odor (18/50), or the cause of the odor could not be determined (4/50) (Fig. 3D). In some cases, while we purchased a novel odorant, the dominant odorant was not novel; for example, the stimulus 4,5-dimethyl-1,3-thiazol-2-amine was described by the panel as buttery, sweet, and dairy, but this odor percept was attributed through QC to the contaminant diacetyl, a well-known buttery odorant. In another case, the purchased odorant, isobornyl methylacrylate, was described by the panel and the model as both piney and floral; however, through QC we determined that the nominal compound was floral only and that the piney aroma was due to the closely related compound, borneol, which was detected as a contaminant in the sample. Based on QC results, we removed 26 molecules known or suspected to have high degrees of odorous contamination (Data S1).
The prevalence of odorous contamination that we found demonstrates that it is not safe to assume that the odor percept of a purchased chemical is due to the nominal compound. The Flavor & Fragrance (F&F) industry is motivated to minimize odorous contaminants for commercially valued odorants, but there is no such incentive for non-F&F commodity chemicals. We stress the need for caution and diligence in expanding odor stimulus space.
Implications of each QC result on model performance are unique (Data S1). In some cases, the model performed well despite the presence of odorous contaminants. We estimate that, if these contaminants were removed from the rated samples, model performance would improve in 6 of 50 scenarios, degrade in another 6 of 50 scenarios, remain neutral in 21 of 50 scenarios, and could not be determined in 17 of 50 scenarios.
To test if the model is robust to discontinuities in structure-odor distances, we designed an additional challenge in which 41 new triplets (example in Fig. 4A) were constructed and validated by the panel (as in Fig. 1A). In each triplet, the anchor molecule is a known odorant matched with one structurally similar and one structurally dissimilar novel odorant, and the structurally dissimilar odorant is predicted to be the more perceptually similar of the two to the anchor. Our trained panelists were presented with the three odorants as a set and rated the perceptual distance between each of the molecules in the triplet (Fig. 4B). Confirming the model's predictions (counterintuitive under simpler structural models of odor), our panelists generally rated the structurally dissimilar molecules as being more perceptually similar to an anchor molecule than the anchor's structural neighbor (p < 2.2e-16, Fig. 4C). This significant result is further evidence that the POM overcomes discontinuities in the structure-odor relationship.
A reliable structure-odor map allows us to explore odor space at scale. We compiled a list of 500,000 potential odorants whose empirical properties are currently unknown to science or industry; most have never been synthesized before. Because a molecule's coordinates in the POM are directly computable from the model, we can plot these potential odorants in the POM (Fig. 5A), revealing a potential space of odorous molecules that is much larger than the space covered by current fragrance catalogs (~5,000 purchasable, characterized odorants). Ratings for these molecules would take approximately 70 person-years of continuous smelling time to collect using our trained human panel.
Fig. 5. ... length, vector distance, and vector projection correspond to the odor prediction tasks of odor detectability, similarity, and descriptor applicability. Equation shows that the projected space Y represents the dot product between POM and a task-specific projection matrix X. (C) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting odor applicability on two extant datasets, Dravnieks (25) and DREAM (6), as well as the current data. (D) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting odor detection threshold using data from Abraham et al, 2011 (26). (E) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting perceptual similarity on Snitz et al, 2013 (4).
We show that the POM has a meaningful interpretation by extracting intuitive, geometric measures and mapping them to several olfactory prediction tasks (Fig. 5B). The applicability of any set of odor descriptors corresponds to a projection of the POM coordinates onto axes corresponding to those descriptors; odor strength (detectability) corresponds to the magnitude of this projection (Fig. S12), and odor similarity corresponds to the distance between such projections for different molecules. We find that a simple linear model applied to POM and using these geometric interpretations has comparable or superior performance to a chemoinformatic support vector machine (SVM) model across multiple published datasets (Fig. 5C, D, E), collectively representing some of the most thorough previous public efforts to characterize these features of odor.
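The three geometric readouts described above (projection, magnitude, and distance) can be illustrated with a small sketch. The embeddings and descriptor axes here are invented three-dimensional examples rather than real POM coordinates, and in practice the projection matrix would be fitted to data rather than chosen by hand; the function names are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(embedding, descriptor_axes):
    """Project an embedding onto descriptor axes (Y = POM . X).

    descriptor_axes: one axis vector per descriptor, i.e. the columns
    of a task-specific projection matrix X.
    """
    return [dot(embedding, axis) for axis in descriptor_axes]


def magnitude(vector):
    """Vector length, used here as a stand-in for detectability."""
    return math.sqrt(dot(vector, vector))


def distance(u, v):
    """Euclidean distance, used here as a stand-in for dissimilarity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 3-dimensional embeddings for two molecules.
pom_a = [3.0, 4.0, 0.0]
pom_b = [0.0, 4.0, 1.0]

# Hypothetical descriptor axes (e.g. "fruity" and "floral").
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

applicability_a = project(pom_a, axes)   # descriptor applicability
applicability_b = project(pom_b, axes)
strength_a = magnitude(applicability_a)  # detectability proxy
dissimilarity = distance(applicability_a, applicability_b)
```

All three quantities derive from the same projected space, which is why a single linear model atop the POM can serve several prediction tasks.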

Discussion
There is no universally accepted method for quantifying and categorizing an odor percept. In other words, olfaction has been a sense without a map. Systems of odor classification have been proposed: first intuitive categorizations (28), then empirically supported universal spaces (29,30), and later attempts to incorporate receptor mechanisms (31,32). However, these systems do not tie stimulus properties to perception, and none have reached broad acceptance. Here we propose and validate a novel, data-driven, high-dimensional map of human olfaction. We have shown that this map recapitulates the structure and relationships of odor perceptual categories evoked by single molecules, that it can be used to achieve prospective predictive accuracy in odor description that exceeds that of the typical individual human, and that it is broadly transferable to arbitrary olfactory perceptual tasks using natural and interpretable transformations. This map represents for odor what the CIE color space represents for vision.
Nearly all published chemosensory models were fit to the data used in their construction. Even using cross-validation, the opportunity for overfitting is high, because the data comes from a single distribution, task, or experimental source. Prospective validation on new data from a new source with no adjustments, as we performed, represents a much more stringent test of real-world utility. In this prospective context, we found that our model performs roughly on par with the median human panelist, beating a chemoinformatic baseline.
However, in a real-world setting, models can and should be updated as new data becomes available. This process is called 'online learning' (27), and is a central capability of many real-world ML systems. Fig. 5C demonstrates that a linear model atop POM reaches an even higher level of performance when the POM is tuned to the new dataset.
The success of this model is not merely an advance in predictive modeling. It offers a simple, intuitive, contiguous, hierarchical, parseable map of molecular space in terms of odor, much as color spaces represent wavelengths of light in terms of colors and color components. It enables human-level performance not only for odor description but also generalizes to a gamut of other olfactory tasks. It offers the opportunity to reason, intuitively and computationally, about the relationships within and between molecular and odor spaces.
There are some practical considerations to keep in mind when using this map. First, the concentration of an odor influences odor character, but is not explicitly included in the map. So while it can predict detection thresholds, a property of the odorant molecule, it cannot predict suprathreshold intensity, a function of the odorant and its concentration. Many molecules have no odor, which we addressed by pre-screening with a separate, simpler model (33), and we diluted odorants to standardize intensity. Second, predictive performance is strong for organic molecules, the vast majority of odorants we encounter, but we could not extend the predictions into halides or molecules that include novel elements due to the lack of safety data for those molecules. Given uniformly strong performance across broad chemical classes tested in our prospective validation set (Fig. 3C), we expect high accuracy on novel chemicals within these chemical classes, but we would not expect high performance for molecules that have chemical motifs not represented in our training set. For instance, if our training dataset did not contain any molecules with carbon macrocycles, we would not expect the model to accurately predict the odor of an unseen macrocyclic musk (Fig. 3A). Third, many chemical stimuli have odorous contaminants (24), particularly those that have not been developed for use in fragrance applications. Neural networks are known to perform well, even with substantial noise in the training and test sets, which we see in the present work. Nonetheless, we recommend isolating the compound of interest from odorous contaminants, and/or characterizing the perceptual quality of contaminants. Finally, datasets in real-world settings are not static, but grow in size and shift in distribution; models should be periodically retrained to incorporate new data. We showed that model performance tends to improve with increased training data (Fig. 3B) and data quality (Fig. S13), consistent with ML applications in other areas (34,35). Indeed, the most important future work (work that will increase the accuracy and resolution of the map and any model that uses it) will be scaling the volume and quality of training data.
Progress in neuroscience is often measured by the creation and discovery of new maps of the world supported by neural circuitry: maps of space in hippocampus, faces in the superior temporal sulcus, tonotopy in auditory cortex, and retinotopy and Gabor filters in V1 visual cortex, among others. Each is only possible because scientists first possessed a map of the external world, and then measured how responses in the brain varied with stimulus position on the map. We have had no such map for odor, but this study proposes and validates a novel data-driven map of human olfaction. We hope this map will be useful to researchers in chemistry, olfactory neuroscience, and psychophysics: first, as a drop-in replacement for chemoinformatic descriptors, and more broadly as a new tool for investigating the nature of olfactory sensation.