Emily J Mayhew, Charles J Arayata, Richard C Gerkin, Brian K Lee, Jonathan M Magill, Lindsey L Snyder, Kelsie A Little, Chung Wen Yu, Joel D Mainland.
Abstract
In studies of vision and audition, stimuli can be chosen to span the visible or audible spectrum; in olfaction, the axes and boundaries defining the analogous odorous space are unknown. As a result, the population of olfactory space is likewise unknown, and anecdotal estimates of 10,000 odorants have endured. The journey a molecule must take to reach olfactory receptors (ORs) and produce an odor percept suggests some chemical criteria for odorants: a molecule must 1) be volatile enough to enter the air phase, 2) be nonvolatile and hydrophilic enough to sorb into the mucous layer coating the olfactory epithelium, 3) be hydrophobic enough to enter an OR binding pocket, and 4) activate at least one OR. Here, we develop a simple and interpretable quantitative model that reliably predicts whether a molecule is odorous or odorless based solely on the first three criteria. Applying our model to a database of all possible small organic molecules, we estimate that at least 40 billion possible compounds are odorous, six orders of magnitude larger than current estimates of 10,000. With this model in hand, we can define the boundaries of olfactory space in terms of molecular volatility and hydrophobicity, enabling representative sampling of olfactory stimulus space.
Keywords: machine learning; odor space; olfaction; physical transport
Year: 2022 PMID: 35377807 PMCID: PMC9169660 DOI: 10.1073/pnas.2116576119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. A model can accurately classify molecules as odorous or odorless based only on transport features. (A) Schematic of the transport process that molecules must complete to act as olfactory stimuli. To elicit an odor, molecules must reach the olfactory epithelium (OE), adsorb into the olfactory mucosa, enter OR binding pockets, and trigger OR neuron (ORN) activation. (B) Transport-feature ML model-generated odorous probabilities for all molecules in the dataset. Each dot represents one molecule colored by the ground truth, and the width of the violin plot is the density of molecules at a given prediction value. (C) Odorous and odorless molecules in transport space. LogP and log(vapor pressure [mmHg]) are plotted for each molecule in the dataset; odorous molecules are represented by circles, and odorless are represented by crosses; molecules are colored by transport-feature ML model-generated odorous probabilities. An LR-generated 50% odorous probability boundary for solids or liquids (Eq. ) is plotted as a solid line, and the boundary for gases (Eq. ) is plotted as a dashed line; increasing the value of any feature by X increases the log odds of odorousness by kX, where k is the corresponding model coefficient. (D) Density of odorous and odorless molecules in transport space defined by molecular weight and number of heteroatoms. Each successive contour line indicates a step increase in density (odorous, red = 0.05%; odorless, blue = 0.01%). Each molecule has an integer number of heteroatoms, but these values are jittered along the y axis to better show density. Plotted within the black box, molecules that obey the rule of three are generally odorous. (E) Heat map of mean AUROC generated by the transport ML, many-feature ML, and the rule of three models for molecules of common chemical classes (number of matching molecules in parentheses).
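The log-odds statement in panel C of Fig. 1 can be illustrated with a minimal sketch of a logistic-regression (LR) classifier over two transport features. The feature names, coefficients, and intercept below are hypothetical placeholders chosen for illustration; they are not the fitted values from the paper, and the equations for the solid/liquid and gas boundaries are not reproduced here.

```python
import math

# Hypothetical coefficients and intercept, for illustration only;
# these are NOT the fitted values reported in the paper.
COEFS = {"log_vapor_pressure": 0.8, "log_p": 0.5}
INTERCEPT = 1.0


def log_odds(features):
    """Linear predictor of a logistic regression: intercept + sum(k * x)."""
    return INTERCEPT + sum(COEFS[name] * value for name, value in features.items())


def odorous_probability(features):
    """Logistic (sigmoid) transform of the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(features)))


# Raising one feature by X changes the log odds by k * X.
base = {"log_vapor_pressure": -2.0, "log_p": 3.0}
shift = 1.5
shifted = dict(base, log_vapor_pressure=base["log_vapor_pressure"] + shift)
delta = log_odds(shifted) - log_odds(base)  # equals 0.8 * 1.5 up to rounding

# The 50% boundary is the set of points where the log odds are zero.
on_boundary = {"log_vapor_pressure": -2.5, "log_p": 2.0}
boundary_probability = odorous_probability(on_boundary)
```

Under these assumed coefficients, the 50% boundary is the straight line in (log vapor pressure, log P) space where the linear predictor equals zero, which is why such a boundary can be drawn as a line in a two-feature plot.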
Fig. 2. Common inaccuracies in data impact model performance. (A) Difference between experimentally determined BP values and BP values calculated using the Burnop (9) and Banks (8) methods. (B and C) Odor classification predictions by transport-feature ML models using BP values calculated by the (B) Burnop or (C) Banks method. (D) Human subject-classified molecules in transport space defined by BP and log P. Many clearly nonvolatile molecules were initially classified as odors due to odorous contaminants. (E) Transport-feature ML model odor predictions for human subject-classified molecules. Chemical compounds that are odorless but had odorous contaminants are correctly predicted to be odorless by the model.
Fig. 3. The transport model can be used to predict the population of odor space. (A) Proportion of molecules predicted by the transport ML model to be odorous as a function of HAC. Red circles show the mean probability generated for HAC tranches from the GDB database (13) with SE indicated. (B) Estimated number of possible molecules and predicted odorous molecules from the GDB databases as a function of HAC. (C) Cumulative estimates of possible molecules and odorous molecules with increasing HAC on a logarithmic scale. The red data point at HAC 17 reflects our conservative estimate of 40 billion odorous molecules.
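One plausible reading of how the per-HAC and cumulative estimates in Fig. 3 relate is sketched below: for each HAC tranche, the number of possible molecules is multiplied by the mean predicted odorous probability, and the products are accumulated. This is an assumption about the bookkeeping, not the authors' code; the function name `cumulative_odorous` and the tranche values are hypothetical placeholders, not GDB counts or model outputs.

```python
def cumulative_odorous(tranches):
    """Accumulate possible and predicted-odorous counts over HAC tranches.

    tranches: iterable of (hac, n_possible, mean_p_odorous) tuples.
    Returns a list of (hac, cumulative_possible, cumulative_odorous).
    """
    total_possible = 0.0
    total_odorous = 0.0
    rows = []
    for hac, n_possible, mean_p_odorous in sorted(tranches):
        total_possible += n_possible
        # Expected odorous count in this tranche: count times mean probability.
        total_odorous += n_possible * mean_p_odorous
        rows.append((hac, total_possible, total_odorous))
    return rows


# Placeholder values for illustration only; not taken from GDB or the paper.
example_tranches = [(1, 10, 1.0), (2, 100, 0.75), (3, 1000, 0.5)]
example_rows = cumulative_odorous(example_tranches)
```

Because the number of possible molecules grows steeply with HAC, the largest tranches dominate such a cumulative total even when the odorous proportion falls with molecular size.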
Fig. 4. Visualization of olfactory space highlights understudied regions. (A) UMAP plot of known odorous molecules (green) and possible molecules from GDB-17 colored by their transport ML-predicted odorous probability. Many regions dense with probable odors are sparsely represented by known odors. (B) Eugenol, a known odorant. (C–E) Example molecules from GDB-17 and their transport ML-predicted probability of being odorous (p).