Jason B Castro, Arvind Ramanathan, Chakra S Chennubhotla.
Abstract
In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF), a dimensionality reduction technique, to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
Year: 2013 PMID: 24058466 PMCID: PMC3776812 DOI: 10.1371/journal.pone.0073289
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Summary of non-negative matrix factorization (NMF) applied to odor profiling data.
(A) Schematic overview: NMF seeks a lower, $s$-dimensional approximation of a matrix $A$ as the product of matrices $W$ and $H$. $A$ is $M \times N$, consisting in the present study of $M$ odor descriptors $\times$ $N$ odors. A given column of $A$ is the semantic profile of one odor, with each entry providing the percent-used value (see methods) of a given descriptor. Columns of $W$ are basis vectors of the reduced, $s$-dimensional odor descriptor space. Columns of $H$ are $s$-dimensional representations (weights) of the odors in the new basis. (B) Plot of residual error between the perceptual data, $A$, and different NMF-derived approximations, $WH$. For each choice of subspace, data were divided into random training and testing halves, and the residual error between the data and its approximation was computed. One hundred such divisions into training and testing were used to compute the standard errors shown (shaded areas). (C) Reconstruction error (fraction of unexplained variance) for PCA and NMF vs. number of dimensions. The change in reconstruction error for the first interval is indicated by asterisks (*), and corresponds to the first point in the next panel. (D) Change in reconstruction error for PCA and NMF, compared to the change in reconstruction error for PCA performed on a scrambled matrix. The scrambled-matrix result is used to estimate the cutoff number of dimensions for which a given dimensionality reduction method is explaining only noise in a dataset. Note that each point is actually the difference in reconstruction error between two consecutive dimensions (by way of illustration, points with an asterisk in this panel denote the corresponding intervals in the previous panel).
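The factorization described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard Lee-Seung multiplicative updates for the squared-error objective, it calls the data matrix `A`, and it defines "fraction of unexplained variance" as the squared residual divided by the squared deviation of the data from its mean. The function names and the toy matrix are hypothetical.

```python
import random

def transpose(X):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain-Python product of two list-of-lists matrices."""
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def nmf(A, s, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative A (M x N) by W (M x s) times H (s x N).

    Assumption: multiplicative updates for squared error; the source
    does not state which NMF algorithm was used.
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(s)] for _ in range(M)]
    H = [[rng.random() + eps for _ in range(N)] for _ in range(s)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(N)]
             for i in range(s)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(s)]
             for i in range(M)]
    return W, H

def unexplained_variance(A, W, H):
    """Squared residual of A - WH over the squared deviation of A from its mean."""
    R = matmul(W, H)
    flat = [a for row in A for a in row]
    mean = sum(flat) / len(flat)
    resid = sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))
    total = sum((a - mean) ** 2 for a in flat)
    return resid / total

# Toy data (hypothetical): 4 "descriptors" x 5 "odors" built from two parts.
A_toy = [[1, 2, 0, 0, 1],
         [0, 0, 3, 1, 1],
         [1, 2, 0, 0, 1],
         [0, 0, 3, 1, 1]]
for s in (1, 2):
    W_toy, H_toy = nmf(A_toy, s)
    print(s, unexplained_variance(A_toy, W_toy, H_toy))
```

Because the updates only multiply non-negative quantities, `W` and `H` stay non-negative throughout, which is the property that makes the resulting basis vectors readable as additive parts.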
Figure 2. Properties of the perceptual basis set $W$.
(A) Plot of normalized odor descriptor amplitude vs. odor descriptor number for a single basis vector of $W$. Each point along the x-axis corresponds to a single odor descriptor, and the amplitude of each descriptor indicates the descriptor's relevance to the shown perceptual basis vector. Colored circles show the largest points in the basis vector, and descriptors corresponding to these points are listed to the right. (B) Waterfall plot of the 10 basis vectors constituting $W$, used in subsequent analyses. Note that each vector contains many values close to or equal to zero. (C) Detailed view of the first four basis vectors and their leading values. Left column: peak-normalized, rank-ordered basis vectors, illustrating their sparseness and non-negativity. Right column: semantic descriptors characterizing the first four basis vectors. Bars show the first six rank-ordered, peak-normalized components of basis vectors 1 through 4 (subset of data from left column). The semantic label for each component is shown to the left.
10 largest-valued descriptors for each of the 10 basis vectors obtained from non-negative matrix factorization.
| W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FRAGRANT | WOODY, RESINOUS | FRUITY, OTHER THAN CITRUS | SICKENING | CHEMICAL | MINTY, PEPPERMINT | SWEET | POPCORN | SICKENING | LEMON |
| FLORAL | MUSTY, EARTHY, MOLDY | SWEET | PUTRID, FOUL, DECAYED | ETHERISH, ANAESTHETIC | COOL, COOLING | VANILLA | BURNT, SMOKY | GARLIC, ONION | FRUITY, CITRUS |
| PERFUMERY | CEDARWOOD | FRAGRANT | RANCID | MEDICINAL | AROMATIC | FRAGRANT | PEANUT BUTTER | HEAVY | FRAGRANT |
| SWEET | HERBAL, GREEN, CUT GRASS | AROMATIC | SWEATY | DISINFECTANT, CARBOLIC | ANISE (LICORICE) | AROMATIC | NUTTY (WALNUT ETC) | BURNT, SMOKY | ORANGE |
| ROSE | FRAGRANT | LIGHT | SOUR, VINEGAR | SHARP, PUNGENT, ACID | FRAGRANT | CHOCOLATE | OILY, FATTY | SULFIDIC | LIGHT |
| AROMATIC | AROMATIC | PINEAPPLE | SHARP, PUNGENT, ACID | GASOLINE, SOLVENT | MEDICINAL | MALTY | ALMOND | SHARP, PUNGENT, ACID | SWEET |
| LIGHT | LIGHT | CHERRY (BERRY) | FECAL (LIKE MANURE) | PAINT | SPICY | ALMOND | HEAVY | HOUSEHOLD GAS | COOL, COOLING |
| COLOGNE | HEAVY | STRAWBERRY | SOUR MILK | CLEANING FLUID | SWEET | CARAMEL | WARM | PUTRID, FOUL, DECAYED | AROMATIC |
| HERBAL, GREEN, CUT GRASS | SPICY | PERFUMERY | MUSTY, EARTHY, MOLDY | ALCOHOLIC | EUCALYPTUS | LIGHT | MUSTY, EARTHY, MOLDY | SEWER | HERBAL, GREEN, CUT GRASS |
| VIOLETS | BURNT, SMOKY | BANANA | HEAVY | TURPENTINE (PINE OIL) | CAMPHOR | WARM | WOODY, RESINOUS | BURNT RUBBER | SHARP, PUNGENT, ACID |
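A table of this kind can be read off the basis matrix by ranking each column. The sketch below assumes `W` is stored as a list of rows (one row per descriptor, one column per basis vector) and that `labels` holds the descriptor names in row order; the function name and the toy weights are hypothetical and are not values from the study.

```python
def top_descriptors(W, labels, k=10):
    """For each column of W, return the k labels with the largest weights, largest first."""
    n_basis = len(W[0])
    result = []
    for j in range(n_basis):
        # Row indices sorted by their weight in basis vector j, descending.
        order = sorted(range(len(W)), key=lambda i: W[i][j], reverse=True)
        result.append([labels[i] for i in order[:k]])
    return result

# Toy example with made-up weights for four descriptors and two basis vectors.
labels = ["FRAGRANT", "FLORAL", "SICKENING", "PUTRID"]
W_toy = [[0.9, 0.0],
         [0.7, 0.1],
         [0.0, 0.8],
         [0.1, 0.5]]
print(top_descriptors(W_toy, labels, k=2))
```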
Figure 3. NMF on full, descriptor-only, and odor-only shuffled versions of the data.
(A) Peak behavior of histograms obtained from NMF performed on shuffled data, for each of the various shuffling conditions (see text for descriptions). (B) Tail behavior of histograms, same procedure and conditions as in (A); note the difference in scaling of axes between (A) and (B). (C) Waterfall plots of basis sets obtained when NMF was applied to shuffled data, for various shuffling conditions. Note the comparative lack of sparseness, relative to the basis set shown in Fig. 3A. Reproducibility of basis vectors across iterations of NMF for shuffled data sets was eliminated, or severely compromised, as shown in Fig. 4.
Figure 4. Consensus matrices for odor-shuffles, descriptor-shuffles, and full-shuffles.
(A) Consensus matrices (see text) showing reliability of basis sets when NMF is applied to various shuffled versions of the data. Only the original data shows the bimodal distribution of 1s and 0s characteristic of highly reliable clustering. Image ranges and color scale are the same for all 4 matrices. (B) Top: Histograms of consensus matrix values for the three shuffling conditions, and the original data, confirming that only the original data shows a bimodal distribution of 1s and 0s (line colors correspond to labels in (A)). Bottom: Cumulative histograms, same data as above.
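The caption defers the definition of a consensus matrix to the main text. A common construction, assumed here rather than taken from the source, sets entry (i, j) to the fraction of NMF runs in which items i and j land in the same cluster. The sketch below takes the per-run cluster assignments as given; the function name and the toy runs are hypothetical.

```python
def consensus_matrix(assignments):
    """Fraction of runs in which each pair of items shares a cluster.

    assignments: list of runs, each a list with one cluster label per item.
    Labels only need to be consistent within a run, not across runs.
    """
    n_runs = len(assignments)
    n_items = len(assignments[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for run in assignments:
        for i in range(n_items):
            for j in range(n_items):
                if run[i] == run[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]

# Reliable clustering: the same partition every run (labels may be permuted).
stable = consensus_matrix([[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
# Unreliable clustering: the partition changes from run to run.
unstable = consensus_matrix([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]])
print(stable[0])
print(unstable[0])
```

Under this definition a reproducible clustering yields only 0s and 1s, while an unstable one fills the matrix with intermediate values, which is the contrast the histograms in this figure summarize.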
Figure 5. Approximate orthogonality of the NMF basis vectors.
(A) Histogram of angles subtended by all pairs of basis vectors. Histogram was constructed for all pairwise comparisons between dimensions, excluding self-comparisons. Bar with (*) denotes self-comparisons. (B) Matrix of pairwise comparisons of angles between dimensions.
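The pairwise angles summarized in this figure follow from the usual cosine formula. The sketch below is illustrative only: it assumes the basis vectors are the columns of a matrix stored as a list of rows, and the function names and toy matrix are hypothetical.

```python
import math
from itertools import combinations

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

def pairwise_angles(W):
    """Angles between all distinct pairs of columns of W (self-comparisons excluded)."""
    cols = list(zip(*W))
    return {(i, j): angle_deg(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}

# Toy basis with three 2-dimensional columns: (1, 0), (0, 1), (1, 1).
W_toy = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
print(pairwise_angles(W_toy))
```

Since NMF basis vectors have no negative entries, every dot product is non-negative and every angle lies between 0 and 90 degrees; angles near 90 degrees indicate vectors that share few descriptors.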
Figure 6. Visualization of odors expressed in coordinates of the new basis.
(A) The weight matrix, $H$, discovered by NMF. Columns of $H$ (each column corresponds to a different odor) are normalized and sorted into groups defined by peak coordinate (1–10). (B) Plot of all 144 odors (each point is a column of $H$) in the space spanned by the first 3 basis vectors, $W_1$, $W_2$, and $W_3$. Black, red, and blue points are those with peak coordinates in dimensions 1, 2, and 3 respectively. Gray points are all remaining odors. (C) Chemical structures of representative odorants from the second and seventh diagonal blocks of the sorted matrix (panel A).
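Grouping odors by peak coordinate, as described for the sorted weight matrix, amounts to finding the largest entry in each column and ordering the columns by that index. The sketch below is a hedged illustration: the caption does not say how columns were normalized, so division by the column maximum is an assumption, and the function name and toy weights are hypothetical.

```python
def sort_by_peak(H):
    """Group the columns of H (s x N, list of rows) by the index of their largest entry.

    Returns (order, peaks, columns): the original column indices in sorted
    order, the peak coordinate of each sorted column, and the sorted columns
    after peak normalization (assumed normalization: divide by column max).
    """
    s, n = len(H), len(H[0])
    cols = [[H[i][j] for i in range(s)] for j in range(n)]
    peaks = [c.index(max(c)) for c in cols]
    # sorted() is stable, so columns sharing a peak keep their original order.
    order = sorted(range(n), key=lambda j: peaks[j])
    normed = []
    for j in order:
        top = max(cols[j])
        normed.append([x / top if top > 0 else 0.0 for x in cols[j]])
    return order, [peaks[j] for j in order], normed

# Toy weights: 2 dimensions x 4 odors (made-up values).
H_toy = [[0.2, 0.9, 0.1, 0.6],
         [0.8, 0.1, 0.5, 0.3]]
order, peaks, normed = sort_by_peak(H_toy)
print(order, peaks)
```

Plotting the sorted, normalized columns side by side produces a block-diagonal picture whenever most odors are dominated by a single dimension.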
Figure 7. Two-dimensional embedding of the descriptor-space, $W$.
Results of stochastic neighbor embedding (see text) applied to the similarity matrix for $W$. Axis units are arbitrary, but preserve neighbor relations present in the higher dimensional space. Note that discrete clusters are clearly evident. Clusters were identified by eye, and descriptors composing each cluster are listed in the table below.
Figure 8. Two-dimensional embedding of the odorant-space, $H$.
Results of stochastic neighbor embedding (see text) applied to the similarity matrix for $H$. As in Figure 7, axis units are arbitrary, but preserve neighbor relationships observed in the full-dimensional space. Clusters were identified by eye, and odorants composing each cluster are listed in the table below.
Figure 9. Co-clustering of descriptors and odors.
(A) Overview of method used for defining a bicluster (see text for definition). A column of $W$ (descriptors) and the corresponding row of $H$ (odors) are rank ordered. The indices derived from the rank-ordering are used to re-order rows and columns of the data matrix $A$ (accomplished by computing the outer product between the rank-ordered column of $W$ and rank-ordered row of $H$), producing a submatrix with high correlation among both odors and descriptors. By the nature of the sorting procedure, these matrices (biclusters) will have their largest values in the upper-left corner. For purposes of visualization, biclusters were convolved with an averaging filter. (B) The 10 biclusters defined by NMF on odor perceptual data.
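The reordering step in this caption can be sketched directly with index sorting. This is an illustration under stated assumptions, not the authors' code: it reorders a data matrix (called `A` here) by the descending ranks of one column of `W` and the matching row of `H`, and it omits the averaging filter used for display. The function name and toy values are hypothetical.

```python
def bicluster(A, W, H, k):
    """Reorder A by the ranks of column k of W (rows) and row k of H (columns).

    A is M x N, W is M x s, H is s x N, all as lists of rows.
    Returns the reordered matrix plus the row and column orders used.
    """
    row_order = sorted(range(len(W)), key=lambda i: W[i][k], reverse=True)
    col_order = sorted(range(len(H[k])), key=lambda j: H[k][j], reverse=True)
    reordered = [[A[i][j] for j in col_order] for i in row_order]
    return reordered, row_order, col_order

# Toy rank-1 example: A is the outer product of one W column and one H row.
W_toy = [[0.1], [0.9], [0.5]]
H_toy = [[0.2, 0.7, 0.4]]
A_toy = [[w[0] * h for h in H_toy[0]] for w in W_toy]
B, rows, cols = bicluster(A_toy, W_toy, H_toy, k=0)
print(rows, cols)
```

With descending sorts on both axes, the descriptors and odors that load most heavily on dimension `k` end up together in the upper-left corner of the reordered matrix.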
List of compounds in every cluster identified from NMF.
| Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
| --- | --- | --- | --- | --- |
| 1. Isoamylphenylacetate; 2. Aurantiol; 3. 6,7-dihydro-1,1,2,3,3-pentamethyl-4-(5H)indanone; 4. Indol-hydroxycitronellal; 5. beta-ionone (low concentration); 6. beta-ionone (high concentration); 7. N'-[(E)-3-(5-methoxy-2,3-dihydro-1,4-benzodioxin-7-yl) prop-2-enoyl]-2,3-dihydro-1,4-benzodioxine-3-carbohydrazide; 8. hydroxyisohexyl 3-cyclohexene carboxaldehyde; 9. 2-methoxynaphthalene; 10. Diethoxymethane; 11. Galaxolide; 12. ethylenebrassylate; 13. Phenylethyl Alcohol (low concentration); 14. Phenylethyl Alcohol (high concentration) | 15. Cedrene epoxide; 16. bornyl acetate; 17. 8-sec-Butylquinoline; 18. 2,4,6-trimethylcyclohex-3-ene-1-carbaldehyde; 19. decalin; 20. dibutylamine; 21. Synthetic amber; 22. 1,1-Dimethoxy-2-phenylpropane; 23. Methyl isonicotinate; 24. Nootkatone; 25. 1-octen-3-ol; 26. isophorone (low concentration); 27. isophorone (high concentration); 28. Isopropyl quinolone; 29. Argeol; 30. Gamma-undecalactone; 31. 10-undecenoic acid | 32. ethylmethylphenylglycidate (low concentration); 33. ethylmethylphenylglycidate (high concentration); 34. allylcaproate; 35. isoamyl acetate; 36. n-amyl butyrate; 37. Dmbc butyrate; 38. ethyl butyrate; 39. ethyl propionate; 40. Fructone; 41. methylanthranilate; 42. Pentylvalerate | 43. Butyric Acid; 44. hexanoic acid; 45. indole; 46. methylthiolbutyrate; 47. n-pentanoic acid; 48. 4-pentenoic acid; 49.50. . phenylacetic acid; 51. Propyl butyrate; 52. Skatole (3-Methyl-1H-indole); 53. Isovalerylaldehyde; 54. isovaleric acid | 55. Acetophenone; 56. Anisole; 57. 1-Butanol; 58. 4-cresol; 59. p-Tolylisobutyrate; 60. 4-methyl anisole; 61. cyclohexanol; 62. 2,5-dimethylpyrazine; 63. methyl hexyl ether; 64. 1-hexanol; 65. 3-hexanol; 66. iodoform; 67. methyl furan-3-carboxylate; 68. 4-methylquinoline; 69. phenylacetylene; 70. alpha-terpineol; 71. 6-methyl-1,2,3,4-tetrahydroquinoline; 72. Thymol; 73. Toluene; 74. 3-Methyl-1H-indole |