Ritesh Kumar¹, Rishemjit Kaur², Benjamin Auffarth³, Amol P Bhondekar¹.
Abstract
Odours are highly complex, relying on hundreds of receptors, and people are known to disagree in their linguistic descriptions of smells. Partly because of these facts, it is very hard to map the domain of odour molecules or their structure to that of perceptual representations, a problem referred to as the Structure-Odour Relationship. We collected several diverse open-domain databases of odour molecules with unorganised perceptual descriptors, and developed an intuitive graph-based method for measuring the similarity between perceptual descriptors, which can be used to identify perceptual classes. We then separately projected the physico-chemical and perceptual features of these molecules into a lower-dimensional space using a non-linear projection, and clustered similar molecules. We found a significant overlap between the spatial positioning of the clustered molecules in the physico-chemical and perceptual spaces. We also developed a statistical method for predicting the perceptual qualities of a novel molecule from its physico-chemical properties, achieving high receiver operating characteristic (ROC) values.
Year: 2015 PMID: 26484763 PMCID: PMC4615634 DOI: 10.1371/journal.pone.0141263
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Database Characteristics.
| Dataset | No. of molecules | No. of perceptual descriptors | Avg. no. of perceptual descriptors per molecule | Avg. occurrence of a perceptual descriptor | Sparseness (%) |
|---|---|---|---|---|---|
|  | 537 | 177 | 1.72 (1–5) | 5.226 (1–58) | 99.03 |
|  | 2933 | 456 | 2.60 (1–10) | 16.72 (1–689) | 99.43 |
|  | 239 | 157 | 3.23 (1–4) | 4.929 (1–97) | 97.94 |
|  | 815 | 107 | 3.44 (1–21) | 26.20 (1–196) | 96.78 |
|  | 196 | 95 | 3.48 (1–19) | 7.18 (1–80) | 96.33 |
|  | 3017 | 526 | 3.51 (1–23) | 20.11 (1–830) | 99.33 |
Values in parentheses give the range (minimum–maximum) of the corresponding column.
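The sparseness column is consistent with the percentage of empty cells in a binary molecule-by-descriptor matrix, 100 × (1 − avg. descriptors per molecule / no. of descriptors); for the first row, 100 × (1 − 1.72/177) ≈ 99.03. Likewise, the average occurrence of a descriptor equals the total number of molecule-descriptor annotations divided by the number of descriptors. A minimal sketch of these statistics, assuming each database is a mapping from molecule ID to a list of descriptors (the input format, the function name `database_stats` and the toy data are hypothetical):

```python
from statistics import mean


def database_stats(annotations):
    """Summary statistics for one odour database.

    `annotations` maps a molecule ID to its list of perceptual descriptors
    (a hypothetical input format, not the authors' file layout).
    """
    sets = [set(ds) for ds in annotations.values()]
    descriptors = set().union(*sets)
    per_molecule = [len(s) for s in sets]
    # Number of molecules annotated with each descriptor.
    occurrence = [sum(d in s for s in sets) for d in descriptors]
    n_mol, n_desc = len(sets), len(descriptors)
    filled = sum(per_molecule)  # non-zero cells of the binary matrix
    return {
        "molecules": n_mol,
        "descriptors": n_desc,
        "avg_per_molecule": mean(per_molecule),
        "range_per_molecule": (min(per_molecule), max(per_molecule)),
        "avg_occurrence": mean(occurrence),
        "range_occurrence": (min(occurrence), max(occurrence)),
        # Percentage of empty cells in the molecule x descriptor matrix.
        "sparseness_pct": 100.0 * (1 - filled / (n_mol * n_desc)),
    }


toy = {
    "m1": ["fruit", "sweet"],
    "m2": ["fruit"],
    "m3": ["wood", "sweet", "green"],
}
print(database_stats(toy))
```

Duplicate descriptors on one molecule are counted once, so each molecule contributes at most one filled cell per descriptor.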
Fig 1. Database characteristics.
Number of perceptual descriptors versus number of molecules for each database.
Network characteristics of the odour networks and comparison with a random network. A = average degree, Nd = network diameter, Nl = average path length, Dg = graph density, α = power-law exponent, Xmin = power-law cutoff degree, r = assortativity coefficient, clavg = clustering coefficient and R-cl = random clustering coefficient.
| Database | No. of nodes | No. of weighted edges | A | Nd | Nl | Dg | α | Xmin | r | clavg |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 177 | 508 | 5.74 | 6 | 2.98 | 0.024 | 2.41 | 7 | -0.13 | 0.34 |
|  | 456 | 11057 | 48.50 | 5 | 2.33 | 0.041 | 1.92 | 26 | -0.17 | 0.74 |
|  | 157 | 1012 | 12.90 | 6 | 2.49 | 0.05 | 2.25 | 13 | -0.18 | 0.73 |
|  | 107 | 6655 | 124.40 | 4 | 1.73 | 0.329 | 3.46 | 183 | -0.14 | 0.84 |
|  | 95 | 1850 | 38.95 | 4 | 2.07 | 0.167 | 1.89 | 16 | -0.18 | 0.73 |
|  | 526 | 25805 | 98.12 | 4 | 2.21 | 0.054 | 1.69 | 28 | -0.20 | 0.80 |
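The average degree column is consistent with A = 2 × (weighted-edge count) / (node count); for the first network, 2 × 508/177 ≈ 5.74. The captions do not say how edges are defined or weighted, so the sketch below assumes, hypothetically, that two descriptors are linked when they annotate the same molecule and that the weight counts such molecules. Density is computed on distinct (unweighted) edges, and the clustering coefficient is the unweighted local coefficient averaged over all nodes. Descriptors that never co-occur with another descriptor do not appear as nodes. Function names and toy data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(annotations):
    """Hypothetical construction: link two descriptors that annotate the same
    molecule; the edge weight counts how many molecules they share."""
    weights = defaultdict(int)
    for ds in annotations.values():
        for a, b in combinations(sorted(set(ds)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def network_summary(weights):
    """Node count, total edge weight, average weighted degree, density of the
    unweighted graph and average (unweighted) clustering coefficient."""
    adj = defaultdict(set)
    for a, b in weights:
        adj[a].add(b)
        adj[b].add(a)
    n = len(adj)
    total_w = sum(weights.values())

    def local_clustering(v):
        k = len(adj[v])
        if k < 2:
            return 0.0
        # Count links among the neighbours of v.
        links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
        return 2.0 * links / (k * (k - 1))

    return {
        "nodes": n,
        "weighted_edges": total_w,
        "avg_degree": 2.0 * total_w / n,
        "density": 2.0 * len(weights) / (n * (n - 1)),
        "avg_clustering": sum(local_clustering(v) for v in adj) / n,
    }


toy = {
    "m1": ["fruit", "sweet"],
    "m2": ["fruit", "sweet", "green"],
    "m3": ["wood", "green"],
}
print(network_summary(cooccurrence_network(toy)))
```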
Fig 2. Degree distribution.
Degree distributions of the networks on a log-log plot, with the fitted truncated power law. For all networks except SuperScent, the probability that a given perceptual descriptor connects to $k$ other perceptual descriptors follows a power law, $p(k) \propto k^{-\alpha}$, with $\alpha \in [2, 3]$ or $\alpha \approx 2$.
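The exponent α can be estimated by maximum likelihood from the degrees at or above the cutoff Xmin. A widely used approximation for discrete data (Clauset, Shalizi and Newman) is α ≈ 1 + n / Σᵢ ln(kᵢ / (Xmin − 1/2)), summing over the n degrees kᵢ ≥ Xmin. This is a sketch of that plain power-law estimator only; the caption says a truncated power law was fitted, and the authors' exact procedure is not described here. The degree sequence is made up.

```python
import math


def powerlaw_alpha(degrees, xmin):
    """Approximate maximum-likelihood exponent of a discrete power law
    p(k) ~ k**(-alpha) fitted to the tail k >= xmin (continuous approximation
    with the 1/2 offset). Requires xmin >= 1."""
    if xmin < 1:
        raise ValueError("xmin must be at least 1")
    tail = [k for k in degrees if k >= xmin]
    if not tail:
        raise ValueError("no degrees at or above xmin")
    log_sum = sum(math.log(k / (xmin - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Made-up degree sequence, not data from the paper.
degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
print(powerlaw_alpha(degrees, xmin=2))
```

In practice Xmin is itself chosen (for example by minimizing a goodness-of-fit distance), which this sketch does not attempt.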
Hubs in the networks. Number of occurrences is the number of networks in which a descriptor appears as a hub.
| Number of occurrences | Perceptual descriptors |
|---|---|
| 6 | Fruit, Floral, Wood, Herb |
| 5 | Sweet, Fat, Green |
| 3 | Nut, Citrus |
| 2 | Pungent, Meat, Vegetable |
| 1 | Balsam, Sulfur, Wax, Earth, Ether, Pineapple, Spice, Apple, Chocolate |
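The source does not define how hubs were selected. As a hypothetical illustration only, one could take the `k` descriptors with the largest weighted degree in each network and count how many networks list each descriptor as a hub; the value of `k`, the edge-weight input format and the toy networks are assumptions.

```python
from collections import Counter, defaultdict


def top_hubs(weights, k):
    """Return the k descriptors with the largest weighted degree (strength).
    Ties are broken alphabetically so the result is deterministic."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    ranked = sorted(strength, key=lambda d: (-strength[d], d))
    return ranked[:k]


def hub_occurrences(networks, k):
    """Count, over several networks, how often each descriptor is a hub."""
    return Counter(d for w in networks for d in top_hubs(w, k))


net_a = {("fruit", "sweet"): 3, ("fruit", "floral"): 2, ("wood", "green"): 1}
net_b = {("fruit", "wood"): 4, ("wood", "herb"): 2, ("sweet", "fat"): 1}
print(hub_occurrences([net_a, net_b], k=2))
```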
Semantic analysis and comparison of the odour network using the Brown corpus. The semantic networks were created with a bag-of-words approach using window sizes of 2, 3 and 4, chosen according to the average number of perceptual descriptors per molecule in each database. The odour subnetworks contain only those perceptual descriptors that were also found in the semantic database. The table lists the network characteristics, together with the eigenvalue similarity of the perceptual network compared with a random network, where N = number of nodes, WE = number of weighted edges, A = average degree, Nd = network diameter, Nl = average path length, Dg = graph density, r = assortativity coefficient and clavg = clustering coefficient.
| Database | N | WE (semantic) | A (semantic) | Nd (semantic) | Nl (semantic) | Dg (semantic) | r (semantic) | clavg (semantic) | WE (odour) | A (odour) | Nd (odour) | Nl (odour) | Dg (odour) | r (odour) | clavg (odour) | Eigenvalue similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 98 | 70 | 1.43 | 10 | 4.21 | 0.009 | 0.272 | 0.013 | 240 | 4.89 | 6 | 2.89 | 0.037 | -0.158 | 0.219 | 3.42e+03 |
|  | 204 | 389 | 3.81 | 12 | 4.71 | 0.007 | 0.188 | 0.03 | 5006 | 49.07 | 5 | 2.21 | 0.078 | -0.168 | 0.674 | 3.12e+06 |
|  | 80 | 118 | 2.95 | 7 | 3.55 | 0.015 | 0.189 | 0.018 | 400 | 10 | 7 | 2.49 | 0.079 | -0.166 | 0.527 | 1.39e+04 |
|  | 75 | 76 | 2.03 | 9 | 4.02 | 0.013 | 0.189 | 0 | 4388 | 117.01 | 3 | 1.57 | 0.429 | -0.116 | 0.811 | 1.50e+06 |
|  | 72 | 73 | 2.03 | 12 | 5.24 | 0.014 | 0.079 | 0 | 1350 | 37.50 | 4 | 2.01 | 0.201 | -0.167 | 0.736 | 2.77e+05 |
|  | 221 | 218 | 1.97 | 12 | 4.64 | 0.007 | 0.125 | 0.035 | 13125 | 118.78 | 5 | 2.08 | 0.12 | -0.210 | 0.764 | 1.80e+07 |
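A sketch of the two ideas in the caption, under stated assumptions. First, a sliding-window bag-of-words network: each token is linked to the following `window - 1` tokens, and the edge weight counts how often the pair co-occurs that closely. Second, an eigenvalue similarity taken here as the sum of squared differences between the `k` largest eigenvalues of the weighted graph Laplacians of two networks (0 for identical spectra, larger for less similar graphs). The source does not specify the windowing details, the matrix whose spectrum was compared, or how `k` was chosen; the token list is a toy stand-in for corpus text, and the function names are hypothetical. Uses NumPy.

```python
from collections import defaultdict

import numpy as np


def window_network(tokens, window):
    """Bag-of-words co-occurrence graph: each token is linked to the next
    `window - 1` tokens; the weight counts co-occurrences of the unordered pair."""
    weights = defaultdict(int)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                weights[tuple(sorted((a, b)))] += 1
    return dict(weights)


def laplacian_spectrum(weights):
    """Eigenvalues of the weighted graph Laplacian L = D - W, largest first."""
    nodes = sorted({n for edge in weights for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for (a, b), w in weights.items():
        W[index[a], index[b]] = W[index[b], index[a]] = w
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigvalsh(L)[::-1]  # eigvalsh returns ascending order


def eigenvalue_similarity(w1, w2, k):
    """Sum of squared differences between the k largest Laplacian eigenvalues
    of two graphs; 0 means identical spectra, larger means less similar."""
    s1, s2 = laplacian_spectrum(w1), laplacian_spectrum(w2)
    k = min(k, len(s1), len(s2))
    return float(np.sum((s1[:k] - s2[:k]) ** 2))


# Toy tokens standing in for corpus text.
tokens = "fruit sweet floral fruit wood green".split()
g2, g3 = window_network(tokens, 2), window_network(tokens, 3)
print(eigenvalue_similarity(g2, g3, k=3))
```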
Fig 3. Odour network.
(a–g) Communities detected in the odour networks of the databases using a modularity-maximization algorithm. Colours indicate different communities.
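Modularity maximization searches over partitions of the network for the one with the highest modularity Q. For an undirected weighted graph with total edge weight m, Q can be written as Σ_c [L_c/m − (S_c/2m)²], where L_c is the edge weight inside community c and S_c is the summed strength of its nodes. The caption does not name the specific optimiser (the Louvain method is one common choice); this sketch only scores a given partition, and the graph below is a toy example.

```python
from collections import defaultdict


def modularity(weights, community):
    """Newman modularity of a partition of an undirected weighted graph.

    weights: {(u, v): w} with each undirected edge listed once (no self-loops).
    community: {node: community_label}.
    Q = sum_c [ L_c / m - (S_c / (2m))**2 ], where L_c is the edge weight inside
    community c, S_c the total strength of its nodes and m the total weight.
    """
    m = float(sum(weights.values()))
    inside = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in weights.items():
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Two triangles joined by a single bridge edge (c-d).
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, split))
```

Splitting at the bridge scores higher than putting every node in one community, which always gives Q = 0.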
The number of clusters obtained and the corresponding Hubert Index of the clusters.
| Database | No. of clusters (perceptual descriptors) | No. of clusters (structural) | Hubert index (HI) |
|---|---|---|---|
|  | 12 | 9 | 0.518 |
|  | 17 | 17 | 0.750 |
|  | 17 | 17 | 0.718 |
|  | 15 | 17 | 0.723 |
|  | 12 | 17 | 0.656 |
|  | 17 | 17 | 0.733 |
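The source does not state which form of the Hubert index was computed. One common form, the normalized Hubert Γ statistic, is the Pearson correlation between two pairwise matrices. Applied to two clusterings of the same molecules (for instance perceptual and structural), each entry is 1 when a pair of molecules shares a cluster and 0 otherwise. A stdlib sketch under that assumption, with hypothetical function names and made-up labels:

```python
import math
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a clustering puts all pairs in the same state")
    return sxy / math.sqrt(sxx * syy)


def normalized_hubert_gamma(labels_a, labels_b):
    """Normalized Hubert Gamma between two clusterings of the same items:
    Pearson correlation of their pairwise co-membership indicators
    (1 if two items share a cluster, else 0)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    x = [float(labels_a[i] == labels_a[j]) for i, j in pairs]
    y = [float(labels_b[i] == labels_b[j]) for i, j in pairs]
    return _pearson(x, y)


# Hypothetical cluster labels for six molecules in the two spaces.
perceptual = [0, 0, 1, 1, 2, 2]
structural = [0, 0, 1, 1, 1, 2]
print(normalized_hubert_gamma(perceptual, structural))
```

Because only co-membership is compared, the statistic ignores how clusters are labelled, and it can be computed even when the two clusterings have different numbers of clusters.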
The classification rate and ROC values with and without feature selection.
| Database | ROC without feature selection | ROC with feature selection | No. of features |
|---|---|---|---|
|  | 0.787 | 0.798 | 49 |
|  | 0.733 | 0.623 | 3 |
|  | 0.83 | 0.829 | 87 |
|  | 0.781 | 0.787 | 50 |
|  | 0.722 | 0.688 | 9 |
|  | 0.809 | 0.817 | 61 |
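The table gives a single ROC value per database, most plausibly the area under the ROC curve (AUC); the source does not say how it was aggregated over perceptual descriptors. For one binary perceptual label, the AUC equals the probability that a randomly chosen positive molecule is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly; the scores and labels are illustrative, not model outputs from the paper.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: fraction of (positive, negative)
    pairs in which the positive example is scored higher; ties count 1/2.
    O(P * N), which is fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Illustrative predicted probabilities for one descriptor, e.g. "fruit".
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to a perfect ranking of positives above negatives.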