Rinu Chacko¹, Deepak Jain², Manasi Patwardhan¹, Abhishek Puri¹, Shirish Karande¹, Beena Rai¹
Abstract
Machine learning and data analytics are increasingly used for quantitative structure-property relationship (QSPR) applications in the chemical domain, where the traditional Edisonian approach to knowledge discovery has not been fruitful. The perception of odorant stimuli is one such application, as olfaction is the least understood of all the senses. In this study, we employ machine learning algorithms and data analytics to assess the efficacy of a data-driven approach for predicting the perceptual attributes of an odorant, namely the odor characters (OC) "sweet" and "musky". We first analyze a psychophysical dataset containing the perceptual ratings of 55 subjects to reveal patterns in the ratings given by the subjects. We then use the data to train several machine learning algorithms, such as random forest, gradient boosting and support vector machine, to predict the odor characters, and we report the structural features that correlate well with the odor characters based on the optimal model. Furthermore, we analyze the impact of data quality on model performance by comparing the semantic descriptors generally associated with a given odorant to its perception by the majority of the subjects. The study presents a methodology for developing models of odor perception and provides insights into the perception of odorants by untrained human subjects and into the effect of the inherent bias in the perception data on model performance. The models and methodology developed here could be used to predict the odor characters of new odorants.
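The abstract compares literature descriptors with how the majority of subjects perceived each odorant. A minimal sketch of one way to derive such a majority label is shown below. It assumes numeric per-subject ratings for a single odor character, and the function name `majority_label` and both cut-offs (`min_rating`, `majority`) are hypothetical choices, not the authors' labelling rule.

```python
def majority_label(ratings, min_rating=0, majority=0.5):
    """Hypothetical rule: label an odorant positive for one odor character
    (e.g. "sweet") when more than `majority` of the subjects who rated it
    gave that character a rating above `min_rating`."""
    # Subjects with no rating for this odorant (None) are ignored.
    valid = [r for r in ratings if r is not None]
    if not valid:
        return False
    # Fraction of rating subjects who perceived the character at all.
    share = sum(r > min_rating for r in valid) / len(valid)
    return share > majority


# Toy ratings, invented for illustration: 3 of 4 valid subjects rate above 0.
print(majority_label([0, 35, 60, None, 12]))  # True
```

Because the comparison is a strict `>`, an exact 50/50 split yields a negative label; whether ties should count is a design choice this sketch does not settle.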
Year: 2020 PMID: 33051564 PMCID: PMC7553929 DOI: 10.1038/s41598-020-73978-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Visualization of the data based on the functional groups present (grouped by compound), colored by the average familiarity perceived by the subjects and sized by the average pleasantness of the odor among the subjects. (b) Visualization of the ratings within each group (screenshot of group 8, aliphatic esters) showing the pattern of perceptual ratings in terms of familiarity, pleasantness and intensity. The bubble size represents the number of samples for each combination. (Groups are as follows, 1: aliphatics, 2: aromatics, 3: cyclic compounds, 4: aliphatic acids, 5: aromatic acids, 6: aliphatic alcohols, 7: aromatic alcohols, 8: aliphatic esters, 9: aromatic esters, 10: aliphatic aldehydes, 11: aromatic aldehydes, 12: aliphatic ketones, 13: aromatic ketones, 14: organosulphur compounds, 15: nitrogen-containing compounds, 16: others.) Interactive charts for these figures, obtained using TCS Vitellus, are provided as additional information.
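Figure 1(a) colors each compound by its average familiarity and sizes it by its average pleasantness. The aggregation behind such a chart can be sketched as follows; the record layout `(group, compound, familiarity, pleasantness)`, the function name and the toy numbers are assumptions for illustration, not values from the dataset.

```python
from collections import defaultdict
from statistics import mean


def compound_summary(records):
    """Average familiarity and pleasantness per (group, compound).

    records: iterable of (group_id, compound, familiarity, pleasantness),
    one tuple per subject rating (assumed layout).
    """
    familiarity = defaultdict(list)
    pleasantness = defaultdict(list)
    for group, compound, fam, pleas in records:
        key = (group, compound)
        familiarity[key].append(fam)
        pleasantness[key].append(pleas)
    # Color would map to mean familiarity, bubble size to mean pleasantness.
    return {key: (mean(familiarity[key]), mean(pleasantness[key]))
            for key in familiarity}


toy = [
    (8, "compound A", 40, 60),  # group 8: aliphatic esters
    (8, "compound A", 60, 80),
    (6, "compound B", 20, 30),  # group 6: aliphatic alcohols
]
print(compound_summary(toy))
```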
Figure 2. Distribution of samples associated with each odor character (left) and the percentage of odorants given low familiarity ratings by the subjects for each odor character (right).
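Figure 2 (right) reports, per odor character, the percentage of odorants that received low familiarity ratings. A small sketch of that computation follows. What counts as "low" is not defined here, so the `threshold` of 30 and the pairing of each odorant with a single mean familiarity are assumptions.

```python
def low_familiarity_percent(odorants, threshold=30):
    """odorants: iterable of (odor_character, mean_familiarity) pairs.

    Returns {odor_character: percentage of its odorants whose mean
    familiarity is below `threshold`} (threshold is an assumption).
    """
    totals, low = {}, {}
    for oc, familiarity in odorants:
        totals[oc] = totals.get(oc, 0) + 1
        # A True comparison counts as 1, False as 0.
        low[oc] = low.get(oc, 0) + (familiarity < threshold)
    return {oc: 100.0 * low[oc] / totals[oc] for oc in totals}


toy = [("sweet", 20), ("sweet", 60), ("musky", 10), ("musky", 25)]
print(low_familiarity_percent(toy))  # {'sweet': 50.0, 'musky': 100.0}
```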
Figure 3. Overall workflow of model development for the classification tasks.
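The performance tables report train, validation and test F1-scores. One common way to produce three such splits while preserving the sweet/non-sweet balance is a stratified split, sketched below. The 70/15/15 fractions, the seed and the function name `stratified_split` are assumptions; the split actually used in the workflow is not given in this section.

```python
import random


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists with per-class proportions
    that follow `fractions` (assumed values)."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        # Split each class separately so every subset keeps the balance.
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


labels = [1] * 20 + [0] * 20  # toy labels: 1 = sweet, 0 = non-sweet
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 28 6 6
```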
Performance of the algorithms on the sweet OC classification task: optimal model in bold.
| Algorithm | Train F1-score | Validation F1-score | Test F1-score |
|---|---|---|---|
| Gradient boosting machine | 0.823 | 0.792 | 0.813 |
| AdaBoost | 0.800 | 0.784 | 0.825 |
| Random forest | 0.839 | 0.789 | 0.821 |
| Support vector machine | 0.764 | 0.739 | 0.792 |
| XGBoost | 0.804 | 0.783 | 0.819 |
| K nearest neighbors | 0.802 | 0.778 | 0.780 |
| Gradient boosting machine | 0.810 | 0.780 | 0.812 |
| AdaBoost | 0.797 | 0.778 | 0.810 |
| Random forest | 0.806 | 0.786 | 0.824 |
| Support vector machine | 0.768 | 0.749 | 0.824 |
| XGBoost | 0.798 | 0.773 | 0.814 |
| K nearest neighbors | 0.789 | 0.758 | 0.803 |
| Gradient boosting machine | 0.816 | 0.785 | 0.813 |
| AdaBoost | 0.789 | 0.768 | 0.823 |
| Random forest | 0.818 | 0.785 | 0.824 |
| Support vector machine | 0.754 | 0.732 | 0.781 |
| K nearest neighbors | 0.784 | 0.773 | 0.838 |
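The result tables compare models by F1-score, the harmonic mean of precision and recall. A minimal binary F1 for the positive class (for example "sweet") is sketched below. Whether the reported scores are binary, macro- or weighted-averaged is not stated here, so this is one plausible reading rather than the exact metric used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = sweet, 0 = non-sweet. TP=2, FP=1, FN=1, so F1 = 2/3.
print(round(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]), 3))  # 0.667
```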
Performance of the algorithms on the musky OC classification task; optimal model in bold.
| Algorithm | Train F1-score | Validation F1-score | Test F1-score |
|---|---|---|---|
| Gradient boosting machine | 0.651 | 0.601 | 0.683 |
| AdaBoost | 0.624 | 0.598 | 0.634 |
| Random forest | 0.661 | 0.573 | 0.636 |
| Support vector machine | 0.623 | 0.591 | 0.659 |
| XGBoost | 0.540 | 0.520 | 0.620 |
| K nearest neighbors | 0.589 | 0.523 | 0.643 |
| Gradient boosting machine | 0.680 | 0.633 | 0.704 |
| AdaBoost | 0.646 | 0.619 | 0.689 |
| Random forest | 0.678 | 0.638 | 0.628 |
| Support vector machine | 0.650 | 0.628 | 0.681 |
| K nearest neighbors | 0.647 | 0.629 | 0.644 |
| Gradient boosting machine | 0.590 | 0.581 | 0.571 |
| AdaBoost | 0.606 | 0.566 | 0.582 |
| Random forest | 0.608 | 0.562 | 0.564 |
| Support vector machine | 0.539 | 0.513 | 0.528 |
| XGBoost | 0.613 | 0.567 | 0.644 |
| K nearest neighbors | 0.559 | 0.560 | 0.580 |
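Each table marks an optimal model, but the selection criterion is not stated in this section. The sketch below assumes the common convention of choosing the row with the highest validation F1, and it also reports the train-validation gap as a rough overfitting signal. The rows are copied from the second block of the musky table; the result need not coincide with the model the authors marked as optimal.

```python
# (algorithm, train F1, validation F1, test F1): second block of the musky table.
rows = [
    ("Gradient boosting machine", 0.680, 0.633, 0.704),
    ("AdaBoost", 0.646, 0.619, 0.689),
    ("Random forest", 0.678, 0.638, 0.628),
    ("Support vector machine", 0.650, 0.628, 0.681),
    ("K nearest neighbors", 0.647, 0.629, 0.644),
]


def select_by_validation(rows):
    """Assumed criterion: highest validation F1. Returns the algorithm,
    its train-validation gap and its test F1."""
    name, train, val, test = max(rows, key=lambda r: r[2])
    return name, round(train - val, 3), test


print(select_by_validation(rows))  # ('Random forest', 0.04, 0.628)
```

In this block the highest validation score (random forest, 0.638) is not the highest test score (gradient boosting machine, 0.704), which is why the test split is normally kept for the final estimate rather than used to choose a model.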
Figure 4. Heatmap showing the correlations between the features obtained after feature selection for (a) sweet OC prediction and (b) musky OC prediction.
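Figure 4 shows correlations among the features that survived feature selection. The selection method is not described in this section; a common filter, sketched here as an assumption, greedily keeps a feature only if its absolute Pearson correlation with every feature already kept is at most a threshold (0.9, a hypothetical value). The function names and toy columns are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """features: dict of feature name -> column of values.

    Keeps features in insertion order, skipping any whose |r| with an
    already-kept feature exceeds `threshold`.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor columns: "b" nearly duplicates "a"; "c" does not.
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
print(drop_correlated(toy))  # ['a', 'c']
```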
Figure 5. Input features ranked by their importance in predicting (a) sweet OC and (b) musky OC.
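Figure 5 ranks input features by importance, but the importance measure is not named here (tree ensembles such as random forest and gradient boosting commonly expose impurity-based importances). As a swapped-in, model-agnostic alternative, the sketch below computes permutation importance: the average drop in a score when one feature column is shuffled. The toy model, data and accuracy metric are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled.

    predict: callable mapping a list of rows (lists) to a list of labels.
    metric: callable metric(y_true, y_pred), higher is better.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with feature j replaced by the shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_model(rows):
    # Hypothetical classifier that only looks at feature 0.
    return [1 if row[0] > 0.5 else 0 for row in rows]


X = [[0.1, 5], [0.9, 3], [0.2, 7], [0.8, 1]]
y = [0, 1, 0, 1]
scores = permutation_importance(toy_model, X, y, accuracy)
print(scores[1])  # 0.0, because the model ignores feature 1
```

Ranking the features is then a matter of sorting by these scores; unlike impurity-based importances, permutation importance is not tied to tree models and can be computed on held-out data.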
Odor descriptions of the misclassified test compounds in the sweet OC classification task.
| No. | Molecule | Ground-truth label | Predicted label | Familiarity rating | Organoleptic description (The Good Scents Company) |
|---|---|---|---|---|---|
| 1 | 3-pentanone | Non-sweet | Sweet | 20 | Ethereal acetone |
| 2 | allyl hexanoate | Non-sweet | Sweet | 20 | Sweet fruity pineapple tropical ethereal rum arrack fatty cognac |
| 3 | propyl acetate | Non-sweet | Sweet | 40 | Sweet and fruity |
| 4 | allyl phenyl acetate | Non-sweet | Sweet | 40 | Honey fruity rum |
| 5 | methyl 3-(methylthio)propionate | Non-sweet | Sweet | 20 | Sulfurous vegetable onion sweet garlic tomato |
| 6 | 1,6-hexalactam | Non-sweet | Sweet | 20 | Amine spicy |
| 7 | methyl (methylthio)acetate | Non-sweet | Sweet | 60 | Sulfurous cooked potato roasted nut fruity tropical |
| 8 | octyl isovalerate | Non-sweet | Sweet | 20 | Warm floral rose honey apple pineapple |
| 9 | Ambroxan | Non-sweet | Sweet | 60 | Ambergris old paper sweet labdanum dry |
| 10 | isobutyl alcohol | Non-sweet | Sweet | 20 | Ethereal winey |
| 11 | 3-decen-2-one | Non-sweet | Sweet | 20 | Fatty green fruity apple earthy jasmine |
| 12 | benzaldehyde propylene glycol acetal | Non-sweet | Sweet | 20 | Bitter narcissus sweet naphthalic woody |
| 13 | bis(methylthio)methane | Sweet | Non-sweet | 20 | Garlic sulfurous green spicy mushroom |
| 14 | trans-2-hexenal | Sweet | Non-sweet | 20 | Green leafy |
| 15 | 2-(4-hydroxyphenyl)ethylamine | Sweet | Non-sweet | 20 | Meaty dirty cooked phenolic rubbery |
| 16 | 2,5-dimethylpyrrole | Sweet | Non-sweet | 20 | - |
| 17 | pyrazinyl ethane thiol | Sweet | Non-sweet | 20 | Sulfurous meaty cabbage |
| 18 | lepidine | Sweet | Non-sweet | 80 | Burnt oil herbal floral sweet |
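Several compounds in this table whose ground-truth label is non-sweet nonetheless carry "sweet" among their literature descriptors, which is the kind of mismatch the abstract examines when comparing semantic descriptors with majority perception. A minimal keyword check is sketched below; matching the whole word "sweet" is an assumption, the function name is hypothetical, and the rows are a subset copied from the table.

```python
def descriptors_support_prediction(predicted, descriptors, term="sweet"):
    """True if the literature descriptors agree with the model's prediction.

    Agreement means: the descriptors contain `term` as a whole word
    exactly when the model predicted the `term` class.
    """
    mentions_term = term in descriptors.lower().split()
    predicted_term = predicted.lower() == term
    return mentions_term == predicted_term


# (molecule, predicted label, organoleptic descriptors) from the table.
rows = [
    ("allyl hexanoate", "Sweet",
     "Sweet fruity pineapple tropical ethereal rum arrack fatty cognac"),
    ("isobutyl alcohol", "Sweet", "Ethereal winey"),
    ("Ambroxan", "Sweet", "Ambergris old paper sweet labdanum dry"),
    ("bis(methylthio)methane", "Non-sweet",
     "Garlic sulfurous green spicy mushroom"),
]
for molecule, predicted, descriptors in rows:
    print(molecule, descriptors_support_prediction(predicted, descriptors))
```

A `True` here marks a compound where the model's prediction matches the literature descriptors even though it disagrees with the subjects' majority label.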
Figure 6. Word clouds of the semantic descriptors reported for odorants labelled (a) sweet and (b) musky.
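A word cloud sizes each descriptor by how often it occurs. The counting step can be sketched as below. Splitting descriptor strings on whitespace is an assumption that breaks multi-word descriptors such as "old paper", the stop-word set is hypothetical, and the input strings are toy examples.

```python
from collections import Counter


def descriptor_frequencies(descriptor_strings, stop_words=frozenset({"and", "of"})):
    """Count lower-cased, whitespace-separated descriptor words,
    skipping a small (assumed) set of stop words."""
    counts = Counter()
    for text in descriptor_strings:
        counts.update(w for w in text.lower().split() if w not in stop_words)
    return counts


toy = ["Sweet fruity", "fruity green", "Sweet and fruity"]
print(descriptor_frequencies(toy).most_common(2))  # [('fruity', 3), ('sweet', 2)]
```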