Priyanka Banerjee, Robert Preissner.
Abstract
The taste of a chemical compound present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste depends strongly on genetic as well as evolutionary factors. The aim of this work was to develop and validate a machine learning model based on molecular fingerprints to discriminate between the sweet and bitter taste of molecules. BitterSweetForest is the first open-access model based on a KNIME workflow that provides a platform for predicting the bitter and sweet taste of chemical compounds using molecular fingerprints and a Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. On an independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs, and an acute toxicity compound data set. BitterSweetForest predicts 70% of the natural product space as bitter and 10% as sweet, with a confidence score of 0.60 and above. Of the approved drug set, 77% was predicted as bitter and 2% as sweet, with a confidence score of 0.75 and above. Similarly, 75% of the compounds from the acute oral toxicity classes were predicted only as bitter with a minimum confidence score of 0.75, suggesting that toxic compounds are mostly bitter. Furthermore, we applied a Bayesian feature analysis method to identify the most frequently occurring chemical features that distinguish sweet from bitter compounds, using the feature space of a circular fingerprint.
Keywords: KNIME workflow; Random Forest; bitter prediction; fingerprints; sweetness prediction; taste prediction
Year: 2018 PMID: 29696137 PMCID: PMC5905275 DOI: 10.3389/fchem.2018.00093
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
Cross-validation and external set validation results for sweet prediction.
| Cross-validation | 416 | 545 | 95.00 | 0.98 | 0.90 | 0.97 | 0.82 | 0.94 |
| External validation | 102 | 139 | 96.69 | 0.98 | 0.91 | 0.97 | 0.92 | 0.92 |
Cross-validation and external set validation results for bitter prediction.
| Cross-validation | 545 | 416 | 95.00 | 0.97 | 0.97 | 0.90 | 0.82 | 0.96 |
| External validation | 139 | 102 | 96.69 | 0.98 | 0.97 | 0.91 | 0.92 | 0.95 |
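The two tables above report the same model with the positive class swapped. The following minimal Python sketch assumes that the reported metrics include accuracy, sensitivity, and specificity computed from a 2×2 confusion matrix (the column headers are not preserved here, so this is an assumption). It illustrates why swapping the positive class exchanges sensitivity and specificity while accuracy stays the same. The `ConfusionCounts` type, the function names, and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix for one choice of positive class."""
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all compounds that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of positive compounds that were recognised as positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of negative compounds that were recognised as negative."""
    return c.tn / (c.tn + c.fp)


def swap_positive_class(c: ConfusionCounts) -> ConfusionCounts:
    """Re-express the same predictions with the other class as positive."""
    return ConfusionCounts(tp=c.tn, fn=c.fp, tn=c.tp, fp=c.fn)


# Hypothetical counts (illustration only): "sweet" is the positive class.
sweet_view = ConfusionCounts(tp=90, fn=10, tn=97, fp=3)
# The same predictions with "bitter" as the positive class.
bitter_view = swap_positive_class(sweet_view)

for name, view in (("sweet", sweet_view), ("bitter", bitter_view)):
    print(name, accuracy(view), sensitivity(view), specificity(view))
```

In this sketch the accuracy is identical in both views, and the sensitivity of one view equals the specificity of the other.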
Figure 1. Distribution of the 10 most frequent features in the sweet compounds and their relative occurrences in the bitter class.
Figure 2. Distribution of the 10 most frequent features in the bitter compounds and their relative occurrences in the sweet class.
Figure 3. Graphical representation of the relative frequency distribution of each feature index in the sweet class (green) and bitter class (red) for Morgan fingerprints (2,048 bits).
Figure 4. Top 20 most frequently occurring features and their respective index positions in both sweet and bitter molecules. The figure shows that the top features of the sweet and bitter compounds used in this model are largely independent, because the index positions of the set bits (bits set to 1) in the fingerprints differ.
Figure 5. Percentage of approved drugs predicted as bitter and their corresponding Anatomical Therapeutic Chemical (ATC) class.
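The per-class feature frequencies described in the captions of Figures 1 to 4 can be illustrated with plain frequency counting. The sketch below is not the Bayesian feature analysis named in the abstract. It only counts how often each fingerprint bit is set in one class and looks up the same bit in the other class. Fingerprints are assumed to be given as sets of on-bit indices. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def bit_frequencies(fingerprints: List[Iterable[int]]) -> Dict[int, float]:
    """Return, for each bit index, the fraction of molecules that set it."""
    counts: Counter = Counter()
    for on_bits in fingerprints:
        counts.update(set(on_bits))  # count each bit once per molecule
    n = len(fingerprints)
    return {bit: count / n for bit, count in counts.items()}


def top_features(
    target: List[Iterable[int]],
    other: List[Iterable[int]],
    k: int = 10,
) -> List[Tuple[int, float, float]]:
    """Top-k bits of `target` with their relative frequency in both classes.

    Each tuple is (bit index, frequency in target, frequency in other).
    Ties are broken by the lower bit index.
    """
    target_freq = bit_frequencies(target)
    other_freq = bit_frequencies(other)
    ranked = sorted(target_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [(bit, freq, other_freq.get(bit, 0.0)) for bit, freq in ranked]


# Toy on-bit sets (illustration only, not real Morgan fingerprints).
sweet: List[Set[int]] = [{1, 5, 9}, {1, 5}, {1, 7}]
bitter: List[Set[int]] = [{2, 5}, {2, 9}, {2, 3}, {1, 2}]

for bit, f_sweet, f_bitter in top_features(sweet, bitter, k=2):
    print(bit, f_sweet, f_bitter)
```

A bit that is frequent in one class and rare in the other is a candidate discriminating feature, which is the kind of contrast the figures visualise.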
BitterSweetForest prediction of the oral toxicity compounds.
| Toxicity class | LD50 range, X (mg/kg) | Total compounds | Predicted bitter | Predicted sweet |
|---|---|---|---|---|
| 1 (fatal) | X ≤ 5 | 510 | 384 | 0 |
| 2 (fatal) | 5 < X ≤ 50 | 1,779 | 1,392 | 5 |
| 3 (toxic) | 50 < X ≤ 300 | 6,918 | 5,579 | 3 |
| 4 (harmful) | 300 < X ≤ 2,000 | 21,884 | 17,340 | 24 |
| 5 (may be harmful) | 2,000 < X ≤ 5,000 | 6,740 | 5,413 | 26 |
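The share of bitter predictions per toxicity class can be worked out directly from the table above. The short Python sketch below assumes that the third column is the total number of compounds in each class and the fourth column is the number predicted as bitter. This column assignment is an assumption, since the table headers are not preserved in the source. The helper name `bitter_share` is hypothetical.

```python
# (toxicity class, total compounds, compounds predicted as bitter)
# Column meanings are assumed; the numbers are copied from the table.
ROWS = [
    ("1 (fatal)", 510, 384),
    ("2 (fatal)", 1779, 1392),
    ("3 (toxic)", 6918, 5579),
    ("4 (harmful)", 21884, 17340),
    ("5 (may be harmful)", 6740, 5413),
]


def bitter_share(total: int, bitter: int) -> float:
    """Percentage of a class that was predicted as bitter."""
    return 100.0 * bitter / total


for tox_class, total, bitter in ROWS:
    print(f"{tox_class}: {bitter_share(total, bitter):.1f}% predicted bitter")
```

Under this reading, every class has at least about three quarters of its compounds predicted as bitter, in line with the abstract's statement about the acute oral toxicity data set.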
Comparison with top two methods predicting sweet taste of molecules.
| | Training | – | – | 90.00 | – | – |
| | Test | – | – | 0.92 | – | – |
| QSTR (2017) | Training | 327 | 161 | 83.00 | 0.77 | 0.89 |
| | Test | 108 | 53 | 85.0 | 0.79 | 0.91 |
| BitterSweet Forest | Training | 416 | 548 | 93.50 | 0.97 | 0.90 |
| | Test | 102 | 139 | 94.00 | 0.97 | 0.91 |
* NER: ratio of correctly classified molecules to the total number of molecules.
Comparison with top two methods predicting bitter taste of molecules.
| Bitter X (2016) | Training | 431 | 431 | 88.00 | – | – | – |
| | Test | 108 | 108 | 91.00 | 0.90 | 0.91 | 0.94 |
| BitterPredict (2017) | Training | 484 | 1,343 | 93.00 | 0.91 | 0.94 | – |
| | Test | 207 | 574 | 83.00 | 0.77 | 0.86 | – |
| BitterSweet Forest | Training | 545 | 416 | 95.00 | 0.97 | 0.90 | 0.97 |
| | Test | 139 | 102 | 96.69 | 0.97 | 0.91 | 0.98 |