Suqing Zheng, Wenping Chang, Wenxin Xu, Yong Xu, Fu Lin.
Abstract
Artificial sweeteners (AS) elicit a strong sweet sensation with low or zero calories and are widely used to replace nutritive sugar in the food and beverage industry. However, the safety of current AS remains controversial, so it is imperative to develop safer and more potent AS. Because experimental screening of AS is costly and laborious, in-silico sweetener/sweetness prediction offers a good avenue for identifying potential sweetener candidates before experiment. In this work, we curate the largest dataset of 530 sweeteners and 850 non-sweeteners, and collect from the literature the second largest dataset of 352 sweeteners with relative sweetness (RS). Using these experimental datasets, we adopt five machine-learning methods and conformation-independent molecular fingerprints to derive, via a consensus strategy, classification models for sweetener prediction and regression models for RS prediction. On the test set, our best classification model achieves, expressed as 95% confidence intervals, an accuracy of 0.91 ± 0.01, a precision of 0.90 ± 0.01, a specificity of 0.94 ± 0.01, a sensitivity of 0.86 ± 0.01, an F1-score of 0.88 ± 0.01, and an NER (Non-error Rate) of 0.90 ± 0.01, which outperforms the model of Rojas et al. (NER = 0.85) in terms of NER. Our best regression model gives 95% confidence intervals for R2(test set) and ΔR2 [defined as |R2(test set)–R2(cross-validation)|] of 0.77 ± 0.01 and 0.03 ± 0.01, respectively, which is also better than other works based on conformation-independent 2D descriptors (e.g., 2D Dragon) according to R2(test set) and ΔR2. Our models are obtained by averaging over nineteen data-splitting schemes and fully comply with the guidelines of the Organization for Economic Cooperation and Development (OECD); previous relevant works do not completely follow these guidelines, because they all rely on a single random split of the data into a cross-validation set and a test set.
Finally, we develop a user-friendly platform, "e-Sweet", for the automatic prediction of sweeteners and their corresponding RS. To the best of our knowledge, it is the first free platform that enables experimental food scientists to exploit current machine-learning methods to boost the discovery of more AS with low or zero calorie content.
Keywords: QSAR; machine learning method; relative sweetness prediction; sweet taste; sweetener prediction
Year: 2019 PMID: 30761295 PMCID: PMC6363693 DOI: 10.3389/fchem.2019.00035
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
Figure 1. The protocol used in this work to derive the classification and regression models.
The performance of four consensus models (CM01–CM04) for the sweetener/non-sweetener classification.
Mean (standard deviation):

| Model | Accuracy | Precision | Specificity | Sensitivity | F1-score | MCC | NER | F1-score (CV) | NER (CV) | ΔF1-score | ΔNER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CM01 | 0.91 (0.01) | 0.90 (0.03) | 0.94 (0.02) | 0.85 (0.02) | 0.88 (0.02) | 0.80 (0.02) | 0.90 (0.01) | 0.85 (0.01) | 0.87 (0.01) | 0.03 (0.02) | 0.03 (0.01) |
| CM02 | 0.91 (0.01) | 0.90 (0.03) | 0.94 (0.02) | 0.86 (0.03) | 0.88 (0.02) | 0.81 (0.03) | 0.90 (0.01) | 0.85 (0.01) | 0.87 (0.01) | 0.04 (0.02) | 0.03 (0.02) |
| CM03 | 0.89 (0.00) | 0.89 (0.01) | 0.93 (0.01) | 0.83 (0.00) | 0.86 (0.00) | 0.77 (0.01) | 0.88 (0.00) | 0.85 (0.01) | 0.87 (0.00) | 0.02 (0.01) | 0.02 (0.01) |
| CM04 | 0.89 (0.01) | 0.89 (0.02) | 0.94 (0.01) | 0.82 (0.01) | 0.85 (0.01) | 0.76 (0.01) | 0.88 (0.01) | 0.84 (0.00) | 0.87 (0.00) | 0.02 (0.00) | 0.02 (0.00) |

Mean ± 95% confidence interval:

| Model | Accuracy | Precision | Specificity | Sensitivity | F1-score | MCC | NER | F1-score (CV) | NER (CV) | ΔF1-score | ΔNER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CM01 | 0.91 ± 0.01 | 0.90 ± 0.01 | 0.94 ± 0.01 | 0.85 ± 0.01 | 0.88 ± 0.01 | 0.80 ± 0.01 | 0.90 ± 0.01 | 0.85 ± 0.01 | 0.87 ± 0.01 | 0.03 ± 0.01 | 0.03 ± 0.01 |
| CM02 | 0.91 ± 0.01 | 0.90 ± 0.01 | 0.94 ± 0.01 | 0.86 ± 0.01 | 0.88 ± 0.01 | 0.81 ± 0.01 | 0.90 ± 0.01 | 0.85 ± 0.01 | 0.87 ± 0.01 | 0.04 ± 0.01 | 0.03 ± 0.01 |
| CM03 | 0.89 ± 0.00 | 0.89 ± 0.01 | 0.93 ± 0.01 | 0.83 ± 0.00 | 0.86 ± 0.00 | 0.77 ± 0.00 | 0.88 ± 0.00 | 0.85 ± 0.00 | 0.87 ± 0.00 | 0.02 ± 0.00 | 0.02 ± 0.00 |
| CM04 | 0.89 ± 0.01 | 0.89 ± 0.02 | 0.94 ± 0.01 | 0.82 ± 0.01 | 0.85 ± 0.01 | 0.77 ± 0.01 | 0.88 ± 0.01 | 0.84 ± 0.00 | 0.87 ± 0.00 | 0.02 ± 0.00 | 0.02 ± 0.00 |
(1) The number in parentheses is the standard deviation, derived from the multiple random data-splitting schemes; (2) ΔF1-score and ΔNER refer to |F1-score (test set)–F1-score (cross-validation)| and |NER (test set)–NER (cross-validation)|, respectively; (3) “MCC,” “CV,” and “NER” are short for “Matthews correlation coefficient,” “cross-validation,” and “Non-error Rate,” respectively.
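As an illustration of how the tabulated classification metrics relate to one another, the following minimal sketch computes them from confusion-matrix counts. It assumes that sweeteners are the positive class and that NER is the mean of sensitivity and specificity, which is consistent with the abstract (0.86 and 0.94 give 0.90); the function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumption: sweeteners are the positive class.
    """
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumption: NER is the mean of sensitivity and specificity.
    ner = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "F1": f1,
        "MCC": mcc,
        "NER": ner,
    }


def delta(test_value, cv_value):
    """Absolute gap between a test-set metric and its cross-validation value."""
    return abs(test_value - cv_value)


# Hypothetical counts chosen for illustration only.
metrics = classification_metrics(tp=86, fp=6, tn=94, fn=14)
delta_f1 = delta(metrics["F1"], 0.85)  # 0.85 is a made-up CV F1-score
```

A small gap between the test-set and cross-validation values suggests that the model is neither strongly overfitted nor underfitted.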
Figure 2. The main features of the e-Sweet platform for sweetener and sweetness prediction.
Figure 3. (A) Scatter plot of ΔF1-score vs. F1-score for all the classification models; (B) scatter plot of ΔR2 vs. R2(test set) for all the regression models. ΔF1-score [referring to |F1-score (test set)–F1-score (cross-validation)|] and ΔR2 [referring to |R2(test set)–R2(cross-validation)|] are used to monitor potential overfitting or underfitting.
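The ΔR2 quantity can be sketched in a few lines. The example below assumes that R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the source does not state which definition is used, and some QSAR studies report the squared Pearson correlation instead. The helper names and all numbers are hypothetical.

```python
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of R2)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def delta_r2(r2_test, r2_cv):
    """|R2(test set) - R2(cross-validation)|, used to flag over- or underfitting."""
    return abs(r2_test - r2_cv)


def summarize(values):
    """Mean and sample standard deviation over several data-splitting schemes."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical R2(test set) values from three random data splits.
mean_r2, sd_r2 = summarize([0.76, 0.77, 0.78])
gap = delta_r2(0.77, 0.74)
```

Reporting the mean and spread over several splits, rather than a single split, makes the estimate less dependent on one particular partition of the data.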
The performance of three consensus models (CM01–CM03) for the regression of relative sweetness (RS).
Mean (standard deviation):

| Model | R2 (test set) | | | R2 (CV) | ΔR2 |
| --- | --- | --- | --- | --- | --- |
| CM01 | 0.77 (0.05) | 0.27 (0.06) | 0.39 (0.03) | 0.72 (0.05) | 0.07 (0.05) |
| CM02 | 0.78 (0.05) | 0.28 (0.06) | 0.40 (0.03) | 0.71 (0.05) | 0.07 (0.05) |
| CM03 | 0.77 (0.01) | 0.58 (0.31) | 0.58 (0.17) | 0.74 (0.01) | 0.03 (0.01) |

Mean ± 95% confidence interval:

| Model | R2 (test set) | | | R2 (CV) | ΔR2 |
| --- | --- | --- | --- | --- | --- |
| CM01 | 0.77 ± 0.02 | 0.27 ± 0.03 | 0.39 ± 0.01 | 0.72 ± 0.02 | 0.07 ± 0.02 |
| CM02 | 0.78 ± 0.02 | 0.28 ± 0.03 | 0.40 ± 0.01 | 0.71 ± 0.02 | 0.07 ± 0.02 |
| CM03 | 0.77 ± 0.01 | 0.58 ± 0.27 | 0.58 ± 0.15 | 0.74 ± 0.01 | 0.03 ± 0.01 |
(1) The number in parentheses is the standard deviation, obtained from the multiple random data-splitting schemes; (2) ΔR2 refers to |R2(test set)–R2(cross-validation)|.
Figure 4. Histograms of average similarity used to define the applicability domain of our classification (A) and regression (B) models. An average-similarity threshold of 0.1 is defined for both models and implemented in our e-Sweet platform to automatically check whether a compound to be predicted lies within the applicability domain of our models.
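An average-similarity check of this kind might look like the sketch below. It assumes Tanimoto similarity between binary fingerprints (represented here as sets of on-bit indices), averaging over all training compounds, and an inclusive threshold; the source gives only the threshold value of 0.1, so these choices, the function names, and the toy fingerprints are hypothetical and do not describe the e-Sweet implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(fp_a & fp_b) / len(union)


def average_similarity(query_fp, training_fps):
    """Mean similarity of the query to every training compound (assumed definition)."""
    return sum(tanimoto(query_fp, fp) for fp in training_fps) / len(training_fps)


def in_applicability_domain(query_fp, training_fps, threshold=0.1):
    """True if the query's average similarity reaches the threshold (assumed inclusive)."""
    return average_similarity(query_fp, training_fps) >= threshold


# Toy fingerprints for illustration only.
training = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
inside = in_applicability_domain({2, 3}, training)
outside = in_applicability_domain({10, 11}, training)
```

A compound that falls below the threshold shares too few features with the training data, so a prediction for it should be treated with caution.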