Estela Guardado Yordi, Raúl Koelig, Maria J Matos, Amaury Pérez Martínez, Yailé Caballero, Lourdes Santana, Manuel Pérez Quintana, Enrique Molina, Eugenio Uriarte.
Abstract
Increasing interest in constituents and dietary supplements has created the need for more efficient use of this information in nutrition-related fields. The present work aims to obtain optimal models to predict the total antioxidant properties of food matrices, using available information on the amount and class of flavonoids present in vegetables. A new dataset was created from databases that record the flavonoid content of selected foods. Structural information was obtained using a structural-topological approach called TOPological Sub-Structural MOlecular DEsign (TOPSMODE). Several artificial intelligence algorithms were applied, including Machine Learning (ML) methods. The study demonstrates the effectiveness of models built on structural-topological characteristics of dietary flavonoids. The proposed models, except for the Multi-Layer Perceptron (MLP), can be considered effective, without overfitting, in predicting new values of Oxygen Radical Absorbance Capacity (ORAC). The best model was obtained with the Random Forest (RF) algorithm. The in silico methodology we developed confirms the effectiveness of the obtained models by introducing the new structural-topological attributes and by selecting those that most influence the class variable.
Keywords: artificial intelligence; flavonoid; total antioxidant capacity
Year: 2019 PMID: 31739559 PMCID: PMC6915672 DOI: 10.3390/foods8110573
Source DB: PubMed Journal: Foods ISSN: 2304-8158
Figure 1. Percentage of each NDB (Nutrient Database) alimentary group represented in the studied dataset.
Examples of the conformation of the dataset and the respective attributes.
| Alimentary group (NDB No) a | Food a / NDB No. | Flavonoid a | Class of flavonoid a | Amount of flavonoid (mean) a | TEACexp b | TPexp (mean) | Class: ORACexp (mean) |
|---|---|---|---|---|---|---|---|
| (11) Vegetables and Vegetable Products | Broccoli, raw (Brassica oleracea var. italica) / 11090 | (+)-Catechin | Flavan-3-ols | 0 | 2.4 | 316 c | 1510 [ |
| | | (-)-Epigallocatechin 3-gallate | Flavan-3-ols | 0 | 4.93 | | |
| | | Hesperetin | Flavanones | 0 | 1.37 | | |
| | | Naringenin | Flavanones | 0 | 1.53 | | |
| | | Apigenin | Flavones | 0 | 1.45 | | |
| | | Luteolin | Flavones | 0.8 | 2.09 | | |
| | | Kaempferol | Flavonols | 7.84 | 1.34 | | |
| | | Myricetin | Flavonols | 0.06 | 3.1 | | |
| | | Quercetin | Flavonols | 3.26 | 4.7 | | |
| (02) Spices and Herbs | Guava, red-fleshed / 99428 | Apigenin | Flavones | 0 | 1.45 | 247 d | 1990 [ |
| | | Luteolin | Flavones | 0.8 | 2.09 | | |
| | | Kaempferol | Flavonols | 0 | 1.34 | | |
| | | Myricetin | Flavonols | 0 | 3.1 | | |
| | | Quercetin | Flavonols | 1 | 4.7 | | |
a Extracted from FCDB [3,5]. b Extracted from [58]. c Extracted from [14]. d Extracted from [57]. Trolox equivalent antioxidant capacity flavonoid value (TEACexp). Total polyphenol value (TPexp). Nutrient Database Number (NDB No).
Examples of the chemical information of flavonoids, and their presence in food, contained in the studied database.
| Flavonoid | SMILES | Food name | NDB No. a |
|---|---|---|---|
| (-)-Epicatechin 3-gallate | C1C(C(OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)OC(=O)C4=CC(=C(C(=C4)O)O)O | Apples, Fuji, raw, with skin | 09504 |
| (+)-Catechin | OC1CC2=C(O)C=C(O)C=C2OC1C3=CC=C(O)C(=C3)O | Bananas, raw ( | 09040 |
| Hesperetin | O=C(CC(C3=CC(O)=C(OC)C=C3)O2)C1=C2C=C(O)C=C1O | Juice, orange, raw | 09206 |
| Naringenin | OC1=CC=C(C=C1)C2CC(=O)C3=C(O2)C=C(O)C=C3O | Melons, honeydew, raw ( | 09184 |
| Apigenin | O=C(C=C(C3=CC=C(O)C=C3)O2)C1=C2C=C(O)C=C1O | Pineapple, raw, all varieties ( | 09266 |
| Luteolin | O=C(C=C(C3=CC(O)=C(O)C=C3)O2)C1=C2C=C(O)C=C1O | Pomegranates, raw ( | 09286 |
| Kaempferol | O=C(C(O)=C(C3=CC=C(O)C=C3)O2)C1=C2C=C(O)C=C1O | Broccoli, cooked, boiled, drained, without salt | 11091 |
| Quercetin | O=C(C(O)=C(C3=CC(O)=C(O)C=C3)O2)C1=C2C=C(O)C=C1O | Mushrooms, white, raw ( | 11260 |
| Myricetin | O=C(C(O)=C(C3=CC(O)=C(O)C(O)=C3)O2)C1=C2C=C(O)C=C1O | Potatoes, red, flesh and skin, raw ( | 11355 |
a Nutrient Database Number (NDB No) [3].
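The structural information in this table is what TOPSMODE turns into the µk attributes used by the models. As a rough illustration, the sketch below assumes the usual definition of a spectral moment, µk = tr(B^k), where B is the bond adjacency matrix (two bonds are adjacent when they share an atom) and the diagonal holds bond weights. The function names are hypothetical, the example molecule is the carbon skeleton of n-butane rather than a flavonoid, and the weights default to zero; the hydrophobicity bond contributions used in the study are not reproduced here.

```python
def bond_matrix(bonds, weights=None):
    """Bond adjacency matrix: entry (i, j) is 1 when bonds i and j share an atom.

    The diagonal carries the bond weights (assumed 0.0 when none are given).
    """
    n = len(bonds)
    weights = weights or [0.0] * n
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = weights[i]
        for j in range(i + 1, n):
            if set(bonds[i]) & set(bonds[j]):
                B[i][j] = B[j][i] = 1.0
    return B


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(B, max_order):
    """Return [mu_0, ..., mu_max_order], with mu_k = trace(B^k)."""
    n = len(B)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(max_order + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, B)
    return moments


# Carbon skeleton of n-butane: atoms 0-1-2-3 joined by three bonds.
butane_bonds = [(0, 1), (1, 2), (2, 3)]
moments = spectral_moments(bond_matrix(butane_bonds), 4)
print(moments)
```

With zero weights, µ0 is simply the number of bonds and µ2 is twice the number of adjacent bond pairs; non-zero weights on the diagonal are what make the moments sensitive to a chosen bond property.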
Hierarchy of attributes of the set A1 regarding their influence in the class.
| Order | Attributes | Correlation Value f | Set of Attributes for the Model |
|---|---|---|---|
| 1 | TPexp a | 0.1551576 | A2 |
| 2 | µ8 H | 0.1483031 | A2 |
| 3 | µ12 H | 0.1349679 | A2 |
| 4 | µ11 H | 0.1213032 | A2 |
| 5 | µ10 H | 0.1206462 | A2 |
| 6 | µ13 H | 0.1096691 | A2 |
| 7 | id_flav b | 0.1018874 | (-) |
| 8 | mean_flav c | 0.0341301 | (-) |
| 9 | TEACexp d | 0.0108586 | (-) |
| 10 | Class_flav e | 0.0094634 | (-) |
a TPexp (Total polyphenol value). b id_flav (Flavonoids). c mean_flav (Amount of flavonoid (mean)). d TEACexp (Trolox equivalent antioxidant capacity flavonoid value). e Class_flav (Class of flavonoid). f Value of correlation with the class. (-) Not selected for the model. H: bonds weighted by the n-octanol/water partition coefficient.
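The ranking above orders attributes by their correlation with the class, and the six highest-ranked ones form set A2. A minimal sketch of that kind of filter follows, assuming an absolute Pearson correlation; the source does not state which correlation measure or software produced the values, and the function names and the ASCII attribute labels (`mu8_H` for µ8 H, and so on) are hypothetical. The selection step is run on the correlation values copied from the table.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_attributes(columns, target):
    """Sort attribute names by |correlation with the class|, highest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top(ranking, k):
    """Keep the names of the k highest-ranked attributes."""
    return [name for name, _ in ranking[:k]]


# Correlation values copied from the table above.
reported = {
    "TPexp": 0.1551576, "mu8_H": 0.1483031, "mu12_H": 0.1349679,
    "mu11_H": 0.1213032, "mu10_H": 0.1206462, "mu13_H": 0.1096691,
    "id_flav": 0.1018874, "mean_flav": 0.0341301,
    "TEACexp": 0.0108586, "Class_flav": 0.0094634,
}
ranking = sorted(reported.items(), key=lambda item: item[1], reverse=True)
selected = select_top(ranking, 6)
print(selected)
```

Keeping the top six reproduces the A2 column of the table; whether the authors used a fixed count or a correlation threshold is not stated in the source.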
Statistics corresponding to the training set score for the optimal models of each of the ML algorithms.
| Algorithm | RMSE | Rsquared |
|---|---|---|
| KNN | 1851.174 | 0.905 |
| RF | 1271.060 | 0.957 |
| MLP | 6582.955 | 0.284 |
| SVM | 1790.536 | 0.901 |
ML: Machine Learning; RMSE: Root Mean Squared Error; KNN: k-nearest neighbors; RF: Random Forest; MLP: Multi-Layer Perceptron; SVM: Support Vector Machine.
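For reference, the two statistics in this table can be computed as sketched below. RMSE follows its standard definition and has the same units as the class variable. For R², the sketch assumes the coefficient of determination, 1 - SS_res / SS_tot; some packages report the squared correlation between predictions and observations instead, and the source does not say which was used. The numbers in the example are arbitrary, not ORAC values.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Arbitrary illustrative numbers (not taken from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```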
Statistics corresponding to the test set score for the optimal models of each of the ML algorithms.
| Algorithm | RMSE | Rsquared |
|---|---|---|
| KNN | 1956.810 | 0.880 |
| SVM | 1622.627 | 0.917 |
| RF | 1557.108 | 0.925 |
| MLP | 6429.185 | 0.007 |
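One simple way to read the training and test tables together is to compare each algorithm's scores across the two sets. The sketch below copies the reported values into two dictionaries (the variable and function names are hypothetical), computes the test-minus-training gap for each statistic, and picks the algorithm with the lowest test RMSE.

```python
# (RMSE, Rsquared) pairs copied from the training-set and test-set tables.
train_scores = {
    "KNN": (1851.174, 0.905),
    "RF": (1271.060, 0.957),
    "MLP": (6582.955, 0.284),
    "SVM": (1790.536, 0.901),
}
test_scores = {
    "KNN": (1956.810, 0.880),
    "SVM": (1622.627, 0.917),
    "RF": (1557.108, 0.925),
    "MLP": (6429.185, 0.007),
}


def score_gaps(train, test):
    """Test-minus-training difference in RMSE and R squared for each algorithm."""
    gaps = {}
    for name, (rmse_train, r2_train) in train.items():
        rmse_test, r2_test = test[name]
        gaps[name] = {
            "rmse_gap": rmse_test - rmse_train,
            "r2_gap": r2_test - r2_train,
        }
    return gaps


gaps = score_gaps(train_scores, test_scores)
best = min(test_scores, key=lambda name: test_scores[name][0])

for name, gap in gaps.items():
    print(f"{name}: RMSE gap {gap['rmse_gap']:+.3f}, R2 gap {gap['r2_gap']:+.3f}")
print("Lowest test RMSE:", best)
```

On these values RF has the lowest test RMSE, in line with the abstract, while the R² of MLP falls from 0.284 on the training set to 0.007 on the test set.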
Figure 2. Effectiveness performance versus RMSE (Root Mean Squared Error) for each algorithm. (a) KNN (k-nearest neighbors). (b) SVM (Support Vector Machine). (c) RF (Random Forest). (d) MLP (Multi-Layer Perceptron).
Figure 3. Representation of the numerical outputs of each model for the training and test datasets.
Figure 4. Representation of the numerical outputs of each model for the training and test datasets: (a) KNN, (b) SVM, (c) RF, and (d) MLP.