Marcos Hernández Suárez, Gonzalo Astray Dopazo, Dina Larios López, Francisco Espinosa.
Abstract
There are a large number of tomato cultivars with a wide range of morphological, chemical, nutritional and sensory characteristics. Many factors are known to affect the nutrient content of tomato cultivars. A complete understanding of the effect of these factors would require an exhaustive experimental design, a multidisciplinary scientific approach and a suitable statistical method. Multivariate analytical techniques such as Principal Component Analysis (PCA) or Factor Analysis (FA) have been widely applied to search for patterns in the behaviour of a data set and to reduce its dimensionality by means of a new set of uncorrelated latent variables. However, in some cases it is not useful to replace the original variables with these latent variables. In this study, the Automatic Interaction Detection (AID) algorithm and Artificial Neural Network (ANN) models were applied as an alternative to PCA, FA and other multivariate analytical techniques in order to identify the relevant phytochemical constituents for the characterization and authentication of tomatoes. To test the feasibility of the AID algorithm and ANN models for this purpose, both methods were applied to a data set with twenty-five chemical parameters analysed in 167 tomato samples from Tenerife (Spain). Each tomato sample was defined by three factors: cultivar, agricultural practice and harvest date. The General Linear Model linked to AID (GLM-AID) tree structure was organized into 3 levels according to the number of factors. p-Coumaric acid was the compound that allowed the tomato samples to be distinguished according to the day of harvest. More than one chemical parameter was necessary to distinguish among the different agricultural practices and among the tomato cultivars. Several ANN models, with 25 and 10 input variables, were developed for the prediction of cultivar, agricultural practice and harvest date. Finally, the models with 10 input variables were chosen, with goodness of fit between 44 and 100%. The lowest fits were for the cultivar classification; this low percentage suggests that other kinds of chemical parameters should be used to identify tomato cultivars.
Year: 2015 PMID: 26075889 PMCID: PMC4467870 DOI: 10.1371/journal.pone.0128566
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Operational diagram of an artificial neuron and sample diagram for an ANN3 model with ten input neurons, five neurons in the intermediate layer and three output neurons, that is, with a topology of 10-5-3.
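The operation of a single artificial neuron and of a 10-5-3 network such as the one in the caption can be illustrated with a minimal sketch. Everything beyond the topology is an assumption made for illustration: the sigmoid activation, the random weights and biases, and the function names (`neuron`, `layer`, `forward`) do not come from the study, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    """Logistic activation (an assumed choice, not taken from the study)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A fully connected layer: every neuron sees every input."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and biases, used only to make the sketch runnable."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def forward(x, topology=(10, 5, 3), seed=0):
    """Propagate an input vector through input -> intermediate -> output layers."""
    rng = random.Random(seed)
    activations = x
    for n_in, n_out in zip(topology, topology[1:]):
        weights, biases = init_layer(n_in, n_out, rng)
        activations = layer(activations, weights, biases)
    return activations


# Ten hypothetical (already scaled) chemical parameters for one sample.
outputs = forward([0.5] * 10)
print(len(outputs))  # three output neurons
```

In a trained model the weights would be fitted to the tomato data; here they only demonstrate how ten inputs flow through five intermediate neurons to three outputs.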
Mean content of the chemical parameters and estimation of the influence factors (p-value).
Data are expressed as % or quantities per fresh weight.
| Parameter | Content | p-values of the influence factors | Explained variance (%) |
|---|---|---|---|
| Fructose (%) | 1.28±0.41 | 0.0121; 0.001; 0.030 | 47.6 |
| Glucose (%) | 1.29±0.41 | 0.000; 0.000; 0.004; 0.033 | 56.7 |
| Total fiber (%) | 1.81±0.56 | 0.006; 0.008; 0.001; 0.027; 0.000; 0.003 | 57.3 |
| Protein (%) | 0.80±0.15 | 0.005; 0.018 | 36.2 |
| Phenolic compound (mg/100 g) | 20.41±4.3 | 0.020; 0.005 | 38.3 |
| Lycopene (mg/100 g) | 2.31±0.72 | 0.000 | 53.9 |
| P (mg/kg) | 246±61 | 0.000; 0.034 | 50.3 |
| Na (mg/kg) | 92.4±63.4 | 0.004; 0.000; 0.004; 0.018 | 58.8 |
| K (mg/kg) | 2522±512 | 0.000; 0.015 | 52.2 |
| Ca (mg/kg) | 67.5±18.6 | 0.010; 0.001; 0.000; 0.002; 0.000; 0.009 | 59.3 |
| Mg (mg/kg) | 115±22 | 0.000; 0.000; 0.038 | 59.3 |
| Fe (mg/kg) | 1.92±0.05 | 0.000; 0.020; 0.040; 0.002; 0.000 | 53.3 |
| Cu (mg/kg) | 0.30±0.15 | 0.000; 0.022; 0.017 | 50.4 |
| Zn (mg/kg) | 0.77±0.21 | 0.045; 0.000; 0.024 | 49.9 |
| Mn (mg/kg) | 0.60±0.21 | 0.000; 0.008; 0.000; 0.000; 0.032 | 69.8 |
| Ascorbic acid (mg/100 g) | 15.3±4.48 | 0.035 | 37.8 |
| Oxalic acid (mg/100 g) | 25.6±9.3 | 0.011 | 37.5 |
| Pyruvic acid (mg/100 g) | 1.37±0.77 | 0.031; 0.001; 0.000; 0.028; 0.004 | 59.8 |
| Malic acid (mg/100 g) | 78.3±40.2 | 0.000; 0.000; 0.000; 0.001; 0.000; 0.002; 0.000 | 75.0 |
| Citric acid (mg/100 g) | 354±121 | 0.000; 0.043; 0.037; 0.013 | 48.2 |
| Fumaric acid (mg/100 g) | 2.77±1.22 | 0.009 | 36.0 |
| Chlorogenic acid (mg/100 g) | 0.59±0.05 | 0.000; 0.027; 0.030 | 50.7 |
| Caffeic acid (mg/100 g) | 0.04±0.01 | 0.000 | 53.9 |
| Ferulic acid (mg/100 g) | 0.09±0.04 | 0.032; 0.000; 0.001; 0.022 | 57.2 |
| | 0.02±0.03 | 0.000 | 73.6 |
a Values used to select the first predictor for the GLM-AID analysis.
b Expressed as gallic acid.
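To give a feel for what an "explained variance (%)" figure measures, the sketch below computes the share of a parameter's total variation that lies between the groups of a single factor (between-group sum of squares over total sum of squares). This is a one-factor simplification offered as an assumption: the tabulated values presumably come from a model that includes several factors, and the function name and the sample numbers are hypothetical.

```python
def explained_variance_pct(values, groups):
    """Percentage of total variation explained by one grouping factor.

    One-factor simplification: 100 * SS_between / SS_total.
    """
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the observations belonging to each level of the factor.
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return 100.0 * ss_between / ss_total


# Hypothetical contents of one parameter measured at two harvest dates.
contents = [1.0, 2.0, 3.0, 4.0]
harvest = ["October", "October", "April", "April"]
print(explained_variance_pct(contents, harvest))
```

A value close to 100 means the factor separates the samples almost completely for that parameter; a value close to 0 means the factor carries little information about it.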
Fig 2. Tree structure with the main significant differences according to the GLM-AID analysis for the tomato samples.
2A October; 2B December; 2C February; 2D April.
Average Percentage of Success (APS) in the training phase, in the validation phase and on average (mean APS for the training and validation phases together), considering all variables, for harvest date (APSH), production type (APSP) and tomato cultivar (APSC), for models with 25 input variables (ANN1 and ANN2) and models with 10 input variables (ANN3 and ANN4).
| Type | Topology | Training cycles | Training phase (APSH / APSP / APSC) | Validation phase (APSH / APSP / APSC) | Average (APSH / APSP / APSC) |
|---|---|---|---|---|---|
| ANN1 | 25-41-3 | 50,000 | 100 / 99.3 / 96.7 | 68.8 / 75 / 50 | 97 / 97 / 92.2 |
| ANN1 | 25-50-3 | 50,000 | 100 / 98.7 / 95.4 | 87.5 / 62.5 / 12.5 | 98.8 / 95.2 / 87.4 |
| ANN1 | 25-35-3 | 50,000 | 99.3 / 100 / 94.7 | 87.5 / 75 / 37.5 | 98.2 / 97.6 / 89.2 |
| ANN2 | 25-13-1 | 2,000 | 99.3 | 100 | 99.4 |
| ANN2 | 25-28-1 | 2,000 | 100 | 87.5 | 98.8 |
| ANN2 | 25-44-1 | 25,000 | 99.3 | 50 | 94.6 |
| ANN3 | 10-13-3 | 200,000 | 87.4 / 90.1 / 82.8 | 68.8 / 68.8 / 43.8 | 85.6 / 88 / 79 |
| ANN3 | 10-10-3 | 200,000 | 88.7 / 85.4 / 64.9 | 93.8 / 56.3 / 31.3 | 89.2 / 82.6 / 61.7 |
| ANN3 | 10-9-3 | 400,000 | 68.2 / 84.1 / 46.4 | 75 / 93.8 / 18.8 | 68.9 / 85 / 43.7 |
| ANN4 | 10-18-1 | 1,000 | 98.7 | 100 | 98.8 |
| ANN4 | 10-8-1 | 64,000 | 93.4 | 81.3 | 92.2 |
| ANN4 | 10-18-1 | 32,000 | 61.6 | 31.3 | 58.7 |
a The first value is the number of input variables, the second the number of intermediate neurons, and the last the number of neurons in the output layer.
b Best model developed for each particular case.
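The topology notation described in footnote a can be unpacked mechanically. The sketch below splits a string such as `25-41-3` into its three layer sizes and counts the connections between layers. The helper names are hypothetical, and the count assumes fully connected layers with bias terms left out, which the source does not state.

```python
def parse_topology(topology):
    """Split 'inputs-intermediate-outputs' into three integers."""
    inputs, intermediate, outputs = (int(part) for part in topology.split("-"))
    return inputs, intermediate, outputs


def count_connections(topology):
    """Number of weights, assuming fully connected layers and ignoring biases."""
    inputs, intermediate, outputs = parse_topology(topology)
    return inputs * intermediate + intermediate * outputs


for name in ("25-41-3", "10-13-3", "10-18-1"):
    print(name, parse_topology(name), count_connections(name))
```

Under these assumptions a 25-41-3 model has 1148 connections and a 10-13-3 model has 169, which illustrates how much smaller the models with ten input variables are.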
Fig 3. Individual classification for each factor according to the models selected to predict the harvest date (D; ANN4 10-18-1), the production type (P; ANN3 10-9-3) and the tomato cultivar (C; ANN3 10-13-3).
The left block (ten columns) shows the samples for the training phase (151 cases) and the right block (one column) shows the samples for the validation phase (16 cases). Correctly classified cases are highlighted in green, whilst incorrectly classified cases are in red.
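A percentage of success is the share of cases classified correctly, and the reported averages appear consistent with weighting the training and validation percentages by their case counts (151 and 16). The sketch below shows both calculations. The function names are hypothetical, the example labels are made up, and the case-weighted formula is an inference from the reported numbers, not a stated method.

```python
def percentage_of_success(predicted, actual):
    """Share of cases whose predicted class matches the actual class, in percent."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def pooled_aps(train_aps, validation_aps, n_train=151, n_validation=16):
    """Case-weighted mean of the training and validation percentages (assumed formula)."""
    total = n_train + n_validation
    return (n_train * train_aps + n_validation * validation_aps) / total


# Made-up labels: three of four cases are classified correctly.
print(percentage_of_success(["Oct", "Dec", "Feb", "Oct"], ["Oct", "Dec", "Apr", "Oct"]))

# Training 100 % and validation 68.8 % give about 97 % overall,
# in line with the average reported for ANN1 25-41-3 (harvest date).
print(round(pooled_aps(100.0, 68.8), 1))
```

Because only 16 of the 167 cases are in the validation set, a model can show a high average while performing poorly on validation, so the validation figures deserve separate attention.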