Salvador Gutiérrez, Javier Tardaguila, Juan Fernández-Novales, María P Diago.
Abstract
The identification of grapevine varieties, currently carried out by visual ampelometry, DNA analysis and, very recently, hyperspectral analysis under laboratory conditions, is an issue of great importance in the wine industry. This work presents support vector machine and artificial neural network modelling for grapevine varietal classification from in-field leaf spectroscopy. Modelling was attempted at two scales: site-specific and global. Spectral measurements were obtained in the near-infrared (NIR) spectral range between 1600 and 2400 nm under field conditions, in a non-destructive way, using a portable spectrophotometer. For the site-specific approach, spectra were collected from the adaxial side of 400 individual leaves of 20 grapevine (Vitis vinifera L.) varieties one week after veraison. For the global model, two additional sets of spectra were collected one week before harvest from two different vineyards in another vintage, each consisting of 48 measurements from individual leaves of six varieties. Several combinations of spectral scatter correction and smoothing filters were studied. The models were trained with support vector machines and artificial neural networks, using the pre-processed spectra as input and the varieties as the classes. The pre-processing study showed that applying scatter correction had no influence on the results, and that Savitzky-Golay filtering with a second-degree derivative and a window size of 5 yielded the best results. For the site-specific model, with 20 classes, the best classifier achieved an overall score of 87.25% of correctly classified samples. These results were compared under the same conditions with a model trained using partial least squares discriminant analysis, which performed worse in every case. For the global model, a 6-class dataset involving samples from three different vineyards, two years and leaves monitored at post-veraison and harvest was also built, reaching 77.08% of correctly classified samples. The outcomes demonstrate that this is a reliable method for fast, in-field, non-destructive grapevine varietal classification, at either the global or the site-specific scale, that could be very useful in viticulture and the wine industry.
Year: 2015 PMID: 26600316 PMCID: PMC4658183 DOI: 10.1371/journal.pone.0143197
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Average raw and pre-processed spectra with SNV+de-trending from the whole set of samples.
Solid line: SNV+de-trending. Dashed line: Raw.
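The SNV+de-trending correction named in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common formulation in which each spectrum is centred and scaled by its own mean and standard deviation, and a second-order polynomial baseline fitted against wavelength is then subtracted. The function names, the polynomial degree and the synthetic spectra are assumptions made for the example.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def detrend(spectra, wavelengths, degree=2):
    """Remove a least-squares polynomial baseline, fitted against wavelength, from each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    # Standardise the wavelength axis so the polynomial fit is well conditioned.
    x = (wavelengths - wavelengths.mean()) / wavelengths.std()
    design = np.vander(x, degree + 1)  # columns: x^degree, ..., x, 1
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    baseline = (design @ coeffs).T
    return spectra - baseline


# Synthetic stand-in data (not from the study): one absorption band
# recorded with two different offsets, scales and slopes.
wavelengths = np.linspace(1600.0, 2400.0, 201)
band = np.exp(-((wavelengths - 1930.0) / 40.0) ** 2)
raw = np.vstack([
    0.2 + 1.0 * band,
    0.5 + 1.5 * band + 2e-4 * (wavelengths - 1600.0),
])

# SNV followed by de-trending, in the order the abbreviation SNV+D describes.
processed = detrend(snv(raw), wavelengths)
print(processed.shape)
```

Each row of `raw` is one spectrum, so the correction is applied leaf by leaf and needs no reference spectrum from the rest of the dataset.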
Parameter sets for SVM and ANN algorithms.
|  |  |  |  |  |  |
|---|---|---|---|---|---|
|  | 3.5 | 1 |  | 0.3 | 0.1 |
|  | 0.1 | 1 |  | 0.3 | 0.9 |
|  | 1 | 1 |  | 0.7 | 0.1 |
|  | 10 | 1 |  | 0.3 | 0.1 |
|  | 3.5 | 2 |  | 0.3 | 0.9 |
|  | 0.1 | 2 |  | 0.7 | 0.1 |
|  | 1 | 2 |  | 0.3 | 0.1 |
|  | 10 | 2 |  | 0.3 | 0.9 |
|  | 3.5 | 3 |  | 0.7 | 0.1 |
|  | 0.1 | 3 |  | 0.3 | 0.1 |
|  | 1 | 3 |  | 0.3 | 0.9 |
|  | 10 | 3 |  | 0.7 | 0.1 |
‡ 0: no processing elements (PEs) in hidden layer; a: PEs = (#attributes + #classes)/2; i: PEs = #attributes; o: PEs = #classes.
SVM: Support Vector Machine; ANN: Artificial Neural Network.
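The hidden-layer codes in the footnote above map to a number of processing elements as in the sketch below. The function name is hypothetical, rounding down in the `a` case is an assumption (the footnote gives only the formula), and the attribute count in the example is illustrative because the number of spectral attributes is not stated here.

```python
def hidden_layer_size(code, n_attributes, n_classes):
    """Translate a hidden-layer code (0, a, i, o) into a number of processing elements."""
    sizes = {
        "0": 0,                                # no hidden processing elements
        "a": (n_attributes + n_classes) // 2,  # (#attributes + #classes)/2, rounded down (assumed)
        "i": n_attributes,                     # one element per input attribute
        "o": n_classes,                        # one element per class
    }
    if code not in sizes:
        raise ValueError(f"unknown hidden-layer code: {code!r}")
    return sizes[code]


# Hypothetical example: 100 spectral attributes and the 20 varieties of the site-specific model.
for code in "0aio":
    print(code, hidden_layer_size(code, n_attributes=100, n_classes=20))
```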
Comparison of the mean percentages of correctly classified leaves with and without signal scatter correction, by algorithm, for the N = 20 and N = 5 datasets.
|  |  |  |
|---|---|---|
| 44.8 | 75.8 | 78.4 |
| 47.2 | 75.1 | 77.9 |
| 76.5 | 87.9 | 87.1 |
| 75.8 | 85.1 | 86.7 |
| * * |  |  |
n.s.: not significant (p ≥ 0.05); * *: p < 0.01; (Tukey’s range test at a significance level p = 0.05). SNV+D: Standard Normal Variate followed by De-trending; PLS-DA: Partial Least Squares Discriminant Analysis; SVM: Support Vector Machine; ANN: Artificial Neural Network.
Percentages of correctly classified grapevine leaves for each combination of Savitzky-Golay filter and algorithm, for N = 20 and N = 5 varieties.
|  |  |  |  |
|---|---|---|---|
| 45.3 B | 77.5 A | 81.2 A | * * * |
| 44.4 B | 69.7 A | 67.3 A | * * * |
| 51.0 B | 78.0 A | 84.3 A | * * * |
| 43.2 B | 76.7 A | 79.8 A | * * * |
| * * | * * * |  |  |
| 81.5 | 87.8 | 88.0 |  |
| 72.5 B | 79.3 A | 81.0 A | * |
| 84.5 B | 91.2 A | 91.6 A | * * * |
| 66.0 B | 87.8 A | 86.9 A | * * * |
| * * | * * * | * * * |  |
The values shown are the percentages of correctly classified leaves. Each value is, in turn, the average of the results obtained with and without scatter correction and, for SVM and ANN, over the 12 parameter sets.
PLS-DA: Partial Least Squares Discriminant Analysis; SVM: Support Vector Machine; ANN: Artificial Neural Network.
Uppercase letters denote row-wise comparisons (among algorithms) and italic lowercase letters denote column-wise comparisons (among Savitzky-Golay filters). n.s.: not significant (p ≥ 0.05); *: p < 0.05; * *: p < 0.01; * * *: p < 0.001.
Fig 2. Average raw (A) and processed spectra (B) with SNV+de-trending+Savitzky-Golay filter (second-degree derivative, window size 5) from all samples.
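A Savitzky-Golay derivative filter of the kind named in this caption can be sketched with NumPy alone. Two readings are assumed here and are not confirmed by the text: "second-degree derivative" is taken to mean the second derivative, and the local polynomial is taken to be of order 2. The function names are hypothetical, and edge points are simply dropped for brevity.

```python
from math import factorial

import numpy as np


def savgol_coefficients(window=5, polyorder=2, deriv=2):
    """Convolution weights that estimate the deriv-th derivative at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window: columns are 1, k, k^2, ...
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the fitted coefficient a_deriv;
    # the derivative at the centre is deriv! * a_deriv.
    return factorial(deriv) * np.linalg.pinv(design)[deriv]


def savgol_derivative(spectrum, window=5, polyorder=2, deriv=2, spacing=1.0):
    """Apply the filter to one spectrum; the (window - 1) edge points are dropped."""
    weights = savgol_coefficients(window, polyorder, deriv) / spacing ** deriv
    # np.convolve reverses its kernel, so reverse the weights to apply them in order.
    return np.convolve(np.asarray(spectrum, dtype=float), weights[::-1], mode="valid")


# Sanity check on a known curve: the second derivative of 3x^2 + x + 1 is 6 everywhere.
x = np.arange(10.0)
print(savgol_coefficients())  # weights proportional to (2, -1, -2, -1, 2)
print(savgol_derivative(3 * x ** 2 + x + 1))
```

In practice `scipy.signal.savgol_filter(spectrum, window_length=5, polyorder=2, deriv=2)` computes the same kind of filter and also handles the edge points.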
Confusion matrix from the execution with the best score (ANN, SNV+D, D2W5 and parameter set 10) with an overall correctly classified value of 87.25% (20 leaves per variety).
| Actual variety | Ve | M | V | A | T | G | WG | WT | PX | Vi | CF | Gr | CS | C | S | Te | PN | Ca | Ma | TN | % correct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ve | **10** | 1 | 3 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 50 |
| M | 0 | **14** | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 70 |
| V | 1 | 3 | **15** | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 75 |
| A | 0 | 0 | 0 | **19** | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 95 |
| T | 0 | 0 | 0 | 0 | **19** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 95 |
| G | 0 | 0 | 0 | 0 | 0 | **18** | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 90 |
| WG | 0 | 0 | 0 | 0 | 0 | 0 | **16** | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 80 |
| WT | 0 | 0 | 0 | 0 | 0 | 1 | 0 | **17** | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 85 |
| PX | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | **18** | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 90 |
| Vi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **19** | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 95 |
| CF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **20** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| Gr | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | **19** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 95 |
| CS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **20** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| C | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **19** | 0 | 0 | 0 | 0 | 0 | 0 | 95 |
| S | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | **18** | 0 | 0 | 0 | 0 | 0 | 90 |
| Te | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **17** | 1 | 0 | 0 | 0 | 85 |
| PN | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **18** | 0 | 1 | 0 | 90 |
| Ca | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | **19** | 0 | 0 | 95 |
| Ma | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 0 | **14** | 0 | 70 |
| TN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **20** | 100 |
Each row represents the actual variety, and each column the variety it was classified as. Bolded values (diagonal of the matrix) are the number of samples properly classified. The last column shows the correctly classified percentage for each variety.
ANN: Artificial Neural Network; SNV+D: Standard Normal Variate followed by De-trending; D2W5: Second-degree derivative and window size 5 Savitzky-Golay filter.
Ve: Verdejo; M: Malvasia; V: Viura; A: Albariño; T: Treixadura; G: Godello; WG: White Grenache; WT: White Tempranillo; PX: Pedro Ximénez; Vi: Viognier; CF: Cabernet Franc; Gr: Grenache; CS: Cabernet Sauvignon; C: Carmenere; S: Syrah; Te: Tempranillo; PN: Pinot Noir; Ca: Caladoc; Ma: Marselan; TN: Touriga Nacional.
Confusion matrix from the global dataset execution with the best score (ANN, NoSNV+D, D2W5 and parameter set 6) with an overall correctly classified value of 77.08% (24 leaves per variety).
| Actual variety | V | Gr | T | Te | S | A | % correct |
|---|---|---|---|---|---|---|---|
| V | **15** | 1 | 3 | 0 | 1 | 4 | 62.5 |
| Gr | 0 | **21** | 0 | 2 | 1 | 0 | 87.5 |
| T | 0 | 0 | **22** | 1 | 0 | 1 | 91.7 |
| Te | 1 | 3 | 1 | **17** | 2 | 0 | 70.8 |
| S | 1 | 0 | 0 | 6 | **16** | 1 | 66.7 |
| A | 0 | 0 | 1 | 0 | 3 | **20** | 83.3 |
Each row represents the actual variety, and each column the variety it was classified as. Bolded values (diagonal of the matrix) are the number of samples properly classified. The last column shows the correctly classified percentage for each variety.
ANN: Artificial Neural Network; NoSNV+D: No application of Standard Normal Variate followed by De-trending; D2W5: Second-degree derivative and window size 5 Savitzky-Golay filter.
V: Viura; Gr: Grenache; T: Treixadura; Te: Tempranillo; S: Syrah; A: Albariño.
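The per-variety and overall percentages of the global confusion matrix can be checked with a few lines of arithmetic. In this sketch the off-diagonal counts are those listed for the matrix. The diagonal counts are inferred as 24 leaves per variety minus the misclassified leaves in each row, and the row and column order is assumed to follow the abbreviation legend. The function names are hypothetical.

```python
# Rows: actual variety; columns: predicted variety (assumed order V, Gr, T, Te, S, A).
labels = ["V", "Gr", "T", "Te", "S", "A"]
confusion = [
    [15, 1, 3, 0, 1, 4],
    [0, 21, 0, 2, 1, 0],
    [0, 0, 22, 1, 0, 1],
    [1, 3, 1, 17, 2, 0],
    [1, 0, 0, 6, 16, 1],
    [0, 0, 1, 0, 3, 20],
]


def per_class_accuracy(matrix):
    """Percentage of each actual class (row) that lands on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all samples that land on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for label, pct in zip(labels, per_class_accuracy(confusion)):
    print(f"{label}: {pct:.1f}%")
print(f"overall: {overall_accuracy(confusion):.2f}%")  # overall: 77.08%
```

With these counts, 111 of the 144 leaves fall on the diagonal, which reproduces the reported 77.08%.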