Salvador Gutiérrez, Javier Tardaguila, Juan Fernández-Novales, Maria P Diago.
Abstract
Plant phenotyping is an important topic in agriculture. In this context, data mining strategies may be applied to agricultural data retrieved with new non-invasive devices, with the aim of yielding useful, reliable and objective information. This work presents some applications of machine learning algorithms, combined with NIR spectral data acquired in the field, for plant phenotyping in viticulture, specifically for grapevine variety discrimination and assessment of plant water status. Support vector machine (SVM), rotation forest and M5 tree models were built using NIR spectra acquired in the field directly on the adaxial side of grapevine leaves with a non-invasive portable spectrophotometer working in the 1600–2400 nm spectral range. The ν-SVM algorithm was used to train the varietal classification model. For the 10 varieties, the classifier reached accuracies of 88.7% in cross-validation and 92.5% in external validation. For water stress assessment, the models developed using the absorbance spectra of six varieties yielded the same determination coefficient in cross- and external validation (R² = 0.84; RMSE of 0.164 and 0.165 MPa, respectively). Furthermore, a variety-specific model trained only with Tempranillo samples from two different vintages yielded R² = 0.76 and RMSE of 0.16 MPa in cross-validation, and R² = 0.79 and RMSE of 0.17 MPa in external validation. These results show the power of the combined use of data mining and non-invasive NIR sensing for in-field grapevine phenotyping and its usefulness for the wine industry and precision viticulture implementations.
Keywords: SVM; non-destructive; plant water status; regression tree; rotation forest; stem water potential; variety classification
Year: 2016 PMID: 26891304 PMCID: PMC4801612 DOI: 10.3390/s16020236
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Diagram of the datasets used in both experiments and of the different calibration and validation processes.
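Figure 1 separates an external validation set from a calibration set on which 5-fold cross-validation is run. The sketch below shows one way to produce such a split in plain Python, using the sizes of the six-variety ψ dataset reported in the statistics table (118 samples: 94 for calibration, 24 for external validation). The random shuffling, the fixed seed and the round-robin fold assignment are assumptions for illustration only; the source does not describe how samples were assigned, and `holdout_then_kfold` and `cv_rounds` are hypothetical helpers.

```python
import random


def holdout_then_kfold(n_samples, n_external, k=5, seed=0):
    """Hypothetical helper: hold out an external set, then split the rest into k folds.

    Random shuffling with a fixed seed is an illustrative assumption,
    not the procedure documented by the authors.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    external = indices[:n_external]      # never used during calibration
    calibration = indices[n_external:]   # used for model fitting and CV
    # Deal calibration samples round-robin so fold sizes differ by at most one.
    folds = [calibration[i::k] for i in range(k)]
    return calibration, external, folds


def cv_rounds(folds):
    """Yield (train, test) index lists: each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


calibration, external, folds = holdout_then_kfold(118, 24)
print(len(calibration), len(external), [len(f) for f in folds])
# prints: 94 24 [19, 19, 19, 19, 18]
for train, test in cv_rounds(folds):
    assert not set(train) & set(test)  # training and test folds never overlap
```

Calling the same helper with `n_samples=56, n_external=11` gives the 45/11 calibration/external split used for the Tempranillo-specific model, with five folds of 9 samples.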
Confusion matrix of grapevine varietal classification using support vector machines and a 5-fold cross-validation. The diagonal of the matrix corresponds to the number of samples that were properly classified. The last column displays, for each variety, the correctly classified percentage (n = 159).
| Actual variety / Predicted variety | CS | CL | CR | WG | PX | PN | TE | TR | VO | VU | % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CS | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| CL | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 93.8 |
| CR | 1 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93.8 |
| WG | 0 | 0 | 0 | 12 | 0 | 0 | 1 | 0 | 2 | 0 | 80.0 |
| PX | 0 | 1 | 0 | 0 | 13 | 0 | 0 | 0 | 1 | 1 | 81.3 |
| PN | 0 | 0 | 0 | 1 | 0 | 14 | 0 | 0 | 0 | 1 | 87.5 |
| TE | 0 | 0 | 0 | 0 | 0 | 1 | 15 | 0 | 0 | 0 | 93.8 |
| TR | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 14 | 0 | 0 | 87.5 |
| VO | 0 | 1 | 0 | 0 | 2 | 0 | 1 | 0 | 12 | 0 | 75.0 |
| VU | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 15 | 93.8 |
CS: Cabernet Sauvignon; CL: Caladoc; CR: Carmenere; WG: White Grenache; PX: Pedro Ximenez; PN: Pinot Noir; TE: Tempranillo; TR: Treixadura; VO: Viognier; VU: Viura.
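The percentage column is the per-variety true positive rate, and the overall cross-validation accuracy is the sum of the diagonal divided by n. The sketch below computes both in plain Python, together with the false positive rate and precision reported in the next table. The matrix is written out in full; its diagonal counts (16, 15, 15, 12, 13, 14, 15, 14, 12, 15) are inferred from the off-diagonal entries and the per-class true positive rates rather than copied verbatim from the source, and `per_class_metrics` is a hypothetical helper name.

```python
LABELS = ["CS", "CL", "CR", "WG", "PX", "PN", "TE", "TR", "VO", "VU"]

# Rows: actual variety; columns: predicted variety (5-fold CV, n = 159).
CV_MATRIX = [
    [16, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # CS
    [0, 15, 0, 0, 0, 0, 0, 0, 1, 0],   # CL
    [1, 0, 15, 0, 0, 0, 0, 0, 0, 0],   # CR
    [0, 0, 0, 12, 0, 0, 1, 0, 2, 0],   # WG
    [0, 1, 0, 0, 13, 0, 0, 0, 1, 1],   # PX
    [0, 0, 0, 1, 0, 14, 0, 0, 0, 1],   # PN
    [0, 0, 0, 0, 0, 1, 15, 0, 0, 0],   # TE
    [0, 0, 1, 0, 0, 1, 0, 14, 0, 0],   # TR
    [0, 1, 0, 0, 2, 0, 1, 0, 12, 0],   # VO
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 15],   # VU
]


def per_class_metrics(matrix):
    """Return per-class (TPR, FPR, precision) tuples and the overall accuracy."""
    n = sum(sum(row) for row in matrix)
    metrics = []
    for i, row in enumerate(matrix):
        tp = row[i]                              # correctly classified samples
        support = sum(row)                       # actual samples of class i
        predicted = sum(r[i] for r in matrix)    # samples predicted as class i
        fp = predicted - tp                      # other classes predicted as i
        tpr = tp / support
        fpr = fp / (n - support)                 # share of negatives labelled i
        precision = tp / predicted if predicted else 0.0
        metrics.append((tpr, fpr, precision))
    accuracy = sum(matrix[i][i] for i in range(len(matrix))) / n
    return metrics, accuracy


metrics, accuracy = per_class_metrics(CV_MATRIX)
for label, (tpr, fpr, precision) in zip(LABELS, metrics):
    print(f"{label}: TPR {tpr:.4f}, FPR {fpr:.4f}, precision {precision:.4f}")
print(f"accuracy = {accuracy:.1%}")  # prints: accuracy = 88.7%
```

The published table rounds to three decimals, so, for example, Pedro Ximenez's 13/16 = 0.8125 appears there as 0.813.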
Detailed accuracy by class of the grapevine varietal classification using support vector machines and a 5-fold cross-validation (n = 159).
| Class | True Positive Rate | False Positive Rate | Precision | AUC |
|---|---|---|---|---|
| CS | 1.000 | 0.007 | 0.941 | 0.997 |
| CL | 0.938 | 0.014 | 0.882 | 0.997 |
| CR | 0.938 | 0.007 | 0.938 | 0.998 |
| WG | 0.800 | 0.007 | 0.923 | 0.985 |
| PX | 0.813 | 0.021 | 0.813 | 0.980 |
| PN | 0.875 | 0.014 | 0.875 | 0.976 |
| TE | 0.938 | 0.014 | 0.882 | 0.997 |
| TR | 0.875 | 0.000 | 1.000 | 0.999 |
| VO | 0.750 | 0.028 | 0.750 | 0.992 |
| VU | 0.938 | 0.014 | 0.882 | 0.992 |
| Weighted average | 0.887 | 0.013 | 0.888 | 0.991 |
AUC: area under the receiver operating characteristic (ROC) curve.
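The Weighted average row is consistent with weighting each class by its number of actual samples. The sketch below reproduces it from the rounded per-class values; the class sizes (16 samples per variety, 15 for White Grenache) are inferred from the confusion matrix and the true positive rates rather than stated in the source, and `weighted_average` is a hypothetical helper.

```python
# Per-class rows copied from the table: (TPR, FPR, precision, AUC).
CLASS_ROWS = {
    "CS": (1.000, 0.007, 0.941, 0.997),
    "CL": (0.938, 0.014, 0.882, 0.997),
    "CR": (0.938, 0.007, 0.938, 0.998),
    "WG": (0.800, 0.007, 0.923, 0.985),
    "PX": (0.813, 0.021, 0.813, 0.980),
    "PN": (0.875, 0.014, 0.875, 0.976),
    "TE": (0.938, 0.014, 0.882, 0.997),
    "TR": (0.875, 0.000, 1.000, 0.999),
    "VO": (0.750, 0.028, 0.750, 0.992),
    "VU": (0.938, 0.014, 0.882, 0.992),
}
# Actual samples per class: an assumption inferred from the confusion matrix.
SUPPORT = {label: 16 for label in CLASS_ROWS}
SUPPORT["WG"] = 15


def weighted_average(rows, support):
    """Average each metric column, weighting every class by its support."""
    n = sum(support.values())
    n_metrics = len(next(iter(rows.values())))
    return tuple(
        sum(support[label] * values[k] for label, values in rows.items()) / n
        for k in range(n_metrics)
    )


names = ("TPR", "FPR", "Precision", "AUC")
for name, value in zip(names, weighted_average(CLASS_ROWS, SUPPORT)):
    print(f"{name}: {value:.3f}")
# prints (one per line): TPR: 0.887, FPR: 0.013, Precision: 0.888, AUC: 0.991
```

Weighting each class's TPR by its size gives (total correctly classified)/n, which is why the weighted TPR equals the overall cross-validation accuracy of 88.7%.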
Confusion matrix of grapevine varietal classification using support vector machines and an external validation of 40 samples. The diagonal of the matrix corresponds to the number of samples that were properly classified. The last column displays, for each variety, the correctly classified percentage (n = 40).
| Actual variety / Predicted variety | CS | CL | CR | WG | PX | PN | TE | TR | VO | VU | % |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CS | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| CL | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| CR | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| WG | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| PX | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 100.0 |
| PN | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 75.0 |
| TE | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 75.0 |
| TR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 100.0 |
| VO | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 75.0 |
| VU | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 100.0 |
CS: Cabernet Sauvignon; CL: Caladoc; CR: Carmenere; WG: White Grenache; PX: Pedro Ximenez; PN: Pinot Noir; TE: Tempranillo; TR: Treixadura; VO: Viognier; VU: Viura.
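With only 40 samples in the external validation, the errors are easiest to read as a list of off-diagonal cells. The sketch below writes the matrix out in full (diagonal counts of 4, or 3 for the three varieties with a 75% rate, inferred from the per-class true positive rates) and lists every actual/predicted confusion together with the overall accuracy; `misclassifications` is a hypothetical helper.

```python
LABELS = ["CS", "CL", "CR", "WG", "PX", "PN", "TE", "TR", "VO", "VU"]

# Rows: actual variety; columns: predicted variety (external validation, n = 40).
EXT_MATRIX = [
    [4, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # CS
    [0, 4, 0, 0, 0, 0, 0, 0, 0, 0],  # CL
    [0, 0, 4, 0, 0, 0, 0, 0, 0, 0],  # CR
    [0, 0, 0, 4, 0, 0, 0, 0, 0, 0],  # WG
    [0, 0, 0, 0, 4, 0, 0, 0, 0, 0],  # PX
    [0, 0, 0, 0, 0, 3, 0, 0, 0, 1],  # PN
    [0, 0, 0, 0, 0, 1, 3, 0, 0, 0],  # TE
    [0, 0, 0, 0, 0, 0, 0, 4, 0, 0],  # TR
    [0, 0, 0, 1, 0, 0, 0, 0, 3, 0],  # VO
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 4],  # VU
]


def misclassifications(matrix, labels):
    """List (actual, predicted, count) for every non-zero off-diagonal cell."""
    return [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]


correct = sum(EXT_MATRIX[i][i] for i in range(len(EXT_MATRIX)))
total = sum(sum(row) for row in EXT_MATRIX)
print(misclassifications(EXT_MATRIX, LABELS))
# prints: [('PN', 'VU', 1), ('TE', 'PN', 1), ('VO', 'WG', 1)]
print(f"accuracy = {correct}/{total} = {correct / total:.1%}")
# prints: accuracy = 37/40 = 92.5%
```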
Detailed accuracy by class of the grapevine varietal classification using support vector machines and an external validation of 40 samples (n = 40).
| Class | True Positive Rate | False Positive Rate | Precision | AUC |
|---|---|---|---|---|
| CS | 1.000 | 0.000 | 1.000 | 1.000 |
| CL | 1.000 | 0.000 | 1.000 | 1.000 |
| CR | 1.000 | 0.000 | 1.000 | 1.000 |
| WG | 1.000 | 0.028 | 0.800 | 0.993 |
| PX | 1.000 | 0.000 | 1.000 | 1.000 |
| PN | 0.750 | 0.028 | 0.750 | 0.972 |
| TE | 0.750 | 0.000 | 1.000 | 1.000 |
| TR | 1.000 | 0.000 | 1.000 | 1.000 |
| VO | 0.750 | 0.000 | 1.000 | 1.000 |
| VU | 1.000 | 0.028 | 0.800 | 1.000 |
| Weighted average | 0.925 | 0.008 | 0.935 | 0.997 |
AUC: area under the receiver operating characteristic (ROC) curve.
Stem water potential (ψ) ranges per variety.
| ψ (MPa) | Godello | Pedro Ximenez | Grenache | Carmenere | Tempranillo | Marselan |
|---|---|---|---|---|---|---|
| Min | −0.90 | −0.65 | −1.15 | −1.45 | −1.85 | −1.02 |
| Max | −0.62 | −0.42 | −0.85 | −1.10 | −1.62 | −0.85 |
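Reading the first row of the table as each variety's minimum ψ and the second as its maximum, the range of the pooled six-variety dataset reported in the next table (−1.85 to −0.42 MPa) is simply the envelope of these per-variety ranges. The sketch below checks this and reports the span of each variety; more negative ψ indicates stronger water stress. `PSI_RANGES` is a hypothetical name for the values copied from the table.

```python
# (min, max) stem water potential per variety in MPa, copied from the table.
PSI_RANGES = {
    "Godello": (-0.90, -0.62),
    "Pedro Ximenez": (-0.65, -0.42),
    "Grenache": (-1.15, -0.85),
    "Carmenere": (-1.45, -1.10),
    "Tempranillo": (-1.85, -1.62),
    "Marselan": (-1.02, -0.85),
}

# The envelope of all per-variety ranges is the range of the pooled dataset.
most_stressed = min(PSI_RANGES, key=lambda v: PSI_RANGES[v][0])
least_stressed = max(PSI_RANGES, key=lambda v: PSI_RANGES[v][1])
pooled = (PSI_RANGES[most_stressed][0], PSI_RANGES[least_stressed][1])
print(f"pooled range: {pooled[0]} to {pooled[1]} MPa "
      f"({most_stressed} to {least_stressed})")
# prints: pooled range: -1.85 to -0.42 MPa (Tempranillo to Pedro Ximenez)

for variety, (low, high) in PSI_RANGES.items():
    print(f"{variety}: span {high - low:.2f} MPa")
```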
Statistical overview and results of ψ (MPa) estimation using rotation forest and M5 trees.
| n | Min | Max | Mean | SD | Calibration R² (n = 94) | Calibration RMSE | 5-fold CV R² (n = 94) | 5-fold CV RMSE | External R² (n = 24) | External RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 118 | −1.85 | −0.42 | −1.03 | 0.396 | 0.97 | 0.083 | 0.84 | 0.164 | 0.84 | 0.165 |
n: number of samples; Min: minimum; Max: maximum; SD: standard deviation; RMSE: root-mean-square error in MPa.
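R² and RMSE in this table compare measured and predicted ψ. The sketch below implements RMSE and two common definitions of R²: the coefficient of determination (1 − SS_res/SS_tot) and the squared Pearson correlation between measured and predicted values. The source does not state which definition was used, so both are shown; they coincide for an ordinary least-squares linear fit with an intercept evaluated on its own training data, but can differ on cross-validated or external predictions. The toy values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs (MPa for ψ)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2_determination(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def r2_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical measured and predicted ψ values (MPa), for illustration only.
measured = [-1.80, -1.50, -1.20, -0.90, -0.60]
predicted = [-1.70, -1.55, -1.05, -1.00, -0.65]
print(f"RMSE = {rmse(measured, predicted):.3f} MPa")
print(f"R2 (1 - SS_res/SS_tot) = {r2_determination(measured, predicted):.3f}")
print(f"R2 (squared Pearson r) = {r2_pearson(measured, predicted):.3f}")
```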
Figure 2. Regression plot for ψ estimation using a rotation forest and M5 trees with (a) 5-fold cross-validation and (b) external validation. Prediction confidence bands are shown at the 95% level (dashed lines). The solid line represents the regression line and the dotted line the 1:1 line. The color and shape of each point indicate its absolute error |ε| (the absolute difference between the actual and predicted values) in MPa: green ●: |ε| < 0.1, minimal error; olive ■: 0.1 ≤ |ε| < 0.2, low error; orange ►: 0.2 ≤ |ε| < 0.4, moderate error; red ▼: |ε| ≥ 0.4, high error.
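The point styling in the figure bins each prediction by its absolute error. A minimal sketch of that binning, assuming only the thresholds given in the caption; `error_class` is a hypothetical helper and the measured/predicted pairs are made up for illustration.

```python
from collections import Counter


def error_class(actual, predicted):
    """Bin the absolute error |actual - predicted| (MPa) as in the figure legend."""
    err = abs(actual - predicted)
    if err < 0.1:
        return "minimal"
    if err < 0.2:
        return "low"
    if err < 0.4:
        return "moderate"
    return "high"


# Hypothetical (measured, predicted) ψ pairs in MPa.
pairs = [(-1.20, -1.15), (-0.90, -1.05), (-1.60, -1.30), (-0.70, -1.20)]
counts = Counter(error_class(a, p) for a, p in pairs)
print(counts)
```

The Tempranillo-specific plots use the same thresholds, so the same function applies to them unchanged.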
Statistical overview and results of ψ (MPa) estimation for the variety-specific model (Tempranillo) using rotation forest and M5 trees.
| n | Min | Max | Mean | SD | Calibration R² (n = 45) | Calibration RMSE | 5-fold CV R² (n = 45) | 5-fold CV RMSE | External R² (n = 11) | External RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 56 | −1.85 | −0.8 | −1.447 | 0.314 | 0.92 | 0.098 | 0.76 | 0.159 | 0.79 | 0.168 |
n: number of samples; Min: minimum; Max: maximum; SD: standard deviation; RMSE: root-mean-square error in MPa.
Figure 3. Regression plot for ψ estimation of the variety-specific model (Tempranillo) using a rotation forest and M5 trees with (a) 5-fold cross-validation and (b) external validation. Prediction confidence bands are shown at the 95% level (dashed lines). The solid line represents the regression line and the dotted line the 1:1 line. The color and shape of each point indicate its absolute error |ε| (the absolute difference between the actual and predicted values) in MPa: green ●: |ε| < 0.1, minimal error; olive ■: 0.1 ≤ |ε| < 0.2, low error; orange ▼: 0.2 ≤ |ε| < 0.4, moderate error; red ►: |ε| ≥ 0.4, high error.