Véronique Gomes, Ana Mendes-Ferreira, Pedro Melo-Pinto.
Abstract
Remote sensing technology, such as hyperspectral imaging, in combination with machine learning algorithms, has emerged as a viable tool for rapid and nondestructive assessment of wine grape ripeness. However, the differences in terroir, together with the climatic variations and the variability exhibited by different grape varieties, have a considerable impact on the grape ripening stages within a vintage and between vintages and, consequently, on the robustness of the predictive models. To address this challenge, we present a novel one-dimensional convolutional neural network architecture-based model for the prediction of sugar content and pH, using reflectance hyperspectral data from different vintages. We aimed to evaluate the model's generalization capacity for different varieties and for a different vintage not employed in the training process, using independent test sets. A transfer learning mechanism, based on the proposed convolutional neural network, was also used to evaluate improvements in the model's generalization. Overall, the results for generalization ability showed a very good performance with RMSEP values of 1.118 °Brix and 1.085 °Brix for sugar content and 0.199 and 0.183 for pH, for test sets using different varieties and a different vintage, respectively, improving and updating the current state of the art.
Keywords: convolutional neural networks; grape berries; hyperspectral imaging; machine learning; prediction; transfer learning
Year: 2021; PMID: 34063552; PMCID: PMC8156429; DOI: 10.3390/s21103459
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Number of samples collected for each vintage and variety.
| Vintage | Variety | No. of Samples |
|---|---|---|
| 2012 | Touriga Franca | 240 |
| 2013 | Touriga Franca | 81 |
| 2013 | Touriga Nacional | 60 |
| 2013 | Tinta Barroca | 82 |
| 2014 | Touriga Franca | 120 |
| 2014 | Touriga Nacional | 118 |
| 2014 | Tinta Barroca | 120 |
| 2016 | Touriga Franca | 407 |
| 2016 | Touriga Nacional | 132 |
| 2016 | Tinta Barroca | 143 |
| 2017 | Touriga Franca | 540 |
| 2017 | Touriga Nacional | 144 |
| 2017 | Tinta Barroca | 118 |
| 2018 | Touriga Franca | 360 |
Figure 1. One-dimensional convolutional neural network architecture design.
Bayesian optimization hyperparameter settings.
| Hyperparameter | Range Values |
|---|---|
| Convolution layer 1—number of filters (#Filters 1) | 5–256 |
| Convolution layer 1—kernel size 1 | 3–100 |
| Convolution layer 2—number of Filters (#Filters 2) | 5–256 |
| Convolution layer 2—kernel size 2 | 3–100 |
| Dense No. of neurons (neurons) | 4–256 |
| Dropout rate (dropout 1/2) | 0.1–0.6 |
| Learning rate (LR) | 0.01–0.06 |
| Batch size | 8–260 |
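The search ranges above can be written down as a plain configuration object. The following is a minimal sketch only: it draws candidates uniformly at random as a stand-in and does not implement the Bayesian optimization used in the paper. The names `SEARCH_SPACE`, `sample_configuration`, and `in_bounds` are hypothetical, and treating the two dropout rates as separate parameters sharing one range is an assumption based on the "dropout 1/2" label.

```python
import random

# Ranges copied from the table above: (low, high, kind).
# Key names are illustrative and do not come from the paper.
SEARCH_SPACE = {
    "filters_1": (5, 256, "int"),
    "kernel_size_1": (3, 100, "int"),
    "filters_2": (5, 256, "int"),
    "kernel_size_2": (3, 100, "int"),
    "neurons": (4, 256, "int"),
    "dropout_1": (0.1, 0.6, "float"),
    "dropout_2": (0.1, 0.6, "float"),
    "learning_rate": (0.01, 0.06, "float"),
    "batch_size": (8, 260, "int"),
}


def sample_configuration(rng):
    """Draw one candidate uniformly at random (a stand-in, not Bayesian optimization)."""
    config = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind == "int":
            config[name] = rng.randint(low, high)  # both bounds inclusive
        else:
            config[name] = rng.uniform(low, high)
    return config


def in_bounds(config):
    """Check that every hyperparameter lies inside its search range."""
    return all(
        SEARCH_SPACE[name][0] <= value <= SEARCH_SPACE[name][1]
        for name, value in config.items()
    )


candidate = sample_configuration(random.Random(0))
```

A Bayesian optimizer would replace `sample_configuration` with a proposal step that uses the validation errors of earlier candidates; the bounds check stays the same.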
Figure 2. Sampling characterization for each vintage and variety: (a) sugar reference measurements; (b) pH reference measurements.
Optimized hyperparameters of 1D CNN for each preprocessing method using BOGP.
| Preprocessing | #Filters 1 | Kernel Size 1 | #Filters 2 | Kernel Size 2 | Neurons | Dropout 1/2 | LR | Batch Size |
|---|---|---|---|---|---|---|---|---|
| MSC | 39 | 40 | 60 | 7 | 128 | 0.20/0.15 | 0.050 | 8 |
| Norm | 34 | 50 | 47 | 9 | 128 | 0.15/0.15 | 0.039 | 8 |
| SG | 60 | 50 | 60 | 3 | 128 | 0.40/0.20 | 0.033 | 8 |
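To give a feel for what these kernel sizes and filter counts imply, the sketch below computes the output length and trainable-parameter count of two stacked 1D convolutions using the SG row. It assumes a single-channel input, stride 1, no padding, and no pooling between the two layers; none of these details is stated in this record, and the input length of 1000 spectral bands is purely hypothetical.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution with no padding ('valid')."""
    if kernel_size > input_length:
        raise ValueError("kernel is longer than the input")
    return (input_length - kernel_size) // stride + 1


def conv1d_parameter_count(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


N_BANDS = 1000  # hypothetical number of spectral bands, not from the paper

# SG row of the table: 60 filters / kernel 50, then 60 filters / kernel 3.
length_1 = conv1d_output_length(N_BANDS, kernel_size=50)   # 951
length_2 = conv1d_output_length(length_1, kernel_size=3)   # 949
params_1 = conv1d_parameter_count(in_channels=1, filters=60, kernel_size=50)   # 3060
params_2 = conv1d_parameter_count(in_channels=60, filters=60, kernel_size=3)   # 10860
```

Under these assumptions the wide first kernel dominates the loss of sequence length, while the second layer holds most of the convolutional weights because it sees 60 input channels.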
Model performance of the optimized 1D CNN for each preprocessing method.
| Parameter | Preprocessing | RMSEV (Validation Set) | RMSEP (Test Set) |
|---|---|---|---|
| Sugar | MSC | 0.765 °Brix | 0.806 °Brix |
| Sugar | Norm | 0.743 °Brix | 0.791 °Brix |
| Sugar | SG | 0.726 °Brix | 0.755 °Brix |
| pH | MSC | 0.150 | 0.146 |
| pH | Norm | 0.127 | 0.124 |
| pH | SG | 0.119 | 0.110 |
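RMSEV and RMSEP are the same root-mean-square error computed on the validation set and the test set, respectively. A minimal sketch of that computation follows; the function name `rmse` and the three sample values are made up for illustration and are not data from the study.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sugar values in °Brix; every prediction is off by exactly 1.
example_rmse = rmse([20.0, 22.0, 24.0], [21.0, 21.0, 25.0])  # 1.0
```

Because the error is expressed in the unit of the measured quantity, the sugar values in the table carry °Brix while the pH values are unitless.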
Figure 3. Absolute percentage error of each preprocessing technique regarding the 1D CNN created when applied to independent test samples for (a) sugar measurements and (b) pH measurements.
Figure 4. RMSEs of predictions for each preprocessing technique regarding the 1D CNN created and then applied to test samples for (a) sugar measurements and (b) pH measurements. The boxes represent the 25th, 50th, and 75th percentiles, the whiskers represent the minimum and maximum values, and the plus symbol denotes the mean reference values.
Figure 5. Prediction results of the independent test set with samples of the varieties TN (green points) and TB (orange points) when introduced into the 1D CNN model created with TF samples, regarding (a) sugar measurements and (b) pH measurements.
Figure 6. Percentiles for absolute percentage error of sugar and pH in the independent test set (TB and TN).
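The absolute percentage error and its percentiles, as plotted in Figures 3 and 6, can be sketched as follows. The function names are hypothetical, the linear-interpolation percentile is one common convention and may differ from the one the authors used, and the sample values are made up.

```python
import math


def absolute_percentage_errors(reference, predicted):
    """Per-sample |reference - predicted| / |reference|, in percent."""
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    return [abs(r - p) / abs(r) * 100.0 for r, p in zip(reference, predicted)]


def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = q / 100.0 * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


# Made-up sugar values in °Brix, not data from the study.
errors = absolute_percentage_errors([20.0, 25.0, 16.0, 10.0],
                                    [21.0, 24.0, 16.8, 10.0])
median_error = percentile(errors, 50)
```

Unlike RMSE, this measure is relative to each reference value, which makes errors for sugar and pH comparable on one scale.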
Predictive results of sugar and pH for 1D CNN tested with samples of TN and TB varieties.
| Parameter | RMSEP (TN) | RMSEP (TB) |
|---|---|---|
| Sugar | 1.025 °Brix | 1.203 °Brix |
| pH | 0.234 | 0.158 |
Optimized hyperparameters of 1D CNN for training process with samples of TF from 2012 until 2017 using BOGP.
| Preprocessing | #Filters 1 | Kernel Size 1 | #Filters 2 | Kernel Size 2 | Neurons | Dropout 1/2 | LR | Batch Size |
|---|---|---|---|---|---|---|---|---|
| SG | 15 | 32 | 29 | 19 | 90 | 0.41/0.20 | 0.043 | 8 |
Results obtained by the 1D CNN TF Model (2012–2017).
| Parameter | RMSEV | RMSEP |
|---|---|---|
| Sugar | 1.227 °Brix | 1.396 °Brix |
| pH | 0.182 | 0.223 |
Figure 7. Prediction results of the TL-TF Model (2017) for (a) sugar measurements and (b) pH measurements.
Figure 8. Absolute percentage error of each TF test sample from 2018 when applied to both created models: (a) sugar measurements; (b) pH measurements.
Figure A1. Boxplot of the descriptive statistics for sugar content reference values used for the training process and independent test phases. The boxes represent the 25th, 50th and 75th percentiles, the whiskers represent the fifth and 95th percentiles, the lower and upper open circles represent the minimum and maximum values, and the plus symbol denotes the mean values.
Figure A2. Boxplot of the descriptive statistics for pH reference values used for the training process and independent test phases. The boxes represent the 25th, 50th and 75th percentiles, the whiskers represent the fifth and 95th percentiles, the lower and upper open circles represent the minimum and maximum values, and the plus symbol denotes the mean values.
Figure A3. Absolute errors obtained for the independent test set comprising TF (blue points), TB (orange points), and TN (green points) varieties in (a) sugar measurements and (b) pH measurements.
Comparison of results from the present work and from other works published in the literature for the prediction of sugar content and pH using spectroscopic techniques in reflectance mode.
| Category | Ref. | Method | Sugar RMSE (°Brix) | pH RMSE |
|---|---|---|---|---|
| Different vintages (six) | Present work | | 0.755 | 0.110 |
| Testing with a different vintage | Present work | | 1.085 | 0.183 |
| Testing with different varieties | Present work | | 1.025/1.203 | 0.234/0.158 |
| | | | | |
| Used more than two vintages | [ | SVM | 1.411 | 0.144 |
| Tested with a different vintage | [ | ANN | - | 0.191 |
| Tested with a different vintage | [ | PLS | 1.344 | - |
| Tested with a different vintage | | ANN | 1.355 | - |
| Tested with different varieties | [ | ANN | - | 0.170/0.176 |
| Tested with different varieties | [ | SVM | 2.443/3.186 | 0.303/0.253 |
| Used one vintage | [ | PLS | 1.270 | - |
| Used one vintage | [ | PLS | 0.939 | |
| Used one vintage | | ANN | 0.955 | |
| Used one vintage | [ | PLS | 1.150 | - |
| Used one vintage | [ | ANN | 0.950 | 0.180 |
| Used one vintage + blending varieties | [ | LS-SVM * | 0.960 | Different range of pH values |
| Used one vintage + blending varieties | | PLS | 0.930 | |
| | | | | |
| Used more than two vintages | [ | MPLS ** | 1.000 | 0.120 |
| Used one vintage | [ | MPLS ** | 1.370 | 0.120 |
| Used one vintage | [ | MPLS ** | - | 0.150 |
| Tested with a different vintage | [ | PLS | 1.090 | 0.060 |
| Used one vintage + blending varieties | [ | PLS | 0.650 | 0.050 |
* Least-squares support vector machines; ** modified partial least squares.