Dejan Ljubobratović, Marko Vuković, Marija Brkić Bakarić, Tomislav Jemrić, Maja Matetić.
Abstract
To date, many machine learning models have been used for peach maturity prediction using non-destructive data, but the performance of these models has not been compared on such datasets. In this study, eight machine learning models were trained on a dataset containing data from 180 'Suncrest' peaches. Before the models were trained, the dataset was subjected to dimensionality reduction using least absolute shrinkage and selection operator (LASSO) regularization, and 8 input variables (out of 29) were chosen. In addition, the set of variables was divided into three subgroups, and group LASSO regularization singled out the subgroup consisting of the peach ground color measurements. This type of variable subgroup selection provided valuable information on the contribution of specific groups of peach traits to the maturity prediction. The area under the receiver operating characteristic curve (AUC) values of the selected models were compared, and the artificial neural network (ANN) model achieved the best performance, with an average AUC of 0.782. The second-best machine learning model was linear discriminant analysis with an AUC of 0.766, followed by logistic regression, gradient boosting machine, random forest, support vector machines, a classification and regression trees model, and k-nearest neighbors. Although AUC was the primary parameter used to determine model performance, accuracy, F1 score, and kappa served as control parameters and ultimately confirmed the obtained results. By outperforming the other models, ANN proved to be the most accurate model for peach maturity prediction on the given dataset.
Keywords: AUC; artificial neural networks; dimensionality reduction; fruit quality; group lasso; lasso regularization; machine learning; non-destructive measurements; peach maturity prediction
Year: 2022 PMID: 35957349 PMCID: PMC9371007 DOI: 10.3390/s22155791
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Graphical representation of MSE values obtained by cross-validation used to select the best lambda value.
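To illustrate how the lambda penalty drives variable selection, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent with soft-thresholding. It is not the authors' implementation: the function names, the toy data, and the value `lam=0.5` are assumptions made for illustration, and the inputs are assumed to be centered so that no intercept is needed. In practice, lambda would be chosen as the value with the lowest cross-validated MSE, as Figure 1 indicates.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows and y a list of targets; both are assumed centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical toy data: y depends only on the first feature.
X = [[-2, 1], [-1, -2], [0, 0], [1, 2], [2, -1]]
y = [-6, -3, 0, 3, 6]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

On this toy data the coefficient of the irrelevant second feature is set exactly to zero, while the coefficient of the first feature is shrunk below its least-squares value of 3. The same mechanism is what reduces the 29 candidate variables to a smaller subset.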
List of the dataset variables used in model training.
| Feature | Variable Name | Description |
|---|---|---|
| fruit maturity | ripe | peach maturity (output binary variable) |
| fruit length | fruit_length | peach length |
| fruit shape index | fruit_shape_index | peach shape index |
| a_AC | a_AC | |
| C_AC | C_AC | |
| dE2000-AC | dE2000_AC | dE2000 for additional color |
| L_GC | L_GC | |
| a_GC | a_GC | |
| c_GC | c_GC | |
List of the variables the dataset contained before dimensionality reduction.
| Feature | Variable Name | Description | Group |
|---|---|---|---|
| fruit firmness | firmness | peach firmness | output var. |
| fruit weight | fruit_weight | peach weight | 1 |
| fruit width | fruit_width | peach width | 1 |
| fruit length | fruit_length | peach length | 1 |
| fruit shape index | fruit_shape_index | peach shape index | 1 |
| fruit diameter | fruit_diameter | peach diameter | 1 |
| fruit volume | fruit_volume | peach volume | 1 |
| fruit density | fruit_density | peach density | 1 |
| L_AC | L_AC | | 2 |
| a_AC | a_AC | | 2 |
| b_AC | b_AC | | 2 |
| C_AC | C_AC | | 2 |
| h_AC | h_AC | | 2 |
| a.b_AC | a.b_AC | | 2 |
| CCI-AC | CCI_AC | CCI additional color index | 2 |
| COL-AC | COL_AC | COL additional color index | 2 |
| CIRG1-AC | CIRG1_AC | CIRG1 additional color index | 2 |
| CIRG2-AC | CIRG2_AC | CIRG2 additional color index | 2 |
| dE2000-AC | dE2000_AC | dE2000 for additional color | 2 |
| L_GC | L_GC | | 3 |
| a_GC | a_GC | | 3 |
| b_GC | b_GC | | 3 |
| c_GC | c_GC | | 3 |
| h_GC | h_GC | | 3 |
| a.b_GC | a.b_GC | | 3 |
| CCI-GC | CCI_GC | CCI ground color index | 3 |
| COL-GC | COL_GC | COL ground color index | 3 |
| CIRG1-GC | CIRG1_GC | CIRG1 ground color index | 3 |
| CIRG2-GC | CIRG2_GC | CIRG2 ground color index | 3 |
| dE2000-GC | dE2000_GC | dE2000 for ground color | 3 |
Group LASSO regularization retaining nonzero coefficients only for the variables in Group 3.
| Variable | Group | Group Lasso |
|---|---|---|
| fruit_weight | 1 | 0.000000000 |
| fruit_width | 1 | 0.000000000 |
| fruit_length | 1 | 0.000000000 |
| fruit_shape_index | 1 | 0.000000000 |
| fruit_diameter | 1 | 0.000000000 |
| fruit_volume | 1 | 0.000000000 |
| fruit_density | 1 | 0.000000000 |
| L_AC | 2 | 0.000000000 |
| a_AC | 2 | 0.000000000 |
| b_AC | 2 | 0.000000000 |
| C_AC | 2 | 0.000000000 |
| h_AC | 2 | 0.000000000 |
| a.b_AC | 2 | 0.000000000 |
| CCI_AC | 2 | 0.000000000 |
| COL_AC | 2 | 0.000000000 |
| CIRG1_AC | 2 | 0.000000000 |
| CIRG2_AC | 2 | 0.000000000 |
| dE2000_AC | 2 | 0.000000000 |
| L_GC | 3 | −0.003395380 |
| a_GC | 3 | 0.029737581 |
| b_GC | 3 | 0.005994080 |
| c_GC | 3 | 0.014852482 |
| h_GC | 3 | −0.025684634 |
| a.b_GC | 3 | 0.024926216 |
| CCI_GC | 3 | 0.022079283 |
| COL_GC | 3 | 0.022994860 |
| CIRG1_GC | 3 | 0.011642995 |
| CIRG2_GC | 3 | 0.004742768 |
| dE2000_GC | 3 | 0.008545801 |
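The all-or-nothing pattern in the table, where Groups 1 and 2 are zeroed entirely while every Group 3 coefficient survives, follows from how the group LASSO penalty acts on a whole coefficient block at once. A minimal sketch of the block soft-thresholding step is given below. The function name, the example vectors, and the threshold are hypothetical, and the common scaling of the threshold by the square root of the group size is omitted for brevity.

```python
import math


def group_soft_threshold(coefs, threshold):
    """Shrink a whole coefficient group toward zero by its Euclidean norm.

    If the group's norm does not exceed the threshold, every coefficient in
    the group becomes exactly zero; otherwise all are scaled by one factor.
    """
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        return [0.0] * len(coefs)
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]


# Hypothetical groups: a weak one (norm 0.5) and a strong one (norm 5.0).
weak_group = group_soft_threshold([0.3, 0.4], threshold=1.0)
strong_group = group_soft_threshold([3.0, 4.0], threshold=1.0)
print(weak_group, strong_group)
```

Because the decision is made on the norm of the whole group, individual variables inside a selected group are shrunk but not removed, which matches the eleven nonzero Group 3 coefficients above.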
AUC results of the different models for seed values from 1 to 5.
| Seed | ANN | CART | GBM | LDA | LR | KNN | RF | SVM |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.756 | 0.756 | 0.822 | 0.844 | 0.867 | 0.600 | 0.800 | 0.778 |
| 2 | 0.778 | 0.667 | 0.756 | 0.756 | 0.733 | 0.644 | 0.711 | 0.733 |
| 3 | 0.711 | 0.711 | 0.733 | 0.756 | 0.756 | 0.689 | 0.733 | 0.689 |
| 4 | 0.844 | 0.756 | 0.756 | 0.844 | 0.844 | 0.644 | 0.667 | 0.756 |
| 5 | 0.800 | 0.600 | 0.756 | 0.689 | 0.689 | 0.578 | 0.711 | 0.644 |
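The seed-to-seed variation in this table can be summarized with a short script. The sketch below uses only the five rows shown above; the function names are illustrative and not taken from the paper.

```python
from statistics import mean, stdev

MODELS = ["ANN", "CART", "GBM", "LDA", "LR", "KNN", "RF", "SVM"]

# AUC values copied from the table above, keyed by seed.
AUC_BY_SEED = {
    1: [0.756, 0.756, 0.822, 0.844, 0.867, 0.600, 0.800, 0.778],
    2: [0.778, 0.667, 0.756, 0.756, 0.733, 0.644, 0.711, 0.733],
    3: [0.711, 0.711, 0.733, 0.756, 0.756, 0.689, 0.733, 0.689],
    4: [0.844, 0.756, 0.756, 0.844, 0.844, 0.644, 0.667, 0.756],
    5: [0.800, 0.600, 0.756, 0.689, 0.689, 0.578, 0.711, 0.644],
}


def summarize(auc_by_seed, models):
    """Return {model: (mean AUC, sample standard deviation)} across seeds."""
    summary = {}
    for col, model in enumerate(models):
        values = [row[col] for row in auc_by_seed.values()]
        summary[model] = (mean(values), stdev(values))
    return summary


def best_per_seed(auc_by_seed, models):
    """Return {seed: [models sharing the highest AUC for that seed]}."""
    winners = {}
    for seed, row in auc_by_seed.items():
        top = max(row)
        winners[seed] = [m for m, v in zip(models, row) if v == top]
    return winners


summary = summarize(AUC_BY_SEED, MODELS)
winners = best_per_seed(AUC_BY_SEED, MODELS)
for model, (avg, sd) in summary.items():
    print(f"{model}: mean={avg:.3f} sd={sd:.3f}")
print(winners)
```

The best model changes from seed to seed, and over these five seeds alone ANN, LDA, and LR have the same mean AUC, which suggests why a comparison over many more seeds is needed before ranking the models.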
Average AUC, accuracy, F1 score, and kappa of each model.
| Model | AUC | Accuracy | F1 Score | Kappa |
|---|---|---|---|---|
| ANN | 0.782 | 0.738 | 0.765 | 0.468 |
| LDA | 0.766 | 0.730 | 0.765 | 0.448 |
| LR | 0.765 | 0.732 | 0.765 | 0.453 |
| GBM | 0.714 | 0.675 | 0.724 | 0.333 |
| RF | 0.708 | 0.675 | 0.722 | 0.332 |
| SVM | 0.691 | 0.642 | 0.688 | 0.267 |
| CART | 0.670 | 0.663 | 0.719 | 0.301 |
| KNN | 0.626 | 0.605 | 0.653 | 0.197 |
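For reference, the three control parameters can all be computed from a binary confusion matrix. The sketch below uses the standard definitions of accuracy, F1 score, and Cohen's kappa. The counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, F1 score, and Cohen's kappa from confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "f1": f1, "kappa": kappa}


# Hypothetical confusion matrix for a 45-sample test set.
metrics = classification_metrics(tp=20, fp=7, fn=5, tn=13)
print({name: round(value, 3) for name, value in metrics.items()})
```

Kappa corrects accuracy for chance agreement, which is why it spreads the models further apart than accuracy does in the table above.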
Figure 2. Comparison of AUC and accuracies for all eight models from 100 model trainings with different seed values.
Figure 3. AUC and accuracy density distributions of compared models.
Comparison of averaged model scores and representative model scores based on the chosen seed values.
| Model | Representative Model Seed | Average AUC | Representative Model AUC | Average Accuracy | Representative Model Accuracy | Average Kappa | Representative Model Kappa |
|---|---|---|---|---|---|---|---|
| ANN | 6 | 0.782 | 0.778 | 0.738 | 0.733 | 0.468 | 0.467 |
| LDA | 58 | 0.766 | 0.756 | 0.730 | 0.733 | 0.448 | 0.460 |
| LR | 3 | 0.765 | 0.756 | 0.732 | 0.733 | 0.453 | 0.449 |
| GBM | 29 | 0.714 | 0.711 | 0.675 | 0.667 | 0.333 | 0.322 |
| RF | 35 | 0.708 | 0.711 | 0.675 | 0.667 | 0.332 | 0.328 |
| SVM | 63 | 0.691 | 0.689 | 0.642 | 0.644 | 0.267 | 0.273 |
| CART | 18 | 0.670 | 0.667 | 0.663 | 0.667 | 0.301 | 0.301 |
| KNN | 29 | 0.626 | 0.622 | 0.605 | 0.600 | 0.197 | 0.182 |
Figure 4. ROC curves for the representative models.
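The area under a ROC curve can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise computation follows. The function name and the example scores are assumptions for illustration, and ties are counted as half a win.

```python
def auc_from_scores(scores, labels):
    """Compute AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of "ripe" and the true labels.
example_auc = auc_from_scores([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0])
print(example_auc)
```

In this example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.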
Figure 5. Representation of the ANN with 8 input variables, 2 hidden layers, and the output variable ripe.
Results of the best performing models trained on the full dataset compared to the results of a model trained on the dataset with only 8 input variables (LASSO).
| Model | AUC (Lasso) | AUC (Full) | AUC Increase | Acc. (Lasso) | Acc. (Full) | Acc. Increase | Kappa (Lasso) | Kappa (Full) | Kappa Increase |
|---|---|---|---|---|---|---|---|---|---|
| ANN | 0.782 | 0.763 | 2.49% | 0.738 | 0.718 | 2.79% | 0.468 | 0.430 | 8.84% |
| LDA | 0.766 | 0.731 | 4.79% | 0.730 | 0.683 | 6.88% | 0.448 | 0.360 | 24.4% |
| LR | 0.765 | 0.714 | 7.14% | 0.732 | 0.671 | 9.09% | 0.453 | 0.335 | 35.2% |
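The increase columns appear to be relative changes with respect to the full-dataset score, that is, (LASSO − full) / full. The short check below recomputes them from the table values under that assumption. The function and variable names are illustrative.

```python
def relative_increase(reduced, full):
    """Percentage gain of the LASSO-reduced score over the full-dataset score."""
    return (reduced - full) / full * 100.0


# (LASSO, full) score pairs copied from the table above.
RESULTS = {
    "ANN": {"AUC": (0.782, 0.763), "Accuracy": (0.738, 0.718), "Kappa": (0.468, 0.430)},
    "LDA": {"AUC": (0.766, 0.731), "Accuracy": (0.730, 0.683), "Kappa": (0.448, 0.360)},
    "LR": {"AUC": (0.765, 0.714), "Accuracy": (0.732, 0.671), "Kappa": (0.453, 0.335)},
}

increases = {
    model: {metric: relative_increase(*pair) for metric, pair in metrics.items()}
    for model, metrics in RESULTS.items()
}

for model, metrics in increases.items():
    formatted = ", ".join(f"{name}: {value:.2f}%" for name, value in metrics.items())
    print(f"{model}: {formatted}")
```

The recomputed values agree with the reported percentages to the precision shown in the table.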
Figure 6. Graphical comparison of model performance, showing an increase in all measured parameters for the models trained on the dataset to which LASSO was applied.