Yuzhen Chen1,2, Wanxia Sun1, Songtao Jiu1, Lei Wang1, Bohan Deng1, Zili Chen1, Fei Jiang3, Menghan Hu1,2, Caixi Zhang1.
Abstract
Citrus is one of the most important fruits in China. Miyagawa Satsuma, a citrus variety, is a nutritious agricultural product characteristic of Chongming Island. Near-infrared (NIR) spectroscopy is well suited to studying fruit quality because it is low-cost, efficient, non-destructive, and repeatable. Therefore, the NIR technique is used in this study to detect the soluble solid content (SSC) of citrus. After the original spectral data are obtained, the first 70% are assigned to the training set and the remaining 30% to the test set. The Random Frog algorithm is then used to select characteristic wavelengths, which reduces the dimension of the data and the complexity of the model and accordingly improves the generalization of the classification model. After comparing the performance of several classifiers (AdaBoost, KNN, LS-SVM, and Bayes) under different numbers of characteristic wavelengths, the AdaBoost classifier built on 275 characteristic wavelengths performs best. Its accuracy, precision, recall, and F1-score are 78.3%, 80.5%, 78.3%, and 0.780, respectively, and the receiver operating characteristic (ROC) curve is close to the upper left corner, suggesting that the classification model is acceptable. The results demonstrate that it is feasible to use the NIR technique to estimate whether the citrus is sweet or not. Furthermore, the obtained models can be applied to identify the quality of citrus correctly. The model can help fruit traders determine the growth cycle of citrus more scientifically, improve citrus cultivation and management as well as final fruit quality, and thus increase their income.
Keywords: AdaBoost; citrus soluble solids content; machine learning; near infrared spectroscopy; random frog
Year: 2022 PMID: 35923875 PMCID: PMC9340214 DOI: 10.3389/fpls.2022.841452
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Figure 1. Near-infrared (NIR) spectra of citrus at different picking times, in chronological order. Different picking times are represented by different colors. Each line is the average spectrum of 12 fruit samples at each picking time.
Figure 2. Cutoff probability for different numbers of characteristic wavelengths. For the subsequent modeling, 10 (A), 50 (B), 100 (C), 200 (D), 250 (E), 275 (F), and 300 (G) characteristic wavelengths are chosen, respectively. The number in the right corner of each panel is the cutoff probability for the corresponding number of characteristic wavelengths. The cutoff probability generally decreases as the number of informative wavelengths increases.
Figure 3. Histogram of the soluble solid content (SSC) of the 122 citrus fruit samples collected in this study. The mean SSC is 9.06 Brix and the SD is 0.93 Brix. Fruits in the blue areas are regarded as sweet, while fruits in the red areas are regarded as unsweet.
Figure 4. Comparison of NIR spectra of sweet (SSC above 9 Brix) and unsweet (SSC below 9 Brix) fruit samples. The shaded areas represent the confidence intervals for each line.
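The labeling rule in the captions above (sweet above 9 Brix, unsweet below) and the ordered 70/30 split mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the handling of a sample at exactly 9 Brix, the rounding of the training-set size, and the SSC readings are all assumptions.

```python
def label_sweet(ssc_values, threshold=9.0):
    """Return 1 (sweet) for SSC above the threshold, else 0 (unsweet)."""
    # Assumption: a sample exactly at the threshold is counted as unsweet;
    # the source only says "beyond 9 Brix" and "below 9 Brix".
    return [1 if ssc > threshold else 0 for ssc in ssc_values]


def ordered_split(samples, train_percent=70):
    """Keep the original order: the first part trains, the rest tests."""
    # Assumption: the training-set size is rounded down.
    n_train = len(samples) * train_percent // 100
    return samples[:n_train], samples[n_train:]


# Made-up SSC readings in Brix, for illustration only.
ssc = [8.1, 9.4, 10.2, 8.7, 9.0, 9.8, 7.9, 9.3, 8.5, 10.0]
labels = label_sweet(ssc)
train_labels, test_labels = ordered_split(labels)
print(labels)
print(len(train_labels), len(test_labels))
```

Under the round-down assumption, 122 samples would give 85 training and 37 test samples. Because the split follows the original sample order rather than a random shuffle, the class balance of the two sets depends on how the samples were ordered.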
Modeling results of sweet (SSC beyond 9 Brix) and unsweet (SSC below 9 Brix) classification of Miyagawa Satsuma in Chongming Island under different classifiers with different characteristic wavelengths.
| Number of characteristic wavelengths | Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1-score |
|---|---|---|---|---|---|
| 10 | AdaBoost | 60.9 | 62.1 | 60.9 | 0.604 |
| | KNN | 69.6 | 69.8 | 69.6 | 0.696 |
| | Bayes | 65.2 | 65.4 | 65.2 | 0.648 |
| | LS-SVM | 62.2 | 53.3 | 100.0 | 0.696 |
| 50 | AdaBoost | 60.9 | 62.1 | 60.9 | 0.604 |
| | KNN | 52.2 | 53.7 | 52.2 | 0.503 |
| | Bayes | 65.2 | 65.4 | 65.2 | 0.648 |
| | LS-SVM | 67.6 | 57.1 | 100.0 | 0.727 |
| 100 | AdaBoost | 69.6 | 69.8 | 69.6 | 0.696 |
| | KNN | 52.2 | 52.9 | 52.2 | 0.516 |
| | Bayes | 69.6 | 69.6 | 69.6 | 0.694 |
| | LS-SVM | 59.5 | 51.6 | 100.0 | 0.681 |
| 200 | AdaBoost | 65.2 | 71.6 | 65.2 | 0.631 |
| | KNN | 65.2 | 67.8 | 65.2 | 0.644 |
| | Bayes | 69.6 | 69.6 | 69.6 | 0.694 |
| | LS-SVM | 62.2 | 53.3 | 100.0 | 0.696 |
| 250 | AdaBoost | 69.6 | 71.3 | 69.6 | 0.692 |
| | KNN | 56.5 | 58.1 | 56.5 | 0.555 |
| | Bayes | 69.6 | 69.8 | 0.7 | 0.696 |
| | LS-SVM | 62.2 | 53.3 | 100.0 | 0.696 |
| 275 | **AdaBoost** | **78.3** | **80.5** | **78.3** | **0.780** |
| | KNN | 60.9 | 62.1 | 60.9 | 0.604 |
| | Bayes | 69.6 | 69.8 | 69.6 | 0.696 |
| | LS-SVM | 62.2 | 53.3 | 100.0 | 0.696 |
| 300 | AdaBoost | 69.6 | 71.3 | 69.6 | 0.692 |
| | KNN | 65.2 | 67.8 | 65.2 | 0.644 |
| | Bayes | 69.6 | 69.8 | 69.6 | 0.696 |
| | LS-SVM | 62.2 | 54.2 | 81.3 | 0.65 |
| 1556 | AdaBoost | 75.0 | 68.8 | 91.7 | 0.786 |
| | KNN | 60.9 | 56.3 | 81.8 | 0.667 |
| | Bayes | 73.9 | 72.7 | 72.7 | 0.727 |
| | LS-SVM | 56.8 | 50.0 | 93.8 | 0.652 |
Bold font represents the best model.
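The four metrics reported in the table are related through the confusion counts of a binary classifier, which the short sketch below makes explicit. The source does not report confusion matrices, so the counts used here are hypothetical: they were chosen only because they are numerically consistent with the LS-SVM row at 10 characteristic wavelengths. The function name `binary_metrics` is likewise an illustrative assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts (not reported in the source), for illustration only.
acc, prec, rec, f1 = binary_metrics(tp=16, fp=14, tn=7, fn=0)
print(f"accuracy={acc:.1%} precision={prec:.1%} recall={rec:.1%} F1={f1:.3f}")
```

With these counts the sketch gives 62.2% accuracy, 53.3% precision, 100.0% recall, and an F1-score of 0.696. In such a case the classifier labels 30 of 37 samples as sweet, so a recall of 100% on its own says little about how well the two classes are separated; precision and accuracy need to be read alongside it.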
Figure 5. Receiver operating characteristic (ROC) curves. (A) ROC curve for positive samples (SSC above 9 Brix); (B) ROC curve for negative samples (SSC below 9 Brix).
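For readers unfamiliar with how a ROC curve like those in Figure 5 is constructed, the sketch below sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each step. It is a generic illustration under stated assumptions, not the authors' procedure: the labels and scores are made up, and the helper names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive
    # when its score is at or above the current threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = sweet) and scores for the sweet class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pos_curve = roc_points(labels, scores)
# Curve for the negative class: flip the labels and use 1 - score.
neg_curve = roc_points([1 - y for y in labels], [1 - s for s in scores])
print(round(auc(pos_curve), 3), round(auc(neg_curve), 3))
```

A curve that hugs the upper left corner reaches a high true positive rate while the false positive rate is still low, which corresponds to an area under the curve close to 1. In a two-class problem the positive-class and negative-class curves have the same area, although their shapes can differ.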