Xiao Wei1,2, Dandan Kong1, Shiping Zhu2, Song Li2, Shengling Zhou2, Weiji Wu3.
Abstract
Soybean varieties differ greatly in nutritional value and composition, and screening for superior varieties is essential to the development of the soybean seed industry. This paper analyzes the feasibility of combining terahertz (THz) frequency-domain spectroscopy with chemometrics for soybean variety identification and proposes a grey wolf optimizer-support vector machine (GWO-SVM) identification model. First, THz frequency-domain spectra were collected from the experimental samples (6 varieties, 270 samples in total), and principal component analysis (PCA) was used to analyze the spectra. Next, the 203 samples of the calibration set were used to establish the variety identification model, and the 67 samples of the test set were used to validate its predictions. The results showed that THz frequency-domain spectroscopy combined with GWO-SVM could identify soybean varieties quickly and accurately. Compared with discriminant partial least squares (DPLS) and particle swarm optimization-support vector machine (PSO-SVM), GWO-SVM combined with second-derivative pre-processing produced a better identification model, with an overall correct identification rate of 97.01% on the prediction set.
Keywords: DPLS; GWO-SVM; PSO-SVM; THz spectroscopy; soybean
Year: 2022 PMID: 35360340 PMCID: PMC8963758 DOI: 10.3389/fpls.2022.823865
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
FIGURE 1. Research flowchart.
FIGURE 2. THz frequency-domain spectra of the experimental samples of different varieties.
FIGURE 3. The PC score plot.
DPLS soybean variety identification model validation results.
| Spectral pre-processing method | Best number of PCs | Correct identification rate (%) | | | | | | | Overall precision (%) | Overall F1 score (%) | Identification time (s) |
| | | HD2 | LD1 | NMH | LD4 | HD12 | QH34 | Overall | | | |
| (a). None | 9 | 40 | 60 | 85.71 | 100 | 68.75 | 100 | 76.12 | 79.18 | 75.64 | 0.90 |
| (b). Mean-centering | 10 | 40 | 80 | 85.71 | 100 | 68.75 | 100 | 79.10 | 84.28 | 78.64 | 0.51 |
| (c). Auto scaling | 10 | 50 | 60 | 85.71 | 100 | 75 | 100 | 79.10 | 80.96 | 78.66 | 0.35 |
| (d). SNV | 9 | 20 | 70 | 85.71 | 100 | 68.75 | 100 | 74.63 | 81.66 | 72.88 | 0.47 |
| (e). Minimum and maximum values to [0 1] | 10 | 50 | 50 | 85.71 | 100 | 81.25 | 100 | 79.10 | 79.35 | 78.40 | 0.47 |
| (f). MSC | 9 | 20 | 70 | 85.71 | 100 | 68.75 | 100 | 74.63 | 81.66 | 72.88 | 0.79 |
| (g). First derivative | 10 | 50 | 70 | 85.71 | 100 | 56.25 | 100 | 76.12 | 76.06 | 82.13 | 0.65 |
The row with the highest value of overall correct identification rate% and precision% is highlighted in bold.
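The overall correct identification rate in each row appears to be the pooled rate over all 67 test samples rather than a plain average of the six per-variety rates. The sketch below checks this arithmetic for row (a). The per-variety test-set sizes in `TEST_COUNTS` are not stated in the source; they are an assumption inferred from the reported percentages (for example, 85.71% matches 6/7 and 68.75% matches 11/16) and from the total of 67 test samples.

```python
# Per-variety test-set sizes. ASSUMPTION: these counts are not given in the
# source; they are inferred from the reported percentages and sum to 67.
TEST_COUNTS = {"HD2": 10, "LD1": 10, "NMH": 7, "LD4": 13, "HD12": 16, "QH34": 11}


def overall_rate(per_variety_rates):
    """Pool per-variety correct identification rates (%) into one overall rate (%)."""
    correct = sum(
        round(per_variety_rates[variety] / 100 * count)  # recover whole sample counts
        for variety, count in TEST_COUNTS.items()
    )
    return 100 * correct / sum(TEST_COUNTS.values())


# Row (a), no pre-processing, from the DPLS table.
dpls_none = {"HD2": 40, "LD1": 60, "NMH": 85.71, "LD4": 100, "HD12": 68.75, "QH34": 100}
print(round(overall_rate(dpls_none), 2))  # 76.12, matching the Overall column
```

Under the same assumed counts, 51 of 67 samples are identified correctly in row (a), which reproduces the tabulated 76.12%.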
PSO-SVM soybean variety identification validation results.
| Spectral pre-processing method | Correct identification rate (%) | | | | | | | Overall precision (%) | Overall F1 score (%) | Identification time (s) |
| | HD2 | LD1 | NMH | LD4 | HD12 | QH34 | Overall | | | |
| (a). None | 100 | 90 | 100 | 92.31 | 75 | 100 | 91.04 | 93.35 | 91.29 | 196.04 |
| (b). Mean-centering | 100 | 90 | 100 | 92.31 | 75 | 100 | 91.04 | 93.35 | 91.29 | 230.47 |
| (d). SNV | 90 | 90 | 85.71 | 100 | 75 | 100 | 89.55 | 91.11 | 89.81 | 239.65 |
| (e). Minimum and maximum values to [0 1] | 100 | 80 | 100 | 100 | 75 | 100 | 91.04 | 93.71 | 91.25 | 278.69 |
| (f). MSC | 90 | 90 | 85.71 | 100 | 75 | 100 | 89.55 | 91.11 | 89.81 | 152.84 |
| (g). First derivative | 90 | 80 | 100 | 92.31 | 81.25 | 100 | 89.55 | 90.90 | 89.72 | 211.92 |
| (h). Second derivative | 100 | 80 | 71.43 | 100 | 87.5 | 100 | 91.04 | 91.44 | 90.99 | 248.08 |
The row with the highest value of overall correct identification rate% and precision% is highlighted in bold.
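Several of the pre-processing methods named in the table rows operate on one spectrum at a time and are simple to state in code. The following is a minimal sketch of three of them (SNV, scaling to [0 1], and a second derivative), not the authors' implementation. The source does not say how the derivatives were computed, so the central finite difference used here is an assumption; a smoothing derivative filter such as Savitzky-Golay is a common alternative in practice. All function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit std."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def min_max(spectrum):
    """Rescale one spectrum linearly so its values span [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    return [(v - lo) / (hi - lo) for v in spectrum]


def second_derivative(spectrum):
    """Second derivative by central finite differences (ASSUMED method).

    Assumes unit spacing between points; the result is two points shorter.
    """
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


print(snv([1, 2, 3]))                        # [-1.0, 0.0, 1.0]
print(min_max([2, 4, 6]))                    # [0.0, 0.5, 1.0]
print(second_derivative([0, 1, 4, 9, 16]))   # [2, 2, 2]
```

Mean-centering, auto scaling, and MSC are left out because they need statistics computed over the whole calibration set rather than a single spectrum.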
GWO-SVM soybean variety identification validation results.
| Spectral pre-processing method | Correct identification rate (%) | | | | | | | Overall precision (%) | Overall F1 score (%) | Identification time (s) |
| | HD2 | LD1 | NMH | LD4 | HD12 | QH34 | Overall | | | |
| (a). None | 100 | 90 | 100 | 84.62 | 75 | 100 | 89.55 | 92.13 | 89.81 | 162.33 |
| (b). Mean-centering | 100 | 90 | 100 | 84.62 | 75 | 100 | 89.55 | 92.13 | 89.81 | 147.19 |
| (c). Auto scaling | 100 | 90 | 100 | 100 | 75 | 100 | 92.54 | 94.84 | 92.77 | 330.74 |
| (d). SNV | 90 | 100 | 85.71 | 92.31 | 75 | 100 | 89.55 | 91.50 | 89.89 | 218.83 |
| (e). Minimum and maximum values to [0 1] | 100 | 80 | 100 | 100 | 75 | 100 | 91.04 | 93.71 | 91.25 | 322.81 |
| (f). MSC | 90 | 100 | 85.71 | 92.31 | 75 | 100 | 89.55 | 91.50 | 89.89 | 182.52 |
| (g). First derivative | 100 | 100 | 100 | 92.31 | 81.25 | 100 | 94.03 | 95.51 | 94.20 | 160.37 |
The row with the highest value of overall correct identification rate% and precision% is highlighted in bold.
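For readers unfamiliar with the optimizer behind GWO-SVM, the sketch below shows a basic grey wolf optimizer minimizing a generic objective over box bounds. It is a hedged illustration, not the authors' code. The source does not list which SVM hyperparameters were tuned; searching over the logarithms of a penalty parameter and a kernel width is an assumption, and `toy_cv_error` is a hypothetical stand-in for a cross-validated SVM error so that the example runs without any external library.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iter=60, seed=0):
    """Minimize `objective` over box `bounds` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for t in range(n_iter + 1):
        # The three best wolves (alpha, beta, delta) lead the pack; copy them
        # so that moving the pack below does not change the leaders mid-step.
        leaders = [list(w) for w in sorted(wolves, key=objective)[:3]]
        lead_val = objective(leaders[0])
        if lead_val < best_val:
            best_pos, best_val = leaders[0], lead_val
        if t == n_iter:
            break

        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 to 0
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # Move to the average of the three leader-guided positions,
                # clipped to the search bounds.
                wolf[d] = min(max(estimate / 3.0, lo), hi)

    return best_pos, best_val


def toy_cv_error(params):
    """HYPOTHETICAL stand-in for a cross-validated SVM error surface."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


SEARCH_BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed ranges for log10(C), log10(gamma)
best_params, best_error = grey_wolf_optimize(toy_cv_error, SEARCH_BOUNDS)
print(best_params, best_error)
```

In an actual GWO-SVM setup, the objective would train an SVM with the candidate hyperparameters on the calibration set and return its cross-validation error. Each such evaluation is costly, which is consistent with the identification times of the SVM-based models being far longer than those of DPLS.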