Wenwen Kong, Chu Zhang, Fei Liu, Pengcheng Nie, Yong He.
Abstract
A near-infrared (NIR) hyperspectral imaging system was developed in this study. NIR hyperspectral imaging combined with multivariate data analysis was applied to identify rice seed cultivars. Spectral data were extracted from the hyperspectral images. Along with Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), a novel machine learning algorithm called Random Forest (RF) was applied. Spectra from 1,039 nm to 1,612 nm were used as the full spectra to build classification models. The PLS-DA and KNN models obtained over 80% classification accuracy, and the SIMCA, SVM and RF models obtained 100% classification accuracy in both the calibration and prediction sets. Twelve optimal wavelengths were selected using the weighted regression coefficients of the PLS-DA model. PLS-DA, KNN, SVM and RF models were then built on these optimal wavelengths. All optimal-wavelength models except PLS-DA produced classification rates over 80%. The full-spectrum models performed better than the optimal-wavelength models. The overall results indicated that hyperspectral imaging could be used for rice seed cultivar identification, and that RF is an effective classification technique.
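The abstract does not spell out the exact rule used to pick the twelve wavelengths from the weighted regression coefficients. The sketch below is only one plausible reading: it assumes that wavelengths are ranked by the absolute value of their coefficient and that the top `k` are kept. The function name `select_top_wavelengths` and the toy wavelength and coefficient values are hypothetical and are not taken from the study.

```python
def select_top_wavelengths(wavelengths, coefficients, k):
    """Return the k wavelengths with the largest absolute coefficients.

    Assumption: importance is measured by |coefficient|; the study may
    have used a different criterion (for example, peaks and valleys).
    """
    if len(wavelengths) != len(coefficients):
        raise ValueError("wavelengths and coefficients must have equal length")
    # Rank indices by absolute coefficient, largest first.
    ranked = sorted(range(len(coefficients)),
                    key=lambda i: abs(coefficients[i]),
                    reverse=True)
    # Keep the top k and report them in ascending wavelength order.
    return sorted(wavelengths[i] for i in ranked[:k])


# Hypothetical toy data (nm and unitless coefficients), for illustration only.
wavelengths = [1039, 1100, 1200, 1300, 1400, 1500, 1612]
coefficients = [0.02, -0.85, 0.10, 0.40, -0.05, 0.61, -0.33]

selected = select_top_wavelengths(wavelengths, coefficients, k=3)
print(selected)
```

With real data, `k` would be 12 and `coefficients` would come from the fitted PLS-DA model.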
Year: 2013 PMID: 23857260 PMCID: PMC3758629 DOI: 10.3390/s130708916
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Hyperspectral imaging system.
Figure 2. (a) Raw spectra of rice seeds. (b) First-derivative preprocessed spectra of rice seeds.
Figure 3. Scores scatter plot of PC1 and PC2 of raw spectra.
Results of classification models based on full spectra.
| Model | Set | Cultivar 1 Nr/Nt | Cultivar 1 accu | Cultivar 2 Nr/Nt | Cultivar 2 accu | Cultivar 3 Nr/Nt | Cultivar 3 accu | Cultivar 4 Nr/Nt | Cultivar 4 accu | Total Nr/Nt | Total accu |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PLS-DA | Cal | 32/37 | 86.49% | 36/37 | 97.30% | 37/37 | 100% | 38/39 | 97.44% | 143/150 | 95.33% |
| | Pre | 13/19 | 68.42% | 17/19 | 89.47% | 17/18 | 94.44% | 16/19 | 84.21% | 63/75 | 84.00% |
| SIMCA | Cal | 37/37 | 100% | 37/37 | 100% | 37/37 | 100% | 39/39 | 100% | 150/150 | 100% |
| | Pre | 19/19 | 100% | 19/19 | 100% | 18/18 | 100% | 19/19 | 100% | 75/75 | 100% |
| KNN | Cal | 34/37 | 91.89% | 37/37 | 100% | 30/37 | 81.08% | 39/39 | 100% | 140/150 | 93.33% |
| | Pre | 17/19 | 89.47% | 19/19 | 100% | 13/18 | 72.22% | 19/19 | 100% | 68/75 | 90.67% |
| SVM | Cal | 37/37 | 100% | 37/37 | 100% | 37/37 | 100% | 39/39 | 100% | 150/150 | 100% |
| | Pre | 19/19 | 100% | 19/19 | 100% | 18/18 | 100% | 19/19 | 100% | 75/75 | 100% |
| RF | Cal | 37/37 | 100% | 37/37 | 100% | 37/37 | 100% | 39/39 | 100% | 150/150 | 100% |
| | Pre | 19/19 | 100% | 19/19 | 100% | 18/18 | 100% | 19/19 | 100% | 75/75 | 100% |
Nr is the number of correctly classified samples; Nt is the total number of samples.
accu is the classification accuracy.
Cal represents the calibration set of the samples.
Pre represents the prediction set of the samples.
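The relation between Nr, Nt and accu can be checked with a few lines of arithmetic. The sketch below recomputes the first PLS-DA row of the full-spectra table (the calibration set, with Nt = 37, 37, 37 and 39), assuming that accu is simply Nr/Nt expressed as a percentage rounded to two decimals. The helper name `accuracy` is illustrative.

```python
def accuracy(nr, nt):
    """Classification accuracy in percent, rounded to two decimals."""
    return round(100.0 * nr / nt, 2)


# (Nr, Nt) pairs copied from the PLS-DA calibration row, full spectra.
plsda_cal = [(32, 37), (36, 37), (37, 37), (38, 39)]

# Per-cultivar accuracy.
per_class = [accuracy(nr, nt) for nr, nt in plsda_cal]

# The total column is the sum of Nr over the sum of Nt.
total_nr = sum(nr for nr, _ in plsda_cal)
total_nt = sum(nt for _, nt in plsda_cal)
overall = accuracy(total_nr, total_nt)

print(per_class, f"{total_nr}/{total_nt}", overall)
```

The recomputed values agree with the tabulated 86.49%, 97.30%, 100%, 97.44% and the overall 143/150 = 95.33%.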
Figure 4. Weighted regression coefficients of PLS-DA model with selected wavelengths.
Results of classification models based on optimal wavelengths.
| Model | Set | Cultivar 1 Nr/Nt | Cultivar 1 accu | Cultivar 2 Nr/Nt | Cultivar 2 accu | Cultivar 3 Nr/Nt | Cultivar 3 accu | Cultivar 4 Nr/Nt | Cultivar 4 accu | Total Nr/Nt | Total accu |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PLS-DA | Cal | 20/37 | 54.05% | 26/37 | 70.27% | 28/37 | 75.68% | 31/39 | 79.49% | 105/150 | 70.00% |
| | Pre | 11/19 | 57.89% | 15/19 | 78.95% | 10/18 | 55.56% | 14/19 | 73.68% | 50/75 | 66.67% |
| KNN | Cal | 36/37 | 97.30% | 34/37 | 91.89% | 32/37 | 86.49% | 38/39 | 97.44% | 140/150 | 93.33% |
| | Pre | 17/19 | 89.47% | 19/19 | 100% | 13/18 | 72.22% | 19/19 | 100% | 68/75 | 90.67% |
| SVM | Cal | 37/37 | 100% | 36/37 | 97.30% | 36/37 | 97.30% | 37/39 | 94.87% | 146/150 | 97.33% |
| | Pre | 17/19 | 89.47% | 16/19 | 84.21% | 16/18 | 88.89% | 18/19 | 94.74% | 67/75 | 89.33% |
| RF | Cal | 37/37 | 100% | 37/37 | 100% | 37/37 | 100% | 39/39 | 100% | 150/150 | 100% |
| | Pre | 17/19 | 89.47% | 15/19 | 78.95% | 13/18 | 72.22% | 18/19 | 94.74% | 63/75 | 84.00% |
Nr is the number of correctly classified samples; Nt is the total number of samples.
accu is the classification accuracy.
Cal represents the calibration set of the samples.
Pre represents the prediction set of the samples.
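To make the full-spectra versus optimal-wavelength comparison concrete, the sketch below takes the total prediction-set counts (Nr out of 75) from the two tables and computes the drop in accuracy, in percentage points, for each model that appears in both. SIMCA is omitted because it was not rebuilt on the optimal wavelengths. The variable names are illustrative.

```python
N_PRE = 75  # total number of prediction-set samples in both tables

# Correctly classified prediction samples (total Nr), copied from the tables.
full_spectra = {"PLS-DA": 63, "KNN": 68, "SVM": 75, "RF": 75}
optimal_wavelengths = {"PLS-DA": 50, "KNN": 68, "SVM": 67, "RF": 63}

# Drop in prediction accuracy, in percentage points, when moving from
# the full spectra to the twelve optimal wavelengths.
drop = {
    model: round(100.0 * (full_spectra[model] - optimal_wavelengths[model]) / N_PRE, 2)
    for model in optimal_wavelengths
}

for model, points in drop.items():
    print(f"{model}: -{points} percentage points")
```

On these counts, PLS-DA, SVM and RF lose accuracy on the optimal wavelengths, while KNN classifies 68 of 75 prediction samples in both cases.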