Di Wang1,2, Lin Xie3, Simon X Yang4, Fengchun Tian5.
Abstract
Near-infrared (NIR) spectral sensors deliver the spectral response of the light absorbed by materials for quantification, qualification or identification. Spectral analysis technology based on the NIR sensor has been a useful tool for complex information processing and high-precision identification in the tobacco industry. In this paper, a novel method based on the support vector machine (SVM) is proposed to discriminate the tobacco cultivation region using NIR sensors, where the genetic algorithm (GA) is employed for input subset selection to identify the effective principal components (PCs) for the SVM model. With the same number of PCs as inputs to the SVM model, a number of comparative experiments were conducted between the effective PCs selected by GA and the same number of leading PCs taken in order from the first one. The model performance was evaluated in terms of prediction accuracy and four assessment criteria (true positive rate, true negative rate, positive predictive value and F1 score). The results show that some PCs carrying less information may contribute more to discriminating the cultivation regions and can be considered more effective PCs, and that the SVM model with the effective PCs selected by GA has a superior discrimination capacity. The proposed GA-SVM model can effectively learn the relationship between tobacco cultivation regions and tobacco NIR sensor data.
Keywords: NIR sensor; cultivation region discrimination; feature selection; genetic algorithm; support vector machine
Year: 2018 PMID: 30257420 PMCID: PMC6210373 DOI: 10.3390/s18103222
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Framework of tobacco cultivation region classifier.
The number of tobacco samples collected in four different regions.
| Class 1 (North) | Class 2 (Middle) | Class 3 (Northwest) | Class 4 (Southwest) | Total |
|---|---|---|---|---|
| 38 | 144 | 70 | 80 | 332 |
Figure 2. Raw NIR spectra of the 332 samples.
Figure 3. Amount of information for each of the top 25 PCs from PCA.
Figure 4. Flow diagram of the proposed GA-SVM classifier.
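The GA step of the classifier can be illustrated with a minimal sketch. This record does not give the GA operators used in the paper, so everything below is an assumption made for illustration: the function names (`random_mask`, `repair`, `ga_select`), tournament selection, one-point crossover, bit-flip mutation, elitism, the repair step that keeps exactly `n_select` PCs, and all default rates. In the paper's setting the fitness would presumably be the cross-validated accuracy of the SVM on the selected PCs; here a toy fitness stands in for it.

```python
import random

def random_mask(n_pcs, n_select, rng):
    """Binary chromosome of length n_pcs with exactly n_select ones."""
    chosen = set(rng.sample(range(n_pcs), n_select))
    return [1 if i in chosen else 0 for i in range(n_pcs)]

def repair(mask, n_select, rng):
    """Flip randomly chosen bits until the mask has exactly n_select ones."""
    mask = list(mask)
    ones = [i for i, b in enumerate(mask) if b]
    zeros = [i for i, b in enumerate(mask) if not b]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > n_select:
        mask[ones.pop()] = 0
    while len(ones) < n_select:
        i = zeros.pop()
        mask[i] = 1
        ones.append(i)
    return mask

def ga_select(fitness, n_pcs=25, n_select=6, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a PC subset of fixed size that maximises fitness(mask)."""
    rng = random.Random(seed)
    pop = [random_mask(n_pcs, n_select, rng) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = list(p1)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_pcs)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, then restore the fixed subset size.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(repair(child, n_select, rng))
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy fitness (hypothetical): the sum of made-up per-PC weights.
toy_weights = list(range(25))
toy_fitness = lambda mask: sum(w for w, b in zip(toy_weights, mask) if b)
best_mask = ga_select(toy_fitness, n_pcs=25, n_select=6, seed=1)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next, and a fixed `seed` makes the run reproducible.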
The confusion matrix.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True positive (TP) | False negative (FN) |
| Actual Negative | False positive (FP) | True negative (TN) |
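The four assessment criteria named in the abstract (true positive rate, true negative rate, positive predictive value and F1 score) follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. It assumes every denominator is non-zero.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard one-vs-rest metrics from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    fp: false positives, tn: true negatives.
    """
    tpr = tp / (tp + fn)               # true positive rate (sensitivity)
    tnr = tn / (tn + fp)               # true negative rate (specificity)
    ppv = tp / (tp + fp)               # positive predictive value (precision)
    f1 = 2 * ppv * tpr / (ppv + tpr)   # harmonic mean of PPV and TPR
    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "F1": f1}

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=11, fn=1, fp=0, tn=54)
```

For a four-class problem such as the cultivation regions, each class can be scored this way by treating it as the positive class and the other three as negative.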
Parameter selection with the grid search method and 5-fold cross-validation. n: the number of PCs in the corresponding input subset; C: the penalty constant; σ: the width of the RBF kernel; P: the best prediction accuracy of 5-fold cross-validation; t: time consumption.
| n | C | σ | P (%) | t |
|---|---|---|---|---|
| 6 | 2.0 | 2.0 | 59 | 20 |
| 8 | 1.4 | 2.0 | 66.2 | 21 |
| 10 | 2.0 | 1.4 | 72.2 | 23 |
| 12 | 2.0 | 1.4 | 75.6 | 24 |
| 14 | 2.0 | 2.0 | 78.6 | 24 |
| 16 | 2.0 | 0.7 | 75.6 | 28 |
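The procedure behind this table, a grid search over C and σ scored by 5-fold cross-validation, can be sketched in plain Python. The helper names (`kfold_indices`, `cross_val_accuracy`, `grid_search`), the candidate grids and the toy scoring function are all assumptions for illustration; the actual grid and SVM implementation used in the paper are not given here. The callable `train_and_score` stands in for training an RBF-kernel SVM on the training fold and returning its accuracy on the held-out fold.

```python
from itertools import product

def kfold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(train_and_score, n_samples, params, k=5):
    """Mean held-out accuracy over k folds for one parameter setting."""
    folds = kfold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx, **params))
    return sum(scores) / k

def grid_search(train_and_score, n_samples, c_grid, sigma_grid, k=5):
    """Return the (C, sigma) pair with the best cross-validated accuracy."""
    best_params, best_acc = None, -1.0
    for C, sigma in product(c_grid, sigma_grid):
        params = {"C": C, "sigma": sigma}
        acc = cross_val_accuracy(train_and_score, n_samples, params, k)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc

# Toy scorer (hypothetical): peaks at C = 2.0, sigma = 1.4 and ignores the data.
def toy_scorer(train_idx, test_idx, C, sigma):
    return 1.0 - abs(C - 2.0) - abs(sigma - 1.4)

folds = kfold_indices(332, k=5)
best_params, best_acc = grid_search(toy_scorer, 332, [0.7, 1.4, 2.0], [0.7, 1.4, 2.0])
```

In practice the samples would usually be shuffled, and often stratified by class, before being split into folds; the contiguous split here only keeps the sketch short.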
Figure 5. The selection of population size for the GA-SVM model [30].
Figure 6. The selection of crossover rate and mutation rate for the GA-SVM model [30]. (a) Crossover rate; (b) Mutation rate.
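Optimal individual PCs selected by GA based on SG smoothing of the NIR spectral sensor data. N: the index of the PC in PCA. Each column corresponds to one input number (the number of PCs selected); a 1 marks a PC selected for that input number.
| N \ Input Number | 6 | 8 | 10 | 12 | 14 | 16 |
|---|---|---|---|---|---|---|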
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3 | 0 | 1 | 1 | 1 | 1 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 0 | 0 | 1 | 0 | 1 | 1 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 0 | 0 | 1 | 1 | 1 | 0 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 | 1 |
| 12 | 0 | 0 | 1 | 1 | 1 | 1 |
| 13 | 0 | 0 | 0 | 0 | 1 | 1 |
| 14 | 0 | 0 | 0 | 0 | 0 | 1 |
| 15 | 0 | 0 | 0 | 1 | 1 | 0 |
| 16 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | 0 | 0 | 0 | 0 | 0 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 0 | 0 | 0 | 0 | 1 |
| 21 | 0 | 1 | 0 | 1 | 1 | 1 |
| 22 | 0 | 0 | 0 | 0 | 0 | 0 |
| 23 | 0 | 0 | 0 | 1 | 0 | 0 |
| 24 | 0 | 0 | 0 | 0 | 0 | 1 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 |
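Each column of this table is a binary mask over the 25 PCs. The sketch below decodes two of the columns (input numbers 6 and 8, copied from the table) into PC indices and applies a mask to a matrix of PC scores. The helper names `selected_pcs` and `select_columns` and the example score row are hypothetical.

```python
# Masks copied from the table columns for input numbers 6 and 8 (PC 1 to PC 25).
MASK_N6 = [1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
MASK_N8 = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

def selected_pcs(mask):
    """Return the 1-based indices of the PCs whose mask bit is 1."""
    return [i + 1 for i, bit in enumerate(mask) if bit]

def select_columns(scores, mask):
    """Keep only the PC-score columns whose mask bit is 1."""
    return [[row[i] for i, bit in enumerate(mask) if bit] for row in scores]

pcs_n6 = selected_pcs(MASK_N6)   # [1, 6, 7, 8, 10, 11]
pcs_n8 = selected_pcs(MASK_N8)   # [1, 3, 6, 7, 8, 10, 11, 21]

# Hypothetical score row whose values equal the PC index, to make the result obvious.
example_scores = [list(range(1, 26))]
subset = select_columns(example_scores, MASK_N6)
```

The decoded subsets show that the GA does not simply keep the leading PCs: for input number 6 it skips PCs 2 to 5, and for input number 8 it reaches as far as PC 21.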
Prediction accuracy of the GA-SVM model with PCs selected by GA and of the SVM model with the same number of leading PCs. Pa: prediction accuracy; I: the amount of information.
| Input Number | GA-SVM Pa (%) | GA-SVM I (%) | SVM Pa (%) | SVM I (%) |
|---|---|---|---|---|
| 6 | 72.7 | 72.4 | 60.6 | 99.6 |
| 8 | 75.8 | 74.1 | 67.0 | 99.8 |
| 10 | 75.8 | 74.7 | 68.2 | 99.8 |
| 12 | 74.2 | 74.2 | 77.3 | 99.8 |
| 14 | 83.3 | 98.8 | 80.3 | 99.9 |
| 16 | 78.8 | 99.8 | 78.8 | 99.9 |
The results of the two models in terms of four evaluation parameters on the testing set: sensitivity rate, specificity rate, precision rate and F1-score [30].
| Sensitivity | Specificity | Precision | F1-score | Sensitivity | Specificity | Precision | F1-score |
|---|---|---|---|---|---|---|---|
| 0.92 | 1 | 1 | 0.96 | 0.84 | 0.83 | 0.75 | 0.79 |
| 0.67 | 0.98 | 1 | 0.76 | 0.88 | 0.78 | 0.75 | 0.79 |

| Sensitivity | Specificity | Precision | F1-score | Sensitivity | Specificity | Precision | F1-score |
|---|---|---|---|---|---|---|---|
| 0.67 | 0.93 | 0.67 | 0.67 | 0.88 | 1 | 1 | 0.94 |
| 0.75 | 0.94 | 0.71 | 0.75 | 0.82 | 1 | 0.89 | 0.9 |