Fangfang Qu, Dong Ren, Jihua Wang, Zhong Zhang, Na Lu, Lei Meng.
Abstract
Spectral analysis based on near-infrared (NIR) sensors is a powerful tool for complex information processing and high-precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of the successive projections algorithm (SPA) with small sample sizes, as well as the lack of association between the selected variables and the analyte. The proposed method is an evaluated bootstrap ensemble SPA (EBSPA) that uses a variable evaluation index (EI) for variable selection, and it is applied to the quantitative prediction of alcohol concentrations in liquor using an NIR sensor. In the experiment, EBSPA is combined with three modeling methods to test its performance. In addition, EBSPA combined with partial least squares is compared with other state-of-the-art variable selection methods. The results show that the proposed method remedies these defects of SPA and achieves the best generalization performance and stability among the compared methods. Furthermore, the physical meaning of the variables selected from the NIR sensor data is clear, so the method effectively reduces the number of variables and improves prediction accuracy.
Keywords: information processing; near infrared sensors; spectroscopy; successive projections algorithm; variable selection
Year: 2016 PMID: 26761015 PMCID: PMC4732122 DOI: 10.3390/s16010089
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
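The abstract describes EBSPA only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation: plain SPA is run on bootstrap resamples of the calibration set, and the per-variable selection frequency is tallied across runs. The function names, the max-norm starting column, and the use of selection frequency in place of the paper's evaluation index (EI, whose formula is not given here) are all assumptions made for illustration.

```python
import random


def spa(columns, start, n_select):
    """Plain SPA: starting from column `start`, repeatedly pick the column
    with the largest component orthogonal to the columns already picked."""
    work = [list(col) for col in columns]      # working copy, deflated in place
    n_select = min(n_select, len(work))
    selected = [start]
    while len(selected) < n_select:
        ref = work[selected[-1]]               # most recently selected column
        ref_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            if ref_sq > 0.0:
                # Remove the part of this column that lies along `ref`.
                coef = sum(a * b for a, b in zip(col, ref)) / ref_sq
                col[:] = [a - coef * b for a, b in zip(col, ref)]
            norm = sum(v * v for v in col)
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected


def bootstrap_spa_frequency(rows, n_select, n_boot=50, seed=0):
    """Run SPA on bootstrap resamples of `rows` (samples x variables) and
    return how often each variable was selected.

    Selection frequency is a hypothetical stand-in for the paper's EI."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    counts = [0] * n_vars
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]   # resample with replacement
        columns = [[row[j] for row in sample] for j in range(n_vars)]
        # Assumption: start from the column with the largest norm.
        start = max(range(n_vars), key=lambda j: sum(v * v for v in columns[j]))
        for j in spa(columns, start, n_select):
            counts[j] += 1
    return [c / n_boot for c in counts]
```

Variables that are selected in most resamples are the stable candidates; a collinear duplicate of an already selected variable is deflated to zero and is therefore not picked again.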
Descriptive statistics for sample measurements.
| Dataset | Number of Samples | Concentration Range (%) | Mean Value (%) | Standard Deviation |
|---|---|---|---|---|
| Calibration | 108 | 0.045–0.850 | 0.419 | 0.2425 |
| Validation | 54 | 0.075–0.835 | 0.468 | 0.2266 |
Modeling results of different processing methods.
| Method | RAW | MSC | SNV | SNV + DT | SG | SW | 1-Der | 2-Der |
|---|---|---|---|---|---|---|---|---|
| R | 0.8994 | 0.9325 | 0.9521 | 0.9444 | 0.8993 | 0.8991 | 0.9507 | 0.9512 |
| RMSECV | 0.1020 | 0.0845 | 0.0715 | 0.0769 | 0.1020 | 0.1020 | 0.0753 | 0.0771 |
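SNV, which gives the highest R and the lowest RMSECV in the table above, is commonly defined as centering each spectrum by its own mean and scaling it by its own standard deviation. A minimal sketch under that common definition follows; the function name `snv` and the choice of the sample (n − 1) standard deviation are assumptions, and the paper's exact preprocessing settings are not given here.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center one spectrum by its own mean and
    scale it by its own standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)   # sample (n - 1) standard deviation
    if std == 0.0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(value - mean) / std for value in spectrum]


# Each spectrum (row) is transformed independently of the others.
spectra = [[0.10, 0.20, 0.30], [1.0, 2.0, 3.0]]
corrected = [snv(row) for row in spectra]
```

Because each row is normalized on its own, two spectra that differ only by an additive offset and a multiplicative scale map to the same corrected spectrum.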
Figure 1. Spectra of samples: (A) RAW spectra and (B) SNV spectra.
Figure 2. Flowchart of the proposed method.
Prediction performance for various maximum numbers of selected variables.
| Maximum number of selected variables | EBSPA-MLR R2 | EBSPA-MLR RMSEP | SPA-MLR R2 | SPA-MLR RMSEP | EBSPA-PLS R2 | EBSPA-PLS RMSEP | SPA-PLS R2 | SPA-PLS RMSEP |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.9587 | 0.0582 | 0.9183 | 0.0818 | 0.9548 | 0.0608 | 0.9154 | 0.0824 |
| 15 | 0.9599 | 0.0573 | 0.9129 | 0.0871 | 0.9611 | 0.0565 | 0.9542 | 0.0612 |
| 20 | 0.9614 | 0.0563 | 0.9269 | 0.0788 | 0.9654 | 0.0534 | 0.9542 | 0.0612 |
| 25 | 0.9625 | 0.0555 | 0.9269 | 0.0788 | 0.9671 | 0.0521 | 0.9542 | 0.0612 |
| 30 | 0.9445 | 0.0672 | 0.9269 | 0.0788 | 0.9677 | 0.0516 | 0.9542 | 0.0612 |
The first column gives the maximum number of selected variables; R2 and RMSEP are the correlation coefficient and the root mean square error of prediction on the validation set, respectively.
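For reference, the two validation metrics reported above can be computed as in the sketch below. It assumes RMSEP is the root mean square error between reference and predicted concentrations and that the correlation coefficient is Pearson's r; the function names and the sample values are illustrative only and are not taken from the paper.

```python
import math


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over a validation set."""
    squared = [(r - p) ** 2 for r, p in zip(y_ref, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def correlation(y_ref, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_ref)
    mean_r = sum(y_ref) / n
    mean_p = sum(y_pred) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(y_ref, y_pred))
    var_r = sum((r - mean_r) ** 2 for r in y_ref)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_r * var_p)


# Made-up reference and predicted values, for illustration only.
y_ref = [0.10, 0.30, 0.50, 0.70]
y_pred = [0.12, 0.28, 0.53, 0.69]
print(rmsep(y_ref, y_pred), correlation(y_ref, y_pred))
```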
Figure 3. Model results for different maximum numbers of selected variables: (A) numbers of selected variables; and (B) RMSEP values.
Figure 4. Experimental comparison for small sample sizes: (A) correlation coefficient and (B) RMSEP.
Comparison of EBSPA and BSPA.
| Model | Metric | (27, 14) | (54, 27) | (81, 41) | (108, 54) |
|---|---|---|---|---|---|
| BSPA-MLR | R2 | 0.9645 | 0.9682 | 0.9591 | 0.9592 |
| | RMSEP | 0.0599 | 0.0530 | 0.0594 | 0.0671 |
| | Number of variables | 117 | 116 | 107 | 105 |
| EBSPA-MLR | R2 | 0.9687 | 0.9767 | 0.9606 | 0.9614 |
| | RMSEP | 0.0563 | 0.0455 | 0.0583 | 0.0563 |
| | Number of variables | 50 | 57 | 40 | 87 |
| BSPA-PLS | R2 | 0.9588 | 0.9716 | 0.9518 | 0.9192 |
| | RMSEP | 0.0644 | 0.0501 | 0.0643 | 0.0806 |
| | Number of variables | 119 | 83 | 94 | 92 |
| EBSPA-PLS | R2 | 0.9704 | 0.9754 | 0.9665 | 0.9654 |
| | RMSEP | 0.0547 | 0.0467 | 0.0538 | 0.0534 |
| | Number of variables | 20 | 28 | 36 | 38 |
| BSPA-LS-SVM | R2 | 0.9166 | 0.8883 | 0.8843 | 0.8972 |
| | RMSEP | 0.0818 | 0.0973 | 0.0979 | 0.0904 |
| | Number of variables | 15 | 17 | 22 | 21 |
| EBSPA-LS-SVM | R2 | 0.9204 | 0.9510 | 0.8985 | 0.9024 |
| | RMSEP | 0.0800 | 0.0633 | 0.0920 | 0.0882 |
| | Number of variables | 13 | 10 | 11 | 10 |
R2 and RMSEP are the correlation coefficient and the root mean square error of prediction on the validation set, respectively; the third row for each model gives the number of variables selected by BSPA or EBSPA; and (m, n) denotes the sample set, where m and n are the numbers of samples in the calibration and validation sets, respectively.
Figure 5. Variable selection: (A) FiPLS; (B) BiPLS; (C) UVE; (D) MC-UVE; (E) CARS; and (F) EBSPA-PLS.
Comparison of model performances.
| Method | Calibration Set R1 | Calibration Set RMSEC | Validation Set R2 | Validation Set RMSEP | Number of Variables |
|---|---|---|---|---|---|
| PLS | 0.9562 | 0.0707 | 0.9553 | 0.0605 | 4001 |
| FiPLS | 0.9696 | 0.0594 | 0.9440 | 0.0685 | 266 |
| BiPLS | 0.9711 | 0.0578 | 0.9607 | 0.0633 | 1199 |
| UVE | 0.9566 | 0.0704 | 0.9363 | 0.0718 | 214 |
| MC-UVE | 0.9536 | 0.0614 | 0.9535 | 0.0616 | 1571 |
| CARS | 0.9568 | 0.0702 | 0.9444 | 0.0673 | 29 |
| EBSPA-PLS | 0.9734 | 0.0523 | 0.9654 | 0.0534 | 38 |
R1 and RMSEC are the correlation coefficient and the root mean square error of calibration on the calibration set, respectively; R2 and RMSEP are the correlation coefficient and the root mean square error of prediction on the validation set, respectively.
Figure 6. Comparison of regression rates: (A) FiPLS; (B) BiPLS; (C) UVE; (D) MC-UVE; (E) CARS; and (F) EBSPA-PLS.