Hongbo Li, Dapeng Jiang, Jun Cao, Dongyan Zhang.
Abstract
Lipid content is an important indicator of the edible and breeding value of …
Keywords: NIR spectroscopy; Pinus koraiensis seeds; chemometric algorithms; feature selection; preprocessing
Year: 2020 PMID: 32872634 PMCID: PMC7506848 DOI: 10.3390/s20174905
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Pine nut samples and surface scanning with the NIRQuest512 spectrometer.
Figure 2. Flowchart of the principal component analysis (PCA)–partial least-squares (PLS) classification model and the wavelet transformation (WT)–Monte Carlo uninformative variable elimination (MCUVE)–PLS regression model.
Descriptive statistics for lipid contents of Pinus koraiensis seeds as measured by Soxhlet extraction.
| Set | Mean | Min (b) | Max (c) | SD (d) | CV (e) (%) |
|---|---|---|---|---|---|
| Yichun (#1–40) | 63.01 | 62.60 | 63.40 | 0.27 | 0.41 |
| Heihe (#41–80) | 61.17 | 60.30 | 62.20 | 0.52 | 0.85 |
| Changbai Mountain (#81–120) | 60.90 | 59.70 | 62.30 | 0.76 | 1.26 |
| Calibration set (n (a) = 80) | 61.90 | 59.70 | 63.40 | 0.94 | 1.52 |
| Prediction set (n (a) = 40) | 60.75 | 60.10 | 62.20 | 0.71 | 1.16 |
| Total (n (a) = 120) | 61.70 | 59.70 | 63.40 | 1.09 | 1.77 |
(a) n, sample number; (b) Min, minimum; (c) Max, maximum; (d) SD, standard deviation; (e) CV, coefficient of variation, CV = (SD/Mean) × 100.
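The CV column follows directly from the footnote formula. A minimal Python sketch (the function name `coefficient_of_variation` is illustrative, not from the study) recomputes it from a row's Mean and SD:

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """CV in percent, as in the table footnote: CV = (SD / Mean) x 100."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean * 100.0


# Mean and SD taken from the Heihe row (#41-80) of the table above.
cv_heihe = coefficient_of_variation(mean=61.17, sd=0.52)
print(f"{cv_heihe:.2f}")  # 0.85
```

Because Mean and SD are reported rounded, a recomputed CV may differ slightly from the tabulated value in the last digit.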
Figure 3. Near-infrared (NIR) raw spectra and standard normal variate (SNV)-pretreated spectra of whole pine nuts: (a) raw spectra; (b) SNV-pretreated spectra.
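SNV pretreatment is usually defined per spectrum: each spectrum is centred on its own mean and divided by its own standard deviation, which suppresses additive baseline offsets and multiplicative scatter effects. The sketch below assumes spectra stored row-wise in a NumPy array and uses the sample standard deviation (`ddof=1`); the study does not state which convention it used, and `snv` is a hypothetical helper name.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: scale each row (one spectrum) to zero mean and unit SD."""
    spectra = np.asarray(spectra, dtype=float)
    row_mean = spectra.mean(axis=1, keepdims=True)
    row_std = spectra.std(axis=1, ddof=1, keepdims=True)  # sample SD (assumed convention)
    return (spectra - row_mean) / row_std


# Synthetic example: one absorbance profile, plus copies with an offset and a gain change.
base = np.array([0.20, 0.50, 0.90, 0.40, 0.30])
raw = np.vstack([base, 2.0 * base + 0.1, 0.5 * base - 0.05])
corrected = snv(raw)
print(np.allclose(corrected[0], corrected[1]), np.allclose(corrected[0], corrected[2]))  # True True
```

After SNV the three rows coincide, which is the property that makes the pretreated spectra in panel (b) overlap more tightly than the raw spectra in panel (a).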
Figure 4. NIR raw spectra, SNV-pretreated spectra, and wavelet-compressed spectra of pine nut powder: (a) raw spectra; (b) SNV-pretreated spectra; (c) spectra compressed with ‘db9’; (d) spectra compressed with ‘bior4.4’; (e) spectra compressed with ‘sym8’; (f) spectra compressed with ‘coif4’.
Compression rate and data distortion of different wavelet compression methods.
| Wavelet filter | Threshold method | Compression rate (%) | PRD (a) (%) |
|---|---|---|---|
| db9 | Birge–Massart strategy | 85.1519 | 0.28 |
| db9 | SURE shrink thresholding | 86.0656 | 0.36 |
| db9 | Donoho thresholding | 83.9738 | 0.23 |
| db9 | Soft thresholding | 86.0714 | 0.37 |
| bior4.4 | Birge–Massart strategy | 85.6925 | 0.28 |
| bior4.4 | SURE shrink thresholding | 86.7016 | 0.39 |
| bior4.4 | Donoho thresholding | 84.7487 | 0.21 |
| bior4.4 | Soft thresholding | 86.7525 | 0.39 |
| sym8 | Birge–Massart strategy | 84.6819 | 0.27 |
| sym8 | SURE shrink thresholding | 85.9911 | 0.36 |
| sym8 | Donoho thresholding | 83.5233 | 0.20 |
| sym8 | Soft thresholding | 86.1487 | 0.37 |
| coif4 | Birge–Massart strategy | 83.7562 | 0.26 |
| coif4 | SURE shrink thresholding | 85.2497 | 0.37 |
| coif4 | Donoho thresholding | 82.5347 | 0.20 |
| coif4 | Soft thresholding | 85.5128 | 0.38 |
(a) PRD, percent root-mean-square difference.
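PRD is commonly defined as PRD = 100 × sqrt(Σ(x − x̃)² / Σx²), where x is the original spectrum and x̃ its reconstruction after thresholding. The NumPy sketch below illustrates the compress-then-measure loop. It substitutes an orthonormal Haar wavelet and one fixed hard threshold for the study's db9, bior4.4, sym8 and coif4 filters and its four threshold rules, and it assumes "compression rate" means the percentage of wavelet coefficients set to zero; all function names and the synthetic spectrum are hypothetical.

```python
import numpy as np


def haar_forward(x, levels):
    """Multilevel orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # finest level first
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward level by level, coarsest level first."""
    for d in reversed(details):
        rebuilt = np.empty(2 * d.size)
        rebuilt[0::2] = (approx + d) / np.sqrt(2.0)
        rebuilt[1::2] = (approx - d) / np.sqrt(2.0)
        approx = rebuilt
    return approx


def compress(x, levels, threshold):
    """Hard-threshold detail coefficients; return reconstruction and compression rate (%)."""
    approx, details = haar_forward(x, levels)
    kept = [np.where(np.abs(d) >= threshold, d, 0.0) for d in details]
    n_total = approx.size + sum(d.size for d in kept)
    # Assumed definition of compression rate: share of coefficients that are zero.
    n_zero = int(np.sum(approx == 0.0)) + sum(int(np.sum(d == 0.0)) for d in kept)
    return haar_inverse(approx, kept), 100.0 * n_zero / n_total


def prd(x, x_rec):
    """Percent root-mean-square difference between a signal and its reconstruction."""
    x = np.asarray(x, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - x_rec) ** 2) / np.sum(x ** 2))


# Synthetic 512-point "spectrum": flat baseline, one absorption band, small noise.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 512)
spectrum = 1.0 + 0.5 * np.exp(-((grid - 0.4) / 0.05) ** 2) + 0.001 * rng.normal(size=512)
lossless, rate_0 = compress(spectrum, levels=4, threshold=0.0)
lossy, rate_hi = compress(spectrum, levels=4, threshold=0.01)
print(f"rate={rate_hi:.1f}%  PRD={prd(spectrum, lossy):.3f}%")
```

Raising the threshold zeroes more coefficients (higher compression rate) at the cost of a larger PRD, which is the trade-off the table quantifies for each filter and threshold rule.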
Figure 5. Score plots of the pine nut samples in the space defined by the first two and the first three principal components: (a) two-dimensional PCA visualization; (b) three-dimensional PCA visualization.
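Score plots of this kind come from mean-centring the spectra (for example, SNV-pretreated spectra) and projecting them onto the leading principal components. A minimal NumPy sketch using the singular value decomposition follows; the synthetic three-group data and the helper name `pca_scores` are illustrative assumptions, not the study's code.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int):
    """Return PCA scores and explained-variance ratios for row-wise spectra X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic stand-in: 120 spectra x 511 wavelengths from three groups with different trends.
rng = np.random.default_rng(0)
group_shift = np.repeat([0.0, 0.5, 1.0], 40)[:, None] * np.linspace(0.0, 1.0, 511)
X = rng.normal(scale=0.1, size=(120, 511)) + group_shift
scores_2d, ratio_2d = pca_scores(X, 2)  # coordinates for a 2-D score plot
scores_3d, ratio_3d = pca_scores(X, 3)  # coordinates for a 3-D score plot
print(scores_2d.shape, scores_3d.shape)  # (120, 2) (120, 3)
```

The same scores, truncated to two or three columns, are what a PCA–PLS classifier with 2 or 3 input dimensions would receive.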
Figure 6. Stability diagram of the spectral bands selected by MCUVE.
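In MCUVE, a PLS model is refitted on many random subsets of the calibration samples, and each variable's stability is commonly taken as the mean of its regression coefficient divided by the standard deviation of that coefficient across the runs; variables with low absolute stability are treated as uninformative. The NumPy sketch below implements PLS1 with the NIPALS algorithm and this stability score. The number of components, runs, sampling fraction, number of kept variables, and all function names are assumptions for illustration, not the study's settings.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """PLS1 regression coefficients via NIPALS; X and y must already be mean-centred."""
    X, y = X.copy(), y.copy()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)  # weight vector
        t = X @ w  # score vector
        tt = t @ t
        P[:, a] = X.T @ t / tt  # X loading
        q[a] = (y @ t) / tt  # y loading
        W[:, a] = w
        X -= np.outer(t, P[:, a])  # deflate X and y
        y -= q[a] * t
    return W @ np.linalg.solve(P.T @ W, q)


def mcuve_stability(X, y, n_components=5, n_runs=200, fraction=0.8, seed=0):
    """Mean / SD of PLS coefficients over Monte Carlo sub-samples of the rows."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    m = int(round(fraction * n_samples))
    coefs = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        idx = rng.choice(n_samples, size=m, replace=False)
        Xs, ys = X[idx], y[idx]
        coefs[r] = pls1_coefficients(Xs - Xs.mean(axis=0), ys - ys.mean(), n_components)
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)


# Synthetic check: only the first 5 of 40 variables drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 40))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)
stability = mcuve_stability(X, y)
selected = np.argsort(-np.abs(stability))[:10]  # keep the 10 most stable variables
print(sorted(selected.tolist()))
```

In practice the cut-off (here a fixed count of 10) is a tuning choice; the regression table reports 70 retained features for the MCUVE-based models.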
Comparison of classification model results.
| Model | Input dimensions | Calibration accuracy (%) | Calibration time (s) | Precision | Recall | F1 (a) | Prediction accuracy (%) | Prediction time (s) |
|---|---|---|---|---|---|---|---|---|
| PLS (b) | 511 | 78.75 | 8.96 | 0.85 | 0.93 | 0.81 | 77.50 | 4.46 |
| SNV (c)–PLS | 511 | 88.75 | 8.13 | 0.93 | 0.95 | 0.92 | 87.50 | 3.32 |
| SNV–PCA (d)–PLS | 2 | 98.75 | 2.61 | 1.00 | 0.94 | 0.97 | 97.50 | 0.91 |
| SNV–PCA–PLS | 3 | 98.75 | 2.96 | 0.97 | 0.95 | 0.97 | 97.50 | 1.03 |
(a) F1 = (2 × Precision × Recall)/(Precision + Recall); (b) PLS, partial least squares; (c) SNV, standard normal variate; (d) PCA, principal component analysis.
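The precision, recall, F1 and accuracy columns can be computed from true and predicted origin labels. Because the study distinguishes three origins, the pure-Python sketch below scores one class against the rest; how the study combined per-class values is not stated here, so the one-vs-rest treatment, the label coding, and the helper names are assumptions.

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 as in footnote (a): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, plus overall accuracy in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = 100.0 * sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, f1_from(precision, recall), accuracy


# Hypothetical labels: 0 = Yichun, 1 = Heihe, 2 = Changbai Mountain.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 0]
print(one_vs_rest_metrics(y_true, y_pred, positive=1))  # precision 0.75, recall 1.0, F1 ~0.857, accuracy 80.0
print(round(f1_from(1.00, 0.94), 2))  # 0.97, matching the two-dimension SNV–PCA–PLS row
```

Applying `f1_from` to the tabulated precision and recall does not reproduce every F1 entry (the PLS row gives about 0.89 rather than 0.81), so the reported F1 values may rest on a different averaging or rounding scheme.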
Comparison of calibration results and prediction results obtained with the use of partial least-squares (PLS), uninformative variable elimination (UVE)–PLS, Monte Carlo (MC)–UVE–PLS, wavelet transformation (WT)–PLS, WT–MCUVE–PLS, principal component regression (PCR), principal component analysis (PCA)–PLS, and successive projections algorithm (SPA)–PLS models.
| Model | Number of features | RMSECV (b), Cal (e) (n (a) = 80) | R² (d) (Cal (e)) | RMSEP (c), Pre (f) (n (a) = 40) | R² (d) (Pre (f)) |
|---|---|---|---|---|---|
| PLS | 511 | 0.0407 | 0.8613 | 0.1396 | 0.7489 |
| UVE–PLS | 100 | 0.0159 | 0.9169 | 0.0875 | 0.8810 |
| MCUVE–PLS | 70 | 0.0449 | 0.8369 | 0.1556 | 0.6721 |
| WT–PLS | 154 | 0.0808 | 0.7284 | 0.1491 | 0.7595 |
| WT–MCUVE–PLS | 70 | 0.0098 | 0.9485 | 0.0390 | 0.9369 |
| PCR | 511 | 0.0467 | 0.7512 | 0.1357 | 0.7540 |
| PCA–PLS | 80 | 0.0284 | 0.8635 | 0.1693 | 0.7330 |
| SPA–PLS | 50 | 1.6666 | 0.8820 | 0.1538 | 0.8141 |
(a) n, sample number; (b) RMSECV, root mean square error of cross-validation; (c) RMSEP, root mean square error of prediction; (d) R², coefficient of determination (square of the multiple correlation coefficient R); (e) Cal, calibration set; (f) Pre, prediction set.
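RMSECV and RMSEP share one formula, sqrt(Σ(yᵢ − ŷᵢ)²/n), applied respectively to cross-validated predictions of the calibration set and to predictions of the independent prediction set; R² is taken here as 1 − SS_res/SS_tot. The cross-validation scheme and the exact R² convention are not given in this excerpt, so the pure-Python sketch below (hypothetical helper names, made-up lipid-content values) only illustrates the arithmetic.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error; RMSECV or RMSEP depending on which predictions are passed."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured lipid contents and model predictions for four samples.
measured = [61.0, 62.0, 63.0, 60.0]
predicted = [61.2, 61.8, 62.9, 60.3]
print(round(rmse(measured, predicted), 4))  # 0.2121
print(round(r_squared(measured, predicted), 3))  # 0.964
```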
Figure 7. Correlation plots for the prediction of lipid content using the PLS, UVE–PLS, MCUVE–PLS, WT–PLS, WT–MCUVE–PLS, principal component regression (PCR), PCA–PLS, and successive projections algorithm (SPA)–PLS models based on NIR spectra: (a) PLS; (b) UVE–PLS; (c) MCUVE–PLS; (d) WT–PLS; (e) WT–MCUVE–PLS; (f) PCR; (g) PCA–PLS; (h) SPA–PLS.