Lifei Wei, Ziran Yuan, Ming Yu, Can Huang, Liqin Cao.
Abstract
In this study, to address the difficulty of inverting soil arsenic (As) content from laboratory and field reflectance spectroscopy, we examined the transferability of the prediction method. Sixty-three soil samples were collected from the Daye city area of the Jianghan Plain region of China. The characteristic wavelengths of soil As content were extracted from the full bands using iteratively retaining informative variables (IRIV) coupled with Spearman's rank correlation analysis (SCA). First, the IRIV algorithm was used to make a coarse selection from the original spectral data. Gaussian filtering (GF), first derivative (FD) filtering, and Gaussian filtering again (GFA) pretreatments were then applied to improve the correlation between the spectra and soil As content. After each pretreatment, the subset with absolute correlation values greater than 0.6 was retained as the optimal subset. Finally, partial least squares regression (PLSR), Bayesian ridge regression (BRR), ridge regression (RR), kernel ridge regression (KRR), support vector machine regression (SVMR), eXtreme gradient boosting (XGBoost) regression, and random forest regression (RFR) models were used to estimate the soil As values from the different characteristic variables. The results showed that, compared with the traditional method based on IRIV alone, using the characteristic bands selected by the IRIV-SCA method effectively improves the prediction accuracy of the models. For the laboratory spectra experiment stage, the six most representative characteristic bands were selected. The performance of IRIV-SCA-SVMR was found to be the best, with the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute error (MAE) in the validation set being 0.97, 0.22, and 0.11, respectively. For the field spectra experiment stage, the 12 most representative characteristic bands were selected.
The performance of IRIV-SCA-XGBoost was found to be the best, with the R², RMSE, and MAE in the validation set being 0.83, 0.35, and 0.29, respectively. The proposed method significantly improves the accuracy and stability of the inversion of soil As content, and it could be used to provide accurate data for decision support in the treatment and recovery of As pollution over a large area.
Keywords: characteristic bands; eXtreme gradient boosting regression; hyperspectral remote sensing; iteratively retaining informative variables; random forest regression; soil arsenic content
Year: 2019 PMID: 31510072 PMCID: PMC6767283 DOI: 10.3390/s19183904
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The location of the study area and the locations of the sampling points (the unmanned aerial vehicle (UAV) image was taken by a DJI Matrice 600 Pro drone).
Variable classification rules.
| Wavelength Variable Type | Classification Rules |
|---|---|
| Strongly informative | |
| Weakly informative | |
| Uninformative | |
| Interfering | |
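A minimal sketch of how the four variable types could be assigned from DMEAN and a p value, the two quantities named in the Figure 4 and Figure 5 captions. The sign convention (mean cross-validated error with the variable included minus the mean with it excluded), the Mann-Whitney U test, the 0.05 threshold, and the function name `classify_variable` are assumptions made for illustration; they are not taken from this record.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def classify_variable(rmsecv_included, rmsecv_excluded, alpha=0.05):
    """Assign an IRIV-style type to one wavelength variable.

    rmsecv_included / rmsecv_excluded: cross-validated errors of random
    sub-models that contain / do not contain the variable.
    Assumed convention: DMEAN = mean(included) - mean(excluded), so a
    negative DMEAN means that including the variable lowers the error.
    """
    dmean = float(np.mean(rmsecv_included) - np.mean(rmsecv_excluded))
    p_value = mannwhitneyu(
        rmsecv_included, rmsecv_excluded, alternative="two-sided"
    ).pvalue
    helps = dmean < 0
    significant = p_value < alpha
    if helps and significant:
        return "strongly informative"
    if helps:
        return "weakly informative"
    if significant:
        return "interfering"
    return "uninformative"


# Synthetic errors: including the variable clearly lowers the error.
rng = np.random.default_rng(0)
with_var = rng.normal(0.5, 0.02, size=50)
without_var = rng.normal(0.8, 0.02, size=50)
print(classify_variable(with_var, without_var))
```

Swapping the two arguments in the example makes inclusion clearly harmful, which this sketch labels as interfering.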
Figure 2. The technical flowchart of the algorithm proposed in this paper.
Figure 3. Soil reflectance spectra (with fringe noise removed) used to predict the As concentration in the soil: (a) laboratory reflectance spectra; (b) field reflectance spectra.
Statistics of As concentrations for the collected soil samples.
| Study Area | Dataset | Sample Size | Minimum | Maximum | Mean | SD | CV | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|
| Daye | Entire | 63 | 7.04 | 12.84 | 9.28 | 1.11 | 11.97% | 0.58 | 0.41 |
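The columns of the statistics table can be reproduced from raw concentrations with a few lines of NumPy and SciPy. The sketch below is illustrative only: the sample values are synthetic because the raw measurements are not part of this record, and the choices of the sample standard deviation (`ddof=1`) and excess kurtosis are assumptions about how the table was computed.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def describe(values):
    """Summary statistics matching the column names of the table."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)  # sample standard deviation (assumed)
    return {
        "n": int(x.size),
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(mean),
        "sd": float(sd),
        "cv_percent": float(100.0 * sd / mean),  # CV = SD / mean
        "skewness": float(skew(x)),
        "kurtosis": float(kurtosis(x)),  # excess kurtosis (assumed)
    }


# Synthetic stand-in for the 63 measured As concentrations.
rng = np.random.default_rng(0)
stats = describe(rng.normal(9.28, 1.11, size=63))
print(stats)

# Consistency check using the rounded values printed in the table.
print(round(100.0 * 1.11 / 9.28, 2))
```

The CV computed from the rounded mean and SD differs from the tabulated 11.97% only in the second decimal place, which is consistent with the table having been computed from unrounded values.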
Figure 4. Iteratively retaining informative variables (IRIV) iterative process and wavelength type decision parameter values obtained using the laboratory spectral reflectance. (a) Number of retained variables in the iterative rounds of IRIV; (b) DMEAN and p value in the 6th iteration.
Figure 5. Iteratively retaining informative variables (IRIV) iterative process and wavelength type decision parameter values obtained using the field spectral reflectance. (a) Number of retained variables in the iterative rounds of IRIV; (b) DMEAN and p value in the 6th iteration.
Figure 6. Correlation coefficients between the spectra after the different pretreatments and the As concentration of soil. The green line shows the correlation for the IRIV spectral reflectance, the red line for the Gaussian filtering (GF) spectral reflectance, the blue line for the first derivative (FD) spectral reflectance, and the black line for the GFA spectral reflectance. (a) Laboratory spectra of the soil samples; (b) field spectra of the soil samples.
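The GF, FD, and GFA pretreatments compared in Figure 6 can be sketched as a simple chain. This is a minimal illustration under stated assumptions: the filter width `sigma`, the use of `scipy.ndimage.gaussian_filter1d` and `numpy.gradient`, the chaining order (FD computed on the GF output, GFA computed on the FD output), and the function name `pretreat` are all hypothetical choices, since the record does not give the filter settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def pretreat(spectra, wavelengths, sigma=2.0):
    """Return GF, FD, and GFA versions of a (n_samples, n_bands) array.

    sigma is a hypothetical smoothing width expressed in band indices.
    """
    gf = gaussian_filter1d(spectra, sigma=sigma, axis=1)  # Gaussian filtering
    fd = np.gradient(gf, wavelengths, axis=1)             # first derivative
    gfa = gaussian_filter1d(fd, sigma=sigma, axis=1)      # Gaussian filtering again
    return gf, fd, gfa


# Synthetic spectra: a gentle linear trend plus noise, 5 samples.
rng = np.random.default_rng(0)
wavelengths = np.arange(400.0, 2401.0, 10.0)
spectra = 0.3 + 1e-4 * (wavelengths - 400.0)
spectra = spectra + rng.normal(0.0, 0.005, size=(5, wavelengths.size))

gf, fd, gfa = pretreat(spectra, wavelengths)
print(gf.shape, fd.shape, gfa.shape)
```

Each output keeps the shape of the input, so the same band indices can be used to compare correlations before and after each pretreatment.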
The feature bands and the correlation coefficients.
| Algorithm | Spectra | Spectral Set (nm) | Correlation Coefficients |
|---|---|---|---|
| IRIV | Laboratory | 486, 527, 740, 769, 849, 1033, 1147, 1184, 1185, 1241, 1359, 1365, 2233, 2336, 2382 | −0.509, −0.490, −0.278, −0.279, −0.296, −0.287, −0.271, −0.264, −0.264, −0.259, −0.264, −0.264, −0.205, −0.194, −0.204 |
| | Field | 619.6, 621, 1186.8, 1422.1, 1871.7, 1896.8, 1907.5, 2348.2, 2383.4 | −0.437, −0.448, −0.320, −0.364, −0.383, −0.391, −0.383, −0.431, −0.427 |
| IRIV-SCA | Laboratory | GF486, GF527, GFA849–769, GFA1147–1033, GFA1184–1147, GFA2382–2336 | −0.821, −0.792, −0.743, 0.822, 0.663, −0.609 |
| | Field | GF619.6, GF621, GF1186.8, GF1422.1, GF1871.7, GF1896.8, GF1907.5, GF2348.2, GF2383.4, GFA1871.7–1422.1, GFA1896.8–1871.7, GFA2348.2–1907.5 | −0.870, −0.885, −0.868, −0.901, −0.913, −0.921, −0.919, −0.931, −0.929, −0.632, −0.892, −0.806 |

GF = Gaussian filtering; GFA = Gaussian filtering again.
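The SCA step keeps only features whose absolute Spearman correlation with As content exceeds 0.6. A minimal sketch of that filter follows, using `scipy.stats.spearmanr`; the function name `select_bands` and the synthetic data are illustrative assumptions, not the authors' code or measurements.

```python
import numpy as np
from scipy.stats import spearmanr


def select_bands(features, target, threshold=0.6):
    """Keep columns whose |Spearman rho| with the target exceeds threshold."""
    rho = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        rho[j], _ = spearmanr(features[:, j], target)
    keep = np.flatnonzero(np.abs(rho) > threshold)
    return keep, rho


# Synthetic example: columns 0 and 2 track the target, column 1 is noise.
rng = np.random.default_rng(0)
target = rng.normal(size=63)
features = np.column_stack([
    -target + rng.normal(scale=0.1, size=63),  # strong negative relation
    rng.normal(size=63),                       # unrelated
    target + rng.normal(scale=0.1, size=63),   # strong positive relation
])

keep, rho = select_bands(features, target)
print(keep, np.round(rho, 2))
```

Because the threshold is applied to the absolute value, both strongly negative and strongly positive bands are retained, as in the IRIV-SCA rows of the table.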
Prediction accuracies of the As concentration obtained using laboratory spectra and field spectra based on IRIV.
| Algorithm | Spectra | Model | Calibration R² | Calibration RMSE | Calibration MAE | Validation R² | Validation RMSE | Validation MAE |
|---|---|---|---|---|---|---|---|---|
| IRIV | Laboratory | PLSR | 0.29 | 0.94 | 0.73 | 0.52 | 0.67 | 0.49 |
| | | BRR | | | | | | |
| | | RR | 0.49 | 0.80 | 0.62 | 0.49 | 0.69 | 0.56 |
| | | KRR | 0.55 | 0.76 | 0.59 | 0.48 | 0.70 | 0.56 |
| | | SVMR | 0.99 | 0.11 | 0.10 | 0.59 | 0.62 | 0.49 |
| | | XGBoost | 0.87 | 0.40 | 0.31 | 0.57 | 0.63 | 0.49 |
| | | RFR | 0.78 | 0.53 | 0.39 | 0.27 | 0.82 | 0.69 |
| | Field | PLSR | 0.27 | 1.00 | 0.75 | 0.37 | 0.74 | 0.62 |
| | | BRR | 0.16 | 1.07 | 0.85 | 0.20 | 0.84 | 0.73 |
| | | RR | 0.28 | 1.00 | 0.75 | 0.37 | 0.75 | 0.63 |
| | | KRR | 0.29 | 0.99 | 0.75 | 0.42 | 0.72 | 0.60 |
| | | SVMR | 0.75 | 0.59 | 0.32 | 0.23 | 0.83 | 0.64 |
| | | XGBoost | 0.99 | 0.14 | 0.10 | 0.29 | 0.79 | 0.69 |
| | | RFR | | | | | | |
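The three accuracy measures reported in these tables can be computed directly from measured and predicted values. The sketch below uses the common definitions (R² as one minus the ratio of residual to total sum of squares); whether the paper uses exactly these definitions is an assumption, and the numbers in the example are made up for illustration.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for measured vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))
    return r2, rmse, mae


# Made-up measured and predicted As values.
measured = [9.0, 10.0, 11.0, 8.0]
predicted = [9.5, 10.0, 10.5, 8.0]
r2, rmse, mae = regression_metrics(measured, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.3f} MAE={mae:.2f}")
# prints: R2=0.90 RMSE=0.354 MAE=0.25
```

RMSE is never smaller than MAE for the same residuals, which is a quick sanity check when reading the table rows.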
Prediction accuracies of the As concentration obtained using laboratory spectra and field spectra based on IRIV-SCA.
| Algorithm | Spectra | Model | Calibration R² | Calibration RMSE | Calibration MAE | Validation R² | Validation RMSE | Validation MAE |
|---|---|---|---|---|---|---|---|---|
| IRIV-SCA | Laboratory | PLSR | 0.93 | 0.31 | 0.22 | 0.91 | 0.23 | 0.21 |
| | | BRR | 0.94 | 0.30 | 0.19 | 0.92 | 0.33 | 0.18 |
| | | RR | 0.93 | 0.31 | 0.19 | 0.92 | 0.14 | 0.17 |
| | | KRR | 0.92 | 0.33 | 0.20 | 0.91 | 0.25 | 0.20 |
| | | SVMR | | 0.15 | | 0.97 | 0.22 | 0.11 |
| | | XGBoost | 0.98 | 0.13 | 0.01 | 0.93 | 0.25 | 0.14 |
| | | RFR | 0.97 | 0.30 | 0.12 | 0.96 | 0.18 | 0.15 |
| | Field | PLSR | 0.77 | 0.56 | 0.40 | 0.76 | 0.42 | 0.35 |
| | | BRR | 0.78 | 0.55 | 0.38 | 0.75 | 0.43 | 0.36 |
| | | RR | 0.77 | 0.56 | 0.37 | 0.75 | 0.43 | 0.35 |
| | | KRR | 0.75 | 0.58 | 0.38 | 0.74 | 0.44 | 0.35 |
| | | SVMR | 0.87 | 0.42 | 0.24 | 0.78 | 0.40 | 0.31 |
| | | XGBoost | | | | 0.83 | 0.35 | 0.29 |
| | | RFR | 0.88 | 0.41 | 0.30 | 0.66 | 0.50 | 0.36 |
Figure 7. A comparison between the measured values and predicted values of the different regression models using laboratory spectra. (a) Partial least squares regression (PLSR); (b) Bayesian ridge regression (BRR); (c) ridge regression (RR); (d) kernel ridge regression (KRR); (e) support vector machine regression (SVMR); (f) eXtreme gradient boosting (XGBoost) regression; (g) random forest regression (RFR).
Figure 8. A comparison between the measured values and predicted values of the different regression models using field spectra. (a) Partial least squares regression (PLSR); (b) Bayesian ridge regression (BRR); (c) ridge regression (RR); (d) kernel ridge regression (KRR); (e) support vector machine regression (SVMR); (f) eXtreme gradient boosting (XGBoost) regression; (g) random forest regression (RFR).