Xiaoguang Zhang, Biao Huang.
Abstract
To achieve the best hyperspectral quantitative inversion of salt-affected soils, typical saline-sodic soils were sampled in northeast China and their spectra were measured. Partial least-squares regression (PLSR) and principal component regression (PCR) models were then established between soil spectral reflectance and soil salinity. The modelling accuracies of the two models were compared across different spectral processing methods and different sampling intervals. Models based on all of the original spectral bands showed that the PLSR was superior to the PCR; however, after the spectral data were smoothed, the PLSR no longer consistently outperformed the PCR. In turn, models built on the various transformed spectra after smoothing did not consistently show the PCR to be superior to the PLSR. We therefore conclude that the prediction accuracy of the models was determined not only by the smoothing method but also by the spectral mathematical transformation. The best model was the PCR based on the median filtering data smoothing technique (MF) + log(1/X) + baseline correction transformation (R2 = 0.7206 and RMSE = 0.3929). To keep the information loss from becoming too large, an 8 nm sampling interval is suggested as the best choice when using soil spectra to predict soil salinity with both the PLSR and PCR models.
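The difference between the two regression methods named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic data, and the use of the PLS1 NIPALS algorithm are assumptions made for illustration. PCR picks its components from the spectra alone, whereas PLSR picks them using their covariance with the response.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: regress y on the leading PCA scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Rows of Vt are principal directions, found from X only (y is not used here).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    gamma = np.linalg.lstsq(scores, yc, rcond=None)[0]
    coef = V @ gamma
    return coef, y_mean - x_mean @ coef

def fit_plsr(X, y, n_components):
    """PLS1 regression (NIPALS): each direction maximises covariance with y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def calibration_rss(X, y, coef, intercept):
    """Residual sum of squares on the data used for fitting."""
    residual = y - (X @ coef + intercept)
    return float(residual @ residual)

# Synthetic stand-in for spectra (rows: samples, columns: bands); not the paper's data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.2])
y_demo = X_demo @ np.array([0.0, 0.1, 0.0, 0.5, 1.0, 2.0]) + rng.normal(scale=0.1, size=40)

rss_pcr = calibration_rss(X_demo, y_demo, *fit_pcr(X_demo, y_demo, 2))
rss_plsr = calibration_rss(X_demo, y_demo, *fit_plsr(X_demo, y_demo, 2))
print(f"calibration RSS with 2 components: PCR={rss_pcr:.3f}, PLSR={rss_plsr:.3f}")
```

In this sketch `n_components` plays the role of the "Number of predictors or factors" column in the tables. A closer calibration fit does not by itself imply better independent prediction, which is why the tables report all three accuracy columns.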
Year: 2019 PMID: 30911024 PMCID: PMC6434016 DOI: 10.1038/s41598-019-41470-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Accuracies of the PLSR and PCR models for EC based on the original spectra.
| Model | Calibration R2 | Calibration RMSE | Cross-validation R2 | Cross-validation RMSE | Independent Prediction R2 | Independent Prediction RMSE | Number of predictors or factors |
|---|---|---|---|---|---|---|---|
| PLSR | 0.8623 | 0.2431 | 0.5256 | 0.4561 | 0.5346 | 0.5071 | 7 |
| PCR | 0.5373 | 0.4455 | 0.3145 | 0.5610 | 0.4534 | 0.5496 | 11 |
“Independent Prediction” denotes the accuracy of the models on the independent validation set (36 selected samples). “Number of predictors or factors” denotes the number of spectral principal components extracted.
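The two accuracy measures reported in the tables can be computed as in the sketch below. The source does not state the exact formulas, so the definitions here are assumptions: R2 is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares) and RMSE divides by the number of samples. The coefficient-of-determination form is the one that can become negative, as the −0.0240 reported for one model in this document does; a squared correlation coefficient cannot.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, dividing by the number of samples (assumed)."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))

# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.2, 3.8]
poor_fit = [4.0, 3.0, 2.0, 1.0]

r2_good = r2_score(observed, good_fit)
r2_poor = r2_score(observed, poor_fit)   # worse than predicting the mean, so negative
rmse_good = rmse(observed, good_fit)
```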
Prediction accuracies of the PLSR and PCR models for EC based on different spectral smoothing methods.
| Model | Method | Calibration R2 | Calibration RMSE | Cross-validation R2 | Cross-validation RMSE | Independent Prediction R2 | Independent Prediction RMSE | Number of predictors or factors |
|---|---|---|---|---|---|---|---|---|
| PLSR | 1 | 0.8796 | 0.2272 | 0.6695 | 0.3867 | 0.3069 | 0.6189 | 10 |
| PLSR | 2a | 0.7695 | 0.3144 | 0.5600 | 0.4485 | 0.5806 | 0.4814 | 7 |
| PLSR | 3 | 0.8807 | 0.2262 | 0.6093 | 0.4192 | 0.6414 | 0.4452 | 10 |
| PLSR | 4 | 0.9042 | 0.2027 | 0.6698 | 0.3885 | 0.6090 | 0.4649 | 10 |
| PCR | 1 | 0.7660 | 0.3168 | 0.5926 | 0.4298 | 0.5766 | 0.4837 | 19 |
| PCR | 2 | 0.7563 | 0.3233 | 0.5512 | 0.4532 | 0.5804 | 0.4815 | 19 |
| PCR | 3 | 0.7636 | 0.3184 | 0.5356 | 0.4569 | 0.6799 | 0.4206 | 19 |
| PCR | 4 | 0.7540 | 0.3248 | 0.5830 | 0.4355 | 0.6407 | 0.4456 | 17 |
“Independent Prediction” denotes the accuracy of the models on the independent validation set (36 selected samples). “Number of predictors or factors” denotes the number of spectral principal components extracted. The numbers 1, 2, 3, and 4 represent the moving-average data smoothing technique (MA), the Savitzky-Golay data smoothing technique (SG), the median filtering data smoothing technique (MF), and the Gaussian filtering data smoothing technique (GF), respectively. The data in rows marked with the letter “a” are referenced from the literature[37].
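Three of the four smoothing techniques listed above can be sketched in plain Python as follows. The window sizes, the Gaussian width, and the handling of the spectrum edges (the window simply shrinks) are assumptions, since the source does not report them. Savitzky-Golay smoothing is left out because it needs a local polynomial fit.

```python
import math
from statistics import median

def _window(values, i, half):
    """Neighbourhood of index i, truncated at the ends of the spectrum."""
    return values[max(0, i - half): i + half + 1]

def moving_average(values, window=5):
    """MA: mean of each band and its neighbours."""
    half = window // 2
    return [sum(w) / len(w) for w in (_window(values, i, half) for i in range(len(values)))]

def median_filter(values, window=5):
    """MF: median of each band and its neighbours; robust to isolated spikes."""
    half = window // 2
    return [median(_window(values, i, half)) for i in range(len(values))]

def gaussian_filter(values, sigma=1.0):
    """GF: Gaussian-weighted mean, kernel truncated at about 3 sigma."""
    half = max(1, round(3 * sigma))
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(lo, hi)]
        total = sum(weights)
        smoothed.append(sum(w * values[j] for w, j in zip(weights, range(lo, hi))) / total)
    return smoothed

# Hypothetical reflectance values with one noise spike at index 3.
spectrum = [0.30, 0.31, 0.32, 0.90, 0.34, 0.35, 0.36]
ma = moving_average(spectrum, window=3)
mf = median_filter(spectrum, window=3)
gf = gaussian_filter(spectrum, sigma=1.0)
```

In this toy spectrum the median filter removes the spike outright, while the moving average and the Gaussian filter spread it over neighbouring bands.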
Prediction accuracy of the PLSR and PCR models for EC based on spectra smoothed with the moving-average data smoothing technique (MA).
| Model | Method | Calibration R2 | Calibration RMSE | Cross-validation R2 | Cross-validation RMSE | Independent Prediction R2 | Independent Prediction RMSE | Number of predictors or factors |
|---|---|---|---|---|---|---|---|---|
| PLSR | 1 + A | 0.8745 | 0.2320 | 0.7492 | 0.4453 | 0.6088 | 0.4650 | 10 |
| PCR | 1 + A | 0.7620 | 0.3195 | 0.5646 | 0.4701 | 0.5601 | 0.4931 | 19 |
| PLSR | 1 + A + B | 0.8973 | 0.2098 | 0.5861 | 0.4354 | 0.5863 | 0.4782 | 10 |
| PCR | 1 + A + B | 0.7081 | 0.3538 | 0.4133 | 0.5198 | 0.6087 | 0.4650 | 19 |
| PLSR | 1 + C | 0.5150 | 0.4561 | 0.1769 | 0.6109 | −0.0240 | 0.7522 | 3 |
| PCR | 1 + C | 0.2856 | 0.5535 | 0.1478 | 0.6192 | 0.0299 | 0.7648 | 6 |
| PLSR | 1 + D | 0.9013 | 0.2057 | 0.6119 | 0.4223 | 0.5792 | 0.4822 | 9 |
| PCR | 1 + D | 0.7503 | 0.3273 | 0.5121 | 0.4726 | 0.5818 | 0.4807 | 20 |
| PLSR | 1 + E | 0.9060 | 0.2008 | 0.5755 | 0.4435 | 0.5528 | 0.4971 | 9 |
| PCR | 1 + E | 0.7257 | 0.3430 | 0.5140 | 0.4741 | 0.4376 | 0.5575 | 17 |
| PLSR | 1 + F | 0.8900 | 0.2172 | 0.5782 | 0.4413 | 0.5095 | 0.5207 | 8 |
| PCR | 1 + F | 0.7357 | 0.3367 | 0.5326 | 0.4649 | 0.4547 | 0.5490 | 16 |
“Independent Prediction” denotes the accuracy of the models on the independent validation set (36 selected samples). “Number of predictors or factors” denotes the number of spectral principal components extracted. The number 1 represents the MA method. 1 + A represents MA + log(1/X); 1 + A + B represents MA + log(1/X) + baseline correction; 1 + C represents MA + first derivative; 1 + D represents MA + area normalization; 1 + E represents MA + SNV; and 1 + F represents MA + MSC.
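The transformations labelled A to F can be sketched as below. The source names them but does not define them, so every formula here is an assumption based on common chemometric usage: base-10 logarithm for log(1/X), subtraction of the minimum for baseline correction, a simple finite difference for the first derivative, division by the summed reflectance for area normalization, the sample standard deviation for SNV, and the mean spectrum as the MSC reference.

```python
import math
from statistics import mean, stdev

def log_inverse(spectrum):
    """A: log(1/X), i.e. apparent absorbance; base 10 is an assumption."""
    return [math.log10(1.0 / x) for x in spectrum]

def baseline_offset(spectrum):
    """B: baseline correction, assumed here to be subtraction of the minimum."""
    lowest = min(spectrum)
    return [x - lowest for x in spectrum]

def first_derivative(spectrum, step_nm=1.0):
    """C: finite-difference slope between adjacent bands (one value shorter)."""
    return [(b - a) / step_nm for a, b in zip(spectrum, spectrum[1:])]

def area_normalize(spectrum):
    """D: divide by the summed absolute values so the total area is 1."""
    area = sum(abs(x) for x in spectrum)
    return [x / area for x in spectrum]

def snv(spectrum):
    """E: standard normal variate, centred and scaled per spectrum."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def msc(spectra):
    """F: multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit s = offset + slope * ref, then invert it.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected

# Hypothetical reflectance values; the second and third spectra are scaled
# and shifted copies of the first, mimicking scatter effects.
base = [0.20, 0.25, 0.40, 0.50]
spectra_demo = [base, [2 * x + 0.1 for x in base], [0.5 * x - 0.05 for x in base]]

a_plus_b = baseline_offset(log_inverse(base))   # the "A + B" chain from the footnote
msc_demo = msc(spectra_demo)                    # all three rows collapse onto one curve
```

The chained call mirrors the "A + B" labels in the table: each transformation is applied to the output of the previous one, after smoothing.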
Prediction accuracy of the PLSR and PCR models for EC based on MF spectral smoothing.
| Model | Method | Calibration R2 | Calibration RMSE | Cross-validation R2 | Cross-validation RMSE | Independent Prediction R2 | Independent Prediction RMSE | Number of predictors or factors |
|---|---|---|---|---|---|---|---|---|
| PLSRa | 2 + A | 0.8600 | 0.2450 | 0.6010 | 0.4209 | 0.6677 | 0.4285 | 10 |
| PCR | 2 + A | 0.7677 | 0.3156 | 0.5247 | 0.4615 | 0.7031 | 0.4050 | 19 |
| PLSRa | 2 + A + B | 0.9159 | 0.1899 | 0.6246 | 0.4100 | 0.5612 | 0.4925 | 12 |
| PCR | 2 + A + B | 0.8033 | 0.2904 | 0.5415 | 0.4572 | 0.7206 | 0.3929 | 20 |
| PLSRa | 2 + C | 0.3732 | 0.5185 | 0.1241 | 0.6337 | 0.2066 | 0.6621 | 1 |
| PCR | 2 + C | 0.1556 | 0.6018 | 0.1217 | 0.6344 | 0.0373 | 0.7294 | 1 |
| PLSR | 2 + D | 0.8780 | 0.2288 | 0.6372 | 0.4029 | 0.6086 | 0.4651 | 9 |
| PCR | 2 + D | 0.8084 | 0.2867 | 0.6020 | 0.4243 | 0.6564 | 0.4357 | 20 |
| PLSRa | 2 + E | 0.8967 | 0.2105 | 0.6017 | 0.4243 | 0.5450 | 0.5015 | 9 |
| PCR | 2 + E | 0.7567 | 0.3230 | 0.5694 | 0.4420 | 0.5168 | 0.5167 | 18 |
| PLSRa | 2 + F | 0.8510 | 0.2508 | 0.5902 | 0.4320 | 0.4624 | 0.5450 | 8 |
| PCR | 2 + F | 0.7570 | 0.3228 | 0.5779 | 0.4378 | 0.4931 | 0.5292 | 17 |
“Independent Prediction” denotes the accuracy of the models on the independent validation set (36 selected samples). “Number of predictors or factors” denotes the number of spectral principal components extracted. The number 2 represents the median filtering data smoothing technique (MF). 2 + A represents MF + log(1/X); 2 + A + B represents MF + log(1/X) + baseline correction; 2 + C represents MF + first derivative; 2 + D represents MF + area normalization; 2 + E represents MF + SNV; and 2 + F represents MF + MSC. The data in rows marked with the letter “a” are referenced from the literature[37].
Results of calibration, validation and prediction with different resampling intervals by the PCR analysis.
| Re-sampling interval (nm) | Calibration R2 | Calibration RMSE | Cross-validation R2 | Cross-validation RMSE | Independent Prediction R2 | Independent Prediction RMSE | Number of predictors or factors |
|---|---|---|---|---|---|---|---|
| 2 | 0.7677 | 0.3156 | 0.5247 | 0.4615 | 0.7032 | 0.4050 | 19 |
| 4 | 0.7700 | 0.3141 | 0.5334 | 0.4571 | 0.6821 | 0.4192 | 19 |
| 6 | 0.7638 | 0.3183 | 0.5342 | 0.4588 | 0.6714 | 0.4261 | 19 |
| 8 | 0.7771 | 0.3092 | 0.5308 | 0.4597 | 0.7150 | 0.3968 | 19 |
| 10 | 0.7518 | 0.3262 | 0.5252 | 0.4602 | 0.6447 | 0.4431 | 19 |
| 16 | 0.7618 | 0.3196 | 0.5632 | 0.4424 | 0.6465 | 0.4420 | 18 |
| 32 | 0.8298 | 0.2702 | 0.6562 | 0.3938 | 0.5602 | 0.4930 | 19 |
| 64 | 0.8175 | 0.2798 | 0.6714 | 0.3826 | 0.4487 | 0.5520 | 18 |
“Independent Prediction” denotes the accuracy of the models on the independent validation set (36 selected samples). “Number of predictors or factors” denotes the number of spectral principal components extracted.
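The choice of sampling interval in the table above can be retraced with a short sketch. The resampling function is an assumption: the source does not say how the spectra were resampled, so averaging consecutive bands is used here as one plausible approach. The independent-prediction R2 values are copied from the table.

```python
def resample(values, factor):
    """Average consecutive groups of `factor` bands (assumed resampling scheme).

    A trailing group with fewer than `factor` bands is dropped.
    """
    usable = len(values) - len(values) % factor
    return [sum(values[i:i + factor]) / factor for i in range(0, usable, factor)]

# Independent-prediction R2 of the PCR models per resampling interval (nm),
# taken from the table above.
independent_r2 = {
    2: 0.7032, 4: 0.6821, 6: 0.6714, 8: 0.7150,
    10: 0.6447, 16: 0.6465, 32: 0.5602, 64: 0.4487,
}

best_interval = max(independent_r2, key=independent_r2.get)
print(best_interval)  # 8

# Hypothetical seven-band spectrum resampled by a factor of two.
coarse = resample([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 2)
```

The table also shows why the independent set matters for this choice: calibration R2 is highest at 32 nm and cross-validation R2 at 64 nm, yet independent prediction is weakest at those coarse intervals.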
Figure 1. Sampling plots in the classic district of northeast China. The small black dots represent the location of the sampling points. The rectangular frames represent the scope of the study area.