Lefeng Qiu, Kai Wang, Wenli Long, Ke Wang, Wei Hu, Gabriel S Amable.
Abstract
Soil cadmium (Cd) contamination has attracted considerable attention because of its detrimental effects on animals and humans. This study aimed to develop and compare stepwise linear regression (SLR), classification and regression tree (CART), and random forest (RF) models for predicting and mapping the spatial distribution of soil Cd, and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into a calibration dataset (222 samples) and a validation dataset (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate soil Cd concentrations and to identify the main factors influencing soil Cd variation. All three models achieved acceptable overall accuracies in classifying validation samples as above or below 0.3 mg/kg (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model showed the largest prediction deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg. The RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg, and it also had the highest R² (0.772). The CART model ranked between the two on ME, MAE, and RMSE (0.013 mg/kg, 0.154 mg/kg, and 0.230 mg/kg, respectively), with an R² of 0.644. The three prediction maps showed broadly similar and realistic spatial patterns of soil Cd contamination. The most heavily Cd-affected areas were located mainly in the alluvial valley plain of the Fuchun River and its tributaries, owing to the rapid industrialization and urbanization that have occurred there. The most important variable for explaining high soil Cd accumulation was the presence of metal smelting industries.
The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and the environmental variables. These results confirm that the RF approach is promising for predicting and mapping the spatial distribution of soil Cd at the regional scale.
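The abstract reports a random 222/54 calibration/validation split and four validation statistics (ME, MAE, RMSE, R²). The sketch below shows one common way to compute them in plain Python. It assumes that errors are defined as predicted minus observed and that R² is the coefficient of determination (1 - SS_res/SS_tot); the abstract states neither convention, and some studies report a squared correlation instead. The function names and the seed are illustrative.

```python
import math
import random


def split_calibration_validation(n_samples, n_calibration, seed=0):
    """Randomly partition sample indices into calibration and validation sets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return sorted(indices[:n_calibration]), sorted(indices[n_calibration:])


def validation_metrics(observed, predicted):
    """Return ME, MAE, RMSE and R^2 for paired observed/predicted values.

    Assumed conventions (not stated in the source): error = predicted - observed,
    and R^2 = 1 - SS_res / SS_tot (coefficient of determination).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(errors) / n,                   # signed bias; errors can cancel
        "MAE": sum(abs(e) for e in errors) / n,  # average error magnitude
        "RMSE": math.sqrt(ss_res / n),           # penalizes large errors more
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


cal_idx, val_idx = split_calibration_validation(276, 222, seed=42)
print(len(cal_idx), len(val_idx))  # 222 54
print(validation_metrics([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.6]))
```

Because ME keeps the sign of each error, positive and negative deviations can cancel, which is why it is usually reported alongside MAE and RMSE.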
Year: 2016; PMID: 26964095; PMCID: PMC4786095; DOI: 10.1371/journal.pone.0151131
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. Study area location and sampling points.
Fig 2. Spatial distribution of soil Cd concentrations in relation to (a) land use types, (b) industry types, and (c) the town center and main highway.
The environmental variables selected for model calibration.
| Abbreviation | Unit | Type | Mean | Minimum | Maximum |
|---|---|---|---|---|---|
| pH | - | Continuous | 6.06 | 4.40 | 8.31 |
| SOM | % | Continuous | 3.12 | 0.57 | 6.50 |
| ELE | m | Continuous | 37.66 | 4.43 | 146.26 |
| Lu_Veg | - | Binary | - | - | - |
| Lu_Pad | - | Binary | - | - | - |
| Lu_Dry | - | Binary | - | - | - |
| Lu_For | - | Binary | - | - | - |
| Lu_Orc | - | Binary | - | - | - |
| Dmetal | km | Continuous | 9.07 | 0.05 | 30.69 |
| Dhardware | km | Continuous | 3.39 | 0.04 | 11.38 |
| Dbuild | km | Continuous | 7.43 | 0.03 | 26.91 |
| Dchemical | km | Continuous | 6.24 | 0.06 | 18.50 |
| Dpaper | km | Continuous | 10.73 | 0.04 | 40.58 |
| Dother | km | Continuous | 34.27 | 0.34 | 59.43 |
| Droad | km | Continuous | 3.10 | 0.002 | 21.97 |
| Dtown | km | Continuous | 3.14 | 0.33 | 8.81 |

Binary variables are coded 0 for absence and 1 for presence.
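The five Lu_* predictors are indicator (dummy) variables, one per land-use class, so each sample has exactly one of them set to 1. The D* variables are given in km and, judging from their names and Fig 2, appear to be distances to industries, roads, and town centers; the table does not define them, so the hypothetical helper below simply assumes straight-line distance to the nearest source point, with coordinates in a projected system measured in metres. Both helpers are illustrative sketches, not the authors' preprocessing.

```python
import math

# Land-use codes taken from the table's Lu_* abbreviations.
LAND_USE_CODES = ("Veg", "Pad", "Dry", "For", "Orc")


def encode_land_use(code):
    """Return the five binary Lu_* indicators (1 = presence, 0 = absence)."""
    if code not in LAND_USE_CODES:
        raise ValueError(f"unknown land-use code: {code!r}")
    return {f"Lu_{c}": int(c == code) for c in LAND_USE_CODES}


def nearest_distance_km(sample_xy, source_xys):
    """Straight-line distance (km) from a sample to the nearest source point.

    Hypothetical helper: assumes projected (x, y) coordinates in metres.
    """
    if not source_xys:
        raise ValueError("need at least one source location")
    sx, sy = sample_xy
    return min(math.hypot(sx - x, sy - y) for x, y in source_xys) / 1000.0


print(encode_land_use("Pad"))
# {'Lu_Veg': 0, 'Lu_Pad': 1, 'Lu_Dry': 0, 'Lu_For': 0, 'Lu_Orc': 0}
print(nearest_distance_km((0.0, 0.0), [(3000.0, 4000.0), (6000.0, 8000.0)]))  # 5.0
```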
Fig 3. (a) SLR, (b) CART, and (c) RF predictions of Cd in agricultural soils in Fuyang County.
The areas in red are predicted to exceed the Chinese soil Cd guide limit of 0.3 mg/kg.
Estimated parameters of the SLR model.
| Variable | Parameter | Std. error | t value | p-value |
|---|---|---|---|---|
|  | -3.619 | 0.379 | -9.541 | 0.000 |
|  | -0.007 | 0.002 | -4.314 | 0.000 |
|  | 0.473 | 0.061 | 7.79 | 0.000 |
|  | -0.038 | 0.008 | -5.012 | 0.000 |
|  | -0.087 | 0.035 | -2.503 | 0.013 |
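The source does not state the entry and removal criteria used for the stepwise regression. As a minimal sketch, assuming greedy forward selection by AIC (a common variant; many packages instead use p-value thresholds or combined forward/backward steps), the code below fits ordinary least squares in pure Python and adds predictors one at a time while AIC keeps falling. It is intended for small, well-conditioned datasets only, and all names and data are illustrative.

```python
import math


def ols_fit(columns, y):
    """Least-squares fit of y = b0 + b1*x1 + ...; returns (coefficients, RSS).

    Solves the normal equations X'X b = X'y by Gauss-Jordan elimination
    with partial pivoting, which is adequate only for small problems.
    """
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    aug = [
        [sum(design[r][i] * design[r][j] for r in range(n)) for j in range(p)]
        + [sum(design[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        if abs(aug[c][c]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(p):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    coefs = [aug[i][p] for i in range(p)]
    rss = sum((y[i] - sum(b * d for b, d in zip(coefs, design[i]))) ** 2 for i in range(n))
    return coefs, rss


def aic(rss, n, k):
    """Gaussian AIC up to a constant; k = number of fitted coefficients."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k  # guard against log(0)


def forward_stepwise(predictors, y):
    """Add the predictor that lowers AIC most; stop when none does.

    predictors maps a name to its column. Returns names in order of entry.
    """
    n = len(y)
    selected = []
    best = aic(ols_fit([], y)[1], n, 1)  # intercept-only model
    while True:
        trials = []
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            try:
                rss = ols_fit(cols, y)[1]
            except ValueError:
                continue  # skip candidates that make the fit singular
            trials.append((aic(rss, n, len(cols) + 1), name))
        if not trials:
            break
        score, name = min(trials)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected


# Illustrative data: y depends on x1; x2 is unrelated.
x1 = [float(i) for i in range(1, 11)]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.02, -0.02, 0.1, -0.1, 0.0]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]
print(forward_stepwise({"x1": x1, "x2": x2}, y))
```

In a fitted model of this kind, each coefficient's t value is its estimate divided by its standard error, which is how the t value column relates to the Parameter and Std. error columns.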
Fig 4. CART model developed to predict Cd in agricultural soils in Fuyang County.
The length of each branch is proportional to the variance it explains. Class labels: Below, 0–0.3 mg/kg; Above, > 0.3 mg/kg.
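The caption relates branch length to variance explained. One way to quantify this for a single binary split, assumed here rather than taken from the source, is the reduction in the sum of squared deviations from the parent node to its two children, expressed as a fraction of the parent's sum of squares. With the two classes coded 0 and 1, this reduction is proportional to the decrease in Gini impurity that CART uses for classification. The example data below are made up.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def variance_explained(x, y, threshold):
    """Fraction of the parent's sum of squares removed by splitting at x < threshold."""
    total = sum_of_squares(y)
    if total == 0.0:
        return 0.0
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    return (total - sum_of_squares(left) - sum_of_squares(right)) / total


def best_split(x, y):
    """Try midpoints between sorted distinct x values; return (fraction, threshold)."""
    xs = sorted(set(x))
    best = (0.0, None)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        gain = variance_explained(x, y, threshold)
        if gain > best[0]:
            best = (gain, threshold)
    return best


# Hypothetical example: distance to a source (km) versus soil Cd (mg/kg).
dist = [0.5, 1.0, 2.0, 8.0, 9.0, 10.0]
cd = [0.45, 0.40, 0.42, 0.15, 0.18, 0.12]
print(best_split(dist, cd))  # threshold 5.0, fraction about 0.97
```

A full tree repeats this search over every predictor at every node, so the split chosen first (and drawn with the longest branch) is the one that removes the most variance.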
Relative importance of the variables for explaining soil Cd variation, indicated by the change in the misclassification error rate and in the number of terminal nodes of the CART model when predictors are excluded.
| Variable | Misclassification error rate | Number of terminal nodes |
|---|---|---|
|  | 8.11% | 9 |
|  | 9.01% | 9 |
|  | 9.91% | 8 |
|  | 10.36% | 7 |
|  | 10.36% | 8 |
|  | 10.81% | 7 |
|  | 13.51% | 8 |
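The table ranks predictors by how much the CART model degrades when each is excluded. The sketch below illustrates that drop-one idea under simplifying assumptions: a depth-1 tree (decision stump) stands in for the full, pruned CART model, error is measured on the training data rather than on a validation set or cross-validation folds, and only the misclassification rate (not the number of terminal nodes) is tracked. Feature names and data are hypothetical.

```python
def stump_error(rows, labels, features):
    """Training misclassification rate of the best one-split classifier
    (a depth-1 tree) that may use only the given features."""
    n = len(labels)
    majority = max(set(labels), key=labels.count)
    best = sum(lab != majority for lab in labels) / n  # no-split baseline
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] < t]
            right = [lab for r, lab in zip(rows, labels) if r[f] >= t]
            # Each side predicts its majority class; count the rest as errors.
            errors = sum(len(s) - max(s.count(c) for c in set(s)) for s in (left, right))
            best = min(best, errors / n)
    return best


def drop_one_importance(rows, labels, features, error_fn):
    """Error after excluding each predictor in turn; larger = more important."""
    return {f: error_fn(rows, labels, [g for g in features if g != f]) for f in features}


rows = [
    {"a": 1, "b": 1, "c": 3}, {"a": 2, "b": 2, "c": 1}, {"a": 3, "b": 6, "c": 2},
    {"a": 6, "b": 3, "c": 3}, {"a": 7, "b": 7, "c": 1}, {"a": 8, "b": 8, "c": 2},
]
labels = ["High", "High", "High", "Low", "Low", "Low"]
print(drop_one_importance(rows, labels, ["a", "b", "c"], stump_error))
```

Here excluding "a" raises the error the most, because "a" alone separates the two classes; the other features are redundant once "a" is available.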
Error matrices for the SLR, CART and RF predictions of the soil Cd concentrations.
| Observed | SLR Low | SLR High | CART Low | CART High | RF Low | RF High |
|---|---|---|---|---|---|---|
| Low | 27 | 5 | 23 | 9 | 25 | 7 |
| High | 10 | 12 | 7 | 15 | 6 | 16 |
| Accuracy (%) | 72.97 | 70.59 | 76.67 | 62.50 | 80.65 | 69.57 |

Rows give observed classes and columns give predicted classes. SLR: total accuracy 72.22%, kappa coefficient 0.4048. CART: total accuracy 70.37%, kappa coefficient 0.3949. RF: total accuracy 75.93%, kappa coefficient 0.5050. Low: 0–0.3 mg/kg; High: > 0.3 mg/kg.
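The overall accuracy and kappa values can be recomputed from the error-matrix counts. The sketch assumes, as the identical row totals across models (32 and 22) suggest, that rows are observed classes and columns are predicted classes; Cohen's kappa is symmetric in rows and columns, so this choice does not affect kappa. The hypothetical helpers also show how concentrations map to the Low/High classes at the 0.3 mg/kg limit.

```python
GUIDE_LIMIT = 0.3  # mg/kg; Low: 0-0.3 mg/kg, High: > 0.3 mg/kg
CLASSES = ("Low", "High")


def to_class(cd):
    """Assign a Cd concentration (mg/kg) to the Low or High class."""
    return "High" if cd > GUIDE_LIMIT else "Low"


def error_matrix(observed, predicted):
    """2x2 counts; rows = observed class, columns = predicted class."""
    m = [[0, 0], [0, 0]]
    for o, p in zip(observed, predicted):
        m[CLASSES.index(to_class(o))][CLASSES.index(to_class(p))] += 1
    return m


def accuracy_and_kappa(m):
    """Overall accuracy (p_o) and Cohen's kappa (p_o - p_e) / (1 - p_e)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    p_o = sum(m[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)


# Counts from the error-matrix table (rows: observed Low, High).
MATRICES = {
    "SLR": [[27, 5], [10, 12]],
    "CART": [[23, 9], [7, 15]],
    "RF": [[25, 7], [6, 16]],
}

for model, m in MATRICES.items():
    acc, kappa = accuracy_and_kappa(m)
    print(f"{model}: total accuracy {acc:.2%}, kappa {kappa:.4f}")
```

Recomputing from the counts reproduces the reported overall accuracies (39/54, 38/54, and 41/54 correct classifications).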
Fig 5. Performance of the SLR, CART, and RF models in predicting soil Cd concentrations.