Joel Podgorski, Ruohan Wu, Biswajit Chakravorty, David A Polya.
Abstract
Groundwater is a critical resource in India for the supply of drinking water and for irrigation. Its usage is limited not only by its quantity but also by its quality. Among the most important contaminants of groundwater in India is arsenic, which naturally accumulates in some aquifers. In this study we create a random forest model with over 145,000 arsenic concentration measurements and over two dozen predictor variables of surface environmental parameters to produce hazard and exposure maps of the areas and populations potentially exposed to high arsenic concentrations (>10 µg/L) in groundwater. Statistical relationships found between the predictor variables and arsenic measurements are broadly consistent with major geochemical processes known to mobilize arsenic in aquifers. In addition to known high arsenic areas, such as along the Ganges and Brahmaputra rivers, we have identified several other areas around the country that have hitherto not been identified as potential arsenic hotspots. Based on recently reported rates of household groundwater use for rural and urban areas, we estimate that between about 18 and 30 million people in India are currently at risk of high exposure to arsenic through their drinking water supply. The hazard models here can be used to inform prioritization of groundwater quality testing and environmental public health tracking programs.
Keywords: India; arsenic; geospatial modeling; groundwater; machine learning; random forest
Year: 2020 PMID: 32998478 PMCID: PMC7579008 DOI: 10.3390/ijerph17197119
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Summary of groundwater arsenic concentration data used in the model. Existing arsenic measurements were taken from over 30 sources, mainly from India but also from some neighboring South Asian countries. Summaries are given for before and after spatial averaging.
| Country | Number of Data Points, before Spatial Averaging | Mean (±Standard Deviation) As Concentration, before Spatial Averaging | Number of Data Points, after Spatial Averaging | Mean (±Standard Deviation) As Concentration, after Spatial Averaging |
|---|---|---|---|---|
| India | 132,028 (91%) | 53 ± 451 µg/L | 17,528 (74%) | 33 ± 162 µg/L |
| Bangladesh | 4215 (3%) | 62 ± 139 µg/L | 3674 (15%) | 56 ± 120 µg/L |
| Nepal | 7575 (5%) | 15 ± 62 µg/L | 1846 (8%) | 16 ± 66 µg/L |
| Pakistan | 1279 (1%) | 103 ± 123 µg/L | 760 (3%) | 71 ± 98 µg/L |
| Total | 145,097 | 52 ± 438 µg/L | 23,808 | 37 ± 150 µg/L |
Figure 1. Groundwater arsenic data points and simplified geology of the Indian subcontinent. (a) Spatially averaged arsenic data points used in modeling along with topography in India and neighboring countries. (b) Lithology of the Indian subcontinent.
Predictor variables and descriptions. The 26 parameters used as predictor variables in modeling are grouped into the categories ‘Climate’, ‘Soil’ and ‘Other’. Their resolution is one square km at the equator.
| Variable | Description |
|---|---|
| Climate | |
| Actual evapotranspiration (AET) | Average rate of actual evapotranspiration (mm/yr) |
| Aridity | PET/Precipitation |
| Potential evapotranspiration (PET) | Average rate of potential evapotranspiration (mm/yr) |
| Precipitation | Average rate of precipitation (mm/yr) |
| Priestley-Taylor alpha coefficient | AET/PET |
| Soil | |
| Calcisols | Probability of the occurrence of calcisols |
| Clay, subsoil | Weight % of clay particles (<0.0002 mm) at 2 m depth |
| Clay, topsoil | Weight % of clay particles (<0.0002 mm) at 0 m depth |
| Coarse fragments, subsoil | Volumetric % of coarse fragments (>2 mm) at 2 m depth |
| Coarse fragments, topsoil | Volumetric % of coarse fragments (>2 mm) at 0 m depth |
| Fluvisols | Probability of the occurrence of fluvisols |
| Gleysols | Probability of the occurrence of gleysols |
| Sand, subsoil | Weight % of sand particles (0.05–2 mm) at 2 m depth |
| Sand, topsoil | Weight % of sand particles (0.05–2 mm) at 0 m depth |
| Silt, subsoil | Weight % of silt particles (0.0002–0.05 mm) at 2 m depth |
| Silt, topsoil | Weight % of silt particles (0.0002–0.05 mm) at 0 m depth |
| Soil cation exchange capacity | Cation exchange capacity (cmolc/kg) at 2 m depth |
| Soil organic carbon | Soil organic carbon (permille) at 2 m depth |
| Soil organic carbon density | Soil organic carbon density (kg/m³) at 2 m depth |
| Soil pH | Soil pH measured in water at 2 m depth |
| Solonchaks | Probability of the occurrence of solonchaks |
| Water wilting point | Vol. % of available soil water until wilting point at 2 m depth |
| Other | |
| Land cover | 17 different land cover categories according to the International Geosphere-Biosphere Programme (IGBP) |
| Lithology | 15 different categories of lithology |
| Topographic wetness index | Combination of upslope contributing area and slope |
| Water table depth | Mean water table depth (m) |
Confusion matrix and other statistics resulting from the analysis of the final random forest model with the test dataset at a probability cutoff of 0.5.
| Prediction | Reference: ≤10 µg/L | Reference: >10 µg/L |
|---|---|---|
| ≤10 µg/L | 2223 | 462 |
| >10 µg/L | 561 | 1514 |

| Statistic | Value |
|---|---|
| Accuracy (Acc) | 0.7851 |
| No information rate (NIR) | 0.5849 |
| p value (Acc > NIR) | <2.2 × 10⁻¹⁶ |
| Cohen’s kappa | 0.5606 |
| Sensitivity | 0.7662 |
| Specificity | 0.7985 |
| Positive predictive value | 0.7296 |
| Negative predictive value | 0.8279 |
| Prevalence | 0.4151 |
| Balanced accuracy | 0.7823 |
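
The summary statistics follow directly from the four counts in the confusion matrix. The Python sketch below recomputes them as a check. It assumes that the positive class is an arsenic concentration above 10 µg/L and that the counts are 2223 true negatives, 462 false negatives, 561 false positives and 1514 true positives; this assignment is an inference that reproduces the reported values, not something the table states explicitly. The function name `confusion_stats` is hypothetical.

```python
def confusion_stats(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix (positive class: >10 µg/L)."""
    total = tn + fn + fp + tp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of true exceedances that are detected
    specificity = tn / (tn + fp)   # share of true non-exceedances that are detected
    # No information rate: share of the majority reference class.
    nir = max(tp + fn, tn + fp) / total
    # Cohen's kappa: agreement beyond what the row/column totals alone would give.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "no_information_rate": nir,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Assumed cell assignment (see the note above).
stats = confusion_stats(tn=2223, fn=462, fp=561, tp=1514)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, these values agree with the reported statistics, which supports the assumed orientation of the matrix.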
Figure 2. Arsenic hazard maps. (a) Probability of arsenic concentration in groundwater exceeding 10 µg/L. (b) High hazard areas in India based on probability cutoffs of 0.49 and 0.55.
Figure 3. Normalized variable importance in terms of mean decrease in accuracy and mean decrease in Gini as calculated on the test dataset. Both decrease in accuracy and decrease in Gini were normalized by their respective greatest values (see Table S1).
Figure 4. Correlations of predictor variables (a–x) with percentages of arsenic data points exceeding 10 µg/L in 16 equally sized bins. Kendall correlations (τB) with a statistically significant p value (95% confidence level) are shown in bold.
Figure 5. Analyses of model performance using full modeling dataset. (a) Sensitivity and specificity were found to be equivalent at a probability cutoff of 0.49 with a corresponding accuracy of 96%. (b) Positive predictive value (PPV) and negative predictive value (NPV) were found to be equivalent at a probability cutoff of 0.55 also with a corresponding accuracy of 96%.
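
The cutoffs in Figure 5 are the probabilities at which two performance measures cross. One way such a crossover could be located is a simple scan over candidate cutoffs, as in the Python sketch below. This is an illustration under stated assumptions and not the authors' procedure: the function names `rates` and `crossover_cutoff`, the grid of 101 cutoffs, and the synthetic probabilities and labels are all hypothetical.

```python
import random


def rates(probs, labels, cutoff):
    """Return (sensitivity, specificity, ppv, npv) at a probability cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_high = p >= cutoff
        if predicted_high and y:
            tp += 1
        elif predicted_high:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    def ratio(a, b):
        return a / b if b else float("nan")

    return ratio(tp, tp + fn), ratio(tn, tn + fp), ratio(tp, tp + fp), ratio(tn, tn + fn)


def crossover_cutoff(probs, labels, pair="sens_spec", steps=101):
    """Scan evenly spaced cutoffs in [0, 1]; return the first with the smallest gap."""
    best_gap, best_cutoff = None, None
    for i in range(steps):
        cutoff = i / (steps - 1)
        sens, spec, ppv, npv = rates(probs, labels, cutoff)
        a, b = (sens, spec) if pair == "sens_spec" else (ppv, npv)
        if a != a or b != b:  # skip undefined (NaN) ratios
            continue
        gap = abs(a - b)
        if best_gap is None or gap < best_gap:
            best_gap, best_cutoff = gap, cutoff
    return best_cutoff


# Synthetic demonstration data (hypothetical, not from the study).
random.seed(0)
labels = [1 if random.random() < 0.4 else 0 for _ in range(2000)]
probs = [random.betavariate(4, 2) if y else random.betavariate(2, 4) for y in labels]
print("sensitivity = specificity near", crossover_cutoff(probs, labels, "sens_spec"))
print("PPV = NPV near", crossover_cutoff(probs, labels, "ppv_npv"))
```

The scan is deliberately coarse; a finer grid or interpolation between neighbouring cutoffs would refine the estimate.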
Area and population potentially exposed to arsenic concentrations greater than 10 µg/L by state/territory. Based on probabilities in Figure 2a exceeding 0.49 and 0.55 along with the rates [64] of household groundwater use in rural and urban areas. See text for limitations.
| State/Territory | Percentage of Land Area Exposed | Population Exposed |
|---|---|---|
| Andaman and Nicobar | 0.4–2.9% | 300–2700 |
| Andhra Pradesh | <0.1% | 2700–6600 |
| Arunachal Pradesh | 4.3–21.6% | 69,800–157,700 |
| Assam | 42.3–59.7% | 6,536,000–8,771,100 |
| Bihar | 3.0–12.0% | 1,226,800–4,636,500 |
| Chandigarh | n/a | n/a |
| Chhattisgarh | <0.1% | 700–1100 |
| Dadra and Nagar Haveli and Daman and Diu | n/a | n/a |
| Delhi | n/a | n/a |
| Goa | n/a | n/a |
| Gujarat | 0.3–4.0% | 19,300–97,300 |
| Haryana | 0.4–5.0% | 39,200–447,200 |
| Himachal Pradesh | 0.4–0.9% | 36,800–76,900 |
| Jammu and Kashmir | 0.7–1.1% | 337,800–470,800 |
| Jharkhand | 0.2–0.6% | 103,600–231,400 |
| Karnataka | 0.1–0.5% | 29,400–93,900 |
| Kerala | <0.4% | 10,400–77,300 |
| Madhya Pradesh | 0.7–2.1% | 201,200–552,100 |
| Maharashtra | <0.1% | 300–1700 |
| Manipur | 8.6–22.7% | 46,500–121,900 |
| Meghalaya | 0.2–1.5% | 3300–13,800 |
| Mizoram | 4.5–18.0% | 23,500–82,400 |
| Nagaland | 5.9–21.5% | 54,300–188,600 |
| Odisha | <0.4% | 1300–194,600 |
| Puducherry | n/a | n/a |
| Punjab | 2.3–6.8% | 299,100–788,500 |
| Rajasthan | <0.1% | 2300–10,800 |
| Sikkim | n/a | n/a |
| Tamil Nadu | <0.1% | 200–200 |
| Telangana | <0.2% | 4100–12,900 |
| Tripura | 0.1–1.4% | 800–10,000 |
| Uttar Pradesh | 1.0–2.4% | 1,222,800–2,458,500 |
| Uttarakhand | <0.7% | 900–42,300 |
| West Bengal | 12.9–20.5% | 7,432,200–10,144,700 |
| Total | 2.0–4.2% | 17,710,000–29,690,000 |
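
The caption describes the exposure estimate as a combination of the hazard probability, a cutoff, and household groundwater-use rates for rural and urban areas. A minimal Python sketch of that kind of calculation is given below. Everything in it is an assumption for illustration: the `Cell` structure, the function `exposed_population`, the per-cell inputs and the use rates are hypothetical placeholders and are not values from the study or from reference [64].

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One map cell: modeled hazard probability, residents, and settlement type."""
    probability: float  # modeled probability of arsenic > 10 µg/L
    population: int     # people living in the cell
    urban: bool         # True for urban, False for rural


def exposed_population(cells, cutoff, use_rates):
    """Sum groundwater-using residents of cells whose probability exceeds the cutoff."""
    total = 0.0
    for cell in cells:
        if cell.probability > cutoff:
            rate = use_rates["urban" if cell.urban else "rural"]
            total += cell.population * rate
    return total


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
cells = [
    Cell(probability=0.60, population=1000, urban=False),
    Cell(probability=0.50, population=2000, urban=True),
    Cell(probability=0.30, population=5000, urban=False),
]
use_rates = {"rural": 0.5, "urban": 0.25}  # placeholder groundwater-use fractions

low_cutoff_estimate = exposed_population(cells, 0.49, use_rates)
high_cutoff_estimate = exposed_population(cells, 0.55, use_rates)
print(low_cutoff_estimate, high_cutoff_estimate)
```

Evaluating the sum at both cutoffs yields a range rather than a single figure, which is how the table reports each state's exposed population.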