Wenhui Zhu, Jun He, Hongzhen Zhang, Liang Cheng, Xintong Yang, Xiahui Wang, Guohua Ji
Abstract
The traditional risk management and control mode (RMCM) for regional sites suffers from low efficiency, high cost, and a lack of systematism. To address these defects and explore the applicability of machine learning, a characteristic dataset for RMCM in regional sites was established. Three decision tree (DT) algorithms (CHAID, EXHAUSTIVE CHAID, and CART) and two artificial neural network (ANN) algorithms [back propagation (BP) and radial basis function (RBF)] were implemented to predict RMCM in regional sites. In terms of accuracy (ACC), precision (PRE), recall (REC), and F1 score, CART-DT outperformed CHAID-DT and EXHAUSTIVE CHAID-DT (E-CHAID-DT), and BP-ANN outperformed RBF-ANN. CART-DT, however, was inferior to BP-ANN on all four metrics. The BP-ANN model performs well at non-linear mapping, has a flexible network structure, and carries a low risk of over-fitting. A case study of a typical county demonstration area confirmed the extensibility of the method, which shows great potential for future RMCM prediction in regional sites.
Keywords: artificial neural network (ANN); decision tree (DT); prediction performance; regional sites; risk management and control mode (RMCM)
Year: 2022 PMID: 35692327 PMCID: PMC9178191 DOI: 10.3389/fpubh.2022.892423
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Technical route of RMCM prediction in regional sites.
Description of vectorization rules of characteristic variables.

| Type | Characteristic variable | Classification rule | Code |
|---|---|---|---|
| 1 | Regional dominant land type (RDLT) | Dominant land type: sensitive; non-sensitive | {0, 1} |
| 1 | Regional land value-added potential (RLVP) | Land increment is higher or lower than restoration cost: yes; no | {0, 1} |
| 1 | Regional dominant industry risk (RDIS) | Whether it belongs to smelting, petrochemical, coking, electroplating, tanning, or hazardous waste disposal industries: yes; no | {0, 1} |
| 1 | Regional pollution level (RPL) | ≥ intervention value; screening value < X < intervention value | {0, 1} |
| 1 | Regional dominant functional area (RDFA) | Ecological; agricultural; urban | {1, 2, 3} |
| 1 | Regional protection goal (RPG) | Surface water; groundwater; people; other | {1, 2, 3, 4} |
| 1 | Regional pollution type (RPT) | Heavy metal; organic; compound | {1, 2, 3} |
| 1 | Regional topography (RT) | Plain; hill; mountain | {1, 2, 3} |
| 2 | Regional pollution range (RPR) | Large (non-point); medium (points); small (point) | {1, 2, 3} |
| 2 | Regional groundwater migration (RGM) | Strong (gravels); medium (sand); weak (silt) | {1, 2, 3} |
| 2 | Regional soil barrier capacity (RSBC) | Strong (clay); medium (sand or silt); weak (gravels) | {1, 2, 3} |
| 3 | Regional average production period (RAPP) | Long (> 20 years); medium (5–20 years); short (< 5 years) | {1, 2, 3} |
| 3 | Regional per capita gross domestic product (RPCGDP) | Advanced (> $20,000); medium ($8,000–$20,000); backward (< $8,000) | {1, 2, 3} |
| 3 | Regional road network density (RRND) | Large (> 5 km/km²); medium (1–5 km/km²); small (< 1 km/km²) | {1, 2, 3} |
| 3 | Regional average annual rainfall (RAAR) | Humid (> 800 mm); medium (400–800 mm); arid (0–400 mm) | {1, 2, 3} |
| 3 | Regional average annual wind speed (RAAWS) | Strong (> 4 m/s); medium (2–4 m/s); weak (< 2 m/s) | {1, 2, 3} |
| 3 | Regional cultivated land density (RCLD) | Large (> 30%); medium (10–30%); small (< 10%) | {1, 2, 3} |
| 3 | Regional river network density (RRD) | Large (> 0.5 km/km²); medium (0.1–0.5 km/km²); small (< 0.1 km/km²) | {1, 2, 3} |
| 4 | Regional enterprise density (RED) | Large (≥ 5/km²); medium (1–5/km²); small (≤ 1/km²) | {1, 2, 3} |
| 4 | Regional population density (RPD) | Large (> 100 persons/km²); medium (25–100 persons/km²); small (< 25 persons/km²) | {1, 2, 3} |
Type 1 represents an unordered categorical variable; Type 2, an ordered categorical variable; Type 3, a continuous variable; Type 4, a discrete variable.
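The vectorization rules above can be sketched as a small encoder that turns raw regional measurements into the integer codes in the table. This is a minimal illustration, not the authors' code: the names `THRESHOLDS`, `CATEGORIES`, `bin_three_way`, and `vectorize` are hypothetical, only a subset of variables is covered (the binary variables and RED, whose boundaries are inclusive, are left out), and values falling exactly on a class boundary are assigned to the middle class by assumption, since the table does not say which side a boundary belongs to.

```python
from typing import Dict, Union

# Hypothetical (upper, lower) thresholds for the three-level variables in the table.
# Code 1 = above upper, 3 = below lower, 2 = in between (boundary values -> 2, an assumption).
THRESHOLDS = {
    "RAPP": (20, 5),          # years of production
    "RPCGDP": (20000, 8000),  # USD per capita
    "RRND": (5, 1),           # km of road per km²
    "RAAR": (800, 400),       # mm of rainfall per year
    "RAAWS": (4, 2),          # m/s
    "RCLD": (30, 10),         # % cultivated land
    "RRD": (0.5, 0.1),        # km of river per km²
    "RPD": (100, 25),         # persons per km²
}

# Category-to-code maps following the order in which the table lists each variable's classes.
CATEGORIES = {
    "RDFA": {"ecological": 1, "agricultural": 2, "urban": 3},
    "RPT": {"heavy metal": 1, "organic": 2, "compound": 3},
    "RT": {"plain": 1, "hill": 2, "mountain": 3},
}


def bin_three_way(value: float, upper: float, lower: float) -> int:
    """Map a numeric value to 1 (high), 2 (medium) or 3 (low)."""
    if value > upper:
        return 1
    if value < lower:
        return 3
    return 2


def vectorize(record: Dict[str, Union[float, str]]) -> Dict[str, int]:
    """Encode a raw site record; variables not covered by this sketch are skipped."""
    encoded: Dict[str, int] = {}
    for name, raw in record.items():
        if name in THRESHOLDS:
            upper, lower = THRESHOLDS[name]
            encoded[name] = bin_three_way(float(raw), upper, lower)
        elif name in CATEGORIES:
            encoded[name] = CATEGORIES[name][str(raw).strip().lower()]
    return encoded
```

For example, `vectorize({"RAAR": 1200, "RPCGDP": 5000, "RT": "Hill", "RPD": 60})` yields `{"RAAR": 1, "RPCGDP": 3, "RT": 2, "RPD": 2}`.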
Figure 2. The DT model based on the CHAID algorithm (A), the E-CHAID algorithm (B), and the CART algorithm (C).
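CART grows binary trees by choosing, at each node, the split that most reduces an impurity measure, commonly the Gini index for classification (CHAID, by contrast, selects splits using chi-square tests). The sketch below shows only that split-selection step for one integer-coded feature, such as the 1/2/3 codes in the vectorization table; it is a toy illustration with hypothetical function names, not the software the authors used.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold_split(x: Sequence[int], y: Sequence[Hashable]) -> Tuple[Optional[int], float]:
    """Return the threshold t (x <= t vs x > t) with the lowest weighted child impurity.

    If no split improves on the parent (or only one distinct value exists),
    returns (None, parent impurity).
    """
    n = len(y)
    best_t, best_score = None, gini(y)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [lab for xi, lab in zip(x, y) if xi <= t]
        right = [lab for xi, lab in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

With `x = [1, 1, 2, 3, 3]` and labels `["M1", "M1", "M2", "M4", "M4"]`, the parent impurity is 0.64, and the split `x <= 1` reduces it to 4/15 (about 0.267); a full CART tree applies this search recursively across all features.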
Figure 3. The ANN model based on the BP algorithm.
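Back propagation trains a feed-forward network by passing inputs forward to compute outputs, then propagating the output error backward through the layers to obtain weight gradients. The following is a minimal NumPy sketch of that loop for one hidden layer. The sigmoid hidden units, softmax output, cross-entropy loss, layer size, learning rate, and epoch count are all assumptions chosen for illustration; the network structure in Figure 3 is not reproduced here.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, n_hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with back propagation (full-batch gradient descent).

    Assumptions (not from the paper): sigmoid hidden units, softmax outputs,
    cross-entropy loss, and the default hyper-parameters above.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    T = np.eye(k)[y]  # one-hot targets
    W1 = rng.normal(0.0, 0.5, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, k))
    b2 = np.zeros(k)
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(X @ W1 + b1)
        Z = H @ W2 + b2
        Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability for softmax
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.sum(T * np.log(P + 1e-12), axis=1))))
        # Backward pass: propagate the output error back to the hidden layer
        dZ = (P - T) / n
        dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
        dH = (dZ @ W2.T) * H * (1.0 - H)
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_bp(params, X):
    W1, b1, W2, b2 = params
    H = sigmoid(np.asarray(X, dtype=float) @ W1 + b1)
    return np.argmax(H @ W2 + b2, axis=1)
```

Mini-batch updates, momentum, or early stopping on a validation set are common refinements; the sketch keeps plain full-batch gradient descent so the forward and backward passes stay easy to follow.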
Figure 4. The ANN model based on the RBF algorithm.
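An RBF network replaces sigmoid hidden units with radial basis functions, typically Gaussians centred on prototype points, and feeds them into a linear output layer. The sketch below assumes a common two-stage scheme: the centres and a shared width `sigma` are fixed in advance (here simply the training points, for a toy example; clustering is another usual choice), and the output weights are then solved by linear least squares with `np.linalg.lstsq`. The function names are hypothetical, and the paper's actual RBF configuration is not stated in this record.

```python
import numpy as np


def rbf_features(X, centers, sigma):
    """Gaussian activations exp(-||x - c||^2 / (2 sigma^2)) for every sample/centre pair."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def design_matrix(X, centers, sigma):
    """RBF activations plus a constant bias column."""
    return np.hstack([rbf_features(X, centers, sigma), np.ones((len(X), 1))])


def fit_rbf(X, y, centers, sigma=1.0):
    """Fit the linear output layer by least squares on one-hot targets."""
    y = np.asarray(y, dtype=int)
    T = np.eye(int(y.max()) + 1)[y]
    W, *_ = np.linalg.lstsq(design_matrix(X, centers, sigma), T, rcond=None)
    return W


def predict_rbf(X, centers, sigma, W):
    return np.argmax(design_matrix(X, centers, sigma) @ W, axis=1)
```

Because the output layer is linear, training reduces to one least-squares solve once the centres are chosen; this is a key practical difference from the iterative BP training, although the RBF layer's performance then depends heavily on how the centres and width are picked.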
Figure 5. Prediction performance evaluation for the CHAID, E-CHAID, and CART DTs: (A) training database; (B) validation database.
Figure 6. Prediction performance evaluation for the BP and RBF ANNs: (A) training database; (B) validation database.
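The performance evaluations rely on accuracy (ACC), precision (PRE), recall (REC), and the F1 score computed from true and predicted RMCM labels. The record does not say how per-class scores were aggregated across the RMCM classes, so the sketch below assumes macro-averaging (an unweighted mean over classes); the function name and the labels `M1`, `M2`, `M4` are used only for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def classification_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    classes = sorted(set(y_true) | set(y_pred), key=str)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        # Precision: share of samples predicted as c that really are c.
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        # Recall: share of samples of class c that were predicted as c.
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return {
        "ACC": sum(tp.values()) / len(y_true),
        "PRE": sum(precisions) / n,
        "REC": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }
```

For `y_true = ["M1", "M2", "M2", "M4"]` and `y_pred = ["M1", "M2", "M4", "M4"]`, ACC is 0.75, macro PRE and REC are both 2.5/3 (about 0.833), and macro F1 is 7/9 (about 0.778). Weighting each class by its support instead would give different PRE, REC, and F1 values on imbalanced data.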
Figure 7. Importance (contribution rate) of BP-ANN input neurons to the output variables.
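The caption does not state how the contribution rates of the input neurons were computed. One widely used weight-based method for a single-hidden-layer network is Garson's algorithm, which apportions each hidden unit's output weight among the inputs in proportion to their absolute input-to-hidden weights. The sketch below uses it as a stand-in, not as the authors' procedure; summing absolute output weights over several output neurons is an additional assumption, and `garson_importance` is a hypothetical name.

```python
import numpy as np


def garson_importance(W_in_hidden, w_hidden_out):
    """Relative input importance via Garson's algorithm (single hidden layer).

    W_in_hidden: (n_inputs, n_hidden) weights.
    w_hidden_out: (n_hidden,) or (n_hidden, n_outputs) weights; for several outputs the
    absolute weights are summed per hidden unit (an assumption).
    Returns importance shares that sum to 1. Assumes no hidden unit has all-zero input weights.
    """
    W = np.abs(np.asarray(W_in_hidden, dtype=float))
    v = np.abs(np.asarray(w_hidden_out, dtype=float))
    if v.ndim == 2:
        v = v.sum(axis=1)
    # Share of each input within each hidden unit, scaled by that unit's output weight
    contrib = W / W.sum(axis=0, keepdims=True) * v[None, :]
    importance = contrib.sum(axis=1)
    return importance / importance.sum()
```

For input weights `[[1, 0], [1, 2]]` and output weights `[1, 1]`, the two inputs receive shares of 0.25 and 0.75. Permutation or sensitivity-based importance, which perturbs each input and measures the change in predictions, is a common alternative that does not depend on the weight magnitudes.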
Figure 8. Schematic diagram of the geographical location of the study area.
Statistical list of data sources.

| No. | Variable | Data source |
|---|---|---|
| 1 | RT | Digitization of the Geomorphologic Map of China (scale 1:4 million) |
| 2 | RAAR | Spatial interpolation dataset of annual precipitation in China (2015), with a resolution of 1 km × 1 km |
| 3 | RAAWS | Averaged wind speed statistics from meteorological stations in the study area (2015) |
| 4 | RCLD | Data of the Second National Land Use Survey; 2016 National Land Change Survey |
| 5 | RRD | National 1:1 million basic geographic database (2015) |
| 6 | RPT | Statistics of land survey results of enterprises in key industries in the study area |
| 7 | RPL | Census data of soil pollution in the study area |
| 8 | RPR | Census data of soil pollution in the study area |
| 9 | RSBC | Spatial distribution data of soil texture in China (1:1 million) |
| 10 | RGM | Hydrogeological map of China (1:6 million) |
| 11 | RPCGDP | China's kilometer-grid gross domestic product (GDP) distribution dataset (2015), with a resolution of 1 km × 1 km |
| 12 | RLVP | Comprehensive judgment based on the urban master plan (2014–2030) of the study area and the results of personnel interviews |
| 13 | RDFA | Results of the implementation plan for "three lines and one list" ecological environment zoning management and control in the study area |
| 14 | RDLT | Data of the Second National Land Use Survey; 2016 Land Change Survey |
| 15 | RED | Statistics of land survey results of enterprises in key industries in the study area |
| 16 | RPD | China's population spatial distribution kilometer-grid dataset (2015), with a resolution of 1 km × 1 km |
| 17 | RRND | National 1:1 million basic geographic database (2015) |
| 18 | RDIS | Statistics of land survey results of enterprises in key industries in the study area |
| 19 | RPG | Land use survey of enterprises in key industries in the study area: statistical analysis of basic information collection results, supplemented by personnel interviews and on-site reconnaissance and judgment |
| 20 | RAPP | Land use survey of enterprises in key industries in the study area: statistics of basic information collection results |
Figure 9. Schematic prediction results of RMCM in the study area (M1: institutional control; M2: remediation; M4: remediation, engineering control, and institutional control).