Identifying potentially contaminated areas with MaxEnt model for petrochemical industry in China.

Meng Wang, Huichao Chen, Mei Lei.

Abstract

Heavy metals and organic pollutants in the wastewater effluents, flue gases, and solid wastes of the petrochemical industry threaten the ecological environment and human health when they are improperly discharged. Knowing the regional distribution of contaminated sites benefits pollution control. This study explored the relationship between petrochemical contaminated areas and natural, socio-economic, and traffic factors. Ten indicators were selected as input variables, and the MaxEnt model was used to identify potentially contaminated areas. Among these 10 variables, the factors with the greatest impact on the results were determined from the contribution of each variable. The MaxEnt model performed well, with an AUC of 0.981 ± 0.004, and 90% of the measured contaminated sites were located in areas predicted to have a medium or high probability of contamination. The map of potentially contaminated areas indicated that the areas with a high probability of contamination were distributed in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, the Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. The response curves showed that a high probability of petrochemical contamination tended to appear in cities with a developed economy, dense population, and convenient transportation. This study presents a novel way to identify potentially contaminated areas for petrochemical sites and provides a theoretical basis for formulating future management strategies.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords:  MaxEnt; Petrochemical industry; Potentially contaminated areas; Soil contamination

Year:  2022        PMID: 35303229      PMCID: PMC8931184          DOI: 10.1007/s11356-022-19697-8

Source DB:  PubMed          Journal:  Environ Sci Pollut Res Int        ISSN: 0944-1344            Impact factor:   5.190


Introduction

The petrochemical industry processes petroleum fractions and natural gas into petroleum and chemical products through complex processes (Liu et al. 2011). As a pillar industry worldwide, it has greatly promoted economic development (Fan et al. 2015). However, as a major source of organic and inorganic toxic pollutants, the petrochemical industry also poses a great threat to the environment and human health (Gonzalez et al. 2021; Jephcote et al. 2020; Lin et al. 2021; Wu et al. 2016). During the early industrial period, outdated technology and equipment, together with a lack of proper operating practices, severely degraded the soil environment. To improve this situation, research was conducted to optimize production technology and waste treatment processes (Abilov et al. 1999; Di Fabio et al. 2013; Muller and Craig 2016; Rejowski et al. 2009). Recently, digital modeling has been adopted to optimize process control systems, and machine learning methods have been applied to improve the safety of production processes and product quality based on industrial big data (Geng et al. 2022; Han et al. 2022; Pariyani et al. 2010; Wu et al. 2022). In addition to technological breakthroughs, determining the spatial distribution of potentially contaminated areas is important for the prevention and remediation of contaminated soil. Many studies have addressed soil contamination and risk assessment at single sites. These reports mainly concern pollutant concentrations (Han et al. 2020; Nadal et al. 2004), the extent of contamination (Zhang et al. 2013, 2014), and the human health and ecological risks of a specific research object (Kim et al. 2001; Rovira et al. 2014). However, few investigations have proactively identified potentially contaminated areas at a national scale (Liu et al. 2010; Teng et al. 2015; Zhang et al. 2014).
It is important to identify the areas with a high probability of contamination and to provide a basis for formulating future management strategies (Nadal et al. 2006; Wang et al. 2020b). In this work, a niche model was introduced to identify potentially contaminated areas at a large scale. Niche models are effective tools for identifying areas suitable for a species, and they provide a quantitative framework for describing the relationship between characteristics and geographical distribution (Sillero 2011; Soberon and Nakamura 2009). The MaxEnt model is a niche model based on maximum entropy theory, developed by Phillips' team in 2004 (Phillips et al. 2006). Given a set of environmental constraints, MaxEnt finds the most probable distribution of a species in the study area (Elith et al. 2011). Recently, MaxEnt has been widely applied to the spatial distribution of species such as Gentiana rigescens (Shen et al. 2021), potatoes (Wang et al. 2021a), and antelopes (Wang et al. 2021a). MaxEnt has also been used to map the spatial distribution of other phenomena, such as foot and mouth disease (Gao and Ma 2021), Dengue fever (Li et al. 2017), COVID-19 (Ren et al. 2020), and energy system sites (Tekin et al. 2021). These results show that MaxEnt can identify and predict the spatial distribution of various research objects well. In essence, the principle of maximum entropy frames the problem in terms of information entropy: among all probability distributions that satisfy a set of constraints, the one with the maximum entropy is taken as the estimate of the target distribution. An obvious advantage of this model is that it can produce accurate results from relatively little data. To date, the application of MaxEnt to estimate the probability of contamination has not been reported.
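The maximum-entropy principle described above can be written compactly. The following is the standard textbook formulation, not an equation taken from this paper. The symbols are notation assumed here for illustration: x ranges over the grid cells X of the study area, f_j are the environmental features, x_1, ..., x_m are the recorded sites, and lambda_j are the fitted feature weights. The MaxEnt software additionally relaxes the equality constraints through regularization, which is omitted in this sketch.

```latex
\begin{aligned}
\max_{p}\quad & H(p) = -\sum_{x \in X} p(x)\,\ln p(x) \\
\text{subject to}\quad & \sum_{x \in X} p(x)\, f_j(x) = \frac{1}{m}\sum_{i=1}^{m} f_j(x_i), \qquad j = 1,\dots,n, \\
& \sum_{x \in X} p(x) = 1, \qquad p(x) \ge 0 . \\[4pt]
\text{Solution (Gibbs form):}\quad & p(x) = \frac{\exp\!\big(\sum_{j=1}^{n} \lambda_j f_j(x)\big)}{\sum_{x' \in X} \exp\!\big(\sum_{j=1}^{n} \lambda_j f_j(x')\big)} .
\end{aligned}
```

In this form, each constraint requires the expected value of a feature under p to match its average over the recorded sites, and the entropy term selects the least committed distribution among those that do.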
Therefore, a MaxEnt model was built to identify potentially contaminated areas of the petrochemical industry in China, with the probability distribution of contaminated areas related to natural, socio-economic, and traffic factors. The main purposes of this study were as follows: (1) to present the probability distribution of contaminated areas in the petrochemical industry; (2) to explore the quantitative relationship between natural, socio-economic, and traffic factors and the spatial distribution of potentially contaminated areas; (3) to reveal the thresholds of the factors in areas with a high probability of contamination; and (4) to provide a basis for the better development of the petrochemical industry. Additionally, this method can be extended to other industries, and the probability distribution of soil contamination can be obtained by superimposing the potentially contaminated areas of all industries in the study area.

Materials and methods

Study area

China is one of the most important producers and consumers of petrochemical products in the world (Wang et al. 2020b; Zhang et al. 2009). The petrochemical industry accounts for 20% of the total industrial economy in China (Wang et al. 2020b). The research scope of this study covered mainland China.

Data collection and processing

One hundred fifty contaminated sites of the petrochemical industry in China were collected from the official websites of ecological environment bureaus at all levels (Fig. 1). To reduce the correlation between points, only one occurrence was kept within any 10 km (Kong et al. 2021), and spatial clusters of localities were eliminated with ENMtools in ArcGIS 10.2 (Yang et al. 2021). As a result, 100 records of contaminated sites were retained for analysis. The data were exported to CSV format with Excel and used in the MaxEnt software as the input describing the actual distribution of contaminated sites.
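The 10 km thinning step can be illustrated with a short sketch. This is not the ENMtools procedure used in the study; it is a simple greedy filter written for illustration, and the function names and example coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def thin_sites(sites, min_km=10.0):
    """Greedy thinning: keep a site only if it lies at least min_km from every kept site."""
    kept = []
    for lat, lon in sites:
        if all(haversine_km(lat, lon, klat, klon) >= min_km for klat, klon in kept):
            kept.append((lat, lon))
    return kept

# Hypothetical coordinates: the first two points are only a few kilometres apart.
sites = [(31.23, 121.47), (31.25, 121.50), (39.90, 116.40)]
thinned = thin_sites(sites)
```

A greedy filter depends on the order of the input records, so a different ordering can retain a different (but equally valid) subset.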
Fig. 1

Spatial distribution of contaminated sites

Ten variables were considered in the model, including natural variables (Nat1-3), socio-economic variables (Soc1-4), and traffic variables (Tra1-3) (Table 1). The natural variables and two socio-economic variables, gross domestic product (Soc1) and population density (Soc2), were obtained from the Resource and Environment Science and Data Center (http://www.resdc.cn/). Vector data for the traffic variables and the other two socio-economic variables, distance to residential area (Soc3) and distance to residential point (Soc4), were obtained from the National Fundamental Geographic Information System (http://www.ngcc.cn/); Euclidean distances were calculated from these data and converted into raster grids in ArcGIS 10.2. Details of the variables are given in Table 1. All variables were resampled and converted to ASCII raster grids at a resolution of 1 km × 1 km (Wei et al. 2021). Pearson’s correlation analysis of the input variables was conducted in ArcGIS 10.2, and the absolute values of all correlation coefficients were less than 0.8 (Fig. 2). Therefore, all 10 variables were retained as model inputs (Su et al. 2021; Yang et al. 2013).
Table 1

Description of the variables

Variable classification | Variable | Unit | Data scale | Source
Natural variables | Rainfall (Nat1) | mm | Continuous | Resource and Environment Science and Data Center (http://www.resdc.cn/)
Natural variables | Temperature (Nat2) | °C | Continuous | Resource and Environment Science and Data Center (http://www.resdc.cn/)
Natural variables | Soil type (Nat3) | - | Categorical | Resource and Environment Science and Data Center (http://www.resdc.cn/)
Socio-economic variables | Gross domestic product (Soc1) | 10,000 yuan/km² | Continuous | Resource and Environment Science and Data Center (http://www.resdc.cn/)
Socio-economic variables | Population density (Soc2) | people/km² | Continuous | Resource and Environment Science and Data Center (http://www.resdc.cn/)
Socio-economic variables | Distance to residential area (Soc3) | m | Continuous | National Fundamental Geographic Information System (http://www.ngcc.cn/)
Socio-economic variables | Distance to residential point (Soc4) | m | Continuous | National Fundamental Geographic Information System (http://www.ngcc.cn/)
Traffic variables | Distance to railway (Tra1) | m | Continuous | National Fundamental Geographic Information System (http://www.ngcc.cn/)
Traffic variables | Distance to road (Tra2) | m | Continuous | National Fundamental Geographic Information System (http://www.ngcc.cn/)
Traffic variables | Distance to river (Tra3) | m | Continuous | National Fundamental Geographic Information System (http://www.ngcc.cn/)
Fig. 2

Pearson’s correlation analysis of input variables
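The correlation screening (retain all variables when every pairwise |r| is below 0.8) can be sketched as follows. The study ran this analysis in ArcGIS 10.2; the pure-Python version below is only an illustration, and the function names and sample values are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.8):
    """Return the variable pairs whose |r| reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson_r(columns[a], columns[b])) >= threshold
    ]

# Hypothetical values sampled at five grid cells for three variables.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 1, 4, 2, 3],
}
flagged = highly_correlated_pairs(columns)
```

An empty result corresponds to the situation reported in the text, where no pair reached 0.8 and all 10 variables were kept.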


MaxEnt

In this study, MaxEnt was used to map the potentially contaminated areas of the petrochemical industry and to investigate the relationship between their spatial distribution and the variables. The flowchart of the modelling process is shown in Figure S1. Seventy-five percent of the collected data were randomly selected as training data, and the remaining 25% served as testing data (Guerra-Coss et al. 2021; Shabani et al. 2020). To ensure stability, the model was run with 10 replicates, and the final output was the average of the 10 replicates (Rodriguez-Basalo et al. 2021; Yadav et al. 2021). The receiver operating characteristic (ROC) curve was used to evaluate model performance (Manzoor et al. 2021). The area under the curve (AUC) of the testing set, which ranges from 0 to 1, was calculated in the MaxEnt software. An AUC close to 1 represents near-perfect prediction, while an AUC of 0.5 or below indicates poor performance (Wang et al. 2021b). Model performance was divided into five levels by AUC value: poor (0.5–0.6), fair (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (0.9–1.0) (Li et al. 2017). To explore the importance of the variables in identifying potentially contaminated areas, the percent contribution of each variable was evaluated. Moreover, response curves were used to show the relationship between the factors and the probability distribution.
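The evaluation logic described above can be sketched in a few lines. This is not the MaxEnt software's implementation: the rank-based AUC below is a generic formulation, the replicate values are hypothetical, and the handling of values that fall exactly on a class boundary (and the label for values below 0.5) is an assumption, since the text gives only the ranges. In presence-only modelling the "negative" scores typically come from background points rather than confirmed clean sites.

```python
import statistics

def auc(scores_pos, scores_neg):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def auc_level(value):
    """Map an AUC value to the five levels quoted in the text (boundaries assumed)."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "fair"
    if value >= 0.5:
        return "poor"
    return "below 0.5"  # outside the five levels; label chosen here for illustration

# Hypothetical test AUCs from 10 replicate runs, summarised as mean and standard deviation.
replicate_aucs = [0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99, 0.98, 0.98, 0.98]
mean_auc = statistics.mean(replicate_aucs)
sd_auc = statistics.stdev(replicate_aucs)
```

Reporting the mean together with the standard deviation across replicates mirrors the "AUC ± value" form used in the Results.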

Results

Model performance

In this study, the MaxEnt model showed excellent performance, with an AUC of 0.981 ± 0.004, in the identification of potentially contaminated areas. To estimate the spatial distribution of potentially contaminated areas clearly, the areas were divided into three levels according to the natural breakpoint method: low, medium, and high probability of contamination. The probability values of the 100 sample sites were analyzed, and the results are shown in Table 2. Ninety percent of the samples were in areas with a medium or high probability of contamination, and only 10% were in areas with a low probability of contamination. This also indicated that the MaxEnt model performed well in identifying potentially contaminated areas for petrochemical sites. The cities identified as areas with a medium or high probability of contamination require more attention to the soil contamination caused by industrial development.
Table 2

Probability analysis of contaminated sites at different levels

Low probability of contamination | Medium probability of contamination | High probability of contamination
0.10 | 0.27 | 0.63
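The three-level split by the natural breakpoint method can be illustrated with a brute-force sketch that picks the two break points minimizing the within-class squared deviation. This is an assumption about what "natural breakpoint" means here (Jenks-style natural breaks); it is not the GIS implementation used in the study, and the probability values are hypothetical. The exhaustive search is practical only for small samples, not for a full raster.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks_3(values):
    """Return (b1, b2), the upper bounds of the low and medium classes."""
    v = sorted(values)
    n = len(v)
    if n < 3:
        raise ValueError("need at least three values for three classes")
    best = None
    for i in range(1, n - 1):          # first class is v[:i]
        for j in range(i + 1, n):      # second class is v[i:j], third is v[j:]
            cost = sum_sq_dev(v[:i]) + sum_sq_dev(v[i:j]) + sum_sq_dev(v[j:])
            if best is None or cost < best[0]:
                best = (cost, v[i - 1], v[j - 1])
    return best[1], best[2]

def classify(probability, b1, b2):
    """Assign a probability to the low, medium, or high class."""
    if probability <= b1:
        return "low"
    if probability <= b2:
        return "medium"
    return "high"

# Hypothetical predicted probabilities at eight locations.
probs = [0.02, 0.05, 0.07, 0.40, 0.45, 0.50, 0.90, 0.95]
b1, b2 = natural_breaks_3(probs)
levels = [classify(p, b1, b2) for p in probs]
```

Counting the share of sample sites in each class then yields proportions of the kind reported in Table 2.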

Spatial distribution of potentially contaminated areas

Figure 3 shows the spatial distribution of potentially contaminated areas of petrochemical sites in China. Areas with a high probability of contamination occurred in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, the Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. Comparing the map of current petrochemical enterprises (Figure S1) with Fig. 3 shows that potentially contaminated areas were often located where petrochemical enterprises are dense. This indicated that the result was reasonable and consistent with the industrial distribution.
Fig. 3

Potentially contaminated areas of petrochemical industry


Percent contributions of input variables

The percent contributions of the variables showed that Soc1 (48.7% contribution) was the most relevant factor in identifying potentially contaminated areas, followed by Soc3 (25.8% contribution), Soc2 (10.2% contribution), Nat3 (7.2% contribution), and Tra1 (3.1% contribution). The socio-economic factors (85.9% contribution) were the most influential group for the spatial distribution of contaminated sites, followed by the natural factors (10.3% contribution) and the traffic factors (3.7% contribution) (Table 3).
Table 3

Percent contributions of input variables to potentially contaminated areas derived from the MaxEnt model

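Values are percent contributions (%) for contaminated sites.

Variable group | Group contribution (%) | Variable | Percent contribution (%)
Natural | 10.3 | Nat1 | 1.6
Natural | 10.3 | Nat2 | 1.5
Natural | 10.3 | Nat3 | 7.2
Socio-economic | 85.9 | Soc1 | 48.7
Socio-economic | 85.9 | Soc2 | 10.2
Socio-economic | 85.9 | Soc3 | 25.8
Socio-economic | 85.9 | Soc4 | 1.2
Traffic | 3.7 | Tra1 | 3.1
Traffic | 3.7 | Tra2 | 0.3
Traffic | 3.7 | Tra3 | 0.3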
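The group totals in Table 3 follow directly from the per-variable contributions, as the short check below shows. The values are those of Table 3; the function names are illustrative. Summing the rounded table values of the five leading variables gives 95.0, marginally above the 94.9% quoted in the Discussion, presumably because the table entries are rounded.

```python
# Percent contributions per variable, as listed in Table 3.
contrib = {
    "Nat1": 1.6, "Nat2": 1.5, "Nat3": 7.2,
    "Soc1": 48.7, "Soc2": 10.2, "Soc3": 25.8, "Soc4": 1.2,
    "Tra1": 3.1, "Tra2": 0.3, "Tra3": 0.3,
}

def group_totals(contributions):
    """Sum contributions by the three-letter group prefix (Nat, Soc, Tra)."""
    totals = {}
    for name, value in contributions.items():
        group = name[:3]
        totals[group] = totals.get(group, 0.0) + value
    return {group: round(total, 1) for group, total in totals.items()}

def top_k(contributions, k):
    """Return the k variable names with the largest contributions."""
    return sorted(contributions, key=contributions.get, reverse=True)[:k]

totals = group_totals(contrib)
leading = top_k(contrib, 5)
leading_sum = round(sum(contrib[name] for name in leading), 1)
```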

Threshold determination of the factors

To eliminate the influence of correlation among variables and to further explore the relationship between the input factors and potentially contaminated areas, single-factor modeling was performed in the MaxEnt software. Response curves relating the contamination probability to each factor were plotted. The purpose was to find the threshold values of the variables, so as to formulate preventive management policies for areas with a high probability of contamination (Seaborn et al. 2021). The response curves are shown in Fig. 4. According to the response curves, the probability of contamination first increased and then decreased with increasing rainfall and temperature, and it was highest when rainfall was 757 ~ 2318 mm and temperature was 13 ~ 21 °C. The soil types in areas with a high probability of contamination were yellow brown soil, cinnamon soil, lou soil, acid rocky soil, moisture soil, seashore saline soil, paddy soil, and dewatering paddy soil. The probability of contaminated site occurrence increased with increasing gross domestic product (Wang et al. 2016; Xie et al. 2012). This showed that soil contamination of petrochemical sites tended to occur in economically developed areas. The probability of contaminated site occurrence increased sharply with increasing population density (Lv and Yu 2018). Moreover, the probability decreased sharply with increasing distance to residential areas and points. Additionally, traffic factors were important for the spatial distribution of contaminated sites. The probability of contamination decreased as the distance to the road, railway, and waterway increased.
Fig. 4

Response curves of input factors based on MaxEnt (the soil type (Nat3) codes are shown on the website of Resource and Environment Science and Data Center (http://www.resdc.cn/))

The parameter values at different levels were extracted, and the thresholds of the factors are listed in Table 4. Attention should be paid to areas where the factors fall within the high-probability thresholds, because soil contamination is likely to occur there and pose a high risk to human health.
Table 4

Parameter values of factors at different levels

Variables | Low probability of contamination | Medium probability of contamination | High probability of contamination
Nat1 (mm) | < 363 | 363 ~ 757 and > 2318 | 757 ~ 2318
Nat2 (°C) | < 9 | 9 ~ 13 and > 21 | 13 ~ 21
Nat3 | Dark brown soil, albic soil, grey cinnamon soil, chestnut soil, chestnut cinnamon soil, brown calcium soil, grey desert soil, grey brown desert soil, brown desert soil, red clay, newly deposited soil, cracked soil, aeolian sandy soil, …… | Clay pan yellow browning soil, acid purple soil, acid coarse bone soil, meadow swamp soil, red earth, rinsing yellow soil, infiltration paddy soil, yellow red soil, shanyuan red soil | Yellow brown soil, cinnamon soil, lou soil, acid rocky soil, moisture soil, seashore saline soil, paddy soil and dewatering paddy soil
Soc1 (10,000 yuan/km²) | < 961 | 961 ~ 2650 | > 2650
Soc2 (people/km²) | < 272 | 272 ~ 578 | > 578
Soc3 (m) | > 14,415 | 3105 ~ 14,415 | < 3105
Soc4 (m) | > 7998 | 3744 ~ 7998 | < 3744
Tra1 (m) | > 27,124 | 6320 ~ 27,124 | < 6320
Tra2 (m) | > 3757 | 1548 ~ 3757 | < 1548
Tra3 (m) | > 7191 | 1951 ~ 7191 | < 1951
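The thresholds in Table 4 can be applied to a single location with a small lookup, sketched below for the continuous variables. The numeric bounds come from Table 4; the function names, the data layout, and the treatment of values that fall exactly on a bound are assumptions made for illustration.

```python
# (lower bound, upper bound, direction) from Table 4.
# "higher": larger values mean higher probability; "lower": smaller values do.
MONOTONIC = {
    "Soc1": (961, 2650, "higher"),
    "Soc2": (272, 578, "higher"),
    "Soc3": (3105, 14415, "lower"),
    "Soc4": (3744, 7998, "lower"),
    "Tra1": (6320, 27124, "lower"),
    "Tra2": (1548, 3757, "lower"),
    "Tra3": (1951, 7191, "lower"),
}

# (low below, high from, high to): rainfall and temperature peak in a middle band.
BANDED = {
    "Nat1": (363, 757, 2318),
    "Nat2": (9, 13, 21),
}

def level(variable, value):
    """Return 'low', 'medium', or 'high' for one continuous variable."""
    if variable in BANDED:
        low_below, high_from, high_to = BANDED[variable]
        if value < low_below:
            return "low"
        if high_from <= value <= high_to:
            return "high"
        return "medium"
    lower, upper, direction = MONOTONIC[variable]
    if direction == "higher":
        if value > upper:
            return "high"
        return "low" if value < lower else "medium"
    if value < lower:
        return "high"
    return "low" if value > upper else "medium"

# Hypothetical grid cell: dense economy, close to housing, moderate rainfall.
cell = {"Soc1": 3000, "Soc3": 2000, "Tra1": 30000, "Nat1": 1000}
cell_levels = {name: level(name, value) for name, value in cell.items()}
```

Soil type (Nat3) is categorical and would need a separate lookup from soil class to level, which is omitted here because the low-probability list in Table 4 is truncated.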

Discussion

Analysis of relevant factors

The contributions of the input variables showed that the leading factors in the identification of potentially contaminated areas were Soc1, Soc3, Soc2, Nat3, and Tra1, which together accounted for 94.9% of the cumulative contribution rate. Considering this, the MaxEnt model was re-established with these five factors as inputs, and its performance was evaluated. The re-established model achieved an AUC of 0.979 ± 0.003 in the identification of potentially contaminated areas. This showed that potentially contaminated areas of the petrochemical industry can also be identified well with these five factors based on the MaxEnt model. In particular, a close relationship was found between the spatial distribution of potentially contaminated areas and socio-economic conditions. In China, all seven world-class petrochemical industry bases are located in coastal areas with a developed economy and dense population, and the midstream and downstream chemical product industries rely on the oil refining industry, so the density of petrochemical enterprises decreases from the east coast to the western inland. Therefore, the socio-economic factors played an important role in the distribution of petrochemical soil contamination (Wang et al. 2020c). The low contribution of Soc4 (the distance to residential points, which include yurts, grazing sites, and ordinary houses) may be due to the less pronounced difference between the eastern region and the central region. Similarly, the transportation of petrochemical products is essential for the industry and is normally convenient by sea and by railway. Sea transportation is suitable for international trade and long-distance transportation in domestic coastal areas, while railway is an important means of land transportation. Thus, Tra1 was important for the distribution of petrochemical sites and hence of the potentially contaminated areas (Zou and Duan 2019).
Additionally, soil types are spatially diverse and differ greatly among regions. Moreover, soil types affect the migration and transformation of pollutants in soil (Sukarjo et al. 2019; Yang et al. 2014). Therefore, these factors (Soc1, Soc3, Soc2, Nat3, and Tra1) played a decisive role in the identification of potentially contaminated areas and need special attention when management strategies are formulated.

More attention to developed areas

The contributions of the input variables showed that GDP (Soc1) was the most relevant factor, and Fig. 3 shows that the potentially contaminated areas of the petrochemical industry were mainly distributed in developed areas (Wang et al. 2020d), while there were few petrochemical enterprises in Western China, which has vast land, a sparse population, a less developed economy, and underdeveloped transportation. Therefore, the soil contamination of petrochemical sites needs more attention in developed areas. Jiangsu is a typical developed region. As shown in Fig. 5, the probability of soil contamination of petrochemical sites in Jiangsu was generally high, especially in the region along the Yangtze River (Jia et al. 2021, 2020; Wang et al. 2020a). Therefore, developed regions like Jiangsu should be given priority in the pollution research and risk management of the petrochemical industry (Qiu et al. 2019).
Fig. 5

Distribution of potentially contaminated areas in Jiangsu


Suggestions

Petrochemical sites are mainly distributed in developed areas, a pattern determined by socio-economic factors (Wang et al. 2016; Xie et al. 2012). To reduce the soil burden and control the risk caused by the petrochemical industry, small-scale petrochemical enterprises with serious pollution can be relocated away from developed areas. For areas whose natural conditions make pollution likely, stricter access requirements for the petrochemical industry should be implemented (Li et al. 2015). At the same time, the petrochemical industry should further promote its green transformation, phase out outdated production capacity with high energy consumption and heavy pollution discharge, and increase the proportion of environment-friendly green products (Tantisattayakul et al. 2016).

Limitations of the research and future prospects

Although the dataset of contaminated petrochemical sites available for the analysis was limited, the MaxEnt model still performed excellently in identifying the potentially contaminated areas. Because of limitations in data acquisition, only a few factors were discussed in this study. If more detailed data become available, the methodology can be extended in the following ways: (1) Additional factors that may affect the probability distribution of contaminated areas, such as policies and pollution emissions, can be explored. (2) This paper explored only the probability of regional petrochemical pollution, not the degree and density of contamination. If the degree of contamination of the samples is available, the severity and density could be investigated in combination with a weight matrix that denotes the degree of contamination. In this study, the petrochemical industry was taken as an example to present the spatial distribution of potentially contaminated areas and to explore the important factors in identifying contaminated areas with the MaxEnt model. This method can also be used to analyze other high-pollution industries, or to combine multiple industries and analyze the superposition of contamination.

Conclusion

In this study, a novel method based on the MaxEnt model was proposed to identify the potentially contaminated areas of the petrochemical industry and to reveal the thresholds of the factors. The MaxEnt model performed well, with an AUC of 0.981 ± 0.004, for the spatial distribution of soil contamination caused by petrochemical activities. The areas with a high probability of contamination tended to be located in developed zones and were distributed in the Yangtze River Delta, Beijing, Tianjin, southern Guangdong, the Fujian coastal areas, central Hubei and northeast Hunan, central Sichuan, and southwest Chongqing. Among the factors explored in this study, the socio-economic variables were the most relevant for identifying potentially contaminated areas, followed by the natural factors and the traffic factors. Gross domestic product, distance to residential area, population density, soil type, and distance to railway accounted for 94.9% of the cumulative contribution rate, and these five factors were used as inputs to re-establish the MaxEnt model. The re-established model also performed well, with an AUC of 0.979 ± 0.003 for the potentially contaminated areas of the petrochemical industry. The thresholds of the factors were as follows: Soc1 > 2650 (10,000 yuan/km2); Soc3 < 3105 m; Soc2 > 578 people/km2; Nat3: yellow brown soil, cinnamon soil, lou soil, acid rocky soil, moisture soil, seashore saline soil, paddy soil, and dewatering paddy soil; Tra1 < 6320 m. Attention should be paid to soil contamination caused by petrochemical activities in areas where the factors fall within these thresholds.

Electronic supplementary material: Supplementary file 1 (DOCX, 596 kb).
  26 in total

1.  Efficacy of histopathology in detecting petrochemical-induced toxicity in wild cotton rats (Sigmodon hispidus).

Authors:  S Kim; R L Lochmiller; E L Stair; J W Lish; D P Rafferty; C W Qualls
Journal:  Environ Pollut       Date:  2001       Impact factor: 8.071

2.  Spatial distribution and assessment of the human health risks of heavy metals in a retired petrochemical industrial area, south China.

Authors:  Shiyu Wang; Yusef Kianpoor Kalkhajeh; Zhirui Qin; Wentao Jiao
Journal:  Environ Res       Date:  2020-05-19       Impact factor: 6.498

3.  Selection of renewable energy systems sites using the MaxEnt model in the Eastern Mediterranean region in Turkey.

Authors:  Senem Tekin; Esra Deniz Guner; Ahmet Cilek; Müge Unal Cilek
Journal:  Environ Sci Pollut Res Int       Date:  2021-05-13       Impact factor: 4.223

4.  Ozone pollution characteristics and sensitivity analysis using an observation-based model in Nanjing, Yangtze River Delta Region of China.

Authors:  Ming Wang; Wentai Chen; Lin Zhang; Wei Qin; Yong Zhang; Xiangzhi Zhang; Xin Xie
Journal:  J Environ Sci (China)       Date:  2020-03-12       Impact factor: 5.565

5.  Concentrations of arsenic and vanadium in environmental and biological samples collected in the neighborhood of petrochemical industries: A review of the scientific literature.

Authors:  Neus González; Roser Esplugas; Montse Marquès; José L Domingo
Journal:  Sci Total Environ       Date:  2021-01-26       Impact factor: 7.963

6.  Definition and GIS-based characterization of an integral risk index applied to a chemical/petrochemical area.

Authors:  Martí Nadal; Vikas Kumar; Marta Schuhmacher; José L Domingo
Journal:  Chemosphere       Date:  2006-01-26       Impact factor: 7.086

7.  Investigation of health risk assessment and odor pollution of volatile organic compounds from industrial activities in the Yangtze River Delta region, China.

Authors:  Haohao Jia; Song Gao; Yusen Duan; Qingyan Fu; Xiang Che; Hui Xu; Zhuo Wang; Jinping Cheng
Journal:  Ecotoxicol Environ Saf       Date:  2020-10-28       Impact factor: 6.291

8.  Early forecasting of the potential risk zones of COVID-19 in China's megacities.

Authors:  Hongyan Ren; Lu Zhao; An Zhang; Liuyi Song; Yilan Liao; Weili Lu; Cheng Cui
Journal:  Sci Total Environ       Date:  2020-04-26       Impact factor: 7.963

9.  A systematic review and meta-analysis of haematological malignancies in residents living near petrochemical facilities.

Authors:  Calvin Jephcote; David Brown; Thomas Verbeek; Alice Mah
Journal:  Environ Health       Date:  2020-05-19       Impact factor: 5.984

10.  Chinese caterpillar fungus (Ophiocordyceps sinensis) in China: Current distribution, trading, and futures under climate change and overexploitation.

Authors:  Yanqiang Wei; Liang Zhang; Jinniu Wang; Wenwen Wang; Naudiyal Niyati; Yanlong Guo; Xufeng Wang
Journal:  Sci Total Environ       Date:  2020-09-28       Impact factor: 7.963
