Exploring Spatial Influence of Remotely Sensed PM2.5 Concentration Using a Developed Deep Convolutional Neural Network Model.

Junming Li1, Meijun Jin2, Honglin Li3.   

Abstract

Remotely sensed data are accumulating rapidly, and spatial analysis methods for such data, especially big data, need innovation. This paper proposes a deep convolutional neural network (CNN) model for exploiting the spatial influence feature in remotely sensed data. The method was applied to investigate the magnitude of the spatial influence of four factors (population, gross domestic product (GDP), terrain, and land-use and land-cover (LULC)) on remotely sensed PM2.5 concentration over China. The method produced satisfactory results, demonstrating that the deep CNN model is well suited to the spatial analysis of remotely sensed big data. In a direct comparison, the accuracy of the deep CNN was much higher than that of geographically weighted regression (GWR). The results showed that population spatial density, GDP spatial density, terrain, and LULC together determine the spatial distribution of PM2.5 annual concentrations with an overall spatial influencing magnitude of 97.85%. Taken individually, population and GDP reach training and validation accuracies of 47.12% and 36.13%, and 50.07% and 40.91%, respectively, whereas terrain and LULC reach validation accuracies of 83.17% and 72.37%. Terrain and LULC are the dominant spatial influencing factors, and these two factors alone approximately determine the spatial pattern of PM2.5 annual concentration over China, with a high spatial influencing magnitude of 96.65%.

Keywords:  PM2.5 pollution; deep convolutional network; remote sensing; spatial influence

Year:  2019        PMID: 30720752      PMCID: PMC6388139          DOI: 10.3390/ijerph16030454

Source DB:  PubMed          Journal:  Int J Environ Res Public Health        ISSN: 1660-4601            Impact factor:   3.390


1. Introduction

Remote sensing technology has developed rapidly since the 1960s [1], and an abundance of remote sensing data has accumulated over this 50-year period. Although remotely sensed data have been applied in many fields, such as ecology, environmental science, and geography, spatial analysis methods for remotely sensed lattice data still need innovation. A single spatial variable generally exhibits autocorrelation [2] (i.e., spatial dependence [3,4]), and different spatial variables are correlated with one another. Spatial autocorrelation can be analysed with local indicators of spatial association (LISA) [5] (e.g., the local Moran's I [6] and the local Geary's c [7]). The main objective of spatial analysis is to identify the natural relationships that exist between variables [8,9]. The mainstream classical spatial analysis models, e.g., the spatial lag model [10,11], the spatial error model [10,11], and the Bayesian spatial regression model [12], can only evaluate the overall or average linear correlation over a whole study region and neglect local detail. These methods therefore ignore the consequences of spatial heterogeneity [13]. Most spatial analysis methods assume a stationary space; however, assuming that the spatial covariance structure is stationary is often unreasonable [14]. Spatial influencing relationships can be explored better when the analysis is local, which yields more detailed results [15]. Spatial heterogeneity arising from differences in environmental conditions, socioeconomic dynamics, and other factors reinforces the need for more regionalized spatial analyses in exposure assessment and public health [16]. Although the geographically weighted regression (GWR) method [17] considers local details, it can only describe a linear or simple non-linear spatial influencing relationship.
In the era of big data, advanced spatial analysis methods (e.g., machine learning methods) for remotely sensed data are urgently needed. Several previous studies have applied machine learning methods to the factors influencing PM2.5 concentrations. Zheng et al. [18] used traditional artificial neural networks to model the spatial correlation between Beijing's air quality and influencing factors, e.g., meteorology, traffic flow, and human mobility. Yan et al. [19] predicted the daily average PM2.5 concentration in Nanjing, Beijing, and Sanya by combining meteorological and contaminant factors in a Long Short-Term Memory (LSTM) model. Suleiman et al. [20] presented a machine learning model to predict traffic-related particulate matter concentrations from various variables (e.g., traffic variables). Hsieh et al. [21] proposed a semi-supervised learning algorithm to optimize the locations of air quality monitoring stations in Beijing based on spatial correlation. Other studies have used classical methods to investigate the factors influencing satellite-based PM2.5. He et al. [22] used the empirical orthogonal function (EOF) to analyse the relationship between remotely sensed PM2.5 and changes in climate circulation in East China. Hajiloo et al. [23] employed geographically weighted regression (GWR) to investigate the impact of meteorological and environmental parameters on PM2.5 concentrations in Tehran, Iran. Yang et al. [24] quantified the influence of natural and socioeconomic factors on PM2.5 pollution using the GeoDetector model [25,26].

This study proposes a spatial analysis method that exploits the spatial influencing feature of remotely sensed data based on the deep CNN. The CNN is a mainstream deep learning method that can effectively extract feature representations from a large number of images [27] and supports object detection [28]. Some researchers have applied deep learning to remote sensing classification. Q. Zou et al. [29] and Zhao et al. [30] proposed a DBN method for high-resolution satellite imagery classification. H. Liang and Q. Li, C. Tao et al., F.P.S. Luus et al. [31], Nogueira et al. [32], Volpi et al. [33], and Chen et al. [34] have employed deep CNNs in hyperspectral imagery classification or feature extraction. Others have employed deep CNNs in synthetic aperture radar (SAR) image classification, e.g., Du et al. [35] and Geng et al. [36]. To our knowledge, research applying deep CNNs to the spatial influence analysis of remotely sensed lattice data is very rare. This study aims to present a deep CNN model that exploits the magnitude of the spatial influence of four factors (population, gross domestic product (GDP), terrain, and land-use and land-cover (LULC)) on the remotely sensed annual mean PM2.5 concentration over China. The model not only considers local spatial heterogeneity but also has a strong nonlinear fitting ability. Because it is rooted in a deep learning framework, it may reduce the uncertainty of results obtained from a simplistic correlation analysis or simple regression model, and thus give better information to public health decision makers.

2. Materials and Methodology

2.1. Materials

The materials used in this research comprise five types of data for China: remotely sensed PM2.5 concentration, population spatial distribution density, GDP spatial distribution density, terrain data, and LULC. The remotely sensed PM2.5 annual concentration dataset for 2010 was produced by the Atmospheric Physics Institute of Dalhousie University in Canada [37] with a resolution of […]. The population density data are taken from the Gridded Population of the World (GPW) UN-Adjusted Population Density, v4 [38], published by a data centre in NASA's Earth Observing System Data and Information System (EOSDIS), with a resolution of 30′ × 30′. The GDP spatial distribution density, terrain, and LULC datasets were drawn from the Resources and Environmental Science Centre of the Chinese Academy of Sciences (http://www.resdc.cn). All of the above data were projected with the Albers Conic Equal Area projection on the WGS-84 datum, and the resolution was unified to […].

2.2. Methodology

The methodology in this paper consists of two modules: processing the geospatial data and structuring the deep CNN model. The former establishes the dataset for the deep CNN model. The latter fits the complex function that describes the spatial correlation relationship.

2.2.1. Processing Geospatial Data

The deep CNN method is usually applied to image identification or classification and cannot be transplanted directly into the analysis of geospatial data. In a geospatial problem, spatial correlation and geographical attributes need to be considered. Hence, the geospatial data require processing to match the structure of the deep CNN model. The four influencing factors form the inputs: each pixel location carries the PM2.5 concentration as output and the four influencing factors as input. Because of spatial correlation, both the pixel location and its surrounding locations should be considered. The deep CNN model can process big data, so the order of spatial correlation can be enlarged. In this paper, the order of spatial correlation adopts an n-order shape of $(2n+1) \times (2n+1)$ pixels ($n = 1, 2, \ldots$). Figure 1 illustrates the 5-order shape of the spatial correlation extent, which includes $11 \times 11$ pixels. For each pixel location, the corresponding four sets of influencing factor attribute data can then be extracted. Each set comprises the corresponding values of the pixel and its surrounding pixels. In short, the annual PM2.5 concentration of a pixel location is affected by the four influencing factors at that location and within its surrounding n-order spatial correlation extent of $(2n+1) \times (2n+1)$ pixels. The mathematical form can be expressed as follows:

$$y_i = f\left(X_i^{(1)}, X_i^{(2)}, X_i^{(3)}, X_i^{(4)}\right) + \varepsilon_i$$

where $y_i$ is the annual PM2.5 concentration of the i-th pixel, $X_i^{(1)}$, $X_i^{(2)}$, $X_i^{(3)}$, and $X_i^{(4)}$ represent the four influencing factor attribute values of the i-th pixel and its $(2n+1)^2 - 1$ surrounding pixels, and $\varepsilon_i$ represents the error. The spatial influencing function $f$ can be learned by the deep CNN model.
Figure 1

Illustrating 5-order shape of the extent of spatial correlation.
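The neighbourhood extraction described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window is taken to be (2n+1) × (2n+1) pixels (consistent with the 19 × 19 input reported for n = 9), the function name `extract_patch` is hypothetical, and padding cells outside the raster with a fill value is an assumption, since the text does not say how edge pixels are handled.

```python
def extract_patch(layers, row, col, n, fill=0.0):
    """Return a (2n+1) x (2n+1) x len(layers) nested list centred on (row, col).

    `layers` is a list of equally sized 2-D grids (lists of lists), one per
    influencing factor. Cells outside the grid are padded with `fill`;
    this edge handling is an assumption, not taken from the paper.
    """
    height, width = len(layers[0]), len(layers[0][0])
    patch = []
    for dr in range(-n, n + 1):
        patch_row = []
        for dc in range(-n, n + 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < height and 0 <= c < width
            # One value per influencing factor at this neighbour position.
            patch_row.append([layer[r][c] if inside else fill for layer in layers])
        patch.append(patch_row)
    return patch


# Toy 6 x 6 raster reused for all four factors (illustrative data only).
grid = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
layers = [grid, grid, grid, grid]

patch = extract_patch(layers, row=2, col=3, n=1)
print(len(patch), len(patch[0]), len(patch[0][0]))  # 3 3 4
```

With n = 9 the same call yields a 19 × 19 × 4 block per pixel, which is the input size used later in the paper.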

2.2.2. A Developed Deep Convolutional Neural Network Model

The CNN is inspired by two categories of cells in the visual cortex: simple cells, which exploit local features, and complex cells, which "pool" (e.g., by taking the maximum or the average) the outputs of simple cells within a neighbourhood. The structure of a CNN differs from general deep learning models in two respects: local connections and shared weights. A complete deep CNN stacks three types of layers: convolutional layers, pooling layers, and fully connected layers. The commonly used CNNs are the 2-dimensional (2-D) CNN and the 3-dimensional (3-D) CNN. Figure 2 illustrates a 3-D CNN with m (m = 1, 2, …) filters and k (k = 1, 2, …) convolution kernels. The value of a neuron at position $(x, y, z)$ of the j-th convolutional feature in the i-th layer can be expressed as follows [34]:

$$v_{ij}^{xyz} = g\left(\sum_{m}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1}\sum_{r=0}^{R_i-1} w_{ijm}^{pqr}\, v_{(i-1)m}^{(x+p)(y+q)(z+r)} + b_{ij}\right)$$

where $g$ is the activation function, m indexes the convolutional feature in the $(i-1)$-th layer connected to the j-th convolutional feature, and $P_i$ and $Q_i$ are the height and the width of the convolutional kernel. $R_i$ is the size of the kernel along the dimension of the spatial influencing factors, $w_{ijm}^{pqr}$ is the value at position $(p, q, r)$ of the kernel connected to the m-th convolutional feature, and $b_{ij}$ is the bias of the j-th convolutional feature in the i-th layer.
Figure 2

Illustration of 3-D convolution with m (m = 1, 2, …) filters and k (k = 1, 2, …) convolution kernels; the weights are color-coded.
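To make the 3-D convolution concrete, the following plain-Python sketch computes the value of one output neuron as the activated sum, over the connected feature maps m and the kernel offsets (p, q, r), of kernel weights times input values, plus a bias. The function name `conv3d_neuron`, the nested-list layout `[m][x][y][z]`, and the choice of ReLU as the activation are assumptions made for illustration; this is not the authors' implementation.

```python
def relu(value):
    """Rectified Linear Unit, used here as the activation function."""
    return max(0.0, value)


def conv3d_neuron(prev_maps, kernels, bias, x, y, z, activation=relu):
    """Value of one neuron at (x, y, z) of a 3-D convolutional feature.

    prev_maps[m][a][b][c] : feature map m of the previous layer
    kernels[m][p][q][r]   : kernel weights connected to feature map m
    """
    total = bias
    for m, kernel in enumerate(kernels):
        for p, plane in enumerate(kernel):
            for q, line in enumerate(plane):
                for r, weight in enumerate(line):
                    total += weight * prev_maps[m][x + p][y + q][z + r]
    return activation(total)


# One 3 x 3 x 2 input feature map of ones and one 2 x 2 x 2 kernel of 0.5.
ones_map = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
half_kernel = [[[0.5, 0.5] for _ in range(2)] for _ in range(2)]

value = conv3d_neuron([ones_map], [half_kernel], bias=-1.0, x=0, y=0, z=0)
print(value)  # 8 weights * 0.5 - 1.0 = 3.0
```

Local connections show up as the small (p, q, r) window, and weight sharing as the same `kernels` being reused for every (x, y, z).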

This paper designs a deep 3-D CNN model for exploiting the spatial influencing feature of remotely sensed data. Figure 3 illustrates the presented deep CNN model architecture, which contains four convolutional layers, four pooling layers, and three hidden layers. The hidden layers use the Rectified Linear Unit (ReLU) activation function, and pooling uses the average mode. Batch normalization was applied in each layer except the output layer. The dropout rate and learning rate were set to 25% and 1%, respectively. The dimension of the pre-processed input layer is (2n + 1) × (2n + 1) × 4, comprising the four sets of influencing factors for the i-th pixel and its surrounding pixels. Table 1 lists the experimental results for various values of the spatial correlation parameter n. The validation accuracy is highest when n = 9, although the training accuracy keeps improving as n increases. Because validation accuracy is the better indicator of a model's accuracy, n was set to 9. The input layer then contains 19 × 19 × 4 neurons holding the four factors' attribute values. The output layer has 11 neurons, one for each of the 11 categories of annual PM2.5 concentration: <10, 10~20, 20~30, 30~40, 40~50, 50~60, 60~70, 70~80, 80~90, 90~100, and >100 μg/m3.
Figure 3

Illustration of the presented deep CNN model architecture for exploiting the spatial influencing feature of remotely sensed PM2.5 concentration, including four convolutional layers, four pooling layers, and three hidden layers. The first layer is the input, containing the four influencing factors' values around a pixel; the output layer has 11 neurons, corresponding to the 11 categories of annual PM2.5 concentration at the central pixel location.
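Because the output layer is a classifier over 11 concentration categories, each pixel's continuous annual concentration has to be mapped to a class label before training. A minimal sketch follows, assuming the class edges implied by the column headings of Table 2 (<10, 10~20, …, 90~100, >100 μg/m3). The function name `pm25_category` is hypothetical, and the treatment of values that fall exactly on an edge is an assumption, since the text does not specify it.

```python
# Class labels follow the column headings of Table 2 (units: ug/m3).
LABELS = ["<10"] + [f"{lo}~{lo + 10}" for lo in range(10, 100, 10)] + [">100"]


def pm25_category(value):
    """Return the class index (0..10) for an annual PM2.5 concentration.

    Assumption: bins are left-closed, and exactly 100 falls in "90~100".
    """
    if value < 10:
        return 0
    if value > 100:
        return 10
    return min(int(value // 10), 9)


for concentration in (4.2, 10.0, 57.3, 100.0, 131.5):
    print(concentration, "->", LABELS[pm25_category(concentration)])
```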

Table 1

Experimental results of the training and validation accuracy of the deep 3-D CNN model when the spatial correlation parameter, n, is assigned various values

Spatial Correlation Parameter, n | Training Accuracy | Validation Accuracy
1 | 67.94% | 80.17%
2 | 77.71% | 82.37%
3 | 88.08% | 86.11%
4 | 92.01% | 90.50%
5 | 94.51% | 91.83%
6 | 96.53% | 92.14%
7 | 97.35% | 92.90%
8 | 98.30% | 92.46%
9 | 98.71% | 93.29%
10 | 98.87% | 92.40%
11 | 99.29% | 93.28%
12 | 99.53% | 93.25%
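The choice of n amounts to picking the row of Table 1 with the highest validation accuracy. The short sketch below reproduces that selection from the reported values; the variable names are illustrative.

```python
# (training accuracy %, validation accuracy %) per n, copied from Table 1.
TABLE_1 = {
    1: (67.94, 80.17), 2: (77.71, 82.37), 3: (88.08, 86.11),
    4: (92.01, 90.50), 5: (94.51, 91.83), 6: (96.53, 92.14),
    7: (97.35, 92.90), 8: (98.30, 92.46), 9: (98.71, 93.29),
    10: (98.87, 92.40), 11: (99.29, 93.28), 12: (99.53, 93.25),
}

# Select n by validation accuracy, not training accuracy.
best_n = max(TABLE_1, key=lambda n: TABLE_1[n][1])
side = 2 * best_n + 1

print(best_n, side)  # 9 19
```

Note that n = 11 and n = 12 come within 0.04 percentage points of the best validation accuracy while requiring a larger input window.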

3. Results

The remotely sensed annual PM2.5 concentration and influencing factor datasets comprise 96,337 pixels, of which 86,903 pixels (about 90%) were used for training and the remaining 9434 pixels (about 10%) were reserved for validation. Training accuracy is the accuracy on the training data (i.e., 86,903 pixels), validation accuracy is the accuracy on the remaining data (i.e., 9434 pixels), and estimated accuracy is the accuracy on the total data (i.e., 96,337 pixels). To investigate the integrated and individual spatial influence of the four factors, we evaluated the combined magnitude of spatial influence of all four factors and the separate influencing magnitude of one or two factors.
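The split sizes can be reproduced with a simple index shuffle. The sketch below assumes a uniform random split with a fixed seed; the text does not say how the validation pixels were selected, so this is an illustration rather than the authors' procedure, and `split_indices` is a hypothetical helper.

```python
import random

N_PIXELS = 96_337   # total pixels reported in the text
N_TRAIN = 86_903    # pixels used for training
N_VALID = 9_434     # pixels reserved for validation


def split_indices(n_pixels, n_train, seed=0):
    """Shuffle pixel indices and cut them into training and validation sets.

    Assumption: a uniform random split; the paper does not describe
    how the validation pixels were chosen.
    """
    indices = list(range(n_pixels))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(N_PIXELS, N_TRAIN)
print(len(train_idx), len(valid_idx))  # 86903 9434
print(f"training share: {len(train_idx) / N_PIXELS:.1%}")
```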

3.1. Integrated Spatial Influencing Feature

When all four influencing factors were fed into the input layer, after 1000 epochs of learning the training accuracy on the 86,903 pixels was 98.71%, and the validation accuracy on the remaining 9434 pixels reached 93.29%. Figure 4 illustrates the spatial distribution of the original and estimated annual PM2.5 concentration for all 96,337 pixels using the trained deep learning model fed with the four influencing factors.
Figure 4

Original (A) and estimated spatial distributions of annual mean PM2.5 concentrations over China in 2010 by (B) the deep CNN model and (C) the geographically weighted regression (GWR) model, using the four influencing factors (population spatial density, GDP spatial density, terrain, and LULC).

The estimated spatial distribution of annual PM2.5 concentration was nearly identical to the original, except at very few pixel locations. This indicates that the four factors (population spatial density, GDP spatial density, terrain, and LULC) can almost determine the annual PM2.5 concentration. Table 2 lists the corresponding confusion matrix between the original and estimated annual PM2.5 concentration for all 96,337 pixel locations, obtained with the trained deep CNN model fed with the four factors. Although some pixel values were estimated incorrectly, the incorrect estimates were close to the correct values, so the matrix shows a clearly narrow diagonal band. The overall estimated accuracy is 97.85%. The estimated accuracy of the first category, <10 μg/m3, is the highest at 99.38%. The lowest and second-lowest estimated accuracies are 90.81% and 94.36%, for the eighth (70~80 μg/m3) and seventh (60~70 μg/m3) categories, respectively. The estimated accuracy can be regarded as the spatial influencing magnitude of the influencing factors on annual PM2.5 concentration: a high estimated accuracy directly reflects a strong spatial influence. The results show a strong correlation between annual PM2.5 concentration and the four factors. In particular, when the trained deep CNN was evaluated on all 96,337 pixels, the overall estimated accuracy reached 97.85%, which indicates the spatial influencing magnitude of the four factors on annual PM2.5 concentration.
Table 2

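The confusion matrix of the original vs. estimated annual PM2.5 concentrations by the deep CNN model fed with the four influencing factor data: population, GDP, terrain, and LULC.

96,337 pixels. Columns: original PM2.5 annual concentration (μg/m3). Rows: estimated PM2.5 annual concentration (μg/m3).

Estimated \ Original | <10 | 10~20 | 20~30 | 30~40 | 40~50 | 50~60 | 60~70 | 70~80 | 80~90 | 90~100 | >100
<10 | 18,395 | 337 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
10~20 | 112 | 11,792 | 70 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0
20~30 | 2 | 83 | 21,891 | 86 | 12 | 1 | 0 | 2 | 0 | 0 | 0
30~40 | 0 | 3 | 101 | 10,804 | 89 | 3 | 0 | 0 | 0 | 0 | 0
40~50 | 0 | 6 | 18 | 115 | 13,971 | 60 | 5 | 0 | 0 | 0 | 0
50~60 | 0 | 0 | 1 | 5 | 62 | 4103 | 57 | 4 | 0 | 0 | 0
60~70 | 0 | 0 | 0 | 0 | 0 | 49 | 3650 | 62 | 0 | 0 | 0
70~80 | 0 | 0 | 2 | 0 | 0 | 1 | 142 | 3598 | 159 | 0 | 0
80~90 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 296 | 4563 | 60 | 0
90~100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 38 | 1332 | 7
>100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 166
Accuracy | 99.38% | 96.49% | 99.13% | 98.10% | 98.85% | 97.27% | 94.36% | 90.81% | 95.84% | 95.48% | 95.95%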
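The accuracy row and the overall estimated accuracy follow directly from the confusion matrix: each per-category accuracy is the diagonal count divided by its column total (the number of pixels originally in that category), and the overall accuracy is the sum of the diagonal divided by all 96,337 pixels. The sketch below recomputes them from the Table 2 counts; the function names are illustrative.

```python
# Rows: estimated category; columns: original category (counts from Table 2).
CONFUSION = [
    [18395, 337, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [112, 11792, 70, 3, 0, 1, 0, 0, 0, 0, 0],
    [2, 83, 21891, 86, 12, 1, 0, 2, 0, 0, 0],
    [0, 3, 101, 10804, 89, 3, 0, 0, 0, 0, 0],
    [0, 6, 18, 115, 13971, 60, 5, 0, 0, 0, 0],
    [0, 0, 1, 5, 62, 4103, 57, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 49, 3650, 62, 0, 0, 0],
    [0, 0, 2, 0, 0, 1, 142, 3598, 159, 0, 0],
    [0, 0, 0, 0, 0, 0, 14, 296, 4563, 60, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 38, 1332, 7],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 166],
]


def per_category_accuracy(matrix):
    """Diagonal count divided by column total, in percent, per category."""
    size = len(matrix)
    column_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [100.0 * matrix[c][c] / column_totals[c] for c in range(size)]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total pixel count, in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


print([round(a, 2) for a in per_category_accuracy(CONFUSION)])
print(round(overall_accuracy(CONFUSION), 2))  # 97.85
```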

3.2. Single Spatial Influencing Feature

The spatial influencing magnitude of a single factor can also be measured with the deep CNN model proposed in this paper. We implemented further deep CNNs whose input layer contains neurons holding the attribute values of a single factor; all other parameters were the same as above. After 1000 epochs of learning, the training and validation accuracies were 47.12% and 36.13% for population spatial density, and 50.07% and 40.91% for GDP spatial density. In contrast, annual PM2.5 concentration has a strong spatial correlation with terrain and with LULC: the validation accuracies for terrain and LULC reached 83.17% and 72.37%, respectively. Although the overall estimated accuracies for population and GDP over China were relatively low, these two factors could still determine the severely polluted regions. The results also indicate that terrain and LULC are the main spatial influencing factors on annual PM2.5 concentration over China. In addition, we implemented the deep CNN with an input layer containing neurons that describe both terrain and LULC. With these two factors, the training and validation accuracies reached 90.69% and 87.95%. Table 3 lists the corresponding confusion matrix between the original and estimated annual PM2.5 concentration produced by the trained deep CNN fed with terrain and LULC data for all 96,337 pixel locations. Except for the eighth (70~80 μg/m3) and eleventh (>100 μg/m3) categories, with estimated accuracies of 86.01% and 79.80%, the estimated accuracies of the other nine categories exceed 91%. The overall estimated accuracy reaches 96.65%.
Table 3

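The confusion matrix of the original vs. estimated annual PM2.5 concentrations by the deep CNN model fed with the two influencing factor data, terrain and LULC.

96,337 pixels. Columns: original PM2.5 annual concentration (μg/m3). Rows: estimated PM2.5 annual concentration (μg/m3).

Estimated \ Original | <10 | 10~20 | 20~30 | 30~40 | 40~50 | 50~60 | 60~70 | 70~80 | 80~90 | 90~100 | >100
<10 | 17,974 | 535 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
10~20 | 161 | 11,775 | 135 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0
20~30 | 3 | 129 | 21,622 | 129 | 20 | 0 | 1 | 6 | 0 | 0 | 0
30~40 | 1 | 3 | 177 | 10,869 | 128 | 4 | 1 | 1 | 2 | 0 | 0
40~50 | 1 | 1 | 27 | 181 | 13,991 | 90 | 8 | 1 | 8 | 0 | 0
50~60 | 0 | 0 | 2 | 3 | 81 | 4099 | 93 | 3 | 1 | 0 | 0
60~70 | 0 | 0 | 4 | 1 | 5 | 75 | 3491 | 75 | 1 | 1 | 0
70~80 | 0 | 0 | 1 | 0 | 1 | 1 | 181 | 3395 | 193 | 1 | 0
80~90 | 0 | 0 | 1 | 0 | 2 | 0 | 17 | 462 | 4445 | 108 | 0
90~100 | 0 | 0 | 0 | 0 | 5 | 0 | 5 | 4 | 101 | 1290 | 40
>100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 158
Accuracy | 99.08% | 94.63% | 98.42% | 97.17% | 98.30% | 96.02% | 91.94% | 86.01% | 93.56% | 91.88% | 79.80%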

3.3. Comparison with the GWR Prediction

To verify the advantage of the deep CNN model presented in this paper, we applied GWR to the same dataset. Figure 4C shows the spatial distribution of annual PM2.5 concentrations over China in 2010 estimated by the GWR model. The GWR estimates are visibly biased compared with the original data (Figure 4A). In particular, the lowest and highest concentrations were misestimated by the GWR model. The overall estimated accuracy of GWR was 72.81%, much lower than the 97.85% of the deep CNN model. A comparison of Figure 4B,C indicates that although the overall spatial structure estimated by GWR is broadly similar to the original spatial structure of annual PM2.5 concentrations, it deviates in detail. The difference between the two models may arise because the deep CNN model has a very strong non-linear fitting ability and can learn a very complicated non-linear function, whereas GWR is still a linear regression model that cannot capture complicated non-linear variation effects. An inaccurate estimate of the correlation between PM2.5 concentration and the influencing factors could lead to biased public policies. Scientific public policy-making needs finer and more accurate analytical evidence.
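For readers unfamiliar with GWR, the sketch below shows its core idea: a separate weighted least-squares line is fitted at each target location, with observations down-weighted by their distance from that location. It is a deliberately minimal illustration with one predictor and a Gaussian kernel with fixed bandwidth, both of which are assumptions; the kernel, bandwidth, and software used for the GWR run in this paper are not stated, and `gwr_local_fit` is a hypothetical helper.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Fit y = intercept + slope * x by weighted least squares around `target`.

    Weights come from a Gaussian kernel of the distance between each
    observation and the target location (an assumed kernel choice).
    """
    weights = []
    for cx, cy in coords:
        dist_sq = (cx - target[0]) ** 2 + (cy - target[1]) ** 2
        weights.append(math.exp(-dist_sq / (2.0 * bandwidth ** 2)))

    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data on a line of 8 locations with an exactly linear relationship.
coords = [(float(i), 0.0) for i in range(8)]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 6.0, 1.0]
y = [2.0 * xi + 1.0 for xi in x]

intercept, slope = gwr_local_fit(coords, x, y, target=(3.0, 0.0), bandwidth=2.0)
print(round(intercept, 6), round(slope, 6))
```

Each local model is still linear in the predictors, which is the limitation the comparison above attributes to GWR.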

4. Discussion

This study proposed a deep CNN model to exploit the spatial influencing magnitude for the remotely sensed annual mean PM2.5 concentration over China. In consideration of the influencing mechanism and the availability of data, this study investigated the spatial influence of four factors (population, GDP, terrain, and LULC) on the annual PM2.5 concentration over China. The factors influencing PM2.5 pollution are known to include natural conditions and anthropogenic activities [39]. Among the four factors selected for this paper, terrain represents natural elements, population and GDP reflect anthropogenic activities, and LULC can be regarded as a mixture of the two. The presented deep CNN method considers local spatial heterogeneity, and a wider spatially correlated scope can be covered by an extent of more than one order, which benefits from the strong ability of the deep CNN to process big data. This paper bridges the gap between spatial analysis and deep CNN technology by reorganizing remotely sensed data into deep CNN input. Deep CNNs have commonly been used to extract feature representations from large sets of labelled images [27,28]. As noted above, few researchers have applied a deep CNN model to analyse the spatial influence of multiple variables. Taking a different view and adding a geospatial preprocessing step, this study designed a 3-D deep CNN structure in which the input and output neurons are the influencing factors and the PM2.5 concentration, respectively. The strong non-linear fitting ability of a deep CNN can then detect complicated non-linear spatial influencing effects while accounting for local spatial heterogeneity. The results indicate that the developed deep CNN model considers spatial relationships and produces an estimate at each pixel location; hence, the results describe the spatial influencing feature at every pixel location.

Although the GWR method can also investigate the local correlation at each pixel location, it can only implement a linear or simple non-linear regression, and its capability to process big data is limited. In contrast, the deep CNN model can both process big data well and fit or learn very complicated correlations. This paper demonstrates that deep CNN technology can be applied to exploit the spatial influence feature of geospatial or remotely sensed data and that its advantages can be fully used. The spatial influencing magnitude of the four factors on the annual PM2.5 concentration was investigated with the presented deep CNN model. The model can be used not only to explore the spatial influence on remotely sensed PM2.5 concentration but also in other fields, such as detecting the risk factors of an epidemic from remotely sensed data. Through the model, the risk level of risk factors in public health could be assessed quantitatively. In other words, the developed deep CNN model has the potential to expand the field of spatial analysis of remotely sensed lattice data. Nevertheless, this research has some limitations. First, the spatially dependent variable, annual PM2.5 concentration, is classified into 11 categories rather than treated as a continuous variable. Second, the deep CNN model can learn a very complicated function structure, but its mathematical mechanism is currently unclear (the so-called "black box" problem [40]), and it is difficult to interpret in terms of geographical processes.

5. Conclusions

Population spatial density, GDP spatial density, terrain, and LULC can almost determine the spatial pattern of annual PM2.5 concentration over China, with an overall estimated accuracy of 97.85%. Among the four factors, terrain and LULC are the main spatial influencing factors on annual PM2.5 concentration. The overall spatial influencing magnitude of these two factors reached 96.65%, nearly equal to the spatial influencing magnitude of all four factors on annual PM2.5 concentration.

References

1.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation.

Authors:  Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2016-01       Impact factor: 6.226

2.  Quantifying the influence of natural and socioeconomic factors and their interactive impact on PM2.5 pollution in China.

Authors:  Dongyang Yang; Xiaomin Wang; Jianhua Xu; Chengdong Xu; Debin Lu; Chao Ye; Zujing Wang; Ling Bai
Journal:  Environ Pollut       Date:  2018-06-04       Impact factor: 8.071

3.  Global Estimates of Fine Particulate Matter using a Combined Geophysical-Statistical Method with Information from Satellites, Models, and Monitors.

Authors:  Aaron van Donkelaar; Randall V Martin; Michael Brauer; N Christina Hsu; Ralph A Kahn; Robert C Levy; Alexei Lyapustin; Andrew M Sayer; David M Winker
Journal:  Environ Sci Technol       Date:  2016-03-24       Impact factor: 9.028

4.  Geographic weighted regression: applicability to epidemiological studies of leprosy.

Authors:  Mônica Duarte-Cunha; Andréa Sobral de Almeida; Geraldo Marcelo da Cunha; Reinaldo Souza-Santos
Journal:  Rev Soc Bras Med Trop       Date:  2016-02       Impact factor: 1.581

5.  Impact assessment of meteorological and environmental parameters on PM2.5 concentrations using remote sensing data and GWR analysis (case study of Tehran).

Authors:  Fakhreddin Hajiloo; Saeid Hamzeh; Mahsa Gheysari
Journal:  Environ Sci Pollut Res Int       Date:  2018-03-01       Impact factor: 4.223

6.  The spatial distribution of leprosy cases during 15 years of a leprosy control program in Bangladesh: an observational study.

Authors:  Eaj Fischer; D Pahan; Sk Chowdhury; Jh Richardus
Journal:  BMC Infect Dis       Date:  2008-09-23       Impact factor: 3.090

7.  Spatio-temporal variation of PM2.5 concentrations and their relationship with geographic and socioeconomic factors in China.

Authors:  Gang Lin; Jingying Fu; Dong Jiang; Wensheng Hu; Donglin Dong; Yaohuan Huang; Mingdong Zhao
Journal:  Int J Environ Res Public Health       Date:  2013-12-20       Impact factor: 3.390

8.  Long-term variation of satellite-based PM2.5 and influence factors over East China.

Authors:  Qianshan He; Fuhai Geng; Chengcai Li; Haizhen Mu; Guangqiang Zhou; Xiaobo Liu; Wei Gao; Yanyu Wang; Tiantao Cheng
Journal:  Sci Rep       Date:  2018-08-06       Impact factor: 4.379
