Literature DB >> 30870434

Directional dependence between major cities in China based on copula regression on air pollution measurements.

Jong-Min Kim1, Namgil Lee2, Xingyao Xiao3.   

Abstract

Air pollution is well-known as a major risk to public health, causing various diseases including pulmonary and cardiovascular diseases. As social concern increases, the amount of air pollution data is increasing rapidly. The purpose of this study is to statistically characterize dependence between major cities in China based on a measure of directional dependence estimated from PM2.5 measurements. As a measure of the directional dependence, we propose the so-called copula directional dependence (CDD) using beta regression models. An advantage of the CDD is that it does not rely on strict assumptions of specific probability distributions or linearity. We used hourly PM2.5 measurement data collected at four major cities in China: Beijing, Chengdu, Guangzhou, and Shanghai, from 2013 to 2017. After accounting for autocorrelation in the PM2.5 time series via nonlinear autoregressive models, CDDs between the four cities were estimated to produce directed network structures of statistical dependence. In addition, a statistical method was proposed to test the directionality of dependence between each pair of cities. From the PM2.5 data, we could discover that Chengdu and Guangzhou are the most closely related cities and that the directionality between them has changed once during 2013 to 2017, which implies a major economic or environmental change in these Chinese regions.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 30870434      PMCID: PMC6417661          DOI: 10.1371/journal.pone.0213148

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


1 Introduction

Recently, air pollution has become a significant environmental and social problem in China. PM2.5 refers to the concentration of atmospheric fine particulate matter (PM) whose diameter is ≤ 2.5μm. Exposure to PM2.5 is associated with increased mortality rates caused by lung cancer and cardiopulmonary diseases [1-4]. It is generally accepted that PM2.5 is more harmful to human health than PM with diameter > 2.5μm and ≤ 10μm (PM10) [5, 6]. Sources of particulate matter include residential wood burning, coal-fired thermal power generation, agricultural burning, diesel fuel combustion, and natural/industrial dust. PM2.5 can also be generated indirectly when gases and particles interact in the air. China’s air pollution situation is so extreme that ambient air pollution ranked the second highest and household air pollution ranked the third highest in the G20 in terms of disability-adjusted life-years (DALYs) in 2010 [7]. It has been estimated that the air pollution in China contributed to 1.6 million deaths/year in 2014 [8]. In addition to the harmful effects of air pollutants on human health and mortality, pollutants can also have diverse effects on climate and weather due to their complex composition and sources [9-12]. China has regulated ambient air quality since 1982, when it set limits on total suspended particulates (TSP), SO2, NO2, lead, and Benzopyrene [13]. In February 2012, China adopted a new Ambient Air Quality Standard [14], where limits on PM2.5 were set for the first time. In 2012, major cities in China including the Beijing-Tianjin-Hebei region, Yangtze River delta region, Pearl River delta region, and provincial capitals were required to implement the standards. The current standards were implemented nationwide in 2016; see [15] for more detail. Nowadays, a large amount of air quality monitoring data is being accumulated, where air pollutants are monitored directly on ground level; see Section 2 for more detail. Due to the increase in the amount of air quality monitoring data in China, numerous studies have been conducted to characterize China’s air quality status and recent trends [8, 16–18]. However, most of the studies focus on analyzing temporal trends of air pollutant concentration and visualizing its spatial distributions at various time scales. In recent years, there have been efforts to uncover possible sources of air pollution in China [8, 18]. However, it is a challenging task to identify causal effects based on observational data compared to controlled experimental data due to the effects of unobserved variables [19, 20]. On the other hand, under the consideration that a wide variety of factors may influence the PM2.5 level, which may be closely related to environmental and industrial factors, we focus on inferring statistical dependence and causal relations between four major cities in China based on the PM2.5 measurement data as observational evidence. In this study, we included Beijing, Chengdu, Guangzhou, and Shanghai as the target cities in China because they are the four major cities which represent main industrial regions in China. Approaches to the determination of causal relations based on obserational data include graphical causal modeling [20] such as causal Bayesian networks and structural equation models. Graphical causal models determine directed graph structures satisfying certain conditional independence assumptions called the Markov assumption and faithfulness assumption [19-21]. However, it is well known that a number of directed graph structures cannot be distinguished even if the conditional independence assumptions are satisfied, e.g., X → Y and Y → X cannot be distinguished. Alternatively, the concept of directional dependence has been studied in linear regression settings in the statistics community [22, 23]. Measures of directional dependence using copula regression were investigated in [24]. More generally, regression-based approaches for determining cause and effect variables between two random variables have been suggested in the machine learning community [25-31]. The basic principle is to compare the regression models in alternate directions and investigate asymmetry in the joint distribution. The purpose of this study is to infer and validate a statistical measure of directional dependence between the selected cities based on time series of PM2.5 concentrations. Copula directional dependence (CDD) is a measure of directional dependences between two regions of interest, which can be applied to non-normal distributions and nonlinear relationships [24, 32, 33]. Contrary to other dependence measures such as correlations and mutual information, the CDD yields bi-directional dependences which measure statistical influence from one region to another. Copula-based modeling has been widely studied and applied in many fields such as macroeconomics and finance [34, 35] and genetics and biology [36, 37]. Gaussian copula generalized linear models for longitudinal data analysis were introduced in [38]. Multivariate regression analysis using Bayesian inference with Gaussian copulas was proposed subsequently [39]. Later, a framework for joint regression analysis of one-dimensional generalized linear models using Gaussian copulas was proposed [40]. More recently, Gaussian copula marginal regression (GCMR) models were developed [41]. On the other hand, a regression model called the beta regression was proposed [42], where continuous responses are assumed to take values in unit intervals. The beta regression is effective for modeling bounded responses such as rates and proportions due to its flexibility in modeling the shapes and asymmetries of distributions. Various extensions of beta regression models have been proposed [43-46]. For time series data, a beta regression model using the Gaussian copula for bounded time series was proposed in [47], where stationary autoregressive moving average (ARMA) models were adopted for addressing the serial correlation. In this paper, we apply the CDD proposed in [32, 33]. The suggested CDD measure uses GCMR models [41] and the beta regression model [47]. By using the suggested CDD measure, non-normal and non-linear relationships in the data can be efficiently modeled without any complicated processes of structure search or hyperparameter selection. Moreover, since the CDD is a directional dependence structure, we can further infer directed networks of statistical dependence between the major cities in China. The rest of this paper is organized as follows. In Section 2, we describe the PM2.5 data and their descriptive statistics. In Section 3, we explain the proposed measure of directional dependence using beta regression. And we describe statistical procedures for processing the time series data and inferring directional dependences between the four cities in China. In Section 4, we present the analyses and results. Discussion and conclusions are given in Section 5.

2 Data

2.1 Air pollution data

Nowadays, air pollutants are monitored at various locations in many countries. A national real-time air quality monitoring system version 1.0, known as the Air Reporting System, has been operating on the China National Environmental Monitoring Center (CNEMC) website (http://www.cnemc.cn/) since January 2013. The Air Reporting System version 2.0 has been operating since January 2014 and covers 945 sites in 190 cities according to the Ambient Air Quality Standard (GB3095-2012). Other approaches to air quality monitoring include satellite data-based approaches [48-50] and geoscientific modeling [51]. Current air quality monitoring systems such as the Air Reporting System can provide the hourly concentration of various types of air pollutants measured directly at ground level. Even though the Air Reporting System of China monitors the ambient air quality in a wide range of Chinese regions and cities in real time, most air quality monitoring history is not publicly available. Independently from the Chinese national air quality monitoring system, the US Mission in China started monitoring air quality in 2008 at the US Embassy in Beijing. Since then, monitoring stations have been subsequently established at the US consulates in Shanghai (2011), Guangzhou (2011), Chengdu (2012), and Shenyang (2013) [17]. The air quality status and hourly PM2.5 concentration are available at http://www.stateair.net/. The PM2.5 time series data analyzed in this study were obtained from the US Mission China air quality website (http://stateair.net). Note that the US Mission China air quality website states that the data are not fully verified or validated; these data are subject to change, error, and correction (http://stateair.net/web/assets/USDOS_AQDataUseStatement.pdf). The historical data files on the stateair.net site can be downloaded manually after agreeing to the data use statement. The obtained data consist of hourly PM2.5 time series of four major Chinese cities, Beijing, Chengdu, Guangzhou, and Shanghai, from January 2013 until June 2017. The original data are hourly measurements of PM2.5 concentration levels measured in μg/m3. We removed the data with missing values and then took the daily maximum of the PM2.5 levels. We note that about 62% of the daily maximum values fell in the night hours (between 21:00 and 4:00) for the 2013 Beijing data. One may consider that air pollution levels during business hours are an indicator of poor health and not values falling in the night hours when the population is asleep. In this sense, it will be very interesting to analyze subsets of the air pollution data measured during business hours for applications to public health in future work. In this paper, on the other hand, we took the daily maximum values to analyze statistical dependence between major cities in China. Since the air pollution level is measured at ground-level, it fluctuates sensitively depending on daily air temperature change. Hence, we remove the effect of daily air temperature change by taking the daily maximum values, which can be indicators of other factors such as coal fuel burning, transportation, industrial activity, and wind.

2.2 Descriptive statistics

Sample time series data of the PM2.5 levels for the year 2013 are illustrated in Fig 1. Hourly measurements of the PM2.5 levels were obtained in μg/m3. Missing values were set to zero in the figure. We can see that the PM2.5 levels are relatively high during the winter season (January and December), and Guangzhou and Shanghai show relatively low overall PM2.5 values compared to the other two cities.
Fig 1

Time series data of PM2.5 levels in the four Chinese cities (Beijing, Chengdu, Guangzhou, and Shanghai) during the year 2013.

The histograms for the hourly PM2.5 level measurements in the year 2013 are presented in Fig 2. The marginal distribution for each of the Chinese cities is highly skewed and non-normally distributed, which implies that typical statistical dependence measures such as the Pearson correlation coefficient are not suitable.
Fig 2

Histograms of PM2.5 levels in the four Chinese cities during the year 2013.

We have presented summary statistics of the PM2.5 level data for the years 2013 to 2017 in Table 1. During 2013 and 2014, the median value of Beijing was the highest among the four cities. However, it constantly decreased to the second rank in the following years, while Chengdu ranked the first from 2015 to 2017. The median PM2.5 levels of Guangzhou and Shanghai were relatively low and kept decreasing from 2013 to 2016. On the other hand, the maximum of Beijing’s PM2.5 levels was the highest during the years 2013 to 2017 except 2014. The maximum of Shanghai’s PM2.5 levels was the second highest in 2013, but it kept decreasing rapidly and ranked 4th among the four cities by 2016. In summary, the overall PM2.5 levels of the four cities showed a decreasing trend, but each city had its own statistical characteristics.
Table 1

Summary statistics of PM2.5 data.

YearMin.1st Qu.MedianMean3rd Qu.Max.IQRSkew.Kurt.
2013BJ030711011378861072.009.03
CD0336582114487811.254.63
GZ027475373254461.034.49
SH026445975651492.6215.39
2014BJ02771971326711051.796.87
CD0436679101688581.9310.66
GZ020414664526441.8412.53
SH025394963406381.868.69
2015BJ0215382108722872.2810.10
CD038587290399521.727.15
GZ018323950259321.848.08
SH024384961364372.139.46
2016BJ018497295782772.2710.16
CD040617295281551.144.28
GZ013273142266291.9411.39
SH021354459212381.535.83
2017BJ018427084684662.8513.16
CD039586985333461.817.48
GZ018333752436342.2018.55
SH024394559188351.204.91
After removing the data with missing values and taking the daily maximum PM2.5 values, we generated scatter plots between each pair of cities, which are illustrated in Fig 3. The data points are distributed in a single cluster, which implies that they were sampled from an identical distribution. In addition, we can find that the distribution is highly skewed, and linear relationship between the variables is not apparent.
Fig 3

Scatterplot matrix for daily time series data of the PM2.5 levels between the four Chinese cities (BJ: Beijing, CD: Chengdu, GZ: Guangzhou, SH: Shanghai) during the year 2013.

As a measure of pairwise correlation between the PM2.5 levels of the four cities, we computed Spearman correlation coefficients, which are summarized in Table 2. Note that the Spearman correlation coefficient is a correlation coefficient between the ranks of the measured values, so it is independent of the skewed marginal distributions.
Table 2

Spearman correlation coefficients of daily maximums of the PM2.5 levels between each pair of the four Chinese cities (BJ: Beijing, CD: Chengdu, GZ: Guangzhou, SH: Shanghai) during the years 2013 to 2017.

YearBJCDGZSH
2013BJ10.270.210.17
CD-1.000.420.35
GZ--1.000.30
SH---1.00
2014BJ10.30.18-0.02
CD-1.00.250.29
GZ--1.000.23
SH---1.00
2015BJ10.360.170.10
CD-1.000.390.31
GZ--1.000.24
SH---1.00
2016BJ10.340.190.13
CD-1.000.330.33
GZ--1.000.22
SH---1.00
2017BJ10.380.32-0.08
CD-1.000.430.12
GZ--1.000.02
SH---1.00
We can find that Chengdu (CD) had relatively high Spearman correlations with Beijing (BJ) and Guangzhou (GZ) from 2013 to 2017. Correlations between Beijing (BJ), Guangzhou (GZ), and Shanghai (SH) were relatively low from 2013 to 2017. Spearman correlations between Beijing (BJ) and Guangzhou (GZ) slightly increased more recently in 2016 and 2017.

3 Methods

A copula is a multivariate joint distribution function, which is an effective approach for describing statistical dependencies within a set of non-normal random variables [52, 53]. Statistical dependence structures can be modeled by choosing a copula function independent of the marginal distributions of each random variable. Since a normal distribution assumption or linearity assumption is not required, copulas can be applied to a wide variety of statistical dependence modeling tasks.

3.1 Directional dependence based on copula regression

In this paper, we consider bivariate copulas and suggest a measure of directional dependence between a pair of random variables. A generalization to a measure of directional dependence using multivariate copulas is left for future works, where computationally efficient methods need to be developed. According to Sklar’s theorem [54, 55], any joint distribution function F(x, y) of two random variables X and Y can be represented by a bivariate function, C(u, v) composed with the marginal distribution functions, F(x) and F(y), as where C(x, y) is called the copula. Note that U = F(X) and V = F(Y) have a uniform distribution on [0, 1]. We can see that copulas are independent and invariant of marginal distributions. Hence, a copula determines the dependency structure between two random variables independently of any one-to-one continuous transformations of each variable. Directional dependence refers to asymmetric dependence between random variables. The concept of directional dependence was first investigated under linear regression models in [22, 23]. A copula can be used to define directional dependence in terms of regression [24]. Let C(u, v) denote a copula function, which defines the joint distribution of two random variables (U, V) whose marginal distributions have a uniform distribution on [0, 1]. Let C(v) be defined by the conditional distribution of V given U = u as The conditional expectation of V given U = u can be expressed by the copula as [24] Note that the conditional expectation function C(v) is the regression function of V on U. In [24], the directional dependence from U to V is defined, based on the copula regression function of V, by Note that the copula directional dependence (CDD) in Eq (3) is the proportion of the variance of V which has been explained by the copula regression function r(u). In the same way, the directional dependence from V to U is defined by the proportion of the variance of U which has been explained by the copula regression function r(v). Moreover, note that the CDD defined in Eq (3) is a version of the Spearman’s correlation coefficient. The Spearman’s correlation coefficient can also be expressed in terms of a copula irrespectively of the marginal distributions. The Spearman’s correlation coefficient, ρ, is the ordinary (Pearson) correlation between U = F(X) and V = F(Y). It is well known that ρ can be expressed in the following form [56-58]: Note that if U and V are statistically independent, then C(u, v) = uv on 0 ≤ u, v ≤ 1, which leads to r(u) = r(v) = 0.5. This implies that the directional dependence in Eq (3) and the Spearman’s ρ in Eq (4) are measures of the deviations from the joint probability distribution function for indenpendent random variables. See, e.g., [57] for relations between Spearman’s ρ and Kendall’s τ. Moreover, for a pair of random variables U and V, we can estimate and compare the two CDDs, and , to find which copula regression provides a better fit to the data and better explains the variance of the response variable.

3.2 Gaussian copula beta regression

The definition of CDD in Eq (3) between two uniformly distributed random variables U and V can be extended to a pair of continuous random variables X and Y by defining U = F(X) and V = F(Y) and On the other hand, for estimating the CDD, it is necessary to determine a parametric form of the copula regression function, r(u). Note that both U = F(X) and V = F(Y) are uniformly distributed and take their values in the unit interval [0, 1]. Since r(u) is a conditional expectation of V given U = u, both the response variable and predictor variable have bounded ranges. In beta regression, the conditional distribution of a response variable V given U = u is expressed by a beta distribution, Beta(μ, κ), with the mean 0 < μ < 1 and the precision κ > 0 [42]. A beta distribution can flexibly represent a wide range of continuous probabilty distributions defined on the unit interval, that is, various shapes and asymmetries of distributions can be expressed by a beta distribution. The probability density function of V given U = u can be written as where Γ(⋅) is the gamma function. The mean parameter μ is a function of the covariate u, through the logit function as The parameters β0 and β1 can be efficiently estimated based on the maximum likelihood approach in the Gaussian copula marginal regression (GCMR) [41, 47]. In the literatures about copula directional dependences, it has been known that the Gaussian copula regression [39] and GCMR [41] are computationally efficient approaches compared to previously known copula directional dependence measures using a family of asymmetric copulas [37, 59, 60]. In the GCMR [41], the cumulative distribution function, F(v|μ, κ), for the beta distribution in Eq 6 is used to transform the response variable v into w = F(v|μ, κ). Then, the transformed variable is related with a standard normal random variable ϵ by the inverse of the probability integral transform where Ψ is the cumulative distribution function of the standard normal distribution [47]. The above relationship between the responses v and the normal random variables ϵ is used to formulate the sampling distribution and the likelihood function. See [32, 33, 47] for more details.

3.3 Neural network autoregression

In Fig 1, we can see that the air pollution data are far from white noise processes, while their mean and variance change dramatically over time. In order to guarantee a feasible maximum likelihood inference for the beta regression, the serial dependence in the time series data should be removed via a proper modeling of the data [61]. In [32, 33], financial time series data showing conditional heteroscedasticity and serial dependence were preprocessed by employing the asymmetric GARCH(p,q) model [62], and then standardized residuals were generated. In [61], it is noted that GARCH models are employed frequently to remove serial dependence when dealing with financial log-return time series. In the case of meteorological time series data, nonlinear time series models have been widely applied for addressing irregular or chaotic behavior in the observations [63]. Nonlinear time series models consider that irregularity in the observations can be attributed to nonlinear dynamics occuring on a low dimensional chaotic attractor, which can be reconstructed and used to forecast future observations under appropriate conditions [64, 65]. Traditional linear stochastic time series models such as ARIMA models [66] have influenced the forecasting community significantly; however, they cannot capture the nonlinear dynamics underlying real life observations [67]. On the other hand, many useful nonlinear time series models have been proposed [67, 68]. Among them, artificial neural networks (ANNs) such as feedforward neural networks have been suggested as a promising approach for modeling irregular behavior in time series [69-72]. We adopted the neural network approach for estimating nonlinear autocorrelations in the data by using the nnetar function in the R package forecast [73]. Let denote the PM2.5 concentration level at a city i = 1, 2, 3, 4, day t = 1, 2, …, T and year s = 2013, …, 2017. The nonlinear feed-forward neural network model for a lagged time series can be written as where L is the lag order and f is a neural network with H hidden nodes in a single layer. The neural network model with L lags and H hidden nodes is denoted by NNAR(. The model parameters L and H were determined automatically by default values, i.e., L by the optimal value according to AIC for a linear AR(p) model, and H by half of the numbers of input values plus one. In addition, the error process is assumed to be homoscedastic. See the forecast package documentation [73] for default preprocessing procedures and default hyper-parameter values for the nnetar function. After fitting the neural network model in Eq (9) to the data, the fitted values, , are the predicted values which can be written as The term denotes a value randomly drawn from normal distributions, which can be used for obtaining prediction intervals. We have because we do not need predicted values for t > T. The residuals can be written as Due to the model in Eq (9), the residuals, , can be considered to be not serially correlated.

3.4 Bootstrap confidence interval

Let denote the difference, . It measures the difference of the two directional dependences between the two cities U and V. If , then it means that city U affects the other city V more than V affects U. For a statistical test of the difference, we use the bootstrap resampling method. We resampled a fixed rate from the data, then computed the CDDs repeatedly. In this way, we can estimate the 95% confidence interval for the difference, i.e., where LB(Δ) and UB(Δ) are the lower and upper limits of the 95% confidence interval. If the confidence interval does not include zero, then we can reject the null hypothesis that there is no difference between the directional dependences and with a 95% confidence level.

3.5 Statistical properties: A review

The proposed CDD using beta regression was proposed recently in [32, 33]. Statistical properties of the proposed CDD measure using beta regression have been assessed numerically in [32] by using simulated data sets. The experimental setting and the results can be summarized as follows. First, an asymmetric bivariate copula was constructed based on the formula of Durante [59], which is written as where C1 and C2 are two symmetric copulas and α, β ∈ (0, 1) are asymmetry parameters. C1 and C2 were selected as a Frank copula with parameter θ3 = 5 and a Gumbel copula with parameter θ4 = 40, respectively [32, 59]. It is challenging to obtain a theoretical CDD value, , of random variables (U, V) having the specific distribution C(u, v). Instead, a sample of 8,888 correlated pairs, {(u, v)}, was generated, and the CDD value was estimated by and . In order to assess the accuracy of the proposed CDD measure, the CDD value was estimated from each of the thousand independent subsamples with sample size 1000. Based on the thousand estimated CDD values, , the accuracy of the CDD was evaluated via the bias and standard error. In summary, the bias was estimated as and , and the standard error was estimated as and . On the other hand, theoretical properties of the CDD using beta regression are under development. For example, the exact CDD value of a given bivariate copula is not easy to obtain in general. In the same way, given a CDD value, it may be interesting to construct an asymmetric or symmetric copula function in a parametric form.

4 Results

We applied the neural network approach for autocorrelation estimation described in Section 3.3. Details on the feedforward neural networks and their hyper-parameters used in this paper are presented in Table 3.
Table 3

Details on the feedforward neural networks used for autocorrelation estimation.

L: the number of input nodes, i.e., the lag order, H: the number of nodes in the hidden layer.

Year20132014201520162017
Num. nodesLHLHLHLHLH
BJ1142422232
CD42421115822
GZ7453111153
SH179841067411

Details on the feedforward neural networks used for autocorrelation estimation.

L: the number of input nodes, i.e., the lag order, H: the number of nodes in the hidden layer. The PM2.5 values predicted by the estimated feedforward neural networks are illustrated in Fig 4, which are for the year 2013. After the prediction, we carried out the Durbin-Watson test for serial correlation based on the residuals. The obtained p-values for the year 2013 were 0.274 (Beijing), 0.686 (Chengdu), 0.275 (Guangzhou), and 0.318 (Shanghai), all of which were larger than the significance level α = 0.05. For the other years, we obtained p-values larger than 0.05 as well.
Fig 4

Predicted values (black straight line) of PM2.5 levels of the four Chinese cities during the year 2013 and the observed PM2.5 measurements (red dotted line).

Next, we computed the CDDs between each pair of Chinese cities based on the residuals of the daily PM2.5 levels in each year. The results are summarized in Table 4. Contrary to the correlation coefficients presented in Table 2, the CDDs are not symmetric and are computed in both directions, U to V and V to U for each pair (U, V) of the cities. We remark that the CDD values presented in Table 4 are compared with the square of correlation coefficients rather than correlation coefficients themselves since the definition in Eq (3) is a version of the coefficient of determination in multiple linear regression. In addition, we have computed 95% bootstrap confidence intervals for the difference, . We can find that most of the confidence intervals do not include zero, which means that the difference of directional dependences is significantly different from zero.
Table 4

Copula directional dependences (CDDs) between ordered pairs of Chinese cities (BJ: Beijing, CD: Chengdu, GZ: Guangzhou, SH: Shanghai) during the years 2013 to 2017, together with 95% bootstrap confidence intervals (CIs) for the difference of directional dependences.

The larger value between the two directional dependences is marked in bold font.

YearBJ → CDBJ ← CDBJ → GZBJ ← GZBJ → SHBJ ← SH
2013CDD0.01310.01440.00200.00230.00130.0029
95% CI(0.0013, 0.0014)(0.0002, 0.0002)(0.0015, 0.0016)
2014CDD0.03790.03530.00530.01130.01320.0126
95% CI(-0.0028, -0.0024)(0.0059, 0.0061)(-0.0007, -0.0005)
2015CDD0.01680.01570.00070.00000.01910.0281
95% CI(-0.0014, -0.0011)(-0.0007, -0.0006)(0.0087, 0.0091)
2016CDD0.00000.00010.01500.01210.00030.0000
95% CI(0.0000, 0.0001)(-0.0030, -0.0027)(-0.0002, -0.0001)
2017CDD0.02970.01880.06300.05070.01240.0230
95% CI(-0.0110, -0.0107)(-0.0127, -0.0123)(0.0106, 0.0108)
YearCD→GZCD←GZCD→SHCD←SHGZ→SHGZ←SH
2013CDD0.01170.01760.00500.00530.00250.0026
95% CI(0.0058, 0.0060)(0.0003, 0.0003)(0.0002, 0.0002)
2014CDD0.00310.00360.02150.01490.00440.0064
95% CI(0.0006, 0.0007)(-0.0068, -0.0066)(0.0018, 0.0020)
2015CDD0.02770.01820.00630.00800.01300.0091
95% CI(-0.0096, -0.0093)(0.0016, 0.0018)(-0.0039, -0.0037)
2016CDD0.00040.00330.00000.00040.00600.0137
95% CI(-0.0011, -0.0010)(0.0003, 0.0004)(0.0078, 0.0081)
2017CDD0.07760.07610.00520.00050.00310.0047
95% CI(-0.0016, -0.0013)(-0.0047, -0.0045)(0.0016, 0.0017)

Copula directional dependences (CDDs) between ordered pairs of Chinese cities (BJ: Beijing, CD: Chengdu, GZ: Guangzhou, SH: Shanghai) during the years 2013 to 2017, together with 95% bootstrap confidence intervals (CIs) for the difference of directional dependences.

The larger value between the two directional dependences is marked in bold font. The estimated CDD values are visually illustrated in Figs 5 to 9. The labels on the arrows between the four Chinese cities represent the directional dependences between the cities. Several findings from the CDD values are as follows:
Fig 5

A connectivity map of China based on the directional dependences between the four cities in the year 2013.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

Fig 9

A connectivity map of China based on the directional dependences between the four cities in the year 2017.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

The CDD between Beijing (BJ) and Chengdu (CD) was relatively high from 2013 to 2017 except in 2016. The CDD from CD to BJ was higher in 2013, but the CDD from BJ to CD became higher in the later years from 2014 to 2017. The CDD between Chengdu (CD) and Guangzhou (GZ) was relatively high in the years 2013, 2015, and 2017. Similarly, the CDD from GZ to CD was higher in 2013 and 2014, but the CDD from CD to GZ became higher in the later years from 2015 to 2017. The CDD between Chengdu (CD) and Shanghai (SH) was relatively high in 2014, in which the CDD from CD to SH was higher than the other direction. The CDD between CD and SH was relatively low in the other years. The CDD between Beijing (BJ) and Shanghai (SH) was relatively high during 2014, 2015, and 2017, in which the CDD from BJ to SH was higher in 2014, but the CDD from SH to BJ was higher in 2015 and 2017. The CDD between Beijing (BJ) and Guangzhou (GZ) was relatively high in 2014, 2016, and 2017, in which the CDD from GZ to BJ was higher in 2014, but the CDD from BJ to GZ became higher from 2015 to 2017.

A connectivity map of China based on the directional dependences between the four cities in the year 2013.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

A connectivity map of China based on the directional dependences between the four cities in the year 2014.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

A connectivity map of China based on the directional dependences between the four cities in the year 2015.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

A connectivity map of China based on the directional dependences between the four cities in the year 2016.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html).

A connectivity map of China based on the directional dependences between the four cities in the year 2017.

Base map figure source: CIA Maps (https://www.cia.gov/library/publications/resources/cia-maps-publications/China.html). The findings described above are overall consistent with the results obtained from the correlation coefficients in Table 2. Contrary to the correlation coefficients, however, the CDDs could infer directionality between each pair of cities and its structure change over time.

5 Discussion and conclusions

In this work, we presented a statistical measure of directional dependence which is called copula directional dependence (CDD). We obtained the PM2.5 concentration level data for the four major Chinese cities, Beijing, Chengdu, Guangzhou, and Shanghai, from January 1, 2013 to June 30, 2017. We preprocessed the data by using feedforward neural networks with lagged inputs, and we statistically analyzed directional dependence between the cities based on the proposed CDDs. An R source code for preprocessing and analyzing the PM2.5 data in this study is available at S1 Appendix. Particulate matter consisting of PM2.5 can be generated from various sources, and its concentration can be changed by numerous factors such as wind direction, wind speed, seasonal change, weather, and industrialization [8, 18]. On the other hand, the proposed CDD can infer dependence structures between the major Chinese cities conveniently without requiring complicated models involving the diverse economic or environmental factors in a meteorological system. The inferred CDD structure can characterize economic and environmental relationships between Chinese cities, which may indirectly reflect interrelationships between underlying hidden factor variables. In this study, we applied CDD using bivariate copulas. A multivariate generalization can be computationally challenging due to the large number of combinations of the variables. Kim and Jung [74] extended the bivariate copula-based directional dependence to a multivariate version by combining it with time-varying partial correlations. It is necessary to develop computationally efficient methods further in future works. Based on the directional dependence structures obtained by the CDDs, we could visually present the interactions between the four Chinese cities over different time periods. Moreover, we could provide levels of confidence on the directionality of the dependences between the cities by computing bootstrap confidence intervals. We remark that the CDD values need to be interpreted with caution. The CDD values range between 0 and 1, and a larger value implies a stronger correlation between the variables. However, since the CDD is a version of the coefficient of determination in multiple linear regression according to the definition in Eq (3), it should be expressed with higher precision, e.g., in three to four digits as in Table 4, than the Pearson and Spearman correlation coefficients. Note that the methodology presented in this paper consists of several statistical methods such as copulas, beta regression, and ANNs, each of which can be a source of uncertainty that affects the CDD values. Therefore, statistical decisions made by the bootstrap confidence interval are not always reliable, especially when the CDD values are relatively small. An alternative method for a reliable decision is to introduce a multiple comparision procedure to determine significant nonzero CDD values; see, e.g., [75, 76]. It is remarkable that we could see a structural change in the CDD networks, especially in the directions of the edges of the networks, between the period 2013-2014 and the period 2015-2017. Such a structural change indicates a certain dramatic shift in the states of the underlying latent factors affecting PM2.5 levels around 2014. Trends in air quality in Chinese major cities were analyzed in [17] and it was found that Beijing experienced decreased PM2.5 from 2013 to 2015. Overall, Beijing, Chengdu, and Guangzhou experienced improvements in air quality in the recent decade. Note that China released an ambient air quality standard, GB 3095-2012, in 2012 [14], and key cities including the four major cities analyzed in this study were required to implement the standard. In September 2013, the State Council of China issued the Air Pollution Prevention and Control Action Plan (APPCAP). Implementation of the APPCAP led to a control of ambient air pollution and substantial reductions in mortality and years-of-life-lost (YLLs) [77]. We can guess that continuous efforts in China have resulted in dramatic changes in air quality and the constituents of air pollutants. Further studies on the nature of the changes in air quality are necessary in order to make better public health policies. The proposed CDD can be effectively applied to any further analysis in general settings or for a more detailed analysis of air pollution.

Source code.

An R source code for preprocessing and analyzing the PM2.5 data in this study. (PDF) Click here for additional data file.
  15 in total

1.  Effect of the fine fraction of particulate matter versus the coarse mass and other pollutants on daily mortality in Santiago, Chile.

Authors:  L A Cifuentes; J Vega; K Köpfer; L B Lave
Journal:  J Air Waste Manag Assoc       Date:  2000-08       Impact factor: 2.235

2.  Joint regression analysis of correlated data using Gaussian copulas.

Authors:  Peter X-K Song; Mingyao Li; Ying Yuan
Journal:  Biometrics       Date:  2008-05-28       Impact factor: 2.571

3.  Characterizing multi-pollutant air pollution in China: Comparison of three air quality indices.

Authors:  Jianlin Hu; Qi Ying; Yungang Wang; Hongliang Zhang
Journal:  Environ Int       Date:  2015-07-18       Impact factor: 9.621

4.  Is Daily Mortality Associated Specifically with Fine Particles?

Authors:  Joel Schwartz; Douglas W Dockery; Lucas M Neas
Journal:  J Air Waste Manag Assoc       Date:  1996-10       Impact factor: 2.235

5.  An association between air pollution and mortality in six U.S. cities.

Authors:  D W Dockery; C A Pope; X Xu; J D Spengler; J H Ware; M E Fay; B G Ferris; F E Speizer
Journal:  N Engl J Med       Date:  1993-12-09       Impact factor: 91.245

6.  Rapid health transition in China, 1990-2010: findings from the Global Burden of Disease Study 2010.

Authors:  Gonghuan Yang; Yu Wang; Yixin Zeng; George F Gao; Xiaofeng Liang; Maigeng Zhou; Xia Wan; Shicheng Yu; Yuhong Jiang; Mohsen Naghavi; Theo Vos; Haidong Wang; Alan D Lopez; Christopher J L Murray
Journal:  Lancet       Date:  2013-06-08       Impact factor: 79.321

7.  Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution.

Authors:  C Arden Pope; Richard T Burnett; Michael J Thun; Eugenia E Calle; Daniel Krewski; Kazuhiko Ito; George D Thurston
Journal:  JAMA       Date:  2002-03-06       Impact factor: 56.272

8.  The effect of fine and coarse particulate air pollution on mortality: a national analysis.

Authors:  Antonella Zanobetti; Joel Schwartz
Journal:  Environ Health Perspect       Date:  2009-02-13       Impact factor: 9.031

9.  Air Pollution in China: Mapping of Concentrations and Sources.

Authors:  Robert A Rohde; Richard A Muller
Journal:  PLoS One       Date:  2015-08-20       Impact factor: 3.240

10.  Air Pollution in Major Chinese Cities: Some Progress, But Much More to Do.

Authors:  Dorrit H Lowsen; George A Conway
Journal:  J Environ Prot (Irvine, Calif)       Date:  2016-12-29
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.