Yue-Hua Dai (1), Zhi-Qiang Jiang (1, 2), Wei-Xing Zhou (3, 4, 5)
Abstract
With most city dwellers in China exposed to air pollution, forecasting extreme air pollution spells is of great importance both for scheduling outdoor activities and for mitigating air pollution. In this paper, we integrate the autoregressive conditional duration (ACD) model with recurrence interval analysis (RIA), and we extend the ACD model to a spatially autoregressive conditional duration (SACD) model by adding a spatially reviewed term, in order to quantitatively explain and predict the recurrence intervals of extreme air pollution. Using hourly data on six pollutants and the air quality index (AQI) during 2013-2016 from 12 national air quality monitoring stations in Beijing as our test samples, we show that the spatially reviewed recurrence intervals have some explanatory power for the recurrence intervals at neighbouring monitoring stations. We also conduct one-step forecasts using the RIA-ACD(1,1) and RIA-SACD(1,1,1) models and find that 90% of the predicted recurrence intervals are smaller than 72 hours, which supports the predictive power of the proposed models. When extended to more time lags and more neighbouring stations, the models yield results that are consistent with reality, which demonstrates the feasibility of predicting extreme air pollution events with a recurrence-interval-analysis-based autoregressive conditional duration model. Moreover, adding a spatial term proves effective in enhancing predictive power.
Year: 2018 PMID: 30389982 PMCID: PMC6214986 DOI: 10.1038/s41598-018-34584-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An illustrative example of the recurrence interval d (left panel) and the relationship between the threshold q and the mean recurrence interval ⟨d⟩ (right panel). The series x shown is the diurnally adjusted PM2.5 time series at station 1.
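A minimal sketch of how the recurrence intervals d and their mean ⟨d⟩ can be extracted from a series x. It assumes (the caption does not say so) that x is standardised, so that q is measured in standard deviations, and that an event is any hour with x > q. The function names are hypothetical; `scaled_intervals` returns d/⟨d⟩, the quantity whose distribution is shown in Figure 3.

```python
def recurrence_intervals(x, q):
    """Gaps between successive times at which x exceeds the threshold q.

    Assumption: x is standardised (e.g. diurnally adjusted), so q is in units
    of standard deviations; an "event" is any step with x[t] > q.
    """
    event_times = [t for t, value in enumerate(x) if value > q]
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def mean_interval(d):
    """Mean recurrence interval <d>; NaN if fewer than two events occurred."""
    return sum(d) / len(d) if d else float("nan")


def scaled_intervals(d):
    """Recurrence intervals rescaled by their mean, d / <d>."""
    mean = mean_interval(d)
    return [value / mean for value in d]


if __name__ == "__main__":
    x = [0.1, 3.2, 0.5, 3.5, 3.1, -1.0, 0.2, 4.0]
    d = recurrence_intervals(x, q=3.0)  # events at t = 1, 3, 4, 7
    print(d, mean_interval(d), scaled_intervals(d))  # [2, 1, 3] 2.0 [1.0, 0.5, 1.5]
```

Raising q leaves fewer events, so the intervals, and hence ⟨d⟩, grow with the threshold.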
Figure 2. Average recurrence intervals (left panel) and AR(L) models (right panel) under five thresholds q. In the right panel, for each threshold q, we first obtain the recurrence interval series d and then fit an autoregressive model to d; L denotes the largest lag that is significant at the 95% confidence level. For each station, the values of L for the six pollutants and the AQI are stacked in bars of distinct colors.
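The caption selects L as the largest significant lag of an AR model fitted to d. The authors' AR regression is not reproduced here; as a stand-in, the sketch below uses the partial autocorrelation function (PACF), computed with the Durbin-Levinson recursion, and the usual ±1.96/√n band. The k-th partial autocorrelation equals the last coefficient of a Yule-Walker AR(k) fit, so the two criteria are closely related but not identical. All function names are hypothetical.

```python
import math
import random


def autocorrelations(d, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series d."""
    n = len(d)
    mean = sum(d) / n
    dev = [v - mean for v in d]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def partial_autocorrelations(d, max_lag):
    """PACF via Durbin-Levinson; item k-1 is the last coefficient of AR(k)."""
    r = autocorrelations(d, max_lag)
    phi, pacf = [], []  # phi holds the AR(k-1) coefficients
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def largest_significant_lag(d, max_lag, z=1.96):
    """Largest lag whose |PACF| exceeds z / sqrt(n); 0 if none is significant."""
    bound = z / math.sqrt(len(d))
    pacf = partial_autocorrelations(d, max_lag)
    return max((k for k, p in enumerate(pacf, start=1) if abs(p) > bound), default=0)


def simulate_ar1(phi, n, seed=0):
    """Hypothetical test data: a Gaussian AR(1) series of length n."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

On a simulated AR(1) series with coefficient 0.8, the first partial autocorrelation comes out close to 0.8 and lag 1 is flagged as significant.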
Figure 3. Probability distribution of the scaled recurrence interval d/⟨d⟩ for the PM2.5 series at station 1.
Figure 4. The Hurst exponent as a function of the threshold q for the six pollutants at station 1.
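The caption does not state which estimator produced the Hurst exponents. As one common option, not necessarily the authors', the sketch below uses classical rescaled-range (R/S) analysis: split the series into non-overlapping windows of size n, average R/S over the windows, and take the slope of log(R/S) against log n. H above 0.5 indicates persistence and H below 0.5 anti-persistence. The window sizes and function names are assumptions.

```python
import math
import random


def rescaled_range(segment):
    """R/S of one segment: range of cumulative deviations divided by the std."""
    n = len(segment)
    mean = sum(segment) / n
    running, cumulative = 0.0, []
    for value in segment:
        running += value - mean
        cumulative.append(running)
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    return (max(cumulative) - min(cumulative)) / std if std > 0 else float("nan")


def hurst_rs(x, window_sizes=(16, 32, 64, 128, 256, 512)):
    """Least-squares slope of log(mean R/S) against log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        chunks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        values = [v for v in map(rescaled_range, chunks) if not math.isnan(v)]
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    mx, my = sum(log_n) / len(log_n), sum(log_rs) / len(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
    den = sum((a - mx) ** 2 for a in log_n)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
    # Uncorrelated noise typically gives a value a little above 0.5
    # (classical R/S is biased upward for short windows).
    print(round(hurst_rs(noise), 2))
```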
Figure 5. An illustrative example of a recurrence interval series and its spatially reviewed recurrence intervals, with q = 3.0 (left panel), and the simulated relationship between the correlation of the original time series and the correlation of the recurrence intervals under different thresholds (right panel). In the right panel, we first simulate pairs of standard normal time series, then apply the spatially reviewed recurrence interval scheme to obtain the two recurrence interval series, and finally calculate their correlation.
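The caption does not define the spatially reviewed recurrence interval, so the sketch below encodes one hypothetical reading: for every event at station i after its first, pair station i's own recurrence interval with the time elapsed since the most recent earlier event at a neighbouring station j. Treat this definition, the strict "x > q" event rule, and all function names as assumptions. `simulate_pair` mimics the right-panel setting of correlated standard normal pairs.

```python
import math
import random


def event_times(x, q):
    """Indices at which the series x exceeds the threshold q."""
    return [t for t, value in enumerate(x) if value > q]


def spatially_reviewed_intervals(x_i, x_j, q):
    """HYPOTHETICAL scheme: pair each recurrence interval of station i with the
    time since the latest event at station j strictly before the same event."""
    t_i, t_j = event_times(x_i, q), event_times(x_j, q)
    own, reviewed, k = [], [], 0
    for previous, current in zip(t_i, t_i[1:]):
        while k < len(t_j) and t_j[k] < current:
            k += 1  # move to the first j-event at or after `current`
        if k > 0:  # at least one j-event precedes `current`
            own.append(current - previous)
            reviewed.append(current - t_j[k - 1])
    return own, reviewed


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def simulate_pair(rho, n, seed=0):
    """Two standard normal series with correlation rho."""
    rng = random.Random(seed)
    x_i, x_j = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_i.append(z1)
        x_j.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x_i, x_j


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        own, reviewed = spatially_reviewed_intervals(*simulate_pair(rho, 50_000), q=2.0)
        print(rho, round(pearson(own, reviewed), 3))
```

Under this reading, two identical series give reviewed intervals equal to the station's own intervals, which is a convenient sanity check.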
Minimum correlation matrix (Panel A) and distance matrix in km (Panel B) between stations.
Panel A. Minimum correlation matrix
| Station | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 0.60 | 0.91 | 0.89 | 0.89 | 0.92 | 0.85 | 0.73 | 0.64 | 0.68 | 0.87 | 0.83 |
| 2 | 0.60 | 1.00 | 0.59 | 0.59 | 0.59 | 0.61 | 0.61 | 0.63 | 0.74 | 0.80 | 0.58 | 0.63 |
| 3 | 0.91 | 0.59 | 1.00 | 0.85 | 0.93 | 0.92 | 0.82 | 0.71 | 0.60 | 0.67 | 0.90 | 0.79 |
| 4 | 0.89 | 0.59 | 0.85 | 1.00 | 0.86 | 0.84 | 0.79 | 0.68 | 0.59 | 0.68 | 0.81 | 0.78 |
| 5 | 0.89 | 0.59 | 0.93 | 0.86 | 1.00 | 0.90 | 0.82 | 0.74 | 0.61 | 0.68 | 0.88 | 0.79 |
| 6 | 0.92 | 0.61 | 0.92 | 0.84 | 0.90 | 1.00 | 0.86 | 0.72 | 0.63 | 0.70 | 0.88 | 0.81 |
| 7 | 0.85 | 0.61 | 0.82 | 0.79 | 0.82 | 0.86 | 1.00 | 0.73 | 0.66 | 0.67 | 0.81 | 0.83 |
| 8 | 0.73 | 0.63 | 0.71 | 0.68 | 0.74 | 0.72 | 0.73 | 1.00 | 0.72 | 0.69 | 0.72 | 0.72 |
| 9 | 0.64 | 0.74 | 0.60 | 0.59 | 0.61 | 0.63 | 0.66 | 0.72 | 1.00 | 0.73 | 0.58 | 0.66 |
| 10 | 0.68 | 0.80 | 0.67 | 0.68 | 0.68 | 0.70 | 0.67 | 0.69 | 0.73 | 1.00 | 0.65 | 0.68 |
| 11 | 0.87 | 0.58 | 0.90 | 0.81 | 0.88 | 0.88 | 0.81 | 0.72 | 0.58 | 0.65 | 1.00 | 0.77 |
| 12 | 0.83 | 0.63 | 0.79 | 0.78 | 0.79 | 0.81 | 0.83 | 0.72 | 0.66 | 0.68 | 0.77 | 1.00 |
Panel B. Distance matrix (km)
| Station | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00 | 49.53 | 11.08 | 5.86 | 14.76 | 8.38 | 14.69 | 43.08 | 63.15 | 38.27 | 15.51 | 13.79 |
| 2 | 49.53 | 0.00 | 43.45 | 51.05 | 43.49 | 41.58 | 34.86 | 49.35 | 41.92 | 11.36 | 37.42 | 40.17 |
| 3 | 11.08 | 43.45 | 0.00 | 8.64 | 3.96 | 6.32 | 11.13 | 32.37 | 52.26 | 32.13 | 6.11 | 18.03 |
| 4 | 5.86 | 51.05 | 8.64 | 0.00 | 11.30 | 9.80 | 16.67 | 38.62 | 60.46 | 39.68 | 14.49 | 18.80 |
| 5 | 14.76 | 43.49 | 3.96 | 11.30 | 0.00 | 10.08 | 13.68 | 28.45 | 49.16 | 32.35 | 6.63 | 21.70 |
| 6 | 8.38 | 41.58 | 6.32 | 9.80 | 10.08 | 0.00 | 6.89 | 37.90 | 55.66 | 30.24 | 7.80 | 11.71 |
| 7 | 14.69 | 34.86 | 11.13 | 16.67 | 13.68 | 6.89 | 0.00 | 38.32 | 52.58 | 23.59 | 7.91 | 10.58 |
| 8 | 43.08 | 49.35 | 32.37 | 38.62 | 28.45 | 37.90 | 38.32 | 0.00 | 28.54 | 42.04 | 30.90 | 48.52 |
| 9 | 63.15 | 41.92 | 52.26 | 60.46 | 49.16 | 55.66 | 52.58 | 28.54 | 0.00 | 41.49 | 47.89 | 62.88 |
| 10 | 38.27 | 11.36 | 32.13 | 39.68 | 32.35 | 30.24 | 23.59 | 42.04 | 41.49 | 0.00 | 26.15 | 29.74 |
| 11 | 15.51 | 37.42 | 6.11 | 14.49 | 6.63 | 7.80 | 7.91 | 30.90 | 47.89 | 26.15 | 0.00 | 17.63 |
| 12 | 13.79 | 40.17 | 18.03 | 18.80 | 21.70 | 11.71 | 10.58 | 48.52 | 62.88 | 29.74 | 17.63 | 0.00 |
Each entry of the minimum correlation matrix is the smallest of the seven correlations (six pollutants and the AQI) between the two stations.
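A minimal sketch of how such a minimum correlation matrix could be assembled. It assumes each station's data are stored as a dict from variable name to a time-aligned list of hourly values with no gaps (real monitoring data would need missing values handled first). The caption does not say whether raw or diurnally adjusted series were correlated; the function names and example keys are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long series without missing values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def minimum_correlation(station_a, station_b):
    """Smallest correlation over the variables the two stations share.

    Each station is a dict: variable name -> time-aligned list of values
    (in the paper, six pollutants plus the AQI).
    """
    shared = sorted(station_a.keys() & station_b.keys())
    return min(pearson(station_a[name], station_b[name]) for name in shared)


def minimum_correlation_matrix(stations):
    """Symmetric matrix of minimum correlations for a list of station dicts."""
    size = len(stations)
    return [[1.0 if i == j else minimum_correlation(stations[i], stations[j])
             for j in range(size)] for i in range(size)]


if __name__ == "__main__":
    s1 = {"PM2.5": [1, 2, 3, 4], "AQI": [2, 1, 4, 3]}
    s2 = {"PM2.5": [2, 4, 6, 8], "AQI": [1, 2, 3, 4]}
    print(minimum_correlation_matrix([s1, s2]))
```

In this toy example the PM2.5 series are perfectly correlated (1.0) while the AQI series correlate at 0.6, so the minimum correlation between the two stations is 0.6.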
Partial estimation results of the RIA-ACD(1,1) model.
| | q = 2.0, Exponential | q = 2.0, Weibull | q = 2.5, Exponential | q = 2.5, Weibull | q = 3.0, Exponential | q = 3.0, Weibull | q = 3.5, Exponential | q = 3.5, Weibull | q = 4.0, Exponential | q = 4.0, Weibull |
|---|---|---|---|---|---|---|---|---|---|---|
| *Panel 1* | | | | | | | | | | |
| | 0.10** | 0.11*** | 0.08 | 0.19*** | 0.31*** | 0.22*** | 0.10 | 0.51*** | 0.06* | 0.04*** |
| | 0.08** | 0.06*** | 0.10** | 0.14*** | 0.26*** | 0.22*** | 0.12 | 0.00 | 0.17* | 0.08 |
| | 0.82*** | 0.75*** | 0.83*** | 0.49*** | 0.46*** | 0.35*** | 0.81*** | 0.00 | 0.83*** | 0.83*** |
| Weibull shape | | 0.57*** | | 0.51*** | | 0.46*** | | 0.40*** | | 0.38*** |
| Ljung-Box Q(10) | 7.22 | 10.09 | 7.49 | 9.79 | 5.98 | 6.04 | 2.70 | 3.97 | 1.65 | 1.77 |
| (p-value) | (0.70) | (0.43) | (0.68) | (0.46) | (0.82) | (0.81) | (0.99) | (0.95) | (1.00) | (1.00) |
| *Panel 2* | | | | | | | | | | |
| | 0.05*** | 0.04*** | 0.15** | 0.16** | 0.81*** | 0.33*** | 0.06 | 0.03*** | 0.60*** | 0.41** |
| | 0.05*** | 0.03*** | 0.27** | 0.17* | 0.40 | 0.06 | 0.10 | 0.03 | 0.00 | 0.01 |
| | 0.90*** | 0.89*** | 0.66*** | 0.43* | 0.00 | 0.00 | 0.88*** | 0.87*** | 0.40*** | 0.00 |
| Weibull shape | | 0.56*** | | 0.50*** | | 0.42*** | | 0.37*** | | 0.32*** |
| Ljung-Box Q(10) | 11.63 | 13.82 | 5.38 | 10.43 | 11.69 | 13.23 | 4.34 | 3.50 | 4.61 | 4.59 |
| (p-value) | (0.31) | (0.18) | (0.86) | (0.40) | (0.31) | (0.21) | (0.93) | (0.97) | (0.92) | (0.92) |
| *Panel 3* | | | | | | | | | | |
| | 0.09** | 0.05*** | 0.18*** | 0.16** | 0.23 | 0.12*** | 0.54*** | 0.18*** | 0.05*** | 0.64 |
| | 0.19*** | 0.08*** | 0.50*** | 0.44*** | 0.62 | 0.32 | 0.29 | 0.22 | 0.00*** | 0.00 |
| | 0.75*** | 0.79*** | 0.50*** | 0.28** | 0.38 | 0.43*** | 0.10*** | 0.15** | 1.00*** | 0.00 |
| Weibull shape | | 0.52*** | | 0.43*** | | 0.36*** | | 0.31*** | | 0.26*** |
| Ljung-Box Q(10) | 0.98 | 1.00 | 3.68 | 4.23 | 3.01 | 3.02 | 1.14 | 0.73 | 3.00 | 3.22 |
| (p-value) | (1.00) | (1.00) | (0.96) | (0.94) | (0.98) | (0.98) | (1.00) | (1.00) | (0.98) | (0.98) |
We estimate the above model under five thresholds, from q = 2.0 to q = 4.0 in increments of 0.5, and report the results under both the exponential and the Weibull distributions. Standard errors are Newey-West corrected, and *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively. The Ljung-Box Q statistic for residual autocorrelation (10 lags) is reported at the bottom of each panel, with its p-value in parentheses.
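A sketch of the Ljung-Box test used in the note, Q = n(n+2) Σ_{k=1..h} r_k²/(n−k), applied to whatever residual series is being checked (for an ACD-type model, typically the standardised intervals d_i/ψ_i; that choice is an assumption here). The reported p-values are consistent with a χ² distribution with 10 degrees of freedom: for example, Q = 7.22 gives p ≈ 0.70 and Q = 16.64 gives p ≈ 0.08. Because 10 is even, the χ² tail has a closed form, so no statistics library is needed. Function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags=10):
    """Ljung-Box Q statistic and its chi-square(lags) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf_even_df(q, lags)
```

Some software reduces the degrees of freedom by the number of fitted parameters; the figures in the table match the unreduced choice of 10.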
Partial estimation results of the RIA-SACD(1,1,1) model under the exponential and Weibull distributions.
| | q = 2.0, Exponential | q = 2.0, Weibull | q = 2.5, Exponential | q = 2.5, Weibull | q = 3.0, Exponential | q = 3.0, Weibull | q = 3.5, Exponential | q = 3.5, Weibull | q = 4.0, Exponential | q = 4.0, Weibull |
|---|---|---|---|---|---|---|---|---|---|---|
| *Panel 1* | | | | | | | | | | |
| | 0.04 | 0.12*** | 0.53*** | 0.34*** | 0.06** | 0.22*** | 0.06*** | 0.03** | 0.59*** | 0.27*** |
| | 0.04*** | 0.06*** | 0.26*** | 0.21*** | 0.15* | 0.23*** | 0.04 | 0.01 | 0.02 | 0.02 |
| | 0.89*** | 0.69*** | 0.12 | 0.11 | 0.82*** | 0.34*** | 0.80*** | 0.85*** | 0.00 | 0.00 |
| | 0.02 | 0.04*** | 0.13*** | 0.08*** | 0.02 | 0.00 | 0.14*** | 0.06*** | 0.42** | 0.53 |
| Weibull shape | | 0.58*** | | 0.51*** | | 0.46*** | | 0.41*** | | 0.39*** |
| Ljung-Box Q(10) | 5.01 | 5.83 | 11.90 | 11.31 | 2.46 | 1.20 | 16.64 | 18.82 | 0.43 | 0.71 |
| (p-value) | (0.89) | (0.83) | (0.29) | (0.33) | (0.99) | (1.00) | (0.08) | (0.04) | (1.00) | (1.00) |
| *Panel 2* | | | | | | | | | | |
| | 0.87*** | 0.04*** | 0.51*** | 0.27*** | 0.70*** | 0.30*** | 0.08 | 0.36*** | 0.02** | 0.41** |
| | 0.08*** | 0.03*** | 0.62*** | 0.27** | 0.55 | 0.09 | 0.05 | 0.03 | 0.00 | 0.01 |
| | 0.00 | 0.88*** | 0.05 | 0.04 | 0.00 | 0.00 | 0.88*** | 0.00 | 0.94*** | 0.00 |
| | 0.09 | 0.00 | 0.17*** | 0.10*** | 0.10** | 0.04 | 0.01 | 0.01 | 0.06 | 0.01 |
| Weibull shape | | 0.56*** | | 0.50*** | | 0.42*** | | 0.36*** | | 0.32*** |
| Ljung-Box Q(10) | 55.92 | 11.91 | 15.18 | 15.26 | 8.52 | 8.26 | 2.57 | 4.13 | 1.62 | 3.55 |
| (p-value) | (0.00) | (0.29) | (0.13) | (0.12) | (0.58) | (0.60) | (0.99) | (0.94) | (1.00) | (0.97) |
| *Panel 3* | | | | | | | | | | |
| | 0.12* | 0.07** | 0.51*** | 0.15*** | 0.19 | 0.11*** | 0.41 | 0.09 | 0.02 | 0.29*** |
| | 0.17*** | 0.07*** | 0.12*** | 0.40*** | 0.65** | 0.38 | 0.28 | 0.19 | 0.00 | 0.00 |
| | 0.70*** | 0.75*** | 0.00 | 0.32*** | 0.35* | 0.33*** | 0.29*** | 0.41* | 1.00 | 0.00 |
| | 0.02 | 0.01 | 0.59*** | 0.00 | 0.15 | 0.12 | 0.05 | 0.03 | 0.02 | 0.29*** |
| Weibull shape | | 0.52*** | | 0.43*** | | 0.37*** | | 0.31*** | | 0.27*** |
| Ljung-Box Q(10) | 0.97 | 0.95 | 2.90 | 4.08 | 2.80 | 2.77 | 2.25 | 0.53 | 2.99 | 3.34 |
| (p-value) | (1.00) | (1.00) | (0.98) | (0.94) | (0.99) | (0.99) | (0.99) | (1.00) | (0.98) | (0.97) |
We estimate the above model under five thresholds, from q = 2.0 to q = 4.0 in increments of 0.5, and report the results under both the exponential and the Weibull distributions. Standard errors are Newey-West corrected, and *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively. The Ljung-Box Q statistic for residual autocorrelation (10 lags) is reported at the bottom of each panel, with its p-value in parentheses.
Figure 6. Coefficients α and β of each time series, estimated from the RIA-ACD(1,1) model under the Weibull distribution. The magnitude of each parameter is indicated by the fill color of its circle; an unfilled circle means that the estimate is not significant at the 95% level. The top row shows α and the bottom row shows β; the columns correspond to q = 2.0 through q = 4.0 from left to right. Standard errors are adjusted using the Newey-West method.
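For reference, the standard ACD(1,1) of Engle and Russell writes an interval as d_i = ψ_i ε_i, with ψ_i = ω + α d_{i−1} + β ψ_{i−1} and i.i.d. unit-mean innovations ε_i. The sketch below evaluates the Weibull and exponential log-likelihoods of that standard model on a recurrence interval series. The initialisation ψ_1 = mean(d), the simulator, and the function names are assumptions, and the paper's exact RIA-ACD specification may differ. In practice the parameters would be found by minimising the negative log-likelihood with a numerical optimiser such as `scipy.optimize.minimize`.

```python
import math
import random


def acd_psi(d, omega, alpha, beta):
    """Conditional expected intervals psi_i = omega + alpha*d_{i-1} + beta*psi_{i-1}."""
    psi = [sum(d) / len(d)]  # assumed start value: the sample mean
    for i in range(1, len(d)):
        psi.append(omega + alpha * d[i - 1] + beta * psi[i - 1])
    return psi


def weibull_acd_loglik(d, omega, alpha, beta, gamma):
    """Log-likelihood with unit-mean Weibull innovations of shape gamma."""
    c = math.gamma(1.0 + 1.0 / gamma)  # rescales the Weibull to mean 1
    ll = 0.0
    for di, pi in zip(d, acd_psi(d, omega, alpha, beta)):
        z = c * di / pi
        ll += math.log(gamma) - math.log(di) + gamma * math.log(z) - z ** gamma
    return ll


def exponential_acd_loglik(d, omega, alpha, beta):
    """Exponential innovations: the gamma = 1 special case of the Weibull."""
    return sum(-math.log(pi) - di / pi for di, pi in zip(d, acd_psi(d, omega, alpha, beta)))


def simulate_weibull_acd(n, omega, alpha, beta, gamma, seed=0):
    """Hypothetical test data drawn from a Weibull ACD(1,1) (requires alpha + beta < 1)."""
    rng = random.Random(seed)
    scale = 1.0 / math.gamma(1.0 + 1.0 / gamma)  # unit-mean Weibull scale
    d, psi = [], omega / (1.0 - alpha - beta)
    for _ in range(n):
        d.append(psi * rng.weibullvariate(scale, gamma))
        psi = omega + alpha * d[-1] + beta * psi
    return d
```

A shape parameter below 1, as in all Weibull columns of the tables, means the innovations are over-dispersed relative to the exponential case.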
Figure 7. Estimated coefficients of each time series from the RIA-SACD(1,1,1) model, one coefficient per row (top, middle, and bottom). The magnitude of each parameter is indicated by the fill color of its circle; an unfilled circle means that the estimate is not significant at the 95% level. The columns correspond to q = 2.0 through q = 4.0 from left to right. Standard errors are adjusted using the Newey-West method.
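The exact SACD(1,1,1) equation is not reproduced in this section. A minimal sketch, assuming the spatial term enters the ACD(1,1) recursion linearly with a hypothetical coefficient `phi` on the lagged spatially reviewed interval, ψ_i = ω + α d_{i−1} + β ψ_{i−1} + φ d̃_{i−1}, looks like this; the one-step forecast of the next recurrence interval is then ψ_{n+1}. The lag structure, the start value ψ_1 = mean(d), and the alignment of `d_spatial` with `d` are all assumptions.

```python
def sacd_psi(d, d_spatial, omega, alpha, beta, phi):
    """HYPOTHETICAL SACD(1,1,1) recursion:
    psi_i = omega + alpha*d_{i-1} + beta*psi_{i-1} + phi*d_spatial_{i-1}.
    d_spatial[i] is the spatially reviewed interval paired with event i."""
    psi = [sum(d) / len(d)]  # assumed start value: the sample mean of d
    for i in range(1, len(d)):
        psi.append(omega + alpha * d[i - 1] + beta * psi[i - 1] + phi * d_spatial[i - 1])
    return psi


def sacd_one_step_forecast(d, d_spatial, omega, alpha, beta, phi):
    """One-step-ahead forecast of the next recurrence interval, psi_{n+1}."""
    psi = sacd_psi(d, d_spatial, omega, alpha, beta, phi)
    return omega + alpha * d[-1] + beta * psi[-1] + phi * d_spatial[-1]


if __name__ == "__main__":
    # psi_2 = 0.5 + 0.1*2 + 0.8*3 + 0.2*1 = 3.3; forecast = 0.5 + 0.4 + 2.64 + 0.6
    print(round(sacd_one_step_forecast([2, 4], [1, 3], 0.5, 0.1, 0.8, 0.2), 2))  # 4.14
```

Setting `phi = 0` recovers the plain ACD(1,1) recursion, and the exponential or Weibull likelihood carries over unchanged with these ψ values.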
Figure 8. Distributions of the absolute error (AE) and the root-mean-square error (RMSE) of the RIA-ACD(1,1) model (top row) and the RIA-SACD(1,1,1) model (bottom row).
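Cumulative probabilities at selected AE and RMSE breakpoints, and average AEs and RMSEs.
| Metric | Model | Breakpoint 1 | Breakpoint 2 | Breakpoint 3 | Breakpoint 4 | Breakpoint 5 | Breakpoint 6 | Average |
|---|---|---|---|---|---|---|---|---|
| AE | ACD(1,1) | 0.84 | 0.88 | 0.89 | 0.90 | 0.91 | 0.93 | 137 |
| AE | SACD(1,1,1) | 0.83 | 0.88 | 0.89 | 0.90 | 0.91 | 0.93 | 123 |
| RMSE | ACD(1,1) | 0.04 | 0.13 | 0.18 | 0.25 | 0.31 | 0.44 | 717 |
| RMSE | SACD(1,1,1) | 0.04 | 0.13 | 0.17 | 0.23 | 0.30 | 0.43 | 642 |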
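The table does not list the breakpoint values themselves, so the sketch below uses hypothetical ones. It shows how the absolute errors (AE) of one-step forecasts, the RMSE of a forecast series, and the share of AEs at or below each breakpoint (for example 72 hours, the figure quoted in the abstract) can be computed. The function names and sample numbers are illustrative only.

```python
import math


def absolute_errors(actual, predicted):
    """AE of each forecast: |actual - predicted|."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root-mean-square error of one forecast series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def share_at_or_below(values, breakpoints):
    """Empirical cumulative probability P(value <= b) for each breakpoint b."""
    n = len(values)
    return [sum(v <= b for v in values) / n for b in breakpoints]


if __name__ == "__main__":
    actual, predicted = [10, 20, 30, 40], [12, 18, 35, 40]
    ae = absolute_errors(actual, predicted)
    print(ae, round(rmse(actual, predicted), 3), share_at_or_below(ae, [1, 2, 5]))
    # [2, 2, 5, 0] 2.872 [0.25, 0.75, 1.0]
```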
Figure 9. Average AE and RMSE of the ACD(1,1) and SACD(1,1,1) models along three dimensions.
Figure 10. Absolute errors of the multi-step out-of-sample test for the PM2.5 series at station 1 under the threshold q = 2.0. The results are obtained from the RIA-SACD model.
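The caption does not spell out how the multi-step forecasts were produced. For the plain ACD(1,1) with unit-mean innovations, the standard h-step recursion replaces each unobserved future interval by its conditional expectation, so ψ_{n+h} = ω + (α + β) ψ_{n+h−1} for h ≥ 2. The sketch below implements only that ACD case; a multi-step RIA-SACD forecast, as used in Figure 10, would additionally need forecasts of the spatial term, which are not modelled here. Function names are hypothetical.

```python
def acd_multi_step_forecast(d_last, psi_last, omega, alpha, beta, horizon):
    """Forecasts E[d_{n+h} | data up to n] for h = 1..horizon (plain ACD(1,1)).

    h = 1 uses the observed d_n; for h >= 2 the unknown d_{n+h-1} is replaced by
    its own forecast, giving psi_{n+h} = omega + (alpha + beta) * psi_{n+h-1}.
    """
    forecasts = [omega + alpha * d_last + beta * psi_last]
    for _ in range(1, horizon):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts


def absolute_errors_by_horizon(realised, forecasts):
    """AE per horizon, where realised[h-1] is the observed d_{n+h}."""
    return [abs(a - f) for a, f in zip(realised, forecasts)]


if __name__ == "__main__":
    f = acd_multi_step_forecast(2.0, 2.0, 0.1, 0.1, 0.8, horizon=3)
    print([round(v, 3) for v in f])  # [1.9, 1.81, 1.729]
```

As the horizon grows, the forecasts decay geometrically towards the unconditional mean ω/(1 − α − β), which is why multi-step errors tend to increase with the horizon.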
Figure 11. Percentages of significant coefficients across lags and across distances, based on Eq. (6) and Eq. (7). We first estimate each time series using the extended models and then calculate, for each dimension, the percentage of coefficients that are significant at the 95% level.
List of national air quality monitoring stations in Beijing.
| Label | Code | Long. (°E) | Lat. (°N) | Label | Code | Long. (°E) | Lat. (°N) |
|---|---|---|---|---|---|---|---|
| 1 | 10001A | 116.37 | 39.87 | 7 | 10007A | 116.32 | 39.99 |
| 2 | 10002A | 116.17 | 40.29 | 8 | 10008A | 116.72 | 40.14 |
| 3 | 10003A | 116.43 | 39.95 | 9 | 10009A | 116.64 | 40.39 |
| 4 | 10004A | 116.43 | 39.87 | 10 | 10010A | 116.23 | 40.20 |
| 5 | 10005A | 116.47 | 39.97 | 11 | 10011A | 116.41 | 40.00 |
| 6 | 10006A | 116.36 | 39.94 | 12 | 10012A | 116.22 | 39.93 |
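The paper does not say how the station distances were computed. As an assumption, the sketch below uses the great-circle (haversine) distance on a sphere of radius 6371 km. In the coordinate list, the first value for each station (about 116) is the longitude in °E and the second (about 40) is the latitude in °N. Because the coordinates are rounded to 0.01°, the result matches the distance matrix only approximately: about 5.1 km versus 5.86 km for stations 1 and 4, and about 62 km versus 63.15 km for stations 1 and 9.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

# (latitude °N, longitude °E) per station label, from the station list
STATIONS = {
    1: (39.87, 116.37), 2: (40.29, 116.17), 3: (39.95, 116.43), 4: (39.87, 116.43),
    5: (39.97, 116.47), 6: (39.94, 116.36), 7: (39.99, 116.32), 8: (40.14, 116.72),
    9: (40.39, 116.64), 10: (40.20, 116.23), 11: (40.00, 116.41), 12: (39.93, 116.22),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix(stations):
    """Pairwise distances, rows and columns ordered by station label."""
    labels = sorted(stations)
    return [[haversine_km(stations[i], stations[j]) for j in labels] for i in labels]


if __name__ == "__main__":
    m = distance_matrix(STATIONS)
    print(round(m[0][3], 2), round(m[0][8], 2))  # stations 1-4 and 1-9
```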
Figure 12. Locations of the twelve national air quality monitoring stations in Beijing.
Figure 13. Autocorrelation functions of two PM2.5 time series at station 1: (a) the original series c and (b) the diurnally adjusted series x.
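The adjustment formula behind x is not given in this section. A common choice, assumed here and not confirmed by the source, is to subtract the mean of each hour of the day and divide by that hour's standard deviation, which removes the daily cycle that dominates the autocorrelation of c. The sketch assumes an hourly series without gaps; the function names and the synthetic data are hypothetical.

```python
import math
import random


def diurnally_adjust(c, period=24):
    """x_t = (c_t - mean of its hour-of-day) / (std of its hour-of-day).

    Assumes an hourly series with no gaps, so index t maps to hour t % period.
    """
    by_hour = [c[h::period] for h in range(period)]
    mean = [sum(v) / len(v) for v in by_hour]
    std = [math.sqrt(sum((u - m) ** 2 for u in v) / len(v)) for v, m in zip(by_hour, mean)]
    return [(value - mean[t % period]) / std[t % period] for t, value in enumerate(c)]


def acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "concentration": a daily cycle plus noise (hypothetical data).
    c = [10 * math.cos(2 * math.pi * t / 24) + rng.gauss(0.0, 1.0) for t in range(24 * 200)]
    x = diurnally_adjust(c)
    print(round(acf(c, 24)[24], 2), round(acf(x, 24)[24], 2))
```

On such data the raw series keeps a strong correlation at lag 24, while the adjusted series retains almost none, which is the kind of contrast panels (a) and (b) illustrate.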