A Long-Term Prediction Model of Beijing Haze Episodes Using Time Series Analysis.

Xiaoping Yang, Zhongxia Zhang, Zhongqiu Zhang, Liren Sun, Cui Xu, Li Yu.

Abstract

Rapid industrial development has led to intermittent outbreaks of pm2.5, or haze, in developing countries, causing serious environmental problems, especially in big cities such as Beijing and New Delhi. We investigate the factors and mechanisms of haze change and present a long-term prediction model of Beijing haze episodes using time series analysis. We construct a dynamic structural measurement model of daily haze increment and reduce the model to a vector autoregressive model. Case studies on 886 continuous days indicate that our model performs very well on next day's Air Quality Index (AQI) prediction; in severely polluted cases (AQI ≥ 300) the accuracy of AQI prediction reaches 87.8%. The one-week prediction experiment shows that our model is highly sensitive to sudden haze bursts and dissipations, which results in good long-term stability in the accuracy of the next 3-7 days' AQI prediction.

Year:  2016        PMID: 27597861      PMCID: PMC5002306          DOI: 10.1155/2016/6459873

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

Industry in developing countries is mainly concentrated around big cities, accompanied by large populations, heavy consumption, and pollution. Northern China, and in particular Beijing together with Tianjin city and Hebei province, has become one of the most prosperous and polluted areas on Earth. By 2013, the transient population of Beijing was 37.5 million, and the intermittent outbreak of air pollution has greatly affected citizens' lives through physiological diseases [1, 2], depression, and poor visibility in traffic [3, 4]. The main component of haze is pm2.5 (particulate matter less than 2.5 μm in aerodynamic diameter), and the concentration of pollution is described with the Air Quality Index (AQI, the concentration of pm2.5). The Chinese Government has monitored and recorded pm2.5 concentrations for major cities since 2013 [5]. According to the report of Quan et al. [6], the AQI reached 600 in Beijing during the haze event in January 2013. In recent years, more and more papers have addressed the haze episodes in Northern China and their consequences [7-11]. Researchers have pointed out that, over the coming years, haze episodes will continue to break out frequently in Northern China [12].

This paper presents an AQI prediction model for Beijing based on time series analysis. We collected Beijing's AQI data for 29 continuous months since 2013 and constructed a dynamic structural prediction model. Statistical methods are used to obtain the maximum likelihood estimate of the prediction model, and both short-term and long-term experiments are carried out to test the accuracy and robustness of the model.

The remainder of this paper is organized as follows. Section 2 introduces recent related work. Section 3 presents our prediction model and proves it to be a vector autoregressive model. Experiments and evaluations are reported in Section 4. Section 5 concludes the paper and discusses future work.

2. Related Work

Generally, pm2.5, or haze, is produced mainly by anthropogenic factors [13-16] and eliminated by natural diffusion. Several days after emission, secondary pm2.5 is produced through photochemical reactions among pollutants that do not readily diffuse. Secondary pm2.5 is the principal component in most severe haze episodes in China [17].

A typical approach to haze prediction is to use pollutant emission data (CO, SO2, and NO) in the simulation [5, 18]. Huang et al. [14] analyzed the chemical composition of pm2.5 and used chemical mass balance to identify the emission sources. Other, more complex models introduce atmospheric features, chemical components, and transport factors [15]. More commonly, however, pollutant emission data simply rise or fall synchronously with AQI. Sun [19] took population, car ownership, and GDP into consideration and proposed a statistical index system for the average annual number of haze episode days. That study found that although most factors contribute to predicting pm2.5, the annual average of NO is negatively correlated with the average number of severely polluted days. The paper [12] established a cubic exponential smoothing model by introducing dust emission into haze prediction. Liang et al. pointed out that there are various distribution and transmission patterns of pm2.5 [20]. Wang et al. noted that government control policy should be considered in model simulations [9].

Many studies use a backpropagation neural network as the simulation model [19, 21]. Statistical time series analysis is rarely used in haze prediction, so long-term haze prediction is difficult for current methods to accomplish [22]. Multiple linear regression models also perform well on daily-scale prediction [23, 24]. However, the test data in existing studies are not ample; for example, [21] tested the prediction accuracy on only 3 days. Besides, Zhang et al. pointed out that pm2.5 accumulation in previous days significantly affects the present daily pm2.5 concentration, which should also be considered in the modeling process [22]. Considering the above points, this paper presents a new AQI prediction model that integrates a natural factor, a humanity factor, and a self-evolution factor.

3. The Prediction Model of Beijing's Daily AQI

3.1. The Parameters and Architecture of the Prediction Model

The change of daily pm2.5 concentration depends on two factors: the daily overall production of pm2.5 by human activities, P, and the daily overall natural diffusion or natural accumulation of pm2.5, C. The production of haze depends heavily on the government's control policies toward the emission of industry fuels, I. The diffusion of haze depends mainly on the airflow, W. Besides, complex chemical changes can occur between pm2.5 and other pollutants; thus, the previous day's pm2.5 concentration also affects the AQI. This effect can be seen as the evolution result of the previous day's pm2.5 and is represented by Y.

The net growth P − C can be observed directly. P is generated by a semimanual method: it is mainly related to daily human activities, and we calculate it from AQI sequences of no fewer than five consecutive sunny and windless days. Special circumstances are also considered. In winter, P is larger because the heating system is on. The car usage restrictions and temporary stoppage of factories during Beijing APEC 2014 are also taken into account. C is then calculated as P − (P − C). Sometimes, C is greater than zero, which means pm2.5 accumulates because of nonhuman factors.

Thus, the daily net growth of pm2.5 (P − C) is a function of the evolution result Y, the industry control index I, and the forecast of wind power W. Treating this problem as a dynamic structural model, our model can be described by formula (1). Parameters β1, β2, and β3 represent, respectively, the effects of the previous day's pm2.5, the wind power, and the industry control index. The net growth of the previous day's pm2.5 partly affects the present day's pm2.5 and partly affects the next day's pm2.5; the parameter β4 represents this “partial adjustment.” The disturbance μ represents other factors that affect the present day's pm2.5.
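As a rough illustration of the structure described in this subsection, the sketch below simulates and fits one linear form that is consistent with the parameter descriptions: the net growth (P − C) regressed on Y, W, I, and the previous day's net growth. The functional form, the intercept `b0`, the synthetic data, and the name `fit_net_growth` are all assumptions made for illustration; they are not the authors' formula (1) or data. Plain least squares is used only to show the structure, since Section 3.2 argues that I is endogenous and OLS is not appropriate for the actual estimation.

```python
import numpy as np

def fit_net_growth(net, Y, W, I):
    """Least-squares fit of an ASSUMED linear structural form:
        net[t] = b0 + b1*Y[t] + b2*W[t] + b3*I[t] + b4*net[t-1] + mu[t]
    where net[t] stands for the daily net growth P - C.
    Returns the coefficient vector [b0, b1, b2, b3, b4].
    """
    # Rows correspond to days t = 1 .. T-1 so that net[t-1] is available.
    X = np.column_stack([np.ones(len(net) - 1), Y[1:], W[1:], I[1:], net[:-1]])
    beta, *_ = np.linalg.lstsq(X, net[1:], rcond=None)
    return beta

# Synthetic (hypothetical) data, only to exercise the function.
rng = np.random.default_rng(0)
T = 500
Y = rng.normal(100, 30, T)   # evolution result of previous day's pm2.5
W = rng.uniform(0, 6, T)     # wind power forecast
I = rng.uniform(0, 1, T)     # industry control index
true_beta = np.array([5.0, 0.2, -8.0, -10.0, 0.3])

net = np.zeros(T)
for t in range(1, T):
    net[t] = (true_beta[0] + true_beta[1] * Y[t] + true_beta[2] * W[t]
              + true_beta[3] * I[t] + true_beta[4] * net[t - 1]
              + rng.normal(0, 1))   # disturbance mu

beta_hat = fit_net_growth(net, Y, W, I)
print(np.round(beta_hat, 3))
```

In this toy setting the last coefficient plays the role of the “partial adjustment” parameter β4.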

3.2. Complexity Reduction of the Prediction Model

In order to facilitate the research and modeling process, we prove that this model can be reduced to a vector autoregressive model.

Proposition 1.

Formula (1) is a vector autoregressive model.

Proof

Assume that there exists sequence autocorrelation in formula (1). The autocorrelation is

$$\mu_t = \rho\,\mu_{t-1} + \upsilon_t, \qquad (2)$$

in which υ is white noise. Here, we apply the Cochrane-Orcutt iteration to rewrite formula (2):

$$(1 - \rho L)\,\mu_t = \upsilon_t, \qquad (3)$$

where L is the lag operator ($L V_t \equiv V_{t-1}$), which maps a value in a time series to its value in the previous period. The next task is to find the most satisfactory value of ρ by successive iteration. Specifically, this method uses the residual error to estimate the unknown ρ. Assume that we use the previous p days' AQI to predict the present day's AQI. Multiplying both sides of formula (1) by (1 − ρL) and expanding gives formula (4).

In the substitution process, many assumptions are neglected. However, the ordinary least squares (OLS) method should not be used to estimate formula (4), because OLS can only illustrate the relationship between daily pm2.5 production and the policy control index, the accumulation of historical pm2.5, and the wind power. The net growth of the previous day's pm2.5 is only one reason for the correlation of these variables. The government could make policies to control the pm2.5 production of industry to obtain a “satisfying” daily production of pm2.5; that is, I is an endogenous variable. The policy control index depends on the present day's and the previous p days' accumulation of historical pm2.5, the wind power, the daily production of pm2.5, and the daily diffusion of pm2.5, as expressed in formula (5), where υ represents the influence of other policies. The net growths of previous days' pm2.5 and the policy control index also affect the daily accumulation of pm2.5, as expressed in formula (6), where υ represents other factors that influence the daily accumulation of pm2.5. By analogy with formulas (4), (5), and (6), C and W can both be written in a similar form.

Joining formulas (4), (5), and (6) together and rewriting them in vector form gives formula (7). In B0, the parameters in the 1st, 2nd, 3rd, 4th, and 5th rows relate P, C, I, Y, and W, respectively, to the other variables. Every B is a 5 × 5 matrix.
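The Cochrane-Orcutt step mentioned in this proof can be sketched as follows. This is a minimal, generic version for a single regression equation with AR(1) disturbances, run on synthetic data; the function name `cochrane_orcutt`, the stopping rule, and the data are assumptions, and the paper does not state how its own iteration was implemented.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=100):
    """Iteratively estimate beta and rho for
        y = X @ beta + mu,   mu[t] = rho * mu[t-1] + v[t].
    Each pass estimates rho from the residuals, then re-fits beta on the
    quasi-differenced data, i.e. after applying (1 - rho L) to both sides.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta
        # Regress the residual on its own first lag to update rho.
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Synthetic example with known parameters (hypothetical values).
rng = np.random.default_rng(1)
T = 2000
x = rng.normal(size=T)
v = rng.normal(scale=0.5, size=T)      # white noise
mu = np.zeros(T)
for t in range(1, T):
    mu[t] = 0.6 * mu[t - 1] + v[t]     # AR(1) disturbance with rho = 0.6
X = np.column_stack([np.ones(T), x])
y = X @ np.array([2.0, 1.5]) + mu

beta_hat, rho_hat = cochrane_orcutt(y, X)
print(np.round(beta_hat, 3), round(float(rho_hat), 3))
```

Because the intercept column becomes the constant (1 − ρ) after quasi-differencing, the fitted coefficients stay on the scale of the original equation.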
Premultiplying formula (7) by B0^{-1} (the inverse matrix of B0) yields the standard form of a vector autoregressive model. So it is proved that our prediction model (formula (1)) is in fact a vector autoregressive model.

The regression parameters of our haze prediction model can be obtained as follows. The dynamic structural system (formula (7)) is expressed compactly as formula (12). Assume that the disturbance terms are neither serially correlated nor correlated with each other, which means D is a diagonal matrix. Formula (12) can in turn be rewritten in terms of a disturbance ε. Let Ω be the variance-covariance matrix of ε. Suppose B0 is a lower triangular matrix in which all main diagonal elements are 1, and D is a diagonal matrix. The parameters (B0, Γ, D) can be obtained through full-information maximum likelihood estimation. The maximum likelihood estimate of Ω can be obtained from the variance-covariance matrix of the regression residuals. Finally, B0 and D are calculated through triangular decomposition of the estimate of Ω; thus, Γ can be evaluated.

In summary, the prediction model of Beijing AQI considers factors including industry emission and policy control, together with the chemical changes of previous days' pollution accumulation and the diffusion conditions. The model also takes the correlations between these factors into consideration and introduces time series haze features into the dynamic structural model. The policy control index is simulated from the record of 4 severe haze episodes during this period. The diffusion is evaluated from the weather record of daily wind power.
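The triangular decomposition step can be illustrated with a short sketch. It assumes the usual structural VAR relation Ω = B0^{-1} D (B0^{-1})', with B0 unit lower triangular and D diagonal as stated above; that relation, the function names, and the random test matrices are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def triangular_factorization(omega):
    """Factor a symmetric positive-definite matrix as omega = A @ D @ A.T,
    with A unit lower triangular and D diagonal."""
    chol = np.linalg.cholesky(omega)   # omega = chol @ chol.T, chol lower triangular
    d = np.diag(chol)
    A = chol / d                       # divide column j by chol[j, j] -> unit diagonal
    D = np.diag(d ** 2)
    return A, D

def structural_from_omega(omega):
    """Recover (B0, D) under the ASSUMED relation omega = inv(B0) @ D @ inv(B0).T."""
    A, D = triangular_factorization(omega)
    B0 = np.linalg.inv(A)              # inverse of a unit lower triangular matrix
    return B0, D

# Build a hypothetical 5 x 5 example and check that the factors are recovered.
rng = np.random.default_rng(2)
B0_true = np.eye(5) + np.tril(rng.normal(size=(5, 5)), k=-1)
D_true = np.diag(rng.uniform(0.5, 2.0, size=5))
B0_inv = np.linalg.inv(B0_true)
omega = B0_inv @ D_true @ B0_inv.T

B0_hat, D_hat = structural_from_omega(omega)
print(np.allclose(B0_hat, B0_true), np.allclose(D_hat, D_true))
```

In practice `omega` would be replaced by the variance-covariance matrix of the regression residuals; the decomposition is unique for a positive-definite matrix, which is why the restrictions on B0 and D identify the parameters.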

4. Model Evaluations

We collected the daily AQI and daily weather information from 28 Oct. 2013 to 31 Mar. 2016. This complete sequence is used to test the accuracy of the prediction model. The next day's AQI prediction experiment (Section 4.1) and long-term AQI prediction experiment (Section 4.2) are both implemented. The next day's AQI prediction is evaluated from two perspectives: the accuracy of daily prediction and the accuracy of statistical analysis.

4.1. Next Day's AQI Prediction

The next day's weather forecast information is used in the next day's AQI prediction. The observed and predicted daily mean AQI in Beijing are illustrated in Figure 1. The simulation result shows that the predicted values match the observed values very well over the whole sequence of 886 days. Occasionally, the prediction deviates severely from the observed value; for example, on 19 Feb. 2014, the observed AQI was 89, while our model gave a prediction of 209, an offset of 135%. In fact, on the afternoon of 19 Feb., the wind in Beijing suddenly changed from northeasterly to southwesterly, and by 19:00 the AQI had already reached 170. This can be interpreted as our model successfully forecasting a severe haze outbreak several hours in advance; over the following 7 days, the average daily AQI of Beijing was 305. The occasional occurrence of this “foreseeing” phenomenon is caused by the coarse (daily) time granularity, and it is marked with red ellipses in Figure 1. These marks indicate that our model can “foresee” sharp changes in both outbreaks and diffusions. Most haze outbreaks and diffusions are accurately simulated; some are foreseen, but none are predicted late.
Figure 1

(a, b, c) Next day's AQI prediction on 886 continuous days.

Figure 2(a) is a scatter diagram of daily AQI, including both observed value and predicted value. Most points lie close to y = x (the red line). But some points lie in a queue at the bottom part, which means the observed AQI exceeds 200, while the predicted value is less than 50. There are altogether 15 such outliers, 7 of which “foresee” haze diffusion, while the other 8 bug points could not be well interpreted. All the 15 points are checked and listed in Table 1. “✓” means a “foreseeing” phenomenon, and “?” represents bug points. Figure 2(b) is a scatter diagram of annual AQI (sum of daily AQI in a certain year). Our data covers only 2 months of 2013 and 3 months of 2016, so, in this diagram, these 2 points lie in the lower left corner.
Figure 2

(a) Daily AQI of the 886 days. (b) Annual AQI from 2013 to 2016.

Table 1

All the 15 outliers in Figure 2(a).

Date of outlier    Label
Nov. 2, 2013       ✓
Dec. 7, 2013       ?
Dec. 25, 2013      ✓
Feb. 14, 2014      ?
Feb. 25, 2014      ?
Mar. 26, 2014      ?
Oct. 10, 2014      ✓
Oct. 11, 2014      ✓
Nov. 19, 2014      ?
Nov. 20, 2014      ?
Nov. 30, 2014      ✓
Dec. 9, 2014       ?
Jan. 4, 2015       ?
Jan. 15, 2015      ✓
Mar. 7, 2015       ✓
The pie chart in Figure 3 shows the distribution of prediction accuracy. The deviation between predicted and observed AQI is obtained through the following formula:

$$\text{deviation} = \frac{\left|\mathrm{AQI}_{\text{predicted}} - \mathrm{AQI}_{\text{observed}}\right|}{\mathrm{AQI}_{\text{observed}}} \times 100\%$$

Figure 3 shows that 55% of predictions match the observed values very well (<20% deviation). The purple part is mainly caused by the “foreseeing” phenomenon. Most samples in the red part come from less-polluted days. For example, on 12 Jan. 2016, the predicted AQI is 40 while the observed AQI is 29, a deviation of 37.9%. In fact, statistics also indicate that our model performs better under worse air conditions (Figure 4). A sample is counted as correctly predicted if its deviation is less than 20% or the predicted air quality level matches the observed level.
Figure 3

The deviation of predicted and observed AQI.

Figure 4

Prediction accuracies of different air qualities.
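The deviation measure and the correctness criterion used in Section 4.1 can be sketched in a few lines. The relative-deviation form is inferred from the two worked examples in the text (40 vs. 29 gives 37.9%; 209 vs. 89 gives about 135%). The level breakpoints below are the six-level bands commonly used for China's AQI and are an assumption here; the paper does not list the bands it used, and it describes severe pollution as AQI ≥ 300, so the exact edges may differ. The function names are hypothetical.

```python
import bisect

# ASSUMED upper bounds of the first five air quality levels:
# 0-50, 51-100, 101-150, 151-200, 201-300, and above 300.
LEVEL_UPPER_BOUNDS = [50, 100, 150, 200, 300]

def deviation_percent(predicted, observed):
    """Relative deviation of the prediction from the observation, in percent."""
    return abs(predicted - observed) / observed * 100.0

def air_quality_level(aqi):
    """Index (0..5) of the band that contains the given AQI value."""
    return bisect.bisect_left(LEVEL_UPPER_BOUNDS, aqi)

def is_correct(predicted, observed, threshold=20.0):
    """A sample counts as correct if the deviation is below the threshold
    or the predicted and observed values fall in the same level."""
    return (deviation_percent(predicted, observed) < threshold
            or air_quality_level(predicted) == air_quality_level(observed))

# The two examples quoted in Section 4.1.
print(round(deviation_percent(40, 29), 1))   # 12 Jan. 2016
print(round(deviation_percent(209, 89)))     # 19 Feb. 2014
print(is_correct(40, 29), is_correct(209, 89))
```

Under these assumed bands, the 12 Jan. 2016 sample would still count as correct despite its 37.9% deviation, because both values fall in the lowest band; this shows why the level-match clause matters most on clean days.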

4.2. Long-Term AQI Prediction

In the long-term prediction, we use the historical haze data sequence and weather forecast information to predict the next 7 days' AQI. A sample is counted as correctly predicted if its deviation is less than 20% or the predicted air quality level matches the observed level. From 26 Dec. 2015 to 31 Mar. 2016, we predict the AQI for the next 7 days and check the accuracy of the n-day predictions. Figure 5 shows the accuracy of long-term prediction in this 91-day experiment. The accuracy stays stable for the next 3, 4, 5, 6, and 7 days' AQI prediction, which indicates that our model is robust on the task of long-term prediction. The next day's prediction accuracy reaches 79.1%, which is, surprisingly, far better than in the experiment in Section 4.1. The reason is that 6 haze episodes struck Beijing during the 91 days. These frequent episodes contributed substantially to the overall performance because our model is very sensitive to sudden changes of AQI, including outbreaks and diffusions (Section 4.1; Figure 4). Figures 6 and 7 show several haze episodes during the 91 days. Both figures show a pm2.5 change process of more than 2 weeks. Figure 6 also shows a “foreseeing” phenomenon caused by the coarse time granularity, marked by a red ellipse.
Figure 5

The accuracy of long-term AQI prediction.

Figure 6

Three haze episodes in Jan. 2016.

Figure 7

Three haze episodes in Feb. 2016.
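One common way to obtain n-day-ahead predictions from a vector autoregressive model is to iterate the one-step forecast, feeding each prediction back in as if it were observed. The sketch below shows that recursion for a generic reduced-form VAR(p). It is an assumption that the experiments in Section 4.2 worked this way; the paper also uses weather forecast information, which this toy version ignores. The name `recursive_forecast` and the coefficients are hypothetical.

```python
import numpy as np

def recursive_forecast(c, Phi, history, steps):
    """Iterate y[t] = c + Phi[0] @ y[t-1] + ... + Phi[p-1] @ y[t-p].

    c       : (k,) intercept vector
    Phi     : list of p coefficient matrices, each (k, k)
    history : array with at least p rows of past observations, most recent last
    steps   : number of days to forecast ahead
    """
    p = len(Phi)
    window = [np.asarray(row, dtype=float) for row in history[-p:]]
    forecasts = []
    for _ in range(steps):
        # Phi[0] multiplies the most recent value, Phi[1] the one before, etc.
        y_next = c + sum(Phi[s] @ window[-1 - s] for s in range(p))
        forecasts.append(y_next)
        window.append(y_next)   # the forecast becomes an input for the next step
    return np.array(forecasts)

# Toy VAR(1) with two variables (hypothetical coefficients).
c = np.array([1.0, 1.0])
Phi = [0.5 * np.eye(2)]
history = np.array([[4.0, 4.0]])
forecasts = recursive_forecast(c, Phi, history, steps=7)
print(forecasts[:3])
```

Because each step reuses earlier forecasts, an error made on day 1 is carried into days 2 to 7, which is the propagation effect the conclusion warns about.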

5. Conclusion and Future Work

This paper presented a dynamic structural model to predict Beijing's daily AQI. The model integrates a natural factor, a humanity factor, and a self-evolution factor into the time series model. This dynamic structural measurement model of daily haze increment is proven to be a vector autoregressive model. Experiments revealed two highlights of the model. First, it is very sensitive to sudden changes of AQI, including both outbreaks and diffusions, and predicts them very well. Second, it is robust on the task of long-term AQI prediction. Lastly, limited by the coarse time granularity, our model sometimes “foresees” but never delays or misses a sudden change of haze.

Many researchers use a simple backpropagation neural network to build nonlinear prediction models. But since methods based on time series have proven effective in haze prediction modeling, we believe that recurrent neural networks would give better performance in such a prediction task. Although the related factors are limited in existing models, overfitting should still be a concern, because, in long-term prediction, a deviation could spread and be amplified in the following days' predictions.
  8 in total

1.  Visual range trends in the Yangtze River Delta Region of China, 1981-2005.

Authors:  Lina Gao; Gensuo Jia; Renjian Zhang; Huizheng Che; Congbin Fu; Tijian Wang; Meigen Zhang; Hong Jiang; Peng Yan
Journal:  J Air Waste Manag Assoc       Date:  2011-08       Impact factor: 2.235

2.  Elucidating severe urban haze formation in China.

Authors:  Song Guo; Min Hu; Misti L Zamora; Jianfei Peng; Dongjie Shang; Jing Zheng; Zhuofei Du; Zhijun Wu; Min Shao; Limin Zeng; Mario J Molina; Renyi Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2014-11-24       Impact factor: 11.205

3.  Characteristics of particulate matter (PM10) and its relationship with meteorological factors during 2001-2012 in Beijing.

Authors:  Guangjin Tian; Zhi Qiao; Xinliang Xu
Journal:  Environ Pollut       Date:  2014-05-22       Impact factor: 8.071

4.  Mechanism for the formation and microphysical characteristics of submicron aerosol during heavy haze pollution episode in the Yangtze River Delta, China.

Authors:  Honglei Wang; Junlin An; Lijuan Shen; Bin Zhu; Chen Pan; Zirui Liu; Xiaohui Liu; Qing Duan; Xuan Liu; Yuesi Wang
Journal:  Sci Total Environ       Date:  2014-05-27       Impact factor: 7.963

5.  Spatial and temporal analysis of Air Pollution Index and its timescale-dependent relationship with meteorological factors in Guangzhou, China, 2001-2011.

Authors:  Li Li; Jun Qian; Chun-Quan Ou; Ying-Xue Zhou; Cui Guo; Yuming Guo
Journal:  Environ Pollut       Date:  2014-04-15       Impact factor: 8.071

6.  Characterization of haze episodes and factors contributing to their formation using a panel model.

Authors:  Xiuming Zhang; Yiyun Wu; Baojing Gu
Journal:  Chemosphere       Date:  2016-02-11       Impact factor: 7.086

7.  High secondary aerosol contribution to particulate pollution during haze events in China.

Authors:  Ru-Jin Huang; Yanlin Zhang; Carlo Bozzetti; Kin-Fai Ho; Jun-Ji Cao; Yongming Han; Kaspar R Daellenbach; Jay G Slowik; Stephen M Platt; Francesco Canonaco; Peter Zotter; Robert Wolf; Simone M Pieber; Emily A Bruns; Monica Crippa; Giancarlo Ciarelli; Andrea Piazzalunga; Margit Schwikowski; Gülcin Abbaszade; Jürgen Schnelle-Kreis; Ralf Zimmermann; Zhisheng An; Sönke Szidat; Urs Baltensperger; Imad El Haddad; André S H Prévôt
Journal:  Nature       Date:  2014-09-17       Impact factor: 49.962

8.  Formation of nanoparticles of blue haze enhanced by anthropogenic pollution.

Authors:  Renyi Zhang; Lin Wang; Alexei F Khalizov; Jun Zhao; Jun Zheng; Robert L McGraw; Luisa T Molina
Journal:  Proc Natl Acad Sci U S A       Date:  2009-10-07       Impact factor: 11.205
