Literature DB >> 29874245

Prediction of daily PM2.5 concentration in China using partial differential equations.

Yufang Wang^1,2, Haiyan Wang³, Shuhua Chang², Adrian Avram³.

Abstract

Accurate reporting and forecasting of PM2.5 concentration are important for improving public health. In this paper, we propose a partial differential equation (PDE) model, specially, a linear diffusive equation, to describe the spatial-temporal characteristics of PM2.5 in order to make short-term prediction. We analyze the temporal and spatial patterns of a real dataset from China's National Environmental Monitoring and validate the PDE-based model in terms of predicting the PM2.5 concentration of the next day by the former days' history data. Our experiment results show that the PDE model is able to characterize and predict the process of PM2.5 transport. For example, for 300 continuous days of 2016, the average prediction accuracy of the PDE model over all city-regions is 93% or 83% based on different accuracy definitions. To our knowledge, this is the first attempt to use PDE-based model to study PM2.5 prediction in both temporal and spatial dimensions.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：

Year: 2018 PMID： 29874245 PMCID： PMC5991382 DOI： 10.1371/journal.pone.0197666

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

PM2.5 (particulate matter smaller than 2.5um), has been linked to deducing visibility, negatively affecting human’s health and impacting global climate and therefore PM2.5 pollution has aroused unprecedent public concern in the world [1]. In some places where there is critically high atmospheric pollution, government will enforce extreme actions as closing of schools and industries and restriction of motor vehicles circulation. If it were possible to predict these episodes one or two days in advance, public authorities will have more time to take more efficient actions to protect citizens, for instance, by increasing public transports in case of traffic block or by issuing an early warning to inform citizens to take preventing-PM2.5 measures. The originate of PM2.5 is complex. Many factors such as area topography, geomorphology, emission source location and weather may have influence on PM2.5 concentration [2] and regional transported aerosols are also an important factor for PM2.5 pollution [3-5]. Similarly like stock market, PM2.5 concentrations are dynamic and exhibit wide variation, and accurate prediction of them thus becomes a highly challenging task because of the highly nonlinear nature and the complex characteristics. Presently, studies can be divided into three groups: deterministic methods, statistical approaches and satellite remote sensing techniques. Deterministic methods, such as 3-D chemistry transport models (CTMs) [6, 7] mainly address the formation mechanism of PM2.5 from the view of physico-chemical and meteorological processes by a knowledge of the temporal dynamics of the emission quantitites of various pollutants. This kind of method needs perfect representation of the physico-chemical processes, therefore it is difficult to guarantee the real-time forecast. Statistical approaches, as empirical prediction methods, aim to detect some certain correlated patterns between air quality data and various selected predictors, thus to predict the pollutant concentrations in future. Common statistical approaches, such as linear regression models [8, 9], neural networks [10], nonlinear regression models [11, 12], neuro-fuzzy models, are easier to implement but only limited to specific geographical locations. Satellite remote sensing techniques [13] have the advantages of spatially seamless and long-term coverage, so they have been widely employed to predict PM2.5 by considering satellite-derived aerosol optical depth empirically correlated with PM2.5 in recent years, but the equipment expense is high. Each approach addresses problems from different perspectives. Differential equation models have been extensively used in biology, sociology, economics, physics and other fields to describe various time evolution and dynamic systems. Logistic models have been widely used to describe various population dynamics and predict the growth of bacteria and tumors over time in Biology Mathematics and the spatial spread of infectious disease in epidemiology [14]. Reaction-diffusion equations describe population development in [15]. Navier-Stokes equations describe the motion of viscous fluid substances in physics [16]. In particular, leveraging geocoded data dramatically expands application of PDE models to many social problems. Recently, some researchers proposed using PDE models to forecast information diffusion in online social networks by introducing social distance [17-19]. The main contribution of this study is to introduce a PDE model of PM2.5 transport which describes the temporal and spatial characteristics of PM2.5 in China from a global view. Further we use the model to predict the next day’s PM2.5 concentrations. The PDE-based model is linear-diffusion equation built on intuitive distance metric between clusters in China. These clusters are obtained by a higher-order network analysis in [20, 21]. This is the first work to apply PDE model to predict PM2.5 concentration. As a original work, this method can help Chinese government make prediction about PM2.5 concentrations one or two days in advance.

Methods

Data

The research data used in this study covers 300 days from January 1st, 2016 to October 26th, 2016. The average PM2.5 daily level of each cluster is calculated from the daily PM2.5 level of all cities included in this cluster. All relevant data is available from figshare: https://figshare.com/s/ea760256b9850255eed8 (DOI: 10.6084/m9.figshare.6243254). The daily PM2.5 data can be publicly deposited from this link. Specifically, This PM2.5 data set contains 189 priority pollution monitoring cities in China’s mainland, which cover all the thirty-four provincial-level regions of China. The most polluted and most interested cities are all included, such as Beijing, Shanghai and Guangzhou. The 189 cities have been divided into 9 clusters by a higher-order complex network method, considering various weather conditions and geography conditions in [20]. The original PM2.5 concentrations of each city are normalized to a discrete level value 1, 2,…., or 6, according to “Ambient Air Quality Standards” (GB3095-1996) of China, where PM2.5 concentrations are divided into 0-35, 36-75, 76-115, 116-150, 151-250 and greater than 250 μg and these different concentration ranges are leveled from 1 to 6, describing that air quality is good, mild, moderate, severe severe, highly severe or seriously severe. The linear scaling makes concentration value to a specific range, which ensures the larger value input attributes do not overwhelm smaller value inputs thus in turn helps decrease prediction errors.

Clustering and embedding

In our previous work [20], we have divided 189 main cities into 9 clusters by a higher-order spectral method. Specifically, the higher order organization, motif M8 in Fig 1, reflects the PM2.5 movements from source to target in PM2.5-city network. We applied motif M8 as the basic building block of complex network and applied the higher-order spectral method in [22] to divide 189 main cities of China into 9 clusters in Fig 2. Cities in each cluster are more affected with each other about PM2.5 pollution (local emission) and PM2.5 of cities among different clusters also possibly influences each other through air flow (regional transport). Therefore in this paper we propose a partial differential equation in spatio-temporal dimension to describe the dynamic system of PM2.5 transport. Similarly as information diffusion on social network, for a city cluster, one part of its PM2.5 is produced by itself from its road dust, vehicle exhaust, industrial emission, agriculture activities, which can be seen as local emission. The other part of its PM2.5 is from other city clusters through regional transport [3-5]. Therefore, the change of PM2.5 concentration in a city cluster equals the amount from regional transport plus the PM2.5 produced by the city itself (Fig 3).

Fig 1

Triangle motifs.

Fig 2

Nine clusters obtained by M8-motif spectral clustering algorithm [20].

Tableau Public 10.3 (https://public.tableau.com/) was used to create the map.

Fig 3

PM2.5 transport processes.

Nine clusters obtained by M8-motif spectral clustering algorithm [20].

Tableau Public 10.3 (https://public.tableau.com/) was used to create the map. As we demonstrate in the last paragraph, we could divide 189 cities of China into 9 clusters, i.e., U = {U1, U2, …, U9} through a higher-order spectral clustering method [20, 22]. Once the entire group of 189 cities are partitioned into disjoint sets U, i = 1, 2, …, 9, we will order U in a meaningful way. For general clustering partition, the spatial arrangement of U can be based on specific modeling goal and social or geographical characteristics of the underlying network. In [23, 24], the level of democracy, diaspora size, international economic relations and geographical proximity are used to order U. In [17], friendship hops are used to define distance metric, then U is embed at location x based on that x-axis is used as the social distance. For PM2.5 transport in China, ordering cluster U by geographical characteristics and meteorological characteristics are the best intuitive method. Wind is a main factor influencing air transport [2, 25, 26] and China is with a monsoon. Specifically, in about September the winter monsoon first reaches northern China and soon moves southly to the Yangtze River and the South China Sea. Summer monsoon prevails to southern China in early May; Then it prevails to the southwestern region in late May or early June; Mid-summer monsoon suddenly prevails into the Yangtze River and Huaihe River Basin in June, mid-summer monsoon again moves to North China and Northeast China in July. Therefore the monsoon of China leads to the fact that most of its wind directions are south-to-north or north-to-south. Therefore, It is meaningful to project the 9 city clusters located from north to south in the map of China onto y-axis of the Cartesian coordinates, and name these city clusters as U1 to U9 as their geographical location from north to south as in Fig 2. These clusters are naturally embedded into the two dimensional Euclidean space. As a result, the PDE model, which describes PM2.5 transport between the clusters, would be two-dimensional PDEs. For simplicity, in this paper, we project the clusters into y-axis and focus on one dimensional PDE. We will study two-dimensional PDE models in a future work.

Linear PDE model

Differential equation can describe lots of dynamic systems in fields of physics, mathematical biology, social science and many other fields [27-30], particularly the “information” diffusion problems. In epidemic models, it describes the spread of infectious diseases among a population [29, 31] to help people understand the disease’s spreading patterns, such as trends and ratio of people infected. Epidemic models have inspired many unsolved mathematical questions and re-energized novel mathematical researches [27]. In recent century, epidemic models also have been adopted to describe and predict computer virus infections [32, 33]. Recently, some works [17-19]use differential equations to describe information propagation such as news and rumors on social network. Ordinary differential equation(ODE) and partial differential equation(PDE) are two basic and main classes of models to describe dynamic models. The former is more suitable for studying global patterns, such as trends of people getting infected by epidemics, but cannot describe the connections among individuals, such as who infects whom. However, besides time variable t, partial differential equation has spatial factors involved. Therefore, spatial dimension provides a quantitative way to describe the local interactions between individuals of the system into global views. PDE models obey the energy conservation law [34] that is the rate at which a quantity changes in a given domain must equal the rate at which it flows to the boundary plus the rate at which it is created or destroyed in the domain. In different fields, conversation law can describe different scenarios [17-19]. In the following, we propose a linear PDE-based model to abstractly translate the PM2.5 transport in two processes: local emission and global transport (in Fig 3). To quantify the conversation law for PM2.5 concentration, let q(y, t) denote the average PM2.5 concentration of the cities in the cluster at distance y during time t. The value of q(y, t) depends on two processes. First, the cities in U can produce PM2.5 by themselves and we call this process as local emission. Secondly, PM2.5 in U where x ≠ y can flow into U by wind, namely regional transport [2] and we call this regional transport as global transport. Following is the description of the PDE model where is the global transport, which describes the regional transport of PM2.5 between different clusters. This mathematical expression is widely used in spatial biology and epidemiology [30], where it describes the spatial spread of infectious disease. measures the amount of the quantity crossing the section at x and time t and this results from a principle analogous to Fick law [34]. d(x) is a positive function of x and it describes the PM2.5 transport ability of the cluster at location x. Actually, different regions in China have different PM2.5 transport abilities, therefore using a positive piece-wise function to describe d(x) will be better, which will be studied in our future work. To be simple, we use in this paper, here b0 is a parameter which needs to be determined by the real data. r(t)q(x, t)h(x) represents the local emission and it denotes the rate at which PM2.5 is produced or dissipated within the cluster at location x and time t. The amount of reproduction or dissipation is proportional to both the existing amount of PM2.5 and the amount of various varying resources. This mathematical expression has been used to describe and predict various population dynamics such as the growth of bacteria and tumor over time [34]. h(x) represents the spatial heterogeneity of PM2.5 in the cluster at location x and it depicts the PM2.5 emissions from various sources such as road dust, vehicle exhaust, biomass burning, industrial emissions of this city cluster. In this paper, different clusters represent different regions in China. Different levels of economy, population of different regions lead to essentially different PM2.5 emissions [2]. h(x) is constructed by piecewise first-order Lagrange Interpolation with the interpolated points are x = 1, 2, …, 9, where each cluster is located, therefore nine references b = h(x), i = 1, 2, …, 9 need to be determined. r(t) is the growth rate with time t in local process. It depicts PM2.5 dissipation with the external changing factors such as wind or certain other atmospheric conditions [2, 25, 26]. Without reproduction, PM2.5 will dissipate with wind, therefore the form r(t) can be expressed as r(t) = α + e−, where α, β and δ are constant values. This form of r(t) has also been used in describing information decay in online social network [17]. More plausible expression of r(t) will be discussion in our future work. , is the Neumann boundary condition [34]. To be simple, here we suppose there is no flux of PM2.5 across the boundaries at x = l, L. q(x, 1) = ϕ(x) is an initial PM2.5 concentration function, which can be constructed from the history data of PM2.5 by the cubic splines interpolation. The basic mathematical properties, such as existence, uniqueness and positivity of the solution of the models can be established from the standard theorems for parabolic PDEs [35]. In this paper, the PDE is validated with real data from China National Environmental Monitoring Center.

Accuracy definition

We evaluate the performance of the proposed linear diffusive model by comparing the concentration level calculated by the model with the actual observations. y denotes the actual value, and is the predicted value. Then the prediction performance is evaluated from two different technical indicators. MAID (mean absolute increment accuracy), which is proposed in this paper based on the practical significance, defined as where n is the number of sample points in test data set and AID evaluates the absolute accuracy at each sample point. There are totally five PM2.5 concentration levels from level one to level five and AID describes the absolute accuracy in the view of level length. For example, if , then the prediction accuracy of this sample point is . MRA(mean relative accuacy), which is defined as where n is the number of the sample points in the test data set.

Results

After we finish the work of collecting the real PM2.5 concentration data, we conclude the prediction process as follows: Firstly, we normalize the data for decreasing the experimental errors and calculate the average PM2.5 daily level of each cluster. Then, we use the training data set of three days to predict the PM2.5 concentration for the following day, i.e., we use days 1-3, 2-4, 3-5,…as training data, and predict the following day 4, 5, 6,…correspondingly and record the prediction accuracy for all 9 regions at the day 4, 5, 6,…. Specifically, we give the detailed prediction process for day 4 as an instance. For the training data set of day 1, 2, 3, we interpolate the discrete data of day 1 in constructing the initial function ϕ(x) of the PDE model; Next, we use the data of day 2 and day 3 to train the parameters of the PDE model by fminsearch function in matlab; Finally, we use this PDE model with a new initial function ϕ(x) trained by the data of day 3 to predict the PM2.5 concentration of day 4. Fig 4 illustrates the prediction results of regions 1-9 at each day from January 4, 2016 to October 26, 2016. Clearly to see that there is a consistent trend between predicted data (represented by red lines) and real observations (presented by green lines). Specifically, Figs 5 and 6 show the prediction accuracy of each city region at different time t over 300 days from January 4, 2016 to October 26, 2016. Based on the accuracy definition MAID, in Fig 5 most of plus marks exist above the horizontal line of 0.9, which means that most of days have a 90% higher prediction value for each cluster and in Table 1 the mean accuracy values of the 9 clusters are all higher than 91%. In Fig 6, the plus marks are not so gathered like that in Fig 5, that is because the accuracy definition of MRA is a rule which measures the accuracy of PM2.5 concentration value while MAID is a rule which measures the accuracy of PM2.5 concentration value range. In Fig 6, we can still find that most of plus marks exist above the horizontal line of 0.8, which means that most of days have a 80% higher prediction value for each cluster and in Table 1 the mean accuracy values of the 9 clusters are all higher than 80%. Besides, we also find that in Table 1 the average prediction accuracy of the PDE model over all city regions and over all days is 93% and 83% based on the two different accuracy definition respectively.

Fig 4

The daily PM2.5 concentration level from January 4, 2016 to October 26, 2016.

Red line represents the prediction value and green line stands for the real data.

Fig 5

The accuracy of each cluster (city region) from January 4, 2016 to October 26, 2016, based on the accuracy definition of MAID.

Fig 6

The accuracy of each cluster (city region) from January 4, 2016 to October 26, 2016, based on the accuracy definition of MRA.

Table 1

The average accuracy of different regions and the average accuracy over all regions based on different accuracy definition.

	MAID	MRA		MAID	MRA
Region 1	0.926178451	0.806965014	Region 6	0.948400673	0.858288077
Region 2	0.926262626	0.821313941	Region 7	0.928367003	0.834495023
Region 3	0.924410774	0.823766934	Region 8	0.939478114	0.848018478
Region 4	0.911363636	0.802610969	Region 9	0.930808081	0.809905853
Region 5	0.946212121	0.865824998	Regions 1-9	0.93127572	0.830132143

The daily PM2.5 concentration level from January 4, 2016 to October 26, 2016.

Red line represents the prediction value and green line stands for the real data. Specifically, Fig 7 lists the predicted results and the graphs of h(x) and r(t) on April 27, 2016 as an instance. The parameters in h(x) and r(t) on April 27, 2016 are listed in Table 2. The results of other days are omitted due to space limit.

Fig 7

The predicted results of each cluster, h(x) and r(t) of the PDE model on April 27, 2016.

Table 2

The values of these parameters in h(x) and r(t) of the PDE model on April 27, 2016.

Parameter	Value	Parameter	Value	Parameter	Value
b₁	0.031296298	b₅	0.125161344	b₉	0.211384796
b₂	0.430962739	b₆	0.096457084	α	0.000000025
b₃	0.009605521	b₇	0.000040027	β	0.001943744
b₄	0.000081749	b₈	0.054862183	δ	7.245630868

Discussion

In this paper, we propose a PDE-based model to characterize PM2.5 transport in 189 main cities of China in order to make short-term prediction of PM2.5 concentration. This is an innovative try to use mathematical methods to describe air pollution problem. This work aims to help Chinese government or health organization to forecast the PM2.5 concentration one or two days in advance and raise an alarm for specific regions at these moments when there is a high probability that the PM2.5 concentration is high. However, the prediction performance and the PDE model may be improved further in the following respects. Firstly, We can embed the problem into a high dimension space R2 by considering a few other variables that may affect the prediction performance. To better describe PM2.5 transport, robin boundary condition or others will also be discussed in our future work. A piecewise positive function about the diffusion term b(x) will better describe the reality and we will perfect it work in the following work. Secondly, in this paper, we choose the increment between each cluster to be 1, it can be adjusted according to the specific cases. Certainly, a better increment is based on a more better distance metric of the system. Lastly, we could propose an surveillance system based on the prediction outcomes of this study for practical use.

8 in total

Review 1. Systematic review and meta-analysis of the adverse health effects of ambient PM2.5 and PM10 pollution in the Chinese population.

Authors: Feng Lu; Dongqun Xu; Yibin Cheng; Shaoxia Dong; Chao Guo; Xue Jiang; Xiaoying Zheng
Journal: Environ Res Date: 2014-11-25 Impact factor: 6.498

2. Estimating the contribution of regional transport to PM_2.5 air pollution in a rural area on the North China Plain.

Authors: Dongsheng Chen; Xiangxue Liu; Jianlei Lang; Ying Zhou; Lin Wei; Xiaotong Wang; Xiurui Guo
Journal: Sci Total Environ Date: 2017-01-21 Impact factor: 7.963

3. Characteristics of particulate matter (PM10) and its relationship with meteorological factors during 2001-2012 in Beijing.

Authors: Guangjin Tian; Zhi Qiao; Xinliang Xu
Journal: Environ Pollut Date: 2014-05-22 Impact factor: 8.071

4. Estimating ground-level PM2.5 in China using satellite remote sensing.

Authors: Zongwei Ma; Xuefei Hu; Lei Huang; Jun Bi; Yang Liu
Journal: Environ Sci Technol Date: 2014-06-13 Impact factor: 9.028

5. Regional Level Influenza Study with Geo-Tagged Twitter Data.

Authors: Feng Wang; Haiyan Wang; Kuai Xu; Ross Raymond; Jaime Chon; Shaun Fuller; Anton Debruyn
Journal: J Med Syst Date: 2016-07-02 Impact factor: 4.460

6. Higher-order Network Analysis of Fine Particulate Matter (PM _2.5) Transport in China at City Level.

Authors: Yufang Wang; Haiyan Wang; Shuhua Chang; Maoxing Liu
Journal: Sci Rep Date: 2017-10-16 Impact factor: 4.379

7. Higher-order organization of complex networks.

Authors: Austin R Benson; David F Gleich; Jure Leskovec
Journal: Science Date: 2016-07-08 Impact factor: 47.728

8. Network Analysis of Fine Particulate Matter (PM2.5) Emissions in China.

Authors: Shaomin Yan; Guang Wu
Journal: Sci Rep Date: 2016-09-09 Impact factor: 4.379

8 in total

1 in total

1. Modeling fine-grained spatio-temporal pollution maps with low-cost sensors.

Authors: Shiva R Iyer; Ananth Balashankar; William H Aeberhard; Sujoy Bhattacharyya; Giuditta Rusconi; Lejo Jose; Nita Soans; Anant Sudarshan; Rohini Pande; Lakshminarayanan Subramanian
Journal: NPJ Clim Atmos Sci Date: 2022-10-12

1 in total