
A new method for evaluating air quality using an ideal grey close function cluster correlation analysis method.

Xiaoling Ren¹, Zhenfu Luo², Shuyu Qin³, Xinqian Shu⁴, Yuanyuan Zhang³.

Abstract

To scientifically and reasonably evaluate air quality with a large amount of monitored data, this paper proposes a new evaluation method called ideal grey close function cluster correlation analysis (IGCFCCA). Taking the air quality in Ningxia Province, China, as an example, according to China's air quality standard, SO2, NO2, PM10, PM2.5 and O3 are selected as evaluation indexes to perform the evaluation. The results show that the air quality in this region in 2018 can be divided into three classifications, among which the relatively poor air quality in March, April and May is the first classification, the better air quality in August and September is the third classification, and the air quality in other months falls under the second classification. Correlation analysis is used to qualitatively determine that these three classifications correspond to first-level air quality in China's air quality standard, and the correlation degree, which is the distance between the three classifications and the first-level air quality, is quantitatively determined. Specifically, the correlation degrees of the first-classification, second-classification and third-classification of air quality are 0.674, 0.697 and 0.71, respectively. The research results indicate potential directions and objectives for air quality management to achieve scientific management.

Year: 2021 PMID: 34857891 PMCID: PMC8639721 DOI: 10.1038/s41598-021-02880-1

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Introduction

The air environment is a dynamic and complex system. Air quality is influenced by pollutants such as SO2, NO2, PM10 and O3, whose concentrations change constantly. However, the monitored data used in analyses are usually averages over a certain period, such as one-hour, multi-hour, one-month and one-year averages, because instantaneous data recorded every minute or second are difficult to collect and analyse. The monitored air environment can therefore be regarded as a grey system, in which some information is known and some information is unknown[1-7]. At present, China’s air quality standard (GB3095-2012) divides air quality into two levels and stipulates the pollutant concentrations for first-level and second-level air[8,9]. The stipulated concentrations are lower for first-level air and higher for second-level air. The major pollutants include SO2, NO2, PM10, O3, and others.

However, evaluating air quality according to GB3095-2012 raises two problems. First, the common evaluation methods can only determine which level the current air is associated with. They do not indicate how strongly the current air belongs to that level or how far it is from the standard level, so the room for improving the current air quality remains vague. A method is needed to quantitatively calculate the correlation degree, which measures how close the current air is to each of the two standard levels. Second, to determine the air quality in a certain area over a period of time, the pollutant concentrations are usually monitored every day, which produces a very large amount of data. Comparing and analysing every recorded value would be an almost impossible workload. Therefore, the data are usually averaged first, and the averages are then analysed. The question is then which data should be grouped together for averaging; in other words, scientifically classifying the data is the key. Data with similar characteristics can be placed in one group, and the resulting classifications can then be analysed and evaluated on a sound basis.

At present, there are many methods for comprehensively evaluating atmospheric environmental quality, including the air pollution index (API) method, the ambient air quality index (AQI) method, the single factor index method, Green's comprehensive air pollution index method, the analytic hierarchy process (AHP), artificial neural network models, and the fuzzy comprehensive evaluation method[8]. Because their evaluation principles differ, each method has its own advantages and disadvantages. The API and AQI methods are simple, intuitive and convenient to use but are only applicable for evaluating short-term air quality in cities[9]. The single factor index method is clear and easy to implement, but it cannot consider the air quality status as a whole, and its evaluation results are one dimensional[9]. Green's comprehensive air pollution index method is easy to understand and implement, but it is only applicable to areas where coal pollution is the main pollution type[9]. The AHP is simple, practical and systematic, but its quantitative results are limited; additionally, when there are many indicators, the statistics become complex, and the weights are difficult to determine[9]. The artificial neural network evaluation method offers fast operation, self-adaptation and strong fault tolerance, but when the data are poorly correlated, the evaluation results tend to become homogeneous[10-13].

IGCFCCA is a kind of fuzzy comprehensive evaluation method based on fuzzy mathematics, the fuzzy principle and the grey close function. The method is designed for problems with uncertain and incomplete information, covering analysis, model building and forecasting. It needs only a small amount of data and can achieve good prediction results. In this paper, the IGCFCCA method is used to evaluate the air quality in Ningxia Province. The method can not only scientifically classify a large amount of data but also calculate the correlation degree between each classification and the relevant standard. This approach can provide an important basis for comprehensive environmental management. Moreover, the new method provides a scientific reference for the establishment and optimization of other industry standards in the future.

Basic principle and methods

A sample, which comes from the monitored data reports of some environmental management departments, is first classified by ideal grey close function cluster analysis. Then, the level of the sample is determined by grey correlation analysis, and comprehensive evaluation conclusions are established according to the correlation degree between the classification of the sample and the levels specified in GB3095-2012.

The classification of the sample to be evaluated

Establishing the evaluation index sequence matrix for the selected sample

Let S be the sequence of clustering objects, i.e., S = {s1, s2, …, sm}, and let X be the sequence of air-influencing variables, i.e., X = {x1, x2, …, xn}. Here, x_ik is the original monitoring datum for object s_i (i = 1, 2, …, m) and index x_k (k = 1, 2, …, n), where m is the number of objects considered in clustering and n is the number of influencing indexes, i.e., the pollutants mentioned above. Accordingly, the following m × n matrix of x_ik values can be established (Eq. 1).

Establishing the matrix of ideal-value grey close function clusters

Let X0 = {x01, x02, …, x0n} be the ideal-value sequence corresponding to each influencing index. The principle for determining the ideal value x_0k distinguishes three situations (Eqs. 2, 3, 4). In the first situation, the larger the influencing index (x_ik) is, the better the air quality is, and the ideal value is given by Eq. 2. In the second situation, the smaller the influencing index (x_ik) is, the better the air quality is, and the ideal value is given by Eq. 3. In the third situation, the air quality is best when the influencing index (x_ik) takes a moderate value, and the ideal value is given by Eq. 4. From the ideal value x_0k (Eq. 2, 3 or 4) and the original monitored data x_ik, the grey close function value y_ik is calculated by using Eq. 5, where x_ik is the original monitored datum and x_0k is the ideal value corresponding to the k-th influencing index. The function value y_ik is dimensionless, and y_ik ∈ [0, 1]. It denotes the correlation degree of s_i and the ideal object s0 for the k-th index: the larger y_ik is, the closer s_i is to s0, and the smaller y_ik is, the farther s_i is from s0. Thus, the following grey close matrix Y can be established (Eq. 6), whose elements are the grey close function values y_ik. Moreover, (y01, y02, …, y0n) = (1, 1, …, 1)1×n is the ideal sequence; the larger y_ik is, the better s_i is, and the largest possible y_ik is 1.
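As a minimal sketch of this step, the Python snippet below assumes that the grey close function in Eq. 5 is the ratio of the smaller to the larger of the monitored value and the ideal value. The equation itself is not reproduced in this text, so that form is an assumption; it is chosen because it reproduces the first row of Table 2 from the February data in Table 1. The function name `grey_close` is illustrative.

```python
def grey_close(x, x0):
    """Assumed form of Eq. 5: smaller value over larger value, so 0 < y <= 1."""
    return min(x, x0) / max(x, x0)


# February concentrations from Table 1 (SO2, NO2, PM10, PM2.5, O3)
february = [40, 27, 86, 40, 104]
# Ideal values reported in the case study (smallest value of each index)
ideal = [9, 17, 56, 25, 76]

y_february = [round(grey_close(x, x0), 3) for x, x0 in zip(february, ideal)]
print(y_february)  # [0.225, 0.63, 0.651, 0.625, 0.731]
```

A value equal to the ideal gives y = 1, and values far from the ideal approach 0, which matches the stated range of the function.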

The classification of the sample to be evaluated

Because the influence of each influencing index is different, the weight of each index needs to be considered. Let P_i be the comprehensive analysis value of s_i. P_i can be expressed as follows (Eq. 7), where W_k is the weight of the k-th influencing index; since there are n indexes, there are also n weights (W1, W2, …, Wn). Correspondingly, the following equation can be established (Eq. 8). The actual comprehensive analysis values form the vector P = (P1, P2, …, Pm)T. The following equation (Eq. 9) can be used to calculate the grey close value P_ij of P_i in relation to P_j. If the matrix of P_ij values (Eq. 10) satisfies the following three conditions: (1) reflexivity, where P_ij = 1 (i = j); (2) symmetry, where P_ij = P_ji; and (3) normativity, where P_ij ∈ [0, 1], then an appropriate threshold value λ, which is the similarity coefficient[4,5], can be selected from the P_ij matrix, the branches with values less than λ are cut off, and the classifications S_t′ (t = 1, 2, …, c) are established when the λ level meets the relevant requirement. S_t′ represents each classification of the air in a given region. The following equations (Eqs. 11, 12) can be established, where S_t′ is the t-th classification and each of its elements is the value of the k-th index for the t-th classification (t = 1, 2, …, c; k = 1, 2, …, n); c is the number of classifications, and n is the number of influencing indexes. The set of classifications can be expressed in the following matrix form (Eq. 13).
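The two calculations in this step can be sketched in Python as follows. Both forms are assumptions, because Eqs. 7 and 9 are not reproduced in this text: the comprehensive analysis value is taken as a weighted sum of the grey close function values, and the grey close value of two comprehensive values is taken as the ratio of the smaller to the larger. With the first three comprehensive values from Table 2, the ratio form gives the entries 0.7127, 0.7389 and 0.9647 listed in Table 3. Function names are illustrative.

```python
def comprehensive_value(y_row, weights):
    """Assumed form of Eq. 7: weighted sum of the grey close function values."""
    return sum(w * y for w, y in zip(weights, y_row))


def grey_close_matrix(p):
    """Assumed form of Eq. 9: P_ij = min(P_i, P_j) / max(P_i, P_j)."""
    return [[min(a, b) / max(a, b) for b in p] for a in p]


# First three comprehensive analysis values from Table 2 (S1, S2, S3)
p = [0.651, 0.464, 0.481]
matrix = grey_close_matrix(p)
for row in matrix:
    print([round(v, 4) for v in row])
```

By construction the resulting matrix has ones on the diagonal, is symmetric, and has all entries in (0, 1], so the three conditions listed above hold automatically for positive comprehensive values.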

Correlation degree analysis of the sample to be evaluated

Let S_t′ be the sample to be evaluated, and let X = (x1, x2, …, xn) be the influencing index set mentioned above, which serves as the evaluation index set for S_t′. Let S_0j′ be an air quality level stated in GB3095-2012. Then, the equation for the correlation coefficient is as follows (Eq. 14)[14], where ζ(k) is the correlation coefficient for the k-th index and ε is the resolution coefficient, with a general value of 0.5[4,5]. Moreover, the correlation degree R is calculated by using Eq. 15. The level for which R is largest is the air quality level with which the sample has the highest correlation degree, and the sample is classified accordingly.
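A minimal Python sketch of the correlation calculation is given below. It assumes the usual grey relational form for Eq. 14, with the minimum and maximum absolute differences taken over the indexes of a single sample-to-standard comparison, and assumes that Eq. 15 is the mean of the coefficients. Neither equation is reproduced in this text, so both forms are assumptions; they are chosen because they reproduce the first row of Table 6 to within rounding. The function name `grey_relational` is illustrative.

```python
def grey_relational(sample, reference, resolution=0.5):
    """Return (coefficients, degree) for one sample against one reference.

    Assumed form of Eqs. 14 and 15: min and max absolute differences are taken
    over the indexes of this one comparison; the degree is the mean coefficient.
    """
    deltas = [abs(s - r) for s, r in zip(sample, reference)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:  # identical sequences: perfect correlation
        return [1.0] * len(deltas), 1.0
    coeffs = [(d_min + resolution * d_max) / (d + resolution * d_max) for d in deltas]
    return coeffs, sum(coeffs) / len(coeffs)


# Initialized sequences from Table 5: first classification and first-level standard
s1 = [1.000, 1.322, 7.676, 2.371, 6.967]
s01 = [1.000, 2.000, 2.000, 0.750, 5.000]

coeffs, degree = grey_relational(s1, s01)
print([round(c, 3) for c in coeffs], round(degree, 3))
```

Because the inputs here are the three-decimal values printed in Table 5, individual coefficients can differ from Table 6 in the last digit, while the correlation degree rounds to the reported 0.674.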

Air quality assessment—taking Ningxia Province in China as an example

The classification of the samples to be evaluated

Monthly reports of the air quality in Ningxia Province in 2018 were provided by the Department of Ecology and Environment of Ningxia Province. The monthly report data were used to establish the cluster of samples S (Table 1) (Eq. 1). Each sample included five kinds of pollutants. Moreover, the concentrations of SO2, NO2, PM10 and PM2.5 were based on monthly averages calculated from 24-h averages, and the concentration of O3 was the monthly average calculated from the 8-h average values.

Table 1

Air quality in Ningxia Province in 2018.

Month	Monthly average concentrations of major monitored pollutants (μg/m³)
Month	SO₂ (x₁)	NO₂ (x₂)	PM₁₀ (x₃)	PM₂.₅ (x₄)	O₃ (x₅)
January			93	46
February	40	27	86	40	104
March	30	32	167	55	129
April	18	27	159	47	141
May	14	23	150	45	162
June	14	24	74	26	178
July	9	17	81	29	160
August	10	20	56	25	150
September	13	27	65	26	129
October	19	37	87	39	112
November	27	43	155	57	83
December	32	37	141	50	76

x1 is the SO2 concentration; x2 is the NO2 concentration; x3 is the PM10 concentration; x4 is the PM2.5 concentration; and x5 is the O3 concentration. For these pollutants, the lower the concentration is, the better the air quality is. As shown in Table 1, the management department provided incomplete data for January, so only the data from February to December can be effectively analysed; these eleven months form the samples S1 to S11. However, the focus of this study is on the new analysis and evaluation method (IGCFCCA), and almost all of the data can be analysed by this method. According to Eq. 3, the five ideal values are as follows: x01 is 9, x02 is 17, x03 is 56, x04 is 25, and x05 is 76. Based on the sample data in Table 1, the ideal-value grey close matrix (Eq. 6) can be obtained from Eq. 5; according to Eq. 8, the weights of x1, x2, x3, x4 and x5 are W1 = 0.06, W2 = 0.09, W3 = 0.34, W4 = 0.12, and W5 = 0.39, respectively. Consequently, the comprehensive analysis value P_i (i = 1, 2, …, 11) of each sample S_i is calculated with Eq. 7. The grey close function values y_ik (Eq. 5) and the comprehensive analysis values P_i are shown in Table 2.

Table 2

Grey close function value and the comprehensive analysis value.

Index	X₁	X₂	X₃	X₄	X₅	Comprehensive analysis value (P_i)
S₁	0.225	0.630	0.651	0.625	0.731	0.651
S₂	0.300	0.531	0.335	0.455	0.589	0.464
S₃	0.500	0.630	0.352	0.532	0.539	0.481
S₄	0.643	0.739	0.373	0.556	0.469	0.482
S₅	0.643	0.708	0.757	0.962	0.427	0.641
S₆	1.000	1.000	0.691	0.862	0.475	0.673
S₇	0.900	0.850	1.000	1.000	0.507	0.787
S₈	0.692	0.630	0.862	0.962	0.589	0.736
S₉	0.474	0.459	0.644	0.641	0.679	0.631
S₁₀	0.333	0.395	0.361	0.439	0.916	0.590
S₁₁	0.281	0.459	0.397	0.500	1.000	0.645
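The values in Table 2 can be cross-checked with a short Python sketch. It rests on three assumptions that are not confirmed by the equations in this text: the grey close function for a lower-is-better index is the ideal value divided by the monitored value, the weight of each index (Eq. 8) is its share of the total monitored concentration, and the comprehensive analysis value (Eq. 7) is the weighted sum of the grey close function values. Under these assumptions the computed weights round to the reported 0.06, 0.09, 0.34, 0.12 and 0.39, and the computed comprehensive values agree with Table 2 to within rounding.

```python
# Table 1, February to December: SO2, NO2, PM10, PM2.5, O3 (ug/m3)
data = [
    [40, 27, 86, 40, 104],
    [30, 32, 167, 55, 129],
    [18, 27, 159, 47, 141],
    [14, 23, 150, 45, 162],
    [14, 24, 74, 26, 178],
    [9, 17, 81, 29, 160],
    [10, 20, 56, 25, 150],
    [13, 27, 65, 26, 129],
    [19, 37, 87, 39, 112],
    [27, 43, 155, 57, 83],
    [32, 37, 141, 50, 76],
]

columns = list(zip(*data))
ideal = [min(col) for col in columns]  # lower is better for every pollutant
total = sum(sum(col) for col in columns)
weights = [sum(col) / total for col in columns]  # assumption: share of total concentration

# Assumed forms: y_ik = x_0k / x_ik and P_i = sum over k of W_k * y_ik
p = [sum(w * x0 / x for w, x0, x in zip(weights, ideal, row)) for row in data]

print(ideal)                           # [9, 17, 56, 25, 76]
print([round(w, 2) for w in weights])  # [0.06, 0.09, 0.34, 0.12, 0.39]
print([round(v, 3) for v in p])
```

Note that the unrounded weights are used when forming the comprehensive values; using the two-decimal weights instead shifts some results in the third decimal place.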

With P1, P2, … and P11 as known numbers, the grey close values P_ij (i, j = 1, 2, …, 11) can be calculated from Eq. 9. The corresponding elements of the grey close matrix (Eq. 10) are shown in Table 3; because the matrix is symmetric, only its lower triangle is listed.

Table 3

Grey close values P_ij.

S	S₁	S₂	S₃	S₄	S₅	S₆	S₇	S₈	S₉	S₁₀	S₁₁
S₁	1.0000
S₂	0.7127	1.0000
S₃	0.7389	0.9647	1.0000
S₄	0.7404	0.9627	0.9979	1.0000
S₅	0.9846	0.7239	0.7504	0.7520	1.0000
S₆	0.9673	0.6895	0.7147	0.7162	0.9525	1.0000
S₇	0.8272	0.5896	0.6112	0.6125	0.8145	0.8551	1.0000
S₈	0.8845	0.6304	0.6535	0.6549	0.8709	0.9144	0.9352	1.0000
S₉	0.9693	0.7353	0.7623	0.7639	0.9844	0.9376	0.8018	0.8573	1.0000
S₁₀	0.9063	0.7864	0.8153	0.8169	0.9204	0.8767	0.7497	0.8016	0.9350	1.0000
S₁₁	0.9908	0.7194	0.7457	0.7473	0.9938	0.9584	0.8196	0.8764	0.9783	0.9147	1.0000

The following information can be obtained from Table 3. If λ = 0.9[4,5], S2, S3 and S4 correspond to the first classification S1′; S7 and S8 correspond to the third classification S3′; and the other samples correspond to the second classification S2′. S2, S3 and S4 are the samples for March, April and May, respectively, and S7 and S8 are the samples for August and September, respectively. The cluster (Eq. 13) (Table 4) includes S1′, S2′ and S3′.

Table 4

The classifications of air.

Index	x₁	x₂	x₃	x₄	x₅
First classification	20.67	27.33	158.67	49.00	144.00
Second classification	23.50	30.83	104.00	40.17	118.83
Third classification	11.50	23.50	60.50	25.50	139.50
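The entries of Table 4 are consistent with simple averages of the Table 1 rows over the months assigned to each classification. The Python sketch below takes the class membership exactly as reported in the text (it does not re-derive the clustering) and assumes that each class value is the arithmetic mean of its member months; variable names are illustrative.

```python
from statistics import mean

# Table 1 rows: SO2, NO2, PM10, PM2.5, O3 (ug/m3)
table1 = {
    "February": [40, 27, 86, 40, 104],
    "March": [30, 32, 167, 55, 129],
    "April": [18, 27, 159, 47, 141],
    "May": [14, 23, 150, 45, 162],
    "June": [14, 24, 74, 26, 178],
    "July": [9, 17, 81, 29, 160],
    "August": [10, 20, 56, 25, 150],
    "September": [13, 27, 65, 26, 129],
    "October": [19, 37, 87, 39, 112],
    "November": [27, 43, 155, 57, 83],
    "December": [32, 37, 141, 50, 76],
}

# Class membership as reported in the text for the threshold 0.9
classes = {
    "first": ["March", "April", "May"],
    "second": ["February", "June", "July", "October", "November", "December"],
    "third": ["August", "September"],
}

# Assumption: each class value is the arithmetic mean over its member months
class_means = {
    name: [round(mean(col), 2) for col in zip(*(table1[m] for m in months))]
    for name, months in classes.items()
}
for name, values in class_means.items():
    print(name, values)
```

Working with three class rows instead of eleven monthly rows is what reduces the later comparison against the standard to three sequences.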

The samples (Table 1) can be divided into three classifications, and this class-based approach provides two main advantages. First, if the data for each month were compared with the air standards individually, the workload would be large, and errors would easily accumulate. In contrast, analysing only the three classifications greatly improves work efficiency. Second, this classification method can be used to establish national or local standards. For example, actual statistical data over many years can be classified by this method, and the classification results can be used as new comparison standards, which would benefit the analysis and evaluation of statistical data in the future.

Sample evaluation and correlation degree analysis

In the preceding sections, the samples from each month in 2018 were divided into three classifications (S1′, S2′ and S3′). The pollutant concentrations in the air quality standard (GB3095-2012) are used for comparison, and the comparison is shown in Fig. 1.

Figure 1

Comparison of the samples to be evaluated with the two levels of air standards.

As shown in Fig. 1, compared with the first-level air standard, the SO2 concentration of the third classification is lower, and the NO2 concentrations of all three classifications are lower. In other words, the NO2 concentration in the region meets the first-level air standard throughout the year, and the SO2 concentration meets it in August and September. Therefore, according to the first-level air standard, the region should strengthen the management of PM10, PM2.5 and O3 emissions throughout the year and the management of SO2 emissions in months other than August and September. Compared with the second-level air standard, the concentrations of SO2, NO2 and O3 of all three classifications are lower, and the concentrations of PM10 and PM2.5 of the third classification are lower. In other words, the concentrations of SO2, NO2 and O3 in the region meet the second-level air standard throughout the year, and the concentrations of PM10 and PM2.5 meet it in August and September. Therefore, according to the second-level air standard, the region should strengthen the management of PM10 and PM2.5 emissions. According to grey theory, the cluster data (S1′, S2′ and S3′) and the comparison data (S01′ and S02′ from the air quality standard) must be initialized[4,5], and the initial values are shown in Table 5.

Table 5

Data initialization results.

Index	x₁	x₂	x₃	x₄	x₅
S₁′	1.000	1.322	7.676	2.371	6.967
S₂′	1.000	1.312	4.426	1.709	5.057
S₃′	1.000	2.043	5.261	2.217	12.130
S₀₁′	1.000	2.000	2.000	0.750	5.000
S₀₂′	1.000	0.667	1.167	0.583	2.667
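Every sequence in Table 5 starts at 1.000, which is consistent with the initial-value transform commonly used in grey theory, in which each sequence is divided by its own first entry. The Python sketch below assumes that transform (the text does not state it explicitly) and applies it to the class values from Table 4; the function name `initialize` is illustrative.

```python
def initialize(sequence):
    """Initial-value transform: divide every entry by the first entry of the sequence."""
    first = sequence[0]
    return [value / first for value in sequence]


# Class values from Table 4: SO2, NO2, PM10, PM2.5, O3 (ug/m3)
first_class = [20.67, 27.33, 158.67, 49.00, 144.00]
second_class = [23.50, 30.83, 104.00, 40.17, 118.83]
third_class = [11.50, 23.50, 60.50, 25.50, 139.50]

for sequence in (first_class, second_class, third_class):
    print([round(v, 3) for v in initialize(sequence)])
```

The transform makes the sequences dimensionless and expresses each index relative to the SO2 value of the same sequence, which is why the SO2 column carries no information in the later correlation coefficients.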

According to Eqs. 14 and 15, the correlation coefficients ζ and the correlation degree R with respect to the first-level standard are shown in Table 6, and those with respect to the second-level standard are shown in Table 7.

Table 6

Correlation with the first-level air standard.

Correlation coefficient and correlation degree	ζ₁	ζ₂	ζ₃	ζ₄	ζ₅	R₁
S₁′	1.000	0.807	0.333	0.637	0.591	0.674
S₂′	1.000	0.638	0.333	0.558	0.955	0.697
S₃′	1.000	0.988	0.522	0.708	0.333	0.710

Table 7

Correlation with the second-level air standard.

Correlation coefficient and correlation degree	ζ₁	ζ₂	ζ₃	ζ₄	ζ₅	R₂
S₁′	1.000	0.832	0.333	0.646	0.431	0.648
S₂′	1.000	0.716	0.333	0.591	0.405	0.609
S₃′	1.000	0.775	0.536	0.743	0.333	0.677

According to Tables 6 and 7, each of the three classifications has a higher correlation degree with the first-level air standard than with the second-level air standard (0.674 vs. 0.648, 0.697 vs. 0.609 and 0.710 vs. 0.677). Therefore, the air quality in Ningxia Province in 2018 was associated with the first-level standard. More importantly, this result quantifies the correlation between the three classifications and the first-level air standard. The correlation degrees of the first classification, second classification and third classification with the first-level air standard are 0.674, 0.697 and 0.71, respectively. A correlation degree of 1 would mean complete agreement with the standard, so the gaps between the three classifications and the first-level air standard are 0.326, 0.303 and 0.29, respectively. The correlation degree cannot reach 1 because some pollutant concentrations in the monitored data for these classifications are lower than the first-level air standard while the remaining pollutant concentrations are higher. Therefore, there is still room to improve the air quality in the region. The region should continue to reduce the concentrations of pollutants and further improve the correlation degrees of all classifications of air with the first-level air standard.
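The decision rule and the gap calculation described above amount to a few lines of arithmetic, sketched here in Python with the correlation degrees copied from Tables 6 and 7. The helper name `assign_level` is illustrative, and the gap is taken as 1 minus the larger correlation degree, as in the text.

```python
def assign_level(r_first, r_second):
    """Pick the standard level with the larger correlation degree and report the gap to 1."""
    level = "first-level" if r_first > r_second else "second-level"
    return level, round(1 - max(r_first, r_second), 3)


# Correlation degrees (R1 from Table 6, R2 from Table 7)
degrees = {
    "first classification": (0.674, 0.648),
    "second classification": (0.697, 0.609),
    "third classification": (0.710, 0.677),
}

results = {name: assign_level(r1, r2) for name, (r1, r2) in degrees.items()}
for name, (level, gap) in results.items():
    print(name, level, gap)
```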

Conclusions

A new method of air quality assessment, IGCFCCA, is proposed. The advantage of the method is that it can quantitatively characterize the correlation degree between the current air quality and the corresponding standard level. Specifically, the results of this method indicate that the air quality in Ningxia Province in 2018 was correlated with first-level air in China’s air quality standard. The correlation degrees of the first classification, second classification and third classification of air quality with the first-level air standard are 0.674, 0.697 and 0.71, respectively. Therefore, the region should continue to reduce the concentrations of pollutants, especially PM10, PM2.5 and O3, and further improve the correlation degrees of all classifications with the first-level air standard. Notably, this method can be used in other industries. The air quality in Ningxia Province in 2018 was divided into three classifications by ideal grey close function cluster analysis. Specifically, the relatively poor air quality in March, April and May corresponds to the first classification, the comparatively better air quality in August and September corresponds to the third classification, and the air quality in the remaining months corresponds to the second classification. In addition, the classification method can be used as a reference when establishing other classification standards, such as national standards, regional standards, and industry standards.
