
Improved quantum clustering analysis based on the weighted distance and its application.

Fan Decheng¹, Song Jon¹,², Cholho Pang¹,³, Wang Dong¹, CholJin Won⁴

Abstract

Cluster analysis is widely used in fields such as economics, management and engineering. Distance and correlation are two of the most important and most frequently used similarity measures in cluster analysis. Many studies have sought to improve distance and similarity measures for high-dimensional and overlapping data. However, these studies do not consider the degree of influence (weight) of different properties on different types of data. In practice, properties differ in importance, so these methods cannot analyze real data accurately. First, this study proposes a new distance measure that reflects these weights, so that non-spherical, overlapping data in Euclidean space can be projected onto a weighted Euclidean space in which they no longer overlap. Second, the Fuzzy-ANP method is used to determine the weight of each factor. Third, the effectiveness of the resulting Fuzzy-ANP-Weighted-Distance-QC (FAWQC) method is verified on weighted random data. Finally, the method is applied to the 2015 Economics-Energy-Environment (3E) data for 19 provinces in China, both to compare classifications of the system structure and to evaluate the level of low-carbon economic development. The experimental results show that the FAWQC method analyzes real-world data more accurately than the other methods tested.


Keywords:  Economics

Year:  2018        PMID: 30761372      PMCID: PMC6275214          DOI: 10.1016/j.heliyon.2018.e00984

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Cluster analysis is an important data-mining technique whose importance is growing in machine learning, engineering, neural networks, biology, statistics, the social sciences, and economics. It has been developing for several decades, and many clustering algorithms have been published. Traditional clustering algorithms can be divided into partition-based [1, 2], hierarchical [3, 4, 5], density-based [6, 7, 8], grid-based [9, 10], and model-based [11] clustering. Each type has specific applications and shortcomings. Some algorithms suit a particular type of data or a particular distribution and handle other distributions poorly. To date, no single clustering algorithm can be applied to all problems, and almost every algorithm has some shortcoming. For example, the partition-based k-means and fuzzy c-means (FCM) algorithms depend heavily on their initial values: the number of clusters k and the set of initial cluster centers. Methods such as quantum clustering [12], spectral clustering [13, 14], granularity clustering, probabilistic graph clustering, and synchronous clustering [15, 16, 17, 18, 19] have become popular in recent years as ways to overcome the flaws of traditional algorithms. Quantum clustering uses gradient descent with a fixed learning rate to find the minima of the quantum potential, and these minima determine the cluster centers. A fixed measuring standard is then used to assign the samples to clusters. However, quantum clustering has shortcomings in execution time, in the distance matrix used for measurement, and in the number of learning iterations.
Researchers have proposed new clustering methods to address the shortcomings of QC algorithms. Li Zhi-Hua et al. [20] proposed a distance-based quantum clustering (QC) method. It keeps the advantages of standard quantum clustering, and in most cases it needs neither sample preprocessing nor a fixed clustering distance, which significantly improves efficiency. Casaña-Eslava et al. [21] proposed a method for finding suitable length parameters when QC is applied to non-spherical data. Li Yang-Yang et al. [22] proposed a QC method based on kernel component analysis. It preserves the advantages of traditional QC: it finds clusters of arbitrary shape and determines the cluster centers from the data alone, without a priori knowledge of the number of clusters. It is also applicable to high-dimensional and particularly complex datasets. All of these studies improve the QC algorithm itself without considering the importance (weight) of each property. They achieve good accuracy on standard datasets from the UCI repository. Their performance suffers on real data, however, where the distribution of every property is different and complex and the importance of every property varies. The weight of each property should therefore be reconsidered for each clustering task and context. In this paper, we first propose a weighted-distance quantum clustering method whose property weights are determined with the Fuzzy-ANP, called Fuzzy-ANP-Weighted-Distance-QC (FAWQC). Second, we apply the method to two-dimensional weighted random data to verify its effectiveness. Then, we apply it to the 2015 Economics-Energy-Environment (3E) data for 19 provinces in China for a comparative study of the classification of system structures.
Based on this classification, we evaluate the low-carbon economy development level of the provinces in the four categories. Experiments on weighted random matrices and on the 2015 3E data for 19 provinces in China demonstrate the effectiveness of the method. The method reflects real-world situations more accurately than the traditional k-means and QC methods.

Theory

Principle of quantum clustering

Quantum clustering is an inverse problem of quantum mechanics. In quantum mechanics, the potential is given, and the state of a quantum system in time and space is described by a wave function that obeys the Schrödinger differential equation. Quantum clustering reverses this. The wave function is given, and the potential function is derived from it in order to estimate how the particles (samples) are distributed. The critical step is therefore to determine the cluster centers from the potential function obtained through the Schrödinger equation, Eq. (1):

$$H\psi \equiv \left(-\frac{\hbar^2}{2m}\nabla^2 + V(x)\right)\psi = E\psi \tag{1}$$

where $H$, $E$, $\hbar$, and $m$ denote the Hamiltonian operator, the eigenvalue energy level, the reduced Planck constant, and the mass of a particle, respectively. The wave function $\psi$ describes an eigenstate of the quantum system. $V(x)$ and $\nabla^2$ are the potential function and the Laplacian, respectively. The distribution of the particles is therefore determined by the potential function. In the quantum clustering (QC) algorithm proposed by Horn and Gottlieb [12], the wave function is a sum of Gaussian kernels centered on the $N$ samples $x_i$ in $d$-dimensional space, as in Eq. (2). In this setting $\hbar^2/2m = \sigma^2/2$, where $\sigma$ is the width parameter:

$$\psi(x) = \sum_{i=1}^{N} e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{2}$$

$V(x)$ is then obtained from Eq. (1) as follows. Rewriting Eq. (1) gives

$$-\frac{\sigma^2}{2}\nabla^2\psi + V(x)\psi = E\psi \tag{3}$$

so that

$$V(x) = E + \frac{\sigma^2}{2}\,\frac{\nabla^2\psi}{\psi} \tag{4}$$

The first-order derivative of $\psi$ is

$$\nabla\psi = -\frac{1}{\sigma^2}\sum_{i=1}^{N}(x - x_i)\,e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{5}$$

and the second-order derivative is

$$\nabla^2\psi = \sum_{i=1}^{N}\left(\frac{\|x - x_i\|^2}{\sigma^4} - \frac{d}{\sigma^2}\right)e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{6}$$

Substituting Eq. (6) into Eq. (4) gives

$$V(x) = E - \frac{d}{2} + \frac{1}{2\sigma^2\psi(x)}\sum_{i=1}^{N}\|x - x_i\|^2\,e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{7}$$

The eigenvalue energy $E$ is a constant. Like $d/2$, it has no influence on the topological structure of $V(x)$, so $V(x)$ is ultimately written as Eq. (8):

$$V(x) = \frac{1}{2\sigma^2\psi(x)}\sum_{i=1}^{N}\|x - x_i\|^2\,e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{8}$$

The algorithm uses gradient descent with a fixed learning rate to find the minima of the quantum potential, and these minima determine the cluster centers. A fixed measuring standard is then used to assign the samples to clusters.
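The potential of Eq. (8) can be evaluated directly. The following is a minimal pure-Python sketch, not the authors' implementation. The function name `qc_potential` and the toy data are hypothetical, and the constant $E - d/2$ is dropped exactly as in Eq. (8).

```python
import math


def qc_potential(x, data, sigma):
    """Quantum potential V(x) of Eq. (8) at point x.

    psi(x) = sum_i exp(-||x - x_i||^2 / (2 sigma^2))                  (Eq. 2)
    V(x)   = sum_i ||x - x_i||^2 exp(-||x - x_i||^2 / (2 sigma^2))
             / (2 sigma^2 psi(x))                                     (Eq. 8)
    The constant E - d/2 of Eq. (7) is dropped, as in Eq. (8).
    """
    psi = 0.0       # Gaussian-kernel wave function, Eq. (2)
    weighted = 0.0  # squared distances weighted by the kernel terms
    for xi in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, xi))  # ||x - x_i||^2
        g = math.exp(-sq / (2.0 * sigma ** 2))          # Gaussian kernel term
        psi += g
        weighted += sq * g
    if psi == 0.0:
        # x is so far from every sample that all kernel terms underflow to 0
        return math.inf
    return weighted / (2.0 * sigma ** 2 * psi)


# Two tight groups: the potential is low inside a group and high between groups.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
v_inside = qc_potential((0.03, 0.03), data, sigma=0.5)
v_between = qc_potential((1.5, 1.5), data, sigma=0.5)
print(v_inside < v_between)  # True
```

The minima of this function are what the QC algorithm searches for with gradient descent. In the weighted-distance variant described next, they are taken among the sample points themselves.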
The algorithm has two advantages: (1) it is an unsupervised, partition-based clustering algorithm; (2) it has only one parameter, the width σ. It also has three shortcomings. (1) In most cases the samples must be preprocessed before clustering, which increases the computation time. (2) The measuring distance is relatively fixed, so different sample distributions are not distinguished. (3) When the minima of the potential function are sought by gradient descent, it is hard to know how far the learning has progressed, which affects the clustering precision.

Weighted-distance-based quantum clustering algorithm

The distance-based quantum clustering method was proposed in [20] to overcome these shortcomings. In this method, the parameter δ is first determined according to [23], and the potential is calculated to locate the cluster centers. A measuring distance β is then chosen and the cluster analysis is carried out. β is used to assign samples to clusters once the cluster centers have been determined. It has no specific selection rule and varies from one dataset to another, so it is determined through training. In the distance-based QC method, the distances between all objects (e.g., samples and cluster centers) are calculated with the distance formula in Eq. (9). The algorithm keeps the advantages of the standard QC method. In most cases it also removes the need for sample preprocessing and for a fixed clustering distance, which significantly improves efficiency. However, it does not consider the weights and characteristics of the indicator data for each property. Researchers have proposed many distance functions for data with different characteristics. Well-known examples include the Minkowski distance (Manhattan, Euclidean, supremum), the Canberra distance, the Czekanowski distance, the standardized Euclidean distance, and the cosine distance. None of these distance functions accounts for the importance of each property. To address this gap, this paper proposes a distance function, Eq. (10), that accounts for both the characteristics of the data and the weight of each property. In it, the standard deviation S reflects the characteristics of the data for each property, and the weight W reflects the importance of each property. The method used to determine the weights is described in Section 2.3.
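The extracted text does not preserve the formula of Eq. (10). The sketch below therefore assumes a common form consistent with its description (per-property standard deviation $S$ and weight $W$): $d_W(x, y) = \sqrt{\sum_k w_k \left((x_k - y_k)/s_k\right)^2}$. Both the formula and the helper names are assumptions made for illustration only.

```python
import math
import statistics


def column_stds(data):
    """Sample standard deviation S_k of each property (column)."""
    return [statistics.stdev(column) for column in zip(*data)]


def weighted_distance(x, y, weights, stds):
    """Assumed form of Eq. (10): a weighted, standardized Euclidean distance.

    d_W(x, y) = sqrt( sum_k w_k * ((x_k - y_k) / s_k) ** 2 )
    Every s_k must be non-zero (a constant column cannot be standardized).
    """
    return math.sqrt(sum(w * ((a - b) / s) ** 2
                         for a, b, w, s in zip(x, y, weights, stds)))


data = [(1.0, 2.0), (3.0, 2.0), (5.0, 8.0)]
stds = column_stds(data)  # [2.0, 3.464...]
w = [0.8, 0.2]            # hypothetical property weights
print(weighted_distance(data[0], data[1], w, stds))  # sqrt(0.8), about 0.894
```

With all weights and standard deviations equal to 1, this reduces to the ordinary Euclidean distance. Under this assumed form, it behaves like a Euclidean distance on rescaled coordinates, so non-negativity, symmetry and the triangle inequality hold.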
The distance function of Eq. (10) satisfies the three conditions of non-negativity, symmetry, and subadditivity (the triangle inequality), so it can be used to calculate the distance between two samples. Based on this distance, the algorithm proposed in this study is as follows.
Step 1. Determine the parameter δ and the weight of each attribute according to the characteristics of the data. In this paper, the weights were determined with the Fuzzy-ANP method. The parameter δ determines the number of cluster centers through the potential calculation of quantum clustering. It depends on the dimension of the data and is described in detail in [23, 24].
Step 2. Initialize $c$ and $\beta$, where $c$ is the number of clusters and $\beta$ is the scale of the measurement.
Step 3. Calculate the weighted distance of Eq. (10) between samples to obtain the measuring matrix.
Step 4. Estimate the parameter δ and calculate the potential energy $V$ of each sample (identical to the QC procedure [23]).
Step 5. Using the potential energy $V$, find the minimum $V(x_k)$. Let $c = c + 1$ and take $v_c = x_k$ as the $c$-th cluster center.
Step 6. Using the measuring matrix, assign to cluster $C_c$ every sample $x_i$ ($1 \le i \le n$, $i \ne k$) whose distance to $v_c$ does not exceed $\beta$. Then remove $x_k$ and these samples from the sample set $X$.
Step 7. If $X$ is empty, the algorithm ends; otherwise, go to Step 5.
The algorithm produces $c$ clusters with cluster centers $v_i$, $1 \le i \le c$.
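The steps above can be sketched in Python as follows. This is a hypothetical, minimal version, not the authors' Matlab code. It assumes the weighted-distance form $\sqrt{\sum_k w_k ((x_k - y_k)/s_k)^2}$ for Eq. (10) and treats δ as the Gaussian width of Eq. (2), called `width` here. It also uses a fixed β in place of the paper's training-based choice.

```python
import math
import statistics


def weighted_distance(x, y, weights, stds):
    # Assumed form of Eq. (10): weighted, standardized Euclidean distance.
    return math.sqrt(sum(w * ((a - b) / s) ** 2
                         for a, b, w, s in zip(x, y, weights, stds)))


def qc_potential(x, data, width):
    # QC potential of Eq. (8); x is itself in `data`, so psi >= 1 here.
    psi = weighted = 0.0
    for xi in data:
        sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        g = math.exp(-sq / (2.0 * width ** 2))
        psi += g
        weighted += sq * g
    return weighted / (2.0 * width ** 2 * psi)


def weighted_qc(data, weights, width, beta):
    """Steps 2-7 of the weighted-distance QC loop; returns (centers, labels)."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    n = len(data)
    stds = [statistics.stdev(col) for col in zip(*data)]
    # Step 3: measuring matrix of pairwise weighted distances
    dist = [[weighted_distance(data[i], data[j], weights, stds) for j in range(n)]
            for i in range(n)]
    # Step 4: potential energy of every sample
    pot = [qc_potential(x, data, width) for x in data]
    remaining = set(range(n))
    centers, labels = [], [None] * n
    while remaining:                              # Step 7: stop when X is empty
        k = min(remaining, key=lambda i: pot[i])  # Step 5: lowest potential left
        c = len(centers)                          # index of the new cluster
        centers.append(data[k])
        # Step 6: samples within beta of the new center join cluster c;
        # dist[k][k] == 0, so the center itself is always removed as well
        members = [i for i in remaining if dist[i][k] <= beta]
        for i in members:
            labels[i] = c
        remaining.difference_update(members)
    return centers, labels


data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
centers, labels = weighted_qc(data, weights=[0.5, 0.5], width=1.0, beta=1.0)
print(len(centers), labels)
```

Because the center's own distance is zero, each pass removes at least one sample, so the loop always terminates. The number of clusters emerges from β rather than being fixed in advance.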

Determining the weights based on Fuzzy-ANP

In the real world, most data are high-dimensional and have a certain degree of hierarchical structure. Within the hierarchy, the properties are interdependent, they have feedback relationships, and their importance differs. The evaluation system for the low-carbon economic development level is an example: there are interdependence and feedback relationships both between the subsystems and between the properties within each subsystem (see Section 3.2). Unlike the AHP, the ANP can model such interdependence and feedback, so its evaluation results are more credible and accurate for this kind of structure. However, when the traditional ANP is used to weight the indicators, the decision maker faces subjective uncertainty in evaluating each indicator quantitatively. This uncertainty may distort the overall evaluation result. To reduce this effect, this study introduces triangular fuzzy numbers into the analytic network process to determine the weight of each indicator.

Constructing a fuzzy judgment matrix based on triangular fuzzy numbers

This paper adopts the triangular fuzzy number judgment method to minimize the negative effect of experts' subjective uncertainty when elements are compared. The method computes a fuzzy comprehensive value from the triangular fuzzy numbers given by T experts. Let $X = \{x_1, x_2, \dots, x_n\}$ be the object set and $U = \{u_1, u_2, \dots, u_m\}$ the target set. The degree to which the i-th object satisfies the m targets is given by Eq. (11). The aggregated judgment in Eq. (12) is formed from the triangular fuzzy numbers given by the k-th expert, k = 1, …, T. The weight vector is then computed from these triangular fuzzy numbers. Eq. (14) defines the degree of possibility that a fuzzy number M is greater than k fuzzy numbers (Eq. (13)). The resulting weight vector is $W' = (d'(A_1), d'(A_2), \dots, d'(A_n))^T$, which yields the weight vector in Eq. (15).
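The procedure described here closely matches Chang's extent analysis for triangular fuzzy numbers. Assuming that is the intended method, a minimal sketch follows. Aggregating the T experts' judgments by an arithmetic mean is also an assumption, since Eq. (12) is not legible in this copy. The comparison matrix is hypothetical.

```python
def tfn_mean(expert_tfns):
    """Aggregate T experts' triangular fuzzy numbers (l, m, u) by their arithmetic mean."""
    t = len(expert_tfns)
    return tuple(sum(f[i] for f in expert_tfns) / t for i in range(3))


def possibility(a, b):
    """Degree of possibility V(a >= b) for triangular fuzzy numbers a and b."""
    if a[1] >= b[1]:
        return 1.0
    if b[0] >= a[2]:
        return 0.0
    return (b[0] - a[2]) / ((a[1] - a[2]) - (b[1] - b[0]))


def extent_weights(matrix):
    """Extent analysis on an n x m matrix of TFNs (n >= 2); returns normalized weights."""
    rows = [tuple(sum(t[k] for t in row) for k in range(3)) for row in matrix]
    total = tuple(sum(r[k] for r in rows) for k in range(3))
    # synthetic extent S_i = row sum (x) total^-1, using (l, m, u)^-1 = (1/u, 1/m, 1/l)
    s = [(r[0] / total[2], r[1] / total[1], r[2] / total[0]) for r in rows]
    # d'(A_i) = min over k != i of V(S_i >= S_k)
    d = [min(possibility(s[i], s[k]) for k in range(len(s)) if k != i)
         for i in range(len(s))]
    return [v / sum(d) for v in d]  # normalize W' so the weights sum to 1


# Hypothetical 3 x 3 fuzzy pairwise-comparison matrix with reciprocal TFNs.
m = [
    [(1, 1, 1), (1, 2, 3), (2, 3, 4)],
    [(1 / 3, 1 / 2, 1), (1, 1, 1), (1, 2, 3)],
    [(1 / 4, 1 / 3, 1 / 2), (1 / 3, 1 / 2, 1), (1, 1, 1)],
]
weights = extent_weights(m)
print([round(v, 3) for v in weights])
```

A known property of this method is that a dominated criterion can receive a weight of exactly zero. This is one reason the choice of aggregation and normalization deserves care.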

Computing the ANP super matrix

To determine the effect of each element on its criterion, the weight vectors obtained for the elements are combined into a supermatrix. In the ANP network structure, the control layer contains the elements $B_1, B_2, \dots, B_m$. The network layer contains the element sets $C_1, C_2, \dots, C_N$, where $C_i$ contains the elements $e_{i1}, e_{i2}, \dots, e_{in_i}$, $i = 1, 2, \dots, N$. Take $B_s$ as the criterion and $e_{jl}$ ($l = 1, 2, \dots, n_j$) as the sub-criterion. The influence of the elements of set $C_i$ on $e_{jl}$ is then compared to construct the comparison matrix $W_{ij}$ (Eq. (16)) under $B_s$, and its weight vector is calculated. Under this criterion, Eq. (17) gives the supermatrix of influences among the element sets. Next, with $B_s$ as the criterion, the importance of each element set with respect to criterion $C_j$ ($j = 1, 2, \dots, N$) is compared, which yields the weighted matrix $A = (a_{ij})$ of Eq. (18). The weighted matrix $A$ is multiplied by the unweighted supermatrix $W$ to obtain the weighted supermatrix $\bar{W} = A * W$, whose blocks are $\bar{W}_{ij} = a_{ij}W_{ij}$. Raising $\bar{W}$ to successive powers until it converges gives the steady-state supermatrix. Each row of this matrix gives the stable weight of the corresponding element in the network layer under criterion $B_s$.
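The supermatrix computation can be illustrated with a small, hypothetical example: two element sets of sizes 2 and 1, block weights $a_{ij}$, and a limit supermatrix obtained by powering $\bar{W}$. The sketch assumes $\bar{W}$ is column-stochastic and primitive, so that its powers converge. The numbers are illustrative and are not taken from the paper.

```python
def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def weighted_supermatrix(blocks, a):
    """Assemble W_bar with blocks a_ij * W_ij (blocks[i][j] has |C_i| rows, |C_j| cols)."""
    out = []
    for i, block_row in enumerate(blocks):
        for r in range(len(block_row[0])):  # one output row per element of C_i
            row = []
            for j, block in enumerate(block_row):
                row.extend(a[i][j] * v for v in block[r])
            out.append(row)
    return out


def limit_supermatrix(w_bar, tol=1e-12, max_iter=10_000):
    """Raise a column-stochastic W_bar to successive powers until the entries stop changing."""
    p = w_bar
    for _ in range(max_iter):
        nxt = matmul(w_bar, p)
        if max(abs(x - y) for rx, ry in zip(nxt, p) for x, y in zip(rx, ry)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("no convergence; the supermatrix may be periodic or reducible")


# Hypothetical unweighted blocks (each block's columns sum to 1) and weighted matrix A.
blocks = [
    [[[0.6, 0.3], [0.4, 0.7]], [[0.5], [0.5]]],  # W_11 (2x2), W_12 (2x1)
    [[[1.0, 1.0]], [[1.0]]],                     # W_21 (1x2), W_22 (1x1)
]
a = [[0.5, 0.7], [0.5, 0.3]]                     # columns of A sum to 1
w_bar = weighted_supermatrix(blocks, a)
limit = limit_supermatrix(w_bar)
print([round(row[0], 4) for row in limit])       # stable weight of each element
```

At convergence all columns of the limit matrix are identical. Reading any one column, or equivalently the common value along each row, gives the stable weights.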

Experimental

To verify the validity of the method, we first test it mathematically on theoretical two-dimensional weighted random data. We then verify it on real data: the 2015 low-carbon economy data for China.

Verifying on weighted random data

To verify the validity of the algorithm in theory, a cluster analysis was performed on two-dimensional random data, and the FAWQC results were compared with those of the QC algorithm and the traditional k-means method. First, three sets of weighted random test data were generated. Each set contained 100 random two-dimensional points with values in the range 0–6, generated with weights of 0.8 for x and 0.2 for y. The traditional k-means, QC, and FAWQC methods were then used for the cluster analysis. The results are shown in Fig. 1 and Table 1.
Fig. 1

Comparison of the cluster analysis results with two-dimensional weighted random data. (a) Distribution of the two-dimensional weighted random data; (b) FAWQC analysis result in weighted Euclidean space; (c) k-means analysis result in Euclidean space; (d) QC analysis result in Euclidean space.

Table 1

Clustering results for three cluster analysis methods.

| Method | Number of samples | Correct | Incorrect | Accuracy (%) |
| --- | --- | --- | --- | --- |
| K-means | 300 | 246 | 54 | 82 |
| QC | 300 | 262 | 38 | 87.3 |
| FAWQC | 300 | 286 | 14 | 95.3 |
Fig. 1(a) shows the distribution of the two-dimensional weighted samples in Euclidean space. The clusters clearly overlap, which makes it difficult to determine which cluster a point belongs to during the analysis. Viewed in Euclidean space, some points appear to belong to two clusters when they actually belong to one. Fig. 1(b) shows the clustering result after the samples are projected onto the weighted Euclidean space. The overlapping parts are clearly separated, and the method is more accurate than the traditional k-means and QC methods. Fig. 1(c) and (d) show the k-means and QC results in Euclidean space, respectively. These methods cannot separate the overlapping parts, which lowers their accuracy. The results show that, on this data, the method is more accurate than traditional clustering methods that do not consider weights.
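A toy example shows the effect of projecting onto a weighted Euclidean space. The sketch below uses hypothetical points and centers, not the authors' data. It assumes the projection scales each coordinate by $\sqrt{w_k}$, so that plain Euclidean distance in the projected space equals $\sqrt{\sum_k w_k (x_k - y_k)^2}$ (standard deviations are omitted for simplicity). The weights 0.8 and 0.2 are those used in the experiment.

```python
import math


def to_weighted_space(point, weights):
    """Scale each coordinate by sqrt(w_k); Euclidean distance there equals
    sqrt(sum_k w_k * (x_k - y_k) ** 2) in the original space."""
    return tuple(math.sqrt(w) * x for x, w in zip(point, weights))


def nearest_center(point, centers):
    """Index of the center closest to `point` in plain Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


weights = (0.8, 0.2)                # weights of x and y, as in the experiment
centers = [(1.0, 5.0), (3.0, 2.0)]  # hypothetical cluster centers
p = (1.5, 2.5)                      # hypothetical sample

print(nearest_center(p, centers))   # 1: closer to (3, 2) in Euclidean space
wp = to_weighted_space(p, weights)
wc = [to_weighted_space(c, weights) for c in centers]
print(nearest_center(wp, wc))       # 0: closer to (1, 5) once x dominates
```

The same point changes cluster because the heavily weighted x-coordinate now dominates the distance. This is the mechanism by which overlap in Fig. 1(a) can be resolved in Fig. 1(b).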

Verification using low-carbon data

To verify the efficacy of the method on real-world data, we use it to evaluate the low-carbon economic development level of Chinese provinces. To reflect the actual situation accurately, we built an evaluation indicator system for the low-carbon development level of the 3E system, based on previous literature and expert opinion. The system includes 3 subsystems and 16 indicators, as shown in Table 2. The ANP network structure consists of a control layer and a network layer, as shown in Fig. 2. The control layer has two parts: the goal and the decision criteria. The network layer consists of the elements under each decision criterion. Elements and element sets are linked by interdependence and feedback relationships, which form a complex network structure. The weight matrices and fuzzy relationship matrices of the primary and secondary indicators were determined by expert scoring on a 1–9 scale. The weight of each indicator was then calculated with the Fuzzy-ANP method of Section 2.3, as shown in Table 3. The resulting weight vector is W = [0.0370 0.0775 0.0362 0.1195 0.0329 0.0210 0.0632 0.0849 0.0463 0.0287 0.0749 0.1562 0.0685 0.0624 0.0597 0.0311].
Table 2

Low-carbon economy development level evaluation indicator system.

| Primary indicator | Secondary indicator | Tertiary indicator (unit) | Description | Indicator sign |
| --- | --- | --- | --- | --- |
| Low-carbon economy 3E system coordination evaluation (A) | Socioeconomic subsystem (B1) | GDP growth (%) (C11) | | + |
| | | Industry structure (%) (C12) | Tertiary industry product/GDP | + |
| | | Employment rate (%) (C13) | | + |
| | | GDP per capita (10,000 Yuan) (C14) | | + |
| | | Urbanization (%) (C15) | City population/total population | + |
| | Energy subsystem (B2) | Total energy consumption (10⁷ tce) (C21) | | |
| | | Energy efficiency (C22) | GDP growth/energy consumption growth | + |
| | | % Renewable (%) (C23) | Renewable energy/total energy consumption | + |
| | | % Fossil fuels (%) (C24) | Fossil fuel energy/total energy consumption | |
| | | Energy consumption per capita (Ton/Person) (C25) | Total energy consumption/total population | + |
| | Environmental subsystem (B3) | Carbon emission per capita (Ton/Person) (C31) | Total carbon emission/total population | |
| | | Carbon intensity (Ton/Yuan) (C32) | Carbon dioxide emission per unit GDP | |
| | | Forestation (%) (C33) | Green area/total area | + |
| | | Pollution control cost (100 million Yuan) (C34) | | |
| | | Carbon dioxide emission (Ten thousand Yuan) (C35) | | |
| | | Soot emission (Ten thousand Ton) (C36) | | |
Fig. 2

Structural model of the ANP network of the low-carbon economy development level evaluation system.

Table 3

Weight of each indicator in the low-carbon economy development level evaluation system.

| Primary indicator | Secondary indicator | Weight | Tertiary indicator | Weight | Final weight |
| --- | --- | --- | --- | --- | --- |
| Low-carbon economy 3E system coordination evaluation (A) | Socioeconomic subsystem (B1) | 0.3031 | GDP growth (C11) | 0.1221 | 0.0370 |
| | | | Industry structure (C12) | 0.2557 | 0.0775 |
| | | | Employment rate (C13) | 0.1194 | 0.0362 |
| | | | GDP per capita (C14) | 0.3942 | 0.1195 |
| | | | Urbanization (C15) | 0.1086 | 0.0329 |
| | Energy subsystem (B2) | 0.2441 | Total energy consumption (C21) | 0.0860 | 0.0210 |
| | | | Energy efficiency (C22) | 0.2589 | 0.0632 |
| | | | Renewable (C23) | 0.3478 | 0.0849 |
| | | | Fossil fuel (C24) | 0.1897 | 0.0463 |
| | | | Energy consumption per capita (C25) | 0.1176 | 0.0287 |
| | Environmental subsystem (B3) | 0.4528 | Carbon emission per capita (C31) | 0.1654 | 0.0749 |
| | | | Carbon intensity (C32) | 0.3450 | 0.1562 |
| | | | Forestation (C33) | 0.1513 | 0.0685 |
| | | | Pollution control cost (C34) | 0.1378 | 0.0624 |
| | | | Carbon dioxide emission (C35) | 0.1318 | 0.0597 |
| | | | Soot emission (C36) | 0.0687 | 0.0311 |
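The Final weight column of Table 3 is consistent with multiplying each subsystem weight by the local weight of the indicator within it. The short check below reproduces that column, and hence the weight vector W, from the other two weight columns. It uses only values copied from Table 3.

```python
# Subsystem weights and local (tertiary) weights copied from Table 3.
subsystem_weight = {"B1": 0.3031, "B2": 0.2441, "B3": 0.4528}
local = {
    "B1": {"C11": 0.1221, "C12": 0.2557, "C13": 0.1194, "C14": 0.3942, "C15": 0.1086},
    "B2": {"C21": 0.0860, "C22": 0.2589, "C23": 0.3478, "C24": 0.1897, "C25": 0.1176},
    "B3": {"C31": 0.1654, "C32": 0.3450, "C33": 0.1513, "C34": 0.1378,
           "C35": 0.1318, "C36": 0.0687},
}
reported_final = [0.0370, 0.0775, 0.0362, 0.1195, 0.0329, 0.0210, 0.0632, 0.0849,
                  0.0463, 0.0287, 0.0749, 0.1562, 0.0685, 0.0624, 0.0597, 0.0311]

# Final weight of an indicator = weight of its subsystem x its local weight.
final = [subsystem_weight[b] * w for b, group in local.items() for w in group.values()]

for computed, reported in zip(final, reported_final):
    assert abs(computed - reported) < 5e-5  # agreement up to 4-decimal rounding
print(round(sum(final), 4))  # 1.0
```

Because the subsystem weights and the local weights within each subsystem each sum to 1, the 16 final weights also sum to 1.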
The data for the 16 indicators come from the 2016 China Statistical Yearbook and the statistical yearbooks of the individual provinces. The yearbooks lack complete data for some provinces, so only 19 Chinese provinces were evaluated and analyzed. Table 4 shows the original data for the 16 indicators in the 19 provinces.
Table 4

Original data.

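| Province | GDP growth | Industrialization | Employment rate | GDP per capita | Urbanization | Total energy consumption | Energy efficiency | Fossil fuel | Renewables | Energy consumption per capita | Carbon emission per capita | Carbon intensity | Forestation | Pollution control | CO2 | Soot |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Liaoning | 3.131 | 0.462 | 96.6 | 65354.0 | 67.35 | 20.522 | 1396.982 | 95.800 | 4.200 | 0.005 | 0.883 | 0.130 | 38.240 | 44.909 | 0.023 | 0.024 |
| Jilin | 2.051 | 0.388 | 96.5 | 51086.0 | 48.40 | 8.028 | 1751.826 | 90.800 | 9.200 | 0.003 | 0.520 | 0.098 | 40.380 | 45.529 | 0.014 | 0.017 |
| Heilongjiang | 2.007 | 0.507 | 95.5 | 39462.0 | 58.80 | 12.126 | 1243.893 | 96.424 | 3.576 | 0.003 | 5.868 | 1.483 | 48.300 | 50.733 | 0.007 | 0.017 |
| Shanxi | 1.853 | 0.532 | 96.5 | 34919.0 | 55.03 | 72.489 | 176.116 | 99.400 | 0.600 | 0.020 | 3.736 | 1.072 | 18.030 | 103.364 | 0.031 | 0.040 |
| Inner Mongolia | 3.174 | 0.380 | 96.3 | 78000.0 | 60.30 | 21.556 | 858.273 | 93.200 | 6.800 | 0.009 | 1.514 | 0.206 | 21.030 | 174.146 | 0.049 | 0.035 |
| Jiangsu | 4.273 | 0.486 | 97.0 | 87995.0 | 66.52 | 30.235 | 2319.024 | 94.738 | 5.262 | 0.004 | 0.682 | 0.078 | 15.800 | 77.949 | 0.010 | 0.008 |
| Zhejiang | 12.994 | 0.498 | 97.0 | 77644.0 | 65.80 | 19.610 | 2186.970 | 98.000 | 2.000 | 0.012 | 2.223 | 0.085 | 60.960 | 356.801 | 0.033 | 0.019 |
| Anhui | 1.344 | 0.380 | 97.0 | 38000.5 | 50.50 | 13.000 | 1658.119 | 99.432 | 0.568 | 0.002 | 0.403 | 0.114 | 28.670 | 29.418 | 0.007 | 0.009 |
| Fujian | 2.812 | 0.416 | 96.5 | 67966.0 | 62.60 | 12.180 | 2132.995 | 80.100 | 19.900 | 0.003 | 0.483 | 0.071 | 65.950 | 90.365 | 0.009 | 0.009 |
| Jiangxi | 1.436 | 0.391 | 96.6 | 36724.0 | 51.62 | 8.440 | 1981.420 | 86.800 | 13.200 | 0.002 | 0.306 | 0.083 | 60.010 | 32.463 | 0.012 | 0.011 |
| Shandong | 2.898 | 0.453 | 96.6 | 64168.0 | 57.01 | 36.759 | 1713.920 | 95.509 | 4.491 | 0.004 | 0.677 | 0.106 | 16.730 | 96.063 | 0.015 | 0.011 |
| Henan | 1.427 | 0.346 | 97.1 | 40000.0 | 46.85 | 23.000 | 1695.676 | 94.200 | 5.800 | 0.002 | 0.435 | 0.106 | 21.500 | 34.899 | 0.012 | 0.009 |
| Hubei | 2.177 | 0.431 | 97.3 | 8132.72 | 56.85 | 13.828 | 2137.018 | 86.000 | 14.000 | 0.002 | 0.386 | 0.076 | 38.400 | 26.998 | 0.009 | 0.008 |
| Hunan | 1.764 | 0.442 | 95.9 | 42754.0 | 50.89 | 15.469 | 1868.443 | 78.270 | 21.730 | 0.002 | 0.318 | 0.080 | 47.770 | 27.852 | 0.008 | 0.006 |
| Guangdong | 3.442 | 0.504 | 97.5 | 67555.0 | 68.71 | 26.333 | 2831.224 | 78.000 | 22.000 | 0.002 | 0.358 | 0.052 | 51.260 | 31.815 | 0.006 | 0.003 |
| Guangxi | 1.359 | 0.388 | 97.0 | 35190.0 | 47.06 | 9.761 | 1721.516 | 64.000 | 36.000 | 0.002 | 0.247 | 0.071 | 56.510 | 51.533 | 0.009 | 0.007 |
| Sichuan | 1.600 | 0.437 | 95.9 | 36775.0 | 47.69 | 17.680 | 1699.826 | 84.900 | 15.100 | 0.002 | 0.348 | 0.095 | 35.220 | 14.415 | 0.009 | 0.005 |
| Yunnan | 1.296 | 0.451 | 96.0 | 28806.0 | 43.33 | 10.357 | 1315.028 | 57.200 | 42.800 | 0.002 | 0.237 | 0.083 | 50.030 | 45.527 | 0.012 | 0.007 |
| Shaanxi | 2.643 | 0.548 | 96.6 | 47626.0 | 53.92 | 11.716 | 1538.246 | 95.450 | 4.550 | 0.003 | 0.568 | 0.118 | 41.420 | 74.864 | 0.020 | 0.016 |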
The evaluation indicator system of the 3E system contains many indicators with different properties and units. The threshold method was therefore used to make the indicator data dimensionless. Positive indicators were standardized with Eq. (19):

$$U(x) = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{19}$$

where $U(x)$ is the standardized value, $x$ is the raw value, $x_{\max}$ is the maximum of the indicator, and $x_{\min}$ is its minimum. Negative indicators were standardized with Eq. (20):

$$U(x) = \frac{x_{\max} - x}{x_{\max} - x_{\min}} \tag{20}$$

Matlab was used to perform the cluster analysis. The clustering result divides the 19 provinces into four categories, as shown in Table 5. The close-value method [25] was then used to evaluate the development level of the four clusters produced by the cluster analysis. The close-value method is a multi-objective decision-making method from systems engineering. Its procedure is given below, where m and n denote the numbers of samples and attributes, respectively.
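Eqs. (19) and (20) translate directly into code. The following is a minimal sketch with a hypothetical function name. The three GDP growth values from Table 4 are only an illustration; the paper normalizes each indicator across all 19 provinces.

```python
def threshold_normalize(values, positive=True):
    """Threshold (min-max) normalization of one indicator column.

    positive indicator, Eq. (19): U(x) = (x - x_min) / (x_max - x_min)
    negative indicator, Eq. (20): U(x) = (x_max - x) / (x_max - x_min)
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant column: the threshold method is undefined")
    if positive:
        return [(x - lo) / (hi - lo) for x in values]
    return [(hi - x) / (hi - lo) for x in values]


# GDP growth (a positive indicator) for Liaoning, Jilin and Zhejiang, from Table 4
print(threshold_normalize([3.131, 2.051, 12.994]))
```

Both forms map each indicator onto [0, 1] with 1 as the most favorable value, so positive and negative indicators can be combined on the same scale.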
Table 5

Clustering result (weighted-distance quantum clustering).

| Category | Province | Osculating value |
| --- | --- | --- |
| 1 | Guangxi, Yunnan | 0.3111 |
| 2 | Liaoning, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong | 0 |
| 3 | Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, Hunan, Sichuan, Shaanxi | 0.1978 |
| 4 | Shanxi, Inner Mongolia | 0.7028 |
Step 1. Construct the initial decision matrix of the original data.
Step 2. Normalize the initial decision matrix to obtain R.
Step 3. Search R for the optimal decision sample $A^+$ and the pessimal decision sample $A^-$. Each attribute $j$ carries the weight $w_j$.
Step 4. Calculate $D_i^+$ and $D_i^-$, the weighted Euclidean distances from each decision sample to $A^+$ and to $A^-$.
Step 5. Calculate the close value $C_i$ of each sample and rank the samples from good to bad. The smaller the close value, the higher the quality; $C_i = 0$ marks the best sample.
Applying the osculating (close) value method to the four clusters gave C = (0.3111, 0, 0.1978, 0.7028). Ranked from good to bad by these values, the low-carbon economy development levels are 2 > 3 > 1 > 4. Fig. 3 shows the regional distribution of the clustering result. The light pink regions were not analyzed because their data are missing from the 2016 China Statistical Yearbook and the provincial statistical yearbooks.
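The formulas of the close-value steps are not legible in this copy. The sketch below therefore assumes the usual osculating value formulation:
- each column is vector-normalized;
- the optimal and pessimal samples are the column-wise maximum and minimum, so all attributes are assumed to be benefit-type after standardization;
- $D_i^+$ and $D_i^-$ are weighted Euclidean distances;
- $C_i = D_i^+/\min_k D_k^+ - D_i^-/\max_k D_k^-$.

This form has the properties stated above ($C_i \ge 0$, with $C_i = 0$ for the best sample). It is nonetheless an assumption, not a transcription of the paper's equations, and the example data are hypothetical.

```python
import math


def osculating_values(x, weights):
    """Assumed osculating (close) value method.

    x: m x n matrix (rows = samples, columns = benefit-type attributes).
    Returns C_i for each sample; smaller is better and 0 marks the best sample.
    """
    m, n = len(x), len(x[0])
    # Step 2: normalize each column by its Euclidean norm (assumed normalization)
    norms = [math.sqrt(sum(x[i][j] ** 2 for i in range(m))) for j in range(n)]
    r = [[x[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # Step 3: optimal (A+) and pessimal (A-) decision samples
    best = [max(r[i][j] for i in range(m)) for j in range(n)]
    worst = [min(r[i][j] for i in range(m)) for j in range(n)]

    def wdist(row, ref):
        # weighted Euclidean distance used in Step 4
        return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(row, ref, weights)))

    d_plus = [wdist(row, best) for row in r]
    d_minus = [wdist(row, worst) for row in r]
    # Step 5; raises ZeroDivisionError if one sample coincides with A+
    dp_min, dm_max = min(d_plus), max(d_minus)
    return [dp / dp_min - dm / dm_max for dp, dm in zip(d_plus, d_minus)]


c = osculating_values([[4.0, 3.0], [3.0, 4.0], [1.0, 1.0]], [0.5, 0.5])
print([round(v, 4) for v in c])
```

Sorting the returned values in ascending order gives the good-to-bad ranking used in the text.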
Fig. 3

Cluster analysis distribution.

To check the objectivity of the weighted-distance quantum clustering results, the traditional k-means and QC algorithms were also applied to the 19 provinces, each followed by the close-value method. The k-means results are shown in Table 6; by category, the low-carbon economy development level ranks 2 > 4 > 1 > 3. The QC results are shown in Table 7, with the ranking 2 > 3 > 1 > 4.
Table 6

Clustering Result (k-means algorithm).

| Category | Province | Osculating value |
| --- | --- | --- |
| 1 | Heilongjiang, Shanxi, Henan, Hubei | 0.7966 |
| 2 | Jiangsu, Zhejiang, Guangdong | 0 |
| 3 | Jilin, Anhui, Jiangxi, Fujian, Hunan, Sichuan, Guangxi, Yunnan | 1.0723 |
| 4 | Liaoning, Inner Mongolia, Shandong, Shaanxi | 0.7930 |
Table 7

Clustering Result (QC algorithm).

| Category | Province | Osculating value |
| --- | --- | --- |
| 1 | Shanxi, Jilin, Sichuan, Shandong, Anhui, Inner Mongolia, Shaanxi, Hubei | 0.8788 |
| 2 | Jiangsu, Guangdong | 0 |
| 3 | Henan, Heilongjiang, Jiangxi, Hunan, Zhejiang, Fujian, Liaoning | 0.2978 |
| 4 | Guangxi, Yunnan | 1.1643 |
All three methods place Jiangsu and Guangdong in the best category for low-carbon economic development. With the k-means method, the third category includes Fujian, Guangxi, and Yunnan, and the fourth includes Inner Mongolia and Shandong. With the QC method, the first category includes Shandong, Inner Mongolia, and Shanxi, and the third includes Zhejiang, Fujian, and Heilongjiang. With the weighted-distance QC, the highest areas (group A) are Liaoning, Zhejiang, Fujian, Guangdong, Jiangsu, and Shandong. They lie mainly on the eastern and southern coasts of China, where the economic and environmental indicators are higher than in other regions (see Table 4). The geographical advantages of these areas give them a stronger tertiary industry, greater technical capacity, and a better industrial and agricultural foundation. Their pollution and energy consumption are also relatively low (see Table 4). The relatively high areas (group B) are Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, Hunan, Sichuan, and Shaanxi, which lie mainly inland. These areas have high energy efficiency, abundant renewable energy, and large forest areas, but they lack talent and have a low level of science and technology. Energy-intensive industries have also moved to these areas from the eastern and southern coasts, so the share of fossil energy is high (see Table 4). The relatively low regions (group C) are Guangxi and Yunnan. Their environmental and energy indicators are relatively high, but their economic indicators (e.g., GDP per capita and urbanization) are very low because these western and southern areas are less developed (see Table 4).
Finally, in Inner Mongolia and Shanxi (the lowest regions, group D), the environmental and energy indicators are lower than in other areas, and the economy is still developing. Accordingly, Shandong should not be placed in the same category as Guangxi and Yunnan, and Heilongjiang should not be placed in the same category as Guangdong, Fujian, Zhejiang, Jiangsu, Shandong, and Liaoning. These results show that weighted-distance quantum clustering produces a classification that reflects the actual situation more objectively.

Conclusion

This paper studies quantum clustering with a weighted distance whose weights are determined by the Fuzzy-ANP method. The work has two parts. First, we improved the clustering method by introducing a new weighted distance into quantum clustering. Second, we showed that the improved method outperforms the existing methods on two-dimensional random data and on the 2015 3E data for 19 provinces in China. The method keeps the advantages of traditional quantum clustering (QC). For example, clusters of arbitrary shape and their centers can be detected from the basic information in the data itself, and the samples do not need to be preprocessed. The method also reflects the importance of each attribute to the system more accurately. The improved algorithm has two stages: weight determination and cluster analysis. However, the appropriate weighting method depends on the characteristics of the data. We applied the Fuzzy-ANP method to the 3E data of China's 19 provinces for two reasons. First, the ANP method suits these data because they have a hierarchical structure, with relationships between subsystems and between properties within and across subsystems. Second, the fuzzy method reduces the influence of subjectivity on the weights. The method is therefore best suited to high-dimensional real-world data with a hierarchical structure in which the properties are interdependent and have feedback relationships. We compared the proposed FAWQC method with the k-means and QC methods twice: first on two-dimensional random (mathematical) data, then on the 2015 3E (real) data for 19 provinces in China.
The method provides satisfactory results in terms of accuracy and processing time, but the unavoidable shortcomings of quantum clustering methods remain.

Declarations

Author contribution statement

Fan Decheng: Conceived and designed the experiments; Analyzed and interpreted the data. Song Jon: Performed the experiments; Analyzed and interpreted the data; Wrote the paper. Cholho Pang, Wang Dong, CholJin Won: Contributed reagents, materials, analysis tools or data.

Funding statement

This work was supported by National Natural Science Foundation of China (71373059) and Central University Fundamental Fund for Fundamental Research Funds (HEUCF180901).

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.