Hui Zou, Zhihong Zou, Xiaojing Wang.
Abstract
Growing data volume and complexity, driven by an uncertain environment, are today's reality. To identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm adopts a varying-weights K-means clustering algorithm to analyze water monitoring data. The weighting scheme is the best set of indicator weights selected by a modified indicator weight self-adjustment algorithm based on K-means, named MIWAS-K-means. The new clustering algorithm avoids cases in which the margin of the iteration is not calculated. With this fast clustering analysis, the quality of water samples can be identified. The algorithm is applied to water quality data from the Haihe River (China), collected by the monitoring network over eight years (2006-2013) with four indicators at seven sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
Keywords: indicator weight; local optimization; water classification
Year: 2015 PMID: 26569283 PMCID: PMC4661655 DOI: 10.3390/ijerph121114400
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Location of the Haihe River in China and location of the monitoring stations.
Boundary values of some indicators in the GB3838-2002 water quality standard.
| Indicator | I | II | III | IV | V |
|---|---|---|---|---|---|
| DO (mg/L) | 7.5 | 6 | 5 | 3 | 2 |
| COD (mg/L) | 2 | 4 | 6 | 10 | 15 |
| NH3-N (mg/L) | 0.15 | 0.5 | 1 | 1.5 | 2 |
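The boundary values above can be turned into a simple lookup. The sketch below is illustrative only: it assumes that DO limits are lower bounds (higher is better), that COD and NH3-N limits are upper bounds, and that the worst indicator decides the overall class (a single-factor rule that is not stated in this record). The function names `indicator_class` and `sample_class` are hypothetical.

```python
# Boundary values copied from the table above (mg/L), classes I..V.
CLASSES = ["I", "II", "III", "IV", "V"]
LIMITS = {
    "DO": [7.5, 6, 5, 3, 2],          # assumed lower bounds: higher DO is better
    "COD": [2, 4, 6, 10, 15],         # assumed upper bounds
    "NH3-N": [0.15, 0.5, 1, 1.5, 2],  # assumed upper bounds
}


def indicator_class(indicator, value):
    """Return the best class whose limit the value satisfies."""
    for cls, limit in zip(CLASSES, LIMITS[indicator]):
        ok = value >= limit if indicator == "DO" else value <= limit
        if ok:
            return cls
    return "worse than V"


def sample_class(sample):
    """Single-factor rule (assumption): the worst indicator decides."""
    order = CLASSES + ["worse than V"]
    return max((indicator_class(k, v) for k, v in sample.items()),
               key=order.index)


# Example: the overall mean values reported for the data set.
print(sample_class({"DO": 9.02, "COD": 3.51, "NH3-N": 0.40}))
```

pH is left out because the table above gives no class boundaries for it.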
Descriptive statistics of water quality indicators.
| Indicator | Mean | SD | SE | Minimum | Maximum |
|---|---|---|---|---|---|
| pH | 8.07 | 0.43 | 0.01 | 6.34 | 9.35 |
| DO (mg/L) | 9.02 | 2.83 | 0.06 | 2.02 | 25.5 |
| COD (mg/L) | 3.51 | 2.40 | 0.05 | 0.2 | 15 |
| NH3-N (mg/L) | 0.40 | 0.44 | 0.01 | 0.01 | 2 |
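The SE column is consistent with the usual relation SE = SD / sqrt(n). The short check below assumes n = 2078 for every indicator (the sample count given in the abstract, with no missing values); that assumption is not confirmed by this record.

```python
import math

N_SAMPLES = 2078  # sample count from the abstract (assumed for all indicators)

# Standard deviations copied from the table above.
sd = {"pH": 0.43, "DO": 2.83, "COD": 2.40, "NH3-N": 0.44}


def standard_error(s, n=N_SAMPLES):
    """Standard error of the mean for standard deviation s and n samples."""
    return s / math.sqrt(n)


se = {name: round(standard_error(s), 2) for name, s in sd.items()}
print(se)
```

Rounded to two decimals, the computed values match the SE column of the table.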
Figure 2. The pseudo-code for the MIWAS-K-means algorithm.
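The pseudo-code itself is not reproduced in this record. As a rough orientation only, the sketch below shows a plain K-means loop with fixed feature weights applied to the squared Euclidean distance. It is not MIWAS-K-means: the weight self-adjustment step is omitted because its details are not available here, and the names `weighted_sq_dist` and `weighted_kmeans` are hypothetical.

```python
import random


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per feature."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def weighted_kmeans(data, k, weights, max_iter=100, seed=0):
    """K-means with fixed feature weights (weights are NOT adapted here)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the weighted distance.
        new_labels = [
            min(range(k),
                key=lambda j: weighted_sq_dist(x, centers[j], weights))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    sse = sum(weighted_sq_dist(x, centers[lab], weights)
              for x, lab in zip(data, labels))
    return labels, centers, sse


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, sse = weighted_kmeans(points, k=2, weights=(0.5, 0.5))
print(labels, sse)
```

With equal weights this reduces to ordinary K-means up to a constant factor in the distance; unequal weights let one indicator dominate the assignment.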
The performance for the two clustering algorithms.
| Clustering Algorithms | K-Means | MIWAS-K-Means |
|---|---|---|
| SSE | 899.6053 | 782.2792 |
| Number of iterations | 12 | 18 |
| Final feature weights | (0.25, 0.25, 0.25, 0.25) | (0.1602, 0.1978, 0.5116, 0.1303) |
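A quick calculation puts the two rows of this table in perspective: the relative drop in SSE and the sum of the final weights. This record does not say whether both SSE values were computed with the same distance measure, so the comparison should be read with that caveat.

```python
# Values copied from the table above.
sse_kmeans = 899.6053
sse_miwas = 782.2792
weights = (0.1602, 0.1978, 0.5116, 0.1303)

# Relative SSE reduction of MIWAS-K-means versus plain K-means.
reduction = (sse_kmeans - sse_miwas) / sse_kmeans
weight_sum = sum(weights)

print(f"relative SSE reduction: {reduction:.1%}")
print(f"sum of weights: {weight_sum:.4f}")
```

The reduction is about 13%, obtained at the cost of six more iterations (18 versus 12), and the weights sum to 1 up to rounding.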
Weights of indicators calculated by improved weighted K-means algorithm.
| Indicators | pH | DO | COD | NH3-N |
|---|---|---|---|---|
| Weights | 0.1602 | 0.1978 | 0.5116 | 0.1303 |
Mean values of water quality features and numbers of cases in five clusters.
| | Cl.1 | Cl.2 | Cl.3 | Cl.4 | Cl.5 |
|---|---|---|---|---|---|
| pH | 7.89 ± 0.39 | 8.12 ± 0.40 | 8.12 ± 0.44 | 8.3 ± 0.35 | 7.7 ± 0.43 |
| DO | 9.43 ± 1.99 | 9.65 ± 2.40 | 8.75 ± 2.44 | 8.49 ± 2.99 | 3.97 ± 1.61 |
| COD | 1.45 ± 0.33 | 2.38 ± 0.37 | 4.71 ± 0.11 | 8.95 ± 2.25 | 7.20 ± 2.46 |
| NH3-N | 0.17 ± 0.22 | 0.24 ± 0.28 | 0.54 ± 0.53 | 0.92 ± 0.85 | 1.37 ± 1.10 |
| Number of cases | 502 | 700 | 545 | 194 | 137 |
Correct and wrong assignments (%) obtained by leave-one-out cross-validation (LOOCV).
| Cluster | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 97.4 | 2.6 | 0 | 0 | 0 |
| 2 | 3.4 | 94.6 | 0 | 2 | 0 |
| 3 | 0 | 0 | 96.9 | 2.1 | 1 |
| 4 | 0 | 5 | 1.1 | 92.8 | 1.1 |
| 5 | 0 | 0 | 2.9 | 3.6 | 93.4 |
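An overall accuracy can be estimated from this matrix by weighting each diagonal entry by its cluster size. The sketch assumes that each row is one cluster expressed in percent of its cases, and that the cluster sizes are those in the "Number of cases" row of the cluster means table; neither reading is confirmed by this record.

```python
# Percentages copied from the LOOCV table above (rows assumed to be clusters).
confusion_pct = [
    [97.4, 2.6, 0, 0, 0],
    [3.4, 94.6, 0, 2, 0],
    [0, 0, 96.9, 2.1, 1],
    [0, 5, 1.1, 92.8, 1.1],
    [0, 0, 2.9, 3.6, 93.4],
]
# Cluster sizes copied from the "Number of cases" row (assumed to match rows).
cluster_sizes = [502, 700, 545, 194, 137]

# Each row should add up to roughly 100% (rounding aside).
row_sums = [sum(row) for row in confusion_pct]

# Size-weighted share of correctly assigned cases, in percent.
correct = sum(row[i] * n
              for i, (row, n) in enumerate(zip(confusion_pct, cluster_sizes)))
overall = correct / sum(cluster_sizes)

print(row_sums)
print(f"overall correct assignment: {overall:.1f}%")
```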
Mean values with standard deviation of water quality indicators in 7 sites.
| Site | pH | DO | COD | NH3-N |
|---|---|---|---|---|
| Yanhecheng | 8.22 ± 0.47 | 8.97 ± 1.94 | 3.31 ± 1.38 | 0.24 ± 0.18 |
| Gubeikou | 7.91 ± 0.41 | 8.64 ± 1.79 | 2.10 ± 1.09 | 0.19 ± 0.10 |
| Gangnanshuiku | 7.91 ± 0.29 | 9.74 ± 1.45 | 1.75 ± 0.30 | 0.07 ± 0.04 |
| Guoheqiao | 8.18 ± 0.39 | 10.3 ± 3.04 | 2.56 ± 0.81 | 0.31 ± 0.17 |
| Sanchakou | 8.19 ± 0.45 | 8.66 ± 3.7 | 6.95 ± 2.65 | 0.83 ± 0.70 |
| Bahaoqiao | 7.88 ± 0.40 | 7.30 ± 1.81 | 4.54 ± 1.39 | 1.11 ± 0.79 |
| Chenggouwan | 8.15 ± 0.46 | 4.22 ± 3.33 | 10.1 ± 3.38 | 1.92 ± 1.52 |
Number of observations in each cluster of the seven monitoring sites.
| Site | Sum | Cl.1 | Cl.2 | Cl.3 | Cl.4 | Cl.5 |
|---|---|---|---|---|---|---|
| Yanhecheng | 353 | 32 | 150 | 155 | 14 | 2 |
| Gubeikou | 372 | 160 | 166 | 42 | 2 | 2 |
| Gangnanshuiku | 354 | 238 | 116 | 0 | 0 | 0 |
| Guoheqiao | 392 | 68 | 240 | 82 | 1 | 1 |
| Sanchakou | 326 | 0 | 9 | 102 | 143 | 72 |
| Bahaoqiao | 238 | 4 | 19 | 163 | 23 | 29 |
| Chenggouwan | 43 | 0 | 0 | 1 | 11 | 31 |
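The counts in this table can be cross-checked against the other figures in the record. The sketch below verifies that each site total equals the sum of its cluster counts, that the column sums equal the cluster sizes reported with the cluster means, and that the grand total is the 2078 samples given in the abstract.

```python
# (site total, [Cl.1 .. Cl.5]) copied from the table above.
counts = {
    "Yanhecheng": (353, [32, 150, 155, 14, 2]),
    "Gubeikou": (372, [160, 166, 42, 2, 2]),
    "Gangnanshuiku": (354, [238, 116, 0, 0, 0]),
    "Guoheqiao": (392, [68, 240, 82, 1, 1]),
    "Sanchakou": (326, [0, 9, 102, 143, 72]),
    "Bahaoqiao": (238, [4, 19, 163, 23, 29]),
    "Chenggouwan": (43, [0, 0, 1, 11, 31]),
}
# Cluster sizes copied from the "Number of cases" row of the cluster table.
cluster_cases = [502, 700, 545, 194, 137]

row_ok = all(total == sum(per_cluster)
             for total, per_cluster in counts.values())
column_sums = [sum(col)
               for col in zip(*(pc for _, pc in counts.values()))]
grand_total = sum(total for total, _ in counts.values())

print(row_ok, column_sums, grand_total)
```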