Xin Liu (1,2), Xuefeng Sang (2), Jiaxuan Chang (2), Yang Zheng (2), Yuping Han (1)
Abstract
Water supply association analysis plays an important role in attributing water supply fluctuations, so carrying out effective association analysis has become a critical problem. However, current association analysis techniques are not very effective because they operate on continuous data. Continuous variables generally show varying degrees of monotonic relationship with one another, which can easily bias the analysis results. Multicollinearity among continuous variables also distorts these methods and may produce incorrect results. Moreover, these methods do not reveal the association rules and value intervals that link features to water supply. The lack of an effective analysis method therefore hinders water supply association analysis. Association rules and feature value intervals obtained from association analysis help identify the causes of water supply fluctuation and the interval within which water supply fluctuates, and so provide better support for water supply dispatching. Yet the association rules and value intervals between features and water supply are not fully understood. In this study, a data mining method coupling k-means clustering discretization with the Apriori algorithm was proposed. K-means was used to discretize the data into a one-hot encoding that Apriori can process; discretization also keeps monotonic relationships and multicollinearity from affecting the analysis results. All rules are finally validated to filter out spurious rules. The results show that the method is an effective association analysis method. It can not only obtain valid strong association rules between features and water supply, but also reveal whether the association between a feature and water supply is direct or indirect.
The method can also obtain the value intervals of features, the degree of association between features, and the confidence probability of rules.
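The abstract describes k-means as the discretization step that produces Apriori's one-hot input. A minimal sketch of that step for a single feature follows. It assumes plain Lloyd iterations, quantile-spaced starting centroids, and cut points placed midway between adjacent sorted centroids; none of these details are stated in this excerpt, and the function names are hypothetical. Labels start at 1 to match the {FP}1 notation in the rule tables.

```python
import bisect


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on one feature; returns the k sorted centroids.

    Assumption: centroids start at evenly spaced quantiles so the result is
    deterministic; the paper's initialization is not stated in this excerpt.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean; an empty cluster keeps its old centroid.
        updated = sorted(
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        )
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def discretize(values, k):
    """Return (cut points, 1-based category labels) for one feature."""
    centroids = kmeans_1d(values, k)
    # Assumption: cut points sit midway between adjacent centroids.
    cuts = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
    # bisect_left puts a value equal to a cut point in the lower, right-closed interval.
    labels = [bisect.bisect_left(cuts, v) + 1 for v in values]
    return cuts, labels


def one_hot(labels, k):
    """Encode 1-based labels as 0/1 rows, one column per category."""
    return [[1 if label == c else 0 for c in range(1, k + 1)] for label in labels]


cuts, labels = discretize([1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 10.0, 10.4, 9.8], k=3)
print(labels)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(one_hot(labels[:4], 3))  # [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0]]
```

Each one-hot column then behaves like a binary item (for example, "FP in category 1") that an Apriori implementation can count.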
Year: 2021 PMID: 34351977 PMCID: PMC8341608 DOI: 10.1371/journal.pone.0255684
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Comparison of discretization methods.
0, 1 and 2 are the category labels.
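Fig 1 is not reproduced in this excerpt, and the methods it compares are not named here. For orientation only, two widely used baselines, equal-width and equal-frequency binning, are sketched below with 0-based labels as in the caption; treating them as the figure's comparison methods is an assumption, and the sample data are made up.

```python
def equal_width_labels(values, k):
    """0-based labels from k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: one category
    # The maximum would fall at index k, so it is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_labels(values, k):
    """0-based labels giving each bin (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one outlier
print(equal_width_labels(data, 3))      # [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_labels(data, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The outlier stretches the equal-width bins so that category 1 is empty, while equal-frequency binning splits tightly grouped values apart; cluster-based cut points are one way to avoid both effects.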
Fig 2. Flow chart of the coupling method.
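Fig 2 is not reproduced here. Per the abstract, the coupled method feeds the one-hot categories produced by k-means into the Apriori algorithm. The textbook level-wise Apriori search (join, prune, count) is sketched below on a tiny hypothetical data set; the item naming such as "FP1" for {FP}1 and the support threshold are illustrative assumptions, not values from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    items = {item for t in sets for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent, size = {}, 1
    while level:
        frequent.update(level)
        size += 1
        keys = list(level)
        # Join: merge frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in keys for b in keys if len(a | b) == size}
        # Prune: drop candidates with any infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent


# Hypothetical one-hot records written as items such as "FP1" for {FP}1.
records = [
    {"FP1", "DWU1", "WS1"},
    {"FP1", "DWU1", "WS1"},
    {"FP2", "DWU2", "WS2"},
    {"FP2", "DWU1", "WS2"},
]
frequent = apriori(records, min_support=0.5)
print(frequent[frozenset({"FP1", "DWU1", "WS1"})])  # 0.5
```

Rules such as {FP}1 and {DWU}1→{WS}1 are then read off frequent itemsets that contain a WS item.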
Variable description.
| Feature | Minimum | Maximum | Mean | Standard deviation |
|---|---|---|---|---|
| DWU (10⁴ m³) | 4283.1 | 7107.69 | 5790.65 | 568.46 |
| IWU (10⁴ m³) | 2689.23 | 6066.46 | 4500.22 | 681.92 |
| SIWU (10⁴ m³) | 1663.95 | 5549.5 | 3491.74 | 914.66 |
| FP (10⁴ people) | 348.88 | 856.12 | 618.38 | 108.12 |
| R (mm) | 30.07 | 398.86 | 158.43 | 110.38 |
| WR (10⁴ m³) | 0.33 | 1696.78 | 574.17 | 566.5 |
| WS (10⁴ m³) | 10189.05 | 18686.78 | 15721.55 | 1682.78 |
Fig 3. Scatter density plot matrix.
Comparison of results for different values of D (number of discretization categories).
| D | Objective function value | Number of valid SARs | Spurious SARs present | Feature sensitivity |
|---|---|---|---|---|
| 3 | 2.71 | 23 | Yes | Strong |
| 4 | 2.17 | 20 | No | Weak |
Discretization results for D = 3.
| Feature | Category 1 | Category 2 | Category 3 |
|---|---|---|---|
| DWU (10⁴ m³) | (4283.1, 5261.45] | (5261.45, 6074.3] | (6074.3, 7107.69] |
| IWU (10⁴ m³) | (2689.23, 4154.58] | (4154.58, 4685.48] | (4685.48, 6066.46] |
| SIWU (10⁴ m³) | (1663.95, 2621.15] | (2621.15, 3958.37] | (3958.37, 5549.5] |
| FP (10⁴ people) | (348.88, 509.95] | (509.95, 676.51] | (676.51, 856.12] |
| R (mm) | (30.07, 118.32] | (118.32, 179.92] | (179.92, 398.86] |
| WR (10⁴ m³) | (0.33, 491.92] | (491.92, 618.26] | (618.26, 1696.78] |
| WS (10⁴ m³) | (10189.05, 13878.35] | (13878.35, 16709.5] | (16709.5, 18686.78] |
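With D = 3, reading a category off the table is an interval lookup. The sketch below does this for WS using the interior cut points from the table; the helper name is hypothetical. It also assigns the observed minimum itself to category 1, although the table writes the first interval as open on the left.

```python
import bisect

# Interior cut points for WS (10^4 m^3) at D = 3, copied from the table above.
WS_CUTS_D3 = [13878.35, 16709.5]
WS_RANGE = (10189.05, 18686.78)  # observed minimum and maximum of WS


def ws_category(value, cuts=WS_CUTS_D3, bounds=WS_RANGE):
    """Return the 1-based D = 3 category of a WS value."""
    lo, hi = bounds
    if not lo <= value <= hi:
        raise ValueError(f"{value} lies outside the observed range {bounds}")
    # bisect_left keeps a value equal to a cut point in the lower interval,
    # matching the right-closed intervals (a, b] in the table.
    return bisect.bisect_left(cuts, value) + 1


print(ws_category(15721.55))  # 2 (the mean WS from the variable table)
print(ws_category(13878.35))  # 1 (a cut point belongs to the lower interval)
```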
Discretization results for D = 4.
| Feature | Category 1 | Category 2 | Category 3 | Category 4 |
|---|---|---|---|---|
| DWU (10⁴ m³) | (4283.1, 5026.03] | (5026.03, 5750.3] | (5750.3, 6311.27] | (6311.27, 7107.69] |
| IWU (10⁴ m³) | (2689.23, 3685.71] | (3685.71, 4538.36] | (4538.36, 4768.16] | (4768.16, 6066.46] |
| SIWU (10⁴ m³) | (1663.95, 2468.94] | (2468.94, 3195.45] | (3195.45, 4590.63] | (4590.63, 5549.5] |
| FP (10⁴ people) | (348.88, 475.54] | (475.54, 601.22] | (601.22, 731.63] | (731.63, 856.12] |
| R (mm) | (30.07, 94.29] | (94.29, 151.94] | (151.94, 207.23] | (207.23, 398.86] |
| WR (10⁴ m³) | (0.33, 494.99] | (494.99, 524.72] | (524.72, 632.22] | (632.22, 1696.78] |
| WS (10⁴ m³) | (10189.05, 12928.82] | (12928.82, 15712.46] | (15712.46, 17393.34] | (17393.34, 18686.78] |
One-item SARs for D = 3.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {FP}1→{WS}1 | 0.13 | 0.71 | 4.73 |
| 2 | {DWU}1→{WS}1 | 0.12 | 0.66 | 4.35 |
| 3 | {SIWU}1→{WS}1 | 0.09 | 0.58 | 3.84 |
| 4 | {SIWU}3→{WS}3 | 0.26 | 0.86 | 2.81 |
| 5 | {FP}3→{WS}3 | 0.22 | 0.81 | 2.64 |
| 6 | {DWU}3→{WS}3 | 0.21 | 0.68 | 2.21 |
| 7 | {FP}2→{WS}2 | 0.44 | 0.81 | 1.49 |
| 8 | {SIWU}2→{WS}2 | 0.43 | 0.81 | 1.49 |
| 9 | {DWU}2→{WS}2 | 0.38 | 0.74 | 1.38 |
| 10 | {WR}3→{WS}2 | 0.27 | 0.62 | 1.04 |
| 11 | {R}3→{DWU}2 | 0.28 | 0.53 | 0.94 |
| 12 | {R}3→{FP}2 | 0.29 | 0.56 | 0.92 |
| 13 | {R}3→{WS}2 | 0.28 | 0.53 | 0.88 |
| 14 | {R}3→{SIWU}2 | 0.26 | 0.50 | 0.83 |
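This excerpt does not define support, confidence, or lift. Assuming the standard association-rule definitions for a rule $X \rightarrow Y$ over $N$ records, the reported columns can be cross-checked; the worked lines use rules 1 to 3 of the table above.

```latex
\begin{aligned}
\operatorname{supp}(X \rightarrow Y) &= \frac{\left|\{\text{records containing } X \cup Y\}\right|}{N},\\
\operatorname{conf}(X \rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},\qquad
\operatorname{lift}(X \rightarrow Y) = \frac{\operatorname{conf}(X \rightarrow Y)}{\operatorname{supp}(Y)},\\[4pt]
\text{Rule 1: } \operatorname{supp}(\{\mathrm{FP}\}_1) &= \frac{0.13}{0.71} \approx 0.18,\qquad
\operatorname{supp}(\{\mathrm{WS}\}_1) = \frac{0.71}{4.73} \approx 0.150,\\
\text{Rules 2, 3: } \operatorname{supp}(\{\mathrm{WS}\}_1) &= \frac{0.66}{4.35} \approx 0.152,\qquad
\frac{0.58}{3.84} \approx 0.151.
\end{aligned}
```

All three rules with consequent {WS}1 imply nearly the same base rate for {WS}1 (about 0.15), which is consistent with these definitions. Under the same definitions, a lift below 1, as in rules 11 to 14, means the antecedent co-occurs with the consequent less often than the consequent's base rate alone would suggest.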
Two-item SARs for D = 3.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {FP}1 and {DWU}1→{WS}1 | 0.11 | 0.91 | 6.04 |
| 2 | {FP}1 and {IWU}1→{WS}1 | 0.10 | 0.90 | 5.99 |
| 3 | {FP}1 and {SIWU}1→{WS}1 | 0.09 | 0.82 | 5.42 |
| 4 | {FP}3 and {SIWU}3→{WS}3 | 0.22 | 0.98 | 3.18 |
| 5 | {FP}3 and {DWU}3→{WS}3 | 0.18 | 0.95 | 3.08 |
| 6 | {R}1 and {SIWU}3→{WS}3 | 0.08 | 0.82 | 2.34 |
| 7 | {FP}2 and {IWU}2→{WS}2 | 0.12 | 0.96 | 1.77 |
| 8 | {FP}2 and {SIWU}2→{WS}2 | 0.35 | 0.85 | 1.57 |
| 9 | {FP}2 and {DWU}2→{WS}2 | 0.30 | 0.81 | 1.50 |
| 10 | {R}3 and {SIWU}2→{WS}2 | 0.23 | 0.87 | 1.43 |
| 11 | {WR}3 and {DWU}2→{WS}2 | 0.19 | 0.85 | 1.43 |
| 12 | {WR}3 and {SIWU}2→{WS}2 | 0.21 | 0.82 | 1.39 |
| 13 | {R}3 and {DWU}2→{WS}2 | 0.20 | 0.71 | 1.17 |
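The two-item table shows that a second antecedent can raise confidence: {FP}1→{WS}1 has confidence 0.71 in the one-item table, while {FP}1 and {DWU}1→{WS}1 reaches 0.91. The sketch below computes the three metrics for any antecedent set, assuming the standard definitions; the records are hypothetical and only reproduce the same qualitative effect, not the paper's numbers.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent.

    Assumes the standard definitions; antecedent and consequent must each
    occur in at least one record.
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(records)
    n_a = sum(1 for r in records if a <= r)          # records with the antecedent
    n_c = sum(1 for r in records if c <= r)          # records with the consequent
    n_ac = sum(1 for r in records if (a | c) <= r)   # records with both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = n_ac * n / (n_a * n_c)  # equals confidence / support(consequent)
    return support, confidence, lift


# Hypothetical one-hot records; item "FP1" stands for {FP}1, and so on.
records = [
    {"FP1", "DWU1", "WS1"},
    {"FP1", "DWU1", "WS1"},
    {"FP1", "DWU2", "WS2"},
    {"FP2", "DWU1", "WS2"},
    {"FP2", "DWU2", "WS2"},
]
print(rule_metrics(records, {"FP1"}, {"WS1"}))          # confidence 2/3, lift 5/3
print(rule_metrics(records, {"FP1", "DWU1"}, {"WS1"}))  # (0.4, 1.0, 2.5)
```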
One-item SARs for D = 4.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {SIWU}4→{WS}4 | 0.09 | 0.69 | 5.11 |
| 2 | {FP}4→{WS}4 | 0.09 | 0.53 | 3.91 |
| 3 | {FP}2→{WS}2 | 0.29 | 0.79 | 2.29 |
| 4 | {SIWU}2→{WS}2 | 0.20 | 0.72 | 2.10 |
| 5 | {DWU}2→{WS}2 | 0.23 | 0.68 | 1.97 |
| 6 | {FP}3→{WS}3 | 0.30 | 0.78 | 1.78 |
| 7 | {SIWU}3→{WS}3 | 0.33 | 0.74 | 1.69 |
| 8 | {DWU}3→{WS}3 | 0.22 | 0.63 | 1.45 |
| 9 | {WR}4→{WS}3 | 0.21 | 0.53 | 1.20 |
Two-item SARs for D = 4.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {FP}2 and {IWU}2→{WS}2 | 0.11 | 0.95 | 2.78 |
| 2 | {FP}2 and {DWU}2→{WS}2 | 0.19 | 0.93 | 2.69 |
| 3 | {FP}2 and {SIWU}2→{WS}2 | 0.18 | 0.78 | 2.26 |
| 4 | {R}1 and {SIWU}3→{WS}3 | 0.17 | 0.86 | 1.98 |
| 5 | {FP}3 and {SIWU}3→{WS}3 | 0.24 | 0.84 | 1.91 |
| 6 | {FP}3 and {DWU}3→{WS}3 | 0.17 | 0.80 | 1.83 |
| 7 | {R}1 and {DWU}3→{WS}3 | 0.10 | 0.77 | 1.76 |
| 8 | {R}4 and {SIWU}3→{WS}3 | 0.14 | 0.68 | 1.54 |
| 9 | {R}4 and {DWU}3→{WS}3 | 0.11 | 0.56 | 1.29 |
| 10 | {WR}4 and {SIWU}3→{WS}3 | 0.17 | 0.87 | 1.98 |
| 11 | {WR}4 and {DWU}3→{WS}3 | 0.10 | 0.83 | 1.90 |
Fig 4. Distribution plot of one-item valid SARs for D = 3.
Fig 7. Distribution plot of two-item valid SARs for D = 4.
Differences in SARs under different confidence thresholds for D = 3.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {IWU}2→{WS}2 | 0.15 | 0.49 | 0.91 |
| 2 | {IWU}1→{WS}1 | 0.12 | 0.42 | 2.77 |
| 3 | {IWU}3→{WS}3 | 0.12 | 0.29 | 0.96 |
| 4 | {R}1→{WS}3 | 0.10 | 0.18 | 0.50 |
| 5 | {R}1 and {WR}3→{WS}3 | 0.10 | 0.22 | 0.62 |
| 6 | {R}3 and {IWU}2→{WS}2 | 0.09 | 0.46 | 0.76 |
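The caption indicates that these rules differ between confidence thresholds, but the thresholds themselves are not stated in this excerpt. The sketch below filters the table's rows under hypothetical thresholds (0.4 and 0.5) and an optional lift cut, to show how the kept rule set changes; the helper name is hypothetical and "->" replaces the arrow for plain-ASCII rule names.

```python
# (rule, support, confidence, lift) rows copied from the D = 3 table above.
RULES_D3 = [
    ("{IWU}2->{WS}2", 0.15, 0.49, 0.91),
    ("{IWU}1->{WS}1", 0.12, 0.42, 2.77),
    ("{IWU}3->{WS}3", 0.12, 0.29, 0.96),
    ("{R}1->{WS}3", 0.10, 0.18, 0.50),
    ("{R}1 and {WR}3->{WS}3", 0.10, 0.22, 0.62),
    ("{R}3 and {IWU}2->{WS}2", 0.09, 0.46, 0.76),
]


def rules_passing(rules, min_conf, min_lift=None):
    """Names of rules with confidence >= min_conf and, optionally, lift > min_lift."""
    return [
        name
        for name, _support, conf, lift in rules
        if conf >= min_conf and (min_lift is None or lift > min_lift)
    ]


# Hypothetical thresholds; the excerpt does not state the values the authors used.
print(rules_passing(RULES_D3, min_conf=0.5))
print(rules_passing(RULES_D3, min_conf=0.4))
print(rules_passing(RULES_D3, min_conf=0.4, min_lift=1.0))  # ['{IWU}1->{WS}1']
```

Only {IWU}1→{WS}1 in this table also has a lift above 1, so a confidence cut alone and a combined confidence-and-lift cut keep different rules.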
Differences in SARs under different confidence thresholds for D = 4.
| Number | SAR | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {DWU}4→{WS}4 | 0.09 | 0.44 | 3.22 |
| 2 | {IWU}2→{WS}2 | 0.16 | 0.38 | 1.09 |
| 3 | {R}4→{WS}3 | 0.20 | 0.46 | 1.06 |
| 4 | {R}1→{WS}3 | 0.21 | 0.43 | 0.97 |