Abstract
Grey theory is an essential uncertain-knowledge acquisition method for small samples with poor information. Classic grey theory does not adequately take the distribution of the data set into account and lacks effective methods for analyzing and mining large samples at multiple granularities. In view of the universality of the normal distribution, the normality grey number is proposed. The corresponding definition and calculation method of the relational degree between normality grey numbers are then constructed. On this basis, a grey relational analytical method in multigranularity is put forward to realize automatic clustering at a specified granularity without any prior knowledge. Finally, experiments fully demonstrate that it is an effective knowledge acquisition method for big data and multigranularity samples.
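The relational degree between normality grey numbers is defined in the paper itself and is not reproduced in this record. As an illustrative stand-in consistent with Figure 3, the sketch below scores two normality grey numbers (mu, sigma) by the shared area under their normal densities, which behaves like a similarity in [0, 1]; the function names and the overlap measure are assumptions, not the paper's definition.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def overlap_degree(g1, g2, steps=20000):
    """Illustrative relational degree between two normality grey numbers
    g = (mu, sigma): the area shared by the two normal densities.
    Ranges from ~0 (disjoint) to ~1 (identical). This overlap measure is
    an assumption for illustration, not the paper's formula."""
    (m1, s1), (m2, s2) = g1, g2
    lo = min(m1 - 5 * s1, m2 - 5 * s2)   # cover +/- 5 sigma of both curves
    hi = max(m1 + 5 * s1, m2 + 5 * s2)
    h = (hi - lo) / steps
    area = 0.0
    for i in range(steps):               # midpoint-rule integration of min(pdf1, pdf2)
        x = lo + (i + 0.5) * h
        area += min(normal_pdf(x, m1, s1), normal_pdf(x, m2, s2)) * h
    return area
```

With pairs taken from the first table below, such a measure would, for instance, quantify how far the January 2007 consumption profile sits from the July 2007 one (note the tables may record variances rather than standard deviations).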
Year: 2014 PMID: 24701157 PMCID: PMC3948588 DOI: 10.1155/2014/312645
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. The typical whitenization weight function of the normality grey number.
Figure 2. Value distribution of the normal distribution.
Figure 3. Similarity between normal distributions.
Test UCI datasets.
| ID | Name of dataset | Number of items | Effective items | Attribute type |
|---|---|---|---|---|
| 1 | Individual household electric power consumption | 2,075,259 | 2,049,280 | Real |
| 2 | Dodgers loop sensor | 50,400 | 47,497 | Integer |
The converted normal grey sequence of dataset 1: average power consumption (kilowatt/minute), given as (expectation, variance) pairs.
| Month | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|
| 1 | (1.546033, 1.29201) | (1.459920, 1.2058) | (1.410202, 1.23113) | (1.430524, 1.14829) |
| 2 | (1.401083, 1.31234) | (1.181384, 1.15272) | (1.247567, 1.09718) | (1.375854, 1.06622) |
| 3 | (1.318627, 1.27604) | (1.245336, 1.14283) | (1.226734, 1.02127) | (1.130075, 0.922584) |
| 4 | (0.891188, 0.98919) | (1.115972, 1.07732) | (1.140689, 0.973492) | (1.027295, 0.836552) |
| 5 | (0.985861, 1.00641) | (1.024281, 0.964457) | (1.012855, 0.87253) | (1.095284, 0.904557) |
| 6 | (0.826814, 0.953) | (0.994096, 0.977501) | (0.840756, 0.779055) | (0.969614, 0.833514) |
| 7 | (0.667366, 0.822755) | (0.794780, 0.802558) | (0.618120, 0.628642) | (0.721067, 0.641291) |
| 8 | (0.764186, 0.896657) | (0.276488, 0.415126) | (0.664618, 0.742219) | (0.590778, 0.638138) |
| 9 | (0.969318, 1.06606) | (0.987680, 0.962621) | (0.986840, 0.936502) | (0.956442, 0.815049) |
| 10 | (1.103910, 1.11954) | (1.136768, 1.05956) | (1.144486, 1.04126) | (1.163398, 0.991783) |
| 11 | (1.294472, 1.21085) | (1.387065, 1.20713) | (1.274743, 1.10642) | (1.196854, 0.989863) |
| 12 | (1.626473, 1.3572) | (1.275189, 1.05568) | (1.364420, 1.11317) | |
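Each cell above summarizes one month of raw minute-level readings as a normality grey number. A minimal sketch of that conversion, assuming the second component is the population variance (the paper may instead use the sample variance or the standard deviation), is:

```python
import statistics

def to_normality_grey(samples):
    """Summarize a raw sample as a normality grey number: an
    (expectation, variance) pair matching the table layout above.
    Population vs. sample variance is an assumption here."""
    mu = statistics.fmean(samples)            # expectation
    var = statistics.pvariance(samples, mu)   # population variance
    return (mu, var)
```

Running this over each month's effective readings would reproduce a table of the same shape as the one above.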
Figure 4. Cluster distribution of dataset 1.
The converted normal grey sequence of dataset 2.
| Time (hour) | (Expectation, variance) |
|---|---|
| 0 | (6, 3.90564) |
| 1 | (4, 3.16916) |
| 2 | (3, 3.05947) |
| 3 | (2, 2.08197) |
| 4 | (3, 2.28398) |
| 5 | (7, 4.32885) |
| 6 | (16, 9.05729) |
| 7 | (28, 13.1354) |
| 8 | (30, 11.8488) |
| 9 | (29, 9.62101) |
| 10 | (26, 7.61425) |
| 11 | (25, 6.67066) |
| 12 | (26, 6.74319) |
| 13 | (27, 6.64526) |
| 14 | (31, 7.67199) |
| 15 | (34, 8.7623) |
| 16 | (33, 8.87094) |
| 17 | (31, 7.72262) |
| 18 | (30, 7.90683) |
| 19 | (26, 7.05241) |
| 20 | (20, 6.40364) |
| 21 | (21, 8.41713) |
| 22 | (20, 12.5605) |
| 23 | (12, 8.37453) |
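Sequences like the 24 hourly pairs above feed the clustering step. The paper's multigranularity grey relational clustering is not reproduced in this record; the greedy sketch below only illustrates the idea under an assumed overlap-based similarity, with the threshold standing in for the specified granularity.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def overlap(g1, g2, steps=4000):
    """Shared area under two normal densities: a stand-in similarity in [0, 1]."""
    (m1, s1), (m2, s2) = g1, g2
    lo = min(m1 - 5 * s1, m2 - 5 * s2)
    hi = max(m1 + 5 * s1, m2 + 5 * s2)
    h = (hi - lo) / steps
    return sum(min(normal_pdf(lo + (i + 0.5) * h, m1, s1),
                   normal_pdf(lo + (i + 0.5) * h, m2, s2))
               for i in range(steps)) * h

def grey_cluster(seq, threshold):
    """Greedy clustering of normality grey numbers (mu, sigma): each number
    joins the first cluster whose seed it overlaps above `threshold`.
    Using the threshold as the granularity knob is an assumption."""
    clusters, seeds = [], []
    for i, g in enumerate(seq):
        for members, seed in zip(clusters, seeds):
            if overlap(seed, g) >= threshold:
                members.append(i)
                break
        else:                      # no sufficiently similar cluster: open a new one
            clusters.append([i])
            seeds.append(g)
    return clusters
```

Applied to the hourly pairs above, a coarse threshold would separate the quiet night hours (small expectations) from the busy daytime hours, loosely matching the cluster distribution in Figure 5.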
Figure 5. Cluster distribution of dataset 2.
Performance comparison among the algorithms.
| Evaluation index | Dataset 1 (individual household electric power consumption) | | | Dataset 2 (Dodgers loop sensor) | | |
|---|---|---|---|---|---|---|
| | DBSCAN | | | DBSCAN | | |
| Entropy | 0.21 | 0.39 | 0.23 | 0.17 | 0.31 | 0.18 |
| Purity | 0.78 | 0.62 | 0.77 | 0.85 | 0.73 | 0.83 |
| Clusters | 3 | 4 | 4 | 7 | 8 | 8 |