Chunlei Chen¹, Li He², Huixiang Zhang³, Hao Zheng⁴, Lei Wang¹.
Abstract
Incremental clustering algorithms play a vital role in applications such as massive-data analysis and real-time data processing. Typical incremental-clustering scenarios place high demands on the computing power of the hardware platform, and parallel computing is a common way to meet this demand; the General-Purpose Graphics Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when they are powered by a GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering, such as evolving granularity. Second, we formally proved two theorems. The first theorem establishes the relation between clustering accuracy and evolving granularity, and additionally gives upper and lower bounds on different-to-same mis-affiliation; fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity; smaller work-depth means superior parallelism. From the proofs we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity, while its parallelism is positively related to the granularity. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm, and experimental results verified the theoretical conclusions.
Year: 2017 PMID: 29123546 PMCID: PMC5662818 DOI: 10.1155/2017/2519782
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Relation between evolving granularity and different-to-same mis-affiliation induced by the batch-mode part.
Figure 2: Typical examples of different-to-same mis-affiliations induced by the batch-mode part.
Comparison of final cluster numbers.
| Image | Demo algorithm (2,1) | Demo algorithm (4,2) | Demo algorithm (6,4) | Benchmark algorithm (2,1) | Benchmark algorithm (4,2) | Benchmark algorithm (6,4) |
|---|---|---|---|---|---|---|
| Boat | 2372 | 1444 | 589 | 2517 | 1536 | 632 |
| Cars | 2348 | 1151 | 261 | 2481 | 1133 | 270 |
| f16 | 1388 | 749 | 406 | 1469 | 827 | 356 |
| Hill | 2466 | 1552 | 525 | 2540 | 1623 | 540 |
| Peppers | 1715 | 845 | 385 | 1838 | 943 | 413 |
| Sailboat | 2208 | 1762 | 1051 | 2326 | 1817 | 1106 |
| Stream | 2558 | 2312 | 1524 | 2656 | 2437 | 1574 |
| Tank | 2374 | 1137 | 172 | 2425 | 1146 | 178 |
| Truck | 1631 | 986 | 339 | 1700 | 996 | 338 |
| Trucks | 2607 | 2256 | 681 | 2766 | 2360 | 693 |
Dataset 2: max and min Rand Index under ascending granularities (the image attaining each value is given in parentheses).
| | Bandwidth (2,1) | Bandwidth (4,2) | Bandwidth (6,4) |
|---|---|---|---|
| Max | 0.9997 (usc2.2.05) | 0.9983 (usc2.2.17) | 0.9850 (usc2.2.17) |
| Min | 0.8369 (usc2.2.02) | 0.5025 (usc2.2.02) | 0.1501 (usc2.2.07) |
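The tables above score clustering accuracy with the Rand Index: the fraction of point pairs on which two clusterings agree (both place the pair in the same cluster, or both place it in different clusters). As a reference for how such values are computed, a minimal pairwise implementation (a sketch, not the authors' code) is:

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand Index between two clusterings given as label sequences.

    A pair (i, j) agrees if both clusterings put i and j in the same
    cluster, or both put them in different clusters.
    """
    assert len(labels_a) == len(labels_b)
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Identical partitions (up to label renaming) score 1.0;
# disagreement on some pairs lowers the score toward 0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

This O(n²) pairwise form is only practical for small inputs; contingency-table formulations compute the same quantity in linear time in the number of points plus cluster counts.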
Figure 4: Dataset 1: variation trends of Rand Index with respect to evolving granularity.
Figure 5: truck: original image and incremental clustering results under three ascending granularity values.
Figure 6: usc2.2.02: original image and incremental clustering results under three ascending granularity values.
Figure 3: Dataset 1: variation trends of PFR and SSR.
Dataset 2: max and min SSR values under ascending granularities.
| | Bandwidth (2,1) | Bandwidth (4,2) | Bandwidth (6,4) | Bandwidth (8,6) | Bandwidth (10,8) |
|---|---|---|---|---|---|
| Min | 3.7 | 2.1 | 6.3 | 6.1 | 1.2 |
| Max | 1.0 | 9.5 | 6.2 | 3.3 | 2.1 |