Sana Qaiyum¹, Izzatdin Aziz¹, Mohd Hilmi Hasan¹, Asif Irshad Khan², Abdulmohsen Almalawi²
Abstract
Data streams create new challenges for fuzzy clustering algorithms, specifically Interval Type-2 Fuzzy C-Means (IT2FCM). One problem with IT2FCM is that it is sensitive to initialization and can therefore fail to reach the global optimum. This problem has been addressed by optimizing IT2FCM with Ant Colony Optimization (ACO). However, IT2FCM-ACO computes clusters over the whole dataset, which is unsuitable for large streaming datasets that arrive continuously and evolve over time, so the clusters themselves also evolve over time. In addition, because of its size, the incoming data may not fit in memory all at once. To address the challenges of a large data stream environment, we propose improving IT2FCM-ACO so that it generates clusters incrementally. The proposed algorithm first determines appropriate cluster centers on a certain percentage of the available data; the resulting centroids are then combined with newly arriving data points to generate the next set of cluster centers. The process continues until all the data have been scanned. Previously processed data points are released from memory, which reduces time and space complexity. The proposed incremental method produces data partitions comparable to those of IT2FCM-ACO. Its performance is evaluated on large real-life datasets. Results from several fuzzy cluster validity indices show that the proposed method outperforms the other clustering algorithms compared. The proposed algorithm also reduces run time and achieves substantial speed-ups on all datasets.
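The chunk-by-chunk procedure described in the abstract can be illustrated with a minimal sketch. To keep it short, it swaps in ordinary type-1 fuzzy c-means (fixed fuzzifier m = 2, no ant colony optimization, no interval type-2 memberships), so it is not the authors' spIT2FCM-ACO. The carry-forward rule, where each centre re-enters the next chunk as a pseudo-point weighted by the total membership it absorbed, is an assumption, and all function names are hypothetical.

```python
import numpy as np

def _init_centres(X, c, rng):
    """Farthest-point seeding: start from a random point, then repeatedly
    add the point farthest from the centres chosen so far."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(c - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].astype(float)

def weighted_fcm(X, w, c, m=2.0, V0=None, max_iter=100, tol=1e-5, seed=0):
    """Type-1 fuzzy c-means in which row i of X carries weight w[i]."""
    rng = np.random.default_rng(seed)
    V = _init_centres(X, c, rng) if V0 is None else np.array(V0, dtype=float)
    for _ in range(max_iter):
        # Squared distances (n, c); the tiny offset avoids division by zero.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # v_k = sum_i w_i u_ik^m x_i / sum_i w_i u_ik^m
        Wm = w[:, None] * U ** m
        V_new = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return V, U

def single_pass_fcm(chunks, c, m=2.0):
    """Cluster a stream chunk by chunk, keeping only weighted centres."""
    V, w_V = None, None
    for chunk in chunks:
        X = np.asarray(chunk, dtype=float)
        w = np.ones(len(X))
        if V is not None:
            # Previous centres re-enter as weighted pseudo-points.
            X = np.vstack([V, X])
            w = np.concatenate([w_V, w])
        # Warm-start from the previous centres (None on the first chunk).
        V, U = weighted_fcm(X, w, c, m, V0=V)
        # A centre's weight is the membership mass it absorbed.
        w_V = (w[:, None] * U).sum(axis=0)
        # The raw chunk is not kept; only V and w_V carry forward.
    return V

# Usage: two well-separated blobs streamed in six chunks.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, (300, 2)), rng.normal(10.0, 0.5, (300, 2))])
rng.shuffle(data)
centres = single_pass_fcm(np.array_split(data, 6), c=2)
```

Only `c` centres and their weights survive between chunks, which is what keeps memory bounded regardless of stream length.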
Keywords: ant colony optimization; data stream; incremental learning; interval type-2 fuzzy c-means
Year: 2020; PMID: 32517018; PMCID: PMC7309007; DOI: 10.3390/s20113210
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. Single-pass clustering approach to IT2FCM-ACO.
Parameter initialization for spIT2FCM-ACO.
| Parameter | Value |
|---|---|
| Maximum iteration | 1000 |
| First fuzzifier | 1.7 |
| Second fuzzifier | 2.6 |
| Termination condition (min_impro) | 1 × 10⁻⁵ |
| Pheromone evaporation rate | 0.05 |
| Parameter to avoid division by 0 | 0.01 |
| Parameter controlling speed of convergence | 1.0 |
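The two fuzzifiers in the table (1.7 and 2.6) are what make the clustering interval type-2. A minimal sketch, assuming the usual IT2FCM construction in which the lower and upper memberships are the elementwise minimum and maximum of the type-1 FCM memberships computed with each fuzzifier; the function names are hypothetical and the sketch stops before type reduction and centre updates.

```python
import numpy as np

def fcm_membership(d2, m):
    """Type-1 FCM memberships from squared distances d2 of shape (n, c)."""
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def it2_membership(X, V, m1=1.7, m2=2.6, eps=1e-12):
    """Lower and upper memberships of an interval type-2 fuzzy set built
    from two fuzzifiers (defaults taken from the parameter table)."""
    # eps keeps distances strictly positive so the power is defined.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + eps
    u1 = fcm_membership(d2, m1)
    u2 = fcm_membership(d2, m2)
    return np.minimum(u1, u2), np.maximum(u1, u2)

# One point at distance 1 from centre 0 and distance 2 from centre 1.
X = np.array([[1.0, 0.0]])
V = np.array([[0.0, 0.0], [3.0, 0.0]])
lower, upper = it2_membership(X, V)
```

For this point the membership in the nearer centre is the interval of about [0.70, 0.88]: the smaller fuzzifier gives the crisper (upper) value, and the larger one pulls memberships toward 1/c.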
Summary of datasets.
| Dataset | Attributes | Instances | Classes |
|---|---|---|---|
| Airlines | 7 | 539,384 | 2 |
| Forest | 54 | 581,012 | 7 |
| Sea | 3 | 1,000,001 | 2 |
| Poker | 10 | 1,025,010 | 10 |
| Electricity | 9 | 2,075,259 | 2 |
| KDD Cup | 41 | 494,020 | 23 |
Evaluation of SC index.
| Algorithm | Airlines | Sea | Poker | Forest | Electricity | KDD Cup |
|---|---|---|---|---|---|---|
| IT2FCM-AO | 1.57 ± 1.01 | 0.82 ± 0.35 | 2.12 ± 0.48 | 2.28 ± 1.21 | 1.20 ± 1.21 | 1.07 ± 0.21 |
| GAIT2FCM | 1.62 ± 0.34 | 1.83 ± 0.29 | 2.65 ± 0.18 | 3.32 ± 0.18 | 1.33 ± 1.01 | 1.23 ± 0.15 |
| IT2FCM-ACO | 2.5 ± 0.04 | 2.32 ± 0.11 | 3.21 ± 0.05 | 4.28 ± 0.11 | 1.94 ± 0.52 | 2.18 ± 0.09 |
| spIT2FCM-ACO | 2.57 ± 0.01 | 2.45 ± 0.02 | 3.61 ± 0.03 | 4.74 ± 0.02 | 2.17 ± 0.08 | 2.89 ± 0.06 |
Figure 2. Analysis of PI for SC in different algorithms over IT2FCM-AO.
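The source does not define PI. As a hedged illustration only, the sketch below assumes PI is the relative change of an index over the IT2FCM-AO baseline, in percent, with the sign flipped for lower-is-better indices such as ER. SC is treated as higher-is-better, consistent with the ordering in the SC table, and the Airlines values are copied from it. The function name is hypothetical.

```python
def percentage_improvement(value, baseline, higher_is_better=True):
    """Relative improvement of `value` over `baseline`, in percent
    (assumed definition; not stated in the source)."""
    delta = value - baseline if higher_is_better else baseline - value
    return 100.0 * delta / abs(baseline)

# SC on Airlines, copied from the SC table; IT2FCM-AO is the baseline.
sc_airlines = {"GAIT2FCM": 1.62, "IT2FCM-ACO": 2.5, "spIT2FCM-ACO": 2.57}
baseline = 1.57
pi = {name: percentage_improvement(v, baseline) for name, v in sc_airlines.items()}
for name, value in pi.items():
    print(f"{name}: {value:.2f}%")
```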
Figure 3. Analysis of PI for PCAES in different algorithms over IT2FCM-AO.
Evaluation of PCAES index.
| Algorithm | Airlines | Sea | Poker | Forest | Electricity | KDD Cup |
|---|---|---|---|---|---|---|
| IT2FCM-AO | 1.54 ± 0.33 | 7.25 ± 0.21 | 5.02 ± 0.19 | 1.42 ± 1.20 | 0.94 ± 0.75 | 1.68 ± 0.92 |
| GAIT2FCM | 1.68 ± 0.58 | 7.38 ± 0.96 | 6.30 ± 0.17 | 1.64 ± 1.08 | 1.23 ± 0.4 | 1.92 ± 0.51 |
| IT2FCM-ACO | 2.74 ± 0.09 | 8.30 ± 0.47 | 6.91 ± 0.02 | 2.56 ± 1.04 | 1.57 ± 0.34 | 3.49 ± 0.20 |
| spIT2FCM-ACO | 2.76 ± 0.06 | 8.45 ± 0.07 | 6.99 ± 0.01 | 2.72 ± 1.02 | 2.10 ± 0.32 | 3.82 ± 0.10 |
Evaluation of FRI.
| Algorithm | Sea | Airlines | Poker | Forest | Electricity | KDD Cup |
|---|---|---|---|---|---|---|
| IT2FCM-AO | 0.67 ± 0.12 | 0.68 ± 0.05 | 0.56 ± 0.03 | 0.46 ± 0.17 | 0.47 ± 0.15 | 0.53 ± 0.11 |
| GAIT2FCM | 0.72 ± 0.07 | 0.73 ± 0.04 | 0.78 ± 0.03 | 0.78 ± 0.13 | 0.65 ± 0.13 | 0.72 ± 0.10 |
| IT2FCM-ACO | 0.82 ± 0.02 | 0.85 ± 0.02 | 0.89 ± 0.01 | 0.92 ± 0.12 | 0.83 ± 0.10 | 0.88 ± 0.04 |
| spIT2FCM-ACO | 0.84 ± 0.01 | 0.87 ± 0.01 | 0.93 ± 0.001 | 0.94 ± 0.11 | 0.89 ± 0.05 | 0.91 ± 0.001 |
Figure 4. Analysis of PI for FRI in different algorithms over IT2FCM-AO.
Evaluation of ER.
| Algorithm | Airlines | Sea | Poker | Forest | Electricity | KDD Cup |
|---|---|---|---|---|---|---|
| IT2FCM-AO | 1.19 ± 0.21 | 0.39 ± 0.05 | 1.13 ± 0.13 | 1.45 ± 1.10 | 1.16 ± 0.03 | 1.46 ± 0.24 |
| GAIT2FCM | 0.29 ± 0.12 | 0.25 ± 0.03 | 0.24 ± 0.09 | 1.12 ± 0.90 | 0.98 ± 0.13 | 1.22 ± 0.11 |
| IT2FCM-ACO | 0.19 ± 0.52 | 0.15 ± 0.002 | 0.14 ± 0.05 | 0.13 ± 0.30 | 0.17 ± 0.13 | 0.11 ± 0.06 |
| spIT2FCM-ACO | 0.17 ± 0.03 | 0.14 ± 0.001 | 0.12 ± 0.03 | 0.07 ± 0.23 | 0.12 ± 0.10 | 0.11 ± 0.03 |
Figure 5. Analysis of PI for ER in different algorithms over IT2FCM-AO.
Evaluation of run time (s).
| Algorithm | Airlines | Sea | Poker | Forest | Electricity | KDD Cup |
|---|---|---|---|---|---|---|
| IT2FCM-AO | 82.18 ± 1.21 | 48.18 ± 0.21 | 116.60 ± 0.36 | 181.69 ± 1.68 | 397.06 ± 1.23 | 341.90 ± 0.98 |
| GAIT2FCM | 76.67 ± 0.78 | 32.25 ± 0.12 | 139.67 ± 0.24 | 130.95 ± 1.42 | 309.27 ± 1.10 | 303.14 ± 0.27 |
| IT2FCM-ACO | 144.54 ± 0.50 | 70.45 ± 0.04 | 254.38 ± 0.08 | 229.13 ± 1.35 | 413.94 ± 0.78 | 428.69 ± 0.02 |
| spIT2FCM-ACO | 65.84 ± 0.003 | 15.03 ± 0.004 | 105.49 ± 0.06 | 119.68 ± 0.07 | 212.67 ± 0.03 | 116.24 ± 0.004 |
Evaluation of speed-up.
| Dataset | spIT2FCM-ACO over IT2FCM-AO | spIT2FCM-ACO over GAIT2FCM | spIT2FCM-ACO over IT2FCM-ACO |
|---|---|---|---|
| Sea | 3.21 | 2.15 | 4.69 |
| Airlines | 1.25 | 1.16 | 2.19 |
| Forest | 1.51 | 1.16 | 1.91 |
| Poker | 1.10 | 1.24 | 2.41 |
| Electricity | 1.87 | 1.45 | 1.95 |
| KDD cup | 2.94 | 2.60 | 3.68 |
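A minimal sketch, assuming each speed-up is the ratio of a baseline's mean run time to that of spIT2FCM-ACO. Under that assumption the Sea and Electricity rows reproduce from the run-time table to two decimals. The dictionary layout and function name are hypothetical.

```python
# Mean run times in seconds, copied from the run-time table.
run_time = {
    "Sea": {"IT2FCM-AO": 48.18, "GAIT2FCM": 32.25,
            "IT2FCM-ACO": 70.45, "spIT2FCM-ACO": 15.03},
    "Electricity": {"IT2FCM-AO": 397.06, "GAIT2FCM": 309.27,
                    "IT2FCM-ACO": 413.94, "spIT2FCM-ACO": 212.67},
}

def speed_ups(times, target="spIT2FCM-ACO"):
    """Speed-up of `target` over every other algorithm: t_other / t_target."""
    t_target = times[target]
    return {name: t / t_target for name, t in times.items() if name != target}

for dataset, times in run_time.items():
    s = speed_ups(times)
    print(dataset, {name: round(v, 2) for name, v in s.items()})
```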
Figure 6. Analysis of run time for different percentages of the Airlines dataset.
Figure 7. Analysis of run time for different percentages of the Forest dataset.
Figure 8. Analysis of run time for different percentages of the Sea dataset.
Figure 9. Analysis of run time for different percentages of the Poker dataset.
Figure 10. Analysis of run time for different percentages of the Electricity dataset.
Figure 11. Analysis of run time for different percentages of the KDD Cup dataset.
Ranking of algorithms based on FRI, ER, and run time (RT); each cell gives the ranks as (FRI, ER, RT).
| Dataset | IT2FCM-AO | GAIT2FCM | IT2FCM-ACO | spIT2FCM-ACO |
|---|---|---|---|---|
| Airlines | (4, 4, 3) | (3, 3, 2) | (2, 2, 4) | (1, 1, 1) |
| Sea | (4, 4, 3) | (3, 3, 2) | (2, 2, 4) | (1, 1, 1) |
| Forest | (4, 4, 3) | (3, 3, 2) | (2, 2, 4) | (1, 1, 1) |
| Poker | (4, 4, 2) | (3, 3, 3) | (2, 2, 4) | (1, 1, 1) |
| Electricity | (4, 4, 3) | (3, 3, 2) | (2, 2, 4) | (1, 1, 1) |
| KDD cup | (4, 4, 3) | (3, 3, 2) | (2, 2.5, 4) | (1, 2.5, 1) |
| Average Rank | (4, 4, 2.83) | (3, 3, 2.17) | (2, 2.08, 4) | (1, 1.25, 1) |
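The per-dataset ranks and their averages can be recomputed with a short helper, assuming the usual Friedman convention: the best value receives rank 1 and tied values share the mean of the positions they occupy. Applied to the run-time table (lower is better), it gives the RT averages 2.83, 2.17, 4 and 1 reported above. The function name is hypothetical.

```python
def average_ranks(values, higher_is_better):
    """Rank one row of scores (rank 1 = best); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Run times (s) per dataset, columns AO, GA, ACO, spACO (lower is better).
rt = [
    [82.18, 76.67, 144.54, 65.84],     # Airlines
    [48.18, 32.25, 70.45, 15.03],      # Sea
    [116.60, 139.67, 254.38, 105.49],  # Poker
    [181.69, 130.95, 229.13, 119.68],  # Forest
    [397.06, 309.27, 413.94, 212.67],  # Electricity
    [341.90, 303.14, 428.69, 116.24],  # KDD Cup
]
per_dataset = [average_ranks(row, higher_is_better=False) for row in rt]
avg_rank = [sum(col) / len(per_dataset) for col in zip(*per_dataset)]
print([round(r, 2) for r in avg_rank])
```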
Evaluation of performance difference between spIT2FCM-ACO and other algorithms (z values).
| Comparison | FRI | ER | RT |
|---|---|---|---|
| spIT2FCM-ACO vs. IT2FCM-AO | 4.02 | 3.69 | 2.45 |
| spIT2FCM-ACO vs. GAIT2FCM | 2.68 | 2.35 | 1.57 |
| spIT2FCM-ACO vs. IT2FCM-ACO | 1.34 | 1.45 | 4.02 |
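The FRI column is consistent with the standard post-hoc statistic for comparing two average Friedman ranks, z = (R_i − R_j) / sqrt(k(k + 1) / (6N)), with k = 4 algorithms and N = 6 datasets. The sketch below assumes that statistic and takes the average FRI ranks from the ranking table; the function name is hypothetical.

```python
import math

def friedman_z(rank_a, rank_b, k, n):
    """z statistic for two average Friedman ranks (k algorithms, n datasets)."""
    se = math.sqrt(k * (k + 1) / (6.0 * n))
    return (rank_a - rank_b) / se

# Average FRI ranks from the ranking table; spIT2FCM-ACO has average rank 1.
avg_fri = {"IT2FCM-AO": 4.0, "GAIT2FCM": 3.0, "IT2FCM-ACO": 2.0}
for name, r in avg_fri.items():
    z = friedman_z(r, 1.0, k=4, n=6)
    print(f"spIT2FCM-ACO vs. {name}: z = {z:.2f}")
```

The resulting z values can then be compared against a normal critical value, adjusted for the number of comparisons, to decide significance.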