Yange Sun, Zhihai Wang, Yang Bai, Honghua Dai, Saeid Nahavandi.
Abstract
Previously seen concepts often reappear in real-world data streams, a particular kind of concept drift known as recurring concepts. Unfortunately, most existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm is proposed for capturing and exploiting recurring concepts in data streams. It not only incorporates a distribution-based change detector for handling concept drift but also captures recurring concepts by storing them in a classifier graph. The ability to detect recurring drifts allows previously learnt models to be reused, enhancing the overall learning performance. Extensive experiments on both synthetic and real-world data streams reveal that the approach performs significantly better than state-of-the-art algorithms, especially when concepts reappear.
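The reuse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' RDP algorithm: it assumes a flat pool of stored models rather than the paper's classifier graph, a toy majority-label learner, and the names `ConceptPool`, `select_or_create`, and `reuse_threshold` are hypothetical. When a drift is signalled, each stored model is scored on a recent labelled window; the best one is reused if its accuracy clears the threshold, otherwise a new model is trained and stored.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]   # (features, label)
Model = Callable[[Tuple[float, ...]], int]


def majority_model(window: Sequence[Example]) -> Model:
    """Toy learner (assumption): always predicts the window's majority label."""
    label = Counter(y for _, y in window).most_common(1)[0][0]
    return lambda x: label


def accuracy(model: Model, window: Sequence[Example]) -> float:
    """Fraction of examples in the window that the model labels correctly."""
    return sum(model(x) == y for x, y in window) / len(window)


class ConceptPool:
    """Hypothetical store of previously learnt models, one per concept."""

    def __init__(self, reuse_threshold: float = 0.8):
        self.models: List[Model] = []
        self.reuse_threshold = reuse_threshold   # assumed accuracy cut-off

    def select_or_create(self, window: Sequence[Example]) -> Tuple[int, bool]:
        """Return (model index, reused?) for the concept seen in `window`."""
        scores = [accuracy(m, window) for m in self.models]
        if scores and max(scores) >= self.reuse_threshold:
            return scores.index(max(scores)), True      # recurring concept
        self.models.append(majority_model(window))      # new concept
        return len(self.models) - 1, False


pool = ConceptPool()
concept_a = [((0.0,), 0)] * 9 + [((0.0,), 1)]   # mostly label 0
concept_b = [((1.0,), 1)] * 10                  # all label 1
print(pool.select_or_create(concept_a))  # (0, False): first concept, new model
print(pool.select_or_create(concept_b))  # (1, False): unseen concept
print(pool.select_or_create(concept_a))  # (0, True): concept A recurs
```

Scoring stored models on a recent window is only one possible matching rule; the paper's graph structure may use a different criterion.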
Year: 2018 PMID: 29977276 PMCID: PMC6011096 DOI: 10.1155/2018/4276291
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1
Algorithm 1: The pseudocode of DBDM.
Figure 2: Classifier graph diagram.
Figure 3: Concept 2 recurs.
Figure 4: Concept 5 recurs.
Figure 5: New concept occurs.
Algorithm 2: The pseudocode of RDP.
Average false positive counts on stationary Bernoulli distribution.
| | 0.05 | 0.1 | 0.3 | 0.5 |
|---|---|---|---|---|
| DDM | 1.89 | 0.76 | 0.29 | 0.19 |
| EDDM | 35.56 | 36.3 | 14.42 | 9.38 |
| ECDD | 166.34 | 157.3 | 154.01 | 0.11 |
| DBDM | 3.39 | 0.95 | 0.05 | 0.03 |
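A false positive in this setting is an alarm raised on a stream whose distribution never changes. The sketch below shows how such a count could be obtained with a simplified DDM-style detector in its commonly described form: it tracks the running error rate p and its standard deviation s, remembers the minimum of p + s, and raises an alarm when p + s exceeds p_min + 3 s_min. The class name `SimpleDDM`, the warm-up length, the stream length, and the seed are assumptions; the paper's exact experimental configuration is not given in this record.

```python
import math
import random


class SimpleDDM:
    """Simplified DDM-style detector over a stream of 0/1 error indicators."""

    def __init__(self, min_samples: int = 30, drift_level: float = 3.0):
        self.min_samples = min_samples    # assumed warm-up length
        self.drift_level = drift_level    # assumed alarm multiplier
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> bool:
        """Feed one indicator; return True if a drift alarm is raised."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return False
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s          # new best operating point
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                           # start over after an alarm
            return True
        return False


def count_alarms(stream) -> int:
    """Number of alarms a fresh detector raises over the whole stream."""
    detector = SimpleDDM()
    return sum(detector.update(e) for e in stream)


rng = random.Random(0)
stationary = [int(rng.random() < 0.3) for _ in range(10_000)]  # Bernoulli(0.3)
print("alarms on stationary stream:", count_alarms(stationary))

shifted = [0] * 500 + [1] * 500   # error rate jumps from 0 to 1 at t = 500
print("alarms on shifted stream:", count_alarms(shifted))  # 1
```

Every alarm on the stationary stream is a false positive; averaging that count over many seeds would give a figure comparable in spirit to the table entries.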
Average false positive counts on sudden concept drift.
| mean value increment | 0.04 | 0.09 | 0.29 | 0.49 |
|---|---|---|---|---|
| DDM | 11.93 | 10.06 | 8.34 | 8.7 |
| EDDM | 46.00 | 45.14 | 32.63 | 25.17 |
| ECDD | 173.81 | 169.55 | 168.07 | 91.43 |
| DBDM | 4.99 | 3.76 | 7.69 | 9.77 |
Detection delays on an abrupt drift.
| mean value increment | 0.04 | 0.09 | 0.29 | 0.49 |
|---|---|---|---|---|
| DDM | 3823.96 | 1817.97 | 419.66 | 300.92 |
| EDDM | 1758.95 | 759.96 | 221.06 | |
| ECDD | | 497.99 | 522.97 | 426.8 |
| DBDM | 529.29 | | | 200 |
Average false negative counts on a gradual drift.
| Slope | 0.0001 | 0.0002 | 0.0003 | 0.0004 |
|---|---|---|---|---|
| DDM | 0.75 | 0.58 | 0.51 | 0.32 |
| EDDM | 0.98 | 0.97 | 0.91 | 0.83 |
| ECDD | 0 | 0 | 0 | 0 |
| DBDM | 0 | 0 | 0 | 0 |
Detection delays on a gradual drift.
| Slope | 0.0001 | 0.0002 | 0.0003 | 0.0004 |
|---|---|---|---|---|
| DDM | - | - | - | - |
| EDDM | - | - | - | - |
| ECDD | 523.55 | 508.5 | 515.58 | 509.25 |
| DBDM | 702 | 454 | 330 | 328 |
Description of the nine datasets.
| Dataset | Instances | Attributes | Drift type | Noise |
|---|---|---|---|---|
| HyperPlane | 1M | 10 | gradual | 5% |
| LED | 1M | 24 | sudden | 15% |
| Random Tree | 1M | 10 | recurring | 0% |
| SEA | 1M | 3 | sudden recurring | 10% |
| Elist | 1,500 | 913 | recurring | - |
| Spam | 9,324 | 850 | gradual | - |
| Usenet | 5,931 | 658 | unknown | - |
| Covertype | 581 | 53 | unknown | - |
| Gas Sensor | 13,910 | 128 | unknown | - |
Characteristics of Elist.
| | 1-300 | 300-600 | 600-900 | 900-1200 | 1200-1500 |
|---|---|---|---|---|---|
| Medicine | + | - | + | - | + |
| Space | - | + | - | + | - |
| Baseball | - | + | - | + | - |
Comparison of classification accuracy (%).
| Dataset | AWE | EB | DWM | OCBoost | RCD | RDP |
|---|---|---|---|---|---|---|
| HyperPlane | | 78.79 (4) | 75.21 (5) | 74.81 (6) | 85.64 (2) | 84.24 (3) |
| LED | 59.94 (5) | 53.48 (6) | | 62.65 (4) | 67.65 (2) | 66.89 (3) |
| Random Tree | 65.17 (4) | 66.53 (3) | 61.25 (5) | 53.67 (6) | 67.53 (2) | |
| SEA | 72.01 (6) | 77.60 (5) | | 83.89 (3) | 81.45 (4) | 84.65 (2) |
| Elist | 54.08 (5) | 65.36 (3) | 55.07 (4) | 51.45 (6) | 67.89 (2) | |
| Spam | 67.27 (6) | 70.13 (4) | | 72.79 (2) | 69.19 (5) | 71.36 (3) |
| Usenet | 61.21 (6) | | 62.76 (5) | 63.47 (4) | 70.78 (3) | 72.23 (2) |
| Covertype | 73.24 (3) | 66.86 (6) | 70.63 (4) | 69.63 (5) | 78.53 (2) | |
| Gas Sensor | 56.45 (6) | 57.07 (5) | 64.30 (2) | 59.45 (4) | 62.36 (3) | |
| Average Rank | 4.67 | 4.11 | 3.11 | 4.44 | 2.78 | |
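The Average Rank row appears to be the arithmetic mean of the per-dataset ranks shown in parentheses. As a small check, the sketch below recomputes it for RCD, the one column whose nine accuracy ranks all survive in this record; the variable names are illustrative only.

```python
from statistics import mean

# Per-dataset accuracy ranks of RCD, read from the table above in row order
# (HyperPlane, LED, Random Tree, SEA, Elist, Spam, Usenet, Covertype, Gas Sensor).
rcd_ranks = [2, 2, 2, 4, 2, 5, 3, 2, 3]

average_rank = mean(rcd_ranks)
print(f"{average_rank:.2f}")  # 2.78, matching the Average Rank row
```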
Comparison of time consumption (CPU seconds).
| Dataset | AWE | EB | DWM | OCBoost | RCD | RDP |
|---|---|---|---|---|---|---|
| HyperPlane | | 14.40 (2) | 59.01 (5) | 153.08 (6) | 29.21 (3) | 35.35 (4) |
| LED | 20.98 (5) | 31.36 (6) | | 12.43 (3) | 19.98 (4) | 11.24 (2) |
| Random Tree | 37.42 (3) | 38.18 (4) | 49.53 (5) | 78.13 (6) | 31.42 (2) | |
| SEA | 36.41 (5) | 44.56 (6) | 20.01 (2) | | 26.41 (4) | 24.05 (3) |
| Elist | 17.36 (2) | 33.45 (5) | 50.21 (6) | 22.23 (4) | 20.11 (3) | |
| Spam | 81.43 (4) | 63.01 (2) | 85.23 (5) | 86.36 (6) | | 80.25 (3) |
| Usenet | 20.21 (2) | 26.69 (3) | 30.21 (5) | 30.78 (6) | 28.01 (4) | |
| Covertype | 26.41 (4) | 13.43 (2) | 36.06 (5) | 41.56 (6) | | 14.67 (3) |
| Gas Sensor | 86.57 (3) | 90.12 (4) | 97.79 (6) | 96.41 (5) | 80.12 (2) | |
| Average Rank | 3.22 | 3.56 | 4.44 | 4.78 | 2.67 | |
Comparison of memory consumption (MB).
| Dataset | AWE | EB | DWM | OCBoost | RCD | RDP |
|---|---|---|---|---|---|---|
| HyperPlane | 25.98 (5) | 28.79 (6) | | 13.81 (2) | 13.90 (3) | 16.24 (4) |
| LED | 29.90 (5) | 39.90 (6) | 20.34 (4) | | 13.78 (3) | 6.89 (2) |
| Random Tree | 45.17 (5) | 78.53 (6) | 18.28 (2) | 34.07 (4) | 23.07 (3) | |
| SEA | 412.01 (6) | 343.30 (5) | | 68.65 (2) | 73.12 (3) | 78.21 (4) |
| Elist | 80.35 (5) | 89.45 (6) | | 34.82 (2) | 44.48 (4) | 42.58 (3) |
| Spam | 25.98 (6) | 24.79 (5) | 20.01 (3) | 22.36 (4) | | 16.24 (2) |
| Usenet | 39.90 (6) | 23.78 (5) | 22.24 (4) | | 20.24 (3) | 16.89 (2) |
| Covertype | 45.17 (5) | 78.53 (6) | | 23.07 (3) | 10.89 (2) | 26.25 (4) |
| Gas Sensor | 412.01 (5) | 543.30 (6) | 410.21 (4) | 376.89 (3) | 326.07 (2) | |
| Average Rank | 5.33 | 5.67 | | 2.44 | 2.67 | 2.56 |
Comparison of F1-measure.
| Dataset | AWE | EB | DWM | OCBoost | RCD | RDP |
|---|---|---|---|---|---|---|
| HyperPlane | | 0.097 (2) | 0.068 (6) | 0.079 (5) | 0.086 (4) | 0.094 (3) |
| LED | 0.118 (6) | 0.156 (5) | | 0.262 (3) | 0.245 (4) | 0.279 (2) |
| Random Tree | 0.451 (3) | 0.345 (4) | 0.221 (5) | 0.207 (6) | 0.477 (2) | |
| SEA | 0.089 (6) | 0.127 (5) | 0.141 (4) | 0.189 (3) | | 0.235 (2) |
| Elist | 0.098 (3) | 0.079 (4) | 0.069 (5) | 0.058 (6) | 0.156 (2) | |
| Spam | 0.027 (6) | | 0.169 (4) | 0.079 (5) | 0.248 (2) | 0.216 (3) |
| Usenet | 0.039 (5) | 0.033 (6) | | 0.047 (3) | 0.057 (2) | 0.043 (4) |
| Covertype | 0.120 (5) | 0.127 (4) | 0.136 (3) | 0.073 (6) | 0.147 (2) | |
| Gas Sensor | 0.046 (4) | 0.037 (6) | 0.055 (3) | 0.045 (5) | 0.047 (2) | |
| Average Rank | 4.33 | 4.11 | 3.56 | 4.67 | 2.33 | |
Figure 6: Accuracy on the HyperPlane.
Figure 7: Accuracy on the Random Tree.
Figure 8: Accuracy on the Elist.
Figure 9: F1-measure on the Covertype.
Figure 10: A critical difference diagram for all classifiers against each other.
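Critical difference diagrams are commonly built from the Nemenyi post-hoc test that follows a Friedman test. This record does not state which procedure the paper used, so the sketch below is an assumption-laden illustration: it computes the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N)) for the k = 6 classifiers and N = 9 datasets in the tables, taking alpha = 0.05 and the tabulated critical value 2.850 for six classifiers as assumed inputs.

```python
import math

k = 6            # classifiers compared: AWE, EB, DWM, OCBoost, RCD, RDP
n = 9            # datasets
q_alpha = 2.850  # assumed: Nemenyi critical value for k = 6 at alpha = 0.05

# Two classifiers differ significantly if their average ranks differ by
# at least this amount (under the Nemenyi test).
cd = q_alpha * math.sqrt(k * (k + 1) / (6 * n))
print(f"critical difference = {cd:.3f}")
```

Under these assumptions, two classifiers whose average ranks differ by less than `cd` would not be considered significantly different.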