Affan Ahmed Toor, Muhammad Usman, Farah Younas, Alvis Cheuk M Fong, Sajid Ali Khan, Simon Fong.
Abstract
With the increasing popularity of the Internet of Medical Things (IoMT) and smart devices, huge volumes of data streams are being generated. This study addresses concept drift, a major challenge in processing voluminous data streams. Concept drift refers to a change in the data distribution over time. It may occur in the medical domain, for example when medical sensors that normally take measurements for general healthcare or rehabilitation switch roles to support ICU emergency operations. Detecting concept drift becomes trickier when the class distributions in the data are skewed, which is often true of e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate and does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the capabilities of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.
Keywords: IoMT; class imbalance; concept drift; data stream mining; machine learning
Year: 2020 PMID: 32283841 PMCID: PMC7180875 DOI: 10.3390/s20072131
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
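To make the idea of error-rate-based drift detection concrete, the following is a minimal sketch of the classic Drift Detection Method (DDM), which RDDM is commonly described as extending. It is not the ERDDM or RDDM algorithm from this paper: the class name `SimpleDDM`, its parameter defaults (30 warm-up instances, 2 and 3 standard deviations), and the deterministic demo stream are all assumptions chosen for illustration.

```python
import math


class SimpleDDM:
    """DDM-style detector over a 0/1 error stream (illustrative, not ERDDM)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # warm-up before any decision
        self.warning_level = warning_level  # std-dev multiplier for warnings
        self.drift_level = drift_level      # std-dev multiplier for drifts
        self.reset()

    def reset(self):
        self.n = 0               # instances seen in the current concept
        self.p = 0.0             # running error rate
        self.p_min = math.inf    # error rate at the best point so far
        self.s_min = math.inf    # its standard deviation

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; returns the state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(self.p * (1.0 - self.p), 0.0) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        level = self.p + s
        if level > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if level > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical demo stream: 10% errors for 1000 instances, then 50% errors.
errors = [1 if i % 10 == 0 else 0 for i in range(1, 1001)]
errors += [1 if i % 2 == 0 else 0 for i in range(1001, 1501)]

detector = SimpleDDM()
states = [detector.update(e) for e in errors]
drifts = [i + 1 for i, s in enumerate(states) if s == "drift"]
print("first drift signalled at instance", drifts[0])
```

On this demo stream the detector stays quiet during the first concept, raises warnings shortly after instance 1000, and signals a drift a few dozen instances later. The gap between the true change point and the signal is the detection delay that the tables below report.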
Figure 1. Infrastructure of Internet of Medical Things (IoMT).
Synthetic balanced dataset comparison.
| Datasets | Drift Type | Attributes | Sizes | Class Values |
|---|---|---|---|---|
| Agrawal | Abrupt and Gradual | 10 | 50k, 100k, 500k, 1M, 2M, 3M | groupA, groupB |
| LED | Abrupt and Gradual | 8 | 50k, 100k, 500k, 1M, 2M, 3M | 0,1,2,3,4,5,6,7,8,9 |
| Mixed | Abrupt and Gradual | 11 | 50k, 100k, 500k, 1M, 2M, 3M | class1, class2 |
| RandomTree | Abrupt and Gradual | 11 | 50k, 100k, 500k, 1M, 2M, 3M | class1, class2 |
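The difference between abrupt and gradual drift in such synthetic streams can be sketched with a sigmoid mixing function, in the style of MOA's `ConceptDriftStream` generator. The drift position, widths, and the two toy concepts below are placeholders, not the settings used in the paper.

```python
import math
import random


def drift_probability(t, position, width):
    """Probability that instance t is drawn from the new concept.

    width <= 1 models an abrupt switch; larger widths give a gradual one.
    """
    if width <= 1:
        return 1.0 if t >= position else 0.0
    x = -4.0 * (t - position) / width
    x = max(min(x, 700.0), -700.0)  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(x))


def drifting_stream(n, position, width, seed=0):
    """Yield (feature, label, from_new_concept) for a toy two-concept stream."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        from_new = rng.random() < drift_probability(t, position, width)
        # Old concept: x < 0.5 is class1. New concept: labels are swapped.
        is_class1 = (x < 0.5) != from_new
        yield x, ("class1" if is_class1 else "class2"), from_new


abrupt = list(drifting_stream(2000, position=1000, width=1))
gradual = list(drifting_stream(2000, position=1000, width=500))
print(sum(new for _, _, new in abrupt[:1000]))   # old concept only before 1000
print(sum(new for _, _, new in gradual[:1000]))  # some early new-concept draws
```

With `width=1` every instance before the drift position comes from the old concept and every later one from the new concept. With a wider window the two concepts are interleaved around the drift position, which is what makes gradual drift harder to localize.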
Synthetic imbalanced dataset comparison.
| Dataset | Imbalance Ratio | Attributes | Sizes | Class Values |
|---|---|---|---|---|
| Agrawal | groupA: 90% | 10 | 50k, 100k, 150k | groupA, groupB |
| Mixed | class1: 90% | 11 | 50k, 100k, 150k | class1, class2 |
| RandomTree | class1: 90% | 11 | 50k, 100k, 150k | class1, class2 |
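One simple way to obtain a 90% majority class from a roughly balanced generator is rejection sampling, sketched below. This is an assumption about how such a ratio could be produced, not a description of how the authors built their datasets. The function name `imbalance` and the toy source stream are hypothetical.

```python
import random


def imbalance(stream, majority_label, majority_share=0.9, seed=0):
    """Resample (x, y) pairs so majority_label makes up about majority_share."""
    rng = random.Random(seed)
    source = iter(stream)
    while True:
        want_majority = rng.random() < majority_share
        # Skip source instances until one of the wanted kind shows up.
        for x, y in source:
            if (y == majority_label) == want_majority:
                yield x, y
                break
        else:
            return  # source exhausted


# Toy balanced source: labels alternate between class1 and class2.
balanced = ((i, "class1" if i % 2 == 0 else "class2") for i in range(20000))
skewed = list(imbalance(balanced, majority_label="class1"))
share = sum(1 for _, y in skewed if y == "class1") / len(skewed)
print(len(skewed), round(share, 3))
```

The order of the retained instances is preserved, so any drift present in the source stream carries over to the resampled one.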
Prediction error percentage in synthetic imbalanced datasets.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 0.386 | 0.0089 | 0.455 | 0.0095 | 1.396 | 0.0691 | 0.4996 | 0.985 |
| Ab-100k | Gr-100k | 0.391 | 0.0044 | 0.456 | 0.0047 | 1.381 | 0.0345 | 0.5006 | 0.992 |
| Ab-150k | Gr-150k | 0.398 | 0.0029 | 0.457 | 0.0031 | 1.371 | 0.0230 | 0.4998 | 0.995 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 0.386 | 0.0089 | 0.455 | 0.0095 | 1.396 | 0.0691 | 0.4996 | 0.985 |
| Ab-100k | Gr-100k | 0.391 | 0.0044 | 0.456 | 0.0047 | 1.381 | 0.0345 | 0.5006 | 0.992 |
| Ab-150k | Gr-150k | 0.398 | 0.0029 | 0.457 | 0.0031 | 1.371 | 0.0230 | 0.4998 | 0.995 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 0.386 | 0.0089 | 0.455 | 0.0095 | 1.396 | 0.0691 | 0.4996 | 0.985 |
| Ab-100k | Gr-100k | 0.391 | 0.0044 | 0.456 | 0.0047 | 1.381 | 0.0345 | 0.5006 | 0.992 |
| Ab-150k | Gr-150k | 0.398 | 0.0029 | 0.457 | 0.0031 | 1.371 | 0.0230 | 0.4998 | 0.995 |
Average prediction error in synthetic balanced datasets.
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 0.3863 | 0.4552 | 1.39 | 0.4996 | 0.386 | 0.455 | 1.40 | 0.499 |
| Abr-100k | 0.3915 | 0.455 | 1.38 | 0.501 | 0.391 | 0.4554 | 1.38 | 0.506 |
| Abr-500k | 0.3909 | 0.456 | 1.378 | 0.4997 | 0.3909 | 0.456 | 1.37 | 0.49 |
| Abr-1M | 0.3901 | 0.456 | 1.381 | 0.499 | 0.3901 | 0.456 | 1.38 | 0.499 |
| Abr-2M | 0.391 | 0.455 | 1.386 | 0.4998 | 0.391 | 0.455 | 1.38 | 0.49 |
| Abr-3M | 0.391 | 0.456 | 1.385 | 0.50 | 0.391 | 0.456 | 1.38 | 0.51 |
| Gr-50k | 0.0089 | 0.0095 | 0.069 | 0.985 | 0.0089 | 0.0095 | 0.069 | 0.985 |
| Gr-100k | 0.0044 | 0.0047 | 0.034 | 0.992 | 0.0045 | 0.0047 | 0.034 | 0.992 |
| Gr-500k | 0.000089 | 0.000095 | 0.0069 | 0.9985 | 0.000089 | 0.000095 | 0.0069 | 0.9985 |
| Gr-1M | 0.000088 | 0.0000475 | 0.00345 | 0.9991 | 0.000088 | 0.0000475 | 0.0035 | 0.9991 |
| Gr-2M | 0.0000224 | 0.0000237 | 0.00172 | 0.9996 | 0.0000224 | 0.0000237 | 0.0017 | 0.9996 |
| Gr-3M | 0.0000149 | 0.0000158 | 0.00115 | 0.9999 | 0.0000149 | 0.0000158 | 0.0012 | 0.9999 |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 0.386 | 0.455 | 1.39 | 0.4996 | 0.386 | 0.455 | 1.39 | 0.4996 |
| Abr-100k | 0.391 | 0.455 | 1.38 | 0.501 | 0.391 | 0.4554 | 1.38 | 0.501 |
| Abr-500k | 0.3909 | 0.456 | 1.378 | 0.4997 | 0.3909 | 0.456 | 1.378 | 0.499 |
| Abr-1M | 0.3901 | 0.456 | 1.381 | 0.499 | 0.3901 | 0.456 | 1.383 | 0.499 |
| Abr-2M | 0.391 | 0.455 | 1.386 | 0.4998 | 0.391 | 0.455 | 1.387 | 0.49 |
| Abr-3M | 0.391 | 0.456 | 1.385 | 0.50 | 0.391 | 0.456 | 1.389 | 0.51 |
| Gr-50k | 0.0089 | 0.0095 | 0.069 | 0.985 | 0.0089 | 0.0095 | 0.069 | 0.985 |
| Gr-100k | 0.0044 | 0.0047 | 0.034 | 0.992 | 0.0045 | 0.0047 | 0.034 | 0.992 |
| Gr-500k | 0.000089 | 0.000095 | 0.0069 | 0.9985 | 0.000089 | 0.000095 | 0.0069 | 0.9985 |
| Gr-1M | 0.000088 | 0.0000475 | 0.00345 | 0.9991 | 0.000088 | 0.0000475 | 0.0035 | 0.9991 |
| Gr-2M | 0.0000224 | 0.0000237 | 0.00172 | 0.9996 | 0.0000224 | 0.0000237 | 0.0017 | 0.9996 |
| Gr-3M | 0.0000149 | 0.0000158 | 0.00115 | 0.9999 | 0.0000149 | 0.0000158 | 0.0012 | 0.9999 |
Mean evaluation time of synthetic imbalanced datasets.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 1.82 | 1.56 | 1.83 | 1.62 | 1.50 | 1.57 | 1.50 | 1.36 |
| Ab-100k | Gr-100k | 3.08 | 2.71 | 3.34 | 3.12 | 3.06 | 3.14 | 3.01 | 2.88 |
| Ab-150k | Gr-150k | 4.54 | 4.72 | 5.08 | 4.87 | 4.54 | 4.84 | 4.72 | 4.25 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 1.53 | 1.63 | 1.80 | 1.68 | 1.53 | 1.56 | 1.57 | 1.39 |
| Ab-100k | Gr-100k | 3.02 | 3.00 | 3.42 | 3.23 | 3.46 | 3.58 | 3.87 | 3.26 |
| Ab-150k | Gr-150k | 4.26 | 4.85 | 5.24 | 5.05 | 4.96 | 4.77 | 5.45 | 4.99 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 1.50 | 1.52 | 1.79 | 1.69 | 1.59 | 1.63 | 1.65 | 1.37 |
| Ab-100k | Gr-100k | 3.00 | 3.15 | 3.51 | 3.51 | 3.78 | 3.75 | 4.51 | 4.01 |
| Ab-150k | Gr-150k | 4.38 | 4.49 | 5.18 | 5.07 | 5.56 | 5.21 | 5.88 | 5.33 |
Mean evaluation time of synthetic balanced datasets.
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 1.48 | 2.63 | 1.40 | 1.45 | 1.8 | 2.61 | 1.28 | 1.5 |
| Abr-100k | 3.10 | 3.18 | 2.75 | 2.71 | 3.02 | 3.41 | 2.72 | 2.6 |
| Abr-500k | 16.41 | 18.91 | 17.73 | 15.61 | 15.52 | 17.04 | 14.21 | 13.34 |
| Abr-1M | 34.22 | 37.75 | 31.84 | 41.18 | 32.83 | 34.65 | 28.97 | 28.74 |
| Abr-2M | 99.13 | 78.03 | 81.24 | 70.22 | 96.12 | 74.41 | 69.36 | 74.34 |
| Abr-3M | 130.21 | 128.62 | 133.63 | 98.96 | 125.48 | 121.69 | 128.67 | 115.8 |
| Gr-50k | 1.76 | 2.31 | 1.67 | 1.70 | 1.96 | 1.78 | 1.51 | 1.53 |
| Gr-100k | 2.90 | 3.48 | 3.34 | 3.36 | 3.12 | 3.12 | 3.54 | 3.12 |
| Gr-500k | 14.74 | 16.96 | 15.55 | 18.39 | 15.18 | 16.32 | 15.23 | 18.01 |
| Gr-1M | 31.41 | 33.59 | 30.80 | 43.60 | 31.76 | 32.49 | 33.99 | 44.34 |
| Gr-2M | 62.99 | 69.28 | 59.43 | 153.7 | 63.77 | 69.88 | 60.65 | 155.69 |
| Gr-3M | 91.94 | 103.58 | 91.86 | 176.25 | 93.45 | 104.19 | 91.22 | 177.43 |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 1.84 | 2.28 | 1.73 | 1.65 | 1.49 | 1.71 | 1.53 | 1.50 |
| Abr-100k | 3.08 | 3.43 | 2.69 | 2.79 | 2.56 | 3.32 | 3.00 | 3.01 |
| Abr-500k | 16.64 | 18.54 | 17.39 | 15.99 | 14.62 | 18.24 | 15.72 | 15.79 |
| Abr-1M | 33.89 | 37.89 | 32.48 | 42.12 | 30.69 | 33.75 | 28.33 | 31.17 |
| Abr-2M | 98.57 | 80.41 | 82.38 | 71.52 | 93.25 | 73.91 | 70.51 | 76.87 |
| Abr-3M | 129.37 | 131.72 | 135.51 | 99.56 | 121.42 | 120.44 | 129.10 | 118.98 |
| Gr-50k | 1.90 | 2.31 | 1.67 | 1.70 | 1.96 | 1.78 | 1.51 | 1.53 |
| Gr-100k | 2.90 | 3.48 | 3.34 | 3.36 | 3.12 | 3.12 | 3.54 | 3.12 |
| Gr-500k | 14.74 | 16.96 | 15.55 | 18.39 | 15.18 | 16.32 | 15.23 | 18.01 |
| Gr-1M | 31.41 | 33.59 | 30.80 | 43.60 | 31.76 | 32.49 | 33.99 | 44.34 |
| Gr-2M | 62.99 | 69.28 | 59.43 | 153.7 | 63.77 | 69.88 | 60.65 | 155.69 |
| Gr-3M | 91.94 | 103.58 | 91.86 | 176.25 | 93.45 | 104.19 | 91.22 | 177.43 |
Average detection delay in synthetic imbalanced datasets.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 64.4 | 14 | 50.2 | 119 | 48.5 | 76 | 15.85 | 243 |
| Ab-100k | Gr-100k | 65.36 | 14 | 49.47 | 119 | 49.64 | 76 | 14.72 | 243 |
| Ab-150k | Gr-150k | 65.89 | 14 | 50.03 | 119 | 49.70 | 76 | 14.61 | 243 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 64.4 | 14 | 50.2 | 119 | 48.5 | 76 | 15.85 | 243 |
| Ab-100k | Gr-100k | 65.36 | 14 | 49.47 | 119 | 49.64 | 76 | 14.72 | 243 |
| Ab-150k | Gr-150k | 65.89 | 14 | 50.03 | 119 | 49.70 | 76 | 14.61 | 243 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 64.4 | 14 | 50.2 | 119 | 48.5 | 76 | 15.85 | 243 |
| Ab-100k | Gr-100k | 65.36 | 14 | 49.47 | 119 | 49.64 | 76 | 14.72 | 243 |
| Ab-150k | Gr-150k | 65.89 | 14 | 50.03 | 119 | 49.70 | 76 | 14.61 | 243 |
Average detection delay in synthetic balanced datasets.
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 64.4 | 50.2 | 48.5 | 15.85 | 64.4 | 50.2 | 48.5 | 15.8 |
| Abr-100k | 65.36 | 49.475 | 49.64 | 14.72 | 65.36 | 49.47 | 49.6 | 14.75 |
| Abr-500k | 65.8 | 49.39 | 49.26 | 14.81 | 65.8 | 49.39 | 49.3 | 14.81 |
| Abr-1M | 64.4 | 49.21 | 49.11 | 14.91 | 64.4 | 49.21 | 49.1 | 14.92 |
| Abr-2M | 65 | 49.03 | 48.85 | 14.80 | 65 | 49.03 | 48.8 | 14.80 |
| Abr-3M | 65.8 | 48.97 | 48.79 | 14.76 | 65.8 | 48.97 | 48.8 | 14.75 |
| Gr-50k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-100k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-500k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-1M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-2M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-3M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 64.4 | 50.2 | 48.5 | 15.85 | 64.4 | 50.2 | 48.5 | 15.85 |
| Abr-100k | 65.4 | 49.47 | 49.64 | 14.72 | 65.36 | 49.47 | 49.64 | 14.72 |
| Abr-500k | 65.8 | 49.39 | 49.26 | 14.81 | 65.8 | 49.39 | 49.26 | 14.81 |
| Abr-1M | 64.4 | 49.21 | 49.11 | 14.91 | 64.4 | 49.21 | 49.1 | 14.92 |
| Abr-2M | 65 | 49.03 | 48.85 | 14.80 | 65 | 49.03 | 48.8 | 14.80 |
| Abr-3M | 65.8 | 48.97 | 48.79 | 14.76 | 65.8 | 48.97 | 48.8 | 14.75 |
| Gr-50k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-100k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-500k | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-1M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-2M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
| Gr-3M | 14 | 119 | 76 | 243 | 14 | 119 | 76 | 243 |
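The excerpt does not define how detection delay is computed. A common definition, assumed here, is the number of instances between a true drift point and the first detection that follows it, averaged over all detected drifts. The helper name `mean_detection_delay` and the example positions are hypothetical.

```python
def mean_detection_delay(true_drifts, detections):
    """Average gap between each true drift and the first detection after it.

    A detection is credited to a true drift only if it occurs before the next
    true drift. Drifts that are never detected are left out of the average.
    """
    starts = sorted(true_drifts)
    found = sorted(detections)
    delays = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else float("inf")
        first_hit = next((d for d in found if start <= d < end), None)
        if first_hit is not None:
            delays.append(first_hit - start)
    return sum(delays) / len(delays) if delays else None


# Two true drifts; the detection at 2500 is a later alarm and is not counted.
print(mean_detection_delay([1000, 2000], [1046, 2014, 2500]))  # 30.0
```

Under this definition, extra alarms within a concept do not change the delay, but they do raise the total number of detected drifts reported separately.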
Total detected drifts in synthetic imbalanced datasets.
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 390 | 10 | 100 | 4 | 247 | 13 | 210 | 486 |
| Ab-100k | Gr-100k | 781 | 10 | 201 | 4 | 463 | 13 | 421 | 1319 |
| Ab-150k | Gr-150k | 1171 | 10 | 302 | 4 | 693 | 13 | 628 | 2152 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 390 | 10 | 100 | 4 | 247 | 13 | 210 | 486 |
| Ab-100k | Gr-100k | 781 | 10 | 201 | 4 | 463 | 13 | 421 | 1319 |
| Ab-150k | Gr-150k | 1171 | 10 | 302 | 4 | 693 | 13 | 628 | 2152 |

| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Ab-50k | Gr-50k | 390 | 10 | 100 | 4 | 247 | 13 | 210 | 486 |
| Ab-100k | Gr-100k | 781 | 10 | 201 | 4 | 463 | 13 | 421 | 1319 |
| Ab-150k | Gr-150k | 1171 | 10 | 302 | 4 | 693 | 13 | 628 | 2152 |
Total detected drifts in synthetic balanced datasets.
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 390 | 100 | 247 | 210 | 390 | 100 | 247 | 210 |
| Abr-100k | 781 | 201 | 463 | 421 | 781 | 201 | 463 | 421 |
| Abr-500k | 3906 | 1010 | 2363 | 1114 | 3906 | 1010 | 2363 | 1114 |
| Abr-1M | 7812 | 2022 | 4731 | 4165 | 7812 | 2022 | 4731 | 4165 |
| Abr-2M | 15624 | 4040 | 9525 | 8335 | 15624 | 4040 | 9525 | 8335 |
| Abr-3M | 20006 | 6066 | 14328 | 12486 | 20006 | 6066 | 14328 | 12486 |
| Gr-50k | 10 | 4 | 13 | 486 | 10 | 4 | 13 | 486 |
| Gr-100k | 10 | 4 | 13 | 1319 | 10 | 4 | 13 | 1319 |
| Gr-500k | 10 | 4 | 13 | 7986 | 10 | 4 | 13 | 7986 |
| Gr-1M | 10 | 4 | 13 | 16319 | 10 | 4 | 13 | 16319 |
| Gr-2M | 10 | 4 | 13 | 32986 | 10 | 4 | 13 | 32986 |
| Gr-3M | 10 | 4 | 13 | 48318 | 10 | 4 | 13 | 48318 |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Abr-50k | 390 | 100 | 247 | 210 | 390 | 100 | 247 | 210 |
| Abr-100k | 781 | 201 | 463 | 421 | 781 | 201 | 463 | 421 |
| Abr-500k | 3906 | 1010 | 2363 | 1114 | 3906 | 1010 | 2363 | 1114 |
| Abr-1M | 7812 | 2022 | 4731 | 4165 | 7812 | 2022 | 4731 | 4165 |
| Abr-2M | 15624 | 4040 | 9525 | 8335 | 15624 | 4040 | 9525 | 8335 |
| Abr-3M | 20006 | 6066 | 14328 | 12486 | 20006 | 6066 | 14328 | 12486 |
| Gr-50k | 10 | 4 | 13 | 486 | 10 | 4 | 13 | 486 |
| Gr-100k | 10 | 4 | 13 | 1319 | 10 | 4 | 13 | 1319 |
| Gr-500k | 10 | 4 | 13 | 7986 | 10 | 4 | 13 | 7986 |
| Gr-1M | 10 | 4 | 13 | 16319 | 10 | 4 | 13 | 16319 |
| Gr-2M | 10 | 4 | 13 | 32986 | 10 | 4 | 13 | 32986 |
| Gr-3M | 10 | 4 | 13 | 48318 | 10 | 4 | 13 | 48318 |
Prediction accuracy and mean evaluation time (seconds) in the electricity dataset.
| Drift Detector | AWE Accuracy | AWE Time | ARF Accuracy | ARF Time | OzaBag Adwin Accuracy | OzaBag Adwin Time | Leverage Bagging Accuracy | Leverage Bagging Time |
|---|---|---|---|---|---|---|---|---|
| | 53.61 | 0.45 | 91.43 | 82.15 | 51.9 | 0.43 | 52.9 | 0.44 |
| | 51.21 | 0.45 | 89.42 | 82.97 | 49.6 | 0.48 | 51.3 | 0.49 |
| | 41.85 | 0.39 | 82.91 | 180.14 | 40.62 | 0.34 | 39.24 | 0.39 |
| | 43.19 | 0.39 | 81.34 | 184.52 | 44.35 | 0.38 | 44.13 | 0.41 |
Prediction accuracy and mean evaluation time (seconds) in the intrusion detection dataset.
| Drift Detector | AWE Accuracy | AWE Time | ARF Accuracy | ARF Time | OzaBag Adwin Accuracy | OzaBag Adwin Time | Leverage Bagging Accuracy | Leverage Bagging Time |
|---|---|---|---|---|---|---|---|---|
| | 86.11 | 7.58 | 99.93 | 367.34 | 75.98 | 7.45 | 70.69 | 8.12 |
| | 84.16 | 7.19 | 99.91 | 318.64 | 74.31 | 7.34 | 69.43 | 7.83 |
| | 71.26 | 5.97 | 99.1 | 1093.62 | 59.34 | 6.92 | 62.71 | 7.14 |
| | 69.19 | 6.66 | 99.79 | 1194.12 | 61.96 | 9.29 | 65.19 | 11.38 |
Figure 2. Average prediction error in datasets with abrupt drift.
Figure 3. Average prediction error in datasets with gradual drift.
Figure 4. Average detection delay in datasets with abrupt drift.
Figure 5. Average detection delay in datasets with gradual drift.
Figure 6. Mean evaluation time of datasets with abrupt drift.
Figure 7. Mean evaluation time of datasets with gradual drift.
Figure 8. Detected drifts in datasets with abrupt drift.
Figure 9. Detected drifts in datasets with gradual drift.