Mongkhon Thakong¹, Suphakant Phimoltares¹, Saichon Jaiyen², Chidchanok Lursinsap¹.
Abstract
In recent years, cybersecurity problems have arisen in many business applications. Although previous researchers have proposed methods to cope with these issues, their methods must repeat the training process several times to classify the data of these problems in streaming, non-stationary environments. In such dynamic environments, conventional methods may degrade the adaptive solution needed to prevent these issues. This research proposes a one-pass-throw-away learning method that uses a dynamic network structure to solve these problems in dynamic environments. Furthermore, to reduce computational time and keep space complexity minimal for streaming data, new learning concepts in the form of recursive functions were introduced. Information gain-based feature selection was also applied to reduce the learning time during training. The experimental results showed that the proposed algorithm outperformed other incremental-like and online ensemble learning algorithms in terms of classification accuracy, space complexity, and computational time.
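The abstract describes one-pass-throw-away learning built on recursive update functions but does not state the update rules. As a hedged illustration only, the sketch below uses a generic recursive (Welford-style) per-class mean and variance update together with a nearest-class-mean decision rule; it is not the Dyn-Stratum algorithm. The names `RunningStats`, `OnePassNearestMean`, and the variance floor `eps` are hypothetical. Each sample updates the statistics exactly once and is then discarded, so memory grows with the number of classes times the number of attributes rather than with the length of the stream.

```python
import math


class RunningStats:
    """Hypothetical per-class summary updated recursively; O(d) memory."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean

    def update(self, x):
        # Recursive update: new statistics depend only on the old ones and x.
        self.n += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (xj - self.mean[j])

    def variance(self, eps=1e-3):
        # Population variance plus a small floor so single-sample classes stay usable.
        return [m / self.n + eps for m in self.m2]


class OnePassNearestMean:
    """Hypothetical one-pass classifier: learn each sample once, then throw it away."""

    def __init__(self, dim):
        self.dim = dim
        self.stats = {}

    def learn_one(self, x, y):
        if y not in self.stats:
            self.stats[y] = RunningStats(self.dim)  # add a unit when a new class appears
        self.stats[y].update(x)  # x itself is never stored

    def predict_one(self, x):
        # Pick the class with the smallest variance-scaled squared distance.
        best, best_d = None, math.inf
        for label, s in self.stats.items():
            var = s.variance()
            d = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, s.mean, var))
            if d < best_d:
                best, best_d = label, d
        return best


# Toy stream (made-up values) processed in a single pass.
stream = [([0.0, 0.1], "benign"), ([5.0, 5.2], "attack"),
          ([0.2, 0.0], "benign"), ([4.8, 5.1], "attack")]
model = OnePassNearestMean(dim=2)
for x, y in stream:
    model.learn_one(x, y)

print(model.predict_one([0.1, 0.1]))  # benign
print(model.predict_one([5.0, 5.0]))  # attack
```

The recursive form is what makes the single pass possible: updating the mean and variance never requires revisiting earlier samples.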
Year: 2018 PMID: 30188908 PMCID: PMC6126810 DOI: 10.1371/journal.pone.0202937
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the dynamic stratum (Dyn-Stratum) model.
Fig 2. Our proposed network structure for handling cybersecurity problems.
Fig 3. An example of how Dyn-Stratum works.
Benchmark datasets for real-world cybersecurity problems.
| Dataset | # Instances | # Attributes | # Classes |
|---|---|---|---|
| Spam | 9324 | 499 | 2 |
| Phishing | 11055 | 30 | 2 |
| NSL-KDD | 148517 | 41 | 5 |
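The abstract applies information gain-based feature selection to shorten training, and the result tables report runs with all attributes and with a reduced set of ranked attributes (for example, 341 of the 499 Spam attributes). The exact ranking procedure and cut-off are not given in this summary, so the following is a minimal sketch assuming discrete (or pre-binned) attribute values and a user-chosen `keep` count; `rank_features` and the toy data are hypothetical. Continuous attributes would need to be discretized first.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    # H(Y) = -sum_c p(c) log2 p(c)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    # IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v)
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, keep):
    # Score each attribute independently, then keep the `keep` highest-scoring indices.
    n_features = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:keep], scores


# Toy data: attribute 0 separates the classes perfectly; attributes 1 and 2 carry no information.
rows = [(1, 0, "a"), (1, 1, "b"), (0, 0, "a"), (0, 1, "b")]
labels = ["spam", "spam", "ham", "ham"]
top, scores = rank_features(rows, labels, keep=1)
print(top)                             # [0]
print([round(s, 3) for s in scores])   # [1.0, 0.0, 0.0]
```

Because each attribute is scored once, ranking costs one pass over the data per attribute; the reduced attribute set is then what the learner sees during training.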
Confusion matrix for binary classification.
| Actual class | Predicted: Yes | Predicted: No |
|---|---|---|
| Yes | TP | FN |
| No | FP | TN |
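The tables below report precision, recall, F-measure, and geometric mean, all of which derive from this confusion matrix. Their definitions are not spelled out in this summary, so the sketch below assumes the standard ones, with "Yes" as the positive class and the geometric mean taken as the square root of recall (true-positive rate) times specificity (true-negative rate). `binary_metrics` is a hypothetical helper and the counts in the example are made up for illustration.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix ("Yes" = positive class)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)           # assumed G-mean definition
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure, "g_mean": g_mean}


m = binary_metrics(tp=80, fn=20, fp=10, tn=90)
print(round(m["accuracy"], 4))   # 0.85
print(round(m["f_measure"], 4))  # 0.8421
print(round(m["g_mean"], 4))     # 0.8485
```

The geometric mean is useful alongside accuracy because it drops sharply when either class is poorly recognized, which plain accuracy can hide on imbalanced data.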
The performance on the Spam dataset with two classes over 10 runs.
| Method | Avg. accuracy, % (SD), all 499 attributes | Avg. # neurons, all attributes | Avg. training time (s), all attributes | Avg. accuracy, % (SD), 341 ranked attributes | Avg. # neurons, ranked attributes | Avg. training time (s), ranked attributes |
|---|---|---|---|---|---|---|
| Dyn-Stratum | | | 627.02 | | | 205.81 |
| Learn++.NSE | 77.34(20) | 14 | 198.32 | 75.47(5) | 14 | 144.14 |
| WMV | 85.08(5) | 14 | | 84.45(7) | 13 | |
| ADACC | 94.56(3) | 20 | 698.26 | 94.13(2) | 20 | 612.25 |
| ARF | 85.67(7) | 25 | 169.91 | 85.46(4) | 25 | 126.19 |
The performance on the modified NSL-KDD dataset with two classes over 10 runs.
| Method | Avg. accuracy, % (SD), all 41 attributes | Avg. # neurons, all attributes | Avg. training time (s), all attributes | Avg. accuracy, % (SD), 26 ranked attributes | Avg. # neurons, ranked attributes | Avg. training time (s), ranked attributes |
|---|---|---|---|---|---|---|
| Dyn-Stratum | | | 478.85 | | | 532.81 |
| Learn++.NSE | 89.14(2) | 75 | 7159.11 | 88.20(4) | 50 | 3228.18 |
| WMV | 92.85(3) | 20 | | 92.13(4) | 25 | |
| ADACC | 88.45(2) | 20 | 4145.03 | 87.39(1) | 20 | 3320.48 |
| ARF | 89.23(4) | 20 | 2857.72 | 88.35(2) | 20 | 2731.64 |
The performance on the Phishing dataset with two classes over 10 runs.
| Method | Avg. accuracy, % (SD), all 30 attributes | Avg. # neurons, all attributes | Avg. training time (s), all attributes | Avg. accuracy, % (SD), 20 ranked attributes | Avg. # neurons, ranked attributes | Avg. training time (s), ranked attributes |
|---|---|---|---|---|---|---|
| Dyn-Stratum | | | | | | |
| Learn++.NSE | 92.21(2) | 88 | 63.68 | 91.41(3) | 88 | 63.31 |
| WMV | 91.18(2) | 25 | 29.73 | 91.22(2) | 25 | 28.71 |
| ADACC | 90.95(2) | 20 | 339.85 | 90.12(12) | 20 | 329.21 |
| ARF | 88.38(4) | 25 | 49.01 | 87.58(5) | 25 | 43.28 |
The performance on the NSL-KDD dataset with four attack types and the normal class over 10 runs.
| Method | Avg. accuracy, % (SD), all 41 attributes | Avg. # neurons, all attributes | Avg. training time (s), all attributes | Avg. accuracy, % (SD), 26 ranked attributes | Avg. # neurons, ranked attributes | Avg. training time (s), ranked attributes |
|---|---|---|---|---|---|---|
| Dyn-Stratum | | | 567.14 | | | 650.73 |
| Learn++.NSE | 87.58(7) | 79 | 7730.82 | 87.48(5) | 75 | 7528.94 |
| WMV | 91.21(3) | 15 | | 91.20(3) | 15 | |
| ADACC | 88.04(6) | 20 | 6039.27 | 85.07(5) | 20 | 5748.51 |
| ARF | 89.34(5) | 20 | 4802.15 | 84.28(5) | 20 | 4874.39 |
Geometric mean of the proposed Dyn-Stratum, Learn++.NSE, WMV, ADACC, and ARF methods on cybersecurity datasets.
| Method | Spam, all attributes (499) | Spam, ranked attributes (341) | Phishing, all attributes (30) | Phishing, ranked attributes (20) | Modified NSL-KDD (two classes), all attributes (41) | Modified NSL-KDD (two classes), ranked attributes (26) |
|---|---|---|---|---|---|---|
| Dyn-Stratum | ||||||
| Learn++.NSE | 0.71 | 0.56 | 0.90 | 0.90 | 0.88 | 0.87 |
| WMV | 0.83 | 0.54 | 0.90 | 0.91 | ||
| ADACC | 0.91 | 0.91 | 0.91 | 0.91 | 0.89 | 0.88 |
| ARF | 0.79 | 0.80 | 0.91 | 0.91 | 0.87 | 0.88 |
Precision of the proposed Dyn-Stratum, Learn++.NSE, WMV, ADACC, and ARF methods on cybersecurity datasets.
| Method | Spam, all attributes (499) | Spam, ranked attributes (341) | Phishing, all attributes (30) | Phishing, ranked attributes (20) | Modified NSL-KDD (two classes), all attributes (41) | Modified NSL-KDD (two classes), ranked attributes (26) |
|---|---|---|---|---|---|---|
| Dyn-Stratum | 0.89 | |||||
| Learn++.NSE | 0.73 | 0.76 | 0.91 | 0.89 | 0.88 | |
| WMV | 0.85 | 0.76 | 0.91 | 0.94 | ||
| ADACC | 0.88 | 0.87 | ||||
| ARF | 0.75 | 0.72 | 0.74 | 0.73 | 0.87 | 0.86 |
F-measure of the proposed Dyn-Stratum, Learn++.NSE, WMV, ADACC, and ARF methods on cybersecurity datasets.
| Method | Spam, all attributes (499) | Spam, ranked attributes (341) | Phishing, all attributes (30) | Phishing, ranked attributes (20) | Modified NSL-KDD (two classes), all attributes (41) | Modified NSL-KDD (two classes), ranked attributes (26) |
|---|---|---|---|---|---|---|
| Dyn-Stratum | ||||||
| Learn++.NSE | 0.74 | 0.62 | 0.90 | 0.90 | 0.89 | 0.88 |
| WMV | 0.84 | 0.61 | 0.90 | 0.91 | 0.94 | |
| ADACC | 0.92 | 0.88 | 0.87 | |||
| ARF | 0.72 | 0.71 | 0.74 | 0.71 | 0.86 | 0.85 |
Fig 4. Classification accuracy on the Spam dataset.
Fig 5. Classification accuracy on the Phishing dataset.
Fig 6. Classification accuracy on the NSL-KDD dataset with four attack types and the normal class.
Fig 7. Classification accuracy on the modified NSL-KDD dataset with two classes.
Recall of the proposed Dyn-Stratum, Learn++.NSE, WMV, ADACC, and ARF methods on cybersecurity datasets.
| Method | Spam, all attributes (499) | Spam, ranked attributes (341) | Phishing, all attributes (30) | Phishing, ranked attributes (20) | Modified NSL-KDD (two classes), all attributes (41) | Modified NSL-KDD (two classes), ranked attributes (26) |
|---|---|---|---|---|---|---|
| Dyn-Stratum | ||||||
| Learn++.NSE | 0.74 | 0.68 | 0.90 | 0.90 | 0.89 | 0.88 |
| WMV | 0.84 | 0.65 | 0.90 | 0.91 | 0.94 | |
| ADACC | 0.92 | 0.91 | 0.90 | 0.88 | 0.87 | |
| ARF | 0.70 | 0.69 | 0.73 | 0.69 | 0.87 | 0.85 |