Yifan Feng¹, Weihong Cai¹, Haoyu Yue¹, Jianlong Xu¹, Yan Lin², Jiaxin Chen¹, Zijun Hu¹
Abstract
Anomaly detection in network traffic is becoming increasingly challenging because of the complexity of large-scale networks and the proliferation of social network applications. In real industrial environments, only recently collected unlabelled data can be used as the training set, and the performance of commonly used unsupervised algorithms depends heavily on how accurately the abnormal ratio of that training set, supplied as prior knowledge, is estimated. In this study, we propose X-iForest, an anomaly detection algorithm based on X-means and iForest. X-iForest uses X-means to cluster the standardized Euclidean distances between the abnormal points and the normal cluster centre, which provides a secondary filtering step. We compared X-iForest with seven mainstream unsupervised algorithms in terms of the area under the ROC curve (AUC) and the anomaly detection rate (ADR). Extensive experiments on four simulated datasets and four benchmark datasets showed that X-iForest has notable advantages over the other algorithms and can be applied effectively to anomaly detection in large-scale network traffic data.
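The secondary-filtering step relies on the standardized Euclidean distance, in which each feature difference is divided by that feature's variance before summing, so features on large scales do not dominate. The sketch below is a minimal illustration under stated assumptions: the "normal cluster centre" is taken to be the mean of the points labelled normal, the per-feature variances are estimated from those same points, and the function names are hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pvariance

def normal_centre_and_variance(normal_points):
    """Per-feature mean (assumed cluster centre) and population variance of the normal points."""
    columns = list(zip(*normal_points))
    centre = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return centre, variances

def standardized_euclidean(u, v, variances):
    """sqrt(sum((u_i - v_i)^2 / V_i)); zero-variance features are skipped (an assumption)."""
    total = 0.0
    for ui, vi, var in zip(u, v, variances):
        if var > 0:
            total += (ui - vi) ** 2 / var
    return math.sqrt(total)

# Hypothetical data: three normal points and two candidate anomalies.
normal = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
centre, var = normal_centre_and_variance(normal)
candidates = [[2.5, 13.0], [9.0, 40.0]]
distances = [standardized_euclidean(c, centre, var) for c in candidates]
print([round(d, 3) for d in distances])  # [0.866, 19.17]
```

SciPy exposes the same metric as `scipy.spatial.distance.seuclidean(u, v, V)`, where `V` is the variance vector; the hand-written version above only makes the per-feature scaling explicit.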
Year: 2022; PMID: 35100305; PMCID: PMC8803200; DOI: 10.1371/journal.pone.0263423
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Fig 1. The result of X-means clustering on the standardized Euclidean distances from the abnormal cluster centers to the normal cluster center.
(a) Result of K-means clustering. (b) Final anomaly detection result of X-iForest on the generated data. (c) Demonstration of X-iForest on a test dataset.
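Fig 1 shows X-means grouping these distances. X-means (Pelleg and Moore) extends k-means by testing, cluster by cluster, whether splitting in two improves the Bayesian information criterion (BIC), so the number of clusters does not have to be fixed in advance. The sketch below is a simplified one-dimensional version written for illustration: it assumes one Gaussian per cluster and an exact contiguous 2-means split, which is not Pelleg and Moore's exact BIC formula, and the rule that the cluster nearest the normal centre is re-labelled normal is a hypothetical reading of "secondary filtering", not the paper's code.

```python
import math
from statistics import mean, pvariance

EPS = 1e-9  # variance floor so a cluster of identical values does not give log(0)

def cluster_log_likelihood(clusters, n_total):
    """Log-likelihood of 1-D data under one Gaussian per cluster (MLE mean and variance)."""
    ll = 0.0
    for c in clusters:
        n_c = len(c)
        mu = mean(c)
        var = max(pvariance(c), EPS)
        sse = sum((x - mu) ** 2 for x in c)
        ll += n_c * math.log(n_c / n_total)            # mixing-weight term
        ll -= 0.5 * n_c * math.log(2 * math.pi * var)  # Gaussian normalising constant
        ll -= sse / (2 * var)                          # squared-error term
    return ll

def bic(clusters):
    """BIC = log-likelihood - (p / 2) * log(n); higher is better."""
    n = sum(len(c) for c in clusters)
    n_params = 3 * len(clusters) - 1  # k means, k variances, k - 1 free mixing weights
    return cluster_log_likelihood(clusters, n) - 0.5 * n_params * math.log(n)

def best_two_split(sorted_values):
    """Exact 2-means in 1-D over contiguous splits that leave >= 2 points per side."""
    best = None
    for i in range(2, len(sorted_values) - 1):
        left, right = sorted_values[:i], sorted_values[i:]
        ml, mr = mean(left), mean(right)
        sse = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        if best is None or sse < best[0]:
            best = (sse, left, right)
    return best[1], best[2]

def xmeans_1d(values):
    """Keep splitting clusters while the 2-cluster BIC beats the 1-cluster BIC."""
    pending, final = [sorted(values)], []
    while pending:
        cluster = pending.pop()
        if len(cluster) < 4:  # too small to split into two sides of >= 2 points
            final.append(cluster)
            continue
        left, right = best_two_split(cluster)
        if bic([left, right]) > bic([cluster]):
            pending.extend([left, right])
        else:
            final.append(cluster)
    return sorted(final, key=mean)

# Hypothetical distances of candidate anomalies from the normal centre.
distances = [0.8, 0.9, 1.0, 1.1, 1.2, 9.5, 10.0, 10.5, 11.0]
clusters = xmeans_1d(distances)
# Hypothetical filtering rule: the closest cluster is re-labelled normal.
confirmed = [d for c in clusters[1:] for d in c]
print(clusters)
```

With these inputs the BIC test accepts the first split (near versus far candidates) and rejects any further split, so only the far group survives as confirmed anomalies.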
Statistical characteristics of the other experimental datasets.
| Dataset | Instances | Attributes | Anomalies |
|---|---|---|---|
| Shuttle | 49097 | 9 | 3437 |
| Mulcross | 262144 | 4 | 26214 |
| Satellite | 6435 | 36 | 2036 |
| BreastW | 683 | 9 | 239 |
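Because unsupervised detectors take the abnormal ratio as prior knowledge, it helps to see the true ratio of each dataset in the table. The snippet below simply divides the anomaly count by the number of instances, using only the figures listed above.

```python
# (instances, anomalies) copied from the dataset table above.
datasets = {
    "Shuttle": (49097, 3437),
    "Mulcross": (262144, 26214),
    "Satellite": (6435, 2036),
    "BreastW": (683, 239),
}

# Abnormal ratio = anomalies / instances.
ratios = {name: anomalies / instances for name, (instances, anomalies) in datasets.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.3f}")
# Shuttle: 0.070
# Mulcross: 0.100
# Satellite: 0.316
# BreastW: 0.350
```

The ratios range from about 0.07 to 0.35; Satellite, for instance, has a ratio of about 0.32.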
Fig 2. The performance of iForest under different contamination parameters on the dataset with an abnormal ratio of 0.32; c denotes the contamination parameter.
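In common iForest implementations (scikit-learn's `IsolationForest`, for example), the contamination parameter sets the score threshold so that roughly that fraction of the training points is flagged. The sketch below illustrates that mechanism on hypothetical scores with a true abnormal ratio of 0.3; `flag_by_contamination` is a hypothetical helper, not part of any library.

```python
def flag_by_contamination(scores, c):
    """Flag the round(c * n) highest-scoring points (higher score = more anomalous)."""
    n_flagged = round(c * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [i in flagged for i in range(len(scores))]

# Hypothetical anomaly scores; the first three points are the true anomalies (ratio 0.3).
scores = [0.9, 0.85, 0.8, 0.4, 0.38, 0.35, 0.33, 0.3, 0.28, 0.25]
truth = [True] * 3 + [False] * 7

results = {}
for c in (0.1, 0.3, 0.5):
    flags = flag_by_contamination(scores, c)
    tp = sum(f and t for f, t in zip(flags, truth))      # anomalies caught
    fp = sum(f and not t for f, t in zip(flags, truth))  # false alarms
    results[c] = (tp, fp)
    print(f"c={c}: {tp} anomalies caught, {fp} false alarms")
```

Even with a perfect ranking, underestimating c misses anomalies and overestimating it raises false alarms, which is why an inaccurate prior on the abnormal ratio hurts performance.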
AUC of X-iForest and other algorithms.
| Dataset | X-iForest | EiForest | iForest | LOF | PCA | HBOS | CBLOF | KNN |
|---|---|---|---|---|---|---|---|---|
| Simulation 1 |  | 0.99 | 0.941 | 0.498 | 0.958 | 0.957 | 0.733 | 0.99 |
| Simulation 2 |  | 0.8 | 0.761 | 0.711 | 0.701 | 0.756 | 0.664 | 0.719 |
| Simulation 3 | 0.763 | 0.672 | 0.638 | 0.735 | 0.621 | 0.633 | 0.738 |  |
| Simulation 4 |  | 0.971 | 0.929 | 0.51 | 0.954 | 0.954 | 0.743 | 0.978 |
| Shuttle | 0.968 | 0.945 | 0.895 | 0.741 | 0.969 |  | 0.699 | 0.765 |
| Mulcross |  | 0.915 | 0.868 | 0.57 | 0.96 | 0.455 | 0.752 | 0.445 |
| Satellite |  | 0.679 | 0.638 | 0.585 | 0.641 | 0.646 | 0.681 | 0.646 |
| BreastW |  | 0.968 | 0.932 | 0.529 | 0.764 | 0.653 | 0.812 | 0.94 |
Fig 3. AUC results of the proposed X-iForest algorithm and the other algorithms.
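AUC can be read as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it directly from that definition; it is quadratic in the number of points and meant only to make the metric concrete, not to reproduce the paper's evaluation code.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (anomaly, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores; label 1 marks a true anomaly.
example_auc = auc([0.9, 0.7, 0.6, 0.4, 0.2], [1, 0, 1, 0, 0])
print(round(example_auc, 3))  # 0.833
```

Here 5 of the 6 anomaly/normal pairs are ordered correctly, giving 5/6. An AUC near 0.5, as for some entries in the table, means the scores rank anomalies no better than chance.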
ADR of X-iForest and other algorithms.
| Dataset | X-iForest | EiForest | iForest | LOF | PCA | HBOS | CBLOF | KNN |
|---|---|---|---|---|---|---|---|---|
| Simulation 1 |  | 1.0 | 0.883 | 0.02 | 1.0 | 1.0 | 0.467 | 1.0 |
| Simulation 2 |  | 0.783 | 0.533 | 0.433 | 0.417 | 0.6 | 0.333 | 0.45 |
| Simulation 3 |  | 0.385 | 0.41 | 0.564 | 0.359 | 0.63 | 0.487 | 0.769 |
| Simulation 4 |  | 1.0 | 0.86 | 0.03 | 1.0 | 1.0 | 0.487 | 0.957 |
| Shuttle |  | 0.899 | 0.796 | 0.607 | 0.967 | 0.976 | 0.401 | 0.648 |
| Mulcross |  | 1.0 | 0.762 | 0.229 | 0.929 | 0.019 | 0.504 | 0.0 |
| Satellite |  | 0.38 | 0.514 | 0.432 | 0.291 | 0.304 | 0.409 | 0.53 |
| BreastW |  | 0.958 | 0.917 | 0.389 | 0.528 | 0.306 | 0.639 | 0.903 |
Fig 4. ADR results of the proposed X-iForest algorithm and the other algorithms.
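The section does not define ADR; a common reading of "anomaly detection rate" is the fraction of true anomalies that the detector flags, i.e. recall. The sketch below assumes that definition, and `anomaly_detection_rate` is a hypothetical helper name rather than the paper's code.

```python
def anomaly_detection_rate(predicted, actual):
    """Assumed ADR: detected true anomalies / all true anomalies (recall)."""
    n_true = sum(actual)
    if n_true == 0:
        raise ValueError("ADR is undefined when there are no true anomalies")
    detected = sum(1 for p, a in zip(predicted, actual) if p and a)
    return detected / n_true

# Hypothetical labels: three true anomalies, two of which are flagged.
actual = [True, True, True, False, False]
predicted = [True, False, True, True, False]
adr = anomaly_detection_rate(predicted, actual)
print(round(adr, 3))  # 0.667
```

Under this assumption ADR ignores false alarms: a detector that flags every point scores 1.0, so ADR is informative only alongside a ranking metric such as AUC.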