Hongchao Song, Zhuqing Jiang, Aidong Men, Bo Yang.
Abstract
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the distances between each observation and its neighbors and suffer from the curse of dimensionality in high-dimensional space: the distances between any pair of samples become similar, so every sample can appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph- (K-NNG-) based anomaly detectors. Benefiting from its nonlinear mapping ability, the DAE is first trained to learn the intrinsic features of the high-dimensional dataset and represent the data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets randomly sampled from the whole dataset, and the final prediction is made by combining all the detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves detection accuracy and reduces computational complexity.
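A minimal sketch of the dimensionality-reduction stage described above. A single-hidden-layer autoencoder in plain NumPy stands in for the paper's DAE; the data, layer sizes, learning rate, and iteration count are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy "high-dimensional" data: 100 samples, 20 features
X = rng.normal(size=(100, 20))

# one hidden layer (a shallow stand-in for the paper's deep autoencoder),
# trained by plain gradient descent on the squared reconstruction error
d_in, d_hid = X.shape[1], 5
W1 = rng.normal(0, 0.1, size=(d_in, d_hid))
b1 = np.zeros(d_hid)
W2 = rng.normal(0, 0.1, size=(d_hid, d_in))
b2 = np.zeros(d_in)

lr = 0.01
losses = []
for _ in range(200):
    H = np.tanh(X @ W1 + b1)      # encoder: nonlinear map to compact subspace
    R = H @ W2 + b2               # linear decoder: reconstruction
    err = R - X
    losses.append((err ** 2).mean())
    # backpropagation of the reconstruction error
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# compact 5-dimensional representation used by the downstream detectors
Z = np.tanh(X @ W1 + b1)
```

The encoded matrix `Z` is what the KNN-based detectors would operate on, replacing the original 20-dimensional features.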
Year: 2017 PMID: 29270197 PMCID: PMC5706085 DOI: 10.1155/2017/8501683
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Model architecture of DAE, DBN, and RBM.
Figure 2. The flow chart of the proposed hybrid anomaly detection model.
Algorithm 1. The procedure of the proposed hybrid model.
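The ensemble scoring stage of the procedure can be sketched as follows. The inputs stand in for DAE-encoded features; the subset fraction, number of detectors, value of k, and score averaging are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_score(train, x, k):
    # distance from x to its k-th nearest neighbor in the training subset
    d = np.sqrt(((train - x) ** 2).sum(axis=1))
    return np.sort(d)[k - 1]

def ensemble_knn_scores(encoded, queries, n_detectors=5, subset_frac=0.5, k=3):
    # build several KNN detectors on random subsets and average their scores
    n = len(encoded)
    m = max(k, int(subset_frac * n))
    scores = np.zeros(len(queries))
    for _ in range(n_detectors):
        idx = rng.choice(n, size=m, replace=False)
        subset = encoded[idx]
        scores += np.array([knn_score(subset, q, k) for q in queries])
    return scores / n_detectors

# toy encoded data: a nominal cluster, plus one inlier and one far-away query
nominal = rng.normal(0.0, 1.0, size=(200, 3))
queries = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
s = ensemble_knn_scores(nominal, queries)
# the distant query receives a much larger anomaly score than the inlier
```

Thresholding the averaged score (or voting across detectors) would then yield the final normal/anomalous prediction.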
Details of the datasets used in the experimental investigation.
| Dataset name | Number of instances | Number of attributes | Number of classes |
|---|---|---|---|
| OAR | 9699 | 110 | 4 |
| GAS | 3600 | 128 | 6 |
| MPID | 130065 | 50 | 2 |
| KDD 2008 | 102240 | 117 | 2 |
Average AUC and corresponding standard deviation of the different methods. The first class listed in the bracket indicates the normal class. Results were calculated from 50 iterations.
| Dataset | Metric | SVDD | OCSVM | aK-LPE | OCSVM-FA | Our model |
|---|---|---|---|---|---|---|
| OAR | AUC | 0.94 | 0.95 | 0.96 | 0.98 | |
| OAR | AUC | 0.88 | 0.87 | 0.90 | 0.90 | |
| OAR | AUC | 0.95 | 0.95 | 0.97 | 0.98 | |
| GAS | AUC | 0.92 | 0.92 | 0.94 | 0.93 | |
| GAS | AUC | 0.91 | 0.91 | 0.92 | 0.94 | |
| GAS | AUC | 0.92 | 0.93 | 0.95 | 0.96 | |
| GAS | AUC | 0.81 | 0.77 | 0.80 | 0.80 | |
| MPID | AUC | 0.70 | 0.77 | 0.75 | 0.79 | |
| MPID | AUC | 0.66 | 0.72 | 0.73 | 0.70 | |
| KDD 2008 | AUC | 0.50 | 0.51 | 0.34 | 0.50 | |
| Rank | | 4.3 | 3.90 | 3.10 | 2.65 | 1.05 |
Scheffe test for comparison of the proposed model and other methods. “+” indicates that the method on the left is better.
| Methods | |
|---|---|
| Our model versus SVDD | +0.0005 |
| Our model versus OCSVM | +0.0043 |
| Our model versus aK-LPE | +0.0020 |
| Our model versus OCSVM-FA | +0.0916 |
Performance of the proposed model with different anomaly detectors.
| Dataset | | | | |
|---|---|---|---|---|
| GAS | 0.917 | 0.916 | 0.915 | 0.914 |
| OAR | 0.957 | 0.951 | 0.948 | 0.941 |
The average AUC of the proposed model with different K-nearest neighbors.
| Dataset | | | |
|---|---|---|---|
| OAR | 0.99 | 0.99 | 0.97 |
| GAS | 0.97 | 0.97 | 0.94 |
| MPID | 0.75 | 0.76 | 0.74 |