Tao Ma, Fen Wang, Jianjun Cheng, Yang Yu, Xiaoyun Chen.
Abstract
Developing intrusion detection systems (IDS) that allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six datasets drawn from KDD-Cup99 and NSL-KDD and a sensor network dataset were employed to test the performance of the model. The experimental results indicate that the SCDNN classifier performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in both detection accuracy and the types of abnormal attacks found. It also provides an effective tool for the study and analysis of intrusion detection in large networks.
Keywords: deep neural network; ensemble model; intrusion detection system; spectral clustering; wireless sensor network
Year: 2016 PMID: 27754380 PMCID: PMC5087489 DOI: 10.3390/s16101701
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
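The cluster-assignment step described in the abstract can be illustrated with a minimal Python sketch. It assumes Euclidean distance and assumes that the cluster centres have already been obtained by spectral clustering on the training set. The function names `nearest_centre` and `split_by_centre` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def nearest_centre(sample, centres):
    """Return the index of the centre closest to `sample`.

    Euclidean distance is an assumption made for this sketch.
    """
    return min(range(len(centres)), key=lambda i: math.dist(sample, centres[i]))

def split_by_centre(samples, centres):
    """Group samples into len(centres) subsets by their nearest centre."""
    subsets = [[] for _ in centres]
    for sample in samples:
        subsets[nearest_centre(sample, centres)].append(sample)
    return subsets

# Illustrative values only: two centres (k = 2) and three test samples.
centres = [(0.0, 0.0), (10.0, 10.0)]
test_samples = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0)]
subsets = split_by_centre(test_samples, centres)
```

In an ensemble of this kind, each test subset would presumably be scored by the network trained on the matching training cluster; the abstract does not spell out this detail.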
Figure 1. Architecture of an auto-encoder and decoder in a deep neural network (DNN).
Figure 2. The SCDNN flow chart is divided into three steps and shows each process in detail.
The distribution of the training and testing sets from the six datasets generated from KDD’99 and NSL-KDD.
| Dataset | Train Normal% | Train DoS% | Train Probe% | Train U2R% | Train R2L% | Test Normal% | Test DoS% | Test Probe% | Test U2R% | Test R2L% |
|---|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 17.96 | 72.28 | 7.583 | 0.096 | 2.079 | 19.48 | 73.90 | 1.339 | 0.073 | 5.205 |
| Dataset 2 | 19.48 | 78.40 | 1.645 | 0.021 | 0.451 | 19.48 | 73.90 | 1.339 | 0.073 | 5.205 |
| Dataset 3 | 19.69 | 79.23 | 0.831 | 0.011 | 0.228 | 19.48 | 73.90 | 1.339 | 0.073 | 5.205 |
| Dataset 4 | 53.38 | 36.65 | 9.086 | 0.044 | 0.860 | 43.07 | 33.08 | 10.73 | 0.887 | 12.21 |
| Dataset 5 | 48.56 | 33.11 | 16.81 | 0.075 | 1.435 | 43.07 | 33.08 | 10.73 | 0.887 | 12.21 |
| Dataset 6 | 53.38 | 36.65 | 9.086 | 0.044 | 0.830 | 18.16 | 36.64 | 20.27 | 1.688 | 23.24 |
Figure 3. (a–f) Comparing SCDNN accuracy over several cluster numbers k and σ values for the six datasets.
Figure 4. (a–f) Comparing SCDNN accuracy for different numbers of clusters k for the six datasets.
Detection accuracy of five attack types using the optimal number of clusters for each dataset.
| Dataset | σ | Nor (%) | DoS (%) | Probe (%) | U2R (%) | R2L (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 0.5 | 97.21 | 96.87 | 80.32 | 11.4 | 6.88 | 91.97 |
| Dataset 2 | 0.4 | 98.42 | 97.2 | 70.64 | 3.51 | 1.57 | 92.03 |
| Dataset 3 | 0.5 | 97.61 | 97.23 | 65.96 | 4.39 | 6.59 | 92.1 |
| Dataset 4 | 0.4 | 96.17 | 75.84 | 53.37 | 3.00 | 3.01 | 72.64 |
| Dataset 5 | 0.4 | 97.19 | 74.51 | 48.37 | 5.00 | 0.62 | 71.83 |
| Dataset 6 | 0.5 | 84.20 | 50.02 | 52.66 | 1.50 | 0.98 | 44.55 |
Comparing network intrusion detection results for the six datasets (%).
| Dataset | Model | Normal | DoS | Probe | U2R | R2L | Acc | Recall | ER |
|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 | SVM | | 83 | 66.01 | 0.88 | 3.14 | 81.52 | 77.72 | 18.48 |
| | BP | 96.51 | 89.49 | 46.18 | 9.21 | 1.93 | 85.66 | 83.48 | 14.34 |
| | RF | 93.65 | 96.62 | 59.27 | 0 | 0 | 90.44 | 91.08 | 9.56 |
| | Bayes | 91.51 | 95.59 | 61.35 | 4.39 | 3.56 | 89.48 | | 10.52 |
| | SCDNN | 97.21 | | | | | | 91.68 | |
| Dataset 2 | SVM | 96.22 | 97.1 | 65.84 | 0 | 0.05 | 91.39 | 90.52 | 8.61 |
| | BP | 91.44 | | 62.69 | | | 90.93 | | 9.07 |
| | RF | 98.23 | 96.48 | 38.26 | 0 | 0 | 90.95 | 89.51 | 9.05 |
| | Bayes | 95.92 | 95.98 | 62.55 | 4.82 | 4.38 | 90.69 | 91.07 | 9.31 |
| | SCDNN | | 97.2 | | 3.51 | 1.57 | | 91.35 | |
| Dataset 3 | SVM | 95.87 | 97.23 | 64.86 | 0 | 0.06 | 91.41 | 90.59 | 8.59 |
| | BP | 81.53 | 96.95 | 8.81 | 6.14 | 7.26 | 88.03 | 90.05 | 11.97 |
| | RF | | 96.57 | 0 | 0 | 0 | 90.76 | 89.37 | 9.24 |
| | Bayes | 96.38 | 96.29 | 59.15 | | | 91.12 | 90.95 | 8.88 |
| | SCDNN | 97.61 | | | 4.39 | 6.59 | | | |
| Dataset 4 | SVM | 95.54 | 70.18 | 57.37 | 0 | 1.63 | 70.73 | 53.26 | 29.27 |
| | BP | 96.35 | 71.17 | | 0 | 0.58 | 72.16 | | 27.84 |
| | RF | | 63.11 | 7.23 | 0 | 0 | 64.57 | 40.45 | 35.43 |
| | Bayes | 93.9 | 72.18 | 41.02 | 0 | 0 | 68.73 | 52.78 | 31.27 |
| | SCDNN | 96.17 | | 53.37 | | | | 57.48 | |
| Dataset 5 | SVM | 98.57 | 18.93 | 49.89 | 0 | 0.11 | 54.1 | 20.45 | 45.9 |
| | BP | 91.79 | 7.63 | | 1.5 | | 49.53 | 27.56 | 50.47 |
| | RF | | 62.64 | 48.99 | 0 | 0 | 68.93 | 46.43 | 31.07 |
| | Bayes | 99.06 | 61.65 | 35.4 | 0 | 0 | 66.87 | 44.28 | 33.13 |
| | SCDNN | 97.19 | | 48.37 | | 0.62 | | | |
| Dataset 6 | SVM | 95.81 | 41.5 | 43.67 | 0 | 0 | 41.46 | 30.6 | 58.54 |
| | BP | 74.72 | 4.61 | | 0 | 1.53 | 33.59 | 30.6 | 66.41 |
| | RF | | 36.15 | 6.74 | 0 | 0 | 32.73 | 18.9 | 67.27 |
| | Bayes | 82.16 | 48.25 | 28.52 | 0 | 0 | 38.37 | 30.08 | 61.63 |
| | SCDNN | 84.2 | | 52.66 | | | | | |
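In the fully reported rows of the table above, Acc and ER sum to 100 (for example, 85.66 + 14.34 for BP on Dataset 1). The sketch below shows one way such summary figures could be computed from true and predicted labels. It assumes that recall is the share of attack samples correctly flagged as attacks, which the table does not confirm, and the function name `detection_metrics` and the label values are hypothetical.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, recall and error rate as percentages.

    Assumption: recall = TP / (TP + FN) for the `positive` label.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    acc = 100.0 * correct / len(y_true)
    recall = 100.0 * true_pos / actual_pos if actual_pos else 0.0
    # Error rate is the complement of accuracy, matching Acc + ER = 100.
    return {"acc": acc, "recall": recall, "er": 100.0 - acc}

# Illustrative labels only, not data from the paper.
truth = ["attack", "attack", "normal", "normal"]
predicted = ["attack", "normal", "normal", "normal"]
metrics = detection_metrics(truth, predicted)
```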
Figure 5. (a–f) Prediction accuracy histogram of the five detection models.
Basic attacker types with packet routing protocols on the ad hoc on-demand distance vector (AODV) protocol in wireless sensor networks (WSNs).
| Attack Name | Attack Description | Attack Type |
|---|---|---|
| Active Reply | The route reply is forged with abnormal support to reply. | 1 |
| Route drop | The routing packets are dropped with some specific address. | 2 |
| Modify Sequence | The number of target sequences increases with largest maximal values. | 3 |
| Rushing | Rushing of routing messages. | 4 |
| Data Interruption | A data packet is used to drop the route. | 5 |
| Route Modification | The route is modified in Routing Table Entries. | 6 |
| Change Hop | The route cost in routing tables entries is altered. | 7 |
Average detection accuracy of the SCDNN algorithm at five sensor-node scales, using optimal k and σ values (%).
| Dataset | Parameter | 10 Nodes | 20 Nodes | 30 Nodes | 40 Nodes | 50 Nodes |
|---|---|---|---|---|---|---|
| 5% Attacker in Networks | | 96.8 | 96.4 | 95.8 | 94.5 | 94.6 |
| | | 93.1 | 93.5 | 92.4 | 89.3 | 88.7 |
| | | 89.6 | 89.2 | 86.5 | 82.4 | 83.3 |
| 10% Attacker in Networks | | 96.8 | 96.4 | 95.8 | 94.5 | 94.6 |
| | | 93.1 | 93.5 | 92.4 | 89.3 | 88.7 |
| | | 89.6 | 89.2 | 86.5 | 82.4 | 83.3 |
Figure 6. Detection accuracy at five sensor-node scales with 95% confidence intervals. Accuracy is shown for (a) 5% attackers and (b) 10% attackers, each repeated 500 times.
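A 95% confidence interval over repeated runs, as plotted in Figure 6, can be approximated as follows. This is a minimal sketch that assumes a normal approximation (mean ± 1.96 standard errors); the source does not state how its intervals were computed. The function name `mean_with_ci95` and the accuracy values are hypothetical.

```python
import math
import statistics

def mean_with_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate an interval")
    mean = statistics.mean(values)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

# Illustrative accuracies from four runs; the paper repeats each setting 500 times.
accuracies = [90.0, 92.0, 94.0, 96.0]
mean, low, high = mean_with_ci95(accuracies)
```

With many runs the interval narrows in proportion to the square root of the number of runs, which is why a large repeat count gives tight error bars.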
Figure 7. (a–f) Receiver operating characteristic (ROC) curves of the five models in the six datasets, shown with optimal values of k and σ.
Area under curve (AUC) values for the ROCs of each model in the six datasets.
| Dataset | SVM | BP | RF | Bayes | SCDNN |
|---|---|---|---|---|---|
| Dataset 1 | 0.88 | 0.82 | 0.94 | 0.93 | 0.95 |
| Dataset 2 | 0.95 | 0.78 | 0.94 | 0.94 | 0.95 |
| Dataset 3 | 0.95 | 0.88 | 0.94 | 0.94 | 0.95 |
| Dataset 4 | 0.82 | 0.72 | 0.78 | 0.80 | 0.83 |
| Dataset 5 | 0.71 | 0.61 | 0.80 | 0.79 | 0.82 |
| Dataset 6 | 0.61 | 0.56 | 0.58 | 0.61 | 0.65 |
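For readers unfamiliar with AUC, the value equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it with that pairwise definition, counting ties as one half. It is an illustration only: the function name `roc_auc` and the scores are hypothetical, and the source does not say how its AUC values were calculated.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores only: 1 = attack, 0 = normal.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the table above.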
Figure 8. Average precision histograms for the five models compared between the six datasets.