Mnahi Alqahtani, Abdu Gumaei, Hassan Mathkour, Mohamed Maher Ben Ismail.
Abstract
An intrusion detection system is an essential security tool for protecting the services and infrastructure of wireless sensor networks from unseen and unpredictable attacks. A few machine learning approaches have been proposed for intrusion detection in wireless sensor networks and have achieved reasonable results. However, these approaches still need to be more accurate and efficient when dealing with imbalanced data in network traffic. In this paper, we propose a new model for detecting intrusion attacks based on a genetic algorithm and an extreme gradient boosting (XGBoost) classifier, called the GXGBoost model. The latter is a gradient boosting model designed to improve on the performance of traditional models in detecting minority attack classes in the highly imbalanced traffic data of wireless sensor networks. A set of experiments was conducted on the wireless sensor network-detection system (WSN-DS) dataset using holdout and 10-fold cross-validation techniques. The results of the 10-fold cross-validation tests revealed that the proposed approach outperformed state-of-the-art approaches and other ensemble learning classifiers, with high detection rates of 98.2%, 92.9%, 98.9%, and 99.5% for flooding, scheduling, grayhole, and blackhole attacks, respectively, in addition to 99.9% for normal traffic.
Keywords: WSN-DS; extreme gradient boosting classifier; genetic algorithm; intrusion detection system; wireless sensor networks
Year: 2019 PMID: 31658774 PMCID: PMC6832929 DOI: 10.3390/s19204383
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Flowchart of the proposed genetic-based extreme gradient boosting (GXGBoost) model.
Extracted features of the wireless sensor network-detection system (WSN-DS) Dataset.
| NO. | Feature Name | Symbol | Description |
|---|---|---|---|
| 1 | Node ID | Id | It is a unique symbolized number of the sensor node. For example, the sensor node number 13 in the fourth round and in the second stage has ID 002004013. |
| 2 | Time | Time | It is the current time of the sensor node state in the simulation. |
| 3 | Is CH? | Is_CH | It is a flag with value 1 or 0 that indicates whether or not the node is a cluster head (CH). |
| 4 | Who CH | who_CH | It is the ID of the cluster head (CH) in the existing round. |
| 5 | Received Signal Strength Indication | RSSI | It is the RSSI between a sensor node and its cluster head in the existing round. |
| 6 | Distance to cluster head | Dist_To_CH | It is the computed distance between a sensor node and its cluster head in the existing round. |
| 7 | Max distance to cluster head | M_D_CH | It is the maximum computed distance between the sensor nodes and their cluster head within the same cluster. |
| 8 | Average distance to cluster head | A_D_CH | It represents the average distance between sensor nodes within the cluster and their cluster head. |
| 9 | Current energy | Current_Energy | It is the current energy of the current round for a sensor node. |
| 10 | Energy consumption | Consumed_Energy | It is the energy amount consumed by the sensor node in the previous round. |
| 11 | Advertise cluster head sends | ADV_S | It is the number of advertise broadcast messages sent from the cluster head to the sensor nodes. |
| 12 | Advertise cluster head receives | ADV_R | It represents the number of advertise messages which are received by the sensor nodes from cluster heads. |
| 13 | Join request messages send | JOIN_S | It is the number of join request messages, which are sent by the sensor nodes to the cluster head. |
| 14 | Join request messages receive | JOIN_R | It is the number of join request messages, which are received by the cluster head from the sensor nodes. |
| 15 | Advertise SCH sends | ADV_SCH_S | It represents the number of advertise broadcast messages of the Time Division Multiple Access (TDMA) schedule which are sent to the sensor nodes. |
| 16 | Advertise SCH receives | ADV_SCH_R | It is the number of advertise broadcast messages for the TDMA schedule which are received from cluster heads. |
| 17 | Rank | Rank | It represents the order of the sensor node within the schedule of the TDMA. |
| 18 | Data sent | Data_S | It represents the number of data packets, which are sent from a sensor node to its cluster head. |
| 19 | Data received | Data_R | It represents the number of data packets that are received by a sensor node from cluster head. |
| 20 | Data sent to base station | Data_Sent_BS | It represents the number of data packets that are sent from a sensor node to the base station. |
| 21 | Distance cluster head to base station | Dist_CH_BS | It represents the distance between the cluster head and the base station. |
| 22 | Send Code | Send_code | It is the sending code of the cluster. |
| 23 | Attack Type | Attack_Type | It is the class label of the wireless sensor network traffic, which could be normal, or attack. There are four categorical types of attacks, namely, flooding, scheduling (TDMA), grayhole, and blackhole. |
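The Node ID example in row 1 (node 13, fourth round, second stage, ID 002004013) suggests a fixed-width encoding. The following is a minimal sketch of decoding such an ID, assuming three consecutive three-digit fields in the order stage, round, node. That layout is inferred from the single example only, and the names `NodeId` and `parse_node_id` are hypothetical.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    """Decoded parts of a node ID (field layout is an assumption)."""
    stage: int
    round_no: int
    node: int


def parse_node_id(raw: str) -> NodeId:
    """Split a nine-digit ID string into stage, round, and node number.

    Assumes three 3-digit fields, as suggested by the example 002004013.
    """
    if len(raw) != 9 or not raw.isdigit():
        raise ValueError(f"expected a nine-digit ID string, got {raw!r}")
    return NodeId(int(raw[0:3]), int(raw[3:6]), int(raw[6:9]))


decoded = parse_node_id("002004013")
print(decoded.stage, decoded.round_no, decoded.node)
```

The sketch deliberately accepts only nine-digit strings; how shorter numeric IDs map onto this layout is not stated in the source.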
Figure 2. Distribution of attacks in the WSN-DS Dataset.
Data samples from the WSN-DS dataset [46].
| Id | Time | Is CH | Who CH | Dist To CH | ADV S | ADV R | JOIN S | JOIN R | SCH S | SCH R | Rank | DATA S | DATA R | Data Sent To BS | Dist CH To BS | Send Code | Consumed Energy | Attack Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101000 | 50 | 1 | 101000 | 0 | 1 | 0 | 0 | 25 | 1 | 0 | 0 | 0 | 1200 | 48 | 130.0854 | 0 | 2.4694 | Normal |
| 101001 | 50 | 0 | 101044 | 75.32345 | 0 | 4 | 1 | 0 | 0 | 1 | 2 | 38 | 0 | 0 | 0 | 4 | 0.06957 | Normal |
| 101002 | 50 | 0 | 101010 | 46.95453 | 0 | 4 | 1 | 0 | 0 | 1 | 19 | 41 | 0 | 0 | 0 | 3 | 0.06898 | Normal |
| 101004 | 50 | 0 | 101010 | 4.83341 | 0 | 4 | 1 | 0 | 0 | 1 | 25 | 41 | 0 | 0 | 0 | 3 | 0.06534 | Normal |
| 2901024 | 3553 | 1 | 2901024 | 0 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 113.2765 | 0 | 0.01237 | Grayhole |
| 2901029 | 3553 | 1 | 2901029 | 0 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 150.3168 | 0 | 0.01237 | Grayhole |
| 2901073 | 3553 | 1 | 2901100 | 0 | 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 96.57363 | 0 | 0.01813 | Grayhole |
| 501014 | 1703 | 1 | 501100 | 0 | 1 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00446 | Blackhole |
| 501021 | 1703 | 1 | 501100 | 0 | 1 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00445 | Blackhole |
| 501029 | 1703 | 1 | 501100 | 0 | 1 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00446 | Blackhole |
| 501030 | 1703 | 1 | 501100 | 0 | 1 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00445 | Blackhole |
| 404017 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.18101 | TDMA |
| 404018 | 2203 | 0 | 404028 | 8.59592 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 160 | 0 | 0 | 0 | 3 | 0.26334 | Normal |
| 404020 | 2203 | 0 | 404100 | 12.89353 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 181 | 0 | 0 | 0 | 4 | 0.29774 | Normal |
| 404023 | 2203 | 0 | 404100 | 19.59164 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 181 | 0 | 0 | 0 | 1 | 0.47633 | Normal |
| 404025 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 241 | 241 | 138.3672 | 0 | 2.02545 | TDMA |
| 404028 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00623 | TDMA |
| 404029 | 2203 | 0 | 404100 | 18.31869 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 206 | 0 | 0 | 0 | 5 | 0.33993 | Normal |
| 404035 | 2203 | 0 | 404100 | 15.82954 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 181 | 0 | 0 | 0 | 1 | 0.47308 | Normal |
| 404050 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00624 | TDMA |
| 404053 | 2203 | 0 | 404100 | 19.42763 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 160 | 0 | 0 | 0 | 3 | 0.2652 | Normal |
| 404060 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.09609 | TDMA |
| 404073 | 2203 | 0 | 404100 | 14.13972 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 206 | 0 | 0 | 0 | 5 | 0.33878 | Normal |
| 404078 | 2203 | 0 | 404100 | 10.54019 | 0 | 10 | 1 | 0 | 0 | 1 | 1 | 206 | 0 | 0 | 0 | 2 | 1.42778 | Normal |
| 404080 | 2203 | 1 | 404100 | 0 | 1 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 241 | 241 | 176.6235 | 0 | 2.5962 | TDMA |
| 302096 | 1153 | 1 | 302096 | 0 | 6 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 121.695 | 0 | 0.35722 | Flooding |
| 401001 | 1203 | 1 | 401001 | 0 | 6 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 136.2575 | 0 | 0.2398 | Flooding |
| 401034 | 1203 | 1 | 401034 | 0 | 6 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 165.4621 | 0 | 0.26426 | Flooding |
| 401054 | 1203 | 1 | 401054 | 0 | 6 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 142.1079 | 0 | 0.24251 | Flooding |
| 401069 | 1203 | 1 | 401069 | 0 | 6 | 26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 93.93772 | 0 | 0.21994 | Flooding |
The dataset was split into a 60% training set and a 40% testing set using the holdout method.
| The Attack Type | Training Set (60%) | Testing Set (40%) |
|---|---|---|
| Blackhole | 6029 | 4020 |
| Grayhole | 8758 | 5838 |
| Flooding | 1988 | 1324 |
| Scheduling | 3982 | 2656 |
| Normal | 204,039 | 136,027 |
| Sum | 224,796 | 149,865 |
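The per-class counts above keep roughly the same 60/40 ratio in every class, which is what a stratified holdout split produces. The following is a minimal sketch of such a split on toy labels, not the authors' procedure; the function name `stratified_holdout`, the seed, and the toy class sizes are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout(labels, train_fraction=0.6, seed=0):
    """Return (train_indices, test_indices), split class by class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train, test = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Toy, heavily imbalanced labels (hypothetical sizes, not the WSN-DS counts).
toy_labels = ["Normal"] * 100 + ["Flooding"] * 10 + ["Blackhole"] * 5
train_idx, test_idx = stratified_holdout(toy_labels)
print(Counter(toy_labels[i] for i in train_idx))
print(Counter(toy_labels[i] for i in test_idx))
```

Splitting within each class keeps the minority attack classes represented in both sets, which matters when the normal class dominates the data.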
Precision results of the 10 fold cross validation.
| Fold No. | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| 1 | 1.00 | 0.96 | 0.99 | 0.99 | 0.99 |
| 2 | 1.00 | 0.97 | 0.99 | 0.99 | 0.99 |
| 3 | 1.00 | 0.97 | 0.99 | 0.99 | 0.99 |
| 4 | 1.00 | 0.95 | 0.99 | 0.99 | 0.99 |
| 5 | 1.00 | 0.94 | 0.98 | 0.99 | 0.99 |
| 6 | 1.00 | 0.95 | 0.98 | 0.99 | 0.99 |
| 7 | 1.00 | 0.97 | 1.00 | 0.99 | 0.99 |
| 8 | 1.00 | 0.94 | 0.99 | 0.99 | 1.00 |
| 9 | 1.00 | 0.96 | 0.99 | 0.99 | 0.99 |
| 10 | 1.00 | 0.97 | 0.99 | 0.99 | 0.99 |
Recall results of the 10 fold cross validation.
| Fold No. | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| 1 | 1.00 | 0.99 | 0.93 | 0.99 | 0.99 |
| 2 | 1.00 | 0.98 | 0.93 | 0.99 | 0.99 |
| 3 | 1.00 | 0.98 | 0.93 | 0.99 | 1.00 |
| 4 | 1.00 | 0.99 | 0.92 | 0.99 | 1.00 |
| 5 | 1.00 | 0.98 | 0.92 | 0.99 | 1.00 |
| 6 | 1.00 | 0.98 | 0.94 | 0.99 | 0.99 |
| 7 | 1.00 | 0.98 | 0.95 | 0.99 | 1.00 |
| 8 | 1.00 | 0.99 | 0.92 | 0.99 | 0.99 |
| 9 | 1.00 | 0.98 | 0.93 | 0.98 | 0.99 |
| 10 | 1.00 | 0.98 | 0.91 | 0.99 | 1.00 |
F1-score results of the 10 fold cross validation.
| Fold No. | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| 1 | 1.00 | 0.97 | 0.96 | 0.99 | 0.99 |
| 2 | 1.00 | 0.97 | 0.96 | 0.99 | 0.99 |
| 3 | 1.00 | 0.97 | 0.96 | 0.99 | 0.99 |
| 4 | 1.00 | 0.97 | 0.95 | 0.99 | 0.99 |
| 5 | 1.00 | 0.96 | 0.95 | 0.99 | 0.99 |
| 6 | 1.00 | 0.96 | 0.96 | 0.99 | 0.99 |
| 7 | 1.00 | 0.97 | 0.97 | 0.99 | 1.00 |
| 8 | 1.00 | 0.97 | 0.96 | 0.99 | 1.00 |
| 9 | 1.00 | 0.97 | 0.96 | 0.99 | 0.99 |
| 10 | 1.00 | 0.97 | 0.95 | 0.99 | 1.00 |
Positive and negative rate results of the 10-fold cross-validation.
| | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| TPR | 0.999 | 0.982 | 0.929 | 0.989 | 0.995 |
| TNR | 0.982 | 1 | 1 | 1 | 1 |
| FPR | 0.018 | 0 | 0 | 0.1 | 0 |
| FNR | 0.001 | 0.018 | 0.071 | 0.011 | 0.005 |
| Overall Accuracy | 0.997 | | | | |
TPR: true positive rate, TNR: true negative rate, FPR: false positive rate, and FNR: false negative rate.
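For a multi-class problem, these four rates are usually computed per class in a one-vs-rest fashion from the confusion matrix. The following is a minimal sketch under that assumption; the function name `one_vs_rest_rates` and the toy 3x3 matrix are hypothetical and do not come from the paper.

```python
def one_vs_rest_rates(confusion, class_index):
    """Compute TPR, TNR, FPR, and FNR for one class.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp                  # missed samples of this class
    fp = sum(row[class_index] for row in confusion) - tp   # other classes predicted as this one
    tn = total - tp - fn - fp
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Toy confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [90, 5, 5],
    [2, 8, 0],
    [1, 0, 9],
]
rates = one_vs_rest_rates(toy_confusion, 0)
print(rates)
```

By construction, TPR + FNR = 1 and TNR + FPR = 1 for every class, which gives a quick consistency check on any reported row of rates.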
Average results of precision, recall, and F1-score, and their weighted average for the 10 fold cross validation.
| | Precision | Recall | F1-Score |
|---|---|---|---|
| Normal | 1 | 1 | 1 |
| Flooding | 0.958 | 0.983 | 0.968 |
| Scheduling | 0.989 | 0.928 | 0.958 |
| Grayhole | 0.99 | 0.989 | 0.99 |
| Blackhole | 0.991 | 0.995 | 0.993 |
| Weighted avg. | 1 | 1 | 1 |
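The weighted averages round to 1 even though several attack classes score lower, because the normal class dominates the sample count. The following is a minimal sketch of a support-weighted average using the per-class scores from the table. The supports are an assumption: they are the class totals obtained by adding the training and testing columns of the holdout table, and the helper name `weighted_average` is hypothetical.

```python
# Per-class (precision, recall, F1) averages from the table above.
scores = {
    "Normal": (1.0, 1.0, 1.0),
    "Flooding": (0.958, 0.983, 0.968),
    "Scheduling": (0.989, 0.928, 0.958),
    "Grayhole": (0.99, 0.989, 0.99),
    "Blackhole": (0.991, 0.995, 0.993),
}

# Assumed supports: training + testing counts per class from the holdout table.
support = {
    "Normal": 204039 + 136027,
    "Flooding": 1988 + 1324,
    "Scheduling": 3982 + 2656,
    "Grayhole": 8758 + 5838,
    "Blackhole": 6029 + 4020,
}


def weighted_average(metric_index):
    """Average one metric over classes, weighting each class by its support."""
    total = sum(support.values())
    return sum(scores[c][metric_index] * support[c] for c in scores) / total


weighted = [weighted_average(i) for i in range(3)]
print([round(value, 2) for value in weighted])
```

Under these assumed supports, each weighted value lies slightly below 1 and rounds to 1.00 at two decimals, so the weighted row says little about minority classes; the per-class rows are the informative ones.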
Figure 3. Confusion matrix of intrusion detection of the proposed GXGBoost model using the holdout method on the WSN-DS Dataset.
Positive and negative rate results of the 10-fold cross-validation.
| | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| TPR | 1 | 0.98 | 0.93 | 0.99 | 0.99 |
| TNR | 0.98 | 1 | 1 | 1 | 1 |
| FPR | 0.02 | 0 | 0 | 0 | 0 |
| FNR | 0 | 0.02 | 0.07 | 0.01 | 0.01 |
| Overall Accuracy | 0.997 | | | | |
Precision, recall, and F1-score results of the holdout method.
| | Precision | Recall | F1-Score |
|---|---|---|---|
| Normal | 1 | 1 | 1 |
| Flooding | 0.96 | 0.98 | 0.97 |
| Scheduling | 0.99 | 0.93 | 0.96 |
| Grayhole | 0.99 | 0.99 | 0.99 |
| Blackhole | 0.99 | 0.99 | 0.99 |
| Weighted avg. | 1 | 1 | 1 |
Comparison of TPR results for GXGBoost against the original XGBoost and other boosting classifier models.
| Classifier | Normal | Flooding | Scheduling | Grayhole | Blackhole |
|---|---|---|---|---|---|
| AdaBoost | 0.9900 | 0.9700 | 0.9000 | 0.8200 | 0.3800 |
| GB | 0.9977 | 0.9872 | 0.9239 | 0.8659 | 0.8714 |
| XGBoost | 0.9976 | 0.9970 | 0.9194 | 0.9409 | 0.9622 |
| Proposed GXGBoost | 1.0000 | 0.9800 | 0.9300 | 0.9900 | 0.9900 |
Figure 4. ROC curves for the compared classifier models on the WSN-DS Dataset: (a) GXGBoost, (b) XGBoost, (c) GB, and (d) AdaBoost.
Average execution time of classification in seconds (s).
| Model | Average Classification Time |
|---|---|
| AdaBoost | 10.093 s |
| GB | 3.338 s |
| XGBoost | 2.172 s |
| Proposed GXGBoost | 1.905 s |
Figure 5. TPR percentage values of the proposed GXGBoost compared to the results of related work in Reference [46] using the 10-fold cross-validation method.