Josue Genaro Almaraz-Rivera, Jesus Arturo Perez-Diaz, Jose Antonio Cantoral-Ceballos.
Abstract
From smart homes to industrial environments, the IoT eases daily activities, some of which are critical. More and more devices are connected to and through the Internet, and the large number of different manufacturers may lead to a lack of security standards. Denial of service attacks (DDoS, DoS) represent the most common and critical attacks against and from these networks; in the third quarter of 2021, the total number of advanced targeted DDoS attacks increased by 31% compared to the same period of 2020. This work uses the Bot-IoT dataset, addressing its class imbalance problem, to build a novel Intrusion Detection System based on Machine Learning and Deep Learning models. To evaluate how the record timestamps affect the predictions, we used three different feature sets for binary and multiclass classification; this helped us avoid feature dependencies, as produced by the Argus flow data generator, whilst achieving an average accuracy >99%. We then conducted comprehensive experimentation, including a time performance evaluation, matching and exceeding the results of the current state of the art for identifying denial of service attacks; the Decision Tree and Multi-layer Perceptron models were the best-performing methods for identifying DDoS and DoS attacks over IoT networks.
Keywords: DDoS attacks; DoS attacks; IoT networks; class balancing; deep learning; intrusion detection system; machine learning
Year: 2022; PMID: 35591056; PMCID: PMC9103313; DOI: 10.3390/s22093367
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Comparison between our approach and the related work around the Bot-IoT dataset.
| Criterion | This work | [ | [ | [ | [ | [ | [ | [ |
|---|---|---|---|---|---|---|---|---|
| Class balancing | ✔ | ✗ | ✗ | ✗ | ✔ | ✗ | ✗ | ✔ |
| ML models evaluation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ | ✔ |
| DL models evaluation | ✔ | ✔ | ✔ | ✗ | ✔ | ✗ | ✔ | ✗ |
| Feature set(s) proposal | ✔ | ✔ | ✗ | ✔ | ✔ | ✔ | ✗ | ✔ |
| Time performance evaluation | ✔ | ✗ | ✗ | ✔ | ✗ | ✗ | ✗ | ✗ |
| Flow-level detection | ✔ | ✔ | ✔ | ✔ | ✗ | ✔ | ✔ | ✔ |
Figure 1. Data distribution plot for multiclass classification.
Figure 2. Data distribution plot for binary classification.
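
The abstract states that the class imbalance of Bot-IoT was addressed, but this page does not say which balancing technique was applied. As a rough illustration only, the sketch below swaps in plain random undersampling, which shrinks every class to the size of the smallest one. The `category` key, the toy records, and the `undersample` helper are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def undersample(records, label_key="category", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)

    # The rarest class determines how many records each class keeps.
    target = min(len(group) for group in by_class.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 3 normal flows versus 50 attack flows.
records = [{"category": "Normal", "pkts": i} for i in range(3)]
records += [{"category": "DDoS", "pkts": i} for i in range(50)]

balanced = undersample(records)
counts = Counter(rec["category"] for rec in balanced)
```

After balancing, both classes contribute the same number of records, so a classifier can no longer reach a high accuracy simply by always predicting the majority class.
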
Feature sets selected.
| Name | Features | Description |
|---|---|---|
| First feature set | stime, pkts, bytes, ltime, seq, dur, mean, stddev, sum, min, max, spkts, dpkts, sbytes, dbytes, rate, srate, drate | Using timestamps, the Argus sequence number, and the statistical variables (i.e., rates, mean, maximum, minimum, etc.). |
| Second feature set | pkts, bytes, dur, mean, stddev, sum, min, max, spkts, dpkts, sbytes, dbytes, rate, srate, drate | With neither timestamps nor the Argus sequence number; only the statistical variables. |
| Third feature set | pkts, bytes, seq, dur, mean, stddev, sum, min, max, spkts, dpkts, sbytes, dbytes, rate, srate, drate | With the Argus sequence number and the statistical variables. |
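
The three feature sets in the table above are nested: the third set is the first one without the two timestamps, and the second set additionally drops the Argus sequence number. A minimal sketch of that relationship follows; the constant names and the `select_features` helper are illustrative assumptions, not code from the paper.

```python
# Feature names copied from the table above.
FIRST_FEATURE_SET = [
    "stime", "pkts", "bytes", "ltime", "seq", "dur", "mean", "stddev", "sum",
    "min", "max", "spkts", "dpkts", "sbytes", "dbytes", "rate", "srate", "drate",
]

TIMESTAMPS = {"stime", "ltime"}
SEQUENCE_NUMBER = {"seq"}

# Third set: first set without the timestamps.
THIRD_FEATURE_SET = [f for f in FIRST_FEATURE_SET if f not in TIMESTAMPS]

# Second set: third set without the Argus sequence number.
SECOND_FEATURE_SET = [f for f in THIRD_FEATURE_SET if f not in SEQUENCE_NUMBER]


def select_features(record, feature_set):
    """Project one flow record (a dict) onto the chosen feature set, in order."""
    return [record[name] for name in feature_set]


# Hypothetical flow record in which every feature has the dummy value 0.
record = {name: 0 for name in FIRST_FEATURE_SET}
vector = select_features(record, SECOND_FEATURE_SET)
```

The resulting vector lengths are 18, 15, and 16 for the first, second, and third feature sets, respectively.
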
Variables description.
| Feature | Description |
|---|---|
| stime | Record start time. |
| ltime | Record last time. |
| seq | Argus sequence number. |
| pkts | Total number of packets in transaction. |
| bytes | Total number of bytes in transaction. |
| dur | Record total duration. |
| mean | Average duration at records aggregate level. |
| stddev | Standard deviation of the duration at records aggregate level. |
| sum | Total duration at records aggregate level. |
| min | Minimum duration at records aggregate level. |
| max | Maximum duration at records aggregate level. |
| spkts | Source-to-destination packet count. |
| dpkts | Destination-to-source packet count. |
| sbytes | Source-to-destination bytes count. |
| dbytes | Destination-to-source bytes count. |
| rate | Total packets per second in transaction. |
| srate | Source-to-destination packets per second. |
| drate | Destination-to-source packets per second. |
Figure 3. Correlation matrix for multiclass classification.
Figure 4. Correlation matrix for binary classification.
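
A correlation matrix such as those in Figures 3 and 4 holds one coefficient per pair of features. The source does not state which coefficient was used; the sketch below assumes the Pearson coefficient and uses hypothetical values for three of the features.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict with the pairwise coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for four flows; bytes is exactly 60 times pkts.
columns = {
    "pkts": [2, 4, 6, 8],
    "bytes": [120, 240, 360, 480],
    "dur": [4.0, 1.0, 3.0, 2.0],
}
matrix = correlation_matrix(columns)
```

In this toy data `pkts` and `bytes` are perfectly correlated (coefficient 1), the kind of feature dependency that a correlation matrix makes visible, whereas `pkts` and `dur` are only weakly related (coefficient -0.4).
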
Summary of ML models parameters for the first feature set.
| Model | Binary Classification | Multiclass Classification |
|---|---|---|
| SVM | Kernel: Radial Basis Function; Max iterations: 70,000 | Kernel: Linear; Max iterations: 70,000 |
| Decision Tree | Max depth: 11; Entropy criterion | Max depth: 10; Entropy criterion |
| Random Forest | Max depth: 11; Entropy criterion; Trees: 12 | Max depth: 10; Entropy criterion; Trees: 9 |
Summary of ML models parameters for the second feature set.
| Model | Binary Classification | Multiclass Classification |
|---|---|---|
| SVM | Kernel: Radial Basis Function; Max iterations: 50,000 | Kernel: Radial Basis Function; Max iterations: 50,000 |
| Decision Tree | Max depth: 7; Entropy criterion | Max depth: 8; Entropy criterion |
| Random Forest | Max depth: 7; Entropy criterion; Trees: 2 | Max depth: 8; Entropy criterion; Trees: 9 |
Summary of ML models parameters for the third feature set.
| Model | Binary Classification | Multiclass Classification |
|---|---|---|
| SVM | Kernel: Radial Basis Function; Max iterations: 50,000 | Kernel: Radial Basis Function; Max iterations: 70,000 |
| Decision Tree | Max depth: 8; Entropy criterion | Max depth: 7; Entropy criterion |
| Random Forest | Max depth: 8; Entropy criterion; Trees: 11 | Max depth: 7; Entropy criterion; Trees: 21 |
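
The ML parameters from the three tables above can be gathered into a single lookup keyed by feature set and task. This is a minimal sketch: the page does not name the library used, so the keyword names (`criterion`, `max_depth`, `n_estimators`, `kernel`, `max_iter`) are an assumption that follows scikit-learn's conventions, and the helper functions and the `HYPERPARAMS` name are hypothetical.

```python
def tree_params(depth):
    """Decision Tree settings: entropy criterion with a depth limit."""
    return {"criterion": "entropy", "max_depth": depth}


def forest_params(depth, trees):
    """Random Forest settings: entropy criterion, depth limit, tree count."""
    return {"criterion": "entropy", "max_depth": depth, "n_estimators": trees}


def svm_params(kernel, max_iter):
    """SVM settings: kernel name and iteration cap."""
    return {"kernel": kernel, "max_iter": max_iter}


def models(svm, tree, forest):
    return {"svm": svm, "decision_tree": tree, "random_forest": forest}


# Keys are (feature set, task); values are transcribed from the tables above.
HYPERPARAMS = {
    ("first", "binary"): models(
        svm_params("rbf", 70_000), tree_params(11), forest_params(11, 12)),
    ("first", "multiclass"): models(
        svm_params("linear", 70_000), tree_params(10), forest_params(10, 9)),
    ("second", "binary"): models(
        svm_params("rbf", 50_000), tree_params(7), forest_params(7, 2)),
    ("second", "multiclass"): models(
        svm_params("rbf", 50_000), tree_params(8), forest_params(8, 9)),
    ("third", "binary"): models(
        svm_params("rbf", 50_000), tree_params(8), forest_params(8, 11)),
    ("third", "multiclass"): models(
        svm_params("rbf", 70_000), tree_params(7), forest_params(7, 21)),
}

params = HYPERPARAMS[("third", "multiclass")]["random_forest"]
```

If scikit-learn is the library in use, each dictionary could be unpacked into the matching estimator, for example `RandomForestClassifier(**params)`.
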
Summary of DL models parameters for the three feature sets.
| Model | Binary Classification | Multiclass Classification |
|---|---|---|
| RNN, LSTM, GRU, MLP | Classes: 2; Batch size: 128; Input size: 18, 15, and 16; Hidden size: 128 (512 for MLP); Layers: 3 (4 for MLP); Sequence length: 1 (None for MLP); Epochs: 100; Optimizer: Adam; Loss function: Cross Entropy; Learning rate: 0.0011; Device: CPU | Classes: 4; Batch size: 128; Input size: 18, 15, and 16; Hidden size: 128 (512 for MLP); Layers: 3 (4 for MLP); Sequence length: 1 (None for MLP); Epochs: 100; Optimizer: Adam; Loss function: Cross Entropy; Learning rate: 0.0011; Device: CPU |
Multiclass classification results for the first feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 99.945% | 99.945% | 99.945% | 99.945% |
| Decision Tree | 99.917% | 99.918% | 99.917% | 99.917% |
| LSTM | 99.862% | 99.862% | 99.864% | 99.863% |
| GRU | 99.862% | 99.861% | 99.865% | 99.863% |
| MLP | 99.862% | 99.861% | 99.865% | 99.863% |
| RNN | 99.807% | 99.806% | 99.811% | 99.808% |
| SVM | 94.056% | 94.661% | 94.056% | 94.122% |
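
The four metrics reported in these result tables can all be derived from a confusion matrix. The sketch below assumes support-weighted averaging across classes; the source does not state which averaging was used, although weighted recall always equals accuracy, which is consistent with the Random Forest, Decision Tree, and SVM rows above. The confusion matrix counts are hypothetical.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall, and F1.

    cm[i][j] is the number of flows of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for i in range(n_classes):
        support = sum(cm[i])                                  # true members of class i
        predicted = sum(cm[r][i] for r in range(n_classes))   # flows predicted as class i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical counts; classes are 0 = Normal, 1 = UDP, 2 = TCP, 3 = HTTP.
cm = [
    [48, 1, 1, 0],
    [0, 50, 0, 0],
    [2, 0, 47, 1],
    [0, 0, 3, 47],
]
accuracy, precision, recall, f1 = weighted_metrics(cm)
```

With macro averaging (an unweighted mean over classes), recall and accuracy would generally differ whenever the class supports are unequal.
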
Binary classification results for the first feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 99.972% | 99.973% | 99.972% | 99.972% |
| Decision Tree | 99.945% | 99.945% | 99.945% | 99.945% |
| RNN | 99.862% | 99.889% | 99.926% | 99.908% |
| MLP | 99.862% | 99.889% | 99.926% | 99.908% |
| GRU | 99.835% | 99.852% | 99.926% | 99.889% |
| LSTM | 99.807% | 99.816% | 99.926% | 99.871% |
| SVM | 98.404% | 98.431% | 98.404% | 98.388% |
Multiclass classification results for the second feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 99.89% | 99.89% | 99.89% | 99.89% |
| Decision Tree | 99.862% | 99.863% | 99.862% | 99.862% |
| MLP | 96.34% | 96.372% | 96.375% | 96.354% |
| GRU | 96.23% | 96.276% | 96.25% | 96.249% |
| LSTM | 96.01% | 96.042% | 96.042% | 99.022% |
| RNN | 95.019% | 95.126% | 95.027% | 95.049% |
| SVM | 75.482% | 78.481% | 75.482% | 75.218% |
Binary classification results for the second feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Decision Tree | 99.862% | 99.862% | 99.862% | 99.862% |
| Random Forest | 99.835% | 99.835% | 99.835% | 99.835% |
| GRU | 97.111% | 97.797% | 98.339% | 98.067% |
| LSTM | 96.753% | 97.437% | 98.228% | 97.831% |
| MLP | 96.505% | 97.498% | 97.822% | 97.66% |
| RNN | 96.147% | 97.381% | 97.453% | 97.417% |
| SVM | 81.205% | 84.721% | 81.205% | 76.825% |
Multiclass classification results for the third feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 99.917% | 99.918% | 99.917% | 99.917% |
| Decision Tree | 99.862% | 99.863% | 99.862% | 99.862% |
| GRU | 99.697% | 99.693% | 99.702% | 99.697% |
| MLP | 99.642% | 99.638% | 99.646% | 99.641% |
| LSTM | 99.56% | 99.557% | 99.565% | 99.561% |
| RNN | 99.56% | 99.556% | 99.566% | 99.56% |
| SVM | 89.351% | 89.603% | 89.351% | 89.347% |
Binary classification results for the third feature set.
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Decision Tree | 99.89% | 99.89% | 99.89% | 99.89% |
| Random Forest | 99.835% | 99.835% | 99.835% | 99.835% |
| MLP | 99.697% | 99.742% | 99.852% | 99.797% |
| GRU | 99.642% | 99.595% | 99.926% | 99.76% |
| RNN | 99.615% | 99.595% | 99.889% | 99.742% |
| LSTM | 99.615% | 99.559% | 99.926% | 99.742% |
| SVM | 94.194% | 94.347% | 94.194% | 94.246% |
Multiclass classification time performance for the first feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 29,453 | 790.687 |
| MLP | 8306 | 537.827 |
| SVM | 4283 | 139.935 |
| RNN | 4158 | 59.906 |
| GRU | 2497 | 51.75 |
| LSTM | 2388 | 20.823 |
| Random Forest | 1813 | 65.692 |
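
The time performance tables report a mean and a standard deviation of classified flows per second. The measurement procedure is not described on this page; the sketch below shows one plausible way to obtain such figures, assuming the prediction step is timed over several repeated runs and the sample standard deviation is reported. The function names and the timings are hypothetical.

```python
import statistics
import time


def throughput_stats(n_flows, elapsed_runs):
    """Mean and sample standard deviation of flows per second over timed runs."""
    rates = [n_flows / seconds for seconds in elapsed_runs]
    return statistics.mean(rates), statistics.stdev(rates)


def time_predictions(predict, flows, repeats=5):
    """Time predict(flows) several times and summarise the throughput."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(flows)
        elapsed.append(time.perf_counter() - start)
    return throughput_stats(len(flows), elapsed)


# Hypothetical timings: 1000 flows classified in 0.5 s, 0.25 s, and 0.5 s.
avg, stddev = throughput_stats(1000, [0.5, 0.25, 0.5])
```

In practice `time_predictions` would wrap the prediction call of a trained model on a held-out batch, which has to be large enough for each run to take a measurable amount of time.
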
Binary classification time performance for the first feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 29,452 | 716.966 |
| MLP | 9411 | 38.543 |
| SVM | 4956 | 25.011 |
| RNN | 4375 | 77.826 |
| GRU | 2661 | 8.712 |
| LSTM | 2610 | 5.094 |
| Random Forest | 1350 | 81.339 |
Multiclass classification time performance for the second feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 30,362 | 681.989 |
| MLP | 9319 | 48.97 |
| RNN | 4742 | 49.485 |
| GRU | 2864 | 17.051 |
| LSTM | 2702 | 33.465 |
| Random Forest | 1954 | 15.106 |
| SVM | 651 | 8.033 |
Binary classification time performance for the second feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 29,940 | 523.611 |
| MLP | 9177 | 142.993 |
| RNN | 4697 | 27.281 |
| Random Forest | 4571 | 60.758 |
| GRU | 2763 | 49.491 |
| LSTM | 2687 | 22.446 |
| SVM | 866 | 7.232 |
Multiclass classification time performance for the third feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 33,094 | 378.595 |
| MLP | 9934 | 257.982 |
| RNN | 4823 | 99.721 |
| GRU | 2918 | 51.754 |
| LSTM | 2877 | 65.451 |
| SVM | 1171 | 17.393 |
| Random Forest | 994 | 22.931 |
Binary classification time performance for the third feature set.
| Model | Avg Flows/s | Stddev Flows/s |
|---|---|---|
| Decision Tree | 32,607 | 151.361 |
| MLP | 10,017 | 101.06 |
| RNN | 4883 | 123.54 |
| GRU | 2996 | 34.989 |
| LSTM | 2864 | 79.405 |
| Random Forest | 1668 | 89.448 |
| SVM | 1422 | 10.979 |
Figure 5. Accuracy of the Decision Tree, the best model, across the three feature sets for binary and multiclass classification.
Figure 6. Confusion matrix for Decision Tree multiclass classification, using the best feature set. Axis labels: 0 = Normal, 1 = UDP, 2 = TCP, 3 = HTTP.
Figure 7. Confusion matrix for Decision Tree binary classification, using the best feature set. Axis labels: 0 = Normal, 1 = Attack.
Binary classification results for Normal flows vs. DDoS/DoS subcategories (protocols), using the first feature set.
| Classes | Best Model(s) | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Normal vs. DDoS | Random Forest | 99.956% | 99.956% | 99.956% | 99.956% |
| Normal vs. DDoS UDP | Decision Tree and Random Forest | 99.853% | 99.853% | 99.853% | 99.853% |
| Normal vs. DDoS HTTP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DDoS TCP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DoS | Random Forest | 99.956% | 99.956% | 99.956% | 99.956% |
| Normal vs. DoS UDP | All models, except for SVM | 100% | 100% | 100% | 100% |
| Normal vs. DoS HTTP | Decision Tree | 100% | 100% | 100% | 100% |
| Normal vs. DoS TCP | All models, except for SVM | 100% | 100% | 100% | 100% |
Binary classification results for Normal flows vs. DDoS/DoS subcategories (protocols), using the second feature set.
| Classes | Best Model(s) | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Normal vs. DDoS | Decision Tree and Random Forest | 99.956% | 99.956% | 99.956% | 99.956% |
| Normal vs. DDoS UDP | Decision Tree and Random Forest | 99.853% | 99.853% | 99.853% | 99.853% |
| Normal vs. DDoS HTTP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DDoS TCP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DoS | Random Forest | 99.868% | 99.868% | 99.868% | 99.868% |
| Normal vs. DoS UDP | All models, except for SVM | 100% | 100% | 100% | 100% |
| Normal vs. DoS HTTP | Decision Tree | 100% | 100% | 100% | 100% |
| Normal vs. DoS TCP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
Binary classification results for Normal flows vs. DDoS/DoS subcategories (protocols), using the third feature set.
| Classes | Best Model(s) | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Normal vs. DDoS | Random Forest | 99.956% | 99.956% | 99.956% | 99.956% |
| Normal vs. DDoS UDP | Random Forest | 99.853% | 99.853% | 99.853% | 99.853% |
| Normal vs. DDoS HTTP | Decision Tree and Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DDoS TCP | Random Forest | 100% | 100% | 100% | 100% |
| Normal vs. DoS | Random Forest | 99.868% | 99.868% | 99.868% | 99.868% |
| Normal vs. DoS UDP | All models, except for SVM | 100% | 100% | 100% | 100% |
| Normal vs. DoS HTTP | Decision Tree | 100% | 100% | 100% | 100% |
| Normal vs. DoS TCP | All models, except for SVM | 100% | 100% | 100% | 100% |
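
Each row of these per-subcategory tables corresponds to a binary problem built from Normal flows plus a single attack category, optionally restricted to one protocol. A minimal sketch of how such a subset could be assembled is given below; the `category` and `subcategory` keys, the toy records, and the `normal_vs_attack` helper are hypothetical.

```python
def normal_vs_attack(records, category, subcategory=None):
    """Keep Normal flows plus one attack (sub)category.

    Returns (record, label) pairs, with label 0 for Normal and 1 for Attack.
    """
    pairs = []
    for rec in records:
        if rec["category"] == "Normal":
            pairs.append((rec, 0))
        elif rec["category"] == category and subcategory in (None, rec["subcategory"]):
            pairs.append((rec, 1))
        # Flows from any other attack category or protocol are left out.
    return pairs


# Hypothetical toy records.
records = [
    {"category": "Normal", "subcategory": "Normal"},
    {"category": "DDoS", "subcategory": "UDP"},
    {"category": "DDoS", "subcategory": "TCP"},
    {"category": "DoS", "subcategory": "UDP"},
    {"category": "DoS", "subcategory": "HTTP"},
]

ddos_udp = normal_vs_attack(records, "DDoS", "UDP")  # Normal vs. DDoS UDP
dos_all = normal_vs_attack(records, "DoS")           # Normal vs. DoS (all protocols)
```

The same helper covers all eight rows of a table: passing only a category gives the "Normal vs. DDoS" and "Normal vs. DoS" cases, and adding a protocol gives the UDP, TCP, and HTTP cases.
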