Mohammed Saeed Alzahrani, Fawaz Waselallah Alsaade.
Abstract
The Internet plays a fundamental part in continuous communication, so its applicability can decrease the impact of intrusions. Intrusions are defined as activities that adversely affect the targeted computer. Intrusions may compromise the reputation, integrity, privacy, and availability of the attacked assets. A computer security system is compromised when an intrusion happens. The novelty of the proposed intelligent cybersecurity system is its ability to protect Internet of Things (IoT) devices and other networks from incoming attacks. In this research, various machine learning and deep learning algorithms, namely, the quantum support vector machine (QSVM), k-nearest neighbor (KNN), linear discriminant, quadratic discriminant, long short-term memory (LSTM), and autoencoder algorithms, were applied to detect attacks from signature databases. The correlation method was used to select important network features by finding the features most strongly correlated with the dataset classes. As a result, nine features were selected. A one-hot encoding method was applied to convert the categorical features into numerical features. The system was validated on the benchmark KDD Cup database. Statistical analysis methods were applied to evaluate the results of the proposed study. Binary and multiple classifications were conducted to classify the normal and attack packets. Experimental results demonstrated that the KNN and LSTM algorithms achieved better classification performance for developing intrusion detection systems; the accuracy of the KNN and LSTM algorithms for binary classification was 98.55% and 97.28%, and they also attained high accuracy for multiple classification (98.28% and 97.07%). Finally, the KNN and LSTM algorithms are well suited to building intrusion detection systems.
Year: 2022 PMID: 35341179 PMCID: PMC8956412 DOI: 10.1155/2022/4705325
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Proposed system.
Feature names of the KDD Cup dataset.
| S. No | Feature names |
|---|---|
| 1 | Duration |
| 2 | Protocol type |
| 3 | Service |
| 4 | Src-byte |
| 5 | Dst-rate |
| 6 | Flag |
| 7 | Land |
| 8 | Wrong_fragment |
| 9 | Urgent |
| 10 | Hot |
| 11 | Num_failed_login |
| 12 | Logged_in |
| 13 | Num_compromised |
| 14 | Root_shell |
| 15 | Su_attempted |
| 16 | Num_root |
| 17 | Num_file_creation |
| 18 | Num_shells |
| 19 | Num_access_shells |
| 20 | Num_outbound_cmds |
| 21 | Is_host_login |
| 22 | Is_guest_login |
| 23 | Count |
| 24 | Serror_rate |
| 25 | Rerror-rate |
| 26 | Same-Srv-rate |
| 27 | Diff-Srv-rate |
| 28 | Srv_Count |
| 29 | Srv_serror_rate |
| 30 | Srv_rerror_rate |
| 31 | Srv_Diff_host_rate |
| 32 | Dst_host_count |
| 33 | Dst_host_srv_count |
| 34 | Dst_host_same_srv_count |
| 35 | Dst_host_diff_srv_count |
| 36 | Dst_host_same_src_port_rate |
| 37 | Dst_host_srv_diff_host_rate |
| 38 | Dst_host_serror_rate |
| 39 | Dst_host_srv_serror_rate |
| 40 | Dst_host_rerror_rate |
| 41 | Dst_host_srv_rerror_rate |
All types of attacks in the KDD Cup.
| Attacks in datasets | Type of attacks in KDD Cup |
|---|---|
| DoS | Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm |
| Probe | Satan, IPsweep, Nmap, Portsweep, Mscan, Saint |
| R2L | Guess_password, Ftp_write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named |
| U2R | Buffer_overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps |

| Attacks in datasets | Type of attacks in NSL-KDD |
|---|---|
| DoS | Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm |
| Probe | Satan, IPsweep, Nmap, Portsweep, Mscan, Saint |
| R2L | Guess_password, Ftp_write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named |
| U2R | Buffer_overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps |
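The attack-type tables above can be read as a lookup from an individual attack name to one of the four categories, which is what a multiple-classification experiment needs, while a binary experiment only needs normal versus attack. The following is a minimal Python sketch of that relabeling. The helper names `attack_category` and `binary_label`, the lowercase spellings, and the choice to list only a subset of the attack names are assumptions made for illustration, not details taken from the source.

```python
# Illustrative subset of the attack-to-category tables above.
# Lowercase spellings and helper names are assumptions for this sketch.
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS", "pod": "DoS",
    "smurf": "DoS", "teardrop": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe",
    "guess_password": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "rootkit": "U2R", "perl": "U2R",
}


def attack_category(label: str) -> str:
    """Map a raw record label to Normal, DoS, Probe, R2L, U2R, or Unknown."""
    key = label.strip().lower()
    if key == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(key, "Unknown")


def binary_label(label: str) -> int:
    """Return 0 for normal traffic and 1 for any attack (binary classification)."""
    return 0 if attack_category(label) == "Normal" else 1


raw_labels = ["normal", "Smurf", "Nmap", "Rootkit", "Phf"]
multi_class = [attack_category(x) for x in raw_labels]
two_class = [binary_label(x) for x in raw_labels]
```

Labels missing from the dictionary fall back to `"Unknown"` rather than being silently counted as normal, which keeps gaps in the mapping visible.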
Categorical features.
| S. No | Feature name |
|---|---|
| 2 | Protocol type |
| 3 | Service |
| 6 | Flag |
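One-hot encoding turns each of the three categorical features into a set of 0/1 indicator columns, one per distinct value. A minimal sketch in plain Python follows; the function name `one_hot_encode`, the `column=value` naming of the new columns, and the three sample records are assumptions for illustration only.

```python
def one_hot_encode(records, categorical_columns):
    """Replace each categorical column with one 0/1 indicator column per value."""
    # Collect the sorted set of values observed in each categorical column.
    vocab = {
        col: sorted({row[col] for row in records}) for col in categorical_columns
    }
    encoded = []
    for row in records:
        # Keep the numeric columns unchanged.
        out = {k: v for k, v in row.items() if k not in categorical_columns}
        for col in categorical_columns:
            for value in vocab[col]:
                out[f"{col}={value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


# Hypothetical sample records, not taken from the dataset.
records = [
    {"duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF"},
    {"duration": 2, "protocol_type": "udp", "service": "domain_u", "flag": "SF"},
    {"duration": 0, "protocol_type": "tcp", "service": "ftp", "flag": "S0"},
]
encoded = one_hot_encode(records, ["protocol_type", "service", "flag"])
```

Each record keeps its numeric columns and gains one indicator per observed category value, so the width of the encoded data depends on how many distinct values each categorical feature takes.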
Selected features.
| No. | Feature name | Correlation |
|---|---|---|
| 23 | count | 0.576257 |
| 30 | srv_serror_rate | 0.648135 |
| 24 | serror_rate | 0.650527 |
| 38 | dst_host_serror_rate | 0.651740 |
| 39 | dst_host_srv_serror_rate | 0.654855 |
| 12 | logged_in | 0.690053 |
| 36 | dst_host_same_srv_rate | 0.693525 |
| 33 | dst_host_srv_count | 0.722356 |
| 26 | same_srv_rate | 0.751746 |
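Correlation-based selection of this kind can be sketched as scoring every feature by the strength of its correlation with the class label and keeping those above a cutoff. The sketch below uses the absolute Pearson correlation and a cutoff of 0.5; both choices are assumptions, since the source does not state which correlation measure or threshold was used, and the toy columns are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Keep features whose absolute correlation with the label meets the threshold."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


# Hypothetical toy data: 0 = normal, 1 = attack.
columns = {
    "same_srv_rate": [1.0, 0.9, 0.1, 0.0, 1.0, 0.2],
    "logged_in": [1, 1, 0, 0, 1, 0],
    "duration": [0, 5, 0, 5, 5, 0],
}
labels = [0, 0, 1, 1, 0, 1]
kept, scores = select_features(columns, labels)
```

In this toy data `duration` is only weakly related to the label and is dropped, while the other two columns are retained.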
Figure 2: Architecture of the LSTM technique.
Figure 3: The structure of the autoencoder model for an IDS.
Figure 4: Percentage instance values of the KDD Cup data.
Performance of binary classifiers to detect intrusion.
| Models | Network packets | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| QSVM | Normal | 95.77 | 93 | 92 | 96 |
| | Attacks | | 99 | 99 | 95 |
| KNN | Normal | 98.55 | 98 | 99 | 98 |
| | Attacks | | 99 | 98 | 99 |
| Linear discriminant | Normal | 96.77 | 96 | 98 | 97 |
| | Attacks | | 97 | 96 | 97 |
| Quadratic discriminant | Normal | 68.91 | 63 | 100 | 77 |
| | Attacks | | 76 | 99 | 86 |
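The accuracy, precision, recall, and F1 columns follow the standard confusion-matrix definitions. As a minimal sketch, the function below computes them for one class treated as positive; the function name `binary_metrics` and the eight sample labels are illustrative assumptions, not the evaluation code behind the table.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
attack_metrics = binary_metrics(y_true, y_pred, positive=1)
normal_metrics = binary_metrics(y_true, y_pred, positive=0)
```

Calling the function once per class, as in the last two lines, gives separate per-class rows of the kind the table reports for normal and attack packets, while accuracy is the same for both.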
Statistical analysis of binary classifiers to predict intrusion.
| Models | MAE | MSE | RMSE | R² (%) |
|---|---|---|---|---|
| QSVM | 0.0422 | 0.0422 | 0.20 | 83 |
| KNN | 0.0144 | 0.01449 | 0.120 | 94.17 |
| Linear discriminant | 0.0323 | 0.0322 | 0.1796 | 87.04 |
| Quadratic discriminant | 0.3101 | 0.310 | 0.55 | 13.82 |
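The error measures in this table can be computed directly from the true and predicted labels. The sketch below assumes the classes are coded as 0 (normal) and 1 (attack), which the source does not state explicitly; under that coding every wrong prediction contributes an absolute error of 1 and a squared error of 1, so MAE and MSE both equal the misclassification rate and RMSE is its square root. That is consistent with the near-identical MAE and MSE values reported above. The function name and sample data are illustrative.

```python
import math


def error_measures(y_true, y_pred):
    """Mean absolute error, mean squared error, and root mean squared error."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Hypothetical 0/1 labels with exactly one wrong prediction out of ten.
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0]
mae, mse, rmse = error_measures(y_true, y_pred)
```

With one error in ten predictions, MAE and MSE are both 0.1 and the accuracy is 1 minus that value.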
Results of deep learning in binary classes.
| Models | Loss | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| LSTM | 0.063 | 97.82 | 97.25 | 98.12 | 97.97 |
| DAE | 0.1040 | 87.40 | 76.25 | 98.84 | 85.71 |
Figure 5: Performance of LSTM model on binary classification.
Figure 6: Performance of DAE model on binary classification.
Instance values of attacks.
| Attacks | #Instance values |
|---|---|
| Normal | 66810 |
| DoS | 45570 |
| Probe | 11579 |
| R2L | 990 |
| U2R | 52 |
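The instance counts above show how unevenly the classes are represented. A short Python sketch of the arithmetic, using only the counts from the table (the variable names are illustrative):

```python
# Instance counts copied from the table above.
counts = {"Normal": 66810, "DoS": 45570, "Probe": 11579, "R2L": 990, "U2R": 52}

total = sum(counts.values())  # 125001 records in all
percentages = {name: 100 * n / total for name, n in counts.items()}

# Ratio between the largest and smallest class, as a rough imbalance indicator.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, share in percentages.items():
    print(f"{name}: {share:.2f}%")
```

Normal and DoS traffic together make up about 90% of the records, while U2R accounts for well under 0.1%, which may help explain why the rarest classes are the hardest to classify.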
Figure 7: Percentage of values of attacks.
Performance of machine learning algorithms in detecting multiple classes.
| Model | Attacks | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| Linear SVM | DoS | 95.39 | 95 | 96 | 96 |
| | Probe | | 87 | 79 | 83 |
| | R2L | | 63 | 62 | 62 |
| | U2R | | 0.00 | 0.00 | 0.00 |
| | Normal | | 97 | 98 | 98 |
| QSVM | DoS | 92.89 | 96 | 94 | 95 |
| | Probe | | 97 | 60 | 74 |
| | R2L | | 0.00 | 0.00 | 0.00 |
| | U2R | | 0.00 | 0.00 | 0.00 |
| | Normal | | 91 | 100 | 95 |
| KNN | DoS | 98.28 | 99 | 98 | 99 |
| | Probe | | 96 | 97 | 96 |
| | R2L | | 91 | 80 | 85 |
| | U2R | | 57 | 27 | 36 |
| | Normal | | 98 | 99 | 99 |
| Linear discriminant | DoS | 93.18 | 94 | 96 | 95 |
| | Probe | | 89 | 73 | 80 |
| | R2L | | 33 | 88 | 48 |
| | U2R | | 0.04 | 60 | 0.08 |
| | Normal | | 97 | 95 | 96 |
| Quadratic discriminant | DoS | 61.79 | 94 | 86 | 90 |
| | Probe | | 84 | 28 | 42 |
| | R2L | | 0.03 | 100 | 0.06 |
| | U2R | | 0.00 | 0.00 | 0.00 |
| | Normal | | 75 | 51 | 61 |
Statistical analysis of machine learning for multiple classification.
| Models | MAE | MSE | RMSE | R² (%) |
|---|---|---|---|---|
| Linear SVM | 0.097 | 0.274 | 0.524 | 92.41 |
| QSVM | 0.20 | 0.648 | 0.8050 | 82.81 |
| KNN | 0.050 | 0.172 | 0.4158 | 95.22 |
| Linear discriminant | 0.145 | 0.401 | 0.633 | 88.90 |
Results of deep learning for multiple classification.
| Models | Loss | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| LSTM model | 0.088 | 97.07 | 97.34 | 96.86 | 97.10 |
| Autoencoder model | 0.0676 | 80.01 | 80 | 78.23 | 88.23 |
Figure 8: Performance of LSTM model on multiple classification.
Figure 9: Performance of DAE model on multiple classification.
Significant results of the proposed system.
| Model | Accuracy | Experiments |
|---|---|---|
| KNN | 98.55 | Binary classification |
| LSTM | 97.82 | Binary classification |
| KNN | 98.28 | Multiple classification |
| LSTM | 97.07 | Multiple classification |
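For readers unfamiliar with KNN, the classifier that leads this table, the idea is to label a new record by majority vote among its k closest training records. The following is a minimal sketch only: the Euclidean distance, k = 3, the two-feature points, and the function name `knn_predict` are all assumptions for illustration, and the source does not report the distance measure or k that were used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training points (for example, two selected rate features).
train_X = [(1.0, 0.9), (0.9, 1.0), (0.8, 0.8), (0.1, 0.0), (0.0, 0.2), (0.2, 0.1)]
train_y = ["Normal", "Normal", "Normal", "DoS", "DoS", "DoS"]

prediction_a = knn_predict(train_X, train_y, (0.95, 0.85))
prediction_b = knn_predict(train_X, train_y, (0.05, 0.10))
```

Because the method stores the training data and defers all work to prediction time, its cost grows with the size of the training set, which is worth keeping in mind for a dataset of this size.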
Figure 10: ROC of the LSTM model for multiple classification: (a) normal, (b) DoS attack, (c) probe attack, (d) R2L attack, and (e) U2R attack.
Comparison results of the proposed system against existing security system using artificial intelligence approaches
| Ref. | Model | Datasets | Accuracy % |
|---|---|---|---|
| [ | SAAE-DNN | NSL-KDD test | 87.74% |
| [ | ICVAE-DNN | NSL-KDD test | 85.97% |
| [ | Bagging | NSL-KDD test | 90.41% |
| [ | GAR-forest | NSL-KDD test | 90% |
| Proposed system | LSTM | NSL-KDD test | 97.07% (multiple classification) and 97.82% (binary classification) |