Amit Kumar Balyan, Sachin Ahuja, Umesh Kumar Lilhore, Sanjeev Kumar Sharma, Poongodi Manoharan, Abeer D. Algarni, Hela Elmannai, Kaamran Raahemifar.
Abstract
Due to the rapid growth of IT technology, digital data have become increasingly available, creating novel security threats that need immediate attention. An intrusion detection system (IDS) is the most promising solution for preventing malicious intrusions and tracing suspicious network behavioral patterns. Machine learning (ML) methods are widely used in IDS, but with a limited training dataset, an ML-based IDS produces a higher false detection ratio and encounters data-imbalance issues. To deal with the data-imbalance issue, this research develops an efficient hybrid network-based IDS model (HNIDS), built using an enhanced genetic algorithm with particle swarm optimization (EGA-PSO) and an improved random forest (IRF). In the initial phase, the proposed HNIDS uses the hybrid EGA-PSO method to augment the minority data samples and thus produce a balanced dataset from which the attributes of small sample classes can be learned more accurately. Within the proposed HNIDS, the PSO method refines the candidate vectors, and the GA is enhanced with a multi-objective fitness function that selects the best features and achieves improved fitness outcomes; this explores the essential features, helps minimize dimensionality, enhances the true positive rate (TPR), and lowers the false positive rate (FPR). In the next phase, the IRF eliminates less significant attributes, incorporates a list of decision trees across each iteration, supervises the classifier's performance, and prevents overfitting. The performance of the proposed method and of existing ML methods is tested on the benchmark NSL-KDD dataset. The experimental findings demonstrate that the proposed HNIDS achieves an accuracy of 98.979% for binary-class classification (BCC) and 88.149% for multi-class classification (MCC) on the NSL-KDD dataset, far better than the other ML methods, i.e., SVM, RF, LR, NB, LDA, and CART.
Keywords: Hybrid IDS; genetic algorithm; intrusion detection; machine learning; particle swarm optimization; random forest; security
Year: 2022 PMID: 36015744 PMCID: PMC9414798 DOI: 10.3390/s22165986
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
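The paper's EGA-PSO stage is not reproduced here, but its genetic-algorithm half, evolving a binary feature mask under a multi-objective fitness that rewards TPR, penalizes FPR, and penalizes feature count, can be sketched as follows. This is a toy illustration: the fitness function below is an invented stand-in (even-indexed features play the role of "useful" ones), not the authors' actual objective.

```python
import random

def fitness(mask, tpr_weight=1.0, fpr_weight=1.0, size_weight=0.1):
    # Toy multi-objective fitness: reward "useful" features (even indices
    # stand in for features that raise TPR), penalize "noisy" ones (odd
    # indices stand in for FPR), and penalize the selected-feature count.
    tpr = sum(1 for i, b in enumerate(mask) if b and i % 2 == 0) / (len(mask) / 2)
    fpr = sum(1 for i, b in enumerate(mask) if b and i % 2 == 1) / (len(mask) / 2)
    return tpr_weight * tpr - fpr_weight * fpr - size_weight * sum(mask) / len(mask)

def ga_select(n_features=10, pop_size=20, generations=30, seed=42):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = ga_select()  # best binary feature mask found
```

In the paper's hybrid, PSO additionally refines the candidate vectors between generations; that refinement step is omitted from this sketch.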
Figure 1. Working process of NIDS.
Review of existing work in the field of intrusion detection.
| Reference | Methods/Techniques | Key Features | Challenges/Improvement |
|---|---|---|---|
| [ | A hybrid method based on GA and ANN | Better precision and recall. | No real-time data set. Accuracy can be improved by adding two-way training. Analysis of variance (ANOVA) is missing. |
| [ | Ensemble model based on meta-classification | Better precision and accuracy compared to other methods. | The training and testing process time is lengthy. |
| [ | Risk analysis of RPL and OFS | Capable of dealing with high-dimensional data. | Training requires a significant amount of time. |
| [ | Deep learning in IDS | DNNs perform outstandingly in terms of better precision and recall. | Only limited datasets were used. |
| [ | Deep-learning approach in NIDS | Reduce false alarms and training times. | ANOVA is not implemented. |
The present state of the art in the field of intrusion detection research.
| Key Method | References | Form of Benchmark Data | The Categories/Type of Intrusion in the IDS Dataset | Performance Measuring Criteria |
|---|---|---|---|---|
| Single technique-based recurrent neural network | [ | Standard IDS data set | Probe, DoS, U2R and R2L | Precision, positive detection rate, false-positive rate. |
| | [ | Standard IDS data set | SMTP, HTTP web, IMAP, TCP, ICMP, secure web, misapplication, IRC, Flow-Gen, and DNS | Recall, F1-score, precision, AUC, error rate, accuracy. |
| Machine learning methods | [ | Standard IDS data set | Probe, DoS, U2R and R2L | True positive rate, accuracy, F1-score, precision, and recall. |
| | [ | A real-time IDS data set | Probe, DoS, U2R and R2L | Accuracy, TPR, FPR, TNR, precision, recall. |
| | [ | Standard IDS data set | Threats, malware, cyber threats | Accuracy and precision. |
| Evolutionary Machine Learning methods | [ | Standard IDS data set | DoS, R2L, U2R and probe | Precision, recall, true positive, false negative, false positive, true negative. |
| | [ | Standard IDS data set | DoS, R2L, U2R, probe, SYN, threats, and DDoS | Precision, TPR, F1-score, accuracy. |
Comparison of existing research based on dimensionality reduction in IDS.
| Reference | Method Used | Major Contribution | Challenges/Limitations |
|---|---|---|---|
| [ | Principal component analysis | Able to manage large datasets, more efficient. | Not able to handle nonlinear problems. |
| [ | Auto-encoders | It does not require any prior assumptions for the reduction. | Slower in speed. |
| [ | Missing value ratio | Identifies and removes features with many missing or NULL values. | Works only on a specific data framework. |
| [ | Low variance filter | It eliminates dimensions with low variance. | Can work on limited data. |
| [ | Factor analysis | It can analyze various data factors. | Slower. |
| [ | Forward feature selection | Adds features one at a time, keeping those that improve performance. | Works only on a specific data framework. |
| [ | Uniform manifold approximation and projection (UMAP) | UMAP is built on a theoretical foundation of Riemannian manifolds and algebraic topology. | Can work on limited data. |
| [ | Random forest | It reduces dimensionality by ranking features with an ensemble of decision trees. | Can work on limited data. |
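As one concrete instance of the techniques surveyed above, the low-variance filter can be sketched in a few lines. This is a minimal illustration; the feature names and the threshold value are hypothetical, not taken from the paper.

```python
from statistics import pvariance

def low_variance_filter(columns, threshold=0.01):
    # Keep only feature columns whose population variance exceeds the
    # threshold; near-constant columns carry little discriminative signal.
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) > threshold}

features = {
    "duration":  [0.0, 3.2, 1.1, 8.7],
    "protocol":  [1.0, 1.0, 1.0, 1.0],   # constant column -> dropped
    "src_bytes": [12.0, 480.0, 3.0, 92.0],
}
kept = low_variance_filter(features)     # "protocol" is filtered out
```

In practice features are scaled first, since variance depends on the units of each column.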
Comparison of current research on handling concept drift and model decay in IDS.
| Reference | Research Scope | Major Contribution | Challenges/Limitations |
|---|---|---|---|
| [ | Single classifier | It involves a single classification method. | Poor accuracy results. |
| [ | Ensemble classifier for active state | It involves multiple classification active state methods. | More accurate and efficient. |
| [ | Ensemble classifier for passive state | It involves multiple classification methods for passive state data. | More accurate. |
| [ | Chunk-based data level | It selects attributes by continuously adapting statistical analyses of the input and of class-label probabilities. | It can work on specific data types. |
| [ | Online learning based | It involves the online learning method. | Works on limited data. |
Figure 2. The working of the proposed hybrid IDS model.
Description of the NSL-KDD dataset.
| Attack Classes in NSL KDD Dataset | Types of Attacks |
|---|---|
| Probe class | IP-sweep, Satan, Port-sweep, N-map, Saint, and M-scan |
| DoS class | Land, Back, Tear-drop, Pod, Neptune, Smurf, Worm, Mail-bomb, Process-table, Apache-2, and Udp-storm |
| R2L class | Ftp_write, Guess_password, Phf, Imap, Warez-master, Multi-hop, Xlock, Snmpguess, Xsnoop, HTTP-tunnel, Snmpget-attack, Named, and Send-mail |
| U2R class | Buffer_overflow, Xterm, Rootkit, Loadmodule, Sqlattack, Perl, and Ps |
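The class membership in the table above can be expressed as a simple lookup. This is a sketch covering only a representative subset of labels, spelled in the lowercase forms used in the raw NSL-KDD files.

```python
# Map raw NSL-KDD attack labels to the four coarse classes in the table.
# Only a representative subset of the labels is listed here.
ATTACK_CLASS = {
    **dict.fromkeys(["ipsweep", "satan", "portsweep", "nmap"], "Probe"),
    **dict.fromkeys(["land", "back", "teardrop", "neptune", "smurf"], "DoS"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "warezmaster"], "R2L"),
    **dict.fromkeys(["buffer_overflow", "rootkit", "loadmodule", "perl"], "U2R"),
}

def classify(label: str) -> str:
    # Any label not in the table is treated as benign traffic.
    return ATTACK_CLASS.get(label.lower(), "Normal")
```

Collapsing the dozens of raw labels into Probe/DoS/R2L/U2R in this way is what turns the raw records into the five-class MCC problem evaluated later.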
Binary class classification (NSL-KDD) dataset.
| Data Set Type | Normal Record | Abnormal Record | Total Records |
|---|---|---|---|
| KDD-Test−21 dataset | 2152 | 9698 | 11,850 |
| KDD-Train+ dataset | 67,343 | 58,630 | 125,973 |
| KDD-Test+ dataset | 9711 | 12,833 | 22,544 |
Multi-class classification (NSL-KDD) dataset.
| Data Set Type | Normal Class | DoS Class | Probe Class | R2L Class | U2R Class | Total Records |
|---|---|---|---|---|---|---|
| KDD-Test−21 dataset | 2152 | 4344 | 2421 | 2885 | 67 | 11,850 |
| KDD-Train+ dataset | 67,343 | 45,927 | 11,656 | 995 | 52 | 125,973 |
| KDD-Test+ dataset | 9711 | 7460 | 2421 | 2885 | 67 | 22,544 |
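The imbalance that motivates the EGA-PSO resampling stage is visible directly in the KDD-Train+ class counts: a quick computation of each class's share of the training set (counts taken from the multi-class table; R2L and U2R together make up well under 1% of the data).

```python
# Per-class record counts for KDD-Train+, from the multi-class table above.
train_counts = {"Normal": 67343, "DoS": 45927, "Probe": 11656,
                "R2L": 995, "U2R": 52}

total = sum(train_counts.values())  # 125,973 records
# Percentage share of each class; the minority classes (R2L, U2R) are the
# ones the EGA-PSO stage augments to balance the training set.
shares = {cls: round(100 * n / total, 3) for cls, n in train_counts.items()}
```

A classifier trained on these raw proportions can score high accuracy while rarely detecting U2R attacks, which is exactly the failure mode the balancing step addresses.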
Experimental result of BCC on KDD-Train+ dataset.
| Classification Method | Accuracy % | TPR % | TNR % | Precision % | FPR % | FNR % | Recall % | False Alarm % | F1-Score % |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 87.798 | 96.748 | 95.656 | 97.847 | 3.147 | 2.968 | 91.414 | 91.242 | 96.665 |
| LDA | 91.568 | 93.447 | 91.774 | 96.778 | 2.987 | 3.145 | 90.565 | 78.127 | 96.321 |
| RF | 97.498 | 97.658 | 95.998 | 97.847 | 2.747 | 3.541 | 94.665 | 91.334 | 97.478 |
| Naïve Bayes | 94.657 | 96.114 | 92.045 | 95.624 | 2.147 | 2.747 | 93.554 | 79.542 | 95.078 |
| CART | 94.592 | 92.474 | 94.141 | 96.547 | 4.145 | 2.878 | 92.778 | 81.632 | 95.621 |
| LR | 92.314 | 90.852 | 90.784 | 91.263 | 3.564 | 2.145 | 90.112 | 80.471 | 91.256 |
| Proposed HNIDS | 98.979 | 99.658 | 98.996 | 99.847 | 0.974 | 0.774 | 96.124 | 70.021 | 99.414 |
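Every BCC column in the table above is a function of the four confusion-matrix counts. A minimal sketch of those definitions follows; the example counts are invented for illustration, not taken from the paper's experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    # Derive the table's metric columns (as percentages) from raw
    # confusion-matrix counts for a binary classifier.
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)        # true positive rate = recall / detection rate
    tnr = tn / (tn + fp)        # true negative rate
    fpr = fp / (fp + tn)        # false positive rate (false alarm rate)
    fnr = fn / (fn + tp)        # false negative rate (miss rate)
    prec = tp / (tp + fp)       # precision
    f1 = 2 * prec * tpr / (prec + tpr)
    return {name: round(val * 100, 3) for name, val in [
        ("accuracy", acc), ("TPR", tpr), ("TNR", tnr), ("FPR", fpr),
        ("FNR", fnr), ("precision", prec), ("F1", f1)]}

m = binary_metrics(tp=9500, fp=100, tn=9600, fn=200)  # hypothetical counts
```

Note that under these standard definitions TPR equals recall and FPR equals the false alarm rate, so the table's Recall and False Alarm columns are redundant with TPR and FPR if computed this way.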
Figure 3. (a) Confusion matrix and (b) normalized confusion matrix for the proposed HNIDS model for BCC.
Figure 4. Accuracy % for the proposed HNIDS model for BCC.
Experimental results of MCC on KDD-Train+ dataset.
| Classification Method | Accuracy % | TPR % | TNR % | Precision % | FPR % | FNR % | Recall % | False Alarm % | F1-Score % |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 84.171 | 87.148 | 85.656 | 78.417 | 17.477 | 12.878 | 83.659 | 81.442 | 86.005 |
| LDA | 82.814 | 85.701 | 81.774 | 74.814 | 18.657 | 13.557 | 82.447 | 79.477 | 86.201 |
| RF | 82.898 | 82.601 | 85.998 | 77.457 | 16.047 | 11.121 | 88.936 | 71.445 | 83.008 |
| Naïve Bayes | 86.771 | 84.457 | 82.045 | 75.478 | 17.107 | 14.877 | 85.223 | 77.502 | 85.968 |
| CART | 83.252 | 85.441 | 84.141 | 78.719 | 18.547 | 13.978 | 86.445 | 76.694 | 83.771 |
| LR | 80.124 | 83.278 | 80.117 | 75.961 | 15.623 | 11.451 | 80.978 | 71.457 | 80.584 |
| Proposed HNIDS | 88.149 | 88.661 | 87.996 | 82.867 | 11.714 | 10.414 | 90.447 | 70.101 | 83.478 |
Figure 5. (a) Confusion matrix and (b) normalized confusion matrix for the proposed HNIDS model for MCC.
Figure 6. Accuracy % for the proposed HNIDS model for MCC.