Umme Zahoora, Asifullah Khan, Muttukrishnan Rajarajan, Saddam Hussain Khan, Muhammad Asam, Tauseef Jamal.
Abstract
Ransomware attacks pose a serious threat to Internet resources due to their far-reaching effects. Zero-day variants are even more hazardous, as less is known about them. When used for ransomware detection, conventional machine learning approaches may become data-dependent and insensitive to error cost, and thus may fail to handle zero-day ransomware attacks, whose underlying data distribution is normally unseen. This paper presents a Cost-Sensitive Pareto Ensemble strategy, CSPE-R, to detect novel ransomware attacks. Initially, the proposed framework uses an unsupervised deep Contractive Auto Encoder (CAE) to transform the underlying varying feature space into a more uniform, core semantic feature space. To learn robust features, the proposed CSPE-R ensemble technique explores different semantic spaces at various levels of detail. Heterogeneous base estimators are then trained over these extracted subspaces to find the core relevance between the various ransomware families. Next, a novel Pareto Ensemble-based estimator selection strategy is applied to achieve a cost-sensitive compromise between false positives and false negatives. Finally, the decisions of the selected estimators are aggregated to improve detection of unknown ransomware attacks. The experimental results show that the proposed CSPE-R framework performs well against zero-day ransomware attacks.
Year: 2022 PMID: 36123364 PMCID: PMC9485118 DOI: 10.1038/s41598-022-19443-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Abstract diagram of the proposed CSPE-R system.
Figure 2. Ransomware family vs. number of samples.
Train test split of ransomware families.
| Seen classes | Unseen classes |
|---|---|
| Citroni | Pgpcoder |
| CryptLocker | Reveton |
| Cryptowall | TeslaCrypt |
| Kollah | Trojan Ransom |
| Kovter | |
| Locker | |
| Matsnu | |
Figure 3. Core feature hunting using CAE.
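The contractive regularizer that gives a CAE its name can be sketched as follows. This record does not state the authors' architecture, layer sizes, or loss weighting, so the single sigmoid layer and the names `encode` and `contractive_penalty` are assumptions made purely for illustration. The sketch computes the squared Frobenius norm of the encoder Jacobian, which a contractive autoencoder adds to its reconstruction loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Sigmoid encoder h = sigmoid(W x + b); W holds one row per hidden unit."""
    return [
        sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
        for row, b_j in zip(W, b)
    ]


def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the encoder Jacobian dh/dx at input x.

    For a sigmoid unit, dh_j/dx_i = h_j * (1 - h_j) * W[j][i], so the squared
    norm is a sum over hidden units of (h_j * (1 - h_j))^2 * ||W[j]||^2.
    """
    h = encode(x, W, b)
    return sum(
        (h_j * (1.0 - h_j)) ** 2 * sum(w * w for w in row)
        for h_j, row in zip(h, W)
    )


# Hypothetical toy input: one hidden unit, two input features.
penalty = contractive_penalty([0.0, 0.0], [[1.0, 0.0]], [0.0])
```

A small penalty means the encoding changes little under small input perturbations, which is the property the abstract relies on when it describes mapping a varying feature space to a more uniform one.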
Cost matrix of errors.
| Models | FN | FP |
|---|---|---|
| SVM | 2 | 8 |
| RF | 6 | 12 |
| LR | 2 | 16 |
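To make the idea of a Pareto-based selection concrete, the sketch below applies generic non-dominated sorting to the FN and FP values in the table above. Algorithms 2 and 3 are not reproduced in this record, so this should not be read as the authors' procedure; treating each row as an (FN, FP) pair to be minimized, and the helper names `dominates` and `pareto_fronts`, are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_fronts(errors):
    """Split estimators into successive non-dominated fronts.

    errors maps an estimator name to its (FN, FP) pair; both are minimized.
    """
    remaining = dict(errors)
    fronts = []
    while remaining:
        front = sorted(
            name
            for name, err in remaining.items()
            if not any(
                dominates(other, err)
                for other_name, other in remaining.items()
                if other_name != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# (FN, FP) values copied from the table above.
errors = {"SVM": (2, 8), "RF": (6, 12), "LR": (2, 16)}
fronts = pareto_fronts(errors)
print(fronts)  # [['SVM'], ['LR', 'RF']]
```

With these values SVM dominates both other models, while RF and LR trade fewer false positives against fewer false negatives and therefore share the second front.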
Figure 4. Heterogeneous estimators learning on different feature spaces.
Figure 5. Algorithm 1: Creating deep heterogeneous ensemble.
Figure 6. Algorithm 2: Generating and sorting Pareto-front.
Figure 7. Algorithm 3: Pseudocode of finding dominant estimators.
Figure 8. Comparison of original features vs. CAE reduced transformation on zero-day test data.
Figure 9. Comparison of different feature subspaces against zero-day test data.
Figure 10. PR-Curve of different feature spaces.
Comparison of the proposed ensemble method with and without using classifier selection strategy.
| Methods | TN | FP | FN | TP | Net-err | Recall | Acc(%) |
|---|---|---|---|---|---|---|---|
| Proposed method with ensemble selection | 116 | 17 | 1 | 133 | 18 | 0.99 | 93.0 |
| Proposed method without ensemble selection | 94 | 39 | 0 | 134 | 39 | 1.00 | 85.3 |
| Front 1 | 116 | 17 | 1 | 133 | 18 | 0.99 | 93.0 |
| Front 2 | 93 | 40 | 1 | 133 | 41 | 0.99 | 84.6 |
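The derived columns can be cross-checked from the confusion-matrix counts. The sketch below assumes Net-err = FP + FN (which matches every row above) and the usual definitions of recall and accuracy; the function name `summarize` is hypothetical. Accuracy recomputed this way can differ slightly in the last digit from the Acc(%) column, so it is a sanity check rather than a reproduction of the reported figures.

```python
def summarize(tn, fp, fn, tp):
    """Derive net error, recall, and accuracy from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "net_err": fp + fn,              # total misclassified samples
        "recall": tp / (tp + fn),        # detected share of ransomware samples
        "accuracy": (tn + tp) / total,   # correctly classified share of all samples
    }


# Counts copied from the first two rows of the table above.
with_selection = summarize(tn=116, fp=17, fn=1, tp=133)
without_selection = summarize(tn=94, fp=39, fn=0, tp=134)

for label, row in [("with", with_selection), ("without", without_selection)]:
    print(label, row["net_err"], round(row["recall"], 2), round(row["accuracy"], 3))
```

The comparison shows the trade-off the selection step makes: it accepts one additional false negative in exchange for 22 fewer false positives.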
Comparison of the proposed ensemble with individual base learners concerning Accuracy, Recall, and F1-score on zero-day test data.
| Methods | Accuracy | Recall | F1 |
|---|---|---|---|
| Proposed CFH-SVM(100) | 0.90 | 0.90 | |
| Proposed CFH-RF(100) | 0.84 | 0.91 | |
| Proposed CFH-LR(100) | 0.90 | 0.89 | 0.90 |
| Proposed CFH-SVM(500) | 0.87 | | |
| Proposed CFH-RF(500) | 0.89 | 0.79 | 0.88 |
| Proposed CFH-LR(500) | 0.89 | 0.87 | 0.89 |
| SVM | 0.88 | 0.79 | 0.87 |
| RF | 0.80 | 0.76 | 0.79 |
| LR | 0.90 | 0.81 | 0.89 |
| Proposed CSPE-R Ensemble | | | |
The black bold values show the top-1 results.
Comparison with current techniques.
| Authors | Method | Recall | Test data |
|---|---|---|---|
| Our proposed | CSPE-R | 0.99 | Unknown attacks |
| Khan et al.[ | DNAact-Ran OC(SVM) | 0.82 | Known attacks |
| Al-rimy et al.[ | Anomaly | 0.89 | Unknown attacks |
| Sgandurra et al.[ | EldeRAN | | Unknown attacks |
| Zhang et al.[ | SA-CNN | 0.87 | Known attacks |
Significant values are in [bold].
Figure 11. Comparison between state-of-the-art ensemble methods.
Paired t-test comparison with base estimator on test data.
| Estimators | RF | SVM | LR | CFH-RF(100) | CFH-SVM(100) | CFH-LR(100) | CFH-RF(500) | CFH-SVM(500) | CFH-LR(500) |
|---|---|---|---|---|---|---|---|---|---|
| CSPE-R Ensemble | 1.97E−15 | 9.29E−17 | 2.30E−16 | 1.62E−11 | 8.07E−14 | 3.37E−14 | 2.32E−16 | 8.07E−14 | 3.37E−14 |
| Hypothesis test | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
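For readers unfamiliar with the test behind these p-values, a paired t statistic can be sketched as below. This record does not say which paired quantities the authors compared (per-fold scores, per-sample outputs, or something else), so the function name and the toy inputs are hypothetical. Converting the statistic into a p-value requires the Student t distribution with n − 1 degrees of freedom, which the Python standard library does not provide; `scipy.stats.ttest_rel` is one existing routine that returns both values.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical paired values, chosen only so the arithmetic is easy to follow.
t_value = paired_t_statistic([3, 5, 7], [1, 2, 3])
```

A large absolute t value corresponds to a small p-value; a row of 1s in the hypothesis-test line presumably marks rejection of the null hypothesis of equal means for every base estimator.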