Abdul Basit, Maham Zafar, Xuan Liu, Abdul Rehman Javed, Zunera Jalil, Kashif Kifayat.
Abstract
In recent times, phishing has become one of the most prominent attacks faced by internet users, governments, and service-providing organizations. In a phishing attack, the attacker collects the victim's sensitive data (e.g., account login details or credit/debit card numbers) by using spoofed emails or fake websites. Phishing websites are common entry points for online social engineering attacks, including numerous website-based frauds. In such attacks, the attacker creates web pages that copy the behavior of legitimate websites and sends their URLs to the targeted victims through spam messages, texts, or social networking. To provide a thorough understanding of phishing attacks, this paper presents a literature review of Artificial Intelligence (AI) techniques for phishing attack detection: Machine Learning, Deep Learning, Hybrid Learning, and Scenario-based techniques. It also compares the studies that detect phishing attacks with each AI technique and examines the strengths and shortcomings of these methodologies. Furthermore, it provides a comprehensive set of current challenges posed by phishing attacks and future research directions in this domain. © Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Advanced phishing techniques; Cyberattack; Deep learning; Hybrid learning; Internet security; Machine learning; Phishing attack; Security threats
Year: 2020 PMID: 33110340 PMCID: PMC7581503 DOI: 10.1007/s11235-020-00733-2
Source DB: PubMed Journal: Telecommun Syst ISSN: 1018-4864 Impact factor: 2.314
Fig. 1 Phishing attack diagram [26]
Fig. 2 Phishing report for third quarter of the year 2019 [1]
Fig. 3 Most targeted industry sectors, 3rd quarter 2019 [3]
Fig. 4 Taxonomy of this survey focusing on phishing attack detection studies
Fig. 5 Deep learning for phishing attack detection
Fig. 6 Machine learning for phishing attack detection
ML approaches for phishing website detection
| Authors | Classification method | Feature selection method | Accuracy (%) |
|---|---|---|---|
| James et al. | J48, IBK, SVM, NB | – | 89.75 |
| Abdelhamid et al. | eDRI | – | 93.5 |
| Mao et al. | SVM, RF, DT, AB | – | 97.31 |
| Jain and Gupta | – | Feature extraction | 99.09 |
| Hota et al. | CART, C4.5 | RRFST | 99.11 |
| Ubing et al. | EL | – | 95.4 |
| Chen and Chen | ELM, SVM, LR, C4.5, LC-ELM, KNN, XGB | ANOVA | 99.2 |
Comparison of scenario-based studies
| Authors | Scenarios | Method | Accuracy |
|---|---|---|---|
| Yao et al. | Identity detection process | Logo extraction | 98.3% |
| Curtis et al. | Dark triad attacker’s concept | Dark triad | – |
| Williams et al. | 62,000 employees over 6 weeks of observation | Theoretical approaches | – |
| Parsons et al. | Worked on 985 participants | ANOVA | – |
Comparison of scenario-based studies
| Authors | Classification method | Feature selection method | Accuracy (%) |
|---|---|---|---|
| Subasi et al. | ANN, KNN, RF, SVM, C4.5, RF | – | 97.36 |
| Tyagi et al. | DT, RF, GBM | PCA | 98.4 |
| Mao et al. | SVM, RF, DT, AB | – | 97.31 |
| Jagadeesan et al. | RF, SVM | – | 95.11 |
| Joshi et al. | RF, RA | RA | 97.63 |
| Sahingoz et al. | SVM, DT, RF, KNN, KS, NB | NLP | 97.98 |
Comparison of hybrid methods used in state-of-the-art
| Authors | Classification method | Accuracy (%) |
|---|---|---|
| Patil et al. | LR, DT, RF | 96.58 |
| Niranjan et al. | RC, KNN, IBK, LR, PART | 97.3 |
| Chiew et al. | RF, C4.5, PART, SVM, NB | 96.17 |
| Pandey et al. | RF, SVM | 94 |
Comparison table of state-of-the-art studies focusing on phishing techniques
| Authors | Classification | Feature selection technique | Accuracy |
|---|---|---|---|
| James et al. | J48, IBK, SVM, NB | – | 89.75% |
| Subasi et al. | ANN, kNN, RF, SVM, C4.5, RF | – | 97.36% |
| Abdelhamid et al. | eDRI | – | 93.5% |
| Mao et al. | SVM, DT | – | 93% |
| Jain and Gupta | – | – | 99.09% |
| Yao et al. | – | – | 98.3% |
| Patil et al. | LR, DT, RF | – | 96.58% |
| Jagadeesan et al. | RF, SVM | – | 95.11% |
| Hota et al. | CART, C4.5 | RRFST | 99.11% |
| Tyagi et al. | DT, RF, GBM | PCA | 98.40% |
| Curtis et al. | – | – | – |
| Sahingoz et al. | SVM, DT, RF, kNN, KS, NB | NLP | 97.98% |
| Parsons et al. | – | – | – |
| Joshi et al. | RF, RA | RA | 97.63% |
| Ubing et al. | EL | – | 95.4% |
| Mao et al. | SVM, RF, DT, AB | – | 97.31% |
| Williams et al. | – | – | – |
| Niranjan et al. | RC, kNN, IBK, LR, PART | – | 97.3% |
| Chen and Chen | ELM, SVM, LR, C4.5, LC-ELM, kNN, XGB | ANOVA | 99.2% |
| Chiew et al. | RF, C4.5, PART, SVM, NB | – | 96.17% |
| Pandey et al. | SVM, RF | – | 94% |
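
The accuracy values in the comparison tables are percentages. This record does not say how each study computed them, so the following is only a minimal sketch. It assumes the conventional definition, the share of samples whose predicted label matches the true label. The function name `accuracy_percent` and the toy labels are hypothetical and are not taken from any of the surveyed studies.

```python
def accuracy_percent(y_true, y_pred):
    """Return the share of correctly classified samples, in percent.

    Assumes the conventional definition of accuracy; the surveyed
    studies may have used different evaluation protocols.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # Count positions where the prediction equals the ground truth.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical toy labels: 1 = phishing, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# 6 of the 8 predictions match the true labels.
print(accuracy_percent(y_true, y_pred))
```

Under this definition a single figure such as 97.31 does not separate missed phishing pages from false alarms on legitimate pages, so the tabulated values are best read as coarse summaries of each study's result.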