Rundong Yang1, Kangfeng Zheng1, Bin Wu1, Di Li1, Zhe Wang1, Xiujuan Wang2.
Abstract
While antiphishing techniques have evolved over the years, phishing remains one of the most threatening attacks on current network security. This is because phishing exploits one of the weakest links in a network system: people. The purpose of this research is to predict likely phishing victims. In this study, we propose the multidimensional phishing susceptibility prediction model (MPSPM) to predict user phishing susceptibility. We constructed two types of emails: legitimate emails and phishing emails. We recruited 1105 volunteers for our experiment. We sent these emails to the volunteers and collected their demographics, personality, knowledge experience, security behavior, and cognitive processes by means of a questionnaire. We then applied 7 supervised learning methods to classify the volunteers into two categories, susceptible and nonsusceptible, using multidimensional features. The experimental results indicated that some machine learning methods predict user phishing susceptibility with high accuracy, with a maximum accuracy of 89.04%. We conclude our study with a discussion of our findings and their future implications.
Year: 2022 PMID: 35082844 PMCID: PMC8786481 DOI: 10.1155/2022/7058972
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: A phishing susceptibility prediction model. The model contains a susceptibility feature extraction part, a classification part, and a model prediction part, which enables prediction of susceptibility.
Multidimensional attribute features.
| Attribute | Features | Category | Frequency | Percentage (%) |
|---|---|---|---|---|
| Demographics | Age | <20 | 67 | 6.06 |
| | | 20–30 | 720 | 65.16 |
| | | 30–40 | 107 | 9.68 |
| | | 40–50 | 128 | 11.58 |
| | | >50 | 83 | 7.51 |
| | Education level | Below high school | 61 | 5.52 |
| | | Vocational high school/high school | 109 | 9.86 |
| | | Undergraduates | 610 | 55.20 |
| | | Graduate student or above | 325 | 29.14 |
| | Gender | Male | 555 | 50.23 |
| | | Female | 550 | 49.77 |
| | Annual income | <¥30,000 | 477 | 43.17 |
| | | ¥30,000–¥100,000 | 369 | 33.39 |
| | | ¥100,000–¥200,000 | 178 | 16.11 |
| | | >¥200,000 | 81 | 7.33 |
| Personality | Personality | Conscientiousness | 124 | 11.22 |
| | | Extraversion | 18 | 1.63 |
| | | Agreeableness | 528 | 47.78 |
| | | Openness | 443 | 40.09 |
| | | Neuroticism | 42 | 3.80 |
| Knowledge experience | Computer knowledge | High | 250 | 22.62 |
| | | Middle | 649 | 58.73 |
| | | Low | 206 | 18.64 |
| | Network security knowledge | High | 179 | 16.19 |
| | | Middle | 583 | 52.76 |
| | | Low | 343 | 31.04 |
| | Social engineering knowledge | High | 129 | 11.67 |
| | | Middle | 568 | 51.14 |
| | | Low | 408 | 36.92 |
| Susceptibility | Phished | Yes | 609 | 55.12 |
| | | No | 496 | 44.88 |
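
The percentage column in the table above appears to be each frequency divided by the 1105 volunteers. A minimal Python sketch of that arithmetic follows; the helper name `to_percentages` and its parameters are hypothetical and not taken from the paper.

```python
# Hypothetical helper (not from the paper): turn category counts into
# percentages of the sample, rounded to two decimals.
def to_percentages(counts, total=None, ndigits=2):
    """Return {category: percentage of total} for a dict of counts."""
    # By default the denominator is the sum of the supplied counts.
    total = sum(counts.values()) if total is None else total
    return {name: round(100 * n / total, ndigits) for name, n in counts.items()}


# Gender counts as reported in the table (555 + 550 = 1105 volunteers).
gender_pct = to_percentages({"Male": 555, "Female": 550})
print(gender_pct)  # {'Male': 50.23, 'Female': 49.77}

# A single category needs the sample size passed explicitly.
extraversion_pct = to_percentages({"Extraversion": 18}, total=1105)
print(extraversion_pct)  # {'Extraversion': 1.63}
```

Passing `total` explicitly matters whenever only part of a feature's categories is supplied, since the default denominator is the sum of the counts given.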
Figure 2: Phishing email. This is a fake PayPal phishing email.
Figure 3: ROC curves of predictions across models and methods. The figure includes the ROC curves of several models for comparison.
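
For readers unfamiliar with ROC curves, the sketch below shows one common way to build the curve and its area (AUC) from classifier scores. It is a generic illustration in plain Python, not the authors' code; the function names and the six toy labels and scores are made up for the example.

```python
# Generic ROC/AUC illustration; names and toy data are assumptions,
# not taken from the paper.
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points,
    sweeping the decision threshold from the highest score down."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: 1 = susceptible, 0 = nonsusceptible (illustrative only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(labels, scores))
print(round(toy_auc, 4))  # 0.8889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.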
Scores for each learning algorithm metric.
| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| LR | 77.52 | 80.55 | 77.12 | 78.80 |
| DT | 83.28 | 82.17 | 88.29 | 85.12 |
| SVM | 72.62 | 73.60 | 77.12 | 75.32 |
| RF | 84.14 | 83.75 | 87.76 | 85.71 |
| GBDT | | | | |
| XGBoost | 88.46 | 88.54 | 90.42 | 89.47 |
| AdaBoost | 88.47 | 87.75 | 91.48 | 89.58 |
Bold values represent the best performing values among all the modeled properties.
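
The four metrics in the table follow from the standard confusion-matrix definitions. The sketch below assumes those textbook formulas; the function names and the toy counts are hypothetical, and the paper's own evaluation code is not shown in this record.

```python
# Textbook metric definitions; function names and toy counts are
# assumptions for illustration, not the authors' code.
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any common scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check on the reported LR row: precision 80.55, recall 77.12.
lr_f1 = round(f1_score(80.55, 77.12), 2)
print(lr_f1)  # 78.8, in line with the reported F1-score of 78.80

# Toy confusion matrix (made-up counts, positive class = susceptible).
toy = classification_scores(tp=80, fp=20, fn=10, tn=90)
print(toy)
```

Recomputing F1 from the reported precision and recall is a quick way to check a row of such a table for transcription errors.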
Figure 4: Scores regarding classification models. Comparison results of each learning algorithm used in the prediction model for each dataset.
Figure 5: Correlation analysis was performed to optimize the model.
Figure 6: The influence of personality, determined by analyzing the factors that influence susceptibility to phishing and identifying the most influential ones.