Abdullateef O. Balogun, Kayode S. Adewole, Muiz O. Raheem, Oluwatobi N. Akande, Fatima E. Usman-Hamza, Modinat A. Mabayoje, Abimbola G. Akintola, Ayisat W. Asaju-Gbolagade, Muhammed K. Jimoh, Rasheed G. Jimoh, Victor E. Adeyemo.
Abstract
Phishing is one of the most complex attacks putting internet users and legitimate web resource owners at risk. The recent rise in the number of phishing attacks has instilled distrust in legitimate internet users, making them feel less safe even in the presence of powerful antivirus applications, and reports of growing financial damage from phishing website attacks have caused grave concern. Several methods, including blacklists and machine learning (ML)-based models, have been proposed to combat phishing website attacks. The blacklist anti-phishing method has been faulted for its failure to detect new phishing URLs, because it relies on a compiled list of known phishing URLs, while many ML methods for detecting phishing websites have been reported with relatively low detection accuracy and high false alarm rates. Hence, this research proposed Functional Tree (FT)-based meta-learning models for detecting phishing websites; that is, the study investigated improving phishing website detection through an empirical analysis of FT and its variants. The proposed models outperformed the baseline classifiers, meta-learners, and hybrid models used for phishing website detection in existing studies. Moreover, the proposed FT-based meta-learners are effective for detecting legitimate and phishing websites, with accuracy as high as 98.51% and a false positive rate as low as 0.015. The deployment and adoption of FT and its meta-learner variants are therefore recommended for phishing website detection and applicable cybersecurity attacks.
Keywords: Bagging; Boosting; Ensemble; Functional trees; Machine learning; Meta-learning; Phishing websites; Rotation forest
Year: 2021 PMID: 34278030 PMCID: PMC8264617 DOI: 10.1016/j.heliyon.2021.e07437
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Building a functional tree.
Figure 2. Pruning a functional tree.
Figure 3. Bagging algorithm.
Figure 4. Boosting algorithm.
Figure 5. Rotation forest algorithm.
Figure 6. Experimental framework.
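Figures 3, 4, and 5 outline the bagging, boosting, and rotation forest meta-learning schemes used here. As a rough illustration of the bagging idea (Figure 3), the sketch below bags simple one-feature threshold stumps and combines them by unweighted majority vote. The stump merely stands in for the Functional Tree base learner (FT is a decision tree with logistic functions at its inner nodes and leaves, and is not reimplemented here), and the toy dataset is invented for illustration:

```python
import random

def train_stump(xs, ys):
    """Fit a 1-D threshold stump: predict `hi` when x > t, else `lo`.
    Picks the threshold/orientation with the fewest training errors."""
    best = None
    cand = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(cand, cand[1:])] or [cand[0]]
    for t in thresholds:
        for lo, hi in ((0, 1), (1, 0)):
            errs = sum(1 for x, y in zip(xs, ys) if (hi if x > t else lo) != y)
            if best is None or errs < best[0]:
                best = (errs, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x > t else lo

def bagging(xs, ys, n_learners=25, seed=7):
    """Bagging (Figure 3): train each base learner on a bootstrap sample
    drawn with replacement, then predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap sample
        stumps.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: int(sum(s(x) for s in stumps) > n_learners / 2)

# Toy, clearly separable data: label 1 iff the feature exceeds 0.5.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging(xs, ys)
print([model(x) for x in xs])
```

Boosting differs from bagging in that learners are trained sequentially on reweighted data, and rotation forest additionally rotates random feature subsets (e.g. via PCA) before growing each tree.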
Performance comparison of FT and its variants with baseline classifiers on Dataset 1.
| Metric | FT-1 | FT-2 | FT-3 | NB | SVM | SMO | Dec Table |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 95.50 | 96.07 | 95.22 | 90.70 | 94.60 | 92.70 | 93.44 |
| F-Measure | 0.955 | 0.961 | 0.952 | 0.907 | 0.946 | 0.927 | 0.934 |
| AUC | 0.973 | 0.987 | 0.951 | 0.962 | 0.944 | 0.925 | 0.981 |
| TP-Rate | 0.955 | 0.961 | 0.952 | 0.907 | 0.946 | 0.927 | 0.934 |
| FP-Rate | 0.048 | 0.041 | 0.051 | 0.098 | 0.059 | 0.078 | 0.073 |
| MCC | 0.909 | 0.920 | 0.903 | 0.811 | 0.891 | 0.852 | 0.867 |
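For reference, every metric reported in this and the following tables (accuracy, F-measure, AUC aside, TP-rate, FP-rate, MCC) derives from the binary confusion matrix. A minimal sketch with invented counts, not taken from the paper:

```python
import math

# Hypothetical confusion-matrix counts for a two-class (phishing vs. legitimate) test set.
tp, fn = 45, 5   # phishing pages: correctly flagged / missed
fp, tn = 3, 47   # legitimate pages: wrongly flagged / correctly passed

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of all pages classified correctly
tp_rate = tp / (tp + fn)                     # recall / sensitivity
fp_rate = fp / (fp + tn)                     # false-alarm rate on legitimate pages
precision = tp / (tp + fp)
f_measure = 2 * precision * tp_rate / (precision + tp_rate)
# Matthews correlation coefficient: balanced even when classes are skewed.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"Acc={accuracy:.4f} TPR={tp_rate:.4f} FPR={fp_rate:.4f} "
      f"F={f_measure:.4f} MCC={mcc:.4f}")
```

AUC is the one exception: it is computed from the ranking of classifier scores (the area under the ROC curve), not from a single confusion matrix.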
Performance comparison of FT and its variants with baseline classifiers on Dataset 2.
| Metric | FT-1 | FT-2 | FT-3 | NB | SVM | SMO | Dec Table |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 96.79 | 97.86 | 96.64 | 85.15 | 91.49 | 93.87 | 95.79 |
| F-Measure | 0.968 | 0.979 | 0.966 | 0.850 | 0.915 | 0.939 | 0.958 |
| AUC | 0.977 | 0.992 | 0.966 | 0.949 | 0.915 | 0.939 | 0.982 |
| TP-Rate | 0.968 | 0.979 | 0.966 | 0.852 | 0.915 | 0.939 | 0.958 |
| FP-Rate | 0.032 | 0.021 | 0.034 | 0.149 | 0.085 | 0.061 | 0.042 |
| MCC | 0.936 | 0.957 | 0.933 | 0.715 | 0.830 | 0.878 | 0.916 |
Performance comparison of FT and its variants with baseline classifiers on Dataset 3.
| Metric | FT-1 | FT-2 | FT-3 | NB | SVM | SMO | Dec Table |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 88.91 | 90.24 | 88.99 | 84.10 | 85.66 | 86.00 | 84.47 |
| F-Measure | 0.890 | 0.903 | 0.891 | 0.825 | 0.825 | 0.846 | 0.839 |
| AUC | 0.950 | 0.970 | 0.910 | 0.948 | 0.867 | 0.900 | 0.954 |
| TP-Rate | 0.889 | 0.902 | 0.890 | 0.841 | 0.857 | 0.860 | 0.845 |
| FP-Rate | 0.074 | 0.074 | 0.074 | 0.120 | 0.123 | 0.109 | 0.110 |
| MCC | 0.810 | 0.826 | 0.817 | 0.722 | 0.734 | 0.757 | 0.737 |
Performance comparison of FT-based Meta-learners on Dataset 1.
| Metric | FT-1 | FT-2 | FT-3 | RoF-FT-1 | RoF-FT-2 | RoF-FT-3 | BG-FT-1 | BG-FT-2 | BG-FT-3 | BT-FT-1 | BT-FT-2 | BT-FT-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 95.50 | 96.07 | 95.22 | 96.78 | 96.83 | 96.49 | 96.77 | 96.57 | 96.44 | 97.00 | - | 96.90 |
| F-Measure | 0.955 | 0.961 | 0.952 | 0.968 | 0.968 | 0.965 | 0.968 | 0.966 | 0.964 | 0.970 | - | 0.969 |
| AUC | 0.973 | 0.987 | 0.951 | 0.995 | - | 0.988 | 0.995 | 0.995 | 0.990 | - | 0.995 | 0.995 |
| TP-Rate | 0.955 | 0.961 | 0.952 | 0.968 | 0.968 | 0.965 | 0.968 | 0.966 | 0.964 | 0.970 | - | 0.969 |
| FP-Rate | 0.048 | 0.041 | 0.051 | 0.035 | 0.033 | 0.037 | 0.035 | 0.036 | 0.037 | 0.032 | - | 0.033 |
| MCC | 0.909 | 0.920 | 0.903 | 0.935 | 0.936 | 0.929 | 0.935 | 0.930 | 0.928 | 0.939 | - | 0.937 |
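The BT-FT columns above come from boosting the FT variants (Figure 4). A rough AdaBoost-style sketch of that reweighting loop, again with a one-feature threshold stump standing in for the FT base learner and invented toy data:

```python
import math

def weighted_stump(xs, ys, w):
    """Fit a threshold stump minimising weighted error; ys in {-1, +1}."""
    best = None
    cand = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(cand, cand[1:])] or [cand[0]]
    for t in thresholds:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign if xi > t else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x: sign if x > t else -sign)

def adaboost(xs, ys, rounds=10):
    """Boosting (Figure 4): shift the sample weights toward points the
    previous learners got wrong; combine learners by weighted vote."""
    n = len(xs)
    w = [1 / n] * n
    ensemble = []  # (alpha, learner) pairs
    for _ in range(rounds):
        err, stump = weighted_stump(xs, ys, w)
        err = max(err, 1e-10)  # guard the log when the stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)  # learner's vote weight
        ensemble.append((alpha, stump))
        # Up-weight misclassified points, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump(xi))
             for wi, xi, yi in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
model = adaboost(xs, ys)
print([model(x) for x in xs])
```

Unlike the bootstrap resampling used in bagging, every boosting round sees the full sample; only the weights change, which is why boosting can keep reducing errors the earlier learners made.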
Performance comparison of FT-based Meta-learners on Dataset 2.
| Metric | FT-1 | FT-2 | FT-3 | RoF-FT-1 | RoF-FT-2 | RoF-FT-3 | BG-FT-1 | BG-FT-2 | BG-FT-3 | BT-FT-1 | BT-FT-2 | BT-FT-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 96.79 | 97.86 | 96.64 | 97.43 | 98.32 | 97.40 | 97.58 | 98.21 | 97.33 | 98.11 | 98.51 | 97.84 |
| F-Measure | 0.968 | 0.979 | 0.966 | 0.974 | 0.983 | 0.974 | 0.976 | 0.982 | 0.973 | 0.981 | - | 0.978 |
| AUC | 0.977 | 0.992 | 0.966 | 0.996 | - | 0.994 | 0.996 | 0.997 | 0.994 | 0.997 | - | 0.997 |
| TP-Rate | 0.968 | 0.979 | 0.966 | 0.974 | 0.983 | 0.974 | 0.976 | 0.982 | 0.973 | 0.981 | - | 0.978 |
| FP-Rate | 0.032 | 0.021 | 0.034 | 0.026 | 0.017 | 0.026 | 0.024 | 0.018 | 0.027 | 0.019 | 0.015 | 0.022 |
| MCC | 0.936 | 0.957 | 0.933 | 0.949 | 0.966 | 0.948 | 0.952 | 0.964 | 0.947 | 0.962 | - | 0.957 |
Performance comparison of FT-based Meta-learners on Dataset 3.
| Metric | FT-1 | FT-2 | FT-3 | RoF-FT-1 | RoF-FT-2 | RoF-FT-3 | BG-FT-1 | BG-FT-2 | BG-FT-3 | BT-FT-1 | BT-FT-2 | BT-FT-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 88.91 | 90.24 | 88.99 | 89.87 | - | 89.80 | 88.77 | 90.32 | 88.70 | 89.06 | 89.28 | 87.73 |
| F-Measure | 0.890 | 0.903 | 0.891 | 0.899 | - | 0.898 | 0.888 | 0.903 | 0.887 | 0.891 | 0.893 | 0.877 |
| AUC | 0.950 | 0.970 | 0.910 | 0.973 | - | 0.954 | 0.972 | 0.978 | 0.962 | 0.963 | 0.967 | 0.966 |
| TP-Rate | 0.889 | 0.902 | 0.890 | 0.899 | - | 0.898 | 0.888 | 0.903 | 0.887 | 0.891 | 0.893 | 0.877 |
| FP-Rate | 0.074 | 0.074 | 0.071 | 0.071 | - | 0.070 | 0.079 | 0.073 | 0.076 | 0.082 | 0.079 | 0.091 |
| MCC | 0.810 | 0.826 | 0.817 | 0.824 | - | 0.825 | 0.808 | 0.828 | 0.810 | 0.808 | 0.812 | 0.785 |
Detection Comparison of proposed methods with existing methods on Dataset 1.
| Phishing Models | Accuracy (%) | F-Measure | AUC | TP-Rate | FP-Rate | MCC |
|---|---|---|---|---|---|---|
| Aydin and Baykal | 95.39 | 0.938 | 0.936 | - | 0.046 | - |
| Dedakia and Mistry | 94.29 | - | - | - | - | - |
| Mohammad et al. | 92.18 | - | - | - | - | - |
| Ubing et al. | 95.40 | 0.947 | - | - | 0.041 | - |
| Ali and Ahmed | 91.13 | - | - | - | - | - |
| Verma and Das | 94.43 | - | - | - | - | - |
| Hadi et al. | 92.40 | - | - | - | - | - |
| Chiew et al. | 93.22 | - | - | - | - | - |
| Rahman et al. | 94.00 | - | - | - | 0.049 | - |
| Rahman et al. | 95.00 | - | - | - | 0.039 | - |
| Chandra and Jana | 92.72 | - | - | - | - | - |
| Folorunso et al. | 95.97 | - | - | - | - | - |
| Folorunso et al. | 94.10 | - | - | - | - | - |
| Al-Ahmadi and Lasloum | 96.65 | 0.965 | - | - | - | - |
| Alsariera et al. | 96.26 | - | 0.994 | - | 0.050 | - |
| Ali and Malebary | 96.43 | - | - | - | - | - |
| Ferreira et al. | 87.61 | - | - | - | - | - |
| Vrbančič et al. | 96.50 | - | - | - | - | - |
| RoF-FT-1* | 96.78 | 0.968 | 0.995 | 0.968 | 0.035 | 0.935 |
| RoF-FT-2* | 96.83 | 0.968 | - | 0.968 | 0.033 | 0.936 |
| RoF-FT-3* | 96.49 | 0.965 | 0.988 | 0.965 | 0.037 | 0.929 |
| BG-FT-1* | 96.77 | 0.968 | 0.995 | 0.968 | 0.035 | 0.935 |
| BG-FT-2* | 96.57 | 0.966 | 0.995 | 0.966 | 0.036 | 0.930 |
| BG-FT-3* | 96.44 | 0.964 | 0.990 | 0.964 | 0.037 | 0.928 |
| BT-FT-1* | 97.00 | 0.970 | - | 0.970 | 0.032 | 0.939 |
| BT-FT-2* | - | - | 0.995 | - | - | - |
| BT-FT-3* | 96.90 | 0.969 | 0.995 | 0.969 | 0.033 | 0.937 |
* Indicates methods proposed in this study.
Detection Comparison of proposed methods with existing methods on Dataset 2.
| Phishing Models | Accuracy (%) | F-Measure | AUC | TP-Rate | FP-Rate | MCC |
|---|---|---|---|---|---|---|
| Chiew et al. | 94.60 | - | - | - | - | - |
| Rahman et al. | 87.00 | - | - | - | 0.078 | - |
| Rahman et al. | 91.00 | - | - | - | 0.067 | - |
| RoF-FT-1* | 97.43 | 0.974 | 0.996 | 0.974 | 0.026 | 0.949 |
| RoF-FT-2* | 98.32 | 0.983 | - | 0.983 | 0.017 | 0.966 |
| RoF-FT-3* | 97.40 | 0.974 | 0.994 | 0.974 | 0.026 | 0.948 |
| BG-FT-1* | 97.58 | 0.976 | 0.996 | 0.976 | 0.024 | 0.952 |
| BG-FT-2* | 98.21 | 0.982 | 0.997 | 0.982 | 0.018 | 0.964 |
| BG-FT-3* | 97.33 | 0.973 | 0.994 | 0.973 | 0.027 | 0.947 |
| BT-FT-1* | 98.11 | 0.981 | 0.997 | 0.981 | 0.019 | 0.962 |
| BT-FT-2* | 98.51 | - | - | - | 0.015 | - |
| BT-FT-3* | 97.84 | 0.978 | 0.997 | 0.978 | 0.022 | 0.957 |
* Indicates methods proposed in this study.
Detection Comparison of proposed methods with existing methods on Dataset 3.
| Phishing Models | Accuracy (%) | F-Measure | AUC | TP-Rate | FP-Rate | MCC |
|---|---|---|---|---|---|---|
| Rahman et al. | 88.00 | - | - | - | 0.099 | - |
| Rahman et al. | 87.00 | - | - | - | 0.087 | - |
| RoF-FT-1* | 89.87 | 0.899 | 0.973 | 0.899 | 0.071 | 0.824 |
| RoF-FT-3* | 89.80 | 0.898 | 0.954 | 0.898 | 0.070 | 0.825 |
| BG-FT-1* | 88.77 | 0.888 | 0.972 | 0.888 | 0.079 | 0.808 |
| BG-FT-2* | 90.32 | 0.903 | 0.978 | 0.903 | 0.073 | 0.828 |
| BG-FT-3* | 88.70 | 0.887 | 0.962 | 0.887 | 0.076 | 0.810 |
| BT-FT-1* | 89.06 | 0.891 | 0.963 | 0.891 | 0.082 | 0.808 |
| BT-FT-2* | 89.28 | 0.893 | 0.967 | 0.893 | 0.079 | 0.812 |
| BT-FT-3* | 87.73 | 0.877 | 0.966 | 0.877 | 0.091 | 0.785 |
* Indicates methods proposed in this study.
Figure 7. Performance comparison of FT-2 with baseline classifiers on Dataset 1.
Figure 8. Performance comparison of FT-2 with baseline classifiers on Dataset 2.
Figure 9. Performance comparison of FT-2 with baseline classifiers on Dataset 3.
Figure 10. Performance comparison of the FT-2 variant as a base classifier for meta-learners on Dataset 1.
Figure 11. Performance comparison of the FT-2 variant as a base classifier for meta-learners on Dataset 2.
Figure 12. Performance comparison of the FT-2 variant as a base classifier for meta-learners on Dataset 3.
Figure 13. Comparison of BT-FT-2 with existing methods on Dataset 2.
Figure 14. Comparison of RoF-FT-2 with existing methods on Dataset 3.