Abstract
SMS phishing is another attack method, in which the phisher uses SMS as the medium for communicating with victims; this method is known as smishing (SMS + phishing). Researchers have proposed several anti-phishing methods that apply a correlation algorithm to assess the relevance of features, since the feature corpus contains numerous features. The correlation algorithm ranks the features: the higher a feature's rank, the more relevant it is to the classification task. This paper therefore analyses four correlation-based ranking algorithms, namely Pearson correlation, Spearman's rank correlation, Kendall rank correlation, and point-biserial correlation, in combination with a machine-learning algorithm to determine the best feature set for detecting smishing messages. The investigation reveals that the AdaBoost classifier offered the highest accuracy. Further analysis shows that the classifier combined with Kendall rank correlation achieved higher accuracy than with any of the other correlation algorithms. The experiment confirms that the ranking algorithm reduced the feature dimension by 61.53% while achieving an accuracy of 98.40%. © Springer Nature Singapore Pte Ltd 2020.
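The correlation-based feature ranking described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes every feature is a numeric column, that the label is 1 for smishing and 0 for ham, and that features are ranked by the absolute value of their correlation with the label before the top k are kept. The helper `rank_features`, the feature names (`has_url`, `has_phone`, `msg_length`, `noise`) and the toy data are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def pearson(x: Vector, y: Vector) -> float:
    """Pearson product-moment correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def average_ranks(values: Vector) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Vector, y: Vector) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Vector, y: Vector) -> float:
    """Kendall's tau-b via O(n^2) pair counting with tie correction."""
    conc = disc = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + tied_x_only) * (conc + disc + tied_y_only))
    return (conc - disc) / denom if denom else 0.0


# Point-biserial correlation is Pearson's r with a 0/1 variable, so it reuses pearson().
METHODS: Dict[str, Callable[[Vector, Vector], float]] = {
    "pearson": pearson,
    "spearman": spearman,
    "kendall": kendall_tau_b,
    "point_biserial": pearson,
}


def rank_features(columns: Dict[str, Vector], labels: Vector,
                  method: str = "kendall", top_k: Optional[int] = None) -> List[str]:
    """Hypothetical helper: sort features by |correlation with label|, keep top_k."""
    corr = METHODS[method]
    scores = {name: abs(corr(col, labels)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical toy data: 1 = smishing, 0 = ham.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "has_url":    [1, 1, 0, 0, 0, 0, 0, 0],
    "has_phone":  [1, 0, 1, 0, 0, 1, 0, 0],
    "msg_length": [150, 140, 160, 40, 60, 155, 30, 45],
    "noise":      [0, 1, 0, 1, 0, 1, 0, 1],
}
print(rank_features(features, labels, method="kendall", top_k=2))
```

On this toy data the Kendall ranking keeps `has_url` and `msg_length`. For a binary feature against a binary label, all four measures give the same score (each reduces to the phi coefficient), so differences between the methods come from non-binary features such as `msg_length`.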
Keywords: Correlation Algorithm; Machine Learning Algorithm; Phishing; Smishing
Year: 2020 PMID: 33163974 PMCID: PMC7604914 DOI: 10.1007/s42979-020-00377-8
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Fig. 1 Number of unique phishing Web sites detected
English text messages
| Total SMS | Phishing SMS | Ham SMS |
|---|---|---|
| 5578 | 747 | 4831 |
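The dataset is strongly imbalanced. The short calculation here is a reading aid derived from the table's counts, not a result from the source: it shows that a trivial classifier labelling every message as ham would already reach roughly 86.6% accuracy, so precision, recall and F1-score are more informative than accuracy alone when comparing the results in the tables.

```python
# Class counts taken from the "English text messages" table.
total, phishing, ham = 5578, 747, 4831

assert phishing + ham == total  # the table's columns add up

phishing_share = phishing / total
# Accuracy of a trivial classifier that labels every message as ham.
majority_baseline = ham / total

print(f"phishing share:      {phishing_share:.2%}")
print(f"always-ham accuracy: {majority_baseline:.2%}")
```

It prints a phishing share of 13.39% and an always-ham accuracy of 86.61%.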
Selecting the classifier
| Classifier | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|
| Random Forest | 98.39 | 91.42 | 94.72 | 98.66 |
| Decision Tree | 93.4 | 92.49 | 92.91 | 98.12 |
| AdaBoost | 97.75 | 92.23 | 94.86 | 98.67 |
| Support Vector Machine | 97.11 | 92.76 | 94.79 | 98.64 |
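The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a consistency check on the classifier table, a small sketch can recompute it from the reported precision and recall; the function name `f1_from_precision_recall` is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the classifier table.
reported = {
    "Random Forest": (98.39, 91.42, 94.72),
    "Decision Tree": (93.4, 92.49, 92.91),
    "AdaBoost": (97.75, 92.23, 94.86),
    "Support Vector Machine": (97.11, 92.76, 94.79),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_from_precision_recall(p, r)
    print(f"{name:24s} recomputed F1 = {f1:.2f}   reported F1 = {f1_reported:.2f}")
```

The recomputed values come out roughly 0.03 to 0.1 points above the reported F1-scores. Differences of this kind can arise when per-fold or per-class F1-scores are averaged instead of being computed from averaged precision and recall; the source does not say which aggregation was used.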
Comparative analysis of different correlation algorithms
| Ranking algorithm | Number of features | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| Pearson | 21 | 95.49 | 90.21 | 92.72 | 98.12 |
| Spearman’s | 18 | 96.6 | 91.02 | 93.67 | 98.37 |
| Kendall rank | 20 | 96.49 | 91.42 | 93.97 | 98.40 |
| Point-biserial | 21 | 95.49 | 90.21 | 92.72 | 98.12 |
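The Pearson and point-biserial rows in this table are identical, which is expected: the point-biserial coefficient is Pearson's r in the special case where one variable is dichotomous, as the smishing/ham label is, so both methods give every feature the same score and select the same subset. The sketch here evaluates the textbook point-biserial formula and Pearson's r on the same feature to show that they agree; the message-length values are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(x: Sequence[float], labels: Sequence[int]) -> float:
    """Textbook formula: (M1 - M0) / s_n * sqrt(n1 * n0 / n^2),
    where s_n is the population standard deviation of x."""
    group1 = [v for v, y in zip(x, labels) if y == 1]
    group0 = [v for v, y in zip(x, labels) if y == 0]
    n = len(x)
    spread = math.sqrt(len(group1) * len(group0) / n ** 2)
    return (fmean(group1) - fmean(group0)) / pstdev(x) * spread


# Hypothetical example: a message-length feature against a 0/1 smishing label.
lengths = [150, 140, 160, 40, 60, 155, 30, 45]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(pearson(lengths, labels), point_biserial(lengths, labels))
```

Both calls print the same value (about 0.746) up to floating-point rounding.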
Comparative analysis with other methods
| Method | Number of features | Accuracy (%) |
|---|---|---|
| Smidca: anti-smishing [ | 20 | 96.16 |
| Distributed System [ | 13 | 92.00 |
| The proposed model | 20 | 98.40 |
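The abstract reports a 61.53% reduction in feature dimension, and the proposed model uses 20 features. The source does not state the size of the original feature set; assuming the percentage refers to this 20-feature Kendall set, the original size N can be back-calculated as follows.

```latex
1 - \frac{20}{N} = 0.6153
\;\Rightarrow\;
N = \frac{20}{1 - 0.6153} = \frac{20}{0.3847} \approx 51.99 \approx 52,
\qquad
\frac{52 - 20}{52} = \frac{32}{52} \approx 0.61538
```

Under that assumption, the reported figure is consistent with an original set of about 52 features, with 61.538% truncated (rather than rounded) to 61.53%.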