Sandhya Mishra, Devpriya Soni.
Abstract
With the emergence of smart homes, smart cities, and "smart everything", smartphones have become an area of tremendous growth and development, and these devices are now part of everyday life. This growth and ubiquity have made smartphones more vulnerable to attacks than other devices such as desktops or laptops. Text messages, or SMS (Short Message Service), are one channel through which attackers target smartphone users. Smishing (SMS phishing) is an attack that targets smartphone users through text messages. Although smishing is a type of phishing, it differs from conventional phishing in several respects, such as the amount of information available in the message and the attack strategy. Detecting smishing is therefore challenging, because the attacker shares only a minimal amount of information. Smishing messages are short and often written in abbreviated or symbolic form: a single message contains very few smishing-related features and frequently consists of abbreviations and idioms, which makes detection more difficult. Detection is hampered not only by this feature constraint but also by the scarcity of real smishing datasets. To differentiate smishing messages from spam, we evaluate the legitimacy of the URL (Uniform Resource Locator) in the message. We extract the five most effective features from the text messages so that machine learning classification can be performed with a limited number of features. In this paper, we present a smishing detection model comprising two phases: a Domain Checking Phase and an SMS Classification Phase. The Domain Checking Phase scrutinizes the authenticity of the URL in the SMS, which is a crucial part of SMS phishing detection. The SMS Classification Phase examines the text content of the messages and extracts a set of effective features.
Finally, the system classifies the messages using the Backpropagation Algorithm and compares the results with three traditional classifiers. A prototype of the system has been developed and evaluated on SMS datasets. The evaluation achieved an accuracy of 97.93%, which indicates that the proposed method is highly effective at detecting smishing messages.
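A minimal sketch of a domain check, under stated assumptions. The comparison table indicates that the proposed system matches the URL's domain against search-engine results and page source code; that requires network access, so the offline stand-in below instead extracts URLs with a regular expression and compares each host with a hypothetical allow-list, `TRUSTED_DOMAINS`. The function names and the three result labels are illustrative only, not the authors' code.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list of domains a brand legitimately uses (assumption).
# The paper's Domain Checking Phase matches domains against search-engine
# results and page source code instead; that needs network access.
TRUSTED_DOMAINS = {"paytm.com"}

# URL-like tokens: an explicit scheme or a bare "www." prefix.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def extract_urls(sms: str) -> list[str]:
    """Return URL-like tokens, without trailing sentence punctuation."""
    return [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(sms)]


def url_host(url: str) -> str:
    """Lower-cased host name of a URL, adding a scheme if it is missing."""
    if not re.match(r"https?://", url, re.IGNORECASE):
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_trusted(host: str, trusted: set[str]) -> bool:
    """True if host is a trusted domain or one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in trusted)


def check_domains(sms: str, trusted: set[str] = TRUSTED_DOMAINS) -> str:
    """Label the URLs in an SMS as 'no-url', 'legitimate-url' or 'suspicious-url'."""
    urls = extract_urls(sms)
    if not urls:
        return "no-url"  # nothing to check; only the text classifier applies
    if all(is_trusted(url_host(u), trusted) for u in urls):
        return "legitimate-url"
    return "suspicious-url"


print(check_domains("Your KYC is pending. Update at http://paytm-kyc.xyz/verify"))  # suspicious-url
```

Because `is_trusted` compares whole domain labels, it accepts subdomains such as `www.paytm.com` but rejects look-alike hosts such as `paytm.com.evil.in`, a pattern often seen in phishing URLs.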
Keywords: Backpropagation Algorithm; Covid-19 SMS Scam; Cyber security; Machine learning; Mobile security; Paytm SMS scam; Phishing; Smishing
Year: 2021 PMID: 34341626 PMCID: PMC8318556 DOI: 10.1007/s00521-021-06305-y
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
Fig. 1. A smishing message showing a Paytm scam
Fig. 2. A URL-based smishing message showing a Paytm scam
Fig. 3. A text message showing a Covid-19 scam
Fig. 4. Architecture of the proposed system
Fig. 5. Architecture of the backpropagation network
Fig. 7. Performance of the algorithms on our system
Comparison of the proposed model with previously proposed systems
| Technique or feature | Feature based […] | Rule based […] | SmiDCA […] | Smishing Detector […] | S-detector […] | Proposed system |
|---|---|---|---|---|---|---|
| Search engine domain matching | NO | NO | NO | NO | NO | YES |
| Source code domain matching | NO | NO | NO | YES | NO | YES |
| Existence of URL | YES | YES | YES | YES | YES | YES |
| Existence of phone number and email ID in the message | YES | YES | YES | YES | NO | YES |
| Smishing keywords | YES | YES | YES | YES | YES | YES |
| Misspelled words | NO | NO | YES | NO | NO | YES |
| Leet words | NO | NO | NO | NO | NO | YES |
| Symbols | YES | YES | NO | NO | NO | YES |
| Special characters | NO | NO | YES | NO | NO | YES |
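The heuristic rows in the table above can be illustrated with a small, regex-based feature extractor that emits one binary flag per heuristic. This is a sketch under assumptions: the keyword list, leet-speak map, phone pattern, and symbol/special-character sets are hypothetical choices, not the paper's definitions, and the text does not say which five features the authors finally kept. Misspelled-word detection needs a dictionary and is omitted; search-engine and source-code domain matching belong to domain checking rather than text features.

```python
import re

# Hypothetical keyword list; the paper's own smishing keyword set is not reproduced here.
SMISHING_KEYWORDS = {"kyc", "verify", "blocked", "winner", "prize", "urgent", "otp", "refund"}

# Digit-for-letter substitutions commonly used in "leet" spelling (assumption).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})

URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")    # 10+ digits, spaces/dashes allowed
SYMBOL_RE = re.compile(r"[₹$€£%]")              # currency and percent signs (assumption)
SPECIAL_RE = re.compile(r"[!#*^~|<>{}\[\]]")    # assumed "special characters"


def is_leet(token: str) -> bool:
    """A token is leet if it mixes letters with substituted digits, e.g. 'v3rify'."""
    normalized = token.translate(LEET_MAP)
    return normalized != token and normalized.isalpha() and any(c.isalpha() for c in token)


def extract_features(sms: str) -> dict[str, int]:
    """Binary heuristic features loosely mirroring rows of the comparison table."""
    tokens = re.findall(r"\w+", sms.lower())
    normalized = [t.translate(LEET_MAP) for t in tokens]  # 'v3rify' -> 'verify'
    return {
        "url": int(bool(URL_RE.search(sms))),
        "phone_or_email": int(bool(PHONE_RE.search(sms) or EMAIL_RE.search(sms))),
        "keyword": int(any(t in SMISHING_KEYWORDS for t in normalized)),
        "leet": int(any(is_leet(t) for t in tokens)),
        "symbol": int(bool(SYMBOL_RE.search(sms))),
        "special_char": int(bool(SPECIAL_RE.search(sms))),
    }


print(extract_features("URGENT! Your KYC is blocked. V3rify at http://paytm-kyc.xyz or call 9876543210 for a ₹500 refund"))
```

Normalizing leet spellings before the keyword lookup lets an obfuscated word such as `V3rify` still count as a smishing keyword, while pure numbers such as `500` are not treated as leet because they contain no letters.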
Fig. 6. Frequency of each analyzed heuristic
Hyperparameters used in the Backpropagation Algorithm
| Hyperparameters | Values |
|---|---|
| No. of hidden nodes | 10 |
| No. of Epochs | 10 |
| Activation function | ReLU |
| Solver | Adam |
| Learning rate | 0.01 |
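To make the hyperparameter table concrete, here is a minimal pure-Python sketch of a backpropagation network using those values: 10 hidden ReLU nodes, the Adam solver with learning rate 0.01, and 10 epochs. Details the table does not state are assumptions: a single hidden layer, a sigmoid output with binary cross-entropy loss, one update per message (batch size 1), He initialization, and the usual Adam defaults β1 = 0.9, β2 = 0.999, ε = 1e-8. `TinyMLP` is a hypothetical name; this is not the authors' implementation.

```python
import math
import random

HIDDEN = 10    # "No. of hidden nodes"
EPOCHS = 10    # "No. of Epochs"
LR = 0.01      # "Learning rate"
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8  # usual Adam defaults (assumption)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One ReLU hidden layer and a sigmoid output, trained by backpropagation with Adam."""

    def __init__(self, n_in: int, seed: int = 0):
        rnd = random.Random(seed)
        self.n_in = n_in
        n_w1 = HIDDEN * n_in
        # Flat parameter vector laid out as [W1 (HIDDEN x n_in) | b1 | w2 | b2].
        self.p = ([rnd.gauss(0.0, math.sqrt(2.0 / n_in)) for _ in range(n_w1)]
                  + [0.0] * HIDDEN
                  + [rnd.gauss(0.0, math.sqrt(2.0 / HIDDEN)) for _ in range(HIDDEN)]
                  + [0.0])
        self.o_b1, self.o_w2, self.o_b2 = n_w1, n_w1 + HIDDEN, n_w1 + 2 * HIDDEN
        self.m = [0.0] * len(self.p)  # Adam first-moment estimates
        self.v = [0.0] * len(self.p)  # Adam second-moment estimates
        self.t = 0

    def _forward(self, x):
        p, n = self.p, self.n_in
        z1 = [sum(p[j * n + i] * x[i] for i in range(n)) + p[self.o_b1 + j]
              for j in range(HIDDEN)]
        h = [max(z, 0.0) for z in z1]  # ReLU
        y = sigmoid(sum(p[self.o_w2 + j] * h[j] for j in range(HIDDEN)) + p[self.o_b2])
        return z1, h, y

    def predict_proba(self, x) -> float:
        return self._forward(x)[2]

    def _train_step(self, x, target: int) -> None:
        z1, h, y = self._forward(x)
        p, n = self.p, self.n_in
        g = [0.0] * len(p)
        d_out = y - target  # dLoss/dlogit for sigmoid + binary cross-entropy
        g[self.o_b2] = d_out
        for j in range(HIDDEN):
            g[self.o_w2 + j] = d_out * h[j]
            # Backpropagate the error through w2 and the ReLU derivative.
            d_hid = d_out * p[self.o_w2 + j] * (1.0 if z1[j] > 0 else 0.0)
            g[self.o_b1 + j] = d_hid
            for i in range(n):
                g[j * n + i] = d_hid * x[i]
        # Adam update with bias correction.
        self.t += 1
        for k, gk in enumerate(g):
            self.m[k] = BETA1 * self.m[k] + (1 - BETA1) * gk
            self.v[k] = BETA2 * self.v[k] + (1 - BETA2) * gk * gk
            m_hat = self.m[k] / (1 - BETA1 ** self.t)
            v_hat = self.v[k] / (1 - BETA2 ** self.t)
            p[k] -= LR * m_hat / (math.sqrt(v_hat) + EPS)

    def fit(self, X, Y, epochs: int = EPOCHS, seed: int = 0) -> "TinyMLP":
        rnd = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rnd.shuffle(order)  # visit messages in a new order each epoch
            for idx in order:
                self._train_step(X[idx], Y[idx])
        return self
```

Each input `x` would be a message's binary feature vector and each target 1 (smishing) or 0 (legitimate). In practice a library would normally be used; for example, scikit-learn's `MLPClassifier(hidden_layer_sizes=(10,), activation="relu", solver="adam", learning_rate_init=0.01, max_iter=10)` expresses similar settings, although it trains on mini-batches rather than single messages.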
Fig. 8. Behavior of the error as the learning rate changes
Fig. 9. Software and hardware configurations used to evaluate the system
Evaluation results of the system
| Algorithm | Accuracy (%) | AUC | Execution time (s) |
|---|---|---|---|
| Backpropagation Algorithm | 97.93 | 0.988 | 33.32 |
| Random Forest | 97.85 | 0.985 | 20.41 |
| Naive Bayes | 97.76 | 0.983 | 17.24 |
| Decision Tree | 96.48 | 0.974 | 16.32 |
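Accuracy depends on a decision threshold, whereas AUC summarizes how well a classifier's scores rank smishing messages above legitimate ones across all thresholds. The paper does not state how its AUC values were computed; the sketch below uses the standard pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve when ties count one half. The function name and the toy scores are illustrative.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            # A correctly ordered pair counts 1, a tie counts 0.5.
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


# Toy scores (made up): 1 = smishing, 0 = legitimate.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The double loop is O(P × N) in the number of positive and negative messages, which is acceptable for a dataset of a few thousand messages; rank-based implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.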
Performance of the proposed model with the Backpropagation Algorithm (confusion matrix)
| | Smishing messages (actual) | Legitimate messages (actual) |
|---|---|---|
| Classified as smishing | TP = 509 | FP = 92 |
| Classified as legitimate | FN = 29 | TN = 5228 |
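The four counts in the confusion matrix are enough to recompute the reported accuracy and to derive other standard metrics. The short calculation below (`confusion_metrics` is a hypothetical helper) reproduces the 97.93% accuracy; precision, recall, F1 and false-positive rate are not reported in the paper and are derived here only from the four counts.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of flagged messages that really are smishing
    recall = tp / (tp + fn)     # share of smishing messages that were caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
    }


m = confusion_metrics(tp=509, fp=92, fn=29, tn=5228)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy ≈ 0.9793, precision ≈ 0.8469, recall ≈ 0.9461, F1 ≈ 0.8938 and false-positive rate ≈ 0.0173. The matrix is imbalanced (538 actual smishing versus 5320 actual legitimate messages), which is why precision and recall are informative alongside accuracy.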
Accuracy of the proposed model in each cross-validation iteration
| Iteration | Accuracy (%) |
|---|---|
| Iteration 1 | 97.65 |
| Iteration 2 | 97.97 |
| Iteration 3 | 98.29 |
| Iteration 4 | 97.52 |
| Iteration 5 | 98.25 |
| Average | 97.93 |
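Assuming the five iterations are the folds of a five-fold cross-validation (the table does not state how the data were split), the procedure can be sketched as below: the hypothetical helper `k_fold_indices` partitions the message indices into k contiguous folds, each fold serves once as the test set while the rest are used for training, and the per-fold accuracies are then averaged. In practice the messages would usually be shuffled, or stratified by class, before splitting.

```python
from statistics import mean, stdev


def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    # Spread the remainder n % k over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


fold_accuracy = [97.65, 97.97, 98.29, 97.52, 98.25]  # per-iteration values from the table above
print(f"mean = {mean(fold_accuracy):.3f}, sample std = {stdev(fold_accuracy):.3f}")
```

The listed per-iteration values average to 97.936, which differs from the reported average of 97.93 only in the third decimal place, a gap that rounding of the individual entries can account for.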