Marwan Ali Albahar1, Mahmoud Said ElSayed2, Anca Jurcut2.
Abstract
Android malware has long been a significant threat to the security of Android applications, so it is critical to identify, mitigate, and defend against malware attacks. Identifying malicious applications and grouping them into families of similar samples is especially important for building a safe Android app ecosystem. Malware family classification can improve the efficiency of malware detection and help to systematically identify malicious trends. In this study, we propose a modified ResNeXt model that embeds a new regularization technique to improve classification, and we present a comprehensive evaluation of Android malware classification and detection using this model. The nonintuitive features of the malware are converted into fingerprint images in order to extract rich information from the input data. We then apply a fine-tuned deep learning (DL) model based on the convolutional neural network (CNN) to the visualized malware samples to automatically obtain the discriminative features that separate normal from malicious data. Using DL techniques not only avoids the cost of domain experts but also eliminates the frequent need for feature engineering. We evaluate the effectiveness of the modified ResNeXt model by testing a total of fifteen different combinations of Android malware image sections on the Drebin dataset, using only grayscale malware images as input to the model. The experimental results show that the modified ResNeXt achieves an accuracy of 98.25% using Android certificates only. We also ran extensive experiments on the dataset to confirm the efficacy of our methodology and compared our approach with several existing methods. Finally, this article presents an evaluation of different models and a more precise option for malware identification.
Year: 2022 PMID: 35634062 PMCID: PMC9142319 DOI: 10.1155/2022/8634784
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: A typical architecture of the CNN.
Structure of APK files.
| Reference | APK folders/files | Responsibilities |
|---|---|---|
| [ | AndroidManifest.xml | One of the most important files in an Android application. It stores the application's basic information, including app components (activities, services, broadcast receivers, content providers, and others) and package information (permissions, package name, and app ID). It also declares the SDK version. |
| | Assets/folder | Holds application assets, such as images and files, which the asset manager object can access to retrieve the assets stored in this folder. |
| | Lib/folder | Contains the native code libraries. The compiled code specific to the software layer of a processor type is gathered in this folder. |
| | META-INF/folder | Contains three main kinds of files for the APK: signature, certificate, and manifest files, such as MANIFEST.MF, *.SF, and *.RSA. |
| | Res/folder | Contains resources such as icons, music, images, strings, and layouts. These resources are not compiled into resources.arsc. |
| | Classes.dex | Dex code is the bytecode of an Android application, generated by compiling the Java code. The file contains constructs for all classes, such as the file header, string table, local variable list, class definition table, and method list, and it can be interpreted by the Dalvik virtual machine. Any change in the dex file affects the APK. |
| | Resources.arsc | Contains the application's resources in a binary format, such as strings, styles, and the paths of image or layout files. However, the data can only be processed in an XML format. |
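Because an APK is a zip archive, the layout in the table above can be inspected with a standard zip reader. The following is a minimal Python sketch, not the authors' tooling: `classify_entry` and `group_apk_entries` are hypothetical helper names, and placing additional dex files such as `classes2.dex` in the classes.dex group is an assumption.

```python
import io
import zipfile

# Top-level folders from the table above, matched case-insensitively.
FOLDERS = {"assets/": "assets/", "lib/": "lib/",
           "meta-inf/": "META-INF/", "res/": "res/"}

def classify_entry(name: str) -> str:
    """Map one archive entry name to a row of the APK structure table."""
    lowered = name.lower()
    if lowered == "androidmanifest.xml":
        return "AndroidManifest.xml"
    if lowered == "resources.arsc":
        return "resources.arsc"
    # Assumption: additional dex files (classes2.dex, ...) join the same group.
    if lowered.startswith("classes") and lowered.endswith(".dex"):
        return "classes.dex"
    for prefix, label in FOLDERS.items():
        if lowered.startswith(prefix):
            return label
    return "other"

def group_apk_entries(apk) -> dict:
    """Group the entry names of an APK (path or file-like object) by section."""
    groups = {}
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            groups.setdefault(classify_entry(name), []).append(name)
    return groups

# Usage with a tiny in-memory archive standing in for a real APK.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    for entry in ("AndroidManifest.xml", "classes.dex", "resources.arsc",
                  "META-INF/CERT.RSA", "res/layout/main.xml"):
        demo.writestr(entry, b"")
buffer.seek(0)
print(group_apk_entries(buffer))
```

Grouping by section in this way is one possible first step before converting individual sections (certificate, manifest, resources, classes.dex) into images.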
Common Android malware families.
| Common Android malware families | | | | | |
|---|---|---|---|---|---|
| Accu Track | Counterclank | FakeTaoBao | Kidlogger | Placms | SpyOO |
| Acnetdoor | Dogowar | FakeUpdate/Apkqug | Ksapp | Podec | Steek/Fatakr |
| Adsms | Dougalek | Fakevertu | LeNa | PoisonCake | Tascudap |
| Airpush/StopSMS | DroidDeluxe | Find and Call/Fidall | Lien/ | ProxyTrojan/NotCompatible/NioServ | TapSnake/ |
| Anserver/Answerbot | DroidDream | Finspy | Locker/SLocker Ransomware | Qicsomos | TGloader/ |
| Antares/ | DroidJack/SandoRAT | Fjcon | Loicdos | Raden | TigerBot |
| Antammi | DroidDreamLight | Flexispy | Loozfon | Repane | Tetus |
| Badaccents | DSEncrypt | Fonefee/Feejar | Maistealer | RuFraud | Tracer |
| Badnews | Extension/Monad | Gamex | Malap | Saiva | TypStu |
| BankBot | FaceNiff | Gazon | Mania | Samsapo | UpdtBot |
| Beita | FakeAV | GingerBreak | MobileTx | Sndapps/Snadapps | USBcleaver |
| Binv | FakeDaum/Vmwol | GingerMaster/GingerBreaker | Mobinauten | SMSsniffer | Uten |
| BgServ | FakeBank | Godwon | Moghava | SpamBot | Uxipp |
| Biige | FakeDefender | GoldenEagle/GlodEagl | Nandrobox | SeaWeth | Vdloader |
| Bosster | FakeDoc | GoneIn60seconds | PDAspy | SMSCatcher | Walkinwat/Pirater |
| Chulli, Cellspy, Coogos, CopyCat, Cosha | FakeNefix | Iconosys | Pjapps | SpyBubble | ZergRush |
Comparison of static and dynamic analyses.
| | Static | Dynamic |
|---|---|---|
| How it works | The suspected code is analyzed without running the application. The method disassembles the code and inspects it for the presence of malware without executing it, relying only on abstract malware characteristics and the application bytecode. Reverse engineering is mostly applied [ | The suspected code is analyzed during runtime execution. The analysis focuses on the characteristics and traces of suspicious use during execution [ |
| Advantage | Harmful applications do not need to be installed on the device. | It can detect dependencies that are impossible to detect with the static method. |
| | The malware code is not executed or run. | It collects temporal instructions. |
| | Applications are in APK format or archived in a zip package [ | It deals with real data, whereas in static analysis the input files to be passed for analysis cannot be known. |
| | It can overcome string detection issues, such as malware fitting and polymorphism [ | |
| Disadvantage | This technique does not take the analysis of unknown malware into consideration. | It can have a negative performance impact on the application. |
| | The source code is not directly available and must be disassembled to extract the features [ | It requires better mobile security at critical monitoring stages. |
| | Harmful applications cannot appear until the code has been run. | It can give incorrect results when malicious applications behave similarly to other applications. |
| | It suffers from code obfuscation [ | It is a complex and time-consuming technique that requires high resource usage and storage capacity [ |
Figure 2: Residual learning: a building block.
Various malware datasets' publication counts.
| Dataset | Number of publications |
|---|---|
| Drebin | 20 |
| Repository | 8 |
| Collection | 6 |
| MalGenome | 17 |
Figure 3: Conversion process of APK into grayscale image.
Fixed image width according to the file size.
| File size | Width |
|---|---|
| <50 kB | 64 |
| 50 kB–100 kB | 128 |
| 100 kB–200 kB | 256 |
| 200 kB–500 kB | 512 |
| 500 kB–1000 kB | 1024 |
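The width rule in the table above can be sketched in a few lines of Python with NumPy. This is a hedged illustration, not the authors' code: `image_width` and `bytes_to_grayscale` are hypothetical names, and several details are assumptions because the table does not state them (1 kB is taken as 1024 bytes, each byte becomes one 8-bit pixel, a range's lower bound is inclusive, files of 1000 kB or more reuse the widest setting, and the last row is zero-padded).

```python
import numpy as np

# Width lookup from the table above: (exclusive upper bound in kB, width in pixels).
WIDTH_BY_SIZE_KB = [(50, 64), (100, 128), (200, 256), (500, 512), (1000, 1024)]

def image_width(size_bytes: int) -> int:
    """Return the fixed image width for a file of the given size."""
    size_kb = size_bytes / 1024  # assumption: 1 kB = 1024 bytes
    for upper_kb, width in WIDTH_BY_SIZE_KB:
        if size_kb < upper_kb:
            return width
    # Assumption: the table stops at 1000 kB, so larger files reuse the last width.
    return WIDTH_BY_SIZE_KB[-1][1]

def bytes_to_grayscale(data: bytes) -> np.ndarray:
    """Reshape raw bytes into a 2-D uint8 array, one byte per pixel."""
    width = image_width(len(data))
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    # Assumption: pad the final row with zeros (black pixels).
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(pixels)] = pixels
    return padded.reshape(height, width)

# Usage: 512 bytes fall in the "<50 kB" row, giving an 8 x 64 image.
image = bytes_to_grayscale(bytes(range(256)) * 2)
print(image.shape)
```

Fixing the width per size bucket keeps images of similarly sized files comparable, while the height varies with the file length.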
Figure 4: The fingerprint images of different malware families using file sections of the Android certificate (CR), AndroidManifest (M), classes.dex (CL), and resource (RS) of an APK.
Figure 5: Cardinality of ResNeXt block.
Figure 6: The architecture of the proposed modified ResNeXt.
Various combinations and their associated instances used in the study.
| Combination | CR | AM | RS | CL | CR + AM | CR + RS | CR + CL | AM + RS | AM + CL | RS + CL | CR + AM + RS | CR + AM + CL | CR + RS + CL | AM + RS + CL | CR + AM + RS + CL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of instances | 1826 | 4659 | 4659 | 4660 | 4659 | 4659 | 4660 | 4659 | 4660 | 4660 | 4659 | 4660 | 4660 | 4660 | 4660 |
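The fifteen combinations in the table above are all non-empty subsets of the four APK sections (2^4 - 1 = 15). A short Python sketch, using a hypothetical helper name `section_combinations`, enumerates them in the same order as the table:

```python
from itertools import combinations

# Section abbreviations as used in the table above.
SECTIONS = ["CR", "AM", "RS", "CL"]

def section_combinations(sections):
    """List every non-empty subset of sections, smallest subsets first."""
    return [
        " + ".join(combo)
        for size in range(1, len(sections) + 1)
        for combo in combinations(sections, size)
    ]

for number, name in enumerate(section_combinations(SECTIONS), start=1):
    print(number, name)
```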
Figure 7: Comparison of combination classification accuracy for each model.
Generic and augmented CNN accuracies on 15 different grayscale malware image combinations.
| S/no. | Image combination | CNN (%) | CNN-SVM (%) | CNN-KNN (%) | CNN-RF (%) | VGG16 (%) | GoogLeNet (%) | ResNeXt (%) | Modified ResNeXt (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CR | 83.58 | 82.92 | 77.11 | 83.42 | 78.27 | 88.86 | 92.96 | 98.25 |
| 2 | AM | 89.79 | 90.18 | 83.94 | 84.85 | 85.76 | 90.76 | 92.51 | 95.50 |
| 3 | RS | 86.86 | 88.56 | 86.02 | 84.53 | 82.12 | 89.37 | 95.21 | 96.50 |
| 4 | CL | 89.46 | 90.57 | 89.40 | 87.58 | 87.23 | 91.16 | 94.74 | 95.63 |
| 5 | CR + AM | 91.48 | 92.59 | 86.93 | 87.52 | 90.57 | 89.81 | 92.74 | 96.88 |
| 6 | CR + RS | 87.12 | 89.47 | 86.80 | 85.89 | 88.91 | 89.16 | 94.08 | 97.38 |
| 7 | CR + CL | 89.33 | 90.25 | 89.01 | 88.43 | 89.34 | 90.01 | 93.85 | 96.94 |
| 8 | AM + RS | 88.29 | 89.47 | 87.78 | 84.98 | 86.78 | 90.07 | 93.86 | 96 |
| 9 | AM + CL | 89.33 | 90.83 | 89.79 | 88.69 | 84.43 | 90.07 | 92.06 | 95.57 |
| 10 | RS + CL | 88.49 | 90.96 | 89.34 | 87.58 | 84.37 | 85.77 | 94.98 | 96.07 |
| 11 | CR + AM + RS | 89.46 | 90.77 | 88.75 | 85.50 | 87.67 | 89.66 | 93.56 | 96.75 |
| 12 | CR + AM + CL | 89.33 | 90.51 | 88.49 | 88.82 | 86.81 | 90.26 | 93.40 | 96.46 |
| 13 | CR + RS + CL | 89.53 | 90.90 | 89.66 | 88.17 | 84.56 | 89.80 | 94.30 | 96.49 |
| 14 | AM + RS + CL | 88.55 | 90.70 | 89.86 | 87.97 | 89.29 | 90.43 | 94.15 | 95.88 |
| 15 | CR + AM + RS + CL | 89.33 | 90.70 | 89.60 | 87.84 | 84.32 | 90.04 | 93.86 | 96.47 |
A comparison of execution time and images processed per second by the proposed model.
| S/no. | Combination | Execution time (s) | Images processed/second |
|---|---|---|---|
| 1 | CR | 231.2 | 6.57 |
| 2 | AM | 663.8 | 5.1 |
| 3 | RS | 787.4 | 4.25 |
| 4 | CL | 1102.1 | 4.21 |
| 5 | CR + AM | 790.2 | 4.23 |
| 6 | CR + RS | 1004.4 | 4.64 |
| 7 | CR + CL | 1109.7 | 4.2 |
| 8 | AM + RS | 850.5 | 4.34 |
| 9 | AM + CL | 1120.4 | 4.12 |
| 10 | RS + CL | 1093.3 | 4.26 |
| 11 | CR + AM + RS | 624.7 | 5.04 |
| 12 | CR + AM + CL | 1139.4 | 4.04 |
| 13 | CR + RS + CL | 1235.5 | 3.78 |
| 14 | AM + RS + CL | 1203.9 | 3.83 |
| 15 | CR + AM + RS + CL | 1513.7 | 3.08 |
Confusion matrix for the top 20 malware families in the proposed model.
| | FakeInstaller | DroidKungFu | Plankton | OpFake | GinMaster | BaseBridge | Iconosys | Kmin | FakeDoc | Geinimi | Adrd | DroidDream | ExploitLinuxLotoor | Mobile Tx | Glodream | FakeRun | SendPay | Gappusin | Imlog | SMSreg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FakeInstaller | 904 | 0 | 0 | 17 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| DroidKungFu | 0 | 579 | 3 | 2 | 13 | 10 | 0 | 0 | 0 | 0 | 34 | 0 | 6 | 0 | 3 | 0 | 0 | 17 | 0 | 0 |
| Plankton | 1 | 3 | 573 | 0 | 4 | 20 | 0 | 0 | 0 | 11 | 11 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| OpFake | 0 | 0 | 0 | 560 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| GinMaster | 0 | 1 | 2 | 3 | 315 | 0 | 0 | 0 | 0 | 1 | 12 | 0 | 0 | 0 | 4 | 0 | 0 | 1 | 0 | 0 |
| BaseBridge | 0 | 5 | 0 | 1 | 1 | 316 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| Iconosys | 0 | 0 | 0 | 0 | 0 | 0 | 152 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kmin | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 135 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| FakeDoc | 3 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 120 | 0 | 2 | 0 | 0 | 0 | 4 | 0 | 0 | 1 | 0 | 0 |
| Geinimi | 0 | 0 | 0 | 0 | 3 | 2 | 0 | 0 | 0 | 86 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Adrd | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DroidDream | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 78 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ExploitLinuxLotoor | 0 | 1 | 0 | 0 | 3 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 60 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Mobile Tx | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 69 | 0 | 0 | 0 | 0 | 0 | 0 |
| Glodream | 0 | 2 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 60 | 0 | 0 | 0 | 0 | 0 |
| FakeRun | 0 | 0 | 21 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 4 | 27 | 0 | 3 | 0 | 0 |
| SendPay | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59 | 0 | 0 | 0 |
| Gappusin | 0 | 3 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 51 | 0 | 0 |
| Imlog | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 41 | 0 |
| SMSreg | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 |
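Per-family scores can be derived from a confusion matrix like the one above using the standard definitions. The sketch below assumes NumPy and uses a small made-up 3 x 3 matrix rather than the paper's data; `per_class_f1` is a hypothetical helper name. Since F1 = 2TP / (row sum + column sum), the result does not depend on whether rows hold the true or the predicted labels, which the table does not state.

```python
import numpy as np

def per_class_f1(confusion) -> np.ndarray:
    """Compute the F1-score of each class from a square confusion matrix."""
    cm = np.asarray(confusion, dtype=float)
    true_positives = np.diag(cm)
    # Row sum + column sum = 2*TP + FP + FN for each class.
    denominator = cm.sum(axis=1) + cm.sum(axis=0)
    return np.divide(2 * true_positives, denominator,
                     out=np.zeros_like(true_positives),
                     where=denominator > 0)

def overall_accuracy(confusion) -> float:
    """Fraction of samples that lie on the diagonal."""
    cm = np.asarray(confusion, dtype=float)
    return float(np.trace(cm) / cm.sum())

# Toy matrix (illustrative values only, not taken from the study).
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 0, 5]]
print(per_class_f1(toy))
print(overall_accuracy(toy))
```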
F1-score comparisons between the modified ResNeXt and the original ResNeXt in the Drebin dataset.
| Family | ResNeXt | Modified ResNeXt |
|---|---|---|
| Adrd | 0.67433 | 0.908108 |
| BaseBridge | 0.921283 | 0.961832 |
| DroidDream | 0.962963 | 0.981132 |
| DroidKungFu | 0.916865 | 0.935737 |
| ExploitLinuxLotoor | 0.875912 | 0.827068 |
| FakeDoc | 0.948617 | 0.988593 |
| FakeInstaller | 0.613636 | 0.99675 |
| FakeRun | 0.761194 | 0.97561 |
| Gappusin | 0.858311 | 0.866667 |
| Geinimi | 0.794702 | 0.891192 |
| GinMaster | 0.993464 | 0.939691 |
| Glodream | 0.97619 | 0.78481 |
| Iconosys | 0.957447 | 0.996721 |
| Imlog | 1 | 0.988506 |
| Kmin | 0.934891 | 1 |
| Mobile Tx | 0.93551 | 1 |
| OpFake | 0.991597 | 0.995938 |
| Plankton | 0.95122 | 0.994378 |
| SMSreg | 0.986361 | 0.886076 |
| SendPay | 0.895833 | 0.944 |
Figure 8: Comparison of family classification F1-score for each model in the Drebin dataset.
Figure 9: Stability comparison of classification performance on three sets.
Stability classification performance of the proposed model for the Drebin and AMD datasets.
| Dataset | Precision (%) | Recall (%) | Accuracy (%) |
|---|---|---|---|
| Drebin + Drebin | 97.1 | 97.3 | 98.2 |
| Drebin + S-AMD | 66.2 | 67.3 | 66.2 |
| Drebin + T-AMD | 75.2 | 77.8 | 77.4 |
| S-AMD + Drebin | 87.1 | 87.2 | 89.1 |
| S-AMD + S-AMD | 83.3 | 88.1 | 83.5 |
| T-AMD + Drebin | 86.3 | 87.6 | 87 |
| T-AMD + T-AMD | 93 | 92.5 | 91.3 |