
Towards a systematic description of the field using bibliometric analysis: malware evolution.

Sharfah Ratibah Tuan Mat¹, Mohd Faizal Ab Razak¹, Mohd Nizam Mohmad Kahar¹, Juliza Mohamad Arif¹, Salwana Mohamad¹, Ahmad Firdaus¹

Abstract

Malware is a blanket term for Trojans, viruses, spyware, worms, and other files that are purposely created to harm computers, mobile devices, or computer networks. Malware commonly steals, encrypts, or damages data and disrupts the normal operation of these devices. The growth of malware attacks is linked to the growing number and attractiveness of features in mobile devices. Most malware research aims to probe the different methods of preventing, analysing, and detecting malware attacks. This paper aims to present an exhaustive knowledge map of Android malware research by collecting a ten (10) year dataset from the Web of Science database. A bibliometric analysis was employed to analyse articles published between 2010 and 2019. Using the keyword "malware", 5622 articles were retrieved; after narrowing the search with the keyword "Android malware", 1278 articles were collected. This study provides an overview of the articles, productivity, research areas, Web of Science categories, authors, highly cited articles, institutions, and impact journals examining malware. It further classifies malware detection systems, outlining important areas in malware research. The analysis shows that the highest number of publications focusing on malware studies came from Asia. Additionally, this study discusses the challenges in recent malware research as well as future directions. © Akadémiai Kiadó, Budapest, Hungary 2021.


Keywords:  Android malware; Bibliometric; Intrusion detection system; Web of science

Year:  2021        PMID: 33583978      PMCID: PMC7871169          DOI: 10.1007/s11192-020-03834-6

Source DB:  PubMed          Journal:  Scientometrics        ISSN: 0138-9130            Impact factor:   3.238


Introduction

Malware is a term used for all kinds of malicious software designed to attack and damage a computer system. Common malware types that harm networks and operating systems include Trojan horses, worms, viruses, spyware, ransomware, and adware (Razak et al. 2016). These types of malware attack systems in different ways (Qamar et al. 2019), and all of them are capable of damaging operating systems and networks. In the second quarter (Q2) of 2019, malware targeting mobile devices increased by 50% compared with the previous year (Palmer 2019). In January 2019 alone, a total of two billion hacked records were uncovered (Sanders 2019). McAfee Labs noted that ransomware attacks grew by 118% in 2019 (Mcafee 2019a, b), while banking Trojans doubled from June to September 2018, making them the fastest-growing malware family (Mcafee 2019a, b). Chebyshev et al. (2019) reported a new Trojan, Android.MobOk, which stole money from mobile accounts by signing victims up for paid subscriptions. A total of 905,174 malicious installation packages were detected in Q1 2019, and this number decreased by 151,624 in Q2 2019. The share of risk tools rose to 41.24% in 2019, compared with only 30.20% in 2018, and the shares of adware and Trojans also increased, to 18.71% and 11.83%, respectively. This substantial growth in mobile attacks over the years indicates that attackers increasingly regard Android mobile devices as attractive targets (Verkijika 2019). The Android platform offers attractive functions such as entertainment, data storage, and social communication (Chen and Li 2017). Its wide adoption has made the Android operating system one of the platforms most targeted by malware, attracting the attention of unscrupulous malware authors (Shrivastava and Kumar 2019a). 
These authors are motivated by malicious goals and the prospect of financial gain. The low priority that mobile device developers give to security has also allowed malware to exploit mobile devices (Thompson et al. 2017). To address security issues, Android provides sandboxing as a security mechanism; however, malware authors have creatively exploited other Android vulnerabilities to spread malware (Qamar et al. 2019). In addition, the lack of user awareness (Lopes et al. 2019) and the vulnerabilities of operating systems give malware more opportunities to exploit data through malicious code (Goel and Jain 2018). These malicious programs serve different purposes, such as encrypting or stealing sensitive information stored on the phone, deleting important data, modifying or controlling core computing functions, and monitoring user activity without the user's knowledge (Basu et al. 2019). Because most users rely on mobile devices for personal tasks over Wi-Fi (Sharma and Gupta 2018a), attackers have the opportunity to steal user credentials (Sharma and Gupta 2016). All mobile users should therefore be made aware of emerging malware so that they can protect their devices. To prevent the spread of malware, devices are protected with existing methods such as anti-malware software and intrusion detection systems (IDS) (Talal et al. 2019). Nevertheless, novel approaches are still needed to keep pace with the rapid increase in malware attacks. With more advanced technologies, malware authors are able to hide malware from detection by applying diverse, sophisticated obfuscation techniques, including encryption, packing, polymorphism, and metamorphism (Huda et al. 2018). These techniques are generally used to prevent signatures from being extracted from the malware's binary code (Or-Meir et al. 2019). 
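To make the last point concrete, the following Python sketch shows why even trivial encryption defeats exact-match signatures. It is a toy built on stated assumptions. The "signature" is simply a SHA-256 hash of the whole payload, the XOR step stands in for real encryption or packing, and the payload bytes and function names are hypothetical. Production anti-malware signatures are considerably more elaborate.

```python
import hashlib

def signature(payload: bytes) -> str:
    """Toy whole-file signature: the SHA-256 digest of the raw bytes."""
    return hashlib.sha256(payload).hexdigest()

def xor_obfuscate(payload: bytes, key: int) -> bytes:
    """Toy obfuscation standing in for encryption/packing: XOR each byte with a one-byte key.
    Applying it twice with the same key restores the original bytes."""
    return bytes(b ^ key for b in payload)

# Hypothetical payload and a signature database built from the known sample.
original = b"send_premium_sms(); upload_contacts();"
known_signatures = {signature(original)}

def is_known_malware(payload: bytes) -> bool:
    """Flag a payload only if its exact signature is already in the database."""
    return signature(payload) in known_signatures

variant = xor_obfuscate(original, key=0x5A)

print(is_known_malware(original))                    # True
print(is_known_malware(variant))                     # False: new bytes, new digest
print(xor_obfuscate(variant, key=0x5A) == original)  # True: same code once decoded
```

The obfuscated variant behaves identically once decoded at run time, yet it no longer matches the stored signature. This is why the studies cited above turn to behavioural and feature-based analysis.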
This phenomenon has prompted many researchers to investigate and analyse the features of malware. Most studies aimed to introduce better approaches to preventing and detecting Android malware. De Lorenzo et al. (2020) used dynamic analysis with Vizmal to spot and avoid malware. Vizmal is a visualisation tool that traces the execution of Android applications, and it is used to overcome the obfuscation introduced by malware authors. It also assists analysts during malware inspection by helping them localise malicious behaviour. Other studies, such as Yerima et al. (2014) and Yu et al. (2013), applied Bayesian techniques to detect malware, while Magdum (2015) used permission-based features with machine learning to identify malware. Studies such as these, which describe the research activity in this field, are crucial. Alongside this growing body of research, bibliometric studies of malware have also become popular as a way to assess the field's impact. Bibliometrics is the quantitative analysis of articles published in a specific field (Blanco-Mesa et al. 2017; Baker et al. 2019). A bibliometric study analyses the data and features of articles, such as productivity, research area, Web of Science (WoS) categories, authors, highly cited articles, institutions, and impact journals. The bibliometric method is used to evaluate the impact of published articles and to help researchers understand the structure of a research field (Reuters 2008). It reveals active areas of study, thereby increasing the interest and attention of researchers and funding institutions. Bibliometric analysis can also compare the contributions of different countries to the publications in a given field. 
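The Bayesian, permission-based idea behind studies such as Yerima et al. (2014) and Magdum (2015) can be sketched as a Bernoulli naive Bayes classifier over requested Android permissions. This is a minimal illustration, not a reproduction of those authors' methods. The six training apps, their labels, and the function names are hypothetical, and the permission names serve only as feature labels.

```python
import math

# Hypothetical toy training set: each app is a set of requested permissions plus a label.
TRAIN = [
    ({"SEND_SMS", "READ_CONTACTS", "INTERNET"}, "malware"),
    ({"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "INTERNET"}, "malware"),
    ({"READ_SMS", "SEND_SMS"}, "malware"),
    ({"INTERNET", "ACCESS_NETWORK_STATE"}, "benign"),
    ({"CAMERA", "INTERNET"}, "benign"),
    ({"ACCESS_NETWORK_STATE", "VIBRATE"}, "benign"),
]
# Every permission seen in training becomes one binary feature.
FEATURES = sorted(set().union(*(perms for perms, _ in TRAIN)))

def train(data):
    """Estimate class priors and per-permission Bernoulli likelihoods with Laplace smoothing."""
    model = {}
    for label in {lbl for _, lbl in data}:
        rows = [perms for perms, lbl in data if lbl == label]
        prior = len(rows) / len(data)
        likelihood = {f: (sum(f in r for r in rows) + 1) / (len(rows) + 2) for f in FEATURES}
        model[label] = (prior, likelihood)
    return model

def classify(model, perms):
    """Return the label with the highest log-posterior; absent permissions also count as evidence."""
    best_label, best_score = None, -math.inf
    for label, (prior, likelihood) in model.items():
        score = math.log(prior)
        for f in FEATURES:
            p = likelihood[f]
            score += math.log(p if f in perms else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify(model, {"SEND_SMS", "READ_CONTACTS"}))  # malware
print(classify(model, {"INTERNET", "CAMERA"}))         # benign
```

Working in log space avoids underflow when there are many permissions. The Bernoulli variant is used here because a permission is either requested or not.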
Bibliometric studies have been applied in a wide range of fields, including the COVID-19 pandemic (Gautam et al. 2020), the environment (Zhang et al. 2020), agriculture (Luo et al. 2020), sustainable development (Ye et al. 2020), the Chinese Loess Plateau (Zhang and Chen 2020), accounting (Merigó and Yang 2017), economics (Bonilla et al. 2015), linguistic decision making (Yu et al. 2016), and fuzzy research (Merigó et al. 2015). Bibliometric studies offer several advantages: they (a) reveal the importance of research in a related field, (b) reveal the development of research by institution and performance, (c) enable researchers to use related publications in future studies, and (d) improve the knowledge of new researchers. The current study evaluates studies on Android malware published in the WoS from 2010 to 2019. It examines the Android malware research topic, publication pattern, research areas, authors, highly cited articles, impact journals, and institutions. A key strength of this analysis is that the Web of Science offers a broad view of the contributions. In planning the review of Android malware articles in the WoS database, the following steps were followed: (1) identify and analyse Android malware studies in the Web of Science over 10 years (2010–2019); (2) present the findings on Android malware detection in terms of articles, productivity, research areas, Web of Science categories, authors, highly cited articles, institutions, and impact journals; (3) define the research gap, the highlighted questions, and the difficulties encountered in prior studies; and (4) identify the latest trends in Android malware attacks. Organising the study around these steps is intended to deliver a better understanding of Android malware. 
The proliferation of Android malware studies has been analysed to determine trends in malware patterns and the detection procedures used to prevent the spread of malware. Focusing on the past 10 years of publications on malware, specifically Android malware, this bibliometric analysis also sets out the scope and aims of the study and evaluates the challenges in malware trends. The current study used "android malware" as the main keyword to retrieve related publications. This keyword is essential for retrieving current information on the research trend and for revealing the directions that attract research attention. The related publications were searched in the WoS Core Collection database, and the search was limited to the past 10 years (2010–2019). This paper also discusses malware detection systems and the challenges in malware research. In summary, we analysed research publications from seven (7) continents: Asia, Europe, North America, the Middle East, Australia, South America, and Africa. Asia had the highest share of publications among all the continents at 40.5%, followed by Europe with 26.5% and North America with 20.3%. Asia thus led Europe by 14 percentage points, while Europe led North America by 6.2 percentage points. The continent with the smallest contribution to Android malware publications was Africa, at only 0.7%. Table 1 shows the distribution of publications across the seven continents.
Table 1

Publication of 7 continents

| Continent | Publication (%) |
| --- | --- |
| Asia | 40.5 |
| Europe | 26.5 |
| North America | 20.3 |
| Middle East | 8.7 |
| Australia | 2.3 |
| South America | 1.0 |
| Africa | 0.7 |
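The gaps between continents in Table 1 are differences in percentage points, not relative percentages. The Python sketch below makes this distinction explicit using only the Table 1 values. The function names are hypothetical. It also checks that the shares sum to 100%.

```python
# Continent shares (%) as reported in Table 1.
SHARES = {
    "Asia": 40.5, "Europe": 26.5, "North America": 20.3, "Middle East": 8.7,
    "Australia": 2.3, "South America": 1.0, "Africa": 0.7,
}

def gap_points(a: str, b: str) -> float:
    """Absolute difference between two shares, in percentage points."""
    return round(SHARES[a] - SHARES[b], 1)

def relative_gap(a: str, b: str) -> float:
    """How much larger a's share is than b's, as a percentage of b's share."""
    return round(100 * (SHARES[a] - SHARES[b]) / SHARES[b], 1)

print(round(sum(SHARES.values()), 1))         # 100.0
print(gap_points("Asia", "Europe"))           # 14.0 percentage points
print(gap_points("Europe", "North America"))  # 6.2 percentage points
print(relative_gap("Asia", "Europe"))         # 52.8 (% more than Europe)
```

Asia's 14-point lead over Europe therefore corresponds to roughly 53% more publications in relative terms.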
The remainder of this paper is organised as follows. Section 2 discusses the data collection process. Section 3 presents the findings of the study. Section 4 explains the taxonomy of malware detection systems. Section 5 discusses the challenges and emerging trends, and Sect. 6 concludes the paper.

Methodology

Bibliometrics is defined as the statistical analysis of articles, books, and other publications. It is frequently used in library and information science (Library 2020) and is also referred to as scientometrics. Bibliometric analysis forms part of research evaluation methodology, and different kinds of literature tend to have their own methods of bibliometric analysis (Ellegaard and Wallin 2015). According to Razak et al. (2016), bibliometrics is a process to appraise, analyse, and visualise the structure of scientific fields. The bibliometric approach focuses on quantitative measures, such as citation counts. Complementary qualitative indicators, such as funding granted, awards received, peer review, and number of patents, can be considered alongside them (Library 2019). The key concepts of the bibliometric approach are output and impact, measured by publications and citations, respectively. Bibliometric studies therefore help to reveal important trends in a research topic. Bibliometric analysis has been used in various areas of study. Shanker et al. (2020) analysed the academic output of neurosurgeons in the New York metropolitan area. Iwami et al. (2019) examined fields that co-evolved with information technology, while Ospina-Mateus et al. (2019) analysed studies of motorcycle accidents. In financial economics, Baker et al. (2019) used bibliometric analysis to present the productivity and impact of the Review of Financial Economics (RFE). Additionally, Prashar and Sunder (2019) applied bibliometrics to sustainable development, Raparelli and Bajocco (2019) to agricultural vehicles, and Galetsi and Katsaliaki (2019) to information science. 
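The difference between output and impact can be illustrated with a short Python sketch. The author names and citation counts are hypothetical. The point is that ranking by output (number of publications) and ranking by impact (citations received) can produce different leaders.

```python
from collections import defaultdict

# Hypothetical records: one (author, citations received) pair per publication.
RECORDS = [
    ("Author A", 120), ("Author A", 30), ("Author A", 0),
    ("Author B", 200),
    ("Author C", 15), ("Author C", 5),
]

def output_and_impact(records):
    """Per author: output = number of publications; impact = total and mean citations."""
    pubs = defaultdict(int)
    cites = defaultdict(int)
    for author, c in records:
        pubs[author] += 1
        cites[author] += c
    return {
        a: {"output": pubs[a], "citations": cites[a], "mean_citations": cites[a] / pubs[a]}
        for a in pubs
    }

stats = output_and_impact(RECORDS)
for author, s in sorted(stats.items(), key=lambda kv: kv[1]["output"], reverse=True):
    print(author, s)
```

In this toy data, Author A leads on output (3 publications), whereas Author B leads on impact (200 citations from a single publication).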
By comparison, bibliometric studies of malware are only just emerging. This study illustrates how research can be evaluated using the bibliometric method, with the analysis aimed at measuring the impact of the articles. Table 2 lists past studies that applied the same bibliometric approach as the current study, although they differ from it in the keywords used and in their findings.
Table 2

The list of studies of bibliometric methods

| Reference | Field | Year |
| --- | --- | --- |
| Prashar and Sunder (2019) | Sustainability development | 2020 |
| Shukla et al. (2020) | Medical Informatics | 2020 |
| Galetsi and Katsaliaki (2019) | Information Science | 2019 |
| Baker et al. (2019) | Financial economics | 2019 |
| Lu et al. (2019) | Public health | 2019 |
| Ahmad et al. (2019) | Dental traumatology | 2019 |
| Raparelli and Bajocco (2019) | Vehicle agricultural | 2019 |
| Firdaus et al. (2019) | Blockchain | 2019 |
| Razak et al. (2016) | Malware | 2016 |
| This study | Android malware | 2020 |
For this study, the author used the Web of Science database, formerly owned by Thomson Reuters and now managed by Clarivate Analytics. The WoS Core Collection was chosen, and the SciELO Citation Index, KCI-Korean Journal Database, and Russian Science Citation Index were excluded. Only articles written in English were selected. The keywords "malware" and "android malware" were both used so that the number of publications for each could be compared. The keyword "Android malware" targets publications on mobile malware, whereas the keyword "malware" returns general malware literature, including cybercrime, IoT, phishing, and many other malware topics in the WoS. The advantage of the keyword "Android malware" is that the collected articles relate specifically to mobile malware, which yields more focused findings. It was therefore selected as the keyword for this bibliometric study. Because the number of publications in the WoS database changes over time, the data were retrieved twice: first in October 2019 and again in February 2020. In February 2020, there were 1278 articles on Android malware and 5622 articles on malware. At this filtering stage, 97 articles from the SciELO Citation Index, KCI-Korean Journal Database, and Russian Science Citation Index were excluded. The selected 1278 articles were then analysed by title, year of publication, research area, author(s), citations, institution(s), and impact journal. These records included journal articles and book chapters. Using the 1278 selected articles, the relationships among research areas, authors, citations, institutions, and impact journals were analysed. Finally, the open-source software R was used to visualise the results. R was chosen because it supports many bibliometric visualisations and offers extensive analysis features. 
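The selection criteria described above can be expressed as a simple filter. The Python sketch below is illustrative only. The study itself applied these criteria through the WoS interface and used R for visualisation. The record fields (`topic`, `database`, `language`, `year`) and the five sample records are hypothetical and do not reflect the actual WoS export format.

```python
# Databases excluded in this study (names as given in the text).
EXCLUDED_DATABASES = {
    "SciELO Citation Index",
    "KCI-Korean Journal Database",
    "Russian Science Citation Index",
}

# Hypothetical records; field names are illustrative, not the WoS export schema.
RECORDS = [
    {"topic": "Android malware detection using permissions", "database": "WoS Core Collection", "language": "English", "year": 2015},
    {"topic": "Phishing and IoT malware trends", "database": "WoS Core Collection", "language": "English", "year": 2018},
    {"topic": "Android malware in app markets", "database": "KCI-Korean Journal Database", "language": "English", "year": 2016},
    {"topic": "Early Android malware families", "database": "WoS Core Collection", "language": "English", "year": 2009},
    {"topic": "Android malware survey", "database": "WoS Core Collection", "language": "Portuguese", "year": 2017},
]

def select(records, keyword, start=2010, end=2019):
    """Drop excluded databases and non-English records, keep years start..end,
    and keep only records whose topic contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        r for r in records
        if r["database"] not in EXCLUDED_DATABASES
        and r["language"] == "English"
        and start <= r["year"] <= end
        and kw in r["topic"].lower()
    ]

broad = select(RECORDS, "malware")
narrow = select(RECORDS, "android malware")
print(len(broad), len(narrow))  # 2 1
```

As in the study's 5622 versus 1278 split, the narrower keyword returns a subset of what the broader keyword returns.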
Figure 1 clarifies the data collection process.
Fig. 1

Methodology of data collection


Web of Science (WoS)

The Web of Science (WoS) is a platform that offers multiple databases of indexed journal articles. Formerly known as the Web of Knowledge, the WoS was introduced by the Institute for Scientific Information (ISI) and is currently managed by Clarivate Analytics (Iwami et al. 2019). Its indexed coverage starts from 1900, and it covers more than 12,000 impact journals and 148,000 journals and book-based proceedings across 256 disciplines in the sciences, social sciences, and humanities (Webofknowledge 2018). It provides basic search, cited reference search, author search, and advanced search across four databases: the Web of Science Core Collection, the KCI-Korean Journal Database, the Russian Science Citation Index, and the SciELO Citation Index. The WoS also provides citation reports and analyses search results, allowing users to track research activity and journal impact through an appropriate keyword search. This study chose the WoS database because its contents have been evaluated for publication impact, review, influence, and geographical distribution. The WoS serves as a research tool that helps users acquire information and analyse and disseminate knowledge. Its extensive search and analysis capabilities help researchers find indexed journals in their respective areas, and its indexing supports searches across disciplines. Past bibliometric studies, including Baker et al. (2019), Shukla et al. (2020), Yao et al. (2020), and Chen et al. (2019), used the WoS database, which covers the sciences, social sciences, arts, and humanities. Besides the WoS, other databases include ScienceDirect, Elsevier's Scopus, IEEE Xplore, Google Scholar, and Springer.

Findings

This section describes the findings on Android malware studies published between 2010 and 2019. The findings are divided into seven (7) sub-topics: publication year, countries, research areas, authors, institutions, highly cited articles, and impact journals. In total, 1278 articles were published, as presented in Table 3.
Table 3

Publication based on the year

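| Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No. of publications | 1 | 6 | 26 | 60 | 119 | 198 | 241 | 254 | 236 | 137 |
| Publication (%) | 0.1 | 0.5 | 2.0 | 4.7 | 9.3 | 15.5 | 18.8 | 19.9 | 18.5 | 10.7 |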
Table 3 shows that the number of Android malware publications roughly doubled or more every year from 2011 to 2014. Publications peaked in 2017, with 254 articles. This increase can be attributed to the rapid growth of malicious software on Android devices, which seems to have encouraged researchers to examine the components affected by malware, device vulnerabilities, and the impact of malware attacks, as well as methods to prevent and reduce them. Publications on Android malware dropped slightly in 2018 and more sharply in 2019. One likely reason is the time that reviewers and publishers take to accept and publish articles, so some 2019 articles may not yet have been indexed when the data were collected in February 2020. Figure 2 describes the types of publications by year.
Fig. 2

Numbers of publication type based on years

Figure 2 shows that publications increased steadily year by year before dropping slightly in 2018 and more significantly in 2019. Journal publication takes time from acceptance to print, which likely explains the decline in the most recent year. In addition, book chapters were more numerous in 2017 and 2019.
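The year-on-year pattern described above can be checked directly from the Table 3 counts. The Python sketch below uses hypothetical function names. It confirms that the counts sum to 1278 and that 2017 is the peak. It also shows that the 2019 fall (about 42%) is far larger than the 2018 dip (about 7%).

```python
# Annual publication counts from Table 3.
COUNTS = {2010: 1, 2011: 6, 2012: 26, 2013: 60, 2014: 119,
          2015: 198, 2016: 241, 2017: 254, 2018: 236, 2019: 137}

def growth_ratios(counts):
    """Ratio of each year's count to the previous year's, rounded to two decimals."""
    years = sorted(counts)
    return {y: round(counts[y] / counts[prev], 2) for prev, y in zip(years, years[1:])}

def percent_drop(counts, year):
    """Percentage decline from the previous year to the given year."""
    return round(100 * (1 - counts[year] / counts[year - 1]), 1)

ratios = growth_ratios(COUNTS)
print(sum(COUNTS.values()))                      # 1278
print(max(COUNTS, key=COUNTS.get))               # 2017
print(ratios[2012], ratios[2013], ratios[2014])  # 4.33 2.31 1.98
print(percent_drop(COUNTS, 2018), percent_drop(COUNTS, 2019))  # 7.1 41.9
```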

Productivity

Table 4 shows publication output by continent. Examining output growth is essential because malware is a worldwide concern. The articles were analysed by continent to gauge awareness of the malware issue and of malware attacks in each country. Table 4 lists the publications by continent and country from 2010 to 2019.
Table 4

Productivity based on continents

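| Continent/country | Number of articles | % of articles |
| --- | --- | --- |
| South America (total) | 17 | 1.0 |
| Argentina | 1 | 0.1 |
| Brazil | 5 | 0.3 |
| Chile | 2 | 0.1 |
| Colombia | 8 | 0.6 |
| Ecuador | 1 | 0.1 |
| North America (total) | 330 | 20.3 |
| Canada | 48 | 2.9 |
| Mexico | 7 | 0.4 |
| Nicaragua | 4 | 0.2 |
| Russia | 6 | 0.4 |
| United States | 272 | 16.7 |
| Asia (total) | 659 | 40.5 |
| Bangladesh | 5 | 0.3 |
| China | 328 | 20.1 |
| India | 110 | 6.8 |
| Indonesia | 4 | 0.2 |
| Japan | 15 | 0.9 |
| Malaysia | 42 | 2.6 |
| Palestine | 2 | 0.1 |
| Singapore | 33 | 2.0 |
| South Korea | 73 | 4.5 |
| Sri Lanka | 1 | 0.1 |
| Taiwan | 33 | 2.0 |
| Thailand | 3 | 0.2 |
| Vietnam | 10 | 0.6 |
| Europe (total) | 431 | 26.5 |
| Austria | 13 | 0.8 |
| Belgium | 3 | 0.2 |
| Croatia | 2 | 0.1 |
| Cyprus | 3 | 0.2 |
| Czech Republic | 5 | 0.3 |
| Denmark | 7 | 0.4 |
| England | 60 | 3.7 |
| Finland | 10 | 0.6 |
| France | 36 | 2.2 |
| Germany | 44 | 2.7 |
| Greece | 14 | 0.9 |
| Iceland | 1 | 0.1 |
| Italy | 98 | 6.0 |
| Luxembourg | 20 | 1.2 |
| Malta | 1 | 0.1 |
| Myanmar | 1 | 0.1 |
| Netherlands | 5 | 0.3 |
| North Ireland | 13 | 0.8 |
| Norway | 2 | 0.1 |
| Poland | 2 | 0.1 |
| Portugal | 6 | 0.4 |
| Romania | 6 | 0.4 |
| Scotland | 8 | 0.5 |
| Slovakia | 2 | 0.1 |
| Spain | 46 | 2.8 |
| Sweden | 9 | 0.6 |
| Switzerland | 11 | 0.7 |
| Ukraine | 1 | 0.1 |
| Wales | 2 | 0.1 |
| Australia (total) | 37 | 2.3 |
| Australia | 35 | 2.1 |
| New Zealand | 2 | 0.1 |
| Middle East (total) | 142 | 8.7 |
| Algeria | 2 | 0.1 |
| Egypt | 3 | 0.2 |
| Iran | 17 | 1.0 |
| Israel | 7 | 0.4 |
| Jordan | 4 | 0.2 |
| Lebanon | 4 | 0.2 |
| Morocco | 2 | 0.1 |
| Oman | 1 | 0.1 |
| Pakistan | 29 | 1.8 |
| Qatar | 5 | 0.3 |
| Saudi Arabia | 24 | 1.5 |
| Turkey | 38 | 2.3 |
| U Arab Emirates | 6 | 0.4 |
| Africa (total) | 12 | 0.7 |
| Namibia | 1 | 0.1 |
| Nigeria | 3 | 0.2 |
| South Africa | 7 | 0.4 |
| Tunisia | 1 | 0.1 |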
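The continent percentages in Table 4 appear to use the sum of the continent counts (1628) as their denominator rather than the 1278 articles. One plausible reading, which the text does not state, is that an article with co-authors in several countries is counted once per country. The Python sketch below checks this using only the Table 4 totals. The variable names are hypothetical.

```python
# Continent totals from Table 4.
CONTINENT_COUNTS = {
    "South America": 17, "North America": 330, "Asia": 659, "Europe": 431,
    "Australia": 37, "Middle East": 142, "Africa": 12,
}

total = sum(CONTINENT_COUNTS.values())
# Recompute each continent's share using the summed counts as the denominator.
shares = {c: round(100 * n / total, 1) for c, n in CONTINENT_COUNTS.items()}

print(total)                             # 1628
print(shares["Asia"], shares["Europe"])  # 40.5 26.5
print(round(100 * 659 / 1278, 1))        # 51.6, which does not match Table 4
```

Every recomputed share matches the percentage reported in Table 4, which supports the 1628 denominator.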
Following the analysis of publications across continents, Table 5 breaks the data down further by country, continent, and year.
Table 5

Productivity of continent based on year

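| Continent/country | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| South America (total) | 0 | 0 | 0 | 0 | 1 | 5 | 5 | 0 | 3 | 4 |
| Argentina | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Brazil | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 2 | 0 |
| Chile | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Colombia | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 0 | 1 | 2 |
| Ecuador | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| North America (total) | 0 | 1 | 9 | 21 | 33 | 54 | 71 | 64 | 62 | 15 |
| Canada | 0 | 0 | 2 | 0 | 5 | 11 | 7 | 7 | 11 | 5 |
| Mexico | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 | 1 | 0 |
| Nicaragua | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 1 | 0 |
| Russia | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 1 | 0 |
| United States | 0 | 1 | 6 | 21 | 28 | 43 | 58 | 50 | 48 | 10 |
| Asia (total) | 0 | 1 | 8 | 19 | 62 | 97 | 120 | 131 | 139 | 82 |
| Bangladesh | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 0 |
| China | 0 | 1 | 5 | 7 | 25 | 44 | 59 | 69 | 75 | 43 |
| India | 0 | 0 | 0 | 1 | 12 | 18 | 15 | 25 | 25 | 14 |
| Indonesia | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 0 |
| Japan | 0 | 0 | 0 | 1 | 1 | 1 | 3 | 4 | 3 | 2 |
| Malaysia | 0 | 0 | 0 | 3 | 4 | 4 | 6 | 10 | 12 | 3 |
| Palestine | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Singapore | 0 | 0 | 0 | 0 | 1 | 5 | 10 | 7 | 6 | 4 |
| South Korea | 0 | 0 | 0 | 5 | 14 | 14 | 13 | 6 | 11 | 10 |
| Sri Lanka | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Taiwan | 0 | 0 | 2 | 1 | 5 | 7 | 10 | 4 | 2 | 2 |
| Thailand | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| Vietnam | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2 | 3 |
| Europe (total) | 0 | 5 | 10 | 21 | 34 | 63 | 89 | 91 | 77 | 41 |
| Austria | 0 | 0 | 1 | 3 | 2 | 1 | 2 | 1 | 1 | 2 |
| Belgium | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| Croatia | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Cyprus | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 |
| Czech Republic | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| Denmark | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 4 | 0 |
| England | 0 | 0 | 1 | 1 | 4 | 6 | 10 | 16 | 15 | 7 |
| Finland | 0 | 0 | 1 | 0 | 2 | 1 | 2 | 4 | 0 | 0 |
| France | 0 | 0 | 2 | 2 | 5 | 6 | 4 | 8 | 8 | 1 |
| Germany | 0 | 2 | 3 | 4 | 3 | 12 | 8 | 6 | 4 | 2 |
| Greece | 0 | 1 | 0 | 0 | 2 | 6 | 1 | 1 | 3 | 0 |
| Iceland | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Italy | 0 | 1 | 0 | 3 | 7 | 10 | 30 | 19 | 19 | 9 |
| Luxembourg | 0 | 0 | 1 | 0 | 2 | 3 | 5 | 7 | 0 | 2 |
| Malta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Myanmar | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Netherlands | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 1 | 0 |
| North Ireland | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 1 | 3 |
| Norway | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| Poland | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Portugal | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 1 |
| Romania | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 1 | 0 |
| Scotland | 0 | 0 | 0 | 1 | 0 | 1 | 5 | 1 | 0 | 0 |
| Slovakia | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| Spain | 0 | 0 | 1 | 4 | 2 | 5 | 4 | 10 | 12 | 8 |
| Sweden | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 4 | 2 | 0 |
| Switzerland | 0 | 1 | 0 | 0 | 0 | 1 | 5 | 0 | 1 | 3 |
| Ukraine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Wales | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| Australia (total) | 0 | 0 | 0 | 2 | 2 | 5 | 8 | 2 | 8 | 10 |
| Australia | 0 | 0 | 0 | 2 | 2 | 4 | 8 | 2 | 7 | 10 |
| New Zealand | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| Middle East (total) | 1 | 0 | 1 | 3 | 7 | 13 | 26 | 27 | 36 | 28 |
| Algeria | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| Egypt | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 |
| Iran | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 7 | 3 | 2 |
| Israel | 1 | 0 | 1 | 1 | 2 | 1 | 1 | 0 | 0 | 0 |
| Jordan | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
| Lebanon | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| Morocco | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| Oman | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Pakistan | 0 | 0 | 0 | 0 | 0 | 2 | 7 | 2 | 11 | 7 |
| Qatar | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 1 |
| Saudi Arabia | 0 | 0 | 0 | 0 | 2 | 1 | 3 | 6 | 4 | 8 |
| Turkey | 0 | 0 | 0 | 0 | 1 | 4 | 6 | 8 | 12 | 7 |
| U Arab Emirates | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 1 |
| Africa (total) | 0 | 0 | 2 | 0 | 2 | 3 | 0 | 1 | 1 | 3 |
| Namibia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Nigeria | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| South Africa | 0 | 0 | 2 | 0 | 1 | 2 | 0 | 0 | 1 | 1 |
| Tunisia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |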
Table 5 shows that the most productive continents were Asia and Europe. Asia produced 40.5% of the publications and Europe 26.5%, followed by North America with 20.3%. Asia thus led Europe by 14 percentage points, making it the most prolific continent in Android malware research. China alone accounted for 20.1% of the publications, followed by the United States, India, Italy, and South Korea. By comparison, the Middle East, Australia, Africa, and South America contributed less. Research funding plays an important role in scientific output. The United States spent around 500 billion USD on research and development (R&D), while China spent about 400 billion USD (Enago Academy 2018). However, according to Enago Academy (2018), research in the United States stagnated because of economic difficulties, whereas China increased both its R&D funding and its scientific output. This growth was supported by substantial government funding for collaborative ventures in China (International Center 2019). As a result, China overtook the United States in science publishing for the first time (Enago Academy 2018; Dockrill 2018). This helps explain why Asia has become the most prolific continent in Android malware publications.

Research area

The next finding concerns the number of publications in each research area. This measure is important for assessing performance and challenges across fields of study, and the output of related research areas reveals the direction of research. The Android malware publications in the WoS span 27 research areas, as presented in Table 6.
Table 6

Research area of studies

| Research area | Publications | Publication (%) |
| --- | --- | --- |
| Computer Science | 1100 | 86.1 |
| Engineering | 486 | 38.0 |
| Telecommunications | 321 | 25.0 |
| Science Technology Other Topics | 28 | 2.2 |
| Automation Control Systems | 28 | 2.2 |
| Robotics | 14 | 1.2 |
| Mathematics | 10 | 0.8 |
| Physics | 10 | 0.8 |
| Materials Science | 7 | 0.6 |
| Information Science Library Science | 5 | 0.4 |
| Operations Research Management Science | 5 | 0.4 |
| Chemistry | 4 | 0.3 |
| Education Educational Research | 3 | 0.2 |
| Instruments Instrumentation | 3 | 0.2 |
| Acoustics | 2 | 0.2 |
| Energy Fuels | 2 | 0.2 |
| Mechanics | 2 | 0.2 |
| Optics | 2 | 0.2 |
| Business Economics | 2 | 0.2 |
| Fisheries | 1 | 0.1 |
| Health Care Sciences Services | 1 | 0.1 |
| Imaging Science Photographic Technology | 1 | 0.1 |
| Legal Medicine | 1 | 0.1 |
| Mathematical Computational Biology | 1 | 0.1 |
| Medical Informatics | 1 | 0.1 |
| Psychology | 1 | 0.1 |
| Social Sciences Other Topics | 1 | 0.1 |
Table 6 shows that the publications span many related research areas, for instance, Computer Science, Engineering, Telecommunications, Science Technology Other Topics, and Automation Control Systems. Computer Science and Engineering dominated, with 86.1% and 38.0% of the publications, respectively. The large number of Computer Science publications on Android malware reflects the evolution of device technology. Computer Science accounted for 1100 articles, followed by Engineering with 486 articles and Telecommunications with 321 articles. Because the WoS can assign a single article to more than one research area, these percentages sum to more than 100%, and the large overlap suggests that Computer Science and Engineering are closely related. Both contribute to developing new technology for academia and the public. Several topics, for instance, machine learning, security, artificial intelligence, computer architecture, and data processing, relate to both Computer Science and Engineering. The development of new mobile devices draws on expertise in both Computer Science and Telecommunications, hence the link between these two areas. The most cited article, Dissecting Android Malware: Characterisation and Evolution, with 655 citations, is classified under both Computer Science and Engineering in the WoS database, which further illustrates the close connection between the two fields. Both areas frequently produce articles on the same topics. The remaining research areas are listed in Table 6.
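Because the WoS can file one article under several research areas, the counts in Table 6 cannot simply be summed to recover the number of articles. The Python sketch below uses only the Table 6 counts for the three largest areas (variable names are hypothetical). An inclusion–exclusion bound gives the minimum number of the 1278 articles that must be indexed under both Computer Science and Engineering.

```python
TOTAL_ARTICLES = 1278
# Article counts for the three largest research areas in Table 6.
AREA_COUNTS = {"Computer Science": 1100, "Engineering": 486, "Telecommunications": 321}

assigned = sum(AREA_COUNTS.values())
# Inclusion-exclusion: |CS ∩ Eng| >= |CS| + |Eng| - |all articles|
min_cs_and_eng = AREA_COUNTS["Computer Science"] + AREA_COUNTS["Engineering"] - TOTAL_ARTICLES

print(assigned)        # 1907, more than the 1278 articles
print(min_cs_and_eng)  # 308
```

At least 308 articles therefore carry both labels, which is consistent with the close link between the two areas described above.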

Web of Science categories

Table 7 lists the WoS categories, including seven (7) Computer Science sub-categories. Computer Science Theory Methods is the largest of these, followed by Computer Science Information Systems. The table also includes five Engineering categories: Engineering Electrical Electronic, Engineering Multidisciplinary, Engineering Mechanical, Engineering Industrial, and Engineering Aerospace. Among these, Engineering Electrical Electronic contained the most Android malware publications (465), while Engineering Aerospace had the fewest (1).
Table 7

Web of Science categories

| WoS category | Publications | Publications (%) |
| --- | --- | --- |
| Computer Science Theory Methods | 554 | 43.4 |
| Computer Science Information Systems | 499 | 39.0 |
| Engineering Electrical Electronic | 465 | 36.4 |
| Telecommunications | 319 | 25.0 |
| Computer Science Software Engineering | 211 | 16.5 |
| Computer Science Artificial Intelligence | 162 | 12.7 |
| Computer Science Hardware Architecture | 108 | 8.5 |
| Computer Science Interdisciplinary Applications | 102 | 8.0 |
| Automation Control Systems | 28 | 2.1 |
| Multidisciplinary Sciences | 19 | 1.5 |
| Engineering Multidisciplinary | 18 | 1.4 |
| Robotics | 14 | 1.1 |
| Computer Science Cybernetics | 9 | 0.7 |
| Mathematics Applied | 9 | 0.7 |
| Physics Applied | 8 | 0.6 |
| Materials Science Multidisciplinary | 7 | 0.6 |
| Logic | 6 | 0.5 |
| Information Science Library Science | 5 | 0.4 |
| Operations Research Management Science | 5 | 0.4 |
| Engineering Mechanical | 4 | 0.3 |
| Mathematics | 4 | 0.3 |
| Chemistry Multidisciplinary | 3 | 0.2 |
| Instruments Instrumentation | 3 | 0.2 |
| Acoustics | 2 | 0.2 |
| Education Educational Research | 2 | 0.2 |
| Education Scientific Disciplines | 2 | 0.2 |
| Energy Fuels | 2 | 0.2 |
| Engineering Industrial | 2 | 0.2 |
| Green Sustainable Science Technology | 2 | 0.2 |
| Mathematics Interdisciplinary Applications | 2 | 0.2 |
| Mechanics | 2 | 0.2 |
| Optics | 2 | 0.2 |
| Business | 1 | 0.1 |
| Chemistry Analytical | 1 | 0.1 |
| Engineering Aerospace | 1 | 0.1 |
| Ergonomics | 1 | 0.1 |
| Fisheries | 1 | 0.1 |
| Health Care Sciences Services | 1 | 0.1 |
| Imaging Science Photographic Technology | 1 | 0.1 |
| Mathematical Computational Biology | 1 | 0.1 |
| Medical Informatics | 1 | 0.1 |
| Medicine Legal | 1 | 0.1 |
| Nanoscience Nanotechnology | 1 | 0.1 |
| Physics Fluids Plasmas | 1 | 0.1 |
| Physics Mathematical | 1 | 0.1 |
| Physics Multidisciplinary | 1 | 0.1 |
| Psychology Experimental | 1 | 0.1 |
| Psychology Multidisciplinary | 1 | 0.1 |
| Social Sciences Interdisciplinary | 1 | 0.1 |

Author

Author-level findings are significant in this bibliometric study because they highlight the most prolific contributors to Android malware research, which can guide other researchers. Table 8 presents the top 20 most productive authors (21 are listed because several authors are tied at 10 publications), together with each author's number of publications, institution, and country.
Table 8

Authors

| Author | Publications | Publications (%) | Institution | Country |
| --- | --- | --- | --- | --- |
| Francesco Mercaldo | 33 | 2.5 | University of Sannio | Italy |
| Fabio Martinelli | 20 | 1.6 | University of Sannio | Italy |
| Mauro Conti | 19 | 1.5 | Uni of Padua | Italy |
| Corrado Aaron Visaggio | 18 | 1.4 | University of Sannio | Italy |
| Jacques Klein | 17 | 1.4 | Univ Luxembourg | Luxembourg |
| Yang Liu | 16 | 1.3 | Xidian Univ | China |
| Nor Badrul Anuar | 15 | 1.2 | Univ of Malaya | Malaysia |
| Li Li | 15 | 1.2 | Monash Uni | Australia |
| Tegawende F Bissyande | 14 | 1.1 | Univ Luxembourg | Luxembourg |
| Zhenxiang Chen | 13 | 1.0 | Univ of Jinan | China |
| Vijay Laxmi | 13 | 1.0 | Malaviya Natl Inst Technol | India |
| Wei Wang | 13 | 1.0 | Univ of Beijing | China |
| Manoj Singh Gaur | 12 | 0.9 | Malaviya Natl Inst Technol | India |
| Le Traon Yves | 12 | 0.9 | Univ Luxembourg | Luxembourg |
| Li Qi | 12 | 0.9 | Uni Beijing | China |
| Vittoria Nardone | 11 | 0.9 | University of Sannio | Italy |
| Sakir Sezer | 11 | 0.9 | University Belfast | Northern Ireland |
| Vinod P | 11 | 0.9 | Uni of Padua | Italy |
| Shanshan Wang | 10 | 0.8 | Univ of Jinan | China |
| Qiben Yan | 10 | 0.8 | Univ of Nebraska-Lincoln | United States |
| Suleiman Y. Yerima | 10 | 0.8 | University Belfast | Northern Ireland |
The data above show that the most productive authors come from several continents. Europe and Asia were the most notable, producing the most publications on Android malware, with Italy, Luxembourg, Malaysia, China, and India holding the best records. The top three authors were from Europe, specifically Italy. The most prominent author was Francesco Mercaldo, who published 33 articles, followed by Fabio Martinelli with 20 articles and Mauro Conti with 19. Francesco Mercaldo and Fabio Martinelli were from the University of Sannio, whereas Mauro Conti was from the University of Padua. In Asia, Yang Liu, Nor Badrul Anuar, and Vijay Laxmi were the most active contributors: Yang Liu, from China, contributed 16 publications, while Nor Badrul Anuar, from the University of Malaya, Malaysia, contributed 15. The top 20 authors, who were involved in various research areas, were from 16 different institutions.

High cited articles

This section describes citation counts, as illustrated in Table 9, which lists the 25 most cited articles with their number of citations, journal, year, and research area. The three most cited publications were published between 2012 and 2014, that is, five to seven years before the end of the study period. This is consistent with the observation that articles that have been in the database longer accumulate more citations (Razak et al. 2016). The research areas contributing to publications on Android malware include Engineering, Telecommunications, Science & Technology (other topics), Automation & Control Systems, Robotics, Mathematics, and, finally, Computer Science, which has become the dominant field for highly cited articles.
Table 9

Highly cited articles

References | Number of citations | Journal | Year | Research area
Zhou and Jiang (2012) | 655 | 2012 IEEE Symposium on Security and Privacy (SP) | 2012 | Computer Science
Arzt et al. (2014) | 385 | ACM SIGPLAN Notices | 2014 | Computer Science
Asaf Shabtai et al. (2012) | 281 | Journal of Intelligent Information Systems | 2012 | Computer Science
Wu et al. (2012) | 192 | Proceedings of the 2012 Seventh Asia Joint Conference on Information Security (ASIAJCIS 2012) | 2012 | Computer Science
Aafer et al. (2013) | 187 | Security and Privacy in Communication Networks, SECURECOMM 2013 | 2013 | Computer Science
Davi et al. (2011) | 132 | Information Security | 2011 | Computer Science
Zhang et al. (2014) | 117 | CCS'14: Proceedings of the 21st ACM Conference on Computer and Communications Security | 2014 | Computer Science
Faruki et al. (2014) | 111 | IEEE Communications Surveys and Tutorials | 2015 | Computer Science
Gorla et al. (2014) | 102 | 36th International Conference on Software Engineering (ICSE 2014) | 2014 | Computer Science
Feng et al. (2014) | 97 | 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014) | 2014 | Computer Science
Wei et al. (2014) | 96 | CCS'14: Proceedings of the 21st ACM Conference on Computer and Communications Security | 2014 | Computer Science
Peiravian and Zhu (2013) | 91 | 2013 IEEE 25th International Conference on Tools with Artificial Intelligence (ICTAI) | 2013 | Computer Science
Wang et al. (2014) | 82 | IEEE Transactions on Information Forensics and Security | 2014 | Computer Science
Suarez-Tangil et al. (2014) | 81 | Expert Systems with Applications | 2014 | Computer Science
Yerima et al. (2013) | 81 | 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA) | 2013 | Computer Science
Sanz et al. (2013) | 80 | International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions | 2013 | Computer Science
Shabtai et al. (2014a, b) | 70 | Computers & Security | 2014 | Computer Science
Seo et al. (2014a, b) | 69 | Journal of Network and Computer Applications | 2014 | Computer Science
Yuan et al. (2014) | 68 | ACM SIGCOMM Computer Communication Review | 2014 | Computer Science
Tam et al. (2017) | 66 | ACM Computing Surveys | 2017 | Computer Science
Narudin et al. (2016) | 65 | Soft Computing | 2016 | Computer Science
Rastogi et al. (2014) | 65 | IEEE Transactions on Information Forensics and Security | 2014 | Computer Science
Zheng et al. (2013) | 64 | 2013 12th IEEE International Conference on Trust, Security, and Privacy in Computing and Communications (TRUSTCOM 2013) | 2013 | Computer Science
Feizollah et al. (2015) | 63 | Digital Investigation | 2015 | Computer Science
Yuan et al. (2016) | 61 | Tsinghua Science and Technology | 2016 | Computer Science
As noted in Table 9, the most cited article was "Dissecting Android Malware: Characterization and Evolution", which received 655 citations (Zhou and Jiang 2012). The authors were based at North Carolina State University in the United States, and the article was published in the proceedings of the 2012 IEEE Symposium on Security and Privacy. The article characterised the evolution of Android malware using a total of 1260 malware samples from 49 different families, examining their installation, activation, and payload behaviours. It reported that, on this dataset, the best-performing mobile anti-virus product detected 79.6% of the samples and the worst only 20.2%, which showed that better solutions were needed for the next generation of mobile malware detection. The second most cited article was "FlowDroid: Precise Context, Flow, Field, Object-Sensitive and Lifecycle-Aware Taint Analysis for Android Apps", with 385 citations, published in ACM SIGPLAN Notices in 2014 (Arzt et al. 2014). This article presented FlowDroid, a static taint analysis for Android applications, and evaluated it on 500 benign applications from Google Play and 1000 malware samples from the VirusShare project. Researchers studying malware detection can build on both articles, which are the most highly cited and widely acknowledged by new researchers for their findings, methods, and ideas.

Institutions

This section discusses publications by institution, with the aim of ranking institutions by their publication output. Institutions from Asia produced the most Android malware publications. Table 10 lists the top 30 institutions, which come from four regions: Asia, Europe, the Middle East, and North America.
Table 10

Institutions

Institutions | Publications | % Publication | Country
Chinese Academy of Sciences | 47 | 3.7 | China
Beijing University of Posts and Telecommunications | 33 | 2.6 | China
Consiglio Nazionale delle Ricerche (CNR) | 28 | 2.2 | Italy
University of Sannio | 26 | 2.0 | Italy
Institute of Information Engineering (CAS) | 25 | 2.0 | China
Istituto di Informatica e Telematica (IIT-CNR) | 23 | 1.8 | Italy
University of Chinese Academy of Sciences (CAS) | 21 | 1.6 | China
Tsinghua University | 20 | 1.6 | China
University of London | 20 | 1.6 | England
University of Luxembourg | 20 | 1.6 | Luxembourg
University of Padua | 19 | 1.5 | Italy
Pennsylvania Commonwealth System of Higher Education | 19 | 1.5 | United States
Universiti Malaya | 17 | 1.3 | Malaysia
University of California System | 16 | 1.3 | United States
University System of Georgia | 16 | 1.3 | United States
Korea University | 15 | 1.2 | Korea
University of Jinan | 15 | 1.2 | China
Nanyang Technological University | 14 | 1.1 | Singapore
Nanyang Technological University National Institute of Education (NIE) Singapore | 14 | 1.1 | Singapore
Centre National de la Recherche Scientifique | 13 | 1.0 | France
State University System of Florida | 13 | 1.0 | United States
Gazi University | 12 | 0.9 | Turkey
Inria | 12 | 0.9 | France
Malaviya National Institute of Technology Jaipur | 12 | 0.9 | India
Queen's University Belfast | 12 | 0.9 | Northern Ireland
Royal Holloway University of London | 12 | 0.9 | England
University of New Brunswick | 12 | 0.9 | Canada
University of North Carolina | 12 | 0.9 | United States
University of Texas System | 12 | 0.9 | United States
Table 10 presents the most distinguished institutions publishing Android malware articles. The Chinese Academy of Sciences is the leading institution by publications, followed by the Beijing University of Posts and Telecommunications. Institutions from Asia had the greatest number of publications, followed by institutions from Europe and then North America. Other distinguished Asian institutions include the University of Chinese Academy of Sciences, Tsinghua University, Universiti Malaya, Korea University, and the University of Jinan. This study further found that most of the eminent institutions in Asia were located in China, whose publication output surpassed that of other Asian countries, with several institutions among the top 30. Moreover, the publication counts of the institutions differed only by small margins, which suggests that research in this area is well resourced and highly competitive.

Impact journal

This section discusses the impact of journals in the Computer Science field. A journal is a publication comprising articles written by researchers and experts in a specific field of study, solely for academic or technical purposes. The impact journal analysis is a critical part of this study because it identifies the most prominent journals with the most citations. The most influential journals are shown in Table 11, together with the quartile, number of citations, impact factor, and average citations per year.
Table 11

Impact journal of Android malware articles

Journal | Q | C | IF | Year | ACP | References
IEEE Communications Surveys and Tutorials | Q1 | 111 | 22.973 | 2015 | 22.2 | Faruki et al. (2014)
IEEE Transactions on Information Forensics and Security | Q1 | 82 | 6.211 | 2014 | 13.67 | Wang et al. (2014)
Expert Systems with Applications | Q1 | 81 | 4.292 | 2014 | 13.5 | Suarez-Tangil et al. (2014)
Journal of Network and Computer Applications | Q1 | 69 | 5.273 | 2014 | 11.5 | Seo et al. (2014a, b)
ACM Computing Surveys | Q1 | 66 | 6.131 | 2017 | 22 | Tam et al. (2017)
IEEE Transactions on Information Forensics and Security | Q1 | 65 | 6.211 | 2014 | 10.83 | Rastogi et al. (2014)
IEEE Transactions on Industrial Informatics | Q1 | 46 | 7.377 | 2018 | 23 | Li et al. (2018)
Journal of Systems and Software | Q1 | 44 | 2.559 | 2010 | 4.4 | Asaf Shabtai et al. (2010)
Soft Computing | Q2 | 65 | 2.784 | 2016 | 16.25 | Narudin et al. (2016)
Computers & Security | Q2 | 70 | 3.062 | 2014 | 11.67 | Shabtai et al. (2014a, b)
Journal of Intelligent Information Systems | Q3 | 281 | 1.589 | 2012 | 35.13 | Asaf Shabtai et al. (2012)
ACM SIGCOMM Computer Communication Review | Q3 | 68 | 1.74 | 2014 | 11.33 | Yuan et al. (2014)
Digital Investigation | Q3 | 63 | 1.66 | 2015 | 12.6 | Feizollah et al. (2015)
Tsinghua Science and Technology | Q3 | 63 | 1.696 | 2016 | 15.75 | Yuan et al. (2016)
Digital Investigation | Q3 | 56 | 1.66 | 2015 | 11.2 | Talha et al. (2015)
IET Information Security | Q3 | 52 | 0.949 | 2014 | 8.7 | Yerima et al. (2014)
IET Information Security | Q3 | 47 | 0.949 | 2015 | 9.4 | Yerima et al. (2015)
ACM SIGPLAN Notices | Q4 | 385 | 0.335 | 2014 | 64.17 | Arzt et al. (2014)
Information Security | Q4 | 132 | 0.402 | 2011 | 14.67 | Davi et al. (2011)
Computer Security - ESORICS | Q4 | 57 | 0.402 | 2014 | 9.5 | Yang et al. (2014)

Q quartile, C citation, IF impact factor, ACP average citation per year
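The ACP values in Table 11 appear consistent with dividing the citation count C by the number of years from the publication year through 2019 inclusive, e.g. 111 / (2019 - 2015 + 1) = 22.2. This formula is inferred from the table values, not stated by the authors, so the short sketch below simply checks it against several rows.

```python
END_YEAR = 2019  # last year of the study period (2010-2019); assumed reference year


def average_citations_per_year(citations: int, pub_year: int, end_year: int = END_YEAR) -> float:
    """Inferred formula: citations divided by the years from publication through end_year inclusive."""
    years = end_year - pub_year + 1
    return citations / years


# (reference, C, publication year, ACP as printed in Table 11)
rows = [
    ("Faruki et al. (2014)", 111, 2015, 22.2),
    ("Wang et al. (2014)", 82, 2014, 13.67),
    ("Li et al. (2018)", 46, 2018, 23.0),
    ("Asaf Shabtai et al. (2010)", 44, 2010, 4.4),
    ("Arzt et al. (2014)", 385, 2014, 64.17),
]

for ref, c, year, printed in rows:
    computed = average_citations_per_year(c, year)
    print(f"{ref}: computed {computed:.2f}, printed {printed}")
```

Under this reading, the ACP favours recent articles: the 2018 article by Li et al. has fewer citations than most rows but one of the highest ACP values.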

Of the top 20 highest-impact journal articles on Android malware, eight (8) were published in Quartile 1 (Q1) journals. Q1 to Q4 refer to a journal's ranking quartile within its subject category, with Q1 representing the highest impact. The most influential journal in this study was IEEE Communications Surveys and Tutorials; its article "Android Security: A Survey of Issues, Malware Penetration, and Defenses" has been in the WoS for five (5) years and has 111 citations, an average of 22.2 citations per year. The oldest article in the list, published in the Journal of Systems and Software, has been in the WoS for ten years; it has 44 citations, an average of 4.4 citations per year. In addition, two (2) of the articles appeared in Quartile 2 (Q2) journals and seven (7) in Quartile 3 (Q3) journals.
Figure 3 illustrates the relationship between the top 20 authors, 17 countries, and 28 of the most used keywords. As seen in the figure, China is the largest contributor, with 12 authors, followed by Italy, the United States, India, and Luxembourg. There is a large gap between the first contributor, China, and the second, Italy. The keywords most commonly used by the authors were malware, Android, malware detection, and machine learning. Malaysia also contributed, with Android as its most used keyword. Overall, the figure shows that Asia is the most prolific contributor to Android malware research, with studies conducted in China, Malaysia, India, and Singapore.
Fig. 3

Relationship between country, author, and keywords

Figure 4 illustrates the relationship between article titles, authors, and their affiliations. The title terms most frequently used by the authors are Android, malware, and detection, and this applies to all the institutions. Less frequently used title terms include framework, dynamic classification, approach, and techniques. As seen in the figure, Yang Liu from China was the top author and also used the term Android in article titles. The top institution in Fig. 4 is the University of Chinese Academy of Sciences, China. The University of Malaya and Universiti Malaysia Pahang, both in Malaysia, also contributed to Android malware publications.
Fig. 4

Relationship between title with author and affiliation


Malware intrusion detection system (IDS)

This section describes malware intrusion detection systems (IDSs) as a methodology for malware detection. Malware is purposely created to disrupt computers or mobile devices, to gain information, and to spread infection to other devices. Android hosts about 3.5 million applications, and 99% of them have been targeted by malware (Amin et al. 2020). Most antivirus apps provided for Android do nothing to check malware behaviour (Whitwam 2020). Moreover, 21.1 million Android devices have been affected by malicious applications that users downloaded from the Google Play Store (Counterpoint 2019). Such malware indirectly leads users to subscribe to unwanted premium services, causing severe damage to the mobile device (Computer Hope 2019). Malicious applications silently hijack users' account details, subscribe users to premium SMS services, and then compromise the hardware (The App Store Celebrates 10 Years and 2 Million Apps 2018). Mobile devices usually contain a great deal of personal and sensitive data and are often used for online transactions and bill payments (Wazid et al. 2019), so they handle many financial transactions. Because malware conducts all these activities silently, without the user's knowledge, it can cause users financial losses. Several methods have been introduced to help researchers detect and counter malware. Amin et al. (2020) proposed a malware detection method that combines Android permissions with Android Intents (implicit and explicit). Shrivastava and Kumar (2019a) continued the use of intents, focusing on permission and intent modelling. Taheri et al. (2020), on the other hand, developed four detection methods that use the Hamming distance to measure the similarity between benign and malware samples.
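The Hamming-distance idea can be illustrated with a minimal sketch. It assumes each app is encoded as a binary vector of hypothetical permission/intent features (1 = present, 0 = absent) and labels an unknown app with the class of its nearest labelled neighbour. This is a generic 1-nearest-neighbour illustration, not the four methods of Taheri et al. (2020).

```python
def hamming_distance(a: list[int], b: list[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_label(sample: list[int], labelled: list[tuple[list[int], str]]) -> str:
    """Return the label of the labelled vector closest to `sample` (1-nearest neighbour)."""
    _, label = min(labelled, key=lambda item: hamming_distance(sample, item[0]))
    return label


# Hypothetical features, in order: SEND_SMS, READ_CONTACTS, INTERNET, RECEIVE_BOOT_COMPLETED
training = [
    ([1, 1, 1, 1], "malware"),
    ([0, 0, 1, 0], "benign"),
]

unknown_app = [1, 0, 1, 1]
print(hamming_distance(unknown_app, training[0][0]))  # 1
print(nearest_label(unknown_app, training))           # malware
```

A real system would use many more features and labelled samples; the distance computation itself stays the same.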
The studies by Amin et al. (2020), Shrivastava and Kumar (2019a), and Taheri et al. (2020) used static analysis, which is generally regarded as the best method for reducing power and time consumption in malware detection. In contrast, Garg et al. (2020) proposed a multi-stage, anomaly-based (dynamic) model to secure IoT-enabled applications. Both techniques have different roles and advantages. As illustrated in Fig. 5, a malware detection system comprises three (3) parts: the analysis technique, the detection approach, and the deployment approach.
Fig. 5

Taxonomy of the malware detection system


Analysis technique

Analysis techniques determine whether code is malicious by classifying malware features into two types: dynamic analysis and static analysis (Belaoued et al. 2019). Both types are used to detect malware. However, malware authors can use obfuscation, a technique that deliberately makes code difficult to understand, to avoid detection (Or-Meir et al. 2019).
Static analysis investigates code offline (Amin et al. 2020), that is, without running the application (Amin et al. 2020; Statista 2019; Tam et al. 2017; Akour et al. 2017). It uses reverse engineering to extract features for analysis, such as API calls and permissions (Singhal et al. 2019). Static analysis detects malware by comparing the extracted code with the code stored in a database and flagging unfamiliar code as malware. Singhal et al. (2019) and Magdum (2015) detected malware using static analysis. The advantage of static analysis is fast detection, since the applications do not have to be executed (Shrivastava and Kumar 2017). Although static analysis cannot detect obfuscated code, it can reveal and address suspicious files much faster (Shrivastava and Kumar 2019a).
Dynamic analysis observes the behaviour of malicious files while an application is executing (Akour et al. 2017). Unlike static analysis, dynamic analysis can detect unknown malware, new malware, and even obfuscated code (Kuntz et al. 2017; Kim et al. 2019). Applications that static analysis flags as malicious can then be re-analysed by dynamic analysis; this combination is more accurate and reduces costs, because only the flagged applications undergo the more expensive dynamic analysis. Lanet et al. (2018) and Feizollah et al. (2017), among others, used dynamic analysis. One limitation of dynamic analysis is that it cannot identify some malicious applications, such as IMEI stealers (Singhal et al. 2019). Table 12 compares static and dynamic analyses.
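As a rough sketch of the static feature extraction described above, the following Python reads an AndroidManifest.xml that has already been decoded to plain XML (for example with Apktool, listed in Table 12) and turns its requested permissions into a binary feature vector, without executing the app. The feature list and sample manifest are hypothetical; real studies choose their own features and often add API calls.

```python
import xml.etree.ElementTree as ET

# Attribute names such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical feature set, chosen for illustration only.
FEATURE_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.INTERNET",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]


def requested_permissions(manifest_xml: str) -> set[str]:
    """Collect the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    names = set()
    for elem in root.iter("uses-permission"):
        name = elem.get(ANDROID_NS + "name")
        if name is not None:
            names.add(name)
    return names


def permission_vector(manifest_xml: str) -> list[int]:
    """Encode the manifest as 1/0 flags over FEATURE_PERMISSIONS (no code is executed)."""
    requested = requested_permissions(manifest_xml)
    return [1 if perm in requested else 0 for perm in FEATURE_PERMISSIONS]


# Hypothetical decoded manifest.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application android:label="Demo"/>
</manifest>"""

print(permission_vector(SAMPLE_MANIFEST))  # [1, 0, 1, 0]
```

Because only the manifest text is parsed, this kind of extraction is fast, but it sees nothing that obfuscated or dynamically loaded code does at run time.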
Table 12

The comparison between static and dynamic analyses

Analysis technique | Static | Dynamic
Characteristic
Analysis mode | Offline | During execution of the application
Malware analysis | Applies reverse engineering tools such as Apktool; uses API calls to check for malicious code | Analyses behaviour during execution of an application; observes malicious and erroneous program behaviour
Tools used for analysis | DroidRanger, Scandroid, RiskRanker, Stowaway, AdRisk, DNADroid, Kirin | CrowDroid, TaintDroid, ParanoidAndroid, Aurasium, AppFence, DroidScope
Benefit | Fast detection | More accurate results
Limitation | Cannot detect unfamiliar and new malware families | Increases power consumption and cost

Detection approach

Malware detection approaches can be divided into three types: signature-based, anomaly-based, and hybrid (Razak et al. 2016). The signature approach detects malware by matching events against signatures of normal and abnormal patterns stored in a database (Seo et al. 2014a, b). The anomaly approach, in comparison, recognises malicious behaviour by monitoring network traffic and system events (Suárez-Tangil et al. 2018); because it observes behaviour, it can detect new and unfamiliar malware. The signature approach, by contrast, cannot detect new or unfamiliar malware whose signature is not in the database, so the database must be updated frequently to detect the full range of malware. Table 13 compares the signature and anomaly approaches.
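The simplest form of signature matching can be sketched as a lookup of a file's cryptographic hash in a database of known-malware hashes. This is an illustrative assumption: production signature engines usually match byte patterns or rules rather than only whole-file hashes. The example also shows the limitation noted above, since a file altered by a single byte no longer matches.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known malware samples.
# Placeholder byte strings stand in for real APK files.
KNOWN_MALWARE_SAMPLES = [b"placeholder-trojan-payload", b"placeholder-sms-sender"]
SIGNATURE_DB = {hashlib.sha256(sample).hexdigest() for sample in KNOWN_MALWARE_SAMPLES}


def signature_match(file_bytes: bytes, signature_db: set[str]) -> bool:
    """Return True only if the file's exact SHA-256 digest is already in the database."""
    return hashlib.sha256(file_bytes).hexdigest() in signature_db


print(signature_match(b"placeholder-trojan-payload", SIGNATURE_DB))   # True
print(signature_match(b"placeholder-trojan-payload!", SIGNATURE_DB))  # False: one extra byte evades the lookup
print(signature_match(b"some-benign-app", SIGNATURE_DB))              # False
```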
Table 13

The comparison between signature and anomaly approaches

Detection approach | Advantages | Disadvantages
Signature | High detection rate and accuracy for known attacks; simple and effective at detecting known malware; lower false alarm rate | Detects only code whose signature is in the database; the database must be updated frequently to detect new malware
Anomaly | Able to adapt to and detect new, unique, and abnormal malware; less dependent on an existing database | Higher false alarm rate if not properly configured before deployment
Another approach is the hybrid approach, which combines the anomaly and signature approaches. The combination enables the detection of new malware whenever the signature approach fails to detect it, overcoming the deficiencies of both approaches. Seo et al. (2014a, b) and Yu et al. (2013) used the anomaly approach to detect malware. Table 14 lists studies using the signature approach, Table 15 studies using the anomaly approach, and Table 16 studies using the hybrid approach.
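A hybrid detector can be sketched as a signature lookup followed by an anomaly check when no signature matches. In this minimal sketch, the behavioural feature (SMS messages sent per hour), the benign baseline values, and the z-score threshold of 3.0 are all hypothetical choices for illustration, not taken from the studies in Table 16.

```python
import hashlib
import statistics


def anomaly_score(observation: float, baseline: list[float]) -> float:
    """Distance of an observation from the benign baseline, in standard deviations (z-score)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation; needs at least 2 values
    if stdev == 0:
        return 0.0 if observation == mean else float("inf")
    return abs(observation - mean) / stdev


def hybrid_detect(
    file_bytes: bytes,
    sms_per_hour: float,
    signature_db: set[str],
    benign_sms_rates: list[float],
    threshold: float = 3.0,  # hypothetical cut-off
) -> str:
    # Stage 1: signature approach, an exact match against known malware.
    if hashlib.sha256(file_bytes).hexdigest() in signature_db:
        return "malware (signature match)"
    # Stage 2: anomaly approach, flagging behaviour far from the benign baseline.
    if anomaly_score(sms_per_hour, benign_sms_rates) > threshold:
        return "suspicious (anomaly)"
    return "benign"


# Hypothetical data.
signature_db = {hashlib.sha256(b"known-malware").hexdigest()}
benign_sms_rates = [0, 1, 2, 1, 0, 2, 1, 1]

print(hybrid_detect(b"known-malware", 0, signature_db, benign_sms_rates))  # malware (signature match)
print(hybrid_detect(b"new-app", 40, signature_db, benign_sms_rates))       # suspicious (anomaly)
print(hybrid_detect(b"other-app", 2, signature_db, benign_sms_rates))      # benign
```

Running the cheap signature check first keeps the known-malware case fast and precise, while the anomaly stage catches new samples the database does not yet contain, at the cost of possible false alarms.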
Table 14

Signature approach

References | Aim | Classifier | Performance
Almin and Chatterjee (2015) | To propose an Android application analyzer (AAA) to identify malicious applications installed on the phone | K-Means and Naïve Bayes | More accurate than antivirus software
Sheen et al. (2015) | To design scalable mechanisms using multi-feature collaborative decision fusion (MCDF) | Naïve Bayes, J48, SVM, and IBk | TPR = 97%, precision = 83%
Sharma and Gupta (2019) | To propose a machine learning method for privacy risk analysis of Android applications | Bayesian network | Accuracy = 95.5%
Zhu et al. (2018) | To propose DroidDet, a low-cost and highly efficient detector | Rotation forest and SVM | Accuracy = 88.3%
Martín et al. (2019) | To analyse malware using machine learning classifiers | Graph-community algorithm and hierarchical clustering | Accuracy = 84%

SVM support vector machine

Table 15

Anomaly approach

References | Objective | Algorithm | Result
An et al. (2018) | To create robust malware detection to secure home routers | SVM | TPR = 99.8%
Yu et al. (2013) | To analyse Android application behaviour using machine learning (dynamic) | Naïve Bayes with chi-square | Accuracy = 80.4%
Lanet et al. (2018) | To compare the performance of different detection approaches using different features | SVM, HMM, J48, and RF | HMM = 90.64%; SVM = 97.33%; J48 = 97%; RF = 97.33%
Tahir et al. (2019) | To define and propose a new method for recognising abnormal behaviour in networks | LOC and SVM | TPR = 94.8%
Hu et al. (2018) | To combine network traffic analysis and data mining to classify malicious network behaviour | SVM, KNN, and LOF | Accuracy = 81.8%

SVM support vector machine, HMM hidden Markov model, RF random forest, LOF local outlier factor, KNN K-nearest neighbors

Table 16

Hybrid approach

References | Objective | Algorithm | Result
Rehman et al. (2018) | To detect malware in Android applications using signature and anomaly approaches | KNN, J48, SVM, decision tree | Accuracy = 99.8%
Ali (2019) | To present a genetic algorithm (GA) and particle swarm optimization (PSO) to solve the optimization problem in SVM | GA and PSO | TPR = 96%
Venkatraman et al. (2019) | To examine a hybrid image-based method with deep learning architectures for effective malware classification | SVM | Accuracy = 98.6%
Adebayo and Aziz (2019) | To improve the malicious-app detection rate on Android using the PSO algorithm | PSO | Accuracy = 98.2%
Huda et al. (2018) | To introduce a hybrid structure to classify features of a large-time routine of malware behaviour | SVM | Accuracy = 97.7%

SVM support vector machine, LOF local outlier factor, KNN K-nearest neighbors, GA genetic algorithm, PSO particle swarm optimization


Deployment approach

The deployment approach determines where malware is detected within an intrusion detection system (IDS). Like a firewall, an IDS is a security tool used to recognise intrusions (Feizollah et al. 2013). An IDS, whether hardware, software, or a combination of both, monitors activity and detects malware signals in the network or system. Anomaly-based and signature-based detection are the two types of IDS (Daimi 2017). A malware IDS is deployed as a host-based, network-based, or hybrid system. In a host-based IDS (HIDS), activity is monitored, analysed, and processed on the host itself, whereas in a network-based IDS (NIDS), detection is run by a remote server (Mas’ud et al. 2014a, b). A hybrid system combines HIDS and NIDS to increase the capabilities of existing IDSs (Potteti and Parati 2015). Table 17 presents the deployment approaches used in previous studies.
Table 17

Deployment and detection approach studies

References | Deployment approach | Detection approach | Year
Guanghui (2020) | NIDS | Anomaly | 2020
Yang et al. (2019) | NIDS | Anomaly | 2019
Niazi and Faheem (2019) | NIDS | Anomaly | 2019
Liang et al. (2019) | NIDS | Anomaly | 2019
Besharati et al. (2018) | HIDS | Signature | 2019
Jose et al. (2018) | HIDS | Anomaly | 2018
Deshpande et al. (2018) | HIDS | Anomaly | 2018
Subba et al. (2017) | HIDS | Anomaly | 2017
Haider et al. (2016) | HIDS | Anomaly | 2016
Moon et al. (2016) | HIDS | Anomaly | 2016
Koucham et al. (2015) | HIDS | Anomaly | 2015

Mobile malware

The popularity of mobile devices has spurred the emergence of malware. Most malware targets Android to spread malicious code because it is the most widely used mobile operating system. As mentioned before, malware targets mobile activities by stealing sensitive user data, for example by encrypting users' banking data, deleting crucial data, and altering and monitoring users' activities without their knowledge (Qamar et al. 2019; Arabo and Pranggono 2013). Malware can also interrupt the operation of devices by consuming their resources, such as storage, processor, and network (Shrivastava and Kumar 2019b; Cyber Secur. Parallel Distrib. Comput. 2019). Malware authors are creative in spreading malware, infecting devices and networks insidiously. To better understand malware threats, this section reviews studies of mobile malware extracted from the WoS database and published from 2010 to 2019. Table 18 lists the various types of malware that are harmful to mobile devices, together with their characteristics. These types of malware threaten devices in different ways in order to damage the mobile system.
Table 18

Malware and the characteristics

Types of malware | Characteristic
Virus | A virus spreads by executing itself and infecting files and programs
Worm | A worm replicates and sends itself through the network without affecting the operating system
Trojan horse | A Trojan disguises itself as a trustworthy program to lure a user into running it, and distributes the virus when the program runs
Botnet | A botnet spreads itself through the network and allows an attacker to control the infected computers
Spyware | Spyware collects user information and data and observes user activities without the user's knowledge
Rootkits | A rootkit targets the root level of the system
Adware | Adware displays unwanted advertisements as pop-ups or banners, based on the user's browsing history
The table indicates how each malware type can attack mobile devices through different methods. Infected mobile devices may then be exposed to fake emails, unnecessary software updates, fake websites, and counterfeit applications. Because malware operates silently, devices can be infected without the user's knowledge, and users may only notice its presence when the device is badly damaged or in a critical condition. Future studies should attempt to combine multiple detection methods to reduce such incidents on mobile devices. Figure 6 maps malicious malware types to their behaviours.
Fig. 6

The mapping of malicious malware types and their behaviours


Risk analysis

Risk analysis is a process used to identify the loss, the threat, and the level of risk (Alali et al. 2018). The level of risk is measured based on the impact of a mobile attack. As mobile device functions grow rapidly, with developers competing to bring new designs to the marketplace, mobile users face higher risks (Naga Malleswari et al. 2017). Risk analysis therefore follows several procedures, such as categorising the risk, identifying its triggers and effects, re-evaluating the likelihood of the risk, and finding factors to mitigate it (Sharma and Gupta 2018b; B 2018). There are three levels of risk: low, medium, and high (Shrivastava and Kumar 2019c). Likewise, there are three main security properties: confidentiality, availability, and integrity (B 2018). Table 19 describes the risk levels used in risk analysis.
Table 19

Risk level

Risk level | Description
High | The risk is unacceptable and must be reduced before the application is run. Data are exposed to leakage, unsecured Wi-Fi, spyware, and phishing attacks
Medium | The risk is acceptable and under protection, but each threat must be monitored continuously to keep the risk at a normal level
Low | The risk is acceptable and the application can be used. The mobile device provides the relevant protection, but threats must still be observed to detect any changes that would increase the risk level
A risk level reflects the unacceptable effect, or impact, of an uncertain event. Risk levels are evaluated based on two factors: impact and likelihood. However, a weakness of this method is that it does not incorporate the capabilities of the threat when determining the risk level, even though the threat depends on the vulnerabilities of the system. Risk analysis therefore helps users manage the risk factors of a specific event.
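The impact-likelihood evaluation described above is often implemented as a simple risk matrix. The sketch below assumes both factors are rated on a hypothetical 1 to 3 scale and maps their product to the three levels in Table 19; the thresholds are illustrative assumptions, not values from the cited studies.

```python
def risk_level(impact: int, likelihood: int) -> str:
    """Map impact x likelihood (each rated 1-3 on a hypothetical scale) to a Table 19 level."""
    if not (1 <= impact <= 3 and 1 <= likelihood <= 3):
        raise ValueError("impact and likelihood must each be between 1 and 3")
    score = impact * likelihood  # ranges from 1 to 9
    if score >= 6:  # hypothetical threshold
        return "High"
    if score >= 3:  # hypothetical threshold
        return "Medium"
    return "Low"


# Print the full 3x3 matrix: rows = impact (high to low), columns = likelihood (low to high).
for impact in (3, 2, 1):
    print(impact, [risk_level(impact, likelihood) for likelihood in (1, 2, 3)])
# 3 ['Medium', 'High', 'High']
# 2 ['Low', 'Medium', 'High']
# 1 ['Low', 'Low', 'Medium']
```

As the paragraph above notes, such a matrix says nothing about the attacker's capabilities; it only combines the two rated factors.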

Threats

Risk arises as a consequence of the threats facing mobile devices. Mobile threats are divided into four (4) classes: application-based, web-based, network-based, and physical threats (Lookout 2019). Table 20 describes each threat.
Table 20

Threats and their descriptions

| Threat | Description |
|---|---|
| Application based | The threat comes from applications downloaded from the app store. A fraudulent application looks legitimate but exploits the device once downloaded; vulnerabilities in the device enable this exploitation |
| Web based | Internet connectivity makes it easy for the threat to arrive when users browse websites that contain malware |
| Network based | Attackers often provide open Wi-Fi networks to obtain confidential information from users |
| Physical based | Portable devices are easily lost or stolen. The value of the devices, together with the data stored on them, tempts unscrupulous people to obtain them physically |

Evaluation measure

The most common evaluation measure used by researchers in malware IDS is the effectiveness of the detection system. This evaluation focuses on accuracy, true positive rate (TPR), false positive rate (FPR), true negative rate (TNR), false negative rate (FNR), F-measure, and recall. A true positive (TP) is a malware sample correctly detected as malware; the more true positives, the better the outcome. A false negative (FN) is a malware sample erroneously classified as benign. A true negative (TN) is a benign sample correctly classified as benign, while a false positive (FP) is a benign sample erroneously classified as malware (Kamesh and Sakthi Priya 2012).
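The measures listed above follow directly from the four counts TP, FP, TN, and FN. As a minimal Python sketch (the `Confusion` class, the `evaluate` function, and the example counts are hypothetical), the rates can be computed as below. Note that recall and TPR are the same quantity, and that the F-measure additionally needs precision, TP / (TP + FP).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Counts obtained by running a malware detector on labelled samples."""
    tp: int  # malware correctly detected as malware
    fp: int  # benign sample wrongly flagged as malware
    tn: int  # benign sample correctly classified as benign
    fn: int  # malware wrongly classified as benign


def _ratio(num: float, den: float) -> float:
    """num / den, returning 0.0 instead of raising when den is zero."""
    return num / den if den else 0.0


def evaluate(c: Confusion) -> Dict[str, float]:
    """Standard detection measures derived from the confusion counts."""
    tpr = _ratio(c.tp, c.tp + c.fn)  # true positive rate, identical to recall
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "tpr": tpr,
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "tnr": _ratio(c.tn, c.tn + c.fp),
        "fnr": _ratio(c.fn, c.fn + c.tp),
        "precision": precision,
        "f_measure": _ratio(2 * precision * tpr, precision + tpr),
    }


if __name__ == "__main__":
    # Hypothetical result on 100 malware and 100 benign apps.
    print(evaluate(Confusion(tp=90, fp=5, tn=95, fn=10)))
```

TPR + FNR and TNR + FPR each sum to 1, so reporting one rate of each pair already determines the other.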

Challenges and future direction

This section discusses the challenges and future research directions related to mobile malware. A number of studies have emphasised that malware poses a threat to mobile devices, which makes malware detection a challenge for many researchers. Although numerous methods have been proposed in previous studies, and various systems have been developed to detect malware automatically, the number of malicious files, malicious websites, and malware samples continues to grow (Akour et al. 2017). More work is therefore needed in this research field.

Accuracy

The accuracy of malware detection is measured using TP, FP, TN, and FN. A detection is called true if it is accurate and matches reality. Perfect detection corresponds to TPR = 100%, TNR = 100%, FPR = 0%, and FNR = 0%. In practice, such perfect detection is impossible to achieve (Akour et al. 2017). However, with a larger amount of data, the analysis may come close to accurate positive and negative measurements. False positives and false negatives are also known as false alarms: a legitimate programme is incorrectly identified as malicious, or a malicious programme as legitimate. False alarms are a major challenge in IDS. A false positive frustrates users, and frustrates developers when the programme they created is blocked; this can damage the reputation of their business, since no one will run a programme once it is flagged as malicious. Conversely, a missed detection can make the device dangerous when a suspicious programme runs on the user's device. This remains a significant problem in current technology. Wang et al. (2018) used a hybrid approach to analyse malware data, and their results showed a lower false alarm rate.
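To illustrate why false alarms are hard to eliminate, assume the detector outputs a maliciousness score between 0 and 1 and flags a sample once the score reaches a threshold. This assumption, the `false_alarm_rates` function, and the scores below are hypothetical; the sketch only shows how moving the threshold trades false positives against false negatives.

```python
from typing import List, Tuple


def false_alarm_rates(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when samples scoring >= threshold are flagged as malware.

    labels: 1 = malware, 0 = benign. Both classes must be present.
    """
    negatives = labels.count(0)
    positives = labels.count(1)
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one benign and one malicious sample")
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp / negatives, fn / positives


if __name__ == "__main__":
    # Hypothetical detector scores for three benign and three malicious apps.
    scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
    labels = [0, 0, 1, 1, 0, 1]
    for t in (0.3, 0.5, 0.85):
        fpr, fnr = false_alarm_rates(scores, labels, t)
        print(f"threshold={t:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

With these scores, a threshold of 0.3 misses no malware but flags two of the three benign apps, whereas 0.85 flags no benign app but misses two of the three malicious ones.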

Features

Features are selected first, before the malware is analysed and detected. Selecting the best features makes the detector more efficient (Aung and Zaw 2013), whereas inappropriate features may cause a high false alarm rate (Razak et al. 2016). The number of features can also be reduced to achieve higher accuracy. Feature selection is the first and most crucial step in the machine learning method (Feizollah et al. 2015), and selecting appropriate features leads to higher accuracy and fewer false alarms. Conversely, feeding a massive number of inappropriate features into a machine learning classifier causes drawbacks such as misleading the learning algorithm, increasing the model’s running time, and reducing generality (Mas’ud et al. 2014a, b). A very large feature set also increases storage usage and management complexity, which makes it unsuitable for mobile devices with limited storage and restricted power. Selecting appropriate features during data pre-processing enables machine learning classifiers to detect malware more efficiently. Thus, reducing the number of features is necessary to preserve accuracy.
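Feature selection can be illustrated with a simple filter method. The Python sketch below ranks binary features (here, hypothetical Android permission flags) by information gain with respect to the malware/benign label and keeps the top k. Information gain is only one common criterion chosen for illustration; the cited studies may use other selection methods, and the data and function names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = list(labels).count(cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy obtained by splitting on a binary feature."""
    with_f = [y for x, y in zip(column, labels) if x]
    without_f = [y for x, y in zip(column, labels) if not x]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def select_top_k(samples: List[Dict[str, int]], labels: List[int], k: int) -> List[str]:
    """Rank every feature seen in samples by information gain and keep the top k."""
    features = sorted({f for s in samples for f in s})
    scores = {f: information_gain([s.get(f, 0) for s in samples], labels) for f in features}
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical permission vectors: 1 = the app requests the permission.
    apps = [
        {"SEND_SMS": 1, "INTERNET": 1, "READ_CONTACTS": 1},  # malware
        {"SEND_SMS": 1, "INTERNET": 1},                      # malware
        {"INTERNET": 1, "CAMERA": 1},                        # benign
        {"INTERNET": 1},                                     # benign
    ]
    y = [1, 1, 0, 0]
    print(select_top_k(apps, y, k=2))
```

In this toy data, a permission requested by every app (INTERNET) carries no information and is ranked last, which is exactly the kind of feature a reduction step discards.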

Dataset

Android malware attacks on users have increased rapidly in recent years. Android malware applies sophisticated techniques such as metamorphism, polymorphism, oligomorphism, obfuscation, and modification to avoid detection. The detection mechanisms provided by mobile devices cannot operate efficiently because of restricted datasets and a limited understanding of malicious activities. A dataset is required to evaluate any proposed detection system, and a limited set of malware samples can make the detection system unreliable. Razak et al. (2016) found more than 100,000 malware variants belonging to 777 families. Zhou and Jiang (2012) and Arzt et al. (2014) used malware samples from the Virus Share project and benign samples from Google Play, and noted that malware datasets have been proliferating. Based on this, a restriction mechanism is needed. Moreover, outdated datasets have become inappropriate for analysis, so research also requires the latest datasets in order to improve detection accuracy and lower the false alarm rate.
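One way to act on the concern about outdated data is to filter and split a sample collection by date. The Python sketch below assumes each sample carries a hash, a label, and a hypothetical `first_seen` year; it drops outdated samples and duplicate hashes, reports class balance, and reserves the newest samples for testing. The field names, cut-off years, and functions are illustrative assumptions rather than a procedure from the cited studies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    sha256: str      # file hash used to drop duplicate submissions
    label: int       # 1 = malware, 0 = benign
    first_seen: int  # year the sample was first observed (hypothetical metadata)


def build_dataset(samples: List[Sample], min_year: int) -> List[Sample]:
    """Drop outdated samples (first seen before min_year) and duplicate hashes."""
    seen = set()
    kept = []
    for s in samples:
        if s.first_seen < min_year or s.sha256 in seen:
            continue
        seen.add(s.sha256)
        kept.append(s)
    return kept


def label_counts(samples: List[Sample]) -> Dict[int, int]:
    """Number of malware (1) and benign (0) samples, to check class balance."""
    return dict(Counter(s.label for s in samples))


def temporal_split(samples: List[Sample], test_from_year: int) -> Tuple[List[Sample], List[Sample]]:
    """Train on older samples and test on the newest ones."""
    train = [s for s in samples if s.first_seen < test_from_year]
    test = [s for s in samples if s.first_seen >= test_from_year]
    return train, test


if __name__ == "__main__":
    corpus = [Sample("a", 1, 2012), Sample("b", 1, 2018), Sample("b", 1, 2018),
              Sample("c", 0, 2019), Sample("d", 0, 2017)]
    kept = build_dataset(corpus, min_year=2015)
    train, test = temporal_split(kept, test_from_year=2019)
    print(label_counts(kept), len(train), len(test))
```

Testing on samples newer than anything in the training set gives a rough indication of how a detector copes with malware it has not yet seen.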

Risk assessment

Risk assessment is a fundamental method for explaining the likelihood of each risk level. It is a crucial part of shielding users against dangerous applications, as it allows mobile users to reduce the impact of a threat to a tolerable level. Risk assessment measures the impact of a threat based on the value of the assets, the threats, the vulnerabilities, and the effects resulting from an attack. The identified risks from threats and weaknesses must then be ranked according to their criticality. Conducting a risk assessment is important because awareness of its effect on risk decision-making is low. Naga Malleswari et al. (2017) improved users’ awareness by presenting the privacy risk to users before they grant permissions. Similarly, Alali et al. (2018) proposed a Fuzzy Inference System (FIS) that uses four (4) risk factors: threat, vulnerability, impact, and likelihood. These factors were used to classify the risk impact and to provide a response that mitigates the risk. Razak et al. (2019) also presented risk factors based on zoning approaches.
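Ranking risks by criticality can be sketched with the common convention of scoring each risk as likelihood multiplied by impact. In the Python example below, the 1 to 5 ratings for the threat classes of Table 20 and the `rank_risks` function are hypothetical; this crisp scoring is a simplification and not the fuzzy inference model of Alali et al. (2018).

```python
from typing import Dict, List, Tuple


def rank_risks(risks: Dict[str, Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Score each risk as likelihood * impact and sort from most to least critical."""
    scored = [(name, likelihood * impact) for name, (likelihood, impact) in risks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings (likelihood, impact) for the threat classes of Table 20.
    ratings = {
        "application based": (4, 4),
        "web based": (3, 3),
        "network based": (3, 4),
        "physical based": (2, 5),
    }
    for name, score in rank_risks(ratings):
        print(f"{name:18s} {score}")
```

The highest-scoring entries are the ones a mitigation plan would address first.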

Android malware on the Internet of Things (IoT)

The IoT is the modern technology of communication among things and objects (Wu et al. 2019). It integrates widely with mobile devices to deliver various services around the world, and mobile devices monitor and control these services over long distances using keyless mechanisms. For example, Macmanus (2012) described a location feature in Audi’s new business car. The volume of data produced every day by different IoT devices has grown from terabytes to petabytes (Garg et al. 2020). IoT services also provide more convenient experiences, such as remote monitoring to reduce energy waste from home appliances like air conditioners, televisions, and refrigerators. With the rapid growth of technology, more and more IoT services are controlled by Android mobile applications. Behind this sophisticated technology, the security of IoT services has worsened, especially with malware attacks, and IoT threats have increased over the past few years. Attackers are able to slip into mobile user devices and gain control of the IoT, which allows them to acquire and aggregate information from mobile devices such as personal data, contact numbers, location, and Internet banking payment data. The open-source nature of Android applications is one of the factors behind the increase of malware in IoT services. Studies showed that eight new malware families that emerged in 2015 mostly originated from China and the United States (Johnson 2016). Some protections have been introduced to counter malware attacks. However, protecting IoT systems remains a tough problem because it is difficult to develop an effective detection system and to prevent information leakage. Park et al. (2019) proposed introducing three levels of awareness into the IoT system: define the threat, measure the risk, and optimise the risk. Wu et al. (2019) detected malware using a Bayesian network grounded on traffic feature analysis, and their results showed higher accuracy with a smaller set of substantial features. Another study (Ham et al. 2014) used a linear support vector machine (SVM) to detect malware and secure a reliable IoT service, while Garg et al. (2020) used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for the same purpose.
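Of the IoT detection techniques mentioned, DBSCAN is compact enough to sketch in full. The minimal Python implementation below clusters hypothetical two-dimensional traffic feature vectors (for instance, a normalised packet rate and mean packet size, both assumed) and treats points labelled as noise as candidates for suspicious traffic. It is a generic DBSCAN, not the pipeline of Garg et al. (2020), and the `eps` and `min_pts` values are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1
Point = Tuple[float, ...]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per point, or NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: expand from it
                seeds.extend(j_neighbours)
    return [lab if lab is not None else NOISE for lab in labels]


if __name__ == "__main__":
    # Hypothetical normalised traffic features per device: (packet rate, mean packet size).
    traffic = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),
               (5.0, 5.0), (5.1, 5.1), (4.9, 5.0),
               (9.0, 0.5)]
    labels = dbscan(traffic, eps=0.5, min_pts=3)
    suspicious = [p for p, lab in zip(traffic, labels) if lab == NOISE]
    print(labels, suspicious)
```

Because DBSCAN needs no predefined number of clusters and explicitly labels low-density points as noise, it suits settings where unusual traffic should stand apart from dense groups of normal behaviour.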

Mobile banking

The widespread use of mobile phones and the global growth of networks have encouraged people to move their businesses online (Sharma and Gupta 2016), which has expanded the number of mobile banking users. Despite its advantages, mobile banking also invites the proliferation of malware. Banking malware deserves particular attention because it is one of the most dangerous threats to mobile users: for example, malicious code may be written to steal personal financial information from banking activities and to transfer funds to the attacker’s accounts. Mobile banking malware previously spread through third-party apps and has recently infiltrated Google Play widely (Mobliciti 2020). A new version of mobile banking malware found on Google Play impersonates legitimate cryptocurrency wallets to steal money (Seals 2020; Whittaker 2019). This malware will grow more sophisticated as cryptocurrency trading becomes widespread.

Fake applications for COVID-19 pandemic

COVID-19 has threatened the world since 2019, and during this period malware has grown rapidly: ransomware grew by 72% and mobile vulnerabilities by 50% (Security 2020). This increase is attributed to the lockdowns affecting most of the world’s population, a situation that malware developers busily exploited for profit. It also gave attackers an opportunity to craft applications that lure users. Besides fake applications that claimed to track the spread of the coronavirus, malicious apps were created that purported to give advice on avoiding infection. Users showed great interest in any COVID-19-related application that promised to help them stay healthy (Moran 2020). Banking is a particularly susceptible sector during the pandemic, since users turn to online shopping during lockdowns, and banking trojans and information stealers became rampant as unemployment rose (Ljubas 2020). Thus, this sector accounted for the greatest share of malicious activity during the COVID-19 pandemic.

Conclusion

The popularity of computers and mobile devices has led to the emergence of new malware. According to TMS (2011), malware increased by 54% in 2017 compared with previous years, with a total of 24,000 malicious files detected each day. Spring (2019) estimated that one in five computers would be attacked by at least one piece of malware in 2019. The Internet is one of the most frequent channels through which malware spreads to users’ devices. To alleviate malware problems and improve the safety of mobile devices, various studies have introduced several approaches. The current study used the bibliometric technique to analyse Android malware trends from 2010 until 2019. Several findings were highlighted, including productivity, research areas, authors, highly cited articles, institutions, and impact journals. These criteria highlight the research trends related to Android malware production. Android malware production increased at an average rate of 2.1% per year. Dobran (2019) reported that ransomware attacked a new organisation every 14 s in 2019 and projected that this would happen every 11 s in 2021. The bibliometric analysis of Android malware from 2010 until 2019 showed that Asia was the highest contributor of research publications, followed by Europe and North America, while the Middle East, Australia, Africa, and South America contributed less. China produced the most Android malware articles, with 25% of publications, followed by the United States, India, Italy, and South Korea. Asia thus outperformed Europe by a difference of 17.6% of publications. In addition, this study highlighted the top 20 most active authors in this research area. The top author was Francesco Mercaldo, followed by Fabio Martinelli, Mauro Conti, and Corrado Aaron Visaggio. All four top authors are from Italy, in Europe.
The first, second, and fourth authors were from the University of Sannio, while the third was from the University of Padua. The subsequent authors were from Luxembourg, Malaysia, China, and India. This study has presented a bibliometric analysis of publications in the field of Android malware. The analysis provides an objective and quantitative measure of the influence that a publication has on its specialty. The information presented in this study can help researchers build research networks in their field. It is hoped that this information will encourage more future research to counter the rapid proliferation of malware. Finally, this study provides a general overview of the subject and highlights the importance of expanding research on Android malware.
  9 in total

Review 1.  Big data analytics in health: an overview and bibliometric study of research activity.

Authors:  Panagiota Galetsi; Korina Katsaliaki
Journal:  Health Info Libr J       Date:  2019-12-30

2.  A bibliometric analysis of the top 50 most cited articles published in the Dental Traumatology.

Authors:  Paras Ahmad; Paul Vincent Abbott; Mohammad Khursheed Alam; Jawaad Ahmed Asif
Journal:  Dent Traumatol       Date:  2019-12-19       Impact factor: 3.333

Review 3.  Half a century of computer methods and programs in biomedicine: A bibliometric analysis from 1970 to 2017.

Authors:  Nagesh Shukla; José M Merigó; Thorsten Lammers; Luis Miranda
Journal:  Comput Methods Programs Biomed       Date:  2019-09-10       Impact factor: 5.428

Review 4.  Global trends and prospects in microplastics research: A bibliometric analysis.

Authors:  Ying Zhang; Shengyan Pu; Xue Lv; Ya Gao; Long Ge
Journal:  J Hazard Mater       Date:  2020-06-10       Impact factor: 10.588

5.  The bibliometric analysis of scholarly production: How great is the impact?

Authors:  Ole Ellegaard; Johan A Wallin
Journal:  Scientometrics       Date:  2015-07-28       Impact factor: 3.238

6.  Security Risk Measurement for Information Leakage in IoT-Based Smart Homes from a Situational Awareness Perspective.

Authors:  Mookyu Park; Haengrok Oh; Kyungho Lee
Journal:  Sensors (Basel)       Date:  2019-05-09       Impact factor: 3.576

7.  Trends in Shared Decision-Making Studies From 2009 to 2018: A Bibliometric Analysis.

Authors:  Cuncun Lu; Xiuxia Li; Kehu Yang
Journal:  Front Public Health       Date:  2019-12-18

8.  A Bibliometric Analysis of the Development of ICD-11 in Medical Informatics.

Authors:  Donghua Chen; Runtong Zhang; Hongmei Zhao; Jiayi Feng
Journal:  J Healthc Eng       Date:  2019-12-25       Impact factor: 2.682

Review 9.  Publication Trends of Research on Sepsis and Host Immune Response during 1999-2019: A 20-year Bibliometric Analysis.

Authors:  Ren-Qi Yao; Chao Ren; Jun-Nan Wang; Guo-Sheng Wu; Xiao-Mei Zhu; Zhao-Fan Xia; Yong-Ming Yao
Journal:  Int J Biol Sci       Date:  2020-01-01       Impact factor: 6.580

