Rajesh Kumar¹, Siddharth Sharma¹, Chirag Vachhani¹, Nitish Yadav¹.
Abstract
This paper examines the transition in the cyber-security discipline induced by the ongoing COVID-19 pandemic. Using classical information retrieval techniques, more than twenty thousand documents are analyzed for cyber-security content. In particular, we build topic models using the Latent Dirichlet Allocation (LDA) unsupervised machine learning algorithm. The literature corpus is built through a uniform keyword search on scholarly and non-scholarly platforms, restricted to the years 2010-2021. To qualitatively assess the impact of the COVID-19 pandemic on cyber-security and to perform a trend analysis of key themes, we organize the entire corpus into categories (and combinations of categories) based on time period and on whether the literature has undergone peer review. We identify the key themes from the weighted distribution of keywords in the aggregated corpus. In the pre-COVID-19 period, topics such as cyber-threats to technology, privacy policy, and blockchain are popular; in the post-COVID-19 period, the focus shifts to challenges brought directly or indirectly by the pandemic. In particular, we observe post-COVID-19 cyber-security themes such as privacy in healthcare, cyber insurance, and cyber risks in the supply chain gaining recognition. A few cyber-topics, such as malware and control system security, remain important throughout. We believe our work captures the evolving nature of the cyber-security discipline and reaffirms the need to tailor appropriate interventions by tracking the key trends.
Keywords: COVID-19 pandemic; Cyber-security trends; Latent Dirichlet Allocation; Topic modeling; Trend analysis; Unsupervised machine learning
Year: 2022 PMID: 35813991 PMCID: PMC9254575 DOI: 10.1016/j.cose.2022.102821
Source DB: PubMed Journal: Comput Secur ISSN: 0167-4048 Impact factor: 5.105
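The abstract describes organizing the corpus by time period and by peer-review status. The Python sketch below shows one way such a partition could be done; it is not the authors' code. The `Doc` record, its field names, and the `partition` helper are hypothetical, documents outside 2010-2021 are simply dropped, and 2020 is treated as the first post-COVID-19 year, following the figure captions.

```python
from collections import defaultdict
from dataclasses import dataclass

COVID_START = 2020      # the figure captions mark 2020 as the start of the pandemic
WINDOW = (2010, 2021)   # publication years kept in the corpus

@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; field names are illustrative only."""
    doc_id: str
    year: int
    peer_reviewed: bool
    text: str

def partition(docs):
    """Group documents into (period, review-status) corpora.

    Each in-window document lands in exactly one pre-/post-COVID corpus and
    also in the all-time corpus for its review status (PR or NPR).
    """
    lo, hi = WINDOW
    corpora = defaultdict(list)
    for d in docs:
        if not lo <= d.year <= hi:
            continue  # outside the 2010-2021 search window
        status = "PR" if d.peer_reviewed else "NPR"
        period = "post-COVID" if d.year >= COVID_START else "pre-COVID"
        corpora[(period, status)].append(d)
        corpora[("all-time", status)].append(d)
    return dict(corpora)

docs = [
    Doc("a", 2015, True, "..."),
    Doc("b", 2021, False, "..."),
    Doc("c", 2020, True, "..."),
    Doc("d", 2008, True, "..."),  # dropped: published before 2010
]
corpora = partition(docs)
for key, members in sorted(corpora.items()):
    print(key, [d.doc_id for d in members])
```

Keeping the all-time corpora as separate lists (rather than recomputing them by concatenation) mirrors the three corpus groupings used in the tables that follow.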
Inferred topics for the all-time non-peer-reviewed corpus from the LDA-generated topic distribution.
| Topic | Topic Distributions | Label |
|---|---|---|
| T0 | network, time, business, privacy, company, year, attack, technology, secure, healthcare | Network privacy and security in healthcare |
| T1 | story, bank, group, fraud, site, money, attack, week, case, botnet | Financial fraud |
| T2 | software, malware, system, vulnerability, code, today, version, computer, flaw, program | Software system vulnerabilities |
| T3 | card, credit, breach, company, payment, customer, encryption, fraud, source, investigation | Credit card breaches |
| T4 | service, number, account, email, access, phone, address, site, name, password, customer | Customer credentials |
| T5 | device, surveillance, location, iPhone, skimmer, machine, cash, camera, wireless, reader | Device skimmers |
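Each row above lists the highest-weighted words of one LDA topic. To illustrate how such rows can be derived, here is a minimal collapsed Gibbs sampler for LDA in pure Python. It is a sketch under stated assumptions, not the authors' pipeline: this section does not say which LDA implementation or inference method was used, and the function names, hyperparameter defaults (`alpha`, `beta`, `iters`), and toy documents are all hypothetical.

```python
import random

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, phi, ndk), where phi[j][i] is
    p(word i | topic j) and ndk[d][j] counts tokens of document d in topic j.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]      # document-topic counts
    nkw = [[0] * v for _ in range(k)]  # topic-word counts
    nk = [0] * k                       # tokens assigned to each topic
    z = []                             # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1

    topics = range(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                i, t = index[w], z[d][n]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][i] -= 1
                nk[t] -= 1
                # Unnormalised full conditional p(z = j | all other assignments).
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][i] + beta) / (nk[j] + v * beta)
                    for j in topics
                ]
                t = rng.choices(topics, weights=weights)[0]
                z[d][n] = t
                ndk[d][t] += 1
                nkw[t][i] += 1
                nk[t] += 1

    # Smoothed topic-word distributions (each row sums to 1).
    phi = [[(nkw[j][i] + beta) / (nk[j] + v * beta) for i in range(v)] for j in topics]
    return vocab, phi, ndk

def top_words(vocab, phi, n=10):
    """Return the n highest-probability words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in phi
    ]

docs = [
    ["card", "credit", "payment", "fraud"],
    ["credit", "card", "breach", "payment"],
    ["malware", "code", "flaw", "software"],
    ["software", "vulnerability", "code", "malware"],
]
vocab, phi, ndk = lda_gibbs(docs, k=2, iters=100)
print(top_words(vocab, phi, n=4))
```

The human-readable labels in the last column (e.g. "Credit card breaches") are not produced by LDA itself; they are assigned by inspecting the top words.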
Topic distributions for the peer-reviewed corpus. Here PR refers to peer-reviewed literature taken from the Scopus academic database.
| Corpus Name | Topic | Topic Distributions | Label |
|---|---|---|---|
| Pre-COVID (2010-2019) | T2 | detection, classify, malware, accuracy, feature, detect, anomaly_detection, module, memory, computation | Malware detection |
| | T7 | system, analysis, approach, model, framework, attack, process, risk, network, problem | Cyber-risk management |
| | T9 | software, vulnerability, defense, test, code, damage, cyberattack, functionality, malware, execution | Software vulnerability analysis |
| | T3 | system, control, power, communication, access, secure, energy, architecture, operation, authentication | Control system security |
| | T4 | network, traffic, technique, dataset, intrusion_detection, graph, ransomware, intrusion, honeypot, password | Network intrusion detection |
| | T5 | game, influence, prediction, deterrence, theory, exercise, deception, uncertainty, file, aviation | Attack models based on game-theory |
| | T6 | blockchain, insurance, learning, workshop, cryptocurrency, employment, shortage, literacy, index, spectrum | Blockchain and cryptocurrency |
| | T8 | design, development, knowledge, field, engineering, computer, education, training, project, program | Cybersecurity engineering |
| Post-COVID (2020-2021) (PR) | T1 | risk, management, safety, business, solution, value, organization, adoption, technology, supply_chain | Cyber-risk management |
| | T1 | privacy, health, healthcare, science, crisis, care, innovation, cryptography, period, staff | Privacy in healthcare |
| | T2 | investment, wireless, channel, disinformation, transaction, banking, aviation, author, radiation, bank | Digital banking |
| | T4 | awareness, program, course, web, computer_science, exposure, online, university, coverage, book | Security awareness |
| | T10 | network, model, detection, attack, performance, method, approach, problem, traffic, intrusion | Network intrusion detection |
| All-time (2010-2021) | T1 | technology, development, analysis, threat, framework, approach, work, industry, level, environment | Cyber-governance |
| | T1 | software, safety, engineering, education, design, program, project, vehicle, hardware, control | Control systems security |
| | T6 | network, detection, model, attack, performance, method, system, intrusion, machine, time | Intrusion detection models |
| | T7 | malware, cloud, service, evidence, event, time, group, framework, series, contribution | Malware detection |
| | T10 | risk, management, business, assessment, authentication, resilience, trust, compliance, organization, supply_chain | Cyber risk management |
| | T10 | malware, health, healthcare, ransomware, fraud, smart_city, consumer, payment, breach, device | Cyber-security in healthcare |
| | T10 | risk, management, business, assessment, authentication, resilience, trust, compliance, organization, supply_chain | Supply chain cyber-risk |
Labelled topics for the non-peer-reviewed corpus from the LDA-generated topic distributions. Here NPR refers to non-peer-reviewed literature taken from four cybersecurity blogs (see Section 3).
| Corpus Name | Topic | Topic Distributions | Label |
|---|---|---|---|
| Pre-COVID (2010-2019) | T0 | firm, service, account, phone, address, program, password, vulnerability, protection, order | Common security vulnerabilities |
| | T2 | government, part, customer, group, news, case, name, page, fact, encryption | Fake news |
| | T3 | time, access, privacy, business, malware, email, today, industry, consumer, code | Malware in business processes |
| | T4 | network, technology, bank, banking, cloud, process, approach, tech, community, cybercrime | Cybercrime in banking |
| | T6 | breach, data, report, problem, issue, web, law, database, company, gang | Data breach reports |
| | T7 | company, software, card, site, credit, system, control, fraud, management, payment | Credit card frauds |
| | T8 | use, story, money, tool, organization, hacker, patch, domain, ability, victim | Hacker’s victims’ story |
| Post-COVID (2020-2021) (NPR) | T0 | time, attack, site, email, use, activity, discussion, bank, organization, encryption | Side channel attacks |
| | T1 | network, today, group, customer, someone, credit, authentication, something, hack, digital_transformation | Network security |
| | T2 | time, attack, site, email, insider, activity, discussion, bank, organization, encryption | Social engineering attack |
| | T2 | company, software, service, malware, system, computer, code, password, day, file | Software vulnerability analysis |
| | T3 | malware, fraud, work, cloud, identity, compromise, investment, campaign, services, member | Malware in cloud services |
| | T4 | technology, ransomware, reading, week, traffic, process, government, accompanying_podcast, analysis, infrastructure | Ransomware |
| All-time (2010-2021) | T0 | network, time, business, privacy, company, year, attack, technology, report, government | Network Privacy |
| | T1 | story, bank, group, fraud, site, money, attack, week, case, botnet | Financial fraud |
| | T3 | card, credit, breach, company, payment, customer, encryption, fraud, source, investigation | Credit card breaches |
| | T4 | service, number, account, email, access, phone, address, site, name, password, customer | Customer credentials |
| | T5 | device, surveillance, location, iPhone, skimmer, machine, cash, camera, wireless, reader | Device skimmers |
| | T5 | software, malware, system, vulnerability, code, today, version, computer, flaw, program | Software system vulnerability |
Fig. 2. Framework to generate topic distributions for each corpus.
Fig. 3. Coherence score vs. hyperparameters, used to determine the optimal values for the all-time non-peer-reviewed corpus.
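Fig. 3 selects hyperparameters by coherence score. The specific coherence measure is not stated in this section, so the sketch below uses the UMass measure as a stand-in: for a topic's top words ordered by decreasing weight, it sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs with j < i, where D counts the documents containing the given words. The helper names `umass_coherence`, `mean_coherence`, and `best_setting`, the `(k, alpha)` setting keys, and the toy data are hypothetical.

```python
import math

def umass_coherence(words, docs):
    """UMass coherence of one topic's top words (ordered by decreasing weight).

    Assumes every word occurs in at least one document, which holds when the
    top words come from a model trained on the same documents.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*ws):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in ws))

    return sum(
        math.log((doc_freq(words[i], words[j]) + 1) / doc_freq(words[j]))
        for i in range(1, len(words))
        for j in range(i)
    )

def mean_coherence(topics, docs):
    """Average coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

def best_setting(candidates, docs):
    """Pick the hyperparameter setting whose topics are most coherent on average.

    candidates maps a setting, e.g. (num_topics, alpha), to that model's
    top-word lists.
    """
    return max(candidates, key=lambda s: mean_coherence(candidates[s], docs))

docs = [["card", "credit", "fraud"], ["card", "credit"],
        ["malware", "code"], ["malware", "code", "flaw"]]
candidates = {
    (2, 0.1): [["card", "credit"], ["malware", "code"]],
    (3, 0.1): [["card", "malware"], ["credit", "code"]],
}
print(best_setting(candidates, docs))  # → (2, 0.1)
```

Higher UMass scores indicate top words that co-occur more often in the same documents; in a real sweep each candidate would come from retraining the topic model with that setting.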
Cyber-security themes in the peer-reviewed literature corpus. The text marked in red represents the themes common to both the pre-COVID-19 and the post-COVID-19 period.
| Category | Trends |
|---|---|
| Pre-COVID-19 cyber-security themes | “Malware detection”, “Control system security”, “Intrusion detection system”, “Software vulnerability analysis”, “Attack models based on game-theory”, “Blockchain and cryptocurrency”, “Cyber risk management” |
| Post-COVID-19 cyber-security themes | “Cyber insurance”, “Cyber-security in healthcare”, “Control system security”, “Intrusion detection system”, “Cyber-resilience in supply chain”, “Malware detection”, “Cyber risk management”, “Security awareness” |
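A set comparison of the two rows above is one simple way to separate persistent, emerging, and faded themes. The sketch below is illustrative only: `compare_themes` is a hypothetical helper, and it uses exact string matching on the theme names as written, so near-synonyms (e.g. "Cyber risk management" vs. "Cyber-risk management" in the topic tables) would not be matched.

```python
def compare_themes(pre, post):
    """Split two theme sets into persistent, emerging and faded themes."""
    return {
        "persistent": sorted(pre & post),  # present in both periods
        "emerging": sorted(post - pre),    # only post-COVID-19
        "faded": sorted(pre - post),       # only pre-COVID-19
    }

# Theme names copied from the peer-reviewed table above.
pre = {
    "Malware detection", "Control system security", "Intrusion detection system",
    "Software vulnerability analysis", "Attack models based on game-theory",
    "Blockchain and cryptocurrency", "Cyber risk management",
}
post = {
    "Cyber insurance", "Cyber-security in healthcare", "Control system security",
    "Intrusion detection system", "Cyber-resilience in supply chain",
    "Malware detection", "Cyber risk management", "Security awareness",
}

result = compare_themes(pre, post)
print(result["persistent"])
# → ['Control system security', 'Cyber risk management', 'Intrusion detection system', 'Malware detection']
```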
Cyber-themes in the non-peer-reviewed corpus. The red text represents themes common to both the pre-COVID-19 and the post-COVID-19 eras.
| Category | Trends |
|---|---|
| Pre-COVID-19 cyber-security themes | “Privacy Protection”, “Fake news”, “Malware in business process”, “Cyber-crime in banking”, “Credit card frauds” |
| Post-COVID-19 cyber-security themes | “Social-engineering”, “Side channel attacks”, “Software vulnerability analysis”, “Malware in cloud services” |
Fig. 4. Number of documents vs. time for each cyber-theme in the peer-reviewed corpus. The year 2020 marks the beginning of the COVID-19 pandemic (shown by a vertical dotted red line).
Fig. 5. Number of documents vs. year for each cyber-theme in the non-peer-reviewed corpus. The year 2020 marks the beginning of the COVID-19 pandemic (shown by a vertical dotted red line).
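Figs. 4 and 5 plot document counts per theme per year. One plausible way to produce such counts, assumed here rather than taken from the paper, is to assign each document to its dominant LDA topic and tally by publication year. The helper names, document-topic vectors, and topic-to-label mapping below are hypothetical.

```python
from collections import Counter

def dominant_topic(theta):
    """Index of the highest-probability topic in a document-topic vector."""
    return max(range(len(theta)), key=lambda j: theta[j])

def counts_per_year(docs, labels):
    """Count documents per (theme label, year).

    docs: iterable of (year, theta) pairs; labels: topic index -> theme label.
    """
    counts = Counter()
    for year, theta in docs:
        counts[(labels[dominant_topic(theta)], year)] += 1
    return counts

def series(counts, label, years):
    """Per-year counts for one theme, with zeros for missing years."""
    return [counts.get((label, y), 0) for y in years]

labels = {0: "Malware detection", 1: "Privacy in healthcare"}
docs = [(2019, [0.7, 0.3]), (2020, [0.2, 0.8]), (2020, [0.9, 0.1]), (2021, [0.1, 0.9])]
counts = counts_per_year(docs, labels)
years = range(2019, 2022)
for label in labels.values():
    print(label, series(counts, label, years))
```

Each resulting series can be drawn as one line against the year axis, with a marker at 2020 for the start of the pandemic.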