Jennifer Sleeman, Tim Finin, Milton Halem.
Abstract
Cybersecurity threats continue to increase and are impacting almost all aspects of modern life. Being aware of how vulnerabilities and their exploits are changing gives helpful insights into combating new threats. Applying dynamic topic modeling to a time-stamped cybersecurity document collection shows how the significance and details of concepts found in them are evolving. We correlate two different temporal corpora, one with reports about specific exploits and the other with research-oriented papers on cybersecurity vulnerabilities and threats. We represent the documents, concepts, and dynamic topic modeling data in a semantic knowledge graph to support integration, inference, and discovery. A critical insight into discovering knowledge through topic modeling is seeding the knowledge graph with domain concepts to guide the modeling process. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using those phrases improves the quality of the topic models. Researchers can query the resulting knowledge graph to reveal important relations and trends. This work is novel because it uses topics as a bridge to relate documents across corpora over time.
Keywords: cybersecurity; cyberthreat information; dynamic topic modeling; knowledge graph; topic modeling
Year: 2021 PMID: 34268490 PMCID: PMC8275653 DOI: 10.3389/fdata.2021.601529
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
FIGURE 1. Comparing word counts and phrase counts for black carbon across the books of the Intergovernmental Panel on Climate Change (IPCC) Assessment Report 3 (Sleeman et al., 2018).
FIGURE 2. Knowledge graph construction of the dynamic topic modeling process.
Examples of Wikipedia concept terms used.
| Wikipedia concept term |
|---|
| Cryptanalysis |
| Cryptographic protocol |
| Cryptographic software |
| Cryptography |
| Cryptosystem and cryptovirology |
| Cyber-insurance |
| Cyber-security regulation |
| Cyber security standards |
| Cyber self-defense |
| Cyberattack and cybercrime |
| Cyberspace |
| Cyberterrorism |
| Cyberwarfare |
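Concept terms like those listed above can seed phrase extraction before topic modeling. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes a greedy longest-match over lowercase tokens, and the function name `tag_concepts` and the underscore-joining convention are illustrative choices.

```python
import re

# A few concept terms from the table above, lowercased for matching.
CONCEPTS = {
    "cryptanalysis",
    "cryptographic protocol",
    "cryptographic software",
    "cryptography",
    "cyberspace",
    "cyberwarfare",
}

def tag_concepts(text, concepts=CONCEPTS):
    """Greedy longest-match: emit known multi-word concepts as single tokens."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    max_len = max(len(c.split()) for c in concepts)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concepts:
                out.append(phrase.replace(" ", "_"))
                i += n
                break
        else:
            # No concept starts here; keep the plain token.
            out.append(tokens[i])
            i += 1
    return out

print(tag_concepts("A new cryptographic protocol resists cryptanalysis."))
```

Joining a matched phrase into one token lets a downstream bag-of-words model treat "cryptographic protocol" as a single vocabulary item rather than two unrelated words.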
Examples of acronyms generated for cybersecurity-related concepts.
| Concept | Acronym |
|---|---|
| Australian information security association | aisa |
| Advanced encryption standard | aes |
| Denial of service | dos |
| Department of homeland security | dhs |
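The record does not say how these acronyms were generated. As a hedged illustration only, the sketch below builds candidate initialisms both with and without common stopwords, since the examples above need both variants ("dos" keeps "of", while "dhs" drops it). The stopword list and the function name `acronym_candidates` are assumptions.

```python
# Assumed stopword list; the source does not specify one.
STOPWORDS = {"of", "and", "the", "for", "in"}

def acronym_candidates(phrase, stopwords=STOPWORDS):
    """Return the initialisms of a phrase, with and without stopwords."""
    words = phrase.lower().split()
    with_stop = "".join(w[0] for w in words)
    without_stop = "".join(w[0] for w in words if w not in stopwords)
    return {with_stop, without_stop}

for phrase in ["Advanced encryption standard", "Denial of service",
               "Department of homeland security"]:
    print(phrase, "->", sorted(acronym_candidates(phrase)))
```

Generating both variants and checking each against the corpus is one simple way to avoid committing to a single stopword rule.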
FIGURE 3. The three components of our topic model ontology (TMO): (A) classes for representing document collections (top left), (B) classes to encode document and topic relationships (middle right), and (C) classes to represent cross-domain mappings (bottom left).
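To make the idea of topics bridging two collections concrete, here is a small sketch using plain Python tuples as triples. Every identifier in it (the `doc:` and `topic:` names and the `inCollection` and `hasTopic` predicates) is hypothetical and is not taken from the TMO; it only shows the shape of a cross-collection query.

```python
# Hypothetical triples; node and predicate names are illustrative only
# and do not come from the topic model ontology itself.
TRIPLES = [
    ("doc:symantec-001", "inCollection", "collection:symantec"),
    ("doc:arxiv-001", "inCollection", "collection:arxiv"),
    ("doc:arxiv-002", "inCollection", "collection:arxiv"),
    ("doc:symantec-001", "hasTopic", "topic:malware"),
    ("doc:arxiv-001", "hasTopic", "topic:malware"),
    ("doc:arxiv-002", "hasTopic", "topic:cryptography"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def collections_for_topic(triples, topic):
    """Find the collections whose documents are linked to a topic."""
    docs = {s for s, _, _ in match(triples, p="hasTopic", o=topic)}
    return {o for s, _, o in match(triples, p="inCollection") if s in docs}

print(sorted(collections_for_topic(TRIPLES, "topic:malware")))
```

A real deployment would more likely store such data in an RDF store and query it with SPARQL; the in-memory list here just keeps the example self-contained.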
FIGURE 4. Distribution by year of Symantec malware reports from 2000 to 2017 (top) and of Cryptography and Security arXiv articles from 1997 to 2016 (bottom).
FIGURE 5. arXiv train/test split heat maps.
Two examples of the top ten terms for a topic from the arXiv dynamic topic model (DTM), with and without concept phrases. Adding the phrases makes the topics easier for a person to interpret.
| Year | With concept phrases | Without concept phrases |
|---|---|---|
| 2000 | Quantum cryptography, phase, photon, cryptography, measurement, channel, system, eavesdropping, stage, and polarization | Quantum, state, communication, phase, cryptography, channel, eavesdropping, protocol, error, and polarization |
| 2000 | Intrusion detection, universal, taxonomy, intrusion detection system, based, payload, classification, input, attack, and alert | Cell, network, intrusion, parameter, system, information, detection, method, space, and approach |
Comparing the concept-based, bag-of-words, and automatic phrase extraction methods for a classification task using three different algorithms on the Cryptography and Security data set.
| Classification method | Concept-based | Bag-of-words | Automatic phrase extraction |
|---|---|---|---|
| SVM | 0.60 | 0.59 | 0.58 |
| Naive Bayes | 0.36 | 0.34 | 0.28 |
| Logistic regression | 0.59 | 0.59 | 0.56 |
FIGURE 6. Comparing topic coherence between the bag-of-words model and the concept-based model using the c_v measure.
FIGURE 7. 60-, 80-, and 100-topic Symantec Malware reports of the DTM concept: Malware.
The top ten most relevant terms from two topics in the 100-topic dynamic topic model of the Symantec Malware reports. These topics correspond to the two trends in the 100-topic panel of Figure 7 that spike early and then drop off.
| Year | Top terms |
|---|---|
| 2006 | Remove, protection, threat, antivirus software, packed, file, malware, Symantec, security, and window |
| 2007 | Remove, protection, threat, antivirus software, packed, file, malware, Symantec, security, and window |
| 2008 | Protection, remove, threat, malware, packed, file, Symantec, antivirus software, security, window, protection, packed, threat, file, Symantec, remove, malware, window, antivirus software, and security |

| Year | Top terms |
|---|---|
| 2008 | Privacy, commander, doctor, malware, picture, movie, action, video, multi, and surveillance |
| 2009 | Malware, doctor, privacy, commander, action, android, intent, picture, movie, and video |
| 2010 | Malware, action, doctor, privacy, android, intent, commander, Bluetooth, picture, and provider |
| 2011 | Action, android, intent, privacy, malware, doctor, Bluetooth, commander, picture, and WiFi |
100-topic Symantec Malware report of the dynamic topic model topic 86.
| Year | Top terms |
|---|---|
| 2007 | Malicious, component, info, scanner, attacker, rootkit, door, remote, malware, and computer |
| 2008 | Info, malicious, malware, scanner, attacker, rootkit, component, door, computer, and remote |
| 2009 | Malicious, info, attacker, malware, door, scanner, rootkit, remote, computer, and component |
| 2010 | Malicious, info, attacker, malware, door, computer, remote, scanner, rootkit, and based |
| 2011 | Info, malicious, attacker, malware, computer, door, remote, reputation, dropped, and based |
| 2012 | Malicious, info, attacker, malware, computer, door, remote, reputation, threat, and dropper |
| 2013 | Malicious, info, malware, attacker, computer, remote, door, dropper, reputation, and back |
| 2014 | Malicious, malware, info, attacker, computer, dropper, remote, door, reputation, and back |
| 2015 | Malware, malicious, info, dropper, attacker, computer, remote, payload, dropped, and door |
| 2016 | Malware, malicious, info, dropper, attacker, computer, remote, payload, dropped, and door |
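One simple way to quantify how much a topic's vocabulary drifts is the Jaccard overlap between the top-term sets of two years. The sketch below applies it to three rows of the table above; it is an illustrative measure chosen here, not one the source reports using.

```python
# Top terms for topic 86, copied from the 2007, 2008, and 2016 rows above.
TOPIC_86 = {
    2007: "malicious component info scanner attacker rootkit door remote malware computer",
    2008: "info malicious malware scanner attacker rootkit component door computer remote",
    2016: "malware malicious info dropper attacker computer remote payload dropped door",
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two term lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

terms = {year: row.split() for year, row in TOPIC_86.items()}

# 2007 and 2008 share the same ten terms in a different order.
print(round(jaccard(terms[2007], terms[2008]), 2))
# By 2016, three terms have been replaced (7 shared out of 13 distinct).
print(round(jaccard(terms[2007], terms[2016]), 2))
```

Jaccard ignores term order, so it captures vocabulary turnover but not re-ranking within a stable vocabulary.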
100-topic Symantec Malware report of the dynamic topic model topic 86.
| Year | Top terms |
|---|---|
| 2000 | Component, computer, attacker, malicious, dropper, kernel, remote, malware, door, and author |
| 2001 | Component, attacker, computer, malicious, remote, dropper, door, malware, kernel, and security |
| 2002 | Component, attacker, malicious, door, dropper, computer, remote, kernel, malware, and rootkit |
FIGURE 8. 60-, 80-, and 100-topic arXiv Cryptography and Security research articles of the DTM concept: Malware.
100-topic arXiv Cryptography and Security research articles of the dynamic topic model topic 3 for years 2007–2009 and 2012–2016.
| Year | Top terms |
|---|---|
| 2007 | Infected, epidemic, virus, malware, infection, worm, internet, and spreading |
| 2008 | Wireless, spread, malware, worm, infected, infection, spreading, and virus |
| 2009 | Spread, internet, propagation, epidemic, malware, worm, infected, infection, virus, propagation, host, Internet, simulation, and spread |

| Year | Top terms |
|---|---|
| 2012 | Malware, infection, infected, virus, worm, malicious, host, behavior, spread, and parameter |
| 2013 | Malware, malicious, behavior, worm, infected, sample, call, email, file, and family |
| 2014 | Malware, similarity, sample, infected, behavior, infection, malicious, family, based, and type |
| 2015 | Malware, sample, behavior, spreading, family, infected, based, malicious, virus, and benign |
| 2016 | Malware, sample, virus, family, malicious, benign, infected, infection, based, and anti |
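The shift in topic 3 can be traced by following a single term's position across years. The sketch below does this for the 2012 to 2016 rows above, assuming the terms in each row are listed in descending order of relevance (the record does not state this explicitly). The helper name `rank_by_year` is illustrative.

```python
# Top terms for topic 3, copied from the 2012-2016 rows above.
TOPIC_3 = {
    2012: "malware infection infected virus worm malicious host behavior spread parameter",
    2013: "malware malicious behavior worm infected sample call email file family",
    2014: "malware similarity sample infected behavior infection malicious family based type",
    2015: "malware sample behavior spreading family infected based malicious virus benign",
    2016: "malware sample virus family malicious benign infected infection based anti",
}

def rank_by_year(topic, term):
    """Map each year to the term's 1-based position, or None if it is absent."""
    ranks = {}
    for year, row in sorted(topic.items()):
        words = row.split()
        ranks[year] = words.index(term) + 1 if term in words else None
    return ranks

print(rank_by_year(TOPIC_3, "sample"))
print(rank_by_year(TOPIC_3, "worm"))
```

In these rows "sample" enters the list in 2013 and climbs to second place, while "worm" drops out after 2013, which matches the move from propagation-oriented to sample-analysis vocabulary visible in the table.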