Igor Kotenko, Konstantin Izrailov, Mikhail Buinevich.
Abstract
Ensuring the security of modern IoT systems requires complex methods for analyzing their software. One of the most in-demand methods, repeatedly proven effective, is static analysis. However, the growing complexity of connections in IoT systems, their increasing scale, and the heterogeneity of their elements require that the manual work of experts be automated and made intelligent. To this end, a hypothesis is posed that machine-learning solutions are applicable to the static analysis of IoT systems. A research scheme is given that is aimed at confirming the hypothesis and reflects the ontology of the study. The main contributions of the work are as follows: systematization of the static analysis stages for IoT systems and of the solutions to machine-learning problems in the form of formalized models; a review of publications across the entire subject area, with analysis of the results; confirmation of the applicability of machine-learning tools to each static analysis stage; and a proposed concept of an intelligent framework for the static analysis of IoT systems. The novelty of the results lies in considering the entire process of static analysis (from the beginning of IoT system research to the final delivery of the results), considering each stage from the perspective of the entire given set of machine-learning solutions, and formalizing the stages and solutions as "Form and Content" data transformations.
Keywords: IoT systems; analytic model; cyber security; formalization; machine learning; static analysis; survey model
Year: 2022; PMID: 35214237; PMCID: PMC8963110; DOI: 10.3390/s22041335
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. Research scheme.
Form and Content transformations in the process of static analysis of the information system.
| Transformations | Stage 1. | Stage 2. | Stage 3. | Stage 4. |
|---|---|---|---|---|
| Form and Content | | | | |
| Form | | | | |
| Content | | | | |
Machine-learning decision matrix for static analysis stages.
| Tasks | Stage 1. | Stage 2. | Stage 3. | Stage 4. |
|---|---|---|---|---|
| Task 1. | | | | |
| Task 2. Anomaly | | | | |
| Task 3. | | | | |
| Task 4. | | | | |
| Task 5. | | | | |
Note. In the table, an expression multiplied by zero (“0 x”) means that applying the ML solution produced data that are not used for the purposes of SA at this stage. Thus, some classes {C}, regression numbers {D}, clusters {O}, and reduced feature dimensions {O ∼ Z} are divided into two sets (with indices T1 and T2 and a common T, with indices K1 and K2 and a common K, and with indices R1 and R2 and a common R), the second of which is not used in the interest of SA.
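The split described in the note can be illustrated with a minimal sketch. It assumes a hypothetical ML output given as (item, label) pairs and a hypothetical set of labels that the current stage consumes. The function name `split_by_use`, the parameter `used_labels`, and the sample file names and labels are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_by_use(
    labelled: Iterable[Tuple[str, str]], used_labels: Set[str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split ML output into a part the stage uses and a part it ignores.

    labelled    -- (item, label) pairs produced by some ML solution
    used_labels -- labels the current static analysis stage consumes
    """
    used: Dict[str, List[str]] = {}
    unused: Dict[str, List[str]] = {}
    for item, label in labelled:
        # Route each item to the "used" or "unused" set by its label.
        target = used if label in used_labels else unused
        target.setdefault(label, []).append(item)
    return used, unused


# Hypothetical example: file-type classes assigned to four files.
output = [
    ("a.bin", "executable"),
    ("b.txt", "text"),
    ("c.bin", "executable"),
    ("d.png", "image"),
]
used, unused = split_by_use(output, used_labels={"executable"})
print(used)    # classes passed on to the analysis
print(unused)  # the "0 x" part: obtained but not used at this stage
```

Under these assumptions, the second returned dictionary plays the role of the set that the note marks as not used in the interest of SA.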
Publication summary (Part 1).
| Ref. | Title | Year | Type | Stage | Task | Content |
|---|---|---|---|---|---|---|
| [ | Toward Large-scale Vulnerability Discovery Using Machine Learning | 2016 | Conference | 3 | 1 | Practice |
| | | | | 3 | 3 | |
| [ | Detection of malicious code by applying machine-learning classifiers on static features: A state-of-the-art survey | 2009 | Journal | 3 | 1 | Experiment |
| [ | Malicious Code Detection Using Active Learning | 2008 | Conference | 3 | 1 | Theory |
| [ | Type Learning for Binaries and Its Applications | 2019 | Conference | 2 | 1 | Experiment |
| [ | Method for classification of files based on machine-learning technology | 2020 | Journal | 1 | 1 | Practice |
| [ | Identification of Processor’s Architecture of Executable Code Based on Machine Learning | 2020 | Journal | 1 | 1 | Practice |
| [ | Machine Learning-Assisted Binary Code Analysis | 2007 | Workshop | 2 | 1 | Theory |
| [ | o-glasses: Visualizing x86 Code from Binary Using a 1d-CNN | 2020 | Conference | 4 | 1 | Theory |
| [ | Cyber Vulnerability Intelligence for Internet of Things Binary | 2020 | Conference | 3 | 1 | Experiment |
| [ | A machine-learning approach to anomaly-based detection on Android platforms | 2015 | Journal | 3 | 2 | Practice |
| [ | Android malware detection using the dendritic cell algorithm | 2014 | Conference | 3 | 2 | Experiment |
| [ | Similarity detection among data files—a machine-learning approach | 1997 | Conference | 1 | 4 | Theory |
| [ | Document Clustering for Forensic Computing: An Approach for Improving Computer Inspection | 2011 | Conference | 1 | 4 | Experiment |
| [ | Document Clustering—A Feasible Demonstration with K-means Algorithm | 2019 | Conference | 1 | 4 | Theory |
| [ | Evolution in Software Architecture Recovery Techniques—A Survey | 2017 | Conference | 2 | 4 | Theory |
| [ | A Hierarchical Clustering-Based Approach for Software Restructuring at the Package Level | 2017 | Conference | 2 | 4 | Practice |
| [ | A Novel Solutions for Malicious Code Detection and Family Clustering Based on Machine Learning | 2019 | Conference | 4 | 1 | Theory |
| | | | | 4 | 4 | |
| | | | | 4 | 5 | |
| [ | Android malware detection using 3-level ensemble | 2016 | Conference | 3 | 5 | Experiment |
| [ | Reverse engineering smart card malware using side channel analysis with machine-learning techniques | 2016 | Conference | 2 | 5 | Theory |
| | | | | 3 | 5 | |
| [ | Feature selection and machine-learning classification for malware detection | 2015 | Journal | 3 | 1 | Theory |
| | | | | 3 | 5 | |
| [ | Android malware detection based on permissions | 2014 | Conference | 3 | 1 | Practice |
| | | | | 3 | 5 | |
| [ | Android ransomware detection using reduced opcode sequence and image similarity | 2017 | Conference | 2 | 5 | Experiment |
| | | | | 4 | 5 | |
| [ | File Block Classification by Support Vector Machine | 2011 | Conference | 1 | 1 | Theory |
| [ | Preventing File-Less Attacks with Machine Learning Techniques | 2019 | Conference | 3 | 2 | Theory |
| [ | Document Image Classification and Labeling using Multiple Instance Learning | 2011 | Conference | 1 | 1 | Experiment |
| [ | Multi-scale Structural Saliency for Signature Detection | 2007 | Conference | 1 | 1 | Practice |
| [ | Multi-instance clustering with applications to multi-instance prediction | 2009 | Journal | 1 | 4 | Experiment |
| [ | Detection of packed executables using support vector machines | 2011 | Conference | 1 | 1 | Practice |
| [ | Detecting Packed Executable File: Supervised or Anomaly Detection Method? | 2016 | Conference | 1 | 2 | Experiment |
| [ | An anomaly detection system proposal to ensure information security for file integrations | 2018 | Conference | 1 | 2 | Theory |
| [ | Visualizing Big Data Outliers through Distributed Aggregation | 2018 | Conference | 4 | 2 | Theory |
Publication summary (Part 2).
| Ref. | Title | Year | Type | Stage | Task | Content |
|---|---|---|---|---|---|---|
| [ | Relational Synthesis of Text and Numeric Data for Anomaly Detection on Computing System Logs | 2016 | Conference | 4 | 2 | Practice |
| [ | Predicting File Lifetimes with Machine Learning | 2019 | Lecture Notes | 1 | 3 | Theory |
| [ | A Machine-Learning Approach to Automatic Detection of Delimiters in Tabular Data Files | 2016 | Conference | 2 | 3 | Theory |
| [ | Multiple linear regression for universal steganalysis of images | 2018 | Conference | 1 | 3 | Experiment |
| | | | | 2 | 3 | |
| [ | Log File Anomaly Detection | 2016 | Report | 2 | 2 | Theory |
| | | | | 4 | 2 | |
| [ | Experimentations with OpenStack System Logs and Support Vector Machine for an Anomaly Detection Model in a Private Cloud Infrastructure | 2020 | Conference | 2 | 2 | Experiment |
| [ | Forecasting Zero-Day Vulnerabilities | 2016 | Conference | 4 | 3 | Practice |
| [ | The Effects of Depth of Field on Subjective Evaluation of Aesthetic Appeal and Image Quality of Photographs | 2020 | Journal | 4 | 3 | Experiment |
| [ | Text Document Classification with PCA and One-Class SVM | 2017 | Conference | 1 | 1 | Theory |
| [ | Machine Learning With Feature Selection Using Principal Component Analysis for Malware Detection—A Case Study | 2019 | Journal | 3 | 1 | Theory |
| [ | Power-based Side-Channel Instruction-level Disassembler | 2018 | Conference | 2 | 5 | Practice |
| | | | | 3 | 5 | |
| [ | Data mining methods for detection of new malicious executables | 2001 | Conference | 3 | 1 | Practice |
| [ | Integrated static and dynamic analysis for malware detection | 2015 | Journal | 3 | 1 | Experiment |
| [ | Classification of malware families based on N-grams sequential pattern features | 2013 | Conference | 3 | 1 | Experiment |
| [ | Malware detection using machine learning | 2009 | Conference | 3 | 1 | Practice |
| [ | Byteweight: Learning to recognize functions in binary code | 2014 | Conference | 2 | 1 | Practice |
| [ | Recognizing functions in binaries with neural networks | 2015 | Conference | 2 | 1 | Theory |
| [ | Automatically learning semantic features for defect prediction | 2016 | Conference | 3 | 2 | Theory |
| [ | Emergent, crowd-scale programming practice in the IDE | 2014 | Conference | 3 | 2 | Practice |
| [ | Using web corpus statistics for program analysis | 2014 | Conference | 3 | 2 | Practice |
| [ | Bugram: bug detection with n-gram language models | 2016 | Conference | 3 | 2 | Practice |
| [ | Finding Likely Errors with Bayesian Specifications | 2017 | Preprint | 3 | 2 | Practice |
| [ | Learning to Represent Programs with Graphs | 2018 | Conference | 3 | 2 | Theory |
| [ | Deep Learning to Find Bugs | 2017 | Journal | 3 | 2 | Practice |
| [ | Strengthening the empirical analysis of the relationship between linus’ law and software security | 2010 | Conference | 3 | 3 | Theory |
| [ | An empirical study of the evolution of PHP web application security | 2011 | Conference | 3 | 3 | Theory |
| [ | Can traditional fault prediction models be used for vulnerability prediction? | 2013 | Journal | 3 | 3 | Theory |
| [ | An initial study on the use of execution complexity metrics as indicators of software vulnerabilities | 2011 | Conference | 3 | 3 | Theory |
| [ | Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities | 2011 | Journal | 3 | 3 | Theory |
| [ | Using complexity metrics to improve software security | 2013 | Journal | 3 | 3 | Theory |
| [ | Predicting vulnerable components: Software metrics vs text mining | 2014 | Conference | 3 | 3 | Theory |
Publication summary (Part 3).
| Ref. | Title | Year | Type | Stage | Task | Content |
|---|---|---|---|---|---|---|
| [ | Challenges with applying vulnerability prediction models | 2015 | Conference | 3 | 3 | Theory |
| [ | To fear or not to fear that is the question: Code characteristics of a vulnerable function with an existing exploit | 2016 | Conference | 3 | 3 | Theory |
| [ | Searching for a needle in a haystack: Predicting security vulnerabilities for windows vista | 2010 | Conference | 3 | 3 | Theory |
| [ | Bugs as deviant behavior: A general approach to inferring errors in systems code | 2001 | Conference | 3 | 2 | Practice |
| [ | DynaMine: Finding common error patterns by mining software revision histories | 2005 | Conference | 3 | 2 | Practice |
| [ | PR-miner: Automatically extracting implicit programming rules and detecting violations in large software code | 2005 | Conference | 3 | 2 | Practice |
| [ | Detecting object usage anomalies | 2007 | Conference | 3 | 2 | Practice |
| [ | Mining API patterns as partial orders from source code: From usage scenarios to specifications | 2007 | Conference | 3 | 2 | Practice |
| [ | Alattin: Mining alternative patterns for detecting neglected conditions | 2009 | Conference | 3 | 2 | Theory |
| [ | Learning from 6000 projects: Lightweight cross-project anomaly detection | 2010 | Conference | 3 | 2 | Theory |
| [ | Discovering neglected conditions in software by mining dependence graphs | 2008 | Journal | 3 | 2 | Theory |
| [ | Chucky: Exposing missing checks in source code for vulnerability discovery | 2013 | Conference | 3 | 2 | Theory |
| [ | Vulnerability extrapolation: Assisted discovery of vulnerabilities using machine learning | 2011 | Conference | 3 | 1 | Experiment |
| [ | Generalized vulnerability extrapolation using abstract syntax trees | 2012 | Conference | 3 | 1 | Theory |
| [ | Predicting common web application vulnerabilities from input validation and sanitization code patterns | 2012 | Conference | 3 | 1 | Experiment |
| [ | Predicting SQL injection and cross site scripting vulnerabilities through mining input sanitization patterns | 2013 | Journal | 3 | 1 | Practice |
| | | | | 3 | 4 | |
| [ | Mining SQL injection and cross site scripting vulnerabilities using hybrid program analysis | 2013 | Conference | 3 | 1 | Experiment |
| [ | Web application vulnerability prediction using hybrid program analysis and machine learning | 2015 | Journal | 3 | 1 | Theory |
| [ | Predicting vulnerable software components via text mining | 2014 | Journal | 3 | 1 | Theory |
| [ | Automatic inference of search patterns for taintstyle vulnerabilities | 2015 | Conference | 3 | 1 | Experiment |
| | | | | 3 | 4 | |
| [ | Predicting vulnerable software components through N-gram analysis and statistical feature selection | 2015 | Conference | 3 | 1 | Theory |
| [ | Classification and Analysis of Android Malware Images Using Feature Fusion Technique | 2021 | Conference | 3 | 1 | Practice |
| [ | SHELLCORE: Automating Malicious IoT Software Detection Using Shell Commands Representation | 2021 | Conference | 3 | 1 | Practice |
| | | | | 3 | 5 | |
| [ | Machine Learning Tensor Flow Based Platform for Recognition of Hand Written Text | 2021 | Conference | 1 | 1 | Practice |
| | | | | 2 | 1 | |
| [ | A Machine Learning-Based Framework for Mobile Forensics | 2020 | Conference | 1 | 1 | Practice |
| | | | | 1 | 4 | |
| [ | Automation of Vulnerability Classification from its Description using Machine Learning | 2020 | Conference | 4 | 1 | Practice |
| [ | Threats Classification Method for the Transport Infrastructure of a Smart City | 2020 | Conference | 4 | 4 | Theory |
Figure 2. Distribution of publications by year.
Figure 3. Distribution of publications by type.
Figure 4. Distribution of publications by content.
Overview model of scientific works on the implementation of the static analysis stages using machine learning.
| Task Name | Stage 1. | Stage 2. | Stage 3. | Stage 4. |
|---|---|---|---|---|
| Task 1. Classification | [ | [ | [ | [ |
| Task 2. Anomaly detection | [ | [ | [ | [ |
| Task 3. Regression | [ | [ | [ | [ |
| Task 4. Clustering | [ | [ | [ | [ |
| Task 5. Generalization | [ | [ | [ |
Figure 5. Histogram of research work distribution for static analysis and machine-learning tasks.
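A distribution of this kind can be tallied from the Stage and Task columns of the publication summary tables. The sketch below is a minimal illustration, not the authors' procedure: it uses only a handful of (stage, task) pairs copied from the Part 1 table, and the helper name `count_matrix` is a hypothetical choice. A publication listed with several Stage/Task rows contributes one pair per row.

```python
from collections import Counter
from typing import List, Tuple

# (stage, task) pairs copied from a few rows of the Part 1 table.
pairs: List[Tuple[int, int]] = [
    (3, 1), (3, 3),  # Toward Large-scale Vulnerability Discovery Using Machine Learning
    (3, 1),          # Malicious Code Detection Using Active Learning
    (2, 1),          # Type Learning for Binaries and Its Applications
    (1, 1),          # File Block Classification by Support Vector Machine
    (3, 2),          # Android malware detection using the dendritic cell algorithm
]

# Task names as given in the overview model table.
TASKS = {
    1: "Classification",
    2: "Anomaly detection",
    3: "Regression",
    4: "Clustering",
    5: "Generalization",
}


def count_matrix(counts: Counter, n_tasks: int = 5, n_stages: int = 4) -> List[List[int]]:
    """Return a task-by-stage matrix of publication counts."""
    return [
        [counts.get((stage, task), 0) for stage in range(1, n_stages + 1)]
        for task in range(1, n_tasks + 1)
    ]


counts = Counter(pairs)
matrix = count_matrix(counts)
for name, row in zip(TASKS.values(), matrix):
    # One line per task; columns are Stage 1 to Stage 4.
    print(f"{name:<18}", row)
```

Running the same tally over every row of the three summary tables would give the full set of counts per stage and task.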
Figure 6. Example of complex static IoTS analysis using machine learning.