Zeinab Shahbazi, Yung-Cheol Byun.
Abstract
Social media evidence is an emerging topic in digital forensics. If social media information is explored correctly, it can provide significant support for investigating various offenses. However, exploring social media information to give the government potential proof of a crime is not an easy task. In this work, a digital forensic investigation framework based on natural language processing (NLP) techniques and blockchain is proposed. NLP is used for data collection and analysis, the representation of each phase, vectorization, feature selection, and classifier evaluation. A blockchain technique is applied to secure the data against hacking and other network attacks. The system's potential is demonstrated using a real-world dataset.
Keywords: blockchain; digital forensics; machine learning; natural language processing
Year: 2022 PMID: 35742272 PMCID: PMC9222863 DOI: 10.3390/ijerph19127027
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
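The abstract lists vectorization as one of the NLP phases. As a hedged illustration of what that step does, the sketch below turns short posts into bag-of-words count vectors using only the standard library. The helpers `tokenize`, `build_vocabulary` and `vectorize` and the two sample posts are hypothetical; the paper's own representation (the LDA topic vectors suggested by its tables and Figure 5) may differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping '#' and '@' prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    """Collect every distinct token across the corpus, in sorted order."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(doc: str, vocab: list[str]) -> list[int]:
    """Map one document to a term-count vector aligned with `vocab`."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


# Hypothetical posts standing in for collected social media records.
posts = [
    "Meet at the station tonight",
    "The station is closed tonight, meet later",
]
vocab = build_vocabulary(posts)
vectors = [vectorize(p, vocab) for p in posts]
print(vocab)
print(vectors)
```

Each vector has one slot per vocabulary term, so documents of different lengths become comparable fixed-size inputs for the later feature selection and classification phases.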
Comparison of recent state-of-the-art digital forensic (DF) methods.
| Author | Proposed | Advantages | Limitations |
|---|---|---|---|
| Choi et al. | Digital forensic | Data recovery | Difficult to protect |
| Zhang et al. | Digital forensic | Investigating the | Limitation in |
| Du et al. | Future of artificial | Survey of automated | Image data with |
| Xiao et al. | Analysis of video- | Identification of | Difficult to identify |
| Jadir et al. | Digital forensic | Enhancing the | Challenges of processing |
Figure 1. Overview of the proposed digital forensic analysis.
Figure 2. Multi-layered implementation process.
Vectors before and after feature selection.
Before feature selection:
| Samples | topic 1 | topic 2 | topic 3 | topic 4 | topic 5 | y (label) |
|---|---|---|---|---|---|---|
| Vector 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Vector 2 | 0 | 0 | 0.7 | 0.2 | 0.4 | 1 |
| Vector 3 | 0.5 | 0.3 | 0.2 | 0 | 0.4 | 0 |
| Vector 4 | 0 | 0.6 | 0 | 0 | 0.6 | 1 |
After feature selection:
| Samples | topic 1 | topic 2 | topic 3 | y (label) |
|---|---|---|---|---|
| Vector 1 | 1 | 0 | 0 | 0 |
| Vector 2 | 0 | 0 | 0.2 | 1 |
| Vector 3 | 0.5 | 0.3 | 0 | 0 |
| Vector 4 | 0 | 0.6 | 0 | 1 |
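Comparing the values, the three retained columns coincide with topics 1, 2 and 4 of the original vectors (the after-selection columns appear to be renumbered). The selection criterion itself is not stated, so the sketch below only shows the projection step: given the indices of the retained features, it drops the other columns while the labels stay unchanged. The helper `project` and the choice of indices 0, 1 and 3 are assumptions inferred from the table values.

```python
# Topic vectors copied from the "before feature selection" table (topics 1-5).
BEFORE = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.2, 0.4],
    [0.5, 0.3, 0.2, 0.0, 0.4],
    [0.0, 0.6, 0.0, 0.0, 0.6],
]
LABELS = [0, 1, 0, 1]  # y (label) column; feature selection does not touch it


def project(vectors, keep):
    """Keep only the feature columns whose 0-based indices are listed in `keep`."""
    return [[row[i] for i in keep] for row in vectors]


# Assumption: indices 0, 1 and 3 (topics 1, 2 and 4) are the retained features,
# inferred by matching values against the "after feature selection" table.
AFTER = project(BEFORE, keep=[0, 1, 3])
for vector, label in zip(AFTER, LABELS):
    print(vector, label)
```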
Figure 3. Blockchain-based evidence identification.
Development environment.
| Module | Component | Description |
|---|---|---|
| Machine | Operating System | Microsoft Windows 10 |
|  | CPU | Intel® Core™ |
|  | Main Memory | 16 GB RAM |
|  | Core Programming Language | Python |
|  | IDE | PyCharm Professional 2020 |
|  | ML Algorithm | Random Forest |
| Blockchain | Operating System | Ubuntu Linux 18.04 LTS |
|  | Docker Engine | Version 18.06.1-ce |
|  | Docker Compose | Version 1.13.0 |
|  | IDE | Composer Playground |
|  | Programming Language | Node.js |
Data information.
| Data Type | Total Records |
|---|---|
|  | 5000 |
|  | 6500 |
| Blogs | 6600 |
| News | 5500 |
| Training Set | 80% |
| Testing Set | 20% |
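The four listed record counts sum to 23,600, so an 80/20 split gives 18,880 training and 4,720 testing records. The sketch below assumes a single random split over the pooled records with a fixed seed; whether the authors stratified by data type or split differently is not stated, and `split_records` is a hypothetical helper.

```python
import random

# Record counts from the data table (the first two data types are not named).
RECORD_COUNTS = [5000, 6500, 6600, 5500]


def split_records(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` with a fixed seed and cut it into train/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


total = sum(RECORD_COUNTS)   # 23,600 records in all
record_ids = range(total)    # integer stand-ins for the actual posts
train, test = split_records(record_ids)
print(total, len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing classifiers on the same test set.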
Figure 4. Multi-layered data processing.
List of analysis operators used in the following processes.
| Name of Operators | Details |
|---|---|
| Tweet Cloud | Object correlation method to provide the |
| Hashtag Cloud | Object correlation based on hashtags of |
| Interaction Graph | Subject and object correlation for sorting |
| Interaction Frequency | Subject and object correlation to perform |
| Views Similarity | Rule-based correlation for nearest user-opinion |
| Trace Operator | Linking the evidence to the entity. |
| Temporal Activity | Using temporal correlation to analyze the user |
| Geo-location Activity | Object correlation for sorting the location based |
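As a hedged illustration of the Hashtag Cloud operator, the sketch below assumes that the cloud is a frequency ranking of hashtags across a set of posts, with tags compared case-insensitively. The helper `hashtag_cloud` and the sample posts are hypothetical, not the authors' implementation.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # '#' followed by word characters


def hashtag_cloud(posts, top_n=3):
    """Count case-insensitive hashtag occurrences across posts and return the most common."""
    counts = Counter(tag.lower() for post in posts for tag in HASHTAG.findall(post))
    return counts.most_common(top_n)


posts = [
    "Big crowd downtown #protest #CityCenter",
    "Road closed near #citycenter",
    "#Protest planned again tomorrow #protest",
]
print(hashtag_cloud(posts))  # [('protest', 3), ('citycenter', 2)]
```

The resulting counts could size the words in a cloud visualization or seed correlations between users who share the same tags.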
Performance of the different classifiers in each fold (P = precision, R = recall, F1 = F1-score).
| Fold | Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
|---|---|---|---|---|---|---|
| 1 | P | 0.6595 | 0.8254 | 0.9486 | 0.9846 | 0.6487 |
|  | R | 0.7511 | 0.9700 | 0.7111 | 0.6911 | 0.7348 |
|  | F1 | 0.6667 | 0.6174 | 0.8611 | 0.7811 | 0.7794 |
| 2 | P | 0.7198 | 0.3541 | 0.6736 | 0.8947 | 0.4955 |
|  | R | 0.5511 | 0.7511 | 0.4711 | 0.5948 | 0.6564 |
|  | F1 | 0.6944 | 0.5656 | 0.5611 | 0.7182 | 0.5836 |
| 3 | P | 0.6111 | 0.6993 | 0.8793 | 0.9831 | 0.6939 |
|  | R | 0.6311 | 0.9300 | 0.7511 | 0.8334 | 0.8479 |
|  | F1 | 0.6825 | 0.6111 | 0.7622 | 0.7939 | 0.7749 |
| 4 | P | 0.5968 | 0.7986 | 0.8611 | 0.9444 | 0.6232 |
|  | R | 0.8711 | 0.8711 | 0.7911 | 0.7746 | 0.7498 |
|  | F1 | 0.6929 | 0.6477 | 0.7994 | 0.8337 | 0.6949 |
| 5 | P | 0.6374 | 0.4058 | 0.7929 | 0.8478 | 0.5498 |
|  | R | 0.5111 | 0.8711 | 0.7111 | 0.6964 | 0.7699 |
|  | F1 | 0.5990 | 0.6566 | 0.7633 | 0.7982 | 0.6479 |
Figure 5. LDA-based perplexity records.
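Perplexity is commonly used to compare LDA topic models, for example when choosing the number of topics. For a held-out set of M documents, where document d contains N_d words w_d, the standard definition is given below. The authors' exact evaluation protocol is not described here, so this is the textbook formula rather than their stated procedure; lower values mean the model assigns higher likelihood to unseen documents.

```latex
\operatorname{perplexity}(D_{\text{test}})
  = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```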
Average score of different classifiers.
| Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
|---|---|---|---|---|---|
| P | 0.6449 | 0.5367 | 0.6393 | 0.9443 | 0.6279 |
| R | 0.6631 | 0.9191 | 0.6871 | 0.6943 | 0.7432 |
| F1 | 0.6673 | 0.6197 | 0.7334 | 0.7611 | 0.6745 |
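The averaged scores appear to be per-fold means of each metric; for the Decision Tree column, for example, the means of the five per-fold P and R values round to the reported 0.6449 and 0.6631. The sketch below recomputes those two values, assuming a plain unweighted mean over the folds; `fold_average` is a hypothetical helper.

```python
from statistics import mean

# Decision Tree scores for folds 1-5, copied from the per-fold table.
DECISION_TREE = {
    "P": [0.6595, 0.7198, 0.6111, 0.5968, 0.6374],
    "R": [0.7511, 0.5511, 0.6311, 0.8711, 0.5111],
}


def fold_average(scores_per_fold):
    """Unweighted mean of one metric over all folds, rounded to four decimals."""
    return round(mean(scores_per_fold), 4)


averages = {metric: fold_average(values) for metric, values in DECISION_TREE.items()}
print(averages)  # {'P': 0.6449, 'R': 0.6631}
```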
Classifier results with and without feature selection.
| Setting | Metrics | Decision Tree | Naive Bayes | Logistic Regression | Random Forest | Support Vector Machine |
|---|---|---|---|---|---|---|
| With feature selection | P | 0.6449 | 0.6367 | 0.8513 | 0.9443 | 0.6279 |
|  | R | 0.6631 | 0.9191 | 0.6871 | 0.6943 | 0.7432 |
|  | F1 | 0.6673 | 0.6197 | 0.7534 | 0.7611 | 0.6745 |
| Without feature selection | P | 0.6293 | 0.6176 | 0.8122 | 0.8321 | 0.4574 |
|  | R | 0.6171 | 0.5351 | 0.6791 | 0.5467 | 0.6831 |
|  | F1 | 0.6372 | 0.5779 | 0.6974 | 0.6998 | 0.5445 |
Figure 6. Topic probability analysis records.
Figure 7. Evidence block in JSON script.
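The evidence block in Figure 7 is expressed as JSON. As a hedged sketch of how such a block can be made tamper-evident, the code below serializes each block as canonical JSON, hashes it with SHA-256 and links it to its predecessor through `prev_hash`. The field names, the `make_block` and `verify_chain` helpers, and the sample evidence are assumptions; the development environment lists Composer Playground with Node.js, which this stand-alone Python sketch does not model.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 of the block's canonical JSON (sorted keys), excluding its own 'hash' field."""
    payload = {key: value for key, value in block.items() if key != "hash"}
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(index: int, evidence: dict, prev_hash: str, timestamp: float) -> dict:
    """Build an evidence block that commits to its predecessor through prev_hash."""
    block = {"index": index, "timestamp": timestamp, "evidence": evidence, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def verify_chain(chain: list) -> bool:
    """True if every stored hash matches its contents and links to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False  # contents were altered after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the predecessor is broken
    return True


# Hypothetical evidence record; field names are illustrative only.
chain = [make_block(0, {"note": "genesis"}, prev_hash="0" * 64, timestamp=0.0)]
chain.append(
    make_block(1, {"post_id": "p-001", "label": 1, "source": "tweet"},
               prev_hash=chain[-1]["hash"], timestamp=1.0)
)
print(json.dumps(chain[-1], indent=2))
```

Editing any stored field, for example flipping `label`, changes the recomputed hash, so `verify_chain` returns False. A real deployment would also need consensus and access control, which a single in-memory list does not provide.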
Figure 8. Details of the process of the blockchain framework for forensic analysis.