Waqas Haider Bangyal1, Rukhma Qasim1, Najeeb Ur Rehman1, Zeeshan Ahmad1, Hafsa Dar2, Laiqa Rukhsar1, Zahra Aman1, Jamil Ahmad3.
Abstract
A vast amount of data is generated every second by microblogs, content sharing via social media sites, and social networking. Twitter is a popular microblogging platform where people voice their opinions about daily issues. Analyzing these opinions has recently become the primary concern of sentiment analysis (SA), or opinion mining. Efficiently capturing, gathering, and analyzing sentiments has been challenging for researchers. To deal with these challenges, in this research work, we propose a highly accurate approach for SA of fake news on COVID-19. The dataset contains fake news on COVID-19; we started with data preprocessing (replacing missing values, noise removal, tokenization, and stemming). We applied a semantic model with term frequency and inverse document frequency (TF-IDF) weighting for data representation. In the measuring and evaluation step, we applied eight machine learning algorithms (Naive Bayes, AdaBoost, K-nearest neighbors, random forest, logistic regression, decision tree, neural networks, and support vector machine) and four deep learning models (CNN, LSTM, RNN, and GRU). Afterward, based on the results, we built a highly efficient prediction model with Python, and we trained and evaluated the classification model according to the performance measures (confusion matrix, classification rate, true positive rate, ...), then tested the model on a set of unclassified fake news on COVID-19 to predict the sentiment class of each item. The results demonstrate high accuracy compared to the other models. Finally, a set of recommendations is provided with future directions for this research to help researchers select an efficient sentiment analysis model for Twitter data.
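The TF-IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the unsmoothed textbook form, weight = tf × log(N / df); the abstract does not state which TF-IDF variant, tokenizer, or stemmer the authors used, so the helper names `tokenize` and `tfidf` and the example texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example texts, not taken from the paper's dataset.
docs = [
    "Garlic cures the virus",
    "The vaccine alters DNA",
    "The virus spreads through 5G",
]
weights = tfidf(docs)
```

In this sketch a term that occurs in every document (here "the") receives weight zero, while a term unique to one document (here "garlic") receives the largest weight in that document, which is the behavior that makes TF-IDF useful as a text representation for the classifiers listed above.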
Year: 2021 PMID: 34819990 PMCID: PMC8608495 DOI: 10.1155/2021/5514220
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Research methodology.
Results of machine learning-based approaches for fake news on COVID-19.
| Model | Accuracy (%) | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic regression | 96 | 0.99 | 0.97 | 0.98 |
| Random forest | 97 | 0.99 | 0.98 | 0.98 |
| Decision tree | 96 | 0.96 | 0.96 | 0.96 |
| SVM | 96 | 0.99 | 0.97 | 0.98 |
| KNN | 97 | 0.97 | 0.96 | 0.98 |
| Adaboost | 96 | 0.98 | 0.97 | 0.97 |
| MLP/BPA | 97 | 0.98 | 0.98 | 0.98 |
| Naïve Bayes | 95 | 0.99 | 0.97 | 0.98 |
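For reference, the four measures reported in the table follow from the entries of a confusion matrix. The sketch below uses the standard binary definitions; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper's heat maps.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two values for a single class.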
Macro and weighted average of precision, recall, and F1-score for the machine learning classification algorithms.
| Metric | Average | Logistic regression | Random forest | Decision tree | SVM | KNN | Adaboost | MLP/BPA | Naïve Bayes |
|---|---|---|---|---|---|---|---|---|---|
| Precision | Macro | 0.64 | 0.73 | 0.75 | 0.68 | 0.51 | 0.66 | 0.74 | 0.69 |
| Precision | Weighted | 0.99 | 0.99 | 0.96 | 0.99 | 1.00 | 0.98 | 0.98 | 0.99 |
| Recall | Macro | 0.99 | 0.99 | 0.76 | 0.99 | 0.98 | 0.85 | 0.90 | 0.97 |
| Recall | Weighted | 0.97 | 0.98 | 0.96 | 0.97 | 0.96 | 0.97 | 0.98 | 0.97 |
| F1 score | Macro | 0.71 | 0.81 | 0.76 | 0.76 | 0.51 | 0.71 | 0.80 | 0.77 |
| F1 score | Weighted | 0.98 | 0.98 | 0.96 | 0.98 | 0.98 | 0.97 | 0.98 | 0.98 |
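The difference between the macro and weighted rows can be made concrete with a small sketch. A macro average gives every class equal weight, whereas a weighted average weights each class by its support (number of samples), so the two diverge when classes are imbalanced. The per-class scores and supports below are hypothetical; the class distribution of the paper's dataset is not given here.

```python
def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean of the per-class scores, weighted by class support."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Hypothetical per-class precision and support for an imbalanced two-class problem.
precision_per_class = [0.99, 0.30]
support = [970, 30]

macro = macro_average(precision_per_class)
weighted = weighted_average(precision_per_class, support)
```

With these assumed numbers the weighted average stays close to the majority-class score while the macro average is pulled down by the minority class, which is one common way a gap of the kind seen in the table can arise.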
Results of deep learning-based approaches for fake news on COVID-19.
| Model | Accuracy (%) | Precision | Recall | F1-score |
|---|---|---|---|---|
| LSTM | 95 | 0.90 | 0.95 | 0.93 |
| BiLSTM | 97 | 0.97 | 0.97 | 0.97 |
| GRU | 95 | 0.91 | 0.95 | 0.93 |
| RNN | 95 | 0.91 | 0.95 | 0.93 |
| Conv1d | 97 | 0.97 | 0.97 | 0.97 |
Macro and weighted average of precision, recall, and F1-score for the deep learning classification algorithms.
| Metric | Average | CNN | LSTM | Bi-LSTM | GRU | RNN |
|---|---|---|---|---|---|---|
| Precision | Macro | 0.64 | 0.79 | 0.71 | 0.85 | 0.47 |
| Precision | Weighted | 0.97 | 0.90 | 0.97 | 0.91 | 0.91 |
| Recall | Macro | 0.98 | 0.96 | 0.96 | 0.67 | 0.50 |
| Recall | Weighted | 0.97 | 0.95 | 0.97 | 0.95 | 0.95 |
| F1 score | Macro | 0.71 | 0.80 | 0.80 | 0.79 | 0.48 |
| F1 score | Weighted | 0.97 | 0.93 | 0.97 | 0.93 | 0.93 |
Figure 2. Heat map of logistic regression.
Figure 3. Heat map of random forest.
Figure 4. Heat map of decision tree.
Figure 5. Heat map of SVM.
Figure 6. Heat map of KNN.
Figure 7. Heat map of AdaBoost.
Figure 8. Heat map of MLP.
Figure 9. Heat map of Naïve Bayes.
Figure 10. Heat map for CNN.
Figure 11. Heat map for LSTM.
Figure 12. Heat map for Bi-LSTM.
Figure 13. Heat map for GRU.
Figure 14. Heat map of RNN.
Figure 15. Classification accuracy of machine learning-based approaches for fake news on COVID-19.
Figure 16. Classification accuracy of deep learning-based approaches for fake news on COVID-19.