SreeJagadeesh Malla, P J A Alphonse.
Abstract
The World Health Organization declared the novel coronavirus disease 2019 (COVID-19) a pandemic on March 11, 2020. Alongside the pandemic, a second crisis has emerged: widespread fear and panic driven by a lack of information or, in some cases, outright fake messages. In these circumstances, Twitter is one of the most prominent and trusted social media platforms, yet fake tweets are challenging to detect and differentiate. The primary goal of this paper is to educate society about the importance of accurate information and to prevent the spread of fake information. The paper investigates COVID-19 fake data from social media platforms such as Twitter, Facebook, and Instagram, with the objective of classifying a given tweet as either fake or real news. The authors evaluated various deep learning models on the COVID-19 fake-news dataset; CT-BERT and RoBERTa outperformed the other models tested (BERT, BERTweet, ALBERT, and DistilBERT). The proposed ensemble deep learning architecture in turn outperformed both CT-BERT and RoBERTa on this dataset by using a multiplicative fusion technique: the ensemble's prediction is determined by the multiplicative product of the final predictive values of CT-BERT and RoBERTa. This technique compensates for the occasional incorrect predictions of either individual model. The proposed architecture outperforms well-known ML and DL models, achieving 98.88% accuracy and a 98.93% F1-score.
Year: 2022 PMID: 35039760 PMCID: PMC8756170 DOI: 10.1140/epjs/s11734-022-00436-6
Source DB: PubMed Journal: Eur Phys J Spec Top ISSN: 1951-6355 Impact factor: 2.707
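The multiplicative fusion described in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the `multiplicative_fusion` helper and the toy probability values are illustrative, assuming each model emits per-class softmax probabilities.

```python
import numpy as np

def multiplicative_fusion(probs_a: np.ndarray, probs_b: np.ndarray) -> np.ndarray:
    """Fuse two models' class-probability outputs by elementwise product.

    probs_a, probs_b: arrays of shape (n_samples, n_classes) holding the
    softmax outputs of the two models (here, CT-BERT and RoBERTa).
    Returns the fused class predictions (argmax over the product).
    """
    fused = probs_a * probs_b        # product of the final predictive values
    return fused.argmax(axis=1)      # 0 = fake, 1 = real (label order assumed)

# Toy example: on sample 1, model A weakly votes "real" (0.55) while
# model B confidently votes "fake" (0.70); the product sides with B.
p_ctbert  = np.array([[0.90, 0.10], [0.45, 0.55]])
p_roberta = np.array([[0.80, 0.20], [0.70, 0.30]])
preds = multiplicative_fusion(p_ctbert, p_roberta)
```

This is how the fusion can overcome an incorrect prediction from one model: a confident correct model outweighs a hesitant wrong one in the product.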
Different COVID-19 disease-related tweets (table of an example fake COVID-19 tweet and an example real COVID-19 tweet)
Summary of text-classification-based papers

| S. no. | Year | Author and paper | Topic discussed | Model/technique |
|---|---|---|---|---|
| 1 | 2021 | Madichetty and Sridevi | Detecting situational tweets in the aftermath of a disaster | Feature-based approach and fine-tuned RoBERTa model |
| 2 | 2021 | Malla and Alphonse | Detection of useful information tweets about COVID-19 | Majority-voting technique with RoBERTa, BERTweet, and CT-BERT |
| 3 | 2020 | Jagadeesh and Alphonse | Identifying and classifying informative COVID-19 tweets | RoBERTa |
| 4 | 2007 | Danesh et al. | An ensemble ML model for text classification | Naive Bayes, k-NN, and Rocchio classifiers with a fusion method |
| 5 | 2021 | Kranthi Kumar and Alphonse | Impact of respiratory sounds on COVID-19 disease identification | CNN |
COVID-19 fake-tweet detection papers: summary

| S. no. | Year | Author and paper | Model | Accuracy | F1-score |
|---|---|---|---|---|---|
| 1 | 2020 | Gautam et al. | XLNet + LDA | 93.90 | 94.00 |
| 2 | 2021 | Shushkevich and Cardiff | Ensemble model | 93.90 | 94.00 |
| 3 | 2020 | Glazkova et al. | CT-BERT + hard voting | 98.50 | 98.69 |
| 4 | 2021 | Paka et al. | BERTa + BiLSTM | 95.40 | 95.30 |
| 5 | 2021 | Li et al. | BiLSTM | 89.00 | 88.00 |
| 6 | 2017 | Singhania et al. | 3HAN + features | 96.30 | 96.77 |
| 7 | 2021 | Ahmed et al. | LSVM + TF-IDF | 92.15 | 92.08 |
Fig. 1 Overview of the proposed (FBEDL) ensemble deep learning model
RoBERTa results obtained on the test dataset

| Epochs | Batch size | Learning rate | Loss | TN | FN | FP | TP | Accuracy | F1-score | Precision | Recall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 8 | 1.12e−05 | 0.0780 | 1002 | 18 | 13 | 1107 | 98.55 | 98.62 | 98.84 | 98.40 |
CT-BERT results obtained on the test dataset

| Epochs | Batch size | Learning rate | Loss | TN | FN | FP | TP | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 8 | 1.02e−06 | 0.0313 | 993 | 27 | 11 | 1109 | 98.22 | 99.02 | 97.62 | 98.32 |
COVID-19 fake-news English dataset details

| Fake news (COVID-19) dataset | Fake | Real |
|---|---|---|
| Training data | 3060 | 3360 |
| Validation data | 1020 | 1120 |
| Test data | 1020 | 1120 |
Machine learning models: results on the test dataset
| Model | Accuracy | F1-score | Precision | Recall |
|---|---|---|---|---|
| Decision Tree | 85.37 | 85.39 | 85.47 | 85.37 |
| Logistic Regression | 91.96 | 91.96 | 92.01 | 91.96 |
| Support Vector Machine | 93.32 | 93.32 | 93.33 | 93.32 |
| Gradient Boost | 86.96 | 86.96 | 87.24 | 86.96 |
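The classical baselines in the table above can be approximated with a standard TF-IDF feature pipeline. The scikit-learn sketch below is illustrative only: the paper does not detail its preprocessing or hyperparameters, and the four example tweets stand in for the 6420-tweet training set described in the dataset table above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the COVID-19 fake-news tweets (0 = fake, 1 = real).
texts = [
    "vaccine cures covid in one day",
    "who reports new covid case figures",
    "drinking bleach kills the virus",
    "cdc updates mask guidance for schools",
]
labels = [0, 1, 0, 1]

# TF-IDF unigrams/bigrams feeding a logistic-regression classifier,
# mirroring the Logistic Regression baseline row of the table.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
pred = clf.predict(["officials publish covid figures"])
```

Swapping the final estimator for `DecisionTreeClassifier`, `SVC`, or `GradientBoostingClassifier` would cover the other baseline rows.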
Deep learning models: results on the test dataset
| Model | TN | FN | FP | TP | Accuracy | F1-score | Precision | Recall |
|---|---|---|---|---|---|---|---|---|
| ALBERT | 937 | 83 | 62 | 1058 | 93.22 | 93.59 | 94.46 | 92.73 |
| DistilBERT | 988 | 32 | 20 | 1100 | 97.57 | 97.69 | 98.21 | 97.17 |
| BERT | 988 | 32 | 13 | 1107 | 97.90 | 98.00 | 98.84 | 97.19 |
| BERTweet-COVID-19 | 992 | 28 | 16 | 1104 | 97.94 | 98.05 | 98.57 | 97.53 |
| CT-BERT | 993 | 27 | 11 | 1109 | 98.22 | 98.32 | 99.02 | 97.62 |
| RoBERTa | 1002 | 18 | 13 | 1107 | 98.55 | 98.62 | 98.84 | 98.40 |
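The Accuracy, Precision, Recall, and F1 figures in the tables above follow directly from the confusion-matrix counts. A small sketch (the `metrics` helper is illustrative, not from the paper) reproduces the RoBERTa row:

```python
def metrics(tn: int, fn: int, fp: int, tp: int) -> dict:
    """Derive the reported evaluation metrics (in %) from TN/FN/FP/TP counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": round(accuracy * 100, 2),
        "precision": round(precision * 100, 2),
        "recall": round(recall * 100, 2),
        "f1": round(f1 * 100, 2),
    }

# RoBERTa row from the table above: TN=1002, FN=18, FP=13, TP=1107
roberta = metrics(tn=1002, fn=18, fp=13, tp=1107)
```

Running this yields accuracy 98.55, precision 98.84, recall 98.40, and F1 98.62, matching the RoBERTa row.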
Fig. 2 Deep learning models' performance in terms of evaluation metrics
Performance comparison: proposed model versus existing models
| Model | F1-score | Accuracy |
|---|---|---|
| Decision Tree [ | 85.39 | 85.37 |
| Gradient Boost [ | 86.96 | 86.96 |
| Logistic Regression [ | 91.96 | 91.96 |
| Support Vector Machine [ | 93.32 | 93.32 |
| (Baseline) | | |
| XLNet + LDA [ | 96.70 | 96.60 |
| Ensemble [ | 94.00 | 93.90 |
| CT-BERT + hard voting [ | 98.69 | 98.50 |
| Proposed model (FBEDL) | 98.93 | 98.88 |
FBEDL model results from the test dataset
| F1-score | Accuracy | Recall | Precision |
|---|---|---|---|
| 98.93 | 98.88 | 98.75 | 99.11 |
Fig. 3 Performance: proposed model versus state-of-the-art models