Shankar Biradar1, Sunil Saumya1, Arun Chauhan2.
Abstract
COVID-19 has caused havoc globally because of its rapid transmission and the steep rise in the number of people contracting the disease worldwide. As a result, more people are seeking information about the epidemic through Internet media. The prevailing hysteria leads people to believe and share anything related to the illness without questioning its truthfulness, which has amplified the spread of misinformation about the disease on social media networks. There is now a more pressing need than ever to restrict the dissemination of false news. This paper presents an early fusion-based method that combines key features extracted from context-based embeddings such as BERT, XLNet, and ELMo to capture more context and semantic information from social media posts and achieve higher accuracy in false news identification. We observed that the proposed early fusion-based method outperforms models that work on single embeddings. We also conducted detailed studies using several machine learning and deep learning models to classify COVID-19-related misinformation on social media platforms. For this work, we used the dataset of the "CONSTRAINT shared task 2021". Our research shows that language and ensemble models are well suited to this task, reaching 97% accuracy.
Keywords: COVID-19; Contextual embedding; Fake news; Machine learning; Social networks; Voting classifier
Year: 2022 PMID: 35194546 PMCID: PMC8855031 DOI: 10.1007/s40747-022-00672-2
Source DB: PubMed Journal: Complex Intell Systems ISSN: 2199-4536
Dataset distribution
| Split | Real | Fake | Total |
|---|---|---|---|
| Train | 3360 | 3060 | 6420 |
| Validation | 1120 | 1020 | 2140 |
| Test | 1120 | 1020 | 2140 |
| Total | 5600 | 5100 | 10700 |
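
The split sizes in the table above can be cross-checked with a few lines of arithmetic. The sketch below only restates the counts from the table; the variable names are illustrative and not taken from the paper.

```python
# Counts copied from the dataset distribution table: split -> (real, fake).
splits = {
    "train": (3360, 3060),
    "validation": (1120, 1020),
    "test": (1120, 1020),
}

# Total number of posts and total number of posts labelled Real.
total = sum(real + fake for real, fake in splits.values())
total_real = sum(real for real, _ in splits.values())

# Share of the whole dataset that falls into each split.
split_share = {
    name: (real + fake) / total for name, (real, fake) in splits.items()
}

# Share of posts labelled Real across all splits.
real_share = total_real / total

for name, share in split_share.items():
    print(f"{name}: {share:.0%} of {total} posts")
print(f"real posts overall: {real_share:.1%}")
```

The counts correspond to a 60/20/20 train/validation/test split, with roughly 52% of the posts labelled Real.
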
Sample sentences from dataset
| Sentence | Label |
|---|---|
| The CDC currently reports 99031 deaths. In general, the discrepancies in death counts between different sources are small and explicable. The death toll stands at roughly 100000 people today. | Real |
| CDC Recommends Mothers Stop Breastfeeding To Boost Vaccine Efficacy | Fake |
| The WHO confirmed asymptomatic persons can’t transmit the coronavirus and are not infectious | Fake |
| The confirmation earlier today of a second death linked to COVID-19 in the last two days means the number of COVID-19-related deaths in New Zealand are now 24. | Real |
Test set results of traditional algorithms on different embeddings
| Classifier | BERT Embedding: Acc | BERT Embedding: Fake / Real | XLNet Embedding: Acc | XLNet Embedding: Fake / Real | ELMo Embedding: Acc | ELMo Embedding: Fake / Real |
|---|---|---|---|---|---|---|
| LR | 0.9257 | 0.92 / 0.93 | 0.8313 | 0.82 / 0.84 | 0.9038 | 0.90 / 0.90 |
| NB | 0.8759 | 0.87 / 0.88 | 0.7886 | 0.78 / 0.79 | 0.70166 | 0.71 / 0.69 |
| SVM | 0.9241 | 0.92 / 0.93 | 0.8053 | 0.79 / 0.83 | 0.9044 | 0.90 / 0.90 |
| RF | 0.9123 | 0.91 / 0.91 | 0.8273 | 0.82 / 0.84 | 0.8694 | 0.87 / 0.86 |
| KNN | 0.9143 | 0.91 / 0.91 | 0.8012 | 0.79 / 0.81 | 0.88 | 0.86 / 0.87 |
| ENSEMBLE | 0.93 | 0.93 / 0.93 | 0.85 | 0.84 / 0.85 | 0.88 | 0.87 / 0.88 |
Fig. 1 Early fusion-based DNN model architecture
Fig. 2 RNNs’ ensemble model architecture
Fig. 3 Voting classifier architecture
Fig. 4 Multi-level bit-wise OR model architecture
Test set results of language and voting classifier models (Model 3)
| Classifier | Acc (%) | Per-class (%) |
|---|---|---|
| BERT classifier | 97 | 97-Fake, 98-Real |
| ULMFit classifier | 96 | 96-Fake, 96-Real |
| LR, ULMFit classifier, BERT classifier | 98 | 98-Fake, 98-Real |
| LR, KNN, BERT classifier | 96 | 96-Fake, 96-Real |
| LR, SVM, RF, KNN and BERT classifier | 95 | 95-Fake, 95-Real |
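
The combinations in the table above merge the outputs of several base classifiers by voting. The record does not say whether hard or soft voting is used, so the following is a minimal sketch that assumes hard (majority) voting. The function names and the toy predictions are hypothetical and only illustrate the mechanism.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most classifiers for a single post."""
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return label


def vote_all(per_model_predictions):
    """Apply hard voting post by post.

    per_model_predictions maps a model name to its list of predicted
    labels; all lists must cover the same posts in the same order.
    """
    lengths = {len(preds) for preds in per_model_predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of posts")
    columns = zip(*per_model_predictions.values())
    return [majority_vote(list(column)) for column in columns]


# Hypothetical outputs of three base classifiers on four posts.
example = {
    "LR": ["Fake", "Real", "Real", "Fake"],
    "ULMFit": ["Fake", "Fake", "Real", "Real"],
    "BERT": ["Fake", "Real", "Fake", "Real"],
}

fused = vote_all(example)
print(fused)
```

With two labels and an odd number of voters, as in every combination listed in the table, a majority always exists, so the sketch does not need a tie-breaking rule.
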
Parameters for ML models
| Classifier | Hyperparameter |
|---|---|
| Logistic regression | C=1, max-iter=500 |
| Random forest | no-of-estimators=200 |
| Naive Bayes | var-smoothing=1e-09 |
| Support vector machine | c=1, solver=‘lbfgs’ |
| K nearest neighbors | n-neighbors=24 |
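
The hyperparameters above can be written down as classifier objects. The record does not name the library, so the sketch below assumes scikit-learn, whose parameter names (`max_iter`, `n_estimators`, `var_smoothing`, `n_neighbors`) resemble those in the table. The choice of `GaussianNB` for Naive Bayes and `SVC` for the support vector machine is also an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# One estimator per table row; the dictionary keys mirror the table labels.
models = {
    "Logistic regression": LogisticRegression(C=1, max_iter=500),
    "Random forest": RandomForestClassifier(n_estimators=200),
    # Assumption: var-smoothing points to the Gaussian variant.
    "Naive Bayes": GaussianNB(var_smoothing=1e-9),
    # SVC accepts no `solver` argument, so only C=1 is carried over.
    "Support vector machine": SVC(C=1),
    "K nearest neighbors": KNeighborsClassifier(n_neighbors=24),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```

Each estimator would then be fitted on the embedding vectors of the training posts and evaluated on the test split.
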
Fig. 5 Comparative analysis of ML algorithms on different embeddings
Fig. 6 Performance analysis using different activation functions
Test set results of early fusion-based DNN models (Model 1)
| Classifier | Acc (%) | Per-class (%) |
|---|---|---|
| BERT embedding+DNN | 91.58 | 92-Fake, 91-Real |
| ELMo embedding+DNN | 92.16 | 92-Fake, 92-Real |
| XLNet embedding+DNN | 83.13 | 83-Fake, 84-Real |
| BERT+ELMo embeddings+DNN | 92.61 | 93-Fake, 92-Real |
| XLNet+BERT embeddings+DNN | 90.66 | 90-Fake, 89-Real |
| XLNet+ELMo embeddings+DNN | 91.93 | 91-Fake, 92-Real |
| BERT+XLNet+ELMo embeddings+DNN | 93 | 93-Fake, 92-Real |
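
The combined rows in the table above feed the DNN with features from more than one embedding model. Early fusion is commonly implemented by concatenating the per-post vectors before classification. The record does not give the exact fusion operator or the feature selection step, so the sketch below assumes plain concatenation. The function name and the tiny toy vectors are hypothetical.

```python
def early_fusion(*embeddings):
    """Concatenate per-post feature vectors from several embedding models.

    Each argument is a list of vectors (one vector per post). The result
    holds one fused vector per post, built by joining that post's vectors
    in argument order.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    n_posts = len(embeddings[0])
    if any(len(emb) != n_posts for emb in embeddings):
        raise ValueError("every embedding must cover the same posts")

    fused = []
    for vectors in zip(*embeddings):
        row = []
        for vector in vectors:
            row.extend(vector)
        fused.append(row)
    return fused


# Toy features for two posts; real embeddings have far more dimensions.
bert = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
xlnet = [[1.0, 1.1], [1.2, 1.3]]
elmo = [[2.0], [2.1]]

fused = early_fusion(bert, xlnet, elmo)
print(len(fused), "posts with", len(fused[0]), "fused features each")
```

The fused vectors would then be passed to a single classifier, in contrast to late fusion, where each embedding gets its own classifier and only the predictions are merged.
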
Test set results of RNN-based models (Model 2)
| Classifier | Acc (%) | Per-class (%) |
|---|---|---|
| LSTM | 91.74 | 91-Fake, 92-Real |
| BiLSTM | 91.121 | 91-Fake, 91-Real |
| GRU | 90.23 | 89-Fake, 91-Real |
| BiGRU | 91.978 | 91-Fake, 93-Real |
| Ensemble | 92 | 92-Fake, 93-Real |
Test set results for bit-wise operator models (Model 4)
| Model | Acc (%) | Per-class (%) |
|---|---|---|
| LR OR BERT | 95 | 94-Fake, 95-Real |
| ULMFit OR BERT | 96 | 96-Fake, 96-Real |
| (LR OR BERT) OR (SVM OR KNN) | 92 | 92-Fake, 93-Real |
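
The models in the table above combine binary predictions with a bit-wise OR, in one case across two levels. The sketch below shows the mechanism on toy data. The record does not state which class is encoded as 1, so the encoding used here (1 = Fake, 0 = Real) and the prediction vectors are assumptions.

```python
def bitwise_or(first, second):
    """Element-wise OR of two equally long lists of 0/1 predictions."""
    if len(first) != len(second):
        raise ValueError("prediction lists must have the same length")
    return [a | b for a, b in zip(first, second)]


# Hypothetical 0/1 predictions of four classifiers on four posts
# (assumed encoding: 1 = Fake, 0 = Real).
lr = [0, 1, 0, 0]
bert = [0, 0, 1, 0]
svm = [0, 0, 0, 0]
knn = [1, 0, 0, 0]

# Level 1: combine the classifiers pairwise.
left = bitwise_or(lr, bert)
right = bitwise_or(svm, knn)

# Level 2: combine the two intermediate results.
final = bitwise_or(left, right)
print(final)
```

Under OR, a post receives label 1 as soon as any classifier in the combination predicts 1, so adding classifiers can only increase the number of posts assigned to that class.
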
Fig. 7 Performance analysis using different datasets
Test cases for fake news
| Sample text | Model-1 | Model-2 | Model-3 | Model-4 | Target |
|---|---|---|---|---|---|
| Bill Gates said that the COVID-19 vaccine will permanently change your DNA | Fake | Fake | Fake | Fake | Fake |
| COVID-19 is caused by a bacterium, not virus and can be treated with aspirin | Fake | Real | Fake | Fake | Fake |
| EMA endorses the use of dexamethasone for COVID-19 | Fake | Fake | Real | Real | Fake |
| Thank God! new COVID-19 clusters mostly affecting low paid workers | Fake | Fake | Fake | Fake | Fake |
| A video of a television presenter where she says “thank God things get complicated” referring to the coronavirus in Germany | Real | Fake | Fake | Fake | Fake |
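
The agreement between each model and the target label on these five test cases can be tallied directly. The sketch below copies the labels from the table row by row; the helper name `accuracy` is illustrative.

```python
# Target labels and per-model predictions, copied from the table rows in order.
target = ["Fake", "Fake", "Fake", "Fake", "Fake"]
predictions = {
    "Model-1": ["Fake", "Fake", "Fake", "Fake", "Real"],
    "Model-2": ["Fake", "Real", "Fake", "Fake", "Fake"],
    "Model-3": ["Fake", "Fake", "Real", "Fake", "Fake"],
    "Model-4": ["Fake", "Fake", "Real", "Fake", "Fake"],
}


def accuracy(predicted, gold):
    """Fraction of posts on which the prediction matches the target."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


scores = {name: accuracy(labels, target) for name, labels in predictions.items()}
for name, score in scores.items():
    print(f"{name}: {score:.0%} of test cases match the target")
```

Each model matches the target on four of the five samples, although the misclassified sample differs between models.
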
Results on domain-specific embeddings
| Classifier | Embedding | Accuracy (%) | | |
|---|---|---|---|---|
| LR | ClinicalBERT | 88 | 88 | 89 |
| NB | ClinicalBERT | 80 | 80 | 80 |
| SVM | ClinicalBERT | 90 | 90 | 90 |
| RF | ClinicalBERT | 85 | 85 | 86 |
| KNN | ClinicalBERT | 85 | 85 | 86 |
| Ensemble | ClinicalBERT | 86 | 85 | 86 |
Comparative analysis of our models with some existing models
| Source | Model | Acc (%) | |
|---|---|---|---|
| [ | SVM+linguistic features | 95.19 | 95.70 |
| [ | ladiff ULMFit | 96.72 | 96.73 |
| [ | XLNet with topic distributions | 96.8 | 96.7 |
| Proposed Model 1 | Early fusion-based model | 93 | 93 |
| Proposed Model 2 | RNN-based model | 92 | 92 |
| Proposed Model 4 | Bit-wise operator-based model | 96 | 96 |