Myeong Gyu Kim, Minjung Kim, Jae Hyun Kim, Kyungim Kim.
Abstract
Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of COVID-19, garlic-related misinformation is spreading through social media, including Twitter. Bidirectional Encoder Representations from Transformers (BERT) can be used to classify misinformation from a vast number of tweets. This study aimed to apply the BERT model for classifying misinformation on garlic and COVID-19 on Twitter, using 5929 original tweets mentioning garlic and COVID-19 (4151 for fine-tuning, 1778 for testing). Tweets were manually labeled as 'misinformation' or 'other.' We fine-tuned five BERT models (BERTBASE, BERTLARGE, BERTweet-base, BERTweet-COVID-19, and BERTweet-large) using either a general COVID-19 rumor dataset or a garlic-specific dataset. Accuracy and F1 score were calculated to evaluate the performance of the models. The BERT models fine-tuned with the COVID-19 rumor dataset showed poor performance, with a maximum accuracy of 0.647. BERT models fine-tuned with the garlic-specific dataset showed better performance. BERTweet models achieved an accuracy of 0.897-0.911, while BERTBASE and BERTLARGE achieved an accuracy of 0.887-0.897. BERTweet-large showed the best performance, with a maximum accuracy of 0.911 and an F1 score of 0.894. Thus, BERT models fine-tuned with the garlic-specific dataset showed good performance in classifying misinformation. The results of our study will help detect misinformation related to garlic and COVID-19 on Twitter.
Keywords: COVID-19; Twitter; bidirectional encoder representations from transformers (BERT); garlic; misinformation
Year: 2022 PMID: 35564518 PMCID: PMC9103576 DOI: 10.3390/ijerph19095126
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. Study flow diagram.
Figure 2. Pipeline of BERT-based models for classification task.
BERT models used in the study.
| Model | Number of Parameters | Pretraining Data |
|---|---|---|
| BERTBASE | 110 M | 2500 M words from English Wikipedia and 800 M words from BookCorpus |
| BERTLARGE | 340 M | Same as BERTBASE |
| BERTweet-base | 135 M | 850 M English tweets |
| BERTweet-COVID-19 | 135 M | 23 M COVID-19 English tweets |
| BERTweet-large | 355 M | 873 M English tweets |
Classification performance of BERT models.
| Model | COVID-19 Rumor Dataset: Accuracy | COVID-19 Rumor Dataset: F1 Score | Garlic-Specific Dataset: Accuracy | Garlic-Specific Dataset: F1 Score |
|---|---|---|---|---|
| TF-IDF with naïve Bayes | - | - | 0.839 | 0.799 |
| BERTBASE | 0.620 | 0.399 | 0.887 | 0.864 |
| BERTLARGE | 0.621 | 0.570 | 0.897 | 0.874 |
| BERTweet-base | 0.622 | **0.589** | 0.897 | 0.876 |
| BERTweet-COVID-19 | **0.647** | 0.588 | 0.901 | 0.880 |
| BERTweet-large | 0.626 | 0.563 | **0.911** | **0.894** |

* Bold indicates the highest score in each column. Column headings give the fine-tuning dataset followed by the metric.
Figure 3. Precision-recall curve (BERTweet-large model).
Figure 4. t-SNE visualization of the first hidden-layer embeddings (upper left) and the last hidden-layer embeddings (upper right) before fine-tuning, and of the first hidden-layer embeddings (lower left) and the last hidden-layer embeddings (lower right) after fine-tuning.
Predicted results with BERTweet-large model.
| Category | N * | Type | Example Tweet (Paraphrased to Ensure Anonymity) |
|---|---|---|---|
| True positive | 682 | - | “Good news!!! Corona virus can be cured with a bowl of freshly boiled garlic water. An old Chinese doctor proved its effectiveness. Many patients have also confirmed that this is effective.” |
| True negative | 939 | True information | “COVID-19 Updates: While Garlic is a healthy food with some antimicrobial properties, there is no evidence that eating garlic has protected people from coronavirus. - WHO” |
| | | Sarcasm | “To protect against the corona virus, eat two cloves of garlic and raw onion every morning and evening. Basically, it’s useless but will keep everyone at a safe distance.” |
| | | Irrelevant information | “Corona Food Diaries Day 11: Kale and Pasta in a Garlic Parmesan white wine sauce topped with grilled chicken.” |
| False positive | 66 | - | “We have ginger, turmeric, garlic, and black pepper in our diet regularly. They protect us from common cold, cough, sore throat, and lung mucus. But it is unknown whether they can provide protection against the corona virus.” |
| False negative | 91 | - | “Corona Virus or COVID-19 does not resist ginger and garlic. Regions using much garlic and ginger have recorded insignificant number of cases of this outbreak.” |
* Number of tweets in each prediction category, out of 1778 test tweets.
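
The four counts in the table above are enough to recompute accuracy and F1 score by hand. The following is a minimal sketch, not the authors' evaluation code: it assumes 'misinformation' is the positive class and that F1 is the plain binary F1 for that class, and the function name `binary_metrics` is hypothetical. The source does not state how F1 was averaged, so the values this yields (accuracy of roughly 0.91, F1 of roughly 0.90) are close to, but not necessarily identical to, the reported 0.911 and 0.894.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # share of all tweets classified correctly
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the prediction table for BERTweet-large (1778 test tweets).
# Assumption: 'misinformation' is treated as the positive class.
counts = {"tp": 682, "tn": 939, "fp": 66, "fn": 91}

metrics = binary_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Working from raw counts rather than from the rounded scores also exposes precision and recall separately, which shows whether errors lean toward false alarms (66 false positives) or missed misinformation (91 false negatives).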