Chen Yang, Xinyi Zhou, Reza Zafarani.
Abstract
COVID-19 has impacted everyone's life. To maintain social distancing and avoid exposure, work and daily life have gradually moved online. Under this trend, the use of social media to obtain COVID-19 news has increased. At the same time, COVID-19 misinformation frequently spreads on social media. In this work, we develop CHECKED, the first Chinese dataset on COVID-19 misinformation. CHECKED provides a total of 2,104 verified microblogs related to COVID-19, posted between December 2019 and August 2020 and identified using a specific list of keywords. Correspondingly, CHECKED includes 1,868,175 reposts, 1,185,702 comments, and 56,852,736 likes that reveal how these verified microblogs spread and how users react to them on Weibo. For each microblog, the dataset contains a rich set of multimedia information, including its ground-truth label and textual, visual, temporal, and network information. Extensive experiments have been conducted to analyze CHECKED data and to provide benchmark results for well-established methods when predicting fake news using CHECKED. We hope that CHECKED can facilitate studies that target misinformation on the coronavirus. The dataset is available at https://github.com/cyang03/CHECKED.
Keywords: COVID-19; Dataset; Fake news; Infodemic; Information credibility; Multimedia; Social media
Year: 2021 PMID: 34178179 PMCID: PMC8217979 DOI: 10.1007/s13278-021-00766-8
Source DB: PubMed Journal: Soc Netw Anal Min
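The abstract lists the kinds of information CHECKED attaches to each microblog: a ground-truth label plus textual, visual, temporal, and network information, along with reposts, comments, and likes. As a minimal sketch, assuming a flat in-memory representation, one record could be modeled as follows. The class name, every field name, and the `engagement` helper are hypothetical and do not reflect the actual file layout in the CHECKED repository.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Microblog:
    """Hypothetical in-memory record for one CHECKED microblog.

    Field names are illustrative only; they are NOT taken from the
    dataset's actual files.
    """
    mid: str                # hypothetical microblog identifier
    label: str              # ground-truth label, e.g. "real" or "fake"
    text: str               # textual information
    posted_at: datetime     # temporal information
    image_urls: List[str] = field(default_factory=list)       # visual information
    video_url: str = ""                                        # visual information (empty if none)
    repost_user_ids: List[str] = field(default_factory=list)  # network information: who reposted
    comments: List[str] = field(default_factory=list)         # user reactions
    likes: int = 0                                             # like count

    def engagement(self) -> int:
        """Total reactions: number of reposts + number of comments + likes."""
        return len(self.repost_user_ids) + len(self.comments) + self.likes


post = Microblog(
    mid="example-1",
    label="fake",
    text="example text",
    posted_at=datetime(2020, 2, 1, 12, 0),
    repost_user_ids=["u1", "u2"],
    comments=["c1"],
    likes=10,
)
print(post.engagement())  # 13
```

Keeping reposts as a list of user IDs (rather than a bare count) is one way to preserve the network information the abstract mentions, since it allows a repost graph to be rebuilt later.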
List of Keywords Relevant to COVID-19
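According to the abstract, the microblogs were identified using a specific list of keywords. A minimal sketch of such filtering, assuming plain case-insensitive substring matching and a short hypothetical keyword list (the paper's actual list is not reproduced here), could look like the following. Substring matching is used because Chinese text has no whitespace between words, so naive whitespace tokenization would miss most matches.

```python
from typing import Iterable, List

# Hypothetical keyword list for illustration only; not the CHECKED list.
# 新冠 = "novel coronavirus", 肺炎 = "pneumonia", 疫情 = "epidemic".
KEYWORDS = ["新冠", "肺炎", "疫情", "COVID-19"]


def matches_keywords(text: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs as a substring of the text (case-insensitive)."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


def filter_microblogs(texts: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep only the texts that contain at least one keyword."""
    kws = list(keywords)  # materialize once so the iterable can be reused per text
    return [t for t in texts if matches_keywords(t, kws)]


posts = [
    "武汉发现新冠肺炎病例",  # "COVID-19 cases found in Wuhan"
    "今天天气很好",          # "The weather is nice today"
    "covid-19 vaccine news",
]
print(filter_microblogs(posts, KEYWORDS))
```

With these inputs, the first and third posts are kept and the weather post is dropped.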
Fig. 1 Illustrations of Collected Microblogs
Statistics of CHECKED Data
| | Real | Fake | All |
|---|---|---|---|
| # Microblogs | 1,760 | 344 | 2,104 |
| # Microblogs with images | 1,149 | 53 | 1,202 |
| # Microblogs with video | 563 | 106 | 669 |
| # Microblogs with reposts | 1,151 | 229 | 1,380 |
| # Microblogs with comments | 1,151 | 292 | 1,443 |
| # Reposts of microblogs | 1,827,817 | 40,358 | 1,868,175 |
| # Comments on microblogs | 1,169,246 | 16,456 | 1,185,702 |
| # Likes of microblogs | 56,407,610 | 445,116 | 56,852,726 |
| # Weibo users | 686,077 | 51,674 | 737,751 |
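The statistics are class-imbalanced: fake microblogs make up roughly one sixth of the dataset. The short sketch below uses only the counts in the table to check that each Real and Fake pair sums to All, to compute the fake share, and to compute per-microblog engagement averages for each class. The helper names are illustrative, and the averages divide by every microblog in a class, including those with no reposts or comments.

```python
# Per-class counts (Real, Fake, All) copied from the statistics table.
STATS = {
    "microblogs": (1_760, 344, 2_104),
    "reposts": (1_827_817, 40_358, 1_868_175),
    "comments": (1_169_246, 16_456, 1_185_702),
    "likes": (56_407_610, 445_116, 56_852_726),
}


def check_totals(stats):
    """Return the names of rows whose Real + Fake count differs from All."""
    return [name for name, (real, fake, total) in stats.items() if real + fake != total]


def per_microblog(stats, row):
    """Average count per microblog for each class, as (real_avg, fake_avg)."""
    n_real, n_fake, _ = stats["microblogs"]
    real, fake, _ = stats[row]
    return real / n_real, fake / n_fake


n_real, n_fake, n_all = STATS["microblogs"]
print(f"fake share: {n_fake / n_all:.1%}")  # fake share: 16.3%
print(check_totals(STATS))                  # []
for row in ("reposts", "comments", "likes"):
    real_avg, fake_avg = per_microblog(STATS, row)
    print(f"{row}: real {real_avg:.1f}, fake {fake_avg:.1f}")
```

On these counts, real microblogs attract far more reposts, comments, and likes per post than fake ones. The roughly 84/16 split is also worth keeping in mind when reading the benchmark results, since plain accuracy can look high for a classifier that ignores the fake class.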
Fig. 2 Distribution of Selected Keywords in Collected Microblogs
Fig. 3 Word Cloud
Fig. 4 Distribution of Words
Fig. 5 Distribution of Dates Posted
Fig. 6 Distribution of Images
Fig. 7 Distribution of Comments
Fig. 8 Distribution of Reposts
Fig. 9 Distribution of Likes
Benchmark Results Using CHECKED Data to Detect Fake News
| Metric | FastText | TextCNN | TextRNN | Att-TextRNN | Transformer |
|---|---|---|---|---|---|
| Macro | 0.839 | 0.938 | 0.700 | 0.871 | 0.927 |
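The benchmark table reports a single row labeled "Macro". Assuming it denotes a macro-averaged F1 score (the label alone does not confirm which metric is averaged), the sketch below shows how that quantity is computed: per-class F1 scores are averaged without weighting, so the 344 fake microblogs count as much as the 1,760 real ones. The label strings "real" and "fake" and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """One-vs-rest F1 score for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Tuple[str, ...] = ("real", "fake")) -> float:
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy example: three real items and one fake item.
y_true = ["real", "real", "real", "fake"]
y_pred = ["real", "real", "fake", "fake"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733

# A trivial classifier that labels everything "real", on CHECKED's class sizes.
y_all = ["real"] * 1760 + ["fake"] * 344
always_real = ["real"] * len(y_all)
accuracy = sum(t == p for t, p in zip(y_all, always_real)) / len(y_all)
print(round(accuracy, 3), round(macro_f1(y_all, always_real), 3))  # 0.837 0.455
```

The trivial classifier reaches about 0.84 accuracy but only about 0.46 macro F1, which illustrates why a macro-averaged score is a more informative benchmark on imbalanced data like CHECKED.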