Lynnette Hui Xian Ng, Kathleen M Carley
Abstract
The 2020 coronavirus pandemic has heightened the need to flag coronavirus-related misinformation, and fact-checking groups have taken to verifying misinformation on the Internet. We explore stories reported by the fact-checking groups PolitiFact, Poynter and Snopes from January to June 2020. We categorise these stories into six clusters, then analyse temporal trends in story validity and the level of agreement across sites. The sites present the same stories 78% of the time, with the highest agreement between Poynter and PolitiFact. We further break down the story clusters into more granular story types by proposing a novel automated method, which can be used to classify diverse story sources in both fact-checked stories and tweets. Our results show that story type classification performs best when trained on the same medium, with contextualised BERT vector representations outperforming a Bag-of-Words classifier.
Keywords: Coronavirus; Fact checking; Misinformation; Social cybersecurity; Text classification
Year: 2021 PMID: 33935583 PMCID: PMC8072300 DOI: 10.1007/s10588-021-09329-w
Source DB: PubMed Journal: Comput Math Organ Theory ISSN: 1381-298X Impact factor: 2.023
Summary of stories
| Fact-checking site | Number of stories |
|---|---|
| Poynter (coronavirus misinformation) | 6139 |
| Snopes | 151 |
| PolitiFact | 441 |
Data fields
| Data field | Explanation |
|---|---|
| Article Id | Unique ID, if given by the website; otherwise self-generated |
| Date reported | Date of story if available; otherwise date the story was highlighted |
| Validity | Truthfulness of the story |
| Story | Story to be fact checked |
| Elaboration | Explanation supporting the validity rating of the story |
| Medium | Medium on which the story originated (e.g. Facebook, Twitter, WhatsApp) |
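The data fields above map naturally onto a simple record type. The sketch below is a hypothetical Python schema, not the authors' code: the class name `FactCheckedStory`, the field types, and the hash-based fallback ID are all assumptions, since the source only says that a missing ID is "self-generated".

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FactCheckedStory:
    """Hypothetical record type mirroring the data fields table."""

    story: str                            # story to be fact checked
    validity: str                         # rating as given by the fact-checking site
    elaboration: str = ""                 # explanation supporting the rating
    medium: Optional[str] = None          # e.g. "Facebook", "Twitter", "WhatsApp"
    date_reported: Optional[date] = None  # story date, else date it was highlighted
    article_id: Optional[str] = None      # ID given by the website, if any

    def __post_init__(self) -> None:
        # Assumption: when the site supplies no ID, derive a stable one from the
        # story text so that the same story always receives the same ID.
        if not self.article_id:
            digest = hashlib.sha1(self.story.encode("utf-8")).hexdigest()
            self.article_id = "gen-" + digest[:12]


story = FactCheckedStory(story="Tide pods make you immune to covid19",
                         validity="False", medium="Twitter")
print(story.article_id)
```

Hashing the story text (rather than, say, a running counter) is one way to keep generated IDs reproducible across scrapes of the same site.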
Harmonisation metric for story validity
| Harmonised validity | Explanation | Variations on fact-checking sites |
|---|---|---|
| True | Can be verified by a trusted source (e.g. Centers for Disease Control and Prevention, peer-reviewed papers) | Correct, Correct Attribution, True |
| Partially true | Contains verifiable true facts and facts that cannot be verified | Half true, Half truth, Mixed, Mixture, Mostly True, Partially True, Partly True, Partially correct, True but |
| Partially false | Contains verifiable false facts and facts that cannot be verified | Mostly False, Partly False, Partially False, Two Pinocchios |
| False | Can be disputed, or has been disputed as false, by a trusted source or by the organisation/person in the claim | False, Falseo, Fake, Misleading, Pants on fire, Pants-fire, Scam, Barely-true |
| Unknown | Cannot be verified or disputed | Org. doesn’t apply rating, In dispute, No evidence, Unproven, Unverified, Suspicions |
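The harmonisation table is effectively a lookup from each site's rating to one of five labels. A minimal Python sketch of that lookup follows; the case-insensitive matching and the fallback to `Unknown` for unlisted ratings are assumptions, not rules stated in the source.

```python
from typing import Dict, List

# Site-specific ratings grouped by harmonised label (copied from the table above).
VARIANTS: Dict[str, List[str]] = {
    "True": ["Correct", "Correct Attribution", "True"],
    "Partially true": ["Half true", "Half truth", "Mixed", "Mixture", "Mostly True",
                       "Partially True", "Partly True", "Partially correct", "True but"],
    "Partially false": ["Mostly False", "Partly False", "Partially False",
                        "Two Pinocchios"],
    "False": ["False", "Falseo", "Fake", "Misleading", "Pants on fire", "Pants-fire",
              "Scam", "Barely-true"],
    "Unknown": ["Org. doesn't apply rating", "In dispute", "No evidence", "Unproven",
                "Unverified", "Suspicions"],
}


def normalise(label: str) -> str:
    """Lowercase, straighten curly apostrophes and collapse whitespace."""
    return " ".join(label.replace("\u2019", "'").lower().split())


# Reverse index: normalised site rating -> harmonised label.
LOOKUP: Dict[str, str] = {
    normalise(variant): harmonised
    for harmonised, variants in VARIANTS.items()
    for variant in variants
}


def harmonise(site_rating: str) -> str:
    # Assumption: ratings not listed in the table are treated as "Unknown".
    return LOOKUP.get(normalise(site_rating), "Unknown")


print(harmonise("Pants on fire"))  # prints: False
```

Building the reverse index once keeps each lookup constant-time, and new site-specific ratings can be supported by extending `VARIANTS` alone.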
Fig. 1 Three story type categorization process flows
Fig. 2 Story clusters
Performance of story validity classifier variants (F1 score)
| Cluster | BOW + Naive Bayes | BOW + SVM | BOW + Logistic regression | BERT + SVM | BERT + Logistic regression |
|---|---|---|---|---|---|
| 1 | 0.90 | 0.90 | 0.92 | 0.92 | 0.90 |
| 2 | 0.85 | 0.86 | 0.88 | 0.85 | 0.85 |
| 3 | 0.82 | 0.83 | 0.86 | 0.82 | 0.84 |
| 4 | 0.85 | 0.88 | 0.88 | 0.85 | 0.84 |
| 5 | 0.90 | 0.90 | 0.90 | 0.90 | 0.89 |
| 6 | 0.87 | 0.87 | 0.88 | 0.85 | 0.88 |
| Average | 0.87 | 0.87 | 0.89 | 0.87 | 0.87 |
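As an illustration of the simplest variant in the table (BOW + Naive Bayes), the following is a dependency-free sketch of a multinomial Naive Bayes classifier over bag-of-words counts. The tokeniser, the Laplace smoothing with `alpha=1.0`, and the toy training examples are assumptions for illustration; this is not the authors' implementation, and in practice a library such as scikit-learn (`CountVectorizer` with `MultinomialNB`) would normally be used instead.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence, Set


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase alphanumeric tokens; the paper's preprocessing may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


class BowNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "BowNaiveBayes":
        # Accumulate document counts per class and word counts per class.
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens: List[str], label: str) -> float:
        # log P(label) + sum of log P(token | label) over known tokens.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are ignored
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Hypothetical toy validity labels, loosely modelled on the examples in this record.
train = [
    ("garlic cures the virus", "False"),
    ("drinking bleach cures covid19", "False"),
    ("china is building a hospital", "True"),
    ("france stops using the drug after a study", "True"),
]
model = BowNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("bleach cures the virus"))  # prints: False
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.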
Level of agreement across fact-checking sites
| Cluster | Snopes | Snopes | PolitiFact |
|---|---|---|---|
| 1 | 0.04 | 0.26 | 0.70 |
| 2 | 0.22 | 0.31 | 0.47 |
| 3 | 0.13 | 0.17 | 0.70 |
| 4 | 0.10 | 0.00 | 1.00 |
| 5 | 0.00 | 0.00 | 1.00 |
| 6 | 0.02 | 0.04 | 0.94 |
| Avg | 0.085 | 0.13 | 0.80 |
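Each Avg value equals the unweighted mean of the six per-cluster values in its column (clusters are not weighted by their number of stories), which can be checked directly. The dictionary keys are hypothetical labels chosen because the source table repeats the "Snopes" header.

```python
from statistics import mean

# Per-cluster agreement values for clusters 1-6, copied from the table above.
# Keys are hypothetical column labels (the source repeats the "Snopes" header).
AGREEMENT = {
    "column 1 (Snopes)": [0.04, 0.22, 0.13, 0.10, 0.00, 0.02],
    "column 2 (Snopes)": [0.26, 0.31, 0.17, 0.00, 0.00, 0.04],
    "column 3 (PolitiFact)": [0.70, 0.47, 0.70, 1.00, 1.00, 0.94],
}

# Unweighted mean over clusters; compare with the table's Avg row.
averages = {name: mean(values) for name, values in AGREEMENT.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
```

This prints 0.085, 0.130 and 0.802; the last value rounds to the table's 0.80.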
Performance of story type classification
| Setting | Precision | Recall | F1-score |
|---|---|---|---|
| Stories trained on stories (BERT) | 0.59 | 0.59 | 0.58 |
| Stories trained on stories (BERT-enhanced) | 0.55 | 0.39 | 0.45 |
| Stories trained on stories (BOW) | 0.56 | 0.43 | 0.48 |
| Stories trained on Tweets (BERT) | 0.10 | 0.11 | 0.09 |
| Stories trained on Tweets (BERT-enhanced) | 0.07 | 0.07 | 0.07 |
| Stories trained on Tweets (BOW) | 0.06 | 0.05 | 0.05 |
| Tweets trained on stories (BERT) | 0.12 | 0.14 | 0.13 |
| Tweets trained on stories (BERT-enhanced) | 0.10 | 0.08 | 0.09 |
| Tweets trained on stories (BOW) | 0.07 | 0.03 | 0.05 |
| Tweets trained on Tweets (BERT) | 0.43 | 0.43 | 0.43 |
| Tweets trained on Tweets (BERT-enhanced) | 0.39 | 0.29 | 0.33 |
| Tweets trained on Tweets (BOW) | 0.35 | 0.22 | 0.27 |
| Stories random baseline | 0.12 | 0.12 | 0.12 |
| Tweets random baseline | 0.16 | 0.16 | 0.16 |
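The table reports precision, recall and F1 without stating the averaging scheme. Assuming macro averaging (a common choice for multi-class tasks), the metrics can be computed as in the sketch below; `macro_prf` is a hypothetical helper, not code from the paper. Under macro averaging, F1 is averaged per class, so it need not equal the harmonic mean of the reported precision and recall.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Hypothetical helper: macro-averaged precision, recall and F1.

    Each metric is computed per class and then averaged with equal class weight,
    over every class that appears in either y_true or y_pred.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        r = true_pos[c] / true_count[c] if true_count[c] else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
        precisions.append(p)
        recalls.append(r)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


p, r, f1 = macro_prf(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(round(p, 3), round(r, 3), round(f1, 3))  # prints: 0.833 0.75 0.733
print(round(2 * p * r / (p + r), 3))  # harmonic mean of macro P and R: 0.789
```

The last line shows that macro F1 (0.733) can differ from the harmonic mean of macro precision and recall (0.789). If scikit-learn is available, `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, average="macro")` computes the same quantities.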
Sample story type categories and examples
| Category | Story from fact-checking sites | Tweet |
|---|---|---|
| Conspiracy | Is the umbrella corporation logo oddly similar to a wuhan biotech lab? Chinese scientists expelled from a Canadian microbiology lab took the novel strain with them to china | Utter rubbish. Wuhan bioweapon exclusive covid19 may not have originated in china my driver says covid19 is a conspiracy to kill people in order to get money |
| Commercial activity/promotion | Can you get free baby formula during covid19 crisis by calling the company? | Careful Cleaning and disinfecting will help rid your home of the coronavirus |
| Correction/calling out | Its time to debunk claims that vitamin c could cure it in the midst of the novel outbreak | Garlic may be healthy otherwise but it won’t prevent you from fake news hi [...] I know it wasn’t your intention but your tweet joking about 5g and contains misinformation |
| Fake Cure | Turkish doctor allegedly found vaccine Romiania developed a vaccine able to cure white people only | Hydroxychloroquine works its actually worked for 60 years my mum ginger water recipe can cure covid19 for real |
| False fact/prevention | Food products such as rice fortune cookies, mi goreng noodles, ice tea and Chinese red bull are contaminated in Australia | Tide pods actually make you immune to covid19 |
| Politics | Did president trump cut the cdc budget as the new coronavirus spread in February 2020? | In America the president advocates drinking bleach to cure covid19 [...] |
| True Public Health Responses | China is building a hospital for new patients | France will no longer use hxc to treat covid19 patients after study suggests it poses health risks |
| Case Occurrences | A doctor tested positive for the 2019 novel nvov at the Makati medical center In china, more than 30 million quarantined [...] | Health authorities have identified a new virus behind the death of one man and dozens falling ill |