Kevin Zhan¹, Yutong Li¹, Rafay Osmani², Xiaoyu Wang³, Bo Cao¹
Abstract
Background: During the ongoing COVID-19 pandemic, we are exposed to large amounts of information every day. The World Health Organization defines this "infodemic" as the mass spread of misleading or false information during a pandemic. The spread of misinformation during the infodemic ultimately leads to misunderstandings of public health orders or to direct opposition to public policies. Although there have been efforts to curb the spread of misinformation, current manual fact-checking methods are insufficient to combat the infodemic.
Objective: We propose using natural language processing (NLP) and machine learning (ML) techniques to build a model that can identify unreliable news articles online.
Keywords: COVID-19; deep learning; ensemble model; false information; infodemic; news article reliability
Year: 2022 PMID: 36193330 PMCID: PMC9516811 DOI: 10.2196/38839
Source DB: PubMed Journal: JMIR Infodemiology ISSN: 2564-1891
Figure 1. Details of the workflow for data exploration and “new model” construction (highlighted in blue). CNN: convolutional neural network; BiGRU: bidirectional gated recurrent unit; BiLSTM: bidirectional long short-term memory; GRU: gated recurrent unit; KNN: K-nearest neighbor; LR: logistic regression; LSTM: long short-term memory; NB: naive Bayes; XGBoost: extreme gradient boosting.
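Figure 1 lists naive Bayes (NB) among the traditional ML baselines, but this excerpt does not describe its features or settings. As an illustration only, here is a toy multinomial naive Bayes text classifier with Laplace smoothing over simple word tokens. The tokenizer, the `ToyMultinomialNB` class, and the example sentences are hypothetical and are not the authors' pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: whitespace split, strip edge punctuation, lowercase.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


class ToyMultinomialNB:
    """Multinomial naive Bayes over word counts (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for the class priors
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data.
docs = [
    "miracle cure hidden by officials",
    "secret miracle remedy",
    "health officials report new cases",
    "vaccine trial results published",
]
labels = ["unreliable", "unreliable", "reliable", "reliable"]
model = ToyMultinomialNB().fit(docs, labels)
print(model.predict("officials publish trial results"))
print(model.predict("secret miracle cure"))
```

Working in log space avoids numerical underflow when long articles multiply many small word probabilities together.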
Figure 2. Number of occurrences of keywords in unreliable news articles (N=298,498 words).
Figure 3. Number of occurrences of keywords in reliable news articles (N=662,290 words).
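Figures 2 and 3 report keyword occurrence counts for each group of articles. The excerpt does not say how keywords were extracted or whether N counts all words, so the following is only a minimal sketch: it assumes regex tokenization and a small hypothetical stop-word list, reports the total word count, and returns the most frequent remaining words.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not state which words were excluded.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "that", "for", "on", "with"}


def keyword_counts(articles, top_k=10):
    """Return (total word count, top_k most frequent non-stop-words)."""
    counts = Counter()
    total_words = 0
    for text in articles:
        words = re.findall(r"[a-z0-9']+", text.lower())
        total_words += len(words)  # counts every word, including stop words
        counts.update(w for w in words if w not in STOPWORDS)
    return total_words, counts.most_common(top_k)


total, top = keyword_counts(
    ["The virus spreads. The virus mutates.", "A vaccine for the virus"], top_k=2
)
print(total, top)
```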
Top 10 lexical categories from Empath (a neural network–based topic analysis tool) in reliable and unreliable news articles, listed in descending order of the absolute t value. The reliable and unreliable means are the mean per-article counts of each lexical category in reliable and unreliable news articles, respectively.
| Lexical category | t (df) | P value | Reliable mean (SD) | Unreliable mean (SD) |
| magic | –7.91 (1992) | <.001 | 0.19 (0.60) | 0.51 (1.22) |
| power | –7.16 (1992) | <.001 | 1.28 (2.20) | 2.16 (3.24) |
| business | 7.15 (1992) | <.001 | 8.58 (10.54) | 5.31 (7.10) |
| work | 6.89 (1992) | <.001 | 5.78 (8.82) | 3.28 (3.89) |
| contentment | 6.18 (1992) | <.001 | 0.70 (1.61) | 0.29 (0.72) |
| office | 6.14 (1992) | <.001 | 3.02 (4.37) | 1.88 (2.60) |
| dispute | –6.11 (1992) | <.001 | 1.58 (2.48) | 2.35 (2.94) |
| morning | 5.87 (1992) | <.001 | 1.06 (1.87) | 0.59 (1.11) |
| legend | –5.85 (1992) | <.001 | 0.34 (0.92) | 0.64 (1.31) |
| blue collar job | 5.83 (1992) | <.001 | 0.62 (1.75) | 0.21 (0.68) |
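Every t statistic in these tables is reported with df=1992, which matches 1346+648−2 using the group sizes given in the other captions, that is, the degrees of freedom of a pooled-variance (Student) 2-sample t test. The sketch below recomputes t from published summary statistics. The P value uses a normal approximation to the t distribution, an assumption that is reasonable only because df is large. Because the published means and SDs are rounded, recomputed values differ slightly from the table: the "business" row gives about 7.15, and the "magic" row gives about −7.85 against the reported −7.91.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's 2-sample t test (equal variances) from summary statistics.

    Returns (t, df, p), where p is a two-sided P value from a normal
    approximation to the t distribution (adequate when df is large).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided standard normal tail area
    return t, df, p


# "magic" and "business" rows: reliable (N=1346) vs unreliable (N=648).
t_magic, df, p_magic = pooled_t_test(0.19, 0.60, 1346, 0.51, 1.22, 648)
t_business, _, p_business = pooled_t_test(8.58, 10.54, 1346, 5.31, 7.10, 648)
print(round(t_magic, 2), df, round(t_business, 2))
```

A positive t means the reliable group has the higher mean, which is the sign convention the table's rows follow.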
Text length and readability metrics for reliable (N=1346) and unreliable (N=648) online news articles. Text length was expressed as the average sentence length (words) and the average word length (characters). Readability was measured using the Flesch-Kincaid grade level, the Dale-Chall readability index, the automated readability index (ARI), the Coleman-Liau index, the Gunning fog index, and the Linsear Write index.
| Metric | Reliable mean (SD) | Unreliable mean (SD) | t (df) | P value |
| Average word length (characters) | 6.14 (0.27) | 6.32 (1.66) | –3.93 (1992) | <.001 |
| Average sentence length (words) | 23.67 (5.17) | 26.38 (7.06) | –9.70 (1992) | <.001 |
| Flesch-Kincaid grade level | 12.68 (2.63) | 14.39 (3.37) | –12.38 (1992) | <.001 |
| Gunning fog index | 14.87 (2.72) | 16.42 (3.33) | –11.00 (1992) | <.001 |
| Coleman-Liau index | 10.85 (1.87) | 11.82 (2.46) | –9.72 (1992) | <.001 |
| Dale-Chall index | 10.21 (0.96) | 10.70 (1.02) | –10.53 (1992) | <.001 |
| ARI | 13.41 (3.30) | 15.43 (4.47) | –11.41 (1992) | <.001 |
| Linsear Write index | 16.42 (4.02) | 18.73 (5.31) | –10.80 (1992) | <.001 |
ARI: automated readability index.
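Several of the readability indices in this table are closed-form functions of simple text counts. The sketch below implements the standard published formulas for four of them and takes the counts as inputs. How words, sentences, syllables, and complex words were counted (and which library, if any, was used) is not stated in the excerpt, so counting is left to the caller.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    # Higher values indicate text that requires more years of schooling.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters, words, sentences):
    # Uses characters per word rather than syllables per word.
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(letters, words, sentences):
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def gunning_fog_index(words, sentences, complex_words):
    # complex_words: words with three or more syllables.
    return 0.4 * (words / sentences + 100 * complex_words / words)


# Hypothetical counts for a 100-word, 5-sentence passage.
print(round(flesch_kincaid_grade(100, 5, 150), 2))
print(round(automated_readability_index(500, 100, 5), 2))
```

All four indices grow with average sentence length, so the longer sentences in unreliable articles (26.38 vs 23.67 words) push every one of these scores upward.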
Comparison of sentiment polarity (0=least expression of the sentiment of interest, 1=most expression of the sentiment of interest) between reliable (N=1346) and unreliable (N=648) news articles, based on the sentiment of the sentences within each article. Differences in the frequencies of sentences with positive, neutral, or negative sentiment were analyzed with a 2-sample independent t test.
| Sentiment | Reliable mean (SD) | Unreliable mean (SD) | t (df) | P value |
| Negative | 0.066 (0.042) | 0.076 (0.039) | –5.46 (1992) | <.001 |
| Neutral | 0.850 (0.054) | 0.840 (0.050) | 4.37 (1992) | <.001 |
| Positive | 0.084 (0.035) | 0.085 (0.035) | –0.095 (1992) | .92 |
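In this table the three means for each group sum to approximately 1 (for example, 0.066+0.850+0.084=1.000), which is consistent with reading each value as the share of an article's sentences in that sentiment class. Under that assumption, the per-article proportions and group means can be computed as below. The sentence-level labels are taken as given because the caption does not name the sentiment tool; the functions are hypothetical helpers, not the authors' code.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def sentiment_proportions(sentence_labels):
    """Fraction of one article's sentences falling in each sentiment class.

    sentence_labels: one label per sentence, produced by some sentence-level
    sentiment classifier (assumed; not specified in the source).
    """
    if not sentence_labels:
        raise ValueError("article has no sentences")
    counts = Counter(sentence_labels)
    n = len(sentence_labels)
    return {label: counts[label] / n for label in LABELS}


def mean_proportions(articles):
    """Average the per-article proportions over a group of articles."""
    per_article = [sentiment_proportions(a) for a in articles]
    return {
        label: sum(p[label] for p in per_article) / len(per_article)
        for label in LABELS
    }


group = [
    ["neutral", "neutral", "negative", "positive"],
    ["neutral", "neutral"],
]
print(mean_proportions(group))
```

Because each article's proportions sum to 1, the group means also sum to 1, matching the pattern in the table.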
Figure 4. Receiver operating characteristic (ROC) curves and AUC scores, color coded by model, for the traditional ML models (KNN, LR, NB) and the deep learning models (BiLSTM, CNN, LSTM, BiGRU, GRU, new model). AUC: area under the curve; BiGRU: bidirectional gated recurrent unit; BiLSTM: bidirectional long short-term memory; CNN: convolutional neural network; FP: false positive; GRU: gated recurrent unit; KNN: K-nearest neighbor; LR: logistic regression; LSTM: long short-term memory; ML: machine learning; NB: naive Bayes; TP: true positive.
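The AUC values in Figure 4 summarize each ROC curve as a single number. A dependency-free way to compute AUC is the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The caption does not say which class was treated as positive; this sketch assumes label 1 marks the positive class.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class, 0 otherwise. Ties count as half a win.
    Runs in O(n_pos * n_neg) time, which is fine for small validation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to chance-level ranking, which helps put the traditional baselines' AUCs of roughly 0.53 to 0.56 in context.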
Performance metrics on the ReCOVery validation data set for the traditional ML models (KNN, LR, NB) and the deep learning models (BiLSTM, CNN, LSTM, BiGRU, GRU, new model).
| Model | Specificity | Sensitivity | AUC |
| LR | 0.720 | 0.575 | 0.563 |
| KNN | 0.660 | 0.739 | 0.530 |
| NB | 0.700 | 0.627 | 0.553 |
| BiLSTM | 0.810 | 0.925 | 0.892 |
| CNN | 0.792 | 0.851 | 0.789 |
| LSTM | 0.829 | 0.903 | 0.883 |
| BiGRU | 0.791 | 0.963 | 0.868 |
| GRU | 0.804 | 0.918 | 0.878 |
| New model | 0.835 | 0.945 | 0.906 |
ML: machine learning; KNN: K-nearest neighbor; LR: logistic regression; NB: naive Bayes; BiLSTM: bidirectional long short-term memory; CNN: convolutional neural network; LSTM: long short-term memory; BiGRU: bidirectional gated recurrent unit; GRU: gated recurrent unit; AUC: area under the curve.
Figure 5. Confusion matrix for the ReCOVery validation subset on the trained new ensemble model with BiGRU and XGBoost. BiGRU: bidirectional gated recurrent unit; XGBoost: extreme gradient boosting.
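Sensitivity and specificity in the performance table follow directly from confusion-matrix counts such as those in Figure 5: sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). A minimal sketch, assuming binary labels where `positive=1` marks the class of interest (the excerpt does not say whether unreliable or reliable articles were coded as positive):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


counts = confusion_counts([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
sens, spec = sensitivity_specificity(*counts)
print(counts, sens, spec)
```

Swapping which class is coded as positive swaps the two metrics, so the coding must be known before comparing against the table.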