Hema Karande, Rahee Walambe, Victor Benjamin, Ketan Kotecha, T S Raghu.
Abstract
The evolution of electronic media is a mixed blessing. Owing to easy access, low cost, and the faster reach of information, people seek out and consume news from online social networks. At the same time, the growing acceptance of social media reporting enables the spread of fake news, a menacing problem that causes disputes and endangers societal stability and harmony. Fake news propagation has drawn researchers' attention because of its vicious nature. The proliferation of misinformation across all media, from the internet to cable news, paid advertising, and local news outlets, has made it essential for people to identify misinformation and sort through the facts. Researchers are trying to analyze the credibility of information and curtail false information on such platforms. Credibility is the believability of the piece of information at hand. Analyzing the credibility of fake news is challenging due to the intent behind its creation and the polychromatic nature of the news. In this work, we propose a model for detecting fake news. Our method investigates the content of the news at an early stage, i.e., when the news is published but has yet to be disseminated through social media. Our work interprets the content through automatic feature extraction and the relevance of the text pieces. In summary, we introduce stance as a feature alongside the content of the article and employ the pre-trained contextualized word embeddings BERT to obtain state-of-the-art results for fake news detection. Experiments conducted on a real-world dataset indicate that our model outperforms previous work and enables fake news detection with an accuracy of 95.32%. ©2021 Karande et al.
Keywords: BERT; Credibility; LSTM; Misinformation; Fake news; Stance detection
Year: 2021 PMID: 33954243 PMCID: PMC8053013 DOI: 10.7717/peerj-cs.467
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
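The stance feature the abstract introduces can be read as a similarity score between the headline and the article body, passed to the classifier alongside the content representation. Below is a minimal sketch in plain Python, assuming the title and body have already been encoded into fixed-size vectors (e.g., pooled BERT embeddings); the function names are illustrative and not taken from the paper's code.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def stance_feature(title_vec, body_vec):
    """Similarity between headline and body embeddings.

    A high score suggests the title agrees with the body; a low score
    can flag a clickbait-style mismatch. The classifier receives this
    as one extra scalar input alongside the content embeddings.
    """
    return cosine_similarity(title_vec, body_vec)

# Toy vectors standing in for real sentence embeddings.
aligned = stance_feature([0.9, 0.1, 0.2], [0.8, 0.2, 0.1])
mismatched = stance_feature([0.9, 0.1, 0.2], [-0.7, 0.5, 0.1])
```

In this sketch an aligned title/body pair scores close to 1, while a mismatched pair scores much lower; how the paper precisely computes and injects the stance score may differ.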
Figure 1. Pipeline architecture of the system.
Figure 2. A part of a dataset consisting of true and false news instances.
Performance of different AI models.

| Embeddings | Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|---|
| Tokenizer | LSTM | 86.6 | 85.1 | 88.7 | 86.9 |
| Tokenizer | Bi-LSTM | 85.4 | 84.9 | 86.2 | 85.5 |
| Tokenizer | CNN | | 93.0 | 93.0 | 93.01 |
| GloVe embeddings | LSTM | | 91.7 | 92.7 | 92.2 |
| GloVe embeddings | Bi-LSTM | 91.9 | 90.2 | 93.9 | 92.1 |
| GloVe embeddings | CNN | 91 | 91.6 | 90.2 | 90.9 |
| GloVe embeddings and attention mechanism | LSTM | | 91.7 | 92.7 | 92.2 |
| GloVe embeddings and attention mechanism | Bi-LSTM | 91.9 | 90.2 | 93.9 | 92.1 |
| GloVe embeddings and attention mechanism | CNN | 91 | 91.6 | 90.2 | 90.9 |
| BERT embeddings | LSTM | 91.16 | 91.01 | 91.01 | 91.01 |
| BERT embeddings | Bi-LSTM | 93.05 | 88.76 | 88.76 | 93.3 |
| BERT embeddings | CNN | 95.32 | 94.11 | 94.11 | 95.31 |
Classification results for the proposed model.
| Models | Training acc. (%) | Validation acc. (%) | Testing acc. (%) | Precision (%) | Recall (%) | F1 (%) | ROC |
|---|---|---|---|---|---|---|---|
| ANN | 91.52 | 91.86 | 91.85 | 91.98 | 91.98 | 91.69 | 91.85 |
| LSTM | 98.51 | 91.16 | 91.16 | 91.01 | 91.01 | 91.01 | 91.15 |
| Bi-LSTM | 98.48 | 93.06 | 93.05 | 88.76 | 88.76 | 93.3 | 93.47 |
| CNN | 99.96 | 95.33 | 95.32 | 94.11 | 94.11 | 95.31 | 95.33 |
Effectiveness of stance feature in the classification of news articles.
| Features | Models | Training acc. (%) | Validation acc. (%) | Testing acc. (%) | Precision (%) | Recall (%) | F1 (%) | ROC |
|---|---|---|---|---|---|---|---|---|
| News Title, News Body | ANN | 89.2 | 88.0 | 88.33 | 86.57 | 86.57 | 88.61 | 88.41 |
| News Title, News Body | LSTM | 95.29 | 88.8 | 90.64 | 87.42 | 87.42 | 91.0 | 90.9 |
| News Title, News Body | Bi-LSTM | 97.99 | 89.79 | 92.21 | 93.6 | 93.6 | 92.0 | 92.26 |
| News Title, News Body | CNN | 99.1 | 93.68 | 93.90 | 91.3 | 91.3 | 94.0 | 94.0 |
| News Title, News Body, Similarity between them (Stance) | ANN | 89.31 | 89.37 | 89.38 | 86.4 | 86.4 | 89.8 | 89.4 |
| News Title, News Body, Similarity between them (Stance) | LSTM | 94.36 | 89.05 | 91.06 | 87.8 | 87.8 | 91.44 | 91.37 |
| News Title, News Body, Similarity between them (Stance) | Bi-LSTM | 98.6 | 92.11 | 92.6 | 93.5 | 93.5 | 92.5 | 92.6 |
| News Title, News Body, Similarity between them (Stance) | CNN | 99.3 | 92.9 | 94.42 | 94.33 | 94.33 | 94.43 | 94.42 |
Figure 3. Different evaluation metrics applied to our system.
(A) Training accuracies for different models. (B) Testing accuracies for different models. (C) Precision scores for different models. (D) Recall values for different models. (E) F1 Scores for different models. (F) ROC values for different models.
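The precision, recall, and F1 values reported above follow the standard definitions over the confusion matrix. A minimal sketch, assuming binary labels with 1 = fake (the helper name is illustrative, not from the paper's code):

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One true positive, one false positive, one false negative:
p, r, f = classification_metrics([1, 1, 0, 0], [1, 0, 1, 0])  # → (0.5, 0.5, 0.5)
```

The same quantities are what libraries such as scikit-learn report; the paper presumably used a standard implementation rather than hand-rolled metrics.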
5-fold cross validation results for the proposed model.
| Features | Models | Training acc. (%) | Validation acc. (%) | Testing acc. (%) | Precision (%) | Recall (%) | F1 (%) | ROC |
|---|---|---|---|---|---|---|---|---|
| News Title, News Body, Similarity between them (Stance) | ANN | 91.31 | 88.72 | 90.73 | 87.35 | 87.35 | 91.14 | 91.17 |
| News Title, News Body, Similarity between them (Stance) | LSTM | 97.08 | 87.61 | 88.60 | 84.98 | 84.98 | 89.29 | 89.40 |
| News Title, News Body, Similarity between them (Stance) | Bi-LSTM | 99.23 | 92.72 | 93.24 | 92.03 | 92.03 | 93.34 | 93.44 |
| News Title, News Body, Similarity between them (Stance) | CNN | 99.92 | 95.25 | 95.85 | 94.81 | 94.81 | 95.89 | 95.90 |
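The 5-fold cross-validation behind these numbers repeatedly holds out one fifth of the data for evaluation and trains on the rest. A minimal index-splitting sketch (contiguous folds, no shuffling; the paper's exact protocol may differ):

```python
def kfold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Earlier folds absorb the remainder when n_samples % k != 0,
    so every sample lands in exactly one test fold.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# For 10 samples and 5 folds, each test fold has 2 samples.
splits = list(kfold_splits(10, 5))
```

The reported per-fold metrics would then be averaged across the k held-out folds; a library routine such as scikit-learn's `KFold` does the same job with shuffling and stratification options.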
Performance of various Fake News Identification models.