Yong Fang, Jian Gao, Cheng Huang, Hua Peng, Runpu Wu.
Abstract
With the rapid development of the internet, social media has become an essential tool for getting information, and it has attracted a large number of users because of its low cost, accessibility, and appealing content. It greatly enriches our lives. However, its rapid growth and wide reach have also made it easy for fake news to spread; people are constantly exposed to fake news and suffer from it. Fake news usually uses hyperbole to catch people's attention with dishonest intent. More importantly, it often misleads readers and gives people a distorted perception of society. It can harm both society and individuals. Research on detecting fake news is therefore significant. In this paper, we built a model named SMHA-CNN (Self Multi-Head Attention-based Convolutional Neural Networks) that judges the authenticity of news with high accuracy based only on its content, using convolutional neural networks and a self multi-head attention mechanism. To validate the model, we conducted experiments on a public dataset and achieved a precision of 95.5% and a recall of 95.6% under 5-fold cross-validation. Our experimental results indicate that the model is more effective at detecting fake news than the models it was compared against.
Year: 2019 PMID: 31557213 PMCID: PMC6762082 DOI: 10.1371/journal.pone.0222713
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The architecture of the SMHA-CNN model.
Experimental environment configuration.
| Items | Configuration |
|---|---|
| OS | Ubuntu 16.04.3 LTS |
| System configuration | CPU: Intel i7-7700; RAM: 16 GB; GPU: GeForce GTX 2080 8 GB |
| Python libraries | Keras, Scikit-learn, Matplotlib |
Confusion matrix.
| | Actual fake | Actual non-fake |
|---|---|---|
| Predicted fake | TP | FP |
| Predicted non-fake | FN | TN |
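The precision, recall, and F1-score reported in this record follow from the four confusion-matrix cells by their standard definitions. The sketch below shows that arithmetic, treating "fake" as the positive class. The function name `classification_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy for the positive ('fake') class."""
    # Precision: of everything predicted fake, how much was actually fake.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually fake, how much was caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Swapping the roles of the two classes (so that TN plays the part of TP) gives the corresponding scores for the non-fake class.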
Fig 2. The distribution of raw article length.
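A length distribution like this one is typically used to pick a fixed input length, since a convolutional model needs inputs of equal size. This record mentions input lengths of 400 and 1200 but does not say how the authors padded or truncated. The sketch below is therefore only an assumption of one common approach (cut long articles, right-pad short ones). The helper names and sample lengths are hypothetical.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return exactly max_len ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def coverage(lengths, max_len):
    """Fraction of articles whose full text fits within max_len tokens."""
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


# Hypothetical article lengths in tokens, for illustration only.
lengths = [120, 380, 410, 950, 1500]
for max_len in (400, 1200):
    print(max_len, coverage(lengths, max_len))

fixed = pad_or_truncate([7, 8, 9], max_len=5)
print(fixed)
```

Keras also provides a `pad_sequences` utility for the same purpose; whether it was used here is not stated.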
Results of 5-fold cross-validation with Word2vec in experiment A.
| Method | Precision(%) | Recall(%) | F1-score(%) | |
|---|---|---|---|---|
| non-fake | 94.5 | 95.6 | 95.1 | |
| fake | 96.5 | 95.6 | 96.3 | |
| avg/total | ||||
| avg/total | 67.3 | 67.5 | 67.1 | |
| avg/total | 91.1 | 91.0 | 91.0 | |
| avg/total | 86.2 | 86.1 | 86.1 | |
| avg/total | 82.3 | 81.2 | 80.7 | |
| avg/total | ||||
| avg/total | 83.0 | 83.0 | 83.0 | |
| avg/total | 88.6 | 88.5 | 88.5 | |
| avg/total | 85.5 | 85.5 | 85.5 | |
| avg/total | 92.6 | 92.6 | 92.6 |
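The scores above come from 5-fold cross-validation: the data is split into five parts, each part serves once as the test set while the other four are used for training, and the per-fold scores are averaged. Below is a minimal sketch of the index splitting only. The function name, the seed, and the sample count are assumptions, and the authors' actual splitting procedure (for example, whether it was stratified by class) is not given in this record.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical dataset size, for illustration only.
folds = list(k_fold_indices(n_samples=23, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

In practice, scikit-learn (listed in the environment table) offers `KFold` and `StratifiedKFold` for this; the hand-rolled version is shown only to make the mechanics explicit.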
Fig 3. The receiver operating characteristic curves of the different models.
(a) With an input length of 1200: the red curves show the performance of the LSTM model, the black curves the Bi-LSTM model, the orange curves the GRU model, the green curves the Att-LSTM model, and the blue curves the SMHA-CNN model. (b) With an input length of 400, using the same color coding: red for LSTM, black for Bi-LSTM, orange for GRU, green for Att-LSTM, and blue for SMHA-CNN.
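A receiver operating characteristic curve plots the true positive rate against the false positive rate as the decision threshold on the model's score is lowered. The sketch below shows how such a curve and its area (AUC) can be computed from labels and scores. The function names and the six example scores are hypothetical, and this is not the authors' plotting code.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) points, sweeping the threshold from high to low."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = fake) and model scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = roc_curve_points(y_true, scores)
area = auc(points)
print(points)
print(round(area, 4))
```

A curve that hugs the top left corner (AUC close to 1) means the model ranks fake articles above non-fake ones almost everywhere. scikit-learn's `roc_curve` and `auc` compute the same quantities.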
Evaluation results in experiment B.
| precision(%) | recall(%) | f1-score(%) | accuracy(%) | |
|---|---|---|---|---|
| 95.9 | 95.9 | 95.9 | ||
| 94.7 | 94.7 | 94.7 | 94.67 | |
| - | - | - | 93.5 | |
| 93.1 | 93.1 | 93.0 | ||
| 91.5 | 91.2 | 91.0 | 91.19 | |
| - | - | - | 87.7 |
Fig 4. Visualization results for different samples.
(a) Visualization of a fake news sample. (b) Visualization of a real news sample. In both panels, the number in the top left corner of each word is the number of times that word was mapped.