Suleyman Gokhan Taskin, Ecir Ugur Kucuksille, Kamil Topal.
Abstract
Social media has changed where people get their information. Since most news on social media is not verified by a central authority, it may contain fake news spread for reasons such as advertising and propaganda. Considering that an average of 500 million tweets were posted daily on Twitter alone in 2020, checking every post is possible only with smart systems. In this study, we use Natural Language Processing methods to detect fake news in Turkish-language posts on certain topics on Twitter. Furthermore, we examine the follow/follower relations of the users who shared fake or real news on the same subjects through social network analysis methods and visualization tools. Various supervised and unsupervised learning algorithms were tested with different parameters. The best F1 score for fake news detection, 0.9, was obtained with the support vector machine algorithm. Whether people share fake or true news can help separate subgroups in the social network formed by people and their followers. The results show that fake news propagation networks may exhibit different characteristics for each subject, based on the follow/follower network. © King Fahd University of Petroleum & Minerals 2021.
Keywords: Fake news detection; Machine learning; Natural language processing; Social network analysis
Year: 2021 PMID: 34611504 PMCID: PMC8485117 DOI: 10.1007/s13369-021-06223-0
Source DB: PubMed Journal: Arab J Sci Eng ISSN: 2191-4281 Impact factor: 2.807
Fake news detection studies using non-ANN-based supervised learning algorithms.
| Refs. | Dataset | ML algorithms | Most successful ML | Performance measure | Best result |
|---|---|---|---|---|---|
| [ | Mobile phone reviews from mobile01.com | LR, SVM | SVM | F1-score | 0.61 |
| [ | Hotel reviews, restaurant reviews, gay marriage, and gun control, fake and real news articles from kaggle.com | SGD, SVM, KNN, LR, DT | SVM | Accuracy | 0.9 |
| [ | 3 large Facebook pages each from the right and from the left and Facebook pages of Politico, CNN and ABC News | NB | NB | Accuracy | 0.75 |
| [ | A list of Facebook pages divided into two categories: scientific news sources and conspiracy news sources | LR, HBLC | HBLC | Accuracy | 0.99 |
| [ | Articles on sport, politics, rumor, health, and other topics collected with a web crawler | SVM, NB | SVM | F1-score | 0.79 |
| [ | 2 satirical news sites (The Onion and The Beaverton) and 2 legitimate news sources (The Toronto Star and The New York Times), varying across 4 domains (civics, science, business, and “soft” news) | SVM | SVM | F1-score | 0.87 |
| [ | Legitimate news collected from mainstream news websites such as CNN, FoxNews, Bloomberg, and CNET, and fake news collected using crowdsourcing | SVM | SVM | F1-score | 0.73 |
| [ | News articles from Google collected with a web crawler | NB | NB | Accuracy | 0.79 |
Fake news detection studies using ANN-based supervised learning algorithms.
| Refs. | Dataset | ML algorithms | Most successful ML | Performance measure | Best result |
|---|---|---|---|---|---|
| [ | Kaggle open source dataset of fake news articles and signalmedia open source dataset of non-fake news articles | LR, RNN, GRU, LSTM, BiLSTM, CNN | GRU | F1-score | 0.84 |
| [ | Articles flagged as false by Snopes and real articles from news organizations such as NDTV, CNN, etc. | KNN, SVM, LSTM | LSTM | F1-score | 0.90 |
| [ | FNC-1 open source dataset of articles | LSTM, GRU | GRU | FNC-score | 69.08 |
| [ | (headline, body) pairs from leading news organizations such as NDTV, CNN, etc. | RNN, LSTM, BiLSTM, GRU, BiGRU | BiLSTM | Accuracy | 0.84 |
| [ | FNC-1 open source dataset of articles | MLP | MLP | FNC-score | 83.08 |
| [ | Tweets on Twitter, discussion topics on Weibo and users | CSI, DT, SVM, LSTM, GRU | CSI | Accuracy | 0.95 |
| [ | 19 fake news article websites (20,372 articles) labeled by PolitiFact and 9 real news article websites (20,932 articles) listed by Forbes | 3HAN, GRU | 3HAN | Accuracy | 0.97 |
| [ | Tweets from 174 suspicious propaganda accounts identified by PropOrNot and a list of 252 trusted news accounts manually constructed by the authors | LR, RNN, CNN | RNN, CNN | F1-score | 0.92 |
| [ | LIAR open source dataset of articles | LR, SVM, BiLSTM, CNN | CNN | Accuracy | 0.27 |
| [ | LIAR open source dataset of articles | LR, SVM, RNN, GRU, LSTM, Bi-LSTM, CNN | CNN | Accuracy | 0.27 |
| [ | Open source dataset from fakenews.mit.edu | GRU, LSTM, BiLSTM, SMHA-CNN | SMHA-CNN | F1-score | 0.96 |
Numerical information about the dataset
| | F | F’ | FU | F’U | FF | F’F |
|---|---|---|---|---|---|---|
| Topic-1 | 230 | 145 | 210 | 140 | 2,001,523 | 393,967 |
| Topic-2 | 222 | 155 | 209 | 152 | 849,073 | 1,066,007 |
| Topic-3 | 394 | 141 | 364 | 128 | 2,246,878 | 2,865,459 |
| Total | 846 | 441 | 783 | 420 | 5,097,474 | 4,325,433 |
Fig. 1 Deep Learning Models for Fake News Detection.
Confusion matrix
| | F’ | R’ | |
|---|---|---|---|
| F | TP | FN | P |
| R | FP | TN | N |
| | P’ | N’ | |
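
The F1-score used as the performance measure throughout can be derived from the cells of this confusion matrix. The following is a minimal sketch, assuming the standard definitions of precision, TP / (TP + FP), and recall, TP / (TP + FN). The function name and the counts in the usage line are illustrative and are not taken from the study.

```python
def f1_from_confusion(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive ("fake") class from confusion-matrix counts."""
    predicted_pos = tp + fp  # column total P' in the confusion matrix
    actual_pos = tp + fn     # row total P in the confusion matrix
    if predicted_pos == 0 or actual_pos == 0:
        return 0.0
    precision = tp / predicted_pos
    recall = tp / actual_pos
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
score = f1_from_confusion(tp=90, fp=10, fn=10)
```

Note that TN does not enter the F1-score, which is one reason it is often preferred over accuracy when the classes are imbalanced.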
Fig. 2 Steps of machine learning algorithms in the fake news detection problem
F1-score values of the KNN algorithm obtained with different parameters, using the TF-IDF word representation method.
| | | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 |
|---|---|---|---|---|---|---|---|---|
| All | Manhattan | 0.75 | 0.74 | 0.75 | 0.74 | 0.75 | 0.75 | |
| | Euclidean | 0.77 | 0.80 | 0.80 | 0.81 | 0.82 | 0.82 | |
| | Minkowski | 0.74 | 0.77 | 0.77 | 0.76 | 0.76 | | |
| Topic-1 | Manhattan | 0.69 | 0.73 | 0.73 | 0.73 | 0.71 | 0.69 | |
| | Euclidean | 0.66 | 0.73 | 0.72 | 0.77 | 0.77 | | |
| | Minkowski | 0.65 | 0.72 | 0.73 | 0.73 | | | |
| Topic-2 | Manhattan | 0.71 | 0.73 | 0.74 | 0.75 | 0.75 | 0.75 | |
| | Euclidean | 0.75 | 0.78 | 0.78 | 0.79 | 0.79 | 0.80 | |
| | Minkowski | 0.75 | 0.76 | 0.74 | 0.75 | 0.73 | 0.74 | |
| Topic-3 | Manhattan | 0.72 | 0.73 | 0.73 | 0.73 | 0.73 | 0.72 | |
| | Euclidean | 0.78 | 0.8 | 0.81 | 0.82 | 0.82 | 0.82 | |
| | Minkowski | 0.71 | 0.73 | 0.69 | 0.69 | 0.62 | 0.65 | |
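
To make the TF-IDF word representation named in this table concrete, here is a minimal sketch of one common variant: term frequency normalized by document length, multiplied by log(N / document frequency). The exact weighting and smoothing used in the study are not stated here, so this formula, the function name, and the toy posts are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized posts, invented for illustration.
posts = [["vaccine", "news", "fake"], ["vaccine", "news", "real"]]
vectors = tf_idf(posts)
```

Terms that occur in every post (here "vaccine" and "news") receive a weight of zero under this variant, while terms that distinguish a post keep a positive weight.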
F1-score values of the KNN algorithm obtained with different parameters, using the Word2vec word representation method.
| | | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 |
|---|---|---|---|---|---|---|---|---|
| All | Manhattan | 0.76 | 0.78 | | 0.78 | | 0.78 | 0.78 |
| | Euclidean | 0.76 | 0.79 | 0.79 | 0.78 | 0.79 | 0.78 | |
| | Minkowski | 0.76 | 0.77 | | 0.78 | | 0.78 | |
| Topic-1 | Manhattan | 0.77 | 0.79 | 0.77 | 0.78 | 0.78 | | 0.77 |
| | Euclidean | 0.77 | | | 0.77 | 0.77 | | |
| | Minkowski | 0.78 | 0.78 | 0.77 | 0.77 | 0.78 | 0.77 | |
| Topic-2 | Manhattan | 0.74 | | | 0.74 | 0.74 | 0.74 | 0.74 |
| | Euclidean | | | | 0.74 | | 0.74 | |
| | Minkowski | 0.75 | 0.75 | 0.74 | | 0.74 | 0.74 | 0.75 |
| Topic-3 | Manhattan | | | 0.81 | 0.81 | | 0.8 | 0.81 |
| | Euclidean | 0.81 | 0.81 | 0.77 | | | | 0.81 |
| | Minkowski | | | 0.81 | | | 0.81 | 0.80 |
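
The KNN results vary the neighbor count k and the distance metric. The sketch below shows how the three metrics relate and how a majority vote over the k nearest vectors works. It is an illustration under assumptions: the Minkowski order used in the study is not given here, so `p` is left as a parameter, and the toy vectors, labels, and tie-breaking rule (first label encountered wins) are invented.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    # Minkowski distance with p = 1.
    return minkowski(a, b, 1)


def euclidean(a, b):
    # Minkowski distance with p = 2.
    return minkowski(a, b, 2)


def knn_predict(train, labels, query, k, distance):
    """Majority vote among the k training vectors closest to query."""
    ranked = sorted(zip(train, labels), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D vectors, invented for illustration.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["real", "real", "real", "fake", "fake", "fake"]
prediction = knn_predict(train, labels, (4.5, 5.0), k=3, distance=euclidean)
```

With an even k, ties between classes are possible, so the tie-breaking rule can affect results such as those in the k=2, k=4, k=6, and k=8 columns.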
F1-score values of the RF algorithm obtained with different parameters.
| | | Entropy-50 | Entropy-100 | Entropy-500 | Entropy-1000 | Gini-50 | Gini-100 | Gini-500 | Gini-1000 |
|---|---|---|---|---|---|---|---|---|---|
| All | TF-IDF | 0.84 | 0.84 | 0.84 | | | | | |
| | W2v | 0.79 | 0.8 | 0.8 | 0.78 | 0.79 | 0.8 | | |
| Topic-1 | TF-IDF | 0.89 | 0.89 | 0.88 | 0.89 | 0.89 | | | |
| | W2v | 0.8 | 0.81 | 0.83 | 0.83 | 0.8 | 0.82 | 0.83 | |
| Topic-2 | TF-IDF | 0.78 | 0.78 | 0.77 | 0.78 | 0.77 | 0.77 | 0.78 | |
| | W2v | 0.76 | 0.75 | 0.76 | 0.76 | | | | |
| Topic-3 | TF-IDF | 0.8 | 0.81 | 0.8 | 0.8 | 0.81 | 0.81 | | |
| | W2v | 0.83 | 0.83 | 0.83 | 0.83 | | | | |
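
The column labels combine a split criterion (entropy or Gini) with a number, presumably the number of trees in the forest, although that reading is an assumption. The two criteria themselves are standard impurity measures, sketched below with illustrative function names and a toy node.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class among the labels at a tree node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


# Toy node with an even fake/real split, invented for illustration.
node = ["fake", "fake", "real", "real"]
node_gini = gini(node)
node_entropy = entropy(node)
```

Both measures are zero for a pure node and maximal for an even split, which may help explain why the entropy and Gini columns report similar scores.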
F1-score values of the SVM algorithm obtained with different parameters.
| | | Linear | Poly-2 | Poly-3 | Poly-4 | Poly-5 | Poly-6 | RBF | Sigmoid |
|---|---|---|---|---|---|---|---|---|---|
| All | TF-IDF | 0.86 | 0.83 | 0.75 | 0.63 | 0.54 | 0.86 | | |
| | W2v | 0.79 | 0.81 | 0.81 | 0.81 | 0.7 | | | |
| Topic-1 | TF-IDF | 0.83 | 0.75 | 0.61 | 0.42 | 0.44 | 0.87 | | |
| | W2v | 0.82 | 0.83 | 0.83 | 0.82 | 0.82 | 0.83 | 0.8 | |
| Topic-2 | TF-IDF | 0.82 | 0.82 | 0.79 | 0.7 | 0.61 | 0.55 | 0.8 | |
| | W2v | 0.77 | 0.77 | 0.77 | 0.76 | 0.74 | 0.77 | 0.75 | |
| Topic-3 | TF-IDF | 0.84 | 0.78 | 0.66 | 0.54 | 0.44 | | | |
| | W2v | 0.86 | 0.86 | 0.85 | 0.85 | 0.84 | 0.85 | 0.8 | |
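
The columns of the SVM table correspond to kernel functions. The sketch below writes out the usual textbook forms of the linear, polynomial, RBF, and sigmoid kernels. Reading "Poly-2" through "Poly-6" as the polynomial degree is an assumption, and the `gamma` and `coef0` defaults are placeholders, since the values used in the study are not given here.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def poly_kernel(x, y, degree, gamma=1.0, coef0=0.0):
    # (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2)
    squared_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


# Toy vectors, invented for illustration.
u, v = (1.0, 2.0), (3.0, 0.5)
poly2 = poly_kernel(u, v, degree=2)
```

Higher polynomial degrees amplify large inner products, which is one possible reason the TF-IDF scores in the table fall as the degree grows.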
F1-score values of the Deep Learning algorithms obtained with different parameters.
| | | Learning rate 0.001 | | | Learning rate 0.0001 | | |
|---|---|---|---|---|---|---|---|
| | | M-1 | M-2 | M-3 | M-1 | M-2 | M-3 |
| All | RNN | 0.58 | 0.56 | 0.58 | 0.57 | 0.57 | |
| | GRU | 0.68 | 0.7 | 0.72 | | | |
| | LSTM | 0.76 | 0.77 | 0.76 | | | |
| Topic-1 | RNN | 0.56 | 0.56 | | | | |
| | GRU | 0.81 | 0.42 | 0.44 | 0.39 | | |
| | LSTM | 0.82 | 0.81 | 0.7 | 0.78 | 0.79 | |
| Topic-2 | RNN | 0.56 | 0.56 | 0.56 | | | |
| | GRU | 0.81 | 0.81 | 0.42 | 0.42 | 0.38 | |
| | LSTM | 0.8 | 0.8 | 0.73 | 0.76 | 0.80 | |
| Topic-3 | RNN | 0.56 | 0.55 | 0.55 | | | |
| | GRU | 0.79 | 0.81 | 0.4 | 0.41 | 0.41 | |
| | LSTM | 0.73 | 0.76 | 0.8 | | | |
F1-score values of the Bi-directional Deep Learning algorithms obtained with different parameters.
| | | Learning rate 0.001 | | | Learning rate 0.0001 | | |
|---|---|---|---|---|---|---|---|
| | | M-1 | M-2 | M-3 | M-1 | M-2 | M-3 |
| All | Bi-RNN | 0.65 | 0.64 | 0.63 | 0.62 | 0.63 | |
| | Bi-GRU | 0.78 | 0.78 | 0.78 | 0.77 | 0.78 | |
| | Bi-LSTM | 0.79 | 0.79 | 0.79 | 0.79 | 0.78 | |
| Topic-1 | Bi-RNN | 0.61 | 0.61 | 0.58 | 0.57 | 0.57 | |
| | Bi-GRU | 0.81 | 0.79 | 0.68 | 0.7 | 0.75 | |
| | Bi-LSTM | 0.82 | 0.81 | 0.81 | 0.81 | 0.82 | |
| Topic-2 | Bi-RNN | 0.62 | 0.61 | 0.58 | 0.58 | 0.58 | |
| | Bi-GRU | 0.67 | 0.7 | 0.75 | | | |
| | Bi-LSTM | 0.81 | 0.8 | 0.82 | 0.82 | 0.82 | |
| Topic-3 | Bi-RNN | 0.6 | 0.58 | 0.56 | 0.58 | | |
| | Bi-GRU | 0.79 | 0.67 | 0.69 | 0.74 | | |
| | Bi-LSTM | 0.81 | 0.81 | | | | |
Fig. 3 Error bar graph showing the F1-score averages and standard deviations of non-ANN-based supervised learning algorithms, deep learning algorithms, and unsupervised learning algorithms on all data sets.
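
An error bar of this kind summarizes a group of F1-scores by their mean and standard deviation. As a minimal sketch, the snippet below computes both for the seven values that survive in the Topic-2 / W2v row of the SVM table. How the study grouped its scores, and whether it used the sample or the population standard deviation, is not stated here, so both choices are assumptions.

```python
import statistics

# F1 values copied from the Topic-2 / W2v row of the SVM table.
scores = [0.77, 0.77, 0.77, 0.76, 0.74, 0.77, 0.75]

mean_f1 = statistics.mean(scores)   # center of the error bar
std_f1 = statistics.stdev(scores)   # sample standard deviation (n - 1)

# Lower and upper ends of a one-standard-deviation error bar.
bar = (mean_f1 - std_f1, mean_f1 + std_f1)
```

Using `statistics.pstdev` instead would give the population standard deviation, which is slightly smaller for the same data.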
Fig. 4 (a) Topic-1 users with high authorization scores. (b) Topic-1 users with high authorization scores who shared real news. (c) Topic-1 users with high authorization scores who shared fake news.
Fig. 5 (a) Topic-2 users with high authorization scores. (b) Topic-2 users with high authorization scores who shared real news. (c) Topic-2 users with high authorization scores who shared fake news.
Fig. 6 (a) Topic-3 users with high authorization scores. (b) Topic-3 users with high authorization scores who shared real news. (c) Topic-3 users with high authorization scores who shared fake news.