Dimitrios Serpanos, Georgios Xenos, Billy Tsouvalas.
Abstract
Detecting and identifying misinformation and fake news is a complex problem that spans several disciplines, from sociology to computer science and mathematics. In this work, we focus on social media and analyze characteristics that are independent of the text language (language-independent) and of the social context (location-independent), and that are common to most social media, not only Twitter, which is the platform most often analyzed in the literature. Specifically, we analyze temporal and structural characteristics of information flow in social networks, and we evaluate the importance and effect of two types of features in the detection of fake rumors. First, we extract epidemiological features by applying epidemiological models to the spreading of false rumors; second, we extract graph-based features from the graph structure of the information cascade in the social graph. We evaluate these features for fake rumor detection in three configurations: (i) using only epidemiological features, (ii) using only graph-based features, and (iii) using the combination of epidemiological and graph-based features. Evaluation is performed with a Gradient Boosting classifier on two benchmark fake rumor detection datasets. Our results demonstrate that epidemiological models fit rumor propagation well, while graph-based features lead to more effective classification of rumors; the combination of epidemiological and graph-based features leads to improved performance.
Keywords: epidemiological models; graph-based detection; misinformation; rumor classification; rumor propagation
Year: 2022 PMID: 35573903 PMCID: PMC9093217 DOI: 10.3389/frai.2022.734347
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Figure 1. The SEIZ model.
Figure 2. SI and SEIZ fitting examples for the first 72 h of diffusion of a particular rumor.
Figure 3. Graph of a false story cascade from the Twitter16 dataset.
Figure 4. RSI values of different rumors. Red = false rumors, blue = true rumors.
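To make the SEIZ compartments concrete, the following is a minimal sketch of how such a model can be simulated. It assumes the commonly stated SEIZ formulation from the rumor-modeling literature (Susceptible, Exposed, Infected, and skeptic Z compartments); the class `SeizParams`, the function `simulate_seiz`, the forward-Euler integrator, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class SeizParams:
    """Rate parameters of the SEIZ model (field names are this sketch's own)."""
    beta: float  # S-I contact rate
    b: float     # S-Z contact rate
    rho: float   # E-I contact rate
    eps: float   # incubation rate (E becomes I without further contact)
    p: float     # probability that S becomes I directly after contact with I
    l: float     # probability that S becomes Z directly after contact with Z


def simulate_seiz(params, s0, e0, i0, z0, hours, dt=0.01):
    """Integrate the SEIZ equations with forward Euler (an assumed integrator).

    Returns a list of (S, E, I, Z) tuples, one per whole hour, starting at t=0.
    """
    s, e, i, z = float(s0), float(e0), float(i0), float(z0)
    n = s + e + i + z  # total population, conserved by the equations
    steps_per_hour = round(1 / dt)
    trajectory = [(s, e, i, z)]
    for _ in range(hours):
        for _ in range(steps_per_hour):
            si = params.beta * s * i / n  # flow caused by S meeting I
            sz = params.b * s * z / n     # flow caused by S meeting Z
            ei = params.rho * e * i / n   # flow caused by E meeting I
            inc = params.eps * e          # E adopting on its own
            ds = -si - sz
            de = (1 - params.p) * si + (1 - params.l) * sz - ei - inc
            di = params.p * si + ei + inc
            dz = params.l * sz
            s, e, i, z = s + dt * ds, e + dt * de, i + dt * di, z + dt * dz
        trajectory.append((s, e, i, z))
    return trajectory
```

Calling `simulate_seiz(SeizParams(0.5, 0.3, 0.2, 0.1, 0.4, 0.3), 1000, 0, 5, 5, hours=72)` yields an hourly trajectory whose I component could be compared against an observed cumulative post count; the four derivatives sum to zero, so the total population stays constant.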
Statistics of Twitter15 and Twitter16 datasets (Ma et al., 2017).
| Statistic | Twitter15 | Twitter16 |
|---|---|---|
| Number of users | 276,663 | 173,487 |
| Number of tweets | 1,490 | 818 |
| Average time length/tree | 1,337 h | 848 h |
| Average posts/tree | 223 | 251 |
| Max posts/tree | 1,768 | 2,765 |
| Min posts/tree | 55 | 81 |
Twitter15 and Twitter16 fitting error (RMSE).
| Time window | | | | |
|---|---|---|---|---|
| 240 h | 19.55 | 21.592 | 21.38 | 25.578 |
| 120 h | 24.69 | 24.41 | 26.08 | 29.661 |
| 72 h | 29.29 | 26.83 | 30.98 | 28.075 |
| 48 h | 32.88 | 24.126 | 34.86 | 29.232 |
| 4 h | 35.24 | 6.855 | 34.66 | 6.205 |
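As an illustration of how a fitting error of this kind can be obtained, the sketch below fits the SI model to an observed curve and reports the RMSE. It uses the closed-form logistic solution of the SI equation and a plain grid search over the contact rate; the grid search, the function names, and the synthetic data are assumptions made for this example, since the record does not state which optimizer the authors used.

```python
import math


def si_infected(t, beta, n, i0):
    """Closed-form I(t) of the SI model dI/dt = beta * I * (n - I) / n."""
    growth = math.exp(beta * t)
    return n * i0 * growth / (n - i0 + i0 * growth)


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_si_beta(times, observed, n, i0, candidates):
    """Return (beta, error) for the candidate beta with the lowest RMSE."""
    def error(beta):
        return rmse(observed, [si_infected(t, beta, n, i0) for t in times])

    best = min(candidates, key=error)
    return best, error(best)


# Synthetic example: recover a known beta from a noise-free SI curve.
times = list(range(0, 73, 6))            # hours 0, 6, ..., 72
truth = [si_infected(t, 0.15, 500, 2) for t in times]
candidates = [k / 100 for k in range(1, 51)]
best_beta, best_error = fit_si_beta(times, truth, 500, 2, candidates)
```

A gradient-based or least-squares optimizer would be the usual choice for a model with more parameters, such as SEIZ; the grid search is kept here only because it is easy to verify.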
Twitter15 and Twitter16 SI and SEIZ 4-class classification results.
| Time window | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| 240 h | 38.98 | 37.80 | 37.56 | 38.98 | 46.83 | 45.68 | 45.90 | 46.83 |
| 120 h | | | | | | | | |
| 72 h | 41.13 | 40.12 | 39.78 | 41.13 | 48.29 | 47.38 | 47.83 | 48.29 |
| 48 h | 37.90 | 36.63 | 36.41 | 37.90 | 42.44 | 42.33 | 42.35 | 42.44 |
| 4 h | 33.24 | 32.33 | 32.03 | 33.24 | 38.73 | 38.90 | 39.49 | 38.73 |
| | | | | | | | | |
| 240 h | | | | | 46.34 | 45.49 | 46.25 | 46.34 |
| 120 h | 37.10 | 36.67 | 36.50 | 37.10 | | | | |
| 72 h | 36.56 | 35.64 | 35.51 | 36.56 | 45.37 | 45.61 | 46.16 | 45.37 |
| 48 h | 37.90 | 36.98 | 36.85 | 37.90 | 40.49 | 40.26 | 40.47 | 40.49 |
| 4 h | 36.76 | 35.62 | 35.46 | 36.76 | 36.76 | 36.61 | 36.84 | 36.76 |
The best performing experimental setups are provided in bold.
Twitter15 and Twitter16 SI and SEIZ binary classification results.
| Time window | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| 240 h | 54.59 | 54.57 | 54.49 | 54.59 | 63.11 | 63.07 | 63.21 | 63.11 |
| 120 h | 50.27 | 50.25 | 50.28 | 50.27 | 64.08 | 64.07 | 64.08 | 64.08 |
| 72 h | 52.43 | 52.43 | 52.43 | 52.43 | 66.02 | 66.00 | 66.09 | 66.02 |
| 48 h | | | | | 66.02 | 66.02 | 66.03 | 66.02 |
| 4 h | 50.00 | 49.98 | 50.14 | 50.00 | | | | |
| | | | | | | | | |
| 240 h | 51.89 | 51.86 | 51.91 | 51.89 | 60.19 | 60.15 | 60.21 | 60.19 |
| 120 h | 49.19 | 49.11 | 49.21 | 49.19 | 63.11 | 63.09 | 63.11 | 63.11 |
| 72 h | 52.43 | 52.42 | 52.45 | 52.43 | 65.05 | 65.01 | 65.17 | 65.05 |
| 48 h | 54.59 | 54.45 | 54.62 | 54.89 | | | | |
| 4 h | | | | | 66.02 | 65.98 | 66.05 | 66.02 |
The best performing experimental setups are provided in bold.
RSI classification results for Twitter15 and Twitter16 datasets.
| Time window | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| 240 h | | | | | | | | |
| 120 h | 48.62 | 48.62 | 48.62 | 48.62 | 49.19 | 49.11 | 49.21 | 49.19 |
| 72 h | 49.24 | 49.24 | 49.24 | 49.24 | 52.97 | 52.92 | 53.01 | 52.97 |
| 48 h | 48.61 | 48.61 | 48.61 | 48.61 | 49.73 | 49.62 | 49.75 | 49.73 |
| 4 h | 50.57 | 50.57 | 50.57 | 50.57 | 50.00 | 48.36 | 50.73 | 50.00 |
| | | | | | | | | |
| 240 h | | | | | 58.25 | 57.16 | 59.40 | 58.25 |
| 120 h | 52.43 | 52.43 | 52.50 | 52.43 | 55.34 | 54.31 | 56.05 | 55.34 |
| 72 h | 54.37 | 54.32 | 54.36 | 54.37 | 54.37 | 54.06 | 54.59 | 54.37 |
| 48 h | 61.17 | 61.05 | 61.36 | 61.17 | | | | |
| 4 h | 51.46 | 51.43 | 51.44 | 51.46 | 50.49 | 48.64 | 50.36 | 50.49 |
The best performing experimental setups are provided in bold.
Figure 5. Weighted average diffusion time per diffusion level (Twitter15).
Figure 6. Weighted average diffusion time per diffusion level (Twitter16).
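A per-level diffusion time of the kind plotted in Figures 5 and 6 can be derived from a cascade tree roughly as follows. This is a sketch under stated assumptions: the function name `level_average_delay`, the edge-list input format, and the use of a plain unweighted mean per level are all hypothetical, because the record does not specify how the authors weight the average.

```python
from collections import defaultdict, deque


def level_average_delay(edges, post_time, root):
    """Average delay (hours since the root post) for each cascade level.

    edges: iterable of (parent, child) retweet/reply links.
    post_time: dict mapping each node to its posting time in hours.
    Returns {level: mean delay}, with the root at level 0.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    delays = defaultdict(list)
    queue = deque([(root, 0)])
    seen = {root}
    while queue:  # breadth-first walk assigns every reachable node a level
        node, level = queue.popleft()
        delays[level].append(post_time[node] - post_time[root])
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return {level: sum(v) / len(v) for level, v in delays.items()}


# Toy cascade: a is the source post; b and c respond to a; d responds to b.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d")]
toy_times = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 5.0}
toy_levels = level_average_delay(toy_edges, toy_times, "a")
```

The same traversal also gives simple structural descriptors of the cascade, such as its depth (the largest level) and its size (the number of visited nodes).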
Twitter15 and Twitter16 graph-based and combined feature sets: binary classification results with labels = {true, false}.
| Feature set | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Graph + <4 h ADT | 58.60 | 59.69 | 58.16 | 61.29 | 75.73 | 76.19 | 72.73 | 80.00 |
| Graph + <24 h ADT | 60.22 | 59.34 | 60.67 | 58.06 | 77.67 | 77.67 | 75.47 | 80.00 |
| Graph + <48 h ADT | 57.53 | 56.83 | 57.78 | 55.91 | 75.73 | 77.06 | 71.19 | 84.00 |
| Graph + <72 h ADT | | | | | 77.50 | 78.50 | 73.68 | 84.00 |
| Graph + Total ADT | 57.53 | 58.64 | 57.14 | 60.22 | | | | |
| Graph + SEIZ feature set binary classification | | | | | | | | |
| Graph + <4 h ADT + SEIZ | 62.24 | 56.98 | 59.76 | 54.44 | 73.79 | 74.29 | 70.91 | 78.00 |
| Graph + <48 h ADT + SEIZ | | | | | 75.73 | 76.64 | 71.93 | 82.00 |
| Graph + <72 h ADT + SEIZ | 58.08 | 56.08 | 53.00 | 59.55 | | | | |
The best performing experimental setups are provided in bold.
Twitter15 and Twitter16 graph-based and combined feature sets: 4-class classification results.
| Feature set | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Graph + <4 h ADT | 37.27 | 37.66 | 38.54 | 37.27 | 51.71 | 51.80 | 52.60 | 51.71 |
| Graph + <24 h ADT | | | | | 56.10 | 56.01 | 57.34 | 56.10 |
| Graph + <48 h ADT | 39.68 | 39.20 | 38.95 | 39.68 | | | | |
| Graph + <72 h ADT | 37.53 | 37.09 | 36.99 | 37.53 | 59.02 | 58.75 | 59.10 | 59.02 |
| Graph + Total ADT | 38.34 | 38.41 | 39.02 | 38.34 | 52.20 | 51.86 | 51.75 | 52.20 |
| Graph + SEIZ feature set 4-class classification | | | | | | | | |
| Graph + <4 h ADT + SEIZ | | | | | 56.91 | 55.88 | 56.00 | 56.91 |
| Graph + <48 h ADT + SEIZ | 39.34 | 37.47 | 36.51 | 39.34 | | | | |
| Graph + <72 h ADT + SEIZ | 42.15 | 40.50 | 39.93 | 42.15 | 52.23 | 52.48 | 53.51 | 52.23 |
The best performing experimental setups are provided in bold.
Baseline models classification results for Twitter15 and Twitter16 datasets.
| Method | Twitter15 | Twitter16 |
|---|---|---|
| DTC (Castillo et al.) | 45.4 | 46.5 |
| SVM-RBF (Yang et al.) | 31.8 | 32.1 |
| SVM-TS (Ma et al.) | 54.4 | 57.4 |
| DTR (Zhao et al.) | 40.9 | 41.4 |
| GRU (Ma et al.) | 64.6 | 63.3 |
| RFC (Kwon et al.) | 56.5 | 58.5 |
| PTK (Ma et al.) | 75.0 | 73.2 |
| RvNN (Ma et al.) | 72.3 | 73.7 |
| PPC (Liu and Wu) | 84.2 | 86.3 |
| Our approach | 46.5 | 62.4 |