Sejeong Kwon, Meeyoung Cha, Kyomin Jung.
Abstract
This study determines the major difference between rumors and non-rumors and explores rumor classification performance over varying time windows, from the first three days to nearly two months. A comprehensive set of user, structural, linguistic, and temporal features was examined, and their relative strength was compared using near-complete data from Twitter. Our contribution is to provide deep insight into the cumulative spreading patterns of rumors over time and to track the precise changes in predictive power across rumor features. Statistical analysis finds that structural and temporal features distinguish rumors from non-rumors over a long-term window, yet they are not available during the initial propagation phase. In contrast, user and linguistic features are readily available and act as good indicators during the initial propagation phase. Based on these findings, we suggest a new rumor classification algorithm that achieves competitive accuracy over both short and long time windows. These findings provide new insights for explaining rumor mechanism theories and for identifying features for early rumor detection.
Year: 2017 PMID: 28081135 PMCID: PMC5230768 DOI: 10.1371/journal.pone.0168344
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Log-log scale of the CCDF.
Complementary cumulative distribution function (CCDF) of aggregated user characteristics for rumor and non-rumor events over the 56-day observation period.
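As a rough illustration of what a CCDF plot of this kind shows, the sketch below computes an empirical CCDF from a list of per-user values. It assumes the convention P(X >= x), which is common for log-log plots; the source does not state which convention was used. The helper name `empirical_ccdf` and the follower counts are made up for illustration.

```python
from collections import Counter

def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the distinct values in `values`."""
    n = len(values)
    counts = Counter(values)
    ccdf = []
    remaining = n  # number of observations that are >= the current x
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf

# Made-up follower counts, for illustration only.
followers = [10, 10, 50, 200, 200, 200, 1000, 50000]
points = empirical_ccdf(followers)
for x, prob in points:
    print(x, prob)
```

Plotting both coordinates on logarithmic axes would give a curve of the kind described in the caption; a heavy-tailed distribution appears as a slowly decaying line.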
Fig 2. Diffusion network examples.
The network visualization in (a) shows that rumors involve a larger fraction of singletons and smaller communities, resulting in a sporadic diffusion pattern. In contrast, the diffusion network of a non-rumor event in (b) is highly connected, forming a giant connected component and a smaller fraction of singletons. Edge colors represent the relative influence of the spreader and recipient, such that red (blue) means information propagated from a lower-degree (higher-degree) spreader to a higher-degree (lower-degree) recipient.
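The edge coloring described above can be sketched as a simple count over directed edges. This is a minimal example under stated assumptions: "degree" is taken to be any per-user number such as a follower count, ties are counted in neither group, and the LTH and HTL features listed in the selected-features table are assumed to stand for low-to-high and high-to-low propagation. The function name and the toy data are hypothetical.

```python
def direction_fractions(edges, degree):
    """Return (low-to-high fraction, high-to-low fraction) over directed edges.

    edges: list of (spreader, recipient) pairs.
    degree: dict mapping each user to a degree-like number.
    """
    if not edges:
        return 0.0, 0.0
    low_to_high = sum(1 for s, r in edges if degree[s] < degree[r])
    high_to_low = sum(1 for s, r in edges if degree[s] > degree[r])
    return low_to_high / len(edges), high_to_low / len(edges)

# Toy data, for illustration only.
degree = {"a": 5, "b": 500, "c": 20, "d": 20}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
lth_frac, htl_frac = direction_fractions(edges, degree)
print(lth_frac, htl_frac)
```

Here two of the four edges go from a lower-degree spreader to a higher-degree recipient, one goes the other way, and one connects users of equal degree.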
Fig 3. Samples of extracted time series.
The time series are extracted from the 56-day observation period (x-axis = days; y-axis = number of tweets). Rumors typically have longer life spans and more fluctuations.
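A daily tweet-count series of this shape could be built roughly as follows. The sketch assumes that tweets are available as `datetime` timestamps and that day 0 begins at a chosen start time; both are assumptions, since the source does not describe the extraction procedure. The name `daily_counts` and the sample timestamps are hypothetical.

```python
from datetime import datetime

def daily_counts(timestamps, start, days=56):
    """Count tweets per day over a fixed observation window starting at `start`."""
    counts = [0] * days
    for ts in timestamps:
        day = (ts - start).days  # whole days elapsed since the window start
        if 0 <= day < days:      # ignore tweets outside the window
            counts[day] += 1
    return counts

# Made-up timestamps, for illustration only.
start = datetime(2017, 1, 1)
tweets = [
    datetime(2017, 1, 1, 9),
    datetime(2017, 1, 1, 23),
    datetime(2017, 1, 3, 12),
    datetime(2017, 3, 1),      # day 59, outside the 56-day window
    datetime(2016, 12, 31),    # before the window starts
]
series = daily_counts(tweets, start)
print(series[:5])
```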
User features with invariant tendencies during the observation periods.
| Symbol | Description | Class (3 days) | Class (7 days) | Class (14 days) | Class (28 days) | Class (56 days) |
|---|---|---|---|---|---|---|
| | Minimum of number of friends | R | R | R | R | R |
| | 75th percentile of number of friends | N | N | N | N | |
| | Maximum of number of friends | N | N | N | N | |
| | Standard deviation of number of friends | N | N | N | N | N |
| | 25th percentile of number of followers | N | N | N | N | N |
| | Median of number of followers | N | N | N | N | N |
| | 75th percentile of number of followers | N | N | N | N | N |
| | Maximum of number of followers | N | N | N | N | N |
| | Average of number of followers | N | N | N | N | N |
| | Standard deviation of number of followers | N | N | N | N | N |
| | Kurtosis of number of followers | N | N | N | N | N |
| | Skewness of number of followers | N | N | N | N | N |
| | 25th percentile of number of tweets | N | N | N | N | N |
| | Median of number of tweets | N | N | N | N | N |
| | 75th percentile of number of tweets | N | N | N | N | N |
| | Maximum of number of tweets | N | N | N | N | N |
| | Average of number of tweets | N | N | N | N | N |
| | Standard deviation of number of tweets | N | N | N | N | N |
In the Class column, R (or N) indicates that the target feature had a higher value for rumors (or non-rumors) over the specific observation period, at a significance level of p < 0.05 based on the Mann-Whitney U test. A blank field indicates that the difference was not significant. Results are shown for observation periods of 3, 7, 14, 28, and 56 days.
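The R/N/blank labeling can be sketched as follows. This is an illustrative implementation, not the authors' code: it uses a two-sided Mann-Whitney U test with the normal approximation and a tie-corrected variance, which is an assumption (the source names only the test and the 0.05 threshold). For small samples an exact test, such as the one offered by `scipy.stats.mannwhitneyu`, would be more appropriate. The function names and sample values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of x, p-value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i                   # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))

def classify_feature(rumor_values, non_rumor_values, alpha=0.05):
    """Return 'R', 'N', or '' (blank) following the table's convention."""
    u, p = mann_whitney_u(rumor_values, non_rumor_values)
    if p >= alpha:
        return ""
    half = len(rumor_values) * len(non_rumor_values) / 2
    return "R" if u > half else "N"

# Made-up feature values per event, for illustration only.
rumor = [1, 2, 3, 4, 5, 6, 7, 8]
non_rumor = [11, 12, 13, 14, 15, 16, 17, 18]
label = classify_feature(rumor, non_rumor)
print(label)
```

In this toy case every non-rumor value exceeds every rumor value, so the feature is labeled N; two identical samples yield a blank.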
Temporal features with invariant tendencies during the observation periods.
| Symbol | Description | Class (3 days) | Class (7 days) | Class (14 days) | Class (28 days) | Class (56 days) |
|---|---|---|---|---|---|---|
| | Periodicity of external shock | N | N | | | |
| | External shock periodicity offset | N | N | N | | |
| | Interaction periodicity offset | N | N | N | | |
Linguistic features with invariant tendencies during the observation periods.
| Symbol | Description | Class (3 days) | Class (7 days) | Class (14 days) | Class (28 days) | Class (56 days) |
|---|---|---|---|---|---|---|
| family | family (daughter, father, husband, aunt) | N | N | N | N | N |
| i | 1st person singular (I, me, mine) | R | R | R | R | R |
| you | 2nd person (you, your, thou) | R | R | R | R | R |
| conj | conjunctions (and, but, whereas, although) | R | R | R | R | R |
| present | present tense (is, does, hear) | R | R | R | R | R |
| auxverb | auxiliary verbs (am, will, have) | R | R | R | R | R |
| discrep | discrepancy (would, should, could) | R | R | R | R | R |
| adverb | adverb (very, really, quickly) | R | R | R | R | R |
| excl | exclusive (but, without, exclude) | R | R | R | R | R |
| cogmech | cognitive mechanism (cause, know, ought) | R | R | R | R | R |
| negate | negations (not, no, never) | R | R | R | R | R |
| tentat | tentative (maybe, perhaps, guess) | R | R | R | R | R |
| assent | assent (agree, OK, yes) | R | R | R | R | |
| certain | certain (always, never) | R | R | R | R | |
| social | social processes (mate, talk, they, child) | R | R | R | ||
| swear | swear words (damn, piss, f*ck) | R | R | R | ||
| hear | hear (listen, hearing) | R | R | R | ||
The ‘Description’ column lists example words for each symbol.
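To make the idea of a per-category linguistic score concrete, here is a minimal sketch that scores a text as the share of its tokens falling in each category. The tiny lexicon reuses only the example words listed in the table and is in no way the full dictionary behind these features; the scoring rule (token share) and the names `LEXICON` and `category_scores` are assumptions for illustration.

```python
import re

# Hypothetical mini-lexicon built from the example words in the table.
LEXICON = {
    "i": {"i", "me", "mine"},
    "negate": {"not", "no", "never"},
    "tentat": {"maybe", "perhaps", "guess"},
}

def category_scores(text, lexicon=LEXICON):
    """Return, per category, the fraction of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in lexicon}
    return {
        name: sum(1 for t in tokens if t in words) / len(tokens)
        for name, words in lexicon.items()
    }

scores = category_scores("Maybe it is not true, I guess")
print(scores)
```

The sample sentence has seven tokens, two of which ("maybe", "guess") are tentative words, one a negation, and one a first-person singular pronoun.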
Fig 4. Correlogram among the linguistic features.
These plots show correlations among the linguistic features that were significant for rumor events. Circle colors represent correlation coefficients, where dark blue (red) indicates a coefficient of 1 (-1). Circle sizes represent the absolute values of the correlation coefficients. A blank cell (no circle) indicates a non-significant case with a p-value ≥ 0.05. In (b), strong negation is expressed by a positive correlation between the scores of ‘negation’ and ‘certain’. Interestingly, the ‘assent’ category increases for rumors, which indicates that users who confirmed the rumor also appeared as time passed.
Network features with invariant tendencies during the observation periods.
| Symbol | Description | Class (3 days) | Class (7 days) | Class (14 days) | Class (28 days) | Class (56 days) |
|---|---|---|---|---|---|---|
| | Number of nodes of extended network | N | N | N | N | N |
| | Number of edges of extended network | N | N | N | N | N |
| | Number of nodes without incoming edge in extended network | N | N | N | N | N |
| | Number of nodes without outgoing edge in extended network | N | N | N | N | N |
| | Density of LCC of extended network | R | R | R | R | R |
| | Number of edges of LCC of extended network | N | N | N | N | N |
| | Number of nodes of friendship network | N | N | N | N | N |
| | Number of edges of friendship network | N | N | N | N | N |
| | Number of nodes without outgoing edge in friendship network | N | N | N | N | N |
| | Density of LCC of friendship network | R | R | R | R | R |
| | Number of edges of LCC of friendship network | N | N | N | N | N |
| | Proportion of nodes without incoming edges of friendship network | R | R | R | R | R |
| | Proportion of nodes without outgoing edges of friendship network | R | R | R | R | R |
| | Proportion of isolated nodes of friendship network | R | R | R | R | R |
| | Density of LCC of diffusion network | R | R | R | R | R |
| | Number of nodes of LCC of diffusion network | N | N | N | N | N |
| | Number of edges of LCC of diffusion network | N | N | N | N | N |
| | Proportion of nodes without incoming edges of diffusion network | R | R | R | R | R |
| | Proportion of nodes without outgoing edges of diffusion network | R | R | R | R | R |
| | Proportion of isolated nodes of diffusion network | R | R | R | R | R |
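Several of the network features in this table can be computed with a few lines of graph code. The sketch below makes its own assumptions, none of which are stated in the source: the LCC is taken as the largest weakly connected component, density is the directed density m / (n(n-1)), an isolated node is one with no incoming and no outgoing edge, edges are unique, and every edge endpoint appears in a non-empty node set. The function name and toy graph are hypothetical.

```python
from collections import defaultdict

def network_features(nodes, edges):
    """Compute a few structural features of a directed network.

    nodes: iterable of node ids; edges: list of unique (source, target) pairs.
    """
    nodes = set(nodes)
    neighbors = defaultdict(set)  # undirected adjacency, for weak connectivity
    has_in, has_out = set(), set()
    for s, t in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)
        has_out.add(s)
        has_in.add(t)

    # Find the largest weakly connected component with a depth-first search.
    seen, lcc = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        if len(component) > len(lcc):
            lcc = component

    lcc_edges = sum(1 for s, t in edges if s in lcc and t in lcc)
    n = len(lcc)
    total = len(nodes)
    return {
        "isolated_fraction": sum(
            1 for v in nodes if v not in has_in and v not in has_out
        ) / total,
        "no_incoming_fraction": sum(1 for v in nodes if v not in has_in) / total,
        "lcc_nodes": n,
        "lcc_edges": lcc_edges,
        "lcc_density": lcc_edges / (n * (n - 1)) if n > 1 else 0.0,
    }

# Toy network: a three-node component plus two singletons.
feats = network_features(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("a", "c")],
)
print(feats)
```

In the toy network, two of the five nodes are singletons, which raises the isolated-node proportion in the same way the table reports for rumor events.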
Fig 5. Comparison of the strength of the features in determining rumors.
Total and User+Linguistic are the newly proposed rumor classification algorithms in this study.
Selected features.
| Symbol | Description | Selection (3 days) | Selection (7 days) | Selection (14 days) | Selection (28 days) | Selection (56 days) |
|---|---|---|---|---|---|---|
| | Standard deviation of number of friends | ** | | | | |
| | Standard deviation of number of followers | ** | | | | |
| | Average of number of followers | ** | | | | |
| i | 1st person singular (I, me, mine) | * | * | | | |
| conj | conjunctions (and, but, whereas, although) | ** | ** | ** | | |
| auxverb | auxiliary verbs (am, will, have) | ** | ** | ** | | |
| adverb | adverb (very, really, quickly) | ** | * | ** | ** | |
| excl | exclusive (but, without, exclude) | ** | * | ** | ** | ** |
| cogmech | cognitive mechanism (cause, know, ought) | * | ** | ** | | |
| affect | affective processes (happy, cried, abandon) | ** | * | | | |
| negate | negations (not, no, never) | ** | ** | ** | ** | ** |
| tentat | tentative (maybe, perhaps, guess) | ** | * | ** | ** | ** |
| certain | certain (always, never) | ** | ** | | | |
| hear | hear (listen, hearing) | * | ** | | | |
| | Number of edges of LCC of extended network | ** | ** | * | | |
| | Average clustering coefficients of LCC of friendship network | * | * | | | |
| | Proportion of isolated nodes of diffusion network | ** | | | | |
| | Fraction of LTH among information diffusion | ** | ** | | | |
| | Fraction of HTL among information diffusion | ** | ** | | | |
| | Strength of external shock at birth | * | ** | | | |
| | Periodicity of external shock | ** | | | | |
| | External shock periodicity offset | ** | | | | |
| | Interaction periodicity offset | ** | * | | | |
This table lists the features selected as the most prominent differentiating ones among all features. In the Selection column, * and ** indicate that the target feature was selected for the interpretation set and the prediction set, respectively. If a feature is selected for the prediction set, it is also in the interpretation set by definition. A blank field indicates that the corresponding feature was not selected.