Zoleikha Jahanbakhsh-Nagadeh1,2,3, Mohammad-Reza Feizi-Derakhshi1, Majid Ramezani1, Taymaz Akan1,4,5, Meysam Asgari-Chenaghlu1, Narjes Nikzad-Khasmakhi1, Ali-Reza Feizi-Derakhshi1, Mehrdad Ranjbar-Khadivi1,6, Elnaz Zafarani-Moattar1,7, Mohammad-Ali Balafar8.
Abstract
With technologies that have democratized the production and reproduction of information, a significant portion of the posts exchanged daily on social media has been infected by rumors. Despite extensive research on rumor detection and verification, the problem of calculating the spread power of rumors has not yet been considered. To address this research gap, the present study seeks a model that calculates the Spread Power of Rumor (SPR) as a function of content-based features in two categories: False Rumor (FR) and True Rumor (TR). For this purpose, the theory of Allport and Postman is adopted, which claims that importance and ambiguity are the key variables in rumor-mongering and the power of rumor. In total, 42 content features in two categories, "importance" (28 features) and "ambiguity" (14 features), are introduced to compute SPR. The proposed model is evaluated on two datasets, Twitter and Telegram. The results showed that (i) the spread power of False Rumor documents is rarely more than that of True Rumors; (ii) there is a significant difference between the SPR means of the two groups, False Rumor and True Rumor; and (iii) SPR as a criterion can have a positive impact on distinguishing False Rumors from True Rumors.
Keywords: Ambiguity of rumor; Automatic rumor verification; Importance of rumor; Spread Power of Rumor (SPR)
Year: 2022 PMID: 35789600 PMCID: PMC9244448 DOI: 10.1007/s12652-022-04034-1
Source DB: PubMed Journal: J Ambient Intell Humaniz Comput
The list of previous machine learning techniques for rumor detection based on Content (C), User (U), and Structural (S) features
| References | Lang. | Dataset | Method | C | U | S | Conclusion |
|---|---|---|---|---|---|---|---|
| Castillo et al. ( | EN | | DT, NB, SVM | | | | DT as best classifier |
| Qazvinian et al. ( | EN | | Present the tweet with two patterns: Lexical and Part-of-speech | | | | Identify users that spread false information in online social media using their proposed features |
| Yang et al. ( | CHI | Sina Weibo | SVM-RBF kernel | | | | Improve in accuracy |
| Kwon et al. ( | EN | | DT, RF, SVM, LR | | | | RF as best classifier |
| Wu et al. ( | CHI | Sina Weibo | SVM-RBF kernel | | | | Improve in accuracy using network based features |
| Wang and Terano ( | EN | | Analyze patterns of diffusion with linear model | | | | Identify influential spreaders |
| Vosoughi ( | EN | | Verify rumors in different time periods using DTW and HMMs | | | | HMM as best classifier |
| Floos ( | AR | | TF-IDF | | | | The effectiveness of content features in validating Arabic tweets |
| Zhao et al. ( | EN | | Searching enquiry phrases, clustering similar posts, then ranking the clusters | | | | Accuracy of 0.52 for their best run using J48 |
| Liu and Xu ( | CHI | Sina Weibo | SVM | | | | Differences in the propagation patterns of rumors and credible messages |
| Zubiaga et al. ( | EN | | A sequential classifier | | | | |
| Kwon et al. ( | EN | | RF | | | | Identify significant features in the first 3, 7, 14, 28 and 56 days of the initiation |
| Zamani et al. ( | FA | | J48, Naive Bayes, SMO, IBK | | | | About 70% precision just based on structural features and about 80% based on both categories of features |
| Mahmoodabad et al. ( | FA | | MLP, KNN, DT, NB, Random Tree, RF, Rules.Part, SVM, etc | | | | RF and meta. RandomSubSpace as best classifiers |
| Kumar et al. ( | EN | | Implement SVM, DT, KNN, NB, NN using Particle Swarm Optimization (PSO) to select optimal features | | | | Improve in accuracy using PSO |
Fig. 1 Proposed structure to compute spread power of rumor
Fig. 2 The hierarchical structure of feature engineering for the SPR calculation
A summary of Emotional features along with a brief description of each (The new features are marked with a “*”)
| Abbr. | Feature | Description |
|---|---|---|
| Emotional features | ||
| ETag | Emotiveness Zhou et al. ( | The ratio of adjectives plus adverbs to nouns plus verbs |
| Fr | Fear* | The ratio of the number of sentences containing fear-based words to the total number of sentences in the document |
| Su | Surprise* | The ratio of the number of sentences containing surprise-based words to the total number of sentences in the document |
| Dsg | Disgust* | The ratio of the number of sentences containing disgust-based words to the total number of sentences in the document |
| Sad | Sadness* | The ratio of the number of sentences containing sadness-based words to the total number of sentences in the document |
| An | Anger* | The ratio of the number of sentences containing anger-based words to the total number of sentences in the document |
| Aff | Affective* | The ratio of the number of sentences containing affective-based words (Words that cause emotion or feeling, such as, |
| MV | Motion Verbs* | The ratio of the number of sentences containing motion verbs (such as, jump, dilatory, rotation and so on.) to the total number of sentences in the document |
| CW | Consecutive Words* | The ratio of the number of sentences containing consecutive repeated words (such as, |
| CC | Consecutive Chars* | The ratio of the number of sentences containing consecutive repeated characters in a word (such as, |
| PS | Positive Sentiment Castillo et al. ( | The ratio of the number of positive words in the document to the sum of positive and negative words. If the document contains neither positive nor negative words, PS is zero |
| NS | Negative Sentiment Castillo et al. ( | The ratio of the number of negative words in the document to the sum of positive and negative words. If the document contains neither positive nor negative words, NS is zero |
| SA_Thrt | Speech Act_Threat Jahanbakhsh-Nagadeh et al. ( | The SA_Thrt of a document is determined by SA classifier provided by Jahanbakhsh-Nagadeh et al. ( |
| SA_Req | Speech Act_Request Jahanbakhsh-Nagadeh et al. ( | The SA_Req (i.e., politely asking somebody to do or stop doing something) of a document is determined by the SA classifier Jahanbakhsh-Nagadeh et al. ( |
| Adj_Sup | Superlative Adjective* | The ratio of the number of sentences containing Adj_Sup (Simple Adjective + Suffixes |
| Adj_Cmp | Comparative Adjective* | The ratio of the number of sentences containing Adj_Cmp (Simple Adjective + Suffixes + |
| Strt | Start sentence* | Indicates whether the first sentence of the document contains emotion-based words. This feature has a Boolean value for each document |
| End | End sentence* | Indicates whether the last sentence of the document contains emotion-based words and words associated with a request. This feature has a Boolean value for each document |
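
The PS and NS definitions above include a zero-denominator rule that is easy to miss in code. The following is a minimal Python sketch of that rule only; the function name `sentiment_ratios` and the tiny English word sets are illustrative assumptions, not the lexicons used in the study (whose data are Persian).

```python
def sentiment_ratios(tokens, positive_words, negative_words):
    """Return (PS, NS) for one tokenized document.

    PS = positive / (positive + negative), NS = negative / (positive + negative).
    Both are 0.0 when the document has no sentiment-bearing words.
    """
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    total = pos + neg
    if total == 0:
        # Rule from the table: no positive or negative words -> both scores are zero.
        return 0.0, 0.0
    return pos / total, neg / total


# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "safe"}
NEGATIVE = {"bad", "danger"}

ps, ns = sentiment_ratios(["good", "bad", "good", "news"], POSITIVE, NEGATIVE)
print(round(ps, 3), round(ns, 3))
```

When at least one sentiment word is present, PS and NS sum to one, so a model that uses both features gains independent information only through the zero case.
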
A summary of Ambiguity features along with a brief description of each (The new features are marked with a “*”)
| Abbr. | Feature | Description |
|---|---|---|
| Ucer | Uncertainty Zhou et al. ( | The ratio of uncertainty-based words to the sum of certainty-based and uncertainty-based words in the document. If the document contains neither certainty-based nor uncertainty-based words, the uncertainty score is zero |
| SV | Sensory Verb Hamidian and Diab ( | The ratio of the number of sentences containing SV (Such as, |
| QW | Question Word Castillo et al. ( | The ratio of the number of sentences containing QW (Such as, what, when, where, and who) to the total number of sentences in the document |
| QM | Question Mark Castillo et al. ( | The ratio of the number of sentences containing a question mark "?" or multiple question marks "?????" to the total number of sentences of the document |
| EM | Exclamation Mark Castillo et al. ( | The ratio of the number of sentences containing the exclamation mark to the total number of sentences of the document |
| SA_Ques | Speech Act_Question Jahanbakhsh-Nagadeh et al. ( | The SA_Ques (Such as, usual questions for information or confirmation) of a document is determined by the SA classifier Jahanbakhsh-Nagadeh et al. ( |
| Pro | Pronoun Castillo et al. ( | The ratio of the number of sentences containing a pronoun (a personal pronoun in the 1st, 2nd, or 3rd person) to the total number of sentences of the document |
| Tntv | Tentative Hamidian and Diab ( | The ratio of the number of sentences containing a tentative adjective (one that describes something uncertain or unsure) to the total number of sentences of the document |
| Neg | Negation Kwon et al. ( | The ratio of the number of sentences containing negation (words such as not, no, never, and incredible, or affixes such as -n’t, un-, and any-) to the total number of sentences of the document |
| Antcpnt | Anticipation Kwon et al. ( | The ratio of the number of sentences containing Anticipation-based words to the total number of sentences of the document |
| Adv_Exm | Example Words* | The ratio of the number of sentences containing Example-based words (such as, |
| If | Conditional words* | The ratio of the number of sentences containing the conditional conjunctions (such as, if) to the total number of sentences of the document |
| GT | General Terms Castillo et al. ( | The ratio of the number of sentences containing general terms (terms that refer to a person or object as a class of persons or objects) to the total number of sentences of the document |
| UT | Un_Trust* | The ratio of the number of sentences containing un_trust words (such as, lack of trust, distrust, and suspicion) to the total number of sentences of the document |
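
Most features in the tables above share one pattern: the number of sentences that satisfy some condition, divided by the total number of sentences in the document. A minimal sketch of that pattern is given below for QM, EM, and If. The regex-based sentence splitter, the helper names, and the one-word conditional list are simplifying assumptions; the study's own tokenizer and Persian word lists are not reproduced here.

```python
import re

# Naive splitter: a sentence is a run of non-terminators followed by its
# terminal punctuation. The Persian question mark is treated as a terminator too.
_SENTENCE_RE = re.compile(r"[^.!?؟]+[.!?؟]*")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE_RE.findall(text) if s.strip()]


def sentence_ratio(text, predicate):
    """Share of sentences for which predicate(sentence) is true; 0.0 for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if predicate(s)) / len(sentences)


def question_mark_ratio(text):  # QM
    return sentence_ratio(text, lambda s: "?" in s or "؟" in s)


def exclamation_mark_ratio(text):  # EM
    return sentence_ratio(text, lambda s: "!" in s)


def conditional_ratio(text, conditional_words=frozenset({"if"})):  # If
    def has_conditional(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return any(w in conditional_words for w in words)
    return sentence_ratio(text, has_conditional)


doc = "Is it true? They say so. If it rains, stay home! Really????"
print(question_mark_ratio(doc), exclamation_mark_ratio(doc), conditional_ratio(doc))
```

A run of repeated question marks counts once per sentence, which matches the QM description: the feature measures how many sentences contain the mark, not how many marks appear.
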
A summary of Newsworthy features along with a brief description of each (The new features are marked with a “*”)
| Abbr. | Feature | Description |
|---|---|---|
| Newsworthy features | ||
| RT | Relative Time* | The ratio of the number of sentences containing RT-based words (such as, |
| Adj_Qty | Quantity Adjective* | The ratio of the number of sentences containing Quantity Adjective (such as, |
| ND | Numerical Digits Castillo et al. ( | The ratio of the number of sentences containing numeric characters to the total number of sentences in the document |
| NE | Named Entity Hamidian and Diab ( | The ratio of the number of sentences containing NE (In three classes, the person’s name, the organization, the location) to the total number of sentences in the document |
| LD | Lexical Diversity Zhou et al. ( | The ratio of vocabulary to the total number of terms in the document Zhou et al. ( |
| Cer | Certainty Zhou et al. ( | The ratio of certainty-based words to the sum of certainty-based and uncertainty-based words in the document. If the document contains neither certainty-based nor uncertainty-based words, the certainty score is zero |
| SA_Dec | SA_Declarative Jahanbakhsh-Nagadeh et al. ( | The SA_Dec (i.e., transferring information to the hearer) of a document is determined by the SA classifier Jahanbakhsh-Nagadeh et al. ( |
| SA_Quot | SA_Quotations Jahanbakhsh-Nagadeh et al. ( | The SA_Quot (i.e., speech acts that another person said or wrote before) of a document is determined by the SA classifier Jahanbakhsh-Nagadeh et al. ( |
| Adj_Ord | Ordinal Adjective* | The ratio of the number of sentences containing Adj_Ord (Number + |
| SM | Spelling Mistake Zhou et al. ( | The ratio of misspelled words (typographical errors) to the total number of words in the document |
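
Unlike the sentence-level ratios, LD and SM are computed over words. The sketch below shows one way to read the two definitions, assuming the document is already tokenized; the function names and the use of a plain word set as the spelling dictionary are assumptions made for illustration.

```python
def lexical_diversity(tokens):
    """LD: number of distinct terms divided by the total number of terms."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def misspelling_ratio(tokens, dictionary):
    """SM: share of tokens not found in a reference dictionary (hypothetical word set)."""
    if not tokens:
        return 0.0
    misspelled = sum(1 for t in tokens if t not in dictionary)
    return misspelled / len(tokens)


tokens = "the rumor said the rumor spread".split()
print(round(lexical_diversity(tokens), 3))
print(round(misspelling_ratio(["teh", "rumor", "spread"], {"the", "rumor", "spread"}), 3))
```
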
Fig. 3 The general procedure of the SPR calculation
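
The record does not reproduce the SPR formula itself, only that SPR is a function of the importance and ambiguity features. As a hedged illustration, the sketch below aggregates each feature group with a weighted mean and combines the two scores by multiplication, following Allport and Postman's classic formulation in which rumor strength varies with importance times ambiguity. Both the weighted-mean aggregation and the product are assumptions, not the paper's confirmed method, and the feature values and weights shown are invented.

```python
def weighted_mean(features, weights=None):
    """Weighted mean of a {name: value} dict; missing weights default to 1.0."""
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in features)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(name, 1.0) for name, value in features.items())
    return weighted_sum / total_weight


def spread_power(importance_features, ambiguity_features,
                 importance_weights=None, ambiguity_weights=None):
    """Return (importance, ambiguity, spr) with spr = importance * ambiguity.

    ASSUMPTION: the product form follows Allport and Postman's basic law;
    the paper's actual aggregation may differ.
    """
    importance = weighted_mean(importance_features, importance_weights)
    ambiguity = weighted_mean(ambiguity_features, ambiguity_weights)
    return importance, ambiguity, importance * ambiguity


# Invented feature values for a single document.
imp, amb, spr = spread_power({"Fr": 0.2, "NE": 0.4}, {"QM": 0.5, "If": 0.1})
print(round(imp, 3), round(amb, 3), round(spr, 3))
```

The optional weight dictionaries are the natural place to plug in per-feature weights such as those a PSO search would produce.
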
Distribution of Persian rumors datasets
| Dataset | FR | TR | Description |
|---|---|---|---|
| Twitter Zamani et al. ( | 783 | 783 | Twitter rumors crawled from two Iranian websites that publish Persian rumors, Gomaneh.com and Shayeaat.ir, and annotated by Zamani et al. |
| Telegram Jahanbakhsh-Nagadeh et al. ( | 882 | 882 | Telegram rumors crawled from three Telegram channels of Iranian websites: Gomaneh.com, Wikihoax.org, and Shayeaat.ir. In addition, several Telegram channels (i.e., Fars News Agency, Iranian Students’ News Agency (ISNA), Tasnim News Agency, Tabnak, Nasim News Agency (NNA), Mehr News Agency (MNA), Islamic Republic News Agency (IRNA)) were crawled to extract non-rumors |
The result of the t-test for the 42 proposed features along with the two criteria Ambiguity (Amb) and Importance (Imp) (values less than 0.05 are italicized)
| ETag | Fr | Su | Dsg | Sad | An | Aff | MV | |
|---|---|---|---|---|---|---|---|---|
| P-value | 0.952 | 0.094 | ||||||
| P-value | 0.175 | 0.305 | ||||||
| P-value | ||||||||
| P-value | 0.847 | |||||||
| P-value | 0.228 | 0.515 | 0.867 | |||||
| P-value |
Independent t-test values for SPR
| | | F (Levene's test) | Sig. (Levene's test) | t | df | Sig. (2-tailed) | Mean difference | Std. error difference | 95% CI of the difference (lower) | 95% CI of the difference (upper) |
|---|---|---|---|---|---|---|---|---|---|---|
| SPR | Equal variances assumed | 15.188 | 0.000 | 4.835 | 1233 | | 0.024484 | 0.005064 | 0.014549 | 0.034419 |
| | Equal variances not assumed | | | 4.815 | 1131.062 | | 0.024484 | 0.005085 | 0.014507 | 0.034461 |
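
The two rows of the table correspond to the pooled-variance t-test and Welch's t-test. In both rows, t is the mean difference divided by its standard error (0.024484 / 0.005064 ≈ 4.835 and 0.024484 / 0.005085 ≈ 4.815). Because Levene's test is significant here, the "equal variances not assumed" row is the one usually read. The following standard-library sketch shows how both statistics and their degrees of freedom can be computed; the function names and the two small samples are illustrative assumptions, not the study's SPR data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t-test statistic and df, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, na + nb - 2


def welch_t(a, b):
    """Welch's t-test statistic and Welch-Satterthwaite df (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    std_err = sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / std_err, df


# Invented samples, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

With equal group sizes the two t values coincide and only the degrees of freedom differ; with unequal sizes and variances, as in the table, both change.
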
Fig. 4 The illustration of the distribution of emotional features by boxplots in two classes of FR (0) and TR (1)
Fig. 5 The illustration of the distribution of newsworthy-based features by boxplots in two classes of FR (0) and TR (1)
Fig. 6 The illustration of the distribution of ambiguity-based features by boxplots in two classes of FR (0) and TR (1)
Fig. 7 The illustration of the distribution of five features (Emo, Nws, Imp, Amb, and SPR) by box plots in two classes of FR (0) and TR (1)
Comparison of the average values of importance (Imp.), ambiguity (Amb.), and spread power of rumors (SPR) in the two categories FR and TR on Twitter and Telegram
| Dataset | Category | Avg. Imp. | Avg. Amb. | Avg. SPR |
|---|---|---|---|---|
| Twitter Zamani et al. ( | FR | 0.217 | 0.137 | 0.135 |
| | TR | 0.274 | 0.114 | 0.103 |
| Telegram Jahanbakhsh-Nagadeh et al. ( | FR | 0.326 | 0.274 | 0.242 |
| | TR | 0.361 | 0.269 | 0.218 |
Precision, recall, and F-measure of the RF classifier based on the features proposed to compute the SPR (with and without feature weighting by PSO)
| Dataset | Category | Precision (with/without) | Recall (with/without) | F-measure (with/without) |
|---|---|---|---|---|
| Twitter | FR | 0.772 / 0.750 | 0.746 / 0.712 | 0.759 / 0.730 |
| | TR | 0.754 / 0.726 | 0.780 / 0.763 | 0.766 / 0.743 |
| | Avg | 0.763 / 0.738 | 0.763 / 0.738 | 0.763 / 0.737 |
| Telegram | FR | 0.791 / 0.742 | 0.814 / 0.781 | 0.802 / 0.760 |
| | TR | 0.825 / 0.768 | 0.803 / 0.729 | 0.814 / 0.751 |
| | Avg | 0.808 / 0.755 | 0.808 / 0.755 | 0.808 / 0.755 |
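
The F-measure column can be checked against the precision and recall columns, since the F-measure is their harmonic mean. A minimal sketch follows; the function name `f_measure` is an assumption, and the inputs are the rounded FR values with PSO weighting reported in the table, so results can differ from the published figures in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# FR rows with PSO weighting, taken from the table above.
print(round(f_measure(0.772, 0.746), 3))  # first FR row
print(round(f_measure(0.791, 0.814), 3))  # Telegram FR row
```
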
The effect of SPR in rumor detection on Telegram dataset using RF classifier
| TP rate | FP rate | P | R | F1 | |
|---|---|---|---|---|---|
| (1) Content features | |||||
| FR | 0.753 | 0.230 | 0.766 | 0.753 | 0.759 |
| TR | 0.770 | 0.247 | 0.757 | 0.770 | 0.764 |
| Avg | 0.762 | 0.238 | 0.762 | 0.762 | 0.762 |
| (2) Content features + SPR | |||||
| FR | 0.802 | 0.145 | 0.802 | 0.846 | 0.824 |
| TR | 0.855 | 0.198 | 0.855 | 0.812 | 0.833 |
| Avg | 0.828 | 0.172 | 0.828 | 0.829 | |
Comparison of the proposed method with previous methods to detect Persian rumors based on content-based features analysis
| Method | Measure | Twitter TR | Twitter FR | Twitter Avg | Telegram TR | Telegram FR | Telegram Avg |
|---|---|---|---|---|---|---|---|
| Zamani et al. ( | Pr | 0.568 | 0.928 | 0.753 | 0.628 | 0.951 | 0.787 |
| | Re | 0.987 | 0.194 | 0.587 | 0.981 | 0.412 | 0.658 |
| | F1 | 0.721 | 0.320 | 0.514 | 0.765 | 0.575 | 0.674 |
| Jahanbakhsh-Nagadeh et al. ( | Pr | 0.760 | 0.705 | 0.734 | 0.774 | 0.810 | 0.792 |
| | Re | 0.710 | 0.755 | 0.732 | 0.823 | 0.760 | 0.791 |
| | F1 | 0.734 | 0.729 | 0.732 | 0.798 | 0.784 | 0.791 |
| Content features + SPR | Pr | 0.736 | 0.792 | 0.766 | 0.855 | 0.802 | 0.828 |
| | Re | 0.780 | 0.750 | 0.764 | 0.812 | 0.846 | 0.829 |
| | F1 | | | | | | |