Xuefan Dong1,2, Ying Lian3, Yuxue Chi4,5, Xianyi Tang6, Yijun Liu4,5.
Abstract
Based on the supernetwork theory, a two-step rumor detection model was proposed. The first step was the classification of users on the basis of user-based features. In the second step, non-user-based features, including psychology-based features, content-based features, and parts of supernetwork-based features, were used to detect rumors posted by different types of users. Four machine learning methods, namely, Naive Bayes, Neural Network, Support Vector Machine, and Logistic Regression, were applied to train the classifier. Four real cases and several assessment metrics were employed to verify the effectiveness of the proposed model. Performance of the model regarding early rumor detection was also evaluated by separating the datasets according to the posting time of posts. Results showed that this model exhibited better performance in rumor detection compared to five benchmark models, mainly owing to the application of the supernetwork theory and the two-step mechanism.
Keywords: Machine learning classification; Rumor detection; Supernetwork theory; Two-step method
Year: 2021 PMID: 33821098 PMCID: PMC8014906 DOI: 10.1007/s11227-021-03748-x
Source DB: PubMed Journal: J Supercomput ISSN: 0920-8542 Impact factor: 2.474
Employed rumor features in 12 relevant studies
| Papers | Content | User | Diffusion | Topic | Multimedia | Network | Number of features |
|---|---|---|---|---|---|---|---|
| [ | 19 | ||||||
| [ | 7 | ||||||
| [ | 12 | ||||||
| [ | 9 | ||||||
| [ | 22 | ||||||
| [ | 23 | ||||||
| [ | 29 | ||||||
| [ | 41 | ||||||
| [ | 24 | ||||||
| [ | 38 | ||||||
| [ | 11 | ||||||
| [ | 14 |
Some of the above papers proposed categories that are not included in the table above. However, most of them can be merged into the listed categories. For example, Liang et al. [16] propose a behavior-based set of features, including verified users, average number of followers per day, and average number of posts per day, all of which relate to the user profile. We therefore classified these features into the user-based group.
Fig. 1 A general form of supernetwork. Notes: there are four different layers in Fig. 1; nodes of the same color belong to the same layer; black edges represent interactions between two nodes in different layers; blue, orange, green, and gray edges represent interactions between two nodes within the same layer.
Fig. 2 A simple form of online public opinion supernetwork model [7]. Notes: (1) Social subnetwork: each blue dot represents an individual; (2) Environment subnetwork: each purple star represents a message; (3) Psychological subnetwork: each yellow triangle represents a kind of psychology; (4) Viewpoint subnetwork: each green square represents a keyword.
Fig. 3 Rumor detection supernetwork with three layers. Notes: (1) Social subnetwork: each blue dot represents an individual; (2) Psychological subnetwork: each yellow triangle represents a kind of psychology; (3) each green square represents a keyword.
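As a rough illustration of the three-layer structure in Fig. 3, a post can be viewed as a superedge that links one node from the social layer, one node from the psychological layer, and one or more keyword nodes. The sketch below is an assumption about representation only: the class name `Superedge`, the field names, the helper `index_by_layer`, and the sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Superedge:
    """One post, linking nodes from the three layers (hypothetical layout)."""
    user: str              # node in the social subnetwork
    psychology: str        # node in the psychological subnetwork
    keywords: frozenset    # keyword nodes attached to the post


def index_by_layer(superedges):
    """Group superedges by the nodes they touch in each layer."""
    by_user = defaultdict(list)
    by_psychology = defaultdict(list)
    by_keyword = defaultdict(list)
    for edge in superedges:
        by_user[edge.user].append(edge)
        by_psychology[edge.psychology].append(edge)
        for word in edge.keywords:
            by_keyword[word].append(edge)
    return by_user, by_psychology, by_keyword


# Illustrative values only.
posts = [
    Superedge("user_a", "fear", frozenset({"vaccine", "unsafe"})),
    Superedge("user_b", "anger", frozenset({"vaccine"})),
    Superedge("user_a", "anger", frozenset({"scandal"})),
]
by_user, by_psychology, by_keyword = index_by_layer(posts)
```

Indexing the same superedges by each layer makes it cheap to ask layer-specific questions, such as which posts share a keyword or which users express a given kind of psychology.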
Fig. 4 Word cloud graph of 200 high frequency words of rumors
List of rumor-related features selected in this paper
| Group | Serial | Description |
|---|---|---|
| User-based features | F1 | Whether the user's identity is verified by Weibo or not |
| | F2 | Whether a user account is an authentic account or not |
| | F3 | Whether the user has personal descriptions or not |
| | F4 | Gender |
| | F5 | Type of user |
| | F6 | Age |
| | F7 | Ratio between following and follower |
| | F8 | Location |
| | F9 | Time between the registration time and posting time |
| | F10 | Time between this post and last post |
| | F11 | Number of previous posts |
| | F12 | Posted by PC or mobile |
| | F13 | Average time between each two contiguous posts |
| Psychology-based features | F14 | Psychology of posts (based on the psychological lexicon) |
| Content-based features | F15 | Whether it contains multimedia or not |
| | F16 | Number of the URLs |
| | F17 | Number of specific symbols (@, *, #, etc.) |
| | F18 | Original or forward |
| | F19 | Length of post |
| | F20 | Number of comments |
| | F21 | Number of forwards |
| | F22 | Number of “liked” |
| | F23 | Time between the first comment and the last comment |
| | F24 | Time between the first forward and the last forward |
| | F25 | Time between the first “liked” and the last “liked” |
| Supernetwork-based features | F26 | Social Subnetwork Clustering Coefficient |
| | F27 | Psychology Complexity |
| | F28 | Rumor Keywords Density |
| | F29 | Superedge Similarity |
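Several of the user-based and content-based features in the table are simple counts or ratios. The sketch below shows one plausible way to compute F7, F16, F17, and F19 from a raw post. The function names, the URL pattern, the symbol set, and the zero-follower convention are all assumptions, since the exact extraction rules are not given here.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")   # assumed URL format
SPECIAL_SYMBOLS = set("@*#")                 # assumed subset of the "etc." in F17


def following_follower_ratio(following, followers):
    """F7: following/follower ratio (assumed convention: 0.0 when there are no followers)."""
    return following / followers if followers else 0.0


def content_features(text):
    """Compute F16, F17, and F19 for one post."""
    return {
        "F16_num_urls": len(URL_PATTERN.findall(text)),
        "F17_num_symbols": sum(ch in SPECIAL_SYMBOLS for ch in text),
        "F19_length": len(text),
    }


# Illustrative post, not taken from the datasets.
post = "#vaccine @someone see http://example.com/a now"
features = content_features(post)
```

Here F19 is measured in characters; counting words or tokens instead would be an equally reasonable reading of "length of post".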
Fig. 5 The framework of the Two-Step Supernetwork Rumor Detection System
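The two-step mechanism described in the abstract can be sketched as a routing pattern: a first model assigns a user type from user-based features, and a rumor detector specific to that user type then judges the post from non-user-based features. Everything below is hypothetical, including the class `TwoStepDetector`, the user types `"verified"` and `"ordinary"`, the stand-in rules, and the thresholds. The paper trains Naive Bayes, Neural Network, Support Vector Machine, and Logistic Regression classifiers rather than using hand-written rules.

```python
class TwoStepDetector:
    """Route each post to a detector chosen by the predicted user type."""

    def __init__(self, user_classifier, detectors_by_type):
        self.user_classifier = user_classifier        # step 1: user features -> user type
        self.detectors_by_type = detectors_by_type    # step 2: one detector per user type

    def predict(self, user_features, post_features):
        user_type = self.user_classifier(user_features)
        detector = self.detectors_by_type[user_type]
        return user_type, detector(post_features)


# Stand-in rule for step 1, for illustration only.
def classify_user(user_features):
    return "verified" if user_features["F1_verified"] else "ordinary"


detector = TwoStepDetector(
    user_classifier=classify_user,
    detectors_by_type={
        # Stand-in rules for step 2; thresholds are arbitrary.
        "verified": lambda post: post["F28_keyword_density"] > 0.8,
        "ordinary": lambda post: post["F28_keyword_density"] > 0.3,
    },
)

result = detector.predict({"F1_verified": False}, {"F28_keyword_density": 0.5})
```

Keeping one detector per user type lets each detector learn a decision boundary suited to that group, which is the motivation for splitting the task into two steps.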
Rumor information of four topics
| Topic | Rumor information |
|---|---|
| Illegal vaccine scandal in Shandong, China | 1. Children's disabilities have a clear causal relationship with vaccination |
| | 2. More than 1000 children are left with lifelong disability each year |
| | 3. It is life-threatening to receive a non-refrigerated vaccine |
| | 4. Adverse reactions are all due to problems with the vaccine |
| | 5. The second type of vaccine is unsafe |
| | 6. Chinese domestic vaccine is low-tech |
| The establishment of Xiongan New Area | 1. A picture of a dead man who had jumped out of a window was widely circulated on the Internet. The reason for the suicide is that he sold his house in Xiong’an City before the establishment of “Xiong’an District” (house prices in Xiong’an City rose rapidly after the establishment of “Xiong’an District”) |
| | 2. 87 Central Government-owned enterprises will move to Xiong'an New District |
| | 3. The policy of car plate lottery will be implemented in Baoding City in June |
| | 4. The Capital Third Airport will be located in Baoding City, Xushui District |
| RYB kindergarten teachers abused children | 1. Children were force-fed white pills that “made them go to sleep” |
| | 2. Children were subjected to naked “health checks” at the daycare |
| | 3. Children were sexually molested |
| | 4. A Chinese military regiment was involved in molesting the children |
| COVID-19 | 1. A graduate student from Wuhan Virus Research Institute, Chinese Academy of Sciences is the Patient Zero of COVID-19 |
| | 2. Zhong Nanshan did not wear a mask in the hospital |
| | 3. Zhong Nanshan predicts the time for lifting restrictions |
| | 4. Normal travel would resume on March 16, 2020 |
| | 5. A modular hospital would be built at Beijing Jishuitan Hospital |
| | 6. A woman spat at the doorknob of a residential area in Wuhan |
| | 7. Mutation virus appeared in Wenzhou |
| | 8. A man infected by COVID-19 coughed hundreds of times in a market |
| | 9. Government workers in Tianmen, Hubei Province dumped the radishes donated by other provinces into the garbage station |
| | 10. Floating workers to Shanghai could not enter the community |
| | 11. People without a pass would be quarantined |
| | 12. Six points would be deducted for driving without a mask |
| | 13. Dozens of mask manufacturers have stopped production in Changyuan, Henan Province |
| | 14. 30,000 sheep donated by Mongolia were driven to Er Lian Hao Te, Inner Mongolia |
Description of the data
| Topic | Time period | Keywords | Rumor posts | Non-rumor posts | Total posts |
|---|---|---|---|---|---|
| Illegal vaccine scandal in Shandong, China | March 18, 2016 to March 24, 2016 | illegal vaccine; Shandong Province | 5441 | 17,571 | 23,012 |
| The establishment of Xiongan New Area | April 1, 2017 to April 7, 2017 | Xiongan; New Area | 12,219 | 37,878 | 50,097 |
| RYB kindergarten teachers abused children | November 12, 2017 to November 27, 2017 | Red Yellow Blue (RYB) kindergarten; Child; Abuse | 5736 | 7846 | 13,582 |
| COVID-19 | December 20, 2019 to March 1, 2020 | COVID-19; Coronavirus | 42,005 | 90,671 | 132,676 |
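The four datasets are imbalanced to differing degrees, which matters when accuracy is read alongside precision and recall. As a small worked example, the snippet below copies the post counts from the table, checks that each total equals the sum of its two classes, and prints the share of rumor posts per topic. The shortened topic labels and the helper name `rumor_share` are illustrative choices.

```python
# Post counts copied from the table above: (rumor, non-rumor, total).
DATASETS = {
    "Illegal vaccine scandal": (5441, 17571, 23012),
    "Xiongan New Area": (12219, 37878, 50097),
    "RYB kindergarten": (5736, 7846, 13582),
    "COVID-19": (42005, 90671, 132676),
}


def rumor_share(rumor, non_rumor):
    """Fraction of posts labeled as rumors."""
    return rumor / (rumor + non_rumor)


for topic, (rumor, non_rumor, total) in DATASETS.items():
    assert rumor + non_rumor == total  # consistency check on the tabulated totals
    print(f"{topic}: {rumor_share(rumor, non_rumor):.1%} rumor posts")
```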
Comparison results of the models
| Algorithm | Model | Acc. (%) | P. (%) | R. (%) | F1 (%) | F0.5 (%) | F2 (%) |
|---|---|---|---|---|---|---|---|
| NB | Model A | 80.19 | 60.22 | 79.45 | 67.59 | 62.79 | 76.55 |
| | Model B | 78.96 | 58.55 | 78.07 | 66.07 | 61.20 | 75.12 |
| | Model C | 85.21 | 68.11 | 84.98 | 75.19 | 70.69 | 82.73 |
| | [ | 77.48 | 56.59 | 76.62 | 64.28 | 59.30 | 73.57 |
| | [ | 78.31 | 56.93 | 77.60 | 65.40 | 59.99 | 74.73 |
| | [ | 82.94 | 64.36 | 83.00 | 71.91 | 67.07 | 80.38 |
| | [ | 85.14 | 70.55 | 79.42 | 74.56 | 72.06 | 78.37 |
| | [ | 79.45 | 59.02 | 75.85 | 66.04 | 61.58 | 73.59 |
| NN | Model A | 79.20 | 59.20 | 78.00 | 66.16 | 61.59 | 75.04 |
| | Model B | 79.06 | 58.72 | 77.77 | 66.06 | 61.30 | 74.91 |
| | Model C | 83.72 | 70.33 | 81.68 | 72.64 | 71.61 | 79.57 |
| | [ | 76.16 | 54.53 | 74.95 | 62.28 | 57.24 | 71.80 |
| | [ | 78.20 | 56.90 | 78.12 | 65.18 | 59.83 | 74.98 |
| | [ | 82.45 | 64.12 | 81.17 | 70.85 | 66.50 | 78.70 |
| | [ | 83.64 | 67.26 | 77.38 | 71.84 | 69.00 | 76.18 |
| | [ | 81.80 | 64.41 | 74.39 | 68.95 | 66.14 | 73.22 |
| SVM | Model A | 79.48 | 58.97 | 79.92 | 67.01 | 61.79 | 76.75 |
| | Model B | 79.85 | 59.92 | 79.52 | 67.47 | 62.57 | 76.56 |
| | Model C | 86.67 | 70.98 | 86.78 | 77.42 | 73.29 | 84.59 |
| | [ | 76.64 | 55.44 | 74.06 | 62.65 | 57.99 | 71.25 |
| | [ | 76.84 | 54.83 | 73.93 | 62.50 | 57.58 | 71.22 |
| | [ | 80.40 | 60.46 | 78.81 | 67.82 | 63.09 | 76.20 |
| | [ | 82.63 | 64.38 | 82.44 | 71.52 | 66.93 | 79.81 |
| | [ | 80.25 | 61.45 | 76.48 | 67.46 | 63.61 | 74.30 |
| LR | Model A | 78.78 | 58.22 | 78.50 | 65.81 | 60.84 | 75.34 |
| | Model B | 79.11 | 58.43 | 79.65 | 66.60 | 61.30 | 76.45 |
| | Model C | 85.30 | 68.47 | 84.91 | 75.32 | 70.97 | 82.69 |
| | [ | 78.46 | 58.11 | 77.35 | 65.60 | 60.77 | 74.46 |
| | [ | 78.93 | 57.87 | 78.55 | 66.16 | 60.84 | 75.59 |
| | [ | 81.24 | 61.49 | 81.33 | 69.51 | 64.37 | 78.53 |
| | [ | 81.45 | 63.59 | 80.47 | 69.76 | 65.69 | 77.77 |
| | [ | 79.15 | 58.99 | 75.31 | 65.55 | 61.35 | 72.97 |
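The F1, F0.5, and F2 columns are instances of the F-beta score, which weights recall beta times as heavily as precision. The sketch below uses the standard definition with hypothetical confusion-matrix counts. How the paper aggregates its metrics is not stated here, so applying the formula to the P and R values of a single row need not reproduce the tabulated F values (for instance, if each metric is averaged over the four cases separately).

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, not taken from the paper.
p, r = precision_recall(tp=80, fp=40, fn=20)
scores = {beta: f_beta(p, r, beta) for beta in (0.5, 1.0, 2.0)}
```

When recall exceeds precision, as it does throughout the table, F2 is the largest of the three scores and F0.5 the smallest.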
Fig. 6 Performance of the models based on average results of four learning algorithms