Sushila Shelke, Vahida Attar.
Abstract
The rise of social networks has made news spread more widely and faster than through conventional news channels. Verifying this information is challenging because of the sheer volume of content on a social network. Unverified information can be a rumor or fake news that damages individuals and organizations, with harmful effects on society. It is therefore vital to combat rumor diffusion and minimize these adverse effects. Despite vigorous efforts to deal with this issue, researchers have mainly focused on the temporal dynamics of posts together with user, network, and content-based features, which yield only moderate accuracy. Time-series features are associated with an event as a whole, which suppresses the other informative features attached to each individual post. Since there is scope to improve accuracy, this paper focuses on post-wise features (user-based, content-based, and lexical) along with post sequences. We propose a framework that uses these features and combines two deep learning models: word embeddings are fed to a bidirectional long short-term memory (BiLSTM) network and combined with the post-wise features using a multilayer perceptron (MLP), which improves accuracy. Experiments on a real-world Twitter dataset demonstrate a notable improvement in accuracy compared to state-of-the-art approaches.
Keywords: Deep learning; Lexical features; Rumor; Rumor detection; Social network
Year: 2022 PMID: 35282405 PMCID: PMC8898597 DOI: 10.1007/s11042-022-12761-y
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Deep learning-based approaches with various features
| Ref. no. | Approach | Models | User features | Text features | Propagation features | Temporal features |
|---|---|---|---|---|---|---|
| [ | RNN | RNN, LSTM + GRU, Multilayer GRU | ||||
| [ | RNN | RNN + Multilayer LSTM | ||||
| [ | RNN | Bi-LSTM | ||||
| [ | RNN | RNN + Autoencoder | ||||
| [ | CNN | CNN + Max Pooling | ||||
| [ | CNN | Bi-LSTM | ||||
| [ | CNN | CNN + Bi-GRU | ||||
| [ | Hybrid | RNN + NN | ||||
| [ | Hybrid | RNN + CNN | ||||
| [ | Hybrid | GRU + CNN | ||||
| [ | Hybrid | GAN + CNN, GAN + GRU | ||||
Comparative study of performance metrics
| Ref. no. | Approach | Text representation | Model | Class | Accuracy | F1 |
|---|---|---|---|---|---|---|
| [ | RNN | tf-idf | GRU2 | R | 0.881 | 0.898 |
| | | | | NR | | 0.86 |
| [ | RNN | tf-idf | CallAtRumors-LSTM | – | – | 0.87 |
| [ | RNN | tf-idf | HSA-BLSTM | R | 0.844 | 0.825 |
| | | | | NR | | 0.863 |
| [ | CNN | Paragraph2vector | CAMI-CNN | R | 0.777 | 0.793 |
| | | | | NR | | 0.758 |
| [ | CNN | word2vector | RCNN-FAN | R | 0.799 | 0.792 |
| | | | | NR | | 0.805 |
| [ | Hybrid | doc2vec | CSI-LSTM+NN | – | 0.892 | 0.894 |
| [ | Hybrid | tf-idf | CED-CNN + RNN | – | 0.744 | 0.747 |
| [ | Hybrid | tf-idf | GAN-GRU | R | 0.863 | 0.866 |
| | | | | NR | | 0.858 |
Fig. 1 Identification of event as rumor from snopes.com
Fig. 2 Identification of event as non-rumor from Politifact.com
Fig. 3 Example of search queries for data collection of events
Sample list for rumor events
| Sr. no. | Date | Event story | Post_count |
|---|---|---|---|
| 1 | 09-03-2020 | A homemade hand sanitizer made with Tito’s Vodka can be used to fight the new coronavirus. | 2186 |
| 2 | 11-03-2020 | Will an Asteroid Hit Earth in April 2020? | 37 |
| 3 | 21-03-2020 | Says drinking a bleach solution will prevent you from getting the coronavirus | 123 |
| 4 | 21-03-2020 | Don’t hold your breath. This isn’t a credible way to test for coronavirus | 996 |
| 5 | 23-03-2020 | Will Eating Bananas Prevent Coronavirus Infection? | 60 |
| 6 | 23-03-2020 | Did Nostradamus Predict the COVID-19 Pandemic? | 125 |
| 7 | 26-03-2020 | Can You Get a Free Coronavirus Test by Donating Blood? | 143 |
| 8 | 26-03-2020 | Will Gargling with Salt Water or Vinegar ‘Eliminate’ the COVID-19 Coronavirus? | 144 |
| 9 | 26-03-2020 | Will Sipping Water Every 15 Minutes Prevent a Coronavirus Infection? | 17 |
| 10 | 28-03-2020 | Beware of rumors of robbers posing as COVID testers | 110 |
| 11 | 29-03-2020 | Does ‘Every Election Year’ Have a Coinciding Disease? | 4971 |
Sample list for non-rumor events
| Sr. no. | Date | Event story | Post_count |
|---|---|---|---|
| 1 | 24-03-2020 | Was ‘Coronavirus’ Replaced with ‘Chinese Virus’ in Trump’s Notes? | 10,304 |
| 2 | 27-03-2020 | Bill Gates told us about the coronavirus in 2015 | 5000 |
| 3 | 27-03-2020 | Was COVID-19 Discovered in the US and South Korea on the Same Day? | 52 |
| 4 | 27-03-2020 | Are Most Cruise Ships Registered Under Foreign Flags | 1474 |
| 5 | 27-03-2020 | Did Video Show Italian Army Trucks Transporting Coffins Amid COVID-19 Pandemic? | 92 |
| 6 | 27-03-2020 | Is This a Photo of an American Revolutionary War Vet? | 865 |
| 7 | 28-03-2020 | Spectrum will provide free internet to students during coronavirus school closures | 2850 |
| 8 | 28-03-2020 | Does ‘Triscuit’ Mean ‘Electric Biscuit’? | 68 |
| 9 | 01-04-2020 | Did Empire State Building Display ‘Siren’ Lights During COVID-19 Pandemic? | 9994 |
| 10 | 01-04-2020 | Did Cities Close Schools, Businesses During the 1918 Pandemic? | 36 |
| 11 | 01-04-2020 | Did the Trump Administration Send 18 Tons of PPE to China in Early 2020? | 1092 |
Details of real-world and benchmarked dataset
| Name | Real-World (English Only) | Benchmarked (Existed) | Benchmarked (Extracted) | Benchmarked (English Only) |
|---|---|---|---|---|
| Total Events | 70 | 992 | 990 | 986 |
| Total Rumor Events | 51 | 498 | 498 | 498 |
| Total Non-Rumor Events | 18 | 494 | 492 | 489 |
| Total Posts | 85,560 | 340,176 | 274,530 | 267,708 |
| Total Rumored Posts | 47,209 | 132,470 | 105,256 | 104,920 |
| Total Non-Rumored Posts | 38,351 | 207,706 | 169,274 | 162,788 |
| Minimum Posts Per Event | 2 | 10 | 2 | 2 |
| Maximum Posts Per Event | 10,304 | 3029 | 2838 | 2702 |
Fig. 4 Pseudocode for text preprocessing
Identified features from user, content and lexical category
| Category | Features |
|---|---|
| User | User_Registration_Age, Is_Verified_user?, User_Description_Length, Follower_count, Friends_count, Favourite_count, Status_count, User_Location_present |
| Content based | #Hashtags, #URLs, #Question_Marks, #Exclamatory, #Mentions, Retweet_count, Word_count, Sentiment_score, Is_Media_present, Tweet_favorite_count, Tweet_geo_location_present, Tweet_reply_count |
| Lexical | alcohol, ancient, anger, animal, anonymity, anticipation, appearance, art, attractive, banking, beach, beauty, blue_collar_job, body, breaking, business, car, celebration, cheerfulness, childish, children, cleaning, clothing, cold, college, communication, competing, computer, confusion, contentment, cooking, crime, dance, death, deception, disappointment, disgust, dispute, divine, domestic_work, dominant_heirarchical, dominant_personality, driving, eating, economics, emotional, envy, exasperation, exercise, exotic, fabric, family, farming, fashion, fear, feminine, fight, fire, friends, fun, furniture, gain, giving, government, hate, healing, health, hearing, help, heroic, hiking, hipster, home, horror, hygiene, independence, injury, internet, irritability, journalism, joy, kill, law, leader, legend, leisure, liquid, listen, love, lust, magic, masculine, medical_emergency, medieval, meeting, messaging, military, money, monster, morning, movement, music, musical, negative_emotion, neglect, negotiate, nervousness, night, noise, occupation, ocean, office, optimism, order, pain, party, payment, pet, philosophy, phone, plant, play, politeness, politics, poor, positive_emotion, power, pride, prison, programming, rage, reading, real_estate, religion, restaurant, ridicule, royalty, rural, sadness, sailing, school, science, sexual, shame, shape_and_size, ship, shopping, sleep, smell, social_media, sound, speaking, sports, stealing, strength, suffering, superhero, surprise, swearing_terms, swimming, sympathy, technology, terrorism, timidity, tool, torment, tourism, toy, traveling, trust, ugliness, urban, vacation, valuable, vehicle, violence, war, warmth, water, weakness, wealthy, weapon, weather, wedding, white_collar_job, work, worship, writing, youth, zest |
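Several of the content-based features in the table above are simple counts over the post text. The following is a minimal sketch of how such counts could be computed. The function name `content_features`, the regular expressions, and the choice to strip URLs before counting are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed patterns for illustration; the source does not specify how tokens are matched.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def content_features(text: str) -> dict:
    """Return a few count-based content features for a single post."""
    urls = URL_RE.findall(text)
    # Strip URLs first so '?' or '#' characters inside links are not counted.
    stripped = URL_RE.sub(" ", text)
    return {
        "hashtags": len(HASHTAG_RE.findall(stripped)),
        "urls": len(urls),
        "question_marks": stripped.count("?"),
        "exclamation_marks": stripped.count("!"),
        "mentions": len(MENTION_RE.findall(stripped)),
        # Whitespace tokens; hashtags and mentions count as words here (an assumption).
        "word_count": len(stripped.split()),
    }


example = content_features(
    "Is this true?! @who says #COVID19 spreads via 5G https://example.com/a?b=1"
)
print(example)
```

Features such as Retweet_count or Sentiment_score are not covered here, since they depend on tweet metadata or on a sentiment tool that the table does not name.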
Fig. 5 Cumulative explained variance graph for PCA components
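A cumulative explained variance curve is typically read to choose how many principal components to keep. The sketch below shows that selection step under stated assumptions: the eigenvalues are hypothetical placeholders, and the threshold is an illustrative parameter, since the source does not state which cut-off was used.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold):
    """Return (k, cumulative), where k is the smallest number of leading
    components whose cumulative explained variance ratio reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = [running / total for running in accumulate(ordered)]
    for k, ratio in enumerate(cumulative, start=1):
        if ratio >= threshold:
            return k, cumulative
    return len(ordered), cumulative


# Hypothetical eigenvalues, for illustration only (not taken from the paper).
k, curve = components_for_variance([4.0, 3.0, 2.0, 1.0], threshold=0.85)
print(k, curve)
```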
Fig. 6 Distribution of user registration age over rumor and non-rumor posts
Fig. 7 Correlation matrix for rumor and non-rumor posts
Fig. 8 Block diagram for input and output of BiLSTM_UCL model
Fig. 9 Proposed architecture of BiLSTM_UCL model
Fig. 10 BiLSTM_UCL model summary
Details of benchmarked and extended real-world dataset
| Name | Benchmarked | Real-world-extended |
|---|---|---|
| Total Events | 986 | 1056 |
| Total Rumor Events | 498 | 549 |
| Total Non-Rumor Events | 489 | 507 |
| Total Posts | 267,708 | 353,268 |
| Total Rumored Posts | 104,920 | 152,129 |
| Total Non-Rumored Posts | 162,788 | 201,139 |
| Minimum Posts Per Event | 2 | 2 |
| Maximum Posts Per Event | 2702 | 10,304 |
Fig. 11 Confusion matrix for rumor detection
Details of optimal hyperparameters
| Parameter name | Value of parameter |
|---|---|
| Vocabulary size | 1000 |
| Sequence length | 100 |
| Dropout | 0.5 |
| Adagrad Learning rate | 0.001 |
| Epoch | 100 |
| Number of Dense layers | 5 |
| Batch Size | 32 |
| Loss function | binary_crossentropy |
| Activation Function | ReLU |
| Activation Function in Output Layer | Sigmoid |
| Optimizer | Adagrad |
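The vocabulary size and sequence length in the table determine how each post is turned into a fixed-length integer sequence before the embedding layer. The sketch below illustrates that encoding step in plain Python. The reserved indices for padding and out-of-vocabulary words, the whitespace tokenizer, and padding at the end of the sequence are assumptions, as the source does not specify them.

```python
from collections import Counter

VOCAB_SIZE = 1000  # "Vocabulary size" from the table
SEQ_LEN = 100      # "Sequence length" from the table
PAD, OOV = 0, 1    # assumed reserved indices, not specified in the source


def build_vocab(posts, vocab_size=VOCAB_SIZE):
    """Map the most frequent words to integer ids, reserving 0 (PAD) and 1 (OOV)."""
    counts = Counter(word for post in posts for word in post.lower().split())
    most_common = counts.most_common(vocab_size - 2)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(post, vocab, seq_len=SEQ_LEN):
    """Convert a post to exactly seq_len ids: truncate long posts, pad short ones."""
    ids = [vocab.get(word, OOV) for word in post.lower().split()]
    ids = ids[:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


# Toy posts for illustration only.
posts = ["drinking bleach prevents coronavirus", "bleach is dangerous"]
vocab = build_vocab(posts)
sequence = encode("bleach cures everything", vocab)
print(sequence[:5])
```

Each encoded sequence has length 100, so a batch of 32 posts forms a 32 x 100 integer array.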
Fig. 12 The learning curve of accuracy and loss for the BiLSTM_UCL model
Performance of proposed models on benchmarked and real-world dataset
| Sr. no. | Models | Class | Benchmarked: Precision | Benchmarked: Recall | Benchmarked: F1 | Real-world extended: Precision | Real-world extended: Recall | Real-world extended: F1 |
|---|---|---|---|---|---|---|---|---|
| 1. | Lex_PCA | NR | 0.88 | 0.94 | 0.91 | 0.91 | 0.94 | 0.92 |
| | | R | 0.90 | 0.80 | 0.85 | 0.91 | 0.87 | 0.89 |
| 2. | UCL_PCA | NR | 0.92 | 0.97 | 0.95 | 0.92 | 0.97 | 0.95 |
| | | R | 0.95 | 0.87 | 0.91 | 0.96 | 0.89 | 0.93 |
| 3. | BiLSTM_Embed | NR | 0.95 | 0.97 | 0.96 | 0.96 | 0.96 | 0.96 |
| | | R | 0.95 | 0.93 | 0.94 | 0.95 | 0.94 | 0.95 |
| 4. | BiLSTM_UCL | NR | 0.97 | 0.97 | 0.98 | 0.97 | 0.98 | 0.98 |
| | | R | 0.96 | 0.98 | 0.97 | 0.96 | 0.97 | 0.97 |
Experimental results
| Sr. no. | Models | Class | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | GRU-II [ | R | 0.881 | 0.851 | 0.95 | 0.898 |
| | | NR | | 0.93 | 0.8 | 0.86 |
| 2 | HSA-BLSTM [ | R | 0.844 | 0.87 | 0.67 | 0.757 |
| | | NR | | 0.73 | 0.899 | 0.805 |
| 3 | CAMI [ | R | 0.777 | 0.744 | 0.848 | 0.793 |
| | | NR | | 0.82 | 0.705 | 0.758 |
| 4 | CSI [ | – | 0.892 | – | – | 0.83 |
| 5 | Lex_PCA | R | 0.89 | 0.90 | 0.80 | 0.85 |
| | | NR | | 0.88 | 0.94 | 0.91 |
| 6 | UCL_PCA | R | 0.93 | 0.95 | 0.87 | 0.91 |
| | | NR | | 0.92 | 0.97 | 0.95 |
| 7 | BiLSTM_Embed | R | 0.95 | 0.95 | 0.93 | 0.94 |
| | | NR | | 0.95 | 0.97 | 0.96 |
| 8 | BiLSTM_UCL | R | | 0.96 | 0.98 | 0.97 |
| | | NR | | 0.97 | 0.97 | 0.98 |
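The per-class precision, recall, and F1 values reported in the results table follow from the entries of a binary confusion matrix. The sketch below shows the standard formulas. The function name `binary_metrics` and the counts used are hypothetical and chosen only to illustrate the arithmetic; they are not the paper's confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts with "rumor" (R) as the positive class.
rumor = binary_metrics(tp=90, fp=10, fn=30, tn=70)
# Swapping the roles gives the metrics for the "non-rumor" (NR) class.
non_rumor = binary_metrics(tp=70, fp=30, fn=10, tn=90)
print(rumor, non_rumor)
```

Accuracy is the same for both classes because it depends only on the total number of correct predictions, which is why the table lists it once per model.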
Fig. 13 Comparison of the proposed model with existing models
Fig. 14 Improvement in accuracy throughout the proposed models