Abstract
Online social media (OSM) has emerged as a prominent platform for debate on a wide range of issues. Celebrities and public figures also frequently share their opinions on a variety of topics through OSM platforms. One such subject that has received wide coverage on Twitter is the novel coronavirus disease, officially named COVID-19, which has become a pandemic and triggered a historic global crisis. In this study, we analyse 29 million tweets spanning three months to study highly influential users, whom we refer to as leaders. We identify these leaders with social network analysis techniques and examine their tweets with text analysis. Using a community detection algorithm, we group these leaders into four clusters (research, news, health, and politics), each containing the Twitter handles (accounts) of individual users or organizations; for example, the health cluster includes the World Health Organization (@WHO) and the Director-General of WHO (@DrTedros), among others. The emotion analysis reveals that (i) all clusters express an equal amount of fear in their tweets, (ii) the research and news clusters display more sadness than the other clusters, and (iii) the health and politics clusters are attempting to win public trust. According to the text analysis, (i) the research cluster is more concerned with recognizing symptoms and with vaccine development, and (ii) the news and politics clusters are mostly concerned with travel. We then show that our findings can be used to classify tweets into clusters with an AUC ROC of 96%.
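The keywords list PageRank among the social network techniques. The minimal sketch below ranks users by PageRank on a retweet graph, assuming that an edge (u, v) means user u retweeted user v, so rank flows toward frequently retweeted accounts, and that leaders are simply the top-k users. The damping factor 0.85, the function names `pagerank` and `top_leaders`, and the toy edges are illustrative assumptions, not the study's settings.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs.

    Assumption: an edge (u, v) means "u retweeted v", so rank flows to authors.
    """
    out_links = defaultdict(set)
    nodes = set()
    for u, v in edges:
        out_links[u].add(v)
        nodes.update((u, v))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def top_leaders(edges, k):
    """Hypothetical helper: the k users with the highest PageRank (ties by name)."""
    rank = pagerank(edges)
    return sorted(rank, key=lambda v: (-rank[v], v))[:k]
```

For instance, with `edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("c", "d")]`, `top_leaders(edges, 2)` returns `["hub", "d"]`: the account retweeted by everyone ranks first.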
Keywords: COVID-19; Community detection; LDA; OSM; PageRank; Sentiment analysis; Social Network
Year: 2021 PMID: 34025817 PMCID: PMC8124097 DOI: 10.1007/s13278-021-00756-w
Source DB: PubMed Journal: Soc Netw Anal Min
Fig. 1 Tweets collection, preprocessing, emotions and public concern identification flow
Fig. 2 Tweets preprocessing
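Fig. 2 only names the preprocessing stage. A minimal sketch of typical tweet cleaning is shown below, assuming the steps are lowercasing, dropping a leading `RT` marker, URLs and @mentions, keeping hashtag words without the `#`, and removing stopwords. The exact steps, the tiny `STOPWORDS` set and the `preprocess` function are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for", "your"}

RT_RE = re.compile(r"^rt\s+")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Return the list of cleaned tokens for one tweet."""
    text = text.lower()
    text = RT_RE.sub("", text)        # drop a leading retweet marker
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @handles
    text = text.replace("#", " ")     # keep hashtag words, drop the '#'
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]
```

For example, `preprocess("RT @WHO: Wash your hands often! https://t.co/abc #COVID19")` yields `["wash", "hands", "often", "covid19"]`.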
Description of the dataset
| Parameter | Value |
|---|---|
| Time period | 1 February 2020 to 2 May 2020 |
| #Tweets | 29,469,349 |
| #Original tweets | 6,494,657 |
| #Retweets | 22,974,692 |
| #Users | 7,875,334 |
| #Features | 91 |
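The counts in the table are consistent: #Tweets = #Original tweets + #Retweets (6,494,657 + 22,974,692 = 29,469,349). A sketch of how such counts could be derived is shown below, assuming the tweets are stored as Twitter API v1.1 JSON objects, where a retweet carries a `retweeted_status` field and the author id sits under `user.id_str`, and assuming #Users counts distinct posting accounts. `summarize` is a hypothetical helper.

```python
def summarize(tweets):
    """Count tweets, original tweets, retweets and distinct posting users.

    Assumes Twitter API v1.1-style dicts: retweets contain 'retweeted_status'.
    """
    retweets = sum(1 for t in tweets if "retweeted_status" in t)
    users = {t["user"]["id_str"] for t in tweets}
    return {
        "tweets": len(tweets),
        "original_tweets": len(tweets) - retweets,
        "retweets": retweets,
        "users": len(users),
    }
```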
Fig. 3 Optimal number of clusters using the Girvan-Newman method. The x-axis represents the number of clusters identified by the Girvan-Newman community detection algorithm, and the y-axis represents the modularity value for each number of clusters. The optimal number of clusters for our dataset is 4, which yields the highest modularity value
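Fig. 3 chooses the number of clusters by maximizing modularity over the Girvan-Newman hierarchy. The dependency-free sketch below illustrates that procedure on an unweighted, undirected graph: it repeatedly removes the edge with the highest betweenness (computed with Brandes' algorithm), evaluates Newman's modularity on the original graph each time a new community splits off, and keeps the best partition. How the leader graph itself is built is not described here, so the `ring_of_cliques` toy graph and all function names are hypothetical.

```python
from collections import Counter, defaultdict, deque


def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted, undirected graph."""
    eb = defaultdict(float)
    for s in adj:
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return eb


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def modularity(edges, communities):
    """Newman's modularity Q of a partition, measured on the original edge list."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    internal, degree = Counter(), Counter()
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(communities)))


def girvan_newman_best(edges):
    """Remove highest-betweenness edges; return (Q, partition) with the largest Q."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best_q, best = float("-inf"), None
    n_comps = len(components(adj))
    while any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comps:  # a new level of the hierarchy
            n_comps = len(comps)
            q = modularity(edges, comps)
            if q > best_q:
                best_q, best = q, comps
    return best_q, best


def ring_of_cliques(k, size):
    """Toy graph: k complete subgraphs of `size` nodes joined in a ring by single edges."""
    edges = []
    for c in range(k):
        nodes = range(c * size, (c + 1) * size)
        edges += [(a, b) for a in nodes for b in nodes if a < b]
        edges.append((c * size + 1, ((c + 1) * size) % (k * size)))
    return edges
```

On four 5-node cliques joined in a ring, `girvan_newman_best(ring_of_cliques(4, 5))` removes the bridges first and peaks at exactly four communities (Q ≈ 0.659). For larger graphs, the `girvan_newman` and `modularity` functions in `networkx.algorithms.community` implement the same steps.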
Fig. 4 Dominant emotions in tweets from various leaders
Fig. 5 Dominant emotions in tweets from various leaders over time
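Figs. 4 and 5 report the dominant emotion in each leader's tweets. The scoring method is not stated here; the sketch below assumes a word-emotion lexicon in the style of the NRC Emotion Lexicon, counts matches per emotion across a leader's tweets, and takes the most frequent emotion (ties broken alphabetically). The four-entry `LEXICON` is a made-up stand-in, not a real lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a word to the emotions it evokes.
LEXICON = {
    "pandemic": {"fear", "sadness"},
    "death": {"fear", "sadness"},
    "vaccine": {"trust", "anticipation"},
    "doctor": {"trust"},
}


def emotion_counts(tokens):
    """Count how many tokens evoke each emotion."""
    counts = Counter()
    for tok in tokens:
        counts.update(LEXICON.get(tok, ()))
    return counts


def dominant_emotion(tweets):
    """Most frequent emotion over one leader's tokenized tweets (None if no match)."""
    total = Counter()
    for tokens in tweets:
        total += emotion_counts(tokens)
    if not total:
        return None
    return max(sorted(total), key=total.get)
```

Grouping the tweets by week before calling `dominant_emotion` would give a per-period view like the one in Fig. 5.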
Topic Modeling using LDA with Gibbs sampling
| S.No. | Words | Topic (Proposed) |
|---|---|---|
| 1 | flu, symptomatic, fever, infected, spreading, china, cold, sick, severe, asymptomatic | Symptoms |
| 2 | china, effective, flu, cdc, world, research, testing, vaccine, pandemic, treatment | Vaccination |
| 3 | hand, wash, sanitizer, time, home, touch, masks, water, soap | Countermeasures |
| 4 | italy, international, quarantine, pandemic, traveling, restrictions, china, government, iran, world | Travel |
| 5 | china, stop, cdc, who, pandemic, public, death, rate, flu, news | Pandemic |
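The table lists topics obtained with LDA and Gibbs sampling. For illustration, a compact collapsed Gibbs sampler is sketched below; the hyperparameters (`alpha=0.1`, `beta=0.01`), the iteration count, the seed and the toy documents are assumptions rather than the study's settings, and the labels in the last column would still be chosen by hand from each topic's top words.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of tokens.

    Returns (doc_topic, topic_word): per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=10):
    """The n most frequent words of each topic, ignoring zero counts."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]
```

Normalizing the rows of `doc_topic` gives each document's mixture over topics, which is one way a per-leader alignment toward concerns could be summarized.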
Fig. 6 Word cloud
Fig. 7 Leaders' alignment toward various public concerns
Fig. 8 Flow diagram for feature concatenation and model selection
Fig. 9 Random neural network framework
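The rows of the results table correspond to feature sets built by concatenation (Text, Text+Concerns, Text+Emotions, Text+Emotions+Concerns). A minimal sketch of that step follows, assuming bag-of-words counts for the text, one count per emotion over eight NRC-style categories, and topic proportions for the concerns; all names are hypothetical, and the study's actual text representation (for example, BERT inputs) may differ.

```python
# Assumed emotion categories (NRC-style); the study's exact list is not given here.
EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust"]


def bow_vector(tokens, vocab):
    """Bag-of-words counts over a fixed vocabulary; unknown tokens are ignored."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec


def build_features(tokens, vocab, emotions=None, concerns=None):
    """Concatenate [text | emotions | concerns]; each optional block mirrors a table row."""
    feats = [float(x) for x in bow_vector(tokens, vocab)]
    if emotions is not None:
        feats += [float(emotions.get(e, 0)) for e in EMOTIONS]
    if concerns is not None:
        feats += [float(p) for p in concerns]
    return feats
```

Keeping each block optional lets the same function produce all four feature sets, so model selection can loop over them with identical downstream code.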
Mean AUC ROC (%) of each model for various dataset features, with the standard deviation in parentheses in tenths of a percentage point. For example, the BERT model with the dataset (Text+Emotions) achieves a 93.6% mean AUC ROC with a 0.8% standard deviation
| Data | SVC | RF | NN | BERT |
|---|---|---|---|---|
| Text | 91.9 (2) | 95.3 (2) | 91.8 (9) | 92.5 (8) |
| Text+Concerns | 92.4 (3) | 95.3 (2) | 92.7 (9) | 92.8 (9) |
| Text+Emotions | 93.2 (3) | 93.5 (9) | 93.6 (8) | |
| Text+Emotions+Concerns | | | | |
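Each cell reports a mean and a standard deviation of AUC ROC, presumably over repeated splits or cross-validation folds (the number of folds is not stated here). The dependency-free sketch below computes binary AUC as the probability that a random positive example outscores a random negative one (ties count one half), averages it one-vs-rest over the four clusters, and summarizes per-fold values. The one-vs-rest macro averaging and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def binary_auc(labels, scores):
    """AUC ROC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(labels, prob_rows, classes):
    """One-vs-rest AUC per class, averaged; prob_rows[i][j] is P(classes[j]) for sample i."""
    return mean(
        binary_auc([1 if y == c else 0 for y in labels], [row[j] for row in prob_rows])
        for j, c in enumerate(classes)
    )


def summarize_folds(fold_aucs):
    """Mean and sample standard deviation of per-fold AUCs, in percent."""
    pct = [100 * a for a in fold_aucs]
    return round(mean(pct), 1), round(stdev(pct), 1)
```

For example, `binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75, since three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic, which is fine for a sketch; scikit-learn's `roc_auc_score` with `multi_class="ovr"` covers the same computation at scale.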