Ece C Mutlu, Toktam Oghaz, Jasser Jasser, Ege Tutunculer, Amirarsalan Rajabi, Aida Tayebi, Ozlem Ozmen, Ivan Garibay.
Abstract
At the time of this study, the SARS-CoV-2 virus that caused the COVID-19 pandemic has spread significantly across the world. Given the uncertainty about policies, health risks, financial difficulties, etc., online media, especially the Twitter platform, is experiencing a high volume of activity related to this pandemic. Among the hot topics, the polarized debates about unconfirmed medicines for the treatment and prevention of the disease have attracted significant attention from online media users. In this work, we present a stance data set, COVID-CQ, of user-generated content on Twitter in the context of COVID-19. We investigated more than 14 thousand tweets and manually annotated the tweet initiators' opinions regarding the use of "chloroquine" and "hydroxychloroquine" for the treatment or prevention of COVID-19. To the best of our knowledge, COVID-CQ is the first data set of Twitter users' stances in the context of the COVID-19 pandemic, and the largest Twitter data set on users' stances towards a claim in any domain. We have made this data set available to the research community via the Mendeley Data repository. We expect this data set to be useful for many research purposes, including stance detection, the evolution and dynamics of opinions regarding this outbreak, and changes in opinions in response to exogenous shocks such as policy decisions and events.
Keywords: COVID-19; Coronavirus; Hydroxychloroquine; Opinion mining; Polarity; Social media; Stance classification; Twitter
Year: 2020; PMID: 33088880; PMCID: PMC7560381; DOI: 10.1016/j.dib.2020.106401
Source DB: PubMed; Journal: Data Brief; ISSN: 2352-3409
The frequency of the annotated tweets belonging to each stance class.
| Stance | Number of Tweets |
|---|---|
| Neutral | 2848 |
| Against | 4685 |
| Favor | 6841 |
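
The class counts above imply an imbalanced label distribution, which matters when reading any classifier score on this data. As a minimal sketch that uses only the counts from the table, the following computes each class's share and the score a trivial "always predict the largest class" baseline would reach. The baseline is an illustration added here, not something the source reports.

```python
# Stance counts copied from the table above.
counts = {"Neutral": 2848, "Against": 4685, "Favor": 6841}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A trivial baseline that always predicts the most frequent class
# (illustrative only; not part of the original study).
majority_label = max(counts, key=counts.get)

print(f"Total annotated tweets: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Majority-class baseline ({majority_label}): {shares[majority_label]:.4f}")
```
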
Fig. 1. The most frequently used keywords in each stance category: (a) favor, (b) neutral, (c) against.
The most common five words and their frequencies in each stance category.
| Favor word | Favor count | Neutral word | Neutral count | Against word | Against count |
|---|---|---|---|---|---|
| drug | 1047 | drug | 467 | drug | 1318 |
| treatment | 1099 | treatment | 326 | treatment | 689 |
| patient | 1510 | patient | 354 | patient | 906 |
| doctor | 1243 | Trump | 714 | Trump | 2863 |
| treat | 1180 | India | 673 | study | 659 |
Fig. 2. The daily tweet counts in April 2020, classified into three categories: 'neutral', 'against', and 'favor'. The black line shows the ratio of the number of 'favor'-labeled tweets to the number of 'against'-labeled tweets, using a 3-day moving average.
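
The smoothed favor-to-against ratio described in the Fig. 2 caption can be sketched as follows. The caption does not say whether the window is trailing or centered, nor whether the ratio is taken before or after averaging, so this sketch assumes a trailing window applied to the daily ratio. The daily counts are hypothetical placeholders, not values from the data set.

```python
def moving_average(values, window=3):
    """Trailing moving average; days without a full window yield None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed


# Hypothetical daily counts (NOT taken from COVID-CQ).
favor = [200, 240, 180, 300]
against = [100, 120, 180, 150]

# Daily ratio of 'favor' tweets to 'against' tweets.
daily_ratio = [f / a for f, a in zip(favor, against)]

# Assumption: the 3-day window is trailing and applied to the daily ratio.
smoothed_ratio = moving_average(daily_ratio, window=3)
print(smoothed_ratio)
```

A ratio above 1 on a given day means 'favor' tweets outnumber 'against' tweets within the window.
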
Fig. 3. A visualization of the distribution of our data set for stance annotation.
Comparison of the results for six classifiers using tf-idf vectorization.
| Classifier | Unigram | Bigram |
|---|---|---|
| Stochastic Gradient Descent (SGD) | 0.7429 | 0.7439 |
| Support Vector Machine (SVM) | 0.7651 | 0.7651 |
| Multilayer Perceptron (MLP) | 0.7453 | 0.7457 |
| Logistic Regression (LR) | ||
| Multinomial Naive Bayes (MNB) | 0.7182 | 0.7182 |
| Gradient Boosting Classifier (GB) | 0.6764 | 0.6768 |
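
To make the unigram and bigram columns concrete, here is a minimal sketch of a tf-idf stance classifier built with scikit-learn. It is not the authors' pipeline: the toy texts and labels are hypothetical, the choice of `LinearSVC` is an assumption, and "bigram" is assumed to mean unigrams plus bigrams (`ngram_range=(1, 2)`), which the source does not specify. The `score` call returns mean accuracy on the training data, which may not be the metric reported in the table.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy examples; NOT drawn from COVID-CQ.
texts = [
    "hydroxychloroquine helped the patient recover",
    "doctor says the treatment works",
    "study finds no benefit from the drug",
    "the drug is dangerous and unproven",
    "india exports hydroxychloroquine",
    "officials discuss the drug supply",
]
labels = ["favor", "favor", "against", "against", "neutral", "neutral"]

# Assumption: "bigram" means unigrams and bigrams together.
ngram_settings = {"unigram": (1, 1), "unigram+bigram": (1, 2)}

scores = {}
for name, ngram_range in ngram_settings.items():
    # tf-idf features feed a linear SVM; both steps are fit together.
    model = make_pipeline(TfidfVectorizer(ngram_range=ngram_range), LinearSVC())
    model.fit(texts, labels)
    # Training-set accuracy only; a real evaluation needs held-out data.
    scores[name] = model.score(texts, labels)

print(scores)
```
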
| Subject | Social Science, Health Informatics, Computer Science |
| Specific subject area | Twitter stance detection, Sentiment analysis, Polarization among audiences |
| Type of data | Text (CSV-formatted) |
| How data were acquired | Twitter API |
| Data format | Raw, Filtered |
| Parameters for data collection | Keyword query: Coronavirus; Corona; COVID-19; Covid19; Sars-cov-2; COVD; Pandemic (case insensitive) |
| Description of data collection | We collected Twitter data for April 2020 with the specified keyword queries. We considered only root tweets (excluding retweets). |
| Data source location | |
| Data accessibility | We adhere to Twitter’s terms and conditions by not providing the tweet JSON, but sharing the stance labels with the tweet IDs, so that the tweets can be rehydrated from the Twitter API. |
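
Because only tweet IDs and stance labels are shared, a user has to look the tweets up again before working with their text. The sketch below covers the preparation step only: reading ID and label pairs from a CSV and grouping the IDs into fixed-size batches for a lookup request. The column names `tweet_id` and `stance`, the synthetic IDs, and the batch size of 100 are assumptions for illustration. No API call is made, and the per-request limit should be checked against the current API documentation.

```python
import csv
import io


def read_labeled_ids(handle, id_column="tweet_id", label_column="stance"):
    """Return (tweet_id, stance) pairs, keeping IDs as strings.

    Tweet IDs are 64-bit integers; keeping them as strings avoids the
    precision loss that occurs if a tool parses them as floats.
    """
    reader = csv.DictReader(handle)
    return [(row[id_column], row[label_column]) for row in reader]


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


# Synthetic stand-in for the released CSV; IDs and column names are made up.
rows = ["tweet_id,stance"]
rows += [f"{1250000000000000000 + i},favor" for i in range(250)]
labeled = read_labeled_ids(io.StringIO("\n".join(rows)))

BATCH_SIZE = 100  # assumption: verify the lookup endpoint's current limit
id_batches = list(batched([tweet_id for tweet_id, _ in labeled], BATCH_SIZE))
print([len(batch) for batch in id_batches])  # -> [100, 100, 50]
```

Tweets that have since been deleted or made private will not be returned by a lookup, so the rehydrated set may be smaller than the labeled set.
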