Understanding COVID-19 response by twitter users: A text analysis approach.

Digvijay Pandey, Bandinee Pradhan.

Abstract

The COVID-19 outbreak has caused a high number of casualties and is an unprecedented public health emergency. Twitter has emerged as a major platform for public interaction, giving researchers an opportunity to understand the public response to the outbreak. The researchers analyzed 100,000 tweets with the hashtags #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #COVID19, #COVID-19, #epitwitter, #ihavecorona, #StayHomeStaySafe, #TestTraceIsolate. Tools such as Python, Google NLP, and NVivo were used for sentiment analysis and thematic analysis. The results showed that 29.61% of tweets carried positive sentiments, 29.49% mixed sentiments, 23.23% neutral sentiments and 18.069% negative sentiments. Popular keywords include "cases", "home", "people" and "help". We identified 30 topics and categorized them into three themes: Public Health, COVID-19 around the world, and Number of Cases/Death. This study shows that Twitter data and an NLP approach can be used for studies of public discussion and sentiment during the COVID-19 outbreak. Real-time analysis can help reduce false messages and increase efficiency in providing the right guidelines to people.
© 2022 The Author(s).

Keywords:  COVID-19; Public health; Public opinion; Sentiments; Twitter

Year:  2022        PMID: 35873536      PMCID: PMC9293375          DOI: 10.1016/j.heliyon.2022.e09994

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

On December 31, 2019, an unknown pneumonia-like disease detected in Wuhan, China was reported to the WHO country office in China. On January 30, 2020, WHO declared the outbreak a public health emergency of international concern. In February, WHO announced COVID-19 as the name of the disease caused by the new coronavirus. Since then, people around the world have been working to find a vaccine, and authorities have been taking measures to keep the public safe from the virus. These measures include easy and efficient access to testing and results, rigorous contact tracing, consistent science-based messaging, quarantines and a genuine commitment to clamping down on socializing. Since the outbreak began in China, the global situation has worsened. As of April 28, 2020, there were 2,954,222 cases of COVID-19 globally, with the highest number, 960,916, in the United States alone. The virus spreads primarily between people during close contact, often via small droplets produced by coughing, sneezing, or talking. To stop the spread, authorities worldwide implemented travel bans, lockdowns, quarantines, curfews, stay-at-home orders, sanitization, and closures of public facilities. Social media platforms such as Twitter, Facebook and Reddit have played an important role in allowing people to express their opinions, and the COVID-19 pandemic has been a trending topic on Twitter. These technologies have made it possible for users to create, modify, and share information (Kietzmann et al., 2011). Moreover, Kumar (2020) asserts that this has brought the power of big data to researchers, who can carry out sentiment analysis and explore key factors within a specific community. In light of the current situation, WHO has provided guidelines to the general public and authorities on handling the pandemic.
Therefore, this research will help provide information about factors related to COVID-19 and extend the current methodology to this new context.

Using Twitter to measure public opinion

With 330 million monthly active users reported in 2020, Twitter is a well-suited medium for people to disseminate information. Social networking reflects the arrival of emerging technology in modern technological environments, and social media has grown out of these recent advances. Studies based on the Social-Mediated Crisis Communication (SMCC) model have shown that people depend on social and traditional media for information during emergencies (Austin et al., 2012; Liu et al., 2013). As part of a larger media ecosystem, social networking sites have become one of the tools organizations use to reach and involve people during emergencies (Hanna et al., 2011; Keim and Noji, 2011). Studies have underlined that the choice of media (traditional versus social) matters when organizations try to inform the public about crises (Austin et al., 2012; Jin et al., 2014; Keim and Noji, 2011; Liu et al., 2011, 2015; Schultz et al., 2011). Content shared on social media sites provides information and evidence during public health threats. This is particularly significant in the context of the COVID-19 global pandemic, as Twitter has become a new medium through which health experts, journalists, government authorities and other genuine sources around the world circulate recent information on COVID-19. Specifically, understanding the deep, rich, and varied communication about COVID-19 on Twitter could support ongoing responses to people with medical concerns and help deliver the right information. Governments and health agencies need to inform people about the potential danger as their community or country turns into a red zone with numerous cases of coronavirus infection.
Providing open information about how COVID-19 spreads and what government leaders are doing to prevent further spread may reduce panic and misleading communication. With millions of deaths recorded thus far, governments need to reassure citizens of their safety and protect the health of the overall population. Twitter brings people together around a common cause during emergencies. The swine influenza outbreak in the US in 2009 drew a lot of attention from Twitter users. Researchers carried out early evidence-based research by applying a qualitative methodology (Braun and Clarke, 2006) to tweets generated during the peak of the swine influenza epidemic. Moreover, after the 2010 Haiti earthquake, the American Red Cross simplified its fundraising process and used Twitter as a forum through which people could tweet five-digit numbers and immediately donate to the Haiti disaster fund (Manjoo, 2010). Increases in posts and retweets containing “Haiti” or “earthquake” were seen after the disaster, showing that organizations were communicating about the event itself. According to the Centers for Disease Control and Prevention (CDCP, 2016a), the deadliest Ebola outbreak in history occurred in 2014, with about 22,000 cases. In response to rising public concern about Ebola, the CDC started a live Twitter conversation with the general public to provide accurate information about the transmission of the disease. Other evidence-based research has examined the utility of Twitter for gaining insight into communicable disease outbreaks (Chew and Eysenbach, 2010; Signorini et al., 2011; Kostkova et al., 2014) and Ebola (Oluwafemi et al., 2014). This research offered unique insight into data-driven qualitative analysis of tweets associated with communicable disease outbreaks.
Another study (Diddi and Lundy, 2017) explored how four health-related organizations used their Twitter accounts to discuss various aspects of breast cancer during October, which is observed as Breast Cancer Awareness Month. All tweets by these organizations were analyzed for the presence of the constructs of the Health Belief Model (HBM). The analysis showed that, while the organizations shared important breast cancer-related content on Twitter, each used the platform in a different style, as evidenced by their focus on different HBM constructs when publishing breast cancer-related tweets. As proposed by SMCC, organizations must consider the three types of publics that interact with their social media messages in order to improve the effectiveness of communicating through social media (Austin et al., 2012; Fraustino et al., 2012; Jin et al., 2014; Liu et al., 2013). Using retweets can help social media creators spread information to followers. Moreover, with the advent of social media, organizations have taken on a human-like quality rather than remaining an inaccessible entity (Notter and Grant, 2012). As the worldwide community faces the coronavirus pandemic together, Twitter helps individuals find reliable information, connect with others, and follow what is happening in real time. There is growing research interest in analyzing tweets according to the opinions they express. This interest may result from the enormous number of messages posted regularly on Twitter, which contain significant information for an open-minded reader. Such an everyday communication channel is well placed to meet the growing demand for insight among the general population. It also encourages users to be inspired by stories of courageous acts, positive examples, and global efforts to fight the pandemic.
On the other hand, Twitter is actively working to prevent the deception or harm that may reach users through posts on its platform.

Research methodology

A five-step process was followed to collect, process, analyze and derive insights and inferences from COVID-19 tweets, as shown in the flow chart in Figure 1. Steps 1 to 3 were performed using custom-written Python code on Anaconda (Spyder IDE), and step 4 was done using Python, Microsoft Excel 2016, and Tableau Public Desktop 10.3 (Sarlan et al., 2014).
Figure 1

Sentiment Analysis Process. Source: Created by authors.

Data collection

The scope of the analysis was to examine public expressions, sentiments and areas of interest or focus during the sudden spike in COVID-19 cases and the accompanying lockdowns in most parts of the world; hence the time period was fixed as 12 March 2020 to 15 April 2020. The data (day-wise Twitter tweet data) was manually downloaded from Kaggle.com, a publicly accessible data science platform, using two URLs after the analyst logged in with their profile on the website. The data contained only tweets (original tweets, no retweets) that included at least one of the following hashtags: #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #COVID-19, #COVID_19, #epitwitter, #ihavecorona, #StayHomeStaySafe, #TestTraceIsolate.

Data collation & preparation

Once the 35 CSV files were placed in a folder, the next task was to combine the data from the multiple files and create a master dataset. While combining the files, the following considerations were taken into account:

A. Since NLP sentiment tagging services are not available for all languages, and for the sake of uniformity, only English tweets were retained, using the ‘lang’ column in the dataset, which had the value ‘en’ for English.
B. Owing to data processing constraints and for the sake of brevity, 100K tweets were drawn from the consolidated English dataset from A above by simple random sampling, using the pandas sample function in Python. Python code was written to accomplish this.

The compiled data was observed to contain 20 million tweets in 66 languages; it was filtered to English and sampled (using simple random sampling) to obtain 100K tweets, which were saved in a CSV file.
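The collation, language filter and sampling steps described above can be sketched in Python roughly as follows. This is a minimal illustration and not the authors' code: the function name `build_analysis_sample`, the folder layout (one CSV per day in a single folder) and the random seed are assumptions; only the 'lang' column, the value 'en' and the use of the pandas sample function come from the text.

```python
from pathlib import Path

import pandas as pd


def build_analysis_sample(folder, n_sample=100_000, seed=42):
    """Combine daily CSV files, keep English tweets, draw a simple random sample.

    The function name, folder layout and seed are hypothetical.
    """
    # Read every daily CSV file found in the folder and stack them.
    csv_paths = sorted(Path(folder).glob("*.csv"))
    frames = [pd.read_csv(path) for path in csv_paths]
    master = pd.concat(frames, ignore_index=True)

    # Keep only tweets whose 'lang' column has the value 'en'.
    english = master[master["lang"] == "en"]

    # Simple random sampling without replacement, capped at the rows available.
    n = min(n_sample, len(english))
    return english.sample(n=n, random_state=seed).reset_index(drop=True)
```

Fixing `random_state` makes the sample reproducible; the result could then be written out with `to_csv(..., index=False)` to obtain a single analysis file.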

Illustration of data filtering for final (analysis) dataset

Data cleaning was not applied in the preparation step, so hyperlinks and hashtags were retained in the ‘text’ (tweet content) column of the final dataset. This preserved the original nature of each tweet, which helps in contextual sentiment tagging. Several industry analysts clean hyperlinks and hashtags, apply word stemming and lemmatization, and remove stop words and special characters at this step in social media text analysis; however, this was not done in the current analysis for the aforementioned reason.

Sentiment tag for each tweet

Once the final dataset was ready in CSV format, the next step was to tag each tweet with a sentiment score and magnitude. Google Cloud Natural Language was used for this purpose, along with Python code. The algorithm iterated over each tweet in the 100K final dataset and called the Google NLP sentiment tagging API to obtain the sentiment score and sentiment magnitude. The score defines the polarity of the sentiment (from −1, most negative, to +1, most positive), whereas the magnitude (0 to infinity) defines the strength of the emotion. After the score and magnitude had been tagged on each tweet, thresholds for assigning each sentiment were selected. Per the Google NLP API documentation, these thresholds are subjective and depend on the context of the analysis. The thresholds in Table 1 were therefore applied, using Python code, to tag each tweet with one of four sentiments: positive, negative, mixed or neutral.
Table 1

Sentiments by tweets.

| Score | Magnitude | Sentiment Tag |
|---|---|---|
| ≥ 0.2 | > 0.25 | Positive |
| ≤ -0.2 | > 0.25 | Negative |
| -0.2 to +0.2 | > 0.25 | Mixed |
| Any (-1 to 1) | ≤ 0.25 | Neutral |

The score, magnitude and sentiment tag were added as new columns in the dataset, which was exported to CSV format for insight generation and analysis via Python, Excel and Tableau. This file was called ‘the tagged dataset’.
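The Table 1 thresholds can be expressed as a small Python function. This is a sketch under stated assumptions, not the authors' code: the function name `tag_sentiment` and its parameter names are hypothetical, and it assumes the score and magnitude have already been obtained from the sentiment API.

```python
def tag_sentiment(score, magnitude, score_cut=0.2, magnitude_cut=0.25):
    """Map a (score, magnitude) pair to a sentiment tag using the Table 1 thresholds.

    The function and parameter names are illustrative assumptions.
    """
    # Weak overall emotion: neutral, whatever the polarity.
    if magnitude <= magnitude_cut:
        return "Neutral"
    # Strong emotion with clearly positive polarity.
    if score >= score_cut:
        return "Positive"
    # Strong emotion with clearly negative polarity.
    if score <= -score_cut:
        return "Negative"
    # Strong emotion but polarity close to zero.
    return "Mixed"
```

Checking the magnitude first mirrors the last row of the table, where any score with a magnitude of 0.25 or less is tagged neutral. For example, `tag_sentiment(0.0, 1.2)` falls in the "Mixed" row and `tag_sentiment(0.9, 0.1)` in the "Neutral" row.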

Results

A qualitative approach was used to further develop themes in the Twitter data. The researchers used NVivo 12 to carry out the analysis. Specifically, Braun and Clarke's (2006) six-step process was used: (1) becoming familiar with the data, (2) generating initial codes, (3) searching for themes, (4) reviewing themes, (5) defining themes, and (6) writing up. Finally, the researchers developed themes related to COVID-19; the popular tweeted words included “COVID-19”, “people”, “cases”, “new”, “home”, “help” and “health”. See Table 2 and Figure 2. The result is presented in Figure 3.
Table 2

TOP 30 Popular words.

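| Word | Count | Weighted Percentage |
|---|---|---|
| #COVID-19 | 49082 | 2.47% |
| #coronavirus | 48538 | 2.44% |
| #COVID | 12716 | 0.64% |
| amp | 11188 | 0.56% |
| people | 10059 | 0.51% |
| #coronaviruspandemic | 6561 | 0.33% |
| cases | 6384 | 0.32% |
| new | 5659 | 0.28% |
| just | 5464 | 0.27% |
| coronavirus | 5302 | 0.27% |
| home | 4986 | 0.25% |
| get | 4941 | 0.25% |
| time | 4920 | 0.25% |
| help | 4794 | 0.24% |
| like | 4579 | 0.23% |
| pandemic | 4539 | 0.23% |
| need | 4373 | 0.22% |
| COVID | 4338 | 0.22% |
| one | 4230 | 0.21% |
| #coronavirusoutbreak | 4205 | 0.21% |
| stay | 4122 | 0.21% |
| health | 4068 | 0.20% |
| world | 3910 | 0.20% |
| virus | 3789 | 0.19% |
| please | 3602 | 0.18% |
| today | 3423 | 0.17% |
| day | 3362 | 0.17% |
| support | 3154 | 0.16% |
| know | 3126 | 0.16% |
| deaths | 3093 | 0.16% |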

Source: Created by authors.
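A word-frequency table of the kind shown in Table 2 could be approximated in Python as sketched below. The study produced its counts with NVivo, so this is a swapped-in illustration rather than the authors' procedure: the tokenization rule, the tiny stop-word list and the function name `word_frequencies` are all assumptions, and the percentage here is simply a word's share of all counted words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NVivo applies its own list).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "at", "for"}


def word_frequencies(tweets, top_n=30):
    """Return (word, count, percentage of all counted words) for the top_n words."""
    counts = Counter()
    for text in tweets:
        # Lowercase, and keep hashtags such as #covid-19 as single tokens.
        tokens = re.findall(r"#?\w[\w-]*", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    total = sum(counts.values())
    return [(w, c, round(100 * c / total, 2)) for w, c in counts.most_common(top_n)]
```

Because hyperlinks and markup were not cleaned from the tweets, fragments such as "amp" (from the HTML entity for an ampersand) can surface among the top words, as they do in Table 2.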

Figure 2

The word cloud of the most popular keywords. Source: Created by authors. COVID-19 related themes.

Figure 3

Sentiments by tweets.

Considering the large number of words shown in Table 2, it is notable how strongly these words shaped the contemporary COVID-19 pandemic narrative. During the COVID-19 outbreak, words such as "Corona" appeared in a high number of tweets. The themes were determined by individually picking relevant topics and using the text search query in NVivo 12 as part of the content analysis method. Table 3 reveals a high prevalence of tweets on topics such as face masks, quarantine, self-quarantine, COVID-19 tests and lockdown. Some of the more important words used across these topics included mask, face shield, home quarantine, self-quarantine, test kits, rapid test and safety. Even though a topic may appear to be tied to a separate set of circumstances, it is important to note that almost all of the tweets analysed contain a word connected with COVID-19 or its variants.
Table 3

List of theme, topic, related words, and number of references on twitter.

| Theme | Topic | Related words | Number of references |
|---|---|---|---|
| Public Health | Facemask | mask, facemask, face shield, PPE | 5114 references, 51.14% coverage |
| | Quarantine | Home quarantine, self-quarantine | 12105 references, 128.05% coverage |
| | COVID-19 Test | Test kits, rapid test | 2944 references, 0.02% coverage |
| | Lockdown | COVID-19 lockdown | 9470 references, 0.08% coverage |
| | Social Distancing | 1 feet distance | 4346 references, 0.05% coverage |
| | Safety | Stay home, stay safe | 8567 references, 0.05% coverage |
| | COVID-19 Vaccine | Vaccine | 1204 references, 0.01% coverage |
| COVID-19 cases around the world | COVID-19 in the United States | Lockdown in the US | 5259 references, 0.05% coverage |
| | UK | UK lockdown, Immunity | 3256 references, 0.01% coverage |
| | Italy | Italy lockdown | 2623 references, 0.02% coverage |
| | Wuhan, China | Criticism from media | 4777 references, 0.04% coverage |
| COVID-19: No of new cases and Death | New cases | New cases, confirmed cases, active cases | 6335 references, 0.05% coverage |
| | Death rate | Death poll, COVID-19 death | 3654 references, 0.03% coverage |

Discussion

This research examined the general discourse and emotions of COVID-19-related Twitter messages. The most prominent themes that users discussed between March 12 and April 15, 2020 are shown in Table 3. The topic modelling makes the COVID-19 subjects easier to understand, and several conclusions can be drawn from the findings. First, "quarantine" has the largest number of references, suggesting that people gave it more weight than other issues. Second, "lockdown" was the second most discussed issue. Third, users were concerned about "safety" and about the messages circulating around them, as reflected in the increasingly frequent use of "stay home, stay safe". Fourth, notwithstanding their fears, a majority of users expressed positive and mixed sentiments, while many people anticipated an increase in the number of cases. Finally, vaccines were the least discussed topic during the study period, indicating that the pandemic was still in its initial stages and that few discussions about when a vaccine would be ready were taking place.

Limitations and future research

The researchers used only a sample of trending hashtags to collect the data. As the situation is ongoing and evolving, new trends will keep emerging. For example, the COVID-19 vaccine was not a popular trend during the research period. Another limitation is the “global sample”: the tweets are treated as a global representation, but they are indicative only of the opinions of Twitter users. However, Twitter does provide real-time data for analysis, which can be valuable. Another limitation is language, as the researchers analyzed only tweets in English. These gaps can be bridged in future research. Future research can delve deeper into the themes that emerged in this study, for example “lockdown”, “safety” and “vaccines”. Another avenue is to look into other trends such as “misinformation” and “politics in the time of COVID-19”.

Conclusion

This research set out to analyse people's emotions and feelings during the COVID-19 outbreak. Twitter was chosen as the platform for the reliability of the findings and the ease of accessing individual posts. The study demonstrates how Twitter data may be used to measure public sentiment during emergencies like COVID-19. The study also found that users in nearly all states were tweeting about COVID-19 with positive views, suggesting that people had already become accustomed to COVID-19 and, as a result, the survival rate had increased over time. The results revealed that subjects such as "preventive methods to combat COVID-19," "public health," and "COVID-19 cases and mortality rate" were frequently discussed. The sentiment analysis suggested that most users showed “positive” and “mixed” emotions. This type of analysis can help government and healthcare authorities understand and react to public emergencies. It can also be used to build public trust.

Declarations

Author contribution statement

Bandinee Pradhan, Digvijay Pandey, Wangmo, GadeeGowwrii: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

Data will be made available on request.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
References

1. Keim ME, Noji E. Emergent use of social media: a new age of opportunity for disaster resilience. Am J Disaster Med. 2011 Jan-Feb.

2. Oyeyemi SO, Gabarron E, Wynn R. Ebola, Twitter, and misinformation: a dangerous combination? BMJ. 2014-10-14.

3. Chew C, Eysenbach G. Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLoS One. 2010-11-29.

4. Diddi P, Lundy LK. Organizational Twitter Use: Content Analysis of Tweets during Breast Cancer Awareness Month. J Health Commun. 2017-02-19.

5. Signorini A, Segre AM, Polgreen PM. The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS One. 2011-05-04.
