Maneet Singh¹, Hennaav Kaur Dhillon², Parul Ichhpujani², Sudarshan Iyengar¹, Rishemjit Kaur³
¹ Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India.
² Department of Ophthalmology, Government Medical College and Hospital, Chandigarh, India.
³ Principal Scientist, CSIR-Central Scientific Instruments Organisation, Chandigarh; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India.
Abstract
Purpose: COVID-19-associated mucormycosis (CAM) was a serious public health problem during the second wave of COVID-19 in India. We aimed to analyze public perceptions of CAM through sentiment analysis of Twitter data. Methods: In this observational study, the application programming interface (API) provided by Twitter was used to extract real-time conversations containing keywords related to mucormycosis (colloquially known as "black fungus") from May 3 to August 29, 2021. Lexicon-based sentiment analysis of the tweets was performed using the VADER sentiment analysis tool. To identify the overall sentiment of a user on a given topic, an algorithm that labels a user "k" based on the sentiments of their tweets was used. Results: A total of 401,037 tweets were collected between May 3 and August 29, 2021, and the peak frequency of 160,000 tweets was observed from May 17 to May 23, 2021. Positive sentiment tweets constituted a larger share than negative sentiment tweets, with weekly variations. A temporal analysis of the demand for utilities showed that demand was high in the initial period but decreased with time, which was associated with the availability of resources. Conclusion: Sentiment analysis of Twitter data showed that social media platforms are increasingly used to express emotions during the ongoing COVID-19 pandemic. In our study, time-based assessment of tweets showed a reduction over time in the frequency of negative sentiment tweets. Analysis of polarization in the retweet network, based on users' sentiment polarity, showed that the users were well connected, indicating that such issues unite society rather than divide it.
Social media platforms have attracted the attention of researchers around the globe because of the large amount of data available for mining public opinion on almost any issue. Opinion mining, the automated analysis of text to extract opinions, is a prominent area of research.[1,2] Social networking sites such as Twitter now generate large volumes of such text every day, motivating researchers from different domains to study the sentimental as well as moral aspects of people's opinions and attitudes on topics of concern.[3,4]

In the current study, we focus on mucormycosis associated with COVID-19. Mucormycosis is a fungal infection that has been maiming coronavirus disease (COVID-19) patients in India, affecting the sinuses, eyes, and brain; it is therefore called "COVID-associated mucormycosis" (CAM). CAM progresses rapidly and causes high morbidity and mortality. Given the magnitude of the CAM outbreak, the Indian Ministry of Health advised all states to declare mucormycosis an epidemic. During the second wave of COVID-19, many Indian hospitals that had probably been seeing 1 or 2 patients with mucormycosis a month saw an unprecedented surge of 500–600 patients admitted concurrently; the resulting shortage of resources caused widespread distress.[5]

The current study was planned to understand public perception toward mucormycosis. We extracted tweets from Twitter using keywords associated with mucormycosis and used them to study the sentiment of public opinion. A temporal analysis was performed on the overall tweet frequency, the sentiments of the tweets, and the frequency of tweets expressing demand related to CAM, and these dynamics were compared with real-world events. We also propose a methodology to label users based on the overall sentiment of their tweets, and we assess the level of polarization in the discussions related to mucormycosis.
Methods
Data collection
The API provided by Twitter was used to extract real-time conversations related to fungal infections from May 3 to August 29, 2021. The keywords used to extract the tweets were “Black Fungus,” “White Fungus,” “Yellow Fungus,” “Mucor,” “Mucormycosis,” “Amphotericin B,” “Amphotericin B,” and “Rhino Orbital cerebral mucormycosis.” In this way, 401,037 English tweets posted by users across the globe were extracted. Most of the tweets were obtained using the keywords “mucor/mucormycosis” and “Amphotericin B/Amphotericin B”; however, some tweets may have contained multiple keywords. To verify that the tweets mainly originated from India, we randomly selected 5000 tweets and extracted the locations of their users from their profiles by using the approach given by Kaur et al. and Hutto et al.[2,4] We restricted our analysis to English-language tweets because efficient tools or methodologies for detecting sentiment in the other languages spoken in India were lacking.
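Because a single tweet can contain several of the tracking keywords, per-keyword counts can add up to more than the number of collected tweets. The sketch below illustrates this with case-insensitive substring matching. It is an illustration under assumptions: the helper names (`match_keywords`, `keyword_counts`) and the sample tweets are hypothetical, and the Twitter API's own keyword-tracking rules may match differently.

```python
from collections import Counter

# Tracking keywords listed above, lower-cased (the repeated spelling is merged).
KEYWORDS = [
    "black fungus",
    "white fungus",
    "yellow fungus",
    "mucor",
    "mucormycosis",
    "amphotericin b",
    "rhino orbital cerebral mucormycosis",
]


def match_keywords(text, keywords=KEYWORDS):
    """Return the set of keywords that occur anywhere in the tweet text."""
    lowered = text.lower()
    # Substring matching: "mucor" also matches inside "mucormycosis".
    return {kw for kw in keywords if kw in lowered}


def keyword_counts(tweets, keywords=KEYWORDS):
    """Count tweets per keyword; one tweet may increment several keywords."""
    counts = Counter()
    for text in tweets:
        counts.update(match_keywords(text, keywords))
    return counts


if __name__ == "__main__":
    sample = [
        "Black fungus cases rising",
        "Need Amphotericin B for mucormycosis patient",
        "Rhino orbital cerebral mucormycosis explained",
    ]
    counts = keyword_counts(sample)
    print(len(sample), sum(counts.values()))  # 3 tweets, 7 keyword hits
```

With substring matching, a tweet containing "mucormycosis" is counted under both "Mucor" and "Mucormycosis", which is one reason keyword totals exceed the tweet total.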
Ethical statement
Our study does not involve participation by any live human subjects, and because all the data used for analysis is publicly available, approval from the institutional review board was not needed.
Preliminary analysis
To gain more insight into the data, a preliminary analysis of the tweets was performed. First, all the hashtags and mentions in the tweets were extracted and their overall frequencies were computed. “Hashtags” are words beginning with the “#” symbol that generally indicate the agenda of a tweet, and “mentions” are Twitter handles of users, beginning with the “@” symbol, that are used primarily to direct a tweet to, or refer to, one or more particular users. In addition to hashtags and mentions, tweet frequency was analyzed at regular intervals over the data collection period: we divided the collection period into consecutive 7-day slots, each corresponding to 1 week, and computed the number of tweets in every week.
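The hashtag/mention counting and the weekly binning can be sketched as follows. This is a minimal illustration, assuming each tweet is a plain string with a known posting date and that slot 0 begins on May 3, 2021; the regular expressions and function names are illustrative choices, not the authors' code.

```python
import re
from collections import Counter
from datetime import date

HASHTAG_RE = re.compile(r"#\w+")  # "#" followed by letters, digits, or underscores
MENTION_RE = re.compile(r"@\w+")  # "@" followed by a handle

STUDY_START = date(2021, 5, 3)  # first day of data collection (assumed slot 0 start)


def tag_frequencies(tweets):
    """Count hashtags and mentions across tweets (case-sensitive, as written)."""
    hashtags, mentions = Counter(), Counter()
    for text in tweets:
        hashtags.update(HASHTAG_RE.findall(text))
        mentions.update(MENTION_RE.findall(text))
    return hashtags, mentions


def week_index(day, start=STUDY_START):
    """0-based index of the 7-day slot containing `day`; slot 0 begins on `start`."""
    return (day - start).days // 7


def weekly_counts(days, start=STUDY_START):
    """Number of tweets per 7-day slot, given each tweet's posting date."""
    return Counter(week_index(d, start) for d in days)


if __name__ == "__main__":
    hashtags, mentions = tag_frequencies(
        ["#BlackFungus cases up @SonuSood #COVID19", "Help @SonuSood #BlackFungus"]
    )
    print(hashtags.most_common(1), mentions.most_common(1))
    print(week_index(date(2021, 5, 17)), week_index(date(2021, 8, 29)))  # 2 16
```

Under this slotting, May 3 to August 29, 2021 spans exactly 17 slots (indices 0 to 16), and the peak week of May 17 to 23 falls in slot 2.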
Preprocessing tweets
Because tweets are unstructured, the text had to be preprocessed before further analysis. The tweets were cleaned by removing URLs, escape tags, hashtags, mentions, non-alphanumeric characters, and any extra spaces within the text.
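The cleaning steps can be sketched with regular expressions. This is a sketch under assumptions: "escape tags" is interpreted here as HTML character entities such as `&amp;` (decoded with `html.unescape`, then stripped with the other non-alphanumeric characters), hashtags and mentions are removed whole, and only ASCII letters and digits are kept, matching the English-only scope.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[#@]\w+")  # hashtags and mentions, removed whole
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")  # anything but ASCII letters, digits, whitespace
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Apply the cleaning steps in order and return the cleaned text."""
    text = html.unescape(text)  # assumed meaning of "escape tags": "&amp;" -> "&", etc.
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()  # collapse runs of spaces, tabs, newlines


if __name__ == "__main__":
    raw = "Need Amphotericin B &amp; beds!! @SonuSood #BlackFungus https://t.co/abc123\n"
    print(clean_tweet(raw))  # Need Amphotericin B beds
```

URLs are removed before hashtags and mentions so that fragments such as `#` anchors inside links are not treated as separate tags.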
Lexicon-based sentiment assessment of tweets
Social media platforms such as Twitter have become a popular means for people to express their sentiments in the form of posted text. Such text generally comprises one or more words, and these words can help identify the sentiment it conveys. Several available tools use dictionaries (lexicons) that associate words with positive or negative sentiment. We employed the VADER sentiment analysis tool,[4] which has been applied in prior studies,[6] to detect sentiment in English tweets. The tool returns a sentiment score ranging from −1 to 1. A tweet with a score greater than 0.05 was considered to express positive sentiment, and a tweet with a score less than −0.05 was considered to express negative sentiment; all remaining tweets were considered neutral. In this way, the sentiment expressed in each tweet of our dataset was extracted. A similar approach was used to find the sentiment of frequently occurring words in the positive and negative sentiment tweets.
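The thresholding rule can be written as a small function. With the `vaderSentiment` package, the score would typically be obtained as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`; the sketch below takes that score as input so that it runs without the package, and the helper `label_shares` (a hypothetical name) shows how weekly label shares could then be computed.

```python
from collections import Counter

POS_THRESHOLD = 0.05   # score > 0.05  -> positive
NEG_THRESHOLD = -0.05  # score < -0.05 -> negative


def sentiment_label(compound):
    """Map a score in [-1, 1] to 1 (positive), -1 (negative), or 0 (neutral)."""
    if compound > POS_THRESHOLD:
        return 1
    if compound < NEG_THRESHOLD:
        return -1
    return 0


def label_shares(labels):
    """Fraction of tweets carrying each label, e.g. for one weekly slot.

    Assumes `labels` is a non-empty list.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in (1, -1, 0)}


if __name__ == "__main__":
    scores = [0.62, 0.05, -0.30, 0.0]
    labels = [sentiment_label(s) for s in scores]
    print(labels)  # [1, 0, -1, 0]
    print(label_shares(labels))
```

Because the comparisons are strict, a score of exactly 0.05 or −0.05 is treated as neutral, following the "greater than" and "less than" wording above.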
User sentiment identification
A user on a social media platform may post multiple tweets on a given topic, and these tweets need not all carry the same sentiment. Therefore, to identify the overall sentiment of a user on a given topic, we proposed an algorithm to label a user “k” based on the sentiments of their tweets. The approach first extracts the positive sentiment tweets and the negative sentiment tweets of each user. Using these tweets, the user is labeled 1, −1, or 0 with Equation (3) given in Algorithm 1, where label 1 corresponds to positive sentiment, −1 to negative sentiment, and 0 to neutral.
Algorithm 1 (Sentimental Labelling of User “k”)
Let T be the set of tweets by user k in our dataset, and let D be a dictionary that maps each tweet of user k to its sentiment label, such that D(t) = sentiment(t). The sets of positive sentiment tweets (P) and negative sentiment tweets (N) of user k are

$$P = \{\, t \in T \mid D(t) = 1 \,\}, \qquad N = \{\, t \in T \mid D(t) = -1 \,\}$$

The sentiment label L of user k is then computed from P and N using Equation (3).
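The set construction of Algorithm 1 can be sketched in Python. The exact form of Equation (3) is not reproduced in this sketch: as a clearly hypothetical stand-in, `label_user` assigns 1 when a user's positive tweets outnumber the negative ones, −1 in the reverse case, and 0 on a tie. The function names and the (user, tweet_id, label) input layout are likewise assumptions.

```python
from collections import defaultdict


def split_user_tweets(D):
    """Return (P, N) for one user; D maps tweet id -> tweet label (1, -1, or 0)."""
    P = {t for t, s in D.items() if s == 1}
    N = {t for t, s in D.items() if s == -1}
    return P, N


def label_user(D):
    """HYPOTHETICAL majority rule standing in for Equation (3)."""
    P, N = split_user_tweets(D)
    if len(P) > len(N):
        return 1
    if len(N) > len(P):
        return -1
    return 0  # tie, including users with only neutral tweets


def label_all_users(tweet_labels):
    """tweet_labels: iterable of (user, tweet_id, label). Returns {user: L}."""
    per_user = defaultdict(dict)
    for user, tweet_id, label in tweet_labels:
        per_user[user][tweet_id] = label
    return {user: label_user(D) for user, D in per_user.items()}


if __name__ == "__main__":
    rows = [("a", "t1", 1), ("a", "t2", -1), ("b", "t3", -1), ("c", "t4", 1)]
    print(label_all_users(rows))  # {'a': 0, 'b': -1, 'c': 1}
```

Only P and N influence the label here; neutral tweets are ignored, which mirrors the algorithm's use of the positive and negative sets only.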
Computation of opinion polarization
In this section, we aim to quantify the polarization of users' sentiment toward CAM. For this, we first constructed an undirected retweet network from the tweets in our database. The nodes of the network represent users, and an edge $e_{mn}$ indicates that at least one tweet was retweeted between users “m” and “n.” We then removed every user “j” with label $L_j = 0$ to focus only on users with positive or negative sentiment. The resulting retweet network contained 33,970 nodes and 91,729 edges. Next, we detected the communities within the retweet network using the method of Blondel et al.[7] Let N be the total number of communities in the network; for each community “i,” we counted its positive sentiment users and its negative sentiment users. Finally, we employed the method proposed in our previous work[8] to compute the level of polarization, P, between users with opposing sentiments on Twitter.
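A minimal sketch of the network-construction and filtering steps, assuming retweets are available as (retweeting user, original author) pairs and user labels come from the labeling step. Community detection itself is left to an existing implementation of the Louvain method of Blondel et al., for example `louvain_communities` in `networkx.algorithms.community` (NetworkX 2.8 or later), which returns a list of node sets. The per-community positive and negative counts computed below are the quantities the polarization measure is defined on; the polarization formula from reference [8] is not reproduced, and the helper names are hypothetical.

```python
from collections import Counter


def build_retweet_network(retweet_pairs, user_labels):
    """Build the undirected retweet network restricted to non-neutral users.

    retweet_pairs: iterable of (retweeting_user, original_author) tuples.
    user_labels:   dict mapping user -> overall label L (1, -1, or 0).
    Returns (nodes, edges); each edge is a frozenset {m, n}, so e_mn == e_nm.
    """
    edges = set()
    for m, n in retweet_pairs:
        if m == n:
            continue  # self-retweets add no edge (assumption)
        if user_labels.get(m, 0) == 0 or user_labels.get(n, 0) == 0:
            continue  # neutral or unlabeled users are filtered out
        edges.add(frozenset((m, n)))  # repeated retweets collapse to one edge
    nodes = set().union(*edges)
    return nodes, edges


def community_sentiment_counts(communities, user_labels):
    """For each community (a set of users), return (#positive, #negative) users."""
    result = []
    for community in communities:
        tally = Counter(user_labels[u] for u in community)
        result.append((tally[1], tally[-1]))
    return result


if __name__ == "__main__":
    labels = {"a": 1, "b": -1, "c": 0, "d": 1}
    pairs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "b")]
    nodes, edges = build_retweet_network(pairs, labels)
    print(len(nodes), len(edges))  # 3 2
    print(community_sentiment_counts([{"a", "b"}, {"d"}], labels))  # [(1, 1), (1, 0)]
```

Storing each edge as a `frozenset` makes the network undirected by construction: a retweet from m to n and one from n to m produce the same edge.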
Extraction of user profiles
The basic attributes of all users were extracted from their Twitter profiles: the total number of tweets posted by the user, the number of accounts following the user, the number of accounts the user follows, and the number of days elapsed since the user created the profile.
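The four attributes can be read from a user record as sketched below. This assumes a Twitter API v1.1-style user object, where the fields are `statuses_count`, `followers_count`, `friends_count`, and `created_at` (formatted like `"Sat May 01 00:00:00 +0000 2021"`); the reference date for account age and the sample user are hypothetical.

```python
from datetime import datetime, timezone

# created_at format of Twitter API v1.1 user objects, e.g. "Sat May 01 00:00:00 +0000 2021"
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"


def profile_attributes(user, reference):
    """Return the four profile attributes for one user.

    `user` is assumed to be a v1.1-style user dict; `reference` is a
    timezone-aware datetime against which account age is measured.
    """
    created = datetime.strptime(user["created_at"], CREATED_AT_FMT)
    return {
        "tweets": user["statuses_count"],
        "followers": user["followers_count"],
        "following": user["friends_count"],
        "account_age_days": (reference - created).days,
    }


if __name__ == "__main__":
    user = {  # hypothetical user record
        "created_at": "Sat May 01 00:00:00 +0000 2021",
        "statuses_count": 120,
        "followers_count": 35,
        "friends_count": 80,
    }
    ref = datetime(2021, 5, 31, tzinfo=timezone.utc)
    print(profile_attributes(user, ref))  # account_age_days is 30
```

The reference datetime must be timezone-aware because `%z` makes the parsed creation time aware, and Python does not allow subtracting naive and aware datetimes.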
Identifying frequently used words from Amphotericin B-based tweets
In past studies, frequently occurring non-functional words in tweets have been used to identify the central themes of discussions.[9] As one aim of the current study was to extract the key topics of conversation about Amphotericin B injection vials on Twitter, we first removed the functional words (which are generally abundant) from the tweets. These included stopwords along with manually selected, commonly used keywords, namely “mucormycosis,” “amphotericin,” “amphotericin b,” “black,” “fungus,” “liposomal,” “injection,” “sir,” “injections,” “black fungus,” “mucor,” “vials,” “fungals,” “fungal,” and “infection.” To visualize the remaining frequent words, we constructed a word cloud using a Python library.[10]
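The word-frequency step that feeds a word cloud can be sketched with a `Counter`. Assumptions: tokens are lower-cased alphabetic runs, the multi-word exclusions are covered by their single words (with "b" added so that "amphotericin b" leaves no stray token), and the stopword set is a tiny illustrative list rather than the one used in the study. The resulting counts could be passed to a word-cloud library; for example, the `wordcloud` package provides `WordCloud().generate_from_frequencies(counts)`, though whether that is the library cited in [10] is an assumption.

```python
import re
from collections import Counter

# Manually selected keywords from the list above (multi-word entries are
# covered by their single words); "b" is added, as an assumption, so that
# "amphotericin b" leaves no stray token.
EXCLUDED = {
    "mucormycosis", "amphotericin", "b", "black", "fungus", "liposomal",
    "injection", "sir", "injections", "mucor", "vials", "fungals", "fungal",
    "infection",
}
# Tiny illustrative stopword list; a full list would be used in practice.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to"}


def word_frequencies(texts, excluded=EXCLUDED | STOPWORDS):
    """Count lower-cased alphabetic tokens that are not in `excluded`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in excluded)
    return counts


if __name__ == "__main__":
    tweets = [
        "Urgent: need Amphotericin B vial for patient in hospital",
        "Patient at hospital needs liposomal amphotericin injection urgently",
    ]
    print(word_frequencies(tweets).most_common(3))
```

Note that "vial" survives while "vials" is excluded, because the exclusion list matches whole tokens only.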
Help related tweets extraction
The changing demand for resources, and the help requested by users on Twitter in relation to the fungal infection, was also quantified. For this, tweets containing any of the keywords “need,” “demand,” “required,” “request,” “urgent,” “urgently,” “needed,” “please,” “plz,” “help,” “want,” and “wanted” were extracted. We then counted these filtered tweets in each 7-day slot of the data collection period and compared these counts with the weekly frequency of all tweets to assess the share of demand-based tweets.
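The filtering and weekly comparison can be sketched as follows. Assumptions: keywords are matched as whole words, case-insensitively (so "helpful" does not count as "help", and "needs" is not matched because it is not in the list), and each tweet already carries its 7-day slot index; the function names are hypothetical.

```python
import re
from collections import Counter

DEMAND_KEYWORDS = [
    "need", "demand", "required", "request", "urgent", "urgently",
    "needed", "please", "plz", "help", "want", "wanted",
]
# Whole-word, case-insensitive match (assumption).
DEMAND_RE = re.compile(r"\b(?:" + "|".join(DEMAND_KEYWORDS) + r")\b", re.IGNORECASE)


def is_demand_tweet(text):
    """True if the tweet contains at least one demand keyword."""
    return DEMAND_RE.search(text) is not None


def weekly_demand_ratio(tweets):
    """tweets: iterable of (week_index, text). Returns {week: demand / total}."""
    totals, demand = Counter(), Counter()
    for week, text in tweets:
        totals[week] += 1
        if is_demand_tweet(text):
            demand[week] += 1
    return {week: demand[week] / totals[week] for week in sorted(totals)}


if __name__ == "__main__":
    sample = [(2, "need beds"), (2, "news update"), (3, "plz help")]
    print(weekly_demand_ratio(sample))  # {2: 0.5, 3: 1.0}
```

Reporting the ratio alongside the absolute weekly count separates a genuine shift toward requests for help from a rise that merely follows overall tweet volume.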
Results
A total of 401,037 tweets were collected between May 3 and August 29, 2021 using the keywords “Black Fungus,” “White Fungus,” “Yellow Fungus,” “Mucor,” “Mucormycosis,” “Amphotericin B,” “Amphotericin B,” and “Rhino Orbital Cerebral Mucormycosis.”

The keyword-based frequency distribution showed 2466 tweets with the keyword “Black Fungus,” 466 with “White Fungus,” 75 with “Yellow Fungus,” 283,480 with “Mucor/Mucormycosis,” 148,828 with “Amphotericin B/Amphotericin B,” and 466 with “Rhino orbital cerebral mucormycosis.” Because a tweet may contain more than one keyword, these counts sum to more than the total number of tweets.

The distribution of tweets among the top five countries (by frequency) shows that most of the tweets originated from India (more than 75% of our location sample) [Fig. 1a].
Figure 1
(a) The distribution of tweets among the top five countries. (b) The weekly distribution of the total number of tweets during the study period
The peak frequency of nearly 160,000 tweets was observed from May 17 to May 23, 2021, after which the number of tweets declined [Fig. 1b]. The highest weekly frequency of tweets containing the keywords “Amphotericin B” and “Amphotericin B” (60,000 tweets) occurred in the same week; these tweets then declined, reaching nearly zero by July 24, 2021 [Fig. 2a].
Figure 2
(a) The weekly distribution of the total number of tweets mentioning amphotericin B injection. (b) The weekly distribution of the total number of tweets demanding amphotericin B injection. (c) Injection-based demand tweets compared with the overall tweets. (d) The weekly distribution of the total number of tweets demanding other drugs such as posaconazole and isavuconazole
The percentage share of the commonly used hashtags was also calculated: #mucormycosis had the highest share (nearly 12%), followed by #BlackFungus (nearly 8%) and #COVID19 (nearly 6%). The shares of the other hashtags are shown in Fig. 3a.
Figure 3
(a) The percentage share of the commonly used hashtags. (b) The percentage share of commonly used mentions. (c) Top 20 users by tweet frequency
The frequency of mentions was also calculated, and the top 20 mentions are depicted in Fig. 3b. The most frequently mentioned Twitter handle was @SonuSood, accounting for nearly 7.5% of all mentions. Mentions of government Twitter handles such as @PMOIndia, @narendramodi, @MoHFW_INDIA, @drharshvardhan, and @CMODelhi individually accounted for less than 2% of the mentions. The top two hashtags referring to a state were #Gujarat and #Delhi.

Fig. 3c depicts the top 20 users by tweet frequency. Most of these handles belong to private users, and two belong to media outlets (print and electronic). Surprisingly, none of the handles belonged to government agencies.

As per the frequency plot of the sentiment-based tweets, positive sentiment tweets constituted a larger share than negative sentiment tweets, with weekly variations [Supplementary Fig. 1]. The positive tweet share with a lexicon value of >0.5 was seen in 6 weeks of the study period, whereas the share of negative tweets with a >–0.5 lexicon value was seen in only one week. The first week of the analysis period showed a higher number of negative sentiment tweets, followed by a rising share of positive sentiment tweets over the next 5 weeks. During the week with the highest number of tweets (May 17 to May 23, 2021), positive sentiment tweets had a higher share than negative sentiment tweets [Fig. 4a and b]. The profiles of users tweeting positively or negatively were also analyzed, and no differences in Twitter profile characteristics were observed between users with opposing sentiments [Fig. 4c].
Figure 4
(a) Share of positive sentiment tweets. (b) Share of negative tweets. (c) Twitter profile characteristics of users with opposing positive and negative sentiments
A word cloud is a visual representation of the words that appear most frequently in a body of text. The most used words in the injection-based tweets were “hospital,” “patient,” “urgent,” and “vial.” “Urgent” and “save” were the top two positive sentiment words (>25,000 and 15,000 occurrences, respectively), and “suffering” and “death” (>10,000 and >8000) were the two most common negative sentiment words [Supplementary Fig. 2].

The maximum number of tweets demanding utilities (>60,000) occurred during the May 17 to May 23, 2021 peak; thereafter, these tweets declined, reaching nearly zero by the first week of August [Fig. 5a]. These demand-based tweets formed a significant part of the overall tweets posted during this period [Fig. 5b]. People expressed their demands in a positive manner [Supplementary Fig. 4]. The demand-based tweets were further divided into drug-demand tweets (injection amphotericin B, posaconazole, and isavuconazole) and hospital bed-demand tweets (beds with and without oxygen) [Fig. 5c and d]. Tweets demanding hospital beds rose sharply in the 2nd week of May 2021 and remained high until the end of the month. Tweets demanding oxygen beds rose sharply in the third week of May and reached nearly zero by the end of June 2021. Injection amphotericin B-demand tweets specifically peaked during May 17–23, 2021 [Fig. 2b]. When these injection-based demand tweets were compared with the overall tweets discussing CAM, two peaks were observed, during May 17–23, 2021 and August 2–8, 2021 [Fig. 2c].
Figure 5
(a) Weekly number of demand-based tweets. (b) Ratio of demand-based tweets to the total number of tweets. (c) Demand-based tweets: demanding oxygen beds. (d) Demand-based tweets: demanding hospital beds
Fig. 2d represents the demand-based tweets for other drugs used in the management of mucormycosis, such as posaconazole and isavuconazole.
Discussion
After the USA, India is the second worst pandemic-hit country, with 42,975,433 infected cases as of March 9, 2022.[11,12] India also accounts for 469,000 deaths due to COVID-19, the third-highest number in the world after the USA and Brazil, as of November 30, 2021.[13] Beginning in May 2021, India started to witness a large number of CAM cases, accounting for nearly 47,000 cases during the second wave of the pandemic.[14,15]

The spread of the pandemic caused anxiety among the general population. Therefore, the current sentiment analysis was carried out to help understand the emotions of the Indian population regarding the implications of the ongoing COVID-19 pandemic in India. The present study analyzed public sentiment using data retrieved from Twitter, a popular social networking and microblogging website. A total of 401,037 tweets were collected between May 3 and August 29, 2021 using the keywords “Black Fungus,” “White Fungus,” “Yellow Fungus,” “Mucor,” “Mucormycosis,” “Amphotericin B,” “Amphotericin B,” and “Rhino Orbital Cerebral Mucormycosis.” The data were analyzed on a weekly basis to better understand public sentiment. The peak frequency of nearly 160,000 tweets was observed during May 17–23, 2021, after which the number of tweets declined. India contributed about 81% of the global CAM cases, and the peak occurrence of these cases was observed in the third week of May 2021,[16] coinciding with the week with the highest number of tweets in the present study. The number of newly diagnosed CAM cases began to decline in the last week of June 2021, which also correlates with the decline in the total number of tweets.

The primary treatment of mucormycosis rests on the intravenous administration of the antifungal drug amphotericin B in its liposomal form.
A pan-India shortage of this drug was reported during the mucormycosis epidemic.[17] The shortage had three causes: hoarding, black marketing, and a skewed demand-supply ratio due to limited production of the drug.[16,18,19] The present study analyzed the total number of tweets mentioning injection amphotericin B. These mentions peaked during the third week of May 2021, which correlates with the highest number of newly recorded cases (>8800 cases pan-India); there was a rise of 2869 cases in just 4 days, from May 22 to May 26, 2021.[20]

Among the commonly used hashtags, “#mucormycosis” had the highest share (nearly 12%), followed by “#BlackFungus” and “#COVID19” (nearly 8% and 6%, respectively). Interestingly, #verified accounted for around 1% of the total tweet share, probably indicating confirmed and verified sources from which the injection could be obtained. The hashtag #WhiteFungus also contributed around 1% of the total tweet share, consistent with the reported cases of aspergillosis in addition to mucormycosis.

Gujarat accounted for the highest number of mucormycosis cases (2859) according to the national registry as of May 28, 2021,[21] and #Gujarat was one of the two most frequently mentioned states. The second most frequently mentioned state was #Delhi. Interestingly, Delhi had an acute shortage of the drug even after the center allocated the drug to the states.[20] By June 17, 2021, the center had allocated more than 7 lakh (700,000) vials of liposomal amphotericin B to all state and central institutions, which correlated with the decrease in injection-based demand tweets in the corresponding week.[21]

Among the top 20 users, the two Twitter handles with the highest number of tweets belonged to private users; the others included physicians and media handles.
One Twitter handle, @TechSparx, belonged to a data scientist who created a separate blog plotting various statistics and state-wise data related to mucormycosis (governmentstates.com/mucormycosis/index.html). Interestingly, four of the top 20 users had joined Twitter during April–May 2021, during the mucormycosis epidemic in the country. Most of the tweets were demand-based tweets.

It is interesting to note that mentions of government Twitter handles such as @PMOIndia, @narendramodi, @MoHFW_INDIA, @drharshvardhan, and @CMODelhi individually accounted for less than 2% of the mentions, despite CAM being declared an epidemic in most states. People across the nation used Twitter to voice their concerns, but the response from government authorities was very limited. Sentiment analyses such as ours can highlight the need for government authorities to have digital and social media policies for addressing the concerns of the general population during health emergencies such as COVID-19 and CAM.

As per the frequency plot of the sentiment-based tweets, positive sentiment tweets had a larger share than negative sentiment tweets, with weekly variations. The first week of the analysis period showed a higher number of negative sentiment tweets, followed by a rising share of positive sentiment tweets in the following weeks. Overall, the tweets were predominantly positive, suggesting public confidence despite a public health emergency.
However, during the week with the highest number of tweets (May 17–23, 2021) and the highest number of clinically reported cases, negative sentiment tweets had a higher share than positive sentiment tweets.

People may have expressed an overall positive (or negative) sentiment toward the issue through their tweets and retweets, but at the same time they did not hesitate to retweet the tweets of users with an overall negative (or positive) sentiment. This indicates that, because the topic is associated with health rather than politics or another controversial matter, people were united in expressing their thoughts and beliefs on Twitter [Supplementary Fig. 3].

Demand-based tweets formed a substantial part of the overall tweets during the study period. The peak share of demand-based tweets was observed over 4 weeks. Among the demand-based tweets, injection-related tweets had the highest share (40,000 tweets) in the peak week of May 17 to May 23, 2021. Injection demand-based tweets fell to nearly zero by the last week of June 2021, around the time the number of newly reported CAM cases dropped.

Preethi and Saroja also performed sentiment analysis for mucormycosis, but their study covered May 1 to June 15, 2021, whereas ours covered a longer period, from May 3 to August 29, 2021.[22]
Conclusion
Sentiment analysis of social media data from a popular platform such as Twitter gives an overview of people’s perceptions of black fungus. Through the trends in demand-based tweets, this study can provide insights that help the government and policymakers take appropriate actions where they are needed to control the black fungus and COVID-19 outbreaks.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Total share of positive and negative sentiment tweets
(a) Word cloud of the frequently used words. (b) Positive sentiment keyword count (c) Negative sentiment keyword count
Twitter profile characteristics of positive and negative sentiment tweets
Retweet network with nodes labelled based on their overall sentiments