
Analyzing the public sentiment on COVID-19 vaccination in social media: Bangladesh context.

Md Sabab Zulfiker, Nasrin Kabir, Al Amin Biswas, Sunjare Zulfiker, Mohammad Shorif Uddin

Abstract

Since December 2019, the world has been fighting the COVID-19 pandemic. The pandemic has revealed a bitter truth: although technology has advanced to unprecedented heights in the last few decades, medical science and health care still lag far behind. Several institutes and research organizations have stepped up to introduce vaccines to combat the pandemic. The Bangladesh government has also taken steps to provide widespread vaccinations since January 2021. Bangladeshi netizens frequently share their thoughts, emotions, and experiences about the COVID-19 vaccines and the vaccination process on social media sites such as Facebook and Twitter. This study analyzes the views and opinions they have expressed on these platforms about the vaccines and the ongoing vaccination program. To this end, the reactions of Bangladeshi netizens on social media were collected. The Latent Dirichlet Allocation (LDA) model was used to extract the most common topics expressed by the netizens regarding the vaccines and the vaccination process in Bangladesh. Finally, several deep learning and traditional machine learning algorithms were applied to identify the sentiment polarity of the netizens' opinions. The performance of these models was assessed using accuracy, precision, sensitivity, specificity, and F1-score to identify the best one. Lessons from this sentiment analysis can help the government prepare for future pandemics.
© 2022 The Authors.


Keywords:  COVID-19; Deep learning; Public sentiment; Social media; Traditional machine learning; Vaccination

Year:  2022        PMID: 35722449      PMCID: PMC9188682          DOI: 10.1016/j.array.2022.100204

Source DB:  PubMed          Journal:  Array (N Y)        ISSN: 2590-0056


Introduction

The World Health Organization (WHO) declared COVID-19 a pandemic in March 2020 [1]. COVID-19 has had substantial ramifications for people's everyday lives all across the world. According to reports, COVID-19 had affected over 380 million people, with over 5.68 million deaths, as of January 2022 [2]. Bangladesh is one of the world's most densely populated countries and, because of its large population, one of the worst victims of this pandemic. In Bangladesh, there were more than 1.80 million confirmed cases of COVID-19 by January 2022, with over 28,000 deaths [3]. Various health precautions, such as keeping social distance, wearing masks, and maintaining appropriate sanitization, are necessary to contain this pandemic. However, taking adequate health precautions alone is not enough to end it. Vaccination is the only method to stop the spread of this disease. Countries like the USA, China, and Russia have come forward to produce vaccines. The developed countries are performing mass vaccination. The government of Bangladesh also took steps to begin vaccinations from early 2021. By January 2022, more than 160 million doses of vaccines had been administered, and more than 62 million persons had been fully vaccinated [3]. The number of individuals who use social media has increased in tandem with the number of people using the internet. Different social media platforms have emerged as the primary means for internet users to express their emotions and reactions in recent years [4]. More than 2 billion people regularly share their daily activities through social media platforms [5]. According to a study, there are currently 3.50 billion internet users worldwide. One in every three persons in the world uses different social media sites. Some of the most popular platforms are Facebook, Twitter, and YouTube.
As of 2019, there were around 2.40 billion Facebook users, 330 million Twitter users, and over 1.90 billion YouTube users [6]. Due to the simplicity of these social media sites, users rely heavily on them for sharing news and information. These platforms have also become prime sources of health-related information. In the current COVID-19 situation, people are widely sharing and discussing their views on the pandemic, the vaccines, and the vaccination process. Users post both positive and negative news and opinions. Sometimes negative and fake information about the vaccines and the vaccination process creates vaccine hesitancy, leaving individuals unwilling to get vaccinated. In 2019, vaccine hesitancy was listed as one of the ten most significant threats to global health [7]. Fake and misleading information regarding the vaccination procedure can lower a country's vaccination rate. With the beginning of the COVID-19 vaccination program in Bangladesh, Bangladeshi netizens are also using social networking platforms to express their views and reactions. They have shown mixed reactions to the ongoing vaccination program. However, very few studies have analyzed the opinions and feelings of Bangladeshi netizens about the COVID-19 vaccination program. This work attempts to fill this gap. Based on different demographic attributes, this study looks at how social media users feel about the COVID-19 vaccines and the ongoing vaccination campaigns in Bangladesh. It also aims to extract the most essential themes expressed by Bangladeshi netizens regarding these campaigns. Finally, this research explores several machine learning and deep learning approaches for assessing the polarity of social media users' views about Bangladesh's vaccination program and selects the model that most accurately identifies the polarity of users' sentiments.
By utilizing the methodologies and approaches described in this study, the Bangladesh government and policymakers can become aware of the actual opinions and sentiments of Bangladeshi residents towards the ongoing vaccination campaigns. They can learn about the most concerning issues raised by citizens about these campaigns. Based on this information, the government may take appropriate actions to ensure that the people of Bangladesh receive adequate vaccination services. The government can also become aware of common misconceptions among Bangladeshi citizens about the vaccine and, by analyzing them, take suitable initiatives to raise awareness against these fallacies. Besides, lessons from these opinions can help the government prepare for future pandemics. The key contributions of this study are as follows: (i) constructing a corpus of Bangladeshi public opinion regarding the COVID-19 vaccines and the vaccination campaigns; (ii) analyzing the views of Bangladeshi citizens regarding the vaccines and the vaccination campaigns; and (iii) exploring various deep learning and machine learning approaches to predict the Bangladeshi population's attitudes towards the vaccine and vaccination campaigns. The remainder of the paper is arranged as follows: Section 2 reviews related work. Section 3 delineates the methodology for analyzing the opinions of Bangladeshi netizens regarding the COVID-19 vaccine and the ongoing vaccination program. Section 4 presents the results of the study. Finally, Section 5 concludes the paper by outlining some potential future directions.

Related works

COVID-19 has undoubtedly posed an extreme threat to human lives. This threat also sparked the rapid development of vaccines. Misinformation communicated through online social media, on the other hand, frequently contributes to negative vaccine sentiments and hesitation. Understanding how people feel about the vaccine and the vaccination process as expressed on social media can help with disease monitoring, control, and eventual eradication. The threat of infectious disease significantly affects how people perceive and act in a variety of ways, which makes this a complicated issue. In this section, we review a variety of high-quality related studies to understand the findings and limitations of each one. Naseem et al. [1] used a large dataset of ninety thousand COVID-19 related tweets to examine public attitudes towards the pandemic. The study aimed to discover the most popular issues discussed by Twitter users about the COVID-19 epidemic. The LDA model was used to perform the topic modelling. Different machine learning and deep learning-based models were employed for predicting the polarity of the public sentiments regarding the pandemic. According to the study, the Bidirectional Encoder Representations from Transformers (BERT) model showed the best performance in classifying the public sentiments. Chakraborty et al. [8] also analyzed the polarity of public opinions regarding the COVID-19 pandemic. According to this study, although individuals were tweeting mostly positive statements about the epidemic, the majority of the retweets represented negative feelings. The authors proposed a deep learning-based model to identify the polarity of the public sentiments regarding the epidemic and achieved an accuracy of 81%. They also proposed a fuzzy model based on the Gaussian membership function to determine the polarity of public opinions, achieving an accuracy of 79%.
Shim et al. [9] collected the public opinion regarding the COVID-19 vaccine in Korea. This study utilized LDA to retrieve the most prevalent topics discussed by the Korean population about the COVID-19 vaccine. For identifying the polarity of the public attitudes towards the vaccine, this study proposed a Bi-Directional LSTM (Bi LSTM) model. This study found that initially, the ratio of positive and negative tweets about the vaccine was almost the same. However, there was a rise in the number of negative tweets about the vaccine with the increase in the number of COVID-19 affected cases. In the negative tweets, the Twitter users expressed their fear and disappointment about the vaccine. The reactions of the Filipino citizens to the Philippine government's COVID-19 immunization efforts were studied by Villavicencio et al. [10]. The study classified the polarity of sentiments into three categories, namely positive, negative, and neutral. This study employed the Naive Bayes classifier to identify the polarity of the opinions and achieved an accuracy of 81.77%. The Naive Bayes classifier was also used by Pristiyono et al. [11] to analyze the sentiments of the Indonesian citizens regarding the COVID-19 vaccines. According to this study, the tweets regarding the vaccine were mostly dominated by negative sentiments. Lyu et al. [12] conducted a study for identifying sentiments and topics in COVID-19 vaccine-related social media discussions. This study also tried to detect the shifts in the topics and the emotions over time about the vaccine. They used LDA for topic modelling as well as sentiment and emotion analysis on a total of 1,499,421 tweets from 583,499 different persons. Finally, they discovered that topic modelling of vaccine-related tweets produced 16 topics that were arranged into five main themes. However, due to the period of their data set, they did not investigate sentiments against individual vaccine brands. 
Machine learning (ML) algorithms were utilized by Kwok et al. [13] for extracting the sentiments and topics regarding COVID-19 immunization on Twitter. They examined tweets by displaying high-frequency word clouds and relationships between word tokens after collecting 31,100 English tweets. They constructed an LDA topic model for identifying the common topics in the enormous numbers of tweets. Three topics were discovered as a result of their investigation. Only one-third of all tweets showed negative thoughts regarding the COVID-19 vaccine, with nearly two-thirds expressing positivity. The two most prominent positive sentiments among the eight basic emotions discovered in this study were trust and anticipation. Alam et al. [14] conducted a study to identify and untangle people's diverse sentiments about vaccination using deep learning techniques. The attitudes of people towards the vaccines of various kinds were examined in this study using natural language processing (NLP) techniques and they visualized the scenario by grouping the polarities of the received feelings into three categories, i.e. positive, negative, and neutral. The performance of the predictive models was evaluated using a Recurrent Neural Network (RNN) including Long Short-Term Memory (LSTM) and Bi-Directional LSTM (Bi LSTM). Here, LSTM and Bi LSTM obtained an accuracy of 90.59% and 90.83%, respectively. Hayawi et al. [15] presented a unique COVID-19 vaccination misinformation detection framework based on machine learning (ML). They used ML techniques to classify vaccination misinformation after collecting and annotating COVID-19 vaccine tweets. They categorized more than 15,000 tweets into two categories using credible sources and taking the help of specialists. This research work used Extreme Gradient Boosting (XGBoost), LSTM, and the BERT transformer model for classification. Among these models, the BERT model showed the best performance. 
They concluded that machine learning-based algorithms are successful at detecting COVID-19 vaccine misinformation. Basiri et al. [16] proposed a technique for analyzing the sentiment of coronavirus-related tweets from eight countries by combining different deep learning and traditional machine learning models. They also used Google Trends to examine coronavirus-related queries to better understand how sentiment changed over time in different locations. According to their results, the coronavirus drew people's attention in various countries at different periods and with variable degrees of intensity. Additionally, the tweets reflected people's attitudes towards events and news in their countries, such as the number of new cases of infection, fatalities, and recoveries. Furthermore, a common sentiment pattern was observed in numerous countries throughout the transmission of the infection. Nezhad and Deihimi [17] presented a study that analyzed Iranian people's views on COVID-19 vaccination using Persian tweets. To begin, they extracted 803,278 Persian tweets from Twitter using various keywords to determine their sentiments. For analyzing the sentiments of the tweets, they utilized a deep learning model based on a CNN-LSTM architecture. Finally, they discovered a slight variation in the number of positive opinions towards domestic and imported vaccines. The study stated that the imported vaccines had the majority of positive sentiments. Melton et al. [18] performed sentiment analysis and LDA topic modelling on textual data obtained from 13 Reddit communities concentrating on the COVID-19 vaccination. To discover changes in sentiment and to identify the latent topics of the opinions of the community members, data was aggregated and examined by month. According to polarity analysis, these communities expressed more positive than negative sentiments towards vaccine-related topics.
Topic modelling demonstrated that community members were more concerned with adverse effects than bizarre conspiracy theories. Throughout the LDA topic modelling, keywords indicating vaccine hesitancy were discovered. Yang et al. [19] investigated the misconceptions and misinformation about the COVID-19 pandemic. They categorized the misconceptions regarding COVID-19 into five groups. They found that the majority of COVID-19 misconceptions addressed the spreading of the virus. Moreover, they also found that the misconceptions regarding the propagation of the virus and preventive methods spread more quickly than the other categories. They employed a deep neural network-based model to identify the emotions of the tweets propagating misconceptions. They noticed that fear was the most prevalent emotion in these tweets. Zhou et al. [20] explored the dynamics of depression in the tweets of the citizens of an Australian state during the COVID-19 pandemic. According to this study, the epidemic increased the citizens' depression levels. For analyzing depression among the inhabitants of that state, they extracted multimodal and term frequency-inverse document frequency (TF-IDF) features from their tweets. To identify depression in the tweets, they utilized different state-of-the-art machine learning algorithms. Zhou et al. [21] also investigated the dynamics of public opinions toward the COVID-19 outbreak. By analyzing the tweets of the residents of Australia's New South Wales state, they discovered that the pandemic decreased the overall positive polarity of the sentiments. Furthermore, they investigated the population's sentiment patterns in response to various social events and governmental decisions. They employed the Valence Aware Dictionary for Sentiment Reasoning (VADER) model to analyze the polarity of the sentiments. Yin et al. [22] conducted a study to analyze the public attitudes toward the COVID-19 vaccine.
In addition, they identified the most common and discussed vaccine-related topics. They used LDA for topic modelling and the VADER model to determine the polarity of the feelings. According to the findings of this study, the majority of the netizens showed a willingness to be vaccinated. This study also showed that the netizens had unfavourable feelings regarding the news of adverse effects of the vaccines, shortfalls of the vaccines, and fatalities following taking the vaccines. Yin et al. [23] analyzed the dynamics of topics and sentiments on social networking sites. They found that tweets associated with home quarantine conveyed mostly positive views, and tweets discussing the mortalities due to the pandemic mostly expressed negative sentiments. Based on the preceding discussions, it can be stated that no research work has been conducted yet to analyze the opinions of Bangladeshi netizens regarding the vaccine and the vaccination process using machine learning, deep learning and natural language processing techniques. This research has paved the way to fill this void. This study has utilized different deep learning and traditional machine learning techniques to identify the polarity of the opinions of the Bangladeshi netizens. Furthermore, this study has extracted the latent topics discussed by the Bangladeshis in social media regarding the COVID-19 vaccine and the vaccination campaigns.

Methodology

The following subsections cover the overall strategy for extracting the most common topics discussed by the Bangladeshi netizens and identifying the polarity of their sentiments towards the COVID-19 vaccine and the vaccination program using a variety of deep learning and traditional machine learning algorithms.

Data collection

To conduct this study, we constructed a dataset of 1075 statuses, comments, and tweets posted by Bangladeshi netizens. We collected the data between June 2020 and July 2021. Only opinions written in English were considered. Fig. 1 shows the monthly percentage of the gathered opinions from June 2020 to July 2021. The trend in Fig. 1 suggests that people were reluctant to express their sentiments and emotions regarding the vaccines in 2020. From January 2021, however, the number of opinions regarding the vaccine and the vaccination process increased rapidly with the initiation of the vaccination program.
Fig. 1

Monthly percentage of the collected opinions.

Fig. 1 indicates that the largest share of opinions was collected in May 2021 (25.30% of the total) and the smallest in November 2020 (only 0.65%). In May 2021, China provided free vaccines as a gift to Bangladesh in several phases. Besides, a Bangladeshi pharmaceutical company took the initiative to import the Moderna vaccine into Bangladesh for the first time. The netizens frequently expressed their opinions regarding these events, causing a surge in the number of tweets, status updates, and comments posted on social media. Fig. 2 depicts the percentage of opinions acquired from organizational and personal accounts. It shows that 16.65% of the total opinions were collected from the accounts of organizations such as news portals and health organizations, and 83.35% were gathered from personal accounts. The organizational accounts included the pages of UNICEF Bangladesh, UNICEF South Asia, the Ministry of Foreign Affairs of Bangladesh, and various newspapers such as The Daily Star and Dhaka Tribune.
Fig. 2

Percentage of the collected opinions by account type.


Data labelling

To label the polarity of the statements, we employed seven volunteers. The volunteers labelled the sentiment of each statement in the dataset as either positive or negative. The majority rating of the volunteers was used to determine the final polarity of a statement. The inter-rater reliability of the labelling was estimated using Krippendorff's Alpha [24] and Fleiss' Kappa [25]. Table 1 shows that both reliability scores are above 0.80, so the agreement among the volunteers in labelling the statements can be considered strong.
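The labelling procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy ratings are hypothetical, and only Fleiss' Kappa is shown (Krippendorff's Alpha is omitted). With seven raters and two labels, a majority vote can never tie.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most raters (no ties with 7 raters, 2 labels)."""
    return Counter(votes).most_common(1)[0][0]

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical toy ratings: four statements, seven raters each
toy_ratings = [
    ["pos"] * 7,
    ["neg"] * 7,
    ["pos"] * 6 + ["neg"],
    ["neg"] * 5 + ["pos"] * 2,
]
final_labels = [majority_label(r) for r in toy_ratings]
kappa = fleiss_kappa(toy_ratings, ["pos", "neg"])
```

On real data, `toy_ratings` would be replaced by the seven volunteers' labels for each of the 1075 statements.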
Table 1

Inter-rater reliability test scores.

Test Name | Reliability Score | 95% CI Lower Bound | 95% CI Upper Bound
Krippendorff's Alpha Reliability Test | 0.8473 | 0.8405 | 0.8539
Fleiss' Kappa Reliability Test | 0.8470 | 0.8340 | 0.8600

After majority voting among the seven volunteers, 595 statements were identified as positive and 480 as negative. Fig. 3 shows the monthly percentage of positive and negative opinions.
Fig. 3

Monthly percentage of the polarity of the opinions.

Fig. 3 shows that the percentage of positive opinions is greater than the percentage of negative opinions in most months, the exceptions being June 2020, August 2020, November 2020, December 2020, and January 2021. The Bangladesh government started the vaccination program in the last week of January 2021, and the positive sentiments of Bangladeshi netizens towards the vaccine and the vaccination program increased after that. Fig. 4 depicts the sentiment dynamics in ten-day intervals. Before the vaccination campaign, there was a lot of skepticism and negative opinion concerning the vaccines. After January 2021, however, citizens overwhelmingly expressed positive opinions about the vaccines.
Fig. 4

Sentiment dynamics of every ten (10) days.

The number of COVID-19 cases in Bangladesh increased dramatically in March 2021. As a result, the percentage of negative sentiments rose after the first week of March 2021. There was a shortage of vaccines in mid-April 2021, and the netizens frequently expressed negative opinions during that period. The ratio of negative sentiments was also high from the end of June to the middle of July 2021, when vaccines were again scarce in the country. Fig. 5 (a) and Fig. 5 (b) show the 10-day moving average of the number of positive and negative opinions for 2021, respectively. In Fig. 5, only the opinions from January 2021 to July 2021 have been considered, as over 80% of the opinions in the collected dataset are from this period.
Fig. 5

Ten (10) day moving average of the positive and negative opinions.

Fig. 6 (a) and Fig. 6 (b) show the word clouds of the opinions with positive and negative sentiments, respectively.
Fig. 6

Word clouds.

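The 10-day moving average used for Fig. 5 can be illustrated with a short sketch. The source does not say whether the window is trailing or centered, so a trailing window is assumed here; the function name and the sample counts are hypothetical.

```python
from collections import deque

def moving_average(daily_counts, window=10):
    """Trailing moving average of a daily series.

    Assumption: for the first window-1 days, the average is taken over the
    days seen so far rather than being left undefined.
    """
    recent = deque(maxlen=window)
    total = 0
    averages = []
    for count in daily_counts:
        if len(recent) == window:
            total -= recent[0]      # value about to fall out of the window
        recent.append(count)
        total += count
        averages.append(total / len(recent))
    return averages

# Hypothetical daily counts of positive opinions
smoothed = moving_average([1, 2, 3, 4], window=2)
```

Applying the same function separately to the daily counts of positive and negative opinions would give the two smoothed series.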

Topic modelling

Topic modelling extracts the latent topics of a collection of text documents. It also groups the documents into the topics and themes that have been discovered. In this study, the Latent Dirichlet Allocation (LDA) model has been used to extract the most discussed topics among Bangladeshi citizens regarding the vaccine and the vaccination process. Blei et al. proposed the LDA model in 2003 [26]. LDA treats each document as a mixture of topics and each topic as a mixture of words. The probability of each word appearing in a topic is estimated. If a word has a high probability of appearing in a topic, documents containing that word are more strongly associated with that topic. This study extracted the top five dominant topics from the corpus of positive statements and the top five dominant topics from the corpus of negative statements of Bangladeshi netizens about the vaccine and the vaccination process. The number of topics was selected for each corpus based on the perplexity score. A lower perplexity score suggests a higher generalization ability. Perplexity scores were measured for two to five topics. Table 2 shows that the perplexity score was lowest with five topics for both corpora.
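To make the idea concrete, the sketch below fits LDA with collapsed Gibbs sampling and computes perplexity on a toy corpus. It is an illustration only: the source does not state which LDA implementation or inference method was used, the hyperparameters `alpha` and `beta`, the function names, and the toy documents are all assumptions, and the plain perplexity computed here is always at least 1, whereas the negative values in Table 2 suggest a log-scale score that this sketch does not reproduce.

```python
import math
import random

def fit_lda(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []
    for d, doc in enumerate(docs):                         # random initial topics
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
            for v in range(n_words)] for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return vocab, phi, theta

def perplexity(docs, vocab, phi, theta):
    """exp of the negative average per-token log-likelihood (lower is better)."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][word_id[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Hypothetical toy corpus of already-tokenized statements
toy_docs = [
    ["vaccine", "good", "thanks", "government"],
    ["got", "first", "dose", "thanks"],
    ["vaccine", "fever", "pain", "headache"],
    ["fever", "pain", "not", "good"],
]
vocab, phi, theta = fit_lda(toy_docs, n_topics=2)
toy_perplexity = perplexity(toy_docs, vocab, phi, theta)
```

Each row of `phi` is a topic's distribution over words, so sorting a row in descending order yields that topic's top words. Refitting with different values of `n_topics` and comparing the resulting scores mirrors the selection procedure described above.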
Table 2

Perplexity scores for different number of topics in the corpus of the positive and negative statements.

Number of Topics | Perplexity Score for Corpus of the Positive Statements | Perplexity Score for Corpus of the Negative Statements
2 | −6.04 | −6.11
3 | −6.10 | −6.21
4 | −6.20 | −6.30
5 | −6.26 | −6.36

Table 3 shows the top fifteen words of each of the top five topics that LDA extracted from the corpus of positive statements.
Table 3

Top five topics of the positive corpus.

Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5
vaccines | Vaccine | vaccine | no | vaccine
bangladesh | Bangladesh | good | side | bangladesh
vaccine | Vaccines | news | effects | dose
best | doses | us | everyone | great
minister | china | people | alhamdulillah | got
hope | good | bangladesh | took | vaccinated
bangavax | million | country | vaccine | vaccination
great | covid19 | biotech | get | pfizer
see | sinopharm | also | medical | second
taking | moderna | globe | parents | thank
us | government | government | bangladesh | doses
initiative | covid | thanks | fine | news
appreciate | us | allah | vaccinated | people
countries | pfizer | need | know | first
congratulations | covax | get | vaccination | soon

Some insights into the topics of Table 3 are as follows:
Topic 1: It represents the netizens' appreciation of the government's effort in allowing the manufacturing of the country's indigenous vaccine, Bangavax, to start.
Topic 2: When millions of Sinopharm, Moderna, and Pfizer vaccine doses arrived in Bangladesh for mass immunization, people were ecstatic.
Topic 3: When Globe Biotech announced that it had developed a COVID-19 vaccine, the netizens took it very positively.
Topic 4: People stated that they had no side effects and were all right after taking the vaccine.
Topic 5: After receiving their first or second dose, netizens expressed their gratitude to the authorities and the Bangladesh government.
Table 4 shows the top fifteen words of each of the top five topics that LDA extracted from the corpus of negative statements.
Table 4

Top five topics of the negative corpus.

Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5
vaccine | vaccines | vaccine | not | vaccine
people | bangladesh | not | vaccine | bangladesh
bangladesh | china | take | vaccines | vaccines
not | no | people | people | not
dose | vaccine | business | vaccination | get
india | price | dose | fever | money
first | get | indian | still | indian
corona | india | need | making | india
got | minister | india | price | government
urine | trial | taking | body | govt
cow | know | covid | u | us
astrazeneca | people | bangladesh | chinese | time
even | politics | no | pain | people
vaccination | sinopharm | wo | headache | countries
us | vaccinated | serum | countries | give

Some insights into the topics of Table 4 are as follows:
Topic 1: Even after taking the vaccines, many became infected with COVID-19. As a result, internet users raised doubts regarding the authenticity of the vaccines.
Topic 2: According to the netizens, different foreign countries were allegedly playing politics when it came to delivering vaccine doses to Bangladesh. They also claimed that foreign countries were demanding a high price for vaccines.
Topic 3: The netizens stated that different pharmaceutical companies and foreign countries were doing business with the vaccine. So, they encouraged others to avoid taking vaccines.
Topic 4: Netizens were discussing the side effects of the vaccines.
Topic 5: When the government failed to acquire vaccines despite making advance payments to other countries, netizens expressed their dissatisfaction. Moreover, they demanded that the government take back the money that was paid in advance.

Implementation procedures

This section describes the procedure for predicting the polarity of Bangladeshi netizens' remarks on the vaccine and the vaccination process. The entire procedure is depicted in Fig. 7.
Fig. 7

Implementation procedures for predicting the polarity of the statements.

The following subsections describe each step of the procedure.

Data cleaning

Facebook and Twitter users frequently use different emojis and emoticons in their statuses and tweets. As this research only focuses on text data, these emojis and emoticons have been removed from the collected dataset. Netizens also utilize hashtags to symbolize a variety of trends and topics. The hash characters of the hashtags are removed in this step. Often two or more words are concatenated in the hashtag words, which may affect the performance of the machine learning models. So, the hashtag words have been segmented in this step.
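A minimal sketch of the cleaning step described above, using only regular expressions. The emoji ranges, the emoticon pattern, and the camel-case rule for segmenting hashtags are assumptions made for illustration; the source does not specify how hashtags were segmented, and an all-lowercase hashtag would need a dictionary-based segmenter that this sketch does not provide.

```python
import re

# Assumed coverage: common emoji blocks, regional-indicator flags, and joiners
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U0001F1E6-\U0001F1FF\u2600-\u27BF\uFE0F\u200D]+"
)
# Assumed coverage: simple western emoticons such as :) ;-( =D :P
EMOTICON_RE = re.compile(r"(?<!\w)[:;=]-?[)(DPp/\\|\]\[](?!\w)")
HASHTAG_RE = re.compile(r"#(\w+)")

def segment_hashtag(tag):
    """Split a hashtag body on capitalization, digit, and underscore boundaries."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", tag)
    return " ".join(parts) if parts else tag

def clean_text(text):
    """Remove emojis and emoticons, drop '#', and segment hashtag words."""
    text = EMOJI_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = HASHTAG_RE.sub(lambda m: segment_hashtag(m.group(1)), text)
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical example statement
cleaned = clean_text("Got my first dose today \U0001F600 :) #CovidVaccine")
```

The look-behind and look-ahead in `EMOTICON_RE` keep the pattern from firing inside URLs such as `https://`, where the colon directly follows a letter.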

Tokenization

Tokenization is the technique of splitting a string into its individual words or terms. In this step, each statement of the input dataset has been tokenized.

Removal of punctuation marks and stop words

The presence of punctuation marks and stop words may cause the models to perform poorly. The stop words are the most prevalent words in a language. They seldom have a significant impact in determining the polarity of a statement. So, the punctuation marks and the unnecessary stop words have been removed after performing tokenization. As negation is represented by words like “not” and “no,” they were excluded from the list of stop words.
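The tokenization and filtering steps can be sketched as below. The stop-word list is a small hypothetical sample rather than the list used in the study, and lowercasing the text before tokenizing is an assumption; the point of the sketch is that the negation words are kept.

```python
import re
import string

NEGATIONS = {"not", "no"}
# Hypothetical, deliberately tiny stop-word list; negations are excluded from it
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "i", "my",
    "of", "to", "and", "in", "it", "this", "that", "not", "no",
} - NEGATIONS

def tokenize(text):
    """Split a statement into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def remove_noise(tokens):
    """Drop punctuation-only tokens and stop words, keeping negations."""
    return [
        t for t in tokens
        if t not in STOP_WORDS
        and not all(ch in string.punctuation for ch in t)
    ]

# Hypothetical example statement
filtered = remove_noise(tokenize("This vaccine is not safe, no!"))
```

Keeping "not" and "no" matters because "not safe" and "safe" would otherwise reduce to the same token sequence despite opposite polarity.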

Vectorization of the statements

In this step, we constructed a vocabulary from the most frequent words of the corpus. Vectorization is then performed by mapping each word of a statement to a particular integer id. A two-way lookup table has been used to convert the words of the statements to integer ids and vice versa.
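The vocabulary and two-way lookup table described above might look like the following sketch. The reserved `<pad>` and `<unk>` entries, the vocabulary size, and the function names are assumptions introduced for illustration; the source does not say how padding or out-of-vocabulary words were handled.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed reserved entries (ids 0 and 1)

def build_vocab(token_lists, max_words):
    """Keep the max_words most frequent tokens; ties are broken alphabetically."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_words]
    id_to_word = [PAD, UNK] + [word for word, _ in ranked]
    word_to_id = {word: i for i, word in enumerate(id_to_word)}
    return word_to_id, id_to_word

def encode(tokens, word_to_id):
    """Map tokens to integer ids; unknown tokens share the UNK id."""
    return [word_to_id.get(t, word_to_id[UNK]) for t in tokens]

def decode(ids, id_to_word):
    """Map integer ids back to tokens."""
    return [id_to_word[i] for i in ids]

# Hypothetical tokenized statements
corpus = [["vaccine", "good"], ["vaccine", "not", "good"], ["vaccine", "fever"]]
word_to_id, id_to_word = build_vocab(corpus, max_words=3)
ids = encode(["vaccine", "not", "good"], word_to_id)
```

Here `word_to_id` and `id_to_word` together form the two-way lookup: the dictionary converts words to ids and the list converts ids back to words.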

Data splitting

Here, the acquired dataset has been split into training and test data. 20% of the dataset has been used as the test set. The remaining 80% has been used for training the models.
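An 80/20 split of the 1075 statements can be sketched as follows. Shuffling before the split and the fixed seed are assumptions; the source does not state whether the split was random or stratified by polarity. The placeholder samples and labels are hypothetical.

```python
import random

def train_test_split(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle indices with a fixed seed, then hold out test_ratio of them."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

# Placeholder data with the dataset's size: 595 positive, 480 negative statements
statements = [f"statement {i}" for i in range(1075)]
polarities = [1] * 595 + [0] * 480
(train_x, train_y), (test_x, test_y) = train_test_split(statements, polarities)
```

With 1075 statements this yields 860 training and 215 test examples.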

Models for classification

Different deep learning models, namely Long Short-Term Memory (LSTM), Bi-Directional LSTM (Bi LSTM), 1D Convolutional Neural Network (1D-CNN), and Temporal Convolutional Network (TCN), and some traditional machine learning models, namely Decision Tree (DT), Gradient Boost (GB), and Support Vector Machine (SVM), have been used to classify the polarity of netizens' sentiments towards the vaccines and the vaccination campaigns. The architectures of the proposed deep learning models are portrayed in Fig. 8 (a)-(d).
Fig. 8

Architecture of the proposed deep learning models.

Each of the deep learning models has employed pre-trained word-embedding models. Both pre-trained word2vec and GloVe models have been employed in the embedding layer.

The proposed LSTM model consists of two LSTM layers. The first LSTM layer has 256 LSTM units, whereas the second LSTM layer has 128 LSTM units. Two dense layers with Rectified Linear Unit (ReLU) activation function have been placed after the LSTM layers. Each of these LSTM and dense layers is followed by a dropout layer with a dropout rate of 0.2. Finally, Batch Normalization has been applied. After that, the final dense layer has been added. It has used the Softmax activation function.

The proposed Bi LSTM model has an almost similar architecture to the proposed LSTM model. It also has two Bi LSTM layers. The first Bi LSTM layer consists of 100 units and the second Bi LSTM layer consists of 32 units.

In the suggested 1D-CNN model, there are two 1D-CNN layers. The first CNN layer has 256 filters and the second CNN layer has 128 filters. Both layers have a kernel size of three. Before the first CNN layer, there is a Spatial Dropout layer with a dropout rate of 0.3. A dropout layer follows the first CNN layer, and a Global Max Pooling layer follows the second CNN layer. After the Global Max Pooling layer, there is a dense layer with 512 hidden nodes, which is followed by another dropout layer. Following this dropout layer, there is another dense layer with 256 hidden nodes. This dense layer is also followed by another dropout layer. The dropout rates of all of these dropout layers are 0.3. The final dense layer utilizes the Softmax activation function and has 2 hidden nodes. Before the final dense layer, batch normalization has been applied.

The proposed TCN model consists of a single TCN layer with 256 filters. The kernel size of this layer is three. The TCN layer is preceded by a Spatial Dropout layer. After the TCN layer, there is a dropout layer. The model consists of four dense layers, including the final dense layer. The dense layers have 512, 256, 128, and 2 nodes, respectively. The first three dense layers are followed by dropout layers. The dropout rate of all of these dropout layers is 0.3. Before the final dense layer, batch normalization is applied. The final dense layer uses the Softmax activation function.

In these deep learning models, Binary Cross Entropy has been used as the loss function, and Adam has been used as the optimizer.

Performance evaluation and final decision

In this step, the proposed deep learning and machine learning models' performances have been assessed using several performance metrics like accuracy, precision, specificity, sensitivity, and F1-score. Based on these performance metrics, the best model for classifying the sentiments of the netizens has been chosen.

Results and discussion

For analyzing the performance of the proposed models, 215 statements of the netizens have been used as the test dataset. Among these statements, 56.74% have a positive polarity and the remaining 43.26% have a negative polarity. Different performance metrics such as accuracy, precision, sensitivity, specificity, and F1-score have been computed for assessing the performance of the suggested models using the following formulas.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

Here,

TP (True Positive Polarity): When a proposed model identifies the polarity of a statement as positive and the original polarity of that statement is also positive, then the outcome is known as True Positive Polarity.

TN (True Negative Polarity): When a proposed model identifies the polarity of a statement as negative and the original polarity of that statement is also negative, then the outcome is known as True Negative Polarity.

FP (False Positive Polarity): When a proposed model identifies the polarity of a statement as positive but the original polarity of that statement is negative, then the outcome is known as False Positive Polarity.

FN (False Negative Polarity): When a proposed model identifies the polarity of a statement as negative but the original polarity of that statement is positive, then the outcome is known as False Negative Polarity.

Table 5 shows the confusion matrices of the proposed deep learning models for predicting the polarity of the sentiments of the netizens towards the vaccine and the vaccination process.
Table 5

Confusion matrices of the proposed deep learning models.

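| Model | Pre-trained Word Embedding | TP Polarity | TN Polarity | FP Polarity | FN Polarity |
|---|---|---|---|---|---|
| LSTM | word2vec | 112 | 73 | 20 | 10 |
| LSTM | GloVe | 108 | 75 | 18 | 14 |
| Bi LSTM | word2vec | 108 | 80 | 13 | 14 |
| Bi LSTM | GloVe | 113 | 71 | 22 | 9 |
| 1D-CNN | word2vec | 109 | 74 | 19 | 13 |
| 1D-CNN | GloVe | 114 | 67 | 26 | 8 |
| TCN | word2vec | 108 | 78 | 15 | 14 |
| TCN | GloVe | 105 | 79 | 14 | 17 |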
The confusion matrices of the proposed traditional machine learning models are portrayed in Table 6.
Table 6

Confusion matrices of the proposed traditional machine learning models.

| Model | TP Polarity | TN Polarity | FP Polarity | FN Polarity |
|---|---|---|---|---|
| DT | 75 | 37 | 56 | 47 |
| GB | 89 | 41 | 52 | 33 |
| SVM | 97 | 40 | 53 | 25 |
The measured accuracy, precision, sensitivity, specificity, and F1-score of the proposed deep learning and traditional machine learning models are shown in Table 7 and Table 8, respectively.
Table 7

Performance metrics of the proposed deep learning models.

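| Model | Pre-trained Word Embedding | Accuracy | Precision | Sensitivity | Specificity | F1-score |
|---|---|---|---|---|---|---|
| LSTM | word2vec | 86.05% | 84.85% | 91.80% | 78.49% | 88.19% |
| LSTM | GloVe | 85.12% | 85.71% | 88.52% | 80.65% | 87.10% |
| Bi LSTM | word2vec | 87.44% | 89.26% | 88.52% | 86.02% | 88.89% |
| Bi LSTM | GloVe | 85.58% | 83.70% | 92.62% | 76.34% | 87.94% |
| 1D-CNN | word2vec | 85.12% | 85.16% | 89.34% | 79.57% | 87.20% |
| 1D-CNN | GloVe | 84.19% | 81.43% | 93.44% | 72.04% | 87.02% |
| TCN | word2vec | 86.51% | 87.80% | 88.52% | 83.87% | 88.16% |
| TCN | GloVe | 85.58% | 88.24% | 86.07% | 84.95% | 87.14% |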
Table 8

Performance metrics of the proposed traditional machine learning models.

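| Model | Accuracy | Precision | Sensitivity | Specificity | F1-score |
|---|---|---|---|---|---|
| DT | 52.09% | 57.25% | 61.48% | 39.78% | 59.29% |
| GB | 60.47% | 63.12% | 72.95% | 44.09% | 67.68% |
| SVM | 63.72% | 64.67% | 79.51% | 43.01% | 71.32% |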
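
The values in Table 7 and Table 8 can be recomputed from the confusion-matrix counts in Table 5 and Table 6. The sketch below assumes the standard definitions of the five metrics. The function name `polarity_metrics` is hypothetical, and the counts in the usage line are those reported for Bi LSTM with word2vec.

```python
def polarity_metrics(tp, tn, fp, fn):
    """Return the five metrics as percentages rounded to two decimals."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": round(100 * accuracy, 2),
        "precision": round(100 * precision, 2),
        "sensitivity": round(100 * sensitivity, 2),
        "specificity": round(100 * specificity, 2),
        "f1": round(100 * f1, 2),
    }


# Bi LSTM with word2vec: TP=108, TN=80, FP=13, FN=14
print(polarity_metrics(108, 80, 13, 14))
```

For these counts the function gives 87.44, 89.26, 88.52, 86.02, and 88.89, which matches the reported row for this model.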
From Table 7 and Table 8, it can be stated that the deep learning models have outperformed the traditional machine learning models to a great extent. Bi LSTM with the pre-trained word2vec embedding model has shown the highest accuracy of 87.44% among the deep learning models. In terms of accuracy, the word2vec embedding based deep learning models have shown better performances than the GloVe embedding based deep learning models. The accuracies of LSTM, 1D-CNN and TCN models with word2vec embedding are 86.05%, 85.12%, and 86.51%, respectively. On the other hand, the accuracies of LSTM, Bi LSTM, 1D-CNN and TCN models with GloVe embedding are 85.12%, 85.58%, 84.19%, and 85.58%, respectively.

Precision, sensitivity, specificity, and F1-score play essential roles in evaluating the performance of machine learning or deep learning models, in addition to accuracy. Bi LSTM with the pre-trained word2vec embedding model has shown the best precision of 89.26%. This model has also shown the highest specificity and F1-score of 86.02% and 88.89%, respectively. In terms of sensitivity, the GloVe embedding based Bi LSTM and 1D-CNN models have performed better than their word2vec counterparts, whereas the reverse holds for LSTM and TCN. 1D-CNN with the pre-trained GloVe embedding model has achieved the highest sensitivity of 93.44%, followed by the GloVe embedding based Bi LSTM with a sensitivity of 92.62%.

SVM showed the highest accuracy of 63.72% among the traditional machine learning algorithms. The achieved accuracies of DT and GB are 52.09% and 60.47%, respectively. DT, GB, and SVM attained precisions of 57.25%, 63.12%, and 64.67%, respectively. In terms of sensitivity, SVM outperformed DT and GB. It attained a sensitivity of 79.51%, while the attained sensitivities of DT and GB are 61.48% and 72.95%, respectively. The traditional machine learning models also showed poor performance in the case of other performance metrics like specificity and F1-score.

The Receiver Operating Characteristic (ROC) curves with Area Under Curve (AUC) values of the deep learning models are shown in Fig. 9 (a)-(d). When a model's AUC value is equal to 1, the model is regarded as a perfect classifier. If the AUC value of a model is equal to 0.5, then the model provides random outcomes, which means that it is unable to differentiate between the classes. The AUC values of all the proposed deep learning models are more than 0.9. So it can be stated that all of the suggested deep learning models discriminate well between the positive and negative classes.
Fig. 9

ROC curves for deep learning models.

The ROC curve of a classifier is a two-dimensional graph that plots the true-positive rate of that classifier against its false-positive rate at various threshold levels. The better the performance of a classifier, the closer its ROC curve is to the top left corner of the graph. Fig. 9 (a)-(d) shows that the ROC curves of LSTM, Bi LSTM, and TCN with the pre-trained word2vec embedding are closer to the upper left corner of the graph than the ROC curves of these models with the pre-trained GloVe embedding. Only in the case of the 1D-CNN model is the ROC curve with the pre-trained GloVe embedding closer to the top left corner than the one with the pre-trained word2vec embedding. Fig. 10 shows the ROC curves as well as the AUC scores of the traditional machine learning models. The AUC values of DT, GB, and SVM are 0.506, 0.646, and 0.639, respectively. These values are far below those of the deep learning models, and the AUC of DT is close to the 0.5 of a random classifier. So, the traditional machine learning algorithms have shown poor performance in terms of the different performance metrics.
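
How an ROC curve and its AUC are obtained from predicted scores can be sketched as below. This is a generic pure-Python illustration with made-up labels and scores, not the data or the tooling of this study. The function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) points for all thresholds.

    `labels` holds 1 for positive and 0 for negative polarity; `scores` holds
    the predicted probability of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold: admit every example that shares this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up example: three positive and three negative statements.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

A classifier that ranks every positive statement above every negative one yields an AUC of 1, while one that assigns the same score to all statements yields 0.5.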
Fig. 10

ROC curves for traditional machine learning models.

By analyzing the performance of different deep learning and traditional machine learning algorithms, it can be concluded that the word2vec embedding based Bi LSTM model has surpassed the other models based on different performance metrics.

Conclusion and future works

Public sentiments towards a vaccine play a crucial role in alleviating the adverse effects of an epidemic. They also help to identify whether the vaccine is working effectively or not. This study has analyzed the Bangladeshi citizens' opinions towards the COVID-19 vaccines and the ongoing vaccination campaign. To do so, the opinions of the Bangladeshi netizens have been collected from different social media sites. This study considered the opinions expressed by the netizens between June 2020 and July 2021. The polarity of the statements of the netizens was labelled with the help of seven volunteers. To validate the reliability of the volunteers' polarity labelling, inter-rater reliability tests, namely Fleiss' Kappa and Krippendorff's Alpha, were performed. By analyzing the polarity of opinions, it can be stated that the majority of the statements were positive regarding the vaccine after the vaccination campaign was started.

Topic modelling using the LDA model was performed to extract the latent topics of the positive and negative statements of the netizens. In the positive opinions, the netizens expressed their gratitude towards the government for taking measures to manufacture the country's first indigenous vaccine. They also appreciated the governmental steps for mass immunization programs. In most of the negative opinions, netizens raised their worries about the originality of the vaccines. They also expressed their dissatisfaction with the authority's inability to acquire sufficient vaccines from foreign countries amid the vaccination campaign.

Different deep learning and traditional machine learning algorithms were employed in this study to predict the polarity of the statements of the netizens. The Bi LSTM model with the pre-trained word2vec embedding has outscored the other models in terms of different performance metrics for predicting the polarity of the sentiments. The obtained accuracy of this model is 87.44%.

This work has only considered the opinions of the netizens expressed in the English language. In future, opinions in the Bengali language can also be considered. This study has not utilized transformer-based language models like BERT, XLNET, and ALBERT. A comparative study between the performance of the models used in this work and the performance of the state-of-the-art transformer-based models might be conducted in future.

Credit author statement

Md. Sabab Zulfiker: Methodology, Software, Formal analysis, Writing - Original Draft. Nasrin Kabir: Data Curation, Visualization, Formal analysis, Writing - Original Draft. Al Amin Biswas: Conceptualization, Formal analysis, Investigation, Writing - Original Draft. Sunjare Zulfiker: Data Curation, Writing - Original Draft. Mohammad Shorif Uddin: Supervision, Writing - Review & Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.