Wajdi Aljedaani, Ibrahem Abuhaimed, Furqan Rustam, Mohamed Wiem Mkaouer, Ali Ouni, Ilyes Jenhani.
Abstract
Introduction: The development of COVID-19 vaccines has been a great relief in many countries affected by the pandemic, and many governments have made significant efforts to purchase vaccines and administer them to their populations. However, vaccine rollout is typically met with reluctance and fear. Like any other important event, COVID-19 vaccines have attracted discussion on social media, which in turn has shaped people's opinions about vaccination.

Objective: The goal of this study is twofold. First, it conducts a sentiment analysis around COVID-19 vaccines by automatically analyzing Arabic users' tweets. The analysis is spread over time to better capture changes in vaccine perception. It provides insights into the most popular and accepted vaccine(s) in the Arab countries, as well as the reasons behind people's reluctance to take the vaccine. Second, it develops models to detect vaccine-related tweets, to help gather all information related to people's perception of the virus and, potentially, to detect vaccine-related tweets that are not tagged with the virus's main hashtags.
Keywords: COVID-19 Vaccine; COVID-19 pandemic; Deep learning; Machine learning; NLP; Sentiment Analysis; Vaccine uptake
Year: 2022 PMID: 36090696 PMCID: PMC9441136 DOI: 10.1007/s13278-022-00946-0
Source DB: PubMed Journal: Soc Netw Anal Min
Summary of the systematic analysis studies in related work
| Study | Year | Purpose | Sentiment analysis | Approach | Dataset size | Dataset duration | Dataset availability | Dataset language | Feature extraction | Techniques | Study location |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Adamu et al. | 2020 | Analysing public sentiment regarding Covid-19 vaccines | Manual annotation | ML | 5,200 Tweets | N/A | No | English | TF-IDF | SVM, KNN | N/A |
| Elhishi et al. | 2021 | Understanding the public attitudes towards COVID19 | Amazon Comprehend | ML | 1M Tweets | Mar20–Jan21 | No | English | TF-IDF | K-means clustering | N/A |
| Kumaresh | 2021 | Analysing public sentiment regarding Covid-19 vaccines | VADER | ML | 1200 Tweets | N/A | No | English | TF-IDF | Naïve Bayes, LR | India |
| Liu et al. | 2021 | Understanding public perceptions of COVID-19 vaccines | Manual annotation | ML, DL | 5000 Tweets | Nov20–Jan21 | No | English | TF-IDF | BERT, LR, RF, SVM | N/A |
| Aygün et al. | 2021 | Analyzing public sentiments related to COVID-19 vaccines | Aspect-Based | DL | 928,402 Tweets | Nov20–Mar21 | No | English, Turkish | TF-IDF, Word2Vec | BERT | 8 Countries |
| Yang and Sornlertlamvanich | 2021 | Investigating Public Perception of COVID-19 Vaccine | TextBlob | ML | 190,000 Tweets | Dec20–Jun21 | No | English, Japanese | BoW | Naïve Bayes | Japan, USA, UK |
| Jayasurya et al. | 2021 | Analyzing of Public Sentiment on COVID-19 Vaccination | VADER | ML | 431,986 Tweets | Feb20–Apr21 | Yes | English | TF-IDF | 14 Models | N/A |
| Cotfas et al. | 2021 | Analyzing the opinions regarding COVID-19 vaccination | Manual annotation | ML | 752,951 Tweets | Nov20–Dec20 | No | English | BoW | MNB, RF, SVM, Bi-LSTM, CNN, BERT | UK |
| Rahul et al. | 2021 | Analysing Public Sentiments Regarding COVID-19 Vaccines | TextBlob | ML | 13,109 Tweets | Jan20–Jun21 | No | English | TF-IDF | SVM, RF, AdaBoost, MLP | N/A |
| Reshi et al. | 2022 | Analyzing the global perceptions about COVID-19 vaccination | TextBlob, VADER, AFINN | ML, DL | 20,000 Tweets | Mar20–Apr20 | No | English | TF-IDF | DT, RF, LR, LSTM, GRU, RNN, CNN, CNN-LSTM, LSTM-GRNN | Global |
| Mishra et al. | 2022 | Compared the sentiments of before and after the second wave | Manual annotation | ML | 48,913 Tweets | Feb21–Apr21 | No | English | TF-IDF, BoW | LR, DT, KNN, RF, GB | India |
| Nezhad and Deihimi | 2022 | Investigating Iranian views toward COVID-19-vaccination | Persian Analyser | ML | 80,3278 Tweets | Apr21–Sep21 | No | Persian | Word2Vec | CNN-LSTM | Iran |
| Zulfiker et al. | 2022 | Analyzing the views and opinions about the vaccines | Manual annotation | ML, DL | 1,075 Tweets | Jun20–Jul21 | No | English | Word2Vec | LSTM, Bi-LSTM, 1D-CNN, DT, GB, SVM | Bangladesh |
| Nuser et al. | 2022 | Analyzing user sentiment towards the COVID-19 vaccine | Manual annotation | DL | 13,190 Tweets | Jan21–Feb21 | No | English | N/A | CNN-LSTM, CNN, LSTM | N/A |
| Paliwal et al. | 2022 | Analysing the sentiments of the people towards the emergency | VADER, TextBlob | ML | 7,640 Tweets | Jan21 | No | English | TF-IDF | RF | India |
| Akpatsa et al. | 2022 | Examining Covid-19 vaccine-related Twitter-emerged sentiments | Manual annotation | ML | 15,239 Tweets | Jan21 | No | English | TF-IDF | LR, RF, SVM, NB | Global |
| This work | 2022 | Analyzing the sentiment of Arabic tweets about COVID-19 vaccine | TextBlob, Mazajak, CAMeL | ML, DL | 1,098,376 Tweets | Jan21–Apr21 | Yes | Arabic | TF-IDF | LR, RF, ETC, GNB, GBM, LSTM, CNN, MLP | Global |
Statistics of vaccination types per tweets
| No. | Keyword | English translation | Phase 1: tweets collected | Phase 2: after removing links | Phase 3: after removing duplicates | Phase 4: after filtering on 1K frequent words | Final dataset |
|---|---|---|---|---|---|---|---|
| 01 | | Vaccination | 594,739 | 594,739 | 506,487 | 506,327 | 506,327 |
| 02 | | Vaccination_refusal | 250 | 250 | 161 | 160 | 160 |
| 03 | | Vaccine | 443,870 | 443,795 | 115,628 | 114,529 | 114,529 |
| 04 | | Vaccine_reject | 604 | 604 | 273 | 273 | 273 |
| 05 | | Vaccination_conspiracy | 2,937 | 2,937 | 2,698 | 2,698 | 2,698 |
| 06 | | Vaccination_fool | 833 | 833 | 761 | 760 | 760 |
| 07 | | Sputnik | 45,101 | 45,101 | 36,805 | 36,759 | 36,759 |
| 08 | | Pfizer | 193,926 | 193,926 | 140,047 | 90,770 | 90,770 |
| 09 | | Vaccination_not_safe | 180 | 180 | 180 | 180 | 180 |
| 10 | | Russian_vaccine | 13,027 | 13,027 | 242 | 242 | 242 |
| 11 | | Chinese_vaccine | 8,355 | 8,355 | 114 | 114 | 114 |
| 12 | | Vaccinations | 85,662 | 85,662 | 62,194 | 62,183 | 62,183 |
| 13 | | Corona_doses | 21,493 | 21,493 | 16,042 | 16,037 | 16,037 |
| 14 | | COVID_Vaccinations | 3,833 | 3,833 | 1,941 | 1,941 | 1,941 |
| 15 | | Corona_dose | 28,292 | 28,292 | 15,054 | 15,054 | 15,054 |
| 16 | | Sinopharm | 10,297 | 10,297 | 6,860 | 5,341 | 5,341 |
| 17 | | NO_vaccinations | 1,776 | 1,776 | 1,173 | 1,172 | 1,172 |
| 18 | | No_vaccines | 352 | 352 | 188 | 188 | 188 |
| 19 | | Corona_vaccine | 7,233 | 7,233 | 4,855 | 4,839 | 4,839 |
| 20 | | Corona_vaccines | 207,313 | 207,313 | 98,071 | 98,032 | 98,032 |
| 21 | | COVID_vaccines | 27,841 | 27,841 | 8,933 | 8,930 | 8,930 |
| 22 | | Oxford_vaccine | 13,905 | 13,905 | 6 | 6 | 6 |
| 23 | | Johnson & Johnson_vaccine | 4,057 | 4,057 | 9 | 9 | 9 |
| 24 | | Pfizer_vaccine | 37,259 | 37,259 | 57 | 57 | 57 |
| 25 | | COVID_19_vaccine | 25,608 | 25,608 | 0 | 0 | 0 |
| 26 | | Moderna_vaccine | 2,017 | 2,017 | 7 | 7 | 7 |
| 27 | | Corona_vaccinations | 34,196 | 34,196 | 20,070 | 20,069 | 20,069 |
| 28 | | Not_taking_vaccines | 683 | 683 | 27 | 27 | 27 |
| 29 | | Moderna_vaccine | 7,835 | 7,835 | 1,945 | 1,943 | 1,943 |
| 30 | | Vaccine_yes | 141 | 142 | 55 | 55 | 55 |
| 31 | | Take_step_take_vaccine | 15,262 | 15,262 | 190 | 190 | 190 |
| 32 | | Take_vaccine | 22,467 | 22,456 | 156 | 156 | 156 |
| 33 | | First_dose | 118,844 | 118,828 | 64,712 | 64,705 | 64,705 |
| 34 | | Second_dose | 65,885 | 65,875 | 44,627 | 44,623 | 44,623 |
| Total | | | 2,046,073 | 2,045,962 | 1,150,568 | 1,098,376 | 1,098,376 |
| Removed tweets | | | N/A | 111 | 895,394 | 52,104 | N/A |
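The filtering phases in the table above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the URL pattern, and the rule that a tweet survives the last phase if it contains at least one of the most frequent words are all assumptions, since the source only names the phases.

```python
import re
from collections import Counter

# Assumed URL pattern; the source does not say how links were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def strip_links(tweets):
    """Phase 2 (assumed): remove URLs and drop tweets that become empty."""
    cleaned = (URL_RE.sub("", t).strip() for t in tweets)
    return [t for t in cleaned if t]


def drop_duplicates(tweets):
    """Phase 3 (assumed): keep the first occurrence of each exact text."""
    return list(dict.fromkeys(tweets))


def filter_by_frequent_words(tweets, top_k=1000):
    """Phase 4 (assumed): keep tweets sharing a word with the top_k most frequent words."""
    counts = Counter(word for t in tweets for word in t.split())
    vocab = {word for word, _ in counts.most_common(top_k)}
    return [t for t in tweets if any(word in vocab for word in t.split())]


def run_phases(tweets, top_k=1000):
    """Apply the three filtering phases in order and return the final list."""
    return filter_by_frequent_words(drop_duplicates(strip_links(tweets)), top_k)
```

Because each phase is a separate function, the number of tweets removed at each step can be obtained by comparing list lengths before and after the call.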
Fig. 1 Distribution of the number of tweets collected each month
Fig. 2 Overview of the study approach
Sample of tweets dataset
| Arabic tweet | English translation |
|---|---|
| | The vaccine is a step on the way back to normal life |
| | Perhaps the Chinese vaccine is the safest for now |
| | I expect that even if you take the vaccine, you will get infected |
Sample of tweets dataset includes the corresponding tags
| Arabic tweet | English translation | Tag |
|---|---|---|
| | The vaccine is a step on the way back to normal life | Positive |
| | Perhaps the Chinese vaccine is the safest for now | Neutral |
| | I expect that even if you take the vaccine, you will get infected | |
TextBlob sentiment score range
| Sentiment | Condition |
|---|---|
| Negative | Polarity score < 0 |
| Neutral | Polarity score = 0 |
| Positive | Polarity score > 0 |
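The score ranges above amount to a simple thresholding rule. The sketch below assumes that the neutral class corresponds to a polarity of exactly 0; the function name `label_from_polarity` is hypothetical. TextBlob exposes the score as `TextBlob(text).sentiment.polarity`, a float in [-1, 1], but the example takes the score as an argument so that it runs without the library.

```python
def label_from_polarity(score: float) -> str:
    """Map a TextBlob-style polarity score in [-1, 1] to a sentiment label."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    # Assumption: a score of exactly 0 is treated as neutral.
    return "Neutral"


# Illustrative scores only, not values from the study's dataset.
labels = [label_from_polarity(s) for s in (-0.4, 0.0, 0.25)]
```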
Summary of the hyperparameter in machine learning algorithm
| Algorithm | Hyperparameter | Value | Description |
|---|---|---|---|
| LR | multi_class | multinomial | Suited to multi-class classification problems |
| | random_state | 1000 | Seed for the random number generator |
| | C | 3.0 | Inverse of regularization strength |
| | solver | newton-cg | Optimization algorithm |
| RF | n_estimators | 500 | Number of decision trees |
| | random_state | 5 | Seed controlling the randomness used when building trees |
| | max_depth | 200 | Maximum depth of each tree |
| ETC | n_estimators | 500 | Number of decision trees |
| | random_state | 5 | Seed controlling the randomness used when building trees |
| | max_depth | 200 | Maximum depth of each tree |
| SVC | kernel | linear | Kernel that maps the observations into a feature space |
| | C | 1.0 | Penalty parameter of the error term |
| | random_state | 500 | Seed for the random number generator |
| GNB | N/A | N/A | Default setting |
| LSTM | LSTM units | 1000 | Units in the LSTM layer |
| | Activation function | ReLU | Activation function used in the model |
| | Epochs | 100 | Number of epochs |
| | Loss function | categorical_crossentropy | Loss function used in the model |
| | Optimizer | Adam | Optimizer used in the model |
| CNN | 1DConv | 64 filters | One-dimensional convolutional layer with 64 filters |
| | Kernel | 3 × 3 | Kernel size |
| | Activation function | ReLU | Activation function used in the model |
| | Epochs | 100 | Number of epochs |
| | Loss function | categorical_crossentropy | Loss function used in the model |
| | Optimizer | Adam | Optimizer used in the model |
| MLP | Neurons | 64 | Neurons in the input and hidden layers |
| | Hidden layers | 2 | Number of hidden layers |
| | Activation function | ReLU | Activation function used in the model |
| | Epochs | 100 | Number of epochs |
| | Loss function | categorical_crossentropy | Loss function used in the model |
| | Optimizer | Adam | Optimizer used in the model |
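For the classical models, the tabulated settings could be instantiated roughly as follows. This sketch assumes scikit-learn, whose parameter names match those in the table; the source does not name the library explicitly, and `build_classical_models` is a hypothetical helper. `multi_class` is left out because recent scikit-learn releases deprecate it, and with the `newton-cg` solver a multi-class problem is fitted with the multinomial loss by default.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def build_classical_models():
    """Return the classical models configured with the tabulated values."""
    return {
        # multi_class is omitted (see lead-in); newton-cg fits a multinomial model.
        "LR": LogisticRegression(C=3.0, solver="newton-cg", random_state=1000),
        "RF": RandomForestClassifier(n_estimators=500, max_depth=200, random_state=5),
        "ETC": ExtraTreesClassifier(n_estimators=500, max_depth=200, random_state=5),
        "SVC": SVC(kernel="linear", C=1.0, random_state=500),
        # GNB uses default settings, as stated in the table.
        "GNB": GaussianNB(),
    }


models = build_classical_models()
```

Note that `GaussianNB` expects dense input, so a sparse TF-IDF matrix would need to be converted before fitting.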
Results of the experiment of all models with TF-IDF features
| Models | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| LR | 0.82 | 0.82 | 0.79 | 0.80 |
| RF | 0.76 | 0.77 | 0.70 | 0.72 |
| SVC | 0.82 | 0.82 | 0.78 | 0.80 |
| GNB | 0.57 | 0.61 | 0.64 | 0.56 |
| ETC | 0.76 | 0.80 | 0.68 | 0.72 |
| LSTM | 0.82 | 0.82 | 0.82 | 0.82 |
| CNN | 0.81 | 0.81 | 0.80 | 0.81 |
| MLP | 0.82 | 0.82 | 0.81 | 0.81 |
Fig. 3 The structure of the Long Short-Term Memory (LSTM) neural network. Reproduced from Yan Yan (2016)
Fig. 4 Word cloud for the tweets dataset showing the most used words. The shape of the word cloud matches the Twitter logo as a reference to the source of the data (Wordclouds 2022)
Fig. 5 Percentage of tweets about each of the six vaccination types in the collected dataset
Fig. 6 Percentage of tweets in each month, along with their sentiment and the total number of tweets in the collected dataset
Fig. 7 Percentage of the voting in the sentiment analyzers
Fig. 8 Confusion matrices of all models with TF-IDF features
Per-class precision, recall and f1-score of all models with TF-IDF features
| Models | Class | Precision | Recall | F1-score |
|---|---|---|---|---|
| LR | Negative | 0.83 | 0.83 | 0.83 |
| | Neutral | 0.83 | 0.86 | 0.84 |
| | Positive | 0.80 | 0.68 | 0.73 |
| RF | Negative | 0.73 | 0.77 | 0.75 |
| | Neutral | 0.77 | 0.81 | 0.79 |
| | Positive | 0.82 | 0.51 | 0.63 |
| SVC | Negative | 0.82 | 0.83 | 0.82 |
| | Neutral | 0.83 | 0.86 | 0.84 |
| | Positive | 0.81 | 0.66 | 0.73 |
| GNB | Negative | 0.76 | 0.60 | 0.67 |
| | Neutral | 0.81 | 0.49 | 0.61 |
| | Positive | 0.25 | 0.83 | 0.39 |
| ETC | Negative | 0.76 | 0.74 | 0.75 |
| | Neutral | 0.75 | 0.85 | 0.80 |
| | Positive | 0.88 | 0.46 | 0.60 |
| LSTM | Negative | 0.82 | 0.83 | 0.83 |
| | Neutral | 0.81 | 0.85 | 0.84 |
| | Positive | 0.82 | 0.78 | 0.79 |
| CNN | Negative | 0.82 | 0.83 | 0.82 |
| | Neutral | 0.81 | 0.84 | 0.81 |
| | Positive | 0.80 | 0.68 | 0.75 |
| MLP | Negative | 0.82 | 0.83 | 0.82 |
| | Neutral | 0.81 | 0.85 | 0.82 |
| | Positive | 0.82 | 0.68 | 0.75 |
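For LR, the per-class values above are numerically consistent with the overall precision, recall, and F1-score reported earlier (0.82, 0.79, 0.80) if the overall figures are unweighted (macro) averages over the three classes. The source does not state which averaging was used, so this is an observation from the numbers rather than a confirmed detail. A minimal check, with `macro_average` as a hypothetical helper:

```python
def macro_average(per_class_scores):
    """Unweighted mean of per-class scores, rounded to two decimals."""
    return round(sum(per_class_scores) / len(per_class_scores), 2)


# LR per-class values (Negative, Neutral, Positive) copied from the table above.
lr_precision = [0.83, 0.83, 0.80]
lr_recall = [0.83, 0.86, 0.68]
lr_f1 = [0.83, 0.84, 0.73]

lr_macro = (
    macro_average(lr_precision),
    macro_average(lr_recall),
    macro_average(lr_f1),
)
print(lr_macro)
```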
Fig. 9 Comparison of accuracy of all models with TF-IDF features
Fig. 10 Feature space for each target class