Irfan Ullah Khan, Nida Aslam, Sara Chrouf, Israa Atef, Ikram Merah, Latifah AlMulhim, Raghad AlShuaifan.
Abstract
Countries around the world face many challenges in slowing the spread of the SARS-CoV-2 virus. Vaccination is an effective way to combat the virus and prevent its spread among individuals. Currently, more than 50 SARS-CoV-2 vaccine candidates are in trials, and only a few are already in use. The primary objective of this study is to analyse public awareness of and opinion toward the vaccination process, and to develop a model that predicts the awareness and acceptability of SARS-CoV-2 vaccines in Saudi Arabia by analysing a dataset of Arabic tweets related to vaccination. To this end, several machine learning models, namely Support Vector Machine (SVM), Naïve Bayes (NB), and Logistic Regression (LR), were used with N-gram and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, and a Long Short-Term Memory (LSTM) model was used with word embedding. Among the classical models, LR with unigram feature extraction achieved the best accuracy, recall, and F1 score, at 0.76, 0.69, and 0.72, respectively, while the best precision of 0.80 was achieved by SVM with unigram and NB with bigram TF-IDF. The LSTM model outperformed all other models, with an accuracy of 0.95, a precision of 0.96, a recall of 0.95, and an F1 score of 0.95. This model will help build a fuller picture of how receptive people are to the vaccine, so that the government can find new ways and run more campaigns to raise awareness of the importance of the vaccine.
Year: 2022 PMID: 35401714 PMCID: PMC8984742 DOI: 10.1155/2022/6722427
Source DB: PubMed Journal: Comput Intell Neurosci
Summary of previous studies related to different vaccinations.
| Ref | Dataset | Number of tweets | Language | Virus | Feature extraction | Technique | Accuracy |
|---|---|---|---|---|---|---|---|
| [ | Tweets collected from Twitter | 150,000 | English | Zika | — | Descriptive analysis | — |
| [ | Tweets collected from Twitter | 660,000 | English | Disneyland measles | TF-IDF | SVM | 0.746 |
| [ | Manually collected | 2,150 | English | HPV | N-gram | SVM | 0.886 |
| [ | Tweets collected from Twitter | 14,735 | English | Measles | — | SNA, chi-square | — |
| [ | Tweets collected from Twitter | 49,354 | English | Newborn vaccinations | N-gram | SVM | 0.847 |
| [ | Tweets collected from Twitter | 477,768 | English | Influenza A (H1N1) | N-gram | ME | 0.842 |
Summary of previous studies related to SARS-CoV-2 vaccination and supervised sentiment analysis.
| Ref | Dataset | No. of tweets | Language | Feature extraction | Technique | Accuracy |
|---|---|---|---|---|---|---|
| [ | Tweets collected from Twitter | 40,000 | English | — | LR | 0.63 |
| [ | Kaggle (tweets) | 125,906 | English | Word embedding | Bi-LSTM | 0.98 |
| [ | Tweets collected from Twitter | 6,000 | English | — | NB | — |
| [ | Tweets collected from Twitter | 803,278 | Persian | — | CNN-LSTM | — |
| [ | Tweets collected from Twitter | 10,000 | English | — | KNN | — |
Figure 1. Block diagram for the proposed methodology.
Figure 2. Distribution of tweets per class in the dataset.
Sample tweets from each class.
| Sentiment | Sample tweets in Arabic | English translation |
|---|---|---|
| Provaccination | يوم امس اخذت الوالده الجرعه الاولي لقاح كورونا اسال الله ينفع ويحفظها الاهم روعه التنظيم والتنسيق والترتيب والسرعه والحفاوه والاهتمام والاحترافيه للجميع سعوديين ومقيمين شيء مشرف يستحق المديح والتقدير الحمدلله نعمه السعوديه وبلا مجامله | Yesterday, my mother took the first dose of the vaccine, May Allah protect her. Everything was well arranged, clean and professional. I thank God to be living in Saudi Arabia |
| | اسال الله العظيم ان يجعلها لقاح العافيه ويحميك شر وجميع الشعب | |
| Antivaccination | ما عمري تطعمت لان عندي فوبيا الابر وايد يقولون تطعموا وحاشتهم انفلونزا قويه يعني ما استفادو | I have not been vaccinated because I have a phobia of needles. Everyone who got vaccinated has a strong flu, so they did not benefit from it. After the introduction of the corona vaccine, mysterious deaths raise doubts |
| | طرح لقاح كورونا وفيات غامضه تثير الشكوك | |
| Neutral | يوم عظيم للبشريه فايزر تعلن لقاحها وارتفاع مؤشرات الاسواق والمال والاعمال واسعار النفط وشركات الطيران العالم يتعافي | Good day, Pfizer has announced their vaccines and obvious indication of rise in the market price and oil price and airlines, your excellency the minister of health all countries are signing contracts to get the vaccines. Will Saudi Arabia get it soon? |
| | معالي الوزير افاده للشعب اذا الحكومه وقعت اتفاقيات شراء لقاحات شركه فايز شركه موديرنا الامريكيه اغلب الدول تتسابق لشراء اللقاحات بعقود مسبقه افيدونا جزاكم الله خير | |
Figure 3. Distribution of the number of samples and the length of the tweets in the dataset.
Figure 4. Distribution of top 10 unigrams after preprocessing.
Figure 5. Distribution of top 10 bigrams after preprocessing.
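The unigram and bigram counts behind Figures 4 and 5 can be illustrated with a small sketch. It assumes whitespace tokenization of already-preprocessed text. The helper names `ngrams` and `top_ngrams` and the two sample strings are hypothetical and not taken from the paper's dataset or code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n, k=10):
    """Count n-grams over whitespace-tokenized texts; return the k most common."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.split(), n))
    return counts.most_common(k)


# Hypothetical preprocessed samples (illustration only, not dataset tweets).
sample = [
    "لقاح كورونا آمن",
    "أخذت لقاح كورونا اليوم",
]

print(top_ngrams(sample, 1, k=3))  # most frequent unigrams
print(top_ngrams(sample, 2, k=3))  # most frequent bigrams
```

Calling `top_ngrams(texts, 1)` and `top_ngrams(texts, 2)` on the full corpus would give the kind of top-10 lists the two figures plot.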
Optimum parameters for the proposed SVM model.
| Parameters | Optimal value chosen |
|---|---|
| | 1 |
| Gamma | 0.1 |
| Kernel | Linear |
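As a rough illustration, the SVM settings listed above could be passed to a classifier as follows. This is a sketch under several assumptions: scikit-learn is used (the record does not name a library), the value 1 whose parameter name is missing is treated as the regularization parameter `C`, features are unigram TF-IDF, and the toy texts and labels are invented for the example. In scikit-learn, `gamma` has no effect when the kernel is linear; it is passed only to mirror the table.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Unigram TF-IDF features followed by a linear-kernel SVM.
# ASSUMPTION: C=1.0 stands in for the unnamed value "1" in the table.
svm_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),
    SVC(kernel="linear", gamma=0.1, C=1.0),
)

# Hypothetical toy data standing in for preprocessed tweets.
texts = [
    "vaccine safe effective",
    "grateful vaccine dose",
    "vaccine dangerous doubts",
    "afraid vaccine deaths",
    "vaccine announced today",
    "ministry vaccine contracts",
]
labels = ["pro", "pro", "anti", "anti", "neutral", "neutral"]

svm_clf.fit(texts, labels)
svm_pred = svm_clf.predict(["vaccine safe dose"])
print(svm_pred)
```

Changing `ngram_range` to `(2, 2)` would give the bigram variant, and swapping `TfidfVectorizer` for `CountVectorizer` would give a "without TF-IDF" variant.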
Optimum parameters for the proposed LR model.
| Parameters | Optimal value chosen |
|---|---|
| | 100 |
| Solver | Liblinear |
| Penalty | |
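A comparable sketch for the LR configuration is shown below, again assuming scikit-learn. The value 100 has no surviving parameter name in the table; treating it as the inverse regularization strength `C` is an assumption (it could equally be an iteration limit). The penalty value is also missing, so the library default is left in place. Because liblinear fits binary problems, the three-class task is wrapped in a one-vs-rest classifier. "Unigram without TF-IDF" is taken here to mean raw unigram counts, and the toy data is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Raw unigram counts followed by one-vs-rest logistic regression.
# ASSUMPTION: C=100 stands in for the unnamed value "100" in the table.
lr_clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 1)),
    OneVsRestClassifier(LogisticRegression(solver="liblinear", C=100)),
)

# Hypothetical toy data standing in for preprocessed tweets.
lr_texts = [
    "vaccine safe effective",
    "grateful vaccine dose",
    "vaccine dangerous doubts",
    "afraid vaccine deaths",
    "vaccine announced today",
    "ministry vaccine contracts",
]
lr_labels = ["pro", "pro", "anti", "anti", "neutral", "neutral"]

lr_clf.fit(lr_texts, lr_labels)
lr_pred = lr_clf.predict(lr_texts)
print(lr_pred)
```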
Performance measures of classifiers using different feature extraction techniques.
| Model | Feature extraction and selection | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| SVM | Bigram with TF-IDF | 0.73 | 0.79 | 0.62 | 0.66 |
| SVM | Bigram without TF-IDF | 0.74 | 0.74 | 0.67 | 0.70 |
| SVM | Unigram with TF-IDF | 0.75 | 0.80 | 0.68 | 0.71 |
| SVM | Unigram without TF-IDF | 0.73 | 0.74 | 0.67 | 0.70 |
| NB | Bigram with TF-IDF | 0.66 | 0.80 | 0.50 | 0.49 |
| NB | Bigram without TF-IDF | 0.72 | 0.66 | 0.68 | 0.67 |
| NB | Unigram with TF-IDF | 0.67 | 0.79 | 0.51 | 0.50 |
| NB | Unigram without TF-IDF | 0.69 | 0.69 | 0.67 | 0.68 |
| LR | Bigram with TF-IDF | 0.73 | 0.78 | 0.67 | 0.71 |
| LR | Bigram without TF-IDF | 0.72 | 0.73 | 0.64 | 0.67 |
| LR | Unigram with TF-IDF | 0.72 | 0.75 | 0.62 | 0.65 |
| LR | Unigram without TF-IDF | 0.76 | 0.72 | 0.69 | 0.72 |
| LSTM model | Embedding techniques | 0.95 | 0.96 | 0.95 | 0.95 |
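The four reported measures can be computed from true and predicted labels as sketched below. The record does not state how precision, recall, and F1 were averaged over the three classes; macro averaging (an unweighted mean of per-class scores) is assumed here, and the function name `macro_scores` and the example labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only.
y_true = ["pro", "pro", "anti", "neutral"]
y_pred = ["pro", "anti", "anti", "neutral"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Under macro averaging, the F1 score is the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.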
Figure 6. Confusion matrix for logistic regression.
Figure 7. Confusion matrix for support vector machine.
Figure 8. Confusion matrix for LSTM model.
Figure 9. LSTM model ROC curve for the three classes.
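A per-class ROC analysis for a three-class model is commonly done one-vs-rest: each class in turn is treated as positive and the other two as negative. The sketch below computes the area under each such curve from predicted class probabilities, using the pairwise-ranking definition of AUC. Whether the paper used this exact scheme is an assumption, and the function names and example probabilities are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, probas, classes):
    """Return {class: AUC}, treating each class as positive in turn."""
    return {
        c: binary_auc([row[i] for row in probas], [t == c for t in y_true])
        for i, c in enumerate(classes)
    }


# Hypothetical predicted probabilities; columns follow the order of `classes`.
classes = ["pro", "anti", "neutral"]
y_true = ["pro", "anti", "neutral", "pro"]
probas = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.5, 0.4],
]
auc = one_vs_rest_auc(y_true, probas, classes)
print(auc)
```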