Lingyao Li, Jiayan Zhou, Zihui Ma, Michelle T Bensi, Molly A Hall, Gregory B Baecher.
Abstract
Vaccination is the most effective way to provide long-lasting immunity against viral infection; thus, rapid assessment of vaccine acceptance is a pressing challenge for health authorities. Prior studies have applied survey techniques to investigate vaccine acceptance, but these can be slow and expensive. This study investigates 29 million vaccine-related tweets posted from August 8, 2020 to April 19, 2021 and proposes a social media-based approach that derives a vaccine acceptance index (VAI) to quantify Twitter users' opinions on COVID-19 vaccination. The index is calculated from opinion classifications identified with the aid of natural language processing techniques and provides a quantitative metric of the level of vaccine acceptance across different geographic scales in the U.S. The VAI is easily calculated from the number of positive and negative tweets posted by a specific user or group of users; it can be compiled for regions such as counties or states to provide geospatial information, and it can be tracked over time to assess changes in vaccine acceptance as related to trends in the media and politics. At the national level, the VAI moved from negative to positive in 2020 and remained steady after January 2021. Through exploratory analysis of state- and county-level data, reliable assessments of VAI against subsequent vaccination rates could be made for counties with at least 30 users. The paper discusses information characteristics that enable consistent estimation of VAI. The findings support the use of social media to understand opinions and to offer a timely and cost-effective way to assess vaccine acceptance.
Keywords: COVID-19 vaccine; Natural language processing; Social media; Text classification; Vaccine acceptance
Year: 2022 PMID: 35331966 PMCID: PMC8935963 DOI: 10.1016/j.jbi.2022.104054
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
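The abstract states that the VAI is computed from the number of positive and negative tweets, but this record does not give the formula. As a rough sketch only, the code below assumes a normalized difference, (positive − negative) / (positive + negative), which ranges from −1 to 1 and can change sign over time. The function names `vaccine_acceptance_index` and `vai_by_group`, the record layout, the sample data, and the formula itself are all assumptions for illustration, not the paper's definition.

```python
from collections import Counter, defaultdict

def vaccine_acceptance_index(labels):
    """Assumed VAI: (positive - negative) / (positive + negative).

    `labels` holds classifier outputs: 1 (positive), -1 (negative), 0 (unrelated).
    Unrelated tweets are ignored; returns None when no opinionated tweets exist.
    """
    counts = Counter(labels)
    pos, neg = counts[1], counts[-1]
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)

def vai_by_group(records):
    """Group (region, label) pairs by region and compute the assumed VAI for each."""
    grouped = defaultdict(list)
    for region, label in records:
        grouped[region].append(label)
    return {region: vaccine_acceptance_index(ls) for region, ls in grouped.items()}

# Hypothetical classified tweets: (state, predicted class). Not data from the paper.
tweets = [
    ("OH", 1), ("OH", 1), ("OH", -1), ("OH", 0),
    ("MD", -1), ("MD", -1), ("MD", 1),
    ("PA", 0),
]
state_vai = vai_by_group(tweets)
```

The same grouping key could be a county, a user, or a date, which is how a single index can be compiled at different geographic scales or tracked over time.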
Fig. 1. The research framework for the implementation of the proposed model.
Tweet classifications and examples.
| Class | Label Criteria | Tweet Example |
|---|---|---|
| Positive (Class 1) | Refer to safety and effectiveness. Show positive emotion (e.g., willingness to take vaccine). Describe large distribution and administration, process and policies that lead to vaccine development, authorization, and recommendation. | |
| Negative (Class −1) | Show safety concern (e.g., illness, death, vaccine accidents, strong allergic reaction). Express negative emotion (e.g., not willing to take vaccine, protest against the vaccine). | |
| Unrelated (Class 0) | Discuss topics not related to vaccine confidence (e.g., vaccine rollout, vaccine equity, vaccine passport). Describe topics not related to COVID-19 vaccine (e.g., flu vaccine, HPV vaccine). | |
The grid search of hyperparameters for different machine learning classifiers.
| Classifier | Grid search range |
|---|---|
| DT | Max depth: [10, 20, 40, 60, 80, 120, 200] |
| RF | Max depth: [10, 20, 40, 60, 80, 120] |
| Multinomial NB | Alpha (smoothing parameter): [0.1, 0.2, 0.5, 1, 1.5, 2] |
| Linear SVM | C (regularization parameter): [0.01, 0.05, 0.1, 0.5, 1, 2, 10, 100] |
| Multinomial LR | C (inverse of regularization strength): [0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 100] |
| LSTM | Batch size: [16, 32, 64, 128, 256] |
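A grid like the one in the table can be searched with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on a TF-IDF + logistic regression pipeline with the C values from the "Multinomial LR" row. The choice of scikit-learn, the two-fold cross-validation, the accuracy scoring, and the toy tweets are assumptions for illustration; the record does not say which library or validation scheme was used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy, made-up tweets; labels follow the class scheme: 1 positive, -1 negative, 0 unrelated.
texts = [
    "the vaccine is safe and effective",
    "got my shot today and feel great",
    "happy to take the vaccine soon",
    "vaccine approved after successful trials",
    "worried about severe allergic reaction",
    "i will not take this vaccine",
    "vaccine caused illness in my friend",
    "protest against the vaccine mandate today",
    "flu vaccine clinic opens next week",
    "vaccine passport debate continues",
    "hpv vaccine schedule for teens",
    "vaccine rollout map by county",
]
labels = [1, 1, 1, 1, -1, -1, -1, -1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# C grid copied from the "Multinomial LR" row of the table.
param_grid = {"clf__C": [0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 100]}

# cv=2 is an assumption chosen only so the toy data has enough samples per fold.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)
best_c = search.best_params_["clf__C"]
```

Swapping the `clf` step and the grid key (for example `clf__alpha` for `MultinomialNB`, or `clf__max_depth` for a tree-based model) would cover the other non-neural rows in the same way.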
Classification performance.
| Metric | Model | Class 1 (without sample balance) | Class −1 (without sample balance) | Class 0 (without sample balance) | Class 1 (with sample balance) | Class −1 (with sample balance) | Class 0 (with sample balance) |
|---|---|---|---|---|---|---|---|
| Precision | Textblob | 0.33 | 0.15 | 0.60 | | | |
| Precision | Vader | 0.33 | 0.19 | 0.58 | | | |
| Precision | TF-IDF + DT | 0.46 | 0.29 | 0.68 | 0.41 | 0.26 | 0.67 |
| Precision | TF-IDF + RF | 0.91 | 0.91 | 0.66 | 0.70 | 0.57 | 0.79 |
| Precision | TF-IDF + NB | 0.69 | 0.58 | 0.70 | 0.58 | 0.35 | 0.84 |
| Precision | TF-IDF + SVM | 0.61 | 0.49 | 0.76 | 0.56 | 0.40 | 0.81 |
| Precision | TF-IDF + LR | 0.68 | 0.58 | 0.75 | 0.58 | 0.41 | 0.81 |
| Precision | FastText + LSTM | 0.65 | 0.43 | 0.68 | 0.44 | 0.26 | 0.78 |
| Precision | GloVe + LSTM | 0.56 | 0.40 | 0.72 | 0.40 | 0.32 | 0.75 |
| Recall | Textblob | 0.57 | 0.26 | 0.31 | | | |
| Recall | Vader | 0.52 | 0.52 | 0.24 | | | |
| Recall | TF-IDF + DT | 0.36 | 0.21 | 0.78 | 0.39 | 0.31 | 0.66 |
| Recall | TF-IDF + RF | 0.24 | 0.04 | 0.99 | 0.61 | 0.49 | 0.85 |
| Recall | TF-IDF + NB | 0.43 | 0.11 | 0.92 | 0.63 | 0.73 | 0.63 |
| Recall | TF-IDF + SVM | 0.56 | 0.38 | 0.82 | 0.62 | 0.66 | 0.67 |
| Recall | TF-IDF + LR | 0.54 | 0.34 | 0.88 | 0.61 | 0.64 | 0.71 |
| Recall | FastText + LSTM | 0.33 | 0.21 | 0.90 | 0.55 | 0.40 | 0.61 |
| Recall | GloVe + LSTM | 0.43 | 0.35 | 0.82 | 0.58 | 0.34 | 0.58 |
| F1-score | Textblob | 0.41 | 0.19 | 0.41 | | | |
| F1-score | Vader | 0.40 | 0.28 | 0.34 | | | |
| F1-score | TF-IDF + DT | 0.40 | 0.25 | 0.72 | 0.40 | 0.28 | 0.66 |
| F1-score | TF-IDF + RF | 0.38 | 0.08 | 0.79 | 0.65 | 0.52 | 0.82 |
| F1-score | TF-IDF + NB | 0.53 | 0.19 | 0.79 | 0.61 | 0.47 | 0.72 |
| F1-score | TF-IDF + SVM | 0.59 | 0.43 | 0.79 | 0.59 | 0.50 | 0.73 |
| F1-score | TF-IDF + LR | 0.60 | 0.42 | 0.81 | 0.60 | 0.50 | 0.75 |
| F1-score | FastText + LSTM | 0.44 | 0.28 | 0.78 | 0.49 | 0.31 | 0.68 |
| F1-score | GloVe + LSTM | 0.50 | 0.37 | 0.76 | 0.47 | 0.33 | 0.65 |

| Metric | Model | Training (without sample balance) | Testing (without sample balance) | Training (with sample balance) | Testing (with sample balance) |
|---|---|---|---|---|---|
| Accuracy | Textblob | 37.4% | | | |
| Accuracy | Vader | 34.9% | | | |
| Accuracy | TF-IDF + DT | 94.6% | 59.6% | 94.5% | 54.3% |
| Accuracy | TF-IDF + RF | 92.8% | 67.6% | 99.2% | 74.4% |
| Accuracy | TF-IDF + NB | 83.9% | 69.1% | 85.6% | 64.4% |
| Accuracy | TF-IDF + SVM | 89.0% | 69.7% | 92.9% | 65.5% |
| Accuracy | TF-IDF + LR | 85.6% | 72.1% | 94.6% | 67.4% |
| Accuracy | FastText + LSTM | 94.2% | 66.3% | 86.9% | 56.8% |
| Accuracy | GloVe + LSTM | 89.4% | 65.9% | 92.2% | 55.3% |
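Reading the three per-class blocks of the table as precision, recall, and F1-score, the third block should be the harmonic mean of the first two. The short check below tests that reading on a few rows copied from the "without sample balance" columns; the helper name `f1_score` and the tolerance are illustrative choices, and the tolerance only accounts for the two-decimal rounding of the inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (model, class) -> (first-block value, second-block value, third-block value),
# copied from the "without sample balance" columns of the table.
rows = {
    ("Textblob", 1): (0.33, 0.57, 0.41),
    ("Textblob", -1): (0.15, 0.26, 0.19),
    ("TF-IDF + RF", 1): (0.91, 0.24, 0.38),
    ("TF-IDF + LR", 0): (0.75, 0.88, 0.81),
}

# Inputs are rounded to two decimals, so allow a small tolerance.
deviations = {
    key: abs(f1_score(p, r) - reported)
    for key, (p, r, reported) in rows.items()
}
consistent = all(d < 0.015 for d in deviations.values())
```

The TF-IDF + RF row shows why the per-class view matters: without sample balance, a first-block value of 0.91 for Class 1 sits next to a second-block value of 0.24, so the combined score is only 0.38.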
Fig. 2. Temporal results of (a) daily tweet volume of all positive and negative tweets, and (b) national VAI computed on a daily and weekly basis.
Fig. 3. Timeline of notable events relative to calculated national-level VAI.
Fig. 4. 7-day rolling average of the state-level VAI (a) compared with the national VAI (y-axis is the VAI value and x-axis is the date across the full study period), and (b) detrended by subtracting the national VAI (y-axis is the difference between state-level VAI and national VAI, and x-axis is the date across the full study period).
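The smoothing and detrending described for Fig. 4 can be sketched in a few lines. The code assumes a trailing 7-day window (the caption does not say whether the window is trailing or centered) and uses made-up daily values; `rolling_mean` and `detrend` are hypothetical helper names.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; positions with fewer than `window` points yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def detrend(state_series, national_series):
    """State minus national, element-wise; None wherever either input is None."""
    return [
        None if s is None or n is None else s - n
        for s, n in zip(state_series, national_series)
    ]

# Hypothetical daily VAI values for one state and for the nation (8 days).
state_daily = [0.10, 0.20, 0.30, 0.20, 0.10, 0.20, 0.30, 0.40]
national_daily = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10]

state_smooth = rolling_mean(state_daily)
national_smooth = rolling_mean(national_daily)
difference = detrend(state_smooth, national_smooth)
```

A positive value in `difference` means the state's smoothed VAI sits above the national level on that day, which is the quantity plotted on the y-axis of panel (b).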
Fig. 5. The correlations of the state-level VAI with cumulative state-level vaccination rates reported on June 22, 2021, on a (a) daily, (b) rolling weekly, and (c) rolling 30-day basis, and (d) against cumulative state-level vaccination rates reported on April 22, May 22, and June 22, 2021 (the legend “04/22_rates” represents the correlation of state-level VAI with vaccination rates reported on April 22).
Fig. 6. (a) The number of Twitter users in each county during the full study period, (b) county-level VAI computed for the period December 14, 2020 to April 18, 2021 for counties with more than 30 users, (c) county-level vaccination rate based on the data published on June 22, 2021, and (d) CDC-estimated vaccine hesitancy based on the HPS from March 3 to 15, 2021.
Fig. 7. The correlations of the county-level VAI (based on different numbers of users) with vaccination rates using tweets obtained (a) from August 9, 2020, to April 18, 2021, and (b) from December 14, 2020, to April 18, 2021 (after the first distribution).
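The comparison in Fig. 7 amounts to filtering counties by a minimum number of users and then correlating VAI with vaccination rate. The sketch below assumes a Pearson correlation (the captions do not name the coefficient) and uses invented county records; `pearson` and `correlation_for_threshold` are hypothetical helpers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def correlation_for_threshold(counties, min_users):
    """Correlate VAI with vaccination rate over counties with at least `min_users` users."""
    kept = [c for c in counties if c["users"] >= min_users]
    if len(kept) < 2:
        return None
    return pearson([c["vai"] for c in kept], [c["vax_rate"] for c in kept])

# Hypothetical county records; none of these numbers come from the paper.
counties = [
    {"users": 120, "vai": 0.30, "vax_rate": 0.55},
    {"users": 45, "vai": 0.10, "vax_rate": 0.45},
    {"users": 30, "vai": -0.10, "vax_rate": 0.35},
    {"users": 5, "vai": 0.90, "vax_rate": 0.20},
    {"users": 2, "vai": -0.80, "vax_rate": 0.60},
]
r_all = correlation_for_threshold(counties, min_users=1)
r_30 = correlation_for_threshold(counties, min_users=30)
```

In this toy data the counties with very few users carry extreme VAI values that swamp the relationship, so raising the threshold changes the correlation substantially. Sweeping `min_users` over a range of values would reproduce the shape of the analysis, though not its results.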