Aditya Singhal, Manmeet Kaur Baxi, Vijay Mago.
Abstract
BACKGROUND: Social media platforms (SMPs) are frequently used by pharmaceutical companies, public health agencies, and nongovernment organizations (NGOs) to communicate health concerns, new advancements, and potential outbreaks. Although the benefits of using SMPs as a communication tool have been extensively discussed, the online activity of health care organizations on SMPs during COVID-19, in terms of engagement and sentiment forecasting, has not been thoroughly investigated.
Keywords: Twitter; content analysis; health care; natural language processing; pharmaceutical; public engagement; public health; sentiment forecasting; social media; user engagement
Year: 2022; PMID: 35849795; PMCID: PMC9390834; DOI: 10.2196/37829
Source DB: PubMed; Journal: JMIR Med Inform
Distribution of tweets for the selected user accounts of 3 types of organizations.
| Name of organization (Twitter handle) | Before COVID-19, n (%) | During COVID-19, n (%) | Total tweets, N |
| Public health agencies | | | |
| Centers for Disease Control and Prevention (CDCgov) | 8435 (58.6) | 5963 (41.4) | 14,398 |
| Centers for Disease Control and Prevention (CDC_eHealth) | 1376 (86.3) | 219 (13.7) | 1594 |
| Government of Canada for Indigenous (GCIndigenous) | 3505 (54.0) | 2989 (46.0) | 6494 |
| Health Canada and PHAC (GovCanHealth) | 7878 (17.2) | 37,907 (82.8) | 45,785 |
| US Department of Health & Human Services (HHSGov) | 7890 (56.9) | 5969 (43.1) | 13,859 |
| Indian Health Service (IHSgov) | 1090 (44.7) | 1346 (55.3) | 2436 |
| Canadian Food Inspection Agency (InspectionCan) | 4145 (62.2) | 2516 (37.8) | 6661 |
| National Institutes of Health (NIH) | 5837 (71.6) | 2314 (28.4) | 8151 |
| National Indian Health Board (NIHB1) | 1247 (51.1) | 1195 (48.9) | 2442 |
| US Food and Drug Administration (US_FDA) | 5810 (59.7) | 3925 (40.3) | 9735 |
| Total | 47,213 (42.3) | 64,343 (57.7) | 111,555 |
| Pharmaceutical companies | | | |
| AstraZeneca (AstraZeneca) | 3462 (78.2) | 963 (21.8) | 4425 |
| Biogen (biogen) | 1819 (61.9) | 1120 (38.1) | 2939 |
| GlaxoSmithKline (GSK) | 4200 (69.3) | 1857 (30.7) | 6057 |
| Johnson & Johnson (JNJNews) | 4813 (71.4) | 1926 (28.6) | 6739 |
| Pfizer (pfizer) | 3637 (64.1) | 2039 (35.9) | 5676 |
| Total | 17,931 (69.4) | 7905 (30.6) | 25,836 |
| NGO^a | | | |
| World Health Organization (WHO) | 24,775 (56.2) | 19,303 (43.8) | 44,078 |
^a NGO: nongovernment organization.
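The n (%) columns in the tweet-distribution table express each period's tweet count as a share of the account's total. A minimal sketch of that arithmetic follows; the function name `split_shares` is hypothetical and not taken from the paper, and rounding to one decimal place is an assumption based on how the table is printed.

```python
def split_shares(before, during):
    """Return the percentage of tweets posted before and during COVID-19.

    Both values are rounded to one decimal place, matching the table's
    presentation (an assumption, not the authors' documented procedure).
    """
    total = before + during
    if total == 0:
        raise ValueError("account has no tweets")
    return (round(100 * before / total, 1), round(100 * during / total, 1))


# Counts taken from the CDCgov row of the table.
before_pct, during_pct = split_shares(8435, 5963)
print(before_pct, during_pct)
```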
Figure 1. Overall research framework. WHO: World Health Organization.
Mean coherence scores and CPU^a time for different clustering algorithms.
| Clustering algorithm | c_v | c_umass | Time taken (minutes:seconds) |
| | | | |
| LDA^b | 0.352 | –5.526 | 17:11 |
| Parallel LDA | 0.396 | –3.709 | 5:48 |
| NMF^c | 0.493 | –3.653 | 7:38 |
| LSI^d | 0.316 | –5.921 | 0:16 |
| HDP^e | 0.696 | –18.668 | 3:24 |
| | | | |
| LDA | 0.456 | –5.688 | 14:01 |
| Parallel LDA | 0.446 | –3.990 | 6:08 |
| NMF | 0.567 | –3.794 | 7:04 |
| LSI | 0.381 | –5.356 | 0:16 |
| HDP | 0.650 | –17.610 | 3:01 |
^a CPU: central processing unit.
^b LDA: latent Dirichlet allocation.
^c NMF: nonnegative matrix factorization.
^d LSI: latent semantic indexing.
^e HDP: hierarchical Dirichlet process.
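The UMass coherence column is negative throughout because the measure averages log ratios of document co-occurrence counts, which are usually below 1; values closer to zero indicate more coherent topics. The sketch below illustrates one common formulation (log of smoothed co-document frequency over document frequency, averaged over word pairs). It is an assumption that the paper's scores follow this general shape; library implementations may differ in smoothing and aggregation, and the c_v measure is a different calculation that is not shown here. The function name and toy corpus are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """Average UMass-style coherence for one topic.

    top_words: the topic's words, ordered from most to least probable.
    documents: iterable of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # the conditioning word never occurs; skip this pair
        # +1 smoothing keeps the logarithm defined when the pair never co-occurs.
        scores.append(math.log((doc_freq(later, earlier) + 1) / denom))
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical toy corpus, not data from the study.
toy_docs = [
    ["vaccine", "dose", "trial"],
    ["vaccine", "dose"],
    ["vaccine", "mask"],
    ["mask", "travel"],
]
related = umass_coherence(["vaccine", "dose"], toy_docs)
unrelated = umass_coherence(["vaccine", "travel"], toy_docs)
print(related, unrelated)
```

In this toy corpus the pair that shares documents scores higher (closer to zero) than the pair that never co-occurs, which is the ordering the measure is meant to capture.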
Figure 2. Scaled heatmaps showing topic distribution for pharmaceutical companies before and during COVID-19.
Figure 3. Top hashtags of pharmaceutical companies before and during COVID-19.
Figure 4. User impact of all Twitter handles scaled between 0 and 1. CDC: Centers for Disease Control and Prevention; NIH: National Institutes of Health; WHO: World Health Organization.
Figure 5. User engagement on Twitter accounts of pharmaceutical companies from January 1, 2017, to December 31, 2021.
Results of time series sentiment forecasting using different ML^a models (all metrics are from 5-fold cross-validation).
| Models | Pharmaceutical companies | | | | | | Public health agencies | | | | | | WHO^b | | | | | |
| | Before COVID-19 | | | During COVID-19 | | | Before COVID-19 | | | During COVID-19 | | | Before COVID-19 | | | During COVID-19 | | |
| | MAE^c | MSE^d | RMSE^e | MAE | MSE | RMSE | MAE | MSE | RMSE | MAE | MSE | RMSE | MAE | MSE | RMSE | MAE | MSE | RMSE |
| ARIMA^f | 0.063^g | 0.005^g | 0.072^g | 0.098 | 0.013 | 0.112 | 0.027^g | 0.001^g | 0.032^h | 0.240 | 0.082 | 0.286 | 0.066^h | 0.006^h | 0.080^h | 0.106 | 0.012 | 0.111 |
| SARIMAX^i | 0.065^h | 0.005^g | 0.072^g | 0.084 | 0.011 | 0.104 | 0.028^j | 0.001^g | 0.031^g | 0.709 | 0.011^g | 0.106^h | 0.054^g | 0.004^g | 0.061^g | 0.047^h | 0.004^g | 0.066 |
| Bayesian ridge | 0.083 | 0.010 | 0.100 | 0.102 | 0.018 | 0.119 | 0.031 | 0.001 | 0.037 | 0.141 | 0.037 | 0.163 | 0.075^j | 0.009^j | 0.087^j | 0.061 | 0.008 | 0.075 |
| Ridge regression | 0.069 | 0.008 | 0.085 | 0.079 | 0.011 | 0.094 | 0.030 | 0.002 | 0.038 | 0.124 | 0.029 | 0.147 | 0.076 | 0.009 | 0.091 | 0.056 | 0.007 | 0.068 |
| CatBoost regressor | 0.066 | 0.007^j | 0.080^h | 0.072^g | 0.008^h | 0.086^g | 0.027^h | 0.001^h | 0.035 | 0.104 | 0.023 | 0.127 | 0.079 | 0.009 | 0.089 | 0.052 | 0.007 | 0.065 |
| K-neighbors regressor | 0.070 | 0.009 | 0.087 | 0.075^h | 0.008^g | 0.087^h | 0.030 | 0.001 | 0.036 | 0.093^j | 0.022 | 0.113 | 0.081 | 0.011 | 0.100 | 0.050 | 0.007 | 0.061^j |
| Elastic net | 0.070 | 0.008 | 0.088 | 0.080 | 0.009^j | 0.093^j | 0.029 | 0.001^h | 0.035 | 0.087^h | 0.021^j | 0.109^j | 0.082 | 0.011 | 0.100 | 0.046^g | 0.006^h | 0.059^g |
| Lasso regression | 0.070 | 0.008 | 0.088 | 0.080 | 0.009^j | 0.093^j | 0.029 | 0.001 | 0.035 | 0.087^h | 0.021^j | 0.109^j | 0.082 | 0.011 | 0.100 | 0.046^g | 0.006^h | 0.059^g |
| Random forest regressor | 0.065^j | 0.007^h | 0.081^j | 0.080 | 0.010 | 0.093 | 0.028 | 0.001^h | 0.034^j | 0.110 | 0.024 | 0.134 | 0.082 | 0.009 | 0.090 | 0.047^j | 0.006^j | 0.060^h |
| Light gradient boosting machine | 0.070 | 0.008 | 0.088 | 0.080 | 0.009^j | 0.093^j | 0.029 | 0.001^h | 0.035 | 0.087^h | 0.021^j | 0.109^j | 0.082 | 0.011 | 0.100 | 0.046^g | 0.006^h | 0.059^g |
| Gradient boosting regressor | 0.075 | 0.008 | 0.086 | 0.079 | 0.010 | 0.094 | 0.029 | 0.001^j | 0.036 | 0.141 | 0.034 | 0.168 | 0.082 | 0.010 | 0.094 | 0.051 | 0.008 | 0.064 |
| AdaBoost regressor | 0.070 | 0.007 | 0.082 | 0.080 | 0.010 | 0.091 | 0.029 | 0.001 | 0.037 | 0.084^g | 0.020^h | 0.105^g | 0.087 | 0.010 | 0.096 | 0.057 | 0.007 | 0.072 |
| Extreme gradient boosting | 0.068 | 0.009 | 0.087 | 0.080 | 0.011 | 0.098 | 0.031 | 0.002 | 0.040 | 0.151 | 0.045 | 0.171 | 0.087 | 0.011 | 0.098 | 0.055 | 0.007 | 0.065 |
| Decision tree regressor | 0.076 | 0.009 | 0.086 | 0.087 | 0.013 | 0.106 | 0.029 | 0.001 | 0.037 | 0.112 | 0.030 | 0.142 | 0.098 | 0.014 | 0.111 | 0.048 | 0.006^j | 0.061 |
| Linear regression | 0.245 | 0.312 | 0.314 | 0.094 | 0.017 | 0.114 | 0.157 | 0.164 | 0.216 | 0.124 | 0.029 | 0.148 | 2.367 | 52.719 | 3.334 | 0.062 | 0.008 | 0.076 |
| Prophet | 0.108 | 0.016 | 0.126 | 0.089 | 0.011 | 0.104 | 0.040 | 0.002 | 0.049 | 0.120 | 0.015 | 0.124 | 0.114 | 0.020 | 0.143 | 0.086 | 0.011 | 0.106 |
^a ML: machine learning.
^b WHO: World Health Organization.
^c MAE: mean absolute error.
^d MSE: mean squared error.
^e RMSE: root-mean-square error.
^f ARIMA: autoregressive integrated moving average.
^g The highest-performing forecasting method.
^h The second-highest-performing forecasting method.
^i SARIMAX: seasonal autoregressive integrated moving average with exogenous factors.
^j The third-highest-performing forecasting method.
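The three error metrics reported in the forecasting table have standard definitions, sketched below for a single evaluation window. The function name and the sentiment values are hypothetical and are not data from the study. Within one window RMSE is exactly the square root of MSE; if the table entries are averages over cross-validation folds (an assumption about how they were aggregated), that identity need not hold for the averaged figures.

```python
import math


def forecast_errors(actual, predicted):
    """Compute MAE, MSE, and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical sentiment scores and forecasts, for illustration only.
actual = [0.10, 0.20, 0.15, 0.30]
predicted = [0.12, 0.18, 0.20, 0.25]
errors = forecast_errors(actual, predicted)
print(errors)
```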
Figure 6. One-step-ahead forecast for all pharmaceutical companies before and during COVID-19 using the best-performing models from Table S1 (Multimedia Appendix 1). ARIMA: autoregressive integrated moving average.
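A one-step-ahead forecast predicts each time point using only the observations that precede it, then moves forward by one step and repeats. The sketch below illustrates that walk-forward procedure with a simple moving-average forecaster standing in for the model; it is not ARIMA and not the authors' pipeline. The function name, the `window` parameter, and the series values are hypothetical.

```python
def one_step_ahead(series, window=3):
    """Walk forward through `series`, forecasting each point from the
    `window` observations immediately before it.

    Returns forecasts for positions window, window + 1, ..., len(series) - 1.
    """
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")
    forecasts = []
    for t in range(window, len(series)):
        history = series[t - window:t]  # only past values are visible at step t
        forecasts.append(sum(history) / window)
    return forecasts


# Hypothetical daily sentiment scores, for illustration only.
series = [0.2, 0.4, 0.6, 0.4, 0.2]
forecasts = one_step_ahead(series, window=2)
print(forecasts)
```

Comparing `forecasts` with `series[window:]` gives the residuals from which error metrics such as MAE and RMSE are computed.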