Jingcheng Du¹, Jun Xu¹, Hsingyi Song¹, Xiangyu Liu², Cui Tao³.
Abstract
BACKGROUND: Analysing public opinion on HPV vaccines on social media with machine-learning-based approaches will help us understand the reasons behind low vaccine coverage and develop corresponding strategies to improve vaccine uptake.
Keywords: Gold standard; Hierarchical classification; Sentiment analysis; Social media; Support vector machines; Twitter
Year: 2017 PMID: 28253919 PMCID: PMC5335787 DOI: 10.1186/s13326-017-0120-6
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Sentiment classification scheme for HPV vaccine-related tweets. The categories in colored (non-black) rectangles are all the possible sentiment labels that can be assigned to a tweet.
Detailed definitions of the sentiment categories for HPV vaccine-related tweets
| Sentiment | Subcategory | Description |
|---|---|---|
| Positive | | Shows a positive opinion of, or promotes uptake of, the HPV vaccine |
| Negative | Safety | Concerns or doubts about the safety of the HPV vaccine, or descriptions of vaccine injuries |
| Negative | Efficacy | Concerns or doubts about the effectiveness of the HPV vaccine |
| Negative | Cost | Concerns about the cost of the HPV vaccine (e.g., money or time) |
| Negative | Resistant | Resistance to HPV vaccines due to cultural or emotional issues |
| Negative | Others | Other concerns |
| Neutral | | Related to the HPV vaccine topic but contains no sentiment, has unclear sentiment, or contains both negative and positive sentiment |
| Unrelated | | Not related to the HPV vaccine topic |
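Read together with Fig. 1, the categories above suggest a label hierarchy: a tweet is first judged related or unrelated, a related tweet then receives an overall sentiment, and a negative tweet is further assigned a concern type. The sketch below assumes exactly this three-level cascade (the precise structure of Fig. 1 is not reproduced here) and shows how per-level classifiers could be chained to produce the flat labels used in the result tables, such as `NegSafety`. The function names and the keyword rules in `toy_relatedness`, `toy_sentiment`, and `toy_subtype` are hypothetical placeholders, not the paper's trained models.

```python
from typing import Callable

# Assumed three-level hierarchy inferred from the category table:
# relatedness -> overall sentiment -> negative concern type.
SENTIMENTS = ("Positive", "Negative", "Neutral")
NEGATIVE_SUBTYPES = ("Safety", "Efficacy", "Cost", "Resistant", "Others")

# Each level is any callable mapping tweet text to one of that level's labels.
LevelClassifier = Callable[[str], str]


def classify_hierarchically(
    tweet: str,
    relatedness: LevelClassifier,
    sentiment: LevelClassifier,
    negative_subtype: LevelClassifier,
) -> str:
    """Cascade the three levels and return a flat label such as 'NegSafety'."""
    # Level 1: tweets judged unrelated stop here.
    if relatedness(tweet) == "Unrelated":
        return "Unrelated"
    # Level 2: overall sentiment of a related tweet.
    polarity = sentiment(tweet)
    if polarity not in SENTIMENTS:
        raise ValueError(f"unknown sentiment label: {polarity!r}")
    if polarity != "Negative":
        return polarity
    # Level 3: only negative tweets are refined by concern type.
    subtype = negative_subtype(tweet)
    if subtype not in NEGATIVE_SUBTYPES:
        raise ValueError(f"unknown negative subtype: {subtype!r}")
    return "Neg" + subtype


# Hypothetical keyword rules standing in for trained level classifiers.
def toy_relatedness(text: str) -> str:
    return "Related" if "hpv" in text.lower() else "Unrelated"


def toy_sentiment(text: str) -> str:
    return "Negative" if "side effect" in text.lower() else "Neutral"


def toy_subtype(text: str) -> str:
    return "Safety"


print(classify_hierarchically("HPV vaccine side effects scare me",
                              toy_relatedness, toy_sentiment, toy_subtype))
# → NegSafety
```

One consequence of any such cascade is that a tweet misrouted at level 1 or 2 can never receive its correct leaf label, so upper-level errors propagate downward.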
Fig. 2 Overview of the machine-learning-based system and optimization approach: (a) modularized machine learning system framework; (b) machine learning optimization steps.
Fig. 3 Sentiment distribution in the 6,000-tweet gold standard (Neg: Negative).
10-fold cross-validation performance of the baseline model and the hierarchical classification model (F: F1 score; P: precision; R: recall; for categories with no metric indicated, the F1 score is reported).
| Category | Metric | Plain classification (baseline model) | Hierarchical classification |
|---|---|---|---|
| Micro-averaging | F | 0.6732 | 0.7208 |
| Macro-averaging | P | 0.4455 | 0.5402 |
| Macro-averaging | R | 0.3574 | 0.4386 |
| Macro-averaging | F | 0.3967 | 0.4841 |
| Unrelated | F | 0.8044 | 0.8599 |
| Neutral | F | 0.5792 | 0.6181 |
| Positive | F | 0.6528 | 0.7021 |
| NegSafety | F | 0.7006 | 0.7277 |
| NegEfficacy | F | 0 | 0.2593 |
| NegCost | F | 0 | 0 |
| NegResistant | F | 0 | 0 |
| NegOthers | F | 0.155 | 0.4645 |
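The macro-averaged F values in these tables agree, to within rounding, with the harmonic mean of macro-P and macro-R (for the hierarchical model, 2 × 0.5402 × 0.4386 / (0.5402 + 0.4386) ≈ 0.4841) rather than with the mean of the eight per-category F1 scores (≈ 0.4540 for the same model). The sketch below computes the metrics that way from gold and predicted labels; it is a minimal illustration, not the authors' evaluation code, and the function names are hypothetical. Two facts follow directly from the definitions: a per-category F1 of 0 (NegCost, NegResistant) means no tweet of that category was classified correctly, and for single-label predictions drawn from `labels`, pooled micro-P and micro-R both equal accuracy.

```python
from typing import Dict, Sequence, Tuple


def harmonic_f(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def counts(gold: Sequence[str], pred: Sequence[str], label: str) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for one label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    return tp, fp, fn


def per_class_prf(gold, pred, labels) -> Dict[str, Tuple[float, float, float]]:
    """(precision, recall, F1) per label; undefined ratios count as 0.0."""
    scores = {}
    for label in labels:
        tp, fp, fn = counts(gold, pred, label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (prec, rec, harmonic_f(prec, rec))
    return scores


def macro_prf(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Average P and R over labels, then take their harmonic mean as macro-F."""
    n = len(scores)
    macro_p = sum(s[0] for s in scores.values()) / n
    macro_r = sum(s[1] for s in scores.values()) / n
    return macro_p, macro_r, harmonic_f(macro_p, macro_r)


def micro_f1(gold, pred, labels) -> float:
    """Pool TP/FP/FN over all labels, then take F1 of the pooled counts."""
    tp = fp = fn = 0
    for label in labels:
        t, f_p, f_n = counts(gold, pred, label)
        tp, fp, fn = tp + t, fp + f_p, fn + f_n
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return harmonic_f(prec, rec)


# Consistency check against the hierarchical-classification column above.
print(round(harmonic_f(0.5402, 0.4386), 4))  # → 0.4841
```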
10-fold cross-validation performance of different feature-set combinations (feature sets: (a) word n-grams; (b) POS tags; (c) clusters; F: F1 score; P: precision; R: recall; for categories with no metric indicated, the F1 score is reported).
| Category | Metric | (a) | (a) + (b) | (a) + (c) | (a) + (b) + (c) |
|---|---|---|---|---|---|
| Micro-averaging | F | 0.7208 | 0.7263 | 0.7255 | 0.73 |
| Macro-averaging | P | 0.5402 | 0.5438 | 0.5396 | 0.5477 |
| Macro-averaging | R | 0.4386 | 0.4468 | 0.4442 | 0.4576 |
| Macro-averaging | F | 0.4841 | 0.4905 | 0.4872 | 0.4986 |
| Unrelated | F | 0.8599 | 0.864 | 0.859 | 0.8618 |
| Neutral | F | 0.6181 | 0.6226 | 0.625 | 0.6231 |
| Positive | F | 0.7021 | 0.7098 | 0.7123 | 0.7136 |
| NegSafety | F | 0.7277 | 0.734 | 0.7357 | 0.7542 |
| NegEfficacy | F | 0.2593 | 0.3214 | 0.2593 | 0.3793 |
| NegCost | F | 0 | 0 | 0 | 0 |
| NegResistant | F | 0 | 0 | 0 | 0 |
| NegOthers | F | 0.4645 | 0.4614 | 0.4724 | 0.4753 |
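Feature sets (a) to (c) can be combined into a single sparse bag of features per tweet. The sketch below is one hypothetical way to do it, assuming the tweet has already been tokenized and POS-tagged by an external tagger and that a word-to-cluster lookup table is available. The n-gram range (unigrams and bigrams), the `w:`/`p:`/`c:` prefixes, the tag names, and the cluster IDs are illustrative assumptions, not the paper's settings. The resulting counts would then be vectorized and passed to a classifier such as an SVM.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n_max: int = 2) -> List[str]:
    """Lower-cased word n-grams for n = 1..n_max, prefixed with 'w:'."""
    words = [t.lower() for t in tokens]
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.append("w:" + " ".join(words[i:i + n]))
    return grams


def tweet_features(tagged: Sequence[Tuple[str, str]],
                   clusters: Dict[str, str],
                   use_pos: bool = True,
                   use_clusters: bool = True) -> Counter:
    """Feature counts from (token, POS tag) pairs: set (a), optionally (b) and (c)."""
    tokens = [tok for tok, _ in tagged]
    feats = word_ngrams(tokens)                      # (a) word n-grams
    if use_pos:
        feats += ["p:" + tag for _, tag in tagged]   # (b) POS tags
    if use_clusters:                                 # (c) cluster IDs of known words
        feats += ["c:" + clusters[t.lower()] for t in tokens if t.lower() in clusters]
    return Counter(feats)


# Hypothetical inputs: tags from some POS tagger and a made-up cluster map.
tagged = [("HPV", "NOUN"), ("vaccine", "NOUN"), ("works", "VERB")]
clusters = {"vaccine": "0110", "works": "1011"}
features = tweet_features(tagged, clusters)
print(features["p:NOUN"])  # → 2
```

Prefixing each feature with its source keeps identical strings from different feature sets (for example, a word that happens to equal a tag name) from colliding in the same dimension.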
10-fold cross-validation performance of the best-performing models after C and  optimization and of the model using default C and  (F: F1 score; P: precision; R: recall; for categories with no metric indicated, the F1 score is reported).
| Category | Metric | Model using default | Best model using optimized | Best model using optimized |
|---|---|---|---|---|
| Micro-averaging | F | 0.73 | 0.7352 | 0.7442 |
| Macro-averaging | P | 0.5477 | 0.6889 | 0.6873 |
| Macro-averaging | R | 0.4576 | 0.5095 | 0.5142 |
| Macro-averaging | F | 0.4986 | 0.5858 | 0.5883 |
| Unrelated | F | 0.8044 | 0.8538 | 0.8633 |
| Neutral | F | 0.5792 | 0.6330 | 0.6470 |
| Positive | F | 0.6528 | 0.7239 | 0.7255 |
| NegSafety | F | 0.7006 | 0.7641 | 0.7617 |
| NegEfficacy | F | 0 | 0.4138 | 0.4068 |
| NegCost | F | 0 | 0.5 | 0.5 |
| NegResistant | F | 0 | 0 | 0 |
| NegOthers | F | 0.155 | 0.5144 | 0.5403 |
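The comparison above rests on choosing hyperparameters such as the SVM penalty C by 10-fold cross-validation. The sketch below is a minimal illustration of that loop for C alone; tuning a second parameter would mean iterating over pairs of values instead. The `evaluate` callback is a hypothetical stand-in for "train an SVM on the training folds with this C and return a score (e.g., micro-F) on the held-out fold", and the grid values, random seed, and `toy_evaluate` function are assumptions, not the paper's search settings.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold_splits(n_items: int, k: int = 10, seed: int = 0) -> List[Split]:
    """Shuffle indices once, cut them into k near-equal folds, and return
    one (train, test) index pair per fold."""
    if not 2 <= k <= n_items:
        raise ValueError("need 2 <= k <= n_items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    splits, start = [], 0
    for fold in range(k):
        size = n_items // k + (1 if fold < n_items % k else 0)
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Hypothetical callback: (train indices, test indices, C) -> held-out score.
Evaluate = Callable[[List[int], List[int], float], float]


def select_c(n_items: int, c_grid: Sequence[float], evaluate: Evaluate,
             k: int = 10) -> Tuple[float, float]:
    """Return the C with the highest mean k-fold score, plus that score."""
    if not c_grid:
        raise ValueError("c_grid must not be empty")
    splits = kfold_splits(n_items, k)
    best_c, best_score = c_grid[0], float("-inf")
    for c in c_grid:
        mean_score = sum(evaluate(tr, te, c) for tr, te in splits) / k
        if mean_score > best_score:  # keep the first C that attains the maximum
            best_c, best_score = c, mean_score
    return best_c, best_score


# Toy stand-in whose score peaks at C = 1.0; a real run would fit an SVM here.
def toy_evaluate(train: List[int], test: List[int], c: float) -> float:
    return -(c - 1.0) ** 2


print(select_c(6000, [0.01, 0.1, 1.0, 10.0, 100.0], toy_evaluate))
# → (1.0, 0.0)
```

These folds are shuffled but not stratified; with categories as rare as NegCost and NegResistant, stratified folds, which keep each category's proportion similar across folds, may give steadier per-category estimates.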