Ryzen Benson, Mengke Hu, Annie T Chen, Subhadeep Nag, Shu-Hong Zhu, Mike Conway.
Abstract
BACKGROUND: Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL.
Keywords: ENDS; JUUL; NLP; Twitter; e-cig; electronic cigarettes; electronic nicotine delivery system; infodemiology; infoveillance; machine learning; natural language processing; public health; smoking cessation; social media; tobacco; underage tobacco use
Year: 2020; PMID: 32876579; PMCID: PMC7495253; DOI: 10.2196/19975
Source DB: PubMed; Journal: JMIR Public Health Surveill; ISSN: 2369-2960
Figure 1. Final categories and synthetic tweet examples, as seen in the manual annotation.
Figure 2. Visualization of n-grams. An n-gram is a sequence of n items; n-grams can encode additional semantic content beyond individual words and, once vectorized, can be used as features in machine learning algorithms.
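To make the n-gram idea concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. The function names `ngrams` and `vectorize`, the whitespace tokenization, and the example strings are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n items from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(texts, n_values=(1, 2)):
    """Build an n-gram vocabulary and one count vector per text.

    Tokenization here is a naive lowercase whitespace split (an assumption);
    a real pipeline would likely handle punctuation, URLs, and hashtags.
    """
    tokenized = [text.lower().split() for text in texts]
    counts_per_text = [
        Counter(gram for n in n_values for gram in ngrams(tokens, n))
        for tokens in tokenized
    ]
    vocab = sorted(set().union(*counts_per_text))
    vectors = [[counts[gram] for gram in vocab] for counts in counts_per_text]
    return vocab, vectors


# Hypothetical example strings, not taken from the study data.
vocab, vectors = vectorize(["mango pods are great", "mango pods are gone"])
for gram, column in zip(vocab, zip(*vectors)):
    print(" ".join(gram), column)
```

Each column of the resulting matrix corresponds to one unigram or bigram, which is the form a classifier would consume as features.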
Figure 3. Brief descriptions of the 3 machine learning algorithms used to classify our annotated tweets.
Category proportions and frequencies from the manual annotation of tweets (n=3152).
| Category^a | Proportion, % | Frequency |
|---|---|---|
| First-person experience | 56.85 | 1792 |
| Neutral sentiment | 44.92 | 1416 |
| Positive sentiment | 33.38 | 1052 |
| Negative sentiment | 21.67 | 683 |
| Flavor/JUUL pods | 18.59 | 586 |
| Opinion | 15.96 | 503 |
| News/media | 9.58 | 302 |
| Other substances | 9.55 | 301 |
| Industry/regulation | 8.95 | 292 |
| Experience: other | 7.99 | 252 |
| Health effects | 6.85 | 216 |
| Underage | 6.03 | 190 |
| Commodity | 4.89 | 154 |
| Humor | 3.20 | 101 |
| Suorin | 2.54 | 80 |
| Marketing | 2.38 | 75 |
| Pleasure | 2.09 | 66 |
| Disgust | 1.71 | 54 |
| Craving | 1.46 | 46 |
| Cessation | 1.43 | 45 |
| Starting | 1.21 | 38 |

^a Categories are not mutually exclusive.
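The proportions in this table are each category's frequency divided by the number of annotated tweets (n=3152). Because a tweet can carry several categories, the percentages need not sum to 100. The sketch below illustrates both points; the helper names and the toy annotations are hypothetical, and only the two frequencies and the total are taken from the table.

```python
from collections import Counter


def proportion(frequency, total):
    """Percentage of annotated tweets that carry a given category."""
    return 100.0 * frequency / total


def category_frequencies(annotations):
    """Count how many tweets carry each label; a tweet may carry several."""
    counts = Counter()
    for labels in annotations:
        counts.update(set(labels))
    return counts


# Frequencies and total taken from the table (n=3152).
TOTAL = 3152
print(round(proportion(1792, TOTAL), 2))  # First-person experience
print(round(proportion(190, TOTAL), 2))   # Underage

# Hypothetical multi-label annotations, for illustration only.
toy = [{"Underage", "Positive sentiment"}, {"Positive sentiment"}, {"Humor"}]
freqs = category_frequencies(toy)
shares = {label: proportion(count, len(toy)) for label, count in freqs.items()}
print(shares, sum(shares.values()))
```

In the toy data the shares add up to more than 100%, which is the expected behavior for a multi-label annotation scheme.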
Test metrics of the 3 algorithms for all 3 classification tasks as well as average model performance at 500 features for each classification task.
| Test metrics and performance | Logistic regression | Bernoulli naïve Bayes | Random forest |
| | Acc^a | Prec^b | Rec^c | Acc | Prec | Rec | Acc | Prec | Rec |
| Underage JUUL use | 0.94 | 0.94 | 0.95 | 0.92 | 0.78 | 0.71 | 0.99 | 0.57 | 0.99 | 0.99 | 0.99 | 0.99 | |
| Positive sentiment | 0.72 | 0.69 | 0.82 | 0.69 | 0.69 | 0.63 | 0.83 | 0.53 | 0.82 | 0.82 | 0.80 | 0.75 | |
| Negative sentiment | 0.78 | 0.77 | 0.85 | 0.73 | 0.72 | 0.66 | 0.98 | 0.50 | 0.91 | 0.91 | 0.90 | 0.94 | |
| Average model performance | 0.81 | 0.80 | 0.87 | 0.78 | 0.73 | 0.67 | 0.93 | 0.53 | 0.91 | 0.91 | 0.90 | 0.89 | |
^a Acc: accuracy.
^b Prec: precision.
^c Rec: recall.
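For readers less familiar with these metrics, the following sketch shows the standard definitions of accuracy, precision, and recall for a binary task, computed from true and predicted labels. It is a generic illustration rather than the authors' evaluation code; the function names and the toy labels are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = tweet belongs to the category, 0 = it does not.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(metrics(y_true, y_pred))
```

Precision penalizes false positives and recall penalizes false negatives, so for a rare category such as underage use a classifier can reach high accuracy while one of the other two metrics stays low.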
Figure 4. Line plot of model performance at 500 features in classifying underage tweets and the top 10 most discerning features of the underage tweets.