William Baker, Jason B Colditz, Page D Dobbs, Huy Mai, Shyam Visweswaran, Justin Zhan, Brian A Primack.
Abstract
BACKGROUND: Twitter provides a valuable platform for the surveillance and monitoring of public health topics; however, manually categorizing large quantities of Twitter data is labor intensive and presents barriers to identifying major trends and sentiments. Additionally, although machine learning and deep learning approaches with high accuracy have been proposed, they require large annotated data sets. Public pretrained deep learning classification models, such as BERTweet, produce higher-quality models while using smaller annotated training sets.
Keywords: deep learning; infoveillance; social media; transformer models; vaping
Year: 2022 PMID: 35862172 PMCID: PMC9353682 DOI: 10.2196/33678
Source DB: PubMed Journal: JMIR Med Inform
Descriptions of labels used for annotating vaping-related tweets.
| Labels | Descriptions | Example quotes |
| Relevant | Is the tweet in English and related to the vaping topic at hand (eg, vape use or users, vaping devices, or products)? | |
| Not relevant | Typically, non-English tweets or tweets that referenced vaping cannabis products specifically. | |
| Commercial | Is the tweet an advertisement/marketing for vaping products? | |
| Noncommercial | Includes tweets that demonstrate favorability toward a product but do not directly advocate for purchasing it. | |
| Positive | The tweet is associated with positive emotions or contexts regarding vaping. | |
| | The tweeter is currently vaping, has recently vaped, or is going to vape. | |
| | The tweeter shows positivity toward, or neutral acceptance of, others’ usage or others’ positive comments about vaping. | |
| | The tweeter mentions a vape pen in association with other positive aspects of society or popular culture. | |
| | The tweeter asks a question using first-person pronouns. | |
| Negative | The tweet is associated with negative emotions or contexts regarding vaping. | |
| | The tweeter believes smoking a vape is disgusting, uncool, or unattractive. | |
| | The tweeter criticizes/ridicules others for using a vape. | |
| | The tweeter prefers to use a different substance, such as cigarettes or marijuana. | |
| Neutral | The tweet is factual but not opinionated or is a question about unbiased facts/information about vaping. | |
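The label scheme above is hierarchical, and the data set description notes that sentiment was coded only for relevant tweets. A minimal sketch of how one annotated tweet might be represented follows. The `TweetAnnotation` record, its field names, and the rule that the commercial label is also restricted to relevant tweets are assumptions for illustration, not the authors' data format.

```python
from dataclasses import dataclass
from typing import Optional

SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class TweetAnnotation:
    """One annotated tweet (hypothetical layout, not the authors' format)."""
    text: str
    relevant: bool
    commercial: Optional[bool] = None  # assumed: coded only for relevant tweets
    sentiment: Optional[str] = None    # coded only for relevant tweets

    def __post_init__(self):
        # A tweet judged not relevant should carry no downstream labels.
        if not self.relevant and (
            self.commercial is not None or self.sentiment is not None
        ):
            raise ValueError("non-relevant tweets carry no further labels")
        if self.sentiment is not None and self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")


# Placeholder text only; no real tweet is quoted here.
example = TweetAnnotation(
    "placeholder tweet text", relevant=True, commercial=False, sentiment="neutral"
)
```

Validating at construction time keeps inconsistent label combinations out of the training data, which matters when the positive and negative classes are as small as they are here.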
Description of annotated training and test data sets (N=2401).[a]
| Targets | Number of tweets with a positive target, n (%) | Number of tweets with a negative target, n (%) | Number of tweets with a neutral target, n (%) |
| Relevance | Relevant: Total: 1802 (75.05) Training: 1637 (90.84) Test: 165 (9.16) | Nonrelevant: Total: 599 (24.95) Training: 524 (87.48) Test: 75 (12.52) | N/A[b] |
| Commercial | Commercial: Total: 117 (4.87) Training: 106 (90.60) Test: 11 (9.40) | Noncommercial: Total: 1685 (70.18) Training: 1516 (89.97) Test: 169 (10.03) | N/A |
| Sentiment | Positive: Total: 172 (7.16) Training: 158 (91.86) Test: 14 (8.14) | Negative: Total: 130 (5.41) Training: 119 (91.54) Test: 11 (8.46) | Neutral: Total: 1372 (57.14) Training: 1229 (89.58) Test: 143 (10.42) |
[a] Percentages may not add up to 100% as classification was made for sentiment only if the tweet was relevant.
[b] Sentiment-only code with neutral target.
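The percentages in this table use two different denominators: each "Total" is a share of all 2401 annotated tweets, whereas "Training" and "Test" are shares of that class's own total. A short sketch reproducing the Relevance row makes this explicit. The helper name `pct` and the dictionary layout are illustrative choices.

```python
N_TOTAL = 2401  # all annotated tweets


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


relevant = {"total": 1802, "training": 1637, "test": 165}

share_of_all = pct(relevant["total"], N_TOTAL)               # vs all tweets
train_share = pct(relevant["training"], relevant["total"])   # vs class total
test_share = pct(relevant["test"], relevant["total"])        # vs class total

print(share_of_all, train_share, test_share)  # prints: 75.05 90.84 9.16
```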
Comparison of BERTweet and LSTM[a] F1 and AUROC[b] scores.
| Classifier/metric | Relevance | Commercial | Sentiment |
| BERTweet: F1 | 0.976 | 0.990 | 0.861 |
| BERTweet: AUROC | 0.945 | 0.993 | 0.817 |
| LSTM: F1 | 0.924 | 0.727 | 0.250 |
| LSTM: AUROC | 0.924 | 0.903 | 0.776 |
[a] LSTM: long short-term memory.
[b] AUROC: area under the receiver operating characteristic curve.
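For readers less familiar with the two metrics, the following is a minimal pure-Python sketch of F1 and AUROC for a binary target. The labels and scores are invented toy values, not study data, and the 0.5 decision threshold is an assumption. How the authors averaged F1 over the three sentiment classes is not stated in this record, so only the binary case is shown.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, three negatives.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.45, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_f1 = f1_score(y_true, y_pred)
toy_auroc = auroc(y_true, scores)
print(round(toy_f1, 3), round(toy_auroc, 3))  # prints: 0.5 0.778
```

F1 depends on the decision threshold, while AUROC depends only on how the scores rank positives against negatives. In the toy data, two positives fall below the threshold, so F1 drops even though the ranking is fairly good. This may be one reason a classifier can show a low F1 alongside a moderate AUROC, as in the LSTM sentiment column, although the record itself does not state the cause.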