Heather Cole-Lewis, Arun Varghese, Amy Sanders, Mary Schwarz, Jillian Pugatch, Erik Augustson.
Abstract
BACKGROUND: Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real time can provide important insight into trends in the public's knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions.
Keywords: Twitter; e-cigarette; machine learning; social media
Year: 2015 PMID: 26307512 PMCID: PMC4642404 DOI: 10.2196/jmir.4392
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Supervised machine learning-based e-cigarette tweet classification categories (interrater reliability score for manual annotation).
| Classification (Fleiss’ kappa) | Labels |
| Relevance^a: Identifies tweets that are related to e-cigarettes (0.70) | Relevant |
| | Subcategory: retweet with no additional information |
| | Subcategory: original tweets that were part of a conversation and require greater context to be interpreted |
| | Subcategory: duplicated tweets from a user account that had since been suspended or was primarily being used for spam or unwanted solicitations |
| | Not relevant |
| Sentiment^b: Indicates whether the stance in the tweet is positive, neutral, or negative towards e-cigarettes and users of e-cigarettes (0.65) | Positive |
| | Neutral |
| | Negative |
| User description^b: Characterizes the sender of the tweet based on information gleaned from the user profile (0.66) | Celebrity |
| | Government |
| | Foundations or organizations |
| | Reputable news source |
| | Everyday people |
| | E-cigarette community movement |
| | Retailers |
| | Tobacco company |
| | Bots/hacked |
| Genre: Represents the format of the tweet (0.64) | Information |
| | First person e-cig use or intent |
| | Second/third person experience |
| | Personal opinion |
| | Marketing |
| | News/update |
| Theme: Refers to the topical domain of the content in the tweet (0.65) | Cessation |
| | Health and safety |
| | Underage usage |
| | Craving |
| | Other substances |
| | Illicit substance use in e-cigs |
| | Policy or government |
| | Parental use of e-cigs |
| | Advertisement/promotion |
| | Flavors |
^a A binary version of this category was created in addition to the multiclass version for the purposes of the analysis.
^b Categories were mutually exclusive and thus analyzed as multiclass.
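The interrater reliability scores in the table above are reported as Fleiss' kappa. As a rough illustration of how such a score is obtained, here is a minimal sketch of the standard Fleiss' kappa formula in plain Python. The function name `fleiss_kappa` and the toy annotation counts are hypothetical; the source does not describe how the authors computed their scores or give the underlying annotation data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical data: 4 tweets, 3 annotators, columns = positive/neutral/negative.
toy_sentiment_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(toy_sentiment_counts), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the reported values of 0.64 to 0.70 sit between those two extremes.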
Supervised machine learning-based e-cigarette tweet classification performance results.
| Classifier labels | Best n-gram | Accuracy score | % achieved of possible improvement over random baseline |
| Relevance category^a | 1 | 0.75 | 57.25 |
| Relevance | 1 | 0.94 | 75.26 |
| User description^a | 2 | 0.68 | 41.59 |
| Sentiment^a | 2 | 0.76 | 46.05 |
| News | 1 | 0.93 | 52.26 |
| Info | 4 | 0.86 | 41.75 |
| Personal experience | 2 | 0.84 | 50.17 |
| Second person | 2 | 0.92 | 47.09 |
| Personal opinion | 2 | 0.79 | 48.93 |
| Marketing | 1 | 0.91 | 72.56 |
| Cessation | 1 | 0.95 | 58.43 |
| Health and safety | 1 | 0.90 | 56.29 |
| Underage usage | 1 | 0.97 | 58.92 |
| Craving | 2 | 0.97 | 58.43 |
| Other substances^b | 1 | 0.99 | 49.42 |
| Illicit substances | 2 | 0.98 | 48.24 |
| Policy or government | 1 | 0.94 | 80.62 |
| Parental use | 1 | 0.99 | 54.40 |
| Ad or promotion | 1 | 0.89 | 72.69 |
| Flavor | 1 | 0.97 | 62.52 |
^a Classifiers were multiclass. All other categories were binary.
^b k-nearest neighbors (kNN) was the best performing classification technique; for all other cases, linear support vector machine (SVM) was best.
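The last column of the table above reports the percentage achieved of the possible improvement over a random baseline. The exact formula and the baseline values are not given in this record, so the sketch below is only one plausible reading: the gain over the baseline accuracy divided by the largest gain possible (reaching an accuracy of 1.0). The function name `pct_of_possible_improvement` and the example numbers are hypothetical.

```python
def pct_of_possible_improvement(accuracy, baseline):
    """Share (in %) of the gap between a baseline and perfect accuracy
    that a classifier closes.

    Assumed definition (not confirmed by the source):
        100 * (accuracy - baseline) / (1 - baseline)
    """
    if not 0.0 <= baseline < 1.0:
        raise ValueError("baseline must be in [0, 1)")
    return 100.0 * (accuracy - baseline) / (1.0 - baseline)


# Hypothetical example: a binary classifier at 0.75 accuracy against a
# 0.50 random baseline closes half of the remaining gap.
print(pct_of_possible_improvement(0.75, 0.50))
```

Under this reading, two classifiers with the same accuracy can score differently, because a label that is heavily imbalanced has a higher random baseline and therefore less room for improvement. That would be consistent with the table, where very high accuracies such as 0.99 do not always correspond to the highest improvement percentages.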
Figure 1. Learning curve for tweet relevance classification.
Figure 2. Learning curve for tweet topic classification.