Annice Kim, Thomas Miano, Robert Chew, Matthew Eggers, James Nonnemaker.
Abstract
BACKGROUND: Despite concerns about their health risks, e‑cigarettes have gained popularity in recent years. Concurrent with the recent increase in e‑cigarette use, social media sites such as Twitter have become a common platform for sharing information about e‑cigarettes and for marketing them. Monitoring trends in e‑cigarette-related social media activity requires timely assessment of the content of posts and the types of users generating that content. However, little is known about the diversity of the types of users responsible for generating e‑cigarette-related content on Twitter.
Keywords: electronic cigarettes; machine learning; social media
Year: 2017; PMID: 28951381; PMCID: PMC5635233; DOI: 10.2196/publichealth.8060
Source DB: PubMed; Journal: JMIR Public Health Surveill; ISSN: 2369-2960
Figure 1. Approach to classifying Twitter users who tweet about e-cigarettes.
Manual classification of Twitter users who tweet about e-cigarettes: user type definitions and proportion of each type in manually labeled sample.
| Type | Definition | Sample, N |
| Individual | The account of a real person whose Twitter profile information and tweets reflect their individual thoughts and interests. An individual is someone whose primary post content is not about vaping. | 2168 |
| Vaper enthusiast | The account of a person or organization whose primary content is related to promoting e‑cigarettes but is not primarily trying to sell e‑cigarettes or related products. | 334 |
| Informed agency^a | | 622 |
| News media | The account of a newspaper, magazine, news channel, etc. News media does not include vaping-specific news sources. | |
| Health community | The account of a public health organization, coalition, agency, or credible individual affiliated with an organization. These may also be the accounts of organizations with authority on a topic that should be thought of as | |
| Marketer^a | | 752 |
| Marketer | An account marketing e‑cigarette or vaping products. These accounts can belong to a Web-based or brick-and-mortar retailer or an individual who is an affiliate marketer. | |
| Information aggregator | An account that primarily aggregates information about e‑cigarettes/vaping and where most or all tweets are news articles related to e‑cigarettes/vaping. This account could also aggregate vaping coupons or deals. | |
| Spammer | An account that does not fall into one of the other coding categories. These accounts often post on a broad range of topics unrelated to this project, and their content can be nonsensical. Anecdotally, it was observed that many of these accounts exhibited | 1021 |
^a During manual annotation of data, we initially categorized subtypes of the informed agency (ie, news media and health community) and marketer (ie, marketer and information aggregator) user types, but we did not identify sufficient numbers of user handles for these subtypes to conduct meaningful analyses. Thus, during the feature selection and modeling phases, we collapsed across user subtypes to define five total user types.
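The caption refers to the proportion of each type in the labeled sample, but only counts are listed. As a minimal sketch, the shares can be recomputed from those counts. The assignment of the 622, 752, and 1021 counts to informed agency, marketer, and spammer is an assumption based on the footnote and the five user types in the results table; the function name `class_proportions` is hypothetical.

```python
# Sample counts per collapsed user type, read from the table above.
# ASSUMPTION: 622, 752, and 1021 belong to informed agency, marketer,
# and spammer respectively; the extracted table does not label them.
SAMPLE_COUNTS = {
    "Individual": 2168,
    "Vaper enthusiast": 334,
    "Informed agency": 622,
    "Marketer": 752,
    "Spammer": 1021,
}


def class_proportions(counts):
    """Return each class's share of the labeled sample, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


proportions = class_proportions(SAMPLE_COUNTS)
for name, pct in proportions.items():
    print(f"{name:<17} {SAMPLE_COUNTS[name]:>5} {pct:5.1f}%")
```

Under this reading the classes are unbalanced: individuals are the largest group and vaper enthusiasts the smallest, which is worth keeping in mind when comparing per-class scores.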
Classification of Twitter users who tweet about e-cigarettes: Gradient Boosting Regression Trees (GBRT) results comparing full model and metadata-only model.
| User type | Full model (metadata + derived data) | | | Metadata-only model | | |
| | F1 score, % | Recall, % | Precision, % | F1 score, % | Recall, % | Precision, % |
| Individual | 91.1 | 92.3 | 89.8 | 83.6 | 86.2 | 81.2 |
| Vaper enthusiast | 47.1 | 40.0 | 57.1 | 16.2 | 12.0 | 25.0 |
| Informed agency | 84.4 | 78.5 | 91.3 | 70.0 | 67.7 | 72.4 |
| Marketer | 81.2 | 85.9 | 77.0 | 65.6 | 72.6 | 59.9 |
| Spammer | 79.5 | 81.1 | 78.0 | 74.8 | 71.9 | 78.0 |
| Average | 83.3 | 83.7 | 83.3 | 72.7 | 73.7 | 72.3 |
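The first value in each group of three is consistent with an F1 score, the harmonic mean of recall and precision. The sketch below checks that reading against the per-class rows of the table. The Average row is left out because an average of per-class scores need not equal the F1 of the averaged recall and precision. The names `f1_score`, `ROWS`, and `max_deviation` are hypothetical, and the column interpretation is an inference from the numbers, not something the table states.

```python
def f1_score(recall_pct, precision_pct):
    """Harmonic mean of recall and precision, both given in percent."""
    if recall_pct + precision_pct == 0:
        return 0.0
    return 2 * recall_pct * precision_pct / (recall_pct + precision_pct)


# (reported first value, recall, precision) per user type, copied from
# the table: full model first, then metadata-only model.
ROWS = {
    "Individual": [(91.1, 92.3, 89.8), (83.6, 86.2, 81.2)],
    "Vaper enthusiast": [(47.1, 40.0, 57.1), (16.2, 12.0, 25.0)],
    "Informed agency": [(84.4, 78.5, 91.3), (70.0, 67.7, 72.4)],
    "Marketer": [(81.2, 85.9, 77.0), (65.6, 72.6, 59.9)],
    "Spammer": [(79.5, 81.1, 78.0), (74.8, 71.9, 78.0)],
}


def max_deviation(rows):
    """Largest gap between a reported value and the recomputed F1."""
    return max(
        abs(reported - f1_score(recall, precision))
        for triples in rows.values()
        for reported, recall, precision in triples
    )


print(f"largest deviation: {max_deviation(ROWS):.2f} percentage points")
```

Every recomputed value lands within a tenth of a percentage point of the reported one, which is what rounding of the published recall and precision would produce.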
Figure 2. Distributions of manually labeled versus model-predicted classification of Twitter users who tweet about e-cigarettes.
Figure 3. Two-dimensional t-SNE visualization of Twitter users who tweet about e-cigarettes.
Ten most important features in predicting Twitter users who tweet about e-cigarettes across all user types.
| Features^a | Proportion of feature importance among all variables, % |
| Statuses count | 5.1 |
| Followers count | 4.1 |
| Original tweet raw keyword count | 3.7 |
| Profile description keyword count | 3.3 |
| Original tweet cosine similarity mean | 3.2 |
| Retweet cosine similarity mean | 3.0 |
| Friends count | 3.0 |
| Retweet raw keyword count | 3.0 |
| Listed count | 2.9 |
| Original tweet URL count mean | 2.7 |
| Favorites count | 2.7 |
^a Most important feature for each user type. Individual: favorites count (4.9%); Vaper enthusiast: retweet raw keyword count (8.3%); Informed agency: followers count (6.5%); Marketer: original tweet raw keyword count (8.9%); Spammer: statuses count (8.1%).
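Several of the listed features are derived from tweet text rather than read directly from account metadata. The sketch below shows one plausible way to compute a raw keyword count and a mean cosine similarity for a user's tweets. It is not the authors' implementation: the keyword list, the tokenizer, the use of bag-of-words term counts, and the choice to average pairwise similarity across a user's own tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# HYPOTHETICAL keyword list; the source does not give the actual terms.
KEYWORDS = ("ecig", "vape", "vaping", "eliquid")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_keyword_count(text, keywords=KEYWORDS):
    """Count how many tokens in the text match any keyword."""
    counts = Counter(tokenize(text))
    return sum(counts[k] for k in keywords)


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(tweets):
    """Average cosine similarity over all pairs of a user's tweets.

    ASSUMPTION: the mean is taken over tweet pairs within one account.
    Returns 0.0 when there are fewer than two tweets.
    """
    vectors = [Counter(tokenize(t)) for t in tweets]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


tweets = ["Best vape deals today", "Best vape deals today", "Morning coffee run"]
print(sum(raw_keyword_count(t) for t in tweets))
print(round(mean_pairwise_similarity(tweets), 3))
```

Under these assumptions, an account that repeats near-identical promotional tweets would score high on both measures, while an account posting varied, unrelated content would score low.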
Figure 4. Partial dependence plots of top features by user type for users who tweet about e-cigarettes.