Literature DB >> 27410031

Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter.

Eric M Clark^1,2,3,4, Chris A Jones^4,5,6, Jake Ryland Williams^2,3,7, Allison N Kurti⁵, Mitchell Craig Norotsky⁴, Christopher M Danforth^1,2,3, Peter Sheridan Dodds^1,2,3.

Abstract

BACKGROUND: Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples.
METHODS: All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012 through December 2014 (approximately 850,000 total tweets) were identified and categorized as Automated or Organic by combining a keyword classification and a machine trained Human Detection algorithm. A sentiment analysis using Hedonometrics was performed on Organic tweets to quantify the change in consumer sentiments over time. Commercialized tweets were topically categorized with key phrasal pattern matching.
RESULTS: The overwhelming majority (80%) of tweets were classified as automated or promotional in nature. The majority of these tweets were coded as commercialized (83.65% in 2013), up to 33% of which offered discounts or free samples and appeared on over a billion twitter feeds as impressions. The positivity of Organic (human) classified tweets has decreased over time (5.84 in 2013 to 5.77 in 2014) due to a relative increase in the negative words 'ban', 'tobacco', 'doesn't', 'drug', 'against', 'poison', 'tax' and a relative decrease in the positive words like 'haha', 'good', 'cool'. Automated tweets are more positive than organic (6.17 versus 5.84) due to a relative increase in the marketing words like 'best', 'win', 'buy', 'sale', 'health', 'discount' and a relative decrease in negative words like 'bad', 'hate', 'stupid', 'don't'.
CONCLUSIONS: Due to the youth presence on Twitter and the clinical uncertainty of the long term health complications of electronic cigarette consumption, the protection of public health warrants scrutiny and potential regulation of social media marketing.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2016 PMID： 27410031 PMCID： PMC4943591 DOI： 10.1371/journal.pone.0157304

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Electronic Nicotine Delivery Systems, or e-cigs, have become a popular alternative to traditional tobacco products. The vaporization technology present in e-cigarettes allows consumers to simulate tobacco smoking without igniting the carcinogens found in tobacco [1]. Survey methods have revealed widespread awareness of e-cigarette products [2, 3]. The health risks [4-7], marketing regulations [8], and the potential of these devices as a form of nicotine replacement therapy [9-11] are hotly debated politically [12] and investigated clinically [13, 14]. The CDC reports that more people in the US are addicted to nicotine than any other drug and that nicotine may be as addictive as heroin, cocaine, and alcohol [15-18]. Nicotine addiction is extremely difficult to quit, often requiring more than one attempt [18, 19], however nearly 70% of smokers in the US want to quit [20]. Data mining can provide valuable insight into marketing strategies, varieties of e-cigarette brands, and their use by consumers [21-25]. Twitter, a mainstream social media outlet comprising over 230 million active accounts, provides a means to survey the popularity and sentiment of consumer opinions regarding e-cigarettes over time. Individuals post tweets which are short text based messages restricted to 140 characters. Using data mining techniques, roughly 850,000 tweets containing mentions of e-cigarettes were collected from a 10% sample of Twitter’s garden hose feed spanning from January 2012 though December 2014. This analysis extends a preliminary study [26] which analyzed all e-cigarette related tweets spanning May through June 2012. As Twitter has become a mainstream social media outlet, it has become increasingly enticing for third parties to gamify the system by creating self-tweeting automated software to send messages to organic (human) accounts as a means for personal gain and for influence manipulation [27]. We recently introduced a classification algorithm that is based upon three linguistic attributes of an individual’s tweets [28]. The algorithm analyzes the average hyperlink (URL) count per tweet, the average pairwise dissimilarity between an individual’s tweets, and the unique word introduction decay rate of an individual’s tweets. All tweets mentioning e-cigarettes were categorized using a two-tier classification process. Tweets containing an abundance of marketing slang (‘free trial’, ‘starter kit’, ‘coupon’) are immediately categorized as automated. All of the tweets from individuals that have mentioned an e-cigarette keyword are collected in order to classify the remaining tweets per individual as either organic or automated. The machine learning classifier was trained on the natural linguistic cues from human accounts to identify promotional and SPAM entities by exclusion. The manipulative effects, agendas, and ecosystem of generalized social media marketing campaigns have been identified and extensively studied [29-31]. Other work, [32], has distinguished between purely automated accounts, or “robots”, and human assisted automated accounts referred to as “cyborgs”. On Twitter, these campaigns have also been characterized using Markov Random Fields to classify accounts as either promotional or organic [33]. This study was able to achieve very high classification accuracy, but was working under a much shorter time frame (1 month) and was trained on all relevant tweets authored within this time window. Our study compiled a 10% sample of tweets over a three-year period, so we relied on a classifier that was trained on smaller samples of tweets per individual. The emotionally charged words that contribute to the positivity of various subsets of tweets from each category were quantitatively measured using hedonometrics [34, 35]. Outliers in both the positivity and frequency time-series distributions correspond to political debates regarding the regulation of e-cigarettes. Recent studies [36-40] report an alarmingly rapid increase in the youth awareness and consumption of electronic cigarettes; a Michigan study found that the use of e-cigarettes surpass tobacco cigarettes among teens [41]. The CDC reports that “the number of never-smoking youth increased three-fold from approximately 79,000 in 2011 to 263,000 in 2013” [42]. During this time-period there has also been a substantial (256%) increase in youth exposure to electronic cigarette television marketing campaigns [43]. Due to the high youth presence on Twitter [44] as well as the clinical uncertainty regarding the risks associated with e-cigarettes, understanding the effect of promotionally marketing vaporization products across social media should be immediately relevant to public health and policy makers.

Materials and Methods

Data Collection

An exhaustive search from the 10% “garden hose” random sample from Twitter’s streaming API spanning 2012 through 2014 yielded approximately 850,000 tweets mentioning a keyword related to electronic cigarettes including: e(-)cig, e(-)cigarette, electronic cigarette, etc. All tweets were tokenized by removing punctuation and performing a case insensitive pattern match for keywords. Using time zone meta-data the tweets were converted into their local post time, in order for a more accurate ordinal sentiment analysis. The language, reported by Twitter, and user features were also collected and analyzed. The data from our study was collected via a program developed by Dodds et al, that pings Twitter’s streaming API and complies with Twitter’s Terms of Service. Our study collected each account’s unique twitter user identification number in order to classify them as either Automated or Organic, however our published data has been anonymized by replacing Twitter’s UserIDs with placeholder values.

Automation Classification

As reported in [26] there is a high prevalence of automation among e-cigarette related tweets. Many of these messages were promotional in nature, offering discounted or free samples or advertising specific electronic cigarette paraphernalia. A human detection algorithm defined and tested in [28] was implemented to classify accounts as either automated or organic (human in nature). The original classifier was trained on 1000 accounts—752 were identified as humans and 248 as automated accounts. The classifier operates by isolating organic linguistic characteristics and identifies automated accounts by exclusion. All tweets from each individual appearing in our dataset were collected for the classifier. For each individual, the average URL count, average tweet dissimilarity, and word introduction decay rate were calculated for the individuals with at least 25 sampled tweets. The majority (94%) of commercial e-cigarette tweets collected by [26] contain a hyperlink (URL). The average URL count per tweet has been demonstrated to be a strong feature for detecting robotic accounts [45-47]. Many algorithmically generated tweets contain similar structures with minor character replacements and long chains of common substrings, as opposed to Organic content. The Pairwise Tweet Dissimilarity of tweets t, t from a particular individual was estimated by subtracting the length (number of characters) of the longest common subsequence, |LCS(t, t)| from the length of both tweets, |t| + |t| and normalizing by the total length of both tweets: For example, given the two tweets: (t1, t2) = (I love tweeting, I love spamming). Then |t1| = 16, |t2| = 15, LCS(t1, t2) = |I love | = 7 (including whitespace) and we calculate the pairwise tweet dissimilarity as: The average tweet dissimilarity of the individual was then estimated by finding the arithmetic mean of each individual’s calculated pairwise tweet dissimilarity. Since automated and promotional accounts have a structured and limited vocabulary, the unique word introduction decay rate introduced in [48] serves as another useful attribute to detect automated accounts. Using these attributes, the calibrated human detection algorithm, tested in [28], detected over 90% of automated accounts from a mixed 1000 user sample with less than a 5% false positive rate. The Human Detection Algorithm was calibrated for a range of tweet sample sizes from hand classified Organic accounts. Ordinal samples of collected tweets from each account were binned into partitions of 25 ranging from 25 to a maximum of 500 tweets. Table 1 below lists the number of automated and organic classified accounts per year. Individuals with less than 25 sampled tweets were not classified with the detection algorithm.

Table 1

Electronic Cigarette Tweet Category Counts and Twitter Account Classification.

Year	Tweet Categorization				Account Classification
	Total	Automated	Organic	Discarded	Automated	Organic	N/A*
2012	107,918	85,546	13,492	8,880	12,715	12,052	19,512
2013	426,306	339,111	76,037	11,158	64,874	59,376	120,142
2014	316,424	234,972	68,698	12,754	54,033	63,289	48,528

*Accounts with less than 25 tweets were not classified.

*Accounts with less than 25 tweets were not classified. To benchmark the accuracy of the detection algorithm on this sample of tweets, a random sample of 500 accounts algorithmically classified as automatons and 500 classified as Organic were hand classified. All collected tweets were hand coded by two evaluators. Tweets were reviewed until the evaluator noticed the presence of automation. If no subset of tweets appeared to be algorithmically generated, the individual was coded as human. Both evaluators had prior experience distinguishing algorithmic versus organic tweets. Refer to the supplementary materials in [28] for a detailed explanation of this annotation process. In Fig 1, features of each of these 1000 sampled individuals are plotted in three dimensions. Organic features (green) are densely distributed, while the automated features (red points) are more dispersed. The black lines illustrates the organic feature cutoff for the classifier; individuals with features falling outside of the box are classified as automatons. On this sampled set of accounts, the classification algorithm exhibited a 94.6% True Positive rate with a 12.9% False Positive Rate.

Fig 1

Tweets from a random sample of 500 organic classified and 500 automated classified accounts were hand coded to gauge the accuracy of the detection algorithm.

The feature set of each sampled individual is plotted in three dimensions. The traced box indicate the organic feature cutoff. True Positives (red) are correctly identified automatons, True Negatives (green) are correctly identified Humans, False Negatives (blue) are automatons classified as humans and False Positives (orange) are humans classified as automatons.

Tweets from a random sample of 500 organic classified and 500 automated classified accounts were hand coded to gauge the accuracy of the detection algorithm.

Categorization by Topics

Tweets with at least 3 advertising jargon references (e.g. coupon, starter kit, free trial) were immediately classified as automated. All posts from users with at least 10 marketing classified tweets were also flagged as automated. As noted in [26], some Organic users could retweet promotional content for rewards (e.g. winning free samples or discounts). All of these tweets were still classified as automated, but the user was not flagged as such. The remaining tweets were classified as either automated or organic by the human detection algorithm. Posts from users who had an insufficient number of sampled tweets (<25) to algorithmically classify and who hadn’t posted commercial content were classified as Organic. Due to the high prevalence of hyperlinks included in tweets from promotional accounts, Tweets with URLs whose user had insufficient tweets to classify algorithmically were discarded (3.85% total tweets). A final list with each tweet classification coding is created by merging the commercial keyword classification with the results from the Human Detection Algorithm.

Results and Discussion

The number of automated, and in particular promotional, tweets vastly overwhelm (80.7%) the organic (see Fig 2). The identified automated accounts tweet e-cigarette content with much higher frequency than the Organic users. The average number of automated tweets per user was 1.96 with a standard deviation of 35.06 and a max of 14,310. Average organic posts per user were 1.44 with a standard deviation of 4.01 and max of 356 tweets. A total of 607,446 Automated Tweets provided a URL (92.09%).

Fig 2

Left: Binned User E-cigarette Keyword Tweet Distribution (2012-2014). Right: 2013 Automated Tweet Rank-Frequency Word Cloud. High frequency stop words (‘of’, ‘the’, etc.) are removed from the rank-frequency word distribution. Frequency WordClouds (see Fig 2) illustrate the most frequently used words by the Automated category. The size of the text reflects the ranked word frequencies. Marketing key words (Free Trial, Brand, Starter Kit, win, Sale) and brand names (V2, Apollo) are prevalent, illustrating commercial intent. Many automated tweets also refer to the health benefits of switching to electronic cigarettes (#EcigsSaveLives), even though they have not been officially approved as such by the Food and Drug Administration, [49, 50]. See Table 2 for sub categorical counts of the automated tweets.

Table 2

Automated Tweet Subcategory Counts.

Subcategory	Count	Percentage	Impressions	Relevance*	Year
Commercial	53,471	62.51%	59.74M	88.4%	‘12
	283,677	83.65%	195.25M		‘13
	149,333	63.55%	951.03M		‘14
Cessation	6,392	7.47%	8.59M	90.8%	‘12
	6,599	1.95%	25.64M		‘13
	8,386	3.57%	42.72M		‘14
Discount	26,596	31.09%	27.02M	89.8%	‘12
	112,720	33.24%	38.21M		‘13
	37,735	16.06%	160.49M		‘14
Flavor	1,685	1.97%	2.24M	81%	‘12
	2,715	0.80%	4.79M		‘13
	6,133	2.61%	17.51M		‘14

*Relevant percentage of 500 randomly sampled tweets

Tweet Sentiment Analysis

Hedonometrics are performed on the organic subset of electronic cigarette tweets to quantify the change in user sentiments over time. Using the happiness scores of English words from LabMT [34], along with its multi-language companion [35] the average emotional rating of a corpus is calculated by tallying the appearance of words found in the intersection of the word-happiness distribution and a given corpus, in this case subsets of tweets. A weighted arithmetic mean of each word’s frequency, fword, and corresponding happiness score, h for each of the N words in a text yields the average happiness score for the corpus, : The average happiness of each word, h lies on a 9 point scale: 1 is extremely negative and 9 is extremely positive. Neutral words (4 ≤ h ≤ 6), aka ‘stop words’, were removed from the analysis to bolster the emotional signal of each set of tweets. Fig 3 shows that automated electronic cigarette tweets are using very positive language to promote their products. The average happiness of the Organic tweets are much more stable, and are becoming slightly more negative over time. Both distributions have a sudden drop in positivity during December 2013, around a debate regarding new e-cigarette legislation by the European Union. These tweets, labeled #EuEcigBan, are investigated separately in the next section. The words that have the largest contributions to changes in sentiments are investigated with Word-shift graphs.

Fig 3

Categorical Tweet Word-shift Graphs: On the left, Organic Tweets from 2013 are the reference distribution to compare sentiments of Organic Tweets made in 2014 where we see a negative shift in the calculated average word happiness.

Categorical Tweet Word-shift Graphs: On the left, Organic Tweets from 2013 are the reference distribution to compare sentiments of Organic Tweets made in 2014 where we see a negative shift in the calculated average word happiness.

Due to tweets tagged #EUEcig Ban, January 2014 and December 2013 are omitted. The computed average happiness (havg) decreases from 5.82 to 5.77 due to both an increase in the negative words ‘tobacco’, ‘drug’, ‘ban’, ‘poison’, and a decrease in the positive words ‘love’, ‘like’, ‘haha’, ‘cool’ among others. On the right, Organic Tweets from 2013 are the reference distribution to compare Automated Tweets from 2013. The words ‘free’ and ‘trial’ are excluded from the graph, since their high frequency and happiness scores distorts the image. With these key words included the the automated tweet havg increases from 6.17 to 6.59. Word-shift graphs, introduced in [34], illustrate the words causing an emotional shift between two word frequency distributions. A reference period (T), creates a basis of the emotional words being used to compare with another period, (T). The top 50 words responsible for a happiness shift between the two periods are displayed, along with their contribution to shifting the average happiness of the tweet-set. The arrows (↑, ↓) next to a word indicate an increase or decrease, respectively, of the word’s frequency during the comparison period with respect to the reference period. The addition and subtraction signs indicate if the word contributes positively or negatively, respectively, to the average happiness score. Marketing accounts that delivered personalized advertising by attempting to impersonate organic users were prevalent among these commercial entities. These accounts, along with the traditional marketing robots, were diluting the data with extremely positive sentiments regarding their products. Using hedonometrics, we distinguish the emotionally charged words that influence a shift in computed average word happiness between these types of accounts. The sentiment analysis helps to characterize the thematic differences between Organic and Automated entities. In Fig 3, below, Word-shift graphs compare the change in Organic sentiments over time, as well as the difference in sentiments between automated and organic tweets. On the left, the 2013 Organic Tweet distribution is used as a reference to compare sentiments from 2014 Organic Tweets. December 2013 and January 2014 are removed to dampen the effect of tweets mentioning the #EUecigBan (see S1 Fig). The average happiness score decreases from 5.84 in 2013 to 5.77 in 2014. This decrease in the average happiness score is due to a relative increase in the negative words ‘ban’, ‘tobacco’, ‘doesn’t’, ‘drug’, ‘against’, ‘poison’, ‘tax’; a relative decrease in the positive words ‘haha’, ‘good’, ‘cool’. Notably, there is also relatively less usage of the words ‘quit’, ‘addicted’, and an increase in ‘health’, ‘kids’, ‘juice’. On the right, Organic tweets from 2013 is the reference distribution to compare Automated tweets from the same year. Automated tweets are more positive (6.17-6.59 versus 5.84) due to a relative increase in the marketing words ‘best’, ‘win’, ‘buy’, ‘sale’, ‘health’, ‘discount’, etc and a relative decrease in the negative words ‘bad’, ‘hate’, ‘stupid’, ‘don’t’, among others.

Sub-Categorical Tweet Topics

Pertinent topics related to e-cigarette marketing regulation include kid-friendly flavors, smoking cessation claims, and price reduction (including free trials, and starter kits). The commercialized, smoking cessation claims, and discounts were primary topics in the foundational study [51] that identified these campaigns over a 2 month time window. We included the kid-friendly flavors topic in this list due to recent studies reporting their prevalence [10, 24] as well as its current spotlight in political controversy. Keywords from each of these topics are used to sub-classify the automated tweet set per year, see Table 2 below. Purely commercial tweets were those with any marketing keywords including: ‘buy’, ‘save’, ‘coupon(s)’, ‘discount’, ‘price’, ‘cost’, ‘deal’, ‘promo’, ‘money’, ‘sale’, ‘purchase’, ‘offer’, ‘review’, ‘code’, ‘win(ner)’, ‘free’, ‘starter kit(s)’, ‘premium’. The URL from each tweet was also analyzed for promotional keywords. Any URL with at least three mentions of the above keywords was enough to classify the tweet as commercial. When an individual on Twitter ‘follows’ another account, posts from these users appear on the ‘timeline’ of the individual. We quantify the social reach of each of these sub-categorical tweets by counting the total number of accounts’ ‘timelines’ who could have been exposed to the advertisement. To approximate this, we sum the number of followers from each individual’s tweets. The total number of impressions from the commercial category increases from 195.25 million to 951.03 million between 2013 to 2014, even though the total count has dropped from 283k to 149k. This implies that promotional accounts that are successful in deceiving Twitter’s SPAM detector may be gaining many more social links to broadcast their commercial context. In order to gauge the accuracy of these sub-categorical tweet topics, 500 tweets were randomly sampled from each category and were evaluated separately by two people to determine the relevance of the tweet to its categorization. The evaluators had a high level of concordance (84.8%) and the discrepancies were resolved and merged into a final list. Sampled tweets were highly relevant per category, the percentage for each is given in Table 2 below. Many automated tweets mentioned using electronic cigarettes as a cessation device, or as a safe alternative. Over 20,000 tweets were classified as cessation related, which potentially appeared on over 76.8 million individual’s Twitter feed as impressions. Although electronic cigarettes have not been conclusively authorized as an effective cessation device, [11] has demonstrated the infectiveness of electronic cigarettes to suppress nicotine cravings. It is also notable that these affiliate marketing accounts are advertising electronic cigarettes as a completely safe alternative to analog tobacco use, contrary to recent studies [52-55]. Cessation tweets were tallied using the keywords ‘quit’, ‘quitting’, ‘stop smoking’, ‘smoke free’, ‘safe’, ‘safer’, ‘safest’. Many of the purely commercialized tweets mentioned discounts or even free samples. These Discount tweets were categorized with the keywords ‘free trial’, ‘coupon(s)’, ‘discount(s)’, ‘save’, ‘sale’, ‘free (e)lectronic (cig)arette’. Tweets advertising flavors were tallied using the keywords ‘flavor(s)’ and ‘flavour(s)’ along with an extensive list of popular electronic cigarette flavors compiled from a distributor’s website (https://crazyvapors.com/e-liquid-flavor-list/). A noteworthy class of E-cigarette commercial-bots, are those that are masquerading as Organic users to spam pseudo-positive messages towards potential consumers. These “cyborgs”, as defined in [28, 33, 45], spam a positive message regarding a personal experience. One class of these automatons are sending contrived testimonies that e-cigarettes have successfully allowed them to quit smoking cigarettes. These messages are very intentionally structured and tend to swap a few words to appear organic. These messages also target specific individuals as a more personal form of marketing. The general tweet structure from a sample cyborg marketing strategy is given below: @USER {I,We} {tried,pursued} to {give up, quit} smoking. Discovered BRAND electronic cigarettes and quit in {#} weeks. {Marvelous,Amazing,Terrific}! URL @USER It’s now really easy to {quit,give up} smoking (cigarettes).—these BRAND electronic cigarettes are lots of {fun,pleasure}! URL @USER electronic cigarettes can assist cigarette smokers to quit, it’s well worth the cost URL @USER It’s {incredible,amazing}—the (really) {easy,painless} {answer,method} to quit cigarette smoking through BRAND electronic cigarettes URL I managed to quit smoking with these e-cigarettes, I highly recommend them: URL @USER @USER Its {amazing, extraordinary}—I (really) quit smoking after {#} yrs thanks to BRAND electronic cigarettes! URL Using cyborgs to mimic Organic Users for marketing purposes should be analyzed heavily, to gauge their impact and effectiveness on consumers.

Conclusion

Our study has identified an abundance of automated, and in particular, promotional tweets, and consequent organic sentiments. The collected categorized tweet data from this analysis is available for follow-up analyses into e-cigarette social media marketing campaigns. Future work can perform a deeper analysis on the URL content, similar to [23], posted by promotional accounts to get a better sense of the smoking cessation, flavor mentions, and discount prevalence. We take care not to downplay the well recognized health benefits from smoking cessation including: decreased risk of coronary artery disease, cerebrovascular disease, peripheral vascular disease, decreased incidence of respiratory symptoms such as cough, wheezing, shortness of breath, decreased incidence of chronic obstructive pulmonary disease, and decreased risk of infertility in women of childbearing age [15, 18, 56]. The greatest concern of promotional e-cigarette marketing on Twitter is the risk of enticing younger generations who otherwise may never have commenced consuming nicotine. Due to the unknown but unignorable long-term adverse health effects of electronic cigarettes and the alarmingly increased youth consumption, monitoring and potentially regulating social media commercialization of these products should be immediately relevant to public health and policy agendas.

European Union E-cigarette Ban Political Debate (#EUecigBan).

(Left) Word shift graph comparing tweets tagged #EUecigBan against 2013 English Organic User Tweets (untagged). (top-right) The automated and Organic tagged tweet distributions are plotted. A histogram displays the counts per language and user class. (bottom-right) Word clouds compare ranked-word frequencies across language and user type. Each categorical time-series exhibits a severe negative trend occurring between December 2013 and January 2014. There is an inverse relationship with the average happiness scores during this time period. This was during the time that the EU was debating strict regulation and a possible ban on specific e-cigarette products [12]. Hashtags (#) allow users to categorize the content of their tweets. During this period, 13,227 sampled tweets were tagged with #EUecigBan. In S1 Fig, a word shift graph (left) visualizes the sentiments from English Organic users using #EUecigBan versus the remaining Organic tweets from 2013. English Tweets tagged #EuEcigBan are the comparison distribution in reference to all other tweets from 2013. Tweets containing #EuEcigBan are on average much more negative (h 5.81 versus 5.37) due to an increase in the negative words ‘ban’, ‘stop’, ‘no’, ‘not’, ‘fight’, ‘against’, ‘disaster’, ‘death’, ‘corruption’, ‘tobacco’, ‘kills’, etc. The positive words also disfavor the legislation, with the words ‘save’, ‘millions’, ‘lives’, ‘support’, ‘healthy’ occurring more frequently. English, French, and German tagged tweets were the most prevalent, and word clouds help visualize themes between language and user class. This shows that Twitter sentiments can be useful in gauging public opinion toward regulation of electronic cigarettes. There is also a heavy automated tweet presence in each language with a similar attitude regarding the legislation, as depicted in the word clouds. Future work should also investigate if and how automated users can impact organic opinion on legislation. (PDF) Click here for additional data file.

Electronic Cigarette Table of Key Words.

List of all key words used in the analysis. Flavors compiled from https://crazyvapors.com/e-liquid-flavor-list/ Keywords other than ‘General Twitter Scrape’ were applied to categorize automated account tweets. (PDF) Click here for additional data file.

Twitter IDs.

List of all Twitter IDs appearing in the analysis. (TXT) Click here for additional data file.

35 in total

1. Quitting smoking among adults--United States, 2001-2010.

Authors:
Journal: MMWR Morb Mortal Wkly Rep Date: 2011-11-11 Impact factor: 17.586

2. Electronic-cigarette smoking experience among adolescents.

Authors: Jun Ho Cho; Eunyoung Shin; Sang-Sik Moon
Journal: J Adolesc Health Date: 2011-11 Impact factor: 5.012

3. Electronic nicotine delivery systems: is there a need for regulation?

Authors: Anna Trtchounian; Prue Talbot
Journal: Tob Control Date: 2010-12-07 Impact factor: 7.552

4. The scientific basis of tobacco product regulation.

Authors: D L Ashley; D Burns; M Djordjevic; E Dybing; N Gray; S K Hammond; J Henningfield; M Jarvis; K S Reddy; C Robertson; G Zaatari
Journal: World Health Organ Tech Rep Ser Date: 2008

5. Electronic cigarettes and conventional cigarette use among U.S. adolescents: a cross-sectional study.

Authors: Lauren M Dutra; Stanton A Glantz
Journal: JAMA Pediatr Date: 2014-07 Impact factor: 16.193

6. Electronic nicotine delivery devices: ineffective nicotine delivery and craving suppression after acute administration.

Authors: Thomas Eissenberg
Journal: Tob Control Date: 2010-02 Impact factor: 7.552

7. Adolescent males' awareness of and willingness to try electronic cigarettes.

Authors: Jessica K Pepper; Paul L Reiter; Annie-Laurie McRee; Linda D Cameron; Melissa B Gilkey; Noel T Brewer
Journal: J Adolesc Health Date: 2012-11-30 Impact factor: 5.012

8. Levels of selected carcinogens and toxicants in vapour from electronic cigarettes.

Authors: Maciej Lukasz Goniewicz; Jakub Knysak; Michal Gawron; Leon Kosmider; Andrzej Sobczak; Jolanta Kurek; Adam Prokopowicz; Magdalena Jablonska-Czapla; Czeslawa Rosik-Dulewska; Christopher Havel; Peyton Jacob; Neal Benowitz
Journal: Tob Control Date: 2013-03-06 Impact factor: 7.552

Review 9. Electronic cigarettes and vaping: a new challenge in clinical medicine and public health. A literature review.

Authors: Dominic L Palazzolo
Journal: Front Public Health Date: 2013-11-18

10. Using Twitter Data to Gain Insights into E-cigarette Marketing and Locations of Use: An Infoveillance Study.

Authors: Annice E Kim; Timothy Hopper; Sean Simpson; James Nonnemaker; Alicea J Lieberman; Heather Hansen; Jamie Guillory; Lauren Porter
Journal: J Med Internet Res Date: 2015-11-06 Impact factor: 5.428

27 in total

1. Youth say ads for flavored e-liquids are for them.

Authors: Karma McKelvey; Mike Baiocchi; Divya Ramamurthi; Sheila McLaughlin; Bonnie Halpern-Felsher
Journal: Addict Behav Date: 2018-08-29 Impact factor: 3.913

2. Strategies to find audience segments on Twitter for e-cigarette education campaigns.

Authors: Kar-Hai Chu; Jon-Patrick Allem; Jennifer B Unger; Tess Boley Cruz; Meleeka Akbarpour; Matthew G Kirkpatrick
Journal: Addict Behav Date: 2018-11-14 Impact factor: 3.913

3. "Technophilia": A new risk factor for electronic cigarette use among early adolescents?

Authors: Inti Barrientos-Gutierrez; Paula Lozano; Edna Arillo-Santillan; Paola Morello; Raul Mejia; James F Thrasher
Journal: Addict Behav Date: 2018-09-06 Impact factor: 3.913

4. Real Time Assessment of Young Adults' Attitudes toward Tobacco Messages.

Authors: Emily T Hébert; Elizabeth A Vandewater; Michael S Businelle; Melissa B Harrell; Steven H Kelder; Cheryl L Perry
Journal: Tob Regul Sci Date: 2018-01-01

5. Differential patterns of e-cigarette and tobacco marketing exposures among youth: Associations with substance use and tobacco prevention strategies.

Authors: Nicholas Peiper; Camila Aramburú; Kirsten Thompson; Melissa Abadi
Journal: Int J Drug Policy Date: 2020-11-17

6. A longitudinal analysis of electronic cigarette forum participation.

Authors: Sarah F Maloney; Eric K Soule; Sherilyn Palafox; Keaton McFadden; Mignonne C Guy; Thomas Eissenberg; Pebbles Fagan
Journal: Addict Behav Date: 2018-08-07 Impact factor: 4.591

7. Evaluating patients' perspectives on social media: the importance of clearly reporting data search, cleaning and processing.

Authors: K Reuter; D Lee
Journal: Br J Dermatol Date: 2019-04-14 Impact factor: 11.113

Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter.

Introduction

Materials and Methods

Data Collection

Automation Classification

Tweets from a random sample of 500 organic classified and 500 automated classified accounts were hand coded to gauge the accuracy of the detection algorithm.

Categorization by Topics

Results and Discussion

Tweet Sentiment Analysis

Categorical Tweet Word-shift Graphs: On the left, Organic Tweets from 2013 are the reference distribution to compare sentiments of Organic Tweets made in 2014 where we see a negative shift in the calculated average word happiness.

Sub-Categorical Tweet Topics

Conclusion

European Union E-cigarette Ban Political Debate (#EUecigBan).

Electronic Cigarette Table of Key Words.

Twitter IDs.

1. Quitting smoking among adults--United States, 2001-2010.

2. Electronic-cigarette smoking experience among adolescents.

3. Electronic nicotine delivery systems: is there a need for regulation?

4. The scientific basis of tobacco product regulation.

5. Electronic cigarettes and conventional cigarette use among U.S. adolescents: a cross-sectional study.

6. Electronic nicotine delivery devices: ineffective nicotine delivery and craving suppression after acute administration.

7. Adolescent males' awareness of and willingness to try electronic cigarettes.

8. Levels of selected carcinogens and toxicants in vapour from electronic cigarettes.

Review 9. Electronic cigarettes and vaping: a new challenge in clinical medicine and public health. A literature review.

10. Using Twitter Data to Gain Insights into E-cigarette Marketing and Locations of Use: An Infoveillance Study.

1. Youth say ads for flavored e-liquids are for them.

2. Strategies to find audience segments on Twitter for e-cigarette education campaigns.

3. "Technophilia": A new risk factor for electronic cigarette use among early adolescents?

4. Real Time Assessment of Young Adults' Attitudes toward Tobacco Messages.

5. Differential patterns of e-cigarette and tobacco marketing exposures among youth: Associations with substance use and tobacco prevention strategies.

6. A longitudinal analysis of electronic cigarette forum participation.

7. Evaluating patients' perspectives on social media: the importance of clearly reporting data search, cleaning and processing.

8. Marketing Content on E-Cigarette Brand-Sponsored Facebook Profile Pages.

9. Whose Post Is It? Predicting E-cigarette Brand from Social Media Posts.

10. Engagement With Online Tobacco Marketing Among Adolescents in the United States: 2013-2014 to 2014-2015.