Eradah O Hamad, Marie Y Savundranayagam, Jeffrey D Holmes, Elizabeth Anne Kinsella, Andrew M Johnson.
Abstract
BACKGROUND: Twitter's 140-character microblog posts are increasingly used to access information and facilitate discussions among health care professionals and between patients with chronic conditions and their caregivers. Recently, efforts have emerged to investigate the content of health care-related posts on Twitter. This marks a new area for researchers to investigate and apply content analysis (CA). In current infodemiology, infoveillance and digital disease detection research initiatives, quantitative and qualitative Twitter data are often combined, and there are no clear guidelines for researchers to follow when collecting and evaluating Twitter-driven content.
Keywords: Twitter feeds; coding; computer-aided content analysis; content analysis; digital disease detection; health care social media; health care tweets; infodemiology; infoveillance; mixed methods research
Year: 2016 PMID: 26957477 PMCID: PMC4804105 DOI: 10.2196/jmir.5391
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Content analysis (CA) in health care research. Adapted from Hsieh & Shannon (2004, p 1286, Table 4) with permission of SAGE Publications, Inc.
Studies analyzing health-related Twitter posts (2010–2014).
| Author(s) | Keywords and hashtags (#) | Sampling and data collection | Data analysis (coding process) | Validation and presentation of results |
| Chew & Eysenbach (2010) [ | “swine flu”, “swineflu”, and “H1N1” | Random sample of 5395 tweets for 9 days (each 4 weeks apart) generated from 2 million archived tweets over 8 months. Tweets were posted between May 1 and December 31, 2009 (n=600 tweets per day were collected for analysis). | Infoveillance approach (statistical classifier) for tracking flu rate (longitudinal text mining and analysis). This approach includes in-depth qualitative manual coding, automated CA^a using a triaxial coding scheme, and sentiment analysis. | Pilot coding (1200 tweets), ICR^b for a subset of 125 tweets using kappa statistic (κ>.70), Pearson correlations between manual and automated coding, chi-square to test changes over time, frequency tables, and text matrices with quotes illustrating the categories. |
| Scanfeld et al (2010) [ | “antibiotic” and “antibiotics” | Random sample of 52,153 tweets. Tweets were posted weekly between March 13 and July 31, 2009 (n=1000 tweets were collected for analysis). | Cross-sectional survey approach using Q-methodology and CA (frequencies). | Pilot coding of 100 tweets, ICR for a random sample of 10% of the analyzed tweets using kappa statistic (κ=.73), frequency tables, and text matrices with quotes illustrating the categories. |
| Heaivilin et al (2011) [ | “toothache”, “tooth ache”, “dental pain”, and “tooth pain” | Random sample of 4859 tweets over 7 nonconsecutive days (n=1000 tweets were collected for analysis). | Cross-sectional survey approach and CA (frequencies and descriptive statistics). | Pilot coding of 300 tweets, ICR using kappa statistic (κ=.96), frequency tables, and continuous text with quotes illustrating the categories. |
| Signorini et al (2011) [ | “flu”, “swine”, “influenza”, “vaccine”, “tamiflu”, “oseltamivir”, “zanamivir”, “relenza”, “amantadine”, “rimantadine”, “pneumonia”, “h1n1”, “symptom”, “syndrome”, and “illness” and additional keywords (eg, travel, trip, flight, fly, cruise, and ship) | Two large data sets for tracking flu rate over time and location. The first data set consists of 951,697 tweets selected from 334,840,972 tweets. Tweets were posted between April 29 and June 1, 2009. The second data set consists of 4,199,166 tweets selected from roughly 8 million tweets. Tweets were posted between October 1, 2009 and December 2009. | Quantitative CA (descriptive and advanced statistics). | Regression analysis and frequency graphs with respect to time. |
| McNeil et al (2012) [ | “seizure”, “seizures”, “seize”, “seizing”, and “seizuring” | Random sample of 10,662 tweets from a period of 7 consecutive days. Tweets were posted between April 15 and April 21, 2011 (n=1504 tweets were collected for analysis). | Prospective qualitative CA. | Pilot coding of a 48-hour preliminary data set and interrater agreement (85.4%), frequency tables, and text matrices with quotes illustrating the categories. |
| Sullivan et al (2012) [ | “concussion”, “concussions”, “concuss”, “concussed”, “#concussion”, “#concussions”, “#concuss”, and “#concussed” | Random sample of 3488 tweets over 7 consecutive days. Tweets were posted between 12:00 GMT^c on July 23 and 12:00 GMT on July 30, 2010 (n=1000 tweets were collected for analysis). | Prospective observational study using qualitative CA. | Pilot coding of 100 tweets from a sample collected over a 24-hour period and interrater agreement, frequency tables, and text matrices with quotes illustrating the categories. |
| Donelle & Booth (2012) [ | “#health” and “health” as a single word or as part of a word (eg, health care) | Purposeful cross-sectional sample of 36,042 tweets. Tweets were collected over 4 consecutive days, from June 16, 2009 at 19:32 GMT until June 20, 2009 at 12:02 GMT (n=2400 tweets were collected for analysis; the first 100 tweets from the end of each hour of June 19, 2009, starting at 05:00 GMT for a 24-hour period). | Qualitative (directed and deductive) CA [ | Trustworthiness and validation of findings (interrater agreement, systematic data analysis, analyst triangulation, verbatim data collection, and basic descriptive statistics). Data were presented through frequency graphs, text matrices, and continuous text with quotes illustrating the categories. |
| Robillard et al (2013) [ | “dementia” and “Alzheimer” | Random sample of 9200 tweets for a period of 24 hours (starting February 15, 2012 at 3:35 pm) (n=920 tweets were collected for analysis in addition to a subsample containing 100 tweets generated by the top users). | Cross-sectional survey using CA [ | Pilot coding of an initial set of 100 random tweets and frequency graphs and tables. |
| Lyles et al (2013) [ | “pap smear” and “mammogram” | Cross-sectional sample of top tweets during a 5-week period. Tweets were posted between April and early May 2012 (n=474 tweets were collected for analysis). | Exploratory qualitative CA. | Pilot coding of 20% of collected tweets, ICR of 40% of collected tweets, interrater agreement, frequency graphs, text matrices, and continuous text with quotes illustrating the categories. |
| Bosley et al (2013) [ | “cardiac arrest”, “CPR”, “AED”, “resuscitation”, “heart arrest”, “sudden death”, and “defib” | All identified resuscitation-related tweets from the keyword search. Tweets were posted between April 19 and May 26, 2011 (n=15,475 tweets were collected for analysis). | Quantitative CA (descriptive statistics). | Pilot coding of 1% of identified tweets, ICR using kappa statistic (κ=.78), frequency graphs and text matrices with quotes illustrating the categories. |
| Hanson et al (2013) [ | “prescription drugs” | Random set of tweets posted by 25 identified social networks or circles. Tweets were posted between November 29, 2011 and November 14, 2012 (up to 3200 tweets per user were collected for analysis). | Quantitative CA of identified social circles | Pearson correlation coefficient of user interactions. Frequency tables and social network graphs. |
| Henzell et al (2013) [ | “braces”, “orthodontist”, and “orthodontics” | Convenience sample of consecutive tweets posted over a 5-day period. Tweets were posted between September 3 and 7, 2012 (n=131 tweets were collected for analysis). | Qualitative (discourse) CA. | Continuous text with quotes illustrating the categories. |
| Myslín et al (2013) [ | “cig*”, “nicotine”, “smoke*”, “tobacco”, “hookah”, “shisha”, “waterpipe”, “e-juice”, “e-liquid”, “vape”, and “vaping” | Random sample of tweets at 15-day intervals. Tweets were posted between December 5, 2011 and July 17, 2012 (n=7362 tweets were collected for analysis). | Infoveillance methodology [ | Pearson correlations between manual and automated coding, chi-square to test changes over time, frequency graphs, and text representation diagrams. |
| Rui et al (2013) [ | Not stated | Random sample of tweets posted by 58 health organizations (chosen randomly) within 2 months. Tweets were posted between September and November 2011 (n=1500 tweets were collected for analysis). | Quantitative (deductive) CA guided by the classic categorization of social support. | Descriptive statistics, ICR of 200 random tweets using Krippendorff alpha (.74), frequency tables, and continuous text with quotes illustrating the categories. |
| Zhang et al (2013) [ | 113 physical activity keywords generated from lists of published physical activity measures | A random sample of 30,000 tweets selected from a pool of one million tweets. Tweets were posted between January 1 and March 31, 2011 (n=4672 tweets were collected for analysis in addition to 1500 collected from this sample for further coding). | Quantitative CA (descriptive and advanced statistics). | Pilot coding of 100 tweets (separate from the final 1500 tweets) to calculate ICR (ranges from 0.83 to 0.98) using Holsti’s [ |
| Park et al (2013) [ | “health literacy” | Random sample of 1044 tweets. Tweets were posted during the following time periods, which were combined to construct a composite month: October 25–31, 2009; November 7–14, 2009; December 15–23, 2009; and January 4–10, 2010 (n=571 tweets were collected for analysis). | Quantitative CA based on Web reports on key Twitter features and previous literature in health communication and media studies. | Pilot coding, ICR of a subsample of 111 tweets using Holsti [ |
| Love et al (2013) [ | “vaccine”, “vaccination”, and “immunization” | Random sample of 6827 English-language tweets. Tweets were posted between January 8 and 14, 2012 (n=2580 tweets were collected for analysis). | Quantitative CA. | Statistical analysis (frequencies and chi-square analyses and tables). |
| Jashinsky et al (2013) [ | Keywords and phrases created from suicide risk factors (12 identified factors) | All tweets (1,659,274 tweets) posted by 1,208,809 unique users over a 3-month period. Tweets were posted between May 15, 2012 and August 13, 2012 (n=37,717 tweets from 28,088 unique users were collected for analysis). | Quantitative CA (descriptive and advanced statistics). | ICR using kappa statistic (κ=.48), Spearman rank correlation coefficient, vital statistics, and text matrices with quotes illustrating the categories. |
^a CA: content analysis.
^b ICR: intercoder reliability.
^c GMT: Greenwich mean time.
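Several of the studies in the table report intercoder reliability as a kappa statistic or as Holsti's agreement. The following is a minimal Python sketch of how these two figures can be computed for two coders who labeled the same tweets. The function names and the example category labels are illustrative assumptions and are not taken from any of the reviewed studies. For two coders who code the same items, Holsti's coefficient reduces to simple percent agreement, which is what the first function computes.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of items given the same category by both coders.

    With two coders and identical item sets this equals Holsti's
    coefficient, 2M / (N1 + N2).
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement from each coder's marginal category proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels for four tweets (illustrative only).
coder_1 = ["personal", "personal", "resource", "resource"]
coder_2 = ["personal", "personal", "resource", "personal"]

print(percent_agreement(coder_1, coder_2))  # 0.75
print(cohen_kappa(coder_1, coder_2))        # 0.5
```

Percent agreement does not correct for agreement expected by chance, so it is usually higher than kappa on the same data, as in this small example.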
Twitter archive software used in the studies analyzing health-related Twitter posts (2010–2014).
| Author(s) | Archive software used |
| Chew & Eysenbach (2010) [ | Infoveillance system and Twitter API^a |
| Scanfeld et al (2010) [ | Twitter search engine |
| Heaivilin et al (2011) [ | Twitter search engine |
| Signorini et al (2011) [ | JavaScript application and Twitter’s API |
| McNeil et al (2012) [ | Twitter search engine |
| Sullivan et al (2012) [ | Twitter search engine |
| Donelle & Booth (2012) [ | The Archivist (MIX Online, 2011) data collection software program |
| Robillard et al (2013) [ | Twitter’s API |
| Lyles et al (2013) [ | Twitter search engine |
| Bosley et al (2013) [ | Twitter search engine |
| Hanson et al (2013) [ | Twitter’s API |
| Henzell et al (2013) [ | Twitter search engine |
| Myslín et al (2013) [ | Twitter’s API |
| Rui et al (2013) [ | ActivePython v2.7.2 |
| Zhang et al (2013) [ | Twitter’s API |
| Park et al (2013) [ | Twitter’s API |
| Love et al (2013) [ | Twitter’s API |
| Jashinsky et al (2013) [ | Twitter’s API |
^a API: application programming interface.
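Most of the reviewed studies first retrieved tweets matching a keyword list and then drew a random sample for manual coding. The sketch below illustrates that two-step pattern on tweets that have already been archived as plain strings. It does not call Twitter's API or any of the archive tools listed above; the function names, the example tweets, and the fixed seed are assumptions made for illustration.

```python
import random


def filter_by_keywords(tweets, keywords):
    """Keep tweets whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.lower() for k in lowered)]


def sample_for_coding(tweets, keywords, n, seed=None):
    """Draw a simple random sample of up to n keyword-matching tweets.

    A fixed seed makes the draw reproducible, which helps when the
    sampling step has to be documented or repeated.
    """
    matched = filter_by_keywords(tweets, keywords)
    rng = random.Random(seed)
    return rng.sample(matched, k=min(n, len(matched)))


# Hypothetical archived tweets (illustrative only).
archive = [
    "Got my flu shot today",
    "H1N1 cases rising in the city",
    "Lovely weather this morning",
    "Flu season again, stay home if sick",
    "Nothing to report",
]

coding_sample = sample_for_coding(archive, ["flu", "h1n1"], n=2, seed=1)
print(len(coding_sample))  # 2
```

Plain substring matching is deliberately simple: a keyword such as "flu" would also match "influence", so a real study would likely need word-boundary matching or manual screening of the retrieved tweets.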
Figure 2. The combined content-analysis (CCA) algorithm.
Figure 3. The combined content-analysis (CCA) model. CA: content analysis; qual: qualitative supplement; QUAL: qualitative priority; quan: quantitative supplement; QUAN: quantitative priority.
Selected software to aid content analysis.
| Software (source) | Web address |
| Analytics for Twitter for Excel (Microsoft) | www.microsoft.com/en-us/download/details.aspx?id=26213 |
| twitteR (The Comprehensive R Archive Network) | cran.r-project.org/package=twitteR |
| Tweet Archivist (Tweet Archivist) | www.tweetarchivist.com |
| Twitter Analytics (Twitter) | analytics.twitter.com/about |
| CAQDAS^a,b Networking Project (University of Surrey) | www.surrey.ac.uk/sociology/research/researchcentres/caqdas/support/choosing/ |
| Text Analysis Info (Social Science Consulting) | textanalysis.info/pages/text-analysis-software---classified.php |
^a CAQDAS: computer-assisted qualitative data analysis software.
^b For example, ATLAS.ti, NVivo, MAXQDA, Dedoose, HyperRESEARCH.