Sanmitra Bhattacharya, Padmini Srinivasan, Phil Polgreen.
Abstract
OBJECTIVE: To investigate factors associated with engagement of U.S. Federal Health Agencies via Twitter. Our specific goals are to study factors related to (a) the number of retweets, (b) the time between an agency tweet and its first retweet, and (c) the time between an agency tweet and its last retweet.
Year: 2014 PMID: 25379727 PMCID: PMC4224440 DOI: 10.1371/journal.pone.0112235
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features Examined.
| Type | Group | Features | Description |
| Handle-level | 1 | Favorites | # of users favoriting tweets of a particular handle (log-transformed). |
| Handle-level | 1 | Followers | # of users following a particular handle (log-transformed). |
| Handle-level | 1 | Friends | # of users followed by a particular handle (log-transformed). |
| Handle-level | 1 | Betweenness-centrality | Importance of node in network. |
| Handle-level | 2 | Tweet count | # of tweets posted by a handle in its lifetime (log-transformed). |
| Tweet-level | 1 | Tweet age | # of days between handle creation and tweet post (log-transformed). |
| Tweet-level | 2 | Hashtag | Whether a tweet contains a hashtag, i.e., a word prefixed with # (binary). |
| Tweet-level | 2 | URL | Whether a tweet contains a URL (http, ftp, etc.) (binary). |
| Tweet-level | 2 | User-mention | Whether a tweet contains a user-mention, i.e., a word prefixed with @ (binary). |
| Tweet-level | 2 | Sentiment | Two scores: one for positivity and another for negativity. |
| Tweet-level | 2 | Content (Semantic Groups) | Classification of each tweet into 15 semantic groups using MTI followed by post-processing. Multiple classes per tweet allowed. |
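The binary and log-transformed features in the table above can be illustrated with a short sketch. The regular expressions, the function names, and the `+1` offset inside the logarithm are assumptions made for illustration; the source does not state the exact matching rules or the form of the log transform.

```python
import math
import re

# Illustrative patterns (assumptions, not the authors' exact rules).
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")        # word prefixed with #
MENTION_RE = re.compile(r"(?<!\w)@\w+")        # word prefixed with @
URL_RE = re.compile(r"(?:https?|ftp)://\S+", re.IGNORECASE)


def log_transform(count: int) -> float:
    """Natural log of (count + 1); the +1 offset is an assumption to handle zeros."""
    return math.log1p(count)


def tweet_level_flags(text: str) -> dict:
    """Return the three binary tweet-level features (1 = present, 0 = absent)."""
    return {
        "hashtag": int(bool(HASHTAG_RE.search(text))),
        "url": int(bool(URL_RE.search(text))),
        "user_mention": int(bool(MENTION_RE.search(text))),
    }
```

For example, `tweet_level_flags("Get your #flu shot: http://example.gov via @CDCgov")` sets all three flags, while a plain sentence sets none.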
Agencies and Handles.
| Agency | Name | # handles | Examples of handles |
| ACF | Administration for Children & Families | 1 | HeadStartgov |
| AHRQ | Agency for Healthcare Research & Quality | 1 | AHRQNews |
| CDC | Centers for Disease Control & Prevention | 25 | CDCgov, CDCActEarly, CDC_BioSense, etc. |
| CMS | Centers for Medicare & Medicaid Services | 4 | CMSGov, CMSinnovates, IKNGov, etc. |
| FDA | U.S. Food & Drug Administration | 10 | US_FDA, FDATobacco, FDADeviceInfo, etc. |
| HRSA | Health Resources & Services Administration | 1 | HRSAgov |
| NIH | National Institutes of Health | 15 | NIHforFunding, NIHClinicalCntr, etc. |
| NIH/NIA | National Institute on Aging | 1 | NIAGo4Life |
| NIH/NCCAM | National Center for Complementary & Alternative Medicine | 1 | NCCAM |
| NIH/NCI | National Cancer Institute | 13 | SmokefreeGov, NCIHINTS, etc. |
| NIH/NEI | National Eye Institute | 1 | NEHEP |
| NIH/NHLBI | National Heart, Lung & Blood Institute | 3 | TheHeartTruth, nih_nhlbi, BreatheBetter |
| NIH/NIAAA | National Institute on Alcohol Abuse & Alcoholism | 1 | NIAAAnews |
| NIH/NIAID | National Institute of Allergy & Infectious Diseases | 3 | NIAIDNews, NIAIDCareers, NIAIDFunding |
| NIH/NIAIMS | National Institute of Arthritis & Musculoskeletal & Skin Diseases | 1 | NIH_NIAMS |
| NIH/NICRR | National Center for Research Resources | 1 | ncrr_nih_gov |
| NIH/NIDA | National Institute on Drug Abuse | 1 | NIDAnews |
| NIH/NIEHS | National Institute of Environmental Health Sciences | 1 | NIEHS |
| NIH/NIGMS | National Institute of General Medical Sciences | 1 | NIGMS |
| NIH/NIHGRI | National Human Genome Research Institute | 1 | DNAday |
| NIH/NIMH | National Institute of Mental Health | 1 | NIMHgov |
| NIH/NLM | National Library of Medicine | 11 | NLM_LHC, medlineplus, NCBI, etc. |
| OIG | Office of Inspector General | 1 | OIGatHHS |
| OS | Office of the Secretary | 29 | AIDSgov, bestbones4ever, BirdFluGov, etc. |
| SAMHSA | Substance Abuse & Mental Health Services Administration | 2 | samhsagov, distressline |
| Total | | 130 | |
Number of tweets and retweets per agency.
| Agency | Date first handle was created | # tweets | # of tweets with zero retweets | # tweets with at least 1 retweet | # retweets | # retweets per tweet | # retweets per non-zero retweeted tweet |
| ACF | 9/7/2011 | 605 | 219 (36.2%) | 386 (63.8%) | 1924 | 3.18 | 4.98 |
| AHRQ | 6/5/2009 | 1475 | 415 (28.14%) | 1060 (71.86%) | 3432 | 2.33 | 3.24 |
| CDC | 7/24/2008 | | | 26073 (70.21%) | 278885 | 7.51 | 10.70 |
| CMS | 9/1/2009 | 5620 | 2132 (37.94%) | 3488 (62.06%) | 11023 | 1.96 | 3.16 |
| FDA | 12/11/2008 | 10574 | 3007 (28.44%) | 7567 (71.56%) | 75245 | 7.12 | 9.94 |
| HRSA | 6/1/2009 | 1241 | 332 (26.75%) | 909 (73.25%) | 5391 | 4.34 | 5.93 |
| NIH | 6/16/2008 | 15550 | 7446 (47.88%) | 8104 (52.12%) | 49666 | 3.19 | 6.13 |
| NIH/NIA | 10/18/2011 | 1891 | 629 (33.26%) | 1262 (66.74%) | 10556 | 5.58 | 8.36 |
| NIH/NCCAM | 8/20/2009 | 1489 | 568 (38.15%) | 921 (61.85%) | 4102 | 2.75 | 4.45 |
| NIH/NCI | 4/28/2009 | 15679 | 5580 (35.59%) | 10099 (64.41%) | 46586 | 2.97 | 4.61 |
| NIH/NEI | 3/23/2011 | 401 | 249 (62.09%) | 152 (37.91%) | 331 | 0.83 | 2.18 |
| NIH/NHLBI | 2/26/2009 | 5135 | 1526 (29.72%) | 3609 (70.28%) | 29447 | 5.73 | 8.16 |
| NIH/NIAAA | 7/15/2010 | 424 | 122 (28.77%) | 302 (71.23%) | 2279 | 5.38 | 7.55 |
| NIH/NIAID | 7/24/2009 | 1725 | 830 (48.12%) | 895 (51.88%) | 2808 | 1.63 | 3.14 |
| NIH/NIAIMS | 8/31/2009 | 822 | 135 (16.42%) | 687 (83.58%) | 1850 | 2.25 | 2.69 |
| NIH/NICRR | 8/14/2009 | 1029 | 704 (68.42%) | 325 (31.58%) | 515 | 0.50 | 1.58 |
| NIH/NIDA | 1/5/2010 | 2191 | 669 (30.53%) | 1522 (69.47%) | 7484 | 3.42 | 4.92 |
| NIH/NIEHS | 12/17/2009 | 682 | 320 (46.92%) | 362 (53.08%) | 858 | 1.26 | 2.37 |
| NIH/NIGMS | 9/2/2009 | 983 | 420 (42.73%) | 563 (57.27%) | 1791 | 1.82 | 3.18 |
| NIH/NIHGRI | 2/25/2009 | 401 | 180 (44.89%) | 221 (55.11%) | 652 | 1.63 | 2.95 |
| NIH/NIMH | 5/11/2009 | 959 | 177 (18.46%) | 782 (81.54%) | 16779 | 17.50 | 21.46 |
| NIH/NLM | 2/12/2009 | 15058 | 6525 (43.33%) | 8533 (56.67%) | 48497 | 3.22 | 5.68 |
| OIG | 5/2/2011 | 1476 | 386 (26.15%) | 1090 (73.85%) | 2459 | 1.67 | 2.26 |
| OS | | 36587 | 8026 (21.94%) | | | 10.28 | 13.17 |
| SAMHSA | 3/17/2009 | 4971 | 1896 (38.14%) | 3075 (61.86%) | 21729 | 4.37 | 7.07 |
| Total | | 164104 | 53556 (32.64%) | 110548 (67.36%) | 1000447 | 6.10 | 9.05 |
| Mean (SD) | | 6564.16 (10355.66) | 2142.24 (3055.12) | 4421.92 (7499.23) | 40017.88 (89880.72) | 4.10 (3.64) | 5.99 (4.39) |
Bolded values indicate the largest values for the column.
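The derived columns of the table above follow from three raw counts per agency. The sketch below recomputes them for one row; the function and key names are hypothetical, and rounding to two decimals is assumed from how the table is printed.

```python
def retweet_summary(n_tweets, n_zero_retweet, n_retweets):
    """Recompute the derived columns from the raw counts of one agency."""
    n_retweeted = n_tweets - n_zero_retweet  # tweets with at least 1 retweet
    if n_tweets == 0 or n_retweeted == 0:
        return None  # rates are undefined without tweets or retweeted tweets
    return {
        "pct_zero": round(100 * n_zero_retweet / n_tweets, 2),
        "pct_retweeted": round(100 * n_retweeted / n_tweets, 2),
        "retweets_per_tweet": round(n_retweets / n_tweets, 2),
        "retweets_per_retweeted_tweet": round(n_retweets / n_retweeted, 2),
    }


# ACF row: 605 tweets, 219 with zero retweets, 1924 retweets in total.
acf = retweet_summary(605, 219, 1924)
```

Applied to the ACF row, this gives 3.18 retweets per tweet and 4.98 retweets per non-zero retweeted tweet, matching the table.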
Top 10 agency handles for most retweets per tweet.
| Handle | Date of creation | # tweets | # of tweets with non-zero retweets | # of tweets with zero retweets | # retweets | # retweets per non-zero retweeted tweet |
| CDCemergency | 1/28/2009 | 792 | 523 (66.04%) | 269 (33.96%) | 36756 | |
| FitnessGov | 9/15/2011 | 935 | 834 (89.2%) | 101 (10.8%) | 23003 | 27.58 |
| womenshealth | 5/30/2007 | | | 73 (2.26%) | | 27.14 |
| HealthCareGov | 11/1/2009 | 409 | 404 (98.78%) | 5 (1.22%) | 10315 | 25.53 |
| HHSGov | 6/4/2009 | 1295 | 1103 (85.17%) | 192 (14.83%) | 26313 | 23.86 |
| FDArecalls | 12/11/2008 | 2118 | 1278 (60.34%) | | 29764 | 23.29 |
| CDCgov | 5/21/2010 | 3226 | 2904 (90.02%) | 322 (9.98%) | 66204 | 22.80 |
| CDC_eHealth | 7/24/2008 | 1517 | 1255 (82.73%) | 262 (17.27%) | 27856 | 22.20 |
| NIMHgov | 5/11/2009 | 959 | 782 (81.54%) | 177 (18.46%) | 16779 | 21.46 |
| PHEgov | 4/26/2010 | 1356 | 998 (73.6%) | 358 (26.4%) | 20683 | 20.72 |
Bolded values indicate the largest values for the column.
Figure 1. Power law plots of (a) retweets/tweet, (b) #retweets/retweeter, (c) #days to first retweet and (d) #days to last retweet.
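How these plots were produced is not stated here. One common way to inspect a quantity such as retweets per tweet for power-law behavior is to plot its empirical complementary cumulative distribution on log-log axes, where a power law appears as an approximately straight line. The sketch below computes those points; the function name and the choice to drop zero values (which cannot be shown on a log axis) are assumptions.

```python
from collections import Counter


def empirical_ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the positive observations."""
    counts = Counter(v for v in values if v > 0)
    n = sum(counts.values())
    points = []
    remaining = n  # number of observations >= the current x
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points
```

The returned pairs can be passed to any plotting tool with both axes set to a logarithmic scale.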
Figure 2. Plot of # of followers vs. # of friends for each handle.
A few handles with markedly different numbers of followers and friends are labeled.
Distribution of positive and negative sentiments for tweets on a 5-point scale.
| | # of positive tweets | # of negative tweets |
| neutral | 117599 (71.66%) | 111233 (67.78%) |
| moderate-medium | 36940 (22.51%) | 31791 (19.37%) |
| medium | 8502 (5.18%) | 10143 (6.18%) |
| medium-extreme | 1051 (0.64%) | 10772 (6.56%) |
| extreme | 12 (0.01%) | 165 (0.10%) |
| Total | 164104 | 164104 |
Semantic groups with examples of component semantic types and their prevalence in the dataset.
| Semantic Groups | Example Semantic Types | # of tweets (%) |
| Concepts & Ideas | Functional Concept, Regulation or Law, Temporal Concept, etc. | 68391 (41.68%) |
| Disorders | Anatomical Abnormality, Disease or Syndrome, Neoplastic Process, etc. | 59164 (36.05%) |
| Living Beings | Mammal, Eukaryote, Plant, etc. | 57836 (35.24%) |
| Geographic Areas | Geographic Area | 42133 (25.67%) |
| Chemicals & Drugs | Clinical Drug, Organic Chemical, Enzyme, etc. | 39065 (23.81%) |
| Activities & Behaviors | Daily or Recreational Activity, Machine Activity, Social Behavior, etc. | 38276 (23.32%) |
| Organizations | Health Care Related Organization, Professional Society, Self-help or Relief Organization | 35163 (21.43%) |
| Physiology | Cell Function, Mental Process, Organ or Tissue Function, etc. | 32308 (19.69%) |
| Objects | Entity, Food, Manufactured Object, etc. | 23452 (14.29%) |
| Procedures | Diagnostic Procedure, Research Activity, Therapeutic or Preventive Procedure, etc. | 23445 (14.29%) |
| Phenomena | Biologic Function, Human-caused Phenomenon or Process, Natural Phenomenon or Process | 20252 (12.34%) |
| Anatomy | Anatomical Structure, Cell Component, Tissue, etc. | 7925 (4.83%) |
| Occupations | Biomedical Occupation or Discipline, Occupation or Discipline | 7633 (4.65%) |
| Devices | Drug Delivery Device, Medical Device, Research Device | 1610 (0.98%) |
| Genes & Molecular Sequences | Amino Acid Sequence, Carbohydrate Sequence, Gene or Genome, etc. | 1138 (0.69%) |
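Because a tweet may be assigned to several semantic groups, the percentages in the table above need not sum to 100%. A minimal sketch of this multi-label counting follows, assuming each tweet has already been mapped to a set of group labels; the function name and input layout are hypothetical.

```python
from collections import Counter


def group_prevalence(tweet_groups):
    """Map each group to (# of tweets, % of all tweets).

    tweet_groups holds one collection of semantic-group labels per tweet;
    a tweet counts at most once per group.
    """
    n = len(tweet_groups)
    if n == 0:
        return {}
    counts = Counter(g for groups in tweet_groups for g in set(groups))
    return {g: (c, round(100 * c / n, 2)) for g, c in counts.items()}
```

With four tweets labeled `{"Disorders", "Living Beings"}`, `{"Disorders"}`, `set()` and `{"Devices"}`, the group "Disorders" covers 2 tweets (50.0%) while the other two groups cover 1 tweet (25.0%) each.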
Results of the hurdle negative binomial model showing the estimate/coefficient (SE), the exponentiated coefficient (OR for the zero portion, IRR for the count portion), and z and p-values (*p<0.05, **p<0.01, ***p<0.001) for the independent variables in the zero and count portions of the model.
| | Zero Portion | | | | Count Portion | | | |
| | Estimate (SE) | OR | z value | p | Estimate (SE) | IRR | z value | p |
| (Intercept) | −3.295 (0.05) | 0.037 | −65.69 | *** | −1.361 (0.053) | 0.256 | −25.89 | *** |
| Log-transformed Favorite Count | | | | | | | | |
| Log-transformed Follower Count | | | | | | | | |
| Log-transformed Friend Count | 0.002 (0.012) | 1.002 | 0.168 | | | | | |
| Log-transformed Tweet Count | | | | | | | | |
| Log-transformed betweenness-centrality | 0.016 (0.01) | 1.016 | 1.679 | | | | | |
| Log-transformed tweet age | | | | | | | | |
| Hashtag | | | | | | | | |
| URL | | | | | | | | |
| User-mention | | | | | | | | |
| Positive Sentiment | | | | | −0.016 (0.01) | 0.984 | −1.695 | |
| Negative Sentiment | | | | | | | | |
| Activities & Behaviors | | | | | | | | |
| Anatomy | | | | | | | | |
| Chemicals & Drugs | | | | | | | | |
| Concepts & Ideas | | | | | −0.022 (0.011) | 0.978 | −1.94 | |
| Devices | | | | | | | | |
| Disorders | | | | | | | | |
| Genes & Molecular Sequences | 0.058 (0.071) | 1.059 | 0.812 | | | | | |
| Geographic Areas | | | | | | | | |
| Living Beings | | | | | | | | |
| Objects | | | | | | | | |
| Occupations | | | | | | | | |
| Organizations | | | | | | | | |
| Phenomena | | | | | | | | |
| Physiology | | | | | | | | |
| Procedures | | | | | | | | |
| Log(theta) | −1.8 (0.024) | 0.165 | −73.547 | *** | | | | |
Italicized rows: variables with significant negative association, bolded rows: variables with significant positive association. The Zero Portion is a model of whether or not there is a retweet and the Count Portion models the number of retweets.
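To make the two portions of the hurdle model concrete, the sketch below shows (i) how the OR and IRR columns relate to the estimates by exponentiation and (ii) the probability mass function of a generic hurdle negative binomial distribution. It is a sketch under stated assumptions, not the authors' fitting code: the function names are hypothetical, and the negative binomial is written in the mean/dispersion form with parameters `mu` and `theta`.

```python
import math


def exp_coef(estimate, digits=3):
    """Exponentiate a coefficient to obtain an OR (zero part) or IRR (count part)."""
    return round(math.exp(estimate), digits)


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def hurdle_nb_pmf(k, p_nonzero, mu, theta):
    """Hurdle pmf: a binary part decides zero vs. non-zero, and a
    zero-truncated negative binomial models the positive counts."""
    if k == 0:
        return 1.0 - p_nonzero
    return p_nonzero * nb_pmf(k, mu, theta) / (1.0 - nb_pmf(0, mu, theta))
```

For instance, `exp_coef(-3.295)` returns 0.037 and `exp_coef(-1.361)` returns 0.256, the intercept values shown in the OR and IRR columns.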
Results of Cox proportional hazards model for interval between a tweet and its first retweet.
| | Interval Between Tweet and First Retweet | | | |
| | Coefficient (SE) | HR | z | p |
| Log-transformed Favorite Count | | | | |
| Log-transformed Follower Count | | | | |
| Log-transformed Friend Count | | | | |
| Log-transformed betweenness | −0.004 (0.008) | 0.995 | −0.566 | |
| Log-transformed Tweet Count | 0.017 (0.015) | 1.017 | 1.176 | |
| Log-transformed tweet age | | | | |
| Hashtag | | | | |
| URL | −0.021 (0.011) | 0.978 | −1.907 | . |
| User-mention | | | | |
| Positive Sentiment | | | | |
| Negative Sentiment | −0.001 (0.005) | 0.998 | −0.284 | |
| Activities & Behaviors | 0.008 (0.01) | 1.008 | 0.827 | |
| Anatomy | 0.001 (0.019) | 1.001 | 0.077 | |
| Chemicals & Drugs | 0.013 (0.01) | 1.013 | 1.347 | |
| Concepts & Ideas | 0.008 (0.009) | 1.008 | 0.954 | |
| Devices | −0.009 (0.04) | 0.990 | −0.231 | |
| Disorders | | | | |
| Genes & Molecular Sequences | 0.051 (0.047) | 1.052 | 1.085 | |
| Geographic Areas | | | | |
| Living Beings | 0.006 (0.009) | 1.006 | 0.667 | |
| Objects | 0 (0.012) | 0.999 | −0.033 | |
| Occupations | −0.02 (0.02) | 0.980 | −0.998 | |
| Organizations | 0.006 (0.015) | 1.006 | 0.416 | |
| Phenomena | 0.021 (0.013) | 1.021 | 1.596 | |
| Physiology | 0.009 (0.011) | 1.009 | 0.827 | |
| Procedures | | | | |
The Coefficients (SE), hazard ratios (HR), z and p-values (*p<0.05, **p<0.01, ***p<0.001) for the independent variables are shown. Italicized rows: variables with significant negative association, bolded rows: variables with significant positive association.
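For readers unfamiliar with the quantity being maximized in a Cox proportional hazards fit, the sketch below evaluates the partial log-likelihood for a single covariate, using the Breslow convention for tied event times. This is an illustrative, unoptimized sketch with hypothetical names; the source does not state which software or tie-handling method was used. The hazard ratio reported in the table is the exponentiated coefficient, and a value above 1 corresponds to the event (here, a retweet) tending to occur sooner.

```python
import math


def hazard_ratio(beta):
    """Hazard ratio for a one-unit increase in the covariate."""
    return math.exp(beta)


def cox_partial_loglik(times, events, x, beta):
    """Breslow partial log-likelihood for one covariate.

    times:  observed durations
    events: 1 if the event was observed, 0 if censored
    x:      covariate value per observation
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute only via risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(beta * x[j])
                   for j, t_j in enumerate(times) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll
```

With `beta = 0` every subject has the same hazard, so three distinct event times contribute `-(log 3 + log 2 + log 1)`, which is a convenient sanity check.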
Results of Cox proportional hazards model for interval between a tweet and its last retweet.
| | Interval Between Tweet and Last Retweet | | | |
| | Coefficient (SE) | HR | z | p |
| Log-transformed Favorite Count | | | | |
| Log-transformed Follower Count | | | | |
| Log-transformed Friend Count | −0.009 (0.009) | 0.991 | −0.972 | |
| Log-transformed Tweet Count | | | | |
| Log-transformed tweet age | −0.014 (0.012) | 0.986 | −1.124 | |
| Log-transformed betweenness | | | | |
| Hashtag | | | | |
| URL | | | | |
| User-mention | | | | |
| Positive Sentiment | | | | |
| Negative Sentiment | | | | |
| Activities & Behaviors | | | | |
| Anatomy | | | | |
| Chemicals & Drugs | −0.019 (0.01) | 0.981 | −1.936 | |
| Concepts & Ideas | −0.011 (0.009) | 0.989 | −1.262 | |
| Devices | −0.059 (0.04) | 0.942 | −1.471 | |
| Disorders | −0.011 (0.01) | 0.988 | −1.156 | |
| Genes & Molecular Sequences | 0.059 (0.047) | 1.060 | 1.252 | |
| Geographic Areas | 0.001 (0.014) | 1.000 | 0.04 | |
| Living Beings | −0.006 (0.009) | 0.993 | −0.721 | |
| Objects | | | | |
| Occupations | | | | |
| Organizations | | | | |
| Phenomena | −0.012 (0.013) | 0.987 | −0.928 | |
| Physiology | −0.013 (0.011) | 0.986 | −1.189 | |
| Procedures | 0.017 (0.012) | 1.016 | 1.388 | |
The Coefficients (SE), hazard ratios (HR), z and p-values (*p<0.05, **p<0.01, ***p<0.001) for the independent variables are shown. Italicized rows: variables with significant negative association, bolded rows: variables with significant positive association.
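The two survival outcomes modeled above are intervals measured from the time of the agency tweet. A minimal sketch of how such intervals could be derived from timestamps is shown below. The function name, the use of days as the unit, and returning `None` for tweets with no retweets are assumptions for illustration; how such tweets were treated in the analysis is not stated here.

```python
from datetime import datetime


def retweet_intervals(tweet_time, retweet_times):
    """Return (days to first retweet, days to last retweet), or None
    when the tweet has no retweets at or after its posting time."""
    later = sorted(t for t in retweet_times if t >= tweet_time)
    if not later:
        return None
    seconds_per_day = 86400.0
    first = (later[0] - tweet_time).total_seconds() / seconds_per_day
    last = (later[-1] - tweet_time).total_seconds() / seconds_per_day
    return first, last


# Hypothetical timestamps, for illustration only.
posted = datetime(2013, 1, 1, 12, 0)
retweets = [datetime(2013, 1, 3, 12, 0), datetime(2013, 1, 1, 18, 0)]
intervals = retweet_intervals(posted, retweets)
```

In this example the first retweet arrives after 0.25 days and the last after 2.0 days.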