Oduwa Edo-Osagie, Beatriz De La Iglesia, Iain Lake, Obaghe Edeghere.
Abstract
Public health practitioners and researchers have long used traditional medical databases to study and understand public health. Recently, social media data, particularly from Twitter, has seen some use for public health purposes. Every large technological development in history has had an impact on the behaviour of society, and the advent of the internet and social media is no different. Social media creates public streams of communication, and scientists are starting to understand that such data can provide some insight into people's opinions and situations. This paper therefore aims to review and synthesize the literature on Twitter applications for public health, highlighting current research and products in practice. A scoping review methodology was employed, and four leading health, computer science and cross-disciplinary databases were searched. A total of 755 articles were retrieved, 92 of which met the criteria for review. From the reviewed literature, six domains for the application of Twitter to public health were identified: (i) Surveillance; (ii) Event Detection; (iii) Pharmacovigilance; (iv) Forecasting; (v) Disease Tracking; and (vi) Geographic Identification. The review gave a clear picture of the use of Twitter for public health, including how the popularity of the different domains changed over time, which diseases and conditions were studied and how each was approached, and which algorithms and techniques were popular within each domain.
Keywords: Disease tracking; Event forecasting; Pharmacovigilance; Public health; Syndromic surveillance
Year: 2020 PMID: 32502758 PMCID: PMC7229729 DOI: 10.1016/j.compbiomed.2020.103770
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589
Summary of statistical and machine learning methods and data sources for event detection using Twitter data.
| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Cancer | Support Vector Machine | CDC |
| Smoking | Bayesian Logistic Regression | |
| Suicide | ARIMA (Autoregressive Integrated Moving Average) | |
| Harmful Algal Blooms (HABs) | Deep Learning (CNN) | |
| HIV | Decision Tree | |
| Allergies | | pollen.com, National Climatic Data Center Climate Data Online (CDO) |
| Drug Abuse | Biterm Topic Model | |
| HPV | Decision Tree | |
| Infectious Intestinal Diseases (IID) | Word2Vec | Public Health England |
| Adverse Drug Events (ADE) | Multi-Instance Logistic Regression | |
| Depression | Non-Negative Matrix Factorization | National Climatic Data Center, National Oceanic and Atmospheric Administration (NOAA) |
| Ebola | Lexicon Analysis | |
| Back Pain | Logistic Regression | |
| Vomiting | TSVM | Public Health England |
| Gastroenteritis | TSVM | Public Health England |
| Asthma | Support Vector Machine | CDC |
| Food Borne Illness | K-Nearest Neighbour | Southern Nevada Health District (SNHD), CDC |
| Earthquake | Clustering | |
| Diabetes | Support Vector Machine | CDC |
| Dental Pain | Simple Statistical Analysis | |
| Influenza-like Illnesses (ILIs) | Clustering | Penn State's Health Services, Infectious Disease Surveillance Center, Royal College of General Practitioners (RCGP), Public Health England, CDC |
| General Health | Support Vector Machine | |
| Diarrhoea | TSVM | Public Health England |
| Obesity | DBSCAN (Clustering) | |
| Middle East Respiratory Syndrome (MERS) | Lexicon Analysis | |
General Health: generic feelings of unwellness and non-specific illness.
Summary of statistical and machine learning methods and data sources for pharmacovigilance using Twitter data.
| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Smoking | Bayesian Logistic Regression | |
| HIV | Support Vector Machine | |
| Vaccination | Semantic Network Analysis | |
| Drug Abuse | Decision Tree | National Surveys on Drug Usage and Health (NSDUH) |
| Adverse Drug Reactions (ADRs) | Conditional Random Field | ADRMine |
| Adverse Drug Events (ADEs) | Multi-Instance Logistic Regression (MILR) | |
| Alcoholism | Simple Statistical Analysis | |
| Miscellaneous | Decision Tree | |
Summary of statistical and machine learning methods and data sources for forecasting using Twitter data.
| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Cancer | Simple Statistical Analysis | CDC |
| E. coli | Latent Dirichlet Allocation | Robert Koch Institute |
| Vomiting | TSVM | Public Health England |
| Gastroenteritis | TSVM | Public Health England, Robert Koch Institute |
| Asthma | Decision Tree | Children's Medical Center (CMC) |
| Influenza-like Illnesses (H1N1) | Support Vector Regression | CDC |
| Influenza-like Illnesses | Deep Learning (RNN) | Boston Public Health Commission, Public Health England, Pan American Health Organization (PAHO), Chinese CDC, CDC |
| General Health | Temporal Ailment Topic Aspect Model (TM-ATAM) | CDC |
| Dengue | Simple Statistical Analysis | Brazilian Official Dengue case data |
| Diarrhoea | TSVM | Public Health England |
General Health: generic feelings of unwellness and non-specific illness.
Summary of statistical and machine learning methods and data sources for disease tracking using Twitter data.
| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Measles | Semantic Network Analysis | CDC |
| Influenza-like Illnesses (Hemophilus) | Bayesian Inference | Genbank |
| Influenza-like Illnesses (H1N1) | Semi-Supervised Deep Learning (MLP) | CDC |
| Influenza-like Illnesses | Bayesian Inference | FluWatch, Boston Public Health Commission, Chinese CDC |
| General Health | Temporal Ailment Topic Aspect Model (TM-ATAM) | CDC |
| Dengue | Time-Series Susceptible-Infected-Recovered Model | Brazilian Official Dengue case data |
| Miscellaneous | Gaussian Mixture Regression (GMR) | Map data |
General Health: generic feelings of unwellness and non-specific illness.
Summary of statistical and machine learning methods and data sources for geographic identification using Twitter data.
| Public Health Issue | Method | Complementary Data |
|---|---|---|
| Depression | Non-Negative Matrix Factorization (NMF) | |
| Dengue | Time-Series Susceptible-Infected-Recovered Model | Brazilian Health Ministry |
| Obesity | DBSCAN (Clustering) | |
| Miscellaneous | Latent Dirichlet Allocation | Map data |
Fig. 1. PRISMA flow diagram for the identification and selection of studies.
Inclusion and exclusion criteria.
| Criterion | Inclusion | Exclusion |
|---|---|---|
| Time period | 2009–2019 | Studies outside these dates |
| Language | English | Non-English articles |
| Article Type | Original peer-reviewed research | Research that was not peer-reviewed |
| Literature focus | Articles reporting on a method or application of Twitter data to address a public health issue. Articles which evaluated the performance of the statistical or machine learning technique used in drawing utility from the Twitter data. | Review articles and other articles not reporting an original contribution. Articles not focused on our above definition of public health but rather concerned with public health in the context of recruitment and outreach, public awareness and communication, information dissemination or opinion mining. Articles which do not make known the statistical or machine learning technique being used. Articles which are works in progress or otherwise do not contain the full-text, such as conference abstracts. |
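The mechanical parts of the criteria above (date range, language, peer review, original contribution, named method, full text) can be expressed as a simple filter. The following is only a minimal sketch and not the authors' screening procedure: the `Record` fields and the function names are hypothetical, and the topical "public health focus" criterion is left out because it needs human judgement.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    language: str
    peer_reviewed: bool
    original_research: bool   # False for reviews and similar articles
    names_method: bool        # states the statistical/ML technique used
    full_text: bool           # False for conference abstracts, works in progress


def meets_criteria(r: Record) -> bool:
    """Apply the mechanical inclusion criteria from the table."""
    return (
        2009 <= r.year <= 2019
        and r.language.strip().lower() == "english"
        and r.peer_reviewed
        and r.original_research
        and r.names_method
        and r.full_text
    )


def screen(records):
    """Return the records that pass every mechanical criterion."""
    return [r for r in records if meets_criteria(r)]


if __name__ == "__main__":
    # Made-up records, used only to exercise the filter.
    candidates = [
        Record("Hypothetical study A", 2015, "English", True, True, True, True),
        Record("Hypothetical study B", 2008, "English", True, True, True, True),
        Record("Hypothetical review C", 2017, "English", True, False, True, True),
    ]
    for kept in screen(candidates):
        print(kept.title)
```

Records that pass such a filter would still need manual review against the literature-focus criterion before inclusion.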
Summary of statistical and machine learning methods and data sources for surveillance using Twitter data.
| Public Health Issue | Method | Comparative Data Source |
|---|---|---|
| Cancer | Simple Statistical Analysis | CDC |
| Hepatitis A | Support Vector Machine | |
| Gastrointestinal Illnesses | Correlation Analysis | Government of Ontario, Kingston, Frontenac and Lennox & Addington Public Health |
| Suicide | ARIMA (Autoregressive Integrated Moving Average) | |
| HIV | Graph Modelling | |
| Allergies | K-Nearest Neighbour | |
| Heat Wave | Near Regression | The US National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI) |
| Heat Related Illnesses | Correlation Analysis | Government of Ontario, Kingston, Frontenac and Lennox & Addington Public Health |
| Depression | ARIMA (Autoregressive Integrated Moving Average) | |
| Syphilis | Binomial Regressions | CDC |
| Ebola | Bayesian Inference | |
| Respiratory Illness | Correlation Analysis | Government of Ontario, Kingston, Frontenac and Lennox & Addington Public Health |
| E. coli | Latent Dirichlet Allocation | Robert Koch Institute |
| Measles | Support Vector Machine | |
| Influenza-like Illnesses (Hemophilus) | Bayesian Inference | Genbank |
| Vomiting | TSVM | Public Health England |
| Gastroenteritis | TSVM | Public Health England, Robert Koch Institute |
| Salmonella | Support Vector Machine | |
| Food Borne Illness | Support Vector Machine | Southern Nevada Health District (SNHD) |
| Earthquake | Clustering | |
| Stress | Ordinal Regression | |
| Air Pollution | Self-Organizing Map (Clustering) | The European Centre for Medium-Range Weather Forecasts (ECMWF), London Air Quality Network |
| Influenza-like Illnesses (ILI) | Lexicon Analysis | Public Health England, Kingston, Frontenac and Lennox & Addington Public Health, Chinese CDC, Pan American Health Organization (PAHO), CDC, HHS data, FluWatch, Government of Ontario |
| General Health | Topic Model (Ailment Topic Aspect Model (ATAM)) | CDC, U.S. Census' State-Based Counties Gazetteer |
| Dengue | DBSCAN (Clustering) | Brazilian Health Ministry, Philippines' Department of Health, Brazilian Official Dengue case data |
| Diarrhoea | TSVM | Public Health England |
| Obesity | DBSCAN (Clustering) | |
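Several rows in the table above pair a "Correlation Analysis" method with a comparative data source. As a rough illustration of what such a comparison can look like, the sketch below computes the Pearson correlation between a weekly count of symptom-related tweets and a weekly count of officially reported cases. It is an assumption-laden example, not the procedure of any reviewed study: the function name and the numbers are invented for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented weekly counts, for illustration only (not real data).
    weekly_tweets = [120, 135, 160, 210, 260, 240, 190, 150]
    weekly_cases = [14, 15, 19, 27, 33, 30, 22, 17]
    r = pearson(weekly_tweets, weekly_cases)
    print(f"Pearson r between tweet counts and reported cases: {r:.3f}")
```

A high coefficient in such a comparison only indicates that the two series move together; it does not by itself show that the Twitter signal leads or predicts the official counts.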
Note that the information shown for 2019 is not comparable to that for other years because 2019 had not yet ended when the graph was plotted.
Fig. 3. Breakdown of studies by country.
Fig. 4. Most studied diseases each year.
General Health: generic feelings of unwellness and non-specific illness.
Fig. 2. Word cloud of statistical and machine learning methods discovered in review.
Fig. 5. Most applied algorithms each year.
Fig. 6. Bubble chart showing the trends of research activity in public health application domains with time. The size of the bubble represents the number of articles in each category and year.
Fig. 7. Most applied algorithms each year.