Amaryllis Mavragani, Gabriela Ochoa, Konstantinos P Tsagarakis.
Abstract
BACKGROUND: In the era of information overload, are big data analytics the answer to accessing and better managing available knowledge? Over the last decade, the use of Web-based data in public health issues, that is, infodemiology, has proven useful in assessing various aspects of human behavior. Google Trends is the most popular tool for gathering such information, and it has been applied to several topics so far, with health and medicine being the most studied subject. Web-based behavior is monitored and analyzed in order to examine actual human behavior so as to predict, better assess, and even prevent health-related issues that constantly arise in everyday life.
Keywords: Google Trends; big data; health assessment; infodemiology; medicine; review; statistical analysis
Year: 2018 PMID: 30401664 PMCID: PMC6246971 DOI: 10.2196/jmir.9366
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of the selection procedure for including studies.
Figure 2. Google Trends' publications per year in health-related fields from 2009 to 2016.
Description of the parameters used for classification.
| Parameter | Description |
| Authors | Includes the surname of the authors, date of publication, and link to the reference list (eg, |
| Period | Refers to the time-frame for which Google Trends data were retrieved and used in the study (eg, |
| Region | Refers to the country or countries or region (eg, |
| Language | Refers to the language in which the Google Trends search was conducted (eg, search for the Italian word |
| Keywords | Basic keywords are included in this category, mostly referring to the health topic examined and important keywords used to describe it. |
| Visualization (V) | Includes any form of visualization, that is, figures, maps, and screenshots (eg, screenshots of the Google Trends website). |
| Seasonality (S) | Studies that have explored the seasonality of the respective topic are included. |
| Correlations (C) | Studies that have examined correlations are included in this category. Correlations may be between Google Trends data and official data, among Google Trends time series, or between Google Trends and other Web-based sources’ time series. |
| Forecasting (F) | This category includes studies that conducted forecasting of either Google Trends time series or diseases, outbreaks, etc, using Google Trends data, independent of the method used. |
| Modeling (M) | Studies in this category conducted some form of modeling using Google Trends data. |
| Statistical Tools (St) | This category includes the studies that used statistical tools or tests, eg, |
Figure 3. Countries by number of Scopus and PubMed publications using Google Trends.
Methods for exploring seasonality with Google Trends in health assessment.
| Number | Authors | Method | Description |
| 1 | Bakker et al, 2016 [ | Morlet Wavelet Analysis | To test the seasonality of Google Trends data in the examined countries |
| 2 | Braun and Harreus, 2013 [ | Visual evidence | N/Aᵃ |
| 3 | Crowson et al, 2016 [ | Seasonal peaks | N/A |
| 4 | Deiner et al, 2016 [ | Spearman correlation | Correlating the seasonality of clinical diagnoses with Google Trends data |
| 5 | El-Sheikha, 2015 [ | Kruskal-Wallis test | To show seasonality for different months |
| 6 | Garrison et al, 2015 [ | Least-squares sinusoidal model | Variability in outcomes (supported also from a comparison with searches in Australia) |
| 7 | Harsha et al, 2014 [ | Kruskal-Wallis test | Seasonal (monthly) comparisons |
| 8 | Harsha et al, 2015 [ | Kruskal-Wallis test | Seasonal (monthly) comparisons |
| 9 | Hassid et al, 2016 [ | Pearson correlation | To examine seasonal variations across symptoms |
| 10 | Ingram and Plante, 2013 [ | Cosinor analysis; analysis of variance | To test the seasonal variation of the normalized Google Trends data; to compare the seasonal increase among the examined countries |
| 11 | Ingram et al, 2015 [ | Cosinor analysis | To test the seasonal variation of the normalized Google Trends data |
| 12 | Kang et al, 2015 [ | Visual observation | N/A |
| 13 | Leffler et al, 2010 [ | Correlations | Showing correlations among the 4 seasons for the 39 examined terms |
| 14 | Liu et al, 2016 [ | Seasonal model and a null model | Seasonality explained the searches significantly better with an F-test |
| 15 | Phelan et al, 2016 [ | Correlograms (autocorrelations plots) | Visual interpretation for exploring seasonal peaks |
| 16 | Plante and Ingram, 2014 [ | Cosinor analysis | To test the seasonal variation of the normalized Google Trends data |
| 17 | Rossignol et al, 2013 [ | Mann-Whitney U test; Harmonic Product Spectrum | Comparison of summer vs winter hits; evaluation of seasonality |
| 18 | Seifter et al, 2010 [ | Visual evidence | N/A |
| 19 | Sentana-Lledo et al, 2016 [ | Cosinor analysis | To test the seasonal variations of the Google Trends data |
| 20 | Takada, 2012 [ | Visual evidence | N/A |
| 21 | Telfer and Woodburn, 2015 [ | Two-way Wilcoxon signed rank test | To explore differences between winter and summer |
| 22 | Toosi and Kalia, 2015 [ | Visual evidence; cosinor analysis | To identify differences in seasonality between countries |
| 23 | Willson et al, 2015 [ | Visual evidence | N/A |
| 24 | Zhang et al, 2015 [ | Periodograms; ideal pass filter | To study the periodograms; to extract seasonal components |
ᵃN/A: not applicable.
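Several of the studies listed above test seasonality with cosinor analysis or a least-squares sinusoidal model. As a rough illustration only (not the code of any listed study), the sketch below fits a single cosine of known period to a monthly series. It assumes equally spaced data covering a whole number of periods, so the least-squares solution has a closed form; the function name `cosinor_fit` and the synthetic series are hypothetical.

```python
import math


def cosinor_fit(values, period=12):
    """Fit y = mesor + a*cos(w*t) + b*sin(w*t) by least squares.

    Assumption: samples are equally spaced and cover a whole number of
    periods (period > 2), so the regressors are mutually orthogonal and
    the coefficients can be computed directly.
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("series must cover a whole number of periods")
    w = 2 * math.pi / period
    mesor = sum(values) / n  # rhythm-adjusted mean level
    a = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(values))
    amplitude = math.hypot(a, b)  # half the peak-to-trough range
    peak_time = (math.atan2(b, a) / w) % period  # time index of the peak
    return mesor, amplitude, peak_time


# Synthetic (made-up) monthly series: level 50, amplitude 20, peak at index 1.
series = [50 + 20 * math.cos(2 * math.pi * (t - 1) / 12) for t in range(36)]
mesor, amplitude, peak_time = cosinor_fit(series)
print(round(mesor, 3), round(amplitude, 3), round(peak_time, 3))
```

A published cosinor analysis would normally also report a significance test for the amplitude; that step is omitted here.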
Methods of exploring correlations using Google Trends in health assessment.
| Number | Authors | Method | Description |
| 1 | Alicino et al, 2015 [ | Pearson correlation | Ebola-related Google Trends data with Ebola cases |
| 2 | Arora et al, 2016 [ | Spearman correlation | Suicide search activity vs official suicide rates (and per age) |
| 3 | Bakker et al, 2016 [ | Correlations | Between Google Trends data and reported cases |
| 4 | Bragazzi et al, 2016 [ | Pearson correlation | Between Google Trends data and epidemiological data |
| 5 | Bragazzi, 2013 [ | Autocorrelation; Pearson correlation | For the time series for multiple sclerosis (MS); between MS terms |
| 6 | Bragazzi et al, 2016 [ | Autocorrelation; Partial Autocorrelation | To compute correlation of the time series with its own values |
| 7 | Bragazzi et al, 2016 [ | Pearson correlation | Status epilepticus terms with etiology and management related terms |
| 8 | Bragazzi et al, 2016 [ | Pearson correlation | Google searches for Silicosis with Normalized Google News, Google Scholar, PubMed Publications, Twitter traffic, Wikipedia |
| 9 | Bragazzi et al, 2016 [ | Pearson correlation | Among Google Trends data and other data generating sources |
| 10 | Bragazzi, 2014 [ | Pearson correlation; autocorrelation and partial autocorrelation | Nonsuicidal self-injury and related terms; nonsuicidal self-injury plots showed regular cyclical pattern |
| 11 | Cavazos-Rehg et al, 2015 [ | Pearson correlation | Among Google Trends data for noncigarette tobacco and prevalence |
| 12 | Cho et al, 2013 [ | Pearson correlation | Google flu-related queries with surveillance data for different influenza seasons |
| 13 | Crowson et al, 2016 [ | Pearson correlation | Between the selected keywords; between medical prescription data and Google Trends data |
| 14 | Deiner et al, 2016 [ | Spearman correlation | For correlating seasonality of clinical diagnoses with Google Trends data |
| 15 | Domnich et al, 2015 [ | Pearson correlation | Among the examined search terms and influenza-like illness |
| 16 | Foroughi et al, 2016 [ | Rank correlations; cross-country correlations; Pearson correlations | For search volumes; for the search volumes for cancer; for the weekly search volumes between countries |
| 17 | Gahr et al, 2015 [ | Pearson correlation | Among annual prescription volumes and Google Trends data |
| 18 | Gamma et al, 2016 [ | Cross-correlations | Cross-correlations between search volumes and crime statistics |
| 19 | Gollust et al, 2016 [ | Multinomial Logit Models | To relate health insurance rates |
| 20 | Guernier et al, 2016 [ | Spearman correlation; cross-correlation | Correlating the examined search terms with recorded notifications of tick paralysis cases; with lag values from −7 to +7 months |
| 21 | Hassid et al, 2016 [ | Pearson correlation | Between Google Trends data and National Inpatient Sample data |
| 22 | Johnson et al, 2014 [ | Pearson correlation | Pearson correlations to explore the relation of Google Trends data and sexually transmitted infection reported rates |
| 23 | Kang et al, 2013 [ | Pearson correlation | To explore the association of (and among) search terms with surveillance data |
| 24 | Kang et al, 2015 [ | Spearman correlation | Google Trends data for allergic rhinitis and related Google Trends terms and real world epidemiologic data for the United States |
| 25 | Koburger et al, 2015 [ | Spearman-Brown correlation | To explore relations among Google Trends data and railway suicides |
| 26 | Ling and Lee, 2016 [ | Pearson correlation | Between disease prevalence and Google Trends data |
| 27 | Mavragani et al, 2016 [ | Pearson correlation | Between Google Trends data and published papers and Google Trends data with prescriptions |
| 28 | Phelan et al, 2016 [ | Linear Regression | To examine if there is significant correlation between searches and time |
| 29 | Poletto et al, 2016 [ | Pearson correlation | Between Google Trends data and number of alerts published by ProMED mail and the number of Disease Outbreak News published by the World Health Organization |
| 30 | Pollett et al, 2015 [ | Pearson correlation | To shortlist related search terms to pertussis |
| 31 | Rohart et al, 2016 [ | Spearman rank correlations; Spearman correlation; cross-correlations | For the diseases examined; correlations between diseases and the investigated search metrics; to identify best lags |
| 32 | Shin et al, 2016 [ | Spearman correlation | Between Google Trends data and the number of confirmed cases of Middle East Respiratory Syndrome and for quarantined cases of Middle East Respiratory Syndrome |
| 33 | Schootman et al, 2015 [ | Pearson correlation | Between relative search volume and Behavioral Risk Factor Surveillance System prevalence data for 5 cancer screening tests |
| 34 | Schuster et al, 2010 [ | Correlations | Lipitor Google Trends data and Lipitor revenues |
| 35 | Sentana-Lledo et al, 2016 [ | Kendall’s Tau-b test | To explore the correlation of Google Trends data with paper interview survey results |
| 36 | Simmering et al, 2014 [ | Cross-correlations | Between Google Trends data for drugs and drug utilization, to see changes in search volumes following knowledge events |
| 37 | Solano et al, 2016 [ | Correlations; cross-correlations | Between Google Trends data for suicide and national suicide rates; between different search terms |
| 38 | Wang et al, 2015 [ | Pearson correlation | Between Google Trends data and new dementia cases |
| 39 | Willson et al, 2015 [ | Spearman correlation | Between Google Trends data and observed data for aeroallergens |
| 40 | Zhang et al, 2015 [ | Cross-correlations | To examine linear and temporal associations of the seasonal data |
| 41 | Zhang et al, 2016 [ | Pearson correlation | To study pairwise comparisons among searches for different terms in Google Trends |
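Most of the studies in this table rely on Pearson or Spearman correlation, and several add cross-correlations at different lags. The sketch below shows how these three quantities relate, using only the Python standard library. It is an illustration with made-up numbers, not a reproduction of any listed analysis; the names `searches`, `cases`, and `lagged_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_correlation(searches, cases, max_lag):
    """Pearson r between searches[t] and cases[t + lag] for each lag."""
    out = {}
    n = len(searches)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = searches[: n - lag], cases[lag:]
        else:
            a, b = searches[-lag:], cases[: n + lag]
        out[lag] = pearson(a, b)
    return out


# Made-up monthly data in which cases trail searches by two months.
searches = [10, 12, 15, 30, 45, 38, 22, 14, 11, 10, 13, 16]
cases = [9, 11] + searches[:-2]
by_lag = lagged_correlation(searches, cases, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])  # monotonic, so rho is 1
print(best_lag, round(by_lag[best_lag], 3), round(rho, 3))
```

A positive `best_lag` here means that search activity leads the reported series, which is the situation in which search data could serve as an early indicator.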
Forecasting and predictions using Google Trends in health assessment.
| Number | Authors | Method | Description |
| 1 | Bakker et al, 2016 [ | Statistical model | For forecasting chickenpox force of infection, that is, monthly per capita rate of infection of children 0-14 |
| 2 | Domnich et al, 2015 [ | Generalized least squares (maximum likelihood estimates); Holt-Winters | Query-based models to predict influenza-like illness morbidity, with the explanatory variables: Influenza, Fever, Tachipirin; compared for forecasting power with Holt-Winters based on the real data (hold out set) |
| 3 | Parker et al, 2016 [ | Statistical model | For forecasting deaths for 1 year in advance (2015) |
| 4 | Pollett et al, 2015 [ | Prediction model | Tested the predicted model with a left-out dataset for prediction accuracy |
| 5 | Rohart et al, 2016 [ | Linear models | To forecast with 1 or 2 weeks step |
| 6 | Solano et al, 2016 [ | Cross-Correlations | Forecasting for suicides for 2 years without data (2013-14) based on Google Trends data of those years |
| 7 | Wang et al, 2015 [ | Cross-Correlations | To investigate forecasting with lags of 0-12 months |
| 8 | Zhang et al, 2016 [ | Autoregressive Moving Average | To predict relative search volume for “dabbing” |
| 9 | Zhou et al, 2011 [ | Dynamic model | To provide real time estimations by correcting the forecasting with the new morbidity data when published |
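Two ideas recur in the table above: forecasting a health series a fixed number of steps ahead from search data with a linear model, and checking accuracy on data left out of the fit. The sketch below combines both under simplifying assumptions: a single predictor, ordinary least squares, and mean squared error as the accuracy measure. All names and numbers are hypothetical and chosen only so the example runs.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def holdout_forecast(searches, cases, horizon, n_test):
    """Predict cases[t + horizon] from searches[t]; score on the last n_test pairs."""
    if horizon < 0 or n_test < 1:
        raise ValueError("horizon must be >= 0 and n_test >= 1")
    x = searches[: len(searches) - horizon]  # predictor at time t
    y = cases[horizon:]                      # target at time t + horizon
    x_train, y_train = x[:-n_test], y[:-n_test]
    x_test, y_test = x[-n_test:], y[-n_test:]
    intercept, slope = fit_line(x_train, y_train)
    predictions = [intercept + slope * v for v in x_test]
    mse = sum((p - o) ** 2 for p, o in zip(predictions, y_test)) / n_test
    return predictions, mse


# Made-up data: cases follow searches exactly one step later, with no noise.
searches = [20, 25, 40, 60, 80, 70, 50, 35, 30, 28, 45, 65]
cases = [12.0] + [3 + 0.5 * s for s in searches[:-1]]
predictions, mse = holdout_forecast(searches, cases, horizon=1, n_test=3)
print([round(p, 2) for p in predictions], round(mse, 6))
```

Because the synthetic relationship is exact, the hold-out error is essentially zero; real series would need noise handling, seasonality terms, and a longer evaluation window.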
Statistical modeling using Google Trends in health assessment.
| Number | Authors | Method | Description |
| 1 | Alicino et al, 2015 [ | Multivariate regression | For relating Ebola Google Trends data, number of Ebola Cases, and the Human Development Index |
| 2 | Bakker et al, 2016 [ | Statistical model | For forecasting chickenpox force of infection, that is, monthly per capita rate of infection |
| 3 | Bentley and Ormerod, 2009 [ | Maximum likelihood estimation | Established social model for engaging a new behavior for Web-based searching for flu terms |
| | Barnes et al, 2015 [ | Hierarchical linear modeling | Three levels: 3 Mondays, 6 years, 47 search terms |
| 4 | Bragazzi, 2013 [ | Multiple linear regression | To confirm multiannual long-term trends |
| 5 | Domnich et al, 2015 [ | Generalized linear model, autoregressive moving average process | Query volume-based models to predict influenza-like illness morbidity |
| 6 | El-Sheikha, 2015 [ | Linear regression | To show the global, regional, and country level interest for the search term |
| 7 | Fenichel et al, 2013 [ | Moving average, generalized linear model | Google Trends data as a variable in predicting losses in flights |
| 8 | Garrison et al, 2015 [ | Seasonal model | Best fit combination of a straight line and a sinusoid |
| 9 | Gollust et al, 2016 [ | Multinomial logit models | To relate health insurance rates |
| 10 | Haney et al, 2014 [ | ARIMAᵃ | Radiology residency interest |
| 11 | Harsha et al, 2014 [ | Linear model | Statistical justification of annual increase in search volumes |
| 12 | Harsha et al, 2015 [ | Linear model | Statistical justification of annual increase in search volumes and of the Web-based interest related to applications for interventional radiology |
| 13 | Leffler et al, 2010 [ | Multivariable Linear Regressions | For studying the effect of climatic and environmental variables to internet searches |
| 17 | Linkov et al, 2014 [ | Polynomial trend lines | Fitted spline polynomial trend lines per time without statistical reporting |
| 18 | Liu et al, 2016 [ | Seasonal model | Best fit combination of a straight line and a sinusoid |
| 19 | Majumder et al, 2016 [ | Linear Smoothing | To adjust HealthMap to using Google Trends, model fits |
| 20 | Noar et al, 2013 [ | Linear Regression | To estimate the slope coefficient for changes in the magnitude of the effect size of Google Trends data and media search increases |
| 21 | Parker et al, 2016 [ | L1-regularization on Google Trends | To build a model for forecasting deaths in each state |
| 22 | Phelan et al, 2014 [ | Linear Regression | To estimate the relation between news reports and search activity |
| 23 | Phelan et al, 2016 [ | Linear Regression | To examine if there is a significant correlation between searches and time |
| 24 | Pollett et al, 2015 [ | Linear Regression | Prediction model for pertussis cases based on Google Trends data of the most related terms |
| 25 | Rohart et al, 2016 [ | Linear models | To forecast with 1 or 2 weeks step |
| 26 | Scatà et al, 2016 [ | Epidemic model | Google Trends data is a measure of awareness, along with other sources |
| 27 | Schuster et al, 2010 [ | Generalized Linear models | Google Trends data for the examined drugs, Google Trends data and changes in annual revenues, and Google Trends data vs resource utilization |
| 28 | Stein et al, 2013 [ | Regression Fit Lines | To examine differences in queries |
| 29 | Telfer and Woodburn, 2015 [ | Visual decomposition; local regression | Figures 4, 6 and 8; regression-based decomposition of the time series for the search terms |
| 30 | Troelstra et al, 2016 [ | ARIMA | To account for dependency between data points in time series for “quit smoking” searches |
| 31 | Willson et al, 2015 [ | ARIMA | To quantify the effect of the observed (pollen) counts with the levels of search activity |
| 32 | Willson et al, 2015 [ | ARIMA | To quantify the effect of the observed (pollen) counts with the levels of search activity |
| 33 | Yang et al, 2015 [ | Prediction model (ARGOᵇ) | To predict influenza-like illness |
| 34 | Zhou et al, 2011 [ | Dynamic Modeling | For forecasting tuberculosis incidents using Google Trends data |
ᵃARIMA: autoregressive integrated moving average.
ᵇARGO: autoregression with Google search data.
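Two entries in the table above describe a seasonal model as the best-fit combination of a straight line and a sinusoid. One way to read that description is a least-squares fit of y = c0 + c1·t + a·cos(ωt) + b·sin(ωt) with a known period. The sketch below solves the normal equations directly with a small Gaussian elimination routine; it is an assumption about the general form, not the cited authors' implementation, and the helper names are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_trend_plus_sinusoid(values, period=12):
    """Least-squares fit of a straight line plus one sinusoid of known period."""
    w = 2 * math.pi / period
    rows = [[1.0, float(t), math.cos(w * t), math.sin(w * t)]
            for t in range(len(values))]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    intercept, slope, a, b = solve(xtx, xty)
    return intercept, slope, math.hypot(a, b)


# Made-up series: rising baseline (30 + 0.5 per month) plus a yearly cycle.
values = [30 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
intercept, slope, amplitude = fit_trend_plus_sinusoid(values)
print(round(intercept, 3), round(slope, 3), round(amplitude, 3))
```

Solving the normal equations is adequate for a four-parameter model of this size; for larger or ill-conditioned designs a QR-based solver from a numerical library would be the safer choice.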
Statistical tests and tools using Google Trends in health assessment.
| Number | Authors | Method | Description |
| 1 | Bragazzi et al, 2016 [ | Mann-Kendall test | To show the statistical difference of peaks from the remaining period |
| 2 | Bragazzi et al, 2016 [ | ARIMAᵃ | To show increased web searches due to an event, and correct seasonality |
| 3 | Campen et al, 2014 [ | Independent samples | For comparing searches with baseline period; for multiple weekly data comparisons |
| 4 | Crowson et al, 2016 [ | ANOVAᵇ (Post-hoc Tukey test) | To compare grouped geographical federal regions of the United States (Northeast, Midwest, South, West) |
| 5 | El-Sheikha, 2015 [ | Wilcoxon rank test; Mann-Whitney | To study the change of interest at different time periods; to compare Web-based interest between the Northern and Southern hemispheres |
| 6 | Gahr et al, 2015 [ | Coefficients of determination | To determine the amount of variability between annual prescription volumes and Google search terms |
| 7 | Harsha et al, 2014 [ | ANOVA (Tukey-Kramer post hoc test) | For the comparisons of US regions |
| 8 | Murray et al, 2016 [ | ANOVA; | To explore differences in months’ means per year; for the statistical differences of peaks compared with the remaining hits |
| 9 | Noar et al, 2013 [ | Augmented Dickey-Fuller tests | To test for nonstationarity of the time series |
| 10 | Phelan et al, 2014 [ | ANOVA | To explore differences among countries |
| 11 | Rohart et al, 2016 [ | Mean Square Error for Prediction | To assess prediction accuracy |
| 12 | Telfer and Woodburn, 2015 [ | Mann-Kendall trend tests | To detect trends significantly larger than the variance in the data for search terms |
| 13 | Troelstra et al, 2016 [ | ARIMA | Studied the effect of smoking cessation policies with ARIMA interrupted time series modeling ( |
| 14 | Zhang et al, 2015 [ | Augmented Dickey-Fuller test | To detect whether or not the extracted seasonal components of the studied trends were stationary |
| 15 | Zhang et al, 2016 [ | ANOVA | To examine the search interest for dabbing between groups of legal status states in the United States |
ᵃARIMA: autoregressive integrated moving average.
ᵇANOVA: analysis of variance.
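Among the tests listed above, the Mann-Kendall trend test is simple enough to sketch in a few lines. The version below computes the S statistic, a continuity-corrected normal approximation, and a two-sided P value. It assumes no tied values (no tie correction to the variance) and uses made-up data; it is meant only to show the mechanics of the test, not how any listed study applied it.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p). S counts later-minus-earlier sign
    agreements; Z uses the usual continuity correction.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1, 0, or -1
    var_s = n * (n - 1) * (2 * n + 5) / 18  # valid when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return s, z, p


# Made-up yearly search-interest values with a mostly increasing pattern.
interest = [12, 15, 14, 18, 21, 20, 25, 27, 26, 30]
s, z, p = mann_kendall(interest)
print(s, round(z, 3), p < 0.01)
```

For series with many repeated values, which is common in Google Trends data because of its 0-100 integer scale, the variance would need the standard tie correction.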
Figure 4. The four steps toward employing Google Trends for health assessment.