
Is Google Trends a reliable tool for digital epidemiology? Insights from different clinical settings.

Gianfranco Cervellin1, Ivan Comelli2, Giuseppe Lippi3.   

Abstract

Internet-derived information has recently been recognized as a valuable tool for epidemiological investigation. Google Trends, a Google Inc. portal, generates data on geographical and temporal patterns according to specified keywords. The aim of this study was to compare the reliability of Google Trends in different clinical settings, both for common diseases with lower media coverage and for less common diseases attracting major media coverage. We carried out a search in Google Trends using the keywords "renal colic", "epistaxis", and "mushroom poisoning", selected on the basis of available and reliable epidemiological data. In addition, we carried out a second search for three clinical conditions (i.e., "meningitis", "Legionella Pneumophila pneumonia", and "Ebola fever") that recently received major attention from the Italian media. In our analysis, no correlation was found between data captured from Google Trends and the epidemiology of renal colics, epistaxis and mushroom poisoning. Only when searching for the term "mushroom" alone did Google Trends generate a seasonal pattern that almost overlapped with the epidemiological profile, but this was probably due mostly to searches about harvesting and cooking rather than poisoning. The Google Trends data also failed to reflect the geographical and temporal patterns of disease for meningitis, Legionella Pneumophila pneumonia and Ebola fever. The results of our study confirm that Google Trends has modest reliability for defining the epidemiology of relatively common diseases with minor media coverage, or relatively rare diseases with higher audience. Overall, Google Trends seems to be more influenced by media clamor than by true epidemiological burden.
Copyright © 2017 Ministry of Health, Saudi Arabia. Published by Elsevier Ltd. All rights reserved.

Keywords:  Digital epidemiology; Epistaxis; Google Trends; Mushroom poisoning; Renal colic

Year:  2017        PMID: 28756828      PMCID: PMC7320449          DOI: 10.1016/j.jegh.2017.06.001

Source DB:  PubMed          Journal:  J Epidemiol Glob Health        ISSN: 2210-6006


Introduction

Traditional methods of data collection in epidemiological studies require substantial logistics, time, and human and material resources, which has prompted the search for alternative data collection strategies [1]. Since the internet has increasingly become a meaningful health resource for both laypeople and health professionals, internet-derived information has been recognized as a surrogate tool for estimating epidemiology and gathering data about patterns of disease and population behavior [2]. Internet query platforms, which allow users to interact with internet-based data, have been considered a source of potentially useful and accessible resources, especially for identifying outbreaks and implementing intervention strategies [3]. The US Institute of Medicine (IOM) has also recently acknowledged that the use of internet data in health care research holds promise, and may also “complement and extend the data foundations that presently exist” [4]. Google Trends, a free and publicly accessible online Alphabet Inc. portal, analyzes a portion of the billions of daily Google searches, generating data on geographical and temporal patterns according to specified keywords [5]. The usefulness of this search engine has been recognized for investigating epidemiological trends of specific diseases or groups of symptoms [6]. It has also been used in many research publications so far [7-11], but there is limited knowledge about the potential uses and limitations of Google Trends. Moreover, no agreed standards have been established so far for the appropriate use of this freely available search engine. A recent systematic review concluded that “Google Trends is being used to study health phenomena in a variety of topic domains in myriad ways, but poor documentation of methods precludes the reproducibility of the findings” [6].
Therefore, the aim of this study was to compare the reliability of Google Trends data in different clinical settings, in particular for some very common diseases with poor media coverage (i.e., low number of newspaper articles), as well as for other less common diseases attracting major media coverage (i.e., high number of newspaper articles).

Methods

Google Trends uses a fraction of searches for a specific term (“keyword” or “search term”), and then analyses the Google search outcome according to a given geographical location and a defined timeframe. A relative search volume (RSV, or Google Trends Index) is then assigned to the keyword, standardizing it from 0 to 100, where 100 represents the highest share of the term over a time series [6,12]. We carried out a search in Google Trends using the Italian equivalents of the English keywords “renal colic” (Italian: “colica renale”), “epistaxis” (Italian: “epistassi”) [along with “nose bleeding” (Italian: “sangue da naso”)], and “mushroom” (Italian: “funghi”) [along with “mushroom poisoning” (Italian: “intossicazione da funghi”)]. We selected these conditions because reliable epidemiological data, specifically associated with microclimate changes and seasonality, have been previously published, based on our Emergency Department (ED) epidemiology [13-15]. The web search was focused on the Parma Province in northern Italy, since the data we published earlier specifically refer to this geographical area. The time limit of the Google Trends search matched exactly that of the published epidemiological data, i.e., years 2002–2010 for renal colics, 2003–2012 for epistaxis and 2007–2016 for mushroom poisoning. The three clinical conditions have very poor media coverage, since no articles were published in local media about these topics over the same period of time (see below). A second Google Trends search was then carried out for three additional clinical conditions [i.e., “meningitis” (Italian: “meningite”), “Legionella Pneumophila pneumonia” (Italian: “legionella”) and “Ebola fever” (Italian: “Ebola”)], which recently received extensive media attention.
Poor or high media coverage was defined by systematically checking the on-line archives of the local newspaper (Gazzetta di Parma) for local epidemiology (i.e., renal colic, epistaxis, mushroom poisoning), and the on-line archives of the local and the three main national newspapers (Corriere della Sera, Repubblica, and La Stampa) for the other topics (i.e., meningitis, Legionella, autism, vaccines, myocardial infarction, and influenza). Meningitis was found to be the most frequent healthcare topic covered by Italian newspapers in 2016, whereas Legionella Pneumophila pneumonia was found to be the most frequent healthcare topic covered in local newspapers (i.e., province of Parma, about 438,000 inhabitants, with excellent internet connectivity, reaching up to 95% of the territory) in the year 2016, since a small outbreak of the disease occurred in town between September and October 2016. Ebola fever was also found to be one of the most covered topics in Italy during the year 2014 (i.e., the beginning of the African outbreak), despite the fact that the national epidemiological burden of disease was negligible. The entire year 2016 was searched for meningitis and Legionella Pneumophila pneumonia, and the entire year 2014 for Ebola virus fever.
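
As a rough illustration of the 0 to 100 standardization described above, the following Python sketch converts monthly search shares into an index whose peak equals 100. It is a simplified assumption about how such an index can be derived, not Google's actual implementation (which works on a sample of searches); the function name `relative_search_volume` and all numbers are hypothetical and are not study data.

```python
def relative_search_volume(term_searches, total_searches):
    """Scale a term's share of all searches to a 0-100 index (peak = 100).

    Simplified sketch: the share in each period is divided by the highest
    share observed in the series, then multiplied by 100 and rounded.
    """
    shares = [t / n for t, n in zip(term_searches, total_searches)]
    peak = max(shares)
    if peak == 0:
        # No searches at all for the term: the index is zero everywhere.
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Hypothetical counts for four consecutive months (NOT study data).
term_searches = [20, 40, 10, 30]
total_searches = [10_000, 10_000, 10_000, 10_000]

rsv = relative_search_volume(term_searches, total_searches)
print(rsv)
```

Because every value is expressed relative to the series maximum, the index conveys the shape of interest over time but not the absolute number of searches.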

Results

The main results of our study are shown in Figs. 1–3. A negligible overlap was observed between the seasonality of published data and Google Trends results for renal colics, epistaxis and mushroom poisoning (Figs. 1–3). Throughout the different years of analysis, the incidence of renal colics exhibited a considerable increase between May and August, with a peak in July. Unlike these real epidemiological data, the information on renal colics obtained by searching Google Trends did not show a significant seasonal pattern, and also showed remarkable differences from year to year, with no apparent correlation with local epidemiological information (Fig. 1). Unlike renal colics, epistaxis displayed the opposite seasonality in our province, with a peak between December and January. Even in this case Google Trends was not able to capture the true epidemiological pattern, displaying a large annual variability and a rather unpredictable outline (Fig. 2). Different results were obtained with “mushroom”. The Google Trends data displayed a seasonal pattern, almost overlapping with the real epidemiological profile. However, when the keyword “mushroom poisoning” was used, Google Trends generated a considerably different pattern (Fig. 3). It is hence likely that the large search volume for “mushroom” obtained from Google Trends was mostly attributable to searches for information about harvesting and cooking rather than to real cases of mushroom poisoning.
Fig. 1.

Number of renal colics seen in the ED, and average of Google Trends Index (referred to the Parma Province), calculated monthly, years 2007–2016.

Fig. 2.

Number of epistaxis episodes seen in the ED, and average of Google Trends Index (referred to the Parma Province) (double search, i.e., “epistaxis” and “nose bleeding”), calculated monthly, years 2007–2016.

Fig. 3.

Number of mushroom poisonings seen in the ED, and average of Google Trends Index (referred to the Parma Province) (double search, i.e., “mushrooms” and “mushroom poisoning”), calculated monthly, years 2007–2016.
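
The seasonal comparison reported above can be sketched in Python as follows. The source does not state which statistic was used to assess the correlation, so Pearson's coefficient on month-of-year averages is an assumption made here for illustration only. The helper names (`monthly_profile`, `pearson`, `to_records`) are hypothetical, and all values are synthetic placeholders rather than study data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def monthly_profile(records):
    """Average (year, month, value) records by calendar month, January to December.

    Every month from 1 to 12 must appear at least once in the records.
    """
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return [mean(by_month[m]) for m in range(1, 13)]


def pearson(x, y):
    """Pearson correlation coefficient; undefined if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def to_records(by_year):
    """Flatten {year: [12 monthly values]} into (year, month, value) tuples."""
    return [(y, m, v) for y, vals in by_year.items()
            for m, v in enumerate(vals, start=1)]


# Synthetic placeholder values (NOT study data): monthly ED visits with a
# summer peak, and a search index without a clear seasonal shape.
ed_by_year = {
    2015: [30, 28, 35, 40, 55, 70, 85, 75, 50, 38, 32, 29],
    2016: [32, 30, 33, 42, 58, 72, 88, 77, 52, 36, 30, 31],
}
rsv_by_year = {
    2015: [40, 62, 35, 71, 48, 55, 38, 66, 44, 59, 37, 50],
    2016: [58, 41, 69, 36, 63, 45, 52, 39, 70, 43, 61, 47],
}

ed_profile = monthly_profile(to_records(ed_by_year))
rsv_profile = monthly_profile(to_records(rsv_by_year))
r = pearson(ed_profile, rsv_profile)
peak_month = ed_profile.index(max(ed_profile)) + 1  # calendar month of the ED peak

print(f"ED peak month: {peak_month}, Pearson r = {r:.2f}")
```

Averaging by calendar month across years mirrors the "calculated monthly" wording of the figure captions; a rank-based coefficient could be substituted if the relationship is not expected to be linear.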

A fairly constant number of ∼190 cases/year of meningococcal meningitis was recorded in Italy between the years 2011 and 2016, with a modestly increased trend in the Tuscany Region (2015: Tuscany 38 cases, followed by Lombardy, with 34 cases). Nevertheless, the media coverage of these cases was obsessive, often generating misleading information, since meningococcal meningitis was confused with other non-epidemic forms (i.e., Streptococcus Pneumoniae, Hemophilus). Notably, this also contributed to generating a paranoid and unjustified fear of travelling to Tuscany. This is clearly reflected by the peak of Google search data using the keyword “meningitis” (Fig. 4).
Fig. 4.

Number of meningococcal meningitis in the Emilia Romagna Region, and average of Google Trends Index (referred to the Parma Province) (term “meningitis”), calculated monthly, year 2016.

Although an outbreak of only 41 cases (with 2 deaths in elderly patients, both with several comorbidities) of Legionella Pneumophila occurred in the Province of Parma (438,000 inhabitants) between September and October 2016, a considerable peak was generated by Google Trends using the keyword “Legionella”, coinciding with the weeks when the local media published an extraordinary number of articles on this small outbreak (Fig. 5). Even more surprisingly, when the local media published the news of a single case of meningitis due to Legionella Pneumophila pneumonia occurring in a small village of our Province (in February), a new peak of interest was evident in Google Trends.
Fig. 5.

Number of Legionella Pneumophila pneumonia in the Province of Parma, and average of Google Trends Index (referred to the Parma Province) (term “legionella”), calculated monthly, year 2016.

The data for the keyword “Ebola” are even more impressive. Although not a single case has ever been recorded in Northern Italy, two peaks emerged from Google Trends, in August and October 2014, corresponding to the largest media coverage of the African epidemic. In this case too, the Google Trends data failed to reflect the real geographical and temporal pattern of disease (Fig. 6).
Fig. 6.

Number of Ebola virus fever in the Emilia Romagna region, and average of Google Trends Index (referred to the Parma Province) (term “Ebola”), calculated monthly, year 2014.


Discussion

The terms ‘infodemiology’ and ‘infoveillance’ were coined by Gunther Eysenbach, with the aim of describing a new approach for public health [16,17], based on web data monitoring and data mining, within the conceptual framework of the so-called e-health [18,19]. Although the use of Google Trends for investigating the epidemiological trends of some specific diseases or groups of symptoms has considerably increased in recent years [6], the reliability of this approach remains largely speculative. As for its functional algorithm, Google Trends assigns a relative search volume (RSV) between 0 and 100 to a given keyword, where 100 represents the highest share of this keyword over time. This index is hence inherently arbitrary and not absolute [6,12]. For example, an index of “100” generated for “renal colic” when this keyword is searched alone in the year 2016 sharply decreases when the keywords “renal colic”, “myocardial infarction”, “vaccines”, “autism” and “influenza” are searched together over the same period of time. Notably, the output of the search term “autism” displays a striking peak in May, which is obviously unrelated to the real epidemiology of disease, but is possibly due to the fact that April 2 is World Autism Awareness Day, thus generating transient media coverage. Only with the keyword “influenza” do Google Trends and real epidemiology data apparently overlap (Fig. 7).
Fig. 7.

Comparison of the average Google Trends Indexes for five different medical terms (i.e., renal colic, myocardial infarction, influenza, vaccines, and autism), Emilia Romagna Region, year 2016.
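
The drop in the index of a term when it is searched together with more popular terms can be illustrated with a small Python sketch. It assumes, as a simplification, that all terms in a joint query are rescaled against the single highest share among them; the function name `joint_rsv` and every number below are hypothetical and do not come from the study or from Google Trends.

```python
def joint_rsv(shares_by_term):
    """Rescale several terms against the single highest share among them.

    Sketch under a simplifying assumption: in a joint query, 100 is assigned
    to the largest share across ALL terms and periods, so less searched
    terms receive proportionally smaller values.
    """
    peak = max(max(series) for series in shares_by_term.values())
    return {
        term: [round(100 * s / peak) for s in series]
        for term, series in shares_by_term.items()
    }


# Hypothetical searches per 100,000 queries over four periods (NOT study data).
shares = {
    "renal colic": [2, 4, 2, 2],
    "influenza": [40, 20, 8, 10],
}

alone = joint_rsv({"renal colic": shares["renal colic"]})
together = joint_rsv(shares)

print(alone["renal colic"])     # peak of the term's own series is 100
print(together["renal colic"])  # same searches, rescaled against "influenza"
```

In this toy case the underlying search counts for the less popular term are identical in both queries; only the reference maximum changes, which is why values from separate queries cannot be compared directly.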

One important issue that emerges from these data is that Google Trends tends to underestimate the real epidemiological burden when the general public has poor knowledge of a given disease. For example, Google Trends underestimated the official surveillance statistics of flu during the first pandemic wave of H1N1 virus in the United States, but mirrored the real epidemiological pattern during the second wave, between the years 2009 and 2010 [20]. The search volumes of Google Trends are frequently found to be increased for conditions with large media coverage or, at least, during periods characterized by a higher burden of disease, so that they are gaining momentum in surveillance studies on several epidemiologically relevant diseases [6]. This is the case, for example, of Ebola fever, which fortunately did not directly involve European countries, but was the focus of large media coverage, thus representing “a stew of fear”, as defined by an editorial published in the New England Journal of Medicine [21]. It has also been recently suggested that media coverage of health-related news does not disclose costs, risks and conflicts of interest, but often overemphasizes benefits and exaggerates claims [22,23], thus supporting the concept that popular media may sometimes be detrimental rather than really useful for public health [24]. Taken together, the results of our study confirm that Google Trends has very modest reliability for delineating the true population epidemiology of relatively common diseases with poor media coverage or rarer diseases with large audience. Overall, Google Trends seems to be more influenced by media clamor than by the true epidemiological impact of disease, at least in the diseases examined here.
Therefore, the real scientific usefulness of the so-called “digital epidemiology” remains questionable, at least when using Google Trends. Although mining the Web is an intriguing prospect, this source of information cannot be taken for granted, nor can it replace the efforts of public health care organizations and clinicians to obtain “real life” epidemiological data.
  21 in total

1.  Does the media support or sabotage health?

Authors: 
Journal:  Lancet       Date:  2009-02-21       Impact factor: 79.321

2.  An easy to use and affordable home-based personal eHealth system for chronic disease management based on free open source software.

Authors:  Tatjana M Burkow; Lars K Vognild; Trine Krogstad; Njål Borch; Geir Ostengen; Astrid Bratvold; Marijke Jongsma Risberg
Journal:  Stud Health Technol Inform       Date:  2008

3.  Using Search Engine Query Data to Explore the Epidemiology of Common Gastrointestinal Symptoms.

Authors:  Benjamin G Hassid; Lukejohn W Day; Mohannad A Awad; Justin L Sewell; E Charles Osterberg; Benjamin N Breyer
Journal:  Dig Dis Sci       Date:  2016-11-23       Impact factor: 3.199

4.  Influence of World Thrombosis Day on digital information seeking on venous thrombosis: a Google Trends study.

Authors:  L J J Scheres; W M Lijfering; S Middeldorp; S C Cannegieter
Journal:  J Thromb Haemost       Date:  2016-11-19       Impact factor: 5.824

5.  Digital epidemiology reveals global childhood disease seasonality and the effects of immunization.

Authors:  Kevin M Bakker; Micaela Elvira Martinez-Bakker; Barbara Helm; Tyler J Stevenson
Journal:  Proc Natl Acad Sci U S A       Date:  2016-05-31       Impact factor: 11.205

6.  Use of Google Insights for Search to track seasonal and geographic kidney stone incidence in the United States.

Authors:  Benjamin N Breyer; Saunak Sen; David S Aaronson; Marshall L Stoller; Bradley A Erickson; Michael L Eisenberg
Journal:  Urology       Date:  2011-04-03       Impact factor: 2.649

7.  The unbearable lightness of health science reporting: a week examining Italian print media.

Authors:  Luca Iaboli; Luana Caselli; Angelina Filice; Gianpaolo Russi; Eleonora Belletti
Journal:  PLoS One       Date:  2010-03-24       Impact factor: 3.240

8.  Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet.

Authors:  Gunther Eysenbach
Journal:  J Med Internet Res       Date:  2009-03-27       Impact factor: 5.428

9.  New times, new needs; e-epidemiology.

Authors:  Alexandra Ekman; Jan-Eric Litton
Journal:  Eur J Epidemiol       Date:  2007-05-16       Impact factor: 12.434

10.  Reassessing Google Flu Trends data for detection of seasonal and pandemic influenza: a comparative epidemiological study at three geographic scales.

Authors:  Donald R Olson; Kevin J Konty; Marc Paladini; Cecile Viboud; Lone Simonsen
Journal:  PLoS Comput Biol       Date:  2013-10-17       Impact factor: 4.475

