
Correlation between flu and Wikipedia's pages visualization.

Vincenza Gianfredi, Omar Enzo Santangelo, Sandro Provenzano.

Abstract

INTRODUCTION: This study aimed to assess whether the frequency with which the Italian general public searches for influenza on Wikipedia is aligned with the influenza cases reported by the Istituto Superiore di Sanità (ISS).
MATERIALS AND METHODS: Reported flu cases from October 2015 to May 2019 were selected. Wikipedia Trends was used to assess how many times a specific page was read by users; data were extracted as daily counts and aggregated on a weekly basis. The following data were extracted: the number of weekly views, from October 2015 to May 2019, of the pages Influenza, Febbre and Tosse (Flu, Fever and Cough in English). Cross-correlation results were obtained as product-moment correlations between the two time series.
RESULTS: In the weekly database, a temporal correlation was observed between the ISS bulletin and Wikipedia search trends. At a lag of 0, the strongest correlations were between number of cases and Flu (r=0.7571) and between Fever and Cough (r=0.7501). At a lag of -1, the strongest correlation was between Fever and Cough (r=0.7501). At a lag of +1, the strongest correlations were between number of cases and Flu (r=0.7559) and between Fever and Cough (r=0.7501).
CONCLUSIONS: A possible future application for planning and managing Public Health interventions is proposed.

Year: 2021; PMID: 33682825; PMCID: PMC7975939; DOI: 10.23750/abm.v92i1.9790

Source DB: PubMed; Journal: Acta Biomed; ISSN: 0392-4203


Introduction

Influenza is a vaccine-preventable disease caused by a single-stranded RNA virus that affects millions of people each year and causes thousands of deaths (1). According to the World Health Organization (WHO), approximately 20% of the global population is infected by influenza each year, and approximately 70,000 people die in the European Region alone (1). The same trend is registered in Italy, where, according to the National Institute of Health (Istituto Superiore di Sanità, ISS), between 4% and 15% of the population is infected yearly (2). Influenza remains a public health issue not only because of its high incidence rate, but also because of its high burden in terms of health-care costs, lost working hours, and premature death (mortality rate ≈ 13 per 100,000 in Italy) (3). However, even though a large proportion of the flu burden could be avoided thanks to a safe vaccine, flu vaccination coverage is still largely below the threshold (4-8), causing periodic epidemics worldwide. In this context, preventing the spread of flu is fundamental to controlling disease outbreaks. Nevertheless, identifying new cases through classical surveillance systems is a critical point, because it is largely affected by under-diagnosis and under-reporting (9). Moreover, traditional surveillance systems are expensive, since health-care workers manually enter data that, once aggregated, are sent to the national Health Ministry for further analysis. This approach results in a time lag that can range from weeks to several months, which might delay a prompt preventive response. By contrast, novel surveillance systems based on traces of disease-related internet activity, for instance web-page views (most frequently Wikipedia pages) (10, 11), social media posts (12), or search queries (most frequently Google Trends) (13-15), are becoming ever more attractive because they are faster and cheaper.
The hypothesis behind this approach is that an increase in flu cases means a higher number of people experiencing flu symptoms, which in turn corresponds to more flu-related web searches (or flu-related posts) from the public. These novel surveillance systems statistically associate data from traditional surveillance with internet activity, in order to explore public interest and to inform mathematical models that can predict the course of an outbreak (16). This promising and flourishing science is known as infodemiology or infoveillance, and it could overcome some of the issues of traditional systems because it is based on real-time monitoring. Therefore, the aim of the current study was to assess whether the frequency with which the Italian general public searches for influenza on Wikipedia is aligned with ISS influenza cases. Although influenza syndrome may range from few and mild respiratory symptoms to complicated pneumonia requiring hospitalization, its typical manifestations are fever, generally higher than 38°C, accompanied by cough, usually dry, persistent, and lasting 2 weeks or more. For this reason, we focused our analysis on the keywords flu, fever and cough.

Materials and methods

A cross-sectional study design was used. Reported flu cases from October 2015 to May 2019 were selected. Every week from the 42nd week of one year to the 17th week of the following year, the Istituto Superiore di Sanità (ISS) issues a bulletin with the flu cases reported in the previous week (17). Wikipedia (18) makes it possible to know how many times a specific page is viewed by users; data were extracted as daily counts and aggregated on a weekly basis, corresponding to the weeks reported in the ISS bulletins. The following data were extracted: the number of weekly views, from October 2015 to May 2019, of the pages Influenza, Febbre and Tosse (Flu, Fever and Cough in English). For the weekly database, the data extracted from Wikipedia were also shifted in time (lag) by one week into the future and by one week into the past. Cross-correlation results were obtained as product-moment correlations between the two time series. The advantage of cross-correlation is that it accounts for time dependence between two time-series variables. Statistical analyses were performed using the Pearson correlation coefficient (r). According to a rule of thumb, a correlation is strong if r > 0.7, moderate if r is between 0.3 and 0.7, and weak if r < 0.3 (19). The statistical significance level for the analyses was 0.05. The data were analyzed using the STATA statistical software, version 14 (20).
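The weekly aggregation and the lagged product-moment correlation described above can be sketched as follows. This is a minimal illustration in Python, not the STATA procedure used in the study. The function names, the toy numbers, and the sign convention for the lag (a negative lag pairs each week's cases with page views from an earlier week) are assumptions made for the example. Dropping the unmatched end week at lag ±1 leaves one fewer pair than at lag 0.

```python
from datetime import date, timedelta
from math import sqrt


def weekly_totals(daily_views):
    """Sum daily page views into ISO (year, week) buckets, in chronological order."""
    totals = {}
    for day, views in daily_views.items():
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        totals[key] = totals.get(key, 0) + views
    return [totals[key] for key in sorted(totals)]


def pearson_r(x, y):
    """Product-moment (Pearson) correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_r(cases, views, lag):
    """Correlate cases[t] with views[t + lag]; unmatched end weeks are dropped.

    The sign convention is an assumption made for this sketch: a negative lag
    pairs each week's cases with page views from an earlier week.
    """
    if lag >= 0:
        paired_cases, paired_views = cases[: len(cases) - lag], views[lag:]
    else:
        paired_cases, paired_views = cases[-lag:], views[: len(views) + lag]
    return pearson_r(paired_cases, paired_views)


# Invented toy data, not the study's: 14 days of views from Monday 12 October 2015.
daily = {date(2015, 10, 12) + timedelta(days=i): i + 1 for i in range(14)}
print(weekly_totals(daily))

# Invented weekly series in which page views run one week ahead of cases.
cases = [10, 20, 40, 80, 40, 20, 10]
views = [20, 40, 80, 40, 20, 10, 5]
for lag in (-1, 0, 1):
    print(lag, round(lagged_r(cases, views, lag), 4))
```

Grouping by ISO year and week keeps weeks that straddle the new year in order, which matters for a surveillance season running from week 42 to week 17.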

Results

A temporal correlation was observed between the ISS bulletin and Wikipedia search trends (Fever, Cough and Flu). Table 1 shows the correlations between the number of reported flu cases and the Wikipedia search terms for weeks at lag 0. A strong correlation was found for “Number of cases” and “Flu” (r=0.7571) and for “Fever” and “Cough” (r=0.7501). A moderate correlation was found for “Number of cases” and “Fever” (r=0.5320), “Fever” and “Flu” (r=0.6822), and “Cough” and “Flu” (r=0.3345).
Table 1.

Number of reported flu cases and Wikipedia search terms, results for weeks at lag 0 (Pearson correlation coefficient).

Variable         Statistic     Number of cases  Fever    Cough    Flu
Number of cases  r             1.0000
                 observations  112
Fever            r             0.5320           1.000
                 p-value       <0.001
                 observations  112              112
Cough            r             0.1288           0.7501   1.0000
                 p-value       0.1760           <0.001
                 observations  112              112      112
Flu              r             0.7571           0.6822   0.3345   1.0000
                 p-value       <0.001           <0.001   <0.001
                 observations  112              112      112      112
Table 2 shows the correlations between the number of reported flu cases and the Wikipedia search terms for weeks at lag -1. A strong correlation was found for “Fever” and “Cough” (r=0.7501). A moderate correlation was found for “Number of cases” and “Flu” (r=0.6956), “Number of cases” and “Fever” (r=0.4823), “Fever” and “Flu” (r=0.6822), and “Cough” and “Flu” (r=0.3345).
Table 2.

Number of reported flu cases and Wikipedia search terms, results for weeks at lag -1 (Pearson correlation coefficient).

Variable         Statistic     Number of cases  Fever    Cough    Flu
Number of cases  r             1.0000
                 observations  111
Fever            r             0.4823           1.000
                 p-value       <0.001
                 observations  111              112
Cough            r             0.1009           0.7501   1.0000
                 p-value       0.2920           <0.001
                 observations  111              112      112
Flu              r             0.6956           0.6822   0.3345   1.0000
                 p-value       <0.001           <0.001   <0.001
                 observations  111              112      112      112
Table 3 shows the correlations between the number of reported flu cases and the Wikipedia search terms for weeks at lag +1. A strong correlation was found for “Number of cases” and “Flu” (r=0.7559) and for “Fever” and “Cough” (r=0.7501). A moderate correlation was found for “Number of cases” and “Fever” (r=0.5108), “Fever” and “Flu” (r=0.6822), and “Cough” and “Flu” (r=0.3345).
Table 3.

Number of reported flu cases and Wikipedia search terms, results for weeks at lag +1 (Pearson correlation coefficient).

Variable         Statistic     Number of cases  Fever    Cough    Flu
Number of cases  r             1.0000
                 observations  111
Fever            r             0.5108           1.000
                 p-value       <0.001
                 observations  111              112
Cough            r             0.1294           0.7501   1.0000
                 p-value       0.1757           <0.001
                 observations  111              112      112
Flu              r             0.7559           0.6822   0.3345   1.0000
                 p-value       <0.001           <0.001   <0.001
                 observations  111              112      112      112
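As a reading aid for the three tables, the rule of thumb quoted in Materials and methods (strong if r > 0.7, moderate between 0.3 and 0.7, weak if r < 0.3) can be written as a small function. This is only a sketch: the function name is hypothetical, and two choices are assumptions not stated in the text, namely taking the absolute value of r and counting the boundary values 0.3 and 0.7 as moderate. The coefficients below are the lag 0 values from Table 1.

```python
def correlation_strength(r):
    """Label a Pearson r using the rule of thumb quoted in Materials and methods."""
    magnitude = abs(r)  # assumption: the sign is ignored when grading strength
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.3:  # assumption: boundary values count as moderate
        return "moderate"
    return "weak"


# Lag 0 coefficients copied from Table 1.
lag0_pairs = {
    ("Number of cases", "Fever"): 0.5320,
    ("Number of cases", "Cough"): 0.1288,
    ("Number of cases", "Flu"): 0.7571,
    ("Fever", "Cough"): 0.7501,
    ("Fever", "Flu"): 0.6822,
    ("Cough", "Flu"): 0.3345,
}

labels = {pair: correlation_strength(r) for pair, r in lag0_pairs.items()}
for pair, label in labels.items():
    print(" vs ".join(pair), lag0_pairs[pair], label)
```

Under this rule, r=0.3345 for “Cough” and “Flu” falls in the moderate band, and r=0.1288 for “Number of cases” and “Cough” is weak, in line with its non-significant p-value.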

Discussion

In this study we found a significant correlation between flu cases and Wikipedia search volume for flu and for fever (p<0.001 for both), but not for cough. In particular, the correlation is strong for flu and moderate for fever. Moreover, moderate to strong correlations are also found between keywords, which suggests that people who are interested in one of these keywords also read the other Wikipedia pages. In other words, there is a significant correlation between the keywords flu and fever and reported flu cases, and among keywords. This result remains consistent across the different time lags and is strongest at a time lag of 0. It supports the hypothesis that the spread of the infection is followed by increasing public interest in symptoms and disease, raising internet search volume. Although internet search volume, measured through different web pages or social media, has been widely assessed in other countries (mainly in the Americas) (16, 21), no previous study has assessed the phenomenon in Italy. This is important because one limit of this approach is that results might be biased by the characteristics of the population, cultural aspects, and the availability of electronic devices. For instance, McIver and Brownstein showed that high media attention on influenza raised search volume, which anticipated reported cases by up to 2 weeks (22). Sex and age are other important aspects that should be considered. Older people might be less prone to use smartphones or computers, or might be less experienced in searching for information on the internet. According to the National Institute of Statistics (Istituto Nazionale di Statistica, ISTAT), approximately 10% of people aged 60-70 years regularly use the internet (23); however, people over 65 represent a quarter of the total Italian population (24), making Italy one of the oldest European countries.
Women search the internet for health-related information more frequently than men; at the same time, different levels of financial deprivation, education and health literacy might also affect the results (25). However, the volume of information-seeking through Wikipedia can be considered a good proxy for general information-seeking behavior (26). This is due to the high accessibility, usability and perceived reliability of Wikipedia, reflected in the fact that Wikipedia often ranks highly in Google search results (27). Moreover, Wikipedia is a prominent health information resource, being one of the most frequently consulted websites for health information (27). Furthermore, Wikipedia is used not only by the public, but also for educational purposes (by both students and health professionals) and for research (27). Although this wide use of Wikipedia might help to better understand human behavior, it might also generate noise that makes the data harder to interpret (28). Wikipedia also has some limitations. First, the quality and accuracy of its content affect the spread of correct information among the general public and students. Second, Wikipedia is less flexible than other online resources such as Google Trends or Twitter: it only provides counts of page views, which limits regional or local interpretation of the data (29). However, since Wikipedia has language-specific pages and Italian is spoken predominantly in Italy, the analysis of Italian-language pages is more accurate than analyses of more international languages (for instance English or Spanish). The analysis presented in this paper might also be conducted using the web pages of (inter)national health institutions or local health units (30).
This might be relevant for public health workers, in order to better understand public interests and to measure how frequently the public interacts with published informative and educational materials (31). These data provide real-time feedback that is extremely useful for planning future health communication campaigns (32). In the current historical context, the public health workforce should make ever greater use of internet-based skills (both for communication and for research) in order to produce evidence-based data and practices (33).

Limitations

The study has some limitations. A spike in internet searches may be attributed to various factors: it may be due to an increased number of cases in the community, or to increased attention from the mass media. The established correlation may not help to identify the location of an outbreak, because Wikipedia does not provide data at that level. Moreover, changes in the Wikipedia interface over time and across regions are not well documented, which may affect the search output and our study findings. Thus, the interpretation and generalization of the findings call for caution.

Conclusion

To conclude, our results showed an association between the number of weekly views of the flu, cough and fever Wikipedia pages and the spread of influenza in Italy in the period October 2015 - May 2019. These results support the important role and usefulness of infosurveillance systems in public health, particularly because, by providing data on internet search volume, they can promptly inform about the spread of infectious diseases. Moreover, infosurveillance systems offer data in a timely and inexpensive manner.
References (10 of 25 listed)

1.  Digital epidemiology: assessment of measles infection through Google Trends mechanism in Italy.

Authors:  O E Santangelo; S Provenzano; D Piazza; D Giordano; G Calamusa; A Firenze
Journal:  Ann Ig       Date:  2019 Jul-Aug

2.  Predicting disease outbreaks: evaluating measles infection with Wikipedia Trends.

Authors:  Sandro Provenzano; Omar Enzo Santangelo; Domiziana Giordano; Enrico Alagna; Dario Piazza; Dario Genovese; Giuseppe Calamusa; Alberto Firenze
Journal:  Recenti Prog Med       Date:  2019-06

3.  Monitoring public interest toward pertussis outbreaks: an extensive Google Trends-based analysis.

Authors:  V Gianfredi; N L Bragazzi; M Mahamid; B Bisharat; N Mahroum; H Amital; M Adawi
Journal:  Public Health       Date:  2018-10-17       Impact factor: 2.427

4.  "PErCEIVE in Umbria": evaluation of anti-influenza vaccination's perception among Umbrian pharmacists.

Authors:  V Gianfredi; D Nucci; T Salvatori; F Orlacchio; M Villarini; M Moretti
Journal:  J Prev Med Hyg       Date:  2018-03-30

5.  [Communication in health.] (Review)

Authors:  Vincenza Gianfredi; Chiara Grisci; Daniele Nucci; Valeria Parisi; Massimo Moretti
Journal:  Recenti Prog Med       Date:  2018 Jul-Aug

6.  Statistics corner: A guide to appropriate use of correlation coefficient in medical research.

Authors:  M M Mukaka
Journal:  Malawi Med J       Date:  2012-09       Impact factor: 0.875

7.  Flu vaccination in elite athletes: A survey among Serie A soccer teams.

Authors:  Carlo Signorelli; Anna Odone; Alessia Miduri; Paola Cella; Cesira Pasquarella; Armando Gozzini; Pasquale Tamburrino; Enrico Castellacci
Journal:  Acta Biomed       Date:  2016-09-13

8.  Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis.

Authors:  J Danielle Sharpe; Richard S Hopkins; Robert L Cook; Catherine W Striley
Journal:  JMIR Public Health Surveill       Date:  2016-10-20

9.  Leadership in Public Health: Opportunities for Young Generations Within Scientific Associations and the Experience of the "Academy of Young Leaders".

Authors:  Vincenza Gianfredi; Federica Balzarini; Marco Gola; Sveva Mangano; Lucia Federica Carpagnano; Maria Eugenia Colucci; Leandro Gentile; Antonio Piscitelli; Filippo Quattrone; Stefania Scuri; Lorenzo Giovanni Mantovani; Francesco Auxilia; Silvana Castaldi; Stefano Capolongo; Gabriele Pelissero; Anna Odone; Carlo Signorelli
Journal:  Front Public Health       Date:  2019-12-17

10.  Situating Wikipedia as a health information resource in various contexts: A scoping review.

Authors:  Denise A Smith
Journal:  PLoS One       Date:  2020-02-18       Impact factor: 3.240
