Tobias Preis, Helen Susannah Moat.
Abstract
Seasonal influenza outbreaks and pandemics of new strains of the influenza virus affect humans around the globe. However, traditional systems for measuring the spread of flu infections deliver results with a delay of one or two weeks. Recent research suggests that data on queries made to the search engine Google can be used to address this problem, providing real-time estimates of levels of influenza-like illness in a population. Others have argued, however, that equally good estimates of current flu levels can be forecast using historic flu measurements. Here, we build dynamic 'nowcasting' models; in other words, forecasting models that estimate current levels of influenza, before the release of official data one week later. We find that when using Google Flu Trends data in combination with historic flu levels, the mean absolute error (MAE) of in-sample 'nowcasts' can be significantly reduced by 14.4%, compared with a baseline model that uses historic data on flu levels only. We further demonstrate that the MAE of out-of-sample nowcasts can also be significantly reduced by between 16.0% and 52.7%, depending on the length of the sliding training interval. We conclude that, using adaptive models, Google Flu Trends data can indeed be used to improve real-time influenza monitoring, even when official reports of flu infections are available with only one week's delay.
Keywords: complex systems; computational social science; data science
Year: 2014 PMID: 26064532 PMCID: PMC4448892 DOI: 10.1098/rsos.140095
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. Real-time estimates (‘nowcasting’) of the unweighted percentages of weekly outpatient visits for influenza-like illness (ILI) in the USA between 3 January 2010 and 21 September 2013. Nowcasting models are forecasting models that estimate current levels of influenza, before the release of official data one week later. (a) Out-of-sample nowcasts using ILI data from the previous week and Google search query data from the current week, for a sliding training window of Δt=16 weeks. (b) In-sample nowcast errors for the baseline model, using ILI data from the previous week only, and the advanced model, using ILI data from the previous week and Google search query data from the current week. (c) Out-of-sample nowcast errors for the baseline model and the advanced model for Δt=16 weeks.
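The comparison in Figure 1 can be illustrated with a minimal sketch. The record above does not give the exact model specification, so the code below assumes a plain ordinary least-squares regression as a stand-in: the baseline predicts this week's ILI from last week's ILI only, and the advanced model adds the current week's Google value. The function names (`fit_ols`, `sliding_nowcast`, `mae`) and the array arguments `ili` and `gft` are hypothetical, introduced only for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already has an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def sliding_nowcast(ili, gft, window=16, use_gft=True):
    """Out-of-sample nowcasts of ili[t] from ili[t-1], plus gft[t] if use_gft.

    Assumption: a linear model refit each week on the `window` preceding
    weeks only. Returns (nowcasts, targets) for t = window + 1 .. len(ili) - 1.
    """
    ili = np.asarray(ili, dtype=float)
    gft = np.asarray(gft, dtype=float)
    preds, targets = [], []
    for t in range(window + 1, len(ili)):
        train = np.arange(t - window, t)  # weeks whose ILI is already published
        cols = [np.ones(window), ili[train - 1]]  # intercept, previous-week ILI
        if use_gft:
            cols.append(gft[train])  # same-week search signal
        coef = fit_ols(np.column_stack(cols), ili[train])
        x_now = [1.0, ili[t - 1]] + ([gft[t]] if use_gft else [])
        preds.append(float(np.dot(coef, x_now)))
        targets.append(ili[t])
    return np.array(preds), np.array(targets)


def mae(pred, target):
    """Mean absolute error between nowcasts and the later official values."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))
```

Calling `sliding_nowcast(ili, gft, window=16, use_gft=False)` and again with `use_gft=True`, then comparing `mae` of the two results, mirrors the baseline-versus-advanced comparison in panel (c); varying `window` corresponds to changing the length of the sliding training interval Δt.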