Christoph Zimmer, Sequoia I Leuba, Reza Yaesoubi, Ted Cohen.
Abstract
Seasonal influenza causes millions of illnesses and tens of thousands of deaths per year in the USA alone. While the morbidity and mortality associated with influenza are substantial each year, the timing and magnitude of epidemics are highly variable, which complicates efforts to anticipate demands on the healthcare system. Better methods to forecast influenza activity would help policymakers anticipate such stressors. The US Centers for Disease Control and Prevention (CDC) has recognized the importance of improving influenza forecasting and hosts an annual challenge for predicting influenza-like illness (ILI) activity in the USA. The CDC data serve as the reference for ILI in the USA, but this information is aggregated by epidemiological week and reported after a one-week delay (and may be subject to correction even after this reporting lag). Therefore, there has been substantial interest in whether real-time Internet search data, such as data from Google, Twitter or Wikipedia, could be used to improve influenza forecasting. In this study, we combine a previously developed calibration and prediction framework with an established humidity-based transmission dynamic model to forecast influenza. We then compare predictions based on only CDC ILI data with predictions that leverage the earlier availability and finer temporal resolution of Wikipedia search data. We find that both the earlier availability and the finer temporal resolution are important for increasing forecasting performance. Using daily Wikipedia search data leads to a marked improvement in prediction performance compared to weekly data, especially for a three- to four-week forecasting horizon.
Keywords: Wikipedia; data resolution; forecasting; influenza; transmission dynamics
Year: 2018 PMID: 30305417 PMCID: PMC6228485 DOI: 10.1098/rsif.2018.0220
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. Wikipedia search data provide good fits to CDC ILI data. The CDC ILI data from the season 2008/2009 are in black crosses, estimates obtained from Wikipedia searches with weekly aggregation are in dark grey and estimates obtained from Wikipedia searches in daily resolution are in light grey.
Figure 2. Workflow steps.
Figure 3. Using Wikipedia data leads to a gain in prediction of influenza. This figure summarizes influenza predictions over different years depicted in electronic supplementary material, figure A4. The following relations are significant (p-value < 0.05) based on the Wilcoxon signed-rank test: Weekly Wikipedia without nowcasting (green) is worse than the CDC ILI baseline for one-week forecasts. Weekly (blue) and daily (red) Wikipedia with nowcasts are always better than the CDC ILI baseline and the weekly Wikipedia without nowcasting (green). Daily Wikipedia (red) is better than weekly Wikipedia (blue) for three- and four-week forecasts. (Online version in colour.)
Figure 4. Using daily Wikipedia data versus weekly Wikipedia data reduces the inter-quantile distance of % ILI forecasts. For all seven seasons, using daily versus weekly Wikipedia data reduces the inter-quantile distance for two-, three- and four-week forecasts. The reduction using daily Wikipedia data was significant (p-value < 0.05) for two-, three- and four-week forecasting targets based on the Wilcoxon signed-rank test. (Online version in colour.)
Figure 5. Using daily instead of weekly Wikipedia data reduces parameter uncertainty. Relative reduction in inter-quantile distance (5%- to 95%-quantile) for estimates of the effective reproductive number, Reff, over the seven seasons indicates using daily versus weekly Wikipedia data reduces parameter uncertainty.
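The relative reduction in inter-quantile distance mentioned in the captions for figures 4 and 5 can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`quantile`, `interquantile_distance`, `relative_reduction`) are hypothetical, the quantile rule (linear interpolation between order statistics) is an assumption, and the sample values are made up purely for illustration.

```python
from typing import Sequence


def quantile(samples: Sequence[float], q: float) -> float:
    """Empirical quantile using linear interpolation between order statistics.

    The interpolation rule is an assumption; the source does not say which
    quantile definition was used.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)          # fractional index into sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def interquantile_distance(samples: Sequence[float],
                           lo: float = 0.05, hi: float = 0.95) -> float:
    """Width of the interval between the lo- and hi-quantiles (5% to 95% by default)."""
    return quantile(samples, hi) - quantile(samples, lo)


def relative_reduction(weekly_samples: Sequence[float],
                       daily_samples: Sequence[float]) -> float:
    """Fraction by which the daily-data interval is narrower than the weekly one."""
    weekly = interquantile_distance(weekly_samples)
    daily = interquantile_distance(daily_samples)
    if weekly == 0:
        raise ValueError("weekly inter-quantile distance is zero")
    return (weekly - daily) / weekly


# Made-up samples standing in for forecast or parameter draws.
weekly_draws = [float(x) for x in range(0, 101)]   # wider spread
daily_draws = [float(x) for x in range(0, 51)]     # narrower spread
reduction = relative_reduction(weekly_draws, daily_draws)
```

A positive value of `reduction` means the daily-data samples are more tightly concentrated than the weekly-data samples; a negative value would mean the opposite.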