Donald R Olson, Kevin J Konty, Marc Paladini, Cecile Viboud, Lone Simonsen.
Abstract
The goal of influenza-like illness (ILI) surveillance is to determine the timing, location and magnitude of outbreaks by monitoring the frequency and progression of clinical case incidence. Advances in computational and information technology have allowed for automated collection of higher volumes of electronic data and more timely analyses than previously possible. Novel surveillance systems, including those based on internet search query data like Google Flu Trends (GFT), are being used as surrogates for clinically-based reporting of influenza-like-illness (ILI). We investigated the reliability of GFT during the last decade (2003 to 2013), and compared weekly public health surveillance with search query data to characterize the timing and intensity of seasonal and pandemic influenza at the national (United States), regional (Mid-Atlantic) and local (New York City) levels. We identified substantial flaws in the original and updated GFT models at all three geographic scales, including completely missing the first wave of the 2009 influenza A/H1N1 pandemic, and greatly overestimating the intensity of the A/H3N2 epidemic during the 2012/2013 season. These results were obtained for both the original (2008) and the updated (2009) GFT algorithms. The performance of both models was problematic, perhaps because of changes in internet search behavior and differences in the seasonality, geographical heterogeneity and age-distribution of the epidemics between the periods of GFT model-fitting and prospective use. We conclude that GFT data may not provide reliable surveillance for seasonal or pandemic influenza and should be interpreted with caution until the algorithm can be improved and evaluated. Current internet search query data are no substitute for timely local clinical and laboratory surveillance, or national surveillance based on local data collection. 
New generation surveillance systems such as GFT should incorporate the use of near-real time electronic health data and computational methods for continued model-fitting and ongoing evaluation and improvement.
Year: 2013 PMID: 24146603 PMCID: PMC3798275 DOI: 10.1371/journal.pcbi.1003256
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Retrospective and prospective performance of original and updated Google Flu Trends (GFT) algorithm compared with national (United States), regional (Mid-Atlantic States) and local (New York City) weekly influenza-like illness (ILI) surveillance data, 2003–2013.
| Time Period and Geographic Location | Original GFT model | Updated GFT model |
| | R² | R² |
| United States | | |
| Retrospective GFT model-fitting period | 0.91 | 0.94 |
| Prospective GFT model period | 0.64 | 0.73 |
| All study weeks | 0.86 | 0.77 |
| Mid-Atlantic States | | |
| Retrospective GFT model-fitting period | 0.79 | 0.77 |
| Prospective GFT model period | 0.27 | 0.57 |
| All study weeks | 0.64 | 0.64 |
| New York City | | |
| Retrospective GFT model-fitting period | 0.89 | 0.51 |
| Prospective GFT model period | 0.03 | 0.77 |
| All study weeks | 0.34 | 0.41 |
Performance was evaluated by linear regression of weekly GFT estimates against weekly ILI surveillance.
Original GFT model time periods: The retrospective query selection model-fitting period was from September 28, 2003 through March 17, 2007; the prospective GFT model validation period was from March 18, 2007 through May 17, 2008 and ongoing operation was from May 18, 2008 through Aug 1, 2009. Mid-Atlantic region states included NJ, NY and PA (13). New York comparison was based on NY state GFT estimates (16).
Updated GFT model time periods: The retrospective query selection model-fitting period was from September 28, 2003 through September 18, 2009; the prospective operation period has run from September 19, 2009 through March 30, 2013. Mid-Atlantic region states included only NJ and NY (14). The New York level comparison was based on New York City GFT estimates (16).
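The R² values in the table come from a linear regression of weekly GFT estimates against weekly ILI surveillance. A minimal sketch of that calculation follows, assuming two equal-length weekly series. The function name `r_squared` and its arguments are illustrative and are not taken from the paper. For a regression with a single predictor, R² equals the squared Pearson correlation, which is what the code computes.

```python
from statistics import mean


def r_squared(gft, ili):
    """R^2 of the simple linear regression between two weekly series.

    With one predictor, R^2 is the squared Pearson correlation, so the
    value is the same whichever series is treated as the response.
    """
    if len(gft) != len(ili) or len(gft) < 2:
        raise ValueError("need two equal-length series of at least 2 weeks")
    mean_gft, mean_ili = mean(gft), mean(ili)
    sxx = sum((x - mean_gft) ** 2 for x in gft)
    syy = sum((y - mean_ili) ** 2 for y in ili)
    sxy = sum((x - mean_gft) * (y - mean_ili) for x, y in zip(gft, ili))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)
```

Applying the function separately to the model-fitting weeks, the prospective weeks, and all study weeks would give the three rows reported for each geographic level.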
Figure 1. Time-series of weekly influenza-like illness (ILI) surveillance and Google Flu Trends (GFT) search query estimates, June 2003–March 2013.
Observed weekly ILI proportions (black lines) are shown with Serfling model baseline (gray lines) and 95% epidemic threshold (dashed lines). The periods of the early wave of the 2009 pandemic and the 2012/2013 epidemic are shaded in grey. Sentinel ILI-Net surveillance is shown for (A) the United States and (B) Mid-Atlantic States (New Jersey, New York, Pennsylvania). Local ILI surveillance from emergency department visits is shown for (C) New York City. Scaled GFT internet search query estimates are shown for model-fitting periods for the original (thin red line) and updated (thin blue line) GFT models, and for the periods of prospective operation of the original (thick red line) and updated (thick blue line) GFT models. For Mid-Atlantic States the updated GFT model data represents ILI proportions only for New Jersey and New York (see Supporting Information).
Comparison of seasonal and epidemic week of onset and peak weeks as measured by Google Flu Trends (GFT) and public health influenza-like illness (ILI) surveillance data at the national (United States), regional (Mid-Atlantic) and local (New York City) levels.
| Time Period | National, United States | | | Regional, Mid-Atlantic States | | | Local, New York City | | |
| | Week of Onset (Peak) ILI Surveillance | Difference in Week of Onset (Peak) Original GFT model | Difference in Week of Onset (Peak) Updated GFT model | Week of Onset (Peak) ILI Surveillance | Difference in Week of Onset (Peak) Original GFT model | Difference in Week of Onset (Peak) Updated GFT model | Week of Onset (Peak) ILI Surveillance | Difference in Week of Onset (Peak) Original GFT model | Difference in Week of Onset (Peak) Updated GFT model |
| 2003/2004 season | 44 (52) | +3 (−2) | +3 (0) | 48 (52) | −1 (−1) | 0 (0) | 46 (52) | +1 (−1) | +1 (0) |
| 2004/2005 season | 51 (6) | 0 (0) | 0 (+1) | 49 (51/6) | +1 (+2/0) | +1 (+1/+1) | 47 (52) | +3 (+1) | +3 (+1) |
| 2005/2006 season | 49 (52/9) | +2 (0/0) | +2 (0/0) | 48 (52/6) | +4 (+1/+3) | +4 (0/+3) | 3 (6) | −2 (+3) | −3 (+1) |
| 2006/2007 season | 50 (52/7) | +1 (0/−1) | +1 (0/0) | 47 (52/7) | +4 (+1/+2) | +5 (+1/+2) | 47 (8) | +4 (+1) | +11 (0) |
| 2007/2008 season | 52 (7) | +1 (+1) | +3 (+1) | 4 (7) | −3 (+1) | −3 (+1) | 44 (7) | +9 (+1) | +9 (+1) |
| 2008/2009 season | 4 (6) | −1 (+2) | 0 (+1) | 4 (8) | 0 (−2) | −3 (−2) | 3 (7) | −2 (−1) | −2 (0) |
| Spring 2009 pandemic A/H1N1 | 17 (17) | | 0 (0) | 17 (21) | | 0 (+2) | 17 (21) | +3 (−1) | 0 (0) |
| 2009/2010 pandemic season | | NA | | | NA | | 34 (47) | NA | +1 (−3) |
| 2010/2011 season | 50 (5) | NA | +1 (+2) | 48 (52/6) | NA | +3 (+1/+1) | 46 (52) | NA | +4 (+7) |
| 2011/2012 season | 8 (11) | NA | −8 (−1) | | NA | | | NA | |
| 2012/2013 season | 47 (52) | NA | −8 (+3) | 48 (52) | NA | −9 (+3) | 49 (3) | NA | −11 (0) |
Week of onset was identified as the first of consecutive weeks for each system and region above its Serfling regression 95% threshold, and peaks were identified as the weeks reporting the highest percent-ILI for each season or epidemic. The public health ILI onset and peak weeks are given by surveillance week for each season. The GFT model onset and peak weeks are given relative to the corresponding season/epidemic and regional ILI surveillance weeks.
Original GFT model time periods: The retrospective query selection model-fitting period was from September 28, 2003 through March 17, 2007; the prospective GFT model validation period was from March 18, 2007 through May 17, 2008 and ongoing operation was from May 18, 2008 through Aug 1, 2009. Mid-Atlantic region states included NJ, NY and PA (13). New York comparison was based on NY state GFT estimates (16).
Updated GFT model time periods: The retrospective query selection model-fitting period was from September 28, 2003 through September 18, 2009; the prospective operation period has run from September 19, 2009 through March 30, 2013. Mid-Atlantic region states included only NJ and NY (14). The New York level comparison was based on New York City GFT estimates (16).
National and Mid-Atlantic region data remained above threshold at the beginning of the 2009/2010 pandemic season.
No consecutive weeks above threshold to identify onset or peak during this period.
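The onset and peak rule described in the table notes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The function name `onset_and_peak` is hypothetical, and the minimum run length `min_run=2` is an assumption, because the notes say only "consecutive weeks". The inputs are one season of weekly percent-ILI and the matching Serfling 95% threshold for each week.

```python
def onset_and_peak(ili, threshold, min_run=2):
    """Return (onset_index, peak_index) for one season of weekly percent-ILI.

    onset_index is the first week of the first run of at least `min_run`
    consecutive weeks above the threshold, or None when no such run exists.
    peak_index is the week with the highest percent-ILI (earliest on ties).
    """
    if len(ili) != len(threshold) or not ili:
        raise ValueError("need two non-empty series of equal length")
    onset = None
    run = 0
    for week, (observed, limit) in enumerate(zip(ili, threshold)):
        # Extend the current run while above threshold, otherwise reset it.
        run = run + 1 if observed > limit else 0
        if run == min_run:
            onset = week - min_run + 1
            break
    peak = max(range(len(ili)), key=lambda week: ili[week])
    return onset, peak
```

A return value of `None` for the onset corresponds to the case in the table notes where no consecutive weeks rise above threshold, so no onset can be identified for that period.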
Comparison of epidemic intensity during the 2009 A/H1N1 influenza pandemic and the 2012/2013 seasonal A/H3N2 epidemic as measured by Google Flu Trends (GFT) and public health influenza-like illness (ILI) surveillance at the national (United States), regional (Mid-Atlantic) and local (New York City) levels.
| | Epidemic peak | | | Epidemic intensity as percent over baseline | | | Comparison GFT to ILI surveillance |
| Time Period and Geographic Location | ILI% at peak week | | | seasonal excess (95% CI) | | | ratio excess GFT:ILI |
| | ILI surveillance | original GFT model | updated GFT model | ILI surveillance | original GFT model | updated GFT model | prospective (retrospective) |
| United States | | | | | | | |
| Spring 2009 pandemic A/H1N1 | 2.7 | 1.5 | 2.1 | 10.3 (6.1–14.5) | 0.3 (0.1–0.6) | 9.7 (5.5–13.9) | 0.03 (0.94) |
| Autumn 2009 pandemic A/H1N1 | 7.7 | NA | 7.1 | 59.2 (51.8–66.5) | NA | 43.8 (37.9–49.8) | 0.74 |
| 2009 pandemic A/H1N1, both waves | 7.7 | NA | 7.1 | 69.4 (57.9–81.0) | NA | 53.5 (43.4–63.7) | 0.77 |
| 2012/2013 seasonal A/H3N2 | 6.1 | NA | 10.6 | 27.3 (21.7–32.9) | NA | 73.2 (63.7–82.6) | 2.68 |
| Mid-Atlantic States | | | | | | | |
| Spring 2009 pandemic A/H1N1 | 4.9 | 1.4 | 3.2 | 27.2 (21.9–32.5) | 0.6 (0.03–1.1) | 19.2 (15.4–23.0) | 0.02 (0.71) |
| Autumn 2009 pandemic A/H1N1 | 8.3 | NA | 7 | 52.1 (42.8–61.3) | NA | 40.2 (33.5–46.9) | 0.77 |
| 2009 pandemic A/H1N1, both waves | 8.3 | NA | 7.1 | 79.3 (64.7–93.8) | NA | 59.4 (48.9–70.0) | 0.75 |
| 2012/2013 seasonal A/H3N2 | 5.7 | NA | 13 | 34.3 (27.3–41.4) | NA | 71.4 (65.9–76.8) | 2.08 |
| New York City | | | | | | | |
| Spring 2009 pandemic A/H1N1 | 14.3 | 1.4 | 3.1 | 55.5 (52.2–58.8) | 1.3 (0.4–2.1) | 15.4 (10.9–19.8) | 0.02 (0.28) |
| Autumn 2009 pandemic A/H1N1 | 4.5 | NA | 4.4 | 26.5 (19.0–34.0) | NA | 24.3 (18.8–29.9) | 0.92 |
| 2009 pandemic A/H1N1, both waves | 14.3 | NA | 4.4 | 82.0 (71.2–92.8) | NA | 39.7 (29.7–49.7) | 0.48 |
| 2012/2013 seasonal A/H3N2 | 5.9 | NA | 12.7 | 26.3 (21.2–31.42) | NA | 77.9 (68.2–87.5) | 2.96 |
Epidemic intensity was measured by Serfling regression of weekly percent-ILI for public health surveillance data and GFT estimates for peak week and seasonal epidemic excess, with corresponding upper and lower 95% limit, calculated as the predicted non-epidemic baseline +1.96 standard deviations.
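The exact Serfling specification is not given in this record, so the sketch below uses a common form as an assumption: an intercept, a linear trend, and one annual sine and cosine pair, fitted by least squares to weeks flagged as non-epidemic. The threshold follows the note above (baseline plus 1.96 residual standard deviations). All names (`fit_serfling`, `seasonal_excess`, `PERIOD_WEEKS`) are illustrative, and summing observed minus baseline over the epidemic weeks is only one plausible reading of "seasonal excess".

```python
import math

PERIOD_WEEKS = 52.18  # assumed length of the annual cycle in weeks
N_TERMS = 4           # intercept, linear trend, cosine, sine


def design_row(t):
    """Regressors for week index t: intercept, trend, annual harmonic."""
    angle = 2.0 * math.pi * t / PERIOD_WEEKS
    return [1.0, float(t), math.cos(angle), math.sin(angle)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough baseline weeks")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_serfling(ili, is_baseline):
    """Fit on non-epidemic weeks; return (baseline, threshold) for all weeks."""
    rows = [design_row(t) for t, keep in enumerate(is_baseline) if keep]
    ys = [y for y, keep in zip(ili, is_baseline) if keep]
    if len(ys) <= N_TERMS:
        raise ValueError("need more baseline weeks than model terms")
    # Normal equations (X'X) beta = X'y for the least-squares fit.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(N_TERMS)]
           for i in range(N_TERMS)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(N_TERMS)]
    beta = solve_linear(xtx, xty)

    def predict(t):
        return sum(b * v for b, v in zip(beta, design_row(t)))

    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sd = math.sqrt(sse / (len(ys) - N_TERMS))
    baseline = [predict(t) for t in range(len(ili))]
    threshold = [b + 1.96 * sd for b in baseline]
    return baseline, threshold


def seasonal_excess(ili, baseline, weeks):
    """Sum of observed minus baseline percent-ILI over the given weeks."""
    return sum(ili[t] - baseline[t] for t in weeks)
```

Once an excess has been computed for both systems, the ratio column of the table is the GFT excess divided by the ILI excess. For the national 2012/2013 row, for example, 73.2 / 27.3 ≈ 2.68.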
Performance of Google Flu Trends (GFT) relative to public health influenza-like illness (ILI) surveillance at the national (United States), regional (Mid-Atlantic States) and local (New York City) levels for specific epidemic and pandemic seasons.
| Time Period and Geographic Location | Original GFT model | Updated GFT model |
| | R² | R² (+/− week lag, max R²) |
| United States | | |
| Influenza seasons 2003–2009 (prior to 2009 pandemic) | 0.88 | 0.92 |
| 2009 pandemic A/H1N1 early wave | 0.91 | 0.84 |
| 2009/2010 pandemic A/H1N1 season | NA | 0.98 |
| 2010/2011 season | NA | 0.95 |
| 2011/2012 season | NA | 0.88 |
| 2012/2013 season | NA | 0.90 |
| Mid-Atlantic States | | |
| Influenza seasons 2003–2009 (prior to 2009 pandemic) | 0.75 | 0.77 |
| 2009 pandemic A/H1N1 early wave | 0.51 | 0.82 |
| 2009/2010 pandemic A/H1N1 season | NA | 0.92 |
| 2010/2011 season | NA | 0.83 |
| 2011/2012 season | NA | 0.37 |
| 2012/2013 season | NA | 0.86 |
| New York City | | |
| Influenza seasons 2003–2009 (prior to 2009 pandemic) | 0.87 | 0.84 |
| 2009 pandemic A/H1N1 early wave | 0.78 | 0.88 |
| 2009/2010 pandemic A/H1N1 season | NA | 0.51 (−3 wks, 0.89) |
| 2010/2011 season | NA | 0.74 (+1 wk, 0.80) |
| 2011/2012 season | NA | 0.80 |
| 2012/2013 season | NA | 0.94 |
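Some entries in the table report a week lag together with the maximum R² found at that lag. One way such a search could be done is sketched below, as an assumption-laden illustration. The function name `best_lag`, the search window `max_lag=3`, and the sign convention (a positive lag pairs the GFT value from `lag` weeks later with each ILI week) are not taken from the paper.

```python
from statistics import mean


def _r2(xs, ys):
    """Squared Pearson correlation; returns 0.0 when a series is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # R^2 is undefined here; treated as no fit
    return (sxy * sxy) / (sxx * syy)


def best_lag(gft, ili, max_lag=3):
    """Return (lag, r2) with the highest R^2 over lags -max_lag..max_lag.

    For each lag, ili[t] is paired with gft[t + lag]; weeks whose shifted
    index falls outside the series are dropped.
    """
    if len(gft) != len(ili):
        raise ValueError("series must have equal length")
    n = len(ili)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(gft[t + lag], ili[t]) for t in range(n) if 0 <= t + lag < n]
        if len(pairs) < 3:
            continue  # too few overlapping weeks to fit a line
        xs, ys = zip(*pairs)
        score = _r2(xs, ys)
        if best is None or score > best[1]:
            best = (lag, score)
    return best
```

A lag of 0 reproduces the unshifted R², so the bracketed figures in the table can be read as the improvement available from shifting one series by a few weeks.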
Figure 2. Scatter plots of weekly excess influenza-like illness (ILI) visit proportions against original Google Flu Trends (GFT) model search query estimates, 2003–2009.
Weekly excess percent-ILI is calculated as Serfling estimates subtracted from observed proportions. Plots show original GFT model estimates compared with weighted CDC ILI-Net data for (A) the United States, and (B) Mid-Atlantic Census Region States (New Jersey, New York, Pennsylvania), and local ILI surveillance from emergency department visits for (C) New York City. Plots are shown for pre-pandemic influenza seasons, June 1, 2003 to April 25, 2009 (grey circles) and the early wave of the A/H1N1 pandemic, April 26 to August 1, 2009 (red diamonds). Lines representing equivalent axes for X = Y are shown (grey dashed line). Regression lines are shown for seasonal influenza 2003–2009 (black line) and the early 2009 wave of the pandemic (red line).
Figure 3. Scatter plots of weekly excess influenza-like illness (ILI) visit proportions against updated Google Flu Trends (GFT) model search query estimates, 2003–2013.
Weekly excess percent-ILI is calculated as Serfling estimates subtracted from observed proportions. Plots show updated GFT model estimates compared with weighted CDC ILI-Net data for (A) the United States, and (B) Mid-Atlantic HHS-2 Region States (New Jersey, New York), and local ILI surveillance from emergency department ILI visit data for (C) New York City. Plots are shown for weeks June 1, 2003 to April 25, 2009 (grey circles), April 26, 2009 to January 2, 2010 (red diamonds), January 3, 2010 to October 6, 2012 (grey squares), and October 7, 2012 to March 30, 2013 (blue triangles). Lines representing equivalent axes for X = Y are shown (grey dashed line). Regression lines are shown for the 2003/2004–2008/2009 seasons (black line), 2009 pandemic (red line), 2010/2011–2011/2012 seasons (grey solid line) and the 2012/2013 season (blue line).