Moritz Wagner, Vasileios Lampos, Ingemar J Cox, Richard Pebody.
Abstract
There has been considerable work in evaluating the efficacy of using online data for health surveillance. Often comparisons with baseline data involve various squared error and correlation metrics. While useful, these overlook a variety of other factors important to public health bodies considering the adoption of such methods. In this paper, a proposed surveillance system that incorporates models based on recent research efforts is evaluated in terms of its added value for influenza surveillance at Public Health England. The system comprises two supervised learning approaches trained on influenza-like illness (ILI) rates provided by the Royal College of General Practitioners (RCGP) and produces ILI estimates using Twitter posts or Google search queries. RCGP ILI rates for different age groups and laboratory-confirmed cases by influenza type are used to evaluate the models, with a particular focus on predicting the onset, overall intensity, peak activity and duration of the 2015/16 influenza season. We show that the Twitter-based models perform poorly and hypothesise that this is mostly due to the sparsity of the data available and a limited training period. Conversely, the Google-based model provides accurate estimates with timeliness of approximately one week and has the potential to complement current surveillance systems.
Year: 2018 PMID: 30228285 PMCID: PMC6143510 DOI: 10.1038/s41598-018-32029-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Measures for the performance of the Twitter supervised models (national and subnational) and the Google model (national) in estimating the overall intensity, onset and peaks of the 2015/16 influenza season when compared to RCGP ILI rates (national and subnational).
| Measure | London Twitter (subnational) | Midlands and East Twitter (subnational) | North Twitter (subnational) | South Twitter (subnational) | England Twitter (national) | England Google (national) |
|---|---|---|---|---|---|---|
| Overall intensity | | | | | | |
| r | 0.37 | 0.37 | 0.40 | 0.31 | 0.67 | 0.96 |
| MSE | 69.46 | 70.06 | 71.72 | 131.5 | 36.57 | 3.86 |
| RMSE | 8.33 | 8.37 | 8.47 | 11.47 | 6.05 | 1.96 |
| MAE | 6.28 | 5.95 | 6.13 | 7.83 | 4.27 | 1.47 |
| MAPE | 39.10% | 45.55% | 40.85% | 45.31% | 29.95% | 14.10% |
| ME | −2.58 | −5.43 | −5.13 | −6.99 | −2.17 | 0.54 |
| Max Error | 18.05 | 19.95 | 23.42 | 32.81 | 13.88 | 4.32 |
| Week Max Error | 47 | 13 | 11 | 12 | 12 | 52 |
| Max Percentage Error | 214.88% | 84.53% | 83.05% | 207.50% | 75.72% | 66.12% |
| Week of Max Percentage Error | 47 | 13 | 11 | 46 | 47 | 52 |
| Onset | | | | | | |
| Alert week | 1 | 47 | 51 | 3 | 2 | 1 |
| Time difference | 2 | −2 | 0 | −1 | −1 | 0 |
| Peaks | | | | | | |
| Magnitude of 1st peak-to-peak difference | −0.55 (2.04%) | −3.57 (17.76%) | 3.93 (18.36%) | −2.92 (10.25%) | 1.32 (6.03%) | 0.15 (0.68%) |
| Temporal offset of 1st peaks | 12 | 2 | 4 | 2 | 4 | 2 |
| Magnitude of 1st peak-to-model difference (same week as RCGP estimate) | −4.79 (17.74%) | −15.63 (77.76%) | −13.35 (62.38%) | −18.82 (66.04%) | −2.12 (9.68%) | −0.31 (1.42%) |
| Magnitude of 2nd peak-to-peak difference | 6.05 (21.61%) | −11.86 (48.61%) | −7.81 (27.70%) | −25.24 (66.07%) | −4.28 (14.91%) | −3.16 (11.01%) |
| Temporal offset of 2nd peaks | 6 | 5 | 4 | 2 | 4 | 1 |
| Magnitude of 2nd peak-to-model difference (same week as RCGP estimate) | −15.03 (53.68%) | −19.81 (81.19%) | −23.42 (83.05%) | −32.81 (85.89%) | −11.87 (41.36%) | −3.86 (13.45%) |
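The overall-intensity measures in the table above (r, MSE, RMSE, MAE, MAPE, ME and maximum error) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code. The function name `error_metrics`, the sign convention for the error (model estimate minus RCGP rate) and the use of a list position in place of a calendar week are all assumptions.

```python
import math


def error_metrics(reference, estimate):
    """Compare weekly model estimates with reference ILI rates.

    Assumes both series are aligned week by week, that the reference
    contains no zeros (needed for MAPE) and that neither series is
    constant (needed for the Pearson correlation).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    n = len(reference)

    # Assumed sign convention: error = model estimate - reference rate.
    errors = [e - r for r, e in zip(reference, estimate)]
    abs_errors = [abs(d) for d in errors]
    mse = sum(d * d for d in errors) / n

    # Pearson correlation computed from its definition.
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)

    max_error = max(abs_errors)
    return {
        "r": cov / math.sqrt(var_r * var_e),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_errors) / n,
        "MAPE": 100 * sum(a / abs(r) for a, r in zip(abs_errors, reference)) / n,
        "ME": sum(errors) / n,
        "MaxError": max_error,
        # Position (0-based) of the largest absolute error in the series.
        "MaxErrorIndex": abs_errors.index(max_error),
    }
```

Under this convention a negative ME means the model underestimates the reference on average. Because RMSE is the square root of MSE, the two rows of the table can be cross-checked against each other.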
Figure 1. RCGP ILI estimates with overlaid national Twitter (blue) and Google (green) supervised model ILI estimates by week number during the 2015/16 influenza season. Thresholds were calculated using the Moving Epidemic Method based on national RCGP ILI estimates of the previous 6 influenza seasons [25].
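One way to read the "Alert week" and "Time difference" rows is as a threshold crossing: the first week in which a series exceeds a given epidemic threshold, and the gap in weeks between the model's crossing and the RCGP crossing. The sketch below assumes that reading, which the text here does not spell out. It takes the threshold as an input and does not implement the Moving Epidemic Method. The function names and the sign convention (model minus reference) are hypothetical.

```python
def onset_index(rates, threshold):
    """Return the 0-based position of the first rate above the threshold, or None."""
    for i, rate in enumerate(rates):
        if rate > threshold:
            return i
    return None


def onset_time_difference(reference, estimate, threshold):
    """Weeks between the model's onset and the reference onset.

    Positions are used instead of week numbers so that a season
    running from week 40 through week 20 of the next year is handled
    without wrap-around. Assumed sign: positive means the model
    crosses the threshold later than the reference.
    """
    ref_i = onset_index(reference, threshold)
    est_i = onset_index(estimate, threshold)
    if ref_i is None or est_i is None:
        return None
    return est_i - ref_i
```

Both series must start in the same week for the difference to be meaningful.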
Pearson correlations between national supervised Twitter and Google models and Data Mart laboratory confirmed cases by influenza type during the 2015/16 influenza season.
| Model | Influenza Type A | Influenza Type B | All Data Mart laboratory confirmed cases |
|---|---|---|---|
| England Twitter | 0.83 | 0.24 | 0.74 |
| England Google | 0.82 | 0.68 | 0.95 |
Pearson correlations between national supervised Twitter and Google models and RCGP ILI rates by age during the 2015/16 influenza season.
| Age (years) | <1 | 1–4 | 5–14 | 15–44 | 45–64 | 65–74 | 75+ |
|---|---|---|---|---|---|---|---|
| England Twitter | 0.20 | 0.42 | 0.53 | 0.67 | 0.68 | 0.39 | 0.22 |
| England Google | 0.53 | 0.72 | 0.79 | 0.96 | 0.91 | 0.70 | 0.45 |
Figure 2. Absolute errors between RCGP data and the national Twitter (blue) and Google (green) supervised model ILI estimates by week number during the 2015/16 influenza season including their 3 day moving averages.
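The quantities plotted in Figure 2 can be sketched as a per-week absolute error followed by a moving average. The caption does not say whether the average is trailing or centred, so the trailing form used here is an assumption, as are the function names. The window length is left as a parameter.

```python
def absolute_errors(reference, estimate):
    """Per-week absolute difference between the model and the reference."""
    if len(reference) != len(estimate):
        raise ValueError("series must be of equal length")
    return [abs(e - r) for r, e in zip(reference, estimate)]


def moving_average(values, window=3):
    """Trailing moving average over `window` consecutive points.

    Positions that do not yet have a full window are returned as None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the point that has just left the window.
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed
```

Smoothing the absolute errors makes sustained periods of poor fit easier to see than isolated weekly spikes.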