Mehnaz Adnan1, Xiaoying Gao2, Xiaohan Bai2, Elizabeth Newbern1, Jill Sherwood1, Nicholas Jones3, Michael Baker4, Tim Wood1, Wei Gao2.
Abstract
BACKGROUND: Over one-third of the population of Havelock North, New Zealand, approximately 5500 people, were estimated to have been affected by campylobacteriosis in a large waterborne outbreak. Cases reported through the notifiable disease surveillance system (notified case reports) are inevitably delayed by several days, resulting in slowed outbreak recognition and delayed control measures. Early outbreak detection and magnitude prediction are critical to outbreak control. It is therefore important to consider alternative surveillance data sources and evaluate their potential for recognizing outbreaks at the earliest possible time.
Keywords: Campylobacter; disease outbreaks; forecasting; spatio-temporal analysis
Year: 2020 PMID: 32940617 PMCID: PMC7530686 DOI: 10.2196/18281
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Description of data sources used in analysis.
| Source | Fields of interest | Data level used in analysis | Counts | References |
| Notified case count (New Zealand surveillance database EpiSurv) | Date of onset, testing, and notification for confirmed and probable cases of campylobacteriosis | Aggregated by notification date and city of residence in Hawkes Bay | 1345 | Ministry of Health New Zealand [ |
| General practice consultations (HealthStat) | Visits for gastrointestinal complaints | Individual with visit date, age, and sex, for entire Hawkes Bay District Health Board area only | 772 | Cumming J and Gribben B [ |
| Consumer helpline (HealthLine) calls | Consumer calls concerning gastrointestinal complaints | Individual with call date, age, sex, and residential city in Hawkes Bay | 1196 | St George IM and Cullen MJ [ |
| Google Trends | User queries with keywords for gastrointestinal complaints | Normalized counts aggregated by date, query keyword, and Google Trends normalized count for entire Hawkes Bay District Health Board area only | Not applicable | Google Trends [ |
| Twitter microblogs (from Gnip Historical PowerTrack service) | Tweets with keywords for gastrointestinal complaints | Individual tweets geocoded to cities in Hawkes Bay | 191 | Gnip [ |
| School absenteeism records (from individual schools) | Absence owing to illness or any valid reason | Aggregated by schools for the 5 schools providing data, areas represented: Havelock North, Napier, and Hastings | 23,836 | Ministry of Education, New Zealand [ |
Correlation and lagged transformed correlation of alternative predictors with notified case counts of campylobacteriosis.
Columns give the number of days that alternative measures are lagged before notifiable counts.
| Data source | 0 days | −1 day | −2 days | −3 days | −4 days | −5 days | −6 days | −7 days | −8 days | −9 days | −10 days |
| GPa consultations | 0.5b | 0.43b | 0.39b | 0.26b | 0.17b | 0.14b | 0.09 | 0.05 | 0.04 | 0.01 | 0.01 |
| Consumer helpline | 0.44b | 0.59b | 0.67b | 0.64b | 0.55b | 0.37b | 0.2b | 0.12b | 0.1 | 0.07 | 0.07 |
| Google Trends | 0.13b | 0.16b | 0.22b | 0.22b | 0.21b | 0.17b | 0.21b | 0.21b | 0.16b | 0.08 | 0.02 |
| Twitter microblogs | 0.11b | 0.21b | 0.31b | 0.25b | 0.21b | 0.07 | 0 | −0.01 | 0 | −0.03 | 0 |
| School absenteeism | 0.3b | 0.48b | 0.64b | 0.7b | 0.52b | 0.35b | 0.21b | 0.2b | 0.17b | 0.18b | 0.15b |
aGP: general practice.
bStatistically significant correlation coefficient >0.1.
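The lagged correlations in the table above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `pearson` and `lagged_correlations` and the toy series are assumptions, both series are assumed to be daily and of equal length, and significance testing is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def lagged_correlations(alternative, notified, max_lag=10):
    """Correlate alternative[t - lag] with notified[t] for lag = 0..max_lag days."""
    result = {}
    for lag in range(max_lag + 1):
        earlier = alternative[: len(alternative) - lag]  # alternative source, lag days earlier
        later = notified[lag:]                           # notified counts on day t
        result[lag] = pearson(earlier, later)
    return result


# Toy data (hypothetical): notified counts echo the alternative signal 3 days later.
alternative = [0, 0, 1, 5, 9, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0]
notified = [0, 0, 0] + alternative[:-3]
correlations = lagged_correlations(alternative, notified, max_lag=5)
best_lag = max(correlations, key=correlations.get)
```

In this toy case the correlation peaks at the lag that was built into the data, which mirrors how the peak columns in the table indicate how far each source leads the notified counts.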
Autoregressive integrated moving average models with time-lagged covariates used with alternative data sources for forecasting 1 to 5 days ahead.
| Alternative data source and forecast step | Time-lagged covariates, daysa | ARIMAb orderc | RMSEd |
| | | | |
| 1 day | 1 to 10 | 3,0,1 | 1.01 |
| 2 days | 2 to 10 | 2,0,0 | 1.04 |
| 3 days | 3 to 10 | 2,0,0 | 1.04 |
| 4 days | 4 to 10 | 2,0,0 | 1.05 |
| 5 days | 5 to 10 | 2,0,0 | 1.06 |
| | | | |
| 1 day | 1, 2, 3, 4, 5, 6, 7, 8, 10 | 3,0,2 | 1.08 |
| 2 days | 2, 3, 5, 6, 7, 8, 10 | 3,0,2 | 1.08 |
| 3 days | 3, 4, 5, 6, 7, 8, 10 | 3,0,2 | 1.08 |
| 4 days | 4, 6, 7, 8, 9, 10 | 3,0,2 | 1.09 |
| 5 days | 6, 7, 8, 9, 10 | 3,0,2 | 1.09 |
| | | | |
| 1 day | 1 to 10 | 2,0,0 | 1.07 |
| 2 days | 2 to 10 | 2,0,0 | 1.08 |
| 3 days | 3 to 10 | 2,0,0 | 1.08 |
| 4 days | 4 to 10 | 2,0,0 | 1.08 |
| 5 days | 5 to 10 | 2,0,0 | 1.08 |
| | | | |
| 1 day | 1 to 10 | 4,0,1 | 1.07 |
| 2 days | 2 to 10 | 5,0,2 | 1.08 |
| 3 days | 3 to 10 | 3,0,2 | 1.08 |
| 4 days | 4 to 10 | 2,0,2 | 1.09 |
| 5 days | 5 to 10 | 2,0,2 | 1.09 |
| | | | |
| 1 day | 1 to 10 | 5,1,3 | 0.94 |
| 2 days | 2 to 10 | 5,1,3 | 0.94 |
| 3 days | 3 to 10 | 5,1,3 | 0.94 |
| 4 days | 4 to 10 | 5,0,2 | 1.09 |
| 5 days | 5 to 10 | 5,0,2 | 1.09 |
aLagged covariates refer to the time-lagged independent variables of alternative data source.
bARIMA: autoregressive integrated moving average.
cARIMA order (p,d,q) refers to the number of autoregressive terms, degree of differencing, and moving average components of the model.
dRMSE: root mean square error.
eGP: general practice.
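The time-lagged covariates listed in the table can be illustrated with a short sketch. It assumes that an h-day-ahead forecast may only use covariate values lagged by at least h days (consistent with the "2 to 10" and "5 to 10" ranges above). The helper name `lagged_design` and the toy series are hypothetical, and the ARIMA fitting itself is not shown.

```python
def lagged_design(target, covariate, step, max_lag=10):
    """Pair target[t] with covariate[t - lag] for lag = step..max_lag.

    Returns the list of lags used and one (features, target value) pair per day
    for which all lagged covariate values exist.
    """
    lags = list(range(step, max_lag + 1))
    rows = []
    for t in range(max_lag, len(target)):
        features = [covariate[t - lag] for lag in lags]
        rows.append((features, target[t]))
    return lags, rows


# Toy series (hypothetical): 15 days of notified counts and one alternative source.
notified = list(range(100, 115))
helpline = list(range(15))

# Covariates for a 3-day-ahead forecast: lags 3 to 10 only.
lags, rows = lagged_design(notified, helpline, step=3)
first_features, first_target = rows[0]
```

Each row could then be passed as exogenous regressors to an ARIMA implementation of the reader's choice; the second block of the table shows that a subset of lags may also be selected rather than the full contiguous range.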
Figure 1. Actual notified case counts and prediction results 1 to 5 days ahead for all developed models, with their prediction errors based on relative root mean square error. The best model performance with the lowest prediction error (relative root mean square error) in each time series is shown as a bold line. ABS: absenteeism; AR: autoregressive; CHL: consumer helpline; GP: general practice; GT: Google Trends.
Root mean square error, relative root mean square error, and Pearson correlation for 1-, 2-, 3-, 4-, and 5-day ahead predictions during the test period (August 2016).
| Model | 1 day RMSEa | 1 day rRMSEb | 1 day ρc | 2 days RMSE | 2 days rRMSE | 2 days ρ | 3 days RMSE | 3 days rRMSE | 3 days ρ | 4 days RMSE | 4 days rRMSE | 4 days ρ | 5 days RMSE | 5 days rRMSE | 5 days ρ |
| ARd | 15.28 | 46.9 | 0.917 | 23.73 | 72.8 | 0.76 | 33.9 | 105.3 | 0.82 | 38.85 | 119.2 | 0.20 | 67.57 | 202 | 0.65 |
| AR+CHLe | | | | | | | 39.74 | 123.5 | 0.79 | 38.14 | 117 | 0.28 | 68.51 | 204.8 | 0.64 |
| AR+GPg | 15.71 | 48.2 | 0.901 | 23.77 | 72.9 | 0.75 | 31.55 | 98 | 0.84 | 39.59 | 121.4 | 0.21 | 63.21 | 189 | 0.66 |
| AR+GTh | 12.9 | 39.6 | 0.933 | 22.5 | 69 | 0.76 | | | | 37.84 | 116.1 | 0.21 | | | |
| AR+Twitter | 11.61 | 35.6 | 0.951 | 22.67 | 69.5 | 0.80 | 35.63 | 110.7 | 0.81 | | | | 80.83 | 241.7 | 0.62 |
| AR+ABSi | 4.74 | 14.5 | 0.989 | 15.97 | 49 | 0.89 | 38.68 | 120.2 | 0.81 | 47.26 | 145 | 0.28 | 71.5 | 213.8 | 0.65 |
aRMSE: root mean square error.
brRMSE: relative root mean square error.
cρ: Pearson correlation.
dAR: autoregressive.
eCHL: consumer helpline.
fBest performing model for a particular day on basis of the rRMSE.
gGP: general practice.
hGT: Google Trends.
iABS: school absenteeism.
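The three error measures reported in the table can be computed as in the sketch below. The record does not define rRMSE, so the code assumes it is the RMSE divided by the mean of the actual counts, expressed as a percentage. The function names and the toy numbers are illustrative only.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between actual and predicted counts."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_rmse(actual, predicted):
    """RMSE as a percentage of the mean actual count (assumed definition of rRMSE)."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * rmse(actual, predicted) / mean_actual


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy values (hypothetical), not taken from the study.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
error = rmse(actual, predicted)
relative_error = relative_rmse(actual, predicted)
rho = pearson(actual, predicted)
```

Under this assumed definition, the table's RMSE and rRMSE pairs imply a mean daily count of roughly 32 to 33 cases over the test period (for example, 15.28 / 0.469).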
Figure 2. The daily estimations of the best performing models (lowest relative root mean square error) and their prediction errors during the testing period (August 2016). AR: autoregressive; CHL: consumer helpline; GT: Google Trends.
Figure 3. Cluster types in notified case counts, consumer helpline inquiries, and school absenteeism in Hastings and Havelock North. High-high cluster refers to high values surrounded by high values, high-low cluster refers to high values surrounded by low values, low-high cluster refers to low values surrounded by high values, and low-low cluster refers to low values surrounded by low values. Multiple types refers to multiple cluster-type designations (ie, high-high, low-low, high-low, and low-high) through the time period.
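The cluster types named in the caption follow from the standard Local Moran's I statistic, which multiplies an area's deviation from the mean by the weighted average deviation of its neighbours. The sketch below assumes row-standardized binary adjacency weights and uses a hypothetical four-area chain. It is not the authors' implementation and omits the permutation-based Z scores.

```python
def local_morans_i(values, weights):
    """Return (Local Moran's I, cluster type) for each area.

    values  : one observation per area
    weights : square matrix, weights[i][j] > 0 if areas i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment used for scaling
    if m2 == 0:
        raise ValueError("all values are equal; Local Moran's I is undefined")
    results = []
    for i in range(n):
        neighbours = [(weights[i][j], dev[j]) for j in range(n)
                      if j != i and weights[i][j] > 0]
        total = sum(w for w, _ in neighbours)
        # Row-standardized spatial lag: weighted mean deviation of the neighbours.
        lag = sum(w * d for w, d in neighbours) / total if total else 0.0
        i_local = dev[i] * lag / m2
        # Zero deviations are labelled "low" here for simplicity.
        own = "high" if dev[i] > 0 else "low"
        around = "high" if lag > 0 else "low"
        results.append((i_local, own + "-" + around))
    return results


# Hypothetical chain of four areas: 0-1, 1-2, 2-3 are neighbours.
values = [1, 1, 1, 9]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clusters = local_morans_i(values, weights)
```

In this toy layout the last area is a high value surrounded by low values, so it receives a negative statistic and the high-low label, which is the outlier pattern of interest in an outbreak setting.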
Daily Local Moran’s I in school absenteeism, consumer helpline inquiries, and notified case counts in Havelock North and Hastings cities in August 2016.
Each cell shows the Moran’s I value with its Z score in parentheses.
| Date | Havelock North, school absenteeism | Havelock North, consumer helpline | Havelock North, notified case count | Hastings, school absenteeism | Hastings, consumer helpline | Hastings, notified case count |
| August 4, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.03 (−0.16) | 0.04 (−0.23) | 0.08 (−0.29) |
| August 5, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.04 (−0.23) | 0.07 (−0.29) | 0.09 (−0.32) |
| August 6, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 7, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 8, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 9, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 10, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.04 (−0.19) | 0.03 (−0.1) | 0.09 (−0.29) |
| August 11, 2016 | − | − | 0 (0.01) | 0.03 (−0.15) | 0.01 (−0.1) | 0.08 (−0.29) |
| August 12, 2016 | −0.40 (−0.23) | −0.77 (−0.29) | 0 (−0.32) | 0.04 (−0.23) | 0.03 (−0.29) | 0.09 (−0.32) |
| August 13, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 14, 2016 | −1.62 (7.08)a | −1.92 (6.71)a | − | | − | − |
| August 15, 2016 | −1.62 (−0.23) | −1.92 (−0.29) | −2.17 (−0.32) | 0.03 (−0.17) | −0.01 (−0.04) | 0.56 (0.89) |
| August 16, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.03 (−0.16) | 0 (−0.04) | 1.20 (1.37) |
| August 17, 2016 | 0.05 (0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.02 (−0.15) | 0 (0.03) | 1.20 (0.89) |
| August 18, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.02 (−0.11) | 0 (0.03) | 0.31 (0.35) |
| August 19, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.03 (−0.23) | −0.01 (−0.29) | −0.11 (−0.32) |
| August 20, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 21, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.03 (−0.13) | 0.01 (−0.04) | −0.08 (0.25) |
| August 22, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.02 (−0.17) | 0 (−0.04) | −0.05 (−0.19) |
| August 23, 2016 | −0.10 (0.45) | −0.11 (0.37) | −0.11 (0.34) | 0.03 (−0.18) | 0 (−0.1) | −0.02 (0.13) |
| August 24, 2016 | 0.21 (0.46) | 0.14 (0.37) | 0.12 (0.34) | 0.03 (−0.16) | 0.02 (−0.16) | −0.03 (−0.23) |
| August 25, 2016 | 0.14 (0.3) | 0.14 (0.37) | 0.23 (0.68) | 0.03 (−0.16) | 0.04 (−0.23) | 0.06 (−0.29) |
| August 26, 2016 | −0.07 (−0.23) | −0.11 (−0.29) | −0.22 (−0.32) | 0.04 (−0.23) | 0.07 (−0.29) | 0.09 (−0.32) |
| August 27, 2016 | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
| August 28, 2016 | −0.05 (0.2) | −0.01 (0.04) | −0.11 (0.34) | 0.04 (−0.19) | 0.03 (−0.1) | 0.03 (−0.1) |
| August 29, 2016 | −0.05 (−0.23) | −0.01 (−0.29) | −0.11 (−0.32) | 0.04 (−0.23) | 0.03 (−0.29) | 0.03 (−0.32) |
| August 30, 2016 | −0.02 (0.11) | −0.11 (0.37) | 0.05 (−0.16) | 0.05 (−0.23) | 0.08 (−0.29) | 0.10 (−0.32) |
aA negative Moran’s I value with a corresponding Z score greater than 1.96 indicates a statistically significant spatial outlier.
bFirst day when the data source shows a spatial outlier.
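The decision rule in footnote a can be applied to the daily values to find the first day on which a source shows a spatial outlier. The sketch below uses four Havelock North school absenteeism entries copied from the table. The function names and the ISO date keys are illustrative assumptions.

```python
def is_significant_spatial_outlier(morans_i, z_score, z_critical=1.96):
    """Footnote rule: negative Moran's I and Z score above the critical value."""
    return morans_i < 0 and z_score > z_critical


def first_outlier_day(daily_values):
    """Return the earliest date whose (I, Z) pair meets the rule, or None."""
    for day in sorted(daily_values):
        morans_i, z_score = daily_values[day]
        if is_significant_spatial_outlier(morans_i, z_score):
            return day
    return None


# Havelock North school absenteeism, (Moran's I, Z score), from the table above.
absenteeism = {
    "2016-08-12": (-0.40, -0.23),
    "2016-08-13": (0.05, -0.23),
    "2016-08-14": (-1.62, 7.08),
    "2016-08-15": (-1.62, -0.23),
}
first_day = first_outlier_day(absenteeism)
```

Both conditions are needed: August 15 has the same negative Moran's I as August 14 but a Z score below the threshold, so it is not flagged.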