Sasikiran Kandula, Teresa Yamana, Sen Pei, Wan Yang, Haruka Morita, Jeffrey Shaman.
Abstract
A variety of mechanistic and statistical methods to forecast seasonal influenza have been proposed and are in use; however, the effects of various data issues and design choices (statistical versus mechanistic methods, for example) on the accuracy of these approaches have not been thoroughly assessed. Here, we compare the accuracy of three forecasting approaches (a mechanistic method, a weighted average of two statistical methods and a super-ensemble of eight statistical and mechanistic models) in predicting seven outbreak characteristics of seasonal influenza during the 2016-2017 season at the national and 10 regional levels in the USA. For each of these approaches, we report the effects on forecast quality of real-time under- and over-reporting in surveillance systems, the use of non-surveillance proxies of influenza activity, and manual override of model predictions. Our results suggest that a meta-ensemble of statistical and mechanistic methods has better overall accuracy than the individual methods. Supplementing surveillance data with proxy estimates generally improves the quality of forecasts, while transient reporting errors degrade the performance of all three approaches considerably. The improvement in quality from ad hoc and post-forecast changes suggests that domain experts continue to possess information that is not sufficiently captured by current forecasting approaches.
Keywords: forecasts; influenza; mechanistic models; meta-ensemble; nowcast
Year: 2018 PMID: 30045889 PMCID: PMC6073642 DOI: 10.1098/rsif.2018.0174
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
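The weighted-average and super-ensemble approaches described in the abstract combine member forecast distributions. A minimal sketch of one standard way to do this, a linear opinion pool (an assumed form for illustration; the paper's actual weighting scheme is not given in this record):

```python
import numpy as np

def linear_pool(member_probs, weights):
    """Combine member forecast distributions by a weighted average.

    member_probs: (n_members, n_bins) array of binned probabilities.
    weights: per-member weights (need not be normalized).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    p = w @ np.asarray(member_probs, dtype=float)
    return p / p.sum()                    # guard against rounding drift
```

For example, pooling two members with binned probabilities [0.8, 0.2] and [0.4, 0.6] at equal weight yields [0.6, 0.4].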
Cumulative log scores and mean errors of the real-time forecast variant for week 48 through week 18 forecasts at the national and 10 HHS regions during the 2016–2017 season. One-week-ahead is not displayed as all three methods used nowcasts, and the scores/errors were thus identical. For each target, the best score and lowest error are in italics.
| target | log score: DYN | log score: STAT | log score: SE | mean error: DYN | mean error: STAT | mean error: SE |
|---|---|---|---|---|---|---|
| season onset | −134 | *−115* | −129 | 0.884 | 0.523 | *0.516* |
| season peak week | *−226* | *−226* | −231 | 1.581 | 1.604 | *1.513* |
| season peak intensity | −348 | *−311* | *−311* | 0.165 | 0.135 | *0.129* |
| two-week-ahead | *−252* | −288 | −266 | 0.204 | 0.195 | *0.193* |
| three-week-ahead | *−311* | −322 | −318 | 0.251 | *0.228* | *0.228* |
| four-week-ahead | −344 | −340 | *−329* | 0.290 | *0.249* | 0.254 |
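The log scores in the table accumulate, per forecast, the log of the probability assigned to the outcome that verified. A sketch under the assumption of FluSight-style binned probabilistic forecasts (the function name and the clipping floor are illustrative, not from the paper):

```python
import math

def cumulative_log_score(forecasts):
    """Sum of log probabilities assigned to the observed outcomes.

    forecasts: iterable of (bin_probabilities, observed_bin_index) pairs.
    Probabilities are floored at 1e-10 so that a forecast assigning zero
    probability to the observed bin scores a large negative value
    rather than -inf.
    """
    return sum(math.log(max(probs[obs], 1e-10)) for probs, obs in forecasts)
```

A forecast that places probability 0.5 on the verified bin contributes ln 0.5 ≈ −0.69; cumulative totals closer to zero are better, which is why the best scores in the table are the least negative.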
Figure 1. Scores for forecasts at each week of the season, by target. Target ‘one-week-ahead’ was excluded as it would be identical for the three methods.
Statistical significance of differences in errors between forecasting methods, as determined by a paired Wilcoxon signed-rank test. The values in parentheses are the p-values from testing the alternative hypotheses ‘lesser’ and ‘greater’, respectively. For example, for season onset, the error with DYN is significantly greater than the error with STAT (0.01) and than the error with SE (less than 0.01), while there is no difference between the errors of STAT and SE (0.14). For seasonal targets, only weeks prior to the occurrence of the event are used, as forecasts made after the event are almost always correct. See electronic supplementary material, table S1, for significance tests by variant. Statistically significant differences are italicized.
| target | DYN, STAT | DYN, SE | STAT, SE |
|---|---|---|---|
| season onset | (—, *0.01*) | (—, *<0.01*) | (0.86, 0.14) |
| season peak week | (0.65, 0.35) | (0.76, 0.24) | (0.77, 0.23) |
| season peak intensity | — | — | — |
| two-week-ahead | (0.84, 0.16) | — | (0.8, 0.2) |
| three-week-ahead | — | — | (0.83, 0.17) |
| four-week-ahead | — | — | (0.45, 0.55) |
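The paired one-sided tests reported in the table can be reproduced with SciPy's `scipy.stats.wilcoxon`. Here with synthetic errors, since the per-location error series are not part of this record (the distributions and sample size are hypothetical):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
# Hypothetical paired point-forecast errors for two methods, where
# STAT's error is consistently lower than DYN's.
err_dyn = rng.normal(0.30, 0.05, size=60)
err_stat = err_dyn - rng.normal(0.05, 0.01, size=60)

# One-sided alternatives, matching the table's (lesser, greater) pairs.
p_less = wilcoxon(err_dyn, err_stat, alternative="less").pvalue
p_greater = wilcoxon(err_dyn, err_stat, alternative="greater").pvalue
```

With DYN's errors systematically larger, the ‘greater’ alternative comes out significant and the ‘lesser’ alternative does not, mirroring the pattern described for season onset.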
Figure 2. Cumulative score at the end of season by location and target. Target ‘one-week-ahead’ was excluded as it would be identical for the three methods. The boxplot denotes the median, interquartile range (IQR) and whiskers at 1.5 × IQR. The text in black shows the mean score across the 11 locations.
Cumulative probabilistic forecast scores for all variants. The value in parentheses is the percentage difference relative to the Baseline score; positive numbers indicate improved performance and negative numbers degraded performance.
| method | target | Baseline | Real-time | Without nowcast | With post-processing | Stable ILI |
|---|---|---|---|---|---|---|
| DYN | season onset | −135 | −134(1) | −145(−7) | −136(−1) | −125(7) |
| | season peak week | −278 | −226(19) | −276(1) | −258(7) | −250(10) |
| | season peak intensity | −403 | −348(14) | −413(−3) | −367(9) | −375(7) |
| | one-week-ahead | −163 | −161(1) | −205(−25) | −172(−5) | −127(22) |
| | two-week-ahead | −241 | −252(−4) | −269(−12) | −240(0) | −219(9) |
| | three-week-ahead | −296 | −311(−5) | −330(−11) | −298(−1) | −278(6) |
| | four-week-ahead | −333 | −344(−3) | −362(−9) | −335(0) | −320(4) |
| STAT | season onset | −95 | −115(−21) | −94(1) | −102(−7) | −85(11) |
| | season peak week | −244 | −226(7) | −240(2) | −229(6) | −209(14) |
| | season peak intensity | −350 | −311(11) | −343(2) | −301(14) | −347(1) |
| | one-week-ahead | −163 | −163(0) | −220(−35) | −165(−1) | −127(22) |
| | two-week-ahead | −273 | −288(−5) | −275(−1) | −266(2) | −288(−6) |
| | three-week-ahead | −298 | −322(−8) | −308(−3) | −293(2) | −309(−4) |
| | four-week-ahead | −331 | −340(−3) | −326(1) | −325(2) | −327(1) |
| SE | season onset | −118 | −129(−9) | −116(2) | −125(−5) | −103(13) |
| | season peak week | −259 | −231(11) | −262(−1) | −264(−2) | −257(1) |
| | season peak intensity | −339 | −311(8) | −324(5) | −299(12) | −336(1) |
| | one-week-ahead | −163 | −161(2) | −160(2) | −165(−1) | −127(22) |
| | two-week-ahead | −233 | −266(−14) | −235(−1) | −229(2) | −222(5) |
| | three-week-ahead | −280 | −318(−14) | −293(−5) | −275(2) | −275(2) |
| | four-week-ahead | −301 | −329(−9) | −305(−1) | −300(0) | −300(0) |
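The parenthetical percentages can be recomputed from the scores themselves. Assuming the convention (variant − baseline) / |baseline|, rounded to the nearest integer, which reproduces the table's entries (the function name is illustrative):

```python
def pct_vs_baseline(baseline, variant):
    """Percentage difference of a (negative) log score vs. Baseline.

    Positive output means the variant scored closer to zero,
    i.e. performed better.
    """
    return round(100 * (variant - baseline) / abs(baseline))
```

For instance, `pct_vs_baseline(-135, -134)` gives 1 and `pct_vs_baseline(-135, -145)` gives −7, matching the DYN season-onset row.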
Mean point forecast errors for all variants. The value in parentheses is the percentage difference from the Baseline error; an italic value indicates that the difference was found to be significant (p < 0.05, paired Wilcoxon signed-rank test). As no post-processing was applied to the point forecasts, the Baseline with post-processing variant is omitted; its errors are identical to Baseline's.
| method | target | Baseline | Real-time | Without nowcast | Stable ILI |
|---|---|---|---|---|---|
| DYN | season onset | 0.784 | 0.884(—) | 0.839(—) | 0.709(10) |
| | season peak week | 1.536 | 1.581(−3) | 1.575(−3) | 1.5(2) |
| | season peak intensity | 0.169 | 0.165(3) | 0.201(—) | 0.17(−1) |
| | one-week-ahead | 0.15 | 0.147(—) | 0.185(—) | 0.117(—) |
| | two-week-ahead | 0.209 | 0.204(2) | 0.269(—) | 0.178(—) |
| | three-week-ahead | 0.268 | 0.251(—) | 0.363(—) | 0.257(—) |
| | four-week-ahead | 0.327 | 0.290(—) | 0.457(—) | 0.325(1) |
| STAT | season onset | 0.558 | 0.523(6) | 0.503(10) | 0.386(—) |
| | season peak week | 1.604 | 1.604(0) | 1.679(−5) | 1.627(−1) |
| | season peak intensity | 0.136 | 0.135(—) | 0.134(—) | 0.132(—) |
| | one-week-ahead | 0.149 | 0.147(—) | 0.148(1) | 0.117(—) |
| | two-week-ahead | 0.172 | 0.195(—) | 0.182(−6) | 0.175(−1) |
| | three-week-ahead | 0.207 | 0.228(−10) | 0.220(−6) | 0.21(−2) |
| | four-week-ahead | 0.231 | 0.249(−8) | 0.238(−3) | 0.228(1) |
| SE | season onset | 0.546 | 0.516(5) | 0.494(—) | 0.445(18) |
| | season peak week | 1.442 | 1.513(—) | 1.523(−6) | 1.412(2) |
| | season peak intensity | 0.126 | 0.129(—) | 0.123(—) | 0.122(3) |
| | one-week-ahead | 0.149 | 0.147(—) | 0.148(0) | 0.117(—) |
| | two-week-ahead | 0.165 | 0.193(—) | 0.195(—) | 0.161(2) |
| | three-week-ahead | 0.210 | 0.228(—) | 0.257(—) | 0.217(−3) |
| | four-week-ahead | 0.243 | 0.254(−5) | 0.284(—) | 0.252(−3) |
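For the point-forecast table, lower error is better, so the sign convention of the parenthetical percentage flips relative to the log-score tables. A sketch assuming (baseline − variant) / baseline, rounded to the nearest integer (the function name is illustrative):

```python
def pct_error_change(baseline_err, variant_err):
    """Percentage change in mean point-forecast error vs. Baseline.

    Positive output means the variant's error is lower, i.e. better.
    """
    return round(100 * (baseline_err - variant_err) / baseline_err)
```

For instance, `pct_error_change(1.536, 1.581)` gives −3 and `pct_error_change(1.536, 1.5)` gives 2, matching the DYN season-peak-week row.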
Figure 3. Cumulative sum of log scores of the three methods, by variant and target. In each sub-panel, a better-scoring variant has a higher cumulative score, i.e. closer to y = 0. For example, with DYN, the one-week-ahead scores for Baseline, Real-time and Baseline with post-processing are very similar. Removing the nowcast degraded the scores, while the availability of stable ILI improved them.
Figure 4. Scores of the probabilistic one-week-ahead forecasts from Baseline versus one of the variant forms. The colour of each data point denotes the week of the season, and the shape denotes the forecast method. Points above the diagonal line indicate that the variant (Baseline without nowcast for the top row; Stable ILI for the bottom row) outperforms Baseline, while points below the diagonal line indicate that Baseline results in a higher score. Note that because one-week-ahead forecasts for both the Baseline and Stable ILI variants are nowcasts, and the same nowcasts are used for DYN, STAT and SE, the three subpanels in the second row are identical. Post-processing does not change the nowcast considerably and is hence not shown.