M Santillana, A T Nguyen, T Louie, A Zink, J Gray, I Sung, J S Brownstein.
Abstract
Accurate real-time systems for monitoring influenza outbreaks help public health officials make informed decisions that may help save lives. We show that information extracted from cloud-based electronic health record databases, in combination with machine learning techniques and historical epidemiological information, has the potential to accurately and reliably provide near real-time regional estimates of flu outbreaks in the United States.
Year: 2016 PMID: 27165494 PMCID: PMC4863169 DOI: 10.1038/srep25732
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The CDC’s ILI estimates, baseline linear regression and AR(2) autoregressive model estimates, and ARES estimates are displayed as a function of time for the national level on the top panel. The errors associated with the linear regression and autoregressive model baselines, and with ARES, are shown on the bottom panel.
Figure 2. The CDC’s ILI estimates, baseline linear regression and AR(2) autoregressive model estimates, and ARES estimates are displayed as a function of time for each of the 10 US regions defined by the HHS.
Figure 3. The errors associated with the linear regression and AR(2) autoregressive model baselines, and with ARES, are displayed as a function of time for each of the 10 US regions defined by the HHS.
Accuracy metrics comparing ARES with the CDC’s ILI for all geographic regions, over the three flu seasons spanning 2012–2015.
| Algorithm | RMSE | RMSE | RMSE | Rel. RMSE (%) | Rel. RMSE (%) | Rel. RMSE (%) | Correlation | Correlation | Correlation |
|---|---|---|---|---|---|---|---|---|---|
| National | |||||||||
| GFT | 2.16 | 0.39 | 0.36 | 62.73% | 13.21% | 12.31% | 0.932 | 0.968 | 0.986 |
| Linear (univariate) | 0.45 | 0.20 | 0.54 | 12.20% | 9.00% | 12.22% | 0.996 | 0.985 | 0.986 |
| AR(2) | 0.46 | 0.30 | 0.45 | 11.77% | 9.53% | 11.40% | 0.940 | 0.940 | 0.937 |
| SVM (linear) + AR(2) | |||||||||
| Region 1 | |||||||||
| GFT | 2.87 | 0.27 | 0.41 | 107.01% | 21.20% | 26.01% | 0.789 | 0.881 | 0.951 |
| Linear (univariate) | 0.51 | 0.22 | 0.36 | 25.54% | 21.11% | 20.80% | 0.965 | ||
| AR(2) | 0.40 | 0.23 | 0.32 | 18.35% | 16.88% | 17.31% | 0.897 | 0.856 | 0.926 |
| SVM (linear) + AR(2) | 0.964 | 0.960 | |||||||
| Region 2 | |||||||||
| GFT | 2.22 | 0.64 | 1.18 | 46.54% | 27.37% | 38.64% | 0.960 | 0.833 | 0.938 |
| Linear (univariate) | 0.38 | 0.49 | 0.97 | 14.24% | 20.16% | 28.98% | 0.877 | 0.940 | |
| AR(2) | 0.42 | 0.27 | 0.27 | 11.32% | 10.12% | 9.73% | 0.949 | 0.922 | 0.937 |
| SVM (linear) + AR(2) | 0.975 | ||||||||
| Region 3 | |||||||||
| GFT | 1.97 | 0.33 | 0.63 | 78.06% | 24.18% | 21.19% | 0.914 | 0.984 | 0.983 |
| Linear (univariate) | 0.90 | 0.36 | 0.81 | 22.97% | 16.86% | 20.51% | 0.986 | ||
| AR(2) | 0.71 | 0.24 | 0.88 | 19.97% | 9.96% | 16.59% | 0.908 | 0.965 | 0.900 |
| SVM (linear) + AR(2) | 0.976 | 0.992 | |||||||
| Region 4 | |||||||||
| GFT | 1.84 | 0.36 | 0.45 | 58.99% | 27.05% | 16.71% | 0.891 | 0.958 | 0.974 |
| Linear (univariate) | 1.02 | 0.38 | 0.48 | 47.33% | 34.95% | 13.33% | 0.979 | 0.973 | 0.986 |
| AR(2) | 0.57 | 0.37 | 0.74 | 15.07% | 15.21% | 18.29% | 0.924 | 0.941 | 0.903 |
| SVM (linear) + AR(2) | |||||||||
| Region 5 | |||||||||
| GFT | 2.18 | 0.36 | 0.46 | 63.18% | 20.44% | 21.70% | 0.887 | 0.962 | 0.970 |
| Linear (univariate) | 0.28 | 0.43 | 0.53 | 13.58% | 23.34% | 15.22% | 0.951 | 0.983 | |
| AR(2) | 0.46 | 0.33 | 0.42 | 12.86% | 11.59% | 11.95% | 0.927 | 0.886 | 0.940 |
| SVM (linear) + AR(2) | 0.985 | ||||||||
| Region 6 | |||||||||
| GFT | 3.74 | 0.75 | 1.49 | 66.62% | 14.31% | 25.84% | 0.921 | 0.968 | 0.923 |
| Linear (univariate) | 1.29 | 0.73 | 0.66 | 21.17% | 18.53% | 13.38% | 0.965 | 0.964 | |
| AR(2) | 0.69 | 0.58 | 0.68 | 14.39% | 10.38% | 13.31% | 0.937 | 0.949 | 0.935 |
| SVM (linear) + AR(2) | 0.957 | ||||||||
| Region 7 | |||||||||
| GFT | 0.77 | 0.92 | 1.67 | 22.54% | 85.14% | 72.62% | 0.942 | 0.695 | |
| Linear (univariate) | 0.63 | 2.43 | 1.95 | 29.42% | 315.08% | 124.32% | 0.975 | 0.958 | 0.569 |
| AR(2) | 0.59 | 16.87% | 0.948 | 0.910 | |||||
| SVM (linear) + AR(2) | 0.45 | 0.81 | 54.24% | 29.14% | 0.935 | 0.827 | |||
| Region 8 | |||||||||
| GFT | 0.84 | 0.39 | 0.43 | 27.29% | 28.97% | 24.09% | 0.920 | 0.951 | 0.953 |
| Linear (univariate) | 0.77 | 0.54 | 0.78 | 23.73% | 20.16% | 25.38% | 0.973 | 0.942 | 0.894 |
| AR(2) | 0.36 | 0.41 | 0.38 | 15.01% | 19.64% | 15.78% | 0.961 | 0.843 | 0.930 |
| SVM (linear) + AR(2) | |||||||||
| Region 9 | |||||||||
| GFT | 2.84 | 0.80 | 0.42 | 72.09% | 24.93% | 15.59% | 0.922 | 0.946 | 0.934 |
| Linear (univariate) | 0.43 | 0.38 | 0.29 | 19.28% | 20.79% | 13.37% | 0.927 | 0.965 | |
| AR(2) | 0.37 | 0.29 | 0.35 | 11.92% | 13.42% | 10.82% | 0.934 | 0.935 | 0.942 |
| SVM (linear) + AR(2) | 0.970 | ||||||||
| Region 10 | |||||||||
| GFT | 2.85 | 0.73 | 0.50 | 181.88% | 91.96% | 30.46% | 0.866 | 0.953 | 0.955 |
| Linear (univariate) | 0.75 | 0.49 | 0.48 | 125.35% | 82.45% | 31.04% | 0.737 | 0.908 | |
| AR(2) | 0.49 | 0.57 | 0.41 | 37.20% | 30.14% | 27.99% | 0.867 | 0.881 | 0.920 |
| SVM (linear) + AR(2) | 0.922 | ||||||||
For comparison, we include GFT’s historical predictions and two baseline models: a dynamic linear regression (mapping athenahealth’s ILI onto the CDC’s ILI) and a two-term autoregressive model, AR(2). The best-performing values appear in boldface.
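The three metrics in the table and the AR(2) baseline can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this record: relative RMSE is taken to be the root mean square of the errors divided by the reference value, expressed in percent; the AR(2) model is fit by ordinary least squares with an intercept; and all function names are hypothetical. The toy series is synthetic and is not data from the study.

```python
import math


def rmse(estimates, targets):
    """Root mean squared error between two equal-length sequences."""
    n = len(targets)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / n)


def relative_rmse(estimates, targets):
    """Assumed definition: RMS of (estimate - target) / target, in percent.

    Targets must be nonzero.
    """
    n = len(targets)
    return 100.0 * math.sqrt(
        sum(((e - t) / t) ** 2 for e, t in zip(estimates, targets)) / n
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar2(series):
    """Least-squares fit of y[t] = c + a1 * y[t-1] + a2 * y[t-2]."""
    rows = [(1.0, series[t - 1], series[t - 2]) for t in range(2, len(series))]
    ys = series[2:]
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_ar2(coeffs, prev1, prev2):
    """One-step-ahead AR(2) prediction from the two most recent values."""
    c, a1, a2 = coeffs
    return c + a1 * prev1 + a2 * prev2


# Synthetic toy series generated from a known AR(2) recursion (not study data).
toy = [1.0, 2.0]
for _ in range(10):
    toy.append(0.5 + 0.6 * toy[-1] + 0.3 * toy[-2])

coeffs = fit_ar2(toy)
one_step = [predict_ar2(coeffs, toy[t - 1], toy[t - 2]) for t in range(2, len(toy))]
print("fitted (c, a1, a2):", coeffs)
print("RMSE:", rmse(one_step, toy[2:]))
print("Rel. RMSE (%):", relative_rmse(one_step, toy[2:]))
print("Correlation:", pearson(one_step, toy[2:]))
```

In practice the model would be refit as each new weekly observation arrives and scored only on out-of-sample predictions; the sketch fits and scores on the same toy series purely to keep the example short.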