Soo Beom Choi1,2, Juhyeon Kim1,2, Insung Ahn1,2.
Abstract
To identify countries that have seasonal patterns similar to the time series of influenza surveillance data in the United States and other countries, and to forecast the 2018-2019 seasonal influenza outbreak in the U.S., we collected surveillance data for 164 countries from the FluNet database, search queries from Google Trends, and temperature data from 2010 to 2018. Data for influenza-like illness (ILI) in the U.S. were collected from the FluView database. Using cross-correlation analysis, we identified the time lag between pairs of weekly surveillance time series from two countries for ILI, total influenza (Total INF), influenza A (INF A), and influenza B (INF B) viruses. To forecast ILI, Total INF, INF A, and INF B for the next season (26 weeks ahead) in the U.S., we developed prediction models using linear regression, autoregressive integrated moving average, and an artificial neural network (ANN). In the cross-correlation analysis between countries in the northern and southern hemispheres, the seasonal influenza patterns in Australia and Chile were highly correlated with those of the U.S. and preceded them by 22 weeks and 28 weeks, respectively. The R2 score of the ANN model for ILI on the validation set for 2015-2019 was 0.758, despite the difficulty of forecasting 26 weeks ahead. Judging from the influenza activity in Australia and Chile in 2018, our prediction models forecast that ILI in the U.S. in 2018-2019 may peak later and be less severe than in 2017-2018. The models make it possible to estimate peak timing, peak intensity, and type-specific influenza activity for the next season at the 40th week. The correlation between seasonal influenza patterns in the U.S., Australia, and Chile could be used to forecast the next seasonal influenza pattern, which can help to determine influenza vaccine strategy approximately six months ahead in the U.S.
Year: 2019 PMID: 31765386 PMCID: PMC6876883 DOI: 10.1371/journal.pone.0220423
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The flow chart for this study.
Fig 2. An example of cross-correlation analysis.
Maximum correlation coefficient and time lag with time series of influenza surveillance data in the United States and input variables.
| ILI: Country | ILI: Corr | ILI: TL | Total INF: Country | Total INF: Corr | Total INF: TL | INF A: Country | INF A: Corr | INF A: TL | INF B: Country | INF B: Corr | INF B: TL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Canada | 0.891 | 1 | Australia | 0.892 | -22 | Australia | 0.896 | -22 | Norway | 0.884 | 0 |
| Australia | 0.861 | -22 | Canada | 0.885 | 1 | Canada | 0.830 | 0 | Sweden | 0.864 | 1 |
| Germany | 0.790 | 4 | U.K | 0.826 | 0 | Chile | 0.803 | -28 | Croatia | 0.856 | -2 |
| Chile | 0.786 | -28 | Norway | 0.804 | 2 | U.K | 0.747 | 0 | U.K | 0.855 | -2 |
| Norway | 0.778 | 2 | Denmark | 0.796 | 4 | Kuwait | 0.701 | -9 | Canada | 0.851 | 0 |
| Sweden | 0.762 | 3 | Chile | 0.778 | -29 | Myanmar | 0.677 | -23 | Denmark | 0.848 | 1 |
| Iceland | 0.756 | 3 | Spain | 0.775 | -2 | Bangladesh | 0.657 | -30 | Switzerland | 0.841 | -4 |
| Japan | 0.741 | -1 | Iceland | 0.770 | 2 | Luxembourg | 0.655 | 4 | Ireland | 0.840 | -3 |
| Croatia | 0.731 | 1 | Sweden | 0.767 | 3 | Laos | 0.651 | -19 | Italy | 0.830 | -4 |
| U.K | 0.718 | 1 | Malta | 0.766 | -2 | Oman | 0.647 | -11 | Australia | 0.792 | -25 |
| GT_INF_U.S. | 0.880 | -1 | GT_INF_U.S. | 0.902 | -1 | GT_INF_U.S. | 0.891 | 0 | GT_INF_U.S. | 0.834 | -4 |
| GT_INF A_U.S. | 0.861 | -2 | GT_INF A_U.S. | 0.934 | -1 | GT_INF A_U.S. | 0.929 | -1 | GT_INF A_U.S. | 0.864 | -4 |
| GT_INF B_U.S. | 0.797 | 1 | GT_INF B_U.S. | 0.907 | 1 | GT_INF B_U.S. | 0.874 | 2 | GT_INF B_U.S. | 0.947 | -2 |
| GT_INF_Australia | 0.596 | -27 | GT_INF_Australia | 0.598 | -26 | GT_INF_Australia | 0.568 | -25 | GT_INF_Australia | 0.643 | -29 |
| GT_INF A_Australia | 0.808 | -24 | GT_INF A_Australia | 0.917 | -24 | GT_INF A_Australia | 0.899 | -23 | GT_INF A_Australia | 0.911 | -27 |
| GT_INF B_Australia | 0.578 | -23 | GT_INF B_Australia | 0.653 | -24 | GT_INF B_Australia | 0.600 | -23 | GT_INF B_Australia | 0.710 | -26 |
| GT_INF_Chile | 0.572 | 11 | GT_INF_Chile | 0.650 | 8 | GT_INF_Chile | 0.610 | 9 | GT_INF_Chile | 0.700 | 4 |
| GT_INF A_Chile | 0.800 | -30 | GT_INF A_Chile | 0.767 | -30 | GT_INF A_Chile | 0.800 | -30 | GT_INF A_Chile | 0.507 | -30 |
| GT_INF B_Chile | 0.593 | -23 | GT_INF B_Chile | 0.603 | -24 | GT_INF B_Chile | 0.567 | -23 | GT_INF B_Chile | 0.579 | -26 |
| Temp_U.S. | 0.764 | -27 | Temp_U.S. | 0.568 | -28 | Temp_U.S. | 0.560 | -27 | Temp_U.S. | 0.510 | -30 |
Corr, maximum correlation coefficient; GT, Google Trends; INF, Influenza; ILI, Influenza-like illness; Temp, temperature; TL, Time lag; U.K., United Kingdom of Great Britain and Northern Ireland; U.S., United States.
GT_INF_U.S. is GT with the keyword “influenza” in the U.S.
* Countries with a correlation coefficient of 0.7 or more and time lag between -20 and -30 weeks from 2010 to 2018.
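The lag search behind a table like this can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the synthetic series, and the sign convention (a negative lag means the other series leads the target) are assumptions chosen to match how the TL column reads.

```python
from math import pi, sin, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def best_lag(target, other, max_lag=30):
    """Return (lag, corr) maximising corr(target[t], other[t + lag]).

    Assumed convention: a negative lag means `other` leads `target`
    by |lag| weeks.
    """
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        # Pair target[t] with other[t + lag] wherever both weeks exist.
        pairs = [(target[t], other[t + lag])
                 for t in range(len(target))
                 if 0 <= t + lag < len(other)]
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic weekly data (not from the paper): one seasonal bump per year,
# with a different height each year so that the best lag is unambiguous.
AMPLITUDES = [1.0, 2.5, 1.5, 3.0, 2.0, 3.5, 1.0]
base = [AMPLITUDES[t // 52] * max(0.0, sin(2 * pi * (t % 52) / 52)) ** 2
        for t in range(6 * 52 + 22)]
us = base[:-22]        # hypothetical target series
southern = base[22:]   # same pattern, observed 22 weeks earlier

lag, corr = best_lag(us, southern)
print(lag, round(corr, 3))
```

On this synthetic input the search recovers a lag of -22 weeks, which mirrors how the Australia rows are reported above; real surveillance series would of course give correlations below 1.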
Fig 3. The explanation for forecasting seasonal influenza after 26 weeks.
Fig 4. The explanation for the training set at the 40th week and output for forecasting seasonal influenza after 26 weeks.
Fig 5. The surveillance data for influenza A (a) and B (b) viruses in the U.S. and Australia in 2010–2018; the values for Australia were shifted forward 22 weeks. The surveillance data for influenza A (c) viruses in the U.S. and Chile; the values for Chile were shifted forward 28 weeks. The blank part of the graph, the gap between INF A and the sum of the H1N1 and H3 viruses, is the number of influenza viruses that were not subtyped.
Linear regression analysis for influenza activities of previous season in the U.S. and input variables from the 40th week in 2010 to the 40th week in 2018.
| Input variable | LR 1 | LR 2 | LR 3 | LR 4 | LR 5 |
|---|---|---|---|---|---|
| ILI for the U.S. after 26 weeks | | | | | |
| ILI—U.S. (before 26 week) | 0.857 [<0.001] | - | - | - | - |
| Total INF—Australia (present) | - | 0.005 [<0.001] | - | - | 0.001 [0.485] |
| Total INF—Chile (present) | - | 0.009 [<0.001] | - | - | 0.007 [<0.001] |
| GT INF A—Australia (present) | - | - | 0.059 [<0.001] | - | 0.043 [<0.001] |
| GT INF A—Chile (present) | - | - | 0.023 [<0.001] | - | -0.009 [<0.001] |
| Temp—U.S. (present) | - | - | - | 0.111 [<0.001] | 0.054 [<0.001] |
| Adj. R-squared | 0.537 | 0.761 | 0.720 | 0.557 | 0.865 |
| Total INF for the U.S. after 26 weeks | | | | | |
| ILI—U.S. (before 26 week) | 1.076 [<0.001] | - | - | - | - |
| Total INF—Australia (present) | - | 18.4 [<0.001] | - | - | -2.610 [0.012] |
| Total INF—Chile (present) | - | 23.6 [<0.001] | - | - | 26.8 [<0.001] |
| GT INF A—Australia (present) | - | - | 246.7 [<0.001] | - | 250.0 [<0.001] |
| GT INF A—Chile (present) | - | - | 49.8 [<0.001] | - | -24.9 [0.001] |
| Temp—U.S. (present) | - | - | - | 277.9 [<0.001] | - |
| Adj. R-squared | 0.475 | 0.729 | 0.839 | 0.286 | 0.887 |
| INF A for the U.S. after 26 weeks | | | | | |
| ILI—U.S. (before 26 week) | 0.900 [<0.001] | - | - | - | - |
| Total INF—Australia (present) | - | 16.3 [<0.001] | - | - | -3.415 [0.006] |
| Total INF—Chile (present) | - | 25.3 [<0.001] | - | - | 24.8 [<0.001] |
| GT INF A—Australia (present) | - | - | 169.6 [<0.001] | - | 182.7 [<0.001] |
| GT INF A—Chile (present) | - | - | 56.2 [<0.001] | - | -7.491 [0.261] |
| Temp—U.S. (present) | - | - | - | 219.9 [<0.001] | - |
| Adj. R-squared | 0.387 | 0.690 | 0.769 | 0.300 | 0.832 |
| INF B for the U.S. after 26 weeks | | | | | |
| ILI—U.S. (before 26 week) | 1.344 [<0.001] | - | - | - | - |
| Total INF—Australia (present) | - | 24.8 [<0.001] | - | - | 9.636 [<0.001] |
| GT INF A—Australia (present) | - | - | 74.9 [<0.001] | - | 54.4 [<0.001] |
| Temp—U.S. (present) | - | - | - | 58.0 [<0.001] | - |
| Adj. R-squared | 0.552 | 0.627 | 0.749 | 0.143 | 0.787 |
Beta, Beta coefficient; CI, Confidence Interval; GT, Google Trends; INF, Influenza; ILI, Influenza-like illness; LR, Linear regression; Temp, temperature; U.S., United States of America.
GT INF A—Australia is GT with the keyword of “influenza A virus” in Australia.
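The idea of regressing a U.S. value 26 weeks in the future on a present-day input can be sketched with a single predictor. This is a simplified illustration under stated assumptions: the helper names and the synthetic data are hypothetical, and the models in the table (for example LR 5) combine several inputs rather than one.

```python
HORIZON = 26  # forecast horizon in weeks, as used in the study


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def make_pairs(predictor, target, horizon=HORIZON):
    """Pair predictor[t] (present) with target[t + horizon] (future).

    Both series are assumed to be weekly and of equal length.
    """
    return predictor[:-horizon], target[horizon:]


# Synthetic data (not from the paper): the target is an exact linear
# function of the predictor observed 26 weeks earlier.
predictor = [float((7 * t) % 13) for t in range(120)]
target = [0.0] * HORIZON + [2.0 + 0.5 * p for p in predictor[:-HORIZON]]

x, y = make_pairs(predictor, target)
intercept, slope = fit_line(x, y)

# Forecast for the week that lies 26 weeks after the last observation.
forecast = intercept + slope * predictor[-1]
print(round(intercept, 3), round(slope, 3), round(forecast, 3))
```

Because the synthetic target is noise-free, the fit recovers the intercept and slope used to build it; with real surveillance data the coefficients would carry uncertainty like the p-values shown in brackets in the table.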
Performance of prediction models for seasonal influenza outbreaks after 26 weeks in the United States from 2015 to 2019.
| Target | Model | R2 | RMSE (forecast values) | RMSE (peak timing) | RMSE (peak amplitude) |
|---|---|---|---|---|---|
| Influenza-like illness | Previous season | 0.487 | 1.1 | 5.5 | 2.2 |
| Influenza-like illness | LR | 0.720 | 0.8 | 2.5 | 1.5 |
| Influenza-like illness | ARIMAX | 0.714 | 0.8 | 2.9 | 1.6 |
| Influenza-like illness | ANN | 0.758 | 0.7 | 1.8 | 0.8 |
| Total influenza viruses | Previous season | 0.346 | 4802.8 | 5.8 | 6657.8 |
| Total influenza viruses | LR | 0.792 | 2707.6 | 1.7 | 5574.5* |
| Total influenza viruses | ARIMAX | 0.806 | 2618.2 | 1.7 | 6726.4 |
| Total influenza viruses | ANN | 0.738 | 3039.6 | 1.7 | 7544.3 |
| Influenza A virus | Previous season | 0.289 | 4190.9 | 5.8 | 4901.0 |
| Influenza A virus | LR | 0.777 | 2347.8 | 1.9 | 4694.7 |
| Influenza A virus | ARIMAX | 0.792 | 2265.2 | 1.6 | 5355.9 |
| Influenza A virus | ANN | 0.798 | 2231.9 | 1.2 | 4545.3 |
| Influenza B virus | Previous season | -0.238 | 1880.8 | 3.7 | 5142.3 |
| Influenza B virus | LR | 0.427 | 1279.4 | 3.0 | 1453.6 |
| Influenza B virus | ARIMAX | 0.352 | 1360.4 | 2.7 | 3163.0 |
| Influenza B virus | ANN | 0.403 | 1306.3 | 2.4 | 1512.0 |
R2, Coefficient of determination; RMSE, Root-mean-square error; LR, Linear regression; ARIMAX, Auto Regressive Integrated Moving Average with exogenous variables; ANN, artificial neural network.
* Best value among previous season, LR, ARIMAX, and ANN.
Units for the RMSE (forecast values) are percentage of visits for influenza-like illness and number of total influenza, influenza A, and influenza B viruses.
Units for the RMSE (Peak timing) are week.
Units for the RMSE (peak amplitude) are percentage of visits for influenza-like illness and number of total influenza, influenza A, and influenza B viruses.
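The metrics in this table can be sketched as below. The definitions of R2 and RMSE are the standard ones; the toy series and the `peak` helper are assumptions for illustration, and how the study aggregates peak errors across seasons is not shown here.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def peak(series):
    """Return (week_index, value) of the first maximum in a season."""
    week = max(range(len(series)), key=series.__getitem__)
    return week, series[week]


# Toy single-season series (not from the paper).
actual = [1, 2, 5, 3, 1]
close_forecast = [1, 2, 4, 3, 1]
poor_forecast = [5, 3, 1, 2, 5]

print(r2_score(actual, close_forecast), rmse(actual, close_forecast))
print(r2_score(actual, poor_forecast))  # negative: worse than the mean

# Peak timing and amplitude errors for one season.
week_a, amp_a = peak(actual)
week_p, amp_p = peak(poor_forecast)
print(abs(week_a - week_p), abs(amp_a - amp_p))
```

A forecast that fits worse than simply predicting the mean of the observations yields a negative R2, which is how a value such as -0.238 for the previous-season baseline can arise.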
Fig 6. The prediction of ANN for ILI (a) and Total influenza (b), influenza A and influenza B viruses (c) after 26 weeks from 2015 to 2019 in the U.S.
Fig 7. The prediction and 95% confidence interval of Auto Regressive Integrated Moving Average for ILI (a) and Total influenza (b), influenza A (c) and influenza B viruses (d) from 2015 to 2019 in the U.S.