
Gecko: A time-series model for COVID-19 hospital admission forecasting.

Mark J Panaggio1, Kaitlin Rainwater-Lovett2, Paul J Nicholas2, Mike Fang2, Hyunseung Bang2, Jeffrey Freeman2, Elisha Peterson2, Samuel Imbriale3.   

Abstract

During the COVID-19 pandemic, concerns about hospital capacity in the United States led to a demand for models that forecast COVID-19 hospital admissions. These short-term forecasts were needed to support planning efforts by providing decision-makers with insight about future demands for health care capacity and resources. We present a SARIMA time-series model called Gecko developed for this purpose. We evaluate its historical performance using metrics such as mean absolute error, predictive interval coverage, and weighted interval scores, and compare it to alternative hospital admission forecasting models. We find that Gecko outperformed baseline approaches and was among the most accurate models for forecasting hospital admissions at the state and national levels from January–May 2021. This work suggests that simple statistical methods can provide a viable alternative to traditional epidemic models for short-term forecasting.
Copyright © 2022 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  COVID-19; Coronavirus disease; Forecasting; Hospitalization; SARIMA; SARS-CoV-2; Time-series model

Year:  2022        PMID: 35636313      PMCID: PMC9124631          DOI: 10.1016/j.epidem.2022.100580

Source DB:  PubMed          Journal:  Epidemics        ISSN: 1878-0067            Impact factor:   5.324


Introduction

Since its discovery in late 2019, SARS-CoV-2 has spread across the globe, infecting over 180 million people and leading to over 3.9 million confirmed deaths as of June 2021 (Dong et al., 2020). During this pandemic, there have been widespread concerns that the influx of patients hospitalized due to COVID-19 could cause healthcare systems to be overwhelmed. There was an urgent need for tools to help decision-makers anticipate the demand for hospital beds, staff and other resources to give hospital systems time to prepare. In March 2020, a consortium called the COVID-19 Forecast Hub was created (Cramer et al., 2021a) to expedite the creation, evaluation and dissemination of forecasts for the trajectory of the pandemic in the U.S. to inform planning, resource allocation, and mitigation measures such as non-pharmaceutical interventions (NPIs) (Polonsky et al., 2019, Davies and Youde, 2016, Lutz et al., 2019). Forecasts were submitted by academic research groups, government-affiliated laboratories and private industry groups in support of this effort. These forecasts were then combined to produce an ensemble model, called the COVIDhub-ensemble (Ray et al., 2020), intended to produce more accurate and robust forecasts than the individual component models (Johansson et al., 2019, Viboud et al., 2018, McGowan et al., 2019, Reich et al., 2019). Initially, the Forecast Hub focused on COVID-19 cases and deaths, but in May 2020 hospital admissions forecasts were added (U.S. Centers for Disease Control and Prevention, 2020). At the time, there was no agreed-upon source for hospitalization data, and admission forecasts were excluded from early model evaluation efforts (Cramer et al., 2021b). In the fall of 2020, the United States Department of Health and Human Services (HHS) released a new data-set tracking hospital admissions and bed utilization by state, and the consortium selected these data as the benchmark for hospitalization forecasts (U.S. Department of Health and Human Services, 2021). These data could be used to fit models directly to hospitalization data to provide more accurate forecasts of hospital admissions and other hospital utilization metrics. An ensemble model for confirmed COVID-19 hospital admissions based on forecasts estimated from these data was first released in December 2020 (U.S. Centers for Disease Control and Prevention, 2020). This paper presents a statistical model, called Gecko, designed to produce short-term hospital admission forecasts for operational use. We discuss its implementation, validation, and applications during the COVID-19 pandemic and present a comparison of its performance to alternative models. We demonstrate that it often outperforms more complex mechanistic models and that its 7-day and 14-day admission forecasts were among the most accurate forecasts submitted to the COVID-19 Forecast Hub. These findings suggest that statistical models warrant consideration as alternatives to traditional epidemic models for short-term forecasting and as components of ensemble forecast models.

Materials and methods

Data sources

The hospital admissions forecasts presented here are based on data compiled by HHS (U.S. Department of Health and Human Services, 2021). These data were collected from HHS TeleTracking, direct reporting by health care facilities, and the National Healthcare Safety Network, and are aggregated from individual facilities to the state level. The data-set includes hospital metrics related to staffing shortages, admissions, bed utilization and deaths for a variety of patient age ranges dating back to January 1, 2020. However, we exclude data prior to October 20, 2020 due to the lower completeness of these earlier observations and the prevalence of irregularities in the data. We focus on forecasts for the total number of confirmed COVID-19 hospital admissions because this quantity was the focus of the hospital forecasting effort spearheaded by the COVID-19 Forecast Hub (Cramer et al., 2021a). However, the Gecko model was also applied to other hospitalization metrics, including suspected COVID-19 hospital admissions, staffed inpatient and intensive care unit (ICU) beds used, and staffed inpatient and ICU beds used by COVID-19 patients. While the data distributed by HHS have already undergone some cleaning, anomalies remain. These anomalies can be caused by missing data from particular health-care facilities, data-entry errors, and reporting lags and backlogs. The data are updated regularly, but observations in the last 3–7 days tend to be incomplete and are often revised upward as additional data are collected. This poses significant challenges for forecasting, as it can create the appearance of an artificial downward trend. As a result, additional cleaning steps were applied prior to fitting the Gecko model, as discussed in Section 2.2.

Model

Gecko is based on a standard time-series forecasting method called a Seasonal Auto-Regressive Integrated Moving Average Model (SARIMA) (Durbin and Koopman, 2012) and implemented using the statsmodels package in Python (Seabold and Perktold, 2010). This type of model assumes that the data represent noisy observations of a dynamic process that includes a periodic seasonal component and a stochastic trend. One can obtain posterior estimates for the true states corresponding to each observation and a model for the dynamics using a Kalman filter fitted with expectation maximization (Kalman, 1960, Roweis and Ghahramani, 1999). This model can be used to extrapolate the current trajectory over the next few weeks. This type of model was selected because hospitalizations show clear evidence of a weekly cycle in which admissions are higher during the week than on weekends (Fig. 1). Using SARIMA we are able to explicitly model this cycle to obtain better estimates for the noise distribution and the underlying dynamics. This type of model has been used previously in modeling the trajectory of the pandemic (ArunKumar et al., 2021, Demir and Kirişci, 2021), but to our knowledge Gecko is the first model to use SARIMA for forecasting hospital admissions due to COVID-19.
Fig. 1

Confirmed COVID-19 hospital admissions (U.S.) and weekly forecasts from Gecko model. Only forecasts for the next 7 days are shown for clarity. Here the black curve represents the observed totals and the colored curves represent each forecast with shaded bands representing the 50% (dark) and 95% (light) predictive intervals. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

This model was fit to observational data describing the number of confirmed COVID-19 hospital admissions across all 50 U.S. states and the District of Columbia. The form of a SARIMA model is determined by seven hyperparameters which are described in Table 1. The hyperparameters were selected to optimize the fit to historical data as measured by the Akaike Information Criterion (Akaike, 1998) and were shared across all states to avoid over-fitting. The values selected according to this criterion were (p, d, q) = (1, 1, 0) and (P, D, Q) = (1, 1, 0) with S = 7; the model therefore uses first differences, no moving averages and auto-regression of order 1 for both the non-seasonal and 7-day seasonal components. This yields a model with the form

    (1 − ϕB)(1 − ϕ̃B⁷) Δ Δ₇ xₜ = ξₜ,        yₜ = xₜ + ηₜ,

where xₜ denotes the unobserved true data, yₜ denotes the noisy measurements, ηₜ ∼ N(0, ση²) represents the measurement noise, and ξₜ ∼ N(0, σξ²) represents the process noise. Here B is used to denote the back-shift operator (Bxₜ = xₜ₋₁) and Δₛ denotes the difference operator with period s (Δxₜ = xₜ − xₜ₋₁, Δ₇xₜ = xₜ − xₜ₋₇). In addition to these structural hyperparameters, there are four parameters that must be estimated for each time-series: two autoregression parameters, ϕ and ϕ̃; and two variances, σξ² and ση², for the process and measurement noise (see Table 1). These were computed using maximum likelihood estimation (MLE) and were refitted each time a forecast was generated using the available historical data since October 20, 2020. A repository with an implementation of this model is available through GitLab (Panaggio, 2022).
Table 1

Parameters and hyperparameters for SARIMA model. The hyperparameters (rows 1–7) were selected using a grid search to optimize the fit to historical data as measured by the Akaike Information Criterion (Akaike, 1998) and were shared across all states to avoid over-fitting. The parameters (rows 8–11) were fitted using maximum likelihood estimation and were refitted to the available historical data for each state each time forecasts were produced.

Parameter | Description | Fitting method | Search space | Selected value
p | autoregressive order | grid search | 0, 1, 2 | 1
d | difference order | grid search | 0, 1, 2 | 1
q | moving average order | grid search | 0, 1, 2 | 0
P | seasonal autoregressive order | grid search | 0, 1, 2 | 1
D | seasonal difference order | grid search | 0, 1, 2 | 1
Q | seasonal moving average order | grid search | 0, 1, 2 | 0
S | seasonal period | fixed | 7 | 7
ϕ | autoregressive parameter | MLE | (−1, 1) | varies
ϕ̃ | seasonal autoregressive parameter | MLE | (−1, 1) | varies
σξ² | variance of process noise | MLE | (0, ∞) | varies
ση² | variance of measurement noise | MLE | (0, ∞) | varies
As discussed in Section 2.1, the raw hospital admissions totals contain various anomalies, and fitting a SARIMA model directly to these data can produce unreliable forecasts. To avoid overfitting to these anomalies, Gecko applies the following preprocessing steps before fitting:

1. Observations within the last three days, which tend to be incomplete, are removed.
2. Observations corresponding to national holidays are removed.
3. Point anomalies are removed using an anomaly detector. This detector computes a 7-day rolling average and uses the differences between the observed values and this average as an estimate of the noise distribution. Points within four standard deviations of the rolling average for each time-series are deemed acceptable. Of the remaining points, those with a day-to-day change of 50% or more relative to the moving average, followed by a reversion to within 20% of the original value, are marked as point anomalies.
4. Anomalous observations in the last 7 days are removed. Points in the last 7 days that deviate significantly from the recent trend are marked as anomalous and replaced with null values. The trend is estimated by fitting a line to the observations between 21 and 7 days prior to the forecast date; the standard deviation of the residuals relative to that line is computed, and points more than three standard deviations away are marked as anomalous.
5. The observed time-series for each state is inspected visually, and clear deviations from a periodic trend and significant drops in the last 7 days not flagged by the aforementioned anomaly detectors are removed manually.

In each case, anomalous observations and other points that are removed are replaced with null values in order to prevent them from biasing estimates of the model parameters.
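As a simplified sketch of the rolling-average screen for point anomalies (the function name, wiring, and synthetic data are illustrative, not the production code; the four-standard-deviation threshold is from the text):

```python
import numpy as np
import pandas as pd

def flag_point_anomalies(series: pd.Series, window: int = 7,
                         n_std: float = 4.0) -> pd.Series:
    """Flag observations more than n_std standard deviations from a
    centered rolling average, per the first screening rule above."""
    rolling = series.rolling(window, center=True, min_periods=1).mean()
    resid = series - rolling          # deviations estimate the noise distribution
    sigma = resid.std()
    return resid.abs() > n_std * sigma

# Example: a clean weekly cycle with one injected reporting spike.
t = np.arange(56)
y = pd.Series(200 + 30 * np.sin(2 * np.pi * t / 7))
y.iloc[25] = 900                      # simulated data-entry anomaly
flags = flag_point_anomalies(y)
cleaned = y.mask(flags)               # anomalies become nulls before fitting
```

Replacing flagged points with nulls (rather than imputed values) matches the paper's approach of letting the Kalman filter handle missing observations.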

Forecast hub

The model was trained using confirmed COVID-19 hospital admissions time-series processed using the method outlined in Section 2.2 on a weekly basis between January 11, 2021 and May 31, 2021. Each week, the data collected through Sunday evening were used to train the model, and predictive intervals (PIs) for the next 28 days were computed. These forecasts were then submitted to the COVID-19 Forecast Hub (Cramer et al., 2021a) for inclusion in the COVIDhub-ensemble model created by researchers at the University of Massachusetts Amherst. This ensemble generates its predictive intervals by averaging the corresponding predictive quantiles of all eligible component models (Ray et al., 2020). The forecasts for eligible models along with the ensemble forecast were then released publicly via the CDC website (U.S. Centers for Disease Control and Prevention, 2020). During the 21-week period considered here, 12 different models provided forecasts used in the ensemble. A list of models and an evaluation of their performance is available in Section 3.
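The quantile-averaging rule used by the ensemble can be illustrated as follows (the component forecasts are hypothetical, and the quantile levels shown are only a subset of those collected by the hub):

```python
import numpy as np

# Hypothetical quantile forecasts for one location and horizon.
# Rows: component models; columns: quantile levels 0.025, 0.25, 0.5, 0.75, 0.975.
component_quantiles = np.array([
    [80.0, 100.0, 120.0, 140.0, 170.0],
    [90.0, 110.0, 125.0, 150.0, 190.0],
    [70.0,  95.0, 115.0, 135.0, 160.0],
])

# The ensemble's predictive quantiles are the mean of the component
# models' corresponding quantiles (Ray et al., 2020).
ensemble = component_quantiles.mean(axis=0)
```

Averaging each quantile level separately preserves monotonicity of the resulting quantile function, so the ensemble's predictive intervals remain well-formed.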

Evaluation metrics

When creating models for operational use, building trust through model validation is essential. Gecko's performance was continually evaluated by comparing forecasts to observed outcomes in order to demonstrate its effectiveness to stakeholders. Performance was evaluated using metrics such as mean absolute error (MAE), mean absolute percentage error (MAPE), PI coverage and weighted interval scores (WIS). The interpretation of these metrics is summarized below. Here we use i = 1, …, N to denote indices for the forecasts under consideration, yᵢ to refer to the observed outcome with index i, qᵢ,α to denote quantile α of the forecasted outcome, and 𝟙{C} as an indicator variable whose value is 1 if condition C is true and zero otherwise.

MAE represents the average absolute value of the difference between the observed value and the predicted point estimate:

    MAE = (1/N) Σᵢ |yᵢ − qᵢ,0.5|.

We use the median forecast when computing this quantity. MAE generally scales with the variable that is being predicted, and therefore one must consider the MAE in context (Hyndman and Koehler, 2006).

MAPE is based on dividing the absolute error by the observed value and converting to a percentage:

    MAPE = (100%/N) Σᵢ |yᵢ − qᵢ,0.5| / yᵢ.

Its scale is independent of the variable being predicted. MAPE can be more volatile than MAE, particularly when the observed value is close to zero (Hyndman and Koehler, 2006). Both MAE and MAPE account for point estimates only and ignore the stated uncertainty of a forecast.

PI coverage is a metric for evaluating uncertainty in which one computes the percentage of PIs at a given confidence level that contain the observed value. For a (1 − α) × 100% predictive interval, it is given by

    coverage = (100%/N) Σᵢ 𝟙{lᵢ ≤ yᵢ ≤ uᵢ},

where lᵢ = qᵢ,α/2 and uᵢ = qᵢ,1−α/2. We focus on the coverage of the 50% and 95% PIs as those intervals are most commonly used in public-facing websites and dashboards and are therefore most likely to be considered by end-users when attempting to interpret forecasts. The coverage for a well-calibrated (1 − α) × 100% predictive interval should be approximately (1 − α) × 100%. Unfortunately, coverage can be inflated by providing excessively wide PIs. In Bracher et al. (2021), the authors propose an alternative metric for evaluating epidemic forecasts called a weighted interval score (WIS) that is strictly proper, meaning that it is optimized only when models provide accurate representations of their own uncertainty (Gneiting and Raftery, 2007). We also report this metric, and base it on the median, 50% and 95% predictive intervals as follows:

    WISᵢ = (1/(K + 1/2)) [ (1/2)|yᵢ − qᵢ,0.5| + Σₖ (αₖ/2) ISαₖ(yᵢ) ],

where K = 2, α₁ = 0.5, α₂ = 0.05, and ISα(yᵢ) denotes the interval score for the (1 − α) × 100% predictive interval, given by

    ISα(yᵢ) = (uᵢ − lᵢ) + (2/α)(lᵢ − yᵢ)𝟙{yᵢ < lᵢ} + (2/α)(yᵢ − uᵢ)𝟙{yᵢ > uᵢ},

with lᵢ = qᵢ,α/2 and uᵢ = qᵢ,1−α/2. Like MAE, WIS scales with the quantity being predicted, with lower scores relative to the observed values indicating better performance. For a more in-depth discussion of these evaluation metrics, see Cramer et al. (2021b).
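A minimal numpy sketch of the WIS calculation, assuming the median plus the 50% and 95% central intervals as in Bracher et al. (2021):

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """IS_alpha: interval width plus penalties of (2/alpha) x distance
    whenever the observation falls outside the interval."""
    return ((upper - lower)
            + (2.0 / alpha) * np.maximum(lower - y, 0.0)
            + (2.0 / alpha) * np.maximum(y - upper, 0.0))

def wis(y, median, intervals):
    """Weighted interval score from the median plus central PIs.

    intervals maps alpha -> (lower, upper); the 50% PI has alpha = 0.5
    and the 95% PI has alpha = 0.05.
    """
    K = len(intervals)
    total = 0.5 * np.abs(y - median)
    for alpha, (lo, hi) in intervals.items():
        total += (alpha / 2.0) * interval_score(y, lo, hi, alpha)
    return total / (K + 0.5)

# Example: observation inside both intervals incurs only width penalties.
score = wis(y=120.0, median=110.0,
            intervals={0.5: (100.0, 125.0), 0.05: (80.0, 150.0)})  # -> 5.2
```

Because the penalty terms vanish when the observation falls inside an interval, excessively wide PIs still pay the width term, which is what makes the score proper.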

Results

Model performance

The performance of the Gecko model was evaluated by comparing both national and state forecasts for confirmed COVID-19 hospital admissions to the observed values over 7, 14, and 21 day horizons. Although the model can produce forecasts over longer horizons, those forecasts are not considered here as the model’s reliance on extrapolation of recent trends renders these long-term forecasts of little practical use. In Fig. 2, we evaluate incident admissions (left) as well as the change in incident admissions (right) from 7 days prior. We find strong agreement between the predicted and observed values and obtain Pearson correlations of 0.974 and 0.382 respectively.
Fig. 2

Comparison of forecasted and observed confirmed COVID-19 hospital admissions by state according to Gecko model. The left panel shows the raw number of admissions and the right panel shows the change in admissions over a 7-day horizon.

In Fig. 3, we highlight the performance by state. We find that 33 of the 51 jurisdictions (the 50 states plus DC) had a median MAPE below 25%. The eighteen jurisdictions with MAPE above 25% tended to be less populous and exhibited lower admissions levels. Except for South Carolina (median MAE: 16.3) and West Virginia (median MAE: 10.0), the MAE for these states was less than 10. We also observed a high level of coverage for the PIs. The coverages for the 95% PIs were close to 95%, with a minimum value of 86% (Michigan). The coverages for the 50% PIs were generally above 50%, indicating that the model overestimates the uncertainty when computing the 50% PI. For national forecasts, we obtained a MAPE of 11.7% (MAE: 756.7), with coverages of 71.4% and 95.2% for the 50% and 95% PIs respectively.
Fig. 3

MAE (top) and MAPE (middle) and coverage (bottom) for 7 day hospital admission forecasts from Gecko model by state. States are sorted by MAPE.


Model comparison

We also compared the performance of Gecko to the models that provided hospital admissions forecasts used in the COVIDhub-ensemble in at least 50% of the weeks between January 11 and May 31, 2021. Two models, IHME-CurveFit (Institute for Health Metrics and Evaluation, 2020) and JHU_IDD (Lemaitre et al., 2021), were excluded by this criterion. The ten eligible models each provided forecasts in at least 95% of those weeks. The results are displayed in Tables 2 (MAE) and 3 (WIS). Here entries represent the averages across all available forecasts for each horizon. For forecasts with a 7-day horizon, we find that Gecko was the top-performing model at the national level according to both MAE and WIS. For state-level forecasts, Gecko was in a virtual tie for first place with the Karlen-pypm model (Karlen, 2020), a mechanistic model based on discrete-time difference equations, with Gecko obtaining slightly lower MAE and Karlen-pypm obtaining slightly lower WIS. For 14-day forecasts, Gecko was in 2nd place for both state and national forecasts, with Karlen-pypm and JHUAPL-BUCKY (Kinsey, 2020), a metapopulation SEIR model, as the respective top performers according to both MAE and WIS. For 21-day forecasts, Gecko ranked lower but remained in the top half of models contributing to the ensemble.
Table 2

Average MAE for confirmed COVID-19 hospital admission forecasts. The ten models listed are those that consistently provided forecasts for inclusion in the COVIDhub-ensemble (bottom row). Displayed values indicate the average MAE over all weeks where forecasts were available. The rank is listed in parentheses.

Model | National 7d | National 14d | National 21d | State 7d | State 14d | State 21d
COVID19Sim-Simulator (MGH Institute for Technology Assessment, 2020) | 2817.3 (10) | 3108.4 (10) | 3771.3 (10) | 73.8 (10) | 76.7 (10) | 84.0 (10)
CU-nochange (Pei and Shaman, 2020) | 1074.3 (6) | 1712.7 (6) | 2702.1 (9) | 28.0 (4) | 40.5 (3) | 61.8 (8)
GT-DeepCOVID (Rodríguez et al., 2020) | 1069.9 (5) | 1745.6 (8) | 2536.3 (7) | 27.8 (3) | 41.2 (6) | 60.1 (7)
JHUAPL-BUCKY (Kinsey, 2020) | 996.0 (4) | 1099.0 (1) | 1354.0 (1) | 45.4 (7) | 50.5 (8) | 58.0 (6)
JHUAPL-GECKO | 756.7 (1) | 1175.7 (2) | 1950.5 (4) | 25.2 (1) | 37.6 (2) | 53.9 (5)
Karlen-pypm (Karlen, 2020) | 986.1 (3) | 1661.6 (5) | 2547.9 (8) | 25.4 (2) | 35.9 (1) | 51.4 (3)
LANL-GrowthRate (Los Alamos National Laboratory, 2020) | 2222.3 (9) | 2149.7 (9) | 1979.7 (5) | 52.2 (8) | 48.3 (7) | 46.7 (1)
MOBS-GLEAM_COVID (Chinazzi et al., 2020) | 877.7 (2) | 1253.4 (3) | 1794.8 (3) | 32.3 (6) | 40.9 (5) | 52.3 (4)
UCLA-SuEIR (UCLA Statistical Machine Learning Lab, 2020) | 1392.6 (8) | 1343.0 (4) | 1508.7 (2) | 66.0 (9) | 63.8 (9) | 62.1 (9)
USC-SI_kJα (Srivastava et al., 2020) | 1107.1 (7) | 1717.4 (7) | 2103.1 (6) | 28.5 (5) | 40.6 (4) | 49.3 (2)
COVIDhub-ensemble (Ray et al., 2020) | 825.3 | 1213.6 | 1755.8 | 22.6 | 30.8 | 41.6
Table 3

Average WIS for hospitalization forecasts between January 11, 2021 and May 31, 2021. The ten models listed are those that consistently provided forecasts for inclusion in the COVIDhub-ensemble (bottom row). Displayed values indicate the average WIS over all weeks where forecasts were available. The rank is listed in parentheses. WIS are calculated using the median and 50% and 95% predictive intervals.

Model | National 7d | National 14d | National 21d | State 7d | State 14d | State 21d
COVID19Sim-Simulator (MGH Institute for Technology Assessment, 2020) | 2342.7 (10) | 2606.8 (10) | 3200.5 (10) | 63.7 (10) | 64.9 (10) | 70.8 (10)
CU-nochange (Pei and Shaman, 2020) | 847.3 (6) | 1149.2 (7) | 1776.5 (9) | 20.3 (4) | 27.9 (5) | 41.9 (8)
GT-DeepCOVID (Rodríguez et al., 2020) | 627.3 (4) | 1094.1 (6) | 1651.9 (7) | 17.8 (3) | 27.0 (4) | 41.7 (7)
JHUAPL-BUCKY (Kinsey, 2020) | 690.3 (5) | 729.4 (1) | 897.1 (1) | 29.7 (7) | 33.0 (7) | 37.6 (5)
JHUAPL-GECKO | 483.6 (1) | 742.7 (2) | 1184.1 (4) | 16.8 (2) | 25.0 (2) | 34.1 (4)
Karlen-pypm (Karlen, 2020) | 561.4 (2) | 938.5 (4) | 1549.0 (6) | 16.5 (1) | 21.0 (1) | 29.2 (1)
LANL-GrowthRate (Los Alamos National Laboratory, 2020) | 1373.7 (9) | 1214.4 (8) | 1139.1 (3) | 34.0 (8) | 32.0 (6) | 31.7 (2)
MOBS-GLEAM_COVID (Chinazzi et al., 2020) | 606.8 (3) | 768.8 (3) | 1023.1 (2) | 20.7 (5) | 25.6 (3) | 32.5 (3)
UCLA-SuEIR (UCLA Statistical Machine Learning Lab, 2020) | 1064.8 (8) | 1004.9 (5) | 1196.8 (5) | 59.4 (9) | 55.7 (9) | 53.3 (9)
USC-SI_kJα (Srivastava et al., 2020) | 887.3 (7) | 1402.3 (9) | 1656.4 (8) | 24.4 (6) | 34.4 (8) | 41.0 (6)
COVIDhub-ensemble (Ray et al., 2020) | 460.5 | 733.9 | 1065.2 | 13.6 | 18.6 | 24.9
When considering whether models are of operational use, it is also important to consider whether they provide insight beyond the status quo. If the current state provides a better estimate of future states than the forecasts produced by a model, then the model has low utility. For this reason, we also evaluate the performance of these models against a simple baseline model that predicts the most recent observed value. For this baseline, we estimate uncertainty by fitting a Gaussian distribution to past residuals relative to these constant predictions for each horizon from 1 to 21 days. Because it assumes no change, this baseline will perform particularly poorly during periods of rapid change. In Fig. 4, we display the percentage of forecasts with a lower WIS than this baseline model.
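A sketch of this lagged baseline (the function name, residual window, and synthetic data are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np
from statistics import NormalDist

def baseline_forecast(series: np.ndarray, horizon: int):
    """Persistence baseline: predict the last observed value at every
    horizon, with Gaussian uncertainty fitted to the historical
    h-step-ahead residuals of that constant prediction."""
    last = float(series[-1])
    # Errors a "no change" forecast would have made h steps ahead.
    resid = series[horizon:] - series[:-horizon]
    dist = NormalDist(mu=0.0, sigma=float(np.std(resid)))
    lo95 = last + dist.inv_cdf(0.025)
    hi95 = last + dist.inv_cdf(0.975)
    return last, (lo95, hi95)          # point forecast and 95% PI

# Synthetic admissions series with a weekly cycle and noise.
rng = np.random.default_rng(1)
t = np.arange(120)
y = 300 + 40 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 10, t.size)
point, (lo95, hi95) = baseline_forecast(y, horizon=7)
```

Because the residual spread grows with the horizon, the baseline's intervals widen automatically for longer lead times even though its point forecast never changes.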
Fig. 4

Percentage of forecasts that outperform a lagged baseline model according to WIS. Horizons of 5, 6, 12, 13, 19, and 20 days correspond to weekends which generally have noticeably lower hospital admission totals than the Monday forecast date leading to worse baseline performance for those horizons.

We find that many of the hospital forecasting models outperform this baseline for horizons greater than 7 days. The ensemble model in particular outperforms the baseline more than 50% of the time for both state and national forecasts across all horizons except 1 day (for state forecasts). Gecko outperforms this baseline more than 50% of the time for state forecasts across all horizons except 7 days, and close to 50% of the time for national forecasts across all horizons.

Discussion

The strong short-term forecasting performance of Gecko was unexpected, particularly in light of its simplicity relative to other models within the ensemble. Most are based on compartmental SEIR models, which describe the dynamics of populations containing susceptible, exposed, infected and recovered/immune individuals, or agent-based models, which describe the behavior of individuals within these populations. These models include a variety of parameters describing contact rates, incubation periods, recovery rates, mortality rates and hospitalization rates. For a novel pathogen like SARS-CoV-2, there is a great deal of uncertainty about these epidemiological parameters. Often, their values must be estimated using limited data (Cramer et al., 2021b). These estimates typically rely on a variety of different data sources that describe confirmed cases and deaths, and leverage alternative data streams tracking positive tests as well as human mobility patterns. In contrast, Gecko is based exclusively on the confirmed admissions time-series for each state and includes only four fitted parameters for each, and yet the forecasts produced by Gecko are among the most accurate within a 14-day horizon. One possible explanation for this observation is that by fitting to hospital admissions curves directly using a statistical approach with few parameters, Gecko is able to respond to behavioral changes and shifting dynamics more quickly than mechanistic models, which are often constrained by the assumption that parameters are constant or vary slowly.

One of the strengths of Gecko's non-mechanistic modeling approach is its versatility. It can be applied to alternative time-series with few modifications. For example, it was also used for forecasting five hospitalization metrics not considered by other models within the forecast hub: suspected COVID-19 hospital admissions, staffed inpatient and ICU beds used, as well as staffed inpatient and ICU beds used by COVID-19 patients. Starting in December 2020, Gecko was used to produce weekly reports describing the short-term trajectories of these hospitalization metrics at the national, state and hospital referral region (HRR) levels. These forecasts were distributed to public health officials and were used to identify possible hotspots in need of support and additional resources. These forecasts were also incorporated into the Project Greenlight dashboard, a tool created by HHS to provide situational awareness of critical capacity indicators to public health and hospital officials.

While Gecko's performance compares favorably to alternative forecasting models, it has a number of limitations. Mechanistic models predict peaks and troughs, although the timing and magnitude of these predictions are highly sensitive to small perturbations and subject to a high degree of uncertainty (Daunizeau et al., 2020, Alberti and Faranda, 2020). Gecko cannot anticipate changes in the current trend and therefore would not be useful for predicting epidemic peaks and troughs. This limits its utility for long-term forecasting. Gecko also does not base its forecasts on parameters with clear epidemiological interpretations. This means that the various outcomes that fall within the predictive intervals cannot be attributed to explainable scenarios. This limits Gecko's utility for evaluating the impact of policies and interventions. In addition, Gecko uses a Gaussian distribution when describing noise, despite the fact that hospital admissions are a non-negative discrete variable. This approximation seems to work well when hospital admissions are relatively large, but breaks down when the number of daily admissions is close to zero. This limitation can explain in part why the model performs better on states with large populations as well as why the 50% predictive intervals appear to be too wide (see Fig. 3). An alternative approach would be to use a Poisson or negative binomial distribution when accounting for process and measurement noise, but such models are difficult to fit in practice.

Despite these limitations, Gecko's success in these forecasting tasks suggests that statistical time-series models can provide a valuable complement to traditional epidemic models for hospital forecasting, particularly when estimating short-term trajectories. Their simplicity allows them to be trained quickly, with less data and without knowledge of the epidemiological parameters. Our analysis also suggests that many of the top-performing models, including Karlen-pypm, USC-SI_kJα, JHUAPL-BUCKY and Gecko, were created by modelers with expertise outside of the public health domain. Their strong performance may be related to their reliance on sophisticated optimization algorithms and methods from statistics and machine learning rather than domain knowledge. The success of these models suggests that a diverse set of modeling approaches is essential when creating ensemble models and highlights a need for further investigation into the factors that are responsible for their success.

CRediT authorship contribution statement

Mark J. Panaggio: Conceptualization, Methodology, Software, Validation, Visualization, Formal Analysis, Writing – original draft, Writing – review & editing. Kaitlin Rainwater-Lovett: Conceptualization, Writing – review & editing. Paul J. Nicholas: Supervision, Writing – review & editing. Mike Fang: Validation, Writing – review & editing. Hyunseung Bang: Validation, Writing – review & editing. Jeffrey Freeman: Project administration, Funding acquisition. Elisha Peterson: Conceptualization, Supervision, Writing – review & editing. Samuel Imbriale: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
