
Looking back on forward-looking COVID models.

Paul Chong, Byung-Jun Yoon, Debbie Lai, Michael Carlson, Jarone Lee, Shuhan He.

Abstract

Covid Act Now (CAN) developed an epidemiological model that takes various non-pharmaceutical interventions (NPIs) into account and predicts viral spread and subsequent health outcomes. In this study, the projections of the model developed by CAN were back-tested against real-world data, and it was found that the model consistently overestimated hospitalizations and deaths by 25%-100% and 70%-170%, respectively, due in part to an underestimation of the efficacy of NPIs. Other COVID models were also back-tested against historical data, and it was found that all models generally captured the potential magnitude and directionality of the pandemic in the short term. There are limitations to epidemiological models, but understanding these limitations enables these models to be utilized as tools for data-driven decision-making in viral outbreaks. Further, it can be valuable to have multiple, independently developed models to mitigate the inaccuracies of, or correct for the faulty assumptions made by, any particular model.
© 2022 The Authors.


Keywords:  COVID-19; COVID-19 SEIR; COVID-19 epidemiological model; COVID-19 model; COVID-19 non-pharmaceutical interventions; COVID-19 vaccination; SEIR model; data science; epidemiological model

Year:  2022        PMID: 35845843      PMCID: PMC9278499          DOI: 10.1016/j.patter.2022.100492

Source DB:  PubMed          Journal:  Patterns (N Y)        ISSN: 2666-3899


Introduction

Epidemiological models have been used since at least 1927 to prevent further disease spread, predict the behavior of disease, and inform control strategies. The advent of the unprecedented COVID-19 pandemic has propelled epidemiological models into the public and political consciousness. The outputs of these models have emerged as crucial signals for decision-makers in policy and public health, with calls for government-mandated non-pharmaceutical interventions (NPIs), such as stay-at-home orders, to be derived from data-driven thresholds, such as case numbers and transmission rates. To date, a number of models have been developed to forecast deaths and hospitalizations given current COVID trajectories. Many of these models have been published or posted online, and they have been referenced by policy makers and the press in discussions of which NPIs are most appropriate at the state and local levels. However, no studies have evaluated the performance of these epidemiological models against historical data, nor have any studies investigated their relative performances. Covid Act Now (CAN) is an independent 501(c)(3) non-profit organization. Since March 20th, 2020, the model developed by CAN has provided COVID-19 case and mortality projections for all 50 US states. This model was developed with an impact-oriented approach, taking into account factors such as usability, accessibility, universality, adaptability, and actionability as outlined by Shah, Lai, and Wang, with the intent that it inform NPI decisions and influence behavior, much as meteorological models do. The aggregation of data in the weeks since its launch has provided an opportunity to assess the efficacy of the model built by CAN and to identify areas of improvement.
The objectives in this manuscript are as follows: to detail the mechanisms of the model developed by CAN, which formulates hospitalization and death projections given four different scenarios of policy interventions and public responses; to retroactively back-test the predictions made by the model built by CAN against actual data to determine degrees of error; to retroactively back-test the predictions of other models against actual data to determine degrees of error.

Results

A model capable of forecasting the differential possible trajectories of the COVID-19 outbreak was developed given different policy interventions and public behaviors. Additional information is provided in supplemental information. This study complies with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) statement.

Model

The model by CAN was adapted from a pre-existing model by Hill et al. (pictured in Figure S1). The model developed by CAN predicted the progression of COVID-19 in a given population by categorizing all individuals into one of four states of the Susceptible, Exposed, Infected, and Recovered (SEIR) model:

Susceptible (S): Since immunity is not hereditary, SEIR models assume that all individuals in a population are susceptible to the disease at birth, so all individuals begin in the "susceptible" state except for the already-infected individual who introduces COVID-19 into the population.

Exposed (E): Individuals move to the "exposed" state upon coming into contact with COVID-19. Exposed individuals have been infected, but they are not yet capable of infecting others, nor do they have symptoms. The disease is assumed to be transmitted by horizontal incidence; i.e., a susceptible individual becomes infected through contact with infectious individuals. This contact may be direct (touching or biting) or indirect (air, cough, or sneeze).

Infected (I): Individuals in the "infected" state are further sub-categorized into the following sub-states:
Infected 1 (asymptomatic + mild): This state encapsulates mild cases. After 7 days, 4% of the individuals in this state require hospitalization and progress to the sub-state "infected 2"; the remaining 96% progress to the "recovered" state.
Infected 2 (hospitalized): This state encapsulates hospitalized cases requiring non-ICU treatment. After 7 days, 30% of the individuals in this state require ICU care and/or ventilation and progress to the sub-state "infected 3"; the remaining 70% progress to the "recovered" state.
Infected 3 (ICU + ventilator): This state encapsulates critical cases requiring ICU and/or ventilator treatment. The model by CAN assumes all deaths must first pass through this category. After 7 days, 40% of the individuals in this state progress to the "deceased" state; the remaining 60% progress to the "recovered" state.

Recovered (R): Individuals who recover from infection move to the "recovered" state. It was assumed that individuals in this state are immune to further infection, though knowledge of immunity remains uncertain.

Deceased (D): Individuals who have died from the disease; all arrive from the ICU sub-state.

Additional information on the model (scenario definitions, parameters, inputs, etc.) can be found in the experimental procedures section.
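The compartment flow above can be expressed as a minimal discrete-time simulation. This is an illustrative sketch only: the 4%/30%/40% progression fractions and 7-day residence times are those stated in the text, while the transmission rate (beta), incubation rate (sigma), and population size are assumed values, not CAN's fitted parameters.

```python
# Progression fractions from the text: 4% of mild cases are hospitalized,
# 30% of hospitalized cases need ICU/ventilation, 40% of ICU cases die.
P_HOSP, P_ICU, P_DEATH = 0.04, 0.30, 0.40
STAY = 7  # days spent in each infected sub-state (per the text)

def step(state, beta=0.25, sigma=1 / 3):
    """Advance S, E, I1 (mild), I2 (hospital), I3 (ICU), R, D by one day.
    beta and sigma are illustrative assumptions."""
    n = sum(state.values())
    infectious = state["I1"] + state["I2"] + state["I3"]
    new_exposed = beta * state["S"] * infectious / n     # horizontal incidence
    new_infectious = sigma * state["E"]                  # end of incubation
    out1, out2, out3 = state["I1"] / STAY, state["I2"] / STAY, state["I3"] / STAY
    return {
        "S": state["S"] - new_exposed,
        "E": state["E"] + new_exposed - new_infectious,
        "I1": state["I1"] + new_infectious - out1,
        "I2": state["I2"] + P_HOSP * out1 - out2,
        "I3": state["I3"] + P_ICU * out2 - out3,
        "R": state["R"] + (1 - P_HOSP) * out1 + (1 - P_ICU) * out2
             + (1 - P_DEATH) * out3,
        "D": state["D"] + P_DEATH * out3,  # all deaths pass through ICU
    }

# One already-infected individual introduces the disease into the population.
state = {"S": 999_999.0, "E": 0.0, "I1": 1.0, "I2": 0.0, "I3": 0.0, "R": 0.0, "D": 0.0}
for _ in range(120):
    state = step(state)
```

Because every outflow from one compartment is an inflow to another, the total population is conserved at each step, which is a useful sanity check on any such implementation.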

Back-testing

There were 21 iterations of the model developed by CAN released between March 19th and June 7th, 2020, each producing projections for all 50 states. Back-testing analysis thus made the following considerations:

Model version: which version of the model by CAN is to be back-tested?
Region: which region is back-tested? The model built by CAN produced projections for all 50 US states and their respective counties.
Scenario: which scenario's projections are back-tested? The model by CAN produced projections for four scenarios: "No Action," "Lax Shelter in Place," "Strict Shelter in Place," and "Projections Based on Current Trends." In particular, accurate back-testing should compare the model scenario with the actual scenario in a particular region.
Time period: what time period is back-tested against?

To limit the scope of this study, it was decided to test the four key versions of the model representing the most significant changes, released on March 19th, March 31st, April 9th, and April 14th, 2020, and to test California, which was the first state in the nation to order all residents to stay at home (beginning March 19th) and which began limited reopening on May 12th, starting with restaurants and shopping centers in counties that met certain criteria. These projections were compared against actual data from each model's launch date to June 3rd, 2020, and then against actual data for the 14-day period following each model's launch date. State-level projections made by the model by CAN were back-tested against actual data for the state of California at 3-week intervals starting from March 5th, March 25th, April 14th, and May 4th, 2020. See Table 1 for the results of the back-testing.
Table 1

Performance of successive CAN models

Model | RMSE for hospitalizations | RMSE for deaths | 2-week RMSE for hospitalizations | 2-week RMSE for deaths
3.19  | 44.65 | 72.55 | 27.98 | 25.22
3.31  | 85.93 | 51.22 | 54.90 | 24.54
4.09  | 73.07 | 51.74 | 82.91 | 47.47
4.14  | 10.30 | 17.61 | 17.15 | 24.35

Successive iterations of the CAN epidemiological model were evaluated for performance, comparing both consecutive models and each model's 2-week performance. The analysis shows improved performance with newer models, evidenced by the decreasing RMSE (root-mean-square error) of hospitalizations and deaths across successive iterations. RMSE compares predicted values with known values, with smaller RMSE values indicating closer agreement between predicted and observed values. RMSE was calculated on each model iteration's predictions over 3-week intervals starting from March 5th, March 25th, April 14th, and May 4th, respectively.
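The RMSE used throughout these tables can be computed as follows (a minimal sketch; the projection and observation values shown are hypothetical, not the study's data):

```python
def rmse(predicted, observed):
    """Root-mean-square error between model predictions and observed data.
    Smaller values indicate closer agreement."""
    n = len(predicted)
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5

# Hypothetical daily hospitalization projections vs. reported counts.
projected = [120.0, 135.0, 150.0]
reported = [110.0, 130.0, 160.0]
error = rmse(projected, reported)
```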

In keeping with the aforementioned considerations, retrospective back-testing of other epidemiological models was performed to evaluate the performance of the model by CAN against other available models. Back-testing was performed for the state of California for uniformity and across multiple time spans to evaluate the consistency of the models over time. The following models were back-tested against historical data and compared to the model built by CAN: Institute for Health Metrics and Evaluation (IHME), Massachusetts Institute of Technology, UCLA, University of Texas, and Youyang Gu. Only the IHME model predicted hospitalizations along with deaths, so two separate back-testing analyses were done: one between the model by CAN and the IHME model (for hospitalizations) and another between the model developed by CAN and all other models (for deaths). The former comparison was done for various iterations of both models across March 4th to July 19th, 2020, and the latter comparison was done across five different time spans oriented on the models' respective performances within their particular iterations.
The results of the back-testing of COVID-19 hospitalizations for the state of California between the model built by CAN and IHME models are given in Table 2. See Figure 1 for relative performances of the various epidemiological models from June 16th to July 13th, 2020.
Table 2

Comparison of model predictions of California COVID-19 hospitalizations

CAN iteration | RMSE of predictions | IHME iteration | RMSE of predictions
05.20 | 1,528.02 | 05.20 | 1,764.53
05.29 | 1,518.35 | 05.29 | 2,241.23
06.06 | 1,326.82 | 06.06 | 2,667.21
06.15 | 1,145.39 | 06.13 | 2,633.40
06.27 | 4,299.12 | 06.27 | 2,791.76

The CAN and IHME models were the only models to predict COVID-19 hospitalizations, and their respective performances are displayed above for comparison purposes. The superior performance of the CAN model can be seen in the lower RMSEs (root-mean-square errors) compared with that of IHME. RMSE compares a predicted value and a known value, with smaller RMSE values indicating closeness of predicted and observed values. Predictions of successive iterations of each model were evaluated against historical data from March 4th to July 19th, 2020.

Figure 1

Performance of models’ California COVID-19 death predictions

Pictured are the performances of predictions by epidemiological models evaluated against historical data for COVID-19 deaths in the state of California from June 16th to July 13th, 2020. Dates in parentheses to the right of models’ abbreviations in the figure legend correspond to the date of the model’s predictions in the year 2020.

Statistical analysis of the various models' predictions for COVID-19 deaths in the state of California between May 19th and July 19th, 2020, was performed. The model by CAN was found to have a higher root-mean-square error (RMSE) for deaths than the remainder of the models, while none of the other models' RMSEs for deaths differed significantly from one another (p < 0.05); see the visualization in Figure 2.
Figure 2

RMSEs of models’ California COVID-19 death predictions

Pictured are the RMSEs (root mean square error) of epidemiological models’ predictions of COVID-19 deaths from May 19th to July 19th, 2020. RMSE compares a predicted value and a known value, with smaller RMSE values indicating closeness of predicted and observed values. The x axis aggregates the five back-testing periods of evaluation of the models in addition to the overall average RMSE of each model for COVID-19 deaths.


Discussion

Implications

Overall, the model developed by CAN appears to have overestimated hospitalizations and deaths. This is due in part to underestimation of the efficacy of NPIs. Hospitalization projections were made with a certain pessimism, and it has since been observed that shelter in place can be highly effective, more so than initially anticipated: the shelter-in-place intervention reduced infection growth rates by as much as 70%. The comparison of other epidemiological models to the model developed by CAN showed a consistent overestimation of both hospitalizations and deaths by the model by CAN relative to the other models. The model developed by Youyang Gu was shown to be both the most accurate and the most precise model with regard to predicting deaths from COVID-19. Though other models more accurately predicted the real-world death counts, it is worth noting that, of the two models forecasting both hospitalizations and deaths, the model built by CAN was the more accurate (over the IHME model), and it provided much more extensive predictions in terms of time than any of the compared models. The longer projection period offered by the model developed by CAN was not accompanied by comparable accuracy in predicting deaths, though the model was more accurate than the IHME model in predicting hospitalizations while also providing longer projection periods. Although the models evaluated besides the model by CAN did not vary significantly from one another and were largely consistent in prediction performance, certain variances in performance (which can be observed in Figure 2) can be explained by the reality of inconsistency and variation in viral propagation or case reporting. This reveals a potential advantage of models that provide shorter-length but more concentrated and accurate viral propagation projections over the model built by CAN, highlighting the merit of sharing the mechanisms of the aforementioned models as is done with the model by CAN in this paper.
COVID-19 has shown the utility of epidemiological modeling for large-scale, unanticipated outbreaks; such modeling is extremely useful for predicting the magnitude and directionality of disease, so that policy makers and healthcare institutions can better prepare and respond. The data from these models continue to inform federal, state, and local responses to the ongoing COVID-19 pandemic.

Limitations

To date, COVID-19 has infected only a fraction of the world’s population. Variables in the model by CAN will almost certainly change over time. Due to the absence of historical precedent, infection rates in the case of interventions are best guesses informed by data. As well, this model assumes that individuals who are infected cannot be infected again, though as the pandemic progresses, the validity of this assumption is coming into question. Many of the data inputs, e.g., hospitalization rate and fatality rate, are based on early estimates that are likely to be imperfect and will likely change. Data sources that were utilized may be unreliable in unexpected and unknown ways. Given the abundance of challenges and limitations encountered in the work described, an additional, non-comprehensive list of limitations can be found in Table S6. All users should err on the side of caution and interpret the results of the model conservatively. The model by CAN and other epidemiological models do not take vaccinations into account and are thus limited in utility once vaccines are developed and disseminated; however, such models, including the model developed by CAN, have merit as valuable tools in future viral outbreaks prior to vaccine development, when NPIs predominate as the vital means of controlling viral spread and promoting public health.

Experimental procedures

Resource availability

Lead contact

Shuhan He, MD, she@mgh.harvard.edu

Materials availability

There were no new unique reagents or materials generated in this study.

Inferred parameters

Rather than use identical values across all geographies, the model by CAN was informed with actual local data as much as possible. The developed SEIR model and corresponding intervention model are fit to available incident hospitalization, mortality, and case rates using a multivariate maximum likelihood formalism. The intervention model assumes an unmitigated growth rate until a time t_break, at which point the reproduction number falls, over the course of 2 weeks, to a final value Reff = R0 ∗ epsilon, where epsilon is a percentage reduction. Inferred parameters are listed in Table S1, and for each geography, seven model parameters were fit (Table S2); input parameters are also listed in the supplemental information in Table S3.
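The intervention model's behavior can be sketched as a function of time. The ramp's functional form is not specified in the text, so a linear fall over the 2-week window is an assumption here; R0, epsilon, and t_break values are illustrative, not fitted parameters.

```python
def r_effective(t, r0, epsilon, t_break, ramp_days=14):
    """Reproduction number under the intervention model described above:
    unmitigated R0 until t_break, then a fall over ~2 weeks to a final
    value Reff = R0 * epsilon (epsilon is the fitted reduction factor).
    A linear ramp is assumed; the actual functional form is unspecified."""
    if t <= t_break:
        return r0
    frac = min((t - t_break) / ramp_days, 1.0)  # progress through the ramp
    return r0 * (1 - frac * (1 - epsilon))

# Illustrative values: R0 = 3.0 falling to 30% of its value after the ramp.
before = r_effective(5, 3.0, 0.3, t_break=10)    # unmitigated
after = r_effective(30, 3.0, 0.3, t_break=10)    # fully ramped: 3.0 * 0.3
```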

Interventions and responses

Four scenarios were modeled: (1) a baseline scenario in which no action is taken, (2) a Wuhan-style scenario with a contact rate reduction of 92%, based upon actual outcomes in Wuhan, (3) a “strict” shelter-in-place scenario with a contact rate reduction of 70%, and (4) a “lax” shelter-in-place scenario with a contact rate reduction of 50%. It was not within the scope of the model developers at the time to model additional scenarios, due to considerations of simplicity and usability, though this scope may be expanded in the future. Each of these scenarios was modeled assuming measures and behaviors are held consistently for 12 consecutive weeks. A full list of definitions can be found in the supplemental information in Table S4.
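The four scenarios reduce to a small table of contact-rate reductions, which might be encoded as follows (the reduction percentages and 12-week duration come from the text; the function and baseline rate are illustrative):

```python
# Contact-rate reductions for the four modeled scenarios, as stated above.
SCENARIOS = {
    "no_action": 0.00,       # (1) baseline, no intervention
    "lax_shelter": 0.50,     # (4) lax shelter in place
    "strict_shelter": 0.70,  # (3) strict shelter in place
    "wuhan_style": 0.92,     # (2) Wuhan-observed contact reduction
}
DURATION_WEEKS = 12  # measures held consistently for 12 consecutive weeks

def effective_contact_rate(base_rate, scenario):
    """Scale a baseline daily contact rate by the scenario's reduction."""
    return base_rate * (1 - SCENARIOS[scenario])
```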

Effective reproduction number R(t)

The method for inferring R(t) was an adaptation of a pre-existing methodology by Systrom and Vladeck. Several modifications were made, including incorporating mortality data to alleviate some of the systematic uncertainties associated with new COVID-19 case tracking. The basic method assumes that the number of new cases and mortalities is modeled by a Poisson process whose rate is driven by an inferred, time-varying underlying rate. The steps for inferring this are as follows:

1. Noise in case and mortality data is smoothed using a Gaussian kernel of width 5 days (out to a 14-day window), a trade-off between providing leading information and reducing spurious signals such as clearance of testing backlogs.
2. A prior distribution is derived using the posterior estimate for R(t-1), where the initial prior at t0 is given by a Gamma(3) distribution.
3. A Bayesian update rule is applied at each new time step: a serial period of 6 days is assumed, and the likelihood on the Poisson rate is multiplied by the previous day's posterior.
4. An additional Gaussian process prior is applied on R_t so that day-over-day drift is penalized with an SD of 0.05. This further reduces day-to-day drift and can be physically justified. Results are weakly sensitive to this choice, but clear artifacts are present with larger values.
5. 90% CIs are computed using the resulting posterior estimates.

Reported numbers of new cases, hospitalizations, and mortalities all suffer from systematic lags as well as lags from the disease itself. This lag is accounted for by matching the curvature of each source's R(t) sequence against new cases. Specifically, the time lag in [–21, +5] days that maximizes the cross-correlation (Pearson R) of the first derivatives is identified using the most recent 30 days.
This leads to a distribution of lags across states between mortalities and cases, suggesting strongly that, for most states, cases and deaths have approximately the same level of “indicator lag,” where values based on the inferred lag are shifted. The distinct fit for each state and county is utilized to calculate the composite indicator using deaths and cases only, as hospitalization reporting has not yet stabilized in many locations. Increases in test capacity are accounted for by simply rescaling the number of new cases by 1/(new tests), where new tests are smoothed by the same process as cases and deaths.
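The smoothing and sequential Bayesian update steps described above can be sketched on a discrete grid of candidate R values. This is a simplified illustration only: the kernel width, window, serial interval, drift SD, and Gamma(3) prior come from the text, while the grid, the exponential case-growth relation, and all other details are assumptions and do not reproduce the production model's exact likelihood, lag handling, or CI computation.

```python
import numpy as np

SERIAL = 6          # serial period of 6 days (per the text)
SIGMA_DRIFT = 0.05  # SD of the day-over-day drift penalty on R(t)
R_GRID = np.linspace(0.1, 6.0, 300)  # assumed discretization of R

def smooth(counts, width=5, window=14):
    """Gaussian-kernel smoothing of noisy daily counts
    (kernel width 5 days, out to a 14-day window)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    kern = np.exp(-0.5 * (offsets / width) ** 2)
    kern /= kern.sum()
    padded = np.pad(np.asarray(counts, float), half, mode="edge")
    return np.convolve(padded, kern, mode="valid")

def infer_rt(cases):
    """Sequential Bayesian R(t) sketch: each day, a Poisson likelihood for
    the observed count multiplies the previous day's posterior, which is
    first broadened by a Gaussian drift prior of SD 0.05."""
    sm = smooth(cases)
    dr = R_GRID[1] - R_GRID[0]
    drift = np.exp(-0.5 * ((np.arange(-10, 11) * dr) / SIGMA_DRIFT) ** 2)
    drift /= drift.sum()
    prior = R_GRID ** 2 * np.exp(-R_GRID)  # Gamma(shape=3) initial prior
    prior /= prior.sum()
    estimates = []
    for t in range(1, len(sm)):
        prior = np.convolve(prior, drift, mode="same")  # allow slow drift
        # expected count given R, via the standard exponential-growth relation
        lam = np.maximum(sm[t - 1] * np.exp((R_GRID - 1) / SERIAL), 1e-9)
        log_lik = sm[t] * np.log(lam) - lam  # Poisson log-likelihood (const. dropped)
        post = np.exp(log_lik - log_lik.max()) * prior
        prior = post / post.sum()
        estimates.append(R_GRID[np.argmax(prior)])
    return np.array(estimates)
```

On a synthetic case series growing ~10% per day, the posterior mode settles well above R = 1, as expected for an expanding outbreak.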

Data inputs

The full list of data inputs utilized can be found in Table S5. In selecting data inputs, the model development team prioritized the following:

Availability: sources that make the required data available at the required granularity are utilized. Given the novel nature of COVID-19, there is a limited selection of such data. When possible, the model developed by CAN is validated against multiple sources.
Authoritativeness: data sources that were credible and transparent were prioritized. The model development team conducts quality assurance.
Timeliness: given the fast-changing nature of COVID-19, data sources that are consistently updated were prioritized to most closely capture the current state of the pandemic.
Openness: where possible, open-source data sources were utilized.
References (12 in total)

1.  Contributions to the mathematical theory of epidemics--I. 1927.

Authors:  W O Kermack; A G McKendrick
Journal:  Bull Math Biol       Date:  1991       Impact factor: 1.758

2.  Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us.

Authors:  Inga Holmdahl; Caroline Buckee
Journal:  N Engl J Med       Date:  2020-05-15       Impact factor: 91.245

Review 3.  COVID-19 pandemic: from origins to outcomes. A comprehensive review of viral pathogenesis, clinical manifestations, diagnostic evaluation, and management.

Authors:  RohanKumar Ochani; Ameema Asad; Farah Yasmin; Shehryar Shaikh; Hiba Khalid; Simran Batra; Muhammad Rizwan Sohail; Syed Faisal Mahmood; Rajkumar Ochani; Mohammad Hussham Arshad; Arjan Kumar; Salim Surani
Journal:  Infez Med       Date:  2021-03-01

Review 4.  Guidelines for Accurate and Transparent Health Estimates Reporting: the GATHER statement.

Authors:  Gretchen A Stevens; Leontine Alkema; Robert E Black; J Ties Boerma; Gary S Collins; Majid Ezzati; John T Grove; Daniel R Hogan; Margaret C Hogan; Richard Horton; Joy E Lawn; Ana Marušić; Colin D Mathers; Christopher J L Murray; Igor Rudan; Joshua A Salomon; Paul J Simpson; Theo Vos; Vivian Welch
Journal:  Lancet       Date:  2016-06-28       Impact factor: 79.321

5.  Estimates of the severity of coronavirus disease 2019: a model-based analysis.

Authors:  Robert Verity; Lucy C Okell; Ilaria Dorigatti; Peter Winskill; Charles Whittaker; Natsuko Imai; Gina Cuomo-Dannenburg; Hayley Thompson; Patrick G T Walker; Han Fu; Amy Dighe; Jamie T Griffin; Marc Baguelin; Sangeeta Bhatia; Adhiratha Boonyasiri; Anne Cori; Zulma Cucunubá; Rich FitzJohn; Katy Gaythorpe; Will Green; Arran Hamlet; Wes Hinsley; Daniel Laydon; Gemma Nedjati-Gilani; Steven Riley; Sabine van Elsland; Erik Volz; Haowei Wang; Yuanrong Wang; Xiaoyue Xi; Christl A Donnelly; Azra C Ghani; Neil M Ferguson
Journal:  Lancet Infect Dis       Date:  2020-03-30       Impact factor: 25.071

Review 6.  Diagnosis of COVID-19 for controlling the pandemic: A review of the state-of-the-art.

Authors:  Nastaran Taleghani; Fariborz Taghipour
Journal:  Biosens Bioelectron       Date:  2020-11-27       Impact factor: 10.618

7.  The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study.

Authors:  Kiesha Prem; Yang Liu; Timothy W Russell; Adam J Kucharski; Rosalind M Eggo; Nicholas Davies; Mark Jit; Petra Klepac
Journal:  Lancet Public Health       Date:  2020-03-25

8.  Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts.

Authors:  Joel Hellewell; Sam Abbott; Amy Gimma; Nikos I Bosse; Christopher I Jarvis; Timothy W Russell; James D Munday; Adam J Kucharski; W John Edmunds; Sebastian Funk; Rosalind M Eggo
Journal:  Lancet Glob Health       Date:  2020-02-28       Impact factor: 26.763

9.  T cell immunity to SARS-CoV-2 following natural infection and vaccination.

Authors:  Anthony T DiPiazza; Barney S Graham; Tracy J Ruckwardt
Journal:  Biochem Biophys Res Commun       Date:  2020-10-23       Impact factor: 3.575

