Literature DB >> 35923319

Projecting COVID-19 Cases and Subsequent Hospital Burden in Ohio.

Wasiur R Khuda Bukhsh, Caleb Deen Bastian, Matthew Wascher, Colin Klaus, Saumya Yashmohini Sahai, Mark Weir, Eben Kenah, Elisabeth Root, Joseph H Tien, Grzegorz Rempala.

Abstract

As the Coronavirus 2019 (COVID-19) disease started to spread rapidly in the state of Ohio, the Ecology, Epidemiology and Population Health (EEPH) program within the Infectious Diseases Institute (IDI) at the Ohio State University (OSU) took the initiative to offer epidemic modeling and decision analytics support to the Ohio Department of Health (ODH). This paper describes the methodology used by the OSU/IDI response modeling team to predict statewide cases of new infections as well as potential hospital burden in the state. The methodology has two components: 1) A Dynamic Survival Analysis (DSA)-based statistical method to perform parameter inference, statewide prediction and uncertainty quantification. 2) A geographic component that down-projects statewide predicted counts to potential hospital burden across the state. We demonstrate the overall methodology with publicly available data. A Python implementation of the methodology has been made available publicly. Highlights: We present a novel statistical approach called Dynamic Survival Analysis (DSA) to model an epidemic curve with incomplete data. The DSA approach is advantageous over standard statistical methods primarily because it does not require prior knowledge of the size of the susceptible population, the overall prevalence of the disease, and also the shape of the epidemic curve.The principal motivation behind the study was to obtain predictions of case counts of COVID-19 and the resulting hospital burden in the state of Ohio during the early phase of the pandemic.The proposed methodology was applied to the COVID-19 incidence data in the state of Ohio to support the Ohio Department of Health (ODH) and the Ohio Hospital Association (OHA) with predictions of hospital burden in each of the Hospital Catchment Areas (HCAs) of the state.

Entities: Chemical

Year: 2022 PMID： 35923319 PMCID： PMC9347277 DOI： 10.1101/2022.07.27.22278117

Source DB: PubMed Journal: medRxiv

Introduction

The coronavirus 2019 (COVID-19) disease has resulted in 267 million confirmed cases and 5 million deaths reported globally as of 8 December 2021 [10]. As evidenced by epidemics in many countries, particularly in Italy and Spain, COVID-19 patient case loads have the potential to overwhelm healthcare systems [14]. The Infectious Diseases Institute (IDI) at the Ohio State University working in conjunction with the College of Public Health (CPH), the Department of Mathematics, and the Sustainability Institute (SI) established a working relationship with the Ohio Department of Health (ODH) to act as a service to the State, beginning in 2018. Based on this initial collaborative relationship, the Ecology Epidemiology and Population Health (EEPH) program within IDI took the initiative to offer epidemic modeling and decision analytics support to ODH preparatory planning and response to the COVID-19 pandemic, specifically the Ohio epidemic. Just as in any computational modeling effort, there are many approaches and methods to choose from to model emerging epidemics. Different methods are suited to different situations in terms of data, scenarios, conditions, and urgency. This is no different for modeling the COVID-19 pandemic, either nationally or for the state of Ohio. Modeling approaches include: compartmental models [11, 28, 35], statistical models [16], and agent-based simulations [13]. For the state of Ohio, the OSU-IDI group has approached the modeling challenge on two fronts by developing: 1) projected statewide estimates of a time series of COVID-19 incidence, and 2) a geographic component that transforms the output of the statewide model to hospital burden by county or "hospital catchment" area.

Methods

Generalization of Methods Used

The predictive statewide model for Ohio comprises a dynamic network model, where network edges (contacts between nodes in the network) can interact with each other [24], but it has three key distinguishing features: The model considers a dynamic network where the edges can be deactivated over time—supporting social distancing impacts more accurately. The law of large numbers yields a set of differential equations describing the disease process on a large network [17] without requiring simulation methods. The solutions of these differential equations can be used to estimate model parameters using a principled statistical approach based on survival analysis. We write an explicit likelihood for the parameters given data on times of illness onset [19]. Critically, this allows for a more accurate quantification of the uncertainty in the model predictions. Because of these features, the DSA approach retains the tractability of an analytic model while incorporating complex human networks to better represent social interactions and distancing. The projected statewide model takes data on illness onset dates of confirmed cases as input and produces estimates of COVID-19 incidence (i.e., new cases) in Ohio over time. This output is not age-stratified, but the age distribution of the new cases is assumed to match the age distribution of confirmed cases when projecting the number of hospitalizations, which occurs in the next step of the model. This is done separately for illness severity and hospitalization since these rates vary for COVID-19 patients based on age and comorbidity stratification [31, 33]. Rather, the projected statewide model provides an estimate of the number of new and cumulative cases over time across the entirety of Ohio. Estimates of the number of hospitalized COVID-19 cases are derived in the geographic component of the model. Because illness severity and the risk of hospitalization for COVID-19 patients vary according to age and comorbidities [31, 33], we use age structure and population density to distribute case counts from the statewide epidemic model across smaller geographic areas within the state. Within each geographic unit, we use its own age distribution to project hospitalization rates over time. Consequently, counties or hospital catchment areas that have a different demographic structure (e.g., older or younger populations) will differ in their COVID-19 hospitalizations over time.

Detailed Methods of the Predictive Model

There are challenges to using traditional compartmental models [25] to estimate future COVID-19 incidence. One of the most fundamental challenges is that these methods require knowledge of the size of the susceptible population. In our predictive model, we use an approach called Dynamic Survival Analysis (DSA) [20, 30, 34], which is an extension of survival dynamical systems [6, 19]. Details on this approach and the model development can be found in Appendix A. Three key strengths of the DSA approach for predicting novel virus epidemics such as COVID-19 are: It does not require knowledge of the size of the susceptible population, It does not require information on overall disease prevalence in the population. It does not require prior knowledge of the shape of the epidemic curve. Since testing has focused on the most symptomatic and severe cases, we have almost no information on the number of asymptomatic infections or those with less severe symptoms who are not tested. Because of the novel nature of this virus and the resulting pandemic, analysis of the epidemic curve cannot be based on previous epidemics caused by other viruses e.g. SARS-CoV-1, or influenzas. Because it relies on illness onset times rather than counts of new cases, the DSA method can incorporate a partial epidemic curve like the one produced by testing for COVID-19 in Ohio, which is affected by both limited testing capacity and undetected asymptomatic infections. The DSA method has been developed to handle such limited data in a manner that allows accurate quantification of uncertainty. The method is strengthened by its simplicity as it requires only a single differential equation. Parameters for the model are inferred using empirical temporal data on new illnesses, and the target output is a time series of expected future illnesses. Initially, we used maximum likelihood estimates (MLEs) to fit the DSA model to Ohio data. However, as a substantial amount of data became available, we adopted a Hamiltonian Monte Carlo-based Bayesian approach to parameter inference and uncertainty quantification. An implementation of our inference method in the Python programming language is made available as a GitHub repository [4]. Our projected statewide model provides robust estimates of cases over time given partially observed daily counts of new illnesses. This is ideal in a setting where testing capacity is limited or changing due to constraints on lab capacity or detection limits. The counts of new illnesses are often known as an observed epidemic curve in the literature [36]. Our approach is derived from the general stochastic model of a pathogen spread across a contact network where the nodes represent individuals in a community [17]. As a working model of a contact network we use a type of random graph called a dynamic configuration model (CM) [7]. The mean-field approximation for epidemic processes on such networks is sometimes referred to as a pairwise model [21].

Assumptions:

All models have specific assumptions used to develop and implement them. First, we assume that each individual in the network (node) has a number of neighbors (their degree). Subsequently, a local Markovian infectious pressure changes their status from susceptible (S) to infective (I) to removed (R). We further assume that the R individuals are no longer able to transmit the infection and cannot be reinfected—an unknown, but readily assumed component of contemporary models. For the dynamics of the network model, individuals transition between the S, I, and R compartments based on the following principles that allow the extraction of a mathematical model of the ongoing epidemic based on observable data: Disease spread occurs over a network of contacts, that is, an infectious individual can infect their immediate neighbors at a fixed positive rate. It is implicitly assumed that the average number of a person’s contacts is also positive. Each infected individual recovers from infection at a positive rate, or is restricted from contacting their network neighbors through mandatory or voluntary isolation at a positive rate. Once infected, an infected individual has an infectious period that is assumed to follow an exponential distribution. People who are ill remain infectious, and a partial count of new illnesses is observed over time with a negligible chance of misdiagnosis (i.e., false positives).

Estimating Transmission Rates:

The complete model description and detailed development can be found in the appendix. In brief, the statistical model is used to estimate the number of people and timing of transfer between states S, I, and R. Then through substitution, a term S, the number of susceptible people at time t is developed. Embedded within this S is an improper survival function for the time to the onset of illness in a randomly chosen susceptible person. Using this survival function approach, the time to infection of a randomly selected susceptible person within a large population follows a temporal pattern determined by probability laws. After we estimate the time series of susceptibles, we can estimate the probability of a randomly selected susceptible individual being infected during the lifetime of an epidemic. From this we develop the conditional probability of remaining susceptible past time t.

Estimating Dropout and Recovery Rates:

The impact of social distancing is accomplished via a generalized approach to a person being dropped from the network. As has been visualized elsewhere, when a person is removed from the network, their neighbors and contacts within the network are removed, which limits transmission. This is accomplished by estimating the rate of infectious contact within a network (β in the appendix), and then considering drop outs as a function of recovery rate (details in appendix).

Changes in Parameters Due to Interventions:

Introducing and then easing restrictions on social distancing in the state has likely resulted in changes in parameters. The effect of such changes is incorporated into the predictive model.

Estimating Hospitalization and ICU Admission Onset

After the statewide model produces time series estimates of cases across the state, these are then translated into estimates of hospitalizations and subsequent ICU admission. The complete derivation of this model can be seen in Appendix B. In short, this is developed in three steps and is integrated with the geographic modeling methods described next. Estimate case onset for each age group based on age stratification in each geographic area modeled. Estimate the number of severe cases that will need hospitalization based first on CDC data [32] and then updated with Ohio-specific data as those data become more conducive to predictive modeling. Estimate a probability distribution (details in Appendix B) for the time from case onset to hospitalization as a function of age, sex, and race. Using the same variables, we also estimated the time from hospitalization to discharge or death and the probability of being admitted to the ICU. This provides a means to model hospital occupancy based on Ohio-specific data.

Geographic Modeling Methods

The time series of estimated cases of COVID-19 illness are output from the statewide model described above. These outputs are then used to assess hospital burden (eg. hospitalizations and ICU admissions) using age structure of the population and the estimated length of time a patient will be in a hospital or ICU bed. This involved a three step estimation process: The proportion of the total Ohio population residing in each geographic area (e.g., county or hospital catchment area) was estimated from U.S. Census data. Daily case counts were distributed across geographic areas using this proportion, essentially distributing cases by population density. This created a time series of projected case counts by geographic area. To account for age differentials in hospitalization and ICU use noted by both the popular press and in the scientific literature, we estimated the proportion of new cases expected to require hospitalization or admission to the ICU using the age distribution in each geographic unit at each time step. Initially, we used estimates from the Morbidity and Mortality Weekly Report (MMWR) published by CDC [32]. These nationwide estimates of the age distribution of hospitalization were used only while the number of hospitalizations and ICU admissions in Ohio remained too low to make reliable forecasts. Ohio rates were substituted in subsequent runs of the model. We estimated the number of hospitalized cases by assuming higher rates of hospitalization for older segments of the population. Thus, in counties with an older age structure, a larger proportion of cases would convert into a hospitalization. We used the estimated number of new hospitalizations and the average length of stay (LOS) to estimate the total hospital burden for each day in the time series. Initially, the average LOS was derived from the literature, but we used observed LOS from Ohio hospitals when enough COVID-19 cases were identified to provide robust estimates. We used a bi-modal distribution, with an average LOS of 5 days for non-ICU patients and 14 days for those requiring ICU care. For each time step, the number of new hospitalized cases estimated from the model was added to the number of cases still in the hospital. The cases that "timed out" of their hospital stay due to the LOS parameter were subtracted from the total. This created an estimated net number of patients in the hospital for each day in the time series which accounted for "patient stacking" over time. Once daily hospital counts were estimated, we compared these to the reported number of COVID available beds pooled across all hospitals in a geographic area. This allowed us to understand when and where hospital bed need might exceed current capacity.

Description of Data

Demographic data

Demographic data on the age distribution for Ohio counties and ZIP Codes were obtained from the U.S. Census Bureau’s 5-year American Community Survey 2014-2018 estimates [8]. The ACS data were used because the sample size was large enough for small geographies for reasonable standard errors and stable estimates. The Census Bureau does not develop data products for the United States Postal Service (USPS) ZIP Codes. Rather, ZIP Code Tabulation Areas (ZCTAs) are generalized areal representations of USPS ZIP Code service areas. We mapped the Census ZCTAs to ZIP Codes and used population estimates for ZCTAs. We used table B01001 that breaks out population counts by sex and 5 year age groups. Figure 1 shows the distribuion of the high risk population (age 55+) by county and Hospital Catchment Area in Ohio.

Figure 1:

Proportion of the population over the age of 55 in A) counties and B) hospital catchment areas

OHA Hospital and Bed Data:

The Ohio Hospital Association (OHA) provided data on the location of all hospitals in the state and the number of registered beds for each facility. Beds were broken out into several categories: Airborne Isolation, Critical Care, General Medical/Surgical Beds, and Extracorporeal Membrane Oxygenation (ECMO) beds. These bed types represent two levels of care required for COVID-19 patients, general hospital care for severe disease, and ICU beds for patients requiring ventilation. The OHA data contains information that constitutes a “trade secret” under Ohio Revised Code Section 1333.61.

Description of OHA hospital market share data

OHA provided data on hospital market share derived from administrative hospital claims from the past year (January 2019-December 2019, inclusive). For each hospital, patient encounters were grouped by patient zip code. A hospital’s market area included zip codes that represented the top 80% of all encounters at that hospital.

Definition of Hospital Catchment Areas

We developed small area estimates from state-level model predictions for counties and hospital catchment areas (HCA). Boundary files for the 88 Ohio counties were obtained from the U.S. Census Cartographic Boundary Files [9]. We developed hospital catchment areas using an approach modified from the Dartmouth Atlas Project [1]. We define a hospital catchment area as a collection of ZIP codes whose residents receive most of their hospital care from the hospital(s) in that region. Figure 1-B shows the HCAs developed and mapped. Note that in figure 1-B there are HCAs that cross over into neighboring states. This is explained in more detail in the subsection below. HCA definitions depend on the integrated use of geospatial methods that grouped each zip code in the state with the most geographically proximate hospital and modified these groupings using data on hospital market share by zip code. Only hospitals with acute care beds that could be used for COVID-19 patients were included in the analysis. We excluded facilities such as long-term acute care (LTAC), hospice, orthopedic, rehabilitation or psychiatric/behavioral health hospitals and freestanding ERs. We defined HCAs using three steps: The location of all hospitals in Ohio, and those in Michigan, Indiana, West Virginia, Kentucky and Pennsylvania located on the border with Ohio, was used to generate a Voronoi diagram with hospitals as generating points [26]. We included hospital in neighboring states in the Voronoi analysis to avoid edge effects, which would attribute patients to Ohio hospitals that typically use hospitals in other states. This yielded 233 distinct areas, one for each hospital generator. Twenty-eight areas were subsequently deleted because they included no area within OH. Voronoi polygons were overlaid with ZIP codes. ZIP codes were assigned to the Voronoi polygon if their centroid fell within the polygon. This ensured that each ZIP code was associated with the most geographically proximate hospital and divided the state into groupings of zip codes assigned to each hospital. Using the OHA hospital market share file, we examined the level of agreement between each Voronoi polygon and market share zip codes for each hospital. In cases where adjacent Voronoi polygons were generated by hospitals that also shared 60% or more of their market share zip codes, we aggregated these polygons to create one hospital catchment area. In large metropolitan areas with many hospitals, we aggregated groupings of Voronoi polygons to create regional catchment areas. In rural areas, this typically resulted in aggregating two adjacent polygons. Using this procedure, we generated 96 Hospital Catchment Areas for the state of Ohio. Some HCAs included zip codes from neighboring states, and some Ohio ZIP codes were included in HCAs for non-Ohio hospitals. HCAs are shown in Figure 2.

Figure 2:

Estimates of hospital burden mapped to hospital catchment areas for 12 days during the pandemic

Results

Here, we present a brief summary of our model fits and predictions. One of the hallmarks of the COVID-19 pandemic has been the change in human bahavior due to various interventions throughout the course of the pandemic. As a consequence, the COVID-19 epidemic curve in Ohio deviated much from a typical epidemic curve. The Ohio curve initially followed an exponential growth phase, then a phase of steady linear growth and a decline, and then finally, due to reopening of the state, another phase of exponential growth. As such, it is natural to believe the parameters are potentially different in these different phases. Therefore, we fit the DSA model with multiple change points. In Figure 3, we compare actual counts of daily new infections against the model predictions. As one can see, the fitted trajectories follow the observed epidemic curve reasonably well. The inner confidence bound is the true posterior confidence bound obtained point-wise. However, it is worth noting that predicted trajectories underestimate the variance or over-dispersion in the observed counts of daily new infections. Therefore, we adopted an empirical variance adjustment method to account for this possible underestimation. The broader confidence bound corresponds to the variance adjusted trajectories. More details on the variance adjustment method and other diagnostic plots are provided in Appendix B. The daily counts thus predicted are then down-projected into predictions of hospitalization surge across the state in the geographic component of the model.

Figure 3:

Predictions based on the fitted model vs actual daily incidence in Ohio as reported by the Ohio Department of Health. We have assumed three change points here corresponding to four density segments. Day zero corresponds to March 1, 2020. The first change point is on March 17, 2020. The second change point is on June 1, 2020 after which the state observed another phase of exponential growth. The third change point is on October 1, 2021, which marks the beginning of second wave characterized by a severe exponential growth phase. The broader (outer) confidence bound corresponds to the variance adjusted trajectories.

Discussion

Comparison with Other Approaches

Although, due to the significant differences in approach, it is difficult to directly compare our projective model to other contemporary methods, some indirect comparisons may be offered. The susceptible-infectious-recovered (SIR) framework is the basis for many COVID-19 epidemic models [11, 22, 28, 35]. Several websites provide implementations of the SIR framework and present scenarios with different interventions such as social distancing affecting the transmission parameter β [11, 28], thus allowing for different hypothetical scenarios to be explored. A key difference, however, is that phase-space variables from the SIR model are infections, rather than symptomatic cases. Consequently, knowledge of asymptomatic to symptomatic infections is needed to translate output from these models to cases and hospitalizations. Recent work [11] has attempted to extend the basic SIR model to include additional compartments corresponding to incubation, presymptomatic infectious, stages of symptomatic infectious, and hospitalization. Considering the lack of clear information on these types of outcomes, it is clearly difficult to implement this model as such detailed data are not currently available. The current SIR tools [11, 28] are suited for scenario exploration, like illustrating the flattening of the curve under different levels of social distancing compliance. Unfortunately, they are less suited for forecasts as model parameters are not calibrated based upon case or outcome data. A recent attempt at taking a meta-population approach with SEIR dynamics in each patch, to estimate model parameters (including proportion of asymptomatic infection) based upon case counts from China was given in [22]. This resulted in a spatially explicit model without age-structure. Conversely, [35] is an example of a model with age structure but not spatial: Weitz extends the SEIR framework to include age groups, and estimates model parameters based upon hospitalization and death counts from Georgia. Further work from the University of Washington uses mortality data as model inputs, but takes a purely statistical approach in fitting a sigmoidal curve to cumulative COVID-19 deaths using a mixed-effects model [16]. Another alternative is an agent-based model such as that used in [13]. Similar to [22], [13] forecast a very large number of infections with COVID-19. Considering the level of uncertainty in the data being input into all of these models, we rationalize that a model output of a time series of illnesses is the more reasonable approach. Similarly, using the simplified framework and structure of our approach allow for greater flexibility under uncertain data inputs into the model, and a lack of typically required information such as total susceptible population, or asymptomatic rates.

11 in total

1. Network epidemic models with two levels of mixing.

Authors: Frank Ball; Peter Neal
Journal: Math Biosci Date: 2008-01-11 Impact factor: 2.144

2. The large graph limit of a stochastic epidemic model on a dynamic multilayer network.

Authors: Karly A Jacobsen; Mark G Burch; Joseph H Tien; Grzegorz A Rempała
Journal: J Biol Dyn Date: 2018-12 Impact factor: 2.179

3. Critical Care Utilization for the COVID-19 Outbreak in Lombardy, Italy: Early Experience and Forecast During an Emergency Response.

Authors: Giacomo Grasselli; Antonio Pesenti; Maurizio Cecconi
Journal: JAMA Date: 2020-04-28 Impact factor: 56.272

4. Estimates of the severity of coronavirus disease 2019: a model-based analysis.

Authors: Robert Verity; Lucy C Okell; Ilaria Dorigatti; Peter Winskill; Charles Whittaker; Natsuko Imai; Gina Cuomo-Dannenburg; Hayley Thompson; Patrick G T Walker; Han Fu; Amy Dighe; Jamie T Griffin; Marc Baguelin; Sangeeta Bhatia; Adhiratha Boonyasiri; Anne Cori; Zulma Cucunubá; Rich FitzJohn; Katy Gaythorpe; Will Green; Arran Hamlet; Wes Hinsley; Daniel Laydon; Gemma Nedjati-Gilani; Steven Riley; Sabine van Elsland; Erik Volz; Haowei Wang; Yuanrong Wang; Xiaoyue Xi; Christl A Donnelly; Azra C Ghani; Neil M Ferguson
Journal: Lancet Infect Dis Date: 2020-03-30 Impact factor: 25.071

5. The impact of long-term non-pharmaceutical interventions on COVID-19 epidemic dynamics and control: the value and limitations of early models.

Authors: Marissa L Childs; Morgan P Kain; Mallory J Harris; Devin Kirk; Lisa Couper; Nicole Nova; Isabel Delwel; Jacob Ritchie; Alexander D Becker; Erin A Mordecai
Journal: Proc Biol Sci Date: 2021-08-25 Impact factor: 5.349

6. Quantifying the Population-Level Effect of the COVID-19 Mass Vaccination Campaign in Israel: A Modeling Study.

Authors: Ido Somekh; Wasiur R KhudaBukhsh; Elisabeth Dowling Root; Lital Keinan Boker; Grzegorz Rempala; Eric A F Simões; Eli Somekh
Journal: Open Forum Infect Dis Date: 2022-02-18 Impact factor: 4.423

7. Preliminary Estimates of the Prevalence of Selected Underlying Health Conditions Among Patients with Coronavirus Disease 2019 - United States, February 12-March 28, 2020.

Authors:
Journal: MMWR Morb Mortal Wkly Rep Date: 2020-04-03 Impact factor: 17.586

8. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728

9. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention.

Authors: Zunyou Wu; Jennifer M McGoogan
Journal: JAMA Date: 2020-04-07 Impact factor: 56.272

10. Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) - United States, February 12-March 16, 2020.

Authors:
Journal: MMWR Morb Mortal Wkly Rep Date: 2020-03-27 Impact factor: 17.586