
A simple model for fitting mild, severe, and known cases during an epidemic with an application to the current SARS-CoV-2 pandemic.

Matthew I Betti, Jane M Heffernan.

Abstract

One of the major difficulties with modelling an ongoing epidemic is that often data is limited or incomplete, making it hard to estimate key epidemic parameters and outcomes (e.g. attack rate, peak time, reporting rate, reproduction number). In the current study, we present a model for data-fitting limited infection case data which provides estimates for important epidemiological parameters and outcomes. The model can also provide reasonable short-term (one month) projections. We apply the model to the current and ongoing COVID-19 outbreak in Canada both at the national and provincial/territorial level.
© 2021 The Authors.

Keywords:  Basic reproduction number; COVID-19; Epidemic curve fitting; Other words

Year:  2021        PMID: 33521406      PMCID: PMC7833529          DOI: 10.1016/j.idm.2021.01.002

Source DB:  PubMed          Journal:  Infect Dis Model        ISSN: 2468-0427


Introduction

During a pandemic or epidemic, governments and media report the total number of known cases to the public. These reports are issued on an almost regular basis; during the most recent COVID-19 epidemic, reports are released daily. Using the reported data, modellers are often tasked with estimating important disease parameters, such as the basic reproduction number, attack rate, actual case load, and the effective reproduction number as mitigation strategies are implemented. This, however, is a difficult task since only a fraction of the total case load is known, and these important disease parameters also depend on asymptomatic and/or mildly symptomatic non-observed infections. The IDEA model (Fisman et al., 2013) has been shown to be effective in determining various epidemic parameters when data is limited. This model uses the basic reproduction number, R0, to estimate the attack rate, epidemic duration, and turning points of an epidemic. The model has been used most notably to predict the aforementioned parameters for an influenza A pandemic (Fisman et al., 2013) and the 2014–2015 Ebola outbreak (Tuite & Fisman, 2018). Other fitting methods and models that have been employed in past epidemics, and during the current COVID-19 outbreak, include reproduction number fitting through an SEIR model (Fang et al., 2020) or a delay SIR model (Anastassopoulou et al., 2020). Briefly, most methods apply statistical modeling and Bayesian inference to parameterize assumed epidemiological model structures. While many of these methods and models produce accurate results, many lack the ability to differentiate between different types of infection, or to account for the discrepancy between known cases and total cases. The former is important for informing policy on healthcare resources, as more severe cases create a larger burden on the healthcare system.
The latter discrepancy is an important aspect of epidemic fitting to address, because unknown cases can still transmit infection, creating discrepancies in the estimation of the effective reproduction number. In this study, we propose a simple ordinary differential equation (ODE) model which can be used to track the dynamics of mild, severe and cumulative known cases during an epidemic. When time series data is available for cumulative known cases and some epidemiological characteristics of the infectious disease under study are known, we can estimate the basic reproduction number, the overall reporting rate, the effectiveness of mitigation strategies and the final size of the epidemic. Moreover, from the ODE system, we can determine the epidemic peak and peak time. We apply our model to Canadian national, provincial and territorial cumulative incidence data for the current COVID-19 pandemic to illustrate its predictive power.

Model

The epidemic model is a modified SIR model. The compartments we are interested in are the number of mild cases, I_m, the number of severe cases, I_s, the cumulative known cases, C_K, and the cumulative incidence, C_I. We make the following assumptions in the model: (i) the total population is constant; (ii) acquired immunity lasts longer than the epidemic; (iii) there is no co-infection or super-infection; (iv) the testing/reporting rate in the population is relatively constant; (v) the probability of a case being severe vs. mild is constant; and (vi) all severe infections are reported, whereas only a fraction of mild infections are. The model equations are given by Model (1), where N is the total susceptible population, R0 is the basic reproduction number, p_s is the probability of severe infection, r is the average testing/reporting rate, M(t) is a mitigation function, and time is scaled by the average infectious lifetime 1/μ. Model (1) can also be rewritten in an equivalent form. Note that Model (1) accounts for, but does not assume, that severe infections are less infectious than mild or asymptomatic infections by a factor of p. We include p since in novel or dangerous epidemics severe cases are often managed and quarantined once known. Model (1) is coupled with four initial conditions, derived from three parameters (conditions (2)), where R is the number of cases that have already recovered or died by the time we start fitting data. Model (1) depends on a pre-defined mitigation function, M(t), that must satisfy certain conditions. We first assume that, in the case of no modified behaviour, M(t) leaves transmission unchanged. In cases of mitigation strategies taken by populations as a whole, we assume that these strategies have a maximum possible effect, k, and that they take time to be implemented to their full effect. In other words, these methods are neither perfect nor immediate. For this study, we prescribe M(t) as in Equation (3).
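The equations of Model (1) are not reproduced in this copy, so the following is only an illustrative sketch of an SIR-type system that is consistent with the assumptions listed above; it is not the authors' Equation (1). The right-hand side, the forward-Euler integrator, the way the initial cumulative incidence is built, the `decaying_multiplier` form standing in for the effect of mitigation, and all names and values are assumptions made for illustration. Time is measured in units of one infectious lifetime (1/μ).

```python
import math


def decaying_multiplier(floor, rate):
    """Hypothetical mitigation effect: a transmission multiplier that starts
    at 1 (no modified behaviour) and decays toward `floor` at speed `rate`."""
    return lambda t: floor + (1.0 - floor) * math.exp(-rate * t)


def simulate(R0, p_severe, rel_inf, r, N, Im0, Is0, CK0, R_init,
             t_end, multiplier=lambda t: 1.0, dt=0.01):
    """Forward-Euler integration of an assumed mild/severe/known-case system.

    p_severe : probability that a new infection is severe
    rel_inf  : infectiousness of severe cases relative to mild ones
    r        : fraction of mild infections that are reported
    Returns a list of (t, I_m, I_s, C_K, C_I) tuples.
    """
    Im, Is, CK = float(Im0), float(Is0), float(CK0)
    # Assumption: cumulative incidence = active cases + already removed cases.
    CI = Im + Is + float(R_init)
    rows = []
    for step in range(int(round(t_end / dt)) + 1):
        t = step * dt
        rows.append((t, Im, Is, CK, CI))
        # New infections per unit time, limited by the remaining susceptibles.
        new_inf = R0 * multiplier(t) * (Im + rel_inf * Is) * (1.0 - CI / N)
        Im += dt * ((1.0 - p_severe) * new_inf - Im)
        Is += dt * (p_severe * new_inf - Is)
        # All severe cases and a fraction r of mild cases become known.
        CK += dt * (p_severe + r * (1.0 - p_severe)) * new_inf
        CI += dt * new_inf
    return rows


# Example run with made-up values, with and without mitigation.
unmitigated = simulate(2.5, 0.1, 1.0, 0.3, 1e6, 10, 1, 1, 0, t_end=30)
mitigated = simulate(2.5, 0.1, 1.0, 0.3, 1e6, 10, 1, 1, 0, t_end=30,
                     multiplier=decaying_multiplier(0.2, 1.0))
```

Passing the mitigation effect in as a callable keeps the sketch agnostic about how M(t), k and d enter the authors' equations.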

Data fitting

The goal is to fit Equations (1), (2), (3) to the number of known infections that are reported by the government or media. That is, we will determine the parameters R0, r, N, p, p_s, μ, d, k, I_m(0) and I_s(0) from a vector of data; i.e., we need to fit the vector of parameters v to values v∗ such that the model's cumulative known cases C_K(t; v∗) best match the data, where t are the time points associated with the data, and C_K(0) and R are missing from v. We assume these values are fixed by the data and cannot be fit. Effectively, we assume that there is no error in the number of reported cases and recoveries for day 0 of the fitting routine. We proceed with a fitting routine that minimizes the mean-square error between the model output and the data. Additionally, to fit the model to the data, upper and lower bounds for the vector of parameters must be defined. These bounds are determined by the literature and expert knowledge (Table 1). The least-squares algorithm is quickly implemented in Python through the LMFIT package and is quick to evaluate, meaning that we can obtain updated estimates in almost real time. To fit this model, we take advantage of the fact that solving a small system of ODEs with a relatively small number of parameters is computationally cheap. We integrate system (1) with the prescribed M(t) and an initial guess at conditions (2). We then use the integrated C_K(t; v) to calculate and minimize the residual.
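As a small illustration of the quantity being minimized, the sketch below computes the residual vector and the standard mean-square error between a modelled and a reported cumulative known-case series. The function names are hypothetical, and the exact normalization used by the authors is not given in this copy, so the usual definition (average of squared differences) is assumed.

```python
def residuals(model_ck, data_ck):
    """Pointwise differences between modelled and reported cumulative known cases."""
    if len(model_ck) != len(data_ck):
        raise ValueError("model and data must cover the same time points")
    return [m - y for m, y in zip(model_ck, data_ck)]


def mean_square_error(model_ck, data_ck):
    """Average of the squared residuals (assumed standard definition)."""
    res = residuals(model_ck, data_ck)
    return sum(e * e for e in res) / len(res)


# Toy example with made-up numbers: only the last point disagrees (by 2).
example_mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A residual function of this shape, returning one difference per data point, is the kind of objective that least-squares routines such as LMFIT's `minimize` expect to be given.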
Table 1

Table of Parameters and Initial Conditions. All parameters that are not fixed are fit using least-squares regression.

Parameter   Init. value      Min         Max              Ref. (if available)
p_s         0.1              Fixed                        (?;?;?)
μ           0.1              Fixed                        (Fang et al., 2020); (Anastassopoulou et al., 2020)
R0          2.5              1           3.5              (Aylward et al., 2020; Lai, Bergna, et al., 2020; Lai et al. 2020; Liu et al., 2020)
N           Fixed                                         Known size (Canada, 2020b)
r           10^-8            0           0.8              Verity et al. (2020)
p           10^-4            0           1
k           0.001            0           1
d           0.1              0           1
C_K0        Fixed                                         Data
I_m0        10(C_K0 + 1)     C_K0        10(C_K0 + 1)
I_s0        0.5(C_K0 + 1)    C_K0        10(C_K0 + 1)
R           1000             0.1 C_K0    N

Application to Canada's SARS-CoV-2 outbreak

We illustrate the effectiveness of this model by applying it to the current SARS-CoV-2 outbreak in Canada. We take cumulative incidence data from the Government of Canada website (Canada, 2020). This data set gives the total incidence of cases from the first reported case at the national and provincial/territorial level. Herein, we present results for the national level, as well as for two large and two small provinces: Ontario (ON) and British Columbia (BC), and New Brunswick (NB) and Prince Edward Island (PE), respectively.

Parameter values and initial conditions

Equations (1), (2), (3) include eight parameters (R0, r, N, p, p_s, μ, d, k) and four initial values (C_K(0), I_m(0), I_s(0), and R). We fix two model parameters and one initial condition to reduce the number of function calls needed to find a fit. We first fix the infectious lifetime, 1/μ = 10 days (Aylward et al., 2020), as we need this parameter to set our timescale. Second, we assume that the initial number of known cases, C_K(0), is known and is equal to the first data point that will be considered in the fitting routine. Finally, we fix the probability of severe infection, p_s = 0.1. Since our total incidence information does not distinguish mild from severe cases, we base this number on the age demography of Canada. Approximately 17% of the population of Canada is 65 years or older (Canada, 2020a). This demographic will make up the majority of severe cases, but not everyone in it will be affected. Therefore, the value p_s = 0.1 is reasonable. Additionally, this estimate is consistent with the probability of severe cases reported by the WHO (Aylward et al., 2020). Parameter values and known initial conditions are listed in Table 1. For unknown parameter values and initial conditions that will be informed by the model fit, we also define reasonable upper and lower bounds. These bounds are listed in Table 1.

Cleaning the data

At the beginning of an epidemic, especially in the case of a novel pathogen, there are many issues with early reporting data. Events like baseline disease spread, ramping up of testing, panic-induced crowding, etc. can obscure the actual disease incidence. Even though for some provinces and territories, and at the national level, we have data dating back to January 2020, we cannot use all of it. To determine when testing rates become relatively consistent and the system has 'settled' to an extent, we fit eight-day rolling averages of cumulative incidence to a very simple differential equation that yields the effective reproduction number at any given time. These results are shown in Fig. 1 for the national scale and for the provinces of Ontario, British Columbia, New Brunswick and Prince Edward Island. Note the high variability in the effective reproduction number for the early time points. We therefore eliminate these points and fit the cumulative incidence starting from March 15, 2020. Ignoring these data points can further be justified given that there is a more consistent trend in estimates after this date. Also, after this date, social distancing measures were put in place, schools were closed, and reporting became daily in all provinces/territories.
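The smoothing step can be sketched as follows. This is a minimal illustration of an eight-day rolling average only; whether the authors used a trailing or a centred window is not stated, so a trailing window is assumed here, and the function name is hypothetical. The differential equation used to extract the effective reproduction number is not reproduced in this copy and is therefore not implemented.

```python
def rolling_mean(values, window=8):
    """Trailing rolling average: element j is the mean of values[j : j + window]."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window forward by one point.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Toy cumulative series (made-up numbers) smoothed with the default window.
smoothed = rolling_mean(list(range(20)))
```

With a window of eight, a series of n points yields n − 7 smoothed values, which is one reason the earliest part of a short record carries little usable information.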
Fig. 1

Effective reproduction number for Canada, ON, BC, NB, PE. Data (black dots), and the eight-day rolling average (blue line) are shown. The shaded area shows the 95% confidence interval around the mean.


Data fits and projections

Our data set is limited; therefore the typical block bootstrapping used for time series data (?) is not feasible. The method described in (?) breaks the data set into blocks of consecutive time points, and bootstraps over said blocks. Given that we only have approximately 40 data points, we would have to sacrifice either the number of blocks or the number of points in a block, leading to limited success with bootstrapping. To proceed in a manner that provides a mean fit and confidence intervals for sensitivity analysis, we choose to fit the model using a random fraction P of the first M − m data points, where M is the total number of data points and m = 5 is the number of final data points held out. We reserve the final m points so that we can observe whether the projected values of the fitted model are consistent with these data. Herein, we set P = 2/3. To perform the fits, we also provide initial values for all model parameters. We choose these values randomly for each model fit iteration within their defined bounds (Table 1). We perform the model fitting routine 1000 times for each geographical region under study. Fitted model parameters are shown in Table 2 for Canada, ON, BC, NB, and PE. Note that we fit the national numbers as a whole instead of as a sum of parts.
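The resampling scheme described above can be sketched as follows. This is an illustrative outline, not the authors' code: the function names are hypothetical, and the rounding of P(M − m) to an integer and the uniform draw of initial guesses are assumptions. The example bounds for R0 and r are the ones listed in Table 1.

```python
import random


def choose_fit_indices(n_points, holdout=5, fraction=2 / 3, rng=random):
    """Pick a random subset of the first n_points - holdout indices for fitting
    and reserve the final `holdout` indices for checking projections."""
    pool = n_points - holdout
    if pool < 1:
        raise ValueError("not enough data points for the requested holdout")
    n_fit = max(1, round(fraction * pool))
    fit_idx = sorted(rng.sample(range(pool), n_fit))
    holdout_idx = list(range(pool, n_points))
    return fit_idx, holdout_idx


def random_initial_guess(bounds, rng=random):
    """Draw one starting value per parameter, uniformly within its bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


# One iteration of the scheme for a 40-point series (seeded for repeatability).
rng = random.Random(0)
fit_idx, holdout_idx = choose_fit_indices(40, holdout=5, fraction=2 / 3, rng=rng)
guess = random_initial_guess({"R0": (1.0, 3.5), "r": (0.0, 0.8)}, rng=rng)
```

Repeating this draw 1000 times and refitting each time gives a collection of fitted parameter sets from which a mean and a spread can be reported.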
Table 2

Table of fitted parameter values with 95% confidence intervals. Parameters marked by ∗ are fixed.

            Canada          Ontario         British Columbia   New Brunswick    Prince Edward Island
R0          3.43 ± 0.39     3.35 ± 0.64     3.30 ± 0.88        3.16 ± 0.1.36    2.63 ± 2.20
p           0.99 ± 0.03     0.99 ± 0.12     0.98 ± 0.17        0.95 ± 0.39      0.92 ± 0.43
r           0.36 ± 0.17     0.28 ± 0.25     0.38 ± 0.17        0.20 ± 0.26      0.11 ± 0.26
N∗          37.6 × 10^6     14.4 × 10^6     4.97 × 10^6        0.78 × 10^6      0.15 × 10^6
k           0.005 ± 0.05    0.02 ± 0.21     0.03 ± 0.19        0.05 ± 0.32      0.19 ± 0.7
d           0.11 ± 0.02     0.11 ± 0.14     0.23 ± 0.13        0.51 ± 0.30      0.52 ± 0.43
I_m0        3461 ± 1460     973 ± 804       999 ± 427          46 ± 58          29 ± 60
I_s0        221 ± 112       86 ± 52         43 ± 26            2 ± 5            2 ± 5
C_K0∗       253             103             73                 2                1
R           279 ± 28617280 ± 1948021790 ± 557682679 ± 292211256 ± 8692
Uncertainty in some of the fitted model parameter values is large. This is a result of a high level of sensitivity of these parameters to the initial conditions and to the distribution of the P(M − m) data points used to fit the model. In particular, we find that limiting the number of data points, especially to fewer than 30, creates a dataset that is too sparse to fit accurately. Fig. 2 shows the national fit with 95% confidence intervals on a linear and log scale (panels (a) and (b)). Here we see that I_m > I_s over all time (red vs. blue lines). We also see that the total number of cases in Canada is projected to be 395005 ± 1016403, of which 162682 ± 465543 are known/reported. The shaded areas indicate the 95% confidence interval around the mean fitted values. Note that the rebound indicated by the red shaded region is an artifact of the sensitivity analysis. When the P(M − m) data points used to fit the model are clustered around the early data points (mid/late March and early April), the infection outcome is delayed, giving a later and larger infection peak. Rather than ignoring these model fits, we have chosen to include them in the sensitivity analysis to show the sensitivity of the fitting method to the distribution of the P(M − m) chosen data points.
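The national projections can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in the text and in Table 2 (Canada). The expression for the expected known share is an inference from the stated assumptions that all severe cases and a fraction r of mild cases are reported; it is not a formula quoted from the paper, and the variable names are hypothetical.

```python
n_canada = 37.6e6        # population size used for Canada (Table 2)
total_cases = 395005     # projected cumulative infections (mean)
known_cases = 162682     # projected known/reported infections (mean)
p_severe = 0.1           # fixed probability that an infection is severe
r = 0.36                 # fitted reporting rate of mild cases for Canada

# Attack rate: projected cumulative infections as a share of the population.
attack_rate = total_cases / n_canada

# Share of all infections that end up known, from the projected means.
known_fraction = known_cases / total_cases

# Assumed relation: every severe case plus a fraction r of mild cases is known.
expected_known_fraction = p_severe + r * (1.0 - p_severe)
```

The attack rate comes out at roughly 0.0105, in line with the 0.01 reported for Canada in Table 3, and the two known-case shares (about 0.41 and 0.42) are close to each other.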
Fig. 2

Model fits at the national level. The mean values and 95% confidence intervals of all model populations are shown on a linear and log scale (panels (a)–(b)). Panels (c) and (d) show the resulting number of new cases per day and the national value of the effective reproduction number over time. (Top row) Mild cases I_m (red), severe cases I_s (blue), known cases C_K (green), total cases C_I (green). (Panels (a)–(c)) Dark blue and red points are the known data. In all plots the shaded area shows 95% confidence intervals around the mean.

Fig. 2 also shows the new known cases per day at the national level (panel (c)), and the national estimate of the effective reproduction number (panel (d)). In panel (c), it is evident that the mean fit (blue line) lies within the m red data points, and therefore the short-term forecasting of the model is useful. We also see, in panel (d), that the reproduction number for Canada is projected to fall below unity in the middle of May 2020. This agrees with other projections in the literature (Daniel et al., 2020). Finally, we note that the rebound indicated in the confidence interval shown in panel (c) is an artifact of the sensitivity analysis, as described previously. Fig. 3 shows provincial fits with 95% confidence intervals on a linear and log scale for Ontario, British Columbia, New Brunswick and Prince Edward Island. Fig. 4 and Fig. 5 show the new known cases per day and the effective reproduction number for the aforementioned provinces. Note the long flat peak for the province of Ontario in Fig. 3. This agrees with Government of Ontario projections (Tuite et al., 2020). Again, note the rebounds in cases shown in Fig. 3 and Fig. 4. Additionally, we see that these artifacts of the sensitivity analysis increase as the province population size decreases and the number of new cases in the province becomes small.
Fig. 3

Mean fits of model (1) to cumulative case data for ON, BC, NB, and PE from left to right, shown on a (top row) linear and (bottom row) log scale. Mild I_m (red) and severe I_s (blue) cases are shown, as well as known cases C_K (green) and the total number of infections C_I (purple). Shaded area shows 95% confidence intervals. The dark blue and red points indicate the case data.

Fig. 4

The number of new cases per day for ON, BC, NB, and PE (left to right). Subplots (e), (f), (g) and (h) are the same plots without the confidence intervals for clarity. The mean of the model fits is shown (blue line), with the 95% confidence interval (shaded area) in the top row. The dark blue and red points indicate the case data.

Fig. 5

Model results for the effective reproduction number for ON, BC, NB and PE. Shaded area shows the 95% confidence interval around the mean (blue line).

From the fitted models, we can determine important epidemiological outcomes at each scale under study. The estimated peak time, peak magnitude, and attack rate for the national scale are given in Table 3. We also provide the same information for ON, BC, NB, and PE. Due to the nature of the model, the attack rate reported here is the attack rate in the case that current intervention measures are maintained until the end of the epidemic. We see that the attack rate is relatively consistent across provinces, as many provinces have implemented similar mitigation strategies. Additionally, we see that if the I_m peak time differs from the I_s peak time, then the I_m peak time precedes the I_s peak time. We also report the peak time of C_K′, which represents the number of known new cases per day. The peak number of known new cases always precedes the actual peak. This should be expected, and is likely exaggerated because the model assumes cases are reported and known as soon as they present.
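Table 3 reports the first peak of each fit, since later peaks are treated as artifacts of the sensitivity analysis. One simple way to extract a first peak from a simulated trajectory is sketched below; the function name and the rule used to define a peak (the first point that is strictly above its predecessor and not below its successor) are assumptions, not the authors' stated procedure.

```python
def first_peak(times, values):
    """Return (time, value) at the first local maximum of `values`.

    Falls back to the global maximum when no interior local maximum exists,
    for example when the series is still rising at the end of the window.
    """
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            return times[i], values[i]
    i = max(range(len(values)), key=values.__getitem__)
    return times[i], values[i]


# Toy trajectory with two peaks (made-up numbers); only the first is reported.
peak_time, peak_value = first_peak([0, 1, 2, 3, 4, 5], [1, 3, 2, 5, 4, 0])
```

Applied to I_m, I_s, their sum, and the daily new known cases, this yields the peak magnitudes and peak times of the kind tabulated in Table 3.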
Table 3

Table of epidemiological outcomes for Canada, ON, BC, NB, PE. ∗The first peak of each fit is reported, as consecutive peaks are artifacts of the sensitivity analysis.

                      Canada           Ontario          British Columbia   New Brunswick   Prince Edward Island
Peak Magnitude
I_m                   44574 ± 20874    31881 ± 74334    1149 ± 15286       152 ± 201       50 ± 61
I_s                   4952 ± 2319      3542 ± 8259      112 ± 1696         16 ± 22         5 ± 5
I = I_m + I_s         49527 ± 23194    35423 ± 82593    1258 ± 16965       168 ± 224       55 ± 65
Peak Time
I_m                   2020-05-14       2020-06-03       2020-03-27         2020-04-03      2020-04-01
I_s                   2020-05-14       2020-06-03       2020-04-01         2020-04-03      2020-04-02
I = I_m + I_s         2020-05-14       2020-06-03       2020-03-28         2020-04-03      2020-04-01
C_K′                  2020-05-04       2020-05-25       2020-03-19         2020-03-26      2020-03-24
Attack rate (C_I/N)   0.01 ± 0.02      0.03 ± 0.17      0.003 ± 0.07       0.01 ± 0.13     0.01 ± 0.09

Discussion

We propose a simple epidemiological compartmental model that can be readily fit to cumulative incidence data. The goal of the model is to reduce disease dynamics to the fewest possible parameters while remaining robust enough to project possible outcomes and to provide estimates that can help inform policy and healthcare needs assessments. The intent of this model is to give a rough idea of epidemic trajectory in cases with limited data. The model itself is unique in that it estimates a number of key epidemic parameters without the assumption that the growth of the known cases is directly proportional to the total number of cases. By separating out known cases from actual cases, we can better estimate parameters such as the true R0 and attack rate. Our estimates for the peak time of the epidemic are decoupled from the peak testing/reporting rate, giving a much better idea of when the epidemic might peak. Of note, qualitatively the model suggests that the peak in known cases will precede the actual epidemic peak. Our estimates for R0 are consistent with those already estimated in the literature (Aylward et al., 2020; Lai, Bergna, et al., 2020; Lai et al. 2020; Liu et al., 2020; Tang et al., 2020), and the peak times estimated are consistent with the timelines of other countries for which the infection has peaked, such as Italy (Fanelli & Piazza, 2020). We are also able to effectively estimate a number of parameters which, when compared among regions, can help explain why a similar epidemic may be spreading differently in two different populations, or which mitigation strategies are most effective. The parameter p, for instance, can be interpreted as the efficacy of isolation of severe cases and how quickly isolation is implemented for known severe infections. The parameter r can be used to compare testing and reporting rates among different regions.
For instance, in our fits we see that British Columbia has a higher reporting/testing rate (a greater value for r) as well as a quicker identification of severe cases (lower p) than Ontario, which may point to how BC has been able to control the spread of SARS-CoV-2 more effectively than ON. The model has its drawbacks, mostly related to the fitting. Fitting ODE parameters to data is generally sensitive to the initial choice of parameters. We have done our best here to prescribe parameters that will work for SARS-CoV-2 cumulative incidence curves, but care should be taken when implementing the method to check that the fits are indeed viable. While least-squares is very quick to implement and analyze, it does not allow for strict confidence intervals. More robust methods, such as basin-hopping, may allow for better parameter estimates. Additionally, a hierarchical fit, in which all provinces and territories are fit with individual but related parameters alongside a national fit, may allow for better parameter estimates. However, distribution characteristics that relate a specific model parameter across fits would then also have to be assumed; e.g., one would assume that across all provinces and territories the fraction of severe cases follows a normal distribution with some mean and standard deviation. This is difficult to do since (1) many of the parameters are affected by different provincial and territorial government policies, e.g., reporting rate and mitigation strategy, and (2) the distribution parameters (i.e., mean and standard deviation) would also have to be estimated alongside the model parameters. Improvements to the model can also be made. The reporting rate r, for example, could be assumed to change over time, making r = r(t). Additionally, different mitigation function structures could be tested. Finally, some provincial and territorial COVID-19 outbreaks are dominated by imported cases (from travelling) rather than community transmission.
The model could be modified to include this distinction to determine contact tracing rates by geographical region. However, this, again, would depend on data quality and availability.

Declaration of competing interest

The authors have no conflicts of interest arising from or with respect to this study.