Literature DB >> 32920095

Modeling COVID-19 pandemic using Bayesian analysis with application to Slovene data.

Damjan Manevski¹, Nina Ružić Gorenjec², Nataša Kejžar³, Rok Blagus⁴.

Abstract

In the paper, we propose a semiparametric framework for modeling the COVID-19 pandemic. The stochastic part of the framework is based on Bayesian inference. The model is informed by the actual COVID-19 data and the current epidemiological findings about the disease. The framework combines many available data sources (number of positive cases, number of patients in hospitals and in intensive care, etc.) to make outputs as accurate as possible and incorporates the times of non-pharmaceutical governmental interventions which were adopted worldwide to slow-down the pandemic. The model estimates the reproduction number of SARS-CoV-2, the number of infected individuals and the number of patients in different disease progression states in time. It can be used for estimating current infection fatality rate, proportion of individuals not detected and short term forecasting of important indicators for monitoring the state of the healthcare system. With the prediction of the number of patients in hospitals and intensive care units, policy makers could make data driven decisions to potentially avoid overloading the capacities of the healthcare system. The model is applied to Slovene COVID-19 data showing the effectiveness of the adopted interventions for controlling the epidemic by reducing the reproduction number of SARS-CoV-2. It is estimated that the proportion of infected people in Slovenia was among the lowest in Europe (0.350%, 90% CI [0.245-0.573]%), that infection fatality rate in Slovenia until the end of first wave was 1.56% (90% CI [0.94-2.21]%) and the proportion of unidentified cases was 88% (90% CI [83-93]%). The proposed framework can be extended to more countries/regions, thus allowing for comparison between them. One such modification is exhibited on data for Slovene hospitals.

Entities: Disease Gene Species

Keywords: Bayesian inference; Discrete renewal process; Modeling epidemics; Non-pharmaceutical interventions; Reproduction number

Mesh：

Year: 2020 PMID： 32920095 PMCID： PMC7482592 DOI： 10.1016/j.mbs.2020.108466

Source DB: PubMed Journal: Math Biosci ISSN： 0025-5564 Impact factor: 2.144

Introduction

The new coronavirus SARS-CoV-2 has quickly spread all around the world greatly impacting all aspects of our lives. One of the key reasons for its rapid spread is the high reproduction number of infection, . The value represents the average number of people that an individual infects during their own infection period, whereby represents time. is susceptible to change due to interventions, social behavior, etc. When , the incidence of new cases reduces, but when , it increases until the epidemic reaches its peak; after that, the incidence of new cases starts to reduce due to (at least temporal) herd immunity [1]. The estimates of the basic reproduction number, , for SARS-CoV-2 vary substantially, depending on the method of estimation, and stand at about 3 [2], [3], [4], [5], [6], [7], [8]. Such a high basic reproduction number leads to a steep exponential increase in the number of cases which, in turn, causes a rapid increase in the number of people in need of hospitalization and intensive care unit (ICU). Due to the limited capacities of the healthcare system, this can lead to a situation where it is impossible to provide suitable care to all patients in need. Thus, it is of crucial importance for policy makers to estimate , infection fatality rate, proportion of asymptomatic cases and forecasting the number of hospitalized patients and patients in the ICUs. In the current COVID-19 pandemic many governments have adopted non-pharmaceutical interventions (NPIs) to control the spread of the epidemic in their countries. Many models are available for forecasting the COVID-19 pandemic with adopted NPIs, e.g. SIR-like (Susceptible–Infected–Recovered-like) compartmental models that are essentially deterministic models of stochastic processes, which capture the mean dynamic quite well for large populations [9], [10], [11], [12], [13], [14]. Another possible approach is using large simulation models that include networked (meta)populations or are agent-based and thus mimic the behavior of all individuals [11], [15], [16]. Xue et al. [14] developed a network-based model extending the network SIR model to capture the contact heterogeneity among individuals. Bayesian inference was introduced as an alternative to the compartmental models [17], [18], [19]. Here, simple stochastic models which describe the key features of epidemic spread are formulated and then actual data are used to estimate the parameters of the model, thus allowing the analysis to be driven by data and using less assumptions and/or fixed parameters [20]. Furthermore, the final estimates are equipped with statistically sound intervals of uncertainty. Flaxman et al. [7] studied the influence of NPIs relying on Bayesian inference, using data for 11 European countries. They concluded that, after adopting all interventions, the reproduction number of infection decreased from the basic value of 3.8 (mean value for all 11 countries) to 0.66 (ranging from 0.44 in Norway to 0.82 in Belgium). In their work, only country data on daily number of deaths (up to and including May 4, 2020) were taken into consideration. The same model was applied by also including Slovenia in the analysis (using data up to and including April 12, 2020), showing that NPIs were effective also in Slovenia, reducing the reproduction number to 0.6 with the whole 95% credible interval being below 1 [21]. One of the main goals of these analyses was to accurately estimate the cumulative effect of interventions while using only data on number of deaths. For this reason, it was crucial to pool information across different countries whereby imposing some strong assumptions. Namely, it had to be assumed that the reproduction number of infection does not change between given interventions and that their effect is similar across countries. Therefore, the results based on the Flaxman model [7] are driven more by countries with advanced epidemics and larger number of deaths [7]. E.g., excluding the data for Italy and Spain, the estimate of the reproduction number for Slovenia after adopting the NPIs changes to 0.8 with 95% credible interval including 1 [21]. Additionally, there is a large time lag in deaths data which may inform the parameters of the model. Furthermore, the Flaxman model [7] cannot be directly used for forecasting the number of hospitalized patients and patients in ICUs or to estimate the proportion of unidentified cases. We therefore propose an elaborated semiparametric modeling framework based on a Bayesian approach, that greatly extends and modifies the Flaxman model. In contrast to Flaxman et al. [7] which only uses daily number of deaths, we propose an extension using additional data sources for important indicators of the state of epidemic and the healthcare system that are crucial for effectively coping with the pandemic and have less ‘time lag‘. We use: confirmed number of cases and deaths, number of hospitalized patients, number of patients in ICUs, number of patients admitted to and released from hospitals and ICUs. The rationale for including more data for a given country is manyfold: (1) to obtain accurate estimates for countries with small number of deaths and/or which are at the earlier stages of the epidemic without having to pool the information from other countries, (2) to provide estimates of more parameters, which allow to report other measures, such as proportion of unidentified cases, (3) to make it possible to relax some strong assumptions around NPIs (i.e. to address critiques about the piece-wise constant of the Flaxman model). The model with data values (in circles), its expected values (in boxes), deterministic parameters in red, and ‘true‘ parameters (for which posterior distributions are estimated) in blue (the parameters with prior distributions) or black (the transformed parameters, including expected values in boxes). We model the time varying effective reproduction number differently than the Flaxman model allowing more flexibility under weaker assumptions. The outputs of our model are estimates of and other parameters with narrower credible intervals due to the inclusion of different data sources, and accurate short term forecasts of various important epidemic indicators. The quality of the model and its forecasts is evaluated on real data. We apply the proposed modeling framework using publicly available data for Slovenia [22]. We estimate the reproduction number of SARS-CoV-2 and provide accurate short term forecasts for important indicators for monitoring the state of the healthcare system throughout the course of the Slovene COVID-19 epidemic. We show how the proposed framework can be used to estimate the effectiveness of the adopted NPIs (and their lifting) for Slovenia and estimate some measures at the final time point, i.e., infection fatality rate, proportion of cumulatively infected people and proportion of unidentified cases. Flexibility of the proposed modeling framework is also illustrated using data at hospital level, providing separate forecasts for each hospital where COVID-19 patients were admitted. The rest of the article is organized as follows. In the next section we formally present the model which is applied to Slovene COVID-19 data and show its possible extensions/generalizations. This is followed by showing a series of selected results for Slovene epidemic. The paper concludes with a discussion and a summary of the most important findings.

Modeling framework

Each country has a slightly different set of data, which usually does not fit to a model that is optimal from the epidemiological point of view. The main idea is to develop a model that is applicable to available data, adapting it to suit Slovene data, but making it easy to adjust according to availability and reliability of data in any country. First, we present the core model for Slovenia. Then, we show how it may be extended or modified to a more general framework. Finally, we illustrate one possible modification of the model. We adopt the following notation. We add its meaning in parentheses so that the notation is easier to follow. Daily data on the number of confirmed cases is denoted by (positive cases), number of hospitalized patients by , number of patients in Intensive Care Units (ICU) by , daily number of new patients admitted to hospitals and to ICUs by (hospital in) and (ICU in), respectively, daily number of patients released from hospitals and ICUs by (hospital out) and (ICU out), respectively, daily number of deaths occurring in hospitals by (death from hospital), and daily number of deaths occurring outside the hospitals by (death directly from cases), where are days from the start until the end of the analysis. Let denote a normal distribution with mean and standard deviation . We say that follows a folded-normal distribution, denoted by , if and . Let denote a continuous uniform distribution on , let denote a gamma distribution with mean and coefficient of variation , and let be the exponential distribution with mean . Using we denote the negative binomial distribution with mean and variance . If is the density of a positive random variable, then denotes its discretization via

The model

The core model is schematically presented in Fig. 1. We explain it thoroughly in the next subsections. First, we present the upper part (see line border on Fig. 1) that is used for modeling infections, then the lower part used for modeling disease progression, and in the last subsection the whole model is precisely specified. In Fig. 1, the data are presented in circles, the deterministic parameters of the model are colored red, while ‘true‘ parameters of the model, for which posterior distributions are estimated, are in blue (the parameters with prior distributions) or black (the transformed parameters).

Fig. 1

The model with data values (in circles), its expected values (in boxes), deterministic parameters in red, and ‘true‘ parameters (for which posterior distributions are estimated) in blue (the parameters with prior distributions) or black (the transformed parameters, including expected values in boxes).

Modeling infections

Following Flaxman et al. [7], the number of truly infected individuals (cases) is modeled using a discrete renewal process, which is related to the Susceptible–Infected–Recovered model but it is not expressed in differential form. Similar to [7], the time between when a person gets infected and when he subsequently infects other people (i.e. the serial interval distribution ) is distributed as We use as in Flaxman et al. [7] and (Fig. 2, upper left), performing sensitivity analysis for other values of . Note that the distribution of parameter is fixed in the model.

Fig. 2

Assumed serial interval distribution ( - upper left) and assumed distributions of various times used in the model; note different scales.

The expected number of truly infected individuals on a given day is then modeled by where is the size of the population, is the time-varying reproduction number modeling the average number of secondary infections over time, and is the discretization of (1). Infections at time depend on the number of infections in the previous days, first weighted by the discretized serial interval distribution, and then scaled by and the term accounting for the fact that population can develop (temporal) herd immunity against SARS-CoV-2. The latter would simply be omitted if immunity cannot be developed. Note that we do not have the data for truly infected individuals that would correspond to . The way the reproduction number varies in time is at the core of our model: we assume that major government changes in policies (for controlling the COVID-19 epidemic) drive the changes of , where we consider two options. Let be days of major changes of policies. In the first option, where we emulated the study of Flaxman et al. [7], the reproduction number is a piece-wise constant function between days of policy changes: where is equal to if and otherwise, while is equal to if and otherwise. Parameter is the basic reproduction number with a prior distribution (similarly as in [7]) Independent prior distributions for the parameters were chosen to be With these prior distributions (for their graphical presentation see Supplementary Material 1), we have and for each Assumed serial interval distribution ( - upper left) and assumed distributions of various times used in the model; note different scales. To obtain a continuous estimate of (instead of a piece-wise constant estimate of ), we developed a second option: is modeled through where is the th column of the B spline basis matrix for the natural splines, with the positions of the knots set to . Natural splines were used in order to maintain low degrees of freedom (note that only one additional parameter needs to be estimated) and to have a continuous function . Priors for and were chosen as in the first option, but (6) does not hold anymore. Since the process cannot start from zero, see (2), we have to seed the model. As in [7], we assume that seeding of new infections begins 30 days before observing 10 cumulative deaths. From this date, we seed the models with six sequential days where we chose with prior distribution

Modeling disease progression

First, we show how confirmed cases (positive cases) are incorporated in the model, others follow a similar logic. We assume that the number of confirmed cases is distributed as where prior distributions were chosen as and is the expected number of confirmed cases that is modeled mechanistically from cases as Parameter determines the amount of over-dispersion, i.e. the additional variance of the negative binomial distribution above that of the Poisson distribution. Smaller values of the parameter allow for larger over-dispersion. The model could be re-parameterized so that larger values allow for larger over-dispersion, potentially also allowing no over-dispersion (see Supplementary Material 1). Parameter represents the ratio of cases that are being tested. We model it as where is fixed (chosen in exploratory fashion as a guess of the ratio of cases being tested, see Table 2) and is noise around it with prior distribution Furthermore, is discretized distribution for time from infection to a positive test , which is assumed to be a sum of two independent times: infection to onset (infection – onset) and onset to positive test (onset – positive) To sum up, the number of confirmed cases is the sum of the past infections weighted by their probability of transition from to . Importantly, , , , and are fixed, so is deterministic. For the infection – onset time we used , as in [7]. The other deterministic parameters depend on the testing strategies and were chosen in exploratory fashion, the chosen values are given in Table 2.

Table 2

Chosen values for deterministic parameters, defined in Table 1, where parameters were taken as in Flaxman et al. [7] and others were chosen in exploratory fashion.

X	μX	ξX	τX
O	5.1	0.86
P	4	0.25	0.1
HI	2	0.45	0.07
HO	10	0.45
UI	2	0.45	0.2
UO	11	0.45
DH	8	0.45	0.2
DC	18	0.45	0.006

The data for the number of admissions and releases from hospitals and ICUs, and the number of deaths are incorporated in the model following the same logic as , where we assumed the conditional independence of different data sources given the model parameters (for more details see Section 2.1.3). Hence, the expected values originate from different data sources (see Fig. 1) with different discretized times and ratios . Formulas for expected number of individuals at a specific state can be unified through where is an appropriate expected number of individuals at a previous state (see Fig. 1), and the meaning of and remains the same. To be succinct, we present the rest of the model in Table 1. Note that column is fixed to 1 for hospital out and ICU out. This holds when is just time-lagged , meaning that this data source does not separate deceased patients from cured. The chosen values for the deterministic parameters are given in Table 2. The assumed distributions of various times used in our analyses are shown in Fig. 2.

Table 1

Modeling the expected number of individuals at a specific state (first column) from the expected number of individuals at a previous state (third column, see also Fig. 1) via formula (15).

Quantity	Notation	xk	πX	τ∗X
Positive cases	pk	ck	Γ(μO,ξO)+Γ(μP,ξP)	τPηP
Hospital in	hkI	ck	Γ(μO,ξO)+Γ(μHI,ξHI)	τHIηHI
Hospital out	hkO	hkI	Γ(μHO,ξHO)	1
ICU in	ukI	hkI	Γ(μUI,ξUI)	τUIηUI
ICU out	ukO	ukI	Γ(μUO,ξUO)	1
Death in hospitals	dkH	hkI	Γ(μDH,ξDH)	τDHηDH
Death outside the hospitals	dkC	ck	Γ(μO,ξO)+Γ(μDC,ξDC)	τDCηDC

Expected number of hospitalized patients and number of patients in ICUs are modeled differently compared to others. We used and similarly for . We use data on in addition to and due to some inconsistencies in our data sources between the three and we did similar for ICU. Note also that we do not have a possibility to go from to ‘death from ICUs‘ as there is no such data available for Slovenia. Modeling the expected number of individuals at a specific state (first column) from the expected number of individuals at a previous state (third column, see also Fig. 1) via formula (15). Chosen values for deterministic parameters, defined in Table 1, where parameters were taken as in Flaxman et al. [7] and others were chosen in exploratory fashion. Daily number of deaths is then forecasted using . Since the daily number of deaths was consistent in our data, i.e., , additional modeling of was not necessary.

Model specification

According to Bayes theorem, the joint posterior distribution is proportional to where is the likelihood of the data and is the joint prior distribution for all the model parameters . Specifically, the vector of parameters is where goes through all data sources, i.e. (positive cases), (hospital), (hospital in), (hospital out), (ICU), (ICU in), (ICU out), (death in hospitals) and (death outside hospitals). The data are We assume conditional independence of different data sources given the model parameters, so the likelihood can be expressed as where is the probability density function of as specified in (10), similarly for others. We assume independent prior distributions if not stated otherwise in the previous subsections. In particular, the joint prior distribution is where the distributions of priors, from left to right, are specified in (11), (14), (5), (4), (8), (9) whereas all other parameters of that are not specified in the above joint prior are transformed parameters via (2), (3) or (7), (12), (13), (15), (16). In addition to the posterior distributions for all the parameters, including the expected numbers of individuals at all states (), the outputs of our model are also their forecasts for a specified number of days based on the posterior predictive distribution (the reliability of forecasts is validated using methods presented in Section 2.4). A possible extension of the core model (Fig. 1), where hyperparameters are colored green.

Possible extensions and modifications of the model

The described model could be extended to take into account data from different countries (or different regions etc.) and make estimation and forecasts at country level. A possible generalization is presented in Fig. 3. Note that there are three options for modeling each parameter, for brevity we only describe them for . The first option is to assume that all countries have the same parameter , which could lead to a poor fit due to using only one parameter. The second option is that each country has its own and that they are independent of each other, i.e. giving each its own prior, which could lead to overfitting problems due to many parameters. The third option is to model this parameter hierarchically as in Fig. 3, i.e. with a prior distribution for . In this way, we allow for the heterogeneity of parameters across countries, while using a common distribution with hyperparameter to introduce some dependence and constrain them in order to avoid the overfitting problem.

Fig. 3

A possible extension of the core model (Fig. 1), where hyperparameters are colored green.

Furthermore, the model could be easily modified to correspond to the availability and reliability of the data collected in a specific country as certain branches of the model presented in Fig. 1, Fig. 3 could be removed. In fact, this is why the expected number of patients admitted to hospitals and expected number of deaths outside of hospitals are modeled directly from the expected number of truly infected , instead through positive cases . Namely, data on positive cases for the COVID-19 pandemic are often unreliable and depend strongly upon testing strategy which changes during the course of the pandemic and substantially varies between different countries. In this way, the forecasts of our model are not driven by the number of positive cases, which, if deemed necessary, can also simply be excluded from the model.

Model stratified by hospitals

Here we show a modification of our model to incorporate data on hospital level. The model is presented in Fig. 4.

Fig. 4

Model stratified by hospitals.

Infections are modeled as before, then separate submodels for hospitals are connected to . Data for hospitals are denoted by , , , , similarly for ICU data. For each hospital we use the same model as in Section 2.1.2, where the parameters (representing the probability of admission to hospital ) are hospital specific, whereas all other parameters are not hospital specific, see Fig. 4. We opted for this assumption in our application to Slovene data for two reasons: (1) the daily numbers for hospitals were small (increasing the variability of our estimates), and (2) the organization of public health care in Slovenia is unified across the entire country, COVID-19 patients are treated only in public hospitals. This assumption could be easily relaxed in the model, if needed. Model stratified by hospitals. For Slovene hospitals we chose (in the core model was used, see Table 2) where the weights were determined from the data, yielding , , and for University Medical Centre (UMC) Ljubljana, UMC Maribor, University Clinic (UC) Golnik and General Hospital (GH) Celje, respectively (COVID-19 patients in Slovenia have been admitted only to these four hospitals).

Computational aspects and model validation

Convergence of the models was assessed based on statistics [23], trace plots, estimated posteriors, mean with 50%, and 90% credible intervals for , (the last day of studied period) and the parameters for each chain individually. Effective sample size is also reported. Sensitivity analysis was performed for mean parameter of serial distribution considering , 6.5, and (see Supplementary Material 2 and 3). Folded-normal distribution was compared with other commonly used prior distributions for positive quantities, i.e., gamma and log-normal distributions (see Supplementary Material 1). Several choices for the over-dispersion parameter of the assumed negative binomial distributions were considered (see Supplementary Material 1). The differences were not substantial. The model presented in Section 2.1 is also compared with the Flaxman model [7] using Slovene data, showing that the estimate of for Slovenia after the final intervention based on our model is larger with slightly narrower CI (see Supplementary Material 1 for more details). The effect of including several data sources in our model is shown in Supplementary Material 1; using only a single data source, i.e. data on the daily number of deaths results in wider CIs. For each model we evaluate the quality of out-of-time forecasts for future days as follows. It was decided that forecasts will not be made until the occurrence of 10 cumulative deaths, which occurred in Slovenia on March 29, 2020; the models were then refitted daily up to and including May 24, 2020, evaluating the quality of the out-of-time forecasts for . The refitted models were used to compute the exact expected log predictive density (EELPDS) [24] separately for each variable (deaths, hospital, ICU, etc.) and with larger values of EELPDS implying better predictive ability. The models were fitted in R [25] (R version 3.6.3) using R package rstan [26]. The models from May 6, 2020 onwards are fitted using 1600 iterations (800 warmup iterations), 4 chains, keeping every 4th sample, setting the target average proposal acceptance probability during standard adaptation period to 0.98 and putting a cap on the depth of the trees that it evaluates during each iteration to 15. The earlier models were fitted using 2600 iterations (1300 warmup iterations) setting the target average proposal acceptance probability to 0.995 with the other settings as described before. All the code needed to reproduce the results is available on GitHub [27]. The analysis was performed on cluster of CentOS based containers.

Application to Slovene COVID-19 data

In this section, we apply the proposed modeling framework to publicly available Slovene COVID-19 data [22]. We have analyzed the data up to and including June 3, 2020. For a brief presentation of the data see Fig. 5 where for the study period we plot the number of patients treated in hospitals and ICUs on a given date, cumulative number of deaths and cumulative number of deaths occurring in hospitals. First infection was confirmed on March 4, 2020 and in total there were 1477 infections in the studied period; the largest number of confirmed infections in a single day occurred on March 26 when 61 new infections were confirmed. First death due to COVID-19 occurred on March 14, 2020; in total there were 109 deaths, out of these 55 (50.5%) occurring in the hospitals. First patients were admitted to hospitals and ICUs on March 11, 2020; in total 352 and 73 patients were admitted to hospitals and ICUs, respectively, with the largest number of new admissions occurring on March 27, 2020 and March 25, 2020, when 23 and 6 new patients were admitted to hospitals and ICUs, respectively.

Fig. 5

Number of patients treated in hospitals and ICUs on a given date, cumulative number of deaths and cumulative number of deaths occurring in hospitals in Slovenia, March 4–June 3, 2020.

In order to control the epidemic, Slovene Government banned public events on March 10, 2020, which was followed by implementation of complete lock-down on March 20, 2020 and prohibition of movement outside of the municipality of residence on March 30, 2020. The government started to relax the restrictions on April 30, 2020. Correspondingly, were set to these respective dates when modeling . Number of patients treated in hospitals and ICUs on a given date, cumulative number of deaths and cumulative number of deaths occurring in hospitals in Slovenia, March 4–June 3, 2020.

Results for the model at the national level

The first goal was to estimate and the number of infected individuals, and make forecasts for, among others, the number of hospitalized patients and number of patients in ICUs using data at country level. For this purpose, we use the model from Section 2.1. A series of selected results is presented in Fig. 6; complete results are available as Supplementary Material 2.

Fig. 6

First row: EELDPS values for number of hospitalized patients (A) and in ICU (B) based on models that are fitted with data until May 24, 2020; estimated 50% and 90% CI for the of SARS-CoV-2 for Slovenia using natural spline with marked days of NPIs (C). Second row shows 7-day forecasts for hospitalized patients and third row for ICU patients using natural spline. The observed number of patients (brown) are shown together with the respective predicted values and 7-day forecasts (darker and lighter shaded blue areas represent 50% and 90% CI, respectively); results are shown for fitted models until March 30 (D and G), April 19 (E and H), May 24 (F and I).

Estimated number of infected individuals and basic reproduction number were very similar when modeling using splines and piece-wise constant function (see Supplementary Material 2). We have estimated that 7287 people (0.350%; 90% credible interval (CI): [0.245–0.573]%) have been infected with SARS-CoV-2 throughout the study period when modeling with splines. The basic reproduction number has increased from 3.17 (90% CI [2.74–3.59]) to 3.92 (90% CI [1.56–9.68]) until the adoption of measures to control the epidemic, after which the effective started to decrease reaching its lowest value of 0.17 (90% CI [0.05–0.51]) at the end of the study period (see also Fig. 6, panel C). Based on these results it can be concluded that the adopted NPIs in Slovenia were effective in slowing down the spread of the epidemic which eventually resulted in the end of the first wave. Using piece-wise constant function in the model, the largest and smallest estimated were less extreme, 3.21 [1.00–5.81] and 0.48 [0.44–0.52], respectively. This is due to a supposition that does not change between interventions and can therefore be only interpreted as average in the period between two successive interventions, whereas when using splines can be interpreted on a daily basis. First row: EELDPS values for number of hospitalized patients (A) and in ICU (B) based on models that are fitted with data until May 24, 2020; estimated 50% and 90% CI for the of SARS-CoV-2 for Slovenia using natural spline with marked days of NPIs (C). Second row shows 7-day forecasts for hospitalized patients and third row for ICU patients using natural spline. The observed number of patients (brown) are shown together with the respective predicted values and 7-day forecasts (darker and lighter shaded blue areas represent 50% and 90% CI, respectively); results are shown for fitted models until March 30 (D and G), April 19 (E and H), May 24 (F and I). Based on the estimates of the number of infected individuals and estimated number of deaths, we estimate, using splines when modeling , that throughout the study period infection fatality rate (IFR) was 1.56% (90% CI [0.94–2.21]%). In Slovenia, the share of estimated deaths outside the hospitals for this period was large, 47.15% (90% CI [38.5–55.26]%), due to deaths in the retirement homes, since there were many outbreaks therein. Excluding deaths not occurring in the hospitals results in the IFR of 0.8% (90% CI [0.48–1.26]%). It is also estimated that throughout the study period the proportion of unidentified cases (defined as 1 minus the ratio of the number of positive and infected cases), which can account for asymptomatic cases, was equal to 88% (90% CI [83-93]%). Fig. 6 shows 3 examples of models fitted to various end-dates. The quality of the out-of-time forecasts as measured by EELPDS was better when using splines, with good ability to make out-of-time forecasts throughout the course of the Slovene COVID-19 epidemic (see Fig. 6, panels A, B). Considering only early data, the longer the interval between end-date and final time-point prediction the wider the CI, often even being uninformative (see Fig. 6, panels D, G). As expected, when more data are available the precision of the forecasts improves with much narrower CIs (see Fig. 6, panels E, F, H, I). It is also important to note that the estimates of at the final time-point can be imprecise when the final time-point is near the last intervention. This is a natural consequence of having few data when estimating . Therefore, a long enough interval between the interventions and after the latest intervention is required in order to obtain informative estimates. For example, final estimates as given by 90% CI when using data up to and including May 1 (one day after some restrictions were lifted), May 5 and May 10 were [0.11–2.47], [0.05–1.40] and [0.03–0.61], respectively, when using splines and similarly (although less precise with wider CIs) when using piece-wise constant function (see Supplementary Material 2). First row: EELDPS values for number of hospitalized patients in UMC Ljubljana (A) and UMC Maribor (B) based on models that are fitted with data until May 24, 2020; estimated 50% and 90% CI for the of SARS-CoV-2 for Slovenia using natural spline with marked days of NPIs (C). Second row shows 7-day forecasts for hospitalized patients in UMC Ljubljana and third row in UMC Maribor using natural spline. The observed number of patients (brown) are shown together with the respective predicted values and 7-day forecasts (darker and lighter shaded blue areas represent 50% and 90% CI, respectively); results are shown for fitted models until March 30 (D and G), April 19 (E and H), May 24 (F and I).

Results for model at the hospital level

Here, we use the data on hospital level with the goal of estimating and the number of infected individuals and to forecast the number of COVID-19 patients in each hospital. We use the model presented in Section 2.3. A series of selected results are presented in Fig. 7; complete results appear as Supplementary Material 3.

Fig. 7

First row: EELDPS values for number of hospitalized patients in UMC Ljubljana (A) and UMC Maribor (B) based on models that are fitted with data until May 24, 2020; estimated 50% and 90% CI for the of SARS-CoV-2 for Slovenia using natural spline with marked days of NPIs (C). Second row shows 7-day forecasts for hospitalized patients in UMC Ljubljana and third row in UMC Maribor using natural spline. The observed number of patients (brown) are shown together with the respective predicted values and 7-day forecasts (darker and lighter shaded blue areas represent 50% and 90% CI, respectively); results are shown for fitted models until March 30 (D and G), April 19 (E and H), May 24 (F and I).

As with the analysis at the national level, using splines to model the reproduction number resulted in larger EELPDS providing reasonable short-term forecasts (for time intervals days) for each hospital throughout the course of the Slovene COVID-19 epidemic (see Fig. 7, panels A, B). A similar time trend is estimated for the reproduction number as in the analysis at national level (see Fig. 7, panel C), providing similar estimates of the total number of infected individuals. As expected, since essentially no new information was introduced into the model, the precision of the estimates is very similar as in the previous analysis.

Discussion

The idea behind the model of Flaxman was to improve the accuracy of estimates (i.e., to narrow its credible intervals) with pooling information across different countries while only using data on number of deaths. To achieve this, several strong assumptions had to be made. For instance, they had to assume that the effect of each NPI was the same across countries. Our model uses several data sources from the same country to make estimates as accurate as possible without making such strong assumptions. Using Slovene data, we have illustrated the gain in efficiency by using several data sources in comparison with using a single data source, i.e. daily number of deaths. The modeling framework can be adapted to use data at different levels of aggregation, as was demonstrated on the Slovene data when making forecasts at the hospital level. As expected, in our application this did not further improve the efficiency of our estimates since essentially no new information was added in the model. However, having separate forecasts for each hospitals was important from a practical perspective for monitoring the state of the Slovene health-care system. With semiparametric model, there are several parameters which need to be specified and are influencing the estimates. Some of these parameters are obtained from the literature, the others can be estimated from patient level data, if available, and plugged into the model, e.g., the duration of time spent in hospital and ICU. However, if some parameters cannot be determined, the model can be simplified by omitting these parameters along with the data connected to them. The model estimates and forecasts are largely driven by data, hence their accuracy relies heavily on the quality of data. If some data sources are not reliable enough or are not available then the model can be simplified by omitting them. We have explained how sharing and/or pooling information across different countries and/or regions could be done, similarly as in Flaxman et al. [7], but this extension has to be used with caution since it adds new assumptions to the model, which are difficult to verify. The extended model would have to be thoroughly validated. When applying the proposed framework to Slovene data, we have estimated that reduced after adopting the NPIs showing their efficacy in controlling the transmission of SARS-CoV-2. However, after adopting all NPIs, as estimated by our model was larger than when using the Flaxman model [7], suggesting that for the case of Slovene data, using the data from other countries would result in over-estimation of the effect of the NPIs on the reduction of . This is also in line with our previous results [21], where we have observed that excluding the two of the most affected European countries (Italy and Spain) from the Flaxman model, results in much smaller estimated reduction of after adopting the NPIs in Slovenia. The effects of the NPIs seem to be the largest in these two countries. Based on the proposed model we have estimated the proportion of cumulatively infected people (attack rate) of 0.350% (90% credible interval (CI): [0.245–0.573]%) of the Slovene population suggesting that Slovenia had one of the smallest attack rates in Europe. At the same time, we have estimated the infection fatality rate (IFR) of 1.56% (90% CI [0.94–2.21]%). While estimating the IFR is difficult, especially for COVID-19, other studies tend to report it around 0.5%–1% [28], which implies a large estimated IFR for Slovenia. This can be explained by noting that during the first wave of Slovene epidemic, the virus transmitted largely among older people inside retirement homes who are at much higher risk of dying due to COVID-19. Up to May 4, around 80% of all deaths occurred among people from retirement homes [29] and all deaths outside the hospitals were in retirement homes. Excluding the estimated number of deaths occurring outside the hospitals yields a much smaller IFR of 0.80% (90% CI [0.48–1.26]%) thus more closely agreeing with other studies. It is of importance therefore to prevent the spread of the virus among elderly. The estimated percentage of unidentified cases based on Slovene data was equal to 88% (90% CI [83-93]%). Unidentified cases can to some extent be attributed to asymptomatic or mild symptomatic cases but could also reflect testing strategy of a country. Asymptomatic cases can transmit the virus [30] causing difficulties in the control of the epidemic [31]. Other studies have estimated that 40% to 45% of those infected with SARS-CoV-2 remain asymptomatic [32] with large differences between the studies (e.g., 17.9% in [33] to 87.9% in [34]). A large estimate for Slovenia might be a consequence of the fact that people with mild symptoms during the first wave were instructed to self-isolate and were in large majority not tested for SARS-CoV-2.

Conclusion

We have presented a semiparametric framework for modeling and forecasting COVID-19 pandemic combining Bayesian inference with current epidemiological knowledge about SARS-CoV-2. It is based on the model set by Flaxman et al. [7], whilst giving important extensions and modifications in the model. The presented framework can be used for estimating the number of individuals infected with SARS-Cov-2, its effective reproduction number, infection fatality rate, proportion of cumulatively infected people and proportion of unidentified cases, as well as for accurate short term forecasts of various important indicators of the state of epidemic for the healthcare system, e.g. the daily number of patients admitted to hospitals and ICUs. Furthermore, we have shown how the proposed framework could be extended to more countries/regions, thus allowing for comparison between them. Applying the framework to the Slovene data we found, that the adoption of NPIs was effective in limiting the first wave in Slovenia which mirrored in small estimated percentage of cumulatively infected people of 0.350%, 90% CI [0.245–0.573]%. Despite that finding, the estimated infection fatality rate for Slovenia was high (1.56%, 90% CI [0.94–2.21]%) which can be explained by entries of the virus into retirement homes. When eliminating some of the deaths that could be spared if there were no infections in retirement homes, an estimate for IFR would fall to 0.80% (90% CI [0.48–1.26]%). The proposed models can be easily implemented by those familiar with Bayesian inference. The R code of our model applied to Slovene data, which is partially based on the freely available R code provided by Flaxman et al. [7], is available on GitHub [27].

CRediT authorship contribution statement

Damjan Manevski: Formal analysis, Software, Writing - review & editing. Nina Ružić Gorenjec: Methodology, Formal analysis, Writing - review & editing. Nataša Kejžar: Formal analysis, Writing - review & editing. Rok Blagus: Conceptualization, Methodology, Formal analysis, Software, Writing - original draft, Supervision, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

10 in total

1. Extended compartmental model for modeling COVID-19 epidemic in Slovenia.

Authors: Miha Fošnarič; Tina Kamenšek; Jerneja Žganec Gros; Janez Žibert
Journal: Sci Rep Date: 2022-10-08 Impact factor: 4.996

Review 2. A review on COVID-19 transmission, epidemiological features, prevention and vaccination.

Authors: Yuqin Zhang; Gonghua Wu; Shirui Chen; Xu Ju; Wumitijiang Yimaer; Wangjian Zhang; Shao Lin; Yuantao Hao; Jing Gu; Jinghua Li
Journal: Med Rev (Berl) Date: 2022-03-02

Review 3. Non-pharmaceutical interventions during the COVID-19 pandemic: A review.

Authors: Nicola Perra
Journal: Phys Rep Date: 2021-02-13 Impact factor: 25.600

4. Estimating a continuously varying offset between multivariate time series with application to COVID-19 in the United States.

Authors: Nick James; Max Menzies
Journal: Eur Phys J Spec Top Date: 2022-01-11 Impact factor: 2.891

5. Comparing the dynamics of COVID-19 infection and mortality in the United States, India, and Brazil.

Authors: Nick James; Max Menzies; Howard Bondell
Journal: Physica D Date: 2022-01-19 Impact factor: 2.300

6. Modelling the effect of non-pharmaceutical interventions on COVID-19 transmission from mobility maps.

Authors: Umair Hasan; Hamad Al Jassmi; Abdessamad Tridane; Anderson Stanciole; Farida Al-Hosani; Bashir Aden
Journal: Infect Dis Model Date: 2022-07-14

7. Assessing the impact of non-pharmaceutical interventions (NPIs) and BCG vaccine cross-protection in the transmission dynamics of SARS-CoV-2 in eastern Africa.

Authors: Chelsea Mbeke Kilonzo; Mark Wamalwa; Solange Youdom Whegang; Henri E Z Tonnang
Journal: BMC Res Notes Date: 2022-09-04

Review 8. The methodologies to assess the effectiveness of non-pharmaceutical interventions during COVID-19: a systematic review.

Authors: Nicolas Banholzer; Adrian Lison; Dennis Özcelik; Tanja Stadler; Stefan Feuerriegel; Werner Vach
Journal: Eur J Epidemiol Date: 2022-09-24 Impact factor: 12.434

9. Estimation of SARS-CoV-2 antibody prevalence through serological uncertainty and daily incidence.

Authors: Liangliang Wang; Joosung Min; Renny Doig; Lloyd T Elliott; Caroline Colijn
Journal: Can J Stat Date: 2022-08-04 Impact factor: 0.758

10. Trends in COVID-19 prevalence and mortality: A year in review.

Authors: Nick James; Max Menzies
Journal: Physica D Date: 2021-06-07 Impact factor: 2.300

10 in total