Range of reproduction number estimates for COVID-19 spread.

Damiano Pasetto, Joseph C Lemaitre, Enrico Bertuzzo, Marino Gatto, Andrea Rinaldo.

Abstract

To monitor local and global COVID-19 outbreaks, and to plan containment measures, accessible and comprehensible decision-making tools need to be based on the growth rates of new confirmed infections, hospitalization or case fatality rates. Growth rates of new cases form the empirical basis for estimates of a variety of reproduction numbers, dimensionless numbers whose value, when larger than unity, describes surging infections and generally worsening epidemiological conditions. Typically, these determinations rely on noisy or incomplete data gained over limited periods of time, and on many parameters to estimate. This paper examines how estimates from data and models of time-evolving reproduction numbers of national COVID-19 infection spread change by using different techniques and assumptions. Given the importance acquired by reproduction numbers as diagnostic tools, assessing their range of possible variations obtainable from the same epidemiological data is relevant. We compute control reproduction numbers from Swiss and Italian COVID-19 time series adopting both data convolution (renewal equation) and a SEIR-type model. Within these two paradigms we run a comparative analysis of the possible inferences obtained through approximations of the distributions typically used to describe serial intervals, generation, latency and incubation times, and the delays between onset of symptoms and notification. Our results suggest that estimates of reproduction numbers under these different assumptions may show significant temporal differences, while the actual variability range of computed values is rather small.

Keywords: COVID-19; Generation time; Renewal equation; Reproduction number; SEIR-Based model

Year: 2020 PMID: 33342517 PMCID: PMC7723757 DOI: 10.1016/j.bbrc.2020.12.003

Source DB: PubMed Journal: Biochem Biophys Res Commun ISSN: 0006-291X Impact factor: 3.575

Introduction

One of the key diagnostic tools of the current COVID-19 pandemic is the effective reproduction number $R(t)$, a dimensionless quantity updated in time that discriminates whether epidemiological data at time $t$ underpin a growth in the number of new secondary infections ($R(t) > 1$) [1-3]. Instead, the basic reproduction number, $R_0$, is the number of secondary infections generated from an initial case in an entirely susceptible population. As the number of infections progresses, $R(t)$ is assumed to describe, not only for severe acute respiratory syndromes, the number of secondary infections generated within a population comprising immune individuals [4]. Together with other indicators, $R(t)$ forms the basis for characterizing unfolding outbreaks, planning containment measures and monitoring the effectiveness of epidemiological policies and emergency interventions. The science of COVID-19 has assessed strengths and weaknesses of current estimates of $R(t)$ [5]. The former include its immediacy and the wide insight it provides into the epidemiology of the virus, while the latter concern the uncertainty affecting key epidemiological parameters. This has been known for quite some time [4,6-15], and it has direct application with reference to COVID-19 [16-19]. In particular, complicating factors could bias the determination of $R(t)$ for the ongoing pandemic [5,6]. They are: incompleteness and inaccuracies of the epidemiological datasets (missing data, improper timing of detection, under-evaluation of asymptomatic infections by limited sampling effort); lack of detailed knowledge about the disease transmission mechanisms; and the common assumption of equating generation times (intervals between the acquisition of a primary infection and acquisition of an ensuing secondary infection) with serial times (intervals between the onset of symptoms in primary and secondary infections).
Despite these drawbacks, the conclusions drawn from early analyses are deemed valuable [4], whether provided by data analysis or via data-verified model studies (e.g. Refs. [16,17,20-32]). This paper aims at a comparative study determining the range of outputs produced by various methods and covert or overt assumptions incurred in computing $R(t)$. Specifically, our attention is concentrated on the distribution of the generation interval of the disease ($T_g$). If $T_g$ were without exception a single value, equal for example to 6 days, $R(t)$ would be determined exactly by the ratio of the number of new infections at the current time $t$ to the number of new infections at time $t - T_g$. Therefore, the index would represent exactly the instantaneous proliferation rate of an infection. In this case, $R(t) > 1$ indicates an exponential growth in the number of infections [13] and $R(t) < 1$ indicates a recessive epidemic phase. However, in reality the generation period is a random variable [27,32-34], described by a probability distribution function typically having a humped shape, with $T_g$ now denoting its mean value. In this more general case, bounds can be set for $R(t)$ in terms of $r$, the instantaneous growth rate [T$^{-1}$] of the number of new infections (Supplementary Material, section SM1). Wallinga and Lipsitch [13] proposed a detailed theoretical analysis on the impact of different distributions of the generation period, showing in particular that, when approximating the distribution as a Dirac delta centred at $T_g$, $R = e^{r T_g}$. We explore the impact of such an approximation on the results of the kind of epidemiological models typically used to analyze the dynamics of an unfolding epidemic. The use of simple models may be motivated by a number of factors: immediacy, the lack of detailed information about the spatial structure of the disease spread and the related data needs, and the fact that neglecting complications, e.g. population structure, when making inference in emerging outbreaks may have little effect [12]. This is the basic factor behind our choice of making inference from two diverse datasets and using two epidemiological models: the Swiss COVID-19 data [22] simulated through a SEIR-based stochastic model, and the Italian case [25,28] where $R(t)$ is estimated through a renewal equation. These models are meant to reproduce a broad range of possible conditions for practical inference, therefore making this analysis a comparative study of interest. This paper is organized as follows. The Methods section recaps the major tools and the range of assumptions adopted in the study. The Results section is organized around the effect of various assumptions concerning the generation/serial intervals on the computation of the time-evolving reproduction number via drastically different approaches. A discussion on the effective variation produced on the estimates then closes the paper.
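
The two limiting cases discussed above can be checked numerically. The following is a minimal sketch, not taken from the paper: it relies on the relation of Wallinga and Lipsitch [13], R = 1/M(-r), where M is the moment generating function of the generation-interval distribution. This gives R = exp(r T) for a Dirac delta at T and R = (1 + r/b)^a for a gamma distribution with shape a and rate b. The function names and the growth rate of 0.1 per day are illustrative assumptions; the gamma parameters are those quoted from Ref. [38] in the Methods.

```python
import math


def r_delta(growth_rate, mean_interval):
    """Reproduction number when every generation interval equals its mean."""
    return math.exp(growth_rate * mean_interval)


def r_gamma(growth_rate, shape, rate):
    """Reproduction number for a gamma generation interval: R = 1 / M(-r)."""
    return (1.0 + growth_rate / rate) ** shape


def r_from_ratio(new_cases_now, new_cases_one_interval_ago):
    """Fixed-interval case: ratio of new cases one generation interval apart."""
    return new_cases_now / new_cases_one_interval_ago


if __name__ == "__main__":
    r = 0.1  # illustrative growth rate (1/day), an assumption
    shape, rate = 1.87, 0.28  # gamma serial interval quoted from Ref. [38]
    mean = shape / rate
    print(f"mean interval   = {mean:.2f} d")
    print(f"R (Dirac delta) = {r_delta(r, mean):.3f}")
    print(f"R (gamma)       = {r_gamma(r, shape, rate):.3f}")
```

For a positive growth rate and the same mean interval, the Dirac approximation returns the larger of the two values, which is the sense in which it acts as an upper limit in the analysis of Ref. [13].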

Methods

SEIR-based compartmental model

Epidemiological models are fundamental tools not only to simulate the epidemic dynamics, but also to infer the temporal variations of $R(t)$. Here we consider the stochastic compartmental model used by Lemaitre et al. [22] to simulate the first months of the COVID-19 transmission in Switzerland. The model assumes that the population is divided into compartments that represent disease stages (see Fig. 1). After exposure to the virus by an infectious individual (I), a susceptible individual (S) goes through a latent/incubation period in compartment E, then becomes infectious, I. Depending on the clinical outcome, and if health care is sought, the individual either directly recovers (R), or develops severe symptoms (followed by death or recovery), or else gets hospitalized, H. Hospitalized individuals progress to discharge, Intensive Care Units (ICU) or death (D), and those in the ICU can either be discharged or die (the model equations are presented in section SM2). Note that we implement a dedicated compartment to describe the delay from severe symptom onset to hospitalization. Owing to the data at our disposal at the time of the study [22], we may pinpoint different residence times in the hospital and ICU stages depending on the final outcome, thus justifying the presence of further delay compartments.

Fig. 1

Schematic diagram of COVID-19 transmission and hospitalization processes. There are two sinks: death D and recovered R. Stages E and I are each implemented with several sub-compartments, to better represent the residence times in those stages. The two inserts show the residence time distributions in compartments E and I for the four models considered, M1-M4.

As for the model parametrization, in this retrospective study we keep the same setup presented in Ref. [22], corresponding to the evidence provided by data as of May 13, 2020. The model is implemented as a Hidden Markov Model (HMM) using the POMP package in R [35], and it is calibrated against data on current hospitalizations and death incidence through a maximum likelihood inference (see section SM3 for more details). In particular, the infectious rate, which governs the force of infection and, thus, the number of newly exposed individuals, is initialized as a random walk with unknown variance, and it is calibrated via an appropriate filter. Then, the effective reproduction number $R(t)$ is directly computed from the infectious rate and the mean residence time of an individual in the infectious stage. Here, we focus on the sensitivity of the estimated $R(t)$ under possible different assumptions in modeling stages E and I. The mean residence times in these compartments are determined by assuming a mean generation time of 5.2 days [27], and an exposed and non-infectious duration of 2.9 days [36], hence a mean duration of 4.6 days in the infectious compartments (i.e. 5.2 = 2.9 + 4.6/2). Then, E and I are each subdivided into a number of sub-compartments, to better represent the possible humped shape of the residence times in those stages. A single compartment for each stage means that the residence times are exponentially distributed with means of 2.9 and 4.6 days, respectively. Increasing the numbers of sub-compartments leads to gamma distributions more concentrated around the mean values, tending to the limit of a deterministic residence time spent in the two compartments (two Dirac delta distributions). Although this latter assumption is unrealistic (it would mean that all individuals have the same incubation period and period of infectiousness), it represents a drastically simpler model, reducing the two distributions of residence times to atoms of probability at their means. The analysis in Ref. [22] highlighted that one compartment for E and three compartments for I characterized the best model setup, corresponding to an exponentially distributed time in the exposed compartment and a gamma of shape factor 3 for the infectious stage. Here we test the sensitivity of the estimated $R(t)$ under four model formulations characterized by different numbers of E and I sub-compartments:

M1: one compartment for E and three for I (reference scenario);
M2: multiple sub-compartments for both E and I, as an intermediate scenario depicting two hump-shaped gamma distributions;
M3: larger numbers of sub-compartments for both E and I, where the gamma distributions are almost collapsed on the mean value;
M4: the limit case in which the residence times in E and I are deterministically imposed (Dirac delta distribution).

For each of these formulations we performed the calibration of the model parameters and the temporal estimation of $R(t)$ as in Ref. [22]. Note that the mean residence times in E and I are the same across the formulations, as only the shape of the distributions varies (Fig. 1). The generation times associated with each of these formulations have a complicated structure [37] that, in the limit case, tends to a uniform distribution over the period of infectiousness (Figure SM1). It is important to underline that, if the distribution of the generation times is available (e.g. by approximation with the serial interval), the integro-differential model considering the age of the infection is the most suitable approach to preserve the generation times (see section SM1). As an alternative, if the generation time is well approximated by an Erlang distribution (a gamma distribution with integer shape factor), the proposed SEIR model can replicate this distribution through a suitable choice of the numbers of sub-compartments and by imposing the same rate in each sub-compartment. For example, the serial interval estimated in the early stage of the pandemic in Italy was modelled as a gamma distribution of shape factor 1.87 and rate 0.28 d$^{-1}$ [38], which is well approximated by an Erlang distribution and therefore corresponds to a particular configuration of the considered SEIR model. In this case, increasing only the number of sub-compartments (while preserving the original mean generation time) leads to a model whose distribution of generation times gets closer to its mean value, until degenerating into a Dirac delta distribution.
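
The effect of splitting a stage into sub-compartments can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the calibrated model of Ref. [22]: a stage with mean residence time m, split into n sub-compartments in series that are each left at rate n/m, has an Erlang-distributed residence time with mean m and coefficient of variation 1/sqrt(n). The function names, sample sizes and random seed are hypothetical.

```python
import math
import random
import statistics


def erlang_cv(n_sub):
    """Theoretical coefficient of variation for n_sub sub-compartments in series."""
    return 1.0 / math.sqrt(n_sub)


def sample_residence_times(mean_time, n_sub, n_samples, seed=0):
    """Sample total residence times as sums of n_sub exponential stays."""
    rng = random.Random(seed)
    rate = n_sub / mean_time  # rate of leaving each sub-compartment
    return [
        sum(rng.expovariate(rate) for _ in range(n_sub))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    mean_infectious = 4.6  # days, mean infectious duration used in the text
    for n_sub in (1, 3, 10, 100):
        times = sample_residence_times(mean_infectious, n_sub, 5000)
        mean = statistics.fmean(times)
        cv = statistics.pstdev(times) / mean
        print(f"n = {n_sub:3d}: mean = {mean:.2f} d, "
              f"CV = {cv:.2f} (theory {erlang_cv(n_sub):.2f})")
```

The mean stays fixed while the spread shrinks as n grows, which is the sense in which a large number of sub-compartments approaches a Dirac delta residence time.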

The renewal equation

The estimation of the generation time from epidemiological data is a crucial step to evaluate the temporal dynamics of the reproduction index $R(t)$. Since the early stages of the pandemic in China (e.g. Refs. [20,21]) and in Italy (e.g. Refs. [38-40]), $R(t)$ has been estimated through a Bayesian approach taking into account an approximation of the generation time distribution (based on the serial interval) and the possible delays in the confirmation of a positive case. The computation is based on the time series of cases for which the date of symptom onset is known, in the following indicated with $C(t)$, and the distribution of the generation times, in the following represented by $\omega(s)$. The expected value of the number of symptomatic cases on a certain day, $\mathbb{E}[C(t)]$, can then be modelled through the renewal equation, i.e. by the convolution between the antecedent cases and the distribution of the generation times, all multiplied by the daily reproduction index, $R(t)$:

$$\mathbb{E}[C(t)] = R(t) \sum_{s=1}^{t} \omega(s)\, C(t-s) \qquad (1)$$

Assuming that the data $C(t)$ are random samples from a Poisson distribution with mean $\mathbb{E}[C(t)]$, one can estimate the posterior probability distribution of $R(t)$ through Bayesian approaches, such as Markov Chain Monte Carlo, based on the corresponding likelihood function (see section SM4). It should be noted that a source of uncertainty in this model is introduced by assuming that the transmission is driven only by the known symptomatic cases, thus neglecting transmission due to asymptomatic individuals. In order to extend the estimation of $R(t)$ in Eq. (1) to all reported cases in Italy (see Figure SM2), one needs to take into consideration the possible delay between the beginning of infectiousness and the reporting. A second source of uncertainty is the distribution of the generation times for COVID-19 (the distribution $\omega$ in Eq. (1)), which is a topic largely discussed in the literature. A gamma distribution is typically fitted to the available data of symptom onset, thus assimilating the generation times to the serial interval. Cereda et al. [38] estimated that the distribution of the serial interval for Lombardy (and used for the whole of Italy) is a gamma of shape 1.87 and rate 0.28 d$^{-1}$. Here we want to assess the impact on the estimated $R(t)$ when using the following distributions for the generation times in Eq. (1):

G1: the same gamma distribution evaluated in Ref. [38] (reference results);
G2: a Dirac delta distribution centred at 6.67 days, corresponding to the mean value of the gamma distribution evaluated in Ref. [38]. In this case the convolution adopted in the estimation of $R(t)$ simplifies to a simple backward translation of the cases by 6.67 days;
G3: a uniform distribution over the period of infectiousness, in order to mimic the generation time associated with the model M4 described in the previous section.

In order to see the possible impact of the approximation when considering different data, we repeat the estimation of $R(t)$ for the following datasets (see Figure SM2):

D1: the cases ordered by date of symptom onset (for the cases for which it is known);
D2: the whole set of reported cases shifted back in time in accordance with the delay distribution estimated by Ref. [38] (gamma with shape factor 1.88 and rate 0.26 d$^{-1}$);
D3: the whole set of reported cases deterministically shifted back in time using the mean of the gamma distribution (7.23 d).
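
The renewal equation described above can be turned into a simple point estimate. The sketch below is not the Bayesian procedure used in the paper: it takes, for each day, the Poisson maximum-likelihood value, that is the observed cases divided by the convolution of past cases with the generation-time weights. The discretisation of the gamma density at integer lags, the rounding of the Dirac delta to an integer lag of 7 days, the function names and the synthetic case series are all assumptions made for illustration.

```python
import math


def gamma_weights(shape, rate, max_lag):
    """Discretised gamma generation-time weights for lags 1..max_lag (sum to 1)."""
    raw = [s ** (shape - 1.0) * math.exp(-rate * s) for s in range(1, max_lag + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def delta_weights(lag, max_lag):
    """Dirac delta at an integer lag (in days)."""
    return [1.0 if s == lag else 0.0 for s in range(1, max_lag + 1)]


def renewal_r(cases, weights):
    """Point estimate R(t) = C(t) / sum_s w(s) C(t - s); None where undefined."""
    estimates = []
    for t in range(len(cases)):
        pressure = sum(
            w * cases[t - s]
            for s, w in enumerate(weights, start=1)
            if t - s >= 0
        )
        estimates.append(cases[t] / pressure if pressure > 0 else None)
    return estimates


if __name__ == "__main__":
    # Synthetic series doubling every 7 days (made-up data, not from the paper).
    cases = [10.0 * 2 ** (t / 7) for t in range(40)]
    r_g1 = renewal_r(cases, gamma_weights(1.87, 0.28, 30))  # gamma, as in G1
    r_g2 = renewal_r(cases, delta_weights(7, 30))           # Dirac, as in G2
    for t in (30, 35, 39):
        print(f"day {t}: gamma R = {r_g1[t]:.2f}, Dirac R = {r_g2[t]:.2f}")
```

With the Dirac weights the estimate reduces to the ratio of cases 7 days apart, as noted for G2. Estimates for the first days are unreliable with any kernel, because the convolution then covers only part of the infection history.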

Results

Compartmental model: the Swiss case study

The inferred posterior distributions of $R(t)$ for Switzerland obtained with the different compartmental models show the same temporal trends under the considered configurations of the E and I compartments (Fig. 2a), implying that the flexibility of the calibration method adopted yields robust estimates of the parameters. However, pushing the residence times in E and I towards the Dirac delta distribution (models M2, M3, M4) results in a faster decrease of $R(t)$ with respect to the reference (model M1) during the first weeks of the outbreak. The median value of $R(t)$ estimated in M2-M4 frequently has a negative bias, reaching for many days a relative error of 50% (Fig. 2b). The most evident consequence is the anticipation of the crossing of the critical threshold $R(t) = 1$, which occurs up to 6 days before M1.

Fig. 2

Panel a): temporal dynamics of the median value of $R(t)$ for Switzerland inferred using the four compartmental models M1, M2, M3 and M4. The blue and green areas represent the 95% C.I. for the distributions associated with M1 and M4, respectively. Panel b): relative difference with respect to the reference value of $R(t)$ (median of model M1). The gray histogram shows the number of hospitalized individuals, the data used to fit the model.

To analyse the impact of using G1 (gamma), G2 (Dirac) and G3 (uniform) distributions for the generation times in Eq. (1), we start by considering the epidemiological data ordered by date of symptom onset (data D1, Figure SM2). When the dates of symptom onset are known, the estimate of $R(t)$ depends only on the distribution of the serial interval. In this case, the replacement of the gamma distribution (G1) with a Dirac delta of mean 6.67 days (G2) or a uniform distribution (G3) only marginally impacts the dynamics of the estimated posterior of $R(t)$ (Fig. 3a). Small discrepancies from G1 appear when one observes a sudden change in the value of the reproduction number, for example during March (i.e. during the Italian lockdown) and at the end of the summer, when the second wave started. The mean and maximum relative errors during the simulation are 9% and 44% for G2, and 8% and 42% for G3, respectively (Fig. 3b). For both G2 and G3, in 95% of the days the estimated $R(t)$ agrees with G1 in being larger (or smaller) than the critical threshold.

Fig. 3

Panel a): comparison between the time evolution of $R(t)$ (median and 95% confidence interval) estimated using Eq. (1), where the distributions of generation times are: G1 (gamma distribution, blue), G2 (Dirac distribution, red) or G3 (uniform distribution, green). The grey bars represent data D1 used for the estimation of $R(t)$, i.e. the Italian cases per date of symptom onset (SM). Panel b): relative differences between the median values of the reference (model G1) and the other two distributions.

Similar results are obtained when the estimates of $R(t)$ are computed starting from datasets D2 and D3 (Figures SM3-SM4). Table SM1 summarizes the analysis of the level of agreement of the estimated $R(t)$ with respect to their reference.
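
The comparison metrics used in this section (relative difference from the reference, agreement on the side of the critical threshold, and timing of the threshold crossing) are simple to compute once two median series are available. The following is a minimal sketch with hypothetical function names and made-up toy series; it does not reproduce the values reported in the paper.

```python
def relative_differences(reference, alternative):
    """Relative difference (alternative - reference) / reference for each day."""
    return [(a - r) / r for r, a in zip(reference, alternative)]


def threshold_agreement(reference, alternative, threshold=1.0):
    """Fraction of days on which both series lie on the same side of the threshold."""
    same_side = [
        (r > threshold) == (a > threshold)
        for r, a in zip(reference, alternative)
    ]
    return sum(same_side) / len(same_side)


def first_crossing(series, threshold=1.0):
    """Index of the first day the series drops below the threshold, or None."""
    for day, value in enumerate(series):
        if value < threshold:
            return day
    return None


if __name__ == "__main__":
    # Toy medians (made-up numbers): a reference and a faster-decaying alternative.
    reference = [2.0, 1.5, 1.2, 0.9, 0.8]
    alternative = [1.8, 1.2, 0.9, 0.8, 0.8]
    diffs = relative_differences(reference, alternative)
    print("max |relative difference|:", max(abs(d) for d in diffs))
    print("threshold agreement:", threshold_agreement(reference, alternative))
    print("crossing days:", first_crossing(reference), first_crossing(alternative))
```

In the toy series the alternative crosses the threshold one day earlier than the reference, the same kind of timing shift reported above for the Swiss models M2-M4 relative to M1.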

Discussion

In this paper we have explored the impact of a number of factors contributing to the estimation of the reproduction index $R(t)$. They are specifically: shape of the infectiousness kernels defining the probability distribution of the serial or of the generation intervals; use of deterministic or stochastic models; delay in case reporting; use of symptomatic vs total cases; use of detailed hospital data to infer key epidemiological parameters. Given the broad differences in approach and data, we expect that our comparative analyses may be reasonably representative of the spectrum of possible responses, and in particular of the expected range of variation one would have if the lack of detailed empirical evidence forces one to resort to simplifying assumptions. Our main result is that most assumptions concerning the simplification of the distribution of generation intervals have a marginal impact on the range of values of $R(t)$, yet a quite noteworthy one in terms of timing. This echoes the so-called Kohlberg-Neyman demystification [41], meant to explain to laypeople what goes on. Their basic approximation is that the number of daily new cases is multiplied by a constant factor over every period of a fixed number of days. This approximation yields a simple description of the spread of the virus in terms of these two parameters, suggesting an elementary method for estimating the reproduction number from the data on daily new infections. However, the method cumulates all uncertainties embedded in the data. While highly suggestive, the above results refer to national data for Switzerland and Italy alone, and neglect other factors such as age stratification of the relevant demography and of the connected epidemiological rates, changes in large-scale mobility patterns, heterogeneity in contacts by age and setting, and risks of infection in different employment sectors. However, the inclusion of these other factors would require a granularity of data unavailable in most real-life large-scale settings.
Moreover, models that ignore them have proved their worth in tackling the descriptions required to assess intervention strategies in the contagion dynamics well after the initial spread into a naïve population. We note that in both cases dealt with here a homogeneous spread model (subsumed by a spatially implicit model) is deemed appropriate, and the detail of the epidemiological data remarkable. This is not in contrast with studies that advocated the use of spatially explicit approaches, appropriate as they are to face what actually happened during the onset, which revealed crucial spatial effects for different connected communities, as in the COVID-19 Italian case, long treated as a paradigmatic case of spatial spread [28]. The neglect of spatial effects is reasonable for epidemics where local flare-ups have coalesced into a relatively homogeneous epidemic development. We thus conclude that, regardless of methodologically subjective choices or of limited evidence, the prompt communication of reproduction numbers underlying public statements about unfolding COVID-19 figures, when public and transparent to scrutiny, is generally reliable.

Credit authorship contribution statement

DP, JL: Conceptualization, Computations, Formal analysis, Writing, Review & Editing. EB, MG, AR: Conceptualization, Formal analysis, Writing - Review & Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

31 in total

1. Some model based considerations on observing generation times for communicable diseases.

Authors: Gianpaolo Scalia Tomba; Ake Svensson; Tommi Asikainen; Johan Giesecke
Journal: Math Biosci Date: 2009-10-23 Impact factor: 2.144

2. Temporal dynamics in viral shedding and transmissibility of COVID-19.

Authors: Xi He; Eric H Y Lau; Peng Wu; Xilong Deng; Jian Wang; Xinxin Hao; Yiu Chung Lau; Jessica Y Wong; Yujuan Guan; Xinghua Tan; Xiaoneng Mo; Yanqing Chen; Baolin Liao; Weilie Chen; Fengyu Hu; Qing Zhang; Mingqiu Zhong; Yanrong Wu; Lingzhai Zhao; Fuchun Zhang; Benjamin J Cowling; Fang Li; Gabriel M Leung
Journal: Nat Med Date: 2020-04-15 Impact factor: 53.440

3. Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19.

Authors: Alberto Aleta; David Martín-Corral; Ana Pastore Y Piontti; Marco Ajelli; Maria Litvinova; Matteo Chinazzi; Natalie E Dean; M Elizabeth Halloran; Ira M Longini; Stefano Merler; Alex Pentland; Alessandro Vespignani; Esteban Moro; Yamir Moreno
Journal: Nat Hum Behav Date: 2020-08-05

4. Inferring R0 in emerging epidemics-the effect of common population structure is small.

Authors: Pieter Trapman; Frank Ball; Jean-Stéphane Dhersin; Viet Chi Tran; Jacco Wallinga; Tom Britton
Journal: J R Soc Interface Date: 2016-08 Impact factor: 4.118

5. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.

Authors: Jacco Wallinga; Peter Teunis
Journal: Am J Epidemiol Date: 2004-09-15 Impact factor: 4.897

6. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study.

Authors: Qifang Bi; Yongsheng Wu; Shujiang Mei; Chenfei Ye; Xuan Zou; Zhen Zhang; Xiaojian Liu; Lan Wei; Shaun A Truelove; Tong Zhang; Wei Gao; Cong Cheng; Xiujuan Tang; Xiaoliang Wu; Yu Wu; Binbin Sun; Suli Huang; Yu Sun; Juncen Zhang; Ting Ma; Justin Lessler; Tiejian Feng
Journal: Lancet Infect Dis Date: 2020-04-27 Impact factor: 25.071

7. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy.

Authors: Giulia Giordano; Franco Blanchini; Raffaele Bruno; Patrizio Colaneri; Alessandro Di Filippo; Angela Di Matteo; Marta Colaneri
Journal: Nat Med Date: 2020-04-22 Impact factor: 87.241

8. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing.

Authors: Luca Ferretti; Chris Wymant; David Bonsall; Christophe Fraser; Michelle Kendall; Lele Zhao; Anel Nurtay; Lucie Abeler-Dörner; Michael Parker
Journal: Science Date: 2020-03-31 Impact factor: 47.728

9. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728

10. Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020.

Authors: Timothy W Russell; Joel Hellewell; Christopher I Jarvis; Kevin van Zandvoort; Sam Abbott; Ruwan Ratnayake; Stefan Flasche; Rosalind M Eggo; W John Edmunds; Adam J Kucharski
Journal: Euro Surveill Date: 2020-03
