
A Bayesian approach for monitoring epidemics in presence of undetected cases.

Andrea De Simone, Marco Piangerelli.

Abstract

One of the key indicators used in tracking the evolution of an infectious disease is the reproduction number. This quantity is usually computed from the reported number of cases, which ignores the fact that many more individuals may be infected (e.g. asymptomatic carriers). We develop a Bayesian procedure to quantify the impact of undetected infectious cases on the determination of the effective reproduction number. Our approach is stochastic and data-driven, and does not rely on any compartmental model. It is applied to the COVID-19 outbreak in eight different countries and all Italian regions, showing that the effect of undetected cases leads to estimates of the effective reproduction number larger than those obtained only with the reported cases by factors ranging from two to ten.


Keywords: Bayesian inference; COVID-19; Computational epidemiology; Stochastic process

Year: 2020 PMID: 32868967 PMCID: PMC7448881 DOI: 10.1016/j.chaos.2020.110167

Source DB: PubMed Journal: Chaos Solitons Fractals ISSN: 0960-0779 Impact factor: 5.944

Introduction

Tracking the evolution of the spread of an infectious disease is of primary importance during the whole course of any epidemic. An accurate evaluation of the transmission potential of the disease provides invaluable information to guide the decision-making process of control interventions, and to assess their effectiveness. One of the key epidemiological variables in this respect is the effective reproduction number R, defined as the average number of secondary cases per primary case of infection. In order to stop an epidemic, R needs to be persistently reduced to a level below 1. The issue of providing reliable estimates of R is particularly pressing during the on-going COVID-19 pandemic [1], and urgently calls for efforts towards a comprehensive mathematical modelling of the outbreak.

In this paper we propose a statistical framework for computing the effective reproduction number R, characterized by the following main features.

Stochastic. Our approach is stochastic, and not rooted in any deterministic framework of compartmental models, such as the Susceptible - Infectious - Recovered (SIR) model and its extensions. Although compartmental models can indeed provide very useful outcomes, especially for extrapolating the outbreak evolution into the near future, they rely on the simultaneous determination of all the coefficients appearing in the differential equations describing the dynamics of each compartment. In our approach, on the other hand, the time evolution of the system is modelled as a stochastic process, with a probability distribution associated to each random variable.

Real-time. The method provides a time series of estimates of R at each time step (e.g. one day), and not a single a posteriori value when the outbreak is almost over.

Bayesian. Within a Bayesian framework the results have a transparent probabilistic interpretation, the assumptions (priors) are explicit and their role is clearly tracked. A Bayesian updating procedure also accounts for the real-time evolution of the probabilities.

Comprehensive. Our method is explicitly designed to take into account the (unknown) number of undetected cases. However, it can be straightforwardly generalized to include any additional random variable affecting the reproduction number. Since the reproduction number is a highly complex quantity, affected by a wide variety of biological, environmental and social factors [2], this feature is particularly relevant.

Several methods for estimating the reproduction number have been proposed in the literature and have one or more of the above characteristics (see e.g. Ref. [3] for a review). The pioneering work of Ref. [4] laid the grounds for a stochastic approach to the problem. The sequential Bayesian method, which provides real-time estimates of R, is introduced in Ref. [5]. Comprehensive and real-time approaches, which also allow one to include the effect of undetected cases, have been recently developed in Refs. [6], [7], [8], although within the framework of compartmental models. Our work builds upon the advances carried out in Refs. [9], [10], where a real-time and Bayesian estimation method is developed, not tied to any compartmental model. The framework we develop in this work generalizes the formalism of Refs. [9], [10] to include many other factors affecting the reproduction number directly or indirectly. To the best of our knowledge, our approach is the first one combining all the components listed above. In particular, in this paper we are interested in assessing the role and impact of the number of undetected infection cases on the effective reproduction number.
There may be different reasons why an infected patient is undetected and does not appear in the official reports: individuals who do not show the symptoms of the disease but are able to infect others (asymptomatic carriers), individuals whose symptoms have not been linked to the disease under consideration (especially in the early stages of the outbreak), the impossibility of a complete population screening, etc. The number of undetected cases of infection may indeed be rather large. Recent studies of COVID-19 reported that many confirmed infections were asymptomatic, with no statistically significant difference in viral load with respect to the symptomatic cases: 43% for the sample of Ref. [11], 88% for the sample in Ref. [12]. These results justify the denomination of asymptomatic transmission of SARS-CoV-2 as the “Achilles’ heel” of current COVID-19 containment strategies [13].

Statistical model

We work in a Bayesian framework in which we treat all observations and parameters as random variables. The parameter of primary interest in this paper is the effective reproduction number, described by a continuous random variable R with prior probability density function p(r). We then consider the observed incidence cases (number of new infected individuals at time t) I_t and the unknown number of undetected cases U_t as non-negative integer-valued stochastic processes with discrete time index t. We will also consider the stochastic process T_t describing the total number of incident cases as a function of time, i.e. the sum of observed (reported) cases and undetected cases:

$$T_t = I_t + U_t .$$

Notice that, in general, I and U are dependent, and their dependence is encoded by the conditional variable U|I. The serial interval is described by the discrete random variable W. Its probability mass function p(w) provides the probability of a secondary case arising w time steps after a primary case.

Given a time window of τ time steps, over which R is assumed to be constant, we can split the times into two intervals: [0, t − τ] and [t − τ + 1, t]. To avoid notational clutter, we will indicate by I_< the set of random variables I_s at times s before the window and by I_> the set of random variables in the time interval [t − τ + 1, t]:

$$I_< \equiv \{I_0, \dots, I_{t-\tau}\}, \qquad I_> \equiv \{I_{t-\tau+1}, \dots, I_t\},$$

and similarly for U and T. In our numerical simulations leading to the results displayed in Sections 2 and A.3 the window length τ is kept fixed (in days).

By Bayes’ theorem, at any given time t > τ, the posterior probability density of R given the serial interval distribution, the incidence data history and the undetected cases in the time window is

$$p(r \mid w, i_<, i_>, u_>) \propto p(t_> = i_> + u_> \mid i_<, w, r)\, p(r),$$

where we assumed that W and R are independent (a generic dependence between them can be implemented in a straightforward way). Now, we can use the law of total probability to sum over the unknown number of undetected cases, assuming that U depends only on I.
This way, we obtain the posterior probability density for R given the serial interval distribution and the time series of incidence data up to time t:

$$p(r \mid w, i_<, i_>) \propto p(r) \sum_{u_>} p(u_> \mid i_>)\, p(t_> = i_> + u_> \mid i_<, w, r). \tag{4}$$

The sum over the undetected cases u ideally runs up to the total population minus the observed cases i (neglecting effects of acquired immunity), but in practice it is cut off much earlier by the distribution p(u | i), as discussed below. The only assumptions made to derive the posterior probability density in Eq. (4) are that W is independent of R and that U depends only on I. So, Eq. (4) quite generally describes how to incorporate the effect of undetected cases in a Bayesian statistical model for the effective reproduction number, given the serial interval and incidence data. We now turn to formulating our assumptions about each of the terms appearing in Eq. (4), and we will reach a simple analytical form, ready to use for numerical simulations.

The prior p(r) for R is assumed to be uniform over the interval [0,40]. For the prior distribution p(w) of the serial interval variable W we assume a continuous Gamma distribution (to be evaluated on integer values of w) described by two parameters, the shape parameter a and the rate parameter b:

$$p(w \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, w^{a-1} e^{-b w} .$$

The parameter values at the 1σ level are taken from Ref. [14], and the prior distributions of the serial interval parameters a, b are considered normal. Further details can be found in Appendix A.1.

The probability mass function of the undetected cases U, which we already assumed to depend on I only, can be modelled in many different ways, for example with decreasing probabilities associated to large values of undetected cases, and also in a time-dependent way. In this paper we adopt the simplest assumption of a discrete uniform distribution with a single parameter C: U_t | I_t ~ Uniform([0, C I_t]). This way, we are describing the situation where every number of undetected cases at a given time t is equally likely, up to at most C times the number of reported cases at the same time.
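The two modelling choices just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the shape and rate values used in the example are placeholders (not the values of Ref. [14]), and renormalising the Gamma density over the integers 1..w_max is a choice made here so that the weights sum to one.

```python
import math

def serial_interval_pmf(a, b, w_max):
    """Gamma(shape=a, rate=b) density evaluated at w = 1..w_max.

    Assumption of this sketch: the values are renormalised to sum to 1.
    """
    log_norm = a * math.log(b) - math.lgamma(a)
    dens = [math.exp(log_norm + (a - 1.0) * math.log(w) - b * w)
            for w in range(1, w_max + 1)]
    total = sum(dens)
    return [d / total for d in dens]

def undetected_pmf(u, i, c):
    """Discrete uniform pmf of u undetected cases on {0, ..., floor(c * i)}."""
    u_max = math.floor(c * i)
    return 1.0 / (u_max + 1) if 0 <= u <= u_max else 0.0

# Placeholder parameters, for illustration only.
pmf = serial_interval_pmf(a=2.0, b=0.3, w_max=30)
print(round(sum(pmf), 6))           # weights sum to one after renormalisation
print(undetected_pmf(5, 10, 2.0))   # any u in 0..20 has probability 1/21
```

With C = 2 and 10 reported cases, the undetected cases range over 0..20, so the total number of cases lies between 10 and 30.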
We leave the investigation of alternative (and more realistic) scenarios to future work. So, the probability mass function of the number of undetected cases, conditioned on the values of the incidence data and the parameter C, is

$$p(u_> \mid i_>, C) = \prod_{k=t-\tau+1}^{t} \frac{\chi_{[0,\, C i_k]}(u_k)}{\lfloor C i_k \rfloor + 1}, \tag{7}$$

where χ is the indicator function. The prior distribution of the continuous parameter C is assumed to be an uninformative uniform prior between 0 and 2: C ~ Uniform([0,2]). The choice of the number 2 is of course subjective, although we believe it is a reasonable and conservative prior.

The number of new total cases T_k at time k within the time window, given the previous incidence data, the serial interval data and the value of R, is assumed to be Poisson-distributed with parameter RΛ_k, where the total infection potential at a generic time t is defined by

$$\Lambda_t = \sum_{s=1}^{t} p(w = s)\, i_{t-s} .$$

At first approximation, this quantity can be considered as unaffected by the undetected cases, as the serial interval distribution is derived from tracking the secondary cases of reported cases. By considering the sample of secondary infections as representative of the population, the approximation above is justified. The generalizations to replace the Poisson distribution with a two-parameter negative binomial distribution [15], [16], [17], [18] and to include the undetected cases into Λ are left to future work. Therefore, we can write explicitly the probability mass function of T|I_<, W, R as

$$p(t_> \mid i_<, w, r) = \prod_{k=t-\tau+1}^{t} \frac{(r \Lambda_k)^{t_k}\, e^{-r \Lambda_k}}{t_k!} .$$

Now we have collected all the ingredients of Eq. (4), and we added the nuisance parameters a, b, C to the model. So the joint posterior density of R and the nuisance parameters reads

$$p(r, a, b, C \mid i_<, i_>) \propto p(r)\, p(a)\, p(b)\, p(C) \prod_{k=t-\tau+1}^{t} \left[ \sum_{u_k} p(u_k \mid i_k, C)\, \frac{(r \Lambda_k)^{i_k + u_k}\, e^{-r \Lambda_k}}{(i_k + u_k)!} \right],$$

where Λ_k depends on a and b through the serial interval distribution. By marginalizing over a, b, C we finally get the posterior probability density for R

$$p(r \mid i_<, i_>) \propto p(r) \int da\, db\, dC\; p(a)\, p(b)\, p(C) \prod_{k=t-\tau+1}^{t} \left[ \sum_{u_k} p(u_k \mid i_k, C)\, \frac{(r \Lambda_k)^{i_k + u_k}\, e^{-r \Lambda_k}}{(i_k + u_k)!} \right]. \tag{11}$$

With a uniform prior for the number of undetected cases as described in Eq. (7), the sum in the square bracket of Eq. (11) can be computed analytically in terms of the regularized upper incomplete gamma function Q(s, x) as

$$\sum_{u_k = 0}^{\lfloor C i_k \rfloor} \frac{1}{\lfloor C i_k \rfloor + 1}\, \frac{(r \Lambda_k)^{i_k + u_k}\, e^{-r \Lambda_k}}{(i_k + u_k)!} = \frac{Q(i_k + \lfloor C i_k \rfloor + 1,\, r \Lambda_k) - Q(i_k,\, r \Lambda_k)}{\lfloor C i_k \rfloor + 1}, \tag{12}$$

with the convention Q(0, x) = 0.

At any time t > τ, from the conditional posterior probability density it is possible to compute the mean and the 95% central credible intervals. In particular, the effective reproduction number R_t is the expected value of R conditioned on the past incidence data and serial interval

$$R_t = \mathrm{E}[R \mid i_<, i_>] = \int_0^{40} r\, p(r \mid i_<, i_>)\, dr,$$

where the conditional probability density p(r | i_<, i_>), marginalized over all nuisance parameters, is given in general by Eq. (11), and by Eq. (12) for the particular case of uniform distribution of the number of undetected cases.
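To make the structure of the posterior more concrete, the following Python sketch evaluates it on a grid of r values and extracts the mean and the 95% central credible interval. It is a simplified illustration, not the authors' implementation: the serial interval weights and the parameter C are held fixed instead of being marginalized over, the incidence series and weights are invented toy numbers, every function name is hypothetical, and the window convention (the last tau time steps up to t) is an assumption of the sketch. The sum over undetected cases is done by brute force rather than through the incomplete gamma function.

```python
import math

def poisson_pmf(k, lam):
    # Poisson probability of k events with mean lam > 0, computed via logs.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def infection_potential(incidence, w_pmf, t):
    # Lambda_t = sum_{s>=1} p(w = s) * i_{t-s}, with w_pmf[s - 1] = p(w = s).
    return sum(w_pmf[s - 1] * incidence[t - s]
               for s in range(1, min(t, len(w_pmf)) + 1))

def window_likelihood(r, incidence, w_pmf, t, tau, c):
    # Product over the window of the uniform-weighted sum over undetected cases.
    like = 1.0
    for k in range(t - tau + 1, t + 1):
        lam = r * infection_potential(incidence, w_pmf, k)
        i_k = incidence[k]
        u_max = math.floor(c * i_k)
        bracket = sum(poisson_pmf(i_k + u, lam) for u in range(u_max + 1))
        like *= bracket / (u_max + 1)
    return like

def posterior_on_grid(incidence, w_pmf, t, tau, c, r_max=40.0, n=2000):
    # Uniform prior on [0, r_max]; grid midpoints avoid r = 0.
    rs = [r_max * (j + 0.5) / n for j in range(n)]
    weights = [window_likelihood(r, incidence, w_pmf, t, tau, c) for r in rs]
    total = sum(weights)
    return rs, [x / total for x in weights]

def summarize(rs, probs):
    # Posterior mean and 95% central credible interval from grid probabilities.
    mean = sum(r * p for r, p in zip(rs, probs))
    cum, lo, hi = 0.0, None, None
    for r, p in zip(rs, probs):
        cum += p
        if lo is None and cum >= 0.025:
            lo = r
        if hi is None and cum >= 0.975:
            hi = r
    return mean, lo, hi

# Toy data (invented for illustration, not from the paper's dataset).
incidence = [5, 8, 12, 15, 20, 22, 25, 24, 26, 23, 21, 20]
w_pmf = [0.2, 0.3, 0.25, 0.15, 0.1]
t, tau = len(incidence) - 1, 7

rs, p_reported = posterior_on_grid(incidence, w_pmf, t, tau, c=0.0)
rs, p_undetected = posterior_on_grid(incidence, w_pmf, t, tau, c=2.0)
print("reported only  :", summarize(rs, p_reported))
print("with undetected:", summarize(rs, p_undetected))
```

Setting c = 0 reduces the model to the reported-cases-only estimate, so running both cases side by side shows how allowing undetected cases shifts the posterior of R upwards. A fuller treatment would repeat the calculation over samples of a, b and C and average the resulting densities.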

Results

We now turn to apply the statistical procedure developed in the previous section to real data of the COVID-19 outbreak in eight different countries, using daily incidence data from Ref. [19] (see also Appendix A.2). In Appendix A.3, we also analyze the incidence data for the twenty Italian regions. The computation of R starts from the first day reported in the dataset. It is worth mentioning that the starting date is not the same for all the countries taken into account. In the following we show the time evolution of the mean of the posterior distribution of R, as depicted in Fig. 1, and the corresponding 95% central credible interval. The results are shown in Figs. 2 and 3. By applying the full procedure described in the previous section, which delivers the posterior distribution marginalized over all the nuisance parameters of the serial interval and undetected cases distributions, we obtain our main results for R (blue solid line). As benchmarks, we also show the results obtained with only reported cases of infection and with fixed serial interval distribution (black dashed line), and the results obtained with only reported cases and marginalizing over the parameters of the serial interval distribution (green dash-dotted line).

Fig. 1

Time evolution of the posterior probability density of the effective reproduction number R (we used the South Korea data, just for illustration purpose). The day 0 corresponding to a uniform prior is not shown.

Fig. 2

Time evolution of effective reproduction number for COVID-19 in France, Germany, Italy and South Korea. Blue solid line: mean of the posterior probability marginalized over the parameters of the undetected cases and serial interval distributions. Green dash-dotted line: results of including only the reported cases and marginalizing over the parameters of the serial interval distribution. Black dashed line: results of including only the reported cases of infection and fixed serial interval distribution. Gray shaded area: 95% central credible interval. The inset shows the results of the past two weeks in greater detail. The vertical lines refer to the time when containment measures have been adopted.

Fig. 3

Time evolution of effective reproduction number for COVID-19 in Spain, Sweden, UK and USA. For the USA, we show the lock-down date in the state of NY as a reference. See caption of Fig. 2 for details.

Wherever applicable, we also report the dates when contagion containment measures have been enforced. In Table 1 we report the numerical results for the last day of our analysis.

Table 1

Country	R_t with reported cases only	Mean of marginalized posterior probability	95% CrI
France	1.00	1.47	(0.85, 3.00)
Germany	0.73	1.50	(0.56, 3.22)
Italy	0.71	1.69	(1.01, 3.37)
South Korea	0.69	1.56	(0.86, 3.04)
Spain	0.55	1.58	(1.04, 2.94)
Sweden	0.93	1.95	(1.01, 3.89)
UK	1.10	2.45	(1.49, 5.17)
USA	0.98	2.20	(1.02, 4.91)

The values of the effective reproduction number R for COVID-19 on the last day of our analysis (2020-05-08), for each of the countries we considered. The second column reports the value we find by including only reported incidence data and the mean values of the serial interval distribution. The third column reports R from the posterior distribution, marginalized over the nuisance parameters describing the serial interval and undetected cases distributions. The corresponding 95% credible interval is reported in the last column.

Our results clearly show that R, after a transitory period, is on a downward trend in all the countries we considered. In all of them, we find mean values of R larger than those obtained without considering the undetected cases, by factors of about 2 – 4. This is a somewhat expected consequence of our conservative choice of C being at most 2. By allowing more undetected cases, the resulting reproduction numbers would necessarily be higher. Furthermore, the upper values of the 95% credible intervals are larger than the estimate with only officially reported cases by factors up to 10, and they are all significantly above 1. Similar results are obtained for the Italian regions (as shown in Appendix A.3). We emphasize that these results depend on the assumptions about the prior distributions, as described in Section 2. However, the method itself is quite general and it is straightforward to implement any other choice of priors.
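For the last day of the analysis, the ratios between the estimates with and without undetected cases can be computed directly from Table 1. The short Python snippet below does this arithmetic. The numbers are copied from the table, while the variable names are chosen here for illustration.

```python
# (R_t with reported cases only, mean of marginalized posterior, upper end of 95% CrI)
# Values copied from Table 1 (last day of the analysis, 2020-05-08).
table1 = {
    "France":      (1.00, 1.47, 3.00),
    "Germany":     (0.73, 1.50, 3.22),
    "Italy":       (0.71, 1.69, 3.37),
    "South Korea": (0.69, 1.56, 3.04),
    "Spain":       (0.55, 1.58, 2.94),
    "Sweden":      (0.93, 1.95, 3.89),
    "UK":          (1.10, 2.45, 5.17),
    "USA":         (0.98, 2.20, 4.91),
}

# Ratio of the posterior mean, and of the upper credible bound, to the reported-only value.
mean_ratio = {c: mean / reported for c, (reported, mean, upper) in table1.items()}
upper_ratio = {c: upper / reported for c, (reported, mean, upper) in table1.items()}

for country in table1:
    print(f"{country:12s} mean/reported = {mean_ratio[country]:.2f}   "
          f"upper/reported = {upper_ratio[country]:.2f}")
```

The same calculation applies to the regional values in Table 2.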

Conclusions and outlook

In this paper we took the first steps towards a comprehensive stochastic modelling of the effective reproduction number R during an epidemic. We followed a completely general approach, which enables one to account for any random variable affecting R. In particular, our primary focus was on assessing the impact on R of the number of undetected infection cases. We investigated the time evolution of the posterior probability density of the random variable describing R, marginalized over the parameters of the serial interval and undetected cases distributions. The application of our method to the COVID-19 outbreak in different countries shows that the effective reproduction number is largely affected by the undetected cases, and in general it increases by factors of order 2 to 10. There are several directions in which further research can be carried out, aiming at expanding the capabilities of the basic framework described in this paper. For instance, it is desirable to explore a more realistic model of the probability distribution of undetected cases, also including time dependence. Furthermore, it is also possible to adopt a fully data-driven approach by performing non-parametric estimation of the serial interval distribution from transmission chain data. The stochastic approach outlined in this paper is not designed to establish or predict any cause-effect relationship between the R trend and the enforcement of the containment measures. It is nevertheless possible to use our results (especially the credible intervals) to define some robust criterion for evaluating the effectiveness of the containment measures. Our findings can be used by public health institutions to take more informed decisions about designing and gauging the strategies of infection containment. Based on our results, we recommend caution in deciding when and how to relax the containment measures on the basis of the value of the reproduction number, since it may be much larger than usually estimated.

CRediT authorship contribution statement

Andrea De Simone: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing. Marco Piangerelli: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 2

Region	R_t with reported cases only	Mean of marginalized posterior probability	95% CrI
Abruzzo	0.77	2.01	(1.20, 4.03)
Basilicata	1.29	2.94	(1.24, 7.31)
Calabria	0.46	1.32	(0.64, 2.59)
Campania	0.60	1.66	(0.89, 3.36)
Emilia Romagna	0.64	1.45	(0.81, 2.93)
Friuli V. G.	0.54	1.28	(0.74, 2.37)
Lazio	0.80	1.81	(0.99, 3.69)
Liguria	0.75	1.52	(0.80, 3.13)
Lombardia	0.83	2.01	(0.94, 4.14)
Marche	0.78	1.86	(1.06, 3.38)
Molise	0.65	1.83	(0.72, 3.73)
Piemonte	0.63	1.46	(0.64, 3.12)
Puglia	0.64	1.68	(0.92, 3.37)
Sardegna	0.62	1.42	(0.75, 2.86)
Sicilia	0.56	1.39	(0.78, 2.72)
Toscana	0.61	1.55	(0.85, 3.03)
Trentino A. A.	0.49	1.07	(0.58, 2.04)
Umbria	0.52	1.03	(0.48, 2.07)
Valle d’Aosta	0.56	1.48	(0.74, 3.06)
Veneto	0.58	1.34	(0.91, 2.51)


1. Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo'.

Authors: Enrico Lavezzo; Elisa Franchin; Constanze Ciavarella; Gina Cuomo-Dannenburg; Luisa Barzon; Claudia Del Vecchio; Lucia Rossi; Riccardo Manganelli; Arianna Loregian; Nicolò Navarin; Davide Abate; Manuela Sciro; Stefano Merigliano; Ettore De Canale; Maria Cristina Vanuzzo; Valeria Besutti; Francesca Saluzzo; Francesco Onelia; Monia Pacenti; Saverio G Parisi; Giovanni Carretta; Daniele Donato; Luciano Flor; Silvia Cocchio; Giulia Masi; Alessandro Sperduti; Lorenzo Cattarino; Renato Salvador; Michele Nicoletti; Federico Caldart; Gioele Castelli; Eleonora Nieddu; Beatrice Labella; Ludovico Fava; Matteo Drigo; Katy A M Gaythorpe; Alessandra R Brazzale; Stefano Toppo; Marta Trevisan; Vincenzo Baldo; Christl A Donnelly; Neil M Ferguson; Ilaria Dorigatti; Andrea Crisanti
Journal: Nature Date: 2020-06-30 Impact factor: 49.962

2. Asymptomatic Transmission, the Achilles' Heel of Current Strategies to Control Covid-19.

Authors: Monica Gandhi; Deborah S Yokoe; Diane V Havlir
Journal: N Engl J Med Date: 2020-04-24 Impact factor: 91.245

3. Complexity of the Basic Reproduction Number (R₀).

Authors: Paul L Delamater; Erica J Street; Timothy F Leslie; Y Tony Yang; Kathryn H Jacobsen
Journal: Emerg Infect Dis Date: 2019-01 Impact factor: 6.883

4. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.

Authors: Jacco Wallinga; Peter Teunis
Journal: Am J Epidemiol Date: 2004-09-15 Impact factor: 4.897

5. The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks.

Authors: Thomas Obadia; Romana Haneef; Pierre-Yves Boëlle
Journal: BMC Med Inform Decis Mak Date: 2012-12-18 Impact factor: 2.796

6. Real time bayesian estimation of the epidemic potential of emerging infectious diseases.

Authors: Luís M A Bettencourt; Ruy M Ribeiro
Journal: PLoS One Date: 2008-05-14 Impact factor: 3.240

7. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy.

Authors: Giulia Giordano; Franco Blanchini; Raffaele Bruno; Patrizio Colaneri; Alessandro Di Filippo; Angela Di Matteo; Marta Colaneri
Journal: Nat Med Date: 2020-04-22 Impact factor: 87.241

8. Universal Screening for SARS-CoV-2 in Women Admitted for Delivery.

Authors: Desmond Sutton; Karin Fuchs; Mary D'Alton; Dena Goffman
Journal: N Engl J Med Date: 2020-04-13 Impact factor: 91.245

9. Improved inference of time-varying reproduction numbers during infectious disease outbreaks.

Authors: R N Thompson; J E Stockwin; R D van Gaalen; J A Polonsky; Z N Kamvar; P A Demarsh; E Dahlqwist; S Li; E Miguel; T Jombart; J Lessler; S Cauchemez; A Cori
Journal: Epidemics Date: 2019-08-26 Impact factor: 4.396

10. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728
