Inferring the number of COVID-19 cases from recently reported deaths.

Thibaut Jombart¹,²,³, Kevin van Zandvoort¹, Timothy W Russell¹, Christopher I Jarvis¹, Amy Gimma¹, Sam Abbott¹, Sam Clifford¹, Sebastian Funk¹, Hamish Gibbs¹, Yang Liu¹, Carl A B Pearson¹,⁴, Nikos I Bosse¹, Rosalind M Eggo¹, Adam J Kucharski¹, W John Edmunds¹.

Abstract

We estimate the number of COVID-19 cases from newly reported deaths in a population without previous reports. Our results suggest that by the time a single death occurs, hundreds to thousands of cases are likely to be present in that population. This suggests containment via contact tracing will be challenging at this point, and other response strategies should be considered. Our approach is implemented in a publicly available, user-friendly, online tool.

Keywords: SARS-CoV-2; covid-19; epidemics; estimation; modelling; outbreak; statistics

Year: 2020 PMID: 32518842 PMCID: PMC7255910 DOI: 10.12688/wellcomeopenres.15786.1

Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X

Introduction

As the coronavirus disease 2019 (COVID-19 [1]) epidemic continues to spread worldwide, there is mounting pressure to assess the scale of epidemics in newly affected countries as rapidly as possible. We introduce a method for estimating cases from recently reported COVID-19 deaths. Results suggest that by the time the first deaths have been reported, there may be hundreds to thousands of cases in the affected population. We provide epidemic size estimates for several countries, and a user-friendly, web-based tool that implements our model [16].

Methods

Using deaths to infer cases

COVID-19 deaths are starting to be reported in countries where few or no cases had previously been reported [2]. Given the non-specific symptoms [3] and the high rate of mild disease [4], a COVID-19 epidemic may go unnoticed in a new location until the first severe cases or deaths are reported [5]. Available estimates of the case fatality ratio (CFR), i.e. the proportion of cases that are fatal [6, 7], can be used to estimate the number of cases who would have shown symptoms at the same time as the fatal cases. We developed a model that uses the CFR alongside other epidemiological factors underpinning disease transmission to infer the likely number of cases in a population from newly reported deaths.

Our approach involves two steps: first, reconstructing historic cases by assuming non-fatal cases are all undetected, and, second, modelling epidemic growth from these cases until the present day to estimate the likely number of current cases. We account for uncertainty in the epidemiological processes by using stochastic simulations for estimation of relevant quantities.

Two pieces of information are needed to reconstruct past cases: the number of cases for each reported death, and their dates of symptom onset. Intuitively, the CFR provides some information on the number of cases, as it represents the expected number of deaths per case, so that CFR⁻¹ (i.e. 1/CFR) corresponds to the expected number of cases per death. In practice, the number of cases until the first reported death can be drawn from a Geometric distribution with an event probability equal to the CFR. Note that while our approach could in theory use a different CFR for each case (to account for different risk groups), our current implementation uses the same CFR for all cases in a simulation. Dates of symptom onset are simulated from the distribution of the time from onset to death, modelled as a discretised Gamma distribution with a mean of 15 days and a standard deviation of 6.9 days [8].
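The reconstruction step can be illustrated with a short sketch. The authors' implementation is in R; the Python below is only an illustration under stated assumptions, not their code. The function names are hypothetical, and three modelling choices are assumptions made here: the Geometric draw counts the fatal case itself (so its mean is 1/CFR), the Gamma delay is discretised by rounding to the nearest day, and every reconstructed case receives its own onset-to-death delay counted back from the date of death.

```python
import math
import random


def draw_cases_per_death(cfr, rng):
    """Draw the number of cases up to and including the first death.

    Assumed convention: Geometric distribution on {1, 2, ...} with event
    probability `cfr`, sampled by inverse transform, so the mean is 1 / cfr.
    """
    if not 0.0 < cfr < 1.0:
        raise ValueError("cfr must lie strictly between 0 and 1")
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return 1 + int(math.log(u) / math.log(1.0 - cfr))


def draw_onset_to_death_delay(rng, mean=15.0, sd=6.9):
    """Draw a delay (whole days) from symptom onset to death.

    Gamma shape and scale come from moment matching; rounding to the nearest
    day is a simple stand-in for a properly discretised Gamma distribution.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return round(rng.gammavariate(shape, scale))


def reconstruct_past_cases(death_day, cfr, rng):
    """Return simulated onset days for the cases implied by one death."""
    n_cases = draw_cases_per_death(cfr, rng)
    return [death_day - draw_onset_to_death_delay(rng) for _ in range(n_cases)]


rng = random.Random(1)
onsets = reconstruct_past_cases(death_day=0, cfr=0.02, rng=rng)
print(len(onsets), "reconstructed cases; earliest onset on day", min(onsets))
```

With a CFR of 2%, repeated calls to `draw_cases_per_death` average about 50 cases per death, which is the 1/CFR intuition described in the text.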
Once past cases are reconstructed, we use a branching process model for forecasting new cases [9, 10]. This model combines data on the reproduction number (R) and serial interval distribution to simulate new cases y_t on day t from a Poisson distribution:

$$y_t \sim \mathrm{Poisson}\left(R \sum_{s=1}^{t-1} y_s \, w(t - s)\right)$$

where w(.) is the probability mass function of the serial interval distribution. More details on this simulation model can be found in Jombart et al. [10]. Optionally, this model can also incorporate heterogeneity in transmissibility using a Negative Binomial distribution instead of Poisson. The serial interval distribution was characterized as a discretized Lognormal distribution with mean 4.7 days and standard deviation 2.9 days [11].

We assume that past cases caused secondary transmissions independently (i.e. are not ancestral to each other), so that simulated cases for each death can be added. This assumption is most likely to be met when reported deaths are close in time. As the time between reported deaths increases, past cases may come from the same epidemic trajectory rather than separate, additive ones, in which case our method would overpredict epidemic size. Further details on model design and parameter values are provided in Supplementary Material.

Our approach is implemented in the R software [12] and publicly available as R scripts (see Extended data) [15], as well as in a user-friendly, interactive web-interface available at: https://cmmid.github.io/visualisations/inferring-covid19-cases-from-deaths [16].
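A minimal sketch of the forecasting step follows, assuming the usual renewal form in which the expected number of new cases on day t is R times the sum of past daily incidence weighted by the serial interval mass w(t - s). It is written in standard-library Python for illustration and is not the authors' R code. The function names, the half-day discretisation of the Lognormal distribution, the 30-day truncation, and the example incidence are all assumptions made here.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    """CDF of a Lognormal distribution with log-scale parameters mu, sigma."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def serial_interval_pmf(mean=4.7, sd=2.9, max_days=30):
    """Discretised Lognormal serial interval; w[d] is the mass on day d.

    Assumed discretisation: mass on [d - 0.5, d + 0.5) for d >= 1, then
    renormalised; w[0] = 0 so that cases do not infect on their onset day.
    """
    sigma = math.sqrt(math.log(1.0 + (sd / mean) ** 2))
    mu = math.log(mean) - sigma ** 2 / 2.0
    w = [0.0] + [
        lognormal_cdf(d + 0.5, mu, sigma) - lognormal_cdf(d - 0.5, mu, sigma)
        for d in range(1, max_days + 1)
    ]
    total = sum(w)
    return [mass / total for mass in w]


def draw_poisson(lam, rng):
    """Poisson draw via Knuth's method, summed over chunks of at most 30.

    A sum of independent Poisson draws is Poisson, and chunking keeps
    exp(-chunk) well away from floating-point underflow.
    """
    total, remaining = 0, lam
    while remaining > 0:
        chunk = min(remaining, 30.0)
        threshold = math.exp(-chunk)
        k, product = 0, rng.random()
        while product > threshold:
            k += 1
            product *= rng.random()
        total += k
        remaining -= chunk
    return total


def project(incidence, r, n_days, w, rng):
    """Simulate `n_days` of new daily cases following the given incidence."""
    y = list(incidence)
    for _ in range(n_days):
        t = len(y)
        first = max(0, t - len(w) + 1)
        lam = r * sum(y[s] * w[t - s] for s in range(first, t))
        y.append(draw_poisson(lam, rng))
    return y[len(incidence):]


rng = random.Random(1)
w = serial_interval_pmf()
past = [1, 0, 2, 1, 3]  # hypothetical daily onsets of reconstructed cases
print(project(past, r=2.0, n_days=14, w=w, rng=rng))
```

Repeating `project` many times with fresh random draws gives a set of trajectories from which quantile intervals can be computed. Replacing `draw_poisson` with a Negative Binomial sampler would correspond to the optional heterogeneity mentioned in the text.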

Results

How many cases for a single death?

We first used our model to assess likely epidemic sizes when an initial COVID-19 death is reported in a new location. We ran simulations for a range of plausible values of R (1.5, 2 and 3) and CFR (1%, 2%, 3% and 10%), assuming a single death on the 1st March 2020 [7]. 25,000 epidemic trajectories were simulated for each parameter combination. Simulations for an ‘average severity’ scenario [7] with R = 2 and CFR = 2% show that by the time a death has occurred, hundreds to thousands of cases may have been generated in the affected population (Figure 1). Results vary widely across other parameter settings, and amongst simulations from a given setting (Table 1), with higher R and lower CFR leading to higher estimates of the numbers of cases. However, a majority of settings give similar results to our ‘average’ scenario, suggesting that a single death is likely to reflect several hundreds of cases. Results were qualitatively unchanged when incorporating heterogeneity in the model using recent estimates [13], but prediction intervals were wider (Extended data).

Figure 1.

Example of simulated epidemic trajectories from a single death.

This figure shows results of 200 simulations using a CFR of 2% and R of 2 based on a hypothetical situation where a single death occurred on the 1st March 2020, represented by the red line. Ribbons of different shades represent, from the lightest to the darkest, the 95%, 75%, 50% and 25% quantile intervals.

Table 1.

Inferred number of cases for a single death.

Inferred number of cases after detection of a single death under different values of the reproduction number and case fatality ratio. We estimate the expected number of cases in the population on the day the death occurred, and present the median together with the 50% and 95% quantile intervals.

R	Median	Lower 95% Quantile Interval	Lower 50% Quantile Interval	Upper 50% Quantile Interval	Upper 95% Quantile Interval
CFR 1%
1.5	252	5	102	596	2 572
2	519	9	174	1 477	8 325
3	1 733	37	541	7 461	138 624
CFR 2%
1.5	132	2	52	294	1 110
2	276	5	93	780	5 694
3	964	19	300	4 174	49 137
CFR 3%
1.5	75	2	27	191	757
2	181	4	60	465	2 515
3	719	7	173	3 100	89 909
CFR 10%
1.5	29	0	10	65	219
2	46	0	15	136	1 020
3	245	2	63	983	30 708
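Each row of Table 1 summarises a set of simulated case counts by its median and by the bounds of the 50% and 95% quantile intervals, i.e. the 25% and 75% quantiles and the 2.5% and 97.5% quantiles. The sketch below shows one way to compute such a row in Python; the function names and the linear-interpolation rule are assumptions made here, since the text does not state which quantile definition was used.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def summarise(simulated_cases):
    """Median plus the 50% and 95% quantile interval bounds."""
    values = sorted(simulated_cases)
    return {
        "median": quantile(values, 0.5),
        "lower_95": quantile(values, 0.025),
        "lower_50": quantile(values, 0.25),
        "upper_50": quantile(values, 0.75),
        "upper_95": quantile(values, 0.975),
    }


# Hypothetical simulation output: one case count per simulated trajectory.
print(summarise([12, 40, 95, 180, 276, 410, 780, 1500, 5694]))
```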

Recently affected countries

We applied our approach to three countries which recently reported their first COVID-19 deaths (Spain, Italy, and France), using the same range of parameters as in the single-death analysis. In order to compare predictions to cases actually reported in these countries, projections were run until 4th March. Overall, predictions from the model using the baseline scenario (R = 2, CFR = 2%) were in line with reported epidemic sizes (Table 2). Results from other scenarios are presented in the Extended data. Actual numbers of reported cases fell within the 50% quantile intervals of simulations in all three countries: Italy (median: 1 294; QI 50%: [390; 3 034]; reported: 2 037), France (median: 592; QI 50%: [177; 1 705]; reported: 190) and Spain (median: 263; QI 50%: [95; 823]; reported: 202).

Table 2.

Inferred number of cases for several countries assuming CFR of 2% and R of 2.

All values are presented for the 4th of March 2020 for different countries. We present the predicted case counts as their median together with the 50% and 95% quantile intervals. * First suspected death due to within-country transmission.

Country	Date of first death*	Initial deaths	Reported cases	Median	Lower 95% Quantile Interval	Lower 50% Quantile Interval	Upper 50% Quantile Interval	Upper 95% Quantile Interval
Spain	4th March	1	202	263	8	95	823	7 829
Italy	26th Feb	1	2 037	1 294	33	390	3 034	19 487
France	21st Feb	1	190	592	10	177	1 705	7 501
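The comparison between reported and predicted case counts in Table 2 can be checked directly from the tabulated values. The short Python sketch below copies the reported counts and interval bounds from the table and tests whether each reported count falls inside the 50% and 95% quantile intervals; the variable and function names are hypothetical.

```python
# Rows copied from Table 2:
# (reported cases, lower 95%, lower 50%, upper 50%, upper 95%)
table_2 = {
    "Spain": (202, 8, 95, 823, 7829),
    "Italy": (2037, 33, 390, 3034, 19487),
    "France": (190, 10, 177, 1705, 7501),
}


def within(value, lower, upper):
    """True if `value` lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper


coverage = {}
for country, (reported, lo95, lo50, hi50, hi95) in table_2.items():
    coverage[country] = {
        "in_50": within(reported, lo50, hi50),
        "in_95": within(reported, lo95, hi95),
    }

for country, flags in coverage.items():
    print(country, flags)
```

All three reported counts fall inside the 50% interval, although the French count (190) sits close to the lower bound (177).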

Discussion

Several limitations need to be considered when applying our method. First, our approach only applies to the deaths of patients who have become symptomatic in the location considered, which should usually be the case in places where traveler screening is in place. We also assume constant transmissibility (R) over time, which implies that behavior changes and control measures have not taken place yet, and that there is no depletion of susceptible individuals. Consequently, our method should only be used in the early stages of a new epidemic, where these assumptions are reasonable. Similarly, the assumption that each death reflects independent, additive epidemic trajectories is most likely to hold true early on, when reported deaths are close in time (e.g. no more than a week apart). Used on deaths spanning longer time periods, our approach is likely to overestimate epidemic sizes.

Contact tracing has been shown to be an efficient control measure when imported cases can be detected early on [14], in addition to permitting the estimation of key epidemiological parameters [11]. When the first cases reported in a new location are mostly deaths, however, our results suggest that the underlying size of the epidemic would make control via contact tracing extremely challenging. In such situations, efforts focusing on social distancing measures such as school closures and self-isolation may be more likely to mitigate epidemic spread.

Data availability

Underlying data

All data underlying the results are available as part of the article and no additional source data are required.

Extended data

Zenodo: Extended data for: Inferring the number of COVID-19 cases from recently reported deaths. http://doi.org/10.5281/zenodo.3733289 [15]. This project contains the file ‘extended_data’ (PDF), which contains supplemental information and methodological details regarding the model described in this article. Extended data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

The Shiny app using the model is available at: https://cmmid.github.io/visualisations/inferring-covid19-cases-from-deaths. Source code and R scripts available at: https://github.com/thibautjombart/covid19_cases_from_deaths. Archived code at time of publication: http://doi.org/10.5281/zenodo.3733047 [16]. License: Code is available under an MIT License; other documentation is available under a CC-BY 4.0 License.

Reviewer report 1

This article describes a statistical modeling method for estimating the number of COVID-19 cases from the first reported deaths in a defined location. The described methodology can provide useful information for decision making, especially as a Shiny app has been developed for facilitating quick application of the method by public health practitioners, and the R code has been made available.

Introduction: I would be interested to see in the text a few words about how many (and which) countries found themselves in the situation of observing no COVID-19 case before the first deaths were reported. The reference provided (number 2) is not really specific about this point.

Methods: The statistical method is well described and seems sound. I have a minor comment: in practice, published estimates of the CFR and R will be used as input parameters for the model. These estimates are derived from samples and are usually published with a certain measure of uncertainty, typically the standard deviation or a confidence interval. My understanding is that this estimation uncertainty on these input parameters is not taken into account in the prediction model: instead, the CFR and R are held constant for all simulations drawn with a set of parameter. Taking into account the uncertainty on these input parameters may lead to even greater prediction intervals, but may reflect more completely the uncertainty about the total number of cases given the current knowledge about the disease at a certain point in time. This could be done, for example, by drawing the CFR in a Beta distribution with a and b derived from the published mean and sd instead of holding it constant. In the Shiny app, the user could provide the confidence interval.

Discussion: It would be interesting if the authors could comment on the availability of other published methods developed for inferring the number of cases based on reported deaths. If such methods exist, how do they compare in their approach and results with the proposed one? What are the comparative strengths of the proposed method?

typo error: “theunderlying size…”

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are sufficient details provided to allow replication of the method development and its use by others? Yes

Reviewer Expertise: biostatistics, public health

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer report 2

This is a useful, technically correct, and clearly written contribution.

Could the authors comment on how much extra mileage one gets/advantages of this approach relative to simply saying that the current number of cases is approximately equal to 1/CFR? That is, does one have to reconstruct the past history to know how much trouble one is currently in?

What would the effect of a heterogeneous CFR be? (I believe this would correspond e.g. to a 'beta-Geometric distribution', unless one instead wanted to treat it as a finite mixture of probabilities for discrete risk categories).

It would be nice to have a little more detail (i.e. a few sentences) on the simulation procedure. I see how to get from CFR and deaths to a total number of preceding cases, and how to simulate times of symptom onset for the observed deaths. It's not completely obvious to me how to get from there to 'history of past cases' (i.e. incidence over time); does one run the renewal process backward in time? Or use branching-process theory to find the time distribution of symptom onset of the index case given the current size of the epidemic?

Please clarify "We assume that past cases caused secondary transmissions independently (i.e. are not ancestral to each other), so that simulated cases for each death can be added." Does this mean that you assume that all observed deaths are from separate lineages/transmission chains? (The last sentence of the paragraph suggests that, but the initial statement could probably be clearer.) (Does this assumption even matter if we are in the branching-process regime?).

I appreciate that the authors are trying to keep things simple, and thus the scenario-based approach (try the model for a range of CFR/R values and see what is implied) is useful. I note that the confidence intervals are already very wide (that's part of the point), but there are several quantities that are treated as known (delay distribution, serial interval distribution); I wonder how sensitive the results are to these assumptions (probably not much - I'm guessing that with R specified they might only change the timing, not the numbers). Given that the authors are already basing the answers on 25,000 solutions, it might not be too hard to construct point estimates and intervals based on a prior/uncertainty distribution of R and CFR (rather than constructing separate scenarios), and allowing for uncertainty in the delay and serial distributions.

Minor comments/typos: Intro, line 1; methods, l. 7: (extra comma inside parens before superscript refs?); "use [a] different CFR for each case"; "parameters" values; "theunderlying"; "schoolclosures". In tables 1 and 2 consider stating "2.5% quantile, 25% quantile, 50% quantile, 97.5% quantile" (rather than lower/upper x 95%/50%)?

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are sufficient details provided to allow replication of the method development and its use by others? Yes

Reviewer Expertise: ecology, evolution, epidemiological modeling

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
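The first reviewer's suggestion of drawing the CFR from a Beta distribution with a and b derived from a published mean and standard deviation can be sketched as follows. This is an illustration of the reviewer's idea by moment matching, not something implemented in the article or the Shiny app. The function names are hypothetical, and the example mean of 2% with a standard deviation of 0.5% is an assumed input chosen only for demonstration.

```python
import random


def beta_parameters(mean, sd):
    """Moment-matched Beta parameters (a, b) for a given mean and sd."""
    variance = sd ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("sd is too large for a Beta distribution with this mean")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common


def draw_cfr(mean, sd, rng):
    """Draw one CFR value instead of holding the CFR constant."""
    a, b = beta_parameters(mean, sd)
    return rng.betavariate(a, b)


rng = random.Random(1)
a, b = beta_parameters(mean=0.02, sd=0.005)  # assumed illustrative inputs
print("a =", round(a, 2), "b =", round(b, 2))
print("five CFR draws:", [round(draw_cfr(0.02, 0.005, rng), 4) for _ in range(5)])
```

Drawing a fresh CFR for each simulation in this way would propagate uncertainty about the CFR into the prediction intervals, which, as the reviewer notes, would be expected to widen them.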

6 in total

1. A simple approach to measure transmissibility and forecast incidence.

Authors: Pierre Nouvellet; Anne Cori; Tini Garske; Isobel M Blake; Ilaria Dorigatti; Wes Hinsley; Thibaut Jombart; Harriet L Mills; Gemma Nedjati-Gilani; Maria D Van Kerkhove; Christophe Fraser; Christl A Donnelly; Neil M Ferguson; Steven Riley
Journal: Epidemics Date: 2017-02-24 Impact factor: 4.396

2. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020.

Authors: Julien Riou; Christian L Althaus
Journal: Euro Surveill Date: 2020-01

3. Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data.

Authors: Natalie M Linton; Tetsuro Kobayashi; Yichi Yang; Katsuma Hayashi; Andrei R Akhmetzhanov; Sung-Mok Jung; Baoyin Yuan; Ryo Kinoshita; Hiroshi Nishiura
Journal: J Clin Med Date: 2020-02-17 Impact factor: 4.241

4. First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures, January 2020.

Authors: Sibylle Bernard Stoecklin; Patrick Rolland; Yassoungo Silue; Alexandra Mailles; Christine Campese; Anne Simondon; Matthieu Mechain; Laure Meurice; Mathieu Nguyen; Clément Bassi; Estelle Yamani; Sylvie Behillil; Sophie Ismael; Duc Nguyen; Denis Malvy; François Xavier Lescure; Scarlett Georges; Clément Lazarus; Anouk Tabaï; Morgane Stempfelet; Vincent Enouf; Bruno Coignard; Daniel Levy-Bruhl
Journal: Euro Surveill Date: 2020-02

5. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application.

Authors: Stephen A Lauer; Kyra H Grantz; Qifang Bi; Forrest K Jones; Qulu Zheng; Hannah R Meredith; Andrew S Azman; Nicholas G Reich; Justin Lessler
Journal: Ann Intern Med Date: 2020-03-10 Impact factor: 25.391

6. The cost of insecurity: from flare-up to control of a major Ebola virus disease hotspot during the outbreak in the Democratic Republic of the Congo, 2019.

Authors: Thibaut Jombart; Christopher I Jarvis; Samuel Mesfin; Nabil Tabal; Mathias Mossoko; Luigino Minikulu Mpia; Aaron Aruna Abedi; Sonia Chene; Ekokobe Elias Forbin; Marie Roseline D Belizaire; Xavier de Radiguès; Richy Ngombo; Yannick Tutu; Flavio Finger; Madeleine Crowe; W John Edmunds; Justus Nsio; Abdoulaye Yam; Boubacar Diallo; Abdou Salam Gueye; Steve Ahuka-Mundeke; Michel Yao; Ibrahima Socé Fall
Journal: Euro Surveill Date: 2020-01
