
Clarifying predictions for COVID-19 from testing data: The example of New York State.

Abstract

With the spread of COVID-19 across the world, a large amount of data on reported cases has become available. Here we study a potential bias induced by the daily number of tests, which may be insufficient or vary over time. Indeed, tests are hard to produce at the early stage of an epidemic and can therefore be a limiting factor in the detection of cases. Such a limitation may have a strong impact on the reported cases data: some cases may be missing from the official count because the number of tests was not sufficient on a given day. In this work, we propose a new differential equation epidemic model which uses the daily number of tests as an input. We obtain a good agreement between the model simulations and the reported cases data coming from the state of New York. We also explore the relationship between the dynamics of the number of tests and the dynamics of the cases, and again obtain a good match between the data and the outcome of the model. Finally, by multiplying the number of tests by 2, 5, 10, and 100 we explore the consequences for the number of reported cases.


Keywords: Corona virus; Epidemic mathematical model; Isolation; Public closings; Quarantine; Reported and unreported cases; Testing data

Year: 2021 PMID: 33521405 PMCID: PMC7834578 DOI: 10.1016/j.idm.2020.12.011

Source DB: PubMed Journal: Infect Dis Model ISSN: 2468-0427

Introduction

The epidemic of novel coronavirus (COVID-19) infections began in China in December 2019 and rapidly spread worldwide in 2020. From the very beginning of the epidemic, mathematicians and epidemiologists have developed models to analyze the data, characterize the spread of the virus, and attempt to project the future evolution of the epidemic. Many of those models are based on the SIR or SEIR model, which is classical in the context of epidemics. We refer to (Tang et al., 2020; Wu et al., 2020) for the earliest articles devoted to such a question and to (Anderson & May, 1991; Bailey, 1957; Brauer & Castillo-Chavez, 2000; Brauer et al., 2008, 2019; Busenberg & Cooke, 1993; Diekmann et al., 2013; Hethcote, 2000; Keeling & Rohani, 2007; Murray, 1993; Thieme, 2003) for more models. In the course of the COVID-19 outbreak, it became clear to the scientific community that covert cases (asymptomatic or unreported infectious cases) play an important role. An early description of asymptomatic transmission in Germany was reported by Rothe et al. (2020). It was also observed on the Diamond Princess cruise ship in Yokohama, Japan, by Mizumoto et al. (2020) that many of the passengers tested positive for the virus but never presented any symptoms. We also refer to Qiu (2020) for more information about this problem. At the early stage of the COVID-19 outbreak, a new class of epidemic models was proposed in Liu et al. (2020a) to take into account the contamination of susceptible individuals by contact with unreported infectious individuals. Actually, this class of models was presented earlier in Arino et al. (2006). In (Liu et al., 2020a) a new method to use the number of reported cases in SIR models was also proposed. This method and model were extended in several directions by the same group in (Liu et al., 2020b, 2020c, 2020d) to include non-constant transmission rates and a period of exposure.
More recently the method was extended and successfully applied to a Japanese age-structured dataset in (Griette et al., 2020). The method was also extended to investigate the predictability of the outbreak in several countries including China, South Korea, Italy, France, Germany and the United Kingdom in (Liu et al., 2021). The application of the Bayesian method was also considered in (Cotta et al., 2020). In parallel with these modeling ideas, Bayesian methods have been widely used to identify the parameters in the models used for the COVID-19 pandemic (see e.g. Roques et al. (2020a, 2020b), where an estimate of the fatality ratio has been developed). A remarkable feature of those methods is that they provide mechanisms to correct some of the known biases in the observation of cases, such as the daily number of tests. Here we embed the data for the daily number of tests into an epidemic model and compare the number of reported cases produced by the model with the data. Our goal is to understand the relationship between the data for the daily number of tests (which is an input of our model) and the data for the daily number of reported cases (which is an output of our model). The plan of the paper is the following. In Section 2, we present a model involving the daily number of tests. In Section 3, we apply the method presented in (Liu et al., 2020a) to our new model. In Section 4, we present some numerical simulations and compare the model with the data. The last section is devoted to the discussion.

Epidemic with testing data

Let n(t) be the number of tests per unit of time. Throughout this paper, we use one day as the unit of time, so n(t) can be regarded as the daily number of tests at time t. The function n(t) comes from a database for New York State (https://covidtracking.com). Let N(t) be the cumulative number of tests from the beginning of the epidemic, so that N′(t) = n(t). In the numerical simulations of Section 4, we use n(t) as a piecewise constant function that varies day by day: each day, n(t) is equal to the number of tests that were performed that day. So n(t) should be understood as the black curve in Fig. 4.
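As a minimal sketch of this convention, the Python snippet below builds a piecewise constant n(t) from a list of daily counts, together with the matching cumulative N(t). The counts and the function names are made up for illustration, and day 0 is assumed to be the first day of the list.

```python
import math

def make_daily_rate(daily_tests):
    """Return n(t), equal to daily_tests[k] for k <= t < k + 1."""
    def n(t):
        # Clamp the day index so that n(t) is defined outside the data range.
        k = min(max(int(math.floor(t)), 0), len(daily_tests) - 1)
        return daily_tests[k]
    return n

def cumulative_tests(daily_tests, t):
    """N(t): integral of the piecewise constant n from 0 to t."""
    t = min(max(t, 0.0), float(len(daily_tests)))
    k = int(math.floor(t))
    whole_days = sum(daily_tests[:k])
    # Part of the current day that has already elapsed.
    partial = (t - k) * daily_tests[k] if k < len(daily_tests) else 0.0
    return whole_days + partial

tests = [100, 250, 400]  # hypothetical daily counts, not the New York data
n = make_daily_rate(tests)
```

Between two integer times, N(t) grows linearly with slope n(t), which is the discrete counterpart of N′(t) = n(t).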

Fig. 4

In this figure, we plot the daily number of tests for the New York State. The black curve, orange curve, and blue curve correspond respectively to the number of tests, the number of positive tests, and the number of negative tests.

The model consists of the system of ordinary differential equations (2.2), posed for t ≥ t1 and supplemented by non-negative initial data S(t1) = S1, E(t1) = E1, I(t1) = I1, U(t1) = U1, D(t1) = D1 and R(t1) = R1. The time t1 corresponds to the time when the tests started to be used constantly; the epidemic therefore started before t1. Here t ≥ t1 is the time in days. S(t) is the number of individuals susceptible to infection. E(t) is the number of exposed individuals (i.e. who are incubating the disease but not infectious). I(t) is the number of individuals incubating the disease, but already infectious. U(t) is the number of undetected infectious individuals, i.e. those who express mild or no symptoms together with the infectious individuals who have received a false negative test result; they are therefore not candidates for testing. D(t) is the number of individuals who express severe symptoms and are candidates for testing. R(t) is the number of individuals who have tested positive for the disease. The flux diagram of our model is presented in Fig. 1.

Fig. 1

Flow chart of the epidemic model with tests (2.2). In this diagram n(t) is the daily number of tests at time t. We consider a fraction (1 − σ) of false negative tests and a fraction σ of true positive tests. The parameter g reflects the fact that the tests are devoted not only to the symptomatic patients but also to a large fraction of the population of New York State.

Susceptible individuals S(t) become infected by contact with an infectious individual I(t), U(t) or D(t). When they get infected, susceptibles are first classified as exposed individuals E(t), that is to say that they are incubating the disease but are not yet infectious. The average length of this exposed period (or noninfectious incubation period) is 1/α days. After the exposure period, individuals become asymptomatic infectious I(t). The average length of the asymptomatic infectious period is 1/ν days. After this period, individuals become either mildly symptomatic individuals U(t) or individuals with severe symptoms D(t). The average length of this infectious period is 1/η days. Some of the U-individuals may show no symptoms at all. In our model, transmission can occur between an S-individual and an I-, U- or D-individual. Transmissions of SARS-CoV-2 are described in the model by the term τS(t)[I(t) + U(t) + D(t)], where τ is the transmission rate. Here, even though a transmission from an R-individual to an S-individual is possible in theory (e.g. if a tested patient infects their medical doctor), we consider that such a case is rare and we neglect it.

The last part of the model is devoted to the testing. The parameter σ is the fraction of true positive tests and (1 − σ) is the fraction of false negative tests. The quantity σ has been estimated at σ = 0.7 in the case of nasal or pharyngeal swabs for SARS-CoV-2 (Wang et al., 2020). Among the detectable infectious, we assume that only a fraction g are tested per unit of time. This fraction corresponds to individuals with symptoms suggesting a potential infection with SARS-CoV-2. The fraction g is the frequency of testable individuals in the population of New York State. We can rewrite g as g = 1/(κP), where P is the total number of individuals in the population of the state of New York and 0 ≤ κ ≤ 1 is the fraction of the total population with mild or severe symptoms that may induce a test. Individuals who tested positive, R(t), are infectious on average during a period of 1/η days, but we assume that they become immediately isolated and do not contribute to the epidemic anymore. In this model we focus on the testing of the D-individuals. The quantity n(t) σ g D is the flux of successfully tested D-individuals, which become R-individuals. The flux of tested D-individuals which are false negatives is n(t) (1 − σ) g D; these individuals go from the class of D-individuals to the class of U-individuals. The parameters of the model and the initial conditions of the model are listed in Table 1.
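The displayed system (2.2) is not reproduced in this text. The LaTeX block below is a reconstruction from the flows described above and in Fig. 1, not a quotation of the original display. It assumes that a fraction f of the individuals leaving I (the "frequency of infectious with severe symptoms" of Table 1) enters D while the remaining fraction 1 − f enters U, and that U-, D- and R-individuals all leave their class at rate η.

```latex
\begin{aligned}
S'(t) &= -\tau\, S(t)\,\bigl[I(t) + U(t) + D(t)\bigr],\\
E'(t) &= \tau\, S(t)\,\bigl[I(t) + U(t) + D(t)\bigr] - \alpha\, E(t),\\
I'(t) &= \alpha\, E(t) - \nu\, I(t),\\
U'(t) &= \nu\,(1-f)\, I(t) + n(t)\,(1-\sigma)\, g\, D(t) - \eta\, U(t),\\
D'(t) &= \nu f\, I(t) - n(t)\, g\, D(t) - \eta\, D(t),\\
R'(t) &= n(t)\,\sigma\, g\, D(t) - \eta\, R(t).
\end{aligned}
```

In this form the two testing fluxes leaving D, n(t) σ g D towards R and n(t) (1 − σ) g D towards U, add up to the single loss term n(t) g D in the D-equation.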

Table 1

Parameters and initial conditions of the model.

Symbol	Interpretation	Method
t₁	Date when the tests start to be used extensively	fixed
S₁	Number of susceptible at time t₁	fixed
E₁	Number of exposed at time t₁	fitted
I₁	Number of asymptomatic infectious at time t₁	fitted
U₁	Number of undetectable infectious at time t₁	fitted
D₁	Number of detectable infectious at time t₁	fitted
R₁	Number of reported (tested positive) cases at time t₁	fitted
τ	Transmission rate	fitted
n(t)	Number of tests per unit of time	fixed
1/α	Average length of exposure	fixed
1/ν	Average length of asymptomatic infectiousness	fixed
1/η	Average length of symptomatic infectiousness	fixed
f	Frequency of infectious with severe symptoms	fixed
σ	Fraction of true positive tests	fixed
g	Frequency of testable individuals	fixed

Before describing our method we list a few variables and parameters in Table 2. The cumulative number of reported cases CR(t) is obtained by accumulating the flux of positive tests, that is, CR′(t) = n(t) σ g D(t) (equation (2.5)).

Table 2

Variables used in the model.

Symbol	Interpretation	Equation
t	Time (in days)
S(t)	Number of susceptible at time t	(2.2)
E(t)	Number of exposed at time t	(2.2)
I(t)	Number of asymptomatic infectious at time t	(2.2)
U(t)	Number of undetectable infectious at time t	(2.2)
D(t)	Number of detectable infectious at time t	(2.2)
R(t)	Number of reported (tested infectious) cases at time t	(2.2)
CR(t)	Cumulative number of reported (tested infectious) cases at time t	(2.5)
DR(t)	Daily number of reported (tested infectious) cases at time t	(2.6)
CD(t)	Cumulative number of detectable infectious at time t	(2.7)
CU(t)	Cumulative number of undetectable infectious at time t	(2.8)

The daily number of reported cases DR(t) is given by equation (2.6), the cumulative number of detectable cases CD(t) by equation (2.7), and the cumulative number of undetectable cases CU(t) by equation (2.8).
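To make the role of n(t) as an input concrete, here is a hedged Python sketch that integrates the model day by day with a classical Runge-Kutta scheme, holding n(t) constant within each day. The right-hand side is an assumed form reconstructed from the flows described in Section 2 (a fraction f of I goes to D, the rest to U, and the flux of positive tests n(t) σ g D feeds both R and CR); it is not copied from the original display. The values of α, ν, η, σ, f, g and the population size come from the figure captions, while τ, the initial state and the test counts are invented for illustration.

```python
def rhs(state, n_t, p):
    """Assumed right-hand side; state = [S, E, I, U, D, R, CR]."""
    S, E, I, U, D, R, CR = state
    infection = p["tau"] * S * (I + U + D)  # new infections per day
    tested = n_t * p["g"] * D               # D-individuals tested per day
    return [
        -infection,
        infection - p["alpha"] * E,
        p["alpha"] * E - p["nu"] * I,
        p["nu"] * (1 - p["f"]) * I + (1 - p["sigma"]) * tested - p["eta"] * U,
        p["nu"] * p["f"] * I - tested - p["eta"] * D,
        p["sigma"] * tested - p["eta"] * R,
        p["sigma"] * tested,                # CR'(t): flux of positive tests
    ]

def rk4_step(state, n_t, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shift(k, c):
        return [s + c * ki for s, ki in zip(state, k)]
    k1 = rhs(state, n_t, p)
    k2 = rhs(shift(k1, h / 2), n_t, p)
    k3 = rhs(shift(k2, h / 2), n_t, p)
    k4 = rhs(shift(k3, h), n_t, p)
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def simulate(state0, daily_tests, p, steps_per_day=20):
    """Return the state at the end of each day (index 0 is the initial state)."""
    h = 1.0 / steps_per_day
    state, history = list(state0), [list(state0)]
    for n_t in daily_tests:  # n(t) is constant during each day
        for _ in range(steps_per_day):
            state = rk4_step(state, n_t, p, h)
        history.append(list(state))
    return history

# tau is a made-up value; the other parameters follow the figure captions.
params = {"tau": 2e-8, "alpha": 1.0, "nu": 1 / 6, "eta": 1 / 7,
          "sigma": 0.7, "f": 0.8, "g": 1e-5}
state0 = [19453561.0, 500.0, 300.0, 100.0, 100.0, 0.0, 0.0]  # illustrative
daily_tests = [5000 + 500 * day for day in range(30)]         # synthetic
history = simulate(state0, daily_tests, params)
# Day-by-day increments of the cumulative reported cases CR.
daily_increments = [history[k + 1][6] - history[k][6] for k in range(30)]
```

Stepping one day at a time keeps the discontinuities of n(t) at the boundaries between integration steps, so the integrator never straddles a jump in the testing rate.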

Method to fit the cumulative number of reported cases

In order to deal with data, we need to understand how to set the parameters as well as some components of the initial conditions (see Fig. 2). To do so, we extend the method first presented in (Liu et al., 2020a). The main novelty here concerns the cumulative number of tests, which is assumed to grow linearly at the beginning. This property is satisfied for the New York State data, as we can see in Fig. 3: the black curve in this figure is close to a line from March 15 to April 15. Fig. 4 shows the day-by-day fluctuations of the number of tests, while in Fig. 3 these fluctuations are not visible and the cumulative data reveal the growth trend of the number of tests.
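The linear-growth assumption can be checked with an ordinary least-squares line fit. The sketch below is one plain-Python way to do it, on synthetic cumulative counts rather than the New York State data; the function name and the numbers are hypothetical. The slope plays the role of the constant daily number of tests over the fitting window.

```python
def fit_line(times, values):
    """Least-squares fit values ~ a * t + b; returns (a, b)."""
    count = len(times)
    mean_t = sum(times) / count
    mean_v = sum(values) / count
    # Centered sums used by the closed-form least-squares solution.
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    a = s_tv / s_tt
    b = mean_v - a * mean_t
    return a, b

# Synthetic cumulative number of tests over 12 days (exactly linear here).
days = list(range(12))
cumulative = [1000.0 + 15000.0 * d for d in days]
a, b = fit_line(days, cumulative)
```

On real data the residuals of the fit give a quick indication of how far the cumulative curve departs from a line over the chosen window.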

Fig. 2

Key time periods of COVID-19 infection: the latent or exposed period before the onset of symptoms and transmissibility, the incubation period before symptoms appear, the symptomatic period, and the transmissibility period, which may overlap the asymptomatic period.

Fig. 3

In this figure, we plot the cumulative number of tests for the New York State. The black curve, orange curve, and blue curve correspond respectively to the number of tests, the number of positive tests, and the number of negative tests. We can see that at the early beginning of the epidemic, the cumulative number of tests (black curve) grows linearly from mid-March to mid-April.

Phenomenological models for the tests: We fit a line to the cumulative number of tests in a suitable interval of days [t1, t2]. This means that we can find a pair of numbers a and b (slope and intercept) such that the linear relation (3.1) holds for N(t) on this interval, where a is the daily number of tests and N1 is the cumulative number of tests on day t1. By using the fact that N′(t) = n(t), we deduce that n(t) = a on [t1, t2] (equation (3.2)). In the simulations we fit a line to the cumulative number of tests from mid-March to mid-April. Fig. 3 shows that the linear growth assumption is reasonable for the New York State cumulative testing data.

Phenomenological models for the reported cases: At the early stage of the epidemic, we assume that all the infected components of the system grow exponentially while the number of susceptible remains unchanged during a relatively short period of time t ∈ [t1, t2].
Therefore, we assume that E(t), I(t), U(t) and D(t) take the exponential form (3.3) on this interval, starting from the values E1, I1, U1 and D1 at time t1. We deduce the equation satisfied by the cumulative number of reported cases and, by replacing D(t) by the exponential formula (3.3), we obtain (3.5). It therefore makes sense to assume that CR(t) − CR(t1) has the exponential form (3.6). By identifying (3.5) and (3.6) we deduce (3.7). Moreover, by using (3.2) and the fact that the number of susceptible S(t) remains constant, equal to S1, on the time interval t ∈ [t1, t2], the E-equation, I-equation, U-equation and D-equation of the model (2.2) become linear. By using (3.3) and computing further, and finally by using (3.7) and (3.8), we obtain (3.10), where I1 is the number of incubating infectious individuals at time t1, U1 is the number of unreported infectious individuals at time t1, E1 is the number of incubating non-infectious individuals at time t1 (see (3.3)), and finally τ1 is the transmission rate at time t1.
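The identification step can be sketched in a few lines of Python. The formulas below are derived here under stated assumptions and are not copied from the original equations (3.7) to (3.10): the compartments E, I, U, D all grow like exp(χ2 (t − t1)); the reported cases satisfy CR(t) − CR(t1) = χ1 (exp(χ2 (t − t1)) − 1) with CR′(t) = a σ g D(t); and the D-, I-, U- and E-equations have the assumed forms D′ = ν f I − (a g + η) D, I′ = α E − ν I, U′ = ν (1 − f) I + a (1 − σ) g D − η U and E′ = τ S1 (I + U + D) − α E. The numerical inputs are the values quoted in the captions of Fig. 5 and Fig. 6.

```python
def initial_conditions(chi1, chi2, a, S1, alpha, nu, eta, sigma, f, g):
    """Solve the linearized early-phase equations for D1, I1, E1, U1, tau1."""
    # Matching the derivative of chi1 * (exp(chi2 * s) - 1) at s = 0
    # with the reporting flux a * sigma * g * D1.
    D1 = chi1 * chi2 / (a * sigma * g)
    # D-equation: chi2 * D1 = nu * f * I1 - (a * g + eta) * D1
    I1 = (chi2 + a * g + eta) * D1 / (nu * f)
    # I-equation: chi2 * I1 = alpha * E1 - nu * I1
    E1 = (chi2 + nu) * I1 / alpha
    # U-equation: chi2 * U1 = nu*(1-f)*I1 + a*(1-sigma)*g*D1 - eta*U1
    U1 = (nu * (1 - f) * I1 + a * (1 - sigma) * g * D1) / (chi2 + eta)
    # E-equation: chi2 * E1 = tau1 * S1 * (I1 + U1 + D1) - alpha * E1
    tau1 = (chi2 + alpha) * E1 / (S1 * (I1 + U1 + D1))
    return {"E1": E1, "I1": I1, "U1": U1, "D1": D1, "tau1": tau1}

# Values quoted in the captions of Fig. 5 and Fig. 6 (g as in Fig. 6).
p = dict(chi1=2.8814e4, chi2=0.1013, a=1.4874e4, S1=19453561.0,
         alpha=1.0, nu=1 / 6, eta=1 / 7, sigma=0.7, f=0.8, g=1e-5)
ic = initial_conditions(**p)
```

Each line solves one linearized equation for one unknown, so the computation runs in sequence from D1 to τ1 without any numerical solver.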

Numerical simulations

We assume that the transmission coefficient τ(t) takes the form (4.1), where τ0 > 0 is the initial transmission coefficient, Tm > 0 is the time at which the social distancing starts in the population, and μ > 0 controls the speed at which this social distancing takes place. To take into account the effect of social distancing and public measures, we assume that the transmission coefficient τ(t) can be modulated by γ. Indeed, by closing schools and non-essential shops and by imposing social distancing in New York State, the number of contacts per day is reduced. This effect was visible on the news during the first wave of the COVID-19 epidemic in New York City, since the streets were almost empty at some point. The parameter γ > 0 is the fraction of the transmissions that remain after a transition period (depending on μ), compared to a normal situation. A similar non-constant transmission rate was considered by Chowell et al. (2004).

In Fig. 5 we consider a constant transmission rate τ(t) ≡ τ0, which corresponds to γ = 1 in (4.1). In order to evaluate the distance between the model and the data, we compare the cumulative number of cases CR produced by the model with the data (see the orange dots and orange curve in Fig. 5-(a)). In Fig. 5-(c) we observe that the cumulative number of cases rises to more than 14 million people, which is not realistic. Nevertheless, with the parameters chosen in Fig. 5-(d), the orange dots and the blue curve match very well.

In the rest of this section, we focus on the model with confinement (or social distancing) measures. We assume that such social distancing measures have a strong impact on the transmission rate by assuming that γ = 0.2 < 1. It means that only 20% of the transmissions remain after a transition period.
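Formula (4.1) is not reproduced in this text. The Python sketch below implements one assumed form that is consistent with the properties stated above: τ(t) equals τ0 until Tm, it then relaxes exponentially at speed μ towards γ τ0, and it reduces to the constant τ0 when γ = 1. It should be read as an illustration of those properties, not as the exact original formula; the value of τ0 is made up.

```python
import math

def transmission_rate(t, tau0, gamma, mu, Tm):
    """Assumed form: tau0 before Tm, then exponential decay towards gamma * tau0."""
    if t <= Tm:
        return tau0
    return tau0 * (gamma + (1.0 - gamma) * math.exp(-mu * (t - Tm)))

tau0 = 2e-8  # hypothetical value, for illustration only
# gamma and mu as quoted in the caption of Fig. 6; Tm counted in days.
rates = [transmission_rate(t, tau0, gamma=0.2, mu=0.0251, Tm=15.0)
         for t in range(0, 200, 10)]
```

With this form the function is continuous at Tm and non-increasing in time, and only the fraction γ of the transmissions remains in the long run.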

Fig. 5

Best fit of the model without confinement (or social distancing) measures (i.e. γ = 1). Fitted parameters: The transmission rate τ(t) ≡ τ0 is constant according to formula (4.1) with γ = 1 and τ0 is fixed to the value τ1 computed by using (3.10). Parameter values: S0 = 19453561, α = 1, ν = 1/6, η = 1/7, σ = 0.7, f = 0.8 and g = 6/S0 = 3.08 × 10^−7. t1 = March 18, t2 = March 29, a = 1.4874 × 10^4, b = −2.1781 × 10^5, χ1 = 2.8814 × 10^4, χ2 = 0.1013, χ3 = 2.9969 × 10^4. In figure (a) we plot the cumulative number of tests (black dots), the cumulative number of positive cases (red dots) for the state of New York and the cumulative number of cases CD(t) (yellow curve) obtained by fitting the model to the data. In figures (b)–(c) we plot the number of cases obtained from the model. We observe that most of the cases are unreported. In figure (d) we plot the daily number of tests (black dots), the daily number of positive cases (red dots) for the state of New York and the daily number of cases DD(t) obtained from the data.

Fig. 6

Best fit of the model with confinement (or social distancing) measures. Parameter values: Same as in Fig. 5, except the transmission coefficient which is not constant in time with γ = 0.2, Tm = March 15 (starting day of public measures), μ = 0.0251, g = 10^−5 and τ0 is fixed at the value τ1 computed by using (3.10). In figure (a) we plot the cumulative number of tests (black dots), the cumulative number of positive cases (red dots) for the state of New York and the cumulative number of cases CD(t) (yellow curve) obtained by fitting the model to the data. In figures (b)–(c) we plot the corresponding number of cases obtained from the model. With this set of parameters we observe that most of the cases are unreported. In figure (d) we plot the daily number of tests (black dots), the daily number of positive cases (red dots) for the state of New York and the daily number of cases DD(t) obtained from the data.

In Fig. 7 (a) and (b), we aim at understanding the connection between the daily fluctuations of the number of reported cases (epidemic dynamics) and the daily number of tests (testing dynamics). The combination of the testing dynamics and the infection dynamics gives a very complex curve parametrized by time. It seems that the only reasonable comparison that we can make is between the cumulative number of reported cases and the cumulative number of tests. In Fig. 7 (c) and (d), the comparison of the model and the data gives a very decent fit. In Fig. 7, all the curves are time-dependent parametrized curves. The abscissa is the number of tests (horizontal axis) and the ordinate is the number of reported cases (vertical axis). It corresponds (with our notations) to the parametric functions t → (ndata(t), DR(t)) in figures (a) and (b) and their cumulative equivalent t → (Ndata(t), CR(t)) in figures (c) and (d). In figures (a) and (c) we use only the data, that is to say that we plot t → (ndata(t), DRdata(t)) and t → (Ndata(t), CRdata(t)). In figures (b) and (d) we use the model for the number of reported cases, that is to say that we plot t → (ndata(t), DRmodel(t)) and t → (Ndata(t), CRmodel(t)).

Fig. 7

In this figure we plot the curves of the number of reported cases as a function of the number of tests parametrized by the time. The top figures (a) and (b) correspond to the daily number of cases and the bottom figures (c) and (d) correspond to the cumulative number of cases. On the left-hand side we plot the data (a) and (c) while on the right-hand side we plot the model (b) and (d). Parameter values: Same as in Fig. 6. In figure (a) we plot the daily number of cases coming from the data as a function of the daily number of tests. In figure (b) we plot the daily number of cases given by the model as a function of the daily number of tests coming from the data. In figure (c) we plot the cumulative number of cases coming from the data as a function of the cumulative number of tests. In figure (d) we plot the cumulative number of cases coming from the model as a function of the cumulative number of tests from the data.

In Fig. 8, our goal is to investigate the effect of a change in the testing policy in New York State. We are particularly interested in estimating the effect of an increase in the number of tests on the epidemic. Indeed, increasing the number of tests may be thought of as beneficial to reduce the number of cases. Here we challenge this idea by comparing an increase in the number of tests to the quantitative output of our model. In Fig. 8, we replace the daily number of tests ndata(t) (coming from the data for New York State) in the model by either 2 × ndata(t), 5 × ndata(t), 10 × ndata(t) or 100 × ndata(t).

Fig. 8

Cumulative number of cases for different testing strategies: Original (blue curve), doubled (red curve), multiplied by 5 (yellow curve), multiplied by 10 (purple curve) and multiplied by 100 (green curve). The transmission coefficient depends on the time, according to formula (4.1) with γ = 0.2, and τ0 is fitted by using (3.10). Parameter values: they are the same as in Fig. 6. In figure (a) we plot the cumulative number of reported cases CR(t) as a function of time. In figure (b) we plot the cumulative number of undetectable cases CU(t) as a function of time. In figure (c) we plot the cumulative number of cases (including covert cases) CD(t) as a function of time. Note that the total number of cases (including covert cases) is reduced by 35% when the number of tests is multiplied by 100.

As expected, an increase in the number of tests helps to reduce the number of cases at first. However, once the number of tests has been multiplied by 10, there is no significant difference in the number of reported cases between 10 times and 100 times more tests. Therefore there must be an optimum between increasing the number of tests (which costs money and other limited resources) and being efficient in slowing down the epidemic.
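The scaling experiment can be mimicked on a toy version of the model. The Python sketch below multiplies a constant, made-up daily number of tests by 1, 2, 5, 10 and 100 and integrates an assumed form of the model with an explicit Euler scheme chosen for brevity. The right-hand side is reconstructed from the flows described in Section 2, the transmission rate is kept constant, and τ, the initial state and the baseline of 15000 tests per day are invented, so the numbers it produces are not those of Fig. 8.

```python
def run(factor, days=120, steps_per_day=50):
    """Explicit Euler run of the assumed model with n(t) = factor * 15000."""
    tau, alpha, nu, eta = 2e-8, 1.0, 1 / 6, 1 / 7  # tau is hypothetical
    sigma, f, g = 0.7, 0.8, 1e-5
    n_t = factor * 15000.0                         # synthetic testing level
    S, E, I, U, D = 19453561.0, 500.0, 300.0, 100.0, 100.0
    reported = infected = 0.0
    h = 1.0 / steps_per_day
    for _ in range(days * steps_per_day):
        infection = tau * S * (I + U + D)  # new infections per day
        tested = n_t * g * D               # D-individuals tested per day
        dS = -infection
        dE = infection - alpha * E
        dI = alpha * E - nu * I
        dU = nu * (1 - f) * I + (1 - sigma) * tested - eta * U
        dD = nu * f * I - tested - eta * D
        S, E, I, U, D = (S + h * dS, E + h * dE, I + h * dI,
                         U + h * dU, D + h * dD)
        reported += h * sigma * tested     # cumulative positive tests
        infected += h * infection          # cumulative new infections
    return {"reported": reported, "infected": infected}

results = {factor: run(factor) for factor in (1, 2, 5, 10, 100)}
for factor, out in results.items():
    print(factor, round(out["reported"]), round(out["infected"]))
```

In this toy setting, testing only removes D-individuals from the transmitting population (true positives are isolated, false negatives move to U), which is why more tests cannot increase the cumulative number of infections.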

Discussion

In this article, we proposed a new epidemic model involving the daily number of tests as an input of the model. The model itself extends our previous models presented in (Griette et al., 2020; Liu et al., 2020a, 2020b, 2020c, 2020d, 2021). We proposed a new method to use the data in such a context, based on the fact that the cumulative number of tests grows linearly at the early stage of the epidemic. Fig. 3 shows that this is a reasonable assumption for the New York State data from mid-March to mid-April. Our numerical simulations show a very good concordance between the number of reported cases produced by the model and the data in two very different situations. Indeed, Fig. 5 and Fig. 6 correspond respectively to an epidemic without and with public intervention to limit the number of transmissions. This is an important observation since it shows that testing data and reported cases are not sufficient to evaluate the real amplitude of the epidemic. To solve this problem, the only solution seems to be to include a different kind of data in the models. This could be done by studying statistically representative samples of the population. Otherwise, biases can always be suspected. Such a question is of particular interest in order to evaluate the fraction of the population that has been infected by the virus and their possible immunity. In Fig. 7, we compared the testing dynamics (day-to-day variation in the number of tests) and the reported cases dynamics (day-to-day variation in the number of reported cases). The dynamics of daily cases is extremely complex, but we obtain a relatively robust curve for the cumulative numbers, and our model gives a good fit for these cumulative cases. In Fig. 8, we compared multiple testing strategies. By multiplying the number of tests by 2, 5, 10 and 100, we can project the efficiency of an increase in the daily number of tests. We observe that it is efficient to increase this number up to 10 times, but the relative gain in the absolute number of infected individuals rapidly drops after that. In particular, our projections do not show a big difference between a 10-fold increase in the number of tests and a 100-fold increase. Therefore there is a balance to find between the number of tests and the efficiency in the evaluation of the number of cases, the optimal strategy being dependent on other factors like the monetary cost of the tests.

Funding

Q.G. and P.M. acknowledge the support of ANR flash COVID-19 MPCUII.

Declaration of competing interest

None declared.
