Junyang Cai1, Jian Zhou1. 1. School of Management, Shanghai University, Shanghai 200444, China.
Abstract
A serological survey by the CDC indicated that, by the end of 2020, more than 10% of individuals in America probably had a resolving or past infection with SARS-CoV-2, which implies a massive number of unconfirmed asymptomatic infections relative to the reported case counts. Asymptomatic patients are one of the crucial reasons why the COVID-19 pandemic is hard to contain, so estimating the number of unconfirmed ones, both actively infected and already recovered, is of great significance for formulating epidemic prevention and control policies. This paper proposes a varying coefficient Susceptible-Infected-Removed-Susceptible (vSIRS) model to obtain time series of the number of unconfirmed asymptomatic infections. Moreover, thanks to the time-varying coefficients, the model can track how the COVID-19 situation changes under policy interventions and improving medical care. A novel two-stage approach built around a programming problem is developed to estimate the unknown parameters of the vSIRS model. Subsequently, by leveraging seroprevalence data, daily reported case data, and other clinical information, we apply the vSIRS model to analyze the evolution of COVID-19 in America. The modeling results show that millions of active asymptomatic infected individuals were unconfirmed during the autumn and winter of 2020, which was a major driver of the American COVID-19 pandemic.
COVID-19 broke out in Wuhan, China in early 2020 and quickly engulfed the world. This disease, caused by SARS-CoV-2, poses a serious threat to the lives and health of people worldwide [1], [2], [3]. The United States (US) is one of the countries most affected by the COVID-19 pandemic. The spread of the virus is difficult to control because numerous asymptomatic infected people have not been confirmed and quarantined. On 25 June 2020, the Washington Post reported that CDC director Robert Redfield said the number of SARS-CoV-2 infections in America could be ten times higher than the 2.3 million confirmed cases at that time (footnote 1). Manski and Molinari concluded via probabilistic inference that the real infection rate was higher than the reported one [4]. These pieces of evidence suggest that substantial asymptomatic infections in the US were going undetected [5].

Confirmed patients are those who test positive in nucleic acid testing (nasal or throat swab testing). However, asymptomatic infected people rarely seek testing voluntarily; they are generally confirmed and quarantined through the testing of close contacts of diagnosed infections. Such a testing mode is a drop in the ocean compared with the large asymptomatic population [6], [7], so infection data are missing for individuals who have not been tested. Although current studies indicate that asymptomatic infected people have weak infectivity [8], [9], the US government still faces an enormous challenge in containing the COVID-19 pandemic because of the huge base of asymptomatic infections. Therefore, to formulate reasonable epidemic prevention measures, estimating the number of asymptomatic infections is imperative.

Unconfirmed infected people include the pre-symptomatic and a large fraction of the asymptomatic. The latter can be further divided into those who are actively infected and those who have recovered. We are most concerned with the number of active unconfirmed asymptomatic cases, which strongly drives the development of COVID-19. Generally speaking, existing infectious disease models based on the Susceptible–Infected–Removed (SIR) model insert one or two new compartments to describe asymptomatic cases [10], [11], [12], [13], [14], [15], [16], [17], [18].
However, these ordinary differential equation (ODE) models face two pivotal problems that existing research largely neglects: the unknown numbers of individuals in unobservable compartments, and fixed coefficients.

For the first problem, the exact numbers of asymptomatic and pre-symptomatic cases cannot be observed in reality, and the resulting unknown initial condition makes ODE models difficult to solve. Given the insufficient testing in the US, using reported asymptomatic case numbers as fitting information for an ODE model is not reliable, since some asymptomatic infected persons are not counted [4]. Even in January 2020, in the early stages of the epidemic, the number of asymptomatic infections in the US cannot be assumed to be approximately zero. In view of this, Angeli et al. used semi-supervised neural networks on daily reported epidemic data to estimate the initial numbers of individuals in unobservable compartments and other coefficients in a Susceptible–Asymptomatic–Infected–Vaccinated–Removed (SAIVR) model [18]. But neural networks are not the ideal estimation method here because they usually rely on massive sample data; otherwise the model tends to overfit [19].

For the second problem, the traditional SIR model and its extensions all assume fixed coefficients, including the infection rate, the removal rate, etc. But government policies, advances in treatment, and other related factors all exert a marked real-time influence on the values of these coefficients. For instance, strict stay-at-home orders, social distancing, and mask measures reduce infection rates by limiting population contact rates [20], [21]. In other words, time-varying coefficients better reflect the real situation of the epidemic over time. Nevertheless, most existing papers focus on ODE models with fixed coefficients estimated by Markov chain Monte Carlo (MCMC) and Bayesian methods [22], [23], [24]. Song et al. established a Susceptible–Exposed–Infected–Recovered–Dead (SEIRD) model with time-varying infection, death, and recovery rates [25], and applied a maximum likelihood method to estimate these three coefficients. However, the SEIRD model ignored the group of asymptomatic individuals. Relying on the assumption that asymptomatic patients are never confirmed, Yan et al. used locally weighted kernel regression to estimate the varying coefficients in a varying coefficient susceptible–infected–asymptomatic–diagnosed–removed (vSIADR) model [26]. Yet this assumption is out of line with reality, since a considerable share of confirmed cases in the US are asymptomatic.

We must acknowledge that the choice of the initial condition of the ODE model affects the values of the time-varying coefficients. As a simple example, for a given number of deaths, the more infected cases there are, the lower the COVID-19 death rate. Hence both problems must be addressed to describe the development trend of COVID-19 accurately. In light of this, parameter estimation for the ODE model needs to take into account not only daily reported data, such as cumulative confirmed and cumulative death data, but also other valuable information. Serological data is one such source. Nucleic acid testing differs from serological testing: the former detects the presence of live virus within an individual, while the latter determines whether someone has a history of COVID-19 infection by detecting SARS-CoV-2 antibodies in serum [4]. It should be noted, though, that seroprevalence does not represent the true cumulative percentage of infected individuals in the total population, because antibodies have a limited duration. Seroprevalence only represents the proportion of the population with antibodies in the serum at the time of the serologic investigation.
Nevertheless, seroprevalence is still a vital reference for estimating the numbers of susceptible and infected people in the US. To our knowledge, few papers use seroprevalence in the parameter estimation of ODE models. If only daily reported data are considered in the modeling process and seroprevalence information is ignored, the model results will be distorted, because insufficient testing censors the daily newly confirmed cases. This paper uses serological information to obtain more realistic modeling results.

The rest of this paper is organized as follows. A varying coefficient ODE model with unconfirmed asymptomatic compartments is proposed in Section 2. The detailed parameter estimation process is introduced in Section 3. Section 4 uses real data, including seroprevalence and daily reported data, for the epidemic simulation; it is divided into two parts, namely the introduction of the data sources with related preprocessing, and the results and discussion of the model. The paper ends with a conclusion in Section 5.
The vSIRS model
The varying coefficient Susceptible–Infected–Removed–Susceptible (vSIRS) model is established in this section. As an expanded version of the SIR model, this ODE model can estimate the trend of the number of unconfirmed asymptomatic infections. Numerous studies have shown that a significant proportion of those infected have no symptoms, or develop symptoms only after the incubation period has passed, which makes it difficult to control the epidemic [27]. Given the elusive transmission of SARS-CoV-2, we set seven compartments in the vSIRS model based on disease characteristics to better describe the COVID-19 pandemic: susceptible people ($S$), active asymptomatic infected but not confirmed people ($I_A$), infected pre-symptomatic but not confirmed people ($I_P$), confirmed and quarantined people ($I_Q$) coming from compartments $I_A$ and $I_P$, recovered people ($R_A$) who come from compartment $I_A$ and carry antibodies, recovered people ($R_Q$) who come from compartment $I_Q$ and carry antibodies, and dead people ($R_D$). For ease of understanding, some features of the relevant compartments are organized in Table 1.
Table 1
The feature information about relevant compartments in the vSIRS model.
| Individuals in compartment | Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5 |
|---|---|---|---|---|---|
| $I_A$ | √ | × | × | × | × |
| $I_P$ | √ | × | × | × | × |
| $I_Q$ | √ | √ | √ | × | × |
| $R_A$ | × | × | × | √ | × |
| $R_Q$ | × | √ | × | √ | √ |
| $R_D$ | × | √ | × | × | × |
Condition 1: Whether these individuals are active cases.
Condition 2: Whether these individuals have ever been confirmed with COVID-19.
Condition 3: Whether these individuals are being quarantined to prevent the spread of the virus.
Condition 4: Whether these individuals have recovered.
Condition 5: Whether these individuals have recovered and been counted.
From Table 1, it can be seen that active infected and removed people are each subdivided into three compartments in this model. The former are segmented into $I_A$, $I_P$ and $I_Q$. The latter are segmented into $R_A$, $R_Q$ and $R_D$. In addition, individuals in compartments $I_A$, $I_P$ and $R_A$ can be classified as an unconfirmed group, and those in compartments $I_Q$, $R_Q$ and $R_D$ as a confirmed group. In other words, at a given moment, anyone infected who has ever passed through compartment $I_Q$ is confirmed, while those who have not gone through compartment $I_Q$ are undiagnosed. Table 1 does not include compartment $S$ since $S$ contains three types of individuals: (i) individuals who have never been infected; (ii) previously diagnosed and cured cases who have lost their antibodies and become susceptible again; (iii) previously unconfirmed recovered cases whose antibodies have disappeared.

Before formulating the specific vSIRS model, three hypotheses of the model are provided as follows:

(i) The quarantined population diagnosed with COVID-19 hardly infects susceptible individuals, because these people are usually quarantined at home or in hospital according to epidemic prevention policies.

(ii) For someone in compartment $I_P$, there is no time delay among symptom onset, diagnosis and quarantine.

(iii) People in compartments $I_A$ and $I_Q$ recover at the same rate. A similar hypothesis was made in the epidemiological modeling study of [26].

These three hypotheses play an important role in the subsequent modeling process.
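Subsequently, the specific vSIRS model is as follows:

$$
\begin{aligned}
\frac{dS(t)}{dt} &= -\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} + \lambda \left(R_A(t) + R_Q(t)\right), && \text{(a)}\\
\frac{dI_A(t)}{dt} &= (1-\theta)\,\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} - \gamma_t I_A(t) - \phi_t I_A(t), && \text{(b)}\\
\frac{dI_P(t)}{dt} &= \theta\,\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} - \alpha I_P(t), && \text{(c)}\\
\frac{dI_Q(t)}{dt} &= \gamma_t I_A(t) + \alpha I_P(t) - \phi_t I_Q(t) - \eta_t I_Q(t), && \text{(d)}\\
\frac{dR_A(t)}{dt} &= \phi_t I_A(t) - \lambda R_A(t), && \text{(e)}\\
\frac{dR_Q(t)}{dt} &= \phi_t I_Q(t) - \lambda R_Q(t), && \text{(f)}\\
\frac{dR_D(t)}{dt} &= \eta_t I_Q(t), && \text{(g)}
\end{aligned}
\tag{1}
$$

where the definitions of the notation in ODEs (1) can be found in Table 2.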
Table 2
Notations.
| Notation | Description |
|---|---|
| $S(t)$ | The number of susceptible individuals at time $t$ |
| $I_A(t)$ | The number of individuals who are active asymptomatic infected but not confirmed at time $t$ |
| $I_P(t)$ | The number of individuals who are active pre-symptomatic infected but not confirmed at time $t$ |
| $I_Q(t)$ | The number of quarantined and confirmed active infected people at time $t$ |
| $R_A(t)$ | The number of individuals with antibodies who have recovered from compartment $I_A$ at time $t$ |
| $R_Q(t)$ | The number of individuals with antibodies who have recovered from compartment $I_Q$ at time $t$ |
| $R_D(t)$ | The number of dead individuals at time $t$ |
| $\beta_t^A$ | Infection rate for individuals in compartment $I_A$ at time $t$ |
| $\beta_t^P$ | Infection rate for individuals in compartment $I_P$ at time $t$ |
| $\gamma_t$ | Diagnostic rate for individuals in compartment $I_A$ at time $t$ |
| $\phi_t$ | Recovery rate at time $t$ |
| $\eta_t$ | Death rate at time $t$ |
| $\alpha$ | Diagnostic rate for individuals in compartment $I_P$ |
| $\lambda$ | Antibody disappearance rate |
| $\theta$ | The proportion of individuals entering compartment $I_P$ among newly infected individuals |
| $N$ | The total population |
Additionally, the sum of people in all compartments satisfies

$$S(t) + I_A(t) + I_P(t) + I_Q(t) + R_A(t) + R_Q(t) + R_D(t) = N. \tag{2}$$

In the vSIRS model, (1)(a) describes the rate of change of the susceptible population $S$. On the one hand, the outflow from compartment $S$ depends mainly on the number of susceptible people, the number of active cases not quarantined, i.e., the people in compartments $I_A$ and $I_P$, and the real-time infection rates $\beta_t^A$ and $\beta_t^P$. On the other hand, compartment $S$ does not only lose people to the infected population; it simultaneously receives recovered patients whose antibodies have disappeared, at the rate $\lambda$. (1)(b) characterizes the transition rate of active unconfirmed asymptomatic people. A proportion $1-\theta$ of newly infected individuals from compartment $S$ goes into $I_A$. Meanwhile, people in $I_A$ are screened, test positive and are quarantined at the rate $\gamma_t$, or enter a recovered state at the rate $\phi_t$. According to (1)(c), the remaining proportion $\theta$ of newly infected individuals from compartment $S$ turns into pre-symptomatic cases. When people in $I_P$ develop symptoms and receive a positive diagnosis, they move to $I_Q$ at the rate $\alpha$. (1)(d) expresses the dynamics of the quarantined and confirmed active infected population. Note that individuals in $I_Q$ may be symptomatic (cases from $I_P$) or asymptomatic (cases from $I_A$), but they are all confirmed cases that tested positive for nucleic acid. In addition, people in $I_Q$ recover at the rate $\phi_t$ and die at the rate $\eta_t$. (1)(e) and (1)(f) show the fluctuations in the number of recovered antibody carriers that are uncounted ($R_A$) and counted ($R_Q$), respectively. (1)(g) reflects the change in the number of deaths due to COVID-19 over time.

Compared with the traditional SIR model, the vSIRS model is expanded in the following four aspects:

(i) The susceptible population can not only be transformed into infected cases; recovered people also have a certain chance of becoming susceptible again. The duration of antibodies is taken into consideration because existing medical studies have shown that antibodies to SARS-CoV-2 are not carried for life [28], [29], [30]. In theory, therefore, the number of susceptible people does not always decline over time.

(ii) There are two routes of infection for SARS-CoV-2: the susceptible population is infected by people in compartments $I_A$ and $I_P$. Because almost all confirmed cases are quarantined in reality, virus carriers in compartment $I_Q$ do not participate in infection (the first hypothesis). Also notice that susceptible people exposed to cases in compartment $I_A$ may still develop symptoms of COVID-19 or die from contracting SARS-CoV-2, rather than merely becoming asymptomatic. The potential hazard caused by active unconfirmed infected cases should therefore never be ignored or belittled, and it is necessary to take stock of the number of these people by utilizing seroprevalence data. Note that $I_A$ and $I_P$ cannot be observed, and that $\theta$, the proportion of newly pre-symptomatic cases among newly infected ones, is an unobservable constant in our model.

(iii) The vSIRS model fully considers the nucleic acid testing mechanism during the COVID-19 pandemic, in which a portion of asymptomatic cases is screened by testing close contacts of existing confirmed cases. The scope of testing is limited in the US, so a significant number of asymptomatic infected people are not diagnosed. Thus individuals in compartment $I_A$ are confirmed and quarantined at the rate $\gamma_t$.

(iv) Considering the influence of pharmaceutical and non-pharmaceutical public health measures, such as social distancing, gradually improving medical treatment, testing intensity, and other factors, $\beta_t^A$, $\beta_t^P$, $\gamma_t$, $\phi_t$, and $\eta_t$ usually fluctuate over time. They are therefore set as time-varying coefficients to better capture the development trend of the epidemic in the US. Additionally, the diagnostic rate $\alpha$ for individuals in compartment $I_P$ can be regarded as the inverse of the average incubation period from infection to symptom onset, diagnosis, and quarantine (the second hypothesis). The coefficient $\lambda$ is the inverse of the mean duration of antibodies after someone is cured. Both the mean antibody duration and the average incubation period are constants, hence $\alpha$ and $\lambda$ are set as constants in the vSIRS model.

The first three aspects are structural improvements of the compartments in the vSIRS model, and the last aspect characterizes its coefficients. Compared to the SAIVR model mentioned in Section 1, our model is more flexible thanks to the time-varying coefficients. Compared to the SEIRD and vSIADR models mentioned in Section 1, the compartment structure of our model better fits the specific situation of the COVID-19 pandemic in two main respects: (i) a small proportion of asymptomatic infected persons are diagnosed and quarantined; (ii) the disappearance of antibodies leads to an increase in susceptible individuals.

To facilitate the subsequent processing, we next discretize the vSIRS model to reflect daily changes of the epidemic situation, and focus on this discrete model from here on. The estimates of the time-varying coefficients and initial conditions for the vSIRS model are given in the next section by leveraging American clinical information, including seroprevalence data.

Let $\Delta S(t)$, $\Delta I_A(t)$, $\Delta I_P(t)$, $\Delta I_Q(t)$, $\Delta R_A(t)$, $\Delta R_Q(t)$, and $\Delta R_D(t)$ be the increments of individuals on day $t$ in every compartment, i.e., $\Delta X(t) = X(t+1) - X(t)$ for each compartment $X$. Then the daily discretized version of the vSIRS model can be expressed as

$$
\begin{aligned}
\Delta S(t) &= -\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} + \lambda \left(R_A(t) + R_Q(t)\right), && \text{(a)}\\
\Delta I_A(t) &= (1-\theta)\,\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} - \gamma_t I_A(t) - \phi_t I_A(t), && \text{(b)}\\
\Delta I_P(t) &= \theta\,\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} - \alpha I_P(t), && \text{(c)}\\
\Delta I_Q(t) &= \gamma_t I_A(t) + \alpha I_P(t) - \phi_t I_Q(t) - \eta_t I_Q(t), && \text{(d)}\\
\Delta R_A(t) &= \phi_t I_A(t) - \lambda R_A(t), && \text{(e)}\\
\Delta R_Q(t) &= \phi_t I_Q(t) - \lambda R_Q(t), && \text{(f)}\\
\Delta R_D(t) &= \eta_t I_Q(t). && \text{(g)}
\end{aligned}
\tag{3}
$$

This discretization is similar to the forward Euler method for solving ODEs. From (3), the number of real new infections on day $t$ is $\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)/N$, of which a share $1-\theta$ are asymptomatic infections and a share $\theta$ are pre-symptomatic ones.
Moreover, $\beta_t^A I_A(t) S(t)/N$ and $\beta_t^P I_P(t) S(t)/N$ express the real numbers of new infections on day $t$ caused by active asymptomatic cases and by pre-symptomatic cases, respectively. $\phi_t \left(I_A(t) + I_Q(t)\right)$ expresses the real number of new recoveries on day $t$, but the reported number of new recovered cases is only $\phi_t I_Q(t)$. In addition, $\gamma_t I_A(t) + \alpha I_P(t)$ is the reported number of new confirmed cases on day $t$. Among them, $\gamma_t I_A(t)$ are confirmed asymptomatic patients, and $\alpha I_P(t)$ are pre-symptomatic patients confirmed due to symptom onset. Finally, $\eta_t I_Q(t)$ represents the number of new deaths on day $t$. Fig. 1 shows the daily population movement among compartments.
Fig. 1
Daily movement of people among compartments involved in the vSIRS model.
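The daily flows among compartments described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the names `State` and `vsirs_step` and all numeric values are hypothetical, and the update assumes the standard mass-action incidence $(\beta_t^A I_A + \beta_t^P I_P)S/N$ with a share $\theta$ of new infections entering $I_P$.

```python
from typing import NamedTuple


class State(NamedTuple):
    S: float   # susceptible
    IA: float  # active asymptomatic, unconfirmed
    IP: float  # pre-symptomatic, unconfirmed
    IQ: float  # confirmed and quarantined
    RA: float  # recovered from IA, antibody carriers (uncounted)
    RQ: float  # recovered from IQ, antibody carriers (counted)
    RD: float  # dead


def vsirs_step(x, beta_a, beta_p, gamma, phi, eta, alpha, lam, theta, n_total):
    """Advance the discretized vSIRS model by one day (forward-Euler style)."""
    # Real new infections caused by unquarantined active cases.
    new_inf = (beta_a * x.IA + beta_p * x.IP) * x.S / n_total
    return State(
        S=x.S - new_inf + lam * (x.RA + x.RQ),
        IA=x.IA + (1 - theta) * new_inf - (gamma + phi) * x.IA,
        IP=x.IP + theta * new_inf - alpha * x.IP,
        IQ=x.IQ + gamma * x.IA + alpha * x.IP - (phi + eta) * x.IQ,
        RA=x.RA + phi * x.IA - lam * x.RA,
        RQ=x.RQ + phi * x.IQ - lam * x.RQ,
        RD=x.RD + eta * x.IQ,
    )


# Hypothetical numbers, for illustration only.
x0 = State(S=9000.0, IA=300.0, IP=100.0, IQ=200.0, RA=250.0, RQ=140.0, RD=10.0)
x1 = vsirs_step(x0, beta_a=0.10, beta_p=0.20, gamma=0.02, phi=0.05,
                eta=0.002, alpha=1 / 3, lam=1 / 270, theta=0.3,
                n_total=sum(x0))
```

Because every outflow from one compartment is an inflow to another, the total population is conserved by each step, which is a convenient sanity check on any implementation.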
Under the given vSIRS model, we can obtain the effective reproduction number $R_t$, which is a significant index for measuring the severity of the COVID-19 pandemic. See Appendix A for the derivation of $R_t$. Informally, $R_t$ is the average number of individuals newly infected by an active infectious person at time $t$. When $R_t < 1$, the epidemic situation improves. When $R_t > 1$, the number of infections increases exponentially. When $R_t = 1$, the disease becomes endemic. It is worth noting that if the infection base is large enough, there will still be a great number of new infections even after $R_t$ falls below 1, so epidemic prevention and control measures cannot be relaxed and may even need to be strengthened.
Parameter estimation
For the sake of further explanation, we denote by $\tilde{R}_Q(t)$ and $\tilde{C}(t)$ the cumulative recovered cases from compartment $I_Q$ and the reported cumulative confirmed cases on day $t$, respectively. $R_Q(t)$ is less than $\tilde{R}_Q(t)$ at the same time point owing to the gradual disappearance of antibodies in blood serum. Through official channels, we can only obtain daily reported data, namely the cumulative confirmed, cumulative recovered and cumulative death numbers, i.e., $\tilde{C}(t)$, $\tilde{R}_Q(t)$ and $R_D(t)$, respectively. As mentioned earlier, $\tilde{C}(t)$ and $\tilde{R}_Q(t)$ are just reported data; in fact, massive numbers of infected and recovered cases are missed by the statistics.

In this section, a two-stage approach is proposed to estimate the unknown coefficients and the number of people in the respective compartments. Specifically, we first estimate $\phi_t$, $\eta_t$, and $S(t)$ by using the available information. The remaining quantities, including $\theta$, $\beta_t^A$, $\beta_t^P$, $\gamma_t$, $I_A(t)$, $I_P(t)$, and $R_A(t)$, are then estimated by means of a programming problem. Moreover, the constants $\lambda$ and $\alpha$ are given directly based on clinical data and relevant literature. For the former, Zunyou Wu, a renowned epidemiologist in China, has said that the retention time of antibodies is generally 6 to 12 months (footnote 2), and a follow-up investigation in Wuhan showed that antibodies in the body persist for at least nine months [31]. Based on this evidence, $\lambda$ is fixed in this paper as the inverse of the assumed mean antibody duration. For the latter, $\alpha$ is set to 1/3 by referencing [32], [33].
The first stage
Let $\Delta \tilde{R}_Q(t)$ be the increment of the reported recovered cases on day $t$. According to (3)(f), we have the iterative formula

$$R_Q(t+1) = (1-\lambda) R_Q(t) + \phi_t I_Q(t) \tag{4}$$

under a given $\lambda$. The initial iteration value of $R_Q$ is set to zero at the time when epidemic data began to be recorded. Because the second term on the right-hand side of (4) represents the reported new cured cases on day $t$, we have

$$\phi_t I_Q(t) = \Delta \tilde{R}_Q(t). \tag{5}$$

Since $\lambda$ and $\Delta \tilde{R}_Q(t)$ are known, we can easily calculate the estimate of $R_Q(t)$ for every day since the outbreak according to (4). At the same time, on the basis of (5), the estimate of $\phi_t$ can be written as

$$\hat{\phi}_t = \frac{\Delta \tilde{R}_Q(t)}{I_Q(t)}, \tag{6}$$

where $I_Q(t) = \tilde{C}(t) - \tilde{R}_Q(t) - R_D(t)$, i.e., the reported cumulative confirmed cases minus the reported cumulative recovered cases and the cumulative deaths.

Next, denoting by $\rho(t)$ the seroprevalence in the US on day $t$, we get

$$\rho(t) = \frac{N - S(t) - R_D(t)}{N - R_D(t)}. \tag{7}$$

The respondents of the serological survey are living individuals, thus (7) excludes the dead population. Furthermore, (7) holds only under the supposition that the body starts producing antibodies as soon as a person is infected. On this point, in a clinical study of COVID-19, Huang et al. [34] found that a large proportion of active asymptomatic infected individuals had tested positive for antibodies to the virus. Similarly, in a clinical study of antibodies, Ko et al. [35] found that the vast majority of active asymptomatic cases tested positive in serology, and that the seroprevalence among active patients with pneumonia symptoms was 100% in their sample population [35]. In addition, the amount of antibodies in the body decreases, until it completely disappears, only after the patient has been recovered for a period of time. Thereby, for convenience, individuals in compartments $I_A$, $I_P$, $I_Q$, $R_A$, and $R_Q$ are all considered to be antibody carriers in our paper.

In view of (7), and using $N = S(t) + I_A(t) + I_P(t) + I_Q(t) + R_A(t) + R_Q(t) + R_D(t)$ according to (2), we have

$$I_A(t) + I_P(t) + R_A(t) = \rho(t)\left(N - R_D(t)\right) - I_Q(t) - R_Q(t). \tag{8}$$

Meanwhile, also from (7), we obtain the daily estimates of the number of susceptible people

$$\hat{S}(t) = \left(1 - \rho(t)\right)\left(N - R_D(t)\right). \tag{9}$$

The first stage ends with an estimate of $\eta_t$. It is assumed in our paper that all deaths due to COVID-19 can be observed, i.e., the reported death data are reliable, because unconfirmed active asymptomatic patients do not die and unconfirmed pre-symptomatic ones normally do not die during the incubation period either. This assumption was also made in Manski and Molinari's modeling study of the COVID-19 pandemic [4]. Generally, only symptomatic cases in compartment $I_Q$ are at risk of death. In light of this, since the daily new deaths satisfy $\Delta R_D(t) = \eta_t I_Q(t)$, we obtain

$$\hat{\eta}_t = \frac{\Delta R_D(t)}{I_Q(t)}. \tag{10}$$
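The first-stage quantities can be computed directly from reported series. The sketch below is illustrative only and rests on stated assumptions: the function name `first_stage` and the toy inputs are hypothetical, active confirmed cases are taken as cumulative confirmed minus cumulative recovered minus cumulative deaths, and the susceptible count is taken as the antibody-free share of the living population.

```python
def first_stage(cum_confirmed, cum_recovered, cum_deaths, seroprev, n_total, lam):
    """Estimate R_Q(t), phi_t, eta_t and S(t) from reported daily series.

    All inputs are equal-length lists indexed by day; lam is the assumed
    antibody disappearance rate. Returns (RQ, phi, eta, S).
    """
    days = len(cum_confirmed) - 1
    rq = [0.0]  # assumption: no counted recovered carriers on the first day
    phi, eta, sus = [], [], []
    for t in range(days):
        new_recovered = cum_recovered[t + 1] - cum_recovered[t]
        new_deaths = cum_deaths[t + 1] - cum_deaths[t]
        # Reported active confirmed cases on day t.
        iq = cum_confirmed[t] - cum_recovered[t] - cum_deaths[t]
        # Recovered carriers decay at rate lam and gain the new recoveries.
        rq.append((1 - lam) * rq[t] + new_recovered)
        phi.append(new_recovered / iq if iq > 0 else 0.0)
        eta.append(new_deaths / iq if iq > 0 else 0.0)
        # Living individuals without antibodies are treated as susceptible.
        sus.append((1 - seroprev[t]) * (n_total - cum_deaths[t]))
    return rq, phi, eta, sus


# Toy data, for illustration only.
rq, phi, eta, sus = first_stage(
    cum_confirmed=[100, 120, 150],
    cum_recovered=[10, 20, 35],
    cum_deaths=[1, 2, 4],
    seroprev=[0.1, 0.1, 0.1],
    n_total=1000,
    lam=0.1,
)
```

In practice the recursion for $R_Q$ would start from the first day of recorded data rather than from the first day of the study window, so that earlier recoveries are already reflected in the initial value.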
The second stage
CDC researchers indicated that up to 59% of SARS-CoV-2 infections in the US are caused by asymptomatic cases [36]. Assuming this conclusion is credible, we have

$$\frac{\beta_t^A I_A(t)}{\beta_t^A I_A(t) + \beta_t^P I_P(t)} = 0.59 \tag{11}$$

at any time point. (11) indicates that 59% of daily new infections are caused by active unconfirmed asymptomatic patients. On the other hand, because of (3)(a), we get

$$\frac{\left(\beta_t^A I_A(t) + \beta_t^P I_P(t)\right) S(t)}{N} = \lambda\left(R_A(t) + R_Q(t)\right) - \Delta S(t). \tag{12}$$

Through (11) and (12), the estimates of $\beta_t^A$ and $\beta_t^P$ are

$$\hat{\beta}_t^A = \frac{0.59\, N \left[\lambda\left(R_A(t) + R_Q(t)\right) - \Delta S(t)\right]}{S(t)\, I_A(t)}, \qquad \hat{\beta}_t^P = \frac{0.41\, N \left[\lambda\left(R_A(t) + R_Q(t)\right) - \Delta S(t)\right]}{S(t)\, I_P(t)} \tag{13}$$

for given $I_A(t)$, $I_P(t)$, and $R_A(t)$ on day $t$, where $\Delta S(t) = S(t+1) - S(t)$ and the estimates of $\Delta S(t)$ can be obtained via the first-order difference of (9).

Let $\Delta \tilde{C}(t)$ be the increment of reported confirmed cases on day $t$. Since $\Delta \tilde{C}(t)$ consists of people coming from compartments $I_A$ and $I_P$, we have

$$\Delta \tilde{C}(t) = \gamma_t I_A(t) + \alpha I_P(t) \tag{14}$$

according to (3)(d). So the estimate of $\gamma_t$ is

$$\hat{\gamma}_t = \frac{\Delta \tilde{C}(t) - \alpha I_P(t)}{I_A(t)} \tag{15}$$

for given $I_A(t)$ and $I_P(t)$ on day $t$, according to (14).

Now we estimate the time series $I_A(t)$, $I_P(t)$ and $R_A(t)$. Based on (3)(b), (3)(c), (3)(e), (6), (9), (13), (15), we sequentially obtain three iterative formulas

$$
\begin{aligned}
I_A(t+1) &= \left(1 - \hat{\gamma}_t - \hat{\phi}_t\right) I_A(t) + (1-\theta)\left[\lambda\left(R_A(t) + R_Q(t)\right) - \Delta S(t)\right],\\
I_P(t+1) &= (1-\alpha)\, I_P(t) + \theta\left[\lambda\left(R_A(t) + R_Q(t)\right) - \Delta S(t)\right],\\
R_A(t+1) &= (1-\lambda)\, R_A(t) + \hat{\phi}_t I_A(t).
\end{aligned}
\tag{16}
$$

In fact, we only need to know the constant $\theta$ and the initial values of $I_A$, $I_P$, and $R_A$. After that, the time series $I_A(t)$, $I_P(t)$, $R_A(t)$ can be estimated by iterating formulas (16).

Denote $U(t) = I_A(t) + I_P(t) + R_A(t)$; the specific values of the time series $U(t)$ can then be obtained by direct calculation according to (8). We design a programming problem to estimate the optimal values of the decision variables, namely $\theta$ and the initial values of $I_A$, $I_P$, and $R_A$. A natural objective function is to minimize the mean absolute percentage error (MAPE), i.e.,

$$\min\ \mathrm{MAPE} = \frac{1}{|T|} \sum_{t \in T} \frac{\left|\hat{U}(t) - U(t)\right|}{U(t)}, \tag{17}$$

where $T$ is the selected time period, and $\hat{U}(t) = \hat{I}_A(t) + \hat{I}_P(t) + \hat{R}_A(t)$ is computed from the iterates of (16).

On the other hand, existing testing records show that the proportion of asymptomatic cases among confirmed cases almost always fluctuates between 40% and 60%. In [37], [38], [39], these proportions in the US were 56%, 56.5%, and 40.2%, respectively. A meta-analysis from Peking University shows that this proportion in North America should be 46.32% [40]. In view of this, we want the proportion of asymptomatic cases among newly confirmed cases, $\gamma_t I_A(t) / \Delta \tilde{C}(t)$, to fall within the range [0.4, 0.6] as much as possible. To quantify this fuzzy goal, a piecewise linear membership function is designed. When the proportion falls into the interval [0.4, 0.6], its membership reaches the maximum value of 1. When the proportion falls outside this interval, its membership decreases linearly to 0 as it moves away from 0.4 (or 0.6). It is worth mentioning that this function can also be regarded as the membership function of a trapezoidal fuzzy number in fuzzy mathematics. By means of this membership function, the second objective, referred to as (18), is to maximize the membership degree of the asymptomatic proportion over the selected period.

Although the units of measurement of objectives (17) and (18) are not consistent, we transform the optimization problem into a single-objective problem. To this end, another membership function, for the absolute percentage error (APE), is proposed so that (17) takes a form consistent with (18): the smaller the APE, the higher the corresponding membership degree. With it, (17) is transformed into a membership-maximizing objective, referred to as (19). Through these two membership functions, objectives (18) and (19) are connected, and the final objective is to optimize their sum. The resulting programming problem is referred to as (20).

In programming problem (20), the objective function is to maximize the membership degrees. The first constraint is used to reduce the number of decision variables and narrow the solution space. The second to fifth constraints are the formula descriptions of the relevant notation. Note that the term $R_Q(t)$ in the second constraint needs to be obtained in advance through (4), and the values of $I_A(t)$, $I_P(t)$ and $R_A(t)$ are obtained by applying (16). The sixth constraint bounds the number of people in compartment $I_P$ from above over a given period. Indeed, if there were massive numbers of individuals in compartment $I_P$, they would develop symptoms after the incubation period and seek medical treatment to be diagnosed; the cumulative number of infected cases estimated from seroprevalence data would then be close to the reported cumulative number of confirmed cases. Because there is in fact a huge gulf between the two, this paper establishes the sixth constraint. Finally, the remaining constraints require that the related parameters have realistic meaning.

To solve programming problem (20), this paper develops an improved pattern search (IPS) algorithm, which belongs to the family of derivative-free optimization algorithms. Although the conventional pattern search algorithm has advantages such as being easy to implement, quite flexible, and insensitive to the choice of the initial point, it is only suitable for unconstrained optimization [41], [42], [43]. We therefore improve the pattern search algorithm: once a probe point does not satisfy the constraints, the search step size is halved until the point falls into the feasible region, and then the moving direction is adjusted for the next search. The proposed IPS algorithm thus applies more widely, to constrained optimization problems whose derivatives are hard to obtain. The detailed implementation of the IPS algorithm is given in Algorithm 1.

Lines 2 through 19 in Algorithm 1 represent the exploratory move, which optimizes along the coordinate search directions in turn. Lines 22 through 25 are the pattern move, which optimizes along the direction from the previous base point to the new one. The exploratory and pattern moves form the core of the IPS algorithm. Lines 5 through 7, 12 through 14, and 23 through 25 describe how the IPS algorithm handles constraints, which is the key difference from the traditional pattern search algorithm. Lines 28 and 29 describe the termination condition of the IPS algorithm.

This section concludes with a summary of the complete estimation process: (i) iterate (4) to obtain $R_Q(t)$ under a given $\lambda$ and the reported new cured numbers $\Delta \tilde{R}_Q(t)$; (ii) calculate (8) to obtain $U(t)$; (iii) calculate (9) to obtain $S(t)$; (iv) estimate $\phi_t$ via (6); (v) estimate $\eta_t$ via (10); (vi) solve programming problem (20) with the IPS algorithm to obtain the estimates of $\theta$, $\beta_t^A$, $\beta_t^P$, $\gamma_t$, $I_A(t)$, $I_P(t)$, and $R_A(t)$ during the optimization process. The first five steps belong to the first stage, and the last step belongs to the second stage.
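The constraint-handling idea described above (halve the step until the probe point is feasible) can be illustrated with a generic Hooke-Jeeves style search. This is a sketch under assumptions, not a transcription of Algorithm 1: the function `pattern_search`, its parameters, and the toy objective and constraints are all hypothetical.

```python
def pattern_search(f, feasible, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Maximize f over points where feasible(x) is True.

    Exploratory moves probe each coordinate direction; a pattern move then
    extrapolates along the improving direction. Infeasible probes are pulled
    back by halving the step until they become feasible.
    """
    def probe(base, direction, h):
        # Halve h until base + h * direction is feasible (or h is negligible).
        while h > tol:
            cand = [b + h * d for b, d in zip(base, direction)]
            if feasible(cand):
                return cand
            h /= 2
        return None

    x, fx = list(x0), f(x0)
    n = len(x)
    for _ in range(max_iter):
        if step <= tol:
            break  # termination: the search step has become negligible
        # Exploratory move around the current base point.
        y, fy = list(x), fx
        for i in range(n):
            for sign in (1.0, -1.0):
                d = [0.0] * n
                d[i] = sign
                cand = probe(y, d, step)
                if cand is not None and f(cand) > fy:
                    y, fy = cand, f(cand)
                    break
        if fy > fx:
            # Pattern move along the direction from the old to the new base point.
            direction = [yi - xi for yi, xi in zip(y, x)]
            x, fx = y, fy
            cand = probe(x, direction, 1.0)
            if cand is not None and f(cand) > fx:
                x, fx = cand, f(cand)
        else:
            step /= 2  # no improvement: refine the mesh
    return x, fx


# Toy problem: maximize a concave function inside a box.
best_x, best_f = pattern_search(
    f=lambda v: -(v[0] - 1.0) ** 2 - (v[1] - 2.0) ** 2,
    feasible=lambda v: 0.0 <= v[0] <= 0.5 and 0.0 <= v[1] <= 3.0,
    x0=[0.0, 0.0],
)
```

Coordinate-wise probing works well for box-like feasible regions such as this toy example; with oblique constraints a search of this kind can stall on the boundary, so results should be checked from several starting points.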
Experiment
From the beginning of the COVID-19 outbreak in New York and California in early 2020, the virus continued to ravage the entire US, placing a serious burden on the US economy and society. The difficulty of diagnosing asymptomatic patients is an essential reason why the epidemic has been hard to contain. In this section, we attempt to track the real number of asymptomatic infections with the help of the vSIRS model and its two-stage estimation approach. We first introduce the original data required by the model, including serological data, daily reported data, and other information, together with the corresponding data preprocessing methods. Subsequently, we discuss the model results and analyze the changing trend of the epidemic in the US.
Data
Our time series data on reported cumulative confirmed cases, reported cumulative recovered cases, and total deaths for the US are from Wind Quant,³ a professional financial quantitative platform in China. We smooth these data with the Gaussian kernel smoothing method using a bandwidth of 7, which helps correct deviations in the reported statistics. Appendix B supplements the introduction of the Gaussian kernel smoothing method [44]. Our subsequent parameter identification for the vSIRS model is based on the smoothed data. The US CDC began releasing estimated domestic seroprevalence every half month on 15 August 2020.⁴ According to the CDC, “these percentages do not include people who have been vaccinated against SARS-CoV-2 and have no history of infection”, and the data originate from a nationwide antibody seroprevalence survey. Cubic spline interpolation is applied to fill in the seroprevalence for each day. The total US population is taken from the Our World in Data repository on GitHub.⁵
Finally, the time period selected in this paper is from 15 August 2020 to 12 December 2020. During these 120 days, the number of fully vaccinated individuals had not yet begun to be recorded, because COVID-19 vaccines were only just being rolled out to the American public in December 2020 and early vaccine manufacturing capacity was limited. Another important reason was vaccine hesitancy: even when vaccines were available, a significant portion of Americans refused vaccination because of safety concerns and other factors [45], [46]. Therefore, we ignore the extremely small vaccinated population in the selected period; otherwise, a new removed compartment for vaccinated people would have to be added to the vSIRS model.
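The two preprocessing steps described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the smoother is written in the Nadaraya-Watson form with the bandwidth measured in days (Appendix B gives the authors' own definition), the spline uses natural boundary conditions (the text does not state the boundary conditions), and the survey days and seroprevalence values in the usage lines are hypothetical.

```python
import bisect
import math


def gaussian_kernel_smooth(values, bandwidth=7.0):
    """Nadaraya-Watson smoother with a Gaussian kernel on a daily series."""
    n = len(values)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * y for w, y in zip(weights, values)) / total)
    return smoothed


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1  # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; both ends stay zero
    if n >= 2:
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        # Thomas algorithm for the tridiagonal system of the interior knots.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]] that contains x.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] - m[i] * h[i] ** 2 / 6.0) * a / h[i]
                + (ys[i + 1] - m[i + 1] * h[i] ** 2 / 6.0) * b / h[i])

    return evaluate


# Hypothetical semi-monthly survey days and seroprevalence fractions.
survey_days = [0, 15, 30, 45, 60]
survey_values = [0.080, 0.085, 0.092, 0.101, 0.113]
spline = natural_cubic_spline(survey_days, survey_values)
daily_seroprevalence = [spline(day) for day in range(61)]

# Hypothetical noisy daily counts, smoothed with a 7-day bandwidth.
smoothed_counts = gaussian_kernel_smooth([100, 140, 90, 160, 120, 180, 150])
```

A library routine such as SciPy's `CubicSpline` would serve the same purpose; the hand-written version is used here only to keep the sketch free of dependencies.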
Result and discussion
The detailed input information for Algorithm 1 is shown in Table 3. The algorithm was implemented in Python in JupyterLab, and the running time was 16.9 s on the author’s laptop with an AMD Ryzen 7 CPU and 16 GB of RAM, which indicates that the IPS algorithm is acceptably efficient. The final parameter estimation results respectively are , , and . It can be seen that the estimation of is relatively small, which illustrates that more than half of newly infected people are asymptomatic, whereas less than a quarter of newly infected persons are pre-symptomatic cases, during each round of infection. The maximum objective value obtained via optimization is 218.37985. The estimates of all parameters in the vSIRS model can then be obtained. After that, we substitute the estimated coefficients and the initial values of each compartment into (3) and solve the discretized ODEs again; the error analysis of the resulting simulated data is shown in Appendix C.
Table 3
The input information of Algorithm 1.
Input | Value
The COVID-19 data | See Section 4.1
The starting probe point X0 | (5000000, 100000, 0.25)
The starting step vector δ0 | (M(0)/4, ΔC(0)/0.33/4, 0.05)
The accelerating coefficient v | 1.4
The shrinking coefficient τ | 0.3
The tolerance ϵ | 10^−5
Thereafter, the estimation results of the time-varying coefficients in the vSIRS model are analyzed. Fig. 2 depicts the trends of the recovery rate and the death rate. The recovery rate showed an overall upward trend with the continuous improvement of the treatment level for COVID-19. Unlike the recovery rate, the death rate first decreased and subsequently rose. As a matter of fact, winter itself is a high-incidence period of the pandemic: the virus reproduces and survives more readily at low temperatures, which makes outbreaks more likely [47], [48], [49]. Another important reason for the rebound of the epidemic was the holiday season.⁶ With Thanksgiving and Christmas approaching in November and December, an increasing number of people unconsciously relaxed their anti-epidemic measures, which gave the virus an opening. As shown in the first two subgraphs of Fig. 3, the infection rate of pre-symptomatic people was dozens of times that of active asymptomatic people. maintained a steady fluctuation, but began to rise from early October 2020. The last subgraph of Fig. 3 shows how the diagnosis rate for active asymptomatic infected people changed over time. The diagnosis rate peaked at just 1.5% in mid-November. Afterwards, however, it was diluted by limited testing and the continued growth of active asymptomatic cases.
Fig. 2
The recovery rate and the death rate, from 15 August 2020 to 11 December 2020.
Fig. 3
The infection rate for active unconfirmed asymptomatic individuals, the infection rate for active unconfirmed pre-symptomatic individuals, and the diagnostic rate for active asymptomatic individuals, from 15 August 2020 to 11 December 2020.
As we mentioned in Section 1, the true number of infections in the US may far exceed the reported cumulative numbers; numerous infected individuals were evidently underreported. With the help of seroprevalence, the true picture of the daily epidemic development in the US can be restored. Consequently, in what follows, we compare the reported COVID-19 data on infection and recovery with the real data estimated by our model.
First, consider the infection data. Fig. 4 compares the reported daily new cases with the true daily new cases, and the light green fill in Fig. 4 represents the underreported cases per day. Early October can be regarded as a turning point. The real number of new infections per day fluctuated steadily before early October 2020. During this period, although daily new infections were indeed underreported for a variety of reasons, the number of underreported cases per day remained at a steady level. We even find that the number of new cases reported slightly exceeded the real number of new infections on some days in early October; that is, previously underreported cases were detected in large numbers, indicating that the speed of nucleic acid testing had caught up with the speed of infection. After early October, however, owing to the drop in temperature and the holiday season mentioned earlier, the number of true daily new infections began to rise rapidly. The level of testing during that period again fell behind the spread of the virus, which allowed undiagnosed active cases to grow rapidly and created a vicious circle.
Subsequently, Fig. 5 portrays the daily numbers of reported active cases and true active cases in the given period. Because the reported active cases are all confirmed, the light green fill in Fig. 5 represents the unconfirmed active cases. The real number of active infected cases was much higher than the reported number of confirmed active ones. Moreover, a considerable number of active cases had not been diagnosed and isolated in time, which was a significant reason for the volatility of the COVID-19 pandemic. Fig. 6 further describes the changes in the active unconfirmed asymptomatic cases, the active unconfirmed pre-symptomatic cases, and the active confirmed cases. It can be seen that people who would develop symptoms accounted for a small portion of the active cases. Yet vigilance about these individuals was still needed, as they were more contagious than active asymptomatic cases. In addition, testing coverage was severely inadequate, leaving many more active patients hidden in the population as asymptomatic cases. We also observe from Fig. 6 that the numbers of unconfirmed active asymptomatic and pre-symptomatic infected persons both began to show an upward tendency during the winter months. The unconfirmed active asymptomatic cases in particular reached a scale of millions, which posed a serious challenge to the large-scale nucleic acid testing capacity of the American medical system. At the same time, the US government urgently needed to strengthen epidemic prevention measures to control the spread of SARS-CoV-2 at its source.
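The comparison behind Fig. 4 amounts to simple arithmetic, sketched below under stated assumptions. The helper names and all numbers are hypothetical, and the true daily new cases are taken as given (in the paper they come from the fitted vSIRS model). Daily new reported cases are obtained as first differences of the cumulative series, which would be consistent with the daily series in the figures starting one day later (16 August) than the cumulative ones (15 August). A negative gap marks a day on which reported cases exceed the estimated true cases, as observed for some days in early October.

```python
def daily_new(cumulative):
    """First differences of a cumulative series: new[t] = cum[t] - cum[t - 1]."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def underreporting_gap(true_new, reported_new):
    """Estimated true daily new cases minus reported daily new cases."""
    return [t - r for t, r in zip(true_new, reported_new)]


# Hypothetical numbers for illustration only.
reported_cumulative = [5_000, 5_040, 5_090, 5_150]
true_new = [90, 45, 120]

reported_new = daily_new(reported_cumulative)
gap = underreporting_gap(true_new, reported_new)
# Days (counted from 1) on which reported cases exceed the estimated true cases.
catch_up_days = [day for day, g in enumerate(gap, start=1) if g < 0]
```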
Fig. 4
The reported daily new confirmed coronavirus infections and the real daily new infected coronavirus individuals, from 16 August 2020 to 12 December 2020.
Fig. 5
The reported active cases and the real active cases, from 15 August 2020 to 12 December 2020.
Fig. 6
Stacked chart of active infected individuals, comprising the active unconfirmed asymptomatic infected cases, the active pre-symptomatic infected but unconfirmed cases, and the active confirmed infected cases, from 15 August 2020 to 12 December 2020.
Then consider the recovery data. Fig. 7 compares the reported daily new cured cases with the real daily recovered cases, and the light green fill in Fig. 7 represents the underreported new recovered individuals per day. We find that many asymptomatic infected cases recovered every day without being recorded.
Fig. 7
The reported daily new cured cases and the real daily recovered cases, from 16 August 2020 to 12 December 2020.
Additionally, as shown in Fig. 8, a large portion of the American population had been infected with SARS-CoV-2 but had recovered and acquired antibodies. Fig. 8 also presents the reported cumulative recovered cases and those among them still retaining antibodies. The light green fill in Fig. 8 denotes the individuals among the reported cured cases who had lost antibody protection.
Fig. 8
The reported cumulative recovered cases, people with antibodies among the reported cured cases, and individuals with antibodies among the true recovered cases, from 15 August 2020 to 12 December 2020.
Finally, as shown in Fig. 9, the effective reproduction number fluctuated steadily. In spite of , the epidemic situation in the US was still not optimistic given the enormous number of active infected people and various uncertainties. It is suggested that the US promote vaccination to reduce the susceptible population, expand the testing range to improve the diagnosis rate, strictly enforce epidemic prevention measures to decrease the infection rates, and enhance the treatment level to increase the recovery rate in the future. Through these strategies, the effective reproduction number can be further reduced.
Fig. 9
The effective reproduction number, from 15 August 2020 to 12 December 2020.
Conclusion
This paper presents a dynamic model to simulate the spread of COVID-19. It fully takes into account the unfortunate reality that large numbers of asymptomatic infected persons in the US are not confirmed. For the vSIRS model, we focus on two main problems that have received little attention in the existing literature: the time-varying coefficients and the choice of initial values. Active asymptomatic case numbers and the time-varying coefficients can therefore be estimated through a novel two-stage approach that introduces seroprevalence information, which is the principal contribution of this paper. Additionally, an IPS algorithm is developed to solve the programming problem in the proposed estimation method. The vSIRS model provides a detailed description of the characteristics of virus transmission and of American diagnostic efficiency for COVID-19, and it can track the spread of the epidemic. The modeling analyses indicate that millions of people were in a state of active unconfirmed asymptomatic infection during the second half of 2020. We recommend that the US government expand testing coverage and do its best to isolate and treat people who are in the infectious period, to avoid triggering a new wave of the epidemic.
In the future, vaccination, including one dose, two doses, and a booster, can be incorporated to analyze how the vaccination campaign affects the trend of asymptomatic infection numbers and the course of COVID-19. In addition, the statistical lag of reported epidemic data is a worthwhile research direction. For example, The Atlantic reported that a massive backlog of nucleic acid test samples led to a serious lag in updating reported epidemic data in the US, and that this situation was particularly severe in California.⁷ Therefore, how to combine serological data to estimate the true number of unconfirmed active cases in the context of lagging reported COVID-19 data will become a new challenge. Finally, we sincerely appeal to other countries to actively disclose seroprevalence data as well, so as to reflect the epidemic situation in their own countries more realistically.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Anne Kimball; Kelly M Hatfield; Melissa Arons; Allison James; Joanne Taylor; Kevin Spicer; Ana C Bardossy; Lisa P Oakley; Sukarma Tanwar; Zeshan Chisty; Jeneita M Bell; Mark Methner; Josh Harney; Jesica R Jacobs; Christina M Carlson; Heather P McLaughlin; Nimalie Stone; Shauna Clark; Claire Brostrom-Smith; Libby C Page; Meagan Kay; James Lewis; Denny Russell; Brian Hiatt; Jessica Gant; Jeffrey S Duchin; Thomas A Clark; Margaret A Honein; Sujan C Reddy; John A Jernigan Journal: MMWR Morb Mortal Wkly Rep Date: 2020-04-03 Impact factor: 17.586