
Identifying the measurements required to estimate rates of COVID-19 transmission, infection, and detection, using variational data assimilation.

Eve Armstrong¹,², Manuela Runge³, Jaline Gerardin³.

Abstract

We demonstrate the ability of statistical data assimilation (SDA) to identify the measurements required for accurate state and parameter estimation in an epidemiological model for the novel coronavirus disease COVID-19. Our context is an effort to inform policy regarding social behavior, to mitigate strain on hospital capacity. The model unknowns are taken to be: the time-varying transmission rate, the fraction of exposed cases that require hospitalization, and the time-varying detection probabilities of new asymptomatic and symptomatic cases. In simulations, we obtain estimates of undetected (that is, unmeasured) infectious populations, by measuring the detected cases together with the recovered and dead - and without assumed knowledge of the detection rates. Given a noiseless measurement of the recovered population, excellent estimates of all quantities are obtained using a temporal baseline of 101 days, with the exception of the time-varying transmission rate at times prior to the implementation of social distancing. With low noise added to the recovered population, accurate state estimates require a lengthening of the temporal baseline of measurements. Estimates of all parameters are sensitive to the contamination, highlighting the need for accurate and uniform methods of reporting. The aim of this paper is to exemplify the power of SDA to determine what properties of measurements will yield estimates of unknown parameters to a desired precision, in a model with the complexity required to capture important features of the COVID-19 pandemic.


Keywords: COVID-19; Epidemiology; Inference; Measurement noise

Year: 2020 PMID: 33163738 PMCID: PMC7605798 DOI: 10.1016/j.idm.2020.10.010

Source DB: PubMed Journal: Infect Dis Model ISSN: 2468-0427

Introduction

The coronavirus disease 2019 (COVID-19) is burdening health care systems worldwide, threatening physical and psychological health, and devastating the global economy. Individual countries and states are tasked with balancing population-level mitigation measures with maintaining economic activity. Mathematical modeling has been used to aid policymakers’ plans for hospital capacity needs, and to understand the minimum criteria for effective contact tracing (Murray et al. 2020). Both state-level decision-making and accurate modeling benefit from quality surveillance data. Insufficient testing capacity, however, especially at the beginning of the epidemic in the United States, and other data reporting issues have meant that surveillance data on COVID-19 is biased and incomplete (Heggeness, 2020; Li et al., 2020a; Weinberger et al., 2020). Models intended to guide intervention policy must be able to handle imperfect data.
Within this context, we seek a means to quantify what data must be recorded in order to estimate specific unknown quantities in an epidemiological model of COVID-19 transmission. These unknown quantities are: i) the transmission rate, ii) the fraction of the exposed population that acquires symptoms sufficiently severe to require hospitalization, and iii) time-varying detection probabilities of asymptomatic and symptomatic cases. In this paper, we demonstrate the ability of statistical data assimilation (SDA) to quantify the accuracy to which these parameters can be estimated, given certain properties of the data including noise level.
SDA is an inverse formulation (Tarantola, 2005): a machine learning approach designed to optimally combine a model with data. Invented for numerical weather prediction (An et al. 2017; Betts, 2010; Evensen, 2009; Kalnay, 2003; Kimura, 2002; Whartenby et al., 2013), and more recently applied to biological neuron models (Armstrong, 2020; Meliza et al., 2014; Hamilton et al., 2013; Kostuk et al., 2012; Nogaret et al., 2016; Schiff, 2009; Toth et al., 2011), SDA offers a systematic means to identify the measurements required to estimate unknown model parameters to a desired precision. Data assimilation has been presented as a means for general epidemiological forecasting (Bettencourt, Ribeiro, Chowell, Lant, & Castillo-Chavez, 2007), and one work has examined variational data assimilation specifically - the method we employ in this paper - for estimating parameters in epidemiological models (Rhodes & Hollingsworth, 2009). Related Bayesian frameworks for estimating unknown properties of epidemiological models have also been explored (Bettencourt & Ribeiro, 2008; Cobb et al., 2014). To date, there have been two employments of SDA for COVID-19 specifically. Ref (Sesterhenn, 2020) used a simple SIR (susceptible/infected/recovered) model, and Ref (Nadler et al., 2020) expanded the SIR model to include a compartment of patients in treatment. Another study has used a Bayesian inference framework to examine a fully stochastic epidemiological model, with relevance to COVID-19 (Li et al., 2020b).
Two features of our work distinguish this paper as novel. First, we expand the model in terms of the number of compartments. The aim here is to capture key epidemiological and public health intervention features of COVID-19 such that the model structure is relevant for questions from policymakers on containing the pandemic. These features are: i) asymptomatic, presymptomatic, and symptomatic populations, ii) undetected and detected cases, and iii) two hospitalized populations: those who do and do not require critical care. For our motivations for these choices, see Model.
Second, we employ SDA for the specific purpose of examining the sensitivity of estimates of time-varying parameters to various properties of the measurements, including the degree of noise (or error) added. Moreover, we aim to demonstrate the power and versatility of the SDA technique to explore what is required of measurements to complete a model with a dimension sufficiently high to capture the policy-relevant complexities of COVID-19 transmission and containment - an examination that has not previously been done. To this end, we sought to estimate the parameters noted above, using simulated data representing a metropolitan-area population loosely based on New York City. We examined the sensitivity of estimations to: i) the subpopulations that were sampled, ii) the temporal baseline of sampling, and iii) uncertainty in the sampling. Results using simulated data are threefold. First, reasonable estimations of time-varying detection probabilities require the reporting of new detected cases (asymptomatic and symptomatic), dead, and recovered. Second, given noiseless measurements, a temporal baseline of 101 days is sufficient for the SDA procedure to capture the general trends in the evolution of the model populations, the detection probabilities, and the time-varying transmission rate following the implementation of social distancing. Importantly, the information contained in the measured detected populations propagates successfully to the estimation of the numbers of severe undetected cases. Third, the state evolution - and importantly the populations requiring inpatient care - tolerates low (~ five percent) noise, given a doubling of the temporal baseline of measurements; the parameter estimates do not tolerate this contamination. Finally, we discuss necessary modifications prior to testing with real data, including lowering the sensitivity of parameter estimates to noise in data.

Model

The model is written in 22 state variables, each representing a subpopulation of people; the total population is conserved. Fig. 1 shows a schematic of the model structure. Each member of a Population S that becomes Exposed (E) ultimately reaches either the Recovered (R) or Dead (D) state. Absent additive noise, the model is deterministic. Five variables correspond to measured quantities in the inference experiments.

Fig. 1

Schematic of the model. Each rectangle represents a population. Note the distinction of asymptomatic cases, undetected cases, and the two tiers of hospitalized care: H and C. The aim of including this degree of resolution is to inform policy on social behavior so as to minimize strain on hospital capacity. The red ovals indicate the variables that correspond to measured quantities in the inference experiments.

As noted, the model is written with the aim to inform policy on social behavior and contact tracing so as to avoid exceeding hospital capacity. To this end, the model resolves asymptomatic-versus-symptomatic cases, undetected-versus-detected cases, and the two tiers of hospital needs: the general (inpatient, non-intensive care unit (ICU)) H versus the critical care (ICU) C populations.
The resolution of asymptomatic versus symptomatic cases was motivated by an interest in what interventions are necessary to control the epidemic. For example, is it sufficient to focus only on symptomatic individuals, or must we also target and address asymptomatic individuals who may not even realize they are infected?
The detected and undetected populations exist for two reasons. First, we seek to account for underreporting of cases and deaths. Second, we desire a model structure that can simulate the impact of increasing detection rates on disease transmission, including the impact of contact tracing. Thus the model was structured from the beginning so that we might examine the effects of interventions that were imposed later on. The ultimate aim here is to inform policy on the requirements for containing the epidemic.
We included both H and C populations because hospital inpatient and ICU bed capacities are the key health system metrics that we aim to avoid straining. Any policy that we consider must include predictions on inpatient and ICU bed needs. Preparing for those needs is a key response if or when the epidemic grows uncontrolled.
For details of the model, including the differential equations describing the mass action between susceptible and infectious individuals and the disease progression through different sub-populations, see Appendix A.

Method

General inference formulation

SDA is an inference procedure, or a type of machine learning, in which a model dynamical system is assumed to underlie any measured quantities. This model F can be written as a set of D ordinary differential equations that evolve in some parameterization t as:
$$\frac{\mathrm{d}x_a(t)}{\mathrm{d}t} = F_a(\mathbf{x}(t), \mathbf{p}(t)); \quad a = 1, 2, \ldots, D,$$
where the components x_a of the vector x are the model state variables, and unknown parameters to be estimated are contained in p(t). A subset L of the D state variables is associated with measured quantities. One seeks to estimate the p unknown parameters and the evolution of all state variables that is consistent with the L measurements.
A prerequisite for estimation using real data is the design of simulated experiments, wherein the true values of parameters are known. In addition to providing a consistency check, simulated experiments offer the opportunity to ascertain which and how few experimental measurements, in principle, are necessary and sufficient to complete a model.
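As a rough illustration of a simulated ("twin") experiment, the sketch below integrates a toy model forward and records a subset of its state variables as measurements. It is a sketch under stated assumptions, not the model of this paper: the three-compartment SIR vector field, the recovery rate `gamma`, the forward-Euler scheme, and the initial counts are all hypothetical stand-ins chosen for brevity.

```python
def sir_rhs(x, k):
    """Toy vector field F(x, p): a 3-compartment SIR stand-in (assumption)."""
    s, i, r = x
    n = s + i + r
    gamma = 0.1  # hypothetical recovery rate, per day
    new_inf = k * s * i / n  # mass action between susceptible and infectious
    return [-new_inf, new_inf - gamma * i, gamma * i]


def integrate(x0, k_of_t, n_steps, dt=1.0):
    """Forward-Euler integration; returns the path x(0), ..., x(n_steps)."""
    path = [list(x0)]
    for n in range(n_steps):
        x = path[-1]
        dx = sir_rhs(x, k_of_t(n * dt))
        path.append([xa + dt * dxa for xa, dxa in zip(x, dx)])
    return path


# Hypothetical initial condition and a constant transmission rate.
path = integrate([9_000_000 - 62, 62, 0], lambda t: 0.3, n_steps=100)

# Measure only R: L = 1 of the D = 3 state variables.
measured = [x[2] for x in path]
```

Here `path` plays the role of the "true" state evolution, and `measured` is the data handed to the inference procedure, which must then recover the unmeasured variables and the parameter.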

Optimization framework

SDA can be formulated as an optimization, wherein a cost function is extremized. We take this approach, and write the cost function in two terms: 1) one term representing the difference between state estimate and measurement (measurement error), and 2) a term representing model error. It will be shown below in this Section that treating the model error as finite offers a means to identify whether a solution has been found within a particular region of parameter space. This is a non-trivial problem, as any nonlinear model will render the cost function non-convex. We search the surface of the cost function via the variational method, and we employ a method of annealing to identify a lowest minimum - a procedure that has been referred to loosely in the literature as variational annealing (VA).
The cost function A0 used in this paper is written as:
$$A_0 = \sum_{j \in J} \sum_{l=1}^{L} \frac{R_m^{l}}{2} \left( y_l(j) - x_l(j) \right)^2 + \sum_{n=0}^{N-1} \sum_{a=1}^{D} \frac{R_f^{a}}{2} \left( x_a(n+1) - f_a(\mathbf{x}(n)) \right)^2 \qquad (1)$$
One seeks the path X0 = {x(0), …, x(N), p(0), …, p(N)} in state space on which A0 attains a minimum value. Note that Equation (1) is shorthand; for the full form, see Appendix A of Ref (Armstrong, 2020). For a derivation - beginning with the physical Action of a particle in state space - see Ref (Abarbanel, 2013).
The first squared term of Equation (1) governs the transfer of information from measurements y_l to model states x_l. The summation on j runs over all discretized timepoints J at which measurements are made, which may be a subset of all integrated model timepoints. The summation on l is taken over all L measured quantities. The second squared term of Equation (1) incorporates the model evolution of all D state variables x_a. The term f_a(x(n)) is the discretized form of the model equation for x_a (see the full form cited above). The outer sum on n is taken over all discretized timepoints of the model equations of motion. The sum on a is taken over all D state variables. Rm and Rf are inverse covariance matrices for the measurement and model errors, respectively. We take each matrix to be diagonal and treat them as relative weighting terms, whose utility will be described below in this Section.
The procedure searches a (D (N + 1) + p (N + 1))-dimensional state space, where D is the number of state variables, N is the number of discretized steps, and p is the number of unknown parameters. To perform simulated experiments, the equations of motion are integrated forward to yield simulated data, and the VA procedure is challenged to infer the parameters and the evolution of all state variables - measured and unmeasured - that generated the simulated data.
This specific formulation has been tested with chaotic models (Abarbanel et al., 2011; Rey et al., 2014; Ye et al., 2014, 2015), and used to estimate parameters in models of biological neurons (Armstrong, 2020; Meliza et al., 2014; Kadakia et al., 2016; Kostuk et al., 2012; Toth et al., 2011; Wang, Breen, Abraham, Abarbanel, & Cauwenberghs, 2016), as well as astrophysical scenarios (Armstrong et al. 2017).
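The two-term structure of the cost function described above (a measurement-error term plus a model-error term) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `cost`, the one-step map `step`, the scalar weights `r_m` and `r_f`, the factor of one half, and the toy decay model are all assumptions introduced here.

```python
def cost(path, params, data, measured_idx, step, r_m=1.0, r_f=1.0):
    """Shorthand cost: measurement error plus model error.

    path:         list of state vectors x(0), ..., x(N)
    params:       list of parameter values p(0), ..., p(N)
    data:         dict mapping a measured timepoint j to the list y_l(j)
    measured_idx: indices of the L measured state variables
    step:         hypothetical one-step map (x(n), p(n)) -> predicted x(n+1)
    """
    meas = 0.0
    for j, y in data.items():  # sum over measured timepoints
        for l, idx in enumerate(measured_idx):  # sum over measured variables
            meas += 0.5 * r_m * (y[l] - path[j][idx]) ** 2
    model = 0.0
    for n in range(len(path) - 1):  # sum over all model timepoints
        pred = step(path[n], params[n])
        for a in range(len(pred)):  # sum over all state variables
            model += 0.5 * r_f * (path[n + 1][a] - pred[a]) ** 2
    return meas + model


# Toy one-variable decay model: x(n+1) = x(n) * (1 - p).
decay = lambda x, p: [x[0] * (1.0 - p)]
true_path = [[100.0], [90.0], [81.0]]
data = {0: [100.0], 2: [81.0]}  # measurements at a subset of timepoints

cost_true = cost(true_path, [0.1, 0.1, 0.1], data, [0], decay)
cost_off = cost([[100.0], [95.0], [81.0]], [0.1, 0.1, 0.1], data, [0], decay)
```

A path that obeys the model and matches the data gives a cost of zero up to rounding (`cost_true`); a path that matches the data but violates the model at the unmeasured timepoint is penalized only through the model term (`cost_off`).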

Annealing to identify a solution on a non-convex cost function surface

Our model is nonlinear, and thus the cost function surface is non-convex. For this reason, we iterate - or anneal - in terms of the ratio of model and measurement error, with the aim to gradually freeze out a lowest minimum. This procedure was introduced in Ref (Ye et al., 2015), and has since been used in combination with variational optimization on nonlinear models in Refs (An et al. 2017; Armstrong, 2020; Armstrong et al. 2017; Kadakia et al., 2016).
The annealing works as follows. We first define the coefficient of measurement error Rm to be 1.0, and write the coefficient of model error Rf as:
$$R_f = R_{f,0}\, \alpha^{\beta},$$
where R_{f,0} is a small number near zero, α is a small number greater than 1.0, and β is initialized at zero. Parameter β is our annealing parameter. For the case in which β = 0, the cost function is relatively free from model constraints: its surface is smooth, and there exists one minimum of the variational problem that is consistent with the measurements. We obtain an estimate of that minimum. Then we increase the weight of the model term slightly, via an integer increment in β, and recalculate the cost. We do this recursively, toward the deterministic limit of Rf ≫ Rm. The aim is to remain sufficiently near to the lowest minimum to not become trapped in a local minimum as the surface becomes resolved. We will show in Results that a plot of the cost as a function of β reveals whether a solution has been found that is consistent with both measurements and model.
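The annealing loop can be sketched as follows, assuming the geometric schedule Rf = Rf,0 α^β implied by the description of α and β above. The values `rf0 = 0.01` and `alpha = 2.0`, the callback signature `minimize(x_prev, r_m, r_f)`, and the one-variable quadratic toy problem are hypothetical choices for illustration; a real application would replace `toy_minimize` with a variational optimizer that is warm-started from the previous estimate.

```python
def anneal(minimize, x_init, rf0=0.01, alpha=2.0, n_beta=30, r_m=1.0):
    """Step beta = 0, 1, ..., n_beta - 1, re-minimizing at each model weight."""
    x = x_init
    history = []
    for beta in range(n_beta):
        r_f = rf0 * alpha ** beta  # model-error weight grows with beta
        x, c = minimize(x, r_m, r_f)  # start from the previous estimate
        history.append((beta, r_f, c))
    return x, history


def toy_minimize(x_prev, r_m, r_f, y=1.0, m=0.0):
    """Closed-form minimizer of r_m/2 (y - x)^2 + r_f/2 (x - m)^2.

    Toy stand-in: y is a "measurement", m a "model prediction". Because the
    minimum is known analytically, the warm start x_prev is not needed here.
    """
    x = (r_m * y + r_f * m) / (r_m + r_f)
    c = 0.5 * r_m * (y - x) ** 2 + 0.5 * r_f * (x - m) ** 2
    return x, c


x_final, history = anneal(toy_minimize, x_init=1.0)
```

At β = 0 the estimate sits near the measurement; as β grows the model term dominates and the estimate is pulled toward the model prediction. The list `history` holds the cost at each β, which is the quantity plotted in a cost-versus-β figure.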

The experiments

Simulated experiments

We based our simulated locality loosely on New York City, with a population of 9 million. For simplicity, we assume a closed population. Simulations ran from an initial time t0 of four days prior to 2020 March 1, the date of the first reported COVID-19 case in New York City (New York Times, 2020). At time t0, there existed one detected symptomatic case within the population of 9 million. In addition to that one initial detected case, we took the initial conditions on the populations to be: 50 undetected asymptomatics, 10 undetected mild symptomatics, and one undetected severe symptomatic.
We chose five quantities as unknown parameters to be estimated (Table 1): 1) the time-varying transmission rate Ki(t); 2) the detection probability of mild symptomatic cases dSym(t); 3) the detection probability of severe symptomatic cases dSys(t); 4) the fraction of cases that become symptomatic fsympt; and 5) the fraction of symptomatic cases that become severe enough to require hospitalization fsevere. Here we summarize the key features that we sought to capture in modeling these parameters; for their mathematical formulations, see Appendix B.

Table 1

Parameter	Description
K_i(t)	Time-varying transmission rate
d_Sym(t)	Time-varying detection probability of mild symptomatics
d_Sys(t)	Time-varying detection probability of symptomatics requiring hospitalization
f_sympt	Fraction of positive cases that produce symptoms
f_severe	Fraction of symptomatics that are severe

Unknown parameters to be estimated. Ki, dSym, and dSys are taken to be time-varying. Parameters fsympt and fsevere are constant numbers, as they are assumed to reflect an intrinsic property of the disease. The detection probability of asymptomatic cases is taken to be known and zero.

The transmission rate Ki (often referred to as the effective contact rate) in a given population for a given infectious disease is measured in effective contacts per unit time. This may be expressed as the total contact rate multiplied by the risk of infection, given contact between an infectious and a susceptible individual. The contact rate, in turn, can be impacted by amendments to social behavior. As a first step in applying SDA to a high-dimensional epidemiological model, we chose to condense the significance of Ki into a relatively simple mathematical form. We assumed that Ki was constant prior to the implementation of a social-distancing mandate, which then effected a rapid transition of Ki to a lower constant value. Specifically, we modeled Ki as a smooth approximation to a Heaviside function that begins its decline on March 22, the date that the stay-at-home order took effect in New York City (NY Governor’s Office, 2020): 25 days after time t0. For further simplicity, we took Ki to reflect a single implementation of a social distancing protocol, and adherence to that protocol throughout the remaining temporal baseline of estimation.
Detection rates impact the sizes of the subpopulations entering hospitals, and their values are highly uncertain (Li et al., 2020a; Weinberger et al., 2020). Thus we took these quantities to be unknown, and - as detection methods will evolve - time-varying. We also optimistically assumed that the methods will improve, and thus we described them as increasing functions of time. We used smoothly-varying forms, the first linear and the second quadratic, to preclude symmetries in the model equations. Meanwhile, we took the detection probability for asymptomatic cases (dAs) to be known and zero, a reasonable reflection of the state of testing in that population during our study period.
Finally, we assigned as unknowns the fraction of cases that become symptomatic (fsympt) and fraction of symptomatic cases that become sufficiently severe to require hospitalization (fsevere), as these fractions possess high uncertainties (Refs (Oran and Topol, 2020) and (Salje et al., 2020), respectively). As they reflect an intrinsic property of the disease, we took them to be constants. All other model parameters were taken to be known and constant (Appendix A); however, the values of many other model parameters also possess significant uncertainties given the reported data, including, for example, the fraction of those hospitalized that require ICU care. Future VA experiments can treat these quantities as unknowns as well.
The simulated experiments are summarized in the schematic of Fig. 2. They were designed to probe the effects upon estimations of three considerations: a) the number of measured subpopulations, b) the temporal baseline of measurements, and c) contamination of measurements by noise. To this end, we designed a “base” experiment sufficient to yield an excellent solution, and then three variations on this experiment.
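The qualitative shapes of the time-varying parameters described above can be sketched as simple functions of time in days since t0. The actual mathematical forms are given in Appendix B and are not reproduced here; the tanh-based smooth step, the transition width, and every numerical value below are hypothetical choices made only to show a smoothed Heaviside-like drop, a linear increase, and a quadratic increase.

```python
import math


def k_i(t, k_high=1.2, k_low=0.3, t_switch=25.0, width=1.0):
    """Smooth step from k_high down to k_low, centered at t_switch (assumed form)."""
    return k_low + (k_high - k_low) * 0.5 * (1.0 - math.tanh((t - t_switch) / width))


def d_sym(t, d0=0.1, slope=0.002):
    """Linearly increasing detection probability, clipped to [0, 1] (assumed form)."""
    return min(1.0, max(0.0, d0 + slope * t))


def d_sys(t, d0=0.2, curv=0.00005):
    """Quadratically increasing detection probability, clipped to [0, 1] (assumed form)."""
    return min(1.0, max(0.0, d0 + curv * t * t))


days = range(101)  # a 101-day temporal baseline
k_series = [k_i(t) for t in days]
d_sym_series = [d_sym(t) for t in days]
d_sys_series = [d_sys(t) for t in days]
```

Giving the two detection probabilities different functional forms means they cannot be exchanged for one another in the model equations, which is the stated reason for choosing one linear and one quadratic form.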

Fig. 2

Schematic of the four simulated experiments.

The base experiment (denoted "i" in Fig. 2) possesses the following features: a) five measured populations: detected asymptomatic Asdet, detected mild symptomatic Symdet, detected severe symptomatic Sysdet, Recovered R, and Dead D; b) a temporal baseline of 101 days, beginning on 2020 February 26; c) no noise in measurements. The three variations on this base experiment (denoted "ii" through "iv" in Fig. 2) incorporate the following independent changes. In Experiment ii, the R population is not measured - an example designed to reflect the current situation in some localities (e.g. Refs (Li et al., 2020a; Weinberger et al., 2020)). Experiment iii includes a ~ five percent noise level (for the form of additive noise, see Appendix C) in the simulated R data, and Experiment iv includes that noise level in addition to a doubled temporal baseline. For each experiment, twenty independent calculations were initiated in parallel searches, each with a randomly-generated set of initial conditions on state variable and parameter values. For technical details of all experimental designs and implementation, see Appendix C.
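One way to contaminate a simulated measurement with roughly five percent noise is sketched below. The form of noise actually used is specified in Appendix C, which is not reproduced here, so this is only an assumption: zero-mean Gaussian noise whose standard deviation is a fixed fraction of the mean of the series. The function name `add_noise`, the seed, and the linear stand-in for the R time series are hypothetical.

```python
import random


def add_noise(series, level=0.05, seed=0):
    """Add zero-mean Gaussian noise with sigma = level * mean(series)."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    sigma = level * (sum(series) / len(series))
    return [y + rng.gauss(0.0, sigma) for y in series]


# Hypothetical stand-in for a cumulative R series over a 101-day baseline.
clean_r = [float(10 * t) for t in range(101)]
noisy_r = add_noise(clean_r)
```

With this choice the noise amplitude is constant in time, so early values of a cumulative series can be perturbed below zero; a multiplicative form would behave differently.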

Results

General findings

The salient results for the simulated experiments i through iv are as follows:
• i (base experiment): Excellent estimate of all - measured and unmeasured - state variables, and all parameters except for Ki(t) at times prior to the onset of social distancing;
• ii (absent a measurement of Population R): Poor estimate of all quantities;
• iii (~ 5% additive noise in R): Poor estimates of all quantities;
• iv (~ 5% additive noise in R, with a doubled baseline of 201 days): Estimates of state evolution are robust to noise, while parameter estimates are sensitive to noise.
Figures of the estimated time evolution of state variables and time-varying parameters are shown in their respective subsections, and the estimates of the static parameters are listed in Table 2.

Table 2

Estimates of static parameters fsympt and fsevere over all simulated experiments. The established values are taken from Refs (Oran and Topol, 2020) and (Salje et al., 2020). For Experiments i and iv, the reported numbers are taken from the annealing iteration with a value of parameter β of 32 and 40, respectively: once the deterministic limit has been reached (see text). For Experiment ii, an attempt was made to retrieve parameter estimates at β = 2; that is: before the solution grows unstable exponentially (see Fig. 5). See specific subsections for details of each experiment.

Experiment	f_sympt mean (established: 0.6)	f_sympt variance	f_severe mean (established: 0.07)	f_severe variance
i	0.59	2 × 10⁻⁴	0.07	4 × 10⁻⁶
ii	–	–	–	–
iii	–	–	–	–
iv	0.39	0.8	0.19	0.2

Fig. 5

Cost versus β for Experiment ii: R is not measured. As β increases, the cost increases indefinitely, indicating that no solution has been found that is consistent with both measurements and model dynamics.

Base experiment i

The base experiment that employed five noiseless measured populations over 101 days yielded an excellent solution in terms of model evolution and parameter estimates. Prior to examining the solution, we shall first show the cost function versus the annealing parameter β, as this distribution can serve as a tool for assessing the significance of a solution. Fig. 3 shows the evolution of the cost throughout annealing, for the twenty independent paths that were initiated; the x-axis shows the value of Annealing Parameter β, that is, the increasing rigidity of the model constraint. At the start of iterations, the cost function is mainly fitting the state estimates to the measurements, and its value begins to climb as the model penalty is gradually imposed. If the procedure finds a solution that is consistent not only with the measurements, but also with the model, then the cost will plateau. In Fig. 3, we see this happen, around β = 30, with some scatter across paths. The reported estimates in this Subsection are taken at a value of β of 32: on the plateau. The significance of this plateau will become clearer upon examining the contrasting case of Experiment ii.
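The plateau criterion described above is read off a plot in this paper, but a simple numerical check can serve the same purpose. The heuristic below is an assumption introduced for illustration, not part of the published procedure: it declares a plateau when the cost has changed by less than a relative tolerance over the last few annealing steps. The function name, `window`, and `rel_tol` are hypothetical.

```python
def has_plateau(costs, window=5, rel_tol=0.05):
    """True if the cost changed by at most rel_tol over the last `window` steps."""
    if len(costs) <= window:
        return False  # not enough annealing steps to judge
    old, new = costs[-window - 1], costs[-1]
    return abs(new - old) <= rel_tol * abs(new)


# Synthetic cost-versus-beta curves (illustrative only).
saturating = [1.0 - 0.5 ** beta for beta in range(40)]  # levels off
runaway = [2.0 ** beta for beta in range(40)]  # keeps growing

plateau_found = has_plateau(saturating)
runaway_found = has_plateau(runaway)
```

The saturating curve mimics a cost that levels off once the model constraint is satisfied, while the runaway curve mimics a cost that grows without bound as the model weight increases.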

Fig. 3

Cost function plotted at each annealing step β for the base experiment i, for twenty paths in state space, where β scales the rigidity of the imposed model constraint. At low β the procedure endeavours to fit the measured variables to the simulated measurements. As β increases, the cost increases until it approaches a plateau (around β = 30), indicating that a solution has been found that is consistent with both measurements and model.

Fig. 4

Estimates of the state - measured and unmeasured - variables, and the time-varying parameters Ki, dSym, and dSys, for the base experiment i. Excellent estimates are obtained of all states and parameters, except early values of Ki prior to the implementation of social distancing; see text. The dotted blue lines are the simulated data. Solid red, black, and green lines are SDA estimates of measured variables, unmeasured variables, and parameters, respectively. These conventions also hold for Figs. 6 and 7. Results are taken at a value for annealing parameter β of 32.

Fig. 6

Estimates for Experiment ii: without a measurement of Population R. This result is taken at β = 2, prior to the exponential runaway in the cost. Estimates of unmeasured states and time-varying parameters are poor.

Fig. 7

Estimates for Experiment iv: low noise added to Population R and with a doubled temporal baseline of 201 days. The noise added to R propagates to some unmeasured States (S, E, As, and Asdet), but the overall evolution is captured well. The noise precludes an estimate of the time-varying parameters (not shown). Results are reported using a value for β of 40.

We now examine the state and parameter estimates for the base experiment i. For all experiments, each solution shown is representative of the solution for all twenty paths. Fig. 4 shows an excellent estimate of all state variables during the temporal window in which the measured variables were sampled. For consistency in illustrating the time evolution of all state variables, we use the state estimates for the Recovered (R) and Dead (D) populations, which are cumulative, rather than follow standard epidemiological practice of showing incident R or D. The time-varying parameters are also estimated well, excepting Ki(t) at times prior to its steep decline. We noted no improvement in this estimate for Ki(t), following a tenfold increase in the temporal resolution of measurements (not shown). The procedure does appear to recognize that a fast transition in the value of Ki occurred at early times, and that its value was previously higher. It will be important to investigate the reason for this failure in the estimation of Ki at early times, to rule out numerical issues involved with the quickly-changing derivative.

Experiment ii: no measurement of R

Fig. 5 shows the cost as a function of annealing for the case with no measurement of Recovered Population R. Without examining the estimates, we know from the Cost(β) plot that no solution has been found that is consistent with both measurements and model: no plateau is reached. Rather, as the model constraint strengthens, the cost increases exponentially. Indeed, Fig. 6 shows the estimation, taken at β = 2, prior to the runaway behavior. Note the excellent fit to the measured states and simultaneous poor fit to the unmeasured states. As no stable solution is found at high β, we conclude that there exists insufficient information in Asdet, Symdet, Sysdet, and D alone to corral the procedure into a region of state-and-parameter space in which a model solution is possible. We repeated this experiment with a doubled baseline of 201 days, and noted no improvement (not shown).

Experiments iii and iv: low noise added

In Experiment iii, the low noise added to R yielded a poor state and parameter estimate (not shown). With a doubled temporal baseline of measurements (Experiment iv), however, the state estimate became robust to the contamination. Fig. 7 shows this estimate. While the ~ five percent noise added to Population R propagates to the unmeasured States S, E, and P, the general state evolution is still captured well. Importantly, the populations entering the hospital are well estimated. Note that some low state estimates (e.g. As) are not perfectly offset by high estimates (e.g. Sym). The addition of noise in these numbers - by definition - breaks the conservation of the population. Finally, the parameter estimates for Experiment iv do not survive the added contamination (not shown).

Conclusion

We have endeavoured to illustrate the potential of SDA to systematically identify the specific measurements, temporal baseline of measurements, and degree of measurement accuracy required to estimate unknown model parameters in a high-dimensional model designed to examine the complex problems that COVID-19 presents to hospitals. In light of our assumed knowledge of some model parameters, we restrict our conclusions to general comments. We emphasize that estimation of the full model state requires measurements of the detected cases but not the undetected, provided that the recovered and dead are also measured. The state evolution is tolerant to low noise in these measurements, while the parameter estimates are not.

The ultimate aim of SDA is to test the validity of model estimation using real data, via prediction. In advance of that step, we are performing a detailed study of the model’s sensitivity to contamination in the measurable populations As_det, Sym_det, Sys_det, R, and D. Concurrently we are examining means to render the parameter estimation less sensitive to noise, via various additional equality constraints in the cost function, and by loosening the assumption of Gaussian-distributed noise. In particular, we shall require that the time-varying parameters be smoothly varying. It will be important to examine the stability of the SDA procedure over a range of choices for parameter values and initial numbers for the infected populations.

This procedure can be expanded in many directions. Currently we are working to divide the model subpopulations by age, and to include age-specific parameters such as susceptibility and the likelihood of requiring hospitalization and intensive care. Specifically, SDA might inform the question of whether the contact matrices among age groups are non-stationary - a question of high interest for predicting age-dependent susceptibility during a second wave (ABC News, 2020) of the virus.

Other avenues for expansion are as follows: 1) define additional model parameters as unknowns to be estimated, including the fraction of patients hospitalized, the fraction who enter critical care, and the various timescales governing the reaction equations; 2) impose various constraints regarding the unknown time-varying quantities, particularly the transmission rate K_i(t), and identify which forms permit a solution consistent with measurements; 3) examine model sensitivity to the initial numbers within each population; 4) examine model sensitivity to the temporal frequency of data sampling. Moreover, it is our hope that the procedure described in this paper can guide the application of SDA to a host of complicated questions surrounding COVID-19.
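One way to picture the requirement that time-varying parameters vary smoothly is a roughness penalty added to a least-squares cost. The sketch below is an assumption for illustration, not the cost function used in this work: it uses a soft penalty rather than an equality constraint, it omits the model-error term that a variational cost would normally carry, and the names `cost`, `r_m`, and `lam` are hypothetical.

```python
import numpy as np

def cost(x_est, y_obs, k_est, r_m=1.0, lam=10.0):
    """Toy cost: measurement misfit plus a roughness penalty on k(t).

    x_est : estimated values of a measured state at each time step
    y_obs : measurements at the same time steps
    k_est : estimated time-varying parameter (e.g. a transmission rate)
    r_m   : weight on the measurement term (hypothetical name)
    lam   : weight on the roughness penalty (hypothetical name)
    """
    misfit = np.asarray(y_obs, dtype=float) - np.asarray(x_est, dtype=float)
    measurement_term = r_m * np.sum(misfit ** 2)
    # Squared first differences are small when k(t) changes gradually
    roughness_term = lam * np.sum(np.diff(np.asarray(k_est, dtype=float)) ** 2)
    return float(measurement_term + roughness_term)

# Two candidate parameter paths with an identical (perfect) fit to the data
t = np.linspace(0.0, 1.0, 101)
y_obs = np.sin(2.0 * np.pi * t)
x_est = y_obs.copy()

rng = np.random.default_rng(0)
k_smooth = 1.0 - 0.5 * t                                  # slow linear decline
k_jagged = k_smooth + 0.05 * rng.standard_normal(t.size)  # same trend plus jitter

cost_smooth = cost(x_est, y_obs, k_smooth)
cost_jagged = cost(x_est, y_obs, k_jagged)
print(cost_smooth, cost_jagged)
```

Both candidates fit the measurements equally well, so the penalty alone separates them, and an optimizer minimizing this cost would prefer the smooth path. The weight `lam` sets how strongly jitter is discouraged and would need tuning in practice.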

Table 3

State variables of the COVID-19 transmission model. The “detected” qualifier signifies that the population has been tested and is positive for COVID-19.

Variable	Description
S	Susceptible
E	Exposed
As_det	Asymptomatic, detected
As	Asymptomatic, undetected
Sym_det	Symptomatic mild, detected
Sym	Symptomatic mild, undetected
Sys_det	Symptomatic severe, detected
Sys	Symptomatic severe, undetected
H_1,det	Hospitalized and will recover, detected
H_2,det	Hospitalized and will go to critical care and recover, detected
H_3,det	Hospitalized and will go to critical care and die, detected
H_1	Hospitalized and will recover, undetected
H_2	Hospitalized and will go to critical care and recover, undetected
H_3	Hospitalized and will go to critical care and die, undetected
C_2,det	In critical care and will recover, detected
C_3,det	In critical care and will die, detected
C_2	In critical care and will recover, undetected
C_3	In critical care and will die, undetected
R	Recovered
D	Dead

Table 4

The model parameters, with the unknown parameters to be estimated denoted in boldface. The unknown parameters K_i, d_Sym, and d_Sys are taken to be time-varying. The unknown parameters f_sympt and f_severe are taken to be intrinsic properties of the disease and therefore constant numbers. The detection probability of asymptomatic cases is taken to be known and zero. Units of time are days.

Parameter	Description	Value
N	Total population	9,000,000
Reduced	The property that a detected case is likely to transmit less (via successful quarantine)	0.2
K_i(t)	Transmission rate	See Appendix B
d_As(t)	Detection probability of asymptomatic cases	0.0
f_sympt	Fraction of positive cases that produce symptoms	0.6 (Oran and Topol, 2020)
t_infection	Time from exposure to infection	4.0 (Li et al., 2020c)
t_R,a	Time to recovery for asymptomatics	8.0 (assumed to be the same as t_R,m)
d_Sym(t)	Detection probability of mild symptomatics	See Appendix B
d_Sys(t)	Detection probability of severe symptomatics	See Appendix B
f_severe	Fraction of symptomatics that are severe	0.07 (Salje et al., 2020)
t_sympt	Time to symptoms, for symptomatics	4.0 (Roman et al., 2020; Jing et al., 2020)
t_R,m	Time from symptoms to recovery, for mild symptomatics	8.0 (Roman et al., 2020)^a
f_H	Fraction of severe cases that are hospitalized and then recover: f_H = 1.0 − f_C − f_D	0.66
f_C	Fraction of severe cases that require critical care and then recover	0.3 (Lewnard et al., 2020)
f_D	Fraction of severe cases that die	0.04 (Wang et al., 2019)
t_H	Time from symptoms to hospital, for severe symptomatics	5.0 (Huang et al., 2020)
t_R,h	Time from entering hospital to recovery, for severe symptomatics that do not require critical care	10.0 (Lewnard et al., 2020; Wang et al., 2019)
t_C	Time from entering hospital to critical care, for severe symptomatics	5.0 (Huang et al., 2020)
t_R,c	Time from entering critical care to recovery for severe symptomatics	10.0 (Bi et al., 2020)
t_D	Time from entering critical care to death, for severe symptomatics	5.0 (Yang et al., 2020)

^a As described in Roman et al. (2020), viral load can be high and detectable for up to 20 days. We choose a shorter duration of infectiousness to capture the time during which transmissibility is highest.
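The fractions in Table 4 can be checked with a short calculation. The sketch below uses only the tabulated values; the function name `severe_outcomes` and the cohort of 1,000 positive cases are hypothetical and serve only to make the arithmetic concrete.

```python
# Values taken from Table 4
f_sympt = 0.6     # fraction of positive cases that produce symptoms
f_severe = 0.07   # fraction of symptomatics that are severe
f_C = 0.30        # fraction of severe cases needing critical care, then recovering
f_D = 0.04        # fraction of severe cases that die
f_H = 1.0 - f_C - f_D   # remainder: hospitalized and then recover

def severe_outcomes(n_severe):
    """Split a number of severe cases into the three outcome branches."""
    return {
        "hospital_then_recover": f_H * n_severe,
        "critical_then_recover": f_C * n_severe,
        "die": f_D * n_severe,
    }

# Hypothetical cohort of 1,000 positive cases, for illustration only
n_positive = 1000
n_symptomatic = f_sympt * n_positive
n_severe = f_severe * n_symptomatic
outcomes = severe_outcomes(n_severe)

print(f"f_H = {f_H:.2f}")
print(f"severe cases per {n_positive} positives: {n_severe:.1f}")
for name, value in outcomes.items():
    print(f"  {name}: {value:.2f}")
```

Since the three fractions sum to one, every severe case is assigned to exactly one branch, which is what the definition f_H = 1.0 − f_C − f_D guarantees.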

1. Statistical data assimilation for estimating electrophysiology simultaneously with connectivity within a biological neuronal network.

Authors: Eve Armstrong
Journal: Phys Rev E Date: 2020-01 Impact factor: 2.529

2. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China.

Authors: Dawei Wang; Bo Hu; Chang Hu; Fangfang Zhu; Xing Liu; Jing Zhang; Binbin Wang; Hui Xiang; Zhenshun Cheng; Yong Xiong; Yan Zhao; Yirong Li; Xinghuan Wang; Zhiyong Peng
Journal: JAMA Date: 2020-03-17 Impact factor: 56.272

3. Bayesian tracking of emerging epidemics using ensemble optimal statistical interpolation.

Authors: Loren Cobb; Ashok Krishnamurthy; Jan Mandel; Jonathan D Beezley
Journal: Spat Spatiotemporal Epidemiol Date: 2014-07-09

4. Virological assessment of hospitalized patients with COVID-2019.

Authors: Roman Wölfel; Victor M Corman; Wolfgang Guggemos; Michael Seilmaier; Sabine Zange; Marcel A Müller; Daniela Niemeyer; Terry C Jones; Patrick Vollmar; Camilla Rothe; Michael Hoelscher; Tobias Bleicker; Sebastian Brünink; Julia Schneider; Rosina Ehmann; Katrin Zwirglmaier; Christian Drosten; Clemens Wendtner
Journal: Nature Date: 2020-04-01 Impact factor: 49.962

5. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures.

Authors: Jacco Wallinga; Peter Teunis
Journal: Am J Epidemiol Date: 2004-09-15 Impact factor: 4.897

6. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study.

Authors: Qifang Bi; Yongsheng Wu; Shujiang Mei; Chenfei Ye; Xuan Zou; Zhen Zhang; Xiaojian Liu; Lan Wei; Shaun A Truelove; Tong Zhang; Wei Gao; Cong Cheng; Xiujuan Tang; Xiaoliang Wu; Yu Wu; Binbin Sun; Suli Huang; Yu Sun; Juncen Zhang; Ting Ma; Justin Lessler; Tiejian Feng
Journal: Lancet Infect Dis Date: 2020-04-27 Impact factor: 25.071

7. Estimation of incubation period distribution of COVID-19 using disease onset forward time: A novel cross-sectional and forward follow-up study.

Authors: Jing Qin; Chong You; Qiushi Lin; Taojun Hu; Shicheng Yu; Xiao-Hua Zhou
Journal: Sci Adv Date: 2020-08-14 Impact factor: 14.136

8. An updated estimation of the risk of transmission of the novel coronavirus (2019-nCov).

Authors: Biao Tang; Nicola Luigi Bragazzi; Qian Li; Sanyi Tang; Yanni Xiao; Jianhong Wu
Journal: Infect Dis Model Date: 2020-02-11

9. Improved inference of time-varying reproduction numbers during infectious disease outbreaks.

Authors: R N Thompson; J E Stockwin; R D van Gaalen; J A Polonsky; Z N Kamvar; P A Demarsh; E Dahlqwist; S Li; E Miguel; T Jombart; J Lessler; S Cauchemez; A Cori
Journal: Epidemics Date: 2019-08-26 Impact factor: 4.396

10. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728
