
Semiparametric regression analysis of doubly-censored data with applications to incubation period estimation.

Abstract

The incubation period is a key characteristic of an infectious disease. In the outbreak of a novel infectious disease, accurate evaluation of the incubation period distribution is critical for designing effective prevention and control measures. Estimation of the incubation period distribution based on limited information from retrospective inspection of infected cases is highly challenging due to censoring and truncation. In this paper, we consider a semiparametric regression model for the incubation period and propose a sieve maximum likelihood approach for estimation based on the symptom onset time, travel history, and basic demographics of reported cases. The approach properly accounts for the pandemic growth and selection bias in data collection. We also develop an efficient computation method and establish the asymptotic properties of the proposed estimators. We demonstrate the feasibility and advantages of the proposed methods through extensive simulation studies and provide an application to a dataset on the outbreak of COVID-19.


Keywords: COVID-19; Cox proportional hazards model; Sieve estimation; Survival analysis; Truncated data

Year: 2022 PMID: 35831702 PMCID: PMC9281361 DOI: 10.1007/s10985-022-09567-3

Source DB: PubMed Journal: Lifetime Data Anal ISSN: 1380-7870 Impact factor: 1.429

Introduction

The incubation period is an important epidemiological feature of infectious diseases. It is defined as the time elapsed from infection to the onset of symptoms. Accurate evaluation of the incubation period distribution of a novel infectious disease can inform effective prevention and control measures, such as the appropriate duration of quarantine and the time to reopen workplaces, schools, and restaurants after a lockdown. The distribution of the incubation period can also be used to estimate the reproduction number and to predict transmission dynamics and epidemic trends. Typically, early in a pandemic, only limited information, such as the travel history, contact history, symptom onset date, or hospitalization date of infected cases, is available for analyzing the incubation period. As a result, estimation of the incubation period distribution is highly challenging.

This study is motivated by the outbreak of coronavirus disease 2019 (COVID-19), which first occurred in early December 2019 in Wuhan, China. Our study is based on the data collected by Zhao et al. (2021), which contain information typically available early in disease outbreaks. The dataset consists of confirmed COVID-19 cases reported by local health agencies at 14 locations, including 8 cities/provinces in mainland China and 6 countries/regions in East Asia, collected in mid to late February 2020. These cases had visited or were residents of Wuhan before Wuhan’s lockdown on 23 January 2020 and were diagnosed after arrival at the 14 locations. Because Wuhan was the first center of the outbreak, the cases were plausibly infected during their stay in Wuhan. Available information on the cases includes basic demographics, date of arrival in Wuhan (for travellers), date of departure from Wuhan, and date of symptom onset.

Estimation of the incubation period distribution is complicated by censoring and truncation. First, the start of the incubation period, that is, the infection time, is not observed but is only known to fall within an interval (e.g., the stay in Wuhan in our example), which we refer to as the exposure period. Also, a subject may test positive before symptom onset and not be followed thereafter, in which case the symptom onset time is right-censored. Since both the infection and symptom onset times may be censored, we say that the data are doubly-censored. In addition, data were collected retrospectively, so only subjects who were infected during the exposure period could be included in the dataset, and the infection time is thus right-truncated.

Several investigators have studied the analysis of doubly-censored data for the estimation of the incubation period distribution. Goggins et al. (1999) considered a full-likelihood approach that includes both the infection time and incubation period distributions in the likelihood and proposed a Monte-Carlo EM algorithm for computation. Sun et al. (1999, 2004) proposed a two-step approach, where the infection time distribution is estimated using the intervals containing the infection times, and the incubation period distribution is estimated using estimating equations based on the estimated infection time distribution. Pan (2001) proposed a multiple-imputation approach to impute the infection time and applied conventional estimation methods to the imputed datasets. Reich et al. (2009) considered interval-censored infection and symptom onset times, that is, doubly interval-censored data, and studied maximum likelihood estimation under a parametric model without covariates. Dejardin and Lesaffre (2013) considered a full-likelihood approach and developed a stochastic EM algorithm for doubly interval-censored data. Li and Owzar (2016) considered doubly interval-censored data and allowed the infection time and incubation period to be associated. Li et al. (2020) considered interval-censored infection times and a mixture of exactly-observed and interval-censored symptom onset times.

Although our data structure bears a superficial resemblance to those considered in the literature on the analysis of doubly-censored data, there are important differences that make the current analysis substantially more difficult. In particular, information on the infection time is very limited in the current study. First, there is essentially only one “observation time” for the infection (e.g., the time of departure from Wuhan in our example), so the infection time is subject to case-1 interval censoring instead of general interval censoring, yielding much less information. Second, subjects with infection time larger than the observation time could not be observed, so there is a severe truncation problem. In fact, the observation times alone contain no information about the infection time distribution. Thus, the imputation or estimating-equation approaches of Sun et al. (1999, 2004), Pan (2001), and Li et al. (2020), which require a consistent estimator of the infection time distribution, are not applicable.

We propose a likelihood approach to estimate the incubation period distribution. In particular, we jointly model the infection time and time to symptom onset, while taking into account the censoring and truncation issues and the growth of the pandemic over time. We assume a semiparametric Cox proportional hazards model for the incubation period, propose a sieve maximum likelihood approach for estimation, and devise an efficient computation method. We also establish the consistency, rate of convergence, and asymptotic normality of the proposed estimators.

The current study differs from most existing work on incubation period analysis for COVID-19 in two important aspects. First, most existing studies are based on parametric models (Backer et al. 2020; Lauer et al. 2020; Linton et al. 2020; Deng et al. 2020; Qin et al. 2020), whereas we consider a flexible semiparametric model with a nonparametric baseline hazard function. In particular, based on infected subjects’ time to symptom onset since their departure from Wuhan, Deng et al. (2020) and Qin et al. (2020) estimated the (parametric) incubation period distribution using renewal process theory with the incubation period treated as a renewal and the duration between departure and symptom onset as a forward recurrence time. Second, we allow the incubation period distribution to depend on covariates, such as sex and age. While most existing studies focused on the overall incubation period, some studies suggested that the incubation period distribution may differ across patients from different subpopulations. For example, Dai et al. (2020) and Tan et al. (2020) found that older subjects have a longer median incubation period than younger subjects, and Xiao et al. (2021) suggested that the incubation period distribution is associated with the meteorological temperature of the area. A regression framework allows us to more precisely evaluate the incubation period distribution for given subpopulations. Such evaluations are important for designing targeted quarantine and isolation policies.

The rest of this paper is structured as follows. In Sect. 2, we formulate the model, define the sieve maximum likelihood estimator (MLE), and describe the numerical implementation of the proposed methods. In Sect. 3, we present asymptotic properties of the proposed estimators. In Sect. 4, we report the results from simulation studies, and in Sect. 5, we present the analysis results of the COVID-19 dataset. We make some concluding remarks in Sect. 6. Technical details are relegated to the Appendix.

Methods

For a generic subject, let be the exposure start time, measured from a fixed initial calendar time. Let S be the time to infection since the exposure start, be the time to symptom onset since the exposure start, and be a p-vector of covariates. Assume that is the hazard of infection at time t (since the exposure start), where $\alpha$ and r are positive parameters, $\alpha$ represents the hazard at the initial calendar time, and r characterizes the growth of the pandemic. This model assumes that the hazard of infection grows exponentially with rate r since the initial calendar time. When , the model reduces to the model for the time to infection considered in Zhao et al. (2021). Assume that the incubation period follows the proportional hazards model with covariates , so the hazard function is , where is an unspecified baseline hazard function, and $\beta$ is a vector of regression parameters. Assume that S and are independent.

Suppose that each subject is exposed to the risk of infection until time U (measured from the exposure start), and a subject can be observed only if the infection occurs before U. Assume that U is independent of . The conditional density of given , U, and the event is where , , and and are (conditional) densities of and S, respectively. We allow the symptom onset time to be right-censored. Let C be the censoring time (measured from the exposure start), be the observed event or censoring time, and be the event indicator. Also, S is unobserved but is only known to fall within (0, U). For a sample of size n, the observed data consist of for . The conditional likelihood function for given the event that infection occurs before the end of exposure is

The conditional likelihood incorporates the infection time distribution and thereby accounts for the pandemic growth over time. Also, because only subjects who were infected during the exposure period could be observed, there is selection bias induced by truncation of infection times.
To account for such selection bias, we consider the conditional likelihood given . Because the (conditional) likelihood function involves the nonparametric function , maximum likelihood estimation is highly challenging or even infeasible. We propose a sieve maximum likelihood approach and approximate by B-spline functions. In particular, let be a set of B-spline functions of order l over a knot sequence for some fixed time point . In practice, we can set . Define the sieve space for some diverging sequence . Let be a prespecified compact set in that denotes the parameter space for the Euclidean parameters. The sieve MLE is .

We propose to log-transform the positive parameters and use a hybrid of gradient descent and the Newton–Raphson method to compute the sieve MLE. In particular, in early iterations, we update the parameter estimates using the gradient descent method on the log-likelihood function. Let be the vector of first derivatives of with respect to . We initialize the parameter vector at , and at the jth iteration, we use the gradient descent method to update by where is the current parameter estimate, and is some negative number determined by the Armijo backtracking method. We first set to be the Barzilai–Borwein step size (Sun and Yuan 2006, pp. 126–127). Specifically, let and . We initially set for and for . If the updated log-likelihood value does not pass the Armijo–Goldstein condition, that is, if it is smaller than under the current value of , then we recalculate the parameter estimates with discounted by a multiplicative factor of 0.8 until the resulting log-likelihood value passes the condition.

When the norm of the score statistic is smaller than a certain threshold, we perform the Newton–Raphson algorithm until convergence. In particular, at the jth iteration, the Newton–Raphson algorithm updates by where is the Hessian matrix for with respect to .
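The early-iteration update described above (a Barzilai–Borwein trial step followed by Armijo backtracking with a discount factor of 0.8) can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: the toy log-likelihood (two exponential samples with log-transformed rates), the default trial step of 1.0, the Armijo constant `c`, and all function names are hypothetical choices. The sketch writes the update as an ascent step with a positive step size, which corresponds to the negative multiplier in the description above.

```python
import math

# Toy data (assumed for illustration): two groups with exponential lifetimes.
COUNTS = [5, 4]        # number of observations per group
SUMS = [10.0, 2.0]     # sum of observed times per group

def loglik(theta):
    """Log-likelihood in the log-rate parameters theta_k = log(rate_k)."""
    return sum(n * t - math.exp(t) * s for n, t, s in zip(COUNTS, theta, SUMS))

def grad(theta):
    """Score vector of the toy log-likelihood."""
    return [n - math.exp(t) * s for n, t, s in zip(COUNTS, theta, SUMS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient_ascent_bb(loglik, grad, theta, tol=1e-6, max_iter=500, c=1e-4, shrink=0.8):
    """Gradient ascent with a Barzilai-Borwein trial step and Armijo backtracking."""
    theta = list(theta)
    g = grad(theta)
    prev_theta = prev_g = None
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        step = 1.0  # assumed default trial step when no BB step is available
        if prev_theta is not None:
            s = [a - b for a, b in zip(theta, prev_theta)]
            y = [a - b for a, b in zip(g, prev_g)]
            sy = dot(s, y)
            if sy < 0.0:  # curvature consistent with a local maximum
                step = -dot(s, s) / sy
        base = loglik(theta)
        # Armijo backtracking: discount the step by 0.8 until the increase suffices.
        while True:
            cand = [t + step * gi for t, gi in zip(theta, g)]
            if loglik(cand) >= base + c * step * dot(g, g):
                break
            step *= shrink
        prev_theta, prev_g = theta, g
        theta, g = cand, grad(cand)
    return theta

# Maximizers of the toy problem are log(COUNTS[k] / SUMS[k]).
theta_hat = gradient_ascent_bb(loglik, grad, [0.0, 0.0])
```

Working on the log scale keeps the rates positive without explicit constraints, which is the reason the text gives for log-transforming the positive parameters.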
When the parameter estimates are not close enough to the maximum, the Hessian matrix may not be negative definite, and there is a direction in the parameter space along which the log-likelihood function is locally convex. In this case, the Newton–Raphson step should not be performed, and instead we update the parameter estimates along this direction of convexity. Let denote the eigenvector that corresponds to the largest (positive) eigenvalue of . We update by , where is selected similarly as in gradient descent. To compute the log-likelihood function and its derivatives, we use the Legendre–Gauss quadrature to approximate the integrations involved.

Because the log-likelihood may possess multiple local maxima, we set the initial parameter values based on a grid search to increase the chance of obtaining the global maximum. In particular, we define a grid for . At each grid point , we compute the maximizer of the log-likelihood with respect to , denoted by , at . The parameter values that yield the largest likelihood value are set to be the initial values for the gradient methods.

When the sample size is small, the likelihood may be maximized at the boundary value of 0 for some positive parameters. In this case, the maximum does not exist for the log-transformed parameters, and the algorithm may return error values at extreme parameter values. To avoid these problems, we propose to set a (small) lower boundary for the positive parameters. When the proposed updated parameter values exceed the boundary, we update the parameters along the proposed direction but with a step size that keeps all parameters within the boundary.

To estimate the standard error of the Euclidean parameter estimators, we treat the model as fully parametric, with parameters . Let be the observed information matrix under this parametric model evaluated at the sieve MLE. We estimate the variance of by the corresponding elements of .
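The Legendre–Gauss quadrature mentioned above can be illustrated with a short, self-contained sketch. It is not the authors' code. As a test integrand it uses the density of the infection time under an assumed hazard of the form α·exp{r(a + s)}, where a is a hypothetical symbol for the exposure start time; this form follows the verbal description of the model but is an assumption here. The integral over the exposure period (0, u) has a closed form, so the quadrature result can be checked against it.

```python
import math

def legendre_gauss(n):
    """Nodes and weights of the n-point Legendre-Gauss rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        # Standard starting value for the i-th root of the Legendre polynomial P_n.
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            # Evaluate P_n(x) and P_{n-1}(x) by the three-term recurrence.
            p_prev, p = 1.0, x
            for k in range(2, n + 1):
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative of P_n at x
            dx = p / dp
            x -= dx  # Newton update toward the root
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(f, a, b, n=20):
    """Approximate the integral of f over [a, b] with an n-point rule."""
    nodes, weights = legendre_gauss(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))

# Illustrative parameter values (assumptions, not taken from the data analysis).
ALPHA, R, A, U = 0.01, 0.2, 3.0, 10.0

def cumulative_hazard(s):
    """Cumulative infection hazard over (0, s) under the assumed hazard form."""
    return ALPHA * math.exp(R * A) * (math.exp(R * s) - 1.0) / R

def infection_density(s):
    """Density of the infection time: hazard times survival."""
    return ALPHA * math.exp(R * (A + s)) * math.exp(-cumulative_hazard(s))

approx = integrate(infection_density, 0.0, U)   # quadrature value of P(S < U)
exact = -math.expm1(-cumulative_hazard(U))      # closed form 1 - exp(-H(U))
```

In the actual likelihood the integrand would also involve the incubation period distribution, for which no closed form is available; the closed-form check is only possible for this simplified integrand.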

Asymptotic properties of the sieve MLE

Let denote the collection of all parameters, the true value of , and the sieve MLE of . The regularity conditions below are needed for the forthcoming theorems.

(C1) The true parameter value is an interior point of . The function has a positive lower bound and bounded qth derivative on for some and positive constant .

(C2) The distribution of has a bounded support in . If with probability one for some constants and , then and .

(C3) Conditional on , S and are independent, and they are independent of C and U. Also, , and the support of U includes for some positive constant . The joint density function of C and U is continuously differentiable on its support. In addition, almost surely for some positive constant .

(C4) The knots satisfy for some positive constant K, and the number of knots satisfies for some , and .

(C5) The exposure start time is uniformly bounded, that is, there exists a positive constant K such that .

Remark 1

Condition (C1) is typical for semiparametric regression models, and condition (C2) is necessary for the identifiability of the regression parameters. Condition (C3) ensures that the supports of C and U are wide enough to allow the identification of over the interval . Condition (C4) controls the rates at which the number of B-spline functions and the maximum spline coefficient value diverge to infinity. Condition (C5) requires that the exposure start time is uniformly bounded. Define the distance , where denotes the Euclidean norm and denotes the -norm over . We have the following results.

Theorem 1

Under conditions (C1)–(C5), almost surely, and as , where q is given in (C1).

Theorem 2

Assume that conditions (C1)–(C5) hold, , and . We have where is the information matrix for defined in the proof of this theorem. The proofs of Theorems 1 and 2 are given in the Appendix.

Simulation studies

We set and for , where and are independent. We set , , or 0.3, and or 0.1. The (nontruncated) means of time to infection and the truncation rates under different values of and r are given in Table 1. We generated with and , where and are independent. The mean incubation period is about 9.4. We set a universal right censoring time for at 30, resulting in a censoring proportion of around 2–3%. This low censoring rate is to match the real data, where most subjects were identified to be infected at symptom onset, and only subjects who underwent the disease test before symptom onset would be censored. We set the sample size to be or 800.
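The generation of infection times in such a simulation can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that the infection hazard has the form α·exp{r(a + t)} for a subject whose exposure starts at calendar time a, which matches the verbal description of exponential growth at rate r but is an assumption here. The symbols a and u and the function names are hypothetical. Infection times are drawn by inverting the cumulative hazard, and right truncation at the exposure end u is imposed by sampling from the distribution conditional on infection before u.

```python
import math
import random

def cumulative_hazard(t, alpha, r, a):
    """Cumulative infection hazard over (0, t) for the hazard alpha * exp(r * (a + t))."""
    return alpha * math.exp(r * a) * (math.exp(r * t) - 1.0) / r

def inverse_cumulative_hazard(h, alpha, r, a):
    """Time t at which the cumulative hazard equals h."""
    return math.log1p(r * h / (alpha * math.exp(r * a))) / r

def sample_infection_time(alpha, r, a, rng):
    """Draw an untruncated infection time by inverting a unit-exponential variate."""
    return inverse_cumulative_hazard(rng.expovariate(1.0), alpha, r, a)

def truncation_probability(alpha, r, a, u):
    """Probability that infection occurs after the exposure end u (subject unobserved)."""
    return math.exp(-cumulative_hazard(u, alpha, r, a))

def sample_truncated_infection_time(alpha, r, a, u, rng):
    """Draw an infection time conditional on infection before the exposure end u."""
    p_observed = -math.expm1(-cumulative_hazard(u, alpha, r, a))  # P(S < u)
    v = rng.random() * p_observed            # uniform on [0, P(S < u))
    return inverse_cumulative_hazard(-math.log1p(-v), alpha, r, a)

if __name__ == "__main__":
    rng = random.Random(2022)
    # Illustrative values only; a and u are not taken from the simulation design.
    print(sample_truncated_infection_time(0.01, 0.2, 5.0, 10.0, rng))
    print(truncation_probability(0.01, 0.2, 5.0, 10.0))
```

Sampling directly from the conditional distribution avoids rejection sampling, which would be very slow when the truncation rate is close to 1.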

Table 1

Mean times to infection and truncation rates in the simulation studies

$\alpha$	r	Mean time to infection	Truncation rate
$10^{-10}$	0.2	99.2	$\approx 1$
$10^{-10}$	0.3	65.8	$\approx 1$
0.01	0.2	9.0	0.75
0.01	0.3	5.9	0.56
0.1	0.2	2.6	0.25
0.1	0.3	1.7	0.17

We set the degree of the B-spline functions to be 2 and a single (inner) grid point to be 8, which is approximately the median of . The grid for the initial values of is . We set the lower boundary for the positive parameters to be . We present the bias, empirical standard deviation, mean standard error estimates, and empirical coverage of the 95% confidence intervals for the Euclidean parameters in Tables 2 and 3; the results are based on 1000 replicates. For $\alpha$ and r, the coverages are calculated on the logarithmic scale. We also plot the averaged estimated cumulative baseline hazard functions in Fig. 1 for ; the results for are visually identical and thus are omitted.
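The summary measures reported in the tables can be computed from the per-replicate estimates along the following lines. This is a sketch with hypothetical function names, not the authors' code. For the log-scale coverage it assumes that only the estimate and its standard error on the original scale are available and converts the standard error with the delta method (se/estimate); if the model is fitted directly in the log-transformed parameters, the standard error on the log scale would be used as is.

```python
import math
import statistics

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution

def summarize(estimates, std_errors, truth):
    """Bias, empirical SD (ESD), mean SE estimate (SEE), and 95% Wald coverage (Cov)."""
    covered = [abs(est - truth) <= Z_975 * se for est, se in zip(estimates, std_errors)]
    return {
        "Bias": statistics.fmean(estimates) - truth,
        "ESD": statistics.stdev(estimates),
        "SEE": statistics.fmean(std_errors),
        "Cov": sum(covered) / len(covered),
    }

def coverage_log_scale(estimates, std_errors, truth):
    """Coverage of 95% Wald intervals for log(parameter), for positive parameters.

    Assumption: the SE of log(estimate) is approximated by se / estimate.
    """
    hits = 0
    for est, se in zip(estimates, std_errors):
        half_width = Z_975 * se / est
        hits += abs(math.log(est) - math.log(truth)) <= half_width
    return hits / len(estimates)

# Tiny made-up example (four replicates) to show the calling convention.
example_estimates = [0.9, 1.1, 1.0, 1.3]
example_std_errors = [0.1, 0.1, 0.1, 0.1]
example_summary = summarize(example_estimates, example_std_errors, truth=1.0)
example_log_cov = coverage_log_scale(example_estimates, example_std_errors, truth=1.0)
```

Intervals built on the log scale always stay within the positive half-line after back-transformation, which is one reason to evaluate coverage there for positive parameters.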

Table 2

Simulation results under the correct model with

Setting	Parameter	True	Bias	ESD	SEE	Cov
1	$\beta_1$	0.500	0.007	0.062	0.060	0.94
	$\beta_2$	-0.500	-0.009	0.111	0.110	0.95
	$\alpha$	0.010	0.004	0.031	0.019	0.94
	r	0.200	0.013	0.104	0.107	0.91
2	$\beta_1$	0.500	0.006	0.060	0.060	0.94
	$\beta_2$	-0.500	-0.010	0.108	0.109	0.96
	$\alpha$	0.010	0.005	0.022	0.016	0.93
	r	0.300	0.019	0.110	0.100	0.93
3	$\beta_1$	0.500	0.005	0.059	0.058	0.95
	$\beta_2$	-0.500	-0.007	0.109	0.107	0.94
	$\alpha$	0.100	0.018	0.144	0.095	0.95
	r*	0.200	0.051	0.149	0.111	0.91
4	$\beta_1$	0.500	0.006	0.060	0.058	0.94
	$\beta_2$	-0.500	-0.007	0.109	0.106	0.95
	$\alpha$	0.100	0.017	0.168	0.113	0.96
	r**	0.300	0.074	0.191	0.154	0.90

* The true value and bias for are 1.609 and 0.004

** The true value and bias for are 1.204 and 0.041

“True” stands for the true parameter value, “ESD” stands for the empirical standard deviation of the estimated values, “SEE” stands for the mean standard error estimates, and “Cov” stands for the empirical coverage of the 95% confidence interval. The number of replicates in which one or more parameter estimates are at the boundary for Settings 1–4 are 50, 18, 4, and 3, respectively. Replicates with estimates at the boundary are discarded

Table 3

Simulation results under the correct model with

Setting	Parameter	True	Bias	ESD	SEE	Cov
1	$\beta_1$	0.500	0.003	0.043	0.042	0.96
	$\beta_2$	-0.500	-0.005	0.078	0.078	0.94
	$\alpha$	0.010	0.001	0.012	0.010	0.95
	r	0.200	0.002	0.077	0.079	0.93
2	$\beta_1$	0.500	0.002	0.042	0.042	0.95
	$\beta_2$	-0.500	-0.004	0.077	0.077	0.96
	$\alpha$	0.010	0.003	0.013	0.009	0.94
	r	0.300	0.008	0.069	0.068	0.95
3	$\beta_1$	0.500	0.001	0.041	0.041	0.95
	$\beta_2$	-0.500	-0.006	0.076	0.076	0.95
	$\alpha$	0.100	0.010	0.066	0.058	0.95
	r*	0.200	0.018	0.079	0.070	0.93
4	$\beta_1$	0.500	0.002	0.040	0.040	0.95
	$\beta_2$	-0.500	-0.005	0.075	0.075	0.96
	$\alpha$	0.100	0.006	0.062	0.057	0.97
	r**	0.300	0.036	0.127	0.105	0.93

* The true value and bias for are 1.609 and 0.017

** The true value and bias for are 1.204 and 0.050

See NOTE to Table 2. The number of replicates in which one or more parameter estimates are at the boundary for Settings 1–4 are 17, 1, 0, and 0, respectively. Replicates with estimates at the boundary are discarded

Fig. 1

Estimated cumulative baseline hazard functions under the correct model with

For and , the estimators are virtually unbiased in all settings, the standard error estimates closely resemble the empirical standard deviations, and the empirical coverages are close to the nominal 95% level. The mean estimated cumulative baseline hazard functions closely resemble the true functions. Under , the average values of and may not be close to the true values due to some extreme estimates, and the standard error estimates may be smaller than the empirical standard deviations, yielding under-coverage of the confidence intervals. The performance of the estimators is much better under , suggesting that the suboptimal performance in some cases is due to insufficient sample size. After all, the actual time to infection is never observed and is always mixed with the incubation period, leading to limited information for the estimation of the infection time distribution.
When the true values of α and r are small, and (the intercept of ) are at the boundary for a number of replicates, and the number is substantially smaller under a larger sample size. In some cases, the estimator of r has a substantial bias. This is due to skewness of the distribution of the estimator, and the bias of the estimator of −log r can be much smaller. We also investigated the performance of the proposed methods under a substantially larger right-censoring proportion. This corresponds to the scenario where many subjects tested positive before symptom onset (and were not followed thereafter). In particular, we generated the data as described above but set a universal censoring time of 15, resulting in a censoring proportion of about 20–35%. Because early censoring leaves much less information for the baseline hazard function, we set the degree of the B-spline functions to 1 instead of 2. The results for the Euclidean parameters, based on and 1000 replicates, are presented in Table 4. The estimated cumulative baseline hazard functions are virtually indistinguishable from the true function and are not presented. Under a larger censoring proportion, the standard errors of the estimators are generally larger. The performance of the point and interval estimators follows a pattern similar to that under a small censoring rate.
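The summary measures reported in the simulation tables (Bias, ESD, SEE, Cov) can be computed from per-replicate output roughly as in the minimal sketch below. It assumes that each replicate yields a point estimate, a standard error estimate, and a flag indicating whether any estimate is at the boundary. The function and argument names are illustrative and not taken from the paper, and a Wald-type interval (estimate ± 1.96 × standard error) is assumed for the coverage calculation.

```python
from statistics import mean, stdev


def summarize_replicates(estimates, std_errors, true_value, at_boundary=None, z=1.96):
    """Bias, ESD, SEE, and Cov for one parameter across simulation replicates.

    All names here are hypothetical. Replicates flagged in `at_boundary`
    are discarded before the summaries are computed.
    """
    if at_boundary is None:
        at_boundary = [False] * len(estimates)
    kept = [(e, s) for e, s, b in zip(estimates, std_errors, at_boundary) if not b]
    est = [e for e, _ in kept]
    ses = [s for _, s in kept]
    # Assumed Wald-type interval: estimate +/- z * standard error.
    covered = [abs(e - true_value) <= z * s for e, s in kept]
    return {
        "bias": mean(est) - true_value,       # mean estimate minus true value
        "esd": stdev(est),                    # empirical SD of the estimates
        "see": mean(ses),                     # mean of the standard error estimates
        "cov": sum(covered) / len(covered),   # empirical coverage of the interval
        "n_kept": len(kept),
    }


# Toy input: the fourth replicate is flagged as a boundary case and dropped.
summary = summarize_replicates(
    estimates=[0.4, 0.5, 0.6, 9.9],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=0.5,
    at_boundary=[False, False, False, True],
)
```

Comparing `esd` with `see` is what reveals the under-estimated standard errors discussed above, and `cov` falling below 0.95 is the resulting under-coverage.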

Table 4

Simulation results under the correct model with and a large censoring proportion

Setting	Parameter	True	Bias	ESD	SEE	Cov
1	β1	0.500	0.007	0.072	0.071	0.95
	β2	−0.500	−0.008	0.134	0.133	0.95
	α	0.010	0.004	0.025	0.019	0.95
	r	0.200	0.008	0.104	0.105	0.92
2	β1	0.500	0.006	0.068	0.070	0.96
	β2	−0.500	−0.009	0.131	0.130	0.95
	α	0.010	0.006	0.026	0.017	0.93
	r	0.300	0.018	0.109	0.100	0.93
3	β1	0.500	0.005	0.067	0.065	0.94
	β2	−0.500	−0.005	0.124	0.122	0.96
	α	0.100	0.023	0.214	0.108	0.95
	r*	0.200	0.050	0.148	0.112	0.91
4	β1	0.500	0.005	0.066	0.063	0.94
	β2	−0.500	−0.004	0.121	0.118	0.94
	α	0.100	0.016	0.151	0.099	0.96
	r**	0.300	0.061	0.181	0.151	0.91

* The true value and bias for −log r are 1.609 and 0.012

** The true value and bias for −log r are 1.204 and 0.011

See NOTE to Table 2. The numbers of replicates in which one or more parameter estimates are at the boundary for Settings 1–4 are 27, 4, 3, and 0, respectively. Replicates with estimates at the boundary are discarded.

To investigate the robustness of the methods, we considered misspecified infection time distributions. In particular, we considered S from the Weibull distribution with shape parameter equal to 2 and scale parameter equal to and also , where Z is normal with mean and variance 0.64. Other variables were generated according to the original setting. We set the degree of B-spline functions to be 2, , and number of replicates to be 1000. We present the summaries for the estimators of β1 and β2 in Table 5 and the averaged estimated cumulative baseline hazard functions in Fig. 2. Under misspecified infection time distributions, the estimation and inference of β1 and β2 are still highly satisfactory. As expected, the estimated cumulative hazard function departs from the true function, but the overall bias is tiny.

Table 5

Simulation results under misspecified infection time distributions

Setting	Parameter	True	Bias	ESD	SEE	Cov
Weibull	β1	0.500	0.008	0.041	0.042	0.95
Weibull	β2	−0.500	−0.011	0.079	0.076	0.94
Lognormal	β1	0.500	0.001	0.040	0.041	0.96
Lognormal	β2	−0.500	−0.005	0.077	0.075	0.94

See NOTE to Table 2. The numbers of replicates in which one or more parameter estimates are at the boundary for Setting Weibull and Setting Lognormal are 154 and 40, respectively. Replicates with estimates at the boundary are discarded.

Fig. 2

Estimated cumulative baseline hazard functions under misspecified infection time distributions

We compared the proposed methods with two simple methods, namely midpoint imputation and interval-censored data regression. For midpoint imputation, if , then we set the time to infection to be , such that the incubation period is (or is censored at) . If , then we set the time to infection to be , such that the incubation period is . We then apply the standard maximum partial-likelihood estimation method to the imputed dataset. For interval-censored data regression, we treat the incubation period as interval-censored within the smallest interval known to contain it. In particular, if , then we set the incubation period to be interval-censored between for and right-censored at for . If , then we set the incubation period to be interval-censored between ; right censoring is assumed to be impossible under . Then, we approximate the baseline hazard function using the same spline functions as in the proposed methods and estimate the parameters using (sieve) MLE. This approach is similar to the “interval-reduced” method of Reich et al. (2009), who considered a parametric model with no covariates. Note that this approach is based on the likelihood where denotes . If for and does not depend on s, then this likelihood is proportional to the proposed conditional likelihood, but this likelihood is in general incorrect. 
The summary results, based on and 1000 replicates, for the estimators of β1 and β2 are presented in Table 6, and the averaged estimated cumulative baseline hazard functions are presented in Fig. 3. Because they do not account for the pandemic growth over time, both methods yield biased estimation of the cumulative baseline hazard function. Midpoint imputation yields biased point estimation and interval estimation with under-coverage. The bias under the interval-censored data regression is small, but the standard error estimates tend to be smaller than the corresponding empirical standard deviations.
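The two simple comparison methods can be illustrated with the hedged sketch below. The exact conditions and imputed values used in the paper are not reproduced here, so the code rests on an explicit assumption: infection is taken to lie between an exposure start time and the earlier of the departure time and the observed onset (or censoring) time. All function and argument names are hypothetical.

```python
def midpoint_impute(exposure_start, departure, onset_or_censor, onset_observed):
    """Midpoint imputation for one case (assumed form, not the paper's formula).

    Assumption: infection lies in [exposure_start, min(departure, onset_or_censor)].
    Returns (imputed incubation period, event indicator).
    """
    latest_infection = min(departure, onset_or_censor)
    infection_mid = 0.5 * (exposure_start + latest_infection)
    return onset_or_censor - infection_mid, onset_observed


def incubation_bounds(exposure_start, departure, onset_or_censor, onset_observed):
    """Smallest interval assumed to contain the incubation period.

    The lower bound uses the latest possible infection time, the upper bound
    the earliest one. A censored onset gives an unbounded upper limit.
    """
    latest_infection = min(departure, onset_or_censor)
    lower = onset_or_censor - latest_infection
    upper = onset_or_censor - exposure_start
    if not onset_observed:
        upper = float("inf")  # right-censored incubation period
    return lower, upper


# Onset after departure: infection window is [0, 10].
late_onset = midpoint_impute(0.0, 10.0, 14.0, True)
late_bounds = incubation_bounds(0.0, 10.0, 14.0, True)

# Onset before departure: infection window shrinks to [0, 6].
early_onset = midpoint_impute(0.0, 10.0, 6.0, True)
early_bounds = incubation_bounds(0.0, 10.0, 6.0, True)
```

Neither function uses any information about how the infection hazard grows over calendar time, which is the reason given above for the bias in the estimated cumulative baseline hazard function.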

Table 6

Simulation results under alternative simple methods

Setting				Midpoint Imputation				Interval-Censored Data Regression
α	r	Parameter	True	Bias	ESD	SEE	Cov	Bias	ESD	SEE	Cov
0.01	0.2	β1	0.500	−0.026	0.040	0.039	0.89	0.003	0.047	0.043	0.93
0.01	0.2	β2	−0.500	0.025	0.073	0.073	0.94	−0.004	0.085	0.078	0.93
0.01	0.3	β1	0.500	−0.026	0.039	0.039	0.90	0.001	0.045	0.043	0.93
0.01	0.3	β2	−0.500	0.024	0.074	0.073	0.93	−0.003	0.082	0.078	0.94
0.1	0.2	β1	0.500	−0.021	0.038	0.039	0.92	−0.004	0.045	0.042	0.93
0.1	0.2	β2	−0.500	0.016	0.073	0.073	0.94	0.002	0.083	0.077	0.93
0.1	0.3	β1	0.500	−0.017	0.038	0.039	0.93	−0.002	0.044	0.041	0.92
0.1	0.3	β2	−0.500	0.014	0.074	0.073	0.95	0.003	0.082	0.076	0.94

See NOTE to Table 2

Fig. 3

Estimated cumulative baseline hazard functions using alternative simple methods

When the initial hazard of infection is small, accurate estimation of the parameter α is difficult. In fact, as α → 0, the density of S becomes proportional to , and the conditional density of S given does not involve α. To investigate the effect of inaccurate estimation of α when the true value of α is small, we considered an extra set of simulation studies as follows. We generated the data according to the original setting but with ; the mean incubation times are given in Table 1. Then, we performed estimation with α (incorrectly) fixed to be . This mimics the situation where the true value of α is beyond the lower boundary of . We considered sample sizes of n = 400 and 800. The point estimates, standard error estimates, and coverages of the Euclidean parameters based on 1000 replicates are presented in Table 7. In the calculation of the standard error estimates, we removed the row and column corresponding to α in the information matrix, essentially treating α as known (at ).

Table 7

Simulation results under

Setting	n	Parameter	True	Bias	ESD	SEE	Cov
1	400	β1	0.500	0.005	0.060	0.060	0.96
		β2	−0.500	−0.012	0.112	0.110	0.95
		r	0.200	0.007	0.093	0.091	0.91
2	400	β1	0.500	0.006	0.060	0.060	0.95
		β2	−0.500	−0.012	0.113	0.110	0.94
		r	0.300	0.007	0.098	0.096	0.92
3	800	β1	0.500	0.003	0.042	0.042	0.95
		β2	−0.500	−0.006	0.079	0.078	0.95
		r	0.200	0.000	0.067	0.066	0.94
4	800	β1	0.500	0.004	0.043	0.042	0.96
		β2	−0.500	−0.007	0.077	0.078	0.97
		r	0.300	0.000	0.067	0.070	0.95

See NOTE to Table 2. The numbers of replicates in which one or more parameter estimates are at the boundary for Settings 1–4 are 45, 57, 7, and 9, respectively. Replicates with estimates at the boundary are discarded.

In all cases, the biases of all parameters are small, the standard error estimates resemble the corresponding standard deviations, and the empirical coverages are close to the nominal 95% level. Similar to the results presented in Fig. 1, the estimated cumulative baseline hazard functions closely resemble the true function and are not presented. Due to misspecification of the value of α, some positive parameters are estimated at the boundary, and the number of replicates at the boundary is substantially smaller under a larger sample size. Note that in the calculation of the standard error estimates, we essentially treat α as known (at an incorrect value), but the standard error estimates and confidence intervals of the other parameters are still reliable. We suggest that, in general, when parameters are estimated at the boundary, they can be treated as known, and only the standard errors of the remaining parameters need to be evaluated. In fact, in all simulation studies, if we kept the replicates with parameter estimates at the boundary and estimated the standard errors of the remaining parameters in this way, the summary results would still be highly similar to the presented results.

Real data analysis

We analyzed the data set processed by Zhao et al. (2021) mentioned in Sect. 1. The data recorded confirmed COVID-19 cases from 14 locations, and the subjects were from or had visited Wuhan between 1 December 2019 and 23 January 2020; the data set was based on information from press releases of official health agencies. After removing subjects whose source of infection was unclear and those with missing information on exposure or symptom onset, 462 subjects with information on the period of stay at Wuhan, symptom onset date, sex, and age are available. We set the initial calendar time (from which the exposure start time is measured) to be 1 December 2019 and for subjects who were from Wuhan. The symptom onset dates of 6 subjects are right-censored. In the sample, 59.7% of the subjects are of age 50 or younger, and 53.9% are male. We fit the proposed model with covariates age and sex, where age is dichotomized into 50 or younger versus older than 50. We considered B-spline functions with degrees 1, 2, and 3 and 1 to 5 inner grid points and selected the model using AIC. The locations of the inner grid points were chosen to be equi-distant quantiles of ; this is the estimated distribution of the incubation period in Zhao et al. (2021). AIC selected degree 1 with 2 grid points. As in the simulation studies, we set a lower boundary of for the positive parameters. The estimated values of and are at the boundary of . The estimated coefficient for age is 0.2502, with standard error 0.1067, indicating that older subjects tend to have a shorter incubation period; this effect is significant at the 5% level. This result disagrees with some findings in the literature (Dai et al. 2020; Tan et al. 2020). Nevertheless, as reported in Fig. 4 of Kong (2020), there is no consistent trend between the duration of the incubation period and age across studies. 
The difference between the current results and those reported in the literature may be due to differences in patient samples. The estimated coefficient for sex is −0.0827, with standard error 0.1042, indicating that females tend to have a shorter incubation period than males. The estimated value of r is 0.3044, which is close to the estimated values of 0.28–0.41 reported in Zhao et al. (2021). This corresponds to the hazard of infection increasing by a multiplicative factor of exp(0.3044) ≈ 1.36 per day. Fig. 4 shows the estimated survival functions of different subgroups.
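The reported coefficients can be turned into hazard ratios and Wald-type tests with a few lines of arithmetic. The sketch below assumes a normal approximation for the coefficient estimates and a two-sided test. The helper name `wald_summary` is illustrative, and only the coefficients, standard errors, and the value of r come from the text.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Hazard ratio, Wald-type 95% interval, z statistic, and two-sided p-value.

    Assumes the coefficient estimate is approximately normal.
    """
    z = coef / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "hazard_ratio": math.exp(coef),
        "ci": (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se)),
        "z": z,
        "p_value": p_value,
    }


age = wald_summary(0.2502, 0.1067)    # reported age coefficient and standard error
sex = wald_summary(-0.0827, 0.1042)   # reported sex coefficient and standard error

# Multiplicative daily growth of the infection hazard implied by r = 0.3044.
daily_growth = math.exp(0.3044)
```

Under these assumptions the age effect has a p-value below 0.05, in line with the text, whereas the interval for the sex hazard ratio contains 1, so the direction of the sex effect should be read with caution.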

Fig. 4

Estimated survival functions for different subgroups
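The model selection step in this analysis (B-spline degrees 1 to 3, 1 to 5 inner grid points, chosen by AIC, with grid points at equi-distant quantiles) can be sketched as follows. This is a minimal illustration under stated assumptions: `fit` is a hypothetical callback returning the maximized log-likelihood and the number of free parameters, `toy_fit` is a made-up stand-in rather than the paper's likelihood, and "equi-distant quantiles" is interpreted as probability levels j/(m + 1) for m inner grid points.

```python
def quantile_levels(n_inner):
    """Probability levels for n_inner equally spaced quantiles (assumed j/(m+1))."""
    return [j / (n_inner + 1) for j in range(1, n_inner + 1)]


def select_by_aic(fit, degrees=(1, 2, 3), knot_counts=(1, 2, 3, 4, 5)):
    """Return the (degree, n_inner) combination with the smallest AIC.

    `fit(degree, n_inner)` is a hypothetical callback that returns
    (maximized log-likelihood, number of free parameters).
    """
    best = None
    for degree in degrees:
        for n_inner in knot_counts:
            loglik, n_params = fit(degree, n_inner)
            aic = 2 * n_params - 2 * loglik
            if best is None or aic < best["aic"]:
                best = {"degree": degree, "n_inner": n_inner, "aic": aic}
    return best


def toy_fit(degree, n_inner):
    """Made-up log-likelihood surface, used only to exercise the selection loop."""
    loglik = -100.0 - 3 * (degree - 2) ** 2 - 3 * (n_inner - 3) ** 2
    n_params = n_inner + degree + 1  # number of B-spline basis functions
    return loglik, n_params


best = select_by_aic(toy_fit)
levels = quantile_levels(2)
```

The result of `select_by_aic(toy_fit)` reflects only the toy surface and says nothing about the real data. In practice the inner grid points would be obtained by evaluating a quantile function of the chosen reference distribution at `levels`.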

Discussion

In this paper, we consider the estimation of the incubation period distribution, where the time to infection is subject to interval censoring and truncation, and the time to symptom onset is subject to (potential) right censoring. We consider a flexible semiparametric regression model for the incubation period distribution and propose a sieve MLE. We establish the consistency and rate of convergence of all estimators and the asymptotic normality of the Euclidean parameter estimators. One major challenge in model estimation is that information about the infection time is very limited. The infection time S is never observed and is mixed with the incubation period, and infection is only known to have occurred before U. In addition, only subjects with could be observed. Therefore, the likelihood contribution from U on the distribution of S is , that is, U alone does not contain any information on the distribution of S. As a result, existing multiple imputation approaches, such as Sun et al. (1999, 2004), are not feasible, because these approaches require estimation of the infection time distribution solely using the intervals containing the infection times. We propose to use grid search to select the initial values for the estimation. This step is essential, because the likelihood may exhibit multiple local maxima, especially over different values of r. To illustrate how the local maxima may arise, we plot the conditional cumulative distribution functions (CDFs) of S given in Fig. 5, where , , or 0.001, and , or 0.5. For , the CDFs at different values of r are well separated, and the likelihood function usually does not exhibit multiple local maxima. By contrast, for , the CDFs at positive and negative values of r may be closer to each other than they are to the CDF at . In this case, when the true value of r is 0.25, the likelihood may exhibit one local maximum at around 0.25 and another at a negative value of r (or at zero if r is restricted to be nonnegative). 
As a result, when the initial value of r is very close to zero or negative, gradient-based methods may fail to move r in the positive direction.
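The role of the grid search can be illustrated with a toy example. The sketch below is not the paper's likelihood or optimizer: `toy_profile_loglik` is a made-up bimodal curve with a higher peak near r = 0.25 and a lower peak at a negative value, and a simple step-halving hill climb stands in for the gradient-based update. All names are hypothetical.

```python
import math


def toy_profile_loglik(r):
    """Made-up bimodal objective standing in for a profile log-likelihood in r."""
    return math.exp(-((r - 0.25) / 0.1) ** 2) + 0.6 * math.exp(-((r + 0.2) / 0.1) ** 2)


def hill_climb(f, x0, step=0.05, tol=1e-6):
    """Local search: move uphill by `step`, halving the step when stuck."""
    x = x0
    while step > tol:
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            step /= 2.0
    return x


def grid_then_climb(f, grid):
    """Pick the best grid point as the initial value, then refine locally."""
    start = max(grid, key=f)
    return hill_climb(f, start)


# A negative initial value gets trapped at the lower, negative peak.
r_from_bad_start = hill_climb(toy_profile_loglik, -0.3)

# Screening a coarse grid first lands in the basin of the higher peak.
r_from_grid = grid_then_climb(toy_profile_loglik, [-0.4, -0.2, 0.0, 0.2, 0.4])
```

The local search alone depends entirely on where it starts, whereas evaluating the objective on a coarse grid first selects a starting point in the basin of the higher maximum. This is the motivation for choosing initial values by grid search before running the main optimizer.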

Fig. 5

Conditional CDF of S

For satisfactory estimation of all model parameters, the dataset should contain both large and small values of U. A smaller value of U allows the infection time to be more precisely determined, in turn providing more information for the estimation of the incubation period distribution. In fact, in the regularity conditions, we assume that the support of U includes a small interval in order to guarantee the identifiability of the incubation period distribution. On the other hand, a larger value of U provides more information for the estimation of the infection time distribution. We assume a simple distribution for the time to infection, where the hazard increases exponentially at a constant rate over time, and the distribution is the same for all subjects. A possible extension is to consider more flexible models for the time to infection. For example, if a major pandemic-related event, such as a lockdown, occurs at time , then we may allow the hazard growth rate to change after , with for some parameters and , where . Also, we may consider a regression model for S that depends on or some external covariates. In general, we can assume a nonparametric or semiparametric regression model for S and estimate the model by sieve estimation. However, because only severely limited information is available for the estimation of the distribution of S, such flexible models should be considered only when the sample size is large. We assumed the Cox proportional hazards model for the incubation period. Although the Cox model is flexible in that the baseline hazard function is nonparametric, it assumes a particular structure for the covariate effects. It would be of interest to develop model checking techniques for the incubation period distribution. However, this would be highly challenging, as the incubation period is mixed with the infection time distribution in the likelihood, and existing methods are not applicable.

17 in total

1. A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies.

2. Regression analysis of doubly censored failure time data with applications to AIDS studies.

3. Stochastic EM algorithm for doubly interval-censored data.

4. Semiparametric regression analysis of doubly censored failure time data from cohort studies.

5. Fitting Cox Models with Doubly Censored Data Using Spline-Based Sieve Marginal Likelihood.

6. Applying the Cox proportional hazards model for analysis of latency data with interval censoring.

7. Longer incubation period of coronavirus disease 2019 (COVID-19) in older adults.

8. Does incubation period of COVID-19 vary with age? A study of epidemiologically linked cases in Singapore.

9. Estimation of incubation period and generation time based on observed length-biased epidemic cohort with censoring for COVID-19 outbreak in China.

Authors: Yuhao Deng; Chong You; Yukun Liu; Jing Qin; Xiao-Hua Zhou
Journal: Biometrics Date: 2020-07-28 Impact factor: 1.701

10. Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China, 20-28 January 2020.

Authors: Jantien A Backer; Don Klinkenberg; Jacco Wallinga
Journal: Euro Surveill Date: 2020-02