Self-reporting and screening: Data with right-censored, left-censored, and complete observations.

Jonathan Yefenof, Yair Goldberg, Jennifer Wiler, Avishai Mandelbaum, Ya'acov Ritov.

Abstract

We consider survival data that combine three types of observations: uncensored, right-censored, and left-censored. Such data arise when screening for a medical condition in situations where self-detection occurs naturally. Our goal is to estimate the failure-time distribution based on these three observation types. We propose a novel methodology for distribution estimation using both semiparametric and nonparametric techniques. We then evaluate the performance of these estimators via simulated data. Finally, as a case study, we estimate the patience of patients who arrive at an emergency department and wait for treatment. Three categories of patients are observed: those who leave the system and announce it, and thus their patience time is observed; those who get service, and thus their patience time is right-censored by the waiting time; and those who leave the system without announcing it. For this third category, the patients' absence is revealed only when they are called to service, which is after they have already left; formally, their patience time is left-censored. Other applications of our proposed methodology are discussed.

Keywords: current status data; left censoring; nonparametric estimation; right censoring; survival analysis

Year: 2022 PMID: 35608143 PMCID: PMC9546051 DOI: 10.1002/sim.9434

Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497

INTRODUCTION

We study the estimation of a failure‐time distribution where the failure times can be observed directly, right‐censored, or left‐censored. This type of survival data arises, for example, in estimating the time to the appearance of a medical condition whose characteristic symptoms may or may not appear when the condition exists. Specific medical settings include relapse in childhood brain tumors, which may be observed due to clinical symptoms, right‐censored due to periodic screening with a negative result (no tumor), or left‐censored due to periodic screening with a positive result. Another medical setting is melanoma, which is observed if self‐detected, right‐censored after a negative screening (no melanoma), or left‐censored if it goes undetected until screening. Additional examples can be found in Whitehead. The motivating example for this work comes from the challenging problem of estimating customer patience in service systems. In our study, we focus on patients who wait for treatment in an emergency department (ED). Three categories of patients are observed. The first category consists of patients who get service, and thus their patience time is right‐censored by the waiting time. The second category comprises those who leave the system and announce it, and thus their patience time is observed while the waiting time is right‐censored. The third category consists of patients who leave the system without announcing it; their absence is revealed only when they are called to service, which is after they have already left; formally, their patience time is left‐censored. Note that this data structure is a special case of interval‐censored data, a general data structure that includes many popular survival data settings as special cases, among them right‐censored data and left‐censored data. The specific setting considered here includes left‐censored and right‐censored observations as well as complete observations.
Estimating the patience time is important because the decision of patients to leave the system before being served might have a strong effect on their physical well‐being. There has been considerable research on the reasons why patients leave an ED before being served. However, these and other authors have not proposed a model by which ED patience time, namely the duration that a potential patient is willing to wait for ED service, can be estimated, and this is our goal here. We propose novel semiparametric and nonparametric estimators of the unknown survival function for this 3‐type survival data. We then study their rates of convergence. The semiparametric estimator is based on both full and partial likelihoods. We provide conditions under which the semiparametric estimator is a linear asymptotic normal (LAN) estimator and converges to a normal distribution at a root‐$n$ rate. The nonparametric estimator is based on nonparametric kernel estimators for density functions and on a novel estimator of the cumulative probability function that has some similarities to the Nelson‐Aalen estimator. We show that, under some regularity conditions, the nonparametric estimator converges pointwise to a normal distribution. We perform a simulation study and compare the proposed semiparametric and nonparametric estimators. For the semiparametric model, we study both correctly specified and misspecified models and show the corresponding results. We show how the accuracy changes with sample size. We then carry out a case study based on data of patients waiting for treatment in an ED in the U.S. in 2008. We analyzed the different severity levels separately (15 106 observations in the emergency group, 43 600 in the urgent group, and 26 541 in the semi‐urgent group). We conclude with a comparison of the semiparametric and nonparametric estimators for the three severity levels of this dataset.

BRIEF LITERATURE REVIEW

Developing screening methods for medical conditions, such as breast cancer and melanoma, has a long history. In the classical setting, the medical condition either already exists at the time of screening and is thus left‐censored, or does not exist and is thus right‐censored. The setting in which self‐detection is possible, and thus the condition time is observed, has surprisingly been mostly ignored in the literature. For example, Minn et al treat both self‐detection times and screening times as event times, ignoring the censoring. The closest model to the one that we present here appears in Whitehead. It is assumed there that the condition can be detected at screening or before screening due to symptoms. In both cases, the condition already exists at the time of detection. It is also assumed that screenings take place at a sequence of fixed time points. Whitehead recommends ignoring the extra knowledge gained from self‐reporting and replacing these times with the time of the next screening. The survival function is then estimated only at the discrete fixed screening times using standard techniques.
There has been considerable research effort dedicated to the modeling and analysis of customer (im)patience while waiting for service. Here we describe several papers that, together with references therein, provide the historical background and a state‐of‐the‐art perspective. First, we recommend the recent literature review (Section 3) in Batt and Terwiesch, accompanied by Gans et al. These survey patience research from an operational/queueing viewpoint (mainly Section 6.3.3 in the latter), while connecting it to the medical literature on patients who left without being seen (LWBS) (mainly Section 3 in the former); see also Aksin et al, who expand on managerial challenges. Next we mention Mandelbaum and Zeltyn, which is an Explanatory Data Analysis of (im)patience in telephone call centers (it appears in a special issue devoted to models of queue abandonment). Finally, the most closely related to the present study are the following two studies. Brown et al apply, in Section 5, the Kaplan‐Meier estimator to estimate the survival functions, and consequently the hazard rates, of both virtual waiting time and impatience; the data is that of a call center in which all times of abandonment are recorded, hence the data is right‐censored. Wiler et al, which is also the source of the ED data in our present case study, estimate LWBS rates as a function of ED patient arrival rates, treatment times, and ED boarding times. There was no attempt in that work to estimate the patience‐time distribution. We conclude this brief survey with the observation that the estimation of customer (im)patience is relevant beyond screening, call centers, and EDs. For example, Nah studies the tolerance of Web users (during information retrieval). Yom‐Tov et al analyze chat services, in which customers may abandon at any phase of a chat exchange with a service center; one expects such services to give rise to the same options as in EDs: some customers receive service, others abandon without letting anyone know, and the rest announce their abandonment time.

THE MODEL

In the standard setting of right‐censored data one observes, for each patient, either the failure time or the censoring time. In terms of our motivating example, the failure time is the patience time and the censoring time is the waiting time. Patience time is observed when patients leave the ED while informing the system of their departure; waiting time is observed when a patient is called for service. However, unlike in standard right‐censored data and as in current status data, there are also patients who leave without informing; in this case their absence is observed only when they are called for service, and this latter time provides an upper bound for their patience time. In other words, the (virtual) waiting time is observed, and the only information on the patience time is that it is less than this observed waiting time. Hence, in this case, the patience time is left‐censored.
More formally, let $T$ be the patient's failure time, that is, the time until the patient loses patience. Let $W$ be the censoring time, that is, the waiting time until the patient gets (or could have gotten) service. We assume that $T$ has a cumulative distribution function (cdf) $F$ and a probability density function (pdf) $f$, and that $W$ has cdf $G$ and pdf $g$. Let $\Delta$ be the indicator $\Delta = 1\{T \le W\}$; that is, $\Delta = 1$ if the patient loses patience before being called to service, and $\Delta = 0$ otherwise. Let $Y$ be the indicator that is 1 for a patient who leaves and informs when leaving, and 0 otherwise. Denote by $q(t)$ the conditional probability that a patient reports leaving given that the patience time equals $t$. In other words, $q(t) = P(Y = 1 \mid T = t)$. The assumption that patience time and waiting time are independent is common in survival analysis, for example, when using the Kaplan‐Meier estimator. Since $T$ and $W$ may be dependent, one can use strata to overcome this challenge, as was done in the case study in Section 7. The announcement indicator $Y$ depends on the time $T$ through the function $q$. In other words, given the patience time $T$, the decision on announcement does not depend on the actual waiting time $W$. However, due to censoring, the decision on the announcement is observed only when $T \le W$. Summarizing, we assume that the pair $(T, Y)$ is independent of the waiting time $W$. When this assumption does not hold, different theoretical tools are needed for a valid estimation.
Let $U$ be the recorded time: $U = T$ if $\Delta Y = 1$, and $U = W$ otherwise. The observed data consist of the triplets $(U_i, \Delta_i, \Delta_i Y_i)$, $i = 1, \ldots, n$, and there are three categories of patients:
Category 1: The patient gets service, hence the waiting time is observed, which serves as a lower bound on the patience time; thus the patience time is right‐censored. Formally, $U = W$, $\Delta = 0$, and $\Delta Y = 0$.
Category 2: The patient leaves without being treated and reports departure. The patience time is thus revealed: $U = T$, $\Delta = 1$, and $\Delta Y = 1$.
Category 3: The patient leaves without reporting, hence the virtual waiting time (the time that the patient would have waited had they stayed in the ED) is observed, which provides an upper bound for the patience time; thus the patience time is left‐censored. Formally, $U = W$, $\Delta = 1$, and $\Delta Y = 0$.
A graphical diagram of these categories appears in Figure 1.
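The data structure described above can be made concrete with a small simulation. The following is a minimal sketch, not taken from the paper: the function name, the exponential distributions, and the constant announcement probability of 0.5 are illustrative assumptions chosen only to show how the recorded time and the three categories arise from a patience time, a waiting time, and an announcement decision.

```python
import numpy as np

def simulate_three_category_data(n, mean_patience=16.0, mean_wait=2.0,
                                 announce_prob=0.5, seed=0):
    """Simulate recorded times and categories under the three-category model.

    Illustrative assumptions (not from the paper): exponential patience and
    waiting times, and a constant announcement probability.
    """
    rng = np.random.default_rng(seed)
    T = rng.exponential(mean_patience, size=n)   # patience time
    W = rng.exponential(mean_wait, size=n)       # (virtual) waiting time
    delta = (T <= W).astype(int)                 # 1 if patience runs out first
    decides_to_announce = rng.random(n) < announce_prob
    Y = delta * decides_to_announce.astype(int)  # announcement seen only if delta == 1
    U = np.where(Y == 1, T, W)                   # recorded time
    # Category 1: served; 2: left and announced; 3: left without announcing.
    category = np.where(delta == 0, 1, np.where(Y == 1, 2, 3))
    return U, delta, Y, category

U, delta, Y, category = simulate_three_category_data(10_000)
for c in (1, 2, 3):
    print(f"category {c}: share = {np.mean(category == c):.3f}")
```

With these assumed parameters the waiting time is much shorter than the patience time, so most simulated patients fall in Category 1, which mirrors the pattern the case study reports.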

FIGURE 1

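The three patient categories. Category 1 includes patients who received service. Category 2 includes patients who left without being seen and announced before leaving. Category 3 includes patients who left without being seen but did not announce leaving.

Lemma 1. Let $f$ and $g$ denote the densities of the patience time $T$ and the waiting time $W$, and let $q$ denote the announcement probability function. The following equalities hold for every $u \ge 0$:
$P(U \le u, \Delta = 0) = \int_0^u g(w) S_T(w)\,dw$.
$P(U \le u, \Delta = 1, Y = 1) = \int_0^u q(t) f(t) S_W(t)\,dt$.
$P(U \le u, \Delta = 1, Y = 0) = \int_0^u g(w) \int_0^w (1 - q(t)) f(t)\,dt\,dw$.
Here, $S_T$ and $S_W$ are the survival functions of the patience time and the waiting time, respectively. See the proof in Appendix A.1.

For $j = 1, 2, 3$, we introduce the following sub‐stochastic density functions $h_j$ of the observed time $U$ in category $j$. From Lemma 1 above, we deduce that
$h_1(u) = g(u) S_T(u)$, $h_2(u) = q(u) f(u) S_W(u)$, $h_3(u) = g(u) \int_0^u (1 - q(t)) f(t)\,dt$. (1)
Define
$p_j = \int_0^\infty h_j(u)\,du$, $\bar h_j(u) = h_j(u) / p_j$. (2)
Then $\bar h_j$ is the density function of the observed time $U$ given category $j$. Our model assumes that all denominators are positive.

To summarize what is known and what is to be estimated: there are two unknown distributions in our setting, those of $T$ and $W$, and we aim to estimate them using both semiparametric and nonparametric techniques. For each patient, the waiting time is either observed or right‐censored. If the patient reports and then leaves, the waiting time is longer than the observed patience time. Hence, the waiting time is right‐censored. Therefore, semiparametric and nonparametric estimation for the distribution of the waiting time can be done by standard techniques for right‐censored data. However, estimation of the distribution of the patience time $T$ is more complicated and is discussed in Sections 4 and 5.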
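The remark that the waiting-time distribution can be handled by standard right-censored techniques can be illustrated with a plain Kaplan-Meier estimate in which Category 2 records are treated as censored. This is a minimal sketch: the helper `kaplan_meier` and the six toy records are hypothetical and serve only to show which observations count as events.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times  : recorded times
    events : 1 if the event was observed at that time, 0 if censored
    Returns the distinct event times and the survival estimate just after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = np.sum(times >= t)                  # still under observation at t
        d = np.sum((times == t) & (events == 1))      # events exactly at t
        s *= 1.0 - d / at_risk
        surv[k] = s
    return event_times, surv

# Hypothetical toy records: recorded time U and patient category (1, 2, or 3).
U = np.array([0.5, 1.2, 0.8, 2.0, 1.5, 0.3])
category = np.array([1, 2, 3, 1, 1, 2])

# The waiting time is observed in Categories 1 and 3 and censored in Category 2.
wait_observed = (category != 2).astype(int)
t, s = kaplan_meier(U, wait_observed)
print(t, s)
```

The only modeling decision here is the event indicator: a patient who announces and leaves contributes a lower bound on the waiting time, which is exactly a right-censored observation.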

SEMIPARAMETRIC ESTIMATION

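Assume now that the distributions of both the patience time and the waiting time belong to some parametric families. More formally, let $\theta \in \Theta$ and $\lambda \in \Lambda$ be finite‐dimensional parameters. We assume that the density of the patience time can be written as $f_\theta$. We also assume that the density of the waiting time can be written as $g_\lambda$. Write $F_\theta$ and $S_{T,\theta}$ for the corresponding cdf and survival function of the patience time, and similarly $G_\lambda$ and $S_{W,\lambda}$ for the waiting time. The likelihood of the observed data $(U_i, \Delta_i, \Delta_i Y_i)$, $i = 1, \ldots, n$, can be written in terms of the sub‐stochastic densities $h_1$, $h_2$, and $h_3$ of the three categories, as follows:
$L(\theta, \lambda, q) = \prod_{i=1}^n h_1(U_i)^{1 - \Delta_i}\, h_2(U_i)^{\Delta_i Y_i}\, h_3(U_i)^{\Delta_i (1 - Y_i)}$.
Using the explicit representations of $h_1$, $h_2$, $h_3$, we obtain that $L(\theta, \lambda, q)$ is given by
$\prod_{i=1}^n \left[ g_\lambda(U_i) S_{T,\theta}(U_i) \right]^{1 - \Delta_i} \left[ q(U_i) f_\theta(U_i) S_{W,\lambda}(U_i) \right]^{\Delta_i Y_i} \left[ g_\lambda(U_i) \int_0^{U_i} (1 - q(t)) f_\theta(t)\,dt \right]^{\Delta_i (1 - Y_i)}$.
The value of $\lambda$ that maximizes this likelihood is independent of $\theta$ and $q$. Therefore, a maximum likelihood estimator (MLE) $\hat\lambda$ of $\lambda$ can be constructed from this likelihood. Maximizing the likelihood with respect to $\theta$ is difficult. Even if $\lambda$ is given or estimated, the maximizer of $L$ depends on the unknown function $q$. To address this challenge, we propose using a partial likelihood approach which avoids the need to estimate $q$. The partial likelihood that we use here is the likelihood calculated only for a specific category while ignoring the data for the other categories. In Theorem 1 below we show that, under standard regularity conditions, the maximizer of the partial likelihood is a consistent and asymptotically normal estimator for $\theta$. We consider the partial likelihood of Category 1,
$PL(\theta, \lambda) = \prod_{i:\, \Delta_i = 0} \frac{g_\lambda(U_i) S_{T,\theta}(U_i)}{\int_0^\infty g_\lambda(u) S_{T,\theta}(u)\,du}$.
The value of $\theta$ that maximizes this partial likelihood depends on $\lambda$. We plug the MLE $\hat\lambda$ into this partial likelihood. Clearly, the resulting estimator for $\theta$ does not depend on the function $q$, and thus no estimation of $q$ is needed. Finding an estimator for the announcement probability function $q$ is an interesting and challenging research question that is beyond the scope of this article.
We need standard regularity assumptions: the derivatives of the log‐likelihood and of the log partial likelihood are continuous in the parameters, and the maximizers of the corresponding population criteria are unique; denote them by $\lambda_0$ and $\theta_0$.
Theorem 1. Let $\hat\lambda$ be the maximizer of $L$ and let $\hat\theta$ be the maximizer of $PL(\theta, \hat\lambda)$. Then, as $n \to \infty$,
$\hat\lambda \to \lambda_0$ in probability.
$\sqrt{n}(\hat\lambda - \lambda_0) \to N(0, \Sigma_\lambda)$ in distribution.
$\hat\theta \to \theta_0$ in probability.
$\sqrt{n}(\hat\theta - \theta_0) \to N(0, \Sigma_\theta)$ in distribution.
Here $\Sigma_\lambda$, $\Sigma_\theta$ are covariance matrices as defined in the Appendix. The proof appears in Appendix A.1.
Example. Assume that the patience time $T$ follows an exponential distribution with rate $\theta$ and the waiting time $W$ follows an exponential distribution with rate $\lambda$. The details of the computation for this case appear in Appendix A.5.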
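For the exponential case, the two-step procedure (MLE for the waiting-time parameter, then the Category 1 partial likelihood with that estimate plugged in) has a closed form that is easy to derive by hand. The sketch below is derived here under that reading of the procedure and is not copied from the paper's Appendix; the function name and the simulated check (constant announcement probability 0.5) are assumptions. With exponential rates, the recorded time given Category 1 is exponential with rate equal to the sum of the two rates, which gives the second step.

```python
import numpy as np

def exponential_semiparametric(U, category):
    """Closed-form estimates of the two exponential rates.

    lam_hat   : MLE of the waiting-time rate from right-censored data
                (waiting time observed in Categories 1 and 3).
    theta_hat : patience-time rate from the Category 1 partial likelihood
                with lam_hat plugged in (may be negative in small samples).
    """
    U = np.asarray(U, dtype=float)
    category = np.asarray(category)
    served = category == 1
    lam_hat = np.sum(category != 2) / U.sum()
    # Given Category 1, U is exponential with rate (lambda + theta).
    total_rate_hat = served.sum() / U[served].sum()
    theta_hat = total_rate_hat - lam_hat
    return theta_hat, lam_hat

# Simulated check under assumed parameters: means of 16 h and 2 h, q = 0.5.
rng = np.random.default_rng(1)
n = 50_000
T = rng.exponential(16.0, n)
W = rng.exponential(2.0, n)
announce = rng.random(n) < 0.5
category = np.where(T > W, 1, np.where(announce, 2, 3))
U = np.where(category == 2, T, W)

theta_hat, lam_hat = exponential_semiparametric(U, category)
print("estimated mean patience:", 1.0 / theta_hat)
print("estimated mean waiting :", 1.0 / lam_hat)
```

Note that the announcement probability never enters the estimator, which is the point of using the Category 1 partial likelihood.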

NONPARAMETRIC ESTIMATION

In this section we propose a nonparametric estimator for the survival function of the patience time and study its theoretical properties. For simplicity, we restrict the estimation to an interval $[0, \tau]$ for some $\tau > 0$, such that the probability of $T$ and $W$ being larger than $\tau$ is positive. This is a standard condition in survival estimation (chapter 4.2). Note that for observations of Categories 1 and 3, the waiting time is observed. For Category 2, only a lower bound of the waiting time is observed. Hence, the waiting time is either observed or right‐censored. Therefore, estimating the waiting‐time distribution can be done by using standard survival analysis estimators such as the Kaplan‐Meier estimator. On the other hand, estimating the distribution of the patience time is more challenging since we cannot distinguish between the patience‐time density $f$ and the unknown announcement probability function $q$. Our goal is thus to estimate the distribution of the patience time $T$.
Assume that the waiting‐time density $g$ is strictly positive over all positive numbers. Recall that $h_1(u) = g(u) S_T(u)$ and $h_3(u) = g(u)\,[1 - S_T(u) - A(u)]$, where the sub‐stochastic densities $h_j$ of the observed time in category $j$ are defined as in (1) and $A(u) = \int_0^u q(t) f(t)\,dt = \int_0^u h_2(t) / S_W(t)\,dt$. Therefore,
$h_3(u) / h_1(u) = [1 - S_T(u) - A(u)] / S_T(u)$, (3)
which is well defined as $g(u) S_T(u) > 0$ on $[0, \tau]$. Reordering the terms in (3), we get that $S_T(u)\,[h_1(u) + h_3(u)] = h_1(u)\,[1 - A(u)]$. Hence,
$S_T(u) = h_1(u)\,[1 - A(u)] / [h_1(u) + h_3(u)]$. (4)
From the definitions in (2), it follows that $h_j(u) = p_j \bar h_j(u)$, where $p_j$ is the probability of category $j$ and $\bar h_j$ is the density of the observed time given category $j$. Therefore, we propose to estimate $S_T$ by estimating the following terms: (i) $p_1$ and $p_3$, (ii) $\bar h_1$ and $\bar h_3$, and (iii) $A$. Estimating the expressions in (i) can be done by the empirical estimators $\hat p_1 = n_1 / n$ and $\hat p_3 = n_3 / n$, where $n_j$ is the number of observations in category $j$. These estimators converge, by the central limit theorem (CLT), to $p_1$ and $p_3$, respectively, at the rate of $n^{-1/2}$. Since $\bar h_1$ and $\bar h_3$ are density functions, they can be estimated using a kernel estimator (chapter 1.2). Let $\hat h_1$ and $\hat h_3$ be kernel estimators of $\bar h_1$ and $\bar h_3$, respectively. Assume that both $\bar h_1$ and $\bar h_3$ belong to a Sobolev function class of order $\beta$. Then for each $u$, both kernel estimators converge at a rate of $n^{-\beta/(2\beta+1)}$. Here, the parameter $\beta$ is an integer that represents the smoothness of a function. Specifically, if $\beta = k$ for some integer $k$, then the function is at least $k$ times differentiable.
We now turn to estimating the term $A$. A nonparametric estimator for this term, denoted $\hat A$, is defined and proven to be consistent in the following lemma: $\hat A(u)$ converges pointwise to $A(u)$ for every $u \in [0, \tau]$. The proof is given in Appendix A.3. By plugging the estimators into equation (4), we get that $\hat S_T(u) = \hat p_1 \hat h_1(u)\,[1 - \hat A(u)] / [\hat p_1 \hat h_1(u) + \hat p_3 \hat h_3(u)]$ is an estimator of $S_T(u)$. The estimator $\hat S_T(u)$ converges pointwise to $S_T(u)$ for every $u \in [0, \tau]$. The proof appears in Appendix A.4. Since $\hat S_T$ is based on density estimation, it is not necessarily monotonic; we therefore replace it with a monotonic approximation, obtained by taking the cumulative supremum.
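To make the plug-in idea concrete, here is a rough sketch of one estimator of this type. It is an illustration under stated assumptions rather than the authors' exact estimator: it uses the identity that the patience-time survival function equals the Category 1 sub-density times one minus the announced-departure mass, divided by the sum of the Category 1 and Category 3 sub-densities. The sub-densities are estimated with a Gaussian kernel, and the announced-departure mass is estimated by a Nelson-Aalen-style sum that weights each Category 2 time by the inverse of a Kaplan-Meier estimate of the waiting-time survival. All function names, the bandwidth, the evaluation grid, and the simulated data (constant announcement probability 0.5) are hypothetical choices.

```python
import numpy as np

def km_survival_left(times, events, at):
    """Kaplan-Meier estimate of S(t-) at the points `at` (assumes no ties)."""
    order = np.argsort(times)
    t, e = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))
    steps = np.where(e == 1, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.concatenate(([1.0], np.cumprod(steps)))
    # Index = number of recorded times strictly smaller than each point.
    return surv[np.searchsorted(t, at, side="left")]

def sub_density(grid, sample, n, h):
    """Gaussian-kernel estimate of a category sub-density (total mass len(sample)/n)."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

def patience_survival(grid, U, category, h):
    """Plug-in estimate of the patience-time survival on an increasing grid."""
    n = len(U)
    h1 = sub_density(grid, U[category == 1], n, h)
    h3 = sub_density(grid, U[category == 3], n, h)
    # Announced-departure mass: each Category 2 time weighted by 1 / S_W(U_i-).
    u2 = np.sort(U[category == 2])
    s_w = km_survival_left(U, (category != 2).astype(int), u2)
    cum = np.concatenate(([0.0], np.cumsum(1.0 / s_w))) / n
    A = cum[np.searchsorted(u2, grid, side="right")]
    s = np.clip(h1 * (1.0 - A) / (h1 + h3), 0.0, 1.0)
    # Running minimum of the survival estimate, i.e. cumulative sup of the cdf estimate.
    return np.minimum.accumulate(s)

# Simulated check under assumed parameters: means of 16 h and 2 h, q = 0.5.
rng = np.random.default_rng(0)
n = 20_000
T = rng.exponential(16.0, n)
W = rng.exponential(2.0, n)
announce = rng.random(n) < 0.5
category = np.where(T > W, 1, np.where(announce, 2, 3))
U = np.where(category == 2, T, W)

grid = np.linspace(0.5, 4.0, 36)
s_hat = patience_survival(grid, U, category, h=0.3)
max_err = np.max(np.abs(s_hat - np.exp(-grid / 16.0)))
print("max abs error on grid:", max_err)
```

The grid starts away from zero because a plain Gaussian kernel is biased near the boundary of a positive variable; a boundary-corrected kernel would be a natural refinement.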

SIMULATIONS

We study the performance of both the semiparametric and nonparametric estimators that were proposed in Sections 4 and 5, respectively. Based on the setting of the case study discussed in Section 7, we consider two simulation settings. In the case study, the exponential and Weibull distributions seem to fit well the waiting time and patience time distributions, respectively. Thus, we chose parameters based on the fit for the urgent level, which is the middle severity level. Specifically, the two simulation settings consist of samples from exponential and Weibull distributions in which the waiting time has a smaller mean than the patience time, as was observed in the case study. In the first setting, following the data from the case study, a sample was taken from the model in which the patience time follows an exponential distribution with expectation of 16 h, and the waiting time follows an exponential distribution with expectation of 2 h. In the second setting, a sample was taken from a model in which the patience time follows a Weibull distribution with scale 16 and shape 1.5, which is closely related to the observed data, and the waiting time follows an exponential distribution with expectation of 2 h as before. In both settings, the same announcement probability function $q$ is used. Taking the probability of announcement to be an increasing function or a constant function instead yields similar results, which are omitted. Moreover, we experimented with additional numerical values. The behavior and conclusions, as reported here, remain consistent across these experiments. In each setting, we calculated the semiparametric estimator for the scale of $T$ for five different sample sizes ($n = 100, 200, 500, 1000, 2000$). For each sample size, we repeated the simulation 100 times. When using the semiparametric method, it was assumed that both $T$ and $W$ follow an exponential distribution with unknown parameters. Note that this assumption holds for the first setting but does not hold for the second one. In other words, the second setting is carried out under a misspecified model. The results are shown in Figure 2.
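The simulation design above can be sketched as a small Monte Carlo loop. This is a hedged illustration only: the closed-form exponential estimator is derived from the Category 1 partial likelihood idea rather than taken from the paper, the announcement probability is assumed constant at 0.5, and the helper names are hypothetical. Setting 1 uses a Weibull shape of 1 (which is the exponential distribution), and Setting 2 uses shape 1.5, so the exponential working model is misspecified there.

```python
import numpy as np

def fit_exponential_rates(U, category):
    """Exponential working model: (patience rate, waiting rate)."""
    served = category == 1
    lam_hat = np.sum(category != 2) / U.sum()
    theta_hat = served.sum() / U[served].sum() - lam_hat
    return theta_hat, lam_hat

def replicate(n, shape, rng, scale=16.0, mean_wait=2.0, q=0.5):
    """One simulated dataset; returns the estimated patience rate."""
    T = scale * rng.weibull(shape, n)      # shape 1.0 gives an exponential
    W = rng.exponential(mean_wait, n)
    announce = rng.random(n) < q           # assumed constant announcement probability
    category = np.where(T > W, 1, np.where(announce, 2, 3))
    U = np.where(category == 2, T, W)
    return fit_exponential_rates(U, category)[0]

rng = np.random.default_rng(2)
sizes = (100, 200, 500, 1000, 2000)
results = {}
for label, shape in (("Setting 1", 1.0), ("Setting 2", 1.5)):
    for n in sizes:
        est = np.array([replicate(n, shape, rng) for _ in range(100)])
        results[(label, n)] = (est.mean(), est.std())
        print(f"{label}, n={n}: mean rate = {est.mean():.4f}, sd = {est.std():.4f}")
```

Comparing the printed means across sample sizes shows the qualitative pattern the section describes: the spread shrinks with the sample size in both settings, but only in the correctly specified setting does the estimate centre on the true rate of 1/16.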

FIGURE 2

The difference between the semiparametric estimator $\hat\theta$ and the true parameter $\theta$. Setting 1: The patience time follows an exponential distribution with expectation of 16 h and the waiting time follows an exponential distribution with expectation of 2 h. Setting 2: The patience time follows a Weibull distribution with scale 16 and shape 1.5 while the waiting time follows an exponential distribution with expectation of 2 h.

We compare $\hat S_T$, the estimator of the survival function of $T$, to the true survival function $S_T$. For the semiparametric estimation, $\hat S_T = S_{T,\hat\theta}$, while the nonparametric estimator is given by (A.4). The comparison is done using mean square error (MSE), which is defined by $\mathrm{MSE} = \int (\hat S_T(t) - S_T(t))^2 f(t)\,dt$, where $f$ is the density of $T$. The semiparametric and nonparametric survival function estimators are demonstrated in Figures 3 and 4. Figure 3 represents the results of the first setting, in which $T$ follows an exponential distribution with scale 16 and $W$ follows an exponential distribution with scale 2. Figure 4 represents the results of the second setting, in which $T$ follows a Weibull distribution with scale 16 and shape 1.5, and $W$ follows an exponential distribution with scale 2. Summaries of the MSE are given in Table 1. Not surprisingly, for Setting 1, since the semiparametric model is correct, the MSE is smaller for the semiparametric estimator. Similarly, since in Setting 2 the semiparametric model is incorrect, the MSE is smaller for the nonparametric estimator.
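As a worked illustration of the error criterion, the sketch below evaluates a density-weighted squared error between an estimated and a true exponential survival function, both numerically and in closed form. It assumes the criterion is the integral of the squared difference of the survival functions weighted by the true patience-time density; the function names, the integration range, and the example rates are hypothetical.

```python
import numpy as np

def weighted_mse_numeric(rate_hat, rate_true, upper=400.0, num=200_001):
    """Trapezoidal approximation of the integral of (S_hat - S)^2 * f on [0, upper]."""
    t = np.linspace(0.0, upper, num)
    s_hat = np.exp(-rate_hat * t)                  # estimated survival function
    s_true = np.exp(-rate_true * t)                # true survival function
    f_true = rate_true * np.exp(-rate_true * t)    # true density (the weight)
    y = (s_hat - s_true) ** 2 * f_true
    dt = t[1] - t[0]
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * dt))

def weighted_mse_closed_form(rate_hat, rate_true):
    """Same integral over [0, infinity), obtained by expanding the square."""
    a, b = rate_hat, rate_true
    return b * (1.0 / (2 * a + b) - 2.0 / (a + 2 * b) + 1.0 / (3 * b))

# Hypothetical example: true mean patience 16 h, estimated mean 14 h.
numeric = weighted_mse_numeric(1.0 / 14.0, 1.0 / 16.0)
closed = weighted_mse_closed_form(1.0 / 14.0, 1.0 / 16.0)
print(numeric, closed)
```

The closed form follows from integrating three exponentials term by term, and it vanishes when the estimated rate equals the true rate, which is a quick sanity check on the expansion.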

FIGURE 3

Setting 1. The blue, red, and black curves represent the nonparametric, semiparametric, and true survival functions, respectively, for , and 1000

FIGURE 4

Second setting. The blue, red, and black curves represent the nonparametric, semiparametric, and true survival functions, respectively

TABLE 1

MSE for Settings 1 and 2

	Setting 1: Exponential						Setting 2: Weibull
	Semiparametric			Nonparametric			Semiparametric			Nonparametric
N	Mean	Median	Std	Mean	Median	Std	Mean	Median	Std	Mean	Median	Std
100	1.014	0.432	1.44	1.428	0.804	1.812	0.474	0.228	0.618	0.216	0.114	0.264
200	0.51	0.21	0.828	1.062	0.672	1.11	0.414	0.252	0.492	0.072	0.036	0.12
500	0.162	0.084	0.198	0.462	0.288	0.474	0.378	0.294	0.288	0.03	0.018	0.03
1000	0.078	0.042	0.102	0.246	0.186	0.186	0.342	0.312	0.204	0.018	0.012	0.018
2000	0.054	0.03	0.066	0.168	0.132	0.12	0.348	0.318	0.132	0.012	0.0006	0.012

Note: The table summarizes the MSE over 100 repetitions for each of the sample sizes. For Setting 1, the patience time follows an exponential distribution with expectation of 16 h and the waiting time follows an exponential distribution with expectation of 2 h. In Setting 2 the patience time follows a Weibull distribution with scale 16 and shape 1.5, while the waiting time follows an exponential distribution with expectation of 2 h. The estimates are given in minutes. As can be seen, the semiparametric estimator has the lower MSE in Setting 1, while the nonparametric estimator has the lower MSE in Setting 2.

CASE STUDY

As leaving without being seen by a physician may have a strong effect on patient well‐being and satisfaction, estimating the time that patients are willing to wait in the ED is an important and challenging question. While there has been considerable research in this field, the duration that a potential patient is willing to wait for ED service has not been thoroughly investigated, owing to the special structure of the data. We analyze data from all patient presentations to triage at an urban, academic, adult‐only ED with visits in calendar year 2008. These data were used for the analysis in Wiler et al. The data consist of the waiting times of patients arriving at the emergency room, stratified by acuity level. We focused on the three main levels of acuity: emergency, urgent, and semi‐urgent. For each acuity level, we categorized each visit into one of the three categories: received service, left without being seen and announced, and left without being seen and did not announce. We considered only patients who were neither served upon arrival nor left without waiting at all. The characteristics of the dataset appear in Table 2. As can be seen, there are considerably fewer emergency visits, and only 1.2% of these left without being seen. In comparison, in the urgent and semi‐urgent acuity levels, about 10% left without being seen. The distribution of the observed times for each acuity level, stratified by the patient's category, appears in Figure 5. Overall, the distribution of the three categories is similar in each acuity level.

TABLE 2

The characteristics of the different visits stratified by acuity level

	Emergency	Urgent	Semi‐urgent
n	8579	36 249	26 036
Category (%)
Service	8478 (98.8%)	32 607 (90.0%)	23 788 (91.4%)
LWBS & announcement	69 (0.8%)	1908 (5.3%)	1019 (3.9%)
LWBS & no announcement	32 (0.4%)	1734 (4.8%)	1229 (4.7%)
Mean observed time (SD)	25.11 (24.26)	111.73 (108.21)	82.18 (72.92)
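The percentages quoted for Table 2 can be reproduced directly from the counts. The short sketch below uses only the counts given in the table; the dictionary layout and key names are illustrative.

```python
# Counts per acuity level, copied from Table 2.
counts = {
    "Emergency":   {"service": 8478,  "lwbs_announced": 69,   "lwbs_silent": 32},
    "Urgent":      {"service": 32607, "lwbs_announced": 1908, "lwbs_silent": 1734},
    "Semi-urgent": {"service": 23788, "lwbs_announced": 1019, "lwbs_silent": 1229},
}

totals = {}
lwbs_pct = {}
for level, c in counts.items():
    n = sum(c.values())                              # all visits at this acuity level
    lwbs = c["lwbs_announced"] + c["lwbs_silent"]    # left without being seen
    totals[level] = n
    lwbs_pct[level] = 100.0 * lwbs / n
    announced_share = 100.0 * c["lwbs_announced"] / lwbs
    print(f"{level}: n = {n}, LWBS = {lwbs_pct[level]:.1f}%, "
          f"announced among LWBS = {announced_share:.1f}%")
```

The row totals match the sample sizes in the table, and the emergency level gives the 1.2% rate of patients who left without being seen that is quoted in the text.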

FIGURE 5

The distribution of the observed time stratified by acuity levels and category

We analyzed the data using the semiparametric and nonparametric estimators for the distribution of the patience time proposed in Sections 4 and 5. Since our model assumes that all patients follow the same distribution, we calculated the estimators for each level of acuity separately. The data consist of the triplets described in Section 3, such that each observation belongs to one of the three possible categories. The results of these estimators are given in Figures 6 and 7. As can be seen from Figure 6, the results of the semiparametric and nonparametric estimators agree, which suggests that modeling the patience time using the exponential distribution is reasonable. Figure 7 shows that the patience times are stochastically ordered by levels of acuity. In other words, patients at the emergency acuity level are less likely to lose patience than patients at the urgent level, who in turn are less likely to lose patience than patients at the semi‐urgent level, as expected.

FIGURE 6

Comparison of the nonparametric and semiparametric estimators for the survival function of the patience time by different levels of acuity

FIGURE 7

Comparison of the estimator for the survival function of the patience time at the three different levels of severity

DISCUSSION

In this article, we consider survival data that combine observed, right‐censored, and left‐censored data. The setting we analyzed was that of patients who wait for treatment in an ED, where some patients may leave without being seen. We proposed both semiparametric and nonparametric estimators for the distribution of the patience time. Using simulation, we showed that when the semiparametric model holds, the semiparametric estimator estimates the patience time well. However, when the model is misspecified, the nonparametric estimator behaves better. While in our case study, both estimators behave similarly, it is of importance to further investigate when each of these estimators is preferable. So far, no baseline covariates were given. Novel semiparametric and nonparametric estimators are needed for addressing settings that include baseline covariates.

CONFLICT OF INTEREST

The authors declare no potential conflict of interests.

REFERENCES

1. Minn AY, Pollock BH, Garzarella L, Dahl GV, Kun LE, Ducore JM, Shibata A, Kepner J, Fisher PG. Surveillance neuroimaging to detect relapse in childhood brain tumors: a Pediatric Oncology Group study. J Clin Oncol. 2001.

2. Wiler JL, Bolandifar E, Griffey RT, Poirier RF, Olsen T. An emergency department patient flow model based on queueing theory principles. Acad Emerg Med. 2013.

3. Prentice RL, Gloeckler LA. Regression analysis of grouped survival data with application to breast cancer data. Biometrics. 1978.

4. Hunt KA, Weber EJ, Showstack JA, Colby DC, Callaham ML. Characteristics of frequent users of emergency departments. Ann Emerg Med. 2006.

5. Whitehead J. The analysis of relapse clinical trials, with application to a comparison of two ulcer treatments. Stat Med. 1989.

6. Zhang Z, Sun J. Interval censoring (review). Stat Methods Med Res. 2009.
