
Visualizing the quantile survival time difference curve.

Abstract

The difference between the pth quantiles of 2 survival functions can be used to compare patients' survival between 2 therapies. Setting p = 0.5 yields the median survival time difference. Varying p between 0 and 1 defines the quantile survival time difference curve which can be straightforwardly estimated by the horizontal differences between 2 Kaplan-Meier curves. The estimate's variability can be visualized by adding either a bundle of resampled bootstrap step functions or, alternatively, approximate bootstrap confidence bands. The user-friendly SAS software macro %kmdiff enables the straightforward application of this exploratory graphical approach. The macro is described, and its application is exemplified with breast cancer data. The advantages and limitations of the approach are discussed.


Keywords: Kaplan-Meier curves; SAS macro; bootstrap samples; confidence bands; exploratory data analysis


Year: 2018 PMID: 29790230 PMCID: PMC6099283 DOI: 10.1111/jep.12948

Source DB: PubMed Journal: J Eval Clin Pract ISSN: 1356-1294 Impact factor: 2.431

INTRODUCTION

As a consistent estimator of the survival function, the Kaplan-Meier curve1 is the most commonly used graphical tool in survival analysis. It is extensively used to visually compare censored survival time curves between groups of patients distinguished by different therapies, biomarker categories, or demographic features. Estimates of survival probabilities at specific times (eg, 3-year survival probability) and of specific quantiles of the survival time distribution (eg, median survival time) can easily be obtained. The diagonal upper-left-to-lower-right nature of the plot, however, hampers the visual assessment of the difference between the $p$th quantiles of 2 Kaplan-Meier curves. To overcome this problem, the quantile survival time difference curve is defined and estimated straightforwardly by calculating the horizontal differences between the corresponding Kaplan-Meier curves. Two bootstrap-based approaches are suggested to illustrate the estimate's variability; alternatively, the traditional normal approximation method or a smoothed empirical likelihood approach could have been considered.2 The quantile survival time difference curve is not to be confused with the survival probability difference curve3, 4, 5, 6 and approaches related to it.7, 8, 9, 10 Moreover, the term “quantile difference” itself is ambiguous: it could also refer to the difference between the $p_1$th and $p_2$th quantiles of a single survival curve,11, 12 eg, the interquartile range for $p_1 = 0.25$ and $p_2 = 0.75$. In Section 2, the technical details of the approach are presented. The SAS macro %kmdiff is described and illustrated in Section 3. A brief discussion is given in Section 4. The SAS macro code was generated using Version 9.4 (for Windows) of the SAS software (copyright © 2002-2012 SAS Institute Inc.). SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

METHODS

Assume that from a large patient population with survival function $S(t)$, right-censored survival data $(t_1, \delta_1), \ldots, (t_n, \delta_n)$ have been observed for a sample of $n$ patients, where $t_1, \ldots, t_n$ denote the observed survival times and $\delta_1, \ldots, \delta_n$ their corresponding censoring indicators. The Kaplan-Meier estimator of the survival function is defined as

$$\hat{S}(t) = \prod_{j:\, t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right), \tag{1}$$

where $0 \le t_{(1)} < t_{(2)} < \cdots < t_{(m)}$ are the $m$ different observed failure times, $d_j$ is the number of failures, and $n_j$ is the size of the risk set at $t_{(j)}$, $j = 1, \ldots, m$. If no censored values have been observed before $t$, then $\hat{S}(t) = 1 - \frac{1}{n}\sum_{j:\, t_{(j)} \le t} d_j$, that is, one minus the observed proportion of failures up to $t$. $\hat{S}$ is right continuous, with $\hat{S}(t) = 1$ for $0 \le t < t_{(1)}$ and $\hat{S}(t) > 0$ for $0 \le t < t_{(m)}$. Note that $\hat{S}(t_{(m)})$ equals 0 only if all censored observations occur before $t_{(m)}$. If patients are still alive after $t_{(m)}$, then either $\hat{S}(t)$ can be set equal to $\hat{S}(t_{(m)})$ from $t_{(m)}$ to the largest censoring time, or $\hat{S}(t)$ can be considered not defined for $t > t_{(m)}$. In the current manuscript, the latter definition is applied. Consequently, in the case of $m = 0$, that is, no observed failure times, $\hat{S}(t)$ equals 1 for $t = 0$ and is not defined for $t > 0$. The $p$th quantile survival time corresponds to the labelling of the survival probability axis of the Kaplan-Meier plot and is equivalent to the common $(1 - p)$th quantile of a distribution function. For $0 \le p \le 1$ and $p \ge \hat{S}(t_{(m)})$, the estimator for the $p$th quantile survival time can be defined as

$$\hat{Q}(p) = \min\{t \ge 0 : \hat{S}(t) \le p\}. \tag{2}$$

Now assume that from a second large patient population (independent of the first one) with survival function $S'(t)$, right-censored survival data have also been observed for a sample of $n'$ patients. Their observed survival times and censoring indicators are denoted by $t'_1, \ldots, t'_{n'}$ and $\delta'_1, \ldots, \delta'_{n'}$, respectively, and $0 \le t'_{(1)} < \cdots < t'_{(m')}$ are the $m'$ different observed failure times. By analogy with formulae (1) and (2), a corresponding estimator $\hat{Q}'(p)$ for the $p$th quantile survival time is obtained. The prime in the notation refers to the second population and its corresponding sample. Let

$$p_0 = \max\{\hat{S}(t_{(m)}),\ \hat{S}'(t'_{(m')})\}. \tag{3}$$

Now, an estimator for the quantile survival time difference curve can be defined as

$$\hat{D}(p) = \hat{Q}(p) - \hat{Q}'(p) \tag{4}$$

for $p_0 \le p \le 1$. If $p = 1$, then $\hat{D}(p) = 0$, because $\hat{Q}(1) = \hat{Q}'(1) = 0$. The obvious choice would be to plot $\hat{D}(p)$ against $p$. However, for the sake of comparability with the Kaplan-Meier plot, $p$ is plotted on the vertical axis against $\hat{D}(p)$ on the horizontal axis. Next, the variability of the estimator is visualized. For this purpose, 2 bootstrap solutions are suggested, both based on Efron's classical bootstrap for censored data.13
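The estimators above can be illustrated with a short Python sketch. It is a minimal illustration, not the SAS code of %kmdiff, and it rests on assumptions made here: events are coded 1 = failure and 0 = censored, censored times tied with a failure time stay in that failure's risk set, the $p$th quantile is taken as the smallest time at which the Kaplan-Meier curve drops to or below $p$, and a tiny tolerance guards against floating-point round-off. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Step = Tuple[float, float]  # (failure time t_(j), S_hat(t_(j)))


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Step]:
    """Kaplan-Meier estimate evaluated at the distinct observed failure times.

    Assumed coding: events[i] == 1 for an observed failure, 0 for a censored time.
    Censored observations tied with a failure time stay in that risk set.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps: List[Step] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = ties = 0
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            ties += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / at_risk  # factor (1 - d_j / n_j)
            steps.append((t, surv))
        at_risk -= ties  # everyone observed at t leaves the risk set
    return steps


def quantile_time(steps: List[Step], p: float) -> Optional[float]:
    """Q_hat(p) = min{t >= 0 : S_hat(t) <= p}; None where undefined (p < S_hat(t_(m)))."""
    if p >= 1.0:
        return 0.0  # S_hat(0) = 1 <= p
    for t, s in steps:
        if s <= p + 1e-12:  # small tolerance against floating-point round-off
            return t
    return None


def p0(steps1: List[Step], steps2: List[Step]) -> float:
    """p_0 = max{S_hat(t_(m)), S_hat'(t'_(m'))}; 1.0 if a group has no failures."""
    last1 = steps1[-1][1] if steps1 else 1.0
    last2 = steps2[-1][1] if steps2 else 1.0
    return max(last1, last2)


def quantile_difference(steps1: List[Step], steps2: List[Step], p: float) -> Optional[float]:
    """D_hat(p) = Q_hat(p) - Q_hat'(p), defined for p_0 <= p <= 1."""
    q1, q2 = quantile_time(steps1, p), quantile_time(steps2, p)
    if q1 is None or q2 is None:
        return None
    return q1 - q2


# Usage: median survival time difference of two small uncensored samples.
s1 = kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1])
s2 = kaplan_meier([2, 4, 6, 8], [1, 1, 1, 1])
print(quantile_difference(s1, s2, 0.5))  # -2 (medians 2 and 4)
```

Evaluating `quantile_difference` on a grid of $p$ values from $p_0$ to 1 traces the whole curve; plotting $p$ on the vertical axis then reproduces the Kaplan-Meier orientation.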

Bootstrap bundle

Draw a sample of size $n$ with replacement from $(t_1, \delta_1), \ldots, (t_n, \delta_n)$ and a sample of size $n'$ with replacement from $(t'_1, \delta'_1), \ldots, (t'_{n'}, \delta'_{n'})$. From these first bootstrap samples, the 2 bootstrapped quantile survival time curves $\hat{Q}^*_1(p)$ and $\hat{Q}'^*_1(p)$ are computed, and from them the bootstrapped quantile survival time difference curve $\hat{D}^*_1(p) = \hat{Q}^*_1(p) - \hat{Q}'^*_1(p)$, $p^*_{0,1} \le p \le 1$. Here, $p^*_{0,1}$ is defined in analogy to $p_0$ of formula (3). Repeating the drawing of the bootstrap samples $B$ times eventually yields $B$ bootstrapped quantile difference curves $\hat{D}^*_b(p)$, $p^*_{0,b} \le p \le 1$, $b = 1, \ldots, B$. Depicting them together with $\hat{D}(p)$ enables a visual assessment of the variability of $\hat{D}(p)$. A subdued colour such as light grey should be used for the bootstrapped quantile difference curves, and a vibrant colour such as green for $\hat{D}(p)$. Setting $B$ to a value between 40 and 200 seems reasonable for the bootstrap bundle approach.
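The resampling behind the bootstrap bundle can be sketched in Python as follows. Whole (time, status) pairs are drawn with replacement within each group, as in Efron's bootstrap for censored data, and each bootstrapped difference curve is evaluated on a grid of survival probabilities, with `None` marking probabilities at which that curve is undefined. The grid, the function names and the seed handling are assumptions for illustration only; Python's random number generator will not reproduce the bootstrap samples of the SAS macro, even with the same seed value.

```python
import random
from typing import List, Optional, Sequence, Tuple

Obs = Tuple[float, int]  # (observed time, event indicator: 1 = failure, 0 = censored)


def km_quantile(sample: Sequence[Obs], p: float) -> Optional[float]:
    """Q_hat(p) = min{t >= 0 : S_hat(t) <= p} from the Kaplan-Meier estimate; None if undefined."""
    if p >= 1.0:
        return 0.0
    data = sorted(sample)
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data):
        t, failures, ties = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            ties += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            if surv <= p + 1e-12:  # tolerance against floating-point round-off
                return t
        at_risk -= ties
    return None  # the curve never drops to p (p < S_hat(t_(m)))


def bootstrap_bundle(
    sample1: Sequence[Obs],
    sample2: Sequence[Obs],
    grid: Sequence[float],
    n_boot: int,
    seed: int = 1,
) -> List[List[Optional[float]]]:
    """Return n_boot bootstrapped curves D*_b(p) = Q*_b(p) - Q'*_b(p) evaluated on grid."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    curves = []
    for _ in range(n_boot):
        # Resample whole (time, event) pairs with replacement, separately per group.
        boot1 = rng.choices(sample1, k=len(sample1))
        boot2 = rng.choices(sample2, k=len(sample2))
        curve = []
        for p in grid:
            q1, q2 = km_quantile(boot1, p), km_quantile(boot2, p)
            curve.append(None if q1 is None or q2 is None else q1 - q2)
        curves.append(curve)
    return curves
```

Each returned curve would then be drawn as a light grey step function (survival probability on the vertical axis), with the estimate from the original samples drawn on top in a vibrant colour.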

Bootstrap confidence bands

The generation of a confidence band also requires $B$ bootstrapped quantile difference curves $\hat{D}^*_b(p)$, $p^*_{0,b} \le p \le 1$, $b = 1, \ldots, B$. Here, the value of $B$ should depend on the chosen confidence level $100(1 - \alpha)\%$. As a rule of thumb, we recommend $B = 100/\alpha$, that is, $B = 2000$ for a 2-sided 95% confidence band and $B = 10000$ for a 2-sided 99% confidence band. A confidence band for the quantile difference curve can be constructed from a series of pointwise confidence intervals for quantile differences. For a given survival probability $p$, the corresponding confidence interval is determined by the $(\alpha/2)$th and the $(1 - \alpha/2)$th quantiles of the empirical distribution of $\hat{D}^*_1(p), \ldots, \hat{D}^*_B(p)$, where $p^*_{\max} \le p \le 1$ and $p^*_{\max} = \max_b p^*_{0,b}$. Note that $p^*_{\max}$ is the smallest probability for which the quantile difference can be computed for all bootstrap replications, so the confidence band is available for the interval $[p^*_{\max}, 1]$. This interval can become rather short because it depends on the bootstrapped Kaplan-Meier curve (out of $2 \times B$ such curves) that drops least. Using a conservative approach, the confidence band can be extended to $p_q \le p < p^*_{\max}$ as well, where $p_q$ is a specifically defined $(1 - \alpha)$ quantile of $p^*_{0,1}, \ldots, p^*_{0,B}$. According to the usual definition of quantiles, the $(1 - \alpha)$ quantile is either one of the values $p^*_{0,1}, \ldots, p^*_{0,B}$, or it can be any value from an interval, say $[p_l, p_u]$. In the former case, set $p_q$ equal to that value; in the latter case, set $p_q = p_u$. Now, for a given survival probability $p$ with $p_q \le p < p^*_{\max}$, $\hat{D}^*_b(p)$ is defined for all $b$ with $p^*_{0,b} \le p$, and each of these values is assigned a weight of $1/B$; furthermore, $c$ is set to an arbitrary value larger than the maximum of the absolute values of these $\hat{D}^*_b(p)$. For all $b$ with $p^*_{0,b} > p$, $\hat{D}^*_b(p)$ is undefined and is replaced by 2 values, $-c$ and $c$, each with weight $0.5/B$. Hence, the weights sum to 1, and the confidence interval can be determined as the $(\alpha/2)$th and the $(1 - \alpha/2)$th quantiles of this weighted empirical distribution.
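The weighting scheme used to extend the confidence band can be checked with a small Python sketch. It assumes that the bootstrapped quantile differences at one survival probability have already been computed (with `None` where a replicate's curve is undefined) and that, for each replicate, the smallest probability at which its difference curve is defined is known. It uses the "smallest value whose cumulative weight reaches gamma" definition of a weighted quantile and, when the (1 − alpha) quantile of those per-replicate limits is an interval, takes its upper end. The function names are hypothetical and this is not the macro's code.

```python
import math
from typing import Optional, Sequence, Tuple


def weighted_quantile(points: Sequence[Tuple[float, float]], gamma: float) -> float:
    """Smallest x whose cumulative weight reaches gamma (points are (x, weight) pairs)."""
    ordered = sorted(points)
    cumulative = 0.0
    for x, w in ordered:
        cumulative += w
        if cumulative >= gamma - 1e-12:  # tolerance for floating-point sums
            return x
    return ordered[-1][0]


def pointwise_interval(
    values: Sequence[Optional[float]],
    p0_star: Sequence[float],
    p: float,
    alpha: float = 0.05,
) -> Optional[Tuple[float, float]]:
    """Pointwise 100(1 - alpha)% bootstrap interval for the quantile difference at p.

    values[b] is D*_b(p), or None when it is undefined (p < p0_star[b]).
    Returns None where the band is not extended (p < p_q).
    """
    B = len(values)
    # Conservative (1 - alpha) quantile of p0_star: if the quantile is an interval,
    # take its upper end; either way this is the (floor(B(1 - alpha)) + 1)th order statistic.
    idx = min(int(math.floor(B * (1.0 - alpha) + 1e-9)), B - 1)
    p_q = sorted(p0_star)[idx]
    if p < p_q:
        return None
    defined = [v for v in values if v is not None]
    n_undefined = B - len(defined)
    c = max(abs(v) for v in defined) + 1.0  # any value beyond all defined |D*_b(p)|
    points = [(v, 1.0 / B) for v in defined]
    # Each undefined replicate is split into -c and +c with weight 0.5/B each.
    points += [(-c, 0.5 / B)] * n_undefined + [(c, 0.5 / B)] * n_undefined
    return weighted_quantile(points, alpha / 2), weighted_quantile(points, 1 - alpha / 2)
```

When $p$ is at or above the largest per-replicate limit, no replicate is undefined and the function reduces to the ordinary percentile interval; below $p_q$, the weight placed on $\pm c$ would exceed $\alpha/2$ in each tail, so no finite interval is returned.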

RESULTS

SAS macro

The SAS macro %kmdiff allows the straightforward application of the method to survival data sets and is freely available at https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software. The macro produces the standard Kaplan‐Meier plot of SAS, and plots of the quantile differences with (i) a bootstrap bundle, (ii) bootstrap confidence bands, and (iii) both a bootstrap bundle and bootstrap confidence bands. The macro parameters are described in Table 1.

Table 1

Parameters of the SAS macro %kmdiff

Parameter	Description
data	Input SAS data set (required); its name must not start with two underscores, and it must not contain variables whose names start with two underscores (in particular, “__g”, “__t”, and “__status”)
time	Survival time variable (required)
timeunit	Time units of the survival time variable; default is “years”
status	Survival status variable (required)
censval	Censoring status value(s) to be used in PROC LIFETEST; default is “0”; use the %str() function to specify more than 1 value
group	Covariate which distinguishes the two groups of interest (required)
gvalue1	Numerical value of first group of interest (required)
gvalue2	Numerical value of second group of interest (required)
grouplbl	Label of group variable (optional)
gvallbl1	Label of first group value (optional)
gvallbl2	Label of second group value (optional)
alpha	(100 − alpha)% is the pointwise two‐sided confidence level; default for alpha is “5”
boot	Number of bootstrap replications for computing confidence bands; default is “2000”, minimum is “100”
bundle	Number of bootstrap replications shown in figures; default is “200”
seedval	Bootstrap random number seed; default is “0”


Example

Patients with primary node positive breast cancer were recruited by the German Breast Cancer Study Group (GBSG) from July 1984 to December 1989.14 The GBSG data set can be obtained either from http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets or from http://biom131.imbi.uni-freiburg.de/biom/Royston-Sauerbrei-book/, and it is used in the following for illustration. The data set contains several clinical variables and the recurrence-free survival (RFS) time of 686 female patients, 299 of whom experienced an event (cancer recurrence or death). Figure 1 shows the Kaplan-Meier curves (product-limit survival estimates) for 290 premenopausal and 396 postmenopausal women. The curves lie close to each other and intersect shortly before 6 months and again shortly before 3 years, although the first intersection is hard to see. Before the first and after the second intersection, the premenopausal women show better RFS; in between, the postmenopausal women are better off.

Figure 1

Kaplan‐Meier curves showing recurrence‐free survival (RFS) of postmenopausal and premenopausal women with primary node positive breast cancer

The estimated quantile RFS time difference curve is shown as a green solid line in Figures 2, 3, and 4, each using a different way to visualize its variability. In Figure 2, a bundle of 200 bootstrap replications of the quantile RFS time difference curve is added as grey solid lines. In Figure 3, a 95% confidence band derived from 2000 bootstrap replications of the quantile RFS time difference curve is shown as a green transparent area. Figure 4 combines the key features of Figures 2 and 3, with the 95% confidence band now drawn as 2 black solid lines. To allow exact reproduction of Figures 2 to 4, the seedval parameter (bootstrap random number seed) was set to a positive value, namely 44181.

Figure 2

RFS probabilities are plotted against quantile RFS time differences between postmenopausal and premenopausal women with primary node positive breast cancer (green solid line). The variability of the estimated curve is illustrated by a bundle of 200 bootstrap replications (grey solid lines).

Figure 3

RFS probabilities are plotted against quantile RFS time differences between postmenopausal and premenopausal women with primary node positive breast cancer (green solid line). The variability of the estimated curve is illustrated by a bootstrap-based 95% confidence band (2000 replications, green transparent area).

Figure 4

RFS probabilities are plotted against quantile RFS time differences between postmenopausal and premenopausal women with primary node positive breast cancer (green solid line). The variability of the estimated curve is illustrated by both a bundle of 200 bootstrap replications (grey solid lines) and a bootstrap-based 95% confidence band (2000 replications, black solid lines).

The quantile RFS time difference curve clearly shows the extent of the horizontal gaps between the 2 Kaplan-Meier curves. In particular, the 2 intersections of the Kaplan-Meier curves are easy to identify in the quantile RFS time difference curve as the points where it crosses 0. Both the bundle of bootstrap replications and the 95% confidence band reveal the increasing variability of the estimated quantile difference with decreasing survival probability (ie, decreasing size of the risk set). Random fluctuation alone provides a plausible explanation for the observed differences in quantile RFS times between premenopausal and postmenopausal women over nearly the whole observed probability range; the only exception occurs at survival probabilities around 94%.

DISCUSSION

The SAS macro %kmdiff provides an easy-to-use computational tool to visually assess the estimated quantile survival time difference curve and its sampling variability. The curve is intended as a useful complement to, not a replacement for, the survival probability difference curve and the refined approaches based on it.3, 4, 5, 6, 7, 8, 9, 10 The motivation to plot a quantile survival time difference curve may be threefold. First, patients as well as health professionals may find time differences more intuitive and easier to interpret than probabilities and probability differences. Second, the curve gives an overall picture, unlike the isolated snapshot provided by the commonly reported median survival time difference. Third, it can be rather difficult to assess horizontal (and also vertical) differences between 2 Kaplan-Meier curves, because visual perception is often dominated by the shortest (Euclidean) distance between the 2 curves. The main purpose of the SAS macro %kmdiff is to support the exploration of (possibly time-dependent) group effects in the presence of right censoring. Using the macro for confirmatory purposes would require prespecified statistical hypotheses and proper adjustment for multiple testing; it should also be kept in mind that the macro produces a pointwise, not a simultaneous, confidence band. Bootstrap-based confidence intervals for quantile differences have a potential limitation: particularly in very small samples and risk sets, the actual coverage probability may deviate considerably from the nominal one.15, 16 Naturally, this limitation also affects the bootstrap bundle approach; in very small samples and risk sets, the bootstrap replications shown may give only a distorted picture of the variability of the estimated quantile differences.
Confidence intervals could also be obtained by normal approximation or by a smoothed empirical likelihood method.2 However, the former needs moderate to large sample sizes to work properly, whereas the latter requires selecting a kernel bandwidth by cross-validation, which is computationally burdensome.2 Moreover, given the close relationship between the bootstrap and the empirical likelihood, it seems reasonable to assume that the empirical likelihood will face problems similar to those of the bootstrap in very small samples and risk sets. The SAS macro %kmdiff uses only Base SAS and SAS/STAT procedures for statistical computations; graphical representations are based on the ODS Graphics procedure SGPLOT. The user can easily modify the SGPLOT code or replace it with purpose-built SAS code to obtain tailored graphical output. In conclusion, the SAS macro %kmdiff provides a useful exploratory tool for medical researchers, as it adds a further dimension to the assessment and communication of group differences in patient survival.
