In time-to-event studies, a fraction of individuals is often not expected to experience the event of interest; these individuals, who are immune to the event or cured of the disease during the study, are known as long-term survivors. In addition, many studies record two lifetimes for the same individual, and in some cases there is a dependence structure between them. In these situations, the usual lifetime distributions are not appropriate for modelling data sets with long-term survivors and dependent bivariate lifetimes. In this study, we propose a bivariate model based on the standard Weibull distribution, with a dependence structure given by fifteen different copula functions. We assume the Weibull distribution because of its wide use in survival data analysis and its flexibility and simplicity, but the presented methods can be adapted to other continuous survival distributions. Three examples with real data sets illustrate the proposed methodology. A Bayesian approach is used for inference on the model parameters, with the posterior summaries of interest obtained using Markov chain Monte Carlo simulation methods and the OpenBUGS software. In the analyses of the real data sets, fifteen different copula models were considered, among which it was possible to find models with satisfactory fit for bivariate lifetimes in the presence of long-term survivors.
In medical research, standard parametric and non-parametric tools are widely used for the analysis of time-to-event data. These tools are useful when some observations are censored, that is, when the event of interest has not occurred for all patients by the end of follow-up. The most commonly used procedures include the life-table method, the Kaplan-Meier estimator of the survival function, the Cox proportional hazards model, and parametric survival models. These techniques are described in textbooks such as Klein and Moeschberger [1] and Kalbfleisch and Prentice [2].

A common situation in the analysis of time-to-event data, particularly in cancer research, occurs when a fraction of subjects is not expected to experience the event of interest. In this situation, one usually considers frailty models [3], or assumes that the population is a mixture of susceptible individuals, who experience the event of interest, and non-susceptible individuals, who supposedly will never experience it. Statistical methods have been developed to analyse such data; see Lambert et al. [4] and Yu et al. [5]. Following Maller and Zhou [6], a mixture model for these data assumes that the probability that the time-to-event is larger than some specified time t is given by the survival function

$$S(t) = P(T > t) = p + (1 - p)\, S_0(t), \tag{1}$$

where T is a nonnegative random variable denoting the lifetime of an individual, p is a parameter denoting the proportion of "long-term survivors" or "cured patients" ($0 < p < 1$), and $S_0(t)$ is the baseline survival function for the susceptible individuals. Usual choices for $S_0(t)$ are based on the Weibull, gamma, Rayleigh and lognormal distributions, among many others. In expression (1), $S(t)$ converges to p as t tends to infinity, given that $S_0(t)$ converges to 0 as t tends to infinity.
The corresponding probability density function for the lifetime T is given by

$$f(t) = (1 - p)\, f_0(t),$$

where $f_0(t)$ is the density function for the susceptible individuals.

In recent decades, a large number of studies have dealt with bivariate time-to-event data, such as those of Hanagal and Bhambure [7] and Emura and Chen [8]. Many of these studies use copula functions, which express joint distributions for multivariate random variables. Copula functions were introduced by Sklar [9]; in the bivariate case, they are bivariate distribution functions whose marginal distributions are uniform on the interval $[0, 1]$. Bivariate models for lifetime data analysis based on copula functions have been introduced by a number of authors, such as Kundu and Gupta [10], Achcar et al. [11], Peres et al. [12], Nair et al. [13] and Romeo et al. [14].

In this paper, we present a review of copula functions that can be useful for constructing bivariate distributions for lifetime data analysis. These distributions accommodate long-term survivors and censored data. The standard two-parameter Weibull distribution is assumed for the marginal univariate lifetimes in the presence of a cure fraction. This distribution is chosen because it is one of the most widely used distributions in survival analysis, and because of its simplicity and versatility.

The paper is organized as follows. Section 2 presents different existing copula functions together with their corresponding dependence parameters, the corresponding parametric models assuming a cure fraction, and some model selection criteria used to choose the best model in each application. Section 3 introduces the Bayesian approach used to estimate the model parameters. Section 4 illustrates the proposed method with applications to three real data sets. Finally, Section 5 presents some conclusions and a discussion of the results.
Methods
Copula functions
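A copula function describes the dependence structure among continuous random variables and allows us to construct multivariate distributions with different marginal distributions. The theoretical foundation for the application of copulas is provided by Sklar's theorem. An m-dimensional copula is a function C from $[0,1]^m$ to the interval $[0,1]$ satisfying the following conditions: (i) $C(1,\ldots,1,u_k,1,\ldots,1) = u_k$ for every $k \le m$ and all $u_k$ in $[0,1]$; (ii) $C(u_1,\ldots,u_m) = 0$ if $u_k = 0$ for any $k \le m$; and (iii) C is m-increasing [15], [16]. Considering an m-variate distribution function F with marginal distribution functions $F_1,\ldots,F_m$, the respective copula is a function $C_\phi$ that satisfies

$$F(y_1,\ldots,y_m) = C_\phi\big(F_1(y_1),\ldots,F_m(y_m)\big),$$

where ϕ is a parameter which measures the dependence between the marginals. When the random variables are independent, we have

$$C(u_1,\ldots,u_m) = \prod_{k=1}^{m} u_k.$$

In the bivariate case ($m = 2$) and in the context of survival (time-to-event) data, let $(T_1, T_2)$ be the paired failure times with observations given by $(t_{1i}, t_{2i})$, $i = 1,\ldots,n$. In addition, $S_j(t_j)$ and $f_j(t_j)$ are the marginal survival functions and the marginal density functions of $T_j$, $j = 1, 2$, respectively. The joint distribution function of the survival times is given by

$$F(t_1,t_2) = 1 - S_1(t_1) - S_2(t_2) + S(t_1,t_2), \tag{2}$$

where $S(t_1,t_2)$ is the joint survival function of $(T_1,T_2)$. The joint survival function is modelled through the copula as

$$S(t_1,t_2) = C_\phi\big(S_1(t_1), S_2(t_2)\big). \tag{3}$$

To simplify the notation, let $u = S_1(t_1)$ and $v = S_2(t_2)$. From (2) and (3), the joint distribution function of $(T_1,T_2)$ is given by

$$F(t_1,t_2) = 1 - u - v + C_\phi(u,v)$$

for $t_1 > 0$ and $t_2 > 0$. Thus, the joint density function is given by

$$f(t_1,t_2) = f_1(t_1)\, f_2(t_2)\, c_\phi(u,v),$$

where $f_1(t_1)$ and $f_2(t_2)$ are the marginal densities and $c_\phi(u,v)$ is the copula density defined by

$$c_\phi(u,v) = \frac{\partial^2 C_\phi(u,v)}{\partial u\, \partial v}.$$

Two usual correlation measures between the two random variables are Kendall's tau (τ) and Spearman's rho (ρ), which can be expressed respectively by the equations

$$\tau = 4 \int_0^1\!\!\int_0^1 C_\phi(u,v)\, dC_\phi(u,v) - 1 = 4\, E\big[C_\phi(U,V)\big] - 1 \tag{4}$$

and

$$\rho = 12 \int_0^1\!\!\int_0^1 u v\, dC_\phi(u,v) - 3 = 12\, E[UV] - 3. \tag{5}$$

For more details on these expressions, see Schweizer et al. [17], Joe [18], Nelsen [19] and Joe [20]. The expectations in these expressions can sometimes be computed in closed form, depending on the choice of $C_\phi$.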
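However, when the expectation does not have a closed form, the measures τ and ρ can be computed by numerical integration or Monte Carlo simulation.

There are many families of copula functions, which differ in the detail of the dependence they represent. The main families of copulas used in this study are briefly described as follows:

Archimedean copulas: if a copula can be written in the form $C_\phi(u,v) = \varphi^{-1}\big(\varphi(u) + \varphi(v)\big)$, it is called an Archimedean copula with generating function $\varphi$. In this case, $\varphi$ is a real-valued function satisfying $\varphi(1) = 0$; $\lim_{t \to 0} \varphi(t) = \infty$; $\varphi'(t) < 0$ for all $t \in (0,1)$ and $\varphi''(t) > 0$ for all $t \in (0,1)$. Archimedean copulas were studied and popularized by Genest and MacKay [21].

Extreme-value copulas: an extreme-value copula is defined by the expression $C(u,v) = \exp\{\log(uv)\, A(\log u / \log(uv))\}$, where $A: [0,1] \to [1/2, 1]$ is a convex function satisfying $\max(t, 1-t) \le A(t) \le 1$ for all $t \in [0,1]$ [22].

FGM copulas: this copula family is based on generalizations of the Farlie-Gumbel-Morgenstern (FGM) copula.

The copula functions considered in this study are as follows:

Farlie-Gumbel-Morgenstern (FGM) copula: the FGM copula was originally proposed by Morgenstern [23] and further studied by Gumbel [24] and Farlie [25]. It takes the form $C_\phi(u,v) = uv[1 + \phi(1-u)(1-v)]$, where $-1 \le \phi \le 1$. When $\phi = 0$, the joint survival function becomes $S(t_1,t_2) = S_1(t_1) S_2(t_2)$, indicating independence between $T_1$ and $T_2$. For the FGM copula, from (4) and (5), Kendall's and Spearman's correlation measures are given by $\tau = 2\phi/9$ and $\rho = \phi/3$, respectively [26]. The respective copula density function is given by $c_\phi(u,v) = 1 + \phi(1-2u)(1-2v)$.

Generalized Farlie-Gumbel-Morgenstern (GFGM) copula: the GFGM copula was introduced by Bairamov and Kotz [27]. Its form is given by $C_\phi(u,v) = uv[1 + \phi(1-u^p)^q (1-v^p)^q]$, where p and q are additional parameters [28]. In this case, for $q > 1$, the possible range of the dependence parameter ϕ is

$$-\min\left\{1,\; \frac{1}{p^{2q}} \left(\frac{1+pq}{q-1}\right)^{2(q-1)}\right\} \le \phi \le \frac{1}{p^{q}} \left(\frac{1+pq}{q-1}\right)^{q-1},$$

and the respective copula density function is given by

$$c_\phi(u,v) = 1 + \phi\, (1-u^p)^{q-1} (1-v^p)^{q-1}\, [1 - (1+pq)u^p]\, [1 - (1+pq)v^p].$$

If $p = q = 1$, the original FGM copula is obtained. Kendall's and Spearman's correlation measures are given respectively by

$$\tau = 8\phi \left[\frac{1}{p} B\!\left(\frac{2}{p},\, q+1\right)\right]^2 \quad \text{and} \quad \rho = 12\phi \left[\frac{1}{p} B\!\left(\frac{2}{p},\, q+1\right)\right]^2,$$

where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function and $\Gamma(\cdot)$ is the gamma function.

Type 1 Huang–Kotz FGM (HKFGM1) copula: the HKFGM1 copula is a special case of the GFGM copula with $p = 1$ [29]. In this case, the possible range of the dependence parameter ϕ is $-1 \le \phi \le \left(\frac{q+1}{q-1}\right)^{q-1}$.

Type 2 Huang–Kotz FGM (HKFGM2) copula: in a similar way, the HKFGM2 copula is a special case of the GFGM copula with $q = 1$ [29]. For this case, the possible range of the dependence parameter ϕ is $-\min\{1, p^{-2}\} \le \phi \le p^{-1}$.

Fischer and Köck FGM (FKFGM) copula: Fischer and Köck [30] proposed some extensions of the FGM copula. One of them is defined by $C_\phi(u,v) = uv[1 + \phi(1-u^{1/p})(1-v^{1/p})]^p$, where $-1 \le \phi \le 1$ and $p \ge 1$. Independence between $T_1$ and $T_2$ corresponds to $\phi = 0$. The respective copula density function follows from $c_\phi(u,v) = \partial^2 C_\phi(u,v)/\partial u\,\partial v$ and is written in terms of an auxiliary quantity referred to as (6).

Clayton copula: the Clayton copula takes the form $C_\phi(u,v) = (u^{-\phi} + v^{-\phi} - 1)^{-1/\phi}$, where $\phi > 0$. This copula was first introduced by Clayton [31] and subsequently studied by Cook and Johnson [32] and Oakes [33]. In this case, the marginals become independent when ϕ tends to zero. For this copula function, the relationship between the dependence parameter ϕ and Kendall's tau is given by $\tau = \phi/(\phi+2)$. If ϕ tends to infinity, then τ tends to 1, indicating perfect positive dependence. Some authors have shown that the expression for ρ under the Clayton copula is very complicated. The respective copula density function is given by $c_\phi(u,v) = (1+\phi)(uv)^{-\phi-1}(u^{-\phi} + v^{-\phi} - 1)^{-1/\phi - 2}$. This copula belongs to the Archimedean family, with $\varphi(t) = (t^{-\phi} - 1)/\phi$.

Burr copula: the Burr copula [34] has a positive dependence parameter ϕ, which is related to Kendall's tau, and a limiting value of ϕ indicates total dependence between $T_1$ and $T_2$. This copula does not belong to the families mentioned in this study.

Gumbel-Hougaard (GH) copula: the GH copula [35], [36] is defined as $C_\phi(u,v) = \exp\{-[(-\log u)^\phi + (-\log v)^\phi]^{1/\phi}\}$, where the parameter ϕ is restricted to the interval $[1, \infty)$. When ϕ tends to 1, $C_\phi(u,v) = uv$ is obtained. This copula function does not allow for negative dependence. The relationship between ϕ and Kendall's tau is given by $\tau = (\phi - 1)/\phi$. Writing $w = (-\log u)^\phi + (-\log v)^\phi$, the copula density function for the Gumbel-Hougaard copula is given by

$$c_\phi(u,v) = C_\phi(u,v)\, (uv)^{-1}\, [(-\log u)(-\log v)]^{\phi-1}\, w^{2/\phi - 2}\, [1 + (\phi-1) w^{-1/\phi}].$$

This copula belongs to the Archimedean family, with $\varphi(t) = (-\log t)^\phi$. It is interesting to note that this copula is at the same time Archimedean and extreme-value; see Genest and Rivest [37].

Gumbel-Barnett (GB) copula: the GB copula studied by Gumbel [24] and Barnett [38] is defined by $C_\phi(u,v) = uv \exp(-\phi \log u \log v)$, where $0 < \phi \le 1$. When ϕ tends to zero, there is independence between the two random variables. Fredricks and Nelsen [39] show that the GB copula measures negative dependence, and that the relationships between the dependence parameter ϕ and Kendall's tau and Spearman's rho can be written in terms of the exponential integral Ei(.). The copula density function for the Gumbel-Barnett copula is given by $c_\phi(u,v) = \exp(-\phi \log u \log v)\, [(1 - \phi \log u)(1 - \phi \log v) - \phi]$. This copula belongs to the Archimedean family, with $\varphi(t) = \log(1 - \phi \log t)$.

Galambos copula: the Galambos copula, defined by Galambos [40], is given by $C_\phi(u,v) = uv \exp\{[(-\log u)^{-\phi} + (-\log v)^{-\phi}]^{-1/\phi}\}$, where $\phi > 0$. Independence between $T_1$ and $T_2$ corresponds to $\phi \to 0$ and high dependence to $\phi \to \infty$. Spearman's rho can be expressed as a function of ϕ, whereas the relationship between ϕ and Kendall's tau is very complicated (see, for example, Joe [20]). The respective copula density function is written in terms of an auxiliary quantity referred to as (8). This copula belongs to the extreme-value copula family, with $A(t) = 1 - [t^{-\phi} + (1-t)^{-\phi}]^{-1/\phi}$.

Frank copula: the Frank copula [41] is given by

$$C_\phi(u,v) = -\frac{1}{\phi} \log\left[1 + \frac{(e^{-\phi u} - 1)(e^{-\phi v} - 1)}{e^{-\phi} - 1}\right],$$

where $\phi \ne 0$. If $\phi \to 0$, the random variables $T_1$ and $T_2$ are independent, and if $\phi \ne 0$ there is an indication that $T_1$ and $T_2$ are correlated. Nelsen [42] and Genest [43] have shown that the relationships between the dependence parameter ϕ and Kendall's and Spearman's correlation measures are given respectively by

$$\tau = 1 - \frac{4}{\phi}[1 - D_1(\phi)] \quad \text{and} \quad \rho = 1 - \frac{12}{\phi}[D_1(\phi) - D_2(\phi)],$$

where $D_k(\cdot)$ is the Debye function given by $D_k(x) = \frac{k}{x^k} \int_0^x \frac{t^k}{e^t - 1}\, dt$. The copula density function for the Frank copula is given by

$$c_\phi(u,v) = \frac{\phi (1 - e^{-\phi})\, e^{-\phi(u+v)}}{[(1 - e^{-\phi}) - (1 - e^{-\phi u})(1 - e^{-\phi v})]^2}.$$

This copula belongs to the Archimedean family, with $\varphi(t) = -\log[(e^{-\phi t} - 1)/(e^{-\phi} - 1)]$.

Ali-Mikhail-Haq (AMH) copula: the AMH copula introduced by Ali et al. [44] and Kumar [45] is defined by $C_\phi(u,v) = uv/[1 - \phi(1-u)(1-v)]$, where the dependence parameter is $\phi \in [-1, 1)$. Thus, the AMH copula measures both positive and negative dependence. Independence between $T_1$ and $T_2$ occurs when $\phi = 0$. The relationships between ϕ and Kendall's tau and Spearman's rho are given respectively by

$$\tau = \frac{3\phi - 2}{3\phi} - \frac{2(1-\phi)^2}{3\phi^2} \log(1-\phi)$$

and

$$\rho = \frac{12(1+\phi)}{\phi^2}\, \mathrm{dilog}(1-\phi) - \frac{24(1-\phi)}{\phi^2} \log(1-\phi) - \frac{3(\phi+12)}{\phi},$$

where dilog(.) is the dilogarithm function defined by $\mathrm{dilog}(x) = \int_1^x \frac{\log t}{1-t}\, dt$. Thus, τ ranges from −0.1817 to 0.3333 and ρ ranges from −0.2710 to 0.4784. The copula density function for the AMH copula is given by

$$c_\phi(u,v) = \frac{1 + \phi[(1+u)(1+v) - 3] + \phi^2 (1-u)(1-v)}{[1 - \phi(1-u)(1-v)]^3}.$$

This copula belongs to the Archimedean family, with $\varphi(t) = \log\{[1 - \phi(1-t)]/t\}$.

A12 copula: the A12 copula function is named after its order of appearance in Table 4.1 (one-parameter Archimedean copulas) on page 116 of the book "An Introduction to Copulas" by Nelsen [19]. It is defined by $C_\phi(u,v) = \{1 + [(u^{-1} - 1)^\phi + (v^{-1} - 1)^\phi]^{1/\phi}\}^{-1}$, where $\phi \ge 1$. For this copula function, the relationship between the dependence parameter ϕ and Kendall's tau is given by $\tau = 1 - 2/(3\phi)$. If ϕ tends to infinity, then τ tends to 1, indicating perfect positive dependence between $T_1$ and $T_2$. In this case it is observed that $\tau \ge 1/3$. The copula density function for the A12 copula is written in terms of an auxiliary quantity referred to as (9). This copula belongs to the Archimedean family, with $\varphi(t) = (t^{-1} - 1)^\phi$.

Joe copula: the Joe copula [46] is defined by $C_\phi(u,v) = 1 - [(1-u)^\phi + (1-v)^\phi - (1-u)^\phi (1-v)^\phi]^{1/\phi}$, where $\phi \ge 1$. Joe [20] shows that for this copula function Kendall's correlation measure is given by

$$\tau = 1 + \frac{4}{\phi^2} \int_0^1 t \log(t)\, (1-t)^{2(1-\phi)/\phi}\, dt,$$

and also gives an expression involving the gamma function $\Gamma(\cdot)$. Writing $w = (1-u)^\phi + (1-v)^\phi - (1-u)^\phi (1-v)^\phi$, the copula density function for the Joe copula is given by

$$c_\phi(u,v) = w^{1/\phi - 2}\, (1-u)^{\phi-1} (1-v)^{\phi-1}\, (\phi - 1 + w).$$

This copula belongs to the Archimedean family, with $\varphi(t) = -\log[1 - (1-t)^\phi]$.

Plackett copula: Plackett [47] defined the following copula:

$$C_\phi(u,v) = \frac{1 + (\phi-1)(u+v) - \sqrt{[1 + (\phi-1)(u+v)]^2 - 4\phi(\phi-1)uv}}{2(\phi-1)},$$

where $\phi > 0$, $\phi \ne 1$. Independence between $T_1$ and $T_2$ corresponds to $\phi \to 1$, while for $\phi < 1$ we note negative correlation. The relationship between ϕ and the Spearman ρ correlation coefficient is

$$\rho = \frac{\phi+1}{\phi-1} - \frac{2\phi \log \phi}{(\phi-1)^2}.$$

The copula density function for the Plackett copula is given by

$$c_\phi(u,v) = \frac{\phi\, [1 + (\phi-1)(u + v - 2uv)]}{\{[1 + (\phi-1)(u+v)]^2 - 4\phi(\phi-1)uv\}^{3/2}}.$$

This copula does not belong to the families mentioned in this study. In fact, the literature considers that this function and its generalizations define a special family of copulas [19], [48].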
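The relationship between a dependence parameter and Kendall's tau can be checked by Monte Carlo simulation, as mentioned above for cases without a closed form. The following Python sketch does this for the Clayton copula, where the closed form $\tau = \phi/(\phi+2)$ is available for comparison. The function names, the sample size and the value $\phi = 2$ are illustrative assumptions, and the conditional-inversion sampler is a standard technique rather than a procedure taken from this study.

```python
import numpy as np

def clayton_cdf(u, v, phi):
    """Clayton copula C(u, v) for phi > 0."""
    return (u ** -phi + v ** -phi - 1.0) ** (-1.0 / phi)

def sample_clayton(n, phi, rng):
    """Draw n pairs (u, v) from the Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)  # value of the conditional CDF of V given U = u
    v = (u ** -phi * (w ** (-phi / (1.0 + phi)) - 1.0) + 1.0) ** (-1.0 / phi)
    return u, v

def kendall_tau(x, y):
    """Sample Kendall's tau from all pairwise sign comparisons (O(n^2))."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

rng = np.random.default_rng(0)   # fixed seed, illustrative only
phi = 2.0                        # hypothetical dependence parameter
u, v = sample_clayton(1500, phi, rng)
tau_mc = kendall_tau(u, v)       # Monte Carlo estimate
tau_theory = phi / (phi + 2.0)   # closed form for the Clayton copula
print(tau_mc, tau_theory)
```

The same pattern applies to any copula for which a sampler is available; only `sample_clayton` would need to be replaced.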
Bivariate models with long-term survivors
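From (1), the mixture formulations for the marginal survival functions in the bivariate lifetime case are given by

$$S_j(t_j) = p_j + (1 - p_j)\, S_{0j}(t_j), \quad j = 1, 2, \tag{10}$$

where $0 < p_j < 1$ and $S_{0j}(t_j)$ denotes the survival function for the susceptible individuals in the entire population. From (3) and (10), the joint survival function considering the mixture formulation is given by

$$S(t_1,t_2) = C_\phi\big(p_1 + (1-p_1) S_{01}(t_1),\; p_2 + (1-p_2) S_{02}(t_2)\big).$$

Let us define two indicator variables denoted by $V_1$ and $V_2$, with $V_j = 1$ for an individual susceptible in the lifetime $T_j$ and $V_j = 0$ for an individual immune or cured in $T_j$, $j = 1, 2$. In this way, $P(V_j = 0) = p_j$ and $P(V_j = 1) = 1 - p_j$. The observed data for any individual satisfy one of the following cases, in accordance with the susceptibility pattern:

The individual is not susceptible to both events of interest. The respective probability is $\psi_{00} = P(V_1 = 0, V_2 = 0) = p_1 p_2 + \omega$.

The individual is susceptible to event 1 but is not susceptible to event 2. The respective probability is $\psi_{10} = P(V_1 = 1, V_2 = 0) = (1-p_1) p_2 - \omega$.

The individual is not susceptible to event 1 but is susceptible to event 2. The respective probability is $\psi_{01} = P(V_1 = 0, V_2 = 1) = p_1 (1-p_2) - \omega$.

The individual is susceptible to both events. The respective probability is given by $\psi_{11} = P(V_1 = 1, V_2 = 1) = (1-p_1)(1-p_2) + \omega$.

Observe that ω is the covariance between the random variables $V_1$ and $V_2$, such that $\max\{-p_1 p_2,\, -(1-p_1)(1-p_2)\} \le \omega \le \min\{p_1(1-p_2),\, (1-p_1)p_2\}$ [49]. In addition, it is possible to note that $\psi_{00} + \psi_{01} = p_1$, $\psi_{00} + \psi_{10} = p_2$ and $\psi_{00} + \psi_{10} + \psi_{01} + \psi_{11} = 1$. The covariance ω is zero when there is independence between the probabilities of cure associated with the lifetimes $T_1$ and $T_2$. Wienke et al. [50] showed that the general bivariate survival function after integrating out the random variables $V_1$ and $V_2$ is given by

$$S(t_1,t_2) = \psi_{00} + \psi_{10} S_{01}(t_1) + \psi_{01} S_{02}(t_2) + \psi_{11} S_0(t_1,t_2), \tag{11}$$

where $S_0(t_1,t_2) = C_\phi(u,v)$ is the joint survival function of $T_1$ and $T_2$ for the susceptible individuals, given by the copula functions presented in subsection 2.1 considering $u = S_{01}(t_1)$ and $v = S_{02}(t_2)$.

The contribution of the i-th subject from a random sample $(t_{1i}, t_{2i}, \delta_{1i}, \delta_{2i})$, $i = 1,\ldots,n$, to the likelihood function is given by

$$L_i = \left[\frac{\partial^2 S(t_{1i},t_{2i})}{\partial t_{1i}\, \partial t_{2i}}\right]^{\delta_{1i}\delta_{2i}} \left[-\frac{\partial S(t_{1i},t_{2i})}{\partial t_{1i}}\right]^{\delta_{1i}(1-\delta_{2i})} \left[-\frac{\partial S(t_{1i},t_{2i})}{\partial t_{2i}}\right]^{(1-\delta_{1i})\delta_{2i}} \big[S(t_{1i},t_{2i})\big]^{(1-\delta_{1i})(1-\delta_{2i})}, \tag{12}$$

where $\delta_{ji}$ is a censoring indicator variable, that is, $\delta_{ji} = 1$ for an observed lifetime and $\delta_{ji} = 0$ for a censored lifetime. In this expression, from (11),

$$\frac{\partial^2 S}{\partial t_{1i}\, \partial t_{2i}} = \psi_{11}\, f_{01}(t_{1i})\, f_{02}(t_{2i})\, c_\phi(u,v), \quad -\frac{\partial S}{\partial t_{1i}} = f_{01}(t_{1i}) \left[\psi_{10} + \psi_{11} \frac{\partial C_\phi(u,v)}{\partial u}\right], \quad -\frac{\partial S}{\partial t_{2i}} = f_{02}(t_{2i}) \left[\psi_{01} + \psi_{11} \frac{\partial C_\phi(u,v)}{\partial v}\right],$$

considering $u = S_{01}(t_{1i})$ and $v = S_{02}(t_{2i})$, with $f_{0j}$ the density function of the susceptible individuals. The derivatives $\partial C_\phi(u,v)/\partial u$ and $\partial C_\phi(u,v)/\partial v$ are obtained by direct differentiation of each copula function studied (FGM, GFGM, FKFGM, Clayton, Burr, Gumbel-Hougaard, Gumbel-Barnett, Galambos, Frank, AMH, A12, Joe and Plackett). For the HKFGM1 copula, the derivatives are obtained by replacing p by 1 in the GFGM expressions; for the HKFGM2 copula, by replacing q by 1 in the GFGM expressions. The derivatives for the FKFGM, Galambos and A12 copulas involve the auxiliary quantities given by (6), (8) and (9), respectively.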
Bivariate Weibull model with long-term survivors
Let T be a random variable denoting the time to the occurrence of some event of interest. The standard two-parameter Weibull distribution, introduced by Fréchet [51] and described in detail in [52], has positive parameters α and β. Its hazard function may be increasing or decreasing depending on whether the shape parameter is greater or smaller than one, and the exponential distribution is obtained as a special case when the shape parameter is equal to one.

Assuming the bivariate model with long-term survivors given in equation (11), and considering Weibull survival functions $S_{01}(t_1)$ and $S_{02}(t_2)$ with parameters $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$, we have

$$S(t_1,t_2) = \psi_{00} + \psi_{10} S_{01}(t_1) + \psi_{01} S_{02}(t_2) + \psi_{11} C_\phi(u,v),$$

where $C_\phi$ is a survival copula function described in subsection 2.1, with $u = S_{01}(t_1)$ and $v = S_{02}(t_2)$.
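As an illustration of how the cure-fraction survival function and the four censoring patterns of the likelihood fit together, the following Python sketch evaluates them for the FGM copula. It is only a sketch under stated assumptions: the Weibull parameterization $S_0(t) = \exp\{-(t/\text{scale})^{\text{shape}}\}$, the function names and all numerical values are choices made for this example and are not taken from the study.

```python
import numpy as np

def weibull_surv(t, shape, scale):
    # Assumed parameterization: S0(t) = exp(-(t / scale) ** shape)
    return np.exp(-(t / scale) ** shape)

def weibull_pdf(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1.0) * weibull_surv(t, shape, scale)

def cure_probs(p1, p2, omega):
    """Return (psi00, psi10, psi01, psi11) from cure fractions and covariance."""
    return (p1 * p2 + omega,
            (1 - p1) * p2 - omega,
            p1 * (1 - p2) - omega,
            (1 - p1) * (1 - p2) + omega)

def joint_surv(t1, t2, w1, w2, p1, p2, omega, phi):
    """Bivariate survival function with long-term survivors (FGM copula)."""
    psi00, psi10, psi01, psi11 = cure_probs(p1, p2, omega)
    u, v = weibull_surv(t1, *w1), weibull_surv(t2, *w2)
    copula = u * v * (1 + phi * (1 - u) * (1 - v))
    return psi00 + psi10 * u + psi01 * v + psi11 * copula

def likelihood_contribution(t1, t2, d1, d2, w1, w2, p1, p2, omega, phi):
    """Likelihood contribution of one subject; d1, d2 are censoring indicators."""
    psi00, psi10, psi01, psi11 = cure_probs(p1, p2, omega)
    u, v = weibull_surv(t1, *w1), weibull_surv(t2, *w2)
    f1, f2 = weibull_pdf(t1, *w1), weibull_pdf(t2, *w2)
    if d1 and d2:   # both lifetimes observed: uses the copula density
        return psi11 * f1 * f2 * (1 + phi * (1 - 2 * u) * (1 - 2 * v))
    if d1:          # only T1 observed: uses dC/du
        return f1 * (psi10 + psi11 * v * (1 + phi * (1 - 2 * u) * (1 - v)))
    if d2:          # only T2 observed: uses dC/dv
        return f2 * (psi01 + psi11 * u * (1 + phi * (1 - u) * (1 - 2 * v)))
    return joint_surv(t1, t2, w1, w2, p1, p2, omega, phi)  # both censored

# Hypothetical parameter values, for illustration only
args = dict(w1=(1.2, 2.0), w2=(0.9, 3.0), p1=0.3, p2=0.5, omega=0.05, phi=0.5)
print(joint_surv(1.5, 2.0, **args))
print(likelihood_contribution(1.5, 2.0, 1, 0, **args))
```

For large times the survival function levels off at $\psi_{00} = p_1 p_2 + \omega$, which is the plateau associated with individuals cured for both events.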
Bivariate Kaplan-Meier estimator
Comparisons between model-based estimates and empirical estimates can be used to assess the fit of a model. Similarly to the study by Tovar Cuevas et al. [53], we examined how close the bivariate parametric survival function estimates from the proposed copula models are to the empirical estimates of the survival function based on the non-parametric bivariate Kaplan-Meier method. Several authors have suggested different non-parametric estimators for the bivariate survival curve, such as Cambell and Földes [54], Tsai et al. [55], Dabrowska et al. [56], Prentice and Cai [57], Van Der Laan [58] and Prentice [59]. However, these estimators do not have good numerical properties, or they are very complicated and difficult to use in practice. In this paper, we consider the Lin and Ying [60] estimator, due to its simple form and good numerical performance [61].

Let $(T_{1i}, T_{2i})$, $i = 1,\ldots,n$, be n independent and identically distributed pairs of failure times with joint survival function $S(t_1,t_2)$, and let $C_i$, $i = 1,\ldots,n$, be n independent and identically distributed censoring times following a univariate survival function $G(t)$. Let us assume that the $C_i$ are independent of the pairs of survival times $(T_{1i}, T_{2i})$, $i = 1,\ldots,n$. Thus, the observed data are $(Z_{1i}, Z_{2i}, \delta_{1i}, \delta_{2i})$, $i = 1,\ldots,n$, where $Z_{1i} = \min(T_{1i}, C_i)$, $Z_{2i} = \min(T_{2i}, C_i)$, $\delta_{1i} = I(T_{1i} \le C_i)$, $\delta_{2i} = I(T_{2i} \le C_i)$, and $I(\cdot)$ denotes the indicator function. In this way, Lin and Ying [60] suggested the following expression as an estimator of the joint survival function $S(t_1,t_2)$:

$$\hat{S}(t_1,t_2) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(Z_{1i} > t_1,\, Z_{2i} > t_2)}{\hat{G}(\max(t_1,t_2))},$$

where $\hat{G}$ is the Kaplan-Meier estimator of G based on $(Z_i^*, \delta_i^*)$, $i = 1,\ldots,n$, with $Z_i^* = \max(Z_{1i}, Z_{2i})$ and $\delta_i^* = 1 - \delta_{1i}\delta_{2i}$.
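The estimator can be sketched in a few lines of Python. The version below assumes the commonly stated form of the Lin and Ying estimator, in which the censoring survival function is estimated by Kaplan-Meier from the larger of the two observed times, with a censoring "event" whenever at least one lifetime is censored. The function names and the toy data are hypothetical.

```python
import numpy as np

def km_survival(times, events, t):
    """Kaplan-Meier estimate of P(X > t) from times and event indicators."""
    surv = 1.0
    for s in np.unique(times[events == 1]):  # sorted distinct event times
        if s > t:
            break
        at_risk = np.sum(times >= s)
        d = np.sum((times == s) & (events == 1))
        surv *= 1.0 - d / at_risk
    return surv

def lin_ying(z1, z2, d1, d2, t1, t2):
    """Bivariate survival estimate at (t1, t2) under univariate censoring."""
    zmax = np.maximum(z1, z2)
    cens_event = 1 - d1 * d2  # censoring time is observed unless both are events
    g_hat = km_survival(zmax, cens_event, max(t1, t2))
    if g_hat <= 0:
        return np.nan  # estimator undefined beyond the support of G-hat
    return np.mean((z1 > t1) & (z2 > t2)) / g_hat

# Toy data, for illustration only
z1 = np.array([1.0, 2.0, 3.0, 4.0])
z2 = np.array([2.0, 1.0, 4.0, 3.0])
d1 = np.array([1, 1, 1, 1])
d2 = np.array([1, 1, 1, 1])
print(lin_ying(z1, z2, d1, d2, 1.5, 1.5))
```

Evaluating `lin_ying` over a grid of $(t_1, t_2)$ values yields the empirical matrix that is compared with the model-based survival matrix in the distance measures described later.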
Model comparison criteria
Under a Bayesian approach, different models are often compared using the deviance information criterion (DIC) proposed by Spiegelhalter et al. [62]. The DIC value is given by

$$\mathrm{DIC} = D(\hat{\theta}) + 2 p_D,$$

where $D(\hat{\theta})$ is the deviance evaluated at the posterior mean $\hat{\theta}$ of the parameter of interest, obtained using MCMC simulation methods, and $p_D$ is the effective number of parameters in the model, with $p_D = \bar{D} - D(\hat{\theta})$, where $\bar{D} = E[D(\theta)]$ is the posterior mean of the deviance, which can be approximated in an MCMC simulation by the sample mean of the simulated values of $D(\theta)$. Smaller values of DIC indicate better model fit.

Another criterion for comparison among Bayesian models is the logarithm of the pseudo marginal likelihood (LPML). The LPML is obtained from the conditional predictive ordinates (CPO) [63]. For the i-th observation, the CPO is given by

$$\mathrm{CPO}_i = \int f(t_{1i}, t_{2i} \mid \theta)\, \pi(\theta \mid D_{(-i)})\, d\theta,$$

where θ is the complete vector of parameters, $D_{(-i)}$ is the sample without the i-th observation and $\pi(\theta \mid D_{(-i)})$ is the posterior density of θ given $D_{(-i)}$, $i = 1,\ldots,n$. Usually $\mathrm{CPO}_i$ does not have a closed form and is very complicated to calculate. However, an approximation of $\mathrm{CPO}_i$ based on MCMC methods [64] is given by

$$\widehat{\mathrm{CPO}}_i = \left\{\frac{1}{N} \sum_{k=1}^{N} \frac{1}{f(t_{1i}, t_{2i} \mid \theta^{(k)})}\right\}^{-1},$$

where N is the number of iterations of the MCMC procedure after a burn-in period used to eliminate the effect of the initial values of the iterative algorithm, and $\theta^{(k)}$ is the vector of samples obtained at the k-th iteration [65]. In this way, as a numerical criterion, the LPML value is obtained by

$$\mathrm{LPML} = \sum_{i=1}^{n} \log\big(\widehat{\mathrm{CPO}}_i\big).$$

Larger values of LPML indicate better fit of the model [66].
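Both criteria can be computed directly from MCMC output. The following Python sketch assumes that the deviance has been stored at each retained iteration and that the likelihood of each observation has been evaluated at each retained draw; the function names and array layout are assumptions made for this example.

```python
import numpy as np

def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from simulated deviances and D(theta_hat)."""
    d_bar = np.mean(deviance_samples)         # posterior mean of the deviance
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return deviance_at_posterior_mean + 2.0 * p_d, p_d

def lpml(likelihood_draws):
    """Return (LPML, CPO) from an array of shape (N iterations, n observations).

    Entry [k, i] is the likelihood of observation i at the k-th MCMC draw.
    """
    lik = np.asarray(likelihood_draws, dtype=float)
    cpo = 1.0 / np.mean(1.0 / lik, axis=0)    # harmonic mean over iterations
    return np.sum(np.log(cpo)), cpo

# Toy numbers, for illustration only
print(dic([10.0, 12.0, 14.0], 11.0))
print(lpml([[0.5, 0.2], [0.25, 0.2]]))
```

Because the CPO estimate is a harmonic mean, a few draws with very small likelihood values can dominate it, so in practice it is often computed on the log scale.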
Distance between two matrices
To discriminate among the different bivariate lifetime distributions with cure rates constructed using the copula functions introduced in subsection 2.1, we also propose the use of discrimination methods based on the distance between two matrices. Let $A = (a_{ij})$ and $B = (b_{ij})$ be two matrices of size $m \times n$ (m rows and n columns), and let $\|\cdot\|$ denote a matrix norm. Thus, $\|A - B\|$ measures the distance between the matrices A and B, denoted by $d(A, B)$. Several norms can be considered, such as the following:

Absolute sum: this norm is calculated as the sum of the absolute differences of the matrix elements, expressed by $d(A,B) = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij} - b_{ij}|$.

Frobenius: this norm is equivalent to the Euclidean norm for matrices, given by $d(A,B) = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij} - b_{ij})^2}$.

Maximum column sum: this norm is calculated as the maximum over columns of the sum of the absolute differences in each column, given by $d(A,B) = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij} - b_{ij}|$.

Maximum row sum: this norm is calculated as the maximum over rows of the sum of the absolute differences in each row, given by $d(A,B) = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij} - b_{ij}|$.

Maximum absolute: this norm is calculated as the maximum absolute difference of the matrix elements, given by $d(A,B) = \max_{i,j} |a_{ij} - b_{ij}|$.

In this study, a matrix was obtained from the bivariate Kaplan-Meier estimator (see subsection 2.4). This "empirical" matrix is compared with the fitted matrix obtained from each copula model. Lower distance values indicate more similar matrices.
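The five distances above can be computed with NumPy as in the following sketch. The function name and dictionary keys are hypothetical labels chosen to match the descriptions in the text; `numpy.linalg.norm` with `ord=1` and `ord=numpy.inf` gives the maximum column sum and maximum row sum for two-dimensional arrays.

```python
import numpy as np

def matrix_distances(a, b):
    """Distances between two equally sized matrices under five norms."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return {
        "absolute_sum": np.abs(d).sum(),
        "frobenius": np.linalg.norm(d, "fro"),
        "max_column_sum": np.linalg.norm(d, 1),
        "max_row_sum": np.linalg.norm(d, np.inf),
        "max_absolute": np.abs(d).max(),
    }

# Toy matrices, for illustration only
empirical = [[1.0, 0.8], [0.7, 0.5]]
fitted = [[1.0, 0.7], [0.6, 0.6]]
print(matrix_distances(empirical, fitted))
```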
A Bayesian approach
Assuming the proposed bivariate Weibull distributions with long-term survivors based on different copula functions (see subsection 2.1), under a Bayesian framework, the joint posterior distribution of the model parameters is obtained by combining the joint prior distribution of the parameters with the likelihood function given by equation (12). To simulate samples from the joint posterior distribution, we use MCMC (Markov chain Monte Carlo) algorithms implemented in the OpenBUGS software, where we only need to specify the data distribution and the prior distributions for the parameters of each assumed model.

Independent prior distributions are assumed for the parameters. For the parameters $\alpha_j$ and $\beta_j$ of the Weibull distributions, we assume gamma prior distributions, since these parameters are defined for positive values; that is, $\alpha_j \sim \mathrm{Gamma}(a, b)$ and $\beta_j \sim \mathrm{Gamma}(a, b)$, $j = 1, 2$, where a and b are known hyperparameters, and $\mathrm{Gamma}(a, b)$ denotes a gamma distribution with mean $a/b$ and variance $a/b^2$.

Beta prior distributions are assumed for the cure fractions $p_1$ and $p_2$, since these parameters are defined for values in the interval $(0, 1)$. That is, $p_j \sim \mathrm{Beta}(c, d)$, $j = 1, 2$, where $\mathrm{Beta}(c, d)$ denotes a beta distribution and c and d are known hyperparameters. Considering that ω is restricted to a bounded interval, a generalized beta prior distribution is assumed for the covariance parameter ω. Thus, $\omega \sim \mathrm{GB}(e, f)$, where GB denotes a generalized beta distribution with known hyperparameters e and f.

For the FGM copula and the AMH copula, the dependence parameter ϕ is assumed to follow a uniform prior distribution, that is, $\phi \sim U(-1, 1)$, which ensures that $-1 \le \phi \le 1$. For the GFGM copula, the parameters p and q are assumed to follow uniform prior distributions, and the dependence parameter ϕ follows a uniform prior distribution, that is, $\phi \sim U(m, n)$, where m and n are known hyperparameters chosen within the admissible range of ϕ. Under the HKFGM1 copula, the parameter q follows the same uniform prior distribution as assumed for the GFGM copula, and the dependence parameter ϕ follows a uniform prior distribution whose limits lie within the admissible range of ϕ for this copula. For the HKFGM2 copula, the parameter p follows the same uniform prior distribution as assumed for the GFGM copula, and the parameter ϕ follows a uniform prior distribution, that is, $\phi \sim U(m, n)$, with m and n within the admissible range of ϕ. Under the FKFGM copula, the dependence parameter ϕ follows the same uniform prior distribution assumed for the FGM copula, and the parameter p follows a uniform prior distribution.

Under the Clayton, Burr, Galambos, Frank and Plackett copulas, the dependence parameter ϕ is assumed to follow a gamma prior distribution. Under the GH, A12 and Joe copulas, a uniform prior distribution is assumed for the dependence parameter ϕ, and a uniform prior distribution is also assumed for the GB copula. All hyperparameters, including a, b, c and d, are assumed to be known. The hyperparameter values of the prior distributions for the parameter ϕ are selected according to the information about the correlation between $T_1$ and $T_2$ for each data set.

Posterior summaries for the parameters of interest were obtained using Markov chain Monte Carlo (MCMC) methods. We ran 200,000 iterations of the MCMC algorithm, and the first 20,000 iterations were discarded as burn-in samples in order to eliminate the effect of the initial values. In addition, every 50th simulated sample was stored to reduce the autocorrelation between successive MCMC samples. The 95% credible intervals (95% CI) for each parameter were constructed from the 2.5% and 97.5% quantiles of the corresponding simulated values. Chain convergence was verified by inspecting trace plots and by examining the autocorrelation between successive MCMC samples.
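The post-processing of a chain described above (burn-in, thinning and quantile-based credible intervals) can be sketched as follows in Python. The function name is hypothetical, and the simulated chain is a stand-in for the output of the sampler, used here only to show the bookkeeping.

```python
import numpy as np

def posterior_summary(chain, burn_in=20_000, thin=50):
    """Discard burn-in, keep every `thin`-th draw, and summarize the rest."""
    kept = np.asarray(chain, dtype=float)[burn_in::thin]
    lower, upper = np.quantile(kept, [0.025, 0.975])
    return {"n_kept": kept.size, "mean": kept.mean(), "ci95": (lower, upper)}

# Stand-in chain of 200,000 draws, for illustration only
rng = np.random.default_rng(1)
chain = rng.gamma(shape=2.0, scale=0.5, size=200_000)
summary = posterior_summary(chain)
print(summary["n_kept"], summary["mean"], summary["ci95"])
```

With 200,000 iterations, a burn-in of 20,000 and a thinning interval of 50, the summaries are based on 3,600 retained draws per parameter.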
Applications to real data
An application to a diabetic retinopathy data set
In order to evaluate the proposed methodology, we consider in this subsection a data set of bivariate lifetimes introduced by Group et al. [67]. This data set is related to the times to visual loss of 197 diabetic patients under 60 years of age who were followed up for a fixed period of time. In the study, each patient had one eye randomized to laser treatment, while the other eye received no treatment. For the bivariate analysis, $T_1$ is the time to visual loss for the treated eye, while $T_2$ is the time to visual loss for the untreated (control) eye. There were 43% censored observations for the treated eyes and 73% censored observations for the untreated eyes. Table 1 shows the percentages of censored data in the diabetic retinopathy data. Only 20% of the observations are uncensored in both $T_1$ and $T_2$, and the largest share of the data is censored in both times.
Table 1
Description of the presence of censoring in diabetic retinopathy data set.
| T1 | T2 | % |
|---|---|---|
| Censored data (δ1 = 0) | Censored data (δ2 = 0) | 40% |
| Completed data (δ1 = 1) | Censored data (δ2 = 0) | 32% |
| Censored data (δ1 = 0) | Completed data (δ2 = 1) | 8% |
| Completed data (δ1 = 1) | Completed data (δ2 = 1) | 20% |
Table 2 presents the posterior Bayesian estimates for the bivariate Weibull distribution based on each copula function described in subsection 2.1 for the diabetic retinopathy data set, assuming the prior distributions described in Section 3 with fixed hyperparameter values. For the parameter ϕ, a gamma prior distribution was assumed for the Clayton, Burr, Galambos, Frank and Plackett copulas, a uniform prior distribution for the GH and A12 copulas, and a uniform prior distribution for the GB copula. From the posterior means and the 95% credible intervals for the Weibull parameters, it is possible to see that the estimated values are close to each other for all assumed copula functions.
Table 2
Bayesian estimates for the parameters of the bivariate Weibull distribution with long-term survivors based on copula functions considering the diabetic retinopathy data set.
| Parameter | FGM: Mean | FGM: 95% CI | GFGM: Mean | GFGM: 95% CI | HKFGM1: Mean | HKFGM1: 95% CI | HKFGM2: Mean | HKFGM2: 95% CI | FKFGM: Mean | FKFGM: 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|
| α1 | 0.3773 | (0.2481, 0.5255) | 0.4065 | (0.2756, 0.5547) | 0.3970 | (0.2631, 0.5463) | 0.3755 | (0.2409, 0.5276) | 0.3835 | (0.2593, 0.5364) |
| α2 | 0.3235 | (0.1315, 0.5456) | 0.3166 | (0.1520, 0.5118) | 0.3631 | (0.1796, 0.5711) | 0.3025 | (0.1275, 0.5403) | 0.3626 | (0.1482, 0.5910) |
| β1 | 0.9463 | (0.7673, 1.1510) | 0.9454 | (0.7688, 1.1390) | 0.9471 | (0.7686, 1.143) | 0.9462 | (0.7656, 1.1540) | 0.9475 | (0.7742, 1.1490) |
| β2 | 0.9996 | (0.7057, 1.3520) | 1.0340 | (0.7494, 1.3550) | 1.0538 | (0.7433, 1.379) | 0.9812 | (0.6938, 1.3230) | 1.0310 | (0.7244, 1.3560) |
| ψ00 | 0.2347 | (0.0535, 0.3601) | 0.2482 | (0.0887, 0.3619) | 0.2511 | (0.0996, 0.3626) | 0.2280 | (0.0496, 0.3557) | 0.2501 | (0.0971, 0.3626) |
| ψ10 | 0.3176 | (0.0408, 0.5150) | 0.3226 | (0.0674, 0.4896) | 0.3529 | (0.1188, 0.5149) | 0.2995 | (0.0314, 0.4999) | 0.3382 | (0.0475, 0.5272) |
| ψ01 | 0.0579 | (0.0023, 0.1672) | 0.0624 | (0.0049, 0.1503) | 0.0521 | (0.0022, 0.1309) | 0.0610 | (0.0032, 0.1765) | 0.0498 | (0.0021, 0.1550) |
| ψ11 | 0.3898 | (0.2332, 0.7151) | 0.3668 | (0.2320, 0.6476) | 0.3437 | (0.2232, 0.5855) | 0.4115 | (0.2353, 0.7447) | 0.3618 | (0.2255, 0.6530) |
| ω | 0.0701 | (0.0128, 0.1351) | 0.0688 | (0.0153, 0.1289) | 0.0664 | (0.0154, 0.1245) | 0.0720 | (0.0114, 0.1431) | 0.0727 | (0.0186, 0.1412) |
| p1 | 0.2925 | (0.0927, 0.4316) | 0.3106 | (0.1369, 0.4409) | 0.3033 | (0.1222, 0.4321) | 0.2890 | (0.0867, 0.4333) | 0.3000 | (0.1285, 0.4342) |
| p2 | 0.5523 | (0.1770, 0.7184) | 0.5708 | (0.2405, 0.7132) | 0.6041 | (0.3344, 0.7247) | 0.5275 | (0.1639, 0.7134) | 0.5884 | (0.2366, 0.7289) |
| p | – | – | 2.9970 | (1.1110, 4.8740) | 1.0000 | – | 2.144 | (1.0360, 4.6220) | 5.0210 | (1.1800, 9.6880) |
| q | – | – | 3.2060 | (1.2950, 4.9040) | 1.8333 | (1.0540, 2.8920) | 1.0000 | – | – | – |
| ϕ | 0.5720 | (-0.1937, 0.9790) | 0.6585 | (-0.0587, 2.0260) | 1.4604 | (1.2730, 3.0810) | 0.5242 | (-0.0548, 0.7841) | 0.3801 | (-0.8119, 0.9805) |
| τ | 0.1907 | (-0.0646, 0.3263) | 0.1387 | (-0.0061, 0.2629) | 0.1101 | (-0.0215, 0.2390) | 0.1226 | (-0.0332, 0.3493) | 0.0494^a | (-0.0536, 0.1800) |
| ρ | 0.1271 | (-0.0430, 0.2176) | 0.2080 | (-0.0092, 0.3944) | 0.1652 | (-0.0323, 0.3585) | 0.2958 | (-0.0498, 0.2329) | 0.0741^a | (-0.0803, 0.2699) |
| DIC | 895.1 | | 894.6 | | 896.3 | | 895.8 | | 896.6 | |
| LPML | -441.7 | | -442.4 | | -442.0 | | -441.8 | | -442.0 | |

^a Correlations obtained by numerical integration according to equations (4) and (5). Bold values represent the better fit to the data.
According to the model selection criteria DIC and LPML, the estimated GFGM model has the best values for the LPML criterion, and the estimated Plackett and Clayton copula models have the best values for the DIC criterion. However, the values of DIC and LPML are very close for all copulas. The estimated GB copula model has different values for all discrimination criteria, presumably because of its negative dependence structure, which does not seem appropriate for this data set. The estimated values of the parameter ω, the covariance between the susceptibility indicator variables $V_1$ and $V_2$ for the lifetimes $T_1$ and $T_2$, are small, indicating a tendency towards independence between the cure fractions of the two lifetimes.

The correlation parameters τ and ρ appear to be close to zero. For comparison, DIC and LPML were also estimated for an independence copula function, with estimates based on the simulated Gibbs samples given respectively by DIC=897.4 and LPML=-442.9. These values indicate that a dependence structure is needed to jointly model $T_1$ and $T_2$; that is, the dependence between $T_1$ and $T_2$ is small, but different from zero. In fact, this can be confirmed by the 95% credible intervals for the copula dependence parameters, where it is observed that zero is not included in the credible intervals.

Table 3 presents the distance measures between the bivariate Kaplan-Meier matrix and the bivariate survival matrix estimated by a Weibull distribution with long-term survivors based on the different copula functions presented in subsection 2.1, for the diabetic retinopathy data set. From the results of Table 3, four copulas give the lowest distances. Overall, the A12 copula gives the best results for the absolute sum and Frobenius norms; however, unlike the other copulas, this copula is not appropriate for modelling small correlations. The GB copula gives the lowest value for the maximum absolute norm; however, this copula has only a negative dependence structure, which is not suitable for this real data set. Only the GFGM copula model gives both a small value for some norm and good values of the selection criteria.
Table 3
Distance between the bivariate Kaplan-Meier and estimated bivariate Weibull survival function points with long-term survivors based on copula functions considering the diabetic retinopathy data set.

| Copula function | Absolute Sum | Frobenius | Maximum Column Sum | Maximum Row Sum | Maximum Absolute |
|---|---|---|---|---|---|
| FGM | 1801.11 | 14.03 | 57.55 | 43.14 | 0.44 |
| GFGM | 1654.32 | 13.38 | 56.72 | 41.99 | 0.43 |
| HKFGM1 | 1732.09 | 13.56 | 55.76 | 42.20 | 0.43 |
| HKFGM2 | 1820.50 | 14.23 | 58.28 | 43.70 | 0.44 |
| FKFGM | 1837.15 | 13.98 | 56.45 | 42.49 | 0.43 |
| Clayton | 1539.06 | 13.18 | 56.79 | 43.11 | 0.43 |
| Burr | 1828.10 | 14.27 | 58.38 | 43.92 | 0.44 |
| GH | 1800.35 | 14.19 | 58.39 | 44.03 | 0.44 |
| GB | 2130.31 | 15.07 | 55.78 | 42.85 | 0.42 |
| Galambos | 1926.31 | 14.62 | 58.60 | 44.07 | 0.44 |
| Frank | 1706.60 | 13.88 | 57.93 | 43.98 | 0.44 |
| AMH | 1821.05 | 14.14 | 57.58 | 43.66 | 0.44 |
| A12 | 1464.70 | 12.97 | 56.52 | 43.33 | 0.43 |
| Joe | 1886.84 | 14.47 | 58.72 | 43.66 | 0.44 |
| Plackett | 1562.55 | 13.38 | 57.20 | 43.73 | 0.44 |

Bold values represent the smaller distance.
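The five distance columns can be read as matrix norms of the element-wise difference between the empirical and the fitted survival matrices. The sketch below assumes the standard definitions suggested by the column names (entrywise sum of absolute values, Frobenius norm, maximum column sum, maximum row sum, largest absolute entry); the function name and the toy matrices are hypothetical and do not come from the paper.

```python
import math

def distance_norms(empirical, fitted):
    """Norms of the element-wise difference between two equally sized
    matrices given as lists of rows."""
    diff = [[abs(e - f) for e, f in zip(e_row, f_row)]
            for e_row, f_row in zip(empirical, fitted)]
    return {
        "absolute_sum": sum(sum(row) for row in diff),
        "frobenius": math.sqrt(sum(v * v for row in diff for v in row)),
        "max_column_sum": max(sum(col) for col in zip(*diff)),
        "max_row_sum": max(sum(row) for row in diff),
        "max_absolute": max(v for row in diff for v in row),
    }

# Toy 2 x 2 survival matrices (hypothetical values).
km_matrix = [[1.0, 0.5], [0.5, 0.25]]
model_matrix = [[1.0, 0.75], [0.25, 0.25]]
norms = distance_norms(km_matrix, model_matrix)
```

Because the norms weight the discrepancies differently, a copula can rank first on one column and not on another, which is the pattern seen in the table.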
Fig. 4.1 presents the plots of the estimated bivariate Weibull survival functions based on the different copula functions in the presence of a cure fraction, together with the non-parametric Kaplan-Meier estimates, for the diabetic retinopathy data set. All copula functions produce fitted curves that are suitable for the data, and the estimated curves properly follow the empirical values at the cure fraction plateau.
Figure 4.1
Plots of the marginal survival functions estimated by the Kaplan-Meier method and assuming the bivariate Weibull distribution with long-term survivors based on copula functions considering the diabetic retinopathy data set. Treatment eye, panels (a), (c) and (e), and control eye, panels (b), (d) and (f).
For this application it is difficult to decide which copula function is the most appropriate for the data set, although the bivariate models based on the GFGM, HKFGM1, Clayton, A12 and Plackett copulas fitted better in general. Nevertheless, considering the comparison criteria, the distances and the estimated survival functions together, the GFGM copula model seems to be the most appropriate for this application: it shows better values of the comparative criteria and, in general, smaller distances between the empirical and estimated values.
An application to a cervical cancer data set
In this subsection, the bivariate Weibull distribution with long-term survivors is applied to a real medical data set from a study published by Brenna et al. [68]. In this study of invasive cervical cancer, 118 women received the standard treatment recommended by the International Federation of Gynecology and Obstetrics (FIGO). Disease-free survival (DFS) is defined as the time from the date of surgery to the first disease recurrence, and overall survival (OS) as the time from the date of surgery to death. In the bivariate analysis, T1 is the DFS time and T2 is the OS time. There are 48% censored observations in T1 and 53% in T2. Table 4 presents the percentages of censored survival data for the cervical cancer data set; about half of the survival times are censored in both T1 and T2.
Table 4
Description of the presence of censoring in cervical cancer data set.

| DFS time (T1) | OS time (T2) | % |
|---|---|---|
| Censored data (δ1 = 0) | Censored data (δ2 = 0) | 48% |
| Completed data (δ1 = 1) | Censored data (δ2 = 0) | 5% |
| Censored data (δ1 = 0) | Completed data (δ2 = 1) | 0% |
| Completed data (δ1 = 1) | Completed data (δ2 = 1) | 47% |
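The marginal censoring percentages quoted in the text (48% for T1 and 53% for T2) follow from the joint percentages of Table 4 by summing over the other lifetime. A minimal sketch of that arithmetic, with a hypothetical function name and the table values keyed by (δ1, δ2):

```python
# Joint percentages from Table 4, keyed by (delta1, delta2); 0 = censored.
joint_percent = {(0, 0): 48, (1, 0): 5, (0, 1): 0, (1, 1): 47}

def marginal_censoring(joint):
    """Percentage of censored observations in T1 and in T2."""
    censored_t1 = sum(p for (d1, _), p in joint.items() if d1 == 0)
    censored_t2 = sum(p for (_, d2), p in joint.items() if d2 == 0)
    return censored_t1, censored_t2

censored_t1, censored_t2 = marginal_censoring(joint_percent)
# censored_t1 is 48 + 0 and censored_t2 is 48 + 5
```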
Table 5 shows the Bayesian posterior estimates for the parameters of the bivariate Weibull distribution with long-term survivors based on the copula functions considered in this study, for the cervical cancer data. The posterior summaries of the parameters were obtained assuming the prior distributions introduced in section 3, with the hyperparameter values presented in subsection 2.1. In this application, separate prior distributions were considered for the parameter ϕ: one for the Clayton, Burr, Galambos, Frank and Plackett copulas, one for the GH and A12 copulas, and one for the GB copula.
Table 5
Bayesian estimates for the parameters of the bivariate Weibull distribution with long-term survivors based on copula functions considering the cervical cancer data set. Entries are the posterior mean followed by the 95% CI in parentheses.

| Parameter | FGM copula | GFGM copula | HKFGM1 copula | HKFGM2 copula | FKFGM copula |
|---|---|---|---|---|---|
| α1 | 0.8343 (0.6216, 1.0840) | 0.7441 (0.5551, 0.9661) | 0.8675 (0.6578, 1.112) | 0.7510 (0.5515, 0.9963) | 0.8426 (0.6316, 1.0970) |
| α2 | 0.2923 (0.1875, 0.4237) | 0.2615 (0.1709, 0.3728) | 0.2943 (0.1914, 0.4212) | 0.2311 (0.1448, 0.3335) | 0.2724 (0.1754, 0.3969) |
| β1 | 0.8004 (0.6577, 0.9511) | 0.7813 (0.6410, 0.9278) | 0.7880 (0.6502, 0.9324) | 0.8324 (0.6854, 0.9908) | 0.8071 (0.6594, 0.9602) |
| β2 | 1.3480 (1.1040, 1.6010) | 1.3460 (1.1010, 1.6060) | 1.3460 (1.1080, 1.6080) | 1.4180 (1.1610, 1.6940) | 1.3720 (1.1270, 1.6370) |
| ψ00 | 0.3871 (0.2926, 0.4884) | 0.3794 (0.2845, 0.4762) | 0.3858 (0.2893, 0.486) | 0.3799 (0.2859, 0.4817) | 0.3864 (0.2880, 0.4858) |
| ψ10 | 0.0263 (0.0012, 0.0769) | 0.0386 (0.0066, 0.0886) | 0.0399 (0.0080, 0.0908) | 0.0378 (0.0054, 0.0894) | 0.0383 (0.0066, 0.0890) |
| ψ01 | 0.0108 (0.0002, 0.0400) | 0.0108 (0.0003, 0.0397) | 0.0105 (0.0003, 0.0384) | 0.0109 (0.0003, 0.0396) | 0.0107 (0.0003, 0.0386) |
| ψ11 | 0.5757 (0.4736, 0.6743) | 0.5711 (0.4717, 0.6722) | 0.5637 (0.4627, 0.6647) | 0.5714 (0.4679, 0.6679) | 0.5646 (0.4618, 0.6661) |
| ω | 0.2203 (0.1879, 0.2426) | 0.2140 (0.1810, 0.2379) | 0.2147 (0.1818, 0.2379) | 0.2144 (0.1810, 0.2382) | 0.2154 (0.1819, 0.2390) |
| p1 | 0.3979 (0.3038, 0.4984) | 0.3903 (0.2944, 0.4879) | 0.3963 (0.2987, 0.4955) | 0.3908 (0.2952, 0.4911) | 0.3971 (0.2994, 0.4962) |
| p2 | 0.4135 (0.3125, 0.5167) | 0.4181 (0.3184, 0.5165) | 0.4258 (0.3250, 0.5268) | 0.4177 (0.3193, 0.5200) | 0.4247 (0.3244, 0.5269) |
| p | – | 3.1930 (1.6270, 4.8330) | 1.0000 (–) | 2.0060 (1.0460, 3.9640) | 1.6700 (1.0190, 3.6850) |
| q | – | 2.3260 (1.3530, 4.3390) | 1.5350 (1.1270, 2.245) | 1.0000 (–) | – |
| ϕ | 0.8960 (0.6434, 0.9971) | 0.7608 (0.4389, 1.3810) | 2.0880 (1.2730, 3.0810) | 0.5242 (0.2245, 0.9134) | 0.9133 (0.6796, 0.9974) |
| τ | 0.2987 (0.2145, 0.3324) | 0.2745 (0.1884, 0.3254) | 0.2100 (0.1893, 0.3845) | 0.2233 (0.1717, 0.2479) | 0.1744^a (0.0953, 0.2164) |
| ρ | 0.1991 (0.1430, 0.2216) | 0.4118 (0.2826, 0.4881) | 0.3151 (0.1262, 0.2563) | 0.3349 (0.2576, 0.3719) | 0.2613^a (0.1428, 0.3246) |
| DIC | 488.3 | 462.7 | 485.2 | 494.0 | 493.6 |
| LPML | -238.7 | -232.7 | -236.8 | -239.6 | -241.2 |

^a Correlations obtained by numerical integration according to the equations (4) and (5). Bold values represent the better fit to the data.
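The rows ψ00, ψ10, ψ01, ψ11, p1, p2 and ω of the table are linked by simple identities. The sketch below is an assumption-laden reading inferred from the tabulated values, not a restatement of the model equations: it assumes that the four ψ values are joint probabilities summing to one, that p1 = ψ00 + ψ01 and p2 = ψ00 + ψ10 are the marginal cure fractions, and that ω is the covariance of the two susceptibility indicators, which for such probabilities equals ψ00ψ11 − ψ10ψ01. The function name is hypothetical.

```python
def cure_summaries(psi00, psi10, psi01, psi11):
    """Plug-in marginal cure fractions and indicator covariance
    (assumed reading of the psi indices, see lead-in)."""
    p1 = psi00 + psi01
    p2 = psi00 + psi10
    omega = psi00 * psi11 - psi10 * psi01
    return p1, p2, omega

# Posterior means from the FGM column of the table above.
p1, p2, omega = cure_summaries(0.3871, 0.0263, 0.0108, 0.5757)
```

With the FGM posterior means, the plug-in values agree with the tabulated p1 and p2 to about three decimals. The plug-in ω is only approximate, because the posterior mean of a product is not the product of the posterior means.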
The DIC and LPML values are very close to each other for the Clayton, GH, Galambos, Frank, A12 and Plackett copulas (these copula functions allow for high correlations between the lifetimes T1 and T2). However, the smallest DIC and the largest LPML are obtained with the Frank copula.
The parameter ω, which measures the covariance between the cure fractions associated with the lifetimes T1 and T2, had estimates close to 0.2 under all copula functions, with 95% credible intervals concentrated around this value; this indicates dependence between the two lifetimes in the presence of a cure fraction. The estimated correlation coefficients τ and ρ are high, reaching values close to 0.8; accordingly, for the copulas that admit only weak dependence, the estimated correlations were in all cases close to their bounds. Furthermore, under all copula functions the estimates of the parameters p1 and p2 are close to each other.
For comparison, the model was also fitted with the independence copula; the estimates based on the simulated Gibbs samples were DIC = 508.7 and LPML = -249.1. These results indicate that a dependence structure based on copula functions is needed in the analysis of T1 and T2.
Table 6 shows the distance measures between the bivariate Kaplan-Meier matrix and the matrix of bivariate survival estimates fitted by the Weibull distribution with long-term survivors based on the different copula functions, for the cervical cancer data set. The fitted bivariate model based on the Clayton copula shows the smallest distances between the empirical and estimated survival values in all norms. This is consistent with Fig. 4.2, where the fitted curve from the Clayton copula follows the Kaplan-Meier curve most closely.
Table 6
Distance between the bivariate Kaplan-Meier matrix and estimated bivariate Weibull survival function points with long-term survivors based on copula functions considering the cervical cancer data set.

| Copula function | Absolute Sum | Frobenius | Maximum Column Sum | Maximum Row Sum | Maximum Absolute |
|---|---|---|---|---|---|
| FGM | 212.30 | 4.50 | 8.31 | 13.30 | 0.22 |
| GFGM | 190.38 | 4.24 | 8.15 | 13.06 | 0.22 |
| HKFGM1 | 208.70 | 4.47 | 8.41 | 13.23 | 0.22 |
| HKFGM2 | 191.66 | 4.23 | 8.12 | 12.92 | 0.22 |
| FKFGM | 209.99 | 4.49 | 8.43 | 13.26 | 0.22 |
| Clayton | 112.87 | 2.76 | 5.59 | 10.12 | 0.18 |
| Burr | 187.67 | 4.05 | 7.17 | 12.71 | 0.23 |
| GH | 154.19 | 3.52 | 6.66 | 11.85 | 0.21 |
| GB | 265.17 | 5.28 | 8.75 | 13.62 | 0.23 |
| Galambos | 158.73 | 3.62 | 6.87 | 12.06 | 0.21 |
| Frank | 155.53 | 3.42 | 6.29 | 11.34 | 0.20 |
| AMH | 216.59 | 4.54 | 8.46 | 13.21 | 0.22 |
| A12 | 139.67 | 3.33 | 6.60 | 11.46 | 0.19 |
| Joe | 189.26 | 4.07 | 6.96 | 12.72 | 0.23 |
| Plackett | 161.86 | 3.63 | 6.97 | 11.95 | 0.20 |

Bold values represent the smaller distance.
Figure 4.2
Plots of the marginal survival function estimated by the Kaplan-Meier method and assuming bivariate Weibull distribution with long-term survivors based on copula functions considering the cervical cancer data set. Time of DFS, panels (a), (c) and (e), and time of OS, panels (b), (d) and (f).
For some of the models based on copula functions, such as the Clayton, Galambos, Frank, A12 and Plackett copulas, the estimated survival curves approximate the Kaplan-Meier curve better than the other fitted models, especially at the right end; these copula functions also give the best LPML and DIC values and the smallest distances between matrices. Overall, the bivariate model based on the Clayton copula is the best model under every criterion considered in this study.
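The plateau that the fitted marginal curves approach at the right end can be illustrated with a mixture cure survival function. This is a minimal sketch that assumes the marginal form S(t) = p + (1 − p)·exp(−α·t^β), i.e. a baseline Weibull survival exp(−α·t^β) mixed with a cure fraction p; this parametrization is an assumption about what "standard Weibull" means here, and the function name and time points are hypothetical. The parameter values are the FGM posterior means for T1 from Table 5.

```python
import math

def cure_survival(t, alpha, beta, p):
    """Marginal survival with cure fraction p (assumed parametrization):
    S(t) = p + (1 - p) * exp(-alpha * t**beta)."""
    return p + (1.0 - p) * math.exp(-alpha * t ** beta)

# FGM posterior means for the DFS time T1 (Table 5).
alpha1, beta1, p1 = 0.8343, 0.8004, 0.3979

# Evaluate at a few illustrative time points in the data's time unit.
time_points = (0.0, 1.0, 5.0, 20.0)
curve = [(t, cure_survival(t, alpha1, beta1, p1)) for t in time_points]
```

The curve starts at 1 and decreases toward p1 rather than toward zero, which is the behaviour compared against the right end of the Kaplan-Meier curve.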
An application to a breast cancer data set
In this subsection, the proposed models are applied to a data set from a cohort of 97 patients who underwent surgical treatment for breast cancer and were followed up between 2000 and 2011. More details of this study can be found in Shigemizu et al. [69]. In this bivariate lifetime application, the disease-free survival (DFS) time and the overall survival (OS) time are denoted by T1 and T2, respectively. In the data set, 75% of the DFS times (T1) and 80% of the OS times (T2) are censored. Table 7 shows the percentages of censored survival data for the breast cancer data set; only 15% of the observations are uncensored in both T1 and T2, and most are censored in both lifetimes.
Table 7
Description of the presence of censoring in breast cancer data set.

| DFS time (T1) | OS time (T2) | % |
|---|---|---|
| Censored data (δ1 = 0) | Censored data (δ2 = 0) | 70% |
| Completed data (δ1 = 1) | Censored data (δ2 = 0) | 10% |
| Censored data (δ1 = 0) | Completed data (δ2 = 1) | 5% |
| Completed data (δ1 = 1) | Completed data (δ2 = 1) | 15% |
Table 8 shows the posterior summaries of interest for the bivariate Weibull distribution with long-term survivors based on the copula functions described in subsection 2.1, for the breast cancer data set, using the same hyperparameter values presented in subsection 4.2. The DIC and LPML values obtained with the Clayton copula indicate the best-fitting model. In this application, all the copula functions gave satisfactory estimates of the parameters p1 and p2. In addition, the model based on the Clayton copula gave the largest estimate of ω (close to 0.2), indicating dependence between the cure fractions of the lifetimes T1 and T2.
Table 8
Bayesian estimates for the parameters of the bivariate Weibull distribution in presence of long-term survivors based on copula functions considering the breast cancer data set. Entries are the posterior mean followed by the 95% CI in parentheses.

| Parameter | FGM copula | GFGM copula | HKFGM1 copula | HKFGM2 copula | FKFGM copula |
|---|---|---|---|---|---|
| α1 | 0.2821 (0.1402, 0.4774) | 0.2831 (0.1418, 0.4807) | 0.2970 (0.1519, 0.4973) | 0.2815 (0.1433, 0.4703) | 0.2779 (0.1372, 0.4678) |
| α2 | 0.1800 (0.0675, 0.3603) | 0.1703 (0.0681, 0.3445) | 0.1752 (0.0714, 0.3359) | 0.1734 (0.0621, 0.3554) | 0.1886 (0.0691, 0.3815) |
| β1 | 1.2270 (0.8630, 1.6240) | 1.2290 (0.8589, 1.6380) | 1.2320 (0.8607, 1.6120) | 1.2280 (0.8654, 1.6220) | 1.2300 (0.8557, 1.6360) |
| β2 | 1.2890 (0.7611, 1.9050) | 1.1740 (0.7026, 1.8580) | 1.1690 (0.7103, 1.8000) | 1.3180 (0.7593, 1.9890) | 1.3290 (0.7925, 1.9450) |
| ψ00 | 0.6697 (0.5887, 0.7781) | 0.6663 (0.5624, 0.7594) | 0.6681 (0.5659, 0.7573) | 0.6708 (0.5691, 0.7654) | 0.6708 (0.5719, 0.7616) |
| ψ10 | 0.0871 (0.0084, 0.1752) | 0.0678 (0.0025, 0.1699) | 0.0693 (0.0032, 0.1634) | 0.0894 (0.0082, 0.1738) | 0.0954 (0.0140, 0.1784) |
| ψ01 | 0.0181 (0.0005, 0.0660) | 0.0196 (0.0005, 0.0741) | 0.0004 (0.0022, 0.0686) | 0.0184 (0.0005, 0.0701) | 0.0166 (0.0004, 0.0580) |
| ψ11 | 0.2252 (0.1370, 0.3414) | 0.2463 (0.1421, 0.3698) | 0.2442 (0.1477, 0.3586) | 0.2214 (0.1298, 0.3361) | 0.2172 (0.1327, 0.3317) |
| ω | 0.1477 (0.0945, 0.2092) | 0.1610 (0.0971, 0.2202) | 0.1603 (0.1014, 0.2186) | 0.1453 (0.0907, 0.2060) | 0.1426 (0.0917, 0.2039) |
| p1 | 0.6977 (0.5887, 0.7781) | 0.6959 (0.5835, 0.7800) | 0.6965 (0.5871, 0.7760) | 0.6992 (0.5895, 0.7795) | 0.6974 (0.5878, 0.7776) |
| p2 | 0.7567 (0.6344, 0.8494) | 0.7341 (0.6048, 0.8432) | 0.7375 (0.6140, 0.8409) | 0.7602 (0.6372, 0.8572) | 0.7663 (0.6498, 0.8525) |
| p | – | 2.3560 (1.0530, 4.6600) | 1.0000 (–) | 2.3420 (1.0360, 4.7000) | 5.3590 (1.1790, 9.7210) |
| q | – | 3.2920 (1.2350, 4.9200) | 1.9670 (1.0540, 2.8920) | 1.0000 (–) | – |
| ϕ | 0.4034 (-0.2531, 0.3278) | 0.8012 (-0.8887, 2.8400) | 1.5960 (1.2730, 3.0810) | 0.2364 (-0.2902, 0.7528) | 0.1779 (-0.9092, 0.9737) |
| τ | 0.0896 (-0.1687, 0.2185) | 0.1003 (-0.1525, 0.2730) | 0.0815 (0.0804, 0.7528) | 0.1127 (-0.0969, 0.2381) | 0.0215^a (-0.0886, 0.1622) |
| ρ | 0.1345 (-0.7593, 0.9834) | 0.1504 (-0.2288, 0.4095) | 0.1223 (-0.0536, 0.5019) | 0.1691 (-0.1454, 0.3571) | 0.0322^a (-0.1327, 0.2428) |
| DIC | 330.5 | 325.1 | 327.9 | 330.7 | 331.0 |
| LPML | -159.8 | -159.3 | -158.6 | -159.7 | -160.2 |

^a Correlations obtained by numerical integration according to the equations (4) and (5). Bold values represent the better fit to the data.
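For the classical FGM copula the two correlation measures have well-known closed forms in the dependence parameter, τ = 2ϕ/9 and ρ = ϕ/3 for ϕ in [−1, 1], so the FGM column of the table can be checked by hand. The sketch below assumes that the paper's FGM copula is this textbook one-parameter form; the function names are hypothetical.

```python
def fgm_kendall_tau(phi):
    """Kendall's tau of the classical FGM copula: tau = 2*phi/9."""
    if not -1.0 <= phi <= 1.0:
        raise ValueError("FGM parameter must lie in [-1, 1]")
    return 2.0 * phi / 9.0

def fgm_spearman_rho(phi):
    """Spearman's rho of the classical FGM copula: rho = phi/3."""
    if not -1.0 <= phi <= 1.0:
        raise ValueError("FGM parameter must lie in [-1, 1]")
    return phi / 3.0

# Posterior mean of phi in the FGM column of the table above.
phi_hat = 0.4034
tau_hat = fgm_kendall_tau(phi_hat)   # rounds to 0.0896
rho_hat = fgm_spearman_rho(phi_hat)  # rounds to 0.1345
```

The same formulas give the bounds |τ| ≤ 2/9 and |ρ| ≤ 1/3, which is why a copula of this type cannot represent strong correlation between the two lifetimes.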
The estimated correlation parameters τ and ρ varied widely depending on the assumed copula function. However, for the copula functions with the best DIC and LPML values (Clayton, Frank, A12 and Plackett), the Kendall and Spearman correlation measures were large (about 0.7 and 0.8, respectively), indicating a strong correlation between T1 and T2. Under the copula functions of the GFGM family, the credible interval of the parameter ϕ is very wide, covering practically the whole parameter space of ϕ, so that the intervals for τ and ρ also become very wide; this could be a result of the high percentage of censored survival times in the data set.
For comparison, the model was also fitted with the independence copula; based on the simulated Gibbs samples, DIC = 331.6 and LPML = -160.4 were obtained. This indicates that a dependence structure is needed to jointly model T1 and T2, especially when compared with the Clayton and Frank copulas. It is also worth mentioning that, according to the obtained estimates, the correlation between T1 and T2 is high (larger than 0.6).
Table 9 displays the distances between the bivariate empirical Kaplan-Meier matrix and the bivariate survival estimates assuming a Weibull distribution in the presence of long-term survivors based on the different copula functions, for the breast cancer data set. The Frank copula has the smallest distances in two norms, the absolute sum and Frobenius norms, an indication of better fit that is also suggested by the LPML and DIC values. Nevertheless, it is interesting to note that the Joe copula, although not indicated as the best-fitting model by DIC or LPML, has the smallest distances in three norms.
Table 9
Distance between the bivariate Kaplan-Meier matrix and estimated bivariate survival function assuming a Weibull distribution with long-term survivors based on copula functions considering the breast cancer data set.

| Copula function | Absolute Sum | Frobenius | Maximum Column Sum | Maximum Row Sum | Maximum Absolute |
|---|---|---|---|---|---|
| FGM | 672.14 | 9.63 | 45.08 | 24.78 | 0.68 |
| GFGM | 678.60 | 9.64 | 45.17 | 24.58 | 0.67 |
| HKFGM1 | 710.20 | 9.85 | 45.20 | 24.28 | 0.67 |
| HKFGM2 | 662.22 | 9.57 | 45.08 | 24.88 | 0.68 |
| FKFGM | 668.66 | 9.61 | 44.99 | 24.82 | 0.68 |
| Clayton | 594.54 | 8.82 | 44.28 | 24.10 | 0.68 |
| Burr | 665.52 | 9.49 | 44.95 | 24.28 | 0.67 |
| GH | 648.15 | 9.10 | 44.07 | 23.03 | 0.67 |
| GB | 665.28 | 9.62 | 44.89 | 24.97 | 0.68 |
| Galambos | 653.86 | 9.18 | 44.24 | 23.18 | 0.67 |
| Frank | 587.45 | 8.77 | 44.20 | 23.95 | 0.68 |
| AMH | 665.20 | 9.60 | 45.36 | 25.02 | 0.67 |
| A12 | 622.08 | 9.07 | 44.62 | 24.25 | 0.68 |
| Joe | 743.36 | 9.71 | 43.35 | 21.09 | 0.65 |
| Plackett | 605.11 | 8.95 | 44.54 | 24.29 | 0.68 |

Bold value represents the smaller distance.
Fig. 4.3 presents the plots of the survival functions estimated by the Weibull distribution in the presence of long-term survivors based on the different copula functions, together with the survival function estimated by the Kaplan-Meier method, for the breast cancer data set. The survival curves fitted from the copula functions are very close to the empirical Kaplan-Meier curve, which suggests that the estimated models are adequate for this data set. From the plots it is possible to note that, for one of the lifetimes, the curves estimated from the copula functions lie slightly below the cure fraction.
Figure 4.3
Plots of survival function estimated by the Kaplan-Meier method and assuming bivariate Weibull distributions with long-term survivors based on copula functions considering the breast cancer data set. Time of DFS, panels (a), (c) and (e), and time of OS, panels (b), (d) and (f).
Overall, the choice of an appropriate copula function for this data set is less clear than in the other applications. Among the copulas considered, the most relevant were the Clayton, Frank and Plackett copulas. In summary, the Frank copula could be considered the best-fitting model, since it has good DIC and LPML values and smaller distances between the empirical and estimated lifetime matrices.
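To give a sense of what a Kendall correlation of about 0.7, as reported for the better-fitting copulas in this application, means for a copula parameter, the Clayton copula is a convenient case because τ has a closed form. This sketch assumes the textbook Clayton parametrization C(u, v) = (u^(−θ) + v^(−θ) − 1)^(−1/θ) with θ > 0, for which τ = θ/(θ + 2); whether the paper uses the same parametrization is an assumption, and the function names are hypothetical.

```python
def clayton_copula(u, v, theta):
    """Textbook Clayton copula for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_tau(theta):
    """Kendall's tau of the Clayton copula: tau = theta / (theta + 2)."""
    return theta / (theta + 2.0)

def clayton_theta_from_tau(tau):
    """Inverse of clayton_tau for 0 < tau < 1."""
    return 2.0 * tau / (1.0 - tau)

# A Kendall correlation of about 0.7 corresponds to theta of roughly 4.67.
theta = clayton_theta_from_tau(0.7)
joint_at_half = clayton_copula(0.5, 0.5, theta)
```

Under independence the joint value at (0.5, 0.5) would be 0.25; a value well above that reflects strong positive dependence.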
Conclusion
In this paper, different copula functions were considered to construct bivariate standard Weibull lifetime distributions in the presence of long-term survivors, and the resulting models were applied to three real data sets. The Weibull distribution was chosen due to its versatility and relative simplicity, but the proposed methodology could be adapted to other continuous distributions assumed for lifetime data. The models considered in this study were based on marginal distributions with only two parameters and showed reasonable fit in all applications: the estimated survival curves were close to the corresponding empirical Kaplan-Meier survival curves. Good estimates of the cure fractions were also obtained in all applications; the estimated values of the parameters p1 and p2 from the proposed bivariate model were consistent with the values observed in the real data in each application. The considered models were also able to measure the correlation between the two lifetimes and, satisfactorily, the covariance between the cure fractions associated with the lifetimes T1 and T2.
In this study, it was observed that, among copula functions that can adequately capture the correlation structure between T1 and T2, the fit of the estimated models to the real data is very similar, making it difficult to identify the best copula using DIC, LPML or even graphical comparison. The comparison of the distances between the matrix of bivariate survival estimates (obtained by the bivariate Kaplan-Meier estimator presented in section 2.4) and the matrix of survival estimates from the copula models provides an additional criterion to decide which copula gives the best fit. The choice of the most suitable copula function is usually complicated and varies with the data set; moreover, among the various copula functions presented in the literature, some can produce equivalent results. It is therefore recommended to use several selection procedures in each application to decide which copula is the best, as was done in this study. The empirical bivariate Kaplan-Meier estimator considered in this study is easy to interpret and implement. However, at some lifetimes (especially the rightmost times) this estimator tends to estimate the survival fraction as near zero, ignoring a possible cure fraction at these points; this is evidenced by the maximum absolute distances, which occurred close to the final lifetimes. Even so, the comparison between the distances of the empirical and estimated matrices gives an additional way to compare the fit of the proposed models.
The Bayesian approach with MCMC methods was adopted due to its flexibility and effectiveness in estimating the parameters of more complex models. The use of prior distributions also makes it possible to restrict the estimates to the corresponding parameter space. The best convergence of the MCMC algorithm was observed when the lifetimes T1 and T2 are not very large, that is, with values smaller than 20. Even with the simple two-parameter standard Weibull distribution as the marginal distribution, an appropriate choice of initial values for the iterative simulation algorithm and a careful choice of the hyperparameters of the prior distributions are required, according to the structure of the model and the data. The algorithms used in the computational simulation approach for each proposed model are easy to implement using R and OpenBUGS, both free software. The OpenBUGS code used in this research is available as supplementary material with the online version of this paper.
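The remark that convergence was best for lifetimes smaller than 20 suggests rescaling the time unit before fitting. The sketch below shows how the Weibull parameters change under such a rescaling. It assumes the baseline form S0(t) = exp(−α·t^β), in which dividing all times by a constant c leaves β unchanged and replaces α by α·c^β; the parametrization, function names and numerical values are hypothetical illustrations, not taken from the paper.

```python
import math

def weibull_survival(t, alpha, beta):
    """Assumed baseline form S0(t) = exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)

def rescale_alpha(alpha, beta, c):
    """Value of alpha after every time is divided by c (beta is unchanged)."""
    return alpha * c ** beta

# Hypothetical example: times recorded in months, rescaled to years (c = 12).
alpha_months, beta, c = 0.02, 1.3, 12.0
alpha_years = rescale_alpha(alpha_months, beta, c)

t_months = 30.0
s_original = weibull_survival(t_months, alpha_months, beta)
s_rescaled = weibull_survival(t_months / c, alpha_years, beta)
# Both evaluations describe the same survival probability.
```

Estimates obtained on the rescaled data can therefore be mapped back to the original time unit without refitting.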
Declarations
Author contribution statement
Marcos Vinicius de Oliveira Peres, Jorge Alberto Achcara, Edson Zangiacomi Martinez: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Competing interest statement
The authors declare no conflict of interest.
Additional information
Supplementary content related to this article has been published online at https://doi.org/10.1016/j.heliyon.2020.e03961.