
Bayesian Computational Methods for Sampling from the Posterior Distribution of a Bivariate Survival Model, Based on AMH Copula in the Presence of Right-Censored Data.

Erlandson Ferreira Saraiva¹, Adriano Kamimura Suzuki², Luis Aparecido Milan³.

Abstract

In this paper, we study the performance of Bayesian computational methods to estimate the parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with marginal distributions given by Weibull distributions. The estimation procedure was based on Markov chain Monte Carlo (MCMC) algorithms. We present three versions of the Metropolis–Hastings algorithm: Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings with a natural candidate generating density (MH). Since the creation of a good candidate generating density in IMH and RWM may be difficult, we also describe how to update a parameter of interest using the slice sampling (SS) method. A simulation study was carried out to compare the performances of IMH, RWM and SS, using the sample root mean square error as the indicator of performance. Results obtained from the simulations show that the SS algorithm is an effective alternative to the IMH and RWM methods when simulating values from the posterior distribution, especially for small sample sizes. We also applied these methods to a real data set.


Keywords: Ali–Mikhail–Haq copula; Bayesian inference; MCMC; Metropolis-Hastings; slice sampling

Year: 2018 PMID: 33265731 PMCID: PMC7513167 DOI: 10.3390/e20090642

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524

1. Introduction

In survival studies, it is common to observe two or more lifetimes for the same client, patient or equipment. For instance, in a bivariate scenario, the lifetimes of a pair of organs can be observed, such as a pair of kidneys, liver, or eyes in patients; or the lifetimes of engines in a twin-engine airplane. These variables are usually correlated and we are interested in the bivariate model that considers the dependence between them. The copula model is useful for modeling this kind of bivariate data. It has been used in several articles, including the following: [1] describes a comparison between bivariate frailty models, and models based on bivariate exponential and Weibull distributions; [2] proposes a copula model to study the association between survival time of individuals infected with HIV and persistence time of infection; [3] models the association of bivariate failure times by copula functions, and investigates two-stage parametric and semi-parametric procedures; and [4] considers a Gaussian copula model and estimates the copula association parameter using a two-stage estimation procedure. According to [5,6], a copula is a joint distribution function of random variables for which the marginal probability distribution of each variable is uniform on the interval $[0, 1]$. There are many parametric copula families in the literature, each one representing a different dependence structure between the random variables. One advantage of a copula model is its simplicity when applied to model bivariate data. This is explored by many authors in survival analysis. Among them are: Romeo et al. [7] and da Cruz et al. [8], who considered the Archimedean copula family; Louzada et al. [9] and Suzuki et al. [10], who considered the Farlie–Gumbel–Morgenstern (FGM) copula; and Romeo et al. [11], who considered the two-parameter Archimedean family of power variance function (PVF) copulas.
In this paper, we apply the Ali–Mikhail–Haq (AMH) copula to model bivariate survival data with random right-censored observations. From a practical point of view, the main reason for using the AMH copula is that it is an Archimedean copula that allows both positive and negative values for the dependence parameter, and whose mathematical formula is simpler than other Archimedean copulas. Another advantage is that, assuming the AMH copula, the Kendall rank-order correlation $\tau$ between the bivariate lifetimes is a monotonic function of the dependence parameter $\phi$. According to [12], Kendall's $\tau$ can range from (approximately) $-0.1817$ to $0.3333$, with $\tau = 0$ when $\phi = 0$; and the Spearman's $\rho$ associated with $\phi$ can range (approximately) from $-0.2711$ to $0.4784$, indicating that the AMH copula is adequate for modeling bivariate data with a weak correlation. In order to proceed with the copula model it is necessary to specify the marginal distributions. At this point, several probability distributions could be considered. Generally, the choice of marginal distributions depends on the application. We restrict our analysis to the case where the marginal distributions are Weibull distributions, because the Weibull is a very flexible distribution for modeling various types of lifetime data. In addition, the parametrization of the Weibull distribution, as well as the mathematical expression of the AMH copula, is very attractive from the mathematical point of view, allowing the development of a Bayesian approach to estimate the parameters of interest in a clear and concise way. As the conditional posterior distributions for the parameters of interest do not follow any familiar distribution, the estimation procedure was carried out using versions of the Metropolis–Hastings algorithm, referred to here as Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings (MH).
MH refers to the Metropolis–Hastings algorithm with a natural candidate generating density whose parameters depend on the hyperparameter values and the observed data. Since the creation of a good candidate generating density in IMH and RWM can be difficult, we also used the slice sampling (SS) algorithm [13]. Combining IMH, RWM, MH and SS in different ways, we developed three MCMC algorithms to estimate the model parameters. A simulation study was carried out with the objective of investigating the behavior of each algorithm. The data sets were generated by considering different sample sizes and percentages of right-censored observations. Based on the root mean square error (RMSE), we identified the algorithms with the best performances when estimating the model parameters. We also compared the performances of the three algorithms using the effective sample size and the integrated autocorrelation time [14]. Results obtained from these simulations show that the algorithm that uses SS is an effective alternative to standard MCMC methods (IMH and RWM) when simulating values from the posterior distribution of the model parameters, especially when the sample size is small. We applied the three proposed algorithms to a real data set. This data set is related to diabetic retinopathy, described in The Diabetic Retinopathy Study Research Group [15], and is available in the survival package [16] of the R software [17]. For this case, we compared the performance of the algorithms. Comparison was based on the RMSE relative to the empirical distribution function obtained from Kaplan–Meier estimates. The remainder of the paper is organized as follows. In Section 2, we introduce the bivariate survival model based on the AMH copula with Weibull marginal distributions. The Bayesian approach and the three MCMC algorithms are described in Section 3. In Section 4, the simulation study is reported. In Section 5, we apply the three algorithms to the real data set. Section 6 summarizes our findings.
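The weak-dependence range quoted above for Kendall's $\tau$ can be checked numerically. The sketch below assumes the standard closed form for the AMH copula, $\tau(\phi) = \frac{3\phi - 2}{3\phi} - \frac{2(1-\phi)^2 \ln(1-\phi)}{3\phi^2}$; the function name `amh_kendall_tau` is illustrative and is not taken from the paper.

```python
import math


def amh_kendall_tau(phi: float) -> float:
    """Kendall's tau of the AMH copula for -1 <= phi < 1 (standard closed form)."""
    if not -1.0 <= phi < 1.0:
        raise ValueError("phi must lie in [-1, 1)")
    if abs(phi) < 1e-4:
        # Second-order Taylor expansion; avoids cancellation near phi = 0.
        return 2.0 * phi / 9.0 + phi ** 2 / 18.0
    return (3.0 * phi - 2.0) / (3.0 * phi) - (
        2.0 * (1.0 - phi) ** 2 * math.log(1.0 - phi) / (3.0 * phi ** 2)
    )


for phi in (-1.0, -0.5, 0.0, 0.5, 0.999999):
    print(f"phi = {phi:+.6f}  tau = {amh_kendall_tau(phi):+.4f}")
```

At $\phi = -1$ the formula reduces to $(5 - 8\ln 2)/3$, and as $\phi \to 1$ it tends to $1/3$, which matches the approximate bounds given in the text.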

2. Bivariate Survival Model and Observed Data

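Let $T = (T_1, T_2)$ be the vector of bivariate lifetimes of an item (or an individual), with marginal density functions $f_j(t_j \mid \theta_j)$ and survival functions $S_j(t_j \mid \theta_j)$, $j = 1, 2$, where $\theta_1$ and $\theta_2$ are unknown parameters (scalars or vectors). Consider that $T$ comes from the copula $C_\phi$, where $\phi$ is a parameter describing the dependence between $T_1$ and $T_2$. Then the joint survival function for $T$ is given by

$$S(t_1, t_2 \mid \theta) = C_\phi\big(S_1(t_1 \mid \theta_1),\, S_2(t_2 \mid \theta_2)\big),$$

where $\theta = (\theta_1, \theta_2, \phi)$ and $\phi$ is a dependence parameter. We also assume that the copula $C_\phi$ is given by the Ali–Mikhail–Haq copula [18]. Thus, we have

$$S(t_1, t_2 \mid \theta) = \frac{S_1(t_1 \mid \theta_1)\, S_2(t_2 \mid \theta_2)}{1 - \phi\,[1 - S_1(t_1 \mid \theta_1)]\,[1 - S_2(t_2 \mid \theta_2)]} \tag{1}$$

for $-1 \le \phi < 1$. Note that under this assumption the survival functions and the dependence structure can be visualized separately, with the dependence structure represented by the copula.

Let $T_i = (T_{i1}, T_{i2})$ and $C_i = (C_{i1}, C_{i2})$ be a sample of size $n$ of bivariate lifetimes and bivariate censoring times, respectively. Suppose $T_i$ and $C_i$ are independent, for $i = 1, \ldots, n$. Consider $t_{ij} = \min(T_{ij}, C_{ij})$, the $i$-th observed value, and $\delta_{ij} = I(T_{ij} \le C_{ij})$, a censorship indicator, for $i = 1, \ldots, n$ and $j = 1, 2$. We denote the observed values using $t = (t_1, t_2)$ and $\delta = (\delta_1, \delta_2)$, where $t_j = (t_{1j}, \ldots, t_{nj})$ and $\delta_j = (\delta_{1j}, \ldots, \delta_{nj})$. The likelihood function for $\theta$, given $(t, \delta)$, is (see Lawless, [19])

$$L(\theta \mid t, \delta) = \prod_{i=1}^{n} f(t_{i1}, t_{i2} \mid \theta)^{\delta_{i1}\delta_{i2}} \left[-\frac{\partial S(t_{i1}, t_{i2} \mid \theta)}{\partial t_{i1}}\right]^{\delta_{i1}(1-\delta_{i2})} \left[-\frac{\partial S(t_{i1}, t_{i2} \mid \theta)}{\partial t_{i2}}\right]^{(1-\delta_{i1})\delta_{i2}} S(t_{i1}, t_{i2} \mid \theta)^{(1-\delta_{i1})(1-\delta_{i2})}, \tag{2}$$

where $f(t_{i1}, t_{i2} \mid \theta) = \partial^2 S(t_{i1}, t_{i2} \mid \theta)/\partial t_{i1}\,\partial t_{i2}$ is the joint probability density function for $(T_{i1}, T_{i2})$ and $S$ is the joint survival function given by (1), for $i = 1, \ldots, n$. From Equation (1), writing $S_j = S_j(t_{ij} \mid \theta_j)$, $f_j = f_j(t_{ij} \mid \theta_j)$ and $F_j = 1 - S_j$, we have

$$-\frac{\partial S}{\partial t_{i1}} = \frac{f_1 S_2\,(1 - \phi F_2)}{(1 - \phi F_1 F_2)^2}, \qquad -\frac{\partial S}{\partial t_{i2}} = \frac{f_2 S_1\,(1 - \phi F_1)}{(1 - \phi F_1 F_2)^2},$$

$$f(t_{i1}, t_{i2} \mid \theta) = \frac{f_1 f_2\,\big[1 + \phi\,(S_1 + S_2 + S_1 S_2 - 2) + \phi^2 F_1 F_2\big]}{(1 - \phi F_1 F_2)^3},$$

where $F_j$ is the cumulative distribution function for $T_j$ and $j = 1, 2$.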

Weibull Marginal Distribution

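Assume that the marginal distributions for $T_1$ and $T_2$ are given by Weibull distributions [20], i.e., $T_j \sim \text{Weibull}(\alpha_j, \beta_j)$ with shape parameter $\alpha_j$ and scale parameter $\beta_j$ [21], each one having probability density function

$$f(t_j \mid \alpha_j, \beta_j) = \alpha_j \beta_j\, t_j^{\alpha_j - 1} \exp\big(-\beta_j t_j^{\alpha_j}\big)$$

for $t_j > 0$ and $j = 1, 2$. The survival function and hazard function are

$$S(t_j \mid \alpha_j, \beta_j) = \exp\big(-\beta_j t_j^{\alpha_j}\big) \quad \text{and} \quad h(t_j \mid \alpha_j, \beta_j) = \alpha_j \beta_j\, t_j^{\alpha_j - 1},$$

respectively, where $\alpha_j > 0$ and $\beta_j > 0$ for $j = 1, 2$. Thus, the joint survival function in (1) is

$$S(t_1, t_2 \mid \theta) = \frac{\exp\big(-\beta_1 t_1^{\alpha_1} - \beta_2 t_2^{\alpha_2}\big)}{1 - \phi\,\big[1 - \exp(-\beta_1 t_1^{\alpha_1})\big]\big[1 - \exp(-\beta_2 t_2^{\alpha_2})\big]},$$

where $\theta = (\alpha_1, \beta_1, \alpha_2, \beta_2, \phi)$. The likelihood function for $\theta$ is

$$L(\theta \mid t, \delta) = \left[\prod_{j=1}^{2} \alpha_j^{d_j} \beta_j^{d_j} \Big(\prod_{i=1}^{n} t_{ij}^{\delta_{ij}(\alpha_j - 1)}\Big) \exp\Big(-\beta_j \sum_{i=1}^{n} t_{ij}^{\alpha_j}\Big)\right] \prod_{i=1}^{n} \frac{A_i^{\delta_{i1}\delta_{i2}}\, B_{i1}^{(1-\delta_{i1})\delta_{i2}}\, B_{i2}^{\delta_{i1}(1-\delta_{i2})}}{D_i^{\,1 + \delta_{i1} + \delta_{i2}}}, \tag{3}$$

where $d_j = \sum_{i=1}^{n} \delta_{ij}$ is the number of uncensored observations for $T_j$, $D_i = 1 - \phi F_{i1} F_{i2}$, $B_{ij} = 1 - \phi F_{ij}$ and $A_i = 1 + \phi\,(S_{i1} + S_{i2} + S_{i1} S_{i2} - 2) + \phi^2 F_{i1} F_{i2}$, with $S_{ij} = \exp(-\beta_j t_{ij}^{\alpha_j})$ and $F_{ij} = 1 - S_{ij}$, for $i = 1, \ldots, n$ and $j = 1, 2$.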
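To make the censored likelihood concrete, the following is a minimal sketch of the log-likelihood for the AMH copula with Weibull margins. It assumes the parametrization $S_j(t) = \exp(-\beta_j t^{\alpha_j})$ and uses the four per-observation contributions (both observed, one censored, both censored) obtained by differentiating the AMH joint survival function. All function names are illustrative; this is not the authors' code.

```python
import math


def weibull_surv(t, alpha, beta):
    return math.exp(-beta * t ** alpha)


def weibull_pdf(t, alpha, beta):
    return alpha * beta * t ** (alpha - 1.0) * math.exp(-beta * t ** alpha)


def amh_joint_surv(t1, t2, a1, b1, a2, b2, phi):
    """Joint survival S(t1, t2) under the AMH copula with Weibull margins."""
    s1, s2 = weibull_surv(t1, a1, b1), weibull_surv(t2, a2, b2)
    return s1 * s2 / (1.0 - phi * (1.0 - s1) * (1.0 - s2))


def log_lik(theta, data):
    """data: iterable of (t1, t2, d1, d2), with d = 1 for an observed failure."""
    a1, b1, a2, b2, phi = theta
    total = 0.0
    for t1, t2, d1, d2 in data:
        s1, s2 = weibull_surv(t1, a1, b1), weibull_surv(t2, a2, b2)
        f1, f2 = weibull_pdf(t1, a1, b1), weibull_pdf(t2, a2, b2)
        big_f1, big_f2 = 1.0 - s1, 1.0 - s2
        den = 1.0 - phi * big_f1 * big_f2
        if d1 and d2:      # both lifetimes observed: joint density
            num = 1.0 + phi * (s1 + s2 + s1 * s2 - 2.0) + phi ** 2 * big_f1 * big_f2
            contrib = f1 * f2 * num / den ** 3
        elif d1:           # T1 observed, T2 censored: -dS/dt1
            contrib = f1 * s2 * (1.0 - phi * big_f2) / den ** 2
        elif d2:           # T1 censored, T2 observed: -dS/dt2
            contrib = f2 * s1 * (1.0 - phi * big_f1) / den ** 2
        else:              # both censored: joint survival
            contrib = s1 * s2 / den
        total += math.log(contrib)
    return total
```

Working on the log scale keeps the product over observations numerically stable, which matters because every MCMC update below evaluates a likelihood ratio.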

3. Bayesian Approach

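In order to develop the Bayesian approach, we need to specify the prior distributions for $\alpha_j$, $\beta_j$ and $\phi$, for $j = 1, 2$. We assume that the priors are independent, i.e., $\pi(\theta) = \pi(\phi) \prod_{j=1}^{2} \pi(\alpha_j)\,\pi(\beta_j)$. Therefore, we consider the following prior distributions:

$$\alpha_j \sim \Gamma(a_{j1}, a_{j2}) \quad \text{and} \quad \beta_j \sim \Gamma(b_{j1}, b_{j2}),$$

where $\Gamma(\cdot, \cdot)$ is the Gamma distribution and $a_{j1}$, $a_{j2}$, $b_{j1}$ and $b_{j2}$ are known hyperparameters, all of them with support on $(0, \infty)$, for $j = 1, 2$. The parametrization of the Gamma distribution is such that, for $\Gamma(a_{j1}, a_{j2})$, the mean is $a_{j1}/a_{j2}$ and the variance is $a_{j1}/a_{j2}^2$, for $j = 1, 2$. The choice of values for the hyperparameters depends on the application. In the remainder of the article, we set hyperparameter values that give prior distributions with large variances, for $j = 1, 2$. For $\phi$ we chose the uniform prior distribution on the interval $(-1, 1)$, $\phi \sim \mathcal{U}(-1, 1)$. Using Bayes' theorem, the joint posterior distribution for $\theta$ is

$$\pi(\theta \mid t, \delta) \propto L(\theta \mid t, \delta)\, \pi(\phi) \prod_{j=1}^{2} \pi(\alpha_j)\,\pi(\beta_j),$$

where $L(\theta \mid t, \delta)$ is given in Equation (3). The conditional posterior distributions are

$$\pi(\alpha_j \mid \theta_{-\alpha_j}, t, \delta) \propto L(\theta \mid t, \delta)\, \alpha_j^{a_{j1} - 1} e^{-a_{j2}\alpha_j}, \tag{4}$$

$$\pi(\beta_j \mid \theta_{-\beta_j}, t, \delta) \propto L(\theta \mid t, \delta)\, \beta_j^{b_{j1} - 1} e^{-b_{j2}\beta_j}, \tag{5}$$

$$\pi(\phi \mid \theta_{-\phi}, t, \delta) \propto L(\theta \mid t, \delta)\, I_{(-1, 1)}(\phi), \tag{6}$$

where $\theta_{-\gamma}$, for $\gamma \in \{\alpha_j, \beta_j, \phi\}$, is the vector of parameters without the parameter $\gamma$, $j = 1, 2$. The conditional posterior distributions in Equations (4)–(6) are not familiar distributions. Thus, in order to simulate from the conditional posterior distributions, we used the Metropolis–Hastings algorithm. At each iteration, the Metropolis–Hastings algorithm considers a value generated from a proposal distribution. This value is accepted according to a properly specified acceptance probability. This procedure guarantees the convergence of the Markov chain to the target density. More details on the Metropolis–Hastings algorithm can be found in [22,23,24,25] and their references.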

3.1. MCMC for $\alpha_j$

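Without loss of generality, we describe here how to update parameter $\alpha_1$ conditional on all other parameters, $t$ and $\delta$. The update procedure for $\alpha_2$ is similar. Let $\alpha_1$ be the current state of the Markov chain. Consider a value $\alpha_1^*$ generated from a candidate generating density $q(\alpha_1^* \mid \alpha_1)$. The value $\alpha_1^*$ is accepted with probability $\Psi(\alpha_1^* \mid \alpha_1) = \min(1, R_{\alpha_1})$, where

$$R_{\alpha_1} = \frac{L(\theta^* \mid t, \delta)\, \pi(\alpha_1^*)\, q(\alpha_1 \mid \alpha_1^*)}{L(\theta \mid t, \delta)\, \pi(\alpha_1)\, q(\alpha_1^* \mid \alpha_1)}, \tag{7}$$

$\theta^*$ is $\theta$ with $\alpha_1$ replaced by $\alpha_1^*$, $\pi(\alpha_1)$ is the prior density of $\alpha_1$, and $L$ is the likelihood function given in Equation (3). The Metropolis–Hastings algorithm is implemented as follows.

Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\theta^{(l-1)} = (\alpha_1^{(l-1)}, \beta_1^{(l-1)}, \alpha_2^{(l-1)}, \beta_2^{(l-1)}, \phi^{(l-1)})$, where $l$ is the $l$-th iteration of the algorithm, $l = 1, \ldots, L$, and $\alpha_j^{(l-1)}$, $\beta_j^{(l-1)}$ and $\phi^{(l-1)}$ are the values of $\alpha_j$, $\beta_j$ and $\phi$ in the $(l-1)$-th iteration, respectively, for $j = 1, 2$, in which $\alpha_j^{(0)}$, $\beta_j^{(0)}$ and $\phi^{(0)}$ are the initial values. At the $l$-th iteration of the algorithm, we update $\alpha_1$ as follows:
(i) Generate $\alpha_1^* \sim q(\cdot \mid \alpha_1^{(l-1)})$;
(ii) Calculate $\Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, where $\Psi$ is given by (7);
(iii) Generate $u \sim \mathcal{U}(0, 1)$. If $u \le \Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, accept $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^*$. Otherwise, reject $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.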
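The generic accept/reject step above can be sketched in a few lines. The example below is an illustration on a toy target, not the paper's model: the target is an unnormalized Gamma(3, 2) kernel standing in for a conditional posterior, and the proposal is an exponential density that ignores the current state, so the full Hastings correction is needed. The names `mh_step`, `log_gamma_kernel`, `propose_exp` and `log_q_exp` are hypothetical.

```python
import math
import random


def mh_step(x, log_target, propose, log_q, rng):
    """One Metropolis-Hastings update.

    log_q(a, b) is the log density of proposing a when the chain is at b.
    """
    x_new = propose(x, rng)
    log_r = (log_target(x_new) + log_q(x, x_new)) - (
        log_target(x) + log_q(x_new, x)
    )
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    accept = math.log(1.0 - rng.random()) < log_r
    return (x_new, True) if accept else (x, False)


def log_gamma_kernel(x, shape=3.0, rate=2.0):
    """Toy target: unnormalized Gamma(shape, rate) log density."""
    if x <= 0.0:
        return -math.inf
    return (shape - 1.0) * math.log(x) - rate * x


def propose_exp(_x, rng):
    return rng.expovariate(0.5)  # independent of the current state


def log_q_exp(x_to, _x_from):
    return math.log(0.5) - 0.5 * x_to


def run_chain(n_iter, seed=1):
    rng = random.Random(seed)
    x, draws = 1.0, []
    for _ in range(n_iter):
        x, _ = mh_step(x, log_gamma_kernel, propose_exp, log_q_exp, rng)
        draws.append(x)
    return draws
```

Computing the ratio on the log scale avoids overflow and underflow when the likelihood is a product over many observations.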

3.1.1. Two Common Choices for $q$

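To implement the Metropolis–Hastings algorithm, the candidate generating density $q$ needs to be specified. Generally, one may explore the form of the conditional posterior distribution to set the candidate generating density. For example, if we can write $\pi(\alpha_1 \mid \theta_{-\alpha_1}, t, \delta)$ as proportional to $\psi(\alpha_1)\, h(\alpha_1)$, where $h$ is a density that can be easily sampled and $\psi$ is uniformly bounded, then we may set $q(\alpha_1^* \mid \alpha_1) = h(\alpha_1^*)$. However, this is not the case for $\alpha_1$.

Another option is to generate $\alpha_1^*$ from a candidate generating density that does not depend on the current value. That is, we may set $q(\alpha_1^* \mid \alpha_1) = q(\alpha_1^*)$. Thus, we have a special case of the original MH algorithm, called Independent Metropolis–Hastings (IMH), where the ratio given in (7) simplifies to

$$R_{\alpha_1} = \frac{L(\theta^* \mid t, \delta)\, \pi(\alpha_1^*)\, q(\alpha_1)}{L(\theta \mid t, \delta)\, \pi(\alpha_1)\, q(\alpha_1^*)},$$

with $\theta^*$ denoting $\theta$ with $\alpha_1$ replaced by $\alpha_1^*$. In order to implement this case, one may set $q$ as the prior distribution, i.e., $q(\alpha_1^*) = \pi(\alpha_1^*)$. Then, the acceptance probability is given by the likelihood ratio,

$$\Psi(\alpha_1^* \mid \alpha_1) = \min\left(1, \frac{L(\theta^* \mid t, \delta)}{L(\theta \mid t, \delta)}\right). \tag{8}$$

This algorithm is implemented as follows.

Independent Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\theta^{(l-1)}$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$, do the following:
(i) Generate $\alpha_1^*$ from the prior distribution $\pi(\alpha_1)$;
(ii) Calculate $\Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, where $\Psi$ is given by (8);
(iii) Generate $u \sim \mathcal{U}(0, 1)$. If $u \le \Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, accept $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^*$. Otherwise, reject $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.

Although the choice of the prior distribution as the candidate generating density may be mathematically attractive, it usually leads to slow convergence of the algorithm. This happens when only vague prior information is available and the prior distribution has a large variance. As a consequence, many of the proposed values are rejected.

An alternative is to explore the neighborhood of the current value of the Markov chain to propose a new value. This method is termed the random walk Metropolis (RWM). In the RWM, the candidate value is generated from a symmetric density. That is, we set $q(\alpha_1^* \mid \alpha_1) = q(|\alpha_1^* - \alpha_1|)$, and the probability of generating a move from $\alpha_1$ to $\alpha_1^*$ depends only on the distance between them. For this case, the acceptance probability given in (7) simplifies to

$$\Psi(\alpha_1^* \mid \alpha_1) = \min\left(1, \frac{L(\theta^* \mid t, \delta)\, \pi(\alpha_1^*)}{L(\theta \mid t, \delta)\, \pi(\alpha_1)}\right), \tag{9}$$

since the proposal kernels in the numerator and denominator cancel. In order to implement the RWM, we set $\alpha_1^* = \alpha_1 + \varepsilon$, where $\varepsilon$ is a random perturbation generated from a Normal distribution with mean 0 and variance $\sigma^2$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, meaning that $\alpha_1^* \sim \mathcal{N}(\alpha_1, \sigma^2)$. This algorithm is implemented as follows.

Random Walk Metropolis Algorithm: Let the current state of the Markov chain be $\theta^{(l-1)}$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$, do the following:
(i) Generate $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ and set $\alpha_1^* = \alpha_1^{(l-1)} + \varepsilon$;
(ii) Calculate $\Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, where $\Psi$ is given by (9);
(iii) Generate $u \sim \mathcal{U}(0, 1)$. If $u \le \Psi(\alpha_1^* \mid \alpha_1^{(l-1)})$, accept $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^*$. Otherwise, reject $\alpha_1^*$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.

An issue in RWM is how to choose the value of $\sigma$, which has a strong influence on the efficiency of the algorithm. If $\sigma$ is too small, the random perturbations will be small in magnitude and almost all will be accepted. The consequence is that it will take a large number of iterations to explore the entire state space. On the other hand, if $\sigma$ is large there will be many rejections of the proposed values, slowing down the convergence. More details on this issue can be found in [23,26,27,28]. Typically, one may fix the value of $\sigma$ by testing some values on a few pilot runs and then choosing a value whose acceptance rate falls within a target range (see, for example, [24,25]). We fixed $\sigma$ in this way after a pilot run.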
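The effect of the random-walk scale on the acceptance rate can be seen in a small pilot-run experiment. The sketch below uses a toy Gamma(3, 2) kernel as a stand-in for the conditional posterior of a shape parameter; the three values of `sigma` and the function names are illustrative choices, not the settings used in the paper.

```python
import math
import random


def log_target(x, shape=3.0, rate=2.0):
    """Toy stand-in for a conditional posterior with support on (0, inf)."""
    if x <= 0.0:
        return -math.inf
    return (shape - 1.0) * math.log(x) - rate * x


def rwm_chain(log_f, x0, sigma, n_iter, seed=0):
    """Random walk Metropolis with Normal(0, sigma^2) perturbations."""
    rng = random.Random(seed)
    x, lp = x0, log_f(x0)
    draws, n_acc = [], 0
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, sigma)
        lp_new = log_f(x_new)
        # Symmetric proposal: the ratio involves only the target density.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
            n_acc += 1
        draws.append(x)
    return draws, n_acc / n_iter


for sigma in (0.05, 1.0, 20.0):
    _, rate = rwm_chain(log_target, 1.0, sigma, 20000)
    print(f"sigma = {sigma:5.2f}  acceptance rate = {rate:.2f}")
```

A very small scale accepts almost every move but explores slowly, while a very large scale is rejected most of the time; a pilot run of this kind is one simple way to pick an intermediate value.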

3.1.2. Slice Sampling Algorithm

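An alternative to IMH and RWM for sampling from a generic distribution is the slice sampling algorithm. This algorithm is a type of Gibbs sampling based on the simulation of specific uniform random variables. Here we explain slice sampling in the context of the simulation of $\alpha_1$. The sampling procedure for $\alpha_2$ is similar. More details about SS can be found in [13]. In SS, an auxiliary variable $U$ is introduced and the joint distribution $\pi(\alpha_1, u \mid \theta_{-\alpha_1}, t, \delta)$ is given by a uniform distribution over the region below the curve defined by $\pi(\alpha_1 \mid \theta_{-\alpha_1}, t, \delta)$. From (4), we have

$$\pi(\alpha_1, u \mid \theta_{-\alpha_1}, t, \delta) \propto I\big(0 < u < g(\alpha_1)\big), \qquad g(\alpha_1) = L(\theta \mid t, \delta)\, \pi(\alpha_1), \tag{10}$$

where $\pi(\alpha_1)$ is the prior density of $\alpha_1$. Marginalizing over $U$ yields $\pi(\alpha_1 \mid \theta_{-\alpha_1}, t, \delta)$, so sampling from $\pi(\alpha_1, u \mid \theta_{-\alpha_1}, t, \delta)$ and discarding $U$ is equivalent to sampling from $\pi(\alpha_1 \mid \theta_{-\alpha_1}, t, \delta)$. As sampling from the joint distribution is not straightforward, we implemented a Gibbs sampling algorithm where at every iteration $l$ we first generate $u^{(l)} \sim \mathcal{U}\big(0, g(\alpha_1^{(l-1)})\big)$ and then sample $\alpha_1^{(l)} \sim \mathcal{U}(A)$, where $A = \{\alpha_1 : g(\alpha_1) \ge u^{(l)}\}$. However, as the inverse of $g$ cannot be obtained analytically, we adopted the following procedure to update $\alpha_1$. Starting from an empty set $A$, candidate values are examined on either side of the current value: moving in one direction, each candidate whose $g$ value is at least $u^{(l)}$ is added to $A$, and the search stops at the first candidate that fails this check; the same is then done in the opposite direction. Finally, $\alpha_1^{(l)}$ is generated uniformly over the range covered by $A$. This algorithm is implemented as follows.

Slice sampling algorithm: Let the current state of the Markov chain be $\theta^{(l-1)}$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$:
(i) Generate $u^{(l)} \sim \mathcal{U}\big(0, g(\alpha_1^{(l-1)})\big)$, where $g$ is given by (10);
(ii) Obtain the set $A$, conditional on $u^{(l)}$;
(iii) Generate $\alpha_1^{(l)} \sim \mathcal{U}(A)$.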
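For comparison, here is a minimal sketch of a univariate slice sampler using the generic stepping-out and shrinkage scheme from the slice sampling literature. This is a swapped-in, textbook variant and may differ in detail from the search procedure used in the paper. The toy target is again an unnormalized Gamma(3, 2) kernel, and the initial width `w` is an arbitrary illustrative value.

```python
import math
import random


def log_target(x, shape=3.0, rate=2.0):
    """Toy stand-in for log g(alpha); -inf outside the support."""
    if x <= 0.0:
        return -math.inf
    return (shape - 1.0) * math.log(x) - rate * x


def slice_step(x, log_f, w, rng):
    """One slice-sampling update with stepping out and shrinkage."""
    # Auxiliary level: log of u ~ U(0, f(x)).
    log_u = log_f(x) + math.log(1.0 - rng.random())
    # Randomly place an interval of width w around x, then step out.
    left = x - w * rng.random()
    right = left + w
    while log_f(left) > log_u:
        left -= w
    while log_f(right) > log_u:
        right += w
    # Sample uniformly from the interval, shrinking it after each miss.
    while True:
        x_new = left + (right - left) * rng.random()
        if log_f(x_new) >= log_u:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new


def slice_chain(n_iter, x0=1.0, w=1.0, seed=0):
    rng = random.Random(seed)
    x, draws = x0, []
    for _ in range(n_iter):
        x = slice_step(x, log_target, w, rng)
        draws.append(x)
    return draws
```

Unlike IMH and RWM, every iteration produces a new value, and the width `w` affects only the computational cost of the interval search rather than the validity of the sampler.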

3.2. MCMC for $\beta_j$ and $\phi$

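Note from (5) that the conditional posterior distribution for the scale parameter $\beta_j$, $j = 1, 2$, is given by the kernel of a Gamma distribution with parameters $d_j + b_{j1}$ and $b_{j2} + \sum_{i=1}^{n} t_{ij}^{\alpha_j}$, multiplied by a factor $\psi(\beta_j)$ that collects the remaining terms of the likelihood. Here $d_j = \sum_{i=1}^{n} \delta_{ij}$ is the number of uncensored observations for $T_j$, and $b_{j1}$ and $b_{j2}$ are the hyperparameters of the Gamma prior for $\beta_j$. In other words, $\pi(\beta_j \mid \theta_{-\beta_j}, t, \delta)$ may be written as proportional to $\psi(\beta_j)\, h(\beta_j)$, where $h$ is the density of the Gamma distribution $\Gamma\big(d_j + b_{j1},\, b_{j2} + \sum_{i=1}^{n} t_{ij}^{\alpha_j}\big)$, with $\psi$ being uniformly bounded. Thus, we set the candidate generating density for $\beta_j$ as $q(\beta_j^* \mid \beta_j) = h(\beta_j^*)$. The acceptance probability for the generated value $\beta_j^*$ is given by $\Psi(\beta_j^* \mid \beta_j) = \min(1, R_{\beta_j})$, where

$$R_{\beta_j} = \frac{\psi(\beta_j^*)}{\psi(\beta_j)}. \tag{11}$$

This algorithm is implemented as follows for $\beta_1$. The Metropolis–Hastings algorithm for updating $\beta_2$ is similar.

Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\theta^{(l-1)}$, where $l = 1, \ldots, L$. For the $l$-th iteration of the algorithm:
(i) Generate $\beta_1^* \sim \Gamma\big(d_1 + b_{11},\, b_{12} + \sum_{i=1}^{n} t_{i1}^{\alpha_1}\big)$;
(ii) Calculate $\Psi(\beta_1^* \mid \beta_1^{(l-1)})$, where $\Psi$ is given by (11);
(iii) Generate $u \sim \mathcal{U}(0, 1)$. If $u \le \Psi(\beta_1^* \mid \beta_1^{(l-1)})$, accept $\beta_1^*$ and set $\beta_1^{(l)} = \beta_1^*$. Otherwise, reject $\beta_1^*$ and set $\beta_1^{(l)} = \beta_1^{(l-1)}$.

To update the dependence parameter $\phi$ conditional on the remaining parameters $\theta_{-\phi}$, we used the following IMH algorithm. Let $g_1 = -1 < g_2 < \cdots < g_{K+1} = 1$ be a grid from $-1$ to 1 with a constant increment. Consider $I_a = (g_a, g_{a+1})$, an interval defined by two adjacent grid values, where $a$ is the index of the $a$-th value of the grid, for $a = 1, \ldots, K$. For example, $I_1$ is the first interval, starting at $-1$, and $I_K$ is the last interval, ending at 1. Then generate a candidate value $\phi^*$ as follows:
(i) If the current value of $\phi$ is in the interval $I_1$, then generate $\phi^*$ from one of the two uniform distributions $\mathcal{U}(I_1)$ or $\mathcal{U}(I_2)$. For this case, we generate an auxiliary variable $u \sim \mathcal{U}(0, 1)$, whose value determines which of the two distributions is used.
(ii) If the current value of $\phi$ is in $I_K$, then generate $\phi^*$ from one of the two uniform distributions $\mathcal{U}(I_{K-1})$ or $\mathcal{U}(I_K)$, selected through an auxiliary variable in the same way as in item (i).
(iii) If the current value of $\phi$ is in the interval $I_a$, for $a \ne 1$ and $a \ne K$, then generate $\phi^*$ from one of the three uniform distributions $\mathcal{U}(I_{a-1})$, $\mathcal{U}(I_a)$ or $\mathcal{U}(I_{a+1})$, again selected through an auxiliary variable $u \sim \mathcal{U}(0, 1)$.

The acceptance probability is given by $\Psi(\phi^* \mid \phi) = \min(1, R_\phi)$, where

$$R_\phi = \frac{L(\theta^* \mid t, \delta)\, q(\phi \mid \phi^*)}{L(\theta \mid t, \delta)\, q(\phi^* \mid \phi)},$$

$\theta^*$ is $\theta$ with $\phi$ replaced by $\phi^*$, and the proposal density $q$ takes the value implied by items (i)–(iii) described above. This algorithm is implemented as follows.

IMH algorithm for $\phi$: Let the current state of the Markov chain be $\theta^{(l-1)}$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$:
(i) Generate $\phi^*$ according to one of the items (i), (ii) or (iii) described above;
(ii) Calculate $\Psi(\phi^* \mid \phi^{(l-1)})$;
(iii) Generate $u \sim \mathcal{U}(0, 1)$. If $u \le \Psi(\phi^* \mid \phi^{(l-1)})$, accept $\phi^*$ and set $\phi^{(l)} = \phi^*$. Otherwise, reject $\phi^*$ and set $\phi^{(l)} = \phi^{(l-1)}$.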
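The neighbouring-interval proposal for the dependence parameter can be sketched as follows. Two details are assumptions made only for illustration, since they are not recoverable from the text here: the grid has `k = 20` equal intervals on $[-1, 1]$, and the candidate interval is chosen with equal probability among the current interval and its existing neighbours. All function names are hypothetical.

```python
import random


def interval_index(phi, k):
    """Index (0-based) of the grid interval on [-1, 1] that contains phi."""
    width = 2.0 / k
    return min(int((phi + 1.0) / width), k - 1)


def neighbours(a, k):
    """Current interval plus its left/right neighbours that exist."""
    return [i for i in (a - 1, a, a + 1) if 0 <= i < k]


def propose_phi(phi, k, rng):
    """Pick a neighbouring interval at random, then draw uniformly inside it."""
    i = rng.choice(neighbours(interval_index(phi, k), k))
    width = 2.0 / k
    return -1.0 + width * (i + rng.random())


def q_density(phi_to, phi_from, k):
    """Proposal density q(phi_to | phi_from) under the scheme above."""
    cand = neighbours(interval_index(phi_from, k), k)
    if interval_index(phi_to, k) not in cand:
        return 0.0
    return 1.0 / (len(cand) * (2.0 / k))


k = 20  # illustrative grid size, not taken from the source
print(q_density(-0.85, -0.95, k), q_density(-0.95, -0.85, k))
```

The two printed densities differ because an edge interval has two candidate intervals while an interior one has three, so the proposal is not symmetric near the boundaries and the ratio $q(\phi \mid \phi^*)/q(\phi^* \mid \phi)$ must be kept in the acceptance probability.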

3.3. MCMC Algorithms

Using the algorithms IMH, RWM, SS and MH described above, we implemented three MCMC algorithms:
Algorithm $A_1$: parameters $\alpha_j$ are updated via IMH;
Algorithm $A_2$: parameters $\alpha_j$ are updated via RWM;
Algorithm $A_3$: parameters $\alpha_j$ are updated via SS.
For these three algorithms, the parameters $\beta_j$ and $\phi$ are updated via MH and IMH, respectively, as described in Section 3.2, for $j = 1, 2$. After defining the algorithms, we ran them for $L$ iterations with a burn-in of $B$. We also consider jumps of size $J$, i.e., only one draw from every $J$ was extracted from the original sequence, obtaining a subsequence of size $S = (L - B)/J$ to make inferences. The estimates for the parameters are given by

$$\hat{\gamma} = \frac{1}{S} \sum_{s=1}^{S} \gamma^{(B + sJ)},$$

where $\gamma^{(B + sJ)}$ is the value generated for $\gamma$ in the $(B + sJ)$-th iteration of the algorithm, for $\gamma \in \{\alpha_j, \beta_j, \phi\}$ and $j = 1, 2$.
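Burn-in removal, thinning and the posterior-mean estimate amount to a slice and an average. In the sketch below, `thin` and `posterior_mean` are illustrative helper names, and the jump size `J = 10` in the usage example is an assumed value chosen only to show the arithmetic.

```python
def thin(draws, burn_in, jump):
    """Keep iterations B + J, B + 2J, ... (1-based) from a list of draws."""
    return draws[burn_in + jump - 1::jump]


def posterior_mean(draws, burn_in, jump):
    """Average of the retained draws, used as the point estimate."""
    kept = thin(draws, burn_in, jump)
    return sum(kept) / len(kept)


# Usage: a dummy chain whose value at iteration l is simply l.
chain = list(range(1, 55001))        # L = 55,000 iterations
kept = thin(chain, burn_in=5000, jump=10)
print(len(kept), kept[0], kept[-1])
```

With $L = 55{,}000$, $B = 5000$ and a jump of 10, the retained subsequence has $(L - B)/J = 5000$ values, starting at iteration 5010 and ending at iteration 55,000.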

4. Simulation Study

In this section, we present the comparison between the performances of the three algorithms applied to simulated data sets. Simulated random samples of sizes $n = 25$, 50, 100 and 250 with 0%, 5%, 10%, 20% and 30% random right-censored observations were generated to represent small, medium and large data sets. Using these, we generated four simulated data sets with fixed parameters, as specified in Table 1.

Table 1

Parameter values for simulated data sets.

Data Set	Parameters
Data Set	α1	β1	α2	β2	ϕ
D1	2.00	1.00	3.00	1.00	0.50
D2	1.00	2.00	2.00	0.50	−0.75
D3	0.75	1.50	1.00	2.00	0.05
D4	1.80	2.40	2.20	1.20	0.95

Data set $D_1$ has two increasing hazard functions with a positive dependence parameter, while data set $D_2$ has a constant and an increasing hazard function with a negative dependence parameter. Data set $D_3$ has parameters that produce a decreasing and a constant hazard function with weak dependence, while data set $D_4$ has strong dependence and two increasing hazard functions. The simulation procedure to generate $n$ observations $(t_{i1}, t_{i2}, \delta_{i1}, \delta_{i2})$, for $i = 1, \ldots, n$, is given by the following steps:
(i) Set up the sample size $n$ and set $i = 1$;
(ii) Generate the censoring times $C_{ij}$ from a distribution whose parameter controls the percentage of censored observations, for $j = 1, 2$;
(iii) Generate uniform values $u_{i1}$ and $w_i$ on $(0, 1)$, and calculate $u_{i2}$, the solution of the nonlinear equation $w_i = \partial C_\phi(u_{i1}, u_{i2})/\partial u_{i1}$. Here we used the rootSolve package and the uniroot.all command from the R software to solve the nonlinear equation and obtain $u_{i2}$;
(iv) Calculate $T_{i1} = \big(-\log(u_{i1})/\beta_1\big)^{1/\alpha_1}$ and $T_{i2} = \big(-\log(u_{i2})/\beta_2\big)^{1/\alpha_2}$;
(v) Calculate the times $t_{ij} = \min(T_{ij}, C_{ij})$ and the censorship indicators $\delta_{ij}$, which are equal to 1 if $T_{ij} \le C_{ij}$ and 0 otherwise, for $j = 1, 2$;
(vi) Set $i = i + 1$. If $i > n$, stop. Otherwise, return to step (ii).

We generated $M$ different simulated data sets according to steps (i)–(vi) described above, and the parameters were estimated according to algorithms $A_1$, $A_2$ and $A_3$. We used hyperparameter values chosen to give prior distributions with large variance, for $j = 1, 2$. For the $m$-th generated data set, we applied algorithms $A_1$, $A_2$ and $A_3$, fixing L = 55,000 iterations, burn-in B = 5000 and a jump size $J$. Comparison of the algorithms was made using the sample root mean square error (RMSE) of the estimates over the simulated data sets. A smaller RMSE indicates better overall quality of the estimates. Table 2 presents the RMSE value for each simulated data set by algorithm, sample size and percentage of censoring. The smallest RMSE value for each sample size and percentage of censoring is highlighted in bold. For the three algorithms, fixing the sample size and increasing the censoring percentage (% cens.) increased the RMSE values. When the sample size increases at a fixed percentage of censoring, the RMSE values decrease, consequently improving the precision of the estimators.
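The data-generating steps can be sketched with the conditional-distribution method for the AMH copula. This is a minimal illustration rather than the authors' R code: the nonlinear equation is solved by plain bisection instead of `uniroot.all`, and the censoring times are drawn from a uniform distribution on $(0, c_{\max})$, which is an assumption because the censoring distribution is not recoverable from the text. The parameter values in the usage line are those of data set D1 in Table 1.

```python
import math
import random


def amh_conditional_cdf(v, u, phi):
    """dC(u, v)/du for the AMH copula: conditional cdf of V given U = u."""
    return v * (1.0 - phi * (1.0 - v)) / (1.0 - phi * (1.0 - u) * (1.0 - v)) ** 2


def solve_v(u, w, phi, tol=1e-12):
    """Solve amh_conditional_cdf(v, u, phi) = w for v in (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if amh_conditional_cdf(mid, u, phi) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weibull_time(u, alpha, beta):
    """Invert S(t) = exp(-beta * t**alpha) at survival probability u."""
    return (-math.log(u) / beta) ** (1.0 / alpha)


def simulate(n, a1, b1, a2, b2, phi, c_max, seed=0):
    """Return n rows (t1, t2, d1, d2); c_max is a hypothetical censoring bound."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u1 = 1.0 - rng.random()               # in (0, 1]
        u2 = solve_v(u1, rng.random(), phi)
        latent = (weibull_time(u1, a1, b1), weibull_time(u2, a2, b2))
        cens = (rng.uniform(0.0, c_max), rng.uniform(0.0, c_max))
        t = [min(x, c) for x, c in zip(latent, cens)]
        d = [int(x <= c) for x, c in zip(latent, cens)]
        rows.append((t[0], t[1], d[0], d[1]))
    return rows


sample = simulate(25, 2.0, 1.0, 3.0, 1.0, 0.5, c_max=3.0)
print(sum(r[2] for r in sample), sum(r[3] for r in sample))
```

Decreasing `c_max` raises the proportion of censored observations, so in practice it would be tuned until the target censoring percentage is reached.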

Table 2

Root mean square error (RMSE) by algorithm for data sets $D_1$, $D_2$, $D_3$ and $D_4$.

Sample Size	% of Censures	Data Set D1			Data Set D2			Data Set D3			Data Set D4
		Algorithm			Algorithm			Algorithm			Algorithm
		A1	A2	A3	A1	A2	A3	A1	A2	A3	A1	A2	A3
n=25	0%	0.3678	0.3717	0.3581	0.3774	0.3781	0.3458	0.3375	0.3370	0.3368	1.1085	1.0888	1.0883
	5%	0.4078	0.3869	0.3597	0.3861	0.3901	0.3736	0.3586	0.3573	0.3523	1.1325	1.1305	1.1278
	10%	0.4189	0.4012	0.3670	0.4144	0.4259	0.4135	0.3687	0.3675	0.3611	1.1428	1.1396	1.1323
	20%	0.4245	0.4153	0.3772	0.4472	0.4648	0.4381	0.3772	0.3729	0.3727	1.1726	1.1714	1.1711
	30%	0.4362	0.4543	0.3989	0.5335	0.5614	0.5303	0.3994	0.3990	0.3944	1.2078	1.1946	1.1925
n=50	0%	0.2595	0.2507	0.2678	0.2633	0.2552	0.2573	0.2162	0.2112	0.2048	1.0397	1.0318	1.0312
	5%	0.2663	0.2652	0.2699	0.2641	0.2601	0.2719	0.2239	0.2283	0.2233	1.0470	1.0442	1.0403
	10%	0.2831	0.2806	0.2814	0.2959	0.2683	0.2844	0.2390	0.2457	0.2269	1.0483	1.0453	1.0433
	20%	0.2846	0.2820	0.2863	0.2966	0.2820	0.3026	0.2719	0.2546	0.2366	1.0517	1.0528	1.0513
	30%	0.2983	0.2885	0.3104	0.3245	0.3170	0.3182	0.2828	0.2776	0.2736	1.0915	1.0666	1.0550
n=100	0%	0.1822	0.1819	0.1833	0.1917	0.1816	0.1878	0.1664	0.1657	0.1702	1.0153	1.0041	1.0124
	5%	0.1953	0.1851	0.1859	0.1925	0.1857	0.1914	0.1769	0.1755	0.1782	1.0228	1.0063	1.0152
	10%	0.1982	0.1924	0.1927	0.2026	0.2019	0.2023	0.1788	0.1760	0.1791	1.0239	1.0088	1.0157
	20%	0.1996	0.1964	0.2074	0.2029	0.2028	0.2047	0.1934	0.1832	0.1879	1.0282	1.0092	1.0177
	30%	0.2131	0.2122	0.2144	0.2463	0.2112	0.2211	0.2094	0.1967	0.2143	1.0291	1.0128	1.0265
n=250	0%	0.1138	0.1123	0.1130	0.1075	0.1079	0.1115	0.1156	0.1140	0.1162	0.9934	0.9923	0.9936
	5%	0.1141	0.1136	0.1149	0.1206	0.1141	0.1129	0.1179	0.1146	0.1183	0.9970	0.9963	0.9968
	10%	0.1165	0.1164	0.1167	0.1244	0.1199	0.1237	0.1186	0.1159	0.1197	0.9985	0.9977	0.9972
	20%	0.1224	0.1216	0.1229	0.1258	0.1252	0.1287	0.1303	0.1260	0.1273	0.9991	0.9984	0.9991
	30%	0.1374	0.1333	0.1344	0.1677	0.1398	0.1458	0.1391	0.1328	0.1329	0.9999	0.9993	0.9997

Based on the results presented in Table 2, for the smallest sample size ($n = 25$), algorithm $A_3$ (with SS) outperformed algorithm $A_1$ (with IMH) and algorithm $A_2$ (with RWM), i.e., it gave the smallest RMSE value for all percentages of censoring. This better performance also occurred for data sets $D_3$ and $D_4$ with $n = 50$. For nearly all other simulated cases, algorithm $A_2$ outperformed algorithms $A_1$ and $A_3$. The exceptions in Table 2 all occur at $n = 250$: data set $D_2$ with 0% censoring, where $A_1$ performed best, and data set $D_2$ with 5% censoring and data set $D_4$ with 10% censoring, where $A_3$ performed best. These results suggest a possible complementarity between algorithms $A_2$ and $A_3$, where algorithm $A_2$ performs better for larger sample sizes and algorithm $A_3$ performs better for smaller sample sizes.

We verified the convergence of algorithms $A_1$, $A_2$ and $A_3$ using the effective sample size [14] and the integrated autocorrelation time (IAT). The effective sample size (ESS) is the number of effectively independent draws from the posterior distribution. Methods with a larger ESS are more efficient. The IAT is an MCMC diagnostic that estimates the average number of autocorrelated samples required to produce one independent draw. A lower IAT means more efficiency. The ESS and IAT values were obtained using the coda and LaplacesDemon packages. Both packages are available in the R software. Table A1 and Table A2 in Appendix A show the average ESS and IAT values for each algorithm by parameter for one of the simulated data sets. Algorithm $A_3$ showed a better performance than algorithms $A_1$ and $A_2$, i.e., it had the highest ESS values and smallest IAT values by parameter for all simulated cases. Note that algorithm $A_1$ had the worst results, especially for the simulated values of $\alpha_j$, $j = 1, 2$. Results for the remaining three data sets were similar.
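The two diagnostics are linked by ESS = N / IAT for a chain of length N. The sketch below estimates the IAT as $1 + 2\sum_k \hat{\rho}_k$, truncating the sum at the first non-positive sample autocorrelation. That truncation rule is one simple convention and is an assumption here; the coda and LaplacesDemon packages use their own estimators, so their numbers will generally differ.

```python
import random


def iat(x):
    """Integrated autocorrelation time, truncated at the first rho_k <= 0."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    var = sum(d * d for d in dev) / n
    tau = 1.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


def ess(x):
    """Effective sample size implied by the estimated IAT."""
    return len(x) / iat(x)


def ar1(n, rho, seed=0):
    """AR(1) test sequence; its theoretical IAT is (1 + rho) / (1 - rho)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


chain = ar1(20000, 0.9)
print(f"IAT = {iat(chain):.1f}, ESS = {ess(chain):.0f}")
```

For an AR(1) sequence with coefficient 0.9 the theoretical IAT is 19, so about 19 correlated draws carry the information of one independent draw.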

Table A1

Average effective sample size (ESS) by algorithm and parameter.

Sample Size	% of Censures	Algorithm A1					Algorithm A2					Algorithm A3
Sample Size	% of Censures	α1	β1	α2	β2	ϕ	α1	β1	α2	β2	ϕ	α1	β1	α2	β2	ϕ
n=25	0%	25.4	1149.9	26.0	1168.4	105.9	1741.7	3493.7	1816.3	3511.8	111.2	4547.7	4110.0	4540.0	4136.9	112.2
	5%	26.4	1360.4	27.4	1311.1	100.6	1758.1	3530.2	1823.3	3563.4	106.8	4569.7	4118.5	4622.4	4125.7	112.0
	10%	27.9	1570.5	28.2	1422.5	97.6	1783.3	3543.0	1827.7	3598.9	99.9	4604.9	4220.7	4672.7	4191.9	105.2
	20%	31.8	2178.7	30.1	1988.6	95.6	1869.0	3943.1	1822.2	3738.9	93.9	4681.8	4275.1	4726.5	4182.3	97.9
	30%	32.9	2293.8	32.7	2146.3	88.5	1931.0	4018.4	1772.0	3885.7	88.1	4782.5	4350.3	4744.4	4329.9	89.6
n=50	0%	19.4	860.7	19.5	1049.2	173.0	1415.2	3259.1	1774.8	3450.9	172.7	4607.7	4132.9	4610.4	4129.5	176.9
	5%	19.6	1061.1	18.7	968.2	167.2	1475.8	3456.2	1796.2	3517.1	167.3	4680.2	4226.3	4698.9	4187.6	169.3
	10%	21.1	1331.7	20.6	1168.2	163.2	1565.6	3662.3	1861.4	3700.1	155.8	4706.1	4237.6	4698.8	4148.0	171.4
	20%	22.5	2134.5	23.1	2005.2	141.6	1668.8	3926.3	1922.5	3804.2	140.0	4825.1	4374.9	4792.8	4299.3	143.6
	30%	24.3	2604.9	24.5	2241.4	127.0	1770.5	4188.2	1989.0	4047.5	132.2	4817.7	4504.1	4819.8	4364.1	133.8
n=100	0%	14.3	817.5	14.8	826.7	316.7	1107.5	3258.6	1518.9	3429.5	323.9	4609.3	4244.3	4668.7	4169.3	325.2
	5%	14.5	899.7	14.5	807.8	304.1	1136.7	3393.6	1549.6	3522.7	290.0	4639.9	4238.7	4689.2	4222.8	311.4
	10%	15.6	1157.9	15.0	938.3	276.9	1199.2	3617.4	1598.7	3698.5	272.9	4729.9	4311.9	4800.5	4295.0	277.3
	20%	16.3	1846.4	16.4	1540.7	260.7	1297.1	3886.4	1706.2	3834.2	265.2	4833.4	4465.1	4827.2	4399.4	271.4
	30%	17.6	3127.3	17.7	2337.1	224.4	1414.1	4292.0	1831.9	4128.8	211.1	4857.6	4475.2	4862.9	4410.8	226.3
n=250	0%	10.3	655.3	10.0	662.7	672.9	712.3	2856.1	1055.4	3236.4	687.8	4588.1	4210.6	4655.5	4275.5	698.8
	5%	10.7	800.5	10.5	816.3	672.3	742.5	3106.1	1083.3	3343.3	640.0	4664.5	4333.8	4734.3	4277.8	693.9
	10%	10.7	1024.2	10.8	951.7	602.3	786.7	3369.7	1128.4	3519.9	607.5	4728.8	4362.8	4757.3	4338.3	620.0
	20%	10.7	1735.2	11.8	1494.5	549.7	863.0	3890.0	1226.9	3845.6	539.6	4741.7	4440.4	4805.1	4451.7	550.0
	30%	12.2	3259.7	12.1	2271.8	466.2	936.6	4279.2	1308.9	4147.7	477.2	4872.7	4625.0	4858.4	4552.6	481.6

Table A2

Average integrated autocorrelation time (IAT) by algorithm and parameter.

Sample Size	% of Censures	Algorithm A1					Algorithm A2					Algorithm A3
Sample Size	% of Censures	α1	β1	α2	β2	ϕ	α1	β1	α2	β2	ϕ	α1	β1	α2	β2	ϕ
n=25	0%	162.7	2.4	162.4	2.3	50.6	3.0	1.5	2.9	1.5	50.2	1.1	1.3	1.1	1.2	50.0
	5%	162.3	2.2	154.0	2.3	52.5	2.9	1.5	2.8	1.5	50.2	1.1	1.2	1.1	1.2	50.0
	10%	152.7	2.0	150.9	2.3	54.1	2.9	1.5	2.8	1.5	54.8	1.1	1.2	1.1	1.2	51.3
	20%	136.8	1.7	136.6	1.9	55.4	2.7	1.3	2.8	1.4	55.8	1.1	1.2	1.1	1.2	54.5
	30%	132.2	1.7	130.4	1.7	59.9	2.6	1.3	3.0	1.4	59.8	1.1	1.2	1.1	1.2	57.6
n=50	0%	208.9	2.3	213.5	2.2	33.2	3.7	1.6	2.9	1.5	32.8	1.1	1.2	1.1	1.2	32.5
	5%	208.7	2.0	233.6	2.2	34.8	3.5	1.5	2.9	1.5	34.5	1.1	1.2	1.1	1.2	34.2
	10%	198.6	1.9	206.5	2.2	35.6	3.3	1.4	2.7	1.4	36.0	1.1	1.2	1.1	1.2	35.2
	20%	183.6	1.6	179.4	1.6	39.5	3.1	1.3	2.7	1.4	39.2	1.1	1.2	1.1	1.2	39.0
	30%	170.5	1.5	170.0	1.6	43.2	2.9	1.2	2.5	1.3	41.9	1.1	1.1	1.1	1.2	40.3
n=100	0%	288.1	2.1	278.2	2.2	17.9	4.6	1.6	3.4	1.5	18.1	1.1	1.2	1.1	1.2	17.2
	5%	284.7	2.2	287.2	2.2	19.7	4.5	1.5	3.3	1.5	20.3	1.1	1.2	1.1	1.2	18.9
	10%	266.8	1.9	271.9	1.9	21.3	4.2	1.4	3.2	1.4	20.5	1.1	1.2	1.1	1.2	20.3
	20%	250.0	1.6	252.8	1.7	22.8	3.9	1.4	3.0	1.4	22.4	1.1	1.1	1.1	1.2	22.3
	30%	233.4	1.3	227.1	1.5	26.5	3.6	1.2	2.8	1.2	27.0	1.1	1.1	1.1	1.2	26.2
n=250	0%	417.9	2.0	418.8	2.0	7.9	7.1	1.8	4.8	1.6	7.9	1.1	1.2	1.1	1.2	7.6
	5%	400.6	1.9	399.7	2.0	8.2	6.8	1.7	4.7	1.6	8.4	1.1	1.2	1.1	1.2	8.1
	10%	391.7	1.8	366.7	1.8	9.1	6.5	1.5	4.5	1.5	9.0	1.1	1.2	1.1	1.2	8.8
	20%	374.6	1.5	355.9	1.6	10.2	5.9	1.3	4.1	1.4	10.3	1.1	1.2	1.1	1.2	10.1
	30%	358.9	1.3	339.2	1.4	11.8	5.5	1.5	3.9	2.1	11.7	1.1	1.1	1.1	1.1	11.1

Appendix B presents an empirical convergence check for the sampled values of a shape parameter $\alpha_j$ for each algorithm. As shown in Figure A1, the values generated by algorithm $A_1$ did not mix well, and the stability of the ergodic mean and the estimated autocorrelation were not satisfactory. On the other hand, the values generated by algorithms $A_2$ and $A_3$ were well mixed and present satisfactory stability for the ergodic mean and autocorrelation. As an illustration of a convergence diagnostic, Figure A1 (j–l) shows the Gelman plot for the sequence of values in two chains for each algorithm. As can be seen in the figure, the number of iterations was sufficient for algorithms $A_2$ and $A_3$ to reach convergence, but not for algorithm $A_1$. In addition, the scale reduction factors of the Gelman–Rubin diagnostic [29] for each parameter in algorithms $A_2$ and $A_3$ were below the threshold used to flag non-convergence, meaning that there is no indication of non-convergence. This implies a faster convergence of algorithms $A_2$ and $A_3$ in relation to algorithm $A_1$. For the sampled values of $\phi$, the three algorithms present satisfactory properties, i.e., good mixing, and satisfactory stability for the ergodic mean and autocorrelation (see Figure A2 in Appendix B).
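The scale reduction factor compares between-chain and within-chain variability. The sketch below implements the basic form of the statistic for chains of equal length; it omits the degrees-of-freedom correction that some implementations apply, so it is an approximation to what a package such as coda reports. The function name `gelman_rubin` is illustrative.

```python
import math
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


rng = random.Random(0)
same = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
shifted = [same[0], [x + 3.0 for x in same[1]]]
print(f"{gelman_rubin(same):.3f}  {gelman_rubin(shifted):.3f}")
```

Values close to 1 indicate that the chains agree, while a value well above 1, as in the shifted example, signals that the chains have not yet converged to the same distribution.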

Figure A1

Traceplot, ergodic mean and autocorrelation for sequences produced by algorithms A1, A2 and A3 for $\alpha_j$.

Figure A2

Traceplot, ergodic mean and autocorrelation for sequences produced by algorithms A1, A2 and A3 for $\phi$.

The results indicate that algorithm $A_3$ (with SS for $\alpha_j$) is an effective alternative to algorithms $A_1$ (with IMH for $\alpha_j$) and $A_2$ (with RWM for $\alpha_j$) for simulating samples from the posterior distribution of bivariate survival models based on the Ali–Mikhail–Haq copula with marginal Weibull distributions.

5. Application to a Real Data Set

Next, we examine the performance of algorithms $A_1$, $A_2$ and $A_3$ on the diabetic retinopathy data set described in [15], which is available in the R software survival package [16]. This data set consists of the follow-up times of 197 diabetic patients under 60 years of age. The main objective of the study was to evaluate the effectiveness of the photocoagulation treatment for proliferative retinopathy. The treatment was randomly assigned to one eye of each patient and the other eye was taken as a control. Let $(T_1, T_2)$ be the bivariate times, where $T_1$ is the time to visual loss for the treatment eye and $T_2$ is the time to visual loss for the control eye. The percentage of censored times for each variable is 72.6% (143 observations) for $T_1$ and 48.7% (96 observations) for $T_2$. We used (1) to model these data with Weibull marginal distributions with parameters $\alpha_j$ and $\beta_j$, $j = 1, 2$, and dependence parameter $\phi$. We compared the performances of the algorithms using the RMSE between the fitted and empirical distribution functions, where the fitted distribution function is obtained by substituting the estimates of $\alpha_j$, $\beta_j$ and $\phi$ (obtained by each algorithm), and the empirical distribution function is obtained from the Kaplan–Meier estimates, for $i = 1, \ldots, n$ and $j = 1, 2$. We ran the three algorithms using the same number of iterations, burn-in, thinning and hyperparameter values used with the simulated data. Table 3 shows the parameter estimates, the credibility intervals and the RMSE values by algorithm. For this data set, algorithm $A_3$ (with SS for $\alpha_j$) gave the smallest RMSE value.
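The comparison against the Kaplan–Meier estimate can be sketched as follows. The Kaplan–Meier product-limit estimator is standard, but the exact RMSE formula is not recoverable from the text, so `rmse_weibull_vs_km` is only one plausible reading: the root mean squared difference between the fitted Weibull survival curve and the Kaplan–Meier curve, evaluated at the observed event times (this equals the difference between the corresponding distribution functions, since $F = 1 - S$). The tiny data set in the usage lines is invented for illustration.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S_hat(t)) at distinct event times."""
    data = sorted(zip(times, events))
    at_risk, surv, out = len(data), 1.0, []
    i = 0
    while i < len(data):
        t, deaths, ties = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            out.append((t, surv))
        at_risk -= ties
    return out


def rmse_weibull_vs_km(alpha, beta, times, events):
    """RMSE between S(t) = exp(-beta * t**alpha) and the Kaplan-Meier curve."""
    km = kaplan_meier(times, events)
    if not km:
        raise ValueError("no uncensored observations")
    sq = [(math.exp(-beta * t ** alpha) - s) ** 2 for t, s in km]
    return math.sqrt(sum(sq) / len(sq))


# Toy usage with made-up follow-up times (1 = event observed, 0 = censored).
times, events = [1, 2, 3, 4], [1, 0, 1, 1]
print(kaplan_meier(times, events))
print(round(rmse_weibull_vs_km(1.0, 0.3, times, events), 4))
```

A smaller value indicates that the fitted marginal survival curve tracks the nonparametric estimate more closely.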

Table 3

Parameter estimates, credibility intervals and RMSE by algorithm.

Algorithm	Quantity	α1	β1	α2	β2	ϕ	RMSE
A1	Estimate	0.7624	0.0186	0.8399	0.0294	0.7159	0.4227
A1	Interval	(0.5999, 0.9361)	(0.0087, 0.0338)	(0.7607, 0.9353)	(0.0195, 0.0414)	(0.3765, 0.9637)	
A2	Estimate	0.7757	0.0179	0.8308	0.0310	0.7148	0.4619
A2	Interval	(0.5929, 0.9853)	(0.0071, 0.0343)	(0.6897, 0.9679)	(0.0172, 0.0515)	(0.3560, 0.9600)	
A3	Estimate	0.6438	0.0289	0.7015	0.0494	0.7266	0.3562
A3	Interval	(0.5103, 0.7967)	(0.0142, 0.0482)	(0.5910, 0.8273)	(0.0293, 0.0746)	(0.3675, 0.9715)	
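
The comparison between a fitted Weibull margin and the Kaplan–Meier estimate can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Weibull survival function is assumed to be S(t) = exp(-β t^α), the RMSE is averaged over the distinct event times of one margin, and the toy data are invented. Differences between survival functions and between distribution functions are identical, so the sketch works with S(t).

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event: 1 = observed, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def weibull_survival(t, alpha, beta):
    # Assumed parameterization: S(t) = exp(-beta * t**alpha).
    return math.exp(-beta * t ** alpha)

def rmse_vs_km(times, events, alpha, beta):
    """RMSE between the fitted Weibull curve and the Kaplan-Meier steps (assumed definition)."""
    curve = kaplan_meier(times, events)
    sq = [(weibull_survival(t, alpha, beta) - s) ** 2 for t, s in curve]
    return math.sqrt(sum(sq) / len(sq))

# Toy data (hypothetical, not the retinopathy data).
times = [2.0, 3.0, 3.0, 5.0, 8.0, 9.0]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
print(rmse_vs_km(times, events, alpha=0.8, beta=0.1))
```

In practice the Kaplan–Meier step would be taken from an established implementation, such as the survfit function mentioned in the text, rather than written by hand.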

Figure 1 shows the survival functions estimated by algorithms (red line) and (blue line). The step functions (black lines) are the Kaplan–Meier estimates. The curves estimated by algorithms A1 and A2 are very close, so we show only the curve estimated by , to keep the figure readable. The Kaplan–Meier estimates were obtained with the survfit function of the survival package in the R software.

Figure 1

The estimated survival function for algorithms and .

Table 4 shows the ESS and IAT values for the sequences generated by algorithms A1, A2 and A3. Algorithm A3 performed better than algorithms A1 and A2, i.e., it had the highest ESS value and the lowest IAT value for every parameter.

Table 4

Integrated autocorrelation time (IAT) and effective sample size (ESS) values for algorithms A1, A2 and A3.

Parameter	ESS (A1)	ESS (A2)	ESS (A3)	IAT (A1)	IAT (A2)	IAT (A3)
α1	5.4650	159.8655	791.0559	435.0485	34.2212	6.4039
β1	6.5887	205.4812	880.9221	81.9980	26.8373	5.6359
α2	8.1633	134.7412	227.6705	327.9376	35.6760	24.6754
β2	16.1893	133.8282	230.9487	36.7590	30.5560	21.1668
ϕ	2443.3791	2400.0097	2461.1781	2.3426	2.3348	2.2813
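
For readers who want to reproduce this kind of diagnostic, the following is a minimal sketch of one common way to estimate IAT and ESS from a single chain. It is not necessarily the estimator used to produce Table 4: the function names are hypothetical, the autocorrelation sum is truncated at the first non-positive sample autocorrelation (a simple assumed rule), and ESS is taken as the chain length divided by the IAT.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def iat(x, max_lag=None):
    """IAT = 1 + 2 * sum of autocorrelations, truncated at the first non-positive one."""
    max_lag = max_lag or len(x) // 2
    total = 1.0
    for lag in range(1, max_lag):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        total += 2.0 * rho
    return total

def ess(x):
    """Effective sample size under the assumed definition ESS = N / IAT."""
    return len(x) / iat(x)

# Toy comparison: an uncorrelated chain versus a strongly autocorrelated AR(1) chain.
random.seed(1)
white = [random.gauss(0, 1) for _ in range(3000)]
ar = [0.0]
for _ in range(2999):
    ar.append(0.9 * ar[-1] + random.gauss(0, 1))
print(iat(white), ess(white))
print(iat(ar), ess(ar))
```

A chain that mixes poorly, like the AR(1) toy chain here, yields a large IAT and a small ESS, which is the pattern reported for A1 in the table.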

We also compared the performances of the algorithms in relation to the sequences generated for each parameter. Figure 2 shows the traceplots, the ergodic means, and the autocorrelations for the sequences of values simulated by algorithms A1, A2 and A3.

Figure 2

Traceplot, ergodic mean and autocorrelation for sequences produced by algorithms A1, A2 and A3 for .

It can be observed in these graphs that the values generated by the IMH (algorithm A1) mix poorly, do not show satisfactory stability of the ergodic mean, and have high autocorrelation at long lags. On the other hand, the values generated by the RWM (algorithm A2) and the SS (algorithm A3) mix better and show satisfactory stability of the ergodic mean. However, the sequence produced by the SS has the most rapidly decreasing autocorrelation. Figure 3 shows the same graphs for parameter ϕ. As can be seen, for ϕ the performances of the three algorithms are satisfactory. These results, together with the RMSE values, show that for the data set analyzed here SS performs better than IMH or RWM.
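
The ergodic mean plotted in these figures is the running average of the chain. A minimal sketch, with a hypothetical function name and a toy sequence, is:

```python
def ergodic_mean(x):
    """Running mean m_t = (x_1 + ... + x_t) / t for t = 1, ..., len(x)."""
    means, total = [], 0.0
    for t, value in enumerate(x, start=1):
        total += value
        means.append(total / t)
    return means

chain = [2.0, 4.0, 6.0, 8.0]  # toy values, not MCMC output
print(ergodic_mean(chain))  # [2.0, 3.0, 4.0, 5.0]
```

For a chain that has reached its stationary distribution, this curve flattens out as t grows; a curve that keeps drifting is a sign of poor mixing.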

Figure 3

Traceplot, ergodic mean and autocorrelation for sequences produced by algorithms A1, A2 and A3 for ϕ.

Figure 4 shows the Gelman plot for the simulated values for , and in two chains by each algorithm. As can be seen, the number of iterations was sufficient for algorithms A2 and A3 to reach convergence, but not for algorithm A1 (Figure 4a,b). The scale reduction factors for each parameter in algorithms A2 and A3 are all less than , while for algorithm A1 only ϕ presents a scale reduction factor less than .

Figure 4

Gelman plot for two sequences produced by algorithms A1, A2 and A3 for , and .
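
The scale reduction factor behind the Gelman plot can be sketched as below. This is the basic Gelman–Rubin formula for chains of equal length, written with hypothetical names; it omits the degrees-of-freedom correction that some implementations apply, so its values may differ slightly from those of a packaged diagnostic.

```python
import math
import random

def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return math.sqrt(var_hat / w)

# Toy example: two chains that agree, and two chains stuck in different regions.
rng = random.Random(0)
chain_a = [rng.gauss(0, 1) for _ in range(2000)]
chain_b = [rng.gauss(0, 1) for _ in range(2000)]
chain_c = [v + 3.0 for v in chain_b]
print(gelman_rubin([chain_a, chain_b]))
print(gelman_rubin([chain_a, chain_c]))
```

Values close to 1 indicate that the chains are sampling the same distribution, while values well above 1 indicate that more iterations are needed.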

6. Final Remarks

We investigated the performances of three Bayesian computational methods for estimating the parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with Weibull marginal distributions. The performances of the MCMC algorithms were compared using the RMSE criterion. The RMSE values were calculated for different sample sizes and different percentages of censoring. The results obtained from the simulated data sets showed that the RWM and SS algorithms outperformed the IMH algorithm, and that the SS algorithm performed better for smaller sample sizes. The results also provide evidence that MCMC sequences obtained with SS, for the same number of iterations L, burn-in B and thinning value, have better properties (i.e., higher ESS and lower IAT values) than those obtained with IMH and RWM, which are standard methods for sampling from the joint posterior distribution.

We also illustrated the application of the algorithms using a real data set available in the literature. Algorithm A3 (with SS generating the αj's) performed better when applied to this data set. The criteria used to reach this conclusion were the stability of the ergodic mean, the autocorrelation, the minimum RMSE value, the maximum ESS value, and the minimum IAT value. In addition, the algorithm using SS performed satisfactorily with respect to the scale reduction factor and the Gelman plot of the Gelman–Rubin convergence diagnostic.

Our results show that algorithm A3, which combines SS for generating αj, MH for βj and IMH for ϕ, is an effective algorithm for simulating values from the joint posterior distribution of an AMH copula with Weibull marginal distributions. Moreover, two advantages of SS are that it is easy to implement and that it does not require specifying a candidate generating density. A disadvantage in our specific case is that it took longer to run than IMH and RWM. The reason is that no analytical solution is available for the inverse of the function , so we needed an iterative method to obtain it. All calculations were implemented in the R software, and the code can be obtained from the authors. Extending the results obtained here to other Archimedean copulas and other marginal distributions, and a possible generalization, would be a fruitful area for future work.
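
To illustrate why slice sampling needs no candidate generating density, the following is a minimal univariate sketch. It is not the authors' implementation: instead of inverting a function iteratively to find the slice, it uses the generic stepping-out and shrinkage procedure, and the function name, the width parameter and the standard normal target are all assumptions made for illustration.

```python
import random

def slice_sample(log_f, x0, n_samples, width=1.0, rng=random):
    """Univariate slice sampler (stepping-out and shrinkage) for an unnormalized log density."""
    samples, x = [], x0
    for _ in range(n_samples):
        # Auxiliary height: log(u * f(x)) with u ~ Uniform(0, 1), since -log(u) ~ Exp(1).
        log_y = log_f(x) - rng.expovariate(1.0)
        # Stepping out: place an interval of size `width` at random around x, then expand it.
        left = x - width * rng.random()
        right = left + width
        while log_f(left) > log_y:
            left -= width
        while log_f(right) > log_y:
            right += width
        # Shrinkage: draw uniformly from the interval, shrinking it after each rejection.
        while True:
            cand = rng.uniform(left, right)
            if log_f(cand) > log_y:
                x = cand
                break
            if cand < x:
                left = cand
            else:
                right = cand
        samples.append(x)
    return samples

# Toy target: standard normal, known only up to a constant.
draws = slice_sample(lambda v: -0.5 * v * v, x0=0.0, n_samples=4000, rng=random.Random(1))
mean = sum(draws) / len(draws)
print(mean, sum((d - mean) ** 2 for d in draws) / len(draws))
```

Only evaluations of the (unnormalized) log density are required, so the only tuning choice is the initial interval width. A target with bounded support can be handled by returning negative infinity outside the support.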
