
Laplacian-P-splines for Bayesian inference in the mixture cure model.

Oswaldo Gressani, Christel Faes, Niel Hens.

Abstract

The mixture cure model for analyzing survival data is characterized by the assumption that the population under study is divided into a group of subjects who will experience the event of interest over some finite time horizon and another group of cured subjects who will never experience the event irrespective of the duration of follow-up. When using the Bayesian paradigm for inference in survival models with a cure fraction, it is common practice to rely on Markov chain Monte Carlo (MCMC) methods to sample from posterior distributions. Although computationally feasible, the iterative nature of MCMC often implies long sampling times to explore the target space with chains that may suffer from slow convergence and poor mixing. Furthermore, extra efforts have to be invested in diagnostic checks to monitor the reliability of the generated posterior samples. A sampling-free strategy for fast and flexible Bayesian inference in the mixture cure model is suggested in this article by combining Laplace approximations and penalized B-splines. A logistic regression model is assumed for the cure proportion and a Cox proportional hazards model with a P-spline approximated baseline hazard is used to specify the conditional survival function of susceptible subjects. Laplace approximations to the posterior conditional latent vector are based on analytical formulas for the gradient and Hessian of the log-likelihood, resulting in a substantial speed-up in approximating posterior distributions. The spline specification yields smooth estimates of survival curves, and functions of latent variables together with their associated credible intervals are estimated in seconds. A fully stochastic algorithm based on a Metropolis-Langevin-within-Gibbs sampler is also suggested as an alternative to the proposed Laplacian-P-splines mixture cure (LPSMC) methodology. The statistical performance and computational efficiency of LPSMC are assessed in a simulation study.
Results show that LPSMC is an appealing alternative to MCMC for approximate Bayesian inference in standard mixture cure models. Finally, the novel LPSMC approach is illustrated on three applications involving real survival data.
© 2022 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Keywords:  Approximate Bayesian inference; Laplace approximation; Metropolis-adjusted Langevin algorithm; P-splines; Survival analysis

Year:  2022        PMID: 35699121      PMCID: PMC9542184          DOI: 10.1002/sim.9373

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.497


INTRODUCTION

Survival analysis is a challenging, yet very attractive area of statistical science that is devoted to the study of time-to-event data. Standard models for survival data typically leave no room for the existence of a cure fraction, so it is implicitly assumed that all subjects of the population under study will experience the event of interest as time unfolds for a sufficiently long period. Technological breakthroughs in medicine during the last decades, especially in cancer research, have led to the development of promising new treatments and therapies, so that many diseases previously considered fatal can now be cured. This phenomenon has triggered the necessity to develop models that allow for long-term survivors and gave birth to cure models. A recent complete textbook treatment of cure models is proposed by Peng and Yu, and an enriching literature review on cure regression models has been written by Amico and Van Keilegom. Among the large family of cure models that have emerged, the mixture cure model, driven by the seminal work of Boag, Berkson and Gage, and Haybittle, and later refined by Farewell, is probably the most prominent, as its mathematical formulation allows for a clear and interpretable separation of the population into two categories, namely cured subjects and susceptible subjects who are at risk of experiencing the event of interest. Let T be a continuous random variable representing the survival time. Existence of a cure proportion in the population under study is made possible by allowing the event {T = +∞} to arise with positive probability. To include covariate information, denote by x and z random covariate vectors (with continuous and/or discrete entries) that belong to covariate spaces X and Z, respectively. In a mixture cure model, the population survival function expresses the separation between the cured and uncured subpopulations as follows: S_pop(t|x,z) = p(x) S_u(t|z) + 1 − p(x), (1) with covariate vectors x and z that can share (partially) the same components or can be entirely different.
The term p(x) is frequently called the "incidence" of the model and corresponds to the conditional probability of being uncured, that is, p(x) = P(B = 1 | x), with binary variable B = I(T < +∞) referring to the (unknown) susceptible status and I(·) the indicator function, that is, I(E) = 1 if condition E is true. The term S_u(t|z) is known as the "latency" and represents the conditional survival function of the uncured subjects, S_u(t|z) = P(T > t | B = 1, z). The logistic link is commonly employed to establish a functional relationship between the probability to be uncured and the covariate vector x, so that p(x) = exp(β_0 + β'x)/(1 + exp(β_0 + β'x)), with regression coefficients β and an intercept term β_0. The latency part is often specified in a semiparametric fashion by using the Cox proportional hazards (PH) model (see eg, 12, 13), which implies the following form for the survival function of the susceptibles: S_u(t|z) = S_0(t)^exp(γ'z), where γ are the regression coefficients pertaining to the latency part and S_0(·) is the baseline survival function. The philosophy underlying Bayesian approaches considers that the model parameters are random and that their underlying uncertainty is characterized by probability distributions. After obtaining data, Bayes' theorem acts as a mechanistic process describing how to update our knowledge and is essentially the key ingredient permitting the transition from prior to posterior beliefs. Unfortunately, the complexity of mixture cure models is such that the posterior distributions of latent variables of interest are not obtainable in closed form. An elegant stochastic method that is widely used in practice is Markov chain Monte Carlo (MCMC), as it allows one to draw random samples from desired target posterior distributions and hence compute informative summary statistics. According to Greenhouse et al, the first steps of Bayesian methods applied to mixture models with a cure fraction date back as far as Chen et al to analyze cancer data. Later, Stangl, and then Stangl and Greenhouse, used a Bayesian mixture survival model to analyze clinical trial data related to mental health.
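The decomposition above can be made concrete in a few lines of code. The following Python sketch is purely illustrative (it is not the authors' implementation; the parameter values are borrowed from the simulation scenarios of Section 4 as a toy example), assuming a logistic incidence and a Cox PH latency with a user-supplied baseline survival function:

```python
import numpy as np

def incidence(beta, x):
    """Logistic incidence p(x): conditional probability of being uncured."""
    eta = beta[0] + np.dot(beta[1:], x)      # intercept + covariate effects
    return 1.0 / (1.0 + np.exp(-eta))

def latency(t, gamma, z, S0):
    """Cox PH latency: S_u(t|z) = S0(t)^exp(gamma'z)."""
    return S0(t) ** np.exp(np.dot(gamma, z))

def population_survival(t, beta, gamma, x, z, S0):
    """Mixture cure survival: S_pop(t|x,z) = p(x) S_u(t|z) + 1 - p(x)."""
    p = incidence(beta, x)
    return p * latency(t, gamma, z, S0) + (1.0 - p)

# toy check with an exponential baseline: the curve plateaus at the cure fraction
S0 = lambda t: np.exp(-t)
beta, gamma = np.array([0.70, 1.15, 0.95]), np.array([0.10, 0.25])
x = z = np.array([0.0, 1.0])
```

For large t, S_u tends to 0 and the population survival settles at the cure fraction 1 − p(x), which is exactly the plateau that motivates cure models.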
The end of the 1990s saw the emergence of Bayesian approaches in the promotion time cure model, another family of cure models motivated by biological mechanisms that does not impose a mixture structure on the survival. Some references for Bayesian analysis in the latter model class are Yin and Ibrahim, who proposed a Box-Cox based transformation on the population survival function to reach a unified family of cure rate models embedding the promotion time cure model as a special case; Bremhorst and Lambert used Bayesian P-splines with MCMC for flexible estimation in the promotion time cure model, and Gressani and Lambert suggested a faster alternative based on Laplace approximations. More recent uses of Bayesian methods in mixture cure models are Yu and Tiwari in the context of grouped population-based cancer survival data, or Martinez et al, who consider a parametric specification for the baseline survival of uncured subjects governed by a generalized modified Weibull distribution. In the literature on cure survival models, only scarce attempts have been initiated to propose an alternative to the deep-rooted MCMC instruments. This is especially true for mixture cure models, where to our knowledge Lázaro et al is the only reference proposing an approximate Bayesian method based on a combination of Integrated Nested Laplace Approximations (INLA) and modal Gibbs sampling. In this article, we propose a new approach for fast approximate Bayesian inference in the mixture cure model based on the idea of Laplacian-P-splines. The proposed Laplacian-P-splines mixture cure (LPSMC) model has various practical and numerical advantages that are worth mentioning. First, as opposed to Lázaro et al, our approach is completely sampling-free in the sense that estimation can be fully reached without the need of drawing samples from posterior distributions.
This of course implies a huge gain from the computational side, without even mentioning the additional speed-up effect implied by the analytically available gradient and Hessian of the log-likelihood function in our Laplace approximation scheme. Second, the LPSMC approach delivers approximations to the joint posterior latent vector, while the INLA scheme concentrates on obtaining approximated versions of the marginal posterior of latent variables. A direct positive consequence is that with LPSMC, the "delta" method can be used to compute (approximate) credible intervals for functions of latent variables, such as the cure proportion or the survival function of the uncured, in virtually no time. A third beneficial argument is that the use of P-splines is particularly well adapted to a Bayesian framework and provides smooth estimates of the survival function. Our methodology and its associated algorithms natively account for the underlying mixture specification, making LPSMC a tailored toolkit to fit mixture cure models, contrary to INLA, which is not originally adapted to cope with mixture structures. A fully stochastic alternative to LPSMC is also proposed based on a Metropolis-Langevin-within-Gibbs (MLWG) algorithm to sample from the joint posterior of the model parameters. The conditional posterior distribution of the latent vector is explored via a Metropolis algorithm augmented by Langevin diffusions that account for the gradient of the (log) target distribution to generate new proposals. Using the variance-covariance matrix of the Laplace approximation as a basis for the correlation structure of the proposal distribution proves to be beneficial, as it improves chain mixing and convergence. The mixcurelps package gathers all the routines to implement the LPSMC methodology (and MLWG sampler) and can be used to reproduce the main results of this article as shown in the supplementary materials.
For higher efficiency, algorithms related to Laplace approximations and B‐splines evaluations are coded in C++ and integrated into the R language via the Rcpp package. The article is organized as follows. In Section 2, the spline specification of the log‐baseline hazard is presented and the Bayesian model is formulated along with the prior assumptions. Laplace approximations to the conditional latent vector are derived and an approximate version of the posterior penalty parameter is proposed. The end of Section 2 is dedicated to the construction of approximate credible intervals for (functions of) latent variables. In Section 3, we propose a Markov chain Monte Carlo algorithm with a Metropolis‐adjusted Langevin setting to explore the conditional posterior distribution of the latent vector. The steps of the so‐called Metropolis‐Langevin‐within‐Gibbs (MLWG) sampler are presented in detail and summarized in a pseudocode. Section 4 aims at assessing the proposed LPSMC and MLWG methodologies in a numerical study with simulated data under different cure and censoring scenarios. Section 5 is dedicated to three real data applications and Section 6 concludes the article.

THE LAPLACIAN‐P‐SPLINES MIXTURE CURE MODEL

We consider that the survival time T is accompanied by the frequently encountered feature of random right censoring. Rather than observing T directly, one observes the pair (T̃, δ), where T̃ = min(T, C) is the follow-up time, δ = I(T ≤ C) is the event indicator (δ = 1 if the event occurred and δ = 0 otherwise), and C is a non-negative random censoring time that is assumed conditionally independent of T given the covariates, that is, T ⊥ C | (x, z). At the sample level, (t_i, δ_i) denotes the observables for the ith unit, with t_i the realization of T̃ and δ_i its associated event indicator. Vectors x_i and z_i represent the observed covariate values of subject i, and the entire information set available from data with sample size n is denoted by D = {t_i, δ_i, x_i, z_i : i = 1, …, n}.

Flexible modeling of the baseline risk function with B‐splines

A flexible spline specification of the (log) baseline hazard function is proposed, using a linear combination of cubic B-splines, that is, log h_0(t) = θ'b(t), where θ = (θ_1, …, θ_K)' is a K-dimensional vector of B-spline amplitudes and b(·) = (b_1(·), …, b_K(·))' is a cubic B-spline basis constructed from a grid of equally spaced knots in the closed interval [0, t_u], with t_u the largest observed follow-up time. Partitioning [0, t_u] into J (say 300) bins of equal width Δ with midpoints s_j, the Riemann midpoint rule is used to approximate the analytically unsolvable baseline survival function: S_0(t) ≈ exp(−Δ Σ_{j=1}^{j(t)} exp(θ'b(s_j))), where j(t) is an integer enumerating the bin that includes time point t.
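The midpoint-rule approximation is straightforward to code. The sketch below is illustrative (not the mixcurelps implementation): any callable returning the log baseline hazard stands in for the B-spline expansion θ'b(·):

```python
import numpy as np

def baseline_survival(t, log_h0, t_upper, J=300):
    """Riemann midpoint approximation of S0(t) = exp(-int_0^t h0(s) ds).

    log_h0 stands in for the B-spline expansion theta' b(t); the interval
    [0, t_upper] is partitioned into J bins of equal width."""
    delta = t_upper / J
    mids = (np.arange(J) + 0.5) * delta          # bin midpoints s_j
    j_t = min(J, int(np.ceil(t / delta)))        # index j(t) of the bin holding t
    cum_haz = delta * np.sum(np.exp(log_h0(mids[:j_t])))
    return np.exp(-cum_haz)

# constant hazard h0 = 1 (log h0 = 0) should give S0(t) close to exp(-t)
approx = baseline_survival(2.0, lambda s: np.zeros_like(s), t_upper=10.0)
```

The approximation error is controlled by the number of bins J; with J = 300 the discretization is negligible relative to estimation uncertainty.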

Latent vector and priors

The latent vector of the model is ξ = (θ', β', γ')' and contains the spline vector θ, the vector β of regression coefficients belonging to the incidence part (including the intercept) and the vector γ of remaining regression coefficients belonging to the latency part, with dimension dim(ξ) = K + p + q. Based on the idea of Eilers and Marx, we fix K large enough to ensure flexible modeling of the baseline hazard curve and counterbalance the latter flexibility by imposing a discrete penalty on neighboring B-spline coefficients based on finite differences. In a Bayesian translation, the prior distributional assumption on the B-spline vector is taken to be Gaussian, θ | λ ~ N(0, (λP)^{-1}), with a precision matrix formed by the product of a roughness penalty parameter λ > 0 and a penalty matrix P = D_r'D_r obtained from rth order difference matrices D_r of dimension (K − r) × K. An ε-multiple of the K-dimensional identity matrix is added, P = D_r'D_r + εI_K, to ensure full rankedness, with the scalar perturbation ε fixed to a small value. Furthermore, a Gaussian prior is imposed on the remaining regression coefficients β and γ, with zero mean and small (common) precision ς, resulting in a proper (conditional) Gaussian prior for the latent vector with block-diagonal covariance matrix. The prior precision matrix of the latent vector is denoted by Q(λ) = blockdiag(λP, ςI_{p+q}). For a full Bayesian treatment, we impose a robust Gamma prior on the roughness penalty parameter as suggested by Jullion and Lambert. In particular, the prior on the roughness penalty parameter conditional on a dispersion hyperparameter φ is taken to be λ | φ ~ G(ν/2, νφ/2), where G(a, b) denotes a Gamma distribution with mean a/b and variance a/b². A proper Gamma hyperprior is imposed on φ, that is, φ ~ G(a_φ, b_φ), with small values for a_φ and b_φ to make it uninformative. For a fixed value of ν, Jullion and Lambert have shown that the model fit exhibits almost no sensitivity to the choice of a_φ and b_φ when the latter are fixed to a small value. This is also confirmed in our setting (see supplementary materials), where a sensitivity analysis is implemented to measure the impact on the mixture cure model fit of different configurations of a_φ and b_φ.
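The penalty construction itself is a few lines of linear algebra. A minimal sketch, assuming an illustrative value for the perturbation ε (the text does not fix it):

```python
import numpy as np

def penalty_matrix(K, r=3, eps=1e-6):
    """P = D_r' D_r + eps * I_K, with D_r the r-th order difference matrix.

    eps is a small full-rank perturbation (value chosen for illustration)."""
    D = np.eye(K)
    for _ in range(r):
        D = np.diff(D, axis=0)       # each differencing removes one row
    return D.T @ D + eps * np.eye(K)

P = penalty_matrix(15, r=3)          # K = 15 splines, third-order penalty
```

The prior log-density kernel of θ given λ is then 0.5·K·log(λ) − 0.5·λ·θ'Pθ, up to constants, which is the quantity that reappears in the penalty posterior of Section 2.4.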

Laplace approximations

In a mixture cure model, the full likelihood is given by (see eg, Sy and Taylor): L(ξ; D) = Π_{i=1}^{n} [p(x_i) f_u(t_i|z_i)]^{δ_i} [1 − p(x_i) + p(x_i) S_u(t_i|z_i)]^{1−δ_i}, where f_u(·|z) denotes the density of the uncured. Using the Cox PH model specification for the survival function of the susceptibles, one recovers f_u(t|z) = h_0(t) exp(γ'z) S_0(t)^exp(γ'z). It follows that the log-likelihood is: ℓ(ξ; D) = Σ_{i=1}^{n} { δ_i [log p(x_i) + log h_0(t_i) + γ'z_i + exp(γ'z_i) log S_0(t_i)] + (1 − δ_i) log(1 − p(x_i) + p(x_i) S_u(t_i|z_i)) }. Using the B-spline approximations for the baseline quantities, we replace log h_0(t) by θ'b(t) and S_0(t) by its midpoint-rule approximation. Let us denote by ℓ_i(ξ) the contribution of the ith unit to the log-likelihood, so that the log-likelihood can be compactly written as ℓ(ξ; D) = Σ_{i=1}^{n} ℓ_i(ξ). Using Bayes' theorem, the conditional posterior of the latent vector is (up to a proportionality constant): p(ξ | λ, D) ∝ exp(ℓ(ξ; D)) exp(−ξ'Q(λ)ξ/2). (2) A second-order Taylor expansion of the log-likelihood yields a quadratic form in the latent vector and hence can be used to obtain a Laplace approximation to (2), as shown in the appendix. In what follows, we denote by p̃_G(ξ | λ, D) the Laplace approximation to p(ξ | λ, D) for a given value of λ.
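The Laplace step can be sketched generically: Newton-Raphson on the log-posterior using the (here analytical) gradient and Hessian of the log-likelihood, with the approximate posterior covariance given by the inverse negative Hessian at the mode. This is an illustration under those assumptions, not the paper's C++ routine:

```python
import numpy as np

def laplace_approx(grad_ll, hess_ll, Q, x0, tol=1e-8, max_iter=50):
    """Gaussian approximation to p(xi | lambda, D) prop. to exp(loglik) * N(0, Q^-1).

    grad_ll/hess_ll: gradient and Hessian of the log-likelihood.
    Returns the posterior mode and the covariance of the Laplace fit."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad_ll(x) - Q @ x                  # gradient of the log-posterior
        H = hess_ll(x) - Q                      # Hessian of the log-posterior
        step = np.linalg.solve(H, -g)           # Newton-Raphson step
        x += step
        if np.linalg.norm(step) < tol:
            break
    Sigma = np.linalg.inv(Q - hess_ll(x))       # inverse negative Hessian at mode
    return x, Sigma

# sanity check on a Gaussian 'likelihood', where the approximation is exact:
# loglik(x) = -0.5 (x-a)' A (x-a)  =>  mode solves (A + Q) x = A a
A, a = np.diag([2.0, 3.0]), np.array([1.0, -1.0])
mode, Sigma = laplace_approx(lambda x: -A @ (x - a), lambda x: -A,
                             Q=np.eye(2), x0=np.zeros(2))
```

For a Gaussian target the routine converges in one step; for the mixture cure log-likelihood the same iterations run until the Newton step is negligible.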

Approximate posterior of the penalty parameter

The conditional posterior in (2) is a function of the penalty parameter λ. In a frequentist setting, an "optimal" value for λ is generally obtained by means of the Akaike information criterion (AIC) or (generalized) cross-validation. From a Bayesian perspective, λ is random and its associated posterior distribution is of crucial importance for optimal smoothing. Let η = (λ, φ)' denote the vector of hyperparameters. Mathematically, the posterior of η is given by: p(η | D) ∝ p(λ | φ) p(φ) ∫ exp(ℓ(ξ; D)) p(ξ | λ) dξ. (3) Using the Laplace approximation in the integrand and evaluating at the modal value ξ̂(λ) of the Laplace approximation, the marginal posterior in (3) is approximated in the spirit of Tierney and Kadane. Note that the determinant of the block-diagonal prior precision matrix Q(λ) (coming from the Gaussian prior on ξ) is equal to the product of the determinants of the composing blocks, and hence contributes a factor proportional to λ^{K/2}, which explains part of the power to which the roughness penalty parameter is raised. Note also that, viewed as a function of the dispersion hyperparameter φ, the product p(λ | φ) p(φ) is the kernel of a Gamma distribution with shape parameter ν/2 + a_φ and rate parameter νλ/2 + b_φ, so that integrating the latter over the domain (0, +∞) yields Γ(ν/2 + a_φ)/(νλ/2 + b_φ)^{ν/2 + a_φ}, where Γ(·) is the gamma function. It follows that φ can be analytically integrated out, leaving an approximated marginal posterior p̃(λ | D) for the penalty parameter. (4) For numerical reasons it is more appropriate to work with the log-transformed penalty parameter v = log(λ), as the latter is unbounded. Using the transformation method for random variables, one obtains the following approximated (log) posterior for v: log p̃(v | D) ≐ log p̃(λ = exp(v) | D) + v, (5) where ≐ denotes equality up to an additive constant. Approximation (5) provides a good starting point for various strategies to explore the posterior penalty space. A possibility is to use grid-based approaches, or MCMC algorithms, as often encountered in models with a multidimensional penalty space. In latent Gaussian models, the posterior penalty typically satisfies suitable regularity conditions such as unimodality and not a "too-large" deviation from Gaussianity.
This suggests using a simple yet efficient type of bracketing algorithm to compute the (approximate) posterior mode of p̃(v | D). Starting with an arbitrarily "large" value, say v_0, the algorithm moves in the left direction with a fixed step size Δv, that is, at the kth iteration v_k = v_0 − kΔv. Movement in the left direction continues until reaching the point at which the target function starts to point downhill. The approximated modal value v̂ is then the last point visited before the descent. Figure 1 illustrates the normalized approximate posterior of v along with the modal value (dashed line) obtained with a fixed step size.
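The bracketing search can be sketched as follows (illustrative; starting value and step size are arbitrary, and the toy target stands in for the log posterior in (5)):

```python
def bracket_mode(log_post, v0=10.0, step=0.1):
    """Leftward grid search for the mode of a unimodal log-posterior in v = log(lambda).

    Moves from a deliberately large v0 by a fixed step until the target starts
    to decrease; the last uphill point is the approximate mode."""
    v, f = v0, log_post(v0)
    while True:
        v_new = v - step
        f_new = log_post(v_new)
        if f_new < f:            # target points downhill: mode bracketed
            return v
        v, f = v_new, f_new

# toy unimodal target with mode at v = 2.3
v_hat = bracket_mode(lambda v: -(v - 2.3) ** 2)
```

By construction the returned value is within one step size of the true mode for a unimodal target, which is accurate enough given the flatness of the fit in λ around its optimum.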
FIGURE 1

Approximated (normalized) posterior of the log penalty v = log(λ). The dashed line is the modal value obtained with the bracketing algorithm using a fixed step size.

Approximate credible intervals

The Laplace approximation to the conditional posterior of the latent vector evaluated at the (approximated) modal posterior value λ̂ = exp(v̂) (cf. Section 2.4) is denoted by p̃_G(ξ | λ̂, D), and a point estimate for ξ is taken to be the mean/mode ξ̂ with associated variance-covariance matrix Σ̂. To ensure that the estimated baseline survival function "lands" smoothly on the horizontal asymptote at 0 near the end of the follow-up, we constrain the value of the last B-spline coefficient. A major advantage of LPSMC is that credible intervals for (differentiable functions of) latent variables can be straightforwardly obtained starting from p̃_G(ξ | λ̂, D).

Credible interval for latent variables

An approximate (1 − α) credible interval for a latent variable ξ_h follows easily from the fact that the Laplace approximated posterior of ξ_h is Gaussian, N(ξ̂_h, Σ̂_hh), where ξ̂_h is the hth entry of the vector ξ̂ and Σ̂_hh is the hth entry on the main diagonal of Σ̂. It follows that a quantile-based (1 − α) credible interval for ξ_h is ξ̂_h ± z_{α/2} Σ̂_hh^{1/2}, where z_{α/2} is the upper α/2 quantile of a standard normal variate.
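In code this is a one-liner per coordinate; a stdlib-only sketch (the numeric inputs are illustrative, taken from a Table 2 estimate):

```python
from statistics import NormalDist

def credible_interval(mean, sd, alpha=0.05):
    """Quantile-based (1 - alpha) credible interval under a Gaussian
    Laplace approximation: mean +/- z_{alpha/2} * sd."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    return mean - z * sd, mean + z * sd

lo, hi = credible_interval(1.182, 0.240)   # eg, a beta estimate and its sd
```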

Credible interval for the incidence and cure rate

The incidence of the mixture cure model is a function of the latent vector and hence an appropriate approach to derive credible intervals for p(x) is through a "delta" method. In particular, let us consider the following differentiable function of the probability to be uncured: g(β) = log(p(x†)), where x† is a known profile of the covariate vector. The Laplace approximated posterior of the vector β is Gaussian with mean vector β̂ and covariance matrix Σ̂_β. The delta method operates via a first-order Taylor expansion of g(β) around β̂: g(β) ≈ g(β̂) + ∇g(β̂)'(β − β̂), (6) with gradient: ∇g(β̂) = (1 − p̂(x†)) x̃†, where x̃† is the covariate profile with the intercept prepended and p̂(x†) the incidence evaluated at β̂. Note that g(β) in (6) is still Gaussian, as it is a linear combination of a random vector that is a posteriori (approximated by a) Gaussian due to the Laplace approximation, with mean g(β̂) and variance ∇g(β̂)'Σ̂_β∇g(β̂). This suggests writing the approximated posterior of g(β) as N(g(β̂), ∇g(β̂)'Σ̂_β∇g(β̂)), and so an approximate quantile-based (1 − α) credible interval for g(β) is: g(β̂) ± z_{α/2} (∇g(β̂)'Σ̂_β∇g(β̂))^{1/2}. (7) Applying the exponential function to the bounds of the interval (7) gives us the desired credible interval for the incidence p(x†). If a credible interval for the cure rate 1 − p(x†) is required, simply use the transformation g(β) = log(1 − p(x†)) with gradient ∇g(β̂) = −p̂(x†) x̃†.
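The delta-method interval for the incidence translates directly into code. A sketch under the log transform described above (the estimates and covariance matrix below are hypothetical placeholders, not fitted values):

```python
import numpy as np
from statistics import NormalDist

def incidence_ci(beta_hat, Sigma_beta, x_profile, alpha=0.05):
    """Delta-method CI for p(x) = expit(beta' x~) via g(beta) = log p(x).

    The gradient of log p is (1 - p) x~; the Gaussian interval for log p
    is exponentiated to give the interval for p."""
    x_tilde = np.concatenate(([1.0], x_profile))       # prepend the intercept
    p = 1.0 / (1.0 + np.exp(-(x_tilde @ beta_hat)))
    grad = (1.0 - p) * x_tilde                         # gradient of log p at beta_hat
    sd = float(np.sqrt(grad @ Sigma_beta @ grad))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return np.exp(np.log(p) - z * sd), np.exp(np.log(p) + z * sd)

beta_hat = np.array([0.70, 1.15, 0.95])                # hypothetical estimates
Sigma_beta = 0.01 * np.eye(3)                          # hypothetical covariance
lo, hi = incidence_ci(beta_hat, Sigma_beta, np.array([0.0, 1.0]))
```

The cure-rate interval follows the same pattern with log(1 − p) and gradient −p·x̃.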

Credible interval for S0(·) and Su(·|z)

Let t_0q denote the qth quantile of the distribution of the survival time at baseline. The "delta" method can also be used to compute an approximate credible interval for S_0(·) at t_0q by using a transformation of the spline vector, g_0(θ) = log(−log S_0(t_0q)). Starting from the Laplace approximated posterior of θ, one can show that the resulting credible interval for g_0(θ) is: g_0(θ̂) ± z_{α/2} (∇g_0(θ̂)'Σ̂_θ∇g_0(θ̂))^{1/2}, (8) where ∇g_0(θ̂) is the gradient of g_0 with respect to θ evaluated at θ̂ and can be found in Gressani and Lambert, Appendix C. Applying the inverse transformation to (8) yields the desired credible interval for S_0(·) at t_0q. The same approach is used to construct credible intervals for the survival function of the uncured at t_0q for a given covariate profile z†. Applying the transform g_u(θ, γ) = log(−log S_u(t_0q | z†)) yields g_u(θ, γ) = g_0(θ) + γ'z†, with gradient: ∇g_u = (∇g_0(θ̂)', z†')'. The resulting credible interval for g_u is: g_u(θ̂, γ̂) ± z_{α/2} (∇g_u'Σ̂_{θγ}∇g_u)^{1/2}, (9) where Σ̂_{θγ} is the covariance matrix of the vector (θ', γ')' obtained from the Laplace approximation. Finally, applying the inverse transformation to (9) gives a credible interval for S_u(·|z†) at t_0q for the given covariate vector z†.

BAYESIAN INFERENCE BASED ON MARKOV CHAIN MONTE CARLO

A classic alternative to the Laplacian-P-splines methodology for Bayesian inference in the mixture cure model is to rely on Markov chain Monte Carlo methods. In this section, we propose a fully stochastic algorithm to explore the joint posterior distribution p(ξ, λ, φ | D). The sampling scheme is based on a Metropolis-adjusted Langevin strategy to draw samples from the conditional posterior of the latent vector given in (2). Such a strategy has already been proposed by Lambert and Eilers in a proportional hazards model with time-varying regression coefficients, and also later by Lambert and Eilers for density estimation. The Gibbs sampler can then be used to directly simulate from the conditional posterior distributions of the remaining parameters λ and φ. The proposed Metropolis-Langevin-within-Gibbs (MLWG) sampler complements the LPSMC approach and gives the end user the opportunity to choose between a totally sampling-free approach (as provided by LPSMC in Section 2) and a fully stochastic approach (as provided by MLWG) for inference. Combining the Gaussian prior on ξ with the Gamma prior on λ, the conditional posterior of the roughness penalty parameter is a Gamma distribution, λ | φ, ξ, D ~ G(K/2 + ν/2, θ'Pθ/2 + νφ/2), where the quadratic form involves the square matrix of dimension dim(ξ) with penalty matrix P on the upper-left block, the prior precision Q(λ) being completed by a matrix of the same dimension carrying the precision of the regression coefficients on the lower-right block. The conditional posterior of the dispersion hyperparameter is shown to be φ | λ, ξ, D ~ G(a_φ + ν/2, b_φ + νλ/2).

Metropolis‐adjusted Langevin algorithm

The Metropolis-adjusted Langevin algorithm (MALA) is used to explore the conditional posterior distribution p(ξ | λ, D). Compared to a classic random-walk Metropolis algorithm, MALA was shown to guarantee a more efficient sampling scheme, in the sense of improved mixing and convergence of the chains, in the mixture cure model. Two main reasons explain this extra efficiency. First, the proposal distribution in MALA is constructed around Langevin diffusions that use the gradient of the (log) posterior target distribution, here ∇ log p(ξ | λ, D). Second, the covariance matrix of the proposal distribution is based on the covariance matrix Σ̂ of the Laplace approximation, which sets a desirable correlation structure between latent variables. Let ξ^(m) and λ^(m) denote the states of the chains for ξ and λ, respectively, at iteration m. In MALA, the candidate latent vector at iteration m + 1 is a random draw from the following Gaussian proposal distribution: ξ̃ ~ N(ξ^(m) + (ρ/2) Σ̂ ∇ log p(ξ^(m) | λ^(m), D), ρΣ̂), where ρ is a tuning parameter that is adaptively calibrated to reach the optimal acceptance rate of 0.57. The MLWG algorithm alternates a MALA update of ξ with Gibbs draws of λ and φ from their Gamma conditionals given above. The kernel of the proposal density is q(ξ̃ | ξ) ∝ exp(−(1/(2ρ)) (ξ̃ − μ(ξ))'Σ̂^{-1}(ξ̃ − μ(ξ))), where μ(ξ) = ξ + (ρ/2) Σ̂ ∇ log p(ξ | λ, D). Defining the drift in this way, one can show that the ratio of proposal densities q(ξ^(m) | ξ̃)/q(ξ̃ | ξ^(m)) entering the computation of the acceptance probability simplifies accordingly. The adaptive tuning of ρ follows Lambert and Eilers. The MLWG algorithm is used to explore the joint posterior, and the generated samples (after burn-in) permit computing posterior estimates of latent variables.
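A single MALA transition of the kind described here can be sketched as follows (illustrative; the real sampler also interleaves the Gibbs draws for λ and φ and the adaptive tuning of ρ, which are omitted):

```python
import numpy as np

def mala_step(xi, log_post, grad_log_post, Sigma, rho, rng):
    """One Metropolis-adjusted Langevin step targeting exp(log_post).

    The proposal mean adds a Langevin drift (rho/2) Sigma grad to the current
    state; Sigma (eg, from a Laplace approximation) fixes the correlation
    structure of the proposal."""
    Sigma_inv = np.linalg.inv(Sigma)
    L = np.linalg.cholesky(rho * Sigma)

    def drift(a):
        return a + 0.5 * rho * Sigma @ grad_log_post(a)

    def log_q(a, mean):                       # Gaussian proposal log-kernel
        d = a - mean
        return -0.5 / rho * d @ Sigma_inv @ d

    prop = drift(xi) + L @ rng.standard_normal(xi.size)
    log_alpha = (log_post(prop) + log_q(xi, drift(prop))
                 - log_post(xi) - log_q(prop, drift(xi)))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return xi, False

# short run on a standard bivariate normal target
rng = np.random.default_rng(0)
xi, acc, draws = np.zeros(2), 0, []
for _ in range(2000):
    xi, accepted = mala_step(xi, lambda a: -0.5 * a @ a, lambda a: -a,
                             Sigma=np.eye(2), rho=1.0, rng=rng)
    acc += accepted
    draws.append(xi)
draws = np.array(draws)
```

The acceptance probability uses the full Metropolis-Hastings ratio, since the Langevin proposal is not symmetric.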

SIMULATION STUDY

To measure the statistical performance of the LPSMC methodology in a mixture cure model setting, we consider a numerical study where survival data are generated according to different cure and censoring rates. Generation of survival data for the ith subject is as follows. The incidence part is generated from a logistic regression function with two covariates, p(x_i) = exp(β_0 + β_1 x_{1i} + β_2 x_{2i})/(1 + exp(β_0 + β_1 x_{1i} + β_2 x_{2i})), where x_{1i} is a standard normal variate and x_{2i} is a Bernoulli random variable. The cure status is generated as a Bernoulli random variable with failure probability p(x_i), that is, B_i ~ Bern(p(x_i)). Survival times for the uncured subjects (B_i = 1) are obtained from a Weibull Cox proportional hazards model and are truncated at the upper bound of the follow-up window. The latency is specified through a Weibull baseline with given scale and shape parameters and linear predictor γ_1 z_{1i} + γ_2 z_{2i}. The covariate vector z_i is independent of x_i. We assume that z_{1i} follows a standard Gaussian distribution and z_{2i} a Bernoulli distribution. For the cured subjects (B_i = 0), the theoretically infinite survival times are replaced by a large value. The censoring time is independent of the covariate vector and is sampled from an exponential distribution with mean 1/μ_c, truncated at the end of the follow-up window. We consider samples of sizes n = 300 and n = 600 and simulate survival data under two scenarios for the coefficients β = (β_0, β_1, β_2)' and γ = (γ_1, γ_2)', yielding different censoring and cure rates. In both scenarios (almost) all the observations in the plateau of the Kaplan-Meier estimator are cured, such that the simulated data are representative of the practical real case scenarios for which mixture cure models are used. Table 1 provides a summary of the two considered scenarios.
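The data-generating mechanism just described can be sketched as follows. This is illustrative only: the incidence and latency coefficients and μ_c come from Scenario 1 of Table 1, but the Weibull scale/shape, the Bernoulli probability, the truncation point, and the reading of μ_c as a rate are assumptions, since those values are not recoverable from this text:

```python
import numpy as np

def simulate_mixture_cure(n, beta=(0.70, 1.15, 0.95), gamma=(0.10, 0.25),
                          mu_c=0.16, shape=1.45, scale=1.0, t_max=8.0, seed=1):
    rng = np.random.default_rng(seed)
    x1, x2 = rng.standard_normal(n), rng.binomial(1, 0.5, n)
    eta = beta[0] + beta[1] * x1 + beta[2] * x2
    p_uncured = 1.0 / (1.0 + np.exp(-eta))          # logistic incidence
    B = rng.binomial(1, p_uncured)                  # latent susceptible status
    # latency covariates, drawn independently of the incidence covariates
    z1, z2 = rng.standard_normal(n), rng.binomial(1, 0.5, n)
    # Weibull Cox PH times by inverting S_u(t|z) = exp(-scale * t^shape * e^{g'z})
    u = rng.uniform(size=n)
    T = (-np.log(u) / (scale * np.exp(gamma[0] * z1 + gamma[1] * z2))) ** (1 / shape)
    T = np.minimum(T, t_max)
    T[B == 0] = 1e3                                 # 'infinite' time if cured
    C = np.minimum(rng.exponential(1.0 / mu_c, n), t_max)   # censoring times
    return np.minimum(T, C), (T <= C).astype(int), x1, x2

t_obs, delta, x1, x2 = simulate_mixture_cure(500)
```

Cured subjects receive an effectively infinite event time and are therefore always censored, which reproduces the Kaplan-Meier plateau targeted by the simulation design.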
TABLE 1

Parameters for the incidence, latency, and censoring rate yielding different cure and censoring levels

Scenario   β0     β1     β2     γ1     γ2     Cure      μc     Censoring   Plateau
1          0.70   1.15   0.95   0.10   0.25   28.77%    0.16   48.56%      8.00%
2          1.25   0.75   0.45   0.10   0.20   21.20%    0.05   29.32%      14.21%

Note: The last column indicates the percentage of observations in the plateau located at the right tail of the Kaplan‐Meier curve.

We specify K = 15 cubic B-splines in the interval [0, t_u] and a third-order penalty to counterbalance model flexibility. In the bracketing algorithm, we use a fixed step size to compute the (approximate) modal posterior log penalty value v̂. For each scenario, we simulate 500 replications and compute the bias, empirical standard error (ESE), root mean square error (RMSE) and coverage probabilities of 90% and 95% (quantile-based) credible intervals. Results for LPSMC and the MLWG sampler with a chain of length 10 000 and a burn-in of 3000 are summarized in Table 2. The bias is negligible and the ESE and RMSE decrease with larger sample size, as expected. In addition, the estimated coverage probabilities are close to their respective nominal values in all scenarios. What can also be noted from Table 2 is the similarity between the LPSMC and MLWG results, showing that the Laplace approximation is quite accurate and competitive against a fully MCMC based approach even with a relatively long chain length. LPSMC is of course much faster than MLWG, as it takes around 0.7 s to fit the model with an Intel Xeon E-2186M processor at 2.90 GHz. Figure 2 shows the estimated baseline survival curves (gray), the target (solid) and the pointwise median of the 500 curves (dashed) for the different scenarios using the LPSMC approach.
TABLE 2

Numerical results for 500 replications of sample sizes n = 300 and n = 600 under two different cure-censoring scenarios

Scenario             Parameter  Mean            Bias             ESE            RMSE           CP90%        CP95%
Scenario 1 (n=300)   β0=0.70    0.721 (0.743)   0.021 (0.043)    0.249 (0.274)  0.250 (0.277)  91.0 (88.4)  96.4 (94.4)
                     β1=1.15    1.182 (1.230)   0.032 (0.080)    0.240 (0.255)  0.242 (0.267)  91.6 (89.8)  94.8 (94.2)
                     β2=0.95    0.954 (1.044)   0.004 (0.094)    0.390 (0.407)  0.390 (0.417)  90.6 (90.4)  94.2 (95.4)
                     γ1=0.10    0.101 (0.098)   0.001 (-0.002)   0.092 (0.094)  0.092 (0.094)  89.6 (87.8)  94.4 (94.8)
                     γ2=0.25    0.242 (0.234)   -0.008 (-0.016)  0.184 (0.188)  0.184 (0.188)  89.0 (89.2)  96.4 (94.8)
Scenario 1 (n=600)   β0=0.70    0.703 (0.715)   0.003 (0.015)    0.184 (0.181)  0.184 (0.181)  89.6 (90.0)  95.6 (95.0)
                     β1=1.15    1.154 (1.190)   0.004 (0.040)    0.167 (0.172)  0.167 (0.176)  90.2 (88.2)  95.0 (93.8)
                     β2=0.95    0.950 (0.978)   0.000 (0.028)    0.269 (0.280)  0.268 (0.281)  90.8 (88.4)  95.0 (93.8)
                     γ1=0.10    0.102 (0.100)   0.002 (0.000)    0.064 (0.065)  0.064 (0.065)  89.6 (89.0)  94.8 (93.8)
                     γ2=0.25    0.254 (0.247)   0.004 (-0.003)   0.127 (0.123)  0.127 (0.123)  90.8 (91.4)  96.0 (95.2)
Scenario 2 (n=300)   β0=1.25    1.281 (1.275)   0.031 (0.025)    0.229 (0.239)  0.230 (0.240)  91.4 (91.0)  95.6 (94.2)
                     β1=0.75    0.765 (0.784)   0.015 (0.034)    0.182 (0.197)  0.183 (0.200)  91.0 (88.2)  95.4 (93.2)
                     β2=0.45    0.430 (0.480)   -0.020 (0.030)   0.330 (0.333)  0.330 (0.334)  89.0 (90.2)  94.8 (95.0)
                     γ1=0.10    0.103 (0.102)   0.003 (0.002)    0.074 (0.075)  0.074 (0.075)  88.8 (90.2)  94.4 (95.0)
                     γ2=0.20    0.194 (0.196)   -0.006 (-0.004)  0.151 (0.157)  0.151 (0.157)  87.2 (87.8)  94.8 (93.4)
Scenario 2 (n=600)   β0=1.25    1.248 (1.270)   -0.002 (0.020)   0.160 (0.164)  0.160 (0.165)  91.4 (90.8)  95.6 (95.8)
                     β1=0.75    0.748 (0.771)   -0.002 (0.021)   0.129 (0.132)  0.129 (0.133)  89.8 (88.6)  95.2 (94.4)
                     β2=0.45    0.459 (0.467)   0.009 (0.017)    0.223 (0.238)  0.223 (0.239)  91.0 (89.8)  95.6 (94.4)
                     γ1=0.10    0.100 (0.106)   0.000 (0.006)    0.054 (0.052)  0.054 (0.052)  88.0 (89.6)  95.6 (94.0)
                     γ2=0.20    0.199 (0.195)   -0.001 (-0.005)  0.105 (0.111)  0.105 (0.111)  89.6 (87.8)  95.0 (92.6)

Note: Results in parenthesis are for the MLWG sampler with chain length 10 000 and burn‐in 3000.

FIGURE 2

Estimated baseline survival curves (gray) for 500 replications under different scenarios with LPSMC. The black curve is the target baseline and the dashed curve is the pointwise median of the 500 gray curves

Numerical results for replications under the two cure‐censoring scenarios. Note: results in parentheses are for the MLWG sampler with chain length 10 000 and burn‐in 3000.

Another performance measure for the incidence part of the model is the average squared error (ASE) of the estimated incidence, computed on triplets of covariate values in which the set of covariate couples is the Cartesian product of an equidistant grid with step size 0.001 and a finite set of values for the second covariate. Boxplots of the ASE in the different scenarios are displayed in Figure 3. Coverage probabilities of the (approximate) credible intervals for the incidence (and the cure rate) at the mean covariate profile, computed from (7), have also been measured with LPSMC and are close to their nominal values in all scenarios. Table 3 shows the performance of approximate credible intervals for the baseline survival and the survival of the uncured at a mean covariate profile, evaluated at selected quantiles, based on 500 replications. In Scenario 1, there is a more pronounced difference in coverage between LPSMC and MLWG at some of the lower quantiles, where the MLWG sampler exhibits estimated coverage probabilities moderately below the nominal level. The average credible interval widths and the baseline bias (reported in the supplementary materials) at these quantiles explain the difference: the mean credible interval widths are roughly identical for LPSMC and the sampling approach, but the baseline bias (although negligible) is slightly higher for MLWG, which implies a small loss in coverage. For the remaining quantiles, LPSMC and MLWG have comparable performance, except at t0.80 and t0.90, where both methods undercover but MLWG is closer to the nominal level in both scenarios.
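As a rough illustration of how these two performance measures can be computed, the sketch below (plain Python with a hypothetical "true" incidence, a perturbed estimate of it, and simulated intervals; it is not the paper's code) evaluates an average squared error over an equidistant grid with step size 0.001 and an empirical coverage probability over 500 replications:

```python
import math
import random

random.seed(1)

def average_squared_error(estimate, truth, grid):
    """ASE of an estimated incidence curve against the truth over a covariate grid."""
    return sum((estimate(x) - truth(x)) ** 2 for x in grid) / len(grid)

def empirical_coverage(intervals, true_value):
    """Percentage of credible intervals (lo, hi) that contain the true value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return 100.0 * hits / len(intervals)

# Hypothetical 'true' logistic incidence and a slightly perturbed estimate of it.
truth = lambda x: 1.0 / (1.0 + math.exp(-(1.0 + 0.5 * x)))
estimate = lambda x: 1.0 / (1.0 + math.exp(-(1.05 + 0.48 * x)))
grid = [i * 0.001 for i in range(-1000, 1001)]  # equidistant grid, step 0.001
ase = average_squared_error(estimate, truth, grid)

# Nominal 90% intervals centered at noisy point estimates of a true value 0.6:
# the empirical coverage over 500 replications should sit near 90%.
intervals = [(m - 1.645 * 0.1, m + 1.645 * 0.1)
             for m in (random.gauss(0.6, 0.1) for _ in range(500))]
cov = empirical_coverage(intervals, 0.6)
```

With 500 replications, the binomial Monte Carlo error of an estimated coverage probability is roughly 1.3 percentage points at a nominal 90% level, which is why estimated coverages a few points off nominal are unremarkable.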
FIGURE 3

Boxplots of the ASE for the incidence in different scenarios

TABLE 3

Estimated coverage probabilities of the 90% and 95% credible intervals for the baseline survival S0(·) and the survival of the uncured Su(·|z) at selected quantiles

Curve            Nominal      Scenario  t0.20  t0.30  t0.40  t0.50  t0.60  t0.70  t0.80  t0.90
Baseline S0(·)   90 (LPSMC)      1       88.5   86.5   90.0   92.0   93.0   93.5   81.0   41.5
                 90 (MLWG)       1       86.0   84.0   81.0   81.0   88.0   90.5   85.5   61.5
                 95 (LPSMC)      1       93.5   94.0   94.5   94.5   95.0   95.5   90.0   49.0
                 95 (MLWG)       1       92.0   88.5   88.5   90.5   93.5   95.0   92.5   74.5
                 90 (LPSMC)      2       85.0   83.5   86.0   88.0   88.5   89.0   70.5   18.5
                 90 (MLWG)       2       84.0   80.5   82.0   86.0   87.5   90.0   79.5   40.5
                 95 (LPSMC)      2       93.0   92.5   91.0   94.5   94.5   92.0   79.0   30.5
                 95 (MLWG)       2       93.0   90.0   90.0   92.0   95.5   94.0   87.0   52.5
Uncured Su(·|z)  90 (LPSMC)      1       88.5   89.5   88.5   89.5   94.0   91.5   77.5   22.5
                 90 (MLWG)       1       82.0   76.0   75.5   79.5   86.0   88.5   81.5   51.0
                 95 (LPSMC)      1       96.0   93.5   95.0   96.5   98.0   97.0   85.5   34.0
                 95 (MLWG)       1       89.0   83.5   84.5   87.0   93.0   95.5   90.0   60.5
                 90 (LPSMC)      2       87.0   81.5   85.5   90.0   88.5   87.0   68.0   10.0
                 90 (MLWG)       2       74.5   76.0   79.0   88.0   90.5   89.5   75.5   27.0
                 95 (LPSMC)      2       91.5   90.0   91.0   93.0   95.0   93.0   78.0   17.5
                 95 (MLWG)       2       85.0   83.5   89.0   92.5   96.0   94.0   82.0   42.5

Note: Results for the MLWG sampler with chain length 10 000 and burn‐in 3000 are also shown.

The empirical computation time required to implement LPSMC and MLWG (via the mixcurelps package) is assessed for different combinations of sample size n and B‐spline dimension K. The elapsed real time, averaged over replications for each (n, K) pair, is reported in Table 4. Even in relatively large sample settings and with a large number of B‐spline coefficients, LPSMC is able to fit the mixture cure model in a couple of seconds.
TABLE 4

Empirical computation times for LPSMC and MLWG (with chain length 500) for different combinations of sample size and B‐spline dimension

                 n=100   n=200   n=500   n=800   n=1000
LPSMC   K=15     0.289   0.333   0.572   0.777    0.970
        K=20     0.466   0.549   0.889   1.238    1.517
        K=30     1.042   1.110   1.848   3.057    4.164
        K=50     2.214   2.614   3.733   5.041    6.294
MLWG    K=15     1.561   1.808   2.514   3.250    3.801
        K=20     1.943   2.157   2.978   3.835    4.486
        K=30     2.818   3.065   4.588   5.997    7.422
        K=50     5.311   5.932   7.572   9.417   10.982

Note: Results correspond to average wall clock times (elapsed real times) in seconds over replications.

Computation times for LPSMC on denser grids of n and K, reported in the supplementary materials, suggest that the underlying algorithms run in linear time with respect to the sample size and in more than linear time with respect to the spline dimension.
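As a quick sanity check of the reported near-linear growth in the sample size, one can regress the Table 4 timings for LPSMC with K = 15 on n; the least-squares sketch below (plain Python, not part of the paper's software) recovers an approximately affine relationship with a fixed overhead:

```python
# Table 4 wall-clock times (seconds) for LPSMC with K = 15 B-spline coefficients.
n_vals = [100, 200, 500, 800, 1000]
times = [0.289, 0.333, 0.572, 0.777, 0.970]

# Ordinary least-squares fit of time against sample size.
n_bar = sum(n_vals) / len(n_vals)
t_bar = sum(times) / len(times)
slope = (sum((n - n_bar) * (t - t_bar) for n, t in zip(n_vals, times))
         / sum((n - n_bar) ** 2 for n in n_vals))
intercept = t_bar - slope * n_bar

# An affine law time ≈ intercept + slope * n predicts the n = 1000 timing well,
# consistent with linear scaling in n plus a fixed per-fit overhead.
pred_1000 = intercept + slope * 1000
```

The fitted slope is on the order of a millisecond per observation, with a positive intercept capturing the setup cost that dominates at small n.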

REAL DATA APPLICATIONS

ECOG e1684 clinical trial

We start by applying the LPSMC methodology to a well‐known dataset in the cure literature and compare the estimates with those obtained using the benchmark smcure package. The data come from the Eastern Cooperative Oncology Group (ECOG) phase III two‐arm clinical trial, with 284 observations after removing missing data. The aim of the study was to assess whether interferon alpha‐2b (IFN) had a significant effect on relapse‐free survival. There were 144 patients receiving the treatment, and the remaining 140 patients belonged to the control group. The response of interest is relapse‐free survival time measured in years. The following covariates enter both the incidence and latency parts of the model: TRT, the binary variable indicating whether a patient received the IFN treatment or not; SEX, indicating whether a patient is female or male (in total, the study involves 113 women and 171 men); and AGE, a continuous covariate (centered to the mean) giving the age of patients. Figure 4 shows the Kaplan‐Meier estimated survival curve for the e1684 dataset. A plateau is clearly visible, indicating the potential presence of a cure fraction, so that a mixture cure model is appropriate for this type of data.
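The Kaplan‐Meier estimator underlying Figure 4 can be sketched in a few lines (a simplified Python illustration on toy data, not the estimator used in the paper's software); a trailing run of censored observations produces exactly the kind of plateau that signals a possible cure fraction:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate. events[i] = 1 for an observed event,
    0 for a censored observation. Returns (event_times, survival_probs)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, out_t, out_s = 1.0, [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)   # events at time t
        m = sum(1 for tt, _ in data if tt == t)   # subjects leaving the risk set at t
        if d > 0:
            surv *= 1.0 - d / n_at_risk           # product-limit update
            out_t.append(t)
            out_s.append(surv)
        n_at_risk -= m
        i += m
    return out_t, out_s

# Toy data: three events, then only censored follow-up -> the curve plateaus.
ts, ss = kaplan_meier([1, 2, 2, 3, 5, 5, 5, 5], [1, 1, 0, 1, 0, 0, 0, 0])
```

Here the estimate drops at times 1, 2, and 3 and then stays flat at its last value because no further events are observed.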
FIGURE 4

Kaplan‐Meier estimated survival for the e1684 ECOG dataset. A cross indicates a censored observation

Table 5 summarizes the estimation results for the e1684 dataset with LPSMC, MLWG (chain length 20 000, burn‐in 10 000), and smcure. The estimated coefficients in the two parts of the model (incidence and latency) are of similar magnitude for all approaches. However, the computational times needed to fit the mixture cure model differ drastically across methods.
TABLE 5

Estimation results for the e1684 dataset with LPSMC, MLWG, and smcure

Model                Parameter        Estimate     SD      CI 90%
LPSMC (incidence)    β0 (intercept)     1.219     0.244   [ 0.819;  1.620]
                     β1 (SEX)          -0.061     0.284   [-0.528;  0.406]
                     β2 (TRT)          -0.567     0.281   [-1.029; -0.105]
                     β3 (AGE)           0.016     0.011   [-0.002;  0.034]
MLWG (incidence)     β0 (intercept)     1.355     0.375   [ 0.884;  1.949]
                     β1 (SEX)          -0.062     0.329   [-0.596;  0.467]
                     β2 (TRT)          -0.567     0.325   [-1.092; -0.044]
                     β3 (AGE)           0.019     0.016   [-0.001;  0.044]
smcure (incidence)   β0 (intercept)     1.365     0.329   [ 0.823;  1.906]
                     β1 (SEX)          -0.087     0.333   [-0.634;  0.461]
                     β2 (TRT)          -0.588     0.343   [-1.152; -0.025]
                     β3 (AGE)           0.020     0.016   [-0.006;  0.047]
LPSMC (latency)      γ1 (SEX)           0.092     0.170   [-0.188;  0.371]
                     γ2 (TRT)          -0.137     0.169   [-0.415;  0.142]
                     γ3 (AGE)          -0.007     0.006   [-0.016;  0.003]
MLWG (latency)       γ1 (SEX)           0.058     0.183   [-0.255;  0.353]
                     γ2 (TRT)          -0.170     0.188   [-0.484;  0.135]
                     γ3 (AGE)          -0.007     0.006   [-0.017;  0.004]
smcure (latency)     γ1 (SEX)           0.099     0.175   [-0.189;  0.388]
                     γ2 (TRT)          -0.154     0.177   [-0.444;  0.137]
                     γ3 (AGE)          -0.008     0.007   [-0.019;  0.004]

Note: The first column indicates the model part, the second and third columns the parameter and its estimate. The fourth column is the (posterior) SD of the estimate and the last column the confidence/credible interval.

It takes approximately 125 s to fit the model with smcure, while LPSMC takes only 0.5 s. This speedup factor of roughly 250 is mainly due to the bootstrap samples that smcure uses to estimate the variance of the estimated parameters. The LPSMC approach is completely sampling‐free and intrinsically allows fast computation of credible intervals for the latent variables. Results for smcure and for LPSMC/MLWG both show that the IFN treatment (TRT) has a significant and positive impact on the cure probability, while it has no significant impact in the latency part. In other words, accounting for AGE and SEX, receiving the IFN treatment decreases the probability of being susceptible but does not postpone relapse among the uncured. Similar findings are reported in Corbière and Joly and in Legrand and Bertrand. The e1684 dataset has also been analyzed by Lázaro et al. with the INLA methodology. For this dataset, they compared the runtime of INLA with a Weibull baseline hazard (via modal Gibbs running 500 iterations) to that of an MCMC algorithm implemented in JAGS. The authors reported computational times of 28 min for INLA and around 33 s per 1000 iterations for MCMC (55 min for three Markov chains with 100 000 iterations). The latter is substantially above the runtime measured with our MLWG sampler (around 3 s per 1000 iterations), which is approximately 10‐fold faster.
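Because the incidence part is a logistic regression, the Table 5 coefficients translate directly into odds ratios. The snippet below uses the LPSMC point estimate for TRT, under the usual (here assumed) coding TRT = 1 for the IFN arm:

```python
import math

# LPSMC incidence estimate for TRT from Table 5; TRT = 1 for IFN is assumed.
beta_trt = -0.567

# exp(beta) is the multiplicative change in the odds of being susceptible
# (uncured) for a treated versus an untreated patient, at fixed SEX and AGE.
odds_ratio = math.exp(beta_trt)
```

The odds ratio is about 0.57, in line with the conclusion that receiving IFN decreases the probability of being susceptible.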

Breast cancer data

The second dataset used to illustrate the LPSMC methodology concerns patients with breast cancer studied in Wang et al. Survival time (in days) is defined as distant‐metastasis‐free survival (DMFS), that is, the time to occurrence of distant metastases or death (whichever happens first), and the event of interest is distant metastasis. The data are obtained from the breastCancerVDX package on Bioconductor (https://bioconductor.org/). We consider two covariates entering simultaneously in the incidence and latency parts, namely the age of patients, ranging between 26 and 83 years with a median of 52 years, and the binary estrogen receptor (ER) status, defined by a threshold on the ER level in fmol per mg protein (77 patients in one category and 209 in the other). We use cubic B‐splines on the interval from zero to the largest follow‐up time. The Kaplan‐Meier estimate of the data given in Figure 5 emphasizes the existence of a plateau and motivates the use of a mixture cure model.
FIGURE 5

Kaplan‐Meier estimated survival for the breast cancer data. A cross indicates a censored observation

The estimated latent variables with LPSMC are reported in Table 6. AGE and ER have no significant effect on the probability of being uncured (incidence) based on the 90% credible intervals. However, these two covariates do significantly affect the survival of the uncured.
TABLE 6

Estimation results for the breast cancer data with LPSMC

Model        Parameter        Estimate     SD      CI 90%
(Incidence)  β0 (intercept)    -0.013     0.576   [-0.959;  0.934]
             β1 (AGE)          -0.012     0.010   [-0.028;  0.005]
             β2 (ER)            0.181     0.281   [-0.280;  0.643]
(Latency)    γ1 (AGE)          -0.016     0.008   [-0.030; -0.003]
             γ2 (ER)           -1.047     0.245   [-1.450; -0.645]

Note: The first column indicates the model part, the second and third columns the parameter and its posterior (mean) estimate. The fourth column is the posterior SD of the estimate and the last column the credible interval.

For instance, a subject who is susceptible to experiencing metastasis has, in one ER category, a risk of experiencing the event that is roughly exp(1.047) ≈ 2.85 times smaller than in the other category. In Figure 6, the estimated survival of the susceptible subjects (with associated credible intervals) is shown for two age categories (30 and 50 years). The same data analysis is conducted with the MLWG sampler (see supplementary materials) using a chain of length 20 000 and a burn‐in of 10 000. Trace plots of the regression coefficients (in the incidence and latency parts, respectively) and of the roughness penalty parameter are provided together with Geweke diagnostics.
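Under the Cox latency model, the ER coefficient in Table 6 converts into a hazard ratio for the uncured; the snippet below makes the factor explicit (the reference category of ER is not recoverable from the extracted text, so only the magnitude of the contrast is shown):

```python
import math

# LPSMC latency estimate for ER from Table 6.
gamma_er = -1.047

# In a proportional hazards latency model, exp(gamma) is the hazard ratio
# between the two ER categories at fixed AGE, for susceptible subjects.
hazard_ratio = math.exp(gamma_er)
fold_reduction = 1.0 / hazard_ratio  # roughly 2.85-fold smaller risk
```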
FIGURE 6

Estimated survival of the susceptibles for two age categories

ZNA COVID‐19 data

In a third application, we use LPSMC to investigate the impact of age on the survival of COVID‐19 patients. The data come from the Ziekenhuis Netwerk Antwerpen (ZNA, Belgium), a network of hospitals in the province of Antwerp. The considered dataset has 3258 patients entering hospital between March 2020 and April 2021. The follow‐up time is defined as the total number of days spent in hospital while receiving COVID‐19 care. The outcome of interest is in‐hospital death due to COVID‐19, as indicated by a binary variable (1 = dead; 0 = alive). Among the 3258 patients, 461 (14.15%) experienced in‐hospital death, and the remaining 2797 (85.85%) are censored due to causes unrelated to the outcome of interest (uninformative censoring). The Kaplan‐Meier curve given in Figure 7 highlights a wide plateau around 0.52, suggesting the existence of a cure fraction and hence motivating a cure model analysis with LPSMC. The age of patients is taken as a covariate in both the incidence and latency parts of the model. The mean age is 66 years, and the youngest and oldest patients are 1 and 103 year(s) old, respectively. For the LPSMC approach, we use B‐splines between 0 and the largest follow‐up time along with a third‐order penalty; a B‐spline basis is likewise specified for the MLWG sampler.
FIGURE 7

Kaplan‐Meier estimated survival for the ZNA data. A cross indicates a censored observation

Table 7 summarizes the estimation results for the model parameters with LPSMC and MLWG, respectively (chain length 15 000, burn‐in 8000). AGE has a positive and significant effect in the incidence part of the model; in other words, older patients have a larger probability of being uncured and hence a smaller probability of being cured of COVID‐19. The estimated cure proportion for different AGE categories is shown in Table 8. In the latency part, the AGE coefficient is positive but not significant at the 95% level, so there is no significant effect of AGE on the survival of uncured patients.
TABLE 7

Estimation results for the ZNA data with LPSMC and MLWG

Model              Parameter        Estimate     SD      CI 95%
LPSMC (incidence)  β0 (intercept)    -2.608     0.791   [-4.159; -1.057]
                   β1 (AGE)           0.031     0.010   [ 0.011;  0.051]
MLWG (incidence)   β0 (intercept)    -2.370     0.780   [-3.850; -0.757]
                   β1 (AGE)           0.029     0.010   [ 0.009;  0.049]
LPSMC (latency)    γ1 (AGE)           0.011     0.007   [-0.004;  0.025]
MLWG (latency)     γ1 (AGE)           0.012     0.007   [-0.001;  0.027]

Note: The first column indicates the model part, the second and third columns the parameter and its posterior (mean) estimate. The fourth column is the posterior SD of the estimate and the last column the credible interval.

TABLE 8

Estimation of the cure proportion for different age categories and associated approximate credible interval

AGE    1 − p̂(x)    CI 95%
20      0.879      [0.682; 0.958]
30      0.843      [0.658; 0.932]
50      0.742      [0.607; 0.837]
80      0.532      [0.466; 0.592]
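Assuming AGE enters the incidence part on its original (uncentered) scale, the cure proportions in Table 8 can be recovered from the LPSMC point estimates of Table 7 via the inverse logit link; the sketch below reproduces the point estimates to within rounding:

```python
import math

# LPSMC incidence estimates from Table 7 (ZNA data); AGE is assumed to enter
# the linear predictor on its original, uncentered scale.
beta0, beta1 = -2.608, 0.031

def cure_proportion(age):
    """Cure proportion 1 - p(age), with p the inverse logit of beta0 + beta1*age."""
    p_uncured = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * age)))
    return 1.0 - p_uncured

# Matches the Table 8 point estimates (0.879, 0.843, 0.742, 0.532) to rounding.
estimates = {age: cure_proportion(age) for age in (20, 30, 50, 80)}
```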

CONCLUSIONS

Approximate Bayesian inference methods are an interesting alternative to classic MCMC algorithms, especially when the latter require long computation times, as can be the case for models with complex likelihoods. In survival analysis, the mixture cure model is an interesting model class that allows for the existence of a cure fraction and hence goes beyond classic proportional hazards models, in which the feature of long‐term survivors is absent. In this article, we propose a new approach for fast Bayesian inference in the standard mixture cure model by combining the strengths of Laplace approximations to (conditional) posterior distributions of latent variables and of P‐splines for flexible modeling of smooth baseline quantities. The attractiveness of the LPSMC approach lies in its completely sampling‐free framework, with an analytically available gradient and Hessian of the log‐likelihood that make the approach extremely fast from a computational viewpoint. This computational advantage, together with the relatively straightforward derivation of (approximate) credible intervals for functions of latent variables, makes LPSMC a promising tool for fast analysis of survival data with a cure fraction. Moreover, the proposed MLWG sampler brings a further layer of added value by complementing the sampling‐free LPSMC approach with a fully MCMC‐based strategy for inference in the mixture cure model. An interesting extension for future research would be to generalize the LPSMC approach at the level of the incidence with alternative specifications to the standard logistic regression setting; for instance, LPSMC could be extended to the single‐index/Cox mixture cure model of Amico et al. Another research path is to further develop the routines of the mixcurelps package and devote an article exclusively to the main features of the software. Finally, one could attempt to extend Laplacian‐P‐splines to cure models with frailties or to competing risks survival data with a cure fraction.

CONFLICT OF INTEREST

The authors have no conflicts of interest to declare.

Data S1: Supplementary Materials.

Metropolis‐Langevin‐within‐Gibbs algorithm to explore the target posterior

1:  Set m = 0 and fix initial values ξ(0), λ(0), δ(0), and ϱ(0).
2:  for m = 1, …, M do
3:    (Metropolis‐adjusted Langevin step)
4:    Compute the Langevin diffusion: E(ξ(m−1)) = ξ(m−1) + 0.5 ϱ(m−1) ∇ log p(ξ(m−1) | λ(m−1), D).
5:    Generate the proposal: ξ(prop) ∼ N_dim(ξ)( E(ξ(m−1)), ϱ(m−1) I ).
6:    Compute the acceptance probability:
      π(ξ(m−1), ξ(prop)) = min{ 1, [ p(ξ(prop) | λ(m−1), D) q(ξ(prop), ξ(m−1)) ] / [ p(ξ(m−1) | λ(m−1), D) q(ξ(m−1), ξ(prop)) ] }.
7:    Draw u ∼ U(0, 1).
8:    If u ≤ π, set ξ(m) = ξ(prop) (accept); else set ξ(m) = ξ(m−1) (reject).
9:    (Gibbs step)
10:   Draw λ(m) ∼ G( 0.5 (K + φ), 0.5 (ξ(m)ᵀ P ξ(m) + φ δ(m−1)) ).
11:   Draw δ(m) ∼ G( 0.5 φ + a_δ, 0.5 φ λ(m) + b_δ ).
12:   (Update of the tuning parameter)
13:   Set ϱ(m) = h( ϱ(m−1) + m⁻¹ ( π(ξ(m−1), ξ(prop)) − 0.57 ) ), where h(·) keeps ϱ in an admissible range and 0.57 is the targeted acceptance rate.
14: end for
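The structure of the sampler can be illustrated on a toy univariate target. The sketch below (Python, with a toy Gaussian "likelihood", K = 1, P = 1, and arbitrary hyperparameter choices phi, a_delta, b_delta; it is not the paper's mixcurelps implementation) mimics the MALA step, the two gamma Gibbs updates, and the adaptive tuning toward a 0.57 acceptance rate:

```python
import math
import random

random.seed(0)

def log_post(xi, lam):
    """Toy conditional log-posterior for xi: zero-mean Gaussian with precision lam."""
    return -0.5 * lam * xi * xi

def grad_log_post(xi, lam):
    return -lam * xi

def mlwg(M=4000, phi=3.0, a_delta=1.0, b_delta=1.0):
    """Toy Metropolis-Langevin-within-Gibbs chain with K = 1 and P = 1."""
    xi, lam, delta, rho = 1.0, 1.0, 1.0, 0.5
    draws = []
    for m in range(1, M + 1):
        # MALA step: Langevin drift plus Gaussian noise of variance rho.
        mean_fwd = xi + 0.5 * rho * grad_log_post(xi, lam)
        prop = random.gauss(mean_fwd, math.sqrt(rho))
        mean_back = prop + 0.5 * rho * grad_log_post(prop, lam)
        log_q_fwd = -((prop - mean_fwd) ** 2) / (2.0 * rho)
        log_q_back = -((xi - mean_back) ** 2) / (2.0 * rho)
        log_alpha = (log_post(prop, lam) - log_post(xi, lam)
                     + log_q_back - log_q_fwd)
        accept_prob = math.exp(min(0.0, log_alpha))
        if random.random() <= accept_prob:
            xi = prop
        # Gibbs steps: gamma full conditionals for lam and delta
        # (random.gammavariate takes shape and SCALE, hence the reciprocals).
        lam = random.gammavariate(0.5 * (1 + phi),
                                  1.0 / (0.5 * (xi * xi + phi * delta)))
        delta = random.gammavariate(0.5 * phi + a_delta,
                                    1.0 / (0.5 * phi * lam + b_delta))
        # Adaptive tuning toward the 0.57 acceptance target, kept strictly positive.
        rho = max(1e-4, rho + (accept_prob - 0.57) / m)
        draws.append(xi)
    return draws
```

The 0.57 target is the classical asymptotically optimal acceptance rate for MALA, and the vanishing adaptation step 1/m leaves the stationary distribution asymptotically unaffected.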

REFERENCES

1. Sy JP, Taylor JM. Estimation in a Cox proportional hazards cure model. Biometrics. 2000.
2. Lambert P, Eilers PHC. Bayesian proportional hazards model with time-varying regression coefficients: a penalized Poisson regression approach. Stat Med. 2005.
3. Stangl DK, Greenhouse JB. Assessing placebo response using Bayesian hierarchical survival models. Lifetime Data Anal. 1998.
4. Farewell VT. The combined effect of breast cancer risk factors. Cancer. 1977.
5. Wang Y, Klijn JGM, Zhang Y, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005.
6. Whittemore AS, Keller JB. Survival estimation using splines. Biometrics. 1986.
7. Farewell VT. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 1982.
8. Martinez EZ, Achcar JA, Jácome AAA, Santos JS. Mixture and non-mixture cure fraction models based on the generalized modified Weibull distribution with an application to gastric cancer data. Comput Methods Programs Biomed. 2013.
9. Cai C, Zou Y, Peng Y, Zhang J. smcure: an R-package for estimating semiparametric mixture cure models. Comput Methods Programs Biomed. 2012.
10. Rosenberg PS. Hazard function estimation using B-splines. Biometrics. 1995.
RELATED PUBLICATIONS

1. Gressani O, Wallinga J, Althaus CL, Hens N, Faes C. EpiLPS: a fast and flexible Bayesian tool for estimation of the time-varying reproduction number. PLoS Comput Biol. 2022.
2. Gressani O, Faes C, Hens N. Laplacian-P-splines for Bayesian inference in the mixture cure model. Stat Med. 2022.
