
Bayesian survival analysis for early detection of treatment effects in phase 3 clinical trials.

Lucie Biard^1,2,3, Anne Bergeron^1,2,4, Vincent Lévy^1,5,6, Sylvie Chevret^1,2,3.

Abstract

Despite appealing characteristics for the clinical trial setting, Bayesian inference methods remain scarcely used, especially in randomized controlled clinical trials (RCTs). This is particularly true when dealing with a survival endpoint, likely due to the additional complexity of model specification. We propose to use Bayesian inference to estimate the treatment effect in this setting, using a proportional hazards (PH) model for right-censored data. Implementation of such an estimation process is illustrated on two working examples from cancer RCTs, the ALLOZITHRO and the CLL7-SA trials, both originally analyzed using a frequentist approach. In these two different settings, we show that Bayesian sequential analyses can provide early insight into the treatment effect in RCTs. Relying on posterior distributions and predictive posterior probabilities, we find that Bayesian sequential analyses of the ALLOZITHRO trial, which was terminated early due to an unanticipated deleterious effect of the intervention on survival, allow early quantification of a treatment effect opposite to the one expected. Then, incorporating historical data in the sequential analyses of the CLL7-SA trial would have yielded a treatment effect estimate closer to the protocol hypothesis. These post-hoc results give grounds to advocate a wider use of Bayesian approaches in RCTs, including those with right-censored endpoints, as informative decision tools.


Keywords: Bayesian inference; Censored data; Clinical trial; Historical data

Year: 2021 PMID: 33511301 PMCID: PMC7817368 DOI: 10.1016/j.conctc.2021.100709

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654

Introduction

Traditionally, randomized clinical trials (RCTs) are designed and analyzed from a frequentist perspective using classical hypothesis testing. However, there is a growing awareness of the usefulness of Bayesian methods in analyzing RCTs [1], [2], [3], [4]. Indeed, the Bayesian approach has a number of practical advantages over the conventional approach that could be exploited in RCTs: (1) it allows the explicit integration of previous knowledge with new empirical data; (2) it avoids the inevitable misinterpretations of P-values [5], [6]; (3) it replaces the misleading P-value with a summary statistic having a natural and clinically relevant interpretation: the probability that the study hypothesis is true conditional on the observations; and (4) it is tailored to the learning process: as information becomes available, one updates what one knows, which gives the Bayesian approach its flexibility and makes it ideal for clinical research. It is therefore particularly suited to sequential analyses of RCT data. Indeed, the Bayesian approach has gained popularity in medical, pharmaceutical, and social science research because it allows researchers to combine prior information with data to model data-generating processes; thus, it can incorporate previous knowledge on the likelihood of an event into the interpretation of trial results [3], [7]. When planning an RCT, previous data are often available, either in the control group (placebo, standard of care) or in the experimental group from slightly different populations (adults instead of children, animal studies, etc.). Accounting for such historical or external data can be part of the trial analysis, and it can directly influence the design of the trial itself, in choosing whether to include patients in a control arm or to use historical control data, or in estimating the required sample size [8], [9].
Nevertheless, Bayesian analyses are still mostly used in early-phase trials or for innovative adaptive designs, and are often restricted to continuous or binary endpoints [10]. We focused on phase 3 clinical trials with a survival outcome measure, a frequent setting in hemato-oncology, to illustrate, based on real data, two main uses of the Bayesian approach: as a simple tool for early stopping decisions, and for borrowing external data. Indeed, though these developments are not new, they are rarely used in practice, and Bayesian analyses of phase 3 trials, even recent ones, mostly use posterior densities of outcomes [11], or only historical control data [12]. More specifically, the aim of this paper was two-fold: (i) to assess how sequential Bayesian analyses may allow early decisions (termination) in the trial; (ii) to assess whether borrowing external data (for both the control and the experimental groups) in the analysis would allow optimizing the current trial design. The present work uses data from two example trials, both originally analyzed using a frequentist approach: (i) the ALLOZITHRO trial, which was terminated early due to an unanticipated deleterious effect of azithromycin compared with placebo, exemplified by an increased cause-specific hazard of hematological relapse (HR, 1.7; 95%CI, 1.2–2.4; P = 0.002) [13], and (ii) the CLL 2007 SA trial (CLL7-SA), which demonstrated a benefit in progression-free survival of rituximab (RTX) maintenance therapy over standard of care in elderly patients with previously untreated chronic lymphocytic leukemia (HR, 0.55; 95%CI, 0.40–0.75; P = 0.0002) [14]. The paper is organized as follows: Section 1 presents the motivating examples in detail; Section 2 introduces the notation (Section 2.1), the survival models used for the analyses (Section 2.2), and the methodological aspects specific to each example (Sections 2.3 and 2.4, respectively). Results are reported in Section 3, and Section 4 provides some discussion.

ALLOZITHRO trial: Bayesian sequential analyses

The ALLOZITHRO trial (NCT01959100) was a multicenter double-blind placebo-controlled randomized phase 3 trial, which aimed to evaluate the efficacy of azithromycin in the prevention of airflow decline in patients after hematopoietic stem cell transplant (HSCT) [13]. Bronchiolitis obliterans syndrome, which results in airflow decline and respiratory function impairment, is a known complication of HSCT, related to chronic graft-versus-host disease (GvHD). The trial randomized 465 patients, 231 in the azithromycin arm and 234 in the placebo arm, between February 2014 and August 2015. The primary endpoint was airflow-decline-free (AFD-free) survival at 24 months after randomization, defined as the time from randomization to decline in respiratory function or death from any cause. Respiratory function was assessed every 6 months by plethysmography (with pre- and post-bronchodilator spirometry). Observations were censored at the date of last follow-up in patients without events. Azithromycin was expected to have a protective effect on respiratory function and therefore to improve AFD-free survival, resulting in a postulated hazard ratio (HR) of 0.64 under the alternative, assuming a constant effect over follow-up (that is, proportional hazards (PH)). Unexpectedly, the trial was terminated prematurely on December 26, 2016, after the trial Data Safety Monitoring Board (DSMB) raised an alert about an imbalance in the number of hematological relapses across randomization arms [13]. At that time, enrollment was complete, but treatment and follow-up were still ongoing for 122 patients. The statistical analysis plan for the ALLOZITHRO trial relied on frequentist methods, without any planned interim analyses. In the present post-hoc re-analysis of the trial, we aimed to illustrate how Bayesian sequential analyses of AFD-free survival could have provided early information on the unexpected adverse outcome of this phase 3 trial.

CLL7-SA trial: Incorporating historical data

The CLL7-SA trial (NCT00645606) was a multicenter randomized open-label phase 3 trial conducted to evaluate the efficacy of 2-year rituximab (RTX) maintenance therapy in elderly patients with previously untreated chronic lymphocytic leukemia (CLL), compared with standard of care (SoC) observation (watchful waiting) [14]. The primary endpoint was progression-free survival (PFS). Assuming a 32% relative improvement with RTX in 36-month PFS (66% vs. 50%, that is, HR ≈ 0.60 under PH), a sample size of 161 events from 542 patients, accounting for a 25% drop-out rate during the induction part of the CLL therapy, was computed according to an O’Brien and Fleming group sequential design. One interim analysis was planned after 121 events (75% of 161) had been observed. The inclusion period started on June 10, 2008, and ended on August 14, 2014. Eventually, the interim analysis was performed after 150 events: the efficacy boundary was crossed and the trial was stopped. Overall, 409 patients were randomized, 202 to RTX maintenance and 207 to standard of care. At the time this trial was planned, in 2007, information on RTX maintenance therapy in CLL was scarce. Nevertheless, during the inclusion period, in December 2010, results of the PRIMA trial (NCT00140582) were published, demonstrating the benefit of 2-year RTX maintenance in patients with follicular lymphoma (FL) receiving an RTX plus chemotherapy regimen as first-line treatment (progression-free survival: HR = 0.55, 95% CI 0.44–0.68) [15]. Although FL is a different population from CLL, these hematologic malignancies share some characteristics and evolution profiles. Both are indolent B-cell lymphoid malignancies, which progress slowly, with acute phases. Both develop similar complications related to the immune system, notably infections. In both cases, no curative therapy was available at the time. Last, in both trials, eligible patients had to be treatment-naive.
In that sense, it appeared relevant to consider the FL population from the PRIMA trial as similar to, and exchangeable with, though not perfectly, the CLL population from the CLL7-SA trial. Therefore, results from the PRIMA trial on the effect of RTX in FL patients could provide clinically relevant information on the effect of RTX maintenance in CLL patients. Moreover, we also considered the acceptability of this historical dataset for combination with the current data [16]. Both trials were large European multicenter randomized phase 3 trials. More precisely, following Pocock’s criteria [16]: (i) treatment regimens were comparable in both the control and treatment groups between the two trials: both experimental arms consisted of RTX maintenance, with intravenous infusion every 8 weeks for 2 years, at close dosages (375 mg/m² in the historical trial and 500 mg/m² in the CLL7-SA trial), and both control arms consisted of SoC watchful observation; (ii) the historical study was recent compared with the present trial, with online publication available in December 2010, about 2.5 years after the start of inclusion in the CLL7-SA trial, and, as described above, the two treatment-naive populations appeared exchangeable; (iii) the main endpoint in our post-hoc analysis was progression-free survival, assessed using international standard criteria in both trials; (iv) differences between the two trials materialized in patient characteristics: the CLL7-SA trial focused on older patients (above 65 years old), but with good performance status and adequate renal and hepatic function for eligibility; (v) both studies were conducted in similar settings: they were both large multicenter 1:1 randomized open-label parallel controlled trials sponsored by French collaborative groups specific to the disease of interest, the Groupe d’Étude des Lymphomes de l’Adulte (GELA) for the PRIMA trial and the French Innovative Leukemia Organization (FILO) for the CLL7-SA trial, with participating centers belonging to these networks, respectively, and sharing common practice and standards of care; and (vi) there were no further indications to anticipate differing results between the two trials. In the present post-hoc re-analysis, we propose to use Bayesian methods to incorporate information from the PRIMA trial, as soon as it was published, into the analysis of the CLL7-SA trial, a phase 3 trial with a survival endpoint. In particular, we propose using the power prior approach, which allows leveraging the external data [17] while accounting for the disease heterogeneity between the current trial (CLL patients) and the external information (FL patients).

Methods

Both motivating examples used a survival endpoint for the primary efficacy assessment: airflow decline (AFD)-free survival and progression-free survival (PFS), respectively. We assumed proportional hazards to estimate the effect of treatment, following the original analyses of both trials. Sections 2.1 and 2.2 first present the notation and models used for the Bayesian analyses; specific aspects are developed thereafter, namely sequential Bayesian analyses in Section 2.3 and the incorporation of external individual survival data with the power prior approach in Section 2.4.

Notations

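Let $n$ be the number of observations in the dataset, and let $T_i = \min(X_i, C_i)$ and $\delta_i = \mathbb{1}\{X_i \le C_i\}$, $i = 1, \ldots, n$, denote the right-censored times and event indicators, with $X_i$ the failure time and $C_i$ the censoring time of individual $i$. Data are given by $D = \{(t_i, \delta_i, z_i), i = 1, \ldots, n\}$, where $t_i$ is the observation time and $z_i$ the indicator variable for randomization group ($z_i = 0$ if the patient is allocated to the control group and $z_i = 1$ for the intervention group). We consider the proportional hazards model:

$$h(t \mid z_i) = h_0(t)\exp(\beta z_i), \qquad (1)$$

or, equivalently, $S(t \mid z_i) = \{1 - F_0(t)\}^{\exp(\beta z_i)} = \exp\{-H_0(t)\exp(\beta z_i)\}$, where $F_0$ is the baseline distribution function of the right-censored time (time to airflow decline or death, whichever occurred first, and time to progression or death, for the ALLOZITHRO and CLL7-SA trials, respectively), $h_0$ is the baseline hazard function, $H_0(t) = \int_0^t h_0(u)\,du$ is the cumulative baseline hazard function, and $\beta$ is the log hazard ratio ($\beta = \log \mathrm{HR}$).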

Survival model

Using Bayesian inference, the above model (1) includes two parameters: the treatment effect ($\beta$) and the baseline survival (equivalently defined by $F_0$, $S_0 = 1 - F_0$, $h_0$, or $H_0$). Both must be assigned priors summarizing, before trial onset, all external evidence on the effect size and on the baseline survival, respectively. First, the log hazard ratio was assumed normally distributed, with mean $\mu_0$ and standard deviation $\sigma_0$ [18], $\beta \sim \mathcal{N}(\mu_0, \sigma_0^2)$, with the prior mean $\mu_0$ set at zero, resulting in a reference prior centered on the null hypothesis; such a skeptical prior may be thought of as a handicap that the trial data must overcome in order to provide convincing evidence of benefit [19]. The standard deviation $\sigma_0$ can initially be set at a large value to ensure a weakly informative prior on the treatment effect, and thus let the observations drive the analyses (which can be checked by comparing results with estimates from likelihood approaches). Then, for the baseline survival, many modeling options have been proposed in Bayesian proportional hazards survival models, either parametric (exponential or Weibull models) or nonparametric (using mixtures of Polya trees or transformed Bernstein polynomials, for instance) [20], [21], [22], [23]. We considered two approaches for the prior on the baseline hazard $h_0$: a parametric exponential distribution (constant baseline hazard $h_0(t) = \lambda$), or a piecewise constant hazard, sometimes referred to as the piecewise exponential model (PEM), to allow more flexibility in the baseline hazard. Of note, more complex piecewise hazard models have been proposed, including spline-based models [23], [24], [25]. In the exponential model, the prior for the constant baseline hazard was normal [26]. In the PEM, the baseline hazard takes a constant value in each time interval defined over the observation period:

$$h_0(t) = \lambda_j \quad \text{for } t \in I_j = (s_{j-1}, s_j], \quad j = 1, \ldots, J,$$

where $I_j$ is an interval resulting from the partition of the observation period into $J$ intervals, with $s_0 = 0$ and $s_J \ge \max_i t_i$.
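To make the exponential model concrete, the Python sketch below writes its right-censored log-likelihood, $\sum_i \{\delta_i(\log\lambda + \beta z_i) - \lambda e^{\beta z_i} t_i\}$, together with normal priors, and samples the posterior with a plain random-walk Metropolis algorithm. It is an illustrative stand-in, not the authors' Stan/HMC implementation: placing the normal prior on $\log\lambda$ rather than on $\lambda$, the prior standard deviations, the step size and the warm-up rule are all assumptions made for the example.

```python
import math
import random


def log_posterior(beta, log_lam, times, events, arms,
                  prior_mean=0.0, prior_sd=10.0, log_lam_sd=10.0):
    """Unnormalized log posterior of the exponential PH model.

    Hazard of patient i: lam * exp(beta * z_i), constant over time.
    Right-censored log-likelihood: sum_i [delta_i * log h_i - h_i * t_i].
    Assumed priors: beta ~ N(prior_mean, prior_sd^2), log(lam) ~ N(0, log_lam_sd^2).
    """
    loglik = 0.0
    for t, d, z in zip(times, events, arms):
        log_h = log_lam + beta * z
        loglik += d * log_h - math.exp(log_h) * t
    log_prior = (-0.5 * ((beta - prior_mean) / prior_sd) ** 2
                 - 0.5 * (log_lam / log_lam_sd) ** 2)
    return loglik + log_prior


def metropolis(times, events, arms, n_iter=6000, step=0.15, seed=1, **prior):
    """Random-walk Metropolis on (beta, log_lam); returns post-warm-up beta draws."""
    rng = random.Random(seed)
    beta = 0.0
    log_lam = math.log(sum(events) / sum(times))  # pooled event rate as starting value
    current = log_posterior(beta, log_lam, times, events, arms, **prior)
    draws = []
    for _ in range(n_iter):
        beta_new = beta + rng.gauss(0.0, step)
        log_lam_new = log_lam + rng.gauss(0.0, step)
        proposed = log_posterior(beta_new, log_lam_new, times, events, arms, **prior)
        # accept with probability min(1, exp(proposed - current)); 1 - random() is in (0, 1]
        if math.log(1.0 - rng.random()) < proposed - current:
            beta, log_lam, current = beta_new, log_lam_new, proposed
        draws.append(beta)
    return draws[n_iter // 2:]  # discard the first half as warm-up (assumption)
```

On data simulated with a true HR of 1.5, the mean of the returned draws should lie near log(1.5) ≈ 0.41; a real analysis would also run several chains and check their convergence, as the paper does with Stan.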
The time partition was defined following Murray et al.’s approach [27], with $J$ intervals, where $J$ is set according to $d$, the observed number of events in the current trial dataset, and the left bounds of the intervals correspond to percentiles of the event times in the current trial dataset. Specifically, since we assumed a time-invariant treatment effect, the count of events $d_{jz}$ in interval $I_j$ among patients of arm $z$ is, conditionally on treatment, Poisson-distributed: $d_{jz} \sim \mathrm{Poisson}\{\lambda_j \exp(\beta z) E_{jz}\}$, where $E_{jz}$ is the total time at risk accrued in $I_j$ by patients of arm $z$, so that $\lambda_j \exp(\beta z) E_{jz}$ is the rate for the whole time interval. A correlated random-walk process was used as the prior for the baseline hazard, $\log \lambda_j \mid \log \lambda_{j-1} \sim \mathcal{N}(\log \lambda_{j-1}, \tau^2)$ for $j = 2, \ldots, J$, with a vague normal prior on $\log \lambda_1$ and hyperparameters as specified in [9], [27]. We chose a random-walk process to allow smoothing of the baseline hazard function over time: the vague prior on the first interval initiates the random walk, and subsequent interval parameters are then shrunk toward the previous one. The models were estimated with Hamiltonian Monte Carlo (HMC) simulations in Stan, from the R statistical platform, using the rstan and rstanarm packages [26], [28]. We used 4 chains of 5000 post-warm-up iterations each, with a thinning of 5, yielding 4000 draws overall for the analyses. R code is available on GitHub at https://github.com/luciebiard/Bayesian_survival_analysis_phase_3_trials.
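The Poisson formulation of the PEM can be sketched in a few lines of Python: the helpers below place cut points at percentiles of the observed event times, accumulate the time at risk $E_{jz}$ and the event counts $d_{jz}$ per interval and arm, and evaluate the unnormalized log-posterior with a first-order random walk on $\log\lambda_j$. This is a sketch under stated assumptions: $J$ is passed in directly (Murray et al.'s rule for choosing $J$ is not reproduced), and the fixed random-walk standard deviation and vague prior scales are illustrative values, not those of the paper. The resulting function could be handed to any general-purpose MCMC sampler.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (0 <= q <= 1) of a sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def make_cutpoints(times, events, n_intervals):
    """Cut points 0 = s_0 < ... < s_J, inner bounds at the j/J percentiles of event times."""
    event_times = sorted(t for t, d in zip(times, events) if d)
    inner = [percentile(event_times, j / n_intervals) for j in range(1, n_intervals)]
    return [0.0] + inner + [max(times)]


def exposure_and_counts(times, events, arms, cuts):
    """Time at risk E[j][z] and event counts d[j][z] for intervals (s_{j-1}, s_j]."""
    n_int = len(cuts) - 1
    E = [[0.0, 0.0] for _ in range(n_int)]
    d = [[0, 0] for _ in range(n_int)]
    for t, ev, z in zip(times, events, arms):
        for j in range(n_int):
            lo, hi = cuts[j], cuts[j + 1]
            if t <= lo:
                break  # patient left the risk set before this interval
            E[j][z] += min(t, hi) - lo
            if t <= hi:  # observation ends within this interval
                d[j][z] += ev
                break
    return E, d


def pem_log_posterior(log_lams, beta, E, d, rw_sd=0.5, first_sd=10.0, beta_sd=10.0):
    """Poisson log-likelihood (constants dropped) plus random-walk prior on log lambda_j."""
    ll = 0.0
    for j, log_lam in enumerate(log_lams):
        for z in (0, 1):
            log_rate = log_lam + beta * z  # log of lambda_j * exp(beta * z)
            ll += d[j][z] * log_rate - math.exp(log_rate) * E[j][z]
    lp = -0.5 * (log_lams[0] / first_sd) ** 2  # vague prior starts the walk
    for j in range(1, len(log_lams)):
        lp -= 0.5 * ((log_lams[j] - log_lams[j - 1]) / rw_sd) ** 2  # shrink to previous
    lp -= 0.5 * (beta / beta_sd) ** 2  # reference prior on beta centered on 0
    return ll + lp
```

Because the likelihood only depends on the data through $E_{jz}$ and $d_{jz}$, the total exposure summed over intervals and arms always equals the sum of the observed times, which is a convenient sanity check.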

Motivating example 1: Sequential analyses

Given that the Bayesian approach provides a natural framework for sequential learning, the model described above was fitted to the current data sequentially every six months, monitoring the ALLOZITHRO trial on the basis of the posterior distribution of the log HR ($\beta$) of azithromycin on AFD-free survival. Two normal priors were used for $\beta$ [18]: (i) a reference prior, centered on $\mu_0 = 0$; and (ii) an enthusiastic or clinical prior, centered on $\mu_0 = \log(0.64)$. The latter was chosen consistently with the information available to the investigators at the time of trial planning, and used for sample size calculation [13]: it corresponds to the expected effect of azithromycin on 24-month AFD-free survival when the trial was planned [13]. Specifically, the anticipated AFD-free survival at 2 years was 45% in the control group, based on literature reporting the prevalence of AFD in allogeneic HSCT recipients [29], and on the French national registry (Agency for Biomedicine) for post-transplantation survival estimates (66% after one year and 54% after 2 years at the time of study planning). Moreover, a 15% absolute benefit with the experimental treatment on 2-year AFD-free survival (from 45% to 60%) was deemed clinically relevant, corresponding to a hazard ratio of 0.64. Regarding the variance, it has been shown that, for large balanced trials, the estimated log HR has an approximate variance of 4 divided by the observed number of events [18], [30]. We therefore set the prior standard deviation $\sigma_0$ to account for very limited pre-existing information on this effect [1], [18], [30], [31].
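The planning arithmetic above follows from the PH relation $S_1(t) = S_0(t)^{\mathrm{HR}}$, so that $\mathrm{HR} = \log S_1(t) / \log S_0(t)$ at a fixed time $t$, and from the large-sample approximation $\mathrm{Var}(\hat\beta) \approx 4/d$ for $d$ events in a balanced trial. A small Python check (the function names are ours, for illustration only):

```python
import math


def hr_from_survival(s_control, s_treated):
    """Under PH, S_1(t) = S_0(t) ** HR, hence HR = log S_1(t) / log S_0(t)."""
    return math.log(s_treated) / math.log(s_control)


def log_hr_sd(n_events):
    """Approximate SD of the estimated log HR in a large 1:1 trial: sqrt(4 / events)."""
    return math.sqrt(4.0 / n_events)


print(round(hr_from_survival(0.45, 0.60), 2))  # ALLOZITHRO planning -> 0.64
print(round(hr_from_survival(0.50, 0.66), 2))  # CLL7-SA planning -> 0.6
print(round(log_hr_sd(161), 3))                # 161 planned events -> 0.158
```

The same relation shows why a prior SD for $\beta$ can be read as a notional number of events: an SD of $s$ carries roughly the information of $4/s^2$ events.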

Motivating example 2: Incorporating individual external information

Incorporating historical data into a current analysis relies on the assumption that the different datasets are relevant to the population of interest. Depending on the assumption about homogeneity across datasets and populations (identity, exchangeability, bias, etc.), different modeling strategies are available [18]. In the present setting of the CLL7-SA trial, we wished to incorporate results from a single external phase 3 trial clinically relevant to the CLL7-SA objective but conducted in a slightly different population. To account for such external data in a discounted manner, we chose the power prior approach proposed by [17]. Although the true parameter $\beta$, modeling the effect of RTX, is assumed to be the same across datasets, the information borrowed from the historical external data is discounted relative to the current data in the estimation model. Briefly, the method is equivalent to shrinking the external sample size by a factor $a_0$ [8]. Let $D$ be the currently available data on the time-to-event endpoint according to the randomization arm, as defined in Section 2.1. Let $D_0 = (n_0, \mathbf{t}_0, \boldsymbol{\delta}_0, \mathbf{z}_0)$ be the available historical data, where $n_0$ denotes the sample size, $\mathbf{t}_0$ the $n_0$-vector of observed times, $\boldsymbol{\delta}_0$ the $n_0$-vector of censoring indicators, and $\mathbf{z}_0$ the $n_0$-vector of randomization arm indicators. Let $L(\theta \mid D_0)$ be the likelihood for a regression model of the endpoint as a function of $\mathbf{z}_0$, with $\theta$ the vector of model parameters. In the model described above (Section 2.2), $\theta$ is the vector made of $\beta$ and the baseline hazard parameters. Let $\pi_0(\theta \mid c_0)$ denote the joint density of the initial prior distribution for $\theta$. Given the historical data $D_0$, the power prior to be used for the current analysis is given by [32]:

$$\pi(\theta \mid D_0, a_0) \propto L(\theta \mid D_0)^{a_0}\, \pi_0(\theta \mid c_0),$$

where $c_0$ is a specified vector of hyperparameters for the initial prior $\pi_0$, and $a_0$ is a scalar parameter that represents the weight of the historical data in estimating the prior for $\theta$. It ranges from 0 (ignoring any information from dataset $D_0$) up to 1 (the historical data are pooled with the current data without discounting).
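When both likelihoods for $\beta$ are approximately normal, the power prior has a closed form that makes the "discounted sample size" interpretation explicit: raising $\mathcal{N}(\hat\beta_0, s_0^2)$ to the power $a_0$ gives a normal kernel with variance $s_0^2 / a_0$, i.e., the historical information is multiplied by $a_0$. The Python sketch below applies this normal approximation to $\beta$ alone (ignoring the baseline hazard), so it illustrates the mechanism rather than reproducing the individual-data power prior used in the paper; the inputs in the usage loop are hypothetical.

```python
import math


def power_prior_normal(beta_cur, se_cur, beta_hist, se_hist, a0,
                       prior_mean=0.0, prior_sd=10.0):
    """Posterior mean and SD of beta = log(HR) under normal approximations.

    The historical likelihood N(beta_hist, se_hist^2) raised to the power a0 is a
    normal kernel with variance se_hist^2 / a0, i.e. its information is scaled by a0.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    precision = 1.0 / prior_sd ** 2 + 1.0 / se_cur ** 2 + a0 / se_hist ** 2
    weighted = (prior_mean / prior_sd ** 2 + beta_cur / se_cur ** 2
                + a0 * beta_hist / se_hist ** 2)
    return weighted / precision, math.sqrt(1.0 / precision)


# Hypothetical inputs: current log HR -0.5 from 31 events, historical log HR -0.6
# from 300 events, with SE = sqrt(4 / events).
for a0 in (0.0, 0.25, 0.5, 1.0):
    mean, sd = power_prior_normal(-0.5, math.sqrt(4 / 31), -0.6, math.sqrt(4 / 300), a0)
    print(f"a0={a0:.2f}: posterior mean {mean:.3f}, SD {sd:.3f}")
```

As $a_0$ grows, the posterior SD shrinks and the mean moves toward the historical estimate, mirroring the qualitative pattern of a sensitivity analysis over $a_0$.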
Conditionally on the value of $a_0$, the posterior distribution for $\theta$ after observing the current data $D$ is given by:

$$\pi(\theta \mid D, D_0, a_0) \propto L(\theta \mid D)\, L(\theta \mid D_0)^{a_0}\, \pi_0(\theta \mid c_0).$$

The parameter $a_0$ can also be considered unknown. In that case, $a_0$ is estimated from the datasets (current and historical), and the model includes a hyperprior $\pi(a_0 \mid \gamma_0)$ for $a_0$, with hyperparameter $\gamma_0$. As explained by Ibrahim & Chen [32], it is reasonable to set a beta prior for $a_0$, $a_0 \sim \mathrm{Beta}(\gamma_1, \gamma_2)$ with $\gamma_0 = (\gamma_1, \gamma_2)$, although other choices such as truncated gamma or normal prior distributions are possible. When considering a beta prior for $a_0$, they argued that it is easier to elicit a mean $\mu_a$ and a standard deviation $\sigma_a$ for $a_0$ from physicians than to set the hyperparameter vector $\gamma_0$ directly; the hyperparameters can then be derived by back substitution using the moment equations of the beta distribution [32]:

$$\gamma_1 = \mu_a\left\{\frac{\mu_a(1 - \mu_a)}{\sigma_a^2} - 1\right\}, \qquad \gamma_2 = (1 - \mu_a)\left\{\frac{\mu_a(1 - \mu_a)}{\sigma_a^2} - 1\right\}.$$

Nevertheless, one should note that, in the case of a random $a_0$, the joint posterior distribution $\pi(\theta, a_0 \mid D, D_0)$ must include the normalizing constant of the power prior [33]:

$$\pi(\theta, a_0 \mid D, D_0) \propto L(\theta \mid D)\, \frac{L(\theta \mid D_0)^{a_0}\, \pi_0(\theta \mid c_0)}{\int L(\theta \mid D_0)^{a_0}\, \pi_0(\theta \mid c_0)\, d\theta}\, \pi(a_0 \mid \gamma_0).$$

In the present study, we chose to fix $a_0$ rather than treat it as random, given that a single historical dataset was used to enrich the CLL7-SA analysis, which provides rather limited information to estimate between-trial heterogeneity via $a_0$. Nevertheless, sensitivity analyses with several values of $a_0$ (1, 0.75, 0.5, 0.25, 0.10, 0) were performed to assess the robustness of the results to the choice of $a_0$, that is, to the influence of the historical data on the estimation. In this example, we used a weakly informative reference prior as the initial prior $\pi_0(\theta \mid c_0)$ for incorporating the historical PRIMA data.
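The back-substitution formulas for the beta hyperparameters are easy to check numerically. A minimal Python sketch (the function name is ours), which also rejects an elicited standard deviation that no beta distribution can reach, since a beta variable must satisfy $\sigma_a^2 < \mu_a(1 - \mu_a)$:

```python
def beta_from_mean_sd(mean, sd):
    """Beta(g1, g2) hyperparameters for a0 from an elicited mean and SD (moment matching)."""
    var = sd ** 2
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < sd^2 < mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0  # equals g1 + g2
    return mean * total, (1.0 - mean) * total
```

For example, an elicited mean of 0.3 and SD of 0.1 for $a_0$ gives approximately Beta(6, 14).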

Results

ALLOZITHRO example

We performed post-hoc sequential analyses of AFD-free survival, estimated on all available data truncated every 6 months starting from the beginning of enrollment in the trial. Given the time-to-event endpoint, we applied non-informative administrative right-censoring at the sequential cut-off dates. Table 1 reports the timepoints and the corresponding available data for each resulting interim analysis. Inclusions were completed by 15 August 2015. In December 2016, when the intervention was terminated early, 218 patients had been randomized less than 24 months earlier (the primary endpoint observation window), of whom 96 had already experienced a primary event and 122 had not.
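Administrative censoring at a cut-off date can be sketched as follows in Python. The record layout (randomization date, end date, event flag, arm) is a hypothetical convention chosen for illustration: patients randomized on or after the cut-off are excluded, and any event or follow-up recorded after the cut-off is treated as not yet observed, i.e., censored at the cut-off.

```python
from datetime import date


def censor_at_cutoff(patients, cutoff):
    """Right-censored (time in days, event, arm) tuples as known at a cut-off date.

    Each patient is a dict with hypothetical keys: 'randomized' (date), 'end'
    (date of event or of last follow-up), 'event' (bool) and 'arm' (0/1).
    """
    data = []
    for p in patients:
        if p["randomized"] >= cutoff:
            continue  # not yet randomized at this interim analysis
        if p["end"] <= cutoff:
            end, event = p["end"], p["event"]
        else:
            end, event = cutoff, False  # anything after the cut-off is not yet observed
        data.append(((end - p["randomized"]).days, int(event), p["arm"]))
    return data
```

Calling this function once per cut-off date of Table 1 (e.g., `date(2014, 8, 13)`, `date(2015, 2, 13)`, and so on) yields the successive datasets on which the model is refitted.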

Table 1

ALLOZITHRO example: Sequential timepoints and corresponding samples.

Interim	Date cut-off	No. of inclusions (placebo:azithromycin)	No. of completed follow-ups (placebo:azithromycin)	No. of events (placebo:azithromycin)
1	August 13, 2014	70:65	3:3	3:3
2	February 13, 2015	149:143	20:23	20:23
3	August 13, 2015	231:234	53:57	53:57
4	February 13, 2016	231:234	70:94	70:94
5	August 13, 2016	231:234	113:138	90:118
6	February 13, 2017	231:234	154:162	110:131

At the first (August 2014) and second (February 2015) interim timepoints, few data on the primary endpoint were available: only 6 and 43 patients had experienced an event at these cut-off dates, respectively (Table 1). For the purpose of the present re-analyses, due to the small number of events, the parametric exponential model was used for these timepoints. The 4 remaining analyses used the more flexible PEM described in Section 2.2. Posterior estimates of the log HR with the reference and enthusiastic priors are reported in Table 2. From the second interim analysis onwards (February 2015), the mean and median posterior estimates were consistently above 0, whatever the prior. Moreover, the 10th percentile of the posterior distribution was consistently above 0 from the fourth analysis onwards (February 2016), pointing toward a probable increased risk of event in patients treated with azithromycin. In other words, from this date on, the posterior probability that azithromycin was beneficial in terms of AFD-free survival was at most about 5% (with the lower bound of the 95% credibility interval close to 0); conversely, by analogy with a one-sided hypothesis, the posterior probability that azithromycin was harmful, Pr(HR > 1), was about 95% or higher (0.948 to 0.989 across priors and analyses in Table 2).
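The summaries reported in Tables 2 and 3 (mean, median, 95% credibility interval, 10th or 90th percentile, and posterior probabilities such as Pr(HR > 1) or Pr(HR < 0.6)) are simple functionals of the posterior draws of $\beta$. A Python sketch, assuming a list of draws from any sampler and linear-interpolation percentiles (other percentile definitions give slightly different values):

```python
import math


def _quantile(sorted_draws, q):
    """Linear-interpolation quantile (0 <= q <= 1) of sorted draws."""
    pos = q * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])


def summarize_log_hr(draws, hr_cut=0.6):
    """Posterior summaries of beta = log(HR) in the style of Tables 2 and 3."""
    s = sorted(draws)
    n = len(s)
    return {
        "mean": sum(s) / n,
        "median": _quantile(s, 0.5),
        "cri95": (_quantile(s, 0.025), _quantile(s, 0.975)),
        "p10": _quantile(s, 0.10),
        "p90": _quantile(s, 0.90),
        "pr_hr_gt_1": sum(b > 0.0 for b in s) / n,                 # Pr(HR > 1)
        "pr_hr_lt_cut": sum(b < math.log(hr_cut) for b in s) / n,  # e.g. Pr(HR < 0.6)
    }
```

Working on the log scale keeps the comparison thresholds simple: HR > 1 becomes $\beta > 0$ and HR < 0.6 becomes $\beta < \log 0.6$.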

Table 2

ALLOZITHRO example: Sequential posterior estimates of the log hazard ratio (log HR) of azithromycin compared with placebo on AFD-free survival, with either the reference prior (Ref.), centered on 0, or the enthusiastic clinical prior (Enthu.), centered on log(0.64).

Interim	Date	Prior	Mean log(HR)	Median log(HR)	95% CrI	10th percentile	Pr(HR>1)
1	Aug 2014	Ref.	−0.019	−0.027	−0.974; 0.946	−0.642	0.483
		Enthu.	−0.285	−0.288	−1.290; 0.715	−0.938	0.288
2	Feb 2015	Ref.	0.138	0.141	−0.392; 0.679	−0.219	0.688
		Enthu.	0.046	0.048	−0.504; 0.572	−0.302	0.560
3	Aug 2015	Ref.	0.118	0.117	−0.243; 0.482	−0.121	0.726
		Enthu.	0.077	0.074	−0.278; 0.436	−0.152	0.662
4	Feb 2016	Ref.	0.317	0.317	0.025; 0.615	0.124	0.982
		Enthu.	0.290	0.290	−0.002; 0.581	0.092	0.974
5	Aug 2016	Ref.	0.318	0.319	0.047; 0.591	0.140	0.989
		Enthu.	0.292	0.292	0.018; 0.560	0.118	0.983
6	Feb 2017	Ref.	0.229	0.228	−0.022; 0.475	0.069	0.964
		Enthu.	0.207	0.207	−0.039; 0.460	0.042	0.948


CLL7-SA example

We examined how the published results of the PRIMA trial could have been used, immediately after their publication in December 2010, to provide information on the effect of RTX in the CLL7-SA trial. More specifically, we aimed to illustrate how this information might have been incorporated in an interim Bayesian analysis. At that time, 216 patients had been randomized, with a median follow-up of 19.1 months, and 20 and 11 failures had been observed in the SoC and RTX arms, respectively. Kaplan–Meier estimates of 24-month PFS were 75% (95%CI 65;86) in the SoC group and 87% (78;96) in the RTX group. Since we did not have access to the original individual data from the PRIMA trial, we used observations reconstructed from the initial publication of the PRIMA trial. Specifically, based on the published Kaplan–Meier curves and numbers of patients at risk, using DigitizeIt software and iterative numerical methods solving the inverted Kaplan–Meier equations, we obtained a reconstructed dataset mimicking the PRIMA trial results [34]. Briefly, for each time interval reported in the publication, the algorithm combines the published numbers of patients at risk, the Kaplan–Meier curve coordinates, and iterative calculations using the Kaplan–Meier estimator. We refer the reader to the original publication for a detailed presentation of this algorithm and the corresponding R code [34]. The reconstructed data were consistent with the published results, yielding HR = 0.54 (95%CI 0.44;0.68) versus HR = 0.55 (95%CI 0.44;0.68) in the original publication [15]. Fig. 1 presents the Kaplan–Meier estimates of progression-free survival in patients with RTX maintenance or SoC observation from the December 2010 interim CLL7-SA data, with the reconstructed PRIMA estimates superimposed.
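Both the reported 24-month PFS estimates and the reconstruction algorithm of [34] rest on the Kaplan–Meier product-limit estimator, $\hat S(t) = \prod_{t_k \le t} (1 - d_k / n_k)$, with $d_k$ events among $n_k$ patients at risk at each distinct event time $t_k$. A minimal Python sketch of the estimator itself (it does not reproduce the inversion algorithm of [34]):

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings at t leave the risk set after t
    return curve


def survival_at(curve, t):
    """Step-function lookup of S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

Patients censored at an event time are kept in the risk set at that time, which is the usual convention for the product-limit estimator.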

Fig. 1

Results are reported in Table 3. Combined with a weakly informative prior on the effect of RTX (that is, without historical data, $a_0 = 0$; see Section 2.4), the data available in January 2011 pointed toward a reduced hazard of event with RTX maintenance compared with SoC watchful waiting, with a posterior mean log HR of −0.562 and a 90th percentile of −0.094. Incorporating information from the PRIMA trial to enrich the prior for the analysis of the current data resulted in a posterior probability greater than 97.5% that the log HR was lower than 0, even when largely down-weighting the external data ($a_0 = 0.10$). The upper bound of the 95% credibility interval remained consistently lower than 0 as soon as some degree of historical data was incorporated, corresponding to a posterior probability lower than 2.5% that RTX was harmful. Considering an equivalence region, for instance , there was a posterior probability lower than 1% that RTX was equivalent to SoC, with (7% with ). Furthermore, the posterior probability that the HR was lower than 0.6, the desired efficacy level used in the frequentist trial planning, increased from 0.647 to 0.825 as the power parameter increased from 0.10 to 1 (0.546 without historical data).

Table 3

CLL example: posterior distribution of the effect of RTX on PFS (log HR) compared with SoC at the December 2010 interim analysis of the CLL7-SA trial, with reconstructed data from the PRIMA trial results as historical prior information, using the power prior approach with a fixed power parameter $a_0$ ranging from 0 (equivalent to not including historical data) to 1 (equivalent to pooling historical and current data).

Power parameter	Mean log(HR)	Median log(HR)	95% CrI	90th percentile	Pr(HR<0.6)
a0 = 0	−0.562	−0.562	−1.322; 0.151	−0.094	0.546
a0 = 0.10	−0.586	−0.586	−0.996; −0.191	−0.320	0.647
a0 = 0.25	−0.590	−0.590	−0.979; −0.221	−0.344	0.665
a0 = 0.50	−0.604	−0.604	−0.883; −0.320	−0.421	0.741
a0 = 0.75	−0.604	−0.604	−0.848; −0.362	−0.441	0.774
a0 = 1	−0.608	−0.608	−0.815; −0.405	−0.472	0.825

Fig. 1. CLL7-SA example: Kaplan–Meier estimates of progression-free survival with interim CLL7-SA data censored on 31 December 2010 (SoC observation: solid black line; RTX maintenance: dashed black line) and reconstructed data from the PRIMA trial published in January 2011 (SoC observation: solid gray line; RTX maintenance: dashed gray line).

Discussion

In this article, we presented how Bayesian inference may inform on time-to-event endpoints in phase 3 clinical trials. Compared with the frequentist approach, which remains widely used in large confirmatory trials, Bayesian methods have been advocated for the straightforward and intuitive interpretation of their results, in the form of posterior estimates of the treatment effect, and for their flexibility, in particular for the design of complex trials such as adaptive designs and the borrowing of external data [2], [3], [4], [7]. Nevertheless, their use requires precautions, as set out in guidelines and guidance issued by regulatory agencies to prevent misuse and erroneous conclusions [35]. We used two different RCTs to illustrate several advantages of Bayesian approaches in specific but not uncommon settings. First, the ALLOZITHRO trial illustrates the contribution of Bayesian inference to sequential analyses of a right-censored endpoint, which provide a flexible framework to detect early departures of the treatment effect from the expected direction. Using different prior distributions for the log HR, we found consistent results indicating a deleterious outcome for treated patients more than 6 months before the trial was stopped. Second, the CLL7-SA trial illustrated how the power prior method, borrowing external information, can increase the information on a right-censored endpoint. In both examples, implementing decision rules based on these Bayesian analyses might have had a direct clinical benefit for patients by shortening the trial duration: by discontinuing treatment earlier and preventing prolonged exposure, and by concluding on efficacy earlier and accelerating access to the drug, respectively. We chose proportional hazards (PH) models, which were used in the original analyses of both trials and are the most common in large clinical trials with survival endpoints.
The piecewise constant baseline hazard allows more flexibility than the constant-hazard exponential model, though it may fail to adequately fit the data in sparse settings. Indeed, the model relies on a partition of the time scale into intervals based on the distribution of failure times. At early interim analyses, there might be a limited number of observed failures, which may result in problematic estimation. For instance, in our examples, there were 6 and 43 events at the first two interim analyses of the ALLOZITHRO trial. Furthermore, in the CLL7-SA trial there were 31 observed events, and the incorporation of external individual data did not allow convergence to be reached. Using the partition rule proposed by Murray et al. [27], the PEM for these analyses could not be estimated without convergence issues using the Hamiltonian Monte Carlo (HMC) algorithm; this explains why we used the exponential model, with a one-parameter baseline hazard. The model could nevertheless be estimated with another MCMC sampler, namely a random-walk Metropolis algorithm (using the MCMCpack R library [36]), yielding results consistent with those of the exponential model. This limitation could also be tempered by planning the interim analyses according to the expected event rate. Whether reparameterization of the model or other time partition rules could allow convergence of the PEM in sparse settings requires further investigation. Note that the PEM random-walk model can be implemented in any Bayesian software, on various platforms; as mentioned above, in the present work we used Stan via the rstan package on the R platform. Several tools are available for convergence diagnosis of models estimated with Stan HMC (e.g., the shinystan package [37]). In our setting of time-to-event endpoints, we specifically implemented predicted survival for model checking (see Supplementary material).
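Convergence problems such as those described above are usually flagged by the potential scale reduction factor $\hat R$, which compares between-chain and within-chain variability. The Python sketch below computes the classic split-$\hat R$ of Gelman et al.; recent Stan versions report a rank-normalized variant, so values may differ slightly from those displayed by rstan or shinystan. A common rule of thumb (not taken from the paper) is to distrust results when $\hat R$ exceeds about 1.01 to 1.1.

```python
import math


def split_rhat(chains):
    """Classic split-R-hat for equal-length chains of scalar draws (>= 4 draws each)."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves += [chain[:half], chain[half:2 * half]]  # drop the last draw if odd
    m, n = len(halves), len(halves[0])
    means = [sum(h) / n for h in halves]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in h) / (n - 1)
                 for h, mu in zip(halves, means)) / m
    var_plus = (n - 1) / n * within + between / n  # pooled posterior variance estimate
    return math.sqrt(var_plus / within)
```

Splitting each chain in two also detects a single chain that drifts over time, not only chains stuck in different regions.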
More complex survival models, adapted to specific situations such as non-proportional hazards, time-dependent treatment effects, or interval censoring, are available and could be used for these Bayesian analyses [38], [39], [40]. Of note, when the PH assumption does not hold, [41] proposed combining current and external data with Bayesian methods to make inference on the restricted mean survival time. In the second example, we illustrated how external information can be borrowed to enrich the current data. To that end, various Bayesian approaches have been proposed, which differ mainly in their assumptions about the relevance and exchangeability of the external data with respect to the current trial [18], [42], [43]. We chose the power prior approach to downweight the individual data reconstructed from the published results of the PRIMA trial, to account for a related but different disease population (chronic lymphocytic leukemia versus follicular lymphoma). Alternatively, we could have used an informative Gaussian prior for the treatment effect derived from the PRIMA results, rather than from the reconstructed individual data, and discounted this external information by increasing its prior variance [3]. When several external sources are available, more complex models with hyperparameters for the between-source variability can be considered, such as Bayesian hierarchical models, meta-analytic approaches, or power priors with a random power parameter [8], [9], [41], [44]. We presented post-hoc analyses of two trials to advocate the use of Bayesian methods in phase 3 trials with survival endpoints. Bayesian posterior estimates are particularly well suited to decision rules, and posterior predictive estimates can likewise be used to base decision rules on predictions of interest [45].
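For an exponential hazard, the power prior with a fixed weight a0 has a closed form, which may help show how a0 discounts external data. The sketch below assumes exponential survival per arm, an initial vague Gamma(a, b) prior, and summary-level historical data (events and time at risk) rather than the reconstructed individual PRIMA data. Raising the historical likelihood to the power a0 makes the external trial count as a0*d0 events over a0*T0 time units. All names (`ArmSummary`, `power_prior_posterior`, `prob_benefit`) and numbers are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmSummary:
    events: int      # number of events d
    exposure: float  # total time at risk T


def power_prior_posterior(current: ArmSummary, historical: ArmSummary, a0: float,
                          a: float = 0.01, b: float = 0.01):
    """Gamma (shape, rate) posterior of an exponential hazard under a fixed power prior.

    The historical likelihood lambda^d0 * exp(-lambda * T0) raised to a0 equals
    lambda^(a0*d0) * exp(-lambda * a0*T0); combined with an initial Gamma(a, b)
    prior and the current likelihood, the posterior stays in the gamma family.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    shape = a + a0 * historical.events + current.events
    rate = b + a0 * historical.exposure + current.exposure
    return shape, rate


def prob_benefit(post_t, post_c, n_draws=20_000, seed=3):
    """Monte Carlo P(HR < 1 | data), HR = lambda_treated / lambda_control."""
    rng = random.Random(seed)
    # random.gammavariate expects (shape, scale), and scale = 1 / rate
    wins = sum(
        rng.gammavariate(post_t[0], 1.0 / post_t[1])
        < rng.gammavariate(post_c[0], 1.0 / post_c[1])
        for _ in range(n_draws)
    )
    return wins / n_draws


# Hypothetical summaries (NOT the CLL7-SA or PRIMA data)
cur_t, cur_c = ArmSummary(12, 1500.0), ArmSummary(19, 1450.0)
hist_t, hist_c = ArmSummary(80, 12000.0), ArmSummary(120, 11500.0)
for a0 in (0.0, 0.25, 0.5, 1.0):
    post_t = power_prior_posterior(cur_t, hist_t, a0)
    post_c = power_prior_posterior(cur_c, hist_c, a0)
    print(f"a0 = {a0:.2f}: P(HR < 1 | data) = {prob_benefit(post_t, post_c):.3f}")
```

Setting a0 = 0 ignores the external data and a0 = 1 pools them with the current trial; intermediate values interpolate between these two extremes.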
To implement these tools in practice and comply with regulatory requirements, a formal assessment of the resulting operating characteristics, analogous to the sample size calculation in the frequentist setting, may be necessary; guidance to prevent misuse and erroneous conclusions has been issued to this end [35], [46], [47]. Indeed, efficacy rules based on lenient thresholds, for instance on the posterior probability that the HR is greater than 1, often result in unacceptable type I error rates. Finally, methods based on predictive probabilities have been developed to estimate the probability of success of a trial, not only at the planning stage but also sequentially during the trial, using both current and external information [31], [48]; they could be applied in this setting. In summary, through two examples, we illustrated the informativeness of Bayesian methods in the sequential analysis of RCTs with right-censored endpoints. We showed that the Bayesian approach can be applied to proportional hazards survival models with estimation tools available in standard software, and that it should not be restricted to binary endpoints. Furthermore, we illustrated two aspects of Bayesian methods for phase 3 clinical trials, namely flexible sequential analyses and the incorporation of external or historical data. Overall, Bayesian methods provide a straightforward interpretation of results that accounts for uncertainty, and they allow information to be borrowed, summarizing all the evidence available at the current time.
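The type I error concern can be checked by simulation before a trial starts. The sketch below, under stated assumptions, estimates how often trials with no treatment effect would declare efficacy at any of four interim looks when the rule is P(HR < 1 | data) > threshold, with HR defined as the treated-to-control hazard ratio (this orientation is an assumption). To keep it fast, it replaces the full Bayesian PH fit by a large-sample normal approximation to the posterior of the log HR under a flat prior, and it assumes exponential event times, simultaneous entry at time 0, and administrative censoring at each look. All function names and parameter values are hypothetical.

```python
import math
import random


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_benefit_approx(d_t, T_t, d_c, T_c):
    """Approximate P(HR < 1 | data), HR = treated / control hazard ratio.

    Normal approximation to the posterior of log HR under a flat prior:
    mean log((d_t/T_t) / (d_c/T_c)), variance 1/d_t + 1/d_c.
    """
    log_hr = math.log((d_t / T_t) / (d_c / T_c))
    se = math.sqrt(1.0 / d_t + 1.0 / d_c)
    return normal_cdf(-log_hr / se)


def type1_error(threshold, n_per_arm=200, hazard=0.05, look_times=(6, 12, 18, 24),
                n_sims=1_000, seed=4):
    """Share of null trials (equal hazards) declaring efficacy at any look.

    Efficacy is declared when P(HR < 1 | data) > threshold at an interim look.
    All patients are assumed to enter at time 0 (no staggered accrual).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # arms[0] = control, arms[1] = treated; same hazard under the null
        arms = [[rng.expovariate(hazard) for _ in range(n_per_arm)] for _ in range(2)]
        for tau in look_times:
            # Events and exposure per arm with administrative censoring at tau
            d = [sum(t <= tau for t in arm) for arm in arms]
            T = [sum(min(t, tau) for t in arm) for arm in arms]
            if min(d) == 0:
                continue  # the approximation is undefined without events in both arms
            if prob_benefit_approx(d[1], T[1], d[0], T[0]) > threshold:
                rejections += 1
                break
    return rejections / n_sims


for c in (0.9, 0.975, 0.99):
    print(f"threshold {c}: estimated type I error {type1_error(c):.3f}")
```

Because every call with the same seed simulates the same trials, estimates for different thresholds are directly comparable; a stricter cutoff, or one adjusted across looks, can then be chosen to bring the overall rate to an acceptable level.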

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Seamlessly expanding a randomized phase II trial to phase III.

Authors: Lurdes Y T Inoue; Peter F Thall; Donald A Berry
Journal: Biometrics Date: 2002-12

2. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials.

Authors: Duminda N Wijeysundera; Peter C Austin; Janet E Hux; W Scott Beattie; Andreas Laupacis
Journal: J Clin Epidemiol Date: 2008-10-23

3. A note on the power prior.

Authors: Beat Neuenschwander; Michael Branson; David J Spiegelhalter
Journal: Stat Med Date: 2009-12-10

4. The combination of randomized and historical controls in clinical trials.

Authors: S J Pocock
Journal: J Chronic Dis Date: 1976-03

5. Scientists rise up against statistical significance.

Authors: Valentin Amrhein; Sander Greenland; Blake McShane
Journal: Nature Date: 2019-03

6. Rituximab maintenance versus observation following abbreviated induction with chemoimmunotherapy in elderly patients with previously untreated chronic lymphocytic leukaemia (CLL 2007 SA): an open-label, randomised phase 3 study.

Authors: Caroline Dartigeas; Eric Van Den Neste; Julie Léger; Hervé Maisonneuve; Christian Berthou; Marie-Sarah Dilhuydy; Sophie De Guibert; Stéphane Leprêtre; Marie C Béné; Florence Nguyen-Khac; Rémi Letestu; Florence Cymbalista; Philippe Rodon; Thérèse Aurran-Schleinitz; Jean-Pierre Vilque; Olivier Tournilhac; Béatrice Mahé; Kamel Laribi; Anne-Sophie Michallet; Alain Delmer; Pierre Feugier; Vincent Lévy; Roselyne Delépine; Philippe Colombat; Véronique Leblond
Journal: Lancet Haematol Date: 2017-12-20

7. Airflow obstruction after myeloablative allogeneic hematopoietic stem cell transplantation.

Authors: Jason W Chien; Paul J Martin; Ted A Gooley; Mary E Flowers; Susan R Heckbert; W Garrett Nichols; Joan G Clark
Journal: Am J Respir Crit Care Med Date: 2003-03-20

8. Semiparametric Bayesian commensurate survival model for post-market medical device surveillance with non-exchangeable historical data.

Authors: Thomas A Murray; Brian P Hobbs; Theodore C Lystig; Bradley P Carlin
Journal: Biometrics Date: 2013-12-05

9. The power prior: theory and applications.

Authors: Joseph G Ibrahim; Ming-Hui Chen; Yeongjin Gwon; Fang Chen
Journal: Stat Med Date: 2015-09-07

10. Flexible Bayesian survival modeling with semiparametric time-dependent and shape-restricted covariate effects.

Authors: Thomas A Murray; Brian P Hobbs; Daniel J Sargent; Bradley P Carlin
Journal: Bayesian Anal Date: 2015-05-14