Literature DB >> 28612371

Estimation in multi-arm two-stage trials with treatment selection and time-to-event endpoint.

Matthias Brückner¹, Andrew Titman¹, Thomas Jaki¹.

Abstract

We consider estimation of treatment effects in two-stage adaptive multi-arm trials with a common control. The best treatment is selected at interim, and the primary endpoint is modeled via a Cox proportional hazards model. The maximum partial-likelihood estimator of the log hazard ratio of the selected treatment will overestimate the true treatment effect in this case. Several methods for reducing the selection bias have been proposed for normal endpoints, including an iterative method based on the estimated conditional selection biases and a shrinkage approach based on empirical Bayes theory. We adapt these methods to time-to-event data and compare the bias and mean squared error of all methods in an extensive simulation study and apply the proposed methods to reconstructed data from the FOCUS trial. We find that all methods tend to overcorrect the bias, and only the shrinkage methods can reduce the mean squared error.

Entities: Chemical Disease Gene Species

Keywords: adaptive design; bias; empirical Bayes; multi-arm trial; time-to-event

Mesh：

Year: 2017 PMID： 28612371 PMCID： PMC5575545 DOI： 10.1002/sim.7367

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.373

Introduction

Multi‐arm trials, which compare multiple treatment arms to a common control group, are more efficient than traditional two‐arm designs as only a single control group is used 1. Additional efficiency gains can be achieved by selecting treatments at one or more interim analyses 2. Most of the recent research in this area focuses on how such studies are designed for a normally distributed endpoint (e.g., 3, 4, 5, 6). Approaches that are applicable to non‐normal and particularly time‐to‐event endpoints are less frequent in contrast (e.g., 7, 8, 9, 10), although many examples of multi‐arm trials with survival endpoints, such as the three‐arm SANAD trial in generalized and unclassifiable epilepsy 11, the four‐arm STAMPEDE trial in prostate cancer 12, or the five‐arm FOCUS trial in poor prognosis advanced colorectal cancer 13, exist. Adaptive treatment selection at an interim analysis increases the chance of a successful trial and avoids wasting resources on unpromising treatments 1. It is well known, however, that selecting promising treatments based on the observed data leads to overestimation of the treatment effect in the selected treatment arms and an underestimation of the treatment effect in the dropped treatment arms 14. A variety of different strategies to remove or reduce the bias has been developed for normal data. The uniformly minimum variance conditionally unbiased estimator (UMVCUE) in a two‐stage design when selecting the best treatment was first derived by Cohen and Sackrowitz 15 and extended to more general selection rules in 16. A simple iterative method for calculating a bias‐corrected estimate based on the estimated conditional selection bias was proposed by Stallard and Todd 17. An extension of the shrinkage estimator of Hwang 18 to two‐stage adaptive designs has been proposed by Carreras and Brannath 19. They show that the shrinkage estimator has smaller Bayes risk than the maximum likelihood estimator (MLE) and also a smaller mean squared error (MSE) in many scenarios. Common to the methods for normal data is that they assume equal variance between treatment arms and do not consider the control group, because in the normal case, the conditional selection bias of the MLE is the same with or without a control group (although the equal variance assumption has been relaxed for the UMVCUE in recent work 20, which also explicitly accounts for a shared control group). However, in the context of survival trials and for the estimation of log hazard ratios (log‐HRs), a control group is required. The challenge thereby is that a control group, which is common to all treatment groups, induces a correlation among the log‐HR estimators. Moreover, the assumption of equal and known variances is unrealistic in the survival setting as the variances depend on the number of observed events. In this paper, we therefore adapt the method of Stallard and Todd 17 to time‐to‐event endpoints and consider shrinkage estimators in this context. In an extensive simulation study, we then compare the bias and MSE of these estimators and analyze reconstructed data from the FOCUS trial. In Section 2.2, we derive explicit formulas for the selection bias of the MLE in the selected, as well as in the dropped treatment arms under a Cox proportional hazards model, when the treatment with the largest observed effect at an interim analysis is selected. We derive two shrinkage estimators in Section 2.3 and adapt the iterative Stallard–Todd method to survival data in Section 2.4. The simulation study in Section 3 compares the bias and MSE of these methods for different number of treatment groups, allocation ratios, and total number of events. An analysis of reconstructed data from the FOCUS trial is presented in Section 4 before we conclude with a discussion.

Methods

We consider a two‐stage design where the best treatment, that is, the treatment with the smallest estimated log‐HR, is selected at the interim analysis. Let S be the index of the selected treatment. Note that S is a random variable depending on the Stage 1 data.

Model

Consider treatment groups and a common control group. Denote the total sample size by n, and suppose patients are allocated to group k in Stage 1 with probability p (k = 0, …, K). Here, the index k=0 represents the control group. Assuming proportional hazards between each treatment group and the control group, we consider the following Cox proportional hazards model: where β is the vector of log‐HRs and Z =(Z 1, …, Z ) is the vector of binary treatment indicators, that is, Z =1 if patient j is in group k and 0, otherwise (k = 1 …, K, j = 1, …, n). The joint model in Eq. (1) is equivalent to K separate Cox models with the same baseline hazard function λ 0, which is the hazard in the common control group. In a sequential survival trial with more than one analysis time point, we need to distinguish between the calendar time, measured from the start of the trial, and the individual survival time of each patient. Associated with each patient is an entry time R , measured from the start of the trial, a survival time T , and a censoring time C , both measured from entry of the patient into the trial. At each calendar time t, only the minimum and the censoring indicator can be observed. The observed data of all patients at calendar time t are given by where is the number of patients recruited by calendar time t.

Maximum likelihood estimator

The MLE of β in Eq. (1) at calendar time t is denoted by and is obtained by maximizing the partial‐likelihood of the Cox model based on the data D(t). In practice, interim and final analyses are usually not scheduled at fixed calendar times but after a certain number of events have been observed. Let d be the total number of events of the j‐th analysis and d the number of events in the k‐th group at the j‐th analysis, such that d =d + … + d (k = 0, …, K, j = 1, 2). Denote by the vector of estimated log‐HRs (k = 1, …, K, j = 1, 2) at the interim and final analyses, respectively. The estimated log‐HR in the selected treatment group at the interim analysis is . The true treatment effect β of the selected treatment is a random variable, and in general, . We say that an estimator of β at Stage j is unbiased if (j=1,2) 15. The MLEs are correlated because of the common control group. If β≈0 and the censoring time C is stochastically independent from Z, we show in Section A.1 in the Appendix that the correlation ρ between and is approximately Note that when the size of the common control group decreases, the correlation of the MLEs increases. In the special case , where all treatment groups are the same size, the correlation ρ between any two log‐HR estimators at the interim analysis is approximately (1−p 0)/(1+(K−1)p 0). The derivation of the selection bias is based on the joint asymptotic distribution of the vectors and . We have approximately where Σ is the asymptotic covariance matrix of (j=1,2) (see Section A.1 in the Appendix).

Estimation in a two‐stage design

Denote by the increment of the MLE based on all data observed after the interim analysis (including events observed in Stage 2, from patients recruited in Stage 1). This is an (asymptotically) unbiased estimator of the true log‐HR, which is inefficient, because it ignores all of the Stage 1 data. Formally, can be defined via the score process U of the Cox model and the corresponding Fisher information conditional on the selection S. Asymptotically U 2−U 1 and U 1 are stochastically independent, because of the independent increments property of U. Thus, the increment of the MLE given by is asymptotically independent from conditionally on S. At the final analysis, the MLE of the selected treatment can be defined in two different, but asymptotically equivalent, ways, either by , that is, from all data accrued until the final analysis or by a weighted sum of the first stage MLE and the increment from first to second stage, that is, where w∈[0,1] is a pre‐specified constant. We will consider both estimators in our simulation study. Usually, w will be equal to the information fraction at the planned time of the interim analysis, that is, w=d 1/d 2.

Selection bias and mean squared error

The bias of the MLE of the selected treatment at analysis j can be written as The true treatment effect in the selected treatment arm is overestimated, when the best performing treatment arm is selected, because One can see that the true treatment effect in the dropped treatment arms is underestimated. For normal means, the selection bias is maximal if all means are equal 19. Their result and its proof still hold asymptotically for the estimated log‐HRs when the signs are reversed; that is, for given variances and correlation, the absolute value of the bias is maximal if all true log‐HRs are equal. We obtain a lower bound for the selection bias of depending on the asymptotic covariance matrix Σ. This bound is attained when the true log‐HRs are all equal. Because the correlations are always nonnegative in our setting, it is an immediate consequence of Slepian's lemma 21 that for fixed variances, the lower bound C (Σ) is minimal when all correlations are 0. This corresponds to a situation where every treatment group has its own independent control group. There are two possible ways of defining the MSE, because β is a random variable. First, the MSE of predicting β which is the definition that we will use in our simulation study, or alternatively the MSE of estimating E[β ]: Note that while the bias does not depend on the correlation in the normal case, this is not the case for the variance. The MSE is always larger with control group than without a control group: where X is the sample mean and θ is the true mean in group k = 0, …, K.

Conditional selection bias

We will now give explicit expressions for the selection probabilities and the conditional selection biases at the interim and final analyses. The detailed derivations can be found in Section B.1 in the Appendix. Let be the vector with the k‐th element removed. Denote by the survival function of the (K−1)‐dimensional normal distribution with mean β −(x) and covariance matrix Σ−(x), where β −(x) is the conditional mean and Σ−(x) is the conditional covariance matrix of given , respectively. The probability of selecting treatment k is given by where . At the interim analysis, the conditional selection bias of conditional on {S=k} is In the dropped treatment arm, the MLE is biased in the opposite direction; that is, the treatment effect is underestimated, and the bias of given {S≠k} is We derive now an explicit formula for the conditional selection bias at the final analysis. Let , where Σ2 is the k‐th row of the matrix Σ2. Then, the conditional selection bias at the final analysis can be written as where In the dropped treatment arm, analogously to Eq. (3), We have shown in Section 2.2.2 that the absolute value of the bias is maximal when the correlation is 0; that is, each treatment group has its own independent control group. In this case, Eqs (4) and (5) reduce to and respectively. Thus, Thus, each additional event observed after the interim analysis reduces the conditional bias, and the bias at the final analysis will be smaller than at the interim analysis. We also see that only the total number of observed events matters and not whether the event comes from a patient recruited before or after the interim analysis. This is especially interesting in the dropped treatment arms, because it quantifies the bias reduction that we can expect, when continuing follow‐up until the final analysis. In our simulation study in Section 3, we see that this relation also holds approximately in the case of a common control group. This result is analogous to the normal case, for example 22. In a two‐arm trial with a single treatment group, recruitment may be stopped early at the interim analysis for the lack of benefit, when the estimated log‐HR is larger than some pre‐specified threshold c, but follow‐up is continued until the final analysis. The selection bias in this case can be obtained as a special case by setting , and letting . This leads to We see that Eq. (6) still holds; that is, additional events reduce the conditional selection bias. This confirms the simulation study results of 23.

Shrinkage estimator

We consider two different approaches to deriving the shrinkage estimator. The first approach follows closely the original derivation of 18, by considering the posterior mean of the selected treatment when the covariance matrix of the data is known, but arbitrary, and then replacing the prior mean and variance by their sample estimates. The second approach first scales the estimators by the square root of their covariance matrix to obtain independent estimators with unit variances and then rescales the shrinkage estimates back to the original scale. In the normal case with known and equal variances and four or more treatment groups, the shrinkage estimator can be derived as an empirical Bayes estimator 24. Let X be the sample mean and θ the true mean in group k ( ). We assume normal priors for the true means θ ∼N(μ,τ 2) and that the sample mean X conditional on θ is normally distributed with mean θ and known variance σ 2 (k = 1, …, K). The posterior mean of θ conditional on X is given by (1−C)μ+C X and the posterior variance by σ 2 C, where C = τ 2/(τ 2+σ 2). Thus, the posterior mean of θ is E[θ |X 1, …, X ] = (1−C)μ+C X . To obtain an estimator of the posterior mean of θ , the unknown quantities μ and C are estimated by the overall sample mean and respectively. is an unbiased estimator of C 24. Because , the estimator is replaced by . Therefore, the shrinkage estimator is given by The aforementioned estimators are only defined for . The best linear unbiased predictor can be used instead when . The best linear unbiased predictor is the same as the empirical Bayes estimator but with K−3 replaced by K−119.

Shrinkage estimator for the log hazard ratios

The extension from independent equal variances to the general case of an arbitrary covariance matrix is straightforward. In order to simplify the notation, we write in the following for the MLE at Stage j instead of . Assume now where I is the K×K identity matrix and the Σ the covariance matrix, which we assume to be known for the moment. Note that we still put independent priors on the true treatment effects. The posterior mean of β given the data is where the shrinkage factor C is now the matrix I−Σ(τ 2 I+Σ)−1. The marginal covariance matrix of is τ 2 I+Σ. Because Σ is symmetric, there exists an orthogonal matrix U and a diagonal matrix D such that . Thus, is a vector of independent normal distributed variables. The marginal covariance matrix of is τ 2 I+D. The following iterative procedure described by Morris 24 to calculate the MLE of τ 2 can now be applied to the transformed data : Start with an initial guess of . Define weights (k = 1, …, K). Estimate τ 2 by If , set . Repeat Steps 2 and 4 until convergence. As Σ is unknown, we apply this iterative method with the diagonal matrix instead of D, where is an estimator of the unknown covariance matrix Σ and is an orthogonal matrix to diagonalize . We can now define an estimator of the matrix C by Similarly as before, we replace the unknown prior mean μ by the overall log‐HR , which is calculated by pooling all treatment groups together and fitting a Cox proportional hazards model to the pooled data. The shrinkage estimator is defined by The shrinkage estimator of the selected treatment is . If , then , and the shrinkage estimator is equal to the overall log‐HR . We will call this the EB shrinkage method. We could have diagonalized with the orthogonal transformation U from the beginning; that is, we could have used U θ and , applied the shrinkage, and then transformed back to the original coordinates. This would have led to the shrinkage factor: which is exactly the same as C, because U is orthogonal.

Shrinkage estimator of the scaled log hazard ratios

Instead of only diagonalizing , we standardize it to the identity matrix and then apply the method of the independent equal variances case. This leads to an alternative shrinkage estimator. Consider the problem of shrinkage towards 0 when Σ is known. We start by transforming the vector with covariance matrix Σ to a vector β ∗ with independent components and unit variances. Define the scaled MLEs: When we use a N(μ ∗,τ ∗2 I) prior on the scaled treatment effects Σ−1/2 β, we know from the independent case with equal variances that the empirical Bayes estimator is given by , where We have The right hand side is the Wald test statistic for testing equality of all treatment effects, which is asymptotically equivalent to the log‐rank test statistic in the Cox model in Eq. (1). When shrinking towards the overall log‐HR , we take the log‐rank test statistic Z, which directly compares the treatment groups, that is, without the control group and treatment group 1 being the new baseline, instead. Therefore, As in the normal case, we replace K−3 by K−1 if K<4. The log‐rank test statistic Z is asymptotically distributed. This is the justification to consider the shrinkage estimator: which we will call the LR shrinkage method.

Two‐stage shrinkage estimators

We define a two‐stage estimator similar to the two‐stage shrinkage estimator proposed by Carreras and Brannath 19 using the increment of the MLE: where is one of the two shrinkage estimators at the interim analysis defined previously.

Confidence intervals

Our focus is on point estimation. Construction of correct confidence intervals is difficult and a topic of ongoing research. No adequate methods for constructing confidence intervals in this setting are available apart from bootstrap resampling. However, bootstrap confidence intervals cannot properly account for the variability introduced by the selection process, because the selection in a resampled data set may be different from the decision in the original data set, but the decision to stop recruitment in the dropped treatment arms cannot be reversed in retrospect. Despite this, such intervals are useful to compare the relative merits of the estimators. We use a bootstrap approach conditional on the selection S 0 in the original data set similar to 25: Resample data accrued until final analysis; Determine calendar time t ∗ of interim analysis in resampled data set; Obtain Stage 1 data from resampled data set, and if necessary, recensor event times of overrunning patients at t ∗; Calculate Stage 1 MLEs (k=1,…,K) from Stage 1 data; If , calculate bias‐corrected estimates. These steps should be repeated until a large enough number of estimates have been produced in order to calculate empirical quantiles of the sampling distributions with the desired accuracy. Note that the interim analysis in each resampled data set is conducted at the same information fraction as in the original data set, although this may correspond to a different number of events and calendar time.

Stallard–Todd estimator

The bias‐corrected estimator of Stallard and Todd 17, 26 is the solution of where is an estimate of the vector of conditional biases when the true log‐HR is (Section 2.2.3). A simple fixed point iteration can be used to find : ; ; Repeat Step 2 until convergence. The conditional biases depend on the unknown covariance matrix of (Section 2.2.3), which we replace with the estimator from the Cox model. The estimation of the vector is computationally complex, because it requires the numeric evaluation of integrals of multivariate cumulative distribution functions. Transforming the vector in the same way as in Section 2.3.2 and then solving Eq. (9) with the identity matrix as covariance matrix does not work, because these solutions are in general not solutions of the original equation. We have defined the Stallard–Todd estimator only at the interim analysis. A similar definition is possible at the final analysis using the bias formulas derived in Section 2.2.3, that is solving Eq. (9) with instead of . However, in our simulations, the iterative procedure diverged most of the time. This also happened in the data example in Section 4. This seems to be related to the fact that the selection bias depends on the marginal covariance matrix, but only the conditional covariance matrix given the selection can be estimated at the final analysis. We therefore only use the two‐stage estimator at the final analysis defined by Confidence intervals for the Stallard–Todd estimator can only be obtained by resampling, as its (asymptotic) sampling distribution is not known. Simulations not reported here suggest that asymptotic normality does indeed hold.

Bias‐corrected Kaplan–Meier estimator

The Kaplan–Meier estimator in the selected treatment group will be biased upwards. The Kaplan–Meier estimator in the control group will always be (asymptotically) unbiased. We can define a bias‐corrected Kaplan–Meier estimator based on any of the bias‐corrected estimators of the log‐HR by using the relationship between the survival functions in the control and the treatment groups under the Cox model (Eq. (1)). The bias‐corrected Kaplan–Meier estimator at a time t is defined as In simulations not reported here, we found that the selection bias of the Kaplan–Meier estimator in the selected treatment arm is negligible even when all true treatment effects are equal. We would expect the selection bias to be more pronounced, when selection was performed based on, for example, the estimated survival probability at a specific time instead of the estimated log‐HRs at interim.

Simulations

We consider three base scenarios in our simulation study, which are similar to those in 19 and cover a range of configurations of true log‐HRs. It is of course impossible to cover all possible configurations in a simulation study. Constant: All hazard ratios (HRs) are equal to 1. H R =1(k = 1, …, K). Linear: A linear trend from 0.6 to 1: H R 1=1,H R −H R =0.4/(K−1)(k = 2, …, K). Peak: All but one HRs are 1: H R 1=0.6,H R = 1(k = 2, …, K). The baseline hazard was , which corresponds to a median survival time of 12 (months) in the control group. We compare the bias and MSE of all methods for an increasing number of treatment groups K. A maximum of 200 patients are allocated to each group. The interim analysis was conducted after (K+1)·50 observed events in all treatment groups and control group combined. This amounts to approximately 50 events in each group when the event rates are similar. The final analysis was performed after 200 events in the selected treatment group and control group combined. Figure 1 shows the bias in 105 simulations of each method in each of the base scenarios as the number of treatment groups K ranges from 2 to 6 at the interim analysis, the final analysis, and for the two‐stage estimators. The corresponding square root of the MSE is shown in Figure 2. We observed some convergence problems of Stallard–Todd estimator, especially in the case K=2 with non‐convergence in up to 20% of the simulations. In these cases, the MLE was used as a fallback, which slightly increases the bias but reduces the MSE. Overall the results are very similar to those of 19 in the normal case.

Figure 1

Figure 2

Root mean squared error as function of the number of treatment groups in each of the three base scenarios at the interim and final analyses and for the two‐stage estimators. The Stallard–Todd estimator is only defined at the interim analysis and as two‐stage estimator.

Bias as function of the number of treatment groups in each of the three base scenarios at the interim and final analyses and for the two‐stage estimators. The Stallard–Todd estimator is only defined at the interim analysis and as two‐stage estimator. Root mean squared error as function of the number of treatment groups in each of the three base scenarios at the interim and final analyses and for the two‐stage estimators. The Stallard–Todd estimator is only defined at the interim analysis and as two‐stage estimator. The simulation results confirm the theoretical result that the bias of the MLE is maximal, when all treatment effects are equal (“Constant” scenario). Both shrinkage estimators clearly outperform the MLE and the Stallard–Todd method in terms of bias and MSE in this case. In the “Linear” scenario, the LR shrinkage estimator has smaller bias than the MLE, when there are four or more treatment groups and about the same MSE for any number of treatment groups. Both shrinkage estimator win over the Stallard–Todd method except in the K=2 case (which is identical to the K=2 case in the “Peak” scenario). In the “Peak” scenario, the MLE is practically unbiased and beats all other methods in terms of bias and MSE. However, the LR shrinkage estimator is still very close to the MLE and has a smaller MSE than the Stallard–Todd estimator. While the estimates of the Stallard–Todd method are on average not far from those of the other methods, a closer look at the distribution of the estimated HRs at the interim analysis (Figure 3) reveals that the Stallard–Todd method produces more extreme and skewed results, than the other methods, especially in the “Constant” scenario.

Figure 3

Boxplot of estimated hazard ratios at the interim analysis in the three scenarios for K=4. MLE, maximum likelihood estimator; EB, EB shrinkage estimator; LR, LR shrinkage estimator; ST, Stallard–Todd estimator. The influence of the allocation ratio on the bias and MSE of the two‐stage estimators is shown in Figure 4. A maximum of n patients were allocated to each treatment group and a maximum of m patients to the control group. The allocation ratios m:n considered were 3:1, 2:1, 1:1, 1:2, and 1:3. This corresponds to correlations between the estimated log‐HRs of approximately 1/4, 1/3, 1/2, 2/3, and 3/4. The bias increases with increasing correlation in line with our theoretical result. The MSE apparently attains its minimum at a correlation of 0.5. This suggests that patients should be equally allocated to all groups.

Figure 4

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the correlation for K=4 treatment groups in each of the three base scenarios.

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the correlation for K=4 treatment groups in each of the three base scenarios. The bias and MSE decrease when the number of observed events increases (Figure 5). There is a considerable finite sample bias in addition to the selection bias, when only a few events (<25 events per group) are observed.

Figure 5

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the number of events at the final analysis for K=4 treatment groups in each of the three base scenarios.

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the number of events at the final analysis for K=4 treatment groups in each of the three base scenarios. The largest effect considered is a HR of 0.6. The MLE is already practically unbiased when the HR is 0.6 for one treatment and 1 for all other treatments as can be seen in Figure 1. An even larger effect would not provide any additional insight, although estimation also works well for values smaller than 0.6.

Application

We apply our methods to data reconstructed from the published Kaplan–Meier curves of the FOCUS trial 13 using the method of 27. The Kaplan–Meier curves were digitized using WebPlotDigitizer 28. In the FOCUS trial, five treatment strategies of sequential and combination chemotherapy for poor prognosis advanced colorectal cancer were compared: flourouracil only (arm A, control), irinotecan after flourouracil (arm B‐ir), combination of flourouracil and irinotecan (arm C‐ir), oxaliplatin after flourouracil (arm B‐ox), and combination of flourouracil and oxaliplatin (arm C‐ox). A total of 2135 patients were randomized in the ratio 2:1:1:1:1. The reconstructed data closely match the original data in terms of hazard ratios and corresponding confidence intervals (Table 1). The data are analyzed under a hypothetical two‐stage design, with an interim analysis after 900 events across all arms. This corresponds to roughly 300 events in the control arm and 150 events in each of the four treatment arms. The final analysis is conducted after 900 events in the control arm and the selected treatment arm, resulting in about 600 events in the control arm and 300 events in the selected treatment arm. In addition to point estimates, bootstrap confidence intervals from 1000 replications for all methods conditional on the treatment selection in the original data set are provided. The resulting confidence intervals therefore do not account for the variability introduced by the selection and are expected to have less than the nominal coverage as discussed in Section 2.3.

Table 1

HRs, 95% confidence intervals, and p‐values of the original FOCUS data 13 and the reconstructed data.

	Original data		Reconstructed data
Comparison	HR (95% CI)	p‐value	HR (95% CI)	p‐value
A versus B‐ir	0.91 (0.79–1.03)	0.16	0.91 (0.79–1.05)	0.18
A versus C‐ir	0.84 (0.73–0.96)	0.01	0.84 (0.73–0.97)	0.02
A versus B‐ox	0.97 (0.85–1.11)	0.65	0.96 (0.84–1.10)	0.58
A versus C‐ox	0.93 (0.81–1.06)	0.26	0.91 (0.79–1.05)	0.18

HR, hazard ratio.

HRs, 95% confidence intervals, and p‐values of the original FOCUS data 13 and the reconstructed data. HR, hazard ratio. Both shrinkage methods produce very similar and sensible results (Table 2), when compared with the original observed HRs of the FOCUS trial (Table 1), where no selection was performed. The estimated HRs of the LR shrinkage method are very close the MLE in all arms, with narrower confidence intervals, while the EB shrinkage method shrinks all estimates to the overall HR, resulting in a smaller treatment effect in the selected C‐ir arm. This is the same tendency of the EB shrinkage estimator to overcorrect the bias that was also observed in the “Linear” and “Peak” simulation scenarios.

Table 2

Estimated HRs and 95% bootstrap confidence intervals for all treatment groups (versus A).

		Hazard ratio (95% CI)
Method	Group	Interim analysis	Final analysis	Two‐stage
MLE	B‐ir	0.87 (0.69–1.04)	0.91 (0.80–1.04)	0.91 (0.80–1.04)
	C‐ir	0.79 (0.63–0.94)	0.84 (0.73–0.97)	0.84 (0.73–0.97)
	B‐ox	0.88 (0.72–1.05)	0.96 (0.83–1.10)	0.96 (0.83–1.10)
	C‐ox	0.81 (0.66–0.98)	0.91 (0.79–1.04)	0.91 (0.79–1.04)
EB	B‐i	0.84 (0.72–0.97)	0.90 (0.82–1.01)	0.89 (0.80–1.02)
	C‐ir	0.84 (0.68–0.94)	0.90 (0.77–0.99)	0.87 (0.76–0.98)
	B‐ox	0.84 (0.72–0.99)	0.90 (0.83–1.05)	0.94 (0.83–1.07)
	C‐ox	0.84 (0.70–0.95)	0.90 (0.81–1.02)	0.92 (0.81–1.04)
LR	B‐ir	0.84 (0.71–1.02)	0.91 (0.80–1.03)	0.90 (0.80–1.03)
	C‐ir	0.83 (0.64–0.94)	0.87 (0.75–0.98)	0.86 (0.74–0.97)
	B‐ox	0.85 (0.72–1.03)	0.94 (0.83–1.08)	0.94 (0.83–1.09)
	C‐ox	0.83 (0.68–0.97)	0.91 (0.80–1.03)	0.92 (0.80–1.04)
ST	B‐ir	0.86 (0.61–1.03)	–	0.91 (0.75–1.04)
	C‐ir	0.96 (0.64–1.58)	–	0.93 (0.75–1.24)
	B‐ox	0.88 (0.65–1.05)	–	0.96 (0.81–1.10)
	C‐ox	0.72 (0.55–0.98)	–	0.86 (0.73–1.04)

Interim analysis at 50% of the total number of events. C‐ir was selected at interim. Follow‐up continued in all groups until the final analysis.

MLE, maximum likelihood estimator; EB, EB shrinkage estimator; LR, LR shrinkage estimator; ST, Stallard–Todd estimator.

Estimated HRs and 95% bootstrap confidence intervals for all treatment groups (versus A). Interim analysis at 50% of the total number of events. C‐ir was selected at interim. Follow‐up continued in all groups until the final analysis. MLE, maximum likelihood estimator; EB, EB shrinkage estimator; LR, LR shrinkage estimator; ST, Stallard–Todd estimator. The Stallard–Todd estimator applied to this specific data set at the interim analysis gives rather different results, which contradict the observed HRs in the original data, where the C‐ir arm was the best performing treatment arm. While this result is unsatisfactory, it is in‐line with the simulation results (Figure 3) that show that extreme solutions can be obtained with this approach. The iterative procedure did not converge when applied at the final analysis (see also Section 2.4). The effect of continuing follow‐up in the dropped treatment arms can be seen clearly in the B‐ir and C‐ox arms. The substantial difference in Stage 1 HRs completely disappears at the end of Stage 2.

Conclusion

The bias of the MLE was found to be small except for a large number ( ) of treatment groups. We have shown that the common control group reduces the bias of the MLE because of the induced correlation of the MLEs and that the bias of the MLE in the dropped treatment arms is reduced when continuing follow‐up until the final analysis in accordance with the simulation results of 23. In our simulation study, the shrinkage methods performed very well, when all treatment effects are equal, but had a larger bias than the MLE, which increased with number of treatment groups, when the the maximal absolute difference of the true treatment effects was large. These results reflect the prior assumptions underlying the empirical Bayes estimators. When comparing the estimators with respect to the MSE, the Stallard–Todd method performs the worst across all scenarios. This is also reflected in the analysis of the reconstructed FOCUS trial data, which shows that either shrinkage method is preferable to the Stallard–Todd method. Unless there is reason to believe that there are large differences between the true treatment effects, the LR shrinkage estimator can be recommended as a way to reduce the bias and the MSE of the MLE. Because the shrinkage estimators tend to overcorrect the bias, they might also be of interest in situations where overestimation of the treatment effect is considered worse than underestimation. All methods were based on the asymptotic normal approximation of the log‐HRs. In our main simulations, the number of events was at least 150, such that finite sample bias is negligible. This can be seen, for example, in the Peak scenario in Figure 1, where the MLE is practically unbiased. However, when the number of events per group is very small (<25), the finite sample bias outweighs the selection bias as was seen in Figure 5. Such small event numbers do not seem relevant in practice, because typically much larger event numbers are required to achieve acceptable power. Assuming a Cox proportional hazards model is convenient but not necessary, because all methods are based on the asymptotic approximation of the distribution of the treatment effect estimators. Therefore, the Cox model could be replaced with any other model with asymptotically normal treatment effect estimators. It is anticipated that the results would be comparable to those under the Cox model. Confidence intervals after adaptive treatment selection that have the correct coverage probability are non‐trivial and were outside of the scope of this work. As far as we know, no adequate methods for construction of confidence intervals for the shrinkage or Stallard–Todd methods have been proposed. In most trials some form of covariate adjustment is required, for example, inclusion of center effects in a multi‐center trial to account for stratified randomization. As long as the asymptotic results still hold, no adjustment is necessary, because only the dimension K of the multivariate normal distribution needs to be increased to accommodate any additional regression coefficients. The shrinkage estimators could also be derived as estimators arising from penalized Cox regression 29. When using an L 2 penalty, that is, , there is a one‐to‐one correspondence between the penalty parameter and the prior variance τ. This is appealing, because the penalized Cox regression works directly on the time‐to‐event data and not only via an asymptotic normal approximation. Selection of the penalty parameter could be performed by cross‐validation in order to minimize the prediction MSE. However, cross‐validation is computationally extremely expensive and results in an additional cross‐validation error. In simulations not reported here, the penalized MLE with cross‐validation did not perform better than the shrinkage estimator. The LR shrinkage of the vector of MLEs does not change the ordering of its components, because and . Thus, the results would be the same if selection were based on the LR shrinkage estimator instead of the MLE. Finally, it is worth mentioning that the calculation of the shrinkage estimator does not depend on the specific selection rule used. It can be used without change with complex selection rules, for example, when selection is based on hypothesis tests. For the Stallard–Todd method and the UMVCUE approach, non‐trivial modifications are necessary 20.

20 in total

1. Sequential designs for phase III clinical trials incorporating treatment selection.

Authors: Nigel Stallard; Susan Todd
Journal: Stat Med Date: 2003-03-15 Impact factor: 2.373

2. Novel designs for multi-arm clinical trials with survival outcomes with an application in ovarian cancer.

Authors: Patrick Royston; Mahesh K B Parmar; Wendi Qian
Journal: Stat Med Date: 2003-07-30 Impact factor: 2.373

3. Adaptive designs for confirmatory clinical trials.

Authors: Frank Bretz; Franz Koenig; Werner Brannath; Ekkehard Glimm; Martin Posch
Journal: Stat Med Date: 2009-04-15 Impact factor: 2.373

4. Unbiased estimation of selected treatment means in two-stage trials.

Authors: Jack Bowden; Ekkehard Glimm
Journal: Biom J Date: 2008-08 Impact factor: 2.207

5. Selection and bias--two hostile brothers.

Authors: Peter Bauer; Franz Koenig; Werner Brannath; Martin Posch
Journal: Stat Med Date: 2010-01-15 Impact factor: 2.373

6. Designing a seamless phase II/III clinical trial using early outcomes for treatment selection: an application in multiple sclerosis.

Authors: T Friede; N Parsons; N Stallard; S Todd; E Valdes Marquez; J Chataway; R Nicholas
Journal: Stat Med Date: 2011-02-22 Impact factor: 2.373

7. Time-to-event analysis with treatment arm selection at interim.

Authors: L Di Scala; E Glimm
Journal: Stat Med Date: 2011-09-05 Impact factor: 2.373

8. Shrinkage estimation in two-stage adaptive designs with midtrial treatment selection.

Authors: Máximo Carreras; Werner Brannath
Journal: Stat Med Date: 2012-06-29 Impact factor: 2.373

9. Conditionally unbiased and near unbiased estimation of the selected treatment mean for multistage drop-the-losers trials.

Authors: Jack Bowden; Ekkehard Glimm
Journal: Biom J Date: 2013-12-18 Impact factor: 2.207

10. Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis-driven selection rules.

Authors: David S Robertson; A Toby Prevost; Jack Bowden
Journal: Stat Med Date: 2016-04-21 Impact factor: 2.373

6 in total

1. Point and interval estimation in two-stage adaptive designs with time to event data and biomarker-driven subpopulation selection.

Authors: Peter K Kimani; Susan Todd; Lindsay A Renfro; Ekkehard Glimm; Josephine N Khan; John A Kairalla; Nigel Stallard
Journal: Stat Med Date: 2020-05-03 Impact factor: 2.373

2. Adaptive designs in clinical trials: why use them, and how to run and report them.

Authors: Philip Pallmann; Alun W Bedding; Babak Choodari-Oskooei; Munyaradzi Dimairo; Laura Flight; Lisa V Hampson; Jane Holmes; Adrian P Mander; Lang'o Odondi; Matthew R Sydes; Sofía S Villar; James M S Wason; Christopher J Weir; Graham M Wheeler; Christina Yap; Thomas Jaki
Journal: BMC Med Date: 2018-02-28 Impact factor: 8.775

3. Point estimation following two-stage adaptive threshold enrichment clinical trials.

Authors: Peter K Kimani; Susan Todd; Lindsay A Renfro; Nigel Stallard
Journal: Stat Med Date: 2018-05-31 Impact factor: 2.373

4. Design and estimation in clinical trials with subpopulation selection.

Authors: Yi-Da Chiu; Franz Koenig; Martin Posch; Thomas Jaki
Journal: Stat Med Date: 2018-08-07 Impact factor: 2.373

5. Estimation of treatment effects following a sequential trial of multiple treatments.

Authors: John Whitehead; Yasin Desai; Thomas Jaki
Journal: Stat Med Date: 2020-03-23 Impact factor: 2.373

6. Efficient Adaptive Designs for Clinical Trials of Interventions for COVID-19.

Authors: Nigel Stallard; Lisa Hampson; Norbert Benda; Werner Brannath; Thomas Burnett; Tim Friede; Peter K Kimani; Franz Koenig; Johannes Krisam; Pavel Mozgunov; Martin Posch; James Wason; Gernot Wassmer; John Whitehead; S Faye Williamson; Sarah Zohar; Thomas Jaki
Journal: Stat Biopharm Res Date: 2020-07-29 Impact factor: 1.452

6 in total