
Optimizing subgroup selection in two-stage adaptive enrichment and umbrella designs.

Nicolás M Ballarini1, Thomas Burnett2, Thomas Jaki2,3, Christopher Jennison4, Franz König1, Martin Posch1.

Abstract

We design two-stage confirmatory clinical trials that use adaptation to find the subgroup of patients who will benefit from a new treatment, testing for a treatment effect in each of two disjoint subgroups. Our proposal allows aspects of the trial, such as recruitment probabilities of each group, to be altered at an interim analysis. We use the conditional error rate approach to implement these adaptations with protection of overall error rates. Applying a Bayesian decision-theoretic framework, we optimize design parameters by maximizing a utility function that takes the population prevalence of the subgroups into account. We show results for traditional trials with familywise error rate control (using a closed testing procedure) as well as for umbrella trials in which only the per-comparison type 1 error rate is controlled. We present numerical examples to illustrate the optimization process and the effectiveness of the proposed designs.
© 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Keywords:  Bayesian optimization; conditional error function; subgroup analysis; utility function

Year:  2021        PMID: 33783020      PMCID: PMC8251960          DOI: 10.1002/sim.8949

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


INTRODUCTION

It is increasingly common to integrate subgroup identification and confirmation into a clinical development program. Biomarker-guided clinical trial designs have been proposed to close the gap between the exploration and confirmation of subgroup treatment effects. Numerous statistical considerations (eg, multiplicity issues, consistency of treatment effects, trial design) need to be taken into account to ensure a proper interpretation of study findings, as outlined in recent reviews. Several study designs are available for the investigation of subgroups in clinical trials. These include all-comers designs, where biomarker status or subgroup are not considered for enrolment but only in the trial analysis, and stratified designs, where the trial prevalences for each subgroup, that is, the proportion of patients recruited from each subgroup, are chosen initially and maintained throughout the trial. Adaptive enrichment designs have been proposed to increase the efficiency of these trials. These designs allow subgroups to be dropped for futility at interim analyses, with the rest of the trial being conducted with subjects from the remaining groups only. The U.S. Food and Drug Administration guidance on adaptive designs highlights the use of adaptive enrichment designs as a means to increase the chance of detecting a true drug effect over that of a fixed sample design. Master protocols provide an infrastructure for the efficient study of newly developed compounds or biomarker-defined subgroups. Such studies simultaneously evaluate more than one investigational drug or more than one disease type within the same overall trial structure. An umbrella trial is a particular type of master protocol in which enrolment is restricted to a single disease but the patients are screened and assigned to molecularly defined subtrials. Each subtrial may have different objectives, endpoints or design characteristics.
An example of an umbrella trial is the ALCHEMIST trial, in which patients with non-small cell lung cancer are screened for EGFR mutation or ALK rearrangement and assigned accordingly to subtrials with different treatments. In this paper, we study confirmatory trials that allow the investigation of the treatment effect in prespecified nonoverlapping subgroups. In particular, we focus on adaptive clinical trials that allow the modification of design elements without compromising the integrity of the trial. We propose a class of adaptive enrichment designs that use a Bayesian decision framework to optimize the design parameters, such as the trial prevalences of the subgroups, the weights for multiple hypothesis testing, and adaptation rules. A similar framework has been used in References 20, 21, 22, 23, 24, 25, 26, 27 for adaptive enrichment trials. We consider two types of problem. In the first case, we study designs that preserve the familywise error rate (FWER) of the trial, using a closed testing procedure to test the null hypotheses of no treatment effect in the two subgroups. This is what is typically required in adaptive enrichment trials where a single treatment is evaluated against a control. In the second case, we show results for umbrella trial designs without multiplicity adjustment. Here, we consider studies made up of separate simultaneous trials, for which it has been argued that no control of multiplicity is needed. Our work, therefore, provides an overarching framework for both adaptive enrichment designs and umbrella trials. The manuscript is organized as follows: In Section 2, we introduce the designs and distinguish between single-stage designs (Section 2.2) and two-stage designs (Section 2.3), and in Section 2.4 we discuss how to adapt our proposed designs to umbrella trials. In Sections 3 and 4 we present numerical examples.
We describe how our methods may be extended to designs with more than two stages in Section 5 and we end with conclusions and a discussion in Section 6.

BAYES OPTIMAL DESIGNS

The class of trial designs

Consider a confirmatory parallel-group clinical trial comparing a new treatment and a control with respect to a pre-defined primary endpoint. We assume the patient population may be divided into disjoint, biomarker-defined subgroups. Given a maximum achievable sample size, n, we aim to optimize the trial design by maximising a specific utility function. Suppose two biomarker-defined subgroups have been identified before commencing the trial. Let λ be the prevalence of the first subgroup in the underlying patient population and 1 − λ the prevalence of the second subgroup. Let θ1 and θ2 be the treatment effects, denoting the difference in the mean outcome between treatment and control, in the first and second subgroups, respectively. We consider trials to investigate the null hypotheses H1: θ1 ≤ 0 and H2: θ2 ≤ 0 with corresponding alternative hypotheses θ1 > 0 and θ2 > 0. In Sections 2.2 and 2.3 we consider confirmatory trials in which strong control of the FWER is imposed. In our discussion of umbrella trials in Section 2.4, we assume multiplicity control is not required. We consider optimization within a class of designs that have a single interim analysis at which adaptation can take place. The total sample size is fixed at n, with n(1) = s(1) n patients in the first stage and n(2) = (1 − s(1)) n patients in the second stage, where 0 ≤ s(1) ≤ 1. In the first stage, n1(1) = r1(1) n(1) patients are recruited from subgroup 1 and n2(1) = r2(1) n(1) from subgroup 2, where r1(1), r2(1) ≥ 0 and r1(1) + r2(1) = 1. In the second stage, n1(2) = r1(2) n(2) patients are recruited from subgroup 1 and n2(2) = r2(2) n(2) from subgroup 2, where r1(2), r2(2) ≥ 0 and r1(2) + r2(2) = 1, and the values of r1(2) and r2(2) may depend on the first stage data. Within each stage and subgroup, we assume equal allocation to the two treatment arms (this assumption is not strictly necessary and could be relaxed). Figure 1 gives a schematic representation of the trial design.
FIGURE 1

Schematic representation of the three types of trial design. In the single‐stage trial, the sampling prevalences of the subgroups are fixed throughout the trial. In standard adaptive enrichment trials, patients are recruited with predefined subgroup prevalences until the interim analysis, at which point a decision is taken to continue with the same prevalences or to sample from a single subgroup. In the Bayes optimal adaptive trial designs that we consider, the sampling prevalences may be changed at the interim analysis [Colour figure can be viewed at wileyonlinelibrary.com]

The definition of a particular design in this class is completed by specifying the multiple testing procedure to be used and the method for combining data across stages when adaptation occurs. We use a closed testing procedure to control the FWER, applying a weighted Bonferroni procedure to test the intersection hypothesis. In this procedure, weights are initially set as ω1(1) and ω2(1) = 1 − ω1(1), but these may be modified in the second stage if adaptation occurs. The error rate for each hypothesis test is controlled by preserving the conditional type I error rate when an adaptation is made. Thus, while we use a Bayesian approach to optimize the design, the trial is analyzed using frequentist procedures that control error rates at the desired level, adhering to conventional regulatory standards. We follow a Bayesian decision theoretic approach to optimize over trial designs in this class. In assessing each design, we assume a prior distribution for the treatment effects in each subgroup and a utility function that quantifies the value of the trial's outcome. We shall optimize designs with respect to the timing of the interim analysis, the proportion of patients recruited from the two subgroups at each stage of the trial, the weights in the weighted Bonferroni test, and the rule for updating these weights given the interim data.
We summarize the data observed during the trial by the symbol D, noting that this summary should contain information about the numbers of observations from each subgroup and the weights to be used in the weighted Bonferroni test at each stage, as well as estimates of θ1 and θ2 obtained from observations before and after the interim analysis. We define our utility function to be

u(θ1, θ2, D) = λ 1{H1 is rejected} + (1 − λ) 1{H2 is rejected},    (1)

where 1{·} is the indicator function. By definition, the data summary D contains the information needed to determine if each of the hypotheses H1 and H2 is rejected. The utility (1) involves the size of the underlying subgroups as well as the rejection of the corresponding hypotheses. Thus, rejection of the null hypothesis for a larger subgroup is given greater weight. If the population prevalence of the two subgroups is not known, a prior on λ may be added. We note that terms in the function (1) are positive when a null hypothesis is rejected but the associated treatment effect is very small or even negative: this issue could be addressed by multiplying each term by an indicator variable which takes the value 1 if the relevant parameter, θ1 or θ2, is larger than zero or above a clinically relevant threshold (eg, Stallard et al, where a similar approach is used for treatment selection). Since the trial design is optimized with respect to the stated utility, it is important to choose a utility function that reflects accurately the relative importance of possible trial outcomes. Furthermore, the definition of utility can be adapted to reflect the interest of different stakeholders; for example, Ondra et al and Graf, Posch and König propose utility functions that represent the view of a sponsor or take a public health perspective. Let π denote the prior distribution for θ = (θ1, θ2). Then, the Bayes expected utility for a trial design is

U = ∫ E[u(θ, D) | θ] dπ(θ),

where we have taken the expectation over the sampling distribution of the trial data given the true treatment effects θ, with an outer integral over the prior distribution π.
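As a quick illustration, the utility in (1) amounts to a one-line function. The sketch below is in Python purely for illustration (the numerical values are arbitrary; this is not the authors' code):

```python
# Utility (1): rejecting a subgroup's null hypothesis contributes that
# subgroup's population prevalence; lam is the prevalence of subgroup 1.
def utility(lam, reject_h1, reject_h2):
    return lam * reject_h1 + (1 - lam) * reject_h2

# Rejecting the hypothesis of the larger subgroup earns more utility:
print(utility(0.3, True, False))  # 0.3
print(utility(0.3, False, True))  # 0.7
```

Rejecting both hypotheses yields the maximum utility of 1, regardless of λ.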
When choosing the prior π, it is important to remember that U represents the expected utility, averaged over π. If an "uninformative" prior is chosen, this will place weight on extreme scenarios, such as large negative treatment effects, which have little credibility. Thus, when considering the Bayes optimal design, it is important to use subjective, informative priors. In some cases, pilot studies or historic observational data may be available to construct the prior distribution. In this paper, we assume the prior distribution to be bivariate normal,

(θ1, θ2) ~ N( (μ1, μ2), ψ² [[1, ρ], [ρ, 1]] ).    (2)

Here, the correlation coefficient ρ reflects the belief about the existence of common factors that contribute to the treatment effects in the two subgroups.
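For intuition, the prior (2) is straightforward to simulate. The following Python sketch, with illustrative parameter values, draws treatment effects and checks the induced correlation:

```python
import numpy as np

# Draw (theta1, theta2) from the bivariate normal prior (2):
# common SD psi and correlation rho (values below are illustrative).
def sample_prior(mu1, mu2, psi, rho, size, rng):
    mean = [mu1, mu2]
    cov = (psi ** 2) * np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(mean, cov, size=size)

rng = np.random.default_rng(1)
draws = sample_prior(0.1, 0.0, 0.2, 0.5, 100_000, rng)
emp_corr = np.corrcoef(draws.T)[0, 1]   # should be close to rho = 0.5
```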

Bayes optimal single‐stage design

Patient recruitment and estimation

Suppose we wish to conduct a single-stage trial, which is the special case where s(1) = 1, usually referred to as a stratified design. For simplicity of notation in this section, we write r1 and r2 rather than r1(1) and r2(1), and nj rather than nj(1), for j = 1 and 2. We assume patients can be recruited at these rates regardless of the true proportions λ and 1 − λ in the underlying patient population. In addition, we assume that patients are randomised between the new treatment and the control with a 1:1 allocation ratio in each subgroup. During the trial we observe a normally distributed endpoint for each patient and we assume a constant variance σ² for all observations, with the treatment and control means in subgroup j differing by θj. The estimate of the treatment effect in subgroup j, θ̂j, is the difference between the mean responses in the treatment and control arms of that subgroup, with distribution θ̂j ~ N(θj, 4σ²/nj), where nj = rj n.
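Concretely, the estimate and its Z-value can be formed as in the following Python sketch (all numerical values are illustrative; n_j patients in subgroup j are split 1:1 between the arms):

```python
import numpy as np

rng = np.random.default_rng(5)

# One subgroup of a single-stage trial: n_j patients split 1:1 between
# treatment and control, common SD sigma = 1, true effect theta_j = 0.2
# (all values illustrative).
n_j, theta_j, sigma = 200, 0.2, 1.0
treat = rng.normal(theta_j, sigma, n_j // 2)   # treatment responses
ctrl = rng.normal(0.0, sigma, n_j // 2)        # control responses

theta_hat = treat.mean() - ctrl.mean()         # estimate of theta_j
se = np.sqrt(4 * sigma**2 / n_j)               # its standard error
z_j = theta_hat / se                           # Z-value for testing H_j
```

The factor 4σ²/nj arises because each arm contributes nj/2 observations, so the variance of the difference in means is σ²/(nj/2) + σ²/(nj/2).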

Hypothesis testing in the single‐stage design

Consider the case 0 < r1 < 1 and 0 < r2 < 1. Then θ̂j ~ N(θj, 4σ²/(rj n)) and the corresponding Z-values Zj = θ̂j √(rj n) / (2σ) follow standard normal distributions under the null hypotheses θ1 = 0 and θ2 = 0. We use a closed testing procedure to ensure strong control of the FWER at level α. To construct this, we require level-α tests of H1: θ1 ≤ 0, H2: θ2 ≤ 0 and the intersection hypothesis H12 = H1 ∩ H2. We reject H1 globally if the level-α tests reject H1 and H12. Similarly, we reject H2 globally if the level-α tests reject H2 and H12. For the individual tests, we reject H1 if Z1 > z_{1−α} and H2 if Z2 > z_{1−α}, where z_{1−α} denotes the 1 − α quantile of the standard normal distribution. To test the intersection hypothesis, we use a weighted Bonferroni test: given predefined weights ω1 and ω2, where ω1 + ω2 = 1, we reject H12 if Z1 > z_{1−ω1α} or Z2 > z_{1−ω2α}. The resulting closed testing procedure is equivalent to the weighted Bonferroni-Holm test and will be generalised to adaptive tests in Section 2.3. We note that the choice of a closed testing procedure is not restrictive in this setting, since any procedure that gives strong control of the FWER may be written as a closed testing procedure. Furthermore, in the special cases r1 = 0 and r1 = 1, where the trial recruits from only one of the subgroups, just one subgroup is tested and only the test of the individual hypothesis is required. These cases are accommodated in our general class of designs by setting ω2 = 1 when r1 = 0 and ω1 = 1 when r1 = 1.
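The closed testing procedure just described can be sketched in a few lines of Python (the quantiles come from the standard library's `statistics.NormalDist`; the Z-values below are illustrative):

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal

def closed_test(z1, z2, alpha=0.025, w1=0.5):
    """One-sided closed test of H1 and H2 with a weighted Bonferroni
    test of the intersection H12; returns (reject H1, reject H2)."""
    c = ND.inv_cdf(1 - alpha)                        # individual tests
    rej_h12 = (z1 > ND.inv_cdf(1 - w1 * alpha) or    # weighted Bonferroni
               z2 > ND.inv_cdf(1 - (1 - w1) * alpha))
    # Hj is rejected overall only if both Hj and H12 are rejected.
    return (z1 > c and rej_h12, z2 > c and rej_h12)

# With equal weights this reduces to the Bonferroni-Holm procedure:
print(closed_test(2.5, 1.0))   # (True, False)
```

Note that a Z-value exceeding the individual critical value alone is not enough: both Z-values equal to 2.0, say, clear the level-α individual boundary but not the Bonferroni boundary for H12, so neither hypothesis is rejected overall.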

Bayesian optimization

In the single-stage trial, we wish to optimize the trial prevalences of each subgroup, r1 and r2, and the weights in the Bonferroni-Holm procedure, ω1 and ω2. Given the constraints r1 + r2 = 1 and ω1 + ω2 = 1, we denote the set of parameters to optimize by a = (r1, ω1). Let f(θ̂ | θ; a) denote the conditional distribution of θ̂ = (θ̂1, θ̂2) given θ for design parameters a. The Bayes expected utility is given by

U(a) = ∫ ∫ u(θ, D) f(θ̂ | θ; a) dθ̂ dπ(θ).    (3)

The Bayes optimal design is given by the pair a* = (r1*, ω1*) that maximises the Bayes expected utility of the trial, that is, a* = arg max over a of U(a). Given our simple choices for the prior distribution and the utility function, this integral may be computed directly (see Section S1.2 of Appendix S1). We find the Bayes optimal single-stage trial by a numerical search over possible values of a.
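The direct computation is given in Appendix S1. As a rough stand-in, the Bayes expected utility can also be estimated by Monte Carlo simulation and maximized over a coarse grid; the Python sketch below does this for illustrative parameter values (it is not the authors' implementation, and the grid replaces their numerical search):

```python
import numpy as np
from statistics import NormalDist

ND = NormalDist()
rng = np.random.default_rng(7)

def expected_utility(r1, w1, lam=0.3, mu=(0.1, 0.0), psi=0.2, rho=0.5,
                     n=700, sigma=1.0, alpha=0.025, sims=20_000):
    """Monte Carlo estimate of the Bayes expected utility of a
    single-stage design with trial prevalence r1 and weight w1."""
    cov = psi**2 * np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal(mu, cov, size=sims)   # prior draws
    n_j = np.array([r1, 1.0 - r1]) * n
    # theta_hat_j ~ N(theta_j, 4 sigma^2 / n_j), so the Z-values are
    # Z_j = theta_hat_j * sqrt(n_j) / (2 sigma):
    z = theta * np.sqrt(n_j) / (2 * sigma) + rng.standard_normal((sims, 2))
    c = ND.inv_cdf(1 - alpha)
    c12 = np.array([ND.inv_cdf(1 - w1 * alpha),
                    ND.inv_cdf(1 - (1 - w1) * alpha)])
    rej12 = (z > c12).any(axis=1)            # weighted Bonferroni for H12
    rej = (z > c) & rej12[:, None]           # closed testing
    return lam * rej[:, 0].mean() + (1 - lam) * rej[:, 1].mean()

# Coarse grid search over the design parameters a = (r1, w1):
grid = [(r, w) for r in np.linspace(0.1, 0.9, 9)
               for w in np.linspace(0.1, 0.9, 9)]
best_r1, best_w1 = max(grid, key=lambda a: expected_utility(*a))
```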

Bayes optimal two‐stage adaptive design

Adding a second stage

Consider now a two-stage design in which data from the first stage inform adaptations in the second stage. The estimate of θj for subgroup j based on data collected in stage k is

θ̂j(k) = X̄T,j(k) − X̄C,j(k),    (4)

where X̄T,j(k) and X̄C,j(k) are the mean responses in subgroup j in stage k for the treatment arm and control arm, respectively. Given the value of θ = (θ1, θ2), the first stage estimates are independent with distributions

θ̂j(1) ~ N(θj, 4σ²/(rj(1) n(1))),  j = 1, 2.

The trial prevalences, r1(2) and r2(2), of the two subgroups in the second stage are dependent on θ̂1(1) and θ̂2(1) but, conditional on r1(2) and r2(2), the second-stage estimates are independent and conditionally independent of θ̂1(1) and θ̂2(1), with

θ̂j(2) ~ N(θj, 4σ²/(rj(2) n(2))),  j = 1, 2.

Hypothesis testing in the two‐stage adaptive design

There is a variety of approaches to test multiple hypotheses in a two-stage adaptive design. We shall use a closed testing procedure to ensure strong control of the FWER at level α, as we did for the single-stage design in Section 2.2.2. In constructing level-α tests of the null hypotheses H1, H2 and H12, we employ the conditional error rate approach. Based on a reference design and its predefined tests, we calculate the conditional error rate for each hypothesis and define adaptive tests which preserve this conditional error rate, thereby controlling the overall type I error rate. Consider a reference design in which the trial prevalences of subgroups 1 and 2 and the weights in the weighted Bonferroni test of H12 remain the same across stages, so rj(2) = rj(1) and ωj(2) = ωj(1) for j = 1 and 2. In the reference design, tests are performed by pooling the stage-wise data within each subgroup and treatment arm and using the conventional test statistics, as for the single-stage test. For j = 1 and 2, the pooled estimate of θj across the two stages of the trial is

θ̃j = (nj(1) θ̂j(1) + nj(2) θ̂j(2)) / (nj(1) + nj(2)),

with corresponding Z-value Z̃j = θ̃j √(nj(1) + nj(2)) / (2σ), and the null hypothesis Hj is rejected at level α if Z̃j > z_{1−α}. Let tj = nj(1)/(nj(1) + nj(2)); then the conditional distribution of Z̃j given the interim data is N(√tj Zj(1), 1 − tj), and the conditional error rates for the tests of H1 and H2 are

Aj = 1 − Φ( (z_{1−α} − √tj Zj(1)) / √(1 − tj) ),  j = 1, 2.    (5)

Similarly, the conditional error rate for the test of H12 is

A12 = 1 − Φ( (z_{1−ω1(1)α} − √t1 Z1(1)) / √(1 − t1) ) Φ( (z_{1−ω2(1)α} − √t2 Z2(1)) / √(1 − t2) ).    (6)

See Section S1.1 of Appendix S1 for further details on the derivations of the conditional distributions. In the adaptive design, if no adaptations are made at the interim analysis, we apply the tests as defined for the reference design. Suppose now that adaptations are made and the trial prevalences in stage 2 are set to be r1(2) and r2(2), with weights ω1(2) and ω2(2) for the weighted Bonferroni test. In this case, we calculate the conditional error rates A1, A2 and A12 prior to adaptation from Equations (5) and (6). We then define tests of H1, H2 and H12 based on stage 2 data alone that have these conditional error rates as their type 1 error probabilities.
Given the updated r1(2) and r2(2), the stage-2 Z-values Zj(2) = θ̂j(2) √(rj(2) n(2)) / (2σ) follow standard normal distributions under the null hypotheses. Thus, in our level-α tests, we reject H1 if 1 − Φ(Z1(2)) < A1, we reject H2 if 1 − Φ(Z2(2)) < A2 and, applying a weighted Bonferroni test with weights ω1(2) and ω2(2), we reject H12 if 1 − Φ(Z1(2)) < ω1(2) A12 or 1 − Φ(Z2(2)) < ω2(2) A12. Finally, following the closed testing procedure, we reject H1 globally if the level-α tests reject H1 and H12, and we reject H2 globally if the level-α tests reject H2 and H12.
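For a single elementary hypothesis, the conditional error rate calculation can be sketched as follows (Python; `t` denotes the information fraction nj(1)/nj of the reference design, and the decomposition of the pooled statistic into √t Z(1) + √(1−t) Z(2) is used as above; function names are illustrative):

```python
from statistics import NormalDist

ND = NormalDist()

def conditional_error(z1, t, alpha=0.025):
    """Conditional type 1 error rate of the reference one-sided test:
    A = P(sqrt(t)*Z1 + sqrt(1-t)*Z2 > z_{1-alpha} | Z1), where
    Z2 ~ N(0, 1) under the null hypothesis."""
    c = ND.inv_cdf(1 - alpha)
    return 1 - ND.cdf((c - t**0.5 * z1) / (1 - t)**0.5)

def adaptive_reject(z2_new, z1, t, alpha=0.025):
    """Adapted stage-2 test: reject if the stage-2 p-value, computed
    from the post-adaptation statistic, falls below the conditional
    error rate carried over from the reference design."""
    return 1 - ND.cdf(z2_new) < conditional_error(z1, t, alpha)
```

A promising first stage (large z1) yields a larger conditional error rate, so the second-stage evidence required for rejection is weaker; a poor first stage tightens it.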

Two‐stage optimization

We denote the set of initial design parameters by a(1) = (s(1), r1(1), ω1(1)) and the second-stage parameters by a(2) = (r1(2), ω1(2)). Let θ̂(1) = (θ̂1(1), θ̂2(1)) and θ̂(2) = (θ̂1(2), θ̂2(2)) be the vectors of estimated treatment effects in each subgroup, based on the first and second-stage data, respectively, as defined in Equation (4). Denote the conditional distributions of the estimated effects in each stage of the trial by f1(θ̂(1) | θ; a(1)) and f2(θ̂(2) | θ; a(2)), and the posterior distribution of θ given the stage 1 observations by π(θ | θ̂(1)). Then, the Bayes expected utility can be written as

U(a(1), a(2)) = ∫ ∫ ∫ u(θ, D) f2(θ̂(2) | θ; a(2)) f1(θ̂(1) | θ; a(1)) dθ̂(2) dθ̂(1) dπ(θ).    (7)

We find the optimal combination of design parameters a(1)* before stage 1 and a(2)* before stage 2 using the backward induction principle. First we construct the Bayes optimal a(2)* for all possible a(1) and θ̂(1). Then we construct the Bayes optimal a(1)* given that the optimal a(2)* will be used in the second stage of the trial.

Optimizing the decision at the interim analysis

Denoting the marginal distribution of θ̂(1) by m(θ̂(1); a(1)), we have

m(θ̂(1); a(1)) = ∫ f1(θ̂(1) | θ; a(1)) dπ(θ),

and the right-hand side of Equation (7) can be written as

∫ { ∫ ∫ u(θ, D) f2(θ̂(2) | θ; a(2)) dθ̂(2) dπ(θ | θ̂(1)) } m(θ̂(1); a(1)) dθ̂(1).

Thus, given a(1) and θ̂(1), the Bayes optimal decision for the second stage is the choice of a(2) that maximises the inner integral, the conditional expected utility given the interim data. For known values of a(1) and θ̂(1), we can find the conditional error rates A1, A2 and A12 used in hypothesis testing in stage 2; hence we may evaluate the conditional expected utility for given a(1), θ̂(1), and a(2). Our choices for the prior distribution and utility function mean that it is quite straightforward to compute this quantity for given a(1), θ̂(1) and a(2). Thus, we are able to perform a numerical search seeking to find the Bayes optimal a(2)*.
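One way to carry out this interim search, simplified to the umbrella-trial case (elementary tests only, with the second-stage prevalence as the single parameter), is sketched below in Python. The posterior update uses standard normal-normal conjugacy; all numerical values and helper names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from statistics import NormalDist

ND = NormalDist()

def posterior(theta_hat1, n_stage1, mu, psi, rho, sigma=1.0):
    """Normal-normal posterior of (theta1, theta2) given stage-1
    estimates theta_hat1 with variances 4 sigma^2 / n_j^(1)."""
    s0 = psi**2 * np.array([[1.0, rho], [rho, 1.0]])
    d = np.diag(4 * sigma**2 / np.asarray(n_stage1, float))
    s0i, di = np.linalg.inv(s0), np.linalg.inv(d)
    sn = np.linalg.inv(s0i + di)
    mn = sn @ (s0i @ np.asarray(mu, float) + di @ np.asarray(theta_hat1, float))
    return mn, sn

def cond_expected_utility(r1_2, n2, cond_err, mn, sn, lam=0.3, sigma=1.0,
                          sims=20_000, seed=3):
    """Conditional expected utility of second-stage prevalence r1_2,
    given per-hypothesis conditional error rates cond_err = (A1, A2)."""
    rng = np.random.default_rng(seed)
    theta = rng.multivariate_normal(mn, sn, size=sims)  # posterior draws
    n_j = np.array([r1_2, 1.0 - r1_2]) * n2
    z2 = theta * np.sqrt(n_j) / (2 * sigma) + rng.standard_normal((sims, 2))
    crit = np.array([ND.inv_cdf(1 - a) for a in cond_err])
    rej = z2 > crit
    return lam * rej[:, 0].mean() + (1 - lam) * rej[:, 1].mean()

# Interim search over r1(2) after promising subgroup-1 data:
mn, sn = posterior([0.25, 0.0], [105, 245], [0.1, 0.0], 0.2, 0.5)
best_r = max(np.linspace(0.05, 0.95, 19),
             key=lambda r: cond_expected_utility(r, 350, (0.01, 0.01), mn, sn))
```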

Overall trial optimization

Having found the Bayes optimal parameters a(2)*(θ̂(1)) for the second stage of the trial as a function of θ̂(1), we determine a(1)*, the Bayes optimal choice for the initial parameters, as the value of a(1) maximising

∫ { ∫ ∫ u(θ, D) f2(θ̂(2) | θ; a(2)*(θ̂(1))) dθ̂(2) dπ(θ | θ̂(1)) } m(θ̂(1); a(1)) dθ̂(1).

We conduct a search over possible values of a(1) to maximize the above integral and find the optimal choice of a(1). Computing the integral for a given value of a(1) by numerical integration is not straightforward. Instead, we have used Monte Carlo simulation to carry out this calculation for each value of a(1).

Bayes optimal umbrella trials

We now consider the case of umbrella trials, where it has been argued that no multiplicity adjustment is required, as the hypotheses to be tested concern different experimental treatments targeted to different molecular markers or subgroups. Since each treatment is assessed separately, an umbrella trial can be viewed as a set of independent trials, even though they are run under a single protocol. We consider umbrella trials with two subgroups, as in the previous sections. However, without multiplicity adjustment, the hypothesis testing procedure reduces to testing the elementary hypotheses H1 and H2, each at level α. In applying the conditional error rate approach, only the computation of the conditional error rates A1 and A2 from Equation (5) is required. Then, with Z1(2) and Z2(2) denoting the test statistics based on second-stage data only, H1 is rejected if 1 − Φ(Z1(2)) < A1 and H2 is rejected if 1 − Φ(Z2(2)) < A2. No test of the intersection hypothesis is performed. Design parameters are optimized with respect to the utility function in Equation (1). To frame the optimization problem in the same way as in the previous sections, the interim decision in a two-stage umbrella trial will optimize only the second-stage subgroup trial prevalences, so a(2) = (r1(2)), while in the first stage we optimize the subgroup trial prevalences and the timing of the interim analysis, so a(1) = (s(1), r1(1)). In the case of a single-stage umbrella trial, only the subgroup prevalences are optimized, so a = (r1). We have used a normal prior distribution, as defined in Equation (2), in optimizing the design parameters of single-stage and two-stage trials. In the case of two-stage designs, the interim analysis uses the test statistics from the first stage and the prior distribution to perform adaptations, and the final tests are performed using the conditional error rate approach.
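As a sanity check on the conditional error rate machinery in this setting, the following Monte Carlo sketch (Python, with illustrative settings) confirms that carrying the conditional error rate into the second stage preserves the per-comparison type 1 error, even though the second-stage statistic is recomputed after a data-driven adaptation:

```python
from statistics import NormalDist
import random

ND = NormalDist()
rng = random.Random(42)
alpha, t, sims = 0.025, 0.4, 200_000
c = ND.inv_cdf(1 - alpha)

rejections = 0
for _ in range(sims):
    z1 = rng.gauss(0.0, 1.0)     # stage-1 statistic under H_j
    # Conditional error rate of the reference test given stage-1 data:
    a = 1 - ND.cdf((c - t**0.5 * z1) / (1 - t)**0.5)
    # After any adaptation of the stage-2 prevalence, the new stage-2
    # statistic is still N(0, 1) under H_j:
    z2 = rng.gauss(0.0, 1.0)
    rejections += (1 - ND.cdf(z2)) < a

type1 = rejections / sims        # close to alpha = 0.025
```

The estimated rejection probability under the null stays at α by iterated expectation: E[A] = α, whatever adaptation rule is applied.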

NUMERICAL EXAMPLES AND COMPARISONS

In this section, we give numerical examples of optimized single‐stage and two‐stage designs in a range of scenarios. We show results for cases with and without multiplicity correction, referring to these as enrichment and umbrella trials, respectively. Additionally, we illustrate the optimization of the decision rule at the interim analysis. In Table 1, we provide an overview of the scenarios considered and the parameters that are optimized.
TABLE 1

The scenarios considered in the numerical examples. The term "opt" indicates that parameters were optimized, while "N/A" means the parameters are not applicable. The treatment effects θ1 and θ2 are either specified by the prior distribution in Equation (2) or set to the specific values given.

Scenario          | Design           | λ   | μ1       | μ2     | ψ            | ρ           | s(1)       | r1(1) | ω1(1) | θ1, θ2
Figure 2          | Single-stage     | 0.3 | 0 to 0.3 | 0, 0.2 | 0.02 to 0.44 | 0.5         | N/A        | opt   | opt   | prior
Figure S2         | Single-stage     | 0.3 | 0 to 0.3 | 0, 0.2 | 0.2          | −1 to 1     | N/A        | opt   | opt   | prior
Figure 3          | Interim decision | 0.3 | 0.1      | 0      | 0.2          | 0.8, −0.8   | 0.25, 0.5  | 0.3   | 0.3   | prior
Figure 4          | Two-stage        | 0.3 | 0, 0.3   | 0, 0.2 | 0.2          | 0.5         | 0.1 to 0.9 | opt   | opt   | prior
Figure 5          | Two-stage        | 0.3 | 0 to 0.3 | 0, 0.2 | 0.02 to 0.4  | 0.5         | opt        | opt   | opt   | prior
Figure S10        | Two-stage        | 0.3 | 0 to 0.3 | 0, 0.2 | 0.2          | −0.8 to 0.8 | opt        | opt   | opt   | prior
Figures 6 and S11 | Power            | 0.3 | 0.1, 0.2 | 0      | 0.2          | 0.5         | opt        | opt   | opt   | θ1: 0 to 0.3; θ2: 0, 0.2

Optimal single‐stage designs

In studying the impact of the prior distribution on optimized trial design parameters for single-stage designs, we consider studies where the response variance is σ² = 1 and the total sample size is fixed at n = 700. We assume a multivariate normal prior distribution for (θ1, θ2) as defined in Equation (2) with parameters μ1, μ2, ψ and ρ, and we compute optimal designs for a variety of such priors. The FWER in enrichment designs and the per-comparison error rate in umbrella designs is fixed at α.

FIGURE 5

Optimized design parameters for two-stage designs and the expected utility, averaged over the prior. Parameters are (s(1), r1(1), ω1(1)) for enrichment trials and (s(1), r1(1)) for umbrella trials. Results are classified by μ1 and μ2, the prior means for θ1 and θ2, and by the prior SD ψ. The prior correlation between θ1 and θ2 is fixed at ρ = 0.5 and the population prevalence of subgroup 1 is assumed to be λ = 0.3 [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 6

Operating characteristics of enrichment trials. The prior distribution for subgroup treatment effects is normal with means μ1 = 0.1 or 0.2 and μ2 = 0, SD ψ = 0.2 and correlation ρ = 0.5. The total sample size is 700 and the population prevalence of subgroup 1 is λ = 0.3. Results are given for θ1 ranging from 0 to 0.3 and θ2 = 0 or 0.2. The black dashed lines in the two top rows are placed at 0.05 as reference to the significance level, while the dashed lines in the third row indicate the expected utility of the trial given the initial design parameters [Colour figure can be viewed at wileyonlinelibrary.com]

In Figure 2 we display the effect of the prior SD ψ on the optimal design parameters when the population prevalence of subgroup 1 is λ = 0.3. We considered prior SDs of 0.02, 0.0632, 0.1, 0.1414, 0.2, 0.3162, and 0.44, corresponding to information from studies with 10 000, 1000, 400, 200, 100, 40 and 20 subjects in each subgroup.
FIGURE 2

Optimized design parameters for single-stage designs and the expected utility, averaged over the prior. Parameters are (r1, ω1) for enrichment trials and r1 for umbrella trials. Results are classified by μ1 and μ2, the prior means of θ1 and θ2, and the prior SD ψ. The prior correlation between θ1 and θ2 is fixed at ρ = 0.5 and the population prevalence of subgroup 1 is assumed to be λ = 0.3 [Colour figure can be viewed at wileyonlinelibrary.com]

The mean and variance of the prior distribution have a large impact on the optimal design parameters r1 and ω1. The optimal values of r1 and ω1 and the expected utility of the resulting designs are very similar for enrichment and umbrella designs. If μ1 > 0 and μ2 = 0, optimal values of r1 and ω1 are larger than 0.3, the population prevalence of subgroup 1, so the design over-samples this subgroup. If μ1 = 0 and μ2 > 0, the optimal design under-samples subgroup 1. When both μ1 and μ2 are greater than zero, the optimal design has r1 < λ and ω1 < ω2, reflecting the fact that it is advantageous to sample more subjects from subgroup 2 and allocate more type 1 error probability to the test of H2, since λ < 1 − λ implies that rejection of H2 has a greater weight than rejection of H1 in the utility function. In extreme cases where μ1 = 0, μ2 > 0 and the prior variance is small, the optimal design has r1 = 0, so only subgroup 2 is sampled. When μ1 > 0, μ2 = 0 and the prior variance is small, the optimal design has r1 = 1 and only subgroup 1 is sampled. In Figure S2, we show the effect of the prior correlation ρ on the design parameters when the prior SD is ψ = 0.2. We observe that the correlation has an impact on the optimal weight for testing the intersection hypothesis; in particular, when the treatment effects θ1 and θ2 have a high positive correlation, it is better to place most weight on one hypothesis rather than split the weight between the two hypotheses. In Figures S3 and S4 we present further results for different values of λ, varying ρ in Figure S3 and ψ in Figure S4. Since the utility to be maximized depends on the population prevalences, the optimal design parameters vary considerably with λ. We see from Figure S3 that ρ has only a small impact on the optimal value of r1 when adjusting for multiplicity and no impact at all in umbrella designs where no multiplicity adjustment is made.
Figure S4 shows that the dependence of optimal design parameters on λ is similar to that seen in Figure 2: when the prior variance is large the optimal choices for r1 and ω1 are close to λ, while for smaller variances the optimal designs depend on the prior means μ1 and μ2 as well as λ.

Optimal two‐stage designs

Figure 3 illustrates optimal adaptation rules for two-stage designs. In these examples n = 700 and σ² = 1, the population prevalence of subgroup 1 is λ = 0.3, and the prior distribution for (θ1, θ2) has parameters μ1 = 0.1, μ2 = 0, ψ = 0.2, and ρ = 0.8 or −0.8. The first-stage design parameters have not been optimized and are set as r1(1) = ω1(1) = 0.3, with s(1) equal to 0.25 or 0.5. The FWER in enrichment designs and the per-comparison error rate in umbrella designs is fixed at α.
FIGURE 3

Examples of optimal adaptation rules when n = 700, the prior distribution for (θ1, θ2) has parameters μ1 = 0.1, μ2 = 0, ψ = 0.2, and ρ = 0.8 or −0.8, and first stage design parameters are set as r1(1) = ω1(1) = 0.3 and s(1) = 0.25 or 0.5. Optimized values of r1(2) and ω1(2) are shown for each combination of first stage Z-values Z1(1) and Z2(1). Also shown are the conditional expected utility when the trial proceeds using the optimized values of r1(2) and ω1(2) and the increase in conditional expected utility compared to continuing with no adaptation. In each plot, the red circle indicates the 95% highest density region for the distribution of (Z1(1), Z2(1)) when the true treatment effects are θ1 = 0.1 and θ2 = 0 and the green ellipse indicates the 95% highest density region for the prior predictive distribution of (Z1(1), Z2(1)). The white regions contain values of (Z1(1), Z2(1)) for which the maximum conditional expected utility is below 0.01. In these cases the numerical optimization becomes unstable and optimal values for r1(2) and ω1(2) are not displayed [Colour figure can be viewed at wileyonlinelibrary.com]

The adaptation rules specify the second-stage design parameters that optimize the expected utility, as defined in Equation (1), given the first stage statistics Z1(1) and Z2(1). The optimal r1(2) and ω1(2) are calculated using the Hooke-Jeeves derivative-free minimization algorithm through the hjkb function in the dfoptim package in R. We also calculated the conditional expected utility if the trial continued with no adaptation, so r1(2) = r1(1) and ω1(2) = ω1(1), and the plots in the bottom row of Figure 3 show the gain in the conditional expected utility due to the optimized adaptation. In Section S3 of Appendix S1, we present optimal interim rules for further parameter settings. In Figure 4, we illustrate the procedure for optimizing first-stage design parameters, a(1) = (s(1), r1(1), ω1(1)) for an enrichment design or a(1) = (s(1), r1(1)) for an umbrella design. For each combination of prior parameters and first-stage design parameters a(1), we generated 1000 samples of first-stage data under treatment effects drawn from the prior distribution. For each first-stage dataset, we found the optimal second-stage design parameters and noted the conditional expected utility using these optimal parameters. We took the average of the 1000 values of the optimized conditional expected utility as our simulation-based estimate of the expected utility for this choice of a(1). The optimal first-stage design parameters for a given prior distribution are those values of s(1), r1(1) and, in the case of an enrichment design, ω1(1), that yield the highest expected utility.
FIGURE 4

Optimization of first‐stage design parameters. The population prevalence of subgroup 1 is and the prior distribution for has parameters or 0.3, or 0.2, and . Each column shows results for a different value of . The plots show the expected utility as a function of , with coloured solid lines for different values of in an enrichment trial and black dashed lines for an umbrella trial with no multiplicity adjustment. In each panel, the coloured dot indicates the combination of and that yields the maximum expected utility for an enrichment design and the black dot shows the optimum value of for an umbrella design [Colour figure can be viewed at wileyonlinelibrary.com]

Our results show the impact of the prior distribution on the optimized trial design parameters. The flat lines when indicate that the expected utility is hardly affected by the choice of and when the interim analysis is performed early in the trial. When the interim analysis is performed later, the choice of first‐stage design parameters is more important. It should be noted that for each pair of prior means , expected utility close to the overall optimum can be achieved using a wide range of first‐stage design parameters as long as the second‐stage design is optimized, given the first‐stage data. In Figures 5 and S10 we present optimized values of the first‐stage design parameters, , , and , given that optimal values of the second‐stage design parameters will be used following the interim analysis. The results are similar to those observed for optimal single‐stage designs. The prior variance has a large impact on the first‐stage optimal design: for smaller variances, interim analyses closer to the beginning of the trial yield a larger expected utility, while with larger variances, interim analyses after around 40% to 60% of the patients have been recruited are preferable. When the prior means are both 0 the optimal design parameters and are close to the subgroup 1 prevalence . However, if the prior suggests a benefit is more likely in subgroup 1, the optimal design over‐samples this subgroup, increasing its trial prevalence and testing weight. Figure S10 shows that, for enrichment designs, the prior correlation has a large impact on the choice of but little effect on the optimal trial prevalences.
FIGURE 5

Optimized design parameters for two‐stage designs and the expected utility, averaged over the prior. Parameters are for enrichment trials and for umbrella trials. Results are classified by and , the prior means for and , and by the prior SD . The prior correlation between and is fixed at and the population prevalence of subgroup 1 is assumed to be [Colour figure can be viewed at wileyonlinelibrary.com]

As for single‐stage designs, the optimal values of are similar for enrichment and umbrella designs. A notable difference is that while the prior correlation has no effect at all on the optimal values of in a single‐stage umbrella design, the optimal value of in a two‐stage umbrella design does show a small dependence on . In the case of a single‐stage umbrella design, the marginal distributions of and do not depend on and thus, with no multiplicity adjustment in testing and , the expected value of the utility defined in Equation (1) does not depend on . However, in a two‐stage umbrella trial, the optimal choice of and the resulting conditional expected utility depend on both and , and it is the joint distribution of , which depends on , that determines the optimal value of . It should be noted that the procedures we have described impose a high computational burden. While it is relatively straightforward to optimize the decision at the interim analysis, the overall optimization of the trial is performed using simulations over a grid of values for the first‐stage design parameters. More rapid computation of the optimal values may be achieved by using approximations to the utility when extreme first‐stage values are observed; for example, if both and are large and negative, the expected utility is practically zero for all choices of and . In practice, one may wish to add the option of stopping the trial for futility if extreme negative results are observed at the interim analysis. The methods we have presented can be extended to find efficient designs that incorporate this option by working with a utility of the form assigning a positive value k to each observation saved by early stopping.
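The futility extension just described can be sketched as a simple interim rule: stop when the best achievable conditional expected utility is worth less than the value credited for the observations saved by stopping. The function name and the per-observation value k below are hypothetical, not taken from the paper.

```python
def interim_futility_decision(max_cond_utility, n_remaining, k=0.001):
    """Sketch of the futility option: stopping saves n_remaining
    observations, each credited a (hypothetical) value k in the
    extended utility, so continuing is worthwhile only if the best
    achievable conditional expected utility exceeds k * n_remaining."""
    value_of_stopping = k * n_remaining
    return "stop" if max_cond_utility < value_of_stopping else "continue"
```

For example, with 350 second-stage patients remaining and k = 0.001, a maximum conditional expected utility below 0.35 would trigger a futility stop under this toy rule.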

Performance of the Bayes optimal design under specific alternative hypotheses

In this section we consider adaptive designs optimized for a particular prior distribution for but we evaluate their performance under specific values of . We consider trials with a total sample size , response variance , and population prevalence of subgroup 1 equal to . As a benchmark for comparison, we consider a nonoptimized, single‐stage design with and . We derive and assess the performance of single‐stage designs for which design parameters and are optimized as described in Section 2.2, and we derive and assess two‐stage designs for which first‐stage design parameters and the adaptation rule are optimized as described in Section 2.3. In optimizing designs, we assume the normal prior distribution for presented in Equation (2) with or 0.2, , and . These priors reflect the belief that a treatment benefit is more likely in subgroup 1. The prior SD of 0.2 corresponds to information from a trial with 100 subjects in each subgroup. We evaluate the operating characteristics of the designs for values of ranging from 0 to 0.3 and or 0.2. This creates scenarios with a treatment effect in only one subgroup when or with a treatment effect in both subgroups when and . Figure 6 presents simulation results for enrichment trials and Figure S11 presents results for umbrella trials. The plots show the probabilities of rejecting and and the average utility at the end of the trial for a variety of combinations of , , , and . For the scenarios considered, we see that optimizing the trial for the assumed priors leads to a substantial increase in the power to reject as compared to the nonoptimized, single‐stage design. However, the optimized designs have lower power to reject when . The optimized designs have a higher average utility than the nonoptimized design when .
If , the two‐stage design optimized for the prior with has similar average utility to the nonoptimized design, but the average utility of the optimized one‐stage design is a little lower; both one‐stage and two‐stage designs optimized for the prior with have lower average utility than the nonoptimized design. These results are in line with previous studies, which showed adaptive enrichment designs provide the greatest advantage when a treatment effect is present in only one subgroup.
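The kind of operating-characteristics evaluation reported here can be sketched with a Monte Carlo routine for a simplified single-stage stratified design: one-sided subgroup Z-tests with a weighted Bonferroni split of the significance level, omitting the paper's closed-testing step and the adaptive stage. Parameter names and the treatment-effect parameterization are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def rejection_probs(theta1, theta2, n=700, sigma=1.0, p1=0.3, w=0.5,
                    alpha=0.05, n_sim=20000, seed=7):
    """Monte Carlo rejection probabilities for a simplified single-stage
    stratified design. Assumed layout: n * p1 patients in subgroup 1 and
    the rest in subgroup 2; theta_k is the standardized effect in
    subgroup k; alpha is split w / (1 - w) between the two tests."""
    rng = np.random.default_rng(seed)
    n1 = int(n * p1)
    n2 = n - n1
    # Z-statistics: shifted standard normals under the assumed effects
    z1 = theta1 * np.sqrt(n1) / sigma + rng.standard_normal(n_sim)
    z2 = theta2 * np.sqrt(n2) / sigma + rng.standard_normal(n_sim)
    c1 = norm.ppf(1 - alpha * w)        # critical value for subgroup 1
    c2 = norm.ppf(1 - alpha * (1 - w))  # critical value for subgroup 2
    return float((z1 > c1).mean()), float((z2 > c2).mean())
```

Sweeping theta1 and theta2 over a grid and plotting the two rejection probabilities (plus an average utility) against the effect sizes reproduces the format of Figure 6 for this simplified design.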
FIGURE 6

Operating characteristics of enrichment trials. The prior distribution for subgroup treatment effects is normal with means or 0.2 and , SDs and correlation . The total sample size is 700 and the population prevalence of subgroup 1 is . Results are given for ranging from 0 to 0.3 and or 0.2. The black dashed lines in the two top rows are placed at 0.05 as a reference for the significance level, while the dashed lines in the third row indicate the expected utility of the trial given the initial design parameters [Colour figure can be viewed at wileyonlinelibrary.com]

WORKED EXAMPLE: IMPLEMENTING AN OPTIMIZED ADAPTIVE ENRICHMENT TRIAL

Suppose we wish to compare an experimental treatment to a control in a phase III clinical trial. We intend to use adaptive sample allocation as there is reason to believe the new treatment may only benefit a subgroup of patients. This trial will have a normally distributed endpoint with variance and, using information from a pilot study with 40 subjects from each subgroup, we construct a prior distribution for the treatment effects. The total sample size for the trial is planned to be subjects. The population prevalence of subgroup 1 is and a FWER is to be used for the study. Under the above assumptions, the results in Figure 5 for show the optimal first‐stage parameters to be , and . Thus, we recruit 350 patients in the first stage of the trial with 40% of these from subgroup 1. Now suppose we observe interim estimates and . These give Z‐values and , and the conditional error rates, as defined in Equations (5) and (6), are , and . At this point, we optimize the second‐stage design parameters and . Figure 7 plots the conditional expected utility as a function of and on a color‐coded scale. The maximum conditional expected utility, obtained using the Hooke‐Jeeves algorithm, is at and . We therefore conduct the second stage of the trial using these parameter values.
FIGURE 7

Interim optimization. The color indicates the expected utility given interim data for each combination of second‐stage prevalence for subgroup 1 and testing weight given the interim data [Colour figure can be viewed at wileyonlinelibrary.com]

Suppose, after recruiting the remaining subjects, the second‐stage estimates are and . The corresponding Z‐values are and , with P‐values and . Since and we can globally reject . However, since we cannot reject .
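The final testing step of the worked example can be sketched as follows, assuming, as described earlier, that second-stage p-values are compared with the conditional error rates carried over from stage 1, with a weighted Bonferroni test for the intersection hypothesis. The numerical values in the test below are hypothetical stand-ins, since the example's own estimates are not reproduced here.

```python
from scipy.stats import norm

def stage2_decisions(z1_stage2, z2_stage2, A1, A2, A12, w2):
    """End-of-trial closed-testing decision (sketch). A1, A2 and A12 are
    the conditional error rates of H1, H2 and the intersection H1∩H2
    carried over from stage 1 (Equations (5) and (6) of the paper); w2
    is the optimized second-stage testing weight. One-sided p-values
    from the second-stage Z-values are compared with these conditional
    levels, using a weighted Bonferroni test for the intersection."""
    p1 = 1 - norm.cdf(z1_stage2)
    p2 = 1 - norm.cdf(z2_stage2)
    # weighted Bonferroni test of H1 ∩ H2 at its conditional level A12
    reject_int = (p1 <= w2 * A12) or (p2 <= (1 - w2) * A12)
    # closed testing: a hypothesis is rejected globally only if both its
    # own test and the intersection test reject
    reject_H1 = reject_int and (p1 <= A1)
    reject_H2 = reject_int and (p2 <= A2)
    return reject_H1, reject_H2
```

With a strong second-stage result in subgroup 1 and a null result in subgroup 2, this rule rejects H1 globally but not H2, mirroring the outcome of the worked example.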

EXTENDING THE DESIGNS

The methods we have described can be extended to trial designs with more than two stages or more than two subgroups. Suppose K disjoint subgroups are specified and we wish to test the null hypotheses : against the alternatives : , where denotes the treatment effect in subgroup k. In a trial with J stages and a total sample size n, we recruit patients in each stage, where , and at stage j we recruit patients from subgroups , where . The data provide estimates , at each stage j, from which we obtain Z‐values . In an enrichment design where control of the FWER is required, a suitable closed testing procedure is defined in terms of the . Then, is rejected globally at level if all intersection hypotheses involving are rejected in local, level tests. An adaptive design can be created by repeated application of the conditional error approach. An initial reference design is stated and, when adaptation occurs, the modified testing procedure is defined so as to preserve the conditional error rate of each individual and intersection hypothesis test under the updated design for the remainder of the trial. This updated design becomes the new reference design under which conditional error rates will be calculated at any subsequent adaptation point. We can consider optimizing the choice of the design parameters and or weights in the tests of intersection hypotheses. The generalization of our earlier approach requires a prior distribution for the treatment effects and a utility function whose expectation is to be maximized. If is the population prevalence of subgroup k, , a natural extension of Equation (1) is

In Section 2.3.3 we applied backwards induction to find the optimal design for a trial with two subgroups and two stages. Since the dimension of the state space grows with the number of subgroups and stages, such a direct application of backwards induction may not be feasible more generally.
Other methods of optimization can be employed to find efficient, if not globally optimal, designs. For example, in a multistage design one may construct the adaptation rule at each interim analysis assuming the trial will continue without any further adaptation. We note that the optimization process is liable to be computationally intensive and it is important to commit resources to assess trial designs in a timely manner.
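A single look of the K-subgroup closed testing procedure described above can be sketched generically: each intersection hypothesis is tested by a weighted Bonferroni rule (weights renormalized within the intersection), and H_k is rejected globally only if every intersection containing it is locally rejected. This is a generic sketch of closed testing, not the paper's exact local tests, which additionally carry conditional error rates across stages.

```python
from itertools import combinations
from scipy.stats import norm

def closed_test(z, alpha=0.05, weights=None):
    """Closed testing over K elementary hypotheses H_k: theta_k <= 0,
    given one-sided Z-values z. Each intersection hypothesis is tested
    with a weighted Bonferroni rule, renormalizing the weights within
    the intersection; H_k is rejected globally at FWER level alpha iff
    every intersection containing k is locally rejected."""
    K = len(z)
    p = [1 - norm.cdf(zk) for zk in z]
    w = weights or [1.0 / K] * K

    def local_reject(S):
        # weighted Bonferroni test of the intersection hypothesis over S
        tot = sum(w[k] for k in S)
        return any(p[k] <= alpha * w[k] / tot for k in S)

    reject = []
    for k in range(K):
        sets_with_k = [S for r in range(1, K + 1)
                       for S in combinations(range(K), r) if k in S]
        reject.append(all(local_reject(S) for S in sets_with_k))
    return reject
```

Since the number of intersection hypotheses grows as 2^K - 1, this enumeration also illustrates why the optimization burden escalates quickly with the number of subgroups.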

DISCUSSION

We have presented a Bayesian decision theoretic framework in which a clinical trial design can be optimized when two disjoint subgroups are under investigation. Our approach has both Bayesian and frequentist elements: the rules for hypothesis testing control the type I error rate and Bayesian decision tools are used to choose the design parameters within this scheme. This allows optimization of the sampling prevalence of each subgroup and of the weights in a weighted Bonferroni test of the intersection hypothesis, as well as optimal adaptation of these design parameters at the interim analysis. The optimal design maximizes the expected value of the specified utility function, averaged over the prior distribution assumed for the treatment effects in the two subgroups. After focusing on two‐stage trials with two subgroups in Sections 2 and 4, we outlined how our optimization framework may be extended to allow more subgroups or stages in the trial in Section 5. Our results provide insights into how the mean and variance of the prior distribution affect the optimal timing of the interim analysis and the trial prevalences for each subgroup of patients. In practice, it is advisable to consider the sensitivity of the design's efficiency to modeling assumptions in order to create a trial design with robust efficiency. In contrast to adaptive enrichment designs where recruitment is either from the full patient population or restricted to a single subgroup, we propose sampling from each subgroup at a specific rate which may differ from its population prevalence. We acknowledge that achieving the optimized prevalences in a trial may be challenging: additional screening will be required and over‐sampling a particular subgroup may delay a trial compared to an all‐comers design. If logistical considerations imply that each subgroup is either dropped or sampled according to its population prevalence, our framework can still be used to optimize the other design parameters.
In Section 3.2 we discussed designs with the option of early stopping for futility and how the utility function might be modified to facilitate optimizing such designs. A similar approach could be followed to relax the requirement of a fixed total sample size and allow re‐assessment of the future sample size at an interim analysis. We have defined methods for normally distributed observations and a normal prior for treatment effects. While this has allowed us to demonstrate how to construct such designs, it is not a necessary restriction. With normally distributed responses, one could allow a separate response variance for each patient subgroup, placing prior distributions on these variances. In trials with other types of response distribution, including survival or categorical endpoints, standardized test statistics will still be approximately normally distributed if sample sizes are large enough, although nonnormal prior distributions may be appropriate. We assumed the null hypotheses of interest are that there is no treatment effect in each subgroup. Our decision theoretic framework can accommodate other formulations, such as testing for treatment effects in the full population and in one particular subgroup, in which case the stage‐wise test statistics for different subgroups are correlated. Care is required to ensure that enrichment designs control the FWER when test statistics are correlated, but this is not an issue in umbrella trials with separate level tests for each null hypothesis. Although we have focused on hypothesis testing, estimating treatment effects after an adaptive trial is also important. Simultaneous or marginal confidence regions for parameters, with or without multiplicity adjustment, can be constructed following a two‐stage design.
Point estimates may be obtained as a weighted average of the treatment effects observed in the first and second stages but, due to the sample size adaptations and subgroup selection, these estimators may be biased, with the bias depending on the specific adaptation rules and the true parameter values. A thorough investigation of estimation for adaptive enrichment designs will be a topic of future research. Software in the form of an R package is available at https://github.com/nicoballarini/OptimalTrial.

AUTHOR CONTRIBUTIONS

Dr Ballarini and Dr Burnett are the co‐primary authors and they contributed equally to this work.

SUPPORTING INFORMATION

Appendix S1. Technical appendices and additional simulation results.
REFERENCES (43 in total)

1.  Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches.

Authors:  H H Müller; H Schäfer
Journal:  Biometrics       Date:  2001-09       Impact factor: 2.571

2.  A general statistical principle for changing a design any time during the course of a trial.

Authors:  Hans-Helge Müller; Helmut Schäfer
Journal:  Stat Med       Date:  2004-08-30       Impact factor: 2.373

Review 3.  General guidance on exploratory and confirmatory subgroup analysis in late-stage clinical trials.

Authors:  Alex Dmitrienko; Christoph Muysers; Arno Fritsch; Ilya Lipkovich
Journal:  J Biopharm Stat       Date:  2016       Impact factor: 1.051

4.  Adaptive designs for confirmatory clinical trials.

Authors:  Frank Bretz; Franz Koenig; Werner Brannath; Ekkehard Glimm; Martin Posch
Journal:  Stat Med       Date:  2009-04-15       Impact factor: 2.373

5.  Nonparametric adaptive enrichment designs using categorical surrogate data.

Authors:  Matthias Brückner; Hans U Burger; Werner Brannath
Journal:  Stat Med       Date:  2018-09-06       Impact factor: 2.373

Review 6.  Key multiplicity issues in clinical drug development.

Authors:  Alex Dmitrienko; Ralph B D'Agostino; Mohammad F Huque
Journal:  Stat Med       Date:  2012-10-09       Impact factor: 2.373

Review 7.  ALCHEMIST Trials: A Golden Opportunity to Transform Outcomes in Early-Stage Non-Small Cell Lung Cancer.

Authors:  Ramaswamy Govindan; Sumithra J Mandrekar; David E Gerber; Geoffrey R Oxnard; Suzanne E Dahlberg; Jamie Chaft; Shakun Malik; Margaret Mooney; Jeffrey S Abrams; Pasi A Jänne; David R Gandara; Suresh S Ramalingam; Everett E Vokes
Journal:  Clin Cancer Res       Date:  2015-12-15       Impact factor: 12.531

Review 8.  Methods for identification and confirmation of targeted subgroups in clinical trials: A systematic review.

Authors:  Thomas Ondra; Alex Dmitrienko; Tim Friede; Alexandra Graf; Frank Miller; Nigel Stallard; Martin Posch
Journal:  J Biopharm Stat       Date:  2016       Impact factor: 1.051

9.  Estimation after subpopulation selection in adaptive seamless trials.

Authors:  Peter K Kimani; Susan Todd; Nigel Stallard
Journal:  Stat Med       Date:  2015-04-22       Impact factor: 2.373

Review 10.  Biomarker-Guided Adaptive Trial Designs in Phase II and Phase III: A Methodological Review.

Authors:  Miranta Antoniou; Andrea L Jorgensen; Ruwanthi Kolamunnage-Dona
Journal:  PLoS One       Date:  2016-02-24       Impact factor: 3.240

  3 in total

Review 1.  Innovations in Clinical Development in Rare Diseases of Children and Adults: Small Populations and/or Small Patients.

Authors:  Robert A Beckman; Zoran Antonijevic; Mercedeh Ghadessi; Heng Xu; Cong Chen; Yi Liu; Rui Tang
Journal:  Paediatr Drugs       Date:  2022-10-15       Impact factor: 3.930

2.  Enrichment Bayesian design for randomized clinical trials using categorical biomarkers and a binary outcome.

Authors:  Valentin Vinnat; Sylvie Chevret
Journal:  BMC Med Res Methodol       Date:  2022-02-27       Impact factor: 4.615

3.  Optimizing subgroup selection in two-stage adaptive enrichment and umbrella designs.

Authors:  Nicolás M Ballarini; Thomas Burnett; Thomas Jaki; Christoper Jennison; Franz König; Martin Posch
Journal:  Stat Med       Date:  2021-03-29       Impact factor: 2.373

