
Design of experiments for a confirmatory trial of precision medicine.

Kim May Lee, James Wason.

Abstract

Precision medicine, also known as stratified or personalized medicine, is becoming more prominent in the medical field due to advances in the computational ability to learn about patients' genomic backgrounds. A biomarker, i.e. an indicator of a biological process, is often used in precision medicine to classify the patient population into several subgroups. The aim of precision medicine is to tailor treatment regimes for different patient subgroups who suffer from the same disease. A multi-arm design could be conducted to explore the effect of treatment regimes on the different biomarker subgroups. However, if treatments work only in certain subgroups, which is often the case, enrolling all patient subgroups in a confirmatory trial would increase the burden of the study. Having observed a phase II trial, we propose a framework for finding an optimal design that could be implemented in a phase III study or a confirmatory trial. Our approach combines two elements: Bayesian analysis of the observed data, and design of experiments. The first tool selects the subgroups and treatments to be enrolled in the future trial, whereas the second provides an optimal treatment randomization scheme for each enrolled subgroup. Considering two independent treatments and two independent biomarkers, we illustrate our approach using simulation studies. We demonstrate the efficiency gain, i.e. a high probability of recommending truly effective treatments in the right subgroups, of the optimal design found by our framework over a randomized controlled trial and a biomarker–treatment linked trial.


Keywords:  Design of experiments; Regression model; Treatment randomization scheme; Weighted L-optimality

Year:  2019        PMID: 31007363      PMCID: PMC6473552          DOI: 10.1016/j.jspi.2018.06.004

Source DB:  PubMed          Journal:  J Stat Plan Inference        ISSN: 0378-3758            Impact factor:   1.111


Introduction

A randomized controlled trial (RCT) has been the gold standard for testing a new intervention in medicine, especially in phase III confirmatory studies. Many treatments, however, work differently in different patient subgroups, and in this case RCTs that enroll all patients are not necessarily the most efficient approach in phase III. Instead, enriched designs that recruit the patients likely to benefit have considerable advantages. There is a danger that enriching a phase III trial too much may lead to missing a patient subgroup that would actually have benefited. This motivates phase II trials of targeted agents that investigate not only whether a drug works but in which patient subgroups it works. Several 'biomarker-driven' trial designs have been proposed to allow investigation of multiple treatment arms in different patient subgroups (Buxton et al., 2014; Kaplan et al., 2013; Kaplan, 2015; Middleton et al., 2015). In the case where each treatment can be tested in each subgroup, the number of hypotheses to be tested in a trial can be very large. Recent papers providing an overview of biomarker-driven trial designs include Antoniou et al. (2016), Renfro and Sargent (2017), Antoniou et al. (2017) and Parmar et al. (2017). One important aspect of biomarker-driven trial designs that has not been well researched is how to use the information collected from a phase II trial assessing multiple treatments and biomarkers to design the most efficient phase III trial. In particular, it would be very useful to have a framework that determines which treatments should be tested in phase III, and in which biomarker subgroups. There has been some work in the context of evaluating a single experimental treatment (Ondra et al., 2016), but to our knowledge none that investigates novel multi-arm phase II biomarker-driven trial designs.
Considering a regression model with first-order interaction terms, we propose a tool to design a confirmatory trial based on the analysis of an observed phase II trial or a historical study. There are two elements in this tool: Bayesian analysis of the phase II data, and the application of design of experiments to finding an optimal design for the future experiment. The focus of our tool is to find an efficient design that rejects false null hypotheses in the confirmatory trial with high power. Bayesian data analysis is a flexible approach in which the knowledge and confidence of clinicians can be incorporated via the specification of a prior distribution. When the sample size of the observed trial is small, we suggest bootstrapping the data for the Bayesian analysis, and conjecturing the subset of hypotheses that would be tested in a confirmatory trial based on a posterior predictive distribution from the analysis. We then use the notion of design of experiments to find the optimal treatment randomization scheme for the future experiment based on this information. Design of experiments is an approach that provides guidance on data collection such that sufficient information is collected in a future experiment. We consider a weighted version of the L-optimality criterion, which resembles the idea of Morgan and Wang (2010), who consider weighted versions of classical optimality criteria for a factorial model. Sverdlov and Rosenberger (2013) review methods for finding optimal allocations in multi-arm clinical trials, where the design depends on the unknown parameters of a factorial model. We note that the Bayesian data analysis in our framework is independent of the commonly used Bayesian optimal design framework; see for example Chaloner and Verdinelli (1995) for a review of Bayesian optimal design. Our framework can be generalized to finding a Bayesian optimal design for generalized linear and nonlinear models.
The structure of the paper is as follows. We present a statistical model and hypothesis testing procedure for the biomarker trial setting in Section 2. We introduce our novel design approach in Section 3, and conduct simulation studies to compare the performance of the proposed optimal designs with two commonly employed designs in Section 4. We discuss our work and provide some insights into future research topics in Section 5.

Motivating trial

As the motivation for the work that follows, we consider a phase II trial that, at the time of writing, is under consideration for funding. This trial will test two experimental targeted treatments (T1 and T2), against chemotherapy control, for high grade serous ovarian cancer. Two biomarkers are included (B1 and B2) with it being thought likely (but not definite) that T1 will work best in B1 positive patients and T2 in B2 positive patients. Patients can be positive for B1, B2, both or neither. The endpoint used for efficacy is six month change in the level of circulating tumor DNA in the blood, which will be treated as normally distributed on the log scale. The objective of the phase II trial is to determine which of T1 and T2 should be tested in a larger phase III trial, and in which patient subgroups. The methodology in this paper will be used for helping to make this decision.

Background and notation

Let x_i = (x_i1, …, x_iK) be the biomarker profile of patient i, where x_ik = 1 represents that patient i is positive for biomarker k, and x_ik = 0 otherwise, k = 1, …, K; let T_i = (T_i1, …, T_iJ) be the experimental treatment indicator, where T_ij = 1 indicates that patient i receives treatment j, j = 1, …, J. The response model for patient i is

y_i = α + Σ_j β_j T_ij + Σ_k γ_k x_ik + Σ_j Σ_k δ_jk T_ij x_ik + ε_i,    (1)

where α is the placebo/control effect for a patient with an all-negative biomarker profile, i.e. x_ik = 0 for all k; β_j is the main effect of experimental treatment j; γ_k is the main effect of biomarker k; and δ_jk is the interaction between treatment j and biomarker k. A placebo/control treatment is indicated by T_ij = 0 for all j. The residual errors, ε_i, are assumed to be identically and independently distributed, following a normal distribution with zero mean and a common variance σ², i.e. ε_i ~ N(0, σ²). As an example, consider a trial with two experimental treatments and two biomarkers, i.e. J = 2 and K = 2, where each patient receives only one treatment (either T_i1 = 1 or T_i2 = 1) or the placebo/control (T_i1 = T_i2 = 0). The response model is y_i = f_iᵀθ + ε_i, where θ = (α, β1, β2, γ1, γ2, δ11, δ12, δ21, δ22)ᵀ and f_iᵀ is a row vector of the design matrix, denoted by X, of the regression model; θ can be estimated by the least squares estimator, θ̂, which has Cov(θ̂) = σ²(XᵀX)⁻¹ = (σ²/n) M(ξ)⁻¹, where M(ξ) = Σ_i p_i f_i f_iᵀ, n_i and p_i = n_i/n correspond to the number and proportion of patients who have the same f_i, and m denotes the number of unique f_i. Without loss of generality, negative parameter values indicate that the treatment is effective for the patients. We can test the existence of a treatment effect in patients with different biomarker profiles by conducting the hypothesis tests H_0r: c_rᵀθ = 0, where c_r is a vector that indicates the linear combination of regression parameters, i.e. the corresponding treatment effect relative to the placebo/control, and r = 1, …, 8 indexes the hypotheses that could possibly be tested in the study. As an example, for patients who have biomarker profile (x_i1, x_i2) = (1, 0), the difference between the effect of treatment 1 and the placebo/control in this subgroup is β1 + δ11. Table 1 shows the treatment effect and the difference from the control/placebo effect for each subgroup when the responses follow (1); Table 2 shows the possible values of c_r when model (1) is the analysis model.
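As a concrete sketch, the row f_i of the design matrix implied by model (1) with two treatments and two biomarkers can be coded as follows. This is a minimal illustration in Python; the function name and argument order are ours, and the ordering of θ follows the parameter rows of Table 2.

```python
import numpy as np

def model_row(x1, x2, t1, t2):
    """Row f_i of the design matrix for model (1) with two biomarkers
    and two treatments. Parameter order matches
    theta = (alpha, beta1, beta2, gamma1, gamma2,
             delta11, delta12, delta21, delta22)."""
    return np.array([
        1.0,      # alpha: placebo/control effect
        t1,       # beta1: main effect of treatment 1
        t2,       # beta2: main effect of treatment 2
        x1,       # gamma1: main effect of biomarker 1
        x2,       # gamma2: main effect of biomarker 2
        t1 * x1,  # delta11: treatment 1 x biomarker 1 interaction
        t1 * x2,  # delta12: treatment 1 x biomarker 2 interaction
        t2 * x1,  # delta21: treatment 2 x biomarker 1 interaction
        t2 * x2,  # delta22: treatment 2 x biomarker 2 interaction
    ])
```

For the subgroup (x_i1, x_i2) = (1, 0) receiving treatment 1, the nonzero entries correspond to α, β1, γ1 and δ11, matching row 8 of Table 1.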
Table 1

Biomarker–treatment combinations, model parameters for each combination, difference between the effect of a treatment and the placebo/control within the subgroups, and randomization scheme of some designs.

 i   x_i1  x_i2  T_i1  T_i2  Treatment effect                 Difference to control   ξrct  ξtlt  ξL
 1    0     0     0     0    α                                –                       1/3   1/3   0.46
 2    0     0     1     0    α + β1                           β1                      1/3   1/3   0.54
 3    0     0     0     1    α + β2                           β2                      1/3   1/3   0.00

 4    0     1     0     0    α + γ2                           –                       1/3   1/2   0.46
 5    0     1     1     0    α + γ2 + β1 + δ12                β1 + δ12                1/3   0     0.00
 6    0     1     0     1    α + γ2 + β2 + δ22                β2 + δ22                1/3   1/2   0.54

 7    1     0     0     0    α + γ1                           –                       1/3   1/2   0.46
 8    1     0     1     0    α + γ1 + β1 + δ11                β1 + δ11                1/3   1/2   0.54
 9    1     0     0     1    α + γ1 + β2 + δ21                β2 + δ21                1/3   0     0.00

10    1     1     0     0    α + γ1 + γ2                      –                       1/3   1/3   0.47
11    1     1     1     0    α + γ1 + γ2 + β1 + δ11 + δ12     β1 + δ11 + δ12          1/3   1/3   0.00
12    1     1     0     1    α + γ1 + γ2 + β2 + δ22 + δ21     β2 + δ22 + δ21          1/3   1/3   0.53

ξrct = randomized controlled trial; ξtlt = biomarker–treatment linked trial; ξL = optimal design for the first illustration.

Table 2

The possibly tested hypotheses when model (1) is the analysis model. The rth column represents the vector c_r.

θ \ r    1   2   3   4   5   6   7   8
α        0   0   0   0   0   0   0   0
β1       1   1   1   1   0   0   0   0
β2       0   0   0   0   1   1   1   1
γ1       0   0   0   0   0   0   0   0
γ2       0   0   0   0   0   0   0   0
δ11      0   1   0   1   0   0   0   0
δ12      0   0   1   1   0   0   0   0
δ21      0   0   0   0   0   1   0   1
δ22      0   0   0   0   0   0   1   1
We reject the null hypothesis H_0r: c_rᵀθ = 0 if the test statistic |c_rᵀθ̂| / sqrt(c_rᵀ Cov(θ̂) c_r) > z_{1−a/2}, where z_{1−a/2} is the (1 − a/2) quantile of a standard normal distribution and a is the Type 1 error rate. It is a desired property in statistical analysis that c_rᵀ Cov(θ̂) c_r is small, reflecting that the data provide good and sufficient information for understanding the underlying random process. In terms of a hypothesis test, smaller values of this variance lead to a higher probability of rejecting false null hypotheses (i.e. when the true value c_rᵀθ ≠ 0). We note that the real parameter values θ are not known in practice, but the covariance matrix of the least squares estimator is inversely proportional to the information matrix, M(ξ), which does not depend on θ.
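The Wald-type test above can be sketched in a few lines of Python. This is illustrative only: the function name is ours, and the two-sided form of the rejection rule is our assumption from the description of the quantile.

```python
import numpy as np
from scipy import stats

def wald_test(theta_hat, cov_theta, c, alpha=0.05):
    """Two-sided Wald test of H_0: c' theta = 0.
    Returns (reject, z) where z is the standardized statistic
    c' theta_hat / sqrt(c' Cov(theta_hat) c)."""
    est = c @ theta_hat
    se = np.sqrt(c @ cov_theta @ c)
    z = est / se
    reject = abs(z) > stats.norm.ppf(1 - alpha / 2)
    return reject, z
```

With a contrast such as c_r = (0, 1, 0, 0, 0, 1, 0, 0, 0)ᵀ, this tests β1 + δ11 = 0, i.e. the effect of treatment 1 in the B1-positive, B2-negative subgroup.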

Design of confirmatory trial

To investigate the effectiveness of treatments in biomarker subgroups, the biomarker profiles need to be known in advance. Conventional designs such as a randomized controlled trial would fail to study the treatment effect in different subgroups when the information on biomarker profiles is unavailable. On the other hand, a biomarker–treatment linked trial would not administer a linked treatment to a subgroup that is negative for the corresponding biomarker, and would administer all candidate treatments to the subgroup with all-negative biomarkers. If treatments work only in biomarker-positive subgroups, an enrichment design could be implemented in which the subgroup with all-negative biomarkers is excluded from the trial. All of these commonly used designs assign treatments within subgroups with equal randomization probabilities. The columns ξrct and ξtlt in Table 1 show the randomization schemes of a randomized controlled trial and of a biomarker–treatment linked trial for a study with two independent biomarkers and two independent treatments. Instead of using these conventional designs, we propose to design a phase III/confirmatory trial based on an analysis of a phase II/exploratory study. The idea of our design framework is as follows: analyze the observed data using a Bayesian framework to provide guidance on selecting a subset of the possibly tested hypotheses and the subgroups of patients to enroll in the confirmatory trial; then formulate an analysis model for the future experiment and find an optimal randomization scheme. We propose to formulate the analysis model parsimoniously at the design stage to save enrollment and data collection costs in the future confirmatory trial. For a chosen model, we use the notion of design of experiments, whereby the optimal treatment allocation scheme is of interest. An optimal design framework chooses the setting of a confirmatory experiment through the design matrix such that some function of M(ξ)⁻¹ is minimized.
The following sections illustrate the key ideas of our framework: investigate which hypotheses are more beneficial to focus on in the future experiment, and find an efficient design that has high probability of recommending truly effective treatments in the right subgroups.

Specification of relative importance of hypotheses

We now illustrate the selection of hypotheses and subgroups using the data of a previous experiment such as a phase II study. The idea is to consider an analysis approach that provides insight into a predictive distribution of the model parameters of the future confirmatory trial. We consider a Bayesian approach to the data analysis to account for the unseen uncertainty when selecting the subset of hypotheses to be tested in the confirmatory trial, reflecting the prior belief or confidence of an experimenter about what might happen in the future experiment. This is achieved by specifying a prior distribution for the model parameters. When the sample size of the observed study is small, we propose applying a simple bootstrap procedure to the data and conducting the Bayesian analysis in each bootstrap replication, to overcome the variability of using only one set of data to select hypotheses. If only the summary statistics of the phase II trial are available, we recommend replicating the phase II trial and exploring the operating characteristics of the optimal designs for a future experiment, where each optimal design is obtained from the analysis of one replication of the phase II trial. To illustrate our framework, we use a conjugate prior in the following presentation; Markov chain Monte Carlo (MCMC) sampling could be used when the posterior distribution is intractable. We define the total sample size of the phase II trial as n₀, the vector of observed responses as y₀, the design matrix as X₀, and the regression parameters of the phase II response model as θ₀. For an observed study with small n₀, we first bootstrap y₀ and X₀ with replacement and conduct the Bayesian analysis on each bootstrap sample. For example, using a normal-inverse-gamma prior, NIG(μ₀, Λ₀, a₀, b₀), for (θ₀, σ²), we obtain a posterior distribution NIG(μ_n, Λ_n, a_n, b_n) for each bootstrapped sample. The marginal distribution of θ₀ then follows a multivariate t-distribution with location μ_n and 2a_n degrees of freedom.
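The conjugate normal-inverse-gamma update referred to above has a standard closed form, sketched below; the function name is ours and the parameterization (θ | σ² ~ N(μ₀, σ²Λ₀⁻¹), σ² ~ IG(a₀, b₀)) is one common convention.

```python
import numpy as np

def nig_posterior(X, y, mu0, Lam0, a0, b0):
    """Conjugate normal-inverse-gamma update for the linear model
    y = X theta + eps, eps ~ N(0, sigma^2 I), with prior
    theta | sigma^2 ~ N(mu0, sigma^2 inv(Lam0)) and sigma^2 ~ IG(a0, b0).
    Returns (mu_n, Lam_n, a_n, b_n); marginally theta follows a
    multivariate t with 2*a_n degrees of freedom, location mu_n and
    scale (b_n / a_n) * inv(Lam_n)."""
    n = len(y)
    Lam_n = Lam0 + X.T @ X
    mu_n = np.linalg.solve(Lam_n, Lam0 @ mu0 + X.T @ y)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lam0 @ mu0 - mu_n @ Lam_n @ mu_n)
    return mu_n, Lam_n, a_n, b_n
```

In the bootstrap scheme described above, this update would be applied once per bootstrapped sample (X₀*, y₀*), and posterior draws of θ₀ taken from the resulting multivariate t.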
Let w_r denote the relative importance of hypothesis r, r = 1, …, 8. We propose to generate large samples from the posterior distribution (using MCMC sampling when the posterior distribution is intractable) to compute P(c_rᵀθ < −e_r), where c_rᵀθ is approximately normally distributed, and e_r could be the minimum uninteresting treatment difference threshold for hypothesis r. We then find E[P(c_rᵀθ < −e_r)], where the expectation is averaged across the bootstrap replications, and compute w_r, the relative importance of hypothesis r, from these averaged probabilities. Hypothesis r is selected, and the corresponding subgroup enrolled into the confirmatory trial, if w_r exceeds a pre-specified cutoff. This information is then used to formulate a parsimonious linear regression model and the design criterion for finding an optimal randomization scheme for the future confirmatory trial. Note that the w_r need not sum to 1 in the design problem.
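The selection step can be sketched as follows. This is a hedged reading of the procedure: the exact normalization used to map the averaged posterior probabilities to w_r is not fully specified here, so the version below (normalize to sum to one, then threshold) is one plausible convention, and all names are ours.

```python
import numpy as np

def hypothesis_importance(posterior_draws, C, e, eps):
    """Relative importance of hypotheses from posterior draws.
    posterior_draws: list over bootstrap replications, each an
        (n_draws x p) array of draws of theta.
    C: (R x p) matrix whose rows are the contrast vectors c_r.
    e: length-R vector of minimum uninteresting differences e_r.
    eps: selection cutoff; hypothesis r is selected if w_r > eps.
    Returns (w, selected)."""
    # E[P(c_r' theta < -e_r)], averaged over bootstrap replications
    probs = np.mean(
        [(draws @ C.T < -e).mean(axis=0) for draws in posterior_draws],
        axis=0)
    w = probs / probs.sum()   # one normalization convention (assumption)
    return w, w > eps
```

A hypothesis whose posterior mass lies well below −e_r receives a large w_r and is carried into the confirmatory design; the rest are dropped along with their subgroups.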

Design of experiments

We now describe the framework for finding an optimal design. After formulating the analysis model for the future experiment parsimoniously, such that the parameters of interest are estimable, we want to find an optimal design ξ* that minimizes

Φ(ξ) = Σ_r w_r c_rᵀ M(ξ)⁻¹ c_r,    (2)

where w_r reflects the relative importance of the corresponding hypothesis r. This setup has several well-known optimality criteria as special cases: the design criterion is L-optimality when w_r = 1 for all r (page 111 in Pronzato and Pázman (2013)); it is A-optimality when the summation is over the individual variances of the model parameters; and it is c-optimality when a single linear combination cᵀθ is of interest. The incorporation of the w_r into standard L-optimality resembles the idea of Morgan and Wang (2010), where the authors consider weighted versions of classical optimality criteria for a factorial model. A design is called a continuous design when the n_i need not be positive integers in the optimization search, but Σ_i p_i = 1 with 0 ≤ p_i ≤ 1; otherwise ξ is called an exact design, where each n_i is a positive integer. The notion of a continuous design facilitates the search for an optimal design over a design region. A rounding procedure can be applied to a continuous design for practical implementation; see for example Pukelsheim and Rieder (1992). In the conventional design framework, p_i is interpreted as the suggested proportion of patients who have the corresponding f_i. In the biomarker trial setting, it is difficult in practice to enroll the patient subgroups according to these exact proportions. Hence, we instead use the p_i to compute the randomization probability for each selected subgroup; note that no rounding procedure is needed in this case. For a given total sample size, we find the p_i of ξ* by minimizing (2), subject to Σ_i p_i = 1. The relative importance of the selected subset of hypotheses and subgroups is reflected by the values of w_r, c_r, and the biomarker–treatment combinations f_i.
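The continuous-design optimization of criterion (2) can be sketched with a general-purpose constrained optimizer. The paper reports using Matlab's fmincon; the following is a Python analogue of that search, not the authors' implementation, and the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_randomization(F, C, w):
    """Minimize sum_r w_r c_r' M(p)^{-1} c_r over allocation
    proportions p on the simplex, where M(p) = sum_i p_i f_i f_i'.
    F: (m x p) matrix of unique model rows f_i (one per enrolled
       biomarker-treatment combination).
    C: (R x p) matrix of contrast vectors; w: length-R weights."""
    m = F.shape[0]

    def criterion(p):
        M = F.T @ (F * p[:, None])          # sum_i p_i f_i f_i'
        Minv = np.linalg.pinv(M)            # pinv guards near-singular M
        return sum(wr * c @ Minv @ c for wr, c in zip(w, C))

    cons = ({'type': 'eq', 'fun': lambda p: p.sum() - 1.0},)
    res = minimize(criterion, np.full(m, 1.0 / m), method='SLSQP',
                   bounds=[(0.0, 1.0)] * m, constraints=cons)
    return res.x
```

The returned p_i would then be converted into within-subgroup randomization probabilities as described above. As a sanity check, for a single two-arm comparison of one treatment against control the criterion reduces to 1/(p₁(1 − p₁)), whose minimum is the familiar equal allocation.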
To avoid confusion, we use the same index r to denote the hypotheses in the design framework even when some of the hypotheses are not selected to be tested in the future experiment. As an example, suppose we are interested in testing only the following null hypotheses: H_01: β1 = 0, enrolling biomarker–treatment combinations i = 1 and 2; H_02: β1 + δ11 = 0, enrolling i = 7 and 8; H_07: β2 + δ22 = 0, enrolling i = 4 and 6; and H_08: β2 + δ21 + δ22 = 0, enrolling i = 10 and 12. We propose to employ the parsimonious model y_i = α + β1 T_i1 + γ1 x_i1 + γ2 x_i2 + δ11 T_i1 x_i1 + (β2 + δ22) T_i2 + δ21 T_i2 x_i1 + ε_i instead of model (1) for the analysis of the future trial. The optimal design problem is then to minimize (2) subject to Σ_i p_i = 1, with p_i ≥ 0 for the enrolled combinations. We note that the parameter δ12 would not be estimable when this optimal design is used, because none of the selected patients would have T_i1 = 1 with x_i2 = 1. Besides that, we can only estimate the combined effect of β2 + δ22 here, as the enrolled patients have T_i2 = 1 only if x_i2 = 1. Note that if the experimenter chooses the selection cutoff such that all hypotheses are retained, model (1) would be used as in the randomized controlled trial, where all subgroups are enrolled into the trial. In general, the analysis model for a future experiment can be formulated by considering the model parameters of the selected hypotheses and the subgroup biomarker profiles; a linear combination of model parameters may be replaced by a new variable accordingly, such that the information matrix is of full rank.

Illustration: a trial with two biomarkers and two treatments

In this section, we conduct simulation studies to illustrate the application of our framework to finding an optimal design for a confirmatory trial based on Bayesian analysis of a historical study, and we compare the operating characteristics of a randomized controlled trial, ξrct, a biomarker–treatment linked trial, ξtlt, and the optimal designs. Throughout the illustration, we consider two independent biomarkers with a prevalence rate of 0.3 each, and two independent treatments that are linked to the biomarkers. In the first part of the illustration, we simulate a set of phase II data for finding an optimal design based on our framework, and use the bootstrap estimates as the true model parameters in the simulation of a confirmatory trial. We replicate the confirmatory trial using the randomization scheme of each design to study the performance of the different designs. In the second part of the illustration, we do not bootstrap a single data set but instead replicate the phase II trial according to a set of model parameters to explore the operating characteristics of different optimal designs, where each optimal design is found from the analysis of one replicated phase II study. We compare the performance of these optimal designs with ξrct and ξtlt in the simulation of a confirmatory trial in which the true model parameters are the same as those used in the replication of the phase II trial. The first illustration shows how the framework operates when a set of phase II data is available; the second reflects the situation where only the summary statistics of a phase II trial are available. Consider a phase II randomized controlled trial that studies the effect of two independent treatments, T1 and T2, on patients for whom the information on two independent biomarkers, B1 and B2, is available.
We simulate biomarker profiles using binomial distributions with a prevalence rate of 0.3 for each biomarker, treatment allocation with equal randomization probabilities, and a set of responses according to model (1). We note that model (1) admits eight possible hypotheses (see Table 2). We choose the prior mean μ₀ and a diagonal matrix Λ₀ to reflect the belief that the model parameters of a confirmatory experiment, apart from the treatment-related parameters, have no effect on the responses; the prior variances of the treatment-related parameters are set relatively larger than those of the other parameters, reflecting greater uncertainty in the belief that these parameters may have a negative effect on the responses. We bootstrap the data 10 000 times in this illustration. For each bootstrap sample, we compute the posterior parameters and generate 10 000 samples from the posterior distribution to compute P(c_rᵀθ < −e_r). We then use all the bootstrap replications to compute E[P(c_rᵀθ < −e_r)] and the w_r. Based on the values of w_r, we formulate a model parsimoniously and find an optimal design by minimizing (2) using the function fmincon in Matlab. In this illustration, the w_r exceed the selection cutoff for r = 1, 2, 7, 8, and fall below it for r = 3, 4, 5, 6. An optimal design, denoted ξL, is obtained for this setting. To study the operating characteristics of the design, we conduct a simulation study with a larger sample size to reflect the practice of a confirmatory trial. We compare the performance of ξL with a randomized controlled trial, ξrct, and a biomarker–treatment linked trial, ξtlt; these designs differ in the number of enrolled subgroups and the treatment allocation scheme, see Table 1. We use the same prevalence rate, i.e. 0.3 for both independent biomarkers, and a sample size of 1000 in the simulation of the confirmatory trial, whereby the treatment randomization scheme follows the design under study.
In each simulation, we replicate the biomarker profiles 100 times; for each replication of biomarker profiles, we replicate the treatment allocation according to the randomization scheme 100 times. The responses are generated according to model (1) with the expected model parameters from the bootstrap replications. Using the simulated responses, we estimate the model parameters by least squares and compute the test statistic for each hypothesis test in each replication. The number of times a null hypothesis is rejected is averaged across the replications (both of treatment allocation and of biomarker profiles), giving the power of rejecting a null hypothesis if it is false, or the Type 1 error if it is true. In this illustration, the true values c_rᵀθ are known, so we compute the expected number of correct rejections of false null hypotheses (ENCR) to compare the designs. Table 3 shows the effect sizes of the different hypotheses and the operating characteristics of the different designs in the simulation of a confirmatory trial. We find that the power of rejecting H_07 is the largest, whereas the power of rejecting H_08 is the smallest; the Type 1 errors of rejecting H_05 and H_06 are close to 0. This shows that the true values of the hypothesis contrasts play a major part in the hypothesis tests: model parameters of large magnitude are easier to detect than those close to the bound chosen for a hypothesis test. The other reason that the power of rejecting H_08 is so small is the small subgroup sample size, together with the fact that the power of rejecting a null hypothesis involving an interaction term is generally lower than one involving only main effect parameters (Follmann, 2003). The expected sample size of this subgroup is 1000 × 0.3 × 0.3 = 90, which is not sufficient for detecting a treatment effect that is very close to zero. Comparing the designs, the optimal design generally achieves larger power for rejecting the false null hypotheses.
Considering the expected number of correct rejections of false null hypotheses (ENCR), we find that ξrct and ξtlt would lose about 11% and 7% efficiency relative to ξL. Looking at the test of H_01, we find that the power could be increased significantly if the optimal design is used in the experiment, i.e. by 0.11 and 0.12 over what ξrct and ξtlt achieve, respectively. The randomization scheme of the optimal design for the corresponding subgroup is 0.46 for receiving placebo and 0.54 for receiving treatment 1, whereas the other designs use probability 1/3 each for receiving placebo, treatment 1 and treatment 2. This shows that enrolling a less informative biomarker–treatment combination can be a waste of resources. In particular, β2 is not significant in this illustration, so including biomarker–treatment combination i = 3 (see Table 1) in the experiment is not beneficial. Conjecturing which hypotheses are significant and should be tested in the future experiment is thus a crucial step in designing the trial.
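The ENCR computation described above can be sketched as a small Monte Carlo routine. This is a simplified, illustrative version (names ours): it simulates one fixed patient configuration per run, fits by least squares, and counts rejections in the effective (negative) direction among the truly false nulls, whereas the paper's simulation also replicates biomarker profiles and treatment allocation.

```python
import numpy as np

rng = np.random.default_rng(1)

def encr(F_pop, theta, C, false_idx, reps=200):
    """Monte Carlo sketch of the expected number of correct
    rejections (ENCR) at the two-sided 5% level.
    F_pop: (n x p) model rows, one per patient, already reflecting
        biomarker profiles and the randomization scheme.
    theta: true parameters; C: (R x p) contrasts;
    false_idx: indices r of the truly false null hypotheses."""
    z975 = 1.959963984540054       # 97.5% standard normal quantile
    n, p = F_pop.shape
    Minv = np.linalg.inv(F_pop.T @ F_pop)
    count = 0.0
    for _ in range(reps):
        y = F_pop @ theta + rng.standard_normal(n)
        theta_hat, *_ = np.linalg.lstsq(F_pop, y, rcond=None)
        resid = y - F_pop @ theta_hat
        s2 = resid @ resid / (n - p)
        for r in false_idx:
            c = C[r]
            stat = (c @ theta_hat) / np.sqrt(s2 * (c @ Minv @ c))
            if stat < -z975:       # significant, in the effective direction
                count += 1
    return count / reps
```

Running this with the rows of Table 1 weighted by each design's randomization scheme is what produces comparisons like those in Table 3.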
Table 3

Effect sizes from different hypotheses and probability of rejecting a null hypothesis in the simulation of different designs.

Effect sizes from different hypotheses in the simulation of a confirmatory trial

 r:       1        2        3       4       5       6       7        8
 c_rᵀθ:  −0.155   −0.169   0.135   0.121   0.324   0.750   −0.460   −0.034

Probability of rejecting a null hypothesis in the simulation of a confirmatory trial

         r=1     r=2     r=3     r=4     r=5     r=6     r=7     r=8     ENCR
ξrct     0.43    0.31    0.01    0.02    0.00    0.00    0.90    0.09    1.74
ξtlt     0.41    0.37    0.02    0.02    0.00    0.00    0.96    0.08    1.81
ξL       0.54    0.37    NA      NA      NA      NA      0.96    0.08    1.95

ENCR= expected number of correct rejection of false null hypotheses.

To illustrate further, we consider the same trial setting but assume that only the summary statistics of a phase II trial are available. We consider a true treatment difference of c_rᵀθ = −0.2 for r = 1, 2, 7, 8, and c_rᵀθ = 0 for r = 3, 4, 5, 6. Instead of bootstrapping one data set (which gives different parameter estimates in each bootstrap replication), we replicate a phase II trial 500 times according to model (1) with an equal-probability randomization scheme. These replications reflect the variability of the error term and the randomization scheme, while θ remains the same across all the phase II replications. For each simulated phase II trial, we draw 10 000 samples from the posterior distribution (with the same prior as in the previous illustration) and compute the w_r. An optimal design is then found based on the analysis of each phase II replication. Fig. 1 shows, on the x-axis, the ENCR achieved by the 500 optimal designs, where each optimal design corresponds to one simulated phase II study. With a total sample size of 1000, a nominal Type 1 error of 5%, and a true treatment difference of −0.2 for each false null hypothesis, we find that the optimal designs are expected to reject at least 1.65 false null hypotheses in the simulation, whereas ξrct and ξtlt give 1.647 and 1.707 respectively.
Fig. 1

Frequency of optimal designs that achieve the expected number of correct rejection of false null hypotheses (ENCR).

Table 4 shows the probability of rejecting each null hypothesis H_0r, r = 1, …, 8, achieved by ξbest, ξworst, ξrct and ξtlt. The former two are the optimal designs with the highest and the lowest ENCR (in the simulation of a confirmatory trial) out of the 500 optimal designs, where each design corresponds to one replication of a phase II study. Across all designs, which have different randomization schemes, we find that the Type 1 error of rejecting each true null hypothesis is no greater than 0.03 when the nominal 5% level is used in the simulation. Comparing the four false null hypotheses that have the same true value of −0.2, we find that the power of rejecting H_01 is the highest and that of H_08 is the lowest across all designs. This is not surprising, as the subgroups have different sample sizes: with a prevalence rate of 0.3 for both biomarkers, the expected sample size of the all-negative subgroup is 1000 × 0.7 × 0.7 = 490, while that of the both-positive subgroup is 1000 × 0.3 × 0.3 = 90. The last column of Table 4 shows that ξrct and ξtlt would lose about 11% and 7% efficiency in terms of ENCR when compared with ξbest. We find that ξworst achieves a similar ENCR to ξrct, but does not improve on ξtlt. The latter finding is mainly because ξtlt excluded biomarker–treatment combinations that make limited contributions to testing the false null hypotheses (see Table 1), whereas ξworst enrolled some of these combinations in the trial. The former result might be due to the random error of that particular replication of the phase II trial. If we bootstrapped this replicated data set as if it were the observed phase II data, we would then proceed as in the previous illustration, and could potentially obtain a better design than ξrct and ξtlt for the future experiment.
Table 4

Probability of rejecting a null hypothesis. We know c_rᵀθ = −0.2 for r = 1, 2, 7, 8, and c_rᵀθ = 0 for r = 3, 4, 5, 6. Each row corresponds to the performance of a design.

          r=1     r=2     r=3     r=4     r=5     r=6     r=7     r=8     ENCR
ξbest     0.67    0.42    0.02    0.02    0.01    0.02    0.43    0.32    1.83
ξworst    0.61    0.39    0.01    0.02    0.01    0.02    0.35    0.28    1.64
ξrct      0.59    0.39    0.02    0.02    0.01    0.02    0.38    0.29    1.65
ξtlt      0.56    0.45    0.03    0.03    0.01    0.03    0.45    0.24    1.71

Discussion

We have proposed a framework for designing confirmatory trials based on the analysis of a phase II study testing multiple experimental treatments in different biomarker subgroups. The framework provides guidance on selecting biomarker–treatment combinations and a treatment randomization scheme. When a single regression model is used to analyze a multi-arm trial, the dimension of the model parameters depends on the number of biomarker–treatment combinations; for example, with K binary biomarkers and J experimental treatments there are J × 2^K possible hypotheses testing treatment differences between the treatments and placebo. To find an efficient design, our framework selects a subset of these many possibly tested hypotheses based on Bayesian analysis of a phase II/historical study, and uses the notion of design of experiments to find the treatment randomization scheme for implementing the future experiment. We show that a traditional randomized trial enrolling all patient subgroups is not the most efficient approach when the treatments work only in certain subgroups. When the sample size of the observed study is small, we suggest bootstrapping the available data to reduce the estimation bias (due to the small sample) that may in turn bias the hypothesis selection. When only summary statistics are available, we suggest simulating the phase II study and exploring the operating characteristics of the optimal designs before choosing one design for implementation. The above illustrations show that the thresholds e_r, the selection cutoff, and the prior play important roles in designing an experiment; in practice they should be chosen carefully in discussion with clinicians, based on their experience. We note that our framework is flexible in that each hypothesis may have its own threshold values.
The specification of a prior distribution is another aspect that should reflect the clinicians' knowledge of the likely magnitude of the treatment effect. When the posterior predictive distribution of the model parameters is not available in closed form, we suggest drawing a large sample using MCMC approaches and approximating the distribution of the future model parameters by a normal approximation. Concerning the weights in the design criterion, we find that small perturbations of each weight do not notably affect the optimal randomization scheme. The operating characteristics of a design depend more on the true model parameters than on the weights; however, the true model parameters are never known in practice. Instead of enrolling subgroups according to the optimal proportions suggested by the classical design framework, we convert them into randomization probabilities, as this facilitates the implementation of a trial that uses subgroup stratification. However, this approach complicates the sample size calculation, especially when the prevalence rates of the biomarkers are at the extreme ends of the range. We use a single model in the analysis so that subgroups with small sample sizes may borrow information from subgroups with larger sample sizes. We have presented the framework for normally distributed responses; it could be extended to nonlinear models, where adjustments to the Bayesian data analysis, the computation of the covariance matrix, and the design criterion would be required. When the model is nonlinear, the covariance matrix of the parameters may depend on the unknown parameters, which complicates finding an optimal design. For a chosen value of the model parameters, the classical optimal design framework can provide a locally optimal design. An alternative is a Bayesian optimal design framework, whereby a prior distribution of the model parameters is incorporated into the design criterion for finding an optimal design.
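The conversion from optimal enrolment proportions to subgroup-stratified randomization probabilities can be sketched as follows. The function name, the input layout, the numerical values, and the accrual-inflation diagnostic are illustrative assumptions, not quantities from the paper; the diagnostic simply flags the sample-size complication that arises when a subgroup's optimal share exceeds its prevalence.

```python
import numpy as np

def randomization_scheme(opt_props, prevalence):
    """Sketch: convert optimal design proportions into a
    subgroup-stratified randomization scheme. `opt_props[s]` holds the
    optimal proportion of the whole trial allocated to each arm within
    subgroup s; `prevalence[s]` is the subgroup's population prevalence."""
    scheme = {}
    for s, props in opt_props.items():
        props = np.asarray(props, dtype=float)
        share = props.sum()  # optimal fraction of the trial spent on subgroup s
        # within a subgroup, the arm probabilities must sum to one
        arm_probs = props / share if share > 0 else props
        # inflation > 1 means subgroup s must be over-sampled relative to
        # random accrual, which is where the sample size calculation bites
        inflation = share / prevalence[s]
        scheme[s] = {"arm_probs": arm_probs, "accrual_inflation": inflation}
    return scheme

# illustrative values only; columns are (control, treatment)
opt = {"B1+": [0.10, 0.20], "B1-": [0.30, 0.40]}
prev = {"B1+": 0.15, "B1-": 0.85}
scheme = randomization_scheme(opt, prev)
```

In this toy configuration the rare subgroup `B1+` carries an optimal share of 0.30 against a prevalence of 0.15, so twice as many `B1+` patients must be accrued as random sampling would provide.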
We note that the proposed framework differs from the commonly known Bayesian optimal design framework in that we use historical data only to estimate the model parameters prior to finding an optimal design. Nevertheless, our framework could be extended to the nonlinear situation accordingly. For example, the observed phase II data may be used to construct the prior distribution for the unknown parameters in the Bayesian design framework, while the prior distribution in the Bayesian analysis of the phase II data reflects the prior knowledge or confidence of the clinicians about the future experiment; these two prior distributions may or may not coincide. Other possible extensions of our framework are the incorporation of missing data, see Lee et al. (2017a) and Lee et al. (2017b), and the incorporation of cost functions, see Cook and Fedorov (1995). One potential topic for future research is to account for population drift, or change in the baseline measure of subgroups, in the design framework. When control-group data from several small trials of the same disease are available, a design framework that considers this aspect could benefit the clinical community; see, for example, Thall and Simon (1990) and Boonstra et al. (2016), who consider the design and information gain when historical control data are incorporated, and van Rosmalen et al. (2017) for a review of methods for incorporating historical control data. Moreover, we have not accounted for all aspects of an analysis plan. For example, when a single statistical model is chosen to analyze all data, most of the hypotheses are not independent, as information is shared across subgroups. Future work could focus on constructing optimal designs that account for issues such as multiplicity while optimizing the hypothesis selection as our design framework does here.
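The Bayesian optimal design idea mentioned above, averaging a local design criterion over a prior on the model parameters, can be sketched for a toy one-parameter nonlinear model. The model, the design points, and the prior below are all assumptions made for illustration; the criterion shown is log-det of the Fisher information (D-optimality), not the paper's weighted criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_criterion(design_weights, info_matrix, prior_draws):
    """Sketch of a Bayesian optimality criterion: average a local
    D-optimality criterion (log det of the information matrix) over
    draws from the prior on the model parameters. `info_matrix(w, theta)`
    is a user-supplied function returning the Fisher information."""
    vals = [np.linalg.slogdet(info_matrix(design_weights, th))[1]
            for th in prior_draws]
    return float(np.mean(vals))

xs = np.array([0.5, 1.0, 2.0])          # candidate design points (illustrative)

def info(w, theta):
    # Fisher information for a toy model E[y] = exp(theta * x):
    # the derivative of the mean with respect to theta is x * exp(theta * x)
    g = xs * np.exp(theta * xs)
    return np.atleast_2d(np.sum(w * g**2))

prior = rng.normal(0.5, 0.1, 200)       # assumed prior draws for theta
w_a = np.array([0.2, 0.3, 0.5])         # favors the most informative point
w_b = np.array([0.8, 0.1, 0.1])
print(bayesian_criterion(w_a, info, prior) > bayesian_criterion(w_b, info, prior))
# prints: True
```

Because the criterion is an average over prior draws rather than a function of a single parameter guess, the resulting design hedges against parameter uncertainty instead of being only locally optimal.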
The issues that arise in subgroup sample size calculations may also be incorporated and addressed in the prospective design framework. To conclude, we have proposed a novel framework for designing a biomarker-driven confirmatory trial based on the analysis of an observed experiment, which provides an increased chance of determining the subgroups in which a targeted treatment genuinely works.
References (11 in total; 10 shown)

1. Kaplan R. The FOCUS4 design for biomarker stratified trials. Chin Clin Oncol, 2015.
2. Boonstra PS, Taylor JMG, Mukherjee B. Increasing efficiency for estimating treatment-biomarker interactions with historical data. Stat Methods Med Res, 2014.
3. Thall PF, Simon R. Incorporating historical control data in planning phase II clinical trials. Stat Med, 1990.
4. Antoniou M, Kolamunnage-Dona R, Jorgensen AL. Biomarker-Guided Non-Adaptive Trial Designs in Phase II and Phase III: A Methodological Review. J Pers Med, 2017.
5. Renfro LA, Sargent DJ. Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples. Ann Oncol, 2017.
6. Antoniou M, Jorgensen AL, Kolamunnage-Dona R. Biomarker-Guided Adaptive Trial Designs in Phase II and Phase III: A Methodological Review. PLoS One, 2016.
7. van Rosmalen J, Dejardin D, van Norden Y, Löwenberg B, Lesaffre E. Including historical data in the analysis of clinical trials: Is it worth the effort? Stat Methods Med Res, 2017.
8. Kaplan R, Maughan T, Crook A, Fisher D, Wilson R, Brown L, Parmar M. Evaluating many treatments and biomarkers in oncology: a new design. J Clin Oncol, 2013.
9. Middleton G, Crack LR, Popat S, Swanton C, Hollingsworth SJ, Buller R, Walker I, Carr TH, Wherton D, Billingham LJ. The National Lung Matrix Trial: translating the biology of stratification in advanced non-small-cell lung cancer. Ann Oncol, 2015.
10. Ondra T, Jobjörnsson S, Beckman RA, Burman CF, König F, Stallard N, Posch M. Optimizing Trial Designs for Targeted Therapies. PLoS One, 2016.
