
An optimised multi-arm multi-stage clinical trial design for unknown variance.

Michael J Grayling, James M S Wason, Adrian P Mander.

Abstract

Multi-arm multi-stage trial designs can bring notable gains in efficiency to the drug development process. However, for normally distributed endpoints, the determination of a design typically depends on the assumption that the patient variance in response is known. In practice, this will not usually be the case. To allow for unknown variance, previous research explored the performance of t-test statistics, coupled with a quantile substitution procedure for modifying the stopping boundaries, at controlling the familywise error-rate to the nominal level. Here, we discuss an alternative method based on Monte Carlo simulation that allows the group size and stopping boundaries of a multi-arm multi-stage t-test to be optimised, according to some nominated optimality criteria. We consider several examples, provide R code for general implementation, and show that our designs confer a familywise error-rate and power close to the desired level. Consequently, this methodology will provide utility in future multi-arm multi-stage trials.
Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


Keywords:  Familywise error-rate; Group sequential; Interim analyses; Multi-arm multi-stage; t-Statistic


Year:  2018        PMID: 29474933      PMCID: PMC5886309          DOI: 10.1016/j.cct.2018.02.011

Source DB:  PubMed          Journal:  Contemp Clin Trials        ISSN: 1551-7144            Impact factor:   2.226


Introduction

With the cost of drug development increasing, study designs that can enhance the efficiency of clinical research are of great interest. One such class of designs is the group sequential design [1]. This approach exploits the fact that data accumulate over time, incorporating interim analyses at which the study may be stopped early and thereby reducing the expected sample size. Recently, this methodology was extended to allow multiple treatments to be compared to a shared control [2]. These multi-arm multi-stage (MAMS) designs can bring sizeable gains in efficiency over conducting a series of single-stage two-armed trials [3]. Unfortunately, a limitation of this methodology in the case of normally distributed outcome data is that designs are usually determined under the supposition of known patient variance in response. Typically, the variance will not be known at the design stage. Using test statistics that assume known variance will then result in operating characteristics that differ from their nominal values if the true variance differs from the specified value. For two-armed group sequential trials, several authors have suggested methods to address this problem. These include a recursive algorithm [4] and a quantile substitution procedure [1,5]. The latter approach has also been explored for MAMS trials, where it was shown to control the familywise error-rate (FWER) more accurately at the desired level, at a small cost to the trial's power [6]. A Monte Carlo-based procedure has also been proposed for two-armed group sequential trials [7]. In this paper, we extend it to MAMS trials. Explicitly, we describe how the stage-wise group size and stopping boundaries can be optimised. Finally, using the TAILoR trial [2] as a motivating example, we compare the performance of our method with several other approaches.

Methods

We consider a MAMS trial with K + 1 arms and a maximum of J stages. Of the arms, K (indexed k = 1, …, K) are to be compared to a single control arm (indexed k = 0). We test the following hypotheses:

$$H_0^{(k)}: \theta_k = \mu_k - \mu_0 \le 0, \qquad k = 1, \dots, K.$$

Here, $\mu_k$ is the mean response of patients allocated to arm k = 0, …, K. We assume that in each stage, n patients are allocated to each arm present in the trial. To allow for the early dropping of arms, we denote by $n_{kj}$ the actual number of patients allocated to arm k = 0, …, K in stage j = 1, …, J. Thus, $n_{kj} \in \{0, n\}$. Designs with unequal allocation, or with two-sided null hypotheses, could be treated similarly. Denoting by $X_{ikj}$ the response of the ith patient in arm k in stage j, we assume that the $X_{ikj}$ are independent and distributed as $X_{ikj} \sim N(\mu_k, \sigma^2)$. Extending [7], we work with the cumulative mean response $\bar{X}_{kj}$ of arm k up to stage j, based on $N_{kj} = \sum_{l=1}^{j} n_{kl}$ patients, and an estimate $\hat{\sigma}_j$ of σ based on all responses accrued by analysis j, with the convention that $X_{ikj} = 0$ for all i if $n_{kj} = 0$. At interim analysis j the following test statistics are constructed:

$$T_{kj}(\sigma) = \frac{\bar{X}_{kj} - \bar{X}_{0j}}{\sigma\sqrt{1/N_{kj} + 1/N_{0j}}}, \qquad k = 1, \dots, K.$$

When σ is assumed known, the $T_{kj}(\sigma)$ are jointly multivariate normal (henceforth, the z-test statistics). With σ replaced by its estimate $\hat{\sigma}_j$, the joint distribution of the resulting t-test statistics, $T_{kj}(\hat{\sigma}_j)$, does not have a simple form. It is this that makes the determination of stopping boundaries for use with t-test statistics difficult. Given n, a MAMS design is then fully specified by efficacy and futility stopping boundaries $\mathbf{e} = (e_1, \dots, e_J) \in \mathbb{R}^J$ and $\mathbf{f} = (f_1, \dots, f_J) \in \mathbb{R}^J$, with $e_J = f_J$ to ensure the trial has at most J stages. We now consider two categories of MAMS design: one that terminates the entire trial as soon as any null hypothesis is rejected, and one that stops recruitment only to those arms for which the corresponding null hypothesis has been accepted or rejected. These have been referred to as designs with simultaneous and separate stopping, respectively [8].
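As a concrete illustration, the statistics above can be computed from cumulative arm-wise data as in the Python sketch below. It is not the authors' R code. The function name `mams_test_statistics` is hypothetical, and the variance estimator is our assumption: a pooled estimate over all K + 1 arms with degrees of freedom equal to the total number of responses minus K + 1. The paper's exact estimator is not reproduced here.

```python
import math
from statistics import fmean


def mams_test_statistics(control, arms, sigma=None):
    """Cumulative test statistics T_kj for K experimental arms versus control.

    control: all control-arm responses accrued up to analysis j.
    arms: list of K lists, the responses accrued so far in each experimental arm.
    sigma: known standard deviation (z-test); if None, sigma is replaced by a
        pooled estimate (t-test). The pooled estimator is an assumption.
    Returns (statistics, degrees_of_freedom); the df is None for the z-test.
    """
    groups = [control] + list(arms)
    if sigma is None:
        # Assumed estimator: sum of squares about each arm's own mean,
        # pooled over all K + 1 arms.
        means = [fmean(g) for g in groups]
        ss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
        df = sum(len(g) for g in groups) - len(groups)
        scale, nu = math.sqrt(ss / df), df
    else:
        scale, nu = sigma, None
    mean0, n0 = fmean(control), len(control)
    stats = [
        (fmean(g) - mean0) / (scale * math.sqrt(1 / len(g) + 1 / n0))
        for g in arms
    ]
    return stats, nu


# Toy data: control mean 1, one experimental arm with mean 3.
z, _ = mams_test_statistics([0.0, 1.0, 2.0], [[2.0, 3.0, 4.0]], sigma=1.0)
t, nu = mams_test_statistics([0.0, 1.0, 2.0], [[2.0, 3.0, 4.0]])
print(z, t, nu)
```

With these toy data the pooled standard deviation happens to equal 1, so the z- and t-statistics coincide at √6 ≈ 2.449, with ν = 4.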
To describe our design, we introduce the vectors $\boldsymbol{\psi} = (\psi_1, \dots, \psi_K)$ and $\boldsymbol{\omega} = (\omega_1, \dots, \omega_K)$. Here $\psi_k \in \{0, 1\}$, with $\psi_k = 1$ if $H_0^{(k)}$ is rejected and $\psi_k = 0$ otherwise. Likewise $\omega_k \in \{1, \dots, J\}$, with $\omega_k = j$ if j is the analysis at which $H_0^{(k)}$ is rejected or accepted, or at which the whole trial is stopped without a decision on $H_0^{(k)}$. Our MAMS t-test is then defined as follows:

1. Set $\boldsymbol{\psi} = \boldsymbol{\omega} = (0, \dots, 0)$ and j = 1.
2. Conduct stage j of the trial, allocating n patients to the control arm and n patients to each arm k with $\omega_k = 0$. Compute the $T_{kj}$.
3. For k = 1, …, K, if $T_{kl} \in [f_l, e_l)$ for l = 0, …, j − 1 (with the convention $T_{k0} \in [f_0, e_0)$ for all k), then:
   (a) if $T_{kj} \ge e_j$, reject $H_0^{(k)}$ and set $\psi_k = 1$, $\omega_k = j$;
   (b) if $T_{kj} < f_j$, accept $H_0^{(k)}$ and set $\omega_k = j$.
4. When using the simultaneous stopping rule: if $\sum_{k=1}^{K} \psi_k = 0$ and $\min_k \omega_k = 0$, set j = j + 1 and return to step 2; otherwise, stop the trial and, for each k with $\omega_k = 0$, set $\omega_k = j$. When using the separate stopping rule: if $\min_k \omega_k = 0$, set j = j + 1 and return to step 2; otherwise, stop the trial.

On trial completion, ψ and ω then conform to their designations above. We would like to ensure that the FWER, the probability of rejecting at least one true null hypothesis, is controlled at some level α. Note that our methodology could instead be altered to control the pairwise error-rate (PWER), the probability of rejecting a particular true null hypothesis, if desired. There are several ways to define power in a multi-arm setting. Here, as in [2], we desire power of at least 1 − β to reject $H_0^{(1)}$ when $\theta_1 = \delta_1$ and $\theta_k = \delta_0$ for k = 2, …, K. This is the so-called pairwise power for $H_0^{(1)}$ (see, e.g., [9]). To this end, let $\Xi_{sim}$ and $\Xi_{sep}$ denote the sets of possible $(\boldsymbol{\psi}, \boldsymbol{\omega})$ combinations when using the simultaneous and separate stopping rules, respectively. Then, for $\Xi \in \{\Xi_{sim}, \Xi_{sep}\}$ set according to the chosen stopping rule, take

$$\Xi_{rej} = \Big\{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi : \sum_{k=1}^{K} \mathbb{I}\{\psi_k = 1\} \ge 1\Big\}, \qquad \Xi_{1} = \big\{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi : \psi_1 = 1\big\},$$

where $\mathbb{I}\{x\}$ is the indicator function on event x. That is, $\Xi_{rej}$ and $\Xi_1$ are the subsets of Ξ in which at least one null hypothesis, or $H_0^{(1)}$, respectively, is rejected.
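The stopping procedure can be sketched as a small Python function. This is an illustrative sketch, not the authors' implementation. The names `run_mams_test`, `stats_at` and `realised_sample_size` are hypothetical. The caller supplies the statistics for the arms still active at each analysis, so the same logic serves both the z- and the t-test versions.

```python
def run_mams_test(stats_at, e, f, K, rule="simultaneous"):
    """Apply the MAMS stopping rules and return (psi, omega) as lists over k = 1..K.

    stats_at(j, active) -> {k: T_kj} for the arms still active at analysis j.
    e, f: efficacy and futility boundaries of length J, with e[J-1] == f[J-1].
    """
    J = len(e)
    psi = [0] * K      # psi[k-1] = 1 if H0(k) is rejected
    omega = [0] * K    # omega[k-1] = analysis at which arm k's involvement ended
    for j in range(1, J + 1):
        active = [k for k in range(1, K + 1) if omega[k - 1] == 0]
        T = stats_at(j, active)
        for k in active:
            if T[k] >= e[j - 1]:
                psi[k - 1], omega[k - 1] = 1, j   # reject H0(k)
            elif T[k] < f[j - 1]:
                omega[k - 1] = j                  # accept H0(k)
        undecided = any(w == 0 for w in omega)
        if rule == "simultaneous":
            if sum(psi) == 0 and undecided:
                continue                          # proceed to stage j + 1
            # Whole trial stops: undecided arms are closed at analysis j.
            omega = [w if w else j for w in omega]
            break
        if not undecided:                         # separate rule
            break
    return psi, omega


def realised_sample_size(omega, n):
    """Control recruits for max(omega) stages and arm k for omega_k stages."""
    return n * (max(omega) + sum(omega))


# Hypothetical statistics for K = 2 arms (arm 1 stopped for futility at stage 1).
table = {1: {1: 0.5, 2: 1.0}, 2: {1: 0.0, 2: 2.5}}
psi, omega = run_mams_test(lambda j, act: {k: table[j][k] for k in act},
                           e=[2.3, 2.2], f=[0.7, 2.2], K=2)
print(psi, omega, realised_sample_size(omega, n=10))
```

Here arm 1 is dropped at the first analysis and $H_0^{(2)}$ is rejected at the second. The function therefore returns ψ = (0, 1) and ω = (1, 2), and the realised sample size is 5n (50 for n = 10).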
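In the Supplementary material we elaborate on the meaning and construction of these sets. Let $\mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\theta})$ denote the probability of a particular $(\boldsymbol{\psi}, \boldsymbol{\omega})$ combination on trial completion, for a given vector of treatment effects $\boldsymbol{\theta} = (\theta_1, \dots, \theta_K)$. We then specify our required operating characteristics as

$$\alpha_{FWER} = \sum_{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi_{rej}} \mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \mathbf{0}) \le \alpha, \qquad 1 - \beta_{power} = \sum_{(\boldsymbol{\psi}, \boldsymbol{\omega}) \in \Xi_{1}} \mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\delta}) \ge 1 - \beta, \tag{1}$$

where $\boldsymbol{\delta} = (\delta_1, \delta_0, \dots, \delta_0)$. Additionally, we optimise our choices of n, $\mathbf{e}$, and $\mathbf{f}$. In theory, this could be achieved for almost any optimality criterion, and several sensible choices have previously been proposed (see, e.g., [10]). Here, we focus on minimising a weighted combination of three quantities: the expected sample sizes (ESSs) when $\boldsymbol{\theta} = \mathbf{0}$ and when $\boldsymbol{\theta} = \boldsymbol{\delta}$, and the maximal possible sample size. This approach has proved effective in several trial design settings [5,11]. Note that the maximal possible sample size is $(K + 1)Jn$. Following [12], the optimisation can therefore be achieved by identifying the n, $\mathbf{e}$, and $\mathbf{f}$ that minimise the following function

$$w_1 ESS(\mathbf{0}) + w_2 ESS(\boldsymbol{\delta}) + w_3 (K + 1)Jn + P\left[\mathbb{I}\{\alpha_{FWER} > \alpha\} + \mathbb{I}\{\beta_{power} > \beta\}\right]. \tag{2}$$

Here $P \in \mathbb{R}^+$ is a penalty for designs with undesirable operating characteristics, taken as the sample size required by a corresponding single-stage design. Moreover, the $w_i \in \mathbb{R}^+ \cup \{0\}$, for i = 1, 2, 3, are weights reflecting the desire to minimise each of the three included factors. Note that previous work suggests that designs placing all of their weight on one of the three factors (e.g., $w_1 = 1$, $w_2 = w_3 = 0$) will perform particularly badly under other choices of the weights [5,11]. It is therefore advisable to consider a range of options for the weights, and also to take $w_i \neq 0$ for i = 1, 2, 3. When σ is assumed known, we can proceed directly with this approach, using the methodology described in the Supplementary material. When σ is not assumed known, however, the complex joint distribution of the $T_{kj}(\hat{\sigma}_j)$ prevents us from calculating the required $\mathbb{P}(\boldsymbol{\psi}, \boldsymbol{\omega} \mid \boldsymbol{\theta})$ exactly. Therefore, we instead use a Monte Carlo method. We first offer a practical description of how this works, before providing a formal description.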
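To make the criterion concrete, the sketch below evaluates a weighted objective of this kind. The function names are hypothetical. The exact form of the penalty term is our assumption: P is added once for each violated operating characteristic. The ESS and error-rate inputs are invented numbers, not results from the paper.

```python
def max_sample_size(n, K, J):
    """Maximal sample size: each of the K + 1 arms recruits n patients per stage."""
    return (K + 1) * J * n


def weighted_objective(ess0, ess_delta, n, K, J, alpha_fwer, beta_power,
                       alpha, beta, w=(1 / 3, 1 / 3, 1 / 3), penalty=0.0):
    """w1*ESS(0) + w2*ESS(delta) + w3*max N, plus an (assumed) penalty P
    for each of the FWER and type II error constraints that is violated."""
    w1, w2, w3 = w
    value = w1 * ess0 + w2 * ess_delta + w3 * max_sample_size(n, K, J)
    violations = int(alpha_fwer > alpha) + int(beta_power > beta)
    return value + penalty * violations


# Hypothetical inputs for K = 3, J = 2: the first design meets both
# constraints, while the second exceeds the nominal FWER.
ok = weighted_objective(200.0, 220.0, n=40, K=3, J=2, alpha_fwer=0.049,
                        beta_power=0.098, alpha=0.05, beta=0.1, penalty=500.0)
bad = weighted_objective(200.0, 220.0, n=40, K=3, J=2, alpha_fwer=0.055,
                         beta_power=0.098, alpha=0.05, beta=0.1, penalty=500.0)
```

With n = 40, K = 3 and J = 2, the maximal sample size is 320. The unpenalised value is therefore (200 + 220 + 320)/3 ≈ 246.7, and the design that violates the FWER constraint scores 500 more.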
Suppose, as an example, that J = K = 2, $\mu_0 = \mu_1 = \mu_2 = 0$, $\sigma^2 = 1$, and that we will use the simultaneous stopping rule. For any choice of values for n, $\mathbf{e} = (e_1, e_2)$, and $\mathbf{f} = (f_1, f_2)$ (with $e_2 = f_2$), we can simulate a trial's outcome by generating data for each treatment arm in stage one, using the fact that $X_{ik1} \sim N(0, 1)$ for k = 0, 1, 2 and i = 1, …, n. With these data, the $T_{k1}$ for k = 1, 2 can be formed. If $T_{k1} \ge e_1$ for k = 1 or 2, the trial terminates with a familywise error (FWE) having occurred. If $T_{k1} < f_1$ for both k = 1 and 2, the trial also terminates, with no FWE having occurred. Otherwise, the trial progresses to stage two, with recruitment continued in arm 0 and in the arms k with $f_1 \le T_{k1} < e_1$. We draw data for stage two in these arms, again from the standard normal distribution, and then compute the $T_{k2}$ for those k with $f_1 \le T_{k1} < e_1$. The trial now terminates, with a FWE having been committed if $T_{k2} \ge e_2$ for at least one of these k, and without one otherwise. The FWER for this design can then be estimated by repeating the above process many times and computing the proportion of replicates in which a null hypothesis is rejected. Similarly, one can estimate the power under the least favourable configuration (LFC), $\boldsymbol{\theta} = \boldsymbol{\delta}$, or the ESSs. For example, if in the above $T_{11} < f_1$ but $f_1 \le T_{21} < e_1$, the required sample size for this particular trial would have been 3n + 2n = 5n: all three arms recruit in stage one, then the control arm and arm 2 in stage two. Again, by replicating this process and averaging the realised sample size across replicates, we can estimate the ESS. Finally, a suitable global optimisation routine can be used to search for the optimal values of n, $\mathbf{e}$, and $\mathbf{f}$. Many such routines are now available in standard statistical software, with simulated annealing in particular having been used to good effect in clinical trial design (see, e.g., [11,12]). Formally, we generate R = 100,000 independent sets of responses for each treatment arm under $\boldsymbol{\theta} = \mathbf{0}$ and $\boldsymbol{\theta} = \boldsymbol{\delta}$, for some suitably large value of n.
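The walk-through above translates almost line by line into a small simulation. This sketch is ours, not the authors' code. It assumes t-statistics with a pooled variance estimate over all arms, and the function names are hypothetical. It also uses far fewer replicates than the R = 100,000 employed in the paper.

```python
import math
import random
from statistics import fmean


def t_stats(groups, active):
    """t-statistics for the active arms versus control (groups[0]).

    The pooled SD over all arms is an assumed choice of variance estimate.
    """
    means = [fmean(g) for g in groups]
    ss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s = math.sqrt(ss / (sum(len(g) for g in groups) - len(groups)))
    n0 = len(groups[0])
    return {k: (means[k] - means[0]) / (s * math.sqrt(1 / len(groups[k]) + 1 / n0))
            for k in active}


def simulate_global_null(n, e, f, reps=2000, seed=1):
    """Estimate the FWER and ESS(0) for J = K = 2 with simultaneous stopping."""
    rng = random.Random(seed)
    errors, total = 0, 0
    for _ in range(reps):
        # Stage 1: n N(0, 1) responses in each of arms 0, 1 and 2.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
        T1 = t_stats(groups, [1, 2])
        size = 3 * n
        if max(T1.values()) >= e[0]:
            errors += 1                      # a null rejected at stage 1: FWE
        else:
            # Arms with f1 <= T_k1 < e1 continue; the rest are dropped.
            cont = [k for k in (1, 2) if T1[k] >= f[0]]
            if cont:                         # stage 2: control + continuing arms
                for k in [0] + cont:
                    groups[k] += [rng.gauss(0.0, 1.0) for _ in range(n)]
                size += n * (1 + len(cont))
                T2 = t_stats(groups, cont)
                errors += int(max(T2.values()) >= e[1])
        total += size
    return errors / reps, total / reps


# Illustrative (not optimised) boundaries with e2 = f2.
fwer_hat, ess0_hat = simulate_global_null(n=20, e=[2.3, 2.2], f=[0.7, 2.2])
```

Estimating the power or ESS(δ) in the same way only requires drawing the experimental arms' responses with means $\delta_k$ instead of 0.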
Subsets of these datasets are then used to form the responses for any smaller value of n. Next, for any n, $\mathbf{e}$, and $\mathbf{f}$, and the chosen stopping rule, the trial is conducted as specified above using the rth dataset. The values of ψ and ω on trial completion, denoted $\boldsymbol{\psi}^{(r)}$ and $\boldsymbol{\omega}^{(r)}$, are recorded. An approximation to $\alpha_{FWER}$ for this design is then

$$\hat{\alpha}_{FWER} = \frac{1}{R}\sum_{r=1}^{R} \mathbb{I}\{(\boldsymbol{\psi}^{(r)}, \boldsymbol{\omega}^{(r)}) \in \Xi_{rej}\}.$$

We can similarly compute approximations $\hat{\beta}_{power}$, $\widehat{ESS}(\mathbf{0})$, and $\widehat{ESS}(\boldsymbol{\delta})$ to $\beta_{power}$, $ESS(\mathbf{0})$, and $ESS(\boldsymbol{\delta})$. We cannot evaluate $\alpha_{FWER}$, $\beta_{power}$, and the ESSs exactly because of the complexity of the joint distribution of the test statistics. Thus, to find the optimal design, we minimise in n, $\mathbf{e}$, and $\mathbf{f}$ the function obtained from Eq. (2) by replacing these quantities with their Monte Carlo estimates. Furthermore, note that the requirement to generate datasets necessitates treating n as an integer. Thus, an algorithm that can simultaneously search over the discrete n and the continuous $\mathbf{e}$ and $\mathbf{f}$ is required. We achieve this using CEoptim in R [13]. Code to implement our method is available from https://sites.google.com/site/jmswason/supplementary-material.
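The data bookkeeping can be sketched as follows. The R datasets are generated once at a large group size, and subsets are taken for smaller n, so every candidate design is evaluated on the same random numbers. One benefit is that differences between candidate designs are not obscured by fresh simulation noise. Several details are our own choices rather than the paper's. Responses are stored per arm and per stage, and data under θ = δ are obtained by shifting standard normal responses by $\delta_k$, which is valid here because σ² = 1. All names are hypothetical.

```python
import random


def pregenerate(R, K, J, n_max, seed=2018):
    """data[r][k][j]: n_max independent N(0, 1) responses for arm k, stage j."""
    rng = random.Random(seed)
    return [[[[rng.gauss(0.0, 1.0) for _ in range(n_max)] for _ in range(J)]
             for _ in range(K + 1)] for _ in range(R)]


def stage_responses(dataset, k, j, n, shift=0.0):
    """First n stored responses of arm k in stage j (0-based), mean-shifted.

    Using a prefix means a design with group size n reuses exactly the
    random numbers seen by every design with a larger group size.
    """
    return [x + shift for x in dataset[k][j][:n]]


def mc_proportion(indicators):
    """Monte Carlo estimate, e.g. (1/R) * sum_r I{(psi_r, omega_r) in Xi_rej}."""
    return sum(indicators) / len(indicators)


data = pregenerate(R=3, K=2, J=2, n_max=10)
arm2_stage2 = stage_responses(data[1], k=2, j=1, n=4)          # under theta = 0
arm2_shifted = stage_responses(data[1], k=2, j=1, n=4, shift=0.5)  # theta_2 = 0.5
```

An optimiser then only ever changes n, $\mathbf{e}$, and $\mathbf{f}$; the stored datasets stay fixed throughout the search.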

Results

We consider examples based on the TAILoR trial [2], which tested three experimental treatments. We therefore take K = 3 and, as an example, set J = 2, $\sigma^2 = 1$, α = 0.05, and β = 0.1. Conforming to our recommendations above, we additionally take $w_1 = w_2 = w_3 = 1/3$ (the ‘balanced-optimal design’). As in previous work [6], we consider two scenarios: for Scenario 1 we set $\delta_1 = 0.545$ and $\delta_0 = 0.178$, and for Scenario 2, $\delta_1 = 1$ and $\delta_0 = 0$. For both scenarios and both stopping rules, we determined the balanced-optimal design for t-test statistics using the Monte Carlo method described above. We denote the optimal parameter values for this design by $n_t$, $\mathbf{f}_t$, $\mathbf{e}_t$. For comparison, we use the triangular designs [14] for z-test statistics, which can be found using the MAMS package in R [2]. We denote their parameter values by $n_z$, $\mathbf{f}_z$, $\mathbf{e}_z$. These designs are so named for the shape of their stopping regions. They can be found quickly and have been shown to confer good performance, in terms of their ESSs, for MAMS trials [12]. The resultant designs are given in Table 1. Note that, by construction of the triangular test, its boundaries are the same in every instance, subject to numerical error. The group size is influenced by the choice of stopping rule and by the values of $\delta_1$ and $\delta_0$.
Table 1

The triangular designs determined using the known variance test statistics, and the balanced-optimal designs determined using the unknown variance test statistics, are displayed for the two considered trial design scenarios, and the two considered stopping rules. All boundaries are given to three decimal places.

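| Scenario | Stopping rule | Triangular: $n_z$ | $\mathbf{f}_z$ | $\mathbf{e}_z$ | Balanced-optimal: $n_t$ | $\mathbf{f}_t$ | $\mathbf{e}_t$ |
|---|---|---|---|---|---|---|---|
| Scenario 1 | Simultaneous | 45 | $(0.777, 2.197)^\top$ | $(2.330, 2.197)^\top$ | 41 | $(0.606, 2.084)^\top$ | $(2.742, 2.084)^\top$ |
| Scenario 1 | Separate | 43 | $(0.777, 2.198)^\top$ | $(2.330, 2.197)^\top$ | 40 | $(0.721, 2.052)^\top$ | $(2.925, 2.052)^\top$ |
| Scenario 2 | Simultaneous | 13 | $(0.777, 2.197)^\top$ | $(2.330, 2.197)^\top$ | 12 | $(0.603, 2.010)^\top$ | $(2.942, 2.010)^\top$ |
| Scenario 2 | Separate | 13 | $(0.777, 2.197)^\top$ | $(2.330, 2.197)^\top$ | 12 | $(0.668, 2.086)^\top$ | $(2.990, 2.086)^\top$ |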
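Because the maximal sample size is (K + 1)Jn, the group sizes in Table 1 translate directly into maximal sample sizes. The short computation below is our own illustration, using K = 3 and J = 2 as in the examples. It makes explicit that each unit reduction in n saves (K + 1)J = 8 patients in the worst case.

```python
K, J = 3, 2  # three experimental arms and two stages, as in the TAILoR-based examples

# (scenario, stopping rule) -> (n_z for the triangular design, n_t for the
# balanced-optimal design), copied from Table 1.
group_sizes = {
    ("Scenario 1", "simultaneous"): (45, 41),
    ("Scenario 1", "separate"): (43, 40),
    ("Scenario 2", "simultaneous"): (13, 12),
    ("Scenario 2", "separate"): (13, 12),
}

max_sizes = {
    key: tuple((K + 1) * J * n for n in pair) for key, pair in group_sizes.items()
}
for (scenario, rule), (tri, opt) in max_sizes.items():
    print(f"{scenario}, {rule}: triangular {tri}, balanced-optimal {opt}")
```

The printed maximal sample sizes are 360 versus 328 and 344 versus 320 for Scenario 1, and 104 versus 96 for both stopping rules in Scenario 2.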
We then examined, using 100,000 trial simulations, the performance of the following approaches as a function of the true variance $\sigma_T^2$:

1. $n_z$, $\mathbf{f}_z$, $\mathbf{e}_z$ (the triangular design) with z-test statistics and the presumed value of $\sigma^2$;
2. $n_z$, $\mathbf{f}_z$, $\mathbf{e}_z$ (the triangular design) with t-test statistics;
3. $n_z$, $\mathbf{f}_z$, $\mathbf{e}_z$ (the triangular design) with t-test statistics, and modification of $\mathbf{f}_z$ and $\mathbf{e}_z$ using quantile substitution. That is, at interim analysis j we replace $e_{zj}$ and $f_{zj}$ by $e_{zj}' = T_{\nu_j}^{-1}\{\Phi(e_{zj})\}$ and $f_{zj}' = T_{\nu_j}^{-1}\{\Phi(f_{zj})\}$. These are the points whose upper-tail probabilities under the t-distribution equal $1 - \Phi(e_{zj})$ and $1 - \Phi(f_{zj})$. Here $T_{\nu}$ is the cumulative distribution function of Student's t-distribution with ν degrees of freedom, $\nu_j$ is the degrees of freedom of the variance estimate at analysis j, and Φ is the standard normal cumulative distribution function;
4. $n_t$, $\mathbf{f}_t$, $\mathbf{e}_t$ (the balanced-optimal design) with t-test statistics.

The results of these comparisons are given in Table 2. In both scenarios, and using either stopping rule, assuming known variance results in large inflation of the FWER when $\sigma_T^2 > \sigma^2$. In contrast, Approaches 3 and 4 control the FWER far more accurately in all cases, with Approach 4 controlling it at the nominal level on slightly more occasions overall. Moreover, whilst the power is comparable for Approaches 3 and 4, Approach 4 always attains a lower value of $\widehat{ESS}(\mathbf{0})$.
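Quantile substitution is simple to reproduce. The sketch below uses only the Python standard library, so it evaluates the t cumulative distribution function by Simpson's rule and inverts it by bisection rather than calling a statistics package. The function names are hypothetical. The degrees of freedom in the example (ν = 176 = 4 × 45 − 4, a pooled variance estimate over four arms after stage one of the Scenario 1 triangular design) are an illustrative assumption.

```python
import math
from statistics import NormalDist


def t_cdf(x, nu, steps=2000):
    """CDF of Student's t with nu df: 0.5 +/- Simpson integral of the pdf on [0, |x|]."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def pdf(u):
        return c * (1 + u * u / nu) ** (-(nu + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    half = s * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(p, nu, lo=-50.0, hi=50.0, iters=60):
    """Inverse of t_cdf by bisection (adequate for moderate p and nu)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def quantile_substitute(boundary, nu):
    """t boundary with the same upper-tail probability, 1 - Phi(boundary)."""
    return t_quantile(NormalDist().cdf(boundary), nu)


# Stage-one boundaries of the Scenario 1 triangular design (Table 1);
# nu = 4 * 45 - 4 = 176 assumes a pooled variance estimate over four arms.
e1_t = quantile_substitute(2.330, 176)
f1_t = quantile_substitute(0.777, 176)
```

Both boundaries move slightly outward, with the efficacy boundary moving more because the t and normal distributions differ most in their tails.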
Table 2

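The estimated familywise error-rate ($\hat{\alpha}_{FWER}$), power ($1 - \hat{\beta}_{power}$), and expected sample sizes (ESSs) when $\boldsymbol{\theta} = \mathbf{0}$ ($\widehat{ESS}(\mathbf{0})$) and $\boldsymbol{\theta} = \boldsymbol{\delta}$ ($\widehat{ESS}(\boldsymbol{\delta})$) of the four considered approaches (A1–A4) are shown as the true variance $\sigma_T^2$ varies. Results are given for the two considered trial design scenarios and the two considered stopping rules. Rejection rates and ESS values are given to four and one decimal places, respectively.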

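Simultaneous stopping rule designs

| Factor | Approach | Scenario 1, $\sigma_T^2$ = 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | Scenario 2, $\sigma_T^2$ = 0.25 | 0.5 | 1.0 | 2.0 | 4.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\alpha}_{FWER}$ | A1 | 0.0000 | 0.0035 | 0.0499 | 0.1816 | 0.3421 | 0.0000 | 0.0035 | 0.0495 | 0.1820 | 0.3450 |
| | A2 | 0.0508 | 0.0508 | 0.0518 | 0.0517 | 0.0514 | 0.0582 | 0.0561 | 0.0556 | 0.0570 | 0.0557 |
| | A3 | 0.0491 | 0.0492 | 0.0501 | 0.0497 | 0.0496 | 0.0519 | 0.0497 | 0.0500 | 0.0503 | 0.0495 |
| | A4 | 0.0493 | 0.0490 | 0.0504 | 0.0504 | 0.0487 | 0.0510 | 0.0487 | 0.0495 | 0.0494 | 0.0489 |
| $1 - \hat{\beta}_{power}$ | A1 | 0.9981 | 0.9776 | 0.9078 | 0.7986 | 0.6949 | 0.9970 | 0.9740 | 0.9100 | 0.8120 | 0.7140 |
| | A2 | 1.0000 | 0.9952 | 0.9080 | 0.6314 | 0.3541 | 1.000 | 0.9960 | 0.9090 | 0.6330 | 0.3610 |
| | A3 | 0.9999 | 0.9951 | 0.9068 | 0.6276 | 0.3498 | 1.000 | 0.9960 | 0.9030 | 0.6180 | 0.3450 |
| | A4 | 0.9999 | 0.9939 | 0.9017 | 0.6258 | 0.3516 | 1.000 | 0.9960 | 0.9010 | 0.6210 | 0.3530 |
| $\widehat{ESS}(\mathbf{0})$ | A1 | 194.2 | 210.6 | 224.6 | 225.3 | 216.2 | 56.2 | 60.9 | 64.8 | 65.0 | 62.6 |
| | A2 | 223.8 | 224.0 | 224.4 | 224.3 | 224.0 | 64.7 | 64.7 | 64.7 | 64.6 | 64.8 |
| | A3 | 223.9 | 224.1 | 224.5 | 224.4 | 224.1 | 64.8 | 64.8 | 64.8 | 64.7 | 64.9 |
| | A4 | 216.0 | 216.1 | 216.7 | 216.3 | 216.2 | 63.6 | 63.5 | 63.5 | 63.5 | 63.6 |
| $\widehat{ESS}(\boldsymbol{\delta})$ | A1 | 216.4 | 222.1 | 222.6 | 217.6 | 208.8 | 60.6 | 62.0 | 62.6 | 61.9 | 60.2 |
| | A2 | 180.3 | 190.4 | 222.5 | 246.8 | 252.0 | 52.1 | 54.7 | 62.5 | 68.3 | 69.9 |
| | A3 | 180.3 | 190.9 | 223.6 | 247.8 | 252.8 | 52.1 | 55.2 | 63.4 | 69.2 | 70.5 |
| | A4 | 165.6 | 190.5 | 232.6 | 251.3 | 250.3 | 48.7 | 55.9 | 66.4 | 70.7 | 70.6 |

Separate stopping rule designs

| Factor | Approach | Scenario 1, $\sigma_T^2$ = 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | Scenario 2, $\sigma_T^2$ = 0.25 | 0.5 | 1.0 | 2.0 | 4.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{\alpha}_{FWER}$ | A1 | 0.0000 | 0.0035 | 0.0494 | 0.1820 | 0.3410 | 0.0000 | 0.0035 | 0.0507 | 0.1818 | 0.3461 |
| | A2 | 0.0509 | 0.0519 | 0.0519 | 0.0517 | 0.0522 | 0.0569 | 0.0561 | 0.0567 | 0.0575 | 0.0568 |
| | A3 | 0.0489 | 0.0500 | 0.0501 | 0.0499 | 0.0504 | 0.0501 | 0.0497 | 0.0504 | 0.0509 | 0.0499 |
| | A4 | 0.0494 | 0.0501 | 0.0497 | 0.0498 | 0.0508 | 0.0504 | 0.0498 | 0.0499 | 0.0506 | 0.0497 |
| $1 - \hat{\beta}_{power}$ | A1 | 0.9970 | 0.9720 | 0.9060 | 0.8110 | 0.7260 | 0.9975 | 0.9747 | 0.9096 | 0.8168 | 0.7292 |
| | A2 | 1.0000 | 0.9960 | 0.9050 | 0.6220 | 0.3490 | 1.0000 | 0.9964 | 0.9080 | 0.6347 | 0.3625 |
| | A3 | 1.0000 | 0.9960 | 0.9040 | 0.6170 | 0.3440 | 1.0000 | 0.9960 | 0.9020 | 0.6183 | 0.3462 |
| | A4 | 1.0000 | 0.9950 | 0.9000 | 0.6220 | 0.3520 | 1.0000 | 0.9953 | 0.8992 | 0.6215 | 0.3536 |
| $\widehat{ESS}(\mathbf{0})$ | A1 | 185.6 | 201.2 | 217.0 | 224.1 | 222.5 | 56.1 | 60.9 | 65.5 | 67.8 | 67.3 |
| | A2 | 216.6 | 216.3 | 217.0 | 216.6 | 216.7 | 65.5 | 65.4 | 65.5 | 65.5 | 65.6 |
| | A3 | 216.5 | 216.3 | 217.0 | 216.6 | 216.7 | 65.5 | 65.4 | 65.5 | 65.5 | 65.6 |
| | A4 | 205.7 | 205.6 | 206.2 | 205.7 | 205.8 | 62.7 | 62.6 | 62.7 | 62.7 | 62.8 |
| $\widehat{ESS}(\boldsymbol{\delta})$ | A1 | 271.1 | 270.4 | 263.5 | 250.0 | 234.7 | 63.4 | 67.7 | 70.7 | 70.9 | 68.8 |
| | A2 | 253.5 | 255.8 | 263.3 | 263.3 | 255.3 | 61.8 | 64.2 | 70.7 | 73.9 | 73.2 |
| | A3 | 254.3 | 256.6 | 264.0 | 263.9 | 255.6 | 61.8 | 64.6 | 71.4 | 74.4 | 73.4 |
| | A4 | 254.1 | 257.9 | 263.9 | 257.9 | 245.9 | 59.4 | 65.6 | 72.1 | 73.0 | 71.0 |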
In the Supplementary material, we also present and discuss our findings for the PWER.

Discussion

In this article, we extended previous work for two-armed group sequential trials to allow the design parameters of a MAMS t-test to be optimised, when employing either a simultaneous or a separate stopping rule. For the considered examples, the method was successful in providing operating characteristics close to their nominal levels. It is important to note that, by Eq. (1), the FWER is controlled under the global null hypothesis ($\boldsymbol{\theta} = \mathbf{0}$). This is known to provide strong control under the assumption of known variance with z-test statistics [2]. However, it is not known whether this is the case for the t-test statistics considered here. Intuitively, it seems reasonable that Eq. (1) would provide strong control in this setting. Nevertheless, a search over the vector θ should be performed after initial design determination to verify this. In conclusion, our method provides an alternative to the heuristic quantile substitution procedure for dealing with unknown variance. Quantile substitution offers a quick, often effective means of controlling the FWER relatively accurately. However, if precise control of the FWER is vital, the proposed method should be preferable, and it additionally allows the stopping boundaries to be optimised. In certain circumstances it can therefore be expected to allow the determination of more efficient designs.
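One way to carry out such a check is a brute-force search. The FWER is estimated by simulation over a grid of treatment-effect vectors in which at least one null hypothesis is true, and the largest value is reported. The sketch below does this for J = K = 2 with the simultaneous stopping rule. It is our illustration, not a procedure from the paper. It assumes σ² = 1 and a pooled variance estimate, and its function names and grid are hypothetical. A grid is only a coarse search, and the paper's setting (K = 3) would need more replicates and a finer grid.

```python
import itertools
import math
import random
from statistics import fmean


def t_stats(groups, active):
    """Pooled-SD t-statistics for the active arms versus control (groups[0])."""
    means = [fmean(g) for g in groups]
    ss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s = math.sqrt(ss / (sum(len(g) for g in groups) - len(groups)))
    n0 = len(groups[0])
    return {k: (means[k] - means[0]) / (s * math.sqrt(1 / len(groups[k]) + 1 / n0))
            for k in active}


def fwer_at(theta, n, e, f, reps, seed):
    """FWER at theta = (theta_1, theta_2): P(reject some H0(k) with theta_k <= 0)."""
    rng = random.Random(seed)
    mu = (0.0,) + tuple(theta)
    errors = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu[k], 1.0) for _ in range(n)] for k in range(3)]
        T1 = t_stats(groups, [1, 2])
        rejected = [k for k in (1, 2) if T1[k] >= e[0]]
        cont = [k for k in (1, 2) if f[0] <= T1[k] < e[0]]
        if not rejected and cont:  # simultaneous rule: any rejection stops the trial
            for k in [0] + cont:
                groups[k] += [rng.gauss(mu[k], 1.0) for _ in range(n)]
            T2 = t_stats(groups, cont)
            rejected = [k for k in cont if T2[k] >= e[1]]
        errors += int(any(theta[k - 1] <= 0 for k in rejected))
    return errors / reps


def worst_case_fwer(n, e, f, grid, reps=2000, seed=7):
    """Largest estimated FWER, and where it occurs, over grid points with a true null."""
    return max(
        (fwer_at(theta, n, e, f, reps, seed), theta)
        for theta in itertools.product(grid, repeat=2)
        if min(theta) <= 0
    )
```

Reusing the same seed at every grid point keeps the comparison between configurations free of extra simulation noise.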
References

1.  Admissible two-stage designs for phase II cancer clinical trials that incorporate the expected sample size under the alternative hypothesis.

Authors:  Adrian P Mander; James M S Wason; Michael J Sweeting; Simon G Thompson
Journal:  Pharm Stat       Date:  2012-01-10       Impact factor: 1.894

2.  Group sequential t-test for clinical trials with small sample sizes across stages.

Authors:  Jun Shao; Huaibao Feng
Journal:  Contemp Clin Trials       Date:  2007-03-01       Impact factor: 2.226

3.  Optimal design of multi-arm multi-stage trials.

Authors:  James M S Wason; Thomas Jaki
Journal:  Stat Med       Date:  2012-07-23       Impact factor: 2.373

4.  Group sequential clinical trials with triangular continuation regions.

Authors:  J Whitehead; I Stratton
Journal:  Biometrics       Date:  1983-03       Impact factor: 2.571

5.  More multiarm randomised trials of superiority are needed.

Authors:  Mahesh K B Parmar; James Carpenter; Matthew R Sydes
Journal:  Lancet       Date:  2014-07-26       Impact factor: 79.321

6.  Optimal multistage designs for randomised clinical trials with continuous outcomes.

Authors:  James M S Wason; Adrian P Mander; Simon G Thompson
Journal:  Stat Med       Date:  2011-12-05       Impact factor: 2.373

7.  Multi-arm group sequential designs with a simultaneous stopping rule.

Authors:  S Urach; M Posch
Journal:  Stat Med       Date:  2016-08-23       Impact factor: 2.373

8.  Some recommendations for multi-arm multi-stage trials.

Authors:  James Wason; Dominic Magirr; Martin Law; Thomas Jaki
Journal:  Stat Methods Med Res       Date:  2012-12-12       Impact factor: 3.021

9.  Type I error rates of multi-arm multi-stage clinical trials: strong control and impact of intermediate outcomes.

Authors:  Daniel J Bratton; Mahesh K B Parmar; Patrick P J Phillips; Babak Choodari-Oskooei
Journal:  Trials       Date:  2016-07-02       Impact factor: 2.279

