Abstract
Multi-arm group sequential clinical trials are efficient designs for comparing multiple treatments to a control. They allow testing for treatment effects already at interim analyses and can have a lower average sample number than fixed sample designs. Their operating characteristics depend on the stopping rule. We consider simultaneous stopping, where the whole trial is stopped as soon as the null hypothesis of no treatment effect can be rejected for any of the arms, and separate stopping, where recruitment is stopped only for arms in which a significant treatment effect has been demonstrated, while the other arms continue. For both stopping rules, the family-wise error rate can be controlled by the closed testing procedure applied to group sequential tests of intersection and elementary hypotheses. The group sequential boundaries for the separate stopping rule also control the family-wise error rate if the simultaneous stopping rule is applied. However, we show that for the simultaneous stopping rule, one can apply improved, less conservative stopping boundaries for local tests of elementary hypotheses. We derive corresponding improved Pocock and O'Brien Fleming type boundaries as well as boundaries that optimize the power or the average sample number, and we investigate the operating characteristics and small sample properties of the resulting designs. To control the power to reject at least one null hypothesis, the simultaneous stopping rule requires a lower average sample number than the separate stopping rule. This comes at the cost of a lower power to reject all null hypotheses. Some of this loss in power can be regained by applying the improved stopping boundaries for the simultaneous stopping rule. The procedures are illustrated with clinical trials in systemic sclerosis and narcolepsy.
Keywords: closed testing; early stopping; multi-arm multi-stage designs; multiple comparisons; multiple treatment arms
Year: 2016 PMID: 27550822 PMCID: PMC5157767 DOI: 10.1002/sim.7077
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Figure 1. The type I error rate to reject H as a function of δ when applying the simultaneous or the separate stopping rule with boundaries v_1, v_2 satisfying (2) (dashed curves), or the simultaneous stopping rule with improved boundaries v_1′, v_2′ = v_2, where v_1′ solves (4) (solid curves), for O'Brien Fleming boundaries (left graph) and Pocock boundaries (right graph). No futility bound is applied (l_1 = −∞). The horizontal dashed lines show the nominal α level and the levels corresponding to v_1′ and v_1.
Pocock and O'Brien Fleming type boundaries for the intersection and the elementary null hypotheses if no binding futility stopping rule is applied (l_1 = −∞), with equal per-arm, per-stage allocation (r = 1, n_1/n = 1/2). The global boundaries (u_1, u_2) fulfill Equation (1). The elementary boundaries (v_1, v_2) computed for the separate stopping rule satisfy (2); v_1′ is calculated for the simultaneous stopping rule to achieve (4) with v_2′ = v_2.
| Boundary type | Intersection hypothesis: u_1 | Intersection hypothesis: u_2 | Elementary hypotheses: v_1 | Elementary hypotheses: v_1′ | Elementary hypotheses: v_2 |
|---|---|---|---|---|---|
| Pocock | 2.42 | 2.42 | 2.18 | 1.97 | 2.18 |
| O'Brien Fleming | 3.14 | 2.22 | 2.80 | 2.08 | 1.98 |
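The elementary boundaries in the table can be checked numerically. The sketch below is a rough check, not the authors' computation. It assumes (the excerpt does not show Equation (2)) that the condition is the usual one-sided level requirement P(Z_1 ≥ v_1 or Z_2 ≥ v_2) = α with α = 0.025, and that the two cumulative z-statistics are bivariate normal with correlation sqrt(n_1/n). The function name `two_stage_type1_error` is illustrative.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_stage_type1_error(v1, v2, info_frac=0.5, steps=4000):
    """Type I error of a two-stage one-sided z-test with efficacy bounds v1, v2.

    Assumption: corr(Z_1, Z_2) = sqrt(info_frac), no futility stopping.
    Computes 1 - P(Z_1 < v1, Z_2 < v2) with Simpson's rule over z1.
    """
    rho = math.sqrt(info_frac)
    cond_sd = math.sqrt(1.0 - rho * rho)   # sd of Z_2 given Z_1
    lower = -8.0                            # effectively minus infinity
    h = (v1 - lower) / steps                # steps must be even
    total = 0.0
    for i in range(steps + 1):
        z1 = lower + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * normal_pdf(z1) * normal_cdf((v2 - rho * z1) / cond_sd)
    prob_no_rejection = total * h / 3.0
    return 1.0 - prob_no_rejection

for name, (v1, v2) in {"Pocock": (2.18, 2.18), "O'Brien Fleming": (2.80, 1.98)}.items():
    print(name, round(two_stage_type1_error(v1, v2), 4))
```

Under these assumptions both tabulated pairs should give a type I error close to the nominal 0.025, up to the rounding of the boundaries to two decimals.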
Operating characteristics of the separate stopping design (Sep.), the simultaneous stopping design (Sim.) and the improved simultaneous stopping design (Imp.) with Pocock and O'Brien Fleming type boundaries and n_1 = n/2, r = 1: disjunctive power, conjunctive power and average sample number (ASN) under different effect sizes. The maximum sample size N is chosen to achieve a disjunctive power of 0.9 for δ = 0.5 in one treatment arm and δ = 0 in the other. Settings with l_1 = −∞ are designs without a futility stopping boundary.
| Boundary type | l_1 | Effect size (1) | Effect size (2) | Disj. power | Conj. power: Sep. | Conj. power: Sim. | Conj. power: Imp. | ASN: Sep. | ASN: Sim. | N |
|---|---|---|---|---|---|---|---|---|---|---|
| Pocock | −∞ | 0.5 | 0.5 | 0.970 | 0.890 | 0.689 | 0.756 | 230 | 205 | 324 |
| Pocock | −∞ | 0.5 | 0 | 0.904 | 0.025 | 0.016 | 0.025 | 292 | 232 | 324 |
| Pocock | −∞ | 0 | 0 | 0.025 | 0.004 | 0.003 | 0.004 | 323 | 322 | 324 |
| O'Brien Fleming | −∞ | 0.5 | 0.5 | 0.970 | 0.894 | 0.716 | 0.840 | 260 | 241 | 300 |
| O'Brien Fleming | −∞ | 0.5 | 0 | 0.906 | 0.025 | 0.012 | 0.024 | 287 | 261 | 300 |
| O'Brien Fleming | −∞ | 0 | 0 | 0.025 | 0.004 | 0.004 | 0.004 | 300 | 300 | 300 |
| Pocock | 0 | 0.5 | 0.5 | 0.970 | 0.889 | 0.687 | 0.755 | 230 | 205 | 324 |
| Pocock | 0 | 0.5 | 0 | 0.903 | 0.025 | 0.016 | 0.025 | 253 | 215 | 324 |
| Pocock | 0 | 0 | 0 | 0.025 | 0.004 | 0.003 | 0.004 | 251 | 250 | 324 |
| O'Brien Fleming | 0 | 0.5 | 0.5 | 0.970 | 0.891 | 0.711 | 0.836 | 259 | 240 | 300 |
| O'Brien Fleming | 0 | 0.5 | 0 | 0.905 | 0.025 | 0.012 | 0.024 | 276 | 238 | 300 |
| O'Brien Fleming | 0 | 0 | 0 | 0.025 | 0.004 | 0.004 | 0.004 | 233 | 233 | 300 |
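To make the link between boundaries, disjunctive power and ASN concrete, here is a small Monte Carlo sketch of the simultaneous stopping design with Pocock boundaries and no futility stopping. It rests on several assumptions that the excerpt does not state: known variance (z-statistics), a maximum sample size N split equally over two treatment arms plus a shared control and over two stages, and intersection boundaries that are at least as large as the elementary ones, so that the trial stops and rejects at least one hypothesis exactly when the larger z-statistic crosses u_1 or u_2. The function name and its parameters are illustrative.

```python
import math
import random

def simulate_simultaneous(delta_a, delta_b, n_total=324, u1=2.42, u2=2.42,
                          n_sims=50_000, seed=1):
    """Estimate disjunctive power and ASN under simultaneous stopping.

    Assumptions (not from the source): z-statistics with known variance,
    n_total split equally over 3 arms and 2 stages, no futility stopping.
    """
    rng = random.Random(seed)
    n_stage = n_total / 6.0                        # patients per arm and stage
    drift_a = delta_a * math.sqrt(n_stage / 2.0)   # mean of a stage-wise z
    drift_b = delta_b * math.sqrt(n_stage / 2.0)
    root2 = math.sqrt(2.0)

    def stage_z():
        # The shared control mean induces correlation 0.5 between the arms.
        control = rng.gauss(0.0, 1.0)
        za = (rng.gauss(0.0, 1.0) - control) / root2 + drift_a
        zb = (rng.gauss(0.0, 1.0) - control) / root2 + drift_b
        return za, zb

    rejected_any = 0
    stopped_early = 0
    for _ in range(n_sims):
        za1, zb1 = stage_z()
        if max(za1, zb1) >= u1:          # reject at the interim, stop the trial
            rejected_any += 1
            stopped_early += 1
            continue
        za_inc, zb_inc = stage_z()       # independent second-stage increments
        za2 = (za1 + za_inc) / root2     # cumulative z with equal stage sizes
        zb2 = (zb1 + zb_inc) / root2
        if max(za2, zb2) >= u2:
            rejected_any += 1

    p_continue = 1.0 - stopped_early / n_sims
    asn = n_total / 2.0 * (1.0 + p_continue)
    return rejected_any / n_sims, asn

power, asn = simulate_simultaneous(0.5, 0.0)
print(f"disjunctive power ~ {power:.3f}, ASN ~ {asn:.0f}")
```

If the assumptions hold, the estimates for effect sizes (0.5, 0) should land near the tabulated disjunctive power of 0.904 and ASN of 232, with small deviations from Monte Carlo error and from the rounded boundaries.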
Figure 2. The average sample number and conjunctive power for different values of the effect sizes δ of the two treatment arms, with l_1 = 0. The average sample number is the same for the simultaneous stopping design as for the improved stopping design (dashed lines). The maximum sample size N is chosen to achieve a disjunctive power of 90% when one effect size is 0.5 and the other is 0. For settings where one effect size is 0, so that only one alternative hypothesis is true, no conjunctive power is shown.
Characteristics of the optimized separate (Sep.), simultaneous (Sim.) and improved simultaneous (Imp.) designs: stopping boundaries, average sample number (ASN) under H_1 and under the global null hypothesis H_0 (both effect sizes 0), maximum sample size (N), and the conjunctive and disjunctive power under H_1. The power and, for designs with no futility stopping (where l_1 = −∞), the ASN are optimized under the alternative H_1 specified in the table. For designs with futility stopping, the mean of the ASN under H_1 and the ASN under the global null hypothesis is optimized. The maximum sample size N is chosen such that the disjunctive power is 90% when one effect size is 0.5 and the other is 0. The columns v_i (v_i′), i = 1, 2, give the stopping boundary v_i for the separate and simultaneous designs and the boundary v_i′ for the improved simultaneous design.
| Design | Effect size (1) | Effect size (2) | l_1 | u_1 | u_2 | v_1 (v_1′) | v_2 (v_2′) | ASN under H_1 | ASN under H_0 | N | Power: conj. | Power: disj. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sep. | 0.50 | 0.50 | −∞ | 2.47 | 2.38 | 2.05 | 2.38 | 225 | 317 | 318 | 0.85 | 0.97 |
| Sim. | 0.50 | 0.50 | −∞ | 2.41 | 2.43 | 2.06 | 2.37 | 205 | 322 | 324 | 0.71 | 0.97 |
| Imp. | 0.50 | 0.50 | −∞ | 2.41 | 2.43 | 2.00 | 2.06 | 205 | 322 | 324 | 0.76 | 0.97 |
| Sep. | 0.50 | 0.00 | −∞ | 2.79 | 2.26 | 2.11 | 2.26 | 279 | 300 | 300 | 0.02 | 0.90 |
| Sim. | 0.50 | 0.00 | −∞ | 2.42 | 2.42 | 2.04 | 2.42 | 232 | 322 | 324 | 0.02 | 0.90 |
| Imp. | 0.50 | 0.00 | −∞ | 2.42 | 2.42 | 2.00 | 2.06 | 232 | 322 | 324 | 0.02 | 0.90 |
| Sep. | 0.50 | 0.50 | 0.91 | 2.55 | 2.33 | 2.07 | 2.33 | 228 | 200 | 330 | 0.84 | 0.97 |
| Sim. | 0.50 | 0.50 | 0.91 | 2.51 | 2.35 | 2.10 | 2.28 | 211 | 203 | 336 | 0.71 | 0.97 |
| Imp. | 0.50 | 0.50 | 0.91 | 2.51 | 2.35 | 1.98 | 2.12 | 211 | 203 | 336 | 0.76 | 0.97 |
| Sep. | 0.50 | 0.00 | 0.94 | 2.68 | 2.28 | 2.10 | 2.28 | 235 | 199 | 330 | 0.02 | 0.90 |
| Sim. | 0.50 | 0.00 | 0.89 | 2.58 | 2.32 | 2.10 | 2.28 | 216 | 200 | 330 | 0.02 | 0.90 |
| Imp. | 0.50 | 0.00 | 0.88 | 2.58 | 2.32 | 1.97 | 2.20 | 216 | 201 | 330 | 0.02 | 0.90 |
Figure 3. Closure principle for testing three hypotheses.
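The closure principle for three hypotheses can be sketched in a few lines: an elementary hypothesis is rejected only if every intersection hypothesis that contains it is rejected by its local test. The labels `H1`, `H2`, `H3` and the helper names below are hypothetical, and the local test is passed in as a plain function, so this illustrates the general principle rather than the group sequential local tests of the paper.

```python
from itertools import combinations

def closure(hypotheses):
    """All non-empty intersection hypotheses, as frozensets of elementary labels."""
    return [frozenset(subset)
            for size in range(1, len(hypotheses) + 1)
            for subset in combinations(hypotheses, size)]

def closed_test(hypotheses, local_rejects):
    """Reject H_i iff every intersection hypothesis containing H_i is rejected locally."""
    family = closure(hypotheses)
    return {h for h in hypotheses
            if all(local_rejects(s) for s in family if h in s)}

hypotheses = ["H1", "H2", "H3"]

# Hypothetical outcome of the local tests: everything is rejected except
# the intersection of H2 and H3 and the elementary hypothesis H3.
not_rejected = {frozenset({"H2", "H3"}), frozenset({"H3"})}
rejected = closed_test(hypotheses, lambda s: s not in not_rejected)

print(len(closure(hypotheses)))  # 7 intersection hypotheses for three arms
print(sorted(rejected))
```

In this example only `H1` is rejected: `H2` passes its own local test, but the intersection of `H2` and `H3` is retained, which blocks its rejection.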
Pocock and O'Brien Fleming type boundaries for the intersection of three and of two hypotheses and for the elementary hypotheses if no binding futility stopping rule is applied (l_1 = −∞), with r = 1 and n_1/n = 1/2.
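| Boundary type | Intersection of three: stage 1 | Intersection of three: stage 2 | Intersection of two: stage 1 | Intersection of two: stage 1, improved | Intersection of two: stage 2 | Elementary: stage 1 | Elementary: stage 1, improved | Elementary: stage 2 |
|---|---|---|---|---|---|---|---|---|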
| Pocock | 2.56 | 2.56 | 2.42 | 2.21 | 2.42 | 2.18 | 1.96 | 2.18 |
| O'Brien Fleming | 3.33 | 2.36 | 3.14 | 2.23 | 2.22 | 2.80 | 1.96 | 1.98 |
Operating characteristics of the different three‐arm designs for Pocock and O'Brien Fleming design types with equal allocation: disjunctive power, conjunctive power and average sample number (ASN) under different parameter configurations, and maximum sample size N for a disjunctive power of 0.9 when one effect size is 0.5 and the other two are 0.
| Boundary type | l_1 | Effect size (1) | Effect size (2) | Effect size (3) | Disj. power | Conj. power: Sep. | Conj. power: Sim. | Conj. power: Imp. | ASN: Sep. | ASN: Sim. | N |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pocock | −∞ | 0.5 | 0.5 | 0.5 | 0.99 | 0.72 | 0.49 | 0.60 | 330 | 279 | 464 |
| Pocock | −∞ | 0.5 | 0.5 | 0 | 0.97 | 0.011 | 0.008 | 0.014 | 395 | 297 | 464 |
| Pocock | −∞ | 0.5 | 0 | 0 | 0.90 | 0.003 | 0.001 | 0.003 | 431 | 336 | 464 |
| Pocock | −∞ | 0 | 0 | 0 | 0.025 | 0.0007 | 0.0005 | 0.0010 | 463 | 461 | 464 |
| O'Brien Fleming | −∞ | 0.5 | 0.5 | 0.5 | 0.98 | 0.80 | 0.54 | 0.76 | 373 | 334 | 424 |
| O'Brien Fleming | −∞ | 0.5 | 0.5 | 0 | 0.97 | 0.015 | 0.004 | 0.019 | 398 | 351 | 424 |
| O'Brien Fleming | −∞ | 0.5 | 0 | 0 | 0.90 | 0.003 | 0.0008 | 0.003 | 412 | 376 | 424 |
| O'Brien Fleming | −∞ | 0 | 0 | 0 | 0.025 | 0.0008 | 0.0007 | 0.0009 | 424 | 424 | 424 |
Operating characteristics for a clinical trial in narcolepsy with a standardized effect size of 0.86 in all three treatment arms. The sample size is chosen for a disjunctive power of 90% if only one treatment is effective (effect sizes 0.86, 0 and 0).
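| Design | Intersection of three: stage 1 | Intersection of three: stage 2 | Intersection of two: stage 1 | Intersection of two: stage 2 | Elementary: stage 1 | Elementary: stage 2 | ASN | N | Power: conj. | Power: disj. |
|---|---|---|---|---|---|---|---|---|---|---|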
| Sep. | 2.63 | 2.50 | 2.36 | 2.50 | 2.02 | 2.50 | 108 | 152 | 0.81 | 0.98 |
| Sim. | 2.40 | 2.88 | 2.24 | 2.88 | 1.97 | 2.88 | 101 | 184 | 0.59 | 0.99 |
| Imp. | 2.40 | 2.88 | 2.22 | 2.50 | 1.96 | 2.20 | 101 | 184 | 0.63 | 0.99 |
Figure 4. The FWER as a function of the effect size δ of one treatment when the effect size of the other is 0, for the separate (green), simultaneous (blue) and improved simultaneous (red) designs using z‐test O'Brien Fleming boundaries (dashed) or the nominal p‐value approach (solid) applied to t‐statistics. No futility bound is applied. Each scenario is based on 10^6 simulation runs. The FWER under the global null hypothesis (both effect sizes 0) for the nominal p‐value approach (z‐test), represented by the full (empty) dot, is the same for all three designs. The dashed horizontal line denotes the nominal level α = 0.025. Left graph: maximum total sample size N = 60; right graph: N = 120.
Operating characteristics of the group sequential designs in the systemic sclerosis example. The average sample number and conjunctive power are computed for δ/σ = 0.4 in both treatment arms. The maximum sample size N is chosen such that the disjunctive power is 80% when δ/σ = 0.4 for one treatment and δ = 0 for the other. The rejection and futility boundaries are optimized as in Section 4.
| Design | u_1 | u_2 | l_1 | v_1 (v_1′) | v_2 (v_2′) | Sample size | Sample size | Sample size | N | Power: conj. | Power: disj. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sep. | 2.64 | 2.30 | 0.94 | 2.09 | 2.30 | 265 | 295 | 235 | 390 | 0.70 | 0.91 |
| Sim. | 2.51 | 2.35 | 0.97 | 2.10 | 2.28 | 256 | 272 | 239 | 402 | 0.58 | 0.92 |
| Imp. | 2.52 | 2.35 | 0.97 | 1.99 | 2.07 | 256 | 272 | 239 | 402 | 0.64 | 0.92 |