Literature DB >> 35048391

An order restricted multi-arm multi-stage clinical trial design.

Alessandra Serra¹, Pavel Mozgunov¹, Thomas Jaki^1,2.

Abstract

One family of designs that can noticeably improve efficiency in later stages of drug development are multi-arm multi-stage (MAMS) designs. They allow several arms to be studied concurrently and gain efficiency by dropping poorly performing treatment arms during the trial as well as by allowing to stop early for benefit. Conventional MAMS designs were developed for the setting, in which treatment arms are independent and hence can be inefficient when an order in the effects of the arms can be assumed (eg, when considering different treatment durations or different doses). In this work, we extend the MAMS framework to incorporate the order of treatment effects when no parametric dose-response or duration-response model is assumed. The design can identify all promising treatments with high probability. We show that the design provides strong control of the family-wise error rate and illustrate the design in a study of symptomatic asthma. Via simulations we show that the inclusion of the ordering information leads to better decision-making compared to a fixed sample and a MAMS design. Specifically, in the considered settings, reductions in sample size of around 15% were achieved in comparison to a conventional MAMS design.

Entities: Chemical

Keywords: adaptive designs; infectious diseases; multi-arm multi-stage; order restriction

Mesh：

Year: 2022 PMID： 35048391 PMCID： PMC7612618 DOI： 10.1002/sim.9314

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.497

Introduction

Drug development is costly and time consuming. One family of clinical trial designs that can improve the development process are multi-arm multi-stage designs (MAMS). In aMAMS trial, insufficiently promising treatments can be dropped or the trial can be stopped due to overwhelming benefit at a series of interim analyses. To date these designs have focused on the setting of independent treatment arms and have been argued to be a highly efficient approach to clinical trials. They could, however, be suboptimal if an “order” (ie, a monotonic relationship) among the treatment effects can be assumed. Such an order can occur naturally, for example, when multiple doses or administration schedules of the same treatment are tested or when nested combinations of treatments are investigated. Another area where an order can often be assumed is when considering different treatment durations. In infectious diseases such as Tuberculosis (TB) and Hepatitis B (HBV), the treatment duration with current standard regimes is lengthy which results in a large burden on the patients, potentially high costs, increased risk of non-compliance and side effects. In TB and HBV, for example, treatment periods of 6 and 12 months are typical. Novel treatments or combinations of treatments in these areas offer the opportunity for both higher efficacy and shorter treatment periods. In the setting of multiple treatment durations Quartagno et al have proposed to model the duration-response curve.While this is an efficient way to understand the duration-effect relationship, it is less clear how to definitively conclude whether a duration is “better” than the current standard. In this work, we extend the MAMS framework and propose a design that incorporates the order of treatment effects in the decision-making when no parametric dose-response or duration-response model is assumed. The objective of the design is to identify all promising arms (eg, treatment durations, doses, or combination of treatments), including the one associated with the smallest relevant treatment effect. The rest of the manuscript continues as follows. A case study is introduced in Section 2 before a detailed description of the 3-arm and 2-stage design is provided in Section 3. Section 4 then generalizes the proposed design to an arbitrary number of arms and stages and provides some theoretical results. Section 5 revisits the case study before the design is evaluated via simulations in Section 6. In Section 7, the effect of various critical bounds on the operating characteristics of the proposed design is explored.We conclude with a discussion.

Case Study Setting

The Tiotropium add-on therapy in adolescents with moderate asthma: A 1-year randomized controlled trial (NCT01257230) is a Phase III study that assessed the efficacy and safety of once-daily tiotropium via Respimat added to inhaled corticosteroid (ICS) with or without a leukotriene receptor antagonist in adolescent patients with moderate symptomatic asthma. Patients were randomized with equal probability to receive 5 μg (2 puffs of 2.5 μg) or 2.5 μg (2 puffs of 1.25 μg) of once-daily tiotropium or placebo (2 puffs). The primary outcome was change from baseline in peak FEV1 within 3 h after dosing (peak FEV1[0−3) measured after 24 weeks of treatment. The null hypotheses were tested in a stepwise manner to control the type I error starting from the highest dose suggesting that a monotonic dose-response relationship can be assumed.

A 3-Arm 2-Stage Order Restricted Design

In this section, we develop an order restricted design (ORD) for the setting of the case study. We denote the highest dose (5μg) by T 1 and the lower dose (2.5μg) by T 2. The generalization to an arbitrary number of arms and stages is given in Section 4. Assume that a patien’s response follows a normal distribution with known common variance, σ 2. An alternative approach is outlined in Section 8 for the case of unknown variance. Let be the observation of the ith patient on treatment k (the control arm is denoted by 0) and be the number of patients on arm k up to stage j. Let be the true treatment effect of active arm k ∈ {1,2} compared to the control. We denote the vector of treatment effects by Consider the following order relationship: implying that the treatment effect of the second treatment is at most as large as the treatment effect for the first treatment. Let be the ratio between the number of subjects allocated to treatment k ∈ {0,1,2} and control at each stage j with Let be the test statistic at stage j for comparing arm k ∈ {1,2} to control, where and with k ∈ {0,1,2} and n is the sample size in the control group at the first stage. The vector of test statistics follows a multivariate normal distribution with and the covariances between Z-statistics are with with with with We test the null hypotheses: with the global null hypothesis denoted by Let and be the critical values at stage j for T 1 and T 2, respectively, used to test the hypotheses, with The proposed design then takes into account the order among the treatment effects when making the decisions at the first stage and the final analysis and a set of decision rules consistent with this order is given in Table 1. For example, if both Z-statistics cross the upper bounds at the interim analysis, the trial is stopped for efficacy (as in a conventional MAMS design). In contrast to the traditional MAMS design, the trial continues if there is contradicting evidence with respect to the order, for example, if crosses the upper bound, but there is not enough evidence to claim superiority of T 1 to control, then both arms are continued to the next stage.

Table 1

Combination of the decision rules in the 3-arm 2-stage trial with θ(1) ≥ θ(2)

	Zj(1)≥uj(1)	lj(1)<Zj(1)<uj(1)	Zj(1)≤lj(1)
Zj(2)≥uj(2)	Stop: select T ₁,T ₂	Proceed with T ₁, T ₂	Proceed with T ₁, T ₂
lj(2)<Zj(2)<uj(2)	Proceed with T ₂	Proceed with T ₁, T ₂	Drop both arms
Zj(2)≤lj(2)	Stop: select T ₁	Proceed with T ₁	Drop both arms

Note: Cells colored in red correspond to contradicting evidence.

The idea behind these decision rules is that at any stage the effectiveness of T 2 can be claimed only if T 1 can be declared superior to the control. Therefore, T 1 can be regarded as a gatekeeper. Following this procedure, depending on the context, alternative decisions could be considered for the cells colored in red in Table 1 (see Section 2 of the Supporting Information for more discussion).

Family-wise error rate

For confirmatory clinical trials, control of the family wise error rate (FWER) in the strong sense at level α, that is the probability to reject at least one true null hypothesis, is often required.16 Using the rules described in Table 1, the FWER for the 3-arm 2-stage ORD can be written as Equation (1) shows that the events used for the computation of the type I error under the global null hypothesis ({Reject H 01 and H 02}, {Reject H 01 not H are a subset of the events ({Reject H ∪ H 02}) used in the MAMS design of Magirr et al. Thus, the probability of rejecting at least one hypothesis under the global null will be smaller for the ORD compared to the MAMS design, while the probability of rejecting neither hypothesis will be smaller for MAMS if the same bounds are used. It is worth noting that overall, the critical bounds, if these are the same for all active treatments for the 3-arm 2-stage ORD are smaller in each stage compared to the MAMS design of Magirr et al (see Section 7 of the Supporting Information). Consequently the ORD design is strictly more powerful than the MAMS design under these assumptions. The critical bounds for the given treatment arm can be defined as function of a (possibly arm-specific) parameter, that is, which can be searched over a grid of values for a in order to strongly control the FWER at level α. If a (1) = a (2) = a, then a unique solution can be found restricting the search over α such that the expression given in Equation (1) is below α under the global null hypothesis. In this case, the solution is unique either when the search is based on different boundary shapes or when the same boundary shapes are used for all experimental arms— for all k∈ {1,2}. If a ( are not the same for each arm, additional constraints are required for the uniqueness of the solution and to maintain the strong control of the FWER at level α, such as the control under the partial null hypotheses. In the 3-arm setting, for example, this is (θ (1), 0). However, different values of θ (1) can provide different boundaries, and so the solution is unique for the specific value of θ (1) (see Section 7 for more details). Theorem 1 below then shows that, if the same bounds, are used for each arm, and the same allocation ratios (with respect to the control) are used for all active treatments, the FWER is maximized under the global null hypothesis and hence the above ensures strong control of the FWER. Theorem 1. Consider a 3-arm 2-stage ORD design and denote the global null hypothesis by Let 0 be a vector where at least one treatment effect is less or equal to 0. Then, The proofs of all theorems are given in Section 1 of the Supporting Information.

Power requirement

To power the study, we consider the configuration where and δ 0 is the minimum clinically relevant difference. The ORD is then powered at (1 - β) to reject both hypotheses under when Equation (2) is satisfied: Theoretical considerations and numerical evaluations have shown that, if Pocock boundaries for both treatments are used, and critical values found such that Equation (1) is below α, the power of the ORD design is, practically, no smaller than a fixed balanced sample design with the same sample size. Furthermore, for a number of treatment effects, it was found to be strictly positive. Full details of these considerations are given in Section 3 of the Supporting Information.

Generalization Of The Ord To K-Arm J-Stage

Consider a clinical trial with K − 1 active treatment arms, against a control treatment and J stages and denote the treatment effect comparing treatment k against control by θ We denote the vector of treatment effects by The null hypotheses of interest are Let denote the test statistic based on all data up to stage j for comparison as before and assume that the following order relationship holds: Let be the critical values at stage j with The decision rules at the interim analyses follow the same principle as for the 3-arm 2-stage design defined above. The decisions are made in order to be able to select all promising treatment arms at the end of the trial and H 0 can only be rejected if all have been rejected. Once H 0 has been rejected, the recruitment to arms is stopped, where ℒ j is the lowest index of a treatment arms remaining in the trial at stage j. If there is contradicting evidence with respect to the order at stage j, that is when and there is at least one such that then recruitment to these arms continue. As for the 3-arm 2-stage design, if there is sufficient evidence to drop arm k, that is when and if there is any contradicting evidence for then the recruitment to arms is stopped, where is the highest index on the treatment arms remaining in the trial at stage j. A general algorithm for the decision-making in this setting is given in Algorithm 1. Let ℒ and be the lowest and highest indices on the active treatment arms remaining in the trial at stagej, respectively. At stage j, compute for Stop recruitment to arm k at stage j for efficacy if for all Stop recruitment to arm k at stage j for futility if and for all or and for at least one and for no Stop the trial when the recruitment to all arms is stopped. Let M be a random variable representing the number of arms (including the control) at stage j when H 01 failed to be rejected at stage j − 1. Because of the hierarchy in testing the hypotheses, the FWER for an K-arm J-stage design can be written as where P (M) is the probability that at the previous stage H failed to be rejected and the number of arms were at least m. One can show that the following iterative equality holds with at the first stage and 0 otherwise. The set defines the event that H 01 failed to be rejected at stage j − 1 when the number of treatment arms (including the control arm) in the trial were c. This set is formally defined in Table 2. In the definition of the superscript (c − 1) indicates the number of active treatment arms that are still in the trial at stage j − 1, while the subscript c − m + 1 refers to the number of active treatment arms (that is equal to c − m) that have been dropped before reaching the stage j.

Table 2

Sets of events for K-arm J-stage design with

Set	Definition
Cj(k)	{lj−1(k)<Zj−1(k)<uj−1(k)}
Sj(k)	{Zj−1(k)≤lj−1(k)}
Ej(k)	{((Cj(1)∪Sj(1))∩(Cj(k)∪Sj(k)))∪(Ej(k−1)∩Cj(k))}
Ej(0)	Ω
Aj,1(k),k>0	Ej(k)
Aj,2(k),k>1	Ej(k−1)∩Sj(k)
Aj,k+1−s(k),k>2,s=1:k−2	Ej(s)∩Sj(s+1)∩t=s+2k(Cj(t)∪Sj(t))

Note: Ω is the whole sample space.

While the expression for the FWER in the general case is cumbersome, for a fixed number of stages (arms), the FWER for K-arm (J-stage) can be found iteratively—an example for 2-stage trials is given in Section 4 of the Supporting Information. The critical bounds for a K-arm J-stage ORD can be, again, defined as functions of parameters a ( such that Thus, as for the 3-arm 2-stage setting, under the constraint of a unique solution can be found to control the FWER at level α in the strong sense. Theorem 2 below then shows that, if the same bounds, are used for each arm and the same allocation ratios (with respect to the control) are used for all active treatments, the FWER is maximized under the global null hypothesis and hence the above ensures strong control of the FWER. Theorem 2. Consider a K-arm J-stage ORD design and denote the global null hypothesis by Then, In the next session, a simulation study will be described in order to apply the proposed design in the context of the asthma trial.

Case Study

Setting

We revisit the results of the clinical trial of Tiotropium add-on therapy in adolescents with moderate asthma: A 1-year randomized controlled trial (NCT01257230). Patients were randomized in a 1:1:1 ratio to receive 5μg or 2.5μg of once-daily tiotropium or placebo. The null hypotheses were tested in a stepwise manner to control the type I error at level α = 0.025. The study was powered at 80% to detect a difference of 120 mL between treatments in the change from baseline of peak FEV1[0-3 assuming a common SD of 340 mL. It was found that 127 patients per group were needed, resulting in a maximum sample size of 381 patients. The trial is revisited using the ORD, which can be applied assuming a monotonic dose-response relationship. In line with the original trial we assume that the change from baseline of peak FEV1[0-3 is normally distributed with SD σ = 340, k ∈ {0,1, 2}, and common baseline mean FEV1 of μ (0)= 2747. As in the original study, we consider an improvement of FEV1 of 120 of interest and hence consider the following values for and = (120 120). The design is powered at 80% to reject all hypotheses or at least one hypothesis (in order to compare the sample sizes between the ORD and the original trial design) when all doses have the same effect compared to the placebo. Additionally to the achieved power, the efficiency of the proposed design is measured by its expected sample size (ESS), that is the mean number of patients recruited to the trial before it is terminated. We consider one- and two-stage ORD designs and note that the one-stage ORD design corresponds to the hierarchical testing strategy used in the original design. For the two-stage design the interim analysis takes place after half of the total sample size has been observed and triangular critical bounds are used. The numerical results found using R19 and 106 replicate simulations.

Numerical results

Consider the 3-arm 1-stage ORD using the maximum total sample size of 381 patients that corresponds to the same maximum total sample size originally planned for the study. In this setting, the critical bound at the final analysis is Table 3 describes the results of the simulation.

Table 3

Results of the simulations that revisit the NCT01257230 trial using the ORD

Design powered to reject all hypotheses
θ ₍₁₎	θ ₍₂₎	Max. SS	Stages	Reject all	Reject H ₀₁ not H ₀₂	Reject at least one H _0k	ESS
0	0	474	1	0.005	0.020	0.025	474.00
		534	2	0.004	0.021	0.025	316.39
120	0	474	1	0.025	0.856	0.881	474.00
		534	2	0.025	0.854	0.879	371.83
120	120	474	1	0.803	0.078	0.881	474.00
		534	2	0.802	0.081	0.883	399.81
Design powered to reject at least one hypothesis
θ ₍₁₎	θ ₍₂₎	Max. SS	Stages	Reject all	Reject H ₀₁ not H ₀₂	Reject at least one H _0k	ESS
0	0	381	1	0.005	0.021	0.025	381.00
		426	2	0.004	0.021	0.025	252.43
120	0	381	1	0.025	0.779	0.803	381.00
		426	2	0.024	0.774	0.798	304.67
120	120	381	1	0.691	0.112	0.803	381.00
		426	2	0.684	0.117	0.802	331.89

Note: For the two-stage design, the triangular bounds are used. Proportions refer to 106 replications and values of interest are in bold. Abbreviations: ESS, expected sample size; Max. SS, maximum sample size.

It can be seen that the FWER is controlled at level α = 0.025 under all considered null scenarios. For the scenarios where there is at least one dose that is superior to control, the probability to reject at least one hypothesis is 80% as required in the original study. Note that 80% is the probability of rejecting at least one dose considering any rejections and not only correct rejections. Therefore, when no interim analyses are planned, the ORD requires the same maximum sample size as in the original study if it is powered to reject at least one hypothesis. It is worth noting that the probability of rejecting all hypotheses, if all of them are true, is around 69%. The distinguishing feature of the proposed ORD is that it allows to include interim analyses during the trial. Table 3 shows the operating characteristics of the design when an interim analysis takes place after observing half of the maximum sample sizes. If the study is powered to reject at least one hypothesis at 80%, a maximum sample size of 426 patients is needed—45 more patients than for the 1-stage design. The gain from using a two-stage design arises in terms of the expected sample size—on average the number of patients is expected to be below 332 under each scenario. At the same time, under this power configuration, the probability to identify the smallest promising dose is at 68% similar to the single-stage design. Furthermore, it is argued by the construction of the ORD proposed in Section 3 that if it is of interest to identify the lowest dose with the promising treatment effect, the trial should be powered to reject all hypotheses. In order to identify the lowest effective dose with the desirable 80% probability, a single-stage trial would require 474 patients and the two-stage design 534 patients. As before, the inclusion of an interim analysis and the use of triangular bounds lead to the reduction of the expected number of patients. Specifically, on average the number of patients is expected to be below 400 under each scenario. Overall, the ORD was found to reproduce the sample size calculation of the original study when no interim analyses were planned and is powered to reject at least one hypothesis. At the same time, the ORD allows the inclusion of interim analyses during the trial. When an interim analysis is planned during the trial, the ORD has shown to be an efficient design as it allows to stop the trial earlier or to drop unpromising doses with high probability before the end of the study. The results discussed in this section use triangular bounds. Qualitatively similar results using Pocock and O’Brien and Fleming boundaries are provided in Section 6 of the Supporting Information. To further investigate the design characteristics of the proposed ORD, an extensive simulation study is conducted in Section 6.

Numerical Evaluations

As in the motivating example, consider a clinical trial setting with 3 treatments arms, 2 experimental and a control with 1:1:1 allocation ratio. Consider a single-stage design (that is a fixed sample design (FSD) with hierarchical testing, FSD(h)) and a two-stage design with one pre-planned interim analysis at the middle of the trial. The objective of the trial is to find all promising treatment arms. The FWER is to be controlled below α = 0.05 and the power of trial is to be at least 80% to reject both hypotheses when both treatment arms have the same effect compared to the control. The patients’ responses on treatment arm k are assumed to have normal distribution with mean μ ( and SD σ = 1. For the control group μ (0) = 1, while for the treatment arms We fix the clinically relevant difference to be 0.5. Therefore, the scenario under which the ORD is powered is = (0.5, 0.5). We evaluate the performance of the design under various treatment effects configurations when the treatment effect on the first arm is fixed to be 0.5 and the treatment effect on the second arm is varied: and For the 3-arm 2-stage ORD, the bounds are found under the conditions a ( = a (ie, the same for each treatment arm) and they are found using a grid search—based on the triangular boundary shape for the all experimental treatments—over one single parameter. Thus, the solution found is unique.

Competing approaches

The proposed ORD is compared to the FSD, in which the total sample size is specified at the design stage of the trial and it is not subject to adaptations during the process of the trial. The hypotheses are tested at the end of the trial only and any hypothesis can be rejected independently of the other. The second comparator is the MAMS design by Magirr et al. However, the conventional MAMS design is not an appropriate comparator as it does stop as soon as at least one hypothesis is rejected. Therefore, the following modification of the MAMS design is considered for the comparison. At the interim analysis, if a Z-statistic corresponding to one arm crosses the upper or lower bound while another does not, the trial will still continue with the treatment that did not cross a bound. The design is referred to as the MAMS(m). The FWER expression for MAMS(m) is the same as derived by Magirr et al3 but the power expression changes. To power a 3-arm 2-stage design, for a given configuration of and a pre-specified level β, we search for the sample size that satisfies Two main simulation studies are conducted to compare the three competing approaches. In the first one, each design is constructed such that it yields 80% power to reject both null hypotheses under the alternative hypothesis. The second one compares the power between approaches based on the common sample size.

Same power requirement for all designs

In this subsection, the results when all design are powered at 80% to reject both null hypotheses are provided. The resulting design specifications and operating characteristics under the global null hypothesis are provided in Table 4.

Table 4

FWER, maximum sample sizes (Max. SS), expected sample sizes (ESS), and critical bounds under = (0, 0) for the 3-arm 2-stage ORD, 3-arm 1-stage ORD (FSD(h)), FSD, and MAMS(m) designs when all designs are powered at 80% to reject both hypotheses under = (0.5,0.5)

Design	u ₁,u ₂,l ₁	Max. SS	ESS	Reject at least one H _0k
FSD	1.917, -, 1.917	231	231.0	0.05
FSD(h)	1.644, -, 1.644	192	192.0	0.05
ORD	1.898, 1.789, 0.633	222	134.4	0.05
MAMS(m)	2.179, 2.055, 0.726	264	166.6	0.05

Note: 3-arm 2-stage ORD and MAMS(m) use triangular bounds. Results are provided using 106 replications.

Under = (0,0), all designs control the type I error at level α = 0.05 as expected. The total maximum sample sizes necessary to reach a power of 80% is 231 for the FSD, 192 for the FSD with a hierarchical test (FSD(h)) which is akin to a single-stage ORD design, 222 for the 3-arm 2-stage ORD and 264 for the MAMS(m) designs. The maximum sample size (to achieve the same power to reject both hypotheses) for the FSD is greater compared to FSD(h) and the two-stage ORD as it does not account for the hierarchy in testing. The designs’ performances under the configuration and are presented in Figure 1. When the second treatment arm is no different to control, θ (2) = 0, the probability of rejecting the null hypothesis for this arm is controlled at α = 0.05 for all designs and when =(0.5,0.5) all the designs satisfy the power requirement at 80%. However, under all other considered non-zero values of θ (2), the probability of rejecting both hypotheses is higher for the approaches accounting for the hierarchy, FSD(h) and ORD, than for other competing designs. The gain from using a two-stage ORD design compared to FSD(h) arises in terms of expected sample sizes which is strictly lower for the two-stage design. The 3-arm 2-stage ORD has noticeably lower expected sample size (ESS) compared to all other designs ranging from 13% to 35% depending on the simulation scenario. The largest difference in power (an increase of around 5%) between the 3-arm 2-stage ORD and the MAMS(m) design is achieved under θ (2) = 0.2.

Figure 1

Power and expected sample sizes (ESS) under θ = (0.5, θ (2)) and θ (2) ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5} for the FSD, 3-arm 1-stage ORD (FSD(h)), 3-arm 2-stage ORD, and MAMS(m) designs when all designs are powered at 80% to reject both hypotheses under θ = (0.5, 0.5). 3-arm 2-stage ORD and MAMS(m) use triangular bounds. Results are provided using 106 replications

Common sample size for all designs

In this subsection, the three designs are compared with a common maximum sample size of 222 patients, which is the maximum sample size necessary for the 2-stage ORD in order to reject both hypotheses at 80% under = (0.5, 0.5). The design specifications and operating characteristics of the designs under the global null hypothesis are provided in Table 5, while the designs’ performances under the configuration and are presented in Figure 2.

Table 5

FWER, maximum sample sizes (Max. SS), expected sample sizes (ESS), and critical bounds under = (0,0) for the 3-arm 2-stage ORD, 3-arm 1-stage ORD (FSD(h)), FSD, and MAMS(m) designs when all designs have the same common total sample size equal to 222 patients

Design	u ₁ ,u ₂ ,l ₁	Max. SS	ESS	Reject at least one H _0k
FSD	1.917, -, 1.917	222	222.0	0.05
FSD(h)	1.644, -, 1.644	222	222.0	0.05
ORD	1.898, 1.789, 0.633	222	134.4	0.05
MAMS(m)	2.179, 2.055, 0.726	222	140.1	0.05

Note: 3-arm 2-stage ORD, and MAMS(m) use triangular bounds. Results are provided using 106 replications.

Figure 2

Power and expected sample sizes (ESS) under θ = (0.5, θ (2)) and θ (2) ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5} for the FSD, 3-arm 1-stage ORD (FSD(h)), 3-arm 2-stage ORD, and MAMS(m) designs when all designs have the same common total sample size equal to 222 patients. 3-arm 2-stage ORD and MAMS(m) use triangular bounds. Results are provided using 106 replications

As expected all designs control the FWER under the global null. Under the scenario when the second treatment arm is no different to control, θ (2) = 0, the probability of rejecting the null hypothesis for this arm is controlled at α = 0.05. Under all other considered non-zero values of θ (2), the approaches accounting for the hierarchy, FSD(h) and ORD, result in higher power compared to the other competing designs. The gain from using a two-stage ORD design compared to FSD(h) can once more be seen in terms of expected sample sizes which is strictly lower for the two-stage design. The 2-stage ORD has lower ESS compared to the other designs with reductions between 3% and 32%. The largest difference in power—around 9.8%—between the ORD and the MAMS(m) design is achieved under θ (2) = 0.3. Overall, the ORD results in noticeable gains across all considered scenarios both in terms of power and expected sample size. Therefore, the inclusion of the order restriction into the decision rules for the decision-making can provide advantages in power and/or expected sample size compared to standard approach to multi-arm trials, specifically, the FSD and the MAMS(m).

Different Bounds For Each Treatment Arm

In the results above the same bounds are used for both treatments. However, the proposed design allows for different boundary shapes to be used for different treatments which could lead to potential benefit in terms of power. In this section, we explore the effect of various boundary shapes on the operating characteristics of the designs. We consider the setting as in Section 6.1 and let be the upper and lower bounds for T 1 at the stage j, and be the boundaries for T 2 at the stage j, being functions of a (1) and a (2), respectively. The critical bounds and the sample size could be searched over a grid of values for a (1) and a (2) in order to strongly control the FWER at level α and to satisfy the power requirements in Equation (2). To maintain strong control of the FWER at level α it is necessary to control the type I error when 0 = (θ (1), 0). Indeed, as shown in Section 1 of the Supporting Information, under 0 = (θ (1),0) it holds where tends to Thus, the bounds for the second arm are searched over a grid of values of a (2) as for a 2-arm design with 2 stages and then the bounds for T 1 are searched in order to satisfy Equation (1) under the null hypothesis. Finally, the sample size is searched to satisfy the power requirements, assuming equal allocation to all arms. Several combinations of boundary shapes are compared considering all possibilities with constant POC, O’Brien and Fleming OBF and triangular TRIAN bounds. Among all these nine combinations of bounds, a subset of six is selected. For each shape of the bounds for T 1, two combinations are selected. These are those that provide the smallest ESS and the highest power compared to the other ones. The combinations that are excluded are the ones that use constant bounds for T 1 and T 2, the combination with O’Brien and Fleming and constant bounds for T 1 and T 2 respectively and the combination with triangular and constant bounds for T 1 and T 2 respectively. The complete set of results is provided in Section 5 of the Supporting Information. The summary of operating characteristics, probability to reject both and probability to reject only one, and the expected sample size, for the proposed design using the six remaining combinations of boundary shapes is provided in Figure 3.

Figure 3

Probability of rejecting both hypotheses (left) and probability of rejecting the first but not the second hypothesis (right) under θ = (0.5, θ (2)) and θ (2) ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5} for the 3-arm 2-stage ORD design when it is powered at 80% to reject both hypotheses under θ = (0.5, 0.5). ORD uses the selected combination of bounds which control the type I error under θ = (∞, 0). Results are provided using 106 replications

The design resulting in the highest power in rejecting both hypotheses among the subset of selected bounds is the one that uses triangular bounds for the treatment associated to the highest effect and O’Brien and Fleming bounds for the arm associated to the lowest treatment effect. Indeed, in the way that O’Brien and Fleming bounds are constructed, if then the trial tends to stop later and the final decision on T 2 is based on more data. Therefore, the test becomes more powerful compared to the test that tends to make a final decision on T 2 earlier. This selection of bounds also corresponds to the smallest probability of rejecting the first hypothesis and not the second one (that is the probability of making an error when θ (2) < 0) compared to the other combinations. The combination that uses O’Brien and Fleming bounds for both treatment arms, the combination with Pocock bound for T 1 and O’Brien and Fleming bounds for T 2, and the combination of triangular and O’Brien and Fleming bounds result in similar power to reject both hypotheses but the first two combinations require lower maximum sample size compared to the latter—198 and 204 patients, respectively against 210 patients. At the same time, the three combinations differ in terms of ESS and in terms of probability of rejecting only the first hypothesis. Indeed, among these three combinations, the one with triangular bound for T 1 and O’Brien and Fleming bound for T 2 is the one with the smallest ESS and presents the smallest probability of rejecting only the first hypothesis. The combination that uses O’Brien and Fleming bound for T 1 and triangular bound for T 2 is the one with smallest power and highest probability of rejecting only the first hypothesis. While the combination that uses triangular bounds for both treatment arms is the one with the smallest ESS (indeed triangular bounds are constructed to minimize the ESS ) for each configuration of compared to the other ones, even though it is one of the combinations with the highest maximum sample size—222 patients. Nevertheless, this combination has slightly smaller power of rejecting both hypotheses and slightly smaller probability of rejecting only the first hypothesis compared to the combination with Pocock bounds for T 1 and triangular bound for T 2. Overall, the results suggest that there is a benefit, in terms of power and ESS, in using different bounds for each treatment arm. However, the final choice of the bounds will depend on the specific objectives of the trial. For example, in order to minimize the ESS it is recommended the use of triangular bounds for both treatment arms, whereas in order to maximize the power of rejecting all hypotheses it is recommended the use of triangular bounds for T 1 and O’Brien and Fleming bounds for T 2.

Discussion

The aim of the current study was to explore MAMS designs that could select the most promising arm associated to the minimum treatment effect. An approach that takes the order relationship among the treatment effects (when no parametric dose-response or duration-response model is assumed) into account has been proposed. In the proposed approach we claim the effectiveness of the arm associated to the minimum treatment effect compared to the control only if we claim that the treatment associated to the maximum effect is efficacious. Through theoretical arguments and extensive numerical evaluation we show that the proposed design can provide noticeable advantages in power and/or expected sample sizes required in the trial compared to the alternatives. The proposed design can be applied to a wide range of clinical trial settings where it could be assumed an order among the treatment effects. For example, in clinical trial designs applied to infectious diseases, such as TB and HBV, where it can be assumed that longer treatment durations correspond to a higher efficacy. In this case, the focus translates into the problem of selecting the shortest promising treatment duration. The proposed design can also be applied to clinical trial settings where nested combinations of treatments are tested against a common control arm. The proposed design is closely linked to the hierarchical procedures described in the literature for example, by Glimm et al, Tamhane et al. In particular, in the literature, the hierarchical procedure is applied for testing multiple end-points in a specific order. In some applications, the interest is to test the secondary endpoint just if the primary endpoint has been rejected. Therefore, the endpoints could be tested on hierarchically order and various strategies can be adopted to test the hypotheses depending on the study objectives. It can be noted that when non-binding futility boundaries are used in the 3-arm 2-stage ordered restricted design, the overall testing procedure coincides with the proposed method. In this study, it has been assumed that the common variance is known. However, the effect of this assumption is not negligible especially with small sample sizes. In this case, a possible approach would be to transform the individual test statistics using the function where T denotes the t distribution function with d degrees of freedom and Φ is the standard cumulative density function. More details of this approach are given in Jennison and Turnbull. Several avenues of future research present itself. First, focus has been given to superiority tests in this work. In certain diseases, such as in TB, non-inferiority designs are the norm and hence further research on non-inferiority hypothesis tests is of interest. Second, we assume that information time is the same for all treatments. When considering different durations of treatment, however, this information accumulates a different times and hence further work will consider optimal designs in this setting. Finally, we assume in this work that there is no uncertainty about the order of the treatment effects. Using a Bayesian framework, however, would naturally allow for uncertainty in that assumption to be considered.

17 in total

1. Combining different phases in the development of medical treatments within a single trial.

Authors: P Bauer; M Kieser
Journal: Stat Med Date: 1999-07-30 Impact factor: 2.373

2. Sequential designs for phase III clinical trials incorporating treatment selection.

Authors: Nigel Stallard; Susan Todd
Journal: Stat Med Date: 2003-03-15 Impact factor: 2.373

3. Gatekeeping procedures with clinical trial applications.

Authors: Alex Dmitrienko; Ajit C Tamhane
Journal: Pharm Stat Date: 2007 Jul-Sep Impact factor: 1.894

4. Hierarchical testing of multiple endpoints in group-sequential trials.

Authors: Ekkehard Glimm; Willi Maurer; Frank Bretz
Journal: Stat Med Date: 2010-01-30 Impact factor: 2.373

5. A gatekeeping procedure to test a primary and a secondary endpoint in a group sequential design with multiple interim looks.

Authors: Ajit C Tamhane; Jiangtao Gou; Christopher Jennison; Cyrus R Mehta; Teresa Curto
Journal: Biometrics Date: 2017-06-06 Impact factor: 2.571

6. Tiotropium add-on therapy in adolescents with moderate asthma: A 1-year randomized controlled trial.

Authors: Eckard Hamelmann; Eric D Bateman; Christian Vogelberg; Stanley J Szefler; Mark Vandewalker; Petra Moroni-Zentgraf; Mandy Avis; Anna Unseld; Michael Engel; Attilio L Boner
Journal: J Allergy Clin Immunol Date: 2016-03-05 Impact factor: 10.793

7. Adaptive designs in clinical trials: why use them, and how to run and report them.

Authors: Philip Pallmann; Alun W Bedding; Babak Choodari-Oskooei; Munyaradzi Dimairo; Laura Flight; Lisa V Hampson; Jane Holmes; Adrian P Mander; Lang'o Odondi; Matthew R Sydes; Sofía S Villar; James M S Wason; Christopher J Weir; Graham M Wheeler; Christina Yap; Thomas Jaki
Journal: BMC Med Date: 2018-02-28 Impact factor: 8.775