
Design of Phase II Non-inferiority Trials.

Sin-Ho Jung

Abstract

With the development of inexpensive treatment regimens and less invasive surgical procedures, we are increasingly confronted with non-inferiority study objectives. A non-inferiority phase III trial requires a roughly four times larger sample size than a similar standard superiority trial. Because of the large required sample size, we often face feasibility issues in opening a non-inferiority trial. Furthermore, due to the lack of phase II non-inferiority trial design methods, we do not have an opportunity to investigate the efficacy of the experimental therapy through a phase II trial. As a result, we often fail to open a non-inferiority phase III trial, and a large number of non-inferiority clinical questions remain unanswered. In this paper, we develop designs for non-inferiority randomized phase II trials with feasible sample sizes. First, we review a design method for non-inferiority phase III trials. Subsequently, we propose three different designs for non-inferiority phase II trials that can be used under different settings. Each method is demonstrated with examples, and each of the proposed design methods is shown to require a reasonable sample size for a non-inferiority phase II trial. The three designs address different settings but require similar sample sizes that are typical for phase II trials.

Keywords:  Binary outcome; non-inferiority margin; phase II/III trials; randomized phase II trials; sample size calculation; survival outcome

Year:  2017        PMID: 28798964      PMCID: PMC5546322          DOI: 10.1016/j.conctc.2017.04.008

Source DB:  PubMed          Journal:  Contemp Clin Trials Commun        ISSN: 2451-8654


Introduction

In a standard superiority trial, we usually want to show that an experimental therapy is more efficacious than a standard therapy, also called a control therapy. In this case, the null hypothesis is that the two therapies have equal efficacy and the alternative hypothesis is that the experimental therapy has higher efficacy than the standard therapy. Sometimes, we may want to prove that an experimental therapy may not be better than, but is not inferior to, a standard therapy. In this case, the experimental therapy is less extensive, less toxic or less expensive than the standard therapy, so that the former may be acceptable as long as there is evidence that its efficacy is not much worse than that of the latter. These types of studies are generally referred to as ‘non-inferiority trials’. Due to the directional nature of the hypotheses involved, the statistical tests used for non-inferiority trials are one-sided. This aspect of testing distinguishes non-inferiority trials from ‘equivalence trials’, where the objective is to find evidence against “no difference”. Sometimes, however, equivalence and non-inferiority have been used interchangeably. For example, Dunnett and Gent [1] and Mehta, Patel and Tsiatis [2] use the term ‘equivalence’ for the ‘non-inferiority’ type of problem. Durrleman and Simon [3] and Whitehead [4] review broad examples of non-inferiority trials and discuss sequential monitoring of such trials. Analysis of a non-inferiority trial requires specification of a non-inferiority margin. A key component of the design of such a trial is calculating the sample size required for a reasonable power, e.g. 80%–90%, to conclude that the regimens differ in efficacy by less than the specified non-inferiority margin when the two arms have an identical efficacy. A typical non-inferiority margin is very small. Rothman et al. [5] propose to use a non-inferiority margin of about half of the superiority margin associated with the benefit of the standard therapy over no therapy.
As a result, the required sample size for a non-inferiority trial becomes so large that a typical phase III non-inferiority trial is often considered infeasible. Hence, in such cases it might be useful to conduct a smaller phase II trial to collect some evidence of potential non-inferiority of an experimental therapy before proceeding to a non-inferiority phase III trial. Phase II trials are designed to screen out inefficacious experimental regimens before evaluating them in larger phase III trials. Accordingly, phase II trials should be completed quickly with small sample sizes. The sample size of a non-inferiority trial is determined by the type I error rate, the power and the non-inferiority margin, so we will have to compromise on the level of some of these design components for a small non-inferiority phase II trial. For an experimental regimen, the non-inferiority margin is determined regardless of the phase of a trial, so the only design components we can change for a small non-inferiority phase II trial are the type I error rate and the power. In this paper, we propose design and analysis methods for non-inferiority phase II trials. We focus on randomized phase II trial designs here, but corresponding single-arm phase II trial designs could also be considered.

Phase III non-inferiority trials: review (Test 1)

Let θ denote the parameter measuring the difference in efficacy between an experimental arm (arm 2) and a control arm (arm 1), such as the log-odds ratio for binary outcomes and the log-hazard ratio (log-HR) for survival outcomes. Without loss of generality, we assume that a large θ value means a higher efficacy for the experimental therapy under investigation and θ = 0 means an equal efficacy between the control and experimental therapies. Suppose that θ̂ is an efficient estimator of θ which is approximately N(θ, σ²/n) for large n, the total number of patients. Most commonly used estimators satisfy this condition, e.g. the MLE. For a chosen non-inferiority margin δ0 (< 0), we want to test H0: θ ≤ δ0 vs. Ha: θ > δ0. The type I error rate of a non-inferiority trial is the probability that we accept the experimental therapy when its efficacy is worse than that of the control by the margin or more. For a chosen type I error rate α, we reject H0 if

θ̂ > δ0 + z1−α σ̂/√n,   (1)

where z1−α denotes the 100(1−α)th percentile of the standard normal distribution and σ̂ denotes a consistent estimator of σ. Due to the directional hypothesis, we usually use a one-sided type I error rate in non-inferiority testing. In a typical phase III non-inferiority trial, the highest expectation for the experimental therapy is to have an efficacy similar to that of the control therapy. In this sense, we set θ = 0 under the alternative hypothesis for a power or sample size calculation. Then, the required sample size for a non-inferiority phase III trial is given as

n = σ²(z1−α + z1−β)²/δ0².   (2)

In a phase III trial, we choose α = 0.025 or 0.05 and a very small |δ0|, so that the resulting sample size is very large. Jung et al. [6] propose a sample size formula of the log-rank test for non-inferiority trials. By assuming σ̂ = σ and plugging (2) in (1), the non-inferiority test will reject H0 if

θ̂ > δ0 z1−β/(z1−α + z1−β).   (3)

Note that, with α and β smaller than 0.5, the rejection value on the right hand side of (3) lies in (δ0, 0), closer to δ0 if α > β and closer to 0 if α < β. Especially, if α = β, then this rejection rule is simplified to θ̂ > δ0/2.
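As a quick numerical sketch (not part of the paper), formulas (2) and (3) can be checked in a few lines of Python; the values of σ, δ0, α and β below are illustrative assumptions only.

```python
from statistics import NormalDist
from math import ceil

def z(p):
    """100p-th percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)

def ni_sample_size(sigma, delta0, alpha, beta):
    """Sample size (2): n = sigma^2 (z_{1-a} + z_{1-b})^2 / delta0^2,
    powered at theta = 0 against H0: theta <= delta0 (< 0)."""
    return ceil((sigma * (z(1 - alpha) + z(1 - beta)) / delta0) ** 2)

def ni_critical_value(delta0, alpha, beta):
    """Rejection value (3): reject H0 when theta_hat exceeds this value."""
    return delta0 * z(1 - beta) / (z(1 - alpha) + z(1 - beta))

# Illustrative (assumed) design: sigma = 2, margin delta0 = -0.223,
# one-sided alpha = 0.025, power 90%
n = ni_sample_size(2.0, -0.223, 0.025, 0.10)   # n = 846
c = ni_critical_value(-0.223, 0.025, 0.10)     # lies between delta0 and 0
print(n, round(c, 4))
```

With α = β the critical value reduces exactly to δ0/2, as noted in the text.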
Example 1. Suppose that the progression-free survival (PFS) of the control treatment has a median of 2 years, which corresponds to an annual hazard rate of λ1 = log 2/2 = 0.3466 under an exponential PFS distribution assumption. A hazard ratio (HR), Δ = λ2/λ1, smaller than Δ0 = 1.25, i.e. a log-HR larger than δ0 = −log 1.25 = −0.2231, is considered to be non-inferior, while the PFS of the experimental treatment is not expected to be longer than that of the control at best. We assume an accrual rate of r patients per year, an additional follow-up period of b years and 1-to-1 allocation (p1 = p2 = 1/2). Then, for 1-sided α = 0.025 and 1 − β = 0.9, the non-inferiority log-rank test of Jung et al. [6] requires n = 1090 patients (D = 846 events) by the sample size calculation method described in the Appendix. This sample size calculation is available from commercial software packages, such as East [7] or PASS [8], possibly using different approximations.
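The order of magnitude of the event count in Example 1 can be sketched with the standard large-sample approximation D = (z1−α + z1−β)²/(p1 p2 δ0²), which is formula (2) on the log-HR scale; the exact Appendix method may differ by a few events.

```python
from statistics import NormalDist
from math import ceil, log

def z(p):
    return NormalDist().inv_cdf(p)

def ni_logrank_events(hr_margin, alpha, beta, p1=0.5):
    """Approximate required events for the non-inferiority log-rank test
    with margin delta0 = -log(hr_margin), one-sided alpha, power 1-beta."""
    delta0 = log(hr_margin)
    return ceil((z(1 - alpha) + z(1 - beta)) ** 2 / (p1 * (1 - p1) * delta0 ** 2))

# HR margin 1.25, one-sided alpha = 0.025, power 90%
D = ni_logrank_events(1.25, 0.025, 0.10)
print(D)  # close to the 846 events quoted in the text
```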

Design of phase II non-inferiority trials

We investigate design and statistical testing for non-inferiority phase II trials. In Sections 3.1 and 3.2, we consider standard non-inferiority trials where the experimental regimen cannot be more efficacious than the control regimen. In Section 3.3, we consider the case where an experimental therapy is acceptable as long as it is non-inferior to a control therapy, but can possibly be slightly superior to it.

Based on point estimator (Test 2)

In order to lower the sample size of (2), the design parameters we can change are α, β and the non-inferiority margin. As an effort to lower the sample size for a phase II trial, we propose to lower the power 1 − β, say from 90% to 80%. Furthermore, in this section we propose to reject H0 if θ̂ > δ0. In other words, we accept the experimental therapy, or reject H0, for further investigation when the observed efficacy measure is larger than the specified non-inferiority margin δ0. This testing rule, called Test 2, corresponds to α = 0.5 in (1). Note that, in an interim futility analysis of a superiority phase III trial, it is not unusual to reject the experimental therapy and stop the trial early if the p-value from the interim analysis is larger than or equal to 0.5 [9], [10], [11]. Regarding a phase II trial as an interim analysis in a large-scale experiment consisting of a phase II trial and a consecutive phase III trial, α = 0.5 may be a reasonable choice for a phase II trial serving as an efficacy screening test before a large-scale phase III trial. In this case, it is easy to show that the required sample size of a non-inferiority phase II trial against Ha: θ = 0 is given as

n = σ² z1−β²/δ0².   (4)

Note that this formula corresponds to (2) with α = 0.5, which gives z1−α = 0. In order to investigate how successful our effort is to lower the sample size of a non-inferiority phase II trial, let αk, 1 − βk and nk denote the type I error rate, power and the required sample size, respectively, for a phase k (= II, III) trial. Then, from (2) and (4), we have

nII/nIII = z1−βII²/(z1−αIII + z1−βIII)².

Suppose that we choose 1 − βIII = 0.9, 1-sided αIII = 0.025 and 1 − βII = 0.8. Then, we have nII/nIII = 0.067. That is, before we conduct a non-inferiority phase III trial requiring 1000 patients, we may conduct a non-inferiority phase II trial to collect some evidence on the experimental therapy with fewer than 70 patients. This comparison is based on the assumption that the phase II trial randomizes patients between the experimental arm and a prospective control arm.
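The sample size saving of Test 2 relative to a phase III design is easy to verify numerically; a minimal sketch, assuming the design values quoted in the text (1-sided αIII = 0.025, 1 − βIII = 0.9, 1 − βII = 0.8):

```python
from statistics import NormalDist

def z(p):
    return NormalDist().inv_cdf(p)

# Ratio of phase II (Test 2, alpha = 0.5) to phase III non-inferiority sample sizes:
# n_II / n_III = z_{1-beta_II}^2 / (z_{1-alpha_III} + z_{1-beta_III})^2
ratio = z(0.8) ** 2 / (z(0.975) + z(0.9)) ** 2
print(round(ratio, 3))  # about 0.067, i.e. fewer than 70 patients per 1000
```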
If we have reliable historical control data, then we can further reduce the sample size by using a single-arm phase II trial design. Example 2. For a phase II trial, we assume the same design setting as in Example 1 except that 1-sided α = 0.5. Then, Test 2 requires n = 106 patients (D = 58 events) for 1 − β = 0.8 and n = 228 (D = 134) for 1 − β = 0.9, compared to n = 1090 (D = 846) for a phase III trial as described in Example 1. Considering α = 0.5 too liberal, one could choose a different α level. Table 1 lists the sample sizes for different α values under the same design setting. We observe that the sample size easily goes beyond what is feasible for a phase II trial when the α level is much smaller than 0.5.
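The event counts across the α grid of Table 1 follow the same approximate formula with varying α; a sketch (the exact Appendix method may differ by a few events from the tabulated values):

```python
from statistics import NormalDist
from math import ceil, log

def z(p):
    return NormalDist().inv_cdf(p)

def events(alpha, beta, hr_margin=1.25, p1p2=0.25):
    """Approximate events for a non-inferiority log-rank test with
    margin delta0 = -log(hr_margin) and allocation product p1*p2."""
    delta0 = log(hr_margin)
    return ceil((z(1 - alpha) + z(1 - beta)) ** 2 / (p1p2 * delta0 ** 2))

# Reproduce the shape of Table 1: rows 1-beta = 0.8, 0.9 across alpha levels
for a in (0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(a, events(a, 0.2), events(a, 0.1))
```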
Table 1

Sample size (number of events), n (D), under the design setting of Example 1 (δ0 = −0.2231, Δ0 = 1.25) and various values of α.

1−β   α = 0.025    0.05        0.1         0.2        0.3        0.4        0.5
0.8   854 (632)    700 (498)   538 (363)   364 (229)  254 (151)  172 (97)   106 (58)
0.9   1090 (846)   920 (690)   738 (529)   540 (364)  412 (264)  312 (191)  228 (134)

Based on a futility test (Test 3)

For a δ1 (< δ0 < 0), suppose that |δ1| is a generally accepted superiority margin for the patient population. In a phase II trial, we may want to make sure that the efficacy of the experimental therapy is not lower than that of the control by this margin. So, we want to test H0: θ ≤ δ1 vs. Ha: θ > δ1. We reject H0 if

θ̂ > δ1 + z1−α σ̂/√n,   (5)

called Test 3. The sample size of this test for a power of 1 − β under Ha: θ = 0 is given as

n = σ²(z1−α + z1−β)²/δ1².   (6)

Note that this sample size formula is identical to that for a standard superiority test. We now investigate how different this test is from the test based on the point estimator discussed in the previous section, Test 2. From (5), the critical value of Test 3 in terms of θ̂ is

δ1 + z1−α σ̂/√n.   (7)

By replacing n in (7) with (6) and assuming σ̂ = σ, the critical value in (7) is given as

δ1 z1−β/(z1−α + z1−β).   (8)

By equating (8) with the critical value of Test 2, δ0, we find that Tests 2 and 3 are identical as far as

δ0 = δ1 z1−β/(z1−α + z1−β).

Rothman et al. [5] suggest choosing a non-inferiority margin to be a half of a generally selected superiority margin. In this case, we have δ1 = 2δ0, and the two tests become identical if we choose α = β (as is often the case in phase II trials). With δ1 = 2δ0, Test 3 is more anti-conservative (conservative) than Test 2 when α > β (α < β). Example 3. We assume λ1 = 0.3466 and the same accrual, follow-up and allocation setting as in Example 1, Example 2, but under 1-sided α = 0.05, 0.1, 0.2, 1 − β = 0.8, 0.9, and various values of Δ1. Table 2 reports the required sample size n and number of events D under each of these design settings. Note that, with a small effect size such as Δ1 = 0.7 or 0.8, or a small α such as 0.05, we do not have a feasible sample size for a phase II trial. Furthermore, we find that the sample size (and the number of events) for Test 3 with Δ1 = 0.8 (|δ1| = 0.223) and a given α is identical to that of the α-varying design of Table 1 with the same α, whose margin is a half of the effect size |δ1| = 0.446 of the Δ1 = 0.64 row.
Table 2

Sample size (number of events), n (D), under λ1 = 0.3466, the design setting of Example 1, and various values of Δ1 (|δ1|), α and 1−β.

              α = 0.05               α = 0.1                α = 0.2
Δ1 (|δ1|)     1−β = 0.8  1−β = 0.9   1−β = 0.8  1−β = 0.9   1−β = 0.8  1−β = 0.9
0.6 (0.511)   168 (95)   226 (133)   126 (70)   179 (102)   83 (45)    128 (71)
0.64 (0.446)  214 (125)  286 (173)   162 (92)   227 (133)   106 (58)   163 (92)
0.7 (0.357)   318 (195)  420 (271)   241 (144)  335 (208)   160 (90)   242 (143)
0.8 (0.223)   700 (497)  919 (689)   538 (363)  737 (529)   363 (228)  539 (364)

When the experimental arm can be possibly better than the control arm (Test 4)

Suppose that we want to accept an experimental therapy as long as it is non-inferior to a control therapy, but the experimental therapy can possibly be slightly more efficacious than the control therapy. A new therapy with a lower toxicity can possibly have some modest improvement in efficacy [12]. In this case, for δ0 < 0 and δ2 ≥ 0, we may set the hypotheses as H0: θ = δ0 vs. Ha: θ = δ2 for the purpose of sample size calculation, and conduct the standard non-inferiority test (1) to reject the null hypothesis. The sample size for this test is calculated as

n = σ²(z1−α + z1−β)²/(δ2 − δ0)².   (9)

Assuming that the efficacy of the experimental therapy cannot be much higher than that of the control under Ha, we may set δ2 to be identical to −δ0, where the non-inferiority margin is chosen as a half of a superiority margin. In this case, the total margin δ2 − δ0 = −2δ0 will be identical to the standard superiority margin, so that the sample size (9) will be similar to that of a typical phase II trial for testing superiority of an experimental therapy. By replacing n in (1) with (9) and using the approximation σ̂ = σ, we reject H0 if

θ̂ > δ0 + (δ2 − δ0) z1−α/(z1−α + z1−β).   (10)

Note that this rejection value is larger than the rejection value (3) of the standard non-inferiority test, Test 1. If we set δ2 = 0 and α = 0.5, then we will reject H0 when θ̂ > δ0 as in Test 2. The critical values in terms of θ̂ of the four tests are summarized in Table 3.
Table 3

Critical values of the four tests under given α and β.

Test  Hypotheses                    Rejection rule (general)                  Simplified
1     H0: θ=δ0 vs. Ha: θ=0          θ̂ > δ0 + |δ0| z1−α/(z1−α + z1−β)          θ̂ > δ0/2 if α = β
2     H0: θ=δ0 vs. Ha: θ=0          θ̂ > δ0                                   θ̂ > δ0
3     H0: θ=δ1 vs. Ha: θ=0          θ̂ > δ1 + |δ1| z1−α/(z1−α + z1−β)          θ̂ > δ0 if α = β and δ1 = 2δ0
4     H0: θ=δ0 vs. Ha: θ=δ2         θ̂ > δ0 + (δ2 − δ0) z1−α/(z1−α + z1−β)     θ̂ > 0 if α = β and δ2 = −δ0
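The four rejection rules can be compared side by side numerically; a sketch with assumed values δ0 = −0.2231, δ1 = 2δ0, δ2 = −δ0 and α = β = 0.1:

```python
from statistics import NormalDist

def z(p):
    return NormalDist().inv_cdf(p)

def criticals(delta0, delta1, delta2, alpha, beta):
    """Critical values (theta_hat scale) of Tests 1-4 as summarized in Table 3."""
    w = z(1 - alpha) / (z(1 - alpha) + z(1 - beta))
    return {
        1: delta0 + abs(delta0) * w,
        2: delta0,
        3: delta1 + abs(delta1) * w,
        4: delta0 + (delta2 - delta0) * w,
    }

d0 = -0.2231
c = criticals(d0, 2 * d0, -d0, alpha=0.1, beta=0.1)
# alpha = beta: Test 1 -> delta0/2, Test 3 -> delta0, Test 4 -> 0
print({k: round(v, 4) for k, v in c.items()})
```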
Example 4. We consider a design setting similar to that of Example 1. Suppose that the PFS of the control treatment has an annual hazard rate of λ1 = 0.3466. We will not be interested in the experimental arm if its annual hazard rate is 0.3466 × 1.25 = 0.4332 or larger, using a non-inferiority margin for a HR of Δ0 = 1.25 (or a log-HR of δ0 = −0.2231), and will be highly interested in the experimental arm if its annual hazard rate is 0.3466 × 0.8 = 0.2773 or smaller, using a superiority margin for a HR of Δ2 = 0.8 (or a log-HR of δ2 = 0.2231). We assume the same accrual rate, additional follow-up period and 1-to-1 allocation (p1 = p2 = 1/2) as in Example 1. Then, by the sample size calculation method described in the Appendix, Test 4 requires the same number of events as Tests 2 and 3 designed with an effect size of δ2 − δ0 = 0.4462 and the same α, but its sample size is slightly larger than those of Tests 2 and 3 since the survival of the experimental arm for Test 4 is longer under Ha than that for the latter tests.

Example 5 (Alliance 51306). Alliance 51306 is an actual phase II trial with a primary objective to evaluate the PFS of patients with previously untreated mantle cell lymphoma. Although this study is designed as a single-arm trial, we illustrate it as a randomized two-arm trial to demonstrate the test statistic and sample size method presented in the Appendix. The control treatment is an intensive chemotherapy that is known to have a 3-year PFS of 60%, or an annual hazard rate of λ1 = −log(0.6)/3 = 0.1703 assuming an exponential PFS model. The patients in the experimental arm of this trial will be treated with a less intensive non-transplant induction regimen. We want to investigate whether the experimental treatment has a similar efficacy to the control treatment or not. We will not be interested in the experimental therapy if its 3-year PFS is 53% or lower, i.e. an annual hazard rate of λ2 = −log(0.53)/3 = 0.2116 or higher, for which the hazard ratio is 0.2116/0.1703 = 1.243 and the log-HR is δ0 = −0.217.
We would be highly interested if the log-HR is at least δ2 = 0.217, for which the 3-year PFS of the experimental arm is about 66%. Under these hypotheses, the required n and D for the chosen 1-sided α and power 1 − β are obtained by the method in the Appendix. These sample size calculations are based on the assumptions of an exponential PFS distribution, balanced allocation (p1 = p2 = 1/2), a monthly accrual of 3 to 4 patients, and 3 years of additional follow-up.
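The hazard-rate arithmetic in Example 5 follows directly from the exponential model S(t) = exp(−λt); a sketch:

```python
from math import exp, log

def annual_hazard(surv_prob, years):
    """Annual hazard rate lambda solving S(t) = exp(-lambda * t)."""
    return -log(surv_prob) / years

lam1 = annual_hazard(0.60, 3)   # control: 3-year PFS 60%
lam2 = annual_hazard(0.53, 3)   # margin:  3-year PFS 53%
delta0 = -log(lam2 / lam1)      # non-inferiority margin on the log-HR scale

# A log-HR of +0.217 corresponds to about a 66% 3-year PFS for the experimental arm
pfs3 = exp(-lam1 * exp(-0.217) * 3)
print(round(lam1, 4), round(lam2, 4), round(delta0, 3), round(pfs3, 2))
```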

Discussions and conclusions

We have presented some possible designs for phase II non-inferiority trials. These designs are intended to investigate whether an experimental therapy shows any evidence of non-inferiority to a standard therapy before proceeding to a large-scale phase III non-inferiority trial; they cannot take the place of a phase III non-inferiority trial. At the end of the phase II trial, one may either conclude that there is sufficient preliminary evidence of non-inferiority for an experimental regimen to justify a full phase III trial, or that the experimental regimen is unlikely to be demonstrated as non-inferior were a phase III trial to be conducted. As with any phase II trial, neither definite non-inferiority nor definite inferiority can be concluded from the designs we propose. The proposed designs can be utilized as the phase II part of a phase II/III non-inferiority trial. In this case, a relevant question is whether to suspend patient accrual after accrual to the phase II part is completed. For a seamless connection of the phase II and III trials, we may consider continuing patient accrual between the two parts. In this case, however, the number of patients in the study may already be unacceptably large before a conclusion to stop the trial for failing to show non-inferiority in the phase II component. In a phase II/III trial, the patients of the phase II part will be included in the phase III analysis. Thus, another issue of a phase II/III non-inferiority trial is the potential need for an adjustment of the alpha-spending of the phase III part reflecting the phase II part. If the primary outcome is identical between the two parts, then we will need an adjustment of alpha-spending for the phase III test. In this case, the phase II part is regarded as the first interim analysis of the phase III trial, especially if we do not suspend patient accrual after phase II.
If the primary outcomes differ between the phase II and phase III parts of a phase II/III trial, we may not need to consider an adjustment for alpha-spending [13]. The choice of endpoint for a non-inferiority trial is an important issue. In a typical phase II superiority trial, an early or surrogate endpoint may be used to accelerate the process of launching a phase III trial. This is done with the full recognition that the early endpoint often provides more optimistic results than may be expected for the true endpoint of the phase III trial. With a non-inferiority study objective, it is likely more appropriate to use the same endpoint for the phase II trial as would be used for the phase III trial, as over-enthusiasm based on an early endpoint in a phase II trial could increase the risk of conducting a phase III trial when the experimental regimen is truly inferior to the control. The designs proposed here have clear limitations in that they are inherently non-conclusive, as is the nature of phase II trials. We do, however, believe that they provide a useful approach to formally generating preliminary data to justify a potentially large and resource-intensive phase III non-inferiority trial. To attain a sample size similar to that of a typical superiority phase II trial, we propose to increase the type I error rate or the effect size of non-inferiority phase II trials.

Funding

This research was supported by a grant from the National Cancer Institute (CA142538-01).
References

1.  Design and analysis of non-inferiority mortality trials in oncology.
    Authors: Mark Rothmann; Ning Li; Gang Chen; George Y H Chi; Robert Temple; Hsiao-Hui Tsou
    Journal: Stat Med       Date: 2003-01-30

2.  Randomized clinical trial design for assessing noninferiority when superiority is expected.
    Authors: Boris Freidlin; Edward L Korn; Stephen L George; Robert Gray
    Journal: J Clin Oncol       Date: 2007-11-01

3.  Randomized phase II trials with a prospective control.
    Authors: Sin-Ho Jung
    Journal: Stat Med       Date: 2008-02-20

4.  Planning and monitoring of equivalence studies.
    Authors: S Durrleman; R Simon
    Journal: Biometrics       Date: 1990-06

5.  Sequential designs for equivalence studies.
    Authors: J Whitehead
    Journal: Stat Med       Date: 1996-12-30

6.  Significance testing to establish equivalence between treatments, with special reference to data in the form of 2×2 tables.
    Authors: C W Dunnett; M Gent
    Journal: Biometrics       Date: 1977-12

7.  Exact significance testing to establish treatment equivalence with ordered categorical data.
    Authors: C R Mehta; N R Patel; A A Tsiatis
    Journal: Biometrics       Date: 1984-09

8.  Stopping when the experimental regimen does not appear to help.
    Authors: S Wieand; G Schroeder; J R O'Fallon
    Journal: Stat Med       Date: 1994 Jul 15-30

9.  An efficient design for phase III studies of combination chemotherapies.
    Authors: S S Ellenberg; M A Eisenberger
    Journal: Cancer Treat Rep       Date: 1985-10

10. On sample size calculation for comparing survival curves under general hypothesis testing.
    Authors: Sin-Ho Jung; Shein-Chung Chow
    Journal: J Biopharm Stat       Date: 2012
