A novel design for randomized immuno-oncology clinical trials with potentially delayed treatment effects.

Abstract

The semi-parametric proportional hazards model is widely adopted in randomized clinical trials with time-to-event outcomes, and the log-rank test is frequently used to detect a potential treatment effect. Immuno-oncology therapies pose unique challenges to the design of a trial as the treatment effect may be delayed, which violates the proportional hazards assumption, and the log-rank test has been shown to markedly lose power under the non-proportional hazards setting. A novel design and analysis approach for immuno-oncology trials is proposed through a piecewise treatment effect function, which is capable of detecting a potentially delayed treatment effect. The number of events required for the trial will be determined to ensure sufficient power for both the overall log-rank test without a delayed effect and the test beyond the delayed period when such a delay exists. The existence of a treatment delay is determined by a likelihood ratio test with resampling. Numerical results show that the proposed design adequately controls the Type I error rate, has a minimal loss in power under the proportional hazards setting and is markedly more powerful than the log-rank test with a delayed treatment effect.

Keywords: Change point; Clinical trial design; Immuno-oncology; Non-proportional hazards

Year: 2015 PMID: 29736436 PMCID: PMC5935831 DOI: 10.1016/j.conctc.2015.08.003

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654

Introduction

The Cox semi-parametric proportional hazards model [1] is widely adopted in clinical trials with time-to-event outcomes to compare an experimental treatment with the best supportive care (BSC). A key assumption in the Cox model is that the hazard ratio is constant over time, which may be violated when there is a lag period before the experimental treatment starts to exhibit a beneficial effect. This is particularly the case for immuno-oncology therapies, one example being ipilimumab, a fully human monoclonal antibody that blocks CTLA-4 to promote immunity. Ipilimumab demonstrated statistically significant improvements in overall survival (OS) in two Phase 3 randomized controlled trials in patients with metastatic melanoma. In both trials a delayed treatment effect of about 4 months was observed based on the Kaplan–Meier (K–M) curves for overall survival [2], [3].

The log-rank test is frequently used to detect a potential treatment effect in randomized time-to-event trials, but it has been shown to markedly lose power under the non-proportional hazards setting [4]. Weighted log-rank tests (e.g. Refs. [5], [6], [7]) have been proposed to account for a delayed separation of the K–M curves. One challenge with a weighted log-rank test at the design stage of a trial is the choice of the weight function, which determines how much weight is assigned to observations at various times. In addition, when there is no delayed treatment effect, the unweighted log-rank test is the asymptotically most powerful nonparametric test under the proportional hazards setting [8], so a weighted test gives up power in that case.

The rest of the paper is structured as follows. In Section 2, we propose a two-stage design and analysis approach for immuno-oncology clinical trials. The number of events required for a trial will be determined to ensure sufficient power for both the overall log-rank test without a delayed treatment effect and the test of a treatment effect beyond the delayed period when such a delay exists. The existence of a treatment delay is determined by a likelihood ratio test with resampling [9]. Numerical results are given in Section 3, which show that the proposed design adequately controls the Type I error rate, has a minimal loss in power under the proportional hazards setting and is markedly more powerful than the log-rank test with a delayed treatment effect. Discussion and concluding remarks are given in Section 4.

Methods

In this section, we first briefly review the sequential testing approach proposed by He et al. [9] for detecting potentially multiple change points in the proportional hazards model. We simplify their methodology by focusing on detecting only one change point, which will serve as the first step in the analysis of an immuno-oncology trial in our proposed approach.

Suppose there are a total of N patients recruited into an immuno-oncology trial with time-to-event endpoints. The proportional hazards model has the form

$$\lambda_1(t) = \lambda_0(t)\exp(\beta),$$

where $\lambda_1(t)$ and $\lambda_0(t)$ are the respective hazard functions for the treatment and BSC arms. Let $T_1, \ldots, T_N$ denote the independent and identically distributed survival times, $C_1, \ldots, C_N$ be the censoring times, which are assumed to be independent of the survival times, and $Z_i = 1$ or $0$ for the treatment arm or BSC arm, respectively. Only the pairs $(X_i, \delta_i)$ are observed, where $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$. The partial likelihood of the log hazard ratio β is

$$L(\beta) = \prod_{i=1}^{N} \left[ \frac{\exp(\beta Z_i)}{\sum_{j:\, X_j \ge X_i} \exp(\beta Z_j)} \right]^{\delta_i},$$

and the log partial likelihood of β is

$$l(\beta) = \sum_{i=1}^{N} \delta_i \left[ \beta Z_i - \log \sum_{j:\, X_j \ge X_i} \exp(\beta Z_j) \right]. \qquad (2)$$

The hypotheses are $H_0: \beta = 0$ versus $H_1: \beta \ne 0$. The Wald statistic has the form $W = \hat{\beta}\,\{-\ddot{l}(\hat{\beta})\}^{1/2}$, where $\hat{\beta}$ maximizes the log partial likelihood and $\ddot{l}$ denotes the second derivative of l. The null hypothesis can be evaluated based on the asymptotic normality of $\hat{\beta}$.

The change point model proposed by He et al. [9] assumes the hazard ratio function to be a step function as follows:

$$\beta(t) = \beta_k \quad \text{for } \tau_{k-1} < t \le \tau_k, \quad k = 1, \ldots, K+1,$$

where $\tau_0 = 0$, $\tau_{K+1} = \infty$ and $\tau_1 < \cdots < \tau_K$ denote the change points. Here we use a simplified version of their model that assumes only one change point τ to be clinically meaningful. Therefore the hazard ratio function has a simpler form:

$$\beta(t) = \begin{cases} \beta_1, & t \le \tau, \\ \beta_2, & t > \tau. \end{cases}$$

By replacing β with $\beta(X_i)$ in (2), we obtain a combined form of the log partial likelihood for $\beta_1$, $\beta_2$ and τ:

$$l(\beta_1, \beta_2, \tau) = \sum_{i=1}^{N} \delta_i \left[ \beta(X_i) Z_i - \log \sum_{j:\, X_j \ge X_i} \exp\{\beta(X_i) Z_j\} \right].$$

The corresponding likelihood ratio statistic is as follows:

$$LR_\tau = 2\left\{ \max_{\beta_1, \beta_2, \tau} l(\beta_1, \beta_2, \tau) - \max_{\beta} l(\beta) \right\}.$$

The following resampling procedure is adopted to estimate the distribution of $LR_\tau$ under the null hypothesis:

1. Calculate the Breslow [10] estimate $\hat{\Lambda}_0(t)$ of the baseline cumulative hazard function and obtain the K–M estimate $\hat{G}(t)$ of the censoring distribution. The survival functions for the BSC arm and the experimental arm can be estimated by $\hat{S}_0(t) = \exp\{-\hat{\Lambda}_0(t)\}$ and $\hat{S}_1(t) = \exp\{-\hat{\Lambda}_0(t)\, e^{\hat{\beta}}\}$, respectively.
2. Generate a total of B simulated trials with the survival functions $\hat{S}_0(t)$ and $\hat{S}_1(t)$, which correspond to a true model of no change points, and the censoring distribution $\hat{G}(t)$.
3. Obtain the likelihood ratio statistics $LR_\tau^{(1)}, \ldots, LR_\tau^{(B)}$ for each resampled trial.
4. Reject the null hypothesis if $LR_\tau$, the likelihood ratio statistic calculated from the original trial, is larger than the $100(1-\alpha)$th percentile of $\{LR_\tau^{(1)}, \ldots, LR_\tau^{(B)}\}$, where α controls the false discovery rate under the null hypothesis.

Intuitively, when analyzing an immuno-oncology trial a natural first step is to determine whether a change point in the hazard ratio function exists. If a change point is not detected, one should proceed with the standard log-rank test to determine whether a treatment effect exists. If a change point is detected, which implies a delayed treatment effect, one should assess whether there is a statistically significant treatment effect beyond the change point and also ensure that there is not a statistically significant and clinically meaningful effect favoring the control arm before the change point. In this two-stage analysis approach it is important to ensure that the overall Type I error rate is not inflated. Given the desired properties above we propose the following two-stage analysis approach:

1. Apply the likelihood ratio and resampling approach of He et al. [9] to determine whether a change point in the hazard ratio function exists with false discovery rate $\alpha_1$.
2. If a change point is not detected the standard log-rank test is used to assess whether a treatment effect exists with Type I error rate $\alpha_2 = \alpha - \alpha_1$.
3. If a change point is detected, a log-rank test for the observations beyond the change point (or equivalently the score test based on the proportional hazards assumption) is conducted with Type I error rate $\alpha_2$ to determine whether a treatment effect exists, where α is the desired overall Type I error rate.
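The likelihood ratio statistic used for change point detection can be sketched for the two-arm case as follows. This is a minimal illustration, not the implementation of He et al. [9]: it assumes untied event times, searches τ over a user-supplied grid, requires a minimum number of events on each side of a candidate τ, and omits the resampling step. The function names (`risk_table`, `max_partial_loglik`, `lr_change_point`), the grid, the sample size and the administrative censoring time are all hypothetical choices made for the example.

```python
import math
import random

def risk_table(times, status, arm):
    """Return (time, arm, n1, n0) for each observed event, where n1 and n0
    count the treatment and control patients at risk just before it."""
    n1 = sum(arm)
    n0 = len(arm) - n1
    rows = []
    for i in sorted(range(len(times)), key=lambda k: times[k]):
        if status[i]:
            rows.append((times[i], arm[i], n1, n0))
        if arm[i]:
            n1 -= 1
        else:
            n0 -= 1
    return rows

def max_partial_loglik(rows):
    """Maximize sum(b*z - log(n0 + n1*exp(b))) over b by Newton-Raphson."""
    def loglik(b):
        return sum(b * z - math.log(n0 + n1 * math.exp(b)) for _, z, n1, n0 in rows)
    b = 0.0
    for _iteration in range(50):
        score = info = 0.0
        for _, z, n1, n0 in rows:
            p = n1 * math.exp(b) / (n0 + n1 * math.exp(b))
            score += z - p
            info += p * (1.0 - p)
        if info < 1e-12:
            break
        step = max(-1.0, min(1.0, score / info))  # damped step for stability
        b += step
        if abs(step) < 1e-10:
            break
    return loglik(b)

def lr_change_point(times, status, arm, candidates, min_events=10):
    """Profile the two-piece log partial likelihood over candidate change points."""
    rows = risk_table(times, status, arm)
    null_value = max_partial_loglik(rows)
    best, best_tau = -math.inf, None
    for tau in candidates:
        before = [r for r in rows if r[0] <= tau]
        after = [r for r in rows if r[0] > tau]
        if min(len(before), len(after)) < min_events:
            continue
        # With a single change point the likelihood separates into two pieces.
        value = max_partial_loglik(before) + max_partial_loglik(after)
        if value > best:
            best, best_tau = value, tau
    if best_tau is None:
        return 0.0, None
    return max(0.0, 2.0 * (best - null_value)), best_tau

# Hypothetical delayed-effect trial: control median 6 months, effect starts at 5 months.
random.seed(1)
lam0 = math.log(2) / 6.0
tau_true, hr_after = 5.0, 0.5
times, status, arm = [], [], []
for i in range(400):
    z = i % 2
    t = random.expovariate(lam0)
    if z == 1 and t > tau_true:
        # Memoryless property: restart the clock at tau with the reduced hazard.
        t = tau_true + random.expovariate(lam0 * hr_after)
    c = 24.0  # administrative censoring time (assumption)
    times.append(min(t, c))
    status.append(t <= c)
    arm.append(z)

lr, tau_hat = lr_change_point(times, status, arm, candidates=[3, 4, 5, 6, 7, 8])
print(round(lr, 2), tau_hat)
```

In the full procedure the same function would be applied to each of the B resampled trials to obtain the reference distribution for the observed statistic.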
When a treatment effect is detected beyond the change point, the same test should be applied to the observations before the change point to ensure that the experimental treatment causes no early harm.

Theoretically, the proposed approach controls the overall Type I error rate, which is split between the two analysis steps. Intuitively, under the null hypothesis of no treatment effect the false discovery rate $\alpha_1$ of the change point test controls the probability that the analysis will not use the log-rank test, and even if we conservatively assume that the null hypothesis is always rejected when a change point is incorrectly detected, the overall Type I error rate is still controlled by $\alpha_1 + \alpha_2 = \alpha$, where $\alpha_2$ is the Type I error rate of the log-rank test. In practice, we may set $\alpha_1 = 1\%$ and $\alpha_2 = 4\%$ when the overall Type I error rate is set to be 5%. By having $\alpha_2$ close to α, the proposed two-stage approach will have minimal power loss under a proportional hazards alternative. For example, a design with 90% power at the 5% Type I error level will have 88% power at the 4% Type I error level, and a loss of 2% in this setting as a tradeoff can translate to a substantial power gain under the non-proportional hazards setting, which is demonstrated in Section 3.

The proposed approach can be further simplified if there is a strong prior belief that the treatment will not cause early harm to patients. In this case, the log hazard ratio before the change point, $\beta_1$, in the approach of He et al. [9] can be set to 0. Intuitively, with this added assumption there is one less parameter to estimate in the change point detection algorithm, which increases the power for detecting the change point when one exists and as a result increases the overall power of the proposed test.

When designing a trial with a potentially delayed treatment effect one should consider where a change point is likely to occur and what the treatment effect size is likely to be beyond the change point. The total number of events for a trial adopting the proposed two-stage design should provide sufficient events beyond the change point so that a treatment effect can be detected with sufficient power. This requires reasonable estimates of survival functions for both treatment arms based on data from early phase trials.
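The quoted drop from 90% power at the 5% level to 88% at the 4% level can be checked with a normal approximation to a two-sided test. The sketch below is an illustrative calculation under that approximation; the function name `power_at_new_level` is hypothetical and is not taken from the source.

```python
from statistics import NormalDist

def power_at_new_level(power, alpha, new_alpha):
    """Power of a two-sided z-test at `new_alpha`, given its power at `alpha`.

    Uses the normal approximation and ignores the negligible probability of
    rejecting in the wrong direction.
    """
    nd = NormalDist()
    # Standardized effect size implied by the original design.
    drift = nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)
    return nd.cdf(drift - nd.inv_cdf(1.0 - new_alpha / 2.0))

# Design with 90% power at the two-sided 5% level, re-evaluated at 4%.
print(round(power_at_new_level(0.90, 0.05, 0.04), 2))  # 0.88
```

The result agrees with the 88% stated in the text, which is the cost of reserving 1% of the Type I error for change point detection.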

Numerical results

In this section we consider a trial design similar to that in Ref. [11] to compare the performance of the proposed approach with that of the standard log-rank test. A randomized trial is designed using the conventional exponential distribution assumption, and 512 events are required to detect an overall hazard ratio of 0.75 between two treatment arms using a log-rank test with a two-sided Type I error rate of 5% and power of 90%. The accrual duration for 680 randomized patients is assumed to be 12 months, and the median survival for the control arm is assumed to be 6 months. The total of 512 events is considered adequate for detecting a treatment effect beyond the change point (if one exists), as only 90 events are needed to ensure 90% power for a hazard ratio of 0.5, and a change point is expected to be in the range of 4–8 months.

Table 1 shows that the proposed design adequately controls the Type I error rate when the experimental treatment is not beneficial. Under the alternative of proportional hazards with a hazard ratio of 0.75 the power of the proposed test is slightly less than that of the log-rank test. Under the alternative of a delayed treatment effect the proposed approach is markedly more powerful than the log-rank test for various locations of the change point in the hazard ratio function. For each scenario a total of 2000 trials were simulated to compare the performance of the two designs.

Intuitively, for the proposed design to substantially outperform the log-rank test the algorithm by He et al. [9] needs to be able to accurately detect the change point with high probability, which requires a sufficient number of events to be observed both before and after the change point. If the change point occurs very early in the hazard ratio function the proposed design may not provide a substantial advantage over the standard log-rank test. If the change point occurs late, the long time interval with no treatment effect leads to reduced power for both approaches, even though the power of the proposed design remains higher than that of the log-rank test.
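Survival times for the delayed-effect scenarios can be generated from a piecewise exponential distribution. The sketch below shows one way to do this by inverse-transform sampling; it is an assumption about how such a simulation could be set up, not the authors' simulation code, and the function name `delayed_effect_time` is hypothetical. The control median of 6 months and the hazard ratios are taken from the scenarios in the text.

```python
import math
import random

def delayed_effect_time(u, lam0, tau, hr_before, hr_after):
    """Inverse-transform draw from a piecewise exponential distribution.

    The hazard is lam0*hr_before up to tau and lam0*hr_after afterwards.
    `u` is a Uniform(0, 1] variate interpreted as the survival probability.
    """
    target = -math.log(u)              # cumulative hazard that must be reached
    hazard_before = lam0 * hr_before
    if target <= hazard_before * tau:
        return target / hazard_before
    return tau + (target - hazard_before * tau) / (lam0 * hr_after)

lam0 = math.log(2) / 6.0  # control-arm hazard for a median survival of 6 months

# Medians (u = 0.5) for the control arm and for a treatment arm whose
# hazard ratio is 1.0 before 4 months and 0.5 afterwards.
control_median = delayed_effect_time(0.5, lam0, tau=4.0, hr_before=1.0, hr_after=1.0)
treated_median = delayed_effect_time(0.5, lam0, tau=4.0, hr_before=1.0, hr_after=0.5)
print(control_median, treated_median)

# A few random draws for the treatment arm.
random.seed(0)
draws = [delayed_effect_time(1.0 - random.random(), lam0, 4.0, 1.0, 0.5) for _ in range(5)]
print([round(t, 2) for t in draws])
```

Under these assumptions the medians come out at about 6 and 8 months, so a 4-month delay followed by a hazard ratio of 0.5 extends median survival by roughly 2 months.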

Table 1

Power analyses of the proposed design and the log-rank design under various scenarios.

Scenario	τ (months)	exp(β1)	exp(β2)	Proposed design	Log-rank design
Null case	NA	1.0	1.0	0.048	0.049
Proportional hazard case	NA	0.75	0.75	0.886	0.903
Non-proportional case 1	4	1.0	0.5	0.994	0.975
Non-proportional case 2	5	1.0	0.5	0.978	0.890
Non-proportional case 3	6	1.0	0.5	0.937	0.738
Non-proportional case 4	7	1.0	0.5	0.878	0.576
Non-proportional case 5	8	1.0	0.5	0.712	0.321
Non-proportional case 6	4	0.9	0.5	0.998	0.996
Non-proportional case 7	5	0.9	0.5	0.992	0.977
Non-proportional case 8	6	0.9	0.5	0.986	0.938
Non-proportional case 9	7	0.9	0.5	0.955	0.876
Non-proportional case 10	8	0.9	0.5	0.872	0.741
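
As a rough guide to reading the null-case row, the Monte Carlo standard error of an estimated rejection rate can be worked out from the 2000 simulated trials per scenario. This is a back-of-the-envelope calculation that assumes independent simulated trials and a true rejection rate of 5%; the symbols $\hat{p}$ and $n_{\text{sim}}$ are introduced here for illustration.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n_{\text{sim}}}}
                     = \sqrt{\frac{0.05 \times 0.95}{2000}}
                     \approx 0.0049
```

The estimated Type I error rates of 0.048 and 0.049 both lie within one standard error of the nominal 0.05.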

Discussions and conclusions

When designing randomized clinical trials for immuno-oncology therapies, the standard design based on the proportional hazards assumption and the log-rank test may not be optimal if there is a reasonable chance of a delayed treatment effect based on the mechanism of action and the available pre-clinical and clinical data to date. A two-stage design and analysis approach is proposed, which first tries to determine if a delayed effect exists. When such an effect is detected, testing should be done for the time periods before and after the change point separately. If a delayed effect is not detected the proposed algorithm proceeds with the standard log-rank test. The tradeoff between a small loss in power under the proportional hazards setting and a marked gain when a delayed effect exists may be considered favorable for certain classes of therapies. When considering the two-stage approach, extensive simulations should be conducted based on reasonable assumptions on the important trial parameters to determine whether it is indeed favorable over the standard design.

The proposed approach may be most valuable when sufficient numbers of events are expected for the time periods before and after the change point. In this case, the standard log-rank test is expected to markedly lose power, and the proposed approach not only has a high likelihood of accurately detecting the change point but also provides sufficient sample size to characterize the treatment effect for both periods. The change point detection algorithm of He et al. [9] is capable of detecting multiple change points in the hazard ratio function, which may be utilized to develop a broad class of designs where a treatment effect can be detected for at least one time interval with no harm demonstrated in the other time intervals.

Finally, the proposed approach can be easily extended to group sequential trials with an alpha-spending function. In particular, one may spend a small portion (e.g. 20%) of the α to be spent at each interim on identifying a potential change point. Alternatively, since a change point is unlikely to be detected at the early interims given the immaturity of the K–M curves, one may start to detect change points at the later interims or only at the final analysis.
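
The alpha-spending extension can be illustrated with a short sketch. The source does not name a spending function or interim schedule, so the Lan-DeMets O'Brien-Fleming-type function, the information fractions and the names `obf_spending` and `split_alpha` below are all assumptions chosen for the example; only the 20% share comes from the text. Converting the spent alpha into test boundaries is not shown.

```python
from statistics import NormalDist

def obf_spending(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1) under the
    Lan-DeMets O'Brien-Fleming-type spending function."""
    nd = NormalDist()
    return 2.0 * (1.0 - nd.cdf(nd.inv_cdf(1.0 - alpha / 2.0) / t ** 0.5))

def split_alpha(info_fractions, alpha=0.05, change_point_share=0.20):
    """For each look, split the incremental alpha between change point
    detection and the treatment effect test."""
    plan, spent = [], 0.0
    for t in info_fractions:
        cumulative = obf_spending(t, alpha)
        increment = cumulative - spent
        spent = cumulative
        plan.append((change_point_share * increment,
                     (1.0 - change_point_share) * increment))
    return plan

# Hypothetical schedule: two interim looks and a final analysis.
for t, (a_cp, a_test) in zip([0.5, 0.75, 1.0], split_alpha([0.5, 0.75, 1.0])):
    print(f"t={t:.2f}  change point: {a_cp:.5f}  treatment test: {a_test:.5f}")
```

Because this spending function allocates very little alpha to early looks, most of the change point share falls at the later analyses, which is in line with the remark that change points are hard to detect while the K–M curves are immature.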

4 in total

1. Ipilimumab plus dacarbazine for previously untreated metastatic melanoma.

Authors: Caroline Robert; Luc Thomas; Igor Bondarenko; Steven O'Day; Jeffrey Weber; Claus Garbe; Celeste Lebbe; Jean-François Baurain; Alessandro Testori; Jean-Jacques Grob; Neville Davidson; Jon Richards; Michele Maio; Axel Hauschild; Wilson H Miller; Pere Gascon; Michal Lotem; Kaan Harmankaya; Ramy Ibrahim; Stephen Francis; Tai-Tsang Chen; Rachel Humphrey; Axel Hoos; Jedd D Wolchok
Journal: N Engl J Med Date: 2011-06-05 Impact factor: 91.245

2. A sequential testing approach to detecting multiple change points in the proportional hazards model.

Authors: Pei He; Liang Fang; Zheng Su
Journal: Stat Med Date: 2012-08-30 Impact factor: 2.373

3. Improved survival with ipilimumab in patients with metastatic melanoma.

Authors: F Stephen Hodi; Steven J O'Day; David F McDermott; Robert W Weber; Jeffrey A Sosman; John B Haanen; Rene Gonzalez; Caroline Robert; Dirk Schadendorf; Jessica C Hassel; Wallace Akerley; Alfons J M van den Eertwegh; Jose Lutzky; Paul Lorigan; Julia M Vaubel; Gerald P Linette; David Hogg; Christian H Ottensmeier; Celeste Lebbé; Christian Peschel; Ian Quirt; Joseph I Clark; Jedd D Wolchok; Jeffrey S Weber; Jason Tian; Michael J Yellin; Geoffrey M Nichol; Axel Hoos; Walter J Urba
Journal: N Engl J Med Date: 2010-06-05 Impact factor: 91.245

4. Statistical issues and challenges in immuno-oncology.

Authors: Tai-Tsang Chen
Journal: J Immunother Cancer Date: 2013-10-21 Impact factor: 13.751
