Design and Analysis of Cancer Clinical Trials for Personalized Medicine.

Abstract

Biomarkers play a key role in the development of personalized medicine. Cancer clinical trials with biomarkers should be designed and analyzed to reflect factors such as the phase of the trial, the type of biomarker, the study objectives, and whether the biomarker has already been validated. In this paper, we demonstrate the design and analysis of two phase II cancer clinical trials, one with a predictive biomarker and the other with a prognostic biomarker. A statistical testing method and its sample size calculation method are presented for each of the trials. We assume that the primary endpoint of these trials is a time-to-event variable, but the concept can be used for any type of endpoint with an associated testing method. The test statistics and their sample size formulas are derived using a large sample approximation based on the martingale central limit theorem. Using simulations, we find that the test statistics control the type I error rate accurately and that the sample sizes calculated using the formulas maintain the statistical power specified at the design stage.

Keywords: enrichment trial; interaction; predictive biomarker; prognostic biomarker; progression-free survival; stratified randomization trial

Year: 2021 PMID: 34064394 PMCID: PMC8147797 DOI: 10.3390/jpm11050376

Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426

1. Introduction

In many cancer clinical trials, different types of biomarkers are measured from the tumor, blood, or urine using molecular, biochemical, physiological, anatomical, or imaging methods at baseline or during treatment. The observed biomarkers are used for various purposes during the diagnosis and treatment of the disease. For example, cancer biomarkers are used to diagnose diseases (diagnostic biomarker), to predict the response to a specific treatment (predictive biomarker), to measure the aggressiveness of a disease for patients with no treatment or a non-targeted treatment (prognostic biomarker), to monitor the recurrence of a disease, and so on. These biomarkers can be used to select a treatment for cancer patients. However, a biomarker should be validated before it is used to select a treatment in clinical trials. If a biomarker has not yet been validated, it can be used as a stratification factor in a randomized clinical trial. In such a trial, the biomarker is used for its validation, rather than for treatment selection. The design and analysis method of a clinical trial with a biomarker-guided treatment can be very different depending on the type of biomarker, the biomarker's development stage, the study objective, and so on. Various design issues of randomized clinical trials with biomarkers have been widely discussed [1]. A series of statistical tests has been proposed for a randomized phase II trial with a potentially predictive biomarker that has not yet been strictly validated [2]. The efficiency of enrichment trials and stratified randomization trials with a time-to-event variable as the primary endpoint has been compared, assuming that the treatment effect reverses between the biomarker-positive and biomarker-negative groups and considering subset analysis within each biomarker status group [3]. Phase II trials aim to screen out inefficacious treatments before proceeding to large-scale studies, such as a phase III trial.
As such, phase II trials should be completed in a short time period, so we must choose a small sample size and a short-term surrogate endpoint, such as tumor response or progression-free survival, as the primary endpoint, rather than a confirmatory endpoint, such as overall survival. In this paper, we demonstrate two phase II cancer clinical trials, one with a predictive biomarker and the other with a prognostic biomarker, and present analysis and sample size calculation methods for these trials. We use a survival variable as the primary endpoint in this paper, but the same concept can be used for any kind of variable, including a binary variable such as tumor response. For the purpose of sample size calculation, we assume exponential survival distributions, which are the most commonly used in real trial designs, although the statistical testing does not depend on any specific survival distribution. This is a review article of a biostatistical paper [4] with some modifications.

2. Materials and Methods

We consider a time to event (or survival) endpoint, progression-free survival (PFS). We use a generalized log-rank test for a trial with an imaging prognostic biomarker and a Cox proportional hazards model for a trial with a predictive biomarker, and derive their sample size formulas. To account for relatively small sample sizes of phase II trials, exact statistical methods are used for binary outcomes, but in general no exact methods are available for survival analysis. Therefore, using simulations on two real trial examples, we evaluate the small sample performance of the discussed statistical tests and their sample size formulas that are derived based on large sample approximation.

3. Results

3.1. A Phase II Trial with a Predictive Biomarker

Predictive biomarkers provide information on the likelihood of response to a specific chemotherapy. For example, tumors expressing high thymidylate synthase (TS) levels were shown to be resistant to pemetrexed in a preclinical study [5], but this finding has not yet been validated in a clinical study. Suppose that we want to investigate, through a phase II trial, whether TS expression is a predictive marker for the clinical outcome of pemetrexed/cisplatin (PC) in patients with nonsquamous non–small-cell lung cancer (NSCLC). The control non-targeted treatment is gemcitabine/cisplatin (GC). Compared to GC, PC is expected to be similarly efficacious in the TS-positive group but more efficacious in the TS-negative group. To investigate this hypothesis, we randomize patients between the two treatment arms, stratifying by TS-positivity vs. TS-negativity. This trial was designed and published with overall response as the primary endpoint [6], but in this paper we demonstrate how to design and analyze such a trial using PFS as the primary endpoint, based on the estimates from that trial.

3.1.1. Statistical Testing

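When this study is completed, PFS will be regressed on treatment allocation $z_1$ (=0 for GC arm; =1 for PC arm) and TS-positivity $z_2$ (=0 for TS-negative group; =1 for TS-positive group) using a proportional hazards model [7]

$$\lambda(t \mid z_1, z_2) = \lambda_0(t)\exp(\beta_1 z_1 + \beta_2 z_2 + \beta_3 z_1 z_2). \tag{1}$$

Please note that we have an interaction term in the model. From model (1), the hazard functions of the four patient groups defined by treatment and TS status are given as $\lambda(t \mid 0,0) = \lambda_0(t)$, $\lambda(t \mid 1,0) = \lambda_0(t)e^{\beta_1}$, $\lambda(t \mid 0,1) = \lambda_0(t)e^{\beta_2}$, and $\lambda(t \mid 1,1) = \lambda_0(t)e^{\beta_1+\beta_2+\beta_3}$. For TS-positive patients ($z_2 = 1$), the hazard ratio between PC and GC is $e^{\beta_1+\beta_3}$, so that we expect $\beta_1 + \beta_3 = 0$ if GC and PC are similarly efficacious for TS-positive patients. For the GC arm ($z_1 = 0$), the hazard ratio between the TS-positive group and the TS-negative group is $e^{\beta_2}$, so that we will have $\beta_2 = 0$ if GC is non-targeted against TS. With $\beta_2 = 0$, $\lambda_0(t)$ is the hazard function for the GC arm. On the other hand, for the PC arm ($z_1 = 1$), the hazard ratio between the TS-positive group and the TS-negative group is $e^{\beta_2+\beta_3} = e^{\beta_3}$ since GC is a non-targeted treatment (i.e., $\beta_2 = 0$). If TS-positive tumors are resistant to pemetrexed, we will have $\beta_3 > 0$. Therefore, the hypotheses of interest are $H_0: \beta_3 = 0$ and $H_1: \beta_3 > 0$.

For patient $i$ ($= 1, \ldots, n$), let $X_i$ be the minimum of censoring time and survival time, $\delta_i$ be the event indicator taking 1 if tumor progression has occurred and 0 otherwise, and $z_i = (z_{1i}, z_{2i}, z_{1i}z_{2i})^T$ be the covariate vector. Partial score and information functions for the regression coefficients $\beta = (\beta_1, \beta_2, \beta_3)^T$ are given as

$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty \left\{ z_i - \frac{S^{(1)}(\beta, t)}{S^{(0)}(\beta, t)} \right\} dN_i(t)$$

and

$$V(\beta) = \sum_{i=1}^{n} \int_0^\infty \left\{ \frac{S^{(2)}(\beta, t)}{S^{(0)}(\beta, t)} - \left( \frac{S^{(1)}(\beta, t)}{S^{(0)}(\beta, t)} \right)^{\otimes 2} \right\} dN_i(t),$$

respectively, where $S^{(k)}(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, z_i^{\otimes k} \exp(\beta^T z_i)$, $N_i(t) = \delta_i I(X_i \le t)$ is the event process, $Y_i(t) = I(X_i \ge t)$ is the at-risk process, $I(\cdot)$ is an indicator function, and $z^{\otimes 0} = 1$, $z^{\otimes 1} = z$, $z^{\otimes 2} = zz^T$ for a vector z. Let $\hat\beta$ denote the solution to $U(\beta) = 0$. Then, $\hat\beta$ is approximately normal with mean 0 and variance–covariance $V^{-1}(\hat\beta)$ under the global null hypothesis of $\beta = 0$ [8]. Hence, with a one-sided type I error rate of $\alpha$, we reject $H_0$ in favor of $H_1$ if $\hat\beta_3 / \sqrt{\hat v_{33}} > z_{1-\alpha}$, where $\hat v_{33}$ is the $(3,3)$-component of $V^{-1}(\hat\beta)$ and $z_{1-\alpha}$ is the $100(1-\alpha)\%$ quantile of the standard normal distribution.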
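The rejection rule at the end of this subsection can be sketched in a few lines of Python. The function name and the numerical inputs below are hypothetical and are not taken from the trial; the sketch assumes that the estimated interaction coefficient and its estimated variance (the corresponding diagonal entry of the inverse information matrix) have already been obtained from a fitted proportional hazards model.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_wald_test(beta3_hat, var_beta3, alpha):
    """One-sided Wald test of 'interaction = 0' against 'interaction > 0'.

    beta3_hat : estimated interaction coefficient (assumed given)
    var_beta3 : its estimated variance (assumed given)
    alpha     : one-sided type I error rate
    Returns the standardized statistic, the critical value, and the decision.
    """
    z_stat = beta3_hat / sqrt(var_beta3)
    critical = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha) normal quantile
    return z_stat, critical, z_stat > critical


# Hypothetical inputs, for illustration only.
z_stat, critical, reject = one_sided_wald_test(beta3_hat=0.60, var_beta3=0.16, alpha=0.10)
print(f"z = {z_stat:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

With these made-up inputs the standardized statistic is 1.5, which exceeds the one-sided 10% critical value of about 1.28, so the null hypothesis would be rejected.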

3.1.2. Sample Size Calculation

For sample size calculation of this study, we need to specify following design parameters. Type I error rate and power, Allocation proportion for GC arm, , and for PC arm, TS-negativity and TS-positivity based on the prevalence in the study population Assuming exponential distributions for PFS, the hazard rates, , of the four patient groups, , , , and Accrual period a (or accrual rate r) and additional follow-up period b Assuming exponential distributions for PFS with hazard rates , model (1) is simplified to with By solving these equations with respect to , we have Hence, we can calculate the values of under in terms of the hazard ratios that are specified as design parameters above. To derive a sample size formula, we need to calculate the limit of or as in terms of the design parameters. Let for or 1 denote the relative frequency of each cell of the table defined by treatment and TS status. Under the stratified randomization scheme, we have for or 1. Appendix A shows that converges to , where denotes the expected number of events (or number of patients with tumor progression), denotes the probability that a patient has a progression during the study period, denotes the probability that a patient in group has a progression for as derived based on an exponential PFS distribution and censoring distribution, and Please note that is derived from an exponential PFS distribution with hazard ratio and a censoring distribution of . Hence, the limit of is , where is the component of . From (2), is the value specified under . Since has the standard normal distribution under , the power for a local alternative hypothesis is given as where is the survivor function of the standard normal distribution. Noting that , we obtain the required number of events or the required sample size by solving Equation (3). Formula (4) requires specification of accrual period a together with , , b, and . 
In designing a clinical trial, however, we can estimate the accrual pattern, rather than an accrual period. Suppose that patients are expected to be enrolled to the study at a rate of r during an accrual period based on the number of patients treated by the study member sites recently. Assuming uniform patient accrual during period a, we have . Noting that is a function of a, (4) is expressed as By solving (5) with respect to a using a numerical method, such as the bisection method, we obtain the required accrual period, say , and the required sample size .
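The accrual-period search described above can be illustrated with a simplified Python sketch. It is not the author's Fortran implementation and does not reproduce Equation (5): as a simplifying assumption, the required number of events is treated as a fixed input, and the code only solves "accrual rate × accrual period × average progression probability = required events" by bisection. The progression probability uses the standard closed form for exponential PFS with uniform accrual over the accrual period and a fixed additional follow-up. All function names and the target of 100 events are hypothetical.

```python
from math import exp


def event_probability(hazard, accrual, followup):
    """P(progression is observed) for exponential PFS with the given hazard,
    uniform accrual on [0, accrual], and `followup` additional follow-up,
    i.e., censoring time uniform on [followup, accrual + followup]."""
    return 1.0 - exp(-hazard * followup) * (1.0 - exp(-hazard * accrual)) / (hazard * accrual)


def accrual_period_for_events(target_events, rate, hazards, cell_probs, followup,
                              lo=1e-6, hi=100.0, tol=1e-8):
    """Bisection search for the accrual period a such that
    rate * a * sum_k cell_probs[k] * event_probability(hazards[k], a, followup)
    equals target_events. The left-hand side is increasing in a."""
    def expected_events(a):
        d_bar = sum(p * event_probability(h, a, followup)
                    for p, h in zip(cell_probs, hazards))
        return rate * a * d_bar

    if expected_events(hi) < target_events:
        raise ValueError("target not reachable within the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_events(mid) < target_events:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical design: four equally likely cells, annual hazards 2.0996 and
# 1.1957, 120 patients per year, 1 year of follow-up, 100 events required.
hazards = [2.0996, 1.1957, 2.0996, 2.0996]
a_star = accrual_period_for_events(100, 120, hazards, [0.25] * 4, followup=1.0)
print(f"accrual period = {a_star:.3f} years, sample size = {120 * a_star:.1f}")
```

In the actual formula the variance term also depends on the accrual period through the group-specific progression probabilities, so the equation solved by bisection is more involved than this sketch, but the root-finding step has the same structure.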

3.1.3. Example 1

We demonstrate our sample size calculation method with the NSCLC trial that is introduced above. We will randomize patients between the two treatment arms in a 1-to-1 fashion (i.e., an allocation proportion of 1/2 for each arm), stratified by TS status. The expected TS-positivity is 50% because the median TS level was selected as the cutoff value for TS-positivity from a previous study [9]. Hence, each of the four cells defined by treatment and TS status has a relative frequency of 1/4. The 6-month PFS is expected to be about 35% for the GC arm regardless of TS level and for the PC arm with TS-positivity, and 55% for the PC arm with TS-negativity. For an exponential distribution, the t-year survival probability is associated with its hazard rate $\lambda$ by $S(t) = \exp(-\lambda t)$. Therefore, under the exponential PFS assumption, the annual hazard rates under the alternative hypothesis are $-\log(0.35)/0.5 \approx 2.100$ for the GC arm and for TS-positive patients in the PC arm, and $-\log(0.55)/0.5 \approx 1.196$ for TS-negative patients in the PC arm. For these hazard rates, we have the baseline hazard rate $\lambda_0 \approx 2.100$, $\beta_1 \approx -0.563$, $\beta_2 = 0$, and $\beta_3 \approx 0.563$ from (2). Suppose that about 10 patients per month are expected to be enrolled in the study, i.e., an annual accrual rate of $r = 120$. We plan to follow the patients for additional year after the last patient enters. Then, the 1-sided test on the interaction term in model (1) requires patients for a power of . The expected number of events (i.e., number of patients with a disease progression) at the analysis will be . As an effort to lower the sample size for this phase II trial, we use a large $\alpha$ level compared to the standard two-sided $\alpha = 0.05$. We observe an empirical power of 0.897 from 10,000 simulation samples of size that are generated at the design setting. This trial recruited 321 patients using overall response as the primary endpoint [6]. A stratified randomized trial of a treatment with a predictive biomarker requires a large sample size for testing the interaction term. The sample size of a trial for testing the interaction term with 50% biomarker positivity may be compared to that of a trial for an arm-to-arm comparison with 1-to-3 randomization in the setting of the NSCLC trial, expecting a higher efficacy of PC only for the TS-negative group.
Let us consider a randomized trial to compare two treatment arms with as above and 6-month PFS of 35% for the control treatment and 55% for the experimental treatment. In this case, we need only () patients with 1-to-3 randomization, according to a sample size formula for the standard log-rank test [10]. If TS had already been validated as a predictive biomarker of pemetrexed before this trial, then we could have chosen an enrichment trial for TS-negative patients, which would require a much smaller sample size. The efficiency of an enrichment design and that of a stratified randomization design have been compared for a predictive biomarker in terms of a continuous outcome [11].
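The hazard rates and regression coefficients used in Example 1 can be recomputed from the stated 6-month PFS values. The sketch below assumes exponential PFS and a proportional hazards model of the form baseline hazard × exp(β1·treatment + β2·TS status + β3·treatment·TS status), with the GC arm and TS-negative group as the reference categories. The function names and the `h_jk` labelling (j for treatment, k for TS status) are illustrative choices, not the paper's notation.

```python
from math import log


def exponential_hazard(surv_prob, t_years):
    """Annual hazard rate of an exponential distribution whose survival
    probability at t_years equals surv_prob: S(t) = exp(-hazard * t)."""
    return -log(surv_prob) / t_years


def cox_parameters(h00, h10, h01, h11):
    """Solve h_jk = lam0 * exp(b1*j + b2*k + b3*j*k) for (lam0, b1, b2, b3),
    where j = treatment (0 GC, 1 PC) and k = TS status (0 negative, 1 positive)."""
    lam0 = h00
    beta1 = log(h10 / h00)
    beta2 = log(h01 / h00)
    beta3 = log(h11 * h00 / (h10 * h01))
    return lam0, beta1, beta2, beta3


# 6-month PFS: 35% for GC (any TS status) and for TS-positive PC; 55% for TS-negative PC.
h_ref = exponential_hazard(0.35, 0.5)
h_pc_neg = exponential_hazard(0.55, 0.5)

lam0, beta1, beta2, beta3 = cox_parameters(h00=h_ref, h10=h_pc_neg, h01=h_ref, h11=h_ref)
print(f"hazards: {h_ref:.4f} (reference), {h_pc_neg:.4f} (PC, TS-negative)")
print(f"lam0 = {lam0:.4f}, beta1 = {beta1:.4f}, beta2 = {beta2:.4f}, beta3 = {beta3:.4f}")
```

Because PC is assumed to help only TS-negative patients, the treatment coefficient and the interaction coefficient have equal magnitude and opposite sign, so the treatment effect cancels in the TS-positive group.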

3.2. A Phase II Trial with a Prognostic Biomarker

Prognostic biomarkers provide information on the overall cancer outcome in patients to facilitate cancer diagnosis regardless of selected treatments. In this section, we consider a phase II trial with an imaging prognostic biomarker. Chemotherapy B has been a standard regimen for patients with non-bulky stage I and II Hodgkin lymphoma. In a previous study on 6 cycles of B, each patient had a FDG-PET (fluorodeoxyglucose positron-emission tomography) imaging after 2 cycles of B. It was found that the patients with a negative PET image (group 1) and those with a positive PET image (group 2) had a 3-year PFS of and , respectively, and the hazard ratio, , was estimated as . In a new single-arm phase II trial, the patients with a negative PET image after 2 cycles of B will be treated by additional 4 cycles of the chemotherapy B as in the previous study, whereas those with a positive PET image after 2 cycles of B will be treated by 4 cycles of a more aggressive chemotherapy C plus radiation therapy (C+RT). In this trial, we want to show that by treating PET positive patients with the more aggressive therapy C+RT, their PFS will become closer to that of PET negative patients who are treated by the standard chemotherapy B. To this end, we test against . Although the PFS of group 2 will be different between and , that of group 1 is expected to be identical since PET negative patients receive the same treatment as that of the previous study.

3.2.1. Statistical Testing

Let denote the sample size in group k, the total sample size, and the time to progression for subject i in group k (). We observe , where is the minimum of and the censoring time and is an event (or progression) indicator taking 1 if the subject had a tumor progression and 0 otherwise. For group k, are distributed with hazard function . Under the proportional hazards assumption, denotes the hazard ratio between the two patient groups. Let denote the Aalen–Nelson estimator [12,13] for the cumulative hazard function , and are the at-risk process and the event process for group k, respectively, and . It was shown [14] that is increasing in , and is asymptotically , where Hence, we reject , in favor of , if with one-sided type I error rate . The test statistic with is the standard log-rank test [15].

3.2.2. Sample Size Calculation

We want to estimate the sample size n under a local alternative hypothesis (< with a desired power. For sample size calculation of this trial, we need to specify following design parameters. Type I error rate and power, PET-negativity and PET-positivity, Distributions of PFS for PET negative and PET positive patient groups: exponential distributions with hazard rates for PET negative group; under and under for PET positive group Accrual period a (or accrual rate r) and additional follow-up period b Using the specified hazard rates, we have and . A sample size formula with and for designing non-inferiority trials was proposed [14]. This sample size formula was further extended for general and with [16]. Appendix B derives a sample size formula by adapting Jung and Chow’s formula [16] for this trial, where , , is the survivor function of the censoring distribution with and . The integrals for , , and are calculated using a numerical method. The number of events D expected at the analysis time under is calculated by , where and . Sample size formula (6) assumes that the accrual period a is specified. Suppose that accrual rate r is specified instead of accrual period a. Given , and for are functions of a. Under uniform accrual assumption, we have . Hence, (6) is expressed as By solving (7) with respect to a, using a numerical method such as the bisection method, we obtain the required accrual period and the required sample size .
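The text notes that the integrals in the sample size formula are evaluated numerically. As a small illustration of that step only (not of formula (6) itself), the sketch below integrates the exponential PFS density against the survivor function of the censoring distribution implied by uniform accrual, using composite Simpson's rule, and then forms the expected number of events as the sample size times the prevalence-weighted progression probability. The function names, the sample size, and the prevalence values in the usage lines are hypothetical.

```python
from math import exp


def censoring_survivor(t, accrual, followup):
    """Survivor function of censoring time uniform on [followup, accrual + followup]."""
    if t <= followup:
        return 1.0
    if t <= accrual + followup:
        return (accrual + followup - t) / accrual
    return 0.0


def simpson(f, lo, hi, steps=1000):
    """Composite Simpson's rule with an even number of subintervals."""
    if steps % 2:
        steps += 1
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def event_probability_numeric(hazard, accrual, followup):
    """P(progression observed) = integral of hazard*exp(-hazard*t)*G(t) dt,
    integrated piecewise so that the kink of G at t = followup is a panel edge."""
    def integrand(t):
        return hazard * exp(-hazard * t) * censoring_survivor(t, accrual, followup)
    return (simpson(integrand, 0.0, followup)
            + simpson(integrand, followup, accrual + followup))


def expected_events(n, prevalence, hazards, accrual, followup):
    """Expected number of events: n * sum_k prevalence[k] * d_k."""
    return n * sum(q * event_probability_numeric(h, accrual, followup)
                   for q, h in zip(prevalence, hazards))


# Hypothetical design values, for illustration only.
d_neg = event_probability_numeric(0.0503, accrual=3.0, followup=2.0)
events = expected_events(200, [0.8, 0.2], [0.0503, 0.1004], accrual=3.0, followup=2.0)
print(f"d (PET-negative) = {d_neg:.4f}, expected events = {events:.1f}")
```

For exponential PFS the integral also has a closed form, which makes a convenient check on the numerical routine; numerical integration becomes necessary when other survival or accrual distributions are assumed.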

3.2.3. Example 2

We consider the PET-guided Hodgkin lymphoma trial introduced in the beginning of this section. Under $H_0$, we assume a 3-year PFS of 86% and 52% for the PET-negative and PET-positive groups, respectively, which correspond to annual hazard rates of approximately 0.0503 and 0.2180 under an exponential PFS model, resulting in a hazard ratio of approximately 4.34. By treating the PET-positive patients with the more aggressive treatment C+RT, we expect to increase their 3-year PFS up to 74% (from 52%), resulting in an annual hazard rate of approximately 0.1004 and a hazard ratio of approximately 2.00. The previous study observed about of PET-positivity. Assuming an annual accrual rate of patients and years of additional follow-up after completion of accrual, we need patients for power for detecting by the generalized log-rank test with one-sided under . Under this specific alternative hypothesis, we expect about 46 events (progressions or deaths) at the data analysis. This trial was conducted with this study objective as a secondary objective [17]. Simulation studies are conducted to evaluate the performance of the calculated sample size under the above design settings under $H_0$ and $H_1$. Using 10,000 simulation samples of size under each hypothesis, the empirical type I error rate and power are observed as 0.0984 (to be compared to the nominal $\alpha$) and 0.8749 (to be compared to the nominal power), respectively.
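The hazard rates and hazard ratios implied by the 3-year PFS values in Example 2 follow directly from the exponential model. The short sketch below recomputes them; the function and variable names are illustrative only.

```python
from math import log


def exponential_hazard(surv_prob, t_years):
    """Annual hazard rate of an exponential distribution with
    survival probability surv_prob at t_years."""
    return -log(surv_prob) / t_years


lam_neg = exponential_hazard(0.86, 3)      # PET-negative, standard chemotherapy B
lam_pos_h0 = exponential_hazard(0.52, 3)   # PET-positive, as in the previous study
lam_pos_h1 = exponential_hazard(0.74, 3)   # PET-positive, targeted by C+RT

ratio_h0 = lam_pos_h0 / lam_neg            # hazard ratio under the null hypothesis
ratio_h1 = lam_pos_h1 / lam_neg            # hazard ratio under the alternative

print(f"hazards: {lam_neg:.4f}, {lam_pos_h0:.4f}, {lam_pos_h1:.4f}")
print(f"hazard ratios: {ratio_h0:.3f} (null), {ratio_h1:.3f} (alternative)")
```

The trial therefore aims to show that the hazard ratio between PET-positive and PET-negative patients falls from roughly 4.3 to roughly 2.0 when PET-positive patients receive the more aggressive therapy.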

4. Discussion

We have presented design and analysis methods for two phase II trials of biomarker-guided treatments. The power of the statistical tests discussed above depends on the prevalence of biomarker positivity. Therefore, we need to check the observed prevalence during patient accrual, and to recalculate the sample size if the observed prevalence is very different from the one specified at the design stage. For both of our example trials, the initial sample size will be under-powered if the observed prevalence is farther from than the specified one at the design stage. In this case, we may plan a sample size recalculation reflecting the observed prevalence in the middle of the trial, and modify the sample size of the trial if necessary. For the sample size calculations, we have assumed exponential survival distributions and an accrual pattern with a constant accrual rate, but the formulas can easily be extended to any survival distribution and any accrual pattern [18]. We have considered a survival endpoint as the primary endpoint in this paper, but the concept can be used to design and analyze biomarker-driven phase II trials with other types of endpoints, such as a binary outcome for tumor response. As an effort to lower the sample size of a phase II trial from that of a phase III trial, we use a high type I error rate [19,20], such as 1-sided or 10% (compared to 2-sided 5%), a surrogate short-term outcome, such as tumor response or progression-free survival (compared to a confirmatory endpoint such as overall survival), a larger treatment effect, and a single-arm design (compared to a randomized design). Despite these efforts, we observe that a randomized trial stratified by a predictive biomarker requires a relatively large sample size for a phase II trial. This has also been pointed out in the literature [3,11,21].

5. Conclusions

Through simulations on the two real study examples, we find that the proposed statistical tests control the type I error rate accurately and the calculated sample sizes maintain the appropriate power. The sample size calculations require some numerical methods for integration and solving equations. The author developed Fortran programs to implement the sample size formulas, which are available upon request.

15 in total

1. Sample sizes for proportional hazards survival studies with arbitrary patient entry and loss to follow-up distributions.

Authors: N A Yateman; A M Skene
Journal: Stat Med Date: 1992-06-15 Impact factor: 2.373

2. Evaluating the efficiency of targeted designs for randomized clinical trials.

Authors: Richard Simon; Aboubakar Maitournam
Journal: Clin Cancer Res Date: 2004-10-15 Impact factor: 12.531

3. Randomized clinical trials with biomarkers: design issues.

Authors: Boris Freidlin; Lisa M McShane; Edward L Korn
Journal: J Natl Cancer Inst Date: 2010-01-14 Impact factor: 13.506

4. Randomized phase II trial designs with biomarkers.

Authors: Boris Freidlin; Lisa M McShane; Mei-Yin C Polley; Edward L Korn
Journal: J Clin Oncol Date: 2012-08-06 Impact factor: 44.544

5. Significance of thymidylate synthase and thyroid transcription factor 1 expression in patients with nonsquamous non-small cell lung cancer treated with pemetrexed-based chemotherapy.

Authors: Jong-Mu Sun; Joungho Han; Jin Seok Ahn; Keunchil Park; Myung-Ju Ahn
Journal: J Thorac Oncol Date: 2011-08 Impact factor: 15.609

6. Pemetrexed Plus Cisplatin Versus Gemcitabine Plus Cisplatin According to Thymidylate Synthase Expression in Nonsquamous Non-Small-Cell Lung Cancer: A Biomarker-Stratified Randomized Phase II Trial.

Authors: Jong-Mu Sun; Jin Seok Ahn; Sin-Ho Jung; Jiyu Sun; Sang Yun Ha; Joungho Han; Keunchil Park; Myung-Ju Ahn
Journal: J Clin Oncol Date: 2015-06-29 Impact factor: 44.544

7. On sample size calculation for comparing survival curves under general hypothesis testing.

Authors: Sin-Ho Jung; Shein-Chung Chow
Journal: J Biopharm Stat Date: 2012 Impact factor: 1.051

8. Significance of thymidylate synthase for resistance to pemetrexed in lung cancer.

Authors: Hiroaki Ozasa; Tetsuya Oguri; Takehiro Uemura; Mikinori Miyazaki; Ken Maeno; Shigeki Sato; Ryuzo Ueda
Journal: Cancer Sci Date: 2009-09-10 Impact factor: 6.716

9. CALGB 50604: risk-adapted treatment of nonbulky early-stage Hodgkin lymphoma based on interim PET.

Authors: David J Straus; Sin-Ho Jung; Brandelyn Pitcher; Lale Kostakoglu; John C Grecula; Eric D Hsi; Heiko Schöder; Leslie L Popplewell; Julie E Chang; Craig H Moskowitz; Nina Wagner-Johnston; John P Leonard; Jonathan W Friedberg; Brad S Kahl; Bruce D Cheson; Nancy L Bartlett
Journal: Blood Date: 2018-07-26 Impact factor: 22.113

10. Randomized phase II clinical trials.

Authors: Sin-Ho Jung; Daniel J Sargent
Journal: J Biopharm Stat Date: 2014 Impact factor: 1.051