
Optimal two-stage designs based on restricted mean survival time for a single-arm study.

Abstract

Restricted mean survival time is an alternative measure of treatment effect to the hazard ratio in clinical trials with a time-to-event outcome. Current methods have focused on one-stage designs. In this article, we propose optimal two-stage designs for a single-arm study with the smallest expected sample size. We compare the performance of the new optimal two-stage designs with the existing one-stage design with regard to the expected sample size and the expected total study length. The simulation results indicate that the new two-stage designs can substantially reduce the expected sample size as compared to the one-stage design. We use a non-small cell lung cancer trial to illustrate the application of the proposed designs. The proposed optimal two-stage designs are recommended for use when the time for patient accrual is longer than the pre-specified follow-up time.

Keywords: Optimal designs; Proportional hazards; Restricted mean survival time; Two-stage designs

Year: 2021 PMID: 33553801 PMCID: PMC7856426 DOI: 10.1016/j.conctc.2021.100732

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654

Introduction

Restricted mean survival time (RMST) is an alternative measure of treatment effect to the hazard ratio (HR) for a study with time-to-event data [1-3]. RMST is calculated as the area under the survival curve from time 0 to a pre-specified time τ (e.g., 1 year) [4]. Thus, it can be interpreted as the expected survival time for a patient followed up to τ [5,6]. Although HR is traditionally used in designing clinical trials, its statistical inference depends on the assumption of proportional hazards over time. It was reported that almost 25% of cancer studies did not meet the proportional hazards assumption [7], but HR-based models were still used in data analysis. In early phase cancer clinical trials with a survival outcome as the primary endpoint, treatment effects are often compared by using the survival rates at the pre-specified time. The survival rate at τ provides information at that particular time only, while RMST captures the survival information from baseline to τ and so gives a better and more comprehensive understanding of the treatment effect on average survival. Recently, Trinquart et al. [7] compared statistical inferences using HR and the difference of RMST in 54 randomized cancer clinical trials selected from five leading oncology journals. They found that the two measures generally agree with each other with regard to the group difference [8,9]. They also reported that the computed HR is often larger than the ratio of RMST in those cancer studies. Later, Huang and Kuan [10] used simulation studies to compare the performance of the HR-based inference and the RMST test [11]. Their simulation results indicated that the log-rank test is generally more powerful than the RMST test when the proportional hazards assumption is satisfied [8,12]. However, when that assumption is not met, the RMST test is preferable.
In early phase cancer clinical trials, single-arm studies are widely used to quickly investigate the activity of a new treatment as compared to the treatment effect estimated from historical data. Current designs based on RMST have focused on one-stage designs (e.g., as implemented in an R package) [1,4,13,14]. When a new treatment under investigation is not as effective as expected, a study should allow early stopping for futility. Therefore, there is a need to develop new optimal single-arm two-stage designs based on RMST to investigate the activity of a new treatment. The new two-stage designs in this article are based on the same criteria as Simon's two-stage designs for binary outcomes [15-21], having the smallest expected sample size under the null hypothesis [22-24]. In a two-stage design, investigators could temporarily suspend patient enrollment after the last first-stage patient is enrolled, and wait for time τ to observe the time-to-event outcome of each participant. A two-stage design without interim accrual could be used when τ is short. Alternatively, investigators may consider the optimal two-stage designs with interim accrual to save study time [9,15,25]. The rest of the article is organized as follows. In Section 2, we provide the detailed steps to compute sample sizes for optimal two-stage designs based on RMST. In Section 3, we compare the performance of the existing one-stage design with the proposed two-stage designs with regard to the expected sample size and the expected total study length. Then, a non-small cell lung cancer trial is used as historical data to calculate sample sizes for optimal two-stage designs. Lastly, we provide some comments in Section 4.

Methods

When the patient accrual time is longer than the restricted follow-up time (τ), a two-stage design can potentially save sample size and cost as compared to the one-stage design. In this article, we propose new optimal two-stage designs based on the RMST with the pre-specified follow-up time τ in a single-arm study setting. Suppose the RMST estimated from previous studies serves as the null value, and a new treatment is expected to have a larger RMST, the alternative value. The hypotheses to test the effectiveness of the new treatment then compare the null hypothesis based on the historical RMST against the one-sided alternative that the RMST is larger, where the statistical power of a study is computed at the alternative value. For a study with N participants, we assume that they are enrolled uniformly during the accrual time, which is determined by N and the patient accrual rate θ. For the i-th participant (i = 1, 2, ..., N), the event time may be right censored, and a censoring indicator records whether the event was observed. For a one-stage design, the study ends when all participants have been followed for τ. Upon the completion of the study, the observed time for the i-th participant is the smaller of the event time and the censoring time. It follows that the RMST after the completion of a study can be estimated as the area from 0 to τ under the Kaplan–Meier (KM) estimator [26] for the survival function of T based on the observed data. This estimate can be computed by using a package from the statistical software R [27]; specifically, the package's function for a single-arm study is used. The developed R software program is available upon request from the author.

For a two-stage design, we propose two commonly used optimal designs which control the type I error and power: the optimal two-stage design, with the smallest expected sample size (ESS) under the null hypothesis; and the minimax two-stage design, with the smallest ESS among the designs with the smallest maximum possible sample size (MPSS).
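As a minimal sketch of the KM-based RMST estimate described above, the following Python function integrates the Kaplan–Meier step function from 0 to τ. It is not the authors' R program; the function name `km_rmst` and the toy data are hypothetical and used only for illustration.

```python
from typing import List


def km_rmst(times: List[float], events: List[int], tau: float) -> float:
    """Area under the Kaplan-Meier curve on [0, tau].

    times  : observed times (event or censoring), one per participant
    events : 1 if the event was observed, 0 if right censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current height of the KM step function
    area = 0.0      # accumulated area under the curve
    last_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        d = 0       # events at time t
        m = 0       # participants leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            m += 1
            i += 1
        area += surv * (t - last_t)        # rectangle up to time t
        if d > 0:
            surv *= 1.0 - d / n_at_risk    # KM drop at an event time
        n_at_risk -= m
        last_t = t
    area += surv * (tau - last_t)          # last rectangle up to tau
    return area


# Toy data (hypothetical): one participant is censored at time 3.
rmst_example = km_rmst([2.0, 3.0, 4.0, 6.0], [1, 0, 1, 1], tau=5.0)
```

When nobody is censored before τ, this estimate reduces to the sample mean of min(T, τ), which is a convenient sanity check.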
These two optimal designs are commonly utilized in cancer clinical trials, and are often known as the optimal design and the minimax design, respectively. For a study with a time-to-event endpoint, the outcome is a long-term measure. Optimal two-stage designs with interim accrual could significantly reduce study time. Meanwhile, designs without interim accrual are able to observe the time-to-event outcome of all participants at the interim analysis. When the first-stage time is too short (e.g., shorter than τ), the computed area under the survival curve runs from 0 to the first-stage time, not from 0 to τ. Then, the RMST estimator from the first stage cannot be considered as an estimator for the RMST in Equation (1). Therefore, the first-stage time must be longer than τ, such as 1.1τ.

For a study with N1 patients in the first stage and no interim accrual, the total time in the first stage is the first-stage accrual time plus τ, and the first-stage RMST is estimated as the area from 0 to τ under the KM estimator for the survival function of T based on the first-stage data. For a study with interim accrual, participants enrolled close to the interim analysis may be censored due to a short follow-up time. Given the entry time of the i-th participant, the observed time and the censoring indicator for that participant are determined by the follow-up available at the interim analysis. The first-stage RMST for a study with interim accrual is then estimated in the same way, using the KM estimator for the survival function of T based on these observed data.

To compute the required sample size for a study with design parameters (τ, the null and alternative RMST values, α, β, and κ), the very first step is to determine the scale parameters of the Weibull distribution, where κ is the shape parameter of the Weibull distribution. The scale parameter under the null hypothesis can be determined by solving the equation that matches the RMST to its null value in a one-stage design setting, where T follows the Weibull distribution with shape parameter κ and that scale parameter. Similarly, the scale parameter under the alternative hypothesis is solved from the corresponding equation for the alternative RMST value.
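One way to carry out the scale-parameter step is sketched below in Python, assuming the target is the population RMST of a Weibull distribution with survival function exp(-(t/scale)^κ). The authors' own procedure may differ (for instance, it may work through the KM-based estimator); the function names, the Simpson-rule integration, and the bisection bounds are all assumptions made for this illustration.

```python
import math


def weibull_rmst(scale: float, shape: float, tau: float, steps: int = 2000) -> float:
    """RMST of a Weibull(shape, scale) on [0, tau] by composite Simpson's rule."""
    h = tau / steps  # steps must be even for Simpson's rule

    def surv(t: float) -> float:
        return math.exp(-((t / scale) ** shape))

    total = surv(0.0) + surv(tau)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * surv(k * h)
    return total * h / 3.0


def solve_scale(target_rmst: float, shape: float, tau: float,
                lo: float = 1e-6, hi: float = 1e6, iters: int = 80) -> float:
    """Find the scale whose RMST equals target_rmst (needs 0 < target_rmst < tau).

    The RMST increases with the scale, so bisection on a log scale works.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if weibull_rmst(mid, shape, tau) < target_rmst:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Example: exponential case (shape 1), tau = 12 months, null RMST = 0.5 * tau.
scale_null = solve_scale(target_rmst=6.0, shape=1.0, tau=12.0)
```

For shape 1 the RMST has the closed form scale × (1 − exp(−τ/scale)), which can be used to check the numerical solution.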
For a one-stage optimal design, data are simulated from Weibull distributions with κ as the shape parameter and with the scale parameters under the null and alternative hypotheses. For each simulated data set, we use the R package to calculate the KM-based RMST estimate, which is used as the test statistic for the type I error and power calculations. The (1-α) quantile of the test statistics from the null data is the threshold to control the type I error. Power is then computed as the proportion of data sets under the alternative distribution whose test statistics are above that threshold. The optimal one-stage design is defined as the one with the smallest N and the estimated power above (1-β).

Given the required sample size for the one-stage design, we search for optimal two-stage designs with the MPSS N ranging from 0.9 to 1.2 times that sample size. In practice, the sample size boundaries should be adjusted when an optimal two-stage design is identified near the boundary of the sample size space. For a given N, the lower bound of the first-stage sample size N1 includes the constant 1.1 to make sure that at least 10% of participants have been followed for τ. An upper bound on N1 is also imposed. For each given N1 and N, the test statistics for the first stage and for both stages combined are computed under the null and alternative hypotheses. The type I error rate is calculated as the probability, under the null hypothesis, that the first-stage statistic exceeds r1 and the combined statistic exceeds r, where r1 and r are the thresholds for the first stage and both stages combined, respectively. Candidate r1 values are taken uniformly over a range of the computed first-stage statistics, and a similar approach is used to choose the r values from the computed combined statistics. For any set of (r1, r, N1, N) that controls the type I error rate, we compute the power of the two-stage design as the same probability under the alternative hypothesis. If the computed power is above the nominal level (1-β), its associated sample sizes and threshold values are saved as a candidate for the desired optimal design. The ESS under the null hypothesis is computed as ESS = N1 + (1 - PET)(N - N1), where PET is the probability of early termination at the first stage due to futility.
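The evaluation of one candidate design can be sketched as follows, assuming the simulation has already produced pairs of (first-stage statistic, combined statistic) under each hypothesis. The function name, the strict inequalities, and the toy numbers are assumptions for illustration; the source does not state how ties at a threshold are handled.

```python
from typing import Dict, List, Tuple

Pair = Tuple[float, float]  # (first-stage RMST estimate, combined RMST estimate)


def two_stage_characteristics(null_pairs: List[Pair], alt_pairs: List[Pair],
                              r1: float, r: float, n1: int, n: int) -> Dict[str, float]:
    """Empirical type I error, power, PET, and ESS for one candidate (r1, r, n1, n)."""

    def rejects(pair: Pair) -> bool:
        # Continue past the first stage and reject at the end of the study.
        return pair[0] > r1 and pair[1] > r

    type1 = sum(rejects(p) for p in null_pairs) / len(null_pairs)
    power = sum(rejects(p) for p in alt_pairs) / len(alt_pairs)
    # Probability of early termination for futility under the null hypothesis.
    pet = sum(p[0] <= r1 for p in null_pairs) / len(null_pairs)
    ess = n1 + (1.0 - pet) * (n - n1)
    return {"type1": type1, "power": power, "pet": pet, "ess": ess}


# Toy simulated statistics (hypothetical values, not taken from the article).
null_pairs = [(4.0, 4.0), (5.0, 4.2), (6.0, 4.4), (6.0, 5.0)]
alt_pairs = [(6.0, 5.0), (7.0, 6.0), (5.0, 6.0), (6.0, 4.6)]
oc = two_stage_characteristics(null_pairs, alt_pairs, r1=5.5, r=4.5, n1=30, n=60)
```

A design search would loop this function over grids of r1, r, N1, and N, keep the candidates whose type I error and power meet the nominal levels, and then pick the one with the smallest ESS (optimal) or the smallest N (minimax).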
The expected total study length (ETSL) is calculated separately for a study with interim accrual (Equation (2)) and for a study without interim accrual (Equation (3)). It should be noted that the PETs in Equation (2) and Equation (3) are computed using different distributions. The ESS and the ETSL are highly correlated: a study with a large ESS often has a long ETSL. When the PET is high, the computed ESS is close to the first-stage sample size and the ETSL is close to the first-stage time.
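The ESS and ETSL calculations can be sketched in Python as below. The ETSL expressions are assumptions, since the displayed equations did not survive in this copy: each stage is taken to last its accrual time at a constant rate, plus τ of follow-up where enrollment is suspended. The accrual rate is also assumed to be the one-stage sample size divided by the planned accrual time. With the rounded PET values reported in Table 1 and Table 2, these assumed forms give values close to the tabulated ESS and ETSL for the 2-year accrual rows, but they should be checked against the published equations.

```python
def expected_sample_size(n1: int, n: int, pet: float) -> float:
    """ESS = N1 + (1 - PET) * (N - N1)."""
    return n1 + (1.0 - pet) * (n - n1)


def etsl_without_interim_accrual(n1: int, n: int, pet: float,
                                 rate: float, tau: float) -> float:
    """Assumed form: enrollment pauses for tau after the first stage."""
    stage1 = n1 / rate + tau             # enroll N1 patients, then follow for tau
    stage2 = (n - n1) / rate + tau       # enroll the rest, then follow for tau
    return stage1 + (1.0 - pet) * stage2


def etsl_with_interim_accrual(n1: int, n: int, pet: float,
                              rate: float, tau: float) -> float:
    """Assumed form: the interim analysis happens without pausing enrollment."""
    stage1 = n1 / rate                   # time until the interim analysis
    remaining = (n - n1) / rate + tau    # rest of accrual plus final follow-up
    return stage1 + (1.0 - pet) * remaining


# Lung cancer example: 62 one-stage patients over 24 months, tau = 12 months.
rate = 62 / 24.0  # assumed accrual rate in patients per month
ess_table2 = expected_sample_size(34, 72, 0.80)
etsl_table2 = etsl_without_interim_accrual(34, 72, 0.80, rate, 12.0)
etsl_table1 = etsl_with_interim_accrual(34, 67, 0.99, rate, 12.0)
```

Small differences from the tables are expected because the PET values printed there are rounded to two decimals.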

Results

We compare the performance of the one-stage design and the two-stage optimal designs with regards to the ESS and the ETSL under the null hypothesis. The considered RMST values under the null hypothesis are: = 0.2τ, 0.5τ, and 0.7τ, where the restricted follow-up time τ is set as 12 months. The difference between and is 0.10τ, 0.15τ, and 0.20τ. The shape parameter of the Weibull distribution is assumed to be the same under the null and the alternative hypotheses, with the values of 0.5, 1, and 1.25. The exponential distribution is a special case of the Weibull distribution with . The type I error rate is set as , and the nominal level of power is 80% (). Since the null hypothesis is rejected for a large RMST value, it is a one-sided test with . We use Monte Carlo simulation studies to generate random samples to identify the designs that meet both the type I error and power requirements as described in the Methods section. Fig. 1 shows the one-stage design and the two-stage designs when the RMST is months. The required one-stage sample sizes decrease as the shape parameter of the Weibull distribution κ increases. The ESS of two-stage optimal designs is much smaller than the sample size of the one-stage design, with savings from 24% to 32% having the average saving of 29%. The ESS of the minimax two-stage design is between the optimal two-stage design and the one-stage design, and it saves the ESS in the average of 17% with the range from 10% to 22% as compared to the one-stage design. The savings of the ETSL for the proposed two-stage designs are similar to the ESS savings.
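The one-stage part of such a Monte Carlo study can be sketched in Python as follows. This is an illustration rather than the authors' program: it assumes every participant is followed for τ (so the KM-based RMST equals the mean of min(T, τ)), and the sample size, scale parameters, number of simulations, seed, and the value α = 0.05 are hypothetical choices made here, not values taken from the article.

```python
import math
import random
from typing import List, Tuple


def rmst_full_followup(times: List[float], tau: float) -> float:
    """With everyone followed for tau, the KM-based RMST is the mean of min(T, tau)."""
    return sum(min(t, tau) for t in times) / len(times)


def simulate_stats(n: int, scale: float, shape: float, tau: float,
                   n_sims: int, rng: random.Random) -> List[float]:
    """Simulate n_sims trials of size n and return their RMST estimates."""
    return [
        rmst_full_followup([rng.weibullvariate(scale, shape) for _ in range(n)], tau)
        for _ in range(n_sims)
    ]


def one_stage_threshold_and_power(n: int, scale0: float, scale1: float, shape: float,
                                  tau: float, alpha: float, n_sims: int = 2000,
                                  seed: int = 1) -> Tuple[float, float]:
    rng = random.Random(seed)
    null_stats = sorted(simulate_stats(n, scale0, shape, tau, n_sims, rng))
    alt_stats = simulate_stats(n, scale1, shape, tau, n_sims, rng)
    # Empirical (1 - alpha) quantile of the null statistics is the threshold.
    threshold = null_stats[math.ceil((1.0 - alpha) * n_sims) - 1]
    # Power: share of alternative statistics above the threshold.
    power = sum(s > threshold for s in alt_stats) / n_sims
    return threshold, power


# Hypothetical setting: exponential survival (shape 1), tau = 12 months.
threshold, power = one_stage_threshold_and_power(
    n=60, scale0=10.0, scale1=20.0, shape=1.0, tau=12.0, alpha=0.05
)
```

Repeating this over increasing n and stopping at the first n whose power reaches the nominal level gives a one-stage sample size in the spirit of the Methods section.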

Fig. 1

The ESS and the ETSL of the proposed two-stage designs as compared to the one-stage design, when months and the alternative expected survival months (top), (middle), and (bottom).

We present the MPSS of the optimal and minimax two-stage designs in Fig. 2 using the 9 configurations from Fig. 1 when months. It can be seen that the MPSS of the minimax designs is generally smaller than that of the optimal designs, with the average saving of 15%. We observe similar results for the ESS and the ETSL comparisons between the one-stage design and the two-stage designs in Fig. 3 when months, and in Fig. 4 when months.

Fig. 2

The MPSS of the optimal two-stage designs and the MPSS of the minimax two-stage designs for the 9 configurations in Fig. 1 when months.

Fig. 3

The ESS and the ETSL of the proposed two-stage designs as compared to the one-stage design, when months and the alternative expected survival months (top), (middle), and (bottom).

Fig. 4

The ESS and the ETSL of the proposed two-stage designs as compared to the one-stage design, when months and the alternative expected survival months (top), (middle), and (bottom).

We use a phase II refractory or recurrent non-small cell lung cancer trial [28] to further study the proposed single-arm two-stage optimal designs. Data from the phase II study reported by Takiguchi et al. [28] are used as pilot data to estimate the RMST given 12 months. The RMST is estimated as , which can be calculated by using the digital software, Engauge Digitizer, to capture the measures on the progression-free survival (PFS) curve. Suppose we have a new treatment for refractory or recurrent non-small cell lung cancer with the estimated RMST of which is months longer than the treatment with second-line chemotherapy consisting of weekly irinotecan and cisplatin in the reported phase II study. The calculated one-stage sample size is 62 to attain 80% power at based on RMST. Alternatively, one may use the HR model for sample size calculation, and it is computed as 60 patients for the one-stage design [8]. We assume that patients are enrolled uniformly over the 2-year time period. The optimal and minimax two-stage designs are presented in Table 1 for studies with interim accrual. The expected sample sizes of optimal two-stage designs are much less than that of the one-stage design: ESS = 34.10 for the optimal design and ESS = 38.30 for the minimax design. The ETSL is shorter in the optimal design as compared to the minimax design. As expected, the MPSS of the minimax design is less than that of the optimal design.
Given the 62 patients in the one-stage design, we also investigate the effect of patient accrual time on the ESS and the ETSL, with accrual times of 3 years or 5 years in Table 1. It can be seen that the ESS of the optimal design decreases as the accrual time becomes longer. When a study has a longer accrual time, the lower bound of the first-stage sample size becomes smaller, which leads to a smaller ESS of the optimal design. But the ETSL generally becomes longer. We notice that the PETs of these designs are very high for this example. In the aforementioned simulation studies, the PET is near 75% for the configurations in Fig. 1.

Table 1

Design	r1	r	N1	N	PET	ESS	ETSL (months)
Patient accrual time: 2 years
Optimal	5.54	4.55	34	67	0.99	34.10	13.25
Minimax	5.18	4.59	38	60	0.99	38.30	15.01
Patient accrual time: 3 years
Optimal	5.35	4.54	23	70	0.98	24.01	14.21
Minimax	5.07	4.60	24	58	0.96	25.53	15.38
Patient accrual time: 5 years
Optimal	5.07	4.50	14	67	0.91	18.63	19.14
Minimax	4.26	4.58	29	55	0.74	35.64	37.66

Optimal two-stage designs with interim accrual based on the pilot data from the non-small cell lung cancer trial, with the design parameters: months, months, months, , and 80% power. The sample size for the one-stage design is 62 patients.

We also present the optimal and minimax two-stage designs with no interim patient accrual for this example in Table 2. When a study is designed with no interim patient accrual, patient enrollment is temporarily suspended once the first-stage patients are enrolled. In this case, the interim data analysis is conducted after all first-stage patients have been followed for τ to observe the time-to-event outcome. The ETSL is calculated by using Equation (3). As expected, designs without interim accrual in Table 2 have longer expected study lengths than the designs with interim accrual in Table 1. The ESS of the optimal design remains similar as the patient accrual time increases, although the ETSL becomes longer.

Table 2

Optimal two-stage designs without interim accrual based on the pilot data from the non-small cell lung cancer trial, with the design parameters: months, months, months, , and 80% power. The sample size for the one-stage design is 62 patients.

Design	r1	r	N1	N	PET	ESS	ETSL (months)
Patient accrual time: 2 years
Optimal	4.31	4.44	34	72	0.80	41.60	30.52
Minimax	4.07	4.58	40	60	0.68	46.34	33.76
Patient accrual time: 3 years
Optimal	4.23	4.43	25	77	0.73	38.87	37.80
Minimax	4.09	4.61	41	58	0.68	46.39	42.77
Patient accrual time: 5 years
Optimal	4.18	4.46	23	71	0.70	37.40	51.91
Minimax	4.19	4.59	48	54	0.78	49.30	62.47

Discussion

For a two-stage design, when the first-stage sample size is very close to its upper bound, the sample size difference between the two-stage optimal designs and the one-stage design is negligible. On the other hand, when the first-stage sample size is very close to the lower boundary and the PET is high, the ESS is small. For this reason, it could be practically useful to add a new constraint on the sample size ratio between the two stages, to reduce the chance of obtaining designs near the parameter boundaries. In the simulation studies, we add the restriction that the time for the first stage is at least 10% longer than τ. Given a small to medium sample size in an early phase clinical trial, it is important to measure the treatment effect with as many completed cases as possible. For this reason, we suggest that researchers provide investigators with optimal or minimax designs under multiple restrictions (e.g., 1.2τ, 1.4τ). The developed R software program can be modified to search for the optimal designs by adding that constraint. In the lung cancer example, when we increase the percentage of patients who have completed the follow-up, the optimal two-stage designs are often identified with the first-stage sample size very close to the lower boundary, and the first-stage sample size of the minimax designs moves towards the lower boundary as that percentage increases. As pointed out by the reviewers, the proposed two-stage designs can be applied to studies whose patient accrual time is longer than the follow-up time τ. When a study's follow-up time is very long (e.g., trials for Alzheimer's disease [29,30]), it is possible that all patients are already enrolled while no patient has been followed for τ. In such cases, other new two-stage designs should be developed. Conditional power after the first stage may be considered as an alternative criterion to the first-stage threshold for futility stopping at the end of the first stage.
Jennison and Turnbull [31] reviewed several approaches for conditional power calculation for group sequential designs. Conditional power is computed under the assumption that future patients are similar to the already enrolled patients. A study is stopped early for futility when the conditional power is low. When only a few patients have completed the follow-up, the information from existing patients may not be reliable enough to test that important assumption. When the estimated RMST from historical data is not reliable (e.g., very few studies, with different populations), a two-arm or multiple-arm randomized study may be utilized to compare the new treatment(s) and the control. Uno et al. [1,27] developed one-stage designs for a randomized clinical trial comparing one treatment with the control based on RMST. It would be a straightforward extension of their one-stage designs to a study with multiple arms. However, it is computationally intensive to extend their work to randomized two-stage designs with two or more arms. We consider this as future work. Another interesting topic would be adaptive designs based on RMST, where the second-stage sample size and the censoring distribution can be modified using the observed results from the first stage [21]. To develop such adaptive designs, effective search algorithms are critical for finding the optimal adaptive designs [14,32].

22 in total

1. Randomized two-stage Phase II clinical trial designs based on Barnard's exact test.

Authors: Guogen Shan; Changxing Ma; Alan D Hutson; Gregory E Wilding
Journal: J Biopharm Stat Date: 2013 Impact factor: 1.051

2. Predicting the restricted mean event time with the subject's baseline covariates in survival analysis.

Authors: Lu Tian; Lihui Zhao; L J Wei
Journal: Biostatistics Date: 2013-11-29 Impact factor: 5.899

3. Accurate confidence intervals for proportion in studies with clustered binary outcome.

Authors: Guogen Shan
Journal: Stat Methods Med Res Date: 2020-04-03 Impact factor: 3.021

4. Exact Unconditional Tests for Dichotomous Data When Comparing Multiple Treatments With a Single Control.

Authors: Guogen Shan; Carolee Dodge-Francis; Gregory E Wilding
Journal: Ther Innov Regul Sci Date: 2020-01-06 Impact factor: 1.778

5. Two-stage optimal designs based on exact variance for a single-arm trial with survival endpoints.

Authors: Guogen Shan
Journal: J Biopharm Stat Date: 2020-03-04 Impact factor: 1.051

6. Design of non-inferiority randomized trials using the difference in restricted mean survival times.

Authors: Isabelle R Weir; Ludovic Trinquart
Journal: Clin Trials Date: 2018-08-03 Impact factor: 2.486

7. Alternatives to Hazard Ratios for Comparing the Efficacy or Safety of Therapies in Noninferiority Studies.

Authors: Hajime Uno; Janet Wittes; Haoda Fu; Scott D Solomon; Brian Claggett; Lu Tian; Tianxi Cai; Marc A Pfeffer; Scott R Evans; Lee-Jen Wei
Journal: Ann Intern Med Date: 2015-07-21 Impact factor: 25.391

8. Phase II study of weekly irinotecan and cisplatin for refractory or recurrent non-small cell lung cancer.

Authors: Yuichi Takiguchi; Tetsuro Moriya; Yoshiko Asaka-Amano; Tatsuo Kawashima; Katsushi Kurosu; Yuji Tada; Keiichi Nagao; Takayuki Kuriyama
Journal: Lung Cancer Date: 2007-07-20 Impact factor: 5.705

9. Exact inference for the random-effect model for meta-analyses with rare events.

Authors: Jessica Gronsbell; Chuan Hong; Lei Nie; Ying Lu; Lu Tian
Journal: Stat Med Date: 2019-12-09 Impact factor: 2.373

10. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome.

Authors: Patrick Royston; Mahesh K B Parmar
Journal: BMC Med Res Methodol Date: 2013-12-07 Impact factor: 4.615
