
Survival analysis following dynamic randomization.

Xiaolong Luo¹, Mingyu Li¹, Gongjun Xu², Dongsheng Tu³.

Abstract

In this paper, we propose a method to analyze survival data from a clinical trial that uses dynamic randomization for subject enrollment. The method directly accounts for the dynamic randomization process using a marked point process (MPP), whose corresponding martingale process is used to formulate an equation for estimating the treatment effect size and for hypothesis testing. We perform simulation analyses to evaluate the proposed method as well as the conventional log rank method and the re-randomization testing procedure.

Year: 2016 PMID: 29736455 PMCID: PMC5935860 DOI: 10.1016/j.conctc.2016.02.004

Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654

Introduction

Randomized controlled trials have been the gold standard for demonstrating the safety and efficacy of an experimental regimen compared with a standard regimen, and they form the main body of evidence for regulatory approval for marketing authorization. The randomization of subjects in clinical trials is devised to achieve two important objectives: a) to ensure that samples from the treatment groups are comparable with respect to prognostic factors [8]; and b) to ensure the distributional validity of the statistics used for estimating and testing the treatment effect [6] and [1]. In theory, simple randomization could ensure distributional balance of prognostic factors between treatment groups. However, with a finite sample, randomization may not achieve the intended balance within each prognostic factor. To address this problem, Zelen [26] proposed block randomization to ensure balance within a few strata. When the number of strata levels increases, however, block randomization may not achieve overall balance between the treatment groups either. Taves [24] and Pocock and Simon [13] proposed dynamic randomization as a practical solution, and the method has been commonly used in clinical trials with many well-known prognostic factors (e.g., see Ref. [16]). While dynamic randomization can ensure balance between the assigned treatment groups with respect to many prognostic factors (see Ref. [25]) and thus achieves objective a), the process may alter the unconditional distribution of the treatment assignment and make objective b) questionable when a probability distribution is applied to the test statistics; this has been a concern in various regulatory settings. The CPMP Points to Consider on Adjustment for Baseline Covariates (see Ref. [4]) takes a strong position against dynamic allocation, and the FDA guidance for industry E9 (see Ref. [11]) recommends that analyses be stratified by the factors used for dynamic randomization.
There has been extensive discussion of this controversy (see, for example, [20]). Although some argue that conventional analyses remain appropriate when dynamic randomization is used [2], others think that a direct link between the randomization and the methods of statistical analysis is needed to draw reliable conclusions from clinical trial data [5]. Permutation and re-randomization testing procedures have been used as a last resort to avoid ambiguity about the distributional properties of standard statistics following dynamic randomization, e.g., Flyer [15] and Hasegawa and Tango [28]. However, a permutation test may not be computationally practical even for moderate sample sizes, and a re-randomization test may not be reproducible because its result depends on the random number generator and the number of replications chosen. More importantly, neither test is directly linked to the alternative hypothesis, and there is no estimation procedure for the treatment effect that is intrinsically compatible with the testing procedure. Recently, for trials with continuous endpoints, Shao et al. [22] derived a valid t-type test based on the bootstrap method. Their results have been generalized to clinical trials with binary responses and event counts as primary endpoints [21], and to a large family of covariate-adaptive designs, including dynamic randomization, with tests under a linear model framework [17]. No valid procedure has yet been developed for clinical trials with time-to-event endpoints and dynamic randomization. In general, the treatment assignment mechanism of a dynamic randomization is measurable, or adaptive, with respect to the information available prior to the randomization. For censored survival data, Luo et al. [12] established, via a martingale approximation, the asymptotic normality of a class of statistics that are adaptive to accumulated information.
In this paper, we apply their general theory to study statistics arising from dynamically randomized clinical trials. In section 2, we describe the framework of a clinical trial with dynamic randomization and time-to-event endpoints. In section 3, we review the Mantel-Haenszel log rank test statistic [18] and its corresponding re-randomization test [15], and then introduce a marked point process (MPP) based statistic, following [12], for both treatment effect estimation and hypothesis testing. In section 4, we use simulation analyses to compare the performance of the proposed inference procedure with that of conventional procedures. In section 5, we illustrate the application of these procedures with a real cancer trial. Finally, section 6 concludes with some discussion.

Clinical trials with dynamic randomization

Suppose a clinical trial starts at calendar time $u_0 = 0$ and is completed before some fixed, sufficiently late calendar time. Subjects are accrued into the trial sequentially. Denote by $u_i$, $i \ge 1$, the calendar time at which the $i$-th subject arrives at random. For simplicity, we assume that $0 < u_1 < u_2 < u_3 < \dots$, i.e., no two subjects are enrolled at the same time, and that there are only two stratification factors; the results in this paper generalize easily to more general situations. Let $(G_i, H_i)$ be two discrete baseline variables of the $i$-th subject. For example, $G_i \in \{1, 2, \dots, L_G\}$ may represent the site at which the subject is accrued and $H_i \in \{1, 2, \dots, L_H\}$ the disease stage of the subject. Let $X_i \in \{0, 1\}$ be the assigned treatment group, whose assignment mechanism is specified below. Let $Y_i$ be the time the subject stays in the trial, so that $u_i + Y_i$ is the calendar time at which the subject leaves the trial, and let $\delta_i$ denote the outcome of subject $i$ at his or her departure. Thus, the trial data are the collection $\{(u_i, G_i, H_i, X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$. We adopt a conventional survival data model and assume that there are independent random variables $T_i$ and $C_i$ such that, given $X_i = x$, $T_i$ has hazard function $h_{x,1}$, $x = 0, 1$, and $C_i$ has censoring hazard function $h_0$. The outcome measures are derived as $Y_i = T_i \wedge C_i$ and $\delta_i = 1(T_i \le C_i)$.
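
As a minimal sketch of the data layout above, one subject's record $(u_i, G_i, H_i, X_i, Y_i, \delta_i)$ can be held in a small immutable type, with $Y_i$ and $\delta_i$ derived from latent $T_i$ and $C_i$. The names `SubjectRecord` and `make_record` are hypothetical illustrations, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectRecord:
    """One subject's data (u_i, G_i, H_i, X_i, Y_i, delta_i) from Section 2."""
    u: float    # calendar time of arrival (enrollment)
    g: int      # first baseline factor, e.g. site
    h: int      # second baseline factor, e.g. disease stage
    x: int      # assigned treatment group: 0 = control, 1 = treatment
    y: float    # time on study, Y_i = min(T_i, C_i)
    delta: int  # 1 if the event is observed (T_i <= C_i), 0 if censored

    @property
    def exit_time(self) -> float:
        """Calendar time u_i + Y_i at which the subject leaves the trial."""
        return self.u + self.y


def make_record(u: float, g: int, h: int, x: int, t: float, c: float) -> SubjectRecord:
    """Build an observed record from a latent failure time t and censoring time c."""
    return SubjectRecord(u=u, g=g, h=h, x=x, y=min(t, c), delta=int(t <= c))


if __name__ == "__main__":
    event = make_record(u=1.5, g=3, h=2, x=1, t=4.0, c=7.2)
    censored = make_record(u=0.5, g=1, h=1, x=0, t=9.0, c=7.5)
    print(event.y, event.delta, event.exit_time)           # 4.0 1 5.5
    print(censored.y, censored.delta, censored.exit_time)  # 7.5 0 8.0
```

Only $Y_i$ and $\delta_i$ are stored, because the latent $T_i$ and $C_i$ are not both observable in a real trial.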

Dynamic randomization

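At any given calendar time $t > 0$, we record the information available at $t$ as follows. The number of subjects enrolled up to time $t$ is
$$N(t) = \sum_{i \ge 1} 1(u_i \le t),$$
and the numbers of subjects by treatment group $x = 0, 1$ are
$$N_x(t) = \sum_{i \ge 1} 1(u_i \le t,\ X_i = x).$$
The numbers of subjects by treatment group and baseline factors are
$$N_x(t; g, h) = \sum_{i \ge 1} 1(u_i \le t,\ X_i = x,\ G_i = g,\ H_i = h),$$
and the corresponding marginal counts by site $G$ and stage $H$ are
$$N_x^G(t; g) = \sum_{h} N_x(t; g, h) \quad \text{and} \quad N_x^H(t; h) = \sum_{g} N_x(t; g, h).$$
As discussed in Ref. [25], there are many ways to conduct a dynamic randomization. In a simple minimization procedure (see the application example in section 5), for a new subject accrued at calendar time $t$ with baseline $G = g$ and $H = h$, we can let
$$X = \begin{cases} 1, & \Delta(t-; g, h) < 0, \\ 0, & \Delta(t-; g, h) > 0, \\ \xi, & \Delta(t-; g, h) = 0, \end{cases}$$
where $\Delta(t-; g, h) = [N_1^G(t-; g) - N_0^G(t-; g)] + [N_1^H(t-; h) - N_0^H(t-; h)]$ is the marginal imbalance among previously enrolled subjects and the Bernoulli distributed $\xi \sim b(1, 0.5)$ is an independent random variable. In Appendix 1, we describe another, more general and commonly used approach based on balance scores, which we use for the simulation examples of Section 4.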
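
A minimal sketch of simple minimization follows. It assumes a particular rule: each new subject goes to the arm that is under-represented in the summed site and stage margins among previously enrolled subjects, and ties are broken by a fair coin $\xi$. The function name `minimization_assign` is hypothetical, and the balance-score method of Appendix 1 is not reproduced here.

```python
import random
from collections import defaultdict


def minimization_assign(baselines, seed=None):
    """Assign treatments sequentially by simple marginal minimization.

    baselines: list of (g, h) baseline levels in order of arrival.
    Returns the treatment codes X_i in {0, 1}, in the same order.
    Assumed rule: assign the arm that is behind in the summed site and
    stage margins of the new subject; break ties with a fair coin.
    """
    rng = random.Random(seed)
    # site[x][g], stage[x][h]: marginal counts among previously enrolled subjects
    site = [defaultdict(int), defaultdict(int)]
    stage = [defaultdict(int), defaultdict(int)]
    codes = []
    for g, h in baselines:
        # Imbalance (treatment minus control) at the new subject's levels.
        imbalance = (site[1][g] - site[0][g]) + (stage[1][h] - stage[0][h])
        if imbalance < 0:
            x = 1                  # treatment is behind: assign treatment
        elif imbalance > 0:
            x = 0                  # control is behind: assign control
        else:
            x = rng.randint(0, 1)  # tie: xi ~ Bernoulli(0.5)
        site[x][g] += 1
        stage[x][h] += 1
        codes.append(x)
    return codes


if __name__ == "__main__":
    rng = random.Random(1)
    arrivals = [(rng.randint(1, 8), rng.randint(1, 4)) for _ in range(40)]
    codes = minimization_assign(arrivals, seed=2)
    print("treated:", sum(codes), "of", len(codes))
```

The counts are updated only after each assignment, so the rule for subject $i$ uses information available strictly before $u_i$. This matches the measurability property discussed in the introduction.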

Statistical inference with dynamic randomization

In this section, we briefly describe the conventional Mantel-Haenszel log rank test and the commonly used re-randomization test, and then provide details of the proposed MPP based procedure. We use the terminology of Section 2 and assume that the clinical trial data consist of the collection $\{(u_i, G_i, H_i, X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$.

Mantel-Haenszel log rank test

The naive log rank test ignores the dynamic randomization process and uses only the information-time data $\{(X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$. Let $J$ be the number of events and $Y_{(1)} < Y_{(2)} < \dots < Y_{(J)}$ the ordered event times. Denote by $r_{x,j}$ the number of subjects at risk in treatment group $x$ just prior to information time $Y_{(j)}$, and by $m_{x,j}$ the number of subjects in treatment group $x$ who had events at time $Y_{(j)}$. Denote $m_j = m_{1,j} + m_{0,j}$ and $r_j = r_{1,j} + r_{0,j}$, and let
$$e_j = \frac{m_j\, r_{1,j}}{r_j}, \qquad v_j = \frac{m_j\, r_{1,j}\, r_{0,j}\, (r_j - m_j)}{r_j^2\, (r_j - 1)}.$$
The Mantel-Haenszel log rank test statistic can be calculated as
$$Z = \frac{\sum_{j=1}^{J} (m_{1,j} - e_j)}{\sqrt{\sum_{j=1}^{J} v_j}}, \qquad (1)$$
which approximately follows the standard normal distribution. The rationale is that the outcomes of the $r_{1,j} + r_{0,j}$ subjects at risk are independent and $m_{1,j}$ follows a hypergeometric distribution with mean $e_j$ and variance $v_j$ (see Ref. [3]). Note, however, that with dynamic randomization the $r_{1,j} + r_{0,j}$ subjects at risk at any given time $Y_{(j)}$ may not be statistically independent, which raises questions about the validity of this simple normal approximation.
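
The statistic (1) can be sketched in a few lines of Python. It loops over the distinct event times, builds the risk sets and event counts by group, and accumulates $m_{1,j} - e_j$ and $v_j$. The name `logrank_z` is hypothetical. The sign convention (negative $Z$ means fewer events than expected in group 1) follows the form of (1) as written here.

```python
import math


def logrank_z(x, y, delta):
    """Mantel-Haenszel log rank statistic Z = sum(m1 - e) / sqrt(sum v).

    x: treatment codes (0/1); y: observed times; delta: event indicators.
    Negative values mean fewer events than expected in group 1.
    """
    event_times = sorted({yi for yi, di in zip(y, delta) if di == 1})
    numerator, variance = 0.0, 0.0
    for t in event_times:
        # Risk sets just prior to t and events at t, by treatment group.
        r1 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi >= t)
        r0 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi >= t)
        m1 = sum(1 for xi, yi, di in zip(x, y, delta) if xi == 1 and yi == t and di == 1)
        m0 = sum(1 for xi, yi, di in zip(x, y, delta) if xi == 0 and yi == t and di == 1)
        r, m = r1 + r0, m1 + m0
        e = m * r1 / r                                   # hypergeometric mean e_j
        v = m * r1 * r0 * (r - m) / (r * r * (r - 1)) if r > 1 else 0.0  # v_j
        numerator += m1 - e
        variance += v
    if variance == 0.0:
        raise ValueError("log rank variance is zero; Z is undefined")
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    print(round(logrank_z([1, 1, 0, 0], [1, 2, 3, 4], [1, 1, 1, 1]), 4))  # 1.6977
```

The quadratic loop keeps the correspondence with $r_{x,j}$, $m_{x,j}$, $e_j$ and $v_j$ explicit. A sorted sweep would be faster for large trials.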

Re-randomized test

A re-randomization test is performed as follows. Using the trial data $\{(u_i, G_i, H_i, X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$ with the arrival times and subject covariates, we first calculate the observed log rank statistic $Z^{(0)} = Z$ from (1). Then, for each $\nu = 1, 2, \dots, N$, where $N$ is a large number such as 10,000, we scramble the arrival times $u_i$, $i \ge 1$, and apply the dynamic randomization scheme described in Section 2 to the pretreatment variables $\{(G_i, H_i), i = 1, 2, 3, \dots\}$ to generate new treatment codes $X_i^{(\nu)}$. Based on the re-randomized data $\{(X_i^{(\nu)}, Y_i, \delta_i), i = 1, 2, 3, \dots\}$, we then calculate the "log rank" statistic $Z^{(\nu)}$. The null hypothesis is tested by comparing the observed $Z^{(0)}$ with the empirical distribution of $Z^{(\nu)}$, $1 \le \nu \le N$. See Ref. [15] for details.
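
A minimal sketch of the re-randomization loop follows. The randomization rule and the test statistic are passed in as functions, so any scheme from Section 2 and any statistic such as (1) can be plugged in. The name `rerandomization_pvalue` is hypothetical. The lower-tail rejection region (beneficial treatment gives small $Z$) and the "+1" p-value convention, which counts the observed data as one draw, are assumptions rather than details from Ref. [15].

```python
import random


def rerandomization_pvalue(baselines, x_obs, y, delta, assign, statistic,
                           n_rerand=10_000, seed=None):
    """One-sided (lower-tail) re-randomization p-value.

    baselines: (g, h) pairs in observed arrival order; x_obs, y, delta are
    aligned with them. assign(list_of_baselines) -> treatment codes is the
    dynamic randomization rule; statistic(x, y, delta) -> float, e.g. a log rank Z.
    """
    rng = random.Random(seed)
    z_obs = statistic(x_obs, y, delta)
    n = len(baselines)
    hits = 0
    for _ in range(n_rerand):
        order = list(range(n))
        rng.shuffle(order)                             # scramble the arrival order
        codes = assign([baselines[i] for i in order])  # rerun the randomization
        x_new = [0] * n
        for position, subject in enumerate(order):     # map codes back to subjects
            x_new[subject] = codes[position]
        # Outcomes (y, delta) stay attached to their subjects.
        if statistic(x_new, y, delta) <= z_obs:
            hits += 1
    return (hits + 1) / (n_rerand + 1)
```

Because the outcomes are held fixed and only the assignment is regenerated, the reference distribution reflects the randomization scheme under the null hypothesis. The result depends on the seed and on `n_rerand`, which is the reproducibility issue noted in the introduction.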

MPP based log rank test

For the proposed procedure, we use the same trial data $\{(u_i, G_i, H_i, X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$. We first define a counting measure $p(\cdot)$ on the combined space such that, for any set of event times $A \subset [0, \infty)$, set of entry times $B \subset [0, \infty)$, set of covariates $C \subset R_1$, and set of outcomes $D \subset R_2$,
$$p(A \times B \times C \times D) = \sum_{i \ge 1} 1\{Y_i \in A,\ u_i \in B,\ (G_i, H_i, X_i) \in C,\ \delta_i \in D\},$$
where the covariate space is $R_1 = \{1, \dots, L_G\} \times \{1, \dots, L_H\} \times \{0, 1\}$, with 0 and 1 in the last component referring to the control and treatment groups. We also denote the outcome space by $R_2 = \{0, 1\}$, where 0 and 1 refer to a censored outcome and an event, respectively. The counting measure $p(\cdot)$ contains all the information in the trial data, and most useful statistics can be written as integrals with respect to it. To avoid distraction by technical details, we state the main results in this section and leave most details to the appendix.

For any treatment group $x = 0, 1$, we use analogs of $r_{x,j}$, $r_{1,j} + r_{0,j}$ and $e_j$ from the M-H log rank test. To make the procedure as general as the weighted log rank tests [23], [7] and [12], we consider any (random) weight function $k(t, u, w)$ that converges to $k(u, w)$ uniformly in probability, where $k(t, u, w)$ is measurable for each $t$ and has uniformly bounded variation in $w$. In these analogs, the variable $w$ corresponds to the event time $Y_{(j)}$, and the extra variable $t$ indicates that only the information up to calendar time $t$ is used in the calculation; it is easy to see that they are all measurable. Note that, when $k(t, u, w)$ does not depend on $u$, the resulting weighted quantity does not depend on $k$. From these quantities, we form a statistic $U(t)$ that is an analog of the usual weighted log rank statistic at calendar time $t$. With the counting measure $p(\cdot)$, $U(t)$ can be written as an integral with respect to $p$, and so can its variance estimator. Under the assumption of proportional hazards, we assume that $h_{1,1}(w) = r\,h_{0,1}(w)$ for some constant $r > 0$ and all $w \ge 0$. Based on weighted cumulative event counts, we solve the estimating equation (7) (see Appendix 2 for details) for the estimator $\hat{r}$ of the hazard ratio $r$, and we estimate the variance of $\hat{r}$ by (8). Finally, for testing the hypothesis $h_{0,1}(\cdot) = h_{1,1}(\cdot)$, the statistic (9) converges to $N(0, 1)$ as $n \to \infty$ for any sufficiently large $t > 0$.
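
To make the counting measure concrete, the sketch below represents each set ($A$, $B$, $C$, $D$) by a membership predicate. It counts the subjects whose event time, entry time, covariates and outcome fall in those sets. Because $p$ is a sum of point masses, integrating a function against $p$ reduces to a sum over subjects. The names `Obs`, `p_measure` and `integrate` are hypothetical. The specific integrands that define $U(t)$ and (7)-(9) are not reproduced here.

```python
from typing import Callable, Iterable, NamedTuple


class Obs(NamedTuple):
    u: float    # entry (calendar) time u_i
    g: int      # site G_i
    h: int      # stage H_i
    x: int      # treatment X_i
    y: float    # event/censoring time Y_i
    delta: int  # outcome delta_i (1 = event, 0 = censored)


def p_measure(data: Iterable[Obs],
              A: Callable[[float], bool], B: Callable[[float], bool],
              C: Callable[[tuple], bool], D: Callable[[int], bool]) -> int:
    """p(A x B x C x D): number of subjects with Y_i in A, u_i in B,
    (G_i, H_i, X_i) in C and delta_i in D; sets are membership predicates."""
    return sum(1 for o in data
               if A(o.y) and B(o.u) and C((o.g, o.h, o.x)) and D(o.delta))


def integrate(data: Iterable[Obs], f: Callable[[Obs], float]) -> float:
    """Integral of f with respect to p: the sum of f over the atoms (subjects)."""
    return sum(f(o) for o in data)


if __name__ == "__main__":
    data = [Obs(0.5, 1, 1, 1, 1.2, 1), Obs(1.0, 2, 1, 0, 3.0, 0),
            Obs(2.0, 1, 2, 1, 0.8, 1), Obs(6.0, 1, 1, 1, 0.5, 1)]
    # Treatment-group events with event time <= 2 among subjects entered by time 5.
    print(p_measure(data, lambda w: w <= 2, lambda u: u <= 5,
                    lambda c: c[2] == 1, lambda d: d == 1))  # 2
```

The uppercase parameter names mirror the notation $A$, $B$, $C$, $D$ used in the text.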

Simulations

In this section, we use simulation analyses to evaluate the performance of the procedures described in section 3. We first describe how each data component is simulated.

Subject Arrival: We use a Poisson process to model subject arrival. We generate independent random numbers $A_k$, $k = 1, 2, \dots, N$, from an exponential distribution with rate $\lambda > 0$, and let $u_i = \sum_{k=1}^{i} A_k$ be the time at which the $i$-th subject arrives. For each $i$, we generate $G_i$ from a distribution with $P(G_i = g) = p_g$ for some positive constants $p_g$, $1 \le g \le L_G$, with $\sum_g p_g = 1$. We also generate $H_i$ from another distribution with $P(H_i = h) = q_h$ for some positive constants $q_h$, $1 \le h \le L_H$, with $\sum_h q_h = 1$. In the examples below, we assume $N = 250$ subjects, $\lambda = 8$, $L_G = 8$ with $(p_1, \dots, p_8) = (1/14, 1/14, 1/14, 2/14, 2/14, 2/14, 2/14, 3/14)$, and $L_H = 4$ with $(q_1, \dots, q_4) = (1/6, 1/6, 2/6, 2/6)$.

Treatment Assignment: After generating all subject arrival and baseline information ($G_i$ and $H_i$), we use the dynamic randomization algorithm to generate the treatment groups $X_i$, $i \ge 1$. In the examples below, we assume $\rho = 0.5$ for 2:1 randomization, $a = b = c = 1$, and $\theta = 0.9$, as specified in Appendix 1.

Failure Times: After generating all treatment assignments, we generate the failure times $T_i$, $i \ge 1$. Given $X_i$, $T_i$ is generated from an exponential distribution with rate $\lambda_0 \exp\{\alpha_G G_i + \alpha_H H_i + \beta X_i\}$, with baseline hazard $\lambda_0 > 0$, treatment effect $\beta$, and prognostic factor effects $\alpha_G$ and $\alpha_H$ of $G$ and $H$. Here, $\beta < 0$ corresponds to a positive (beneficial) treatment effect, and non-zero $\alpha_G$ and $\alpha_H$ correspond to heterogeneous populations. In the examples below, we assume the baseline hazard $\lambda_0 = 0.15$ and subgroup effects $\alpha_G = 0.01$ and $\alpha_H = 0.01$; the values of the treatment effect $\beta$ are specified later.

Censoring Times: The censoring time $C_i$ is generated independently from the uniform distribution on $[c_1, c_2]$ for some constants $c_2 > c_1 > 0$. In the examples below, we assume $c_1 = 7$ and $c_2 = 8$. Taking $Y_i = \min\{T_i, C_i\}$ and $\delta_i = 1(T_i \le C_i)$ completes the simulation of the data $\{(u_i, G_i, H_i, X_i, Y_i, \delta_i), i = 1, 2, 3, \dots\}$.

For each simulated data set, the estimate of the hazard ratio between the treatment (1) and control (0) groups and its variance are calculated from (7) and (8). The test statistics (9) and (1) are calculated along with their corresponding p-values $p_1$ and $p_2$, respectively. Here, we take $k = 1$. In addition, the re-randomization test of the null hypothesis of no treatment difference is performed as described in section 3 for both (9) and (1); the resulting p-values are denoted by $p_{1,r}$ and $p_{2,r}$, respectively. For the re-randomization tests, we use $N = 10{,}000$ re-randomizations. The simulation is repeated 10,000 times, and for each procedure $\ell \in \{1, (1,r), 2, (2,r)\}$ the calculated p-values are denoted by $p_\ell^{(s)}$, $s = 1, \dots, 10{,}000$. With $\alpha = 0.025$, the empirical power is calculated as
$$B_\ell = \frac{1}{10{,}000} \sum_{s=1}^{10{,}000} 1\{p_\ell^{(s)} \le \alpha\}. \qquad (4)$$
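
The data-generating steps above can be sketched with Python's standard `random` module. Defaults follow the stated settings ($N = 250$, $\lambda = 8$, the listed $p_g$ and $q_h$, $\lambda_0 = 0.15$, $\alpha_G = \alpha_H = 0.01$, $c_1 = 7$, $c_2 = 8$). The function name `simulate_trial` and its `assign` hook are hypothetical. When no rule is supplied, simple 1:1 randomization stands in for the balance-score algorithm of Appendix 1, which is not reproduced here.

```python
import math
import random


def simulate_trial(n=250, arrival_rate=8.0,
                   p=(1/14, 1/14, 1/14, 2/14, 2/14, 2/14, 2/14, 3/14),
                   q=(1/6, 1/6, 2/6, 2/6),
                   lam0=0.15, alpha_g=0.01, alpha_h=0.01, beta=0.0,
                   c1=7.0, c2=8.0, assign=None, seed=None):
    """Simulate {(u_i, G_i, H_i, X_i, Y_i, delta_i)} following the setup above.

    assign: optional rule mapping [(G_i, H_i)] in arrival order to codes X_i.
    If None, simple 1:1 randomization is used as a stand-in (an assumption).
    """
    rng = random.Random(seed)
    # Subject arrival: Poisson process, u_i = A_1 + ... + A_i, A_k ~ Exp(rate).
    u, clock = [], 0.0
    for _ in range(n):
        clock += rng.expovariate(arrival_rate)
        u.append(clock)
    # Baseline factors with P(G = g) = p_g and P(H = h) = q_h.
    g = rng.choices(range(1, len(p) + 1), weights=p, k=n)
    h = rng.choices(range(1, len(q) + 1), weights=q, k=n)
    # Treatment assignment.
    if assign is None:
        x = [rng.randint(0, 1) for _ in range(n)]
    else:
        x = assign(list(zip(g, h)))
    # Failure times: exponential with rate lam0 * exp(alpha_g*G + alpha_h*H + beta*X).
    t = [rng.expovariate(lam0 * math.exp(alpha_g * gi + alpha_h * hi + beta * xi))
         for gi, hi, xi in zip(g, h, x)]
    # Censoring times: Uniform[c1, c2], independent of everything else.
    c = [rng.uniform(c1, c2) for _ in range(n)]
    y = [min(ti, ci) for ti, ci in zip(t, c)]
    delta = [int(ti <= ci) for ti, ci in zip(t, c)]
    return u, g, h, x, y, delta


if __name__ == "__main__":
    events = [sum(simulate_trial(seed=s)[5]) for s in range(200)]
    print("mean events:", sum(events) / len(events))
```

With $\beta = 0$ and these defaults, the expected event proportion is roughly $1 - \exp(-0.163 \times 7.5) \approx 0.70$. That is about 176 events among 250 subjects, in line with the mean number of events reported for the null simulations.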

Simulations under null hypothesis: β = 0

In this section, we assume there is no treatment effect in the failure time model, i.e., $\beta = 0$, and evaluate the performance of the four testing procedures. Fig. 1 shows histograms of the 10,000 p-values from each of the four tests. They reasonably resemble uniform distributions on $[0, 1]$, and goodness of fit to the uniform distribution was also assessed with one-sample Kolmogorov-Smirnov tests. Correspondingly, the empirical powers based on (4), which in this case are the actual one-sided type I error rates, are 0.0258, 0.0279, 0.0289 and 0.0286 for the MPP based test, its re-randomized version, the log rank test and its re-randomized version, respectively. The mean number of events across the 10,000 data sets is 176. Overall, all four test procedures appear acceptable in maintaining the designed type I error rate, and the MPP based test controls the type I error rate slightly better than both the Mantel-Haenszel test and its re-randomized version.
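
The summaries used here can be sketched with the standard library. The first is the rejection proportion (4). The second is a one-sample Kolmogorov-Smirnov test of uniformity for the p-values, whose p-value comes from the asymptotic Kolmogorov distribution and so is an approximation. The third is the binomial Monte Carlo standard error of an estimated rejection rate. The function names are hypothetical, and the p-values from the paper's simulations are not reproduced.

```python
import math


def empirical_power(pvalues, alpha=0.025):
    """Rejection proportion (4): share of replicates with p-value <= alpha."""
    return sum(1 for pv in pvalues if pv <= alpha) / len(pvalues)


def monte_carlo_se(rate, n_reps):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_reps)


def ks_uniform(pvalues, terms=100):
    """One-sample Kolmogorov-Smirnov test of Uniform(0, 1) for p-values.

    Returns (D_n, p), with p from the asymptotic Kolmogorov distribution
    Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2), lam = sqrt(n) * D_n.
    """
    xs = sorted(pvalues)
    n = len(xs)
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(xs))
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return d, 1.0  # Q(lam) is numerically 1 here; the series converges slowly
    q = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, terms + 1))
    return d, min(max(q, 0.0), 1.0)


if __name__ == "__main__":
    print(round(monte_carlo_se(0.025, 10_000), 5))  # 0.00156
```

For example, with 10,000 replicates and a true rejection rate of 0.025, the Monte Carlo standard error is about 0.0016. This scale is useful when comparing the empirical type I error rates against 0.025. Re-randomization p-values are discrete, so the KS p-value for them is only approximate.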

Fig. 1

P-value histograms under the null hypothesis. Upper panel: MPP based procedure and its re-randomized version; lower panel: log rank based procedure and its re-randomized version.

Simulations under alternative hypothesis: β ≠ 0

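Suppose there is a positive (beneficial) treatment effect in the failure time model, i.e., $\beta < 0$. Table 1 shows the empirical power based on (4), together with the hazard ratio estimates and their variances, for $\beta = -0.25$, $-0.5$ and $-0.65$, with $\beta = 0$ included for reference.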

Table 1

Empirical power and point estimate.

β	0	−0.25	−0.5	−0.65
exp{β}	1.0	0.779	0.607	0.522
B₁ (MPP)	0.0258	0.1851	0.7003	0.904
B₁,r (MPP, re-randomized)	0.0279	0.1898	0.7048	0.904
B₂ (log rank)	0.0289	0.2423	0.7623	0.93
B₂,r (log rank, re-randomized)	0.0286	0.24	0.76	0.93
HR estimate	1.0002	0.7793	0.604	0.52
HR variance	0.027	0.018	0.013	0.01

Overall, the power of the four testing procedures appears comparable. Consistent with the slightly inflated type I error rate of the naive Mantel-Haenszel test, the MPP based test shows slightly lower power for detecting a nonzero treatment effect. In addition, the hazard ratio estimates based on (7) are close to the true parameters, and their variances based on (8) are reasonable. Note that in these simulations we chose a large N = 10,000 for reliability, which, however, leads to extensive computing time. Limited additional simulations with smaller sample sizes, different censoring time intervals, and differentiated subgroup effects did not change these findings.

Application

In this section, we illustrate the above inference procedures through their application to a breast cancer trial conducted by the NCIC Clinical Trials Group [19]. The trial used the minimization procedure to randomize 305 subjects with the stratification factors prior cytotoxic treatment, registration to MA.16, presence of visceral disease, and study site. As a result of the dynamic randomization, 153 subjects were assigned to the DPPE/DOX group and 152 to the DOX group, and the treatment groups were well balanced with respect to the stratification factors included in the randomization. Survival was a secondary endpoint of this trial. At the end of the study, 67 of 153 subjects had died in the DPPE/DOX group, compared with 91 of 152 in the DOX group. Based on ordinary Cox regression, the hazard ratio estimate for survival was 0.656 with a standard error of 0.161. The hazard ratio estimate based on (7) with k = 1 was 0.657 with a variance estimate of 0.011, so the two point estimates are close. For the comparison of overall survival between the two groups, the p-values from the Mantel-Haenszel log rank test and its re-randomization test are 0.0085 and 0.0079, respectively. Based on the proposed procedure of section 3.2, the p-value is 0.0093, and that of its re-randomized version is 0.0079. All of them suggest favorable overall survival in the DPPE/DOX treatment arm.
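
As a rough consistency check, not part of the original analysis, the reported Cox standard error can be put on the hazard-ratio scale and compared with the MPP variance estimate. This assumes the standard error 0.161 refers to the log hazard ratio, as Cox regression software usually reports. Under that assumption, the delta method gives $\mathrm{Var}(\widehat{HR}) \approx (\widehat{HR} \cdot SE)^2$.

```python
# Reported values from the breast cancer application above.
hr_cox, se_cox = 0.656, 0.161   # ordinary Cox regression
hr_mpp, var_mpp = 0.657, 0.011  # MPP estimate from (7) and its variance

# Assumption: se_cox is the standard error of log(HR). The delta method then
# gives Var(HR) ~ (HR * SE)^2 on the hazard-ratio scale.
var_cox_hr = (hr_cox * se_cox) ** 2

print(f"Cox delta-method Var(HR): {var_cox_hr:.4f}")  # 0.0112
print(f"MPP variance estimate:    {var_mpp:.4f}")     # 0.0110
```

Under this assumption, the two variance estimates are of the same size, just as the two point estimates are.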

Discussions

When implemented properly, dynamic randomization can effectively balance many prognostic factors in controlled clinical trials [25]. The FDA guidance (E9) recommends adjusting the analysis for the factors used in the dynamic randomization, but no specific methodology has been proposed. There have been substantial questions and challenges regarding the validity of conventional statistics that do not adjust for the randomization process (see Ref. [20]), which may be part of the reason the CPMP guidance discourages the use of such procedures. One of the FDA advisory meetings discussed study AGLU02704 (LOTS), which used a dynamic randomization scheme to assign patients with Pompe disease to the experimental “2000L” and placebo groups and compared the rate of change in 6-min walk distance (see Ref. [14]). The sponsor assessed the significance of the difference without accounting for the dynamic randomization and obtained a p-value of 0.035, whereas the FDA statistician obtained a p-value of 0.06 based on a re-randomization test. The intensity of the discussion around this topic highlights the potentially serious impact of this controversy. In this paper, we introduce a statistic that aligns with both the stratification factors and the randomization process in clinical trials with time-to-event endpoints. It provides an estimate of the treatment effect that accounts for the randomization factors and process, which can be used to test whether the effect is zero. In addition, the procedure has a closed form in terms of the observed data and is thus easy to compute. We provide a theoretical justification of its asymptotic distribution and use simulation analyses to show its adequacy with moderate sample sizes. However, we do not intend to compare the efficiency of the proposed procedure with that of common procedures such as the log rank test, since the proposed procedure is not expected to offer a nominal improvement.
We believe, as many simulation studies have demonstrated, that the log rank test is generally adequate in controlling the type I error and maintaining power, although regulatory agencies have raised the concern that it does not reflect the dynamic randomization process and that its validity is unknown. Our proposed testing procedure accommodates the dynamic randomization process and can be used in place of the log rank test when dynamic randomization is a concern. Because the proposed procedure accounts for additional variation, its nominal power is expected to be slightly lower than that of the log rank test. There is considerable similarity between the modeling framework used here and the covariate-adjusted response-adaptive (CARA) design studied by Hu and Rosenberger [9], Zhang et al. [27], and Hu, Zhang and He [10]. The CARA framework focuses on target allocations with endpoints observed without delay. In contrast, the model discussed in this paper deals with survival endpoints and with dynamic randomization on high-dimensional, potentially sparse stratification factors, which may not fit well into the CARA framework. In principle, the CARA framework can be seen as a specific discrete case of the adaptive design framework described by Luo et al. [12]. Finally, concerns have been raised about whether conventional procedures such as the standard log rank test remain valid under complicated dynamic randomization. A similar theoretical justification, modified via a conditional argument, may provide a similar asymptotic result for them. However, such a conditional argument may not work under more general adaptive designs in which the treatment assignment depends on interim trial outcomes, such as the CARA model.

1. Peter Armitage. Fisher, Bradford Hill, and randomization. Int J Epidemiol. 2003-12.

2. Committee for Proprietary Medicinal Products (CPMP): points to consider on adjustment for baseline covariates. Stat Med. 2004-03-15.

3. A B Hill. The clinical trial. N Engl J Med. 1952-07-24.

4. P A Flyer. A comparison of conditional and unconditional randomization tests for highly stratified designs. Biometrics. 1998-12.

5. D R Taves. Minimization: a new method of assigning patients to treatment and control groups. Clin Pharmacol Ther. 1974-05.

6. M Zelen. The randomization and stratification of patients to clinical trials. J Chronic Dis. 1974-09.

7. N Mantel. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep. 1966-03.

8. Leonard Reyno; Lesley Seymour; Dongsheng Tu; Susan Dent; Karen Gelmon; Barbara Walley; Anna Pluzanska; Vera Gorbunova; Avgust Garin; Jacek Jassem; Tadeusz Pienkowski; Janet Dancey; Laura Pearce; Mary MacNeil; Susan Marlin; David Lebwohl; Maurizio Voi; Kathleen Pritchard. Phase III study of N,N-diethyl-2-[4-(phenylmethyl) phenoxy]ethanamine (BMS-217380-01) combined with doxorubicin versus doxorubicin alone in metastatic/recurrent breast cancer: National Cancer Institute of Canada Clinical Trials Group Study MA.19. J Clin Oncol. 2004-01-15.

9. Xiaolong Luo; Gongjun Xu; Zhiliang Ying. Sequential Analysis of the Cox Model under Response Dependent Allocation. Stat Sin. 2013.

10. Jun Shao; Xinxin Yu. Validity of tests under covariate-adaptive biased coin randomization and generalized linear models. Biometrics. 2013-07-12.