Abstract
Non-proportional hazards have been observed in clinical trials. Under such conditions, the log-rank test loses power and the standard Cox model generally produces biased estimates. Weighted log-rank tests have been used to increase power; however, it is not intuitive how to interpret the test result in terms of the clinical effect. We propose a Cox-model-based time-varying treatment effect estimate to complement the weighted log-rank test. The score test from the proposed model is equivalent to the weighted log-rank test, and a time profile of the treatment effect can be obtained by fitting a Cox model with a time-varying covariate. Simulation results show that the proposed model preserves the type I error and achieves higher power than the log-rank test under non-proportional hazards scenarios. Whereas the standard Cox model produces biased effect estimates, the proposed model produces unbiased estimates if the weight function is correctly specified. It also achieves a better model fit and greater flexibility to accommodate non-proportional hazards than the standard Cox model. The proposed approach makes the assumptions of the weighted log-rank test explicit, and the validity of those assumptions can be assessed from prior knowledge or model goodness-of-fit. It also helps to translate weighted log-rank test results into quantitative estimates of the treatment effect with an intuitive interpretation. The proposed method can be routinely conducted to complement weighted log-rank tests, especially in settings where non-proportional hazards are expected.
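To make the weighted log-rank statistic mentioned in the abstract concrete, here is a minimal sketch of the usual form Z = Σ w(t)(O − E) / sqrt(Σ w(t)² V). The function name, the toy data, and the step weight are illustrative assumptions and are not taken from the paper.

```python
import math

def weighted_logrank(times, events, groups, weight=lambda t: 1.0):
    """Return (Z, two-sided p-value) for a weighted log-rank test.

    times  : follow-up times
    events : 1 = event observed, 0 = censored
    groups : 1 = treatment, 0 = control
    weight : pre-specified w(t); w(t) = 1 gives the ordinary log-rank test
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    num, var = 0.0, 0.0
    for t in event_times:
        # Numbers at risk just before t (overall and in the treatment arm)
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        # Events at t (overall and in the treatment arm)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        w = weight(t)
        num += w * (d1 - d * n1 / n)          # weighted observed minus expected
        if n > 1:                             # hypergeometric variance term
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    z = num / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    return z, p

# Toy data, made up for illustration: 10 subjects, 5 per arm.
times  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
groups = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

z_lr, p_lr = weighted_logrank(times, events, groups)
# Hypothetical step weight that down-weights the first 3 time units
z_w, p_w = weighted_logrank(times, events, groups,
                            weight=lambda t: 0.27 if t <= 3 else 1.0)
```

A negative Z here indicates fewer events in the treatment arm than expected under the null. Multiplying the weight by a constant leaves Z unchanged, so only the shape of w(t) matters.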
Year: 2017 PMID: 29696204 PMCID: PMC5898500 DOI: 10.1016/j.conctc.2017.09.004
Source DB: PubMed Journal: Contemp Clin Trials Commun ISSN: 2451-8654
Fig. 1. Examples of weight functions, the corresponding effect adjustment factor A(t), and the hazard ratio. The survival curves are represented by dotted gray curves on the weight function panels.
Setting 1 (Delayed treatment effect): characteristics of the log-rank test and the standard Cox model vs. the weighted log-rank test and the proposed model vs. the Yang-Prentice model (10,000 simulations per scenario).
| Hypothesis | τ (months) | Log-rank/Cox: Power | Log-rank/Cox: HR estimate | Log-rank/Cox: SE | Log-rank/Cox: 95% CI coverage | Log-rank/Cox: Non-PH | Weighted/Proposed: Power | Weighted/Proposed: HR estimate | Weighted/Proposed: SE | Weighted/Proposed: 95% CI coverage | Weighted/Proposed: Non-PH | Yang-Prentice: Power | Yang-Prentice: HR1 | Yang-Prentice: SE | Yang-Prentice: HR2 | Yang-Prentice: SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternative | 0 | 0.90 | 0.68 | 0.12 | 0.95 | 0.05 | 0.90 | 0.68 | 0.12 | 0.95 | 0.05 | 0.91 | 0.71 | 0.49 | 1.06 | 1.33 |
| Alternative | 1 | 0.82 | 0.71 | 0.12 | 0.94 | 0.08 | 0.85 | 0.68 | 0.13 | 0.95 | 0.02 | 0.83 | 0.82 | 0.47 | 0.85 | 1.18 |
| Alternative | 2 | 0.73 | 0.73 | 0.12 | 0.90 | 0.13 | 0.79 | 0.68 | 0.14 | 0.95 | 0.01 | 0.74 | 0.91 | 0.48 | 0.78 | 1.13 |
| Alternative | 3 | 0.62 | 0.76 | 0.12 | 0.84 | 0.16 | 0.73 | 0.68 | 0.15 | 0.95 | 0.01 | 0.66 | 0.96 | 0.46 | 0.75 | 1.04 |
| Alternative | 4 | 0.54 | 0.78 | 0.12 | 0.78 | 0.16 | 0.65 | 0.68 | 0.17 | 0.95 | 0.00 | 0.57 | 0.99 | 0.46 | 0.75 | 0.99 |
| Alternative | 5 | 0.45 | 0.80 | 0.12 | 0.71 | 0.16 | 0.58 | 0.68 | 0.18 | 0.95 | 0.00 | 0.48 | 0.99 | 0.43 | 0.76 | 0.90 |
| Alternative | 6 | 0.37 | 0.82 | 0.12 | 0.64 | 0.14 | 0.49 | 0.68 | 0.20 | 0.95 | 0.01 | 0.40 | 1.00 | 0.45 | 0.80 | 0.88 |
| Null | 0 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.06 | 0.98 | 0.26 | 1.23 | 0.74 |
| Null | 1 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.13 | 0.95 | 0.02 | 0.06 | 0.98 | 0.29 | 1.21 | 0.73 |
| Null | 2 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.14 | 0.95 | 0.01 | 0.06 | 0.97 | 0.26 | 1.23 | 0.74 |
| Null | 3 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.15 | 0.95 | 0.00 | 0.06 | 0.97 | 0.27 | 1.22 | 0.72 |
| Null | 4 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.17 | 0.95 | 0.00 | 0.06 | 0.97 | 0.27 | 1.23 | 0.74 |
| Null | 5 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.18 | 0.95 | 0.00 | 0.05 | 0.98 | 0.25 | 1.22 | 0.72 |
| Null | 6 | 0.05 | 1.00 | 0.12 | 0.95 | 0.05 | 0.05 | 1.00 | 0.20 | 0.95 | 0.01 | 0.06 | 0.98 | 0.26 | 1.23 | 0.76 |
HR1 and HR2 represent the short-term and the long-term effects in the Yang-Prentice model.
Empirical standard error of the estimate.
Non-PH: proportion of simulations in which the proportional hazards assumption is rejected at the 5% level.
Alternative hypothesis: treatment hazard ratio is 0.9 during the delay period τ and is 0.68 thereafter.
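The delayed-effect alternative described in this footnote can be simulated with a piecewise-exponential model. The sketch below is one way to draw event times by inverting the cumulative hazard. The baseline hazard of 0.1 per month and the function name are assumptions made for illustration; the excerpt does not state the simulation settings.

```python
import math
import random

def sample_delayed_effect_time(rng, base_hazard, tau, hr_early, hr_late, treated):
    """Draw one event time from a piecewise-exponential model.

    Control arm: constant hazard base_hazard.
    Treatment arm: base_hazard * hr_early on [0, tau), base_hazard * hr_late after.
    """
    h1 = base_hazard * (hr_early if treated else 1.0)  # hazard before tau
    h2 = base_hazard * (hr_late if treated else 1.0)   # hazard from tau onward
    # An Exp(1) draw equals the cumulative hazard at the event time.
    target = -math.log(1.0 - rng.random())
    if target < h1 * tau:
        return target / h1
    return tau + (target - h1 * tau) / h2

rng = random.Random(0)
# Assumed baseline hazard 0.1 per month; HR 0.9 for the first 3 months, 0.68 after.
treated_times = [sample_delayed_effect_time(rng, 0.1, 3.0, 0.9, 0.68, True)
                 for _ in range(5)]
control_times = [sample_delayed_effect_time(rng, 0.1, 3.0, 0.9, 0.68, False)
                 for _ in range(5)]
```

Administrative censoring and accrual are left out to keep the sketch short; a fuller simulation would add both before applying the tests.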
Fig. 2. Characteristics of the weighted log-rank test and proposed model when the prolonged effect τ is misspecified. The true τ is 3 months, and the true hazard ratio is 0.68. † Power and hazard ratio estimate from the Cox model are shown by the red squares on the left and the middle panels.
Fig. 3. Characteristics of the weighted log-rank test and proposed model when the adjustment factor A(t) during the delay period (i.e., the first 3 months) is misspecified. The true hazard ratio is 0.9 during the first 3 months and 0.68 afterwards. The correct A(t) is thus 0.27 and 1, respectively. † Power and hazard ratio estimate from the Cox model are shown by the red squares on the left and the middle panels.
Fig. 4. Characteristics of the weighted log-rank test and proposed model when both τ and A(t) are misspecified. The true τ is 1.5 months, and the correct A(t) is 0.27 during the first 1.5 months and 1 afterwards (represented by the gray dashed line). The model assumes τ is 3 months, with A(t) ranging from 0 to 1. † Power and hazard ratio estimate from the Cox model are shown by the red squares on the left and the middle panels.
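The value A(t) = 0.27 quoted in the captions can be checked with a short calculation. It assumes the proposed model has the form HR(t) = exp(β·A(t)), with exp(β) the full-effect hazard ratio. That form is consistent with the numbers in the captions but is not written out in this excerpt, and the function name is hypothetical.

```python
import math

def adjustment_factor(hr_t, hr_full):
    """A(t) = log HR(t) / log HR_full, assuming HR(t) = HR_full ** A(t)."""
    return math.log(hr_t) / math.log(hr_full)

a_early = adjustment_factor(0.9, 0.68)   # delay period
a_late = adjustment_factor(0.68, 0.68)   # after the delay
print(round(a_early, 2), round(a_late, 2))  # 0.27 1.0

# Going the other way: the full-effect HR raised to A(t) recovers HR(t).
hr_early = 0.68 ** a_early               # approximately 0.9
```

Under this reading, A(t) rescales the log hazard ratio, so A(t) = 0 means no effect and A(t) = 1 means the full effect.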
Setting 2 (Reduced long-term treatment effect): characteristics of the log-rank test and the standard Cox model vs. the weighted log-rank test and the proposed model vs. the Yang-Prentice model.
| Hypothesis | τ (months) | Log-rank/Cox: Power | Log-rank/Cox: HR estimate | Log-rank/Cox: SE | Log-rank/Cox: 95% CI coverage | Log-rank/Cox: Non-PH | Weighted/Proposed: Power | Weighted/Proposed: HR estimate | Weighted/Proposed: SE | Weighted/Proposed: 95% CI coverage | Weighted/Proposed: Non-PH | Yang-Prentice: Power | Yang-Prentice: HR1 | Yang-Prentice: SE | Yang-Prentice: HR2 | Yang-Prentice: SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alternative | 0 | 0.29 | 0.88 | 0.09 | 0.55 | 0.14 | 0.38 | 0.75 | 0.17 | 0.95 | 0.00 | 0.33 | 0.77 | 0.21 | 1.28 | 0.58 |
| Alternative | 2 | 0.38 | 0.86 | 0.09 | 0.64 | 0.17 | 0.50 | 0.75 | 0.15 | 0.95 | 0.00 | 0.44 | 0.74 | 0.22 | 1.34 | 0.67 |
| Alternative | 4 | 0.49 | 0.84 | 0.09 | 0.74 | 0.18 | 0.60 | 0.75 | 0.13 | 0.95 | 0.00 | 0.53 | 0.72 | 0.29 | 1.39 | 0.78 |
| Alternative | 6 | 0.58 | 0.83 | 0.09 | 0.81 | 0.18 | 0.68 | 0.75 | 0.12 | 0.95 | 0.00 | 0.63 | 0.72 | 0.30 | 1.42 | 0.92 |
| Alternative | 8 | 0.66 | 0.81 | 0.09 | 0.85 | 0.15 | 0.73 | 0.75 | 0.11 | 0.95 | 0.01 | 0.69 | 0.72 | 0.34 | 1.44 | 1.02 |
| Alternative | 10 | 0.71 | 0.80 | 0.09 | 0.89 | 0.13 | 0.77 | 0.75 | 0.11 | 0.95 | 0.01 | 0.75 | 0.74 | 0.41 | 1.44 | 1.14 |
| Alternative | 12 | 0.77 | 0.79 | 0.09 | 0.91 | 0.11 | 0.81 | 0.75 | 0.10 | 0.95 | 0.01 | 0.79 | 0.74 | 0.40 | 1.38 | 1.16 |
| Alternative | 24 | 0.89 | 0.75 | 0.09 | 0.95 | 0.05 | 0.89 | 0.75 | 0.09 | 0.94 | 0.05 | 0.91 | 0.78 | 0.43 | 1.01 | 1.06 |
| Null | 0 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.17 | 0.95 | 0.00 | 0.05 | 0.99 | 0.19 | 1.08 | 0.37 |
| Null | 2 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.14 | 0.95 | 0.00 | 0.05 | 0.99 | 0.18 | 1.07 | 0.35 |
| Null | 4 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.12 | 0.95 | 0.00 | 0.05 | 0.98 | 0.18 | 1.07 | 0.34 |
| Null | 6 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.11 | 0.95 | 0.00 | 0.05 | 0.99 | 0.19 | 1.07 | 0.34 |
| Null | 8 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.11 | 0.95 | 0.01 | 0.05 | 0.98 | 0.18 | 1.07 | 0.35 |
| Null | 10 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.10 | 0.95 | 0.02 | 0.05 | 0.98 | 0.18 | 1.07 | 0.34 |
| Null | 12 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.10 | 0.95 | 0.02 | 0.06 | 0.98 | 0.18 | 1.07 | 0.35 |
| Null | 24 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 1.00 | 0.09 | 0.95 | 0.05 | 0.05 | 0.98 | 0.18 | 1.07 | 0.35 |
HR1 and HR2 represent the short-term and the long-term effects in the Yang-Prentice model.
Empirical standard error of the estimate.
Non-PH: proportion of simulations in which the proportional hazards assumption is rejected at the 5% level.
Alternative hypothesis: the treatment hazard ratio is 0.75 initially; after a prolonged-effect duration τ, the effect is reduced in proportion to the proportion of patients who have discontinued treatment.
Fig. 5. Characteristics of the weighted log-rank test and proposed model when the prolonged effect τ is misspecified. The true τ is 8 months, and the true hazard ratio is 0.75.
Spectrum of models for effect estimation.
| Model | Magnitude of the effect | Shape of the effect (relative magnitude over time) |
|---|---|---|
| Standard Cox model | data | pre-specified as a straight line (constant effect) |
| Proposed model | data | pre-specified (time-varying effect) |
| Flexible models | data | data (time-varying effect) |
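For the proposed model's row, where the shape is pre-specified and the magnitude comes from the data, one common way to fit a time-varying covariate Cox model is to split each subject's follow-up into intervals on which the covariate x(t) = treatment × A(t) is constant. The sketch below does that split for a step-function A(t). The function name, the (start, stop] interval convention, and the example values are assumptions for illustration.

```python
def split_followup(time, event, treated, tau, a_early, a_late=1.0):
    """Split one subject's follow-up at tau into (start, stop, event, x) rows.

    x = treated * A(t), with A(t) = a_early on (0, tau] and a_late afterwards.
    The event indicator is attached only to the interval in which follow-up ends.
    """
    if time <= tau:
        return [(0.0, time, event, treated * a_early)]
    return [
        (0.0, tau, 0, treated * a_early),   # survived the early period
        (tau, time, event, treated * a_late),
    ]

# Treated subject with an event at 5 months, tau = 3 months, A(t) = 0.27 then 1.
rows = split_followup(5.0, 1, 1, 3.0, 0.27)
# rows == [(0.0, 3.0, 0, 0.27), (3.0, 5.0, 1, 1.0)]
```

Rows in this start/stop layout can be passed to standard Cox software that accepts counting-process input. The single fitted coefficient on x then scales the pre-specified shape A(t), which is the sense in which the magnitude is estimated from the data.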