
Reference curve sampling variability in one-sample log-rank tests.

Moritz Fabian Danzer1, Jannik Feld1, Andreas Faldum1, Rene Schmidt1.   

Abstract

The one-sample log-rank test is the method of choice for single-arm Phase II trials with a time-to-event endpoint. It allows the survival of patients to be compared to a reference survival curve that typically represents the expected survival under standard of care. The one-sample log-rank test, however, assumes that the reference survival curve is known. This ignores that the reference curve is commonly estimated from historic data and is thus prone to sampling error. Ignoring the sampling variability of the reference curve results in type I error rate inflation. We study this inflation in type I error rate analytically and by simulation. Moreover, we derive the actual distribution of the one-sample log-rank test statistic when the sampling variability of the reference curve is taken into account. In particular, we provide a consistent estimate of the factor by which the true variance of the one-sample log-rank statistic is underestimated when reference curve sampling variability is ignored. Our results are further substantiated by a case study using a real-world data example, in which we demonstrate how to estimate the error rate inflation in the planning stage of a trial.

Year:  2022        PMID: 35862473      PMCID: PMC9302761          DOI: 10.1371/journal.pone.0271094

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


Introduction

The one-sample log-rank test is the method of choice for single-arm Phase II trials with a time-to-event endpoint. It allows the survival of the patients to be compared to a prefixed reference survival curve that typically represents the expected survival under standard of care. First proposed by [1], its practical implementation including sample size calculation has been described by [2]. The one-sample log-rank test is criticized in several directions. First, it has been reported repeatedly in the literature that the original one-sample log-rank test tends to be conservative (see [3, 4]). One reason for the test's inaccuracy is the dependence between the estimators of mean and variance of the original one-sample log-rank statistic when the sample size is small. Several attempts have been made in the literature to correct for this (see [3-7]). Amongst those, the proposal made by [6] is presently implemented in the commercial software PASS [8] for sample size calculation for the one-sample log-rank test. Another, more conceptual point of criticism against the one-sample log-rank test relates to the process of selecting the reference survival curve. It is common practice to choose the reference survival curve in the light of historic data on standard treatment. At the data level, the difficulty is that the historic data might not reflect recent advances in diagnostics and/or concomitant therapy for standard of care, resulting in bias through unaddressed confounders. Therefore, careful choice of the historic data set is crucial. At the level of analysis, the problem is that choosing the reference curve in the light of historic data implies that the reference survival curve itself is prone to sampling error. This sampling variability of the reference curve, however, is ignored in the original one-sample log-rank statistic. One-sample log-rank tests rather assume that the reference survival curve is a priori known and deterministic (see [2-7, 9]).
This ignores that the reference curve resulted from an estimation process, complicates interpretation of the test results, and implies an inflation of the type I error rate. As laid out in [10], this is a general problem in clinical trials with historical controls. One aim of this paper is to systematically study the amount of type I error inflation as a function of the design parameters of the trial. Moreover, we provide a consistent estimate of the factor by which the true variance of the one-sample log-rank statistic is underestimated when reference curve sampling variability is ignored. This allows the construction of a random variable Z that explicitly accounts for the sampling variability of the reference curve and thus assures strict type I error rate control. The paper is organized as follows. After settling notation and the testing problem, we derive a consistent estimate of the actual variance of the one-sample log-rank statistic when the reference cumulative hazard function is estimated non-parametrically from historic data using the Nelson-Aalen estimator. We continue with a simulation study which sheds light on the amount of type I error rate inflation of the one-sample log-rank test when the reference curve sampling variability is neglected in the test statistic. As a tool for planning a one-armed survival study, we then provide a formula that can be used to estimate the inflation based on the historical data and the design parameters of a new study. This instrument is also applied in a case study using a real-world data example. We conclude with a discussion of our results and future research. Mathematical proofs are deferred to S1 Appendix.

General aspects

Notation

We assume that historic data on standard of care (group A) is available and consider a single-arm survival trial in which survival data from a new treatment is collected (group B). Let P_x denote the set of patients from group x = A, B, n_x the number of such patients, and n ≔ n_A + n_B the total number of patients. In particular, we denote by π ≔ n_B/n_A the treatment group allocation ratio. The parameter n will index the arrival process, and asymptotic results will be derived in the limit n → ∞. Accordingly, we assume that the group sizes grow uniformly as the total sample size increases, i.e. we regard π as a fixed constant. We denote by T_i^x and C_i^x the time from entry to event and to censoring, respectively, for patient i from group x = A, B. Let X_i^x ≔ T_i^x ∧ C_i^x denote the minimum of both. As usual, we assume that the T_i^x and C_i^x are mutually independent (non-informative censoring). Based on the observed data, we calculate the number of events from treatment group x = A, B up to study time s ≥ 0 as N_x(s), and the number at risk by study time s ≥ 0 in treatment group x = A, B as Y_x(s). Let J_x(s) ≔ I(Y_x(s) > 0) indicate whether there are still patients at risk in treatment group x by study time s. As usual, we let λ_x(s) ≔ lim_{Δ→0} P(s ≤ T < s + Δ | T ≥ s)/Δ denote the hazard of a patient from treatment group x = A, B, and Λ_x(s) ≔ ∫_0^s λ_x(u) du the corresponding cumulative hazard function for treatment group x = A, B, respectively. Finally, we denote by the usual symbols the density, distribution function and survival function of the time to event T, and analogously for the time to censoring C, in treatment group x = A, B. Notice that these quantities are assumed to coincide for all patients from the same treatment group. We will also need the Nelson-Aalen estimator (see [11, 12]) of the cumulative hazard function Λ_x(s) for group x = A, B, given by Λ̂_x(s) ≔ ∫_0^s J_x(u)/Y_x(u) dN_x(u), and the corresponding estimator of the variance function, σ̂_x²(s) ≔ ∫_0^s J_x(u)/Y_x(u)² dN_x(u). We consider N_x, Y_x, J_x, Λ̂_x and σ̂_x² as stochastic processes in study time s ≥ 0.
Notice that we define 0/0 ≔ 0 whenever formal division of zero by zero occurs in a mathematical expression. Any stochastic process and martingale in this manuscript is regarded w.r.t. the filtration generated by the observable survival times which is defined at the beginning of Appendix A in S1 Appendix.
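As a concrete illustration of the estimators just introduced, the Nelson-Aalen estimator and its variance estimator can be computed from right-censored data in a few lines. The following is a minimal sketch (function names are our own, not from the paper); for simplicity it assumes no tied event times:

```python
import numpy as np

def nelson_aalen(time, event):
    """Nelson-Aalen estimate of the cumulative hazard and its variance estimate.

    time  : observed times X_i = min(T_i, C_i)
    event : 1 if X_i is an event, 0 if censored
    Returns the ordered time grid, cumulative hazard, and variance estimate.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    order = np.argsort(time)
    t, d = time[order], event[order]
    # number at risk just before each ordered observation time (no ties assumed)
    at_risk = np.arange(len(t), 0, -1)
    haz_inc = np.where(d == 1, 1.0 / at_risk, 0.0)     # dN(u) / Y(u)
    var_inc = np.where(d == 1, 1.0 / at_risk**2, 0.0)  # dN(u) / Y(u)^2
    return t, np.cumsum(haz_inc), np.cumsum(var_inc)

def eval_step(grid, values, s):
    """Evaluate a right-continuous step function (grid, values) at time s."""
    idx = np.searchsorted(grid, s, side="right") - 1
    return values[idx] if idx >= 0 else 0.0
```

For observed times (1, 2, 3, 4) with event indicators (1, 1, 0, 1), the numbers at risk at the three event times are 4, 3 and 1, so the cumulative hazard estimate at time 4 is 1/4 + 1/3 + 1 = 19/12.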

Motivation

The classical one-sample log-rank test (see [1, 2]) assesses the null hypothesis that the cumulative hazard Λ_B of patients from the experimental group B coincides with some prefixed reference cumulative hazard Λ0 on some prefixed observation horizon 0 ≤ s ≤ smax. The common basis for the construction of the one-sample log-rank test is the stochastic process M0(s) ≔ N_B(s) − ∫_0^s Y_B(u) dΛ0(u). When the null hypothesis holds true, M0 is (known to be) a mean-zero martingale whose variance may consistently be estimated by the observed number of events N_B(s) or by its compensator ∫_0^s Y_B(u) dΛ0(u) (see e.g. [13]). A standardized version of the one-sample log-rank statistic is then given by M0(smax) divided by the square root of either variance estimate, and both versions are asymptotically standard normally distributed under the null hypothesis. In clinical practice, the reference curve Λ0 is typically intended to represent the survival under standard of care Λ_A, i.e. it is aimed that Λ0 ≡ Λ_A. Accordingly, one is actually interested in the two-sided null hypothesis H0 : Λ_B ≡ Λ_A on [0, smax], which is the intersection of the two one-sided hypotheses H0,sup and H0,inf. In this context, however, the immediate difficulty is that the true cumulative hazard Λ_A under standard of care is unknown and thus in practice cannot be used as a reference function in the one-sample log-rank test. To get around this problem, it is common practice in the implementation of the classical one-sample log-rank test to estimate Λ_A from historic data and to choose the obtained estimate Λ̂_A as reference cumulative hazard function, while pretending (i) that Λ̂_A is deterministic and (ii) that Λ̂_A coincides with Λ_A. Consequently, the practical implementation of the classical log-rank test often is to consider the process M̂0(s) ≔ N_B(s) − ∫_0^s Y_B(u) dΛ̂_A(u) together with the corresponding variance estimators Σ̂_OSLR,1 and Σ̂_OSLR,2, and to use the statistic Z_OSLR,i for i = 1 or i = 2 for testing the null hypothesis H0, while additionally pretending that Z_OSLR,i is standard normally distributed under H0. In doing so, note that the maximum observation time in group B must be smaller than the maximum observation duration in the control group, so that the above comparison with the estimator from the control group can be made at all.
However, this approach ignores that the estimated reference cumulative hazard is in fact random and thus contributes additional variance to the test statistic. Consequently, the usual variance estimators underestimate the true variance of the one-sample log-rank statistic. Hence, Z_OSLR,i in fact fails to be standard normally distributed under H0, and an inflation of the type I error rate results. The aim of the following is to systematically study the extent of this inflation. In a first step, a correct estimator of the variance of the one-sample log-rank process has to be worked out.
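To make the construction concrete, here is a small sketch of the classical one-sample log-rank statistic for a prefixed reference cumulative hazard Λ0, using the observed number of events (the counting process) and the expected number of events (the compensator) as the two variance estimates described above. Function names and the unit-exponential reference in the example are illustrative, not taken from the paper:

```python
import math

def one_sample_logrank(times, events, Lambda0):
    """Classical one-sample log-rank statistics Z_1 and Z_2.

    times   : observed times X_i in group B
    events  : event indicators (1 = event, 0 = censored)
    Lambda0 : prefixed reference cumulative hazard, a callable s -> Lambda0(s)
    """
    O = sum(events)                          # observed events, N_B(smax)
    E = sum(Lambda0(x) for x in times)       # compensator / expected events
    z1 = (O - E) / math.sqrt(O)              # variance estimated by O
    z2 = (O - E) / math.sqrt(E)              # variance estimated by E
    return z1, z2

# Example: unit-exponential reference hazard, Lambda0(s) = s,
# with two events and one censored observation.
z1, z2 = one_sample_logrank([0.5, 1.0, 2.0], [1, 0, 1], lambda s: s)
```

Here O = 2 and E = 0.5 + 1.0 + 2.0 = 3.5, so both statistics are negative, pointing towards fewer events than the reference predicts.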

Revisiting the one–sample log–rank test statistic

Consider the stochastic process M̂0(s) ≔ N_B(s) − ∫_0^s Y_B(u) dΛ̂_A(u), with N_B, Y_B and Λ̂_A according to (1), (2) and (3). Assume that the null hypothesis H0 : Λ_B(s) = Λ_A(s) for all 0 ≤ s ≤ smax holds true. Then, by Theorem 1 (see S1 Appendix), M̂0 is a mean-zero martingale, and for each fixed smax ≥ s > 0, the standardized process converges in distribution to a standard normal limit as n → ∞, where for i ∈ {1, 2} the variance estimator Σ̂_i augments Σ̂_OSLR,i by a second summand accounting for the sampling variability of Λ̂_A (see S1 Appendix, Lemma 1 and Corollary 1). In particular, we conclude that Σ̂_1 and Σ̂_2 are consistent estimators of the variance of M̂0(smax), and that the random variable Z_i ≔ M̂0(smax)/√(Σ̂_i(smax)) for i ∈ {1, 2} is approximately standard normally distributed under the null hypothesis H0 if p0 > 0 (see Theorem 1 in S1 Appendix). A sufficient condition for p0 > 0 is as follows: Let a and f denote the lengths of the accrual and follow-up periods in group B and let smax = a + f. Let s_A denote the maximum observation time in the historic control group A. Then p0 > 0 if smax < s_A. Also note that the normalizing factor in the second summand of Σ̂_i cancels with the corresponding factor n in the definition of the Nelson-Aalen variance estimator, and the analogous factors in the numerator and denominator of Z_1 cancel each other out. In contrast, the standard one-sample log-rank test statistic at smax is Z_OSLR,i ≔ M̂0(smax)/√(Σ̂_OSLR,i(smax)) for an i ∈ {1, 2}. The standard one-sample log-rank test of the two-sided null hypothesis H0 is by definition considered significant at level α whenever |Z_OSLR,i| ≥ Φ^(−1)(1 − α/2). Analogously, the one-sided hypotheses H0,sup and H0,inf are rejected at level α/2 by the classical one-sample log-rank tests if Z_OSLR,i ≥ Φ^(−1)(1 − α/2) or Z_OSLR,i ≤ Φ^(−1)(α/2), respectively. It follows directly from the distributional approximation (6), however, that Z_OSLR,i is in truth not standard normal under the null hypothesis H0, since for both i ∈ {1, 2}, Σ̂_OSLR,i falls short of the consistent variance estimator Σ̂_i by the amount representing the reference curve sampling variability. This results in type I error rate inflation. The exact amount of the type I error rate inflation is driven by the ratio R_i of the standard deviations without and with consideration of the sampling variability. This ratio can be consistently estimated by R̂_i ≔ √(Σ̂_OSLR,i/Σ̂_i) for i ∈ {1, 2}.
The actual type I error rate of the one-sample procedure under H0 can thus be approximated by 2(1 − Φ(R · Φ^(−1)(1 − α/2))). If the recruitment and censoring mechanisms were equal in both groups, R would amount to 1/√(1 + π), and the actual type I error level would be inflated to 2(1 − Φ(Φ^(−1)(1 − α/2)/√(1 + π))). We refer to S1 Appendix for the general case and the derivation of this formula. In particular, the classical one-sample log-rank test procedure (8) exceeds the nominal level α whenever the reference curve sampling variability is large. In this sense, the procedure (8) is invalid as a test of H0. In contrast, notice that the two-sample log-rank test would be a valid test of the null hypothesis H0 that survival in the new and historic control groups coincides. At this point it should be noted that it would be natural to choose the modified test statistic Z_i as a new statistic for testing H0. In a forthcoming paper we will examine its performance regarding type I error rate and power as compared to the two-sample log-rank test. However, these aspects are beyond the focus and scope of this manuscript.
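When recruitment and censoring mechanisms are equal in both groups, the inflation depends only on the allocation ratio π. The following sketch evaluates the resulting actual level, assuming the ratio of standard deviations takes the closed form R = 1/√(1 + π); this reading of the special case is consistent with the simulated R̂ values reported in Table 1 (e.g. R̂ ≈ 0.70 for π = 1, R̂ ≈ 0.97 for π = 1/16):

```python
import math
from statistics import NormalDist

def inflated_level(pi, alpha=0.05):
    """Actual two-sided level of the classical one-sample log-rank test when
    the reference curve is estimated but its variability is ignored,
    assuming equal recruitment/censoring so that R = 1/sqrt(1 + pi)."""
    nd = NormalDist()
    R = 1.0 / math.sqrt(1.0 + pi)
    z = nd.inv_cdf(1.0 - alpha / 2.0)       # nominal two-sided critical value
    return 2.0 * (1.0 - nd.cdf(R * z))      # P(|Z| >= R * z) for Z ~ N(0, 1)
```

For π = 1 this gives roughly 0.166, in line with the empirical rates of 14.3%-16.9% in Table 1; for π = 1/12 (a historic control about 12 times larger than the new cohort) the level is about 0.06.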

Simulation study: Effective type I error rate of the one–sample log–rank test

Design

The objective of this simulation study is to quantify the amount of type I error rate inflation when the reference curve serving as benchmark in the one-sample log-rank test is estimated from historic data, but the reference curve sampling variability is ignored in the test statistic. In our simulations we focussed on settings of particular practical relevance: Patients were assumed to enter the trial uniformly between year 0 and year a = 2. Accordingly, the calendar times of entry were generated according to a uniform distribution on [0, a]. After the end of the accrual period, patients were assumed to be followed up for a further f = 3 years, while assuming no loss to follow-up. Hence, censoring is purely administrative at calendar time a + f for x = A, B. Survival times in the historic control group A were generated according to a Weibull distribution with cumulative hazard Λ_A(s) ≔ −log(S1)·s^κ, with prefixed shape parameter κ ∈ {0.5, 1.0, 2.0} and 1-year survival rate S1 = 0.5. Survival times in the new treatment group B were generated from the same distribution (Λ_B = Λ_A), because our focus is on the type I error rate inflation of the classical one-sample log-rank test when used for testing the null hypothesis H0 : Λ_B = Λ_A. To perform the one-sample log-rank test, the group A data was used to calculate the Nelson-Aalen estimate Λ̂_A of Λ_A, and the procedure defined in Eq (8) was applied with a desired two-sided significance level of α = 5% with both variance estimators Σ̂_OSLR,1 and Σ̂_OSLR,2. The simulations were used to estimate (i) the empirical type I error rate α̂ of the two-sided procedures (8) when used for testing H0 and (ii) the median factors R̂_i by which the true standard deviation of the one-sample log-rank statistic is underestimated when sampling variability of the reference curve estimate is ignored. Additionally, we study the empirical type I error rates α̂_sup and α̂_inf of the one-sided procedures (9) for testing the two one-sided hypotheses H0,sup and H0,inf. In order to satisfy the requirements of our asymptotic results, we chose smax = a + f − 10^(−8).
We used different sample sizes n_B ∈ {25, 50, 100, 200} for group B and allocation ratios π = n_B/n_A ∈ {1, 1/2, 1/4, 1/8, 1/16} to study the impact of these parameters on the amount of type I error rate inflation and the underestimation of the true variance. Scenarios with π ≤ 1/2 are more likely to reflect common practice, as the size of the experimental cohort is typically smaller than the size of the historical control cohort. For each parameter constellation, we generated 100,000 samples, to which we applied the one-sample log-rank test procedures and calculated the underestimation of the variance and the empirical type I error rates. For this number of samples, the width of a 95% confidence interval ranges between 0.0027 and 0.0057 for underlying true rates between 0.05 and 0.3. The results for κ = 1 are shown in Tables 1 and 2. The results for κ = 0.5 and κ = 2 are deferred to Appendix C of S1 Appendix.
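The simulation design just described can be sketched compactly. The following Monte Carlo fragment (our own condensed illustration, with far fewer replications than the 100,000 used in the paper) reproduces the qualitative behaviour for κ = 1, n_B = 50 and π = 1 using the counting-process variance estimator:

```python
import numpy as np

rng = np.random.default_rng(2022)
a, f, S1 = 2.0, 3.0, 0.5           # accrual, follow-up, 1-year survival
lam = -np.log(S1)                  # exponential rate (Weibull with kappa = 1)
n_B, pi = 50, 1.0
n_A = int(n_B / pi)

def sample_group(n):
    """Observed times and event indicators under uniform accrual on [0, a]
    and administrative censoring at calendar time a + f."""
    entry = rng.uniform(0.0, a, n)
    T = rng.exponential(1.0 / lam, n)
    C = a + f - entry              # remaining observation time for patient i
    return np.minimum(T, C), (T <= C).astype(int)

def nelson_aalen_at(time, event, s):
    """Nelson-Aalen cumulative hazard estimate evaluated at the times s."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    inc = np.where(d == 1, 1.0 / np.arange(len(t), 0, -1), 0.0)
    H = np.cumsum(inc)
    idx = np.searchsorted(t, s, side="right") - 1
    return np.where(idx >= 0, H[np.maximum(idx, 0)], 0.0)

rejections = 0
n_sim = 2000
for _ in range(n_sim):
    xa, da = sample_group(n_A)     # historic control, used as reference
    xb, db = sample_group(n_B)     # new treatment group under H0
    O = db.sum()
    E = nelson_aalen_at(xa, da, xb).sum()  # expected events under H0
    z1 = (O - E) / np.sqrt(O)              # counting-process variance
    rejections += abs(z1) >= 1.96
print(rejections / n_sim)          # well above the nominal 0.05
```

With these settings the empirical rejection rate comes out close to the value of about 0.155 reported for this constellation in Table 1, despite the nominal 5% level.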
Table 1

Empirical type I error rates under consideration of sampling variability.

n_B      π = 1           π = 1/2         π = 1/4         π = 1/8         π = 1/16
         α̂      R̂_i     α̂      R̂_i     α̂      R̂_i     α̂      R̂_i     α̂      R̂_i
using Σ̂_OSLR,1 as variance estimator
25       0.143  0.689   0.100  0.804   0.077  0.884   0.065  0.935   0.058  0.963
50       0.155  0.696   0.107  0.810   0.080  0.889   0.066  0.938   0.059  0.966
100      0.161  0.701   0.108  0.813   0.079  0.892   0.065  0.941   0.057  0.968
200      0.164  0.703   0.108  0.815   0.079  0.893   0.064  0.942   0.057  0.969
using Σ̂_OSLR,2 as variance estimator
25       0.167  0.689   0.117  0.804   0.086  0.884   0.071  0.935   0.063  0.963
50       0.169  0.696   0.114  0.810   0.084  0.889   0.070  0.938   0.061  0.966
100      0.167  0.701   0.112  0.813   0.082  0.892   0.067  0.941   0.059  0.968
200      0.166  0.703   0.110  0.815   0.080  0.893   0.065  0.942   0.058  0.969

(i) Empirical two-sided type I error rates α̂ of test procedure (8) when used for testing H0 : Λ_B = Λ_A, and (ii) median factors R̂_i as in (11) by which the true standard deviation of the one-sample log-rank statistic is underestimated when ignoring the reference curve sampling variability, for different parameter constellations of practical relevance. Survival times were Weibull distributed with shape parameter κ = 1 and 1-year survival rate S1 = 0.5 in the historic control group A and the new treatment group B. Theoretical two-sided significance level: 5%. The underlying sample size of group B is n_B, with allocation ratio π = n_B/n_A between the new and historic groups.

Table 2

Empirical one-sided type I error rates under consideration of sampling variability.

n_B      π = 1            π = 1/2          π = 1/4          π = 1/8          π = 1/16
         α̂_inf  α̂_sup   α̂_inf  α̂_sup   α̂_inf  α̂_sup   α̂_inf  α̂_sup   α̂_inf  α̂_sup
using Σ̂_OSLR,1 as variance estimator
25       0.081  0.062    0.066  0.034    0.054  0.023    0.048  0.017    0.043  0.015
50       0.087  0.069    0.065  0.042    0.052  0.029    0.044  0.022    0.039  0.019
100      0.087  0.074    0.063  0.046    0.047  0.032    0.040  0.025    0.035  0.022
200      0.086  0.077    0.060  0.048    0.045  0.034    0.037  0.027    0.033  0.024
using Σ̂_OSLR,2 as variance estimator
25       0.050  0.117    0.038  0.079    0.028  0.058    0.023  0.048    0.020  0.043
50       0.062  0.106    0.043  0.071    0.032  0.052    0.026  0.044    0.022  0.039
100      0.069  0.099    0.047  0.065    0.033  0.049    0.027  0.040    0.023  0.036
200      0.073  0.094    0.048  0.062    0.035  0.045    0.028  0.037    0.025  0.033

Empirical one-sided type I error rates α̂_sup and α̂_inf of the test procedures (9) when used for testing H0,sup and H0,inf, respectively, for different parameter constellations of practical relevance. Survival times were Weibull distributed with shape parameter κ = 1 and 1-year survival rate S1 = 0.5 in the historic control group A and the new treatment group B. Theoretical one-sided significance level: 2.5%. The underlying sample size of group B is n_B, with allocation ratio π = n_B/n_A between the new and historic groups.


Results

The classical one-sample log-rank test procedure defined in (8) does not account for sampling variability of the reference curve estimate. This leads to type I error rate inflation when the underlying null hypothesis to be tested is H0 : Λ_B = Λ_A. As expected, our simulations support that the amount of type I error rate inflation of the one-sample log-rank test is most pronounced when the historic control group is small compared to the new treatment group, i.e. when the allocation ratio π is large. For most constellations, the inflation for the test statistic Z_OSLR,1 slightly decreases with increasing overall sample size n but stabilizes at some level above the desired significance level of α = 5%. For the test statistic Z_OSLR,2, one can observe a slight increase of this inflation with increasing overall sample size, and a stabilization at the same level as for Z_OSLR,1. This supports that the observed type I error rate inflation is primarily not a small sample size phenomenon, but rather due to the underestimation of the variance in the one-sample log-rank statistic. The type I error rate furthermore varies only slightly between the different shape parameters. For the ratio π = 1, the true two-sided type I error rate is approximately three times larger than the desired one (14.3%-16.9% instead of 5% for π = 1 and κ = 1). For low allocation ratios such as 1/8 or 1/16, the actual two-sided type I error still exceeds the nominal level, but to an extent that might be acceptable for a phase II trial (5.7%-6.3% for π = 1/16 and κ = 1; 6.4%-7.1% for π = 1/8 and κ = 1). The one-sided type I error rates, however, are quite imbalanced, with the direction of imbalance heavily linked to the variance estimator used. This is a well-known phenomenon (see [14]) that affects our simulation results in addition to the neglected variance.
Estimation of the variance with the counting process estimator Σ̂_OSLR,1 leads, in the finite sample case, to a left-skewed distribution of Z_OSLR,1, and thus more decisions in favour of the new treatment are made. Estimation with the compensator process via Σ̂_OSLR,2, in contrast, leads to a right-skewed distribution of Z_OSLR,2. Even for allocation ratios as small as π = 1/8, both tests have a one-sided error rate above 3.7% instead of 2.5% in their corresponding favoured direction. For small historic control groups (π ≥ 1/2), the effect of ignoring reference curve sampling variability on type I error rate inflation predominates over these effects of skewness. Varying the shape parameter κ changes the inflation only slightly (see Appendix C in S1 Appendix). This is to be expected, as the log-rank test is a rank-based test: by transformations of the time scale, the survival distributions of the different scenarios can be transformed into each other, such that only the distributions of entry and censoring times differ between the scenarios. This is reconfirmed by the fact that, in case of equal entry and censoring distributions in groups A and B, the asymptotic inflation in Eq (13) depends only on π and on no other design parameters. With a view to application of the classical one-sample log-rank test (8) for testing H0 in historically controlled phase II survival trials, our results support, as a rule of thumb, that the choice of the reference curve should be based on a historic control that is at least about 12 times larger than the new experimental trial cohort. According to (13), a factor of at least 12 corresponds to an inflation of the type I error rate to a maximum of 6%. For stricter type I error rate control, one could implement a hybrid testing procedure defined by rejecting H0 when either Z_OSLR,1 ≥ Φ^(−1)(1 − α/2) or Z_OSLR,2 ≤ Φ^(−1)(α/2).
This hybrid testing strategy exploits the skewness of the statistics Z_OSLR,i to compensate in part for the type I error rate inflation due to neglect of the reference curve sampling variability. In our simulations, this strategy yields valid tests of H0 for allocation ratios π ≤ 1/8. If the historic control group A is small (π ≥ 1/4), the null hypothesis of no difference between groups B and A should rather be tested by a two-sample log-rank test. Furthermore, the maximum observation time of the new trial should be set smaller than that of the historic control, to avoid utilizing the volatile tails of the Kaplan-Meier curve within the test statistic. This is also supported by the calculation of α_OSLR as defined in (12) via (10). The results of this calculation are displayed in Fig 1. The inflated type I error level is plotted as a function of the allocation ratio π for three different durations of the follow-up period. As expected, longer observation periods lead to a higher inflation of the type I error rate. This is due to the fact that the estimate of the survival distribution in group A becomes more volatile in the tail, which is utilized more frequently in the test statistic for group B if the follow-up duration is extended.
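The hybrid rule described above uses Z_OSLR,1 only for the upper tail and Z_OSLR,2 only for the lower tail, i.e. each statistic is used in the direction where its skewness makes it conservative. A minimal sketch of the decision rule (function name ours):

```python
from statistics import NormalDist

def hybrid_reject(z_oslr_1, z_oslr_2, alpha=0.05):
    """Hybrid two-sided test: reject H0 when Z_OSLR,1 exceeds the upper
    critical value or Z_OSLR,2 falls below the lower critical value."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # = -inv_cdf(alpha / 2)
    return z_oslr_1 >= z_crit or z_oslr_2 <= -z_crit
```

Since Z_OSLR,1 is left-skewed and Z_OSLR,2 is right-skewed in finite samples, each statistic contributes the tail in which it under-rejects, which partially offsets the inflation from the neglected reference curve variability.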
Fig 1

Type I error rate approximation.

Type I error rate approximation given by α_OSLR in (12) as a function of the allocation ratio π for different durations f ∈ {1, 2, 3} of the follow-up period in the new trial. Calculations were done for exponentially distributed survival times with a 1-year survival rate of 50%. Accrual duration a for the historic control and new treatment groups was set to 5 years; follow-up f of the historic trial was set to 3 years. To satisfy the conditions of Theorem 1 (see S1 Appendix), we chose smax = a + f − 10^(−8).

In summary, the simulations support that neglecting the reference curve sampling variability in the classical one-sample log-rank test relevantly compromises type I error rate control when testing the null hypothesis H0 : Λ_B = Λ_A. Notice that the classical one-sample log-rank test only realizes strict type I error rate control for testing the null hypothesis that Λ_B coincides with the estimated reference curve Λ̂_A, which, however, detracts from the null hypothesis H0 : Λ_B = Λ_A of true interest when the random deviation of Λ̂_A from Λ_A is large.

A priori estimation of the expected type I error rate inflation

As seen in the preceding simulations, the actual type I error rate of the classical one-sample log-rank test always exceeds the nominal type I error level if the sampling variability of the reference curve is not taken into account. However, the magnitude of this excess depends on the data from the reference cohort as well as the sample size of the new, experimental cohort. In this section, we describe how to estimate the expected amount of type I error rate inflation already at the planning stage of a historically controlled, single-arm survival trial. This allows an a priori assessment of whether the one-sample log-rank test can be considered appropriate for testing H0 in the particular trial setting, or whether the use of alternative methods such as the two-sample log-rank test is preferable. The difference between the classical one-sample log-rank statistic Z_OSLR,i from (4) and the asymptotically standard normally distributed random variable Z_i from (6) is the standardization factor in the respective denominators. Let R_i from (11) denote the ratio of the standardization factors without and with consideration of the sampling variability. With the factors R_i it is possible to explicitly quantify the expected amount of type I error rate inflation due to neglect of reference curve sampling variability: The actual type I error rate of a two-sided classical one-sample log-rank test with nominal level α is, in expectation, E[2(1 − Φ(R_i · Φ^(−1)(1 − α/2)))] instead of α when reference curve sampling variability is neglected. This expectation can be approximated via a first-order Taylor expansion. In the planning stage of a new trial, the historical data is already known and can be taken into account when considering the type I error rate inflation. Conditioning on this data, we can compute the expected standardization ratio R_pre. One should note that, according to our calculations in S1 Appendix, the asymptotics for both i ∈ {1, 2} lead to the same result.
Hence, R_pre is well-defined. The expression given here can immediately be estimated from given historical control data and the design parameters of a trial (see Appendix B in S1 Appendix for details). Analogously to (12), the actual type I error rate to be expected is given by α_pre ≔ 2(1 − Φ(R_pre · Φ^(−1)(1 − α/2))). The computations in [6] and the asymptotics of the Nelson-Aalen estimator yield an explicit expression for R_pre; after another approximation and some computations (see Appendix B in S1 Appendix), a closed-form approximation is obtained. Under the null hypothesis H0, its right-hand side can be estimated by plugging in Kaplan-Meier estimates obtained from the historic control group A for the unknown survival and censoring distributions, respectively. For a given historical control group, these formulas can now be used to compute the type I error inflation due to ignoring reference curve sampling variability. Of course, the treatment group allocation ratio π is essential for the extent of this inflation. We also applied this a priori estimation in our simulation from the previous section. The results can be found in Table 3. They suggest that the underestimation of the variance can be robustly examined based on the historic data before the new group is recruited. A much simpler estimate is provided by formula (13). This is particularly useful when no assumption can be made about the recruitment and censoring mechanisms in group B. From Fig 3, however, it can be seen that these have a large influence on the actual extent of the type I error rate inflation.
Table 3

A priori estimated type I error rates under consideration of sampling variability.

n_B      π = 1            π = 1/2          π = 1/4          π = 1/8          π = 1/16
         α_pre  R_pre    α_pre  R_pre    α_pre  R_pre    α_pre  R_pre    α_pre  R_pre
25       0.156  0.724    0.107  0.823    0.079  0.896    0.064  0.943    0.057  0.970
50       0.161  0.715    0.108  0.820    0.079  0.895    0.065  0.943    0.057  0.970
100      0.163  0.711    0.109  0.818    0.079  0.895    0.065  0.943    0.057  0.970
200      0.165  0.709    0.109  0.817    0.080  0.895    0.065  0.943    0.057  0.970

(i) Median a priori estimates α_pre of the type I error rate (see Eq (15)) of test procedure (8) when used for testing H0 : Λ_B = Λ_A, and (ii) median a priori estimates R_pre of the underestimation of the standard deviation (see Eq (14)) of the one-sample log-rank statistic when ignoring the reference curve sampling variability, for different parameter constellations of practical relevance. Survival times were Weibull distributed with shape parameter κ = 1 and 1-year survival rate S1 = 0.5 in the historic control group A. The underlying sample size is n_A = n_B/π with allocation ratio π.

(i) Median a priori estimates of type I error rate αpre (see Eq (15)) of test procedure (8) when used for testing H0 : Λ = Λ, and (ii) median a priori estimates of underestimation of the standard deviation Rpre (see Eq (14)) of the one–sample log–rank statistic when ignoring the reference curve sampling variability for different parameter constellations of practical relevance. Survival times were Weibull distributed with shape parameter κ = 1 and 1–year survival rate S1 = 0.5 in the historic control group A. Underlying sample size of n = n/π with allocation ratio π. We will now illustrate the influence of basic design parameters on the type I error inflation using a practical example. We employ the setting of the Mayo Clinical trial in primary biliary cirrhosis of the liver (PBC), which is a rare but fatal chronic disease whose cause is still unknown (see [15]). In this double-blinded randomized trial the drug D-penicillamine (DPCA) was compared with a placebo. The study data is publicly available via the survival package in R (see [16, 17]). Among the 158 patients of the cohort treated with DPCA, 65 died during the trial. The Kaplan-Meier survival curve of these patients can be found in Fig 2. The time scale is given in years. In the same figure, we also display the empirical distribution of the censoring variable C in this cohort. As we will see below, the censoring distribution also plays a crucial role for our computations. We now suppose, that a new treatment becomes available and the data from this new trial shall be used to compare the survival under a new treatment to the survival under historic treatment with DPCA. This shall be accomplished in a trial in which patients are recruited uniformly over a accrual period of length a and then followed-up in an subsequent observation phase of length f. The allocation ratio (new to historic cohort) will again be denoted by π. 
If one cannot find a suitable parametric model to be fitted to the data, the Kaplan-Meier and Nelson-Aalen estimates (see Fig 2) are employed as reference curves for the one-sample log-rank test, respectively.
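Both reference curves can be computed directly from right-censored pairs (X_i, δ_i). A compact stdlib Python sketch of the two estimators on toy data (not the authors' supplementary code):

```python
from itertools import groupby

def km_na(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative-hazard steps from
    right-censored data (observed times X_i, event indicators delta_i)."""
    data = sorted(zip(times, events))
    n = len(data)                       # current size of the risk set
    S, H = 1.0, 0.0
    steps = []
    for t, grp in groupby(data, key=lambda p: p[0]):
        grp = list(grp)
        d = sum(e for _, e in grp)      # deaths at time t
        if d > 0:
            S *= 1.0 - d / n            # Kaplan-Meier factor
            H += d / n                  # Nelson-Aalen increment
            steps.append((t, S, H))
        n -= len(grp)                   # deaths and censorings leave the risk set
    return steps

# Toy data: death at t=1, death and censoring at t=2, death at t=3, censoring at t=4.
steps = km_na([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(S, 6), round(H, 6)) for t, S, H in steps])
# [(1, 0.8, 0.2), (2, 0.6, 0.45), (3, 0.3, 0.95)]
```

For real data one would of course use the survival package in R, as the authors do; the sketch only makes the step-function construction explicit.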
Fig 2

Distribution of survival and censoring variable.

Distribution of overall survival and censoring in the cohort treated with DPCA of the Mayo Clinic trial in primary biliary cirrhosis. Left: Cumulative hazards according to the Nelson-Aalen estimator. Right: Survival distributions according to the Kaplan-Meier estimator.

Similar to our simulation study, we first investigate the influence of the allocation ratio on the type I error inflation. We choose π ∈ {0.01, 0.02, 0.03, …, 1}, a = 2 and f ∈ {2, 4, 6, 8}. Hence, we obtain analysis dates s_max ∈ {4, 6, 8, 10}. As the observation period of many patients in the historical reference group exceeds 10 years, we comply with the requirements of Theorem 1 (see Appendix A of S1 Appendix) here. The results in terms of the actual type I error level of the one-sample log-rank test can be found on the left-hand side of Fig 3. For any fixed f, the actual type I error level increases nearly linearly with the allocation ratio. The amount of increase additionally depends on the length of the follow-up, where a longer follow-up period leads to a steeper increase.
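The near-linear growth in π is what one would expect if, in the simplest setting, the true variance of the one-sample log-rank statistic exceeds the nominal one by the factor 1 + π. This functional form is an assumption here, not Eq (12) itself, but it reproduces the "π ≤ 1/12 keeps the level below 6%" figure quoted in the Discussion: the actual two-sided level is then 2Φ(−z_{1−α/2}/√(1 + π)).

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def actual_alpha(pi, z=1.959964):
    """Actual two-sided type I error of a nominal 5% test if the true
    standard deviation of the statistic is sqrt(1 + pi) times the one
    used for calibration (assumed simplest-case inflation factor)."""
    return 2.0 * norm_cdf(-z / sqrt(1.0 + pi))

for pi in (0.0, 1 / 12, 0.5, 1.0):
    # pi = 1/12 gives roughly 0.06, consistent with the Discussion.
    print(f"pi = {pi:.3f}  actual alpha = {actual_alpha(pi):.4f}")
```

The monotone, almost linear dependence of this quantity on π over small π mirrors the left panel of Fig 3.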
Fig 3

Type I error inflation.

Actual type I error levels of the classical one-sample log-rank test when sampling variability of the reference curve is ignored. Left: Variation of the allocation ratio with fixed accrual duration a and four different durations of the follow-up period f. Right: Variation of the length of the follow-up period f for a fixed allocation ratio π and three different durations of the accrual period a.

We next take a closer look at the role of the overall trial duration a + f. As already seen in the first part, longer trials lead to a larger inflation of the type I error level. To analyse this dependence, we now choose π = 0.5, a ∈ {2, 4, 6} and f ∈ {0, 0.05, 0.1, …, 6}. The results can be found on the right-hand side of Fig 3. As we can see, trials with a longer overall duration a + f lead to a larger type I error inflation. This effect is most substantial if the overall duration of the new trial is close to the longest observation in the historic data set (in our example about 12.5 years). The reason is that in this case the testing procedure needs to utilize parts of the tail of the Nelson-Aalen estimator which are based on a small proportion of patients and thus are affected by a high amount of variability. This stresses the importance of the frame condition that the available follow-up data for patients from the historic group should be substantially longer than the desired length of the new trial if the reference survival curve is estimated from historic data. However, the inflation of the type I error rate behaves completely monotonically in neither the accrual duration a nor the follow-up duration f. Even if the variance estimator of the one–sample log–rank test and the additional variance from consideration of the reference curve sampling variability both increase monotonically in a and f, their ratio, and hence the inflation, can decrease if the increase of the former outpaces that of the latter.
Nevertheless, there is a clear tendency towards a larger inflation of the type I error rate if either a or f increases.
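The qualitative behaviour above can be reproduced with a small Monte Carlo sketch in which both cohorts are drawn from the same exponential law with a fixed administrative censoring time (deliberately simplified, hypothetical parameters). The classical statistic is assumed in the form Z = (O − E)/√E with E = Σ Λ̂_A(X_i); this is a stand-in for, not a copy of, the supplementary simulation code.

```python
import random
from bisect import bisect_right
from math import sqrt

def nelson_aalen_steps(times, events):
    """Jump times and cumulative-hazard values from right-censored data."""
    data = sorted(zip(times, events))
    n, H = len(data), 0.0
    ts, hs = [], []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j][0] == data[i][0]:
            j += 1                      # group tied observation times
        d = sum(e for _, e in data[i:j])
        if d > 0:
            H += d / n
            ts.append(data[i][0])
            hs.append(H)
        n -= j - i
        i = j
    return ts, hs

def cum_hazard(ts, hs, x):
    """Evaluate the Nelson-Aalen step function at x."""
    k = bisect_right(ts, x)
    return hs[k - 1] if k else 0.0

def oslr_rejects(rng, n_hist, n_new, cens, z=1.959964):
    """One replication under H0: both cohorts follow the same Exp(1) law."""
    def cohort(n):
        xs = [min(rng.expovariate(1.0), cens) for _ in range(n)]
        return xs, [1 if x < cens else 0 for x in xs]
    hx, he = cohort(n_hist)             # historic cohort -> reference curve
    nx, ne = cohort(n_new)              # new single-arm cohort
    ts, hs = nelson_aalen_steps(hx, he)
    O = sum(ne)                                   # observed events
    E = sum(cum_hazard(ts, hs, x) for x in nx)    # expected events under H0
    return abs(O - E) / sqrt(E) > z               # classical two-sided decision

rng = random.Random(2022)
B = 1000
rate = sum(oslr_rejects(rng, 50, 50, cens=2.0) for _ in range(B)) / B
print(rate)  # well above the nominal 0.05 for allocation ratio pi = 1
```

With equal cohort sizes (π = 1) the empirical rejection rate under H0 lands far above the nominal 5%, illustrating the inflation discussed above.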

Discussion

Traditional one–sample log–rank tests compare the survival function of an experimental treatment to a prefixed reference survival curve, which typically represents the expected survival under standard of care. The choice of the reference survival curve is commonly based on historic data on standard therapy and is thus prone to sampling error. Nevertheless, traditional one–sample log–rank tests do not account for this variance of the reference curve estimator, but rather assume that the reference curve is deterministic. Ignoring the sampling variability, however, leads to an inflation of the type I error rate. The extent of this inflation depends in particular on the relative size of the historic control cohort compared to the new treatment cohort. A major objective of this paper was to work out recommendations on the size of the historic control group such that the type I error inflation remains within an acceptable range. In this regard, our simulations support that the classical one-sample log-rank test is adequate for two-sided type I error rate control if the historical control cohort is large enough. If the desired significance level is 5%, Eq (12) suggests that the historic control cohort should be at least 12 times larger than the new cohort (π ≤ 1/12) to ensure that the type I error rate is not inflated beyond 6%. Additionally, the available follow–up data for patients from the historic group should cover a substantially longer period than the desired length of the new trial (see Fig 1 and Results). For stricter type I error rate control one could use a hybrid strategy defined by rejecting H0 whenever Z_OSLR,1 ≥ Φ−1(1−α/2) or Z_OSLR,2 ≤ Φ−1(α/2). This strategy exploits the skewness of the distribution of different versions of the one-sample log-rank test statistic in order to partly compensate the type I error rate inflation caused by neglecting reference curve sampling variability.
In our simulations, this hybrid strategy yields satisfactory type I error rate performance for allocation ratios π ≤ 1/8. If these conditions are not met and the proportional hazards assumption can be made, it seems advisable to use the classical two-sample log-rank test instead (see [18]), where the variability in the data of the reference group is naturally taken into account. However, one must be careful here as well, since type I error rate control is not guaranteed for small sample sizes or unbalanced groups [19], as in some scenarios of our simulations. Such issues can be addressed by the application of resampling-based tests [20]. We also provided a consistent estimate of the actual variance of the one–sample log–rank statistic when reference curve sampling variability is taken into account. This allows us to construct a random variable Z (see Eq (6)) that is asymptotically standard normally distributed under the null hypothesis H0 : Λ = Λ_A. Z thus yields a test of H0 that may be viewed as an alternative to the two-sample log–rank test. Planning and performance of this new test as compared to the two–sample log–rank test will be the content of a forthcoming paper. Conceptually, the construction of our random variable Z also sheds light on a general strategy for lifting existing methodology for single–arm survival trials to a randomized, multi–arm setting. This might be of interest for designing confirmatory survival trials with interim analyses. Performing interim analyses in clinical trials is of ethical and economic interest. On the one hand, interim analyses enable faster decisions regarding rejection or acceptance of the underlying null hypothesis when the treatment effect is larger or smaller than initially expected. Moreover, interim analyses offer the possibility of data-dependent modifications of the trial (e.g. sample size recalculation) in the light of new insights, thus increasing the prospects of the trial.
Trial designs with interim analyses offering this kind of flexibility at full type I error rate control are commonly referred to as confirmatory adaptive designs [21, 22]. Advanced one-sample methodology as in [23] might be transferred to multi-arm settings in this way, addressing open problems concerning the use of additional information in interim analyses (see [24]). Similarly, weighted one-sample log-rank tests as in [25], which are better suited for the detection of late or early effects, can also be analyzed with the methods proposed here. Corresponding weights can be introduced into Z_OSLR,i (see (4)) resp. Z_i (see (6)) for i ∈ {1, 2} by multiplying them with the event indicator functions, inserting them into the counting process integral (2) of the Nelson-Aalen estimator, and inserting its square into the counting process integral (3) of the variance estimator. Going beyond our research, we point out that we did not consider the problem of confounding in historically controlled trials here. Confounding occurs if the characteristics of the historical control cohort and the cohort of the new study differ substantially. Extreme caution is therefore required when selecting a historical control. In [26, 27], several criteria are given under which a historical control cohort appears suitable. Known confounders can of course be taken into account by choosing an adequate analysis technique, for instance stratification of the two cohorts or, if appropriate, a Cox proportional hazards model. This, however, will be the subject of future research. The objective of this paper is to provide methodology for accounting for sampling variability of the reference curve in classical one-sample log-rank tests, and to illustrate the drastic consequences that neglecting reference curve sampling variability has on type I error rate control.
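Returning to the hybrid strategy from the beginning of this Discussion: the exact definitions of Z_OSLR,1 and Z_OSLR,2 are given in Eq (4) of the paper and are not restated here. The sketch below assumes the two common one-sample log-rank variants with variance estimated by the expected and the observed number of events respectively, i.e. Z_1 = (O − E)/√E and Z_2 = (O − E)/√O; this is an assumption to be checked against the paper, not a quotation of it.

```python
from math import sqrt

Z_CRIT = 1.959964  # Phi^{-1}(1 - 0.05/2)

def hybrid_reject(O, E):
    """Hybrid decision rule from the Discussion: reject H0 if
    Z1 >= z_crit or Z2 <= -z_crit, with the two variance estimates
    E and O (assumed forms of Z_OSLR,1 and Z_OSLR,2)."""
    z1 = (O - E) / sqrt(E)   # expectation-based variance estimate
    z2 = (O - E) / sqrt(O)   # event-based variance estimate
    return z1 >= Z_CRIT or z2 <= -Z_CRIT

print(hybrid_reject(30, 50))  # far fewer events than expected -> True
print(hybrid_reject(48, 50))  # close to expectation -> False
```

Because each one-sided decision uses the variance estimate that is smaller in the corresponding tail, the combined rule is stricter than either two-sided test alone, which is how it partly absorbs the inflation.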

R code.

Supplementary R code for the estimation of type I error rate inflation via a Monte Carlo simulation. (ZIP) Supplementary R code for a priori estimation of type I error rate inflation given the data from a historic control group. (R)

Mathematical details.

Mathematical statements and corresponding proofs. (PDF)
PONE-D-22-01927
Reference Curve Sampling Variability in One-Sample Log-Rank Tests
PLOS ONE Dear Dr. Jannik, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
 
The scope, the structure and the main content of this new manuscript are now adequate. However, there are some major (time window, improved discussion of Table 1, appropriate type I error rates in both tails) and a number of minor issues which should be addressed (see comments of all 3 reviewers).
Funding Statement (as submitted): "The work of MFD was funded by the German Science Foundation (Deutsche Forschungsgemeinschaft, DFG, https://www.dfg.de, grant number 413730122). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." Reviewers' comments: Reviewers #1, #2 and #3 answered "Yes" to the standard questions on technical soundness, statistical analysis, data availability and language.
Reviewer #1: As suggested in the former round, the authors now focus more on the one-sample setting and explain the effect of the uncertainty when the reference curve is "estimated" from historic data. The comparison of their new adjusted statistic to a classical two-sample setting is referred to a future project. Thus, the second major point of my last review is postponed to the future. However, I still have some concerns related to my first major point, which I guess can rather easily be solved or addressed. 1) The time window for the analysis. (i) The authors decided not to present the actual conditions on s_max, the upper limit of the analysis window, but prefer a verbal description: "In particular, we require the time window of observation in the interventional group to be smaller than that of the control group". Maybe the authors can convince me of the opposite, but this is not the same as the condition in Theorems 1 and 2 in the appendix. There it is said that (a) S_{XA}(s0) = S_{TA}(s0)S_{CA}(s0) is required. Here, the censoring also plays an important role. Do the authors suppose that the censoring in both groups is the same? Otherwise, I do not see how the above verbal condition implies the formal condition (a). (ii) In their response to my former comment, the authors state "Nevertheless, we have kept the formal argument "∞" in formulas (5), (7) and (8) to show that no data from any group is discarded in the final analysis if the condition addressed is complied with." From a theoretical point of view, the time window must be specified in advance, before the data is collected. Since s_max needs to be chosen such that P(X > s_max) > 0, it is expected that there are also observations larger than s_max. These observations are not completely excluded because they are needed in the calculation of the Nelson-Aalen estimator.
I understand that the restriction to s_max is not so nice, especially in the formulas, but it is what the theory gives you. Thus, I would prefer that it is mentioned appropriately in the paper. The restriction to s_max is also important for the so-called restricted mean survival time (RMST, see e.g. several papers of Royston), so it is not completely new and is also accepted (at least from my point of view). There is also some discussion of how to avoid the restrictions, see Tian et al. (2020) for the RMST or Wang (1987) and Stute and Wang (1993) for uniform consistency of the Kaplan-Meier estimator, but in both cases conditions on the censoring distribution are needed. (iii) How is s_max chosen in the data example and for the simulations? The conditions on s_max ensure that the variance does not "explode", right? Is there a guarantee that this does not happen in the simulations? If this is the case but no s_max is chosen, this would also be fine for me if the authors explain it appropriately in the respective section. 2) Minor remark/question: The log-rank test is known to be optimal for proportional hazards alternatives. But when early or late differences are expected, including an appropriate weight in the statistic can lead to a significant benefit in terms of power. I wonder whether such weights can also be implemented in the new proposal. If there is a quick solution, the authors may add a small discussion to the paper; otherwise please ignore the comment. References: Tian L, Jin H, Uno H, et al. On the empirical choice of the time window for restricted mean survival time. Biometrics 2020. Wang, J.-G. (1987). A note on the uniform consistency of the Kaplan-Meier estimator. The Annals of Statistics, 15(3), 1313–1316. Stute, W., Wang, J.-L. (1993). The strong law under random censorship. The Annals of Statistics, 21(3), 1591–1607. Reviewer #2: see the attached file. Reviewer #3: See attached report.
Here is some content of the attached report to meet the required character count: The authors explain how a one-sample log-rank test may be used to test whether the survival distribution for a new sample is the same as a reference survival distribution based on historical data. They point out that the standard method of testing does not adjust for error in estimating the reference distribution and, thus, the probability of finding a difference when both the new sample and the historical data follow the same distribution is liable to be inflated above the nominal type I error rate. The main contribution of the paper is to quantify the possible type I error rate inflation. The description of methods is rather technical. I believe the main contribution of the paper is the set of results presented in Table 1. Thus, a clear discussion of the patterns in this table and their generalisability to other cases is paramount.
29 May 2022: We attached a pdf document with detailed answers to all reviewer comments. 24 Jun 2022: Reference Curve Sampling Variability in One-Sample Log-Rank Tests, PONE-D-22-01927R1. Dear Moritz, We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Additional Editor Comments: 1) Please add the missing argument in the formula on page 5, line 122. 2) The formatting of the references is not consistent (upper-lower case in article titles, journal abbreviations). Reviewers #1 and #2: All comments have been addressed. Reviewer #1: I thank the authors for the two revision rounds. All my points are now appropriately addressed. All the best also for the other project regarding two-sample testing; here I think the option of a different weight is also of interest! Anyway, for me, the paper is ready to go. A typo: line 122: the argument of the max_{i in N_A}(?) is missing. Reviewer #2: Thank you for the careful revision. Apart from only a minor comment, I have nothing else to add. On p. 5, line 123, the "maximand" seems to be missing. 13 Jul 2022: PONE-D-22-01927R1, Reference curve sampling variability in one–sample log–rank tests. Dear Dr. Danzer: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Kind regards, PLOS ONE Editorial Office Staff on behalf of Professor Ralf Bender, Academic Editor, PLOS ONE.

Review 1.  Comparing survival of a sample to that of a standard population.

Authors:  Dianne M Finkelstein; Alona Muzikansky; David A Schoenfeld
Journal:  J Natl Cancer Inst       Date:  2003-10-01       Impact factor: 13.506

2.  Modification of the sample size and the schedule of interim analyses in survival trials based on data inspections, by H. Schäfer and H.-H. Müller, Statistics in Medicine 2001; 20: 3741-3751.

Authors:  P Bauer; M Posch
Journal:  Stat Med       Date:  2004-04-30       Impact factor: 2.373

Review 3.  The combination of randomized and historical controls in clinical trials.

Authors:  S J Pocock
Journal:  J Chronic Dis       Date:  1976-03

4.  Single-arm Phase II cancer survival trial designs.

Authors:  Jianrong Wu
Journal:  J Biopharm Stat       Date:  2015-06-22       Impact factor: 1.051

5.  Sample size calculation for the one-sample log-rank test.

Authors:  Jianrong Wu
Journal:  Pharm Stat       Date:  2014-10-22       Impact factor: 1.894

6.  Study design of single-arm phase II immunotherapy trials with long-term survivors and random delayed treatment effect.

Authors:  Chenghao Chu; Shufang Liu; Alan Rong
Journal:  Pharm Stat       Date:  2020-01-12       Impact factor: 1.894

7.  An improved one-sample log-rank test.

Authors:  Laura Kerschke; Andreas Faldum; Rene Schmidt
Journal:  Stat Methods Med Res       Date:  2020-03-04       Impact factor: 3.021

8.  Evaluation of survival data and two new rank order statistics arising in its consideration.

Authors:  N Mantel
Journal:  Cancer Chemother Rep       Date:  1966-03

9.  Phase II cancer clinical trials with a one-sample log-rank test and its corrections based on the Edgeworth expansion.

Authors:  Xiaoqun Sun; Paul Peng; Dongsheng Tu
Journal:  Contemp Clin Trials       Date:  2010-10-01       Impact factor: 2.226

Review 10.  A roadmap to using historical controls in clinical trials - by Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG).

Authors:  Mercedeh Ghadessi; Rui Tang; Joey Zhou; Rong Liu; Chenkun Wang; Kiichiro Toyoizumi; Chaoqun Mei; Lixia Zhang; C Q Deng; Robert A Beckman
Journal:  Orphanet J Rare Dis       Date:  2020-03-12       Impact factor: 4.123

