Literature DB >> 26694878

Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels.

Magdalena Żebrowska¹, Martin Posch¹, Dominic Magirr¹.

Abstract

Consider a parallel group trial for the comparison of an experimental treatment to a control, where the second-stage sample size may depend on the blinded primary endpoint data as well as on additional blinded data from a secondary endpoint. For the setting of normally distributed endpoints, we demonstrate that this may lead to an inflation of the type I error rate if the null hypothesis holds for the primary but not the secondary endpoint. We derive upper bounds for the inflation of the type I error rate, both for trials that employ random allocation and for those that use block randomization. We illustrate the worst-case sample size reassessment rule in a case study. For both randomization strategies, the maximum type I error rate increases with the effect size in the secondary endpoint and the correlation between endpoints. The maximum inflation increases with smaller block sizes if information on the block size is used in the reassessment rule. Based on our findings, we do not question the well-established use of blinded sample size reassessment methods with nuisance parameter estimates computed from the blinded interim data of the primary endpoint. However, we demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre-planned and fully algorithmic sample size reassessment rules and does not extend to general or unplanned sample size adjustments based on blinded data.

Entities: Chemical

Keywords: adaptive clinical trials; blinded interim analysis; block randomization; random allocation; sample size reassessment; type I error rate control

Mesh：

Substances：
Fingolimod Hydrochloride

Year: 2015 PMID： 26694878 PMCID： PMC4851240 DOI： 10.1002/sim.6848

Source DB: PubMed Journal: Stat Med ISSN： 0277-6715 Impact factor: 2.373

Introduction

Blinding is a generally accepted tool to address bias in randomized clinical trials. It ensures that up to the investigated intervention all subjects are handled equally across treatment groups and outcomes are assessed in the same way. Furthermore, blinding of study subjects allows one to distinguish specific treatment effects from potential placebo effects. Blinding is also essential to avert statistical bias in hypotheses testing procedures if data dependent changes to the analysis strategy are made. The ICH E9 guideline 1, for example, recommends to review (and possibly update) the statistical analysis plan based on a blinded data review and notes that “Decisions made at this time should be described in the report, and should be distinguished from those made after the statistician has had access to the treatment codes, as blind decisions will generally introduce less potential for bias”. Similarly, in adaptive clinical trials where adaptations of the trial designs such as a reassessment of sample size can be performed after an interim analysis, blinding is important: it is well known that sample size reassessment based on unblinded interim data may lead to inflation of the type I error by more than 100% 2, 3 if the adaptation is not accounted for by using appropriate adaptive testing procedures 4, 5, 6. To address the various sources of bias in adaptive trials, regulatory guidelines 7, 8, 9 recommend to avoid breaking the blind and to perform adaptations based on blinded interim analyses instead. An assumption underlying these guidance documents is that adaptations based on blinded interim analyses are less prone to bias. Indeed, it has been demonstrated in several settings that adaptations based on blinded interim analysis do not require an adjusted analysis to control the type I error: for superiority studies comparing normally distributed endpoints in a parallel group design where the sample size is reassessed based on the “lumped variance” (the variance of the total sample pooling the observations from both groups), the type I error rate is essentially unaffected 10, 11, 12), also for superiority studies comparing event rates, where the sample size is reassessed based on the overall number of events across treatment groups, no relevant inflation of the type I error rate was observed 13. Analogous results were obtained also for count data 14, if permutation tests are applied, Posch & Proschan 15 and Proschan et al. 16 showed that adaptations based on blinded interim data will indeed control the type I error rate if the clinical trial is restricted to a univariate testing problem where a single endpoint is observed. If adaptations are restricted to the choice between endpoints, the result extends to trials where two endpoints are considered simultaneously. Asymptotically, these results are also valid for t‐tests. However, blinding is not a panacea to prevent bias. If sample sizes are low, a minor increase of the type I error rate is observed for non‐inferiority trials with sample size reassessment based on the lumped variance 17. Also in superiority tests, Proschan et al. 16 showed that for general sample size reassessment rules based on the lumped variance the type I error rate may be inflated for small sample sizes. Furthermore, if the sample size reassessment rule may depend on more than one endpoint, type I error rate control is no longer guaranteed: if the null hypothesis holds for the primary endpoint but not for a secondary endpoint such as, for example, the level of drug in the blood, the secondary endpoint may completely unblind the investigator. However, the bias can also occur in less extreme settings, where the secondary endpoint unblinds the investigator only partially, as may be the case for a safety endpoint. In such settings, the potential type I error rate inflation is similar to that of a clinical trial where adaptations are performed in an unblinded interim analysis without being accounted for in the testing strategy 15. In this paper, we investigate the potential consequences of blinded sample size reassessment approaches that deviate from the accepted statistical practice of applying a binding, algorithmic, and blinded sample size reassessment procedure for which type I error rate control has been demonstrated. In particular, we consider settings where no blinded sample size reassessment has been pre‐specified in the protocol, settings where an option for blinded sample size reassessment (but no binding rule) are pre‐specified, and settings where a binding rule have been pre‐specified but the data monitoring committee decided not to follow the rule. Sponsors may argue for a more flexible approach for several reasons: for example, the deviation of nuisance parameter estimates from planning assumptions may not have been anticipated in the planning phase; the maximum number of available patients is unknown in advance such that no binding rule can be pre‐specified; recruitment is lower than anticipated or safety concerns arise such that it is argued that the pre‐planned sample size algorithm cannot be followed; or information from other trials may arise that serves as an argument for a change in pre‐specified strategies. Recent regulatory guidance documents appear to acknowledge such unplanned adaptations. For example, the FDA adaptive designs draft guidances state, “Certain blinded‐analysis‐based changes, such as sample size revisions based on aggregate event rates or variance of the endpoint, are advisable procedures that can be considered and planned at the protocol design stage, but can also be applied when not planned from the study outset if the study has remained unequivocally blinded.” 8 and “While it is strongly preferred that such adaptations be preplanned at the start of the study, it may be possible to make changes during the studyŠs conduct as well. In such instances, the FDA will expect sponsors to be able to both justify the scientific rationale why such an approach is appropriate and preferable, and demonstrate that they have not had access to any unblinded data (either by coded treatment groups or completely unblinded) and that the data has been scrupulously safeguarded.” 9. Unplanned sample size adjustment is also accepted by European regulators in specific settings, see, for example, Case Study 3 in 18. We consider the setting of a superiority test of a new experimental treatment over control, with a parallel group design and both blocked and unblocked randomization, where the sample size is reassessed after a blinded interim analysis. We assume that blinded data of the primary and a secondary endpoint is observed. This secondary endpoint – which may or may not be correlated with the primary endpoint – could be a surrogate endpoint, a clinical outcome, or a biomarker. For simplicity, the joint distribution of the two endpoints is assumed to be bivariate normal. If the null hypothesis of no treatment effect holds for the primary but not for the secondary endpoint, then the blinded secondary endpoint data provide the investigator with some information about the likely treatment assignment. We quantify the extent to which this can lead to biased analysis results. In Section 2, we introduce the notation and statistical model. In Section 3, we derive an upper bound on the type I error rate for a trial using a random allocation strategy. The case of blocked randomization is considered in Section 4. The results are applied to a case study in Section 5, and the impact of our investigation on the conduct of blinded interim analyses in clinical trials is discussed in Section 6.

The model

Consider a parallel group comparison of an experimental treatment to a control with n subjects in total. Denote the primary endpoint measurement of subject i = 1,…,n by X , and let G ∈{0,1} denote the random treatment allocation. We assume that outcomes are normally distributed with means and common variance σ 2. A hypothesis test of is performed at level α. After n 1=n/2 observations, an interim analysis is performed, and the sample size is reassessed. The new second‐stage sample size is denoted by n 2, and the new total sample size is N:=n 1+n 2. Besides the primary endpoint X , we assume that the experimenter also observes a secondary endpoint Y for each subject i. Assume that the response of patient i, conditional on G , is distributed as where ν 1,ν 0 are the means of the secondary endpoint in treatment and control groups, respectively. At the end of the trial, H 0 will be rejected if Z >z 1 − where, assuming balanced group sizes, The maximum conditional type I error rate given the first stage data from both endpoints is therefore

Random allocation

Our aim is to quantify the extent of potential type I error rate inflation when the outcomes of a secondary endpoint provide partial information about the treatment assignment. We first wish to quantify this inflation for a “random allocation” strategy, where exactly n 1/2 patients received the experimental treatment, with each of the possible sequences of equally likely. After n 1 responses have been observed, a blinded interim analysis is performed, and a second stage sample size n 2 is chosen. We further assume random allocation in the second stage such that exactly n 2/2 patients receive the experimental treatment. The maximum conditional type I error rate given the first stage data from both endpoints is given by (1). Exact evaluation of (1) is difficult for all, but the smallest n 1 because the number of possible assignments (2) grows exponentially. We approach this problem in two ways. Firstly, we use an MCMC algorithm to simulate from the conditional distribution of given the blinded interim data. Secondly, we derive asymptotic results for a “simple randomization” strategy – that is, assuming each patient is allocated to the experimental treatment independently with probability 1/2 – and use a combination of heuristic arguments and simulation results to show that the same asymptotic results are applicable to random allocation.

Computational approach

The type I error rate conditional on the unblinded data is easy to compute: where . It is therefore straightforward to find provided that we can integrate over the conditional distribution of given the blinded data. Although this distribution is over a large space of possible permutations, it can be sampled from using standard MCMC techniques 19. To maximize this conditional type I error rate, we select the N that maximizes (4). R code is provided in the Supporting Information.

Asymptotic considerations and an upper bound for the type I error rate

While the aforementioned computational approach can tell us the maximum conditional type I error rate given a specific blinded data set , it cannot tell us the overall properties of the sample size reassessment procedure without considerable computational effort. Therefore, we study the asymptotic conditional distribution of Z 1. We first derive the conditional distribution of Z 1 under simple randomization (instead of random allocation with fixed per group sample sizes) and then, based on heuristic arguments and supported by simulation, we argue that the same asymptotic distribution applies also for random allocation. If each patient is allocated to the experimental treatment independently with probability 1/2 then by Bayes' theorem for j = 1,…,n 1, where and denote the density functions of the two dimensional normal distribution of (X ,Y ) under experimental treatment and control, respectively. By the central limit theorem for the sum of independent but non‐identically distributed random variables (e.g., Theorem 2.7.1 in 20), is asymptotically normal with mean and variance . We argue that this approximation is valid also under random allocation as, for n 1 large enough, the information provided by the known allocation ratio becomes negligible. In particular, To add support to our claim, we simulated multiple data sets under various choices of n 1,ν 1 and ρ, and compared the normal approximation with the output of the MCMC algorithm. An example is shown in Figure 1, where the normal curve agrees well. As ν 1 and ρ increase, it becomes easier to identify the likely treatment assignment, making the conditional distribution of Z 1 more discrete and the speed of convergence to a normal distribution slower. This can be seen in Figure 9.1–9.3 of the supporting information. For a sample size of n 1=144, the approximation appears to be adequate provided that ν 1≤2 and ρ≤0.8.

Figure 1

Comparing the asymptotic results with MCMC output for an example data set.

Comparing the asymptotic results with MCMC output for an example data set. Equation (5) is also useful to illustrate the impact of the secondary endpoint effect size ν 1−ν 0 on the potential to unblind the data: If ν 0=ν 1, then and the secondary endpoint gives no information on the treatment allocation. If, in contrast, |ν 1−ν 0| increases then q (X,Y) converges in distribution either to 0 (for observations in the control) or to 1 (for observations from the experimental treatment group) even if the correlation ρ is zero: Indeed, for ρ = 0 and if Y is drawn, for example, from the control group then for all ε > 0, we have P(|Y − ν 0|>c)≤ε for c large enough. However, for y, such that |y − ν 0|≤c, we have as |ν 1−ν 0|→∞. To maximize the overall conditional error rate, note that for any given blinded first‐stage data set the maximum conditional type I error rate is Here, we approximated the conditional distribution of Z 1 by a N(m 1,V 1) distribution. Assume there are minimum and maximum sample sizes for the second stage sample size such that . Then, the value of n 2 maximizing (6) is (Appendix A) if V 1≤1, and if V 1>1, where . Figure 2 shows the maximum type I error rate as function of the secondary endpoint effect size for different correlations between the primary and the secondary endpoint ρ∈{0,0.5,0.8,0.9,1}. The worst case conditional error rate was determined by simulation (200,000 simulation runs if not indicated otherwise) setting σ = 1, a nominal one‐sided significance level of 2.5%, and n 1=144 (which is half the total sample size required for a z‐test with power 80% to detect an absolute treatment effect of 1/3 in the primary endpoint). We consider effect sizes in the secondary endpoint ranging from 0 to 2. On first sight, the latter may appear large for trials with the chosen sample size; however, effects in secondary or safety endpoints (such as, for example, laboratory parameters) are typically not relevant for the power calculation and may substantially differ from the treatment effect in the primary endpoint the study is powered for.

Figure 2

Maximum type I error rate as a function of the secondary endpoint effect size with and without restrictions for the second stage sample size. Here, the first stage sample size n 1=144 and σ = 1. (Black) solid lines denote unrestricted results, and (red) dashed lines results for restricted case with and . ρ∈{0,0.5,0.8,0.9,1} and the larger the ρ the thicker the line. In each simulation run, primary and secondary endpoint data were simulated from a bivariate normal distribution, and the maximum conditional error rate was computed based on the approximation (6) with the second stage sample size as defined by (7) and (8). The final maximum type I error rate was then calculated based on the total Z‐test statistics Z for the unrestricted case with and and for the restricted case with and Figure 2(a). For both cases, the maximum type I error rate increases with the correlation ρ between the primary and the secondary endpoint. If this correlation is ρ = 1, and providing that a secondary endpoint effect is present, the maximum type I error rate under unrestricted case is α =0.062, which equals the maximum type I error rate inflation for an unblinded analysis reported by 2. Careful examination of Figure 2(a) reveals that the type I error rate is already inflated when the secondary endpoint effect size is zero. This is an artifact of assuming the variance σ 2 to be known. In this case, m 1=0 and V 1 is proportional to . Rules (7) and (8) reduce to choosing n 2 as small as possible when V 1>1, that is, when there is excess variation in the blinded data, and as large as possible when V 1<1. Intuitively, this is because a larger variance increases the chance of obtaining a significant result. In an attempt to remove this artifact, we re‐ran the simulations, but this time performing a t‐test at the final analysis Figure 2(b). In this case, the procedure becomes slightly conservative at a secondary endpoint effect size of zero. Again, this makes sense for a reassessment procedure that tends to produce a high variance estimate (since this term appears in the denominator of the t statistic).

Block randomization

Often, randomization is performed in blocks to guarantee that the treatment allocation frequencies in the earlier and later phases of a trial are balanced (e.g., Miller et al. (2009) 21). Consider a trial with block randomization with blocks of length τ, where τ is even and the treatment allocation is balanced (P(G =0) = P(G =1) = 1/2). In this section, we investigate the extent to which the additional information on the treatment allocation provided by the blocking allows one to introduce additional bias by sample size reassessment. For example, for a block size of τ = 2, for each block, there are only two possible allocation sequences, AB and BA. Both have probability 1/2. Obviously, the conditional probability, given the blinded data, that the first subject has been assigned to group A is equal to the conditional probability of the allocation sequence AB, conditional on the data of the primary and secondary endpoints of both subjects in the block. As we now use data from two patients to estimate probability of the allocation sequence AB (which is the allocation probability of the first subject) and there are only two possible sequences, we obtain a more informative estimate than in the random allocation scenario, where the allocation probability of each patient was estimated based on its own data only. However, the additional information on allocation probabilities provided by the consideration of the allocation sequences decreases with the block size. For a block size of four, for example, there are possible allocation sequences: A A B B,A B A B,A B B A,B A B A,B B A A,B A A B. Each has (unconditional) probability 1/6. To compute the conditional probability that the first patient is in group A, given the blinded data of the four patients in the block, we need to sum the conditional allocation probabilities of the first three allocation sequences. While for block size two, we used data from two patients to estimate the probabilities of two possible allocation sequences; for a block size of four, we used the data of four patients to estimate the probabilities of six possible allocations. In general, for block length τ, there are possible allocation sequences, each with unconditional probability 1/K, and we need to estimate K allocation probabilities based on the blinded data of τ patients. Because K >> τ for larger τ, it is intuitively clear that for larger block length the additional information provided by blocking decreases (see also 21). To compute the worst case sample size reassessment rule in case of blocked randomization, we need to introduce some notation. Let denote the set of indices where a new block starts. For let , denote the observations in the block starting with the i patient. Let denote the indicator vectors of the K possible treatment allocations for block , where and for all . Here, denotes that in the k treatment allocation for i block the j patient in the block was allocated to group A (control), and denotes that this patient was allocated to group B (treatment). Under block randomization, each allocation is equally likely, such that for and and the joint density for the observations b in block i is given by where , and denotes a bivariate normal density with mean vector , variances σ 2, and correlation ρ. Then, the conditional probability of each treatment allocation, given the data of block b , is given by To derive the sample size reassessment rule that maximizes the type I error rate, we compute the conditional expectation and conditional variance of the first stage test statistics Z 1, conditional on the blinded first stage observations where , and is a realization of m at k treatment allocation of i block at (X ,Y ) = (x ,y ). As in the random allocation case, the conditional distribution of Z 2 given the blinded first stage observations is standard normal, and we approximate the conditional distribution of Z 1 by a normal distribution with mean and variance . As in the unblocked case, we can express the overall test statistic Z as a weighted sum of the stage wise test statistics such that the conditional error rate is given by If there are restrictions for the second stage sample size that is for some and , then (Appendix A) the value of n 2 maximizing (9) can be calculated as in the unblocked case by (7) or (8) with m 1 and V 1 replaced by and , respectively. Figure 3 shows the maximum type I error rate of the trial with block randomization, with block sizes {2,4,6} (and per group sample size 72) for ρ = 0 and unrestricted second stage sample size (i.e., n 2∈[0,+∞)). Results for other correlations for both unrestricted and restricted second stage sample size are given in the Supporting Information Figure 9.4. As expected, using the additional information on the blocking of observations increases the maximum type I error rate. The smaller the block size the better the data can be unblinded, and the larger is the maximal type I error rate.

Figure 3

Maximum type I error rate without restrictions for the second stage sample size and for blocked randomization with block sizes 2, 4, 6 and the unblocked design (n 1=144,ρ = 0,σ = 1, and 2.5·105 (2·105) simulation runs for block size 2 (4, 6, unblocked design)). To implement the aforementioned worst case sample size adaptation rule, one must know the block size. However, also if the block sizes are not known, the type I error rate may be inflated. Consider a clinical trial where block randomization is used, but the worst case sample size reassessment rule for random allocation (7), (8) is applied (which does not require knowledge of the block sizes). To estimate the type I error rate for such a setting, simulation studies for different block sizes were performed as in the preceding text. In all considered scenarios, the simulated maximum type I error rate is very close to the maximum error rate observed in the setting of Section 3, where the same sample size reassessment rule for random allocation (7), (8) is applied, but random allocation is used to allocate patients (Figure 9.7–9.10 in the Supporting Information).

A clinical trial example

As an illustrative example, consider a Phase III clinical trial to asses efficacy and safety of Fingolimod in patients with relapsing‐remitting multiple sclerosis along the lines of the FREEDOMS trial 22, 23. While in the original trial, 1,272 patients were randomized to receive oral Fingolimod doses of 0.5 or 1.25 mg or placebo daily; for simplicity, we consider a trial with two parallel groups, comparing only the higher dose with placebo with N = 800 patients in total, randomly allocated to groups (such that the per‐group sample size is similar to the original trial). The annualized aggregate relapse rate (ARR) during months 0 to end of study was set as a primary endpoint and is defined as the number of confirmed relapses in a year. Among the additionally measured parameters is the mean lymphocyte count. It is known from earlier studies that Fingolimod lowers the lymphocyte count compared wtih placebo. Based on the results in 24, we assume a mean ν 0=1.8 [×109 cells/L] for the placebo group and ν 1=0.55[×109 cells/L] for the 1.25 mg Fingolimod group with a common standard deviation of σ =σ =0.31[×109 cells/L] for the mean lymphocyte counts at day 7 (values approximated based on Figure 6 in 24). Furthermore, it is known that the lymphocyte counts are causally related to the primary endpoint through the mechanism of action of Fingolimod. Lee JY et al. (2013)25 investigated the relation between the predicted lymphocyte count to ARR. Because no correlation coefficient is given in 25, and we do not have access to the raw data; we computed the maximum inflation of the type I error rate and the correlation between the blinded and unblinded effect size estimates for a grid of ρ in [0,0.9]. Assume that in the Phase III trial an interim analysis is performed after n 1=400 subjects have been randomized. By (6), the maximum type I error rate for ρ∈[0,0.9] ranges from α =0.054 to α =0.059 and if we restrict the second stage sample size such that the maximum type I error rate ranges from α =0.035 to α =0.036 (see Supporting Information Figure 9.6 (a)). As another example, assume that instead of the lymphocyte counts the mean total white blood cell counts (WBC) are used. The mean WBC count for the placebo group at 24 months is ν 0=6.5 × 109 cells/L with a standard deviation of 1.8, the mean for Fingolimod 1.25 mg group ν 1=3.8 × 109 cells/L with a standard deviation of 1.3 (estimated from the Supporting Information Figure 1 C in 26). As we are not aware of published data on the correlation of WBC and ARR, we computed the maximum inflation of the type I error rate and the correlation between the blinded and unblinded effect size estimates for a grid of ρ in [0,0.9]. Then, pooling the group wise standard deviations to σ = 1.57, the upper bound for the type I error rate for ρ∈[0,0.9] ranges from α =0.041 to α =0.054 for unrestricted second stage sample size and from α =0.031 to α =0.035 if the second stage sample size is restricted to the interval [200,1600] (see Supporting Information Figure 9.6 (b)).

Discussion

In this work, we demonstrated that even blinded sample size reassessment may lead to an inflation of the type I error rate if there are secondary endpoints for which the alternative holds. This implies that unscheduled sample size reassessment, even in a blinded setting, may damage the integrity of the trial. The numerical results give an upper bound for the inflation of the type I error that may occur due to blinded sample size reassessment in a setting where the distribution of a secondary endpoint is known to the experimenter, for example from historical data. While this is a simplifying assumption, the impact of a treatment on surrogate endpoints is often known from Phase II trials before a Phase III trial is started. However, the approach can be extended to settings where no prior information on the distribution of the secondary endpoint is available. In this case, the distribution of the secondary endpoint can be estimated from the blinded data based on a mixture model with an expectation–maximization (EM) algorithm as in 27 or 28. It has been shown that such estimators, when applied to the data of the primary endpoint, are only reliable for very large effect or sample sizes 29 and perform poorly for effect sizes usually occurring in clinical trials. However, while large treatment effects in the primary endpoint do occur rarely, this does not necessarily apply to effect sizes for secondary or safety endpoints (see, for example, the clinical trial example in Section 5), which are relevant for the setting considered in this manuscript. Overall, depending on the effect size in the secondary endpoint, the type I error rate resulting from sample size reassessment based on expectation–maximization algorithms will still be affected, albeit on a lower scale. In the computation of the worst case sample size reassessment rule, we used only the information from a single secondary endpoint to estimate the treatment allocation. Instead, one could use the data from several endpoints: to derive the resulting maximum type I error rate, one needs to replace the bivariate normal densities in (5) by the respective multivariate densities. To extend the setting of a single interim analysis to multiple blinded interim analyses, one can derive worst case adaptation rules and the resulting maximum type I error rate with a backwards induction approach. We investigated the impact of different randomization procedures on the maximum type I error rate and found that block randomization, especially with small block sizes, increases the type I error rate inflation, if the information on the block size is used in the sample size adjustment. If the latter information is not used, blocking leads to essentially the same inflation as under random allocation. These findings support current recommendations against too small block sizes and inclusion of information on block sizes in study protocols 30. Wang et al. 31 consider a related problem and derive the maximum type I error rate for sample size reassessment rules based on unblinded interim effect size estimates of a secondary endpoint that is correlated with the primary endpoint, but assuming that the primary endpoint is not observed in the interim analysis. The maximum type I error rate in this setting depends only on the correlation ρ of the primary and the secondary endpoints, and there is no inflation of the type I error rate if ρ = 0. In contrast, in the blinded setting considered in this paper, even if the correlation between the primary and the secondary endpoint is zero, the type I error rate may be inflated. This holds because we assume that the primary endpoint is observed and the blinding is partially lost due to a treatment effect in the secondary endpoint that gives information on the treatment allocation. The potential inflation of the type I rate is related to the fact that this partial loss of blinding allows one to estimate the unblinded first stage effect size estimate in the primary endpoint : if the unknown G are replaced by q as defined in (5), a blinded estimate is given by . The correlation r between and (not to be confused with the correlation ρ between primary and secondary endpoint) can be interpreted as a measure of unblinding and increases with the effect size in the secondary endpoint and ρ. In the clinical trial of Section 5, for example, r ranges from 0.97 up to nearly 1 in the first and from 0.68 to 0.96 in the second example for ρ∈[0,0.9] (see Figure 9.5 in the Supporting Information Figures and Section 8.1 in the Supporting Information for computational details). Our findings do not contradict the well established use of blinded sample size reassessment based on aggregate event rates or variance estimates computed from blinded primary endpoint interim data. However, they demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre‐planned, and fully algorithmic sample size reassessment rules (as recommended for data monitoring committee charters, see for example 32) for which type I error control has been demonstrated. The type I error rate control does not extend to general sample size adjustments based on blinded data. Therefore, including only a non‐binding option for blinded sample size reassessment in clinical trial protocols is not sufficient to guarantee type I error rate control. In particular, we quantify the maximum type I error rate inflation when a worst case adaptation rule is applied that also uses information from a secondary endpoint. Our work also implies that post hoc adjustments of the sample size may lead to type I error rate inflations, even if justified by post hoc scientific arguments (as required in the guideline quoted in the Introduction). Consider, for example, a scenario where blinded outcome data is available and adaptations following the rule in Section 3.3 are applied whenever a post hoc selected sample size reassessment rule (or scientific arguments external to the trial) can be found that justifies that choice. Otherwise, the pre‐specified sample size is used. Because the conditional error rate is increased in all instances where the sample size is adapted but is unchanged otherwise, the overall type I error rate will be inflated by such a strategy. Furthermore, note that even aggregate statistics (as referred to in the quoted guidelines) may contain information on the unblinded treatment effect estimate and therefore may lead to type I error rate inflation. Examples are the correlation coefficient of the primary endpoint and a secondary or safety endpoint (if there is a treatment effect in the latter), or per group means of subgroups whose definition is based on such secondary or safety endpoints. While the assumption that a worst case sample size rule is applied in an actual clinical trial may not be realistic, it is a means to derive an upper bound for the type I error rate in settings where no binding sample size reassessment procedure is pre‐specified, or post hoc adaptations are performed, and secondary endpoint data has been available. While the actual type I error may be substantially lower than this upper bound, it can not be computed because it depends not only on the realized sample sizes but also on the sample sizes that would have been applied had other interim data been observed. In settings where no sample size adjustment algorithm has been pre‐specified, alternatives to fixed sample hypothesis tests are tests based on combination functions or the conditional error rate principle 2, 4, 33, 34 that control the type I error rate even without pre‐specified adaptation rules. The conditional error rate based procedures even control the type I error rate if no adaptations were pre‐planned but are introduced during the conduct of the study. Supporting info item Click here for additional data file.

22 in total

1. Simple procedures for blinded sample size adjustment that do not affect the type I error rate.

Authors: Meinhard Kieser; Tim Friede
Journal: Stat Med Date: 2003-12-15 Impact factor: 2.373

2. Two-stage sample size re-estimation based on a nuisance parameter: a review.

Authors: Michael A Proschan
Journal: J Biopharm Stat Date: 2005 Impact factor: 1.051

3. EM Estimation for Finite Mixture Models with Known Mixture Component Size.

Authors: Chen Teel; Taeyoung Park; Allan R Sampson
Journal: Commun Stat Simul Comput Date: 2015-07 Impact factor: 1.118

4. Adaptive designs for confirmatory clinical trials.

Authors: Frank Bretz; Franz Koenig; Werner Brannath; Ekkehard Glimm; Martin Posch
Journal: Stat Med Date: 2009-04-15 Impact factor: 2.373

5. Blinded sample size reestimation with count data: methods and applications in multiple sclerosis.

Authors: Tim Friede; Heinz Schmidli
Journal: Stat Med Date: 2010-05-10 Impact factor: 2.373

6. Evaluation of experiments with adaptive interim analyses.

Authors: P Bauer; K Köhne
Journal: Biometrics Date: 1994-12 Impact factor: 2.571

7. Pharmacodynamic effects of steady-state fingolimod on antibody response in healthy volunteers: a 4-week, randomized, placebo-controlled, parallel-group, multiple-dose study.

Authors: Craig Boulton; Karin Meiser; Olivier J David; Robert Schmouder
Journal: J Clin Pharmacol Date: 2011-12-15 Impact factor: 3.126

8. Blinded assessment of treatment effects utilizing information about the randomization block length.

Authors: Frank Miller; Tim Friede; Meinhard Kieser
Journal: Stat Med Date: 2009-05-30 Impact factor: 2.373

9. Connections between permutation and t-tests: relevance to adaptive methods.

Authors: Michael Proschan; Ekkehard Glimm; Martin Posch
Journal: Stat Med Date: 2014-08-25 Impact factor: 2.373

10. Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels.

Authors: Magdalena Żebrowska; Martin Posch; Dominic Magirr
Journal: Stat Med Date: 2015-12-23 Impact factor: 2.373

4 in total

1. Maximum type I error rate inflation from sample size reassessment when investigators are blind to treatment labels.

Authors: Magdalena Żebrowska; Martin Posch; Dominic Magirr
Journal: Stat Med Date: 2015-12-23 Impact factor: 2.373

2. Adaptive designs in clinical trials: why use them, and how to run and report them.

Authors: Philip Pallmann; Alun W Bedding; Babak Choodari-Oskooei; Munyaradzi Dimairo; Laura Flight; Lisa V Hampson; Jane Holmes; Adrian P Mander; Lang'o Odondi; Matthew R Sydes; Sofía S Villar; James M S Wason; Christopher J Weir; Graham M Wheeler; Christina Yap; Thomas Jaki
Journal: BMC Med Date: 2018-02-28 Impact factor: 8.775

3. Adaptive designs in clinical trials: from scientific advice to marketing authorisation to the European Medicine Agency.

Authors: Olivier Collignon; Franz Koenig; Armin Koch; Robert James Hemmings; Frank Pétavy; Agnès Saint-Raymond; Marisa Papaluca-Amati; Martin Posch
Journal: Trials Date: 2018-11-20 Impact factor: 2.279

4. Estimation after blinded sample size reassessment.

Authors: Martin Posch; Florian Klinglmueller; Franz König; Frank Miller
Journal: Stat Methods Med Res Date: 2016-10-02 Impact factor: 3.021

4 in total