
Group sequential crossover trial designs with strong control of the familywise error rate.

Michael J Grayling, James M S Wason, Adrian P Mander.

Abstract

Crossover designs are an extremely useful tool to investigators, and group sequential methods have proven highly proficient at improving the efficiency of parallel group trials. Yet, group sequential methods and crossover designs have rarely been paired together. One possible explanation for this could be the absence of a formal proof of how to strongly control the familywise error rate in the case when multiple comparisons will be made. Here, we provide this proof, valid for any number of initial experimental treatments and any number of stages, when results are analyzed using a linear mixed model. We then establish formulae for the expected sample size and expected number of observations of such a trial, given any choice of stopping boundaries. Finally, utilizing the four-treatment, four-period TOMADO trial as an example, we demonstrate that group sequential methods in this setting could have reduced the trial's expected number of observations under the global null hypothesis by over 33%.


Keywords:  62P10; 62K99; 62L05; Clinical trial; crossover; familywise error rate; group sequential; linear mixed model

Year:  2018        PMID: 30393467      PMCID: PMC6199128          DOI: 10.1080/07474946.2018.1466528

Source DB:  PubMed          Journal:  Seq Anal        ISSN: 0747-4946            Impact factor:   0.927


Introduction

The efficiency of crossover trials often makes them the best design for a clinical trial. Administering multiple treatments to each patient reduces the standard error of the estimated treatment effects compared to a parallel group design with an equal number of patients. Therefore, though restrictions to their use exist, such as the requirement for patients to begin each new treatment period in a state comparable to how they began those already completed, crossover trials are the design of choice in many settings (Jones and Kenward, 2014; Senn, 2002); they accounted for 22% of all trials published in December 2000, for example (Mills et al., 2009). In the parallel group setting, group sequential methods are frequently utilized to improve a clinical trial's efficiency (Jennison and Turnbull, 2000). These designs incorporate interim analyses that allow for early rejection of null hypotheses (efficacy stopping) or early stopping for lack of benefit (futility stopping). In this way, the expected sample size required can be reduced relative to the more classical single-stage approach. Moreover, multi-arm multi-stage designs, which allow multiple experimental treatments to share a control group, can increase efficiency even further (Parmar et al., 2014). However, group sequential methods are rarely used in crossover trial settings, in particular ones with multiple experimental treatments. Hauck et al. (1997) investigated the performance of group sequential trials for average bioequivalence employing an AB/BA crossover design, and Jennison and Turnbull (2000) provided one possible analysis method for a group sequential AB/BA crossover trial with a normally distributed endpoint. To the best of our knowledge, no study has explored group sequential theory for crossover trials in which more than one experimental treatment is compared to a shared control.
Thus, one possible explanation for the lack of group sequential crossover trials may be that no formal proof is yet available of how to strongly control the familywise error rate of such a trial with multiple experimental treatments; such a proof is usually required for regulatory approval (Wason et al., 2014). In comparison to the proof for a parallel multi-arm multi-stage design (Magirr et al., 2012), proving strong control of the familywise error rate is complicated here by the covariance structure implied by mixed model analysis. As has been remarked, multiple testing corrections for mixed models are presently available only for certain specific circumstances (Bender and Lange, 2001). Extension to this setting is particularly significant, though, given the noted advantages of comparing multiple experimental treatments to a shared control, in terms of both trial management and sample size (Parmar et al., 2014). Given such a proof, the potential exists for the efficiency of crossover trial designs to be improved. In this work, we begin by providing this proof for a linear mixed model with period and treatment as fixed effects and individuals as random effects. Following this, using the four-treatment, four-period TOMADO trial (Quinnell et al., 2014) as an example, we explore and discuss the efficiency gains that group sequential designs could bring in a crossover setting.

Methods

Notation, hypotheses, and analysis

The trial is assumed to have D treatments initially, indexed d = 0, ..., D − 1. Treatments d = 1, ..., D − 1 are experimental, to be compared to the control d = 0. A maximum of L stages are planned for the trial. At each stage, patients are allocated to each of a set of treatment sequences, which specify an order in which a patient receives treatments. The sequences used at each stage are determined by the number of treatments remaining in the trial at that stage. Without loss of generality, we will assume that if a treatment or treatments are dropped, treatment D − 1 is dropped first, then D − 2, and so on, because treatments can always be relabeled at each interim analysis. Then, we denote by S_r the set of sequences for patient treatment allocation when r treatments remain in the trial, with each s ∈ S_r written in the form s = (s_1, ..., s_r), assuming that it is exactly treatments 0, ..., r − 1 that remain. We further constrain each S_r to contain only complete block sequences that are balanced for period. Specifically, complete block allocation requires all sequences to contain each treatment remaining in the trial exactly once, and period balance requires an equal number of patients to receive each treatment remaining in the trial in each period. These constraints allow the use of the popular Latin and Williams squares (Jones and Kenward, 2014). A fixed group size n is used for each stage of the trial and is chosen such that at every stage each sequence is used an equal number of times. Thus, n must be divisible by the lowest common multiple of |S_2|, ..., |S_D|. Designing the trial in this manner ensures that each treatment is considered equally.
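The complete block and period balance constraints can be illustrated with a short sketch. The construction below is the standard Williams square recipe, not code from the paper, and the helper name is our own; for an even number of treatments it yields a single Latin square that is additionally balanced for first-order carryover:

```python
import numpy as np

def williams_square(t):
    """Build a Williams square for t treatments: rows are sequences,
    columns are periods. The first row interleaves 0, 1, t-1, 2, t-2, ...;
    the remaining rows add 1 (mod t) cyclically."""
    first, a, b = [0], 1, t - 1
    while len(first) < t:
        first.append(a)
        a += 1
        if len(first) < t:
            first.append(b)
            b -= 1
    return np.array([[(s + i) % t for s in first] for i in range(t)])

square = williams_square(4)

# Complete block: every sequence (row) contains each treatment exactly once.
assert all(sorted(row) == [0, 1, 2, 3] for row in square.tolist())
# Period balance: with an equal number of patients per sequence, every
# treatment appears equally often in each period (column).
assert all(sorted(col) == [0, 1, 2, 3] for col in square.T.tolist())
```

For t = 4 this gives the four sequences of a 4 × 4 Williams square; allocating an equal number of patients to each row then guarantees the period balance required above.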
Outcome data are assumed to be normally distributed, and a linear mixed model is used for analysis, given by

y = Xβ + Zb + ε,

where y is the vector of responses, containing the y_ijkl: the response for individual i, in period j, on sequence k, in stage l. β is the vector of fixed effects, of length 2D − 1, consisting of: μ_0, the mean response on treatment 0 in period 1, an intercept term; π_j, the fixed period effect for period j, with the identifiability constraint π_1 = 0. Note that the period is reset to 1 for each new stage of the trial. That is, the first period of stage 2 is treated as period 1 rather than period D + 1, and the same convention is used in later stages. Thus, we have exactly D − 1 non-zero period effects given our restriction to complete block sequences. τ_d is the fixed direct treatment effect for an individual in period j, on sequence k, in stage l, with the identifiability constraint τ_0 = 0. X is the matrix linking the fixed effects to the vector of responses; b is the vector of random effects, consisting of the s_ikl: the random effect for individual i, on sequence k, in stage l; Z is the matrix linking the random effects to the vector of responses; and ε is the vector of residuals, consisting of the ε_ijkl: the residual for individual i, in period j, on sequence k, in stage l. Additionally, denoting by σ_b² and σ_e² the between- and within-subject variances, respectively, we take

s_ikl ~ N(0, σ_b²),   cov(ε_ijkl, ε_i'j'k'l') = σ_e² δ_ii' δ_jj' δ_kk' δ_ll',

where δ is the Kronecker delta function. Incorporation of fixed effects for period and treatment only, together with the covariance structure above, are the conventional choices for a crossover trial (Jones and Kenward, 2014). We test the D − 1 hypotheses H_0d : τ_d ≤ 0. Because we are interested in testing the efficacy of the experimental treatments in comparison to the control, we consider the case of one-sided alternative hypotheses H_1d : τ_d > 0 for d = 1, ..., D − 1. At each interim analysis, the above model is used to compute an estimate β̂ for β through the standard maximum likelihood estimator of a linear mixed model,

β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y,   where V = cov(y)

(Fitzmaurice et al., 2011).
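To make the estimator concrete, here is a minimal numerical sketch, our own illustration rather than the paper's code, of the generalized least squares step for a single-stage AB/BA design with two patients, using the compound-symmetry covariance implied by the random subject effects; the variance values are assumed for illustration:

```python
import numpy as np

sigma_b2, sigma_e2 = 1.0, 0.5  # assumed between-/within-subject variances

# Fixed effects beta = (mu0, pi_2, tau_1): intercept, period-2 effect,
# treatment-1 effect. Rows of X index (patient, period) for sequences AB, BA.
X = np.array([[1, 0, 0],   # patient 1 (AB), period 1, treatment 0
              [1, 1, 1],   # patient 1 (AB), period 2, treatment 1
              [1, 0, 1],   # patient 2 (BA), period 1, treatment 1
              [1, 1, 0]])  # patient 2 (BA), period 2, treatment 0

# Compound-symmetry covariance, block diagonal over patients.
block = sigma_b2 * np.ones((2, 2)) + sigma_e2 * np.eye(2)
V = np.kron(np.eye(2), block)

y = np.array([10.0, 12.0, 11.0, 10.5])  # illustrative responses

Vi = np.linalg.inv(V)
cov_beta = np.linalg.inv(X.T @ Vi @ X)   # covariance of the estimates
beta_hat = cov_beta @ X.T @ Vi @ y       # GLS / ML estimate of beta

tau_hat = beta_hat[2]                    # estimated treatment effect
info = 1.0 / cov_beta[2, 2]              # information for treatment 1
```

The same two lines, with V built block diagonally over all patients and stages, give the estimator used at every interim analysis.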
From this we acquire τ̂_l = (τ̂_1l, ..., τ̂_(D−1)l), which consists of the maximum likelihood estimates of the treatment effects at analysis l. Then, each τ̂_dl is standardized to give D − 1 test statistics Z_dl = τ̂_dl √I_dl, with I_dl = {var(τ̂_dl)}⁻¹ the information level for treatment d at interim analysis l. Since τ̂_dl is estimated via a normal linear model, we know that Z_dl ~ N(τ_d √I_dl, 1) (Jennison and Turnbull, 2000). Given fixed futility boundaries, f = (f_1, ..., f_L), and efficacy bounds, e = (e_1, ..., e_L), the following stopping rules are used at each analysis l = 1, ..., L, for each experimental treatment d still present: if Z_dl ≤ f_l, treatment d is dropped without rejecting H_0d; if f_l < Z_dl < e_l, the trial is continued with treatment d still present; and if Z_dl ≥ e_l, treatment d is dropped and H_0d rejected. The control treatment, d = 0, remains present at every undertaken stage, and we only proceed to an additional stage if there is at least one experimental treatment remaining in the trial. It is convenient to take f_l < e_l for l < L, as well as f_L = e_L, in order to ensure that the trial conforms to the desired maximum number of stages and so that a conclusion is made for each H_0d. Note that rejection of one treatment's null hypothesis does not end the trial. Furthermore, with this formulation, once a treatment is dropped from the trial, its standardized treatment effect is not tested in any future analyses. In what follows, we will make use of the vectors ω = (ω_1, ..., ω_(D−1)) and ψ = (ψ_1, ..., ψ_(D−1)). Here, ω_d is the analysis at which experimental treatment d was dropped from the trial. Moreover, ψ_d = 1 if experimental treatment d was dropped for efficacy and 0 otherwise. Prior to a trial's commencement, ω and ψ are unknown random variables. However, the probability that the trial progresses according to some particular ω and ψ, given a vector of true treatment effects θ = (θ_1, ..., θ_(D−1)), can be computed using multivariate normal integration.
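The stopping rules can be sketched as a small routine; this is an illustration with assumed boundary and statistic values, not the paper's code:

```python
def interim_update(Z, f_l, e_l, active, rejected):
    """Apply one analysis' stopping rules. Z maps each active experimental
    treatment index to its standardized statistic; f_l and e_l are this
    stage's futility and efficacy bounds; active is the set of experimental
    treatments still present; rejected collects treatments whose null
    hypothesis has been rejected."""
    for d in sorted(active):
        if Z[d] >= e_l:      # efficacy: drop d and reject H_0d
            active.remove(d)
            rejected.add(d)
        elif Z[d] <= f_l:    # futility: drop d without rejecting H_0d
            active.remove(d)
        # otherwise d continues to the next stage
    return active, rejected

# Example: three experimental treatments at the first of several analyses.
active, rejected = interim_update({1: 2.9, 2: 0.4, 3: -1.2},
                                  f_l=0.0, e_l=2.5,
                                  active={1, 2, 3}, rejected=set())
# Treatment 1 stops for efficacy, treatment 3 for futility, 2 continues.
```

The trial-level rule then proceeds to another stage only while `active` is non-empty, and setting f_L = e_L guarantees `active` is empty after analysis L.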
More specifically, given this particular pair, the covariance between, and the information levels of, the test statistics can be computed and the following integral evaluated (see Jennison and Turnbull [2000] or Wason [2015] for further details):

P(ω, ψ | θ) = ∫ φ{z; θ̃ ∘ I(ω)^(1/2), Λ(ω)} dz,   (2.1)

with the integral taken over the region l(l, ω_d, ψ_d) ≤ z_dl ≤ u(l, ω_d, ψ_d) for each treatment d and analysis l. Here, φ(z; m, Λ) is the probability density function of a multivariate normal distribution with mean m and covariance matrix Λ, evaluated at vector z; θ̃ is the vector formed by repeating θ L times; I(ω) = {I_1(ω), ..., I_L(ω)}, where I_l(ω) is the vector of information levels for the estimated treatment effects at interim analysis l, according to (conditional on) the particular ω being considered; ∘ denotes the Hadamard product of two vectors; the square root of the vector I(ω) is taken in an element-wise manner; and l and u are functions that tell us the lower and upper integration limits for the test statistic Z_dl given values for l, ω_d, and ψ_d. For example, for a treatment with (ω_d, ψ_d) = (2, 1) we have l(1, 2, 1) = f_1 and u(1, 2, 1) = e_1, while l(2, 2, 1) = e_2 and u(2, 2, 1) = ∞, and then l(l, 2, 1) = −∞ and u(l, 2, 1) = ∞ for l > 2. Λ(ω) is the covariance matrix between the standardized test statistics at and across each interim analysis according to ω. Thus, using cov(β̂_l), we have

cov(Z_dl₁, Z_d'l₂) = {I_dl₁ I_d'l₂}^(1/2) cov(τ̂_dl₁, τ̂_d'l₂).

However, cov(τ̂_l₁, τ̂_l₂) = cov(τ̂_l₂) for l₁ ≤ l₂ by the properties of normal linear models (Jennison and Turnbull, 2000), giving

Λ(ω) = diag{I(ω)^(1/2)} cov{τ̂(ω)} diag{I(ω)^(1/2)},

where diag(v) is the matrix formed by placing the elements of vector v along the leading diagonal. Note that equation (2.1), in conjunction with the expectations of our standardized test statistics and the observation that τ̂ is multivariate normal, can be restated simply as saying that our test statistics follow the canonical joint distribution of group sequential theory (Jennison and Turnbull, 2000).
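For a single experimental treatment and two analyses, the multivariate normal integration above can be sketched directly. In this sketch, which is our own illustration with assumed boundary and information values, (Z_1, Z_2) have unit variances and covariance √(I_1/I_2), and the probability of continuing at stage 1 then stopping for efficacy at stage 2 is a bivariate normal rectangle probability:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

I1, I2 = 5.0, 10.0          # assumed information levels at the two analyses
f1, e1, e2 = 0.0, 2.5, 2.0  # assumed futility/efficacy boundaries
theta = 0.0                 # true treatment effect (here, the null)

mean = np.array([theta * np.sqrt(I1), theta * np.sqrt(I2)])
rho = np.sqrt(I1 / I2)      # canonical joint distribution covariance
cov = np.array([[1.0, rho], [rho, 1.0]])

def rect(bx, by):
    """P(Z1 < bx, Z2 < by) for the bivariate normal above."""
    return multivariate_normal(mean=mean, cov=cov).cdf(np.array([bx, by]))

# P(f1 < Z1 < e1, Z2 >= e2): continue at stage 1, reject at stage 2.
p_cont_rej = (norm.cdf(e1 - mean[0]) - norm.cdf(f1 - mean[0])
              - rect(e1, e2) + rect(f1, e2))

# Complementary event within the continuation region: accept at stage 2.
p_cont_acc = rect(e1, e2) - rect(f1, e2)
```

The two probabilities necessarily sum to P(f_1 < Z_1 < e_1), which provides a quick consistency check on the integration.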

Familywise error rate control

It is a common requirement of clinical trial designs that the probability of one or more false rejections within the family of null hypotheses H_01, ..., H_0(D−1) is not greater than some α. This is known as strong control of the familywise error rate. In this section, we establish strong control for our considered trial design. To evaluate the familywise error rate of a design, for any θ, the integral above can be evaluated for all ω and ψ that would imply that a type-I error is made, and the results summed. In order to demonstrate how to strongly control the familywise error rate, though, it is essential to know the forms of Λ(ω) and I(ω) for each ω. However, by equation (2.1), Λ(ω) and I(ω) can be determined if cov(β̂_l) is known for l = 1, ..., L. Thus, consider this matrix for some ω and any l. Denote by L_r the number of stages of the trial, up to analysis l, in which r treatments were remaining. Because we do not continue the trial unless at least one experimental treatment remains, L_1 = 0 always. It will be convenient, however, to still include this value. Moreover, it is clear that the L_r are uniquely determined given ω. Now, cov(β̂_l)⁻¹ can always be decomposed as a sum over the determined L_r and the prespecified sequences s ∈ S_r (see Fitzmaurice et al. [2011] for details):

cov(β̂_l)⁻¹ = Σ_{r=1}^{D} L_r Σ_{s ∈ S_r} (n/|S_r|) X_sᵀ Σ_r⁻¹ X_s.

Here, X_s is the uniquely defined design matrix for a single patient allocated to sequence s, and Σ_r is the easily computed r × r covariance matrix of the responses for a single patient allocated r treatments in total. The factor n/|S_r| arises from the number of patients allocated to each sequence s by our choice of period balance. We now establish two key results about cov(β̂_l). Following this, we provide a proof detailing how to strongly control the familywise error rate.

Theorem 2.1. Consider an analysis to be performed after some number of stages l. Then: (1) the variances and covariances of the elements of τ̂_l depend upon the within-subject variance σ_e² but not upon the between-subject variance σ_b²; and (2) if r* is the largest integer such that L_r = 0 for r > r*, then the covariance of the estimates of the fixed effects for the remaining treatments is identical to what it would be for a trial with D = r* initial treatments. Moreover, the covariance between the estimates of the period effects and the estimates of the remaining treatment effects is also identical to what it would be for D = r*. For a proof, see Appendix C. Note that part (1) of the above theorem is the familiar result for complete block sequences that there is no dependence upon the between-patient variance (Jones and Kenward, 2014). □

Theorem 2.2. A group sequential crossover trial of the type considered, with D ≥ 2, testing the D − 1 hypotheses H_01, ..., H_0(D−1), attains the maximal value of its familywise error rate for θ = (0, ..., 0).

Theorem 2.1 implies that the elements of the covariance matrix that differ from the case where no treatments have been dropped are exactly those corresponding to standardized test statistics no longer of importance. Consequently, the values of Λ(ω) and I(ω) that differ from the case ω = (L, ..., L) are only ever those corresponding to limits of integration given by (−∞, ∞) in our computation of P(ω, ψ | θ). By the marginal distribution properties of the multivariate normal distribution, we therefore need only consider one matrix and one set of vectors: exactly those given by the case ω = (L, ..., L). Denote these by Λ and I, and set Z = (Z_11, ..., Z_(D−1)L). For more information on this, see Appendix A. Now, consider without loss of generality the probability that we reject H_01, and denote by Ω and Ψ the sets of all possible ω and ψ, respectively. By integrating over all possible values of ω_d and ψ_d for d ≠ 1, we have that the probability that we reject H_01 does not depend on the values of θ_2, ..., θ_(D−1); that is, on the other treatments tested:

P(reject H_01 | θ) = Σ_{(ω_1, ψ_1): ψ_1 = 1} P(ω_1, ψ_1 | θ_1),

where this probability is computed using Λ⁽¹⁾ and I⁽¹⁾, the restrictions of Λ and I to rows and columns corresponding to experimental treatment d = 1, respectively. This final form for P(reject H_01 | θ) is identical to what it would be in the case D = 2. Therefore, to ascertain the θ giving the maximal familywise error rate of a trial with D > 2, it suffices to consider which θ_1 maximizes the probability that H_01 is rejected in a trial with D = 2 initial treatments. For then, using this value for every θ_d will provide the maximum probability of rejecting at least one true H_0d; that is, the maximum familywise error rate.

To see this, consider the familywise error rate for such a θ. If one changes some individual element of this vector, this does not affect the probability that H_0d is rejected for the other treatments, and it can only decrease the probability that the altered treatment's null hypothesis is incorrectly rejected. Thus, overall, straying from this θ can only decrease the familywise error rate. Now consider all possible realizations of the test statistics of a trial with D = 2 and their associated values of ω_1. We have Z_1l ~ N(θ_1 √I_1l, 1), with Z_1l unobserved for l > ω_1 if the trial was stopped at stage ω_1. Now consider increasing the value of θ_1 by some amount, which shifts every test statistic upward. All instances where H_01 was rejected will still see the efficacy bound exceeded at that stage, or earlier, and so H_01 will still be rejected. Therefore, the probability of rejecting H_01 is at least as large as before. Thus, increasing the value of θ_1 causes a non-decreasing change in the value of the type-I error rate. Therefore, among values consistent with H_01, the probability of rejecting H_01 is maximized by θ_1 = 0, implying in turn that the maximal familywise error rate of a trial with D ≥ 2 is given by θ = (0, ..., 0). □
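The conclusion of this argument can be probed numerically. The sketch below is our own illustration, not part of the proof: it assumes an equal pairwise correlation of 0.5 between the treatment effect estimates (as induced by a shared control in a parallel multi-arm setting; the exact crossover correlation would follow from cov(β̂_l)), canonical independent increments across stages, and arbitrary illustrative boundaries. It estimates the familywise error rate under the global null and under a configuration with one effective treatment:

```python
import numpy as np

def fwer_estimate(theta, f, e, info, reps=10_000, seed=7):
    """Monte Carlo estimate of pr(reject at least one true H_0d).
    theta: true effects for the D - 1 experimental treatments; f, e:
    futility/efficacy boundaries; info: information levels per analysis."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    L, D1 = len(info), len(theta)
    null_set = {d for d in range(D1) if theta[d] <= 0.0}
    count = 0
    for _ in range(reps):
        u = rng.normal(size=L)        # noise shared via the control
        v = rng.normal(size=(L, D1))  # treatment-specific noise
        S = np.zeros(D1)
        active, false_rej, I_prev = set(range(D1)), False, 0.0
        for l in range(L):
            dI = info[l] - I_prev
            S += theta * dI + np.sqrt(dI / 2.0) * (u[l] + v[l])
            I_prev = info[l]
            Z = S / np.sqrt(info[l])
            for d in list(active):
                if Z[d] >= e[l]:
                    active.discard(d)
                    if d in null_set:
                        false_rej = True  # type-I error on a true null
                elif Z[d] <= f[l]:
                    active.discard(d)
            if not active:
                break
        count += false_rej
    return count / reps

bounds = dict(f=[0.0, 0.0, 2.0], e=[2.5, 2.3, 2.0], info=[4.0, 8.0, 12.0])
fwer_null = fwer_estimate([0.0, 0.0, 0.0], **bounds)
fwer_part = fwer_estimate([0.5, 0.0, 0.0], **bounds)
# With the same seed, the paths of the truly null treatments coincide in
# both runs, so fwer_null >= fwer_part, consistent with the maximality
# of the familywise error rate at the global null.
```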

Design characteristics

A trial will now be fully specified given values for D, L, σ_e², and n, as well as choices for the sequence sets S_r and the futility and efficacy boundaries f and e, respectively. Given these, Λ(ω) and I(ω) can be computed using the results above. Then, by Theorem 2.2, we can strongly control the familywise error rate to level α for this design by constraining the following sum of integrals:

Σ_{(ω, ψ) implying at least one rejection} P(ω, ψ | θ = 0) ≤ α.

Additionally, suppose that we wish to power this trial to reject a particular null hypothesis, without loss of generality H_01, at some clinically relevant difference δ. The type-II error rate β for H_11 is then given by

β = 1 − Σ_{(ω, ψ): ψ_1 = 1} P(ω, ψ | θ_1 = δ).

Moreover, denoting by N and O the total number of patients and observations required by the trial, respectively, we can compute the expected sample size, E(N | θ), or expected number of observations, E(O | θ), for any θ, according to

E(N | θ) = Σ_{ω ∈ Ω} Σ_{ψ ∈ Ψ} n(ω) P(ω, ψ | θ),   E(O | θ) = Σ_{ω ∈ Ω} Σ_{ψ ∈ Ψ} o(ω) P(ω, ψ | θ).

Here, n(ω) and o(ω) are functions that give the number of patients and observations, respectively, required by a trial that progresses according to ω. Specifically,

n(ω) = n Σ_{l=1}^{L} I(max_d ω_d ≥ l),   o(ω) = n Σ_{l=1}^{L} I(max_d ω_d ≥ l) {1 + Σ_{d=1}^{D−1} I(ω_d ≥ l)},

where I(A) = 1 if A holds and 0 otherwise.
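As a check on the expected sample size formula, one can simulate a D = 2 trial (a single experimental treatment) under the canonical joint distribution. This sketch, with assumed boundaries and information levels, is our own illustration rather than the paper's computation:

```python
import numpy as np

def simulate_expected_N(n, f, e, info, theta, reps=20_000, seed=3):
    """Monte Carlo E(N) for D = 2: simulate score statistics S_l with
    independent increments of variance I_l - I_{l-1}, standardize to
    Z_l = S_l / sqrt(I_l), and stop once Z_l <= f_l or Z_l >= e_l."""
    rng = np.random.default_rng(seed)
    L, total_stages = len(info), 0
    for _ in range(reps):
        S, I_prev = 0.0, 0.0
        for l in range(L):
            dI = info[l] - I_prev
            S += theta * dI + rng.normal(0.0, np.sqrt(dI))
            I_prev = info[l]
            Z = S / np.sqrt(info[l])
            if Z <= f[l] or Z >= e[l]:
                total_stages += l + 1
                break
        else:
            total_stages += L  # unreachable when f_L = e_L
    return n * total_stages / reps

# With f_L = e_L the final analysis always yields a conclusion, so the
# simulated E(N) always lies between n and n * L.
```

Setting all interior boundaries to ±∞ forces every simulated trial to run to stage L, recovering E(N) = nL exactly, which is a useful sanity check.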

Example: TOMADO

As an example of how to design a group sequential crossover trial with strong control of the familywise error rate, we will make use of the TOMADO crossover randomized controlled trial (Quinnell et al., 2014). This open-label trial compared three experimental treatments to a single control for the treatment of sleep apnea-hypopnea using a four-treatment, four-period crossover design. We consider the normally distributed secondary endpoint, the Epworth Sleepiness Scale, for which improvement corresponds to negative test statistics. Therefore, we take the decrease in score as the endpoint in order to retain the same hypothesis tests H_0d, d = 1, 2, 3, as before. The trial planned to recruit 90 patients, and, utilizing restricted error maximum likelihood estimation, the final analysis provided an estimate of the within-person variance. Taking this variance as the truth, the trial had a familywise error rate of 0.05 under the global null hypothesis, and power of 0.80 for H_11 at the clinically relevant difference δ. Many methods exist for determining boundaries for a one-sided group sequential trial with parallel treatment arms. Here, we consider analogues of the power family boundaries of Pampallona and Tsiatis (1994). For this, values for the desired type-I and type-II error rates, a clinically relevant difference δ, the maximum number of stages L, the within-person variance σ_e², and a shape parameter Δ must be specified. A two-dimensional grid search is then used to find the exact required maximal sample size. From this, a suitable value of n is identified by rounding up to the nearest integer such that n is, as required, divisible by the lowest common multiple of the |S_r|. Utilizing Williams squares for our designs, n was forced to be divisible by 12. Taking L = 3 and Δ = −0.25, 0, 0.25, 0.5 as examples, group sequential crossover trial designs were determined and compared to the single-stage design used by TOMADO. All computations were done in R (R Core Team, 2016) using the package groupSeqCrossover, available from https://github.com/mjg211/article_code.
Matlab (The Mathworks Inc., 2016) code employing symbolic algebra is also available to return the matrices given by several of the equations in the text. Use of both the R and Matlab code is detailed in Appendix D. A summary of the performance of the designs is provided in Table 1, and their computed boundaries are displayed in Figure 1. We can see that, as is the case for two-arm parallel trial designs, there is a trend for larger values of Δ to result in larger maximum sample sizes and lower expected sample sizes, due to their larger stopping regions. However, this is not the case for Δ = 0.25, because of the requirement to round to a suitable integer value of n.
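The rounding step for the group size can be sketched as follows; this is a trivial helper of our own, not a function from the groupSeqCrossover package:

```python
import math

def round_group_size(n_exact, divisor=12):
    """Round the exact per-stage group size up to the nearest multiple of
    divisor. With Williams squares for 4, 3, and 2 treatments there are
    4, 6, and 2 sequences respectively, so lcm(4, 6, 2) = 12."""
    return divisor * math.ceil(n_exact / divisor)

# e.g. an exact grid-search requirement of 33.1 patients per stage
# becomes a group size of 36
```

This rounding is what breaks the otherwise monotone relationship between Δ and the expected sample size seen in Table 1.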
Table 1.

Example design performance. Summary of the performance of the single-stage and considered group sequential designs.a

                                    Design
                                    -----------------------------------------------------
                                    Single-stage   Δ = −0.25   Δ = 0   Δ = 0.25   Δ = 0.5
n                                   90             36          36      48         48
pr(Reject H01 | τ = 0)              0.02           0.02        0.02    0.02       0.02
pr(Reject H01 | τ = δ)              0.80           0.85        0.83    0.90       0.83
pr(Reject H0d for some d | τ = 0)   0.05           0.05        0.05    0.05       0.05
pr(Reject H0d for some d | τ = δ)   0.95           0.97        0.97    0.98       0.97
E(N | τ = 0)                        90.0           76.8        70.0    82.6       69.6
E(N | τ = δ)                        90.0           100.3       95.7    110.7      98.9
E(O | τ = 0)                        360.0          269.3       240.3   283.1      244.5
E(O | τ = δ)                        360.0          367.2       341.8   380.4      327.7
max N                               90             108         108     144        144
max O                               360            432         432     576        576

The number of decimal places displayed in each row indicates the number to which rounding was performed.

Figure 1.

Stopping boundaries. Computed efficacy and futility boundaries of the considered group sequential designs.

Plots of the probability of rejecting H_01, and of rejecting H_0d for some d = 1, 2, 3, are provided for a range of values of θ in Figure 2. The power curves are similar for all of the designs, with the only differences a result of rounding in the group sequential designs to achieve suitable values of n.
Figure 2.

Power curves. Power curves of the single-stage (L = 1) and considered group sequential designs across a range of values of the true response rate in the experimental treatment arms θ.

As is to be expected for group sequential designs, the maximum sample size and maximum number of observations are larger than those for the single-stage design. However, the group sequential designs have lower expected sample sizes under the global null hypothesis, by up to a maximum of 23% for Δ = 0.5. This does, however, come at the expense of an increased expected sample size under the global alternative hypothesis. From Figure 3, the expected sample sizes of the group sequential designs can be seen to be far lower than those of the single-stage design for more extreme values of θ. A similar statement holds for the expected number of observations. However, in this instance, for Δ = 0 and 0.5, the performance of the group sequential designs is better than that of the single-stage design across all values of θ.
Figure 3.

Performance measurement curves. Curves of the expected sample size and expected number of observations of the single-stage (L = 1) and considered group sequential designs across a range of values of the true response rate in the experimental treatment arms θ.


Discussion

There is a long history of research on group sequential clinical trials. Very few, however, utilize a crossover design. This may be at least in part because no formal proof existed of how to strongly control the familywise error rate of such a trial. Here, we provided this proof and then explored the performance of several sequential designs for the TOMADO trial. The expected sample size of the sequential designs was observed to be far lower than that of the single-stage design for a large range of values of the true response rate on all experimental treatments. Unfortunately, but unsurprisingly given that the trial is not stopped unless all experimental treatments are dropped, there are regions in which the sequential designs are less efficient. Indeed, this region includes some values of θ between 0 and δ, which may be more realistic observed treatment effects. However, for some considered designs, this region is very small and does not include values near 0, which is notable for ethical reasons. This issue could be further alleviated by utilizing optimal stopping boundaries, as has been proposed for parallel arm designs (Wason and Jaki, 2012; Wason et al., 2012). Importantly, several of the designs always performed better than the single-stage design in terms of the expected number of observations required, which could be a significant factor in the cost and length of a trial. Consequently, we can conclude that a group sequential approach to a crossover trial improves efficiency in some circumstances. Several possible extensions to our work present themselves. For example, we assumed that the period was reset in each trial stage. This could reflect a scenario where it is believed that enrollment in the trial will alter a patient's behavior. However, in some cases, such as to deal with seasonal effects, it would be preferable to have different period effects in each stage.
One simple extension would be to move from our present superiority testing framework to non-inferiority tests. Non-inferiority tests, which seek to determine whether new treatments are not clinically worse than an established control, would have hypotheses shifted by some factor from the ones presented here. Theorem 2.2 could easily be altered to accommodate this, and popular methods for boundary determination in this setting could then be applied. Here, we have worked under an idealized scenario, assuming the within-patient variance to be known prior to trial commencement. Though this is a common assumption in group sequential theory, it has limitations, because a good estimate of the key variance parameter often cannot be provided at the design stage. In this instance, group sequential t-tests would almost certainly be required. Furthermore, simulation is required to quantify error rates accurately in the case of small sample sizes. To explore this scenario, we analyzed the true familywise error rate under the global null hypothesis of a particular design motivated again by the TOMADO trial, but with L = 2 and n = 12. We found that, provided restricted error maximum likelihood estimation was utilized, there was very little inflation in the familywise error rate over the nominal level α. Details of this are provided in Appendix B. Moreover, we have only explored the design of group sequential crossover trials. It is well known that if a final analysis is performed on data acquired in a sequential trial without taking its sequential nature into account, then biased treatment effect estimates will be obtained. Extending established methodology for parameter estimation to our scenario will thus be important. Finally, we have implicitly assumed that there will be no patient dropout and have not discussed the issue of patient recruitment rates. Though these are problems for all adaptive designs, it is important to note them.
Due to our need for one stage's data to be analyzed before the commencement of the following stage, it is likely that the length of a trial using our approach would be greater for certain recruitment rates. It could be that recruitment is paused at each interim analysis, or that patients are continually recruited under the old allocation scheme until results are available, which would lead to overrun and an increase in the expected number of observations and sample size. Thus, this would be an important factor to consider when choosing an appropriate design for a trial. Nevertheless, for future crossover trials, consideration should be given to a group sequential approach. This may substantially assist in the efficient prioritization of efficacious treatments.
Table B.1.

Performance of the small sample size group sequential crossover trial design under four analysis procedures. Specifically, the estimated familywise error rate under the global null hypothesis, pr(reject H0d for some d | τ = 0), is shown for each procedure to three decimal places, based on 10,000 trial simulations.a

Procedure     Estimation   Boundary adjustment   pr(Reject H0d for some d | τ = 0)
Procedure 1   ML           No                    0.077
Procedure 2   ML           Yes                   0.062
Procedure 3   REML         No                    0.055
Procedure 4   REML         Yes                   0.051

ML = Maximum likelihood, REML = restricted error maximum likelihood.

References

1. Bender R, Lange S. Adjusting for multiple testing--when and how? J Clin Epidemiol. 2001.

2. Whitehead J, Valdés-Márquez E, Lissmats A. A simple two-stage design for quantitative responses with application to a study in diabetic neuropathic pain. Pharm Stat. 2009.

3. Hauck WW, Preston PE, Bois FY. A group sequential approach to crossover trials for average bioequivalence. J Biopharm Stat. 1997.

4. Wason JMS, Jaki T. Optimal design of multi-arm multi-stage trials. Stat Med. 2012.

5. Parmar MKB, Carpenter J, Sydes MR. More multiarm randomised trials of superiority are needed. Lancet. 2014.

6. Wason JMS, Mander AP, Thompson SG. Optimal multistage designs for randomised clinical trials with continuous outcomes. Stat Med. 2011.

7. Wason JMS, Stecher L, Mander AP. Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials. 2014.

8. Mills EJ, Chan AW, Wu P, Vail A, Guyatt GH, Altman DG. Design, analysis, and presentation of crossover trials. Trials. 2009.

9. Quinnell TG, Bennett M, Jordan J, Clutterbuck-James AL, Davies MG, Smith IE, Oscroft N, Pittman MA, Cameron M, Chadwick R, Morrell MJ, Glover MJ, Fox-Rushby JA, Sharples LD. A crossover randomised controlled trial of oral mandibular advancement devices for obstructive sleep apnoea-hypopnoea (TOMADO). Thorax. 2014.
