Guogen Shan1, Sarah Banks2, Justin B Miller2, Aaron Ritter2, Charles Bernick2, Joseph Lombardo3, Jeffrey L Cummings2. 1. Epidemiology and Biostatistics Program, Department of Environmental and Occupational Health School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV, USA. 2. Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, USA. 3. National Supercomputing Institute, University of Nevada Las Vegas, Las Vegas, NV, USA.
Abstract
INTRODUCTION: New treatments for neurodegenerative disease are urgently needed, and clinical trial methods are an essential component of new drug development. Although a parallel-group study design for neurological disorder clinical trials is commonly used to test the effectiveness of a new treatment as compared to placebo, it does not efficiently use information from the on-going study to increase the success rate of a trial or to stop a trial earlier when the new treatment is indeed ineffective. METHODS: We review some recent advances in designs for clinical trials, including futility designs and adaptive designs. RESULTS: Futility designs and noninferiority designs are used to test the nonsuperiority and the noninferiority of a new treatment, respectively. We provide some guidance on using these two designs and analyzing data from these studies properly. Adaptive designs are increasingly used in clinical trials to improve the flexibility and efficiency of trials with the potential to reduce resources, time, and costs. We review some typical adaptive designs and new statistical methods to handle the statistical challenges from adaptive designs. DISCUSSION: Statistical advances in clinical trial designs may be helpful to shorten study length and benefit more patients being treated with a better treatment during the discovery of new therapies for neurological disorders. Advancing statistical underpinnings of neuroscience research is a critical aspect of the core activities supported by the Center of Biomedical Research Excellence award supporting the Center for Neurodegeneration and Translational Neuroscience.
In clinical trials for neurological disorders, a parallel-group study is commonly used to assess the effectiveness of a new treatment as compared to placebo [1], [2], [3], [4]. Patients are randomized to either the treatment arm(s) or the placebo arm following a prespecified randomization schedule. At the end of the study, the change in the primary outcome between baseline and the end of the study in the treatment arm is compared with that in the placebo arm to conclude whether the new treatment has sufficient activity to move to the next phase for further investigation. The primary outcome can be measured with established assessment tools, such as the Alzheimer's Disease Assessment Scale–Cognitive subscale (ADAS-Cog), the Unified Parkinson's Disease Rating Scale (UPDRS), Clinical Dementia Rating, and the Amyotrophic Lateral Sclerosis Functional Rating Scale-revised (ALSFRSr). The commonly used parallel-group design can assess the effectiveness of the new treatment with influential covariates balanced by the randomization procedure; however, it may not be efficient enough for rapidly screening out nonpromising treatments or identifying the most promising treatments [1], [5], [6], [7], [8], [9], [10].
Futility designs are widely used in early phase neurological disorder trials to screen out new treatments that are highly unlikely to produce successful results [11], [12], [13], [14], [15]. Futility designs can be used in a single-arm study with the threshold estimated from historical controls or in a parallel-group study with a nonsuperiority alternative hypothesis [16], [17], [18]. The purpose of the futility design is to screen out an unpromising treatment with fewer patients and a much shorter study period. In contrast to the futility design, the commonly used parallel-group study is often used to test the superiority or noninferiority of the new treatment over the placebo.
In this article, we review the difference between the futility design and the noninferiority design, which is also widely used in clinical trials to test the noninferiority of a new treatment. We also provide some guidance on the proper usage of such designs [19], [20], [21], [22], [23], [24].
In recent years, adaptive designs have been introduced and used in trials for neurological disorders to reduce resource use and study length [25], [26], [27], [28]. There are a few definitions for an adaptive design. In 2010, the Food and Drug Administration published a draft guidance document on adaptive designs and defined an adaptive design as “a study that includes a prospectively planned opportunity for modification of one or more specified aspects of the study design and hypotheses based on analysis of data (usually interim data) from subjects in the study” [29].
Adaptive designs provide opportunities to modify or change the trial during the study while maintaining the validity and integrity of the trial. These modifications are prespecified and take effect when certain conditions are met. In 2008, Chow and Chang [27] reviewed 10 adaptive designs used in clinical trials, including an adaptive randomization design that allows modification of randomization schedules; a group sequential design that allows early stopping due to futility, efficacy, or both; a sample size re-estimation design allowing sample size adjustment; a pick-the-winner design; an adaptive dose-finding design; a biomarker-adaptive design; an adaptive treatment-switching design; an adaptive seamless design; a hypothesis-adaptive design; and a multiple adaptive design. In this article, we review the following two commonly used adaptive designs in neurological disorder trials. The response-adaptive randomization design uses the patients' responses from the current on-going study to modify the assignment probabilities to each treatment arm, with more patients being treated in the better arms.
The response-adaptive randomization design is one type of adaptive randomization design, a family that also includes treatment-adaptive randomization and covariate-adaptive randomization [27]. The other adaptive design discussed in this article is the adaptive dose-finding design, which increases the accuracy of the estimation of the maximum tolerated dose or minimum effective dose [30].
Studies designed by an adaptive method may introduce new challenges in data analysis. It is important that the intended statistical analysis should guide the study design [23], [31], [32]. For this reason, new statistical approaches to analyze the data from adaptive designs properly are also discussed. Review of novel, efficient, and proper statistical approaches in neuroscience research is an important service of the Data Management and Statistics Core of the Center for Neurodegeneration and Translational Neuroscience supported by the Center of Biomedical Research Excellence award from the National Institute of General Medical Sciences.
Futility designs
The futility design, also known as the nonsuperiority design, can be used to screen out a new treatment candidate that is not promising for further investigation. It can be implemented in a single-arm study or a parallel-group study to investigate the effectiveness of a new experimental treatment. Suppose μe and μc are the means of the primary outcome in the new experimental treatment group and in the control group, respectively, in a parallel-group study. For a single-arm study, we may use the same notation μc to represent the estimated value from historical data. Let Δ = μe − μc be the difference between the two groups.
For clinical trials in neurology, the primary outcome of interest to measure disease symptoms is often computed from well-established assessment tools, for example, ADAS-Cog, UPDRS, and ALSFRSr. The end-of-study value minus the baseline value (post–pre) is often used as the primary outcome, for example, μe = μe1 − μe0, where μe1 and μe0 are the outcomes of patients in the treatment group at the end and at baseline, respectively. It should be noted that a treatment with a smaller increase (slowing disease progression) or a larger decrease (improving the disease symptoms) in the outcome is considered a better treatment for some assessment tools (e.g., ADAS-Cog, UPDRS), whereas the reverse holds for others (e.g., ALSFRSr).
When ADAS-Cog or UPDRS is used to measure the disease symptom, suppose δ0 is the maximum allowable progression threshold. The statistical hypotheses for the futility design are presented as
$$H_0: \Delta \le \delta_0 \quad \text{versus} \quad H_a: \Delta > \delta_0, \tag{1}$$
where δ0 is a clinically meaningful threshold to measure the disease symptom [33], [34], [35]. For example, a clinical trial to assess the effectiveness of coenzyme Q10 and GPI-1485 in Parkinson's disease (PD) patients [36] was designed as a futility study with δ0 = −3.19, which is 30% of the total UPDRS change of participants in the placebo group from the Deprenyl and Tocopherol Antioxidant Therapy of Parkinsonism trial (DATATOP), μc = 10.65. This trial was designed as a single-arm futility study with the hypotheses
$$H_0: \mu_e \le 7.46 \quad \text{versus} \quad H_a: \mu_e > 7.46,$$
where 7.46 = 10.65 − 3.19. If the null hypothesis is rejected, we can conclude that the new experimental treatment is not promising for further investigation.
The sample size in this study [36] was calculated as 58 participants per arm to attain 85% power at the significance level of 0.1, with μe = 7.46 under the null hypothesis and μe = 10.65 under the alternative hypothesis.
When a larger observed outcome represents a better treatment (e.g., the ALSFRSr score), the hypotheses for a two-arm futility design are presented as
$$H_0: \Delta \ge \delta_0 \quad \text{versus} \quad H_a: \Delta < \delta_0, \tag{2}$$
where δ0 is the minimum worthwhile efficacy, and it is often a positive value. Although the hypotheses take two different forms in Equations (1) and (2), depending on the direction of the assessment tool, they are statistically identical once the direction of one of the assessment tools is reversed. Both alternative hypotheses suggest nonsuperiority of the new experimental treatment as compared to placebo.
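As a rough check on the sample size quoted above for the single-arm futility study [36], the usual normal-approximation formula for a one-sided, one-sample test of a mean, n = ((z_{1−α} + z_{1−β}) σ / (μa − μ0))^2, can be sketched in Python. This is an illustration only and not necessarily the method used in [36]. The function name is hypothetical, and the standard deviation of 10.4 UPDRS points is an assumed value that does not appear in this article.

```python
from math import ceil
from statistics import NormalDist


def single_arm_sample_size(mu_null, mu_alt, sd, alpha, power):
    """Normal-approximation sample size for a one-sided, one-sample test of a mean."""
    z = NormalDist()                  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha)    # one-sided critical value (z_alpha, not z_alpha/2)
    z_beta = z.inv_cdf(power)         # quantile matching the target power
    n = ((z_alpha + z_beta) * sd / (mu_alt - mu_null)) ** 2
    return ceil(n)                    # round up to a whole participant


# From the text: null mean 7.46, alternative mean 10.65, alpha = 0.1, power = 85%.
# ASSUMPTION: sd = 10.4 is a hypothetical value, not taken from the article.
n_per_arm = single_arm_sample_size(
    mu_null=7.46, mu_alt=10.65, sd=10.4, alpha=0.10, power=0.85
)
print(n_per_arm)
```

With this assumed standard deviation the formula returns 58, in line with the figure reported above; a different assumed standard deviation would change the result.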
Noninferiority designs
The aforementioned futility design (also known as the nonsuperiority design) should not be confused with other designs, such as a superiority design or a noninferiority design. We compare these designs to the futility design with the hypotheses in Equation (2) by assuming that a higher score represents a better treatment. The hypotheses of a superiority design or a noninferiority design are expressed as
$$H_0: \Delta \le \delta_1 \quad \text{versus} \quad H_a: \Delta > \delta_1, \tag{3}$$
where δ1 is the margin. Let the clinically meaningful estimate of Δ be Δ*. This estimate is very important in clinical trials to show clinically meaningful improvement by a new treatment. When δ1 > Δ*, Equation (3) represents a superiority design. It becomes a noninferiority design when δ1 < Δ*. In a noninferiority trial, the aim is to show that a new treatment is not much worse than, or not clinically inferior to, the standard care. Lesaffre [37] compared superiority trials and noninferiority trials using two real noninferiority trial examples, along with a discussion of the noninferiority margin.
In a randomized clinical trial to investigate the validity and reliability of online delivery of the Lee Silverman Voice Treatment for PD patients with speech and voice disorder [38], the change in sound pressure level (dB-C) after the treatment was the primary outcome. The clinically relevant improvement was estimated as Δ* = 4.5 dB, with an estimated standard deviation of 2.48 dB. This study was designed as a noninferiority trial to compare online and face-to-face administration of the Lee Silverman Voice Treatment, with a noninferiority margin of 2.25 dB, which is half of the estimated clinically relevant improvement, δ1 = 2.25 < Δ*. A sample size of 15 per arm was required to attain 90% power at the significance level of 0.025 for this noninferiority trial.
In another noninferiority study reported by Winblad et al. [39], the rivastigmine capsule was compared with placebo for patients with Alzheimer's disease (AD) by using the ADAS-Cog change from baseline as the primary outcome. The noninferiority margin was set as a 1.25-point decrease on the ADAS-Cog, which is half of the estimated treatment difference from other existing studies. This noninferiority margin is considered the minimum clinically meaningful difference.
In the aforementioned hypotheses for either a futility design or a noninferiority design, the primary outcome is computed as the end-of-study value minus the baseline value (post–pre), for example, μe = μe1 − μe0. When the primary outcome is instead computed as the baseline value minus the end-of-study value (pre–post, e.g., μe = μe0 − μe1), the aforementioned hypotheses can still be applied. For example, in a study to confirm the noninferiority of rotigotine to ropinirole for PD patients on concomitant levodopa therapy, the primary outcome was the change of the UPDRS Part III (ON state) sum score from baseline to the end [40]. In this study, a larger observed value (the difference of change between rotigotine and ropinirole) represents a better treatment. For this reason, the hypotheses presented in Equation (3) should be used in this study to assess the noninferiority of rotigotine to ropinirole.
Sample size determination and statistical inference
Sample size calculation plays a very important role in clinical trials to ensure a prespecified level of power while the type I error rate (α) is controlled. The type I error rate and power are generally computed by using the assumed Δ values under the null and alternative hypotheses, δ0 and δa. Accurate estimates of δ0 and δa increase the chance that the computed sample size is adequate to detect the difference between the treatment arms.
The hypotheses discussed in this article are all one-sided; therefore, zα, instead of zα/2, is used in the sample size determination (see Levin [41] for the detailed sample size calculation formula). It should be noted that the sample size calculation provided by Levin [41] is based on asymptotic approaches, which should be used with caution for a study with a small to medium sample size. For a study with a binary outcome (e.g., a decrease of more than 50% from baseline in the Inventory of Depressive Symptomatology-Clinician score in PD [42]), the proper type I error rate should be computed under the null hypothesis (H0: Δ ≤ δ0), not just at the boundary of the associated hypothesis space (Δ = δ0) [43], [44], [45]. In the trial to compare response rates between atomoxetine and placebo [42], we suppose the null hypothesis is presented as H0: pa ≤ pc, where pa and pc are the response rates for the atomoxetine arm and the placebo arm, respectively. The response rate for the placebo arm is estimated to be 10% from historical data. An exact unconditional approach [46] may be used to calculate the error rate; the actual type I error rate is then computed as
$$\text{Type I error rate} = \max_{p_a \le p_c \le 10\%} \sum_{(x_a, x_c) \in \Omega} f(x_a \mid n_a, p_a)\, f(x_c \mid n_c, p_c), \tag{4}$$
where Ω is the rejection region, ƒ(.) is the probability mass function of a binomial distribution, and xa and xc are the numbers of responders among the na and nc patients in the atomoxetine arm and the placebo arm, respectively. The type I error rate should be properly computed over the null space pa ≤ pc ≤ 10% as in Equation (4).
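The maximization in Equation (4) can be illustrated with a small Python sketch that uses only the standard library. It is a sketch under stated assumptions rather than the procedure of [46]: the arm sizes of 30, the significance level of 0.05, the pooled one-sided Z-test used to define the rejection region Ω, the grid search over the null space, and all function names are hypothetical choices made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(x, n, p):
    """Binomial probability of x responders among n patients."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def rejection_region(n_a, n_c, alpha):
    """Outcomes (x_a, x_c) where a pooled one-sided Z-test rejects H0: p_a <= p_c."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    region = []
    for x_a in range(n_a + 1):
        for x_c in range(n_c + 1):
            pooled = (x_a + x_c) / (n_a + n_c)
            se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_c))
            if se > 0 and (x_a / n_a - x_c / n_c) / se > z_crit:
                region.append((x_a, x_c))
    return region


def rejection_prob(region, n_a, n_c, p_a, p_c):
    """Probability of landing in the rejection region for given true rates."""
    return sum(
        binom_pmf(x_a, n_a, p_a) * binom_pmf(x_c, n_c, p_c) for x_a, x_c in region
    )


def actual_type1_error(region, n_a, n_c, p_c_max=0.10, steps=20):
    """Grid approximation to the maximum in Equation (4) over p_a <= p_c <= p_c_max."""
    grid = [p_c_max * i / steps for i in range(steps + 1)]
    return max(
        rejection_prob(region, n_a, n_c, p_a, p_c)
        for p_c in grid
        for p_a in grid
        if p_a <= p_c
    )


# ASSUMPTION: 30 patients per arm and alpha = 0.05 are hypothetical values.
n_a = n_c = 30
region = rejection_region(n_a, n_c, alpha=0.05)
boundary = rejection_prob(region, n_a, n_c, 0.10, 0.10)  # p_a = p_c = 10% only
size = actual_type1_error(region, n_a, n_c)              # maximum over the null grid
print(f"boundary: {boundary:.4f}, maximum over null grid: {size:.4f}")
```

Comparing the two printed values shows whether, for a given rejection region, evaluating the error rate only at pa = pc = 10% understates the error rate over the null space. A finer grid gives a closer approximation to the true maximum.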
In practice, the type I error rate is often computed only on the boundary with pa = pc = 10%, which is not appropriate without a theoretical proof that the maximum error rate occurs at the boundary of the null space [43].
When two proportions are compared, efficient sample size calculation approaches, such as simulation-based approaches and exact approaches, are recommended to provide valid sample sizes [3], [23], [47], [48], [49], [50]. In addition, the actual type I error rate could be highly affected by the estimated δ0 [15], and we encourage researchers to compute and provide sample sizes under multiple possible δ0 and δa scenarios.
At the end of a trial, observed data are analyzed to make statistical inference. The statistical analysis should be consistent with the design. For a randomized placebo-controlled futility study with the hypotheses given in Equation (1), the null hypothesis is rejected when a large Δ value is observed. The progression threshold δ0 is compared with the one-sided lower confidence limit of Δ computed from the observed data. When δ0 is outside of the interval (δ0 is less than the lower limit), we have enough evidence to reject the null, and the new experimental treatment is not promising for further investigation. Otherwise, if the computed lower limit is less than or equal to δ0, we fail to reject the null hypothesis. Similarly, the 1 − α lower limit of Δ is used in testing the hypotheses in Equation (3), whereas the 1 − α upper limit is used for the hypotheses in Equation (2) to make valid statistical inference. The lower or upper limit of Δ should be computed properly for different types of data. For a matched-pairs study, a method that accounts for the matching information should be considered.
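The confidence-limit decision rule described above for the hypotheses in Equation (1) can be sketched as follows. The sketch assumes a normal approximation for the estimated difference, which may not suit small samples or other data types; the function name and the numeric inputs, other than δ0 = −3.19 and α = 0.1 taken from the futility example, are hypothetical.

```python
from statistics import NormalDist


def futility_decision(delta_hat, se, delta0, alpha):
    """Test H0: Delta <= delta0 using the one-sided (1 - alpha) lower confidence limit.

    Returns the lower limit and True when H0 is rejected, that is, when the
    treatment is declared not promising under Equation (1).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    lower_limit = delta_hat - z_crit * se
    return lower_limit, lower_limit > delta0


# ASSUMPTION: the estimates and standard error below are made-up illustrations.
lower_a, futile_a = futility_decision(delta_hat=0.5, se=1.8, delta0=-3.19, alpha=0.10)
lower_b, futile_b = futility_decision(delta_hat=-2.5, se=1.8, delta0=-3.19, alpha=0.10)
print(futile_a, futile_b)
```

In the first call the lower limit lies above δ0, so the null hypothesis is rejected; in the second it lies below δ0, so the null hypothesis is not rejected. For Equation (2), the same idea applies with the upper limit and the inequality reversed.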
Adaptive designs
Adaptive designs have been increasingly used in clinical trials to increase the flexibility of a trial, for example, by allowing a trial to be stopped early for futility when a new treatment is not promising or by allowing more patients to be assigned to a better treatment. An adaptive design provides opportunities for a trial to be modified during the course of the trial; however, the adaptations have to be prospectively planned to guarantee the validity and integrity of the trial. In other words, any modification of a trial (e.g., adding or dropping a treatment arm), together with the conditions under which it is made, is specified during the planning stage. These conditions include the comparison of results based on the observed data from the on-going study. In general, it takes more effort for the research team to prepare an adaptive design than a traditional nonadaptive design. A significant number of simulation studies have to be conducted to investigate all possible outcomes during the planning stage.
In a recent phase II trial to evaluate BAN2401 (a monoclonal antibody targeting amyloid protofibrils) for the treatment of AD patients, a response-adaptive randomization model was used in the study design. The probability of the next patient being assigned to one of the treatment arms or the control arm is determined by the probability of that arm being the most effective among all arms. The cumulative data of patients from the on-going study are used for calculating these probabilities. An adaptive randomization design allows a trial to assign more patients to better treatment arms, which may lead to imbalances in the sample-size allocation and the distribution of influential covariates across treatment arms. Recently, Saville and Berry [51] proposed using odds ratios to modify the probability in the response-adaptive randomization for each arm to improve the covariate balance.
When the primary outcome is binary, a new patient allocation scheme to adjust for covariate imbalance during the adaptive randomization procedure has been proposed [52]. New and proper statistical methods are needed to overcome the emergent statistical challenges from response-adaptive randomization. Alternatively, a study can be designed with covariate-adaptive randomization to balance the allocation of multiple arms across a set of influential covariates without compromising randomness. This would help reduce the complexity of the final data analysis.
In neurology, adaptive designs are often used in early phases to learn the safety of a new treatment and select the dose for the following trials. In a dose-finding study, there are often a few arms with different dose levels and a placebo arm to estimate the placebo effect. Adaptive methods can be used to stop a dose earlier due to futility, accept a dose due to efficacy, or add a new dose to the study. The aforementioned response-adaptive randomization could be used in conjunction with the adaptive dose-finding design. For example, a phase II trial to evaluate the safety and efficacy of ABT-089 in AD patients was designed by using a Bayesian response-adaptive randomization method to allocate patients to one of the seven arms (six arms of ABT-089 with different doses and placebo) after having at least five patients in each arm [53]. This study was also designed to allow stopping for efficacy or futility based on the conditional power calculated from the on-going study. The objective of this adaptive dose-finding study was to identify the minimum effective dose resulting in at least an average of 1.75-point ADAS-Cog improvement over placebo.
In other studies, the maximal tolerated dose may be the target dose [54].
In addition to the aforementioned adaptive designs, Chow and Chang [27] reviewed other commonly used adaptive design methods in clinical trials: an adaptive seamless phase II/III design, a biomarker-adaptive design, an adaptive treatment-switching design, and so on. More details about these adaptive designs may be found in the literature [1], [9], [22], [24], [25], [27], [55], [56], [57].
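The response-adaptive randomization rule described earlier in this section, in which the allocation probability for an arm is the probability that it is the most effective arm, can be sketched for a binary outcome. This is a simplification and not the model used in the BAN2401 or ABT-089 trials: the independent Beta(1, 1) priors, the Monte Carlo approximation, the interim counts, and the function name are all assumptions made for illustration.

```python
import random


def prob_each_arm_is_best(responders, nonresponders, n_draws=5000, seed=0):
    """Monte Carlo estimate of P(arm has the highest response rate) per arm.

    Each arm's response rate gets a Beta(1 + responders, 1 + nonresponders)
    posterior, that is, a uniform Beta(1, 1) prior updated with interim data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = [0] * len(responders)
    for _ in range(n_draws):
        draws = [
            rng.betavariate(1 + r, 1 + f) for r, f in zip(responders, nonresponders)
        ]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]


# ASSUMPTION: hypothetical interim data for three arms with 20 patients each.
responders = [2, 5, 9]
nonresponders = [18, 15, 11]
allocation = prob_each_arm_is_best(responders, nonresponders)

# The next patient is assigned to an arm with these probabilities.
next_arm = random.Random(1).choices(range(len(allocation)), weights=allocation)[0]
print(allocation, next_arm)
```

Because the allocation follows the accumulating responses, arms that look better receive more patients, which is the source of the sample-size and covariate imbalances discussed above. In practice, trials often restrict or transform these probabilities, for example to protect the control arm allocation.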
Discussion
The futility design is used in early phase clinical trials to screen out unpromising treatments and save resources for other treatment candidates. By contrast, a futility-stopping boundary is used to drop a treatment arm from a trial or stop a trial earlier if the treatment will not show efficacy based on the observed results. The futility boundary could be defined as the prespecified conditional power or the prespecified confidence limit. The two are different concepts: the futility design is a study design, whereas a futility boundary is a threshold in the study design [41], [58].
Adaptive designs are attractive because they increase the flexibility of a trial, but they also introduce new statistical challenges in analyzing the final observed data properly. The aforementioned imbalance in the influential covariates from response-adaptive randomization is one of them. When a study is designed by an adaptive approach, the data analysis should align with the study design. For an adaptive two-stage design in which the second-stage sample size depends on the results from the first stage, a data analysis that uses only the final observed data without considering the nature of a two-stage adaptive design is not appropriate [10], [59]. For example, a single-arm two-stage adaptive design was used to assess the effectiveness of a new treatment for PD patients with a binary endpoint as the primary outcome. The required sample size for the first stage is n1 = 22 participants, and the second-stage sample size, n2(X1), is a function of the number of responses from the first stage, X1 (e.g., n2(X1) = 35 when X1 = 11) [59]. At the beginning of the study, the number of responses from the first stage, X1, is unknown.
Therefore, the sample space having all possible X1 from 0 to n1 should be used in statistical inference, such as P-value calculation and confidence interval calculation.
In practice, it is possible that the final observed sample size differs from what was planned. In that case, a statistical approach that incorporates the adaptive elements and the observed sample size should be used for valid data analysis [10], [45].
Systematic review: Futility designs are commonly used in early phase neurological disorder trials to screen out new treatments that are highly unlikely to produce successful results. Adaptive designs are increasingly used in drug development to improve the flexibility and efficiency of trials, with the potential to reduce cost and sample size.
Interpretation: A futility design should not be confused with a design that allows a trial to be stopped due to futility. The number of neurological disorder trials designed by adaptive approaches is not as large as expected.
Future directions: Adaptive futility designs should be developed for use in trials, and the associated statistical methods for newly developed designs should be proposed to provide proper statistical inference.