Diane M Jacobs (1,2), M Colin Ard (2,3), David P Salmon (1,2), Douglas R Galasko (1,2), Mark W Bondi (2,4,5), Steven D Edland (1,2,3)
1. Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA.
2. Shiley-Marcos Alzheimer's Disease Research Center, La Jolla, CA, USA.
3. Division of Biostatistics, Department of Family Medicine & Public Health, University of California, San Diego, La Jolla, CA, USA.
4. Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA.
5. Veterans Affairs San Diego Healthcare System, San Diego, CA, USA.
Abstract
INTRODUCTION: Practice effects (PEs) present a potential confound in clinical trials with cognitive outcomes. A single-blind placebo run-in design, with repeated cognitive outcome assessments before randomization to treatment, can minimize effects of practice on trial outcome. METHODS: We investigated the potential implications of PEs in Alzheimer's disease prevention trials using placebo arm data from the Alzheimer's Disease Cooperative Study donepezil/vitamin E trial in mild cognitive impairment. Frequent ADAS-Cog measurements early in the trial allowed us to compare two competing trial designs: a 19-month trial with randomization after initial assessment, versus a 15-month trial with a 4-month single-blind placebo run-in and randomization after the second administration of the ADAS-Cog. Standard power calculations assuming a mixed-model repeated-measure analysis plan were used to calculate sample size requirements for a hypothetical future trial designed to detect a 50% slowing of cognitive decline. RESULTS: On average, ADAS-Cog 13 scores improved at first follow-up, consistent with a PE and progressively worsened thereafter. The observed change for a 19-month trial (1.18 points) was substantively smaller than that for a 15-month trial with 4-month run-in (1.79 points). To detect a 50% slowing in progression under the standard design (i.e., a 0.59 point slowing), a future trial would require 3.4 times more subjects than would be required to detect the comparable percent slowing (i.e., 0.90 points) with the run-in design. DISCUSSION: Assuming the improvement at first follow-up observed in this trial represents PEs, the rate of change from the second assessment forward is a more accurate representation of symptom progression in this population and is the appropriate reference point for describing treatment effects characterized as percent slowing of symptom progression; failure to accommodate this leads to an oversized clinical trial. 
We conclude that PEs are an important potential consideration when planning future trials.
Practice effects (PEs) are improvements in cognitive test performance over serial assessments attributed to repeated exposure to test stimuli or procedures. Clinically, PEs can provide valuable information about level of cognitive functioning, vis-à-vis ability to benefit from repeated exposure [1], [2]; however, in randomized controlled trials, they introduce a source of external signal that may confound observation of the target outcome [3].
Various methods have been proposed to address PEs, including statistical corrections and use of alternate test forms [3], [4], [5]. Although alternate forms may minimize memory for specific test items, they do not account for improvements that arise from increased familiarity with test procedures in general [6], [7], [8], and equivalent alternate forms are not available for many neurocognitive measures.
Another method to accommodate PEs in clinical trials is to use a test run-in or “dual baseline” wherein the cognitive outcome measure(s) are administered twice before randomization and scores from the second testing are used as the baseline reference. This approach helps to account for the initial, rapid improvements that occur with repeated testing, which are typically most pronounced between the first and second test administration [5], [8]. In a variant of this approach, often referred to as a single-blind placebo run-in design, participants are randomized to treatment or placebo, but all receive placebo during the run-in period between dual baseline assessments and only receive the treatment to which they had been randomized (i.e., active or placebo) after the second assessment. Dual baseline or run-in designs have been used to reduce the influence of practice and placebo effects on clinical trials with neuropsychological outcomes in a variety of diseases and interventions [9], [10], [11].
We investigated the impact of a cognitive test run-in design on magnitude of potential effect size and power calculations by examining the performance of participants in the placebo arm of a secondary prevention trial to delay progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD) dementia.
Methods
Overview
We conducted retrospective analyses of placebo arm data from a multicenter, randomized, double-blind, placebo-controlled trial of vitamin E and donepezil HCL to delay clinical progression from MCI to AD dementia; design and results of the trial are described elsewhere [12].
Participants
Data were obtained from participants in the placebo arm of the donepezil/vitamin E study. All participants were between 55 and 90 years of age and met diagnostic criteria for amnestic MCI [13]. The placebo group comprised 259 participants with a mean age of 72.9 years (standard deviation [SD] = 7.6) and an average of 14.7 years of education (SD = 3.1); 47% were female, 53% were APOE ε4 carriers, and the mean score on the MMSE at screening was 27.35 (SD = 1.8). Only data from the first 18 months of the 36-month trial were used for these analyses because converters to AD dementia were offered open-label donepezil, which precluded examining PEs separately from potential treatment effects in subjects who converted.
Procedure
The modified 13-item Alzheimer's Disease Assessment Scale, cognitive subscale (ADAS-Cog 13), was administered at the screening visit (1 month before randomization), 3 and 6 months after randomization, and semiannually thereafter. The ADAS-Cog 13 includes all items from the original ADAS-Cog (i.e., word list recall and recognition; measures of language, orientation, constructional and ideational praxis), plus a number cancellation task and a delayed free recall task for a total of 85 points, with higher scores indicating greater cognitive impairment [14]. Three alternate forms of the word-recall word list component were used in the trial: list 1 was administered at screening and 12 months, list 2 at 3 and 18 months, and list 3 at 6 months.
Data analyses
Sample size calculations informed by placebo arm data from the MCI trial were performed assuming a mixed-model repeated-measures (MMRM) analysis using standard methods we have described [15] and implemented in the R statistical programming language package longpower [16], using a type-I error rate of 5%, power of 80%, and equal allocation to arms. The mean and covariance matrix of repeated ADAS-Cog measures were supplied to the power.mmrm function within the longpower package. To simplify presentation, we assumed no covariate adjustment and no loss to follow-up in power calculations. MMRM, as used in contemporary secondary prevention trials, compares change from randomization to final visit in the treatment arm versus change in control [17]. Mean and SD at each assessment are reported, as are the mean and SD of change from treatment randomization to the month-18 visit. We compare the relative sample size required for the two trial designs by example, calculating the sample size required to detect a 50% slowing of decline. Under our assumptions, the relative sample size required when effect size is expressed as percent slowing of decline is solely a function of the mean and covariance structure of the pilot data for this analysis plan [18]. Hence, the relative sample size we report for a 50% slowing of decline generalizes to any effect size expressed as percent slowing of decline.
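As a rough illustration of why the relative sample size does not depend on the percent slowing chosen, consider a simplified two-sample comparison of change scores. This simplification is an assumption made here for exposition and is not the MMRM calculation itself, which uses the full mean and covariance structure. The notation is also introduced only for this sketch: for design d, let \mu_d be the mean placebo change over the treatment period, \sigma_d the SD of that change, and p the fractional slowing to be detected.

```latex
n_d = \frac{2\,\sigma_d^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\left(p\,\mu_d\right)^{2}},
\qquad
\frac{n_{\text{standard}}}{n_{\text{run-in}}}
= \frac{\left(\sigma_{\text{standard}} / \mu_{\text{standard}}\right)^{2}}
       {\left(\sigma_{\text{run-in}} / \mu_{\text{run-in}}\right)^{2}}
```

Under this approximation, p and the normal quantiles cancel in the ratio, so the ratio is determined by the mean and variability of change in the pilot data alone.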
Results
Participant mean scores on the ADAS-Cog 13 are shown in Fig. 1. At screening, the group mean score was 17.40 (SD = 6.0). At 3-month follow-up, the group mean score improved slightly to 16.79 (SD = 7.0). At 6-month follow-up, the group mean returned to the baseline level (mean = 17.38; SD = 7.0), and performance progressively declined thereafter. Between screening and 18-month follow-up, the mean change was +1.18 (SD = 6.2); the change between the 3- and 18-month visits was +1.79 (SD = 5.2).
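The arithmetic behind these figures can be checked directly. The short sketch below uses only the group means and mean changes reported above; the variable names are illustrative. It compares the initial improvement between screening and month 3 with the gap between the two 18-month change estimates. The two quantities need not agree exactly in general, because the change scores may be based on slightly different sets of participants than the visit means.

```python
# Group mean ADAS-Cog 13 scores reported above (higher score = greater impairment).
visit_means = {"screening": 17.40, "month_3": 16.79, "month_6": 17.38}

# Mean change to the 18-month visit under each candidate baseline, as reported.
change_from_screening = 1.18   # standard design: baseline is the first assessment
change_from_month_3 = 1.79     # run-in design: baseline is the second assessment

# Initial change between the first and second administrations
# (a negative value means scores improved).
initial_change = visit_means["month_3"] - visit_means["screening"]

# Extra decline captured when the second assessment serves as baseline.
design_gap = change_from_month_3 - change_from_screening

print(f"Change from screening to month 3: {initial_change:+.2f} points")
print(f"Gap between the two 18-month change estimates: {design_gap:+.2f} points")
```

In this data set, the improvement at the second administration and the gap between the two change estimates are of the same magnitude, which is what would be expected if the first-interval improvement offsets part of the decline measured from screening.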
Fig. 1
ADAS-Cog 13 scores of participants with MCI in the placebo arm of the ADCS donepezil/vitamin E study. Abbreviations: ADAS-Cog 13, modified 13-item Alzheimer's Disease Assessment Scale; ADCS, Alzheimer's Disease Cooperative Study; MCI, mild cognitive impairment.
Using data from the screening visit (1 month before baseline) to 18-month follow-up in power calculations, a standard 19-month trial design is estimated to require 1764 subjects per arm to detect a 50% slowing of decline. Using data from 3- to 18-month follow-up in power calculations, a 15-month study with 4-month placebo run-in would require 521 subjects per arm to detect a 50% slowing of decline, 70% fewer subjects than required by the standard design. Stated another way, the standard design trial would require 3.4 times more subjects for comparable power. This is a general finding when treatment effect size is expressed as percentage slowing of mean rate of decline (see Methods, Data analyses); that is, regardless of the percentage effect size powered for, the standard design trial will require 3.4 times more subjects than the run-in design trial.
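A back-of-envelope check of these sample sizes is sketched below. It is not the MMRM calculation performed with longpower; it assumes a simple two-sample comparison of change scores with a normal approximation, using only the mean and SD of change reported in the Results. The function name n_per_arm and its parameters are hypothetical, introduced here for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(mean_change, sd_change, pct_slowing=0.5, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sample comparison of change scores.

    Simplifying assumption: normal approximation, equal allocation, no dropout.
    This is NOT the MMRM method used in the analysis described above.
    """
    std_normal = NormalDist()
    delta = pct_slowing * mean_change  # treatment effect in ADAS-Cog 13 points
    z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return ceil(2 * sd_change**2 * z_total**2 / delta**2)


# Mean (SD) of change reported in the Results.
standard = n_per_arm(mean_change=1.18, sd_change=6.2)  # screening to month 18
run_in = n_per_arm(mean_change=1.79, sd_change=5.2)    # month 3 to month 18

print(f"Standard design: about {standard} per arm")
print(f"Run-in design:   about {run_in} per arm")
print(f"Ratio:           about {standard / run_in:.1f}")

# The ratio is unchanged (up to rounding) for other percent slowings.
for pct in (0.25, 0.30, 0.50):
    ratio = n_per_arm(1.18, 6.2, pct) / n_per_arm(1.79, 5.2, pct)
    print(f"{pct:.0%} slowing: ratio about {ratio:.1f}")
```

This approximation gives sample sizes close to, but not identical to, the reported MMRM-based values of 1764 and 521 per arm, and a ratio somewhat below the reported 3.4. The difference is expected because the MMRM calculation uses the full covariance matrix of the repeated measures rather than the SD of the change score alone.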
Discussion
We used the placebo arm data from a completed trial to demonstrate the impact of PEs on magnitude of potential effect size and sample size projections for two study designs. Regarding treatment effect size, in the presence of PEs, the rate of change after washout of PEs (from the 3-month visit forward in our example) is the most accurate characterization of rate of disease progression on a given instrument [5]. It follows that treatment effect sizes characterized relative to this rate are more accurate and meaningful. Moreover, characterizing treatment effect size relative to change from first assessment results in oversized trials in the presence of PEs.
Results demonstrate that two investigators, informed by the pattern of progression observed in a previous trial and with similar directives in terms of treatment effect size, can arrive at trials of dramatically different size depending on the design. This is explained by the difference in effect size when effect size is expressed in units of ADAS-Cog 13. For example, for the standard design with 19-month treatment, a 50% reduction in change corresponds to half of 1.18 (Fig. 1) or 0.59 units. For the run-in design, a 50% reduction in change corresponds to half of 1.79 (Fig. 1) or 0.90 units. In units of ADAS-Cog 13, the effect size powered for in the run-in design is much larger and therefore requires a smaller projected sample size.
There are limitations to this analysis. Alternate versions of the ADAS-Cog word list memory test were used in the trial, so it is possible that improved performance observed at the second administration of the ADAS-Cog reflects version differences rather than a PE. The possibility that the word list used at the 3-month visit was easier for participants seems less likely, however, given that a similar improvement relative to previous measurements was not observed at 18-month follow-up, when the same word list was again used.
Furthermore, the run-in design, as proposed, does not consider the effects of any additional PEs that may occur beyond the second test administration. Finally, instruments that are less vulnerable to PEs will be less prone to the issues outlined in this report. Replication of these findings in other cohort and trial data, with other commonly used outcome measures, will further confirm the potential impact of the single-blind placebo cognitive test run-in design on effect size characterization.
Consideration of PEs will be increasingly important as the target population for AD trials moves earlier in the disease course, where measurable treatment effects may be subtle and PEs more robust [5], [19], [20]. Trials of treatments intended to slow the underlying AD neurodegenerative process, where no acute treatment effects are anticipated, may be particularly vulnerable in this regard. We conclude that the presence of PEs is an important consideration when planning future trials.
Systematic review: Practice effects on neuropsychological measures have been well documented and may mask treatment effects in clinical trials with cognitive end points. Single-blind run-in designs have been used to attenuate practice effects before randomization for other diseases but have not been applied consistently in Alzheimer's disease trials.
Interpretation: Archival analyses of placebo arm data from a secondary Alzheimer's disease (AD) prevention trial with mild cognitive impairment participants revealed that using a single-blind placebo cognitive test run-in design yielded greater change in cognitive outcome (ADAS-Cog 13) than a traditional design with randomization at the first assessment visit. The run-in design dramatically reduced the requisite sample size to achieve comparable statistical power.
Future directions: Replication of these findings using other outcome measures as well as in participants in the preclinical and asymptomatic stages of AD will further validate the utility of using a single-blind placebo cognitive test run-in for AD prevention trials.