Mark Sanderson-Cimino, Jeremy A Elman, Xin M Tu, Alden L Gross, Matthew S Panizzon, Daniel E Gustavson, Mark W Bondi, Emily C Edmonds, Graham M L Eglit, Joel S Eppig, Carol E Franz, Amy J Jak, Michael J Lyons, Kelsey R Thomas, McKenna E Williams, William S Kremen.
Abstract
INTRODUCTION: Practice effects (PEs) on cognitive tests obscure decline, thereby delaying detection of mild cognitive impairment (MCI). Importantly, PEs may be present even when there are performance declines, if scores would have been even lower without prior test exposure. We assessed how accounting for PEs using a replacement-participants method impacts incident MCI diagnosis.
Keywords: Alzheimer's disease; clinical trials; early diagnosis; longitudinal aging; mild cognitive impairment; practice effects
Year: 2022 PMID: 35128027 PMCID: PMC8804942 DOI: 10.1002/trc2.12228
Source DB: PubMed Journal: Alzheimers Dement (N Y) ISSN: 2352-8737
FIGURE 1. Practice effects (PEs) with and without true decline. The solid line represents true cognitive ability. The dashed line represents observed performance, which is inflated due to a practice effect (vertical arrow). A, Typically observed practice effect: an individual's observed score increases from baseline to follow‐up, demonstrating a typical practice effect. B, Practice effect in the context of cognitive decline. In this scenario, an individual's ability is decreasing over time. A practice effect still exists but is masked by cognitive decline. As a result, the individual's performance appears to be stable but is actually better than it would have been without previous exposure to the test. C, PEs impair detection of mild cognitive impairment (MCI). In this situation, an individual has declined below an MCI cutoff. However, PEs are inflating their score so that they now fall above the MCI cutoff and will be diagnosed as cognitively normal at follow‐up.
FIGURE 2. Sample matching and practice effect calculations. Practice effect calculations are based on bootstrapped analyses. Participants with valid baseline data were identified (n = 889). (1) Participants who also had 12‐month follow‐up data comprised the returnees (n = 722). (2) A subsample of returnees (≈25% of the total sample, ≈220 participants) was selected. (3) Baseline data for these participants were labeled ReturneesT1; follow‐up data for these participants were labeled ReturneesT2. (4) The 220 ReturneesT1 participants were removed from the pool of baseline data, leaving ≈670 remaining baseline participants. (5) Using propensity score matching with an additional age restriction (<0.1 years), potential pseudo‐replacements were matched one‐to‐one to the ReturneesT2 participants. The pseudo‐replacements were drawn from the pool of 670 remaining baseline participants. Matching parameters were age, birth sex, education, and premorbid IQ. Comparisons of age, birth sex, education, and premorbid IQ confirmed that the groups were similar (p's > .80). (6) Once matching was complete, the sample was labeled Pseudo‐ReplacementsT1; this sample ranged in size from 200 to 240 participants. Thus, the Pseudo‐ReplacementsT1 sample and the ReturneesT2 sample were demographically matched and differed only in that the ReturneesT2 participants had taken the tests before while the Pseudo‐ReplacementsT1 participants had taken the tests only once. After the (on average) 220 Pseudo‐ReplacementsT1 participants were removed from the pool of baseline data, ≈450 unchosen baseline participants remained, or ≈50% of the total sample. The previous steps were completed at each of the 5000 iterations. Practice effects were calculated by comparing the mean scores of these subsamples using the equations provided below the flowchart.
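The matching step described above can be sketched as follows. This is a simplified stand-in, not the study's implementation: it uses greedy nearest-neighbor matching on age (within the <0.1-year caliper) with an exact match on sex, whereas the study used propensity scores over age, birth sex, education, and premorbid IQ. All data below are synthetic.

```python
import random

def match_pseudo_replacements(returnees_t2, baseline_pool, age_caliper=0.1):
    """Greedy one-to-one matching: for each returnee, pick the unused
    baseline participant of the same sex with the closest age, subject
    to the <0.1-year age restriction described in the figure."""
    pairs, used = [], set()
    for r in returnees_t2:
        candidates = [
            (abs(b["age"] - r["age"]), i)
            for i, b in enumerate(baseline_pool)
            if i not in used
            and b["sex"] == r["sex"]
            and abs(b["age"] - r["age"]) < age_caliper
        ]
        if candidates:
            _, i = min(candidates)
            used.add(i)
            pairs.append((r, baseline_pool[i]))
    return pairs

# Synthetic stand-ins for the remaining baseline pool (≈670) and the
# selected returnee subsample (≈220).
random.seed(0)
pool = [{"age": 70 + 5 * random.random(), "sex": k % 2} for k in range(670)]
returnees = [{"age": 70 + 5 * random.random(), "sex": k % 2} for k in range(220)]
pairs = match_pseudo_replacements(returnees, pool)
```

In the study this selection-and-matching cycle was repeated at each of the 5000 bootstrap iterations, so each iteration yields a fresh pseudo-replacement sample.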
The difference between the mean of the ReturneesT2 scores and the mean of the matched Pseudo‐ReplacementsT1 scores equals the sum of the practice effect and the attrition effect. The attrition effect accounts for the fact that individuals who return for follow‐up may be higher performing or healthier than the full baseline sample. (7–9) To retain the original sample's proportion of returnees to attritors, we then created a weighted mean of the baseline cognitive scores by multiplying the mean test score of the remaining baseline participant pool by the attrition rate (≈16%) and the mean of the ReturneesT1 pool by the retention rate (≈84%); this is referred to as the Proportional Baseline in the text. The practice effect for each test equals the difference score minus the attrition effect.
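The calculation described above can be written out directly. A minimal sketch for one cognitive test; the function name and the toy scores are illustrative, not the study's data:

```python
def practice_effect(ret_t1, ret_t2, pseudo_t1, remaining_t1,
                    attrition_rate=0.16):
    """Replacement-participants practice-effect calculation for one test.
    Each argument is a list of scores; attrition_rate ≈ 16% per the text."""
    mean = lambda xs: sum(xs) / len(xs)
    # Returnees at follow-up vs. demographically matched, once-tested
    # pseudo-replacements: equals practice effect + attrition effect.
    difference = mean(ret_t2) - mean(pseudo_t1)
    # Proportional Baseline: weighted mean restoring the original mix of
    # returnees (≈84%) and attritors (≈16%).
    proportional_baseline = ((1 - attrition_rate) * mean(ret_t1)
                             + attrition_rate * mean(remaining_t1))
    # Attrition effect: returnees' baseline advantage over the full sample.
    attrition_effect = mean(ret_t1) - proportional_baseline
    return difference - attrition_effect

# Toy scores: returnees improve at follow-up and were healthier at baseline.
pe = practice_effect([10.0], [12.0], [10.0], [8.0])  # ≈ 1.68
```

Applying the same arithmetic to the table below reproduces the reported values, e.g., Logical Memory: (11.66 − 10.60) − (−0.09) = 1.15.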
Means, standard deviations, attrition effects, and practice effects for each cognitive test
| Raw mean score (SD) | RAVLT (memory) | Logical Memory (memory) | Trails A (attention/executive function) | Trails B (attention/executive function) | Boston Naming (language) | Category fluency (language) |
|---|---|---|---|---|---|---|
| Proportional baseline | 7.18 (3.81) | 10.64 (4.24) | 31.89 (10.79) | 77.47 (39.86) | 29.04 (2.42) | 19.67 (5.23) |
| Returnees baseline | 7.18 (3.79) | 10.54 (4.23) | 31.97 (10.82) | 76.89 (38.41) | 29.05 (2.36) | 19.71 (5.26) |
| Returnees follow‐up | 6.97 (4.38) | 11.66 (4.63) | 31.52 (12.52) | 75.24 (43.14) | 29.43 (2.27) | 19.84 (5.22) |
| Replacements follow‐up | 6.97 (3.79) | 10.60 (4.34) | 32.47 (10.83) | 79.38 (41.69) | 28.99 (2.45) | 19.46 (5.19) |
| Attrition effect | 0 | –0.09 | –0.02 | –0.59 | 0.01 | 0.03 |
| Practice effect | NA | 1.15 | –0.93 | –3.56 | 0.43 | 0.35 |
| Cohen's d | NA | 0.24 | –0.07 | –0.08 | 0.19 | 0.07 |
Abbreviations: PE, practice effect; RAVLT, Rey Auditory Verbal Learning Test; SD, standard deviation; Trails, Trail Making Test.
Notes: Groups are based on the average performance across all 5000 bootstrapped iterations. Means are based on transformed data that were converted back to raw units. “Proportional baseline” refers to a weighted mean that combines the returnee baseline group and a group that included all subjects not selected to be returnees or replacements in that bootstrapped iteration. “Returnee baseline” refers to baseline test scores for the portion of participants who returned for the 12‐month follow‐up visit (n = 722). “Returnee follow‐up” refers to 12‐month scores for the portion of participants who returned for the 12‐month follow‐up (n = 722). “Replacement follow‐up” refers to the pseudo‐replacement scores. The scores for memory tasks indicate the number of words remembered at the delayed recall trials. Scores on the attention/executive functioning tests indicate time to completion of the task. On these tasks, higher scores indicate worse performance. Scores on the Boston Naming Test indicate the number of items correctly identified; scores on category fluency indicate the number of items correctly stated. Practice effects and attrition effects are in raw units. As such, the negative practice effects and attrition effects for the Trails tasks demonstrate that practice decreased time (increased performance). Cohen's d is given for the difference between PE‐adjusted and unadjusted scores of returnees at follow‐up.
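The Cohen's d row appears consistent with dividing the raw practice effect by the returnee follow-up SD. That denominator is inferred from the table values, not stated in the notes, so treat this as one plausible reading:

```python
def standardized_pe(practice_effect, followup_sd):
    """Standardize a raw practice effect by the returnee follow-up SD
    (an inferred denominator; the notes do not state it explicitly)."""
    return practice_effect / followup_sd

d_boston = standardized_pe(0.43, 2.27)      # table reports 0.19
d_trails_a = standardized_pe(-0.93, 12.52)  # table reports -0.07
```

The Logical Memory entry (1.15 / 4.63 ≈ 0.25 vs. the reported .24) is off by a rounding step, plausibly because the published values derive from unrounded bootstrap means.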
Impact of practice effects
| Progression from cognitively normal to MCI | # of cases, based on PE‐unadjusted cognitive scores | # of cases, based on PE‐adjusted cognitive scores | Difference in # of cases (%) | χ²; p |
|---|---|---|---|---|
| MCI diagnosis | 104 | 124 | +20 (+19%) | 18.1; < .001 |
| Memory domain impaired | 74 | 87 | +13 (+18%) | 11.1; < .001 |
| Attention/executive domain impaired | 21 | 25 | +4 (+19%) | 2.3; .13 |
| Language domain impaired | 11 | 14 | +3 (+27%) | 1.3; .25 |
| Impaired on 1 test within all domains | 11 | 13 | +2 (+18%) | 0.17; .68 |
Abbreviations: Aβ, amyloid beta; CN, cognitively normal; MCI, mild cognitive impairment; PE, practice effects; p‐tau, phosphorylated tau; t‐tau, total tau.
Notes: Follow‐up diagnoses were made with practice effect‐unadjusted (PE‐unadjusted) or practice effect‐adjusted (PE‐adjusted) scores. The difference in the number of cases is calculated by subtracting the number of cases based on PE‐unadjusted scores from the number of cases based on PE‐adjusted scores. The percent difference (%) in number of cases is the difference in number of cases divided by the number of cases based on PE‐unadjusted cognitive scores (e.g., 19% = 20/104). χ² is the McNemar χ². Individuals could be impaired in more than one domain. Consequently, the sum of impaired individuals within each domain is greater than the total number of MCI cases. The MCI diagnosis row counts an individual only once, even if they are impaired in more than one domain.
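The McNemar χ² compares paired diagnoses (each participant classified under both scoring methods) using only the discordant-pair counts. A sketch with continuity correction; the counts below are an assumption chosen to illustrate the MCI row (all 20 additional diagnoses reclassified in one direction, none in the other):

```python
import math

def mcnemar(b, c):
    """Continuity-corrected McNemar chi-square for paired binary outcomes.
    b, c: discordant-pair counts (e.g., MCI under PE-adjusted scoring only
    vs. MCI under PE-unadjusted scoring only)."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df via the complementary
    # error function: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = mcnemar(20, 0)  # chi2 = 18.05, p < .001
```

Under these assumed counts the statistic rounds to the 18.1 reported in the MCI row.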
FIGURE 3. Effect of practice effect‐adjusted versus unadjusted scores on a hypothetical clinical trial of biomarker‐positive participants. Comparison of estimated sample sizes (Y‐axis) necessary for detecting a significant drug effect (X‐axis) in a sample that is biomarker‐negative and cognitively normal (CN) at baseline. The drug effect is operationalized as percent reduction in mild cognitive impairment (MCI) diagnoses at a 1‐year follow‐up between the treatment group and the placebo group. For example, a drug effect of 30% means that 30% more participants remained CN when treated with the drug than when given the placebo. The red line represents a trial that uses MCI incidence rates based on practice effect (PE)‐adjusted diagnoses and the blue line represents a trial that uses incidence rates based on unadjusted diagnoses. MCI incidence rates were based on the subsample of participants from the present study who were biomarker‐negative and CN at baseline. The model examined was a logistic regression with diagnosis at follow‐up (MCI vs. CN) as the outcome variable. The predictor was a two‐level categorical variable representing placebo or drug. Alpha was set at 0.05, power was 0.80, and the hypothetical sample was evenly split into treatment and placebo groups. Across all effect sizes (10% to 40% reduction in treatment vs. placebo conversion rates) the PE‐adjusted trial required fewer participants than the PE‐unadjusted trial. The inset shows results for hypothetical samples with ≈1000 participants. If this study used PE‐unadjusted outcome measures (blue line), it would require an effect size of 28.2% to reach a significant result with ≈1000 participants. Using PE‐adjusted diagnoses, only 844 participants would be required for the same study with the same drug effect, a reduction of 156 participants. A PE‐adjusted study with ≈1000 participants (red line in the inset) would be able to detect a smaller drug effect of 26.1%. With this 2.1% reduction in effect size, a PE‐unadjusted study would require an additional 186 participants at this drug effect level (1186 vs. 1000).
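Sample-size curves of this kind can be approximated with a standard two-proportion power calculation, a normal-approximation stand-in for the logistic-regression model in the caption. The 15% placebo conversion rate below is hypothetical, not the study's incidence estimate:

```python
import math
from statistics import NormalDist

def n_per_arm(p_placebo, p_drug, alpha=0.05, power=0.80):
    """Participants per arm needed to detect a difference in conversion
    proportions (two-sided test, normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_b = z.inv_cdf(power)
    p_bar = (p_placebo + p_drug) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_placebo * (1 - p_placebo)
                                   + p_drug * (1 - p_drug))) ** 2
    return math.ceil(numerator / (p_placebo - p_drug) ** 2)

# Hypothetical 15% placebo conversion rate; a 30% drug effect means the
# drug arm converts at 0.15 * (1 - 0.30) = 10.5%.
n = n_per_arm(0.15, 0.15 * 0.70)
```

As in the figure, a larger drug effect (a bigger gap between arms) lowers the required sample size, and any adjustment that raises the observed conversion rate difference, such as removing PE inflation, shifts the whole curve down.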
FIGURE 4. Comparison of recruitment designs for detection of a drug effect based on Anti‐Amyloid Treatment in Asymptomatic Alzheimer's Disease (A4) study recruitment. Using sample size estimates from Figure 3, we present how planning to adjust for practice effects (PEs) would alter a clinical drug trial, using A4 study recruitment as an example. The A4 study had a total sample of 1323 participants after recruitment as shown in the top row of gray boxes (based on Figure 1 in Sperling et al.). A, Based on sample size estimates from Figure 3, a sample of 1323 would enable a study to detect a significant drug effect of 24.7% at an alpha of 0.05 and 0.80 power. The top row of the flowchart presents the recruitment for the A4 study. This study reported an initial screening (6763 participants) followed by amyloid positron emission tomography (PET; 4486 participants) imaging to achieve their sample of 1323 amyloid‐positive (Aβ+), cognitively normal (CN) participants. Achieving the final sample size thus required an n for the initial screening that was 5.11 times as large as the final sample size, and an n for amyloid PET imaging that was 3.39 times as large as the final sample. Our power analyses suggest that the same effect size is achieved with only 1116 participants if a trial adjusted follow‐up scores for PEs. That, along with the reductions in initial screening and PET scans, is shown in the middle row of the flowchart. The bottom row shows the sample size reductions for initial screening, PET screening, and the initial biomarker‐positive and CN sample. B, The figure presents the reduction in recruitment sample size (Y‐axis) across effect sizes ranging from 10% to 40% (X‐axis). The orange line represents how many fewer participants would be necessary at initial screening if a study had planned to adjust for practice effects (PEs) at follow‐up.