Literature DB >> 36082322

Practice effects in performance outcome measures in patients living with neurologic disorders - A systematic review.

Sven P Holm¹, Arnaud M Wolfer¹, Grégoire H S Pointeau¹, Florian Lipsmeier¹, Michael Lindemann¹.

Abstract

Background: In this systematic review we sought to characterize practice effects on traditional in-clinic or digital performance outcome measures commonly used in one of four neurologic disease areas (multiple sclerosis; Huntington's disease; Parkinson's disease; and Alzheimer's disease, mild cognitive impairment and other forms of dementia), describe mitigation strategies to minimize their impact on data interpretation and identify gaps to be addressed in future work.
Methods: Fifty-eight original articles (49 from Embase and an additional 4 from PubMed and 5 from additional sources; cut-off date January 13, 2021) describing practice effects or their mitigation strategies were included.
Results: Practice effects observed in healthy volunteers do not always translate to patients living with neurologic disorders. Mitigation strategies include reliable changes indices that account for practice effects or a run-in period. While the former requires data from a reference sample showing similar practice effects, the latter requires a sufficient number of tests in the run-in period to reach steady-state performance. However, many studies only included 2 or 3 test administrations, which is insufficient to define the number of tests needed in a run-in period. Discussion: Several gaps have been identified. In particular the assessment of practice effects on an individual patient level as well as the temporal dynamics of practice effects are largely unaddressed. Here, digital tests, which allow much higher testing frequency over prolonged periods of time, can be used in future work to gain a deeper understanding of practice effects and to develop new metrics for assessing and accounting for practice effects in clinical research and clinical trials.

Entities: Chemical

Keywords: Alzheimer disease; Dementia; Huntington disease; Mild cognitive impairment; Multiple sclerosis; Parkinson disease; Practice effects

Year: 2022 PMID： 36082322 PMCID： PMC9445299 DOI： 10.1016/j.heliyon.2022.e10259

Source DB: PubMed Journal: Heliyon ISSN： 2405-8440

Introduction

Chronic neurological diseases such as multiple sclerosis, Huntington's disease, Parkinson's disease or dementia may manifest in functional impairment in one or several functional domains (Lees et al., 2009; Roos, 2010; Sosnoff et al., 2014). Assessing these domains regularly can provide valuable insights into both the subject's disease status and the disease course and also inform treatment and disease management (Tur et al., 2018). Repeated performance assessments over time may, however, be susceptible to practice effects. Practice effects (also sometimes known as learning effects; see Panel for definition) is any change or improvement that results from repetition of tasks or activities, including repeated exposure to an instrument, rather than due to a true change in a patient's ability (Heilbronner et al., 2010; McCaffrey and Westervelt, 1995). For example, patients may perform better in subsequent tests as they fully comprehend the tasks (context memory or context effects) or gain knowledge of the sequence of tasks (episodic memory or content effects) and map the stimulus to the response (Goldberg et al., 2015). Over time this familiarity with the test could lead the subject to develop strategies that result in inflated test performance compared with a subject exposed to the test for the first time (Goldberg et al., 2015). The overall improvement in performance, or practice effects, is the result of consecutive gains that tend to be largest at first and gradually become smaller as the number of assessments increases (Figure 1) (Bartels et al., 2010; Falleti et al., 2006). In particular at short inter-test intervals, practice effects are often much greater than normative functional change over a similar interval (Jones, 2015).

Figure 1

Schematic representation of the evolution of test performance through time solely due to task repetition (practice effects) when different assessment frequencies are considered. Each curve represent the test performance through time for a daily assessment (top curve) and a weekly assessment schedule (bottom curve). Each individual test is represented by a dot, colored either orange if it is part of the practice period (or run-in period), or blue if it is part of the steady-state period. During the practice period, performance gain between consecutive tests is largest at first and gradually reduces as the number of assessments increases. The assessment frequency does not alter the overall performance gain or number of iterations required to reach a steady-state suitable for reliable assessment, but decreases or increases the time needed to reach such state (e.g. 7 days vs 7 weeks). The subject's abilities are considered constant over the period of time considered. Practice effects are often considered to introduce unwanted variance and thus complicate the interpretation of repeated clinical assessments (McCaffrey and Westervelt, 1995). If not accounted for, practice effects can lead to misdiagnosis or misinterpretation of clinical data, resulting in delayed access to the most effective treatment option (Elman et al., 2018; Marley et al., 2017). Despite a large body of literature on practice effects, their impact on the subject's performance is seldom addressed and has been described as “large, pervasive and underappreciated” (Jones, 2015). Current study designs typically do not adequately estimate and mitigate their impact on test performance despite most repeated assessments being affected by practice effects to a varying degree (Johnson et al., 1991; McCaffrey and Westervelt, 1995). Thus, key in addressing this challenge is not only the characterization of practice effects and their underlying mechanisms, but also the implementation of mitigation strategies. This systematic review aims to evaluate the presence and magnitude of practice effects associated with commonly used performance outcome measures in patients living with one of the four neurologic disorders: multiple sclerosis; Huntington's disease; Parkinson's disease; and Alzheimer's disease, mild cognitive impairment and other forms of dementia. In addition, this review discusses the different mitigation strategies that have been applied to minimize the impact of practice effects. Finally, it identifies gaps in our understanding of practice effects in patients with neurologic disorders, which should be addressed in future research.

Methods

A systematic literature search was conducted on Embase and PubMed according to PRISMA guidelines to identify original articles that discuss practice effects in patients with neurologic disorders (cut-of date: January 13, 2021). Three separate search strings were used: one to identify original articles on commonly used performance outcome measures, one to identify original articles on practice effects, and one to identify original articles on either multiple sclerosis; Parkinson's disease; Huntington's disease; or mild cognitive impairment, Alzheimer's disease or other forms of dementia (Table 1). Combining these three search strings with a Boolean “AND” resulted in the list of publications that were considered for this systematic review. Additional relevant records were identified through clinicaltrials.gov and from our own collection of references.

Table 1

Search string.

Search string 1:Performance outcome measures	Search string 2:Practice effects	Search string 3:Neurologic disorders
• Cognition:	• Practice Effects	• Multiple Sclerosis
Symbol Digit Modalities Test, Paced Auditory Serial Addition Test, Serial Reaction Time, Trail-Making Test, Stroop Test, Brief Visuospatial Memory Test-Revised, California Verbal Learning Test, Hopkins Verbal Learning Test	• Learning Effects	• Parkinson's Disease
• Upper extremity function:	• Initial Learning	• Huntington's Disease
Nine-Hole Peg Test, Pegboard Test, Speeded Tapping,	• Retest Effects	• Mild Cognitive Impairment, Alzheimer's disease, Dementia
• Gait & balance:
Timed 25-Foot Walk, 2-Minute Walk Test, Timed Up and Go, Berg Balance Scale
• Vision:
Low Contrast Visual Acuity
• Composite scores:
Brief International Cognitive Assessment for Multiple Sclerosis (BICAMS), Consortium to Establish a Registry for Alzheimer's Disease neuropsychological test battery (CERAD), Mini-Mental State Examination, Minimal Neuropsychological Assessment of MS Patients (MACFISM), Multiple Sclerosis Functional Composite (MSFC), Unified Huntington Disease Rating Scale (UHDRS), Wechsler Adult Intelligence Scale (WAIS), Wechsler Memory Scale (WMS), Neuropsychological Test
• Digital performance outcome measures:
Digital, Computer, Tablet, Mobile, Smartphone

While ‘neuropsychological test’ was included in the search string, this was only used to identify original articles that reported practice effects on at least one of the other performance outcome measures.

Search string. Cognition: Practice Effects Multiple Sclerosis Symbol Digit Modalities Test, Paced Auditory Serial Addition Test, Serial Reaction Time, Trail-Making Test, Stroop Test, Brief Visuospatial Memory Test-Revised, California Verbal Learning Test, Hopkins Verbal Learning Test Learning Effects Parkinson's Disease Upper extremity function: Initial Learning Huntington's Disease Nine-Hole Peg Test, Pegboard Test, Speeded Tapping, Retest Effects Mild Cognitive Impairment, Alzheimer's disease, Dementia Gait & balance: Timed 25-Foot Walk, 2-Minute Walk Test, Timed Up and Go, Berg Balance Scale Vision: Low Contrast Visual Acuity Composite scores: Brief International Cognitive Assessment for Multiple Sclerosis (BICAMS), Consortium to Establish a Registry for Alzheimer's Disease neuropsychological test battery (CERAD), Mini-Mental State Examination, Minimal Neuropsychological Assessment of MS Patients (MACFISM), Multiple Sclerosis Functional Composite (MSFC), Unified Huntington Disease Rating Scale (UHDRS), Wechsler Adult Intelligence Scale (WAIS), Wechsler Memory Scale (WMS), Neuropsychological Test Digital performance outcome measures: Digital, Computer, Tablet, Mobile, Smartphone While ‘neuropsychological test’ was included in the search string, this was only used to identify original articles that reported practice effects on at least one of the other performance outcome measures. Publications were excluded if they were not original articles (for example, congress abstracts or other review articles); written in a language other than English, were duplicates; did not report practice effects or mitigation strategies in one of four disease areas specified in the search string; or did not report practice effects or mitigation strategies for one of the performance outcomes measures specified in the search string. ‘Neuropsychological test’ was included in the search string to identify original studies that investigated practice effects or mitigation strategies in test batteries that include at least one of the other performance outcome measures defined in the search string. This eligibility assessment was performed by the first author. To minimize the impact of bias, only improvements in test performance that resulted from practice or repetition of task and cannot be explained by other means were considered to be practice effects. Thus, practice effects were considered whenever possible in the non-interventional cohort. Risk of publication bias and selective reporting was assessed by identifying the number of completed and potentially relevant studies listed on clinicaltrials.gov for which results have not been published yet. In this systematic review, we aim to address the following five questions: Which metrics were used to identify possible practice effects? Were practice effects observed in patients with neurologic disorders, and how common were they? Which mitigation strategies were applied to minimize the impact of practice effects? Do practice effects carry any clinically meaningful information? Are there any gaps in our current understanding of practice effects in patients with neurologic disorders?

Results

The literature search on Embase and PubMed identified a total of 177 and 103 records, respectively. An additional 5 studies from a search on clinicaltrials.gov or from our own collection of references were included in the analysis. Of the 285 records, 58 were considered eligible (Figure 2). Records were excluded during screening for the following reasons: duplicates (n = 85), disease area (n = 76), publication type other than original articles (n = 9) and language other than English (n = 1). While assessing the full-text articles for eligibility, additional 54 records were excluded (performance outcome measures: n = 35, did not report on practice effects or mitigation strategy: n = 18, disease area: n = 1). Only two completed and potentially relevant studies were identified on clinicaltrials.gov that did not publish results on practice effect analyses (NCT02225314 and NCT02476266). Table 2 summarizes the functional domains assessed by each performance outcome measure included in the analysis.

Figure 2

PRISMA flow diagram. Incl, Inclusion.

Table 2

Performance outcome measures and their functional domain.

Performance outcome measure	Functional domain	Reference
SDMT	Information processing speed, working memory	Smith (1982), Toh et al. (2014)
PASAT	Information processing speed, working memory	Gronwall (1977), Rao et al. (1989)
SRT	Sequential learning	Schendan et al. (2003)
TMT
TMT-A	Information processing speed	Salthouse (2011), Duff et al. (2018)
TMT-B	Executive function	Toh et al. (2014)
Stroop Test
Stroop Word Test	Information processing speed	Stroop (1935), Toh et al. (2014)
Stroop Color Test	Information processing speed	Stroop (1935), Toh et al. (2014)
Stroop Interference Test	Executive function	Stroop (1935), Toh et al. (2014)
BVMT-R	Visuospatial memory	Benedict (1997)
CVLT	Learning and memory	Delis et al. (1987), Elwood (1995)
HVLT	Learning and memory	Brandt (1991)
WAIS
Coding/Digit Symbol	Information processing speed	Wechsler (2008)
Digit Span	Working memory	Wechsler (2008)
Letter-Number Sequencing	Working memory	Wechsler (2008)
Similarities	Verbal comprehension	Wechsler (2008)
Matrix Reasoning	Perceptual Organization	Wechsler (2008)
WMS
Spatial Span	Working memory	Wechsler (2009)
Logical Memory	Episodic memory	Wechsler (2009)
Visual Reproduction	Episodic memory	Wechsler (2009)
Paired Associations	Verbal comprehension	Wechsler (2009)
MMSE	Global cognition	Folstein et al. (1975)
T25FW	Gait	Motl et al. (2017)
2MWT	Gait	Rossier and Wade (2001)
TUG	Gait, dynamic and static balance	Podsiadlo and Rirchardson (1991)
9HPT	Hand-motor function, manual dexterity	Feys et al. (2017)
Purdue Pegboard	Hand-motor function, manual dexterity	Tiffin (1968)
Speeded Tapping/Alternating Tapping	Hand-motor function, manual dexterity	Stout et al. (2014), Prince et al. (2018), Westin et al. (2010)
Paced Tapping	Hand-motor function, manual dexterity	Stout et al. (2014)
Smartphone-based SDMT	Information processing speed, working memory	Pham et al. (2021)
Memory Test	Short-term memory	Prince et al. (2018)
Brain on Track
Attention task III	Attention, information processing speed	Ruano et al. (2020)
Visual memory task II	Visual memory, attention	Ruano et al. (2020)
Delayed verbal memory	Verbal memory	Ruano et al. (2020)
Calculus task	Calculus	Ruano et al. (2020)
Colour interference task	Executive function	Ruano et al. (2020)
Verbal memory II	Verbal memory	Ruano et al. (2020)
Opposite task	Executive function, inhibitory control	Ruano et al. (2020)
Written comprehension	Language comprehension, information processing speed	Ruano et al. (2020)
Word categories	Language	Ruano et al. (2020)
Sequences	Executive function	Ruano et al. (2020)
Puzzles	Visuospatial abilities	Ruano et al. (2020)
CANTAB
One Touch Stockings of Cambridge	Executive function	Giedraitiene and Kubrys (2019)
Spatial Working Memory	Working memory	Giedraitiene and Kubrys (2019)
Reaction Time Task	Information processing speed	Giedraitiene and Kubrys (2019)
Paired Associates Learning	Visual memory	Giedraitiene and Kubrys (2019)
MSReactor
Simple Reaction Time	Information processing speed	Merlo et al. (2019)
Choice Reaction Time	Visual attention	Merlo et al. (2019)
One-Back Test	Working memory	Merlo et al. (2019)
CogState
Detection Task	Information processing speed	Hammers et al. (2011)
Identification Task	Visual attention	Hammers et al. (2011)
One-Back Task	Working memory	Hammers et al. (2011)
One Card Learning	Visual recognition	Hammers et al. (2011)
Divided Attention	Divided attention	Hammers et al. (2011)
Associative Learning	Associative learning	Hammers et al. (2011)
Visual Search	Cognitive function, motor behavior	Utz et al. (2013)
MSPT
Manual Dexterity Test	Hand-motor function, manual dexterity	Rao et al. (2020)
Contrast Sensitivity Test	Vision	Rao et al. (2020)
Walking Speed Test	Gait	Rao et al. (2020)
Driving Simulator	Visual information integration	Teasdale et al. (2016)

2MWT, Two-Minute Walk Test; 9HPT, Nine-Hole Peg Test; BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CVLT, California Verbal Learning Test; HVLT, Hopkins Verbal Learning Test; MMSE, Mini-Mental State Examination; MSPT, Multiple Sclerosis Performance Test; PASAT, Paced Auditory Serial Addition Test; SDMT, Symbol Digit Modalities Test; SRT, Serial Reaction Time; T25FW, Timed 25-Foot Walk; TMT, Trail-Making Test; TUG, Timed Up and Go; WAIS, Wechsler Adult Intelligence Scale; WMS, Wechsler Memory Scale.

PRISMA flow diagram. Incl, Inclusion. Performance outcome measures and their functional domain. 2MWT, Two-Minute Walk Test; 9HPT, Nine-Hole Peg Test; BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CVLT, California Verbal Learning Test; HVLT, Hopkins Verbal Learning Test; MMSE, Mini-Mental State Examination; MSPT, Multiple Sclerosis Performance Test; PASAT, Paced Auditory Serial Addition Test; SDMT, Symbol Digit Modalities Test; SRT, Serial Reaction Time; T25FW, Timed 25-Foot Walk; TMT, Trail-Making Test; TUG, Timed Up and Go; WAIS, Wechsler Adult Intelligence Scale; WMS, Wechsler Memory Scale.

Identifying and quantifying practice effects

Several different approaches and metrics have been applied to identify practice effects, to quantify their magnitude and temporal dynamics, and to address potential biases in the interpretation of the data. These different approaches and metrics are summarized in Table S1 in the supplementary appendix.

Identifying practice effects

Descriptive statistics have been used to compare the change in performance between baseline and retest (Cohen et al., 2000, 2001; Duff et al., 2007, 2012; Duff and Hammers, 2022). However, it is more common to test the difference for statistical significance. Depending on the study design and the distribution of data, t-tests, Friedman's test, Wilcoxon rank test, ANOVA, ANCOVA or other general linear models have been used to identify practice effects (Bachoud-Lévi et al., 2001; Barker-Collo, 2005; Beglinger et al., 2014a, 2014b; Benedict, 2005; Benedict et al., 2008; Benninger et al., 2011, 2012; Bever et al., 1995; Buelow et al., 2015; Campos-Magdaleno et al., 2017; Claus et al., 1991; Duff et al., 2017, 2018; Eshaghi et al., 2012; Frank et al., 1996; Fuchs et al., 2020; Gallus and Mathiowetz, 2003; Gavett et al., 2016; Giedraitiene and Kaubrys, 2019; Glanz et al., 2012; Hammers et al., 2011; Merlo et al., 2019; Meyer et al., 2020; Nagels et al., 2008; Patzold et al., 2002; Pliskin et al., 1996; Rao et al., 2020; Reilly and Hynes, 2018; Rosti-Otajärvi et al., 2008; Ruano et al., 2020; Snowden et al., 2001; Solari et al., 2005; Sormani et al., 2019; Stout et al., 2014; Teasdale et al., 2016; Toh et al., 2014; Vogt et al., 2009; Westin et al., 2010). Alternatively, longitudinal data can be fitted with cubic splines to detect improvements over time that indicate practice effects (Merlo et al., 2019). Practice effects may also be indirectly identified. For example, a change in clinical diagnosis (for example, from mild cognitive impairment to cognitively healthy) can indicate the presence of practice effects (Duff et al., 2011). Similarly, patients who showed functional disability at baseline may, due to practice, improve their performance at retest sufficiently to no longer be classified as impaired (Schwid et al., 2007). Practice effects may also be indirectly inferred from a reliable change index analysis (Rosas et al., 2020). As the reliable change index captures the expected change based on the change observed in a reference population. An improvement beyond this index suggests that the patient showed greater than expected improvements, which may indicate practice effects. For the purpose of this review, any improvements in test performance that cannot be explained by other means such as interventional effects, functional recovery or decline etc. were considered to be practice effects.

Quantifying the magnitude of practice effects

A common approach to quantify practice effects is to compute their effect size. Available effect size metrics include Cohen's d, repeated-measures effect size, η2 and partial η2 (Beglinger et al., 2014a; Benedict, 2005; Benedict et al., 2008; Campos-Magdaleno et al., 2017; Duff et al., 2017; Eshaghi et al., 2012; Giedraitiene and Kaubrys, 2019; Gross et al., 2018; Hammers et al., 2011; Higginson et al., 2009; Rao et al., 2020; Stout et al., 2014; Vogt et al., 2009). Similarly the change in test scores can be quantified in SD units (Elman et al., 2018; Erlanger et al., 2014; Gavett et al., 2016). Practice effects can also be quantified by computing the ratio between the test scores at retest and at baseline to obtain a progression ratio (Prince et al., 2018). An alternative approach to estimate a reliable change index from a normative or reference population (Duff et al., 2017; Rosas et al., 2020; Turner et al., 2016; Utz et al., 2016). The reliable change index can be applied on an individual patient level and compared against the observed change. This results in a z-score that informs about the magnitude of practice effects relative to the expected practice effects. Z-scores greater than 1 indicate greater than expected practice effects. Non-parametric statistics can then be applied to assess between-group differences (Duff et al., 2017). Alternatively, cut-off values can be defined to classify patients into one of three groups: significant improvement, significant worsening or stable response (Duff et al., 2017; Rosas et al., 2020; Turner et al., 2016; Utz et al., 2016). Similarly, regression-based models can be used to predict test scores at retest. Such models can be built either with data obtained from a normative or reference population (Duff et al., 2014, 2018; Duff and Hammers, 2022) or from baseline scores and demographic data of the studied patient population (Duff et al., 2015, 2017). Comparing the predicted with the observed test scores at retest results in a z-score, similar to the reliable change index approach. Finally, slope-intercept models can be fitted to the test scores to estimate the average change over time (Britt et al., 2011).

Estimating the temporal dynamics of practice effects

Besides quantifying the magnitude of practice effects, few studies have also estimated the duration until steady-state performance is reached. In Prince et al. (2018), the steady-state index was computed as the first confirmed test iteration at which the performance reached a predefined threshold. In contrast, Pham et al. (2021) fitted a biphasic, linear + linear learning curve model to the data, with the first phase fitting the practice phase and the second phase the steady-state performance phase. Using non-linear regression, they identified the change point, which marked the end of the practice phase.

Addressing biases

Finally, few analyses attempted to account for various biases. These include accounting for the attrition effect (Elman et al., 2018), which describes the bias introduced by patients lost to follow-up, and for the reduced capacity for practice effects among patients with high test performance at baseline (Gross et al., 2018; Sormani et al., 2019).

Practice effects

Across all four disease areas, certain performance outcome measures, or tests, were more prone to practice effects than others (Table 3). Among those assessing information processing speed, the Paced Auditory Serial Addition Test (PASAT) was most strongly associated with practice effects. A possible explanation is its stronger working memory component and its increased complexity. Among tests of executive function, the Trail-Making Test B (TMT-B) was more likely to produce practice effects than the Stroop Interference Test. The inhibitory processes involved when performing the Stroop Interference Test might make this test less prone to practice effects. On tests of learning and memory or visuospatial memory such the Hopkins Verbal Learning Test (HVLT) or the Brief Visuospatial Memory Test-Revised (BVMT-R), practice effects were more common if the same form was used. This suggest that practice effects are mostly driven by item learning. In addition, on the BVMT-R, delayed recall was more often associated with practice effects than immediate recall. However, this was not observed on the HVLT.

Table 3

Percentage of publications reporting practice effects.

Performance outcome measurea	Functional domain	Practice effectsb
Performance outcome measurea	Functional domain	Continuous or initial	Inconclusive	No
SDMT	Information processing speed, working memory	7	5	5
PASAT	Information processing speed, working memory	13	1	1
TMT-A	Information processing speed	2	4	2
TMT-B	Executive function	3	5	4
Stroop Word	Information processing speed	3	1	2
Stroop Color	Information processing speed	3	0	2
Stroop Interference	Executive function	1	0	4
BVMT-R total recall	Visuospatial memory	3	5	1
BVMT-R delayed recall	Visuospatial memory	5	3	1
CVLT total recall	Learning & memory	2	1	3
CVLT delayed recall	Learning & memory	3	1	2
HVLT total recall	Learning & memory	3	4	0
HVLT delayed recall	Learning & memory	2	3	0
Digit Span	Working memory	2	1	3
Logical Memory	Learning & memory	2	0	4
MMSE	Global cognition	1	2	1
T25FW	Gait	1	0	5
9HPT	Hand-motor function, manual dexterity	4	0	1

9HPT, Nine-Hole Peg Test; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; HVLT, Hopkins Verbal Learning Test; MMSE, Mini-Mental State Examination; PASAT, Paced Auditory Serial Addition Test; SDMT, Symbol Digit Modalities Test; T25FW, Timed 25-Foot Walk; TMT, Trail-Making Test; VR, Visual Reproduction.

Only performance outcome measures reported in at least 4 studies are included in this analysis.

For references, please see Tables 4, 5, 6, and 7.

Percentage of publications reporting practice effects. 9HPT, Nine-Hole Peg Test; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; HVLT, Hopkins Verbal Learning Test; MMSE, Mini-Mental State Examination; PASAT, Paced Auditory Serial Addition Test; SDMT, Symbol Digit Modalities Test; T25FW, Timed 25-Foot Walk; TMT, Trail-Making Test; VR, Visual Reproduction. Only performance outcome measures reported in at least 4 studies are included in this analysis. For references, please see Tables 4, 5, 6, and 7.

Table 4

Practice effects in patients with multiple sclerosis.

Study	Sample size		Study type	Follow-up duration	# test iteration	Practice effects in cohort of interesta				Comment
Study	Cohort of interest	Add. cohort	Study type	Follow-up duration	# test iteration	Continuous effects	Initial effects	Inconclusive	No improvement	Comment
Barker-Collo (2005)	• MS: 30		LO	Single session	2		• PASAT			Practice effects on the PASAT were observed for the 2.0-, 1.6-, and 1.2-second presentation, but not for the 2.4-second presentation.
Benedict et al. (2005)	• MS: 34		LO	1 week	2		• SDMT			Practice effects on the BVMT-R and CVLT were observed only with the same form.
							• PASAT
							• BVMT-R (total and delayed recall)
							• CVLT (total recall, delayed recall)
Benedict et al. (2008)	• MS: 85	• HC: 25	LO	5 months	6	• SDMT				An ANOVA was conducted to investigate the main effect over time among patients with multiple sclerosis.
Bever et al. (1995)	• MS: 19		RCT	16 weeks	5		• PASAT		• SDMT	All patients randomized to the active treatment arm had been off the study drug (3,4-diaminopyridine) for at least 30 days at the time of each evaluation.
Cohen et al. (2000)	• MS: 10		LO	6 months	8		• PASAT		• T25FW
							• 9HPT
Cohen et al. (2001)	• MS: 436		RCT	28 days	4	• PASAT	• 9HPT		• T25FW	Practice effects were assessed during a run-in period prior to randomization.
Erlanger et al. (2014)	• MS: 59		LO	45 days	2			• SDMT		Results are reported for a composite score.
								• PASAT
								• BVMT-R (delayed and total recall)
Eshaghi et al. (2012)	• MS: 41		LO	Mean (SD) of 10.8 (3.78) days	2		• PASAT	• SDMT	• BVMT-R (total recall)	A total of 158 patients were recruited, of which 41 were included in the practice effects analysis. A trend towards improvement was observed on the SDMT.
							• BVMT-R (delayed recall)		• CVLT (total and delayed recall)
Fuchs et al. (2020)	• MS: 531		LO	16 years	≤10				• SDMT
Gallus and Mathiowetz (2003)	• MS: 35		LO	1 week	2		• Purdue Pegboard: Sum of three trials (bimanual)		• Purdue Pegboard: One trial (dominant hand, non-dominant hand, bimanual, assembly)
									• Purdue Pegboard: Sum of three trials (dominant hand, non-dominant hand, assembly)
Giedraitiene and Kubrys (2019)	• Relapsing MS: 30	• Stable MS: 30	LO	3 months	3			• CANTAB: One Touch Stockings of Cambridge		Practice effects were only assessed in patients with relapsing MS.
		• HC: 30						• CANTAB: Spatial Working Memory		Functional recovery and practice effects may have occurred concurrently in relapsing MS.
								• CANTAB: Reaction Time
								• CANTAB: Paired Associates Learning
Glanz et al. (2012)	• MS: 69		LO	5 years	7	• PASAT	• SDMT
	• CIS: 21
Merlo et al. (2019)	• MS: 328	• HC: 30	LO	18 months	≤10		• MSReactor: Simple Reaction Time, Choice Reaction Time, One Back			A total of 450 patients with MS were recruited and completed their baseline assessment, practice effects were assessed in a subset of 328 patients with MS who completed up to 10 assessments.
Meyer et al. (2020)	• MS: 10	• HC: 40	LO	4–5 weeks	4		• T25FW		• 2MWT	Practice effects are reported for 8 patients with MS; 2 patients with MS were excluded due to muscle exhaustion/pain).
							• TUG
Nagels et al. (2008)	• MS: 110		LO	Single session	2		• PASAT
Patzold et al. (2002)	• MS untreated controls: 10	• MS receiving steroid therapy for acute relapse: 27	NRI	20 days	3				• PASAT
									• T25FW
									• 9HPT
Pham et al. (2021)	• MS: 15		LO	≥20 weeks	≥20		• Smartphone-based SDMT			A total of 154 patients and 39 healthy controls were recruited, of which 15 patients and 1 healthy control were included in the practice effects analysis.
	• HC: 1
Pliskin et al. (1996)b	• MS with high-dose IFN-β-1b: 9		RCT	2 years	2		• Stroop Word Test	• WMS: Visual Reproduction (delayed recall)	• TMT-B	Main effect of time was observed for improvement on Stroop Word Test and WMS Visual Reproduction (immediate recall); improvement on WMS Visual Reproduction (delayed recall) was associated with high-dose IFN-β-1b.
	• MS with low-dose IFN-β-1b: 8						• WMS: Visual Reproduction (immediate recall)		• Stroop Color Test
	• MS with placebo: 13								• Stroop Interference Test
									• WMS: Logical Memory
									• Purdue Pegboard
Rao et al. (2020)	• MS: 30	• HC: 30	LO	Single session	2		• MSPT: Manual Dexterity Test		• MSPT: Contrast Sensitivity Test
									• MSPT Walking Speed Test
Reilly and Hynes (2018)	• MS receiving cognitive rehabilitation: 12		NRI	18 weeks	3			• SDMT	• BVMT-R (delayed recall)	Observed improvements on were associated with cognitive rehabilitation; improvements on the SDMT and the TMT-A did not reach statistical significance.
								• TMT-A
								• TMT-B
								• BVMT-R (total recall)
								• CVLT (total recall, short delayed recall, long delayed recall)
Rosti-Otajärvi et al. (2008)	• MS: 10	• HC: 10	LO	4 weeks	5		• PASAT		• T25FW
							• 9HPT
Ruano et al. (2020)	• MS: 30	• HC: 30	LO	3 months	4	• Brain on Track: Opposite Task	• Brain on Track: Attention III		• Brain on Track: Delayed Verbal Memory
						• Brain on Track: Sequences	• Brain on Track: Visual Memory II		• Brain on Track: Word Categories
							• Brain on Track: Calculus		• Brain on Track: Verbal Memory II
							• Brain on Track: Color Interference
							• Brain on Track: Written Comprehension
							• Brain on Track: Puzzles
Schwid et al. (2007)	• MS: 153 (pooled analysis of 74 patients initially allocated to placebo and 79 patients initially allocated to glatiramer acetate)		RCT with OLE	10 years	3		• SDMT			A total of 251 patients were initially randomized, of whom 153 had 10-year follow-up data and were included in the analyses. Improvements at year 2 were independent of initial treatment allocation.
							• PASAT
Solari et al. (2005)	• MS: 32		LO	Single session	6		• PASAT		• T25FW
							• 9HPT
Sormani et al. (2019)	• MS: 1,009 (randomized 1:1:1 to receive either fingolimod 0.5 or 1.25 mg once daily or placebo)		RCT	2 weeks	3		• PASAT			Practice effects were assessed during a run-in period prior to randomization.
Utz et al. (2016)	• MS: 44 (pooled analysis of 22 patients receiving fingolimod, 11 natalizumab, 7 interferon and 1 glatiramer acetate)		NRI	1 year	3		• PASAT		• WMS: Digit Span	Initially 73 patients were recruited, of whom 41 had follow-up data and did not switch therapy.
									• WMS: Spatial Span
									• WMS: Logical Memory
									• Visual Search
Vogt et al. (2009)	• MS with high-intensity cognitive training: 15		NRI	4–8 weeks	3		• SDMT	• PASAT	WMS: Digit Span (forward)	Improvements on PASAT and Digit Span (backwards) were associated with additional cognitive training.
	• MS with distributed training: 15							• WMS: Digit Span (backwards)
	• MS controls: 15

9HPT, Nine-Hole Peg Test; Add., additional; Approx., approximately; BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CVLT, California Verbal Learning Test; HC, healthy controls; MS, multiple sclerosis; MSPT, Multiple Sclerosis Performance Test; PASAT, Paced Auditory Serial Addition Test; RCT, randomized controlled trial; SDMT, Symbol Digit Modalities Test; T25FW, Timed 25-Foot Walk; TMT-A/B, Trail-Making Test A/B; TUG, Timed Up and Go; WMS, Wechsler Memory Scale.

‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (for example, due to contribution of other tests in composite scores, or association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed.

Results of the repeated assessments were not consistently reported for the placebo cohort; hence outcomes for the total cohort are reported.

Table 5

Practice effects in patients with Parkinson's disease.

Study	Sample size		Study type	Follow-up duration	# test iterations	Practice effects in cohort of interesta				Comment
Study	Cohort of interest	Add. cohort	Study type	Follow-up duration	# test iterations	Continuous effects	Initial effects	Inconclusive	No improvement	Comment
Benninger et al. (2011)	• PD receiving sham intervention: 13		RCT	1 month	3		• Serial Reaction Time			Practice effects were independent of the intervention.
	• PD receiving iTBS: 13
Benninger et al. (2012)	• PD receiving sham intervention: 13		RCT	1 month	3		• Serial Reaction Time			Practice effects were independent of the intervention.
	• PD receiving rTMS: 13
Buelow et al. (2015)	• PD receiving placebo: 20	• PD receiving galantamine hydrobromide ER: 33	RCT	10–16 weeks	2				• Serial Reaction Time	Practice effects were only assessed in the placebo cohort.
Higginson et al. (2009)	• PD: 22		NRI	Mean (SD) of 15.7 (5.6) months	2				• CVLT (total recall, delayed recall)
Prince et al. (2018)	• PD: 312 (Tapping test)	• YHC: 150 (Tapping test); 10 (Memory test)	LO	6 months	≥20 Tapping tests; ≥ 10 memory tests		• Tapping test
	• PD: 97 (Memory test)	• HC: 86 (Tapping test); 14 (Memory test)					• Memory test
Turner et al. (2016)	• PD with MCI receiving placebo: 15	• PD with MCI receiving atomoxetine: 15	RCT	10 weeks	2				• WAIS: Similarities test	Practice effects were only assessed in the placebo cohort.
									• WMS: Digit Span test
Westin et al. (2010)	• PD receiving duodenal levodopa/carbidopa: 65		NRI	1–6 weeks	28–168 (4x per day)				• Hand Computer Tapping Test	No difference was observed between first three days and remaining days.

Add., additional; CVLT, California Verbal Learning Test; HC, healthy controls; iTBS, intermittent theta-burst stimulation; LO, longitudinal observational; MCI, mild cognitive impairment; NRI, non-randomized interventional; PD, Parkinson's disease; RCT, randomized controlled trial; rTMS, repetitive transcranial magnetic stimulation; SDMT, Symbol Digit Modalities Test; WAIS, Wechsler Adult Intelligence Scale; YHC, young healthy controls.

‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed.

Table 6

Practice effects in patients with Huntington's disease.

Study	Sample size		Study type	Follow-up duration	# test iterations	Practice effects in cohort of interesta				Comment
Study	Cohort(s) of interest	Add. cohort(s)	Study type	Follow-up duration	# test iterations	Continuous effects	Initial effects	Inconclusive	No improvement	Comment
Bachoud-Lévi et al. (2001)	• HD: 22		LO	2–4 years	3–5		• TMT-B
							• WAIS: Digit Span (backwards)
Beglinger et al. (2014a)	• HD: 34 (randomized 1:1 to receive citalopram or placebo)		RCT	≤24 hours and ≥6 days (mean [SE]: 20.4 [2.2] days)	2		• SDMT		• Stroop Word Test	Practice effects were assessed prior to randomization. Initial effects on the SDMT were observed only with longer inter-test interval.
							• Stroop Color Test
							• Stroop Interference Test
Beglinger et al. (2014b)	• HD receiving placebo: 15	• HD receiving 20 mg citalopram: 16	RCT	20 weeks	6			• SDMT		Results are reported for a composite score.
								• TMT-B
								• Stroop Word Test
								• Stroop Color Test
								• Stroop Interference Test
								• WAIS: Letter-Number Sequencing
Duff et al. (2007)	• HD: 170		LO	Mean (SD) of 220 (122) days	2				• SDMT
									• Stroop Interference Test
Snowden et al. (2001)	• HD: 87	• Unaffected controls: 55	LO	1–3 years	2–4				• Stroop Word Test
									• Stroop Color Test
									• Stroop Interference Test
									• WAIS: Digit Span
Stout et al. (2014)	• HD: 56	• HC: 105	LO	5–7 weeks	3		• SDMT			Practice effects were only assessed in patients with HD or pre-HD, but not in HC. Practice effects on the TMT-A were observed only in HD patients, while practice effects on the Speed Tapping and Paced Tapping tests were only observed in pre-HD, patients.
	• Pre-HD: 103						• TMT-A
							• TMT-B
							• Stroop Word Test
							• HVLT
							• Speeded Tapping
							• Paced Tapping
Toh et al. (2014)	• HD: 22	• HC: 22	LO	12 months	2				• SDMT	Composite scores were computed based on performances on the SDMT, TMT-A/B, Stroop Test, BVMT-R, CVLT and Digit Span test.
									• TMT-A
									• TMT-B
									• Stroop Word Test
									• Stroop Color Test
									• Stroop Interference Test
									• BVMT-R
									• CVLT
									• WAIS: Digit Span
									• MMSE

Add., additional; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; HC, healthy controls; HD, Huntington's disease; HVLT, Hopkins Verbal Learning Test; LO, longitudinal observational; MMSE, Mini-Mental State Examination; pre-HD, pre-manifest Huntington's disease; RCT, randomized controlled trial; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test; WAIS, Wechsler Adult Intelligence Scale.

‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed.

Table 7

Practice effects in patients with either mild cognitive impairment, Alzheimer's disease or other forms of dementia.

Study	Sample size		Study type	Follow-up duration	# test iterations	Practice effects in cohort of interesta				Comment
Study	Cohort of interest	Add. cohort	Study type	Follow-up duration	# test iterations	Continuous effects	Initial effects	Inconclusive	No improvement	Comment
Britt et al. (2011)	• MCI: 48b	• HC: 36b	LO	60 months	2–8				• TMT-B
	• AD: 28b								• WMS: Logical Memory
Campos-Magdaleno et al. (2017)	• Multi-domain MCI: 21
	• Single-domain MCI: 46		LO	18 months	2		• CVLT (total recall, short delayed free recall, short delayed cued recall, long delayed cued recall, long delayed free recall)			Practice effects for total recall and long delayed free recall were observed in only patients with SMC, but not in patients with MCI.
	• SMC: 207
Claus et al. (1991)	• AD: 17	• HC: 16	LO	2 weeks	3				• WMS: Logical Memory
									• WMS: Paired Associations
Duff and Hammers, 2022	• MCI: 93		LO	Mean (SD) of 1.3 (0.1) years	2			• SDMT		All observed follow-up scores were compared against predicted scores.
								• TMT-A
								• TMT-B
								• BVMT-R (total and delayed recall)
								• HVLT (total and delayed recall)
Duff et al. (2007)	• MCI: 8		LO	2 weeks	2			• BVMT-R (total recall)		Lack of statistical testing.
								• MMSE		Lack of statistical testing.
Duff et al. (2011)	• MCI: 51	HC: 57	LO	1 week	2			• SDMT		Lack of statistical testing.
								• TMT-A
								• TMT-B
								• BVMT-R (total and delayed recall)
								• HVLT (total and delayed recall)
Duff et al. (2012)	• Dementia, MCI, AD: 61		LO	Single session	2			• HVLT		Lack of statistical testing.
								• WAIS: Coding
Duff et al. (2014)	• MCI: 10		LO	1 week	2		• BVMT-R (delayed recall)
	• HC: 15
Duff et al. (2015)	• MCI: 10	• HC: 15	LO	Approx. 1 week	2			• SDMT		Lack of statistical testing.
								• TMT-A
								• TMT-B
								• BVMT-R (total and delayed recall)
								• HVLT (total and delayed recall)
Duff et al. (2017)	• MCI: 58		LO	1 week	2		• BVMT-R (total and delayed recall)		• SDMT
							• HVLT (total and delayed recall)		• TMT-A
									• TMT-B
Duff et al. (2018)	• MCI: 17		LO	Approx. 1 week	2		• BVMT-R (total and delayed recall)		• SDMT
	• HC: 8						• HVLT (total and delayed recall)		• TMT-A
									• TMT-B
Elman et al. (2018)	• MCI and HC: 995c		LO	6 years	2		• Stroop Color Test	• Stroop Word Test	• Stroop Interference Test	A trend towards improvement was observed on the Stroop Word Test, Digit Span (forwards condition only), Visual Reproduction Test (immediate recall only), and the Matrix Reasoning tests.
							• CVLT (total and short delayed recall)	• WMS: Digit Span (forwards)	• CVLT (long delayed recall)
							• WMS: Digit Span (backwards)	• WMS: Visual Reproduction (immediate recall)
							• WMS: Spatial Span (total and backwards)	• WASI: Matrix Reasoning
							• WMS: Letter-Number Sequencing
							• WMS: Logical Memory (immediate and delayed recall)
							• WMS: Visual Reproduction (delayed recall)
Frank et al. (1996)	• AD: 56	• HC: 242	LO	Approx. 2.4 years	2		• WMS: Visual Reproduction (immediate and delayed recall; at risk for AD only)	• TMT-B	• WMS: Visual Reproduction (immediate recall; AD only)	Practice effects on the Visual Reproduction test were only observed in patients with MCI, but not in patients at risk of developing AD. On the TMT-B, Visual Reproduction (delayed recall in AD patients) and MMSE, practice effects were only observed in specific sex subgroups (male vs female).
	• At risk for AD: 82							• WMS: Visual Reproduction (delayed recall; AD only)
								• MMSE
Gavett et al. (2016)	• MCI: 72	• HC: 96	LO	5 years	5	• WMS: Logical Memory (immediate and delayed recall)				Practice effects were only observed in patients with MCI, but not in patients with AD.
	• AD: 121
Gross et al. (2018)	• AD: 990		LO	2.4 years	≤7	• MMSE				A random effects model analysis revealed an overall main retest (practice) effect over time.
Hammers et al. (2011)	• MCI: 20	• HC: 23	LO	Single session	2		• CogState: OBK accuracy (all cohorts)		• CogState: Detection	Practice effects on the OBK reaction time task were observed only in patients with MCI or DLB, while practice effects on the IDM reaction time task were observed only in patients with DLB.
	• AD: 52						• CogState: OBK reaction time		• CogState: Identification
	• Dementia (incl. DLB, FTD): 19						• CogState: IDM reaction time		• CogState: One Card Learning
									• CogState: Associative Learning
Rosas et al. (2020)	• MCI: 270d	• HC: 46d	LO	Mean (SD) of 25.96 (11.28) months	2		• TMT-A			Practice effects were indirectly inferred from reliable change index analyses.
	• SCD: 42d						• TMT-B
							• Stoop Word Test
							• Stroop Color Test
							• WAIS: Digit Symbol
Teasdale et al. (2016)	• MCI: 15		LO	6 months	6			• Driving simulator		Practice effects observed only during training phase during which feedback was given.

AD, Alzheimer's disease; Approx., approximately; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; DLB, dementia with Lewis Bodies; FTD, frontotemporal dementia; HC, healthy controls; HVLT, Hopkins Verbal Learning Test; IDM, divided attention task; MCI, mild cognitive impairment; MMSE, Mini-Mental State Examination; OBK, One-Back Test; SCD, subjective cognitive decline; SDMT, Symbol Digit Modalities Test; SMC, subjective memory complaint; TMT, Trail-Making Test; WAIS, Wechsler Adult Intelligence Scale; WASI, Wechsler Abbreviated Scale of Intelligence; WMS, Wechsler Memory Scale.

‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed.

Based on clinical rating at the end of the study.

Data from 1,220 and 995 patients were available for visit 1 and 2, of which 11.0% and 15.2% (after correcting for practice effects) were diagnosed with mild cognitive impairment, respectively.

At follow-up, 48 participants were diagnosed with Alzheimer's disease, 200 with mild cognitive impairment and 64 with subjective cognitive decline, while 46 participants were considered as cognitively healthy.

Practice effects in patients with multiple sclerosis

The literature search revealed 27 studies on practice effects in multiple sclerosis. Details on study design and presence of practice effects are summarized in Table 4. Effect sizes (Cohen's d) are depicted in Figure 3.

Figure 3

Effect sizes (Cohen's d, unless otherwise noted) for observed changes between test iterations in patients with multiple sclerosis. Studies (horizontal axis) that reported effect sizes for individual performance outcome measures are shows in the figure. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Dark green dots () indicate continuous practice effects, light green dots () initial practice effects, yellow dots () inconclusive effects and red dots () absence of practice effects, as defined in Table 4. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively, and apply to Cohen's d only (Cohen, 1992). ∗ Partial η2. BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CST, Contrast Sensitivity Test; CVLT, California Verbal Learning Test; MDT-DH, Dominant-handed Manual Dexterity Test; MDT-NDH, Non-dominant-handed Manual Dexterity Test; MSPT, Multiple Sclerosis Performance Test; OTS-MC6, One Touch Stockings of Cambridge with 6 moves; PAL-TE8, Total error at 8-figure stage of the Paired Associates Learning; PASAT, Paced Auditory Serial Addition Test; RTI-FM, Five-choice movement time; RTI-FR, Five-choice reaction time; RTI-SR, Simple reaction time; SDMT, Symbol Digit Modalities Test; SWM-TE8, Total error for 8 boxes of Spatial Working Memory; WST, Walking Speed Test.

Practice effects in patients with multiple sclerosis. MS: 30 PASAT MS: 34 SDMT PASAT BVMT-R (total and delayed recall) CVLT (total recall, delayed recall) MS: 85 HC: 25 SDMT MS: 19 PASAT SDMT MS: 10 PASAT T25FW 9HPT MS: 436 PASAT 9HPT T25FW MS: 59 SDMT PASAT BVMT-R (delayed and total recall) MS: 41 PASAT SDMT BVMT-R (total recall) BVMT-R (delayed recall) CVLT (total and delayed recall) MS: 531 SDMT MS: 35 Purdue Pegboard: Sum of three trials (bimanual) Purdue Pegboard: One trial (dominant hand, non-dominant hand, bimanual, assembly) Purdue Pegboard: Sum of three trials (dominant hand, non-dominant hand, assembly) Relapsing MS: 30 Stable MS: 30 CANTAB: One Touch Stockings of Cambridge HC: 30 CANTAB: Spatial Working Memory CANTAB: Reaction Time CANTAB: Paired Associates Learning MS: 69 PASAT SDMT CIS: 21 MS: 328 HC: 30 MSReactor: Simple Reaction Time, Choice Reaction Time, One Back MS: 10 HC: 40 T25FW 2MWT TUG MS: 110 PASAT MS untreated controls: 10 MS receiving steroid therapy for acute relapse: 27 PASAT T25FW 9HPT MS: 15 Smartphone-based SDMT HC: 1 MS with high-dose IFN-β-1b: 9 Stroop Word Test WMS: Visual Reproduction (delayed recall) TMT-B MS with low-dose IFN-β-1b: 8 WMS: Visual Reproduction (immediate recall) Stroop Color Test MS with placebo: 13 Stroop Interference Test WMS: Logical Memory Purdue Pegboard MS: 30 HC: 30 MSPT: Manual Dexterity Test MSPT: Contrast Sensitivity Test MSPT Walking Speed Test MS receiving cognitive rehabilitation: 12 SDMT BVMT-R (delayed recall) TMT-A TMT-B BVMT-R (total recall) CVLT (total recall, short delayed recall, long delayed recall) MS: 10 HC: 10 PASAT T25FW 9HPT MS: 30 HC: 30 Brain on Track: Opposite Task Brain on Track: Attention III Brain on Track: Delayed Verbal Memory Brain on Track: Sequences Brain on Track: Visual Memory II Brain on Track: Word Categories Brain on Track: Calculus Brain on Track: Verbal Memory II Brain on Track: Color Interference Brain on Track: Written Comprehension Brain on Track: Puzzles MS: 153 (pooled analysis of 74 patients initially allocated to placebo and 79 patients initially allocated to glatiramer acetate) SDMT PASAT MS: 32 PASAT 9HPT MS: 1,009 (randomized 1:1:1 to receive either fingolimod 0.5 or 1.25 mg once daily or placebo) PASAT MS: 44 (pooled analysis of 22 patients receiving fingolimod, 11 natalizumab, 7 interferon and 1 glatiramer acetate) PASAT WMS: Digit Span WMS: Spatial Span WMS: Logical Memory Visual Search MS with high-intensity cognitive training: 15 SDMT PASAT MS with distributed training: 15 WMS: Digit Span (backwards) MS controls: 15 9HPT, Nine-Hole Peg Test; Add., additional; Approx., approximately; BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CVLT, California Verbal Learning Test; HC, healthy controls; MS, multiple sclerosis; MSPT, Multiple Sclerosis Performance Test; PASAT, Paced Auditory Serial Addition Test; RCT, randomized controlled trial; SDMT, Symbol Digit Modalities Test; T25FW, Timed 25-Foot Walk; TMT-A/B, Trail-Making Test A/B; TUG, Timed Up and Go; WMS, Wechsler Memory Scale. ‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (for example, due to contribution of other tests in composite scores, or association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed. Results of the repeated assessments were not consistently reported for the placebo cohort; hence outcomes for the total cohort are reported. Effect sizes (Cohen's d, unless otherwise noted) for observed changes between test iterations in patients with multiple sclerosis. Studies (horizontal axis) that reported effect sizes for individual performance outcome measures are shows in the figure. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Dark green dots () indicate continuous practice effects, light green dots () initial practice effects, yellow dots () inconclusive effects and red dots () absence of practice effects, as defined in Table 4. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively, and apply to Cohen's d only (Cohen, 1992). ∗ Partial η2. BVMT-R, Brief Visuospatial Memory Test-Revised; CANTAB, Cambridge Neuropsychological Test Automated Battery; CST, Contrast Sensitivity Test; CVLT, California Verbal Learning Test; MDT-DH, Dominant-handed Manual Dexterity Test; MDT-NDH, Non-dominant-handed Manual Dexterity Test; MSPT, Multiple Sclerosis Performance Test; OTS-MC6, One Touch Stockings of Cambridge with 6 moves; PAL-TE8, Total error at 8-figure stage of the Paired Associates Learning; PASAT, Paced Auditory Serial Addition Test; RTI-FM, Five-choice movement time; RTI-FR, Five-choice reaction time; RTI-SR, Simple reaction time; SDMT, Symbol Digit Modalities Test; SWM-TE8, Total error for 8 boxes of Spatial Working Memory; WST, Walking Speed Test. Repeated assessment of information processing speed was likely to result in practice effects. Both the traditional, clinician-administered as well as the smartphone-based Symbol Digit Modalities Test (SDMT) produced practice effects in most studies, although they were minimal and smaller than observed in healthy controls (Cohen's d: 0.2 vs 0.8) (Benedict, 2005; Benedict et al., 2008; Eshaghi et al., 2012; Glanz et al., 2012; Pham et al., 2021; Reilly and Hynes, 2018; Schwid et al., 2007; Vogt et al., 2009). Only Fuchs et al. (2020) and Bever et al. (1995) noted an absence of practice effects. However, all patients were previously exposed to the SDMT prior to enrolment, which might have impacted the ability to detect practice effects (Fuchs et al., 2020). Practice effects were also common on the PASAT, both with short inter-test intervals (every two weeks or shorter) and long inter-test intervals (every month or longer) (Barker-Collo, 2005; Benedict, 2005; Bever et al., 1995; Cohen et al., 2000, 2001; Eshaghi et al., 2012; Glanz et al., 2012; Nagels et al., 2008; Rosti-Otajärvi et al., 2008; Schwid et al., 2007; Solari et al., 2005; Sormani et al., 2019; Utz et al., 2016). Compared with the SDMT, PASAT-related practice effects were larger in magnitude (Cohen's d: 0.3–0.4 vs 0.2) (Benedict, 2005; Eshaghi et al., 2012). A trend towards improved test scores was also noted on the TMT-A in Reilly and Hynes (2018); however, all patients underwent cognitive rehabilitation prior to retest. Practice effects were also discernable on the Word, but not Color subtest of the Stroop Test (Pliskin et al., 1996). There was little evidence of practice effects associated with executive function. On the TMT-B, subtle practice effects were observed in Reilly and Hynes (2018), but not in Pliskin et al. (1996). By comparison, repeated testing with the Stroop Interference test did not result in practice effects (Pliskin et al., 1996). Practice effects were common on assessments of learning and memory or visuospatial memory. Both the California Verbal Learning Test (CVLT) and the BVMT-R produced practice effects. This was particularly evident if the same form was used (Benedict, 2005; Eshaghi et al., 2012). On the Visual Reproduction test, improvement in performance was independent of treatment allocation only on immediate recall but not on delayed recall (Pliskin et al., 1996). On the latter, improved test scores were observed only in patients receiving high-dose interferon-β. Finally, the digital Visual Search test was not associated with practice effects (Utz et al., 2016). On the Digit Span test, a measure of working memory, improved test scores were only observed in the backward condition (Vogt et al., 2009). However, this improvement was associated with additional cognitive training. On digital cognitive batteries, practice effects were observed on the Brain on Track test (Ruano et al., 2020), the MSReactor (Merlo et al., 2019) and the Cambridge Neuropsychological Test Automated Battery (CANTAB) (Giedraitiene and Kaubrys, 2019), with larger practice effects associated with more demanding tasks (Giedraitiene and Kaubrys, 2019; Merlo et al., 2019). On gait and balance tests, short-term practice effects were reported on the Timed Up and Go (Meyer et al., 2020). In contrast, most studies showed an absence of practice effects on the Timed 25-Foot Walk (T25FW) (Cohen et al., 2000, 2001; Patzold et al., 2002; Rosti-Otajärvi et al., 2008; Solari et al., 2005), Two-Minute Walk Test (2MWT) (Meyer et al., 2020) and the digital Walking Speed Test (Rao et al., 2020); only Meyer et al. (2020) demonstrated discernable practice effects on the T25FW. On assessments of hand-motor function, practice effects were observed on the digital Manual Dexterity Test (Rao et al., 2020) and Nine-Hole Peg Test (9HPT) (Cohen et al., 2000, 2001; Rosti-Otajärvi et al., 2008; Solari et al., 2005). However, in Patzold et al. (2002), 9HPT-related improvements were only observed in those patients receiving active treatment for acute relapse. By comparison, practice effects were unlikely to occur on the Purdue Pegboard, especially when each hand was considered separately (Gallus and Mathiowetz, 2003; Pliskin et al., 1996). Finally, there was no evidence of practice effects on the Contrast Vision Test (Rao et al., 2020).

Practice effects in patients with Parkinson's disease

The literature search revealed seven studies on practice effects in Parkinson's disease. Details on study design and presence of practice effects are summarized in Table 5. Only one study reported effect sizes (Cohen's d), which are depicted in Figure 4.

Figure 4

Effect sizes (Cohen's d) for observed changes between test iterations in patients with Parkinson's disease. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Red dots () absence of practice effects, as defined in Table 5. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively (Cohen, 1992). CVLT, California Verbal Learning Test. ∗Effect size indicates a worsening in test performance.

Practice effects in patients with Parkinson's disease. PD receiving sham intervention: 13 Serial Reaction Time PD receiving iTBS: 13 PD receiving sham intervention: 13 Serial Reaction Time PD receiving rTMS: 13 PD receiving placebo: 20 PD receiving galantamine hydrobromide ER: 33 Serial Reaction Time PD: 22 CVLT (total recall, delayed recall) PD: 312 (Tapping test) YHC: 150 (Tapping test); 10 (Memory test) Tapping test PD: 97 (Memory test) HC: 86 (Tapping test); 14 (Memory test) Memory test PD with MCI receiving placebo: 15 PD with MCI receiving atomoxetine: 15 WAIS: Similarities test WMS: Digit Span test PD receiving duodenal levodopa/carbidopa: 65 Hand Computer Tapping Test Add., additional; CVLT, California Verbal Learning Test; HC, healthy controls; iTBS, intermittent theta-burst stimulation; LO, longitudinal observational; MCI, mild cognitive impairment; NRI, non-randomized interventional; PD, Parkinson's disease; RCT, randomized controlled trial; rTMS, repetitive transcranial magnetic stimulation; SDMT, Symbol Digit Modalities Test; WAIS, Wechsler Adult Intelligence Scale; YHC, young healthy controls. ‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed. Effect sizes (Cohen's d) for observed changes between test iterations in patients with Parkinson's disease. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Red dots () absence of practice effects, as defined in Table 5. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively (Cohen, 1992). CVLT, California Verbal Learning Test. ∗Effect size indicates a worsening in test performance. Most of these studies did not reveal any practice effects, whether on the CVLT (Higginson et al., 2009), the Digit Span Test (Turner et al., 2016), the Similarities Test (Turner et al., 2016) or the digital Tapping Test (Westin et al., 2010). On the Serial Reaction Time test, two studies revealed improvements, or reduced reaction times, that suggest practice effects (Benninger et al., 2011, 2012). However, Buelow et al. (2015) noted a worsening at retest. This does not preclude the possibility that a subgroup of patients show signs of practice effects. In fact, Prince et al. (2018) identified three subgroups of patients on both the Alternating Tapping Test and Memory Test included in the mPower dataset: those who improved over time by at least 20%, those who deteriorated over time by at least 20% and those who remained stable.

Practice effects in patients with Huntington's disease

The literature search revealed seven studies on practice effects in Huntington's disease. Details on study design and presence of practice effects are summarized in Table 6. Only one study reported effect sizes (Cohen's d), which are depicting in Figure 5.

Figure 5

Repeated-measures effect sizes of the observed changes between test iterations in Huntington's disease obtained from Stout et al. (2014). Light green dots () indicate initial practice effects. Red dots () indicate an absence of practice effects, as defined in Table 6. ∗ Effect size reported for the change observed between the second and third test iteration rather than between the first and second test iteration. CVLT, California Verbal Learning Test; HD, Huntington's disease; HVLT, Hopkins Verbal Learning Test; pre-HD, pre-manifest Huntington's disease; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test.

Practice effects in patients with Huntington's disease. HD: 22 TMT-B WAIS: Digit Span (backwards) HD: 34 (randomized 1:1 to receive citalopram or placebo) SDMT Stroop Word Test Stroop Color Test Stroop Interference Test HD receiving placebo: 15 HD receiving 20 mg citalopram: 16 SDMT TMT-B Stroop Word Test Stroop Color Test Stroop Interference Test WAIS: Letter-Number Sequencing HD: 170 SDMT Stroop Interference Test HD: 87 Unaffected controls: 55 Stroop Word Test Stroop Color Test Stroop Interference Test WAIS: Digit Span HD: 56 HC: 105 SDMT Pre-HD: 103 TMT-A TMT-B Stroop Word Test HVLT Speeded Tapping Paced Tapping HD: 22 HC: 22 SDMT TMT-A TMT-B Stroop Word Test Stroop Color Test Stroop Interference Test BVMT-R CVLT WAIS: Digit Span MMSE Add., additional; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; HC, healthy controls; HD, Huntington's disease; HVLT, Hopkins Verbal Learning Test; LO, longitudinal observational; MMSE, Mini-Mental State Examination; pre-HD, pre-manifest Huntington's disease; RCT, randomized controlled trial; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test; WAIS, Wechsler Adult Intelligence Scale. ‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed. Repeated-measures effect sizes of the observed changes between test iterations in Huntington's disease obtained from Stout et al. (2014). Light green dots () indicate initial practice effects. Red dots () indicate an absence of practice effects, as defined in Table 6. ∗ Effect size reported for the change observed between the second and third test iteration rather than between the first and second test iteration. CVLT, California Verbal Learning Test; HD, Huntington's disease; HVLT, Hopkins Verbal Learning Test; pre-HD, pre-manifest Huntington's disease; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test. The practice effects analyses revealed mostly mixed results. On the SDMT, for example, practice effects were observed in two studies, with patients with pre-manifest Huntington's disease showing larger practice effects than patients with manifest Huntington's disease (Beglinger et al., 2014a; Stout et al., 2014). However, one study did not find any discernable practice effects (Duff et al., 2007). On the TMT-A, larger practice effects were observed in patients with manifest Huntington's disease as opposed to pre-manifest Huntington's disease (Stout et al., 2014). Mixed results were also obtained on the Stroop Word Test (Beglinger et al., 2014a; Snowden et al., 2001; Stout et al., 2014) and the Stroop Color Test (Beglinger et al., 2014a; Snowden et al., 2001). By comparison, the initial improvement on the TMT-B observed within 1–3 days was of similar magnitude in patients with pre-manifest or with manifest Huntington's disease (Stout et al., 2014). When assessed annually, the initial gain was followed by a decline in performance, which likely reflected a progression of the disease (Bachoud-Lévi et al., 2001). Analyses of the Stroop Interference Test revealed mixed results (Beglinger et al., 2014a; Duff et al., 2007; Snowden et al., 2001). On the HVLT, practice effects were found for patients with either pre-manifest or manifest Huntington's disease (Stout et al., 2014). On the Speeded Tapping Test and Paced Tapping Test, however, practice effects were observed in patients with pre-manifest but not with manifest Huntington's disease (Stout et al., 2014). The Digit Span test was generally not associated with practice effects, in particular in the forward condition (Bachoud-Lévi et al., 2001; Snowden et al., 2001); although, practice effects were reported in the backward condition (Bachoud-Lévi et al., 2001). Similarly, the repeated testing with the Mini-Mental State Examination (MMSE) did not result in practice effects (Toh et al., 2014). Few studies also studied practice effects for composite scores. Practice effects were observed for a composite score that included the Letter-Number Sequencing Test (Beglinger et al., 2014b). Mixed results were obtained for composite scores that included either the SDMT, the TMT or the Stroop Test (Beglinger et al., 2014b; Toh et al., 2014). Finally, neither the BVMT-R, CVLT nor Digit Span Test were associated with practice effects when included in composite scores (Toh et al., 2014).

Practice effects in patients with mild cognitive impairment, Alzheimer's disease or other forms of dementia

The literature search revealed 18 studies on practice effects analyses in mild cognitive impairment, Alzheimer's disease and other forms of dementia. Details on study design and presence of practice effects are summarized in Table 7, and effects sizes (Cohen's d) are depicted in Figure 6.

Figure 6

Effect sizes (Cohen's d, unless otherwise noted) for observed changes between test iterations in patients with mild cognitive impairment, Alzheimer's disease or other forms of dementia. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Dark green dots () indicate continuous practice effects, light green dots () initial practice effects, yellow dots () inconclusive effects and red dots () absence of practice effects, as defined in Table 7. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively, and apply to Cohen's d only (Cohen, 1992). ∗η2. BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; DLB, dementia with Lewis bodies; HVLT, Hopkins Verbal Learning Test; IDM, divided attention task; LDFR, long delayed free recall; LM, Logical Memory; LNS, Letter-Number Sequencing; MCI, mild cognitive impairment; mdMCI, multi-domain mild cognitive impairment; MMSE, Mini-Mental State Examination; OBK, One-Back Test; SDFR, short delayed free recall; sdMCI, single-domain mild cognitive impairment; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test; TR, total recall; VR, Visual Reproduction; WAIS, Wechsler Adult Intelligence Scale; WMS, Wechsler Memory Scale.

Practice effects in patients with either mild cognitive impairment, Alzheimer's disease or other forms of dementia. MCI: 48b HC: 36b TMT-B AD: 28b WMS: Logical Memory Multi-domain MCI: 21 Single-domain MCI: 46 CVLT (total recall, short delayed free recall, short delayed cued recall, long delayed cued recall, long delayed free recall) SMC: 207 AD: 17 HC: 16 WMS: Logical Memory WMS: Paired Associations MCI: 93 SDMT TMT-A TMT-B BVMT-R (total and delayed recall) HVLT (total and delayed recall) MCI: 8 BVMT-R (total recall) MMSE MCI: 51 SDMT TMT-A TMT-B BVMT-R (total and delayed recall) HVLT (total and delayed recall) Dementia, MCI, AD: 61 HVLT WAIS: Coding MCI: 10 BVMT-R (delayed recall) HC: 15 MCI: 10 HC: 15 SDMT TMT-A TMT-B BVMT-R (total and delayed recall) HVLT (total and delayed recall) MCI: 58 BVMT-R (total and delayed recall) SDMT HVLT (total and delayed recall) TMT-A TMT-B MCI: 17 BVMT-R (total and delayed recall) SDMT HC: 8 HVLT (total and delayed recall) TMT-A TMT-B MCI and HC: 995c Stroop Color Test Stroop Word Test Stroop Interference Test CVLT (total and short delayed recall) WMS: Digit Span (forwards) CVLT (long delayed recall) WMS: Digit Span (backwards) WMS: Visual Reproduction (immediate recall) WMS: Spatial Span (total and backwards) WASI: Matrix Reasoning WMS: Letter-Number Sequencing WMS: Logical Memory (immediate and delayed recall) WMS: Visual Reproduction (delayed recall) AD: 56 HC: 242 WMS: Visual Reproduction (immediate and delayed recall; at risk for AD only) TMT-B WMS: Visual Reproduction (immediate recall; AD only) At risk for AD: 82 WMS: Visual Reproduction (delayed recall; AD only) MMSE MCI: 72 HC: 96 WMS: Logical Memory (immediate and delayed recall) AD: 121 AD: 990 MMSE MCI: 20 HC: 23 CogState: OBK accuracy (all cohorts) CogState: Detection AD: 52 CogState: OBK reaction time CogState: Identification Dementia (incl. DLB, FTD): 19 CogState: IDM reaction time CogState: One Card Learning CogState: Associative Learning MCI: 270d HC: 46d TMT-A SCD: 42d TMT-B Stoop Word Test Stroop Color Test WAIS: Digit Symbol MCI: 15 Driving simulator AD, Alzheimer's disease; Approx., approximately; BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; DLB, dementia with Lewis Bodies; FTD, frontotemporal dementia; HC, healthy controls; HVLT, Hopkins Verbal Learning Test; IDM, divided attention task; MCI, mild cognitive impairment; MMSE, Mini-Mental State Examination; OBK, One-Back Test; SCD, subjective cognitive decline; SDMT, Symbol Digit Modalities Test; SMC, subjective memory complaint; TMT, Trail-Making Test; WAIS, Wechsler Adult Intelligence Scale; WASI, Wechsler Abbreviated Scale of Intelligence; WMS, Wechsler Memory Scale. ‘Continuous effects’ is defined as a continuous improvement in test performance for ≥4 test administrations, with test performance continuing to improve up to the last test administered. By definition this can only be applied to studies that administered the test at least 4 times. In all other instances, practice effects are described as either ‘initial effect’ if clear signs of practice effects were evident; ‘inconclusive’ if practice effects were observed for a selection of test metrics, in a subgroup of patients only, or if other reasons may explain the improvement in test performance (contribution of other tests in composite scores, association with additional training or treatment etc.); or ‘no effect’ if no improvement in test performance was observed. Based on clinical rating at the end of the study. Data from 1,220 and 995 patients were available for visit 1 and 2, of which 11.0% and 15.2% (after correcting for practice effects) were diagnosed with mild cognitive impairment, respectively. At follow-up, 48 participants were diagnosed with Alzheimer's disease, 200 with mild cognitive impairment and 64 with subjective cognitive decline, while 46 participants were considered as cognitively healthy. Effect sizes (Cohen's d, unless otherwise noted) for observed changes between test iterations in patients with mild cognitive impairment, Alzheimer's disease or other forms of dementia. Studies that did not report effect sizes or reported effect sizes for composite scores are not included in this figure. Dark green dots () indicate continuous practice effects, light green dots () initial practice effects, yellow dots () inconclusive effects and red dots () absence of practice effects, as defined in Table 7. Small, medium and large effect sizes are defined as d = 0.2, d = 0.5 and d = 0.8, respectively, and apply to Cohen's d only (Cohen, 1992). ∗η2. BVMT-R, Brief Visuospatial Memory Test-Revised; CVLT, California Verbal Learning Test; DLB, dementia with Lewis bodies; HVLT, Hopkins Verbal Learning Test; IDM, divided attention task; LDFR, long delayed free recall; LM, Logical Memory; LNS, Letter-Number Sequencing; MCI, mild cognitive impairment; mdMCI, multi-domain mild cognitive impairment; MMSE, Mini-Mental State Examination; OBK, One-Back Test; SDFR, short delayed free recall; sdMCI, single-domain mild cognitive impairment; SDMT, Symbol Digit Modalities Test; TMT, Trail-Making Test; TR, total recall; VR, Visual Reproduction; WAIS, Wechsler Adult Intelligence Scale; WMS, Wechsler Memory Scale. Compared with the other disease areas, practice effects were less common on test of information processing speed, especially on the SDMT (Duff et al., 2017, 2018; Duff and Hammers, 2022) and TMT-A (Duff et al., 2017, 2018; Duff and Hammers, 2022). However, when present, they tend to be smaller than observed in healthy controls (Duff et al., 2015), and their magnitude correlated significantly with hippocampal volume (r = 0.73; P < 0.01) (Duff et al., 2018). Practice effects on the SDMT and the Word and Color subtests of the Stroop Test were more likely to be observed in patients with greater levels of cognitive impairment (Rosas et al., 2020). Moreover, a trend towards improved test scores were observed on the Stroop Word test and subtle practice effects on the Stroop Color test in a mixed cohort of patients with mild cognitive impairment and healthy volunteers even after a long inter-test interval of six years (Elman et al., 2018). Practice effects on the Digit-Symbol or Coding Test were also more likely to occur with increasing levels of cognitive impairment (Rosas et al., 2020). However, Duff et al. (2012) reported an inverse correlation between the magnitude of practice effects and dementia severity measured by MMSE (partial r = 0.26; P = 0.046; Cohen's d = 0.54), even after controlling for baseline performance. For tests assessing executive function such as the TMT-B or the Interference subtest of the Stroop Test, there was little evidence of practice effects (Britt et al., 2011; Duff et al., 2015, 2017, 2018; Duff and Hammers, 2022; Elman et al., 2018). Nonetheless, practice effects were more likely to occur with increasing levels of cognitive impairment (Rosas et al., 2020) or in specific subgroups (Frank et al., 1996). Repeated testing with the CVLT, a measure of learning and memory, resulted in practice effects, especially on less demanding tasks such as short delayed free or cued recall and long delayed cued recall (Campos-Magdaleno et al., 2017; Elman et al., 2018). The lack of practice effects on the more memory-demanding tasks of the CVLT, including long delayed free recall, suggests that explicit memory deteriorates in amnestic mild cognitive impairment while implicit memory involved in practice effects is still preserved (Campos-Magdaleno et al., 2017). Practice effects were also observed on both total and delayed recall of the HVLT when retested within a week (Duff et al., 2017, 2018). In patients with probable Alzheimer's disease, stronger practice effects correlated inversely with disease severity measured by MMSE after controlling for baseline performance (partial r = 0.47; P < 0.001; Cohen's d = 1.016) (Duff et al., 2012). On the Visual Reproduction test, improvements suggestive of practice effects were observed in patients at risk of developing Alzheimer's disease for both delayed and immediate recall, while the performance on both tasks tended to remain stable or worsen in patients diagnosed with Alzheimer's disease (Frank et al., 1996). By comparison, a mixed cohort of patients with mild cognitive impairment and healthy volunteers showed definite practice effects only on delayed recall (Elman et al., 2018). Finally, on the Logical Memory test, practice effects were more common in patients with mild cognitive impairment than in patient with Alzheimer's disease (Britt et al., 2011; Claus et al., 1991; Elman et al., 2018; Gavett et al., 2016). On the BVMT-R, a measure of visuospatial memory, practice effects were reported on both total and delayed recall, in particular with short inter-test intervals (Duff et al., 2007, 2015, 2017, 2018). However, there was little-to-no signs of practice effects if the inter-test interval was increased to one year or longer (Duff and Hammers, 2022). Furthermore, the magnitude of practice effects on delayed recall correlated with 18F-flutemetamol uptake in amyloid plaques (r = −0.45; P = 0.02; Cohen's d = 1.1) (Duff et al., 2014). Working memory was assessed with a couple of different tests. Digit Span, in particular in the backward condition, Spatial Span and Letter-Number Sequencing were all associated with practice effects (Elman et al., 2018). Similarly, repeated testing with CogState's One-Back Test resulted in reduced reaction times in patients with either mild cognitive impairment or dementia with Lewis Bodies and in improved accuracy scores in the entire study cohort (Hammers et al., 2011). Few studies also studied practice effects on other cognitive abilities. Reduced reaction times indicative of practice effects were reported in patients with dementia with Lewis bodies on CogState's Divided Attention test (Hammers et al., 2011). Practice effects were also observed on the Driving Simulator of Teasdale et al. (2016) during the training phase when live feedback was provided. But the gain from practice was lost during the recall phase, during which no feedback was provided. A trend towards improved scores was observed on the Matrix Reasoning test (Elman et al., 2018). Finally, no practice effects were found on the Verbal Comprehension (Claus et al., 1991). The MMSE showed little-to-no signs of practice effects (Duff et al., 2007; Frank et al., 1996; Toh et al., 2014), although they cannot be entirely ruled out (Gross et al., 2018).

Mitigation strategies

Mitigation strategies help to account and control for practice effects, thereby ensuring accurate interpretation of longitudinal data of functional ability. Several different approaches to mitigate and minimize the impact of practice effects have been implemented (Table S2; supplementary appendix).

Reliable change index

One approach is to compute a reliable change index that corrects for practice effects by identifying whether an observed change is clinically relevant and greater than the expected practice effect (Duff et al., 2017; Higginson et al., 2009; Turner et al., 2016; Utz et al., 2016). However, this approach is associated with some limitations. To compute a reliable change index, data on practice effects obtained from a reference population is required (Utz et al., 2016). Typically, a normative, healthy population is used as the reference population. It is therefore crucial that both the studied patient population and the reference population show practice effects of similar magnitude. Otherwise, the computed reliable change index cannot effectively account for practice effects. The threshold to detect changes considered to be clinically relevant will be reduced if practice effects are underestimated in the reference population (Utz et al., 2016). As a result, a subset of patients showing practice effects would be falsely identified as showing a clinically relevant change. On the other hand, overestimation of practice effects in the reference population would result in more extensive lower bounds for detecting functional decline in the studied patient population (Turner et al., 2016). To circumvent these potential limitations, it has been suggested to use data collected from a comparable but separate patient population instead (Higginson et al., 2009). Additionally, the reliable change index assumes that the gain resulting from practice effects remains constant over time. With multiple test repetitions, however, the gain from practice effects can vary as a function of time or number of test iterations (Glanz et al., 2012). A constant reliable change index will therefore not accurately identify those who show clinically meaningful change beyond practice effects, an effect that is exacerbated with an increasing number of test repetitions. An adaptive reliable change index that takes the temporal dynamics of practice effects into account could help address this limitation. Finally, ceiling or floor effects may prevent the ability to detect clinically meaningful changes if the difference between the baseline score and the maximum or minimum score, respectively, is smaller than the reliable change index (Benedict, 2005).

Standardized regression-based models

A similar approach is to apply standardized regression-based models to predict the test scores at retest (Duff et al., 2017). Unlike the reliable change index, the standardized regression-based model uses information from the studied cohort to predict their test performance at retest. In this simplest form, this prediction is solely based on the baseline test performance. More complicated models make use of additional covariates such as age, gender, level of education or inter-test interval. The z-scores computed from the difference between the predicted and observed scores at retest can be used to define a threshold for detecting functional decline or functional recovery beyond the expected practice effect.

Replacement method

The replacement method of Elman et al. (2018) estimates group-level, attrition-corrected practice effects. With this method, the cohort at retest (i.e., the returnee cohort) is compared against a test-naïve, age-matched cohort (i.e., the replacement cohort). Any difference observed between these two cohorts is assumed to be a combination of attrition and practice effects. Attrition-corrected practice effects are obtained by subtracting the difference in mean scores of the overall cohort at baseline (i.e, mean baseline scores of returnees and those lost to follow-up) and the returnee cohort at baseline (attrition effect) from the difference in mean score of a separate, test-naïve replacement cohort at baseline and the returnee cohort retest (difference score). These estimated practice effects can then be subtracted from the test scores of the returnee cohort obtained at retest, resulting in practice-effect–corrected retest scores. This methodology is more robust for larger sample size of the overall cohort and the cohort lost to follow-up. Depending on the drop-out rate, the replacement cohort can be small (and thus returnee population large), resulting in instability in the calculation of the difference score which is a key part of the attrition-corrected practice effect value. Furthermore, this methodology has been demonstrated for a single retest. While it is possible to apply it to more than one retest, it would require the management of multiple cohorts as retests (i.e., multiple replacement and returnee cohorts). Finally, data from a test-naïve, age-matched replacement cohort may not always be available for less established tests.

Alternative forms

Few studies purposely administered the same form to maximize practice effects (Duff et al., 2011, 2017, 2018). Conversely, the use of alternative forms – if available – can help reduce practice effects if they are driven by learning a specific sequence of items (Beglinger et al., 2014b; Benedict, 2005). In fact, several studies reported on absence of practice effects when using the alternative form, including on the SDMT (Fuchs et al., 2020), the CVLT (Eshaghi et al., 2012) or the Wechsler Memory Scale (Claus et al., 1991). Moreover, in a direct comparison, the use of an alternative form prevented practice effects on all CVLT and BVMT-R metrics (Benedict, 2005). This is contrary to the practice effects observed on both measures in Eshaghi et al. (2012) and in Reilly and Hynes (2018), suggesting that alternative forms may only reduce but not fully prevent practice effects. Mixed results were also obtained on the HVLT, where only patients with pre-manifest, but not manifest, Huntington's disease showed signs of practice effects on the alternative form (Stout et al., 2014). Consistent with the literature (Bever et al., 1995; Cohen et al., 2000, 2001; Eshaghi et al., 2012; Glanz et al., 2012; Nagels et al., 2008; Rosti-Otajärvi et al., 2008), the direct comparison of Benedict (2005) revealed practice effects on both SDMT (group by time interaction effect: P > 0.05) and PASAT (Cohen's d for same form: 0.3; for alternative form: 0.4), irrespective whether the same and alternative form was used. This suggests that patients may develop over time more effective test taking strategies and that this drives the practice effects seen despite the use of alternative forms (Beglinger et al., 2014a; Gross et al., 2018). Consequently, other strategies are needed to minimize the impact of practice effects.

Run-in period

Since improvements due to practice are typically strongest between the first few test iterations, a run-in or familiarization period prior to taking the baseline assessments has been suggested to reduce the magnitude of practice effects (Beglinger et al., 2014a; Beglinger et al., 2014b; Cohen et al., 2000; Stout et al., 2014; Sormani et at., 2019). Such a run-in period would allow patients to become fully familiar with the test and test conditions and to reach steady-state performance prior to their baseline assessment, thereby preventing post-baseline practice effects (Patzold et al., 2002). For the success of a run-in period, it is therefore critical to administer a sufficient number of tests prior to the baseline assessment. However, many studies included in this analysis have administered only two or three test iterations (Tables 4, 5, 6, and 7). This makes it more challenging to establish the minimum number of tests required for the run-in period for each of the four disease areas. For instance, in patients with multiple sclerosis, two to three pre-baseline assessments have been recommended for the PASAT (Cohen et al., 2000; Rosti-Otajärvi et al., 2008). This may not be sufficient considering that a trend of continuous improvement beyond the third test iteration was observed in Glanz et al. (2012). Similarly, Gavett et al. (2016) argued that the previously recommended 2 or 3 pre-baseline assessments with the Wechsler Memory Scale may not be sufficient to prevent further practice effects as patients with mild cognitive impairment showed continuous improvements over all 5 test iterations (Gavett et al., 2016). In addition, the inter-test interval during the run-in period can also impact the likelihood of post-baseline practice effects (Beglinger et al., 2014a). Finally, implementing a run-in period will increase the burden of the patient and increase the cost and time needed to run clinical trials. Time and cost constraints may also limit its use in clinical practice. This is particularly valid for clinician-administered tests. By comparison, digital tests that can be remotely administered at home without supervision by a healthcare professional promise to offer a means to minimize the additional patient burden and cost associated with including multiple pre-baseline assessments, thereby making a run-in period more feasible.

Discussion and outlook

Practice effects are a common phenomenon associated with the repeated administration of performance outcome measures (Tables 4, 5, 6, and 7). Despite the research conducted on practice effects, some gaps still remain: Many different approaches to identify, assess and study practice effects have been applied, which complicates a comparison across studies. Most studies defined practice effects on the basis of improved test performance at retest. This does not allow for practice effects and functional decline (and other longitudinal effects), which is commonly observed in patients with chronic neurologic disorders, to coincide. Such changes in functional ability limit the ability to detect and to account for practice effects. Thus, optimal methods to distinguish between practice effects and functional decline, but also changes in motivation and fatigue, functional recovery, and treatment effects need to be further investigated. The possible impact of previous exposure on the ability to detect further practice effects was largely unaddressed. The temporal dynamics of practice effects has not been studied in detail and further research could expand our understanding how practice effects vary over time. Practice effects on an individual patient level have not been fully characterized. The clinically meaningful information contained within practice effects remains unclear and further research is needed to establish their usefulness in guiding disease management. Finally, further research into the possible impact of the more granular datasets collected with digital performance outcome measures on practice effects is needed.

Comparing practice effects across studies

This review revealed that practice effects were consistently observed across studies for measures of information processing speed or upper extremity function. In contrast, results were more mixed for other measures. Many different factors can contribute towards these mixed results. One possible explanation lies in the nature of the test. Changing the sequence of items by using an alternate form, if available, can reduce the magnitude of practice effects that are driven by item learning (Benedict, 2005). Conversely, the use of the same form increases the magnitude of practice effects (Duff et al., 2018). Differences in the patient characteristics may have also contributed to the mixed results. Studies in patients with mild cognitive impairment, Alzheimer's disease or other forms of dementia suggest that more severe disease is associated with less consistent or weaker practice effects (Campos-Magdaleno et al., 2017; Duff et al., 2012; Frank et al., 1996; Gavett et al., 2016; Hammers et al., 2011). However, Rosas et al., (2020) showed that the proportion of patients showing practice effects increases with disease severity. This points towards some unobserved confounders explaining the differences between studies. Such confounders could include cognitive training or therapeutic interventions between the assessments (Rosas et al., 2020). Others have suggested that a poor baseline performance lends itself to larger margins for improvement, and therefore, to stronger practice effects (Rabbitt et al., 2004). But also any previous exposure to the test could impact the ability to detect and quantify practice effects (see also section ‘4.3 Impact of previous exposure on the ability to detect practice effects’). Finally, the different approaches used to identify, assess and study practice effects (Table S1) further complicate a direct comparison across studies and may partially explain why practice effects were observed in some but not in all studies. Such a comparison would have benefited from a more standardized approach.

Distinguishing practice effects from other effects

A distinction between practice effects and longitudinal effects such as changes in motivation and fatigue, functional recovery and treatment effects was not always possible (Giedraitiene and Kaubrys, 2019; Reilly and Hynes, 2018; Vogt et al., 2009). While changes in motivation and fatigue might coincide with practice effects with both short and long inter-test intervals, functional changes and treatment effects may increasingly impact the ability to discern practice effects the longer the inter-test intervals are. Consequently, stable performance does not necessarily guarantee functional stability as practice effects may mask true functional decline (Elman et al., 2018). Assessing practice effects in the non-interventional cohort separately from the interventional cohort can help to disentangle practice effects from treatment or other interventional effects (Buelow et al., 2015; Patzold et al., 2002; Turner et al., 2016; Vogt et al., 2009).

Impact of previous exposure on the ability to detect practice effects

Most studies did not specify the previous exposure to the investigated performance outcome measures. Only five studies stipulated requirements with regard to previous exposure, or lack thereof, in their inclusion and exclusion criteria (Cohen et al., 2000; Elman et al., 2018; Fuchs et al., 2020; Nagels et al., 2008; Patzold et al., 2002). Two additional studies included a run-in period or practice items prior to the baseline assessment to minimize potential practice effects (Patzold et al., 2002; Snowden et al., 2001). Thus, it is feasible that previous exposure impacted the ability to detect further practice effects, which may partially explain the mixed results observed on some of the performance outcome measures.

Temporal dynamics of practice effects

Another limitation is that practice effects were often assumed to be constant, or linear, over time. However, the temporal dynamics of practice effects, including the duration until steady-state performance is reached and the optimal inter-test interval to minimize the impact of practice effects, has not been studied in detail. Only two studies quantified the duration of the practice phase (Pham et al., 2021; Prince et al., 2018). In addition, it has been shown that practice gains are not linear over time: the gain from practice is greatest over the first few test repetitions and gradually becomes smaller as the number of test repetitions increases (Glanz et al., 2012). In other words, overall practice effects can vary as a function of time or number of test iterations. Strategies that explicitly take this non-linear nature of practice gains into account could help to further improve the accuracy of the interpretation of longitudinal datasets. The non-linear nature of practice effects resides in the rapid gain in performance observed in the first few test iterations, which cannot be linearly modelled with the later stabilization of test performance as the number of iteration increases. This dynamic has been highlighted for example by Pham et al. (2021). However, methodologies explicitly characterizing the non-linear nature of practice are still few and far between. Yet, as this manuscript is a review of the current state of the literature, it wouldn't be appropriate to propose such methodologies without also presenting results illustrating and characterizing such novel approaches.

Practice effects on a group versus patient level

Similarly, most studies limited their assessment of practice effects to a group-level analysis. Questions such as ‘is the improvement in test performance statistically significant?’ or ‘is the change of the cohort greater (or smaller) than expected based on the change seen in a normative population?’, however, do not take differences in practice effects between individual patients into account. Only few studies acknowledged that practice effects may differ from patient to patient and performed their analyses in subgroups of patients defined by their practice effects response or on an individual patient level (Duff et al. 2014, 2017; Pham et al., 2021; Prince et al., 2018; Rosas et al., 2020; Sormani et al., 2019; Turner et al., 2016; Utz et al., 2016). Analyses of longitudinal data in daily clinical practice could however benefit from methods to study practice effects on an individual patient level. This will require a more granular dataset than typically obtained with traditional clinician-administered, in-clinic performance outcome measures. As discussed further below in section ‘4.8 Outlook’, remotely administered digital, sensor-based performance outcome measures could help to collect sufficient data to study practice effects on an individual patient level.

Clinical impact of practice effects

Some have argued that practice effects contain clinically meaningful information and should not be regarded as only a source of unwanted variance. For example, cognitive impairment may be expressed as a diminished capacity to learn, resulting in weaker practice effects (Gavett et al., 2016). Short inter-test intervals, in particular, have been purposely used to elicit practice effects that capture clinically relevant information (Duff et al., 2011, 2014, 2018). It has been hypothesized that practice effects elicited with inter-test intervals as short as one week are more sensitive to cognitive integrity than the baseline assessment itself (Duff et al., 2014). Weaker practice effects have been associated with worse prognosis in mild cognitive impairment (Duff et al., 2011) and worse treatment outcomes in multiple sclerosis (Sormani et al., 2019). Moreover, the magnitude of practice effects correlated with biomarkers of cognitive decline such as hippocampal volume (Duff et al., 2018) or amyloid imaging (Duff et al., 2014). Thus, outcomes of a practice effects analysis could be used both as an endpoint and as a means to stratify patients in future clinical trials. However, studies have shown that practice effects can be detected with inter-test intervals as long as several years (Elman et al., 2018; Frank et al., 1996). As discussed above in section ‘4.2 Distinguishing practice effects from other effects’, practice effects, which improve test performance, may coincide with functional decline, which worsens test performance, if the interval between tests is sufficiently long (Elman et al., 2018). In such a scenario, the performance at retest would still indicate an overall worsening of functional ability as long as the extent of functional decline exceeds the magnitude of practice effects. However, most studies rely on improved test performance to detect and account for practice effects. Thus, the presence of practice effects can introduce unwanted noise and mask the true extent of functional decline, thereby interfering with the longitudinal monitoring of cognitive decline. Considering that the diagnosis of dementia and related disorders requires a documented history of cognitive decline and its impact on daily activities (Arvanitakis et al., 2019), this phenomenon can result in the misdiagnosis or misclassification of a patient if practice effects aren't accounted for on cognitive test batteries (Duff et al., 2011; Elman et al., 2018). This in turn may negatively impact the timely access to treatment, treatment outcomes and patient care. In contrast, the diagnosis of multiple sclerosis relies more heavily on the identification of typical lesions or pathologies detected with magnetic resonance imaging (Polman et al., 2011). This suggests that practice effects have a smaller impact on the clinical management of multiple sclerosis. Nonetheless, regular assessment of functional ability with scored disability scales has been recommended to optimally track the disease and detect disease progression in a timely manner (Rae-Grant et al., 2015). The use of sensitive disability scales, or performance outcome measures, associated with minimal practice effects can help achieve this goal. Practice effects may have an even bigger impact in clinical trials where the efficacy of an intervention is assessed with performance outcome measures (see for example Pliskin et al. (1996)). Thus, it is important for clinical trials to have effective measures in place that account for and mitigate practice effects.

Mitigating practice effects

Despite the research effort, no consensus has been reached on an effective strategy to account for practice effects and mitigate their impact on the interpretation of clinical data. Nonetheless, some mitigation strategies have been proposed and implemented even if not all are universally applicable (Table S2; supplementary appendix). One of the available mitigation strategies is to implement a run-in period prior to the baseline assessment. A run-in period can be implemented for all performance outcome measures provided that ceiling effects are not an issue. As healthy controls and patients may show differences in practice effects (Claus et al., 1991; Prince et al., 2018), it is important that the minimum number of tests to be included in the run-in period is adapted to the studied patient population or covers the cohort with the longest practice duration. However, many studies that investigated practice effects included only 2 or 3 test iterations (Tables 4, 5, 6, and 7), which is not sufficient to establish and a reach consensus on the number of test iterations required to achieve steady-state performance prior to taking the baseline assessment (Gavett et al., 2016; Glanz et al., 2012).

Outlook

Digital, sensor-based tests are increasingly being studied for the assessment of functional ability (Pham et al., 2021; Prince et al., 2018). In contrast to traditional, clinician-administered tests, digital tests are typically self-administered at home and enable short inter-test intervals with prolonged study durations (Prince et al., 2018; Westin et al., 2010). This is both a curse and a blessing as it may expose potential practice effects more prominently while offering a more granular and ecologically valid assessment of functional ability through time. The increased granularity also allows a more objective comparison of absolute test performance across and within subjects as it can be assumed that the impact of practice effects is negligible once a certain number of test iterations has been reached (Cook et al., 2004). In order to leverage this advantage of digital tests and establish suitable baseline performance in non-digital tests, an in-depth analysis of the effect of practice and gold-standard statistical methods is required. First, different metrics and approaches to study practice effects will need to be investigated to establish the metrics that describe practice effects optimally, in particular in the emerging field of digital tests. This includes assessing the applicability of learning curve models to characterize practice effects to study the temporal dynamics of practice effects. Such a model has been previously applied to data obtained from smartphone-based SDMT (Pham et al., 2021). Second, it is yet to be shown whether the higher granularity of digital tests results in a better separation of practice effects and other effects impacting test performance such as functional changes or treatment effects. Third, it has been previously suggested that practice effects may contain clinically relevant information (Duff et al., 2011, 2014, 2018; Sormani et al., 2019). Future work will therefore need to investigate whether digital tests can be leveraged to extract such information. Finally, digital tests could also be used to disentangle test features and their underlying functions, for example, sensorimotor, cognitive and memorization processes that show practice effects from those that do not. This can be leveraged to develop future tests and test features that are resistant to practice effects, thereby simplifying the assessment of a subject's functional capacity. These efforts will provide us with a more detailed understanding of practice effects, allows us to more accurately interpret longitudinal data and possibly help us to separate the unwanted noise introduced by practice effects from the clinical meaningful information contained within them.

Limitations

Practice effects were only considered if the improved test performance, or test performances that were better than expected, due to practice or the repetition of a task could not be explained by means, including interventional effects, functional recovery, or changes in motivation and fatigue levels. This definition limited the ability to identify practice effects that occurred within the same time frame as functional decline (or other longitudinal effects that result in worsened test performance), as can be expected in patients with chronic neurologic disorders. However, this limitation is not only a limitation of this review, but also a general limitation of the study of practice effects. Furthermore, most studies enrolled predominantly white, highly educated participants, which limits the generalizability of the findings.

Conclusions

The variance introduced by practice effects is shared by many performance outcome measures and could ultimately be addressed by the thorough characterization and evaluation of such alterations for specific subject populations, study designs, test activities, and delivery procedures. Due to its prevalence, an analysis of the presence and magnitude of practice effects on inter-individual and intra-individual data obtained from repeated assessments should be expected. Additionally, mitigation strategies should be in place from study design to data analysis, especially if practice effects interact with the studied intervention. The failure to do so may result in misdiagnosis or inaccurate interpretation of clinical data. In light of the recent development of digital, sensor-based test batteries and their associated higher number of test iterations, there is renewed need for strategies to assess practice effects and mitigate their impact on the interpretation of clinical data. In particular, the much higher granularity made possible with digital tests offer a new opportunity to properly characterize and tackle the impact of practice effects on performance outcome measures, including deconvolving practice effects from true functional changes and treatment effects in clinical trials and clinical practice. Future work should, therefore, aim to identify optimal metrics for detecting and characterizing practice effects and their properties in such highly granular datasets. In addition, the analysis of practice effects should also be leveraged to guide adequate study design, data analysis strategy, and the selection of novel digital test features that are resistant to practice effects.

Declarations

Author contribution statement

All authors listed have significantly contributed to the development and the writing of this article.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

Data included in article/supp. material/referenced in article.

Declaration of interest's statement

Sven P. Holm is a contractor for F. Hoffmann–La Roche Ltd. Arnaud M. Wolfer is an employee of F. Hoffmann–La Roche Ltd. Grégoire H.S. Pointeau is an employee and shareholder of F. Hoffmann–La Roche Ltd. Florian Lipsmeier is an employee of F. Hoffmann–La Roche Ltd. Michael Lindemann is a consultant for F. Hoffmann–La Roche Ltd. via Inovigate.

Additional information

No additional information is available for this paper.

Panel. Definitions
Practice effects: Practice effects are any change or improvement that results from practice or repetition of task items or activities, including repeated exposure to an instrument, rather than due to a true change in an individual's ability. Many studies, however, consider such improvements to be practice effects only if these improvements resulted in improved test scores. Practice effects are sometimes also known as learning effects.
Longitudinal effects: Unlike practice effects, longitudinal effects describe changes in test performance resulting from functional changes, treatment intervention, or changes in motivation or fatigue levels. These longitudinal effects typically occur at larger timescales but may be confounded with practice effects.
Run-in period: A period/number of test iterations during which large practice effects are allowed to occur, until the magnitude of alterations from one test to the next is negligible. The run-in assessments are discarded and the subsequent test iteration is considered as a measure of baseline performance. Sometimes also known as ‘familiarization’ or ‘massed practice’ period.
Iterations: Number of times a subject undertakes an assessment irrespective of the duration between test repetitions.

86 in total

1. "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician.

Authors: M F Folstein; S E Folstein; P R McHugh
Journal: J Psychiatr Res Date: 1975-11 Impact factor: 4.791

2. Practice and drop-out effects during a 17-year longitudinal study of cognitive aging.

Authors: Patrick Rabbitt; Peter Diggle; Fiona Holland; Lynn McInnes
Journal: J Gerontol B Psychol Sci Soc Sci Date: 2004-03 Impact factor: 4.077

3. Statistical evaluation of learning curve effects in surgical trials.

Authors: Jonathan A Cook; Craig R Ramsay; Peter Fayers
Journal: Clin Trials Date: 2004 Impact factor: 2.486

4. Within-session practice effects in patients referred for suspected dementia.

Authors: Kevin Duff; Gordon Chelune; Kathryn Dennett
Journal: Dement Geriatr Cogn Disord Date: 2012-06-11 Impact factor: 2.959

5. Cognitive functions over the course of 1 year in multiple sclerosis patients treated with disease modifying therapies.

Authors: Kathrin S Utz; De-Hyung Lee; Alexandra Lämmer; Anne Waschbisch; Ralf A Linker; Thomas Schenk
Journal: Ther Adv Neurol Disord Date: 2016-04-18 Impact factor: 6.570

6. Test-retest reliability of the Purdue Pegboard for persons with multiple sclerosis.

Authors: Jennifer Gallus; Virgil Mathiowetz
Journal: Am J Occup Ther Date: 2003 Jan-Feb

7. A home environment test battery for status assessment in patients with advanced Parkinson's disease.

Authors: Jerker Westin; Mark Dougherty; Dag Nyholm; Torgny Groth
Journal: Comput Methods Programs Biomed Date: 2009-09-09 Impact factor: 5.428

8. Older Adults with Mild Cognitive Impairments Show Less Driving Errors after a Multiple Sessions Simulator Training Program but Do Not Exhibit Long Term Retention.

Authors: Normand Teasdale; Martin Simoneau; Lisa Hudon; Mathieu Germain Robitaille; Thierry Moszkowicz; Denis Laurendeau; Louis Bherer; Simon Duchesne; Carol Hudon
Journal: Front Hum Neurosci Date: 2016-12-27 Impact factor: 3.169

9. Validity of the timed 25-foot walk as an ambulatory performance outcome measure for multiple sclerosis.

Authors: Robert W Motl; Jeffrey A Cohen; Ralph Benedict; Glenn Phillips; Nicholas LaRocca; Lynn D Hudson; Richard Rudick
Journal: Mult Scler Date: 2017-02-16 Impact factor: 6.312

10. A Cognitive Occupation-Based Programme for People with Multiple Sclerosis: A Study to Test Feasibility and Clinical Outcomes.

Authors: Sean Reilly; Sinéad M Hynes
Journal: Occup Ther Int Date: 2018-05-02 Impact factor: 1.448