Kait Clark, Kayley Birch-Hurst, Charlotte R Pennington, Austin C P Petrie, Joshua T Lee, Craig Hedge.
Abstract
Research in perception and attention has typically sought to evaluate cognitive mechanisms according to the average response to a manipulation. Recently, there has been a shift toward appreciating the value of individual differences and the insight gained by exploring the impacts of between-participant variation on human cognition. However, a recent study suggests that many robust, well-established cognitive control tasks suffer from surprisingly low levels of test-retest reliability (Hedge, Powell, & Sumner, 2018b). We tested a large sample of undergraduate students (n = 160) in two sessions (separated by 1-3 weeks) on four commonly used tasks in vision science. We implemented measures that spanned a range of perceptual and attentional processes, including motion coherence (MoCo), useful field of view (UFOV), multiple-object tracking (MOT), and visual working memory (VWM). Intraclass correlations ranged from good to poor, suggesting that some task measures are more suitable for assessing individual differences than others. VWM capacity (intraclass correlation coefficient [ICC] = 0.77), MoCo threshold (ICC = 0.60), UFOV middle accuracy (ICC = 0.60), and UFOV outer accuracy (ICC = 0.74) showed good-to-excellent reliability. Other measures, namely the maximum number of items tracked in MOT (ICC = 0.41) and UFOV number accuracy (ICC = 0.48), showed moderate reliability; the MOT threshold (ICC = 0.36) and UFOV inner accuracy (ICC = 0.30) showed poor reliability. In this paper, we present these results alongside a summary of reliabilities estimated previously for other vision science tasks. We then offer useful recommendations for evaluating test-retest reliability when considering a task for use in evaluating individual differences.
Year: 2022 PMID: 35904797 PMCID: PMC9344221 DOI: 10.1167/jov.22.8.18
Source DB: PubMed Journal: J Vis ISSN: 1534-7362 Impact factor: 2.004
Average intraclass correlation coefficient (ICC) and 95% CI observed in simulated data.
| True r | ICC [95% CI] |
|---|---|
| 0.4 | 0.4 [0.26, 0.52] |
| 0.6 | 0.6 [0.49, 0.69] |
| 0.8 | 0.8 [0.73, 0.85] |
Note. The “True r” refers to a correlation imposed in the simulated data.
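The simulation idea above can be reproduced from first principles. Below is a minimal, stdlib-only Python sketch (the helper name `icc_2_1` is ours, not from the paper) that simulates two sessions sharing a known true correlation and computes the two-way random-effects, absolute-agreement, single-measure ICC — Shrout and Fleiss's ICC(2,1), the form typically reported for test-retest designs:

```python
import random

def icc_2_1(data):
    """Two-way random-effects, absolute-agreement, single-measure ICC(2,1).
    `data` is a list of per-participant tuples of session scores."""
    n = len(data)            # participants (rows)
    k = len(data[0])         # sessions (columns)
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Simulate two sessions with a known "true r", as in the table above.
rng = random.Random(0)
true_r = 0.6
pairs = []
for _ in range(10_000):
    trait = rng.gauss(0, 1)  # stable individual difference
    s1 = true_r ** 0.5 * trait + (1 - true_r) ** 0.5 * rng.gauss(0, 1)
    s2 = true_r ** 0.5 * trait + (1 - true_r) ** 0.5 * rng.gauss(0, 1)
    pairs.append((s1, s2))
print(round(icc_2_1(pairs), 2))  # prints a value near 0.6
```

With no systematic shift between session means, ICC(2,1) converges on the imposed true r; a practice effect would push the absolute-agreement ICC below the Pearson correlation, which is why this form is often preferred for retest data.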
Figure 1. Example trials of the behavioral tasks. Note. (A) An example trial of the UFOV task with the number four as the central target, and a peripheral target 7.07 degrees from fixation among 23 distractors. (B) An illustration of the random dot stimulus patch. (C) An example trial of the MOT task with five target circles among five distractor circles. (D) An example trial of the VWM task depicting a five-item change trial.
Means (SD) for measures of the motion coherence (MoCo), useful field of view (UFOV), multiple-object tracking (MOT), and visual working memory (VWM) tasks.
| Task | Measure | Session 1 | Session 2 |
|---|---|---|---|
| MoCo | Threshold (% coherent) | 0.28 (0.10) | 0.28 (0.10) |
| UFOV | Number accuracy | 0.94 (0.06) | 0.96 (0.05) |
| | Inner accuracy | 0.79 (0.19) | 0.91 (0.16) |
| | Middle accuracy | 0.24 (0.17) | 0.30 (0.20) |
| | Outer accuracy | 0.13 (0.12) | 0.15 (0.15) |
| MOT | Max items | 5.93 (0.66) | 5.94 (0.62) |
| | Threshold (number of items) | 4.48 (0.67) | 4.47 (0.80) |
| VWM | Capacity/K | 2.34 (0.75) | 2.39 (0.83) |
Intraclass correlations (ICC) and Spearman's Rho correlation estimates for the motion coherence (MoCo), useful field of view (UFOV), multiple-object tracking (MOT), and visual working memory (VWM) tasks.
| Task | Measure | ICC [95% CI] | Rho [95% CI] |
|---|---|---|---|
| MoCo | Threshold | 0.60 [0.48, 0.69] | 0.57 [0.43, 0.68] |
| UFOV | Number accuracy | 0.48 [0.33, 0.60] | 0.47 [0.31, 0.61] |
| | Inner accuracy | 0.35 [0.10, 0.54] | 0.50 [0.36, 0.62] |
| | Middle accuracy | 0.60 [0.44, 0.72] | 0.65 [0.51, 0.75] |
| | Outer accuracy | 0.74 [0.66, 0.81] | 0.75 [0.63, 0.82] |
| MOT | Max items | 0.41 [0.26, 0.53] | 0.36 [0.20, 0.51] |
| | Threshold (number of items) | 0.36 [0.20, 0.49] | 0.31 [0.15, 0.45] |
| VWM | Capacity/K | 0.77 [0.69, 0.83] | 0.78 [0.73, 0.84] |
Figure 2. Reliability of the key measures from the motion coherence (A), multiple-object tracking (B, C), visual working memory change detection (D), and useful field of view (E, F, G, H) tasks. Note. Red markers indicate mean group performance from sessions 1 and 2. Error bars show ± standard error of measurement (SEM). Black markers indicate individual participant scores for session 1 and session 2; where multiple participants have the same score, black markers overlap.
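The SEM error bars described in the Figure 2 caption follow the classical-test-theory formula SEM = SD × √(1 − reliability). A small illustrative Python helper (the function name is ours; the SD and ICC plugged in below are taken from the tables in this record):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: expected spread of a person's observed
    scores around their true score, given the measure's reliability."""
    return sd * math.sqrt(1.0 - reliability)

# VWM capacity: session-1 SD = 0.75, ICC = 0.77 (from the tables above)
print(round(standard_error_of_measurement(0.75, 0.77), 2))  # → 0.36
```

Note that a perfectly reliable measure (reliability = 1) yields an SEM of zero, and the SEM grows as reliability falls toward zero.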
Figure 3. Variance components of the ICC for each behavioral measure. Note. The relative sizes of the variance components are shown for each measure. Bar sizes are normalized to the total variance for the measure and subdivided into variance accounted for by differences between participants (white), differences between sessions (grey), and error variance (black).
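A decomposition of this kind can be sketched with the standard two-way ANOVA expected mean squares. The stdlib-only Python example below is illustrative (the function name and the simulated numbers are ours, not the paper's): it returns the participant, session, and error variance shares, which sum to one:

```python
import random

def variance_components(data):
    """Split total score variance into between-participant, between-session,
    and error shares via two-way ANOVA expected mean squares (a sketch)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ms_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))
    var_p = max((ms_rows - ms_err) / k, 0.0)  # between participants
    var_s = max((ms_cols - ms_err) / n, 0.0)  # between sessions
    total = var_p + var_s + ms_err
    return var_p / total, var_s / total, ms_err / total

rng = random.Random(1)
# Participants differ (SD 1); session 2 adds a small practice gain (+0.2).
data = [
    (t + rng.gauss(0, 0.5), t + 0.2 + rng.gauss(0, 0.5))
    for t in (rng.gauss(0, 1) for _ in range(5000))
]
p, s, e = variance_components(data)
```

In this simulated example, most of the variance is between participants, a small slice reflects the session shift, and the remainder is error — the same three slices drawn in Figure 3.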
Summary of test-retest reliability from the literature.
| Task | Measure | Study | Reliability | Correlation coefficient |
|---|---|---|---|---|
| Eriksen flanker | RT cost | | 0.48 | Pearson's |
| | | | 0.48 | Pearson's |
| | | | 0.91 | ICC |
| | | | 0.52 | Pearson's |
| | | | 0.57 | ICC |
| | Error cost | | 0.29 | Pearson's |
| | | | 0.14 | Pearson's |
| | | | 0.65 | ICC |
| | | | 0.72 | ICC |
| Posner cueing task | Cueing effect | | 0.70 | ICC |
| Navon task | Local RT cost | | 0.14 | ICC |
| | Local error cost | | 0.82 | ICC |
| | Global RT cost | | 0 | ICC |
| | Global error cost | | 0.17 | ICC |
| Digit vigilance test (DVT) | Task duration | | 0.83 | ICC |
| Continuous performance task (CPT) | Commission errors | | 0.73 | Pearson's |
| | | | 0.51 | ICC |
| | | | 0.72 | ICC |
| | Omission errors | | 0.42 | Pearson's |
| Tasi test | % Hits | | 0.15 | ICC |
| | % Commission errors | | 0.23 | ICC |
| | Mean RT | | 0.85 | ICC |
| Conjunctive Continuous Performance Task-Visual (CCPT-V) | Mean RT/hits | | 0.76 | Pearson's |
| Trees Simple Visual Discrimination (DiViSA) | Commission errors | | 0.75 | ICC |
| | Test duration (seconds) | | 0.72 | ICC |
| Conjunctive Visual Search Test (CVST) | Mean RT | | 0.52 | Pearson's |
| Adaptive Choice Visual Search (ACVS) | Optimal choice (%) | | 0.83 | Pearson's |
| | Switch rate (%) | | 0.77 | Pearson's |
| Mouse Click Foraging Task (MCFT) | Mean run length (feature condition) | | 0.70 | Pearson's |
| | Mean run length (conjunction search) | | 0.88 | Pearson's |
| Split-Half Line Segment (SHLS) | Accuracy (hard targets) | | [0.71, 0.89]† | Pearson's |
| Value-driven attentional capture | RT cost | | 0.11 | Pearson's |
| | % Trials with initial fixation on high-value distractor | | 0.80 | Pearson's |
| Dot-probe task | RT cost | | 0.20‡ | Pearson's |
| | | | 0.09 | Pearson's |
| | % Trials with initial fixation on fear face | | 0.71 | ICC |
| Attentional blink | Switch AB | | 0.62 | Pearson's |
| | | | 0.39 | Pearson's |
| Change detection task | K/capacity | | 0.70 | Pearson's |
| Visuospatial N-back task | Mean accuracy 2-back | | 0.16 | Pearson's |
| | | | 0.54 | Pearson's |
| | Mean accuracy 3-back | | 0.70 | Pearson's |
| | | | 0.73 | Pearson's |
| Visuoverbal N-back task | Mean accuracy 3-back | | 0.57 | ICC |
Note. *Clarke et al. (2022) re-analyzed data from Hartkamp and Thornton (2017) to estimate test-retest reliability for the foraging paradigm. †The 95% confidence interval for Pearson's correlation coefficient. ‡Reliability for 500 ms angry condition reported.
Figure 4. Recommendations for evaluating test-retest reliability (see Doros & Lew, 2010; Gnambs, 2014; Hedge et al., 2018b; Hedge et al., 2020; Kievit et al., 2013; Meyhöfer et al., 2016; Nunnally, 1978; Parsons et al., 2019; Schuerger & Witt, 1989; as referenced above).