Thomas Pronk, Dylan Molenaar, Reinout W. Wiers, Jaap Murre.
Abstract
Estimating the reliability of cognitive task datasets is commonly done via split-half methods. We review four methods that differ in how the trials are split into parts: a first-second half split, an odd-even trial split, a permutated split, and a Monte Carlo-based split. Additionally, each splitting method can be combined with stratification by task design. These methods are reviewed in terms of the degree to which they are confounded with four effects that may occur in cognitive tasks: effects of time, task design, trial sampling, and non-linear scoring. Based on the theoretical review, we recommend Monte Carlo splitting (possibly in combination with stratification by task design) as the most robust method with respect to the four confounds considered. Next, we estimated the reliabilities of the main outcome variables from four cognitive task datasets, each (typically) scored with a different non-linear algorithm, by systematically applying each splitting method. Differences between methods were interpreted in terms of confounding effects inflating or attenuating reliability estimates. For three task datasets, our findings were consistent with our model of confounding effects. Evidence for confounding effects was strong for time and task design and weak for non-linear scoring. When confounding effects occurred, they attenuated reliability estimates. For one task dataset, findings were inconsistent with our model, but they may offer indicators for assessing whether a split-half reliability estimate is appropriate. Additionally, we offer suggestions for further research on reliability estimation, supported by a compendium R package that implements each of the splitting methods reviewed here.
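The four splitting methods reviewed in the abstract, plus stratification by task design, can be sketched as follows. This is an illustrative sketch, not the API of the authors' compendium R package: the function names, the representation of trials as index lists, and the handling of odd trial counts are all assumptions made here.

```python
import random


def split_half(trials, method, rng=None):
    """Split a list of trials into two halves using one of the four
    splitting methods reviewed in the abstract (names are illustrative):
    'first_second', 'odd_even', 'permutated' (random split without
    replacement), or 'monte_carlo' (sampling with replacement)."""
    rng = rng or random.Random(0)
    n = len(trials)
    if method == "first_second":
        return trials[: n // 2], trials[n // 2 :]
    if method == "odd_even":
        return trials[0::2], trials[1::2]
    if method == "permutated":
        # Random split without replacement: every trial lands in
        # exactly one half.
        shuffled = list(trials)
        rng.shuffle(shuffled)
        return shuffled[: n // 2], shuffled[n // 2 :]
    if method == "monte_carlo":
        # Each half is drawn with replacement from all trials, so the
        # halves may overlap and individual trials may repeat.
        return (
            [rng.choice(trials) for _ in range(n // 2)],
            [rng.choice(trials) for _ in range(n // 2)],
        )
    raise ValueError(f"unknown method: {method}")


def stratified_split(trials_by_condition, method, rng=None):
    """Stratification by task design: apply the chosen split within
    each scoring condition (stratum) and concatenate the halves."""
    rng = rng or random.Random(0)
    half_a, half_b = [], []
    for trials in trials_by_condition.values():
        part_a, part_b = split_half(trials, method, rng)
        half_a += part_a
        half_b += part_b
    return half_a, half_b
```

In this sketch, a permutated or Monte Carlo estimate would be obtained by repeating the split many times, scoring each half, correlating the per-participant scores, and aggregating over repetitions.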
Keywords: Cognitive tasks; Confounding effects; Non-linear scoring algorithms; Reliability
Year: 2021 PMID: 34100223 PMCID: PMC8858277 DOI: 10.3758/s13423-021-01948-3
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
Overview of task datasets and scoring algorithms used in the empirical assessment
| Task Dataset | Gambling Approach Avoidance Task (AAT) (Boffo et al., | Go/No-Go (GNG) (Hedge et al., | Ethnicity-Valence Implicit Association Task (IAT) (Abacioglu et al., | Stop Signal Task (SST) (Hedge et al., |
|---|---|---|---|---|
| #Participants, #trials | 48, 128 | 47, 600 | 31, 192 | 45, 600 |
| Scoring algorithm | Double difference of median RTs for correct responses (Heuer et al., | d’ (Hautus, | D-score for IATs that require a correct response for continuing to the next trial (Greenwald et al., | Stop-Signal Reaction Time, integration method (Hedge et al., |
| Scoring conditions (#trials) | Approach gambling (32), avoid gambling (32), approach neutral (32), avoid neutral (32) | Go (450), no-go (150) | Congruent practice (24), incongruent practice (24), congruent test (72), incongruent test (72) | Go (450), stop (150) |
| Stimuli (#trials) | 32 stimuli (2 approach, 2 avoid) | 5 stimuli (90 go, 30 no-go) | 4 categories (2 × 6 practice, 2 × 18 test) | None |
| Trial sequence | Scoring conditions and stimuli in random order | Stimuli in sequence, scoring conditions random within stimuli | Scoring conditions in sequence, stimuli alternated between target and attribute | Scoring conditions: stop delay based on go performance |
| Design interactions | None | Stimulus with first-second half | Scoring with first-second half and stimulus with odd-even | Scoring with odd-even |
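The table lists d' (citing Hautus) as the GNG scoring algorithm. As an illustration of a non-linear scoring algorithm, here is a minimal sketch of d' with a log-linear-style correction; the exact correction used in the paper is not given in this excerpt, so the +0.5/+1 adjustment below is an assumption.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear-style correction (adding 0.5 to each cell and 1 to
    each denominator) keeps rates away from 0 and 1, so the inverse
    normal CDF stays finite even for perfect performance.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

Because z is non-linear, d' computed on a half of the trials is not simply half of d' computed on all trials, which is one way non-linear scoring can interact with how the trials are split.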
Coefficients per splitting method, stratification level, and task dataset
| Method | Stratification | AAT % | AAT Coef | GNG % | GNG Coef | IAT % | IAT Coef | SST % | SST Coef |
|---|---|---|---|---|---|---|---|---|---|
| First-second | None | 94.20 | 0.20 | 0.00 | 0.84 | | | 0.29 | 0.74 |
| First-second | Scoring | 99.67 | 0.40 | 0.00 | 0.84 | 0.58 | 0.68 | 0.25 | 0.74 |
| First-second | Scoring and Stimuli | 89.94 | 0.15 | 17.69 | 0.91 | 6.02 | 0.75 | | |
| Odd-even | None | 23.90 | -0.24 | 73.00 | 0.94 | 11.54 | 0.76 | 18.71 | 0.88 |
| Odd-even | Scoring | 54.19 | -0.08 | 59.71 | 0.93 | -- | -- | 99.37 | 0.96 |
| Odd-even | Scoring and Stimuli | 89.94 | 0.15 | -- | -- | 95.01 | 0.89 | | |
| Permutated | None | | -0.10 | | 0.92 | | 0.82 | | 0.90 |
| Permutated | Scoring | | -0.11 | | 0.93 | | 0.83 | | 0.90 |
| Permutated | Scoring and Stimuli | | -0.12 | | 0.93 | | 0.83 | | |
| Monte Carlo | None | | 0.20 | | 0.93 | | 0.86 | | 0.91 |
| Monte Carlo | Scoring | | 0.21 | | 0.93 | | 0.86 | | 0.91 |
| Monte Carlo | Scoring and Stimuli | | 0.35 | | 0.93 | | 0.88 | | |
AAT Approach Avoidance Task, GNG Go/No-Go, IAT Implicit Association Task, SST Stop Signal Task
For the AAT, split-half Pearson correlations are shown, while for the other tasks, split-half Spearman-Brown-adjusted Pearson correlations are shown. The column Coef contains the value of the coefficient, while the column % shows the percentile of this value in the cumulative empirical distribution of non-stratified permutated coefficients. Below each coefficient obtained via resampled splitting, the 95% HDI is shown in italics. Coefficients that could not be calculated are left empty, while coefficients of splitting methods that are equivalent to the splitting method above them are indicated by dashed lines (--)
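The Spearman-Brown adjustment mentioned in the note above predicts full-length reliability from a half-length correlation via the standard prophecy formula for doubled test length, r* = 2r / (1 + r). A minimal sketch:

```python
def spearman_brown(r):
    """Spearman-Brown prophecy formula for doubled test length:
    predicts full-test reliability from a split-half correlation r."""
    return 2 * r / (1 + r)
```

The adjustment always moves a positive split-half correlation upward (e.g., 0.50 becomes about 0.67) and leaves 1.0 unchanged; it behaves oddly for negative correlations, which may be relevant to reporting unadjusted Pearson correlations for a task with near-zero coefficients such as the AAT here.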
Fig. 1 Histograms of coefficients calculated via permutated and Monte Carlo splitting without stratification. For the Approach Avoidance Task (AAT), coefficients are Pearson correlations, while for the Go/No-Go (GNG), Implicit Association Task (IAT), and Stop Signal Task (SST) they are Spearman-Brown-adjusted Pearson correlations