Gilles Dutilh1, Jeffrey Annis2, Scott D Brown3, Peter Cassey3, Nathan J Evans3, Raoul P P P Grasman4, Guy E Hawkins3, Andrew Heathcote5, William R Holmes2, Angelos-Miltiadis Krypotos6, Colin N Kupitz7, Fábio P Leite8, Veronika Lerche9, Yi-Shin Lin5, Gordon D Logan2, Thomas J Palmeri2, Jeffrey J Starns10, Jennifer S Trueblood2, Leendert van Maanen4, Don van Ravenzwaaij11, Joachim Vandekerckhove7, Ingmar Visser4, Andreas Voss9, Corey N White12, Thomas V Wiecki13, Jörg Rieskamp14, Chris Donkin15.
Abstract
Most data analyses rely on models. To complement statistical models, psychologists have developed cognitive models, which translate observed variables into psychologically interesting constructs. Response time models, in particular, assume that response time and accuracy are the observed expression of latent variables including 1) ease of processing, 2) response caution, 3) response bias, and 4) non-decision time. Inferences about these psychological factors hinge upon the validity of the models' parameters. Here, we use a blinded, collaborative approach to assess the validity of such model-based inferences. Seventeen teams of researchers analyzed the same 14 data sets. In each of these two-condition data sets, we manipulated properties of participants' behavior in a two-alternative forced choice task. The contributing teams were blind to the manipulations and had to infer which aspect of behavior was changed, using their method of choice. The contributors chose to employ a variety of models, estimation methods, and inference procedures. Our results show that, although conclusions were similar across different methods, these "modeler's degrees of freedom" did affect their inferences. Interestingly, many of the simpler approaches yielded inferences as robust and accurate as those of the more complex methods. We recommend that, in general, cognitive models become a typical analysis tool for response time data. In particular, we argue that the simpler models and procedures are sufficient for standard experimental designs. We finish by outlining situations in which more complicated models and methods may be necessary, and discuss potential pitfalls when interpreting the output from response time models.
Keywords: Cognitive modeling; Diffusion Model; LBA; Response Times; Validity
Year: 2019 PMID: 29450793 PMCID: PMC6449220 DOI: 10.3758/s13423-017-1417-2
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
Fig. 1 Graphical illustration of the diffusion model
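Because the figure itself is not reproduced here, the following minimal sketch may help make the model concrete. It simulates a simple diffusion process with a basic Euler scheme. It is not the code of any contributing team. The mapping of the four components onto parameters (ease of processing as `drift`, response caution as `boundary`, response bias as `start_frac`, non-decision time as `ndt`) follows the conventional reading of the model, and all parameter values, names, and the step size are illustrative assumptions.

```python
import random


def simulate_trial(drift, boundary, start_frac=0.5, ndt=0.3,
                   dt=0.001, noise=1.0, rng=random):
    """Simulate one two-choice trial; return (response, rt_in_seconds).

    All names and default values are hypothetical, chosen for illustration:
      drift      - mean rate of evidence accumulation (ease of processing)
      boundary   - distance between the two bounds (response caution)
      start_frac - starting point as a fraction of boundary (response bias)
      ndt        - non-decision time added to the decision time
    """
    x = start_frac * boundary          # evidence starts between 0 and boundary
    t = 0.0
    step_sd = noise * dt ** 0.5        # standard deviation of one noisy step
    while 0.0 < x < boundary:          # accumulate until a bound is crossed
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    response = "upper" if x >= boundary else "lower"
    return response, t + ndt


def simulate(n, seed=1, **params):
    """Simulate n independent trials with a seeded random generator."""
    rng = random.Random(seed)
    return [simulate_trial(rng=rng, **params) for _ in range(n)]


trials = simulate(500, drift=1.5, boundary=1.0)
p_upper = sum(r == "upper" for r, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(f"P(upper) = {p_upper:.2f}, mean RT = {mean_rt:.3f} s")
```

Raising `boundary` in this sketch makes responses slower and more accurate, while raising `drift` makes them faster and more accurate, which is the qualitative distinction between caution and ease that the contributors had to recover.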
Manipulations of response caution and response bias across blocks and descriptive statistics per block
| Block | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speed–accuracy | sp | ac | sp | sp | ac | ac | sp | sp | ac | ac | sp | sp | ac | ac | sp | sp | ac | sp |
| bias | no | no | no | left | left | no | no | right | right | no | no | left | left | no | no | right | right | no |
| RT (ms) | | | | | | | | | | | | | | | | | | |
| .1 quantile | 360 | 510 | 370 | 380 | 490 | 490 | 360 | 380 | 470 | 480 | 370 | 370 | 490 | 490 | 370 | 380 | 480 | 370 |
| .5 quantile | 490 | 670 | 490 | 500 | 640 | 640 | 470 | 490 | 610 | 610 | 480 | 480 | 630 | 630 | 470 | 480 | 610 | 480 |
| .9 quantile | 690 | 1040 | 660 | 670 | 970 | 990 | 640 | 660 | 920 | 900 | 660 | 630 | 960 | 980 | 630 | 650 | 910 | 650 |
| accuracy | 0.76 | 0.91 | 0.82 | 0.8 | 0.91 | 0.91 | 0.8 | 0.84 | 0.91 | 0.91 | 0.81 | 0.8 | 0.91 | 0.9 | 0.78 | 0.81 | 0.91 | 0.79 |
Top section: the design of the experimental blocks. Bottom section: descriptive statistics for each experimental block. Note that behavior remains largely invariant over the course of the experiment. Response caution (speed: sp vs. accuracy: ac) and response bias were manipulated across blocks. Difficulty was manipulated within blocks.
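The descriptive statistics in the table (the .1, .5, and .9 RT quantiles and accuracy per block) can be computed along the following lines. This is a minimal sketch: the function name, the example data, and the choice of quantile interpolation are assumptions, since the source does not state how the quantiles were computed.

```python
from statistics import quantiles


def describe_block(rts_ms, correct):
    """Return the .1, .5, and .9 RT quantiles and the accuracy of one block.

    rts_ms  - response times in milliseconds
    correct - booleans (or 0/1), one per trial
    The "inclusive" interpolation method is an assumption.
    """
    deciles = quantiles(rts_ms, n=10, method="inclusive")  # 9 cut points
    return {
        "q10": deciles[0],
        "q50": deciles[4],
        "q90": deciles[8],
        "accuracy": sum(correct) / len(correct),
    }


# Hypothetical block of 11 trials, for illustration only.
example_rts = list(range(100, 1101, 100))
example_correct = [True] * 9 + [False] * 2
block_stats = describe_block(example_rts, example_correct)
print(block_stats)
```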
Pseudo experiments
| exp | ease | caution | Bias R | Ndt | blocks cond. A | blocks cond. B |
|---|---|---|---|---|---|---|
| 1 | – | – | – | – | hard, speed, no bias | hard, speed, no bias |
| 2 | B | – | – | – | hard, speed, no bias | easy, speed, no bias |
| 3 | – | B | – | – | hard, speed, no bias | hard, accuracy, no bias |
| 4 | – | – | B | – | hard, speed, no bias | hard, speed, bias |
| 5 | B | B | – | – | hard, speed, no bias | easy, accuracy, no bias |
| 6 | B | – | B | – | hard, speed, no bias | easy, speed, bias |
| 7 | – | B | B | – | hard, speed, no bias | hard, accuracy, bias |
| 8 | A | B | – | – | easy, speed, no bias | hard, accuracy, no bias |
| 9 | A | – | B | – | easy, speed, no bias | hard, speed, bias |
| 10 | – | A | B | – | hard, accuracy, no bias | hard, speed, bias |
| 11 | A | B | B | – | easy, speed, no bias | hard, accuracy, bias |
| 12 | B | A | B | – | hard, accuracy, no bias | easy, speed, bias |
| 13 | B | B | A | – | hard, speed, bias | easy, accuracy, no bias |
| 14 | B | B | B | – | hard, speed, no bias | easy, accuracy, bias |
Each row shows, for one data set, which of the two conditions (A or B) was manipulated to have a higher value on each of the components: ease, caution, bias toward Response B, and nondecision time. "–" indicates no difference. The rightmost columns show from which conditions (see Table 1) the data in each of the two conditions originate.
Methods used by contributors
| Contributors | Code | Model | Estimation | Inference |
|---|---|---|---|---|
| Grasman | GR | Simple diffusion | EZ2 | E (Quade test on ind.) |
| Krypotos & Wiecki | KW | Simple diffusion | HB | E (Population post.) |
| van Ravenzwaaij | RA | Simple diffusion | HB | E (Bayesian |
| Vandekerckhove & Kupitz | VK | Simple diffusion | HB | M (Model indicator parameter) |
| White | WH | Simple diffusion | χ² | E (Bayesian |
| Hawkins | HA | Full diffusion¹ | HB | E (Population post.) |
| Leite | LE | Full diffusion | χ² | H (Parameter estimates) |
| Starns | ST | Full diffusion¹ | χ² | E (Bayesian |
| Vandekerckhove | VA | Full diffusion¹ | ML² | M (Wald test) |
| Voss & Lerche | VL | Full diffusion | KS | E (Frequentist |
| Annis & Palmeri | AP | LBA | HB | M+E (wAIC + Population post.)³ |
| Cassey & Logan | CL | LBA | HB | E (Population post.)³ |
| Lin & Heathcote | LH | LBA⁴ | ML | M+E (AIC/BIC + ANOVA) |
| Trueblood & Holmes | TH | LBA⁵ | HB | E (Population post.) |
| Visser | VI | LBA | ML | M (Stepwise regression) |
| Evans & Brown | EB | – | – | H (Summary statistics) |
| van Maanen | MA | – | – | H (Summary statistics) |
HB = Hierarchical Bayes; χ² = chi-squared; ML = maximum likelihood; EZ2 = method of moments estimation, as implemented in EZ2; KS = Kolmogorov–Smirnov; E = estimate-based; M = model selection; H = heuristic based; Pop = population; Post = posterior; Ind = individuals. ¹ Variability parameters fixed across conditions. ² Data treated as one participant. ³ Assumed just one manipulation per experiment, unless extremely strong evidence otherwise. ⁴ Both LBA and full diffusion model were fit, but the best fitting model was used, and this was always LBA. ⁵ Bias in accumulation rate parameters. Modelers AP, CL, ST, VI, and KW were allowed 2 extra weeks after the initial deadline to hand in their inferences.
Performance of the different methods
Column “True” shows, for each data set and each component (ease, caution, bias, ndt), which condition (A, B, or 0: neither) was manipulated to have a higher value on that component. Colored letters indicate the inferences made by the analysts: green letters indicate correct inferences, blue misses, orange false alarms, and black cases where the direction of the effect was flipped. Methods are sorted by the applied RT model, from left to right: simple diffusion model, full diffusion, LBA, and model-free.
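The four scoring categories in this caption can be expressed as a small classification rule. The sketch below is one reading of those categories, not the authors' scoring code: it assumes that a miss means inferring "0" when an effect was present, and a false alarm means inferring an effect when the key says "0". The function names and the example inference are hypothetical.

```python
from collections import Counter


def score_inference(truth, inferred):
    """Classify one inference ("A", "B", or "0") against the scoring key."""
    if truth == inferred:
        return "correct"
    if inferred == "0":
        return "miss"          # a true effect was not detected
    if truth == "0":
        return "false alarm"   # an effect was inferred where none was keyed
    return "flipped"           # effect detected, but in the wrong direction


def summarize(truths, inferences):
    """Return the proportion of inferences falling in each category."""
    counts = Counter(score_inference(t, i) for t, i in zip(truths, inferences))
    categories = ("correct", "miss", "false alarm", "flipped")
    return {c: counts[c] / len(truths) for c in categories}


# Key for data set 5 (ease: B, caution: B, bias: 0, ndt: 0) and a
# hypothetical analyst's inference for the same four components.
key = ["B", "B", "0", "0"]
inferred = ["B", "0", "A", "0"]
summary = summarize(key, inferred)
print(summary)
```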
Fig. 2 A visualization of the agreement between the different methods used. The radius of the black circles relative to the lighter colored background circles in the upper right of the matrix reflects the proportion of inferences shared between a pair of methods. The shade of the box underlying each set of points, and the numbers in the lower left of the matrix, depict the average of the proportion of shared inferences in each section. For example, the average proportion of shared inferences between all LBA and all simple diffusion models was 0.62
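The agreement measure described in this caption can be sketched as follows: the proportion of shared inferences for a pair of methods, and its average over all pairs drawn from two groups of methods. The inference vectors below are made up for illustration and are not the study's data, and the function names are assumptions.

```python
from itertools import product


def shared_proportion(a, b):
    """Proportion of positions at which two methods made the same inference."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_between_groups(group1, group2, inferences):
    """Average shared proportion over all pairs of distinct methods."""
    pairs = [(m1, m2) for m1, m2 in product(group1, group2) if m1 != m2]
    total = sum(shared_proportion(inferences[m1], inferences[m2])
                for m1, m2 in pairs)
    return total / len(pairs)


# Hypothetical inferences for four components of a single data set.
inferences = {
    "GR": ["B", "0", "0", "0"],
    "KW": ["B", "B", "0", "0"],
    "AP": ["B", "0", "0", "0"],
    "CL": ["0", "0", "0", "0"],
}
pair_agreement = shared_proportion(inferences["GR"], inferences["KW"])
group_agreement = mean_between_groups(["GR", "KW"], ["AP", "CL"], inferences)
print(pair_agreement, group_agreement)
```

In the figure, the per-pair value corresponds to the circle radii and the group average to the numbers in the lower left of the matrix.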
Fig. 3 A summary of the inferences of all methods of analysis for all data sets. Grey letters in front of each box show, for each data set (1-14) and each component (ease, caution, bias, ndt), which condition (A, B, or 0: neither) was manipulated to have a higher value on that component. Bars indicate how many methods concluded in favor of each of the options A, 0, and B. See text for details
Summary statistics for the quality of inferences drawn using each method
| Key | Statistic | GR | KW | RA | VK | WH | HA | LE | ST | VA | VL | AP | CL | LH | TH | VI | EB | MA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Model | Simple diffusion | Simple diffusion | Simple diffusion | Simple diffusion | Simple diffusion | Full diffusion | Full diffusion | Full diffusion | Full diffusion | Full diffusion | LBA | LBA | LBA | LBA | LBA | No model | No model |
| Planned | Correct | 0.84 | 0.73 | 0.73 | 0.66 | 0.75 | 0.73 | 0.71 | 0.75 | 0.77 | 0.71 | 0.66 | 0.64 | 0.70 | 0.62 | 0.68 | 0.77 | 0.70 |
| | Miss | 0.07 | 0 | 0.07 | 0.12 | 0.04 | 0.09 | 0.05 | 0 | 0.20 | 0.09 | 0.29 | 0.30 | 0.12 | 0.23 | 0.07 | 0.12 | 0.20 |
| | FA | 0.09 | 0.25 | 0.20 | 0.21 | 0.21 | 0.18 | 0.20 | 0.25 | 0.04 | 0.20 | 0.05 | 0.05 | 0.12 | 0.14 | 0.11 | 0.11 | 0.09 |
| Alternative 1 | Correct | 0.80 | 0.82 | 0.80 | 0.66 | 0.82 | 0.77 | 0.75 | 0.82 | 0.75 | 0.73 | 0.75 | 0.73 | 0.77 | 0.77 | 0.77 | 0.91 | 0.77 |
| | Miss | 0.11 | 0 | 0.07 | 0.16 | 0.05 | 0.09 | 0.05 | 0 | 0.20 | 0.11 | 0.25 | 0.27 | 0.11 | 0.16 | 0.07 | 0.07 | 0.18 |
| | FA | 0.09 | 0.16 | 0.14 | 0.18 | 0.14 | 0.14 | 0.11 | 0.18 | 0.05 | 0.16 | 0 | 0 | 0.09 | 0.07 | 0.07 | 0.02 | 0.05 |
| Alternative 2 | Correct | 0.68 | 0.86 | 0.89 | 0.82 | 0.91 | 0.89 | 0.70 | 0.91 | 0.64 | 0.84 | 0.50 | 0.48 | 0.64 | 0.68 | 0.52 | 0.68 | 0.64 |
| | Miss | 0.23 | 0.02 | 0.07 | 0.12 | 0.04 | 0.09 | 0.12 | 0 | 0.34 | 0.11 | 0.45 | 0.46 | 0.23 | 0.29 | 0.23 | 0.25 | 0.30 |
| | FA | 0.09 | 0.11 | 0.04 | 0.05 | 0.05 | 0.02 | 0.11 | 0.09 | 0.02 | 0.05 | 0.05 | 0.05 | 0.07 | 0.04 | 0.11 | 0.07 | 0.04 |
| No ndt | Correct | 0.86 | 0.83 | 0.86 | 0.76 | 0.88 | 0.86 | 0.74 | 0.93 | 0.71 | 0.86 | 0.55 | 0.52 | 0.67 | 0.64 | 0.57 | 0.74 | 0.71 |
Statistics are shown for three different scoring keys (Planned: assuming selective influence; Alternative 1: assuming caution manipulations affected also ease; Alternative 2: assuming caution manipulations affected also nondecision time) as well as for the planned key when ignoring nondecision time inferences. Methods are sorted by the applied RT model, from left to right: simple diffusion model, full diffusion, LBA, and model free. See text for details