Marc Brysbaert, Michaël Stevens.
Abstract
In psychology, attempts to replicate published findings are less successful than expected. For properly powered studies, the replication rate should be around 80%, whereas in practice less than 40% of the studies selected from different areas of psychology can be replicated. Researchers in cognitive psychology are hindered in estimating the power of their studies because the designs they use present a sample of stimulus materials to a sample of participants, a situation not covered by most power formulas. To remedy the situation, we review the literature related to the topic and introduce recent software packages, which we apply to the data of two masked priming studies with high power. We checked how the power of each study could be estimated and by how much each study could be reduced while remaining powerful enough. On the basis of this analysis, we recommend that a properly powered reaction time experiment with repeated measures have at least 1,600 word observations per condition (e.g., 40 participants, 40 stimuli). This is considerably more than current practice. We also show that researchers must include the number of observations in meta-analyses, because the effect sizes currently reported depend on the number of stimuli presented to the participants. Our analyses can easily be applied to newly gathered datasets.
Keywords: F1 analysis; F2 analysis; effect size; mixed effects models; power analysis; random factors
Year: 2018 PMID: 31517183 PMCID: PMC6646942 DOI: 10.5334/joc.10
Source DB: PubMed Journal: J Cogn ISSN: 2514-4820
Figure 1. Construction of the two prime types from the data of the Adelman et al. (2014) priming megastudy. Prime types varied from an identity prime (extreme left) to a prime with all letters different (extreme right).
Figure 2. Snapshot of the Adelman et al. (2014) database used. Participant = the rank number of the participant tested (not all participants who started the study provided useful results); item = the target word responded to; prime = highly or lowly related to the target; RT = the reaction time to the target in the lexical decision task; correct = whether or not the answer was correct.
Outcome of a traditional F1 and F2 analysis of the Adelman et al. (2014) dataset.
| Analysis | N | RT related (ms) | RT unrelated (ms) | F | MSe | p | d |
|---|---|---|---|---|---|---|---|
| F1 | 1,020 participants | 660.0 | 674.9 | F1(1,1019) = 766.1 | 146.98 | < .01 | .87 |
| F2 | 420 items | 661.4 | 677.6 | F2(1,419) = 649.3 | 84.87 | < .01 | 1.24 |
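The F1 and F2 rows are obtained by collapsing the trial-level data of Figure 2 over items (F1) or over participants (F2). The two prime conditions are then compared with a repeated-measures test. Below is a minimal Python sketch of that aggregation. The toy dataset and the helper names `make_toy_trials`, `paired_analysis` and `dz_from_F` are hypothetical. The sketch assumes that the reported d is d_z, the mean difference divided by the SD of the differences. For a one-df test this equals √(F/n), and that assumption appears consistent with the table: √(766.1/1020) ≈ .87 and √(649.3/420) ≈ 1.24.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def make_toy_trials(n_parts=4, n_items=4):
    """Hypothetical toy data in the long format of Figure 2.

    Every participant sees every item once; the prime condition is
    counterbalanced so each item appears in both conditions across
    participants. RTs are invented: a 15 ms priming effect plus
    participant, item, and deterministic pseudo-noise offsets.
    """
    trials = []
    for p in range(n_parts):
        for i in range(n_items):
            prime = "hi" if (p + i) % 2 == 0 else "lo"
            noise = ((7 * p + 3 * i) % 5) * 2
            rt = 600 + 10 * p + 5 * i + (15 if prime == "lo" else 0) + noise
            trials.append({"participant": p, "item": i, "prime": prime,
                           "rt": rt, "correct": 1})
    return trials


def paired_analysis(trials, unit):
    """Aggregate correct RTs per unit and condition, then test lo - hi.

    unit="participant" gives an F1-style analysis, unit="item" an F2-style one.
    Returns (n, t, F, d_z) with F(1, n - 1) = t**2 and d_z = mean / SD of the
    per-unit differences.
    """
    cells = defaultdict(list)
    for tr in trials:
        if tr["correct"]:  # RT analyses keep correct responses only
            cells[(tr[unit], tr["prime"])].append(tr["rt"])
    units = sorted({key[0] for key in cells})
    diffs = [mean(cells[(u, "lo")]) - mean(cells[(u, "hi")]) for u in units]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)
    t = d_z * sqrt(n)
    return n, t, t ** 2, d_z


def dz_from_F(F, n):
    """Recover d_z from a one-df repeated-measures F: d_z = sqrt(F / n)."""
    return sqrt(F / n)


trials = make_toy_trials()
for unit in ("participant", "item"):
    n, t, F, d_z = paired_analysis(trials, unit)
    print(f"{unit}: n = {n}, F(1,{n - 1}) = {F:.2f}, d_z = {d_z:.2f}")

print(round(dz_from_F(766.1, 1020), 2), round(dz_from_F(649.3, 420), 2))  # → 0.87 1.24
```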
Outcome of the lmer analysis (Bates et al., 2015) of the Adelman et al. (2014) dataset.
| Random effects | | | |
|---|---|---|---|
| Groups Name | Variance | Std.Dev. | Corr |
| participant (Intercept) | 10032.34 | 100.162 | |
| prime/participant | 27.89 | 5.282 | –0.40 |
| item (Intercept) | 1900.12 | 43.590 | |
| prime/item | 19.88 | 4.458 | 0.53 |
| Residual | 22128.15 | 148.755 | |
| Number of obs: 376476, groups: participant, 1020; item, 420 | | | |
| Fixed effects | Estimate | Std. Error | t value |
| (Intercept) | 662.582 | 3.805 | 174.13 |
| Prime (lo vs. hi) | 16.029 | 0.557 | 28.78 |
| Correlation of fixed effects | (Intercept) | | |
| Prime | –0.036 | | |

Estimated RTs are: related = 662.6 ms, unrelated = 678.6 ms.
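Each t value in the fixed-effects part is the estimate divided by its standard error. The estimated RTs are the intercept (related condition) and the intercept plus the Prime coefficient (unrelated condition). The short Python check below assumes 0/1 treatment coding of Prime, which is consistent with the estimated RTs listed. It also uses a normal approximation for the confidence interval; that interval is an addition of this sketch, not a value reported in the table.

```python
from statistics import NormalDist


def wald_summary(estimate, se, level=0.95):
    """Wald t statistic and normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate / se, (estimate - z * se, estimate + z * se)


# Fixed effects copied from the lmer table (RT in ms).
INTERCEPT, PRIME, PRIME_SE = 662.582, 16.029, 0.557

t_value, (ci_low, ci_high) = wald_summary(PRIME, PRIME_SE)
related = INTERCEPT            # reference level: highly related prime
unrelated = INTERCEPT + PRIME  # assumes 0/1 (treatment) coding of Prime
print(f"t = {t_value:.2f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}] ms")  # → t = 28.78, 95% CI = [14.9, 17.1] ms
print(f"related = {related:.1f} ms, unrelated = {unrelated:.1f} ms")    # → related = 662.6 ms, unrelated = 678.6 ms
```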
Illustration of how the effect size in the F1 analysis depends on the number of stimuli over which the participant means are averaged, and how the effect size in the F2 analysis depends on the number of participants over which the item means are averaged. Values were obtained by drawing random samples of N stimuli (F1) or N participants (F2) from the Adelman et al. (2014) database.
| Analysis | Sample limited to | d |
|---|---|---|
| F1 (all participants) | Nitems = 20 | .19 |
| F1 (all participants) | Nitems = 40 | .28 |
| F1 (all participants) | Nitems = 80 | .39 |
| F1 (all participants) | Nitems = 160 | .55 |
| F1 (all participants) | Nitems = 320 | .77 |
| F1 (all participants) | Nitems = 420 | .87 |
| F2 (all items) | Nparts = 20 | .18 |
| F2 (all items) | Nparts = 40 | .26 |
| F2 (all items) | Nparts = 80 | .37 |
| F2 (all items) | Nparts = 160 | .52 |
| F2 (all items) | Nparts = 320 | .74 |
| F2 (all items) | Nparts = 640 | 1.02 |
| F2 (all items) | Nparts = 1020 | 1.24 |
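Why does the F1 effect size keep rising as more items are averaged per participant? The Python sketch below gives a back-of-the-envelope answer under a simplified variance model, which is an assumption of this sketch rather than the model fitted above. In that model, each participant's priming effect is the population effect plus a participant-specific deviation, and averaging over items shrinks only the trial-noise part. Item random effects are ignored, and the parameter values are hypothetical. The helper name `f1_effect_size` is illustrative.

```python
from math import sqrt


def f1_effect_size(n_items_per_condition, delta, sd_slope, sd_resid):
    """Approximate d_z of a by-participant (F1) analysis.

    Simplified model (assumption): each participant's condition difference is
    delta + a participant-specific slope (SD = sd_slope) + the difference of two
    condition means, each averaged over n items of independent trial noise
    (SD = sd_resid). Item random effects are ignored.
    """
    var_diff = sd_slope ** 2 + 2 * sd_resid ** 2 / n_items_per_condition
    return delta / sqrt(var_diff)


# Hypothetical values, loosely in the range of the RT lmer table.
DELTA, SD_SLOPE, SD_RESID = 16.0, 5.0, 150.0
for n in (10, 20, 40, 80, 160):
    print(n, round(f1_effect_size(n, DELTA, SD_SLOPE, SD_RESID), 2))
print("ceiling:", DELTA / SD_SLOPE)  # limit as the number of items grows without bound
```

In this model, d_z rises with the number of items until it approaches delta / sd_slope. Under the same simplification with the roles of participants and items swapped, the F2 effect size rises with the number of participants. This is why effect sizes from F1 or F2 analyses cannot be compared across studies without knowing the number of observations.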
Figure 3. Input to the Westfall et al. (2014) website for calculating the power of a simple design with random effects of participants and targets (items). Data are based on the lmer analysis of the Adelman et al. (2014) dataset.
Power (%) in the Adelman et al. (2014) study, estimated by simulation. Numbers not given are all > 80. This table shows that the 16 ms effect in the study could be detected reliably with some 6,000 observations (60 participants and 100 stimuli; 80 participants and 80 stimuli; 100 participants and 60 stimuli). The standard errors of the estimates are about 2.5 (i.e., the confidence interval of the 17% power estimate in the condition with 20 participants and 20 items runs from 12% to 22%).
| Nitems \ Nparts | 20 | 40 | 60 | 80 | 100 | 120 | 1020 |
|---|---|---|---|---|---|---|---|
| 20 | 17 | 25 | 32 | 42 | 46 | 51 | 99.8 |
| 40 | 21 | 41 | 51 | 69 | 72 | 76 | |
| 60 | 28 | 52 | 64 | 76 | 88 | | |
| 80 | 37 | 62 | 77 | 83 | | | |
| 100 | 41 | 70 | 84 | | | | |
| 120 | 47 | 74 | | | | | |
| 420 | 86 | | | | | | |
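The power values above come from simulation. The Python sketch below illustrates the general Monte Carlo idea with an important simplification, so it will not reproduce the table. Each simulated dataset is analyzed with a by-participant paired t-test (an F1-style analysis) rather than by refitting a mixed model, so item variability is ignored at the testing stage. Data are generated from crossed, uncorrelated random intercepts and slopes for participants and items; the lack of correlation is an assumption. Default SDs are rounded from the RT lmer table, and the baseline of 650 ms is arbitrary. The function name `simulate_power` is hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_power(n_parts, n_items, delta=16.0, sd_part=100.0, sd_part_slope=5.3,
                   sd_item=43.6, sd_item_slope=4.5, sd_resid=148.8,
                   n_sims=200, alpha=0.05, seed=1):
    """Monte Carlo power of a by-participant paired t-test (F1-style).

    Data are generated from crossed random participant and item intercepts
    and slopes (uncorrelated, an assumption); each participant sees every
    item once, with the prime condition alternating across items and
    counterbalanced across participants.
    """
    if n_items < 2:
        raise ValueError("need at least one item per condition")
    rng = np.random.default_rng(seed)
    # x = 1 marks the unrelated condition: (participant + item) odd -> unrelated
    x = (np.add.outer(np.arange(n_parts), np.arange(n_items)) % 2).astype(float)
    significant = 0
    for _ in range(n_sims):
        u0 = rng.normal(0.0, sd_part, (n_parts, 1))        # participant intercepts
        u1 = rng.normal(0.0, sd_part_slope, (n_parts, 1))  # participant priming slopes
        w0 = rng.normal(0.0, sd_item, (1, n_items))        # item intercepts
        w1 = rng.normal(0.0, sd_item_slope, (1, n_items))  # item priming slopes
        noise = rng.normal(0.0, sd_resid, (n_parts, n_items))
        rt = 650.0 + u0 + w0 + (delta + u1 + w1) * x + noise
        # participant means per condition (the F1 aggregation)
        related = (rt * (1 - x)).sum(axis=1) / (1 - x).sum(axis=1)
        unrelated = (rt * x).sum(axis=1) / x.sum(axis=1)
        if stats.ttest_rel(unrelated, related).pvalue < alpha:
            significant += 1
    return significant / n_sims


for n_parts, n_items in ((20, 20), (40, 40), (80, 80)):
    print(n_parts, n_items, simulate_power(n_parts, n_items))
```

A faithful replication would instead refit the crossed random-effects model to every simulated dataset, which is far slower but respects item variability.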
Outcome of analyses based on invRT for the Adelman et al. (2014) database, showing that the analysis of invRT is more powerful than the analysis of RT. Numbers not shown in the power analysis table are >80.
| Analysis | N | invRT related | invRT unrelated | F | MSe | p | d |
|---|---|---|---|---|---|---|---|
| F1 | 1,020 participants | –1.61 (RT = 1000/1.61 = 621 ms) | –1.57 (RT = 1000/1.57 = 637 ms) | F1(1,1019) = 1153 | 0.0007 | < .01 | 1.06 |
| F2 | 420 items | –1.61 (RT = 1000/1.61 = 621 ms) | –1.57 (RT = 1000/1.57 = 637 ms) | F2(1,419) = 1000 | 0.0004 | < .01 | 1.54 |

| Random effects | | | |
|---|---|---|---|
| Groups Name | Variance | Std.Dev. | Corr |
| participant (Intercept) | 0.0490519 | 0.368 | |
| prime/participant | 0.0004761 | 0.004 | –0.78 |
| item (Intercept) | 0.0091417 | 0.068 | |
| prime/item | 0.0001626 | 0.001 | 0.10 |
| Residual | 0.0746261 | 0.559 | |
| Number of obs: 376476, groups: participant, 1020; item, 420 | | | |
| Fixed effects | Estimate | Std. Error | t value |
| (Intercept) | –1.604953 | 0.008382 | –191.48 |
| prime | 0.041329 | 0.001285 | 32.17 |
| Correlation of fixed effects | (Intercept) | | |
| prime | –0.354 | | |

Estimated RTs are: related = –1000/–1.605 = 623 ms, unrelated = –1000/(–1.605 + .041) = 639 ms.

Effect size Westfall et al.:

Power (%) as a function of the number of participants (columns) and items (rows):

| Nitems \ Nparts | 20 | 40 | 60 | 80 | 100 | 120 | 1020 |
|---|---|---|---|---|---|---|---|
| 20 | 21 | 39 | 45 | 54 | 61 | 71 | 100 |
| 40 | 37 | 59 | 76 | 86 | | | |
| 60 | 47 | 74 | 88 | | | | |
| 80 | 54 | 84 | | | | | |
| 100 | 67 | | | | | | |
| 120 | 68 | | | | | | |
| 420 | 97 | | | | | | |
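The invRT values in this table are on a −1000/RT scale, and the estimated RTs are obtained by transforming back. The Python sketch below shows both directions of the transformation. It also illustrates, with hypothetical RTs, one reason why the back-transformed means (e.g., 621 ms) are lower than the raw-RT means of the traditional analysis (e.g., 660.0 ms). Averaging on the inverse scale and transforming back yields a harmonic mean, which is never larger than the arithmetic mean. The helper names `to_inv_rt` and `from_inv_rt` are hypothetical.

```python
from statistics import harmonic_mean, mean


def to_inv_rt(rt_ms):
    """Inverse RT in the units used above: invRT = -1000 / RT (RT in ms)."""
    return -1000.0 / rt_ms


def from_inv_rt(inv_rt):
    """Back-transform an invRT value (or a mean invRT) to milliseconds."""
    return -1000.0 / inv_rt


# Back-transforming the fixed effects of the invRT lmer model.
related_ms = from_inv_rt(-1.605)
unrelated_ms = from_inv_rt(-1.605 + 0.041)
print(round(related_ms), round(unrelated_ms))  # → 623 639

# Hypothetical RTs: averaging on the invRT scale and transforming back gives
# the harmonic mean, which is never larger than the arithmetic mean.
rts = [520.0, 600.0, 680.0, 910.0]
back = from_inv_rt(mean(to_inv_rt(rt) for rt in rts))
print(round(back, 1), round(harmonic_mean(rts), 1), mean(rts))
```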
Figure 4. Top of the Perea et al. (2015) database.
Outcome of analyses based on invRT for the Perea et al. (2015) database.
| Analysis | N | invRT related | invRT unrelated | F | MSe | p | d |
|---|---|---|---|---|---|---|---|
| F1 | 40 participants | –1.756 (RT = 1000/1.756 = 569 ms) | –1.647 (RT = 1000/1.647 = 607 ms) | F1(1,39) = 77.81 | 0.00306 | < .01 | 1.39 |
| F2 | 120 items | –1.757 (RT = 1000/1.757 = 569 ms) | –1.645 (RT = 1000/1.645 = 608 ms) | F2(1,119) = 120.9 | 0.0062 | < .01 | 1.00 |

| Random effects | | | |
|---|---|---|---|
| Groups Name | Variance | Std.Dev. | Corr |
| participant (Intercept) | 0.059202 | 0.349 | |
| prime/participant | 0.001526 | 0.009 | –1.00 |
| item (Intercept) | 0.004365 | 0.026 | |
| prime/item | 0.000000 | 0.000 | |
| Residual | 0.104580 | 0.616 | |
| Number of obs: 4512, groups: ITEM, 120; SUBJECT, 40 | | | |
| Fixed effects | Estimate | Std. Error | t value |
| (Intercept) | –1.75656 | 0.03953 | –44.43 |
| REPETITION | 0.11161 | 0.01145 | 9.75 |
| Correlation of fixed effects | (Intercept) | | |
| REPETITION | –0.627 | | |

Estimated RTs are: related = –1000/–1.757 = 569 ms, unrelated = –1000/(–1.757 + .112) = 608 ms.

Effect size Westfall et al.:
Figure 5. Outcome of the powerCurve command from the simr package for the Perea et al. (2015) dataset. It shows how the power obtained with the 40 participants tested increases as a function of the number of items. With 40 items, we have enough power to observe the 39 ms repetition priming effect.
Figure 6. Outcome of the powerCurve command (simr package) for the Perea et al. (2015) dataset. It shows how the power obtained with the 120 items tested increases as a function of the number of participants. With 7 participants, we have enough power to observe the 39 ms repetition priming effect.
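Figures 5 and 6 were produced with simr's `powerCurve`, which works from simulated data based on the fitted mixed model. The Python sketch below is a much cheaper analytic stand-in built on strong simplifying assumptions. It uses a by-participant paired t-test, whose exact power follows from the noncentral t distribution. It also uses a simplified d_z model that ignores item random effects, and the parameter values are hypothetical. The sketch traces power over the number of items for a fixed number of participants and reports the first grid point that reaches 80%. This is not what simr does, and it will not reproduce the figures. All function names are hypothetical.

```python
from math import sqrt

from scipy import stats


def paired_t_power(d_z, n, alpha=0.05):
    """Two-sided power of a paired t-test, computed from the noncentral t."""
    df = n - 1
    nc = d_z * sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


def f1_effect_size(n_items_per_condition, delta, sd_slope, sd_resid):
    """Simplified d_z model (assumption): participant slopes + trial noise only."""
    return delta / sqrt(sd_slope ** 2 + 2 * sd_resid ** 2 / n_items_per_condition)


def power_curve(n_parts, item_grid, delta, sd_slope, sd_resid, alpha=0.05):
    """(items per condition, power) pairs for a fixed number of participants."""
    return [(k, paired_t_power(f1_effect_size(k, delta, sd_slope, sd_resid),
                               n_parts, alpha))
            for k in item_grid]


def smallest_adequate(curve, target=0.80):
    """First grid point whose power reaches the target, or None."""
    for k, power in curve:
        if power >= target:
            return k
    return None


# Hypothetical parameters (ms); not estimates from the Perea et al. data.
curve = power_curve(n_parts=40, item_grid=(5, 10, 20, 40, 60),
                    delta=39.0, sd_slope=10.0, sd_resid=150.0)
for k, power in curve:
    print(k, round(power, 2))
print("items per condition needed for .80:", smallest_adequate(curve))
```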
Power of the Perea et al. study to observe priming effects of various magnitudes. The same information is given for the Adelman et al. study when the number of participants is limited to 40 and the number of items to 120.
| Priming effect | Perea et al. | Adelman et al. (40 participants, 120 items) |
|---|---|---|
| 5 ms | .25 | .32 |
| 7 ms | .44 | .45 |
| 10 ms | .80 | .61 |
| 12 ms | .89 | .81 |
| 15 ms | .91 | .90 |
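For a fixed design (fixed numbers of participants and items), the standard error of the priming effect is roughly constant. Power therefore depends mainly on the ratio of the effect to that standard error. The Python sketch below is a back-of-the-envelope normal-approximation (Wald) calculation with a hypothetical standard error. It is not the simulation method behind the table, and the function name `wald_power` is hypothetical.

```python
from statistics import NormalDist


def wald_power(effect, se, alpha=0.05):
    """Two-sided power of a z (Wald) test of `effect` with standard error `se`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


SE_MS = 3.5  # hypothetical SE of the priming effect for one fixed design, in ms
for effect in (5, 7, 10, 12, 15):
    print(f"{effect:>2} ms: power = {wald_power(effect, SE_MS):.2f}")
```

With a zero effect, the function returns alpha (the false-positive rate). Shrinking the standard error, for example by adding participants or items, moves the whole curve to the left.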
Numbers of participants and trials used in a sample of masked priming studies. Trials limited to those items that were analyzed (e.g., the words in a lexical decision task). The last column shows the number of observations per cell of the design (i.e. per condition tested).
| Reference | Study | Task | Nconditions | Npart | Ntrials | Nobs/cell |
|---|---|---|---|---|---|---|
| Bell et al. | Exp 1 | Semantic classification | 6 | 23 | 120 | 460 |
| | Exp 2 | Semantic classification | 6 | 41 | 120 | 820 |
| | Exp 3 | Semantic classification | 6 | 39 | 120 | 780 |
| Beyersmann et al. | | Lexical decision | 8 | 191 | 50 | 1194 |
| Ussishkin et al. | Exp 2a | Lexical decision | 3 | 66 | 36 | 792 |
| | Exp 2b | Lexical decision | 3 | 70 | 36 | 840 |
| Kgolo & Eisenbeiss | Exp 1 | Lexical decision | 7 | 70 | 55 | 550 |
| | Exp 2 | Lexical decision | 8 | 63 | 55 | 433 |
| | Exp 3 | Lexical decision | 8 | 65 | 55 | 446 |
| Sulpizio & Job | Exp 1 | Word naming | 6 | 24 | 48 | 192 |
| | Exp 2 | Word naming | 6 | 24 | 48 | 192 |
| | Exp 3 | Word naming | 6 | 24 | 48 | 192 |
| | Exp 4 | Word naming | 9 | 20 | 56 | 124 |
| Dasgupta et al. | | Lexical decision | 10 | 28 | 490 | 1372 |
| Guldenpenning et al. | Exp 1 | Action decision | 24 | 44 | 408 | 748 |
| | Exp 2 | Action decision | 48 | 50 | 576 | 600 |
| Atas et al. | | Direction decision | 32 | 29 | 2560 | 2320 |
| Perea et al. | | Lexical decision | 4 | 40 | 120 | 1200 |
| Kiefer et al. | Exp 1 | Evaluative decision | 8 | 24 | 384 | 1152 |
| | Exp 2 | Evaluative decision | 16 | 24 | 768 | 1152 |
| | Exp 3 | Evaluative decision | 2 | 20 | 150 | 1500 |
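The last column can be recomputed as Npart × Ntrials / Nconditions, which assumes that trials are split evenly over conditions. The Python sketch below does this for a subset of the rows and flags the studies that fall below the 1,600-observation-per-condition guideline from the abstract (40 participants × 40 stimuli). The names `STUDIES`, `RECOMMENDED_MIN` and `obs_per_cell` are hypothetical.

```python
# Rows copied from the table above:
# (reference, study, n_conditions, n_participants, n_trials, reported Nobs/cell)
STUDIES = [
    ("Bell et al.", "Exp 1", 6, 23, 120, 460),
    ("Sulpizio & Job", "Exp 1", 6, 24, 48, 192),
    ("Dasgupta et al.", "", 10, 28, 490, 1372),
    ("Atas et al.", "", 32, 29, 2560, 2320),
    ("Perea et al.", "", 4, 40, 120, 1200),
    ("Kiefer et al.", "Exp 3", 2, 20, 150, 1500),
]

RECOMMENDED_MIN = 40 * 40  # 1,600 observations per condition (abstract)


def obs_per_cell(n_conditions, n_participants, n_trials):
    """Observations per condition, assuming trials are split evenly."""
    return n_participants * n_trials / n_conditions


below = []
for ref, study, n_cond, n_part, n_trials, reported in STUDIES:
    n_obs = obs_per_cell(n_cond, n_part, n_trials)
    if n_obs < RECOMMENDED_MIN:
        below.append((ref, study, n_obs))
    label = f"{ref} {study}".strip()
    print(f"{label}: {n_obs:.0f} observations per cell (reported {reported})")
print(len(below), "of", len(STUDIES), "studies fall below", RECOMMENDED_MIN)
```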