Marjan Bakker, Coosje L S Veldkamp, Olmo R van den Akker, Marcel A L M van Assen, Elise Crompvoets, How Hwee Ong, Jelte M Wicherts.
Abstract
In this preregistered study, we investigated whether the statistical power of a study is higher when researchers are asked to conduct a formal power analysis before collecting data. We compared the sample size descriptions from two sources: (i) a sample of pre-registrations created according to the guidelines for the Center for Open Science Preregistration Challenge (PCRs) and a sample of institutional review board (IRB) proposals from Tilburg School of Behavior and Social Sciences, both of which include a recommendation to do a formal power analysis, and (ii) a sample of pre-registrations created according to the guidelines for Open Science Framework Standard Pre-Data Collection Registrations (SPRs), in which no guidance on sample size planning is given. We found that PCRs and IRBs (72%) more often included sample size decisions based on power analyses than the SPRs (45%). However, this did not result in larger planned sample sizes. The determined sample size of the PCRs and IRB proposals (Md = 90.50) was not higher than the determined sample size of the SPRs (Md = 126.00; W = 3389.5, p = 0.936). Typically, power analyses in the registrations were conducted with G*power, assuming a medium effect size, α = .05, and a power of .80. Only 20% of the power analyses contained enough information to fully reproduce the results, and only 62% of these power analyses pertained to the main hypothesis test in the pre-registration. We therefore see ample room for improvement in the quality of the registrations and offer several recommendations for achieving it.
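The typical power analysis described in the abstract (medium effect size, α = .05, power = .80) can be illustrated with a short sketch. It assumes a two-group comparison of means with Cohen's d = 0.5 and uses the normal approximation instead of the noncentral t distribution, so a dedicated tool may report a slightly larger number. The function name `n_per_group` and its parameters are hypothetical and not taken from the study.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group sample size for comparing two independent means.

    Uses the normal approximation n = 2 * (z_alpha + z_power)^2 / d^2,
    which is an assumption of this sketch, not the study's own procedure.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist().inv_cdf
    # Critical value depends on whether the test is one- or two-sided.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Medium effect, alpha = .05, power = .80, two-sided vs. one-sided.
print(n_per_group(0.5))
print(n_per_group(0.5, two_sided=False))
```

The two calls differ only in the sidedness of the test, yet they return different sample sizes. A power analysis that omits sidedness, effect size type, or α therefore cannot be reproduced exactly.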
Year: 2020 PMID: 32735597 PMCID: PMC7394423 DOI: 10.1371/journal.pone.0236079
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Descriptives of all the items in the protocol for each of the registration types separately.
| Item | SPR | PCR | IRB | Total | ES (a) | p (b) |
|---|---|---|---|---|---|---|
| N | 53 | 52 | 155 | 260 | | |
| Mention power (Q1) | 27 (51%) | 31 (60%) | 128 (83%) | 186 (72%) | 0.304 | < .001 |
| Sample size based on: | ||||||
| Power (Q2a) | 24 (45%) | 28 (54%) | 122 (79%) | 174 (67%) | 0.310 | < .001 |
| Practical constraints (Q2b) | 6 (11%) | 20 (38%) | 13 (8%) | 39 (15%) | 0.330 | < .001 |
| Rule of Thumb (Q2c) | 4 (8%) | 5 (10%) | 38 (25%) | 47 (18%) | 0.204 | .082 |
| Other studies (Q2d) | 2 (4%) | 13 (25%) | 12 (8%) | 27 (10%) | 0.245 | .021 |
| As many participants as possible (Q2e) | 0 (0%) | 1 (2%) | 4 (3%) | 5 (2%) | 0.073 | 1.000 |
| N. of power analyses (Q3) | 0.174 | .874 | ||||
| 1 | 16 (67%) | 26 (93%) | 106 (87%) | 148 (85%) | ||
| 2 | 5 (21%) | 2 (7%) | 13 (11%) | 20 (11%) | ||
| 3 or more | 3 (13%) | 0 (0%) | 3 (2%) | 6 (3%) | ||
| Program (Q4) | 0.125 | 1.000 | ||||
| G*power | 11 (46%) | 12 (43%) | 73 (60%) | 96 (55%) | ||
| Other | 0 (0%) | 1 (4%) | 5 (4%) | 6 (3%) | ||
| Not specified | 13 (54%) | 15 (54%) | 44 (36%) | 72 (41%) | ||
| ES type (Q5) not specified | 4 (17%) | 8 (29%) | 40 (33%) | 52 (30%) | 0.120 | 1.000 |
| ES value (Q6) not specified | 2 (8%) | 5 (18%) | 19 (16%) | 26 (15%) | 0.078 | 1.000 |
| ES based on: (Q7) | 0.315 | .028 | ||||
| Cohen’s values | 5 (21%) | 6 (21%) | 25 (20%) | 36 (21%) | ||
| Earlier study | 5 (21%) | 12 (43%) | 17 (14%) | 34 (20%) | ||
| Literature | 1 (4%) | 3 (11%) | 16 (13%) | 20 (11%) | ||
| Pilot study | 0 (0%) | 2 (7%) | 0 (0%) | 2 (1%) | ||
| Only interested in large ES | 1 (4%) | 0 (0%) | 0 (0%) | 1 (1%) | ||
| Other | 0 (0%) | 0 (0%) | 3 (2%) | 3 (2%) | ||
| Not specified | 12 (50%) | 5 (18%) | 61 (50%) | 78 (45%) | ||
| α (Q8) | 0.100 | 1.000 | ||||
| .05 | 15 (63%) | 21 (75%) | 77 (63%) | 113 (65%) | ||
| Other value | 1 (4%) | 0 (0%) | 10 (8%) | 11 (6%) | ||
| Not specified | 8 (33%) | 7 (25%) | 35 (29%) | 50 (29%) | ||
| Sidedness of the test (Q9) | 0.164 | 1.000 | ||||
| One-sided | 5 (21%) | 2 (7%) | 5 (4%) | 12 (7%) | ||
| Two-sided | 2 (8%) | 5 (18%) | 19 (16%) | 26 (15%) | ||
| Not specified | 17 (71%) | 21 (75%) | 98 (80%) | 136 (78%) | ||
| Power (Q10) | 0.125 | 1.000 | ||||
| .8 | 14 (58%) | 17 (61%) | 67 (55%) | 98 (56%) | ||
| Other | 9 (38%) | 9 (32%) | 32 (26%) | 50 (29%) | ||
| Not specified | 1 (4%) | 2 (7%) | 23 (19%) | 26 (15%) | ||
| Sample size | ||||||
| Median | 126 | 90 | 92 | 99.5 | 1.000 | |
| Not specified | 13 (25%) | 2 (4%) | 5 (3%) | 20 (8%) | 0.320 | < .001 |
| Complete (Q12) | 9 (38%) | 4 (14%) | 21 (17%) | 34 (20%) | 0.183 | 1.000 |
| Relevant (Q13) | 5 (21%) | 3 (11%) | - | 8 (15%) | 0.086 | 1.000 |
| Correct (Q14) | 8 (33%) | 4 (14%) | 21 (17%) | 33 (19%) | 0.149 | 1.000 |
a Cramér’s V for all Fisher exact tests.
b Holm-corrected p values.
c Based on all included registrations and IRB proposals.
d This is a robust ANOVA with 20% trimmed means.
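The effect size and p-value columns described in the footnotes can be illustrated with a minimal sketch. It computes Cramér's V from the chi-square statistic for the "Mention power (Q1)" row, with the "did not mention" counts derived as N minus the reported counts, and applies a Holm step-down correction to a set of p-values. Whether the authors computed V in exactly this way is an assumption. The raw p-values are not given in the table, so the Holm input below is purely hypothetical, and the function names are illustrative.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows."""
    rows, cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_tot)
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Q1 "Mention power": yes / no counts for SPR, PCR, IRB (no = N - yes).
q1 = [[27, 31, 128], [53 - 27, 52 - 31, 155 - 128]]
v_q1 = cramers_v(q1)
print(round(v_q1, 3))

# Hypothetical raw p-values, only to show the mechanics of the correction.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With many tests in one family, the Holm multipliers push moderate raw p-values up to the cap of 1, which is consistent with the many entries of 1.000 in the table.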
Mean (M), median (Md), [range], and frequency (N) of the different effect sizes used in the power analyses (Cohen’s d, r, f, and f²; effect sizes transformed to Cohen’s d) for each registration type separately.
| Effect size | SPR | PCR | IRB |
|---|---|---|---|
| Cohen’s | |||
| - | |||
According to Cohen (33), the threshold values for small, medium, and large effect sizes are, respectively, 0.2, 0.5, and 0.8 for Cohen’s d; 0.1, 0.3, and 0.5 for r; 0.1, 0.25, and 0.4 for f; and 0.02, 0.15, and 0.35 for f². These are also the threshold values used in G*power.
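To compare effect sizes reported in different metrics, they can be transformed to Cohen's d. The sketch below uses common textbook conversions and assumes two equal-sized groups for the f and f² cases. The source does not state which formulas the authors applied, so these functions are illustrative assumptions and not a reproduction of their method.

```python
import math


def r_to_d(r):
    """Convert a correlation r to Cohen's d (assumes two equal-sized groups)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)


def f_to_d(f):
    """Convert Cohen's f to d for a two-group design (d = 2f)."""
    return 2 * f


def f2_to_d(f2):
    """Convert Cohen's f-squared to d via f = sqrt(f2), then d = 2f."""
    if f2 < 0:
        raise ValueError("f2 must be non-negative")
    return 2 * math.sqrt(f2)


# Cohen's "medium" thresholds in each metric, expressed as d.
medium_as_d = {
    "d": 0.5,
    "r": r_to_d(0.3),
    "f": f_to_d(0.25),
    "f2": f2_to_d(0.15),
}
for metric, d in medium_as_d.items():
    print(f"{metric}: d = {d:.3f}")
```

Under these conversions the "medium" labels do not coincide across metrics: only the f threshold maps exactly onto d = 0.5, while the r and f² thresholds map onto larger d values. A power analysis that names only a verbal label therefore leaves the planned sample size ambiguous.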