Marjan Bakker, Coosje L S Veldkamp, Marcel A L M van Assen, Elise A V Crompvoets, How Hwee Ong, Brian A Nosek, Courtney K Soderberg, David Mellor, Jelte M Wicherts.
Abstract
Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
Year: 2020 PMID: 33296358 PMCID: PMC7725296 DOI: 10.1371/journal.pbio.3000937
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
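The abstract summarizes the format difference with Cliff's Delta, and the per-degree-of-freedom table later reports Cliff's D alongside a rank-sum statistic W (the "Test" column). Both can be computed by comparing every structured score with every unstructured score. The sketch below assumes W is the Mann-Whitney U of the structured sample with ties counted as half (the convention R's `wilcox.test(x, y)` uses for its first argument); this is our reading of the table, not the authors' code. The counts are hypothetical reconstructions for C1 (Random assignment), back-calculated from the table's percentages assuming 53 unstructured and 52 structured registrations with NA cases dropped.

```python
# Cliff's Delta and a Mann-Whitney W for two samples of ordinal (0-3) scores.
# Assumption: W is the U statistic of the first (structured) sample, ties
# counted as half; this is an interpretation, not the authors' implementation.

def expand(counts):
    """Turn a {score: count} mapping into a flat list of scores."""
    return [score for score, n in counts.items() for _ in range(n)]


def compare_formats(structured, unstructured):
    """Return (W, Cliff's Delta) comparing structured vs. unstructured scores."""
    greater = less = ties = 0
    for s in structured:
        for u in unstructured:
            if s > u:
                greater += 1
            elif s < u:
                less += 1
            else:
                ties += 1
    w = greater + 0.5 * ties  # Mann-Whitney U for `structured`
    delta = (greater - less) / (greater + less + ties)  # P(S > U) - P(S < U)
    return w, delta


# Hypothetical counts for C1 (Random assignment), back-calculated from the
# table percentages with NA cases dropped (41 unstructured, 37 structured).
c1_unstructured = expand({0: 35, 1: 1, 2: 5})
c1_structured = expand({0: 18, 1: 6, 2: 13})

w, delta = compare_formats(c1_structured, c1_unstructured)
print(w, round(delta, 2))  # 1028.5 0.36
```

Under these assumptions the sketch reproduces the C1 row (W = 1,028.5, Cliff's D = 0.36). Positive values mean the structured format scored higher, matching the sign of the table (for example, C2 Blinding is negative). The all-pairs loop is fine at this sample size; larger samples would call for a rank-based computation.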
Degrees of freedom in formulating the hypotheses, designing the study, collecting the data, analyzing the data, and reporting of psychological studies.
| Code | Related | Type of Researcher Degrees of Freedom | Label |
|---|---|---|---|
| T1 | R6 | Conducting explorative research without any hypothesis | Hypothesis |
| T2 | | Studying a vague hypothesis that fails to specify the direction of the effect | Direction hypothesis |
| D1 | A8 | Creating multiple manipulated IVs and conditions | Multiple manipulated IVs |
| D2 | A10 | Measuring additional variables that can later be selected as covariates, IVs, mediators, or moderators | Additional IVs |
| D3 | A5 | Measuring the same DV in several alternative ways | Multiple measures DV |
| D4 | A7 | Measuring additional constructs that could potentially act as primary outcomes | Additional constructs |
| D5 | A12 | Measuring additional variables that enable later exclusion of participants from the analyses (e.g., awareness or manipulation checks) | Additional IVs exclusion |
| D6 | | Failing to conduct a well-founded power analysis | Power analysis |
| D7 | C4 | Failing to specify the sampling plan and allowing for running (multiple) small studies | Sampling plan |
| C1 | | Failing to randomly assign participants to conditions | Random assignment |
| C2 | | Insufficient blinding of participants and/or experimenters | Blinding |
| C3 | | Correcting, coding, or discarding data during data collection in a non-blinded manner | Data handling/collection |
| C4 | D7 | Determining the data collection stopping rule on the basis of desired results or intermediate significance testing | Stopping rule |
| A1 | | Choosing between different options of dealing with incomplete or missing data on ad hoc grounds | Missing data |
| A2 | | Specifying preprocessing of data (e.g., cleaning, normalization, smoothing, and motion correction) in an ad hoc manner | Data preprocessing |
| A3 | | Deciding how to deal with violations of statistical assumptions in an ad hoc manner | Assumptions |
| A4 | | Deciding on how to deal with outliers in an ad hoc manner | Outliers |
| A5 | D3 | Selecting the DV out of several alternative measures of the same construct | Select DV measure |
| A6 | | Trying out different ways to score the chosen primary DV | DV scoring |
| A7 | D4 | Selecting another construct as the primary outcome | Select primary outcome |
| A8 | D1 | Selecting IVs out of a set of manipulated IVs | Select IV |
| A9 | D1 | Operationalizing manipulated IVs in different ways (e.g., by discarding or combining levels of factors) | Operationalizing manipulated IVs |
| A10 | D2 | Choosing to include different measured variables as covariates, IVs, mediators, or moderators | Include additional IVs |
| A11 | | Operationalizing non-manipulated IVs in different ways | Operationalizing non-manipulated IVs |
| A12 | D5 | Using alternative inclusion and exclusion criteria for selecting participants in analyses | In/exclusion criteria |
| A13 | | Choosing between different statistical models | Statistical model |
| A14 | | Choosing the estimation method, software package, and computation of SEs | Method and package |
| A15 | | Choosing inference criteria (e.g., Bayes factors, alpha level, sidedness of the test, corrections for multiple testing) | Inference criteria |
| R6 | T1 | Presenting exploratory analyses as confirmatory (HARKing) | HARKing |
Note: For each degree of freedom, this table gives its code in the original list [18], the codes of related degrees of freedom, its description (identical to the original list), and the short label we use later when describing our results.
DV, dependent variable; HARKing, hypothesizing after the results are known; IV, independent variable; SE, standard error.
Means and distributions of scores per degree of freedom for registrations from Unstructured and Structured formats and differences in median scores between formats.
| DF | Label | Unstructured Mean (SD) | Unstructured 0 (%) | Unstructured 1 (%) | Unstructured 2 (%) | Unstructured 3 (%) | Unstructured NA (%) | Structured Mean (SD) | Structured 0 (%) | Structured 1 (%) | Structured 2 (%) | Structured 3 (%) | Structured NA (%) | Test | Holm p | Cliff's D |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T1 | Hypothesis | 1.98 (0.31) | 1.9 | - | 96.2 | 1.9 | - | 2.02 (0.14) | 0.0 | - | 98.1 | 1.9 | - | W = 1,404 | 1.000 | 0.02 |
| T2 | Direction hypothesis | 1.60 (0.84) | 20.8 | - | 77.4 | 1.9 | - | 1.54 (1.20) | 34.6 | - | 42.3 | 23.1 | - | W = 1,422 | 1.000 | 0.03 |
| D1 | Multiple manipulated IVs | 0.38 (1.02) | 64.2 | - | 0.0 | 9.4 | 26.4 | 1.03 (1.42) | 46.2 | - | 1.9 | 23.1 | 28.8 | W = 880 | 0.443 | 0.22 |
| D2 | Additional IVs | 0.00 (0.00) | 100 | - | - | 0.0 | - | 0.12 (0.58) | 96.2 | - | - | 3.8 | - | W = 1,431 | 1.000 | 0.04 |
| D3 | Multiple measures DV | 1.25 (0.98) | 37.7 | - | 62.3 | 0.0 | - | 1.62 (0.80) | 19.2 | - | 80.8 | 0.0 | - | W = 1,633 | 0.540 | 0.19 |
| D4 | Additional constructs | 0.00 (0.00) | 100 | - | - | 0.0 | - | 0.00 (0.00) | 100 | - | - | 0.0 | - | NA | NA | 0.00 |
| D5 | Additional IVs exclusion | 0.87 (0.92) | 45.3 | 26.4 | 24.5 | 3.8 | - | 1.23 (0.70) | 13.5 | 51.9 | 32.7 | 1.9 | - | W = 1,729.5 | 0.327 | 0.26 |
| D6 | Power analysis | 0.72 (0.91) | 58.5 | 11.3 | 30.2 | 0.0 | - | 0.96 (0.99) | 50.0 | 3.8 | 46.2 | 0.0 | - | W = 1,551 | 1.000 | 0.13 |
| D7 | Sampling plan | 0.47 (0.58) | 56.6 | 39.6 | 3.8 | 0.0 | - | 0.71 (0.58) | 34.6 | 57.7 | 5.8 | 0.0 | 1.9 | W = 1,641 | 0.540 | 0.21 |
| C1 | Random assignment | 0.27 (0.67) | 66.0 | 1.9 | 9.4 | 0.0 | 22.6 | 0.86 (0.92) | 34.6 | 11.5 | 25.0 | 0.0 | 28.8 | W = 1,028.5 | 0.023 | 0.36 |
| C2 | Blinding | 1.00 (1.00) | 3.8 | 1.9 | 3.8 | 0.0 | 90.6 | 0.02 (0.14) | 92.3 | 1.9 | 0.0 | 0.0 | 5.8 | W = 50.5 | <0.001 | −0.59 |
| C3 | Data handling/collection | 0.04 (0.19) | 96.2 | 3.8 | 0.0 | 0.0 | - | 0.04 (0.19) | 96.2 | 3.8 | 0.0 | 0.0 | - | W = 1,379 | 1.000 | 0.00 |
| C4 | Stopping rule | 0.47 (0.58) | 56.6 | 39.6 | 3.8 | 0.0 | - | 0.71 (0.58) | 34.6 | 57.7 | 5.8 | 0.0 | 1.9 | W = 1,641 | 0.540 | 0.21 |
| A1 | Missing data | 0.19 (0.39) | 81.1 | 18.9 | 0.0 | 0.0 | - | 0.76 (0.55) | 28.8 | 63.5 | 5.8 | 0.0 | 1.9 | W = 2,065.5 | <0.001 | 0.53 |
| A2 | Data preprocessing | 0.50 (0.84) | 9.4 | - | 1.9 | 0.0 | 88.7 | 0.50 (0.93) | 11.5 | - | 3.8 | 0.0 | 84.6 | W = 23 | 1.000 | −0.04 |
| A3 | Assumptions | 0.04 (0.19) | 96.2 | 3.8 | 0.0 | 0.0 | - | 0.18 (0.48) | 84.6 | 9.6 | 3.8 | 0.0 | 1.9 | W = 1,488 | 0.835 | 0.10 |
| A4 | Outliers | 0.25 (0.62) | 84.9 | 5.7 | 9.4 | 0.0 | - | 0.69 (0.92) | 57.7 | 19.2 | 19.2 | 3.8 | - | W = 1,751 | 0.056 | 0.27 |
| A5 | Select DV measure | 1.25 (0.98) | 37.7 | - | 62.3 | 0.0 | - | 1.62 (0.80) | 19.2 | - | 80.8 | 0.0 | - | W = 1,633 | 0.540 | 0.19 |
| A6 | DV scoring | 0.55 (0.70) | 56.6 | 32.1 | 11.3 | 0.0 | - | 0.65 (0.65) | 44.2 | 46.2 | 9.6 | 0.0 | - | W = 1,519 | 1.000 | 0.10 |
| A7 | Select primary outcome | 0.00 (0.00) | 100 | - | - | 0.0 | - | 0.00 (0.00) | 100 | - | - | 0.0 | - | NA | NA | 0.00 |
| A8 | Select IV | 0.59 (1.19) | 58.5 | - | 1.9 | 13.2 | 26.4 | 1.14 (1.48) | 44.2 | - | 0.0 | 26.9 | 28.8 | W = 853.5 | 0.910 | 0.18 |
| A9 | Operationalize manipulated IVs | 1.05 (1.26) | 41.5 | - | 18.9 | 13.2 | 26.4 | 1.92 (1.19) | 17.3 | - | 25.0 | 28.8 | 28.8 | W = 982.5 | 0.078 | 0.36 |
| A10 | Include additional IVs | 0.00 (0.00) | 100 | - | - | 0.0 | - | 0.12 (0.58) | 96.2 | - | - | 3.8 | - | W = 1,431 | 1.000 | 0.04 |
| A11 | Operationalize non-manipulated IVs | 0.43 (0.66) | 28.3 | 11.3 | 3.8 | 0.0 | 56.6 | 0.63 (0.67) | 26.9 | 25.0 | 5.8 | 0.0 | 42.3 | W = 405 | 1.000 | 0.17 |
| A12 | In/exclusion criteria | 0.87 (0.92) | 45.3 | 26.4 | 24.5 | 3.8 | - | 1.21 (0.72) | 15.4 | 50.0 | 32.7 | 1.9 | - | W = 1,710.5 | 0.438 | 0.24 |
| A13 | Statistical model | 0.85 (0.77) | 37.7 | 39.6 | 22.6 | 0.0 | - | 1.31 (0.51) | 1.9 | 65.4 | 32.7 | 0.0 | - | W = 1,846 | 0.023 | 0.34 |
| A14 | Method and package | 0.08 (0.38) | 96.2 | 0.0 | 3.8 | 0.0 | - | 0.13 (0.44) | 90.4 | 5.8 | 3.8 | 0.0 | - | W = 1,455.5 | 1.000 | 0.06 |
| A15 | Inference criteria | 0.17 (0.43) | 84.9 | 13.2 | 1.9 | 0.0 | - | 1.08 (0.33) | 1.9 | 88.5 | 9.6 | 0.0 | - | W = 2,516 | <0.0001 | 0.83 |
| R6 | HARKing | 0.00 (0.00) | 100 | - | 0.0 | 0.0 | - | 0.00 (0.00) | 100 | - | 0.0 | 0.0 | - | NA | NA | 0.00 |
Note: The mean scores per degree of freedom can range from 0 to 3. Distributions of scores are given in percentages; because each percentage is rounded to one decimal place, not all rows sum to exactly 100%. A “-” sign indicates that this score was not possible for this degree of freedom (see Methods section).
Cliff’s D, Cliff’s Delta; DF, degree of freedom; DV, dependent variable; HARKing, hypothesizing after the results are known; Holm p, Holm p-value; IV, independent variable; NA, not applicable; SD, standard deviation.
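Two of the summary columns above can be sketched from raw codes. The note says mean scores range from 0 to 3; the sketch assumes means and SDs are taken over non-NA scores with the sample (n − 1) SD, an assumption inferred from the table rather than stated in it. Under that assumption, counts back-calculated from the D1 unstructured percentages (53 registrations: 34 scored 0, 5 scored 3, 14 NA) reproduce the reported 0.38 (1.02). The `holm` function is the standard Holm step-down adjustment; the raw p-values are not listed here, so the example p-values are made up.

```python
import statistics


def summarize(scores):
    """Mean and sample SD of 0-3 scores, skipping NA entries (None)."""
    valid = [s for s in scores if s is not None]
    return statistics.mean(valid), statistics.stdev(valid)


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), cap at 1, and keep the
        # sequence monotone so a larger raw p never gets a smaller adjusted p.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# D1 (Multiple manipulated IVs), unstructured format: hypothetical counts
# back-calculated from the table percentages.
d1_unstructured = [0] * 34 + [3] * 5 + [None] * 14
mean, sd = summarize(d1_unstructured)
print(round(mean, 2), round(sd, 2))  # 0.38 1.02

# Hypothetical raw p-values; the table lists only Holm-adjusted values.
print([round(p, 4) for p in holm([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```

The same assumption also reproduces the T1 unstructured entry, 1.98 (0.31), from 1 score of 0, 51 of 2, and 1 of 3. Excluding NA cases matters: counting them as 0 would pull D1's mean well below the reported value.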