Matteo Colombo, Georgi Duev, Michèle B Nuijten, Jan Sprenger.
Abstract
Experimental philosophy (x-phi) is a young field of research at the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science.
Year: 2018 PMID: 29649220 PMCID: PMC5896892 DOI: 10.1371/journal.pone.0194360
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
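The kind of consistency check the abstract describes can be illustrated with a minimal Python sketch. It is not statcheck's implementation or API: the function names are hypothetical, only a z-test reported as "p = value" is handled, and the rounding treatment is a simplifying assumption. A result is treated as inconsistent when the reported p-value cannot be reconciled with the p-value recomputed from the test statistic, and as grossly inconsistent when the mismatch also changes the significance decision at α = .05.

```python
import math


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_result(z, reported_p, stat_decimals=2, p_decimals=2, alpha=0.05):
    """Compare a reported p-value with the one recomputed from z.

    Hypothetical helper for illustration; handles only "p = value" reports.
    """
    # The reported statistic is itself rounded, so consider every statistic
    # that would round to the reported value.
    half_unit = 0.5 * 10 ** (-stat_decimals)
    p_low = two_tailed_p_from_z(abs(z) + half_unit)             # larger |z|, smaller p
    p_high = two_tailed_p_from_z(max(abs(z) - half_unit, 0.0))  # smaller |z|, larger p

    # The reported p-value is rounded too; the result counts as consistent
    # when the two intervals overlap.
    tol = 0.5 * 10 ** (-p_decimals)
    consistent = (p_low <= reported_p + tol) and (p_high >= reported_p - tol)

    # A gross inconsistency additionally flips the significance decision.
    recomputed = two_tailed_p_from_z(z)
    gross = (not consistent) and ((reported_p < alpha) != (recomputed < alpha))
    return {"recomputed_p": recomputed, "consistent": consistent, "gross": gross}


if __name__ == "__main__":
    # Made-up reported results: (z, reported p)
    for z, p in [(2.20, 0.03), (2.20, 0.01), (1.70, 0.04)]:
        print(z, p, check_z_result(z, p))
```

Allowing for rounding of both the statistic and the p-value keeps correctly rounded results from being flagged; other test types (t, F, χ²) would need their own reference distributions.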
Fig 1. A schematic representation of our sampling procedure.
Summary statistics for our sample and the relevant subfields.
Specifications of the years from which HTML articles were available, the number of articles in our sample, the number of articles with NHST results reported in APA style, the number of NHST results, and the median number of APA reported NHST results per article. Articles could be classified into multiple subfields.
| Field | Years included | # Articles in final sample (after manual check for NHST results) | # Articles with NHST results in APA format | # NHST results | Median # NHST results per article with NHST results |
|---|---|---|---|---|---|
| Total (without duplicates) | 1993–2016 | 220 | 174 (79.1%) | 2,573 | 11.0 |
| Action | 2005–2016 | 75 | 64 (85.3%) | 1,122 | 12.0 |
| Ethics | 1993–2016 | 53 | 34 (64.2%) | 641 | 16.0 |
| Epistemology | 2007–2015 | 25 | 21 (84.0%) | 364 | 10.0 |
| Language | 2009–2015 | 25 | 17 (68.0%) | 283 | 14.0 |
| Mind | 2009–2016 | 25 | 16 (64.0%) | 191 | 10.5 |
| Metaphysics | 2007–2016 | 18 | 13 (72.2%) | 156 | 7.0 |
| Foundations of Experimental Philosophy | 2010–2015 | 13 | 5 (38.5%) | 56 | 11.0 |
| Miscellaneous | 2006–2015 | 12 | 8 (66.7%) | 135 | 10.0 |
General prevalence of inconsistencies for the articles in the current study, relative to those articles that contained NHST results (N = 173).
| Category | Absolute Number | Percentage |
|---|---|---|
| Articles with at least one inconsistency | 67 | 38.73% |
| Articles with at least one gross inconsistency | 11 | 6.36% |
| P-values that are inconsistent | 160 | 6.25% |
| P-values that are grossly inconsistent | 13 | 0.51% |
| Average % of p-values per article that is inconsistent | - | 6.85% |
| Average % of p-values per article that is grossly inconsistent | - | 0.41% |
* This percentage takes the dependency of p-values within an article into account.
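The pooled percentages in the table can be reproduced from the counts with simple arithmetic, as in the sketch below. The article denominator (173) comes from the table caption. The p-value denominator (2,558) is an assumption: it is the number of NHST results "without outlier" given in the comparison table, and it reproduces the reported 6.25% and 0.51%. The second function uses made-up data to show why a per-article average can differ from a pooled percentage; it is one plausible reading of the footnote, not the authors' stated procedure.

```python
def percentage(count: int, total: int, decimals: int = 2) -> float:
    """Percentage of `count` in `total`, rounded to `decimals` places."""
    return round(100.0 * count / total, decimals)


N_ARTICLES = 173  # articles with NHST results (table caption)
N_RESULTS = 2558  # assumed denominator: NHST results without the outlier

pooled = {
    "articles_with_inconsistency": percentage(67, N_ARTICLES),
    "articles_with_gross_inconsistency": percentage(11, N_ARTICLES),
    "inconsistent_p_values": percentage(160, N_RESULTS),
    "grossly_inconsistent_p_values": percentage(13, N_RESULTS),
}


def mean_per_article_percentage(articles):
    """Average, over articles, of each article's own inconsistency percentage.

    `articles` is a list of (n_inconsistent, n_results) pairs.
    """
    shares = [100.0 * bad / n for bad, n in articles if n > 0]
    return sum(shares) / len(shares)


# Hypothetical data: one short article with 1 of 2 results inconsistent,
# one long article with 0 of 48 results inconsistent.
hypothetical = [(1, 2), (0, 48)]
pooled_share = percentage(1 + 0, 2 + 48)                    # all results pooled
per_article = mean_per_article_percentage(hypothetical)     # each article weighted equally

if __name__ == "__main__":
    print(pooled)
    print(pooled_share, per_article)
```

Pooling lets articles with many results dominate, while the per-article average weights every article equally, which is why the two kinds of rows in the table need not agree.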
Fig 2. The average percentage of articles within a field with at least one (gross) inconsistency and the average percentage of (grossly) inconsistent p-values per article, split up by field.
Inconsistencies are depicted in white and gross inconsistencies in grey. For the fields Action, Epistemology, Ethics, Foundations of Exp. Phil., Language, Metaphysics, Mind, and Misc., the number of articles with null-hypothesis significance testing (NHST) results is 63, 21, 34, 5, 17, 13, 16, and 8, respectively, and the average number of NHST results per article is 17.1, 17.3, 18.9, 11.2, 16.6, 12.0, 11.9, and 16.9, respectively.
Fig 3. Percentage of articles with at least one inconsistency (open circles) or at least one gross inconsistency (solid circles) over time.
Fig 4. The percentage of gross inconsistencies in p-values reported as significant (solid line) and nonsignificant (dotted line), over the years.
Fig 5. A p-curve analysis for the reported p-values in our final sample.
The actual distribution of p-values is compared to the distribution expected under a true null hypothesis and under a hypothesis of 33% power. Note that 20 results were automatically excluded from the p-curve because their recalculated p-values were not below .05.
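The null-hypothesis reference line in a p-curve follows from the fact that p-values are uniformly distributed when the null is true, so significant p-values should fall evenly across equal-width bins below .05. The sketch below bins a list of p-values and returns that flat expectation. The function name and the input data are hypothetical, and the 33%-power reference curve is omitted because it requires noncentral distributions.

```python
def p_curve_bins(p_values, alpha=0.05, n_bins=5):
    """Bin significant p-values into equal-width bins on (0, alpha).

    Returns (counts, observed_percentages, expected_percentages_under_null).
    Values at or above alpha are excluded, as in a p-curve.
    """
    significant = [p for p in p_values if 0 < p < alpha]
    width = alpha / n_bins
    counts = [0] * n_bins
    for p in significant:
        # Clamp the index so values just below alpha stay in the last bin.
        index = min(int(p / width), n_bins - 1)
        counts[index] += 1
    total = len(significant)
    observed = [100.0 * c / total for c in counts] if total else [0.0] * n_bins
    expected_null = [100.0 / n_bins] * n_bins  # uniform under a true null
    return counts, observed, expected_null


if __name__ == "__main__":
    # Made-up p-values for illustration only; 0.06 is excluded as nonsignificant.
    example = [0.003, 0.012, 0.004, 0.021, 0.008, 0.034, 0.001, 0.047, 0.015, 0.06]
    counts, observed, expected = p_curve_bins(example)
    print(counts)
    print(observed)
    print(expected)
```

A curve with more mass in the lowest bins than the flat null line is right-skewed, which is the pattern usually read as a sign of evidential value. P-values lying exactly on a bin boundary may be assigned to either neighbouring bin because of floating-point division, so the sketch is best used with unrounded p-values.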
Main results of studies investigating the prevalence of statistical reporting inconsistencies in psychology, compared to the current study in experimental philosophy.
Table adapted from Table 2 in Nuijten et al. [7]. Percentage of articles with (grossly) inconsistent results computed relative to N = 173.
| Study | No. of articles downloaded | No. of NHST results (without outlier) | % Inconsistencies | % Gross inconsistencies | % Articles with at least one inconsistency | % Articles with at least one gross inconsistency |
|---|---|---|---|---|---|---|
| Current Study | 220 | 2,558 | 6.3 | 0.5 | 38.7 | 6.4 |
| [ | 30,717 | 258,105 | 9.7 | 1.4 | 49.6 | 12.9 |
| [ | 697 | 8,105 | 10.6 | 0.8 | 63.0 | 20.5 |
| [ | 153 | 2,667 | 6.7 | 1.1 | 45.1 | 15.0 |
| [ | 186 | 1,212 | 12.2 | 2.3 | 48.0 | 17.6 |
| [ | 333 | 4,248 | 11.9 | 1.3 | 45.4 | 12.4 |
| [ | 49 | 1,148 | 4.3 | 0.9 | 53.1 | |
1. Only t, F, and χ² values with p < .05
2. Number of articles with at least one (gross) inconsistency/number of articles with null-hypothesis significance testing results
3. Only included t, F, and χ² values
4. Only articles with at least one completely reported t or F test with a reported p-value < .05