K Bretonnel Cohen, Lawrence E Hunter, Peter S Pressman.
Abstract
"P-hacking" is the repeated analysis of data until a statistically significant result is achieved. We show that p-hacking can also occur during data generation, sometimes unintentionally. We use the type-token ratio to demonstrate that differences in the definitions of "type" and "token" can produce significantly different results. Since these terms are rarely defined in the biomedical literature, the result is an inability to meaningfully interpret the body of literature that makes use of this measure.
Keywords: Language; vocabulary
Year: 2019 PMID: 31438167 PMCID: PMC8956251 DOI: 10.3233/SHTI190470
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1 – Ten random permutations of subsets of the definitions of type and token. Each data point is the type-token ratio for one paper. Each column represents one permutation.
Number of pairwise differences between sets of definitions of type and token, divided into groups with high- and low-magnitude type-token ratios.
| Group | Pairs | Significantly different (%) |
|---|---|---|
| High | 7 | 0 (0%) |
| Low | 16 | 13 (81%) |
| All pairs | 23 | 13 (57%) |
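
The sensitivity of the type-token ratio to the definitions of "type" and "token" can be illustrated with a minimal sketch. The tokenizer, the two definitional switches (`lowercase`, `keep_punctuation`), and the sample sentence below are assumptions made for illustration only; they are not the definitions or data used in the paper.

```python
import re


def type_token_ratio(text, lowercase, keep_punctuation):
    """Return distinct types / total tokens under one set of definitions.

    Both switches are hypothetical examples of definitional choices:
    - lowercase: treat "The" and "the" as the same type
    - keep_punctuation: count punctuation marks as tokens
    """
    # Words only, or words plus single punctuation characters.
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    tokens = re.findall(pattern, text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if not tokens:
        raise ValueError("text contains no tokens under this definition")
    return len(set(tokens)) / len(tokens)


# Illustrative sentence, not taken from the paper's data.
sample = "The patient said the pain, the same pain, was worse."

for lowercase in (False, True):
    for keep_punctuation in (False, True):
        ratio = type_token_ratio(sample, lowercase, keep_punctuation)
        print(f"lowercase={lowercase!s:5} punctuation={keep_punctuation!s:5} "
              f"TTR={ratio:.3f}")
```

On this one sentence the four combinations yield four different ratios, ranging from about 0.69 to 0.80, even though the text itself never changes. This is the kind of variation that goes unreported when a paper does not state its definitions.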