Nicole M White, Thirunavukarasu Balasubramaniam, Richi Nayak, Adrian G Barnett.
Abstract
Appropriate descriptions of statistical methods are essential for evaluating research quality and reproducibility. Despite continued efforts to improve reporting in publications, inadequate descriptions of statistical methods persist. At times, reading statistical methods sections can conjure feelings of déjà vu, with content resembling cut-and-pasted or "boilerplate text" from already published work. Instances of boilerplate text suggest a mechanistic approach to statistical analysis, where the same default methods are being used and described using standardized text. To investigate the extent of this practice, we analyzed text extracted from published statistical methods sections from PLOS ONE and the Australian and New Zealand Clinical Trials Registry (ANZCTR). Topic modeling was applied to analyze data from 111,731 papers published in PLOS ONE and 9,523 studies registered with the ANZCTR. PLOS ONE topics emphasized definitions of statistical significance, software and descriptive statistics. One in three PLOS ONE papers contained at least one sentence that was a direct copy from another paper. 12,675 papers (11%) closely matched the sentence "a p-value < 0.05 was considered statistically significant". Common topics across ANZCTR studies differentiated between study designs and analysis methods, with matching text found in approximately 3% of sections. Our findings quantify a serious problem affecting the reporting of statistical methods and shed light on perceptions about the communication of statistics as part of the scientific process. Results further emphasize the importance of rigorous statistical review to ensure that adequate descriptions of methods are prioritized over relatively minor details such as p-values and software when reporting research outcomes.
Year: 2022 PMID: 35263374 PMCID: PMC8906599 DOI: 10.1371/journal.pone.0264360
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Word clouds for ten topics for statistical methods sections published in PLOS ONE.
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).
| Topic | Statistical methods text | Potential matches | Jaccard score, median (IQR) | Boilerplate |
|---|---|---|---|---|
| 1 | Statistical analysis was performed using | 3,015 | 0.5 (0.5 to 0.75) | 189 |
| 2 | | 1,228 | 0.82 (0.73 to 0.91) | 494 |
| | Categorical variables were expressed as | 643 | 0.75 (0.63 to 0.88) | 38 |
| 3 | All statistical analysis was performed using | 6,844 | 0.56 (0.44 to 0.78) | 263 |
| 4 | | 6,660 | 0.43 (0.33 to 0.52) | 6 |
| 5 | Statistical analysis was performed using | 9,005 | 0.58 (0.42 to 0.75) | 539 |
| 6 | Data are expressed as mean | 4,455 | 0.78 (0.67 to 0.89) | 321 |
| 7 | Summary estimates including | 4,057 | 0.4 (0.3 to 0.5) | 6 |
| 8 | The significance level was set at p | 3,397 | 0.5 (0.5 to 0.7) | 262 |
| 9 | *p | 5,559 | 0.83 (0.83 to 0.92) | 2,510 |
| 10 | All data are representative of at least three | 1,722 | 0.6 (0.47 to 0.7) | 83 |
| All topics | A p-value | 64,639 | 0.6 (0.5 to 0.8) | 12,675 |
| | Data are presented as mean | 33,471 | 0.67 (0.67 to 0.78) | 1,648 |
| | Statistical analysis was performed using Student’s | 44,699 | 0.5 (0.38 to 0.63) | 1,043 |
N-grams are marked in bold. Potential matches refers to the number of studies that contained the target n-gram at least once. Boilerplate text was defined by a Jaccard score of 0.9 or higher. IQR: Inter-quartile range.
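
The two columns described in the note can be illustrated with a minimal sketch: filter sentences by a target n-gram to get potential matches, then score each against the target sentence and flag those at 0.9 or higher. The tokenizer, the function names and the example sentences are assumptions made for illustration; the record above does not say how the authors tokenized text, and the paper counts studies whereas this sketch counts sentences.

```python
import re

# Assumed tokenizer: lower-case words, keeping "p-value" and "0.05" whole.
TOKEN = re.compile(r"[a-z0-9]+(?:[.\-][a-z0-9]+)*")

def tokens(sentence):
    """Return the set of word tokens in a sentence."""
    return set(TOKEN.findall(sentence.lower()))

def jaccard(a, b):
    """Jaccard score |A & B| / |A | B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def boilerplate_matches(target, ngram, sentences, threshold=0.9):
    """Return (potential matches, boilerplate matches) for one target sentence."""
    potential = [s for s in sentences if ngram.lower() in s.lower()]
    boilerplate = [s for s in potential if jaccard(target, s) >= threshold]
    return potential, boilerplate

# Hypothetical example data, not taken from the study.
target = "A p-value < 0.05 was considered statistically significant"
sentences = [
    "A p-value < 0.05 was considered statistically significant.",
    "A p-value of less than 0.01 was considered significant",
    "Data are presented as mean and standard deviation",
]
potential, boilerplate = boilerplate_matches(target, "p-value", sentences)
```

Here two sentences contain the n-gram and count as potential matches, but only the first reaches the 0.9 threshold. The second scores 5/11 because it shares five of eleven distinct tokens with the target.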
Logistic regression results for study characteristics associated with missing statistical methods sections in ANZCTR.
| Variable | Odds ratio | 95% CI |
|---|---|---|
| Study type = Observational | 0.78 | (0.69, 0.89) |
| Date (per year) | 0.90 | (0.88, 0.91) |
| Number of funders | 0.80 | (0.74, 0.86) |
| Target sample size (per doubling) | 0.90 | (0.88, 0.92) |
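
To read the "per year" and "per doubling" odds ratios over larger changes, the sketch below raises the per-unit odds ratio to the number of units. It assumes that date and log2 of target sample size entered the model as linear terms, which the row labels suggest but the record does not state; `scaled_odds_ratio` is a hypothetical helper name.

```python
import math

def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio implied by a change of `units`, assuming a linear log-odds term."""
    return or_per_unit ** units

# Date: 0.90 per year, so five years later multiplies the odds by 0.90 ** 5.
five_years = scaled_odds_ratio(0.90, 5)

# Target sample size: 0.90 per doubling; a four-fold increase is log2(4) = 2 doublings.
fourfold = scaled_odds_ratio(0.90, math.log2(4))

print(round(five_years, 2), round(fourfold, 2))
```

Under these assumptions, a study registered five years later has about 0.59 times the odds of a missing statistical methods section, and a study with four times the target sample size has about 0.81 times the odds.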
Fig 2. Word clouds for ten topics for statistical methods sections published in ANZCTR.
Results of boilerplate analysis applied to the ANZCTR dataset.
| Topic (number of studies) | Word count, median (IQR) | Sentences, median (IQR) | Matching studies: section | Matching studies: 1+ sentences |
|---|---|---|---|---|
| 1: Qualitative methods (842) | 116 (58 to 207) | 6 (3 to 10) | 46 (23) | 196 (171) |
| 2: Sample size calculations (1,753) | 147 (92 to 231) | 6 (3 to 9) | 40 (22) | 311 (259) |
| 3: Student’s t-test (923) | 119 (75 to 178) | 6 (4 to 8) | 62 (32) | 354 (292) |
| 4: Efficacy and safety studies (871) | 174 (97 to 268) | 7 (4 to 12) | 56 (7) | 190 (162) |
| 5: Pilot studies (737) | 78 (40 to 129) | 4 (2 to 6) | 39 (24) | 88 (78) |
| 6: Safety and tolerability studies (507) | 127 (73 to 220) | 6 (4 to 10) | 40 (23) | 182 (159) |
| 7: Descriptive analysis (328) | 39 (20 to 65) | 2 (1 to 4) | 43 (41) | 59 (57) |
| 8: Intervention studies (826) | 174 (98 to 275) | 7 (4 to 11) | 14 (6) | 129 (106) |
| 9: Linear models (1,728) | 172 (95 to 298) | 7 (4 to 12) | 85 (44) | 554 (486) |
| 10: Analysis of variance (1,008) | 131 (76 to 214) | 5 (3 to 9) | 59 (29) | 236 (209) |
The number of studies with Jaccard similarity scores greater than or equal to 0.9 from pairwise comparisons is presented; the number of studies with cut-and-pasted text is given in brackets.
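
A section-level pairwise comparison of this kind might look like the sketch below. It assumes word-level token sets and treats a Jaccard score of exactly 1 as a stand-in for cut-and-pasted text; neither choice is confirmed by the record above, and the study identifiers and section texts are hypothetical.

```python
import re
from itertools import combinations

# Assumed tokenizer: lower-case words, keeping hyphenated terms and decimals whole.
TOKEN = re.compile(r"[a-z0-9]+(?:[.\-][a-z0-9]+)*")

def tokens(text):
    """Return the set of word tokens in a section."""
    return set(TOKEN.findall(text.lower()))

def jaccard(a, b):
    """Jaccard score between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def matching_studies(sections, threshold=0.9):
    """Return (ids with a match at or above threshold, ids with an identical match)."""
    token_sets = {sid: tokens(text) for sid, text in sections.items()}
    near, exact = set(), set()
    # Compare every unordered pair of studies once.
    for (id_a, ta), (id_b, tb) in combinations(token_sets.items(), 2):
        score = jaccard(ta, tb)
        if score >= threshold:
            near.update((id_a, id_b))
        if score == 1.0:
            exact.update((id_a, id_b))
    return near, exact

# Hypothetical sections keyed by hypothetical study ids.
sections = {
    "S1": "Descriptive statistics will be used to summarise the data.",
    "S2": "Descriptive statistics will be used to summarise the data",
    "S3": "Descriptive statistics will be used to summarise all the data",
    "S4": "A mixed effects linear model will compare groups over time.",
}
near, exact = matching_studies(sections)
```

In this example S1, S2 and S3 each have a partner at 0.9 or higher, while only S1 and S2 are identical once punctuation is ignored. Comparing all pairs grows quadratically with the number of studies, which is workable for a few thousand sections but would need blocking or hashing for much larger collections.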
Example boilerplate text from ANZCTR studies with the highest number of matches per topic (sentence level).
| Topic | Statistical methods text | Potential matches | Jaccard score, median (IQR) | Boilerplate |
|---|---|---|---|---|
| 1 | All analyses will be conducted on an | 153 | 0.55 (0.52 to 0.73) | 11 |
| 2 | The | 1,224 | 0.42 (0.33 to 0.5) | 9 |
| 3 | Continuous normally distributed variables will be compared using | 134 | 0.32 (0.2 to 0.4) | 8 |
| 4 | At a confidence level of | 211 | 0.46 (0.33 to 0.58) | 28 |
| 5 | No formal | 163 | 0.43 (0.43 to 0.57) | 4 |
| 6 | Continuous variables will be summarized by mean standard deviation | 65 | 0.77 (0.69 to 0.85) | 15 |
| 7 | | 246 | 0.8 (0.55 to 0.8) | 69 |
| 8 | Analyses will be conducted on an | 149 | 0.55 (0.46 to 0.73) | 16 |
| 9 | | 238 | 0.6 (0.6 to 0.8) | 20 |
| 10 | Data will be analyzed using standardised non-parametric or parametric statistical methods where appropriate (using) | 206 | 0.29 (0.24 to 0.35) | 5 |
| All topics | A p-value | 1,967 | 0.55 (0.36 to 0.73) | 267 |
| | Analyses will be conducted on an | 1,630 | 0.6 (0.5 to 0.7) | 191 |
| | Baseline characteristics will be summarised using | 1,375 | 0.5 (0.5 to 0.63) | 23 |
The number of matches for each sentence was based on a Jaccard score of 0.9 or higher. Potential matches refers to the number of studies that contained the target n-gram at least once.