Daisuke Yoneoka, Erika Ota.
Abstract
Despite the ongoing growth in the number of published randomized controlled trials (RCTs) and the increasing use of quality assessment for RCTs, the association between the quality of an RCT and the characteristics of its text has not been sufficiently studied. We are interested in a specific question: what kinds of sentences are good indicators of high-quality RCTs? To help researchers efficiently screen articles worth reading, this study aims 1) to quantify the linguistic features of articles and 2) to build a document assessment model that evaluates the quality of RCTs using only the abstract. All RCTs conducted in Japan in 2010 and reported as original articles were included in the analysis. Data were independently assessed by two reviewers using a risk-of-bias tool. Three aspects of linguistic style were quantitatively measured, and a document model was constructed to evaluate the RCTs. A total of 302 RCTs were selected for quality assessment. Of these, 255 articles were assessed as high quality and 47 as low quality. High-quality articles tended to use longer words than low-quality articles (p = 0.048); however, their sentences were generally shorter (p = 0.004). Further, high-quality articles included a larger proportion of noun phrases (p = 0.026) but a smaller proportion of verb phrases (p = 0.041). The optimal number of topics for assessing article quality was four, and two of these topics were significantly associated with quality. Despite the number of articles published about RCTs in Japan, significant differences exist in several textual features between high- and low-quality RCTs. These results can be used, instead of the risk-of-bias tool, as new criteria for rapidly screening valuable articles, and they also indicate that quality control of RCT articles is urgently needed in Japan.
Year: 2017 PMID: 28278271 PMCID: PMC5344454 DOI: 10.1371/journal.pone.0173526
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summaries of baseline measures.
| Measure | All articles | Correlation | p-value (Pearson) | Low quality | High quality | p-value (Welch) |
|---|---|---|---|---|---|---|
| Average number of characters per word (characters) | 5.45 | 0.003 | 0.954 | 5.34 | 5.46 | 0.048 |
| Average number of words per sentence (words) | 16.73 | -0.002 | 0.979 | 19.16 | 16.29 | 0.004 |
| Number of sentences per 100 words (sentences) | 6.40 | -0.049 | 0.400 | 5.72 | 6.52 | 0.003 |
| Number of characters per 100 words (characters) | 544.53 | 0.003 | 0.954 | 534.04 | 546.46 | 0.048 |
| Word count (words) | 241.36 | 0.091 | 0.116 | 241.32 | 241.36 | 0.996 |
* Total articles: 302 (low quality: 47; high quality: 255).
** P-values for Pearson’s correlation test.
*** P-values for Welch’s test between low- and high-quality articles.
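The length-based measures in this table can be illustrated with a minimal sketch. The source does not describe its tokenizer or sentence splitter, so the regular expressions below, the function name `baseline_measures`, and the dictionary keys are assumptions chosen for illustration only.

```python
import re


def baseline_measures(abstract: str) -> dict:
    """Compute simple length-based style measures for one abstract.

    Assumptions (not taken from the source): sentences end at '.', '!' or '?'
    followed by whitespace, and a word is a run of letters, digits,
    apostrophes, or hyphens.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    words = re.findall(r"[A-Za-z0-9'-]+", abstract)
    if not words:
        raise ValueError("abstract contains no words")

    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    n_sentences = len(sentences)
    return {
        "chars_per_word": n_chars / n_words,
        "words_per_sentence": n_words / n_sentences,
        "sentences_per_100_words": 100 * n_sentences / n_words,
        "chars_per_100_words": 100 * n_chars / n_words,
        "word_count": n_words,
    }


measures = baseline_measures("Pain was lower. The drug worked well.")
print(measures["word_count"], measures["words_per_sentence"])
```

Under these definitions, characters per 100 words is exactly 100 times characters per word, which would explain why those two rows share the same correlation and p-values.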
Summaries of syntactic features and readability score.
| Feature | All articles | Correlation | p-value (Pearson) | Low quality | High quality | p-value (Welch) |
|---|---|---|---|---|---|---|
| Average proportion of nouns (including numbers and names) per all words (%) | 38.31 | -0.017 | 0.768 | 37.07 | 38.54 | 0.026 |
| Average proportion of verbs per all words (%) | 9.50 | 0.055 | 0.340 | 10.08 | 9.39 | 0.041 |
| Average proportion of conjunctions per all words (%) | 3.59 | -0.107 | 0.063 | 3.77 | 3.56 | 0.327 |
| Flesch Readability Score | 29.99 | -0.017 | 0.764 | 30.91 | 29.82 | 0.569 |
* Total articles: 302 (low quality: 47; high quality: 255).
** P-values for Pearson’s correlation test.
*** P-values for Welch’s test between low- and high-quality articles.
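The table reports a Flesch Readability Score without giving the formula. The sketch below assumes the standard Flesch Reading Ease definition, which combines average sentence length with average syllables per word; the function name and the example counts are hypothetical, and how syllables were counted in the study is not stated.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch Reading Ease score from raw counts.

    Higher scores mean easier text. Assumed here to be the score the table
    refers to; the source does not spell out the formula.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a single abstract, for illustration only.
score = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150)
print(round(score, 1))
```

Because the score falls as sentences and words get longer, values near 30, like those in the table, are conventionally read as difficult text.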
Fig 1. Top 20 unigrams by article quality.
Fig 2. Top 20 bigrams by article quality.
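Rankings like those in the two figures can be produced by counting n-grams separately for each quality group. The following is a minimal sketch, assuming lowercase alphabetic tokens and no stop-word removal or stemming; the source does not state its preprocessing, and `top_ngrams` is a hypothetical helper.

```python
from collections import Counter
import re


def top_ngrams(texts, n=1, k=20):
    """Return the k most frequent n-grams across a list of abstracts.

    n=1 gives unigrams and n=2 gives bigrams. N-grams are counted within
    each abstract, so they never span two documents.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # zip over n shifted copies of the token list to form n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Toy input for illustration; in practice, pass the abstracts of one group.
high_quality = ["blood pressure decreased", "blood pressure change"]
print(top_ngrams(high_quality, n=2, k=1))
```

Running the function once on the high-quality abstracts and once on the low-quality ones gives the two ranked lists that each figure compares.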
Top 10 frequently reported words in detected topics of sLDA model.
| Topic 1 | Topic 2 | Topic 3 | Topic 4 |
|---|---|---|---|
| "Outcome" | "Surgery" | "Procedure" | "Patient" |
| decrease | surgeries | placebo | mg |
| serum | postoperative | subject | patient |
| pressure | pain | exercise | cancer |
| blood | use | week | efficacies |
| level | propofol | trained | treatment |
| hypertensive | anesthesia | gastric | therapies |
| therapies | undergo | concentration | week |
| mm | infused | glucose | h |
| cardiovascular | record | weight | primarily |
| change | analgesia | bodily | oral |
Estimated coefficients of four topics from sLDA model.
| Topic | OR | 95% CI | p-value |
|---|---|---|---|
| Topic 1 | Ref. | - | - |
| Topic 2 | 3.44 | (1.00, 11.79) | 0.049 |
| Topic 3 | 7.35 | (1.53, 35.33) | 0.013 |
| Topic 4 | 1.19 | (0.31, 4.58) | 0.799 |
Adjusted covariates: type of intervention, disease and conditions, study design, type of control group, number of arms, sample size.
*OR: odds ratio
**CI: confidence interval
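An odds ratio and its confidence interval can be mapped back to the underlying regression coefficient as a consistency check. The sketch below assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which the source does not state; `wald_from_or` is a hypothetical helper, and the inputs are the rounded values from the table.

```python
import math


def wald_from_or(or_hat, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient, standard error, z statistic, and
    two-sided p-value from a reported odds ratio and its 95% CI.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    beta = math.log(or_hat)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


reported = {
    "Topic 2": (3.44, 1.00, 11.79),
    "Topic 3": (7.35, 1.53, 35.33),
    "Topic 4": (1.19, 0.31, 4.58),
}
for topic, (or_hat, lo, hi) in reported.items():
    beta, se, z, p = wald_from_or(or_hat, lo, hi)
    print(f"{topic}: beta={beta:.3f}, se={se:.3f}, p={p:.3f}")
```

Under this assumption the recomputed p-values land close to the reported 0.049, 0.013, and 0.799; small differences are expected because the table values are rounded.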