Neil R Smalheiser, Arthur W Holt.
Abstract
OBJECTIVES: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence.
Keywords: clinical trials; evidence-based medicine; informatics; information retrieval; systematic reviews
Year: 2020 PMID: 33215068 PMCID: PMC7660960 DOI: 10.1093/jamiaopen/ooaa042
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. Performance of the pairwise similarity model. Shown are the proportions of correct and incorrect predictions as a function of the confidence score for each pair of articles, for the hold-out test samples that were not used for model fitting. Most correct predicted probability estimates were very definitive (ie, ≤0.1 or >0.8). In contrast, the incorrect estimates were scattered between 0 and 1, but fell particularly below 0.5. This suggests that the biggest limitation on performance is features missing from articles, which causes some positive pairs to receive low predicted probability estimates.
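The kind of tally behind such a figure can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bin_predictions`, the ten equal-width bins, and the 0.5 decision threshold are all assumptions made for the example.

```python
from collections import Counter

def bin_predictions(probs, labels, n_bins=10, threshold=0.5):
    """Count correct and incorrect pairwise predictions per probability bin.

    probs:  predicted probability that a pair of articles shares a trial.
    labels: 1 if the pair truly shares a trial, else 0.
    Returns a list of (bin_low, bin_high, n_correct, n_incorrect) tuples.
    The threshold and bin count are illustrative assumptions.
    """
    correct = Counter()
    incorrect = Counter()
    for p, y in zip(probs, labels):
        # Map the probability to a bin index; p == 1.0 goes in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        predicted = 1 if p >= threshold else 0
        if predicted == y:
            correct[b] += 1
        else:
            incorrect[b] += 1
    return [
        (b / n_bins, (b + 1) / n_bins, correct[b], incorrect[b])
        for b in range(n_bins)
    ]

# Hypothetical toy data: two confident correct pairs, two errors.
rows = bin_predictions([0.05, 0.95, 0.35, 0.85], [0, 1, 1, 0])
for low, high, n_ok, n_bad in rows:
    if n_ok or n_bad:
        print(f"{low:.1f}-{high:.1f}: correct={n_ok} incorrect={n_bad}")
```

Dividing each count by the total number of correct (or incorrect) pairs would give proportions per bin; a positive pair scored below the threshold, like the third toy pair, lands among the incorrect low-probability estimates the caption describes.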
Clustering performance of Aggregator
| Query and article set | Split | Purity | F1 |
|---|---|---|---|
| RCT articles retrieved and clustered | | | |
| Conditions queries, all retrieved articles | 0.11 | 0.91 | 0.90 |
| Conditions queries, only NCT-containing articles | 0.083 | 0.90 | 0.91 |
| Condition + intervention query, all retrieved articles | 0.086 | 0.94 | 0.93 |
| Condition + intervention query, only NCT-containing articles | 0.079 | 0.93 | 0.93 |
| Clinical trial articles retrieved and clustered | | | |
| Conditions queries, all retrieved articles | 0.11 | 0.90 | 0.89 |
| Conditions queries, only NCT-containing articles | 0.10 | 0.89 | 0.89 |
| Condition + intervention query, all retrieved articles | 0.085 | 0.93 | 0.92 |
| Condition + intervention query, only NCT-containing articles | 0.077 | 0.92 | 0.92 |
Note: We used Aggregator to cluster either all articles retrieved by these searches or only the subset of articles that contained NCT numbers. The clustering algorithm generally performed better when both condition and intervention were queried.
Abbreviations: RCT: randomized controlled trial.
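This record does not define the metrics in the table. As a rough illustration only, the sketch below computes purity and a pairwise F1 using common textbook definitions, which may differ from the definitions the authors used. The Split metric is omitted because its definition is not given here. The function names and the toy article and trial labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def purity(predicted, truth):
    """Fraction of articles that share the majority true trial of their cluster.

    predicted: dict mapping article id -> predicted cluster id.
    truth:     dict mapping article id -> true trial id.
    """
    clusters = defaultdict(list)
    for article, cluster_id in predicted.items():
        clusters[cluster_id].append(truth[article])
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(predicted)

def pairwise_f1(predicted, truth):
    """F1 over article pairs: a pair is positive if both share a trial."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(predicted), 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: four articles, two true trials, two predicted clusters.
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
truth = {"a": "T1", "b": "T1", "c": "T2", "d": "T2"}
print(purity(predicted, truth))       # 0.75
print(pairwise_f1(predicted, truth))
```

In the toy example, three of the four articles match the majority trial of their cluster, so purity is 0.75; one of three predicted same-cluster pairs is correct and one of two true pairs is recovered, which gives a pairwise F1 of about 0.4.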