Didi Surian, Florence T Bourgeois, Adam G Dunn.
Abstract
BACKGROUND: Clinical trial registries can be used as sources of clinical evidence for systematic review synthesis and updating. Our aim was to evaluate methods for identifying clinical trial registrations that should be screened for inclusion in updates of published systematic reviews.
Keywords: Document similarity; Hierarchical clustering; Systematic reviews; Trial registrations
Year: 2021 PMID: 34922458 PMCID: PMC8684229 DOI: 10.1186/s12874-021-01485-6
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Construction of the dataset using PubMed, CrossRef, and ClinicalTrials.gov.
Fig. 2 Illustration of the hierarchical agglomerative clustering method (left) and traversal on the resulting dendrogram (right)
The median number of trial registrations to be screened to achieve 100% recall
| Model | Median [IQR] |
|---|---|
| TF-IDF, Ward, Euclidean | |
| Single-linkage, Euclidean | 90,725 [31,070–132,615] |
| LDA, 50 topics, Ward, Euclidean | 6352 [986–86,990] |
| 100 topics | 4381 [465–91,954] |
| 150 topics | 4653 [687–77,875] |
| 200 topics | 4453 [425–70,500] |
| LDA, 50 topics, Single-linkage, Euclidean | 112,858 [71,482–136,676] |
| 100 topics | 115,957 [67,384–137,334] |
| 150 topics | 113,794 [76,429–139,497] |
| 200 topics | 124,259 [89,399–146,225] |
| Doc2Vec, 50-dimensional vector, Ward, Euclidean | 2256 [198–23,926] |
| 100-dimensional vector | 2432 [176–26,889] |
| 150-dimensional vector | 3850 [238–78,118] |
| 200-dimensional vector | 5113 [231–70,483] |
| Doc2Vec, 50-dimensional vector, Single-linkage, Euclidean | 125,000 [84,772–150,519] |
| 100-dimensional vector | 127,604 [91,965–150,958] |
| 150-dimensional vector | 128,801 [89,896–151,288] |
| 200-dimensional vector | 128,978 [89,398–151,171] |
| TF-IDF, Euclidean | |
| LDA, 50 topics, Euclidean | 1287 [271–4968] |
| 100 topics | 842 [134–3776] |
| 150 topics | 793 [123–4268] |
| 200 topics | 887 [116–5417] |
| Doc2Vec, 50-dimensional vector, Euclidean | 18,501 [1970–51,495] |
| 100-dimensional vector | 33,968 [7898–68,806] |
| 150-dimensional vector | 41,116 [12,036–77,218] |
| 200-dimensional vector | 43,879 [13,791–82,388] |
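The metric reported in the table, the number of trial registrations screened to reach a target recall, can be illustrated with a short sketch. This is not the authors' implementation: the function name `screened_for_recall`, the trial identifiers, and the per-review counts are hypothetical, and the sketch assumes each method yields an ordered list of registrations to screen for a given review.

```python
import math
import statistics


def screened_for_recall(ranked_ids, relevant_ids, target_recall=1.0):
    """Count how many registrations must be screened, in ranked order,
    before the target fraction of relevant registrations has been seen.

    Returns None if the ranking never reaches the target recall.
    """
    remaining = set(relevant_ids)
    if not remaining:
        raise ValueError("relevant_ids must not be empty")
    # Number of relevant registrations needed; the small epsilon guards
    # against floating-point noise pushing ceil() one step too high.
    needed = max(1, math.ceil(target_recall * len(remaining) - 1e-9))
    found = 0
    for position, trial_id in enumerate(ranked_ids, start=1):
        if trial_id in remaining:
            remaining.discard(trial_id)  # count each relevant trial once
            found += 1
            if found >= needed:
                return position
    return None


# Hypothetical ranking for one review (identifiers are made up).
ranked = ["T05", "T02", "T09", "T01", "T07", "T03"]
relevant = {"T01", "T02", "T03"}

full_recall = screened_for_recall(ranked, relevant)        # all 3 relevant
half_recall = screened_for_recall(ranked, relevant, 0.5)   # 2 of 3 relevant

# Summarising across reviews: the median of the per-review counts
# (hypothetical counts for three reviews).
median_screened = statistics.median([full_recall, 4, 10])
```

Under this definition a lower count means less screening effort for the same recall, which is how the rows of the table can be compared.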
Fig. 3 The median recall for 1089 systematic reviews after screening a given number of trial registrations
Fig. 4 Effect of seeding set size on the number of trials screened to achieve 95% recall
Fig. 5 The t-SNE visualization of the evaluated 4644 trials (blue) and the other ClinicalTrials.gov trials (grey)