Gert van Valkenhoef, Russell F Loane, Deborah A Zarin.
Abstract
BACKGROUND: Trial registries were established to combat publication bias by creating a comprehensive and unambiguous record of initiated clinical trials. However, the proliferation of registries and registration policies means that a single trial may be registered multiple times (i.e., "duplicates"). Because unidentified duplicates threaten our ability to identify trials unambiguously, we investigate to what degree duplicates have been identified across registries globally.
Keywords: Clinical trials; Duplicate registrations; Trial registration
Year: 2016 PMID: 27422636 PMCID: PMC4946209 DOI: 10.1186/s13643-016-0283-8
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Illustrating how the terms trial, record, and variant are used
| Trial | Record | Variant |
|---|---|---|
| SPD557-206 (Shire Plc) | EUCTR2011-004388-62 | EUCTR2011-004388-62-BE |
| | | EUCTR2011-004388-62-CZ |
| | | EUCTR2011-004388-62-DE |
| | | EUCTR2011-004388-62-HU |
| | | EUCTR2011-004388-62-LV |
| | | EUCTR2011-004388-62-PL |
| | NCT01472939 | NCT01472939 |
A trial registered in both the European Union Clinical Trials Register (EUCTR) and ClinicalTrials.gov illustrates the distinction we make between a trial, a record, and a variant. A trial may be registered in more than one registry, resulting in multiple records. An EUCTR record may be registered in multiple member states, resulting in multiple variants of that record.
Fig. 1 Record comparison user interface. Records were compared using a simple web application that shows two records side-by-side. The rater could use the “Same trial,” “Don’t know,” and “Different” buttons to indicate their judgment and proceed to the next pair by clicking “Next”. Mistakes could be corrected by simply clicking the correct button afterwards.
Fig. 2 Comparing the similarity scores of arbitrary pair-wise comparisons to those of known duplicates. Histogram of the overall (combined) similarity scores of pair-wise comparisons between a random sample of 7000 records (left) compared to the similarity scores of known duplicates (right).
Fig. 3 Distribution of similarity scores for known duplicates. Each panel is a histogram of the similarity scores for the population of known duplicates on one of the five considered fields or the overall (combined) score.
Fig. 4 Known duplicates among highly similar records. Highly similar pairs (light gray) and the fraction that are known duplicates (dark gray). Pairs with similarity between 0.5 and 0.6 were identified but are not shown due to their large number (64 % of all pairs with a score over 0.5).
The estimated number of unknown duplicates, based on a random sample from each title similarity score range. Confidence intervals for the percentage of hidden duplicates are based on the exact binomial confidence interval for the proportion of duplicates in the sample.
| Score range | Duplicates in sample | Duplicates known | Duplicates unknown (est.) | % hidden (CI) |
|---|---|---|---|---|
| 0.7< | 7 / 125 (5.6 %) | 2194 | 1957 | 47 (26–64) |
| 0.8< | 13 / 100 (13 %) | 3489 | 2265 | 39 (26–51) |
| 0.9< | 89 / 209 (43 %) | 5805 | 5393 | 48 (44–52) |
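The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the "exact binomial confidence interval" is the Clopper-Pearson interval at the 95 % level (the confidence level is not stated in this excerpt), and that the percentage hidden is the estimated unknown duplicates divided by all duplicates (known plus estimated unknown). The function names are hypothetical. The interval for the percentage hidden also depends on the number of candidate pairs in each score range, which is not given here, so the sketch stops at the interval for the sample proportion.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iters=60):
    """Bisection on [0, 1] for an increasing function g with g(0) < 0 < g(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for a proportion (alpha=0.05 is an assumption)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


def hidden_percentage(known, unknown_est):
    """Share of all duplicates (known + estimated unknown) that are unknown."""
    return 100 * unknown_est / (known + unknown_est)


# Rows copied from the table: (range, duplicates in sample, sample size,
# known duplicates, estimated unknown duplicates).
rows = [
    ("0.7<", 7, 125, 2194, 1957),
    ("0.8<", 13, 100, 3489, 2265),
    ("0.9<", 89, 209, 5805, 5393),
]

for label, k, n, known, unknown in rows:
    lo, hi = exact_binomial_ci(k, n)
    print(label, round(hidden_percentage(known, unknown)), (lo, hi))
```

Rounded, the hidden percentages come out as 47, 39, and 48, matching the point estimates in the table.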
Fig. 5 Estimated number of unknown duplicates. The number of unknown duplicates estimated by randomly sampling from the pairs of records that are not known to be duplicates. The investigated range of title similarity scores (0.7–1.0) contains 76 % of all known duplicates.