Asba Tasneem, Laura Aberle, Hari Ananth, Swati Chakraborty, Karen Chiswell, Brian J McCourt, Ricardo Pietrobon.
Abstract
BACKGROUND: The ClinicalTrials.gov registry provides information regarding the characteristics of past, current, and planned clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However, issues related to data structure, nomenclature, and changes in data collection over time present challenges to the aggregate analysis and interpretation of these data in general, and to the analysis of trials according to clinical specialty in particular. Improving the usability of these data could enhance the utility of ClinicalTrials.gov as a research resource.
METHODS/PRINCIPAL FINDINGS:
Year: 2012 PMID: 22438982 PMCID: PMC3306288 DOI: 10.1371/journal.pone.0033677
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A schematic representation of the database for Aggregate Analysis of ClinicalTrials.gov (AACT), with its key enhancements.
Figure 2. High-level Entity-Relationship Diagram (ERD) for AACT.
Escape characters and replacements.
| Escape character | Replacement |
| &apos; | ' |
| &quot; | " |
| &amp; | & |
| &gt; | > |
| &lt; | < |
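The substitutions above are the standard XML character-entity decodings applied when cleaning the bulk-downloaded registry XML. A minimal sketch of such a cleanup step (the function and mapping names are ours, not from the paper):

```python
# Standard XML escape sequences and the literal characters they encode,
# matching the table above.
XML_ESCAPES = {
    "&apos;": "'",
    "&quot;": '"',
    "&gt;": ">",
    "&lt;": "<",
    "&amp;": "&",  # decoded last, so "&amp;lt;" yields "&lt;", not "<"
}

def unescape(text: str) -> str:
    """Replace XML escape sequences with their literal characters."""
    for escape, char in XML_ESCAPES.items():
        text = text.replace(escape, char)
    return text
```

Decoding `&amp;` last matters: a doubly escaped string such as `&amp;lt;` should decode to the single-level `&lt;`, not collapse all the way to `<`.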
Figure 3. Percentage of interventional studies with complete data, by registration year, for selected data elements.
Figure 4. An overview of the methodology and process of developing clinical specialty datasets.
The INTERVENTIONS, CONDITIONS, and KEYWORDS tables consist of disease condition terms provided by data submitters, including both MeSH and non-MeSH terms. The INTERVENTION_BROWSE and CONDITION_BROWSE tables are populated with MeSH terms generated by an NLM algorithm. (a) Process illustrating how MeSH terms are created in ClinicalTrials.gov; the tables and data shown here do not represent the entire ClinicalTrials.gov database. (b) Process illustrating the annotation and validation of disease conditions. (c) Process illustrating the creation of specialty datasets.
MeSH Subject Headings, 2010—Diseases.
| Bacterial Infections and Mycoses [C01] |
| Virus Diseases [C02] |
| Parasitic Diseases [C03] |
| Neoplasms [C04] |
| Musculoskeletal Diseases [C05] |
| Digestive System Diseases [C06] |
| Stomatognathic Diseases [C07] |
| Respiratory Tract Diseases [C08] |
| Otorhinolaryngologic Diseases [C09] |
| Nervous System Diseases [C10] |
| Eye Diseases [C11] |
| Male Urogenital Diseases [C12] |
| Female Urogenital Diseases and Pregnancy Complications [C13] |
| Cardiovascular Diseases [C14] |
| Hemic and Lymphatic Diseases [C15] |
| Congenital, Hereditary, and Neonatal Diseases and Abnormalities [C16] |
| Skin and Connective Tissue Diseases [C17] |
| Nutritional and Metabolic Diseases [C18] |
| Endocrine System Diseases [C19] |
| Immune System Diseases [C20] |
| Disorders of Environmental Origin [C21] |
| Animal Diseases [C22] |
| Pathological Conditions, Signs and Symptoms [C23] |
Frequency of intermediate terms and top node terms that did not match annotations of lower-level terms.
| Specialty | n/N (%) |
| Cardiology | 172/5264 (3.3%) |
| Oncology | 284/5264 (5.4%) |
| Mental health | 93/5264 (1.8%) |
n = number of intermediate- and top-node MeSH terms for a given specialty that do not match the annotations of their lower-level terms. N = total number of intermediate- and top-node MeSH terms.
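The consistency check behind these counts can be sketched using MeSH tree numbers, where a descendant term's tree number extends its ancestor's with a dotted suffix. The annotations and the mismatching branch below are hypothetical illustrations, not data from the study:

```python
# Hypothetical specialty annotations keyed by MeSH tree number.
# A descendant's tree number extends its ancestor's (e.g. C14 -> C14.280).
annotations = {
    "C14": "cardiology",          # Cardiovascular Diseases (top node)
    "C14.280": "cardiology",      # Heart Diseases (intermediate)
    "C14.280.067": "cardiology",  # Arrhythmias, Cardiac (lower level)
    "C14.907": "not cardiology",  # hypothetical mismatching branch
    "C14.907.055": "cardiology",
}

def mismatched_parents(annotations):
    """Return tree numbers of intermediate/top-node terms whose
    annotation differs from at least one lower-level descendant."""
    mismatches = set()
    for parent, parent_label in annotations.items():
        for child, child_label in annotations.items():
            if child != parent and child.startswith(parent + "."):
                if child_label != parent_label:
                    mismatches.add(parent)
    return mismatches
```

With the toy annotations above, both `C14` (its descendant `C14.907` disagrees) and `C14.907` (its descendant `C14.907.055` disagrees) are flagged.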
Figure 5MeSH trees for acromegaly.
Source: 2010 online MeSH thesaurus (available: http://www.nlm.nih.gov/cgi/mesh/2010/MB_cgi).
Figure 6Rules for deciding whether a given study belongs to a given specialty.
Number of studies reviewed by each set of clinician reviewers.
| Reviewer A ID | Reviewer B ID | Studies reviewed (n) |
| Clinician 1 | Clinician 2 | 200 |
| Clinician 1 | Clinician 3 | 400 |
| Clinician 4 | Clinician 5 | 200 |
| Clinician 6 | Clinician 7 | 200 |
The combination of Clinician 1 (“A”) and Clinician 3 (“B”) reviewed two batches of studies (400 in total).
Contingency table for identifying misclassification errors.
| | Algorithm | | | | |
| Manual review | Yes (Y) | No (N) | Ambiguous | Unclassified | Total |
| Yes (Y) | A | B | G | H | A+B+G+H |
| No (N) | C | D | I | J | C+D+I+J |
| Unknown | E | F | K | L | E+F+K+L |
| Total | A+C+E | B+D+F | G+I+K | H+J+L | T |
The overall misclassification error rate divides the total number of errors by the total number of studies reviewed. The false-positive rate was determined using two methods: in the first, the false-positive rate was calculated among studies classified as N by manual review (C/[C+D+I+J]); in the second, it was calculated among studies classified as Y by the algorithm (C/[A+C+E]). The false-negative rate was evaluated in similar fashion: by dividing the number of false negatives by the number of studies classified as Y by manual review (B/[A+B+G+H]), or by the number of studies classified as N by the algorithm (B/[B+D+F]).
Classification of studies: algorithmically vs. manually.
| CARDIOLOGY | Algorithm | | | | |
| Manual review | N | Y | Ambiguous | Unclassified | Total |
| N | 836 | 18 | 1 | 49 | 904 |
| Y | 21 | 72 | 0 | 2 | 95 |
| Unknown | 1 | 0 | 0 | 0 | 1 |
| Total | 858 | 90 | 1 | 51 | 1,000 |
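As a worked check, the cardiology counts above reproduce the published false-positive and false-negative rates (a sketch; the variable names are ours):

```python
# Cardiology contingency counts: rows = manual review, columns = algorithm.
# False positives: manual N, algorithm Y. False negatives: manual Y, algorithm N.
fp, manual_n_total, algo_y_total = 18, 904, 90
fn, manual_y_total, algo_n_total = 21, 95, 858

fp_among_manual_n = 100 * fp / manual_n_total  # ~2.0%
fp_among_algo_y = 100 * fp / algo_y_total      # 20.0%
fn_among_manual_y = 100 * fn / manual_y_total  # ~22.1%
fn_among_algo_n = 100 * fn / algo_n_total      # ~2.4%
```

Each rate uses the full row or column total as its denominator, which matches the percentages reported for cardiology in the summary table.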
Comparison between manual classification and algorithmic classification for cardiology, oncology, and mental health.
| | Cardiology | Oncology | Mental health |
| % Specialty by manual review | 9.5% | 24.6% | 8.2% |
| % Specialty by algorithm | 9.5% | 25.4% | 9.9% |
| False positives | |||
| Among studies classified as N by manual review | 2.0% | 0.5% | 2.3% |
| Among studies classified as Y by algorithm | 20.0% | 1.7% | 22.6% |
| False negatives | |||
| Among studies classified as Y by manual review | 22.1% | 2.8% | 12.2% |
| Among studies classified as N by algorithm | 2.4% | 1.0% | 1.2% |
| Overall incorrectly classified studies | 4.2% | 1.2% | 3.3% |
| Overall ambiguous studies | 0.1% | 0.1% | 0.8% |
| Overall unclassified studies | 5.1% | 5.1% | 5.1% |
Excluding unclassified and ambiguous studies from the denominator.
Studies that were incorrectly included in a given specialty (e.g. non-cardiology studies classified as cardiology).
Studies that were incorrectly excluded from a given specialty (e.g. cardiology studies classified as non-cardiology).
Summary of disagreements between clinical specialty reviewers in study classification.
| | Disagreement, n/N (%) | | |
| Reviewers (A & B) | Cardiology | Oncology | Mental health |
| Reviewers 1 & 2 | 12/200 (6.0%) | 9/200 (4.5%) | 16/200 (8.0%) |
| Reviewers 1 & 3 | 20/400 (5.0%) | 6/400 (1.5%) | 18/400 (4.5%) |
| Reviewers 4 & 5 | 18/200 (9.0%) | 11/200 (5.5%) | 14/200 (7.0%) |
| Reviewers 6 & 7 | 18/200 (9.0%) | 9/200 (4.5%) | 18/200 (9.0%) |
| Overall | 68/1,000 (6.8%) | 35/1,000 (3.5%) | 66/1,000 (6.6%) |
Defined as any difference in classification of a study by the two reviewers of that study.
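The overall row simply pools the disagreement counts across reviewer pairs; a quick check for the cardiology column (a sketch, with illustrative variable names):

```python
# Disagreements per reviewer pair as (disagreements, studies reviewed),
# taken from the cardiology column of the table above.
cardiology_pairs = [(12, 200), (20, 400), (18, 200), (18, 200)]

disagree = sum(d for d, _ in cardiology_pairs)   # 68
total = sum(n for _, n in cardiology_pairs)      # 1,000
overall_pct = 100 * disagree / total             # 6.8%
```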
Summary of results of comparison between condition_browse and condition data by specialty classification.
| | Cardiovascular | | Oncology | | Mental health | |
| | Condition browse | Condition | Condition browse | Condition | Condition browse | Condition |
| % Specialty by manual review | 9.5% | 9.5% | 24.6% | 24.6% | 8.2% | 8.2% |
| % Specialty by algorithm | 8.6% | 9.1% | 27.3% | 24.8% | 8.3% | 9.1% |
| False positives | ||||||
| Among studies classified as N by manual review | 1.4% | 1.5% | 0.4% | 0.3% | 1.2% | 1.5% |
| Among studies classified as Y by algorithm | 17.8% | 18.2% | 1.3% | 0.9% | 15.9% | 18.0% |
| False negatives | ||||||
| Among studies classified as Y by manual review | 23.2% | 22.1% | 2.8% | 4.5% | 8.5% | 13.4% |
| Among studies classified as N by algorithm | 2.8% | 2.7% | 1.7% | 1.7% | 1.4% | 1.4% |
| Overall incorrectly classified studies | 4.1% | 4.2% | 1.2% | 1.5% | 2.2% | 3.0% |
| Overall ambiguous studies | 0.0% | 0.1% | 0.3% | 0.0% | 1.5% | 0.8% |
| Overall unclassified studies | 15.5% | 14.9% | 15.5% | 14.9% | 15.5% | 14.9% |
Excluding unclassified and ambiguous studies from the denominator.
Studies that were incorrectly included in a given specialty (e.g., non-cardiology studies classified as cardiology).
Studies that were incorrectly excluded from a given specialty (e.g., cardiology studies classified as non-cardiology).