Ignacio Atal, Jean-David Zeitoun, Aurélie Névéol, Philippe Ravaud, Raphaël Porcher, Ludovic Trinquart.
Abstract
BACKGROUND: Clinical trial registries may allow for producing a global mapping of health research. However, health conditions are not described with standardized taxonomies in registries. Previous work analyzed clinical trial registries to improve the retrieval of relevant clinical trials for patients. However, no previous work has classified clinical trials across diseases using a standardized taxonomy allowing a comparison between global health research and global burden across diseases. We developed a knowledge-based classifier of health conditions studied in registered clinical trials towards categories of diseases and injuries from the Global Burden of Diseases (GBD) 2010 study. The classifier relies on the UMLS® knowledge source (Unified Medical Language System®) and on heuristic algorithms for parsing data. It maps trial records to a 28-class grouping of the GBD categories by automatically extracting UMLS concepts from text fields and by projecting concepts between medical terminologies. The classifier allows deriving pathways between the clinical trial record and candidate GBD categories using natural language processing and links between knowledge sources, and selects the relevant GBD classification based on rules of prioritization across the pathways found. We compared automatic and manual classifications for an external test set of 2,763 trials. We automatically classified 109,603 interventional trials registered before February 2014 at WHO ICTRP.
Keywords: Clinical trials; Disease classification; Global burden of diseases; Mapping
Year: 2016 PMID: 27659604 PMCID: PMC5034670 DOI: 10.1186/s12859-016-1247-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance of the 8 versions of the classifier, compared to the baseline
| Version | Word Sense Disambiguation | Expert-based enrichment | Priority to health condition field | Exact matching, all trials (%) | Exact matching, one GBD category (%) | Exact matching, two or more GBD categories (%) | Exact matching, no GBD category (%) | Weighted average sensitivity (%) | Weighted average specificity (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Yes | 77.8 | 82.7 | 28.6 | 53.1 | 81.9 | 97.4 |
| 2 | Yes | Yes | No | 77.5 | 82.5 | 28.6 | 52.1 | 81.8 | 97.4 |
| 3 | Yes | No | Yes | 76.9 | 81.4 | 28.6 | 54.8 | 81.0 | 97.2 |
| 4 | Yes | No | No | 76.9 | 81.5 | 28.6 | 53.8 | 81.1 | 97.2 |
| 5 | No | Yes | Yes | 75.6 | 80.1 | 28.6 | 53.1 | 81.9 | 97.0 |
| 6 | No | Yes | No | 75.3 | 79.9 | 28.6 | 52.1 | 81.8 | 97.0 |
| 7 | No | No | Yes | 74.8 | 79.0 | 25.0 | 54.8 | 81.0 | 96.9 |
| 8 | No | No | No | 74.8 | 79.1 | 25.0 | 53.8 | 81.2 | 96.9 |
| Baseline: condition field | | | | 48.7 | 40.5 | 10.7 | 98.5 | 49.3 | 91.4 |
| Baseline: public title | | | | 38.1 | 27.6 | 7.1 | 100.0 | 38.2 | 89.6 |
| Baseline: official title | | | | 38.0 | 27.6 | 7.1 | 99.3 | 38.2 | 89.6 |
| Baseline: three text fields | | | | 51.4 | 43.7 | 17.9 | 97.8 | 52.3 | 92.0 |
Exact-matching and weighted averaged sensitivities and specificities for 8 versions of the classifier for the 28 GBD categories, compared to the baseline. Exact-matching corresponds to the proportion (in %) of trials for which the automatic GBD classification is correct. Exact-matching was estimated over all trials (N = 2,763), trials concerning a unique GBD category (N = 2,328), trials concerning 2 or more GBD categories (N = 28), and trials not relevant for the GBD (N = 407). The weighted averaged sensitivity and specificity correspond to the weighted average across GBD categories of the sensitivities and specificities for each GBD category plus the “No GBD” category (in %). The 8 versions correspond to the combinations of using or not using the Word Sense Disambiguation server during the text annotation, the expert-based enrichment database, and the priority to the health condition field as a prioritization rule. The baseline did not use the UMLS knowledge source: a clinical trial record was classified to a GBD category if at least one of the disease names defining that GBD category appeared verbatim in the condition field, the public title, or the scientific title, each taken separately, or in at least one of these three text fields.
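The two summary measures described in this note can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the `NO_GBD` label are hypothetical, and weighting each class by the number of trials manually assigned to it is an assumption, since the note does not state which weights were used.

```python
from typing import Iterable, List, Set, Tuple

NO_GBD = "No GBD"  # label assumed here for trials not relevant to the GBD


def label_set(categories: Iterable[str]) -> frozenset:
    """Turn an empty classification into an explicit 'No GBD' class."""
    cats = frozenset(categories)
    return cats if cats else frozenset({NO_GBD})


def exact_matching(gold: List[Set[str]], auto: List[Set[str]]) -> float:
    """Percentage of trials whose automatic label set equals the manual one."""
    hits = sum(label_set(g) == label_set(a) for g, a in zip(gold, auto))
    return 100.0 * hits / len(gold)


def weighted_sen_spe(gold: List[Set[str]], auto: List[Set[str]]) -> Tuple[float, float]:
    """Average per-class sensitivity and specificity (in %), weighted by the
    number of trials manually assigned to each class (an assumption)."""
    pairs = [(label_set(g), label_set(a)) for g, a in zip(gold, auto)]
    classes = set().union(*(g | a for g, a in pairs))
    weight_sum = sen_sum = spe_sum = 0.0
    for c in classes:
        tp = sum(c in g and c in a for g, a in pairs)
        fn = sum(c in g and c not in a for g, a in pairs)
        fp = sum(c not in g and c in a for g, a in pairs)
        tn = len(pairs) - tp - fn - fp
        weight = tp + fn
        if weight == 0 or tn + fp == 0:
            continue  # sensitivity or specificity is undefined for this class
        sen_sum += weight * tp / (tp + fn)
        spe_sum += weight * tn / (tn + fp)
        weight_sum += weight
    return 100.0 * sen_sum / weight_sum, 100.0 * spe_sum / weight_sum


# Toy data: four trials, one of them not relevant to the GBD (empty set).
gold = [{"Neoplasms"}, {"Neoplasms"}, {"Malaria"}, set()]
auto = [{"Neoplasms"}, {"Malaria"}, {"Malaria"}, set()]
print(exact_matching(gold, auto))
print(weighted_sen_spe(gold, auto))
```

A trial counts as an exact match only when the whole set of categories agrees, which is why trials concerning two or more GBD categories are the hardest stratum to get right.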
Grouping of the Global Burden of Diseases (GBD) cause list in 28 GBD categories
| GBD categories | Partition of the GBD cause list |
|---|---|
| Tuberculosis | Tuberculosis |
| HIV/AIDS | HIV/AIDS |
| Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases | Diarrheal diseases; Typhoid and paratyphoid fevers; Lower respiratory infections; Upper respiratory infections; Otitis media; Meningitis; Encephalitis; Diphtheria; Whooping cough; Tetanus; Measles; Varicella |
| Malaria | Malaria |
| Neglected tropical diseases excluding malaria | Chagas disease; Leishmaniasis; African trypanosomiasis; Schistosomiasis; Cysticercosis; Echinococcosis; Lymphatic filariasis; Onchocerciasis; Trachoma; Dengue; Yellow fever; Rabies; Food-borne trematodiases; Intestinal nematode infections; Other neglected tropical diseases |
| Maternal disorders | Maternal hemorrhage; Maternal sepsis; Hypertensive disorders of pregnancy; Obstructed labor; Abortion; Other maternal disorders |
| Neonatal disorders | Preterm birth complications; Neonatal encephalopathy (birth asphyxia and birth trauma); Sepsis and other infectious disorders of the newborn baby; Other neonatal disorders |
| Nutritional deficiencies | Protein-energy malnutrition; Iodine deficiency; Vitamin A deficiency; Iron-deficiency anemia; Other nutritional deficiencies |
| Sexually transmitted diseases excluding HIV | Syphilis; Sexually transmitted chlamydial diseases; Gonococcal infection; Trichomoniasis; Other sexually transmitted diseases |
| Hepatitis | Acute hepatitis A; Acute hepatitis B; Acute hepatitis C; Acute hepatitis E |
| Leprosy | Leprosy |
| Neoplasms | Esophageal cancer; Stomach cancer; Liver cancer; Larynx cancer; Trachea, bronchus, and lung cancers; Breast cancer; Cervical cancer; Uterine cancer; Prostate cancer; Colon and rectum cancers; Mouth cancer; Nasopharynx cancer; Cancer of other part of pharynx and oropharynx; Gallbladder and biliary tract cancer; Pancreatic cancer; Malignant melanoma of skin; Non-melanoma skin cancer; Ovarian cancer; Testicular cancer; Kidney and other urinary organ cancers; Bladder cancer; Brain and nervous system cancers; Thyroid cancer; Hodgkin's disease; Non-Hodgkin lymphoma; Multiple myeloma; Leukemia; Other neoplasms |
| Cardiovascular and circulatory diseases | Rheumatic heart disease; Ischemic heart disease; Cerebrovascular disease; Hypertensive heart disease; Cardiomyopathy and myocarditis; Atrial fibrillation and flutter; Aortic aneurysm; Peripheral vascular disease; Endocarditis; Other cardiovascular and circulatory diseases |
| Chronic respiratory diseases | Chronic obstructive pulmonary disease; Pneumoconiosis; Asthma; Interstitial lung disease and pulmonary sarcoidosis; Other chronic respiratory diseases |
| Cirrhosis of the liver | Cirrhosis of the liver |
| Digestive diseases (except cirrhosis) | Peptic ulcer disease; Gastritis and duodenitis; Appendicitis; Paralytic ileus and intestinal obstruction without hernia; Inguinal or femoral hernia; Non-infective inflammatory bowel disease; Vascular disorders of intestine; Gall bladder and bile duct disease; Pancreatitis; Other digestive diseases |
| Neurological disorders | Alzheimer's disease and other dementias; Parkinson's disease; Epilepsy; Multiple sclerosis; Migraine; Tension-type headache; Other neurological disorders |
| Mental and behavioral disorders | Schizophrenia; Alcohol use disorders; Drug use disorders; Unipolar depressive disorders; Bipolar affective disorder; Anxiety disorders; Eating disorders; Pervasive development disorders; Childhood behavioral disorders; Idiopathic intellectual disability; Other mental and behavioral disorders |
| Diabetes, urinary diseases and male infertility | Diabetes mellitus; Acute glomerulonephritis; Chronic kidney diseases; Urinary diseases and male infertility |
| Gynecological diseases | Uterine fibroids; Polycystic ovarian syndrome; Female infertility; Endometriosis; Genital prolapse; Premenstrual syndrome; Other gynecological diseases |
| Hemoglobinopathies and hemolytic anemias | Hemoglobinopathies and hemolytic anemias; Thalassemias; Sickle cell disorders; G6PD deficiency; Other hemoglobinopathies and hemolytic anemias |
| Musculoskeletal disorders | Rheumatoid arthritis; Osteoarthritis; Low back and neck pain; Gout; Other musculoskeletal disorders |
| Congenital anomalies | Congenital anomalies; Neural tube defects; Congenital heart anomalies; Cleft lip and cleft palate; Down's syndrome; Other chromosomal abnormalities; Other congenital anomalies |
| Skin and subcutaneous diseases | Eczema; Psoriasis; Cellulitis; Abscess, impetigo, and other bacterial skin diseases; Scabies; Fungal skin diseases; Viral skin diseases; Acne vulgaris; Alopecia areata; Pruritus; Urticaria; Decubitus ulcer; Other skin and subcutaneous diseases |
| Sense organ diseases | Glaucoma; Cataracts; Macular degeneration; Refraction and accommodation disorders; Other hearing loss; Other vision loss; Other sense organ diseases |
| Oral disorders | Dental caries; Periodontal disease; Edentulism |
| Sudden infant death syndrome | Sudden infant death syndrome |
| Injuries | Transport injuries; Unintentional injuries other than transport injuries; Self-harm and interpersonal violence; Forces of nature, war, and legal intervention |
| Excluded residual categories | Other infectious diseases; Other endocrine, nutritional, blood, and immune disorders |
Grouping of the cause list of diseases and injuries from the Global Burden of Diseases 2010 study into 28 GBD categories, plus the excluded residual categories. This grouping was considered sufficiently informative for comparing a global mapping of health research with a global mapping of health needs.
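The grouping can be held as a plain lookup table. The sketch below copies only a few rows of the table above and adds a verbatim-match function in the spirit of the baseline described for the first table; `GBD_GROUPING` and `verbatim_categories` are hypothetical names, and case-insensitive substring matching is an assumption about how "verbatim" might be implemented.

```python
from typing import Dict, List, Set

# Excerpt of the 28-class grouping (a few rows only, copied from the table).
GBD_GROUPING: Dict[str, List[str]] = {
    "Tuberculosis": ["Tuberculosis"],
    "Malaria": ["Malaria"],
    "Hepatitis": [
        "Acute hepatitis A",
        "Acute hepatitis B",
        "Acute hepatitis C",
        "Acute hepatitis E",
    ],
    "Oral disorders": ["Dental caries", "Periodontal disease", "Edentulism"],
}


def verbatim_categories(text: str, grouping: Dict[str, List[str]] = GBD_GROUPING) -> Set[str]:
    """Return every category with at least one disease name found verbatim
    (case-insensitively, as a substring) in the given text field."""
    lowered = text.lower()
    return {
        category
        for category, names in grouping.items()
        if any(name.lower() in lowered for name in names)
    }


print(verbatim_categories("Fluoride varnish for dental caries in children"))
print(verbatim_categories("Sofosbuvir for chronic hepatitis C"))
```

The second call returns an empty set because no disease name from the list appears word for word in the text. Misses of this kind suggest why a verbatim baseline can have low sensitivity and why a knowledge source such as the UMLS is used to bridge wording differences.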
Fig. 1. Example of classification of a clinical trial record towards the GBD categories. The classification process is based on text extraction from the trial record, text annotation using UMLS concepts, projection of UMLS concepts to ICD10 codes, projection of ICD10 codes to candidate GBD categories among the 28 GBD categories, and GBD classification based on the candidate GBD categories. In this example, the text annotation involved use of the WSD server for MetaMap, and no expert-based enrichment was needed.
Fig. 2. Methodological stages for classification. The classification of clinical trial records has 5 stages. The 4 initial stages allow for deriving pathways from the clinical trial record to candidate GBD categories: annotation of text from the trial record with UMLS concepts by using MetaMap, projection of UMLS concepts to ICD10 codes with IntraMap, projection of ICD10 codes to candidate GBD categories, and expert-based enrichment when automatic pathways are not possible. The fifth stage allows for deriving the GBD classification of the trial based on prioritization rules over the pathways found.
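The five stages can be sketched with toy lookup tables. Everything in this block is a hypothetical stand-in: the dictionaries replace MetaMap, IntraMap and the expert-based enrichment database, the concept identifiers are placeholders rather than real UMLS CUIs, and the prioritization shown (use candidates from the condition field when there are any, otherwise fall back to the titles) is only one plausible reading of "priority to the health condition field".

```python
from typing import Dict, Set

# Stage 1 stand-in for MetaMap: phrase -> placeholder concept identifier.
TEXT_TO_CONCEPT = {
    "breast cancer": "CONCEPT:breast_cancer",
    "type 2 diabetes": "CONCEPT:type_2_diabetes",
    "periodontitis": "CONCEPT:periodontitis",
}
# Stage 2 stand-in for IntraMap: concept -> ICD10 code.
CONCEPT_TO_ICD10 = {
    "CONCEPT:breast_cancer": "C50",
    "CONCEPT:type_2_diabetes": "E11",
}
# Stage 3: ICD10 code -> candidate GBD category.
ICD10_TO_GBD = {
    "C50": "Neoplasms",
    "E11": "Diabetes, urinary diseases and male infertility",
}
# Stage 4: enrichment for concepts that have no automatic pathway (toy entry).
ENRICHMENT = {"CONCEPT:periodontitis": "Oral disorders"}


def candidate_categories(text: str) -> Set[str]:
    """Stages 1 to 4: derive candidate GBD categories from one text field."""
    lowered = text.lower()
    concepts = {c for phrase, c in TEXT_TO_CONCEPT.items() if phrase in lowered}
    candidates = set()
    for concept in concepts:
        icd10 = CONCEPT_TO_ICD10.get(concept)
        if icd10 in ICD10_TO_GBD:
            candidates.add(ICD10_TO_GBD[icd10])
        elif concept in ENRICHMENT:
            candidates.add(ENRICHMENT[concept])
    return candidates


def classify(record: Dict[str, str]) -> Set[str]:
    """Stage 5: prefer candidates found in the condition field (assumed rule)."""
    from_condition = candidate_categories(record.get("condition", ""))
    if from_condition:
        return from_condition
    return candidate_categories(record.get("public_title", "")) | candidate_categories(
        record.get("scientific_title", "")
    )


record = {
    "condition": "Breast cancer",
    "public_title": "Exercise in women with breast cancer and type 2 diabetes",
}
print(classify(record))
```

In this toy record the titles also mention diabetes, but the condition field alone decides the classification, which is the effect the prioritization rule is meant to illustrate.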
Distribution of the external test set (n = 2,763 trials) across the 28-class grouping of the GBD cause list, performance of the best performing version of the classifier in the external test set, and projection of all trials in the WHO ICTRP database (n = 109,603)
| GBD categories | No. trials (external test set) | Sen (%) | Spe (%) | PV+ (%) | LR+ | LR- | No. trials (%) in WHO ICTRP |
|---|---|---|---|---|---|---|---|
| Neoplasms | 958 | 97.4 [96.7-97.7] | 97.5 [97.0-97.7] | 95.3 [94.4-95.8] | 38.2 [28.7-50.8] | 0.03 [0.02-0.04] | 25,004 (22.8) |
| Diabetes, urinary diseases and male infertility | 242 | 81.0 [78.0-83.0] | 97.4 [97.0-97.7] | 75.1 [72.1-77.4] | 31.4 [24.5-40.2] | 0.20 [0.15-0.25] | 9,749 (8.9) |
| Cardiovascular and circulatory diseases | 235 | 75.7 [72.5-78.1] | 97.6 [97.2-97.9] | 74.8 [71.6-77.2] | 31.9 [24.6-41.4] | 0.25 [0.20-0.31] | 8,906 (8.1) |
| Mental and behavioral disorders | 143 | 93.7 [90.5-94.7] | 98.7 [98.4-98.9] | 80.2 [76.5-82.6] | 74.4 [52.9-104.7] | 0.06 [0.03-0.12] | 7,609 (6.9) |
| Musculoskeletal disorders | 113 | 88.5 [84.2-90.3] | 98.5 [98.2-98.7] | 71.4 [67.1-74.6] | 58.6 [42.8-80.3] | 0.12 [0.07-0.19] | 6,112 (5.6) |
| HIV/AIDS | 97 | 88.7 [83.9-90.4] | 99.7 [99.6-99.8] | 92.5 [88.0-93.6] | 337.7 [160.6-710.0] | 0.11 [0.07-0.20] | 2,295 (2.1) |
| Neurological disorders | 93 | 84.9 [79.9-87.3] | 98.5 [98.2-98.7] | 66.4 [61.6-70.1] | 56.7 [41.2-78.0] | 0.15 [0.09-0.25] | 6,355 (5.8) |
| Chronic respiratory diseases | 81 | 93.8 [89.0-94.6] | 99.4 [99.1-99.5] | 81.7 [76.5-84.4] | 148.0 [91.9-238.5] | 0.06 [0.03-0.15] | 4,104 (3.7) |
| Sense organ diseases | 56 | 92.9 [86.5-93.7] | 98.5 [98.2-98.7] | 56.5 [51.2-61.3] | 62.8 [45.8-86.2] | 0.07 [0.03-0.19] | 3,461 (3.2) |
| Injuries | 56 | 16.1 [13.4-23.1] | 99.5 [99.3-99.6] | 39.1 [31.2-50.1] | 31.1 [14.0-68.8] | 0.07 [0.03-0.19] | 655 (0.6) |
| Diarrhea, lower respiratory infections, meningitis, and other common infectious diseases | 49 | 81.6 [73.9-84.8] | 99.2 [99.0-99.3] | 65.6 [58.7-70.6] | 105.5 [67.5-164.8] | 0.19 [0.10-0.33] | 3,200 (2.9) |
| Maternal disorders | 43 | 39.5 [33.2-47.6] | 99.8 [99.7-99.8] | 77.3 [64.7-81.7] | 215.1 [83.1-556.4] | 0.61 [0.48-0.77] | 602 (0.5) |
| Digestive diseases (except cirrhosis) | 32 | 75.0 [65.0-79.7] | 99.0 [98.7-99.1] | 46.2 [39.7-53.1] | 73.2 [48.1-111.3] | 0.25 [0.14-0.46] | 4,454 (4.1) |
| Cirrhosis of the liver | 23 | 82.6 [70.2-85.6] | 99.4 [99.2-99.5] | 52.8 [44.6-60.4] | 133.1 [80.0-221.6] | 0.17 [0.07-0.43] | 1,412 (1.3) |
| Congenital anomalies | 23 | 95.7 [78.1-99.9] | 98.8 [98.5-98.9] | 39.3 [33.7-46.3] | 77.1 [54.6-108.9] | 0.04 [0.01-0.30] | 1,947 (1.8) |
| Skin and subcutaneous diseases | 22 | 81.8 [69.1-85.1] | 99.1 [98.9-99.2] | 42.9 [36.1-50.8] | 93.4 [59.9-145.7] | 0.18 [0.08-0.45] | 3,652 (3.3) |
| Hepatitis | 17 | 82.4 [67.5-85.3] | 99.9 [99.7-99.9] | 77.8 [63.7-82.1] | 565.4 [207.2-1542.5] | 0.18 [0.06-0.49] | 1,082 (1.0) |
| Tuberculosis | 16 | 87.5 [71.9-88.5] | 99.9 [99.8-99.9] | 87.5 [71.9-88.5] | 1201.8 [297.0-4862.5] | 0.13 [0.03-0.46] | 306 (0.3) |
| Nutritional deficiencies | 16 | 68.8 [54.6-75.7] | 99.5 [99.2-99.5] | 42.3 [34.2-52.4] | 125.9 [68.9-230.1] | 0.31 [0.15-0.65] | 1,226 (1.1) |
| Hemoglobinopathies and hemolytic anemias | 16 | 62.5 [49.1-71.0] | 99.9 [99.7-99.9] | 71.4 [55.9-77.8] | 429.2 [150.2-1226.9] | 0.38 [0.20-0.71] | 360 (0.3) |
| Malaria | 14 | 100.0 [78.5-100.0] | 100.0 [99.9-100.0] | 93.3 [68.1-99.8] | 2749.0 [387.4-19508.4] | - | 442 (0.4) |
| Gynecological diseases | 11 | 81.8 [62.7-84.4] | 99.6 [99.4-99.7] | 47.4 [37.4-58.3] | 225.2 [114.2-443.8] | 0.18 [0.05-0.64] | 1,536 (1.4) |
| Neonatal disorders | 10 | 40.0 [29.5-56.0] | 99.7 [99.6-99.8] | 36.4 [27.3-52.5] | 157.3 [54.5-454.1] | 0.60 [0.36-1.00] | 718 (0.7) |
| Oral disorders | 8 | 37.5 [27.3-55.8] | 99.9 [99.7-99.9] | 42.9 [30.3-60.5] | 258.3 [68.6-973.0] | 0.63 [0.37-1.07] | 576 (0.5) |
| Neglected tropical diseases excluding malaria | 7 | 85.7 [42.1-99.6] | 100.0 [99.9-100.0] | 100.0 [61.0-100.0] | - | 0.14 [0.02-0.88] | 361 (0.3) |
| Leprosy | 2 | 100.0 [15.8-100.0] | 100.0 [99.9-100.0] | 66.7 [38.7-76.0] | 2761.0 [389.1-19593.6] | - | 74 (0.1) |
| Sexually transmitted diseases excluding HIV | 1 | 0.0 [0.0-97.5] | 99.8 [99.7-99.8] | 0.0 [0.0-43.4] | - | - | 187 (0.2) |
| Sudden infant death syndrome | 0 | - | 100.0 [99.9-100.0] | - | - | - | 5 (0.0) |
| No GBD category | 407 | 53.1 [50.6-55.5] | 92.9 [92.3-93.4] | 56.4 [53.8-58.9] | 7.5 [6.3-8.9] | 0.51 [0.46-0.56] | 22,450 (20.5) |
Sen: sensitivity; Spe: specificity; PV+: positive predictive value; LR+: positive likelihood ratio; LR-: negative likelihood ratio. The version of the classifier used was: using the Word Sense Disambiguation server, the expert-based enrichment, and giving priority to the health condition field.
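The five measures follow from the counts of a 2×2 table for each category. The sketch below uses the standard definitions; the counts for Tuberculosis (14 true positives, 2 false negatives, 2 false positives, 2,745 true negatives) are not given in the source but were inferred here from the reported percentages and N = 2,763, so they should be read as an assumption. Confidence intervals are not computed.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Point estimates of Sen, Spe, PV+ (in %) and of LR+ and LR-."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    return {
        "Sen": 100.0 * sen,
        "Spe": 100.0 * spe,
        "PV+": 100.0 * tp / (tp + fp),
        # LR+ = Sen / (1 - Spe); undefined when there are no false positives.
        "LR+": sen / false_positive_rate if fp else float("inf"),
        # LR- = (1 - Sen) / Spe.
        "LR-": (1.0 - sen) / spe,
    }


# Counts inferred (assumption) for the Tuberculosis row of the table.
metrics = diagnostic_metrics(tp=14, fn=2, fp=2, tn=2745)
print({name: round(value, 2) for name, value in metrics.items()})
```

With these inferred counts the point estimates agree with the Tuberculosis row: Sen 87.5, Spe 99.9, PV+ 87.5, LR+ 1201.8 and LR- 0.13.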
Performance of the classifier per source of data for the 28 GBD categories
| Source | Exact matching, all trials: % (n/N) | Exact matching, one GBD category: % (n/N) | Exact matching, no GBD category: % (n/N) | Exact matching, two or more GBD categories: % (n/N) | Weighted average sensitivity (%) | Weighted average specificity (%) |
|---|---|---|---|---|---|---|
| Emdin 2015 | 66.7 (346/519) | 66.4 (300/452) | 68.2 (45/66) | 100.0 (1/1) | 71.5 | 96.4 |
| Viergever 2013 | 82.2 (1045/1271) | 85.3 (925/1085) | 64.5 (120/186) | 0.0 (0/0) | 86.6 | 97.8 |
| Ongoing work | 77.9 (758/973) | 88.5 (700/791) | 32.9 (51/155) | 25.9 (7/27) | 81.3 | 97.2 |
Exact-matching and weighted averaged sensitivities and specificities for the classifier to the 28 GBD categories for each source of data. The version of the classifier used was: using the Word Sense Disambiguation server, the expert-based enrichment database and the priority to the health condition field. Exact-matching corresponds to the proportion (in %) of trials for which the automatic GBD classification is correct. Exact-matching was estimated over all trials, trials concerning a unique GBD category, trials concerning 2 or more GBD categories, and trials not relevant for the GBD. The weighted averaged sensitivity and specificity correspond to the weighted average across GBD categories of the sensitivities and specificities for each GBD category plus the “No GBD” category (in %).
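As a consistency check, the per-source counts in this table can be pooled to recover the exact-matching proportions reported for the full external test set (N = 2,763). The short sketch below does that arithmetic; the counts are copied from the table, and the variable names are illustrative only.

```python
# (correct, total) per source and stratum, copied from the table above.
COUNTS = {
    "Emdin 2015": {
        "All trials": (346, 519),
        "One GBD category": (300, 452),
        "No GBD category": (45, 66),
        "Two or more GBD categories": (1, 1),
    },
    "Viergever 2013": {
        "All trials": (1045, 1271),
        "One GBD category": (925, 1085),
        "No GBD category": (120, 186),
        "Two or more GBD categories": (0, 0),
    },
    "Ongoing work": {
        "All trials": (758, 973),
        "One GBD category": (700, 791),
        "No GBD category": (51, 155),
        "Two or more GBD categories": (7, 27),
    },
}

pooled = {}
for stratum in ["All trials", "One GBD category", "No GBD category", "Two or more GBD categories"]:
    correct = sum(source[stratum][0] for source in COUNTS.values())
    total = sum(source[stratum][1] for source in COUNTS.values())
    pooled[stratum] = round(100.0 * correct / total, 1)

print(pooled)
```

The pooled values are 77.8, 82.7, 53.1 and 28.6, which match the exact-matching proportions given for the classifier version that uses Word Sense Disambiguation, expert-based enrichment and priority to the health condition field.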