Simon Geletta, Lendie Follett, Marcia Laugerman.
Abstract
BACKGROUND: This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns in research narrative documents that distinguish studies that complete successfully from those that terminate. Recent research has reported that at least 10% of all studies funded by major research funding agencies terminate without yielding useful results. Because studies that receive funding from major agencies are carefully planned and rigorously vetted through peer review, it was somewhat surprising to us that study terminations are this prevalent. Moreover, our review of the literature suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap by identifying the factors that contribute to study failures.
Keywords: Clinical trials; Latent Dirichlet allocation; Prediction; Structured data; Unstructured data
Year: 2019 PMID: 31775737 PMCID: PMC6882341 DOI: 10.1186/s12911-019-0973-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Summary of clinical trial termination rates within levels of each structured variable
| Characteristic | Level | total_n | Completed | Terminated |
|---|---|---|---|---|
| Primary Purpose | Missing | 29484 | 0.94 | 0.06 |
| Primary Purpose | Basic Science | 4915 | 0.94 | 0.06 |
| Primary Purpose | Device Feasibility | 52 | 0.83 | 0.17 |
| Primary Purpose | Diagnostic | 3858 | 0.87 | 0.13 |
| Primary Purpose | Educational/Counseling/Training | 118 | 0.89 | 0.11 |
| Primary Purpose | Health Services Research | 1971 | 0.96 | 0.04 |
| Primary Purpose | Other | 795 | 0.91 | 0.09 |
| Primary Purpose | Prevention | 11939 | 0.93 | 0.07 |
| Primary Purpose | Screening | 723 | 0.93 | 0.07 |
| Primary Purpose | Supportive Care | 3491 | 0.91 | 0.09 |
| Primary Purpose | Treatment | 76625 | 0.88 | 0.12 |
| Intervention Type | Behavioral | 10147 | 0.97 | 0.03 |
| Intervention Type | Biological | 7770 | 0.90 | 0.10 |
| Intervention Type | Combination Product | 8 | 1.00 | 0.00 |
| Intervention Type | Device | 10957 | 0.88 | 0.12 |
| Intervention Type | Diagnostic Test | 73 | 0.96 | 0.04 |
| Intervention Type | Dietary Supplement | 3811 | 0.95 | 0.05 |
| Intervention Type | Drug | 65364 | 0.88 | 0.12 |
| Intervention Type | Genetic | 500 | 0.90 | 0.10 |
| Intervention Type | Other | 11302 | 0.93 | 0.07 |
| Intervention Type | Procedure | 8908 | 0.89 | 0.11 |
| Intervention Type | Radiation | 751 | 0.78 | 0.22 |
| Study Phase | Early Phase 1 | 915 | 0.88 | 0.12 |
| Study Phase | Missing | 54082 | 0.93 | 0.07 |
| Study Phase | Phase 1 | 17396 | 0.91 | 0.09 |
| Study Phase | Phase 1/Phase 2 | 4432 | 0.84 | 0.16 |
| Study Phase | Phase 2 | 22544 | 0.85 | 0.15 |
| Study Phase | Phase 2/Phase 3 | 2423 | 0.87 | 0.13 |
| Study Phase | Phase 3 | 17992 | 0.89 | 0.11 |
| Study Phase | Phase 4 | 14187 | 0.90 | 0.10 |
| Intervention Model | Missing | 27542 | 0.93 | 0.07 |
| Intervention Model | Crossover Assignment | 11597 | 0.95 | 0.05 |
| Intervention Model | Factorial Assignment | 1836 | 0.94 | 0.06 |
| Intervention Model | Parallel Assignment | 61036 | 0.90 | 0.10 |
| Intervention Model | Sequential Assignment | 49 | 0.90 | 0.10 |
| Intervention Model | Single Group Assignment | 31911 | 0.86 | 0.14 |
| Allocation | Missing | 45626 | 0.90 | 0.10 |
| Allocation | Non-Randomized | 14325 | 0.88 | 0.12 |
| Allocation | Random Sample | 40 | 0.93 | 0.07 |
| Allocation | Randomized | 73980 | 0.91 | 0.09 |
| Enrollment Group | >1000 | 7151 | 0.96 | 0.04 |
| Enrollment Group | 0-100 | 80755 | 0.87 | 0.13 |
| Enrollment Group | 101-1000 | 42683 | 0.94 | 0.06 |
| Enrollment Group | Missing | 3382 | 0.95 | 0.05 |
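The proportions in the table above can be turned back into approximate counts and pooled across the levels of one variable. The following is a minimal sketch using the Study Phase rows; the function and variable names are illustrative assumptions, not part of the study, and the counts are only approximate because the table reports proportions rounded to two decimals.

```python
# Rows copied from the Study Phase section of the table above:
# (level, total_n, proportion terminated). The footnote marker on
# "Missing" is dropped here.
STUDY_PHASE = [
    ("Early Phase 1", 915, 0.12),
    ("Missing", 54082, 0.07),
    ("Phase 1", 17396, 0.09),
    ("Phase 1/Phase 2", 4432, 0.16),
    ("Phase 2", 22544, 0.15),
    ("Phase 2/Phase 3", 2423, 0.13),
    ("Phase 3", 17992, 0.11),
    ("Phase 4", 14187, 0.10),
]


def approx_terminated(total_n: int, proportion: float) -> int:
    """Approximate number of terminated trials for one level.

    Inexact by construction: the source proportions are rounded.
    """
    return round(total_n * proportion)


def pooled_termination_rate(rows) -> float:
    """Average the per-level proportions, weighting each by its total_n."""
    total = sum(n for _, n, _ in rows)
    terminated = sum(n * p for _, n, p in rows)
    return terminated / total


total_trials = sum(n for _, n, _ in STUDY_PHASE)
rate = pooled_termination_rate(STUDY_PHASE)
print(f"trials: {total_trials}, pooled termination rate: {rate:.3f}")
for level, n, p in STUDY_PHASE:
    print(f"{level:16s} ~{approx_terminated(n, p)} terminated of {n}")
```

Summing `total_n` over the Study Phase levels gives 133,971 trials, and the pooled rate comes out at roughly 0.10, close to the figure of at least 10% quoted in the abstract.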
Fig. 1 Flowchart description - this is an updated one that reflects the LDA analysis and 3 competing models
Fig. 2 The 25 topics with the top 10 term-topic probabilities
Topics, terms and possible construct descriptors
| Topic | Partial words (Ordered by probability) | Construct | Rank in prediction |
|---|---|---|---|
| Topic 1 | Studies, investigators, inflammatory, smoking, effects | Inflammation | 7 |
| Topic 2 | Cancer, tumor, cells, growth | Cancer | |
| Topic 3 | Patients, disease, treatment, sleep, quality, therapy | Disorder | |
| Topic 4 | Study, drug, purpose, blood, determine | Drug | |
| Topic 5 | Liver, patients, study, treatment | Liver | |
| Topic 6 | Exercise, study, muscle, training | Exercise | |
| Topic 7 | Surgery, patients, study, surgical, postoperative | Surgery | 1 |
| Topic 8 | HIV, women, study, infants, risk | HIV/Pregnancy | 3 |
| Topic 9 | Patients, study, pressure, anesthesia, respiratory | Respiratory | 9 |
| Topic 10 | Blood, study, imaging, tests | Blood/Brain | 5 |
| Topic 11 | Weeks, months, time intervals | Duration | |
| Topic 12 | Patients, coronary, renal, cardiac, heart, disease | Coronary | 4 |
| Topic 13 | Diabetes, type 1 and 2, insulin | Diabetes | 10 |
| Topic 14 | Safety, study, efficacy, evaluate, placebo | Safety/Efficacy | |
| Topic 15 | Patients, study, dose, combination, treatment | Drug Dose/Combination | |
| Topic 16 | Cell, stem cells, transplant, immune | Stem Cell | 8 |
| Topic 17 | Study, skin, treatment, purpose, topical | Dermatological | 2 |
| Topic 18 | Vaccine, study, immune, safety, dose | Vaccine | |
| Topic 19 | Pain, heart, pulmonary, chronic, pulmonary | Pain/Pulmonary | |
| Topic 20 | Milligram, dose, study, single, healthy | Dose | |
| Topic 21 | Children, cognitive, treatment, intervention | Children/Cognitive | |
| Topic 22 | Phase, effectiveness, treating, stop | Study characteristics | |
| Topic 23 | Weight, study, diet, fat, effects | Weight control | |
| Topic 24 | Symptoms, treatments, disorder, depression | Mental Health | |
| Topic 25 | Care, intervention, health, patients, management | Public Health | |
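The word lists in the table above are ordered by term-topic probability. As a rough illustration of how such a list is read off a fitted topic, the sketch below ranks terms by probability and keeps the top k. The probability values are hypothetical placeholders chosen for illustration, not the study's estimates, and `top_terms` is an assumed helper name.

```python
from typing import Dict, List


def top_terms(term_probs: Dict[str, float], k: int) -> List[str]:
    """Return the k terms with the highest probability, highest first."""
    ranked = sorted(term_probs.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Terms taken from the Topic 7 (Surgery) row; the probabilities are
# hypothetical and only preserve the ordering shown in the table.
topic_7 = {
    "study": 0.030,
    "postoperative": 0.010,
    "surgery": 0.050,
    "surgical": 0.020,
    "patients": 0.040,
}

print(top_terms(topic_7, 3))
```

With these placeholder values, the three highest-ranked terms are `surgery`, `patients`, and `study`, matching the order in the Topic 7 row.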
Fig. 3 Average topic probabilities for each primary purpose category
Fig. 4 Variable importance in terms of mean decrease in accuracy for the random forest fit using all 25 LDA-generated topics as well as the standard structured variables
Fig. 5 ROC curve for random forest that includes both structured variables and LDA topic probabilities
Maximum likelihood estimates of odds ratios and 95% profile likelihood confidence intervals of odds ratios
| Predictor | Odds Ratio | Lower | Upper |
|---|---|---|---|
| enrollment_group 0–100 | 3.96 | 3.47 | 4.54 |
| enrollment_group 101–1000 | 1.56 | 1.36 | 1.80 |
| enrollment_group Missing | 1.23 | 1.00 | 1.51 |
| T7 (Surgery) | 1.90 | 1.74 | 2.07 |
| T17 (Dermatological) | 0.08 | 0.06 | 0.10 |
| T12 (Coronary) | 0.15 | 0.11 | 0.20 |
| T8 (HIV/Pregnancy) | 0.21 | 0.18 | 0.24 |
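To make the odds ratios above easier to read, the sketch below converts an odds ratio to a log-odds coefficient, applies it to a baseline probability, and checks whether a confidence interval excludes 1. It assumes the odds ratios describe the odds of termination relative to a reference level. The baseline probability of 0.04 is an illustrative assumption, and because the reported estimates come from a model with several predictors, the result is not expected to reproduce the unadjusted rates in the first table. All function names are hypothetical.

```python
import math


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic-regression coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Scale the odds of a baseline probability and convert back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


def interval_excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# Odds ratio and interval for enrollment_group 0-100, from the table above.
or_small, lo_small, hi_small = 3.96, 3.47, 4.54

# Assumed baseline termination probability, for illustration only.
baseline = 0.04

print(f"coefficient: {log_odds_coefficient(or_small):.3f}")
print(f"implied probability: {apply_odds_ratio(baseline, or_small):.3f}")
print(f"interval excludes 1: {interval_excludes_one(lo_small, hi_small)}")
```

The interval for `enrollment_group Missing` (1.00 to 1.51) touches 1 at the reported precision, so `interval_excludes_one(1.00, 1.51)` returns `False`; with bounds rounded to two decimals, that case is best treated as borderline.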