Alisa Surkis, Janice A Hogle, Deborah DiazGranados, Joe D Hunt, Paul E Mazmanian, Emily Connors, Kate Westaby, Elizabeth C Whipple, Trisha Adamus, Meridith Mueller, Yindalon Aphinyanaphongs.
Abstract
BACKGROUND: Translational research is a key area of focus of the National Institutes of Health (NIH), as demonstrated by the substantial investment in the Clinical and Translational Science Award (CTSA) program. The goal of the CTSA program is to accelerate the translation of discoveries from the bench to the bedside and into communities. Different classification systems have been used to capture the spectrum of basic to clinical to population health research, with substantial differences in the number of categories and their definitions. Evaluation of the effectiveness of the CTSA program and of translational research in general is hampered by the lack of rigor in these definitions and their application. This study adds rigor to the classification process by creating a checklist to evaluate publications across the translational spectrum and operationalizes these classifications by building machine learning-based text classifiers to categorize these publications.
Keywords: Knowledge translation; Machine learning; Text classification; Translational research
Year: 2016 PMID: 27492440 PMCID: PMC4974725 DOI: 10.1186/s12967-016-0992-8
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Full definitions for each of the phases along the translational research spectrum
| T0 |
| Includes preclinical and animal studies | |
| May or may not consider a particular disease process | |
| May include human subjects, but does | |
| Goal is to understand the human condition and environment as it exists | |
| Focuses on understanding biological, social and behavioral mechanisms that underlie health or disease | |
| Defining mechanisms, biomarkers, targets for therapeutic development; drug discovery (lead molecule screening, optimization, formulation); prototyping; physical assessments (radiology, laboratory, biopsy) | |
| Can include non-interventional, correlational epidemiologic studies using existing large data sets | |
| Studies mechanisms or derives modifications of cells, proteins, and DNA present in human disease processes | |
| Identifies functional significance and mechanisms of genomic polymorphisms identified by human genome-wide association studies | |
| T1 |
| Involves proof of concept studies | |
| Includes Phase 1 clinical trials | |
| Healthy subjects or select population of patients | |
| Small sample size | |
| Tests for safety | |
| Focuses on new methods of diagnosis, treatment, and prevention | |
| Takes place in highly controlled research settings | |
| T2 |
| Involves controlled clinical research studies which may lead to the basis for clinical application and evidence-based guidelines | |
| Yields knowledge about the efficacy of interventions in highly-controlled/protocol-driven settings | |
| Goal is to identify and analyze the optimal effects of an intervention on the human condition or environment | |
| Phase 2 clinical trials—focus on safety and efficacy (dose-response) | |
| Select population of patients | |
| Relatively large sample size | |
| Phase 3 clinical trials—focus on safety and efficacy | |
| Select population of patients | |
| Special groups of patients (e.g., renal failure) | |
| T3 |
| Includes comparative effectiveness research, pragmatic clinical trials, community-based participatory research, dissemination and implementation research, clinical outcomes research, and post-marketing analysis (Phase 4) | |
| Health services research, including reasons for gaps in care and delivery of recommended and timely care to the right patient | |
| Meta-analyses and systematic reviews involving interventions | |
| Development and implementation of evidence-based guidelines, policies, and best practices | |
| T4 |
| Includes population-level outcomes research: population monitoring of morbidity, mortality, benefits, and risks | |
| Focuses on wider dissemination/implementation of improved practices/interventions (taking to scale) | |
| Focuses on impacts of policy and/or environmental change | |
| Studies focusing on disease prevention through lifestyle and behavioral modifications | |
| Documents “real-world” health outcomes of population health practices associated with improved disease prevention and reduced medical costs | |
| Results in true benefit to society |
Checklist for each of five categories along the translational research spectrum
| 1. Does the research involve use of animals? | T0 |
| 2. Does the research involve study of mechanisms, relationships, or modification of proteins, DNA, or cells? | T0 |
| 3. Is the research a Genome Wide Association Study, determining association of a SNP with a particular disease state? | T0 |
| 4. Does the study examine drug interactions with molecular receptor or enzyme; effects on cell biochemistry, or how to optimize interaction? | T0 |
| 5. Does the publication describe the creation of a prototype for a new medical device? | T0 |
| 6. Does the research test a new methodology with potential for use in diagnosis, treatment, or prevention, providing a basis for follow-up study to directly test the methodology for safety, feasibility, preliminary results, etc.? NOTE: This may be done through use of existing EHR data, other health datasets, or through taking measurements from humans, such as physical assessments (radiology, laboratory, biopsy) or response to stimulus (but NOT response to intervention). | T0 |
| 7. Is a new association between biological, social, and/or behavioral states determined, including association between presence or progression of a disease state and a biomarker, social, or behavioral state? Note: This may be done through use of existing EHR data, other health datasets, or through taking measurements from humans, such as physical assessments or response to stimulus (but NOT response to intervention). | T0 |
| 8. Does the research explore a biological, social, or behavioral mechanism, including the mechanism underlying the presence or progression of a disease? NOTE: This may be done through use of existing EHR data, other health datasets, or through taking measurements from humans, such as physical assessments or response to stimulus (but NOT intervention). | T0 |
| 9. Is it a systematic review or meta-analysis of research that seeks to establish a correlation or elucidate a mechanism (i.e., review of T0 research), or to establish need for further work at the T0 level (e.g., further methodology development)? | T0 |
| 10. Is it a Phase I clinical trial? | T1 |
| 11. Does the study test the effect of a new intervention on healthy volunteers in a controlled clinical setting? | T1 |
| 12. Does the research suggest a new method of diagnosis (e.g., biomarker) or new intervention, determine feasibility or safety, or test it in a small group? Note: This should not be research that would lead directly to a practice guideline. | T1 |
| 13. Does research describe the use of a new device on a small population to determine potential usefulness and usability? | T1 |
| 14. Is it a Phase II or Phase III clinical trial? | T2 |
| 15. Does the study test efficacy and/or determine dosing levels of an intervention in a population with a given disease in a controlled clinical setting? | T2 |
| 16. Does the study determine the optimal use of a new medical device? | T2 |
| 17. Does the study determine the efficacy or optimal use of a new method of diagnosis or prevention, including through the use of existing datasets? | T2 |
| 18. Is it a Phase IV clinical trial? | T3 |
| 19. Does the research study effectiveness of an intervention or method of diagnosis in the clinic or community, either through the use of existing health datasets or new research? | T3 |
| 20. Does the research study inconsistency or variation in the application of a diagnosis or intervention? | T3 |
| 21. Does the research compare the effectiveness of existing health care interventions to determine which work best for which patients and which pose the greatest benefits and harms? | T3 |
| 22. Does the research involve interventions in the community with input from community members? | T3 |
| 23. Does the research determine mechanisms underlying effective health care delivery in practice or community settings? | T3 |
| 24. Does the research identify problems with effective health care delivery in practice or community settings? | T3 |
| 25. Does the research test an intervention to improve healthcare delivery in practice or community settings? | T3 |
| 26. Does the research study real world factors affecting interventions (cost, convenience, accessibility, patient preferences)? | T3 |
| 27. Does the research determine reasons why gaps in care exist? | T3 |
| 28. Is it a systematic review or meta-analysis of interventions, diagnoses, or something that could lead directly to practice guidelines, or a review article that suggests practice guidelines or workflow? | T3 |
| 29. Does the publication provide evidence-based guidelines/policies or best practices? | T3 |
| 30. Does the research use large datasets to monitor morbidity, mortality, benefits, or risks of interventions in populations? | T4 |
| 31. Does the research study the incidence or prevalence of a disease in a population? | T4 |
| 32. Does the research examine problems with, mechanisms underlying, potential interventions, impact, or real world outcomes (e.g., level of disease prevention, reduced medical costs) of population level practices/interventions? | T4 |
| 33. Does the research study the impact on health of a policy or environmental change that affects a population? | T4 |
| 34. Does the research focus on the development or outcomes of population level behavioral/lifestyle interventions? | T4 |
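The checklist above pairs each question number with one category, so it can be encoded as a small lookup. The following is a minimal sketch, not the authors' tooling: the names `CHECKLIST_RANGES` and `categories_for` are hypothetical, and returning `TX` when no question is answered "yes" is an assumption based on the description of TX as publications that fall into none of the T0 through T4 categories.

```python
# Question-number ranges copied from the checklist table above.
# The names below are illustrative, not from the source.
CHECKLIST_RANGES = {
    "T0": range(1, 10),   # questions 1-9
    "T1": range(10, 14),  # questions 10-13
    "T2": range(14, 18),  # questions 14-17
    "T3": range(18, 30),  # questions 18-29
    "T4": range(30, 35),  # questions 30-34
}


def categories_for(yes_answers):
    """Return the sorted categories triggered by the questions answered 'yes'.

    A publication can trigger more than one category. An empty result is
    reported as ["TX"], which is an assumption made for this sketch.
    """
    hits = set()
    for question in yes_answers:
        for category, questions in CHECKLIST_RANGES.items():
            if question in questions:
                hits.add(category)
                break
        else:
            raise ValueError(f"unknown checklist question: {question}")
    return sorted(hits) if hits else ["TX"]


if __name__ == "__main__":
    print(categories_for([2, 10]))  # a publication matching a T0 and a T1 question
    print(categories_for([]))       # no question answered 'yes'
```

Returning a list rather than a single label leaves room for publications that satisfy questions from more than one category.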
Concise definitions for each of the phases along the translational research spectrum
| T0 |
| Includes: preclinical and animal studies; GWAS studies; studies of cells, proteins, and DNA; studies on humans or existing datasets that focus on understanding biological, social and behavioral mechanisms that underlie health or disease | |
| T1 |
| Includes: proof of concept studies; Phase 1 clinical trials; studies testing feasibility or safety of a new method of diagnosis, treatment, or prevention | |
| T2 |
| Includes: Phase 2/3 clinical trials; studies to test efficacy of interventions in highly controlled settings | |
| T3 |
| Includes: Phase 4 clinical trials, comparative effectiveness research, community based participatory research, dissemination and implementation research, clinical outcomes research, health services research, meta-analyses/systematic reviews of interventions, development/implementation of guidelines | |
| T4 |
| Includes: population-level outcomes research; wider implementation and dissemination; policy impacts; disease prevention through lifestyle/behavior modifications; real-world health outcomes; true benefit to society |
Breakdown of training sets by categories into which articles were classified by coders
| | Training set 1 | Training set 2 | Combined training set |
|---|---|---|---|
| T0 | 106 | 56 | 162 |
| T1/T2 | 18 | 46 | 68 |
| T3/T4 | 44 | 50 | 94 |
| TX | 18 | 12 | 30 |
| Total included in training set | | | |
| Not included in training set | 15 | 22 | 33 |
| Total | | | |
T0 through T4 are the phases of research along the translational spectrum. TX denotes publications that the coders determined did not fall into any of the T0 through T4 categories. Uncoded denotes publications on which the coders could not agree on the correct category. Note that one article was determined to fall into both the T0 and T1 categories, resulting in a total of 387 codings for the 386 articles that were coded.
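As a quick arithmetic check, the combined-training-set column can be summed and compared with the 387 codings mentioned in the note. This sketch uses only the combined-column values listed in the table; the variable names are illustrative, and the subtotal it computes is derived here rather than quoted from the source.

```python
# Combined-training-set counts as listed in the table above.
combined = {"T0": 162, "T1/T2": 68, "T3/T4": 94, "TX": 30}
not_included = 33  # "Not included in training set", combined column

included_total = sum(combined.values())        # codings kept for training
total_codings = included_total + not_included  # all codings

print(included_total, total_codings)
```

The second number printed can be compared directly with the total number of codings given in the note.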
AUC and performance ranges for each classifier with different machine learning algorithms
| Classifier | Translational class | | |
|---|---|---|---|
| | T0 | T1/T2 | T3/T4 |
| Naïve Bayes | 0.91 (0.80–0.97) | 0.78 (0.45–0.97) | 0.87 (0.72–0.97) |
| Liblinear (linear support vector machine) | | | |
| Random forest | | | 0.87 (0.72–0.98) |
| Bayesian logistic regression | 0.92 (0.82 | | |
Best performing algorithm(s) for each classifier are italicized
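For readers less familiar with the metric in this table, AUC can be read as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counting as half. The sketch below computes it by direct pairwise comparison. The function name `pairwise_auc` and the toy scores are hypothetical and are not data from the study.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 marks a publication in the target class, 0 otherwise.
    toy_scores = [0.9, 0.8, 0.4, 0.3]
    toy_labels = [1, 0, 1, 0]
    print(pairwise_auc(toy_scores, toy_labels))  # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise form is quadratic in the number of examples, which is acceptable for a few hundred articles; a sort-based rank formulation would scale better.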
Results for coding by lead author of 50 articles randomly selected from each decile of T0 classifier scores (PubMed IDs and coding available in Additional file 5)
| Decile | T0 | notT0 | Threshold | FPR | TPR |
|---|---|---|---|---|---|
| 0.9–1.0 | 46 | 4 | 1 | 0 | 0 |
| 0.8–0.9 | 40 | 10 | 0.9 | 0.013986 | 0.214953 |
| 0.7–0.8 | 36 | 14 | 0.8 | 0.048951 | 0.401869 |
| 0.6–0.7 | 32 | 18 | 0.7 | 0.097902 | 0.570093 |
| 0.5–0.6 | 22 | 28 | 0.6 | 0.160839 | 0.719626 |
| 0.4–0.5 | 12 | 38 | 0.5 | 0.258741 | 0.82243 |
| 0.3–0.4 | 12 | 38 | 0.4 | 0.391608 | 0.878505 |
| 0.2–0.3 | 8 | 42 | 0.3 | 0.524476 | 0.934579 |
| 0.1–0.2 | 5 | 45 | 0.2 | 0.671329 | 0.971963 |
| 0–0.1 | 1 | 49 | 0.1 | 0.828671 | 0.995327 |
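The T0 and notT0 columns contain the number of sampled publications in each decile that were coded as T0 or not T0. The FPR and TPR columns contain the false positive rate and true positive rate calculated over all sampled publications with classifier scores above the threshold listed in that row.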
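The FPR and TPR columns can be reproduced from the per-decile counts by accumulating counts from the highest decile downward. The sketch below does this and also applies the trapezoidal rule to the resulting points. The function names are illustrative, and because the sample contains 50 articles per decile rather than a random draw from all scored publications, the resulting area describes this stratified sample only and is not a figure reported in the source.

```python
# Per-decile counts from the table above, ordered from 0.9-1.0 down to 0-0.1.
T0_COUNTS = [46, 40, 36, 32, 22, 12, 12, 8, 5, 1]
NOT_T0_COUNTS = [4, 10, 14, 18, 28, 38, 38, 42, 45, 49]


def roc_points(pos_counts, neg_counts):
    """Return (FPR, TPR) pairs, starting at (0, 0) for threshold 1.0.

    Each later point adds one more decile to the set classified as T0.
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for pos, neg in zip(pos_counts, neg_counts):
        true_pos += pos
        false_pos += neg
        points.append((false_pos / total_neg, true_pos / total_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points(T0_COUNTS, NOT_T0_COUNTS)
    for threshold, (fpr, tpr) in zip([1.0, 0.9, 0.8, 0.7, 0.6, 0.5,
                                      0.4, 0.3, 0.2, 0.1, 0.0], pts):
        print(f"{threshold:.1f}  FPR={fpr:.6f}  TPR={tpr:.6f}")
    print("area under sampled ROC curve:", round(trapezoid_area(pts), 3))
```

The final point at threshold 0.0, where every publication is classified as T0, does not appear in the table and is added here so that the curve ends at (1, 1).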