Sudha Cheerkoot-Jalim, Kavi Kumar Khedo.
Abstract
PURPOSE: Literature-Based Discovery (LBD) is a text mining technique that generates novel hypotheses from large bodies of literature by identifying links between concepts in disparate sources. It has been applied predominantly in the healthcare domain, where promising results, in the form of novel hypotheses, have been reported. The purpose of this work was to conduct a systematic literature review of recent publications on LBD in the healthcare domain, in order to assess trends in the approaches used and to identify issues and challenges for such systems.
Keywords: Evidence-based healthcare; Knowledge translation; Literature-based discovery; Systematic review
Year: 2021 PMID: 34722102 PMCID: PMC8542914 DOI: 10.1007/s12553-021-00605-y
Source DB: PubMed Journal: Health Technol (Berl) ISSN: 2190-7196
Quality Assessment Criteria
| No | Quality Criteria | Outcome |
|---|---|---|
| QC1 | Has the LBD approach used been described in detail? | Yes: The LBD approach used has been described in detail Partially: The LBD approach used has been briefly described No: The LBD approach used has not been described |
| QC2 | Was there a discovery following the research work? | Yes: There was a discovery No: No discovery was made |
| QC3 | Did the study include a concise evaluation strategy? | Yes: A concise evaluation was done Partially: The evaluation was not intensive No: No evaluation was done |
| QC4 | Does the study give insights on research challenges and future directions? | Yes: The study gives insights on research challenges and future directions No: The study does not give insights on research challenges and future directions |
Fig. 1 PRISMA Flow Diagram for the Study Selection Process
Data Synthesis
| Study | Area of application / Discovery | Literature Source/s | Type of Discovery | Techniques Used | Tools Used | Performance Issues | Challenges |
|---|---|---|---|---|---|---|---|
| Rastegar-Mojarad et al. [ | Drug repositioning – generation of potential drug-disease pairs | Medline Abstracts | Open | ABC Model of LBD, Semantic Predications, NLP | SemRep | Actual performance of the system could not be accurately benchmarked, Inherent limitation to evaluate confidence levels of generated pairs | Ranking of the generated candidates, Rigorous validation is required before proceeding to laboratory or clinical investigations |
| Yang et al. [ | Identification of new candidates for repurposing as anticancer drugs | Medline Database | Open | ABC model of LBD, relationship extraction, text mining-based ranking method | Apache Lucene search engine | Higher precision can be achieved by the use of a comprehensive lexicon, Negative relationships not considered | Consideration of aliases, Normalization of gene and disease targets |
| Raja et al. [ | Repurposing drugs for four diseases | PubMed Abstracts | Open | ABC model of LBD, Co-occurrence | KinderMiner | Evaluation was done by comparing the prediction score of annotated drugs and new drugs | Differentiation between positive and negative associations |
| Rastegar-Mojarad et al. [ | Discovering drug-disease relations (drug repositioning and adverse drug reactions) | PubMed | Open | Classification of drug-disease relations into desired classes for ranking hypotheses | SemRep | Evaluation was done for balanced data sets only | Train and evaluate classifier using unbalanced data sets, Identification and removal of false positive candidates |
| Zhao et al. [ | Discovery of potential drugs for diseases | PubMed | Open | Convolutional Neural Network, Logistic Regression, Attention mechanism, Path ranking algorithm | SemRep | Weak generalization, due to small data set used to train the model, A larger knowledge base may affect efficiency | Use larger data set by combining other drug-disease databases, Improve the NLP technology to be able to cope with a larger knowledge base |
| Sang et al. [ | Discovery of candidate drugs for diseases | PubMed abstracts | Open | Logistic regression model | MetaMap, SemRep | Accuracy of MetaMap reduces because of inability to resolve word sense disambiguation, Considerable number of false predications | Development of high-quality NLP tools, Graph embedding to obtain long paths |
| Sosa et al. [ | Drug repurposing for rare diseases | GNBR (Global Network of Biological Relationships) (PubMed abstracts) | Open | Knowledge graph embedding | N/A | Performance decreases, since only ‘treatment’ relationships are chosen, while potential other relationships are discarded | Failure to capture complex and indirect relationships. |
| Zhang et al. [ | Drug repurposing for Covid-19 | PubMed, CORD-19 | Open | Knowledge Graph Completion, Translational and semantic matching models | SemRep, MetaMap | Accuracy was affected due to loss of information by the use of sub-graphs and the precision and recall of SemRep | Inclusion of other types of biological data |
| Xie et al. [ | Identification of alternative herbs for drugs that cause side effects | PubMed, Chinese Science Database (CNKI) | Open | ABC model of LBD, Co-occurrence, Gene enrichment analysis | N/A | N/A | Inclusion of similarity of chemical compounds |
| Zhang et al. [ | Exploration of interactions between cancer drugs and dietary supplements | Medline | Closed | Concept mapping, Machine learning-based filtering | SemRep, MetaMap | Limited precision and recall, Knowledge source shortcomings and linguistic issues | Use of machine learning for the automatic filtering of semantic predications |
| Malec et al. [ | Pharmacovigilance – Detection of drug-ADE associations | Medline | Open | Feature selection, Co-occurrence based analyses | SemRep | Search for confounders was relatively shallow, Reference data sets not perfectly accurate | Consideration of comorbidities and co-medications, Analysis of FAERS data instead of only EHR data |
| Mower et al. [ | Prediction for unseen drug-side effect pairs | Medline citations | Open | Machine Learning, Composite feature vectors | SemRep, SIDER | Robust performance for the ESP-based model | Terminological mapping, Abstraction methodologies, Integration of observational data sources |
| Hristovski et al. [ | Provide pharmacological and pharmacogenomic explanations for reported ADEs | Medline | Closed | Semantic relation extraction | SemBT, SemRep | N/A | N/A |
| Meng et al. [ | Identification of possible rehabilitation therapies for stroke | PubMed | Open | ABC Model of LBD, Co-occurrence | E-Utilities | N/A | N/A |
| Pyysalo et al. [ | Discoveries on the molecular biology of cancer | PubMed | Open and Closed | Machine Learning, Natural Language Processing, Co-occurrence based metrics | PubTator, Hallmarks of Cancer Classifier | System recognizes a single target response for each case and manual analysis is required | Inclusion of full texts |
| Kostoff and Patel [ | Identification of foundational causes of chronic kidney disease, Identification of treatments | Medline | Open | Co-occurrence | Vantage Point software | Uncertainty in the ‘degree’ of cause removal | The magnitude of the associations could not be determined, Identification of ‘mix’ of causes for an individual patient |
| Gubiani et al. [ | Discoveries about connections between diet and degenerative diseases | PubMed | Open | Ontologies, RaJoLink | OntoGen | N/A | Validation of robust in-silico tools |
| Gubiani et al. [ | Identify molecular links between Alzheimer’s disease and gut microbiota | PubMed | Closed | Outlier detection, Cross-domain exploration | OntoGen, CrossBee | Manual review by experts is required | Development of a tool to provide recommendations for hypothesis generation, Semi-automated generation of ontologies, Use of term extraction |
| Kostoff et al. [ | Identification of possible treatments for Inflammatory Bowel Disease | Medline | Open | Query formulation using biomarkers and their desired treatment-derived directions of change | LRDI (Literature Related Discovery and Innovation) | N/A | N/A |
| Chen et al. [ | Detection of associations among complex diseases | PubMed abstracts | Open | Latent Semantic Analysis, Spectral clustering algorithm | SemMedDB (SemRep) | Performance could be improved by the use of deep learning | Use of large amount of training/testing data |
| Rindflesch et al. [ | A plausible explanation for the correlation between epilepsy and inflammatory bowel disease | Medline titles and abstracts | Closed | Discovery Browsing | SemRep | SemRep is not accurate and has low values for precision and recall | Requirement to not only rely on semantic predications and manual inspection of citations |
| Dai et al. [ | Identify candidate genes for the interaction between myocardial infarction and depression | Medline | Closed | ABC model of LBD | BITOLA | N/A | N/A |
| Rather et al. [ | Discovery of potential new biomedical knowledge (relationships) | PubMed Abstracts, Clinical Trial protocols, NIHR grants summary | Open | Deep learning | Word2vec | N/A | Using a larger text corpus to find more meaningful and strong patterns, Exploratory analysis methods to discover hidden patterns |
N/A means Not Available
Main application areas for LBD in healthcare
| Application Area | Number of studies |
|---|---|
| Drug repurposing | 8 |
| Pharmacovigilance and drug interactions | 5 |
| Identification of potential causes, therapies or treatments for specific diseases | 6 |
| Explanation for the correlation between diseases | 3 |
| Discovery of new biomedical knowledge (relationships) | 1 |
Biomedical concepts A, B and C considered in the ABC model of LBD
| Study | Concept A | Concept B | Concept C | Type of Discovery | Discovery |
|---|---|---|---|---|---|
| Meng et al. [ | Stroke | Assessment Scales | Rehabilitation Therapy | Open | Hand-arm bimanual intensive training (HABIT) was found to be a promising rehabilitation therapy for stroke |
| Rastegar-Mojarad et al. [ | Drug | Gene | Disease | Open | Potential novel drug-disease pairs |
| Rindflesch et al. [ | Inflammatory Bowel Disease (IBD) | Interleukin-1 beta and glutamate | Epilepsy | Closed | Interleukin-1 beta influence on glutamate levels is involved in the etiology of both IBD and Epilepsy |
| Yang et al. [ | Disease | Gene | Drug | Open | Potential anticancer drugs |
| Xie et al. [ | Drug | Indication (depression) / Side Effect | Herb (Traditional Chinese Medicine) | Open | The herb Pogostemon Cablin Benth can be an alternative to the drug Nefazodone, since it can mitigate the side effects |
| Raja et al. [ | Disease | Phenotypes, symptoms | Drug | Open | Potential drugs identified for four diseases |
| Pyysalo et al. [ | Arsenic | Nrf2 Gene | Autotaxin Protein | Closed | The properties of the Nrf2 gene explained the connection between arsenic and the autotaxin protein |
| Zhang et al. [ | Cancer drug | Gene | Dietary supplement | Closed | Echinacea was found to be the first drug-supplement interaction candidate area of interest |
| Gubiani et al. [ | Alzheimer’s disease | Chemicals, mechanisms of action, cell components | Gut microbiota | Closed | Nitric Oxide Synthase was found to be a promising novel bridging term for the neuronal and immunity field |
| Rastegar-Mojarad et al. [ | Drug | Gene | Disease | Open | Potential novel drug-disease pairs |
| Sang et al. [ | Disease | Protein | Drug | Open | Potential novel disease-drug pairs |
| Dai et al. [ | Myocardial Infarction (MI) | Gene, gene product | Depressive disorder | Closed | Genes GNB3, CNR1, MTHFR and NCAM1 were found to be new putative candidate genes that may influence the interactions between MI and depression |
| Hristovski et al. [ | Drug | Gene, protein | Adverse effect | Closed | Explanation for the association between drug and adverse effect through linking genes or proteins |
| Zhang et al. [ | Drug | Any concept | Disease (Covid-19) | Open | A list of potential drugs for Covid-19 |
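
The open and closed discovery types listed in the table above can be illustrated with a minimal co-occurrence sketch of the ABC model. This is not the method of any reviewed study: the function names (`build_cooccurrence`, `open_discovery`, `closed_discovery`) are hypothetical, and the toy records are invented for illustration only, borrowing concept labels from the Rindflesch et al. row. Real systems would extract concepts from literature sources such as Medline and would rank or filter the candidates.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Map each concept to the set of concepts it shares a record with."""
    neighbours = defaultdict(set)
    for concepts in records:
        for x, y in combinations(sorted(set(concepts)), 2):
            neighbours[x].add(y)
            neighbours[y].add(x)
    return dict(neighbours)


def open_discovery(neighbours, a):
    """Open discovery: start from A, follow A-B and B-C links, and keep
    C concepts that never co-occur directly with A.
    Returns {C: set of bridging B concepts}."""
    direct = neighbours.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours.get(b, set()):
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


def closed_discovery(neighbours, a, c):
    """Closed discovery: A and C are both given; return the B concepts
    that co-occur with each of them."""
    return neighbours.get(a, set()) & neighbours.get(c, set())


# Invented toy records (NOT real literature data): each set stands for
# the concepts mentioned together in one hypothetical abstract.
records = [
    {"IBD", "interleukin-1 beta"},
    {"interleukin-1 beta", "epilepsy"},
    {"IBD", "glutamate"},
    {"glutamate", "epilepsy"},
    {"IBD", "colitis"},
]

neighbours = build_cooccurrence(records)
open_result = open_discovery(neighbours, "IBD")
closed_result = closed_discovery(neighbours, "IBD", "epilepsy")
print(open_result)
print(closed_result)
```

In the toy data, "epilepsy" never appears in the same record as "IBD", so open discovery surfaces it as a candidate C together with its bridging B concepts, while closed discovery returns the same bridging concepts when both endpoints are supplied. Plain co-occurrence is the simplest linking criterion; several of the reviewed studies use semantic predications or machine learning instead.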