Pepi Sfakianaki¹, Lefteris Koumakis², Stelios Sfakianakis¹, Galatia Iatraki¹, Giorgos Zacharioudakis¹, Norbert Graf³, Kostas Marias¹, Manolis Tsiknakis¹,⁴.
Abstract
BACKGROUND: A plethora of publicly available biomedical resources currently exist, and their number is growing rapidly. In parallel, specialized repositories that index numerous clinical and biomedical tools are being developed. The main drawback of such repositories is the difficulty of locating appropriate resources for a clinical or biomedical decision task, especially for users who are not Information Technology experts. Moreover, although Natural Language Processing (NLP) research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies, and to exploit NLP methods so that users who are not Information Technology experts can search efficiently for biomedical resources using natural language.
Year: 2015; PMID: 26423616; PMCID: PMC4591066; DOI: 10.1186/s12911-015-0200-4
Source DB: PubMed; Journal: BMC Med Inform Decis Mak; ISSN: 1472-6947; Impact factor: 2.796
An example of a resource and its description
| Name | Summary | Tags (principal bioinformatics methods) |
|---|---|---|
| GeneTalk | GeneTalk is a web-based platform that can filter, reduce and prioritize human sequence variants from NGS data and assist in the time-consuming and costly interpretation of personal variants in a clinical context. It serves as an expert exchange platform for clinicians and scientists who are searching for information about specific sequence variants, connecting them to share and exchange expertise on variants that are potentially disease-relevant. | Genetic variation annotation, Sequence variation analysis, Variant Calling, Structural variation discovery, Filtering, Annotation, Database, Exome analysis, Sequence analysis, Variant Classification, Viewer |

| Link | Input (format) | Output (format) | Category |
|---|---|---|---|
|  | VCF | VCF, XLS, XLSX | Sequence Analysis |
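A registered resource such as GeneTalk bundles a free-text summary, tags, input/output formats and a category. As a minimal sketch, assuming nothing about the framework's actual storage schema, such a record can be modelled as a Python dataclass; the field names and the `accepts`/`produces` helpers are hypothetical, and the summary and tag list below are abridged.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One registered tool; fields mirror the columns of the example table (names are hypothetical)."""
    name: str
    summary: str
    tags: list[str] = field(default_factory=list)
    link: str = ""
    input_formats: set[str] = field(default_factory=set)
    output_formats: set[str] = field(default_factory=set)
    category: str = ""

    def accepts(self, fmt: str) -> bool:
        """Case-insensitive check of whether the tool can consume a given input format."""
        return fmt.upper() in {f.upper() for f in self.input_formats}

    def produces(self, fmt: str) -> bool:
        """Case-insensitive check of whether the tool can emit a given output format."""
        return fmt.upper() in {f.upper() for f in self.output_formats}


genetalk = Resource(
    name="GeneTalk",
    summary="Web-based platform that filters, reduces and prioritizes human sequence variants from NGS data.",
    tags=["Genetic variation annotation", "Sequence variation analysis", "Variant Calling"],  # subset
    input_formats={"VCF"},
    output_formats={"VCF", "XLS", "XLSX"},
    category="Sequence Analysis",
)

print(genetalk.accepts("vcf"))    # True
print(genetalk.produces("XLSX"))  # True
print(genetalk.accepts("BAM"))    # False
```

Keeping formats as sets makes later checks, such as whether one tool's output can feed another's input, simple set intersections.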
Fig. 1 The architecture of the framework: 1) tool registration, 2) tool annotation, 3) the user’s question in natural language and its NLP processing, 4) formation and submission of the query, and 5) retrieval of the results (related tools)
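The five steps of Fig. 1 can be sketched as a toy registry. This is a minimal illustration under loud assumptions: annotation is reduced to a naive substring lookup against a small hypothetical vocabulary (`KNOWN_TERMS`), whereas the framework annotates with domain ontologies, and retrieval simply ranks tools by the number of terms they share with the question. `Registry`, `annotate` and the tool names are all hypothetical.

```python
# Hypothetical vocabulary standing in for ontology-based concept annotation.
KNOWN_TERMS = {"lung cancer", "carboplatin", "adverse effects", "vcf"}


def annotate(text: str) -> set[str]:
    """Steps 2 and 3: reduce free text to the known terms it mentions (naive substring lookup)."""
    lowered = text.lower()
    return {term for term in KNOWN_TERMS if term in lowered}


class Registry:
    """Toy stand-in for the tool repository of Fig. 1."""

    def __init__(self) -> None:
        self.tools: dict[str, set[str]] = {}

    def register(self, name: str, description: str) -> None:
        """Steps 1 and 2: store a tool together with the terms found in its description."""
        self.tools[name] = annotate(description)

    def search(self, question: str) -> list[tuple[str, int]]:
        """Steps 3 to 5: annotate the question, use its terms as the query, rank tools by overlap."""
        query = annotate(question)
        hits = [(name, len(terms & query)) for name, terms in self.tools.items()]
        return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


registry = Registry()
registry.register("Tool A", "Survival statistics for lung cancer patients treated with carboplatin")
registry.register("Tool B", "Mines the literature for drug adverse effects")
registry.register("Tool C", "Converts VCF files to spreadsheets")
print(registry.search("Which drugs like carboplatin cause adverse effects in lung cancer?"))
# [('Tool A', 2), ('Tool B', 1)]
```

The point of the sketch is the separation of concerns: tools are annotated once at registration, so answering a question only requires annotating the question and comparing annotations.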
The 27 semantic (prime) categories: 23 from UMLS semantic types and 4 from EDAM categories
| Source | Prime categories |
|---|---|
| UMLS semantic types | 1. Disease, 2. Drug, 3. Medical Procedure, 4. Tissue, 5. Biomedical, 6. Cell, 7. Organism Function, 8. Finding, 9. Body Part, 10. Gene, 11. Clinical Attribute, 12. Patient, 13. Diagnosis, 14. Age, 15. Molecular Sequence, 16. Device, 17. Symptom, 18. Virus, 19. Injury or Poisoning, 20. Vitamin, 21. Laboratory, 22. Food, 23. Temporal Concept |
| EDAM | 24. EDAM Data/Format, 25. EDAM Topic, 26. EDAM Operation, 27. EDAM Identifier |
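The numbered category list can be held in a small lookup table that also records which vocabulary each category comes from. The sketch below is illustrative only; the variable names are hypothetical and the dictionary layout is not taken from the framework.

```python
UMLS_DERIVED = [
    "Disease", "Drug", "Medical Procedure", "Tissue", "Biomedical", "Cell",
    "Organism Function", "Finding", "Body Part", "Gene", "Clinical Attribute",
    "Patient", "Diagnosis", "Age", "Molecular Sequence", "Device", "Symptom",
    "Virus", "Injury or Poisoning", "Vitamin", "Laboratory", "Food", "Temporal Concept",
]
EDAM_DERIVED = ["EDAM Data/Format", "EDAM Topic", "EDAM Operation", "EDAM Identifier"]

# Number each category 1..27 in table order and remember its source vocabulary.
CATEGORIES = {
    number: (name, "UMLS" if number <= len(UMLS_DERIVED) else "EDAM")
    for number, name in enumerate(UMLS_DERIVED + EDAM_DERIVED, start=1)
}

print(len(CATEGORIES))  # 27
print(CATEGORIES[15])   # ('Molecular Sequence', 'UMLS')
print(CATEGORIES[26])   # ('EDAM Operation', 'EDAM')
```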
Patterns generated by combining the prime categories
| # | Pattern |
|---|---|
| 1 | Drug for Disease |
| 2 | EDAM Data/Format and EDAM Operation |
| 3 | Patient took Drug for Disease |
| 4 | Finding with Organism Function |
| 5 | Drug for Disease in Body Part |
| 6 | Finding in/with Medical Procedure |
| 7 | Drug for Symptom |
| 8 | EDAM Data in Body Part |
| 9 | Patient has Disease in Body Part |
| 10 | Patient took Drug for Disease in Body Part |
| 11 | Patient took Drug for Body Part |
| 12 | Patient has been in Medical Procedure |
| 13 | Patient has Disease |
| 14 | Patient has Organism Function |
| 15 | Disease in Body Part |
| 16 | Drug for EDAM Data in Body Part |
| 17 | Patient’s Finding |
| 18 | Drug & Drug for Disease |
| 19 | Patient took Vitamin |
| 20 | Symptom of Medical Procedure |
| 21 | Patient has Symptom |
| 22 | Drug for EDAM Data |
| 23 | Patient ate Food |
| 24 | EDAM Data/Format and EDAM Operation and EDAM Data/Format |
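Each pattern combines prime categories with connecting words. The source does not specify how a question is matched to a pattern; one simple rule, assumed here purely for illustration, treats a pattern as a multiset of categories and accepts it when every required category (with multiplicity, as in "Drug & Drug for Disease") was recognized in the question, preferring the most specific pattern. `PATTERNS` holds only a few rows of the table, and `matching_patterns` is a hypothetical helper.

```python
from collections import Counter

# A few patterns from the table, reduced to the categories they combine (connecting words dropped).
PATTERNS = {
    "Drug for Disease": ["Drug", "Disease"],
    "Patient has Disease": ["Patient", "Disease"],
    "Patient took Drug for Disease": ["Patient", "Drug", "Disease"],
    "Patient has Disease in Body Part": ["Patient", "Disease", "Body Part"],
    "Drug & Drug for Disease": ["Drug", "Drug", "Disease"],
}


def matching_patterns(categories: list[str]) -> list[str]:
    """Patterns whose categories all occur among the recognized ones, most specific first."""
    found = Counter(categories)
    # Counter subtraction keeps only positive counts, so an empty result means every
    # category the pattern needs was recognized at least as often as required.
    hits = [name for name, needed in PATTERNS.items() if not Counter(needed) - found]
    return sorted(hits, key=lambda name: (-len(PATTERNS[name]), name))


# Assumed categories for "John has lung cancer and has been treated with carboplatin".
print(matching_patterns(["Patient", "Disease", "Drug"]))
# ['Patient took Drug for Disease', 'Drug for Disease', 'Patient has Disease']
```

Matching on category counts rather than word order tolerates phrasings such as "has lung cancer ... treated with carboplatin", where the disease is mentioned before the drug.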
Fig. 2 Annotation example from the Concept Recognizer: the annotation produced for the input sentence “John has lung cancer and has been treated with carboplatin which is known for toxicology adverse effects”
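Fig. 2 shows spans of a sentence being tagged with semantic categories. The code below is not the Concept Recognizer; it is a minimal dictionary-based stand-in that greedily matches the longest phrase from a small hypothetical lexicon at each token position. The lexicon entries and their categories are assumptions and need not match the annotations in Fig. 2. Real recognizers resolve terms against ontologies such as UMLS and handle inflections and abbreviations, which this sketch ignores.

```python
import re

# Hypothetical lexicon: lower-cased token tuple -> prime category.
LEXICON = {
    ("john",): "Patient",
    ("lung", "cancer"): "Disease",
    ("cancer",): "Disease",
    ("lung",): "Body Part",
    ("carboplatin",): "Drug",
    ("adverse", "effects"): "Finding",
}
MAX_LEN = max(len(key) for key in LEXICON)


def recognize(sentence: str) -> list[tuple[str, str]]:
    """Greedy longest-match annotation of a sentence against LEXICON."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    annotations, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                annotations.append((" ".join(phrase), LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token; move on
    return annotations


sentence = ("John has lung cancer and has been treated with carboplatin "
            "which is known for toxicology adverse effects")
print(recognize(sentence))
# [('john', 'Patient'), ('lung cancer', 'Disease'), ('carboplatin', 'Drug'), ('adverse effects', 'Finding')]
```

Longest-match first is what makes "lung cancer" come out as one Disease mention instead of a Body Part followed by a separate Disease.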
Results for the first clinical question. The results returned by the framework for the first clinical question. Individual tools that could solve the entire clinical question are listed at the top, followed by tools that could be combined, i.e. pipelined, to answer it.
Unique Tools List

| Score | Tool name | Identified (query) |
|---|---|---|
| 4.75 = 3 (in) + 1 (out) + 0.75 (tag) | National Cancer Institute SEER API | carboplatin & cancer (in); cancer (in); lung cancer (in); drug (out) |
| 4 = 3 (in) + 1 (out) | cBio Cancer Genomics Data Server (CGDS) API | carboplatin & cancer (in); cancer (in); lung cancer (in); find (out) |
| 4 = 1 (in) + 3 (out) | EUADR - Literature analysis | adverse effects (in); drug-references (out); drug (out); literature (out) |

Pipeline Tools List

| First tool | Second tool |
|---|---|
| National Cancer Institute caDSR API | AIDSinfo API |
| China Cancer Database API | AIDSinfo API |

Single Tools List

| Score | Tool name |
|---|---|
| 3.75 = 3 (in) + 3*0.25 (tag) | The Cancer Genome Atlas API |
| 3.75 = 3 (in) + 3*0.25 (tag) | China Cancer Database API |
| 3 (in) | National Cancer Institute caDSR API |
| 3 (in) | MuTect |
| 2.25 = 1 (out) + 5*0.25 (tag) | Lexicomp API |
| 2 (out) | Arabidopsis thaliana Microarray Analysis |
| 2 (out) | Pathways and Gene annotations for QTL region |
| 2 (out) | SciBite API |
| 2 (out) | DGIdb API |
| 2 = 4*0.25 (tag) | DailyMed API |
| 2 = 4*0.25 (tag) | Aetna CarePass API |
| 2 = 4*0.25 (tag) | National Institute on Drug Abuse Drug Screening Tool API |
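The score decompositions above suggest a weighted sum: 1 point per query term matched on a tool's input side, 1 per term matched on its output side, and 0.25 per matched tag (for example 3.75 = 3 (in) + 3*0.25 (tag)). The sketch below reproduces several rows under that reading. The grouping rule, that tools matching both the input and output sides go to the "unique" list and the rest to the "single" list, is an assumption inferred from the tables, and `Match` and `group` are hypothetical names.

```python
from dataclasses import dataclass

# Weights inferred from the score decompositions, e.g. 4.75 = 3 (in) + 1 (out) + 0.75 (tag).
W_IN, W_OUT, W_TAG = 1.0, 1.0, 0.25


@dataclass
class Match:
    tool: str
    in_hits: int   # query terms matched on the tool's input side
    out_hits: int  # query terms matched on the tool's output side
    tag_hits: int  # query terms matched among the tool's tags

    @property
    def score(self) -> float:
        return W_IN * self.in_hits + W_OUT * self.out_hits + W_TAG * self.tag_hits


def group(matches: list[Match]) -> tuple[list[Match], list[Match]]:
    """Assumed rule: tools matching both sides of the question are "unique", the others "single"."""
    unique = [m for m in matches if m.in_hits and m.out_hits]
    single = [m for m in matches if not (m.in_hits and m.out_hits) and m.score > 0]
    return (sorted(unique, key=lambda m: -m.score),
            sorted(single, key=lambda m: -m.score))


# Hit counts read from the score column for four tools of the first question.
matches = [
    Match("National Cancer Institute SEER API", in_hits=3, out_hits=1, tag_hits=3),
    Match("cBio Cancer Genomics Data Server (CGDS) API", in_hits=3, out_hits=1, tag_hits=0),
    Match("The Cancer Genome Atlas API", in_hits=3, out_hits=0, tag_hits=3),
    Match("Lexicomp API", in_hits=0, out_hits=1, tag_hits=5),
]
unique, single = group(matches)
print([(m.tool, m.score) for m in unique])
# [('National Cancer Institute SEER API', 4.75), ('cBio Cancer Genomics Data Server (CGDS) API', 4.0)]
print([(m.tool, m.score) for m in single])
# [('The Cancer Genome Atlas API', 3.75), ('Lexicomp API', 2.25)]
```

The low tag weight keeps tag matches as tie-breakers: a tool whose description matches the question's inputs and outputs outranks one that only shares tags with it.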
Precision and recall for the first clinical question: automated resource discovery compared with the tools that domain experts identified manually from the tool descriptions
| Search method | Tools identified | Precision (%) | Recall (%) | Best rank of a tool that solves the whole question (no pipelines) |
|---|---|---|---|---|
| Free Text | 164 | 40 | 73 | 3 out of 164 |
| NLP Framework | 11 | 100 | 14 | 1st |
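Precision and recall here follow their standard definitions: precision is the share of retrieved tools that the experts judged relevant, and recall is the share of expert-identified tools that were retrieved. A minimal sketch with hypothetical tool identifiers:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) in percent for a single query."""
    true_positives = len(retrieved & relevant)
    precision = 100 * true_positives / len(retrieved) if retrieved else 0.0
    recall = 100 * true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 tools retrieved, 3 of them among the 6 tools the experts selected.
retrieved = {"t1", "t2", "t3", "t9"}
relevant = {"t1", "t2", "t3", "t4", "t5", "t6"}
print(precision_recall(retrieved, relevant))  # (75.0, 50.0)
```

The trade-off in the table follows from these definitions: the NLP framework returns a short list (11 tools) in which every tool is relevant, whereas free-text search returns a much longer list (164 tools) that covers more of the relevant set at lower precision.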
Results for the second clinical question. The results returned by the framework for the second clinical question. Individual tools that could solve the entire clinical question are listed at the top, followed by tools that could be combined, i.e. pipelined, to answer it.
Unique Tools List

| Score | Tool name | Identified (query) |
|---|---|---|
| 3 = 1 (in) + 2 (out) | miRNApath | mirna (in); gene expression (out); kegg pathways (out) |
| 3 = 1 (in) + 2 (out) | mirPath | mirna (in); gene expression (out); kegg pathways (out) |
| 3 = 1 (in) + 2 (out) | mirtarbase | mirna (in); gene expression (out); kegg pathways (out) |

Pipeline Tools List

No results found in this category for the given question.

Single Tools List

| Score | Tool name |
|---|---|
| 4 (out) | Get Pathway-Genes and gene description by Entrez gene id |
| 4 (out) | Arabidopsis thaliana Microarray Analysis |
| 4 (out) | MinePath |
| 4 (out) | EnrichNet API |
| 4 (out) | NCBI Gi to Kegg Pathway Descriptions |
| 4 (out) | MitoMiner API |
| 4 (out) | BiologicalNetworks API |
| 4 (out) | From cDNA Microarray Raw Data to Pathways and Published Abstracts |
| 4 (out) | HUMAN Microarray CEL file to candidate pathways |
| 4 (out) | ERGO Genome Analysis and Discovery System |
| 4 (out) | BioCyc API |
| 4 (out) | Mouse Microarray Analysis |
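The pipeline lists pair a first tool with a second tool. The source does not spell out the chaining rule; one plausible rule, assumed here, pairs a tool matching the input side of the question with a different tool matching its output side whenever some output format of the first is an accepted input format of the second. The tools and formats below are hypothetical; an empty result corresponds to a "no results found" outcome like the one reported for this question.

```python
from itertools import product

# Hypothetical tools: name -> (accepted input formats, produced output formats).
TOOLS = {
    "Tool A": ({"VCF"}, {"TSV"}),
    "Tool B": ({"TSV"}, {"PDF"}),
    "Tool C": ({"FASTA"}, {"PDF"}),
}


def pipelines(first_candidates: list[str], second_candidates: list[str]) -> list[tuple[str, str]]:
    """Pair distinct tools where some output format of the first is an input format of the second."""
    return [
        (first, second)
        for first, second in product(first_candidates, second_candidates)
        if first != second and TOOLS[first][1] & TOOLS[second][0]
    ]


# First candidates match the input side of a question, second candidates its output side.
print(pipelines(["Tool A", "Tool C"], ["Tool B", "Tool C"]))  # [('Tool A', 'Tool B')]
print(pipelines(["Tool C"], ["Tool B"]))                      # []
```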
Precision and recall for the second clinical question: automated resource discovery compared with the tools that domain experts identified manually from the tool descriptions
| Search method | Tools identified | Precision (%) | Recall (%) | Best rank of a tool that solves the whole question (no pipelines) |
|---|---|---|---|---|
| Free Text | 231 | 25 | 59 | 2 out of 231 |
| NLP Framework | 17 | 100 | 17 | 1st & 2nd |
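The last column reports where a tool able to answer the whole question on its own (without a pipeline) first appears in the ranked result list. Given a ranking and the set of such tools, the best rank is the 1-based position of the first hit; a short sketch with hypothetical tool identifiers:

```python
from typing import Optional


def best_rank(ranked: list[str], complete_solvers: set[str]) -> Optional[int]:
    """1-based position of the first tool that answers the whole question, or None if none is listed."""
    for position, tool in enumerate(ranked, start=1):
        if tool in complete_solvers:
            return position
    return None


ranking = ["t7", "t2", "t5", "t1"]       # hypothetical ranked result list
print(best_rank(ranking, {"t5", "t1"}))  # 3
print(best_rank(ranking, {"t9"}))        # None
```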