Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts.

Yanshan Wang, Majid Rastegar-Mojarad, Ravikumar Komandur-Elayavilli, Hongfang Liu.

Abstract

The recent movement towards open data in the biomedical domain has generated a large number of publicly accessible datasets. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal that aims to facilitate their reuse and accelerate scientific advances. However, as the number of stored and indexed biomedical datasets increases, it becomes increasingly challenging to retrieve the datasets relevant to researchers' queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured texts of each dataset, including its title and description, and combines a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings and a re-ranking strategy to enhance retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms the other systems in terms of inferred Average Precision and inferred normalized Discounted Cumulative Gain, implying that the proposed system is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata.
© The Author(s) 2017. Published by Oxford University Press.
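
The abstract names query expansion with word embeddings as one component of the system but does not describe how it is done. The following is only a minimal sketch of the general technique, not the authors' implementation: each query term is expanded with its nearest neighbours in embedding space, ranked by cosine similarity. The toy vectors, the function names (`cosine`, `expand_query`) and the parameters `k` and `threshold` are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional vectors invented for illustration (an assumption);
# a real system would load embeddings trained on a large text corpus.
TOY_EMBEDDINGS = {
    "cancer": [0.9, 0.1, 0.0],
    "tumor": [0.85, 0.15, 0.05],
    "carcinoma": [0.8, 0.2, 0.1],
    "gene": [0.1, 0.9, 0.0],
    "protein": [0.15, 0.8, 0.2],
    "mouse": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, threshold=0.9):
    """Append up to k nearest neighbours of each term to the query.

    A neighbour is added only if its cosine similarity to the term is at
    least `threshold`. Terms missing from the vocabulary are kept as-is.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        # Score every other vocabulary word against the query term.
        neighbours = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word != term
            ),
            reverse=True,
        )
        for score, word in neighbours[:k]:
            if score >= threshold and word not in expanded:
                expanded.append(word)
    return expanded
```

With these toy vectors, `expand_query(["cancer"], TOY_EMBEDDINGS)` adds "tumor" and "carcinoma" to the query, while an out-of-vocabulary term is returned unchanged. The similarity threshold is one simple way to keep loosely related words from diluting the query; the paper itself may use a different criterion.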

Year:  2017        PMID: 31725862      PMCID: PMC7243926          DOI: 10.1093/database/bax091

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


