Development and evaluation of a biomedical search engine using a predicate-based vector space model.

Myungjae Kwak, Gondy Leroy, Jesse D Martinez, Jeffrey Harwell.

Abstract

Although the biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query, using predicates instead of keywords for both query and document representation. Predicates are triples, which are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and a query-document similarity function with adjusted tf-idf and a boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and a keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search.
Copyright © 2013 Elsevier Inc. All rights reserved.
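
As a rough illustration of the idea in the abstract, the sketch below treats each predicate as a (subject, relation, object) triple and uses whole triples, rather than single keywords, as the dimensions of a vector space. It is a minimal sketch under stated assumptions, not the authors' model: it uses plain smoothed tf-idf and cosine similarity in place of the adjusted tf-idf and boost function, whose details the abstract does not give, and it matches triples only when they are identical. The function names and the toy triples are hypothetical.

```python
import math
from collections import Counter

# Assumption: a predicate is a hashable (subject, relation, object) tuple.
# Assumption: plain smoothed tf-idf; the paper's adjusted tf-idf and boost
# function are not described in the abstract and are not reproduced here.


def tf_idf_vectors(docs):
    """Build one sparse tf-idf vector (dict) per document of predicate triples."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each triple once per document
    idf = {p: math.log((1 + n_docs) / (1 + df)) + 1.0 for p, df in doc_freq.items()}
    vectors = [{p: tf * idf[p] for p, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query triples never seen in the collection carry no weight.
    q_vec = {p: tf * idf[p] for p, tf in Counter(query).items() if p in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy, made-up triples purely for illustration.
docs = [
    [("geneA", "inhibits", "processX"), ("geneA", "activates", "processY")],
    [("geneB", "repairs", "structureZ")],
    [("geneA", "inhibits", "processX"), ("geneB", "repairs", "structureZ"),
     ("drugC", "reduces", "processW")],
]
query = [("geneA", "inhibits", "processX")]
ranking = rank(query, docs)
```

Because the triple is the unit of matching, a document that mentions geneA and processX in some other relation would score zero for this query, which is the sense in which predicates carry more structure than keywords.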

Keywords:  Information retrieval; Predicate; Search engine; Triple; Vector space model

Year:  2013        PMID: 23892296     DOI: 10.1016/j.jbi.2013.07.006

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  5 in total

1.  A Semantic-based Approach for Exploring Consumer Health Questions Using UMLS.

Authors:  Licong Cui; Shiqiang Tao; Guo-Qiang Zhang
Journal:  AMIA Annu Symp Proc       Date:  2014-11-14

2.  Effects on Text Simplification: Evaluation of Splitting Up Noun Phrases. (Review)

Authors:  Gondy Leroy; David Kauchak; Alan Hogue
Journal:  J Health Commun       Date:  2016

3.  Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

Authors:  Yuanchao Liu; Ming Liu; Xin Wang
Journal:  PLoS One       Date:  2015-03-20       Impact factor: 3.240

4.  Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application.

Authors:  Gondy Leroy; Yang Gu; Sydney Pettygrove; Maureen K Galindo; Ananyaa Arora; Margaret Kurzius-Spencer
Journal:  J Med Internet Res       Date:  2018-11-07       Impact factor: 5.428

5.  Sieve-based relation extraction of gene regulatory networks from biological literature.

Authors:  Slavko Žitnik; Marinka Žitnik; Blaž Zupan; Marko Bajec
Journal:  BMC Bioinformatics       Date:  2015-10-30       Impact factor: 3.169
