Identifying well-formed biomedical phrases in MEDLINE® text.

Won Kim, Lana Yeganova, Donald C Comeau, W John Wilbur.

Abstract

In the modern world, people frequently interact with retrieval systems to satisfy their information needs. Humanly understandable, well-formed phrases represent a crucial interface between humans and the web, and the ability to index and search with such phrases is beneficial for human-web interactions. In this paper we consider the problem of identifying humanly understandable, well-formed, high-quality biomedical phrases in MEDLINE documents. The main approaches previously used for detecting such phrases are syntactic, statistical, and a hybrid combining the two. We propose a supervised learning approach for identifying high-quality phrases. First, we obtain a set of known well-formed, useful phrases from an existing source and label these phrases as positive. We then extract from MEDLINE a large set of multiword strings that do not contain stop words or punctuation. We believe this unlabeled set contains many well-formed phrases, and our goal is to identify these additional high-quality phrases. We examine various feature combinations and several machine learning strategies designed to solve this problem. A proper choice of machine learning methods and features identifies, within the large collection, strings that are likely to be high-quality phrases. We evaluate our approach by making human judgments on multiword strings extracted from MEDLINE using our methods. We find that over 85% of such extracted phrase candidates are judged by humans to be of high quality. Published by Elsevier Inc.
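
The candidate-extraction step mentioned in the abstract (multiword strings that contain no stop words or punctuation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tiny stop-word list, the `max_len` limit, and the function name `candidate_strings` are all hypothetical, and the abstract does not specify the tokenization used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch; the list actually used by the authors is not given in the abstract).
STOP_WORDS = {"the", "of", "in", "and", "a", "an", "to", "with", "for",
              "is", "are", "by", "on"}


def candidate_strings(text, max_len=3):
    """Count multiword strings (2..max_len tokens) that contain no stop
    words and do not cross a punctuation mark."""
    counts = Counter()
    # Split on punctuation (hyphens are kept inside tokens) so that no
    # candidate spans a punctuation mark.
    for segment in re.split(r"[^\w\s-]+", text.lower()):
        run = []  # current run of consecutive non-stop-word tokens
        # The trailing None is a sentinel that flushes the final run.
        for token in segment.split() + [None]:
            if token is not None and token not in STOP_WORDS:
                run.append(token)
                continue
            # A stop word or the end of the segment closes the run:
            # emit every contiguous n-gram of length 2..max_len from it.
            for n in range(2, max_len + 1):
                for i in range(len(run) - n + 1):
                    counts[" ".join(run[i:i + n])] += 1
            run = []
    return counts


if __name__ == "__main__":
    sample = "Expression of the tumor necrosis factor gene, in breast cancer cells."
    for phrase, freq in sorted(candidate_strings(sample).items()):
        print(freq, phrase)
```

In the setting the abstract describes, strings produced this way would form the unlabeled pool, with known well-formed phrases from an existing source serving as positive examples for a classifier; the features and learning methods are not detailed in the abstract, so they are not sketched here.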

Year:  2012        PMID: 22683889      PMCID: PMC3465642          DOI: 10.1016/j.jbi.2012.05.005

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  6 in total

1.  Extracting noun phrases for all of MEDLINE.

Authors:  N A Bennett; Q He; K Powell; B R Schatz
Journal:  Proc AMIA Symp       Date:  1999

2.  Corpus-based statistical screening for phrase identification.

Authors:  W Kim; W J Wilbur
Journal:  J Am Med Inform Assoc       Date:  2000 Sep-Oct       Impact factor: 4.497

3.  MedPost: a part-of-speech tagger for bioMedical text.

Authors:  L Smith; T Rindflesch; W J Wilbur
Journal:  Bioinformatics       Date:  2004-04-08       Impact factor: 6.937

4.  How to Interpret PubMed Queries and Why It Matters.

Authors:  Lana Yeganova; Donald C Comeau; Won Kim; W John Wilbur
Journal:  J Am Soc Inf Sci Technol       Date:  2008-11-06

5.  The Ineffectiveness of Within-Document Term Frequency in Text Classification.

Authors:  W John Wilbur; Won Kim
Journal:  Inf Retr Boston       Date:  2009-10-01       Impact factor: 2.293

6.  Abbreviation definition identification based on automatic precision estimates.

Authors:  Sunghwan Sohn; Donald C Comeau; Won Kim; W John Wilbur
Journal:  BMC Bioinformatics       Date:  2008-09-25       Impact factor: 3.169

  3 in total

1.  Retro: concept-based clustering of biomedical topical sets.

Authors:  Lana Yeganova; Won Kim; Sun Kim; W John Wilbur
Journal:  Bioinformatics       Date:  2014-07-29       Impact factor: 6.937

2.  MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.

Authors:  Yuqing Mao; Zhiyong Lu
Journal:  J Biomed Semantics       Date:  2017-04-17

3.  PubMed Phrases, an open set of coherent phrases for searching biomedical literature.

Authors:  Sun Kim; Lana Yeganova; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  Sci Data       Date:  2018-06-12       Impact factor: 6.444
