PubMed Phrases, an open set of coherent phrases for searching biomedical literature.

Sun Kim, Lana Yeganova, Donald C Comeau, W John Wilbur, Zhiyong Lu.

Abstract

In biomedicine, key concepts are often expressed by multiple words (e.g., 'zinc finger protein'). Previous work has shown that treating a sequence of words as a meaningful unit, where applicable, is not only important for human understanding but also beneficial for automatic information seeking. Here we present a collection of PubMed® Phrases that are beneficial for information retrieval and human comprehension. We define these phrases as coherent chunks that are logically connected. To collect the phrase set, we apply the hypergeometric test to detect segments of consecutive terms that are likely to appear together in PubMed. These text segments are then filtered using the BM25 ranking function to ensure that they are beneficial from an information retrieval perspective. Thus, we obtain a set of 705,915 PubMed Phrases. We evaluate the quality of the set by investigating PubMed user click data and manually annotating a sample of 500 randomly selected noun phrases. We also analyze and discuss the usage of these PubMed Phrases in literature search.
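The two statistical ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the document-level counting, the BM25 parameters (k1 = 1.2, b = 0.75), and the IDF variant are all assumptions chosen for illustration. The first function gives the upper-tail probability of a hypergeometric test for two terms co-occurring, and the second gives the BM25 contribution of a single term to a document score.

```python
from math import comb, log


def hypergeom_tail(n_docs, n_with_a, n_with_b, n_with_both):
    """Upper-tail hypergeometric probability P(X >= n_with_both).

    Assumed setup: n_with_b documents are drawn at random from n_docs,
    of which n_with_a contain term A; X counts the draws containing A.
    A small value suggests the two terms co-occur more than chance predicts.
    """
    upper = min(n_with_a, n_with_b)
    total = comb(n_docs, n_with_b)
    return sum(
        comb(n_with_a, i) * comb(n_docs - n_with_a, n_with_b - i) / total
        for i in range(n_with_both, upper + 1)
    )


def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one term to one document's score.

    Uses the non-negative IDF variant log((N - df + 0.5) / (df + 0.5) + 1);
    k1 and b are common default values, assumed here, not taken from the paper.
    """
    idf = log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / norm


if __name__ == "__main__":
    # Hypothetical counts: 1000 documents, 40 with term A, 30 with term B, 20 with both.
    print(hypergeom_tail(1000, 40, 30, 20))
    # Hypothetical document: term occurs twice, document length 120 against an average of 150.
    print(bm25_term(tf=2, df=20, n_docs=1000, doc_len=120, avg_len=150))
```

Exact integer binomials from `math.comb` avoid intermediate overflow, at the cost of speed on a corpus the size of PubMed, where a log-space or library implementation would be the more practical choice.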

Year:  2018        PMID: 29893755      PMCID: PMC5996850          DOI: 10.1038/sdata.2018.104

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


  10 in total

1.  Extracting noun phrases for all of MEDLINE.

Authors:  N A Bennett; Q He; K Powell; B R Schatz
Journal:  Proc AMIA Symp       Date:  1999

2.  Corpus-based statistical screening for phrase identification.

Authors:  W Kim; W J Wilbur
Journal:  J Am Med Inform Assoc       Date:  2000 Sep-Oct       Impact factor: 4.497

3.  Identifying well-formed biomedical phrases in MEDLINE® text.

Authors:  Won Kim; Lana Yeganova; Donald C Comeau; W John Wilbur
Journal:  J Biomed Inform       Date:  2012-06-08       Impact factor: 6.317

4.  Relative Effectiveness of Document Titles and Abstracts for Determining Relevance of Documents.

Authors:  A Resnick
Journal:  Science       Date:  1961-10-06       Impact factor: 47.728

5.  Retro: concept-based clustering of biomedical topical sets.

Authors:  Lana Yeganova; Won Kim; Sun Kim; W John Wilbur
Journal:  Bioinformatics       Date:  2014-07-29       Impact factor: 6.937

6.  Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents.

Authors:  Sun Kim; Nicolas Fiorini; W John Wilbur; Zhiyong Lu
Journal:  J Biomed Inform       Date:  2017-10-03       Impact factor: 6.317

7.  How to Interpret PubMed Queries and Why It Matters.

Authors:  Lana Yeganova; Donald C Comeau; Won Kim; W John Wilbur
Journal:  J Am Soc Inf Sci Technol       Date:  2008-11-06

8.  Click-words: learning to predict document keywords from a user perspective.

Authors:  Rezarta Islamaj Doğan; Zhiyong Lu
Journal:  Bioinformatics       Date:  2010-09-01       Impact factor: 6.937

9.  Understanding PubMed user search behavior through log analysis.

Authors:  Rezarta Islamaj Dogan; G Craig Murray; Aurélie Névéol; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2009-11-27       Impact factor: 3.451

10.  Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

Authors:  Sun Kim; Lana Yeganova; W John Wilbur
Journal:  Bioinformatics       Date:  2016-06-10       Impact factor: 6.937

  5 in total

1.  A graph-based method for reconstructing entities from coordination ellipsis in medical text.

Authors:  Chi Yuan; Yongli Wang; Ning Shang; Ziran Li; Ruxin Zhao; Chunhua Weng
Journal:  J Am Med Inform Assoc       Date:  2020-07-01       Impact factor: 4.497

2.  Towards a unified search: Improving PubMed retrieval with full text.

Authors:  Won Kim; Lana Yeganova; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  J Biomed Inform       Date:  2022-09-21       Impact factor: 8.000

3.  A reference set of curated biomedical data and metadata from clinical case reports.

Authors:  J Harry Caufield; Yijiang Zhou; Anders O Garlid; Shaun P Setty; David A Liem; Quan Cao; Jessica M Lee; Sanjana Murali; Sarah Spendlove; Wei Wang; Li Zhang; Yizhou Sun; Alex Bui; Henning Hermjakob; Karol E Watson; Peipei Ping
Journal:  Sci Data       Date:  2018-11-20       Impact factor: 6.444

Review 4.  AI in Health: State of the Art, Challenges, and Future Directions.

Authors:  Fei Wang; Anita Preininger
Journal:  Yearb Med Inform       Date:  2019-08-16

5.  Epione application: An integrated web‑toolkit of clinical genomics and personalized medicine in systemic lupus erythematosus.

Authors:  Louis Papageorgiou; Haris Alkenaris; Maria I Zervou; Dimitriοs Vlachakis; Ioannis Matalliotakis; Demetrios A Spandidos; George Bertsias; George N Goulielmos; Elias Eliopoulos
Journal:  Int J Mol Med       Date:  2021-11-18       Impact factor: 4.101
