
Towards a unified search: Improving PubMed retrieval with full text.

Won Kim¹, Lana Yeganova¹, Donald C Comeau¹, W John Wilbur¹, Zhiyong Lu².

Abstract

OBJECTIVE: Full text is available in PubMed Central® for a significant number of recent PubMed articles, and that number continues to grow. However, a user currently cannot query the contents of both databases at once and receive a single integrated search result. In this study, we investigate how to score full-text articles for a multi-token query, and how to combine those full-text scores with scores derived from abstracts to improve overall retrieval performance.
MATERIALS AND METHODS: To score full-text articles, we propose a method that combines information from different sections by converting the traditionally used BM25 scores into log odds ratio scores, which can be treated uniformly. We further propose a method that combines scores from two heterogeneous retrieval sources, full-text articles and abstract-only articles, by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data, consisting of queries sampled from PubMed user logs along with a subset of retrieved and clicked documents, to train the probabilistic functions and to evaluate retrieval effectiveness.
RESULTS AND CONCLUSIONS: Random ranking achieves a mean average precision (MAP) of 0.579 on our PubMed click data. BM25 ranking on PubMed abstracts improves MAP by 10.6%. For full-text documents, experiments confirm that BM25 section scores differ in value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with the abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of query-token occurrences in different sections. By including full text where available, we gain another 0.67%, or a 7% relative improvement over abstracts alone. We find that estimating the value of a BM25 score more accurately, according to the section that produced it, is advantageous. Taking the sum of the top three section scores performs best. Published by Elsevier Inc.
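The section-scoring idea described in the methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: BM25 is computed separately for each section using a common Lucene-style IDF and a per-section average length; each section type gets its own logistic calibration whose coefficients (`SECTION_CALIBRATION`, `PRIOR_LOGIT`) are made-up placeholders standing in for the functions the paper trains on click data; the log odds ratio is taken relative to a single prior; and the document score is the sum of the three highest section log odds ratios, the variant the abstract reports as performing best. All function names are hypothetical.

```python
import math
from collections import Counter

# HYPOTHETICAL calibration per section type: logit P(relevant | BM25 = s) = a + w * s.
# The paper trains such functions on PubMed click data; these numbers are placeholders.
SECTION_CALIBRATION = {
    "title": (-2.0, 0.30),
    "abstract": (-2.2, 0.20),
    "methods": (-2.6, 0.08),
    "results": (-2.5, 0.10),
    "body": (-2.8, 0.05),
}
PRIOR_LOGIT = -2.9  # HYPOTHETICAL log odds that a retrieved document is relevant


def bm25(query_tokens, field_tokens, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of a single text field (here, one article section)."""
    tf = Counter(field_tokens)
    length_norm = 1 - b + b * len(field_tokens) / avg_len
    score = 0.0
    for term in set(query_tokens):
        if tf[term] == 0:
            continue  # a term absent from this section contributes nothing
        n_t = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score


def section_log_odds_ratio(section_type, bm25_score):
    """Put a raw BM25 section score on a shared log-odds-ratio scale."""
    a, w = SECTION_CALIBRATION[section_type]
    return (a + w * bm25_score) - PRIOR_LOGIT


def full_text_score(query_tokens, sections, doc_freq, n_docs, avg_lens, top_k=3):
    """Sum of the top_k section log odds ratios; sections with no query hit are skipped."""
    ratios = []
    for section_type, tokens in sections:
        s = bm25(query_tokens, tokens, doc_freq, n_docs, avg_lens[section_type])
        if s > 0:
            ratios.append(section_log_odds_ratio(section_type, s))
    return sum(sorted(ratios, reverse=True)[:top_k])


if __name__ == "__main__":
    doc_freq = {"brca1": 10, "breast": 50, "cancer": 80}  # toy collection statistics
    avg_lens = {"title": 10, "abstract": 200, "methods": 500, "results": 400, "body": 1000}
    article = [
        ("title", ["brca1", "mutations", "in", "breast", "cancer"]),
        ("methods", ["brca1", "carriers", "were", "sequenced"]),
        ("results", ["breast", "cancer", "risk", "increased"]),
    ]
    print(full_text_score(["brca1", "breast", "cancer"], article, doc_freq, 1000, avg_lens))
```

Because each raw BM25 score passes through its own section calibration, with these placeholder coefficients a title score of 1.0 maps to a larger log odds ratio than a body score of 5.0. This is the kind of non-comparability between section scores that the abstract describes, and the reason the scores are normalized before they are summed.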
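The MAP scores reported in the results use the standard average precision metric. A minimal sketch, assuming each query's evaluation list is its set of retrieved candidates and that clicked documents count as relevant; the function names are hypothetical, and `random_baseline_map` estimates a random-ranking baseline by averaging MAP over random shuffles of each candidate list.

```python
import random


def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k taken at each relevant (here: clicked) rank."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """MAP over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


def random_baseline_map(runs, trials=1000, seed=0):
    """Estimate the expected MAP of random ranking by shuffling each candidate list."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = [(rng.sample(ranked, len(ranked)), rel) for ranked, rel in runs]
        total += mean_average_precision(shuffled)
    return total / trials
```

For example, `average_precision(["a", "b", "c"], {"a", "c"})` averages precision 1/1 and 2/3. When candidate lists are short and contain at least one click, even a random order yields a substantial MAP. This is why reporting a random baseline is a useful reference point for relative gains.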

Keywords:  Combining abstract with full text; Full text search; Information retrieval; PubMed search engine; Search relevance gold standard

Year:  2022        PMID: 36152950      PMCID: PMC9561061          DOI: 10.1016/j.jbi.2022.104211

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   8.000
