Towards a unified search: Improving PubMed retrieval with full text.

Won Kim¹, Lana Yeganova¹, Donald C Comeau¹, W John Wilbur¹, Zhiyong Lu².

Abstract

OBJECTIVE: A significant number of recent articles in PubMed have full text available in PubMed Central®, and the availability of full texts has been consistently growing. However, it is not currently possible for a user to simultaneously query the contents of both databases and receive a single integrated search result. In this study, we investigate how to score full text articles given a multitoken query and how to combine those full text article scores with scores originating from abstracts and achieve an overall improved retrieval performance.
MATERIALS AND METHODS: For scoring full text articles, we propose a method to combine information coming from different sections by converting the traditionally used BM25 scores into log odds ratio scores, which can be treated uniformly. We further propose a method that successfully combines scores from two heterogeneous retrieval sources - full text articles and abstract-only articles - by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data that consists of queries sampled from PubMed user logs along with a subset of retrieved and clicked documents to train the probabilistic functions and to evaluate retrieval effectiveness.
RESULTS AND CONCLUSIONS: Random ranking achieves 0.579 MAP score on our PubMed click data. BM25 ranking on PubMed abstracts improves the MAP by 10.6%. For full text documents, experiments confirm that BM25 section scores are of different value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of occurrences of query tokens in different sections. By including full text where available, we gain another 0.67%, or 7% relative improvement over abstract alone. We find an advantage in the more accurate estimate of the value of BM25 scores depending on the section from which they were produced. Taking the sum of the top three section scores performs best. Published by Elsevier Inc.
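
The section-scoring idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the calibration table, the bin edges, the click probabilities, the prior, and the function names below are all hypothetical, chosen only to show the shape of the computation. The sketch assumes that each section's BM25 score is mapped to an estimated click probability (for example, one learned from click data), that this probability is converted to a log odds ratio relative to a baseline, and that a document's score is the sum of its top three section scores.

```python
import bisect
import math

# Hypothetical calibration: for each section type, BM25 bin edges and the
# estimated probability that a document scoring in each bin is clicked.
# Real values would be learned from click data; these are made up.
CALIBRATION = {
    "title":    ([2.0, 5.0], [0.55, 0.70, 0.85]),
    "abstract": ([2.0, 5.0], [0.52, 0.62, 0.75]),
    "methods":  ([2.0, 5.0], [0.50, 0.53, 0.58]),
    "results":  ([2.0, 5.0], [0.50, 0.56, 0.65]),
}
PRIOR = 0.5  # assumed baseline click probability


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def log_odds_score(section, bm25):
    """Map a section's BM25 score to a log odds ratio against the prior."""
    edges, probs = CALIBRATION[section]
    p = probs[bisect.bisect_right(edges, bm25)]
    return logit(p) - logit(PRIOR)


def document_score(section_bm25, top_k=3):
    """Sum the top_k section log odds ratio scores for one document."""
    scores = sorted(
        (log_odds_score(s, v) for s, v in section_bm25.items()),
        reverse=True,
    )
    return sum(scores[:top_k])


doc = {"title": 6.0, "abstract": 3.0, "methods": 1.0, "results": 6.0}
print(round(document_score(doc), 3))
```

Because every section score is expressed on the same log odds scale, a high BM25 score in a weakly predictive section (here, the hypothetical "methods" table) contributes less than the same BM25 score in the title, which is the kind of normalization the abstract describes.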

Entities: Chemical

Keywords: Combining abstract with full text; Full text search; Information retrieval; PubMed search engine; Search relevance gold standard

Year: 2022 PMID: 36152950 PMCID: PMC9561061 DOI: 10.1016/j.jbi.2022.104211

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000

References

17 in total

1. Evaluating relevance ranking strategies for MEDLINE retrieval.

Authors: Zhiyong Lu; Won Kim; W John Wilbur
Journal: J Am Med Inform Assoc Date: 2008-10-24 Impact factor: 4.497

2. LitSense: making sense of biomedical literature at sentence level.

Authors: Alexis Allot; Qingyu Chen; Sun Kim; Roberto Vera Alvarez; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

3. PubTator central: automated concept annotation for biomedical full text articles.

Authors: Chih-Hsuan Wei; Alexis Allot; Robert Leaman; Zhiyong Lu
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

4. A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering.

Authors: Mourad Sarrouti; Said Ouatik El Alaoui
Journal: J Biomed Inform Date: 2017-03-07 Impact factor: 6.317

5. The structural and content aspects of abstracts versus bodies of full text journal articles are different.

Authors: K Bretonnel Cohen; Helen L Johnson; Karin Verspoor; Christophe Roeder; Lawrence E Hunter
Journal: BMC Bioinformatics Date: 2010-09-29 Impact factor: 3.169

6. Understanding PubMed user search behavior through log analysis.

Authors: Rezarta Islamaj Dogan; G Craig Murray; Aurélie Névéol; Zhiyong Lu
Journal: Database (Oxford) Date: 2009-11-27 Impact factor: 3.451

7. Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task.

Authors: Jin-Dong Kim; Jung-Jae Kim; Xu Han; Dietrich Rebholz-Schuhmann
Journal: BMC Bioinformatics Date: 2015-07-13 Impact factor: 3.169