
A LDA-based approach to promoting ranking diversity for genomics information retrieval.

Yan Chen, Xiaoshi Yin, Zhoujun Li, Xiaohua Hu, Jimmy Xiangji Huang.

Abstract

BACKGROUND: The biomedical domain holds an immense amount of data, and the number of genomics and biomedical publications is growing rapidly. This wealth of information has led to increasing interest in, and need for, information retrieval (IR) techniques for accessing the scientific literature in genomics and related biomedical disciplines. In many cases, the information a biologist wants from a query is a list of entities of a certain type, such as cells, genes, diseases, proteins, or mutations, that covers different aspects related to the question. Hence, it is important for a biomedical IR system to provide relevant and diverse answers that fulfill biologists' information needs. However, traditional IR models consider only the relevance between the retrieved documents and the user query and do not take redundancy among the retrieved documents into account. This leads to high redundancy and low diversity in the retrieved ranked lists.
RESULTS: In this paper, we propose an approach that employs a generative topic model, Latent Dirichlet Allocation (LDA), to promote ranking diversity in biomedical information retrieval. Unlike other approaches or models, which consider aspects at the word level, our approach assumes that aspects should be identified by the topics of the retrieved documents. We use the LDA model to discover the topic distribution of each retrieved passage and the word distribution of each topic dimension, and then re-rank the retrieval results by the topic distribution similarity between passages, based on a sliding window of size N. We evaluate our approach on the TREC 2007 Genomics collection with two distinct IR baseline runs, achieving an 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track.
CONCLUSIONS: The proposed method is the first study to apply a topic model to genomics information retrieval, and it demonstrates effectiveness in promoting ranking diversity as well as in improving the relevance of ranked lists in genomics search. Moreover, we propose a distance measure, a modified Euclidean distance, that quantifies how much a passage can increase topical diversity by considering both the topical importance and the topical coefficient given by LDA.
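
The abstract does not give the exact re-ranking rule or the form of the modified Euclidean distance, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each passage already has a baseline relevance score and an LDA topic distribution, that the distance is a Euclidean distance with a per-topic weight standing in for "topical importance", and that passages are chosen greedily by comparing each candidate against the last N selected passages. The function names, the `trade_off` parameter, and the scoring formula are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def weighted_topic_distance(p: Sequence[float], q: Sequence[float],
                            topic_weights: Sequence[float]) -> float:
    """Euclidean distance between two topic distributions, with each squared
    difference scaled by a per-topic weight (an assumed stand-in for the
    paper's topical importance)."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(p, q, topic_weights)))


def rerank_with_window(relevance: Sequence[float],
                       topics: Sequence[Sequence[float]],
                       topic_weights: Sequence[float],
                       window: int = 2,
                       trade_off: float = 0.5) -> List[int]:
    """Greedy re-ranking (hypothetical scheme): at each step, pick the
    remaining passage with the best mix of baseline relevance and topical
    distance to the last `window` passages already selected."""
    remaining = list(range(len(relevance)))
    ranked: List[int] = []
    while remaining:
        recent = ranked[-window:]  # the sliding window of size N

        def score(i: int) -> float:
            if not recent:
                # Nothing selected yet: fall back to relevance alone.
                return relevance[i]
            # Novelty = distance to the closest passage inside the window.
            novelty = min(
                weighted_topic_distance(topics[i], topics[j], topic_weights)
                for j in recent
            )
            return trade_off * relevance[i] + (1.0 - trade_off) * novelty

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data: passages 0 and 1 are topically near-identical, passage 2 differs.
relevance = [0.9, 0.85, 0.6]
topics = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
order = rerank_with_window(relevance, topics, topic_weights=[1.0, 1.0])
print(order)
```

In this toy run the topically distinct passage 2 is promoted ahead of passage 1, which nearly duplicates passage 0. Setting `trade_off=1.0` recovers the pure relevance ordering.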

Year:  2012        PMID: 22759611      PMCID: PMC3394425          DOI: 10.1186/1471-2164-13-S3-S2

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


