
Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity.

Shanfeng Zhu, Jia Zeng, Hiroshi Mamitsuka.

Abstract

MOTIVATION: MEDLINE documents are usually clustered with the vector space model, which computes the content similarity between two documents essentially as the inner product of their word vectors. Recently, the semantic information of the MeSH (Medical Subject Headings) thesaurus has been applied to clustering MEDLINE documents by mapping documents into MeSH concept vectors, which are then clustered. However, current approaches to using the MeSH thesaurus have two serious limitations: first, important semantic information may be lost when generating MeSH concept vectors, and second, the content information of the original text is discarded.
METHODS: Our new strategy has three key points. First, we develop a sound method for measuring the semantic similarity between two documents over the MeSH thesaurus. Second, we combine the semantic and content similarities to generate an integrated similarity matrix between documents. Third, we apply a spectral approach to cluster documents over the integrated similarity matrix.
RESULTS: Using 100 different datasets of MEDLINE records, we conduct extensive experiments with alternative measures and parameters. Experimental results show that integrating the semantic and content similarities outperforms using either similarity alone, and that the improvement is statistically significant. We further find a best parameter setting that is consistent across all experimental conditions tested. Finally, we show a typical example of the resulting clusters, confirming the effectiveness of our strategy in improving MEDLINE document clustering.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
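
The second and third points in METHODS can be illustrated with a small sketch. The abstract does not say how the two similarities are combined or which spectral algorithm is used, so everything below is an assumption made for illustration: the weighted sum with a hypothetical parameter `alpha`, the cosine-normalized inner product for content similarity, the hand-written semantic similarity values, and a simplified two-way spectral split computed by power iteration. It is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Inner product of two word vectors, normalized by their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity_matrix(word_vectors):
    """Pairwise content similarity between all documents."""
    n = len(word_vectors)
    return [[cosine_similarity(word_vectors[i], word_vectors[j])
             for j in range(n)] for i in range(n)]


def integrate(content, semantic, alpha=0.5):
    """Assumed combination rule: alpha * semantic + (1 - alpha) * content."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [[alpha * s + (1.0 - alpha) * c for c, s in zip(c_row, s_row)]
            for c_row, s_row in zip(content, semantic)]


def spectral_bipartition(similarity, iterations=200):
    """Split documents into two groups using the second eigenvector of the
    normalized affinity matrix D^-1/2 W D^-1/2 (simplified illustration).
    Assumes every row of the similarity matrix has a positive sum."""
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    # Shift to (M + I) / 2 so all eigenvalues lie in [0, 1].
    shifted = [[0.5 * (similarity[i][j] * inv_sqrt[i] * inv_sqrt[j]
                       + (1.0 if i == j else 0.0))
                for j in range(n)] for i in range(n)]
    # The leading eigenvector is proportional to sqrt(degree).
    total = math.sqrt(sum(degree))
    leading = [math.sqrt(d) / total for d in degree]
    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Remove the leading component, then apply one power-iteration step.
        proj = sum(a * b for a, b in zip(x, leading))
        x = [a - proj * b for a, b in zip(x, leading)]
        x = [sum(shifted[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        if length == 0:
            break
        x = [a / length for a in x]
    # The sign of each entry decides the group.
    return [1 if value > 0 else 0 for value in x]


# Toy data (hypothetical): four documents, two obvious topics.
word_vectors = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
semantic = [[1.0, 0.6, 0.2, 0.1],
            [0.6, 1.0, 0.1, 0.2],
            [0.2, 0.1, 1.0, 0.7],
            [0.1, 0.2, 0.7, 1.0]]

content = content_similarity_matrix(word_vectors)
integrated = integrate(content, semantic, alpha=0.5)
labels = spectral_bipartition(integrated)
print(labels)
```

Setting `alpha` to 0 or 1 in this sketch reduces it to clustering on content similarity or semantic similarity alone, which corresponds to the single-similarity baselines that RESULTS compares against.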


Year:  2009        PMID: 19497938     DOI: 10.1093/bioinformatics/btp338

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


