Efficient Semisupervised MEDLINE Document Clustering With MeSH-Semantic and Global-Content Constraints.

Jun Gu, Wei Feng, Jia Zeng, Hiroshi Mamitsuka, Shanfeng Zhu.   

Abstract

For clustering biomedical documents, we can consider three different types of information: the local-content (LC) information from documents, the global-content (GC) information from the whole MEDLINE collection, and the medical subject heading (MeSH)-semantic (MS) information. Previous methods for clustering biomedical documents do not integrate these types effectively: they use only one or two of them. Recently, the performance of MEDLINE document clustering has been enhanced by linearly combining the LC and MS information. However, a simple linear combination can be ineffective, because a single representation space is limited in its ability to combine different types of information (similarities) that differ in reliability. To overcome this limitation, we propose a new semisupervised spectral clustering method, SSNCut, which clusters over the LC similarities under two types of constraints: must-link (ML) constraints on document pairs with high MS (or GC) similarities and cannot-link (CL) constraints on those with low similarities. We empirically evaluate SSNCut on MEDLINE document clustering using 100 data sets of MEDLINE records. Experimental results show that SSNCut outperformed a linear combination method and several well-known semisupervised clustering methods, and that the differences were statistically significant. Furthermore, SSNCut with constraints from both MS and GC similarities outperformed SSNCut with constraints from only one type of similarity. Another interesting finding was that ML constraints worked more effectively than CL constraints, since around 10% of the CL constraints were incorrect, whereas only 1% of the ML constraints were.
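
The general idea of clustering over LC similarities under pairwise constraints can be illustrated with a small sketch. This is not the SSNCut algorithm, whose formulation the abstract does not give. It is a simpler stand-in that writes the constraints directly into the affinity matrix and then applies a two-way normalized cut. The function names, the thresholds (0.8 and 0.1), and the toy similarity values are all assumptions made for illustration.

```python
import numpy as np


def derive_constraints(side_sim, high, low):
    """Threshold a side-information similarity matrix (e.g. MS or GC).

    Pairs at or above `high` become must-link (ML) constraints and pairs
    at or below `low` become cannot-link (CL) constraints. Both
    thresholds are illustrative assumptions, not values from the paper.
    """
    n = side_sim.shape[0]
    rows, cols = np.triu_indices(n, k=1)  # visit each unordered pair once
    values = side_sim[rows, cols]
    must_link = [(int(i), int(j)) for i, j, v in zip(rows, cols, values) if v >= high]
    cannot_link = [(int(i), int(j)) for i, j, v in zip(rows, cols, values) if v <= low]
    return must_link, cannot_link


def apply_constraints(lc_sim, must_link, cannot_link):
    """Return a copy of the LC affinity matrix with the constraints written in.

    Assumed scheme: ML pairs get the largest similarity in the matrix,
    CL pairs get zero, and self-similarities are dropped.
    """
    w = lc_sim.astype(float).copy()
    top = w.max()
    for i, j in must_link:
        w[i, j] = w[j, i] = top
    for i, j in cannot_link:
        w[i, j] = w[j, i] = 0.0
    np.fill_diagonal(w, 0.0)
    return w


def two_way_normalized_cut(w):
    """Bipartition by the sign of the second generalized eigenvector."""
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = vectors[:, 1] * d_inv_sqrt    # map back to the random-walk form
    return (fiedler > 0).astype(int)


# Toy data: documents 0-2 form one topic and 3-5 another, but the local
# content of document 2 is deliberately made to resemble the second topic.
lc_sim = np.full((6, 6), 0.4)
lc_sim[:3, :3] = 0.6
lc_sim[3:, 3:] = 0.6
lc_sim[2, :2] = lc_sim[:2, 2] = 0.3
lc_sim[2, 3:] = lc_sim[3:, 2] = 0.7
np.fill_diagonal(lc_sim, 1.0)

# Hypothetical side similarities: informative only for pairs involving document 2.
side_sim = np.full((6, 6), 0.5)
side_sim[2, :2] = side_sim[:2, 2] = 0.9
side_sim[2, 3:] = side_sim[3:, 2] = 0.05

must_link, cannot_link = derive_constraints(side_sim, high=0.8, low=0.1)
labels = two_way_normalized_cut(apply_constraints(lc_sim, must_link, cannot_link))
print(must_link, cannot_link, labels)
```

In this toy setting the two ML pairs and three CL pairs all involve document 2, and with them applied the cut places document 2 with documents 0 and 1. Which group receives label 0 depends on the sign of the eigenvector, so only the grouping is meaningful.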

Year:  2013        PMID: 26502435     DOI: 10.1109/TSMCB.2012.2227998

Source DB:  PubMed          Journal:  IEEE Trans Cybern        ISSN: 2168-2267            Impact factor:   11.448


  6 in total

1.  SolidBin: improving metagenome binning with semi-supervised normalized cut.

Authors:  Ziye Wang; Zhengyang Wang; Yang Young Lu; Fengzhu Sun; Shanfeng Zhu
Journal:  Bioinformatics       Date:  2019-11-01       Impact factor: 6.937

2.  Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial.

Authors:  Weixiang Shao; Clive E Adams; Aaron M Cohen; John M Davis; Marian S McDonagh; Sujata Thakurta; Philip S Yu; Neil R Smalheiser
Journal:  Methods       Date:  2014-11-20       Impact factor: 3.608

3.  MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Authors:  Ke Liu; Shengwen Peng; Junqiu Wu; Chengxiang Zhai; Hiroshi Mamitsuka; Shanfeng Zhu
Journal:  Bioinformatics       Date:  2015-06-15       Impact factor: 6.937

4.  DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Authors:  Shengwen Peng; Ronghui You; Hongning Wang; Chengxiang Zhai; Hiroshi Mamitsuka; Shanfeng Zhu
Journal:  Bioinformatics       Date:  2016-06-15       Impact factor: 6.937

5.  An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering.

Authors:  Meijing Li; Tianjie Chen; Keun Ho Ryu; Cheng Hao Jin
Journal:  Comput Math Methods Med       Date:  2021-11-09       Impact factor: 2.238

6.  A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments.

Authors:  Shaojun Pan; Chengkai Zhu; Xing-Ming Zhao; Luis Pedro Coelho
Journal:  Nat Commun       Date:  2022-04-28       Impact factor: 17.694
