
PuReD-MCL: a graph-based PubMed document clustering methodology.

T Theodosiou, N Darzentas, L Angelis, C A Ouzounis.

Abstract

MOTIVATION: Biomedical literature is the principal repository of biomedical knowledge, and PubMed is the most complete database for collecting, organizing and analyzing such textual knowledge. Numerous efforts attempt to exploit this information using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (PubMed Related Documents-MCL), which is based on the graph clustering algorithm MCL and on relevant resources from PubMed.
METHODS: PuReD-MCL avoids using natural language processing (NLP) techniques directly. Instead, it takes advantage of existing resources available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on the simulation of flow in graphs. Users can then analyze the results by highlighting important clues, and finally visualize the clusters and all relevant information with an interactive graph layout tool such as BioLayout Express 3D.
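To illustrate the flow simulation that MCL relies on, here is a minimal dense Python sketch. It assumes a small undirected graph given as an adjacency matrix; in PuReD-MCL the nodes would be documents and the edges would presumably come from PubMed's related-document links, but that mapping is an assumption here. The sketch alternates expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. It is not the PuReD-MCL source, which the authors distribute in Perl and R. The function names and the `tol` and `threshold` parameters are hypothetical choices made for this example.

```python
"""Minimal dense Markov Cluster (MCL) sketch: expansion + inflation.

Illustrative only: NOT the PuReD-MCL code (Perl/R) and NOT the optimized
`mcl` program. Suitable only for toy graphs with a few hundred nodes.
"""
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out


def expand(m: Matrix) -> Matrix:
    """Expansion: square the matrix (two steps of a random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation: raise every entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adj: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, threshold: float = 1e-3) -> List[List[int]]:
    """Cluster an undirected graph given as a symmetric adjacency matrix."""
    n = len(adj)
    # Add one self-loop per node (a common MCL preprocessing step).
    m = normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:  # matrix has (numerically) stopped changing
            break
    # In the converged matrix, each non-zero row is an "attractor";
    # the columns where that row is non-zero form one cluster.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > threshold)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
    adj = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
    print(mcl(adj))  # expected: [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation value generally produces more, smaller clusters. For real document graphs, a sparse implementation with pruning, such as van Dongen's `mcl` program, would normally replace the dense lists used here.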
RESULTS: The methodology was applied to two different datasets previously used to validate the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second relates to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained with TextQuest, while also providing additional insights into the clusters and the corresponding documents.
AVAILABILITY: Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/

Year:  2008        PMID: 18593717     DOI: 10.1093/bioinformatics/btn318

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches.

Authors:  Kevin W Boyack; David Newman; Russell J Duhon; Richard Klavans; Michael Patek; Joseph R Biberstine; Bob Schijvenaars; André Skupin; Nianli Ma; Katy Börner
Journal:  PLoS One       Date:  2011-03-17       Impact factor: 3.240

2.  Connecting the dots between PubMed abstracts.

Authors:  M Shahriar Hossain; Joseph Gresock; Yvette Edmonds; Richard Helm; Malcolm Potts; Naren Ramakrishnan
Journal:  PLoS One       Date:  2012-01-03       Impact factor: 3.240

3.  Mapping patient safety: a large-scale literature review using bibliometric visualisation techniques. (Review)

Authors:  S P Rodrigues; N J van Eck; L Waltman; F W Jansen
Journal:  BMJ Open       Date:  2014-03-13       Impact factor: 2.692

4.  Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data.

Authors:  L Jael Garcia Castro; C McLaughlin; A Garcia
Journal:  J Biomed Semantics       Date:  2013-04-15

5.  Soft document clustering using a novel graph covering approach.

Authors:  Jens Dörpinghaus; Sebastian Schaaf; Marc Jacobs
Journal:  BioData Min       Date:  2018-06-14       Impact factor: 2.522
