BAG: a graph theoretic sequence clustering algorithm.

Sun Kim, Jason Lee.

Abstract

In this paper, we first discuss issues that arise when clustering biological sequences using graph properties; these issues inspired the design of our sequence clustering algorithm, BAG. BAG recursively uses several graph properties (biconnectedness, articulation points, and p-quasi-completeness) together with domain knowledge specific to biological sequence clustering. To reduce cluster fragmentation, we have developed a new metric, called cluster utility, to guide cluster splitting. Clusters are then merged back under less stringent constraints. Experiments with the entire COG database and other sequence databases show that BAG can cluster a large number of sequences accurately while keeping the number of fragmented clusters significantly low.
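The abstract names two graph properties that BAG relies on: articulation points (vertices whose removal disconnects a graph) and p-quasi-completeness. The record does not describe BAG's implementation, so the following is only a minimal sketch under stated assumptions: the sequence similarity graph is an undirected, simple graph stored as a Python adjacency dict; articulation points are found with the standard Hopcroft-Tarjan low-link depth-first search; and a graph is taken to be p-quasi-complete when every vertex has at least p(|V| - 1) neighbours. The function names and the toy graph are hypothetical, and the cluster utility metric is not reproduced because its formula is not given here.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected simple graph.

    adj: dict mapping each vertex to an iterable of its neighbours.
    Iterative Hopcroft-Tarjan DFS using discovery times and low-link values.
    """
    disc, low = {}, {}
    points = set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        # Each frame: (vertex, DFS parent, iterator over remaining neighbours).
        stack = [(root, None, iter(adj[root]))]
        while stack:
            v, parent, it = stack[-1]
            advanced = False
            for w in it:
                if w not in disc:
                    # Tree edge v -> w: descend into w.
                    disc[w] = low[w] = timer
                    timer += 1
                    if v == root:
                        root_children += 1
                    stack.append((w, v, iter(adj[w])))
                    advanced = True
                    break
                elif w != parent:
                    # Back edge: v can reach an earlier vertex directly.
                    low[v] = min(low[v], disc[w])
            if not advanced:
                stack.pop()
                if parent is not None:
                    low[parent] = min(low[parent], low[v])
                    # A non-root parent is a cut vertex if v's subtree
                    # cannot reach above the parent.
                    if parent != root and low[v] >= disc[parent]:
                        points.add(parent)
        # The DFS root is a cut vertex only if it has two or more tree children.
        if root_children > 1:
            points.add(root)
    return points


def is_p_quasi_complete(adj, p):
    """Assumed definition: every vertex has >= p * (|V| - 1) neighbours.

    adj must describe the induced subgraph of one candidate cluster.
    """
    n = len(adj)
    if n <= 1:
        return True
    return all(len(set(adj[v])) >= p * (n - 1) for v in adj)


# Hypothetical toy similarity graph: two triangles joined at vertex "c".
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

if __name__ == "__main__":
    print(sorted(articulation_points(TOY_GRAPH)))  # ['c']
    print(is_p_quasi_complete(TOY_GRAPH, 0.5))     # True
```

In the toy graph, vertex "c" joins two otherwise separate groups, which is the kind of vertex a graph-based clustering method might split at; every vertex has at least 2 = 0.5 × (5 - 1) neighbours, so the graph passes the check at p = 0.5 but fails at p = 0.6. How BAG actually combines these tests with its cluster utility metric and merge step is not shown here.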

Year:  2006        PMID: 18399070     DOI: 10.1504/ijdmb.2006.010855

Source DB:  PubMed          Journal:  Int J Data Min Bioinform        ISSN: 1748-5673            Impact factor:   0.667


Related articles: 9 in total

1.  Graph pyramids for protein function prediction.

Authors:  Tushar Sandhan; Youngjun Yoo; Jin Choi; Sun Kim
Journal:  BMC Med Genomics       Date:  2015-05-29       Impact factor: 3.063

2.  Supervised protein family classification and new family construction.

Authors:  Gangman Yi; Michael R Thon; Sing-Hoi Sze
Journal:  J Comput Biol       Date:  2012-08       Impact factor: 1.479

3.  Application of Subspace Clustering in DNA Sequence Analysis.

Authors:  Tim Wallace; Ali Sekmen; Xiaofei Wang
Journal:  J Comput Biol       Date:  2015-07-10       Impact factor: 1.479

4.  Genome-wide comparative gene family classification.

Authors:  Christian Frech; Nansheng Chen
Journal:  PLoS One       Date:  2010-10-15       Impact factor: 3.240

5.  SEQOPTICS: a protein sequence clustering system.

Authors:  Yonghui Chen; Kevin D Reilly; Alan P Sprague; Zhijie Guan
Journal:  BMC Bioinformatics       Date:  2006-12-12       Impact factor: 3.169

6.  De novo identification of LTR retrotransposons in eukaryotic genomes.

Authors:  Mina Rho; Jeong-Hyeon Choi; Sun Kim; Michael Lynch; Haixu Tang
Journal:  BMC Genomics       Date:  2007-04-03       Impact factor: 3.969

7.  Massive fungal biodiversity data re-annotation with multi-level clustering.

Authors:  Duong Vu; Szániszló Szöke; Christian Wiwie; Jan Baumbach; Gianluigi Cardinali; Richard Röttger; Vincent Robert
Journal:  Sci Rep       Date:  2014-10-30       Impact factor: 4.379

8.  Family classification without domain chaining.

Authors:  Jacob M Joseph; Dannie Durand
Journal:  Bioinformatics       Date:  2009-06-15       Impact factor: 6.937

9.  ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts.

Authors:  Kwangmin Choi; Sun Kim
Journal:  BMC Bioinformatics       Date:  2008-03-06       Impact factor: 3.169
