

Performance comparison of gene family clustering methods with expert curated gene family data set in Arabidopsis thaliana.

Abstract

With the exponential growth of genomics data, the demand for reliable clustering methods is increasing every day. Despite the wide usage of many clustering algorithms, the accuracy of these algorithms has been evaluated mostly on simulated data sets and seldom on real biological data for which a "correct answer" is available. To address this issue, we use the manually curated, high-quality Arabidopsis thaliana gene family database as a "gold standard" to conduct a comprehensive comparison of the accuracies of four widely used clustering methods: K-means, TribeMCL, single-linkage clustering and complete-linkage clustering. We compare the results from running each clustering method on two matrices: the E-value matrix and the k-tuple distance matrix. The E-value matrix is computed from BLAST E-values. The k-tuple distance matrix is computed from the difference in tuple frequencies. TribeMCL with the E-value matrix performed best, with the Inflation parameter tuned to 1.15, considerably lower than the previously suggested value of 2. The single-linkage clustering method with the E-value matrix was second best. Single-linkage clustering, K-means clustering, complete-linkage clustering, and TribeMCL with a k-tuple distance matrix performed reasonably well. Complete-linkage clustering with the k-tuple distance matrix performed the worst.
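
As a rough illustration of the k-tuple distance matrix and single-linkage clustering mentioned in the abstract, the sketch below builds a distance from differences in k-tuple frequencies and then groups sequences whose distance falls under a cutoff. The abstract does not give the exact distance formula, the value of k, or any threshold, so the squared-difference form, `k=2`, the cutoff of 0.1, the function names, and the toy sequences are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k=2):
    """Count every overlapping k-tuple in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=2):
    """Sum of squared differences in relative k-tuple frequencies.

    Assumed form for illustration; sequences must have at least k residues.
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    return sum((ca[t] / na - cb[t] / nb) ** 2 for t in set(ca) | set(cb))


def single_linkage(names, dist, threshold):
    """Join two clusters whenever any cross-cluster pair is closer than threshold."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[a, b] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical toy sequences, not taken from the curated data set.
seqs = {
    "g1": "MKVLAAGIV",
    "g2": "MKVLAAGLV",
    "g3": "PPGSPPGSP",
    "g4": "PPGSPPGTP",
}

# Store the distance under both key orders so lookups are symmetric.
dist = {}
for a, b in combinations(seqs, 2):
    dist[a, b] = dist[b, a] = ktuple_distance(seqs[a], seqs[b])

clusters = single_linkage(list(seqs), dist, threshold=0.1)
print(clusters)
```

With these toy inputs, g1 and g2 end up in one cluster and g3 and g4 in another. Replacing the single cutoff with a full dendrogram, or swapping in an E-value-derived matrix, would follow the same pattern of building a pairwise matrix first and clustering second.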


Year: 2008 PMID: 18493791 DOI: 10.1007/s00425-008-0748-7

Source DB: PubMed Journal: Planta ISSN: 0032-0935 Impact factor: 4.116
