The metric space of proteins - comparative study of clustering algorithms.

Ori Sasson, Nathan Linial, Michal Linial.

Abstract

MOTIVATION: A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation.
RESULTS: We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.
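The bottom-up process described in RESULTS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name its merging rules, so the arithmetic, geometric, and harmonic means used here are assumptions, and the toy `scores` table is a hypothetical stand-in for BLAST-derived pairwise scores (lower means more similar). The last step keeps only the clusters that every rule produces, mirroring the idea of clusters that are consistent with several merging rules.

```python
from itertools import combinations
from statistics import mean, geometric_mean, harmonic_mean


def agglomerate(scores, n, rule):
    """Bottom-up clustering of items 0..n-1.

    scores maps frozenset({i, j}) to a positive dissimilarity
    (hypothetical stand-in for a BLAST-derived score; lower = closer).
    rule averages all cross-cluster pair scores (assumed merging rule).
    Returns the merges in order as (cluster_a, cluster_b, score).
    """
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            s = rule([scores[frozenset((i, j))] for i in a for j in b])
            if best is None or s < best[0]:
                best = (s, a, b)
        s, a, b = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, s))
    return merges


def clusters_formed(merges):
    """Every cluster that appears at some level of the hierarchy."""
    return {a | b for a, b, _ in merges}


# Toy data: proteins 0-1 and 2-3 are close pairs; all other pairs are far.
scores = {frozenset(p): 10.0 for p in combinations(range(4), 2)}
scores[frozenset((0, 1))] = 1.0
scores[frozenset((2, 3))] = 1.0

rules = [mean, geometric_mean, harmonic_mean]
per_rule = [clusters_formed(agglomerate(scores, 4, r)) for r in rules]

# Clusters on which all merging rules agree.
consistent = set.intersection(*per_rule)
```

On real data the rules would generally disagree on some clusters, and the agreed-upon subset could then be compared against an external annotation such as InterPro.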

Year: 2002    PMID: 12169526    DOI: 10.1093/bioinformatics/18.suppl_1.s14

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


