Clustering of proximal sequence space for the identification of protein families.

Federico Abascal, Alfonso Valencia.

Abstract

MOTIVATION: The study of sequence space, and the deciphering of the structure of protein families and subfamilies, has up to now been required for work in comparative genomics and for the prediction of protein function. With the emergence of structural proteomics projects, it is becoming increasingly important to be able to select protein targets for structural studies that will appropriately cover the space of protein sequences, functions and genomic distribution. These problems are the motivation for the development of methods for clustering protein sequences and building families of potentially orthologous sequences, such as those proposed here.
RESULTS: First, we developed a clustering strategy (the Ncut algorithm) capable of forming groups of related sequences by assessing their pairwise relationships. The results presented for the ras superfamily of proteins are similar to those produced by other clustering methods, but without the need to cluster the full sequence space. The Ncut clusters are then used as input to a process that reconstructs groups of closely related sequences with equilibrated genomic composition. The results of applying this technique to the data set used in the construction of the COG database are very similar to those derived by the human experts responsible for this database.
AVAILABILITY: The analyses of different systems, including the COG-equivalent 21 genomes, are available at http://www.pdg.cnb.uam.es/GenoClustering.html.
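
The abstract does not spell out how the Ncut algorithm scores a grouping. As a rough illustration only, the sketch below assumes it refers to the normalized-cut criterion for weighted graphs, in which a bipartition is penalized by the similarity it severs relative to the total similarity attached to each side. The sequence names and similarity scores are hypothetical, the exhaustive search is practical only for toy inputs, and none of this should be read as the authors' implementation.

```python
from itertools import combinations


def ncut_value(weights, group_a, group_b):
    """Normalized cut of a bipartition of a weighted similarity graph."""
    def w(u, v):
        # Edges are stored once; treat the graph as undirected.
        return weights.get((u, v), weights.get((v, u), 0.0))

    nodes = list(group_a) + list(group_b)
    cut = sum(w(u, v) for u in group_a for v in group_b)
    assoc_a = sum(w(u, v) for u in group_a for v in nodes if u != v)
    assoc_b = sum(w(u, v) for u in group_b for v in nodes if u != v)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def best_bipartition(nodes, weights):
    """Try every bipartition and return (score, group_a, group_b) with the lowest Ncut."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = (float("inf"), None, None)
    # Fixing `first` in group A avoids scoring each split twice;
    # sizes stop short of len(rest) so group B is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            a = (first,) + extra
            b = tuple(n for n in nodes if n not in a)
            score = ncut_value(weights, a, b)
            if score < best[0]:
                best = (score, a, b)
    return best


# Hypothetical pairwise similarity scores between four sequences.
weights = {
    ("ras1", "ras2"): 0.9,
    ("rab1", "rab2"): 0.8,
    ("ras1", "rab1"): 0.1,
    ("ras2", "rab2"): 0.1,
}
score, a, b = best_bipartition({"ras1", "ras2", "rab1", "rab2"}, weights)
print(sorted(a), sorted(b))
```

With these made-up scores the lowest-scoring split separates the two strongly linked pairs, which is the behaviour one would expect from a criterion that groups sequences by their pairwise relationships.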

Year:  2002        PMID: 12117788     DOI: 10.1093/bioinformatics/18.7.908

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  13 in total

1.  A Bayesian sampler for optimization of protein domain hierarchies.

Authors:  Andrew F Neuwald
Journal:  J Comput Biol       Date:  2014-02-04       Impact factor: 1.479

2.  GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains.

Authors:  David A Lee; Robert Rentzsch; Christine Orengo
Journal:  Nucleic Acids Res       Date:  2009-11-18       Impact factor: 16.971

3.  Genome-wide comparative gene family classification.

Authors:  Christian Frech; Nansheng Chen
Journal:  PLoS One       Date:  2010-10-15       Impact factor: 3.240

4.  OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors:  Li Li; Christian J Stoeckert; David S Roos
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

5.  Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures.

Authors:  Andrew F Neuwald; Christopher J Lanczycki; Aron Marchler-Bauer
Journal:  BMC Bioinformatics       Date:  2012-06-22       Impact factor: 3.169

6.  clusterMaker: a multi-algorithm clustering plugin for Cytoscape.

Authors:  John H Morris; Leonard Apeltsin; Aaron M Newman; Jan Baumbach; Tobias Wittkop; Gang Su; Gary D Bader; Thomas E Ferrin
Journal:  BMC Bioinformatics       Date:  2011-11-09       Impact factor: 3.307

7.  Saccharomyces cerevisiae as a model organism: a comparative study.

Authors:  Hiren Karathia; Ester Vilaprinyo; Albert Sorribas; Rui Alves
Journal:  PLoS One       Date:  2011-02-02       Impact factor: 3.240

8.  Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes.

Authors:  Ikuo Uchiyama
Journal:  Nucleic Acids Res       Date:  2006-01-25       Impact factor: 16.971

9.  Functional classification using phylogenomic inference (Review).

Authors:  Duncan Brown; Kimmen Sjölander
Journal:  PLoS Comput Biol       Date:  2006-06-30       Impact factor: 4.475

10.  Automated protein subfamily identification and classification.

Authors:  Duncan P Brown; Nandini Krishnamurthy; Kimmen Sjölander
Journal:  PLoS Comput Biol       Date:  2007-08       Impact factor: 4.475
