A map of the protein space--an automatic hierarchical classification of all protein sequences.

G Yona, N Linial, N Tishby, M Linial.

Abstract

We investigate the space of all protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a (weighted directed) graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph. These clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: i) Interesting homologies among proteins can be deduced by transitivity. ii) Transitivity should be applied restrictively in order to prevent unrelated proteins from clustering together. Our analysis starts from a very conservative classification, based on very significant similarities, that has many classes. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then, a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance, where at each step the algorithm is applied to the classes of the previous classification to obtain the next one at the more permissive threshold. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the space of all protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies. The hierarchical organization reveals finer sub-families that make up known families of proteins as well as many interesting relations between protein families. The hierarchical organization proposed may be considered as the first map of the space of all protein sequences. An interactive web site including the results of our analysis has been constructed, and is now accessible through http://www.protomap.cs.huji.ac.il.
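
The idea of merging classes at progressively more permissive thresholds, while restricting transitivity, can be illustrated with a small sketch. This is not the authors' two-phase algorithm: the abstract does not specify the local merging criterion or the global test, so the code below substitutes a simple, hypothetical rule (two clusters merge only when a minimum fraction of their cross pairs are similar at the current cutoff). The function name `hierarchical_clusters`, the parameter `min_link_fraction`, the use of e-values as edge weights, and the treatment of the graph as undirected are all assumptions made for illustration.

```python
from itertools import combinations


def hierarchical_clusters(edges, thresholds, min_link_fraction=0.5):
    """Toy threshold-by-threshold cluster merging (illustrative only).

    edges: dict mapping frozenset({a, b}) -> e-value (lower = more similar);
           undirected here for simplicity, unlike the directed graph in the abstract.
    thresholds: e-value cutoffs ordered from strict to permissive.
    min_link_fraction: hypothetical restriction on transitivity; two clusters
           merge only if at least this fraction of cross pairs pass the cutoff.
    Returns one list of clusters (sorted tuples) per threshold.
    """
    proteins = sorted({p for pair in edges for p in pair})
    clusters = [frozenset([p]) for p in proteins]  # most conservative start
    levels = []
    for cutoff in thresholds:
        merged = True
        while merged:  # repeat until no pair of clusters qualifies
            merged = False
            for a, b in combinations(clusters, 2):
                links = sum(
                    1
                    for x in a
                    for y in b
                    if edges.get(frozenset((x, y)), float("inf")) <= cutoff
                )
                if links and links / (len(a) * len(b)) >= min_link_fraction:
                    clusters = [c for c in clusters if c not in (a, b)] + [a | b]
                    merged = True
                    break
        # each level starts from the classes of the previous, stricter level
        levels.append(sorted(tuple(sorted(c)) for c in clusters))
    return levels


# Made-up similarity scores for five hypothetical sequences.
edges = {
    frozenset(("A", "B")): 1e-100,
    frozenset(("B", "C")): 1e-20,
    frozenset(("D", "E")): 1e-100,
    frozenset(("C", "D")): 1e-3,
}
levels = hierarchical_clusters(edges, [1e-50, 1e-10, 1e-1])
for cutoff, level in zip([1e-50, 1e-10, 1e-1], levels):
    print(cutoff, level)
```

In this toy input, the single weak C-D similarity is not enough to join the two larger clusters at the most permissive cutoff, which is the kind of outcome the restriction on transitivity is meant to produce. Setting `min_link_fraction=0.0` reduces the sketch to plain transitive closure, and all five sequences then end up in one cluster.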

Year:  1998        PMID: 9783227

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  6 in total

1.  ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

Authors:  G Yona; N Linial; M Linial
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Cloning, characterization and mapping of the human ATP5E gene, identification of pseudogene ATP5EP1, and definition of the ATP5E motif.

Authors:  Q Tu; L Yu; P Zhang; M Zhang; H Zhang; J Jiang; C Chen; S Zhao
Journal:  Biochem J       Date:  2000-04-01       Impact factor: 3.857

3.  Estimating the probability for a protein to have a new fold: A statistical computational model.

Authors:  E Portugaly; M Linial
Journal:  Proc Natl Acad Sci U S A       Date:  2000-05-09       Impact factor: 11.205

4.  Massive sequence comparisons as a help in annotating genomic sequences.

Authors:  A Louis; E Ollivier; J C Aude; J L Risler
Journal:  Genome Res       Date:  2001-07       Impact factor: 9.043

5.  A family of at least seven beta-galactosidase genes is expressed during tomato fruit development.

Authors:  D L Smith; K C Gross
Journal:  Plant Physiol       Date:  2000-07       Impact factor: 8.340

6.  Increased taxon sampling reveals thousands of hidden orthologs in flatworms.

Authors:  José M Martín-Durán; Joseph F Ryan; Bruno C Vellutini; Kevin Pang; Andreas Hejnol
Journal:  Genome Res       Date:  2017-04-11       Impact factor: 9.043
