
ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

G Yona, N Linial, M Linial.

Abstract

The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition, it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il
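The idea of restricted transitivity applied at several confidence levels can be illustrated with a small sketch. This is not the ProtoMap algorithm itself: the abstract does not state the merging rule, so the sketch assumes a simple hypothetical criterion (two clusters merge only if a minimum fraction of their cross-pairs are similar at the current threshold). The function names, the `min_link_fraction` parameter, and the E-value-like scores (lower means more similar) are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_level(clusters, scores, threshold, min_link_fraction=0.5):
    """Merge clusters that are sufficiently linked at `threshold`.

    clusters: iterable of frozensets of protein IDs (from the stricter level)
    scores:   dict mapping frozenset({a, b}) to an E-value-like score
              (lower = more similar); missing pairs count as unrelated
    """
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            # Count cross-pairs that are directly similar at this threshold.
            links = sum(
                1
                for x in a
                for y in b
                if scores.get(frozenset((x, y)), float("inf")) <= threshold
            )
            # Assumed restriction: a single weak bridge between two large
            # clusters is not enough to merge them.
            if links and links / (len(a) * len(b)) >= min_link_fraction:
                clusters[i] = a | b
                del clusters[j]
                merged = True
                break  # restart the scan with the updated cluster list
    return [frozenset(c) for c in clusters]


def hierarchy(proteins, scores, thresholds, min_link_fraction=0.5):
    """Cluster at each threshold, strictest first, reusing the previous level.

    Because each level starts from the clusters of the stricter level,
    the levels are nested and form a hierarchy.
    """
    level = [frozenset([p]) for p in proteins]
    result = []
    for t in thresholds:
        level = cluster_level(level, scores, t, min_link_fraction)
        result.append((t, level))
    return result


if __name__ == "__main__":
    # Toy data (invented): a chain A-B-C-D with weakening similarity.
    toy_scores = {
        frozenset(("A", "B")): 1e-50,
        frozenset(("B", "C")): 1e-20,
        frozenset(("C", "D")): 1e-3,
    }
    for t, level in hierarchy("ABCD", toy_scores, [1e-40, 1e-10, 1e-2]):
        print(t, sorted(sorted(c) for c in level))
```

In the toy data, D is attached to the A-B-C group only through a single weak link to C, so under the assumed rule it stays separate even at the most permissive level, whereas unrestricted transitivity (`min_link_fraction=0.0`) would pull all four proteins into one cluster.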


Year: 2000   PMID: 10592179   PMCID: PMC102438   DOI: 10.1093/nar/28.1.49

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


