ProtoMap: automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space.

G Yona, N Linial, M Linial.

Abstract

We investigate the space of all protein sequences in search of clusters of related proteins. Our aim is to automatically detect these sets, and thus obtain a classification of all protein sequences. Our analysis, which uses standard measures of sequence similarity as applied to an all-vs.-all comparison of SWISSPROT, gives a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins; however, transitivity is applied restrictively in an attempt to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Different indices of validity were applied to assess the quality of our classification and compare it with the protein families in the PROSITE and Pfam databases. Our classification agrees with these domain-based classifications for between 64.8% and 88.5% of the proteins. It also finds many new clusters of protein sequences which were not classified by these databases. The hierarchical organization suggested by our analysis reveals finer subfamilies in families of known proteins as well as many novel relations between protein families.
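The procedure outlined in the abstract (a conservative initial classification from the highest-scoring pairs, followed by restricted merging at progressively weaker significance levels) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the merging criterion, so the sketch assumes a hypothetical rule: two clusters merge only when at least `min_fraction` of their cross pairs pass the current threshold. The e-value scores, the function names, and the parameters are likewise assumptions made for illustration.

```python
from itertools import combinations
from math import inf


def initial_classes(proteins, evalues, strict):
    """Connected components over pairs whose e-value is at most `strict`."""
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for pair, e in evalues.items():
        a, b = pair  # each key is a frozenset of two protein names
        if e <= strict:
            parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values()]


def linked_fraction(c1, c2, evalues, threshold):
    """Fraction of cross pairs between c1 and c2 that pass the threshold."""
    hits = sum(
        1
        for a in c1
        for b in c2
        if evalues.get(frozenset((a, b)), inf) <= threshold
    )
    return hits / (len(c1) * len(c2))


def merge_level(clusters, evalues, threshold, min_fraction):
    """Repeatedly merge the best-linked cluster pair (assumed rule)."""
    clusters = list(clusters)
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            f = linked_fraction(clusters[i], clusters[j], evalues, threshold)
            if f >= min_fraction and (best is None or f > best[0]):
                best = (f, i, j)
        if best is None:
            return clusters
        _, i, j = best
        union = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(union)


def hierarchy(proteins, evalues, thresholds, min_fraction=0.5):
    """Return one (threshold, clusters) level per significance threshold.

    `thresholds` must run from the strictest to the most permissive.
    """
    clusters = initial_classes(proteins, evalues, thresholds[0])
    levels = [(thresholds[0], set(clusters))]
    for t in thresholds[1:]:
        clusters = merge_level(clusters, evalues, t, min_fraction)
        levels.append((t, set(clusters)))
    return levels
```

Calling `hierarchy` with thresholds ordered from strictest to most permissive yields one set of clusters per level, and each level's clusters are unions of clusters from the level before. Requiring a minimum fraction of supporting pairs is one simple way to keep a single weak hit from chaining unrelated groups together, which plain single-linkage transitivity would allow.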

Year: 1999    PMID: 10591097

Source DB: PubMed    Journal: Proteins    ISSN: 0887-3585


  24 in total

1.  ProtoMap: automatic classification of protein sequences and hierarchy of protein families.

Authors:  G Yona; N Linial; M Linial
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Estimating the probability for a protein to have a new fold: A statistical computational model.

Authors:  E Portugaly; M Linial
Journal:  Proc Natl Acad Sci U S A       Date:  2000-05-09       Impact factor: 11.205

3.  Large-scale protein annotation through gene ontology.

Authors:  Hanqing Xie; Alon Wasserman; Zurit Levine; Amit Novik; Vladimir Grebinskiy; Avi Shoshan; Liat Mintz
Journal:  Genome Res       Date:  2002-05       Impact factor: 9.043

4.  PHYTOPROT: a database of clusters of plant proteins.

Authors:  S Mohseni-Zadeh; A Louis; P Brézellec; J-L Risler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  ProtoNet: hierarchical classification of the protein space.

Authors:  Ori Sasson; Avishay Vaaknin; Hillel Fleischer; Elon Portugaly; Yonatan Bilu; Nathan Linial; Michal Linial
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

6.  Protein ranking: from local to global structure in the protein similarity network.

Authors:  Jason Weston; Andre Elisseeff; Dengyong Zhou; Christina S Leslie; William Stafford Noble
Journal:  Proc Natl Acad Sci U S A       Date:  2004-04-15       Impact factor: 11.205

7.  Alignment of protein sequences by their profiles.

Authors:  Marc A Marti-Renom; M S Madhusudhan; Andrej Sali
Journal:  Protein Sci       Date:  2004-04       Impact factor: 6.725

8.  Scale-free networks versus evolutionary drift.

Authors:  Teresa M Przytycka; Yi-Kuo Yu
Journal:  Comput Biol Chem       Date:  2004-10       Impact factor: 2.877

9.  Identification of genomic features using microsyntenies of domains: domain teams.

Authors:  Sophie Pasek; Anne Bergeron; Jean-Loup Risler; Alexandra Louis; Emmanuelle Ollivier; Mathieu Raffinot
Journal:  Genome Res       Date:  2005-05-17       Impact factor: 9.043

10.  Graph theoretical insights into evolution of multidomain proteins.

Authors:  Teresa Przytycka; George Davis; Nan Song; Dannie Durand
Journal:  J Comput Biol       Date:  2006-03       Impact factor: 1.479
