
Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins.

I Rigoutsos, A Floratos, C Ouzounis, Y Gao, L Parida.

Abstract

Using Teiresias, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families, revealing previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets can also capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and more sensitive engines for homology searches, evolutionary studies, and protein structure prediction.
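
The coverage figure quoted in the abstract (the share of amino acid positions that fall inside at least one dictionary pattern) can be illustrated with a small sketch. This is not the Teiresias algorithm and not the authors' code: the function name `seqlet_coverage`, the toy sequences, and the toy patterns are all hypothetical. The sketch assumes that `.` in a pattern stands for exactly one arbitrary residue, that wildcard positions count as covered, and that every occurrence of a pattern counts as an instance even when two occurrences lie in the same sequence.

```python
import re


def seqlet_coverage(sequences, patterns, min_instances=2):
    """Fraction of residue positions covered by patterns with enough instances.

    Assumptions (not taken from the paper): '.' matches exactly one residue,
    wildcard positions count as covered, and instances are counted per
    occurrence rather than per sequence.
    """
    covered = [[False] * len(seq) for seq in sequences]
    for pattern in patterns:
        # A lookahead with a capture group lets finditer report overlapping matches.
        regex = re.compile(f"(?=({pattern}))")
        hits = [
            (i, m.start(1), m.end(1))
            for i, seq in enumerate(sequences)
            for m in regex.finditer(seq)
        ]
        if len(hits) < min_instances:
            continue  # a pattern seen fewer than min_instances times is not a dictionary entry
        for i, start, end in hits:
            for pos in range(start, end):
                covered[i][pos] = True
    total = sum(len(seq) for seq in sequences)
    return sum(map(sum, covered)) / total if total else 0.0


# Toy data, invented for illustration only.
toy_sequences = ["MKVLAAG", "TKVLSAG", "QQQQ"]
toy_patterns = ["KVL.AG", "QQ", "MK"]
print(f"{seqlet_coverage(toy_sequences, toy_patterns):.3f}")
```

In the toy data, `KVL.AG` occurs once in each of the first two sequences and `QQ` occurs three times (overlapping) in the third, so both qualify, while `MK` occurs only once and is skipped. The real dictionary was built by discovering the patterns in the first place, which this sketch takes as given.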

Year:  1999        PMID: 10584071     DOI: 10.1002/(sici)1097-0134(19991101)37:2<264::aid-prot11>3.0.co;2-c

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585

