
Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

Alex C W May

Abstract

It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) with that across alignments of 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and the distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general and could be applied to any area of comparative biological sequence and 3D structure analysis where the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.
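The abstract does not spell out the algorithm or its objective function, so the following is only an illustrative sketch of how an order-constrained partition can be made globally optimal at every granularity. It assumes each alignment column has already been reduced to a single variability score (for example an entropy value) and that segment quality is measured by the within-segment sum of squared deviations; both choices, and the name `optimal_segmentation`, are assumptions for illustration rather than details taken from the paper. The technique shown is standard dynamic programming over contiguous segments.

```python
from itertools import accumulate


def optimal_segmentation(values, max_segments):
    """Optimally split an ordered list of scores into contiguous segments.

    Returns {k: (cost, starts)} for k = 1..max_segments, where `starts`
    holds the start index of each of the k segments and `cost` is the
    minimal total within-segment sum of squared deviations (an assumed
    objective, not necessarily the one used in the paper).
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    max_segments = min(max_segments, n)

    # Prefix sums let us compute the cost of any segment in O(1).
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def segment_cost(i, j):
        """Sum of squared deviations from the mean for values[i:j], j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal cost of splitting values[:j] into k segments.
    cost = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    cost[0][0] = 0.0

    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + segment_cost(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    back[k][j] = i

    # Trace back one optimal partition for every granularity k.
    results = {}
    for k in range(1, max_segments + 1):
        starts, j = [], n
        for level in range(k, 0, -1):
            i = back[level][j]
            starts.append(i)
            j = i
        starts.reverse()
        results[k] = (cost[k][n], starts)
    return results


if __name__ == "__main__":
    # Hypothetical per-column variability scores for a short alignment.
    profile = [0.1, 0.1, 0.1, 2.0, 2.0, 2.0, 0.5, 0.5]
    for k, (total_cost, starts) in optimal_segmentation(profile, 4).items():
        print(k, round(total_cost, 6), starts)
```

Because every contiguous split is considered through the recurrence, the partition returned for each number of segments is a global optimum under the assumed cost, and a single table fill yields the solutions for all granularities up to `max_segments`. The run time is cubic in the worst case (quadratic in alignment length for each segment count), which is practical for alignments of typical protein length.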

Year:  2002        PMID: 12441381      PMCID: PMC2373737          DOI: 10.1110/ps.0211202

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.725


