
Fold homology detection using sequence fragment composition profiles of proteins.

Armando D Solis, Shalom R Rackovsky.

Abstract

The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so-called "twilight zone" problem of sequence alignment). Alternative sequence comparison strategies that can detect structural kinship among highly divergent sequences are needed to address this problem. Among them are alignment-free methods, which use global sequence properties (such as amino acid composition) to identify structural homology rapidly and straightforwardly. We explore the viability of using tetramer sequence fragment composition profiles to find structural relationships that go undetected by traditional alignment. We establish a strategy to recast any given protein sequence into a tetramer sequence fragment composition profile, using a series of amino acid clustering steps that have been optimized for mutual information. Our method compresses the set of 160,000 unique tetramers (20^4, when using the 20-letter amino acid alphabet) into a more tractable number of reduced tetramers (approximately 15-30), so that a meaningful tetramer composition profile can be constructed. We test remote homology detection at the topology and fold superfamily levels using a comprehensive set of fold homologs, culled from the CATH database, that share low pairwise sequence similarity. Using the receiver-operating characteristic measure, we demonstrate a potentially significant improvement from information-optimized reduced tetramer composition, over methods relying only on raw amino acid composition or on traditional sequence alignment, in homology detection at or below the "twilight zone". © 2010 Wiley-Liss, Inc.
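The general idea of an alignment-free, reduced-alphabet tetramer composition profile can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' method: it uses a hypothetical two-letter hydrophobic/polar alphabet (`HYDROPHOBIC`, `reduce_sequence`) in place of the paper's mutual-information-optimized amino acid clustering, an assumed Euclidean distance between profiles as the similarity score, and a pairwise (Mann-Whitney) estimate of the area under the ROC curve. All names here are hypothetical. The 16 reduced tetramers follow only from the two-letter alphabet (2^4) and are not the paper's clusters.

```python
from itertools import product
from math import sqrt

# HYPOTHETICAL reduced alphabet: a simple hydrophobic ("H") / polar ("P") split.
# The paper instead derives its clusters by optimizing mutual information.
HYDROPHOBIC = set("AVLIMFWCY")
K = 4  # fragment (tetramer) length

# All 2**4 = 16 reduced tetramers, compared with 20**4 = 160,000 raw tetramers.
REDUCED_TETRAMERS = ["".join(t) for t in product("HP", repeat=K)]


def reduce_sequence(seq: str) -> str:
    """Map each residue to its reduced letter; unknown letters are treated as polar."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq.upper())


def tetramer_profile(seq: str) -> list:
    """Relative frequency of each reduced tetramer over overlapping windows."""
    reduced = reduce_sequence(seq)
    counts = dict.fromkeys(REDUCED_TETRAMERS, 0)
    n_windows = max(len(reduced) - K + 1, 0)
    for i in range(n_windows):
        counts[reduced[i:i + K]] += 1
    if n_windows == 0:
        return [0.0] * len(REDUCED_TETRAMERS)  # sequence shorter than one tetramer
    return [counts[t] / n_windows for t in REDUCED_TETRAMERS]


def profile_distance(p: list, q: list) -> float:
    """Euclidean distance between two composition profiles (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def roc_auc(pos_scores: list, neg_scores: list) -> float:
    """ROC area as the probability a homolog outscores a non-homolog (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not CATH data.
    query = tetramer_profile("MKVLAAGIVLLSTEQK")
    candidates = {"seq_a": "MKILAVGLVLLSSDQR", "seq_b": "DEKSGNPQTRSEDKGN"}
    for name, s in candidates.items():
        # Negate the distance so that larger scores mean more similar.
        print(name, -profile_distance(query, tetramer_profile(s)))
```

Because every profile has the same fixed length regardless of sequence length, any two proteins can be compared without an alignment. Given scores for known fold homologs and non-homologs, `roc_auc` summarizes how well such a score ranks true homologs above the rest.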

Year:  2010        PMID: 20635424      PMCID: PMC2933786          DOI: 10.1002/prot.22788

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585
