Fold homology detection using sequence fragment composition profiles of proteins.

Abstract

The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so-called "twilight zone" problem of sequence alignment). Alternative sequence comparison strategies able to detect structural kinship among highly divergent sequences are needed. Among them are alignment-free methods, which use global sequence properties (such as amino acid composition) to identify structural homology in a rapid and straightforward way. We explore the viability of using tetramer sequence fragment composition profiles to find structural relationships that go undetected by traditional alignment. We establish a strategy to recast any given protein sequence into a tetramer sequence fragment composition profile, using a series of amino acid clustering steps that have been optimized for mutual information. Our method compresses the set of 160,000 unique tetramers (20^4, if using the 20-letter amino acid alphabet) into a more tractable number of reduced tetramers (approximately 15-30), so that a meaningful tetramer composition profile can be constructed. We test remote homology detection at the topology and fold superfamily levels using a comprehensive set of fold homologs culled from the CATH database that share low pairwise sequence similarity. Using the receiver-operating characteristic measure, we demonstrate potentially significant improvement in homology detection at or below the "twilight zone" when using information-optimized reduced tetramer composition, compared with methods relying only on the raw amino acid composition or on traditional sequence alignment. © 2010 Wiley-Liss, Inc.
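The recasting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-group alphabet below (hydrophobic vs. polar) is a hypothetical stand-in for the mutual-information-optimized clusters, which the abstract does not list, and the function names `reduce_sequence`, `tetramer_profile`, and `profile_distance` are illustrative only. With two groups there are 2^4 = 16 reduced tetramers, which happens to fall in the 15-30 range the abstract mentions.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: a hypothetical two-group reduced alphabet (H = hydrophobic,
# P = polar). The paper derives its clusters by optimizing mutual
# information; those clusters are not given in the abstract.
GROUPS = {
    "H": "ACFILMVWY",
    "P": "DEGHKNPQRST",
}
REDUCE = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each standard residue to its group letter.

    Non-standard characters are dropped, which is a simplification:
    it joins the residues on either side of the dropped character.
    """
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def tetramer_profile(seq, k=4):
    """Return the relative frequency of every reduced k-mer in seq."""
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def profile_distance(p, q):
    """L1 distance between two profiles built over the same key set."""
    return sum(abs(p[key] - q[key]) for key in p)
```

Two sequences can then be compared without alignment by calling `profile_distance(tetramer_profile(a), tetramer_profile(b))`; the L1 distance is only one possible choice of comparison measure and is also an assumption here.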

Substances: Proteins

Year: 2010; PMID: 20635424; PMCID: PMC2933786; DOI: 10.1002/prot.22788

Source DB: PubMed; Journal: Proteins; ISSN: 0887-3585

