Gaps in structurally similar proteins: towards improvement of multiple sequence alignment.

James O Wrabl, Nick V Grishin.

Abstract

An algorithm was developed to locally optimize gaps from the FSSP database. Over 2 million gaps were identified from all versus all FSSP structure comparisons, and datasets of non-identical gaps and flanking regions comprising between 90,000 and 135,000 sequence fragments were extracted for statistical analysis. Relative to background frequencies, gaps were enriched in residue types with small side chains and high turn propensity (D, G, N, P, S), and were depleted in residue types with hydrophobic side chains (C, F, I, L, V, W, Y). In contrast, regions flanking a gap exhibited opposite trends in amino acid frequencies, i.e., enrichment in hydrophobic residues and a high degree of secondary structure. Log-odds scores of residue type as a function of position in or around a gap were derived from the statistics. Three simple experiments demonstrated that these scores contained significant predictive information. First, regions where gaps were observed in single sequences taken from HOMSTRAD structure-based multiple sequence alignments generally scored higher than regions where gaps were not observed. Second, given the correct pairwise-aligned cores, the actual positions of gaps could be reproduced from sequence more accurately using the structurally-derived statistics than by using random pairwise alignments. Finally, revision of the Clustal-W residue-specific gap opening parameters with this new information improved the agreement of Clustal-W alignments with the structure-based alignments. At least three applications for these results are envisioned: improvement of gap penalties in pairwise (or multiple) sequence alignment, prediction of regions of single sequences likely (or unlikely) to contain indels, and more accurate placement of gaps in automated pairwise structure alignment. Copyright 2003 Wiley-Liss, Inc.
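As an illustration of the log-odds idea described in the abstract, the following minimal Python sketch scores each residue type by comparing its frequency inside gaps with its background frequency. It is not the authors' implementation: the function name `log_odds_scores`, the pseudocount, the base-2 logarithm, and the toy fragments are all assumptions made for the example. The sketch also ignores position within or around the gap, which the published scores take into account.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_scores(gap_fragments, background_fragments, pseudocount=1.0):
    """Return {residue: log2(f_gap / f_background)} for the 20 amino acids.

    Hypothetical helper: the pseudocount and log base are illustrative
    choices, not values taken from the paper.
    """
    gap_counts = Counter(
        r for frag in gap_fragments for r in frag if r in AMINO_ACIDS
    )
    bg_counts = Counter(
        r for frag in background_fragments for r in frag if r in AMINO_ACIDS
    )

    # Add a pseudocount per residue type so unseen residues get a finite score.
    gap_total = sum(gap_counts.values()) + pseudocount * len(AMINO_ACIDS)
    bg_total = sum(bg_counts.values()) + pseudocount * len(AMINO_ACIDS)

    scores = {}
    for aa in AMINO_ACIDS:
        f_gap = (gap_counts[aa] + pseudocount) / gap_total
        f_bg = (bg_counts[aa] + pseudocount) / bg_total
        scores[aa] = math.log2(f_gap / f_bg)
    return scores


# Toy, made-up fragments (not data from the FSSP or HOMSTRAD databases).
gap_fragments = ["GNG", "DGS", "PGN", "SDG"]
background_fragments = ["MKVLAAGIVGLD", "FYWCILVAGSNP", "ACDEFGHIKLMNPQRSTVWY"]

scores = log_odds_scores(gap_fragments, background_fragments)
for aa in "GL":
    print(aa, round(scores[aa], 2))
```

A positive score marks a residue type that is more frequent in gaps than in the background, and a negative score marks one that is depleted. In this toy data G comes out positive and L negative, which is consistent in sign with the trends reported in the abstract but carries no statistical weight.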

Year:  2004        PMID: 14705025     DOI: 10.1002/prot.10508

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


  11 in total

1.  Analysis of protein homology by assessing the (dis)similarity in protein loop regions.

Authors:  Anna R Panchenko; Thomas Madej
Journal:  Proteins       Date:  2004-11-15

2.  Aligning sequences by minimum description length.

Authors:  John S Conery
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

3.  Rigorous performance evaluation in protein structure modelling and implications for computational biology. (Review)

Authors:  John Moult
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2006-03-29       Impact factor: 6.237

4.  Characterization of parasite-specific indels and their proposed relevance for selective anthelminthic drug targeting.

Authors:  Qi Wang; Esley Heizer; Bruce A Rosa; Scott A Wildman; James W Janetka; Makedonka Mitreva
Journal:  Infect Genet Evol       Date:  2016-01-30       Impact factor: 3.342

5.  Measuring guide-tree dependency of inferred gaps in progressive aligners.

Authors:  Salvador Capella-Gutiérrez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2013-02-23       Impact factor: 6.937

6.  Systematic analysis of short internal indels and their impact on protein folding.

Authors:  RyangGuk Kim; Jun-tao Guo
Journal:  BMC Struct Biol       Date:  2010-08-04

7.  New tips for structure prediction by comparative modeling.

Authors:  Anwar Rayan
Journal:  Bioinformation       Date:  2009-01-12

8.  The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors:  Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

9.  DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage.

Authors:  Nicole de la Chaux; Philipp W Messer; Peter F Arndt
Journal:  BMC Evol Biol       Date:  2007-10-12       Impact factor: 3.260

10.  The identification of complete domains within protein sequences using accurate E-values for semi-global alignment.

Authors:  Maricel G Kann; Sergey L Sheetlin; Yonil Park; Stephen H Bryant; John L Spouge
Journal:  Nucleic Acids Res       Date:  2007-06-27       Impact factor: 16.971
