Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments.

Mike S S Chang, Steven A Benner.

Abstract

To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments containing exactly one gap, and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure-breaking" amino acids (PGDNS) were preferred within and flanking gapped regions, as were two residues with hydrophilic side chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occurred infrequently within and flanking the gapped region. These preferences differ modestly between protein pairs separated by an episode of adaptive evolution and pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, showing no evidence that an indel event must be compensated by local amino acid replacement. Gap lengths are best approximated by a Zipfian distribution, with the probability of a gap of length L decreasing as L^(-1.8). These features are largely independent of the length of the gap and of the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Surprisingly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence, suggesting that a signature of the mechanism by which indels occur in DNA remains in the encoded protein sequences. These data suggest specific tools for scoring gap placement in an alignment, as well as tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and over-predicted introns.
By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
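The abstract proposes turning these observations into tools for scoring gap placement. The Python sketch below is a hypothetical illustration of that idea, not the authors' method: it combines a length term taken from the reported Zipfian distribution (probability proportional to L^(-1.8), truncated at an assumed maximum gap length so that it can be normalized) with a composition term that rewards the residues reported as enriched in and around gaps (PGDNS, QE) and penalizes the hydrophobic ones (FMILYVW). The function names, the three-residue flank, the truncation at 100 residues and the 0.5 weight are all assumptions chosen for illustration.

```python
import math

# Exponent reported in the abstract: P(L) is proportional to L**-1.8.
ZIPF_EXPONENT = 1.8

# Residue classes named in the abstract.
GAP_FAVORED = set("PGDNSQE")     # structure-breaking PGDNS plus surface QE
GAP_DISFAVORED = set("FMILYVW")  # hydrophobic residues


def zipf_gap_log_prob(length, max_length=100, exponent=ZIPF_EXPONENT):
    """Log-probability of a gap of `length` under a truncated Zipf law.

    Normalizing over 1..max_length is an assumption made here so the
    probabilities sum to 1; the abstract reports only the exponent.
    """
    if not 1 <= length <= max_length:
        raise ValueError("length must be in 1..max_length")
    norm = sum(k ** -exponent for k in range(1, max_length + 1))
    return -exponent * math.log(length) - math.log(norm)


def flank_composition_score(seq, start, end, flank=3):
    """Favored minus disfavored residues in the gapped segment seq[start:end]
    plus `flank` residues on each side (hypothetical heuristic)."""
    window = seq[max(0, start - flank):end + flank]
    favored = sum(aa in GAP_FAVORED for aa in window)
    disfavored = sum(aa in GAP_DISFAVORED for aa in window)
    return favored - disfavored


def score_gap(seq, start, end, weight=0.5, max_length=100):
    """Hypothetical combined score for placing a gap opposite seq[start:end]:
    Zipfian length log-probability plus a weighted composition term."""
    length = end - start
    return (zipf_gap_log_prob(length, max_length)
            + weight * flank_composition_score(seq, start, end))


# Usage: compare two candidate placements of a 4-residue indel in a toy sequence.
seq = "MKVLAAGDNSPGQEAILVWFA"
# Both gaps have length 4, so only the composition term differs.
print(score_gap(seq, 6, 10) > score_gap(seq, 15, 19))  # True
```

Because the length term is -1.8 log L plus a constant, a penalty derived from it grows logarithmically with gap length rather than linearly as an affine penalty does; Ngila (item 3 in the list below) is an aligner built around logarithmic and affine gap costs. In a real aligner the two terms would have to be calibrated against each other rather than combined with an arbitrary weight.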

Year: 2004; PMID: 15276848; DOI: 10.1016/j.jmb.2004.05.045

Source DB: PubMed; Journal: J Mol Biol; ISSN: 0022-2836; Impact factor: 5.469

Related articles: 38 in total (first 10 shown)

1.  The interface of protein structure, protein biophysics, and molecular evolution. (Review)

Authors:  David A Liberles; Sarah A Teichmann; Ivet Bahar; Ugo Bastolla; Jesse Bloom; Erich Bornberg-Bauer; Lucy J Colwell; A P Jason de Koning; Nikolay V Dokholyan; Julian Echave; Arne Elofsson; Dietlind L Gerloff; Richard A Goldstein; Johan A Grahnen; Mark T Holder; Clemens Lakner; Nicholas Lartillot; Simon C Lovell; Gavin Naylor; Tina Perica; David D Pollock; Tal Pupko; Lynne Regan; Andrew Roger; Nimrod Rubinstein; Eugene Shakhnovich; Kimmen Sjölander; Shamil Sunyaev; Ashley I Teufel; Jeffrey L Thorne; Joseph W Thornton; Daniel M Weinreich; Simon Whelan
Journal:  Protein Sci       Date:  2012-04-23       Impact factor: 6.725

2.  Prokaryotes that grow optimally in acid have purine-poor codons in long open reading frames.

Authors:  Feng-Hsu Lin; Donald R Forsdyke
Journal:  Extremophiles       Date:  2006-09-07       Impact factor: 2.395

3.  Ngila: global pairwise alignments with logarithmic and affine gap costs.

Authors:  Reed A Cartwright
Journal:  Bioinformatics       Date:  2007-03-25       Impact factor: 6.937

4.  Problems and solutions for estimating indel rates and length distributions.

Authors:  Reed A Cartwright
Journal:  Mol Biol Evol       Date:  2008-11-28       Impact factor: 16.240

5.  An indel in transmembrane helix 2 helps to trace the molecular evolution of class A G-protein-coupled receptors.

Authors:  Julie Devillé; Julien Rey; Marie Chabbert
Journal:  J Mol Evol       Date:  2009-04-09       Impact factor: 2.395

6.  Reply to Tan et al.: Differences between real and simulated proteins in multiple sequence alignments.

Authors:  Kieran Boyce; Fabian Sievers; Desmond G Higgins
Journal:  Proc Natl Acad Sci U S A       Date:  2015-01-06       Impact factor: 11.205

7.  INDELible: a flexible simulator of biological sequence evolution.

Authors:  William Fletcher; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2009-05-07       Impact factor: 16.240

8.  Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0.

Authors:  Cory L Strope; Kevin Abel; Stephen D Scott; Etsuko N Moriyama
Journal:  Mol Biol Evol       Date:  2009-08-03       Impact factor: 16.240

9.  The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors:  Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

10.  Characterizing gene family evolution.

Authors:  David A Liberles; Katharina Dittmar
Journal:  Biol Proced Online       Date:  2008-06-20       Impact factor: 3.244
