
Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments.

O Gotoh.

Abstract

The relative performances of four strategies for aligning a large number of protein sequences were assessed by referring to corresponding structural alignments of 54 independent families. A multiple sequence alignment of each family was constructed by a given method from the sequences of known structures and their homologues; the subset consisting of the sequences of known structures was then extracted from the whole alignment and compared with its structural counterpart in a residue-to-residue fashion. Gap-opening and gap-extension penalties were optimized for each family and method. Each of the four multiple alignment methods gave significantly more accurate alignments than the conventional pairwise method. In addition, a clear difference in performance was detected among three of the four multiple alignment methods examined. The currently most popular progressive method ranked worst among the four, and the randomized iterative strategy that optimizes the sum-of-pairs score ranked next worst. The two best-performing strategies, one of which was newly developed, both pursue an optimal weighted sum-of-pairs score, where the pair weights were introduced to correct for uneven representation of subgroups in a family. The new method uses doubly nested iterations to make the alignment, phylogenetic tree and pair weights mutually consistent. Most importantly, the improvement in accuracy of alignments obtained by these iterative methods over the pairwise or progressive methods tends to increase with decreasing average sequence identity, implying that iterative refinement is more effective for the generally difficult alignment of remotely related sequences. Four well-known amino acid substitution matrices were also tested in combination with the various methods. However, the effects of substitution matrices were found to be minor in the framework of multiple alignment, and the same order of relative performance of the alignment methods was observed with any of the matrices.
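The weighted sum-of-pairs objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the toy match/mismatch/gap scores, and the hand-picked pair weights are all assumptions made for illustration. The study itself used amino acid substitution matrices, gap-opening and gap-extension penalties, and pair weights obtained together with a phylogenetic tree, none of which is reproduced here.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score two aligned rows column by column.

    Toy scheme (an assumption): a flat match/mismatch score and a
    per-column gap cost, standing in for a substitution matrix and
    gap-opening/extension penalties.
    """
    total = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # column gapped in both rows: no contribution
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def weighted_sum_of_pairs(alignment, weights):
    """Return the sum over row pairs i < j of weights[i][j] * pair_score."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(
        weights[i][j] * pair_score(alignment[i], alignment[j])
        for i, j in combinations(range(len(alignment)), 2)
    )


# Hypothetical three-sequence alignment; rows 0 and 1 are close relatives.
alignment = ["ACD-F", "ACDEF", "GCD-F"]

# Uniform weights give the plain (unweighted) sum-of-pairs score.
uniform = [[1.0] * 3 for _ in range(3)]

# Hand-picked weights (an assumption) that down-weight the pair of close
# relatives so an over-represented subgroup does not dominate the score.
weighted = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

print(weighted_sum_of_pairs(alignment, uniform))   # 4.0
print(weighted_sum_of_pairs(alignment, weighted))  # 3.0
```

In this toy case the closely related pair contributes half as much under the weighted score. This shows the intent of the pair weights described above, but not how they were actually derived.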


Year: 1996    PMID: 8980688    DOI: 10.1006/jmbi.1996.0679

Source DB: PubMed    Journal: J Mol Biol    ISSN: 0022-2836    Impact factor: 5.469


