Comprehensive comparison of graph based multiple protein sequence alignment strategies.

Ilya Plyusnin, Liisa Holm.

Abstract

BACKGROUND: Multiple protein sequence alignment (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strictly modular structure, which allows different components of the alignment process to be swapped and, thus, their contributions to alignment quality and computation time to be investigated. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark.
RESULTS: Our results indicate the optimal alignment strategy among the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high-quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm for building a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree-dependent partitioning as a post-processing step, which proved comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal.
CONCLUSIONS: This is the first time that multiple protein alignment strategies have been comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1).
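
The abstract names triplet library extension "with introduction of new edges" without spelling out the transformation. As a rough illustration only, the sketch below follows the commonly described T-Coffee-style formulation, in which the weight of a residue pair is increased by the minimum of the two edge weights along every path through a third residue; whether the article's SeqAn-based program uses exactly this weighting is an assumption. The function name `triplet_extend`, the `add_new_edges` flag and the toy library are hypothetical and are not taken from the article or from SeqAn.

```python
from collections import defaultdict


def edge_key(u, v):
    """Order-independent key for an undirected edge between two residues."""
    return tuple(sorted((u, v)))


def triplet_extend(library, add_new_edges=True):
    """Return a triplet-extended copy of a primary alignment library.

    `library` maps a pair of residues to a weight. Each residue is a
    (sequence_id, position) tuple, and every edge is assumed to join
    residues from two different sequences.
    """
    # Symmetric adjacency view: neighbours[z][u] = weight of edge z-u.
    neighbours = defaultdict(dict)
    for (u, v), weight in library.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    # Start from the primary weights, then add the support from each triplet.
    extended = {edge_key(u, v): w for (u, v), w in library.items()}
    for z, adjacent in neighbours.items():
        items = sorted(adjacent.items())
        for i, (u, w_uz) in enumerate(items):
            for v, w_zv in items[i + 1:]:
                if u[0] == v[0]:
                    continue  # two residues of one sequence are never aligned
                key = edge_key(u, v)
                if key not in extended:
                    if not add_new_edges:
                        continue  # only re-weight edges already in the library
                    extended[key] = 0
                # The path u-z-v supports u-v with its weaker link.
                extended[key] += min(w_uz, w_zv)
    return extended


# Toy primary library (hypothetical weights): A1-B1 and B1-C1 are aligned,
# but no pairwise alignment links A1 and C1 directly.
primary = {
    (("A", 1), ("B", 1)): 10,
    (("B", 1), ("C", 1)): 6,
}
extended = triplet_extend(primary)
```

With the toy library, the extension introduces the edge A1-C1 with weight 6 (the weaker link of the path through B1), while calling `triplet_extend(primary, add_new_edges=False)` leaves the library unchanged because no existing edge gains support from a third sequence.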

Year: 2012; PMID: 22540977; PMCID: PMC3375188; DOI: 10.1186/1471-2105-13-64

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169


