A polynomial time solvable formulation of multiple sequence alignment.

Sing-Hoi Sze, Yue Lu, Qingwu Yang.

Abstract

Since traditional multiple alignment formulations are NP-hard, heuristics are commonly employed to find acceptable alignments with no guaranteed performance bound. This makes it difficult to understand what a resulting alignment means and to assess its quality. We propose an alternative formulation of multiple alignment based on finding a multiple alignment of k sequences that preserves k - 1 pairwise alignments as specified by the edges of a given tree. Although it is well known that such a preserving alignment always exists, it did not become a mainstream method for multiple alignment, since it seems that a lot of information is lost by ignoring pairwise similarities outside the tree. In contrast, by using pairwise alignments that incorporate consistency information from other sequences, we show that it is possible to obtain very good accuracy with the preserving alignment formulation. We show that a reasonable objective is to find the shortest preserving alignment and, by a reduction to a graph-theoretic problem, that the shortest preserving multiple alignment can be found in polynomial time. We demonstrate the success of this approach on three sets of benchmark multiple alignments by taking consistency-based pairwise alignments from the first stage of two of the best-performing progressive alignment algorithms, TCoffee and ProbCons, and replacing the second, heuristic progressive step of these algorithms with the exact preserving alignment step. We apply this strategy to TCoffee and show that our approach outperforms TCoffee on two of the three test sets. We apply the strategy to a variant of ProbCons with no iterative refinements and show that our approach achieves similar or better accuracy except on one test set. We also compare our performance to ProbCons with iterative refinements and show that our approach achieves similar or better accuracy on many subcategories even without further refinements. The most important advantage of the preserving alignment formulation is that the problem is guaranteed to be solvable in polynomial time without using a heuristic. A software program implementing this approach (PSAlign) is available at http://faculty.cs.tamu.edu/shsze/psalign.
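To make the notion of a preserving alignment concrete, the following is a minimal Python sketch of the well-known fact mentioned in the abstract: given pairwise alignments on the edges of a tree, a multiple alignment that preserves all of them always exists. It is not the PSAlign algorithm and does not attempt to find the shortest preserving alignment; the function name `preserving_alignment`, the input layout (matched 0-based position pairs per tree edge), and the toy sequences are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def preserving_alignment(seqs, tree_pairs):
    """Build one multiple alignment that preserves every given pairwise alignment.

    seqs:       dict name -> sequence string
    tree_pairs: dict (a, b) -> list of (i, j) matched 0-based positions,
                one entry per edge (a, b) of a tree on the sequence names
                (hypothetical input format, chosen for this sketch)
    """
    # Union-find over residues, each identified as (sequence name, position).
    parent = {(n, i): (n, i) for n, s in seqs.items() for i in range(len(s))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Residues matched along a tree edge must share a column.
    for (a, b), pairs in tree_pairs.items():
        for i, j in pairs:
            parent[find((a, i))] = find((b, j))

    # Each class is a column; consecutive residues of a sequence order the columns.
    preds = {find(x): set() for x in parent}
    for n, s in seqs.items():
        for i in range(1, len(s)):
            preds[find((n, i))].add(find((n, i - 1)))

    # For a tree of valid pairwise alignments this column graph is acyclic.
    order = list(TopologicalSorter(preds).static_order())
    col = {root: c for c, root in enumerate(order)}

    rows = {}
    for n, s in seqs.items():
        row = ["-"] * len(order)
        for i, ch in enumerate(s):
            row[col[find((n, i))]] = ch
        rows[n] = "".join(row)
    return rows


# Toy example: a star tree with edges A-B and A-C.
seqs = {"A": "ACGT", "B": "AGT", "C": "ACT"}
tree_pairs = {
    ("A", "B"): [(0, 0), (2, 1), (3, 2)],
    ("A", "C"): [(0, 0), (1, 1), (3, 2)],
}
rows = preserving_alignment(seqs, tree_pairs)
# rows == {"A": "ACGT", "B": "A-GT", "C": "AC-T"}
```

In this sketch every unmatched residue receives its own column, so the result can be longer than necessary. Choosing which unmatched residues may share a column is where the shortest-preserving-alignment objective of the paper comes in, and that step is not reproduced here.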

Year: 2006    PMID: 16597242    DOI: 10.1089/cmb.2006.13.309

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479


  4 in total

1.  MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts.

Authors: Xin Deng; Jianlin Cheng
Journal: BMC Bioinformatics    Date: 2011-12-14    Impact factor: 3.169

2.  JCoDA: a tool for detecting evolutionary selection.

Authors: Steven N Steinway; Ruth Dannenfelser; Christopher D Laucius; James E Hayes; Sudhir Nayak
Journal: BMC Bioinformatics    Date: 2010-05-27    Impact factor: 3.169

3.  Protein multiple sequence alignment by hybrid bio-inspired algorithms.

Authors: Vincenzo Cutello; Giuseppe Nicosia; Mario Pavone; Igor Prizzi
Journal: Nucleic Acids Res    Date: 2010-11-10    Impact factor: 16.971

4.  Grammar-based distance in progressive multiple sequence alignment.

Authors: David J Russell; Hasan H Otu; Khalid Sayood
Journal: BMC Bioinformatics    Date: 2008-07-10    Impact factor: 3.169

