Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition.

Jiong Zhang, Ian E H Yen, Pradeep Ravikumar, Inderjit S Dhillon.

Abstract

Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis and underlies applications such as phylogenetic tree construction, profile building, and structure prediction. The task, however, is NP-hard, and current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach to MSA based on the atomic norm was proposed [23], and it significantly improves alignment quality over existing methods. However, the convex program is challenging to solve because its constraint is the intersection of two atomic-norm balls. The existing algorithm can only handle sequences of length up to 50, and its iteration complexity depends on constants whose relation to the natural parameters of MSA is unknown. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm with iteration complexity linear in the problem size (the total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, a regime in which the existing convex programming method fails to converge within one day.
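The abstract's central mechanism, entropy regularization inducing a closed-form subproblem solution, can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes the simplest possible feasible set (a probability simplex over a handful of candidates, rather than an atomic-norm ball over alignments), and the names `entropy_regularized_argmin`, `objective`, and `mu` are hypothetical. It shows only the general fact that minimizing a linear cost plus `mu` times the negative entropy over the simplex has a softmax solution, so no inner iterative solver is needed.

```python
import math


def entropy_regularized_argmin(costs, mu):
    """Minimize sum(c_i * x_i) + mu * sum(x_i * log(x_i)) over the simplex.

    The minimizer has the closed form x_i proportional to exp(-c_i / mu).
    Illustrative toy only; the feasible set here is an assumption.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Shift by the smallest cost for numerical stability; the shift
    # cancels when the weights are normalized.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / mu) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def objective(costs, x, mu):
    """Linear cost plus mu times the negative entropy of x."""
    linear = sum(c * xi for c, xi in zip(costs, x))
    neg_entropy = sum(xi * math.log(xi) for xi in x if xi > 0)
    return linear + mu * neg_entropy


costs = [1.0, 2.0, 4.0]
x_smooth = entropy_regularized_argmin(costs, mu=1.0)   # spread over candidates
x_sharp = entropy_regularized_argmin(costs, mu=0.01)   # close to the hard argmin
uniform = [1.0 / len(costs)] * len(costs)
```

As `mu` shrinks, the solution concentrates on the cheapest candidate, which is why the regularized subproblem approximates the unregularized linear minimization while remaining smooth.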

Year: 2017   PMID: 28871272   PMCID: PMC5581665

Source DB: PubMed   Journal: JMLR Workshop Conf Proc   ISSN: 1938-7288


  14 in total

1.  Recent progress in multiple sequence alignment: a survey. (Review)

Authors:  Cédric Notredame
Journal:  Pharmacogenomics       Date:  2002-01       Impact factor: 2.533

2.  T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors:  C Notredame; D G Higgins; J Heringa
Journal:  J Mol Biol       Date:  2000-09-08       Impact factor: 5.469

3.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

4.  An evolutionary model for maximum likelihood alignment of DNA sequences.

Authors:  J L Thorne; H Kishino; J Felsenstein
Journal:  J Mol Evol       Date:  1991-08       Impact factor: 2.395

5.  Settling the intractability of multiple alignment.

Authors:  Isaac Elias
Journal:  J Comput Biol       Date:  2006-09       Impact factor: 1.479

6.  A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery.

Authors:  Ian E H Yen; Xin Lin; Jiong Zhang; Pradeep Ravikumar; Inderjit S Dhillon
Journal:  JMLR Workshop Conf Proc       Date:  2016

7.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Authors:  Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal:  Mol Syst Biol       Date:  2011-10-11       Impact factor: 11.429

8.  Kalign--an accurate and fast multiple sequence alignment algorithm.

Authors:  Timo Lassmann; Erik L L Sonnhammer
Journal:  BMC Bioinformatics       Date:  2005-12-12       Impact factor: 3.169

9.  MAFFT version 5: improvement in accuracy of multiple sequence alignment.

Authors:  Kazutaka Katoh; Kei-ichi Kuma; Hiroyuki Toh; Takashi Miyata
Journal:  Nucleic Acids Res       Date:  2005-01-20       Impact factor: 16.971

10.  Recent evolutions of multiple sequence alignment algorithms. (Review)

Authors:  Cédric Notredame
Journal:  PLoS Comput Biol       Date:  2007-08       Impact factor: 4.475
