
MUSCA: An Algorithm for Constrained Alignment of Multiple Data Sequences.


Abstract

Given a set of N sequences, the Multiple Sequence Alignment problem is to align these N sequences, possibly with gaps, in a way that brings out the best commonality of the N sequences. MUSCA is a two-stage approach to the alignment problem that identifies two relatively simpler sub-problems whose solutions are used to obtain the alignment of the sequences. We first discover motifs in the N sequences and then extract an appropriate subset of compatible motifs to obtain a good alignment. The motifs of interest to us are the irredundant motifs, whose number is only polynomial in the input size. In practice, however, the number is much smaller (sub-linear). Notice that this step aids in a direct N-wise alignment, as opposed to composing the alignment from lower-order (say pairwise) alignments, and the solution is also independent of the order of the input sequences; hence the algorithm works very well when dealing with a large number of sequences. The second part of the problem, which deals with obtaining a good alignment, is solved using a graph-theoretic approach that computes an induced subgraph satisfying certain simple constraints. We reduce a version of this problem to that of solving an instance of a set covering problem, and thus offer the best possible approximate solution to the problem (provided P ≠ NP). Our experimental results, while preliminary, indicate that this approach is efficient, particularly on large numbers of long sequences, and gives good alignments when tested on biological data such as DNA and protein sequences. We introduce the notion of an alignment number K (2 ≤ K ≤ N), a user-controlled parameter that lends a useful flexibility to the aligning program: this additional requirement constrains the alignment to have at least K sequences agree on a character, whenever possible, in the alignment. The usefulness of the alignment number is corroborated by users, who view it as a natural constraint when dealing with a large number of sequences.
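
The abstract does not spell out how the motif-selection step is reduced to set covering, so the following is only a minimal sketch of the standard greedy heuristic for a generic set cover instance, the classical approach with a logarithmic approximation guarantee. It is not MUSCA's actual reduction or implementation; the function name `greedy_set_cover` and the toy input are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable], subsets: Dict[str, Set[Hashable]]
) -> List[str]:
    """Pick subsets greedily until every element of `universe` is covered.

    Hypothetical helper for illustration; not taken from the MUSCA paper.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Choose the subset covering the most still-uncovered elements;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy instance (assumed data): elements 1..5 and four candidate subsets.
example_universe = {1, 2, 3, 4, 5}
example_subsets = {"a": {1, 2, 3}, "b": {2, 4}, "c": {3, 4}, "d": {4, 5}}
example_cover = greedy_set_cover(example_universe, example_subsets)
```

On this toy instance the greedy rule first takes the subset covering three elements and then the one covering both remaining elements, so the cover uses two subsets.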

Year:  1998        PMID: 11072327

Source DB:  PubMed          Journal:  Genome Inform Ser Workshop Genome Inform


