An Eulerian path approach to local multiple alignment for DNA sequences.

Yu Zhang, Michael S Waterman.

Abstract

Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach are approximately linear in the total size of the sequences analyzed; hence, it can handle thousands of sequences or millions of letters simultaneously. When a de Bruijn graph is constructed from the sequences, most of the conserved segments are amplified as heavy Eulerian paths in the graph, and the original patterns distributed across the sequences are recovered even if they do not exist in any single sequence. This approach can accurately detect unknown conserved regions, whether the patterns are short or long, conserved or degenerate. We further present a Poisson heuristic to estimate the significance of a local multiple alignment. The performance of our method is demonstrated by finding Alu repeats in the human genome. We compare the results with the Alus marked by RepeatMasker, and the two programs are in good agreement. Our method is robust under various conditions and superior to other methods in terms of efficiency and accuracy.
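The graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only counts each k-mer as a weighted edge between its (k-1)-mer prefix and suffix, then greedily follows the heaviest edges in both directions. The function names `build_de_bruijn` and `heavy_path`, the `min_weight` threshold, and the greedy extension rule are assumptions made for illustration; the paper's handling of mismatches, graph transformations, and significance estimation is not reproduced here.

```python
from collections import Counter, defaultdict


def build_de_bruijn(sequences, k):
    """Count every k-mer as a weighted edge (prefix (k-1)-mer -> suffix (k-1)-mer)."""
    edges = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def heavy_path(edges, min_weight=2):
    """Greedy heuristic (an assumption, not the published algorithm):
    start at the heaviest edge and extend left and right along the
    heaviest unvisited edges whose weight is at least min_weight."""
    if not edges:
        return ""
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for (u, v), w in edges.items():
        out_edges[u].append((w, v))
        in_edges[v].append((w, u))

    (start_u, start_v), weight = max(edges.items(), key=lambda item: item[1])
    if weight < min_weight:
        return ""
    visited = {(start_u, start_v)}
    path = start_u + start_v[-1]

    # Extend to the right: append the last letter of each new node.
    node = start_v
    while True:
        options = [(w, v) for w, v in out_edges[node]
                   if w >= min_weight and (node, v) not in visited]
        if not options:
            break
        _, nxt = max(options)
        visited.add((node, nxt))
        path += nxt[-1]
        node = nxt

    # Extend to the left: prepend the first letter of each new node.
    node = start_u
    while True:
        options = [(w, u) for w, u in in_edges[node]
                   if w >= min_weight and (u, node) not in visited]
        if not options:
            break
        _, prev = max(options)
        visited.add((prev, node))
        path = prev[0] + path
        node = prev

    return path
```

For three toy sequences that share the segment `GATTACAGT` inside different flanking letters, calling `heavy_path(build_de_bruijn(seqs, 4))` returns that shared segment, because its k-mers are the only edges seen more than once. Both functions make a single pass over the input or the edge set, which is in the spirit of the roughly linear cost the abstract describes.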

Year: 2005; PMID: 15668398; PMCID: PMC547885; DOI: 10.1073/pnas.0409240102

Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 11.205


