Aligning sequences by minimum description length.

John S Conery

Abstract

This paper presents a new information-theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix will encode the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from CLUSTALW. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.
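
The code-length idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual encoding: the four-letter alphabet, the uniform background frequencies, the +1/-1 log-odds scores, and the function names (`conditional`, `independent_bits`, `aligned_bits`) are all assumptions chosen for illustration. The sketch only shows why encoding one substring conditionally on a similar one can take fewer bits than encoding the two independently.

```python
import math

# Hypothetical background frequencies over a toy alphabet (an assumption,
# not values taken from the paper).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def score(a, b):
    """Hypothetical log-odds score in bits: +1 for identity, -1 otherwise."""
    return 1.0 if a == b else -1.0


def conditional(a):
    """Return P(b | a) derived from the toy scores, renormalized to sum to 1."""
    weights = {b: BACKGROUND[b] * 2 ** score(a, b) for b in BACKGROUND}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def independent_bits(x, y):
    """Bits needed to send x and y letter by letter, ignoring any similarity."""
    return sum(-math.log2(BACKGROUND[c]) for c in x + y)


def aligned_bits(x, y):
    """Bits needed to send x, then each letter of y conditioned on the
    letter of x aligned with it (ungapped, equal-length substrings only)."""
    if len(x) != len(y):
        raise ValueError("this sketch handles ungapped, equal-length substrings")
    bits = 0.0
    for a, b in zip(x, y):
        bits += -math.log2(BACKGROUND[a])      # cost of the letter in x
        bits += -math.log2(conditional(a)[b])  # cost of y's letter given x's
    return bits


if __name__ == "__main__":
    for x, y in [("ACGT", "ACGT"), ("ACGT", "ACGA"), ("AAAA", "CCCC")]:
        print(x, y, round(independent_bits(x, y), 2), round(aligned_bits(x, y), 2))
```

Under these toy values, similar substrings are cheaper to send as an aligned pair, while dissimilar substrings are cheaper to send independently. That comparison is the kind of trade-off a minimum description length criterion uses to decide which regions to align.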

Year:  2007        PMID: 18274649      PMCID: PMC3171350          DOI: 10.1155/2007/72936

Source DB:  PubMed          Journal:  EURASIP J Bioinform Syst Biol        ISSN: 1687-4145

