Finite-state models in the alignment of macromolecules.

L Allison, C S Wallace, C N Yee.

Abstract

Minimum message length encoding is a technique of inductive inference with theoretical and practical advantages. It allows the posterior odds-ratio of two theories or hypotheses to be calculated. Here it is applied to problems of aligning or relating two strings, in particular two biological macromolecules. We compare the r-theory, that the strings are related, with the null-theory, that they are not related. If they are related, the probabilities of the various alignments can be calculated. This is done for one-, three-, and five-state models of relation or mutation. These correspond to linear and piecewise linear cost functions on runs of insertions and deletions. We describe how to estimate parameters of a model. The validity of a model is itself an hypothesis and can be objectively tested. This is done on real DNA strings and on artificial data. The tests on artificial data indicate limits on what can be inferred in various situations. The tests on real DNA support either the three- or five-state models over the one-state model. Finally, a fast, approximate minimum message length string comparison algorithm is described.
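The comparison of the r-theory with the null-theory can be illustrated with a minimal sketch. It assumes a one-state model over DNA with uniform base frequencies. The step probabilities `P_MATCH`, `P_CHANGE`, `P_INSERT` and `P_DELETE` are hypothetical illustrative values, not estimates from the paper. The r-theory message length sums the probability over every alignment with a dynamic-programming recurrence, instead of using only the single best alignment. The null-theory codes each base independently at 2 bits. For simplicity, the sketch ignores the cost of stating the string lengths and of ending the alignment, which a full treatment would have to include.

```python
import math

# Hypothetical one-state model parameters (illustrative assumption, not from the paper):
# probabilities of the four kinds of alignment step, which should sum to 1.
P_MATCH, P_CHANGE, P_INSERT, P_DELETE = 0.7, 0.1, 0.1, 0.1
ALPHABET_SIZE = 4  # DNA bases A, C, G, T, assumed equally likely


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over natural-log values."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))


def r_theory_bits(a, b):
    """Message length in bits of a and b under the r-theory, summed over ALL alignments."""
    # Natural-log probability of each step together with the characters it emits.
    ln_same = math.log(P_MATCH / ALPHABET_SIZE)  # pair (x, x)
    ln_diff = math.log(P_CHANGE / (ALPHABET_SIZE * (ALPHABET_SIZE - 1)))  # pair (x, y), x != y
    ln_ins = math.log(P_INSERT / ALPHABET_SIZE)  # pair (-, y)
    ln_del = math.log(P_DELETE / ALPHABET_SIZE)  # pair (x, -)
    n, m = len(a), len(b)
    # f[i][j] = ln P(a[:i], b[:j]), summed over every alignment of the two prefixes.
    f = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0 and j > 0:
                step = ln_same if a[i - 1] == b[j - 1] else ln_diff
                terms.append(f[i - 1][j - 1] + step)
            if i > 0:
                terms.append(f[i - 1][j] + ln_del)
            if j > 0:
                terms.append(f[i][j - 1] + ln_ins)
            f[i][j] = _logsumexp(terms)
    return -f[n][m] / math.log(2)  # convert nats to bits


def null_theory_bits(a, b):
    """Message length in bits under the null-theory: each base costs log2(4) = 2 bits."""
    return (len(a) + len(b)) * math.log2(ALPHABET_SIZE)
```

Assuming equal prior probabilities for the two theories, `null_theory_bits(a, b) - r_theory_bits(a, b)` is the base-2 logarithm of the posterior odds-ratio in favour of the r-theory. A positive value favours relatedness. Summing over all alignments, rather than taking the single best one, is what makes this a statement about the strings being related and not about one particular alignment.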

Year: 1992
PMID: 1518085
DOI: 10.1007/bf00160262
Source DB: PubMed
Journal: J Mol Evol
ISSN: 0022-2844
Impact factor: 2.395

