
Statistical alignment: computational properties, homology testing and goodness-of-fit.

J Hein, C Wiuf, B Knudsen, M B Møller, G Wibling.

Abstract

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.

Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis.

Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.

Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one residue, contrary to what is assumed by the model. Copyright 2000 Academic Press.
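The banding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the TKF91 recursion and not the authors' implementation: the function name `banded_forward`, the toy weights `p_match`, `p_mismatch` and `p_gap`, and the choice to centre the band on the main diagonal (rather than on a similarity-based alignment, as the abstract describes) are all assumptions made for illustration. The sketch only shows how restricting a dynamic-programming sum to a band reduces the number of cells visited.

```python
def banded_forward(a, b, band, p_match=0.9, p_mismatch=0.02, p_gap=0.05):
    """Sum toy alignment weights over all alignment paths that stay
    within `band` cells of the main diagonal.

    Hypothetical illustration only: the weights are arbitrary and do not
    form the TKF91 model or any normalised probability model.
    """
    n, m = len(a), len(b)
    # f[i][j] = total weight of partial alignments of a[:i] with b[:j].
    # Cells outside the band are never visited and stay at 0.0.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        # Restrict the columns to the band around the diagonal j = i.
        lo = max(0, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                # Aligned pair: match or mismatch weight.
                emit = p_match if a[i - 1] == b[j - 1] else p_mismatch
                total += f[i - 1][j - 1] * emit
            if i > 0:
                # Residue of `a` aligned to a gap.
                total += f[i - 1][j] * p_gap
            if j > 0:
                # Residue of `b` aligned to a gap.
                total += f[i][j - 1] * p_gap
            f[i][j] = total
    return f[n][m]
```

With a band at least as wide as the longer sequence, the sum covers every alignment path; a narrower band visits roughly `(2 * band + 1) * len(a)` cells and returns a lower bound on the full sum. In this simplified form the result is zero whenever the length difference of the two sequences exceeds `band`, which is one reason to centre the band on a guide alignment instead of the diagonal.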


Year:  2000        PMID: 10964574     DOI: 10.1006/jmbi.2000.4061

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


  23 in total

1.  Discovering common stem-loop motifs in unaligned RNA sequences.

Authors:  J Gorodkin; S L Stricklin; G D Stormo
Journal:  Nucleic Acids Res       Date:  2001-05-15       Impact factor: 16.971

2.  Pfold: RNA secondary structure prediction using stochastic context-free grammars.

Authors:  Bjarne Knudsen; Jotun Hein
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

3.  MAVID: constrained ancestral alignment of multiple sequences.

Authors:  Nicolas Bray; Lior Pachter
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

4.  A stochastic evolutionary model for protein structure alignment and phylogeny.

Authors:  Christopher J Challis; Scott C Schmidler
Journal:  Mol Biol Evol       Date:  2012-06-21       Impact factor: 16.240

5.  Using evolutionary Expectation Maximization to estimate indel rates.

Authors:  Ian Holmes
Journal:  Bioinformatics       Date:  2005-02-24       Impact factor: 6.937

6.  Reconstructing large regions of an ancestral mammalian genome in silico.

Authors:  Mathieu Blanchette; Eric D Green; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2004-12       Impact factor: 9.043

7.  Homology modelling and molecular dynamics simulations: comparative studies of human aquaporin-1.

Authors:  Richard J Law; Mark S P Sansom
Journal:  Eur Biophys J       Date:  2004-04-08       Impact factor: 1.733

8.  Evolutionary triplet models of structured RNA.

Authors:  Robert K Bradley; Ian Holmes
Journal:  PLoS Comput Biol       Date:  2009-08-28       Impact factor: 4.475

9.  BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.

Authors:  Rahul Satija; Ádám Novák; István Miklós; Rune Lyngsø; Jotun Hein
Journal:  BMC Evol Biol       Date:  2009-08-28       Impact factor: 3.260

10.  Quantifying variances in comparative RNA secondary structure prediction.

Authors:  James W J Anderson; Ádám Novák; Zsuzsanna Sükösd; Michael Golden; Preeti Arunapuram; Ingolfur Edvardsson; Jotun Hein
Journal:  BMC Bioinformatics       Date:  2013-05-01       Impact factor: 3.169
