
On the inference of Dirichlet mixture priors for protein sequence comparison.

Xugang Ye, Yi-Kuo Yu, Stephen F Altschul.

Abstract

Dirichlet mixtures provide an elegant formalism for constructing and evaluating protein multiple sequence alignments. Their use requires the inference of Dirichlet mixture priors from curated sets of accurately aligned sequences. This article addresses two questions relevant to such inference: of how many components should a Dirichlet mixture consist, and how may a maximum-likelihood mixture be derived from a given data set. To apply the Minimum Description Length principle to the first question, we extend an analytic formula for the complexity of a Dirichlet model to Dirichlet mixtures by informal argument. We apply a Gibbs-sampling based approach to the second question. Using artificial data generated by a Dirichlet mixture, we demonstrate that our methods are able to approximate well the true theory, when it exists. We apply our methods as well to real data, and infer Dirichlet mixtures that describe the data better than does a mixture derived using previous approaches.
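As a rough illustration of the formalism the abstract refers to, the sketch below scores one alignment column of observed residue counts under a Dirichlet mixture, using the standard closed-form Dirichlet marginal for each component. It is not the inference procedure of the article: it performs no Minimum Description Length model selection and no Gibbs sampling. The function names, the three-letter toy alphabet, and all parameter values are hypothetical and chosen only for the example.

```python
import math

def log_dirichlet_marginal(counts, alpha):
    """Log-probability of an ordered column with these counts under Dirichlet(alpha).

    Uses Gamma(A)/Gamma(A+N) * prod_i Gamma(alpha_i+n_i)/Gamma(alpha_i),
    where A = sum(alpha) and N = sum(counts).
    """
    total_alpha = sum(alpha)
    total_count = sum(counts)
    result = math.lgamma(total_alpha) - math.lgamma(total_alpha + total_count)
    for n, a in zip(counts, alpha):
        result += math.lgamma(a + n) - math.lgamma(a)
    return result

def component_log_terms(counts, weights, alphas):
    """log(weight_k) + log P(counts | alpha_k) for every mixture component k."""
    return [math.log(w) + log_dirichlet_marginal(counts, a)
            for w, a in zip(weights, alphas)]

def mixture_log_likelihood(counts, weights, alphas):
    """Log-likelihood of one column under the mixture (log-sum-exp for stability)."""
    terms = component_log_terms(counts, weights, alphas)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def component_posteriors(counts, weights, alphas):
    """Posterior probability that each component generated the column."""
    terms = component_log_terms(counts, weights, alphas)
    total = mixture_log_likelihood(counts, weights, alphas)
    return [math.exp(t - total) for t in terms]

# Hypothetical two-component mixture over a toy three-letter alphabet.
weights = [0.6, 0.4]
alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
column = [4, 0, 1]  # observed counts of each letter in one alignment column

print(mixture_log_likelihood(column, weights, alphas))
print(component_posteriors(column, weights, alphas))
```

Summing `mixture_log_likelihood` over the columns of a data set gives the quantity a maximum-likelihood fit would try to increase; for protein data each parameter vector would have 20 entries, one per amino acid.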

Year:  2011        PMID: 21702690      PMCID: PMC3145951          DOI: 10.1089/cmb.2011.0040

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  9 in total

1.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.

Authors:  S Geman; D Geman
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  1984-06       Impact factor: 6.226

2.  Compositional adjustment of Dirichlet mixture priors.

Authors:  Xugang Ye; Yi-Kuo Yu; Stephen F Altschul
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

3.  Optimization by simulated annealing.

Authors:  S Kirkpatrick; C D Gelatt; M P Vecchi
Journal:  Science       Date:  1983-05-13       Impact factor: 47.728

4.  The complexity of the Dirichlet model for multiple alignment data.

Authors:  Yi-Kuo Yu; Stephen F Altschul
Journal:  J Comput Biol       Date:  2011-06-24       Impact factor: 1.479

5.  Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology.

Authors:  K Sjölander; K Karplus; M Brown; R Hughey; A Krogh; I S Mian; D Haussler
Journal:  Comput Appl Biosci       Date:  1996-08

6.  Using Dirichlet mixture priors to derive hidden Markov models for protein families.

Authors:  M Brown; R Hughey; A Krogh; I S Mian; K Sjölander; D Haussler
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1993

7.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

Authors:  C E Lawrence; S F Altschul; M S Boguski; J S Liu; A F Neuwald; J C Wootton
Journal:  Science       Date:  1993-10-08       Impact factor: 47.728

8.  The construction and use of log-odds substitution scores for multiple sequence alignment.

Authors:  Stephen F Altschul; John C Wootton; Elena Zaslavsky; Yi-Kuo Yu
Journal:  PLoS Comput Biol       Date:  2010-07-15       Impact factor: 4.475

9.  PSI-BLAST pseudocounts and the minimum description length principle.

Authors:  Stephen F Altschul; E Michael Gertz; Richa Agarwala; Alejandro A Schäffer; Yi-Kuo Yu
Journal:  Nucleic Acids Res       Date:  2008-12-16       Impact factor: 16.971

  3 in total

1.  The complexity of the Dirichlet model for multiple alignment data.

Authors:  Yi-Kuo Yu; Stephen F Altschul
Journal:  J Comput Biol       Date:  2011-06-24       Impact factor: 1.479

2.  Dirichlet mixtures, the Dirichlet process, and the structure of protein space.

Authors:  Viet-An Nguyen; Jordan Boyd-Graber; Stephen F Altschul
Journal:  J Comput Biol       Date:  2013-01       Impact factor: 1.479

3.  Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

Authors:  Andrew F Neuwald; Stephen F Altschul
Journal:  PLoS Comput Biol       Date:  2016-12-21       Impact factor: 4.475
