
Compositional adjustment of Dirichlet mixture priors.

Xugang Ye, Yi-Kuo Yu, Stephen F Altschul.

Abstract

Dirichlet mixture priors provide a Bayesian formalism for scoring alignments of protein profiles to individual sequences, which can be generalized to constructing scores for multiple-alignment columns. A Dirichlet mixture is a probability distribution over multinomial space, each of whose components can be thought of as modeling a type of protein position. Applied to the simplest case of pairwise sequence alignment, a Dirichlet mixture is equivalent to an implied symmetric substitution matrix. For alphabets of even size L, Dirichlet mixtures with L/2 components and symmetric substitution matrices have an identical number of free parameters. Although this suggests the possibility of a one-to-one mapping between the two formalisms, we show that there are some symmetric matrices no Dirichlet mixture can imply, and others implied by many distinct Dirichlet mixtures. Dirichlet mixtures are derived empirically from curated sets of multiple alignments. They imply "background" amino acid frequencies characteristic of these sets, and should thus be non-optimal for comparing proteins with non-standard composition. Given a mixture Θ, we seek an adjusted Θ' that implies the desired composition, but that minimizes an appropriate relative-entropy-based distance function. To render the problem tractable, we fix the mixture parameter as well as the sum of the Dirichlet parameters for each component, allowing only its center of mass to vary. This linearizes the constraints on the remaining parameters. An approach to finding Θ' may be based on small consecutive parameter adjustments. The relative entropy of two Dirichlet distributions separated by a small change in their parameter values implies a quadratic cost function for such changes. For a small change in implied background frequencies, this function can be minimized using the Lagrange-Newton method. 
We have implemented this method, and can compositionally adjust to good precision a 20-component Dirichlet mixture prior for proteins in under half a second on a standard workstation.
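
The relationship between a Dirichlet mixture, its implied background frequencies, and its implied symmetric substitution matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the four-letter toy alphabet, and the parameter values are all hypothetical and chosen only for illustration. The sketch assumes the standard first and second moments of a Dirichlet distribution: for a component with parameters α and sum α*, the mean of letter i is α_i/α*, and the probability of drawing letters i and j in succession from the same position is α_i(α_j + δ_ij) / (α*(α* + 1)).

```python
import math

def implied_background(weights, alphas):
    """Background frequencies p_i = sum_k m_k * alpha_{k,i} / alpha_k*."""
    size = len(alphas[0])
    p = [0.0] * size
    for m, a in zip(weights, alphas):
        total = sum(a)  # alpha_k*, the sum of the component's parameters
        for i in range(size):
            p[i] += m * a[i] / total
    return p

def implied_pair_probs(weights, alphas):
    """Joint probabilities q_ij of observing letters i and j at one position.

    Uses the Dirichlet second moments, mixed over components.
    """
    size = len(alphas[0])
    q = [[0.0] * size for _ in range(size)]
    for m, a in zip(weights, alphas):
        total = sum(a)
        norm = total * (total + 1.0)
        for i in range(size):
            for j in range(size):
                extra = 1.0 if i == j else 0.0
                q[i][j] += m * a[i] * (a[j] + extra) / norm
    return q

def log_odds_scores(weights, alphas):
    """Symmetric log-odds scores s_ij = ln(q_ij / (p_i * p_j))."""
    p = implied_background(weights, alphas)
    q = implied_pair_probs(weights, alphas)
    size = len(p)
    return [[math.log(q[i][j] / (p[i] * p[j])) for j in range(size)]
            for i in range(size)]

# Hypothetical toy mixture: 2 components over a 4-letter alphabet.
toy_weights = [0.6, 0.4]
toy_alphas = [[2.0, 1.0, 0.5, 0.5],
              [0.3, 0.3, 1.2, 1.2]]

background = implied_background(toy_weights, toy_alphas)
pair_probs = implied_pair_probs(toy_weights, toy_alphas)
scores = log_odds_scores(toy_weights, toy_alphas)
```

In this sketch the row sums of `pair_probs` reproduce `background`, which is why changing the target composition requires changing the mixture itself. Note also that `background` depends only on the mixture weights and each component's center of mass α/α*, so holding the weights and each α* fixed, as the abstract describes, leaves the background linear in the remaining parameters.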

Year:  2010        PMID: 21128852      PMCID: PMC3123133          DOI: 10.1089/cmb.2010.0117

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


