Dirichlet mixtures, the Dirichlet process, and the structure of protein space.

Viet-An Nguyen, Jordan Boyd-Graber, Stephen F Altschul.

Abstract

The Dirichlet process is used to model probability distributions that are mixtures of an unknown number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we use the Dirichlet process to derive such mixtures with an unbounded number of components. This application of the method requires several technical innovations to sample an unbounded number of Dirichlet-mixture components. The resulting Dirichlet mixtures model multiple-alignment data substantially better than do previously derived ones. They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on the structure of proteins. Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino acid multinomial space.
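To make the idea of a Dirichlet mixture concrete, the following minimal sketch scores one alignment column against a small mixture. It is not the authors' code. It assumes the standard Dirichlet-multinomial formulas, a toy three-letter alphabet in place of the 20 amino acids, and hypothetical mixture weights and Dirichlet parameters chosen only for illustration.

```python
import math


def log_column_prob(counts, alpha):
    """Log probability of one ordered column of letters, given its letter
    counts, under a single Dirichlet component with parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    result = math.lgamma(a) - math.lgamma(a + n)
    for c, al in zip(counts, alpha):
        result += math.lgamma(al + c) - math.lgamma(al)
    return result


def responsibilities(counts, weights, alphas):
    """Posterior probability that each mixture component generated the column."""
    logs = [math.log(w) + log_column_prob(counts, al)
            for w, al in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logs]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_mean(counts, weights, alphas):
    """Estimated letter probabilities for the column: a responsibility-weighted
    average of each component's pseudocount-smoothed frequencies."""
    resp = responsibilities(counts, weights, alphas)
    n = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        a = sum(alpha)
        for i, (c, al) in enumerate(zip(counts, alpha)):
            probs[i] += r * (c + al) / (n + a)
    return probs


# Hypothetical two-component mixture over a toy three-letter alphabet.
WEIGHTS = [0.5, 0.5]
ALPHAS = [[5.0, 0.5, 0.5],   # component favoring the first letter
          [0.5, 0.5, 5.0]]   # component favoring the third letter

column = [4, 0, 0]           # four observations, all the first letter
resp = responsibilities(column, WEIGHTS, ALPHAS)
probs = posterior_mean(column, WEIGHTS, ALPHAS)
```

In this toy setting the column is assigned almost entirely to the first component. A mixture with hundreds of components, as described in the abstract, would use the same calculation with a longer list of weights and parameter vectors.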

Year:  2013        PMID: 23294268      PMCID: PMC3541698          DOI: 10.1089/cmb.2012.0244

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  8 in total

1.  A comparison of scoring functions for protein sequence profile alignment.

Authors:  Robert C Edgar; Kimmen Sjölander
Journal:  Bioinformatics       Date:  2004-02-12       Impact factor: 6.937

2.  An assessment of substitution scores for protein profile-profile comparison.

Authors:  Xugang Ye; Guoli Wang; Stephen F Altschul
Journal:  Bioinformatics       Date:  2011-10-13       Impact factor: 6.937

3.  On the inference of dirichlet mixture priors for protein sequence comparison.

Authors:  Xugang Ye; Yi-Kuo Yu; Stephen F Altschul
Journal:  J Comput Biol       Date:  2011-06-24       Impact factor: 1.479

4.  The complexity of the dirichlet model for multiple alignment data.

Authors:  Yi-Kuo Yu; Stephen F Altschul
Journal:  J Comput Biol       Date:  2011-06-24       Impact factor: 1.479

5.  Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology.

Authors:  K Sjölander; K Karplus; M Brown; R Hughey; A Krogh; I S Mian; D Haussler
Journal:  Comput Appl Biosci       Date:  1996-08

6.  Using Dirichlet mixture priors to derive hidden Markov models for protein families.

Authors:  M Brown; R Hughey; A Krogh; I S Mian; K Sjölander; D Haussler
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1993

7.  Lines of descent in the diffusion approximation of neutral Wright-Fisher models.

Authors:  R C Griffiths
Journal:  Theor Popul Biol       Date:  1980-02       Impact factor: 1.570

8.  The construction and use of log-odds substitution scores for multiple sequence alignment.

Authors:  Stephen F Altschul; John C Wootton; Elena Zaslavsky; Yi-Kuo Yu
Journal:  PLoS Comput Biol       Date:  2010-07-15       Impact factor: 4.475

  4 in total

1.  Log-odds sequence logos.

Authors:  Yi-Kuo Yu; John A Capra; Aleksandar Stojmirović; David Landsman; Stephen F Altschul
Journal:  Bioinformatics       Date:  2014-10-06       Impact factor: 6.937

2.  Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

Authors:  Andrew F Neuwald; Stephen F Altschul
Journal:  PLoS Comput Biol       Date:  2016-12-21       Impact factor: 4.475

3.  Bridging the gaps in statistical models of protein alignment.

Authors:  Dinithi Sumanaweera; Lloyd Allison; Arun S Konagurthu
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931

4.  Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties.

Authors:  Andrew F Neuwald; Stephen F Altschul
Journal:  PLoS Comput Biol       Date:  2016-05-18       Impact factor: 4.475

