A Bayesian sampler for optimization of protein domain hierarchies.

Abstract

The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy.
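The sampling idea in the abstract can be illustrated with a very small Metropolis sampler. This is only a toy sketch, not the authors' sampler: it assigns six made-up numeric values (stand-ins for sequences) to flat groups rather than building a hierarchy from an alignment. The score function, the per-group penalty, and all names (`DATA`, `GROUP_PENALTY`, `log_score`, `propose`, `metropolis`, `run_from_starts`) are assumptions introduced for illustration.

```python
import math
import random

DATA = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # toy 1-D stand-ins for sequences (made up)
N_GROUPS = 3                            # number of available group labels (assumed)
GROUP_PENALTY = 1.0                     # hypothetical cost per non-empty group


def log_score(labels):
    """Toy log score: tight groups are rewarded, each extra group is penalized."""
    groups = {}
    for x, g in zip(DATA, labels):
        groups.setdefault(g, []).append(x)
    spread = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        spread += sum((x - mean) ** 2 for x in members)
    return -spread - GROUP_PENALTY * len(groups)


def propose(labels, rng):
    """Symmetric move: reassign one randomly chosen item to a random group."""
    i = rng.randrange(len(labels))
    new = list(labels)
    new[i] = rng.randrange(N_GROUPS)
    return tuple(new)


def metropolis(start, steps, rng):
    """Run a Metropolis chain; return the best labeling seen and its log score."""
    state, score = start, log_score(start)
    best_state, best_score = state, score
    for _ in range(steps):
        candidate = propose(state, rng)
        cand_score = log_score(candidate)
        # Accept with probability min(1, exp(cand_score - score)).
        if cand_score >= score or rng.random() < math.exp(cand_score - score):
            state, score = candidate, cand_score
            if score > best_score:
                best_state, best_score = state, score
    return best_state, best_score


def run_from_starts(n_starts, steps, seed):
    """Start independent chains from random labelings and collect their best results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = tuple(rng.randrange(N_GROUPS) for _ in DATA)
        results.append(metropolis(start, steps, rng))
    return results


if __name__ == "__main__":
    for labels, score in run_from_starts(n_starts=3, steps=500, seed=0):
        print(labels, round(score, 3))
```

Comparing the best scores from the independent starts mirrors, in miniature, the convergence check described in the abstract: chains that end with nearly identical scores suggest, but do not prove, that the same peak was found.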

Year: 2014 PMID: 24494927 PMCID: PMC3948484 DOI: 10.1089/cmb.2013.0099

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

16 in total

1. Automated ortholog inference from phylogenetic trees and calculation of orthology reliability.

Authors: Christian E V Storm; Erik L L Sonnhammer
Journal: Bioinformatics Date: 2002-01 Impact factor: 6.937

2. Clustering of proximal sequence space for the identification of protein families.

Authors: Federico Abascal; Alfonso Valencia
Journal: Bioinformatics Date: 2002-07 Impact factor: 6.937

3. Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

Authors: Andrew F Neuwald
Journal: Stat Appl Genet Mol Biol Date: 2011-08-04

4. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

5. Optimization by simulated annealing.

Authors: S Kirkpatrick; C D Gelatt; M P Vecchi
Journal: Science Date: 1983-05-13 Impact factor: 47.728

6. Genome-scale phylogenetic function annotation of large and diverse protein families.

Authors: Barbara E Engelhardt; Michael I Jordan; John R Srouji; Steven E Brenner
Journal: Genome Res Date: 2011-07-22 Impact factor: 9.043

7. Ran's C-terminal, basic patch, and nucleotide exchange mechanisms in light of a canonical structure for Rab, Rho, Ras, and Ran GTPases.

Authors: Andrew F Neuwald; Natarajan Kannan; Aleksandar Poleksic; Naoya Hata; Jun S Liu
Journal: Genome Res Date: 2003-04 Impact factor: 9.043

8. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures.

Authors: Andrew F Neuwald; Christopher J Lanczycki; Aron Marchler-Bauer
Journal: BMC Bioinformatics Date: 2012-06-22 Impact factor: 3.169

9. RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs.

Authors: Christian M Zmasek; Sean R Eddy
Journal: BMC Bioinformatics Date: 2002-05-16 Impact factor: 3.169

10. The Pfam protein families database.

Authors: Robert D Finn; John Tate; Jaina Mistry; Penny C Coggill; Stephen John Sammut; Hans-Rudolf Hotz; Goran Ceric; Kristoffer Forslund; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal: Nucleic Acids Res Date: 2007-11-26 Impact factor: 16.971

15 in total

1. Initial Cluster Analysis.

2. Evaluating, comparing, and interpreting protein domain hierarchies.

3. The crystal structure of the protein kinase HIPK2 reveals a unique architecture of its CMGC-insert region.

4. Tracing the origin and evolution of pseudokinases across the tree of life.

5. SPARC: Structural properties associated with residue constraints.

6. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

7. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties.

8. Inferring joint sequence-structural determinants of protein functional specificity.

9. Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases.

10. A survey of TIR domain sequence and structure divergence.