
Efficient functional clustering of protein sequences using the Dirichlet process.

Duncan P Brown.

Abstract

MOTIVATION: Automatic clustering of protein sequences is an important problem in computational biology. The recent explosion in genome sequences has given biological researchers a vast number of novel protein sequences. However, the majority of these sequences have no experimental evidence for their molecular function in the cell, and the responsibility for correctly annotating these sequences falls upon the bioinformatics community. Ideally, we would like to be able to group sequences of similar or identical molecular function in an automatic fashion, without relying on experimental evidence.
RESULTS: In this article I present a novel probabilistic framework that models subfamilies within a known protein family. Given a multiple sequence alignment, the model uses Dirichlet mixture densities to estimate amino acid preferences within subfamily clusters, and places a Dirichlet process prior on the overall set of clusters. Based on results from several datasets, the model breaks data accurately into functional subgroups.
AVAILABILITY: The algorithm is implemented as C++ software available at bpg-research.berkeley.edu/~duncanb/dpcluster/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
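
To make the modelling idea in the abstract concrete, here is a minimal Python sketch of how a Dirichlet process prior can be combined with per-column amino acid counts when deciding which subfamily cluster an aligned sequence belongs to. It is an illustration under stated assumptions, not the author's C++ implementation: the function names, the concentration parameter `alpha`, and the uniform pseudocounts are all hypothetical, and a single Dirichlet prior stands in for the Dirichlet mixture densities the abstract describes. Gap characters are not handled.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def crp_prior(cluster_sizes, alpha):
    """Chinese restaurant process view of a Dirichlet process prior.

    Returns the prior probability of joining each existing cluster,
    followed by the probability of opening a new cluster (last entry).
    """
    denom = sum(cluster_sizes) + alpha
    return [size / denom for size in cluster_sizes] + [alpha / denom]


def log_predictive(counts, residue, pseudocounts):
    """Log probability of `residue` in one column, given the residue counts
    already observed in a cluster and a single Dirichlet prior
    (an assumption; the abstract uses Dirichlet mixtures)."""
    total = sum(counts.values()) + sum(pseudocounts.values())
    return math.log((counts.get(residue, 0) + pseudocounts[residue]) / total)


def assignment_posterior(sequence, clusters, alpha, pseudocounts):
    """Posterior over cluster assignments for one aligned sequence.

    `clusters` is a list of clusters, each a list of aligned sequences
    of the same length as `sequence`. The last entry of the result is
    the probability of starting a new cluster.
    """
    prior = crp_prior([len(c) for c in clusters], alpha)
    log_weights = []
    for members, p in zip(clusters + [[]], prior):
        log_lik = 0.0
        for col, residue in enumerate(sequence):
            counts = {}
            for member in members:
                counts[member[col]] = counts.get(member[col], 0) + 1
            log_lik += log_predictive(counts, residue, pseudocounts)
        log_weights.append(math.log(p) + log_lik)
    # Normalise in log space for numerical stability.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy usage with made-up three-column alignments.
uniform_pseudocounts = {aa: 0.1 for aa in AMINO_ACIDS}
toy_clusters = [["ACD", "ACD"], ["WWW"]]
posterior = assignment_posterior("ACD", toy_clusters, 1.0, uniform_pseudocounts)
```

In a full sampler, a calculation of this kind would typically be repeated for every sequence over many sweeps (for example in Gibbs sampling), so the number of clusters is inferred from the data rather than fixed in advance.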


Year:  2008        PMID: 18511467     DOI: 10.1093/bioinformatics/btn244

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  A genomic-scale artificial microRNA library as a tool to investigate the functionally redundant gene space in Arabidopsis.

Authors:  Felix Hauser; Wenxiao Chen; Ulrich Deinlein; Kenneth Chang; Stephan Ossowski; Joffrey Fitz; Gregory J Hannon; Julian I Schroeder
Journal:  Plant Cell       Date:  2013-08-16       Impact factor: 11.277

2.  The construction and use of log-odds substitution scores for multiple sequence alignment.

Authors:  Stephen F Altschul; John C Wootton; Elena Zaslavsky; Yi-Kuo Yu
Journal:  PLoS Comput Biol       Date:  2010-07-15       Impact factor: 4.475

3.  Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering.

Authors:  Xiaolin Hao; Rui Jiang; Ting Chen
Journal:  Bioinformatics       Date:  2011-01-13       Impact factor: 6.937

4.  Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences.

Authors:  Miguel A Santos; Andrei L Turinsky; Serene Ong; Jennifer Tsai; Michael F Berger; Gwenael Badis; Shaheynoor Talukder; Andrew R Gehrke; Martha L Bulyk; Timothy R Hughes; Shoshana J Wodak
Journal:  Nucleic Acids Res       Date:  2010-08-12       Impact factor: 16.971

