Clustering protein sequence and structure space with infinite Gaussian mixture models.

A Dubey, S Hwang, C Rangel, C E Rasmussen, Z Ghahramani, D L Wild.

Abstract

We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies, etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known three-dimensional structures, and G-protein coupled receptor sequences. The consistency of the clusters indicates that our method produces biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which both reflects and extends their SCOP classifications. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wid/PSB04/index.html
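
The abstract does not spell out the inference procedure. The sketch below is a minimal illustration, not the authors' model or code: it assumes univariate toy data, a known within-cluster variance and a conjugate normal prior on cluster means, and uses a standard collapsed Gibbs sampler for a Dirichlet-process (infinite) Gaussian mixture. It shows the two properties the abstract highlights: the number of occupied clusters is inferred rather than fixed, and the fraction of samples in which two points share a cluster estimates the probability that they belong to the same cluster. The names `dp_gibbs` and `predictive_logpdf`, the toy data and all parameter values are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predictive_logpdf(x, members, sigma2, mu0, tau2):
    """Posterior predictive log density of x for a cluster whose mean has a
    N(mu0, tau2) prior and whose points have known variance sigma2
    (simplifying assumption for this sketch)."""
    n = len(members)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + sum(members) / sigma2)
    return normal_logpdf(x, post_mean, post_var + sigma2)


def dp_gibbs(data, alpha=1.0, sigma2=0.25, mu0=0.0, tau2=25.0,
             n_iter=200, burn_in=50, seed=0):
    """Hypothetical collapsed Gibbs sampler for a Dirichlet-process Gaussian
    mixture. Returns the co-assignment frequency matrix and the number of
    occupied clusters after each post-burn-in sweep."""
    rng = random.Random(seed)
    n = len(data)
    z = [0] * n  # start with every point in a single cluster
    co = [[0.0] * n for _ in range(n)]
    k_trace = []
    for it in range(n_iter):
        for i in range(n):
            z[i] = -1  # temporarily remove point i from its cluster
            clusters = {}
            for j in range(n):
                if z[j] >= 0:
                    clusters.setdefault(z[j], []).append(data[j])
            labels = list(clusters)
            # Existing cluster: prior weight = its size (Chinese restaurant process).
            logw = [math.log(len(clusters[c]))
                    + predictive_logpdf(data[i], clusters[c], sigma2, mu0, tau2)
                    for c in labels]
            # Brand-new cluster: prior weight = alpha.
            logw.append(math.log(alpha)
                        + predictive_logpdf(data[i], [], sigma2, mu0, tau2))
            m = max(logw)
            w = [math.exp(v - m) for v in logw]  # stabilised weights
            choice = rng.choices(range(len(w)), weights=w)[0]
            if choice < len(labels):
                z[i] = labels[choice]
            else:
                z[i] = max(labels, default=-1) + 1  # fresh, unused label
        if it >= burn_in:
            k_trace.append(len(set(z)))
            for a in range(n):
                for b in range(n):
                    if z[a] == z[b]:
                        co[a][b] += 1
    kept = n_iter - burn_in
    co = [[v / kept for v in row] for row in co]
    return co, k_trace


# Toy data with three well-separated groups (illustrative only).
data = [-3.1, -2.9, -3.0, -3.2, 0.1, -0.1, 0.0, 0.2, 3.0, 3.1, 2.9, 3.2]
co, k_trace = dp_gibbs(data)
print(round(co[0][1], 2), round(co[0][8], 2))
```

Points from the same toy group should show co-assignment frequencies near 1 and points from different groups frequencies near 0, while the number of clusters is never set in advance. The paper's setting uses multivariate representations of sequences (and, for the structural data set, secondary structure and solvent accessibility), which this one-dimensional sketch does not attempt to reproduce.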

Year:  2004        PMID: 14992520     DOI: 10.1142/9789812704856_0038

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928