Model-based clustering and data transformations for gene expression data.

K Y Yeung, C Fraley, A Murua, A E Raftery, W L Ruzzo.

Abstract

MOTIVATION: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications.
RESULTS: We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. We also explored the validity of the Gaussian mixture assumption on different transformations of real data. We also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits.
AVAILABILITY: MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development.
CONTACT: kayee@cs.washington.edu.
SUPPLEMENTARY INFORMATION: http://www.cs.washington.edu/homes/kayee/model.
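
The idea of treating the choice of cluster number as a model selection problem can be illustrated with a minimal sketch. It is not the authors' software (MCLUST) and makes several assumptions that do not come from the abstract: the data are one-dimensional and synthetic, each mixture component has its own variance, the mixture is fitted with a plain EM loop, and candidate models are compared with the Bayesian information criterion (BIC) in its "lower is better" form. All function names (`fit_mixture`, `bic`, and so on) are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def e_step(xs, weights, means, variances):
    """Return per-point component responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [math.log(w) + log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        loglik += total
        resp.append([math.exp(l - total) for l in logs])
    return resp, loglik


def fit_mixture(xs, k, iters=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return (loglik, sorted means)."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    pooled = max(sum((x - centre) ** 2 for x in xs) / n, min_var)
    variances = [pooled] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp, _ = e_step(xs, weights, means, variances)
        for j in range(k):  # M-step: re-estimate each component
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, min_var)  # floor avoids collapsed components
    _, loglik = e_step(xs, weights, means, variances)
    return loglik, sorted(means)


def bic(loglik, k, n):
    """BIC with k means, k variances and k - 1 free mixing weights (lower is better)."""
    n_params = 3 * k - 1
    return n_params * math.log(n) - 2.0 * loglik


# Synthetic data (an assumption for illustration): three well-separated groups.
rng = random.Random(42)
true_centres = [-5.0, 0.0, 5.0]
data = [rng.gauss(c, 0.5) for c in true_centres for _ in range(150)]

# Fit mixtures with 1 to 5 components and keep the one with the lowest BIC.
scores = {}
for k in range(1, 6):
    loglik, _ = fit_mixture(data, k)
    scores[k] = bic(loglik, k, len(data))
best_k = min(scores, key=scores.get)
print("BIC by number of components:", scores)
print("Selected number of clusters:", best_k)
```

With clusters this well separated, the criterion would be expected to favour three components. Real expression data are multivariate and may first need one of the transformations the abstract mentions before a Gaussian mixture fits reasonably.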

Year:  2001        PMID: 11673243     DOI: 10.1093/bioinformatics/17.10.977

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  136 in total

1.  Density of points clustering, application to transcriptomic data analysis.

Authors:  Nicolas Wicker; Doulaye Dembele; Wolfgang Raffelsberger; Olivier Poch
Journal:  Nucleic Acids Res       Date:  2002-09-15       Impact factor: 16.971

2.  Introducing knowledge into differential expression analysis.

Authors:  Ewa Szczurek; Przemysław Biecek; Jerzy Tiuryn; Martin Vingron
Journal:  J Comput Biol       Date:  2010-08       Impact factor: 1.479

3.  A marginal mixture model for selecting differentially expressed genes across two types of tissue samples.

Authors:  Weiliang Qiu; Wenqing He; Xiaogang Wang; Ross Lazarus
Journal:  Int J Biostat       Date:  2008-10-09       Impact factor: 0.968

4.  Gene expression programs during Brassica oleracea seed maturation, osmopriming, and germination are indicators of progression of the germination process and the stress tolerance level.

Authors:  Yasutaka Soeda; Maurice C J M Konings; Oscar Vorst; Adele M M L van Houwelingen; Geert M Stoopen; Chris A Maliepaard; Jan Kodde; Raoul J Bino; Steven P C Groot; Apolonia H M van der Geest
Journal:  Plant Physiol       Date:  2004-12-23       Impact factor: 8.340

5.  Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance.

Authors:  Kevin K Lin; Darya Chudova; G Wesley Hatfield; Padhraic Smyth; Bogi Andersen
Journal:  Proc Natl Acad Sci U S A       Date:  2004-11-01       Impact factor: 11.205

6.  EN1 Is a Transcriptional Dependency in Triple-Negative Breast Cancer Associated with Brain Metastasis.

Authors:  Guillermo Peluffo; Ashim Subedee; Nicholas W Harper; Natalie Kingston; Bojana Jovanović; Felipe Flores; Laura E Stevens; Francisco Beca; Anne Trinh; Chandra Sekhar Reddy Chilamakuri; Evangelia K Papachristou; Katherine Murphy; Ying Su; Andriy Marusyk; Clive S D'Santos; Oscar M Rueda; Andrew H Beck; Carlos Caldas; Jason S Carroll; Kornelia Polyak
Journal:  Cancer Res       Date:  2019-06-25       Impact factor: 12.701

7.  FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.

Authors:  Limin Fu; Enzo Medico
Journal:  BMC Bioinformatics       Date:  2007-01-04       Impact factor: 3.169

8.  Tumor antigen acrosin binding protein normalizes mitotic spindle function to promote cancer cell proliferation.

Authors:  Angelique W Whitehurst; Yang Xie; Scott C Purinton; Kathryn M Cappell; Jackie T Swanik; Brittany Larson; Luc Girard; John O Schorge; Michael A White
Journal:  Cancer Res       Date:  2010-09-28       Impact factor: 12.701

9.  Clustering of time-course gene expression data using functional data analysis.

Authors:  Joon Jin Song; Ho-Jin Lee; Jeffrey S Morris; Sanghoon Kang
Journal:  Comput Biol Chem       Date:  2007-06-02       Impact factor: 2.877

10.  A model-based cluster analysis approach to adolescent problem behaviors and young adult outcomes.

Authors:  Eun Young Mun; Michael Windle; Lisa M Schainker
Journal:  Dev Psychopathol       Date:  2008