Bayesian mixture model based clustering of replicated microarray data.

M Medvedovic, K Y Yeung, R E Bumgarner.

Abstract

MOTIVATION: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability.
RESULTS: We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed better overall than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters.
AVAILABILITY: The MS Windows based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm
SUPPLEMENTAL INFORMATION: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html
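
The abstract states that clusters are derived from a posterior distribution of clusterings estimated by a Gibbs sampler, but it does not say how the sampled clusterings are summarized. One common summary, shown here purely as an illustrative assumption and not as the GIMM implementation, is the fraction of sampled clusterings in which each pair of genes shares a cluster. The function name `coclustering_frequencies` and the label draws below are hypothetical.

```python
from typing import List, Sequence


def coclustering_frequencies(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return, for each pair of genes, the fraction of sampled clusterings
    in which the two genes carry the same cluster label.

    `samples` holds one label vector per retained sampler iteration.
    Only label equality within a draw is used, so arbitrary relabeling
    of clusters between draws does not affect the result.
    """
    if not samples:
        raise ValueError("need at least one sampled clustering")
    n_genes = len(samples[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in samples:
        if len(labels) != n_genes:
            raise ValueError("all sampled clusterings must cover the same genes")
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(samples)
    return [[c / total for c in row] for row in counts]


# Hypothetical label draws for four genes (not data from the paper).
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # same partition as the first draw, labels swapped
    [0, 1, 2, 2],
]
freq = coclustering_frequencies(draws)
```

Under this assumption, one minus each frequency could serve as a pairwise distance for a subsequent hierarchical clustering step; whether the authors proceed this way is not stated in the abstract.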

Year:  2004        PMID: 14871871     DOI: 10.1093/bioinformatics/bth068

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  52 in total

1.  Comment: Extending the Latent Position Model for Networks.

Authors:  Adrian E Raftery
Journal:  J Am Stat Assoc       Date:  2018-01-26       Impact factor: 5.033

2.  Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance.

Authors:  Kevin K Lin; Darya Chudova; G Wesley Hatfield; Padhraic Smyth; Bogi Andersen
Journal:  Proc Natl Acad Sci U S A       Date:  2004-11-01       Impact factor: 11.205

3.  Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset.

Authors:  X Liu; S Sivaganesan; K Y Yeung; J Guo; R E Bumgarner; Mario Medvedovic
Journal:  Bioinformatics       Date:  2006-05-18       Impact factor: 6.937

4.  Kernel stick-breaking processes.

Authors:  David B Dunson; Ju-Hyun Park
Journal:  Biometrika       Date:  2008       Impact factor: 2.445

5.  A semi-parametric Bayesian model for unsupervised differential co-expression analysis.

Authors:  Johannes M Freudenberg; Siva Sivaganesan; Michael Wagner; Mario Medvedovic
Journal:  BMC Bioinformatics       Date:  2010-05-07       Impact factor: 3.169

6.  Importance of replication in analyzing time-series gene expression data: corticosteroid dynamics and circadian patterns in rat liver.

Authors:  Tung T Nguyen; Richard R Almon; Debra C DuBois; William J Jusko; Ioannis P Androulakis
Journal:  BMC Bioinformatics       Date:  2010-05-26       Impact factor: 3.169

7.  Discovering transcriptional modules by Bayesian data integration.

Authors:  Richard S Savage; Zoubin Ghahramani; Jim E Griffin; Bernard J de la Cruz; David L Wild
Journal:  Bioinformatics       Date:  2010-06-15       Impact factor: 6.937

8.  Genomic profile of matrix and vasculature remodeling in TGF-alpha induced pulmonary fibrosis.

Authors:  William D Hardie; Thomas R Korfhagen; Maureen A Sartor; Adrienne Prestridge; Mario Medvedovic; Timothy D Le Cras; Machiko Ikegami; Scott C Wesselkamper; Cynthia Davidson; Maggie Dietsch; William Nichols; Jeffrey A Whitsett; George D Leikauf
Journal:  Am J Respir Cell Mol Biol       Date:  2007-05-11       Impact factor: 6.914

9.  Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification.

Authors:  Michael Gormley; Aydin Tozeren
Journal:  BMC Bioinformatics       Date:  2008-11-17       Impact factor: 3.169

10.  AutoClass@IJM: a powerful tool for Bayesian classification of heterogeneous data in biology.

Authors:  Fiona Achcar; Jean-Michel Camadro; Denis Mestivier
Journal:  Nucleic Acids Res       Date:  2009-05-27       Impact factor: 16.971
