
Mixture models with multiple levels, with application to the analysis of multifactor gene expression data.

Rebecka Jörnsten, Sündüz Keleş.

Abstract

Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.
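
The idea of allowing a different number of clusters at different levels of an experimental factor can be illustrated with a deliberately simplified sketch. It is not the authors' multilevel mixture model: it assumes one-dimensional data, fits an independent Gaussian mixture at each factor level by EM, and picks the number of components per level with BIC, a standard criterion substituted here for the paper's two model selection procedures. The function names (`fit_gmm_1d`, `select_k_by_bic`, `gaussian_quantiles`), the level labels, and the synthetic data are all hypothetical.

```python
import math
from statistics import NormalDist


def _mixture_densities(v, weights, means, variances):
    """Weighted Gaussian density of value v under each mixture component."""
    return [w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
            for w, m, s in zip(weights, means, variances)]


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    xs = sorted(x)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((v - grand_mean) ** 2 for v in xs) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * grand_var  # assumption: floor against degenerate components
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every observation.
        resp = []
        for v in xs:
            dens = _mixture_densities(v, weights, means, variances)
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return sum(math.log(sum(_mixture_densities(v, weights, means, variances)))
               for v in xs)


def select_k_by_bic(x, k_max=3):
    """Return the number of components in 1..k_max with the smallest BIC."""
    n = len(x)
    bic = {}
    for k in range(1, k_max + 1):
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic[k] = -2.0 * fit_gmm_1d(x, k) + n_params * math.log(n)
    return min(bic, key=bic.get)


def gaussian_quantiles(mu, sigma, n):
    """Synthetic, noise-free sample: n evenly spaced quantiles of N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical factor levels: one expression group at level A, two at level B.
levels = {
    "level_A": gaussian_quantiles(0.0, 0.5, 200),
    "level_B": gaussian_quantiles(-3.0, 0.5, 100) + gaussian_quantiles(3.0, 0.5, 100),
}
chosen = {name: select_k_by_bic(values) for name, values in levels.items()}
print(chosen)
```

Fitting each level separately keeps the sketch short, but it discards the shared "between-cluster" structure across levels that the abstract describes as central to the proposed model.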

Year:  2008        PMID: 18256042      PMCID: PMC3294320          DOI: 10.1093/biostatistics/kxm051

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


