Mixtures of common t-factor analyzers for clustering high-dimensional microarray data.

Jangsun Baek, Geoffrey J McLachlan.

Abstract

MOTIVATION: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be the need to reduce further the number of parameters in the specification of the component-covariance matrices. A further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), which is a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. This sensitivity of the MCFA approach is due to its being based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.
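To make the parsimony argument concrete, the following minimal sketch counts the free covariance parameters under three specifications. The parameterizations are assumptions not spelled out in the abstract: an unrestricted covariance per component, component-specific loadings with Sigma_i = B_i B_i^T + D_i, and common loadings with Sigma_i = A Omega_i A^T + D. The function name and the example values of p, q and g are hypothetical, and identifiability corrections are ignored, so the numbers are rough counts only.

```python
def covariance_param_counts(p: int, q: int, g: int) -> dict:
    """Raw free-parameter counts for g component-covariance matrices.

    Assumed parameterizations (illustrative, not taken from the abstract):
      full: one unrestricted symmetric p x p matrix per component
      mfa:  Sigma_i = B_i B_i^T + D_i, with B_i (p x q) and diagonal D_i
      mcfa: Sigma_i = A Omega_i A^T + D, with A (p x q) and diagonal D shared
            across components, plus a symmetric q x q Omega_i per component
    Identifiability corrections (e.g. rotation of the loadings) are ignored.
    """
    full = g * p * (p + 1) // 2
    mfa = g * (p * q + p)
    mcfa = p * q + p + g * q * (q + 1) // 2
    return {"full": full, "mfa": mfa, "mcfa": mcfa}


# Hypothetical setting: 1000 genes, 4 factors, 6 clusters.
counts = covariance_param_counts(p=1000, q=4, g=6)
print(counts)
```

Under these assumptions the common-loadings count grows with g only through the small q x q matrices, which is why the saving matters most when the number of clusters is not small.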
RESULTS: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for the fitting of mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering, where it performs better than several existing methods.
AVAILABILITY: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.
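As an illustration of the kind of data this model describes, here is a minimal Python sketch that simulates observations from a mixture of common t-factor analyzers. It is not the authors' Matlab code and it does not implement the EM fitting. It assumes the standard gamma scale-mixture representation of the multivariate t-distribution, with one gamma weight shared by the factors and the errors of an observation. The function name, argument names and all numerical values are hypothetical.

```python
import numpy as np


def sample_mctfa(n, pi, A, xi, Omega, D_diag, nu, rng):
    """Draw n observations from a mixture of common t-factor analyzers (sketch).

    Assumed model: y = A u + e, where, given component i and a weight
    w ~ Gamma(nu_i / 2, rate = nu_i / 2),
      u ~ N(xi_i, Omega_i / w)   (q-dimensional factors)
      e ~ N(0, diag(D_diag) / w) (p-dimensional errors)
    The loading matrix A and the error variances D_diag are common to all components.
    """
    g = len(pi)
    p, _ = A.shape
    labels = rng.choice(g, size=n, p=pi)
    Y = np.empty((n, p))
    for j, i in enumerate(labels):
        # A small weight w inflates both scales, which produces the long tails.
        w = rng.gamma(shape=nu[i] / 2.0, scale=2.0 / nu[i])
        u = rng.multivariate_normal(xi[i], Omega[i] / w)
        e = rng.normal(0.0, np.sqrt(D_diag / w))
        Y[j] = A @ u + e
    return Y, labels


# Hypothetical example: 50 variables, 2 factors, 3 components.
rng = np.random.default_rng(0)
p, q, g = 50, 2, 3
A = rng.standard_normal((p, q))
pi = np.array([0.5, 0.3, 0.2])
xi = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
Omega = np.stack([np.eye(q)] * g)
D_diag = np.full(p, 0.5)
nu = np.array([3.0, 5.0, 10.0])
Y, labels = sample_mctfa(200, pi, A, xi, Omega, D_diag, nu, rng)
```

Because every component shares the same A, the simulated factors live in one common q-dimensional space, which is what makes a single low-dimensional plot of all clusters possible.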

Year:  2011        PMID: 21372081     DOI: 10.1093/bioinformatics/btr112

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Statistical Significance of Clustering using Soft Thresholding.

Authors:  Hanwen Huang; Yufeng Liu; Ming Yuan; J S Marron
Journal:  J Comput Graph Stat       Date:  2015-12-10       Impact factor: 2.302

2.  Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.

Authors:  Meng-Yun Wu; Dao-Qing Dai; Xiao-Fei Zhang; Yuan Zhu
Journal:  PLoS One       Date:  2013-06-17       Impact factor: 3.240

3.  SMART: unique splitting-while-merging framework for gene clustering.

Authors:  Rui Fa; David J Roberts; Asoke K Nandi
Journal:  PLoS One       Date:  2014-04-08       Impact factor: 3.240

4.  densityCut: an efficient and versatile topological approach for automatic clustering of biological data.

Authors:  Jiarui Ding; Sohrab Shah; Anne Condon
Journal:  Bioinformatics       Date:  2016-04-23       Impact factor: 6.937

5.  Unsupervised Bayesian linear unmixing of gene expression microarrays.

Authors:  Cécile Bazot; Nicolas Dobigeon; Jean-Yves Tourneret; Aimee K Zaas; Geoffrey S Ginsburg; Alfred O Hero
Journal:  BMC Bioinformatics       Date:  2013-03-19       Impact factor: 3.169

6.  Distributed Density Estimation Based on a Mixture of Factor Analyzers in a Sensor Network.

Authors:  Xin Wei; Chunguang Li; Liang Zhou; Li Zhao
Journal:  Sensors (Basel)       Date:  2015-08-05       Impact factor: 3.576
