Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

Kenneth Lo, Raphael Gottardo.

Abstract

Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
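The density of a single component in such a model can be sketched as follows: transform each coordinate with Box-Cox, evaluate a multivariate t density on the transformed scale, and multiply by the Jacobian of the transformation. This is a minimal illustration and not the authors' implementation. It assumes the standard Box-Cox form for strictly positive data (the paper may use a variant) and one transformation parameter shared by all coordinates. The names `box_cox`, `cholesky` and `log_t_boxcox_density` are hypothetical.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y (assumed form)."""
    if y <= 0:
        raise ValueError("standard Box-Cox requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def log_t_boxcox_density(y, mu, sigma, nu, lam):
    """Log-density at y of a multivariate t on the Box-Cox scale.

    y     : list of positive observations (length p)
    mu    : location vector on the transformed scale
    sigma : p x p scale matrix on the transformed scale
    nu    : degrees of freedom (> 0)
    lam   : Box-Cox parameter, shared across coordinates (assumption)
    """
    p = len(y)
    z = [box_cox(v, lam) - m for v, m in zip(y, mu)]
    low = cholesky(sigma)
    # Forward substitution: solve low * w = z, so that w.w is the
    # squared Mahalanobis distance z' sigma^{-1} z.
    w = []
    for i in range(p):
        w.append((z[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    maha = sum(v * v for v in w)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    log_t = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det
             - 0.5 * (nu + p) * math.log1p(maha / nu))
    # Jacobian of the transform: d/dy box_cox(y, lam) = y ** (lam - 1).
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return log_t + log_jacobian
```

In a mixture, each component would carry its own `mu`, `sigma` and mixing weight, and an EM-type algorithm would update these together with `lam`. Small values of `nu` give heavier tails, which is what downweights outliers, while `lam` different from 1 introduces skewness on the original scale.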

Year: 2012; PMID: 22125375; PMCID: PMC3223965; DOI: 10.1007/s11222-010-9204-1

Source DB: PubMed; Journal: Stat Comput; ISSN: 0960-3174; Impact factor: 2.559


