Sylvia Frühwirth-Schnatter, Gertraud Malsiner-Walli
Abstract
In model-based clustering, mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by Malsiner Walli et al. (Stat Comput 26:303-324, 2016) is that of sparse finite mixtures, where the prior distribution on the weights of a mixture with K components is chosen in such a way that, a priori, the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyper prior is considered for the parameters determining the weight distribution. By suitably matching these priors, it is shown that the choice of this hyper prior is far more influential on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is used.
Keywords: Count data; Dirichlet prior; Latent class analysis; Marginal likelihoods; Mixture distributions; Skew distributions
Year: 2018 PMID: 31007770 PMCID: PMC6448299 DOI: 10.1007/s11634-018-0329-y
Source DB: PubMed Journal: Adv Data Anal Classif ISSN: 1862-5355
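The central idea described in the abstract, that a suitably chosen prior on the mixture weights leaves many of the K components empty a priori, can be illustrated with a short simulation (a generic sketch, not the authors' code; K = 10, the Dirichlet parameter values 0.01 and 4.0, and the function name are illustrative assumptions):

```python
import numpy as np

# Sketch of the sparse-finite-mixture idea: under a symmetric Dirichlet
# prior on the weights of a K-component mixture, a small Dirichlet
# parameter e0 makes most weights tiny, so the number of *occupied*
# components among n observations is typically far below K.
def occupied_clusters(K, e0, n, rng):
    weights = rng.dirichlet(np.full(K, e0))    # mixture weights ~ Dir(e0, ..., e0)
    assignments = rng.choice(K, size=n, p=weights)
    return np.unique(assignments).size         # number of non-empty components

rng = np.random.default_rng(1)
K, n = 10, 100
sparse = [occupied_clusters(K, 0.01, n, rng) for _ in range(200)]
dense = [occupied_clusters(K, 4.0, n, rng) for _ in range(200)]
# a sparse prior (e0 = 0.01) occupies far fewer components than a dense one (e0 = 4)
```

In the paper's terminology, the number of occupied components corresponds to the number of data clusters, which is then inferred a posteriori.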
Fig. 1 Prior distribution of the number of data clusters under two hyperparameter settings (top and bottom rows) combined with three further settings (left, middle, and right columns)
Fig. 2 Children's Fear Data; trace plot of the number of clusters during MCMC sampling (left-hand side) and the posterior distribution after removing the burn-in (right-hand side)
Occurrence probabilities for the three variables in the two classes; the columns give the categories (1–3, 1–3, and 1–4) of the first, second, and third variable

| | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Class 1 | 0.1 | 0.1 | 0.8 | 0.1 | 0.7 | 0.2 | 0.7 | 0.1 | 0.1 | 0.1 |
| Class 2 | 0.2 | 0.6 | 0.2 | 0.2 | 0.2 | 0.6 | 0.2 | 0.1 | 0.1 | 0.6 |
Posterior distribution of the number of clusters under various prior specifications for the first data set of the simulation study

| Prior | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| | SFM | 0.000 | | 0.166 | 0.019 | 0.002 | 0.000 | 0.000 |
| | | 0.000 | | 0.162 | 0.022 | 0.003 | 0.001 | 0.000 |
| | DPM | 0.000 | | 0.252 | 0.040 | 0.004 | 0.000 | 0.000 |
| | SFM | 0.000 | 0.310 | | 0.210 | 0.082 | 0.025 | 0.006 |
| | | 0.000 | | 0.320 | 0.178 | 0.085 | 0.035 | 0.023 |
| | DPM | 0.000 | | 0.312 | 0.199 | 0.095 | 0.035 | 0.015 |
| | SFM | 0.000 | 0.094 | 0.207 | | 0.200 | 0.140 | 0.124 |
| | | 0.003 | 0.123 | 0.188 | | 0.179 | 0.135 | 0.158 |
| | DPM | 0.000 | 0.099 | 0.188 | | 0.188 | 0.133 | 0.174 |
Average clustering results over 100 data sets of two different sizes, simulated from a latent class model with two classes, obtained through sparse latent class models (SFM) and DPM under three different priors on the precision parameters, as well as using EM estimation as implemented in the R package poLCA (Linzer et al. 2011)
| Prior | Method | Precision | Clusters | ari | err | Precision | Clusters | ari | err |
|---|---|---|---|---|---|---|---|---|---|
| | SFM | 0.009 | 1.94 | 0.44 | 0.18 | 0.010 | 2.05 | 0.54 | 0.13 |
| | | 0.005 | 1.92 | 0.43 | 0.18 | 0.005 | 2.02 | 0.54 | 0.13 |
| | DPM | 0.092 | 1.99 | 0.44 | 0.18 | 0.110 | 2.29 | 0.53 | 0.14 |
| | SFM | 0.064 | 2.29 | 0.46 | 0.17 | 0.068 | 2.23 | 0.53 | 0.14 |
| | | 0.035 | 2.38 | 0.45 | 0.17 | 0.032 | 2.24 | 0.53 | 0.14 |
| | DPM | 0.599 | 2.44 | 0.45 | 0.17 | 0.670 | 2.62 | 0.52 | 0.15 |
| | SFM | 0.189 | 3.56 | 0.45 | 0.19 | 0.163 | 2.97 | 0.52 | 0.15 |
| | | 0.086 | 3.34 | 0.45 | 0.19 | 0.072 | 3.28 | 0.51 | 0.16 |
| | DPM | 1.517 | 3.50 | 0.44 | 0.19 | 1.360 | 3.72 | 0.49 | 0.17 |
| | poLCA | | 1.37 | 0.18 | 0.35 | | 2.00 | 0.54 | 0.13 |
The reported values are averages of the posterior expectation of the precision parameter (for SFM and DPM, respectively), the estimated number of clusters, the adjusted Rand index (ari), and the error rate (err); the two column blocks correspond to the two data-set sizes
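The adjusted Rand index (ari) reported above compares an estimated partition with the true classes, with 1 indicating perfect agreement and values near 0 indicating chance-level agreement. A self-contained sketch of the standard formula (the function name and interface are our own, not from the paper or from poLCA):

```python
from math import comb
import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster labelings of the same points."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    # contingency table of joint label counts
    table = np.array([[np.sum((labels_a == i) & (labels_b == j))
                       for j in np.unique(labels_b)]
                      for i in np.unique(labels_a)])
    sum_ij = sum(comb(int(n), 2) for n in table.ravel())
    sum_a = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_b = sum(comb(int(n), 2) for n in table.sum(axis=0))
    total = comb(len(labels_a), 2)
    expected = sum_a * sum_b / total     # value expected under independent labelings
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# identical partitions (up to label switching) give ari = 1
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

The index is invariant to relabeling the clusters, which is why it is suitable for comparing MCMC-based cluster solutions with the true classes.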
Children's Fear Data; contingency table summarizing the data, which measure motor activity (M) at 4 months, fret/cry behavior (C) at 4 months, and fear of unfamiliar events (F) at 14 months for children (Stern et al. 1994)
| M | C | F = 1 | F = 2 | F = 3 |
|---|---|---|---|---|
| 1 | 1 | 5 | 4 | 1 |
| | 2 | 0 | 1 | 2 |
| | 3 | 2 | 0 | 2 |
| 2 | 1 | 15 | 4 | 2 |
| | 2 | 2 | 3 | 1 |
| | 3 | 4 | 4 | 2 |
| 3 | 1 | 3 | 3 | 4 |
| | 2 | 0 | 2 | 3 |
| | 3 | 1 | 1 | 7 |
| 4 | 1 | 2 | 1 | 2 |
| | 2 | 0 | 1 | 3 |
| | 3 | 0 | 3 | 3 |
Children's Fear Data; the rows in the upper table show the posterior distribution of the number of clusters for various latent class models: sparse latent class models (SFM) under two hyper priors (the second matched to the DPM) and DPM under two hyper priors (the second matched to the SFM)
| Prior | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| SFM | | | | | | | |
| | 0 | | 0.249 | 0.058 | 0.007 | 0.001 | 0.000 |
| Matched to DPM | 0 | 0.128 | 0.267 | | 0.201 | 0.090 | 0.033 |
| DPM | | | | | | | |
| | 0 | 0.101 | 0.235 | | 0.197 | 0.118 | 0.103 |
| Matched to SFM | 0 | | 0.251 | 0.048 | 0.011 | 0.002 | 0.000 |
The lower table shows log marginal likelihoods estimated for a latent class model (FM) for increasing K
The posterior mode is denoted in bold (upper table). The number of components K with the largest marginal likelihood is denoted in bold (lower table)
Fig. 3 Children's Fear Data; posterior distributions of the number of clusters; top: sparse finite mixtures under the original hyper prior (left-hand side) and the matched prior (right-hand side); bottom: DPM under the original hyper prior (right-hand side) and the matched prior (left-hand side)
Children's Fear Data; posterior inference for the class-specific occurrence probabilities and the class sizes, based on all MCMC draws that correspond to two clusters
| | Class 1 | Class 2 |
|---|---|---|
| M = 1 | 0.146 (0.032, 0.267) | 0.225 (0.103, 0.358) |
| M = 2 | 0.170 (0.010, 0.319) | |
| M = 3 | 0.126 (0.015, 0.239) | |
| M = 4 | 0.276 (0.127, 0.418) | 0.076 (0.002, 0.159) |
| C = 1 | 0.263 (0.078, 0.419) | |
| C = 2 | 0.311 (0.170, 0.478) | 0.109 (0.007, 0.212) |
| C = 3 | 0.212 (0.079, 0.348) | |
| F = 1 | 0.069 (0.000, 0.177) | |
| F = 2 | 0.298 (0.119, 0.480) | 0.279 (0.117, 0.447) |
| F = 3 | 0.090 (0.000, 0.211) | |
| Class size | 0.470 (0.303, 0.645) | 0.530 (0.355, 0.698) |
The values are the average of the MCMC draws, with 95% HPD intervals in parentheses
For each cluster, the most probable outcome for each feature is denoted in bold
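The 95% HPD intervals given in parentheses are highest posterior density intervals, i.e. the shortest intervals containing 95% of the MCMC draws. A minimal sketch of how such an interval can be computed from a vector of draws (the function name and the Beta example are illustrative assumptions, not from the paper):

```python
import numpy as np

def hpd_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the MCMC draws."""
    draws = np.sort(np.asarray(draws))
    n = len(draws)
    m = int(np.ceil(prob * n))              # number of draws the interval must cover
    widths = draws[m - 1:] - draws[:n - m + 1]
    i = int(np.argmin(widths))              # start of the narrowest covering window
    return float(draws[i]), float(draws[i + m - 1])

rng = np.random.default_rng(0)
draws = rng.beta(2, 5, size=5000)           # stand-in for posterior draws of a probability
lo, hi = hpd_interval(draws)
```

For skewed posteriors, such as those of probabilities near 0 or 1, the HPD interval is shorter than the equal-tailed credible interval.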
Eye Tracking Data; the rows in the upper table show the posterior distribution of the number of clusters for the following Poisson mixture models: sparse finite mixtures (SFM) under two hyper priors (the second matched to the DPM) and DPM under two hyper priors (the second matched to the SFM)
| Prior | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| SFM | | | | | | | |
| | 0.000 | 0.091 | 0.266 | 0.056 | 0.003 | 0.000 | |
| Matched to DPM | 0.000 | 0.007 | 0.174 | 0.299 | 0.153 | 0.059 | |
| DPM | | | | | | | |
| | 0.005 | 0.095 | 0.209 | 0.173 | 0.134 | 0.161 | |
| Matched to SFM | 0.000 | 0.012 | 0.379 | 0.122 | 0.022 | 0.002 | |
The lower table shows log marginal likelihoods estimated for a finite mixture (FM) for increasing K
The posterior mode is denoted in bold (upper table). The number of components K with the largest marginal likelihood is denoted in bold (lower table)
Fig. 4 Eye Tracking Data; posterior distributions of the number of clusters; top: sparse finite mixtures under the original hyper prior (left-hand side) and the matched prior (right-hand side); bottom: DPM under the original hyper prior (right-hand side) and the matched prior (left-hand side)
Fabric Fault Data; the rows in the upper table show the posterior distribution of the number of clusters for the following mixtures of Poisson GLMs and negative binomial GLMs: sparse finite mixtures (SFM) under two hyper priors (the second matched to the DPM) and DPM under two hyper priors (the second matched to the SFM)
| Model | Prior | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Poisson GLM | SFM | | 0.241 | 0.006 | 0.000 |
| | Matched to DPM | 0.060 | 0.053 | 0.001 | |
| | DPM | | 0.036 | 0.049 | 0.001 |
| | Matched to SFM | 0.141 | 0.027 | 0.000 | |
| NegBin GLM | SFM | | 0.006 | | |
| | Matched to DPM | 0.093 | 0.001 | | |
| | DPM | | 0.059 | 0.001 | |
| | Matched to SFM | 0.006 | | | |
The lower table shows log marginal likelihoods estimated for finite mixtures (FM) for increasing K
The posterior mode is denoted in bold (upper table). The number of components K with the largest marginal likelihood is denoted in bold (lower table)
Fig. 5 Fabric Fault Data; posterior distributions of the number of clusters for mixtures of Poisson GLMs (left-hand side) as well as mixtures of negative binomial GLMs (right-hand side); top: based on sparse finite mixtures (SFM), bottom: based on Dirichlet process mixtures (DPM) under various hyper priors
Alzheimer Data; the rows in the upper table show the posterior distribution of the number of clusters for the following mixtures of univariate skew normal and skew-t distributions: sparse finite mixtures (SFM) under two hyper priors (the second matched to the DPM) and DPM under two hyper priors (the second matched to the SFM)
| Prior | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Skew normal | | | | | | | |
| SFM | | | | | | | |
| | 0.0127 | 0.193 | 0.029 | 0.005 | 0.000 | 0.000 | |
| Matched to DPM | 0.000 | 0.268 | 0.228 | 0.119 | 0.049 | 0.026 | |
| DPM | | | | | | | |
| | 0.000 | 0.181 | 0.214 | 0.139 | 0.083 | 0.082 | |
| Matched to SFM | 0.000 | 0.182 | 0.029 | 0.004 | 0.000 | 0.000 | |
| Skew-t | | | | | | | |
| SFM | | | | | | | |
| | 0.263 | 0.124 | 0.015 | 0.001 | 0.000 | 0.000 | |
| Matched to DPM | 0.034 | 0.301 | 0.205 | 0.094 | 0.032 | 0.013 | |
| DPM | | | | | | | |
| | 0.003 | 0.275 | 0.206 | 0.124 | 0.058 | 0.045 | |
| Matched to SFM | 0.211 | 0.214 | 0.065 | 0.016 | 0.002 | 0.000 | |
The lower table shows log marginal likelihoods estimated for finite mixtures (FM) for increasing K
The posterior mode is denoted in bold (upper table). The number of components K with the largest marginal likelihood is denoted in bold (lower table)
Fig. 6 Alzheimer Data; posterior distributions of the number of clusters for mixtures of skew normal (left-hand panel) as well as mixtures of skew-t distributions (right-hand panel); top row in each panel: sparse finite mixtures under the original hyper prior (left column) and the matched prior (right column); bottom row in each panel: DPM under the original hyper prior (right column) and the matched prior (left column)
DLBCL Data; estimated number of clusters for the following mixtures of multivariate skew normal and skew-t distributions: sparse finite mixtures (SFM) under two hyper priors (the second matched to the DPM) and DPM under two hyper priors (the second matched to the SFM)
| Prior | Estimated clusters | Precision parameter |
|---|---|---|
| Skew normal | | |
| SFM | | |
| | 15 | 0.089 (0.04, 0.14) |
| Matched to DPM | 14 | 0.094 (0.04, 0.15) |
| DPM | | |
| | 26 | 1.71 (0.99, 2.49) |
| Matched to SFM | 23 | 0.68 (0.38, 0.98) |
| Skew-t | | |
| SFM | | |
| | 11 | 0.058 (0.03, 0.10) |
| Matched to DPM | 10 | 0.067 (0.03, 0.11) |
| DPM | | |
| | 14 | 1.20 (0.56, 1.86) |
| Matched to SFM | 10 | 0.37 (0.15, 0.59) |
The lower table shows log marginal likelihoods estimated for finite mixtures (FM) for increasing K
Fig. 7 DLBCL Data; posterior distributions of the number of clusters for mixtures of skew normal (left-hand panel) as well as mixtures of skew-t distributions (right-hand panel); top row in each panel: sparse finite mixtures under the original hyper prior (left column) and the matched prior (right column); bottom row in each panel: DPM under the original hyper prior (right column) and the matched prior (left column)
Posterior expectations of the precision parameter together with 95% confidence regions for the various data sets; sparse finite mixtures (SFM) versus overfitting mixtures (RM) with the prior derived from the criterion of Rousseau and Mengersen (2011)
| Data set | | | | SFM | 95% CI | RM | 95% CI |
|---|---|---|---|---|---|---|---|
| | 101 | 1 | 1 | 0.020 | (0.004, 0.04) | 0.37 | (0.18, 0.5) |
| | 93 | 3 | 7 | 0.010 | (0.0007, 0.023) | 1.30 | (0.09, 3.01) |
| | 32 | 1 | 3 | 0.004 | (0, 0.014) | 0.04 | (0, 0.13) |
| | 451 | 1 | 3 | 0.009 | (0.0001, 0.022) | 0.36 | (0.18, 0.5) |
Fig. 8 Posterior distributions of the number of clusters for the various data sets for a sparse finite mixture with a prior derived from the criterion of Rousseau and Mengersen (2011)