| Literature DB >> 33285988 |
Yiğit Uğur, George Arvanitakis, Abdellatif Zaidi.
Abstract
In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference-type algorithm that allows computing it. In the algorithm, the coders' mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.
Keywords: Gaussian mixture model; clustering; information bottleneck; unsupervised learning
Year: 2020 PMID: 33285988 PMCID: PMC7516645 DOI: 10.3390/e22020213
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
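The abstract describes an ELBO-like bound with a Gaussian-mixture prior on the latent space, approximated by sampling and optimized with stochastic gradient descent. The following is a minimal, hedged sketch (not the authors' code) of how such a bound can be estimated by Monte Carlo: a diagonal-Gaussian encoder, a reparameterized latent sample, a Gaussian decoder term, and a mixture-of-Gaussians prior density. All function and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gmm_density(z, weights, means, var=1.0):
    # log p(z) under an isotropic Gaussian mixture, computed with log-sum-exp.
    d = z.shape[-1]
    sq = ((z[None, :] - means) ** 2).sum(axis=1)              # (K,)
    comp = -0.5 * (sq / var + d * np.log(2 * np.pi * var))    # per-component log N(z)
    logp = np.log(weights) + comp
    m = logp.max()
    return m + np.log(np.exp(logp - m).sum())

def vib_gmm_bound(x, enc_mean, enc_logvar, dec, weights, means, n_samples=32):
    # Monte Carlo estimate of E_q[log p(x|z)] - KL(q(z|x) || p_GMM(z)),
    # with expectations approximated by sampling z ~ q(z|x).
    d = enc_mean.shape[0]
    std = np.exp(0.5 * enc_logvar)
    total = 0.0
    for _ in range(n_samples):
        z = enc_mean + std * rng.standard_normal(d)           # reparameterization trick
        # log q(z|x): diagonal-Gaussian encoder density at the sample
        log_q = -0.5 * (((z - enc_mean) / std) ** 2 + enc_logvar
                        + np.log(2 * np.pi)).sum()
        log_p = log_gmm_density(z, weights, means)
        recon = -0.5 * ((x - dec(z)) ** 2).sum()              # Gaussian decoder term
        total += recon + log_p - log_q
    return total / n_samples

# Toy usage with a linear "decoder" and a two-component latent prior.
x = rng.standard_normal(4)
W = rng.standard_normal((4, 2))
dec = lambda z: W @ z
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
bound = vib_gmm_bound(x, np.zeros(2), np.zeros(2), dec, weights, means)
```

In the paper's setting the encoder and decoder are neural networks and the gradient of this estimate is followed with SGD; the sketch only illustrates how the sampled bound itself is assembled.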
Figure 1. Variational Information Bottleneck with Gaussian mixtures.
Figure 2. Inference network.
Figure 3. Generative network.
Comparison of the clustering accuracy of various algorithms. The algorithms are run without pretraining. Each algorithm is run ten times. The values in parentheses correspond to the standard deviations of clustering accuracies. DEC, Deep Embedded Clustering; VaDE, Variational Deep Embedding; VIB, Variational Information Bottleneck.
| Algorithm | MNIST Best Run | MNIST Average Run | STL-10 Best Run | STL-10 Average Run |
|---|---|---|---|---|
| GMM | 44.1 | 40.5 (1.5) | 78.9 | 73.3 (5.1) |
| DEC | 80.6 | | | |
| VaDE | 91.8 | 78.8 (9.1) | 85.3 | 74.1 (6.4) |
Values are taken from VaDE [19].
Comparison of the clustering accuracy of various algorithms. A stacked autoencoder is used to pretrain the DNNs of the encoder and decoder before running the algorithms (DNNs are initialized with the same weights and biases as in [19]). Each algorithm is run ten times. The values in parentheses correspond to the standard deviations of clustering accuracies.
| Algorithm | MNIST Best Run | MNIST Average Run | REUTERS10K Best Run | REUTERS10K Average Run |
|---|---|---|---|---|
| DEC | 84.3 | | 72.2 | |
| VaDE | 94.2 | 93.2 (1.5) | 79.8 | 79.1 (0.6) |
Values are taken from DEC [20].
Figure 4. Accuracy vs. the number of epochs for the STL-10 dataset.
Figure 5. Information plane for the STL-10 dataset.
Figure 6. Visualization of the latent space before training; and after 1, 5, and 500 epochs.