Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck.

Andrew K Tan, Max Tegmark, Isaac L Chuang

Abstract

At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection, with a focus on points previously hidden by the cloak of the convex hull.
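For context, the Lagrangian DIB of Strouse and Schwab (reference 5 below) and the primal, constrained form studied here can be sketched as follows; the primal formulation is a reading of the abstract, not a quotation from the paper:

```latex
% Lagrangian DIB (Strouse & Schwab): scalarized trade-off, T = f(X)
\min_{f:\,\mathcal{X}\to\mathcal{T}} \; H(T) - \beta\, I(T;Y), \qquad \beta \ge 0

% Primal DIB, as described in the abstract: sweep the bound H_0 to trace
% the Pareto frontier of (H(T), I(T;Y)) over hard clusterings f
\max_{f:\,\mathcal{X}\to\mathcal{T}} \; I(T;Y) \quad \text{s.t.} \quad H(T) \le H_0
```

A minimal Python sketch of the two ingredients any such frontier-mapping procedure needs, evaluating both objectives for a hard clustering and maintaining a set of non-dominated points, is given below. This is a generic illustration, not the paper's algorithm; `pxy`, `assign`, and all function names are invented for the example:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def dib_objectives(pxy, assign):
    """Return (H(T), I(T;Y)) for the hard clustering T = assign[X].

    pxy    : 2-D array, joint distribution p(x, y)
    assign : 1-D int array, cluster label t for each x
    """
    pty = np.zeros((int(assign.max()) + 1, pxy.shape[1]))
    for x, t in enumerate(assign):
        pty[t] += pxy[x]              # merge rows of p(x, y) within a cluster
    pt, py = pty.sum(axis=1), pty.sum(axis=0)
    i_ty = entropy(pt) + entropy(py) - entropy(pty.ravel())  # I = H(T)+H(Y)-H(T,Y)
    return entropy(pt), i_ty

def update_frontier(frontier, point):
    """Keep only Pareto-optimal (H, I) pairs: smaller H and larger I dominate."""
    h, i = point
    if any(h2 <= h and i2 >= i for h2, i2 in frontier):
        return frontier               # dominated by an existing point
    kept = [(h2, i2) for h2, i2 in frontier if not (h <= h2 and i >= i2)]
    return kept + [(h, i)]

# Toy usage: naive random search over hard clusterings of 8 x-values.
rng = np.random.default_rng(0)
pxy = rng.random((8, 4)); pxy /= pxy.sum()
frontier = []
for _ in range(1000):
    assign = rng.integers(0, 4, size=8)
    frontier = update_frontier(frontier, dib_objectives(pxy, assign))
```

Sweeping beta in the Lagrangian recovers only points on the convex hull of this frontier; the abstract's point is that optimizing the primal problem directly also exposes the Pareto-optimal points hidden between hull vertices.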

Keywords:  Pareto; bottleneck; clustering; frontier; information; multi-objective; optimization

Year:  2022        PMID: 35741492      PMCID: PMC9222302          DOI: 10.3390/e24060771

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.738


References: 10 in total

1.  Estimating mutual information.

Authors:  Alexander Kraskov; Harald Stögbauer; Peter Grassberger
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2004-06-23

2.  Entropy and information in neural spike trains: progress on the sampling problem.

Authors:  Ilya Nemenman; William Bialek; Rob de Ruyter van Steveninck
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2004-05-24

3.  A spiking neuron as information bottleneck.

Authors:  Lars Buesing; Wolfgang Maass
Journal:  Neural Comput       Date:  2010-08       Impact factor: 2.026

4.  How many clusters? An information-theoretic perspective.

Authors:  Susanne Still; William Bialek
Journal:  Neural Comput       Date:  2004-12       Impact factor: 2.026

5.  The Deterministic Information Bottleneck.

Authors:  D J Strouse; David J Schwab
Journal:  Neural Comput       Date:  2017-04-14       Impact factor: 2.026

6.  The Information Bottleneck and Geometric Clustering.

Authors:  D J Strouse; David J Schwab
Journal:  Neural Comput       Date:  2018-10-12       Impact factor: 2.026

7.  Thermodynamic Cost and Benefit of Memory.

Authors:  Susanne Still
Journal:  Phys Rev Lett       Date:  2020-02-07       Impact factor: 9.161

8.  Pareto-Optimal Data Compression for Binary Classification Tasks.

Authors:  Max Tegmark; Tailin Wu
Journal:  Entropy (Basel)       Date:  2019-12-19       Impact factor: 2.524

9.  The Conditional Entropy Bottleneck.

Authors:  Ian Fischer
Journal:  Entropy (Basel)       Date:  2020-09-08       Impact factor: 2.524
