Diego Granziol, Binxin Ru, Stefan Zohren, Xiaowen Dong, Michael Osborne, Stephen Roberts.
Abstract
Efficient approximation lies at the heart of large-scale machine learning problems. In this paper, we propose a novel, robust maximum entropy algorithm, which is capable of dealing with hundreds of moments and allows for computationally efficient approximations. We showcase the usefulness of the proposed method, its equivalence to constrained Bayesian variational inference and demonstrate its superiority over existing approaches in two applications, namely, fast log determinant estimation and information-theoretic Bayesian optimisation.
Keywords: Bayesian optimisation; log determinant estimation; maximum entropy
Year: 2019 PMID: 33267265 PMCID: PMC7515039 DOI: 10.3390/e21060551
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
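The fast log determinant estimation named in the abstract rests on spectral moments: for a symmetric positive definite n × n matrix A with eigenvalue density p(λ), log det A = n · E_p[log λ], and the moments of p can be estimated without an eigendecomposition via stochastic trace estimation. Below is a minimal sketch of that moment-estimation step using Hutchinson's Rademacher probes; the function name and defaults are illustrative, not taken from the paper:

```python
import numpy as np

def spectral_moments(A, k, num_probes=30, seed=0):
    """Estimate the first k spectral moments E[lambda^j] = tr(A^j) / n of a
    symmetric n x n matrix A with Hutchinson's stochastic trace estimator:
    tr(A^j) is approximated by averaging z^T A^j z over random Rademacher
    probe vectors z."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    moments = np.zeros(k)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        v = z
        for j in range(k):
            v = A @ v                          # v = A^(j+1) z via repeated mat-vecs
            moments[j] += z @ v                # accumulate z^T A^(j+1) z
    return moments / (num_probes * n)

# Given a maximum entropy spectral density p(lambda) fitted to these moments,
# the log determinant follows as log det A = n * E_p[log lambda].
```

For a diagonal matrix the estimate is exact regardless of the number of probes, since each z_i² = 1; for general dense matrices the variance of the estimator decays with the number of probes.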
Relative estimation error for the MEMe, Chebyshev, and Lanczos approaches, for various length-scales l (0.05 to 0.85) and the corresponding condition numbers of the squared exponential kernel matrix. [Table: the cell values and their column assignments did not survive extraction.]
Figure 1. Comparison of the classical (OMxnt) and our novel (MEMe) MaxEnt algorithms in log determinant estimation on real data. The entropy value (a) and estimation error (b) of OMxnt are shown in the top row; those of MEMe are shown in (c,d) in the bottom row.
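The MaxEnt algorithms compared in Figure 1 recover a density of the form exp(−Σ_j α_j x^j) whose moments match the estimated ones, via Newton iteration on a convex dual objective. The sketch below works in the raw power-moment basis on [0, 1] for readability; note that MEMe's contribution is precisely to avoid this basis (which conditions badly beyond a few moments) in favour of better-conditioned formulations. All names and defaults here are illustrative:

```python
import numpy as np

def fit_maxent_density(mu, grid_size=400, iters=50):
    """Fit the maximum entropy density p(x) = exp(-sum_j alpha_j x^j) on
    [0, 1] whose first len(mu) power moments match mu (mu[0] = 1 enforces
    normalisation), by Newton iteration on the convex dual objective."""
    k = len(mu)
    x = (np.arange(grid_size) + 0.5) / grid_size            # midpoint quadrature nodes
    dx = 1.0 / grid_size
    powers = np.vstack([x ** j for j in range(2 * k - 1)])  # x^0 .. x^(2k-2)
    alpha = np.zeros(k)
    for _ in range(iters):
        p = np.exp(-(powers[:k].T @ alpha))                 # current density on the grid
        m = (powers * p).sum(axis=1) * dx                   # its moments m_0 .. m_{2k-2}
        grad = np.asarray(mu, dtype=float) - m[:k]          # dual gradient: mu_j - m_j
        hess = m[np.add.outer(np.arange(k), np.arange(k))]  # Hessian H[j, l] = m_{j+l}
        alpha -= np.linalg.solve(hess, grad)                # Newton step
    return x, np.exp(-(powers[:k].T @ alpha))
```

Feeding in the power moments of the uniform density on [0, 1] (1, 1/2, 1/3, …) should drive α to roughly zero, i.e., recover a flat density; with many moments the Hessian in this basis becomes a near-singular Hilbert-type matrix, which is the conditioning problem MEMe addresses.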
Mean fractional error in approximating the entropy of the mixture of M Gaussians using various methods: MEMe-10, MEMe-30, MEMeL-10, MEMeL-30, VUB, Huber-2, MC-100, and MM. [Table: the error values did not survive extraction.]
Fractional error in approximating the entropy of Gaussian mixtures using various methods: MEMe-10, MEMe-30, MEMeL-10, MEMeL-30, Variational Upper Bound, Huber-2, MC-100, and Moment Matching. [Table: the error values did not survive extraction.]
Mean runtime of approximating the entropy of the mixture of M Gaussians using various methods: MEMe-10, MEMe-30, MEMeL-10, MEMeL-30, VUB, Huber-2, MC-100, and MM. [Table: the runtime values did not survive extraction.]
Runtime of approximating the entropy of Gaussian mixtures using various methods: MEMe-10, MEMe-30, MEMeL-10, MEMeL-30, Variational Upper Bound, Huber-2, MC-100, and Moment Matching. [Table: the runtime values did not survive extraction.]
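The MC-100 baseline in the error and runtime tables above refers to Monte Carlo estimation of the mixture entropy H = −E_p[log p(x)] from 100 samples. A minimal one-dimensional sketch (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def gmm_entropy_mc(means, sigmas, weights, num_samples=100, seed=0):
    """Monte Carlo estimate of the differential entropy H = -E_p[log p(x)]
    of a one-dimensional Gaussian mixture: sample from the mixture, then
    average the negative log of the mixture density at those samples."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    comp = rng.choice(len(weights), size=num_samples, p=weights)  # pick components
    xs = rng.normal(means[comp], sigmas[comp])                    # sample from them
    # mixture density at every sample, shape (num_samples,)
    dens = (weights * np.exp(-0.5 * ((xs[:, None] - means) / sigmas) ** 2)
            / (np.sqrt(2.0 * np.pi) * sigmas)).sum(axis=1)
    return -np.mean(np.log(dens))
```

With num_samples=100 this is unbiased but carries O(1/√100) sampling noise on every acquisition-function evaluation, which is the cost/accuracy trade-off the moment-based MEMe approximations are designed to avoid.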
Figure 2. Bayesian optimisation (BO) on a 1D toy example with acquisition functions computed by different approximation methods. In the top subplot, the red dashed line is the unknown objective function, the black crosses are the observed data points, and the blue solid line and shaded area are the posterior mean and variance, respectively, of the GP surrogate that we use to model the latent objective function. The coloured triangles are the next query points recommended by the BO algorithms, which correspond to the maximisers of the acquisition functions in the bottom subplot. In the bottom subplot, the red solid line, black dashed line, and green dotted line are the acquisition functions computed by Quad, MM, and MEMe using 10 Legendre moments, respectively.
Figure 3. Performance of various versions of FITBO on two benchmark test problems: (a) the Michalewicz-5D function and (b) the Hartmann-6D function. The immediate regret (IR) on the y-axis is shown on a base-10 logarithmic scale.