Trevor Herntier, Koffi Eddy Ihou, Anthony Smith, Anand Rangarajan, Adrian Peter.
Abstract
We consider the problem of model selection using the Minimum Description Length (MDL) criterion for distributions with parameters on the hypersphere. Model selection algorithms aim to find a compromise between goodness of fit and model complexity. Variables often considered for complexity penalties include the number of parameters, the sample size and the shape of the parameter space, with the penalty term often referred to as stochastic complexity. Current model selection criteria either ignore the shape of the parameter space or incorrectly penalize the complexity of the model, largely because typical Laplace approximation techniques yield inaccurate results for curved spaces. We demonstrate how the use of a constrained Laplace approximation on the hypersphere yields a novel complexity measure that more accurately reflects the geometry of these spherical parameter spaces. We refer to this modified model selection criterion as spherical MDL. As proof of concept, spherical MDL is used for bin selection in histogram density estimation, performing favorably against other model selection criteria.
Keywords: Fisher information; Fisher–Bingham distribution; Jeffreys prior; Laplace approximation; MDL; information geometry; model selection; von Mises–Fisher distribution
Year: 2018 PMID: 33265664 PMCID: PMC7513099 DOI: 10.3390/e20080575
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Riemannian volume of the hypersphere. The surface area of a hyperspherical manifold plotted against the cardinality of the parameters. Interestingly, the surface area grows to a maximum at seven dimensions and then monotonically decreases. Accordingly, a seven-dimensional model family requires a relatively large ellipsoid around the MLE in order to avoid excessive penalties for complexity.
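The peak described in the Figure 1 caption can be checked numerically. The sketch below assumes the plotted quantity is the surface area of the unit sphere S^(n-1) embedded in R^n, given by the standard formula 2π^(n/2)/Γ(n/2), with n playing the role of the parameter cardinality. The function name `unit_sphere_surface_area` and the range of dimensions are illustrative choices, not taken from the paper.

```python
import math


def unit_sphere_surface_area(n: int) -> float:
    """Surface area of the unit sphere S^(n-1) embedded in R^n.

    Uses the closed form 2 * pi^(n/2) / Gamma(n/2).
    Assumption: n counts ambient dimensions and the radius is 1.
    """
    if n < 1:
        raise ValueError("dimension must be a positive integer")
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)


# Tabulate the area for a range of dimensions (range chosen for illustration).
areas = {n: unit_sphere_surface_area(n) for n in range(1, 21)}

# Dimension at which the surface area is largest.
peak_dimension = max(areas, key=areas.get)

if __name__ == "__main__":
    for n, area in areas.items():
        marker = "  <- maximum" if n == peak_dimension else ""
        print(f"n = {n:2d}  area = {area:10.4f}{marker}")
```

Under this convention the area rises up to n = 7 and falls for every larger n, which matches the behavior the caption describes.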
Figure 2. Four different densities selected for varying characteristics. Bimodal (left), skewed (center left), trimodal (center right) and claw (right).
Frequency, over 2500 trials, with which the choice made by Akaike’s information criterion (AIC), Bayesian information criterion (BIC), two-part Minimum Description Length (MDL2) and asymptotic MDL (MDL) deviates from the choice of spherical MDL, for a sample size of 60 drawn from different distributions. We found that BIC consistently penalizes complexity the most, while AIC and MDL2 are consistently forgiving of complex models. Spherical MDL and ordinary MDL offer a compromise between goodness of fit and complexity, with spherical MDL always choosing a less complex model, showing that ordinary MDL underpenalizes the complexity of curved parameter spaces.
| Distribution | AIC | BIC | MDL2 | MDL |
|---|---|---|---|---|
| Bimodal | 1407 | 221 | 1372 | 4 |
| Skew | 1441 | 200 | 1349 | 9 |
| Trimodal | 1478 | 197 | 1323 | 3 |
| Claw | 1569 | 257 | 1471 | 6 |
| Total | 5895 | 875 | 5515 | 22 |
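To make the comparison in the table concrete, here is a minimal sketch of bin selection for a histogram density estimate using two of the baseline criteria, AIC and BIC, in their textbook forms. It does not implement spherical MDL, MDL2 or asymptotic MDL, whose penalty terms are not given in this excerpt. The equal-width binning over the sample range, the count of m - 1 free parameters for m bins, the bimodal test sample and all function names are assumptions made for illustration.

```python
import math
import random


def histogram_log_likelihood(samples, num_bins, lo, hi):
    """Maximized log-likelihood of an equal-width histogram density on [lo, hi]."""
    n = len(samples)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    # The density in bin j is count_j / (n * width); empty bins contribute nothing.
    return sum(c * math.log(c / (n * width)) for c in counts if c > 0)


def aic(log_lik, num_params):
    """Akaike's information criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * num_params


def bic(log_lik, num_params, n):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_lik + num_params * math.log(n)


def select_bins(samples, max_bins, criterion):
    """Return the bin count in 1..max_bins that minimizes the chosen criterion."""
    n = len(samples)
    lo, hi = min(samples), max(samples)
    scores = {}
    for m in range(1, max_bins + 1):
        log_lik = histogram_log_likelihood(samples, m, lo, hi)
        k = m - 1  # bin probabilities sum to one, leaving m - 1 free parameters
        scores[m] = aic(log_lik, k) if criterion == "aic" else bic(log_lik, k, n)
    return min(scores, key=scores.get)


# Illustrative data: 60 draws from a two-component Gaussian mixture (assumed setup).
rng = random.Random(0)
samples = [rng.gauss(-2.0, 0.7) if rng.random() < 0.5 else rng.gauss(2.0, 0.7)
           for _ in range(60)]

aic_choice = select_bins(samples, max_bins=20, criterion="aic")
bic_choice = select_bins(samples, max_bins=20, criterion="bic")

if __name__ == "__main__":
    print("AIC bins:", aic_choice)
    print("BIC bins:", bic_choice)
```

For sample sizes of 8 or more, log n exceeds 2, so the BIC penalty per parameter is larger than the AIC penalty and BIC can never select more bins than AIC on the same data. This is consistent with the observation that BIC penalizes complexity the most while AIC is forgiving of complex models.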