| Literature DB >> 31570618 |
Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan.
Abstract
Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical understanding of the relationships between these two kinds of methodology, and limited understanding of their relative strengths and weaknesses. Moreover, existing results have been obtained primarily in the setting of convex functions (for optimization) and log-concave functions (for sampling). In this setting, where local properties determine global properties, optimization algorithms are unsurprisingly more efficient computationally than sampling algorithms. We instead examine a class of nonconvex objective functions that arise in mixture modeling and multistable systems. In this nonconvex setting, we find that the computational complexity of sampling algorithms scales linearly with the model dimension while that of optimization algorithms scales exponentially.
Keywords: Langevin Monte Carlo; computational complexity; nonconvex optimization
Year: 2019 PMID: 31570618 PMCID: PMC6800351 DOI: 10.1073/pnas.1820003116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Depiction of an instance of the objective, inside a region of fixed radius, that attains the lower bound.
Fig. 2. Experimental results: scaling of the number of gradient queries required for the EM and ULA algorithms to converge with respect to the model dimension. At high dimension, too many gradient queries are required for EM to converge, so that an estimate of the convergence time is not feasible; in that regime, ULA converges within 1,500 gradient queries (not shown in the figure).
The (Metropolis-adjusted) Langevin algorithm is a gradient-based MCMC algorithm. In each step, one simulates a Gaussian proposal, centered at the current iterate shifted by a gradient step, together with a uniform random variable between 0 and 1; the uniform variable determines whether the proposal is accepted or rejected relative to the target distribution. Without the Metropolis adjustment step, the algorithm is called the unadjusted Langevin algorithm (ULA). Otherwise, it is called the Metropolis-adjusted Langevin algorithm (MALA).
| MALA |
| Input: step size h, number of iterations K, initial point x_0, target density π(x) ∝ exp(−U(x)) |
| for k = 0, 1, ..., K − 1: draw ξ_k ~ N(0, I) and u_k ~ Uniform[0, 1]; propose y_{k+1} = x_k − h ∇U(x_k) + √(2h) ξ_k |
| if u_k ≤ min{1, π(y_{k+1}) q(x_k | y_{k+1}) / (π(x_k) q(y_{k+1} | x_k))} then x_{k+1} = y_{k+1}; else x_{k+1} = x_k |
| Return x_K |
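The Langevin updates above can be sketched in Python. This is a minimal illustration, not the paper's experimental code: the function names (`langevin`, `grad_U`, `U`) and the step size are placeholders, and the Metropolis flag switches between MALA and ULA as described in the text.

```python
import numpy as np

def langevin(grad_U, U, x0, step, n_iters, metropolis=True, rng=None):
    """Langevin Monte Carlo targeting p(x) proportional to exp(-U(x)).

    metropolis=True gives MALA; metropolis=False gives ULA.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []

    def log_q(xp, xc):
        # Log density of the Gaussian proposal N(xc - step*grad_U(xc), 2*step*I),
        # up to an additive constant that cancels in the acceptance ratio.
        mean = xc - step * grad_U(xc)
        return -np.sum((xp - mean) ** 2) / (4.0 * step)

    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        # Proposal: gradient step plus Gaussian noise of scale sqrt(2*step).
        y = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
        if metropolis:
            # Accept with probability min(1, pi(y) q(x|y) / (pi(x) q(y|x))).
            log_alpha = (U(x) - U(y)) + log_q(x, y) - log_q(y, x)
            if np.log(rng.uniform()) < log_alpha:
                x = y
        else:
            x = y  # ULA: always accept, no correction step.
        samples.append(x.copy())
    return np.asarray(samples)
```

For example, with `U(x) = ||x||^2 / 2` and `grad_U(x) = x`, the sampler targets a standard Gaussian and the empirical mean of the chain approaches zero.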
GD is a classical gradient-based optimization algorithm that updates the iterate along the negative gradient direction.
| GD |
| Input: step size h, number of iterations K, initial point x_0 |
| for k = 0, 1, ..., K − 1: x_{k+1} = x_k − h ∇U(x_k) |
| Return x_K |
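The GD update can be sketched as follows; the function name `gradient_descent` and the test objective are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def gradient_descent(grad_U, x0, step, n_iters):
    """Plain gradient descent: x_{k+1} = x_k - step * grad_U(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad_U(x)
    return x
```

On a strongly convex quadratic such as `U(x) = ||x - 1||^2 / 2` (gradient `x - 1`), the iterates contract geometrically toward the minimizer at `1`.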