Literature DB >> 35502199

Accelerated Optimization on Riemannian Manifolds via Discrete Constrained Variational Integrators.

Abstract

A variational formulation for accelerated optimization on normed vector spaces was recently introduced in Wibisono et al. (PNAS 113:E7351-E7358, 2016), and later generalized to the Riemannian manifold setting in Duruisseaux and Leok (SJMDS, 2022a). This variational framework was exploited on normed vector spaces in Duruisseaux et al. (SJSC 43:A2949-A2980, 2021) using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization, and it was observed that geometric discretizations which respect the time-rescaling invariance and symplecticity of the Lagrangian and Hamiltonian flows were substantially less prone to stability issues, and were therefore more robust, reliable, and computationally efficient. As such, it is natural to develop time-adaptive Hamiltonian variational integrators for accelerated optimization on Riemannian manifolds. In this paper, we consider the case of Riemannian manifolds embedded in a Euclidean space that can be characterized as the level set of a submersion. We will explore how holonomic constraints can be incorporated in discrete variational integrators to constrain the numerical discretization of the Riemannian Hamiltonian system to the Riemannian manifold, and we will test the performance of the resulting algorithms by solving eigenvalue and Procrustes problems formulated as optimization problems on the unit sphere and Stiefel manifold.

Entities: Chemical

Keywords: Accelerated optimization; Constrained variational integrators; Riemannian optimization; Symplectic optimization

Year: 2022 PMID： 35502199 PMCID： PMC9046732 DOI： 10.1007/s00332-022-09795-9

Source DB: PubMed Journal: J Nonlinear Sci ISSN： 0938-8974 Impact factor: 3.443

Introduction

Many data analysis algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of the data sets and size of the problems, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov’s accelerated gradient method (Nesterov 1983) was shown to converge in to the minimum of the convex objective function f, improving on the convergence rate exhibited by standard gradient descent methods. This convergence rate was shown in Nesterov (2004) to be optimal among first-order methods using only information about at consecutive iterates. This phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov’s algorithm, which was shown in Su et al. (2016) to limit to a second-order ordinary differential equation (ODE), as the timestep goes to 0, and that f(x(t)) converges to its optimal value at a rate of along any trajectory x(t) of this ODE. It was then shown in Wibisono et al. (2016) that in continuous time, an arbitrary convergence rate can be achieved in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This variational framework and the time-rescaling property of this family were then exploited in Duruisseaux et al. (2021) by using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity could result in a significant gain in computational efficiency. More generally, when applied to Hamiltonian systems, symplectic integrators yield discrete approximations of the flow that preserve the symplectic 2-form (Hairer et al. 2006). The preservation of symplecticity results in the preservation of many qualitative aspects of the underlying dynamical system. In particular, when applied to conservative Hamiltonian systems, symplectic integrators exhibit excellent long-time near-energy preservation (Benettin and Giorgilli 1994; Reich 1999). Variational integrators provide a systematic method for constructing symplectic integrators of arbitrarily high-order based on the numerical discretization of Hamilton’s principle (Marsden and West 2001; Hall and Leok 2015), or equivalently, by the approximation of Jacobi’s solution of the Hamilton–Jacobi equation, which is a generating function for the exact symplectic flow map. In the past few years, there has been some effort to derive accelerated optimization algorithms in the Riemannian manifold setting (Duruisseaux and Leok 2022a; Alimisis et al. 2020a, b, 2021; Zhang and Sra 2016, 2018; Ahn and Sra 2020; Liu et al. 2017). In Duruisseaux and Leok (2022a), it was shown that in continuous time, the convergence rate of f(x(t)) to its optimal value can be accelerated to an arbitrary convergence rate on Riemannian manifolds. This was achieved by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds which is closed under time-rescaling, thereby generalizing the variational framework for accelerated optimization of Wibisono et al. (2016) to Riemannian manifolds. The time-adaptivity based approach relying on a Poincaré transformation from Duruisseaux et al. (2021) was also extended to the Riemannian manifold setting in Duruisseaux and Leok (2022a). Now, the Whitney embedding theorems (Whitney 1944a, b) state that any smooth manifold of dimension can be embedded in and immersed in , and is thus diffeomorphic to a submanifold of . Furthermore, the Nash embedding theorems (Nash 1956) state that any Riemannian manifold can be globally isometrically embedded into some Euclidean space. As a consequence of these embedding theorems, the study of Riemannian manifolds can in principle be reduced to the study of submanifolds of Euclidean spaces. Altogether, this motivates the introduction of time-adaptive variational integrators on Riemannian manifolds which exploit the structure of the embedding Euclidean space, and in this paper, we will study how holonomic constraints can be incorporated into different types of variational integrators to constrain the numerical solutions of the Riemannian dynamical system to the Riemannian manifold. Incorporating holonomic constraints in geometric integrators has been studied extensively in the past (see Marsden and West (2001); Hairer et al. (2006); Marsden and Ratiu (1999); Holm et al. (2009) for instance), and some work has been done from the variational perspective for the Type I Lagrangian formulation in Marsden and West (2001) via augmented Lagrangians.

Outline of the Paper

Section 2 shows the equivalence between constrained variational principles and constrained Euler–Lagrange equations, both in continuous and discrete time, before deriving analogous results for both the Type II and Type III Hamiltonian formulations of mechanics in Sect. 3. In Sect. 4, we will exploit error analysis theorems for unconstrained mechanics from Marsden and West (2001); Schmitt and Leok (2017) to obtain variational error analysis results for the maps defined implicitly by the discrete constrained Euler–Lagrange and Hamilton’s equations. Finally, in Sect. 5, we will exploit these constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to solve numerically generalized eigenvalue problems and Procrustes problems on the unit sphere and Stiefel manifold.

Constrained Variational Lagrangian Mechanics

Traditionally, variational integrators have been designed based on the Type I generating function known as the discrete Lagrangian, . The exact discrete Lagrangian is the exact generating function for the time-h flow map of Hamilton’s equations, and it can be represented in boundary value form bywhere and q satisfies the Euler–Lagrange equations over the time interval [0, h]. This is closely related to Jacobi’s solution of the Hamilton–Jacobi equation. A variational integrator is defined by constructing an approximation to the exact discrete Lagrangian , and then applying the implicit discrete Euler–Lagrange equations,which implicitly define a numerical integrator, referred to as the discrete Hamiltonian map , where denotes a partial derivative with respect to the i-th argument. These equations define the discrete Legendre transforms, :and the discrete Hamiltonian map can be expressed as . Such numerical methods are called variational integrators as they can be derived from a discrete Hamilton’s principle, which involves extremizing a discrete action sum , subject to fixed boundary conditions on , . Now, suppose we are given a configuration manifold , and a holonomic constraint function . Assuming that is a regular point of , we can constrain the dynamics to the constraint submanifold which is truly a submanifold of (see Marsden and West (2001); Abraham et al. (1988)). We will now consider variational Lagrangian mechanics with holonomic constraints using Lagrange multipliers .

Continuous Constrained Variational Lagrangian Mechanics

We begin by presenting an equivalence between the continuous constrained variational principle and the continuous constrained Euler–Lagrange equations:

Theorem 2.1

Consider the constrained action functional given byThe condition that is stationary with respect to the boundary conditions and is equivalent to satisfying the constrained Euler–Lagrange equations

Proof

See Appendix A.1.

Remark 2.1

These constrained Euler–Lagrange equations can be thought of as the Euler–Lagrange equations coming from the augmented Lagrangian . Consider the function given by the extremal value of the constrained action functional over the family of curves satisfying the boundary conditions and :The following theorem shows that is a generating function for the flow of the continuous constrained Euler–Lagrange equations:

Theorem 2.2

The exact time-T flow map of Hamilton’s equations is implicitly given by the following relations:In particular, is a Type I generating function that generates the exact flow of the constrained Euler–Lagrange equations (2.6). See Appendix A.4.

Discrete Constrained Variational Lagrangian Mechanics

We now introduce a discrete variational formulation of Lagrangian mechanics which includes holonomic constraints. Suppose we are given a partition of the interval [0, T], and a discrete curve in denoted by such that and . We will formulate discrete constrained variational Lagrangian mechanics in terms of the following discrete analogues of the constrained action functional given by Eq. (2.5):whereWe can now derive discrete analogues to Theorem 2.1 relating discrete Type I variational principles to discrete Euler–Lagrange equations:

Theorem 2.3

The Type I discrete Hamilton’s variational principlesare equivalent to the discrete constrained Euler–Lagrange equationswhere is defined via Eq. (2.11). See Appendix A.7.

Remark 2.2

These discrete constrained Euler–Lagrange equations can be thought of as the discrete Euler–Lagrange equations coming from the augmented discrete Lagrangians

Constrained Variational Hamiltonian Mechanics

The boundary value formulation of the exact Type II generating function of the time-h flow of Hamilton’s equations is given by the exact discrete right Hamiltonian,where (q, p) satisfies Hamilton’s equations with boundary conditions and . A Type II Hamiltonian variational integrator is constructed by using an approximate discrete right Hamiltonian , and applying the discrete right Hamilton’s equations,which implicitly define the integrator, . Similarly, the boundary value formulation of the exact Type III generating function of the time-h flow of Hamilton’s equations is given by the exact discrete left Hamiltonian,where (q, p) satisfies Hamilton’s equations with boundary conditions and . A Type III Hamiltonian variational integrator is constructed by using an approximate discrete left Hamiltonian , and applying the discrete left Hamilton’s equations,which implicitly define the integrator, . We now derive analogous results to those of Sect. 2 from the Hamiltonian perspective. As in the Lagrangian case, we will assume we have a configuration manifold , a holonomic constraint function , and that the dynamics are constrained to the submanifold .

Continuous Constrained Variational Hamiltonian Mechanics

The following theorem presents the equivalence between a continuous constrained variational principle and continuous constrained Hamilton’s equations in the Type II case, generalizing Lemma 2.1 from Leok and Zhang (2011) to include holonomic constraints:

Theorem 3.1

Consider the Type II constrained action functional The condition that is stationary with respect to the boundary conditions and is equivalent to satisfying Hamilton’s constrained equations See Appendix A.2. As in the Type II case, we can derive a theorem relating a continuous constrained variational principle and continuous constrained Hamilton’s equations in the Type III case:

Theorem 3.2

Consider the Type III constrained action functional The condition that is stationary with respect to the boundary conditions and is equivalent to satisfying Hamilton’s constrained equations See Appendix A.3.

Remark 3.1

Hamilton’s constrained equations are the same in the Type II and Type III formulations of Hamiltonian mechanics, and they can be thought of as the Hamilton’s equations generated by the augmented Hamiltonianwhere is the conjugate momentum for the variable . Furthermore, they are equivalent to the constrained Euler–Lagrange Eq. (2.6), provided that the Lagrangian L is hyperregular.

Remark 3.2

It is sometimes beneficial to augment the continuous equations with the equation (and analogously for the discrete case) to ensure that the momentum p lies in the cotangent space to the manifold, as explained and illustrated in (Hairer et al. 2006, Chapter VII). We will now generalize Theorem 2.2 and its Type III analogue from Leok and Zhang (2011) to include holonomic constraints using Lagrange multipliers . In the Type II case, consider the function given by the extremal value of the constrained action functional over the family of curves satisfying the boundary conditions and :The following theorem shows that is a generating function for the flow of the continuous constrained Hamilton’s equations:

Theorem 3.3

The exact time-T flow map of Hamilton’s equations is implicitly given by the following relations:In particular, is a Type II generating function that generates the exact flow of the constrained Hamilton’s Eq. (3.6). See Appendix A.5. In the Type III case, consider the function given by the extremal value of the constrained action functional over the family of curves satisfying the boundary conditions and :The following theorem shows that is a generating function for the flow of the continuous constrained Hamilton’s equations:

Theorem 3.4

The exact time-T flow map of Hamilton’s equations is implicitly given by the following relations:In particular, is a Type III generating function that generates the exact flow of the constrained Hamilton’s Eq. (3.8). See Appendix A.6.

Discrete Constrained Variational Hamiltonian Mechanics

Let us now extend the results of Sect. 3 from Leok and Zhang (2011) to introduce a discrete formulation of variational Hamiltonian mechanics which includes holonomic constraints. Suppose we are given a partition of the interval [0, T], and a discrete curve in , denoted by , such that , and . We formulate discrete constrained variational Hamiltonian mechanics in terms of the following discrete analogues of the constrained action functional given by Eq. (3.5):whereWe can now derive discrete analogues of Theorems 3.1 and 3.2 relating discrete variational principles to discrete constrained Hamilton’s equations, generalizing Lemma 3.1 from Leok and Zhang (2011):

Theorem 3.5

The Type II discrete Hamilton’s phase space variational principleis equivalent to the discrete constrained right Hamilton’s equationswhere is defined via Eq. (3.16). See Appendix A.8.

Theorem 3.6

The Type III discrete Hamilton’s phase space variational principleis equivalent to the discrete constrained left Hamilton’s equationswhere is defined via Eq. (3.17). See Appendix A.9.

Remark 3.3

These discrete constrained Hamilton’s equations can be thought of as the discrete Hamilton’s equations generated by the augmented discrete HamiltoniansThis augmented Hamiltonian perspective together with the augmented Lagrangian perspective from Remark 2.2 imply that the constrained variational integrator is equivalent to the constrained variational integrator whenever the variational integrator is equivalent to the variational integrator (and similarly for the integrators generated by and ). Examples where this happens are presented in Schmitt et al. (2018) for Taylor variational integrators provided the Lagrangian is hyperregular, and in Leok and Zhang (2011) for generalized Galerkin variational integrators constructed using the same choices of basis functions and numerical quadrature formula provided the Hamiltonian is hyperregular.

Error Analysis for Variational Integrators

Unconstrained Error Analysis

Theorem 2.3.1 of Marsden and West (2001) states that if a discrete Lagrangian, , approximates the exact discrete Lagrangian to order r, i.e.,then the discrete Hamiltonian map , viewed as a one-step method defined implicitly from the discrete Euler–Lagrange equationsor equivalently in terms of the implicit discrete Euler–Lagrange equations, which involve the corresponding discrete momenta via the discrete Legendre transforms,has order of accuracy r. Theorem 2.3.1 of Marsden and West (2001) has an analogue for Hamiltonian variational integrators. Theorem 2.2 in Schmitt and Leok (2017) states that if a discrete right Hamiltonian approximates the exact discrete right Hamiltonian to order r, i.e.,then the discrete right Hamiltonian map , viewed as a one-step method defined implicitly by the discrete right Hamilton’s equationsis order r accurate. As mentioned in Schmitt and Leok (2017), the proof of Theorem 2.2 in Schmitt and Leok (2017) can easily be adjusted to prove an equivalent theorem for the discrete left Hamiltonian case, which states that if a discrete left Hamiltonian approximates the exact discrete left Hamiltonian to order r, i.e.,then the discrete left Hamiltonian map , viewed as a one-step method defined implicitly by the discrete left Hamilton’s equationsis order r accurate. Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian or Hamiltonian, as opposed to analyzing the integrator directly. We will exploit these error analysis results to derive analogous results for the constrained versions discussed in Sects. 2 and 3.

Constrained Error Analysis

For the Lagrangian case, we can think of the Lagrange multipliers as extra position coordinates and define an augmented Lagrangian viaA corresponding augmented discrete Lagrangian is given byand the discrete Euler–Lagrange Eq. (4.2)yield the discrete constrained Euler–Lagrange equationsderived in Sect. 2.2. As a consequence, we can apply Theorem 2.3.1 of Marsden and West (2001) to the augmented Lagrangian (4.8) and obtain the following result:

Theorem 4.1

Suppose that for an exact discrete Lagrangian and a discrete Lagrangian ,Then, the discrete map , viewed as a one-step method defined implicitly by the discrete constrained Euler–Lagrange equations, has order of accuracy r. For the Hamiltonian case, we can think of the Lagrange multipliers as extra position coordinates and define conjugate momenta , which are constants of motion since the time-derivative of does not appear anywhere, and are constrained to be zero. The augmented Hamiltonian , given byyields the following augmented left and right discrete Hamiltoniansand the discrete left and right Hamilton’s equationsyield the discrete constrained left Hamilton’s equationsand the discrete constrained right Hamilton’s equationsderived in Sect. 3. As a consequence, we can apply Theorem 2.2 in Schmitt and Leok (2017) and its Type III analogue to the augmented Hamiltonians and obtain the following results

Theorem 4.2

Suppose that given an exact discrete right Hamiltonian and a discrete right Hamiltonian , we haveThen, the discrete map , viewed as a one-step method defined implicitly by the discrete constrained right Hamilton’s equations, has order of accuracy r.

Theorem 4.3

Suppose that given an exact discrete left Hamiltonian and a discrete left Hamiltonian , we haveThen, the discrete map , viewed as a one-step method defined implicitly by the discrete constrained left Hamilton’s equations, has order of accuracy r.

Variational Riemannian Accelerated Optimization

Riemannian Geometry

We first introduce the main notions from Riemannian geometry that will be used throughout this section (see Absil et al. (2008); Boumal (2020); Duruisseaux and Leok (2022a); Jost (2017); Lee (2018); Lang (1999) for more details).

Definition 5.1

Suppose we have a Riemannian manifold with Riemannian metric , represented by the positive-definite symmetric matrix in local coordinates. Then, we define the musical isomorphism viaand its inverse musical isomorphism . The Riemannian metric induces a fiber metric on viarepresented by the positive-definite symmetric matrix in local coordinates, which is the inverse of the Riemannian metric matrix .

Definition 5.2

Denoting the differential of f by df, the Riemannian gradient at a point of a smooth function is the tangent vector at q such thatThis can also be expressed in terms of the inverse musical isomorphism, .

Definition 5.3

A geodesic in a Riemannian manifold is a parametrized curve which is of minimal local length, and is a generalization of the notion of straight line from Euclidean spaces to Riemannian manifolds. The other generalization of straight lines involves curves having zero “acceleration" or constant “speed," which requires the introduction of an affine connection. These two generalizations are equivalent if the Riemannian manifold is endowed with the Levi–Civita connection. Given two points , a vector in can be transported to along a geodesic by an operation called the parallel transport along .

Definition 5.4

The Riemannian Exponential map at is defined viawhere is the unique geodesic in such that and , for any . is a diffeomorphism in some neighborhood containing 0, so we can define its inverse map, the Riemannian Logarithm map .

Definition 5.5

A retraction on a manifold is a smooth mapping such that for any , the restriction of to satisfiesThe Riemannian Exponential map is a natural example of a retraction on a Riemannian manifold. , where denotes the zero element of , with the canonical identification , where is the tangent map of at and is the identity map on .

Definition 5.6

A subset A of a Riemannian manifold is called geodesically uniquely convex if every two points of A are connected by a unique geodesic in A. A function is called geodesically convex if for any two points and a geodesic connecting them,Note that if f is a smooth geodesically convex function on a geodesically uniquely convex subset A,A function is called geodesically - weakly quasi-convex (-WQC) with respect to for some ifNote that a local minimum of a geodesically convex or -WQC function is also a global minimum.

Definition 5.7

Given a Riemannian manifold with sectional curvature bounded below by , and an upper bound D for the diameter of the domain of consideration, defineNote that since for all real values of x.

Hamiltonian Approach

Our approach consists in integrating the Riemannian Bregman Hamiltonian systems derived in Duruisseaux and Leok (2022a) which evolve on the Riemannian manifold , via discrete constrained variational Hamiltonian integrators which enforce the numerical solution to lie on the Riemannian manifold . With given by Eq. (5.1), we know from Duruisseaux and Leok (2022a) that if we let in the geodesically convex case, and in the geodesically -weakly quasi-convex case, we obtain the Direct approach Riemannian p-Bregman Hamiltonianand the Adaptive approach Riemannian Bregman HamiltonianIt is proven in Duruisseaux and Leok (2022a) that along the trajectories of the Riemannian p-Bregman dynamics, f(Q(t)) converges to its optimal value at a rate of , under suitable assumptions on .

Remark 5.1

In the vector space setting, these Riemannian Bregman Hamiltonians reduce to the direct and adaptive approach Bregman Hamiltonians derived in Duruisseaux et al. (2021) for convex functions:

Some Optimization Problems on Riemannian Manifolds

Rayleigh Quotient Minimization on the Unit Sphere

An eigenvector v corresponding to the largest eigenvalue of a symmetric matrix A maximizes the Rayleigh quotient over . Thus, a unit eigenvector corresponding to the largest eigenvalue of the matrix A is a minimizer of the function over the unit sphere , which can be thought of as a Riemannian submanifold with constant positive curvature of endowed with the Riemannian metric inherited from the Euclidean inner product . Solving the Rayleigh quotient optimization problem efficiently is challenging when the given symmetric matrix A is ill-conditioned and high-dimensional. Note that an efficient algorithm that solves the above minimization problem can also be used to find eigenvectors corresponding to the smallest eigenvalue of A by using the fact that the eigenvalues of A are the negative of the eigenvalues of .

Eigenvalue and Procrustes Problems on the Stiefel Manifold

When endowed with the Riemannian metric , the Stiefel manifoldis a Riemannian submanifold of . The tangent space at any is given by and the orthogonal projection onto is given by A retraction on is given by where denotes the Q factor of the QR factorization of the matrix as where and R is an upper triangular matrix with strictly positive diagonal elements (Absil et al. 2008). A generalized eigenvector problem consists of finding the m smallest eigenvalues of a symmetric matrix A and corresponding eigenvectors. This problem can be formulated as a Riemannian optimization problem on the Stiefel manifold via the Brockett cost functionwhere for arbitrary . The columns of a global minimizer of f are eigenvectors corresponding to the m smallest eigenvalues of A (see Absil et al. (2008)). If we define via then f is the restriction of to soThe unbalanced orthogonal Procrustes problem consists of minimizing the functionon the Stiefel manifold , for given matrices and with and , where is the Frobenius norm . If we define via then f is the restriction of to soNote that the special case where is the balanced orthogonal Procrustes problem. In this case, so and minimizing the function is replaced by the problem of maximizing over . A solution is then given by where is the Singular Value Decomposition of with square orthogonal matrices U and V, and the solution is unique provided is nonsingular (see Eldén and Park (1999); Golub and Van Loan (2013)).

Numerical Methods

Hamiltonian Taylor Variational Integrators (HTVIs)

HTVIs were first introduced in Schmitt et al. (2018). A discrete approximate Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Hamiltonian Taylor variational integrator is then generated by the discrete Hamilton’s equations. More explicitly, Type II HTVIs are constructed as follows: Note that the following error analysis result concerning the order of accuracy of HTVIs was derived in Duruisseaux et al. (2021) (it can be extended to the constrained case via the strategy and results of Sect. 4.2): Construct the r-order and -order Taylor methods and approximating the exact time-h flow map . Approximate by the solution of where . Choose a quadrature rule of order s with weights and nodes given by for and generate approximations via Approximate via where . Use the continuous Legendre transform to obtain . Apply the quadrature rule to obtain the associated discrete right Hamiltonian The variational integrator is then defined by the discrete right Hamilton’s equations.

Theorem 5.1

If the Hamiltonian H and its partial derivative are Lipschitz continuous in both variables, then approximates with at least order of accuracy . By Theorem 2.2 in Schmitt and Leok (2017), the associated discrete Hamiltonian map has the same order of accuracy. In this paper, we will use the Direct approach and Adaptive approach Type II HTVIs constructed in Duruisseaux et al. (2021) based on the Direct and Adaptive discrete right Hamiltonians (respectively)

Euler–Lagrange Simple Discretization

In Duruisseaux and Leok (2022a), the p-Bregman Euler–Lagrange equations were rewritten as a first-order system of differential equations, for which a Riemannian version of a semi-implicit Euler scheme was applied to obtain the following algorithm: Version I of Algorithm 2 corresponds to the usual update for the semi-implicit Euler scheme, while Version II is inspired by the reformulation of Nesterov’s method from Sutskever et al. (2013) that uses a corrected gradient instead of the traditional gradient .

Riemannian Gradient Descent (RGD)

This is a generalization of Gradient Descent to the setting of Riemannian manifolds which involves the Riemannian gradient and a retraction.

Numerical Results

It is noted in Duruisseaux and Leok (2022a) that although higher values of p in Algorithm 2 result in provably faster rates of convergence, they also appear to be more prone to stability issues under numerical discretization, which can cause the numerical optimization algorithm to diverge. Numerical experiments in Duruisseaux et al. (2021) showed that in the normed vector space setting, geometric discretizations which respect the time-rescaling invariance and symplecticity of the Bregman Lagrangian and Hamiltonian flows were substantially less prone to these stability issues, and were therefore more robust, reliable, and computationally efficient. This was one of the motivations to develop time-adaptive Hamiltonian variational integrators for the Bregman Hamiltonians. Numerical experiments were conducted for the Rayleigh quotient minimization problem on , and for the generalized eigenvalue and Procrustes problems on the Stiefel manifold . Comparison of the Direct and Adaptive (AD) Type II HTVIs with the Riemannian Gradient Descent (RGD) method and the Euler–Lagrange discretizations (EL V1 and EL V2) from Duruisseaux and Leok (2022a) with and the same timestep , for the Rayleigh quotient minimization problem on the unit sphere , and for the generalized eigenvalue and Procrustes problems on the Stiefel manifold The results from Fig. 1 show how the Hamiltonian Taylor variational integrators compare to the Euler–Lagrange discretizations from Duruisseaux and Leok (2022a) and the standard Riemannian gradient descent. Note that for certain instances of the Procrustes problem with certain initial values, all the algorithms converged to a local minimizer, and not the global minimizer, of the objective function. We can observe from Fig. 1 that for the same value of the timestep h, the Adaptive Hamiltonian variational integrator clearly outperforms its Direct counterpart, Riemannian gradient descent and the Euler–Lagrange discretizations in terms of number of iterations required. Furthermore, unlike the Euler–Lagrange discretizations (Algorithm 2) and the Riemannian gradient descent (Algorithm 3), the HTVI methods (Algorithm 1) do not require the use of retractions or parallel transports. Note that the Rayleigh minimization results indicate that the Euler–Lagrange discretizations suffer from stability issues leading to a loss of convergence, as the polynomially growing unbounded coefficient is multiplied with , so for this product to be bounded, the gradient has to decay to zero, but due to finite numerical precision, the gradient remains bounded away from zero, thereby causing the product to grow without bound. This issue can be resolved by adding a suitable upper bound to the coefficient in the updates, as can be seen both for the Euler–Lagrange discretizations and Hamiltonian variational integrators for the problems on .

Fig. 1

Comparison of the Direct and Adaptive (AD) Type II HTVIs with the Riemannian Gradient Descent (RGD) method and the Euler–Lagrange discretizations (EL V1 and EL V2) from Duruisseaux and Leok (2022a) with and the same timestep , for the Rayleigh quotient minimization problem on the unit sphere , and for the generalized eigenvalue and Procrustes problems on the Stiefel manifold

However, the algorithms generated by these constrained Hamiltonian variational integrators are implicit, which can significantly increase the cost per iteration as the dimension of the problem becomes very large. In this case, it might be beneficial to consider other options using the unconstrained explicit Hamiltonian Taylor variational integrator, such as incorporating the constraints within the objective function as a penalty, although this might not constrain the solution trajectory to lie exactly on the manifold, or using projections if they can be computed efficiently and accurately for the Riemannian manifold of interest (Duruisseaux and Leok 2021). Further, note that the implementation of the Hamiltonian variational integrators needs a very careful tuning of the various parameters at play, which may be challenging and thus also motivates the development of different methods.

Conclusion

Motivated by variational formulations of optimization problems on Riemannian manifolds, we first studied the relationship between the constrained Type I/II/III variational principles and the corresponding constrained Hamilton’s or Euler–Lagrange equations both in continuous and discrete time, and derived variational error analysis results for the maps defined implicitly by the resulting discrete constrained equations. We then exploited these discrete constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to numerically solve the generalized eigenvalue and Procrustes problems on and . The numerical experiments conducted in this paper corroborated the observation made for the vector space setting in Duruisseaux et al. (2021) that the adaptive Hamiltonian variational integrator is significantly more efficient than the direct Hamiltonian variational integrator, and that it can significantly outperform the Euler–Lagrange discretizations and Riemannian gradient descent, when its parameters are tuned carefully. Furthermore, it was noted that unlike the Euler–Lagrange discretizations and Riemannian gradient descent, the Hamiltonian algorithms did not require the use of retractions or parallel transports, which could be important when the problem considered lies on a Riemannian manifold for which it might not be possible to compute or approximate these objects efficiently. We noted however that tuning the parameters of these discrete constrained variational integrators can be challenging, and also that the resulting algorithms are implicit, which may significantly increase the cost per iteration as the dimension of the problem becomes very large, in which case it might be beneficial to consider using the unconstrained explicit HTVIs with projections (Duruisseaux and Leok 2021) or by incorporating the constraints within the objective function as a penalty. Moreover, although the Whitney and Nash embedding theorems (Whitney 1944a, b; Nash 1956) imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, there are limitations to the constrained integration strategy based on embeddings presented in this paper, and an approach intrinsically defined on Riemannian manifolds would be desirable. Indeed, the embedding approach usually leads to higher-dimensional computations, and requires an effective way of constructing the embedding or a natural way of writing down equations that constrain the problem and the numerical solutions to the Riemannian manifold. Furthermore, most results in Riemannian geometry or results concerning specific Riemannian manifolds are proven from an intrinsic perspective because the embedding approach tends to flood intrinsic geometric properties of the manifold with superfluous information coming from the additional dimensions of the Euclidean space. This motivates the development of intrinsic methods that would exploit the symmetries and geometric properties of the manifold and of the problem at hand. Developing an intrinsic extension of Hamiltonian variational integrators to manifolds will require some additional work, since the current approach involves Type II/III generating functions , , which depend on the position at one boundary point, and the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space, and one should ideally consider a Hamiltonian variational integrator construction based on discrete Dirac mechanics (Leok and Ohsawa 2011), which would yield a generating function , , that depends on the position at both boundary points and the momentum at one of the boundary points. This approach can be viewed as a discretization of the generalized energy , in contrast to the Hamiltonian . On the other hand, the formulation of Lagrangian variational integrators presented in the introduction of Sect. 2 makes sense intrinsically on manifolds, and a framework for variable time-stepping in Lagrangian variational integration was introduced in our subsequent work (Duruisseaux and Leok 2022b) to design intrinsic time-adaptive Lagrangian variational integrators for accelerated optimization on Riemannian manifolds. It would also be interesting to extend the proposed approach to the problem of optimization with nonintegrable constraints, which naturally leads to the question of whether vakonomic or nonholonomic mechanics is the appropriate description (Cortés et al. 2002). In the context of optimization with nonintegrable constraints, the relevant extension will likely involve vakonomic variational integrators (Benito and Martín de Diego 2005; Jiménez and Martín de Diego 2012). However, it would be interesting to relate the methods introduced in this paper to the existing work on variational integrators applied to optimal control problems (Junge et al. 2005; de León et al. 2007), and the discrete optimal control of nonholonomic dynamical systems would likely require a combination of the methods described here and nonholonomic integrators (Cortés and Martínez 2001; de León et al. 2004; Fedorov and Zenkov 2005; McLachlan and Perlmutter 2006).

3 in total

1 in total

1. Accelerated Optimization on Riemannian Manifolds via Discrete Constrained Variational Integrators.

Authors: Valentin Duruisseaux; Melvin Leok
Journal: J Nonlinear Sci Date: 2022-04-28 Impact factor: 3.443