
A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations.

Zengru Cui¹, Gonglin Yuan², Zhou Sheng¹, Wenjie Liu³, Xiaoliang Wang¹, Xiabin Duan¹.

Abstract

This paper proposes a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations. The method uses the Moreau-Yosida regularization (smoothing) approach and a new secant equation with a BFGS update formula. The algorithm uses both function values and gradient values to approximate the Hessian. The Hessian approximation is updated by the BFGS formula instead of being computed from second-order information of the function, which reduces the computational workload and time. Under suitable conditions, the algorithm converges globally to an optimal solution. Numerical results show that the algorithm can successfully solve nonsmooth unconstrained convex problems.


Year: 2015 PMID: 26501775 PMCID: PMC4621044 DOI: 10.1371/journal.pone.0140606

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Consider the following convex problem:

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where f : ℝⁿ → ℝ is a possibly nonsmooth convex function. When f is continuously differentiable, this problem has been studied for several decades, and many methods have been developed for solving Eq (1) (for example, numerical optimization methods [1-3] and heuristic algorithms [4-6]). When f is nondifferentiable, however, the problem becomes considerably harder. Such problems have recently arisen in many medical, image restoration and optimal control applications (see [7-13]), and several authors have studied nonsmooth convex problems (see [14-18]). Let F : ℝⁿ → ℝ be the so-called Moreau-Yosida regularization of f, which is defined by

$$F(x) = \min_{z \in \mathbb{R}^n} \left\{ f(z) + \frac{1}{2\lambda}\|z - x\|^2 \right\}, \tag{2}$$

where λ is a positive parameter and ‖ ⋅ ‖ denotes the Euclidean norm. Problem Eq (1) is equivalent to the problem

$$\min_{x \in \mathbb{R}^n} F(x). \tag{3}$$

It is well known that problems Eqs (1) and (3) have the same solution set. The trust region method is one of the most effective methods for problem Eq (3). It plays an important role in nonlinear optimization and has proven to be very efficient. Levenberg [19] and Marquardt [20] first applied this method to nonlinear least-squares problems, and Powell [21] established a convergence result for it on unconstrained problems. Fletcher [22] first proposed a trust region method for composite nondifferentiable optimization problems. Over the past decades, many authors have studied trust region algorithms for minimizing nonsmooth objective functions.
For example, Sampaio, Yuan and Sun [23] applied a trust region algorithm to nonsmooth optimization problems. Sun, Sampaio and Yuan [24] proposed a quasi-Newton trust region algorithm for nonsmooth least-squares problems. Zhang [25] proposed a new trust region algorithm for nonsmooth convex minimization, and Yuan, Wei and Wang [26] proposed a gradient trust region algorithm with a limited memory BFGS update for nonsmooth convex minimization problems. For other references on trust region methods, see [27-35], among others. For the problem addressed in this study, the trust region method could be very efficient if the exact Hessian were available. However, computing the Hessian at every iteration is difficult and increases the computational workload and time. The purpose of this paper is to present an efficient trust region algorithm for solving Eq (3). Because it uses the Moreau-Yosida regularization (smoothing) and the new quasi-Newton equation, the proposed method has the following properties: (i) the Hessian approximation uses not only gradient values but also function values, and (ii) the subproblem of the method has the form of the standard trust region subproblem of unconstrained optimization, so existing methods can solve it. The remainder of this paper is organized as follows. In the next section, we briefly review some basic results in convex analysis and nonsmooth analysis and state a new quasi-Newton secant equation. In Section 3, we present a new algorithm for solving problem Eq (3). In Section 4, we prove the global convergence of the proposed method. In Section 5, we report numerical results and compare the method with existing methods for solving problem Eq (1). We conclude the paper in Section 6. Throughout this paper, unless otherwise specified, ‖ ⋅ ‖ denotes the Euclidean norm of vectors or matrices.
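The following Python sketch makes the Moreau-Yosida regularization Eq (2) concrete. It uses the one-dimensional function f(x) = |x|, an illustrative choice that is not one of the test problems in this paper. For this f, the minimizer p(x) of f(z) + (z − x)²/(2λ) is the soft-thresholding of x. F is then the Huber function, which is differentiable even though f is not. Both f and F are minimized at x = 0, which is consistent with problems Eqs (1) and (3) having the same solution set. The helper names are hypothetical.

```python
import math


def prox_abs(x: float, lam: float) -> float:
    """Minimizer p(x) of |z| + (z - x)**2 / (2*lam): soft-thresholding of x."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_yosida_abs(x: float, lam: float) -> float:
    """F(x) = min_z { |z| + (z - x)**2 / (2*lam) }, evaluated at z = p(x)."""
    p = prox_abs(x, lam)
    return abs(p) + (p - x) ** 2 / (2.0 * lam)


def huber(x: float, lam: float) -> float:
    """Closed form of the same envelope: quadratic near 0, linear far away."""
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0


for x in (-3.0, -0.4, 0.0, 0.25, 2.5):
    print(x, moreau_yosida_abs(x, 1.0), huber(x, 1.0))
```

Evaluating F through p(x), and not through the closed form, mirrors how F is used in the rest of the paper: only the (possibly approximate) minimizer of the inner problem is assumed to be available.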

Initial results

In this section, we first state some basic results in convex analysis and nonsmooth analysis. Let

$$\theta(z, x) = f(z) + \frac{1}{2\lambda}\|z - x\|^2$$

and denote p(x) := argmin_z θ(z, x). Then p(x) is well defined and unique, because θ(·, x) is strongly convex. By Eq (2), F can be rewritten as

$$F(x) = f(p(x)) + \frac{1}{2\lambda}\|p(x) - x\|^2. \tag{4}$$

In the following, we denote g(x) = ∇F(x). Some important properties of F are as follows.

(i) F is finite-valued, convex and everywhere differentiable with
$$g(x) = \nabla F(x) = \frac{x - p(x)}{\lambda}. \tag{5}$$

(ii) The gradient mapping g : ℝⁿ → ℝⁿ is globally Lipschitz continuous with modulus 1/λ, i.e.,
$$\|g(x) - g(y)\| \le \frac{1}{\lambda}\|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$$

(iii) x solves Eq (1) if and only if ∇F(x) = 0, namely, p(x) = x.

By Eqs (4) and (5), F(x) and g(x) can be obtained from the minimizer p(x) of θ(·, x). However, p(x) is difficult or even impossible to compute exactly, so the exact values of F(x) and g(x) are generally unavailable. Fortunately, for each x ∈ ℝⁿ and any ϵ > 0, there exists a vector p^α(x, ϵ) ∈ ℝⁿ such that

$$f(p^\alpha(x,\epsilon)) + \frac{1}{2\lambda}\|p^\alpha(x,\epsilon) - x\|^2 \le F(x) + \epsilon. \tag{6}$$

Thus, when ϵ is small, we can use p^α(x, ϵ) to define approximations of F(x) and g(x), respectively, as follows:

$$F^\alpha(x,\epsilon) = f(p^\alpha(x,\epsilon)) + \frac{1}{2\lambda}\|p^\alpha(x,\epsilon) - x\|^2 \tag{7}$$

and

$$g^\alpha(x,\epsilon) = \frac{x - p^\alpha(x,\epsilon)}{\lambda}. \tag{8}$$

Algorithms for computing p^α(x, ϵ) are described in [36, 37]. The following useful property of F^α(x, ϵ) and g^α(x, ϵ) is taken from [38].

Proposition 2.1 Let p^α(x, ϵ) be a vector satisfying Eq (6), and let F^α(x, ϵ) and g^α(x, ϵ) be defined by Eqs (7) and (8), respectively. Then

$$F(x) \le F^\alpha(x,\epsilon) \le F(x) + \epsilon, \tag{9}$$

$$\|p^\alpha(x,\epsilon) - p(x)\| \le \sqrt{2\lambda\epsilon}, \tag{10}$$

and

$$\|g^\alpha(x,\epsilon) - g(x)\| \le \sqrt{2\epsilon/\lambda}. \tag{11}$$

Relations Eqs (9), (10) and (11) imply that F^α(x, ϵ) and g^α(x, ϵ) can be made arbitrarily close to F(x) and g(x), respectively, by choosing the parameter ϵ small enough.

Second, recall that when f is smooth, quasi-Newton methods can be used to solve problem Eq (1). The iterate x_{k+1} satisfies ∇f_k + B_k(x_{k+1} − x_k) = 0, where ∇f_k = ∇f(x_k) and B_k is an approximation of the Hessian of f at x_k. The sequence of matrices {B_k} satisfies the secant equation

$$B_{k+1} s_k = y_k, \tag{12}$$

where y_k = ∇f_{k+1} − ∇f_k and s_k = x_{k+1} − x_k. However, Eq (12) does not exploit function values. It uses only gradient information.
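Proposition 2.1 can be checked numerically on the illustrative function f(z) = |z|. The Python sketch below uses hypothetical helper names, and the perturbation `delta` only stands in for the error of an inexact inner solver. It perturbs the exact proximal point and takes ϵ as the smallest value for which Eq (6) holds. It then tests the bounds Eqs (9) to (11). Bound Eq (10) rests on the strong convexity of θ(·, x) with modulus 1/λ, and the check exercises exactly that property.

```python
import math


def theta(z, x, lam):
    # Proximal objective theta(z, x) = f(z) + (z - x)^2 / (2*lam), with f(z) = |z|.
    return abs(z) + (z - x) ** 2 / (2.0 * lam)


def prox_abs(x, lam):
    # Exact minimizer p(x) of theta(., x): soft-thresholding.
    return math.copysign(max(abs(x) - lam, 0.0), x)


def check_prop_2_1(x, lam, delta):
    """Perturb p(x) by delta and test the bounds (9)-(11) of Proposition 2.1."""
    p = prox_abs(x, lam)
    F = theta(p, x, lam)                  # exact F(x), Eq (4)
    g = (x - p) / lam                     # exact g(x), Eq (5)
    p_alpha = p + delta                   # hypothetical inexact proximal point
    eps = theta(p_alpha, x, lam) - F      # smallest eps for which Eq (6) holds
    F_alpha = theta(p_alpha, x, lam)      # Eq (7)
    g_alpha = (x - p_alpha) / lam         # Eq (8)
    tol = 1e-12                           # slack for floating-point rounding
    ok9 = F <= F_alpha <= F + eps + tol
    ok10 = abs(p_alpha - p) <= math.sqrt(2.0 * lam * eps) + tol
    ok11 = abs(g_alpha - g) <= math.sqrt(2.0 * eps / lam) + tol
    return ok9 and ok10 and ok11


print(check_prop_2_1(2.0, 1.0, 0.3))
```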
Motivated by the observation that Eq (12) ignores function values, we seek a method that uses both gradient information and function information. Several authors have studied this question. In particular, Wei, Li and Qi [39] proposed an important modified secant equation that uses not only gradient values but also function values:

$$B_{k+1} s_k = \nu_k, \tag{13}$$

where ν_k = y_k + β_k s_k, f_k = f(x_k), ∇f_k = ∇f(x_k), and

$$\beta_k = \frac{2(f_k - f_{k+1}) + (\nabla f_{k+1} + \nabla f_k)^T s_k}{\|s_k\|^2}.$$

When f is sufficiently smooth and B_k is updated by the BFGS formula [40-43] with B_0 = I (the identity matrix), the secant Eq (13) has the following property:

$$s_k^T \nabla^2 f(x_{k+1}) s_k - s_k^T \nu_k = \frac{1}{3} s_k^T (T_{k+1} s_k) s_k + O(\|s_k\|^4),$$

$$s_k^T \nabla^2 f(x_{k+1}) s_k - s_k^T y_k = \frac{1}{2} s_k^T (T_{k+1} s_k) s_k + O(\|s_k\|^4),$$

where T_{k+1} is the tensor of third derivatives of f at x_{k+1}. This property holds for all k. By Theorem 2.1 of [39], Eq (13) therefore approximates the curvature s_k^T∇²f(x_{k+1})s_k more accurately than Eq (12).
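The curvature property of the modified secant equation can be observed on a simple smooth function. The Python sketch below assumes f(x) = exp(x) in one dimension, x_k = 0 and s_k = h, all of which are illustrative choices. It compares the curvature errors s_k∇²f(x_{k+1})s_k − s_k y_k and s_k∇²f(x_{k+1})s_k − s_k ν_k. Because the third-derivative term enters with weights 1/2 and 1/3, respectively, the ratio of the two errors should approach 1.5 as h → 0.

```python
import math


def secant_errors(h):
    """Curvature errors of y_k and nu_k for f(x) = exp(x), x_k = 0, s_k = h (1-D)."""
    f = df = d2f = math.exp               # f, f' and f'' all equal exp
    x_k, x_k1 = 0.0, h
    s = x_k1 - x_k
    y = df(x_k1) - df(x_k)                # y_k of Eq (12)
    beta = (2.0 * (f(x_k) - f(x_k1)) + (df(x_k1) + df(x_k)) * s) / (s * s)
    nu = y + beta * s                     # nu_k of Eq (13)
    curv = s * d2f(x_k1) * s              # s_k^T grad^2 f(x_{k+1}) s_k
    return curv - s * y, curv - s * nu


for h in (1e-1, 1e-2):
    err_y, err_nu = secant_errors(h)
    print(h, err_y, err_nu, err_y / err_nu)
```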

The new model

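In this section, we present a modified BFGS formula using a trust region model for solving Eq (1). The formula is motivated by the Moreau-Yosida regularization (smoothing), the general trust region method and the new secant Eq (13). First, we describe the trust region method. In each iteration, a trial step d_k is generated by solving a trust region subproblem. This subproblem uses the approximate gradient of F at x_k and a matrix B_k built from Eq (13):

$$\min_{d \in \mathbb{R}^n} \; q_k(d) = g^\alpha(x_k,\epsilon_k)^T d + \frac{1}{2} d^T B_k d \quad \text{s.t.} \quad \|d\| \le \Delta_k, \tag{14}$$

where ϵ_k > 0 is a scalar and Δ_k is the trust region radius. Let d_k be the optimal solution of Eq (14). The actual reduction is defined by

$$\mathrm{Ared}_k(d_k) = F^\alpha(x_k,\epsilon_k) - F^\alpha(x_k + d_k,\epsilon_k), \tag{15}$$

and the predicted reduction by

$$\mathrm{Pred}_k(d_k) = -q_k(d_k) = -g^\alpha(x_k,\epsilon_k)^T d_k - \frac{1}{2} d_k^T B_k d_k. \tag{16}$$

We then define r_k as the ratio between Ared_k(d_k) and Pred_k(d_k):

$$r_k = \frac{\mathrm{Ared}_k(d_k)}{\mathrm{Pred}_k(d_k)}. \tag{17}$$

Based on the new secant Eq (13), with B_k updated by the BFGS formula, we propose the modified BFGS formula

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{\nu_k \nu_k^T}{s_k^T \nu_k}, \tag{18}$$

where s_k = x_{k+1} − x_k, y_k = g^α(x_{k+1}, ϵ_{k+1}) − g^α(x_k, ϵ_k), ν_k = y_k + β_k s_k,

$$\beta_k = \frac{2\left[F^\alpha(x_k,\epsilon_k) - F^\alpha(x_{k+1},\epsilon_{k+1})\right] + \left[g^\alpha(x_{k+1},\epsilon_{k+1}) + g^\alpha(x_k,\epsilon_k)\right]^T s_k}{\|s_k\|^2},$$

and B_0 = I, where I is the identity matrix. The steps of the modified trust region algorithm are as follows.

Algorithm 1.

Step 0. Choose x_0 ∈ ℝⁿ, 0 < σ_1 < σ_2 < 1, 0 < η_1 < 1 < η_2, λ > 0, 0 ≤ ε ≪ 1 and Δ̄ ≥ Δ_0 > 0, where Δ̄ is the maximum trust region radius. Set B_0 = I, where I is the identity matrix, and let k := 0.

Step 1. Choose a scalar ϵ_{k+1} satisfying 0 < ϵ_{k+1} < ϵ_k, and calculate p^α(x_k, ϵ_k), F^α(x_k, ϵ_k) and g^α(x_k, ϵ_k). If x_k satisfies the termination criterion ‖g^α(x_k, ϵ_k)‖ ≤ ε, then stop. Otherwise, go to Step 2.

Step 2. Solve the trust region subproblem Eq (14) to obtain d_k.

Step 3. Compute Ared_k(d_k), Pred_k(d_k) and r_k using Eqs (15), (16) and (17).

Step 4. Update the trust region radius:

$$\Delta_{k+1} = \begin{cases} \eta_1 \Delta_k, & r_k < \sigma_1, \\ \Delta_k, & \sigma_1 \le r_k < \sigma_2, \\ \min\{\eta_2 \Delta_k, \bar{\Delta}\}, & r_k \ge \sigma_2. \end{cases}$$

Step 5. If r_k ≥ σ_1, then let x_{k+1} = x_k + d_k, update B_{k+1} by Eq (18), let k := k + 1, and go back to Step 1. Otherwise, let x_{k+1} := x_k and k := k + 1, and return to Step 2.

Similar to Dennis and Moré [44] or Yuan and Sun [45], we have the following result.

Lemma 1 Suppose that B_k is symmetric and positive definite. Then B_{k+1} defined by Eq (18) inherits the positive definiteness of B_k if and only if s_k^T ν_k > 0.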
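Proof "⇒": Eq (18) gives B_{k+1}s_k = ν_k. If B_{k+1} is symmetric and positive definite, then

$$s_k^T \nu_k = s_k^T B_{k+1} s_k > 0.$$

"⇐": Conversely, suppose that s_k^T ν_k > 0 for all k ≥ 0. We prove by induction that x^T B_{k+1} x > 0 for every x ∈ ℝⁿ with x ≠ 0. Clearly, B_0 = I is symmetric and positive definite. Assume that B_k is symmetric and positive definite. By Eq (18),

$$x^T B_{k+1} x = x^T B_k x - \frac{(x^T B_k s_k)^2}{s_k^T B_k s_k} + \frac{(x^T \nu_k)^2}{s_k^T \nu_k}. \tag{19}$$

Because B_k is symmetric and positive definite, there exists a symmetric positive definite matrix B_k^{1/2} such that B_k = B_k^{1/2}B_k^{1/2}. Thus, by the Cauchy-Schwarz inequality,

$$(x^T B_k s_k)^2 = \left[(B_k^{1/2}x)^T (B_k^{1/2}s_k)\right]^2 \le (x^T B_k x)(s_k^T B_k s_k). \tag{20}$$

Equality holds in Eq (20) if and only if there exists a real number γ ≠ 0 such that B_k^{1/2}x = γB_k^{1/2}s_k, namely, x = γs_k. Hence, if Eq (20) holds strictly, then, since s_k^T ν_k > 0, Eq (19) gives

$$x^T B_{k+1} x > \frac{(x^T \nu_k)^2}{s_k^T \nu_k} \ge 0.$$

Otherwise, equality holds in Eq (20), so x = γs_k for some γ ≠ 0, and

$$x^T B_{k+1} x = \gamma^2 s_k^T B_k s_k - \gamma^2 s_k^T B_k s_k + \frac{\gamma^2 (s_k^T \nu_k)^2}{s_k^T \nu_k} = \gamma^2 s_k^T \nu_k > 0.$$

Therefore, x^T B_{k+1} x > 0 for every 0 ≠ x ∈ ℝⁿ. This completes the proof.

Lemma 1 implies that if s_k^T ν_k > 0 for all k, then every matrix in the sequence {B_k} generated by the BFGS formula Eq (18) is symmetric and positive definite.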
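The following Python sketch assembles Algorithm 1 for a two-dimensional illustration. It is not the authors' MATLAB implementation, and several parts of it are assumptions:
- The test function f(x) = |x_1 − 1| + |x_2 + 2| is hypothetical. Its proximal point is available in closed form, so p^α(x, ϵ) = p(x) satisfies Eq (6) for every ϵ and the sequence ϵ_k plays no role.
- Subproblem Eq (14) is solved approximately by the standard dogleg method, not exactly.
- Step 4 uses a common three-case radius rule: shrink by η_1 if r_k < σ_1, enlarge by η_2 up to Δ̄ if r_k ≥ σ_2, otherwise keep Δ_k.
- The BFGS update is skipped when s_k^T ν_k is not safely positive. This safeguard is motivated by Lemma 1 but is not part of Algorithm 1 as stated.

The parameter values are those reported in the Results section.

```python
import math

LAM = 1.0
CENTER = [1.0, -2.0]  # hypothetical f(x) = |x1 - 1| + |x2 + 2|, minimized at CENTER


def prox(x):
    # Exact minimizer p(x) of f(z) + ||z - x||^2 / (2*LAM): soft-thresholding about CENTER.
    return [c + math.copysign(max(abs(xi - c) - LAM, 0.0), xi - c) for xi, c in zip(x, CENTER)]


def F_and_g(x):
    # F and g via Eqs (4)-(5); with an exact prox these coincide with F^alpha and g^alpha.
    p = prox(x)
    F = sum(abs(pi - c) for pi, c in zip(p, CENTER)) + sum((pi - xi) ** 2 for pi, xi in zip(p, x)) / (2 * LAM)
    return F, [(xi - pi) / LAM for xi, pi in zip(x, p)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(B, v):
    return [dot(row, v) for row in B]


def solve2(B, r):
    # Solve the 2x2 system B z = r by Cramer's rule.
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(B[1][1] * r[0] - B[0][1] * r[1]) / det, (B[0][0] * r[1] - B[1][0] * r[0]) / det]


def dogleg(g, B, delta):
    # Approximate solution of subproblem (14) for SPD B (stand-in for an exact solver).
    pB = [-v for v in solve2(B, g)]                     # full quasi-Newton step
    if math.sqrt(dot(pB, pB)) <= delta:
        return pB
    pU = [-dot(g, g) / dot(g, matvec(B, g)) * v for v in g]  # unconstrained steepest-descent minimizer
    nU = math.sqrt(dot(pU, pU))
    if nU >= delta:
        return [delta / nU * v for v in pU]
    d = [bi - ui for bi, ui in zip(pB, pU)]
    a, b, c = dot(d, d), 2 * dot(pU, d), dot(pU, pU) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # point on the dogleg path with norm delta
    return [u + tau * v for u, v in zip(pU, d)]


def bfgs_update(B, s, nu):
    # Modified BFGS formula, Eq (18).
    Bs = matvec(B, s)
    sBs, snu = dot(s, Bs), dot(s, nu)
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + nu[i] * nu[j] / snu for j in range(n)] for i in range(n)]


def algorithm1(x, sig1=0.45, sig2=0.75, eta1=0.5, eta2=4.0, delta=0.5, delta_max=100.0, tol=1e-6, max_iter=500):
    B = [[1.0, 0.0], [0.0, 1.0]]                          # Step 0: B_0 = I
    F, g = F_and_g(x)
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:                   # Step 1: termination test
            return x, k
        d = dogleg(g, B, delta)                           # Step 2
        x_new = [xi + di for xi, di in zip(x, d)]
        F_new, g_new = F_and_g(x_new)
        ared = F - F_new                                  # Eq (15)
        pred = -(dot(g, d) + 0.5 * dot(d, matvec(B, d)))  # Eq (16)
        r = ared / pred                                   # Eq (17)
        if r < sig1:                                      # Step 4 (assumed radius rule)
            delta *= eta1
        elif r >= sig2:
            delta = min(eta2 * delta, delta_max)
        if r >= sig1:                                     # Step 5: accept the trial step
            y = [a - b for a, b in zip(g_new, g)]
            beta = (2 * (F - F_new) + dot([a + b for a, b in zip(g_new, g)], d)) / dot(d, d)
            nu = [yi + beta * di for yi, di in zip(y, d)]
            if dot(d, nu) > 1e-8 * dot(d, d):             # safeguard, not in the paper (cf. Lemma 1)
                B = bfgs_update(B, d, nu)
            x, F, g = x_new, F_new, g_new
        # on rejection, x and B are kept and the loop retries with the smaller radius
    return x, max_iter


x_final, iters = algorithm1([3.0, 1.0])
print(x_final, iters)
```

With the starting point (3, 1), the iterates should approach the minimizer (1, −2). The same `bfgs_update` also illustrates Lemma 1. For B = [[2, 0.5], [0.5, 1]] and s = (1, −1), a vector ν with s^T ν > 0 yields a positive definite update, whereas ν with s^T ν < 0 does not.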

Convergence analysis

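In this section, we establish the global convergence of Algorithm 1 under the following assumption.

Assumption A.
(i) The level set Ω = {x ∈ ℝⁿ | F(x) ≤ F(x_0)} is bounded.
(ii) F is bounded from below.
(iii) The matrix sequence {B_k} is bounded on Ω, which means that there exists a positive constant M such that ‖B_k‖ ≤ M for all k.
(iv) The sequence {ϵ_k} converges to zero.

We now present the following lemma.

Lemma 2 If d_k is the solution of Eq (14), then

$$\mathrm{Pred}_k(d_k) \ge \frac{1}{2}\|g^\alpha(x_k,\epsilon_k)\| \min\left\{\Delta_k, \frac{\|g^\alpha(x_k,\epsilon_k)\|}{\|B_k\|}\right\}. \tag{21}$$

Proof The proof is similar to that of Lemma 7 (6.2) in Ma [46]. For brevity, write g_k = g^α(x_k, ϵ_k). Because B_k is symmetric and positive definite, we define the Cauchy point at the iterate x_k by

$$d_k^c = -\tau_k \frac{\Delta_k}{\|g_k\|} g_k, \qquad \tau_k = \min\left\{\frac{\|g_k\|^3}{\Delta_k\, g_k^T B_k g_k}, 1\right\}.$$

It is easy to verify that the Cauchy point is feasible, i.e., ‖d_k^c‖ = τ_kΔ_k ≤ Δ_k. If ‖g_k‖³/(Δ_k g_k^T B_k g_k) ≤ 1, then τ_k = ‖g_k‖³/(Δ_k g_k^T B_k g_k), and

$$-q_k(d_k^c) = \frac{\|g_k\|^4}{g_k^T B_k g_k} - \frac{1}{2}\frac{\|g_k\|^4}{g_k^T B_k g_k} = \frac{1}{2}\frac{\|g_k\|^4}{g_k^T B_k g_k} \ge \frac{1}{2}\frac{\|g_k\|^2}{\|B_k\|}.$$

Otherwise, τ_k = 1 and Δ_k g_k^T B_k g_k < ‖g_k‖³. Thus,

$$-q_k(d_k^c) = \Delta_k\|g_k\| - \frac{1}{2}\frac{\Delta_k^2}{\|g_k\|^2} g_k^T B_k g_k \ge \Delta_k\|g_k\| - \frac{1}{2}\Delta_k\|g_k\| = \frac{1}{2}\Delta_k\|g_k\|.$$

In both cases, −q_k(d_k^c) ≥ (1/2)‖g_k‖min{Δ_k, ‖g_k‖/‖B_k‖}. Because d_k solves Eq (14) and d_k^c is feasible, q_k(d_k) ≤ q_k(d_k^c), so Pred_k(d_k) = −q_k(d_k) ≥ −q_k(d_k^c), which gives Eq (21). This completes the proof.

Lemma 3 Let Assumption A hold, and let the sequence {x_k} be generated by Algorithm 1. If d_k is the solution of Eq (14), then

$$|\mathrm{Ared}_k(d_k) - \mathrm{Pred}_k(d_k)| = O(\|d_k\|^2). \tag{22}$$

Proof Let d_k be the solution of Eq (14). By Taylor expansion, F^α(x_k + d_k, ϵ_k) can be expressed as

$$F^\alpha(x_k + d_k,\epsilon_k) = F^\alpha(x_k,\epsilon_k) + g^\alpha(x_k,\epsilon_k)^T d_k + O(\|d_k\|^2). \tag{23}$$

By the definitions of Ared_k(d_k) and Pred_k(d_k), Eq (23) and Assumption A(iii), we have

$$|\mathrm{Ared}_k(d_k) - \mathrm{Pred}_k(d_k)| = \left|\frac{1}{2} d_k^T B_k d_k - O(\|d_k\|^2)\right| \le \frac{M}{2}\|d_k\|^2 + O(\|d_k\|^2) = O(\|d_k\|^2).$$

The proof is complete.

Lemma 4 Let Assumption A hold. Then, Algorithm 1 does not cycle infinitely in the inner loop.

Proof Suppose, to the contrary, that Algorithm 1 cycles infinitely between Steps 2 and 5 at the iterate x_k, i.e., r_k < σ_1 at every inner iteration. Suppose also that there exists a scalar ρ > 0 such that ‖g^α(x_k, ϵ_k)‖ ≥ ρ. Since 0 < η_1 < 1, the radius is multiplied by η_1 at each inner iteration, so Δ_k → 0 and hence ‖d_k‖ ≤ Δ_k → 0. By Eq (22) of Lemma 3, Lemma 2 and the definition of r_k, we obtain

$$|r_k - 1| = \frac{|\mathrm{Ared}_k(d_k) - \mathrm{Pred}_k(d_k)|}{\mathrm{Pred}_k(d_k)} \le \frac{O(\Delta_k^2)}{\frac{1}{2}\rho \min\{\Delta_k, \rho/M\}} \to 0.$$

This means that r_k ≥ σ_1 once Δ_k is small enough, which contradicts the assumption that r_k < σ_1. The proof is complete.

Based on the above lemmas, we can now establish the global convergence of Algorithm 1 under suitable conditions.

Theorem 1 (Global Convergence). Suppose that Assumption A holds and that the sequence {x_k} is generated by Algorithm 1. Let d_k be the solution of Eq (14).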
Then

$$\liminf_{k\to\infty} \|g(x_k)\| = 0$$

holds, and any accumulation point of {x_k} is an optimal solution of Eq (1).

Proof We first prove that

$$\liminf_{k\to\infty} \|g^\alpha(x_k,\epsilon_k)\| = 0. \tag{24}$$

Suppose that g^α(x_k, ϵ_k) ≠ 0. Without loss of generality, by the definition of r_k, we have

$$|r_k - 1| = \frac{|\mathrm{Ared}_k(d_k) - \mathrm{Pred}_k(d_k)|}{|\mathrm{Pred}_k(d_k)|}. \tag{25}$$

Using Taylor expansion as in the proof of Lemma 3, we obtain

$$|\mathrm{Ared}_k(d_k) - \mathrm{Pred}_k(d_k)| = O(\|d_k\|^2) = O(\Delta_k^2). \tag{26}$$

Suppose, by contradiction, that there exists ω_0 > 0 such that ‖g^α(x_k, ϵ_k)‖ ≥ ω_0 for all k. When Δ_k > 0 is small enough (Δ_k ≤ ω_0/M), Lemma 2 gives Pred_k(d_k) ≥ (1/2)ω_0Δ_k. Using Eqs (25) and (26), we have

$$|r_k - 1| \le \frac{O(\Delta_k^2)}{\frac{1}{2}\omega_0 \Delta_k} = O(\Delta_k). \tag{27}$$

Hence there exists a sufficiently small Δ̃ > 0 such that |r_k − 1| < 1 − σ_2, i.e., r_k > σ_2, for each k with Δ_k ≤ Δ̃. Then, according to Algorithm 1, Δ_{k+1} ≥ Δ_k. Thus, there exist a positive integer k_0 and a constant ρ_0 > 0 such that

$$\Delta_k \ge \rho_0 \quad \text{for all } k \ge k_0. \tag{28}$$

On the other hand, suppose that r_k > σ_1 for infinitely many k. By the definition of r_k and Lemma 2, for each such k ≥ k_0,

$$F^\alpha(x_k,\epsilon_k) - F^\alpha(x_{k+1},\epsilon_k) \ge \sigma_1 \mathrm{Pred}_k(d_k) \ge \frac{\sigma_1}{2}\omega_0 \min\left\{\Delta_k, \frac{\omega_0}{M}\right\}.$$

Because F is bounded from below, summing these inequalities shows that the right-hand side must tend to zero, which means that Δ_k → 0 as k → ∞. This contradicts Eq (28). Moreover, suppose that r_k < σ_1 for all sufficiently large k. Then Δ_{k+1} = η_1Δ_k, so Δ_k → 0 as k → ∞, which also contradicts Eq (28). These contradictions show that Eq (24) holds.

We now show that liminf_{k→∞} ‖g(x_k)‖ = 0 holds. By Eq (11), we have

$$\|g(x_k)\| \le \|g^\alpha(x_k,\epsilon_k)\| + \sqrt{2\epsilon_k/\lambda}.$$

Together with Eq (24) and Assumption A(iv), this implies that

$$\liminf_{k\to\infty} \|g(x_k)\| = 0. \tag{29}$$

Finally, let x* be an accumulation point of {x_k}. Then, without loss of generality, there exists a subsequence {x_k}_{k∈K} satisfying

$$\lim_{k \in K,\, k\to\infty} x_k = x^*. \tag{30}$$

From the properties of F, we have g(x_k) = (x_k − p(x_k))/λ, and g is continuous. Thus, by Eqs (29) and (30), x* = p(x*). Therefore, x* is an optimal solution of Eq (1). The proof is complete.

Similar to Theorem 3.7 in [25], the rate of convergence of Algorithm 1 can be shown to be Q-superlinear. We omit the proof here (the proof of Q-superlinear convergence can be found in [25]).

Theorem 2 (Q-superlinear Convergence) [25] Suppose that Assumption A(ii) holds, that the sequence {x_k} generated by Algorithm 1 has a limit point x*, and that g is BD-regular and semismooth at x*. Furthermore, suppose that ϵ_k = o(‖g(x_k)‖²).
Then x* is the unique solution of Eq (1), and the entire sequence {x_k} converges to x* Q-superlinearly, i.e.,

$$\lim_{k\to\infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0.$$
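The key step in the proof of Lemma 2 is the Cauchy-point decrease. The Python sketch below is illustrative, two-dimensional and uses hypothetical helper names. It constructs d_k^c = −τ_k(Δ_k/‖g_k‖)g_k with τ_k = min{‖g_k‖³/(Δ_k g_k^T B_k g_k), 1} for randomly generated symmetric positive definite B_k. It then checks that d_k^c is feasible and that −q_k(d_k^c) ≥ (1/2)‖g_k‖min{Δ_k, ‖g_k‖/‖B_k‖}. Because the optimal d_k of Eq (14) can only decrease the model further, this inequality yields the bound stated in Lemma 2.

```python
import math
import random


def model_decrease(g, B, d):
    # -q_k(d) = -(g^T d + 0.5 d^T B d), i.e. the predicted reduction Pred_k(d)
    Bd = [B[0][0] * d[0] + B[0][1] * d[1], B[1][0] * d[0] + B[1][1] * d[1]]
    return -(g[0] * d[0] + g[1] * d[1] + 0.5 * (d[0] * Bd[0] + d[1] * Bd[1]))


def cauchy_point(g, B, delta):
    # d^c = -tau * (delta / ||g||) * g with tau = min(||g||^3 / (delta g^T B g), 1); B is SPD
    gnorm = math.hypot(g[0], g[1])
    gBg = g[0] * (B[0][0] * g[0] + B[0][1] * g[1]) + g[1] * (B[1][0] * g[0] + B[1][1] * g[1])
    tau = min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta / gnorm * gi for gi in g]


def spectral_norm_spd(B):
    # Largest eigenvalue of a symmetric positive definite 2x2 matrix, i.e. its Euclidean norm
    half_tr = 0.5 * (B[0][0] + B[1][1])
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return half_tr + math.sqrt(max(half_tr ** 2 - det, 0.0))


def lemma2_bound_holds(g, B, delta):
    d = cauchy_point(g, B, delta)
    gnorm = math.hypot(g[0], g[1])
    bound = 0.5 * gnorm * min(delta, gnorm / spectral_norm_spd(B))
    feasible = math.hypot(d[0], d[1]) <= delta * (1 + 1e-12)
    return feasible and model_decrease(g, B, d) >= bound * (1 - 1e-12)


def random_case(rng):
    # B = A A^T + 0.1 I is symmetric positive definite by construction
    a, b, c, e = (rng.uniform(-2, 2) for _ in range(4))
    B = [[a * a + b * b + 0.1, a * c + b * e], [a * c + b * e, c * c + e * e + 0.1]]
    g = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
    return g, B, rng.uniform(0.01, 5.0)


rng = random.Random(0)
print(all(lemma2_bound_holds(*random_case(rng)) for _ in range(1000)))
```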

Results

In this section, we test the modified BFGS formula using a trust region model on nonsmooth problems. The test problems in Table 1 are taken from [47-53]. Table 1 lists the problem dimensions and optimal function values. In the table, “No.” is the number of the test problem, “Dim” is its dimension, “Problem” is its name, “x₀” is the initial point, and “f_ops(x)” is the optimal function value. The modified algorithm was implemented in MATLAB 7.0.4. All numerical experiments were run on a PC with an Intel Core 2 Duo T6600 CPU (2.20 GHz), 2.00 GB of RAM and the Windows 7 operating system.

Table 1

Problem descriptions for test problems.

No.	Dim	Problem	x₀	f_ops(x)
1	2	Rosenbrock [47]	(-1.2, 1.0)	0
2	2	Crescent [47]	(-1.5, 2.0)	0
3	2	CB2 [48]	(1.0, -0.1)	1.9522245
4	2	CB3 [48]	(2.0, 2.0)	2.0
5	2	DEM [49]	(1.0, 1.0)	-3.0
6	2	QL [50]	(-1.0, 5.0)	7.20
7	2	LQ [50]	(-0.5, -0.5)	-1.4142136
8	2	Mifflin 2 [51]	(-1.0, -1.0)	-1.0
9	5	Shor [52]	(0.0, 0.0, 0.0, 0.0, 1.0)	22.600162
10	50	MXHILB [53]	ones(50, 1)	0
11	50	LIHILB [53]	ones(50, 1)	0

To test the performance of the algorithm on the problems listed in Table 1, we compared it with three methods: the method based on the trust region concept (BT) of [15], the proximal bundle method (PBL) of [17], and the gradient trust region algorithm with limited memory BFGS update (LGTR) of [26]. The parameters were chosen as follows: σ_1 = 0.45, σ_2 = 0.75, η_1 = 0.5, η_2 = 4, λ = 1 and Δ_0 = 0.5 < Δ̄ = 100. The sequence ϵ_k was set by a formula depending on the iteration number k. The algorithm was stopped when the condition ‖g^α(x, ϵ)‖ ≤ 10⁻⁶ was satisfied. Following [26], we used the MATLAB function fminsearch to solve min_z θ(z, x). This yields the approximate proximal point p^α(x, ϵ), from which g^α(x, ϵ) is computed by Eq (8). Table 2 lists the results of PBL, LGTR, BT and our modified algorithm. The results of PBL and BT are taken from [17], and those of LGTR are taken from [26]. The following notation is used in Table 2: “NI” is the number of iterations; “NF” is the number of function evaluations; “f(x)” is the function value at the final iteration; “——” indicates that the algorithm failed to solve the problem; and “Total” denotes the sums of NI and NF, respectively.

Table 2

Test results.

No.	PBL NI/NF/f(x)	LGTR NI/NF/f(x)	BT NI/NF/f(x)	Algorithm 1 NI/NF/f(x)
1	42/45/3.81 × 10⁻⁵	——	79/88/1.30 × 10⁻¹⁰	26/66/4.247136 × 10⁻⁶
2	18/20/6.79 × 10⁻⁵	10/10/3.156719 × 10⁻⁵	24/27/9.44 × 10⁻⁵	13/13/2.521899 × 10⁻⁵
3	32/34/1.9522245	10/11/1.952225	13/16/1.952225	4/6/1.952262
4	14/16/2.0	2/3/2.000217	13/21/2.0	3/4/2.000040
5	17/19/-3.0	3/3/-2.999700	9/13/-3.0	4/24/-2.999922
6	13/15/7.2000015	19/119/7.200001	12/17/7.200009	9/9/7.200043
7	11/12/-1.4142136	1/1/-1.207068	10/11/-1.414214	2/2/-1.414214
8	66/68/-0.99999941	3/3/-0.9283527	6/13/-1.0	4/4/-0.9978547
9	27/29/22.600162	42/443/22.62826	29/30/22.600160	8/9/22.600470
10	19/20/4.24 × 10⁻⁷	12/12/9.793119 × 10⁻³	——	23/108/5.228012 × 10⁻³
11	19/20/9.90 × 10⁻⁸	20/63/9.661137 × 10⁻³	——	7/7/2.632534 × 10⁻³
Total	278/298	164/1111	353/412	103/252

The numerical results show that, in terms of the total numbers of iterations and function evaluations, our algorithm outperforms the other methods in Table 2. Its totals of NI and NF are smaller than those of the other three algorithms. We use the performance profiles proposed in [54] to compare the efficiency of these four algorithms. Figs 1 and 2 show the performance profiles of the four methods with respect to NI and NF in Table 2, respectively. These figures indicate that Algorithm 1 performs well on the tested problems compared with PBL, LGTR and BT. In sum, the preliminary numerical results indicate that the modified method is efficient for solving nonsmooth convex minimizations.

Fig 1

Performance profiles of these methods (NI).

Fig 2

Performance profiles of these methods (NF).
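Figs 1 and 2 are performance profiles. We assume that [54] refers to the usual performance-profile construction. For each problem p and solver s, the ratio r_{p,s} is the solver's cost divided by the best cost on that problem, with failures counted as infinite. The profile ρ_s(τ) is the fraction of problems with r_{p,s} ≤ τ. Under that assumption, the Python sketch below computes the NI profile directly from Table 2. It only reproduces the construction from the tabulated numbers, not the plotted figures. The dictionary keys are hypothetical labels.

```python
import math

# NI column of Table 2; None marks "——" (the solver failed on that problem).
NI = {
    "PBL":  [42, 18, 32, 14, 17, 13, 11, 66, 27, 19, 19],
    "LGTR": [None, 10, 10, 2, 3, 19, 1, 3, 42, 12, 20],
    "BT":   [79, 24, 13, 13, 9, 12, 10, 6, 29, None, None],
    "Alg1": [26, 13, 4, 3, 4, 9, 2, 4, 8, 23, 7],
}


def performance_profile(data, tau):
    """Fraction of problems on which each solver's cost is within a factor tau of the best."""
    solvers = list(data)
    n_prob = len(next(iter(data.values())))
    ratios = {s: [] for s in solvers}
    for p in range(n_prob):
        costs = {s: data[s][p] for s in solvers}
        best = min(c for c in costs.values() if c is not None)
        for s in solvers:
            c = costs[s]
            ratios[s].append(math.inf if c is None else c / best)  # failure -> infinite ratio
    return {s: sum(r <= tau for r in ratios[s]) / n_prob for s in solvers}


for tau in (1.0, 2.0, 5.0, 10.0):
    print(tau, performance_profile(NI, tau))
```

For example, ρ_s(1) is the fraction of problems on which solver s needed the fewest iterations. From the NI column of Table 2, this is 5/11 for Algorithm 1 and 6/11 for LGTR. For large τ, ρ_s(τ) equals the fraction of problems that each solver solved: 11/11 for PBL and Algorithm 1, 10/11 for LGTR and 9/11 for BT.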

Conclusion

The trust region method is one of the most efficient optimization methods. In this paper, using the Moreau-Yosida regularization (smoothing) and a new secant equation with the BFGS formula, we present a modified BFGS formula using a trust region model for solving nonsmooth convex minimizations. The algorithm does not compute the Hessian of the objective function at any iteration, which decreases the computational workload and time. It also uses both function information and gradient information. Under suitable conditions, global convergence is established, and the rate of convergence of the algorithm is Q-superlinear. Numerical results show that the algorithm is efficient. We believe that this algorithm can be used in future applications to solve nonsmooth convex minimizations.
