
The Modified HZ Conjugate Gradient Algorithm for Large-Scale Nonsmooth Optimization.

Gonglin Yuan, Zhou Sheng, Wenjie Liu.

Abstract

In this paper, the Hager and Zhang (HZ) conjugate gradient (CG) method and a modified HZ (MHZ) CG method are presented for large-scale nonsmooth convex minimization. Under some mild conditions, convergence results for the proposed methods are established. Numerical results show that the presented methods are more efficient for large-scale nonsmooth problems; several test problems, with dimensions of up to 100,000 variables, are solved.


Year:  2016        PMID: 27780245      PMCID: PMC5079589          DOI: 10.1371/journal.pone.0164289

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Consider the following optimization problem:

min{f(x) ∣ x ∈ Φ},   (1)

where Φ ⊆ ℜⁿ is the constrained set. If Φ = ℜⁿ, then Eq (1) is called unconstrained optimization; if Φ = {x ∣ l ≤ x ≤ u, x ∈ ℜⁿ}, where n is the number of variables and the vectors l and u represent the lower and upper bounds on the variables, then Eq (1) is called box-constrained optimization; and if Φ = {x ∣ h_i(x) = 0, g_j(x) ≤ 0, x ∈ ℜⁿ, i = 1, ⋯, r, j = 1, ⋯, k}, where r and k are positive integers, then Eq (1) is called normal constrained optimization. If the objective function f: ℜⁿ → ℜ is continuously differentiable, then Eq (1) is called smooth optimization; if f: ℜⁿ → ℜ is a nondifferentiable function, then Eq (1) is called nonsmooth optimization. For a given objective function f and constrained set Φ, Eq (1) will be called the corresponding optimization problem.

Optimization problems are encountered in many fields, including engineering, management, finance, medicine, and biology, and, similarly, optimization models can be used in many fields (see [1-13]). At present, many efficient methods are available for solving optimization problems [14-24]. However, many challenging optimization problems remain, for example, large-scale problems and nonsmooth problems. The workload increases greatly as the dimension of the problem increases, causing the required CPU time to become very long; for certain large-scale problems, a computer may fail to solve them at all. To address this issue, more efficient algorithms should be designed. It is well known that spectral gradient approaches, conjugate gradient (CG) techniques, and limited-memory quasi-Newton methods can cope with large-scale optimization problems. CG methods in particular have been widely used in many practical large-scale optimization problems because of their simplicity and low memory requirements. Nonsmooth problems are believed to be very difficult to solve, even when they are unconstrained.
The direct application of smooth gradient-based methods to nonsmooth problems may lead to a failure in optimality conditions, convergence, or gradient approximation [25]. Haarala et al. (see, e.g., [26, 27]) presented limited-memory bundle methods for large-scale nonsmooth unconstrained and constrained minimization and demonstrated their application to test problems of up to one thousand variables in dimension. Karmitsa et al. [28] tested and compared various methods of both types, as well as several methods that may be regarded as hybrids of these two approaches and/or others; the dimensions of the tested nonsmooth problems ranged from 20 to 4000, and the most effective method for large- and extra-large-scale problems was found to be that of [27]. Therefore, special tools for solving nonsmooth optimization problems are needed.

This paper is organized as follows. In the next section, the Hager and Zhang (HZ) CG method is presented, and a modified HZ (MHZ) CG formula is proposed. In Section 3, the application of the HZ and MHZ methods to large-scale nonsmooth problems is discussed, global convergence is established, and numerical experiments on nonsmooth problems are reported. In the last section, conclusions are presented. Throughout the paper, ‖⋅‖ denotes the Euclidean norm.

The HZ CG formula [29, 30] and a modification thereof

For convenience, we rewrite Eq (1) as the following special case:

min{f(x) ∣ x ∈ ℜⁿ},   (2)

where f: ℜⁿ → ℜ is continuously differentiable; Eq (2) is called an unconstrained optimization problem. CG methods are a class of effective line search methods for solving Eq (2), especially when the dimension n is large. The iterative formula for CG methods is defined as follows:

x_{k+1} = x_k + α_k d_k,   (3)

where x_k is the current iterate, α_k > 0 is the step length, and d_k is the search direction; the latter is determined as

d_{k+1} = −g_{k+1} + β_k d_k,  d_0 = −g_0,   (4)

where g_k = ∇f(x_k) is the gradient of f at the point x_k and β_k ∈ ℜ is a scalar. The parameter β_k is chosen such that, when the method is applied to minimize a strongly convex quadratic function, the directions d_{k+1} and d_k are conjugate with respect to the Hessian of the quadratic function. Let

β_k^HZ = (y_k − 2 d_k ‖y_k‖² / (d_k^T y_k))^T g_{k+1} / (d_k^T y_k),   (5)

from [29], with y_k = g_{k+1} − g_k; we call the formula of [29] the HZ formula. If d_k^T y_k ≠ 0, then Eq (4) with β_k = β_k^HZ satisfies

g_{k+1}^T d_{k+1} ≤ −(7/8) ‖g_{k+1}‖².   (6)

This method exhibits global convergence for a strongly convex function f. To obtain a similar result for a general nonlinear function, Hager and Zhang [30] proposed the truncated formula

β_k = max{β_k^HZ, η_k},  η_k = −1 / (‖d_k‖ min{η, ‖g_k‖}),

where η > 0 is a constant; in their experiments, they set η = 0.01. This new parameter also has the property expressed in Eq (6). Based on the formula for β_k^HZ, a more general formula β_k^MHZ can be obtained, given in Eq (7). In this paper, we choose its parameter as in Eq (8), where c ∈ (0, 1) is a scalar; in the corresponding special case, Eq (7) is identical to the HZ formula. The modified formula has the following features: (i) The new formula can overcome the shortcomings of the CG parameter β_k. (ii) The formula can ensure that the new direction d_{k+1} in Eq (4) with β_k = β_k^MHZ belongs to a trust region without the need for any line search technique; by Eq (4) with β_k^MHZ, combined with Step 1 of Algorithm 2.2, the bound in Eq (9) holds for all k. (iii) If d_k^T y_k ≠ 0, then the new direction d_{k+1} in Eq (4) with β_k = β_k^MHZ possesses the sufficient descent property of Eq (10), which holds for all k.
Now let us analyze result (iii). If k = 0, we have d_0 = −g_0, which satisfies Eq (10). For k ≥ 1, because d_k^T y_k ≠ 0, multiplying Eq (4) by g_{k+1}^T yields an expression for g_{k+1}^T d_{k+1}; applying the inequality u^T v ≤ (‖u‖² + ‖v‖²)/2 to suitably chosen vectors u and v then gives Eq (10).
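To make the iteration concrete, the sketch below implements the CG update x_{k+1} = x_k + α_k d_k, d_{k+1} = −g_{k+1} + β_k d_k with the HZ choice of β_k for a smooth function. The Armijo backtracking line search and the safeguard for d_k^T y_k ≈ 0 are our own illustrative choices, not the line search analyzed in [29, 30].

```python
import numpy as np

def hz_cg(f, grad, x0, tol=1e-8, max_iter=500):
    """Illustrative HZ conjugate gradient sketch for a smooth f.

    Iterates x_{k+1} = x_k + alpha_k d_k, d_{k+1} = -g_{k+1} + beta_k d_k,
    where beta_k is the Hager-Zhang parameter
        beta_k = (y_k - 2 d_k ||y_k||^2 / d_k^T y_k)^T g_{k+1} / (d_k^T y_k),
    with y_k = g_{k+1} - g_k.  A simple Armijo backtracking search
    chooses alpha_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # Armijo backtracking: accept when f(x + a d) <= f(x) + delta a g^T d
        alpha, rho, delta = 1.0, 0.5, 1e-4
        fx = f(x)
        while f(x + alpha * d) > fx + delta * alpha * g.dot(d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        dy = d.dot(y)
        if abs(dy) < 1e-16:
            beta = 0.0                  # safeguard: restart with -g
        else:
            beta = (y - 2.0 * d * y.dot(y) / dy).dot(g_new) / dy
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# usage: minimize the strongly convex quadratic 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = hz_cg(lambda x: 0.5 * x @ A @ x - b @ x,
               lambda x: A @ x - b, np.zeros(2))
```

For this quadratic the minimizer solves A x = b; the sufficient descent property g_{k+1}^T d_{k+1} ≤ −(7/8)‖g_{k+1}‖² holds for any step length, which is what makes the simple backtracking safe here.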

Nonsmooth Convex Problems and Their Results

Consider the unconstrained convex optimization problem

min{f(x) ∣ x ∈ ℜⁿ},   (12)

where f: ℜⁿ → ℜ is a possibly nonsmooth convex function. For the special case in which f is continuously differentiable, this optimization problem has been well studied for several decades. However, nonsmooth optimization problems of the form of Eq (12) also arise in many applications, such as image restoration [31] and optimal control [32]. The Moreau-Yosida regularization of f generates

F(x) = min_{z ∈ ℜⁿ} {f(z) + (1/(2λ)) ‖z − x‖²},   (13)

where λ is a positive parameter; then, Eq (12) is equivalent to the following problem:

min{F(x) ∣ x ∈ ℜⁿ}.   (14)

Let θ(z) = f(z) + (1/(2λ)) ‖z − x‖², and define p(x) = argmin_z θ(z). Then, p(x) is well defined and unique because the function θ(z) is strongly convex. Therefore, F(x) can be expressed as follows:

F(x) = f(p(x)) + (1/(2λ)) ‖p(x) − x‖².   (15)

F(x) possesses many known features (see, e.g., [33-35]). The generalized Jacobian of ∇F(x) and its property of BD-regularity are demonstrated in [36, 37]. Here, we list some additional findings regarding the function F(x), as follows: (i) x is an optimal solution to Eq (12) ⇔ ∇F(x) = 0, i.e., p(x) = x. (ii) F is continuously differentiable, with gradient

g(x) := ∇F(x) = (x − p(x)) / λ.   (16)

(iii) The set of generalized Jacobian matrices ∂g(x) = {V ∈ ℜ^{n×n} ∣ V = lim_{x_i → x} ∇g(x_i), x_i ∈ D} is nonempty and compact, where D = {x ∈ ℜⁿ ∣ g is differentiable at x}. Furthermore, every V ∈ ∂g(x) is a symmetric positive semidefinite matrix for each x ∈ ℜⁿ because g is a gradient mapping of the convex function F. (iv) There exist two constants, μ1 > 0 and μ2 > 0, and a neighborhood Ω of x that satisfy the BD-regularity bounds of [36, 37].
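As a concrete illustration of the quantities above, for the nonsmooth convex function f(z) = ‖z‖₁ the minimizer p(x) of the subproblem Eq (13) is the soft-thresholding operator, so F(x) and g(x) = (x − p(x))/λ are available in closed form. The function below is an illustrative sketch of this special case only; for a general f, p(x) must be computed by an inner solver, as the paper does in its experiments.

```python
import numpy as np

def moreau_yosida_l1(x, lam):
    """Moreau-Yosida regularization of f(z) = ||z||_1.

    For the l1 norm, p(x) is the componentwise soft-thresholding
    operator, so F(x) = f(p(x)) + ||p(x) - x||^2 / (2 lam) and
    g(x) = (x - p(x)) / lam are closed form."""
    x = np.asarray(x, dtype=float)
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # p(x)
    F = np.abs(p).sum() + np.dot(x - p, x - p) / (2.0 * lam)
    g = (x - p) / lam                                    # g(x) = grad F(x)
    return F, g, p

# usage
F, g, p = moreau_yosida_l1(np.array([2.0, -0.5]), lam=1.0)
```

For x = (2, −0.5) and λ = 1 this gives p(x) = (1, 0), F(x) = 1.625, and g(x) = (1, −0.5); note that each component of g lies in the subdifferential of |⋅| at the corresponding component of p(x), as expected.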

Two algorithms for nonsmooth problems

Based on the above discussion, we present two algorithms for application to nonsmooth problems of the form of Eq (14); afterward, we analyze the solution of Eq (12). In the following, unless otherwise noted, g_k = g(x_k) is defined as in Eq (16).

Algorithm 4.1.
Require: An initial point x_0 ∈ ℜⁿ, λ > 0, σ, η ∈ (0, 1), ρ ∈ (0, 1/2], ϵ ∈ [0, 1). Specify g_0 by solving the subproblem Eq (13); d_0 ← −g_0; k ← 0.
while ‖g_k‖ > ϵ do
  Determine the step size α_k = max{ρ^j ∣ j = 0, 1, 2, ⋯} satisfying the Armijo line search condition Eq (17);
  x_{k+1} = x_k + α_k d_k;
  Compute g_{k+1} by solving the subproblem Eq (13);
  if ‖g_{k+1}‖ ≤ ϵ then
    break.
  else
    Compute d_{k+1} as in Eq (18).
  end if
  k ← k + 1.
end while

Algorithm 4.2. Eq (18) of Algorithm 4.1 is replaced with the following: compute d_{k+1} as in Eq (19); then let k := k + 1 and go back to Step 2.

The following assumptions are needed to ensure the global convergence of Algorithms 4.1 and 4.2.

Assumption 4.1. (i) F is bounded from below, and the sequence {V_k} is bounded; namely, there exists a constant M > 0 such that ‖V_k‖ ≤ M for all k. (ii) g is BD-regular at x; namely, item (iv) in Section 3 above holds.

By Assumption 4.1, it is not difficult to deduce that there exists a constant M* > 1 satisfying the bound in Eq (20).

Lemma 1. Let the sequence {x_k} be generated by Algorithm 4.1 (or Algorithm 4.2), and let Assumption 4.1 hold. Then, for sufficiently large k, there exists a constant α* > 0 that satisfies α_k ≥ α*.

Proof. Suppose that α_k satisfies the Armijo line search condition Eq (17). The proof is complete if α_k = 1 holds. Otherwise, the step α_k/ρ fails Eq (17), which gives the reverse inequality Eq (21). By performing a Taylor expansion of F, we obtain Eq (22), where the remainder is controlled by the bound on {V_k}, and the last inequality, Eq (23), follows from Eq (20). By combining Eqs (21) and (23), we obtain Eq (24). Thus, we find a positive lower bound on α_k; letting α* denote this bound, the proof is complete.

Now, let us prove the global convergence of Algorithm 4.1.

Theorem 1. Suppose that the conditions in Lemma 1 hold. Then, we have

lim inf_{k→∞} ‖g_k‖ = 0;   (25)

moreover, any accumulation point of {x_k} is an optimal solution of Eq (12).

Proof. Suppose that Eq (25) is not true.
Then, there must exist two constants, ϵ0 > 0 and k* > 0, that satisfy

‖g_k‖ ≥ ϵ0 for all k ≥ k*.   (26)

By combining Lemma 1 with Eqs (17), (22) and (26), we obtain Eq (27). Because F(x_k) is bounded from below for all k, it follows from Eq (27) that Eq (28) holds. This contradicts Eq (26). Therefore, Eq (25) holds. Let x* be an accumulation point of {x_k} and, without loss of generality, suppose that there exists a subsequence {x_k}_{k∈K} that satisfies x_k → x* for k ∈ K. From Eq (28), we find that ‖g(x*)‖ = ‖∇F(x*)‖ = 0. Then, from property (i) of F(x) as given in Section 3, x* is an optimal solution of Eq (12). The proof is complete.

In a manner similar to Theorem 4.1 in [38], we can establish the linear convergence rate of Algorithm 4.1 (or Algorithm 4.2). Here, we simply state this property but omit the proof.

Theorem 2. Let Assumptions 4.1 (i) and (ii) hold, and let x* be the unique solution of Eq (14). Then, there exist two constants b > 0 and r ∈ (0, 1) that yield a geometric decay bound; namely, the sequence {x_k} generated by Algorithm 4.1 (or Algorithm 4.2) converges linearly to x*.
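The overall scheme can be sketched as follows for the model problem f(z) = ‖z‖₁, for which the subproblem Eq (13) has the closed-form soft-thresholding solution (so no inner PRP sub-algorithm is needed). Because the explicit MHZ update of Eq (18) is not reproduced here, an HZ-type β is used as a stand-in, and all parameter values are illustrative, not the paper's.

```python
import numpy as np

def solve_l1_moreau(x0, lam=1.0, sigma=1e-4, rho=0.5, eps=1e-8, max_iter=500):
    """Sketch of Algorithm 4.1 specialized to f(z) = ||z||_1.

    F is the Moreau-Yosida regularization of f, p(x) its closed-form
    minimizer, and g(x) = (x - p(x)) / lam its gradient; an Armijo
    backtracking search plays the role of Eq (17)."""
    def F_and_g(x):
        p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # p(x)
        F = np.abs(p).sum() + (x - p) @ (x - p) / (2.0 * lam)
        return F, (x - p) / lam
    x = np.asarray(x0, dtype=float)
    F, g = F_and_g(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = 1.0                     # Armijo backtracking line search
        while True:
            F_new, g_new = F_and_g(x + alpha * d)
            if F_new <= F + sigma * alpha * g @ d:
                break
            alpha *= rho
        x = x + alpha * d
        y = g_new - g
        dy = d @ y
        beta = 0.0 if abs(dy) < 1e-16 else \
            (y - 2.0 * d * (y @ y) / dy) @ g_new / dy   # HZ-type stand-in
        d = -g_new + beta * d
        F, g = F_new, g_new
    return x

x_opt = solve_l1_moreau(np.array([2.0, -3.0]))   # minimizer of ||x||_1 is 0
```

Since F is smooth with a 1/λ-Lipschitz gradient and the direction always satisfies g_k^T d_k < 0 here, the backtracking loop terminates and the iterates are driven toward the common minimizer of F and f.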

Numerical results for nonsmooth problems

In this section, we present the results of several numerical experiments using Algorithms 4.1 and 4.2 and a modified Polak-Ribière-Polyak conjugate gradient algorithm (called MPRP) [18]. It is well known that CG methods are very effective for large-scale smooth problems; we will show that these two algorithms are also applicable to large-scale nonsmooth problems. The nonsmooth academic test problems, listed along with their initial points in Table 1, are described in [27]; in the table, “Problem” is the name of the test problem, “x0” is the initial point (with n the number of variables), and “f” is the optimal value of the test problem. Problems 1-5 are convex functions, and the others are nonconvex functions. The detailed characteristics of these problems can be found in [27]. Because we wished to test the three considered methods on large-scale nonsmooth problems, problem dimensions of 5000, 10000, 50000, and 100000 were chosen. In our experiments, we found that problem 2 required a considerable amount of time to solve; therefore, we set its largest dimension to 50000.
Table 1

Test problems and their initial points and optimal value.

No. | Problem | x0 | f
1 | Generalization of MAXQ | (1, 2, ⋯, n/2, −(n/2 + 1), ⋯, −n) | 0
2 | Generalization of MXHILB | (1, 1, ⋯, 1) | 0
3 | Chained LQ | (−0.5, −0.5, ⋯, −0.5) | −√2(n − 1)
4 | Chained CB3 I | (2, 2, ⋯, 2) | 2(n − 1)
5 | Chained CB3 II | (2, 2, ⋯, 2) | 2(n − 1)
6 | Number of active faces | (1, 0, ⋯, 1, 0) | 0
7 | Nonsmooth generalization of Brown function 2 | (−1, −1, ⋯, −1) | 0
8 | Chained Mifflin 2 | (−1.5, 2, ⋯, −1.5, 2) | varies
9 | Chained Crescent I | (1, 0, ⋯, 1, 0) | 0
10 | Chained Crescent II | (1, 0, ⋯, 1, 0) | 0
Both algorithms were implemented in Fortran PowerStation 4.0 with double-precision arithmetic, and all experiments were run on a PC with a Core 2 Duo E7500 CPU @ 2.93 GHz, 2.00 GB of memory, and the Windows XP operating system. The following parameters were chosen for Algorithms 4.1 and 4.2 and MPRP: σ = 0.8, ρ = 0.5, c = 0.01, ϵ = 1E−15, and η = 0.01. We stopped the algorithms when the condition ‖g(x_k)‖ ≤ 1E−5 or ∣F(x_{k+1}) − F(x_k)∣ ≤ 1E−8 or |f(x_k) − f*| ≤ 1E−4 was satisfied, where f* is the optimal value of the test problem. If the number of iterations exceeded ten thousand, the program was also terminated. Because a line search cannot always ensure the descent condition g_k^T d_k < 0, an uphill search direction may arise in real numerical experiments, which may cause the line search rule to fail. To avoid this situation, the step size α_k was accepted whenever the number of line search steps exceeded five. In the experiments, the subproblem Eq (13) was solved using the PRP CG method (called the sub-algorithm), and its numbers of iterations and function evaluations were added to those of Algorithm 4.1, Algorithm 4.2, or MPRP (called the main algorithm). The sub-algorithm was terminated if ‖∂f(x_k)‖ ≤ 1E−4 or f(x_k) − f(x_{k+1}) + ‖∂f(x_k)‖² − ‖∂f(x_{k+1})‖² ≤ 1E−3 (see [39]) held, where ∂f(x_k) denotes the subgradient of f at the point x_k, or when its iteration number exceeded ten. For the line search of the sub-algorithm, the Armijo technique was likewise used, and the step length was accepted once the number of search steps exceeded five. The columns in Tables 2, 3 and 4 have the following meanings:
Table 2

Test results for Algorithm 4.1.

No. | Dim | NI/NF | ‖g(x)‖ | Time | f
1 | 5000 | 122/2304 | 4.963444E-05 | 3.109375E-01 | 2.977173E-08
1 | 50000 | 142/2720 | 5.295883E-04 | 3.451688E+00 | 3.177435E-08
1 | 100000 | 148/2844 | 8.533346E-04 | 8.389563E+00 | 2.559965E-08
2 | 5000 | 48/902 | 0 | 5.279200E+02 | 0
2 | 10000 | 52/978 | 0 | 2.290124E+03 | 0
2 | 50000 | 63/1194 | 3.689089E-10 | 1.577677E+04 | 9.789607E-07
3 | 3000 | 11/55 | 6.505226E-16 | 7.031086E-04 | -4.241179E+03
4 | 3000 | 5/50 | 0 | 2.943751E-02 | 5.998031E+03
5 | 3000 | 14/65 | 0 | 4.543751E-02 | 5.998000E+03
6 | 5000 | 64/1240 | 7.081679E-16 | 5.470469E-01 | 1.881710E-06
6 | 10000 | 69/1345 | 7.428122E-16 | 1.187422E+00 | 2.725454E-06
6 | 50000 | 82/1618 | 6.935890E-16 | 7.172547E+00 | 5.888909E-06
6 | 100000 | 87/1723 | 7.275201E-16 | 1.548309E+01 | 8.529442E-06
7 | 5000 | 12/59 | 2.710511E-16 | 2.024688E-01 | 2.327843E-06
7 | 10000 | 13/61 | 5.421024E-16 | 4.373125E-01 | 4.656153E-06
7 | 50000 | 23/82 | 6.776293E-16 | 2.733234E+00 | 1.164132E-05
7 | 100000 | 41/119 | 3.388159E-16 | 8.077172E+00 | 1.164146E-05
8 | 3000 | 12/58 | 6.505227E-16 | 1.404690E-02 | -2.120705E+03
9 | 5000 | 16/124 | 5.750809E-16 | 1.084375E-01 | 7.202892E-07
9 | 10000 | 16/125 | 2.875404E-16 | 3.114688E-01 | 7.198567E-07
9 | 50000 | 26/146 | 3.594263E-16 | 3.202235E+00 | 1.798780E-06
9 | 100000 | 27/148 | 7.188527E-16 | 9.281172E+00 | 3.597344E-06
10 | 5000 | 13/61 | 8.470350E-16 | 3.274688E-01 | 4.365231E-06
10 | 10000 | 14/64 | 4.235176E-16 | 5.623125E-01 | 4.365406E-06
10 | 50000 | 42/121 | 5.293999E-16 | 3.483234E+00 | 1.091389E-05
10 | 100000 | 51/140 | 2.647004E-16 | 9.937172E+00 | 1.091395E-05
Table 3

Test results for Algorithm 4.2.

No. | Dim | NI/NF | ‖g(x)‖ | Time | f
1 | 5000 | 186/3878 | 4.403006E-06 | 4.539688E-01 | 2.641011E-09
1 | 50000 | 242/5051 | 8.447711E-05 | 6.235391E+00 | 5.068475E-09
1 | 100000 | 259/5377 | 3.400454E-04 | 1.389234E+01 | 1.020121E-08
2 | 5000 | 98/2010 | 5.769694E-06 | 1.189811E+03 | 3.089375E-04
2 | 10000 | 107/2199 | 3.611318E-06 | 5.205405E+03 | 1.859985E-04
2 | 50000 | 129/2679 | 3.710004E-06 | 1.349417E+04 | 9.817318E-05
3 | 3000 | 11/55 | 6.505226E-16 | 1.590626E-02 | -4.241179E+03
4 | 3000 | 7/89 | 0 | 2.552654E-02 | 5.998031E+03
5 | 3000 | 16/104 | 0 | 7.656254E-02 | 5.998000E+03
6 | 5000 | 64/1240 | 7.081679E-16 | 5.306719E-01 | 1.881710E-06
6 | 10000 | 69/1345 | 7.428122E-16 | 1.157047E+00 | 2.725454E-06
6 | 50000 | 82/1618 | 6.935890E-16 | 7.391297E+00 | 5.888909E-06
6 | 100000 | 87/1723 | 7.275201E-16 | 1.539180E+01 | 8.529442E-06
7 | 5000 | 12/42 | 4.656623E-16 | 2.339532E-01 | 2.327843E-06
7 | 10000 | 13/44 | 9.313248E-16 | 4.377500E-01 | 4.656153E-06
7 | 50000 | 23/66 | 2.910396E-16 | 2.734688E+00 | 1.164132E-05
7 | 100000 | 41/102 | 5.820813E-16 | 7.891281E+00 | 1.164146E-05
8 | 3000 | 12/58 | 6.505227E-16 | 3.190626E-02 | -2.120705E+03
9 | 5000 | 16/107 | 9.879814E-16 | 2.969532E-01 | 7.202892E-07
9 | 10000 | 16/108 | 4.939907E-16 | 5.157501E-01 | 7.198567E-07
9 | 50000 | 26/129 | 6.174896E-16 | 3.202688E+00 | 1.798780E-06
9 | 100000 | 27/132 | 3.087449E-16 | 9.000281E+00 | 3.597344E-06
10 | 5000 | 13/45 | 3.637988E-16 | 3.119532E-01 | 4.365231E-06
10 | 10000 | 14/47 | 7.275977E-16 | 5.627500E-01 | 4.365406E-06
10 | 50000 | 42/104 | 9.095022E-16 | 3.468688E+00 | 1.091389E-05
10 | 100000 | 51/123 | 4.547519E-16 | 9.625281E+00 | 1.091395E-05
Table 4

Test results for MPRP.

No. | Dim | NI/NF | ‖g(x)‖ | Time | f
1 | 5000 | 250/5197 | 1.146977E-04 | 6.209688E-01 | 6.879798E-08
1 | 50000 | 286/5991 | 1.099941E-03 | 8.980437E+00 | 6.599447E-08
1 | 100000 | 297/6222 | 2.127262E-03 | 2.067906E+01 | 6.381689E-08
2 | 5000 | 98/2025 | 5.373446E-15 | 1.105497E+03 | 9.428024E-09
2 | 10000 | 107/2214 | 3.363302E-15 | 4.828827E+03 | 5.676222E-09
2 | 50000 | 129/2693 | 1.382084E-14 | 1.683294E+04 | 5.992015E-09
3 | 3000 | 11/55 | 6.505226E-16 | 3.221878E-02 | 2.793039E-06
4 | 3000 | 7/89 | 0 | 4.721878E-02 | 7.714992E+03
5 | 3000 | 16/104 | 0 | 7.821879E-02 | 7.714994E+03
6 | 5000 | 64/1240 | 7.081679E-16 | 6.085625E-01 | 1.881710E-06
6 | 10000 | 69/1345 | 7.428122E-16 | 1.265188E+00 | 2.725454E-06
6 | 50000 | 82/1618 | 6.935890E-16 | 7.234313E+00 | 5.888909E-06
6 | 100000 | 87/1723 | 7.275201E-16 | 1.573519E+01 | 8.529442E-06
7 | 5000 | 12/59 | 2.710511E-16 | 1.105805E+03 | 2.327843E-06
7 | 10000 | 13/61 | 5.421024E-16 | 4.829446E+03 | 4.656153E-06
7 | 50000 | 23/82 | 6.776293E-16 | 1.037413E+01 | 1.164132E-05
7 | 100000 | 41/119 | 3.388159E-16 | 1.103525E+01 | 1.164146E-05
8 | 3000 | 12/58 | 6.505227E-16 | 3.259378E-02 | -2.120705E+03
9 | 5000 | 16/124 | 5.750809E-16 | 1.105867E+03 | 7.202892E-07
9 | 10000 | 16/125 | 2.875404E-16 | 4.829565E+03 | 7.198567E-07
9 | 50000 | 26/146 | 3.594263E-16 | 1.115213E+01 | 1.798780E-06
9 | 100000 | 27/148 | 7.188527E-16 | 1.286225E+01 | 3.597344E-06
10 | 5000 | 13/61 | 8.470350E-16 | 1.105892E+03 | 4.365231E-06
10 | 10000 | 14/64 | 4.235176E-16 | 4.829612E+03 | 4.365406E-06
10 | 50000 | 42/121 | 5.293999E-16 | 1.158712E+01 | 1.091389E-05
10 | 100000 | 51/140 | 2.647004E-16 | 1.381325E+01 | 1.091395E-05
Dim: the dimension of the problem. NI: the total number of iterations. NF: the number of function evaluations. ‖g(x)‖: the norm of g(x) at the final iteration. Time: the CPU time in seconds. f: the value of f(x) at the final iteration. From the above three tables, it is not difficult to see that Algorithm 4.1 is superior to Algorithm 4.2 and MPRP in terms of NI, NF, and ‖g(x)‖: Algorithm 4.1 yields smaller values of NI and NF when the program terminates, and its value of ‖g(x)‖ is smaller than that of Algorithm 4.2 in most cases. However, we also note that our algorithms can efficiently solve problems 3, 4, 5 and 8 only up to dimension 3000 (the maximum dimension used for these problems); if the dimension is increased further, the algorithms fail to converge to good minima and become stuck at local solutions. To illustrate the performance of these methods directly, we used the tool developed by Dolan and Moré [40] to analyze their efficiency in terms of the number of iterations, the number of function evaluations, and the CPU time. Figs 1, 2 and 3 present performance profiles for the results in Tables 2, 3 and 4 in terms of NI, NF, and Time, respectively.
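The Dolan-Moré comparison just described can be sketched as follows. Here T is a hypothetical cost matrix (entries could be NI, NF, or CPU time), and all names are illustrative, not part of the original tool.

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile (illustrative sketch).

    T[p, s] = cost of solver s on problem p (use np.inf for a failure).
    Returns the ratio matrix r, with r[p, s] = T[p, s] / min_s T[p, s],
    and a function rho(s, tau) giving the fraction of problems that
    solver s solves within a factor tau of the best solver."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    r = T / best                          # performance ratios r_{p,s}
    def rho(s, tau):
        return np.mean(r[:, s] <= tau)
    return r, rho

# usage: 3 problems, 2 solvers
r, rho = performance_profile([[2.0, 4.0], [3.0, 3.0], [10.0, 5.0]])
```

Plotting rho(s, tau) against tau ≥ 1 for each solver produces curves of the kind shown in Figs 1-3; the value at tau = 1 is the fraction of problems on which a solver is the (possibly tied) winner.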
Fig 1

Performance profiles of these methods (NI).

Fig 2

Performance profiles of these methods (NF).

Fig 3

Performance profiles of these methods (Time).

From Figs 1 and 2, we can conclude that Algorithm 4.1 performs better than Algorithm 4.2 and MPRP in terms of the numbers of iterations and function evaluations. The advantage of Algorithm 4.1 can be attributed to the superior theoretical properties of the MHZ formula compared with the usual HZ formula. However, Fig 3 indicates that Algorithm 4.2 is superior to Algorithm 4.1 and MPRP in terms of CPU time. Overall, all three methods are very effective for large-scale nonsmooth optimization problems.

Conclusion

(i) In this paper, we focus on the HZ CG method and study its application to nonsmooth optimization problems. Several results are presented that demonstrate the efficiency of this method for large-scale nonsmooth unconstrained optimization. (ii) Motivated by the HZ formula, we also present a modified HZ (MHZ) CG formula. The modified formula not only possesses the sufficient descent property of the HZ formula but also generates directions that belong to a trust region, and its scale parameter is non-negative. (iii) We report the results of applying three methods to large-scale nonsmooth convex minimization problems. Global convergence is established, and numerical experiments verify that both proposed methods can be successfully used to solve large-scale nonsmooth problems. (iv) Although the HZ and MHZ methods offer several key achievements for large-scale nonsmooth optimization, we believe that at least five issues could be addressed to gain further improvements. The first is the scale c in the modified HZ CG algorithm, which could be tuned. The second is the application of other CG methods to this type of optimization; perhaps a more suitable CG method exists for this purpose. Third, it is well known that limited-memory quasi-Newton methods are effective for certain classes of large-scale optimization problems because they require minimal storage; this inspires us to combine limited-memory quasi-Newton methods with the HZ CG technique to solve large-scale nonsmooth optimization problems. Fourth, in the future, we will also use the HZ CG method to investigate large-scale nonsmooth optimization with constraints. The last issue is the most important one: attention should be paid to other optimality and convergence conditions for nonsmooth problems. All of these issues will be addressed in our future work.
  3 in total

1.  Incremental Support Vector Learning for Ordinal Regression.

Authors:  Bin Gu; Victor S Sheng; Keng Yeow Tay; Walter Romano; Shuo Li
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2014-08-12       Impact factor: 10.451

2.  A Robust Regularization Path Algorithm for $\nu $ -Support Vector Classification.

Authors:  Bin Gu; Victor S Sheng
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2016-02-24       Impact factor: 10.451

3.  Two New PRP Conjugate Gradient Algorithms for Minimization Optimization Models.

Authors:  Gonglin Yuan; Xiabin Duan; Wenjie Liu; Xiaoliang Wang; Zengru Cui; Zhou Sheng
Journal:  PLoS One       Date:  2015-10-26       Impact factor: 3.240

  2 in total

1.  An active-set algorithm for solving large-scale nonsmooth optimization models with box constraints.

Authors:  Yong Li; Gonglin Yuan; Zhou Sheng
Journal:  PLoS One       Date:  2018-01-02       Impact factor: 3.240

2.  A conjugate gradient algorithm for large-scale unconstrained optimization problems and nonlinear equations.

Authors:  Gonglin Yuan; Wujie Hu
Journal:  J Inequal Appl       Date:  2018-05-11       Impact factor: 2.491

