
Proximal extrapolated gradient methods for variational inequalities.

Abstract

This paper is concerned with novel first-order methods for monotone variational inequalities. The methods use a very simple linesearch procedure that takes into account local information about the operator. They do not require Lipschitz continuity of the operator, and the linesearch procedure uses only values of the operator. Moreover, when the operator is affine, the linesearch becomes very simple: it needs only vector–vector operations. For all our methods, we establish the ergodic convergence rate. In addition, we modify one of the proposed methods for the case of composite minimization. Preliminary results from numerical experiments are quite promising.


Keywords: 47J20; 65K10; 65K15; 65Y20; 90C33; convex optimization; ergodic convergence; linesearch; monotone operator; nonmonotone stepsizes; proximal methods; variational inequality

Year: 2017 PMID: 29348705 PMCID: PMC5751890 DOI: 10.1080/10556788.2017.1300899

Source DB: PubMed Journal: Optim Methods Softw ISSN: 1026-7670

Introduction

This paper considers the variational inequality problem in a general form where is a finite-dimensional vector space, is a monotone operator and is a convex function. This is an important problem with a variety of theoretical and practical applications [21,22,28]. The main iteration step of the proposed methods is given as follows: where we define , and from local properties of . To this end, in each iteration we run a simple linesearch procedure. We propose different procedures for different cases: for the general problem (1), for (1) with , and for the case when F is the gradient of a convex differentiable function. Each iteration of the linesearch procedure requires only one value of F, and the function g is not used at all. In addition, monotonicity of the stepsizes is not required. Also, when F is affine, our linesearch procedures need only vector–vector computations. Moreover, our analysis does not need a global Lipschitz assumption on F, only a local one. Although we consider quite a general problem, the discussion below consists of two separate parts devoted to variational inequality problems and to optimization problems. This is because we noticed that for some difficult optimization problems our algorithm may work much better than some existing methods. Section 2 studies our first two methods: we show their global convergence, consider some particular cases and establish complexity rates. In Section 3 we consider the problem of composite minimization, for which we improve one of our methods. In Section 4 we study some known linesearch procedures and compare our methods numerically with several popular methods.

Preliminaries

In what follows, denotes a finite-dimensional real vector space with inner product and norm , denotes a gradient of a smooth function f. For a proper lower semicontinuous convex function , we denote its domain by , that is, . The proximal operator is defined as For a set C, we denote by the indicator function of the set, that is, if and ∞ otherwise. We denote the metric projection onto C as . Clearly, by definition, . The operator F is called monotone if
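For reference, the definitions named in this section have the following standard forms. The symbols did not survive in this copy of the text, so the notation below (the space written as $\mathcal{E}$, the indicator as $\delta_C$, the projection as $P_C$) is an assumption rather than the paper's own.

```latex
\operatorname{prox}_g(x) = \operatorname*{argmin}_{y \in \mathcal{E}}
    \Bigl\{ g(y) + \tfrac{1}{2}\,\|y - x\|^2 \Bigr\},
\qquad
\delta_C(x) =
\begin{cases}
  0,       & x \in C,\\
  +\infty, & \text{otherwise},
\end{cases}
\qquad
P_C = \operatorname{prox}_{\delta_C},

% F is monotone if
\langle F(x) - F(y),\, x - y \rangle \ge 0
\quad \text{for all } x, y \text{ in the domain of } F.
```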

Variational inequality perspective

A general approach to solve (1) consists in solving a sequence of the simpler variational inequalities [13,25]. We concentrate on the most simple case of this approach: projected (proximal) methods. When F satisfies cocoercivity assumption (that is stronger than just monotonicity), one can apply several methods from an optimization framework. In particular, this holds for the proximal gradient method ( forward–backward[FB] method) [30,44] and inertial method [31,38] (see also [1,40,45] for the original ideas). However, those methods do not converge when F is just monotone. When , variational inequality (1) reduces to where is a closed convex set. For this specific case, Korpelevich [29] proposed the extragradient method where and L is the Lipschitz constant of the operator F. A bit different approach was proposed by Popov [46] where . Note that the latter method needs only one value of F per iteration, though it uses a smaller stepsize. Both Korpelevich's and Popov's methods gave birth to a fruitful research [12,17,23,26,27,32,35,36,50,51] where there have been proposed different improvements: linesearch procedures or/and avoiding of Lipschitz-continuity assumption, decreasing a number of metric projections, etc. Actually, the basic schemes (3) and (4) can be applied to a general problem (1). However, this is not always the case for their extensions. In turn, problem (1) can be formulated as a more general problem of a monotone inclusion. In this case, one may apply Tseng's forward–backward–forward (FBF) method [53] where . Tseng's method has attracted a lot of interest due to its simplicity and generality, see [8,9,11,37]. Usually the algorithms for (1) or (2) that have practical interest use some linesearch procedures to find in each iteration. The most popular choice is the Goldshtein-Armijo-type stepsize rule [27,50,51,53], which requires evaluation of F and in each of inner iterations. 
For example, the linesearch for method (5), proposed in the same paper [53], requires only continuity of F. However, even with fixed steps the method uses two values of F per iteration. We will consider it in more detail in Section 4. Recently, the reflected projected gradient method for problem (2) was proposed in [34]. When the stepsize λ is fixed, it generates a sequence by where . This scheme is much simpler than (3), (4) or (5), but most importantly it gives a very efficient way to incorporate a linesearch procedure. In [34] one such idea was applied, and numerical results confirmed its efficiency. However, the proposed scheme was quite complicated, and one of the goals of this paper is to propose simpler schemes that, in addition, can be applied to a more general problem than (2). During the preparation of this paper, we became aware of the recent work [33]. In that work, the authors proposed a linesearch procedure that also exploits the idea of [34]. However, our work is different. First, we consider a more general model where g may be different from the indicator function . Second, we do not require Lipschitz continuity of the operator F. Moreover, even in the simplest case when , our algorithms seem to be a bit simpler.
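To make the difference in per-iteration cost concrete, the following is a minimal sketch of the fixed-step extragradient method and the fixed-step reflected projected gradient method for problem (2). The test operator F(x) = (x2, -x1), the box constraint and the starting point are hypothetical choices made for illustration. The stepsizes follow the classical bounds λ < 1/L and λ < (√2 − 1)/L from the literature, which are assumed here because the formulas are missing from this copy of the text.

```python
import math

def project_box(x, lo=-1.0, hi=1.0):
    """Metric projection onto the box [lo, hi]^d (illustrative feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]

def F(x):
    """Monotone, 1-Lipschitz test operator F(x) = (x2, -x1); its only zero is 0."""
    return [x[1], -x[0]]

def extragradient(F, project, x0, lam, iters):
    """Korpelevich's scheme: two projections and two values of F per iteration."""
    x = list(x0)
    for _ in range(iters):
        y = project([xi - lam * fi for xi, fi in zip(x, F(x))])
        x = project([xi - lam * fi for xi, fi in zip(x, F(y))])
    return x

def reflected_projected_gradient(F, project, x0, lam, iters):
    """One projection and one value of F per iteration, taken at 2*x_n - x_{n-1}."""
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        z = [2.0 * a - b for a, b in zip(x, x_prev)]
        x_prev, x = x, project([xi - lam * fi for xi, fi in zip(x, F(z))])
    return x

L = 1.0  # Lipschitz constant of the test operator
x_eg = extragradient(F, project_box, [1.0, 1.0], 0.5 / L, 300)
x_rpg = reflected_projected_gradient(
    F, project_box, [1.0, 1.0], 0.7 * (math.sqrt(2.0) - 1.0) / L, 1000)
```

Both runs approach the unique solution 0 of this toy problem; the reflected scheme does so with half the operator evaluations and half the projections per iteration, at the price of a smaller admissible stepsize.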

Optimization perspective

Consider the following problem of composite minimization: where f is a differentiable convex function and g is a proper lower semicontinuous convex function. Such formulation assumes that we know the structure of the underlying function Φ. It is not difficult to verify that the first-order optimality conditions of (6) are a particular case of (1) with . Problem (6) is rich enough to encompass many important applications in machine learning, image processing, compressed sensing, statistics, etc. [4,14,15,19,41,43,54]. Although first-order methods for problem (6) have a long history, they continue to receive much attention from optimization community. Many real-life applications are large scale and in this case first-order methods often outperform other methods such as interior point methods, Newton methods, since the iterations of the former are much cheaper and do not depend on the dimension of the problem as much as the latter do. Under the assumption that is Lipschitz continuous, that is, there exists some L>0 such that one of the most simple methods for solving (6) is the proximal gradient method that generates as where . We also have to mention a very important class of two-step proximal gradient methods that include inertial (heavy ball) methods introduced by Polyak in [45] and accelerated proximal methods, pioneered by seminal work of Nesterov [40] and further developed in [4,41,54] for a problem of composite minimization. This class enjoys an improved convergence rate compared with classical proximal gradient method (8). For all these methods condition (7) is also important. There are several methods [6,16,48,53] that do not require condition (7). Our linesearch procedure in some sense is similar to them but is cheaper since it does not use a proximal mapping. We underline that problems, where (7) does not hold, take place, for example, in barrier methods, entropy maximization, geometric programming, image processing [7,10,18,19,41,47]. 
Even in the case when is Lipschitz continuous, the proposed methods might be competitive with known methods. Roughly speaking, the general picture of applicability of our methods is the following. When the local Lipschitz constant of changes drastically, that is, f has very different curvature in different directions, a global Lipschitz constant is a poor predictor, and our methods will benefit from using the local information of . In turn, when is rather flat, that is, the local Lipschitz constant of does not change too much, our methods will compare unfavourably with other methods, since the latter allow larger stepsizes and/or may enjoy a better complexity rate. There are many possible linesearch procedures and adaptivity techniques for (8) under assumption (7), see [2,4,5,41,42,49]. All of them require evaluation of in every inner iteration of the linesearch. Since our methods do not need this, they will benefit when is expensive.
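As a point of comparison for the discussion above, here is a minimal sketch of the classical proximal gradient iteration with a fixed stepsize based on a global Lipschitz constant. The separable test problem (a weighted least-squares term plus an l1 penalty), the data a, b, mu and the stepsize 1/L are hypothetical choices for illustration; the problem is chosen only because its minimizer is known in closed form.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.| evaluated at the scalar v."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def proximal_gradient(grad_f, prox_g, x0, lam, iters):
    """x_{n+1} = prox_{lam*g}(x_n - lam * grad f(x_n)) with a fixed stepsize lam."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = prox_g([xi - lam * gi for xi, gi in zip(x, g)], lam)
    return x

# f(x) = 0.5 * sum_i a_i (x_i - b_i)^2,  g(x) = mu * ||x||_1  (illustrative data)
a, b, mu = [1.0, 4.0], [3.0, 0.1], 1.0

def grad_f(x):
    return [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]

def prox_g(v, lam):
    return [soft_threshold(vi, lam * mu) for vi in v]

L = max(a)  # global Lipschitz constant of grad f for this diagonal quadratic
x_pg = proximal_gradient(grad_f, prox_g, [0.0, 0.0], 1.0 / L, 200)

# Closed-form minimizer of the separable problem, used only as a check.
x_exact = [soft_threshold(bi, mu / ai) for ai, bi in zip(a, b)]
```

Here the global constant L = 4 is dictated by the stiffest coordinate, so the flat coordinate moves with a stepsize four times smaller than its own curvature would allow. This is the situation in which local stepsize estimates are expected to help.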

Main part

The following assumptions are made throughout the paper: A1 is locally Lipschitz continuous and monotone. A2 is proper l.s.c. convex function. A3 is a continuous function. A4 The solution set of (1), denoted by , is nonempty. Assumption A3 seems to be not quite usual, though it is very general. Clearly, it fulfills for any g with open (this includes finite-valued functions) or for an indicator of some closed convex set C. Moreover, when A2 implies A3 ( [3, Corollary 9.15]). By this, every separable function that satisfies A2 also satisfies A3. The following two lemmas are classical. For their proofs we refer to [3]. Let be a convex function, . Then if and only if Let and (A2) holds. Then is a solution of (1) if and only if Next lemma is obvious. Let be two nonnegative real sequences such that Then is convergent and .

Algorithm 1

First, we consider a particular case of (1) when for a closed convex set . Now the problem becomes to find such that In Algorithm 1, we need to ensure that is bounded. Inequality (10) gives us something similar to an estimation that we usually get from Lipschitz continuity of F. It is easy to see that finding the largest that satisfies (10) is equivalent to solving a quadratic equation, thus it can be found explicitly. Evidently, the update of the inner loop requires only computation of F. Also notice that we start our linesearch from . This is only one possible case. In fact, for us it is only important that the linesearch provides us some i, for which we can get , see Lemma 2.6. Thus, for some problems it might be beneficial to start linesearch from. First, let us show that Algorithm 1 is well defined. The linesearch in Algorithm 1 always terminates. Suppose that the assertion of the lemma is false. Let . Since F is locally Lipschitz continuous, it is Lipschitz continuous on D (because D is a bounded set). Hence, there exists L such that Note that for any . Then, in order to get a contradiction, it remains to take and set. For generated by Algorithm 1, and the following inequality holds for all : By Lemma 2.1, Similarly, for the previous iterate we have Taking in the above inequality and then , we obtain Multiplying (14) by and adding it to (13) give us From , it follows Summation of (12) and (15) yields By the cosine rule, we derive Taking into account (10), we obtain the desired inequality (11). Assume that generated by Algorithm 1, is bounded. Then . Evidently, the sequence is bounded as well. Since F is Lipschitz continuous on bounded sets, there exists L>0 such that From the construction of it can be seen easily that if we have then and satisfy inequality In other words, the linesearch terminates at least after two iterations. Since we seek the largest , we have . Now, on the contrary, assume that . Hence, there exists such that for all . Let . 
As , we obtain . But as well, so again we have that . By induction we conclude that is nondecreasing and thus cannot converge to zero. This contradicts our assumption.

Algorithm 2

For a general problem (1) we propose the following Algorithm 2. Basically, the linesearch procedure finds such (trying to choose the larger one) that satisfies the ‘local Lipschitz’ condition (18). On the one hand, we want to have , since this gives us possibility at least theoretically to increase the stepsize from iteration to iteration. On the other hand, we have to ensure that will not be larger than . These caused a bit complicated formula for . Although (1) with is precisely (2), Algorithm 1 in this case does not coincide with Algorithm 2. The former is more flexible since it does not apply such a restriction on stepsizes as the latter does. We want to point out that when F is L-Lipschitz continuous, instead of running the linesearch procedure, we can use a fixed stepsize and take in each iteration of Algorithm 2. By this we recover a basic algorithm in [34]. As before, let us show that Algorithm 2 is well defined. The linesearch in Algorithm 2 always terminates. The proof is very similar to the proof of Lemma 2.4. The main distinction is that now we have to set , where , and notice that for all . For defined in Algorithm 2 and the following inequality holds for all : The general idea of the following proof is very similar to the proof of Lemma 2.5. By Lemma 2.1 Similarly, After substitution in the last inequality and , we obtain Multiplying the last inequality by and then adding it to the previous ones yields From and we get Adding (20) to (22) gives us Using the cosine rule and (18), we obtain that finishes the proof. For Algorithm 2 we can prove a stronger result than Lemma 2.6. Assume that the sequence generated by Algorithm 2, is bounded. Then . Since is bounded, there exists L>0 such that Without loss of generality assume that . We show that from follows . Clearly, if then (18) holds. Suppose that for some . If i=0 then it is obvious that . If i>0 then by the construction of the linesearch does not satisfied (18). This means that and hence,.

Proof of convergence

For generality we will write where in case of Algorithm 1 we suppose that . It is clear that both problems (2) and (1) are equivalent to finding such that for all . Let be generated by either Algorithm 1 or 2 and let . Then the following inequality holds for all : Monotonicity of F yields Taking and using the above, we can rewrite both (11) and (19) as one inequality Note that in both cases we have that . Since , it follows: It only remains to estimate . For this we use the estimation from [34] Combining (28) and (29), we obtain the desirable inequality (25). Let sequences and be generated by either Algorithm 1 or 2. Then and converge to a solution of (1). Let us show that the sequence is bounded. Fix any . For set It is easy to see that (25) is equivalent (in a new notation) to Evidently, and . Hence, by Lemma 2.3 we conclude that is convergent and . This means that is bounded as well as and From the above it also follows that and is bounded. By Lemma 2.6 or 2.9 and by boundedness of there exists an increasing sequence of positive numbers such that is separated from zero and converges to some as . It is clear that also converges to that . We show . From Lemma 2.1 it follows that or equivalently Taking the lower limit in (31) as and using that is separated from zero, , and is l.s.c., we obtain Hence, . Recall that for any the sequence is convergent. Thus, taking defined above, we obtain that the sequence is convergent. As is bounded and is continuous due to A3, . Therefore, and the proof is complete. As one can see, the last arguments were the only place where we used A3. Without this assumption we are only able to show that all limits points of belong to. Both Algorithms 1 and 2 require as input data. Although the algorithms do not have any restriction on the initialization procedure, we suggest to define as follows. Choose any in a small neighbourhood of the starting point and take the largest that satisfies

Affine cases

In this section we introduce some additional suggestions that can simplify the proposed algorithms. If F is affine then, instead of computing in each iteration of linesearch procedures 1 or 2, we only need to remember , and use that . Clearly, with this remark the computational complexity per iteration of Algorithm 1 or 2 is almost the same as that of, for example, the projected gradient method (or the proximal gradient method) with a fixed stepsize. Our algorithms require a few more vector–vector operations and a bit more memory. When C in (2) is an affine set, Algorithm 1 becomes simpler. Namely, we need neither the bound nor the bound . In fact, the former bound was required in our proof of Theorem 2.11 to ensure that and the latter was used to show that . However, when C is affine, and thus, for all . Therefore, both items above hold for any choice of . If we consider (2) with an affine map F and an affine set C, then clearly Algorithm 1 enjoys the advantages of both remarks above.
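The first remark can be checked numerically. The sketch below assumes that the trial point of the linesearch has the form y = x_n + τ(x_n − x_{n−1}); this form is an assumption, since the formula is missing here. For an affine F the value F(y) then equals (1 + τ)F(x_n) − τF(x_{n−1}), so each trial value of τ costs only vector–vector operations. The matrix M, vector q and the two points are random illustrative data.

```python
import random

def matvec(M, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

rng = random.Random(0)
d = 5
M = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
q = [rng.uniform(-1.0, 1.0) for _ in range(d)]

def F(x):
    """Affine operator F(x) = M x + q (illustrative data)."""
    return [mx + qi for mx, qi in zip(matvec(M, x), q)]

x_prev = [rng.uniform(-1.0, 1.0) for _ in range(d)]
x_cur = [rng.uniform(-1.0, 1.0) for _ in range(d)]
F_prev, F_cur = F(x_prev), F(x_cur)  # the only two matrix-vector products

def F_extrapolated(tau):
    """F at x_cur + tau*(x_cur - x_prev), from stored values only."""
    return [(1.0 + tau) * fc - tau * fp for fc, fp in zip(F_cur, F_prev)]

def F_direct(tau):
    """Same value computed the expensive way, for comparison."""
    y = [xc + tau * (xc - xp) for xc, xp in zip(x_cur, x_prev)]
    return F(y)

max_err = max(abs(u - v)
              for tau in (0.3, 1.0, 2.5)
              for u, v in zip(F_extrapolated(tau), F_direct(tau)))
```

The two computations agree up to rounding error for every trial τ, which is why the inner loop of the linesearch needs no additional matrix–vector multiplications in the affine case.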

Rate of convergence

In this section we investigate the ergodic rate of convergence of the sequence for Algorithms 1 and 2. It is well known that the rate holds for the extragradient method and that this rate is optimal [39,54]. In those papers the authors proposed much more general methods, of which the extragradient method is only a particular example. However, those methods are more complicated, use fixed steps and require Lipschitz continuity of F. We need the following error function (known as the dual-gap function [22,54]): The relation between this error function and problem (1) is given by the following lemma.

(see [22,54]).

if and only if and . Next theorem shows that we can use the above criteria to find with a desired accuracy. Let and be the sequences generated by either Algorithm 1 or 2. For any define and as Then and If in Lemma 2.10 we did not use inequality (28) we would get the following: from which follows Summing (36) over , and using that , we obtain Note that function is convex and all the coefficients in square brackets are nonnegative due to the assumption of algorithms. Applying Jensen's inequality to the left-hand side of the above inequality and taking into account that we obtain where Evidently, which finishes the proof. Notice that due to Lemmas 2.6 and 2.9. Moreover, for Algorithm 2 we have a lower estimate , which follows from Lemma 2.9. This implies and we can recover the same ergodic rate of convergence for Algorithm 2. However, it is clear that since we use linesearch, in practice we obtain a better constant that. When (1) is a particular case of a composite minimization problem or a saddle point problem, inequality (34) can be improved. For simplicity, we show how to do this only for the case of constrained optimization. If F is a gradient of a convex differentiable function f, that is, (1) is the result of , then Instead of using (35), we consider Lemmas 2.5 and 2.8 for that give us identical inequality Applying (37) and estimation (29), we obtain Using the same arguments as in Theorem 2.13, we obtain

Composite minimization

When F is a gradient of a convex function, problem (1) is equivalent to a problem of a composite minimization where we assume that A5 is a convex differentiable function with locally Lipschitz gradient . To highlight the specificity, instead of F we will write . We denote . Throughout this section we suppose that A2–A5 hold. Note that Algorithm 3 uses the same stopping criteria in the linesearch procedure as in Algorithm 2: Moreover, for Algorithm 3 is identical to Algorithm 2. In turn, for the stepsize is larger than that in Algorithm 2. Result stated in Lemma 2.7 holds for Algorithm 3 as well. Since its proof is identical, we omit it. However, the main ingredient to prove a convergence of differs from Lemmas 2.8 and 2.10. For defined in Algorithm 3 and the following inequality holds for all : With the same arguments as in (20) and (21) we obtain and Using that and , we obtain By convexity of f, Summing (44)–(46), multiplied by , we obtain Notice that for (47) is very similar to (23). Their distinction caused only by using convexity of f in (46). As usually, by the cosine rule we can rewrite the above as Let Then (48) is equivalent to Recall that inequality (50) holds for every . Thus, taking , we obtain Hence, Applying to (50), this yields Using that , we deduce To complete the proof it only remains to use (29). Unfortunately, we are not able to show that the whole sequence is separated from zero. This is because the first iteration of the linesearch may start from . To show that does not converge to 0, we need to apply a bit more complex arguments than ones in Lemma 2.6. Assume that the sequence , generated by Algorithm 3, is bounded. Then . Since is bounded, there exists L>0 such that Also, it is not difficult to show by induction that for all n. Let . We show that at least one of or is larger or equal than . Evidently, from this the assertion of lemma follows. On the contrary, assume that for j=0,1. 
Due to , , and (42), the linesearch procedure in Algorithm 2 must terminate after the first iteration. This means that and . From our assumption we have Using that , we obtain Note that . This implies , from which follows. But the latter inequality does not hold for. This contradiction finishes the proof. In fact, the upper bound for θ can be enlarged, but then the proof of Lemma 3.2 will be more complicated. Perhaps larger θ seems to be a better choice because will increase. However, in this case the bound will decrease and in the result we may get even smaller . So, one can see as a trade-off between those two bounds. Numerical experiments also approved as the best choice. Let sequences and be generated by Algorithm 3. Then and converge to a solution of (40). From and it follows Applying (51) to (43) with , we obtain With sequences and given by the rest of the proof almost coincides with the proof of Theorem 2.11. When is L–Lipschitz continuous then Algorithm 3 allows us to use a fixed stepsize . In this case, taking , steps 1 and 2 of Algorithm 3 can be written as If this scheme reduces to the basic reflected proximal gradient method. Using Lemma 3.1 we can derive the same ergodic rate of convergence of Algorithm 3 as in Section 2.5.

Comparison

For the general problem (1) one can apply the FBF method proposed by Tseng [53]. It generates the sequence by the following rule: for given , , , take and run repeat break if update return Compute . The choice of δ is quite important. Originally in the paper . However, this excludes the possibility of enlarging the stepsizes. We suggest using and instead controlling the boundedness of . Evidently, the stopping criterion of the linesearch in Tseng's method is very similar to (18). However, each iteration of the former requires evaluation of z. At the same time, Tseng's method is more general, as it allows us to solve a general monotone inclusion and requires only continuity of F. For a particular case but without the Lipschitz-continuity assumption, one can apply the FB method with the linesearch proposed recently in [16]: for given , , , take and run repeat break if update return Originally, in [16] the linesearch always starts from the same λ. We found this not very practical, and instead suggest using with . This is the same as what we proposed above for the FBF method. We do not discuss the two other methods proposed in [16], since in our numerical experiments they perform much worse than the aforementioned method.
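Since the pseudocode of the FBF linesearch is not legible here, the following is a minimal sketch of one common form of Tseng's method with backtracking, not a transcription of the paper's listing. The parameter names are assumptions: `grow` plays the role of δ (the first trial stepsize is `grow` times the previous one), `beta` is the backtracking factor, `alpha` in (0, 1) is the constant in the acceptance test, and `lam_max` keeps the stepsizes bounded. The test operator and the box constraint are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def fbf_step(F, prox, x, lam_prev, alpha=0.9, beta=0.7, grow=1.5, lam_max=10.0):
    """One forward-backward-forward iteration with a backtracking linesearch."""
    Fx = F(x)
    lam = min(grow * lam_prev, lam_max)  # first trial may exceed the previous step
    while True:
        # Each trial costs one prox and one value of F.
        y = prox([xi - lam * fi for xi, fi in zip(x, Fx)], lam)
        dF = [p - q for p, q in zip(F(y), Fx)]
        dx = [p - q for p, q in zip(y, x)]
        if lam * norm(dF) <= alpha * norm(dx):
            break
        lam *= beta
    # Second forward step of Tseng's scheme.
    x_new = [yi - lam * di for yi, di in zip(y, dF)]
    return x_new, lam

def F(x):
    """Monotone, 1-Lipschitz test operator (illustrative)."""
    return [x[1], -x[0]]

def prox_box(v, lam):
    """Prox of the indicator of [-1, 1]^d; the stepsize does not affect a projection."""
    return [min(max(vi, -1.0), 1.0) for vi in v]

x_fbf, lam_fbf = [1.0, 1.0], 1.0
for _ in range(200):
    x_fbf, lam_fbf = fbf_step(F, prox_box, x_fbf, lam_fbf)
```

With `grow` equal to 1 the stepsizes can only decrease, which corresponds to the original choice discussed above; a value larger than 1 lets them recover after a conservative iteration, at the cost of extra trial evaluations.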

Numerical illustration

Our test problems include some randomly generated minimization problems with difficult nonlinear constraints, a few classical test problems for VI, and an instance of a geometric programming problem. For all problems but the last we compare our first two algorithms with the FBF method of Tseng. For the last problem we compare the performance of our Algorithms 2 and 3 with FBF, the FB method with the linesearch described in Section 4, and FISTA [4]. Computations were performed using Python 2.7 on an Intel Core i5-5200U CPU 2.20 GHz running 64-bit Linux Mint 17.3. Since the fixed points of the operator , for , are solutions of (1), it is natural to use the following stopping criterion: In particular, for the FBF method we use and for our methods The reason is that we do not want to compute extra . However, we can check that and hence, for we obtain an even stronger stopping criterion. For a benchmark of all algorithms we report the number of iterations (iter), the number of proximal operator evaluations (# prox), the number of evaluations of F (# F) and the running time. For the tolerance we set . The parameters were chosen as follows: Alg.1, Alg.2: , ; Alg.3: , , ; FBF: , ; FB, FISTA: , . We did not set for our methods, since it is rather a theoretical requirement. For our methods as well as for FBF we used the initialization procedure described in Remark 2.1. Also note that σ in our methods and β in FBF, FB, and FISTA play the same role, which is why we chose them equal. Unless otherwise stated, we choose for FBF and FB, as described in Section 4. In many examples below we used randomly generated data. Usually we ran several experiments with the same distribution and, if there was no large discrepancy, chose one sample from these experiments for presentation.
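The fixed-point characterization behind the stopping criterion can be sketched as a generic natural residual. This is an assumed textbook form, not the exact quantities used for each method in the experiments (those formulas are missing from this copy); the operator, the box and the default λ = 1 are illustrative.

```python
import math

def natural_residual(F, prox, x, lam=1.0):
    """|| x - prox_{lam*g}(x - lam*F(x)) ||, which is zero exactly at solutions of (1)."""
    z = prox([xi - lam * fi for xi, fi in zip(x, F(x))], lam)
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def F(x):
    """Monotone test operator whose only zero is the origin (illustrative)."""
    return [x[1], -x[0]]

def prox_box(v, lam):
    """Projection onto [-1, 1]^d, i.e. the prox of the box indicator."""
    return [min(max(vi, -1.0), 1.0) for vi in v]

r_solution = natural_residual(F, prox_box, [0.0, 0.0])
r_corner = natural_residual(F, prox_box, [1.0, 1.0])
```

A method would stop once this residual falls below the chosen tolerance. Evaluating it at an arbitrary point costs one extra prox and one extra value of F, which is the overhead the text above avoids by reusing quantities already computed in the iteration.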

Constrained minimization with nonlinear constraints

Consider a simple general model which allows us to generate monotone variational inequalities where are convex smooth functions, is a convex nonsmooth function. For example, g might encode some simple constraints. Introducing Lagrange multipliers for each constraint, we obtain This problem is equivalent to the following variational inequality: where The notation reads: and . Here is a list of three examples we consider:

Example 1

where with uniformly generated values from , .

Example 2

where with uniformly generated values from . Evidently, this problem has a unique solution .

Example 3

where with uniformly generated values from , with uniformly generated values from and C is either the box or the ball . Clearly, for all these problems F is either not Lipschitz continuous or it is but highly nonlinear, and hence the algorithms with linesearch are more practical. We examine the performance of algorithms for different d and different random data. The initial point is with , . The results are collected in Table 1.

Table 1.

Results for problems (53)–(55).

	d=500			d=1000			d=5000
	Alg.1	Alg.2	FBF	Alg.1	Alg.2	FBF	Alg.1	Alg.2	FBF
Ex.1
# iter	178	188	433	260	217	630	162	187	855
# prox	178	188	896	260	217	1317	162	187	1798
# F	368	354	1329	540	411	1947	344	352	2653
Time	0.2	0.2	0.6	0.3	0.2	1	0.4	0.4	2.6
# iter	184	217	564	187	223	761	238	224	1132
# prox	184	217	1176	187	223	1596	238	224	2389
# F	383	411	1740	394	423	2357	499	426	3521
Time	0.2	0.2	0.8	0.2	0.23	1.2	0.5	0.5	3.4
Ex.2
# iter	413	327	386	310	318	569	425	319	901
# prox	413	327	806	310	318	1197	425	319	1907
# F	864	637	1192	659	618	1766	895	617	2808
Time	0.4	0.3	0.6	0.4	0.4	1.1	1.1	0.8	4.2
# iter	330	303	414	430	341	458	366	323	731
# prox	330	303	866	430	341	960	366	323	1544
# F	695	587	1280	902	666	1418	770	625	2275
Time	0.3	0.3	0.6	0.5	0.4	0.9	1.0	0.8	3.1
Ex.3 (box)
# iter	30,332	28,615	43,379	8845	8689	14,219	27,055	25,354	27,588
# prox	30,332	28,615	92,705	8845	8689	30,395	27,055	25,354	58,960
# F	60,669	56,980	136,084	17,758	17,310	44,614	54,261	51,379	86,548
Time	4.8	4.1	8.8	1.9	1.8	4.2	19.2	17.5	27.2
Ex.3 (ball)
# iter	29	20	158	27	21	212	21	21	544
# prox	29	20	349	27	21	464	21	21	1173
# F	76	51	507	72	53	676	60	53	1717
Time	0.01	0.0	0.05	0.01	0.01	0.09	0.02	0.02	0.57
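The Lagrangian reformulation used to generate the problems of this subsection can be sketched as follows. The sketch assumes the standard construction: for min f(x) + g(x) subject to h_i(x) ≤ 0, the operator acts on the pair (x, y) as F(x, y) = (∇f(x) + Σ y_i ∇h_i(x), −h(x)), with the multipliers y kept nonnegative through the nonsmooth part. The helper names and the tiny test problem are hypothetical.

```python
def make_vi_operator(grad_f, h, grad_h):
    """Return F(x, y) = (grad f(x) + sum_i y_i * grad h_i(x), -h(x))."""
    def F(x, y):
        gx, hx, Jh = grad_f(x), h(x), grad_h(x)
        Fx = [gx[j] + sum(y[i] * Jh[i][j] for i in range(len(y)))
              for j in range(len(x))]
        Fy = [-v for v in hx]
        return Fx, Fy
    return F

def monotonicity_gap(F, x1, y1, x2, y2):
    """<F(u) - F(v), u - v> for u = (x1, y1), v = (x2, y2); nonnegative if F is monotone."""
    Fx1, Fy1 = F(x1, y1)
    Fx2, Fy2 = F(x2, y2)
    return (sum((a - b) * (c - e) for a, b, c, e in zip(Fx1, Fx2, x1, x2))
            + sum((a - b) * (c - e) for a, b, c, e in zip(Fy1, Fy2, y1, y2)))

# Illustrative problem: min 0.5*||x||^2 subject to 1 - x_1 <= 0 in R^2.
F_vi = make_vi_operator(
    grad_f=lambda x: list(x),
    h=lambda x: [1.0 - x[0]],
    grad_h=lambda x: [[-1.0, 0.0]],
)

# At the KKT pair x* = (1, 0), y* = 1 the operator vanishes.
Fx_star, Fy_star = F_vi([1.0, 0.0], [1.0])
gap = monotonicity_gap(F_vi, [0.3, -1.2], [0.7], [2.0, 0.5], [0.1])
```

Only gradients and constraint values are needed to evaluate F, so nonlinear constraints do not require a projection onto the feasible set; the price is that F is typically not globally Lipschitz, which is where a linesearch becomes useful.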

HpHard problem

This problem was considered in [24]. This is an instance of VI with an affine operator , where , every entry of and of the skew-symmetric matrix is generated uniformly from , every diagonal entry of the diagonal matrix is generated uniformly from , and every entry of is generated uniformly from . The feasible set is For the projection onto C we use the algorithm from [20]. As this is a VI with an affine operator, our proposed linesearch will not require additional matrix–vector multiplications. We observed that for this problem the linesearch does not give a significant improvement, but at least we did not require to compute the matrix norm of M. Also for that reason we test two variants of Tseng's algorithm: with and . The initial point is . The results are collected in Table 2. From these results we can see that FBF-1 almost corresponds to the FBF with a fixed stepsize (it needs a few extra evaluations). However, even in this case FBF-1 is substantially more expensive than our methods.

Table 2.

Results for problem (56).

		d=500				d=5000
	Alg.1	Alg.2	FBF-1	FBF-1.5	Alg.1	Alg.2	FBF-1	FBF-1.5
# iter	1310	1174	1030	1073	1896	1595	1363	1389
# prox	1310	1174	1032	2294	1896	1595	1365	2969
# F	1312	1176	2062	3367	1898	1597	2728	4358
Time	0.5	0.5	0.7	1.1	58	50	84	132
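For readers who want to reproduce a problem of the HpHard type, the following sketch builds the matrix of the affine operator as M = N Nᵀ + S + D, with S skew-symmetric and D diagonal. This structure is the usual one for this test problem and is assumed here, since the formula is missing from this copy. The sampling ranges are deliberately left as arguments because the paper's values are not legible; the ranges in the usage lines are arbitrary illustrative numbers.

```python
import random

def hp_hard_matrix(d, n_range, s_range, diag_range, rng):
    """Build M = N N^T + S + D with S skew-symmetric and D diagonal."""
    N = [[rng.uniform(*n_range) for _ in range(d)] for _ in range(d)]
    M = [[sum(N[i][k] * N[j][k] for k in range(d)) for j in range(d)]
         for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            s = rng.uniform(*s_range)
            M[i][j] += s          # S[i][j] = s
            M[j][i] -= s          # S[j][i] = -s
        M[i][i] += rng.uniform(*diag_range)
    return M

def quad_form(M, x):
    """x^T M x; the skew-symmetric part contributes nothing to this value."""
    return sum(x[i] * M[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

rng = random.Random(1)
M = hp_hard_matrix(4, (-1.0, 1.0), (-1.0, 1.0), (0.1, 0.3), rng)  # arbitrary ranges
samples = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(20)]
forms = [quad_form(M, x) for x in samples]
```

With a nonnegative diagonal range the symmetric part N Nᵀ + D is positive semidefinite, so x ↦ Mx + q is monotone for any q, while the skew-symmetric part makes the problem genuinely non-symmetric.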

Sun's problem

Consider another classical test problem. We study a nonlinear VI, proposed by Sun [52] where Here D is a square matrix defined by condition and . We choose the feasible set C as (a) and (b) . The initial point is generated uniformly randomly from . For every and every C above we examine the performance of our methods with FBF. The results are presented in Table 3.

Table 3.

Results for problem (57)

	d=10^3			d=10^4			d=10^5
	Alg.1	Alg.2	FBF	Alg.1	Alg.2	FBF	Alg.1	Alg.2	FBF
C = R^d_+
# iter	42	73	141	43	76	163	47	80	173
# prox	42	73	294	43	76	341	47	80	363
# F	137	143	435	153	149	504	161	157	536
Time	0.01	0.01	0.04	0.04	0.04	0.14	0.8	0.7	2.2
C = Δ^d
# iter	78	78	138	80	83	145	85	88	170
# prox	78	78	290	80	83	305	85	88	359
# F	162	154	428	174	164	450	187	174	529
Time	0.02	0.02	0.07	0.1	0.1	0.3	1.7	1.6	5.3

Geometric programming

We consider a canonical example of geometric programming problem [10] for which we add -norm: where , . Obviously, (58) is a particular case of (6) with Clearly, in this case is not Lipschitz continuous. All entries of , b, c are generated uniformly randomly from , , and respectively. For this problem we study the performance of Alg.2, Alg.3, FB method with the linesearch from [16], FBF and FISTA [4]. However, in Table 4 we collect only those results which are relevant. To illustrate how values change over iterations, we also give two convergence plots for the last example (which relate to the third column in Table 4).

Table 4.

Results for problem (58).

	d=200,m=50			d=500,m=100			d=1000,m=100
	Alg.2	Alg.3	FB	Alg.2	Alg.3	FB	Alg.2	Alg.3	FB
x0=(0,…,0)
# iter	421	367	348	1001	690	830	1421	975	1140
# prox	421	367	732	1001	690	1758	1421	975	2424
# F	832	646	732	1960	1212	1758	2809	1728	2424
Time	0.05	0.03	0.03	0.27	0.16	0.23	0.68	0.4	0.61
x0=(0.5,…,0.5)
# iter	606	494	472	1415	1011	1151	2225	1601	1761
# prox	606	494	997	1415	1011	2444	2225	1601	3751
# F	1092	772	997	2512	1534	2444	3866	2349	3751
Time	0.07	0.04	0.04	0.35	0.2	0.3	0.95	0.68	0.81

As is not Lipschitz continuous, the linesearch is necessary for all first-order methods to be convergent. Moreover, it is very important that the linesearch allows us to increase stepsizes from iteration to iteration. This is the main reason why, for example, FISTA with the standard linesearch [4] or with the one from [16] converges very slowly. Problem (58) is highly nonlinear, so the rate is not very relevant because the constants hidden in the O-notation are very large. We emphasize that the poor performance of the FBF method for this problem remains a mystery to us. It is also interesting to see that Algorithm 3, which was derived specifically for composite minimization problems, performs better than its predecessor Algorithm 2 (Figure 1).

Figure 1.

Convergence plots for problem (58), d=1000, m=100. (a) , (b) .

Conclusion

In this paper, we proposed several algorithms for a general monotone variational inequality and a composite minimization problem. All methods use a simple linesearch procedure that allows us to incorporate local information about the operator. For all methods we established the ergodic rate of convergence. Numerical experiments confirmed their efficiency. Interestingly, the proposed methods become extremely simple when the operator is affine. Requiring only local Lipschitz continuity of the operator makes our methods very general. As the numerical simulations have shown, the ratio (number of evaluations of the operator to the number of iterations) is almost always less than 2. At the same time, this ratio for the extragradient method or the FBF method equals 2 even when they do not use any linesearch. Moreover, the ratio of the number of proximal operator evaluations to the number of iterations for our methods always equals 1. The main drawback of the proposed methods is that we need the bound . This multiplier makes the steps smaller when the Lipschitz constant of the operator does not change too much. It is interesting to study whether this bound can be increased.
