
The symmetric ADMM with indefinite proximal regularization and its application.

Hongchun Sun, Maoying Tian, Min Sun.

Abstract

Because it updates the Lagrangian multiplier twice at each iteration, the symmetric alternating direction method of multipliers (S-ADMM) often performs better than other ADMM-type methods. In practical applications, proximal terms with positive definite proximal matrices are often added to its subproblems, and it is well known that a large proximal parameter in the proximal term often results in the 'too-small-step-size' phenomenon. In this paper, we generalize the proximal matrix from positive definite to indefinite, and propose a new S-ADMM with indefinite proximal regularization (termed IPS-ADMM) for two-block separable convex programming with linear constraints. Without any additional assumptions, we prove the global convergence of the IPS-ADMM and analyze its worst-case $\mathcal{O}(1/t)$ convergence rate in an ergodic sense by the iteration complexity. Finally, some numerical results are included to illustrate the efficiency of the IPS-ADMM.


Keywords:  global convergence; indefinite proximal regularization; symmetric alternating direction method of multipliers

Year:  2017        PMID: 28794608      PMCID: PMC5522537          DOI: 10.1186/s13660-017-1447-3

Source DB:  PubMed          Journal:  J Inequal Appl        ISSN: 1025-5834            Impact factor:   2.491


Introduction

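Let $\mathcal{R}^n$ stand for an $n$-dimensional Euclidean space, and let $\mathcal{X}_i\subseteq\mathcal{R}^{n_i}$ be nonempty, closed and convex sets, where $i=1,2$. For two continuous closed convex functions $\theta_i:\mathcal{R}^{n_i}\to\mathcal{R}$ ($i=1,2$), the canonical two-block separable convex programming problem with linear equality constraints is
$$\min\{\theta_1(x_1)+\theta_2(x_2)\mid A_1x_1+A_2x_2=b,\ x_1\in\mathcal{X}_1,\ x_2\in\mathcal{X}_2\},\tag{1}$$
where $A_i\in\mathcal{R}^{l\times n_i}$ ($i=1,2$), $b\in\mathcal{R}^l$. Throughout, the solution set of (1) is assumed to be nonempty. Convex programming (1) has promising applicability in modeling many concrete problems arising in a wide range of disciplines, such as statistical learning, inverse problems and image processing; see, e.g. [1-3] for more details. Convex programming (1) has been studied extensively in the literature, and during the last decades researchers have developed many numerical methods to solve it. These are mainly based on the well-known Douglas-Rachford splitting method [4, 5] and the Peaceman-Rachford splitting method [5, 6], both of which originate in the partial differential equation (PDE) literature. Concretely, applying the Douglas-Rachford splitting method to the dual of (1) [7, 8], we get the well-known alternating direction method of multipliers (ADMM) [9, 10], whose iterative scheme reads
$$\begin{cases}x_1^{k+1}\in\arg\min_{x_1\in\mathcal{X}_1}\{\theta_1(x_1)-(\lambda^k)^\top(A_1x_1+A_2x_2^k-b)+\frac{\beta}{2}\|A_1x_1+A_2x_2^k-b\|^2\},\\ x_2^{k+1}\in\arg\min_{x_2\in\mathcal{X}_2}\{\theta_2(x_2)-(\lambda^k)^\top(A_1x_1^{k+1}+A_2x_2-b)+\frac{\beta}{2}\|A_1x_1^{k+1}+A_2x_2-b\|^2\},\\ \lambda^{k+1}=\lambda^k-s\beta(A_1x_1^{k+1}+A_2x_2^{k+1}-b),\end{cases}\tag{2}$$
where $\lambda\in\mathcal{R}^l$ is the Lagrangian multiplier; $\beta>0$ is a penalty parameter, and $s\in(0,\frac{1+\sqrt{5}}{2})$ is a relaxation factor. Analogously, applying the Peaceman-Rachford splitting method to the dual of (1), we get the symmetric ADMM (S-ADMM) [11-13], which generates its sequence via the scheme
$$\begin{cases}x_1^{k+1}\in\arg\min_{x_1\in\mathcal{X}_1}\{\theta_1(x_1)-(\lambda^k)^\top(A_1x_1+A_2x_2^k-b)+\frac{\beta}{2}\|A_1x_1+A_2x_2^k-b\|^2\},\\ \lambda^{k+\frac{1}{2}}=\lambda^k-r\beta(A_1x_1^{k+1}+A_2x_2^k-b),\\ x_2^{k+1}\in\arg\min_{x_2\in\mathcal{X}_2}\{\theta_2(x_2)-(\lambda^{k+\frac{1}{2}})^\top(A_1x_1^{k+1}+A_2x_2-b)+\frac{\beta}{2}\|A_1x_1^{k+1}+A_2x_2-b\|^2\},\\ \lambda^{k+1}=\lambda^{k+\frac{1}{2}}-s\beta(A_1x_1^{k+1}+A_2x_2^{k+1}-b),\end{cases}\tag{3}$$
where the feasible region of r, s is
$$\mathcal{D}=\Big\{(r,s)\ \Big|\ s\in\Big(0,\frac{1+\sqrt{5}}{2}\Big),\ r\in(-1,1),\ r+s>0,\ |r|<1+s-s^2\Big\}.\tag{4}$$
Both methods make full use of the separable structure of (1), and minimize over the primal variables $x_1$ and $x_2$ individually in the Gauss-Seidel fashion. As elaborated in [13], the S-ADMM updates the Lagrangian multiplier twice at each iteration and thus the variables $x_1$, $x_2$ are treated in a symmetric manner. The S-ADMM includes some well-known ADMM-based schemes as special cases. For example, it reduces to the original ADMM (2) when $r=0$, and reduces to the generalized ADMM [14] when $r\in(-1,1)$, $s=1$. Therefore, the S-ADMM provides a unified framework to study the ADMM-type methods.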
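To make the "two multiplier updates per iteration" structure concrete, here is a minimal sketch of the S-ADMM on a scalar toy problem, min ½(x1 − a)² + ½(x2 − c)² subject to x1 + x2 = b. The toy problem, the function names, the sign convention of the augmented Lagrangian (multiplier term entering with a minus sign), and the step-size region checked by `in_region` (the form commonly quoted from [13]) are assumptions made for illustration, not code from the paper.

```python
import math


def in_region(r, s):
    """Assumed (r, s) step-size region for the S-ADMM, as commonly quoted from [13]."""
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    return (0.0 < s < golden and -1.0 < r < 1.0
            and r + s > 0.0 and abs(r) < 1.0 + s - s * s)


def s_admm_toy(a, c, b, beta=1.0, r=0.3, s=1.2, iters=200):
    """S-ADMM for min 0.5*(x1-a)^2 + 0.5*(x2-c)^2  s.t.  x1 + x2 = b (scalars)."""
    if not in_region(r, s):
        raise ValueError("(r, s) lies outside the assumed convergence region")
    x2, lam = 0.0, 0.0
    for _ in range(iters):
        # x1-subproblem: closed-form minimizer of the augmented Lagrangian in x1
        x1 = (a + lam - beta * (x2 - b)) / (1.0 + beta)
        # first multiplier update, step size r * beta
        lam_half = lam - r * beta * (x1 + x2 - b)
        # x2-subproblem: same closed form, using the intermediate multiplier
        x2 = (c + lam_half - beta * (x1 - b)) / (1.0 + beta)
        # second multiplier update, step size s * beta
        lam = lam_half - s * beta * (x1 + x2 - b)
    return x1, x2, lam


x1, x2, lam = s_admm_toy(a=1.0, c=2.0, b=5.0)
print(x1, x2, lam)
```

For this toy problem the optimality conditions x1 − a = λ, x2 − c = λ, x1 + x2 = b give x1 = 2, x2 = 3, λ = 1 when a = 1, c = 2, b = 5, which the iterates approach.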
The convergence results of the S-ADMM with any , including global convergence and the worst-case convergence rate in an ergodic sense, have been established in [13]. To the best of the authors’ knowledge, the worst-case convergence rate of the S-ADMM in a non-ergodic sense is still missing. In practical applications, the two essential subproblems related to and dominate the computation of the S-ADMM; they are often either linear or easily solvable, but can nevertheless be challenging. To address this issue, proximal terms are often added to these subproblems, which can linearize the quadratic term () of these subproblems, and as a result we have the following proximal S-ADMM (termed PS-ADMM) [15-17]: where is a positive definite matrix. When we set with , the quadratic term in the subproblem related to of the PS-ADMM is offset and thus the quadratic term is linearized. Then, if , the PS-ADMM only needs to compute the proximal mapping of the involved convex function at each iteration, which is often simple enough to have a closed-form solution in many practical applications, such as in the compressive sensing problems [3], (here is a square matrix) in the robust principal component analysis models [18]. is defined by the sum of all singular values of . The price of this improvement in solvability is that the proximal parameter τ is not easy to determine for some problems in practice. A large τ increases the weight of the quadratic term in the objective function of the -subproblem and inevitably results in the ‘too-small-step-size’ phenomenon. Then, the advance of is tiny at the kth iteration, which often slows down the convergence of the corresponding method. Therefore, it is meaningful to expand the feasible set of τ. Obviously, if we further reduce τ to , the proximal matrix G will become indefinite, and it is thus natural to ask whether the corresponding method with such G is still globally convergent.
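As a concrete instance of a proximal mapping with a closed-form solution, the sketch below implements soft-thresholding, which is the proximal mapping of the scaled ℓ1-norm: it returns the minimizer of κ‖x‖₁ + ½‖x − v‖² componentwise. The function name `soft_threshold` and the use of plain Python lists are illustrative choices, not taken from the paper.

```python
def soft_threshold(v, kappa):
    """Proximal mapping of kappa * ||.||_1 evaluated at the vector v."""
    out = []
    for x in v:
        if x > kappa:
            out.append(x - kappa)   # shrink positive entries toward zero
        elif x < -kappa:
            out.append(x + kappa)   # shrink negative entries toward zero
        else:
            out.append(0.0)         # entries within [-kappa, kappa] become zero
    return out


print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

Once the quadratic coupling term has been linearized, a subproblem whose objective is an ℓ1-norm reduces to one call of this kind per iteration, which is why the linearized form is attractive despite the extra proximal parameter.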
Quite recently the authors in [19-21] partially answered the question. More specifically, for the ADMM (2) with , He et al. [19] have proved that the feasible set of τ can be expanded to , and for the ADMM (2) with , Sun et al. [20] have proved that the feasible set of τ can be expanded to . Then, for the S-ADMM with , , Gao et al. [21] have proved that the feasible set of τ can be expanded to . Other relevant studies can be found in [22, 23]. In this paper, we continue to study along this direction, and present a new feasible set of τ, which generalizes those in [19-21] to any . Furthermore, we show that for any , the global convergence of the S-ADMM with some indefinite proximal regularization can be guaranteed. The rest of the paper is organized as follows. In Section 2, we summarize some preliminaries which are useful for further discussion. Then, in Section 3, we list the iterative scheme of the IPS-ADMM and prove its convergence results, including the global convergence and the convergence rate. Some preliminary numerical results are reported in Section 4. Finally, some conclusions are drawn in Section 5.

Preliminaries

In this section, we first list some notation used in this paper, and then characterize problem (1) by a mixed variational inequality problem. Some matrices and variables to simplify the notation of our later analysis are also defined. For any two vectors , or denote their inner product. For any two matrices , , the Kronecker product of A and B is defined as . We let and be the -norm and -norm for vector variables, respectively. denotes the n-dimensional identity matrix. If the matrix is symmetric, we use the symbol to denote even if G is indefinite; (resp., ) denotes that the matrix G is positive definite (resp., semi-definite). Let us split the feasible set of the parameters into the following five subsets: Obviously, the set is a simplicial partition of the set . Throughout, the proximal matrix G is defined by where we set with , , and is defined by

Remark 2.1

Note that if ; see Lemmas 3.4-3.8 in Section 3. Therefore, the feasible set of τ is expanded from to , which provides more choices for researchers or practitioners. Furthermore, we define an auxiliary matrix as follows: which is positive definite by . Invoking the first-order optimality condition for convex programming, we get the following equivalent form of problem (1): Finding a vector such that where Obviously, the problem (10) is a mixed variational inequality problem, which is denoted by . The mapping defined in (11) is not only monotone, but also satisfies the property Furthermore, the solution set of , denoted by , is nonempty under the nonempty assumption for the solution set of problem (1). Now, let us define three matrices in order to make our following analysis more succinct. Set

Lemma 2.1

Suppose the matrix has full column rank and the parameter α in (7) satisfies Then, the matrices M, Q, H defined, respectively, in (6), (7) satisfy

Proof

The proof of (16) is trivial, and we only need to prove (17). By the positive definiteness of P, we only need to prove is positive definite. Here denotes the corresponding sub-matrix formed from the rows and columns with the indices and as in Matlab. Substituting (7) into the right-hand side of (14), we get where the relationship ⪰ comes from and . Since the matrix is full column rank, we only need to prove the positive definiteness of the matrix which can be further written as where ⊗ denotes the matrix Kronecker product. Then, we only need to show the 2-by-2 matrix is positive definite. In fact, by (15), we have Therefore, the matrix H is positive definite. The proof is completed. □ At the end of this section, let us summarize two criteria to measure the worst-case convergence rate of the ADMM-type methods in an ergodic sense. For a given compact set , let , where is the initial iterate. He et al. [24] established the following criterion: where , , and t is the iteration counter. This criterion is used in [19, 21]. Obviously, we can only ensure that any satisfies (18). Therefore, the criterion (18) is not reasonable. In [25], Lin et al. proposed the following criterion: where . Proposition 1 in [25] indicates that the vector is an optimal solution to (1) if and only if the left-hand side of (19) equals zero. Compared with (18), the criterion (19) is more reasonable. Therefore, we shall use a criterion similar to (19) to measure the convergence rate of our new method.
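The reduction to a 2-by-2 matrix in the proof of Lemma 2.1 rests on two standard facts, restated here for a generic symmetric matrix $M$ (the symbols $M$, $a$, $b$, $c$ are illustrative and are not the paper's notation). This is a reminder of textbook linear algebra rather than a reconstruction of the paper's own displayed formulas.

```latex
% Fact 1: the eigenvalues of M \otimes I_n are those of M, each repeated n times, hence
M \otimes I_n \succ 0 \quad\Longleftrightarrow\quad M \succ 0 .

% Fact 2 (Sylvester's criterion for a symmetric 2-by-2 matrix):
M = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \succ 0
\quad\Longleftrightarrow\quad a > 0 \ \text{ and } \ ac - b^{2} > 0 .
```

Together these explain why checking one leading entry and one determinant of the small coefficient matrix is enough to conclude positive definiteness of the larger block matrix.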

Algorithm and convergence results

In this section, we first present the symmetric ADMM with indefinite proximal regularization (termed IPS-ADMM), and then prove the convergence results of the sequence generated by the IPS-ADMM.

Algorithm 3.1

The IPS-ADMM for problem (1) Input four parameters , , , the tolerance , and the proximal matrices with and defined by (7). Initialize , and set . Compute the new iterate by the following iterative scheme: If , then stop; otherwise set , and go to Step 1.

Remark 3.1

Since the global convergence of IPS-ADMM with has been established in the literature [16, 26–28], in the following, we restrict . To prove the convergence results of the IPS-ADMM, we first define a block matrix and an auxiliary variable.

Lemma 3.1

For the sequence generated by the IPS-ADMM, we have and where . The proof of this lemma is similar to that of Lemma 3.1 and Theorem 4.2 in [13], which is omitted. □

Remark 3.2

By the definition of in (11), (12), for any such that , the left-hand side of (22) can be written as Then, substituting the above equality into the left-hand side of (22), we get where the vector satisfies . Comparing all the terms appearing in (19) and (24), we find that the left-hand side of (24) does not yet contain the term , and, due to the indefiniteness of R, the term on the right-hand side of (24) may be negative. Now let us deal with the term ; in doing so, the term will also appear. After some manipulation, we obtain the explicit expression of the matrix R, which is as follows:

Lemma 3.2

Let be the sequence generated by the IPS-ADMM. Then we have The proof of this lemma is similar to that of Lemma 5.1 in [13], which is omitted. □ The following lemma deals with the crossing term on the right-hand side of (26), whose proof is mainly motivated by those of Lemma 3.2 in [26] and Lemma 5.2 in [13].

Lemma 3.3

Let be the sequence generated by the IPS-ADMM. Then we have

Proof

The first-order optimality condition of -subproblem in (5) indicates that, for any , Setting in (28), we get Similarly, taking in (28) for , we have Then, adding the above two inequalities, we get From the update formula for λ in (5), we have Substituting the above equality into the left-hand side of (29), we get By the definitions of G and (see (7) and (9)), we have where the last inequality comes from the Cauchy-Schwarz inequality. Substituting the above inequality into the right-hand side of (30) and rearranging terms, we get the assertion (28) immediately. □

Then, substituting (28) into the right-hand side of (26), we get the following main theorem, which provides a lower bound of as , and the lower bound is composed of the term , some terms in the form , and some others.

Theorem 3.1

Let be the sequence generated by the IPS-ADMM. Then we have Now, let us rewrite all the terms on the right-hand side of (31) by some quadratic terms, and mainly deal with the term and the crossing term . According to the simplicial partition () of the set in (6), the following analysis is divided into five cases, which are discussed in the following five subsections.

Case 1:

Lemma 3.4

For any fixed , if , then there are constants such that Furthermore, , for any , where is defined in (15).

Proof

We prove the assertion (32) from the definition of the matrix R directly. Define an auxiliary matrix as By the expression of R in (25), we have Now, let us verify the positive definiteness of the matrix which can be written as Obviously, when , the above matrix is positive definite. Therefore, the matrix S is positive definite, and then the matrices R and are both positive definite by the full column rank of and the positive definiteness of P. By a manipulation, we get where By the positive definiteness of the matrix S, we get the assertion (32). By the definitions of and , we have Therefore, , for any . By some manipulations, we have Therefore, , for any . □

Case 2:

Lemma 3.5

For any , if , then we have where () are four positive constants defined by Furthermore, , for any .

Proof

Setting in (31), we have which proves (33). From , it is obvious that , and from , , we have . By the definition of , we get Therefore, , for any . By some manipulations, we have Therefore, , for any . □

Remark 3.3

For any , Gao et al. [21] have proved that is a lower bound of α. The curves of and with are drawn in Figure 1, from which we have if , and if . Therefore, compared with that in [21], the feasible set of τ in this paper is expanded if , and is shrunk if . However, Gao et al. only established the worst-case convergence rate of the IPS-ADMM using the criterion (18), and we shall prove the worst-case convergence rate of the IPS-ADMM using the more reasonable criterion (19); see the following Theorem 3.4.
Figure 1

The curves of and in .


Case 3:

Lemma 3.6

For any , if , then we have where () are five positive constants defined by Furthermore, , for any .

Proof

By the Cauchy-Schwarz inequality, we have Then, substituting the above inequality into the right-hand side of (31), we get which proves (34). From , it is obvious that Furthermore, by some manipulations, , we have Therefore, , for any . □

Case 4:

Lemma 3.7

For any , if , then we have where () are five positive constants defined by Furthermore, , for any .

Proof

By the Cauchy-Schwarz inequality, we have Then, substituting the above inequality into the right-hand side of (31), we get which proves (35). From the definition of , , , it is easy to verify that . From the definition of , we get Furthermore, by some manipulations, , we have Therefore, , for any . □

Case 5:

Lemma 3.8

For any , if , then we have where () are five positive constants defined by Furthermore, , for any .

Proof

By the Cauchy-Schwarz inequality, we have Then, substituting the above inequality into the right-hand side of (31), we get which proves (36). From the definition of , , , it is easy to verify that . From the definition of , for any , we get By the definition of , for any , we have where the first inequality follows from , and the second inequality comes from , , . By some manipulations, we obtain Therefore, , for any . □

In the remainder of this section, we shall establish the convergence results of the sequence generated by the IPS-ADMM. First, based on (24) and Lemmas 3.4-3.8, we can get the following theorem.

Theorem 3.2

Let be the sequence generated by the IPS-ADMM. Then, for any , , where is defined in (8), we have where satisfies , , with if , , . With the above theorems in hand, now we are ready to prove the global convergence of the IPS-ADMM.

Theorem 3.3

Let be the sequence generated by the IPS-ADMM. Then, if , both have full column rank, the sequence is bounded and converges to a point .

Proof

Choosing an arbitrary and setting , , in (37), we get Then, from , and (23), we have which together with , , implies that This, the full column rank of , and the positive definiteness of P indicate that Furthermore, it follows from (38) that the sequences and are both bounded. Therefore, has at least one cluster point, say , and suppose that the subsequence converges to . Then, taking the limits on both sides of (21) along the subsequence and using (39), we have Therefore, . Hence, replacing by in (38), we get From (39), we see that, for any given , there exists , such that Since for , there exists , such that Then the above three inequalities lead, for any , to Therefore the whole sequence converges to . The proof is completed. □

Now we prove the worst-case convergence rate of the IPS-ADMM in an ergodic sense.

Theorem 3.4

Let be the sequence generated by the IPS-ADMM, and let where t is a positive integer. Then, where , and D is a constant defined by

Proof

Setting , in (37), and summing the resulting inequality over , we have Therefore, dividing (42) by t and using the convexity of lead to where the constant D is defined by (41). Comparing (43) with (19), we only need to deal with the term on the left-hand side of (43). In fact, from the convexity of , we get Then, substituting the above inequality into (43), we get the desired result (40). This completes the proof. □
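An ergodic rate is a statement about averaged iterates rather than the iterates themselves. The sketch below computes the running average of a sequence of vectors without storing all of them. It assumes the plain arithmetic mean of the first t iterates; the exact index range and the choice of which iterate is averaged in the paper's definition are not visible in this text, so treat `ergodic_averages` as a hypothetical helper.

```python
def ergodic_averages(iterates):
    """Return the running means of a sequence of equal-length vectors."""
    mean = None
    history = []
    for k, x in enumerate(iterates, start=1):
        if mean is None:
            mean = list(x)
        else:
            # incremental mean: new_mean = old_mean + (x - old_mean) / k
            mean = [m + (xi - m) / k for m, xi in zip(mean, x)]
        history.append(list(mean))
    return history


print(ergodic_averages([[1.0], [2.0], [3.0], [4.0]])[-1])
```

The incremental form keeps memory constant in t, which is convenient when the averaged point is only needed for monitoring the convergence criterion.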

Numerical results

We have established the convergence results of the IPS-ADMM in theory. In this section, by comparing the IPS-ADMM with the PS-ADMM [15], we highlight its promising numerical behavior in solving an image restoration problem: the total-variation denoising problem. All the codes were written in Matlab R2010a and all the numerical experiments were conducted on a THINKPAD notebook with a Pentium(R) Dual-Core CPU@2.20 GHz and 4 GB RAM.
Below, we consider the total-variation (TV) denoising problem [29]: where is a discrete gradient operator with , being the finite-difference operators in the horizontal and vertical directions, respectively; is the regularization parameter. Here, we set . Introducing an auxiliary variable , we can reformulate (44) as Obviously, (45) is a special case of (1), and therefore the IPS-ADMM is applicable. Now, let us elaborate on how to derive the closed-form solutions for the subproblems arising in the IPS-ADMM. Set , . For given , the first subproblem is which has a closed-form solution: For given , , , the third subproblem is which has a closed-form solution:
For the IPS-ADMM, we set , , , . For the PS-ADMM, we set . The initialization is chosen as , , . The stopping criterion is the same as that in [2]: where , and with and . We use the following Matlab scripts to generate some synthetic data for (45) [21]:
We list some numerical results in Table 1. The results in Table 1 illustrate that the IPS-ADMM often performs much better than the PS-ADMM, even though the only difference between them lies in the proximal parameter. This verifies the numerical advantage of a smaller proximal parameter.
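For readers unfamiliar with the discrete gradient operator used in TV models, the following sketch applies horizontal and vertical finite differences to a small image stored as nested lists and sums the absolute differences. Forward differences, zero differences at the last column and row, and the anisotropic (ℓ1) form of the TV term are assumptions made for illustration; the paper's exact operator and boundary handling are not shown in this text.

```python
def forward_differences(img):
    """Horizontal and vertical forward differences of a 2-D image (list of rows)."""
    rows, cols = len(img), len(img[0])
    # horizontal differences; the last column has no right neighbour, so use 0
    dh = [[img[i][j + 1] - img[i][j] if j + 1 < cols else 0.0
           for j in range(cols)] for i in range(rows)]
    # vertical differences; the last row has no lower neighbour, so use 0
    dv = [[img[i + 1][j] - img[i][j] if i + 1 < rows else 0.0
           for j in range(cols)] for i in range(rows)]
    return dh, dv


def anisotropic_tv(img):
    """Sum of absolute horizontal and vertical differences (assumed l1 form of TV)."""
    dh, dv = forward_differences(img)
    return (sum(abs(v) for row in dh for v in row)
            + sum(abs(v) for row in dv for v in row))


image = [[1.0, 2.0], [4.0, 8.0]]
print(forward_differences(image))
print(anisotropic_tv(image))
```

Introducing an auxiliary variable equal to the stacked differences is what turns the TV model into the two-block form (1), with the soft-thresholding-type subproblem acting on that auxiliary variable.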
Table 1

Comparison between the number of iterations (time in seconds) taken by PS-ADMM and IPS-ADMM for TV denoising problem

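Entries are the number of iterations, with time in seconds in parentheses.

| n | PS-ADMM, (r, s) = (−0.3, 1.2) | IPS-ADMM, (r, s) = (−0.3, 1.2) | Ratio (IPS/PS) | PS-ADMM, (r, s) = (0.3, 1.2) | IPS-ADMM, (r, s) = (0.3, 1.2) | Ratio (IPS/PS) |
|---|---|---|---|---|---|---|
| 100 | 176 (0.04) | 94 (0.03) | 0.53 (0.60) | 149 (0.06) | 97 (0.02) | 0.65 (0.41) |
| 200 | 213 (0.05) | 107 (0.03) | 0.50 (0.49) | 180 (0.04) | 117 (0.03) | 0.65 (0.67) |
| 300 | 189 (0.06) | 104 (0.03) | 0.55 (0.45) | 160 (0.04) | 105 (0.03) | 0.66 (0.63) |
| 400 | 47 (0.02) | 24 (0.01) | 0.51 (0.43) | 40 (0.01) | 27 (0.01) | 0.68 (0.88) |
| 500 | 99 (0.03) | 54 (0.02) | 0.55 (0.56) | 84 (0.03) | 56 (0.02) | 0.67 (0.68) |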
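The iteration ratios in Table 1 can be recomputed from the iteration counts as a quick consistency check. The sketch below uses only the counts printed in the table; the variable and function names are illustrative, and the time ratios are not rechecked because the printed times are rounded to two decimals.

```python
# (n, PS-ADMM iterations, IPS-ADMM iterations, reported ratio)
TABLE = {
    (-0.3, 1.2): [(100, 176, 94, 0.53), (200, 213, 107, 0.50),
                  (300, 189, 104, 0.55), (400, 47, 24, 0.51),
                  (500, 99, 54, 0.55)],
    (0.3, 1.2): [(100, 149, 97, 0.65), (200, 180, 117, 0.65),
                 (300, 160, 105, 0.66), (400, 40, 27, 0.68),
                 (500, 84, 56, 0.67)],
}


def max_ratio_gap(table):
    """Largest gap between the recomputed IPS/PS iteration ratio and the reported one."""
    return max(abs(ips / ps - reported)
               for rows in table.values()
               for _, ps, ips, reported in rows)


for (r, s), rows in TABLE.items():
    for n, ps, ips, reported in rows:
        print(f"(r, s)=({r}, {s}), n={n}: {ips}/{ps} = {ips / ps:.3f} (reported {reported})")
print("largest gap:", max_ratio_gap(TABLE))
```

Every recomputed ratio agrees with the reported value up to rounding to two decimals, so the ratio columns read as fractions of the PS-ADMM iteration count rather than as percentages.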

Conclusions

In this paper, a symmetric ADMM with indefinite proximal regularization for two-block linearly constrained convex programming is proposed. Under mild conditions, we have established the global convergence and the worst-case convergence rate in an ergodic sense of the new method. Some numerical results are given, which illustrate that the new method often performs better than its counterpart with positive definite proximal regularization. Note that this paper only discusses the symmetric ADMM with indefinite proximal regularization for the two-block separable convex problems. In the future, we shall study the ADMM-type method with indefinite proximal regularization for the multi-block case.