
The convergence rate of the proximal alternating direction method of multipliers with indefinite proximal regularization.

Abstract

The proximal alternating direction method of multipliers (P-ADMM) is an efficient first-order method for solving separable convex minimization problems. Recently, He et al. further studied the P-ADMM and relaxed the proximal regularization matrix of its second subproblem to be indefinite. This is especially significant in practical applications since the indefinite proximal matrix can result in a larger step size for the corresponding subproblem and thus can often accelerate the overall convergence of the P-ADMM. In this paper, without assuming that the feasible set of the studied problem is bounded or that the objective function's component [Formula: see text] is strongly convex, we prove the worst-case [Formula: see text] convergence rate in an ergodic sense of the P-ADMM with a general Glowinski relaxation factor [Formula: see text], which supplements the previously known results in this area. Furthermore, some numerical results on compressive sensing are reported to illustrate the effectiveness of the P-ADMM with indefinite proximal regularization.


Keywords: compressive sensing; proximal alternating direction method of multipliers; two-block separable convex minimization problem

Year: 2017 PMID: 28133429 PMCID: PMC5237452 DOI: 10.1186/s13660-017-1295-1

Source DB: PubMed Journal: J Inequal Appl ISSN: 1025-5834 Impact factor: 2.491

Introduction

Let be two lower semicontinuous proper (not necessarily smooth) functions. This work aims to solve the following two-block separable convex minimization problem: where , . If there are convex set constraints , where are some simple convex sets, such as nonnegative cones or positive semi-definite cones, then we can define the indicator function as ( if ; otherwise, ), by which we can incorporate the constraints into the objective function of (1) and get the following equivalent form: Then, we can further introduce some auxiliary variables and functions to rewrite the above problem as problem (1) (please refer to [1] for more details). Therefore, problem (1) is quite general, and in fact problems like (1) arise in diverse applications, such as latent variable graphical model selection [2], sparse inverse covariance selection [3], stable principal component pursuit with nonnegative constraint [4], and robust alignment for linearly correlated images [5].

As a first-order method, the following Algorithm 1, that is, the proximal alternating direction method of multipliers (P-ADMM) [6-8], is quite efficient for solving (1) or related problems, especially in the large-scale case.

Algorithm 1. The P-ADMM for (1)

The parameter γ in the P-ADMM is called the Glowinski relaxation factor in the literature, and can often accelerate the P-ADMM [9]. Due to its high efficiency, the P-ADMM has been intensively studied during the past few decades, and many scholars have presented customized variants of the P-ADMM for some concrete separable minimization problems [10-12]. In this paper, we focus our attention only on the P-ADMM. In fact, the theory developed in this work can easily be extended to its various variants.

Now, let us briefly analyze the structural advantages of the P-ADMM. Obviously, the P-ADMM fully utilizes the separable structure inherent to the original problem (1), which decouples the primal variable and yields two lower-dimensional subproblems.
Then, at each iteration, the computation of the P-ADMM is dominated by solving its two subproblems. Fortunately, the two subproblems in (2) often admit closed-form solutions provided that are some functions (such as , or ) and the matrices are unitary (i.e. are the identity matrices). Even if are not unitary, we can judiciously set with , and then the two subproblems in the P-ADMM also have closed-form solutions in many practical applications.

The global convergence of the P-ADMM with has been proved in [10, 11] for some concrete models of (1), and in [13], Xu and Wu presented an elegant analysis of the global convergence of the P-ADMM with for the general model (1). Quite recently, He et al. [14] further studied the P-ADMM and obtained substantial advances by relaxing the matrix in the proximal regularization term of its second subproblem to be indefinite. This is preferable in practical applications since the indefinite proximal matrix can result in a larger step size for the subproblem and thus may accelerate the overall convergence of the P-ADMM.

Compared with the study of the global convergence of the P-ADMM, research on its convergence rate is rather limited in the literature. In [14, 15], under the assumption that the feasible set of (1) is bounded, He et al. proved the worst-case convergence rate of the P-ADMM with , where t denotes the iteration counter. In [1], Lin et al. presented a parallel version of the P-ADMM with the adaptive penalty β, and proved that the convergence rate of their new method is also . In addition, Goldstein et al. [16] proved a better convergence rate than for the P-ADMM scheme with and under the assumption that are both strongly convex, which is usually violated in practice and thus excludes many practical applications of the P-ADMM. Then, by introducing some free parameters and , Xu [17] developed a new variant of the P-ADMM for (1), which refined the results in [16]. 
In fact, only under the assumption that the function is strongly convex, Xu [17] proved that the new method has convergence rate with constant parameters and enjoys convergence rate with adaptive parameters.

In this paper, we aim to further improve the above results by removing the assumptions of the strong convexity of and the boundedness of the feasible set of (1), and prove that the P-ADMM for the convex minimization problem (1) has a worst-case convergence rate in an ergodic sense, which partially improves the results in [8, 13–15, 17].

The remainder of the paper is organized as follows. Section 2 gives some useful preliminaries. In Section 3, we prove the convergence rate of the P-ADMM in detail. In Section 4, a simple experiment on compressive sensing is conducted to demonstrate the effectiveness of the P-ADMM.
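To illustrate what a closed-form subproblem solution can look like, here is a minimal sketch in Python. It assumes, purely for illustration, that the objective component is a scaled absolute value and that the quadratic term is a scalar multiple of the identity; the specific functions named in the Introduction are not reproduced here. The helper names `soft_threshold` and `prox_objective` are hypothetical. Under these assumptions the subproblem reduces to the well-known soft-thresholding operation, which the snippet cross-checks against a brute-force grid search.

```python
def soft_threshold(v, t):
    """Closed-form minimizer of t*|x| + 0.5*(x - v)**2 over real x, for t >= 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_objective(x, v, t):
    """Scalar proximal objective whose minimizer soft_threshold returns."""
    return t * abs(x) + 0.5 * (x - v) ** 2


# Cross-check the closed form against a brute-force search on a fine grid.
v, t = 1.7, 0.5
grid = [i / 1000.0 for i in range(-3000, 3001)]
best = min(grid, key=lambda x: prox_objective(x, v, t))
print(soft_threshold(v, t), best)
```

For a vector argument the same rule is applied componentwise, which is why such subproblems cost only a single pass over the variable.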

Preliminaries

In this section, we summarize some basic concepts and preliminaries that will be used in the later discussion. First, we list some notation to be used in this paper. denotes the inner product of ; (or ) denotes that the symmetric matrix G is positive definite (or positive semi-definite). If G is symmetric, we set even though G may not be positive definite. The effective domain of a function is defined as . The set of all relative interior points of a given nonempty convex set is denoted by . A function is convex iff Then, if is convex, we have the following first-order necessary condition: where denotes the subdifferential of at the point y. The following equality is used frequently in the paper: From now on, we denote Throughout this paper, we make the following assumptions.

Assumption 2.1

The functions are both convex.

Assumption 2.2

There is a point such that .

Then, under Assumption 2.2, it follows from Corollaries 28.2.2 and 28.3.1 in [18] that is an optimal solution to problem (1) iff there exists a Lagrangian multiplier such that is a solution of the following KKT systems: The set of the solutions of (5) is denoted by . By Assumption 2.1, (3), and (5), for any , we have the following useful inequality:

Assumption 2.3

The solution set of the KKT systems (5) is nonempty, and at least one with .

Convergence rate of the P-ADMM

In this section, we aim to prove the convergence rate of the P-ADMM. To accomplish this, we need to impose some restrictions on the matrices involved in the P-ADMM as follows.

Assumption 3.1

(1) The matrix , and is full-column rank if . (2) The matrix is set as with , , and .

Remark 3.1

In [14], the parameter α can take any value in the interval . Obviously, the parameter α in this paper can also attain the lower bound 0.8 if .

Let us introduce some matrices to simplify our notation in the subsequent analysis. More specifically, we set and

Remark 3.2

From Assumption 3.1 and , we see that the matrices defined by (7) are all positive definite. However, the matrix defined in Assumption 3.1 may be indefinite. For example, when , and , then , which is obviously indefinite if the matrix is full-column rank.

Remark 3.3

From the definitions of and , we have Now, we start proving the convergence rate of the P-ADMM under Assumptions 2.1-2.3 and Assumption 3.1. Firstly, we prove three lemmas step by step.

Lemma 3.1

Let be the sequence generated by the P-ADMM. Under Assumptions 2.1-2.3, for any such that , we have

Proof

Note that the optimality condition for the first subproblem (i.e., the subproblem with respect to ) in (2) is where is a subgradient of at , , and the second equality uses the updating formula for λ in (2). Then (10) can be rewritten as where the inequality comes from the convexity of and (3). Similarly, the optimality condition for the second subproblem (i.e., the subproblem with respect to ) in (2) gives i.e., where the inequality follows from the convexity of and (3). Then, adding (11) and (12), we obtain where the last equality comes from the identity (4). Now, let us deal with the term on the right side of (13). Specifically, from the updating formula for λ in (2) again, we can get where the second equality comes from , and the last equality uses the identity (4). Then, substituting (14) into (13) yields (9). This completes the proof. □

The following lemma aims to further refine the cross term on the right side of (9).

Lemma 3.2

Let be the sequence generated by the P-ADMM. Under Assumptions 2.1-2.3, for any such that , we have

Proof

Setting in (12), we get That is, Similarly, taking in (12) for , we have That is, Adding (16) and (17), we obtain We have Substituting the above equality into (18), we obtain Then, substituting (19) into (9) yields (15). The proof is completed. □

Now, let us deal with the term on the right side of (15).

Lemma 3.3

Let be the sequence generated by the P-ADMM. Then, we have where .

Proof

Obviously, by the updating formula for λ in (2), we have Then, applying the Cauchy-Schwarz inequality, we can get Then, substituting the above two inequalities into (21), and by some simple manipulations, we obtain which is the same as the assertion (20), and the lemma is thus proved. □

Substituting (20) into (15), we get the following important inequality: Now, let us deal with all the terms related to the variable on the right side of (22). From the definition of the matrices and (8), we have Then, substituting the above inequality into (22), we can obtain where the inequality comes from and . Based on (23), we can prove the worst-case convergence rate in an ergodic sense of the P-ADMM.

Theorem 3.1

Suppose that Assumptions 2.1-2.3 and Assumption 3.1 hold. Let be the sequence generated by the P-ADMM and let , where t is a positive integer. Then, where with is a point satisfying the KKT conditions in (5), and D is a constant defined by

Proof

Setting in the inequality (22) and summing it over , we obtain which together with the convexity of the function implies Using Lemmas 2.2 and 2.3 of [17] with (ρ is a parameter defined in Lemmas 2.2 and 2.3 of [17]), we can get (24). This completes the proof. □
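An ergodic rate such as the one in Theorem 3.1 is a statement about averaged iterates rather than about the last iterate. The following Python sketch shows one way to maintain such running averages. It assumes the plain arithmetic mean of the first t iterates; the exact index range used in the theorem is not reproduced here, and the function name `ergodic_averages` and the toy data are hypothetical.

```python
def ergodic_averages(iterates):
    """Return the running means (1/t) * (sum of the first t iterates), t = 1..T."""
    averages = []
    running_sum = [0.0] * len(iterates[0])
    for t, x in enumerate(iterates, start=1):
        # Update the componentwise sum, then divide by the iteration count.
        running_sum = [s + xi for s, xi in zip(running_sum, x)]
        averages.append([s / t for s in running_sum])
    return averages


# Toy iterates that oscillate before settling down.
iterates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
avgs = ergodic_averages(iterates)
print(avgs[-1])
```

Keeping a running sum avoids storing the whole history, so the averaged point costs only one extra vector per variable block.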

Remark 3.4

From (24) and (25), we can conclude that larger values of γ are more beneficial for accelerating the convergence of the P-ADMM, since the larger γ is, the smaller D is, and D controls the upper bounds of and .

Numerical experiments

In this section, we apply the P-ADMM to solve the compressive sensing problem, a concrete instance of the general model (1). The codes were written in Matlab R2010a and run on a ThinkPad notebook with a Pentium(R) Dual-Core CPU T4400@2.2 GHz and 2 GB of RAM, using Windows 7.

Let us briefly review compressive sensing. Compressive sensing (CS) aims to recover a sparse signal from an underdetermined linear system , where is a linear mapping and is an observation. An important decoding model of CS is where the parameter is used to trade off both terms for minimization. This is a special case of the general two-block separable convex minimization model (1). In fact, setting , (26) can be recast as which is a special case of (1) with and thus, the P-ADMM can be used to solve CS.

In our experiment, the stopping criterion of the P-ADMM is set as where denotes the function value of (26) at the iterate . The initial points of are all set as , and due to the limited memory of our computer, we only test a medium-scale instance of (26) with , , where k is the number of random nonzero elements contained in the original signal. In addition, we set and , with . In the literature, the relative error (RelErr) is usually used to measure the quality of the recovered signal and is defined by where x̃ and x̄ denote the recovered signal and the original signal, respectively.

First, let us illustrate the sensitivity of the P-ADMM to γ. We choose different values of γ in the interval [0.1, 1.6] (more specifically, we take ). The objective value of (26) and the CPU time in seconds required by the P-ADMM are depicted in Figure 1, and the number of iterations and the RelErr required by the P-ADMM are depicted in Figure 2.
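The relative error used above to judge recovery quality can be computed as in the following Python sketch. It assumes the common definition, the Euclidean norm of the difference between the recovered and original signals divided by the Euclidean norm of the original signal; the displayed formula is not reproduced here, so this reading is an assumption. The function name `relative_error` and the sample vectors are hypothetical.

```python
import math


def relative_error(recovered, original):
    """Euclidean norm of (recovered - original) divided by the norm of original."""
    num = math.sqrt(sum((r - o) ** 2 for r, o in zip(recovered, original)))
    den = math.sqrt(sum(o ** 2 for o in original))
    return num / den


# A sparse toy signal and a slightly shrunken reconstruction of it.
original = [0.0, 3.0, 0.0, -4.0]
recovered = [0.0, 2.7, 0.0, -3.6]
print(relative_error(recovered, original))
```

The function divides by the norm of the original signal, so it is only meaningful when that signal is nonzero.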

Figure 1

Objective value and CPU time with different γ.

Figure 2

Number of iterations and relative error with different γ.

According to the curves in Figures 1-2, we can see that the relaxation factor γ works well for a wide range of values and, based on this experiment, values greater than 0.5 are preferred.

Now let us test the effectiveness of the P-ADMM with the indefinite proximal matrix . Here we set , , and . The numerical results of one experiment are as follows: the objective value is 0.4291; the CPU time is 1.0920 seconds; the number of iterations is 378 and the RelErr is 5.75%. The original signal, the measurement and the signal recovered by the P-ADMM for this test scenario are given in Figure 3. The recovered results are marked by a red circle in the third subplot of Figure 3, which shows clearly that the original signal is almost entirely recovered with high precision. This indicates that the P-ADMM is effective even though the proximal matrix is indefinite.

Figure 3

The original signal, noisy measurement and recovered results.
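The following Python sketch illustrates the kind of iteration tested in this section. It is not the authors' Matlab code, and several ingredients are assumptions made only for illustration: the decoding model is taken to be min mu*||x||_1 + 0.5*||A x - b||^2, split as y = A x - b; the first subproblem uses no proximal term; and the second proximal matrix is taken as alpha*r*I - beta*A^T A with r slightly larger than beta times the largest eigenvalue of A^T A, which may fail to be positive semi-definite when alpha is below 1. All function names, the toy matrix and the parameter values are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Transposed product A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(v, t):
    """Componentwise soft-thresholding with threshold t."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def largest_eig_AtA(A, iters=200):
    """Estimate the largest eigenvalue of A^T A by power iteration."""
    v = [1.0] * len(A[0])
    lam = 0.0
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        lam = sum(wi * wi for wi in w) ** 0.5
        if lam == 0.0:
            return 0.0
        v = [wi / lam for wi in w]
    return lam


def padmm_l1(A, b, mu=0.05, beta=1.0, gamma=1.0, alpha=0.8, iters=5000):
    """Sketch of a proximal ADMM for min mu*||x||_1 + 0.5*||y||^2 s.t. A x - y = b."""
    m, n = len(A), len(A[0])
    tau = alpha * 1.01 * beta * largest_eig_AtA(A)  # assumed G2 = tau*I - beta*A^T A
    x, y, lam = [0.0] * n, [0.0] * m, [0.0] * m
    for _ in range(iters):
        Ax = matvec(A, x)
        # y-subproblem (no proximal term): a strongly convex quadratic, solved exactly.
        y = [(beta * (Ax[i] - b[i]) - lam[i]) / (1.0 + beta) for i in range(m)]
        # x-subproblem: the assumed G2 cancels A^T A, leaving a soft-thresholding step.
        g = rmatvec(A, [beta * (Ax[i] - y[i] - b[i]) - lam[i] for i in range(m)])
        x = soft([x[j] - g[j] / tau for j in range(n)], mu / tau)
        # Multiplier update with relaxation factor gamma.
        Ax = matvec(A, x)
        lam = [lam[i] - gamma * beta * (Ax[i] - y[i] - b[i]) for i in range(m)]
    return x


A = [[1.0, 0.5, 0.0, 0.2], [0.0, 1.0, 0.5, 0.0], [0.3, 0.0, 1.0, 0.5]]
x_true = [0.0, 2.0, 0.0, 0.0]
b = matvec(A, x_true)
x_rec = padmm_l1(A, b)
print([round(xi, 3) for xi in x_rec])
```

The choice of the second proximal matrix is what keeps each iteration cheap: it removes the coupling through A^T A, so the x-update needs only two matrix-vector products and a thresholding pass instead of a linear solve.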

1. RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images.

Authors: Yigang Peng; Arvind Ganesh; John Wright; Wenli Xu; Yi Ma
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2012-11 Impact factor: 6.226
