
Strong convergence and bounded perturbation resilience of a modified proximal gradient algorithm.

Yanni Guo, Wei Cui

Abstract

The proximal gradient algorithm is an appealing approach to finding solutions of non-smooth composite optimization problems, but it may have only weak convergence in the infinite-dimensional setting. In this paper, we introduce a modified proximal gradient algorithm with outer perturbations in Hilbert space and prove that the algorithm converges strongly to a solution of the composite optimization problem. We also discuss the bounded perturbation resilience of the basic algorithm of this iterative scheme and illustrate it with an application.


Keywords:  Bounded perturbation resilience; Convex minimization problem; Modified proximal gradient algorithm; Strong convergence; Viscosity approximation

Year:  2018        PMID: 29755243      PMCID: PMC5932141          DOI: 10.1186/s13660-018-1695-x

Source DB:  PubMed          Journal:  J Inequal Appl        ISSN: 1025-5834            Impact factor:   2.491


Introduction

Let H be a real Hilbert space with an inner product ⟨·,·⟩ and an induced norm ‖·‖. Let Γ₀(H) be the class of convex, lower semi-continuous, and proper functions from H to (−∞, +∞]. Consider the following non-smooth composite optimization problem:

min_{x∈H} Φ(x) := f(x) + g(x),  (1)

where f, g ∈ Γ₀(H), f is differentiable, and ∇f is L-Lipschitz continuous on H with L > 0; g may not be differentiable. If, further, Φ is coercive, that is,

lim_{‖x‖→∞} Φ(x) = +∞,  (2)

then Φ has a minimizer over H, that is, S := argmin_{x∈H} Φ(x) ≠ ∅; see [1, page 159, Proposition 11.14]. Problem (1) has a typical scenario in linear inverse problems [2], and it has applications in compressed sensing, machine learning, data recovery and so on (see [3-6] and the references therein).

Proximal gradient methods are among the methods used for solving problem (1); they decouple the contributions of the functions f and g into a gradient descent step determined by f and a proximal step induced by g [7, 8]. For the classical proximal gradient method, the initial value x₀ ∈ H is given, and the iterative algorithm generating the sequence {x_n} is defined as follows:

x_{n+1} = prox_{λg}(x_n − λ∇f(x_n)), n ≥ 0,  (3)

where λ > 0 is the step size and prox_{λg} is a proximal operator (see Sect. 2). If 0 < λ < 2/L and S ≠ ∅, then any sequence generated by algorithm (3) converges weakly to an element of S [1, Corollary 27.9]. Xu [9] put forward the following slightly more general proximal gradient algorithm for problem (1):

x_{n+1} = prox_{λ_n g}(x_n − λ_n ∇f(x_n)), n ≥ 0,  (4)

where the weak convergence of the generated sequence was obtained. Besides, it was noted that no strong convergence is guaranteed if H is infinite-dimensional. In 2017, Guo, Cui and Guo [10] proposed a proximal gradient algorithm with perturbations, which we refer to as algorithm (5); the generated sequence again converges weakly to a solution of (1).

On the other hand, it is well known that the viscosity approximation method proposed by Moudafi [11] generates a sequence {x_n} by

x_{n+1} = α_n h(x_n) + (1 − α_n) T x_n, n ≥ 0,  (6)

which converges strongly to a fixed point x* of the nonexpansive operator T for some contractive operator h. In 2004, Xu [12, Theorem 3.1] further proved that this x* is also the unique solution of the variational inequality

⟨(I − h)x*, x − x*⟩ ≥ 0, ∀x ∈ Fix(T),  (7)

provided that {α_n} satisfies certain conditions. Building on the viscosity algorithm (6) and the proximal gradient algorithm (4), this paper generates a sequence with perturbations that converges strongly to a solution of problem (1). We also apply this algorithm to solve the linear inverse problem.

One objective of this paper in considering perturbations is the superiorization methodology introduced in [13]. The superiorization method may not find an optimal solution of the given objective function; instead, it perturbs the iterates of a rather simple algorithm, known as the basic algorithm, so as to reach a point with a lower objective value (see [14-18] for more details). It is a heuristic and less time-consuming method, which makes it applicable to important practical problems such as medical image recovery [19, 20], computed tomography [21], intensity-modulated radiation therapy [22] and the like. However, the superiorization method requires the basic iterative algorithm to be bounded perturbation resilient. Hence a new problem arises: whether the basic algorithm is bounded perturbation resilient. Very recently, several articles focused on this topic [23-27]. So another task of this paper is to discuss the bounded perturbation resilience of the modified proximal gradient algorithm.
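For illustration, here is a minimal Python sketch of iteration (3) on a lasso-type instance of (1), where f(x) = (1/2)‖Ax − b‖², g = μ‖·‖₁, and prox_{λg} is componentwise soft-thresholding; the problem data, μ, and the step size below are illustrative assumptions, not values from the paper.

    import numpy as np

    def soft_threshold(v, t):
        # prox of t*||.||_1: componentwise shrinkage toward zero
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    b = rng.standard_normal(40)
    mu = 0.1
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f
    lam = 1.0 / L                      # step size in (0, 2/L)

    x = np.zeros(100)
    for n in range(500):
        grad = A.T @ (A @ x - b)       # gradient of f(x) = 0.5*||Ax - b||^2
        x = soft_threshold(x - lam * grad, lam * mu)   # iteration (3)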

Results and discussion

In view of the facts that the sequence generated by (5) converges weakly to a solution of (1), that the viscosity method can convert a weakly convergent sequence into a strongly convergent one, and that the widely applied superiorization method introduced in [13] relies on the bounded perturbation resilience of basic algorithms, we discuss the strong convergence of a modified proximal gradient algorithm with perturbations as well as the bounded perturbation resilience of the corresponding basic algorithm. The structure of this paper is as follows. In Sect. 2, we introduce some definitions and lemmas that will be used to prove the main results in the subsequent sections. In Sect. 3, we present the modified proximal gradient algorithm with perturbations and prove that the generated sequence converges strongly to a solution of problem (1); we conclude that section with several corollaries. In Sect. 4, we introduce the definition of bounded perturbation resilience and establish the corresponding strong convergence result. In Sect. 5, we apply our algorithm to the linear inverse problem and illustrate it with a specific numerical example. Finally, we give a conclusion in Sect. 6.

Preliminaries

Let {x_n} be a sequence in a Hilbert space H and x ∈ H. Let T : H → H be an operator (linear or nonlinear). We list some notation. x_n → x means that {x_n} converges strongly to x; x_n ⇀ x means that {x_n} converges weakly to x. If there exists a subsequence {x_{n_j}} of {x_n} which converges weakly to a point z, we call z a weak cluster point of {x_n}; the set of all weak cluster points of {x_n} is denoted by ω_w(x_n). Fix(T) denotes the set of fixed points of T. The following definitions are needed in proving our main results.

Definition 2.1

Let T, A : H → H be operators.
T is nonexpansive if ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ H.
T is L-Lipschitz continuous with L > 0 if ‖Tx − Ty‖ ≤ L‖x − y‖ for all x, y ∈ H. We call T a contractive mapping if L ∈ [0, 1).
T is α-averaged if T = (1 − α)I + αN, where α ∈ (0, 1) and N : H → H is nonexpansive.
A is ν-inverse strongly monotone (ν-ism) with ν > 0 if ⟨Ax − Ay, x − y⟩ ≥ ν‖Ax − Ay‖² for all x, y ∈ H.
Given g ∈ Γ₀(H) and α > 0, [1, Proposition 12.15] ensures that, for each x ∈ H, the function y ↦ g(y) + (1/(2α))‖y − x‖² has exactly one minimizer over H. So we have

Definition 2.2

(Proximal operator) Let g ∈ Γ₀(H). The proximal operator of g is defined by

prox_g(x) := argmin_{y∈H} { g(y) + (1/2)‖y − x‖² }, x ∈ H.

The proximal operator of g of order α (α > 0) is defined as the proximal operator of αg. Moreover, it satisfies (see [3, Remark 12.24])

prox_{αg}(x) = argmin_{y∈H} { g(y) + (1/(2α))‖y − x‖² }, x ∈ H.

The following lemmas (Lemma 2.3 and Lemma 2.4) describe the properties of the proximal operators.
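As a worked example (an illustrative sketch, not from the paper): for g = μ‖·‖₁ on ℝⁿ, prox_{αg} is componentwise soft-thresholding, and one can check numerically that it minimizes y ↦ g(y) + (1/(2α))‖y − x‖².

    import numpy as np

    def prox_l1(x, alpha, mu=1.0):
        # prox_{alpha * mu * ||.||_1}(x): soft-thresholding at level alpha*mu
        return np.sign(x) * np.maximum(np.abs(x) - alpha * mu, 0.0)

    def objective(y, x, alpha, mu=1.0):
        # the function minimized by the proximal operator of order alpha
        return mu * np.abs(y).sum() + np.sum((y - x) ** 2) / (2 * alpha)

    x = np.array([1.5, -0.2, 0.7]); alpha = 0.5
    p = prox_l1(x, alpha)
    # p should attain a value no larger than any perturbed candidate
    rng = np.random.default_rng(1)
    assert all(objective(p, x, alpha)
               <= objective(p + 0.1 * rng.standard_normal(3), x, alpha)
               for _ in range(1000))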

Lemma 2.3

([9, Lemma 3.1]) Let g ∈ Γ₀(H), λ > 0, μ > 0, and x ∈ H. Then

prox_{λg}(x) = prox_{μg}( (μ/λ)x + (1 − μ/λ) prox_{λg}(x) ).

Lemma 2.4

([8, Lemma 2.4], [1, Remark 4.24]) Let g ∈ Γ₀(H) and λ > 0. Then the proximity operator prox_{λg} is (1/2)-averaged. In particular, it is nonexpansive, that is,

‖prox_{λg}(x) − prox_{λg}(y)‖ ≤ ‖x − y‖ for all x, y ∈ H.

Lemma 2.5

([9, Proposition 3.2]) Let g ∈ Γ₀(H), λ > 0, and z ∈ H. Assume that f is differentiable on H. Then z is a solution to (1) if and only if z solves the fixed point equation

z = prox_{λg}(z − λ∇f(z)).

The following two lemmas play an important role in proving the strong convergence result.

Lemma 2.6

([1, Theorem 4.17]) Let T : H → H be a nonexpansive mapping with Fix(T) ≠ ∅. If {x_n} is a sequence in H converging weakly to x, and if {(I − T)x_n} converges strongly to y, then (I − T)x = y.

Lemma 2.7

([28, Lemma 2.5]) Assume that {a_n} is a sequence of nonnegative real numbers satisfying

a_{n+1} ≤ (1 − γ_n)a_n + γ_n δ_n + ε_n, n ≥ 0,

where {γ_n} ⊂ [0, 1], {δ_n} ⊂ ℝ, and {ε_n} ⊂ [0, ∞) satisfy the conditions: (i) Σ_{n=0}^∞ γ_n = ∞, or equivalently, Π_{n=0}^∞ (1 − γ_n) = 0; (ii) lim sup_{n→∞} δ_n ≤ 0; (iii) ε_n ≥ 0 (n ≥ 0), Σ_{n=0}^∞ ε_n < ∞. Then lim_{n→∞} a_n = 0.
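A toy numerical illustration of Lemma 2.7, with hypothetical sequences γ_n = 1/(n + 2), δ_n = 1/(n + 1), ε_n = 2⁻ⁿ chosen by us to satisfy (i)–(iii):

    a = 5.0
    for n in range(200000):
        gamma = 1.0 / (n + 2)     # sum of gamma_n diverges
        delta = 1.0 / (n + 1)     # limsup delta_n <= 0 (here it tends to 0)
        eps = 0.5 ** n            # sum of eps_n is finite
        a = (1 - gamma) * a + gamma * delta + eps
    print(a)                      # close to 0, as Lemma 2.7 predicts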

Convergence analysis

In this section, let H be a Hilbert space and h : H → H a ρ-contractive operator with ρ ∈ [0, 1). Let f, g ∈ Γ₀(H), where f is differentiable and ∇f is Lipschitz continuous with Lipschitz constant L > 0. Given x₀ ∈ H, we propose the following modified proximal gradient algorithm for solving (1):

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λ_n g}(x_n − λ_n ∇f(x_n)) + e(x_n), n ≥ 0,  (12)

where {α_n} is a sequence in (0, 1) and {λ_n} ⊂ (0, 2/L). e(·) : H → H represents a perturbation operator and satisfies

Σ_{n=0}^∞ ‖e(x_n)‖ < ∞.  (13)

We also introduce the following iterative scheme, in which the perturbation enters through the gradient of f, as a special case of (12):

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λ_n g}(x_n − λ_n (∇f(x_n) + e(x_n))), n ≥ 0.  (14)

We state the main strong convergence theorem.
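A minimal sketch of scheme (12) as stated above, again on the lasso-type instance from the Introduction; the contraction h, the summable perturbation e, and all parameter choices are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100)); b = rng.standard_normal(40)
    mu = 0.1
    L = np.linalg.norm(A, 2) ** 2

    def prox_g(v, t):                  # prox of t*mu*||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

    def h(x):                          # a rho-contraction with rho = 0.5
        return 0.5 * x

    x = np.zeros(100)
    for n in range(300):
        alpha = 1.0 / (n + 2)          # alpha_n -> 0, sum alpha_n = infinity
        lam = 1.0 / L                  # lambda_n in (0, 2/L)
        e = (0.5 ** n) * rng.standard_normal(100)   # summable perturbation, cf. (13)
        grad = A.T @ (A @ x - b)
        x = alpha * h(x) + (1 - alpha) * prox_g(x - lam * grad, lam) + e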

Theorem 3.1

Let S be the solution set of (1), and assume that S ≠ ∅. Given x₀ ∈ H, let {x_n} be generated by (12). Suppose that (13) and the following conditions hold: (i) lim_{n→∞} α_n = 0 and Σ_{n=0}^∞ α_n = ∞; (ii) Σ_{n=0}^∞ |α_{n+1} − α_n| < ∞ and Σ_{n=0}^∞ |λ_{n+1} − λ_n| < ∞; (iii) 0 < lim inf_{n→∞} λ_n ≤ lim sup_{n→∞} λ_n < 2/L. Then {x_n} converges strongly to a point x* ∈ S, where x* is the unique solution of the following variational inequality problem:

⟨(I − h)x*, x − x*⟩ ≥ 0, ∀x ∈ S.

Proof

We point out that prox_{λ_n g}(I − λ_n ∇f) is nonexpansive for each n. Let us follow the proof of [29]. At first, that ∇f is L-Lipschitzian means that ∇f is (1/L)-ism ([1, Theorem 18.15]). Consequently, I − λ_n ∇f is (λ_n L/2)-averaged as 0 < λ_n < 2/L ([1, Proposition 4.33]). Besides, prox_{λ_n g} is (1/2)-averaged by Lemma 2.4, so the composite prox_{λ_n g}(I − λ_n ∇f) is ((2 + λ_n L)/4)-averaged ([29, Proposition 3.2]). Then it is nonexpansive ([1, Remark 4.24]). Set T_n := prox_{λ_n g}(I − λ_n ∇f). For any p ∈ S, we have p = T_n p by applying Lemma 2.5 and the fact that T_n is nonexpansive. So

‖x_{n+1} − p‖ ≤ α_n ‖h(x_n) − p‖ + (1 − α_n)‖x_n − p‖ + ‖e(x_n)‖.

An induction argument shows that {x_n} is bounded, as Σ_{n=0}^∞ ‖e(x_n)‖ < ∞. Consequently, we get the boundedness of {h(x_n)} and {T_n x_n}. We next prove that ‖x_{n+1} − x_n‖ → 0 as n → ∞. In fact, estimating ‖x_{n+1} − x_n‖ yields (17). By applying Lemma 2.3 and Lemma 2.4, we compute a bound (18) for ‖T_n x_n − T_{n−1} x_{n−1}‖, where the constant appearing there is well defined since {x_n}, {h(x_n)} and {T_n x_n} are bounded. Substituting (18) into (17), we obtain (19). By taking suitable a_n, γ_n, δ_n and ε_n in (19), we get ‖x_{n+1} − x_n‖ → 0 according to Lemma 2.7 and (i)–(iii) in Theorem 3.1. Since {x_n} is bounded, there exists a subsequence {x_{n_j}} such that x_{n_j} ⇀ x̄ as j → ∞. In the sequel, we shall verify that x̄ ∈ S. To this end, assume that λ_{n_j} → λ (j → ∞), and set T := prox_{λg}(I − λ∇f). We compute the difference (20) between T_{n_j} and T. By using Lemma 2.3, we get (21). Thus we have T x_{n_j} − T_{n_j} x_{n_j} → 0 in view of the facts that {x_{n_j}} and {∇f(x_{n_j})} are bounded in j and λ_{n_j} → λ. We combine (20), (21) and (23) to obtain x_{n_j} − T x_{n_j} → 0, which implies that x̄ ∈ Fix(T) owing to Lemma 2.6, and hence x̄ ∈ S. Finally, we prove that x_n → x*. We have, by utilizing Lemma 2.4, an estimate (25) for ‖x_{n+1} − x*‖², where the quantities involved are well defined in view of {x_n} and {h(x_n)} being bounded. In order to apply Lemma 2.7 to (25), we need to prove

lim sup_{n→∞} ⟨h(x*) − x*, x_n − x*⟩ ≤ 0.

Select a suitable subsequence {x_{n_k}} from {x_n} such that

lim sup_{n→∞} ⟨h(x*) − x*, x_n − x*⟩ = lim_{k→∞} ⟨h(x*) − x*, x_{n_k} − x*⟩.

Since {x_{n_k}} is bounded, it has a weakly convergent subsequence. Without loss of generality, we denote the weakly convergent subsequence again by {x_{n_k}} and assume that x_{n_k} ⇀ x̃. Then x̃ ∈ S, and

lim sup_{n→∞} ⟨h(x*) − x*, x_n − x*⟩ = ⟨h(x*) − x*, x̃ − x*⟩ ≤ 0.

Take appropriate a_n, γ_n and ε_n in (25). Then all conditions in Lemma 2.7 are satisfied. Thus ‖x_n − x*‖ → 0, which implies that x_n → x* as n → ∞. □ With {x_n} generated by (14), we obtain the following.

Theorem 3.2

With the conditions in Theorem 3.1 holding, given x₀ ∈ H, the sequence {x_n} generated by (14) converges strongly to a point x* ∈ S. We complete the proof by translating (14) into the form of (12). Indeed, we can rewrite (14) as

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λ_n g}(x_n − λ_n ∇f(x_n)) + ẽ(x_n),

where

ẽ(x_n) := (1 − α_n)[prox_{λ_n g}(x_n − λ_n(∇f(x_n) + e(x_n))) − prox_{λ_n g}(x_n − λ_n ∇f(x_n))].

Obviously, ‖ẽ(x_n)‖ ≤ (1 − α_n)λ_n‖e(x_n)‖ ≤ (2/L)‖e(x_n)‖ owing to Lemma 2.4 and λ_n < 2/L. Thus we have Σ_{n=0}^∞ ‖ẽ(x_n)‖ < ∞. Since (12) was shown to converge, this immediately implies that (14) converges strongly to a solution of (1). □ If the perturbations vanish, that is, e(·) ≡ 0, the exact form of the two modified proximal gradient algorithms follows.

Corollary 3.3

With the conditions in Theorem 3.1 holding, and given x₀ ∈ H arbitrarily, any sequence {x_n} defined by

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λ_n g}(x_n − λ_n ∇f(x_n)), n ≥ 0,  (31)

converges strongly to a point x* ∈ S. We also recover the following counterpart of [12, Theorem 3.2], with T replaced by prox_{λg}(I − λ∇f) and Fix(T) by S.

Corollary 3.4

With the conditions in Theorem 3.1 holding, given x₀ ∈ H, any sequence {x_n} defined by

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λg}(x_n − λ∇f(x_n)), n ≥ 0,

with a fixed step size λ ∈ (0, 2/L), converges strongly to a point x* ∈ S. In addition, if h is some constant function, we have

Corollary 3.5

Under the conditions given in Theorem 3.1, for any x₀ ∈ H, the sequence {x_n} defined by

x_{n+1} = α_n u + (1 − α_n) prox_{λ_n g}(x_n − λ_n ∇f(x_n)), n ≥ 0,

converges strongly to a point x* ∈ S, where u is a point in H.

Bounded perturbation resilience

The superiorization method can solve a broad class of nonlinear constrained optimization problems. It works by using the bounded perturbation resilience (BPR) of an original algorithm in order to steer the iterates of the algorithm toward lower values of the objective function. In this paper, we investigate the BPR of the modified proximal gradient algorithm; the superiorized version of this scheme will be presented in a sequel paper. Given a problem Φ, assume that we have a basic algorithm operator A : H → H, where H is a real Hilbert space. Then we have the following definition, which was originally given in a finite-dimensional Euclidean space [13].

Definition 4.1

([14], Bounded perturbation resilience) An algorithmic operator A is said to be bounded perturbation resilient if the following condition holds: if the sequence {x_n}, generated by x_{n+1} = A(x_n) with x₀ ∈ H, converges to a solution of Φ, then any sequence {y_n} generated by y_{n+1} = A(y_n + β_n v_n) with any y₀ ∈ H also converges to a solution of Φ, provided that the vector sequence {v_n} is bounded and the scalars {β_n} are such that β_n ≥ 0 for all n ≥ 0 and Σ_{n=0}^∞ β_n < ∞. If we treat the modified proximal gradient algorithm (31) as the basic algorithm A, its bounded perturbation is a sequence {x_n} generated by

x_{n+1} = α_n h(x_n + β_n v_n) + (1 − α_n) prox_{λ_n g}((x_n + β_n v_n) − λ_n ∇f(x_n + β_n v_n)), n ≥ 0.  (34)

We have the following result.
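The following sketch implements the perturbed sequence (34) as stated above: the basic step (31) is applied to x_n + β_n v_n with bounded directions v_n and summable β_n = cⁿ; all data and parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100)); b = rng.standard_normal(40)
    mu, c = 0.1, 0.5
    L = np.linalg.norm(A, 2) ** 2

    def prox_g(v, t):                  # prox of t*mu*||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

    def basic_step(x, n):              # one step of the basic algorithm (31)
        alpha, lam = 1.0 / (n + 2), 1.0 / L
        grad = A.T @ (A @ x - b)
        return alpha * (0.5 * x) + (1 - alpha) * prox_g(x - lam * grad, lam)

    x = np.zeros(100)
    for n in range(300):
        v = rng.uniform(-1.0, 1.0, 100)   # bounded perturbation directions
        beta = c ** n                     # beta_n >= 0, sum beta_n < infinity
        x = basic_step(x + beta * v, n)   # iteration (34)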

Theorem 4.2

Let H be a real Hilbert space, and let h : H → H be a ρ-contractive operator with ρ ∈ [0, 1). Assume that the solution set S of (1) is nonempty. Assume, in addition, that f is differentiable and ∇f is L-Lipschitz continuous on H, that {β_n} and {v_n} satisfy the conditions in Definition 4.1, and that {α_n} and {λ_n} satisfy the conditions in Theorem 3.1, respectively. Then any sequence {x_n} generated by (34) converges strongly to a point in S. Thus, the modified proximal gradient algorithm is bounded perturbation resilient. We rewrite (34) as

x_{n+1} = α_n h(x_n) + (1 − α_n) prox_{λ_n g}(x_n − λ_n ∇f(x_n)) + e(x_n),

where

e(x_n) := α_n[h(x_n + β_n v_n) − h(x_n)] + (1 − α_n)[prox_{λ_n g}((x_n + β_n v_n) − λ_n ∇f(x_n + β_n v_n)) − prox_{λ_n g}(x_n − λ_n ∇f(x_n))].

In view of Lemma 2.4 and the assumptions as regards h and f, we have

‖e(x_n)‖ ≤ α_n ρ β_n ‖v_n‖ + (1 − α_n)(1 + λ_n L) β_n ‖v_n‖,

which implies Σ_{n=0}^∞ ‖e(x_n)‖ < ∞ owing to the conditions imposed on {α_n}, {λ_n}, {β_n}, and {v_n}. We then deduce the conclusion from Theorem 3.1. □

An application and the numerical experiment

In this section, we apply Theorem 4.2 to the linear inverse problem and report a numerical experiment.

Linear inverse problem

Let H be a real Hilbert space and A : H → H a bounded linear operator. Given b ∈ H, we consider the following linear inverse problem:

Ax + w = b,  (37)

which is used to estimate an unknown signal x from the noisy measurement b in a finite-dimensional space; w is an unknown noise vector. This problem can be solved via the regularized least-squares problem:

min_{x∈H} (1/2)‖Ax − b‖² + μ‖x‖₁,  (38)

where μ > 0 is a regularization parameter. By applying algorithm (34) to (38), we obtain the following.

Theorem 5.1

Let h : H → H be a ρ-contractive operator with ρ ∈ [0, 1). Assume μ > 0 and that the solution set S of (38) is nonempty. Assume, in addition, that {v_n} is a bounded sequence in H and {β_n} ⊂ [0, ∞) is such that Σ_{n=0}^∞ β_n < ∞. Given x₀ ∈ H, we define {x_n} by the iterative scheme

x_{n+1} = α_n h(x_n + β_n v_n) + (1 − α_n) prox_{λ_n μ‖·‖₁}((x_n + β_n v_n) − λ_n A*(A(x_n + β_n v_n) − b)), n ≥ 0

(A* is the adjoint of A), where {α_n} and {λ_n} satisfy conditions (i)–(iii) of Theorem 3.1 with L = ‖A‖². Then {x_n} converges strongly to a point x* ∈ S, where x* is the unique solution of the following variational inequality problem:

⟨(I − h)x*, x − x*⟩ ≥ 0, ∀x ∈ S.

Proof

Take f(x) = (1/2)‖Ax − b‖² and g(x) = μ‖x‖₁. It is easy to see that f, g ∈ Γ₀(H), ∇f(x) = A*(Ax − b), and

‖∇f(x) − ∇f(y)‖ = ‖A*A(x − y)‖ ≤ ‖A‖² ‖x − y‖.

So ∇f is Lipschitz continuous with L = ‖A‖². In addition, g is subdifferentiable, and its subdifferential is

∂g(x) = {μs : s_i = sign(x_i) if x_i ≠ 0, s_i ∈ [−1, 1] if x_i = 0}.

So we can apply Theorem 4.2 to obtain this result. □
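A quick numerical sanity check of the two facts used in this proof, ∇f(x) = A*(Ax − b) and L = ‖A‖² (the data below are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 50))
    b = rng.standard_normal(30)

    def f(x):
        return 0.5 * np.linalg.norm(A @ x - b) ** 2

    def grad_f(x):
        return A.T @ (A @ x - b)       # the claimed gradient A*(Ax - b)

    x = rng.standard_normal(50)
    d = rng.standard_normal(50)
    t = 1e-6
    fd = (f(x + t * d) - f(x - t * d)) / (2 * t)   # central finite difference
    assert abs(fd - grad_f(x) @ d) <= 1e-4 * (1 + abs(fd))

    # Lipschitz constant: ||grad f(x) - grad f(y)|| <= ||A||^2 ||x - y||
    L = np.linalg.norm(A, 2) ** 2
    y = rng.standard_normal(50)
    ratio = np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)
    assert ratio <= L + 1e-10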

Numerical experiment

In this subsection, we apply the iterative scheme (34) to solve (38) with g = μ‖·‖₁ to demonstrate the effectiveness of this algorithm. For finite-dimensional spaces, the least-squares problem (38) takes the following form:

min_{x∈ℝⁿ} (1/2)‖Ax − b‖² + μ‖x‖₁,  (42)

where A is an m×n matrix and x ∈ ℝⁿ. Here ∇f(x) = Aᵀ(Ax − b) with L = ‖A‖², where Aᵀ represents the transpose of A, and the proximal operator of λμ‖·‖₁ is componentwise soft-thresholding. The bounded sequence {v_n} and the summable nonnegative real sequence {β_n} can be chosen as follows: {v_n} bounded and β_n = cⁿ for some c ∈ (0, 1). Throughout the experiments, A is a matrix whose entries are sampled independently from a Gaussian distribution of zero mean and unit variance, and the vector b is generated from a uniform distribution on an interval. The regularization parameter μ is fixed, and we choose {α_n} and {λ_n} satisfying the conditions of Theorem 3.1. Given ε > 0 and x₀, we adopt the stopping criterion Err < ε, where Err is defined by (44) and ε is a given small positive constant. To see the behavior of algorithm (34), we plotted the evolution of 'Err' defined by (44) with respect to the number of iterations in Fig. 1 for the initial point x₀. The plots in Fig. 1 show that the proposed algorithm reliably solves (42). Besides, the iteration numbers ("Iter"), the computing time in seconds ("Time"), the error values ("Err"), and the residuals ‖Ax_n − b‖ at the moment the stopping criterion is reached are reported in Table 1. We can see from Table 1 that the summable positive real sequence {β_n} (through the constant c) and the contractive constant ρ can have a large impact on the numerical performance. We also find that the sequence generated by algorithm (34) can get very close to the solution of problem (42).
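A reproduction sketch of this experiment; the dimensions, μ, ε, the choices of h, α_n, λ_n, and the form of 'Err' below are our assumptions, since the exact values are not recoverable from this record.

    import numpy as np, time

    rng = np.random.default_rng(0)
    m, n_dim = 100, 200
    A = rng.standard_normal((m, n_dim))      # i.i.d. N(0, 1) entries
    b = rng.uniform(0.0, 1.0, m)             # uniformly distributed data
    mu, c, rho, eps = 0.5, 0.1, 0.5, 1e-3
    L = np.linalg.norm(A, 2) ** 2

    def prox_g(v, t):                        # soft-thresholding, prox of t*mu*||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

    x = np.zeros(n_dim); t0 = time.time()
    for k in range(100000):
        alpha, lam = 1.0 / (k + 2), 1.0 / L
        v = rng.uniform(-1.0, 1.0, n_dim)
        y = x + (c ** k) * v                 # bounded perturbation, beta_k = c**k
        x_new = (alpha * rho * y             # h(y) = rho * y, a contraction
                 + (1 - alpha) * prox_g(y - lam * (A.T @ (A @ y - b)), lam))
        err = np.linalg.norm(x_new - x)      # assumed form of the criterion 'Err'
        x = x_new
        if err < eps:
            break
    print(f"Iter: {k+1}, Time: {time.time()-t0:.3f}s, "
          f"||Ax-b||: {np.linalg.norm(A @ x - b):.3f}")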
Figure 1. The numbers of iterations under the different error values.

Table 1

Numerical results with different (c, ρ) and initial values x₀

(c, ρ)      | x₀ = (0,0,…,0)ᵀ              | x₀ = (1,1,…,1)ᵀ
            | Iter.   Time    ‖Ax_n − b‖   | Iter.   Time    ‖Ax_n − b‖
(0.1, 0.5)  | 152     0.140   0.066        | 199     0.140   0.074
(0.8, 0.5)  | 440     0.608   0.220        | 260     0.172   0.070
(0.9, 0.2)  | 680     0.546   0.307        | 773     0.484   0.353

Conclusion

In this paper, we introduced a modified proximal gradient algorithm with perturbations in Hilbert space by making a convex combination of a proximal gradient operator and a contractive operator h, with a perturbation term in each iterative step (see (12)). We proved that the generated iterative sequence converges strongly to a solution of a non-smooth composite optimization problem. We also showed that the perturbation in computing the gradient of f in algorithm (14) can actually be seen as a special case of (12). Finally, as one of the main objectives of this paper, we verified that the exact modified algorithm is bounded perturbation resilient, a fact which, to some extent, extends the horizon of the recently developed superiorization methodology.
References tracked in this database (3 in total):

1. Schrapp, M.J., Herman, G.T.: Data fusion in X-ray computed tomography using a superiorization approach. Rev Sci Instrum, 2014.

2. Censor, Y., Davidi, R., Herman, G.T.: Perturbation resilience and superiorization of iterative algorithms. Inverse Probl, 2010.

3. Nikazad, T., Davidi, R., Herman, G.T.: Accelerated perturbation-resilient block-iterative projection methods with application to image reconstruction. Inverse Probl, 2012.

