
Optimality condition and iterative thresholding algorithm for $\ell_p$-regularization problems.

Hongwei Jiao¹, Yongqiang Chen², Jingben Yin¹.

Abstract

This paper investigates $\ell_p$-regularization problems, which have broad applications in compressive sensing, variable selection and sparse least squares fitting for high-dimensional data. We derive exact lower bounds for the absolute value of the nonzero entries in each global optimum solution of the model, which clearly demonstrate the relation between the sparsity of the optimum solution and the choice of the regularization parameter and norm. We also establish a necessary condition for global optimum solutions of $\ell_p$-regularization problems, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In addition, by selecting the parameters carefully, a global minimizer with a desired sparsity can be obtained. Finally, an iterative thresholding algorithm is designed for solving $\ell_p$-regularization problems, and any accumulation point of the sequence generated by the algorithm is a fixed point of the vector thresholding operator.


Keywords: Fixed point; Global optimum solution; Iterative thresholding algorithm; Optimality condition; $\ell_p$-regularization problems

Year: 2016 PMID: 27833833 PMCID: PMC5080281 DOI: 10.1186/s40064-016-3516-3

Source DB: PubMed Journal: Springerplus ISSN: 2193-1801

Background

In this paper, we investigate the following $\ell_p$-regularization problems

$$\min_{x \in \mathbb{R}^n} f(x) := \|Ax - b\|_2^2 + \lambda \|x\|_p^p, \tag{1}$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $\lambda > 0$ and $p \in (0, 1)$. Problem (1) has broad applications in compressive sensing, variable selection and sparse least squares fitting for high-dimensional data (see Chartrand and Staneva 2008; Fan and Li 2001; Foucart and Lai 2009; Frank and Friedman 1993; Ge et al. 2011; Huang et al. 2008; Knight and Wu 2000; Lai and Wang 2011; Natarajan 1995). The objective function of problem (1) consists of a data fitting term $\|Ax - b\|_2^2$ and a regularization term $\lambda \|x\|_p^p$. Chen et al. (2014) point out that the $\ell_2$-$\ell_p$ minimization problem (1) is strongly NP-hard. Compared with the $\ell_1$ norm, using the $\ell_p$ quasi-norm in the regularization term yields sparser solutions, which has been extensively discussed in Candès et al. (2008), Chartrand (2007a, b), Chartrand and Yin (2008), Chen et al. (2010), Tian and Huang (2013), Tian and Jiao (2015), Xu et al. (2010, 2012), Shehu et al. (2013, 2015), Bredies et al. (2015), Fan et al. (2016). Chen et al. (2010) derive lower bounds for the absolute value of the nonzero entries in each local optimum solution of the model. Xu et al. (2012) presented an analytical expression in thresholding form for the resolvent of the gradient of $\|x\|_{1/2}^{1/2}$, developed an alternative feature theorem on optimum solutions of the $\ell_{1/2}$ regularization problem, and proposed an iterative half thresholding algorithm for solving the problem quickly. However, there is no result on the characteristics of the global optimum solutions of problem (1). In this article, inspired by Xu et al. (2012), we focus on deriving the characteristics of the global optimum solutions of problem (1).

The remaining sections of the paper are organized as follows. In “Technical preliminaries” section, we present some important technical results. “Lower bound and optimality conditions” section first develops the proximal operator associated with a non-convex $\ell_p$ quasi-norm, which can be viewed as an extension of the well-known proximal operator associated with convex functions. Next, an exact lower bound for the absolute value of the nonzero entries in every global optimum solution of (1) is derived, which clearly demonstrates the relation between the sparsity of the optimum solution and the choice of the regularization parameter and norm. We also establish a necessary condition for global optimum solutions of the $\ell_p$-regularization problems, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In “Choosing the parameter λ for sparsity” section, we propose a sufficient condition on the selection of λ to meet the sparsity requirement of global minimizers of the $\ell_p$-regularization problems. “Iterative thresholding algorithm and its convergence” section proposes an iterative thresholding algorithm for the $\ell_p$-regularization problems and shows that any accumulation point of the sequence produced by the algorithm is a fixed point of the vector thresholding operator. Numerical results are reported in “Numerical experiments” section, and some conclusions are drawn in “Conclusion” section.
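To make the two terms of the objective concrete, the following Python sketch evaluates a least squares data fitting term plus an $\ell_p$ penalty. It assumes that problem (1) has the usual $\ell_2$-$\ell_p$ form $\|Ax-b\|_2^2+\lambda\|x\|_p^p$ with $0<p<1$; the function name `lp_objective` and the toy data are illustrative and not taken from the paper.

```python
def lp_objective(A, b, x, lam, p):
    """Assumed form of (1): ||Ax - b||_2^2 + lam * sum_i |x_i|^p with 0 < p < 1."""
    # Residual Ax - b, computed row by row (A is a list of rows).
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    fit = sum(r * r for r in residual)        # data fitting term
    penalty = sum(abs(x_j) ** p for x_j in x)  # l_p quasi-norm raised to the power p
    return fit + lam * penalty


# Toy data (illustrative): both candidates fit b exactly, but one is sparser.
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 1.0]
sparse_x = [0.0, 0.0, 1.0]
dense_x = [2.0, 1.0, 0.0]
sparse_value = lp_objective(A, b, sparse_x, lam=0.1, p=0.5)
dense_value = lp_objective(A, b, dense_x, lam=0.1, p=0.5)
```

In this toy case both vectors have zero residual, so the penalty alone decides between them and the one-nonzero vector has the smaller objective value.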

Technical preliminaries

By utilizing the separability of the objective function and the operator splitting technique, the $\ell_p$-regularization problems (1) can be converted into n corresponding single-variable minimization problems defined on $\mathbb{R}$. Therefore, we first investigate the corresponding single-variable minimization problem where and are arbitrary real numbers, is a variable and is a parameter. Moreover, we only need to consider the following two sub-problems. Chen et al. (2014) investigated the subproblem (3) and presented some results, which can be used to derive our conclusions. Let
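The thresholding behaviour of the single-variable problem can be seen numerically. The sketch below assumes the scalar model $g(t)=(t-r)^2+\lambda|t|^p$, which is what separability gives for an objective of the form $\|x-z\|_2^2+\lambda\|x\|_p^p$; the paper's exact scaling of the scalar problem may differ. The names `scalar_objective` and `brute_force_minimizer` are hypothetical, and the grid search is only a check, not an efficient method.

```python
def scalar_objective(t, r, lam, p):
    """Assumed scalar model: (t - r)^2 + lam * |t|^p."""
    return (t - r) ** 2 + lam * abs(t) ** p


def brute_force_minimizer(r, lam, p, steps=20001):
    """Grid search for the global minimizer; it lies between 0 and r."""
    magnitude = abs(r)
    best_t = 0.0
    best_val = scalar_objective(0.0, magnitude, lam, p)
    for k in range(1, steps):
        t = magnitude * k / (steps - 1)
        val = scalar_objective(t, magnitude, lam, p)
        if val < best_val:  # strict, so ties are resolved in favour of 0
            best_t, best_val = t, val
    return best_t if r >= 0 else -best_t


small_input = brute_force_minimizer(0.5, lam=1.0, p=0.5)
large_input = brute_force_minimizer(2.0, lam=1.0, p=0.5)
```

With these assumed values the small input is mapped exactly to zero while the large input is only shrunk, which is the jump behaviour that the lemmas of this section describe.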

Lemma 1

(Lemma 2.2, Chen et al. 2014) For any , denote . For any known , let be the positive root of the equation , where and are given in (5) and (6). Then there is a unique implicit function defined on , which satisfies , and for . Furthermore, for the function , the following conclusions hold: is a continuous function defined on . is a differentiable function over and . is a strictly increasing function over . Moreover, if , then is the sole local minimizer of over .

Lemma 2

(Prop. 2.4, Chen et al. 2014) Let be the global optimum solution of problem (3). Then we have where , is defined in Lemma 1.

Proposition 1

Let be the global optimum solution of problem (2). Then we have where , is defined in Lemma 1 and .

Proof

If , then . Let be a global optimum solution of problem (3); then from Lemma 2, we have If , then . Let ; we have and , and we proceed as in the first case. If is a global optimum solution of the problem over , then from Lemma 2, we have Therefore, if is a global optimum solution of the problem , then we have Combining (8) and (9), we obtain (7). Therefore, the proof is complete.

Proposition 2

Assume that is a global optimum solution of problem (2). When given in Proposition 1, let be simultaneously zero or nonzero. Then the following conclusions hold: (i) The function is an odd function over . (ii) The function is continuous over ; furthermore, . (iii) The function is differentiable over . (iv) The function is strictly increasing over . This proposition follows from Proposition 1 and Lemma 1. When $p = 1/2$, Xu et al. (2012) showed that of (7) has the following analytic expression.

Corollary 1

(Theorem 1, Lemmas 1 and 2, Xu et al. 2012) When $p = 1/2$, the global optimum solution of problem (2) satisfies the following: where , and . A brief proof is presented here for completeness. When , we have . When , , by Proposition 2, is the root of the equation which follows from the first-order optimality condition of (2). By Theorem 1 of Xu et al. (2012), we have . The proof is complete.
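For $p = 1/2$ the thresholding function has a well-known closed form. The sketch below implements the half thresholding function of Xu et al. (2012) for the scalar model $(t-r)^2+\lambda|t|^{1/2}$; whether the constants match this paper's scaling of problem (2) is an assumption, and the function name `half_threshold` is illustrative.

```python
import math


def half_threshold(r, lam):
    """Half thresholding function of Xu et al. (2012).

    Returns the global minimizer of (t - r)^2 + lam * |t|^(1/2) (assumed scaling).
    Inputs with |r| at or below the cutoff are mapped to zero.
    """
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(r) <= cutoff:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(r) / 3.0) ** (-1.5))
    return (2.0 / 3.0) * r * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


t_pos = half_threshold(2.0, 1.0)
t_neg = half_threshold(-2.0, 1.0)
t_zero = half_threshold(0.5, 1.0)
# First-order condition for t > 0: 2 (t - r) + lam / (2 sqrt(t)) = 0.
stationarity_residual = 2.0 * (t_pos - 2.0) + 1.0 / (2.0 * math.sqrt(t_pos))
```

The residual of the first-order condition is zero up to rounding at the returned nonzero value, and the function is odd, in line with Proposition 2.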

Lower bound and optimality conditions

In this section, by using the separability of the objective function and the operator splitting technique, we propose the proximal operator associated with the $\ell_p$ quasi-norm. Next, we present the properties of the global optimum solutions of the $\ell_p$-regularization problems (1). For convenience, we first define the following thresholding function and thresholding operators.

Definition 1

( thresholding function) Assume that . For any , the function defined in (7) is called a thresholding function.

Definition 2

(Vector thresholding operator) Assume that . For any , the vector thresholding operator is defined as

One of the main results of this section is a proximal operator associated with the non-convex $\ell_p$ quasi-norm, which can also be viewed as an extension of the well-known proximal operator associated with convex functions.

Theorem 1

For a given vector and constants , assume that is the global optimum solution of the following problem Then can be expressed as Furthermore, we can get the exact number of global optimum solutions of the problem. From Let , then Therefore, solving problem (11) is equivalent to solving the following n problems, for each , By Proposition 1, for each , we obtain and if , problem (12) has two solutions; otherwise, it has a unique solution. Hence we know the exact number of global optimum solutions of (11). The proof is thus complete. For any and , let For simplicity, let
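The separability argument in Theorem 1 can be sketched for general $p \in (0,1)$ as a componentwise operator. The code assumes that problem (11) has the form $\min_x \|x-z\|_2^2+\lambda\|x\|_p^p$, so that each component solves $(t-z_i)^2+\lambda|t|^p$. Under that assumption a short calculation gives the smallest nonzero minimizer $(\lambda(1-p))^{1/(2-p)}$ and the matching cutoff used below; these constants and the names `lp_jump`, `lp_cutoff`, `lp_threshold` and `vector_threshold` are illustrative, and the paper's own constants may be scaled differently.

```python
def lp_jump(lam, p):
    """Smallest nonzero magnitude of a global minimizer: (lam * (1 - p))^(1 / (2 - p))."""
    return (lam * (1.0 - p)) ** (1.0 / (2.0 - p))


def lp_cutoff(lam, p):
    """Input magnitude at which 0 and lp_jump(lam, p) give the same objective value."""
    t = lp_jump(lam, p)
    return t + 0.5 * lam * p * t ** (p - 1.0)


def lp_threshold(r, lam, p, iters=200):
    """Global minimizer of (t - r)^2 + lam * |t|^p for 0 < p < 1 (assumed model)."""
    a = abs(r)
    if a <= lp_cutoff(lam, p):
        return 0.0  # at an exact tie the zero solution is selected
    lo, hi = lp_jump(lam, p), a  # derivative is negative at lo and positive at hi
    for _ in range(iters):       # bisection on the increasing derivative
        mid = 0.5 * (lo + hi)
        if 2.0 * (mid - a) + lam * p * mid ** (p - 1.0) < 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t if r > 0 else -t


def vector_threshold(z, lam, p):
    """Apply the scalar thresholding function to every component of z."""
    return [lp_threshold(z_i, lam, p) for z_i in z]


x = vector_threshold([2.0, 0.5, -2.0], lam=1.0, p=0.5)
```

For $p = 1/2$ the cutoff computed this way agrees with the constant $\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$ from the half thresholding function, which is a useful consistency check on the assumed scaling.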

Theorem 2

Assume that is the global minimizer of for any fixed and . Then we have Without loss of generality, can be rewritten as Therefore, solving for any fixed and Y is equivalent to solving By Theorem 1, the proof is complete.

Lemma 3

If is a global minimizer of problem (1) for any fixed and for any fixed which satisfies , then is also a global minimizer of , that is, For any , Since , we have Hence, the proof is complete.

Theorem 3

For any given , if is a global optimum solution of problem (1), then satisfies In particular, we have where and . Furthermore, we have: if , then . Since is a global minimizer of for given , by Theorem 2 and Lemma 3, we directly obtain (16) and (17). By Proposition 2, it follows that By Proposition 2, combined with the strict monotonicity of on and , it follows that as , as and as . Therefore, the proof is complete.

Remark 1

Theorem 3 establishes a necessary condition for global optimum solutions of the $\ell_p$-regularization problems, which is a thresholding expression associated with the global optimum solutions. In particular, the global optimum solutions of problem (1) are fixed points of a vector-valued thresholding operator. The converse, however, does not hold in general, i.e., a point satisfying (16) is not necessarily a global optimum solution of the $\ell_p$-regularization problems (1). This is related to the nature of the matrix A; for instance, when and , a fixed point of (16) is the global optimum solution of the $\ell_p$-regularization problems (1) (i.e., Theorem 1).

Remark 2

Theorem 3 also provides the exact lower bound for the absolute value of the nonzero entries in every global optimum solution of the model, which can be used to identify zero entries precisely in any global optimum solution. These lower bounds clearly demonstrate the relationship between the sparsity of the global optimum solution and the choice of the regularization parameter and norm. Therefore, our theorem can be used to select the desired model parameters and norms.

Choosing the parameter λ for sparsity

In many applications, such as sparse solution reconstruction and variable selection, one needs to find least squares estimators with no more than k nonzero entries. Chen et al. (2014) present a sufficient condition on λ under which global minimizers of the $\ell_p$-regularization problems have the desired sparsity; their condition is based on the lower bound theory for local optimum solutions. In this paper, we also present a sufficient condition on λ under which global minimizers of the $\ell_p$-regularization problems have the desired sparsity, but ours is based on the lower bound theory for global optimum solutions.

Theorem 4

Set The following conclusions hold. If , then any global minimizer of the $\ell_p$-regularization problems (1) satisfies for . If , then is the unique global minimizer of the $\ell_p$-regularization problems (1). Assume that is a global minimizer of the $\ell_p$-regularization problems (1). Let , where and is the cardinality of the set T. Then, according to the first-order necessary condition, must satisfy which shows . Hence, we have By Theorem 3, it follows that Therefore, we have In the following, we discuss different cases: This contradicts the fact that is a global minimizer of (1). Therefore, must be the unique global minimizer of (1). Assume that ; we prove the claim by contradiction. If , then by (3.11) and the definition of in (3.8), we have This contradicts the fact that is a global minimizer of (1). Therefore, we have . Assume that ; we prove the claim by contradiction. If , then there exists satisfying and

Iterative thresholding algorithm and its convergence

By the thresholding representation formula (16), an iterative thresholding formula for problem (1) can be presented as follows: initialized , where When , the adjustment here is that we only select . First, some important lemmas are given.

Lemma 4

Let and be the sequence produced by the algorithm (22). Then the sequences and are non-increasing. For , we have Hence, The first equality follows from the definition of . The second inequality holds because is the minimizer of . This lemma demonstrates that the objective function does not increase from iteration to iteration; in other words, applying the proposed algorithm never makes the current iterate worse. The algorithm (22) does not have a unique fixed point, so it is important to analyze the fixed points in detail.

Lemma 5

Let and . The point is a fixed point of the algorithm (22) if and only if A fixed point of the algorithm (22) is any satisfying , i.e., . If , the equality holds if and only if , i.e., . Similarly, if and only if . The following lemma demonstrates that the sequence produced by the algorithm (22) is asymptotically regular, i.e., .

Lemma 6

If , and is the sequence produced by the algorithm (22), then satisfies . We prove the convergence of , which implies the lemma. First of all, we prove that is monotonically increasing. Monotonicity follows from Then, we show the boundedness of . For , we have and Therefore, The second inequality follows from the proof of Lemma 4 and the last inequality follows from . In the following, we present a very important property of the algorithm, i.e., any accumulation point of the sequence is a fixed point of the algorithm (22). Therefore, we have the following theorem.

Theorem 5

If and , then we have the following conclusion: any accumulation point of the sequence produced by the algorithm (22) is a fixed point of (22). In Lemma 6, we take . If and , then we have , which by Lemma 6 is impossible for for some K. Therefore, for large K, the set of zero and non-zero coefficients will not change and . Assume that is a convergent subsequence and is its limit point, i.e., By the limit (24) and Lemma 6, we have which implies that the sequence also converges to . Note that , i.e., . Let and . For for some K, if , then by (23) and (7) we have therefore, . Similarly, if , then by (23) and (7) we have where . By Proposition 2, the function is continuous over and . Therefore, . By Lemma 5, is a fixed point of (22).
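The iteration and the monotonicity stated in Lemma 4 can be illustrated for $p = 1/2$ with a small Python sketch. It assumes that (22) has the standard form $x^{k+1}=H_{\lambda\mu,1/2}\big(x^k+\mu A^T(b-Ax^k)\big)$ with step size $0<\mu<\|A\|_2^{-2}$ and that the objective is $\|Ax-b\|_2^2+\lambda\|x\|_{1/2}^{1/2}$. The step size rule based on the Frobenius norm, the toy data and all function names are illustrative choices rather than the paper's settings.

```python
import math


def half_threshold(r, lam):
    """Half thresholding function of Xu et al. (2012), assumed scaling."""
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(r) <= cutoff:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(r) / 3.0) ** (-1.5))
    return (2.0 / 3.0) * r * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


def mat_vec(A, x):
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def mat_t_vec(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def objective(A, b, x, lam):
    residual = [v - b_i for v, b_i in zip(mat_vec(A, x), b)]
    return sum(r * r for r in residual) + lam * sum(math.sqrt(abs(v)) for v in x)


def iterative_half_thresholding(A, b, lam, iters=500):
    # The squared Frobenius norm bounds the squared spectral norm from above,
    # so this step size satisfies 0 < mu < 1 / ||A||_2^2.
    mu = 0.9 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    history = [objective(A, b, x, lam)]
    for _ in range(iters):
        residual = [b_i - v for v, b_i in zip(mat_vec(A, x), b)]
        z = [x_j + mu * g for x_j, g in zip(x, mat_t_vec(A, residual))]
        x = [half_threshold(z_j, lam * mu) for z_j in z]
        history.append(objective(A, b, x, lam))
    return x, history


A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 1.0]
x_hat, history = iterative_half_thresholding(A, b, lam=0.1)
```

The list `history` records the objective value after every iteration, so the non-increasing property can be checked directly on any small instance.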

Numerical experiments

Now we report numerical results comparing the performance of the iterative thresholding algorithm (ITA) for solving (1) (signal reconstruction) with LASSO for finding sparse solutions. The computational test was conducted on a Dell desktop computer with an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00 GHz and 2.0 GB of memory, using Matlab R2010a. Consider a real-valued, finite-length signal $x \in \mathbb{R}^n$. Suppose x is T-sparse, that is, only T of the signal coefficients are nonzero and the others are zero. We use Matlab code to generate the original signal, a matrix A and a vector b. The computational results for this experiment are displayed in Table 1.
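A test problem of this kind can be set up along the following lines. This Python sketch is not the authors' Matlab listing; it assumes a Gaussian measurement matrix, Gaussian nonzero coefficients on a random support and noiseless measurements b = Ax, all of which are common but illustrative choices, and the name `make_sparse_problem` is hypothetical.

```python
import random


def make_sparse_problem(n, m, T, seed=0):
    """Generate a T-sparse signal of length n, an m-by-n matrix A and b = A x."""
    rng = random.Random(seed)
    support = rng.sample(range(n), T)  # positions of the nonzero coefficients
    x_true = [0.0] * n
    for j in support:
        x_true[j] = rng.gauss(0.0, 1.0)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    b = [sum(a * x_j for a, x_j in zip(row, x_true)) for row in A]
    return A, b, x_true


# Small instance for illustration; Table 1 uses much larger n, T and m.
A, b, x_true = make_sparse_problem(n=80, m=15, T=6)
```

A reconstruction error can then be measured by comparing a recovered vector with `x_true`; which error measure Table 1 reports is not specified in the text.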

Table 1

Comparison of ITA and LASSO algorithm

n	T	m	LASSO time	LASSO error	ITA time	ITA error
800	60	150	0.572	4.16e−4	0.375	1.15e−4
800	80	180	0.461	3.58e−4	0.252	1.06e−4
2000	160	300	0.853	5.75e−4	0.516	1.62e−4
2000	200	500	0.853	5.86e−4	0.553	1.73e−4

From Table 1 we find that ITA attains a smaller error than LASSO in a shorter time.

Conclusion

In this paper, an exact lower bound for the absolute value of the nonzero entries in each global optimum solution of problem (1) is established. A necessary condition for global optimum solutions of the $\ell_p$-regularization problems is also derived, i.e., the global optimum solutions are fixed points of a vector thresholding operator. In addition, we have derived a sufficient condition on the selection of λ for the desired sparsity of global minimizers of problem (1) with the given (A, b, p). Finally, an iterative thresholding algorithm is designed for solving the $\ell_p$-regularization problems, and the convergence of the algorithm is proved.

2 in total

1. L1/2 regularization: a thresholding representation theory and a fast solver.

Authors: Zongben Xu; Xiangyu Chang; Fengmin Xu; Hai Zhang
Journal: IEEE Trans Neural Netw Learn Syst Date: 2012-07 Impact factor: 10.451

2. Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks.

Authors: Qinwei Fan; Wei Wu; Jacek M Zurada
Journal: Springerplus Date: 2016-03-08
