
A tensor trust-region model for nonlinear system.

Songhua Wang1, Shulun Liu2.   

Abstract

It has turned out that a tensor expansion model approximates the objective function better than the usual second-order Taylor expansion model. This paper studies the tensor model for nonlinear equations and includes the following: (i) a three-dimensional symmetric tensor trust-region subproblem model for nonlinear equations is presented; (ii) the three-dimensional symmetric tensor is replaced by interpolating function and gradient values from the most recent past iterate, which avoids storing the three-dimensional symmetric tensor and reduces the computational workload; (iii) the limited-memory BFGS quasi-Newton update is used instead of the exact Jacobian matrix, which makes the computation for a complex system inexpensive; (iv) global convergence is proved under suitable conditions. Numerical experiments show that the proposed algorithm is competitive with a standard algorithm.


Keywords:  BFGS formula; Convergence; Nonlinear equations; Tensor model; Trust region

Year:  2018        PMID: 30839853      PMCID: PMC6291438          DOI: 10.1186/s13660-018-1935-0

Source DB:  PubMed          Journal:  J Inequal Appl        ISSN: 1025-5834            Impact factor:   2.491


Introduction

This paper focuses on the nonlinear system F(x) = 0, x in R^n, (1.1) where F : R^n -> R^n is continuously differentiable. The nonlinear system (1.1) has wide application in parameter estimation, function approximation, nonlinear fitting, etc. At present there exist many effective algorithms for it, such as the traditional Gauss–Newton method [1, 9–11, 14, 16], the BFGS method [8, 23, 27, 29, 39, 43], the Levenberg–Marquardt method [6, 24, 42], the trust-region method [4, 26, 35, 41], the conjugate gradient algorithm [12, 25, 30, 38, 40], and the limited-memory BFGS method [13, 28]. Here and in what follows, for convenience, suppose that F(x) = 0 has a solution x*. Setting f(x) = (1/2)||F(x)||^2 as a norm function, problem (1.1) is equivalent to the optimization problem min f(x). The main task of trust-region (TR) methods is to solve a trust-region subproblem of the form min ||F_k + J_k d|| subject to ||d|| <= Delta_k to get the trial step d_k, where x_k is the kth iterate, F_k = F(x_k), J_k is the Jacobian of F at x_k, Delta_k is the TR radius, and ||.|| is the usual Euclidean norm of a vector or matrix. The first choice of many scholars is to study the above model and improve it. An adaptive TR model was designed by Zhang and Wang [42], in which the radius is tied to ||F_k|| through an integer power parameter and fixed constants. Its superlinear convergence was obtained under a local error bound assumption, which has been proved to be weaker than nondegeneracy [24]; thus progress was made in theory. However, its global convergence still needs nondegeneracy. Another adaptive TR subproblem was defined by Yuan et al. [35], where the matrix B_k replacing the Jacobian is generated by the BFGS quasi-Newton formula B_{k+1} = B_k - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k), with s_k = x_{k+1} - x_k, x_{k+1} the next iterate, y_k the associated difference vector, and B_0 an initial symmetric positive definite matrix. This TR method possesses global convergence without nondegeneracy, a further theoretical advance, and it also possesses quadratic convergence.
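The BFGS formula above can be sketched in a few lines; this is a generic sketch of the standard rank-two update, and the skip threshold is an assumption rather than the paper's exact safeguard.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s).

    The update is skipped when the curvature condition y^T s > 0 fails,
    which preserves symmetry and positive definiteness; the threshold
    1e-12 is an assumed safeguard, not the paper's.
    """
    if y @ s <= 1e-12:
        return B                      # curvature condition violated: keep B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```

The updated matrix satisfies the secant condition B_{k+1} s_k = y_k, which is the defining property of the quasi-Newton family.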
It has been shown that the BFGS quasi-Newton update is very effective for optimization problems (see [32, 33, 36], etc.). There exist many applications of TR methods (see [19–21, 31], etc.) to nonsmooth optimization and other problems. It is not difficult to see that the above models use only a second-order Taylor expansion and approximation. Can the approximation reach one more level, namely a third-order expansion, or even a fourth? The answer is positive: a third-order Taylor expansion is used and a three-dimensional symmetric tensor model is stated. In the next section, the motivation and the tensor TR model are stated. The algorithm and its global convergence are presented in Sect. 3. In Sect. 4, we report experiments with the algorithms. A conclusion is given in the last section.

Motivation and the tensor trust-region model

Consider the tensor model for the nonlinear system at x_k, M_k(d) = F_k + J_k d + (1/2) T_k d d, (2.1) where J_k is the Jacobian matrix of F at x_k and T_k is a three-dimensional symmetric tensor. It is not difficult to see that the tensor model (2.1) approximates F better than the usual quadratic trust-region model. It has been proved that the tensor term is significantly simpler when only information from one past iterate is used (see [3] for details), which obviously decreases the complexity of computing the three-dimensional symmetric tensor T_k. The model (2.1) can then be written in that interpolated form. In order to avoid the exact Jacobian matrix J_k, we use a quasi-Newton update matrix B_k instead. Thus, our trust-region subproblem model is min ||F_k + B_k d + (1/2) T_k d d|| subject to a trust-region constraint whose radius shrinks as an integer p grows, (2.3) where B_k is generated by a low-storage limited-memory BFGS (L-BFGS) update formula (2.4) built from the most recent m pairs (s_k, y_k), with s_k = x_{k+1} - x_k, I the unit matrix, and m a positive integer. It has turned out that the L-BFGS method has a fast linear convergence rate and minimal storage, and it is effective for large-scale problems (see [2, 13, 28, 34, 37], etc.). Let d_k^p be the solution of (2.3) corresponding to the integer p. Define the actual reduction (2.5) as the decrease in the norm function achieved by the trial step and the predicted reduction (2.6) as the decrease promised by the model. Based on the definitions of the actual reduction and the predicted reduction, their ratio r_k^p is defined as their quotient. Therefore, the tensor trust-region algorithm for solving (1.1) is stated as follows.
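The actual/predicted reduction ratio driving the method can be sketched as follows. The squared-norm merit function f(x) = ||F(x)||^2 and the einsum-based tensor contraction are assumed standard forms standing in for the paper's exact definitions (2.5)–(2.6), which were not recoverable from the extracted text.

```python
import numpy as np

def model_value(Fk, Bk, Tk, d):
    # tensor model value M_k(d) = F_k + B_k d + 0.5 * (T_k d d),
    # where (T_k d d)_i = sum_{j,l} Tk[i, j, l] * d[j] * d[l]
    return Fk + Bk @ d + 0.5 * np.einsum('ijl,j,l->i', Tk, d, d)

def tr_ratio(F, x, Fk, Bk, Tk, d):
    """Ratio of actual to predicted reduction for a trial step d.

    Sketch using the common squared-norm merit function f(x) = ||F(x)||^2;
    an assumed standard form, not the paper's verbatim formulas.
    """
    Fn = F(x + d)
    Mv = model_value(Fk, Bk, Tk, d)
    ared = Fk @ Fk - Fn @ Fn     # actual reduction of ||F||^2
    pred = Fk @ Fk - Mv @ Mv     # reduction predicted by the tensor model
    return ared / pred
```

For an affine F with T_k = 0 and B_k equal to the exact Jacobian, the model is exact and the ratio equals one; a ratio near one signals that the model can be trusted and the radius kept or enlarged.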

Algorithm 1

Step 0: Choose the constant ρ ∈ (0, 1) and the remaining constants, an initial point x_0, and a symmetric positive definite matrix B_0; set k := 0 and p := 0.
Step 1: Stop if ||F_k|| ≤ ε holds.
Step 2: Solve (2.3) with the current p to obtain d_k^p.
Step 3: Compute the actual reduction, the predicted reduction, and their ratio r_k^p. If r_k^p < ρ, let p := p + 1 and go to Step 2. Otherwise go to the next step.
Step 4: Set x_{k+1} := x_k + d_k^p and s_k := x_{k+1} − x_k; update B_k by (2.4) if the curvature condition holds, otherwise set B_{k+1} := B_k.
Step 5: Let k := k + 1 and p := 0. Go to Step 1.
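The control flow above can be sketched as a short runnable loop. This is deliberately simplified and not the paper's exact method: the tensor term is dropped, a full BFGS matrix stands in for the L-BFGS update (2.4), the subproblem is solved by a radius-clipped Newton-type step, and the constants rho and c are assumed values; it illustrates only the outer/inner cycle structure.

```python
import numpy as np

def tensor_tr_sketch(F, x0, eps=1e-6, rho=0.25, c=0.5, max_iter=1000):
    """Structural sketch of the outer/inner loops of Algorithm 1."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                             # initial SPD matrix B_0
    for _ in range(max_iter):
        Fk = F(x)
        if np.linalg.norm(Fk) <= eps:              # Step 1: stopping test
            return x
        p = 0
        while True:                                # inner cycle: Step 2 - Step 3
            radius = c ** p * np.linalg.norm(Fk)   # radius shrinks as p grows
            d = np.linalg.solve(B, -Fk)            # unconstrained model step
            nd = np.linalg.norm(d)
            if nd > radius:
                d *= radius / nd                   # clip step to the trust region
            Fn = F(x + d)
            ared = Fk @ Fk - Fn @ Fn               # actual reduction
            Mv = Fk + B @ d
            pred = Fk @ Fk - Mv @ Mv               # predicted reduction
            if (pred > 0 and ared / pred >= rho) or p > 60:
                break                              # accept (p > 60 is a safeguard)
            p += 1
        s, y = d, Fn - Fk
        if y @ s > 1e-12:                          # Step 4: BFGS update if curvature holds
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x = x + d
    return x
```

On a small smooth system the sketch behaves like a safeguarded quasi-Newton iteration: steps that fail the ratio test trigger the inner cycle, which shrinks the radius until the model is trusted.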

Remark

The procedure “Step 2–Step 3–Step 2” is called the inner cycle of the above algorithm. It is necessary to prove that the inner cycle is finite, which ensures that Algorithm 1 is well defined.

Convergence results

This section focuses on convergence results of Algorithm 1 under the following assumptions.

Assumption i

(A) The level set Ω = {x : f(x) ≤ f(x_0)} is bounded, and on an open convex set Ω_1 containing Ω the mapping F is twice continuously differentiable; moreover, the approximation relation between the model (2.3) and F holds at its solution d_k^p. (B) On Ω_1, the sequence of matrices {B_k} is uniformly bounded. Assumption i (B) means that there exists a constant M > 0 satisfying ||B_k|| ≤ M for all k. Based on the above assumptions and the definition of the model (2.3), we have the following lemma.

Lemma 3.1

Let d_k^p be the solution of (2.3); then the corresponding lower bound on the predicted reduction holds.

Proof

By the definition of d_k^p as a minimizer of (2.3), for any feasible d we obtain the required bound. Therefore the claimed inequality holds. The proof is complete. □

Lemma 3.2

Let d_k^p be the solution of (2.3). Suppose that Assumption i holds and {x_k} is generated by Algorithm 1. Then the stated estimate holds. Indeed, using Assumption i and the definitions (2.5) and (2.6), we obtain it directly. This completes the proof. □

Lemma 3.3

Let the conditions of Lemma 3.2 hold. We conclude that Algorithm 1 does not cycle infinitely in the inner cycle (“Step 2–Step 3–Step 2”). This lemma is proved by contradiction. Suppose that, at some x_k, Algorithm 1 cycles infinitely in the inner cycle, that is, the ratio test fails for every p as p → ∞. This implies that ||F_k|| > ε, for otherwise the algorithm stops. Thus the corresponding estimate is true. By Lemma 3.1 and Lemma 3.2, we obtain bounds on the actual and predicted reductions. Therefore, for p sufficiently large, the ratio satisfies the acceptance test, which contradicts the assumption that every trial step is rejected. The proof is complete. □

Lemma 3.4

Suppose that the conditions of Lemma 3.3 hold. Then we conclude that the stated limit relation is true and the sequence of norm function values converges. By the result of the above lemma we get the basic estimate; combining it with Lemma 3.1 gives the summable bound. Then the limit relation holds, and we deduce that the sequence converges. This completes the proof. □

Theorem 3.5

Suppose that the conditions of Lemma 3.3 hold and {x_k} is generated by Algorithm 1. Then Algorithm 1 either stops finitely or generates an infinite sequence {x_k} satisfying (3.8). Suppose that Algorithm 1 does not stop finitely; we need to obtain (3.8). If the first limit relation holds, then using (3.3) one gets (3.8). So we can complete the theorem by proving (3.9). We prove (3.9) by contradiction: suppose that there exist a subsequence and a positive constant ε such that (3.12) holds. Let the corresponding index set be defined accordingly. Using Assumption i and the fact that the quantity in (3.12) is bounded away from 0, we may assume the associated bound holds. By Lemma 3.1 and the definition of Algorithm 1, we obtain the estimate, where p_k is the largest p value obtained in the inner cycle. Lemma 3.4 tells us that the relevant sequence is convergent; thus the step sizes tend to zero when k → ∞. Therefore, for all large k, it is reasonable to assume the trial radius is small. In the inner cycle, by the acceptance test, the step corresponding to the previous value of p is unacceptable. Using Lemma 3.1 and this fact one has one bound; using Lemma 3.2 one gets another. Thus, letting k → ∞, we obtain a limit that contradicts (3.12). This completes the proof. □

Numerical results

This section reports some numerical results of Algorithm 1 and the algorithm of [35] (Algorithm YL).

Problems

The test problems for the nonlinear system are stated as follows:

Problem 1

Trigonometric function Initial guess: .

Problem 2

Logarithmic function Initial points: .

Problem 3

Broyden tridiagonal function ([7], pp. 471–472) Initial points: .

Problem 4

Trigexp function ([7], p. 473) Initial guess: .

Problem 5

Strictly convex function 1 ([18], p. 29). F is the gradient of a strictly convex function. Initial points: .

Problem 6

Strictly convex function 2 ([18], p. 30). F is the gradient of a strictly convex function. Initial guess: .

Problem 7

Penalty function Initial guess: .

Problem 8

Variable dimensioned function Initial guess: .

Problem 9

Discrete boundary value problem [15] Initial points: .

Problem 10

The discretized two-point boundary value problem similar to the problem in [17], where A is a given tridiagonal matrix. Parameters: the constants of Algorithm 1, with the initial matrix taken as the unit matrix. The subproblems of the two algorithms are solved by the method of [22]. Experiments were run on a PC with an Intel Pentium(R) Xeon(R) E5507 CPU @ 2.27 GHz, 6.00 GB of RAM, and the Windows 7 operating system. Software: MATLAB R2017a. Stopping rule: the program stops when the norm function value falls below the given tolerance. In addition, we stop the program if the iteration number exceeds one thousand.
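As an illustration of the test set, one common form of Problem 5 can be coded as below. The componentwise formula F_i(x) = exp(x_i) − 1 and the initial point are the forms usually attributed to [18] and are assumptions here, since the exact expressions were lost in extraction.

```python
import numpy as np

# Problem 5 ("Strictly convex function 1", [18], p. 29): F is taken as the
# gradient of h(x) = sum_i (exp(x_i) - x_i), so F_i(x) = exp(x_i) - 1.
# Assumed form -- the paper's exact formula did not survive extraction.
def strictly_convex_1(x):
    return np.exp(x) - 1.0

def initial_point(n):
    # a typical starting point for this problem: x0 = (1/n, 2/n, ..., n/n)
    return np.arange(1, n + 1) / n
```

The unique zero of this F is the origin, so solver output can be checked directly against x* = 0.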

Results and discussion

The column meanings of the tables are as follows. Dim: the dimension. NI: the number of iterations. NG: the number of norm function evaluations. Time: the CPU time in seconds. The numerical results in Table 1 show the performance of the two algorithms with respect to NI, NG, and Time. It is not difficult to see that:
Table 1

Experiment results

Nr  Dim   | Algorithm 1            | Algorithm YL
          | NI   NG    Time        | NI   NG    Time
1   400   |  9   18    10.93567    | 11   22    1.778411
    800   |  9   18    52.46314    | 11   22    7.176046
    1600  |  8   14    215.453     | 11   22    42.57267
2   400   |  4   10    11.27887    |  6    7    1.185608
    800   |  4   10    45.94229    |  6    7    4.071626
    1600  |  4   10    251.38      |  6    7    22.58894
3   400   |  4   10    2.808018    | 64  125    8.642455
    800   |  4   10    10.74847    | 78  129    52.26034
    1600  |  4   10    70.80885    | 68   99    262.5653
4   400   |  2    2    0.8112052   |  6   17    1.092007
    800   |  2    2    2.839218    |  6   22    3.08882
    1600  |  2    2    14.08689    |  6   22    13.27569
5   400   |  3    6    1.731611    |  6    7    0.936006
    800   |  3    6    5.616036    |  6    7    3.650423
    1600  |  3    6    30.32659    |  6    7    22.44854
6   400   |  3    6    1.279208    |  5    6    0.7176046
    800   |  3    6    5.397635    |  5   16    2.88601
    1600  |  3    6    29.88979    |  5   16    16.39571
7   400   |  5   14    3.790824    | 12   49    1.435209
    800   |  5   14    22.52654    | 12   49    4.69563
    1600  |  5   14    102.0403    | 17   83    19.23492
8   400   |  1    2    1.294808    |  3    6    0.2808018
    800   |  1    2    5.694037    |  3    6    0.8580055
    1600  |  1    2    31.091      |  3    6    3.775224
9   400   | 13   19    11.01367    | 12   15    1.60681
    800   |  9   15    40.95026    | 11   17    7.191646
    1600  | 10   19    299.3191    | 10   16    38.07984
10  400   |  3    9    2.558416    | 40   50    12.44888
    800   |  3    9    11.62207    | 40   50    49.43672
    1600  |  3    9    73.07087    | 41   53    365.7911
Both algorithms successfully solve all ten nonlinear problems; the NI and NG of the two algorithms do not increase when the dimension becomes large; the NI and NG of Algorithm 1 are competitive with those of Algorithm YL, while the Time of Algorithm YL is better than that of Algorithm 1. To show their efficiency directly, the tool of [5] is used and three figures for NI, NG, and Time are given. Figures 1–3 show the performance of the two algorithms in NI, NG, and Time. It is easy to see that Algorithm 1 wins on NI and NG, since its performance profile plot lies on top; the Time of Algorithm YL is superior to that of Algorithm 1. Both algorithms show good robustness. All three figures show that both algorithms are interesting, and we hope they will be studied further in the future.
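The performance-profile tool of [5] used for Figures 1–3 can be sketched as follows; this is a generic implementation of that idea, not the authors' own script.

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profile values in the sense of [5].

    T is an (n_problems, n_solvers) array of costs (e.g. NI, NG or Time);
    np.inf marks a failure.  For each tau in taus, returns the fraction of
    problems each solver solves within a factor tau of the best solver.
    """
    T = np.asarray(T, dtype=float)
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{p,s}
    # rho_s(tau) = |{p : r_{p,s} <= tau}| / n_problems
    return np.array([(ratios <= t).mean(axis=0) for t in taus])
```

A solver whose curve lies on top for small tau is the most efficient; the height of the curve as tau grows measures robustness, which is how Figures 1–3 are read.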
Figure 1

Performance profiles of these methods (NI)

Figure 2

Performance profiles of these methods (NG)

Figure 3

Performance profiles of these methods (Time)

Conclusions

This paper considers a tensor trust-region model for nonlinear systems. Global convergence is obtained under suitable conditions, and numerical experiments are reported. The main contributions are as follows: a tensor trust-region model is established and discussed, and a low-workload update is used in this tensor trust-region model. In the future, we believe this tensor trust-region model will become still more significant.
Related articles (2 in total)

1.  A Limited-Memory BFGS Algorithm Based on a Trust-Region Quadratic Model for Large-Scale Nonlinear Equations.

Authors:  Yong Li; Gonglin Yuan; Zengxin Wei
Journal:  PLoS One       Date:  2015-05-07       Impact factor: 3.240

2.  Two New PRP Conjugate Gradient Algorithms for Minimization Optimization Models.

Authors:  Gonglin Yuan; Xiabin Duan; Wenjie Liu; Xiaoliang Wang; Zengru Cui; Zhou Sheng
Journal:  PLoS One       Date:  2015-10-26       Impact factor: 3.240

