Literature DB >> 26644929

RF cavity design exploiting a new derivative-free trust region optimization approach.

Abdel-Karim S O Hassan¹, Hany L Abdel-Malek¹, Ahmed S A Mohamed¹, Tamer M Abuelfadl², Ahmed E Elqenawy¹.

Abstract

In this article, a novel derivative-free (DF) surrogate-based trust region optimization approach is proposed. In the proposed approach, quadratic surrogate models are constructed and successively updated. The generated surrogate model is then optimized instead of the underlined objective function over trust regions. Truncated conjugate gradients are employed to find the optimal point within each trust region. The approach constructs the initial quadratic surrogate model using few data points of order O(n), where n is the number of design variables. The proposed approach adopts weighted least squares fitting for updating the surrogate model instead of interpolation which is commonly used in DF optimization. This makes the approach more suitable for stochastic optimization and for functions subject to numerical error. The weights are assigned to give more emphasis to points close to the current center point. The accuracy and efficiency of the proposed approach are demonstrated by applying it to a set of classical bench-mark test problems. It is also employed to find the optimal design of RF cavity linear accelerator with a comparison analysis with a recent optimization technique.

Entities: Chemical Disease Gene Species

Keywords: Derivative-free optimization; Linear accelerator; Optimal design; Quadratic surrogate model; Trust region

Year: 2014 PMID： 26644929 PMCID： PMC4642173 DOI： 10.1016/j.jare.2014.08.009

Source DB: PubMed Journal: J Adv Res ISSN： 2090-1224 Impact factor: 10.479

Introduction

In general, engineering systems are characterized by some designable parameters and some performance measures. The desired performance of a system (design specifications) is described by specifying bounds on the performance measures of the system which is set by the designer. The conventional system design aims at finding values of the system designable parameters that merely satisfy the design specifications. In general, there will be a multitude of acceptable designs. However, for contemporary engineering design, other criterion (objective function) can be chosen for comparing the different alternative acceptable designs (optimization problem) and for selecting the best one (optimal system design). Naturally, system performance measures and the objective functions are functions of system parameters values and evaluated through system simulations. For intensive CPU engineering systems, the high expense of the required system simulations may obstruct the optimization process. In practice, robust optimization methods that utilize the fewest possible number of function evaluations are greatly required [1], [2]. Another difficulty is the absence of any gradient information as the required simulations cost in evaluating the gradient information is prohibitive in practice [3]. Attempting to approximate the function gradients using the finite difference approach requires much more function evaluations, which highly increase the computational cost. Another objection in estimating the gradients by finite differencing is that the estimated function values are usually contaminated by some numerical noise due to estimation uncertainty. Hence, gradient- based optimization methods cannot be applied here. For such optimization problems, only derivative free optimization (DFO) methods can be applicable [1], [4], [5], [6], [7]. Further, the derivative free trust region methods usually handle such problems more efficiently as the trust region framework constitutes one of the most important globally convergent optimization methods, which has the ability to converge to a solution starting from any arbitrary initial point [8]. In addition, these methods use computationally cheap surrogate-based models that can be constructed by using function evaluations at some selected points. These surrogate models may be response surfaces, radial basis method, neural networks, kriging, etc. The majority of the existing derivative-free trust region techniques have the following features: They require a relatively large number of function evaluations, O(n) (where n is the number of system design variables) to construct the initial quadratic model. The quadratic surrogate models are constructed via interpolating the objective function at a constant number of points; when a point is obtained a previous point is dropped. In addition, these algorithms usually ignore the valuable information contained in all previously evaluated expensive function values. The work presented in this article introduces a new derivative free trust region approach that neither require nor approximate the gradients of the objective function. It implements a non-derivative optimization method that combine a trust region framework with quadratic fitting surrogates for the objective function [4], [5]. The principal operation of the method relies on building, successively updating and optimizing quadratic surrogate models of the objective function over trust regions. The quadratic surrogate models reasonably reflect the local behavior of the objective function in a trust region around the current iterate and they are optimized instead of the objective function over trust regions. Truncated conjugate gradient method by Steihaug [9] is used to find the optimal point within each trust region. The approach constructs the initial quadratic surrogate model using few data points of order O(n). In each iteration of the proposed approach, the surrogate model is updated using a weighted least squares fitting. The weights are assigned to give more emphasis to points close to the current center point. The accuracy and efficiency of the proposed approach are demonstrated by applying it on a set of classical benchmark test problems and comparisons with a recent optimization technique [6] are also included. The linear accelerators (LINACS) provide beams of high quality and high energy in which charged particles move on a linear path and are accelerated by electromagnetic fields. The modern LINAC typically consists of sections of specially designed waveguides that are excited by RF electromagnetic fields, usually in the very high frequency (VHF) range. The accelerating structures are tuned to resonance and are driven by external, high-power RF power tubes, such as klystrons. The accelerating structures must efficiently transfer the electromagnetic energy to the beam, and this is accomplished through an optimized configuration of the internal geometry, so that the structure can concentrate the electric field along the trajectory of the beam promoting maximal energy transfer, by adding nose cones to create a region of more concentrated axial electric field as shown in Fig. 1. RF cavity analysis and design brought researchers and engineers’ attention due to its extensive applications [10], [11], [12], [13], [14], [15], [16], [17]. Applications include: medicinal purposes in radiation therapy, food sterilization and transmute nuclear fuel waste, etc. Design tools include: the computer code SUPERFISH [18], 3-D code MAFIA [19] and CST Studio Suite [20].

Fig. 1

Cross section of the cavity with nose cones and spherical outer walls.

Design of accelerator RF cavities may include optimization of some of cavity parameters. Among the parameters characterizing the operation of the RF cavities are, the average accelerating field Eacc, peak fields to accelerating field (Epk/Eacc, Hpk/Eacc), quality factor, and cavity shunt impedance R-shunt [21]. The parameters considered for optimization depend on the power level fed to the cavity, which limit the average accelerating field, where the constraints on these parameters are imposed by the application. For low power level feed, optimization may focus on maximizing shunt impedance, however for high power operation, limiting the peak fields inside is of concern in order to minimize multipacting [22]. In this work we will focus on the low power fed cavities, where maximizing the shunt impedance is of main concern and will be treated through our new optimization approach. The new proposed trust region (TR) optimization approach is capable of solving the design problems with either 2D or 3D simulators. It is expected to work as well if a 3D simulator was employed with the expense of more computational time. Most of the accelerators use body of revolution cavity structure which can be solved as 2D structure, saving the computational resources. However, the proposed approach was successfully employed in microwave filter design utilizing 3D full-wave EM solver [23].

The new trust region approach

The computationally expensive objective function is locally approximated around a current iterate by a computationally cheaper quadratic surrogate model M() which can be placed in the form:where , the vector , and the symmetric matrix are the unknown parameters of M(). The total number of the model parameters is q = (n + 1)(n + 2)/2. These parameters can be evaluated by interpolating the objective function at q points.

Initial model

Let 0 be the initial point that is provided by the user. Initially, assuming that is a diagonal matrix, then the number of points required to construct the initial model is m = 2n + 1 [7]. The initial m points , i = 1, 2, …, m, can be chosen as follows [6], [24]where Δ1 is the initial trust region radius that is provided by the user, and is the ith coordinate vector in . The initial quadratic model M(1)() will have the parameters a(1), the vector (1), and the n diagonal elements of the model Hessian matrix (1). These parameters are computed by requiring that the initial model interpolates the objective function f(x) at the initial m points given in (2). Therefore the initial model parameters are obtained by satisfying the matching conditions:

Model optimization

At the kth iteration, assume that is the current solution point. The model M()() is then minimized, in place of the objective function, over the current trust region and a new point is produced by solving the trust region sub-problem:where = − , Δ is the current trust region radius, and throughout is the l2-norm. This problem is solved by the method of truncated conjugate gradient by Steihaug [9]. It is identical as the standard conjugate gradient method as long as the iterates are inside the trust region. If the conjugate gradient method terminates at a point within the trust region, this point is a global minimizer of the objective function. If the new iterate is outside the trust region, a truncated step which is on the region boundary is considered. Also, the method treats the case where the minimum is in the opposite direction of the conjugate direction which is due to the non convexity of the model [9]. One good property of this method is that the solution computed has a sufficient reduction property, which was proved by Bandler and Abdel-Malek [25]. Let ∗ denotes the solution of (4), and then a new point = + ∗ is obtained. The achieved actual reduction in the objective function is compared to that predicted reduction using the model by computing the reduction ratio which is given by: This ratio reflects how much the surrogate model agrees with the objective function within the trust region. The trust region radius and the current iterate will be updated such that, if r is sufficiently high, i.e., r ⩾ 0.7, there is a good agreement between the model and the objective function over this step. Hence, it is beneficial to expand the trust region for the next iteration, and to use as the new center of the trust region. If r is positive but not close to 1, i.e., 0.1 ⩽ r < 0.7, the trust region radius is not altered. On the other hand, if r is smaller than a certain threshold, r < 0.1, the trust region radius is reduced. The updating formula used for updating Δ and can be expressed as follows: It is to be mentioned that the current center is the point of least function value achieved so far.

Model update

When a new point is available, the current quadratic model M()() is updated so that the point of lowest objective function value is now the center of the kth trust region. The model will take the form: The parameters: a(), () and () are evaluated employing the parameter values of the previous model M−1() in addition to all available function values. The constant a() is assigned the value of f(), i.e., a() = f(). The model will be updated in two steps. First, the vector () is updated then the Hessian matrix () is updated as follows: Step1: Updating the vector The vector () can be obtained using only n points. However, using the n recent points may result in ill-conditioned system of linear equations. In order to avoid this, it is proposed to use the least squares approximation with the most recent 2n points. So, the vector () is evaluated such that the model M() fits the last 2n points obtained, , i = 1, 2, …, 2n, i.e., the following condition should be satisfied: When computing the vector (), the matrix () is assigned temporarily the value of the previous model Hessian matrix, (−1), hence the vector () is obtained by solving the following system of linear equations:where The previous system is an over-determined system. The least squares approximation for () is Step2: Updating the matrix The model Hessian matrix () is evaluated using the following updating formula:where c is a positive constant, 0.5 < c < 1, and the vector , This choice of , ensures that changes in () occur gradually. The vector is evaluated such that the model M()() tries to fit all the available m points obtained so far, , i = 1, 2, …, m, i.e., the following condition should be satisfiedi.e., the vector is obtained by solving the weighted system of linear equationswhere To obtain more accurate model in the neighborhood of the current center, the available points are assigned different weights w, i = 1, 2, …, m according to their distances from the trust region center. In the proposed approach the weight w, associated with each equation, takes the form:where c1 is a positive constant, c1 ⩾ 1. The previous system in (16) is an over-determined system (m > n). The least squares approximation for is After getting the vector , the term T is calculated and the matrix is made symmetric by resetting the off-diagonal elements to their average values, i.e., b = b ← (b + b)/2, then the new Hessian matrix () is updated according to Eq. (13). The model can be improved by generating a new point = − , which is chosen to be on the boundary of the trust region so that it improves the distribution of points around the center of the trust region. A suggested solution to find is to solve the following problem:where is selected to maximize the sum of squares of the projections of the vector on the other , i = 1, 2, …n vectors, where n is the available set of points. After generating the function value f() is computed. If f() is found to be less than f(), then will be considered as the new trust region center of the subsequent iteration, otherwise, will just be added to the available set of points.

Algorithm

A complete algorithm for the proposed method is given below (see also an illustrative flowchart in Fig. 2).

Fig. 2

A flowchart for the proposed optimization algorithm.

Examples

The effectiveness of the proposed algorithm is demonstrated through two benchmark examples. All results are compared with those obtained by NEWUOA (NEW Unconstrained Optimization Algorithm) by Powell [6]. The performance is measured by the number of function evaluations N required to reach the optimal solution.

The 2D Beale function

The function is by [26]:where a1 = 1.5, a2 = 2.25, and a3 = 2.625. This function has a valley approaching the line x2 = 1, and has a minimum of 0 at (3 0.5). The initial values used for 0 and Δ1 are (0.1 0.1) and 0.8, respectively. The results in Table 1 and Fig. 3 compare the optimal value obtained by applying the proposed technique versus NEWUOA with the same number of function evaluation N.

Table 1

Results of the 2D Beale function compared with NEWUOA.

N	Proposed algorithm	NEWUOA
11	0.8065	14.2031
21	0.1083	0.91702
31	0.0033	0.034386
43	2.3335e−5	1.7965e−4
55	2.6973e−6	6.5829e−11
67	2.5790e−7	6.4829e−11

Fig. 3

Results of the 2D Beale function.

It is to be noticed, that starting from the same initial point and after only 11 iterations; the proposed algorithm gives a function value of 0.8065 while NEWUOA gives 14.2031.

The 3D Box function

The function was proposed by [27]:This function has a minima at (1 10 1), and also along the line{(a a 0)} with value 0. The initial values used for 0 and Δ1 are (0 10 2) and 9.9, respectively. Table 2 shows a comparison of the optimal value obtained after N function evaluations using the proposed algorithm versus NEWUOA (see also Fig. 4).

Table 2

Results of the 3D Box function compared with NEWUOA.

N	Proposed algorithm	NEWUOA
10	0.2413	0.59732
17	5.2048e−3	0.18785
25	2.3149e−3	0.11451
38	4.2472e−4	0.26465e−1
48	4.1820e−5	0.24613e−1
62	4.1771e−6	0.21593e−2
87	1.9203e−6	6.975e−5

Fig. 4

Results of the 3D Box function.

In the above numerical examples, it is to be noticed that at the beginning of the optimization process, the proposed algorithm is much faster than NEWUOA. However, as the optimization gets closer to the optimum, the methods based on interpolation will be more accurate as expected. This explains why the proposed algorithm is well suited for objective functions that have some uncertainty in their values or subject to statistical variations. This may occur for design of systems whose parameter values are subject to known but unavoidable statistical fluctuations [1], [28]. Also, the algorithm may be useful for surrogate-based system design [2], [29]. These surrogates are updated during the optimization process, and a few iterations in the optimization process will be sufficient at the beginning. In this case the new technique will produce a significant reduction in few iterations.

Optimized design of RF cavity

The RF cavity is a major component of linear accelerators [30], [31]. The structure of RF cavity must efficiently transfer the electromagnetic energy to the charged particles beam. This can be accomplished through an optimized configuration of its internal geometry, by adding nose cones to create a region of more concentrated axial electric field along the path of the electron beam, as shown in Fig. 1. The most useful figure of merit for high field concentration along the beam axis and low ohmic power loss in the cavity walls is the effective shunt impedance per unit length ZT2 where T is the transient-time factor (a measure of the energy gain reduction caused by the sinusoidal time variation of the field in the cavity, [32]). One of the main objectives in cavity design is to choose geometry to maximize effective shunt impedance per unit length. This indicates increasing the energy delivered to the beam compared to that thermally lost in the cavity walls. The effective shunt impedance per unit length is usually expressed in mega ohms per meter and is defined bywhere P is the thermal power losses in the walls of the cavity, V0 = ∫ E(z)dz = E0L, and E0 is the average axial electric field along the cavity axis with length L. The technique is applied to an RF cavity with resonance frequency 9.4 GHz, shown in Fig. 5. The objective is to maximize effective shunt impedance per unit length. In order to do that, we optimize the axial z positions of ten points that describe the cavity curvature through a spline curve. The axial positions = (z1, z2,…, z10) in the z-direction are taken as the design parameters. The radial positions of these points are chosen on a logarithmic scale along r-direction. It is to be noted that during the variation of the curvature, the resonance frequency is always kept at 9.4 GHz. The initial values used for the ten radial positions 0 are all set to 0.6 cm and Δ1 is set to 0.02 cm.

Fig. 5

Structure of radio frequency (RF) cavity.

Cavity design generally requires electromagnetic field-solver that solves Maxwell equations numerically for the specified boundary conditions. In the simulations, POISSON and SUPERFISH are used as the main solver programs in a collection of programs from LANL [18], [33]. The solver is used to calculate the static magnetic and electric fields and radio-frequency electromagnetic fields for either 2-D Cartesian coordinates or axially symmetric cylindrical coordinates. The code SUPERFISH is used to solve for axisymmetric TM0nl modes, for the field components Hphi, Er and Ez. The solution is obtained through solving Hemholtz equation using finite element method FEM over a triangular mesh subject to the proper boundary conditions and symmetries imposed [34]. Design algorithm shown in Fig. 6 is implemented in MATLAB code, where an initial case is chosen corresponding to ten z positions of points with cavity curvature is described with spline curve (step 2). Then the spline interpolated curve is sampled at 100 points, where those sampled points are considered connected with piecewise linear, approximating the cavity curvature. This piece wise linear description is fed to AUTOMESH program to generate mesh (step 3). The solution of lowest TM mode of the cavity is made at step 4 by calling SUPERFISH, and the obtained frequency in step 5 is used to scale the cavity dimensions to keep the resonance frequency at 9.4 GH (step 6). The corresponding scaling is reflected on the obtained cavity shunt impedance (step 7), where this value is fed to the optimizer algorithm to determine the new ten points positions. Then the process is repeated starting from step 2.

Fig. 6

The Poisson Superfish Solver within the proposed optimization (design) loop.

The results of the effective shunt impedance per unit length for RF cavity in mega ohm per meter after N function evaluations for both the proposed algorithm and NEWUOA are shown in Table 3.

Table 3

Results of the RF cavity design compared with NEWUOA.

N	Proposed algorithm	NEWUOA
50	111.771	112.587
75	115.207	116.833
90	117.183	119.316
120	119.01	120.511
160	120.5	120.910
200	121.01	121.211
260	121.301	121.521

It is to be mentioned that starting from the same initial point, the convergence of the proposed algorithm is as best as NEWUOA. However, the advantage of the proposed algorithm is its easy implementation and accessibility for update and modification. The figures of optimal cavity using the proposed algorithm and the NEWUOA are shown in Fig. 7, Fig. 8 respectively.

Fig. 7

The optimized cavity using the proposed algorithm. Effective Shunt impedance per unit length = 121.301 MOhm/m.

Fig. 8

The optimized cavity using NEWUOA. Effective Shunt impedance per unit length = 121.521 MOhm/m.

It worth mentioning that one could criticize the proposed optimized structure, that it contains sharp edge nose, which is difficult to manufacture and is a point of field singularity that causes breakdown. One way to override that problem is to add some curvature to the nose sharp tip, which would slightly reduce the realized shunt impedance.

Conclusions

In this article, a new trust region optimization method that does not require any derivative information has been proposed. In this method, the objective function is approximated via quadratic surrogates, and using few number of initial data points than the exact number of surrogate parameters. Classical benchmark test problems were used to demonstrate the accuracy and efficiency of the proposed method. The results obtained showed the ability of the proposed method to rapidly converge to the final region containing the optimum solution when only a limited number of function evaluations is permissible and when a high accuracy is not really necessary. Least-squares fitting is used instead of interpolation which explains the inaccurate solution in case of explicit objective functions. Thus, the proposed method is suitable for stochastic optimization or objectives that suffer from numerical inaccuracy. In addition, the proposed method has been used to obtain the optimal design for the structure of RF cavity which is the major part of any linear accelerator.

Conflict of Interest

The authors have declared no conflict of interest.

Compliance with Ethics Requirements

This article does not contain any studies with human or animal subjects.

1. Set N = 0 (the number of function evaluations), given x0∈Rn,Δ1>0,0.5<c<1,c1⩾1,Nmax,δ (a termination criterion).

2. Find the initial m points using (2), letting x₁ be the initial trust region center, then construct the initial quadratic model using (3), Set k = 1.

3. Solve the trust region sub-problem (4) using the truncated conjugate gradient method to obtain s^∗ = x_n − x_k of the model M⁽^k⁾(x) over the trust region.

4. Evaluate f(x_n) and compute the reduction ratio by substituting in (5).

5. Update the trust region radius to obtain Δ_k₊₁ using (6).

6. Determine the trust region center of the next iteration x_k₊₁ based on x_k and r_k using (7). If ||f(x_k₊₁) − f(x_k)|| ⩽ δ, the algorithm will be terminated with x_opt = x_k₊₁ and f_opt = f(x_k₊₁). If for two successive iterations, r_k is negative go to Step 9, else continue.

7. Add the point x_n to the set of available points S, if the number of points in S exceeds N_max, remove the farthest point from x_k₊₁.

Comment. To avoid severe computational and storage overhead, a bound N_max is put to limit the uncontrollable increase in the number of stored points. Specifically, when the number of available points reaches N_max the farthest point from the trust region center is removed.

8. Construct the quadratic model M⁽^k⁺¹⁾(x) around x_k₊₁ based on M⁽^k⁾(x) and the set of available points S using the updating procedures in Eqs. (9), (10), (11), (12), (13), (14), (15), (16), (17), (18), (19), then set k = k + 1 and go to Step 3.

9. Generate a new point s_new using (20), add it to the set of points S, then go to Step 8.

1 in total

1. Rigorous analysis of highly tunable cylindrical transverse magnetic mode re-entrant cavities.

Authors: J-M Le Floch; Y Fan; M Aubourg; D Cros; N C Carvalho; Q Shan; J Bourhill; E N Ivanov; G Humbert; V Madrangeas; M E Tobar
Journal: Rev Sci Instrum Date: 2013-12 Impact factor: 1.523

1 in total