
Enabling quaternion derivatives: the generalized HR calculus.

Dongpo Xu¹, Cyrus Jahanchahi², Clive C Took³, Danilo P Mandic².

Abstract

Quaternion derivatives exist only for a very restricted class of analytic (regular) functions; however, in many applications, functions of interest are real-valued and hence not analytic, a typical case being the standard real mean square error objective function. The recent HR calculus is a step forward and provides a way to calculate derivatives and gradients of both analytic and non-analytic functions of quaternion variables; however, the HR calculus can become cumbersome in complex optimization problems due to the lack of rigorous product and chain rules, a consequence of the non-commutativity of quaternion algebra. To address this issue, we introduce the generalized HR (GHR) derivatives which employ quaternion rotations in a general orthogonal system and provide the left- and right-hand versions of the quaternion derivative of general functions. The GHR calculus also solves the long-standing problems of product and chain rules, mean-value theorem and Taylor's theorem in the quaternion field. At the core of the proposed GHR calculus is quaternion rotation, which makes it possible to extend the principle to other functional calculi in non-commutative settings. Examples in statistical learning theory and adaptive signal processing support the analysis.


Keywords: generalized HR calculus; non-analytic quaternion function; nonlinear quaternion functions; quaternion derivatives; quaternion least mean square

Year: 2015 PMID: 26361555 PMCID: PMC4555860 DOI: 10.1098/rsos.150255

Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963

Introduction

Quaternions have become a standard tool in many modern areas, including image processing [1,2], aerospace and satellite tracking [3], modelling of wind profile in renewable energy [4] and in the processing of polarized waves [5,6]. Compared to real vector algebra, quaternion algebra [7] has been shown to both reduce the number of parameters in the modelling and offer advantages in terms of functional simplicity and accuracy [8,9]. The most common optimization approach in applications is based on the gradient of the objective function; one such algorithm is the quaternion least mean square (QLMS) [4]. The objective functions in practical applications are typically based on the mean square error (MSE), a real function of quaternion variables, and are thus not analytic according to standard quaternion analysis [10-12]. This is a major obstacle to a more widespread use of quaternions in learning systems. The existing ways to find the derivative of a real function $f(q)$ with respect to the unknown quaternion variable $q$ are:
- The pseudo-derivative, which considers $f$ as a function of the four real components $q_a, q_b, q_c$ and $q_d$ of the quaternion variable $q$, and then takes componentwise real derivatives with respect to the real variables $q_a, q_b, q_c$ and $q_d$. In other words, the real-valued function is treated as a real differentiable mapping between $\mathbb{R}^4$ and $\mathbb{R}$. This leads to unnecessarily long expressions and is especially cumbersome and tedious in complex optimization problems and when dealing with nonlinear functions.
- The HR calculus, which is compact and elegant [13], as it finds formal derivatives of $f$ with respect to the quaternion variables and their involutions in a direct way. This applies to both real functions of quaternion variables and nonlinear functions. This approach is based on the differentials of $q, q^i, q^j, q^k$, which are independent in the quaternion field, as shown in lemmas A.1 and A.2.
The advantage of using HR derivatives is that the computations and analysis are kept in the quaternion domain rather than using quaternion-to-real transformations, and many algorithms can be readily extended from the complex to the quaternion domain. Although the HR calculus is a significant step forward, the product and chain rules are not defined within the HR calculus, which complicates the calculation of derivatives of, for example, nonlinear quaternion functions. Other functional calculi [10-12,14] in quaternion analysis similarly suffer from this obstacle. The aim of this work is to revisit the HR calculus [13] and to equip it with the product rule and chain rule in order to solve these long-standing problems in quaternion calculus. Motivated by the complex CR calculus [15-17], we first consider a general orthogonal system which, in conjunction with the HR calculus, introduces the generalized HR (GHR) calculus. The GHR calculus comprises both the left- and right-hand versions of quaternion derivative; these are necessary to consider due to non-commutativity of quaternion product. In particular, we show that for real functions of quaternion variables, such as the standard MSE objective function, the left and right GHR derivatives are identical. An important consequence of this property is that within the GHR calculus, the choice of the left/right GHR derivative is irrelevant for practical applications of quaternion optimization; this is currently a major source of confusion in the quaternion community. Another consequence of the novel product rule is that it not only enables the calculation of the GHR derivatives for general functions of quaternion variables, but also it is generic—if one function within the product is real-valued, this novel product rule degenerates into the traditional product rule, as shown in corollary 4.11. 
A family of chain rules is also introduced in order to calculate the derivatives of nonlinear functions of quaternion variables, which include complex- and real-valued functions as degenerate quaternion functions. Since at the core of the GHR calculus is the quaternion rotation, this approach can be naturally extended to other functional calculi in non-commutative settings. Finally, we revisit two fundamental theorems in quaternion calculus—the quaternion mean value theorem and quaternion Taylor's theorem—and derive them in a compact and generic form, based on the GHR derivatives. The GHR calculus therefore provides a solution to some long-standing mathematical problems [18] and promises a tool for a more widespread use of quaternions in practical applications. Illustrative examples in statistical signal processing support the analysis.

Background on quaternions

Quaternion algebra

Quaternions are an associative but not commutative algebra over $\mathbb{R}$, defined as
\[ \mathbb{H} = \mathrm{span}\{1,i,j,k\} = \{q_a + iq_b + jq_c + kq_d \mid q_a,q_b,q_c,q_d\in\mathbb{R}\}, \tag{2.1} \]
where $\{1,i,j,k\}$ is a basis of $\mathbb{H}$, and the imaginary units $i,j$ and $k$ satisfy $i^2=j^2=k^2=ijk=-1$, which implies $ij=k=-ji$, $jk=i=-kj$, $ki=j=-ik$. For any quaternion $q=q_a+iq_b+jq_c+kq_d=S_q+V_q$, the scalar (real) part is denoted by $q_a=S_q=R(q)$, whereas the vector part $V_q=iq_b+jq_c+kq_d$ spans the three imaginary parts. For $p,q\in\mathbb{H}$, the quaternion product is given by
\[ pq = S_pS_q - V_p\cdot V_q + S_pV_q + S_qV_p + V_p\times V_q, \]
where the symbols '$\cdot$' and '$\times$' denote, respectively, the standard inner product and vector product. The presence of the vector product makes the quaternion product non-commutative, i.e. in general for $p,q\in\mathbb{H}$, $pq\neq qp$. The conjugate of a quaternion $q$ is defined as $q^*=S_q-V_q$, while the conjugate of a quaternion product satisfies $(pq)^*=q^*p^*$. The modulus of a quaternion is defined as $|q|=\sqrt{qq^*}=\sqrt{q_a^2+q_b^2+q_c^2+q_d^2}$, and thus $|pq|=|p||q|$. The inverse of a quaternion $q\neq 0$ is $q^{-1}=q^*/|q|^2$, which yields an important consequence $(pq)^{-1}=q^{-1}p^{-1}$. If $|q|=1$, we call $q$ a unit quaternion. A quaternion $q$ is said to be pure if $R(q)=0$. For pure quaternions, $q^*=-q$ and $q^2=-|q|^2$. Thus, a pure unit quaternion is a square root of $-1$; examples are the imaginary units $i,j$ and $k$. Quaternions can also be written in the polar form $q=|q|(\cos\theta+\hat{v}\sin\theta)$, where $\hat{v}=V_q/|V_q|$ is a pure unit quaternion and $\theta=\arccos(R(q)/|q|)$ is the angle (or argument) of the quaternion. We shall next introduce the quaternion rotation and involution operators.
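The algebraic identities above are easy to check numerically. The following is a minimal sketch, assuming quaternions are stored as NumPy arrays `[q_a, q_b, q_c, q_d]`; the helper names `qmul`, `qconj` and `qinv` are illustrative choices and do not come from the paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    """Conjugate q* = S_q - V_q."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qinv(q):
    """Inverse q^{-1} = q* / |q|^2 (q must be non-zero)."""
    return qconj(q) / np.dot(q, q)

ONE, I, J, K = np.eye(4)          # basis {1, i, j, k}
p = np.array([1.0, -2.0, 0.5, 3.0])
q = np.array([0.3, 1.5, -1.0, 2.0])

ij_is_k = np.allclose(qmul(I, J), K)                      # ij = k
ijk_is_minus_one = np.allclose(qmul(qmul(I, J), K), -ONE)  # ijk = -1
commutes = np.allclose(qmul(p, q), qmul(q, p))             # False in general
modulus_ok = np.isclose(np.linalg.norm(qmul(p, q)),
                        np.linalg.norm(p) * np.linalg.norm(q))  # |pq| = |p||q|
conj_ok = np.allclose(qconj(qmul(p, q)), qmul(qconj(q), qconj(p)))  # (pq)* = q*p*
inv_ok = np.allclose(qinv(qmul(p, q)), qmul(qinv(q), qinv(p)))      # (pq)^-1 = q^-1 p^-1
```

The array layout keeps the four real components explicit, which matches the way the pseudo-derivative and the HR derivatives are built from componentwise partial derivatives later in the text.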

Definition 2.1 (quaternion rotation [19])

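For any quaternion $q$, the transformation
\[ q^\mu \triangleq \mu q\mu^{-1} \]
geometrically describes a three-dimensional rotation of the vector part of $q$ by an angle $2\theta$ about the vector part of $\mu$, where $\mu=|\mu|(\cos\theta+\hat{\mu}\sin\theta)$ is any non-zero quaternion. Properties of the quaternion rotation used in this work (see [6,20]) are
\[ (pq)^\mu = p^\mu q^\mu, \qquad pq = q^p p = q\,p^{(q^*)}, \qquad p,q\in\mathbb{H}\setminus\{0\} \tag{2.3} \]
and
\[ q^{\mu\nu} = (q^\nu)^\mu, \qquad q^{\mu*} \triangleq (q^*)^\mu = (q^\mu)^* \triangleq q^{*\mu}, \qquad \nu,\mu\in\mathbb{H}\setminus\{0\}. \tag{2.4} \]
Note that the representation in (2.1) can be generalized to a general orthogonal basis $\{1,i^\mu,j^\mu,k^\mu\}$, where the following properties hold [19]:
\[ i^\mu i^\mu = j^\mu j^\mu = k^\mu k^\mu = i^\mu j^\mu k^\mu = -1. \tag{2.5} \]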
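The rotation properties can be verified numerically. This sketch assumes the rotation is $q^\mu=\mu q\mu^{-1}$ and checks the identities $(pq)^\mu=p^\mu q^\mu$, $pq=q^pp$, $q^{\mu\nu}=(q^\nu)^\mu$ and the relations of the rotated basis; the function name `rot` and the test values are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qinv(q):
    return qconj(q) / np.dot(q, q)

def rot(q, mu):
    """Quaternion rotation q^mu = mu q mu^{-1} (assumed definition)."""
    return qmul(qmul(mu, q), qinv(mu))

ONE, I, J, K = np.eye(4)
p = np.array([1.0, -2.0, 0.5, 3.0])
q = np.array([0.3, 1.5, -1.0, 2.0])
mu = np.array([0.5, 1.0, -0.7, 0.2])
nu = np.array([1.0, 0.2, -0.4, 0.6])

# (pq)^mu = p^mu q^mu
rot_of_product = np.allclose(rot(qmul(p, q), mu), qmul(rot(p, mu), rot(q, mu)))
# pq = q^p p  and  pq = q p^{(q*)}
swap_left = np.allclose(qmul(p, q), qmul(rot(q, p), p))
swap_right = np.allclose(qmul(p, q), qmul(q, rot(p, qconj(q))))
# q^{mu nu} = (q^nu)^mu  and  (q*)^mu = (q^mu)*
composition = np.allclose(rot(q, qmul(mu, nu)), rot(rot(q, nu), mu))
conj_commutes = np.allclose(rot(qconj(q), mu), qconj(rot(q, mu)))

# Rotated basis {1, i^mu, j^mu, k^mu} keeps the quaternion relations.
i_mu, j_mu, k_mu = rot(I, mu), rot(J, mu), rot(K, mu)
basis_ok = (all(np.allclose(qmul(e, e), -ONE) for e in (i_mu, j_mu, k_mu))
            and np.allclose(qmul(qmul(i_mu, j_mu), k_mu), -ONE))
```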

Definition 2.2 (quaternion involution [21])

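The involution of a quaternion $q$ around a pure unit quaternion $\eta$ is given by
\[ q^\eta \triangleq \eta q\eta^{-1} = -\eta q\eta \]
and represents a rotation of $q$ about $\eta$ by $\pi$. Of particular interest to this work are quaternion involutions around the imaginary units $i,j,k$, given by [21]
\[ q^i = -iqi = q_a+iq_b-jq_c-kq_d, \quad q^j = -jqj = q_a-iq_b+jq_c-kq_d, \quad q^k = -kqk = q_a-iq_b-jq_c+kq_d, \]
which allows us to express the four real-valued components of a quaternion $q$ as [13,21]
\[ q_a = \tfrac14(q+q^i+q^j+q^k), \quad q_b = -\tfrac{i}{4}(q+q^i-q^j-q^k), \quad q_c = -\tfrac{j}{4}(q-q^i+q^j-q^k), \quad q_d = -\tfrac{k}{4}(q-q^i-q^j+q^k). \]
This is analogous to the complex case, where $x=\tfrac12(z+z^*)$ and $y=-(i/2)(z-z^*)$ for any $z=x+iy\in\mathbb{C}$ [22]. Note that the quaternion conjugation operator $(\cdot)^*$ is also an involution and can be written in terms of $q, q^i, q^j$ and $q^k$ as
\[ q^* = \tfrac12(q^i+q^j+q^k-q). \]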
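As a small illustration of how the involutions recover the real components, the sketch below assumes the involution around a pure unit quaternion $\eta$ is $q^\eta=-\eta q\eta$. The helper `involution` and the sample value of `q` are illustrative only.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

ONE, I, J, K = np.eye(4)

def involution(q, eta):
    """q^eta = -eta q eta for a pure unit quaternion eta (assumed definition)."""
    return -qmul(qmul(eta, q), eta)

q = np.array([0.3, 1.5, -1.0, 2.0])
qi, qj, qk = involution(q, I), involution(q, J), involution(q, K)

# Each line returns a quaternion whose only non-zero entry is the real part.
q_a = (q + qi + qj + qk) / 4
q_b = qmul(-I, q + qi - qj - qk) / 4
q_c = qmul(-J, q - qi + qj - qk) / 4
q_d = qmul(-K, q - qi - qj + qk) / 4

# Conjugate expressed through the involutions: q* = (q^i + q^j + q^k - q) / 2
conj_from_involutions = (qi + qj + qk - q) / 2
```

Because $q, q^i, q^j, q^k$ determine the four real components (and vice versa), a function of $(q_a,q_b,q_c,q_d)$ can equally be regarded as a function of the four involutions, which is the starting point of the HR calculus.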

Analytic functions in $\mathbb{H}$

To arrive at the notion of analytic (regular, monogenic) function in $\mathbb{H}$, recall that due to the non-commutativity of quaternion product, there are two ways to express the quotient in the definition of quaternion derivative, as shown below.

Proposition 2.3 (Sudbery [10])

Let $D\subseteq\mathbb{H}$ be a simply connected domain of definition of the function $f:D\to\mathbb{H}$. If for any $q\in D$
\[ \lim_{h\to 0}\left[(f(q+h)-f(q))\,h^{-1}\right] \tag{2.10} \]
exists in $\mathbb{H}$, then necessarily $f(q)=\omega q+\lambda$ for some $\omega,\lambda\in\mathbb{H}$. If for any $q\in D$
\[ \lim_{h\to 0}\left[h^{-1}(f(q+h)-f(q))\right] \tag{2.11} \]
exists in $\mathbb{H}$, then necessarily $f(q)=q\nu+\lambda$ for some $\nu,\lambda\in\mathbb{H}$. Proposition 2.3 is discussed in [10,23] and indicates that the traditional definitions of derivative in (2.10) and (2.11) are too restrictive and apply only to linear functions of quaternions. One attempt to relax this constraint is due to Fueter [24,25], whose analyticity condition, termed the Cauchy–Riemann–Fueter (CRF) equation, is given in [10,11]. The limitations of the CRF condition were pointed out by Gentili & Struppa in [14,26], who showed that general polynomial functions (even the identity $f(q)=q$) satisfy neither the left CRF nor the right CRF. To further relax the analyticity condition, a local analyticity condition (LAC) was proposed in [12], by using the polar form of a quaternion with $q=q_a+V_q$ and $V_q=iq_b+jq_c+kq_d$. The theory of local analyticity is now well developed, and we refer the reader to [14,26,27] for the slice regular functions and to [28] for applications. More recent work in this area includes [29,30], and references therein. The advantage of the LAC is that both the polynomial functions of $q$ and some elementary functions satisfy either the left LAC or the right LAC. However, in general, the products and compositions of two LAC functions $f$ and $g$ no longer meet the local analytic condition. For example, if $f(q)=q$ and $g(q)=\omega q$, $\omega\in\mathbb{H}$, then $f$ and $g$ satisfy the left LAC, but the product $fg=q\omega q$ does not satisfy the left LAC. The same applies for the right LAC, when the function $g$ is written as $g(q)=q\omega$. Quaternion derivative is defined only for analytic functions; however, in optimization it is often required that the objective function to be minimized or maximized is real-valued.
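To see why the classical difference quotient is so restrictive, the sketch below evaluates $(f(q+h)-f(q))h^{-1}$ for small increments $h$ along the real axis and along $i$. For the linear map $f(q)=\omega q$ both directions agree, while for $f(q)=q^2$ they do not, so the limit cannot exist. The function name `right_quotient` and all numerical values are illustrative assumptions.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qinv(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(q, q)

ONE, I, J, K = np.eye(4)

def right_quotient(f, q, h):
    """Difference quotient (f(q+h) - f(q)) h^{-1} for a small increment h."""
    return qmul(f(q + h) - f(q), qinv(h))

omega = np.array([0.5, -1.0, 2.0, 0.3])
q = np.array([0.3, 1.5, -1.0, 2.0])
eps = 1e-6

linear = lambda x: qmul(omega, x)   # f(q) = omega q
square = lambda x: qmul(x, x)       # f(q) = q^2

# Linear function: the quotient is omega regardless of the direction of h.
lin_real = right_quotient(linear, q, eps * ONE)
lin_i = right_quotient(linear, q, eps * I)

# Square function: the quotient is q + h q h^{-1} + h, which depends on
# the direction of h (about 2q along 1, about q + q^i along i).
sq_real = right_quotient(square, q, eps * ONE)
sq_i = right_quotient(square, q, eps * I)
```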
A typical example is the mean square type objective function given by Note that according to the definition of analytic (regular) function given in [10-12,14,26,27], the function J is not analytic. In order to take its derivatives (but not limited to only such real quadratic functions), we can, however, use the HR calculus [13] which extends the complex CR calculus [15,17,31] to the quaternion field. This generalization is not trivial, and the rules of the CR and HR calculus are different; for more detail see §3.

Remark 2.4

It is important to note that the left (right) terminology introduced in (2.12), (2.13) and in the sequel differs from that in [10-12,14,27]. We adopt the use of the left (right) terminology based on the position of $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$, rather than on the positions of imaginary units $i,j,k$. Although this is only a notational difference, we later show that the left derivatives (using this convention) in definitions 3.2 and 4.1 are in this way equipped with a left constant rule (4.3), that is, the left constant comes out from the left derivative of product, and the left derivatives stand on the left side of the quaternion differential in (A 12) and (A 16). This also allows for a consistent, intuitive and physically meaningful use of terminology.

The HR calculus

Optimization problems involving quaternions arise in a number of applications in control theory, signal processing, robotics and biomechanics. Solutions often require a first- or second-order approximation of the objective function; however, real functions of quaternion variables are essentially non-analytic. The recently proposed HR calculus [13] solves these issues through the use of quaternion involutions. The HR derivatives (the derivation of HR calculus is given in appendix A) are introduced below.

Definition 3.1 (real-differentiability [10])

A function $f(q)=f_a(q_a,q_b,q_c,q_d)+if_b(q_a,q_b,q_c,q_d)+jf_c(q_a,q_b,q_c,q_d)+kf_d(q_a,q_b,q_c,q_d)$ is called real differentiable when $f_a(q_a,q_b,q_c,q_d)$, $f_b(q_a,q_b,q_c,q_d)$, $f_c(q_a,q_b,q_c,q_d)$ and $f_d(q_a,q_b,q_c,q_d)$ are differentiable as functions of real variables $q_a,q_b,q_c$ and $q_d$.

Definition 3.2 (the HR derivatives [13])

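If $f:\mathbb{H}\to\mathbb{H}$ is real-differentiable, then the formal left HR derivatives of the function $f$ with respect to $\{q,q^i,q^j,q^k\}$ and $\{q^*,q^{i*},q^{j*},q^{k*}\}$ are defined as
\[ \left(\frac{\partial f}{\partial q},\frac{\partial f}{\partial q^i},\frac{\partial f}{\partial q^j},\frac{\partial f}{\partial q^k}\right) = \frac14\left(\frac{\partial f}{\partial q_a},\frac{\partial f}{\partial q_b},\frac{\partial f}{\partial q_c},\frac{\partial f}{\partial q_d}\right)\begin{pmatrix}1&1&1&1\\-i&-i&i&i\\-j&j&-j&j\\-k&k&k&-k\end{pmatrix} \tag{3.1} \]
\[ \left(\frac{\partial f}{\partial q^*},\frac{\partial f}{\partial q^{i*}},\frac{\partial f}{\partial q^{j*}},\frac{\partial f}{\partial q^{k*}}\right) = \frac14\left(\frac{\partial f}{\partial q_a},\frac{\partial f}{\partial q_b},\frac{\partial f}{\partial q_c},\frac{\partial f}{\partial q_d}\right)\begin{pmatrix}1&1&1&1\\i&i&-i&-i\\j&-j&j&-j\\k&-k&-k&k\end{pmatrix} \]
and the formal right HR derivatives of the function $f$ are defined as
\[ \begin{pmatrix}\partial_r f/\partial q\\ \partial_r f/\partial q^i\\ \partial_r f/\partial q^j\\ \partial_r f/\partial q^k\end{pmatrix} = \frac14\begin{pmatrix}1&-i&-j&-k\\1&-i&j&k\\1&i&-j&k\\1&i&j&-k\end{pmatrix}\begin{pmatrix}\partial f/\partial q_a\\ \partial f/\partial q_b\\ \partial f/\partial q_c\\ \partial f/\partial q_d\end{pmatrix}, \qquad \begin{pmatrix}\partial_r f/\partial q^*\\ \partial_r f/\partial q^{i*}\\ \partial_r f/\partial q^{j*}\\ \partial_r f/\partial q^{k*}\end{pmatrix} = \frac14\begin{pmatrix}1&i&j&k\\1&i&-j&-k\\1&-i&j&-k\\1&-i&-j&k\end{pmatrix}\begin{pmatrix}\partial f/\partial q_a\\ \partial f/\partial q_b\\ \partial f/\partial q_c\\ \partial f/\partial q_d\end{pmatrix}, \]
where $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$.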

Remark 3.3

It is important to note that the right HR derivatives exist if and only if the left HR derivatives also exist. The only difference between the left HR derivatives and the right HR derivatives is in the position of the partial derivatives $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$. Within the left HR derivatives, $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ stand on the left side and imaginary units $i,j,k$ on the right side (cf. (3.1)). It is exactly the opposite case for right HR derivatives. Note that the terms $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ cannot swap positions with the imaginary units $i,j,k$ because of the non-commutative nature of quaternion product.

The validity of the traditional product rule

A straightforward use of the HR derivatives may become too tedious for complicated functions, for example, for the power function $f(q)=q^n$. This is because the HR calculus does not satisfy the traditional product rule which would simplify the calculation. Indeed, for two quaternion functions $f(q)$ and $g(q)$, in general we have $\partial(fg)/\partial q\neq f(\partial g/\partial q)+(\partial f/\partial q)g$. We shall illustrate this difficulty through the following two examples.

Example 3.4

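Find the HR derivative of the function $f:\mathbb{H}\to\mathbb{H}$ given by
\[ f(q) = q^2, \]
where $q=q_a+iq_b+jq_c+kq_d$, $q_a,q_b,q_c,q_d\in\mathbb{R}$. Solution: By definition 3.2, the left HR derivative of $q^2$ becomes
\[ \frac{\partial q^2}{\partial q} = \frac14\left(\frac{\partial q^2}{\partial q_a}-\frac{\partial q^2}{\partial q_b}i-\frac{\partial q^2}{\partial q_c}j-\frac{\partial q^2}{\partial q_d}k\right) = q + R(q) = 2q_a+iq_b+jq_c+kq_d. \]
Alternatively, $q(\partial q/\partial q)+(\partial q/\partial q)q=2q$. This shows the traditional product rule is not valid.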

Example 3.5

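Find the HR derivative of the function $f:\mathbb{H}\to\mathbb{R}$ given by
\[ f(q) = |q|^2 = qq^*. \]
Solution: By definition 3.2, the HR derivative of $|q|^2$ is
\[ \frac{\partial |q|^2}{\partial q} = \frac14\left(2q_a-2q_b i-2q_c j-2q_d k\right) = \frac12 q^*, \]
while $q(\partial q^*/\partial q)+(\partial q/\partial q)q^*=-q/2+q^*$, and thus the traditional product rule is not valid.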
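Both examples can be reproduced numerically. The sketch below assumes the left HR derivative with respect to $q$ has the form $\tfrac14(\partial f/\partial q_a-(\partial f/\partial q_b)i-(\partial f/\partial q_c)j-(\partial f/\partial q_d)k)$ and approximates the real partial derivatives by central differences; `partials` and `left_hr` are illustrative helper names.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

ONE, I, J, K = np.eye(4)

def partials(f, q, h=1e-5):
    """Central-difference partial derivatives w.r.t. q_a, q_b, q_c, q_d."""
    return [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]

def left_hr(f, q):
    """Left HR derivative df/dq: partial derivatives on the left, units on the right."""
    fa, fb, fc, fd = partials(f, q)
    return 0.25 * (fa - qmul(fb, I) - qmul(fc, J) - qmul(fd, K))

q = np.array([0.3, 1.5, -1.0, 2.0])

# Example 3.4: f(q) = q^2
square = lambda x: qmul(x, x)
d_square = left_hr(square, q)           # q + R(q)
naive_square = 2 * q                    # what the traditional product rule gives

# Example 3.5: f(q) = |q|^2, stored as a quaternion with zero vector part
mod2 = lambda x: np.array([np.dot(x, x), 0.0, 0.0, 0.0])
d_mod2 = left_hr(mod2, q)               # q*/2
naive_mod2 = -0.5 * q + qconj(q)        # q(dq*/dq) + (dq/dq)q*
```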

Remark 3.6

Examples 3.4 and 3.5 show that the left HR derivatives do not admit the traditional product rule. Similarly, the traditional product rule is not applicable for the right HR derivative in definition 3.2.

The generalization of HR calculus: generalized HR derivatives

We now introduce the novel GHR derivatives which comprise both the product and chain rules. This is achieved by replacing the basis $\{1,i,j,k\}$ in definition 3.2 with a general orthogonal basis $\{1,i^\mu,j^\mu,k^\mu\}$, see also (2.5). The derivation of the GHR calculus is similar to that of the HR calculus in appendix A and is omitted for space considerations.

Definition 4.1 (the generalized HR derivatives)

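If $f:\mathbb{H}\to\mathbb{H}$ is real-differentiable, then the left GHR derivatives of the function $f$ with respect to $q^\mu$ and $q^{\mu*}$ are defined as
\[ \frac{\partial f}{\partial q^\mu} = \frac14\left(\frac{\partial f}{\partial q_a}-\frac{\partial f}{\partial q_b}i^\mu-\frac{\partial f}{\partial q_c}j^\mu-\frac{\partial f}{\partial q_d}k^\mu\right), \qquad \frac{\partial f}{\partial q^{\mu*}} = \frac14\left(\frac{\partial f}{\partial q_a}+\frac{\partial f}{\partial q_b}i^\mu+\frac{\partial f}{\partial q_c}j^\mu+\frac{\partial f}{\partial q_d}k^\mu\right), \tag{4.1} \]
while the right GHR derivatives are defined as
\[ \frac{\partial_r f}{\partial q^\mu} = \frac14\left(\frac{\partial f}{\partial q_a}-i^\mu\frac{\partial f}{\partial q_b}-j^\mu\frac{\partial f}{\partial q_c}-k^\mu\frac{\partial f}{\partial q_d}\right), \qquad \frac{\partial_r f}{\partial q^{\mu*}} = \frac14\left(\frac{\partial f}{\partial q_a}+i^\mu\frac{\partial f}{\partial q_b}+j^\mu\frac{\partial f}{\partial q_c}+k^\mu\frac{\partial f}{\partial q_d}\right), \tag{4.2} \]
where $q=q_a+iq_b+jq_c+kq_d$, $q_a,q_b,q_c,q_d\in\mathbb{R}$, $\mu$ is a non-zero quaternion, $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$, while the set $\{1,i^\mu,j^\mu,k^\mu\}$ is an orthogonal basis of $\mathbb{H}$.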
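A numerical sketch of the GHR derivatives is given below. It assumes the left derivative places the real partial derivatives to the left of the rotated units $i^\mu,j^\mu,k^\mu$ (and the right derivative places them to the right), and compares the result against the Table 1 entries for $\omega q\nu+\lambda$ and $|q|^2$. The function `ghr` and all numerical values are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rot(q, mu):
    """q^mu = mu q mu^{-1}"""
    return qmul(qmul(mu, q), qconj(mu)) / np.dot(mu, mu)

ONE, I, J, K = np.eye(4)

def partials(f, q, h=1e-5):
    """Central-difference partial derivatives w.r.t. q_a, q_b, q_c, q_d."""
    return [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]

def ghr(f, q, mu, conj=False, side="left"):
    """GHR derivative w.r.t. q^mu (or q^{mu*} if conj=True), left or right version."""
    fa, fb, fc, fd = partials(f, q)
    sign = 1.0 if conj else -1.0
    units = (rot(I, mu), rot(J, mu), rot(K, mu))
    if side == "left":    # partial derivatives on the left of the units
        terms = [qmul(d, u) for d, u in zip((fb, fc, fd), units)]
    else:                 # partial derivatives on the right of the units
        terms = [qmul(u, d) for d, u in zip((fb, fc, fd), units)]
    return 0.25 * (fa + sign * sum(terms))

omega = np.array([0.5, -1.0, 2.0, 0.3])
nu = np.array([1.0, 0.2, -0.4, 0.6])
lam = np.array([0.1, 0.2, 0.3, 0.4])
mu = np.array([0.5, 1.0, -0.7, 0.2])
q = np.array([0.3, 1.5, -1.0, 2.0])

affine = lambda x: qmul(qmul(omega, x), nu) + lam       # omega q nu + lambda
got = qmul(ghr(affine, q, mu), mu)                      # (df/dq^mu) mu
expected = omega * qmul(nu, mu)[0]                      # omega R(nu mu)
got_conj = qmul(ghr(affine, q, mu, conj=True), mu)      # (df/dq^{mu*}) mu
expected_conj = -0.5 * qmul(omega, qconj(qmul(nu, mu)))  # -(1/2) omega (nu mu)*

mod2 = lambda x: np.array([np.dot(x, x), 0.0, 0.0, 0.0])  # real-valued |q|^2
left_mod2 = ghr(mod2, q, mu)
right_mod2 = ghr(mod2, q, mu, side="right")
expected_mod2 = 0.5 * qmul(mu, qconj(q))                # (d|q|^2/dq^mu) mu
```

For the real-valued function the left and right versions coincide, because real partial derivatives commute with the rotated units.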

Remark 4.2

The GHR derivatives are more concise and physically more intuitive than the HR derivatives, which are a special case of the GHR derivatives for $\mu\in\{1,i,j,k\}$ (definitions 3.2 and 4.1). The concept of the GHR derivative can also be applied to other orthogonal systems, such as {1,η,η′,η′′} in [32].

Proposition 4.3 (constant rule)

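Let $f:\mathbb{H}\to\mathbb{H}$ be real-differentiable. Then, the following holds:
\[ \frac{\partial(\nu f)}{\partial q^\mu} = \nu\frac{\partial f}{\partial q^\mu}, \qquad \frac{\partial(f\nu)}{\partial q^\mu} = \frac{\partial f}{\partial q^{\nu\mu}}\nu \tag{4.3} \]
and
\[ \frac{\partial_r(f\nu)}{\partial q^\mu} = \frac{\partial_r f}{\partial q^\mu}\nu, \qquad \frac{\partial_r(\nu f)}{\partial q^\mu} = \nu\frac{\partial_r f}{\partial q^{(\nu^{-1}\mu)}}, \tag{4.4} \]
where $\nu$ and $\mu$ are non-zero quaternion constants.

Proof.

By the definition of the left GHR derivative in (4.1), we have
\[ \frac{\partial(f\nu)}{\partial q^\mu} = \frac14\left(\frac{\partial f}{\partial q_a}\nu-\frac{\partial f}{\partial q_b}\nu i^\mu-\frac{\partial f}{\partial q_c}\nu j^\mu-\frac{\partial f}{\partial q_d}\nu k^\mu\right). \]
Using the equality $(q^\mu)^\nu=q^{\nu\mu}$ in (2.4), so that $\nu i^\mu=i^{\nu\mu}\nu$ (and likewise for $j^\mu$ and $k^\mu$), the second equality of (4.3) is proved as
\[ \frac{\partial(f\nu)}{\partial q^\mu} = \frac14\left(\frac{\partial f}{\partial q_a}-\frac{\partial f}{\partial q_b}i^{\nu\mu}-\frac{\partial f}{\partial q_c}j^{\nu\mu}-\frac{\partial f}{\partial q_d}k^{\nu\mu}\right)\nu = \frac{\partial f}{\partial q^{\nu\mu}}\nu. \]
Hence, (4.3) immediately follows, and (4.4) can be proved in a similar way. ▪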

Remark 4.4

It is important to note that if a function $f$ is premultiplied by a constant $\nu$ in the first equality of (4.3), then the left GHR derivative of the product is equal to the left GHR derivative of $f$ premultiplied by the constant, but not for postmultiplication. In other words, the left constant $\nu$ can come out from the derivative of the product; for this reason we refer to (4.1) as the left GHR derivative. The equalities in (4.4) complement those in (4.3). Thus, we refer to the derivatives in (4.2) as the right GHR derivatives, denoted by $\partial_r$ in order to distinguish them from the left GHR derivatives in (4.1).

Proposition 4.5 (conjugate rule)

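Let $f:\mathbb{H}\to\mathbb{H}$ be real-differentiable. Then, the following holds:
\[ \left(\frac{\partial f}{\partial q^\mu}\right)^* = \frac{\partial_r f^*}{\partial q^{\mu*}}, \qquad \left(\frac{\partial f}{\partial q^{\mu*}}\right)^* = \frac{\partial_r f^*}{\partial q^{\mu}}. \tag{4.7} \]
Proof. By the definition of the right GHR derivative in (4.2), we have
\[ \frac{\partial_r f^*}{\partial q^{\mu*}} = \frac14\left(\frac{\partial f^*}{\partial q_a}+i^\mu\frac{\partial f^*}{\partial q_b}+j^\mu\frac{\partial f^*}{\partial q_c}+k^\mu\frac{\partial f^*}{\partial q_d}\right). \]
Using the equalities $(pq)^*=q^*p^*$ and $(q^*)^\mu=(q^\mu)^*$ in (2.4), the above equality becomes
\[ \frac{\partial_r f^*}{\partial q^{\mu*}} = \frac14\left(\frac{\partial f}{\partial q_a}-\frac{\partial f}{\partial q_b}i^\mu-\frac{\partial f}{\partial q_c}j^\mu-\frac{\partial f}{\partial q_d}k^\mu\right)^* = \left(\frac{\partial f}{\partial q^\mu}\right)^*. \]
Hence, the first part of (4.7) follows, and the second part can be proved in a similar way. ▪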

Proposition 4.6 (rotation rule)

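Let $f:\mathbb{H}\to\mathbb{H}$ be real-differentiable. Then the following holds:
\[ \left(\frac{\partial f}{\partial q^\mu}\right)^\eta = \frac{\partial f^\eta}{\partial q^{\eta\mu}}, \qquad \left(\frac{\partial f}{\partial q^{\mu*}}\right)^\eta = \frac{\partial f^\eta}{\partial q^{\eta\mu*}}. \tag{4.10} \]
Proof. Using the equalities in (2.4) and the left GHR derivative in (4.1), we have
\[ \left(\frac{\partial f}{\partial q^\mu}\right)^\eta = \frac14\left(\frac{\partial f^\eta}{\partial q_a}-\frac{\partial f^\eta}{\partial q_b}i^{\eta\mu}-\frac{\partial f^\eta}{\partial q_c}j^{\eta\mu}-\frac{\partial f^\eta}{\partial q_d}k^{\eta\mu}\right) = \frac{\partial f^\eta}{\partial q^{\eta\mu}}. \]
Hence, the first part of (4.10) follows, and the second part can be proved in a similar way. ▪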

Corollary 4.7

Let be real-differentiable. Then, the following holds: Since ∂f/∂q=∂f/∂q for any η∈{1,i,j,k}, then the proof follows directly from proposition 4.6. ▪

Proposition 4.8

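Let $f:\mathbb{H}\to\mathbb{R}$ be real-differentiable. Then, the following holds:
\[ \left(\frac{\partial f}{\partial q^\mu}\right)^* = \frac{\partial f}{\partial q^{\mu*}} = \frac{\partial_r f}{\partial q^{\mu*}}, \qquad \left(\frac{\partial f}{\partial q^{\mu*}}\right)^* = \frac{\partial f}{\partial q^{\mu}} = \frac{\partial_r f}{\partial q^{\mu}}. \tag{4.13} \]
Proof. Since $f$ is real-valued, its partial derivatives $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are real numbers, which yields $(\partial f/\partial\xi)^*=\partial f/\partial\xi$, where $\xi\in\{q_a,q_b,q_c,q_d\}$. Using the equality $(q^\mu)^*=(q^*)^\mu$ in (2.4) and the left GHR derivative in (4.1), we have
\[ \left(\frac{\partial f}{\partial q^\mu}\right)^* = \frac14\left(\frac{\partial f}{\partial q_a}+i^\mu\frac{\partial f}{\partial q_b}+j^\mu\frac{\partial f}{\partial q_c}+k^\mu\frac{\partial f}{\partial q_d}\right) = \frac{\partial_r f}{\partial q^{\mu*}} = \frac{\partial f}{\partial q^{\mu*}}, \]
where the last step holds because real numbers commute with $i^\mu$, $j^\mu$ and $k^\mu$. Hence, the first part of (4.13) follows, and the second part can be proved in a similar way. ▪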

Remark 4.9

From the identity (4.13), observe that the left GHR derivative is equal to the right GHR derivative if the function f is real-valued. This result is instrumental for practical applications of quaternion optimization, where the objective function (or cost function) is frequently real-valued. By virtue of the GHR calculus, the choice of the left/right GHR derivative therefore becomes obsolete as shown in (4.13). In the sequel, without loss of generality we shall mainly focus on the left GHR derivatives.

The novel product rule

We now introduce a novel product rule into quaternion analysis and show that the traditional product rule is a special case of the proposed product rule in corollary 4.11.

Theorem 4.10 (product rule of left GHR)

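If the functions $f,g:\mathbb{H}\to\mathbb{H}$ are real-differentiable, then so too is their product $fg$, so that
\[ \frac{\partial(fg)}{\partial q^\mu} = f\frac{\partial g}{\partial q^\mu}+\frac{\partial f}{\partial q^{g\mu}}g, \qquad \frac{\partial(fg)}{\partial q^{\mu*}} = f\frac{\partial g}{\partial q^{\mu*}}+\frac{\partial f}{\partial q^{g\mu*}}g, \]
where $\partial f/\partial q^{g\mu}$ and $\partial f/\partial q^{g\mu*}$ can be obtained by replacing $\mu$ with $g\mu$ in definition 4.1. The proof is given in appendix B. ▪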
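The product rule can be checked numerically for a concrete pair of functions. The sketch below assumes the rule takes the form $\partial(fg)/\partial q^\mu=f\,\partial g/\partial q^\mu+(\partial f/\partial q^{g\mu})g$, with $g\mu$ evaluated at the current point, and uses $f(q)=q^2$, $g(q)=\omega q+\lambda$. The first test point is chosen so that $g(q)=j$, which makes the gap to the traditional rule easy to see; all names and values are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def rot(q, mu):
    """q^mu = mu q mu^{-1}"""
    mu_inv = mu * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(mu, mu)
    return qmul(qmul(mu, q), mu_inv)

ONE, I, J, K = np.eye(4)

def left_ghr(f, q, mu, h=1e-5):
    """Left GHR derivative df/dq^mu via central differences."""
    fa, fb, fc, fd = [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, rot(I, mu)) - qmul(fc, rot(J, mu))
                   - qmul(fd, rot(K, mu)))

omega, lam = J.copy(), K.copy()
f = lambda x: qmul(x, x)               # f(q) = q^2
g = lambda x: qmul(omega, x) + lam     # g(q) = omega q + lambda
fg = lambda x: qmul(f(x), g(x))

def product_rule(q, mu):
    """f dg/dq^mu + (df/dq^{g mu}) g, with g mu evaluated at q."""
    return (qmul(f(q), left_ghr(g, q, mu))
            + qmul(left_ghr(f, q, qmul(g(q), mu)), g(q)))

def traditional_rule(q, mu):
    """f dg/dq^mu + (df/dq^mu) g, which ignores the rotation by g."""
    return qmul(f(q), left_ghr(g, q, mu)) + qmul(left_ghr(f, q, mu), g(q))

q1, mu1 = np.array([1.0, 1.0, 0.0, 0.0]), ONE          # here g(q1) = j
q2, mu2 = np.array([0.3, 1.5, -1.0, 2.0]), np.array([0.5, 1.0, -0.7, 0.2])

direct1, novel1, trad1 = left_ghr(fg, q1, mu1), product_rule(q1, mu1), traditional_rule(q1, mu1)
direct2, novel2 = left_ghr(fg, q2, mu2), product_rule(q2, mu2)
```

At the first test point the direct derivative equals $2k$, the novel rule reproduces it, and the traditional rule is off by $2j+k$.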

Corollary 4.11

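If the functions $f:\mathbb{H}\to\mathbb{H}$ and $g:\mathbb{H}\to\mathbb{R}$ are real-differentiable, then their product $fg$ satisfies the traditional product rule
\[ \frac{\partial(fg)}{\partial q^\mu} = f\frac{\partial g}{\partial q^\mu}+\frac{\partial f}{\partial q^{\mu}}g, \qquad \frac{\partial(fg)}{\partial q^{\mu*}} = f\frac{\partial g}{\partial q^{\mu*}}+\frac{\partial f}{\partial q^{\mu*}}g, \]
where $\partial f/\partial q^\mu$ and $\partial f/\partial q^{\mu*}$ are the left GHR derivatives in definition 4.1. Proof. Since $q^{g\mu}=q^\mu$ and $q^{g\mu*}=q^{\mu*}$ for a real function $g$, the corollary follows. ▪ We now present some GHR derivatives of nonlinear quaternion functions enabled by the GHR calculus. These are very useful in applications, such as in nonlinear adaptive filters and quaternion-valued neural networks.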

Example 4.12 (split-quaternion function)

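Find the GHR derivative of a split-quaternion function $f:\mathbb{H}\to\mathbb{H}$ given by
\[ f(q) = \phi(q_a)+i\phi(q_b)+j\phi(q_c)+k\phi(q_d), \]
where $q=q_a+iq_b+jq_c+kq_d$ and $\phi:\mathbb{R}\to\mathbb{R}$ is a real-valued differentiable function. Solution: By the definition of the left GHR derivatives in (4.1), it follows that
\[ \frac{\partial f}{\partial q^\mu} = \frac14\left(\phi'(q_a)-i\phi'(q_b)i^\mu-j\phi'(q_c)j^\mu-k\phi'(q_d)k^\mu\right) \]
and
\[ \frac{\partial f}{\partial q^{\mu*}} = \frac14\left(\phi'(q_a)+i\phi'(q_b)i^\mu+j\phi'(q_c)j^\mu+k\phi'(q_d)k^\mu\right). \]
For $\mu\in\{1,i,j,k\}$ each of the products $ii^\mu$, $jj^\mu$ and $kk^\mu$ equals $\pm1$, which shows that the GHR derivatives of the split-quaternion function are real-valued.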
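The real-valuedness claim can be illustrated with $\phi=\tanh$, a common choice in quaternion neural networks; the choice of $\phi$ is an assumption made here for illustration. The sketch computes the left GHR derivative for $\mu\in\{1,i,j,k\}$ and, for contrast, for $\mu=1+i$, where the derivative picks up a non-zero vector part.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def rot(q, mu):
    """q^mu = mu q mu^{-1}"""
    mu_inv = mu * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(mu, mu)
    return qmul(qmul(mu, q), mu_inv)

ONE, I, J, K = np.eye(4)

def left_ghr(f, q, mu, h=1e-5):
    """Left GHR derivative df/dq^mu via central differences."""
    fa, fb, fc, fd = [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, rot(I, mu)) - qmul(fc, rot(J, mu))
                   - qmul(fd, rot(K, mu)))

# Split-quaternion function: phi applied to each real component separately.
split_tanh = lambda x: np.tanh(x)

q = np.array([0.3, 1.5, -1.0, 2.0])
phi_prime = 1.0 - np.tanh(q) ** 2            # derivative of tanh at each component

derivs = {name: left_ghr(split_tanh, q, mu)
          for name, mu in (("1", ONE), ("i", I), ("j", J), ("k", K))}
max_vector_part = max(np.abs(d[1:]).max() for d in derivs.values())

# For mu = 1 the derivative is the average of the four component slopes.
expected_for_one = 0.25 * phi_prime.sum()

# A rotation that is not in {1, i, j, k}: the derivative is no longer real.
d_generic = left_ghr(split_tanh, q, ONE + I)
```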

Example 4.13 (power function)

Find the GHR derivative of the power function given by where n is any positive integer. Solution: Using the product rule in theorem 4.10, it follows that where the term (∂q/∂q)μ, given in table 1, was used in the last equality. Note that the above expression is recurrent in (∂q/∂q)μ. Upon expanding this expression and using the initial condition , this yields In a similar manner, we have which is equivalent to

Table 1.

Summary of the rules for the left GHR derivatives.

f(q)	(∂f/∂q^μ)μ	(∂f/∂q^{μ*})μ	Note
q	R(μ)	−(1/2)μ*	-
ωq	ωR(μ)	−(1/2)ωμ*	∀ ω∈H
qν	R(νμ)	−(1/2)(νμ)*	∀ ν∈H
ωqν+λ	ωR(νμ)	−(1/2)ω(νμ)*	∀ ω,ν,λ∈H
q*	−(1/2)μ*	R(μ)	-
ωq*	−(1/2)ωμ*	ωR(μ)	∀ ω∈H
q*ν	−(1/2)(νμ)*	R(νμ)	∀ ν∈H
ωq*ν+λ	−(1/2)ω(νμ)*	ωR(νμ)	∀ ω,ν,λ∈H
q^{-1}	−q^{-1}R(q^{-1}μ)	(1/2)q^{-1}μ*(q*)^{-1}	-
(q*)^{-1}	(1/2)(q*)^{-1}μ*q^{-1}	−(q*)^{-1}R((q*)^{-1}μ)	-
(ωqν+λ)^{-1}	−fωR(νfμ)	(1/2)fω(νfμ)*	∀ ω,ν,λ∈H
(ωq*ν+λ)^{-1}	(1/2)fω(νfμ)*	−fωR(νfμ)	∀ ω,ν,λ∈H
q^2	qR(μ)+R(qμ)	−(1/2)qμ*−(1/2)(qμ)*	-
(q*)^2	−(1/2)q*μ*−(1/2)(q*μ)*	q*R(μ)+R(q*μ)	-
(ωqν+λ)^2	gωR(νμ)+ωR(νgμ)	−(1/2)gω(νμ)*−(1/2)ω(νgμ)*	g=ωqν+λ
(ωq*ν+λ)^2	−(1/2)gω(νμ)*−(1/2)ω(νgμ)*	gωR(νμ)+ωR(νgμ)	g=ωq*ν+λ
R(q)	(1/4)μ	(1/4)μ	-
R(ωqν+λ)	(1/4)μνω	(1/4)μω*ν*	∀ ω,ν,λ∈H
R(ωq*ν+λ)	(1/4)μω*ν*	(1/4)μνω	∀ ω,ν,λ∈H
|q|	μq*/(4|q|)	μq/(4|q|)	-
|q|^2	(1/2)μq*	(1/2)μq	-
|ωqν+λ|	μνg*ω/(4|g|)	μω*gν*/(4|g|)	g=ωqν+λ
|ωq*ν+λ|	μω*gν*/(4|g|)	μνg*ω/(4|g|)	g=ωq*ν+λ
|ωqν+λ|^2	(1/2)μνg*ω	(1/2)μω*gν*	g=ωqν+λ
|ωq*ν+λ|^2	(1/2)μω*gν*	(1/2)μνg*ω	g=ωq*ν+λ
q/|q|	R(μ)/|q|−qμq*/(4|q|^3)	−μ*/(2|q|)−qμq/(4|q|^3)	-
q*/|q|	−μ*/(2|q|)−q*μq*/(4|q|^3)	R(μ)/|q|−q*μq/(4|q|^3)	-
(ωqν+λ)/|ωqν+λ|	(ω/(2|g|))R(νμ)+(g/(4|g|^3))ν*(ω*gμ)*	−(ω/(4|g|))(νμ)*−(g/(2|g|^3))ν*R(ω*gμ)	g=ωqν+λ
(ωq*ν+λ)/|ωq*ν+λ|	−(ω/(2|g|))(νμ)*−(f/|g|)(∂|g|/∂q^μ)μ	(ω/|g|)R(νμ)−(f/|g|)(∂|g|/∂q^{μ*})μ	g=ωq*ν+λ
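A few rows of Table 1 can be spot-checked against a finite-difference evaluation of the left GHR derivative. The sketch below covers $q^{-1}$, $|q|$ and $q/|q|$, reading the flattened fractions in the table as, for example, $\mu q^*/(4|q|)$; that reading, the helper names and the sample values are assumptions of this sketch.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qinv(q):
    return qconj(q) / np.dot(q, q)

def rot(q, mu):
    return qmul(qmul(mu, q), qinv(mu))

ONE, I, J, K = np.eye(4)

def left_ghr(f, q, mu, conj=False, h=1e-5):
    """Left GHR derivative w.r.t. q^mu (or q^{mu*}) via central differences."""
    fa, fb, fc, fd = [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]
    s = 1.0 if conj else -1.0
    return 0.25 * (fa + s * (qmul(fb, rot(I, mu)) + qmul(fc, rot(J, mu))
                             + qmul(fd, rot(K, mu))))

q = np.array([0.3, 1.5, -1.0, 2.0])
mu = np.array([0.5, 1.0, -0.7, 0.2])
n = np.linalg.norm(q)

modulus = lambda x: np.array([np.linalg.norm(x), 0.0, 0.0, 0.0])
unit = lambda x: x / np.linalg.norm(x)

# Row q^{-1}:  (df/dq^mu) mu = -q^{-1} R(q^{-1} mu)
got_inv = qmul(left_ghr(qinv, q, mu), mu)
exp_inv = -qinv(q) * qmul(qinv(q), mu)[0]

# Row |q|:  (df/dq^mu) mu = mu q* / (4|q|),  (df/dq^{mu*}) mu = mu q / (4|q|)
got_mod = qmul(left_ghr(modulus, q, mu), mu)
exp_mod = qmul(mu, qconj(q)) / (4 * n)
got_mod_conj = qmul(left_ghr(modulus, q, mu, conj=True), mu)
exp_mod_conj = qmul(mu, q) / (4 * n)

# Row q/|q|:  (df/dq^mu) mu = R(mu)/|q| - q mu q* / (4|q|^3)
got_unit = qmul(left_ghr(unit, q, mu), mu)
exp_unit = (mu[0] / n) * ONE - qmul(qmul(q, mu), qconj(q)) / (4 * n**3)
```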


Example 4.14 (exponential function)

Find the GHR derivative of the function given by Solution: From (4.22), it follows that In a similar manner, we have

Remark 4.15

The exponential function is the most important elementary function, as both trigonometric functions and hyperbolic functions can be expressed in terms of the exponential function. The elementary function in example 4.14 is a power series, and it does not change the direction of the vector part of quaternion. Therefore, such elementary functions can swap positions with a quaternion $q$, i.e. $f(q)q=qf(q)$, giving an important property, $f^*(q)=f(q^*)$, which can be used in practical applications, such as quaternion neural networks [33] and quaternion nonlinear adaptive filters [28]. It is important to note that if a quaternion variable $q$ degenerates into a real variable $x$ in the definitions of elementary functions in this subsection, then the GHR derivatives simplify into the standard real derivatives, e.g. the GHR derivative of the power function in (4.22) will become $nx^{n-1}$. Therefore, the GHR derivatives are a generalized form of the real derivatives and the real derivatives are a special case of the GHR derivatives.

The chain rule

Another advantage of the GHR derivatives is that they admit the chain rule, which is formulated in the following theorem.

Theorem 4.16 (chain rule of left GHR)

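Let $S\subseteq\mathbb{H}$ and let $g:S\to\mathbb{H}$ be real-differentiable at an interior point $q$ of the set $S$. Let $T\subseteq\mathbb{H}$ be such that $g(q)\in T$ for all $q\in S$. Assume that $f:T\to\mathbb{H}$ is real-differentiable at an interior point $g(q)\in T$, then the composite function $f(g(q))$ satisfies the following chain rules:
\[ \frac{\partial f(g(q))}{\partial q^\mu} = \sum_\nu\frac{\partial f}{\partial g^\nu}\frac{\partial g^\nu}{\partial q^\mu}, \qquad \frac{\partial f(g(q))}{\partial q^{\mu*}} = \sum_\nu\frac{\partial f}{\partial g^\nu}\frac{\partial g^\nu}{\partial q^{\mu*}} \]
and
\[ \frac{\partial f(g(q))}{\partial q^\mu} = \sum_\nu\frac{\partial f}{\partial g^{\nu*}}\frac{\partial g^{\nu*}}{\partial q^\mu}, \qquad \frac{\partial f(g(q))}{\partial q^{\mu*}} = \sum_\nu\frac{\partial f}{\partial g^{\nu*}}\frac{\partial g^{\nu*}}{\partial q^{\mu*}}, \]
where $\nu\in\{1,i,j,k\}$, and $\partial f/\partial g^\nu=\partial f(g)/\partial g^\nu$ and $\partial f/\partial g^{\nu*}=\partial f(g)/\partial g^{\nu*}$ are the left GHR derivatives. The proof of theorem 4.16 is given in appendix C. ▪ Theorem 4.16 is also valid for complex-valued and real-valued composite functions of quaternion variables, as stated in the following two corollaries, the proofs of which are similar to that of theorem 4.16, and are thus omitted.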
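The first chain rule can be checked numerically. The sketch below assumes the form $\partial f(g(q))/\partial q^\mu=\sum_{\nu\in\{1,i,j,k\}}(\partial f/\partial g^\nu)(\partial g^\nu/\partial q^\mu)$, where $\partial g^\nu/\partial q^\mu$ is the left GHR derivative of the rotated function $q\mapsto(g(q))^\nu$. The test functions $g(q)=\omega q+q^2$ and $f(g)=g\,c\,g$ with a constant quaternion $c$, and all numerical values, are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def rot(q, mu):
    """q^mu = mu q mu^{-1}"""
    mu_inv = mu * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(mu, mu)
    return qmul(qmul(mu, q), mu_inv)

ONE, I, J, K = np.eye(4)

def left_ghr(f, q, mu, conj=False, h=1e-5):
    """Left GHR derivative w.r.t. q^mu (or q^{mu*}) via central differences."""
    fa, fb, fc, fd = [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]
    s = 1.0 if conj else -1.0
    return 0.25 * (fa + s * (qmul(fb, rot(I, mu)) + qmul(fc, rot(J, mu))
                             + qmul(fd, rot(K, mu))))

omega = np.array([0.5, -1.0, 2.0, 0.3])
c = np.array([1.0, 0.2, -0.4, 0.6])
mu = np.array([0.5, 1.0, -0.7, 0.2])
q = np.array([0.4, -0.3, 0.5, 0.2])

g = lambda x: qmul(omega, x) + qmul(x, x)      # inner function
f = lambda y: qmul(qmul(y, c), y)              # outer function
composite = lambda x: f(g(x))

def chain_rule(conj):
    """Sum over eta in {1, i, j, k} of (df/dg^eta)(dg^eta/dq^mu)."""
    total = np.zeros(4)
    for eta in (ONE, I, J, K):
        outer = left_ghr(f, g(q), eta)                         # df/dg^eta at g(q)
        inner = left_ghr(lambda x, eta=eta: rot(g(x), eta), q, mu, conj=conj)
        total += qmul(outer, inner)
    return total

direct = left_ghr(composite, q, mu)
via_chain = chain_rule(conj=False)
direct_conj = left_ghr(composite, q, mu, conj=True)
via_chain_conj = chain_rule(conj=True)
```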

Corollary 4.17

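Let $S\subseteq\mathbb{H}$ and let $g:S\to\mathbb{C}$ be real-differentiable at an interior point $q$ of the set $S$. Let $T\subseteq\mathbb{C}$ be such that $g(q)\in T$ for all $q\in S$. Assume that $f:T\to\mathbb{C}$ is real-differentiable at an interior point $g(q)\in T$, then the left GHR derivatives of the composite function $f(g(q))$ are as follows:
\[ \frac{\partial f(g(q))}{\partial q^\mu} = \frac{\partial f}{\partial g}\frac{\partial g}{\partial q^\mu}+\frac{\partial f}{\partial g^*}\frac{\partial g^*}{\partial q^\mu}, \qquad \frac{\partial f(g(q))}{\partial q^{\mu*}} = \frac{\partial f}{\partial g}\frac{\partial g}{\partial q^{\mu*}}+\frac{\partial f}{\partial g^*}\frac{\partial g^*}{\partial q^{\mu*}}, \]
where $\partial f/\partial g=\partial f(g)/\partial g$ and $\partial f/\partial g^*=\partial f(g)/\partial g^*$ are the complex CR derivatives within the CR calculus.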

Corollary 4.18

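Let $S\subseteq\mathbb{H}$ and let $g:S\to\mathbb{R}$ be real-differentiable at an interior point $q$ of the set $S$. Let $T\subseteq\mathbb{R}$ be such that $g(q)\in T$ for all $q\in S$. Assume that $f:T\to\mathbb{R}$ is real-differentiable at an interior point $g(q)\in T$, then the left GHR derivatives of the composite function $f(g(q))$ are as follows:
\[ \frac{\partial f(g(q))}{\partial q^\mu} = f'(g)\frac{\partial g}{\partial q^\mu}, \qquad \frac{\partial f(g(q))}{\partial q^{\mu*}} = f'(g)\frac{\partial g}{\partial q^{\mu*}}, \]
where $f'(g)$ is the real derivative of a real-valued function.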

Theorem 4.19 (chain rule of right GHR)

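Let $S\subseteq\mathbb{H}$ and let $g:S\to\mathbb{H}$ be real-differentiable at an interior point $q$ of the set $S$. Let $T\subseteq\mathbb{H}$ be such that $g(q)\in T$ for all $q\in S$. Assume that $f:T\to\mathbb{H}$ is real-differentiable at an interior point $g(q)\in T$, then the right GHR derivatives of the composite function $f(g(q))$ are as follows:
\[ \frac{\partial_r f(g(q))}{\partial q^\mu} = \sum_\nu\frac{\partial_r g^\nu}{\partial q^\mu}\frac{\partial_r f}{\partial g^\nu}, \qquad \frac{\partial_r f(g(q))}{\partial q^{\mu*}} = \sum_\nu\frac{\partial_r g^\nu}{\partial q^{\mu*}}\frac{\partial_r f}{\partial g^\nu} \]
and
\[ \frac{\partial_r f(g(q))}{\partial q^\mu} = \sum_\nu\frac{\partial_r g^{\nu*}}{\partial q^\mu}\frac{\partial_r f}{\partial g^{\nu*}}, \qquad \frac{\partial_r f(g(q))}{\partial q^{\mu*}} = \sum_\nu\frac{\partial_r g^{\nu*}}{\partial q^{\mu*}}\frac{\partial_r f}{\partial g^{\nu*}}, \]
where $\nu\in\{1,i,j,k\}$, and $\partial_r f/\partial g^\nu=\partial_r f(g)/\partial g^\nu$ and $\partial_r f/\partial g^{\nu*}=\partial_r f(g)/\partial g^{\nu*}$ are the right GHR derivatives. The proof of theorem 4.19 is similar to that of theorem 4.16 and is thus omitted. ▪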

Mean value theorem

The mean value theorem is one of the most important tools in calculus, and we next introduce its compact version for general quaternion-valued functions of quaternion variables.

Theorem 4.20 (mean value theorem of left kind)

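Consider a continuous function $f:S\subseteq\mathbb{H}\to\mathbb{H}$ for which the left GHR derivatives exist and are continuous in the set $S$. Then, for any $q_0,q_1\in S$ for which the segment joining them also lies in $S$, we have
\[ f(q_1)-f(q_0) = \sum_\mu\int_0^1\frac{\partial f(q_0+t\lambda)}{\partial q^\mu}\,dt\,\lambda^\mu = \sum_\mu\int_0^1\frac{\partial f(q_0+t\lambda)}{\partial q^{\mu*}}\,dt\,\lambda^{\mu*}, \tag{4.34} \]
where $\mu\in\{1,i,j,k\}$, $\lambda=q_1-q_0$, and $\partial f(q_0+t\lambda)/\partial q^\mu=\partial f(q)/\partial q^\mu|_{q=q_0+t\lambda}$ is the left GHR derivative as in definition 4.1. Proof. Denote $F(t)=f(g(t))$, where $g(t)=q_0+t\lambda$ and $0\le t\le1$. Then $F(t)$ is continuous on $[0,1]$ and has derivatives in $(0,1)$. By theorem 4.16, the derivative of $F(t)$ is
\[ F'(t) = \sum_\mu\frac{\partial f(q_0+t\lambda)}{\partial q^\mu}\lambda^\mu. \tag{4.35} \]
Upon substituting (4.35) into $F(1)-F(0)=\int_0^1F'(t)\,dt$, with $F(0)=f(q_0)$ and $F(1)=f(q_1)$, the theorem follows. The second equality can be proved in a similar manner. ▪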

Corollary 4.21

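Consider a continuous function $f:S\subseteq\mathbb{H}\to\mathbb{R}$ for which the left GHR derivatives exist and are continuous in the set $S$. Then, for any $q_0,q_1\in S$ for which the segment joining them also lies in $S$, we have
\[ f(q_1)-f(q_0) = 4R\left(\int_0^1\frac{\partial f(q_0+t\lambda)}{\partial q}\,dt\,\lambda\right) = 4R\left(\int_0^1\frac{\partial f(q_0+t\lambda)}{\partial q^*}\,dt\,\lambda^*\right), \]
where $\lambda=q_1-q_0$ and $\partial f(q_0+t\lambda)/\partial q=\partial f(q)/\partial q|_{q=q_0+t\lambda}$ is the left GHR derivative as in definition 4.1. Proof. Function $f$ is real-valued, and therefore $f^\eta=f$, where $\eta\in\{1,i,j,k\}$. From (2.3) and (4.10), we now have
\[ \frac{\partial f}{\partial q^\eta}\lambda^\eta = \left(\frac{\partial f}{\partial q}\right)^\eta\lambda^\eta = \left(\frac{\partial f}{\partial q}\lambda\right)^\eta. \]
The corollary then follows from (2.7) and theorem 4.20, while the second equality can be derived in the same way from the conjugate derivatives. ▪ For $\lambda$ which is sufficiently small in the modulus, the right-hand side of (4.34) can be approximated as
\[ f(q_0+\lambda)-f(q_0) \approx \sum_\mu\frac{\partial f(q_0)}{\partial q^\mu}\lambda^\mu. \]
If the left GHR derivatives of $f$ are Lipschitz continuous in the vicinity of $q_0$ and $q_1$, with the Lipschitz constant $L$, the error in this approximation is of the order of $L|\lambda|^2$.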
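The mean value theorem and its real-valued corollary can be checked by numerical integration. The sketch below assumes the forms $f(q_1)-f(q_0)=\sum_\mu\int_0^1(\partial f/\partial q^\mu)\,dt\,\lambda^\mu$ for quaternion-valued $f$ and $f(q_1)-f(q_0)=4R(\int_0^1(\partial f/\partial q)\,dt\,\lambda)$ for real-valued $f$, and uses $f(q)=q^2$ and $f(q)=|q|^2$, for which the integrands are linear in $t$, so a midpoint rule is exact up to rounding. All names and values are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def rot(q, mu):
    """q^mu = mu q mu^{-1}"""
    mu_inv = mu * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(mu, mu)
    return qmul(qmul(mu, q), mu_inv)

ONE, I, J, K = np.eye(4)

def left_ghr(f, q, mu, h=1e-5):
    """Left GHR derivative df/dq^mu via central differences."""
    fa, fb, fc, fd = [(f(q + h*e) - f(q - h*e)) / (2*h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, rot(I, mu)) - qmul(fc, rot(J, mu))
                   - qmul(fd, rot(K, mu)))

q0 = np.array([0.3, 1.5, -1.0, 2.0])
lam = np.array([0.4, -0.2, 0.7, 0.1])
q1 = q0 + lam
ts = (np.arange(20) + 0.5) / 20          # midpoint rule nodes on [0, 1]

def integral(f, mu):
    """Midpoint approximation of the integral of df/dq^mu along q0 + t*lam."""
    return np.mean([left_ghr(f, q0 + t * lam, mu) for t in ts], axis=0)

# Theorem 4.20 for the quaternion-valued function f(q) = q^2
square = lambda x: qmul(x, x)
lhs = square(q1) - square(q0)
rhs = sum(qmul(integral(square, mu), rot(lam, mu)) for mu in (ONE, I, J, K))

# Corollary 4.21 for the real-valued function f(q) = |q|^2
mod2 = lambda x: np.array([np.dot(x, x), 0.0, 0.0, 0.0])
real_lhs = np.dot(q1, q1) - np.dot(q0, q0)
real_rhs = 4.0 * qmul(integral(mod2, ONE), lam)[0]
```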

Taylor's theorem

We can now introduce a novel, rigorous version of Taylor's theorem for quaternion-valued functions of quaternion variables, as a generalization of the standard univariate Taylor's theorem.

Lemma 4.22 (Apostol [34])

Consider a (k+1)-times continuously differentiable function . If x∈D, then where the remainder R is given by

Theorem 4.23 (Taylor's theorem of left kind)

Consider a third-order continuous real-differentiable function . If q0,q0+λ∈S such that the segment joining them also lies in S, then where μ,ν∈{1,i,j,k}, and ∂2f/∂q∂q and ∂2f/∂q∂q are the second-order left GHR derivatives. Define an auxiliary function g(t)=f(q0+tλ) with 0≤t≤1. Using the chain rule in theorem 4.16, we obtain where μ,ν,η∈{1,i,j,k}. The second-order Taylor polynomial in lemma 4.22 then gives which is equivalent to where This integral contains three factors of λ, while the remaining factors are bounded. Therefore, R2 is of the order |λ|3, making the fraction |R2|/|λ|3 bounded as . Hence, the first equality of the theorem follows, and the second equality can be proved in the same way. ▪

Corollary 4.24

Consider a third-order continuous real-differentiable function . If q0,q0+λ∈S such that the segment joining them also lies in S, then where ν∈{1,i,j,k}, and ∂2f/∂q∂q and ∂2f/∂q∂q* are the second-order left GHR derivatives. Using the result for in table 1 and the chain rule in theorem 4.16, this corollary is proved similarly to the proof of corollary 4.21. ▪

Theorem 4.25 (Taylor's theorem of centre kind)

Consider a third-order continuous real-differentiable function . If q0,q0+λ∈S such that the segment joining them also lies in S, then where μ,ν∈{1,i,j,k}, and ∂2f/∂q∂q and ∂2f/∂q∂q are the second-order left GHR derivatives. Define an auxiliary function g(t)=f(q0+tλ) with 0≤t≤1. Using the chain rule in theorem 4.19 and using the property in (4.13), we obtain Using the constant rule (4.3), we arrive at where μ,ν∈{1,i,j,k}. The rest of the proof is almost the same as that in theorem 4.23 and is thus omitted. ▪

Remark 4.26

The Taylor expansion in theorem 4.23 is concisely expressed using the GHR derivatives. This is different from the Taylor expansion given by Schwartz [18], which decomposes a quaternion q into two mutually perpendicular quaternions in a local coordinate system. In contrast, our approach treats the quaternion q as an augmented quaternion based on quaternion involutions [21]. Schwartz has also stated that his Taylor expansion would cause trouble when the function has terms qωq, where ω is a general quaternion. Note that there are no such restrictions in theorem 4.23, which only requires the real-differentiability condition in definition 3.1, that is, the functions f(q) should admit the partial derivatives with respect to the four real components $q_a,q_b,q_c$ and $q_d$.

Applications of the generalized HR calculus

We now illustrate the utility of the GHR calculus in optimization, statistics, signal processing, machine learning and other application areas.

Derivation of the widely linear quaternion least mean square algorithm

The widely linear QLMS (WL-QLMS) algorithm is based on the quaternion widely linear model y(n)=wT(n)p(n) which deals with the generality of quaternion signals (both proper and improper) [8,22,35], where p=(xT(n),x(n),x(n),x(n))T is the augmented input vector and w=(hT(n),gT(n),uT(n),vT(n))T is the associated weight (parameter) vector. The cost function to be minimized is a real-valued function of quaternion variables where e(n)=d(n)−y(n) is the error between the desired signal d(n) and the filter output y(n). The weight update of WL-QLMS is then given by where α is the step size, (⋅)T denotes the transpose and the gradient defines the direction of the maximum rate of change of J [13,36]. By the product rule within the GHR calculus, given in theorem 4.10, we have where the time index ‘n’ is omitted. Note the convention that ∂f/∂w is a row vector whose nth element is ∂f/∂w. The above GHR derivatives are calculated as where the terms ∂(qν)/∂q* and ∂(ωq*)/∂q are given in table 1 and are used in the last equalities in the expressions above. Substituting (5.4) into (5.3) yields Finally, the update of the adaptive weight vector of WL-QLMS becomes where the constant in (5.5) is absorbed into α.

Remark 5.1

There are many variations of WL-QLMS algorithms, such as the WL-QLMS algorithms based on variants {x*,x,x,x}, {x,x,x,x} and {x*,x,x,x}. Note that if we start from y(n)=w(n)p(n), the final update rule would become w(n+1)=w(n)+αp(n)e*(n).

Derivation of quaternion nonlinear adaptive filtering algorithms

Tools of the GHR calculus allow us to concisely derive quaternion nonlinear adaptive filtering algorithms, a basis for fast-growing area of quaternion learning system. The same real-valued quadratic cost function as in real LMS and complex LMS is used, that is where e(n)=e(n)+ie(n)+je(n)+ke(n), e(n)=d(n)−Φ(s(n)), s(n)=wT(n)x(n), and Φ is the quaternion nonlinearity. The weight update is given by where α>0 is the real step size, (⋅)T denotes the transpose, and the gradient defines the direction of the maximum rate of change of J [13,36]. By using the chain rule in theorem 4.16, the above gradient can be calculated as where time index ‘n’ is omitted for convenience. Note the convention that ∂f/∂w is a row vector whose nth element is ∂f/∂w. Using the term ∂(ωq*)/∂q* in table 1, we have Upon applying the second equality in (4.12) and using the term ∂(ωq*)/∂qμ in table 1, this yields Using the chain rule in corollary 4.18, we have Upon substituting (5.10)–(5.12) into (5.9), we arrive at Finally, the weight update for this quaternion nonlinear adaptive filtering algorithm becomes where the constant 2 in (5.13) is absorbed into the step size α. For illustration, consider an example where Φ is a nonlinear split-quaternion function Φ(s)=φ(s)+iφ(s)+jφ(s)+φ(s) and is a real-valued differentiable function. Then, In a similar manner, we have The weight update for such a quaternion nonlinear adaptive filtering algorithm becomes where the constant in (5.15) and (5.16) is absorbed into the step size α. Another example is when Φ is a quaternion linear function Φ(s)=s, that is φ(x)=x in (5.17). Then, the update of the adaptive weight vector within the QLMS algorithm becomes illustrating the generic nature of the GHR calculus.
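For the linear case $\Phi(s)=s$, a gradient-descent update can be simulated directly. The sketch below is not the paper's code: it assumes the filter output $y=\mathbf{w}^T\mathbf{x}=\sum_m w_mx_m$ and the update $w_m\leftarrow w_m+\alpha\,e\,x_m^*$, which is what the Table 1 entry for $|\omega q\nu+\lambda|^2$ gives for the cost $|e|^2$ once constants are absorbed into $\alpha$. The system-identification setting, filter length, step size and random seed are illustrative choices.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def filter_output(w, x):
    """y = w^T x = sum_m w_m x_m for quaternion vectors of shape (M, 4)."""
    return sum(qmul(wm, xm) for wm, xm in zip(w, x))

rng = np.random.default_rng(0)
M = 2                                     # number of taps (illustrative)
w_true = rng.standard_normal((M, 4))      # unknown system to identify
w = np.zeros((M, 4))                      # adaptive weights
alpha = 0.02                              # step size (illustrative)

initial_error = np.linalg.norm(w - w_true)
for n in range(3000):
    x = rng.standard_normal((M, 4))       # quaternion input vector
    d = filter_output(w_true, x)          # noise-free desired signal
    e = d - filter_output(w, x)           # a priori error e(n) = d(n) - y(n)
    for m in range(M):
        # Assumed update: w_m(n+1) = w_m(n) + alpha * e(n) * conj(x_m(n))
        w[m] += alpha * qmul(e, qconj(x[m]))
final_error = np.linalg.norm(w - w_true)
```

With noise-free data the weight error norm shrinks at every step as long as $\alpha\|\mathbf{x}\|^2<2$, so the weights approach the true system. The order of the factors in `qmul(e, qconj(x[m]))` matters: placing the conjugated input on the left does not give a descent direction for this output model.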

Remark 5.2

From (5.17) and (5.18), we note that quaternion linear and nonlinear adaptive filtering algorithms have been developed in a unified form. This also shows that the GHR calculus gives us much more freedom in the design, as the nonlinear function Φ is not required to satisfy the odd-symmetry condition [28,37]. We can derive many other algorithms, such as the augmented quaternion nonlinear gradient descent algorithm [38], in the same way. In the interest of space, we leave this to the interested reader.

Remark 5.3

The QLMS algorithm (5.18) is different from the QLMS in [4,13], due to the use of a different product rule. The traditional product rule was used in [4,13] to simplify the derivation; however, our examples in §3a illustrate that the traditional product rule is not applicable to the HR derivatives. On the other hand, the chain rules within the GHR calculus result in the QLMS in (5.18), which has the same generic form as that of the complex LMS [39]. For the performance comparison and steady-state analysis of the existing QLMS algorithms, we refer the reader to [40].

Conclusion

A novel and rigorous framework for the efficient computation of quaternion derivatives, referred to as the GHR calculus, has been established. The GHR methodology has been shown to greatly relax the existence conditions for the derivatives of general nonlinear functions of quaternion variables, and to simplify the calculation of quaternion derivatives through its novel product and chain rules. We have shown that, unlike the existing quaternion derivatives, the GHR calculus is general and can be used for both analytic and non-analytic functions of quaternion variables. The core of the GHR calculus is the use of quaternion rotations in order to overcome the non-commutativity of quaternion product, and the use of quaternion involutions to obtain an elegant quaternion basis. Through the analysis and examples, the proposed framework has been shown to allow for real- and complex-valued optimization algorithms to be extended to the quaternion field in a generic, compact and intuitive way. Application case studies in statistical signal processing and learning systems demonstrate the effectiveness of the proposed GHR framework.
