
Sherman's and related inequalities with applications in information theory.

S Ivelić Bradanović1, N Latif2, Đ Pečarić3, J Pečarić4,5.

Abstract

In this paper we give extensions of Sherman's inequality considering the class of convex functions of higher order. As particular cases, we get an extended weighted majorization inequality as well as Jensen's inequality, which have a direct connection to information theory. We use the obtained results to derive new estimates for Shannon's and Rényi's entropy, information energy, and some well-known measures between probability distributions. Using the Zipf-Mandelbrot law, we introduce new functionals to derive some related results.

Keywords:  Abel–Gontscharoff interpolating polynomial; Entropy; Green function; Information theory; Jensen inequality; Majorization inequality; Sherman theorem; Zipf–Mandelbrot law; n-convex function; ϕ-divergence

Year:  2018        PMID: 29720846      PMCID: PMC5915592          DOI: 10.1186/s13660-018-1692-0

Source DB:  PubMed          Journal:  J Inequal Appl        ISSN: 1025-5834            Impact factor:   2.491


Introduction and preliminaries

We start with a brief overview of divided differences and n-convex functions and give some basic results from majorization theory. The nth order divided difference of a function $\phi:[\alpha,\beta]\to\mathbb{R}$ at mutually distinct points $t_0,t_1,\ldots,t_n\in[\alpha,\beta]$ may be defined recursively by

$$[t_i;\phi]=\phi(t_i),\quad i=0,\ldots,n,$$
$$[t_0,\ldots,t_n;\phi]=\frac{[t_1,\ldots,t_n;\phi]-[t_0,\ldots,t_{n-1};\phi]}{t_n-t_0}.$$

The value $[t_0,\ldots,t_n;\phi]$ is independent of the order of the points $t_0,\ldots,t_n$. A function $\phi:[\alpha,\beta]\to\mathbb{R}$ is n-convex on $[\alpha,\beta]$ if

$$[t_0,\ldots,t_n;\phi]\ge 0$$

holds for all choices of $n+1$ distinct points $t_0,\ldots,t_n\in[\alpha,\beta]$.
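To make the recursion concrete, here is a minimal Python sketch (ours, not from the paper; the function names are our own) that evaluates divided differences by the recursion above and tests the n-convexity condition on a sample grid:

```python
# Divided differences via the recursive definition above, and a brute-force
# n-convexity test on a finite grid (illustrative only).
from itertools import combinations

def divided_difference(points, f):
    """[t_0, ..., t_n; f], computed by the recursion in the text."""
    if len(points) == 1:
        return f(points[0])
    return (divided_difference(points[1:], f)
            - divided_difference(points[:-1], f)) / (points[-1] - points[0])

def looks_n_convex(f, n, grid):
    """Check [t_0, ..., t_n; f] >= 0 over all (n+1)-subsets of the grid."""
    return all(divided_difference(list(ts), f) >= -1e-12
               for ts in combinations(grid, n + 1))

grid = [0.25 * k for k in range(1, 13)]
print(looks_n_convex(lambda t: t**4, 4, grid))   # True: t^4 is 4-convex
print(looks_n_convex(lambda t: -t**3, 3, grid))  # False: -t^3 is not 3-convex
```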

Remark 1

From this definition it follows that a 1-convex function is an increasing function and a 2-convex function is just a convex function. If $\phi^{(n)}$ exists, then $\phi$ is n-convex iff $\phi^{(n)}\ge 0$. Also, if $\phi$ is n-convex for $n\ge 2$, then $\phi^{(k)}$ exists and is $(n-k)$-convex for $1\le k\le n-2$. For more information, see [1].

For two vectors $x,y\in\mathbb{R}^n$, let $x_{[i]}$ and $y_{[i]}$ denote the ith largest entries of x and y, respectively. It is well known that

$$\sum_{i=1}^{k}y_{[i]}\le\sum_{i=1}^{k}x_{[i]},\quad k=1,\ldots,n-1,\qquad\sum_{i=1}^{n}y_{[i]}=\sum_{i=1}^{n}x_{[i]},$$

i.e., that x majorizes y, in symbols $y\prec x$, iff $y=xA$ for some doubly stochastic matrix A, i.e., a matrix with nonnegative entries and row and column sums equal to 1. Moreover, $y\prec x$ implies

$$\sum_{i=1}^{n}\phi(y_i)\le\sum_{i=1}^{n}\phi(x_i)$$

for every continuous convex function $\phi$. This result, obtained by Hardy et al. (1929, [2]), is well known as the majorization inequality and plays an important role in the study of majorization theory.

Sherman [3] considered a weighted concept of majorization between two vectors $x=(x_1,\ldots,x_m)\in[\alpha,\beta]^m$ and $y=(y_1,\ldots,y_l)\in[\alpha,\beta]^l$ with nonnegative weights $a=(a_1,\ldots,a_m)$ and $b=(b_1,\ldots,b_l)$. The concept of weighted majorization is defined by the assumption of existence of a matrix $A=(a_{ij})\in M_{lm}(\mathbb{R})$ such that

$$a_{ij}\ge 0\quad\text{for all } i,j, \tag{1.1}$$
$$\sum_{j=1}^{m}a_{ij}=1,\quad i=1,\ldots,l, \tag{1.2}$$
$$y_i=\sum_{j=1}^{m}a_{ij}x_j,\quad i=1,\ldots,l, \tag{1.3}$$
$$a_j=\sum_{i=1}^{l}b_ia_{ij},\quad j=1,\ldots,m. \tag{1.4}$$

A matrix satisfying conditions (1.1) and (1.2) is called a row stochastic matrix. Sherman proved that under conditions (1.1)–(1.4), for every convex function $\phi:[\alpha,\beta]\to\mathbb{R}$, the inequality

$$\sum_{i=1}^{l}b_i\phi(y_i)\le\sum_{j=1}^{m}a_j\phi(x_j) \tag{1.5}$$

holds. We can write conditions (1.3) and (1.4) in the form

$$y=xA^{T},\qquad a=bA, \tag{1.6}$$

where $A^{T}$ denotes the transpose of A. As a special case of Sherman's inequality, when $m=l$ and $a_i=b_i$ for all i, we get the weighted version of the majorization inequality

$$\sum_{i=1}^{m}a_i\phi(y_i)\le\sum_{i=1}^{m}a_i\phi(x_i).$$

Putting $y_1=\cdots=y_m=\frac{1}{A_m}\sum_{i=1}^{m}a_ix_i$, where $A_m=\sum_{i=1}^{m}a_i$, we get Jensen's inequality in the form

$$\phi\Bigg(\frac{1}{A_m}\sum_{i=1}^{m}a_ix_i\Bigg)\le\frac{1}{A_m}\sum_{i=1}^{m}a_i\phi(x_i). \tag{1.7}$$

We can also get Jensen's inequality (1.7) directly from (1.5) by setting $l=1$ and $b_1=1$.

The concept of majorization appears in many different fields of application, particularly in many branches of mathematics. A complete and superb reference on the subject is the monograph [4], and many results from the theory of majorization are directly or indirectly inspired by it. In this paper we give extensions of Sherman's inequality by considering the class of convex functions of higher order. As particular cases, we get extensions of the weighted majorization inequality and of Jensen's inequality, which can be used to derive new estimates for some entropies and measures between probability distributions. Also, we use the Zipf–Mandelbrot law to illustrate the obtained results.
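The following small numerical check (our illustration, not part of the paper) builds a row stochastic matrix, derives y and a from (1.6), and verifies Sherman's inequality (1.5) for a convex function:

```python
# Numerical check of Sherman's inequality (1.5): with b >= 0 and a row
# stochastic A, set y = x A^T and a = b A, and compare the weighted sums.
import numpy as np

rng = np.random.default_rng(0)
m, l = 4, 3
x = rng.uniform(0.0, 5.0, size=m)      # x in [alpha, beta]^m
b = rng.uniform(0.0, 2.0, size=l)      # nonnegative weights
A = rng.uniform(size=(l, m))
A /= A.sum(axis=1, keepdims=True)      # rows sum to 1: row stochastic

y = A @ x                              # (1.3): y_i = sum_j a_ij x_j
a = b @ A                              # (1.4): a_j = sum_i b_i a_ij

phi = lambda t: t**2                   # a convex function
print(np.sum(b * phi(y)) <= np.sum(a * phi(x)))   # True, by (1.5)
```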

Some technical lemmas

In this section we present two technical lemmas giving identities that will be useful in obtaining the main results. Consider the function $G:[\alpha,\beta]\times[\alpha,\beta]\to\mathbb{R}$ defined by

$$G(x,y)=\begin{cases}\dfrac{(x-\beta)(y-\alpha)}{\beta-\alpha}, & \alpha\le y\le x,\\[4pt] \dfrac{(y-\beta)(x-\alpha)}{\beta-\alpha}, & x\le y\le\beta,\end{cases} \tag{2.1}$$

which is Green's function of the boundary value problem $g''(x)=0$, $g(\alpha)=g(\beta)=0$. This function is convex and continuous with respect to both variables x and y. Integration by parts easily yields that, for any function $\phi\in C^{2}([\alpha,\beta])$, the following holds:

$$\phi(x)=\frac{\beta-x}{\beta-\alpha}\phi(\alpha)+\frac{x-\alpha}{\beta-\alpha}\phi(\beta)+\int_{\alpha}^{\beta}G(x,y)\phi''(y)\,dy. \tag{2.2}$$

Applying (2.2) to Sherman's difference $\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)$, we obtain the first identity.
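As a quick sanity check, identity (2.2) can be verified numerically; the following sketch (ours, assuming SciPy is available) reproduces φ(x) from its boundary values and the Green-function integral:

```python
# Numerical verification of identity (2.2) for phi = exp on [0, 2].
import numpy as np
from scipy.integrate import quad

alpha, beta = 0.0, 2.0

def G(x, y):
    # Green's function (2.1) of g'' = 0, g(alpha) = g(beta) = 0
    if y <= x:
        return (x - beta) * (y - alpha) / (beta - alpha)
    return (y - beta) * (x - alpha) / (beta - alpha)

phi, ddphi = np.exp, np.exp            # phi and its second derivative
x = 1.3
linear = ((beta - x) * phi(alpha) + (x - alpha) * phi(beta)) / (beta - alpha)
integral, _ = quad(lambda y: G(x, y) * ddphi(y), alpha, beta)
print(np.isclose(phi(x), linear + integral))   # True
```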

Lemma 1

Let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in\mathbb{R}^m$, and $b\in\mathbb{R}^l$ be such that (1.6) holds for some matrix $A=(a_{ij})\in M_{lm}(\mathbb{R})$ with $\sum_{j=1}^{m}a_{ij}=1$, $i=1,\ldots,l$. Let G be defined by (2.1). Then, for every function $\phi\in C^{2}([\alpha,\beta])$, the following identity holds:

$$\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)=\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)\phi''(t)\,dt. \tag{2.3}$$

Proof

Using (2.2) in Sherman's difference, we have

$$\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)=\frac{\phi(\alpha)}{\beta-\alpha}\Bigg(\sum_{j=1}^{m}a_j(\beta-x_j)-\sum_{i=1}^{l}b_i(\beta-y_i)\Bigg)+\frac{\phi(\beta)}{\beta-\alpha}\Bigg(\sum_{j=1}^{m}a_j(x_j-\alpha)-\sum_{i=1}^{l}b_i(y_i-\alpha)\Bigg)+\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)\phi''(t)\,dt.$$

Since (1.3) and (1.4) hold and the rows of A sum to 1, we have $\sum_{j=1}^{m}a_j=\sum_{i=1}^{l}b_i$ and $\sum_{j=1}^{m}a_jx_j=\sum_{i=1}^{l}b_iy_i$, so the first two terms vanish, i.e., we get identity (2.3). □

We use the Abel–Gontscharoff interpolation for two points with integral remainder to obtain another identity. Let $n,k\in\mathbb{N}$, $n\ge 2$, $0\le k\le n-1$, and $\phi\in C^{n}([\alpha,\beta])$. Then

$$\phi(t)=Q_{n-1}(\alpha,\beta,\phi,t)+\int_{\alpha}^{\beta}G_{n}(t,s)\phi^{(n)}(s)\,ds, \tag{2.4}$$

where $Q_{n-1}$ is the Abel–Gontscharoff interpolating polynomial for two points of degree $n-1$,

$$Q_{n-1}(\alpha,\beta,\phi,t)=\sum_{i=0}^{k}\frac{(t-\alpha)^{i}}{i!}\phi^{(i)}(\alpha)+\sum_{j=0}^{n-k-2}\Bigg[\sum_{i=0}^{j}\frac{(t-\alpha)^{k+1+i}(\alpha-\beta)^{j-i}}{(k+1+i)!\,(j-i)!}\Bigg]\phi^{(k+1+j)}(\beta),$$

and the remainder is given in terms of the Green's function

$$G_{n}(t,s)=\frac{1}{(n-1)!}\begin{cases}\displaystyle\sum_{j=0}^{k}\binom{n-1}{j}(t-\alpha)^{j}(\alpha-s)^{n-j-1}, & \alpha\le s\le t,\\[6pt] \displaystyle-\sum_{j=k+1}^{n-1}\binom{n-1}{j}(t-\alpha)^{j}(\alpha-s)^{n-j-1}, & t\le s\le\beta.\end{cases} \tag{2.5}$$

Further, for $\alpha\le s,t\le\beta$, the following inequalities hold:

$$(-1)^{n-k-1}\frac{\partial^{i}G_{n}(t,s)}{\partial t^{i}}\ge 0,\quad 0\le i\le k,\qquad(-1)^{n-i}\frac{\partial^{i}G_{n}(t,s)}{\partial t^{i}}\ge 0,\quad k+1\le i\le n-1. \tag{2.6}$$

For more information, see [5]. Now we use interpolation (2.4) on $\phi''$ to obtain the second identity.

Lemma 2

Let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in\mathbb{R}^m$, and $b\in\mathbb{R}^l$ be such that (1.6) holds for some matrix $A=(a_{ij})\in M_{lm}(\mathbb{R})$ with $\sum_{j=1}^{m}a_{ij}=1$, $i=1,\ldots,l$. Let $n\ge 3$, $0\le k\le n-3$, and let G, $G_{n-2}$ be defined by (2.1), (2.5), respectively. Then, for every function $\phi\in C^{n}([\alpha,\beta])$, the following identity holds:

$$\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)=\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)Q_{n-3}(\alpha,\beta,\phi'',t)\,dt+\int_{\alpha}^{\beta}\Bigg[\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)G_{n-2}(t,s)\,dt\Bigg]\phi^{(n)}(s)\,ds. \tag{2.7}$$

Proof. If we apply formula (2.4) to the function $\phi''$, which amounts to substituting n by $n-2$ in (2.4), we get

$$\phi''(t)=Q_{n-3}(\alpha,\beta,\phi'',t)+\int_{\alpha}^{\beta}G_{n-2}(t,s)\phi^{(n)}(s)\,ds. \tag{2.8}$$

Using (2.8) in (2.3), we obtain the required result. □

Extensions of Sherman’s inequality

We start this section with an extension of Sherman’s inequality to a more general class of n-convex functions.

Theorem 1

Let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in\mathbb{R}^m$, and $b\in\mathbb{R}^l$ be such that (1.6) holds for some matrix $A=(a_{ij})\in M_{lm}(\mathbb{R})$ with $\sum_{j=1}^{m}a_{ij}=1$, $i=1,\ldots,l$. Let $n\ge 3$, $0\le k\le n-3$, and let G, $G_{n-2}$ be defined by (2.1), (2.5), respectively. If $\phi\in C^{n}([\alpha,\beta])$ is n-convex and

$$\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)G_{n-2}(t,s)\,dt\ge 0,\quad s\in[\alpha,\beta], \tag{3.1}$$

then

$$\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)\ge\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)Q_{n-3}(\alpha,\beta,\phi'',t)\,dt. \tag{3.2}$$

If the reverse inequality in (3.1) holds, then the reverse inequality in (3.2) also holds.

Proof. Under the assumptions of the theorem, identity (2.7) holds. Since $\phi$ is n-convex, we have $\phi^{(n)}\ge 0$ on $[\alpha,\beta]$. Therefore, if (3.1) is satisfied, then inequality (3.2) holds. □

Remark 2

Since, by (2.6), we have $(-1)^{n-k-3}G_{n-2}(t,s)\ge 0$, in case $n-k$ is odd we get $G_{n-2}(t,s)\ge 0$; hence, instead of assumption (3.1), it is enough to assume that

$$\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\ge 0,\quad t\in[\alpha,\beta].$$

The following extension of Sherman's inequality, under Sherman's conditions of nonnegativity of the vectors a, b and of the matrix A, also holds.

Theorem 2

Let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in[0,\infty)^m$, and $b\in[0,\infty)^l$ be such that (1.6) holds for some row stochastic matrix $A\in M_{lm}(\mathbb{R})$. Let $n\ge 3$ and $0\le k\le n-3$ be such that $n-k$ is odd. Let G, $G_{n-2}$ be defined by (2.1), (2.5), respectively, and let $\phi\in C^{n}([\alpha,\beta])$ be n-convex. Then inequality (3.2) holds.

Proof. Since, by (2.6), we have $(-1)^{n-k-3}G_{n-2}(t,s)\ge 0$, then, when $n-k$ is odd, we have $G_{n-2}(t,s)\ge 0$ for $t,s\in[\alpha,\beta]$. Further, $G(\cdot,t)$, $t\in[\alpha,\beta]$, is convex on $[\alpha,\beta]$, and by Sherman's inequality (1.5) we have

$$\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\ge 0,\quad t\in[\alpha,\beta].$$

Combining these two facts, assumption (3.1) is satisfied. Hence, by Theorem 1, inequality (3.2) holds. □

Remark 3

In case $n-k$ is even, the reverse inequality in (3.1) holds, i.e., the reverse inequality in (3.2) holds.

Theorem 3

Let all the assumptions of Theorem 2 be satisfied. If $\phi^{(i)}(\alpha)\ge 0$ for each $i\in\{2,\ldots,k+2\}$ and the derivatives $\phi^{(j)}(\beta)$, $j\in\{k+3,\ldots,n-2\}$, satisfy the sign conditions ensuring $Q_{n-3}(\alpha,\beta,\phi'',t)\ge 0$ for $t\in[\alpha,\beta]$, then

$$\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)\ge\int_{\alpha}^{\beta}\Bigg(\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t)\Bigg)Q_{n-3}(\alpha,\beta,\phi'',t)\,dt\ge 0. \tag{3.3}$$

If the function $F(x)=\int_{\alpha}^{\beta}G(x,t)\,Q_{n-3}(\alpha,\beta,\phi'',t)\,dt$ is convex on $[\alpha,\beta]$, then (3.3) also holds.

Proof. (i) Under these assumptions, the nonnegativity of the right-hand side of (3.2) is obvious, since it is the integral of a product of two nonnegative functions; i.e., the double inequality (3.3) holds. (ii) The right-hand side of (3.2) can be written in the form $\sum_{j=1}^{m}a_jF(x_j)-\sum_{i=1}^{l}b_iF(y_i)$. So, if F is convex, then by Sherman's inequality we have $\sum_{j=1}^{m}a_jF(x_j)-\sum_{i=1}^{l}b_iF(y_i)\ge 0$, i.e., we again get the nonnegativity of the right-hand side of (3.2), which is what we needed to prove. □

Remark 4

Note that inequality (3.3) includes a new lower bound for Sherman's difference, in the form

$$\int_{\alpha}^{\beta}\delta(t)\,Q_{n-3}(\alpha,\beta,\phi'',t)\,dt, \tag{3.4}$$

where

$$\delta(t)=\sum_{j=1}^{m}a_jG(x_j,t)-\sum_{i=1}^{l}b_iG(y_i,t). \tag{3.5}$$

Specially, for $n=4$ and $k=1$, the lower bound has the form

$$\int_{\alpha}^{\beta}\delta(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)\,dt. \tag{3.6}$$

Using the notation $\|\cdot\|_{u}$ for the standard u-norm and applying the well-known Hölder inequality, we obtain the following result.

Theorem 4

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in\mathbb{R}^m$, and $b\in\mathbb{R}^l$ be such that (1.6) holds for some matrix $A=(a_{ij})\in M_{lm}(\mathbb{R})$ with $\sum_{j=1}^{m}a_{ij}=1$, $i=1,\ldots,l$. Let $n\ge 3$, $0\le k\le n-3$, $\phi\in C^{n}([\alpha,\beta])$, and let G, $G_{n-2}$ be defined by (2.1), (2.5), respectively. Then

$$\Bigg|\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)-\int_{\alpha}^{\beta}\delta(t)\,Q_{n-3}(\alpha,\beta,\phi'',t)\,dt\Bigg|\le\big\|\phi^{(n)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta(t)\,G_{n-2}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v},$$

where δ is defined by (3.5).

Proof. Under the assumptions of the theorem, identity (2.7) holds. Applying Hölder's inequality to the second term on the right-hand side of (2.7), we get the stated estimate. □

As a direct consequence of the previous results, choosing $n=4$ and $k=1$, we get the following corollary.

Corollary 1

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G be defined by (2.1), and let $x\in[\alpha,\beta]^m$, $y\in[\alpha,\beta]^l$, $a\in[0,\infty)^m$, and $b\in[0,\infty)^l$ be such that (1.6) holds for some row stochastic matrix $A\in M_{lm}(\mathbb{R})$. If $\phi\in C^{4}([\alpha,\beta])$ is 4-convex, then

$$\Bigg|\sum_{j=1}^{m}a_j\phi(x_j)-\sum_{i=1}^{l}b_i\phi(y_i)-\int_{\alpha}^{\beta}\delta(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)dt\Bigg|\le\big\|\phi^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{3.7}$$

where δ is defined by (3.5) and

$$\widetilde{G}(t,s)=\begin{cases}t-s, & \alpha\le s\le t,\\ 0, & t\le s\le\beta.\end{cases} \tag{3.8}$$

Remark 5

Specially, if we set $m=l$ and $b_i=a_i$ for each $i=1,\ldots,m$, from the previous result, as a direct consequence, we obtain the following extension of the majorization inequality:

$$\Bigg|\sum_{i=1}^{m}a_i\big(\phi(x_i)-\phi(y_i)\big)-\int_{\alpha}^{\beta}\delta_a(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)dt\Bigg|\le\big\|\phi^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_a(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{3.9}$$

where $\delta_a(t)=\sum_{i=1}^{m}a_i\big(G(x_i,t)-G(y_i,t)\big)$.

Remark 6

By setting $l=1$ and $b_1=1$, from (3.7), as a direct consequence, we get the extension of Jensen's inequality

$$\Bigg|\sum_{j=1}^{m}a_j\phi(x_j)-\phi\Bigg(\sum_{j=1}^{m}a_jx_j\Bigg)-\int_{\alpha}^{\beta}\delta_J(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)dt\Bigg|\le\big\|\phi^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_J(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{3.10}$$

where $\sum_{j=1}^{m}a_j=1$ and $\delta_J(t)=\sum_{j=1}^{m}a_jG(x_j,t)-G\big(\sum_{j=1}^{m}a_jx_j,t\big)$.

Applications in information theory

Throughout the rest of the paper, let $p_1,\ldots,p_n$ be positive real numbers such that $\sum_{i=1}^{n}p_i=1$. By X we denote a discrete random variable with distribution $P(X=x_i)=p_i$, $i=1,\ldots,n$, where $p=(p_1,\ldots,p_n)$ is a positive probability distribution, i.e., $p_i>0$, $i=1,\ldots,n$, with $\sum_{i=1}^{n}p_i=1$. The Shannon entropy [6] is defined by

$$S(X)=-\sum_{i=1}^{n}p_i\ln p_i.$$

It is well known that the maximum possible value of $S(X)$ is determined by the size n of the support of X, i.e., the inequality $S(X)\le\ln n$ holds. Furthermore, $S(X)=0$ iff $p_i=1$ for some i, and $S(X)=\ln n$ iff $p_i=1/n$ for all i. Some related results can be found in [7–13].
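A minimal illustration (ours) of the stated bounds $0\le S(X)\le\ln n$:

```python
# Shannon entropy and its maximum ln n, attained by the uniform distribution.
import math

def shannon_entropy(p):
    """S(X) = -sum p_i ln p_i for a positive probability distribution p."""
    return -sum(pi * math.log(pi) for pi in p)

p = [0.5, 0.25, 0.125, 0.125]
n = len(p)
print(shannon_entropy(p) <= math.log(n))                      # True
print(math.isclose(shannon_entropy([1/n] * n), math.log(n)))  # True: uniform case
```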

Corollary 2

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $[\alpha,\beta]\subset(0,\infty)$ and let p be a positive probability distribution with $1/p_i,n\in[\alpha,\beta]$, $i=1,\ldots,n$. Then

$$\Bigg|\ln n-S(X)-\int_{\alpha}^{\beta}\delta_S(t)\Big(\frac{1}{\alpha^{2}}-\frac{2(t-\alpha)}{\alpha^{3}}\Big)dt\Bigg|\le\Bigg(\int_{\alpha}^{\beta}\frac{6^{u}}{x^{4u}}\,dx\Bigg)^{1/u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_S(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{4.1}$$

where $\delta_S(t)=\sum_{i=1}^{n}p_iG(1/p_i,t)-G(n,t)$.

Proof. Substituting $1/p_i$ in place of $x_i$ and $p_i$ in place of $a_i$ in (3.10) and choosing $\phi(x)=-\ln x$ (which is 4-convex on $(0,\infty)$, with $\phi^{(4)}(x)=6/x^{4}$), we obtain (4.1). □

Corollary 3

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let p be a positive probability distribution with $p_i\in[\alpha,\beta]\subset(0,\infty)$, $i=1,\ldots,n$. Then

$$\Bigg|S(X)+\ln\Bigg(\sum_{i=1}^{n}p_i^{2}\Bigg)-\int_{\alpha}^{\beta}\widehat{\delta}(t)\Big(\frac{1}{\alpha^{2}}-\frac{2(t-\alpha)}{\alpha^{3}}\Big)dt\Bigg|\le\Bigg(\int_{\alpha}^{\beta}\frac{6^{u}}{x^{4u}}\,dx\Bigg)^{1/u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\widehat{\delta}(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v},$$

where $\widehat{\delta}(t)=\sum_{i=1}^{n}p_iG(p_i,t)-G\big(\sum_{i=1}^{n}p_i^{2},t\big)$.

Proof. If we substitute $p_i$ in place of $1/p_i$ in (4.1), we get the required result. □

Rényi's entropy [14] of order λ, $\lambda\ge 0$, $\lambda\ne 1$, is defined by

$$H_{\lambda}(X)=\frac{1}{1-\lambda}\ln\Bigg(\sum_{i=1}^{n}p_i^{\lambda}\Bigg).$$

Applying the discrete Jensen inequality to the convex function $\phi(x)=-\ln x$, we have

$$-\ln\Bigg(\sum_{i=1}^{n}p_ix_i\Bigg)\le-\sum_{i=1}^{n}p_i\ln x_i.$$

Substituting $p_i^{\lambda-1}$ in place of $x_i$, we get

$$\ln\Bigg(\sum_{i=1}^{n}p_i^{\lambda}\Bigg)\ge(\lambda-1)\sum_{i=1}^{n}p_i\ln p_i,$$

which is equivalent to $H_{\lambda}(X)\le S(X)$ for $\lambda>1$ and $H_{\lambda}(X)\ge S(X)$ for $0\le\lambda<1$. Specially, we have $H_{\lambda}(X)\le\ln n$, with equality in the case of the uniform distribution, i.e., when $p_i=1/n$, $i=1,\ldots,n$.
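The relations just derived are easy to observe numerically; the following sketch (ours) computes Rényi's entropy and checks $H_{\lambda}(X)\le\ln n$ as well as the comparison with $S(X)$:

```python
# Renyi entropy of order lam (lam >= 0, lam != 1) and the stated comparisons.
import math

def renyi_entropy(p, lam):
    """H_lam(X) = ln(sum p_i^lam) / (1 - lam)."""
    return math.log(sum(pi**lam for pi in p)) / (1.0 - lam)

p = [0.4, 0.3, 0.2, 0.1]
S = -sum(pi * math.log(pi) for pi in p)
for lam in (0.5, 2.0, 5.0):
    print(renyi_entropy(p, lam) <= math.log(len(p)))        # True for each lam
print(renyi_entropy(p, 2.0) <= S <= renyi_entropy(p, 0.5))  # True: H_2 <= S <= H_0.5
```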

Corollary 4

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $\lambda\ge 0$, $\lambda\ne 1$, and let p be a positive probability distribution with $p_i^{\lambda-1}\in[\alpha,\beta]\subset(0,\infty)$, $i=1,\ldots,n$. Then

$$\Bigg|(\lambda-1)\big(S(X)-H_{\lambda}(X)\big)-\int_{\alpha}^{\beta}\delta_{\lambda}(t)\Big(\frac{1}{\alpha^{2}}-\frac{2(t-\alpha)}{\alpha^{3}}\Big)dt\Bigg|\le\Bigg(\int_{\alpha}^{\beta}\frac{6^{u}}{x^{4u}}\,dx\Bigg)^{1/u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_{\lambda}(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v},$$

where $\delta_{\lambda}(t)=\sum_{i=1}^{n}p_iG(p_i^{\lambda-1},t)-G\big(\sum_{i=1}^{n}p_i^{\lambda},t\big)$.

Proof. Substituting $p_i^{\lambda-1}$ in place of $1/p_i$ in (4.1), we obtain the required result. □

The information energy of the random variable X is defined by

$$E(X)=\sum_{i=1}^{n}p_i^{2}.$$
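For orientation, a small sketch (ours) of the information energy and two of its elementary properties: $1/n\le E(X)\le 1$, with $E(X)=1/n$ for the uniform law, and the link $H_{2}(X)=-\ln E(X)$ to Rényi's entropy of order 2:

```python
# Information energy E(X) = sum p_i^2 and its elementary bounds.
def information_energy(p):
    return sum(pi * pi for pi in p)

p = [0.4, 0.3, 0.2, 0.1]
n = len(p)
print(1 / n <= information_energy(p) <= 1)       # True
print(information_energy([1 / n] * n) == 1 / n)  # True: uniform case
```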

Corollary 5

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $[\alpha,\beta]\subset(0,\infty)$ and let p be a positive probability distribution with $1/p_i,n\in[\alpha,\beta]$, $i=1,\ldots,n$. For $\phi(x)=1/x$, we have

$$\Bigg|E(X)-\frac{1}{n}-\int_{\alpha}^{\beta}\delta_S(t)\Big(\frac{2}{\alpha^{3}}-\frac{6(t-\alpha)}{\alpha^{4}}\Big)dt\Bigg|\le\Bigg(\int_{\alpha}^{\beta}\frac{24^{u}}{x^{5u}}\,dx\Bigg)^{1/u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_S(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{4.2}$$

where

$$\delta_S(t)=\sum_{i=1}^{n}p_iG(1/p_i,t)-G(n,t). \tag{4.3}$$

For the remaining admissible choices of φ, the analogous estimate holds, where the kernel is again defined by (4.3).

Proof. (i) Substituting $1/p_i$ in place of $x_i$ and $p_i$ in place of $a_i$ in (3.10), and choosing $\phi(x)=1/x$ (4-convex on $(0,\infty)$, with $\phi^{(4)}(x)=24/x^{5}$), we obtain (4.2). (ii) Substituting in the same way in (3.10) and choosing φ accordingly, we obtain the required result. □

Corollary 6

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let p be a positive probability distribution with $p_i\in[\alpha,\beta]\subset(0,\infty)$, $i=1,\ldots,n$. Then

$$\Bigg|n-\frac{1}{E(X)}-\int_{\alpha}^{\beta}\delta_E(t)\Big(\frac{2}{\alpha^{3}}-\frac{6(t-\alpha)}{\alpha^{4}}\Big)dt\Bigg|\le\Bigg(\int_{\alpha}^{\beta}\frac{24^{u}}{x^{5u}}\,dx\Bigg)^{1/u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_E(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{4.4}$$

where

$$\delta_E(t)=\sum_{i=1}^{n}p_iG(p_i,t)-G\big(E(X),t\big), \tag{4.5}$$

and the analogous second estimate of Corollary 5 holds with the kernel again defined by (4.5).

Proof. (i) Substituting $p_i$ in place of $1/p_i$ in (4.2) and taking into account that $\sum_{i=1}^{n}p_i\cdot p_i=E(X)$, i.e., that the kernel becomes (4.5), we get (4.4). (ii) Similar to (i). □

Let $p=(p_1,\ldots,p_n)$ and $q=(q_1,\ldots,q_n)$ be two positive probability distributions. The following measures are well known in information theory.

Hellinger discrimination:

$$h^{2}(p,q)=\frac{1}{2}\sum_{i=1}^{n}\big(\sqrt{p_i}-\sqrt{q_i}\big)^{2};$$

$\chi^{2}$-divergence:

$$D_{\chi^{2}}(p,q)=\sum_{i=1}^{n}\frac{(p_i-q_i)^{2}}{q_i};$$

Triangular discrimination:

$$\Delta(p,q)=\sum_{i=1}^{n}\frac{(p_i-q_i)^{2}}{p_i+q_i}.$$

In the following results, we consider positive probability distributions p and q under the assumption of existence of a row stochastic matrix $A\in M_{nn}(\mathbb{R})$ such that

$$\mathbf{1}=\widetilde{x}A^{T},\qquad q=qA, \tag{4.6}$$

where $\widetilde{x}=(p_1/q_1,\ldots,p_n/q_n)$ and $\mathbf{1}=(1,\ldots,1)$.
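The three measures are straightforward to compute; here is a minimal sketch (ours) of the standard formulas:

```python
# Hellinger discrimination, chi^2-divergence, and triangular discrimination.
def hellinger2(p, q):
    """h^2(p, q) = (1/2) sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((pi**0.5 - qi**0.5) ** 2 for pi, qi in zip(p, q))

def chi2_divergence(p, q):
    """D_{chi^2}(p, q) = sum (p_i - q_i)^2 / q_i."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def triangular(p, q):
    """Delta(p, q) = sum (p_i - q_i)^2 / (p_i + q_i)."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q))

p, q = [0.4, 0.3, 0.2, 0.1], [0.25] * 4
print(hellinger2(p, q), chi2_divergence(p, q), triangular(p, q))
```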

Corollary 7

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let p, q be positive probability distributions such that $p_i/q_i\in[\alpha,\beta]$, $i=1,\ldots,n$, and (4.6) is satisfied for some row stochastic matrix $A\in M_{nn}(\mathbb{R})$. Then:

(i) for $\phi_1(x)=\frac{1}{2}(\sqrt{x}-1)^{2}$,

$$\Bigg|h^{2}(p,q)-\int_{\alpha}^{\beta}\widetilde{\delta}(t)\big(\phi_1''(\alpha)+(t-\alpha)\phi_1'''(\alpha)\big)dt\Bigg|\le\big\|\phi_1^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\widetilde{\delta}(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{4.7}$$

where

$$\widetilde{\delta}(t)=\sum_{i=1}^{n}q_i\big(G(p_i/q_i,t)-G(1,t)\big); \tag{4.8}$$

(ii) for $\phi_2(x)=(x-1)^{2}$, the same estimate (4.9) holds with $h^{2}(p,q)$ replaced by $D_{\chi^{2}}(p,q)$ and $\phi_1$ replaced by $\phi_2$, where $\widetilde{\delta}$ is defined by (4.8);

(iii) for $\phi_3(x)=\frac{(x-1)^{2}}{x+1}$, the same estimate (4.10) holds with $h^{2}(p,q)$ replaced by $\Delta(p,q)$ and $\phi_1$ replaced by $\phi_3$, where $\widetilde{\delta}$ is defined by (4.8).

Proof. If we substitute $x_i$ by $p_i/q_i$, $y_i$ by 1, and $a_i$ by $q_i$ in (3.9) and

(i) take $\phi=\phi_1$, we obtain (4.7);
(ii) take $\phi=\phi_2$, we obtain (4.9);
(iii) take $\phi=\phi_3$, we obtain (4.10). □

Applications to the Zipf–Mandelbrot law

The Zipf–Mandelbrot law is a discrete probability distribution depending on three parameters $N\in\{1,2,\ldots\}$, $q\in[0,\infty)$, and $s>0$, with probability mass function defined by

$$f(k;N,q,s)=\frac{1}{(k+q)^{s}H_{N,q,s}},\quad k=1,\ldots,N,$$

where

$$H_{N,q,s}=\sum_{j=1}^{N}\frac{1}{(j+q)^{s}}.$$

When $q=0$, we get the so-called Zipf law. The Zipf–Mandelbrot law, as well as Zipf's law, has wide applications in many branches of science, e.g., linguistics [15], information sciences [16, 17], and ecological field studies [18]. For more information, see also [15, 19]. We introduce the following definitions of the Csiszár divergence for the Zipf–Mandelbrot law. For more information about the Csiszár divergence, see [20, 21].
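A small sketch (ours) of the probability mass function just defined; setting q = 0 recovers Zipf's law:

```python
# Zipf-Mandelbrot pmf f(k; N, q, s) = (k + q)^(-s) / H_{N,q,s}, k = 1..N.
def zipf_mandelbrot(N, q, s):
    H = sum((j + q) ** -s for j in range(1, N + 1))  # generalized harmonic sum
    return [((k + q) ** -s) / H for k in range(1, N + 1)]

pmf = zipf_mandelbrot(N=10, q=1.5, s=1.2)
print(abs(sum(pmf) - 1.0) < 1e-12)   # True: a genuine probability distribution
print(pmf[0] > pmf[-1])              # True: probabilities decay with rank
```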

Definition 1

(Csiszár divergence for the Z–M law) Let $N\in\mathbb{N}$ and let $\phi:I\to\mathbb{R}$ be a function on an interval $I\subset\mathbb{R}$. For $q_1,q_2\in[0,\infty)$ and $s_1,s_2>0$ such that

$$\frac{f(i;N,q_1,s_1)}{f(i;N,q_2,s_2)}=\frac{(i+q_2)^{s_2}H_{N,q_2,s_2}}{(i+q_1)^{s_1}H_{N,q_1,s_1}}\in I,\quad i=1,\ldots,N,$$

we define

$$\widetilde{C}(q_1,q_2,s_1,s_2):=\sum_{i=1}^{N}\frac{1}{(i+q_2)^{s_2}H_{N,q_2,s_2}}\,\phi\Bigg(\frac{(i+q_2)^{s_2}H_{N,q_2,s_2}}{(i+q_1)^{s_1}H_{N,q_1,s_1}}\Bigg).$$

Specially, when $q_1=q_2=q$, we have

$$\widetilde{C}(q,s_1,s_2):=\sum_{i=1}^{N}\frac{1}{(i+q)^{s_2}H_{N,q,s_2}}\,\phi\Bigg(\frac{(i+q)^{s_2}H_{N,q,s_2}}{(i+q)^{s_1}H_{N,q,s_1}}\Bigg).$$

For $s_1,s_2>0$ such that $\frac{i^{s_2}H_{N,0,s_2}}{i^{s_1}H_{N,0,s_1}}\in I$, $i=1,\ldots,N$, we define (for Zipf's law, i.e., $q=0$)

$$\widetilde{C}(s_1,s_2):=\sum_{i=1}^{N}\frac{1}{i^{s_2}H_{N,0,s_2}}\,\phi\Bigg(\frac{i^{s_2}H_{N,0,s_2}}{i^{s_1}H_{N,0,s_1}}\Bigg).$$

Corollary 8

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $N\in\mathbb{N}$, $q_1,q_2\ge 0$, and $s_1,s_2>0$ be such that

$$\mathbf{1}=\widetilde{x}A^{T} \tag{5.1}$$

and

$$\widetilde{a}=\widetilde{a}A \tag{5.2}$$

hold for some row stochastic matrix $A\in M_{NN}(\mathbb{R})$, where $\widetilde{x}_i=f(i;N,q_1,s_1)/f(i;N,q_2,s_2)\in[\alpha,\beta]$ and $\widetilde{a}_i=f(i;N,q_2,s_2)$, $i=1,\ldots,N$. Then, for every 4-convex function $\phi\in C^{4}([\alpha,\beta])$, we have

$$\Bigg|\widetilde{C}(q_1,q_2,s_1,s_2)-\phi(1)-\int_{\alpha}^{\beta}\delta_1(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)dt\Bigg|\le\big\|\phi^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_1(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{5.3}$$

where

$$\delta_1(t)=\sum_{i=1}^{N}f(i;N,q_2,s_2)\big(G(\widetilde{x}_i,t)-G(1,t)\big). \tag{5.4}$$

Proof. If we substitute $x_i$ by $\widetilde{x}_i$, $y_i$ by 1, and $a_i$ by $f(i;N,q_2,s_2)$ in (3.9), we obtain the required result. □

Corollary 9

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $N\in\mathbb{N}$, $q\ge 0$, and $s_1,s_2>0$ be such that

$$\mathbf{1}=\widehat{x}A^{T} \tag{5.5}$$

and

$$\widehat{a}=\widehat{a}A \tag{5.6}$$

hold for some row stochastic matrix $A\in M_{NN}(\mathbb{R})$, where $\widehat{x}_i=\frac{(i+q)^{s_2}H_{N,q,s_2}}{(i+q)^{s_1}H_{N,q,s_1}}\in[\alpha,\beta]$ and $\widehat{a}_i=\frac{1}{(i+q)^{s_2}H_{N,q,s_2}}$, $i=1,\ldots,N$. Then, for every 4-convex function $\phi\in C^{4}([\alpha,\beta])$, we have

$$\Bigg|\widetilde{C}(q,s_1,s_2)-\phi(1)-\int_{\alpha}^{\beta}\delta_2(t)\big(\phi''(\alpha)+(t-\alpha)\phi'''(\alpha)\big)dt\Bigg|\le\big\|\phi^{(4)}\big\|_{u}\Bigg(\int_{\alpha}^{\beta}\bigg|\int_{\alpha}^{\beta}\delta_2(t)\widetilde{G}(t,s)\,dt\bigg|^{v}ds\Bigg)^{1/v}, \tag{5.7}$$

where

$$\delta_2(t)=\sum_{i=1}^{N}\frac{1}{(i+q)^{s_2}H_{N,q,s_2}}\big(G(\widehat{x}_i,t)-G(1,t)\big). \tag{5.8}$$

Proof. If we substitute $x_i$ by $\widehat{x}_i$, $y_i$ by 1, and $a_i$ by $\widehat{a}_i$ in (3.9), we obtain the required result. □

Corollary 10

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let G, $\widetilde{G}$ be defined by (2.1), (3.8), respectively. Let $N\in\mathbb{N}$ and $s_1,s_2>0$ be such that

$$\mathbf{1}=\widehat{x}A^{T} \tag{5.9}$$

and

$$\widehat{a}=\widehat{a}A \tag{5.10}$$

hold for some row stochastic matrix $A\in M_{NN}(\mathbb{R})$, where now $\widehat{x}_i=\frac{i^{s_2}H_{N,0,s_2}}{i^{s_1}H_{N,0,s_1}}\in[\alpha,\beta]$ and $\widehat{a}_i=\frac{1}{i^{s_2}H_{N,0,s_2}}$, $i=1,\ldots,N$. Then, for every 4-convex function $\phi\in C^{4}([\alpha,\beta])$, the analogue of (5.7) holds for $\widetilde{C}(s_1,s_2)$, with the kernel

$$\delta_3(t)=\sum_{i=1}^{N}\frac{1}{i^{s_2}H_{N,0,s_2}}\big(G(\widehat{x}_i,t)-G(1,t)\big). \tag{5.11}$$

Proof. Substituting $q=0$ in (5.7), we get the required result. □

Next we introduce definitions of Shannon's entropy for the Zipf–Mandelbrot law.

Definition 2

(Shannon’s entropy for Z–M law) Let . For and , we define For and , we define

Corollary 11

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let $\widetilde{G}$ be defined by (3.8) and $[\alpha,\beta]\subset(0,\infty)$, and take $\phi(x)=-\ln x$.

(i) If $N\in\mathbb{N}$, $q_1,q_2\ge 0$, and $s_1,s_2>0$ are such that (5.1) and (5.2) hold for some row stochastic matrix $A\in M_{NN}(\mathbb{R})$, then the corresponding estimate relating the entropy functionals of the two Zipf–Mandelbrot laws holds, where the kernel is defined by (5.4).

(ii) If $q\ge 0$ and $s_1,s_2>0$ are such that (5.5) and (5.6) hold for some row stochastic matrix A, then the analogous estimate (5.12) holds, where the kernel is defined by (5.8).

(iii) If $s_1,s_2>0$ are such that (5.9) and (5.10) hold for some row stochastic matrix A, then the analogous estimate holds, where the kernel is defined by (5.11).

Proof. (i) Substituting $x_i$ by $f(i;N,q_1,s_1)/f(i;N,q_2,s_2)$, $y_i$ by 1, and $a_i$ by $f(i;N,q_2,s_2)$ in (3.9) and taking $\phi(x)=-\ln x$, we get the required result. (ii) Substituting $x_i$ by $\frac{(i+q)^{s_2}H_{N,q,s_2}}{(i+q)^{s_1}H_{N,q,s_1}}$, $y_i$ by 1, and $a_i$ by $\frac{1}{(i+q)^{s_2}H_{N,q,s_2}}$ in (3.9) and taking $\phi(x)=-\ln x$, we get the required result. (iii) Substituting $q=0$ in (5.12), we get the required result. □

At the end, we introduce the Kullback–Leibler divergence for the Zipf–Mandelbrot law. For more information about the Kullback–Leibler divergence, see [22, 23].

Definition 3

(The Kullback–Leibler divergence for Z–M) Let $N\in\mathbb{N}$. For $q_1,q_2\in[0,\infty)$ and $s_1,s_2>0$, we define

$$\widetilde{KL}(q_1,q_2,s_1,s_2):=\sum_{i=1}^{N}f(i;N,q_1,s_1)\ln\frac{f(i;N,q_1,s_1)}{f(i;N,q_2,s_2)}.$$

Specially, when $q_1=q_2=q$, we have

$$\widetilde{KL}(q,s_1,s_2):=\sum_{i=1}^{N}\frac{1}{(i+q)^{s_1}H_{N,q,s_1}}\ln\frac{(i+q)^{s_2}H_{N,q,s_2}}{(i+q)^{s_1}H_{N,q,s_1}}.$$

For $s_1,s_2>0$ (Zipf's law, $q=0$), we define

$$\widetilde{KL}(s_1,s_2):=\sum_{i=1}^{N}\frac{1}{i^{s_1}H_{N,0,s_1}}\ln\frac{i^{s_2}H_{N,0,s_2}}{i^{s_1}H_{N,0,s_1}}.$$
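A minimal sketch (ours) of the Kullback–Leibler divergence between two Zipf–Mandelbrot laws on the same support:

```python
# KL(p || r) = sum p_i ln(p_i / r_i); nonnegative, and zero iff p = r.
import math

def zm_pmf(N, q, s):
    H = sum((j + q) ** -s for j in range(1, N + 1))
    return [((k + q) ** -s) / H for k in range(1, N + 1)]

def kl_divergence(p, r):
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

p, r = zm_pmf(10, 1.5, 1.2), zm_pmf(10, 0.0, 2.0)
print(kl_divergence(p, r) >= 0.0)                             # True (Gibbs' inequality)
print(math.isclose(kl_divergence(p, p), 0.0, abs_tol=1e-15))  # True
```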

Corollary 12

Let $(u,v)$ be a pair of conjugate exponents, i.e., $1\le u,v\le\infty$, $1/u+1/v=1$. Let $\widetilde{G}$ be defined by (3.8) and $[\alpha,\beta]\subset(0,\infty)$, and take $\phi(x)=x\ln x$.

(i) If $N\in\mathbb{N}$, $q_1,q_2\ge 0$, and $s_1,s_2>0$ are such that (5.1) and (5.2) hold for some row stochastic matrix $A\in M_{NN}(\mathbb{R})$, then the corresponding estimate for $\widetilde{KL}(q_1,q_2,s_1,s_2)$ holds, where the kernel is defined by (5.4).

(ii) If $q\ge 0$ and $s_1,s_2>0$ are such that (5.5) and (5.6) hold for some row stochastic matrix A, then the analogous estimate (5.13) for $\widetilde{KL}(q,s_1,s_2)$ holds, where the kernel is defined by (5.8).

(iii) If $s_1,s_2>0$ are such that (5.9) and (5.10) hold for some row stochastic matrix A, then the analogous estimate for $\widetilde{KL}(s_1,s_2)$ holds, where the kernel is defined by (5.11).

Proof. (i) Substituting $x_i$ by $f(i;N,q_1,s_1)/f(i;N,q_2,s_2)$, $y_i$ by 1, and $a_i$ by $f(i;N,q_2,s_2)$ in (3.9) and taking $\phi(x)=x\ln x$, we get the required result. (ii) Substituting $x_i$ by $\frac{(i+q)^{s_2}H_{N,q,s_2}}{(i+q)^{s_1}H_{N,q,s_1}}$, $y_i$ by 1, and $a_i$ by $\frac{1}{(i+q)^{s_2}H_{N,q,s_2}}$ for each i in (3.9) and taking $\phi(x)=x\ln x$, we get the required result. (iii) Substituting $q=0$ in (5.13), we get the required result. □

Conclusions

In this paper we have given generalized results for Sherman’s inequality by considering the class of convex functions of higher order. We obtained an extended weighted majorization inequality as well as Jensen’s inequality as special cases directly connected to information theory. We used the obtained results to derive new estimates for Shannon’s and Rényi’s entropy, information energy, and some well-known measures between probability distributions. Using the Zipf–Mandelbrot law, we introduced new functionals to derive some related results.

1.  On a Theorem of Hardy, Littlewood, Polya, and Blackwell.

Authors:  S Sherman
Journal:  Proc Natl Acad Sci U S A       Date:  1951-12       Impact factor: 11.205

