
Mixture and Exponential Arcs on Generalized Statistical Manifold.

Luiza H F De Andrade, Francisca L J Vieira, Rui F Vigelis, Charles C Cavalcante.

Abstract

In this paper, we investigate the mixture arc on generalized statistical manifolds. We ensure that the generalization of the mixture arc is well defined and we are able to provide a generalization of the open exponential arc and its properties. We consider the model of a φ-family of distributions to describe our general statistical model.

Keywords:  exponential arcs; information geometry; mixture arcs; statistical manifold; φ-family

Year:  2018        PMID: 33265238      PMCID: PMC7512662          DOI: 10.3390/e20030147

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Information geometry [1,2,3] is the part of probability theory dedicated to investigating the geometry of statistical models, that is, families of probability density functions equipped with a differential-geometric structure. A differential-geometric structure for multi-parameter families of distributions was provided in [4]. In the mid-1980s, other topics related to the subject, such as fiber bundle theory and duality of connections of statistical models, were investigated by Amari [5] and Amari and Nagaoka [6], respectively. In the parametric case, the exponential, mixture and α-connections, as well as their dual structure, are among the most important geometric objects [6], since the dual structure of the α-connections is the key point distinguishing statistical manifolds from arbitrary differentiable manifolds. Divergence functions are an essential topic in information geometry, for both the parametric and non-parametric cases, since a metric and dual connections can be induced from a divergence [7,8,9,10]. Finding an information-geometrical foundation for multi-parameter families of probability distributions, with a more general description, is one of the topics of interest in information geometry [11,12,13,14]. Non-parametric statistical models [15] are important in a wide range of areas [16,17]. In the parametric case, the manifold of probability density functions obtains a Euclidean topology from the space of its natural parameters. In the non-parametric case, a major challenge is to define a convenient topology and a notion of convergence. Pistone and Sempi [18] were the first to formulate a rigorous infinite-dimensional extension. In that work, the set of all strictly positive probability densities was endowed with the structure of an exponential Banach manifold, using Orlicz spaces associated with a Young function. In a later work [19], more properties of the statistical manifold were studied, specifically regarding the orthogonality condition.
As in the parametric case, in non-parametric models the mixture and exponential connections are among the most important geometric objects. To find these connections, it is necessary to guarantee the existence of the open arcs, which are the geodesics of the manifold. Using the notion of exponential convergence, Gibilisco and Pistone [20] investigated those connections. In that work, the exponential and mixture connections were built in such a way that the relation between them is the same as in the parametric case. Another approach was used in [21], where the mixture arc was additionally studied. Moreover, Grasselli [21] proved that two probability densities in the same neighborhood are connected by an open mixture arc if and only if the difference between their random variables is bounded. The exponential statistical manifold was later studied in [22] with another system of charts, through the statistical model $\mathcal{E}(p)$, called the maximal exponential model. Cena and Pistone [22] proved that this model is the set of all positive densities connected to a given positive density p by an open exponential arc, and vice versa. In that work, the open mixture arc and the open exponential arc were used to discuss properties of this model, such as the e-connection and the m-connection, in the same way as in [6]. This exponential model, together with the open exponential and mixture arcs, was also studied recently by Santacroce et al. [23,24], where a proof of duality properties of statistical models was provided. Examples of applications of non-parametric information geometry to statistical physics using the connection by open arcs were studied in [25]. The generalization of the exponential statistical manifold has been an active topic of research in recent years. Pistone [26] used Kaniadakis' κ-exponential [27] in the construction of a statistical manifold.
Vigelis and Cavalcante [28] proposed a φ-family of probability distributions $\mathcal{F}_c^\varphi$, which generalizes the exponential family. This generalization is based on the replacement of the exponential function by a deformed exponential φ which satisfies some properties and endows the set $\mathcal{P}_\mu$ with a Banach manifold structure, the so-called generalized statistical manifold. In [29], a review of non-parametric information geometry with specific issues of the infinite-dimensional setting is provided. In that work, the deformed exponential manifold was studied with a deformed exponential function defined in [30], and a model space was built according to the proposal in [28]. Necessary and sufficient conditions for any two probability distributions to be connected by a φ-arc were given in [31]. In this work, we ensure the existence of a generalized mixture arc for probability distributions in the same φ-family $\mathcal{F}_c^\varphi$, with a deformed exponential function φ which satisfies some properties. Moreover, we find a generalization of open exponential arcs and we prove, in the same way as in [22], that the φ-family is the component connected to a given positive density, and vice versa. The rest of the paper is organized as follows. In Section 2, we revisit results about Musielak–Orlicz spaces and φ-families of probability distributions. We also briefly recall the subdifferential of a convex function. In Section 3, where we provide our main results, we ensure that the generalized mixture arc is well defined. In Section 4, we discuss the generalized exponential and mixture arcs. Finally, our conclusions and perspectives are stated in Section 5.

2. Preliminary Results

The statistical manifold $\mathcal{P}_\mu$ can be equipped with the structure of a $C^\infty$-Banach manifold, using the Musielak–Orlicz space $L_c^\varphi$ associated with the Musielak–Orlicz function $\Phi_c$. Each connected component of the statistical manifold gives rise to a φ-family of probability distributions $\mathcal{F}_c^\varphi$. In this section, we provide an introduction to Musielak–Orlicz spaces and the construction of the φ-family of probability distributions.

2.1. φ-Families of Probability Distributions

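Let $(T, \Sigma, \mu)$ be a σ-finite, non-atomic measure space. A function $\Phi : T \times [0,\infty) \to [0,\infty]$ is said to be a Musielak–Orlicz function if (i) $\Phi(t,\cdot)$ is convex and lower semi-continuous for μ-a.e. (almost everywhere) $t \in T$; (ii) $\Phi(t,0) = \lim_{u \downarrow 0} \Phi(t,u) = 0$ and $\lim_{u \to \infty} \Phi(t,u) = \infty$ for μ-a.e. $t \in T$; and (iii) $\Phi(\cdot,u)$ is measurable for each $u \ge 0$. We notice that $\Phi(t,\cdot)$, by (i)-(ii), is not identically equal to 0 or ∞ on the interval $(0,\infty)$. Let $L^0$ be the linear space of all real-valued, measurable functions on T. Given a Musielak–Orlicz function Φ, we denote the functional $I_\Phi(u) = \int_T \Phi(t, |u(t)|)\, d\mu$, for any $u \in L^0$. The Musielak–Orlicz space, Musielak–Orlicz class and Morse–Transue space generated by a Musielak–Orlicz function Φ are defined, respectively, by $L^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for some } \lambda > 0\}$, $\tilde{L}^\Phi = \{u \in L^0 : I_\Phi(u) < \infty\}$ and $E^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for all } \lambda > 0\}$. The Musielak–Orlicz space $L^\Phi$ is a Banach space when it is equipped with the Luxemburg norm, given by $\|u\|_\Phi = \inf\{\lambda > 0 : I_\Phi(u/\lambda) \le 1\}$, or the Orlicz norm, represented as $\|u\|_{\Phi,0} = \sup\{ |\int_T uv\, d\mu| : v \in \tilde{L}^{\Phi^*},\ I_{\Phi^*}(v) \le 1 \}$, where $\Phi^*(t,v) = \sup_{u \ge 0} (uv - \Phi(t,u))$ is the Fenchel conjugate of $\Phi(t,\cdot)$, which is also a Musielak–Orlicz function. These norms are equivalent and the inequalities $\|u\|_\Phi \le \|u\|_{\Phi,0} \le 2\|u\|_\Phi$ hold for all $u \in L^\Phi$ [32]. A Musielak–Orlicz function Φ is said to satisfy the $\Delta_2$-condition, or belong to the $\Delta_2$-class (denoted by $\Phi \in \Delta_2$), if we can find a constant $K > 0$ and a non-negative function $f \in \tilde{L}^\Phi$ such that $\Phi(t, 2u) \le K \Phi(t,u)$ for all $u \ge f(t)$ and μ-a.e. $t \in T$. If the Musielak–Orlicz function Φ satisfies the $\Delta_2$-condition, then $I_\Phi(u) < \infty$ for every $u \in L^\Phi$ [32]. In this case $L^\Phi$, $\tilde{L}^\Phi$ and $E^\Phi$ are equal as sets. Moreover, if the Musielak–Orlicz function Φ does not satisfy the $\Delta_2$-condition, $E^\Phi$ is a proper subspace of $L^\Phi$. Every function that satisfies the $\Delta_2$-condition is finite-valued. Indeed, define $b_\Phi(t) = \sup\{u \ge 0 : \Phi(t,u) < \infty\}$; assuming that $b_\Phi < \infty$ on a set of positive measure, we get $\Phi(t, 2u) = \infty$ for all $u > b_\Phi(t)/2$ on that set, which implies that Φ cannot satisfy the $\Delta_2$-condition. For more information see, for instance, [32,33]. We say that a Musielak–Orlicz function Φ satisfies the $\nabla_2$-condition, or belongs to the $\nabla_2$-class, if we can find a constant $\gamma > 1$ and a non-negative function $f \in \tilde{L}^\Phi$ such that $\Phi(t,u) \le \frac{1}{2\gamma} \Phi(t, \gamma u)$ for all $u \ge f(t)$ and μ-a.e. $t \in T$. The (topological) dual space of $L^\Phi$ is denoted by $(L^\Phi)^*$ and represented in the following way [32,34,35]: $(L^\Phi)^* = (L^\Phi)^\sim_c \oplus (L^\Phi)^\sim_s$, where $(L^\Phi)^\sim_c$ is the set of the order continuous functionals and $(L^\Phi)^\sim_s$ is formed by singular components.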
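If the Musielak–Orlicz function $\Phi \in \Delta_2$, then all functionals in $(L^\Phi)^*$ are order continuous and represented by $f_v(u) = \int_T uv\, d\mu$ with $v \in L^{\Phi^*}$. Otherwise, if $\Phi \notin \Delta_2$, then the functionals f in $(L^\Phi)^*$ can be uniquely expressed as $f = f_c + f_s$, where $f_c$ is the order continuous component and $f_s$ is the singular component. While exponential families are based on the exponential function, φ-families are based on deformed exponential functions. A deformed exponential is a function $\varphi : T \times \mathbb{R} \to [0,\infty)$ that satisfies the following properties, for μ-a.e. $t \in T$ [28]: (i) $\varphi(t,\cdot)$ is convex and injective; (ii) $\lim_{u \to -\infty} \varphi(t,u) = 0$ and $\lim_{u \to \infty} \varphi(t,u) = \infty$; (iii) there exists a measurable function $u_0 : T \to (0,\infty)$ such that $\int_T \varphi(c + \lambda u_0)\, d\mu < \infty$ for all $\lambda > 0$, for every measurable function $c : T \to \mathbb{R}$ for which $\int_T \varphi(c)\, d\mu = 1$. In de Souza et al. [36], Lemma 1, it was shown that the constraint $\int_T \varphi(c)\, d\mu = 1$ can be replaced by $\int_T \varphi(c)\, d\mu < \infty$. Thus, condition (iii) can be rewritten as: there exists a measurable function $u_0 : T \to (0,\infty)$ such that $\int_T \varphi(c + \lambda u_0)\, d\mu < \infty$ for all $\lambda > 0$, for every measurable function $c : T \to \mathbb{R}$ for which $\int_T \varphi(c)\, d\mu < \infty$. There are many examples of deformed exponential functions. An example of relevance is the exponential function $\varphi(t,u) = \exp(u)$, which satisfies (i)-(iii) with $u_0 = 1$. Another example is Kaniadakis' κ-exponential [26,27,28]. The Kaniadakis' κ-exponential $\exp_\kappa : \mathbb{R} \to (0,\infty)$, for $\kappa \in [-1,1]$, is defined as $\exp_\kappa(u) = (\kappa u + \sqrt{1 + \kappa^2 u^2})^{1/\kappa}$ if $\kappa \ne 0$, and $\exp_\kappa(u) = \exp(u)$ if $\kappa = 0$. The inverse of $\exp_\kappa$ is the Kaniadakis' κ-logarithm $\ln_\kappa(u) = (u^\kappa - u^{-\kappa})/(2\kappa)$ if $\kappa \ne 0$, and $\ln_\kappa(u) = \ln(u)$ if $\kappa = 0$. One can easily notice that the κ-exponential satisfies (i)–(iii) [28,36]. The Musielak–Orlicz function $\Phi_c(t,u) = \varphi(t, c(t) + u\, u_0(t)) - \varphi(t, c(t))$ (5), for a measurable function $c : T \to \mathbb{R}$ such that $\varphi(c)$ is μ-integrable, was defined in [28]. Thus, the sets $L^{\Phi_c}$, $\tilde{L}^{\Phi_c}$ and $E^{\Phi_c}$ are denoted by $L_c^\varphi$, $\tilde{L}_c^\varphi$ and $E_c^\varphi$, respectively, when $\Phi_c$ is given by (5). Let $\mathcal{P}_\mu$ be the collection of strictly positive probability densities in $L^0$, of which the φ-family $\mathcal{F}_c^\varphi$ is a subset, where $L^0$ is the linear space of all real-valued measurable functions on T. For each probability density $p = \varphi(c) \in \mathcal{P}_\mu$, we have an associated φ-family of probability densities $\mathcal{F}_c^\varphi = \{\varphi_c(u) : u \in \mathcal{B}_c^\varphi\}$, according to $\varphi_c(u) = \varphi(c + u - \psi(u) u_0)$ (6), where the set $\mathcal{B}_c^\varphi = B_c^\varphi \cap \mathcal{K}_c^\varphi$ is the intersection of the convex set $\mathcal{K}_c^\varphi = \{u \in L_c^\varphi : \int_T \varphi(c + \lambda u)\, d\mu < \infty \text{ for some } \lambda > 1\}$ with the closed subspace $B_c^\varphi = \{u \in L_c^\varphi : \int_T u\, \varphi'_+(c)\, d\mu = 0\}$. The normalizing function $\psi : \mathcal{B}_c^\varphi \to [0,\infty)$ is introduced so that expression (6) is a probability distribution in $\mathcal{P}_\mu$. If the Musielak–Orlicz function $\Phi_c$ does not satisfy the $\Delta_2$-condition, then the boundary of $\mathcal{K}_c^\varphi$, the set $\partial\mathcal{K}_c^\varphi$, is not empty. A function u belongs to $\partial\mathcal{K}_c^\varphi$ if and only if $\int_T \varphi(c + \lambda u)\, d\mu < \infty$ for all $\lambda \in (0,1)$, and $\int_T \varphi(c + \lambda u)\, d\mu = \infty$ for each $\lambda > 1$. The behavior of the normalizing function ψ near the boundary was studied in [33,37]. It is shown that the normalizing function ψ is a convex function [28].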
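Assuming that $\varphi(t,\cdot)$ is continuously differentiable, the normalizing function ψ is Gâteaux-differentiable and the expression for its Gâteaux derivative is $\partial\psi(u)\, v = \frac{\int_T v\, \varphi'(c + u - \psi(u) u_0)\, d\mu}{\int_T u_0\, \varphi'(c + u - \psi(u) u_0)\, d\mu}$ (8), with $u \in \mathcal{B}_c^\varphi$ and $v \in B_c^\varphi$. In the next section, we recall some differentiability properties of convex functions on infinite-dimensional spaces.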
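The role of the normalizing function can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not the construction of [28]: it takes T to be a finite set with counting measure, uses Kaniadakis' κ-exponential as the deformed exponential, and solves the normalization equation by bisection. The function names (`exp_kappa`, `log_kappa`, `normalizing_psi`) are hypothetical, and the constraint that u lies in the closed subspace used in the text is not enforced.

```python
import math


def exp_kappa(u, kappa):
    """Kaniadakis kappa-exponential; equals exp(u) when kappa == 0."""
    k = abs(kappa)  # exp_kappa is symmetric in the sign of kappa
    if k == 0.0:
        return math.exp(u)
    return (k * u + math.sqrt(1.0 + (k * u) ** 2)) ** (1.0 / k)


def log_kappa(x, kappa):
    """Kaniadakis kappa-logarithm, the inverse of exp_kappa on (0, inf)."""
    k = abs(kappa)
    if k == 0.0:
        return math.log(x)
    return (x ** k - x ** (-k)) / (2.0 * k)


def normalizing_psi(c, u, u0, kappa, iters=200):
    """Assumption: T is finite with counting measure.

    Finds psi such that sum_t phi(c_t + u_t - psi * u0_t) = 1 by bisection.
    The total mass is strictly decreasing in psi because phi is increasing
    and u0 > 0.
    """
    def mass(psi):
        return sum(exp_kappa(ct + ut - psi * u0t, kappa)
                   for ct, ut, u0t in zip(c, u, u0))

    lo, hi = -1.0, 1.0
    while mass(lo) < 1.0:   # move lo left until the mass is at least 1
        lo *= 2.0
    while mass(hi) > 1.0:   # move hi right until the mass is at most 1
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage: perturb a density p = phi(c) by u and renormalize.
p = [0.2, 0.3, 0.5]
u = [0.3, -0.1, 0.0]
u0 = [1.0, 1.0, 1.0]
kappa = 0.5
c = [log_kappa(pt, kappa) for pt in p]
psi = normalizing_psi(c, u, u0, kappa)
q = [exp_kappa(ct + ut - psi * u0t, kappa) for ct, ut, u0t in zip(c, u, u0)]
```

With κ = 0 and u0 = 1 the same routine reduces to the familiar log-partition function, which is a convenient sanity check.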

2.2. The Subdifferential of a Convex Function

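In this section, we discuss some properties of extended real-valued convex functions on Banach spaces, i.e., functions with values in $(-\infty, \infty]$. Mainly, we recall subdifferentials of lower semicontinuous convex functions and their properties. Let E be a Banach space. A function $f : E \to (-\infty, \infty]$ is a convex function on E if its epigraph $\operatorname{epi} f = \{(x, r) \in E \times \mathbb{R} : f(x) \le r\}$ is a convex set [38]. If $f(x) > -\infty$ for every x and $f(x) < \infty$ for at least one value of x, we call f a proper function. The set $\operatorname{dom} f = \{x \in E : f(x) < \infty\}$ denotes the effective domain of f. A function f is said to be lower semicontinuous (l.s.c.) if for every $\alpha \in \mathbb{R}$ the set $\{x \in E : f(x) \le \alpha\}$ is closed. Let $E^*$ be the dual space of E. A vector $x^* \in E^*$ is said to be a subgradient of f at $x \in E$ if $f(y) \ge f(x) + \langle y - x, x^* \rangle$ for all $y \in E$. We denote by $\partial f(x)$ the set of subgradients of f at x, and the subdifferential of f is the multivalued mapping $\partial f$ from E to $E^*$. By definition, $\partial f(x)$ is always a closed convex subset of $E^*$ for each x. Suppose f is a convex function finite at x. One has $x^* \in \partial f(x)$ if and only if $f'(x; h) \ge \langle h, x^* \rangle$ for all $h \in E$, where $f'(x; h)$ is the directional derivative of f at x in the direction h. The subdifferential may be empty at points of $\operatorname{dom} f$, so we denote by $\operatorname{dom} \partial f = \{x \in E : \partial f(x) \ne \emptyset\}$ the domain of $\partial f$, and we have that $\operatorname{dom} \partial f \subseteq \operatorname{dom} f$. We say that f is subdifferentiable at x for all $x \in \operatorname{dom} \partial f$. Let f be a lower semicontinuous proper convex function; then $\operatorname{int}(\operatorname{dom} f) \subseteq \operatorname{dom} \partial f$ ([39], Corollary 2.38). The conjugate of f is the function $f^* : E^* \to (-\infty, \infty]$ defined by $f^*(x^*) = \sup_{x \in E} \{\langle x, x^* \rangle - f(x)\}$ (9). Observe that, if f is proper, then the "sup" in Equation (9) may be restricted to the points $x \in \operatorname{dom} f$. The conjugate $f^*$ is a convex and lower semicontinuous function on $E^*$ and, jointly with f, satisfies the well-known Young's inequality $f(x) + f^*(x^*) \ge \langle x, x^* \rangle$ (10), with equality holding if and only if $x^* \in \partial f(x)$. If f is a lower semicontinuous function, the subdifferential of the conjugate function, $\partial f^*$, coincides with $(\partial f)^{-1}$ ([39], Proposition 2.33). It is known that, if f is a lower semicontinuous proper convex function, then $\operatorname{dom} \partial f \subseteq \operatorname{dom} f$, and it was shown in [40] that $\operatorname{dom} \partial f$ is, in fact, dense in $\operatorname{dom} f$. Section 3.2 also relies on two further results from the literature: ([41], Corollary 2.19), ([42], Corollary 7.2.3), and ([41], Lemma 2.20), ([42], Lemma 7.2.4). The subdifferential of a convex function is closely related to the Gâteaux-gradient.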
If the convex function f is Gâteaux-differentiable at x, then $\partial f(x)$ consists of a single element, $\partial f(x) = \{\nabla f(x)\}$ ([39], Proposition 2.40), where $\nabla f(x)$ is the Gâteaux-gradient of f at x. In the next section, we investigate the subdifferential of the normalizing function ψ. This result will be useful to prove that the generalized mixture arc is well defined, which is one of our main goals in this work.
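Young's inequality and its equality case can be checked on a one-dimensional example. The sketch below assumes E is the real line and f = exp, whose conjugate has the closed form y ln y − y for y > 0; the function names are illustrative only. Equality holds exactly when y equals the derivative of f at x, which in this smooth case is the single element of the subdifferential.

```python
import math


def f(x):
    """Convex function f(x) = exp(x) on the real line."""
    return math.exp(x)


def f_conj(y):
    """Fenchel conjugate of exp: sup_x (x*y - exp(x)).

    The supremum is attained at x = log(y) for y > 0, equals 0 for y = 0
    (approached as x -> -inf), and is +inf for y < 0.
    """
    if y < 0.0:
        return math.inf
    if y == 0.0:
        return 0.0
    return y * math.log(y) - y


def young_gap(x, y):
    """f(x) + f*(y) - x*y, which Young's inequality says is non-negative."""
    return f(x) + f_conj(y) - x * y


# Usage: the gap vanishes at y = f'(x) and is positive elsewhere.
x = 0.7
gap_at_gradient = young_gap(x, math.exp(x))
gap_elsewhere = young_gap(0.0, 2.0)
```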

3. Construction of Generalized Mixture Arcs

The normalizing function ψ is convex and Gâteaux-differentiable, and its derivative is given by Equation (8). With these facts in mind, the generalized mixture arc, Equation (11), is expressed as a convex combination of the Gâteaux derivatives of ψ associated with p and q, where p and q belong to a φ-family $\mathcal{F}_c^\varphi$. The functional can be rewritten in the form of Equation (12), and Equation (13) is the Gâteaux-gradient of ψ. Thus, for the generalized mixture arc to be well defined, it is necessary that the set of the functionals in Equation (13) be convex. As mentioned in Section 2.2, the subdifferential and the Gâteaux-gradient are closely related. For this reason, we investigate the subdifferential of ψ.
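A worked special case may help fix ideas. It rests on two assumptions that are not spelled out in the text above: that the Gâteaux derivative of ψ has the quotient form used in the φ-family construction of [28], and that the classical choices φ = exp and u0 = 1 are made. Under these assumptions each functional is an expectation, and a convex combination of two such functionals is the expectation under the ordinary mixture of the two densities.

```latex
% Assumed quotient form of the Gateaux derivative of \psi
\partial\psi(u)\,v
  = \frac{\int_T v\,\varphi'(c+u-\psi(u)u_0)\,d\mu}
         {\int_T u_0\,\varphi'(c+u-\psi(u)u_0)\,d\mu}

% Classical case: \varphi = \exp,\ u_0 = 1,\ p_u = \exp(c+u-\psi(u)),
% so the denominator is \int_T p_u\,d\mu = 1
\partial\psi(u)\,v = \int_T v\,p_u\,d\mu = E_{p_u}[v]

% Convex combination of the functionals attached to p = p_{u_1}, q = p_{u_2}
(1-\lambda)\,\partial\psi(u_1)\,v + \lambda\,\partial\psi(u_2)\,v
  = \int_T v\,\bigl[(1-\lambda)p + \lambda q\bigr]\,d\mu,
  \qquad \lambda\in[0,1]
```

In this special case the convexity question is immediate, since the mixture (1−λ)p + λq is again a density; the general case is what requires the subdifferential analysis.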

3.1. Subdifferential of the Normalizing Function

Considering that the Musielak–Orlicz function (5) does not satisfy the -condition, then we have that is not-empty [33]. The effective domain of the normalizing , the set is where is the set of points in the boundary of such that . The behavior of the normalizing function near the boundary of was discussed in [33]. We need to know the subdifferentials of . Hence, we have to prove some properties of , then we have our first result. The normalizing function is lower semicontinuous. Given , let be the set . To prove the statement, it suffices to show that is closed. We define a set and we are going to prove that B is a closed set and that . Let be a sequence which belongs to B, such that . This way, , -a.e. Since is a continuous function, we have that , -a.e. From Fatou’s Lemma, it follows that thus, and B is a closed set. Now, we prove that . Let u be a function which belongs to , then . The function is a strictly increasing function, so that thus, . Suppose that there exists , then , which implies that and , which implies that . Then thus . This contradicts the assumption that . Therefore, and is closed. ☐ The subdifferential of at a function is the set where denotes the dual space of . We know that, for all the normalizing function is Gâteaux-differentiable and the Gâteaux-gradient is given by Equation (13). Hence, consists of a single element and is given by In fact, we prove below that Equation (13) belongs to , for all . Let u be a function in belongs to We have that the functional (16) belongs to . Let v be a function in such that . In other words, , so we have that and . Thus, by the convexity of , we have Thus, and If , then , and Consequently, Inequality (17) holds for all and the result follows. ☐ We need to find the subdifferential of for u in the set . We know that is a proper lower semicontinuous convex function, so where and . As we have that , then for , is unbounded. 
Since we are interested to prove that the set of functionals in Equation (13) is convex and these functionals are order continuous, we need to analyze only the order continuous part of the subdifferential, i.e., the part of the subdifferential that belongs to . We need to investigate whether the functional in Equation (16) belongs to , for . For this, we will use some results. ([35], Lemma 3.11). Let ([43], Proposition 2.3). Let Φ and Ψ be Musielak–Orlicz functions. Suppose that, for constants α, Then, for constants Let Φ and Ψ, respectively. Suppose that, for constants Then, for constants Defining the function , we can write Calculating the Fenchel conjugate of the functions in the inequality above, we obtain From Proposition 3, we infer that Equation (19) is satisfied. ☐ The The Suppose it satisfies the -condition. If the natural number is such that , then , for all . Conversely, if satisfies Equation (20) and the natural number is chosen so that , then , for all . Assume that Equation (3) is satisfied. Let be a natural number such that . Then , for all . Conversely, if Equation (21) holds and the natural number is chosen so that , then , for all . ☐ The next result follows from Lemmas 2 and 3. A Musielak–Orlicz function satisfies the -condition if, and only if, its complementary function satisfies the -condition. Let be a Musielak–Orlicz function that does not satisfy the -condition and that for μ-a.e. . Then we can find a non-negative function such that . Let , and be given as in Lemma 1. Select a subsequence for which the series converges, and for all . Because is continuous for , we can find such that . Define . Then, we can write and Hence, it follows that which concludes the proof. ☐ The previous proposition makes it clear that we can find a , but . Let u be as in Proposition 4, clearly for , and for , ([35], Remark 3.12). Let where Take and denote , then we define . We can choose such that satisfies . In other words, . 
It is easy to see that for and for , so . The need to show Equation (22) remains. From Proposition 4 we have that since Thus, consequently . Since , we have that and therefore . We conclude that . Since is a linear set, we have that Equation (22) occurs. ☐ As a consequence of Proposition 5, we have that it is possible to find such that and therefore the functional in Equation (16) does not belong to . We conclude in this section that, if the functional belongs to , then the functional belongs to for . In next section we finally prove that the set of functionals formed by Gâteaux gradient of the normalizing function that belongs to is convex, so we can guarantee that the generalized mixture arc is well defined.

3.2. Convexity of the Functionals Set

We already know that, for the generalized mixture arc in Equation (11) to be well defined, it is necessary that the set of functionals to be convex. From Proposition 2, the set in Equation (24) is contained in the range of , the set given by Let be the conjugate function of . By the fact that be a l.s.c. proper convex function, and are convex sets and the range of is the effective domain of , since . Thus is the same that To prove that the set in Equation (24) is convex, we analyze the set in Equation (25) in three cases. Let be elements in Equation (25) such that , so by convexity of , for , we have . If and , then , for ([41], Fact 2.1). Let be elements in Equation (25) belonging to . We want to prove that, for , belongs to Equation (25). To solve this problem, we are going to prove that . Supposing a strictly convex function, then is a strictly convex function. In next proposition, we show that is a unitary set. Let ψ be a strictly convex function, then is a unitary set, where , with . Assuming that is a strictly convex function we have that for and Supposing that is not a unitary set, i.e., , where , . Taking , . By Young’s Inequality (10) where and as a consequence of we have and Taking the product of Equation (30) by , the product of Equation (31) by and adding the two obtained equations, we have From Equations (29) and (32), we obtain which is a contradiction by Equation (28). This implies that is a unitary set and this completes the proof. ☐ Thus, the set is unitary, then is locally bounded at and, therefore, by Fact 1, we conclude that which implies that , by Equation (26), we have that Therefore, by Fact 2, there exists no functional in Equation (25) such that . Thus Equation (25) is a convex set and, as a consequence, the generalized mixture arc is well defined, since the set in Equation (24) is a convex set. Indeed, let u, v be functions in such that and belong to Equation (24). 
Clearly, and We note that, the functionals in Equation (24) are the only elements in Equation (25) that satisfy . For we have then there exist functions such that Thus, the set in Equation (24) is a convex set. In this section, we proved that the generalized mixture arc is well defined for a deformed exponential strictly convex. In the next section, we discuss generalized open exponential arcs and generalized open mixture arcs.

4. Generalized Arcs

The concept of arc-connected probability distributions was defined by de Souza et al. [36]. Fixing any deformed exponential φ, we say that two probability distributions p and q are φ-connected if, for each $\alpha \in (0,1)$, there exists $\kappa(\alpha) \in \mathbb{R}$ such that $\varphi(\alpha \varphi^{-1}(p) + (1-\alpha) \varphi^{-1}(q) + \kappa(\alpha) u_0)$ is a probability density. In [31], necessary and sufficient conditions for any two probability distributions to be φ-connected were provided. In this section, we discuss what it means for two probability distributions to be φ-connected by open arcs. We generalize the open exponential arcs and open mixture arcs defined in [22] and studied later in [23].
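The notion of φ-connection can be explored on a toy example. The sketch below is a hedged illustration, not the definition of [36] verbatim: it assumes T is finite with counting measure, takes Kaniadakis' κ-exponential as φ and u0 = 1 by default, and assumes the arc point is obtained by shifting the convex combination of φ-inverses by a multiple of u0 until the total mass is one. All function names are hypothetical.

```python
import math


def exp_kappa(u, kappa):
    """Kaniadakis kappa-exponential; equals exp(u) when kappa == 0."""
    k = abs(kappa)
    if k == 0.0:
        return math.exp(u)
    return (k * u + math.sqrt(1.0 + (k * u) ** 2)) ** (1.0 / k)


def log_kappa(x, kappa):
    """Kaniadakis kappa-logarithm, the inverse of exp_kappa on (0, inf)."""
    k = abs(kappa)
    if k == 0.0:
        return math.log(x)
    return (x ** k - x ** (-k)) / (2.0 * k)


def phi_arc(p, q, alpha, kappa, u0=None, iters=200):
    """Assumed normalization: sum_t phi(c_t + s * u0_t) = 1, where
    c = alpha * phi^{-1}(p) + (1 - alpha) * phi^{-1}(q).

    Returns the arc density and the shift s, found by bisection
    (the total mass is strictly increasing in s).
    """
    u0 = u0 or [1.0] * len(p)
    c = [alpha * log_kappa(pt, kappa) + (1.0 - alpha) * log_kappa(qt, kappa)
         for pt, qt in zip(p, q)]

    def mass(s):
        return sum(exp_kappa(ct + s * ut, kappa) for ct, ut in zip(c, u0))

    lo, hi = -1.0, 1.0
    while mass(lo) > 1.0:
        lo *= 2.0
    while mass(hi) < 1.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return [exp_kappa(ct + s * ut, kappa) for ct, ut in zip(c, u0)], s


# Usage: a point on the arc between p and q for two choices of kappa.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
r_classical, s_classical = phi_arc(p, q, 0.3, 0.0)
r_deformed, s_deformed = phi_arc(p, q, 0.3, 0.5)
```

For κ = 0 the result is the normalized geometric mixture of p and q, i.e., the classical exponential arc. Because φ is convex, the unshifted combination has mass at most one, so the shift found under this sign convention is non-negative.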

4.1. Generalized Open Exponential Arcs

Let us define the generalized open arcs and prove some of its properties. For a fixed deformed exponential φ, we say that p and q in belongs to In the following proposition, we give an equivalent definition of -connection by open arc. are φ-connected by an open arc if and only if there exist an open interval and a random variable , such that belongs to , for all and and . Let us assume that are -connected, i.e., , for all Since where and , then . Moreover, belongs to , for every and and . The converse follows immediately. Suppose that , we have , then , with . ☐ Because of the need to define the open arcs arises. As a consequence of Proposition 7, we have that if are - connected by an open arc, then the random variable , since for all . With this, we can prove the following results. Let , where . We have that if and only if, p and q are φ-connected by an open arc. Supposing , then where . Thus, we have for all , we deduce that is an open arc containing p and q. Conversely, supposing that p and q are -connected by an open arc, by Proposition 7, there exist an open interval and such that belongs to with . If , then and the proof is over. Otherwise, let w be such that thus and . Hence, we have and . ☐ With this, we prove that, for , the -family of probability distributions is the set of all such that q is -connected by an open arc to p. Let and be such that are φ -connected by an open arc. Then the spaces and are equal as sets. It follows from Corollary 1 that p and q are in the same -family, then and by Vigelis and Cavalcante [28], Lemma 5, it follows the result. ☐ Now, we show that the connection by generalized open exponential arcs is an equivalence relation. The relation in Definition 1 is an equivalence relation. Reflexive and symmetry properties follow from the definition and now, we prove transitivity. Consider with , , , with . We have that p is -connected to q and r, respectively. We need to prove that q and r are also -connected. 
Consider is defined with , , , such that , . Therefore, q and r are -connected. ☐ We know from Corollary 1 that the -family coincides with the set of all which are -connected to p by an open arc. We want now to prove that the -family is convex for some deformed exponential . Let φ be a fixed deformed exponential. Assuming that then We know that, if and , then is a convex function. We have by the fact that is an increasing function . Hence, we have if and only if which follows from Equation (37). ☐ Let for some fixed Note that, for any , . Suppose , and consider for any . We show that by proving that for . In the others words, we will show that and p are -connected for all . For , due the convexity of , we have If , according the convexity of and , we have since , we have by Corollary 1 that q and p are -connected. Hence, so and p are -connected by an open arc, for all . Now, if , the Lemma 4, is a convex function, so where and k a constant. Taking , we have since and, therefore, p and q are -connected by an open arc. ☐

4.2. Generalized Open Mixture Arcs

In Section 3.2, we proved that the generalized mixture arc given by is well defined for . In this section, our goal is twofold: firstly, to ensure that the open arc is also well defined; and, secondly, to provide some properties of these arcs. For such objectives, we use Equation (34), which establishes that is an open set, so we can extend the convex combination in Equation (40) between and beyond these extreme points while maintaining positivity of . Indeed, by the fact is an open set, so there exists such that is the open ball of radius centered at with . Similarly, there exists such that . Taking we guarantee that the combination in (40) can be extended to . For a fixed deformed exponential φ, we say that p and q in belongs to In [22], it was shown that densities connected by open mixture arcs have bounded away from zero ratios. Santacroce et al. [23] showed the converse implication, providing a characterization of open mixture models. Here, one can see that the fundamental role for being connected by open mixture arcs is given by ratios which have to be bounded. The functional in the definition of generalized open mixture arc satisfies . Thus the combination in (41) has to satisfy the same property, that is, . Assume that p and q are -connected by an open mixture arc given according to (41) belong to for all with . Since and , then which implies that and which give to us Combining inequalities (42) and (43),we have Conversely, if we have Equation (44), then and Equation (41) belongs to . Thus, we have that p and q in are -connected by an open mixture arc if and only if the ratio is bounded. By the fact that is an open set, there exists an interval such that belongs to and we have for all . Then, there exist functions such that with that is, the convex combination in Equation (41) is also a functional of the type in Equation (12) for all . Then, the open mixture arc is well-defined. 
Another property of this connection by generalized open mixture arc is that it is an equivalence relation. The relation in Definition 2 is an equivalence relation. Reflexity and symmetry properties follow from definition. As for the transitivity, consider such that and with for some . We can take and , and define a probability distribution If we have and , we may define a probability distribution as The generalized open mixture arc, , , connects and . ☐
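The link between bounded ratios and the possibility of extending a mixture arc beyond its endpoints can be seen in the classical setting. The sketch below assumes a finite sample space and the ordinary mixture (1−λ)p + λq of densities, rather than the generalized arc of functionals studied here; `mixture_arc_range` and `mixture` are illustrative names. The open interval of admissible λ is determined by the extreme values of the ratio q/p.

```python
import math


def mixture(p, q, lam):
    """Pointwise combination (1 - lam) * p + lam * q."""
    return [(1.0 - lam) * pt + lam * qt for pt, qt in zip(p, q)]


def mixture_arc_range(p, q):
    """Open interval (lam_min, lam_max) on which the combination is positive.

    Writing the combination as p + lam * (q - p): points with q > p bound
    lam from below, points with q < p bound it from above.
    """
    lam_min, lam_max = -math.inf, math.inf
    for pt, qt in zip(p, q):
        d = qt - pt
        if d > 0.0:
            lam_min = max(lam_min, -pt / d)
        elif d < 0.0:
            lam_max = min(lam_max, pt / (-d))
    return lam_min, lam_max


# Usage: the interval strictly contains [0, 1], so the arc is open.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
lam_min, lam_max = mixture_arc_range(p, q)
ratios = [qt / pt for pt, qt in zip(p, q)]
```

In this finite case the endpoints equal −1/(max q/p − 1) and 1/(1 − min q/p), so the arc extends past both endpoints exactly because the ratio is bounded above and bounded away from zero.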

5. Conclusions

In this work, we have generalized the open exponential arc and the open mixture arc for probability distributions. Moreover, we ensured that the generalization of the open mixture arc is well defined for strictly convex deformed exponentials. From two φ-connected probability distributions p and q, we can define a generalized parallel transport between the corresponding tangent spaces. A next step is to find a generalized parallel transport that is dual to it. Another goal is to investigate whether the generalized Rényi divergence defined in [36] between two φ-connected probability distributions can be related to the statistical divergence associated with the φ-family.

1.  Divergence function, duality, and convex analysis.

Authors:  Jun Zhang
Journal:  Neural Comput       Date:  2004-01       Impact factor: 2.026
