
Conformal Flattening for Deformed Information Geometries on the Probability Simplex

Atsumi Ohara

Abstract

Recent progress in the theory and applications of statistical models with generalized exponential functions in statistical science has motivated efforts to deform the standard structure of information geometry. For this purpose, various representing functions play central roles. In this paper, we consider two important notions in information geometry, i.e., invariance and dual flatness, from the viewpoint of representing functions. We first characterize a pair of representing functions that realizes the invariant geometry by solving a system of ordinary differential equations. Next, by proposing a new transformation technique, i.e., conformal flattening, we construct dually flat geometries from a certain class of non-flat geometries. Finally, we apply the results to demonstrate several properties of gradient flows on the probability simplex.


Keywords:  Legendre conjugate; affine immersion; dually flat structure; gradient flow; invariance; nonextensive statistical physics; representing functions

Year:  2018        PMID: 33265277      PMCID: PMC7512704          DOI: 10.3390/e20030186

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

The theory of information geometry has elucidated abundant geometric properties of manifolds equipped with a Riemannian metric and a pair of mutually dual affine connections. When it is applied to statistical models described by the exponential family, the logarithmic function plays a significant role in giving the standard information geometric structure to the models [1,2]. Inspired by recent progress in several areas of statistical physics and mathematical statistics [3,4,5,6,7,8,9,10], which have explored theoretical interests and possible applications of generalized exponential families, one research direction in information geometry aims at constructing deformed geometries from the standard one while keeping its basic properties. A typical and classical example of such a deformation is the alpha-geometry [1,2], whose statistical definition can be regarded as replacing the logarithmic function by suitable power functions. Hence, for the purpose of generalization and flexible applicability, much attention has been paid to such replacements, in which representing functions serve as important tools [3,4,11,12]. Two major characteristics of the standard structure are dual flatness and invariance [2]. Dual flatness (or a Hessian structure [13]) produces fruitful properties such as the existence of canonical coordinate systems, a pair of conjugate potential functions and the canonical divergence (relative entropy). In addition, these are connected by the Legendre duality relation, which is also fundamental in generalizations of statistical physics. On the other hand, the invariance of the geometric structure is crucially valuable in developing mathematical statistics. It has been proved [14] that invariance holds only for the structure with a special triple of a Riemannian metric and a pair of mutually dual affine connections, which are called the Fisher information and the alpha-connections, respectively (see Section 3 for their definitions).
The study of these two characteristics from the viewpoint of representing functions would contribute to our geometric understanding. In this paper, we first characterize a pair of representing functions that realizes the invariant information geometric structure. Next, using concepts from affine differential geometry [15,16], we propose a new transformation that yields dually flat geometries from a certain class of non-flat information geometries. We call the transformation conformal flattening; it generalizes the way, developed in [17,18], of realizing the corresponding dually flat geometry from the alpha-geometry. As applications and easy consequences of the results, we finally show several properties of gradient flows associated with the realized dually flat geometries. Focusing on geometric characteristics conserved by the transformation, we discuss properties such as the relation between geodesics and flows and the first integrals of the flows. These properties are new and hold in general; hence, they refine the arguments on the flows in [18], where only the alpha-geometry is treated.

The paper is organized as follows. In Section 2, we introduce preliminary results, explaining several existing methods of constructing information geometric structures, including dually flat structures and the alpha-structure. We also give a short summary of the concepts from affine differential geometry used in this paper. Section 3 provides a characterization of representing functions that realize the invariant geometry, i.e., the one equipped with the Fisher information and a pair of alpha-connections. The characterization is obtained by solving a simple system of ordinary differential equations. In Section 4, we first obtain a certain class of information geometric structures by regarding representing functions as immersions into an ambient affine space.
Then, we demonstrate conformal flattening to realize the corresponding dually flat structure, and discuss its properties and relations with generalized entropies and escort probabilities [19]. Section 5 exhibits the geometric properties of gradient flows with respect to a conformally realized Riemannian metric. These flows reduce to the well-known replicator flow [20] (Chapter 16) when we consider the standard information geometry. By suitably choosing the payoff functions, we see that the flow follows a geodesic curve or conserves a divergence from an equilibrium. In the final section, some concluding remarks are made. Throughout the paper, we use the probability simplex as a statistical model for the sake of simplicity.

2. Preliminaries

2.1. Information Geometry of $\mathcal{S}^n$ and $\mathbb{R}^{n+1}_+$

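Let us represent an element $p \in \mathbb{R}^{n+1}$ with its components $p_i$, $i = 1, \ldots, n+1$, as $p = (p_1, \ldots, p_{n+1})$. Denote, respectively, the positive orthant by
$$\mathbb{R}^{n+1}_+ := \{ p \in \mathbb{R}^{n+1} \mid p_i > 0,\ i = 1, \ldots, n+1 \}$$
and the relative interior of the probability simplex by
$$\mathcal{S}^n := \Big\{ p \in \mathbb{R}^{n+1}_+ \ \Big|\ \sum_{i=1}^{n+1} p_i = 1 \Big\}.$$
Let $p(X)$ be a probability distribution of a random variable $X$ taking a value in the finite sample space $\Omega = \{1, \ldots, n+1\}$. We consider the set of distributions with positive probabilities, defined by
$$\mathcal{S} := \Big\{ p(X) \ \Big|\ p(X) > 0\ (X \in \Omega),\ \sum_{X \in \Omega} p(X) = 1 \Big\},$$
which is identified with $\mathcal{S}^n$ via $p_i = p(i)$. A statistical model in $\mathcal{S}$ is represented with parameters $u = (u^1, \ldots, u^d)$ as a family $\{p(X; u)\}$, where each $p(X; u)$ is smoothly parametrized by $u$. For such a statistical model, $u$ can also be regarded as coordinates of the corresponding submanifold in $\mathcal{S}^n$. For simplicity, we shall consider the full model, i.e., $d = n$, where the parameter set is bijective with $\mathcal{S}^n$ via the $p_i$'s. The information geometric structure [2] on $\mathcal{S}^n$, denoted by $(g, \nabla, \nabla^*)$, is composed of a pair of mutually dual torsion-free affine connections $\nabla$ and $\nabla^*$ with respect to a Riemannian metric $g$. If we write $\partial_i := \partial/\partial u^i$, the mutual duality requires the components $g_{ij} := g(\partial_i, \partial_j)$, $\Gamma_{ij,k} := g(\nabla_{\partial_i}\partial_j, \partial_k)$ and $\Gamma^*_{ij,k} := g(\nabla^*_{\partial_i}\partial_j, \partial_k)$ to satisfy
$$\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}. \quad (1)$$
Let $L$ and $M$ be a pair of strictly monotone (i.e., one-to-one) smooth functions on the interval $(0, 1)$. One way of constructing such a structure is to define the components as follows [2,11]:
$$g_{ij} := \sum_{X \in \Omega} \partial_i L(p(X;u))\, \partial_j M(p(X;u)), \quad (2)$$
$$\Gamma_{ij,k} := \sum_{X \in \Omega} \partial_i \partial_j L(p(X;u))\, \partial_k M(p(X;u)), \quad (3)$$
$$\Gamma^*_{ij,k} := \sum_{X \in \Omega} \partial_k L(p(X;u))\, \partial_i \partial_j M(p(X;u)). \quad (4)$$
In this paper, we call $L$ and $M$ representing functions. It is easy to verify the mutual duality (1). (Positive definiteness of $g$ needs additional conditions.) When the curvature tensors of both $\nabla$ and $\nabla^*$ vanish, $(g, \nabla, \nabla^*)$ is called dually flat [2]. It is known that $(g, \nabla, \nabla^*)$ is dually flat if and only if there exist two special coordinate systems, denoted by $\theta = (\theta^1, \ldots, \theta^n)$ and $\eta = (\eta_1, \ldots, \eta_n)$, respectively, where $\theta$ is $\nabla$-affine, $\eta$ is $\nabla^*$-affine and they are biorthogonal, i.e.,
$$g\Big(\frac{\partial}{\partial \theta^i}, \frac{\partial}{\partial \eta_j}\Big) = \delta_i^j, \quad i, j = 1, \ldots, n.$$
We give examples. For a real number $\alpha$, define $L^{(\alpha)}(t) := \frac{2}{1-\alpha}\, t^{(1-\alpha)/2}$ for $\alpha \neq 1$ and $L^{(1)}(t) := \log t$. If we set $L = L^{(\alpha)}$ and $M = L^{(-\alpha)}$, then they derive the alpha-structure [2] $(g^F, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$, where $g^F$ is the Fisher information and $\nabla^{(\pm\alpha)}$ are the alpha-connections (see Section 3). In particular, if we choose $\alpha = 1$, i.e., $L(t) = \log t$ and $M(t) = t$, it defines the standard dually flat structure $(g^F, \nabla^{(1)}, \nabla^{(-1)})$, where $\nabla^{(1)}$ and $\nabla^{(-1)}$ are called the e- and m-connection, respectively [2]. Similarly, a deformed-logarithm geometry [3] can also be introduced in the same way by a suitable choice of $L$ and $M$.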
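The construction (2) can be checked numerically. The sketch below is a minimal illustration, assuming the full model with coordinates $u^i = p_i$ ($i = 1, \ldots, n$) and $p_{n+1} = 1 - \sum_{i \le n} p_i$; the helper names are hypothetical. It builds $g_{ij}$ from the derivatives of a pair $(L, M)$ and compares it with the Fisher information $\delta_{ij}/p_i + 1/p_{n+1}$ for $L = L^{(\alpha)}$, $M = L^{(-\alpha)}$, using $(L^{(\alpha)})'(t) = t^{-(1+\alpha)/2}$.

```python
def d_alpha_rep(alpha):
    """Derivative of the alpha-representation L^(alpha): t**(-(1 + alpha) / 2).

    The same expression also covers alpha = 1, where L^(1)(t) = log t.
    """
    return lambda t: t ** (-(1.0 + alpha) / 2.0)


def metric_from_representing(p, dL, dM):
    """g_ij = sum_X L'(p_X) M'(p_X) d_i p_X d_j p_X for the full model on the simplex.

    Coordinates are u^i = p_i (i = 1..n); since p_{n+1} = 1 - sum_{i<=n} p_i,
    d_i p_j = delta_ij for j <= n and d_i p_{n+1} = -1.
    """
    n = len(p) - 1
    w = [dL(t) * dM(t) for t in p]  # weights L'(p_X) M'(p_X), X = 1..n+1
    return [[(w[i] if i == j else 0.0) + w[n] for j in range(n)] for i in range(n)]


def fisher_metric(p):
    """Fisher information in the same coordinates: delta_ij / p_i + 1 / p_{n+1}."""
    n = len(p) - 1
    return [[(1.0 / p[i] if i == j else 0.0) + 1.0 / p[n] for j in range(n)] for i in range(n)]
```

Only first derivatives of $L$ and $M$ enter $g$, and $(L^{(\alpha)})'(t)\,(L^{(-\alpha)})'(t) = 1/t$, so the comparison succeeds for every $\alpha$; a pair such as $L = M = \log$ does not reproduce the Fisher information.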
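One traditional way to construct a general information geometric structure $(g, \nabla, \nabla^*)$ without using representing functions is by means of contrast functions (or divergences) [2,21]. In our case, let $\rho(p, r)$ be a function on $\mathcal{S}^n \times \mathcal{S}^n$ satisfying $\rho(p, r) \geq 0$ with equality if and only if $p = r$. For a vector field $X$, let $(X)_p$ denote its tangent vector at $p$, acting on the first argument of $\rho$ (and $(X)_r$ likewise on the second). When we define
$$g(X, Y)(p) := -(X)_p (Y)_r\, \rho(p, r)\big|_{r=p}, \quad (5)$$
$$g(\nabla_X Y, Z)(p) := -(X)_p (Y)_p (Z)_r\, \rho(p, r)\big|_{r=p}, \quad (6)$$
$$g(Y, \nabla^*_X Z)(p) := -(Y)_p (X)_r (Z)_r\, \rho(p, r)\big|_{r=p}, \quad (7)$$
we can confirm that (1) holds. If $g$ is positive definite, we say that $\rho$ is a contrast function or a divergence that induces the structure $(g, \nabla, \nabla^*)$. A contrast function of the form
$$\rho(p, r) = \psi(\theta(p)) + \varphi(\eta(r)) - \sum_{i=1}^{n} \theta^i(p)\, \eta_i(r) \quad (8)$$
always induces the corresponding dually flat structure. Conversely, it is known [2] that if $(g, \nabla, \nabla^*)$ is dually flat, then there exists a unique contrast function of the form (8) that induces the structure. Hence, it is called the canonical divergence of $(g, \nabla, \nabla^*)$, and we say that the functions $\psi$ and $\varphi$ are potentials. By setting $r = p$ in (8), we see that a dually flat structure naturally gives the Legendre duality relations at each $p$, i.e., the function $\varphi$ is the Legendre conjugate of $\psi$, satisfying
$$\psi(\theta(p)) + \varphi(\eta(p)) - \sum_{i=1}^{n} \theta^i(p)\, \eta_i(p) = 0, \qquad \frac{\partial \psi}{\partial \theta^i} = \eta_i, \qquad \frac{\partial \varphi}{\partial \eta_i} = \theta^i. \quad (9)$$
Applying the idea of affine hypersurface theory [15] is another way to construct the information geometric structure. Let $D$ be the canonical flat affine connection on $\mathbb{R}^{n+1}$. Consider an immersion $f$ from $\mathcal{S}^n$ into $\mathbb{R}^{n+1}$ and a vector field $\xi$ on $\mathcal{S}^n$ that is transversal to the hypersurface $f(\mathcal{S}^n)$ in $\mathbb{R}^{n+1}$. Such a pair $(f, \xi)$, called an affine immersion, defines a torsion-free connection $\nabla$ and the affine fundamental form $g$ on $\mathcal{S}^n$ via the Gauss formula
$$D_X f_*(Y) = f_*(\nabla_X Y) + g(X, Y)\, \xi, \quad X, Y \in \mathcal{X}(\mathcal{S}^n),$$
where $\mathcal{X}(\mathcal{S}^n)$ is the set of tangent vector fields on $\mathcal{S}^n$ and $f_*$ denotes the differential of $f$. By regarding $g$ as a (pseudo-)Riemannian metric, one can discuss the realized structure $(g, \nabla)$ on $\mathcal{S}^n$. We say that $(f, \xi)$ is non-degenerate if $g$ is non-degenerate, and equiaffine if $D_X \xi$ is tangent to $f(\mathcal{S}^n)$ for any $X \in \mathcal{X}(\mathcal{S}^n)$. The latter ensures that the volume element $\omega$ on $\mathcal{S}^n$ defined by $\omega(X_1, \ldots, X_n) := \det\big(f_*(X_1), \ldots, f_*(X_n), \xi\big)$ is parallel with respect to $\nabla$ [15] (p.31). It is known [15,16] that there exists a torsion-free dual affine connection $\nabla^*$ satisfying (1) if and only if $(f, \xi)$ is non-degenerate and equiaffine. In this case, the obtained structure $(g, \nabla, \nabla^*)$ on $\mathcal{S}^n$ is not dually flat in general.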
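However, there always exist a positive function $\sigma$ and a dually flat structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ on $\mathcal{S}^n$ that satisfy the following relations [16]:
$$\tilde g_{ij} = \sigma\, g_{ij}, \quad (10)$$
$$\tilde\Gamma_{ij,k} = \sigma\, \Gamma_{ij,k} - (\partial_k \sigma)\, g_{ij}, \quad (11)$$
$$\tilde\Gamma^*_{ij,k} = \sigma\, \Gamma^*_{ij,k} + (\partial_i \sigma)\, g_{jk} + (\partial_j \sigma)\, g_{ik}. \quad (12)$$
Furthermore, there exists a specific contrast function $\rho$ for $(g, \nabla, \nabla^*)$ called the geometric divergence. Then, a contrast function that induces $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ is given by the conformal divergence $\tilde\rho(p, r) := \sigma(r)\, \rho(p, r)$. These properties of the structure realized by a non-degenerate and equiaffine immersion are called 1-conformal flatness [16].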

3. Characterization of Invariant Geometry by Representing Functions

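Suppose that a pair of representing functions $(L, M)$ defines an information geometric structure $(g, \nabla, \nabla^*)$ by (2), (3) and (4). In this section, we consider the condition on $(L, M)$ under which $(g, \nabla, \nabla^*)$ is invariant. This is equivalent [2,14] to requiring that $g$ is the Fisher information $g^F$ defined by
$$g^F_{ij} := \sum_{X \in \Omega} p\, \partial_i \log p\, \partial_j \log p = \sum_{X \in \Omega} \frac{\partial_i p\, \partial_j p}{p}, \quad (13)$$
where we abbreviate $p(X; u)$ as $p$, and that the pair of dual connections satisfies $\nabla = \nabla^{(\alpha)}$ and $\nabla^* = \nabla^{(-\alpha)}$ for a certain $\alpha \in \mathbb{R}$, where $\nabla^{(\alpha)}$ is the $\alpha$-connection defined by
$$\Gamma^{(\alpha)}_{ij,k} := \sum_{X \in \Omega} p \Big( \partial_i \partial_j \log p + \frac{1-\alpha}{2}\, \partial_i \log p\, \partial_j \log p \Big) \partial_k \log p. \quad (14)$$
Hence, $g$ expressed in (2) by the functions $L$ and $M$ coincides with the Fisher information $g^F$ if and only if the following equation holds:
$$L'(t)\, M'(t) = \frac{1}{t}. \quad (15)$$
Similarly, we derive a condition for $\Gamma_{ij,k}$ expressed in (3) to be the $\alpha$-connection. First, note that the following relations hold:
$$\partial_i \partial_j L(p) = L''(p)\, \partial_i p\, \partial_j p + L'(p)\, \partial_i \partial_j p, \quad (16)$$
$$\partial_k M(p) = M'(p)\, \partial_k p. \quad (17)$$
On the other hand, we have
$$\partial_i \partial_j \log p + \frac{1-\alpha}{2}\, \partial_i \log p\, \partial_j \log p = \frac{\partial_i \partial_j p}{p} - \frac{1+\alpha}{2}\, \frac{\partial_i p\, \partial_j p}{p^2}, \qquad \partial_k \log p = \frac{\partial_k p}{p}. \quad (18)$$
Substituting (16), (17) and (18) into (3) and (14), and comparing the coefficients of $\partial_i \partial_j p\, \partial_k p$ and $\partial_i p\, \partial_j p\, \partial_k p$, we obtain (15) again and
$$L''(t)\, M'(t) = -\frac{1+\alpha}{2t^2}. \quad (19)$$
Writing $\ell(t) := L'(t)$ and $m(t) := M'(t)$, we have the following ODE from (15) and (19):
$$\ell(t)\, m(t) = \frac{1}{t}, \qquad \ell'(t)\, m(t) = -\frac{1+\alpha}{2t^2},$$
so that $\ell'(t)/\ell(t) = -(1+\alpha)/(2t)$. By integrations, we get
$$L(t) = c\, L^{(\alpha)}(t) + d \quad \text{and} \quad M(t) = c'\, L^{(-\alpha)}(t) + d',$$
where $c$, $c'$, $d$ and $d'$ are constants with the constraint $c c' = 1$. Thus, $(L, M)$ is essentially a pair of representing functions that derives the alpha-geometry, and the only freedom left for the invariance of the geometry is the adjustment of these constants. If we require solely (15), which implies only that the Riemannian metric $g$ is the Fisher information $g^F$, there still remains much freedom for $(L, M)$.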
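The integrated solution can be checked against the two conditions $L'M' = 1/t$ and $L''M' = -(1+\alpha)/(2t^2)$ by finite differences. This is a sketch under stated assumptions: the step size `h`, the sample points, the constants and the helper names are illustrative choices, not part of the derivation above.

```python
import math


def alpha_rep(alpha):
    """L^(alpha)(t) = 2/(1 - alpha) * t**((1 - alpha)/2) for alpha != 1, and log t for alpha = 1."""
    if alpha == 1.0:
        return math.log
    return lambda t: 2.0 / (1.0 - alpha) * t ** ((1.0 - alpha) / 2.0)


def solution_pair(alpha, c, d=0.0, d_prime=0.0):
    """Hypothetical helper: L = c L^(alpha) + d and M = (1/c) L^(-alpha) + d' (so c c' = 1)."""
    La, Lm = alpha_rep(alpha), alpha_rep(-alpha)
    return (lambda t: c * La(t) + d), (lambda t: Lm(t) / c + d_prime)


def ode_residuals(L, M, alpha, t, h=1e-4):
    """Residuals of L'M' = 1/t and L''M' = -(1 + alpha)/(2 t^2), using central differences."""
    dL = (L(t + h) - L(t - h)) / (2.0 * h)
    d2L = (L(t + h) - 2.0 * L(t) + L(t - h)) / h ** 2
    dM = (M(t + h) - M(t - h)) / (2.0 * h)
    return dL * dM - 1.0 / t, d2L * dM + (1.0 + alpha) / (2.0 * t ** 2)
```

Both residuals vanish (up to discretization error) for any choice of $c \neq 0$, $d$ and $d'$, while a pair violating $cc' = 1$ or using $L^{(\alpha)}$ for both functions leaves a visible residual.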

4. Affine Immersion of the Probability Simplex

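Now we consider the affine immersion $(f, \xi)$ with the following assumptions.

Assumptions:
1. The affine immersion $(f, \xi)$ is non-degenerate and equiaffine.
2. The immersion $f$ is given by the component-by-component and common representing function $L$, i.e., $f: \mathcal{S}^n \to \mathbb{R}^{n+1}$, $f(p) = x = (x^1, \ldots, x^{n+1})$, $x^i = L(p_i)$, $i = 1, \ldots, n+1$.
3. The representing function $L: (0, 1) \to \mathbb{R}$ is sign-definite, concave with $L'' < 0$ and strictly increasing, i.e., $L' > 0$. Hence, the inverse of $L$, denoted by $E$, exists, i.e., $E(L(t)) = t$.
4. Each component of $\xi$ satisfies $\xi^i < 0$ on $\mathcal{S}^n$.

From the assumption 3, it follows that centro-affine, which is known to be equiaffine [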

4.1. Conormal Vector and the Geometric Divergence

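Define a function $\Psi$ of $x \in \mathbb{R}^{n+1}$ by
$$\Psi(x) := \sum_{i=1}^{n+1} E(x^i);$$
then $f(\mathcal{S}^n)$ immersed in $\mathbb{R}^{n+1}$ is expressed as the level surface $\Psi(x) = 1$. Denote by $\mathbb{R}_{n+1}$ the dual space of $\mathbb{R}^{n+1}$ and by $\langle \nu, x \rangle$ the pairing of $x \in \mathbb{R}^{n+1}$ and $\nu \in \mathbb{R}_{n+1}$. The conormal vector [15] (p.57) $\nu: \mathcal{S}^n \to \mathbb{R}_{n+1}$ for the affine immersion $(f, \xi)$ is defined by
$$\langle \nu(p), f_*(X) \rangle = 0, \qquad \langle \nu(p), \xi(p) \rangle = 1 \quad (23)$$
for $X \in T_p \mathcal{S}^n$. Using the assumptions and noting the relations
$$\frac{\partial \Psi}{\partial x^i} = E'(x^i) = \frac{1}{L'(p_i)}, \quad i = 1, \ldots, n+1,$$
we have
$$\nu_i(p) = \frac{1}{\Lambda(p)} \frac{\partial \Psi}{\partial x^i} = \frac{1}{\Lambda(p)} \frac{1}{L'(p_i)}, \quad i = 1, \ldots, n+1, \quad (24)$$
where $\Lambda$ is a normalizing factor defined by
$$\Lambda(p) := \sum_{i=1}^{n+1} \frac{\partial \Psi}{\partial x^i}\, \xi^i(p) = \sum_{i=1}^{n+1} \frac{\xi^i(p)}{L'(p_i)}. \quad (25)$$
Then, we can confirm (23) using the relation $\sum_{i=1}^{n+1} X^i = 0$ for $X = (X^1, \ldots, X^{n+1}) \in T_p \mathcal{S}^n$, since $f_*(X) = (L'(p_1) X^1, \ldots, L'(p_{n+1}) X^{n+1})$. Note that $\hat\nu(p) := \Lambda(p)\, \nu(p)$, i.e., $\hat\nu_i(p) = 1/L'(p_i)$, also satisfies
$$\langle \hat\nu(p), f_*(X) \rangle = 0, \quad X \in T_p \mathcal{S}^n. \quad (26)$$
Furthermore, it follows from (24), (25) and the assumption 4 that $\Lambda(p) < 0$ and $\nu_i(p) < 0$, $i = 1, \ldots, n+1$, for all $p \in \mathcal{S}^n$. It is known [15] (p.57) that the affine fundamental form $g$ can be represented by
$$g(X, Y) = -\langle \nu_*(X), f_*(Y) \rangle, \quad X, Y \in T_p \mathcal{S}^n.$$
In our case, it is calculated via (26) as
$$g(X, Y) = \frac{1}{\Lambda(p)} \sum_{i=1}^{n+1} \frac{L''(p_i)}{L'(p_i)}\, X^i Y^i. \quad (27)$$
Hence, $g$ is positive definite from the assumptions 3 and 4, and we can regard it as a Riemannian metric. Utilizing these notions from affine differential geometry, we can introduce a geometric divergence [16] as follows:
$$\rho(p, r) := \langle \nu(r), f(p) - f(r) \rangle = \frac{1}{\Lambda(r)} \sum_{i=1}^{n+1} \frac{L(p_i) - L(r_i)}{L'(r_i)}, \quad p, r \in \mathcal{S}^n. \quad (28)$$
It is easily checked, using (5), (6) and (7), that $\rho$ is actually a contrast function of the 1-conformally flat structure $(g, \nabla, \nabla^*)$.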
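The conormal vector, the normalizing factor and the metric formula can be exercised on a concrete, hypothetical choice satisfying the four assumptions: $L(t) = -1/t$ (negative, strictly increasing and concave on $(0, 1)$) with the constant transversal field $\xi = (-1, \ldots, -1)$, which is equiaffine because $D_X \xi = 0$. The sketch checks the defining relations $\langle \nu, f_*(X) \rangle = 0$ and $\langle \nu, \xi \rangle = 1$, compares the closed-form metric $\Lambda^{-1} \sum_i (L''(p_i)/L'(p_i)) X^i Y^i$ with $g(X, Y) = -\langle \nu_*(X), f_*(Y) \rangle$ computed by a central difference, and evaluates the geometric divergence $\langle \nu(r), f(p) - f(r) \rangle$. All names are illustrative.

```python
def L(t):
    return -1.0 / t          # hypothetical L: negative, L' > 0, L'' < 0 on (0, 1)


def dL(t):
    return 1.0 / t ** 2


def d2L(t):
    return -2.0 / t ** 3


def xi_field(p):
    """Hypothetical constant transversal field xi = (-1, ..., -1); D_X xi = 0, so it is equiaffine."""
    return [-1.0] * len(p)


def conormal(p):
    """nu_i = 1 / (Lambda L'(p_i)) with Lambda = sum_i xi^i / L'(p_i); returns (nu, Lambda)."""
    lam = sum(x / dL(t) for x, t in zip(xi_field(p), p))
    return [1.0 / (lam * dL(t)) for t in p], lam


def push_forward(p, X):
    """f_*(X) for f(p) = (L(p_1), ..., L(p_{n+1}))."""
    return [dL(t) * x for t, x in zip(p, X)]


def metric_formula(p, X, Y):
    """Closed form: (1/Lambda) sum_i L''(p_i)/L'(p_i) X^i Y^i."""
    _, lam = conormal(p)
    return sum(d2L(t) / dL(t) * x * y for t, x, y in zip(p, X, Y)) / lam


def metric_from_conormal(p, X, Y, h=1e-6):
    """g(X, Y) = -<nu_*(X), f_*(Y)>, with nu_*(X) approximated by a central difference along X.

    X must satisfy sum(X) = 0 so that p +/- h X stays on the simplex.
    """
    nu_plus, _ = conormal([t + h * x for t, x in zip(p, X)])
    nu_minus, _ = conormal([t - h * x for t, x in zip(p, X)])
    dnu = [(a - b) / (2.0 * h) for a, b in zip(nu_plus, nu_minus)]
    return -sum(a * b for a, b in zip(dnu, push_forward(p, Y)))


def geometric_divergence(p, r):
    """rho(p, r) = <nu(r), f(p) - f(r)>."""
    nu_r, _ = conormal(r)
    return sum(v * (L(a) - L(b)) for v, a, b in zip(nu_r, p, r))
```

Because $\Lambda < 0$ and $L''/L' < 0$, the metric is positive on nonzero tangent vectors, and concavity of $L$ makes the geometric divergence nonnegative.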

4.2. Conformal Flattening Transformation

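As described in the preliminary section, 1-conformal flatness guarantees a positive function, i.e., a conformal factor $\sigma$, that relates $(g, \nabla, \nabla^*)$ to a dually flat structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ via the conformal transformation (10), (11) and (12). A contrast function that induces $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ is given as the conformal divergence of the geometric divergence $\rho$ in (28). For an arbitrary function $L$ within our setting given by the four assumptions, we prove that a dually flat structure can be constructed by choosing the conformal factor carefully. Hereafter, we call this transformation conformal flattening.

Define
$$Z(p) := \sum_{i=1}^{n+1} \nu_i(p);$$
then $Z(p)$ is negative because each $\nu_i(p)$ is negative. The conformal divergence to $\rho$ with respect to the conformal factor $\sigma(r)$ is
$$\tilde\rho(p, r) := \sigma(r)\, \rho(p, r). \quad (29)$$

Theorem 1. If the conformal factor is
$$\sigma(p) = -\frac{1}{Z(p)}, \quad (30)$$
then the information geometric structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ is dually flat. Mutually dual affine coordinate systems are given by
$$\theta^i(p) = x^i(p) - x^{n+1}(p) = L(p_i) - L(p_{n+1}), \qquad \eta_i(p) = P_i(p) := \frac{\nu_i(p)}{Z(p)} = \frac{1/L'(p_i)}{\sum_{j=1}^{n+1} 1/L'(p_j)}, \quad i = 1, \ldots, n, \quad (31)$$
and the associated potentials are
$$\psi(p) = -x^{n+1}(p) = -L(p_{n+1}), \quad (32)$$
$$\varphi(p) = \sum_{i=1}^{n+1} P_i(p)\, x^i(p) = \sum_{i=1}^{n+1} P_i(p)\, L(p_i), \quad (33)$$
where $P_{n+1}$ is defined by the same formula as in (31).

Proof. Using the given relations and $\sum_{i=1}^{n+1} P_i(r) = 1$, we first show that the conformal divergence is the canonical divergence for $(\tilde g, \tilde\nabla, \tilde\nabla^*)$:
$$\tilde\rho(p, r) = -\frac{1}{Z(r)} \sum_{i=1}^{n+1} \nu_i(r) \big( L(p_i) - L(r_i) \big) = \sum_{i=1}^{n+1} P_i(r) \big( L(r_i) - L(p_i) \big) = \psi(p) + \varphi(r) - \sum_{i=1}^{n} \theta^i(p)\, \eta_i(r). \quad (34)$$
Next, let us confirm that $\eta_i = \partial \psi / \partial \theta^i$. Since $\theta^i - \psi = L(p_i)$, we have $p_i = E(\theta^i - \psi)$ for $i = 1, \ldots, n+1$ by setting $\theta^{n+1} := 0$. Hence, $\sum_{i=1}^{n+1} p_i = 1$ gives
$$\sum_{i=1}^{n+1} E(\theta^i - \psi) = 1.$$
Differentiating by $\theta^j$, we obtain
$$\sum_{i=1}^{n+1} E'(\theta^i - \psi) \Big( \delta^i_j - \frac{\partial \psi}{\partial \theta^j} \Big) = 0.$$
This implies that
$$\frac{\partial \psi}{\partial \theta^j} = \frac{E'(\theta^j - \psi)}{\sum_{i=1}^{n+1} E'(\theta^i - \psi)} = \frac{1/L'(p_j)}{\sum_{i=1}^{n+1} 1/L'(p_i)} = \eta_j, \quad j = 1, \ldots, n.$$
Together with (34) and this relation, $\varphi$ is confirmed to be the Legendre conjugate of $\psi$. The dual relation $\partial \varphi / \partial \eta_i = \theta^i$ follows automatically from the property of the Legendre transform. □

The following corollary is straightforward because all the quantities in the theorem depend only on $L$:

Corollary 1. Under the assumptions, the dually flat structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ on $\mathcal{S}^n$ obtained by the above conformal flattening does not depend on the choice of the transversal vector field $\xi$.

Remark 2. Note that the conformal metric is given by
$$\tilde g(X, Y) = \sigma(p)\, g(X, Y) = -\frac{1}{\sum_{k=1}^{n+1} 1/L'(p_k)} \sum_{i=1}^{n+1} \frac{L''(p_i)}{L'(p_i)}\, X^i Y^i,$$
which does not depend on $\xi$. By (12), $\tilde\nabla^*$ and $\nabla^*$ are projectively (or $(-1)$-conformally) equivalent [ ∇ is also projectively equivalent to the flat connection In our setting, conformal flattening is geometrically regarded as normalization of the conormal vector $\nu$, since $\eta_i = \nu_i / Z$. Hence, the dual coordinates escort probability [

While the immersion $f$ is composed of a representing function $L$ under the assumption 2, the corresponding $M$ of a single variable does not generally exist for $g$ nor $\tilde g$.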
From the expressions of the Riemannian metrics $g$ in (27) and $\tilde g$, we see that the counterparts of the representing function $M$ would be, respectively, $-\nu_i(p)$ and $(1/L'(p_i)) / \sum_{j=1}^{n+1} (1/L'(p_j))$, but note that they are multi-variable functions of $p$.
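Theorem 1 and Corollary 1 can be illustrated numerically. The sketch below assumes a hypothetical representing function $L(t) = \log t - 1/t$, which is negative, strictly increasing and concave on $(0, 1)$, and two hypothetical equiaffine transversal fields with negative components: the constant field $\xi = (-1, \ldots, -1)$ and the centro-affine field $\xi = f$. It checks that the conformal divergence with $\sigma = -1/Z$ is the same for both fields, that it has the canonical form $\psi(p) + \varphi(r) - \sum_i \theta^i(p) \eta_i(r)$ with $\theta^i = L(p_i) - L(p_{n+1})$, $\eta_i = (1/L'(p_i)) / \sum_j (1/L'(p_j))$, $\psi = -L(p_{n+1})$ and $\varphi = \sum_{i=1}^{n+1} \eta_i L(p_i)$, and that $d\psi = \sum_i \eta_i\, d\theta^i$ along a tangent direction.

```python
import math


# Hypothetical representing function satisfying assumption 3 on (0, 1):
# negative, strictly increasing (L' > 0) and concave (L'' < 0).
def L(t):
    return math.log(t) - 1.0 / t


def dL(t):
    return 1.0 / t + 1.0 / t ** 2


# Two hypothetical equiaffine transversal fields with negative components (assumption 4).
def xi_const(p):
    return [-1.0] * len(p)            # constant field: D_X xi = 0


def xi_centro(p):
    return [L(t) for t in p]          # xi = f (centro-affine); negative because L < 0


def conormal(p, xi_field):
    """nu_i = 1 / (Lambda L'(p_i)), Lambda = sum_i xi^i / L'(p_i)."""
    lam = sum(x / dL(t) for x, t in zip(xi_field(p), p))
    return [1.0 / (lam * dL(t)) for t in p]


def geometric_divergence(p, r, xi_field):
    """rho(p, r) = <nu(r), f(p) - f(r)>."""
    return sum(v * (L(a) - L(b)) for v, a, b in zip(conormal(r, xi_field), p, r))


def conformal_divergence(p, r, xi_field):
    """sigma(r) rho(p, r) with sigma = -1/Z and Z(r) = sum_i nu_i(r) < 0."""
    Z = sum(conormal(r, xi_field))
    return -geometric_divergence(p, r, xi_field) / Z


def escort_weights(p):
    """P_i = (1/L'(p_i)) / sum_j (1/L'(p_j)) for all n+1 components."""
    w = [1.0 / dL(t) for t in p]
    s = sum(w)
    return [x / s for x in w]


def theta(p):
    return [L(t) - L(p[-1]) for t in p[:-1]]


def eta(p):
    return escort_weights(p)[:-1]


def psi(p):
    return -L(p[-1])


def phi(p):
    return sum(w * L(t) for w, t in zip(escort_weights(p), p))
```

The normalizing factor cancels between $\rho$ and $Z$, which is why the two transversal fields give the same conformal divergence.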

4.3. Examples

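If we take $L$ to be the logarithmic function $L(t) = \log t$, then the conformally flattened geometry $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ immediately defines the standard dually flat structure $(g^F, \nabla^{(1)}, \nabla^{(-1)})$ on the simplex $\mathcal{S}^n$. We see that $-\varphi$ is the entropy, i.e., $-\varphi(p) = -\sum_{i=1}^{n+1} p_i \log p_i$, and the conformal divergence is the KL divergence (relative entropy), i.e., $\tilde\rho(p, r) = \sum_{i=1}^{n+1} r_i (\log r_i - \log p_i)$. Next, let the affine immersion $(f, \xi)$ be defined by the following $L$ and $\xi$:
$$L(t) := \frac{1}{1-q}\, t^{1-q} \quad \text{and} \quad \xi(p) := -q(1-q)\, f(p),$$
with $q > 0$ and $q \neq 1$. We see that the immersion is centro-affine scaled by the constant factor $-q(1-q)$. Then, we see that the immersion realizes the alpha-structure $(g^F, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ on $\mathcal{S}^n$ with $\alpha = 2q - 1$. The geometric divergence is the alpha-divergence, i.e.,
$$\rho(p, r) = \frac{4}{1-\alpha^2} \Big( 1 - \sum_{i=1}^{n+1} p_i^{(1-\alpha)/2}\, r_i^{(1+\alpha)/2} \Big).$$
Following the procedure of conformal flattening described above, we have [17]
$$\sigma(p) = \frac{q}{\sum_{i=1}^{n+1} p_i^q}$$
and obtain a dually flat structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ via the formulas in Theorem 1:
$$\theta^i(p) = \ln_q(p_i) - \ln_q(p_{n+1}), \qquad \eta_i(p) = \frac{p_i^q}{\sum_{j=1}^{n+1} p_j^q}, \quad i = 1, \ldots, n,$$
$$\psi(p) = -\ln_q(p_{n+1}) - \frac{1}{1-q}, \qquad \varphi(p) = \frac{1}{1-q} - \frac{S_q(p)}{\sum_{j=1}^{n+1} p_j^q}.$$
Here, $\ln_q$ and $S_q$ are the q-logarithmic function and the Tsallis entropy [10], respectively, defined by
$$\ln_q(t) := \frac{t^{1-q} - 1}{1-q}, \qquad S_q(p) := \frac{1 - \sum_{i=1}^{n+1} p_i^q}{q-1} = -\sum_{i=1}^{n+1} p_i^q \ln_q(p_i).$$
Note that the escort probability appears as the dual coordinate $\eta$.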
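For the alpha-geometry example, the following sketch assumes the parametrization $L(t) = t^{1-q}/(1-q)$ with the centro-affine field $\xi = -q(1-q) f$ (so that $\alpha = 2q - 1$); function names are illustrative. It checks that the geometric divergence coincides with the alpha-divergence, that the conformal factor $-1/Z$ equals $q / \sum_i p_i^q$, and that the potential $\varphi = \sum_i P_i L(p_i)$ built from the escort probability $P_i = p_i^q / \sum_j p_j^q$ equals $1/(1-q) - S_q(p) / \sum_j p_j^q$.

```python
def make_example(q):
    """Assumed example: L(t) = t^(1-q)/(1-q), L'(t) = t^(-q), xi = -q(1-q) f (centro-affine)."""
    def L(t):
        return t ** (1.0 - q) / (1.0 - q)

    def dL(t):
        return t ** (-q)

    def xi(p):
        return [-q * (1.0 - q) * L(t) for t in p]

    return L, dL, xi


def normalizer(p, q):
    """Lambda(p) = sum_i xi^i(p) / L'(p_i); equals -q on the simplex for this example."""
    _, dL, xi = make_example(q)
    return sum(x / dL(t) for x, t in zip(xi(p), p))


def geometric_divergence(p, r, q):
    L, dL, _ = make_example(q)
    return sum((L(a) - L(b)) / dL(b) for a, b in zip(p, r)) / normalizer(r, q)


def alpha_divergence(p, r, alpha):
    return 4.0 / (1.0 - alpha ** 2) * (
        1.0 - sum(a ** ((1.0 - alpha) / 2.0) * b ** ((1.0 + alpha) / 2.0) for a, b in zip(p, r)))


def conformal_factor(p, q):
    """sigma(p) = -1/Z(p), with Z(p) = sum_i 1 / (Lambda(p) L'(p_i))."""
    _, dL, _ = make_example(q)
    lam = normalizer(p, q)
    return -1.0 / sum(1.0 / (lam * dL(t)) for t in p)


def escort(p, q):
    s = sum(t ** q for t in p)
    return [t ** q / s for t in p]


def tsallis(p, q):
    return (1.0 - sum(t ** q for t in p)) / (q - 1.0)
```

Since $\Lambda \equiv -q$ here, the metric (27) reduces to $\sum_i X^i Y^i / p_i$, i.e., the Fisher metric, which is why this centro-affine scaling is convenient.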

5. An Application to Gradient Flows on $\mathcal{S}^n$

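Recall the replicator flow on the simplex $\mathcal{S}^n$ for given functions $f_i(p)$, $i = 1, \ldots, n+1$, defined by
$$\dot p_i = p_i \big( f_i(p) - \bar f(p) \big), \qquad \bar f(p) := \sum_{j=1}^{n+1} p_j f_j(p), \quad i = 1, \ldots, n+1, \quad (35)$$
which is extensively studied in evolutionary game theory. It is known [20] (Chapter 16) that the solution to (35) is the gradient flow that maximizes a function $V(p)$ satisfying
$$\frac{\partial V}{\partial p_i} = f_i(p), \quad i = 1, \ldots, n+1, \quad (36)$$
with respect to the Shahshahani metric $g^S$ (see below), and that the KL divergence is a local Lyapunov function for an equilibrium called the evolutionarily stable state (ESS) for the case of linear payoffs $f_i(p) = \sum_{j=1}^{n+1} a_{ij} p_j$. The Shahshahani metric is defined on the positive orthant $\mathbb{R}^{n+1}_+$ by
$$g^S_{ij}(p) := \frac{\sum_{k=1}^{n+1} p_k}{p_i}\, \delta_{ij}, \quad i, j = 1, \ldots, n+1.$$
Note that the Shahshahani metric induces the Fisher metric $g^F$ on $\mathcal{S}^n$. Further, the KL divergence is the canonical divergence [2] of $(g^F, \nabla^{(1)}, \nabla^{(-1)})$. Thus, the replicator dynamics (35) are closely related to the standard dually flat structure $(g^F, \nabla^{(1)}, \nabla^{(-1)})$, which is associated with exponential and mixture families of probability distributions. In addition, investigation of the flow is also important from the viewpoint of statistical physics governed by the Boltzmann–Gibbs distributions when we choose $V$ as various physical quantities, e.g., free energy or entropy. Similarly, when we consider various Legendre relations deformed by $L$, it would be of interest to investigate gradient flows on $\mathcal{S}^n$ for a dually flat structure $(\tilde g, \tilde\nabla, \tilde\nabla^*)$ or a 1-conformally flat structure $(g, \nabla, \nabla^*)$. Since $g$ and $\tilde g$ can be naturally extended from (27) to $\mathbb{R}^{n+1}_+$ as diagonal forms (we use the same notation for brevity),
$$g_{ij}(p) = \frac{1}{\Lambda(p)} \frac{L''(p_i)}{L'(p_i)}\, \delta_{ij}, \qquad \tilde g_{ij}(p) = \sigma(p)\, g_{ij}(p), \quad i, j = 1, \ldots, n+1, \quad (37)$$
we can define two gradient flows for $V$ on $\mathcal{S}^n$. One is the gradient flow for $g$, which is
$$\dot p_i = g^{ii}(p) \big( f_i(p) - \bar f^{g}(p) \big), \qquad \bar f^{g}(p) := \frac{\sum_{j=1}^{n+1} g^{jj}(p) f_j(p)}{\sum_{j=1}^{n+1} g^{jj}(p)}, \quad (38)$$
for $i = 1, \ldots, n+1$, where $g^{ii} := 1/g_{ii}$. It is verified that $\dot p$ is tangent to $\mathcal{S}^n$, i.e., $\sum_{i=1}^{n+1} \dot p_i = 0$, and that it is the gradient of $V$, i.e., $g(\dot p, X) = \sum_{i=1}^{n+1} f_i(p) X^i = dV(X)$ for all $X \in T_p \mathcal{S}^n$. In the same way, the other one for $\tilde g$ is defined by
$$\dot p_i = \tilde g^{ii}(p) \big( f_i(p) - \bar f^{\tilde g}(p) \big), \qquad \bar f^{\tilde g}(p) := \frac{\sum_{j=1}^{n+1} \tilde g^{jj}(p) f_j(p)}{\sum_{j=1}^{n+1} \tilde g^{jj}(p)}. \quad (39)$$
Note that both flows reduce to (35) when $L(t) = \log t$, where for (38) we additionally take $\xi = -(1, \ldots, 1)$ so that $\Lambda \equiv -1$. From (37), the following consequence is immediate:

Proposition 1. The trajectories of the gradient flows (38) and (39) starting from the same initial point coincide, while the velocities of their time evolutions differ by the factor $\sigma(p)$.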
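The two flows can be compared at a single point. The sketch assumes the diagonal metric $g_{ii} = L''(p_i)/(\Lambda(p) L'(p_i))$, the conformal metric $\sigma g$ with $\sigma = -1/Z$, and the simplex gradient $\dot p_i = g^{ii}(f_i - \bar f)$ with $\bar f = \sum_j g^{jj} f_j / \sum_j g^{jj}$, which keeps $\sum_i \dot p_i = 0$. The representing function $L(t) = \log t - 1/t$, the field $\xi = (-1, \ldots, -1)$ and the payoff matrix are hypothetical test data. It checks tangency, the gradient property, the velocity ratio $1/\sigma$ between the $g$-flow and the $\tilde g$-flow, and the reduction to the replicator flow for $L = \log$.

```python
def velocity(ginv, f):
    """Gradient flow on the simplex for a diagonal metric with inverse entries ginv[i]:
    pdot_i = ginv_i (f_i - fbar), fbar = sum_j ginv_j f_j / sum_j ginv_j, so sum_i pdot_i = 0."""
    fbar = sum(h * fi for h, fi in zip(ginv, f)) / sum(ginv)
    return [h * (fi - fbar) for h, fi in zip(ginv, f)]


def inverse_metrics(p, dL, d2L, xi):
    """Inverse diagonal entries of g and of sigma * g, together with sigma = -1/Z."""
    lam = sum(x / dL(t) for x, t in zip(xi, p))      # Lambda(p) < 0
    Z = sum(1.0 / (lam * dL(t)) for t in p)         # Z(p) = sum_i nu_i(p) < 0
    sigma = -1.0 / Z
    g_inv = [lam * dL(t) / d2L(t) for t in p]        # 1 / g_ii, positive
    return g_inv, [h / sigma for h in g_inv], sigma


def replicator(p, f):
    """Replicator flow: pdot_i = p_i (f_i - sum_j p_j f_j)."""
    fbar = sum(t * fi for t, fi in zip(p, f))
    return [t * (fi - fbar) for t, fi in zip(p, f)]


# Hypothetical test data: L(t) = log t - 1/t and a linear payoff matrix.
def dL(t):
    return 1.0 / t + 1.0 / t ** 2


def d2L(t):
    return -1.0 / t ** 2 - 2.0 / t ** 3


A = [[0.0, 2.0, -1.0], [1.0, 0.0, 3.0], [-2.0, 1.0, 0.0]]
```

Because $\tilde g^{ii} = g^{ii}/\sigma$, the mean $\bar f$ is the same for both metrics, so the two velocity fields are parallel at every point, which is the content of the proposition.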
Taking account of the example of the alpha-geometry and its conformally flattened counterpart given in Section 4.3, the following result shown in [18] can be regarded as a corollary of the above proposition:

Corollary 2. For $L(t) = t^{1-q}/(1-q)$, the trajectories of the gradient flow (39) with respect to the conformal metric $\tilde g$ coincide with those of the replicator flow (35), while the velocities of their time evolutions differ by the factor $\sigma(p) = q / \sum_{i=1}^{n+1} p_i^q$.

Next, we particularly consider the case when $V$ is a potential function or a divergence. As for a gradient flow on a manifold equipped with a dually flat structure $(g, \nabla, \nabla^*)$, the following result is known:

Proposition 2 ([22]). Consider the potential function $\psi$ and the canonical divergence $\rho(\cdot, r)$ of $(g, \nabla, \nabla^*)$ for an arbitrary prefixed point $r$. The gradient flows for $V = \psi$ and $V = \rho(\cdot, r)$ follow $\nabla^*$-geodesic curves.

As is described in Remark 2, $\tilde\nabla^*$ and $\nabla^*$ are projectively equivalent. One geometrically interesting property of projective equivalence is that $\tilde\nabla^*$- and $\nabla^*$-geodesic curves coincide up to their parametrizations (i.e., a curve is $\tilde\nabla^*$-pregeodesic if and only if it is $\nabla^*$-pregeodesic) [15] (p.17). Combining this fact with Propositions 1 and 2, we see that the following result holds:

Proposition 3. Let $r \in \mathcal{S}^n$ be an arbitrary prefixed point. The gradient flows (38) for $V = \psi$, $\tilde\rho(\cdot, r)$ and $\rho(\cdot, r)$ follow $\nabla^*$-geodesic curves.

Finally, we demonstrate another aspect of the flow (39). Let us particularly consider the functions $f_i$ given in (40), which are built from an anti-symmetric matrix $(a_{ij})$, i.e., $a_{ij} = -a_{ji}$. Note that the $f_i$'s are not integrable, i.e., a non-trivial $V$ satisfying (36) does not exist because of the anti-symmetry of $(a_{ij})$. Hence, in this case, (39) is no longer a gradient flow. However, we can prove the following result:

Proposition 4. Consider the flow (39) with the functions $f_i$ defined in (40) and assume that there exists an equilibrium $r \in \mathcal{S}^n$ for the flow. Then, $\tilde\rho(p, r)$ and $\rho(p, r)$ are first integrals (conserved quantities) of the flow.
Proof. By substituting (40) into $f_i$ in (39) and using the expression of $\tilde g$ in (37), we have By the relation and (31), it holds that Hence, we see that and the flow (39) reduces to Since $r$ is an equilibrium point, we see from (40) that Then, using (34), (41) and (42), we have Thus, $\tilde\rho(p, r)$ is a first integral of the flow. It follows from the definition of the conformal divergence (29) that $\rho(p, r) = \tilde\rho(p, r)/\sigma(r)$ is also a first integral of the flow, since $r$ is fixed. □ By Proposition 1, the same statement holds for the flow (38). The proposition implies the fact [20] that the KL divergence is a first integral of the replicator flow (35) with $f_i(p) = \sum_{j=1}^{n+1} a_{ij} p_j$ and $a_{ij} = -a_{ji}$.
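The general payoff functions in (40) are not reproduced here, so the sketch below only illustrates the classical special case mentioned at the end of the proof: the replicator flow (35) with $f_i(p) = \sum_j a_{ij} p_j$ and an anti-symmetric matrix $(a_{ij})$. The rock-paper-scissors matrix, its interior equilibrium $r = (1/3, 1/3, 1/3)$, the initial point and the fourth-order Runge–Kutta integrator are assumptions of this sketch. For $L = \log$, the conformal divergence is the KL divergence $\sum_i r_i \log(r_i/p_i)$, which should stay constant along the flow up to integration error.

```python
import math


def replicator(p, A):
    """Replicator flow with linear payoffs f_i(p) = sum_j a_ij p_j."""
    f = [sum(a * x for a, x in zip(row, p)) for row in A]
    fbar = sum(x * fi for x, fi in zip(p, f))
    return [x * (fi - fbar) for x, fi in zip(p, f)]


def rk4_step(p, A, dt):
    """One classical fourth-order Runge-Kutta step; preserves sum(p) up to rounding."""
    k1 = replicator(p, A)
    k2 = replicator([x + 0.5 * dt * k for x, k in zip(p, k1)], A)
    k3 = replicator([x + 0.5 * dt * k for x, k in zip(p, k2)], A)
    k4 = replicator([x + dt * k for x, k in zip(p, k3)], A)
    return [x + dt / 6.0 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def kl(r, p):
    """KL divergence sum_i r_i log(r_i / p_i), the conformal divergence for L = log."""
    return sum(a * math.log(a / b) for a, b in zip(r, p))


A = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]   # anti-symmetric (rock-paper-scissors)
r = [1.0 / 3.0] * 3                                         # interior equilibrium: A r = 0
```

The orbits are closed curves around $r$, each lying on a level set of the KL divergence from $r$.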

6. Conclusions

We have considered two important aspects of information geometric structure, i.e., invariance and dual flatness, from the viewpoint of representing functions. As for the invariance of geometry, we have proved that a pair of representing functions that derives the alpha-structure is essentially unique. On the other hand, we have shown an explicit formula of conformal flattening that transforms 1-conformally flat structures on the simplex realized by affine immersions into the corresponding dually flat structures. Finally, we have discussed several geometric properties of gradient flows associated with the two structures. Presently, our analysis is restricted to the probability simplex, i.e., the space of discrete probability distributions. For the continuous case, similar or related results have been obtained in [23,24] without using affine immersions. Extending the results of this paper to continuous probability spaces and exploring their relations to that literature are left for future work. Conformal flattening can also be applied to the computationally efficient construction of Voronoi diagrams with respect to geometric divergences [18]. Exploring other possible applications would be of interest.