
On conjugate families and Jeffreys priors for von Mises-Fisher distributions.

Abstract

This paper discusses characteristics of standard conjugate priors and their induced posteriors in Bayesian inference for von Mises-Fisher distributions, using either the canonical natural exponential family or the more commonly employed polar coordinate parameterizations. We analyze when standard conjugate priors as well as posteriors are proper, and investigate the Jeffreys prior for the von Mises-Fisher family. Finally, we characterize the proper distributions in the standard conjugate family of the (matrix-valued) von Mises-Fisher distributions on Stiefel manifolds.


Keywords: Bayesian inference; Conjugate prior; Jeffreys prior; von Mises–Fisher distribution

Year: 2013 PMID: 23805026 PMCID: PMC3690539 DOI: 10.1016/j.jspi.2012.11.003

Source DB: PubMed Journal: J Stat Plan Inference ISSN: 0378-3758 Impact factor: 1.111

Introduction

A random unit length vector $X$ in $\mathbb{R}^d$ has a von Mises–Fisher (or Langevin, short: vMF) distribution with parameter $\theta \in \mathbb{R}^d$ if its density with respect to the uniform distribution on the unit hypersphere $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$ is given by

$$f(x \mid \theta) = \frac{e^{\theta' x}}{{}_0F_1(; d/2; \|\theta\|^2/4)},$$

where, using the rising factorial $(a)_n = \Gamma(a+n)/\Gamma(a)$,

$${}_0F_1(; \nu; z) = \sum_{n=0}^{\infty} \frac{z^n}{(\nu)_n \, n!}$$

is a generalized hypergeometric series and related to the modified Bessel function of the first kind $I_\nu$ via ${}_0F_1(; \nu+1; z^2/4) = I_\nu(z)\,\Gamma(\nu+1)/(z/2)^\nu$ (e.g., Mardia and Jupp, 1999, p. 168). We note that the vMF distribution is commonly parameterized using polar coordinates, i.e., $\theta = \kappa\mu$, where $\kappa = \|\theta\|$ and $\mu \in S^{d-1}$ are the concentration and mean direction parameters, respectively (if $\theta \ne 0$, $\mu$ is uniquely determined as $\theta/\|\theta\|$). Using $\theta$ as the parameter, the family of vMF distributions on $S^{d-1}$ becomes a natural exponential family through the uniform distribution U on $S^{d-1}$, commonly written as

$$f(x \mid \theta) = \exp(\theta' x - M(\theta)),$$

where in the vMF case, the cumulant transform of U is given by $M(\theta) = \log {}_0F_1(; d/2; \|\theta\|^2/4)$.

Bayesian inference for the vMF distribution is first discussed in Mardia and El-Atoum (1976), who give conjugate priors for $\mu$ when $\kappa$ is known, and derive the Jeffreys prior for the polar coordinates ($\kappa$, $\mu$) parameterization. Guttorp and Lockhart (1988) introduce a Bayesian approach for finding the direction of a signal based on developing standard (e.g., Gutiérrez-Peña and Smith, 1997, Definition 3.1) conjugate priors for the von Mises (vM) distribution (i.e., for d=2) using the canonical ($\theta$) parameterization. Damien and Walker (1999) present a full Bayesian analysis of circular data using the vM distribution by employing standard conjugate priors for the polar coordinates ($\kappa$, $\mu$) parameterization, and developing a Gibbs sampler for this family of distributions. Nuñez-Antonio and Gutiérrez-Peña (2005) provide a full Bayesian analysis of directional (i.e., ) data using the vMF distribution, again using standard ($\kappa$, $\mu$) conjugate priors and obtaining samples from the posterior using a sampling-importance-resampling method found to outperform Gibbs sampling. Bangert et al. (2010) construct (possibly infinite) mixtures of vMF distributions using standard conjugate priors for the ($\kappa$, $\mu$) parameterization and Dirichlet (process) priors for the mixing probabilities.

Interestingly, none of these references explicitly discusses when the employed priors (and respective posteriors) are actually proper, or whether the conjugate families obtained using the $\theta$ or ($\kappa$, $\mu$) parameterizations are the same. In this paper, we settle these open issues, and also discuss Jeffreys priors for the general () vMF family (Section 2). We also provide results for (matrix-valued) vMF distributions on Stiefel manifolds (Section 3).
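As an illustration of the density relative to the uniform distribution on the sphere, the following minimal sketch evaluates the normalizing series ${}_0F_1(; d/2; \|\theta\|^2/4)$ by direct summation. The function names `hyp0f1` and `vmf_density` are hypothetical helpers introduced here, not part of the paper, and the plain series summation is only intended for moderate concentrations.

```python
import math


def hyp0f1(b, z, tol=1e-16, max_terms=10_000):
    """Sum the series 0F1(; b; z) = sum_n z**n / ((b)_n * n!) for z >= 0."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms is z / ((b + n) * (n + 1)).
        term *= z / ((b + n) * (n + 1))
        total += term
        if term < tol * total:
            break
    return total


def vmf_density(x, theta):
    """vMF density of the unit vector x relative to the uniform distribution.

    Assumes len(x) == len(theta) == d and that x has unit length.
    """
    d = len(theta)
    kappa_sq = sum(t * t for t in theta)
    inner = sum(t * xi for t, xi in zip(theta, x))
    return math.exp(inner) / hyp0f1(d / 2, kappa_sq / 4)


# For d = 3 the normalizing series reduces to sinh(kappa) / kappa.
print(hyp0f1(1.5, 1.0), math.sinh(2.0) / 2.0)
```

For $\theta = 0$ the density is identically one, which reflects that the uniform distribution itself is the member of the family at the origin.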

Results

Propriety of priors from the standard conjugate family

In what follows, it will be convenient to write so that . Let be a suitable parameterization of . For a sample of independent, identically distributed (i.i.d.) observations from the vMF family , the likelihood function for is given by where is the resultant of the sample. Following Gutiérrez-Peña and Smith (1997, Definition 3.1), the standard conjugate family for relative to , denoted by , has densities Using such a prior with parameters $s_0$ and will result in a posterior with parameters and . As clearly can be interpreted as the prior sample size, and as a sample size weighted average of the “prior mean” and the sample mean . The standard conjugate family relative to the canonical parameter has several important properties, in particular the linear relationship between the posterior mean and the sample mean (Diaconis and Ylvisaker, 1979).

We note that the densities are usually taken relative to the Lebesgue measure, which does not quite fit the needs of the commonly used polar coordinates ($\kappa$, $\mu$) parameterization of the vMF family . Let us generally write for the reference measure employed. Previous work using the ($\kappa$, $\mu$) parameterization seems to take as the product of the Lebesgue measure on (for $\kappa$) and the uniform distribution U on (for $\mu$), i.e., . As for we have (where a is the area of the unit hypersphere). The latter may be more natural as reference measure, turning the standard conjugate family relative to the polar coordinates ($\kappa$, $\mu$) parameterization into the obvious generalization of what Gutiérrez-Peña and Smith (1997) call the DY-conjugate family for relative to the parameterization.

Let denote the set of all hyperparameters s and for which is a proper distribution on the employed parameter space (using as reference measure), i.e., and let We have the following results.

Theorem 1. For the canonical parameterization of the vMF family and the Lebesgue measure as reference measure , and the normalizing constant is the inverse of .
In the following, a parameter is introduced which allows us to cover both cases of reference measures when using the ($\kappa$, $\mu$) parameterization: the Lebesgue measure (leading to ) and the product of the Lebesgue measure on and the uniform distribution U on , which is employed in previous work (leading to ). Other choices for lead to additional possible reference measures.

Theorem 2. For the polar coordinates ($\kappa$, $\mu$) parameterization of the vMF family and the reference measure with , and the normalizing constant is the inverse of .

Note that if , is equivalent to or , which if is only possible if . Thus, the set is non-empty only if and , and clearly can only contain points for which . For the proof, we use the following result.

Lemma 1. If and are nonnegative, if and only if or .

Proof. Using the asymptotic approximation $I_\nu(z) \sim e^z/\sqrt{2\pi z}$ for $z \to \infty$ and $\nu$ fixed (e.g., Abramowitz and Stegun, 1972, http://dlmf.nist.gov/10.40), we have Hence, for large , the integrand in J is “approximately proportional” to Thus, the integral diverges if , and converges if . If , convergence requires , or equivalently, as asserted. □

Proof of Theorem 1. Transforming to polar coordinates , we obtain interchanging the order of integration being justified by nonnegativity of the integrand. The assertion now follows from Lemma 1. □

Proof of Theorem 2. If , we have whence the theorem follows by again using Lemma 1. □

We see that for the canonical parameterization, the hyperparameters giving proper distributions are the ones for which and the “prior mean” lies in the interior of (the convex hull of) the unit hypersphere $S^{d-1}$. This is not a coincidence: in fact, one can alternatively establish Theorem 1 (and equivalently, Theorem 2 for ) without explicit convergence computations using the general results of Diaconis and Ylvisaker (1979); see also Gutiérrez-Peña and Smith (1997, Theorem 3.1).
Let be a probability measure on (the Borel sets of) with bounded support and consider the natural exponential family through with density , where , and the standard conjugate family with densities (with respect to the Lebesgue measure). As is bounded and is finite, . Let be the interior of the convex hull of . Then by Theorem 1 of Diaconis and Ylvisaker (1979), if is nonempty (and hence “the observation set is genuinely d-dimensional”; Diaconis and Ylvisaker, 1979, p. 271), is proper if and only if and . (The reference actually uses where we use s.) In the vMF case, , with convex hull the closed unit ball, and interior the open unit ball . Hence, is proper if and only if and , or equivalently, , again establishing Theorem 1.

We also note that for and , Theorem 1 implies that so that s and give a proper conjugate distribution for the polar coordinates ($\kappa$, $\mu$) parameterization with reference measure . However, neither results for the case nor necessity of the condition can be established using the general framework (and in fact, Theorem 2 shows that the condition is not necessary if and ).
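The conjugate update and the propriety condition for the canonical parameterization can be sketched as follows. The sketch assumes the reading suggested by the discussion above: the posterior hyperparameters are obtained by adding the resultant to the first hyperparameter and the sample size to the second, and the distribution is proper exactly when the prior sample size is positive and the “prior mean” lies in the open unit ball. The names `s`, `nu`, `posterior_hyperparameters` and `is_proper_canonical` are hypothetical and chosen for illustration only.

```python
import math


def posterior_hyperparameters(s0, nu0, sample):
    """Return (s0 + r, nu0 + n), where r is the resultant of the sample.

    `sample` is a sequence of unit vectors with the same dimension as s0.
    """
    resultant = [sum(x[j] for x in sample) for j in range(len(s0))]
    s = [a + b for a, b in zip(s0, resultant)]
    return s, nu0 + len(sample)


def is_proper_canonical(s, nu):
    """Assumed condition for the canonical parameterization: nu > 0 and ||s|| < nu."""
    return nu > 0 and math.hypot(*s) < nu


# A "vague" prior with all hyperparameters zero is improper ...
print(is_proper_canonical([0.0, 0.0], 0))
# ... but two distinct observations already give a proper posterior.
s, nu = posterior_hyperparameters([0.0, 0.0], 0, [(1.0, 0.0), (0.0, 1.0)])
print(s, nu, is_proper_canonical(s, nu))
```

If all observations coincide, the resultant has length equal to the sample size and the check fails, which corresponds to the boundary case excluded by the strict inequality.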

Propriety of posteriors from improper standard conjugate priors

Quite interestingly, canonical priors employed in the literature are improper if a vague prior is intended (see for example Nuñez-Antonio and Gutiérrez-Peña, 2005, who use and s=0). However, we note that if and , then with equality if and only if , which is a zero set for samples obtained from the vMF with fixed parameter . Hence intuitively, we expect that improper standard conjugate priors with “almost always” yield proper posteriors. For the case (as in the examples) we have with equality if and only if , which is a zero set for samples obtained from the vMF with fixed parameter and . This can be formalized as follows.

Let be the density (with respect to ) of a measure on and define on (n times), the space of all valued samples of size n, via Writing , i.e., is a generalized “mixture” of the distribution of i.i.d. samples of size n from the vMF family. Let

Theorem 3. If (), then for all (respectively, ) and arbitrary .

Proof. Let . From the above, if and only if . Clearly, for i.i.d. random variables from the vMF distribution with parameter , and hence, . As consists of all for which , the assertion for the second case follows along the lines of the first case. □

If we use the canonical parameterization and , then by the above, is infinite if and only if . If , this is a zero set under the product of uniforms on , and hence (again) . On the other hand, if , then clearly : in this sense, it is always possible to obtain improper posteriors when employing an improper standard conjugate prior with (we notice however that such priors are admittedly “strange”, as the corresponding prior sample means are outside the unit ball and hence “impossible”).

If X has a vMF distribution with parameter , the theory of regular exponential models (cf., e.g., Mardia and Jupp, 1999, pp. 32–33) implies that so that where one can show that the logarithmic derivative A of satisfies (Schou, 1978) that as (so that is in fact provided we take its value at zero to be ), and that as . Hence, for i.i.d. random variables from the vMF distribution with parameter , which is less than one in length, and hence with probability one as . Thus, if we write as for all , and using continuity arguments one can easily see that this convergence is uniform on compact subsets of . On the other hand, for , so the convergence cannot be uniform over . It would be very interesting to find the rate at which tends to zero, which would then allow one to characterize the improper prior densities for which as .
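The logarithmic derivative A mentioned above can be explored numerically. The sketch below assumes that A is the derivative of $\log {}_0F_1(; d/2; \kappa^2/4)$ with respect to $\kappa$, which by termwise differentiation of the series equals $(\kappa/d)\, {}_0F_1(; d/2+1; \kappa^2/4) / {}_0F_1(; d/2; \kappa^2/4)$; this is the length of the mean of a vMF vector with concentration $\kappa$. The helper names are hypothetical, and the plain series is only suitable for moderate $\kappa$ (it overflows for very large values).

```python
import math


def hyp0f1(b, z, tol=1e-16, max_terms=10_000):
    """Sum the series 0F1(; b; z) = sum_n z**n / ((b)_n * n!) for z >= 0."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= z / ((b + n) * (n + 1))
        total += term
        if term < tol * total:
            break
    return total


def mean_resultant_length(kappa, d):
    """A(kappa): derivative of log 0F1(; d/2; kappa**2 / 4) with respect to kappa."""
    if kappa == 0:
        return 0.0
    z = kappa * kappa / 4
    return (kappa / d) * hyp0f1(d / 2 + 1, z) / hyp0f1(d / 2, z)


# For d = 3 there is a closed form: A(kappa) = coth(kappa) - 1 / kappa.
print(mean_resultant_length(2.0, 3), 1 / math.tanh(2.0) - 0.5)
# A(kappa) / kappa is close to 1 / d near zero, and A(kappa) stays below one.
print(mean_resultant_length(1e-3, 3) / 1e-3, mean_resultant_length(30.0, 3))
```

The values illustrate the two limits used in the argument: A grows linearly near the origin and approaches one from below as the concentration increases, so the expected sample mean always lies strictly inside the unit ball.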

Jeffreys prior

A commonly suggested non-informative prior is the Jeffreys prior (Jeffreys, 1961), defined as the square root of the determinant of the Fisher information matrix relative to the parameterization employed. When using the canonical parameter, again by the theory of regular exponential models, so that where I denotes the d-dimensional identity matrix. By a well-known result from linear algebra, and thus and in particular if , , so that generalizing the result obtained in Guttorp and Lockhart (1988) for the case d=2 (the sign in the reference is not correct).

The Jeffreys prior for the canonical parameter only depends on and behaves like for . The first assertion is immediate by observing that depends on only via its length. To obtain the asymptotic behavior, we can use the asymptotics of and the fact that (Schou, 1978). Thus, as , such that for ,

We note that the Jeffreys prior “looks different” from the densities employed in the standard conjugate family relative to the canonical parameter. Following Gutiérrez-Peña and Smith (1997), one can rigorously establish that it is not contained in this family by verifying that the skewness vector of the vMF family is not linear in its mean parameter, which is straightforward from the above expression for . The Jeffreys prior with respect to the canonical parameterization is given by

If the polar coordinates ($\kappa$, $\mu$) parameterization is employed, two alternative parameterizations are possible for the mean direction parameter in order to obtain an unrestricted set of parameters. The parameterization with consists of the first dimensions of the mean direction parameter and the parameterization uses the spherical polar coordinates for with . If these parameterizations are used, the Jeffreys prior derived for the canonical parameterization needs to be multiplied by the Jacobians, which are given by for the parameterization and for the parameterization.
Note that the latter is also given in Mardia and El-Atoum (1976), which should have instead of and needs instead of in the exponents of the sines.

Clearly, the Jeffreys prior is not proper. The following shows that “almost all” posteriors obtained from it (and in fact, from arbitrary possibly improper priors which increase at most polynomially in ) are proper for samples of size .

Consider the canonical parameterization of the vMF family with the Lebesgue reference measure. Let be for some finite as and . Then for all , .

Writing , we have By Lemma 1, this is finite provided that . Hence, which is a zero set under the product of uniforms on provided that . □
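To make the Jeffreys prior for the canonical parameter concrete, the following sketch evaluates it up to a constant. It rests on assumptions that are not spelled out in the text as extracted: for an exponential family the Fisher information equals the covariance of X, which for the vMF family has eigenvalue $A'(\kappa)$ along the mean direction and eigenvalue $A(\kappa)/\kappa$ with multiplicity $d-1$ orthogonal to it, so that the determinant is $A'(\kappa)\,(A(\kappa)/\kappa)^{d-1}$. The derivative is computed from the standard identity $A'(\kappa) = 1 - A(\kappa)^2 - (d-1)A(\kappa)/\kappa$. All function names are hypothetical.

```python
import math


def hyp0f1(b, z, tol=1e-16, max_terms=10_000):
    """Sum the series 0F1(; b; z) = sum_n z**n / ((b)_n * n!) for z >= 0."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= z / ((b + n) * (n + 1))
        total += term
        if term < tol * total:
            break
    return total


def mean_resultant_length(kappa, d):
    """A(kappa), the logarithmic derivative of 0F1(; d/2; kappa**2 / 4)."""
    z = kappa * kappa / 4
    return (kappa / d) * hyp0f1(d / 2 + 1, z) / hyp0f1(d / 2, z)


def jeffreys_canonical(kappa, d):
    """Unnormalized Jeffreys prior at any theta with ||theta|| = kappa > 0."""
    a = mean_resultant_length(kappa, d)
    # Assumed identity: A'(kappa) = 1 - A^2 - (d - 1) * A / kappa.
    a_prime = 1.0 - a * a - (d - 1) * a / kappa
    return math.sqrt(a_prime * (a / kappa) ** (d - 1))


def jeffreys_asymptote(kappa, d):
    """Leading large-kappa behavior implied by the assumptions above."""
    return math.sqrt((d - 1) / 2) * kappa ** (-(d + 1) / 2)


for kappa in (5.0, 15.0, 30.0):
    print(kappa, jeffreys_canonical(kappa, 3) / jeffreys_asymptote(kappa, 3))
```

Under these assumptions the prior depends on the parameter only through its length and decays polynomially, which is consistent with the statement that it is not proper.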

Propriety of prior and posterior distributions in applications

Guttorp and Lockhart (1988) perform a full Bayesian analysis employing the canonical parameterization for 2-dimensional data. Rather than using the standard conjugate prior for , they use a flat prior on $\mu$ and the conjugate prior they derived for $\kappa$ with $\mu$ known. Note that for the conjugate prior for $\kappa$ to be proper, the same conditions on and need to be satisfied as for the conjugate prior for . In order to parameterize the prior only the length of s and need to be specified. For their application, Guttorp and Lockhart (1988) use three different prior distributions for $\kappa$: a data-based, a high precision and a low precision prior. In all three cases the priors for $\kappa$ are proper because . This is also clear by construction: the parameters of the priors are determined by specifying constraints for the moments of the prior distribution or quantities derived from the moments.

Damien and Walker (1999) employ the polar coordinates ($\kappa$, $\mu$) parameterization with the Lebesgue measure as reference measure, i.e., . They use two different prior distributions in their two numerical examples. In the first example, they set all prior parameters to zero. This is the same prior Nuñez-Antonio and Gutiérrez-Peña (2005) use in their examples and refer to as a vague prior. Given the updates for the posterior parameters as well as the interpretation of as the prior sample size, this seems to be an obvious choice. This prior is improper, but as shown in Section 2.2, posteriors will be proper almost surely for samples of size . In their second example, Damien and Walker (1999) use a flat prior for $\mu$ and a conjugate prior for $\kappa$ with $\mu$ known. Referring to the low precision prior in Guttorp and Lockhart (1988), are employed as parameters. The low precision prior in Guttorp and Lockhart (1988) actually is equal to and , while the values for the high precision prior are and . Interpreting as the prior sample size, larger values of imply more informative priors.
However, the precision induced by the prior will depend on the average length given by . The closer this value is to 1, the higher the precision induced by the prior. Using an approximation for large , Guttorp and Lockhart (1988) derived that the prior mean and variance of the precision parameter depend on as well as the difference . By rounding to the same value as , Damien and Walker (1999) use an improper prior, and the interpretation as a low precision prior, as induced by the prior mean and standard deviation, is obviously lost. As shown in Section 2.2, the posterior from this prior is almost surely proper for samples of size .

Bangert et al. (2010) use conditionally conjugate priors for $\kappa$ and $\mu$. The prior for $\mu$ is a von Mises–Fisher distribution with precision parameter equal to 0.1 and mean parameter equal to the mean direction of the data. For $\kappa$ they use the conjugate prior for known $\mu$. Setting the prior parameters equal to and , they employ a proper prior.

To sum up, these previous applications indicate that the non-informative but improper prior with seems to be an obvious choice if no prior information is available. This seems to be unproblematic because the posteriors will be almost surely proper for sample sizes by Theorem 3. If prior information on the precision parameter is to be included, a flat prior for $\mu$ is employed, together with the conjugate prior for $\kappa$ with known $\mu$. Another possibility, when using Gibbs sampling for estimation, is to employ conditionally conjugate priors. In general, the parameters of the prior of $\kappa$ are chosen to reflect the prior information available for the moments of $\kappa$, which leads to proper priors.

Extensions

The vMF family on can straightforwardly be generalized to the vMF family on the Stiefel manifold , the set of orthogonal k-frames in , or equivalently, so that corresponds to . The vMF family on has densities with respect to the uniform distribution U on , where is a generalized hypergeometric function with matrix argument (e.g., Mardia and Jupp, 1999, p. 289). This family of distributions is useful as a probability distribution over orthonormal matrices and, for example, Hoff (2009) indicates that it arises as a posterior distribution for the orthonormal matrices in factor analysis when uniform priors are used. For a further discussion of this family of distributions in relation to orientation statistics see Downs (1972) and Khatri and Mardia (1977). Clearly, defines a one-to-one correspondence between and with . The standard conjugate family for the vMF family on (relative to the canonical parameter) is thus given by the family of densities Let denote the spectral norm (matrix 2-norm, the largest singular value) of S.

Theorem 6. The distributions in the standard conjugate family of the vMF family on the Stiefel manifold are proper if and only if .

Proof. The support of U is , the convex hull of which is the closed unit ball in the spectral norm (e.g., Journée et al., 2010 or Gallivan and Absil, 2010), and hence has non-empty interior Using Theorem 1 of Diaconis and Ylvisaker (1979), the standard conjugate distributions are proper if and only if and , or equivalently, if and only if . □

The matrix vMF distributions are typically parameterized using the canonical parameter A. Alternatively, the analogue to the polar coordinates ($\kappa$, $\mu$) parameterization in the vector case is using the (right) polar decomposition of , where the polar part (or orientation) M is in the Stiefel manifold and the elliptical part (or concentration) K is a symmetric, non-negative definite matrix (e.g., Mardia and Jupp, 1999, p. 286).
If A has full rank, K is the unique symmetric matrix root of , and (e.g., Cadet, 1996, adjusting for the different normalizations employed) , where with the (generalized) volume of and the eigenvalues of K. Hence, Thus, if we consider the standard conjugate family of the matrix vMF family on the Stiefel manifold relative to the polar coordinates parameterization with elements , and reference measures of the form , then as discussed in Section 2 for the polar parameterization of the vector vMF distribution, if Theorem 6 implies that distributions in this conjugate family are proper provided that . Again, necessity of this condition for such values of , or the characterization of the hyperparameters giving proper distributions if , cannot be established. For this, one needs to be able to characterize S and (and ) for which , which seems quite challenging, requiring suitable “large K” asymptotics for and .

We note that Butler and Wood (2003) give Laplace approximations for (and corresponding Bessel functions) of matrix arguments (but do not formally establish validity as an asymptotic approximation). Muirhead (1978, p. 22) gives an asymptotic approximation for for the case where all singular values of A are large. For the above, a generalization to the case where some singular values are large is needed. We leave this for future research.
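The propriety condition of Theorem 6 can be checked without computing singular values. The sketch below assumes, following the proof above, that the condition reads: the prior sample size (called `nu` here, a hypothetical name) is positive and the spectral norm of S is strictly smaller than it. That is equivalent to positive definiteness of $\nu^2 I - S'S$, which a Cholesky factorization attempt decides. The function names are illustrative only.

```python
import math


def is_positive_definite(M):
    """Attempt a Cholesky factorization of the symmetric matrix M (list of rows)."""
    k = len(M)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                if acc <= 0.0:
                    return False  # a non-positive pivot rules out definiteness
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


def is_proper_stiefel(S, nu):
    """Assumed condition: nu > 0 and spectral norm of the d x k matrix S below nu."""
    if nu <= 0:
        return False
    d, k = len(S), len(S[0])
    gram = [[sum(S[r][i] * S[r][j] for r in range(d)) for j in range(k)]
            for i in range(k)]
    shifted = [[(nu * nu if i == j else 0.0) - gram[i][j] for j in range(k)]
               for i in range(k)]
    return is_positive_definite(shifted)


# S has singular values sqrt(2) and 0, so the threshold lies at nu = sqrt(2).
S = [[1.0, -1.0], [0.0, 0.0], [0.0, 0.0]]
print(is_proper_stiefel(S, 1.5), is_proper_stiefel(S, 1.4))
```

Working with the shifted Gram matrix avoids an iterative eigenvalue computation; in floating point arithmetic the decision is of course unreliable for hyperparameters very close to the boundary.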
