Literature DB >> 25745274

Equivariant minimax dominators of the MLE in the array normal model.

Abstract

Inference about dependencies in a multiway data array can be made using the array normal model, which corresponds to the class of multivariate normal distributions with separable covariance matrices. Maximum likelihood and Bayesian methods for inference in the array normal model have appeared in the literature, but there have not been any results concerning the optimality properties of such estimators. In this article, we obtain results for the array normal model that are analogous to some classical results concerning covariance estimation for the multivariate normal model. We show that under a lower triangular product group, a uniformly minimum risk equivariant estimator (UMREE) can be obtained via a generalized Bayes procedure. Although this UMREE is minimax and dominates the MLE, it can be improved upon via an orthogonally equivariant modification. Numerical comparisons of the risks of these estimators show that the equivariant estimators can have substantially lower risks than the MLE.

Entities: CellLine Chemical Disease Gene Species

Keywords: Bayesian estimation; Gibbs sampling; Stein’s loss; covariance estimation; tensor data

Year: 2015 PMID： 25745274 PMCID： PMC4346100 DOI： 10.1016/j.jmva.2015.01.020

Source DB: PubMed Journal: J Multivar Anal ISSN： 0047-259X Impact factor: 1.473

1. Introduction

The analysis of array-valued data, or tensor data, is of interest to numerous fields, including psychometrics (Kiers and Mechelen, 2001), chemometrics (Smilde et al., 2005; Bro, 2006), imaging (Vasilescu and Terzopoulos, 2003), signal processing (Cichocki et al., 2014) and machine learning (Tao et al., 2005), among others (Kroonenberg, 2008; Kolda and Bader, 2009). Such data consist of measurements indexed by multiple categorical factors. For example, multivariate measurements on experimental units over time may be represented by a three-way array X = {x} ∈ ℝ, with i indexing units, j indexing variables and t indexing time. Another example is multivariate relational data, where x is the type-k relationship between person i and person j. Statistical analysis of such data often proceeds by fitting a model such as X = Θ + E, where Θ is low-dimensional and E represents additive residual variation about Θ. Standard models for Θ include regression models, additive effects models (such as those estimated by ANOVA decompositions) and unconstrained mean models if replicate observations are available. Another popular approach is to model Θ as being a low-rank array. For such models, ordinary least-squares estimates of Θ can be obtained via various types of tensor decompositions, depending on the definition of rank being used (De Lathauwer et al., 2000b,a; de Silva and Lim, 2008). Less attention has been given to the analysis of the residual variation E. However, estimating and accounting for such variation is critical for a variety of inferential tasks, such as prediction, model-checking, construction of confidence intervals, and improved parameter estimation over ordinary least squares. One model for variation among the entries of an array is the array normal model (Akdemir and Gupta, 2011; Hoff, 2011) which is an extension of the matrix normal model (Srivastava and Khatri, 1979; Dawid, 1981), often used in the analysis of spatial and temporal data (Mardia and Goodall, 1993; Shitan and Brockwell, 1995; Fuentes, 2006). The array normal model is a class of normal distributions that are generated by a multilinear operator known as the Tucker product: A random K-way array X taking values in ℝ has an array normal distribution if , where “×” denotes the Tucker product (described further in Section 2), Z is a random array in ℝ having i.i.d. standard normal entries, and A is a p × p nonsingular matrix for each k ∈ {1, …, K}. Letting and “⊗” denote the Kronecker product, we write A maximum likelihood estimate (MLE) for the parameters in (1) can be obtained via an iterative coordinate descent algorithm (Hoff, 2011), which is a generalization of the iterative “flip-flop” algorithm developed in Mardia and Goodall (1993) and Dutilleul (1999), or alternatively the optimization procedures described in Wiesel (2012b). However, based on results for the multivariate normal model, one might suspect that the MLE lacks desirable optimality properties: In the multivariate normal model, James and Stein (1961) showed that the MLE of the covariance matrix is neither admissible nor minimax. This was accomplished by identifying a minimax and uniformly optimal equivariant estimator that is different from the (equivariant) MLE, and therefore dominates the MLE. As pointed out by James and Stein, this equivariant estimator is itself inadmissible, and improvements to this estimator have been developed and studied by Stein (1975); Takemura (1984); Lin and Perlman (1985), and Haff (1991), among others. This article develops similar results for the array normal model. In particular, we obtain a procedure to obtain the uniformly minimum risk equivariant estimator (UMREE) under a lower-triangular product group of transformations for which the model (1) is invariant. Unlike for the multivariate normal model, there is no simple characterization of this class of equivariant estimators. However, results of Zidek (1969) and Eaton (1989) can be used to show that the UMREE can be obtained from the Bayes decision rule under an improper prior, which we derive in Section 2. In Section 3 we obtain the posterior distribution under this prior, and show how it can be simulated from using a Markov chain Monte Carlo (MCMC) algorithm. Specifically, the MCMC algorithm is a Gibbs sampler that involves simulation from a class of distributions over covariance matrices, which we call the “mirror-Wishart” distributions. In Section 4.1 we develop a version of Stein’s loss function for covariance estimation in the array normal model, and show how the Gibbs sampler of Section 3 can be used to obtain the UMREE for this loss. We discuss an orthogonally equivariant improvement to the UMREE in Section 4.2, which can be seen as analogous to the estimator studied by Takemura (1984). Section 4.3 compares the risks of the MLE, UMREE and the orthogonally equivariant estimator as a function of the dimension of X in a small simulation study. A discussion follows in Section 5. Proofs are contained in an appendix.

2. An invariant measure for the array normal model

2.1. The array normal model

The array normal model on ℝ consists of the distributions of random K-arrays X ∈ ℝ for which for some Θ ∈ ℝ, nonsingular matrices A ∈ ℝ, k = 1, …, K and a random p1 × ⋯ × p array Z with i.i.d. standard normal entries. Here, “×” denotes the Tucker product, which is defined by the identity where “⊗” is the Kronecker product and z = vec(Z), the vectorization of Z. This identity can be used to find the covariance of the elements of a random array satisfying (2): Letting x, z, θ be the vectorizations of X, Z, Θ, we have and so the array normal distributions correspond to the multivariate normal distributions with separable (Kronecker structured) covariance matrices. A useful operation related to the Tucker product is the matricization operation, which reshapes an array into a matrix along an index set, or mode. For example, the mode-k matricization of Z is the p × (∏ p)-dimensional matrix Z( having rows equal to the vectorizations of the “slices” of Z along the kth index set. An important identity involving the Tucker product is that if Y = Z × {A1, …, A} then As shown in Hoff (2011), a direct application of this identity gives where c is a scalar. This shows that can be interpreted as the covariance among the p slices of the array X along its kth mode. The array normal model can be parameterized in terms of a mean array E[X] = Θ ∈ ℝ and covariance Cov[vec(X)] = σ2(Σ ⊗ ⋯ ⊗ Σ1), where σ2 > 0 and for each k, , the set of p × p positive definite matrices. To make the parameterization identifiable, we restrict the determinant of each Σ to be one. Denote by this parameter space, that is, the values of (σ2, Σ1, …, Σ) for which |Σ| = 1, k = 1, …, K. Under this parameterization, we write X ~ N (Θ, σ2(Σ ⊗ ⋯ ⊗ Σ1)) if and only if , where for each k, Ψ is a matrix such that . Given a sample X1, …, X ~ i.i.d. N(Θ, σ2(Σ ⊗ ⋯ ⊗ Σ1)), the (K +1)-array X obtained by “stacking” X1, …, X along a (K +1)st mode also has an array normal distribution, where 1 is the n × 1 vector of ones and “◦” denotes the outer product. If n > 1 then covariance estimation for the array normal model can be reduced to the case that Θ = 0. To see this, let H be a (n − 1) × n matrix such that HH = I and H1 = 0. This implies that . Letting Y = X × {I, …, I, H}, and Y( be the mode-(K + 1) matricization of Y, we have E[Y(] = HE[X(] = H1vec(Θ) = 0, and so Y is mean-zero. Using identity (3), the covariance of vec(Y) can be shown to be σ2(HH ⊗ Σ ⊗ ⋯ ⊗ Σ1) = σ2(I ⊗ Σ ⊗ ⋯ ⊗ Σ1), and so Y ~ N(0, σ2(I ⊗ Σ ⊗ ⋯ ⊗ Σ1)). For the remainder of this paper, we consider covariance estimation in the case that Θ = 0.

2.2. Model invariance and a right invariant measure

Consider the model for an i.i.d. sample of size n from a p-variate mean-zero multivariate normal distribution, X ~ N(0, I ⊗ Σ), . Recall that AX ~ N(0, I ⊗ AΣA) for nonsingular matrices A, and so in particular this model is invariant under left multiplication of X by elements of , the group of lower triangular matrices with positive diagonals. An estimator Σ̂ mapping the sample space ℝ to is said to be equivariant under this group if Σ̂(AX) = AΣ̂(X)A for all and X ∈ ℝ. James and Stein (1961) characterized the class of equivariant estimators for this model, identified the UMREE under a particular loss function and showed that the UMREE is minimax. Additionally, as the MLE XX /n is equivariant and different from the UMREE, the MLE is dominated by the UMREE. We pursue analogous results for the array normal model by first reparameterizing in terms of the parameter Σ1/2 = (σ, Ψ1, …, Ψ), so where σ > 0 and each Ψ is in the set of p×p lower triangular matrices with positive diagonals and determinant 1. In this parameterization, Ψ is the lower triangular Cholesky square root of the mode-k covariance matrix Σ described in Section 2.1. Define the group as where the group operation is Note that consists of the same set as the parameter space for the model, as parameterized in (5). If the group acts on the sample space by then as shown in Hoff (2011) it acts on the parameter space by which we write concisely as g: Σ1/2 ↦ AΣ1/2. An estimator, Σ̂1/2 = (σ̂, Ψ̂1, …, Ψ̂), mapping the sample space ℝ to the parameter space is equivariant if For example, if Ψ̂ is the estimator of Ψ when observing X, then A Ψ̂ is the estimator when observing aX × {A1 …, A, I}. Unlike the case for the multivariate normal model, the class of - equivariant estimators for the array normal model is not easy to characterize beyond the definition given above. However, in cases like the present one where the group space and parameter space are the same, the UMREE under an invariant loss can be obtained as the generalized Bayes decision rule under a (generally improper) prior obtained from a right invariant (Haar) measure over the group (Zidek, 1969; Eaton, 1989). The first step towards obtaining the UMREE is then to obtain a right invariant measure and corresponding prior. To do this, we first need to define an appropriate measure space for the elements of . Recall that matrices A in have determinant 1, and so one of the nonzero elements of A can be expressed as a function of the others. For the rest of this section and the next, we parameterize in terms of the elements {A: 2 ≤ i ≤ p, 1 ≤ j ≤ i}, and express the upper-left element A as a function of the other diagonal elements, so that . The “free” elements of therefore take values in the space 𝒜 = {a > 0, a ∈ ℝ: 2 ≤ i ≤ p, 1 ≤ j < i}.

Theorem 1

A right invariant measure over the group is where d μ is Lebesgue measure over ℝ+ × 𝒜 × ⋯ × 𝒜. We note that although the density given above is specific to the particular parameterization of the , the inference results that follow will hold for any parameterization. Let be an invariant loss function, so that L(Σ1/2, B) = L(AΣ1/2, AB) for all A, B and . Theorem 6.5 of Eaton (1989) implies that the value of the UMREE when the array X is observed is the minimizer in B = (b, B1, …, B) of the integral where is the array normal density at the parameter value and is an arbitrary element of . Since the group action is transitive over the parameter space, and since the integral is right invariant, can be chosen to be equal to (1, I, …, I). Furthermore, since the parameter space and group space are the same, replacing A with Σ1/2 in the above integral indicates that the UMREE at X is the minimizer in B of that is, the UMREE is the Bayes estimator under the (improper) prior ν for Σ1/2. This is summarized in the following corollary:

Corollary 1

For an invariant loss function the estimator Σ̂1/2, defined as uniformly minimizes the risk E[L(Σ1/2, Σ̃1/2(X))|Σ1/2] among equivariant estimators Σ̃1/2 of Σ1/2. The expectation in where . In addition to uniformly minimizing the risk, the UMREE has two additional features. First, since any unique MLE is equivariant (Eaton, 1989, Theorem 3.2), the UMREE dominates any unique MLE, presuming the UMREE is not the MLE. Second, the UMREE under is minimax. This follows because is a subgroup of , as for all a > 0 and . Since is a solvable group (James and Stein, 1961), this necessarily implies that is solvable (Rotman, 1995, Theorem 5.15). By the results of Kiefer (1957) and Bondar and Milnes (1981), the equivariant estimator that minimizes (6) is minimax. Note that because the prior ν is improper, the posterior (7) is not guaranteed to be proper. However, we are able to guarantee propriety if the sample size n is sufficiently large:

Theorem 2

Let . For p(σ, Ψ1, …, Ψ|X) defined in , The sample size in the Theorem is sufficient for propriety, but empirical evidence suggests that it is not necessary. For example, results from a simulation study in Section 4 suggest that, for some dimensions, a sample size of n = 1 is sufficient for posterior propriety and existence of an UMREE.

3. Posterior approximation

For the results in Section 2 to be of use, we must be able to actually minimize the posterior risk in Equation 6 under an invariant loss function of interest. In the next section, we will show that the posterior risk minimizer under a multiway generalization of Stein’s loss is given by posterior expectations of the form E[(σ2Σ)−1|X], where . Although these posterior expectations are not generally available in analytic form, they can be approximated using a MCMC algorithm. In this section, we show how a relatively simple Gibbs sampler can be used to simulate a Markov chain of values of Σ1/2 = (σ, Ψ1, …, Ψ), having a stationary distribution equal to the desired posterior distribution given by Equation 7. These simulated values can be used to approximate the posterior distribution of Σ1/2 given X, as well as any posterior expectation, in particular E[(σ2Σ)−1|X]. The Gibbs sampler proceeds by iteratively simulating values of {σ, Ψ} from their full conditional distribution given the current values of {Ψ1, …, Ψ, Ψ, …, Ψ}. This is done by simulating σ2Σ from its full conditional distribution, from which σ and Ψ can be recovered. One iteration of the Gibbs sampler proceeds as follows: Iteratively for each k ∈ {1, …, K}, simulate (σ2Σ)−1 ~ mirror-Wishart (np/p, ); set Ψ to be the lower triangular Cholesky square root of Σ. In this algorithm, X( ∈ ℝ is the mode-k matricization of X and Ψ− = Ψ ⊗ ⋯ ⊗ Ψ⊗Ψ ⊗ ⋯ ⊗Ψ1. The mirror-Wishart distribution is a probability distribution on positive definite matrices, related to the Wishart distribution as follows:

Definition 1

A random q× q positive definite matrix S has a mirror-Wishart distribution with degrees of freedom ν > 0 and scale matrix if where VV is the lower triangular Cholesky decomposition of a Wishart(ν, I)-distributed random matrix and UU Φ. Some understanding of the mirror-Wishart distribution can be obtained from its expectation:

Lemma 1

If S ~ mirror-Wishart(ν, Φ) then where UU Φ and D is a diagonal matrix with entries d = (ν + q + 1 − 2j)/ν, j = 1, …, q. The calculation follows from Bartlett’s decomposition, and is in the appendix. The implications of this for covariance estimation are best understood in the context of the multivariate normal model X ~ N(0, I ⊗ Σ). In this case, for a given prior the Bayes estimator under Stein’s loss is given by E[Σ−1|X]−1 (see, for example Yang and Berger (1994)). Under Jeffreys’ noninformative prior, Σ−1 ~ Wishart(n, (XX)−1) and so the Bayes estimator is XX/n. While unbiased, this estimator is generally thought of as not providing appropriate shrinkage of the sample eigenvalues. Note that under Jeffreys’ prior, a posteriori we have , where VV ~ Wishart(n, I) and UU is the upper triangular Cholesky decomposition of (XX)−1. In contrast, under a right invariant measure as our prior we have . The expectation of VV is nI, whereas the expectation of V is nD, which provides a different pattern of shrinkage of the eigenvalues of XX. By Lemma 1, the Bayes estimator under a right invariant measure as our prior in this case is given by (nUDU)−1 = U−D−1U−1/n, which is the UMREE obtained by James and Stein (1961). Thus, the UMREE in the multivariate normal model corresponds to a Bayes estimator under a right invariant measure as our prior and mirror-Wishart posterior distribution. The Gibbs sampler is based on the full conditional distribution of (σ2Σ)−1, which we derive from the full conditional density of {σ, Ψ}: where dependence of the density on {Ψ1, …, Ψ, Ψ, …, Ψ,X} has been made implicit. Now set L = σΨ. The full conditional density of L can be obtained from that of {σ, Ψ} and the Jacobian of the transformation.

Lemma 2

The Jacobian of the transformation g(σ, Ψ) = σΨ, mapping to is Since L = σΨ, we have σ = |L|1/ and Ψ = L/σ = L/|L|1/. Lemma 2 implies which, through straightforward calculations, can be shown to be proportional to We now “absorb” into L. First, take the lower triangular Cholesky decomposition of so that We have Now let , so that L = ΦW. This change of variables has Jacobian (Eaton, 1983, Proposition 5.13), so that Note that the distribution of W does not depend on Ψ−. Now compare equation (8) to the density of the lower triangular Cholesky square root W of an inverse-Wishart distributed random matrix given by The conditional densities of the off-diagonal elements of W and W given the diagonal elements clearly have the same form. The diagonal elements of W and W in (8) and (9) are square roots of inverse-gamma distributed random variables, but with different shape parameters. To show this, we first derive the conditional densities of the off-diagonal elements of W:

Lemma 3

(Bartlett’s decomposition for the inverse-Wishart) Let W be the lower triangular Cholesky square root of an inverse-Wishart distributed matrix, so WW ~ inverse-Wishart(ν, I). Then for each i = 1, …, p, Here, W[1:( denotes the submatrix of W made up of the first (i − 1) rows and columns, and W[ is the vector made up of the first (i − 1) elements of the ith row. By Lemma 3, if WW ~ inverse-Wishart(np/p, I) then the squared diagonal elements of W are independent inverse-gamma((np/p−p+i)/2, 1/2) random variables. This tells us that This result allows us to integrate (8) with respect to the off-diagonal elements of W, giving A change of variables implies that the are independent, and This completes the characterization of the distribution of W: The distribution of the diagonal elements is given by (10) and the conditional distribution of the off-diagonal elements given the diagonal can be obtained from Lemma 3. Finally, this distribution can be related to a Wishart distribution via the following lemma:

Lemma 4

Let W × p Then the elements of are distributed independently as Note that the matrix V is distributed as the lower triangular Cholesky square root of a Wishart distributed random matrix. Applying the lemma to W, for which ν = np/p, we have that is equal in distribution to the lower triangular Cholesky square root of a random matrix which is Wishartp(np/p, I). That is, the precision matrix is conditionally distributed as We say the matrix, has a mirror-Wishart distribution because would have a Wishart distribution. This completes the derivation of the full conditional distribution of . Although not necessary for posterior approximation, the full conditional distribution of σ given Ψ1, …, Ψ and X is easy to derive. The posterior density is Letting γ = 1/σ2, we have and so the full conditional distribution of 1/σ2 is

4. Estimation under multiway Stein’s loss

4.1. The UMREE for multiway Stein’s loss

A commonly used loss function for estimation of a covariance matrix Σ is Stein’s loss, First introduced by James and Stein (1961), Stein’s loss has been proposed as a reasonable and perhaps better alternative to quadratic loss for evaluating performance of covariance estimators. For example, Stein’s loss, unlike quadratic loss, does not penalize overestimation of the variances more severely than underestimation. Recall from Section 2 that the array normal model can be parameterized in terms of , where |Σ| = 1 for each k = 1, …, K. For estimation of the covariance parameters , we consider the following generalization of Stein’s loss, which we call “multiway Stein’s loss”: It is easy to see that for K = 1, multiway Stein’s loss reduces to Stein’s loss. Multiway Stein’s loss also has the attractive property of being invariant under multilinear transformations. To see this, define SL to be the set of lists of the form A = (a, A1, …, A) for which a > 0 and A ∈ SL for each k, with SL being the special linear group of p × p matrices with unit determinant. For two elements A and B of SL, define AB = (ab, A1B1, …, A) and . Multiway Stein’s loss is invariant under transformations of the form Σ → AΣA, as Notably, (11) is invariant under , as , so the best -equivariant estimator under multiway Stein’s loss can be found using Corollary 1.

Proposition 1

(UMREE under multiway Stein’s loss) Let where the expectation is with respect to the posterior distribution given by with respect to s and the S The posterior expectation E[(σ2Σ)−1|X] may be approximated by the Gibbs sampler of Section 3. That is, if (σ2Σ)(1), …, (σ2Σ)( is a long sequence of values of (σ2Σ) simulated from the Gibbs sampler, then The form of multiway Stein’s loss (11) includes a weighted sum of tr(), k = 1, …, K. We note that equivariant estimation of Σ is largely unaffected by changes to the weights in this sum:

Proposition 2

Define weighted multiway Stein’s loss as for known w > 0, k = 1, …, K. Then the UMREE under L The proof is very similar to that of Proposition 1 and is omitted. This proposition states that only estimation of the scale is affected when we “weight” the loss more heavily for some components of Σ than others. The posterior distribution may also be used to obtain the UMREE under Stein’s original loss L, as it too is invariant under transformations of the lower triangular product group. However, risk minimization with respect to L requires additional numerical approximations: Let 𝒦 be the unique symmetric square root of , which may be approximated by the Gibbs sampler described in Section 3. Minimization of the risk with respect to L is equivalent to the minimization in (s2, S1, …, S) of where 𝒦̃ ∈ ℝ is the array such that 𝒦̃( = 𝒦, and is any square root matrix of S. Iteratively setting will decrease the posterior expected loss at each step. This procedure is analogous to using the iterative flip-flop algorithm to find the MLE based on a sample covariance matrix of . Application of the results from (Wiesel, 2012a) show that the posterior risk has a property known as geodesic convexity, implying that any local minimizer obtained from this algorithm will also be a global minimizer.

4.2. An orthogonally equivariant estimator

The estimator in Proposition 1 depends on the ordering of the indices, and so it is not permutation equivariant. Mirroring the ideas studied in Takemura (1984), in this section we derive a minimax orthogonally equivariant estimator (which is necessarily permutation equivariant) that dominates the UMREE of Proposition 1. First, notice that by transforming the data and then back-transforming the estimator, we can obtain an estimator whose risk is equal to that of the UMREE: For Γ = (1, Γ1, …, Γ) ∈ {1} × 𝒪 × ⋯ × 𝒪, where 𝒪 is the group of p by p orthogonal matrices, let X̃ = X × {Γ1, …, Γ}. Then Σ̂(X̃) is an estimator of ΓΣΓ and Σ̃(X) = Γ Σ̂(X̃)Γ is an estimator of Σ. The risk of this estimator is the same as that of the UMREE Σ̂(X): where the second equality follows from the invariance of the loss, the third equality follows from a change of variables, and the last equality follows because the risk of Σ̂ is constant over the parameter space. The UMREE Σ̂ and the estimator Σ̃ have the same risks but are different. Since multiway Stein’s loss is convex in each argument, averaging these estimators somehow should produce a new estimator that dominates them both. In the multivariate normal case in which K = 1, averaging the value of Γ Σ̂(ΓX)Γ with respect to the uniform (invariant) measure for Γ over the orthogonal group results in the estimator of Takemura (1984). This estimator is orthogonally equivariant, dominates the UMREE and is therefore also minimax. Constructing an analogous estimator in the multiway case is more complicated, as it is not immediately clear how the back-transformed estimators should be averaged. Direct numerical averaging of estimates of σ2(Σ1 ⊗ ⋯ ⊗ Σ) will generally produce an estimate that is not separable and therefore outside of the parameter space. Similarly, averaging estimates of each Σ separately will not work, as the space of covariance matrices with determinant one is not convex. Our solution to this problem is to average a transformed version of Σ = (σ2, Σ1, …, Σ) for which each Σ lies in the convex set of trace-1 covariance matrices, then transform back to our original parameter space. The resulting estimator, which we call the multiway Takemura estimator (MWTE), is orthogonally equivariant and uniformly dominates the UMREE.

Proposition 3

Let σ̂2(Γ, X) and Σ̂(Γ, X) be the UMREEs of σ2 and based on data X × {Γ1, …, Γ, I}. Let and Let Σ̃(X) = S(X)/|S(X)|1/ for k = 1, …, K. Then (σ̃2(X), Σ̃1(X), …, Σ̃(X)) is orthogonally equivariant and uniformly dominates the UMREE of Proposition 1. Note that “averaging” over any subset of 𝒪 × ⋯ × 𝒪 in the manner of Proposition 3 will uniformly decrease the risk. By averaging with respect to the uniform measure over the orthogonal group, we obtain an estimator that has the attractive property of being orthogonally equivariant. In practice it is computationally infeasible to integrate over the space of orthogonal matrices. However, we may obtain a stochastic approximation to the MWTE as follows: Independently for each t = 1, …, T and k = 1, …, K, simulate from the uniform distribution on 𝒪. Let Set Σ̃ (X) = S(X)/|S(X)|1/ for k = 1, …, K. Then an approximation to the MWTE is This is a randomized estimator which is orthogonally invariant in the sense of Definition 6.3 of Eaton (1989).

4.3. Simulation results

We numerically compared the risks of the MLE, UMREE, and the MWTE under several three-way array normal distributions, using a variety of values of (p1, p2, p3) and with n = 1. For each (p1, p2, p3) under consideration, we simulated 100 data arrays from the array normal model. As the risk of both the MLE and the UMREE are constant over the parameter space, it is sufficient to compare their risks at a single point in the parameter space, which we took to be Σ = (1, I, I, I). Risks were approximated by averaging the losses of each estimator across the 100 simulated data arrays. For each data array, the MLE was obtained from the iterative coordinate descent algorithm outlined in (Hoff, 2011). Each UMREE was approximated based on 1250 iterations of the Gibbs sampler described in Section 3, from which the first 250 iterations were discarded to allow for convergence to the stationary distribution (convergence appeared to be essentially immediate). The ratio of risk estimates across several values of (p1, p2, p3) are are plotted in solid lines in Figure 1. We considered array dimensions in which the first two dimensions were identical. This scenario could correspond to, for example, data arrays representing longitudinal relational or network measurements between p1 = p2 nodes at p3 time points. The first panel of the figure considers the relative performance of the estimators as the “number of time points” (p3) increases. The results indicate that the UMREE provides substantial and increasing risk improvements compared to the MLE as p3 increases. However, the right panel indicates that the gains are not as dramatic and not increasing when the “number of nodes” (p1 = p2) increases while p3 remains fixed. Even so, the variability in the ratio of losses (shown with vertical bars) decreases as the number of nodes increases, indicating an increasing probability that the UMREE will beat the MLE in terms of loss.

Figure 1

Risk comparisons for the MLE, UMREE and MWTE. Both panels plot Monte Carlo estimates of the risk ratios of the UMREE to the MLE in solid lines, and the approximate MWTE to the MLE in dashed lines. The width of the vertical bars is one standard deviation of the ratio of the UMREE loss to the MLE loss, across the 100 data sets.

We also compared these risks to the risk of the approximate MWTE given in (12), with T ∈ {2, 3}. The risks for the approximate MWTE relative to those of the MLE are shown in dashed lines in the two panels of the Figure, and indicate non-trivial improvements in risk as compared to the UMREE. We examined values of T greater than 3 but found no appreciable further reduction in the risk. Note, however, that the MWTE does not have constant risk over the parameter space (though MWTE will have constant risk over the orbits of the orthogonal product group).

5. Discussion

This article has extended the results of James and Stein (1961) and Takemura (1984) by developing equivariant and minimax estimators of the covariance parameters in the array normal model. Considering the class of estimators equivariant with respect to a special lower triangular group, we showed that the uniform minimum risk equivariant estimator (UMREE) can be viewed as a generalized Bayes estimator that can be obtained from a simple Gibbs sampler. We obtained an orthogonally equivariant estimator based on this UMREE by combining values of the UMREE under orthogonal transformations of the data. Both the UMREE and the orthogonally equivariant estimator are minimax, and both dominate any unique MLE in terms of risk. Empirical results in Section 4 indicate that the risk improvements of the UMREE over the MLE can be substantial, while the improvements of the orthogonally equivariant estimator over the UMREE are more modest. However, the risk improvements depend on the array dimensions in a way that is not currently understood. Furthermore, we do not yet know the minimal conditions necessary for the propriety of the posterior or the existence of the UMREE. Empirical results from the simulations in Section 4 suggest that the UMREE exists for sample sizes as low as n = 1, at least for the array dimensions in the study. This is similar to the current state of knowledge for the existence of the MLE: The array normal likelihood is trivially bounded for n ≥ p (as it is bounded by the maximized likelihood under the unconstrained p-variate normal model), and some sufficient conditions for uniqueness of the MLE are given in Ohlson et al. (2013). However, empirical results (not shown) suggest that a unique MLE may exist for n = 1 for some array dimensions (although not for others). Obtaining necessary and sufficient conditions for the existence of the UMREE and the MLE is an ongoing area of research of the authors.

1 in total

Review 1. Three-way component analysis: principles and illustrative application.

Authors: H A Kiers; I Van Mechelen
Journal: Psychol Methods Date: 2001-03

1 in total

1. Factor Uniqueness of the Structural Parafac Model.

Authors: Paolo Giordani; Roberto Rocci; Giuseppe Bove
Journal: Psychometrika Date: 2020-08-16 Impact factor: 2.500

1 in total