Jing Zhou, Anirban Bhattacharya, Amy Herring, David Dunson.
Abstract
It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common low rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank k tensor as a sum of rank one tensors. When observations are only available for a tiny subset of the cells of a big tensor, the low rank assumption is not sufficient and PARAFAC has poor performance. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. For concreteness, we focus on a contingency table application. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
Keywords: Bayesian; Big data; Categorical data; Contingency table; Low rank; Matrix completion; PARAFAC; Tensor factorization
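To make the abstract's description of PARAFAC concrete, the following is a minimal sketch of how a rank k tensor can be assembled as a sum of k rank one tensors, specialized to a contingency table of cell probabilities. It illustrates only the generic PARAFAC form, not the additional dimension reduction proposed in the paper. The function name `parafac_probability_tensor`, the variable names, and the Dirichlet-generated toy inputs are all assumptions made for illustration.

```python
import numpy as np


def parafac_probability_tensor(weights, factors):
    """Sum of rank-one tensors (generic PARAFAC form; hypothetical helper).

    weights: array of shape (k,), nonnegative and summing to 1.
    factors: list of arrays; factors[j] has shape (d_j, k), and each column
             is a probability vector over the d_j categories of variable j.
    """
    dims = [f.shape[0] for f in factors]
    tensor = np.zeros(dims)
    for h in range(weights.shape[0]):
        # The outer product of the h-th columns is one rank-one tensor.
        rank_one = weights[h]
        for f in factors:
            rank_one = np.multiply.outer(rank_one, f[:, h])
        tensor += rank_one
    return tensor


# Toy example (assumed sizes): three categorical variables, rank k = 2.
rng = np.random.default_rng(0)
k, dims = 2, (3, 4, 2)
weights = rng.dirichlet(np.ones(k))
factors = [rng.dirichlet(np.ones(d), size=k).T for d in dims]

tensor = parafac_probability_tensor(weights, factors)
print(tensor.shape)  # one cell probability per combination of categories
```

The k component weights and the factor columns involve far fewer free parameters than the full table of cell probabilities, which is the sense in which the representation is low rank.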
Year: 2016 PMID: 31210707 PMCID: PMC6579540 DOI: 10.1080/01621459.2014.983233
Source DB: PubMed Journal: J Am Stat Assoc ISSN: 0162-1459 Impact factor: 5.033