Yunhui Fu, Shin Matsushima, Kenji Yamanishi.
Abstract
Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes a high-order non-negative data tensor into several non-negative factor matrices. In NTF, the non-negative rank has to be predetermined to specify the model and it greatly influences the factorized matrices. However, its value is conventionally determined by specialists' insights or trial and error. This paper proposes a novel rank selection criterion for NTF on the basis of the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL principle on tensor slices to overcome a problem caused by the imbalance between the number of elements in a data tensor and that in factor matrices, and (2) we employ the normalized maximum likelihood (NML) code-length for histogram densities. We employ synthetic and real data to empirically demonstrate that our method outperforms other criteria in terms of accuracies for estimating true ranks and for completing missing values. We further show that our method can produce ranks suitable for knowledge discovery.
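For orientation, the quantities named in the abstract can be written out for a third-order tensor. This is only a sketch under stated assumptions: it assumes the usual CP-style form of NTF, and the symbols $\mathcal{X}$, $A$, $B$, $C$, $I$, $J$, $K$, $R$ and $M$ are notation introduced here rather than taken from the abstract. The last line is the standard definition of the NML code-length; how the paper applies it to histogram densities on tensor slices is not reproduced here.

```latex
% Assumed notation: non-negative third-order tensor X, non-negative rank R.
\[
\mathcal{X} \in \mathbb{R}_{+}^{I \times J \times K}, \qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}, \qquad
A \in \mathbb{R}_{+}^{I \times R},\;
B \in \mathbb{R}_{+}^{J \times R},\;
C \in \mathbb{R}_{+}^{K \times R}
\]

% Element counts behind the imbalance mentioned in the abstract.
\[
\underbrace{I J K}_{\text{elements of the data tensor}}
\quad \text{versus} \quad
\underbrace{(I + J + K)\, R}_{\text{elements of the factor matrices}}
\]

% Standard NML code-length of data x under a model class M
% (the sum becomes an integral for continuous data).
\[
L_{\mathrm{NML}}(x; M) = -\log p\bigl(x; \hat{\theta}(x)\bigr)
  + \log \sum_{y} p\bigl(y; \hat{\theta}(y)\bigr)
\]
```

Under this notation the rank $R$ is the only quantity the selection criterion has to choose. The data term grows with $IJK$ while the model term grows with $(I + J + K)R$, which is one way to read the imbalance the abstract refers to.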
Keywords: minimum description length; model selection; non-negative tensor factorization; normalized maximum likelihood code length
Year: 2019 PMID: 33267345 PMCID: PMC7515125 DOI: 10.3390/e21070632
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Benefits for seven criteria versus six different values of true rank. Boldfaces describe best performances.
| True Rank | 15 | 20 | 25 | 30 | 35 | 40 |
|---|---|---|---|---|---|---|
| Proposed | 0.11 | 0.45 | | | | |
| MDL2stage | 0.92 | 0.74 | 0.61 | 0.47 | 0.37 | 0.28 |
| AIC1 | 0 | 0 | 0 | 0 | 0 | 0 |
| AIC2 | 0 | 0 | 0 | 0 | 0 | 0 |
| BIC1 | 0.54 | 0.72 | 0.87 | 0.76 | 0.63 | 0.72 |
| BIC2 | 0.84 | 0.66 | 0.4 | 0.29 | 0.1 | 0.08 |
| CV | | | 0.73 | 0.65 | 0.62 | 0.71 |
Ranks estimated by six criteria and their prediction errors. Boldfaces describe best performances.
| Rank and Error | AVE | STDDEV | Error |
|---|---|---|---|
| Proposed | | | |
| MDL2stage | 548.1 | 23.26 | 36.52 × |
| AIC1 | 472 | 145.26 | 39.38 × |
| AIC2 | 546.1 | 22.09 | 38.89 × |
| BIC1 | 7.2 | 1.55 | 5.27 × |
| BIC2 | 3 | 0 | 5.42 × |
Rank estimated by 6 criteria and their scores for web data. Boldfaces describe best performances.
| Rank and Score | AVE | STDDEV | Male at Night | Female at Day | Male at Day | Female at Night |
|---|---|---|---|---|---|---|
| Proposed | | | | | 0.685 | |
| MDL2stage | 194.4 | 25.40 | 0.598 | 0.666 | | 0.745 |
| AIC1 | 155.9 | 21.05 | 0.612 | 0.683 | 0.564 | 0.607 |
| AIC2 | 196.7 | 23.71 | 0.599 | 0.679 | 0.577 | 0.794 |
| BIC1 | 13.9 | 1.52 | 0.160 | 0.197 | 0.140 | 0.201 |
| BIC2 | 35.0 | 6.09 | 0.471 | 0.566 | 0.446 | 0.324 |
| Randomly generated vectors | | | 0.070 | −0.022 | 0.004 | −0.004 |
| Specified generated patterns | | | 0.365 | 0.298 | 0.302 | 0.256 |
Figure 1. The top 10 valued web pages and apps in 4 winner bases.
Rank estimated by 6 criteria and their scores for app data. Boldfaces describe best performances.
| Rank and Score | AVE | STDDEV | Single 20s | Youth at Rush Hour |
|---|---|---|---|---|
| Proposed | | | | |
| MDL2stage | 147.6 | 22.27 | 0.939 | 0.722 |
| AIC1 | 97.4 | 15.15 | 0.963 | 0.732 |
| AIC2 | 148.5 | 19.72 | 0.939 | 0.723 |
| BIC1 | 14.5 | 3.44 | 0.467 | 0.602 |
| BIC2 | 26.7 | 8.93 | 0.617 | 0.686 |
| Randomly generated vectors | | | −0.109 | 0.300 |
| Specified generated patterns | | | 0.498 | 0.655 |