José M Maisog1, Andrew T DeMarco2, Karthik Devarajan3, S Stanley Young4, Paul Fogel5, George Luta6,7,8.
Abstract
Non-negative matrix factorization is a relatively new method of matrix decomposition which factors an m×n data matrix X into an m×k matrix W and a k×n matrix H, so that X≈W×H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF in synthetic data. Second, we examine the effect of normalization on this estimate's accuracy in empirical data. In synthetic data with orthogonal underlying components, methods based on PCA and Brunet's Cophenetic Correlation Coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate. For any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If underlying components are known to be orthogonal, then Velicer's MAP or Minka's Laplace-PCA method might be best. However, when orthogonality of the underlying components is unknown, none of the methods seemed preferable.Entities:
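The factorization described above can be sketched in a few lines with scikit-learn. This is a minimal illustration of X ≈ W×H with a chosen rank k, not the paper's own experimental setup; the matrix sizes, solver settings, and synthetic data here are assumptions for demonstration.

```python
# Minimal sketch of rank-k NMF (X ~= W @ H) using scikit-learn.
# Sizes, init, and iteration count are illustrative assumptions.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
m, n, k = 20, 12, 3
# Build a non-negative matrix with a known rank-k structure.
X = rng.random((m, k)) @ rng.random((k, n))

model = NMF(n_components=k, init="nndsvda", max_iter=2000, random_state=0)
W = model.fit_transform(X)   # m x k, non-negative
H = model.components_        # k x n, non-negative

# Relative reconstruction error of the factorization.
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape)
```

Because the data here are exactly rank k and non-negative, the reconstruction error is small; in practice k is unknown, which is the estimation problem the paper addresses.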
Keywords: Non-negative matrix factorization; PCA; factorization rank; high-dimensional data; normalization; number of factored components; unsupervised learning
Year: 2021 PMID: 35694180 PMCID: PMC9181460 DOI: 10.3390/math9222840
Source DB: PubMed Journal: Mathematics (Basel) ISSN: 2227-7390
Figure 1. Average accuracy for the three PCA-based methods: Velicer's method (A), Minka's Laplace method (B), and Minka's BIC method (C). The true number of simulated components is plotted on each x axis and the number of components recovered by each algorithm on each y axis. Perfect accuracy would appear as a diagonal line, and that is nearly what each of the three methods achieved. The standard deviation of each estimate is shown by blue error bars, although these errors are small.
Figure 2. Average results for seven iterative methods. The true number of simulated components is plotted on each x axis and the number of components recovered by each algorithm on the y axis. Perfect accuracy would appear as a diagonal line. The mean result at each k0 is shown as a black dot and its standard deviation as a black vertical line.
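One of the PCA-based estimators discussed above is available off the shelf: scikit-learn's PCA implements Minka's MLE criterion for choosing the number of components via `n_components='mle'`. This is a hedged sketch on synthetic data with a known number of orthogonal components; it illustrates the idea rather than reproducing the paper's simulations, and the data dimensions and noise level are assumptions.

```python
# Sketch: estimating the number of components with Minka's MLE criterion,
# as implemented by scikit-learn's PCA (n_components='mle').
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_samples, n_features, true_k = 500, 20, 4
# Data with true_k strong orthogonal-ish components plus small isotropic noise.
signal = rng.normal(size=(n_samples, true_k)) @ rng.normal(size=(true_k, n_features))
X = signal + 0.1 * rng.normal(size=(n_samples, n_features))

pca = PCA(n_components="mle", svd_solver="full")
pca.fit(X)
print(pca.n_components_)
```

With a clear eigenvalue gap, as in this simulation, the MLE criterion recovers the true number of components; the paper's point is that such agreement degrades on real data, especially after normalization.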
This table shows the estimates of k from ten estimation methods (rows) under eight normalization methods (columns).
| Method | None | Scale columns, then normalize rows | Subtract mean by rows, then std to 1 | Subtract mean by columns, then std to 1 | Subtract global mean, then std to 1 | Subtract mean by rows | Subtract mean by columns | Subtract mean by rows, then by columns |
|---|---|---|---|---|---|---|---|---|
| Velicer | 20 | 9 | 10 | 20 | 20 | 15 | 20 | 15 |
| Minka-Laplace | 27 | 17 | 15 | 25 | 27 | 27 | 27 | 27 |
| Minka-BIC | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 |
| FYV | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| BIC1 | 4 | 4 | 4 | 4 | 4 | 4 | 10 | 4 |
| BIC2 | 4 | 4 | 4 | 4 | 4 | 4 | 10 | 4 |
| BIC3 | 4 | 4 | 4 | 4 | 4 | 4 | 10 | 4 |
| RRSSQ | 4 | 8 | 4 | 4 | 4 | 4 | 10 | 12 |
| BCV | 18 | 10 | 24 | 20 | 14 | 16 | 12 | 16 |
| CCC | 18 | 10 | 24 | 20 | 14 | 16 | 12 | 16 |