Osman Asif Malik, Hayato Ushijima-Mwesigwa, Arnab Roy, Avradip Mandal, Indradeep Ghosh.
Abstract
Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup in solving these problems over general-purpose computers, but they often require the problem to be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model. In this work, we focus on the important binary matrix factorization (BMF) problem, which has many applications in data mining. We propose two QUBO formulations for BMF and show how clustering constraints can easily be incorporated into them. The special-purpose hardware we consider is limited in the number of variables it can handle, which presents a challenge when factorizing large matrices. We propose a sampling-based approach to overcome this challenge, allowing us to factorize large rectangular matrices. In addition to these methods, we also propose a simple baseline algorithm, which outperforms our more sophisticated methods in a few situations. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.
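To make the QUBO form mentioned in the abstract concrete, the following is a minimal sketch of how one BMF subproblem can be written as a QUBO. It is an illustration under stated assumptions, not necessarily either of the two formulations the paper proposes: it assumes standard (non-Boolean) arithmetic, fixes the left factor `U`, and fits a single binary column `x` to a column `a` of the data matrix. The function names (`column_qubo`, `qubo_energy`, `brute_force_qubo`, `squared_residual`) and the toy data are hypothetical, and the exhaustive search merely stands in for an annealer on a problem this small.

```python
from itertools import product


def column_qubo(U, a):
    """Build Q so that x^T Q x + sum(a_i^2) == ||a - U x||^2 for binary x.

    Assumption: standard arithmetic (not Boolean), U fixed, x binary.
    """
    m, k = len(U), len(U[0])
    Q = [[0] * k for _ in range(k)]
    for p in range(k):
        for q in range(k):
            # Quadratic part: entries of U^T U.
            Q[p][q] = sum(U[i][p] * U[i][q] for i in range(m))
        # x_p^2 == x_p for binary x, so the linear term -2 (U^T a)_p
        # can be folded into the diagonal.
        Q[p][p] -= 2 * sum(a[i] * U[i][p] for i in range(m))
    return Q


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary tuple x."""
    k = len(x)
    return sum(Q[p][q] * x[p] * x[q] for p in range(k) for q in range(k))


def brute_force_qubo(Q):
    """Exhaustively minimize x^T Q x; feasible only for tiny k."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))


def squared_residual(U, a, x):
    """Compute ||a - U x||^2 directly, for comparison with the QUBO energy."""
    k = len(x)
    return sum(
        (a[i] - sum(U[i][p] * x[p] for p in range(k))) ** 2
        for i in range(len(a))
    )


# Hypothetical toy data: a 4 x 2 binary factor U and one binary column a.
U = [[1, 0], [1, 1], [0, 1], [0, 0]]
a = [1, 1, 0, 0]

Q = column_qubo(U, a)
x_best = brute_force_qubo(Q)
print(Q, x_best, squared_residual(U, a, x_best))
```

Because the QUBO energy differs from the squared residual only by the constant `sum(a_i^2)`, minimizing one minimizes the other. On special-purpose hardware the matrix `Q` would be handed to the device instead of being searched exhaustively.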
Year: 2021 PMID: 34914786 PMCID: PMC8675762 DOI: 10.1371/journal.pone.0261250
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mean relative error for synthetic A with an exact decomposition.
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | | | |
| *DA+ALS | | | | | |
| Penalized | | | | 0.0170 | 0.0148 |
| Thresholded | | | | 0.0052 | 0.0273 |
| *Baseline | 0.8265 | 0.8706 | 0.8157 | 0.7434 | 0.6977 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | | | |
| *DA+ALS | | | | | |
| Penalized | | | | | 0.0361 |
| Thresholded | | | | 0.0330 | 0.0594 |
| *Baseline | 0.9181 | 0.9075 | 0.8730 | 0.8101 | 0.7585 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | | | |
| *DA+ALS | | | | | |
| Penalized | | | | | 0.0159 |
| Thresholded | | | 0.0240 | 0.0317 | 0.0632 |
| *Baseline | 0.8831 | 0.9088 | 0.8634 | 0.8127 | 0.7520 |
Mean relative error for synthetic A for which a ∼ Bernoulli(0.2).
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | | 0.7643 | |
| *DA+ALS | | | | | |
| Penalized | 0.9989 | 0.9230 | 0.8559 | 0.7875 | 0.7397 |
| Thresholded | 0.9555 | 0.8904 | 0.8409 | 0.7954 | 0.7609 |
| *Baseline | 0.9365 | 0.8787 | 0.8238 | 0.7734 | 0.7253 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.9902 | 0.9743 | 0.9659 | 0.9452 | 0.9484 |
| *DA+ALS | 0.9895 | 0.9727 | 0.9624 | 0.9403 | 0.9436 |
| Penalized | 1.0000 | 1.0000 | 0.9998 | 0.9918 | 0.9751 |
| Thresholded | 0.9990 | 0.9932 | 0.9777 | 0.9601 | 0.9368 |
| *Baseline | | | | | |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.9914 | 0.9785 | 0.9624 | 0.9491 | 0.9421 |
| *DA+ALS | 0.9914 | 0.9785 | 0.9623 | 0.9489 | 0.9413 |
| Penalized | 1 | 1 | 1 | 0.9986 | 0.9839 |
| Thresholded | 0.9998 | 0.9951 | 0.9833 | 0.9661 | 0.9443 |
| *Baseline | | | | | |
Mean relative error for synthetic A for which a ∼ Bernoulli(0.8).
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | 0.2288 | 0.2192 | 0.2097 | 0.2050 |
| *DA+ALS | | | | | |
| Penalized | | 0.2441 | 0.2539 | 0.2843 | 0.3346 |
| Thresholded | | 0.2904 | 0.4390 | 0.5078 | 0.5332 |
| *Baseline | | 0.2446 | 0.2446 | 0.2446 | 0.2446 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | 0.2550 | 0.2539 | 0.2590 | 0.2480 |
| *DA+ALS | | | | | |
| Penalized | | 0.2500 | 0.3202 | 0.4757 | 0.5743 |
| Thresholded | | 0.2625 | 0.6319 | 0.7702 | 0.7581 |
| *Baseline | | 0.2503 | 0.2503 | 0.2503 | 0.2503 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | 0.2546 | 0.2542 | 0.2632 | 0.2438 |
| *DA+ALS | | | | | |
| Penalized | | 0.2574 | 0.3220 | 0.4996 | 0.6021 |
| Thresholded | | 0.2527 | 0.6923 | 0.8000 | 0.8118 |
| *Baseline | | 0.2502 | 0.2502 | 0.2502 | 0.2502 |
Mean relative error for MNIST experiments.
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | 0.2673 | 0.1983 | |
| *DA+ALS | | | | | |
| Penalized | 0.6070 | 0.4072 | 0.2951 | 0.2238 | 0.1836 |
| Thresholded | 0.5872 | 0.4141 | 0.3171 | 0.2797 | 0.2738 |
| *Baseline | 0.8684 | 0.7484 | 0.6400 | 0.5476 | 0.4655 |
Fig 1. Binary low-rank approximation to MNIST digit using DA+ALS BMF.
Fig 2. The thresholded leukemia data.
Black entries are 1 and white entries are 0.
Mean relative error for gene expression data.
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.4292 | 0.4069 | 0.3853 | 0.4257 | 0.4025 |
| *DA+ALS | | | | | |
| Penalized | 0.3977 | 0.3952 | 0.4767 | 0.5399 | 0.5810 |
| Thresholded | | 0.4219 | 0.6195 | 0.6625 | 0.6894 |
| *Baseline | 0.9698 | 0.9408 | 0.9123 | 0.8842 | 0.8563 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.8898 | 0.8392 | 0.7869 | 0.7850 | 0.7722 |
| *DA+ALS | | | | | |
| Penalized | 0.8662 | 0.8405 | 0.8110 | 0.7903 | 0.7633 |
| Thresholded | 0.8662 | 0.8338 | 0.8226 | 0.8120 | 0.8116 |
| *Baseline | 0.9504 | 0.9027 | 0.8569 | 0.8132 | 0.7695 |
Mean relative error for synthetic A for which a ∼ Bernoulli(0.5).
The * symbol indicates methods we propose. Best results are underlined.
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | | | 0.6056 | 0.5536 | 0.5165 |
| *DA+ALS | | | | | |
| Penalized | 0.8606 | 0.7306 | 0.6611 | 0.6147 | 0.5749 |
| Thresholded | 0.7983 | 0.7170 | 0.6775 | 0.6875 | 0.6680 |
| *Baseline | 0.9519 | 0.9103 | 0.8679 | 0.8265 | 0.7870 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.8809 | 0.8234 | 0.7948 | 0.7540 | 0.7347 |
| *DA+ALS | 0.8628 | | | | |
| Penalized | 0.9802 | 0.9635 | 0.9371 | 0.8862 | 0.8478 |
| Thresholded | | 0.8346 | 0.8299 | 0.8311 | 0.8038 |
| *Baseline | 0.9651 | 0.9307 | 0.8964 | 0.8622 | 0.8282 |
| Method | Target rank 1 | Target rank 2 | Target rank 3 | Target rank 4 | Target rank 5 |
|---|---|---|---|---|---|
| *DA | 0.8820 | 0.8214 | 0.7846 | 0.7550 | 0.7325 |
| *DA+ALS | 0.8634 | | | | |
| Penalized | 0.9912 | 0.9819 | 0.9565 | 0.9033 | 0.8770 |
| Thresholded | | 0.8358 | 0.8324 | 0.8418 | 0.8323 |
| *Baseline | 0.9664 | 0.9328 | 0.8992 | 0.8657 | 0.8322 |